

Detection Theory:
A User's Guide
(2nd edition)

NEIL A. MACMILLAN
University of Massachusetts
and

C. DOUGLAS CREELMAN
University of Toronto

2005

LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
Mahwah, New Jersey
London

Copyright © 2005 by Lawrence Erlbaum Associates, Inc.
All rights reserved. No part of this book may be reproduced in any form,
by photostat, microform, retrieval system, or any other means, without
prior written permission of the publisher.
Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, New Jersey 07430
Cover design by Kathryn Houghtaling Lacey
Library of Congress Cataloging-in-Publication Data
Macmillan, Neil A.
Detection theory : a user's guide / Neil A. Macmillan, C. Douglas Creelman.
—2nd ed.
p. cm.

Includes bibliographical references and index.
ISBN 0-8058-4230-6 (cloth : alk. paper)
ISBN 0-8058-4231-4 (pbk. : alk. paper)
1. Signal detection (Psychology). I. Creelman, C. Douglas. II. Title.
BF237.M25 2004
152.8—dc22
2004043261
CIP

Books published by Lawrence Erlbaum Associates are printed on acid-free
paper, and their bindings are chosen for strength and durability.
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

To

David M. Green, R. Duncan Luce, John A. Swets,
and the memory of Wilson P. Tanner, Jr.


Contents

Preface
Introduction

PART I. Basic Detection Theory and One-Interval Designs

1 The Yes-No Experiment: Sensitivity
    Understanding Yes-No Data
    Implied ROCs
    The Signal Detection Model
    Calculational Methods
    Essay: The Provenance of Detection Theory
    Summary
    Problems

2 The Yes-No Experiment: Response Bias
    Two Examples
    Measuring Response Bias
    Alternative Measures of Bias
    Isobias Curves
    Comparing the Bias Measures
    How Does the Participant Choose a Decision Rule?
    Coda: Calculating Hit and False-Alarm Rates From Parameters
    Essay: On Human Decision Making
    Summary
    Computational Appendix
    Problems

3 The Rating Experiment and Empirical ROCs
    Design of Rating Experiments
    ROC Analysis
    ROC Analysis With Slopes Other Than 1
    Estimating Bias
    Systematic Parameter Estimation and Calculational Methods
    Alternative Ways to Generate ROCs
    Another Kind of ROC: Type 2
    Essay: Are ROCs Necessary?
    Summary
    Computational Appendix
    Problems

4 Alternative Approaches: Threshold Models and Choice Theory
    Single High-Threshold Theory
    Low-Threshold Theory
    Double High-Threshold Theory
    Choice Theory
    Measures Based on Areas in ROC Space: Unintentional Applications of Choice Theory
    Nonparametric Analysis of Rating Data
    Essay: The Appeal of Discrete Models
    Summary
    Computational Appendix
    Problems

5 Classification Experiments for One-Dimensional Stimulus Sets
    Design of Classification Experiments
    Perceptual One-Dimensionality
    Two-Response Classification
    Experiments With More Than Two Responses
    Nonparametric Measures
    Comparing Classification and Discrimination
    Summary
    Problems

PART II. Multidimensional Detection Theory and Multi-Interval Discrimination Designs

6 Detection and Discrimination of Compound Stimuli: Tools for Multidimensional Detection Theory
    Distributions in One- and Two-Dimensional Spaces
    Some Characteristics of Two-Dimensional Spaces
    Compound Detection
    Inferring the Representation From Data
    Summary
    Problems

7 Comparison (Two-Distribution) Designs for Discrimination
    Two-Alternative Forced Choice (2AFC)
    Reminder Paradigm
    Essay: Psychophysical Comparisons and Comparison Designs
    Summary
    Problems

8 Classification Designs: Attention and Interaction
    One-Dimensional Representations and Uncertainty
    Two-Dimensional Representations
    Two-Dimensional Models for Extrinsic Uncertain Detection
    Uncertain Simple and Compound Detection
    Selective and Divided Attention Tasks
    Attention Operating Characteristics (AOCs)
    Summary
    Problems

9 Classification Designs for Discrimination
    Same-Different
    ABX (Matching-to-Sample)
    Oddity (Triangular Method)
    Summary
    Computational Appendix
    Problems

10 Identification of Multidimensional Objects and Multiple Observation Intervals
    Object Identification
    Interval Identification: m-Alternative Forced Choice (mAFC)
    Comparisons Among Discrimination Paradigms
    Simultaneous Detection and Identification
    Using Identification to Test for Perceptual Interaction
    Essay: How to Choose an Experimental Design
    Summary
    Problems

PART III. Stimulus Factors

11 Adaptive Methods for Estimating Empirical Thresholds
    Two Examples
    Psychometric Functions
    The Tracking Algorithm: Choices for the Adaptive Tester
    Evaluation of Tracking Algorithms
    Two More Choices: Discrimination Paradigm and the Issue of Slope
    Summary
    Problems

12 Components of Sensitivity
    Stimulus Determinants of d' in One Dimension
    Basic Processes in Multiple Dimensions
    Hierarchical Models
    Essay: Psychophysics versus Psychoacoustics (etc.)
    Summary
    Problems

PART IV. Statistics

13 Statistics and Detection Theory
    Hit and False-Alarm Rates
    Sensitivity and Bias Measures
    Sensitivity Estimates Based on Averaged Data
    Systematic Statistical Frameworks for Detection Theory
    Summary
    Computational Appendix
    Problems

APPENDICES

Appendix 1 Elements of Probability and Statistics
    Probability
    Statistics

Appendix 2 Logarithms and Exponentials

Appendix 3 Flowcharts to Sensitivity and Bias Calculations
    Chart 1: Guide to Subsequent Charts
    Chart 2: Yes-No Sensitivity
    Chart 3: Yes-No Response Bias
    Chart 4: Rating-Design Sensitivity
    Chart 5: Definitions of Multi-Interval Designs
    Chart 6: Multi-Interval Sensitivity
    Chart 7: Multi-Interval Bias
    Chart 8: Classification

Appendix 4 Some Useful Equations

Appendix 5 Tables
    A5.1. Normal Distribution (p to z), for Finding d', c, and Other SDT Statistics
    A5.2. Normal Distribution (z to p)
    A5.3. Values of d' for Same-Different (Independent-Observation Model) and ABX (Independent-Observation and Differencing Models)
    A5.4. Values of d' for Same-Different (Differencing Model)
    A5.5. Values of d' for Oddity, Gaussian Model
    A5.6. Values of p(c) Given d' for Oddity (Differencing and Independent-Observation Models, Normal)
    A5.7. Values of d' for m-Interval Forced Choice or Identification

Appendix 6 Software for Detection Theory
    Listing
    Web Sites

Appendix 7 Solutions to Selected Problems

Glossary
References
Author Index
Subject Index

Preface

Detection theory entered psychology as a way to explain detection experiments, in which weak visual or auditory signals must be distinguished from
a "noisy" background. In Signal Detection Theory and Psychophysics
(1966), David Green and John Swets portrayed observers as decision makers trying to optimize performance in the face of unpredictable variability,
and they prescribed experimental methods and data analyses for separating
decision factors from sensory ones.
Since Green and Swets' classic was published, both the content of detection theory and the way it is used have changed. The theory has deepened to
include alternative theoretical assumptions and has been used to analyze
many experimental tasks. The range of substantive problems to which the
theory has been applied has broadened greatly. The contemporary user of
detection theory may be a sensory psychologist, but more typically is interested in memory, cognition, or systems for medical or nonmedical diagnosis. In this book, we draw heavily on the work of Green, Swets, and other
pioneers, but aim for a seamless meshing of historical beginnings and current perspective. In recognition that these methods are often used in situations far from the original problem of finding a "signal" in background
noise, we have omitted the word signal from the title and usually refer to
these methods simply as detection theory.
We are writing with two types of readers in mind: those learning detection theory, and those applying it. For those encountering detection theory
for the first time, this book is a textbook. It could be the basic text in a
one-semester graduate or upper level undergraduate course, or it could be a
supplementary text in a broader course on psychophysics, methodology, or
a substantive topic. We imagine a student who has survived one semester of
"behavioral" statistics at the undergraduate level, and have tried to make the
book accessible to such a person in several ways. First, we provide appendixes on probability and statistics (Appendix 1) and logarithms (Appendix
2). Second, there are a large number of problems, some with answers.
Third, to the extent possible, the more complex mathematical derivations
have been placed in "Computational Appendixes" at the ends of chapters.
Finally, some conceptually advanced but essential ideas, especially from
multidimensional detection theory, are presented in tutorial detail.
For researchers who use detection theory, this book is a handbook. As far
as possible, the material needed to apply the described techniques is complete in the book. A road map to most methods is provided by the flowcharts
of Appendix 3, which direct the user to appropriate equations (Appendix 4)
and tables (Appendix 5). The software appendix (Appendix 6) provides a
listing of a program for finding the most common detection theory statistics, and directions to standard software and Web sites for a wide range of
calculations.
An important difference between this second edition and its predecessor is
the prominence of multidimensional detection theory, to which the five chapters of Part II are devoted. This topic was covered in a single chapter of the
first edition, and the increase is due to two factors. First, there has been an explosion of multidimensional applications in the past decade or so. Second,
one essential area of detection theory—the analysis of different discrimination paradigms—requires multidimensional methods that were introduced in
passing in the first edition, but are now integrated into a systematic presentation of these methods. Someone concerned only with analyzing specific paradigms will be most interested in chapters 1 to 3, 5, 7, 9, and 10. The
intervening chapters provide greater theoretical depth (chaps. 4 and 8) as well
as a careful introduction to multidimensional analysis (chap. 6).
The flowcharts (Appendix 3) are inspired by similar charts in Behavioral
Statistics by R. B. Darlington and P. M. Carlson (1987). We thank Pat
Carlson for persuasive discussions of the value of this tool and for helping
us use it to best advantage.
We are grateful to many people who helped us complete this project. We
taught courses based on preliminary drafts at Brooklyn College and the
University of Massachusetts. Colleagues used parts of the book in courses
at Purdue University (Hong Tan), the University of California at San Diego
(John Wixted), and the University of Florida (Bob Sorkin). We thank these
instructors and their students for providing us with feedback. We owe a debt
to many other colleagues who commented on one or more chapters in preliminary drafts, and we particularly wish to thank Danny Algom, Michael
Hautus, John Irwin, Marjorie Leek, Todd Maddox, Dawn Morales, Jeff Miller, and Dick Pastore. Caren Rotello's comments, covering almost the
entire book, were consistently both telling and supportive.
Our warmest appreciation and thanks go to our wives, Judy Mullins
(Macmillan) and Lynne Beal (Creelman), for their generous support and
patience with a project that—like the first edition—provided serious competition for their company.
We also thank Bill Webber, our editor, and Lawrence Erlbaum Associates for adopting this project and making it their own.
Finally, we continue to feel a great debt to the parents of detection theory.
Among many who contributed to the theory in its early days, our thinking
owes the most to four people. We dedicate this book to David M. Green, R.
Duncan Luce, and John A. Swets, and to the memory of Wilson P. Tanner,
Jr. Without them there would be no users for us to guide.


Introduction

Detection theory is a general psychophysical approach to measuring performance. Its scope includes the everyday experimentation of many psychologists, social and medical scientists, and students of decision processes.
Among the problems to which it can be applied are these:
• assessing a person's ability to recognize whether a photograph is of
someone previously seen or someone new,
• measuring the skill of a medical diagnostician in distinguishing
X-rays displaying tumors from those showing healthy tissue,
• finding the intensity of a sound that can be heard 80% of the time, and
• determining whether a person can identify which of several words has
been presented on a screen, and whether identification is still possible
if the person reports that a word has not appeared at all.
In each of these situations, the person whose performance we are studying
encounters stimuli of different types and must assign distinct responses to
them. There is a correspondence¹ between the stimuli and the responses so
that each response belongs with one of the stimulus classes. The viewer of
photographs, for example, is presented with some photos of Old,² previously seen faces, as well as some that are New, and must respond "old" to
the Old faces and "new" to the New. Accurate performance consists of using
the corresponding responses as defined by the experimenter.
A correspondence experiment is one in which each possible stimulus is
assigned a correct response from a finite set. In complete correspondence
experiments, which include all the designs in chapters 1, 2, 4, 6, 7, 9, 10, and 11, this partition is rigidly set by the experimenter. In incomplete correspondence experiments (such as the rating design described in chap. 3 and the classification tasks of chap. 5), there is a class of possible correspondences, each describing ideal performance.

¹Most italicized words are defined in the Glossary.
²Throughout the book, we capitalize the names of stimuli and stimulus classes.
Correspondence provides an objective standard or expectation against
which to evaluate performance. Detection theory measures the discrepancy
between the two and may therefore be viewed as a technique for understanding error. Errors are assumed to arise from inevitable variability, either
in the stimulus input or within the observer. If this noise does not appreciably affect performance, responses correspond perfectly to stimuli, and their
correctness provides no useful information. Response time is often the dependent variable in such situations, and models for interpreting this performance measure are well developed (Luce, 1986).
The possibility of error generally brings with it the possibility of different kinds of errors—misses and false alarms. Medical diagnosticians can
miss the shadow of a tumor on an X-ray or raise a false alarm by reporting
the presence of one that is not there. A previously encountered face may be
forgotten or a new one may be falsely recognized as familiar. The two types
of error typically have different consequences, as these examples make
clear: If the viewer of photographs is in fact an eyewitness to a crime, a miss
will result in the guilty going free, a false alarm in the innocent being accused. A reasonable goal of a training program for X-ray readers would be
to encourage an appropriate balance between misses and false alarms (in
particular, to keep the number of misses very small).
Detection theory, then, provides a method for measuring people's accuracy (and understanding their errors) in correspondence experiments. This
is not a definition—we offer a tentative one at the end of chapter 1—but may
suggest the directions in which a discussion of the theory must lead.
Organization of the Book
This book is divided into four parts. Part I describes the measurement of
sensitivity and response bias in situations that are experimentally and theoretically the simplest. One stimulus is presented on each trial, and the representation of the stimuli is one dimensional. In Part II, multidimensional
representations are used, allowing the analysis of a variety of classification
and identification experiments. Common but complex discrimination designs in which two or more stimuli are presented on each trial are a special
case. In Part III, we consider two important topics in which stimulus characteristics are central. Chapter 11 discusses adaptive techniques for the estimation of thresholds. Chapter 12 describes ways in which detection theory

Introduction

xix

can be used to relate sensitivity to stimulus parameters and partition sensitivity into its components. Part IV (chap. 13) offers some statistical procedures for evaluating correspondence data.
Organization of Each Chapter
Each chapter is organized around one or more examples modeled on experiments that have been reported in the behavioral literature. (We do not attempt to reanalyze actual experiments, which are always more complicated
than the pedagogical uses to which we might put them.) For each design, we
present one or more appropriate methods for analyzing the illustrative data.
The examples make our points concrete and suggest the breadth of application of detection theory, but they are not prescriptive: The use of a recognition memory task to illustrate the two-alternative forced-choice paradigm
(chap. 7) does not mean, for instance, that we believe this design to be the
only or even the best tool for studying recognition memory. The appropriate
design for studying a particular topic should always be dictated by practical
and theoretical aspects of the content area.
The book as a whole represents our opinions about how best to apply detection theory. For the most part, our recommendations are not controversial, but in some places we have occasion to be speculative, argumentative,
or curmudgeonly. Sections in which we take a broader, narrower, or more
peculiar view than usual are labeled essays as a warning to the reader.


I
Basic Detection Theory
and One-Interval Designs

Part I introduces the one-interval design, in which a single stimulus is presented on each trial. The simplest and most important example is a correspondence experiment in which the stimulus is drawn from one of two
stimulus classes and the observer tries to say from which class it is drawn. In
auditory experiments, for example, the two stimuli might be a weak tone
and no sound, tone sequences that may be slow or fast, or passages from the
works of Mozart and Beethoven.
We begin by describing the use of one-interval designs to measure discrimination, the ability to tell two stimuli apart. Two types of such experiments may be distinguished. If one of the two stimulus classes contains only
the null stimulus, as in the tone-versus-background experiment, the task is
called detection. (This historically important application is responsible for
the use of the term detection theory to refer to these methods.) If neither
stimulus is null, the experiment is called recognition, as in the other examples. The methods for analyzing detection and recognition are the same, and
we make no distinction between them (until chap. 10, where we consider
experiments in which the two tasks are combined).
In chapters 1 and 2, we focus on designs with two possible responses as
well as two stimulus classes. Because the possible responses in some applications (e.g., the tone detection experiment) are "yes" and "no," the paradigm with two stimuli, one interval, and two responses is sometimes termed
yes-no even when the actual responses are, say, "slow" and "fast." Performance can be analyzed into two distinct elements: the degree to which the
observer's responses mirror the stimuli (chap. 1) and the degree to which
they display bias (chap. 2). Measuring these two elements requires a theory;
we use the most common, normal-distribution variant of detection theory to

accomplish this end. Chapter 4 broadens the perspective on yes-no sensitivity and bias to include three classes of alternatives to this model: threshold
theory, choice theory, and "nonparametric" techniques.
One-interval experiments may involve more than two responses or more
than two possible stimuli. As an example of a larger response set, listeners
could rate the likelihood that a passage was composed by Mozart rather than
Beethoven on a 6-point scale. One-interval rating designs are discussed in
chapter 3. As an example of a larger stimulus set, listeners could hear sequences presented at one of several different rates. If the requirement is to
assign a different response to each stimulus, the task is called identification;
if the stimuli are to be sorted into a smaller number of classes (perhaps slow,
medium, and fast), it is classification. Chapter 5 applies detection-theory
tools to identification and classification tasks, but only those in which elements of the stimulus sets differ in a single characteristic such as tempo.
Identification and classification of more heterogeneous stimulus sets are
considered in Part II.

1
The Yes-No Experiment: Sensitivity

In this book, we analyze experiments that measure the ability to distinguish
between stimuli. An important characteristic of such experiments is that observers can be more or less accurate. For example, a radiologist's goal is to
identify accurately those X-rays that display abnormalities, and participants
in a recognition memory study are accurate to the degree that they can tell previously presented stimuli from novel ones. Measures of performance in these
kinds of tasks are also called sensitivity measures: High sensitivity refers to
good ability to discriminate, low sensitivity to poor ability. This is a natural
term in detection studies—a sensitive listener hears things an insensitive one
does not—but it applies as well to the radiology and memory examples.
Understanding Yes-No Data
Example 1: Face Recognition
We begin with a memory experiment. In a task relevant to understanding
eyewitness testimony in the courtroom, participants are presented with a series of slides portraying people's faces, perhaps with the instruction to remember them. After a period of time (and perhaps some unrelated activity),
recognition is tested by presenting the same participants with a second series that includes some of the same pictures, shuffled to a new random order,
along with a number of "lures"—faces that were not in the original set.
Memory is good if the person doing the remembering properly recognizes
the Old faces, but not New ones. We wish to measure the ability to distinguish between these two classes of slides. Experiments of this sort have
been performed to compare memory for faces of different races, orientations (upright vs. inverted), and many other variables (for a review, see
Shapiro & Penrod, 1986).


Let us look at some (hypothetical) data from such a task. We are interested in just one characteristic of each picture: whether it is an Old face (one
presented earlier) or a New face. Because the experiment concerns two
kinds of faces and two possible responses, "yes" (I've seen this person before in this experiment) and "no" (I haven't), any of four types of events can
occur on a single experimental trial. The number of trials of each type can be
tabulated in a stimulus-response matrix like the following.
                     Response
Stimulus Class    "Yes"    "No"    Total
Old                 20        5       25
New                 10       15       25

The purpose of this yes-no task is to determine the participant's sensitivity to the Old/New difference. High sensitivity is indicated by a concentration of trials counted in the upper left and lower right of the matrix ("yes"
responses to Old stimuli, "no" responses to New).
Summarizing the Data
Conventional, rather military language is used to describe the yes-no experiment. Correctly recognizing an Old item is termed a hit; failing to recognize it, a miss. Mistakenly recognizing a New item as old is a false alarm;
correctly responding "no" to a New item is, abandoning the metaphor, a
correct rejection. In tabular terms:
                          Response
Stimulus Class    "Yes"                 "No"                       Total
Old (S2)          Hits (20)             Misses (5)                 (25)
New (S1)          False alarms (10)     Correct rejections (15)    (25)

We use 5", and S2 as context-free names for the two stimulus classes.
Of the four numbers in the table (excluding the marginal totals), only two
provide independent information about the participant's performance.
Once we know, for example, the number of hits and false alarms, the other
two entries are determined by how many Old and New items the experimenter decided to use (25 of each, in this case). Dividing each number by


the total in its row allows us to summarize the table by two numbers: The hit
rate (H) is the proportion of Old trials to which the participant responded
"yes," and the false-alarm rate (F) is the proportion of New trials similarly
(but incorrectly) assessed. The hit and false-alarm rates can be written as
conditional probabilities:¹

H = P("yes" | S2)    (1.1)

F = P("yes" | S1),    (1.2)

where Equation 1.1 is read "The proportion of 'yes' responses when stimulus S2 is presented."
In this example, H = .8 and F = .4. The entire matrix can be rewritten with
response rates (or proportions) rather than frequencies:
                     Response
Stimulus Class    "Yes"    "No"    Total
Old (S2)            .8       .2      1.0
New (S1)            .4       .6      1.0

The two numbers needed to summarize an observer's performance, F and
H, are denoted as an ordered (false-alarm, hit) pair. In our example, (F, H)
= (.4, .8).
Measuring Sensitivity
We now seek a good way to characterize the observer's sensitivity. A function of H and F that attempts to capture this ability of the observer is called a
sensitivity measure, index, or statistic. A perfectly sensitive participant
would have a hit rate of 1 and a false-alarm rate of 0. A completely insensitive participant would be unable to distinguish the two stimuli at all and, indeed, could perform equally well without attending to them. For this
observer, the probability of saying "yes" would not depend on the stimulus
presented, so the hit and false-alarm rates would be the same. In interesting
cases, sensitivity falls between these extremes: H is greater than F, but performance is not perfect.
¹Technically, H and F are estimates of probabilities—a distinction that is important in statistical work
(chap. 13). Probabilities characterize the observer's relation to the stimuli and are considered stable and
unchanging; H and F may vary from one block of trials to the next.


The simplest possibility is to ignore one of our two response rates, using, say, H to measure performance. For example, a lie detector might be
touted as detecting 80% of liars or an X-ray reader as detecting 80% of tumors. (Alternatively, the hit rate might be ignored, and evaluation might
depend totally on the false-alarm rate.) Such a measure is clearly inadequate. Compare the memory performance we have been examining with
that of another group:
                     Response
Stimulus Class    "Yes"    "No"    Total
Old                  8       17       25
New                  1       24       25

Group 1 successfully recognized 80% of the Old faces, Group 2 just
32%. But this comparison ignores the important fact that Group 2 participants just did not say "yes" very often. The hit rate, or any measure that depends on responses to only one of the two stimulus classes, cannot be a
measure of sensitivity. To speak of sensitivity to a stimulus (as was done, for
instance, in early psychophysics) is meaningless in the framework of detection theory.²
An important characteristic of sensitivity is that it can only be measured between two alternative stimuli and must therefore depend on both
H and F. A moment's thought reveals that not all possible dependencies
will do. Certainly a higher hit rate means greater, not less, sensitivity,
whereas a higher false-alarm rate is an indicator of less sensitive performance. So a sensitivity measure should increase when either H increases
or F decreases.
A final possible characteristic of sensitivity measures is that S1 and S2 trials should have equal importance: Missing an Old item is just as important
an error as incorrectly recognizing a New one. In general, this is too strong a
requirement, and we will encounter sensitivity measures that assign different weights to the two stimulus classes. Nevertheless, equal treatment is a
good starting point, and (with one exception) the indexes described in this
chapter satisfy it.
²The term sensitivity is used in this way, as a synonym for the hit rate, in medical diagnosis. Specificity is
that field's term for the correct-rejection rate.


Two Simple Solutions
We are looking for a measure that goes up when H goes up, goes down when
F goes up, and assigns equal importance to these statistics. How about simply subtracting F from H? The difference H - F has all these characteristics.
For the first group of memory participants, H - F = .8 - .4 = .4; for the second, H - F = .32 - .04 = .28, and Group 1 wins.
Another measure that combines H and F in this way is a familiar statistic, the proportion of correct responses, which we denote p(c). To find proportion correct in conditions with equal numbers of S1 and S2 trials, we take the average of the proportion correct on S2 trials (the hit rate, H) and the proportion correct on S1 trials (the correct rejection rate, 1 - F). Thus:

p(c) = (1/2)[H + (1 - F)].    (1.3)

If the numbers of S1 and S2 trials are not equal, then to find the literal proportion of trials on which a correct answer was given the actual numbers in the matrix would have to be used:

p(c)* = (hits + correct rejections)/total trials.    (1.4)

Usually it is more sensible to give H and F equal weight, as in Equation
1.3, because a sensitivity measure should not depend on the base presentation rate.
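Equations 1.1 to 1.4 are easy to script. Here is a minimal Python sketch (ours, not the book's; the function names are invented) applied to the two memory groups:

    def rates(hits, misses, false_alarms, correct_rejections):
        # Hit and false-alarm rates from the four matrix cells (Eqs. 1.1-1.2).
        H = hits / (hits + misses)
        F = false_alarms / (false_alarms + correct_rejections)
        return F, H

    def p_correct(F, H):
        # Proportion correct with S1 and S2 weighted equally (Eq. 1.3).
        return 0.5 * (H + (1 - F))

    def p_correct_star(hits, misses, false_alarms, correct_rejections):
        # Literal proportion of correct trials (Eq. 1.4).
        total = hits + misses + false_alarms + correct_rejections
        return (hits + correct_rejections) / total

    F1, H1 = rates(20, 5, 10, 15)  # Group 1: (F, H) = (.4, .8)
    F2, H2 = rates(8, 17, 1, 24)   # Group 2: (F, H) = (.04, .32)
    print(round(H1 - F1, 2), round(p_correct(F1, H1), 2))  # 0.4 0.7
    print(round(H2 - F2, 2), round(p_correct(F2, H2), 2))  # 0.28 0.64

Both summaries rank Group 1 above Group 2, as the next paragraph explains.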
Let us look at p(c) for equal presentations (Eq. 1.3). Is this a better or worse measure of sensitivity than H - F itself? Neither. Because p(c) depends directly on H - F (and not on either H or F separately), one statistic goes up whenever the other does, and the two are monotonic functions of each other. Two measures that are monotonically related in this way are said to be equivalent measures of accuracy. In the running examples, p(c) is .7 for Group 1 and .64 for Group 2, and p(c) leads to the same conclusion as H - F. For both measures, Group 1 outscores Group 2.
A Detection Theory Solution
The most widely used sensitivity measure of detection theory (Green &
Swets, 1966) is not quite as simple as p(c), but bears an obvious family re-


semblance. The measure is called d' ("dee-prime") and is defined in terms
of z, the inverse of the normal distribution function:

d' = z(H) - z(F).    (1.5)

The z transformation converts a hit or false-alarm rate to a z score (i.e., to
standard deviation units). A proportion of .5 is converted into a z score of 0,
larger proportions into positive z scores, and smaller proportions into negative ones. To compute z, consult Table A5.1 in Appendix 5. The table makes
use of a symmetry property of z scores: Two proportions equally far from .5
lead to the same absolute z score (positive if p > .5, negative if p < .5) so that:

z(1 - p) = -z(p).    (1.6)

Thus, z(.4) = -.253, the negative of z(.6). Use of the Gaussian z transformation is dominant in detection theory, and we often refer to normal-distribution models by the abbreviation SDT.
We can use Equation 1.5 to calculate d' for the data in the memory example. For Group 1, H= .8 and F= .4, so z(H) = 0.842, z(F) = -0.253, and d'=
0.842 - (-0.253) = 1.095. When the hit rate is greater than .5 and the
false-alarm rate is less (as in this case), d' can be obtained by adding the absolute values of the corresponding z scores. For Group 2, H = .32 and F =
.04, so d' = -0.468 - (-1.751) = 1.283. When the hit and false-alarm rates
are on the same side of .5, d' is obtained by subtracting the absolute values
of the z scores. Interestingly, by the d' measure, it is Group 2 (the one that
was much more stingy with "yes" responses) rather than Group 1 that has
the superior memory.
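In code, z is the inverse of the standard normal cumulative distribution function. A short Python check (ours, not the book's listing; scipy's norm.ppf stands in for Table A5.1):

    from scipy.stats import norm

    def d_prime(F, H):
        # d' = z(H) - z(F) (Eq. 1.5)
        return norm.ppf(H) - norm.ppf(F)

    print(round(d_prime(0.40, 0.80), 3))  # Group 1: 1.095
    print(round(d_prime(0.04, 0.32), 3))  # Group 2: 1.283, the larger value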
When observers cannot discriminate at all, H = F and d' = 0. Inability to
discriminate means having the same rate of saying "yes" when Old faces are
presented as when New ones are offered. As long as H ≥ F, d' must be greater than or equal to 0. The largest possible finite value of d' depends on the number of decimal places to which H and F are carried. When H = .99 and F = .01, d' = 4.65; many experimenters consider this an effective ceiling.
Perfect accuracy, on the other hand, implies an infinite d'. Two adjustments to avoid infinite values are in common use. One strategy is to convert
proportions of 0 and 1 to 1/(2N) and 1 - 1/(2N), respectively, where N is the
number of trials on which the proportion is based. Suppose a participant has
25 hits and 0 misses (H= 1.0) to go with 10 false alarms and 15 correct rejections (F= .4). The adjustment yields 24.5 hits and 0.5 misses, so H= .98 and
d' = 2.054 - (-0.253) = 2.307. A second strategy (Hautus, 1995; Miller,


1996) is to add 0.5 to all data cells regardless of whether zeroes are present.
This adjustment leads to H=25.5/26 = .981 and F= 10.5/26 = .404. Rounding to two decimal places yields the same value as before, but d' is slightly
smaller if computed exactly.
Most experiments avoid chance and perfect performance. Proportions
correct between .6 and .9 correspond roughly to d' values between 0.5 and
2.5. Correct performance on 75% of both Sl and S2 trials yields a d' of 1.35;
69% for both stimuli gives d' = 1.0.
It is sometimes important to calculate d' when only p(c) is known, not H
and F. (Partial ignorance of this sort is common when reanalyzing published data.) Strictly speaking, the calculation cannot be done, but an approximation can be made by assuming that the hit rate equals the correct
rejection rate so that H = 1 - F. For example, if p(c) = .9, we can guess at a
measure for sensitivity: d' = z(.9) - z(.1) = 1.282 - (-1.282) = 2.56. To simplify the calculation, notice that one z score is the negative of the other (Eq.
1.6). Hence, in this special case:
d' = 2z[p(c)].    (1.7)

This calculation is not correct in general. For example, suppose H = .99 and F = .19, so that H and the correct rejection rate are not equal. Then p(c) still equals .9, but d' = z(.99) - z(.19) = 2.326 - (-0.878) = 3.20, not 2.56.

Table 1.1. A spreadsheet for calculating SDT statistics. The formulas (shown for column B) occupy column A; columns B and C hold two illustrative data sets.

Row   Quantity              Formula                                      B (Set 1)   C (Set 2)
1     hits                                                               10          9
2     misses                                                             0           1
3     false alarms                                                       2           0
4     correct rejections                                                 8           10
5     H                     =IF(B2>0, B1/(B1+B2), (B1-0.5)/(B1+B2))      .950        .900
6     F                     =IF(B3>0, B3/(B3+B4), 0.5/(B3+B4))           .200        .050
7     z(H)                  =NORMSINV(B5)                                1.645       1.282
8     z(F)                  =NORMSINV(B6)                                -0.842      -1.645
9     d'                    =B7-B8                                       2.486       2.926
10    c                     =-0.5*(B7+B8)                                -0.402      0.182
11    beta                  =EXP(B9*B10)                                 0.368       1.702

The site is http://psych.utoronto.ca/~creelman/.
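For readers who prefer a scripting language to a spreadsheet, here is a Python rendering of the Table 1.1 calculations (our sketch, not the book's program). It applies the same convention as the IF() formulas: half a count is subtracted from a perfect hit count, and half a count substitutes for a zero false-alarm count. The last two quantities, c and beta, are the bias measures taken up in chapter 2.

    import math
    from scipy.stats import norm

    def sdt_stats(hits, misses, fas, crs):
        H = hits / (hits + misses) if misses > 0 else (hits - 0.5) / (hits + misses)
        F = fas / (fas + crs) if fas > 0 else 0.5 / (fas + crs)
        zH, zF = norm.ppf(H), norm.ppf(F)
        d = zH - zF              # d' (Eq. 1.5)
        c = -0.5 * (zH + zF)     # criterion location (Eq. 2.1)
        beta = math.exp(d * c)   # likelihood ratio at the criterion
        return H, F, d, c, beta

    print(sdt_stats(10, 0, 2, 8))   # Set 1: H=.95, F=.20, d'=2.486, c=-0.402, beta=0.368 (rounded)
    print(sdt_stats(9, 1, 0, 10))   # Set 2: H=.90, F=.05, d'=2.926, c=0.182, beta=1.702 (rounded)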

Essay: The Provenance of Detection Theory

Psychophysics, the oldest psychology, has continually adapted itself to the
substantive concerns of experimentalists. In particular, detection theory is
well suited to cognitive psychology and might indeed be considered one of
its sources. No grounding in history is needed to use this book, but some appreciation of the intellectual strains that meet here will help place these
tools in context.
The term psychophysics was invented by Gustav Fechner (1860), the
19th-century physicist, philosopher, and mystic. He was the first to take a
mathematical approach to relating the internal and external worlds on the
basis of experimental data. Some present-day psychophysicists directly
pursue Fechner's interest in relating mental experience to the physical
world, usually in simple perceptual experiments. Measuring the way in
which the reported experience of loudness grows with physical intensity is a
psychophysical problem of this sort; we consider a detection theory
approach to this problem in chapter 5.
This book is part of a second Fechnerian legacy, also methodological, but
more general than the first. Fechner developed, tested, and described experimental methods for estimating the difference threshold, or just noticeable
difference (jnd), the minimal difference between two stimuli that leads to a
change in experience. Fechner's assumption that the jnd could be the unit of
measurement, the fundamental building block or atom of experience, was
central to Wundt's and Titchener's structuralism, the first experimentally
based theory of perception. The analogy to 19th-century chemistry was
close: Theory and experiment should focus on uncovering the basic units
and the laws of combination of those units.
Fechner's methods were adopted and became topics of investigation in
their own right; they still form the backbone of experimental psychology.
Attempts to measure jnds led to two complications: (a) The threshold appeared not to be a fixed quantity because, as the difference between two
stimuli increases, correct discrimination becomes only gradually more
likely (Urban, 1908); and (b) different methods produced different values
for the jnd.
The concept of the jnd survived the first problem by redefinition: The jnd
is now considered to be the stimulus difference that can be discriminated on
some fixed percentage of trials (see chaps. 5 and 11). Two early reactions to
the problem of continuity in psychophysical data are recognizable in modern research (see Jones, 1974).


One line of thought retained the literal notion of a sensory threshold,
building mechanical and mathematical models to explain the gradual nature
of observed functions (see chap. 4 for the current status of such models).
The threshold idea was congenial with early 20th-century behaviorist and
operationist attitudes: Sensory function could be studied and measured
without invoking unpopular notions of mental content (Garner, Hake, &
Eriksen, 1956). The threshold, in this view, was a construct derived from
data and did not have to relate to any internal and unobservable mental process. The solution to method dependence was merely to subscript thresholds to indicate the method by which they were obtained (Graham, 1950;
Osgood, 1958).
The second response to the variability problem, instigated, according to
Jones (1974), by Delboeuf (1883), substituted a continuum of experience for
the discrete processes of the threshold; it is this view that informs most contemporary psychophysics. One approach to measuring such continuous experience was Stevens' (1975) magnitude estimation, which used direct
verbal estimates. Detection measurement, in contrast, relies on underlying
random variation or noise. Psychologists' realization of the importance of
random variation dates at least to Fullerton and Cattell (1892), who invoked
it in a rigorous quantitative way to account for inconsistency in response
with repetitions of identical stimuli. Variability later served as the key building block for the pioneering work of Thurstone (1927a, 1927b) in
measuring distances along sensory continua indirectly.
The idea of variability or noise as an explanatory concept also arose in
engineering, with the development and evaluation of radar detection apparatus. Radar and sonar are limited in performance by intrinsic noise in the
input signal. Any input from an antenna or sensor can be due to noise alone
or to a signal of interest embedded in the background noise. Groups at the
University of Michigan (Peterson, Birdsall, & Fox, 1954), MIT (van Meter
& Middleton, 1954), and in the Soviet Union (Kotel'nikov, 1960) recognized that the physical noise that was mixed with all signals, and that could
mimic signal presence, was a major limitation to detection performance.
Knowing that stimulus environments are noisy does not, in itself, tell an
observer how best to cope with them. An approach to this problem was contributed by another applied science: statistical decision theory. Decision
theorists pointed out that information derived from noisy signals could lead
to action only when evaluated against well-defined goals. Decisions (and
thus action) should depend not only on the stimulus, but on the expected
outcomes of actions. The viewer of a radar display that might or might not


contain a blip, for example, should consider the relative effects of failing to
detect a real bomber and of detecting a phantom before deciding on a
response to that display.
W. P. Tanner, Jr., working with J. A. Swets at the University of Michigan,
realized that these engineering notions could be applied to psychology and
appropriated them directly into the psychophysical experiment (Tanner &
Swets, 1954). By separating the world of stimuli and their perturbations
from that of the decision process, detection theory was able to offer measures of performance that were not specific to procedure and that were independent of motivation. Procedure and motivation could influence data, but
affected only the decision process, leaving measurable aspects of the internal stimulus world unchanged and capable of being evaluated separately.
According to detection theory, the observer's access to the stimuli being
discriminated is indirect: An intelligent, not entirely reliable process makes
inferences about them and acts according to the demands of the experimental situation. One might say that detection theory "deals with the processes
by which [a decision about] a perceived, remembered, and thought-about
world is brought into being from [an] unpromising beginning" (Neisser,
1967, p. 4). Neisser's landmark book linked perception and cognition into a
unified framework after a hiatus of many decades. The constructionist (although not complicated) decision processes of detection theory mark it as
an early example of cognitive psychology. The ideas behind detection theory are the everyday assumptions of behavioral experimenters in the cognitive era, and the theory itself is central to a wide range of research areas in
cognitive science. Perhaps Estes' (2002) assessment is not an overstatement: "... [SDT is] the most towering achievement of basic psychological
research of the last half century" (p. 15).
Summary
The results of a one-interval discrimination experiment can be described by
a hit and a false-alarm rate, which in turn can be reduced to a single measure
of sensitivity. Good indexes can be written as the difference between the hit
and false-alarm rates when both are appropriately transformed. The sensitivity measure proposed by detection theory, d', uses the normal-distribution z transformation. The primary rationale for d'as a measure accuracy is
that it is roughly invariant when response bias is manipulated; simpler indexes such as proportion correct do not have this property. The use of d' implies a model in which the two possible stimulus classes lead to normal


distributions differing in mean, and the observer decides which class occurred by comparing an observation with an adjustable criterion.
Conditions under which the methods described in this chapter are appropriate are spelled out in Chart 2 of Appendix 3.
Problems
1.1. Suppose you are measuring the sensitivity of a polygraph ("lie detector"). What are "hits," "misses," "false alarms," and "correct rejections"?
1.2. The following tables give the number of trials in three conditions of a detection experiment on which participants responded "yes" or "no" to S2 or S1. (a) Calculate H and F. (b) Find H - F, p(c), and p(c)*. For these data sets, can H - F be greater than p(c) in one case and the reverse ordering occur in another, or is one index always greater than the other?

(a)
         "yes"   "no"
   S2      9       6
   S1      7       8

(b)
         "yes"   "no"
   S2     55      45
   S1      5      25

(c)
         "yes"   "no"
   S2     45      55
   S1     25       5

1.3. (a) In Problem 1.2(a), the numbers of S1 and S2 trials are equal, but in (b) and (c) they are not. Does this matter computationally? experimentally?
(b) Is it possible to calculate p(c) for S2 trials only? What would this statistic measure?
1.4. Compute d' for the following (F, H) pairs:
(a) (.16, .84), (.01, .99), (.75, .75).
(b) (.6, .9), (.5, .9), (.05, .9).

1.5. (a) If p(c) = .8 and H and F are unknown, estimate d'.
(b) If p(c) = .8, the numbers of S1 and S2 trials are equal, and F = .05, find H and d'.
1.6. (a) Suppose d' = 1. What is H if F = .01, .1, .5?
(b) Plot the ROC from these points on linear and z coordinates, and use the zROC to confirm the value of d'.
1.7. For the data matrixes of Problem 1.2, find d' from H and F and also from p(c). Is there a pattern to the results?
1.8. Are the points (.3, .9) and (.1, .7) on the same ROC according to detection theory (i.e., do they imply the same value of d')? Do they imply the same value of p(c)?
1.9. Suppose (F, H) = (.2, .6). If F is unchanged, what would H have to be to double the participant's sensitivity, according to detection theory? If H is unchanged, what would F have to be?
1.10. Plot the ROCs implied by the following measures, on both linear and z coordinates: H² - F², H½ - F½, H/F², H/F. Which measures are best? worst?
1.11. Suppose a face-recognition experiment yields 20 hits and 10 false alarms in 45 trials. Can you compute d'? If not, is it possible to narrow down the possibilities? Hint: The stimulus-response matrix looks like this:

          "yes"   "no"   Total
   Old     20
   New     10
                           45

What happens if there are 0 misses, or 0 correct rejections?

2
The Yes-No Experiment:
Response Bias

In dealing with other people, "bias" is the tendency to respond on some basis other than merit, showing a degree of favoritism. In a correspondence experiment, response bias measures the participant's tilt toward one response
or the other.
The sensitivity measure d' depends on stimulus parameters, but is untainted by response bias: To a good approximation, it remains constant in
the face of changes in response popularity. We now adopt the complementary perspective, seeking an index of response bias that is uncolored by sensitivity. Conceptually, d' corresponds to a fixed aspect of the observer's
decision space, the difference between the means of underlying distributions; a measure of bias should also reflect an appropriate characteristic of
the perceptual representation. How can we assign a value to the participant's preference for one of the two responses?
Two Examples
Example 2a: Face Recognition, Continued
Consider again the face-recognition experiment of chapter 1, in which
viewers discriminated Old from New faces. Suppose the investigator now
repeats the experiment, this time hypnotizing the participants in an effort to
improve their memory, and obtains the following results from a representative observer:
               Normal                 Hypnotized
           "Yes"    "No"           "Yes"    "No"
Old          69       31             89       11
New          31       69             59       41


Applying the analyses of chapter 1 reveals that hypnosis has not affected sensitivity: d' is approximately 1.0 in both the normal and hypnotized conditions.
Hypnosis does appear to affect willingness to say "yes"; there are many
more positive responses in the hypnotized condition than in the control
data. (For a discussion of whether hypnotism actually has this effect, see
Klatzky & Erdelyi, 1985.) In this example, therefore, an experimental manipulation affects bias, but not sensitivity. In the next example, a single variable affects both.
Example 2b: X-ray Reader Training
Apprentice radiologists must be trained to distinguish normal from abnormal X-rays (see Getty, Pickett, D'Orsi, & Swets, 1988, for a description of
one training program). In this field, a hit is conventionally defined to be
the correct diagnosis of a tumor from an X-ray, and a false alarm is the incorrect labeling of normal tissue as tumorous. Consider three readers who
before training are equally able to distinguish X-rays displaying real tumors from X-rays of normal tissue, attaining exactly the same performance, but emerge from training with different scores on a posttest:
             Before Training          After Training
Trainee 1    H = .89   F = .59        H = .96    F = .39
Trainee 2    H = .89   F = .59        H = .993   F = .68
Trainee 3    H = .89   F = .59        H = .915   F = .265

The trained readers are more sensitive—two of them show both a higher
proportion of hits and a lower proportion of false alarms than before training. But has there also been a change in willingness to say "yes"? In the hypnotic recognition experiment, a response bias change merely masked the
constancy of sensitivity; in this second example, there is clear evidence for a
sensitivity change, but an interesting response-bias question remains.
Measuring Response Bias
Characteristics of a Good Response-Bias Measure
Because a response-bias index is intended to measure the participant's willingness to say "yes," we expect it to depend systematically on both the hit


and false-alarm rates and in the same direction—either increasing or decreasing in both. Sensitivity measures, remember, increase with H and decrease with F, an analogous property. A response-bias index should depend
on the sum of terms involving H and F, whereas the sensitivity statistic d'
depends on the difference of H and F terms.
Response-bias statistics can reflect either the degree to which "yes" responses dominate or the degree to which "no" responses are preferred. All
the measures in this book index a leaning in the same direction: A positive
bias is a tendency to say "no," whereas a negative bias is a tendency to say
"yes." The rationale for these apparently illogical pairings will become
clear when we discuss the representation.
Criterion Location (c)
The basic bias measure for detection theory, called c (for criterion), is defined as:

c = -[z(H) + z(F)]/2.    (2.1)
When the false-alarm and miss rates are equal, z(F) = z(1 - H) = -z(H) and c
equals 0. Negative values arise when the false-alarm rate exceeds the miss
rate, and positive values arise when it is lower. Extreme values of c occur
when H and F are both large or both small: If both equal .99, for example, c
= -2.33, whereas if both equal .01, c = +2.33. The range of c is therefore the
same as that of d', although 0 is at the center rather than an endpoint. Figure 2.1 shows the locus of positive, negative, and 0 values of response bias in the
part of ROC space where sensitivity is above chance.
Table A5.1, which was introduced in chapter 1 as a tool for calculating
d', can also be used to compute the bias measure c. Spreadsheets accomplish the table-lookup task automatically (see Table 1.1, which includes
some bias measures). Analyzing the face-recognition results, we find that c
shifts from 0 to -0.73 under hypnosis, reflecting an increase in "yes" responses.
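A brief Python check of this analysis (ours, not the book's code):

    from scipy.stats import norm

    def criterion(F, H):
        # c = -[z(H) + z(F)]/2 (Eq. 2.1)
        return -0.5 * (norm.ppf(H) + norm.ppf(F))

    print(round(criterion(0.31, 0.69), 2))  # Normal: 0.0 (may print as -0.0)
    print(round(criterion(0.59, 0.89), 2))  # Hypnotized: -0.73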
To interpret these numbers according to our model, consider the decision
space in Fig. 2.2. The familiarity decision axis is labeled in standard deviation units, 0 being the point midway between the two distributions. Because
d' = 1.0, the mean of the Old distribution is at 0.5, the mean of the New at
-0.5. The participant's decision rule is to divide the familiarity axis into
"yes" and "no" regions at a criterion.

30

Chapter 2


FIG. 2.1. The representation of criterion location in ROC space. Points in the
shaded regions arise from criteria that are positive (below the minor diagonal) and
negative (above the minor diagonal). Points in the unshaded region below the major diagonal result from negative sensitivity.

FIG. 2.2. Decision spaces for the Normal and Hypnotized conditions of Example 2a, according to SDT. Shaded area corresponds to F, diagonally striped area to
H. (a) Normal controls have a symmetric criterion, d' = 1.0. (b) Hypnotized participants display identical sensitivity but a lower criterion, and thus have higher hit
and false-alarm rates.


A simple calculation shows that the value of this criterion, in standard deviation units from the midpoint, is the bias parameter c. In chapter 1, we saw that the z score of the "yes" rate corresponds to the mean-minus-criterion distance. For the S1 distribution, this implies

-d'/2 - c = z(F),    (2.2a)

and for the S2 distribution

d'/2 - c = z(H).    (2.2b)

Adding these two equations produces Equation 2.1.
The different values of response bias in the normal and hypnotized conditions of our face-recognition experiment, therefore, correspond to different criterion locations. In the control condition (Fig. 2.2a), the criterion is
located at 0, exactly halfway between the two distributions, and the participant is said to be "unbiased." Under hypnosis (Fig. 2.2b), the participant's
criterion is much lower, below the mean of the New distribution. Because it
is 0.73 standard deviations below the zero-bias point, c = -0.73.
Analysis of the radiology training data from Example 2b is equally
straightforward. All trainees improve in sensitivity: d' about doubles. Values of c can be calculated from Equation 2. 1 . Trainee 1 maintains the same
criterion location after training as before (c = -0.74). Trainee 2 has a more
extreme bias (-1 .46), and Trainee 3 has a less extreme one (-0.37). The degree to which the criteria differ among trainees is easily seen in Fig. 2.3,
which shows the decision space and criterion settings for each reader: The
first row represents the pretraining decision space of all trainees, and the
other rows represent the posttraining spaces of each one individually.
Alternative Measures of Bias
Detection theory offers one measure of sensitivity (for two-response experiments), but is more generous with bias parameters. Besides criterion location, just described, bias can be specified by relative criterion location and
likelihood ratio.
Relative Criterion Location (c')
In this measure of bias, we scale the criterion location relative to performance. A rationale for such scaling is that with easier discrimination tasks a


FIG. 2.3. Decision spaces for the three radiology trainees of Example 2b. In each case the hit rate, false-alarm rate, sensitivity, and three alternative criterion measures are shown. (a) Before training, d' = 1.0. The criterion c, the relative criterion c', and log likelihood ratio equal -0.73 for all trainees. (b) Trainee 1, after training; increased sensitivity and approximately the same criterion location c as before training. (c) Trainee 2, after training; increased sensitivity and approximately the same relative criterion location c' as before training. (d) Trainee 3, after training; increased sensitivity and approximately the same value of log likelihood ratio [ln(β)] as before training.


more extreme criterion (as measured by c) would be needed to yield the
same amount of bias.
Look again at the radiography training data of Example 2b. The first radiologist's criterion location is indeed the same distance from 0 (the
equal-bias point) before and after training, but whether this is to be called
"no change" can be argued. The criterion was initially below the mean of
the S1 distribution, but is above it afterward. If distance from the criterion to
a distribution mean is the key to bias, this observer's bias has become less
extreme. Would it not be sensible to calculate the criterion distance as a proportion of sensitivity distance? The alternative bias measure suggested by
this reasoning is:

c' = c/d'.    (2.3)
Calculated values for c' are given in Fig. 2.3. It happens in this example that
before training, c = c', but only because d' = 1.0. After training, c' is half the
magnitude of c because d' = 2. When d' varies, one must decide whether in
discussing "bias" one wishes to take account of sensitivity. Of the three radiologists, it is Trainee 2 who maintains the same bias in the sense of c' and
Trainee 1 whose bias is unchanged in the sense of c.
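The comparison is compact in code. A sketch (ours, not the book's) for the pretraining observers and the three trainees, using c' = c/d' from Equation 2.3:

    from scipy.stats import norm

    def c_and_c_prime(F, H):
        zH, zF = norm.ppf(H), norm.ppf(F)
        c = -0.5 * (zH + zF)     # Eq. 2.1
        return c, c / (zH - zF)  # Eq. 2.3: c' = c/d'

    observers = {"before training": (0.59, 0.89),
                 "Trainee 1": (0.39, 0.96),
                 "Trainee 2": (0.68, 0.993),
                 "Trainee 3": (0.265, 0.915)}
    for name, (F, H) in observers.items():
        c, c_rel = c_and_c_prime(F, H)
        print(f"{name}: c = {c:.2f}, c' = {c_rel:.2f}")
    # before training: c = -0.73, c' = -0.73
    # Trainee 1: c = -0.74, c' = -0.36  (c roughly unchanged)
    # Trainee 2: c = -1.46, c' = -0.74  (c' roughly unchanged)
    # Trainee 3: c = -0.37, c' = -0.19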
Likelihood Ratio (β)
The third measure of bias is found by an apparently different strategy. In the decision space, each value x on the decision axis has two associated "likelihoods," one for each distribution. Each likelihood is the height of one of the distributions; we denote this height at the location x by f(x), and to distinguish the two distributions we refer to the heights of S1 and S2 as f(x|S1) and f(x|S2). The relative likelihood of S2 versus S1, obtained by dividing these, is called the likelihood ratio:

β(x) = f(x|S2)/f(x|S1) .    (2.4)

Each point x has an associated value of likelihood ratio: It is 1.0 at the center (where the two distributions cross), greater than 1.0 to the right, and between 0 and 1.0 to the left. One measure of response bias, therefore, is the value of likelihood ratio at the criterion.
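To make the definition concrete, the likelihood ratio under the equal-variance normal model can be computed in a few lines of Python (a minimal sketch; the function name is ours, and the means are assumed to lie at ±d'/2):

from statistics import NormalDist

def likelihood_ratio(x, d):
    # Ratio of the S2 density to the S1 density at the point x,
    # assuming unit-variance normal distributions at +/- d/2.
    s2 = NormalDist(mu=d / 2, sigma=1.0)
    s1 = NormalDist(mu=-d / 2, sigma=1.0)
    return s2.pdf(x) / s1.pdf(x)

print(likelihood_ratio(0.0, d=1.0))   # 1.0 at the crossover point
print(likelihood_ratio(0.5, d=1.0))   # ~1.65, greater than 1 to the right
print(likelihood_ratio(-0.5, d=1.0))  # ~0.61, between 0 and 1 to the left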
Equation 2.4 suggests an interesting interpretation of likelihood ratio in terms of the ROC. Consider two points very close together on the decision axis—imagine they are a small value ε units apart, as shown in Fig. 2.4a. The change in the hit rate between the two points is approximately f(x|S2)ε, the height of the S2 distribution multiplied by the width of a tiny rectangle. The change in the false-alarm rate, by the same token, equals f(x|S1)ε. The ratio of these changes, which is the slope of the ROC, is f(x|S2)/f(x|S1). Notice that this slope exactly equals the likelihood ratio. The assertion in chapter 1 that the slope of the ROC continuously decreases follows from the equivalence of likelihood ratio and ROC slope. As the criterion goes from large to small values of c, the likelihood ratio must decrease, and so therefore must the slope.

FIG. 2.4. Geometric demonstration that the slope of the ROC at any point is the likelihood ratio at the criterion value that yields that point. (a) In the decision space, two criteria are shown that differ by a small amount ε. For the lower criterion, the hit rate is greater by an amount equal to the area of the filled rectangle, and the false-alarm rate is greater by an amount equal to the area of the diagonally shaded rectangle. (b) The two criteria correspond to two points on an ROC curve. (c) An expanded view of the relevant section of the ROC. The lower point (higher criterion) is (F, H). At the higher point, the hit and false-alarm rates increase by the areas of the rectangles in (a). The slope of the ROC, the ratio of these two increments, is f(x|S2)/f(x|S1), which is the likelihood ratio.


This conclusion does not depend on any assumptions about the shape of the underlying distributions, but actual calculation of likelihood ratio does require such a commitment. In the normal-distribution model we have been exploring, the height of the likelihood function, denoted φ, depends on x and on the distribution's mean μ and standard deviation σ according to the equation

φ(x) = [1/(σ√(2π))] exp[-(x - μ)²/(2σ²)] .    (2.5)

Values of φ are given in Table A5.1.
The general strategy for finding the likelihood ratio can now be applied to the normal model. The likelihood function f in Equation 2.4 equals φ, and the likelihood ratio is the ratio of two values of φ.

The value of Φ(z) can be found from a normal table, but Table A5.1 is not ideally arranged for this purpose. In that table, p values are given in units of .01, which is helpful when p is known, as in data analysis. Table A5.2 gives the same information, but for z scores in units of .01, which is more convenient when z is known. The probability p corresponding to a z score is Φ(z).

The "yes" rate is 1 - Φ(z); because the normal distribution is symmetric, this equals Φ(-z). Expressed as a z score, the criterion equals c - d'/2 for the S2 distribution and c - (-d'/2) = c + d'/2 for the S1 distribution, so

H = Φ(d'/2 - c)
F = Φ(-d'/2 - c) .    (2.9)

For an unbiased observer, c = 0, H = Φ(d'/2), and F = Φ(-d'/2). In this case, the hit and correct rejection rates both equal proportion correct, so

p(c) = Φ(d'/2) .    (2.10)
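These relations are easily programmed. A minimal Python sketch (the function name is ours) generates the "yes" rates of Equations 2.9 from d' and c:

from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal distribution function

def rates_from_parameters(d, c):
    # Equations 2.9: "yes" rates implied by sensitivity d' and criterion c.
    return Phi(d / 2 - c), Phi(-d / 2 - c)

H, F = rates_from_parameters(1.0, 0.0)  # unbiased observer, d' = 1
print(round(H, 3), round(F, 3))         # 0.691 0.309; p(c) = Phi(0.5) = .691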

FIG. 2.7. Relation between underlying distributions and "yes" rates (hit and false-alarm rates). When the criterion is at z, the "yes" rate is Φ(-z).

Essay: On Human Decision Making

Much of the large literature on decision making by human beings (see, e.g., Kahneman, Slovic, & Tversky, 1982) asks how closely our behavior corresponds to what we "should" do. The decision problem described in this
chapter is in many ways rather simple: Only one dimension is relevant, the
stimuli are presented at predictable times (in most applications), and repeated trials allow the observer to focus on relevant aspects of the stimulus
display. Does the observer indeed deal with this problem in the "right"
way—by establishing a criterion and using it?
At least two nonoptimal strategies have occurred to most psychophysicists who have studied (and, frequently, served as participants in) correspondence experiments: inattention and inconsistency. An inattentive
observer dozes off, or at least drifts into reverie, on some proportion of trials; because failing to respond is usually discouraged, this leads to an unknown number of d' = 0 trials, ones on which the observer responds despite
not having paid attention, mixed in with the others. An inconsistent participant uses a criterion, but changes the location of the cutoff from trial to trial;
because the criterion must be compared to a sensory event, the movement
adds an unknown amount of variance to the underlying distributions
(Wickelgren, 1968). Both strategies, if they may be called that, serve to reduce observed sensitivity.
Do these effects occur? Almost certainly, but little is known about
how badly they contaminate experiments. Training provided before experimental data are collected may serve to reduce these errors; observers
who fail to improve during practice may be suspected of persisting in a
nonoptimal strategy. In most applications, small amounts of inattention
or inconsistency matter little. Stimulus pairs that yield high performance
levels are an exception: The experimenter who wishes to make a precise
estimate of a d' of 4 or so will be frustrated by even an occasional lapse.
If lapses are part of the human condition, such estimates are doomed to
unreliability.
We have been speaking of optimal strategies; what about optimal use of
strategies? Given that an observer is using a criterion in the manner we suppose, are there ways we can encourage "unbiased" decision making, that is,
symmetric criterion placement? Arguments are sometimes put forward that
one or another experimental technique will accomplish this goal, which is
sometimes a valuable one (see especially chap. 11). Often, however, there is
no reason to aim for a symmetric criterion. After all, the sensitivity measure


with which detection theory provides us is unaffected by bias, so why
worry? Perhaps only because in common parlance (but not in
psychophysics) bias is a pejorative term, something worth avoiding.
Another appeal of unbiased responding is that it makes almost any measure of sensitivity satisfactory, eliminating the need for complex psychophysics. The search for unbiased responding may thus be a vestige of the
belief that, really, simple, untransformed measures are to be trusted more than
theoretical ones. We shall critically evaluate this possibility in chapter 4.
Finally, the concept of bias in detection theory has sometimes been misunderstood in a way that makes neutral bias qualitatively different from
other values. The location of the criterion can, we have seen, be manipulated by instructions: Apparently, then, observers can consciously choose to
change it. If no instructions are given, however, observers are not aware of
the possibility of varying a criterion. Thus, the argument goes, instructions
to change bias provide conscious interference with a normally unconscious
process. In our view, the distinction between consciousness and its lack has
nothing to do with either the existence or location of a criterion. Detection
theory takes no stand on the conscious status of a criterion, and in any case
observers do not naturally choose a neutral value. We shall encounter this
issue again in chapter 10 when we briefly discuss the alleged phenomenon
of subliminal perception. An observer who responds "no" when a stimulus
is presented because of a high criterion is not necessarily aware of the
possibility that a "yes" response would have been possible had the criterion
been set lower.
Summary
Whereas a good sensitivity statistic is the difference between the transformed
hit and false-alarm rates (chap. 1), a good measure of response bias is the sum
of the same two quantities. In the decision space, this index describes the location of a criterion that divides the decision axis between values that lead to
"yes" and "no" responses. Other measures—relative criterion and likelihood
ratio—are equivalent when sensitivity is unvarying, but not when accuracy
changes across conditions. Criterion location has advantages, both logically
and, in some cases, empirically. Using a criterion to partition the decision axis
is an optimal response strategy. The optimal location of the criterion can be
calculated if the performance goal is specified.
Conditions under which the methods described in this chapter are appropriate are spelled out in Chart 3 of Appendix 3.

Computational Appendix
Derivation of Equation 2.6
The likelihood ratio is the ratio of the values of the S2 and S1 normal likelihood functions at the location x = c. The function φ(x) is defined by Equation 2.5. For both distributions, the standard deviation σ equals 1, and the means are at -d'/2 (S1) and +d'/2 (S2). Thus

ln(β) = ln[φ(c - d'/2)/φ(c + d'/2)] = ½(c + d'/2)² - ½(c - d'/2)² = cd' ,

which is Equation 2.6.

A measure based on da, denoted Az, is appropriate in such circumstances (Swets & Pickett, 1982):

Az = Φ(da/√2) .    (3.8)

In the current examples, Az = .90 and .76. This statistic equals the area under the normal-model ROC curve, which increases from .5 at zero sensitivity to 1.0 for perfect performance.

FIG. 3.5. Nonunit-slope ROC, showing alternative indexes of sensitivity: d'1 (unit is the standard deviation of S1), d'2 (unit is the standard deviation of S2), and d'e (unit is the arithmetic average of the two standard deviations).


Area under the ROC is a good index of sensitivity and can be measured without any model assumptions—the first truly nonparametric measure we have encountered. Pollack and Hsieh (1969) suggested estimating this area in a straightforward way. Using the linear-coordinate ROC, connect the successive (F, H) points and draw vertical lines from each point to the F-axis, creating a series of trapezoids (and one triangle). Each of these figures has an area equal to the difference in the F values times the average H value, and the total area (which Pollack and Hsieh called Ag) is found by summing these areas:

Ag = ½ Σi (Fi+1 - Fi)(Hi+1 + Hi) .    (3.9)

The index i tracks the ROC points so that (F1, H1) equals (0, 0), (F2, H2) is the first point to the right, and the last point is (1, 1).
This measure is best with a large number of responses: The polygon form of the ROC is systematically lower than the "true" ROC, and this difference is greatest for curves with few points. Donaldson and Good (1996) proposed a measure, A'r (r for rating), that increases Ag to approximately compensate for this discrepancy. Of course, if the ROC is consistent with the normal-distribution model, Az exactly compensates, so the nonparametric Ag and A'r measures are most useful when this model does not hold.
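The trapezoidal rule is equally easy to program. A minimal Python sketch (ours; the two rating points are hypothetical) appends the implicit (0, 0) and (1, 1) endpoints before summing:

def area_under_roc(points):
    # Trapezoidal estimate Ag of the area under the ROC (Eq. 3.9).
    # `points` is a list of (F, H) pairs from a rating experiment.
    pts = sorted([(0.0, 0.0)] + list(points) + [(1.0, 1.0)])
    return 0.5 * sum((f2 - f1) * (h2 + h1)
                     for (f1, h1), (f2, h2) in zip(pts, pts[1:]))

print(round(area_under_roc([(0.24, 0.58), (0.50, 0.80)]), 3))  # 0.699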
Estimating Bias
Decision Space for the Rating Experiment
In deriving empirical ROCs, the stimulus-response matrix is partitioned
many different ways, once for each possible rule by which the observer could
have reduced the matrix to a simple yes-no table. Each partition yields a different (false-alarm, hit) pair, as shown in Figs. 3.1 and 3.2, and thus implies a
different criterion. One criterion is enough to generate one ROC point; to produce an ROC curve with n points, the observer must maintain n criteria simultaneously. No paraphernalia beyond those introduced in chapter 2 are needed
to find the locations of these criteria. The rating matrix is reanalyzed as separate yes-no matrixes, and any desired bias measure can then be computed. We
illustrate the calculations involved for our two examples.
Unit-Slope ROCs
First, consider the 7-day delay condition in the Rabin and Cain (1984) odor-recognition experiment. Table 3.7 extends that part of Table 3.4 in which d' was computed for this condition to the bias parameter c.


For each column, c = -½[z(H) + z(F)], just as in two-response experiments. The highest values of the criterion c indicate reluctance to say "old," and therefore correspond to the left-most responses in the table. Figure 3.6 shows the locations of the criteria vis-à-vis the underlying distributions.
The likelihood ratio, another measure of bias included in Table 3.7, can be computed in two ways: either directly from the heights of the underlying densities or from the product of the criterion location and d' (Eq. 2.6). We illustrate both methods for the criterion dividing "old" from "new" responses. At this point, z(H) = 0.207 and z(F) = -0.706; that is, H = .58 and F = .24. Using Table A5.1, we find that the likelihood ratio β is 0.391/0.311 = 1.26 and ln(β) = 0.23. And, according to Table 3.2, d' = 0.913 and c = 0.250, so ln(β) = d'c = 0.23.
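Both routes can be verified by machine. A minimal Python sketch (ours), using the (F, H) pair at the "old"/"new" boundary:

from math import log
from statistics import NormalDist

z = NormalDist().inv_cdf    # z transformation
phi = NormalDist().pdf      # height of the standard normal density

H, F = 0.58, 0.24           # "old"/"new" boundary of Table 3.7

beta = phi(z(H)) / phi(z(F))         # route 1: ratio of density heights
d = z(H) - z(F)                      # route 2: via Equation 2.6
c = -0.5 * (z(H) + z(F))
print(round(log(beta), 2), round(c * d, 2))  # both ~0.23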
TABLE 3.7
Transformed Hit and False-Alarm Rates, d', and Bias for Each Response in the Odor Recognition Experiment (7-Day Delay Condition)

Criterion:       1        2        3        4        5
Old [z(H)]    -1.121   -0.301    0.207    0.649    1.573
New [z(F)]    -2.037   -1.175   -0.706   -0.253    0.527
d'             0.916    0.874    0.913    0.902    1.046
c              1.579    0.738    0.250   -0.198   -1.050
ln(β)          1.446    0.645    0.230   -0.179   -1.098

Note. Columns are the five response criteria, ordered from strictest (left) to most lenient (right); the middle criterion separates "old" from "new" responses.

FIG. 3.6. Decision space and response criteria for the odor-recognition rating
experiment of Example 3a (1-week delay condition).


Notice that the likelihood ratio β decreases as the hit and false-alarm rates increase (moving through the table from left to right). As we saw in chapter 2, the likelihood ratio is the slope of the ROC curve (on linear, not z-score, coordinates), so the slope of the ROC must also continually decrease. Because of variability, however, empirical ROCs do not always have monotonically decreasing slope. One particularly glaring violation of monotonicity occurs when a cell not in an end column of the data matrix contains a 0. A 0 in the upper (S2) row implies that two adjacent points on the ROC fall on the same horizontal line, and a 0 in the lower (S1) row implies two points on the same vertical line. The first yields a likelihood ratio of zero, the second of infinity. Because such values are inconsistent with most models, some experimenters "smooth" them in plotting their data. The simplest method for doing this is to merge any column with a 0 in the S2 row with the column to its left, and any column with a 0 in the S1 row with the column to its right. This procedure selects the more sensitive of two horizontally or vertically paired points, and it eliminates the one displaying less sensitivity.
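The merging rule takes only a few lines of Python (a sketch; the frequencies are hypothetical). The two rows of the rating matrix are ordered from strictest to most lenient response:

def merge_zero_columns(s2_row, s1_row):
    # Merge any interior column with a 0 in the S2 row leftward,
    # and any column with a 0 in the S1 row rightward.
    s2, s1 = list(s2_row), list(s1_row)
    i = 1
    while i < len(s2):
        if s2[i] == 0:
            s2[i - 1] += s2[i]; s1[i - 1] += s1[i]
            del s2[i], s1[i]
        else:
            i += 1
    i = 0
    while i < len(s2) - 1:
        if s1[i] == 0:
            s2[i + 1] += s2[i]; s1[i + 1] += s1[i]
            del s2[i], s1[i]
        else:
            i += 1
    return s2, s1

print(merge_zero_columns([20, 0, 15, 5], [5, 3, 12, 20]))
# ([20, 15, 5], [8, 12, 20]): the offending column is absorbed leftward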
Nonunit-Slope ROCs
If the slope of the ROC on z coordinates does not equal 1, a decision must be made about the unit in which c is to be measured. Let us first consider using the standard deviation of the S2 distribution in this role, calling the resulting criterion location index c2. Figure 3.7 shows an unequal-variance decision space; the S1 distribution has standard deviation s and the S2 distribution standard deviation 1. We wish to calculate the criterion location c2 relative to the zero-bias point. As we saw in chapter 1, the z coordinate of each "yes" rate is a difference between a distribution mean and the criterion location expressed in standard deviation units:

z(H) = M2 - k
z(F) = (M1 - k)/s ,    (3.10)

where M1 and M2 are the distribution means and k is the location of the criterion. The analysis differs from that of chapter 1 only because the standard deviation of the S1 distribution is not 1. Combining Equations 3.10 leads to²

c2 = -[s/(1 + s)][z(H) + z(F)] .    (3.11)

² See Computational Appendix for derivation.


FIG. 3.7. Unequal-variance decision space portraying criterion location c2
(measured in terms of the S2 standard deviation).

Three other possible bias statistics employ the standard deviation of the S1 distribution, the rms standard deviation, and the average standard deviation, and can be calculated from

c1 = c2/s    (3.12)
ca = c2/√[(1 + s²)/2]    (3.13)
ce = 2c2/(1 + s) .    (3.14)

Values for all four measures in the low-frequency condition of the word-recognition experiment (Example 3b) are given in Table 3.8. Because the measures differ only in unit, they are related to each other by multiplicative constants. The isobias curves of all measures are the same, and the same as that for c (Fig. 2.6). When s = 1, all the indexes are equal to each other and to c.
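In code, the four measures differ only by a scale factor. A minimal Python sketch (ours; the slope s = 0.55 is the value implied by the ratios in Table 3.8):

from statistics import NormalDist

z = NormalDist().inv_cdf

def criterion_measures(H, F, s):
    # c2 from Equation 3.11; c1, ca, and ce rescale it into S1,
    # rms, and average standard-deviation units (Eqs. 3.12-3.14).
    c2 = -(s / (1 + s)) * (z(H) + z(F))
    return c2 / s, c2, c2 / ((1 + s ** 2) / 2) ** 0.5, 2 * c2 / (1 + s)

# First column of Table 3.8: H = .610 and F = .020, so that
# z(H) = 0.279 and z(F) = -2.054.
print([round(v, 3) for v in criterion_measures(0.610, 0.020, 0.55)])
# approximately [1.145, 0.630, 0.780, 0.813], matching the tabled values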
Two other classes of bias measures were considered in chapter 2: relative distances and likelihood ratio. In the unequal-variance case, relative distances (c') can be computed by combining the criterion values of Equations 3.12 to 3.14 with the corresponding sensitivity values.

TABLE 3.8
Transformed Hit and False-Alarm Rates and Bias Measures for the Word-Recognition Experiment (Low-Frequency Condition)

Criterion:      "1"      "2"      "3"      "4"
Old [z(H)]     0.279    0.706    1.341    1.751
New [z(F)]    -2.054   -1.282   -0.075    0.524
z(H) + z(F)   -1.775   -0.576    1.266    2.275
c1             1.145    0.372   -0.817   -1.468
c2             0.630    0.204   -0.449   -0.807
ca             0.780    0.253   -0.557   -1.000
ce             0.813    0.264   -0.580   -1.042

Likelihood ratio can be calculated, as in the previous example, by finding the heights of the S1 and S2 densities that correspond to H and F in Table A5.1. However, the simple relation between likelihood ratio and c (Eq. 2.6) no longer applies. In fact, the use of likelihood ratio in describing nonunit-slope ROCs presents a difficulty. Two normal densities that differ in both mean and variance intersect each other at two points (Luce, 1963a), as shown in the first panel of Fig. 3.8. Because each intersection point reflects a likelihood ratio of 1, the decision axis can no longer be monotonic with likelihood ratio. Indeed, if the observer uses a cutpoint decision rule on this axis, H will be less than F near one corner of the ROC (Fig. 3.8b).
Although ROCs with nonunit slope are common, we are not aware of any data for which H is systematically less than F. There are two possible reasons for this nonphenomenon. First, the reversal occurs at extreme points in ROC space. If d'1 = 2 and s = 0.5 (as in the figure), the reversal occurs at H = F = .995, an ROC point rarely encountered in application. Even if d'1 = 0.5, the critical point is H = F = .98. The small magnitude of the potential reversals makes them hard to distinguish from chance (H = F) performance.
Second, no observer is forced to use the cutpoint response rule, which is not ideal in this situation. The optimal decision maker establishes two cutpoints, one at each location having the critical value of likelihood ratio, and responds "yes" for ratios greater than that value. Such observations are either above the upper cutpoint or below the lower one. The third panel of Fig. 3.8 illustrates this rule for a likelihood ratio of 1.0. The corresponding ROC, portrayed in the second panel, differs most from the cutpoint rule in the upper corner, where the likelihood ratio rule does not produce below-chance performance.


FIG. 3.8. Decision space and possible ROCs for underlying distributions of unequal variance. (a) Normal distributions with standard deviations 1 and 2. The distributions intersect at two points. (b) Two possible ROCs. The cutpoint rule yields a linear ROC (s = 0.5) and leads to below-chance performance at low criteria; the likelihood ratio rule has a nonlinear shape. (c) Decision space of panel (a) showing the two cutpoints required by a likelihood ratio response rule, here set at β = 1.

Systematic Parameter Estimation and Calculational Methods

Statistical methods are used to find the ROC curve of a given shape that
best fits the data. Fitting curves to points is a common procedure in behavioral research, but the ROC presents a peculiar problem. Whereas most experimental data plots have a dependent variable (something measured) on
the ordinate and an independent variable (something varied by the experimenter) on the abscissa, in an ROC both axes are dependent variables. For
a linear fit when only one dimension is free to vary, one aims to minimize
the total discrepancy between the data points and the line on that dimension. This tactic is inappropriate for ROC data (see Appendix 1 for more
discussion).
A solution to this difficulty is provided by the statistical curve-fitting procedure of maximum-likelihood estimation (discussed in chap. 13 and in Appendix 1). A program that uses this method to calculate ROCs was developed by Dorfman and Alf (1969); a modified version, called ROCKIT, and several extensions have been made available by Metz and his colleagues (e.g., Metz & Kronman, 1980). These programs are available online.³ The output gives the values of d'2 (called a), s (called b), and Az.
Statistical packages with detection theory modules can also be used for both the normal model and a variety of other distributional assumptions (some of which we discuss in chap. 4). To use the signal detection module in Systat, first enter the data into three columns: one for the stimulus (0 or 1), which we call SIGNAL; the second for the response (any integers in the range -6 to +6), which we call RATING; and the third for the frequency of that stimulus-response combination, which we call COUNT. Give this data set a name, say PROBLEM1, and then issue the following instructions:

USE PROBLEM1
SIGNAL
MODEL RATING=SIGNAL
FREQ=COUNT
ESTIMATE

The output gives the values of d'2 (confusingly, this is called d'), da, 1/s, Az, and the bias measures β and ln(β). The ROC is plotted, and its goodness of fit is measured by chi-square. SPSS provides a similar module.
³ The Web site is http://www.radiology.uchicago.edu.


Alternative Ways to Generate ROCs
There is more than one way to gather ROC data. Although the rating method
is the most efficient, it does not even have historical priority as a procedure
(Tanner & Swets, 1954). All other methods use the same stimulus alternatives under several different experimental conditions at different times. Under the different conditions, observers are encouraged in one way or another
to change their willingness to say "yes"; we expect any change in such willingness to change both the hit and false-alarm rates, but not sensitivity.
Monetary Rewards (Payoffs)
Experimenters may reward observers, trial by trial, for their performance.
Use of rewards mimics some real-life discrimination situations. An automotive quality-control inspector should perceive the cost of failing to detect
faulty work to be large relative to the cost of a false alarm. On the other hand
we may hope that those who can start a war in response to intelligence information are made cautious by the very high cost of a false alarm.
In a simple yes-no experiment with two possible stimuli and two responses, there are four values to manipulate: the amounts paid (or debited)
by the experimenter for hits, misses, false alarms, and correct rejections.
Different runs use the same stimuli, but a different set of financial rewards
or payoff matrix. Each set of payoffs produces a separate 2 x 2 data matrix,
and the set of (F, H) pairs defines an ROC. The optimal value of the criterion
under each payoff can be calculated from Equation 2.8.
Verbal Instructions
Explicit financial incentives can often be effectively replaced by verbal instructions. Participants are urged during some experimental runs to be lax in
reporting, for instance, that a stimulus is Old, whereas during other runs they
are urged to be strict. Well-trained participants seem able to understand these
instructions—perhaps a bit of support for the notions of SDT in itself—and
can also use "neutral" as a criterion, as well as degrees of strictness or laxness.
This procedure is just as time-consuming as paying money because each verbal criterion must be set in a separate session to establish an ROC.
Exactly what terms can be used, either in separate sessions, or in rating
experiments, is not always obvious. In recognition memory research, for
example, participants are sometimes asked to distinguish items they can remember encountering in the experiment from those they know were presented even though the specific episode is not available. The original


motivation for such experiments (Tulving, 1985) was to tap distinct explicit
versus implicit memorial processes, but participants may also treat "remember" and "know" as different levels of confidence along the same dimension. Donaldson (1996) proposed the latter interpretation, supporting it
with an analysis in which sensitivity is calculated separately for the two putative levels of confidence represented by these responses. More recently,
Rotello, Macmillan, and Reeder (2004) argued that "remember" and
"know" responses reflect multiple sensitivities as well as different response
rules. The key point is that not every manipulation that affects hit and
false-alarm rates generates a true ROC; whether a particular set of points in
ROC space reveals isosensitivity is a substantive theoretical question.
Manipulating Presentation Probability
Another way to alter the willingness of people to say "yes" as opposed to
"no" is to change the relative likelihood of presenting the two stimuli. Viewers who are aware of the presentation probabilities are more willing to report the more likely stimulus. Experimenters can change the presentation
probability of one stimulus from session to session and keep separate records for each probability condition. This strategy is even more tedious than
the last two, especially when one of the stimuli is very unlikely, because estimating a hit or false-alarm rate requires many runs simply to get a sufficiently large sample of trials. However, some intrepid souls have collected
ROC data this way (Creelman, 1965).
Use of presentation probability to trace ROCs encounters two other
problems. First, if feedback is not used, so that the observers are unaware
of the a priori probabilities, decreasing the probability of presenting S2
may actually increase the number of "yes" responses (T. Tanner, Haller, &
Atkinson, 1967; T. Tanner, Rauk, & Atkinson, 1970). One interpretation
of this result is that participants tend to believe the presentation probabilities to be equal (similar effects in identification experiments are discussed
in chap. 5). The last difficulty with changing presentation probabilities is
that doing so may influence sensitivity as well as bias (Markowitz &
Swets, 1967; see also Dusoir, 1983). In a detection task, the higher the proportion of S2 (Signal) trials, the better the observer will be able to remember the Signal. Balakrishnan (1999) also found that changes in
presentation probability—and even changes in payoffs—can affect sensitivity. These difficulties, and the tedium of collecting enough data at low
presentation probabilities, generally make this strategy for collecting
ROCs unattractive.


Another Kind of ROC: Type 2
An empirical ROC curve plots ratings conditional on one stimulus class
against ratings conditional on another, but an analogous curve can be constructed from ratings conditional on the correctness of responses. In the
odor-recognition task (Example 3a), observers first labeled stimuli as old or
not and then expressed their confidence in their answers. A Type-2 ROC
curve relates their confidence judgments on correct trials to confidence
judgments on incorrect trials. Type-2 ROCs, first analyzed by Clarke,
Birdsall, and Tanner (1959), provide a perspective on the decision space different from that of their Type-1 siblings, as shown in Fig. 3.9.
Clarke et al. (1959) supposed that the observer's initial response was
based on an unbiased criterion placement (c = 0) and that the later confidence judgments would be high for extreme observations in either direction. Only two levels of confidence are shown in the figure: The observer
reports "sure" for observations that are either above a positive cutoff k or below the negative cutoff -k, and "unsure" otherwise. Thus, the initial response depends on a binary partition of the decision axis, and the later
confidence rating relies on a multiple partition applied to the absolute value.
The hit and false-alarm rates for a Type-2 ROC are the proportions of rating responses up to a particular level of confidence given truly correct and incorrect initial responses.

FIG. 3.9. Decision space for the Type-2 rating experiment: The observer first responds "yes" or "no," then provides a confidence rating.


Because the initial response is assumed to be symmetrically determined, these response rates can be calculated by considering only one of the two stimuli, say S2. A conditional probability equals the probability of a joint event (say rating a correct response as correct) divided by the probability of a marginal one (an initial correct response). Thus,

P(rating correct | initial correct) = Φ(d'/2 - k)/Φ(d'/2)
P(rating correct | initial incorrect) = Φ(-d'/2 - k)/Φ(-d'/2) .    (3.15)

Equations 3.15 contain only two unknowns (d' and k) and can be solved by iteration. The apparent sensitivity index of the Type-2 ROC (i.e., the intercept of the ROC on z axes) is less than d'. The slope is less than the assumed Type-1 slope of 1.0.
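Because Equations 3.15 have no convenient closed-form inverse, even a crude grid search suffices to illustrate the iteration. A minimal Python sketch (ours, not Clarke et al.'s procedure) generates a pair of Type-2 rates and then recovers the parameters:

from statistics import NormalDist

Phi = NormalDist().cdf

def type2_rates(d, k):
    # Equations 3.15: confidence rates given correct and incorrect
    # initial responses, for criterion c = 0 and confidence cutoffs +/- k.
    return Phi(d / 2 - k) / Phi(d / 2), Phi(-d / 2 - k) / Phi(-d / 2)

def fit_type2(h2_obs, f2_obs):
    # Brute-force search over a grid of (d', k) values.
    best = (float("inf"), 0.0, 0.0)
    for i in range(1, 500):
        for j in range(1, 500):
            d, k = i * 0.01, j * 0.01
            h2, f2 = type2_rates(d, k)
            err = (h2 - h2_obs) ** 2 + (f2 - f2_obs) ** 2
            if err < best[0]:
                best = (err, d, k)
    return best[1], best[2]

h2, f2 = type2_rates(1.5, 0.8)
print(fit_type2(h2, f2))  # recovers approximately (1.5, 0.8)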
Although the analysis of Clarke et al. (1959) is almost as old as the standard Type-1 method, it is little used. This is unfortunate, because many experiments that are unsuitable for Type 1 are susceptible in principle to
Type-2 description. Consider, for example, the traditional recall task: A participant hears a series of words and later lists as many as possible. With each
item recalled, the participant provides a rating of confidence that it was in
fact on the original list. A Type-2 ROC clearly can be constructed from data
of this sort, but because there is only one stimulus class (the words on the
original list), no Type-1 curve is possible. To our knowledge, no one has
tested the usefulness of this approach to recall.
Essay: Are ROCs Necessary?
When an empirical ROC has unit slope, any point on the curve provides the
same estimate of sensitivity. Rating responses are in this case unnecessary if
we are only interested in sensitivity: The entire ROC can be inferred from


one point, and a yes-no experiment suffices to find that point. Indeed, not
every researcher employing detection theory collects ROCs, a procedure
that may appear to introduce unnecessary tedium and complexity into experiments. Are ROCs worth the effort?
It is even possible that the rating procedure distorts "true" yes-no behavior. Perhaps maintaining several criteria is a more taxing cognitive chore
than setting only one (Wickelgren, 1968). Most existing data are reassuring:
Early in the history of detection theory, Egan and his colleagues (Egan &
Clarke, 1956; Egan, Schulman, & Greenberg, 1959) obtained equivalent
measures of detectability with and without ratings in an auditory task.
The difficulty in not collecting ROCs is, of course, that if ROC slopes are not equal to 1, then comparisons of observed sensitivity estimates may be misleading. Consider the two points shown in Fig. 3.10, (F = .31, H = .83) and (F = .62, H = .96). For both, d' = z(H) - z(F) = 1.45. Yet if the true ROCs have slopes of, say, 0.5, the second point reflects much greater sensitivity than the first.
Collection of full ROCs could be avoided, even if slopes did not equal 1, if slopes were known a priori. Several theorists have offered models in which slope is systematically related to sensitivity. Green and Swets (1966, chap. 4) proposed that the slope s = 4/(4 + d'1), so that ROCs reflecting low sensitivity have slopes near 1 and those measuring good performance have increasingly shallow slopes.

FIG. 3.10. Two points [(.62, .96) and (.31, .83)] that lie on different nonunit-slope ROCs but the same unit-slope ROC. (The axes are scaled in z-score units.)

Other families of ROCs that show a negative correlation between sensitivity and slope are those that take the underlying distributions to be (a) chi-square, for which the mean equals half the variance; (b) Poisson, for which the mean equals the variance; or (c) exponential, for which the mean equals the standard deviation. This last possibility generates ROCs of a simple type, power functions of the form H = F^n, where n is a number between 0 and 1. Egan (1975) provided a thorough description of each of these ROC families, and Laming (1986) provided a theoretical rationale for various ROC slopes and shapes.
The key question is, of course, whether the slope and sensitivity of ROCs
(equivalently, the mean and variance of the underlying distributions) are actually related in a predictable pattern. Early enthusiasm for this idea was
based on psychoacoustic models of the ideal observer in which sensitivity
was limited by statistical characteristics of the stimuli (see chap. 12). More
recently, Ratcliff, Sheu, and Gronlund (1992) pointed out that different theories of recognition memory make distinct predictions about ROC shape,
and the ROC is a popular tool for testing models in that field.
There are several psychophysical models, not tied to specific stimulus
sets, that attempt to account for ROC slopes. Graham, Kramer, and Yager
(1987) have shown that if detection of a known signal leads to a unit-slope
ROC, then detection of a signal whose characteristics are unknown (see
chap. 8) leads to a curve with a shallower slope. The stimulus-based model
of Laming (1986) predicts that discrimination ROCs should have unit
slopes and detection ROCs shallow ones.
But in many fields in which ROCs have been collected, no theories exist for predicting their shape. In a survey of a wide range of content areas, Swets (1986b) concluded that the slopes of empirical ROCs vary from about 0.5 to 2.0, and that they are not predictable from sensitivity or any stimulus characteristic. A similar conclusion is reached in Swets and Pickett's (1982) survey of detection theory applications to diagnostic systems in medicine and elsewhere. This finding leads directly to their recommendation that ROCs should always be collected.
Meanwhile, the user of detection theory who does not collect ratings is at risk. For purposes of comparing two points in ROC space, the risk is least in some important special cases: (a) If two points have the same value of F but different values of H (or vice versa), there is no question which represents the greater sensitivity, and (b) two points with the same bias can always be compared. We found in chapter 2 that "the same bias" is an ambiguous phrase, but there is less doubt about what "neutral bias" means: H = 1 - F. Thus, if bias is minimal, ROCs are minimally necessary. Conditions under which bias is neutral are not easy to specify either, but experience may be an adequate guide.


Summary
In a single session of a one-interval experiment we can collect data that can
be interpreted as multiple (false-alarm, hit) pairs. This is accomplished by
asking observers to provide a graded rather than a binary response, rating
their experience on an ordered scale. The result is an empirical ROC curve.
The data are interpreted as if the observer maintained several response
criteria simultaneously. Sensitivity can be estimated separately for each criterion. If the empirical ROC has unit slope on z coordinates (so that the variances of the underlying distributions are equal), the sensitivity measure will
be the same at all criteria. If the slope of the ROC does not equal 1, apparent
sensitivity changes along the decision axis; the slope can be interpreted as
the ratio of the standard deviations of the underlying distributions. Sensitivity can be measured in units of either standard deviation or, most commonly,
some sort of average.
Response criteria can be estimated as in the yes-no design except that
multiple criteria are now found. When variances are unequal, the criterion
location c can be measured in any of the units used for sensitivity.
Alternative ways to get multiple points on an ROC are to conduct separate sessions with different a priori probabilities or apply different payoffs
and penalties for the various outcomes.
Conditions under which the methods described in this chapter are appropriate are spelled out in Chart 4 of Appendix 3.
Computational Appendix
Derivation of Equation 3.11
Combining Equations 3.10 yields

-[z(H) + z(F)] = (1 + 1/s)k - [(1/s)M1 + M2] .    (3.16)

The point of equal bias, where the criterion must equal zero, occurs where z(H) = -z(F). In this case, the left side of Equation 3.16 equals zero, and the right side equals -[(1/s)M1 + M2]. Thus, M1 has the opposite sign from M2 and is s times as far from zero, and -[(1/s)M1 + M2] always equals zero. The last term in Equation 3.16 can therefore be dropped, leading to Equation 3.11.

Calculation of the Point Where H = F in the Unequal-Variance ROC

In the example, the ROC curve has a slope of 0.5 so that the S2 distribution has a standard deviation twice that of S1. Using Equation 2.5, normal densities with means of 0 and d'1 and standard deviations of 1 and 2 can be written as

φ1(x) = [1/√(2π)] exp(-x²/2)
φ2(x) = [1/(2√(2π))] exp[-(x - d'1)²/8] .    (3.17)

Setting φ1 = φ2 yields a quadratic equation, which we can solve for the (two) values at which the S1 and S2 curves cross. In units of the S1 distribution, the intersections are at

x = [-d'1 ± 2√((d'1)² + 6 ln 2)]/3 ,

the negative solution being the point below which H < F.

The state diagram of Fig. 4.1a shows the two internal states, D1 and D2, and it specifies the possible paths from stimulus to internal state and from state to response, together with the probabilities of each path. The adjusted hit rate q is the probability that S2 leads to the
D2 state; if observers could be relied on to report their internal states accurately, q would equal the hit rate H, and the false-alarm rate F would equal zero.

FIG. 4.1. (a) State diagram for single high-threshold theory. Stimuli in class S2 lead to state D2 with probability q; "yes" responses (guesses) are made from state D1 with probability u. (b) ROCs implied by high-threshold theory, on linear coordinates, for three values of q. Changing u maps out the ROC. (c) ROCs in z coordinates. Panels (a) and (b) adapted from Macmillan and Creelman (1990) by permission of the publisher. Copyright 1990 by the American Psychological Association.


Instead observers respond "yes" on some occasions even when in state D1; these contaminating guesses occur with probability u and make the correction recommended by Equation 4.1 necessary.
The dependence of H and F on the adjusted hit rate and the guessing rate can be calculated directly from the state diagram. The probability of each path through the diagram is the product of the probabilities of the segments, and the total probability of a response given a stimulus is the sum of the probabilities of the possible paths. Thus:

H = P("yes"|S2) = q + u(1 - q)
F = P("yes"|S1) = u .    (4.2)

Eliminating the guessing parameter u from these equations leads back to Equation 4.1.
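In code, the state diagram amounts to two one-line functions (a minimal Python sketch; the parameter values are hypothetical):

def ht_rates(q, u):
    # Equations 4.2: H and F for single high-threshold theory.
    return q + u * (1 - q), u

def adjusted_hit_rate(H, F):
    # Equation 4.1, the correction for guessing: recover q from H and F.
    return (H - F) / (1 - F)

H, F = ht_rates(q=0.5, u=0.2)
print(H, F)                               # 0.6 0.2
print(round(adjusted_hit_rate(H, F), 3))  # 0.5: the guesses are removed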
The ROC implied by q is obtained by solving Equation 4.1 for H in terms of F. As shown in Fig. 4.1b, it is a straight line from (0, q) to (1, 1). Unlike the isosensitivity curves of SDT and Choice Theory, it is nonregular: A false-alarm rate of zero can be obtained with a nonzero hit rate. On z coordinates, the ROC is not straight, but strongly concave upward.
How can we construct continuous underlying distributions that are consistent with the single high-threshold ROC? To allow for the point (0, q), there must be a region on the decision axis where events only occur due to S2—otherwise the false-alarm rate would not be 0. Such a region is drawn on the right side of Fig. 4.2a. In the rest of the decision space, corresponding to the ROC segment from (0, q) to (1, 1), the S1 and S2 distributions could have any shape. They must be proportional to each other, however, because the ratio of their heights—the likelihood ratio—is constant when the ROC has constant slope. Thus, the decision space is divided into two regions, one with a likelihood ratio that is some constant less than 1, the other with a likelihood ratio of infinity. The boundary between these two areas is the "threshold," the decision-axis value above which only S2 events occur.
Figure 4.2b eliminates some unnecessary complexity by representing the underlying distributions in a simple rectangular form. A changing value of the parameter u (the proportion of D1 trials on which the observer responds "yes") is modeled in this decision space by a shift in the criterion, but not by a change in the value of likelihood ratio. The criterion can be sensibly located only on the below-threshold segment of the decision axis (a higher location reduces the hit rate without any compensating reduction in F, and it corresponds to a point along the vertical ROC axis below the intercept).


FIG. 4.2. Two representations of a decision space for single high-threshold theory
consistent with the ROCs of Fig. 4.1: (a) arbitrary distributions, and (b) rectangular
distributions. Panel (b) adapted from Macmillan and Creelman (1990) by permission of the publisher. Copyright 1990 by the American Psychological Association.

This model is traditionally termed high threshold because of the asymmetry between hits and false alarms. The threshold—the dividing line between the internal states—is "high" because S1 stimuli cannot hurdle it, although S2 stimuli can. In the original application of this model to detection experiments, the model captured the (now discredited) intuition that background noise could never lead to a "true detection," so that errors on noise trials arose only from guessing.
Bias Measures F and u
Because the observer can control response bias only by changing the guessing rate, u is the natural bias index for single high-threshold theory. Because u equals F (Eq. 4.2), the false-alarm rate itself is the model's bias statistic. In terms of underlying rectangular distributions (Fig. 4.2b), u (and F) measures the location of the response criterion relative to the upper end of the S1 distribution.


Its association with the single high-threshold model is one count against
F as a bias index. A more serious charge is its failure to depend at all (much
less monotonically, as we have been requiring) on the hit rate H.
Low-Threshold Theory
In low-threshold theory (Luce, 1963b), asymmetric treatment of hits and false alarms is abandoned. To compare the two theories, consider the low-threshold state diagram in Fig. 4.3. As before, there are two internal states, but now S2 as well as S1 can lead to either state. There are two "sensitivity" parameters: q2, the probability that S2 leads to state D2, the "true" hit rate; and q1, the probability that S1 leads to state D2, the "true" false-alarm rate.

FIG. 4.3. State diagram for low-threshold theory. "Yes" responses are made with probability u from state D1 and with probability t from state D2.


The state diagram leads directly to expressions for the hit and false-alarm rates:

H = q2 + u(1 - q2)    F = q1 + u(1 - q1)    (upper limb)
H = tq2    F = tq1    (lower limb)    (4.3)

The bias parameters t and u vary from 0 to 1.
The ROC for low-threshold theory is shown in Fig. 4.4. On linear coordinates, it consists of two straight lines, or "limbs," of different slopes, meeting at the point (q1, q2). The lower limb arises from the more conservative response rule (varying t), the upper limb from the more lax rule (varying u). The theory predicts regular ROCs that are only moderately nonlinear in z coordinates. Despite the tell-tale "corner" predicted by low-threshold theory, it has been experimentally difficult to distinguish this theory from normal-distribution detection theory.

FIG. 4.4. (a) ROC implied by low-threshold theory, in linear coordinates. Changing u maps out the upper limb and changing t the lower limb. (b) Same ROC in z coordinates.

To find continuous underlying distributions corresponding to the two-limbed ROC, we follow the same logic as for high-threshold theory. Because the ROC has only two slopes, there are two possible values of likelihood ratio. In each state, however, the likelihood ratio is finite, so each of the two distributions takes on two different heights, as shown in Fig. 4.5. The criterion can be located in either state, depending on whether the observer uses an upper or a lower limb response strategy.
Low-threshold theory retains the appealing intuitions of high-threshold theory, but avoids the unpalatable nonregularity prediction. Its primary disadvantage is its lack of a single sensitivity measure that can be calculated from one (F, H) pair. Despite this drawback, the theory has been of substantive interest as a model of auditory detection and, before Luce described it, of categorical perception in speech (Liberman, Harris, Hoffman, & Griffith, 1957).
Double High-Threshold Theory
Double high-threshold theory is most often encountered not as a proposal
about a discrete underlying process, but indirectly via its sensitivity parameter: This theory justifies the use of proportion correct to measure performance. It was first explicitly proposed by Egan (1958; summarized in
Green & Swets, 1966, pp. 337-341).

FIG. 4.5. Decision space consistent with the low-threshold ROC of Fig. 4.4,
using rectilinear distributions.


The Sensitivity Measure p(c)
In chapter 1, we contrasted p(c) with d' as a measure of performance. In general, p(c) is found by averaging H and 1 - F using presentation probabilities as weights:

p(c) = p(S2)H + p(S1)(1 - F) = p(S1) + p(S2)H - p(S1)F ,    (4.4)

where p(Si) is the probability that Si is presented. Proportion correct equals a constant plus the difference between weighted hit and false-alarm rates, with different weights (multiplicative constants) applied to each. When the number of trials for each type of stimulus is equal, the weights are the same and proportion correct only depends on the difference between H and F:

p(c) = ½(1 + H - F) .    (4.5)

Early on, Woodworth (1938) suggested H - F as a performance measure for recognition memory experiments.
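A two-line Python sketch (ours) makes the weighting explicit:

def proportion_correct(H, F, p_s2=0.5):
    # Equation 4.4: presentation-weighted average of H and 1 - F.
    return p_s2 * H + (1 - p_s2) * (1 - F)

print(round(proportion_correct(0.8, 0.4), 3))        # 0.7 (Eq. 4.5 case)
print(round(proportion_correct(0.8, 0.4, 0.75), 3))  # 0.75, unequal weights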
Underlying Rectangular Distributions
Like all sensitivity measures, p(c) implies a decision theory: To use p(c) to summarize performance is to say that when bias is manipulated, p(c) should remain constant. The state diagram of the underlying model is shown in Fig. 4.6a. There are three discrete states: D1 arises only when S1 occurs, D2 can be triggered only by S2, and an intermediate state D? can occur for either stimulus. The model specifies two "high" thresholds, each of which can be crossed by only one of the two stimuli. The special case in which the D1 state is omitted is equivalent to single high-threshold theory.
As with both high- and low-threshold theories, the sensitivity parameter in this model is a "true" detection rate. The proportion of S2 presentations leading to the D2 state equals the proportion of S1 presentations leading to the D1 state; both equal 2p(c) - 1. If p(c) equals .8, for example, the proportion of trials falling in the "sure" D1 and D2 states is .6. Other trials lead to the uncertain state D?, where they are assigned "yes" and "no" responses according to the observer's response bias v.
The ROCs for p(c) were shown in chapter 1 to be straight lines with unit slope when plotted on probability coordinates. Like the ROCs for single high-threshold theory, they are curved on z coordinates.


FIG. 4.6. (a) State diagram implied by double high-threshold theory. Stimuli in class Sj lead to state Dj with probability q and to state D? with probability 1 - q. The uncertain state leads to a "yes" response with probability v. (b) Underlying rectangular distributions consistent with double high-threshold theory. The criterion can be located anywhere in the D? region. Adapted from Macmillan and Creelman (1990) by permission of the publisher. Copyright 1990 by the American Psychological Association.

Underlying distributions consistent with double high-threshold theory are shown in Fig. 4.6b; as the state diagram (Fig. 4.6a) shows, S1 presentations can lead either to D1 or D?, S2 presentations to either D2 or D?. There are three values of likelihood ratio—zero, infinity, and one value between. The use of proportion correct makes very strong assumptions about the internal representation of stimuli.
For sensory detection experiments, these assumptions are not very plausible, but some memory studies have produced linear ROCs. Yonelinas
(1997) conducted an associative recognition experiment: Participants were
presented with pairs of words in both the study and test phases; the question


was whether the test pairs had occurred together in the study phase. The ROC data, presented in Fig. 4.7, are clearly linear and consistent with double high-threshold theory. Notice that the ROCs do not have slope 1; instead they are consistent with a representation in which S2 presentations are detected as Old at a different rate than S1 presentations are detected as New. In the state diagram of Fig. 4.6a, the parameter q is replaced by separate parameters q1 and q2; in Fig. 4.6b, there are still three values of likelihood ratio, but the intermediate value is not 1.
What accounts for ROC data of this sort? Yonelinas argued that decisions
in associative recognition cannot be based on familiarity because familiar
words may not have occurred together in the study phase. Instead participants must "recollect" the specific episode in which they last encountered
the pair, and recollection is a threshold process. The two limbs of the state
diagram reflect different types of recollection: A pair may be recollected as
Old, or the participant may recollect that one of the two words had a differ-

FIG. 4.7. ROCs for recognition memory from Yonelinas
(1997). (a) Item (single-word)
recognition, and (b) associative (word-pair) recognition.
Adapted with permission.


ent partner in the study phase. The accuracy of these two strategies may differ, accounting for the different intercepts of the ROC.
Bias Measures
Two bias measures that appear to make no distributional assumptions are actually consistent with the double high-threshold model. These are the yes rate, ½(H + F), and the error ratio, (1 - H)/F.
Yes Rate. To see the connection between the yes rate and the double high-threshold model, consider again the model's decision space, shown in Fig. 4.8. The center of the region of overlap is set to zero, and the criterion k is measured with respect to this origin. Then H = p(c) - k and F = 1 - p(c) - k; solving these equations yields

k = ½[1 - (H + F)] .    (4.6)

The criterion is thus a simple linear transformation of the yes rate; like c in detection theory, the yes rate reflects the location of the criterion relative to the halfway point between the S1 and S2 distributions.
The relation between k and p(c) is suggested by the similarity between Equation 4.6 and the corresponding expression for sensitivity when an unweighted average of H and F is used (Eq. 4.5). The false-alarm and hit rates are added in Equation 4.6 and subtracted in Equation 4.5, and the same

FIG. 4.8. Decision space for the double high-threshold model as in Fig. 4.6b.
Shaded area is the false-alarm rate, diagonal area is the hit rate. The criterion k is
monotonic with the overall yes rate. Adapted from Macmillan and Creelman
(1990) by permission of the publisher. Copyright 1990 by the American Psychological Association.


transformation is applied to the result. We encountered a similar relation for
detection theory models (e.g., compare Eqs. 1.5 and 2.1).
Error Ratio. Like c, the yes rate measures the same distance along the decision axis whether the sensitivity measure is large or small. If we linearly transform k into a new variable k' that varies from 0 to 1, no matter what p(c) is, we obtain

k' = (1 - H)/[(1 - H) + F] ,    (4.7)

that is, something that only depends on the error ratio. The parameter k' is, in fact, equal to 1 - v (see Fig. 4.6) and is therefore equivalent as a bias measure to v, which was proposed as a bias index by Snodgrass and Corwin (1988).
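Both indexes, together with the equivalent guessing parameter v, follow directly from an (H, F) pair, as this minimal Python sketch (ours; the rates are hypothetical) shows:

def dht_bias(H, F):
    # k from Equation 4.6, k' from Equation 4.7, and v = 1 - k'.
    k = 0.5 * (1 - (H + F))
    k_prime = (1 - H) / ((1 - H) + F)
    return k, k_prime, 1 - k_prime

print([round(x, 3) for x in dht_bias(0.8, 0.4)])
# [-0.1, 0.333, 0.667]: a "yes" rate above .5 gives a negative (lax) k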
Comparison of Indexes. Isobias curves for the yes rate and error ratio are shown in Fig. 4.9. As might be expected from their decision-space interpretation, the two indexes share attributes with analogous detection theory statistics. Curves for the yes rate, like curves for c, are parallel, but on linear rather than z coordinates. Curves for the error ratio, like detection theory curves for relative criterion location, converge at a point (H = F = 1 and H = F = .5, respectively).
What about the likelihood ratio? As noted earlier, there are only three different values of likelihood ratio in the proportion correct model. Variation
of criterion within the overlap region does not change likelihood ratio,
which is therefore of little use as a bias statistic for this (or any other)
threshold theory.
Evaluating the yes rate and the error ratio as measures of bias is more difficult than passing judgment on threshold sensitivity indexes. A long history of collecting empirical ROCs (Green & Swets, 1966; Swets, 1986b)
has suggested limits on the shape of implied ROCs, whereas the much
shorter history of collecting empirical isobias curves has been inconclusive.
By some theoretical standards, however (Macmillan & Creelman, 1990),
the two measures fare well. Both change in the same direction with increases in H and F, behave well when sensitivity is at chance, and are undistorted if computed by averaging across participants or conditions. Because
its isobias curves are parallel rather than divergent, the yes rate is independent of p(c) and acts sensibly when sensitivity is below chance; the error ratio
only approximately meets these desiderata.


FIG. 4.9. Isobias curves for (a) the
yes rate, and (b) the error ratio.
Adapted from Macmillan and
Creelman (1990) by permission of the
publisher. Copyright 1990 by the
American Psychological Association.

There is an argument for preferring c over the yes rate: When sensitivity and bias indexes are both reported, they should derive from the same model. Although there is little to choose between detection theory and double high-threshold bias measures, the sensitivity statistic of detection theory is superior.
Choice Theory
Luce (1959) conjectured that the odds of choosing one stimulus over a second are unaffected by other possible stimuli, and this choice axiom is the basis for the structure of Choice Theory. Although this starting point does not
sound related to the principles of detection in noise that led to the models of
chapters 1 to 3, we shall see that the two theories are formally very similar.
The idea of a decision continuum, and the form of underlying distributions,
can be derived from the choice axiom. Choice Theory predictions look
much like those from the normal-distribution model in simple detection


tasks and are sometimes easier to generate for more complex experiments.
Because Choice Theory is a close cousin of signal detection theory in many
applications, from now on we include it under the phrase "detection theory."
We continue to use the abbreviation "SDT" to refer to normal distribution
models.
Sensitivity Measures
In Choice Theory (Luce, 1959), the sensitivity measure α is found by

α = {[H(1 - F)]/[F(1 - H)]}^½ .    (4.8)

In chapter 1, we noted that the sensitivity measures d' and p(c) amounted to differences between transformed values of H and F. Choice Theory also has such an index, obtained by taking the logarithm of α (and thus equivalent to it):¹

ln(α) = ½{ln[H/(1 - H)] - ln[F/(1 - F)]} .    (4.9)

In Choice Theory, the transformation applied to H and F is the log-odds transform, which converts a proportion p to p/(1 - p) (the odds in favor) and then takes logarithms.
To give an idea of the magnitude of α: If F = .4 and H = .8, then α = [(.8 × .6)/(.2 × .4)]^½ = 2.45 and ln(α) = 0.90. The (F, H) pair (.1, .4) leads to the same values; these points give similar (although not identical) values of d'. Total inability to discriminate (H = F) leads to α = 1, ln(α) = 0. When H = .99 and F = .01, α = 99 and ln(α) = 4.60. A proportion correct of .75 on both types of trials yields α = 3, ln(α) = 1.10; p(c) = .73 corresponds to ln(α) = 1.0.
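The numerical examples above can be reproduced with a few lines of Python (a minimal sketch; the function name is ours):

from math import log

def alpha(H, F):
    # Equation 4.8: Choice Theory sensitivity.
    return ((H * (1 - F)) / (F * (1 - H))) ** 0.5

for H, F in [(0.8, 0.4), (0.4, 0.1), (0.99, 0.01), (0.75, 0.25)]:
    a = alpha(H, F)
    print(round(a, 2), round(log(a), 2))
# 2.45 0.9 / 2.45 0.9 / 99.0 4.6 / 3.0 1.1, as in the text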
These examples suggest that d' and ln(α) are similar as measures of sensitivity, and Fig. 4.10 shows that they are very nearly proportional to each other for low to moderate values. The relation between them can be approximated by ln(α) = 0.81d', with deviations from this equation being greatest for hit rates near 1 or false-alarm rates near 0. Figure 4.10 encourages us to choose between the two accuracy indexes on the basis of convenience; the two analyses are not likely to support discrepant conclusions.
¹ Luce (1959, 1963a) assigned sensitivity the symbol α, with slightly different meanings in two versions of Choice Theory. In memory research, one of the areas in which Choice Theory is most widely used, ln(α) is sometimes called dL (Hintzman & Curran, 1994) to highlight its similarity to d'.


FIG. 4.10. Relations between ln(α) and d' for the zero-bias case, and for two cases of bias to respond "yes."

Implied ROC Curves
What is the form of the ROC implied by the Choice Theory measure α? To answer this question with a question, what transformation would render these curves straight lines? As Equation 4.9 makes clear, the required function is log odds because

ln[H/(1 - H)] = ln[F/(1 - F)] + 2 ln(α) .    (4.10)

If we were to plot the ROC in log-odds coordinates (ln[H/(1 - H)] vs. ln[F/(1 - F)]), then the (constant) distance between the ROC and the chance line would be 2 ln(α). The analogy to SDT correctly suggests that 2 ln(α) plays the role of mean difference in the decision space.
To get a feel for the relation between the log-odds and z transformations, consider Fig. 4.11, in which ROCs for constant α and constant d' are plotted. It is hard to distinguish the two sets of curves, which differ systematically only for very small or large proportions. An important difference between Choice Theory and SDT, however, is that the ROCs implied by α are always symmetric (like those implied by d'), but there is no measure analogous to da that allows for ROC curves that are not of unit slope.


FIG. 4.11. ROCs for SDT and Choice Theory on linear coordinates. Curves connect locations with constant d', and ×s are points of constant α.

Bias Measures
Choice Theory's bias measure b (for "bias") can be computed from

b = {[(1 - H)(1 - F)]/(HF)}^1/2 .    (4.11)

Taking logarithms reveals that ln(b), like c in SDT, is based on the sum of the transformed hit and false-alarm rates, the transformation in this case being log odds:

ln(b) = -(1/2){ln[H/(1 - H)] + ln[F/(1 - F)]} .    (4.12)

As we shall see shortly, ln(b) is a measure of criterion location. Division by the sensitivity parameter 2 ln(α) yields a measure of relative criterion analogous to c':

b' = ln(b)/[2 ln(α)] .    (4.13)


Finally, the likelihood ratio βL can be shown to equal2

βL = [H(1 - H)]/[F(1 - F)] .    (4.14)

The algebraic form of Equation 4.12 leads one to expect the isobias curve for b to be much like that for c, and this conjecture is correct. Although Equations 4.13 and 4.14 provide less of a hint, isobias curves for relative criterion and likelihood ratio in the Choice Theory model are also very similar to their SDT counterparts (see Fig. 2.7).
Decision Space
From our analysis of the normal distribution SDT model, we know that sensitivity is a difference of transformed hit and false-alarm rates and response bias a sum. The transformation, in Choice Theory, is log odds, which converts a proportion p to ln[p/(1 - p)]. Figure 4.12 shows how this operation is used to convert the false-alarm/hit pair (.4, .8) to the sensitivity statistic 2 ln(α) and the bias statistic ln(b).
The decision space implied by these Choice Theory measures contains two underlying distributions whose form is logistic, rather than normal. The logistic distribution is symmetric and only subtly different in shape from the normal when plotted on a log-odds axis (see Fig. 4.13). As in the SDT model, the distance between the means of the S1 and S2 distributions is a sensitivity measure; its value is 2 ln(α). If we define 0 as the point at which the two distributions cross, then the distribution means are at ±ln(α) and the criterion is located at ln(b).
In the normal model, the transformation from p to location on the decision axis is z(p); the reverse operation, to find hit and false-alarm rates from a z-score axis location, is Φ. Both are found using the normal table. In the logistic model, the log-odds transformation is used to find log-odds axis locations, called logits; to find hit and false-alarm rates from a log-odds value requires solving the equation

x = ln[p/(1 - p)]    (4.15)

2 See Computational Appendix.


FIG. 4.12. A logistic distribution function. The inverse function can be used to transform proportions into logits. Sensitivity [2 ln(α)] is the difference between ln[H/(1 - H)] and ln[F/(1 - F)].

FIG. 4.13. Decision space for the yes-no experiment according to Choice Theory (logistic distributions).


for p. The solution is

p = 1/(1 + e^x) .    (4.16)

To find H and F, x must be expressed as a distance from the mean. For the S2 distribution, x = ln(b) - ln(α), and for the S1 distribution, x = ln(b) + ln(α). Substituting into Equation 4.16 yields

H = α/(α + b)    (4.17)

F = 1/(1 + αb) .    (4.18)

For an unbiased observer, b = 1, H = α/(α + 1), and F = 1/(α + 1). Again, H = 1 - F, and

p(c)max = α/(α + 1) .    (4.19)
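A minimal sketch of the logistic machinery in both directions, mapping (α, b) to (H, F) via Equations 4.17 and 4.18 and back through the logit transform of Equations 4.9 and 4.12 (helper names are ours):

```python
# A minimal sketch (helper names ours) of the logistic model's two
# directions: (alpha, b) -> (H, F) via Eqs. 4.17-4.18, and (H, F) ->
# (alpha, b) via the logit transform (Eqs. 4.9 and 4.12).
import math

def rates_from_parameters(alpha, b):
    H = alpha / (alpha + b)      # Eq. 4.17
    F = 1 / (1 + alpha * b)      # Eq. 4.18
    return H, F

def parameters_from_rates(H, F):
    logit = lambda p: math.log(p / (1 - p))
    ln_alpha = 0.5 * (logit(H) - logit(F))    # Eq. 4.9
    ln_b = -0.5 * (logit(H) + logit(F))       # Eq. 4.12
    return math.exp(ln_alpha), math.exp(ln_b)

H, F = rates_from_parameters(2.45, 1.0)   # unbiased observer: H = 1 - F
print(H, F)
print(parameters_from_rates(H, F))        # recovers (2.45, 1.0)
```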

Measures Based on Areas in ROC Space: Unintentional
Applications of Choice Theory
An appealing measure of sensitivity is the area under the ROC, which increases from .5 for chance performance to 1.0 for perfect responding. We saw in chapter 3 that if the underlying distributions are normal, the estimated area Az is simply related to the mean difference index da; in addition, the area can be estimated nonparametrically from ROC data. If only a single (F, H) point is available, however, we are forced to assume that the underlying distributions are normal, logistic, rectangular, or something specific. In this section, we consider measures of sensitivity and bias for single ROC points that were developed without recourse to detection theory. We shall find, however, that most of them are equivalent to parameters of the logistic model.
Sensitivity: Area Under the One-Point ROC
If only one point in ROC space is obtained in an experiment, there are many
possible ROCs on which it could lie, and some assumptions must be made
to estimate the area under the ROC. One possibility is to find the smallest
possible area consistent with that point. As shown in Fig. 4.14, this is equivalent to finding the area under the low-threshold ROC for which the ob-


FIG. 4.14. Calculation of the area under the one-point ROC. The minimum area is shaded; the statistic A' is the minimum area plus one half the sum of regions A1 and A2. The dashed line is an example of an ROC that bounds an area greater than the minimum but less than the maximum (minimum plus A1 and A2).

tained point forms the corner. This area turns out to equal proportion
correct, a measure with which we have already dealt harshly.
A better estimate, proposed by Pollack and Norman (1964), is also diagramed in Fig. 4.14. Their measure A' is a kind of average between minimum and maximum performance and can be calculated (Grier, 1971) by3

A' = 1/2 + [(H - F)(1 + H - F)]/[4H(1 - F)] , H ≥ F .    (4.20)
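A minimal sketch of Equation 4.20 (function name ours); the below-chance branch assumes the convention described in the next sentence of the text:

```python
# A minimal sketch of Eq. 4.20 (function name ours). The below-chance
# branch assumes the convention described in the text: interchange H and
# F and subtract the result from 1.
def a_prime(H, F):
    if H >= F:
        return 0.5 + ((H - F) * (1 + H - F)) / (4 * H * (1 - F))
    return 1 - a_prime(F, H)   # below-chance convention (assumed)

print(a_prime(.8, .4))   # area estimate for a single above-chance ROC point
```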

If performance is below chance, so that H < F, the corresponding area can be found by interchanging H and F in Equation 4.20 and subtracting the result from 1.

5
Classification Experiments
for One-Dimensional Stimulus Sets

In a classification experiment, N stimuli are sorted into M categories. When there are more stimuli than responses (N > M), the design is traditionally called category scaling, but is now often called categorization. We consider the important special case in which M = 2 in detail first. When N equals M but both are greater than two, the experiment is absolute judgment, absolute identification, or simply identification; the second part of the chapter concerns this task.
Classification experiments can be modified by the addition of a standard
stimulus. The stimuli being judged are called comparisons, and a (standard,
comparison) pair is offered on each trial. The presence of standards makes


no difference to our analysis of classification because the standard gives no
information regarding which response is appropriate. As examples, we use
both tasks with standards and tasks without.
Perceptual One-Dimensionality
What is a "one-dimensional" stimulus set? In the examples used so far in this
book, some stimulus sets are physically one-dimensional (or, to borrow Klein's
[1985] phrase, can be produced with a "single knob"). Examples in sensory
work include intensity and frequency. The stimuli in face recognition and
X-ray reading, on the other hand, clearly vary in many physical dimensions.
The question of the perceptual dimensionality of a stimulus set is distinct
from that of physical structure. Stimuli differing in one dimension can produce multi-dimensional perceptual changes. A dimension that seems to behave in this way is the phase relation between components in a visual grating.
Data suggest that changing the relative phase of components of a stimulus
from negative to zero to positive may yield two-dimensional ("monopolar"
and "bipolar") effects (Klein, 1985). Conversely, stimuli differing in complex
ways can produce internal representations differing along a single continuum. Cases in which two variables appear to contribute to a common dimension of judgment, called trading relations, occur in such disparate fields as
lateralization of binaural stimuli (Moore, 2003) and speech recognition
(Repp, 1982). We consider a speech example later in the chapter.
A detection-theory characterization of perceptual one-dimensionality is
shown in Fig. 5.1. The sensitivity statistic d' is a distance measure, as we
saw in chapter 1, and distances along a single dimension add up. Thus, if
stimuli S1, S2, and S3 give rise to distributions along a continuum, with their means in the order μ1 < μ2 < μ3, then d'(S1, S3) = d'(S1, S2) + d'(S2, S3).

6
Detection and Discrimination
of Compound Stimuli

To calculate these proportions, we need information about the normal distribution function, and for this purpose Table A5.2 is the more convenient of our two tables of this curve (see chap. 2). For each positive z score, the table gives Φ(z), the area from the left tail of the distribution to the criterion.1 The general rule is that areas in Table A5.2 are from one tail of the distribution to a z score on the opposite side of the mean. For z scores on the same side, the areas in the table must be subtracted from 1.
The four basic SDT probabilities are easily found from the table (a computational sketch follows the list):
• Correct rejections. In the upper panel of Fig. 6.1, the area below the criterion is the probability of a correct rejection. The criterion c = 1, so this probability is Φ(c) = Φ(1) = .84.
1 Remember that (uppercase) Φ is not the same as (lowercase) φ, the height of the normal curve, which is given in Table A5.1.


• False alarms. Still in the upper panel, the area above the criterion is the false-alarm rate. The total area under the curve is 1, so this probability is 1 - Φ(1) = .16.
• Hits. In the lower panel, the value of z at the criterion is negative, specifically -1. The table does not contain negative numbers, and the symmetry of the normal distribution must be used. The area to the right of a negative z score equals the area to the left of the corresponding positive z score, so the hit rate is Φ(1) = .84.
• Misses. Still in the lower panel, the area below the criterion is the miss rate. The total area under the curve is 1, so this probability is 1 - Φ(1) = .16. Because this is an area below the criterion, it is also a value of Φ itself, namely, Φ(-1).
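A minimal computational sketch of these four probabilities, with Φ supplied by scipy in place of Table A5.2 (parameter names are ours; the configuration follows Fig. 6.1 as we read it):

```python
# A minimal sketch (names ours) of the four probabilities, with the
# normal distribution function Phi supplied by scipy in place of Table
# A5.2. The configuration follows Fig. 6.1 as we read it: the criterion
# sits 1 SD above the S1 mean and 1 SD below the S2 mean (d' = 2).
from scipy.stats import norm

Phi = norm.cdf

def yes_no_probabilities(d_prime, c):
    """c is the criterion location measured from the S1 (noise) mean."""
    return {
        "correct rejection": Phi(c),                # area below c under S1
        "false alarm":       1 - Phi(c),            # area above c under S1
        "hit":               1 - Phi(c - d_prime),  # area above c under S2
        "miss":              Phi(c - d_prime),      # area below c under S2
    }

print(yes_no_probabilities(d_prime=2, c=1))   # .84, .16, .84, .16
```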
Two-Dimensional Distributions That Can Be Analyzed
One-Dimensionally
Multidimensional distributions build on the familiar one-dimensional variety, but there are several steps in the generalization. Our goal in this chapter
is to describe the compound detection problem, in which two compound
stimuli are discriminated, but to simplify things we temporarily consider
the unrealistic case in which only one (two-dimensional) distribution rather
than the usual pair is possible. We still refer to the observer as making a decision, although a well-informed decision maker would simply produce the
same response on every trial.
Figure 6.2 shows two ways to draw the joint distribution of two variables,
produced when a light and a tone are presented simultaneously. One strategy is to add a third dimension to the graph: A two-dimensional graph (like
Fig. 6.1) was needed to display one variable and its likelihood distribution,
and a three-dimensional graph (Fig. 6.2a) can show two variables plus the
likelihood of the combination. The overall distribution is a hill situated on a
surface defined by loudness and brightness dimensions. A particular value
of loudness and brightness is a point on the surface, and the likelihood of
that value is the height of the hill over that point. The highest point, which
represents the greatest likelihood, lies over the means of both variables, the
point (μx, μy).
As decision problems grow in complexity, three-dimensional pictures of
perceptual spaces quickly lose their charm. We instead use cross-sections to
represent distributions, omitting the likelihood dimension. The circles in
Fig. 6.2b connect (x, y) points of equal likelihood from Fig. 6.2a. They can
be thought of as paths of constant height around the hill in Fig. 6.2a or pla-


FIG. 6.2. Two representations of a two-dimensional distribution of the brightness and loudness of the light/tone pair: (a) The likelihood of each (x, y) point is a value in the third dimension, and (b) the likelihood dimension is suppressed, and each circle is the locus of points having the same likelihood.

teaus obtained by slicing off the top of the hill at a constant height. The center of the circles is still (μx, μy), the means on the two axes, and the diameters represent the standard deviation or a multiple of the standard deviation. Notice that, compared with Fig. 6.1, in which the psychological space is one-dimensional, the cross-section picture in Fig. 6.2b portrays a two-dimensional space, each dimension representing a psychological variable. Likelihood is not shown, but the equal-likelihood contours do convey useful information, as we shall see.
Now what about the observer's criterion? In one-dimensional problems,
this was just a point (z = +1 or -1 in Fig. 6.1), but with two internal dimensions we need a curve or line, called a decision boundary, that gives values
of y for all possible values of x. The example in the lower half of Fig. 6.1—a
criterion one standard deviation below the mean—is rendered as a two-dimensional plot in Fig. 6.3. The problem of finding the "yes" rate looks
much more difficult in this representation: Instead of finding the area to the


FIG. 6.3. A two-dimensional distribution (in the style of Fig. 6.2b) and a decision boundary. Points to the right of the boundary represent above-criterion values
of loudness, and the value of brightness is ignored. The marginal distributions of
brightness and loudness are shown along their respective axes.

right of z = -1, we are interested in the volume to the right of the line zx = -1.
Fortunately, there is a shortcut.
To understand the shortcut, called projection, we need to know a little
more about the joint distributions (i.e., those that depend on more than one
variable). If the distributions are normal, they can be described by five values: the means on both variables, the standard deviations on both variables,
and the correlation between them. For calculations involving only one dimension, we can use the marginal distribution on that axis, that is, the distribution of x (ignoring y) or y (ignoring x). One way to think about marginal
distributions is to imagine that the three-dimensional joint distribution is
tipped on its side so that all the mass piles up on one axis. The height at each
value of x in the marginal distribution of x corresponds to the summed
heights of the joint distribution at every point for that value of x for any
value of y. The marginal distributions, shown along the axes of Fig. 6.3, are
also normal, and the mean and standard deviation of the marginal distribution on x are the same as the x-mean and x-standard deviation of the joint
distribution. The joint distribution is said to be projected onto the x-axis.
Now we can calculate the "yes" rate for an observer with the representation shown in Fig. 6.3 for repeated presentations of a tone-light pair. The
vertical criterion line means that the decision is based solely on the loudness
of the tone—as if the judgment was made with the eyes closed. The probability of an observation to the right of the decision boundary is the volume to


the right of that boundary in the joint distribution, but this is the same as the
area to the right of the criterion in the marginal distribution. If z = -1, this
area is 1 - O(-l) = .84—the same as in Fig. 6.1b. This is exactly what one
should expect: The probability of detecting the tone is the same when only
the tone is presented as when both a tone and light are presented, but the
light is ignored.
In Fig. 6.3, the joint distribution is drawn as circular, with equal standard
deviations on the two dimensions. For many stimuli (including the simultaneous tone and light presentation we have been discussing), there is no reason to expect equal standard deviations. When variability is unequal on the
two dimensions, equal-likelihood contours are elliptical rather than circular, as in Fig. 6.4. Computations in which the joint distribution of x and y is
projected onto either x or y are unchanged. For example, in Fig. 6.4, the
standard deviation on x is 2 and on y is 1. The area to the right of a vertical
line at x = -1 is Φ(0.5) = .69, but above the line y = -1 it is Φ(1) = .84. If the
distribution were circular, these two numbers would be equal.
Two-Dimensional Decision Rules
That Can Be Analyzed One-Dimensionally
We have now succeeded in finding the "yes" rate in a two-dimensional perceptual space by projecting a joint distribution onto a single-dimensional
one. This simplification always works when the decision boundary is a
straight line, and the line need not be perpendicular to one of the axes.
Suppose the standard deviations are the same on the two axes, so that the
likelihood contours are circles. An observer might reasonably decide to add
the values of loudness and brightness and use the sum as the basis for a deci-

FIG. 6.4. A two-dimensional
distribution in which the standard deviation of x is greater
than the standard deviation of y.
The outer ellipse represents
points one standard deviation
from the mean, and the inner ellipse represents points 0.5 standard deviations from the mean.


sion about whether the tone-light pair was presented. This observer's decision axis, shown in Fig. 6.5a, is a line at a 45-degree angle to both axes.
Values increase as we move up and to the right along (or parallel to) this
axis: The point (-1, -1) has a sum of -2, (0, 0) has a sum of 0, (1, 1) has a sum of 2, and so on. The decision boundary, as in earlier examples, is perpendicular to the decision axis. In the figure, the boundary is set so that any sum of loudness and brightness greater than -2 leads to a "yes" response. Thus, an observation of (0, -1) produces a "yes" and (-2, -1) a "no."
For this boundary, what is the probability of a "yes" response? The projection strategy is appropriate, but the projection must be onto the decision
axis (the sum of loudness and brightness), not the x-axis (loudness) or the y-axis (brightness). In the figure, the marginal distribution is drawn on an
axis parallel to the decision axis; notice that all points on the decision
boundary project onto the same point on the decision axis, as is necessary
for projection to work.

FIG. 6.5. (a) A two-dimensional distribution, a decision axis for increasing values of the sum of x and y, and a decision boundary that is the locus of points with a fixed sum of x and y. The marginal distribution of x + y is shown parallel to the decision axis. (b) Demonstration that the distance from the decision boundary to the mean of the distribution is the Pythagorean sum of the distances along the x- and y-axes.


How far is the projected boundary—the criterion—from the mean? At the critical point (-1, -1), the distance to the mean of (0, 0) is, by the Pythagorean Theorem, √2 or 1.41 units (see Fig. 6.5b). The area to the right of the boundary is therefore Φ(1.41) = .92. The observer who uses both loudness and brightness in deciding whether the tone-light pair occurred has a higher hit rate (92% detections) than the one who is detecting only one or the other (84%) because the former has two useful pieces of information, the latter only one. (Keep in mind, however, that this number is just a hit rate, not an index of sensitivity; a true detection theory analysis is yet to come.)
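A minimal sketch of the projection shortcut for the summation rule, assuming a unit-variance circular distribution centered at the origin:

```python
# A minimal sketch (ours) of the projection shortcut for the summation
# rule of Fig. 6.5, assuming a unit-variance circular distribution
# centered at the origin.
from math import hypot
from scipy.stats import norm

mean = (0.0, 0.0)
boundary_point = (-1.0, -1.0)   # the decision boundary passes through (-1, -1)

# Distance from the mean to the boundary, measured along the decision axis:
distance = hypot(mean[0] - boundary_point[0], mean[1] - boundary_point[1])
print(distance)              # 1.41, the Pythagorean sum of Fig. 6.5b
print(norm.cdf(distance))    # .92, the "yes" rate for the summing observer
```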
Some Characteristics of Two-Dimensional Spaces
So far the analysis of two-dimensional perceptual spaces has been only a
matter of properly reducing them to one-dimensional problems. When this
is not possible, it is because the distributions, the decision rule, or both require the second dimension to be taken seriously. Before we return to the
compound detection problem, a brief tour of two-dimensional-space geography is necessary.
Perceptual Independence and Dependence of Distributions
The essential simplicity of distributions like those in Figs. 6.2 through 6.5,
and the possibility of analyzing either component dimension separately,
arises from the lack of correlation between the dimensions. Normal
bivariate distributions with zero correlation result from statistically independent variables and are said to be perceptually independent (Ashby &
Townsend, 1986). The opposite case is called perceptual dependence; this
condition arises in vision, for example, because increasing the brightness of
a patch of light tends to increase its yellowness, and it arises in hearing because increasing the loudness of a pure tone slightly increases its pitch.
There are two equivalent ways to represent perceptual dependence. In Fig.
6.6a, the x and y axes are nonorthogonal; in Fig. 6.6b, the axes are orthogonal,
but the distribution is elliptical. In this figure, the marginal as well as the joint
distributions are displayed, and one way to see that the elliptical distribution
is not perceptually independent is to compare it with the distribution shown in
dashed lines. This distribution is the result of multiplying the marginals together, and it is circular.
Perceptual dependence always refers both to the shape of the bivariate
distribution and its orientation in the perceptual space, or equivalently the


FIG. 6.6. Two equivalent representations of perceptual dependence: (a) the x- and y-axes meet at a nonright angle, and the distribution is circular (correlation equals 0);
(b) the axes are orthogonal, but the
distribution is elliptical (correlation
is not equal to 0). The dashed lines
represent a perceptually independent distribution constructed from
the two marginal distributions.

angle between the underlying axes. There is a simple quantitative relation
between the two depictions: The correlation of the bivariate distribution in
panel b equals minus the cosine of the angle between the axes in panel a.
Two-Dimensional Decision Boundaries: The Product Rule
Figure 6.7 illustrates another way to divide up the three-dimensional perceptual space, one that explicitly makes use of the two distinct dimensions
by placing a separate criterion on each of them. The distribution still describes the internal effect of a tone-light pair, and it exhibits perceptual independence. The space (and the distribution) is divided by the criteria into
four regions according to whether the observation is above or below each.
Suppose a cautious observer requires a combined observation to be
above both of the criteria in order to respond "yes." We are interested in the
volume that looms over the area shaded in Fig. 6.7a. The proportion of the
distribution's volume to the right of the x criterion (when it is located one
standard deviation below the mean, as in the figure) is Φ(1) = .84, as we


FIG. 6.7. The maximum and minimum rules. In the maximum rule (a), the observer responds "yes" only if both x and y exceed their respective criteria. In the minimum rule (b), the observer responds "yes" if either x or y exceeds its respective criterion.

found earlier (see Fig. 6.3), but we are interested in just some of that volume, the part that is higher than the y criterion. What fraction would that be?
For the whole distribution, the proportion of volume above the line is .84,
and a convenient consequence of perceptual independence is that this same
proportion applies to any fraction of the distribution to the left or right of the
jc criterion. The fraction of the marginal distribution that is to the right of the
criterion on jc is. 84, but not all of that area can be counted because of the criterion on v. Thus, the volume over the shaded area is .84 x .84, or .71. We
call this principle, naturally enough, the product rule: The volume under a
distribution that is above a horizontal criterion zy and to the right of a vertical
one zx equals the proportion above the horizontal criterion [(-zy), found
from the y marginal distribution] times the proportion to the right of the vertical criterion [O(-z), found from the x marginal]. As an equation:
volume over an infinite "rectangle" above zy
and to the right of z = O(-z) O(-z).

(6.1)


With this result in hand, we can easily find the volume beneath the joint
distribution over the unshaded area in Fig. 6.7a. This represents the likelihood that the compound stimulus would lead to a "no" response—that is,
the miss rate. This area is L-shaped rather than rectangular, so the product
rule cannot be used directly, but the likelihood is the complement of the hit
rate, 1 - .71 = .29.
Now consider an alternative decision rule. The shaded area in Fig. 6.7b
corresponds to all observations that are either above the y criterion or to the
right of the x criterion, and it reflects the "yes" rate of an incautious observer
whose decision rule is to say "yes, the compound is present" if either tone or
light yields a sufficiently large input, regardless of the value of the other.
This time it is the miss rate that can be calculated directly from the product
rule; it equals (.16) × (.16) = .026. The hit rate, the volume over the shaded
area, is the complement of this value, or .974.
Even if the standard deviations for two dimensions differ, as in Fig. 6.4,
the procedure for finding the volume over an (infinite) rectangular area is the same. For example, the area in the upper right-hand quadrant of Fig. 6.4 is the volume above the y criterion, which is Φ(-zy) = .84, times the volume to the right of the x criterion, which is Φ(-zx) = .69, for a product of .58.
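A minimal sketch of the product rule for both decision rules of Fig. 6.7 (function names are ours):

```python
# A minimal sketch (function names ours) of the product rule (Eq. 6.1)
# for the maximum ("and") and minimum ("or") rules of Fig. 6.7, assuming
# perceptual independence and unit-variance marginals.
from scipy.stats import norm

Phi = norm.cdf

def and_rule_yes(zx, zy):
    # "yes" only if both components exceed their criteria
    return Phi(-zx) * Phi(-zy)

def or_rule_yes(zx, zy):
    # "yes" unless both components fall below their criteria
    return 1 - Phi(zx) * Phi(zy)

print(and_rule_yes(-1, -1))   # .84 x .84 = .71
print(or_rule_yes(-1, -1))    # 1 - (.16 x .16) = .974
```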
Compound Detection
We are at last ready to tackle the problem with which the chapter began, the
detection of compound stimuli such as a simultaneous tone burst and light
flash. An important part of any solution must concern the comparisons
likely to interest the experimenter, in particular detection using the same
components of the stimulus, but alone rather than in combination. For example, the research question might be how combined detection of the
tone-light combination compares with detection of the tone or the light
when either is presented separately. This focus on relative performance in
more than one task with the same stimuli is a strength of multidimensional
detection theory; it allows for theoretical "converging operations," relating
performance in separate tasks (Garner, Hake, & Eriksen, 1956).
Equal-Variance Uncorrelated Representation
for Compound Detection
Half of the representation for compound detection—the distribution due to
the compound stimulus—has been displayed in previous figures. The missing half, in detection, is the distribution due to no stimulus. In Fig. 6.8, two


circular unit normal distributions arise—one with a mean of (0, 0) for the no-stimulus distribution, the other with a mean of (d'x, d'y) for the stimulus distribution. We develop equations for this general case, but also track the specific example in which d'x and d'y both equal 1.
Decision Rules
A characteristic of multidimensional tasks is that observers may plausibly
adopt any of a number of response strategies, as spelled out in earlier sections
of this chapter. In the one-interval design, variations in performance could be
produced by changing the location of the criterion, and by some degree of inattention, criterion fluctuation, and so on. A criterion shift, according to detection theory, represents a change in the likelihood of using one response
rather than the other, and it does not affect sensitivity as measured by d'. Inattention and criterion fluctuation produce lower performance, arising because
the observer acts nonoptimally. The alternate rules adopted in multidimensional tasks provide an additional level of complexity.
How can we compare different strategies for dealing with the same multidimensional decision problem? In analyzing the one-interval design, we
stressed the bias-free measure d', but d' is a characteristic of the task or
problem, not of the decision rule. To understand decision strategies for a
representation in which d' is the same for the strategies to be compared, we are forced to depend on some other index. One possibility is p(c), although
we have seen that this depends on the criterion when d' is fixed. A natural

FIG. 6.8. The compound detection problem: A compound
stimulus must be discriminated
from a null stimulus.


criterion-free measure is p(c)max, the value of p(c) when responding is unbiased, and we sometimes adopt this measure. For the one-dimensional yes-no task, best performance is with a criterion halfway between the distributions; we saw in chapter 2 (Eq. 2.10) that in that case,

p(c)max = Φ(d'/2) .    (6.2)

Decisional Separability. The first decision rule to be considered
is the simplest and the most obviously inadequate: Attempting to detect a
stimulus that has two components, the observer ignores one of them. We
considered this strategy early in the chapter; it makes use of the marginal
distribution of one component. As shown in Fig. 6.9, the decision boundary
is a straight line parallel to one of the axes, a condition called decisional separability. For example, in the tone-light detection example, the observer
considers only the amount of activation on the loudness dimension. The effective sensitivity to the combination is d'x, so p(c), assuming equal presentation probabilities, is the average of the hit and correct rejection rates, 0.5[Φ(d'x - k) + Φ(k)]. The maximum value is for a criterion halfway between the distributions, and p(c)max = Φ(d'x/2).

7
Comparison Designs
for Discrimination

As an example of a 2AFC experiment, consider a recognition memory task in which one Old and one New word are presented on each trial, one above the other:

                "old on top"    "old on bottom"    Total
<Old, New>           16                9             25
<New, Old>            7               18             25

The rows of the table correspond to stimulus sequences, denoted by angle-bracketed lists, rather than to individual stimuli. In this example, the stimuli are listed spatially from top to bottom in the stimulus presentation so that, for instance, <Old, New> represents the presentation of an Old word on top and a New word below. A temporal sequence rather than a spatial one is used in auditory (and many visual) 2AFC experiments, but the analysis is the same.
The two possible responses are "old on top" and "old on bottom." The designations "hit" and "false alarm" in this case are arbitrary; we define them as

H = P("old on top" | <Old, New>)
F = P("old on top" | <New, Old>) .
Two questions we can ask of these data are parallel to those posed for the
one-interval design: How sensitive are the observers? How biased are they?
A third question concerns the relation between 2AFC and yes-no. As many


readers will surmise, 2AFC is the easier task. Models for 2AFC must build
on those for yes-no and give an account of this discrepancy in performance.
The first and third questions can be answered together: To compute sensitivity for these data, we first subtract the transformed hit and false-alarm rates, as we did for one-interval data in chapter 1. To take account of the difference in difficulty between 2AFC and yes-no, this difference must be adjusted downward by a factor of √2 as follows:

d' = [z(H) - z(F)]/√2 .    (7.2)

For the example at hand, H = .64 and F = .28, the transformed difference z(H) - z(F) = 0.358 - (-0.583) = 0.941 (from Table A5.1), and d' = 0.665.
Choice Theory leads to exactly the same prediction about the 2AFC/yes-no relation. From the hit and false-alarm rates in 2AFC, ln(α) can be found from

ln(α) = [1/(2√2)]{ln[H/(1 - H)] - ln[F/(1 - F)]} ,

which is a factor of √2 less than if the data had arisen from a yes-no experiment.
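A minimal sketch of these 2AFC calculations for the table above (helper names are ours):

```python
# A minimal sketch (ours) of the 2AFC calculations for the word-
# recognition table: d' from Eq. 7.2 and the Choice Theory analogue
# for ln(alpha).
from math import log, sqrt
from scipy.stats import norm

H, F = 16 / 25, 7 / 25   # .64 and .28

d_prime = (norm.ppf(H) - norm.ppf(F)) / sqrt(2)
print(d_prime)            # about 0.67 (0.665 with the text's rounded z values)

logit = lambda p: log(p / (1 - p))
ln_alpha = (logit(H) - logit(F)) / (2 * sqrt(2))
print(ln_alpha)
```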
Representation and Analysis
To understand why forced choice should be easier for the observer than
yes-no (and why the discrepancy should be V2), we must derive the characteristics of the decision space underlying a forced-choice task. Each trial involves two stimuli, the top word and the bottom one. We assume that the
observer estimates the familiarity of each word independently, which
means that we can treat each spatial location as a separate dimension in the
decision space. Geometrically, independence is interpreted as orthogonality, so the two stimulus locations in Fig. 7.1 are drawn at right angles.
The internal effect of a single experimental trial is a point in the two-dimensional space: The top word has a familiarity value on the vertical axis, the
bottom word on the horizontal axis. The underlying distributions are surfaces above a plane, but they are indicated in the figure as circles of equal
likelihood, as in chap. 6.
The mean of the New distribution on both axes is, arbitrarily, chosen to
equal zero. A 2AFC trial involves one observation that is most likely to be
near 0 and another near the original d', so points will center around coordi-


FIG. 7.1. A two-dimensional interpretation of the 2AFC task. The decision axes
are the observation strengths for the two intervals. Each distribution is represented
by a set of concentric circles defining contours of equal likelihood for one possible
stimulus sequence. The decision boundary is perpendicular to the line connecting
the two means. The observer responds "old on top" in the region to the right of the
boundary.

nates (0, d') on <Old, New> trials and around (d', 0) when the stimulus sequence is <New, Old>.

In a rating version of 2AFC, the observer reports confidence in the stimulus order on a scale ranging from high confidence that <S1, S2> was presented to high confidence in the occurrence of <S2, S1>. As in chapter 3, the data are plotted as ROCs.
as ROCs.
The interesting aspect of this experiment is the shape of the ROC. Figure
7.3a shows the standard model for the yes-no task, assuming that S1 and S2


lead to distributions with unequal variances s² and 1. Figure 7.3b is the one-dimensional representation for 2AFC under the same assumption. The decision variable in 2AFC is the difference in strength between Intervals 1 and 2, which we denote A - B (see Fig. 7.2). The variance of this difference is the sum of the variances, 1 + s², for either possible stimulus order. Thus,
the representation for 2AFC is equal-variance even if that for yes-no is not,
and the ROC in 2AFC should be a straight line with unit slope on z coordinates in all cases. The unit slope reflects an important theoretical advantage
of forced choice over yes-no: No matter what the criterion is, apparent sensitivity—the difference between the transformed hit and false-alarm
rates—is the same. In a one-interval experiment, this is true only if the underlying distributions have the same variance. It is ironic that an advantage
of 2AFC should be the robustness of its sensitivity measure in the face of
extreme biases that do not normally arise.
Implications for One-Interval ROC Analysis. The relation between the yes-no and 2AFC isosensitivity curves provides a theoretical rationale for the use of da in the one-interval task, as proposed in chapter 3

FIG. 7.3. (a) Decision space for yes-no when the variances of S1 and S2 are unequal. (b) Decision space (in the style of Fig. 7.2) for 2AFC, according to SDT, when the variances of S1 and S2 are unequal and the observer uses an unbiased cut-point decision rule. The area under the <S2, S1> distribution to the right of the criterion (and the area under the <S1, S2> distribution to its left) equals p(c), which by the area theorem equals the area under the unequal-variance yes-no ROC.


(Schulman & Mitchell, 1966). The two distributions in Fig. 7.3b each have variance (1 + s²) and differ in mean by 2Δm, twice the difference between the means of the underlying yes-no distributions. The mean difference divided by the common standard deviation can be estimated by subtracting the z-transformed hit and false-alarm rates:

z(H) - z(F) = 2Δm/(1 + s²)^1/2 .

The right side equals √2 da (see Eq. 3.4), so

da = [z(H) - z(F)]/√2 .    (7.11)
For the case in which s = 1, we recommended earlier that d' be estimated from 2AFC by dividing z(H) - z(F) by √2 (Eq. 7.2). It now appears that when the unit-slope assumption is unwarranted, this method is still desirable and yields an estimate of da.
Finally, Fig. 7.3b can be used to illustrate the area theorem for this normal unequal-variance case. Maximum proportion correct is the same for either stimulus sequence and equals

p(c)max = Φ[Δm/(1 + s²)^1/2] = Φ(da/√2) .    (7.12)
This expression equals Az, the area under the yes-no ROC, in the SDT case
(Eq. 3.8), confirming the area theorem. The equivalence of a distance measure da to the area under the yes-no ROC is a strong argument for preferring
it to other possible distance measures of sensitivity in the one-interval experiment (Simpson & Fitter, 1973).
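A minimal sketch of Equation 7.11 and the area theorem, assuming the SDT forms just given:

```python
# A minimal sketch (ours) of Eq. 7.11 and the area theorem: da estimated
# from a 2AFC (F, H) pair, and the yes-no ROC area Az = Phi(da / sqrt(2))
# of Eq. 3.8.
from math import sqrt
from scipy.stats import norm

def da_from_2afc(F, H):
    return (norm.ppf(H) - norm.ppf(F)) / sqrt(2)

da = da_from_2afc(.28, .64)
Az = norm.cdf(da / sqrt(2))   # area under the implied yes-no ROC
print(da, Az)
```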
Some Empirical Findings and Their Implications for Theory
Although 2AFC appears to be a simple extension of the one-interval design, a
number of experimental results using this paradigm have been fodder for perceptual theory. In particular, the data force us to think seriously about the limitations imposed on discrimination by imperfect memory.
Empirical Comparisons Between 2AFC and Yes-No. Signal detection theory and Choice Theory agree exactly on the relation between


2AFC and yes-no data. It seems almost impolite to ask whether the data respect this unanimity.
Much of the early work that introduced SDT established that different
tasks yielded constant estimates of d'. The results of most early experiments
using simple auditory and visual detection tasks (see Green & Swets, 1966,
ch. 4; and Luce, 1963a, for summaries) supported detection theory in this
respect. Extending the theory to discrimination tasks uncovered a systematic failure. Jesteadt and Bilger (1974) found that 2AFC performance was a
factor of 2, rather than √2, better than yes-no, both in their own frequency-discrimination experiments and others they surveyed. Creelman and
Macmillan (1979) found the same result for discrimination of both auditory
frequency and monaural phase.
What accounts for the confirmation of the predicted yes-no/2AFC relation originally found by SDT advocates in detection experiments? Wickelgren (1968) enumerated the many processing assumptions underlying the √2 prediction and concluded:

When one considers all the ways in which the √2 d' prediction might fail for reasons having nothing to do with the essential validity of strength theory [detection theory] for both absolute [yes-no] and comparative [2AFC] judgments, it is truly amazing that it has not failed thus far. However, the present analysis makes it clear that, if the √2 d' prediction fails in some future application of strength theory, one cannot reject strength theory without a detailed study of the reason for the failure. (p. 117)

Subsequent data, as we have seen, justified Wickelgren's suspicions.
Two variables in particular—time between intervals and stimulus range—
are known to affect performance in 2AFC, and thus its relation to yes-no.
There has been some progress in interpreting the effects of these variables
theoretically without, as Wickelgren also foresaw, abandoning the basic
detection theory approach.
Effects of Interstimulus Interval.
In temporal 2AFC, the two
stimuli are separated by time rather than space. How much time should
elapse between the two stimuli? Our analysis has assumed that the particular order of the stimuli, and the time between them, makes no difference, but
it turns out that the interstimulus interval (ISI) does affect both sensitivity
and response bias.
The response-bias findings are classic. When the two stimuli on a trial
differ in intensity, the second interval is commonly called "larger" more of-


ten than the first, an effect called time order error. The sequence <S1, S2> is, accordingly, correctly reported more often than <S2, S1>.

The √2 relation between 2AFC and yes-no clearly cannot hold for both roving discrimination, in which the stimulus pair varies over a substantial range from trial to trial, and fixed discrimination, and a model of how decisions are made in roving discrimination tasks is needed to relate the two types of tasks. Durlach and Braida's (1969) trace-context theory addresses this problem and unifies the perceptual phenomena we have been discussing.
Durlach and Braida's proposal about the one-dimensional classification experiment, described in chapter 5, is that both sensory noise (β²) and range-dependent context noise (C²) limit performance, and that these sources of variance add. Context noise is proportional to the square of the range R so that C² = G²R² (G is a constant). Sensitivity is the mean difference α divided by the standard deviation, or

d' = α/(β² + G²R²)^1/2 .    (7.13)

Discrimination performance in 2AFC depends on both sensory and context variance, and also on trace variance Θ²—noise that increases with the interstimulus interval T. How do these limitations combine? Durlach and Braida suggested that they do so optimally, the result being that whichever memory process is more accurate—has smaller variance—dominates. In a roving experiment, each pair of stimuli is discriminated according to the following relation:

d' = α/{β² + [1/Θ² + 1/(G²R²)]^-1}^1/2 .    (7.14)

Although it may not be obvious, this form of combining variances has the properties we want. First, what if the range R is small, as in fixed discrimination? Then the right-hand variance term is small as well, and

d' = α/β .    (7.15)

Reducing T also improves performance so that if the two stimulus intervals are adjacent in time, range does not matter and Equation 7.15 again holds.
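A minimal sketch of this combination rule. The symbol names follow our reconstruction of Equations 7.13 to 7.15 and should be read as assumptions, not as Durlach and Braida's own notation:

```python
# A minimal sketch of the trace-context combination. The symbols (beta
# for sensory noise, G and R for context noise, theta for trace noise)
# follow our reconstruction of Eqs. 7.13-7.15 and should be read as
# assumptions, not as Durlach and Braida's own notation.
from math import sqrt

def d_prime_roving(alpha, beta, G, R, theta):
    """Sensory variance plus the optimal (harmonic) mix of trace and
    context variance; the smaller memory variance dominates."""
    context_var = (G * R) ** 2
    trace_var = theta ** 2
    memory_var = 1 / (1 / trace_var + 1 / context_var)
    return alpha / sqrt(beta ** 2 + memory_var)

# With a small range R, performance approaches the alpha/beta limit of
# Eq. 7.15 no matter how noisy the trace:
print(d_prime_roving(alpha=2, beta=1, G=0.1, R=0.5, theta=5))
print(2 / 1)   # the Eq. 7.15 limit
```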


Trace-context theory has been extensively tested for sets of tones differing in intensity and describes many regularities of the data. One systematic
violation is the prediction that d' ratios across tasks will be the same
throughout the range. What is found instead is that at particular points, like
the edges of the range, the advantage of fixed d' over classification and roving d' is reduced. In the current version of the theory (Braida & Durlach,
1988), this effect is attributed to perceptual anchors that narrow the effective value of R in certain parts of the range.
Two Reasons for Using Two Alternatives
Two-alternative forced choice has been a very popular procedure, for two
excellent reasons. First, the procedure discourages bias. The assumption of
symmetric bias is often a good first approximation, and in any case bias can
be easily evaluated using the methods of chapter 2. Detection theoretic measures are preferable to "nonparametric" ones in 2AFC, as in yes-no, but
small amounts of bias reduce the experimenter's theory dependence, because most measures are equivalent at the ROC's minor diagonal. Low expected bias makes 2AFC a convenient task for use with adaptive
procedures, in which stimulus differences are changed depending on the
current level of performance (see chap. 11).
Second, performance levels in 2AFC, as measured by p(c), are high.
The predicted √2 difference between yes-no and 2AFC permits measurement of sensitivity to smaller stimulus differences than would be
practical with yes-no, and we have seen that, for many possible reasons,
the disparity observed in practice may be even greater. The relative ease
of 2AFC has an impact on some aspects of subjective experience: Observers often report surprise that they can perform above chance with
small stimulus differences, which they might be unwilling to report as
above a yes-no criterion.
In a 2AFC experiment, two stimuli are presented on each trial. The design is occasionally confused with other paradigms that also happen to use
two intervals. The defining characteristics of 2AFC are that both S1 and S2
occur on each trial, and that the order of the stimuli determines the corresponding response. One or the other of these properties is violated by
other, similar designs. In the task we discuss next, one of the two stimuli is
merely a reminder or standard that may improve performance (an empirical question), but is not essential to the judgment process.

Reminder Paradigm
Design

Consider again the lowly yes-no experiment, in which one of two stimuli is
presented on each trial. In a detection task, the two stimuli are Signal and
Noise. If the discrimination is difficult, an observer sometimes has the sense
of not being able to remember what the signal looks, sounds, or feels like.
As we have seen, the data support the idea that "memory" for the stimuli to
be detected is fragile, and in the reminder design the experimenter attempts
to jog the observer's memory. Each trial contains two intervals, the first of
which always contains the same stimulus. The observer's task is to determine whether the second stimulus matches the first. If the reminder is S1, then the presentations are <S1, S1> and <S1, S2>, and the participant in effect
decides "same" or "different" rather than "1" or "2." A variant of the reminder experiment is the method of constant stimuli, in which the comparison stimulus varies from trial to trial, but the reminder in the first interval is
always the same. We considered this design in chapter 5, but without incorporating the reminder into our theoretical analysis.
Analysis
Figure 7.4 portrays the observer's problem in the usual two-dimensional
space. There are two distributions corresponding to the two stimulus possibilities <S1, S1> and <S1, S2>. As for 2AFC, we consider both decisionally separable and differencing strategies.

FIG. 7.4. Decision space for
the reminder experiment, with
decision boundaries for both the
decisionally separable and differencing models. Each model
postulates that the observer responds "different" in the region
above the appropriate decision
boundary.


In the decisionally separable strategy, the observer places a boundary
perpendicular to the line connecting the two distributions. In effect the
space is viewed from the vantage of the vertical axis, and the distributions
are projected on this axis. Because the distance between the two distributions from this perspective is d', performance is the same as in yes-no. Indeed this is quite sensible: The boundary line is independent of the
observation from Interval 1, implying that the decision maker is ignoring
the "reminder" stimulus.
In the differencing strategy, the observer bases a decision on the difference between the two observations, A - B. In the space of Fig. 7.4, this is accomplished by using a boundary line of the form A - B = constant. Along the A - B axis, the difference between the means of the two distributions is not d', but only d'/√2, so observed z(H) - z(F) will be poorer than yes-no d' by a factor of √2. Thus, counterintuitively, observers who use the reminder stimulus as a decision aid will suffer a decline in performance. The reminder stimulus has as much variance as the stimulus to be judged, so the variable A - B has twice the variance of A alone.
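A minimal sketch contrasting the two strategies' predictions for the reminder task (function name is ours):

```python
# A minimal sketch (function name ours) of the two reminder-task
# predictions: the decisionally separable rule preserves yes-no d',
# while the differencing rule loses a factor of sqrt(2) because the
# difference A - B has twice the variance of A alone.
from math import sqrt

def reminder_predictions(d_prime_yes_no):
    return {
        "decisionally separable": d_prime_yes_no,
        "differencing":           d_prime_yes_no / sqrt(2),
    }

print(reminder_predictions(1.5))   # differencing gives about 1.06
```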
Data
We may thus hope that data will help us decide between the intuitive but deleterious differencing strategy and the decisionally separable rule of ignoring the reminder. Some empirical comparisons of the reminder, yes-no, and
2AFC designs are summarized in Table 7.1. All of these experiments were
essentially fixed discrimination, although in all except the line-orientation
study of Vogels and Orban (1986) both stimuli roved slightly from trial to
trial, the difference (on trials where there was a difference) being constant.
Such "jittering" had little effect in the experiments of Jesteadt and Bilger
(1974), who also measured unjittered performance.
Table 7.1 reports ratios of d' measures. If our SDT models were all correct, each entry in the table would equal 1.0. This prediction is most nearly
fulfilled for the 2AFC/reminder comparison, for which the geometric mean
ratio is 1.15. (Some individual ratios are nearer √2 than 1 and have been interpreted as supporting the differencing model for both tasks; see Vogels &
Orban, 1986.) The two comparisons with yes-no show, once again, the relative difficulty of that task. Comparison of yes-no to reminder data shows
that reminders aid rather than harm performance. Relative to the optimal
performance described by our models, 2AFC yields the best performance
and yes-no the worst, with the reminder task intermediate between them.

TABLE 7.1 Experiments Comparing Yes-No, 2AFC, and Reminder Performance

                                        Relative Performance (d' Ratio)
Reference            Continuum          Reminder/   2AFC/     2AFC/
                                        Yes-No      Yes-No    Reminder
Jesteadt &           Intensity          1.26        1.51      1.20
Bilger (1974)        Frequency          1.00        1.48      1.48
Jesteadt &           Frequency          1.37        1.74      1.27
Sims (1975)          Frequency          1.06        0.86      0.81
                     modulation
Creelman &           Frequency          1.23        1.37      1.11
Macmillan (1979)     Phase              1.33        1.33      1.00
Vogels &             Line               —           —         1.31
Orban (1986)         orientation
Geometric mean                          1.20        1.35      1.15

It is possible to force the use of a differencing strategy in reminder experiments by employing a real roving design, in which the standard varies
across a substantial range of stimuli from trial to trial. Jesteadt and Bilger
(1974) conducted such an experiment; for both intensity and frequency discrimination, and for both 2AFC and the reminder task, d' declined by about
the predicted √2 compared to the fixed design. A similar result was obtained for intensity discrimination by Long (1973).
Essay: Psychophysical Comparisons
and Comparison Designs
The basic psychophysical process, we believe, is comparison. All psychophysical judgments are of one stimulus relative to another; designs differ in
the nature and difficulty of the comparison to be made. In the one-interval
experiment (or any of our two-interval designs if decisional separability is
in use), comparison is made to events remembered from previous trials. We
have seen evidence that this is a challenging task: Yes-no performance is not
as good as it should be relative to both 2AFC and the reminder experiment.
A comparison task of great importance in psychophysics, but one we have
slighted here, is the matching procedure. To judge the subjective magnitude
of a stimulus, a participant selects a value on some other continuum that


seems to "match" the standard. For example, the brightness of a light might
be matched by the intensity of white noise or the brightness of a light of a different color. What can we say about the reliability of such judgments?
When the two stimuli being matched are from the same continuum (e.g.,
both are pure tones and differ only on the dimension being studied), adjustment is more accurate than fixed methods. At least that was the finding of
Wier, Jesteadt, and Green (1976) for frequency discrimination. But when
the comparison is across continua, the need to compare disparate stimuli
harms performance: Lim, Rabinowitz, Braida, and Durlach (1977) and
Uchanski, Braida, and Durlach (1981) measured roving intensity discrimination of pure tones (or noises) that differed in frequency (or spectrum).
They found that comparing stimuli from different continua contributed additional, additive variance to the decision process.
One assumption we have been making that may be incorrect concerns the
independence of the intervals being compared. The variance of the difference between two variables is the sum of their variances only if the two variables are independent; if they are positively correlated, the variance of the difference is reduced, and the standard could effectively increase performance. This effect may account for the small advantage of the reminder and 2AFC designs seen in Table 7.1. The matching
procedure, by providing the observer with control, may allow strategies that
maximize this correlation.
Some researchers have combined the 2AFC and reminder designs. In experiments with binaural noise samples, Trahiotis and Bernstein (1990; also
Heller & Trahiotis, 1995) preceded and followed each 2AFC presentation
with an example of the standard, so that the possible stimulus sequences
were  and . The analysis is the same as for 2AFC, but
the instructions no longer require discussion of stimulus order. Instead the
listeners are asked to say whether it is the second or third stimulus that is different from all the others. Trahiotis and colleagues found superior results
with this design, but Gerrits and Schouten (2004) found that it lowered
performance with their speech syllables.
Gerrits and Schouten invoked perceptual memory to account for their results, and a more detailed understanding of memory may be required to
unify findings in this field. McFadden and Callaway (1999) conducted reminder experiments in which the standard was a "commonly encountered"
stimulus or, in other conditions, "less commonly encountered." For example, in musical chord discrimination, the standard was either an in-tune
chord, so that the comparison was out of tune, or an out-of-tune chord, with
a comparison that was in tune. The result was that performance was much


better (a factor of about 2 in the chord experiment) for the commonly encountered standard. McFadden and Callaway suggested that such stimuli
have stable memory representations and may allow a more efficient form of
processing. Whatever the explanation turns out to be, it will necessarily require an understanding of the stimulus domain being studied, not just
general processing principles.
Summary
In comparison designs for discrimination, each trial contains two stimuli,
and the decision problem can be represented by two bivariate distributions
that can be projected onto a single dimension. In two-alternative forced
choice (2AFC), S1 and S2 are presented in either of two orders, and performance is expected to be better than in yes-no. In the reminder design, a yes-no interval containing either S1 or S2 is preceded by a constant "standard" (say S1), and performance is expected to be worse than yes-no if the
observer compares the two intervals.
Accuracy in 2AFC (and thus its relation to yes-no) depends on two aspects of the two-interval design: the interstimulus interval and the range of
stimuli. Long intervals and wide ranges lower performance, and models
that are explicit about perceptual memory can account for the pattern of results in some domains. Yes-no accuracy tends to be lower, relative to both
2AFC and reminder performance, than detection theory predicts; a likely
culprit is the need to make comparisons across trials rather than across the
shorter intervals within a trial.
Chart 6 in Appendix 3 provides pointers to calculations of sensitivity in
2AFC.
Problems
7.1.  For the following stimulus-response matrixes, calculate d', the criterion c, p(c), and p(c)* assuming that the data arose from (a) a 2AFC experiment, and (b) a yes-no experiment.

      Matrix A      Matrix B      Matrix C      Matrix D
      12    8       18    2        4   16        9    6
       8   12       14    6        1   19        2    1

7.2.  Find p(c)2AFC for each matrix of Problem 7.1.

7.3.  Marsh and Hicks (1998) conducted both yes-no and 2AFC experiments on source monitoring. In the study phase, participants saw some words and generated others by rearranging anagrams.
(a) In one yes-no task, words of both types were presented, and the possible responses were "seen" and "not seen"; the results were H = .66, F = .18. In a second yes-no task, the possible responses were "generated" and "not generated"; the results were H = .84, F = .37. Find d' and c for both conditions. How do the two tasks differ?
(b) In a 2AFC task, two words were presented on each test trial, one Seen and one Generated. In one version, participants were asked to choose the one that was generated, and p(c) was .83 (separate hit and false-alarm rates are not reported). How does this compare with detection theory predictions?
(c) In a second version of 2AFC, participants were asked to choose the word that was seen. This time p(c) equaled .7. How would you account for the discrepancy between the two 2AFC tasks?

7.4.  In a 2AFC recognition memory experiment, the participants correctly identify both Old and New items at the same rate, .8. (a) Predict da in a yes-no experiment with the same stimuli, assuming s = 0.5; assuming s = 2. (b) Predict d'2 in a yes-no experiment with the same stimuli, assuming s = 0.5; assuming s = 2.

7.5.  You conduct three intensity-discrimination experiments with the same observer, using the same stimulus pair for each. The first experiment uses a 2AFC paradigm, the second a reminder paradigm, and the third a yes-no task; the goal is to figure out what strategy the observer is using in the reminder task. What would you expect the data to be if (a) the observer is using a differencing strategy in each condition; (b) the observer is using a decisionally separable strategy in each; (c) the observer is using an optimal strategy in each?


8
Classification Designs:
Attention and Interaction

In a classification design, a number of stimuli are sorted into a smaller or
equal number of categories. When introducing this type of experiment in
chapter 5, we restricted the discussion to sets of stimuli that differed on a
single internal dimension, but we now abandon that limitation and examine
paradigms in which the stimuli lead to representations that differ multidimensionally. Proceeding gently, we consider apparently simple problems
in which just three or four stimuli must be classified into only two categories. This project turns out to be sufficiently challenging for one chapter.
This set of problems has both methodological and substantive applications. Methodologically, there is a set of discrimination paradigms that can
be thought of as classification tasks. Recall that the comparison designs of
chapter 7 always lead to a representation with only two distributions. As a
result, although they can be modeled in two dimensions, they can also be
analyzed by projecting the bivariate distributions onto a single axis and conducting a unidimensional calculation. As long as there are only two stimulus classes, and thus only two distributions, the projection strategy always
works. This simplification cannot be made for classification paradigms, and
in chapter 9 we use the tools developed here to analyze them.
Substantively, classification designs are extensively used to study two important topics: (a) independence versus interaction between two aspects of a stimulus, and (b) attention. The independence question was the first, historically, to which multidimensional detection theory was applied (Tanner, 1956), but the idea of independence turns out to be multifaceted. In chapter 6, we encountered the concept of perceptual independence—a characteristic of the representation of a single stimulus or stimulus class. An analogous concept applies to stimulus sets; this was Tanner's focus, and
many other paradigms have been introduced more recently to explore the
presence or absence of interaction in this sense.
Classification experiments are more complex than discrimination designs in that they require grouping multiple stimuli together (i.e., assigning them the same response). Does this structural complexity have a corresponding cognitive cost? We discuss three classification paradigms in which attention has been invoked by theorists. If several distinct stimuli occur that require the same response, we refer to the design as one of uncertainty about which of these stimuli will occur. If the response partition is such that some aspects of the stimulus set must be appreciated and others ignored, attention is selective; if all aspects are relevant, attention must be divided among dimensions or features.
A critical distinction is that between extrinsic and intrinsic attentional
limitations (Graham, 1989). Extrinsic uncertainty is inherent in the situation, whereas intrinsic uncertainty is internal to the observer. It is essential
to find the extrinsic difficulty of a classification design so that poor performance that is in fact inevitable is not blamed on the experimental participant's inefficiency. Most of this chapter concerns models for extrinsic
uncertainty, which are useful for establishing a performance baseline.
One-Dimensional Representations and Uncertainty
Multiple distributions may lie on a single axis, of course. We explored such
examples in chapter 5, and we begin this chapter with some unidimensional
problems that are special cases of true multidimensional designs.
Inferring a One-Dimensional Representation
There are no one-dimensional stimuli: Every perceptual and cognitive object has multiple characteristics. But stimulus classes can be represented
one-dimensionally if they differ from each other in only one way that is relevant to judgment. The projection strategy of chapters 6 and 7 is a method for
mapping two complex stimuli onto a single decision axis. How can we
know if this is appropriate for more than two stimuli?
As discussed in chapter 5, the additivity of d' permits a simple test of one-dimensionality: If three stimuli lead to a one-dimensional representation, as in Fig. 8.1a, then d'13 = d'12 + d'23. […] The distribution <Signal, Noise> is the distribution due to Signal on Dimension 1 and Noise on Dimension 2; the notation is more explicit than referring to this distribution simply as S1 (which we continue to do when no ambiguity is possible).

Three two-stimulus discrimination experiments can be constructed from the three stimuli: detection of S1 (i.e., discrimination of S1 vs. the Noise stimulus), detection of S2, and recognition of S1 versus S2. The results of these tests can be used to decide whether the S1 and S2 dimensions are orthogonal. Nonorthogonality implies (for normal distributions) a correlation between the dimensions; if the dimensions intersect at an angle θ, the correlation equals cos(θ). The results of the three experiments can be used to esti-


FIG. 8.2. Decision space showing distributions for the Null stimulus and two stimuli differing from it, each along a different dimension. The angle between the axes measures the dependence between the two dimensions.

mate θ. The relation between recognition performance (d'12) and detection sensitivity (d'1 and d'2), measured in separate sessions, can be calculated from the geometry of Fig. 8.2:

(d'12)² = (d'1)² + (d'2)² − 2 d'1 d'2 cos θ .    (8.3)

The criterion for deciding which response to give in recognition divides the line joining the means of the S1 and S2 distributions. The particular criterion shown in Fig. 8.2 is halfway between the two distributions. Because the values for d'1 and d'2 are not equal in the figure, the criterion line does not pass through the origin, the mean of the Noise distribution.

Equation 8.3 covers all possible relations between pairs of imperfectly detectable stimuli. In one important special case, the alternative stimuli produce independent effects, which are said to require independent sensory channels, a metaphor introduced by Broadbent (1958). In that case, the axes are orthogonal so that θ = 90°, cos(θ) = 0, and

(d'12)² = (d'1)² + (d'2)² .

We derived the same equation for compound detection of orthogonal stimuli in chapter 6 (Eq. 6.9).


Values of θ less than 90° arise from overlap between the channels' regions of sensitivity—a signal that activates one maximally also activates the other to some extent. Angles of θ greater than 90° might arise from inhibition between the separate perceptual or sensory channels (Graham, Kramer, & Haber, 1985; Klein, 1985).

When θ = 0°, we are back in the unidimensional world of the previous examples, where pairwise d' values add: cos(θ) = 1.0, so d'12 = d'1 − d'2. When θ = 180°, another one-dimensional case, the distance between the two Signals in the recognition task is the sum of the individual detectability values. This is the well-known city-block metric, first described by Shepard (1964) for the scaling of similarity judgments. For a discussion of the range of application of this metric, see Nosofsky (1984).
In his own experiments, Tanner found that dimensional orthogonality held when tones were sufficiently different in frequency, but that θ was less than 90° when they were similar. The result is consistent with the "critical-band" hypothesis, according to which auditory inputs are divided into channels according to frequency. Tanner's approach offers a convenient summary of the data in geometric terms, but it has a shortcoming: The three experiments result in three values of d'. This is just enough data to determine the internal angles of the triangle in Fig. 8.2 (by the side-side-side theorem of geometry, which underlies Eq. 8.3), but does not provide any internal test of validity (Ashby & Townsend, 1986). Later in the chapter, we shall see how the addition of just one more stimulus can give us more confidence in the representations inferred from data like these.
Example 8d: Item and Source Recognition for Words
In a typical recognition memory experiment, participants are asked whether
test items were on a study list they saw earlier. In real life, a question just as
important as whether an item can be recognized is whether its source can be
identified: Did I see this face yesterday at work or yesterday on TV? at the
scene of the crime or in the police station? To simulate this problem in the
laboratory, two lists are presented for study, and tests can be of two kinds:
item recognition (was this presented on a study list?) and source identification (which list was it on?).
Banks (2000) pointed out that these two tasks are analogous to the detection and recognition problems in Tanner's (1956) model. The question Tanner posed is important in this application: Do the three distributions fall on a
single dimension or are two dimensions required? A single dimension


would be appropriate if all judgments were based on "familiarity," a variable commonly thought to underlie item-recognition decisions. Banks presented his participants with a visual list of words on a computer monitor and an auditory list via loudspeakers. ROC curves were collected, and da in item recognition was found to be 1.55 for the visual list and 1.63 for the auditory list. Clearly, if the same dimension were also responsible for source identification, we would expect da = 1.63 − 1.55 = 0.08. In fact da was 1.59, implying a representation like Fig. 8.2, with θ = 59°. Furthermore, as Banks pointed out, the implied decision axes for item recognition (with list membership "uncertain") and source recognition are perpendicular to each other (as is approximately true in Fig. 8.2). This result is roughly consistent with the idea that source identification depends on a (probably conscious) recollection process, whereas item recognition depends primarily on a (probably unconscious) familiarity judgment.
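Equation 8.3 makes this kind of inference easy to check numerically. The sketch below (Python; our illustration, not part of the original text) inverts the law-of-cosines relation to recover θ from three sensitivity estimates, here using Banks's values; the function name is ours.

from math import acos, degrees

def angle_between_dimensions(d1, d2, d12):
    """Invert Eq. 8.3: given the two detection sensitivities and the
    recognition sensitivity, return the angle between the axes (degrees)."""
    cos_theta = (d1**2 + d2**2 - d12**2) / (2 * d1 * d2)
    return degrees(acos(cos_theta))

# Banks (2000): item recognition d_a = 1.55 (visual) and 1.63 (auditory);
# source identification d_a = 1.59.
print(angle_between_dimensions(1.55, 1.63, 1.59))  # ~60 degrees, close to the 59 in the text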
Perceptual Separability and Integrality
Tanner's model captures the idea of interaction, and the degree of interaction (as measured by θ) maps naturally onto concepts in psychoacoustics, recognition memory, and other fields. As noted earlier, however, an important limitation of the model is the use of only three stimulus classes and thus three distributions. The data consist of three values of d', and the representation is a triangle, the length of each leg equal to a d'. Except in extreme cases, for which the triangle inequality is violated (one value is greater than the sum of the other two), the model is guaranteed to work.
Expanding the stimulus set overcomes this technical limitation and allows for the study of interesting substantive questions. Many stimulus sets can be constructed by varying two or more dimensions: height and width to make rectangles, the first and second formants to make vowels, contrast and spatial frequency to make gratings, and so on. In the resulting stimulus sets, every value of one dimension can (in principle) combine with every value of the other. The smallest set of stimuli that has this property is built from two values on each of two dimensions; the stimuli can be denoted S11 (value 1 on both x and y), S12 (value 1 on x and 2 on y), S21, and S22.
Such sets have been studied extensively by Garner (1974) and his colleagues, with the intent of distinguishing "integral" pairs of dimensions (which interact) from "separable" ones (which do not). Garner proposed a series of classification tests to distinguish these possibilities operationally; all of them are discussed in this chapter, and we return to an evaluation of the


"Garner paradigm" in a later section. For now we simply adapt his terminology for use in our psychophysical context. To avoid confusion, we follow
General Recognition Theory (Ashby & Townsend, 1986) in using the terms
perceptual integrality and perceptual separability for characteristics of representations, and the Garner terms integrality and separability in his
original operational senses.
Perceptual separability (Fig. 8.3a) is defined by a rectangular arrangement of distribution means; in this case, a change on one dimension has no effect on the value of the other. In perceptually integral cases (Fig. 8.3b), the two dimensions are correlated so that a change on one is at least partly confusable with a change on the other. Such representations display mean-shift (or just mean-) integrality (Kingston & Macmillan, 1995; Maddox, 1992) because the means of the distributions are shifted compared to the perceptually separable case. In Fig. 8.3b, lines connecting the means are drawn, and the angle θ is a measure of how integral the two dimensions really are. Notice that these

FIG. 8.3. (a) A perceptually separable arrangement of distributions due to four stimuli. (b) A mean-integral arrangement of the four distributions; the angle θ measures the degree of mean integrality.


concepts refer to sets of distributions, whereas the idea of perceptual (in)dependence introduced in chapter 6 refers to a single distribution.
The use of four stimuli produces constraints among the fixed, two-stimulus tasks needed to generate the representation: There are six values of
d' constrained by only five degrees of freedom. The four outside segments
in Fig. 8.3, plus one diagonal, force the value of the other diagonal. Predictions about classification tasks can also be made from such a representation,
as we shall see shortly.
Two-Dimensional Models
for Extrinsic Uncertain Detection
Example 8e: The Uncertainty Design in Multimodal Detection
Bonnel and Miller (1994) asked observers to detect a change in background that, on different trials, was unpredictably an increment in either the luminance of a spot or the intensity of a tone. The research question was whether uncertainty would lower performance compared with control conditions in which the modality to be attended to was known in advance. This basic design was earlier used within a modality, with tones of different temporal frequencies (Creelman, 1960; Green, 1961) or gratings of different spatial frequencies (Davis & Graham, 1981). In all these studies, the uncertainty design was used as a tool for exploring sensory channels.

Bonnel and Miller assumed there was no interaction between their visual and auditory stimuli, and that the representation was thus perceptually separable, as illustrated in Fig. 8.4. The locations of the distribution means for the visual (S1) and auditory (S2) distributions are the d' values (1.5 and 2.0) found in the control conditions in which each increment was discriminated from no change (N). The uncertainty task requires that observers establish a decision boundary in the space of Fig. 8.4 that accurately assigns stimuli S1 and S2 to one response and N to the other. How should this be done?
Summation Rule
Although the representation sports three distributions in two dimensions, it is still possible to reduce the decision problem to one dimension using the projection technique. Just as in the compound detection case of chapter 6, the observer might base a decision on total subjective intensity, which is greater for points farther out into the upper right quadrant along the decision axis y = x. Possible decision boundaries consistent with this rule are all per-


FIG. 8.4. A representation of an uncertain detection experiment in which a visual stimulus S1, an auditory stimulus S2, or no stimulus (N) may occur, and the auditory and visual dimensions are independent.

pendicular to this line, as shown in Fig. 8.5. When the S1 and S2 distributions are projected onto the decision axis, the means are no longer 2 and 1.5 units from the N mean, but instead (using the Pythagorean theorem as usual) 2/√2 = 1.41 and 1.5/√2 = 1.06 units away. Choosing a criterion location halfway between the means of N and S2 places it at .71 units from the N mean. The hit rates are therefore Φ(0.71) = .761 for S2 and Φ(0.36) = .641 for S1; the false-alarm rate is Φ(−0.71) = .239. If Noise is presented on half the trials and each Signal on one quarter, then p(c) = (.5)(.761) + (.25)(.641) + (.25)(.761) = .73. The model predicts a drop due to uncertainty from p(c) = .77 in visual detection and p(c) = .84 in auditory detection to an overall level of p(c) = .73.
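The projection arithmetic is compact enough to verify directly. Here is a minimal sketch (our code, assuming unit-variance normal distributions and the criterion placement just described; scipy supplies the normal cdf):

from math import sqrt
from scipy.stats import norm

d1, d2 = 1.5, 2.0                      # control-condition d' values for S1 and S2
m1, m2 = d1 / sqrt(2), d2 / sqrt(2)    # means projected onto the y = x axis
k = m2 / 2                             # criterion halfway between N and S2

hit_s2 = norm.cdf(m2 - k)              # .761
hit_s1 = norm.cdf(m1 - k)              # .641
false_alarm = norm.cdf(-k)             # .239

p_c = 0.5 * (1 - false_alarm) + 0.25 * hit_s1 + 0.25 * hit_s2
print(round(p_c, 2))                   # 0.73, the predicted uncertain-detection level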
The summation rule is a natural one.¹ Furthermore, it resembles the optimal strategy for detecting compound stimuli that we developed in chapter 6. It is clear, however, that Bonnel and Miller's observers did not use this rule because their performance turned out to be better than the rule predicts. Recall that models of extrinsic uncertainty give the best performance possible, and intrinsic uncertainty can only lower observer accuracy. When extrinsic models are outpaced in practice, they are wrong.
Independent-Observation Rule
Bonnel and Miller's observers were not using a straight-line boundary, but perhaps they employed another relatively simple rule: Compare the observation to criteria on each dimension independently, and say "yes" if either criterion is exceeded.

¹A slightly different summation rule, in which the decision boundary is parallel to a line passing through the S1 and S2 distribution means, yields slightly better performance.


FIG. 8.5. The uncertain detection experiment of Fig. 8.4 and an integrative (total intensity) decision rule. The decision boundary permits all distributions to be projected onto a single dimension, shown at the lower right, and the observer responds "yes" if the total intensity exceeds a criterion.

We also encountered this "minimum" rule in the compound detection problem of chapter 6. Continuing the present example, suppose the observers placed criteria perpendicular to the x and y axes at unbiased locations, 0.75 units along x and 1 unit along y, and responded "yes" if their joint observation exceeded either criterion. This leads to the two-segment rectilinear decision boundary shown in Fig. 8.6a.
It is easiest to calculate P("no"|each possibility) because a "no" response can only be made if the observation is below both criteria, so the product rule introduced in chapter 6 applies. For the N distribution, the probability of a correct rejection, P("no"|N), therefore equals Φ(1)Φ(0.75) = (.841)(.773) = .650. Applying the same logic to the non-null stimuli gives us the "miss" rates for each modality: For visual stimuli, P("no"|S1) = Φ(−0.75)Φ(1) = (.229)(.841) = .193; for auditory stimuli, P("no"|S2) = Φ(0.75)Φ(−1) = (.773)(.159) = .123.

[…] The integration rule projects all four distributions—<N,N>, <S,N>, <N,S>, and <S,S>—onto a single decision axis. To find the
response probabilities, we need to know only the distances along this line corresponding to the means of the various distributions. In the figure, the distance from the mean of <N,N> to the criterion is k and that to the mean of <S,S> is d+. By geometry, if the distance to the mean of <S,N> is d, the distance to the mean of <N,S> is d+ − d. Thus,


P("yes"|<N,N>) = Φ(−k)
P("yes"|<S,N>) = Φ(d − k)
P("yes"|<N,S>) = Φ(d+ − d − k)
P("yes"|<S,S>) = Φ(d+ − k) .    (8.7)

The interrelation that follows from these equations does depend on the assumption of normal distributions. Denoting the z score corresponding to Pi by zi, it is

z1 + z2 = z12 + zN ,    (8.8)

where the subscripts 1, 2, 12, and N refer to the <S,N>, <N,S>, <S,S>, and <N,N> trials, respectively.

Mulligan and Shaw (1980) applied this approach to the problem of bimodal (auditory and visual) detection and found the independent-observation predictions (Eqs. 8.6) supported over the integration prediction (Eq.
8.8). Shaw reached the same conclusion in her analyses of experiments on
visual detection and Bayesian decision making. The relatively firm preference for one type of model over the other does not depend simply on a comparison of d' values or other performance measures, but on finer, structure-revealing aspects of the data (Fidell, 1982; Shaw & Mulligan, 1982).
That the predictions are to some degree nonparametric is another advantage
of Shaw's approach.
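Both predictions reduce to simple checks on a set of four "yes" rates (null, each signal alone, both signals). The sketch below is ours, with hypothetical rates: independent observations with fixed criteria imply that the "no" probabilities obey a product rule, whereas integration implies Eq. 8.8.

from scipy.stats import norm

def independence_gap(p_null, p1, p2, p12):
    """Independent observations imply P("no"|1)P("no"|2) = P("no"|null)P("no"|both).
    Returns the difference between the two sides (0 if the rule holds)."""
    return (1 - p1) * (1 - p2) - (1 - p_null) * (1 - p12)

def integration_gap(p_null, p1, p2, p12):
    """Integration prediction (Eq. 8.8): z1 + z2 should equal z12 + zN."""
    z = norm.ppf
    return (z(p1) + z(p2)) - (z(p12) + z(p_null))

# Hypothetical "yes" rates on null, signal-1, signal-2, and both-signal trials:
rates = (0.20, 0.80, 0.60, 0.90)
print(independence_gap(*rates))   # 0.0: consistent with independent observations
print(integration_gap(*rates))    # ~0.65: inconsistent with integration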
Selective and Divided Attention Tasks
Is the uncertain detection task a "selective" or "divided" attention design? Recall that in selective tasks the goal is to attend to one dimension and ignore others, whereas in divided tasks attention to both dimensions is necessary. The uncertain-detection task can be viewed either way depending on the model assumed: The one-dimensional "intensity" model (Fig. 8.5) treats attention as selective, in that the observer must attend to subjective intensity and ignore characteristics, like modality, that distinguish stimuli S1 and S2. The corner and optimal models (Fig. 8.6), however, appear to be strategies for dividing attention.
Selective and divided attention are easier to distinguish operationally
with four-stimulus sets. There are three ways in which four elements can be
partitioned into two equal parts, two of these being examples of selective attention and one of divided. We consider these in turn, following an analysis
presented by Kingston and Macmillan (1995) for speech discrimination
experiments.


Selective Attention
Figure 8.8a displays a perceptually separable representation, as in Fig. 8.3a. In one selective attention task, observers are instructed to respond strictly on the basis of the x variable, assigning one response to S11 and S12, the other to S21 and S22. A decisionally separable boundary—the vertical line in the figure—is optimal, and the distributions project onto a single (horizontal) axis. Performance is just as good as if only the two distributions S11 and S21 were being discriminated, so the model predicts that for separable dimensions there is no performance deficit due to filtering, as the selective task is sometimes called. An analogous task for selective attention to the vertical dimension is analyzed in the same way.

FIG. 8.8. Representation for selective attention task in which S11 and S12 are assigned to one response and S21 and S22 to the other. (a) Perceptual separability, and (b) mean integrality.


The mean-integral arrangement in Fig. 8.8b requires a different boundary. This is the kind of problem for which the likelihood ratio analysis of response bias (see chaps. 2 and 4) is essential. When two different distributions correspond to the same response, the likelihood of an observation due to either of them is the sum of their likelihoods—an example of the additive rule for combining probabilities. For the representation in Fig. 8.8b, this is true of both stimulus subsets. The boundary shown in the figure connects all points for which the likelihood of either S11 or S12 is the same as the likelihood of either S21 or S22—that is, for which the likelihood ratio is 1. It may seem surprising that the optimal boundary has this curved shape, rather than being parallel to lines connecting the means, but some insight can be gained by considering points far above S12 and S22. In this region, the S11 and S21 distributions matter little, so the boundary must be perpendicular to the Dimension 1 axis.
The attention question is how performance in the task sketched in Fig. 8.8b compares to performance with just stimuli S11 and S21. Can an observer do as well as in the baseline two-stimulus control condition, or is there a "filtering loss," that is, a deficit due to the additional stimuli? Our simple methods of calculating proportion correct fail us here—numerical integration is needed—but performance is indeed lower than for the baseline task. The magnitude of the drop depends on θ, the degree of integrality (see Fig. 8.3b). Larger declines arise as θ nears 0° or 180°. For example, if d' = 2 for all one-dimensional comparisons, so that baseline p(c) = .84, then predicted p(c) = .82 if θ = 60° and .78 if θ = 30°.
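These filtering-loss figures require integrating bivariate normal distributions over a curved region, so a Monte Carlo approximation is the quickest check. The code below is a sketch under our own assumptions (unit-variance circular normal distributions, a parallelogram of means set by θ, and the optimal summed-likelihood boundary); it is not the authors' calculation.

import numpy as np

def filtering_pc(d=2.0, theta_deg=60.0, n_per_stim=100_000, seed=1):
    """Monte Carlo p(c) for selective attention with mean-integral dimensions."""
    rng = np.random.default_rng(seed)
    t = np.radians(theta_deg)
    means = {"S11": (0.0, 0.0), "S21": (d, 0.0),
             "S12": (d * np.cos(t), d * np.sin(t)),
             "S22": (d + d * np.cos(t), d * np.sin(t))}
    group_a = ("S11", "S12")           # one response
    group_b = ("S21", "S22")           # the other

    def mixture_lik(x, group):
        # Sum of (unnormalized) bivariate normal likelihoods over a group
        return sum(np.exp(-0.5 * ((x[:, 0] - means[s][0]) ** 2 +
                                  (x[:, 1] - means[s][1]) ** 2)) for s in group)

    n_correct = 0
    for stim, mu in means.items():
        x = rng.normal(size=(n_per_stim, 2)) + np.array(mu)
        says_a = mixture_lik(x, group_a) > mixture_lik(x, group_b)
        n_correct += np.sum(says_a == (stim in group_a))
    return n_correct / (4 * n_per_stim)

print(filtering_pc(theta_deg=60))   # ~.82, versus the .84 baseline
print(filtering_pc(theta_deg=30))   # ~.78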
Divided Attention
To force attention to both dimensions, the observer is required to assign stimuli S11 and S22 to one response, S12 and S21 to the other. An optimal strategy for doing this in a perceptually separable representation is shown in Fig. 8.9. The observer divides the decision space into four quadrants and gives one response for the NE and SW regions, the other for NW and SE. It is clear that this strategy has no equivalent one-dimensional model, but is the optimal strategy good enough to prevent a performance decline?

To analyze this perceptually separable case, we denote the discriminability of S11 and S21 by d'x and that of S11 and S12 by d'y. Because of the assumed symmetric criteria, proportion correct is the same for all four stimuli, so we need to consider only one of them, say stimulus S12. The observer makes a correct response to this stimulus if the observation falls in either the


FIG. 8.9. Representation for divided attention task in which S11 and S22 are assigned to one response and S12 and S21 to the other.

upper left or lower right quadrant, and we can calculate the probabilities of each of these events using the product rule from chapter 6:

p(c) = Φ(d'x/2)Φ(d'y/2) + Φ(−d'x/2)Φ(−d'y/2) .    (8.9)

If d' = 2 on both dimensions, so that baseline p(c) = .84, these terms are (.84)² = .706 and (.16)² = .026, for a sum of .732. For d' = 1, the decline is from .69 to .572. Clearly the divided attention task is, extrinsically, quite a difficult one.
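Equation 8.9 takes one line to evaluate; this small sketch (our code) reproduces the numbers in the text.

from scipy.stats import norm

def divided_attention_pc(dx, dy):
    """Eq. 8.9: p(c) with symmetric criteria at dx/2 and dy/2."""
    ax, ay = norm.cdf(dx / 2), norm.cdf(dy / 2)
    return ax * ay + (1 - ax) * (1 - ay)

print(round(divided_attention_pc(2, 2), 3))  # 0.733 (the text's rounded arithmetic gives .732)
print(round(divided_attention_pc(1, 1), 3))  # 0.573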
We do not discuss the mean-integral case in detail. The optimal decision boundary is constructed by combining two curves like the one in Fig. 8.7b. The interesting result is that performance is relatively unaffected by θ over its entire range.
The Garner Paradigm for Assessing Interaction
We are now in a position to consider the complete Garner paradigm. Garner
(1974) argued that determining whether two dimensions interact should not
rely on a single test, but on "converging operations." In typical experiments
by Garner and his colleagues, the two dimensions are sampled at two points


each, as in the last few examples. Separability is defined by no filtering loss, that is, selective attention equal to baseline performance, and no "redundancy gain," for example, the ability to distinguish S11 and S22 being the same as the ability to distinguish S11 and S12. Integrality is the opposite pattern, both a filtering loss and a redundancy gain. Divided attention is not always included and is not considered diagnostic in distinguishing integrality and separability.
Does the perceptual-space model agree with Garner's definitions? Both
approaches agree that integrality is associated with filtering loss, separability with no loss. As for redundancy gain, the parallelogram model predicts
this effect for all arrangements if optimal decision rules are used, but can
predict no gain in the separable case if decisional separability is assumed
(see chap. 6). In many experiments using the Garner paradigm, participants
are instructed to attend to one dimension even in the redundant case, so it is
perhaps not surprising when redundancy gains are not found.
The analyses in this chapter provide a theoretical convergence of operations that allows for quantitative predictions of the relations among these
tasks, but there are two important limitations. The first we have just seen:
Predicted performance depends on the particular decision strategy used by
the observer. Second, detection theory applies to imperfectly discriminable
stimulus sets and the measurement of accuracy. Most Garner-paradigm
studies have instead used response time, and explicit modeling of this measure is required if quantitative predictions are to be made (Ashby &
Maddox, 1994).
Attention Operating Characteristics (AOCs)
Extrinsic models of attentional paradigms provide a useful baseline for performance: Even a substantial drop due to divided attention, for example,
can be consistent with no real limit on intrinsic attention allocation. We now
consider a detection-theoretic approach to an intrinsic concept, "paying attention." To begin, it helps to return to the problem of compound detection
introduced in chapter 6.
"Multiple-Look" Experiments
Remember that the detectability of a "compound" stimulus, for example, a simultaneous tone and light flash, is the Pythagorean sum of each component's d'. If the stimuli are equally detectable, the improvement (or redundancy gain) is a factor of √2. Now imagine a slight modification in which


the observer gets multiple "looks" at the same stimulus, say a light flash. The argument still applies, so that the detectability of a double look is √2 times the d' for a single one. In fact, the argument can be extended to any number of looks, so that 10 looks should improve d' by √10. Early research (Swets, Shipley, McKee, & Green, 1959) roughly supported this way of modeling multiple presentations, although observers were not completely efficient.

This same relation can be derived in a different way with reference to a single decision axis. Assume that the decision variable is the sum of observations (on a single dimension). Then n stimuli produce a mean difference of nd' and a variance of n (because the variance for one observation is 1), so the effective normalized mean difference is nd'/√n = √n·d'. This one-dimensional perspective allows us to easily go beyond two samples, whereas visualizing six-dimensional spaces is hard.
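A quick simulation makes the √n rule concrete (our sketch; unit-variance normal observations assumed):

import numpy as np

rng = np.random.default_rng(0)
d, n_looks, n_trials = 1.0, 10, 200_000

# Sum n_looks observations per trial under Noise and under Signal
noise = rng.normal(0.0, 1.0, (n_trials, n_looks)).sum(axis=1)
signal = rng.normal(d, 1.0, (n_trials, n_looks)).sum(axis=1)

# Effective sensitivity of the summed variable: mean difference over SD
d_effective = (signal.mean() - noise.mean()) / noise.std()
print(d_effective, d * np.sqrt(n_looks))   # both ~3.16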
Capacity and the Sample-Size Model
What has this to do with attention? Suppose that (as is postulated by many models of attention) a person has a fixed "capacity" to allocate among whatever tasks are at hand.³ For convenience, let us call this capacity T (for "total") units. As in the previous discussion, assume that as each unit is allocated it adds a fixed amount to both the mean and variance. Hypothetical performance using one unit of capacity is denoted by d'.

Consider now the uncertain detection experiment with which we began the chapter. If all attention is allocated to dimension x, performance will be √T·d' on that dimension, but 0 on dimension y. The reverse is true if all attention is allocated to y. But what if P of the T units are allocated to x and T − P to y? Then performance on x, denoted d'x, is √P·d' and d'y is √(T − P)·d'.
The model says that capacity can be allocated to one dimension only at the cost of the other, and so it describes a tradeoff between accuracy on the two tasks. When P is large, the observer will do well on Dimension x and poorly on Dimension y, whereas when P is small (so that T − P is large) the opposite will be true. The relation between x and y performance is an "operating characteristic," analogous to the receiver operating characteristic (ROC), which describes a tradeoff between hits and correct rejections.
³Most such models distinguish "controlled" tasks, which require attentional capacity, from "automatic" tasks, which do not.


To find the form of the attention (or performance) operating characteristic between d'x and d'y, we need to solve the prior expressions derived from the sample-size model for one in terms of the other. This can be done most easily in terms of the squares of the sensitivities:

(d'y)² = T·d'² − (d'x)² .

This is a circle (the usual equation is y² = r² − x²) as shown in Fig. 8.10. Rearranging the terms provides another perspective:

(d'x)² + (d'y)² = T·d'² .

The idea that squared sensitivities are added to estimate overall capacity is an old one, dating to Lindsay, Taylor, and Forbes (1968).
What would happen if participants were asked to give, say, 80% attention to x and 20% to y? They should allocate 80% of their capacity to x and operate at the point labeled (80%, 20%) on the diagram. Experiments of this type have often shown that participants not only follow a circular tradeoff function, but are also accurate at assigning the requested percentage of capacity.
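Under the sample-size model the predicted tradeoff is circular whatever the allocation; a brief sketch (ours, with arbitrary capacity values) confirms that the squared sensitivities always sum to the same total, T·d'².

import numpy as np

T, d_unit = 10, 0.8                       # total capacity units and d' per unit
for P in np.linspace(0, T, 6):            # capacity allocated to dimension x
    dx = np.sqrt(P) * d_unit
    dy = np.sqrt(T - P) * d_unit
    print(round(dx, 2), round(dy, 2), round(dx**2 + dy**2, 2))  # last value: 6.4 every time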

FIG. 8.10. Schematic representation of a hypothetical attention operating characteristic (AOC) showing joint performance (or sensitivity) in the dual-task paradigm. Solid symbols depict resource limitation in which a fixed capacity is allocated to each task alone in the single task, but is divided according to instructions in the dual task. The open triangle represents a case of independence in which neither of the dual-task components affects the other.


For some pairs of stimuli, however, no tradeoff is found. For example, Graham and Nachmias (1971) found that attention could be simultaneously
paid to gratings of two different frequencies so that the AOC consisted of
two straight line segments, as also illustrated in Fig. 8.10. This result is
strong quantitative evidence that separate perceptual "channels" are used in
processing the two gratings.
Summary
Classification experiments, in which a number of stimuli are partitioned by
the observer into a smaller number of categories, can be used to study perceptual independence versus interaction and a variety of paradigms for
measuring attention. Uncertainty about which element of a stimulus subset
is to be presented can force performance to be lower than in a corresponding
fixed-discrimination condition. Uncertainty effects occur even for
unidimensional stimulus sets if multiple criteria are used in the decision
process.
Stimuli that differ perceptually in more than one way can be represented as distributions in a multidimensional space. Sensitivity measures, such as d', are distances in such a space, and multiple experimental conditions can allow the geometric arrangement of the distributions to be determined. To determine whether two stimulus dimensions are represented independently, a set of stimuli in which both dimensions vary must be used. As few as three stimuli lead to an answer to the independence question, but a 2 × 2 set permits stronger conclusions.
Many multidimensional tasks are susceptible to either integration or independent-observation decision strategies. These can sometimes be distinguished on the basis of predicted accuracy, but more powerful methods
examine more detailed aspects of the data.
Selective and especially divided attention are usually extrinsically more difficult than the corresponding baseline tasks. The loss due to attention depends on whether the dimensions on which the stimuli vary are independent or interacting.
Detection theory can be used to quantify the idea of a limited attentional
capacity that must be allocated among various tasks. Data from experiments
in which observers are instructed to allocate attention differently can be
used to determine whether different stimulus dimensions are processed by a
single channel or separate ones.

Problems

8.1. In detection experiments for two audio frequencies, H = .78, F = .24 for a weak 1000-Hz tone and H = .72, F = .31 for a weak 1200-Hz tone. Find detection d' for both frequencies. Predict identification d' assuming that the tones are analyzed by independent channels.

8.2. What would a 2AFC identification experiment yield for p(c) if tones of frequencies 1000 Hz and 1000.5 Hz were each detectable at d' = 1.5 and (a) θ = 60° or (b) θ = 30° in Fig. 8.2?

8.3. (a) In Example 8b, on increment-decrement uncertainty, recalculate the hit and false-alarm rates assuming the criteria are located at ±0.75 rather than ±0.5 SDs from the mean of S2. Do the stricter criteria lead to better performance in the uncertain task, and thus a smaller decline due to extrinsic uncertainty? (b) Extend the calculation to plot an ROC for this task.

8.4. In the representation of Tanner's detection/recognition experiment shown in Fig. 8.2, suppose that d'1 = d'2 = d'12 = 2. These are the lengths of the three legs of an equilateral triangle in which all the interior angles are 60°. Therefore, participants could get 84% correct in any of the three tasks (S1 detection, S2 detection, and recognition). But what if they only consider Dimension 1 in making their decisions? That is, they have decisionally separable boundaries perpendicular to Dimension 1 or (equivalently) project all three distributions onto Dimension 1? What is percent correct for the three tasks in that case?

8.5. In the illustration of mean integrality (Fig. 8.3b), what would happen if θ = 0° or 180°?

8.6. Redo the uncertain-detection example (Figs. 8.4-8.6) assuming d' = 1 on both dimensions. (a) For both the summation and independent-observation rules, find p(c) for the uncertain condition and compare it with the fixed condition. (b) This is not the best possible performance for the independent-observation rule. How well can the participant do if the two criteria go through the means of the S1 and S2 distributions rather than halfway between those means and the origin?

8.7. On each trial of a detection experiment, an auditory signal can be presented to the listener's left earphone (SL), right earphone (SR), both, or neither. Two observers produce the following data:

                 Observer 1          Observer 2
Signal          "Yes"    "No"       "Yes"    "No"
Null              8       32          17       23
SL               32        8          32        8
SR               24       16          24       16
Both             36        4          36        4

For each listener, determine whether an integration or independent-observation strategy is being used.
8.8. Redo the example in Figs. 8.8a and 8.9 assuming d'x = d'y = 1. That is, predict performance in selective and divided attention assuming that this is the accuracy level in the fixed, baseline tasks.
8.9. In Fig. 8.8b, the optimal decision rule cannot be interpreted as a projection on a single decision axis, but there is a simple nonoptimal rule that can: The decision axis could be parallel to a line connecting the means of S11 and S21. Suppose all d' values for one-dimensional comparisons equal 2, θ = 45°, and the decision criterion goes through the middle of the parallelogram. Analyze the problem in this one dimension and calculate p(c).
8.10. The divided attention problem (Fig. 8.9) can also be reduced to one dimension. Assume the same representation as in Problem 8.9, but with a decision axis parallel to the line connecting the means of S12 and S21 and three response regions (as in Fig. 8.1b). What is p(c)?
8.11. Participants study a list of words. There are two test conditions. One is standard: Single words are presented, and the participant says "yes" or "no." In the "expanded" condition, four words are presented on each trial, either all Old words or all New words. If p(c) is .75 in the one-word condition, what do you predict it will be in the four-word condition? (Assume unbiased responding.)


9
Classification Designs
for Discrimination

We return again to designs for studying discrimination. The tasks described to this point—yes-no with or without a rating response or a reminder, and 2AFC—provided the experimental cornerstone for detection
theory in psychology. They are natural paradigms for studying the detection of weak signals and, as we have seen, are simply related to each other
on theoretical grounds.
Each design, however, has shortcomings. The failure of the predicted relation between yes-no and 2AFC (Eq. 7.2) led us to the suspicion that participants are limited in the one-interval task by imperfect memory. Two-alternative forced choice, which survives this criticism, is subject to another: In some applications, the task is difficult to describe to participants.
Observers in 2AFC are instructed to "choose the picture you think you have
seen before" or "choose the interval that contained a tone added to the noise
background." The dimension of judgment—recency and "tone-ness," in
these examples—is made explicit. But observers may not share the experimenter's definition of the dimension being judged, and may even be able to
distinguish the stimuli without having names for them at all. Many listeners
in simple auditory tone-detection experiments, for example, discover that
"tone-ness" is not, introspectively, the basis for judgment: The experience
of a very weak stimulus is not a small version of a more intense one, but participants usually learn to respond appropriately with training.
Frequently, the problem of describing the dimension on which the stimuli
differ is not so readily solved. Sometimes the physical dimension is difficult
to characterize for participants; the experimental design precludes training;
or participants are unsophisticated, and forced-choice instructions are difficult to convey. We now discuss three participant-friendly designs that seem

well suited to such situations: same-different, ABX, and oddity. These tasks
have been used in experiments with animals, unsophisticated participants by
most standards, and in human studies in which the differences among stimuli
are difficult to describe. Our examples illustrate these applications: We consider people categorizing visual objects, animals discriminating visual
shapes, and people discerning subtle differences among wines.
The cost of using these accessible designs is borne by the experimenter,
for they are not psychophysicist-friendly. The "comparison" tasks discussed in chapter 7—2AFC and reminder—assumed two distributions, one
for each of the possible stimuli (or stimulus sequences). These distributions
were represented in a two-dimensional perceptual space, but the optimal
strategy could be displayed in one dimension by an appropriate projection.
The discrimination designs in this chapter require classification—that is,
there are more possible stimulus sequences than responses. The attention
designs analyzed in chapter 8 can be adapted with only minor modification
to describe same-different and ABX (matching to sample). Oddity requires
a slightly different classification analysis.
Same-Different
Example 9a: Semantic Judgments of Pictures
Irwin and Francis (1995a) explored the perception of line drawings of objects that were either natural (e.g., alligator, leaf) or manufactured (e.g., various tools). Pairs of such objects were briefly presented, and the observers had to say whether they belonged to the same or different categories. Thus, the correct response for a pair containing one natural and one manufactured object was "different," whereas for a pair of two natural (or two manufactured) objects it was "same."

Letting S1 and S2 denote the natural and manufactured stimuli, there are four possible pair types: <S1,S1>, <S1,S2>, <S2,S1>, and <S2,S2>. The participant has only to respond "same" or "different" and need not know or be able to articulate the ways in which the stimuli actually differ. The results can be summarized in a 2 × 2 table as in earlier chapters, but with new labels for the rows and columns. Here are some possible data:
                                  Response
Stimulus Pair              "Different"    "Same"
<S1,S2> or <S2,S1>              30           20
<S1,S1> or <S2,S2>              10           40


Hit and false-alarm rates can be defined in a natural way:

H = P("different" | <S1,S2> or <S2,S1>)
F = P("different" | <S1,S1> or <S2,S2>) .    (9.1)

We assume that presentations of the two kinds of Same trials and the two kinds of Different trials are equally likely. How can we estimate d' for data of this sort?¹
Representation
To appreciate the peculiarity of the same-different task, consider its underlying distributions, shown in Fig. 9.1. As with 2AFC, the two dimensions are the two intervals of the task, and every point in the two-dimensional space represents a possible outcome of a trial. For each interval, the mean given S1 is 0 and the mean given S2 is d', so that d' is the distance between the means of any two distributions differing along just one axis. The four possible stimulus sequences generate four probability distributions in the space. If the stimulus sequence is <S2,S1>, for example, the observer's observation is drawn from the distribution at the lower right. Our task is to estimate d', the original normalized distance between the means of the S1 and S2 distributions, a sensitivity statistic that characterizes only the stimulus pair, not the method.

FIG. 9.1. Decision space for the same-different experiment. The effects of the two observations are combined independently. The unbiased decision rule is to respond "different" in the shaded area.

¹Because we know of no Choice Theory models for the tasks described in this chapter, we consider only SDT models.


We explore two decision rules based on this representation: an independent-observation and a differencing rule. Both are special cases of rules we
developed for divided attention in chapter 8, and the independent-observation rule is again the optimal one. However, we shall see that some experimental designs conspire against any decision maker's attempts to use this
strategy, and for those designs differencing is the best available analysis.
Independent-Observation Decision Rule
Statement of Rule. The optimal decision rule is like that used for the divided attention task: To determine which points on the plane lead to which response, a pair of criterion lines is used to partition the space of Fig. 9.1. If an observation falls either to the right of the vertical criterion line and below the horizontal one (in the lower right quadrant) or to the left and above (upper left quadrant), the response is "different"; otherwise the observer responds "same." For the <S2,S1> distribution, the proportion in this region is the hit rate; because the decision rule is symmetric, this is also the proportion correct for all other trials and for the task as a whole. The calculation of proportion correct in same-different using independent observations [p(c)SD-IO] is the special case of divided attention (Eq. 8.9) in which d'x equals d'y:

p(c)SD-IO = [Φ(d'/2)]² + [Φ(−d'/2)]² .    (9.2)

Solving for d' yields² (see Computational Appendix to this chapter)

d' = 2z{[1 + √(2p(c) − 1)]/2} .    (9.3)

Comparison With Yes-No. How difficult is this task compared with yes-no? To relate the two tasks, recall that Φ(d'/2) is the proportion correct for an unbiased participant in yes-no. Combining this identity with Equation 9.2 reveals that

p(c)SD-IO = [p(c)yes-no]² + [1 − p(c)yes-no]² .    (9.4)

Sample predictions from this equation, given in Table 9.1, clearly show that observers are expected to find same-different more difficult than the corre-

²Equation 9.3 assumes that p(c) > .5. If not, the equation cannot be used. A heuristic solution is to replace p(c) with 1 − p(c) and treat the result as a negative value of d'.


TABLE 9.1 Comparison of Yes-No Performance With Two Decision Strategies in Same-Different

                                    p(c)
                              Same-Different        Same-Different
 d'      Yes-No          Independent-Observation     Differencing
 1        .69                     .57                    .55
 2        .84                     .73                    .68
 3        .93                     .88                    .80
 4        .98                     .96                    .89
 5        .994                    .987                   .95
 6        .999                    .997                   .98

sponding yes-no task, just as they find the divided attention task quite challenging compared with baseline.

Equation 9.4 contains no explicit reference to d'; does that mean it is a nonparametric result? The requirement is that the underlying distributions be perceptually independent and that the arrangement be perceptually separable. These assumptions may or may not be correct in general, but in this application the two dimensions are the two observation intervals of a single trial. It is common to assume independence and separability in this case (although remember that non-independence was one of the reasons conjectured to account for the superiority of 2AFC and reminder over the level predicted from yes-no).
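Table 9.1 can be regenerated from the two models, to within rounding. In the sketch below (our code), the yes-no and independent-observation columns are closed-form, and the unbiased differencing observer's p(c) is found by searching over the criterion k, using the differencing-rule hit and false-alarm rates derived later in the chapter (Eqs. 9.8); scipy is assumed.

from scipy.stats import norm
from scipy.optimize import minimize_scalar

ROOT2 = 2 ** 0.5

def pc_yes_no(d):
    return norm.cdf(d / 2)

def pc_sd_independent(d):                    # Eq. 9.2
    a = norm.cdf(d / 2)
    return a**2 + (1 - a)**2

def pc_sd_differencing(d):
    # Respond "different" if |difference| exceeds k (Eqs. 9.8);
    # choose the k that maximizes unbiased proportion correct.
    def neg_pc(k):
        h = norm.cdf((d - k) / ROOT2) + norm.cdf(-(d + k) / ROOT2)
        f = 2 * norm.cdf(-k / ROOT2)
        return -0.5 * (h + 1 - f)
    return -minimize_scalar(neg_pc, bounds=(0, 20), method="bounded").fun

for d in range(1, 7):
    print(d, round(pc_yes_no(d), 3), round(pc_sd_independent(d), 3),
          round(pc_sd_differencing(d), 3))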
Threshold Analysis. Same-different data are often summarized by proportion correct, but this measure turns out to imply a threshold model in which the participant covertly classifies each stimulus into one of two categories. Let p1 and p2 be the probabilities that S1 and S2, respectively, are classified covertly by the participant as stimulus S2. The observer responds "same" whenever the two classifications agree, "different" otherwise. Then as Pollack and Pisoni (1971) have shown,

p(c)same-different = ½[1 + (p2 − p1)²] .    (9.5)

A similar analysis (see Creelman & Macmillan, 1979) reveals that proportion correct by an unbiased observer in both yes-no and 2AFC is


p(c) = ½(1 + p2 − p1) .    (9.6)

Combining Equations 9.5 and 9.6 leads to a prediction about the relation between same-different and yes-no performance, for an unbiased observer, and it is again Equation 9.4 (see Computational Appendix). Apparently, for an unbiased observer, the covert-classification and independent-observation models are the same. Discrepancies arise when observers display bias, because the ROC implied by proportion correct has the wrong shape. This familiar shortcoming of proportion correct is of even greater significance for same-different than for other paradigms we have discussed, because participants seem to naturally adopt strong response biases in same-different experiments. In particular, a preference for "same" is commonly observed for hard-to-discriminate stimuli, which are perforce perceived to be the same on many trials.
Response Bias. The participants in Example 9a display just such a preference for "same" over "different" responses, implying that the criterial value of likelihood ratio is some value greater than 1.0. Figure 9.2 shows how the decision space is divided up by an observer who is biased toward "same" so that an observation must be at least twice as likely to come from a Different trial to evoke a "different" response.

It is possible to convert the representation of Fig. 9.2 to a one-dimensional one. In this strategy, described by Irwin, Hautus, and Francis (2001), the decision axis is the likelihood ratio βi,
FIG. 9.2. Decision space for the same-different experiment. The decision rule is to respond "different" in the shaded area; this observer is biased toward "same."


and a Same and a Different distribution are constructed on that axis. (Neither distribution is Gaussian in shape.) The height of the Same distribution for a specific value of βi is the sum of the heights of the <S1,S1> and <S2,S2> distributions in Fig. 9.2 for which βi has that value, and the height of the Different distribution is the sum of the heights of the <S1,S2> and <S2,S1> distributions over points for which βi has that value. One measure of response bias is simply βi, and another is the criterion location on the decision axis, denoted ci and equal to ln(βi)/d'.
Figure 9.3 shows isobias curves for both of these measures, and it is immediately clear that they bear family resemblances to c and β, the corresponding statistics for the yes-no design introduced in chapter 2. The criterion location measure again behaves more regularly than likelihood
FIG. 9.3. Isobias curves for criterion location ci (panel a) and likelihood ratio βi (panel b) according to the independent-observation model. (Adapted from Irwin et al., 2001, Figure 4, with permission from the author and publisher.)


ratio. Empirical isobias curves for visual (Irwin, Hautus, & Francis, 2001) and auditory discrimination (Hautus & Collins, 2003) favor ci over βi.

Calculating these measures is somewhat onerous, and for purposes of comparing experimental conditions it is tempting to adopt the strategy of simply using the yes-no formulas. A defense for this approach is the similarity between the curves in Fig. 9.3 and those in Fig. 2.7 for the analogous indexes. For the current example, the statistic c (Eq. 2.1) equals −0.5[z(H) + z(F)] = −0.5(0.253 − 0.842) = 0.294. The likelihood ratio β, the ratio of the normal ordinates at z(H) and z(F), equals φ(0.253)/φ(−0.842) = 1.380. For comparison with biases observed when d' is higher or lower, the criterion measure can, as before, be normalized by dividing by z(H) − z(F): c' = 0.294/1.095 = 0.268. All show that there is some bias toward saying "same" in these data.
ROC Curves. By systematically varying the critical value of likelihood ratio and calculating H and F for each value, we can trace out a same-different ROC. The important characteristic of such curves is that they are approximately straight lines with unit slope on normal coordinates, so that z(H) − z(F) does not change with criterion. This result allows a simple strategy for finding d' in a same-different task: First, convert z(H) − z(F) to the equivalent proportion correct for an unbiased observer (Eq. 7.4):

p(c)max = Φ{[z(H) − z(F)]/2} .    (9.7)

Then insert p(c)max into Equation 9.3 to find d'. We have followed this logic in constructing Table A5.3, which provides d' corresponding to any value of z(H) − z(F) observed in a same-different task.

We can now, finally, analyze the data matrix from the beginning of the chapter. The transformed difference z(H) − z(F) equals z(.60) − z(.20) = 0.253 + 0.842 = 1.095, and p(c)max = .71. The underlying d' is found from Table A5.3 (or Eq. 9.3) to be 1.86.
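In code, the route from the data matrix to d' takes three lines (a sketch of Eqs. 9.7 and 9.3 in Python; ours):

from scipy.stats import norm

H, F = 0.60, 0.20
pc_max = norm.cdf((norm.ppf(H) - norm.ppf(F)) / 2)       # Eq. 9.7
d = 2 * norm.ppf((1 + (2 * pc_max - 1) ** 0.5) / 2)      # Eq. 9.3
print(round(pc_max, 2), round(d, 2))  # 0.71 and 1.85 (1.86 with the table's rounding)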
Our model abandons the requirement of unbiased responding, but retains another simplifying assumption: The critical value of likelihood ratio for responding "different" is the same whether the observed difference is positive or negative. Although to our knowledge this assumption is shared by all models for the same-different paradigm, it need not be correct: P("different"|<S2,S1>) may not equal P("different"|<S1,S2>), and the two halves of the decision contour in Fig. 9.2 may not be symmetric.


Differencing Rule
In chapter 7, we distinguished fixed and roving versions of the 2AFC experiment, according to whether two fixed stimuli recurred throughout a block of trials or the stimulus pair roved along a continuum. The roving feature is also often incorporated into same-different tasks. Suppose in our categorization experiment (Example 9a) there are four stimulus classes—S1, S2, S3, and S4—and we wish to measure sensitivity for each adjacent pair. A fixed experiment requires three separate blocks of trials, whereas the roving procedure can employ just one. An appealing feature of roving experiments is that they more closely resemble real-life situations, in which repeated presentation of the same pair of stimuli is unusual.

Sample data for a roving same-different experiment are given in Table 9.2. The participants' responses have been classified into those relevant to measuring sensitivity between S1 and S2, S2 and S3, and S3 and S4. Notice that some Same trials are used twice in this table: <S2,S2> trials, for example, enter into both the S1/S2 and S2/S3 comparisons. This fact produces a correlation between adjacent sensitivities that would not be present in a fixed design.
TABLE 9.2 Sample Roving Same-Different Data

                                  Response
Stimulus Pair              "Different"    "Same"
<S1,S2> or <S2,S1>              30           20
<S1,S1> or <S2,S2>              10           40
<S2,S3> or <S3,S2>              35           15
<S2,S2> or <S3,S3>               5           45
<S3,S4> or <S4,S3>              25           25
<S3,S3> or <S4,S4>               5           45

In 2AFC the observer's ideal response strategy is the same for both roving and fixed designs, but in same-different the response rules for the two cases differ. To see why, consider a possible sequence of stimulus pairs in roving same-different discrimination: perhaps <S1,S2> on Trial 1, then <S3,S3>, <S2,S1>, <S3,S4>, and so on. The independent-observation decision rule portrayed in Figs. 9.1 and 9.2 requires the observer to independently assess the relative likelihood that each observation arose from each of the two stimuli in the

sequence. For a set of four possible stimuli, this rule is very complex: The
participant must estimate a likelihood ratio based on the 10 possible stimulus pairs listed in Table 9.2. If, as is often true, the observer does not know
exactly how large the stimulus set is, the information needed for the
calculation is not even available. Another strategy is needed.
Statement of Rule. The appropriate procedure is a differencing
strategy like that used in comparison designs: The two observations on a
trial are subtracted, and the result is compared to a criterion. If the difference
exceeds the criterion, the stimuli are called "different," otherwise "same."
The differencing strategy was first described by Sorkin (1962) and has been
found to describe data from experiments in pitch perception (Wickelgren,
1969), speech perception (Macmillan et al., 1977), and some visual discrimination situations that we discuss presently.
Figure 9.4 illustrates the differencing decision rule; for simplicity, only stimuli S1 and S2 are considered. The criterion lines for a constant difference resemble the line for 2AFC (Fig. 7.1), but the decision space is more complicated. The shaded areas in the figure mark observations that lead to a "different" response under the differencing rule, which is at odds with the independent-observation rule in certain regions of the space.
An example cited by Noreen (1981) can be extended to contrast the rules.
Suppose the two stimulus classes are fifth- and sixth-grade boys, and the
only information available for discriminating the classes is height, which
averages 54 inches in Grade 5 and 56 inches in Grade 6. Then two boys

FIG. 9.4. Decision space for the same-different experiment. The effects of the two intervals are subtracted, and the absolute value of the result is compared to a criterion (differencing model). The decision rule is to respond "different" in the shaded area.


whose heights are 54 and 56 inches should probably be judged "different"
(i.e., from different grades), but two boys whose heights are 58 and 60
inches should be judged "same" because both are more likely sixth than
fifth graders. This example mimics a fixed design and adopts an independent-observation strategy. In the corresponding roving paradigm, boys are
drawn from Grades 5 to 8, and the average heights are 54, 56, 58, and 60
inches. Again the heights of two boys are announced; now one must decide
whether the two are from the same grade or 1 year apart. Using the differencing strategy, any difference of two inches or more leads to a "different"
response. The strategy is not optimal (heights of 65 and 67 inches are more likely Same than Different), but it is reasonable and simple to apply. And it
is the only sensible approach for a decision maker without knowledge of the
stimulus range.
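The claim about 65 and 67 inches can be checked directly. The sketch below (Python with SciPy) compares the total likelihood of a height pair under the Same and the adjacent-grade Different hypotheses; the within-grade standard deviation of 2 inches is an assumption of ours, since the text gives none.

```python
from itertools import product
from scipy.stats import norm

MEANS = [54, 56, 58, 60]   # average heights, Grades 5 through 8
SD = 2.0                   # assumed within-grade spread (not given in the text)

def likelihoods(h1, h2):
    """Total likelihood of a height pair under Same and under Different
    (adjacent-grade) hypotheses, with equal priors over grade pairs."""
    same = sum(norm.pdf(h1, m, SD) * norm.pdf(h2, m, SD) for m in MEANS)
    diff = sum(norm.pdf(h1, a, SD) * norm.pdf(h2, b, SD)
               for a, b in product(MEANS, MEANS) if abs(a - b) == 2)
    return same, diff

same, diff = likelihoods(65, 67)
print(same > diff)   # True: 65 and 67 inches are more likely Same
```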
Because the differencing rule depends on a single variable, the difference between the two observations, we can simplify the decision space by projecting the distributions onto one dimension (as we did for comparison designs in chap. 7). Let us consider the probability distributions of the difference for each type of trial. When both intervals contain the same stimulus, so that the pair is either <S1S1> or <S2S2>, the mean difference is 0. However, there are two types of Different pairs: those that, when subtracted, yield a mean difference of d', and those yielding a mean of −d'. The decision problem in one dimension thus involves three difference distributions on one axis, as shown in Fig. 9.5. The representation resembles that for uncertain increment-decrement detection (Example 8b), and the decision rule is the same: Respond "different" whenever the observed difference is more extreme than k, in either a positive or negative direction. As in comparison designs, however, these are difference distributions; because two independent variables with (by definition) variance 1 are being subtracted, they have variance 2.
Sensitivity and ROCs. The hit and false-alarm rates result from combining areas under these distributions:

H = Φ((d' − k)/√2) + Φ(−(d' + k)/√2)
F = 2Φ(−k/√2)                                        (9.8)

If k is varied, Equations 9.8 can be used to trace out an ROC; some examples are shown in Fig. 9.6. Unlike the ROCs for the independent-observation rule, these do not have unit slope, so two points with equal values of


FIG. 9.5. One-dimensional decision space for the same-different experiment according to the differencing model. The representation is equivalent to that in Fig. 9.4: The decision rule is to respond "different" in the shaded area. The hit rate is the sum of all shaded areas under the right-hand (or left-hand) distribution; the false-alarm rate is the sum of the diagonally shaded and cross-hatched areas under the center distribution.

z(H) − z(F) do not necessarily have the same d'. Therefore, we cannot expect to find d' via z(H) − z(F), as we did for the independent-observation model. Table A5.4, modified from the tables of Kaplan, Macmillan, and Creelman (1978), gives d' for any (F, H) pair, assuming the differencing model to be correct.
Applying the differencing model to our categorization data yields the following sensitivity values: d'12 = 2.16, d'23 = 3.07, and d'34 = 2.32. What would happen if we had mistakenly applied the independent-observation model to these data? Table A5.3 yields d'12 = 1.85, d'23 = 2.56, and d'34 = 2.04. The independent-observation model implies smaller values of d', as it must, but the two models are not dramatically different for these data, leading to values of d' differing by an average of 13%. The ROCs suggest that the greatest discrepancy will occur when the probability of responding "different" is large. Indeed, if H = .99 and F = .90, d' is 3.04 according to the differencing model and only 1.83 under the independent-observation model.
As Table 9.1 shows, p(c) for an unbiased differencing observer is poorer than for the independent-observation rule (unsurprising, because the latter is optimal). One implication is that quite high d' values correspond to less-than-perfect accuracy. This can sometimes be convenient: If participants are "too good" in yes-no or 2AFC and the stimuli cannot be adjusted, a shift to same-different can avoid a ceiling effect.

FIG. 9.6. ROCs for the same-different (SD) and yes-no (YN) experiments, according to the differencing model, on (a) linear coordinates and (b) z coordinates.
Response Bias in the Differencing Model. The likelihood ratio βd of Different vis-à-vis Same pairs is the easiest bias measure to formulate. Its value at the point k is the average height of the two Different distributions divided by the height of the Same distribution. Assuming equal presentation probabilities for the subtypes of each stimulus class, we have

βd = [φ((k − d')/√2) + φ((k + d')/√2)] / [2φ(k/√2)]
To develop a criterion-location measure, it is helpful to consider an alternative version of the representation in Fig. 9.5 in which the decision
axis is the absolute value of the difference between the intervals. Only


positive values can occur, of course; the Same distribution looks like the right half of a normal distribution; the Different distribution looks roughly like a normal distribution whose left tail has been cut off. Equation 9.8 still applies, and k is still the criterion location expressed as a distance from 0. A better reference point is the location at which βd = 1; the distance from this point, which is denoted cd, can be calculated by a method given in the Computational Appendix.
Figure 9.7 shows the isobias curves for cd and βd. The family resemblance between these measures and the corresponding indexes for the independent-observation model (Fig. 9.3) and the yes-no experiment (Fig. 2.7) is clear and encourages a preference for the criterion statistic.³ Data from an auditory experiment (Hautus & Collins, 2003) also support cd over βd.

FIG. 9.7. Isobias curves for (a) criterion location cd and (b) likelihood ratio βd according to the differencing model. (Adapted from Irwin et al., 2001, Figure 2, with permission from the author and publisher.)
³Two alternative reference points have been explored: csd compares the criterion location to the point d'/2, and c*d compares it to the point at which H = F. By the criteria of reasonable isobias curves and fit to data, the second of these measures is a good one and the first is not (see Hautus & Collins, 2003).


As with the independent-observation model, calculation of either of these statistics is a somewhat tedious process, and the resemblance of the isobias curves to those for c and β argues for the heuristic use of those statistics for the purpose of comparing experimental conditions. For our example, the values of ln(β) are 0.32, 0.68, and 0.82, and the values of c are 0.29, 0.38, and 0.64. That all values are positive reflects the preponderance of "same" responses in the data.
Relation Between the Two Strategies

The independent-observation and differencing strategies are both special cases of a general situation (Dai, Versfeld, & Green, 1996). Consider what would happen if the correlation between x and y (which is 0 in the diagrams so far) were substantial; that is, if we had perceptual dependence. We assume that the amount of dependence, represented by the correlation ρ between x and y, is the same for all four stimulus sequences.

The upper panel of Fig. 9.8 shows ellipses with correlation ρ. Because the correlations (and variances) are the same in all distributions, this representation is equivalent to one in which the distributions are perceptually independent, but the axes intersect at an angle of cos⁻¹(−ρ) (Ashby & Townsend, 1986). The lower panel shows that when ρ is not 0 (and the angle between the axes not 90°), the spacing between the distributions is wider along the negative diagonal than along the positive one, an effect that results from the smaller standard deviation in that direction. The optimal rule for this case is not straight lines intersecting at a right angle. In fact, the larger ρ is, the closer the rule is to two parallel lines perpendicular to the negative diagonal, as in the differencing model.
Some Relevant Results
Our models for same-different performance could be tested in two ways.
One test concerns the shape of the ROC. The observers in the Irwin and
Francis (1995a) categorization experiment on which Example 9a is based
produced ROCs supporting the independent-observation model, but these
researchers have also shown that observers spontaneously adopt either
strategy depending on the stimulus set (Francis & Irwin, 1995; Irwin &
Francis, 1995a, 1995b). The independent-observation model applied when
observers compared letters varying in orientation (correct vs. reversed),
whereas the differencing model was supported by data using color patches
that could vary in any direction in color space (a type of roving design).


FIG. 9.8. Two equivalent representations for perceptual dependence. In (a) each of the bivariate distributions has a correlation ρ between x and y. In (b) the x and y axes meet at an angle θ such that cos(θ) = −ρ.

A second approach asks whether either model correctly describes the relation between same-different and other discrimination designs, and a few studies have compared same-different with performance in either 2AFC or yes-no using the differencing model. In a taste study, Hautus and Irwin (1995) found same-different d' to be just 3% higher than yes-no d'. Macmillan et al. (1988), investigating a synthetic vowel continuum, found estimated d' values to be almost exactly equal in same-different and 2AFC for both fixed and roving procedures. Chen and Macmillan (1990) found same-different d' to be 6% lower than 2AFC d' in line-length discrimination. In frequency discrimination, Creelman and Macmillan (1979) found same-different d' to be 14% lower. Creelman and Macmillan also studied a continuum of pure-tone octaves differing in relative phase, and for these stimuli the model failed: d' was 50% higher in same-different. Taylor, Forbes, and Creelman (1983) speculated about characteristics of these stimuli that might account for the discrepancy.


ABX (Matching-to-Sample)

Of the three stimuli presented on an ABX trial, the third is the focus. The first two stimuli (A and B) are standards, S1 and S2 in a randomly chosen order, and the observer's task is to choose which of the two is matched by the final stimulus (X). (A parallel notation, AX, is sometimes used for same-different. Again the first interval is not fixed as it would be in a reminder experiment; rather, A is a place holder for either possible stimulus.) Altogether there are four legal stimulus sequences in ABX, and they are evenly partitioned by the two responses: "A" is the correct answer for <S1S2S1> and <S2S1S2>, and "B" is correct for <S1S2S2> and <S2S1S1>.
Example 9b: Matching-to-Sample by Chimpanzees

In animal research, the ABX design is called matching-to-sample. Suppose we want to know whether a chimpanzee can distinguish a circle (S2) from an ellipse (S1). On each trial, we present the animal with an object (X) and two keys to press (A and B), each of which is labeled with a "sample" for the chimp to "match." The samples S1 and S2 are randomly assigned to the two key positions, and thus are always in one of two orders: <S1S2> or <S2S1>. If the samples were instead always in the same order, comparison would not be necessary: The animal might learn to respond "A" when X was a circle and "B" when it was an ellipse, comparing X with a remembered criterion rather than the sample. Although this is a perfectly respectable discrimination design (a kind of "reminder" experiment), it is rarely performed.
Two aspects of matching-to-sample experiments are neglected in this description. First, the samples may be presented last (an XAB design), or
flanking the test stimulus (AXB), as in Example 9b. Second, a delay may be
imposed between the samples and the test stimulus. We describe optimal
strategies that are unaffected by ordering and assume perfect memory. To
capture the nature of nonoptimal processing, substantive theories must be
added to psychophysical models.
The following table offers some possible data for the chimp experiment.

                                         Response
Stimulus Sequence                     "A"      "B"
X matches A: <S1S2S1> or <S2S1S2>      30       20
X matches B: <S1S2S2> or <S2S1S1>      10       40

Our analysis requires that all four sequences be equally likely, so that we can lump together the two sequences for which "A" is the correct response and the two for which "B" is correct to form the familiar 2 × 2 table. The observant reader will notice that the numerical data are the same as in Example 9a, a same-different experiment. Hits are now defined as correct matches of X to the A sample, false alarms as incorrect matches of X to the A sample, so that:

H = P("A" | X matches A)
F = P("A" | X matches B)

In this example, H = .6 and F = .2.
Summarizing discrimination data as a (false-alarm, hit) pair is, as for all
designs, only a start toward finding underlying detectability. Our goal is to
extract from these statistics the difference between underlying single-stimulus probability distributions. As for same-different, there are two contrasting decision rules, one using independent observations and one using
differencing.
The Independent-Observation Decision Rule

In the independent-observation model, the observer has to decide two things: the order of the first two stimuli (A and B) and the value of the third (X). Each of these decisions corresponds to a familiar design. The subsequence <AB> can be either <S1S2> or <S2S1>, so the first two intervals, considered by themselves, compose a 2AFC task. The third stimulus X can be either S1 or S2, so the third interval, considered in isolation, is a yes-no experiment. Although each ABX trial contains three stimuli, there are only two independent pieces of information: the order of the samples, or standards (A and B), and the value of the third stimulus, X. If the internal variable on which A and B differ is eccentricity, then the intelligent chimp is interested in two statistics. One is the difference in eccentricity between A and B (the information needed in 2AFC), and the other is the eccentricity of X (needed in yes-no).


These two variables combine independently, producing a space similar to that for same-different, as shown in Fig. 9.9. (The model described here is that of Macmillan et al., 1977.) As usual, the figure portrays probability distributions of the internal representations. The result of comparing (i.e., subtracting) the two standards is plotted on the horizontal axis, the two possible orderings each generating a distribution, as in the 2AFC model of chapter 7. The difference distributions have means of −d' and +d', and variance twice as large as any single-stimulus distribution. The horizontal axis in the figure has been rescaled by dividing by √2, so that the means are at −d'/√2 and +d'/√2. The vertical axis of the figure represents the X part of a trial, on which a single stimulus drawn from one of the two distributions is presented.

A full ABX trial yields a value on each axis, and thus a point in the plane of the figure. Each of the four distributions in the figure arises from one of the four possible stimulus sequences. If the stimulus sequence is <S2S1S2>, for example, then (A − B)/√2 averages d'/√2, X averages d', and the chimp's observation is drawn from the distribution at the upper right. To determine a response, the unbiased observer partitions the decision space, using vertical and horizontal criterion lines, into regions in which each response is more likely to be correct. Observations in the shaded area of Fig. 9.9, the upper right and lower left quadrants of the space, lead to an "A" response, other regions to a "B."

FIG. 9.9. Decision space for fixed ABX (independent-observation model). Probability distributions of joint occurrence of A − B differences and X observations are shown for the four possible presentation sequences. Abscissa values are scaled by 1/√2 to equate standard deviations on the two axes. The shaded region leads to the response "A" and the unshaded region to the response "B."


As always, the d' we seek to estimate is the distance between the means of the S1 and S2 distributions. In the ABX decision space, this is the distance between the means of the <S1S2S1> and <S1S2S2> distributions along the vertical axis. Because the decision strategy shown in Fig. 9.9 is unbiased, it is sufficient to calculate p(c), which equals both H and 1 − F, and to consider just one possible sequence. Proportion correct on an <S2S1S2> trial has two components, the probabilities of observations in the upper right and lower left quadrants. Each component probability is the volume over an infinite rectangular area. The analysis parallels that for the same-different design earlier in the chapter, and proportion correct can be expressed as follows:

p(c) = Φ(d'/√2)Φ(d'/2) + Φ(−d'/√2)Φ(−d'/2)              (9.11)

Equation 9.11 can be used to find proportion correct from d'. What the investigator usually wants is the inverse function, which calculates d' from proportion correct. Table A5.3, based on a table in Kaplan et al. (1978), provides a solution to this problem.
The chimp in our matching-to-sample example neglected to adopt the unbiased decision rule shown in Fig. 9.9; that is, the animal's likelihood-ratio criterion is some value other than 1.0. Boundaries in the decision space for which likelihood ratio is constant but not equal to 1.0 resemble those calculated for the same-different design (see Fig. 9.2). For each possible value of likelihood ratio, the hit and false-alarm rates can be computed by numerical integration. When this is done for many values of likelihood ratio, an ROC curve results. It turns out that the ROC has unit slope, so sensitivity depends only on z(H) − z(F), and (as was true for same-different) d' can be determined by a two-step procedure. First, find z(H) − z(F) using Table A5.1 and then convert to d' by using Table A5.3. For our chimps, z(H) − z(F) = z(.60) − z(.20) = 1.095. According to Table A5.3, d' = 1.57. A given performance level, notice, is more difficult to reach in ABX than in yes-no.
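The two-step procedure can also be programmed directly, by inverting Equation 9.11 numerically instead of consulting Table A5.3. A minimal sketch in Python with SciPy (function names are ours):

```python
from scipy.stats import norm
from scipy.optimize import brentq

ROOT2 = 2 ** 0.5

def pc_abx_io(d):
    """Equation 9.11: unbiased proportion correct in fixed ABX."""
    return (norm.cdf(d / ROOT2) * norm.cdf(d / 2)
            + norm.cdf(-d / ROOT2) * norm.cdf(-d / 2))

def dprime_abx_io(F, H):
    """(F, H) -> z-difference -> equivalent unbiased p(c) -> d'."""
    pc = norm.cdf((norm.ppf(H) - norm.ppf(F)) / 2)
    return brentq(lambda d: pc_abx_io(d) - pc, 0, 15)

print(round(dprime_abx_io(.20, .60), 2))   # 1.57, as for the chimpanzees
```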
The three types of bias measures discussed in chapter 2 can all be computed from ABX data. Absolute and relative criterion location are mimicked by 0.5[z(H) + z(F)] with or without dividing by z(H) - z(F). The
association is not precise because the idea of criterion location is, as can be
seen in Fig. 9.9, a two-dimensional one. The third measure, likelihood ratio,
is conceptually simple but computationally unpleasant. Remember that, as
in the one-interval experiment, all these measures convey exactly the same
information when sensitivity is constant; when it is not, a choice must be


made along the lines sketched in chapter 2. Bias measures derived explicitly
for the ABX task have not been developed.
Roving ABX: Another Differencing Model

The fixed versus roving distinction applies to ABX. For our chimpanzee, the issue is whether every trial contains only a particular pair of circle and ellipse or whether trials with ellipses of different degrees of eccentricity might be intermixed. In the roving design, the decision rule illustrated in Fig. 9.9 will not work: With, say, a circle, a broad ellipse, and a narrow ellipse all possible, the observer cannot know which pair of distributions applies on a given trial. As in roving same-different, a differencing strategy is available; here two differences, A − X and B − X, contribute to the decision, and performance is poorer than with the independent-observation model.

Oddity (Triangular Method)

In the oddity design, the observer is presented with a triplet of stimuli, two of which are alike, and is asked to locate the "odd" one, which may be any of the three. The identity of the minority stimulus is not known to the observer and can be either S1 or S2. There are six possible stimulus sequences: <S2S1S1> and <S1S2S2>, for both of which the correct response is "A"; <S1S2S1> and <S2S1S2>, for which the answer is "B"; and <S1S1S2> and <S2S2S1>, for which the answer is "C."
Oddity offers a new complication: Three rather than two responses are allowed. To our knowledge, no models for characterizing response bias have been developed for this design. Oddity is not restricted to three intervals, but could include any number (although "triangular method" would be a poor description of four-interval oddity). In practice, the three-interval version is the most popular.
Example 9c: Taste Discrimination
Oddity is a frequently used design in "sensory evaluation" experiments
conducted by food scientists to measure sensitivity to differences in taste


and smell. We consider an enjoyable experiment of this sort, in which tasters attempt to distinguish between two wines: a Burgundy and a claret. Professional wine tasters, it should be said, would be unlikely to use this
method because it does not require that they be able to say which wine was
which. The oddity task is suitable for the enthusiastic novice, who might be
learning the aspects of taste and smell that differentiate wines, but still
would find identification of the dimensions of difference difficult.
On each trial, the taster receives three wine samples, two of one type and
one of the other. Whether the odd glass has Burgundy or claret is randomly
decided for each trial, as is the location of the odd glass among the three. Because there are six possible stimulus sequences and three responses, the
data from this study are best summarized in a 6 x 3 matrix, but in practice the
overall proportion correct is almost always reported. Let us suppose our
tasters, mimicking the results of an experiment with "aqueous solutions of
simple compounds" by Byers and Abrams (1953; described by Frijters,
1979a), are correct on 21 of 45 trials.
Measuring Sensitivity
A decision rule for the oddity task has been described by Frijters (1979b).
The observer compares each pair of presentations in the triplet, determines
the pair with the smallest difference, and chooses the response corresponding to the remaining stimulus. Thus, if glasses A and B are most similar in
taste, glass C is most different from the others, and response "C" is given.
Because the observer knows nothing of the dimensions of judgment, the absolute differences are used. In the language of our other models, this is a differencing, not an independent-observation rule.
The problem for the observer is portrayed in Fig. 9.11. The dimensions of the space are two of the differences computed by the observer, those between the effects of Intervals A and B and between Intervals B and C. Each triplet is composed of samples from S1 (Burgundy) and S2 (claret). The distance between a single S1 stimulus and a single S2 is, as always, d'. Hence, the six possible sequences are readily located in the space. The distributions corresponding to each sequence are not circular because both axes depend on stimulus presentation B, so that the two dimensions covary negatively. The taster's decision rule is this: Find the smallest of the three differences between pairs of stimuli, and select the response corresponding to the stimulus that is not in that pair. The decision boundaries arising from this rule are shown in Fig. 9.11. At first, it may seem surprising that the areas allocated to the three responses are not equal. The area in which response "B" is


FIG. 9.11. Decision space for the oddity (ABC) task. The joint probability distributions of A − B and B − C observations for the six possible presentation sequences are shown. Elliptical equal-probability contours result from correlation between the axes (B contributes to both). Decision boundaries separate unequal areas of the decision space because of this asymmetry. The region with vertical shading leads to response "A," the dark region to response "B," and the unshaded region to response "C."

appropriate is smaller only because of the covariance noted earlier; the
model does not predict any asymmetry in response rates.
Craven (1992) calculated sensitivity as a function of proportion correct for this decision rule and its extensions to m-alternative oddity, m
ranging from 3 up to 32. The results are given in Table A5.5. An observer
who correctly chooses the odd wine on 21 of 45 trials [p(c) = .47] has a d'
of 1.31.
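Craven's tabled values can be checked by simulating the smallest-difference rule directly. The Monte Carlo sketch below (Python with NumPy; the function name is ours) implements the rule exactly as just described:

```python
import numpy as np

rng = np.random.default_rng(1)

def pc_oddity_diff(d, n=200_000):
    """Monte Carlo p(c) for three-stimulus oddity under the differencing
    rule: respond with the stimulus left out of the most-similar pair."""
    means = np.zeros((n, 3))
    odd = rng.integers(0, 3, size=n)          # position of the odd stimulus
    means[np.arange(n), odd] = d              # odd stimulus displaced by d'
    x = means + rng.standard_normal((n, 3))
    # Absolute differences for pairs (0,1), (0,2), (1,2); the response is
    # the stimulus NOT in the smallest pair: (0,1)->2, (0,2)->1, (1,2)->0.
    pair_diffs = np.stack([np.abs(x[:, 0] - x[:, 1]),
                           np.abs(x[:, 0] - x[:, 2]),
                           np.abs(x[:, 1] - x[:, 2])], axis=1)
    response = np.array([2, 1, 0])[np.argmin(pair_diffs, axis=1)]
    return np.mean(response == odd)

print(round(pc_oddity_diff(1.31), 2))   # ~0.47, cf. the Table A5.5 entry
```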
This rule is clearly in the differencing family; is there an independent-observation model for this paradigm? Versfeld, Dai, and Green (1996) derived predictions for such a model. As in the independent-observation rule for the same-different paradigm, the observer does not subtract values, but instead computes the likelihood of the multi-interval observation under each of the two hypotheses and bases a decision on these likelihoods. For m = 3, the representation is thus four-dimensional (the three intervals plus likelihood), and we do not attempt to illustrate it here. Table A5.6 gives values of p(c) for both the differencing and independent-observation models, for m = 3, 4, and 5. For our wine taster, a p(c) of .47 has a d' of 1.10; this is lower than under differencing assumptions because an optimal model requires a lower d' than a nonoptimal one to reach any given level of performance.

Threshold Analysis

Proportion correct by an unbiased observer can be calculated from threshold assumptions (Pollack & Pisoni, 1971). If p1 and p2 are the probabilities of covertly identifying S1 and S2 as being S2, then Equation 9.14 gives p(c). Performance according to this model is intermediate between the differencing and independent-observation rules. If p2 = 1 − p1, then Equation 9.14 can be solved for p2 and this value converted to d'. For p(c) = .47, as in the running example, d' = 1.20.
Summary
In a same-different experiment, a pair of stimuli is presented on each trial,
and the observer decides whether its two elements are the same or different.
Two stimuli generate four possible stimulus pairs in such an experiment.
The optimal strategy for the observer is to treat the two observations independently. Even with this approach, the task is more difficult than yes-no.
When the stimulus level is roving, the optimal strategy may not be available,
and a differencing rule may be used instead. In this approach, only the difference between the two observations on a trial is used in making a decision.
Performance is poorer than with the independent-observation strategy, especially in remote regions of ROC space.
In an ABX experiment, three stimuli are presented on each trial; the third
presentation matches one of the first two, and the observer's task is to decide
which. With two stimuli, four stimulus triplets are possible in this experiment. The optimal independent-observation strategy is to independently assess (a) the difference between A and B and (b) X. When stimulus level is
roving, the optimal strategy may not be available, and a differencing rule
may be used instead. In this approach, two differences contribute to the decision: A - X and B - X. Performance is poorer than with the independent-observation model.
In the three-alternative oddity task, two of the three presentations are the
same, and the observer must select the different one. Six stimulus triplets
are possible. The differencing model proposes that the observer finds the smallest of the three differences and chooses the response corresponding to the stimulus that does not contribute to it. The independent-observation model instead computes the likelihood of the entire triplet under each hypothesis; it requires a lower d' than the differencing model to reach any given level of performance.

Computational Appendix
Finding d' From Unbiased p(c) in Same-Different: Independent-Observations Model

From Equation 9.2, the proportion correct on Different trials for an unbiased observer is

p(c|D)IO = [Φ(d'/2)]² + [Φ(−d'/2)]²

This equals

p(c|D)IO = 2[Φ(d'/2)]² − 2Φ(d'/2) + 1,

and the same expression gives the proportion correct on Same trials, so it is also the overall p(c). Solving the quadratic for Φ(d'/2) yields

d' = 2z{[1 + (2p(c) − 1)^1/2]/2}.
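A two-line check of this inversion (Python with SciPy; valid for p(c) ≥ .5, and contingent on the reconstruction above):

```python
from scipy.stats import norm

def dprime_sd_io(pc):
    """Unbiased independent-observation same-different:
    invert p(c) = Phi(d'/2)**2 + Phi(-d'/2)**2."""
    return 2 * norm.ppf((1 + (2 * pc - 1) ** 0.5) / 2)

print(round(dprime_sd_io(.708), 2))   # ~1.85, matching d'12 of Example 9a
```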
Problems

9.4. The table below gives data from a roving same-different experiment with three stimuli; separate estimates of d' can be made for the S1/S2 and S2/S3 comparisons. Which is weighted more heavily in these estimates, Same trials or Different trials? Looking at the entire data matrix, find p(c)* and compare it to the average of the hit and correct-rejection rates. Why are these not the same?
                     Response
Stimuli        "Different"    "Same"
<S1S2>              14            6
<S2S1>              12            8
<S1S1>              10           10
<S2S2>               4           16
<S2S3>              14            6
<S3S2>              12            8
<S3S3>               2           18
9.5. You observe p(c) = .95 (symmetric bias) in a fixed same-different task. What is d'? What would you predict p(c) to be in yes-no? 2AFC?

9.6. In the differencing model for same-different, the observer's decision is assumed to be based on the difference between the observations from the two intervals. Can you devise a decision rule that uses the sum of the observations? How well will someone using this rule perform compared with someone using the differencing rule?

9.7. Interpret the matrixes of Problem 9.1 as arising from an ABX experiment. (A = X corresponds to the top row, "A" responses to the left-hand columns.) Calculate d' assuming (a) the independent-observation model to be correct, and (b) the differencing model to be correct.

9.8. Suppose d' = 1 in a fixed-level experiment and the participant is unbiased. What is p(c) in 2AFC, ABX, same-different, and oddity? Repeat for d' = 2. Is the ordering of conditions the same at both levels?

9.9. Suppose d' = 1 in a roving-level experiment and the participant is unbiased. What is p(c) in 2AFC, ABX, same-different, and oddity? Repeat for d' = 2. Is the ordering of conditions the same at both levels? The same for roving as for fixed paradigms?

9.10. Repeat Problems 9.8 and 9.9, but assume that F = .1 and find H for each paradigm.


10

Identification of Multidimensional Objects and Multiple Observation Intervals

In an identification experiment,1 a single stimulus from a known set is presented on each trial, and it is the observer's job to say which it was—to identify it. The purposes of such experiments vary, but usually include obtaining
an overall index of performance, as well as a measure of sensitivity for each
stimulus pair and bias for each response.
If there are only two stimuli, identification is simply the yes-no task of
chapters 1 and 2, and performance can be summarized by one sensitivity
and one bias parameter. The nature of the stimuli is unimportant—it does
not even matter if they differ along one physical dimension (lights of different luminance) or many (X-rays of normal and diseased tissue). With more
than two stimuli, the task is easily described: One stimulus from a set of m is
presented on each trial, and the observer must say which it was. From the
participant's point of view, there is nothing more to say, but to extend the
analysis to more than two stimuli the dimensionality of the representation
must be known. If all stimuli differ perceptually on a single dimension, then
m - 1 sensitivity distances between adjacent stimuli and m - 1 criterion locations can be found along it, as we saw in chapter 5. Perceptual distances
for all other pairs of stimuli are easily calculated as the sum of the stepwise
distances between them. To characterize overall performance, it is natural
to add sensitivity distances across the range.

¹The term identification has another meaning in speech perception, where it describes what we have termed a two-category classification experiment, in which psychometric functions are collected. Paradigms like those in this chapter are sometimes distinguished by being termed absolute identification.


The assumption of unidimensionality is a restrictive one, and in this
chapter we consider two multidimensional cases. In the first, all members of
the stimulus set are independent of each other and may be thought of as being processed by different channels. In our perceptual-space models, each
stimulus produces a mean shift along a different dimension. An important
application is to the special case in which identification is of intervals in
m-alternative forced-choice experiments. In a second experimental situation, the feature-complete factorial design, values on each of two or more
dimensions are manipulated independently. This design is useful in assessing interactions between dimensions, a topic we introduced (using simple
discrimination experiments) in chapter 8.
Object Identification
Example 10a: Letter Recognition
Consider a letter recognition task: The observer fixates the center of a video
terminal on which is displayed, briefly, a single letter followed by a "mask"
that serves to disrupt retinal storage of the stimulus. One of just four letters
can occur: N, O, P, or S. The task is to press a computer key corresponding to
the letter shown. Let us suppose that an observer obtains p(c) = .5 in this task.
We adopt the simplifying assumption that these four letters are processed by independent channels. (Although this is too strong a requirement, it is certainly better than assuming that the letters differ along a single dimension.) The decision space contains m = 4 distributions, each removed from a common origin in a different dimension. Using the notation of previous chapters, the m stimuli to be distinguished can be written <S1>, <S2>, <S3>, and <S4>.
Sensitivity (Assuming No Bias)
The simplest (and most optimistic) calculations assume not only that each
stimulus activates a separate, orthogonal channel, and that each is equally
far from the Null stimulus N, but also that there is no bias. In this case, p(c)
can be used to summarize accuracy. An SDT analysis that relates the proportion correct to d' was developed by Elliott (1964) and improved by
Hacker and Ratcliff (1979). Table A5.7 makes the latter calculations available. For each value of proportion correct, the columns show the associated
value of d' for different numbers of alternatives. For our observer, p(c) = .5
and m = 4 implies a d' of 0.84. The m = 2 column gives values for the


two-choice experiment. For example, if p(c) = .75 when m = 2, then d' = 0.95. The table shows negative values of d' for p(c) less than 1/m, because to score reliably below chance the observer must know enough to systematically avoid the correct alternative.
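The Hacker and Ratcliff values can be approximated by one-dimensional numerical integration: for an unbiased observer with m equally detectable alternatives on independent channels, p(c) is the probability that the signal channel exceeds all the others. A sketch in Python with SciPy (the function name is ours):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def pc_mafc(d, m):
    """Unbiased mAFC / m-alternative identification with equal-d',
    independent Gaussian channels (cf. Elliott, 1964):
    p(c) = integral of phi(x - d') * Phi(x)**(m - 1) dx."""
    f = lambda x: norm.pdf(x - d) * norm.cdf(x) ** (m - 1)
    return quad(f, -np.inf, np.inf)[0]

print(round(pc_mafc(0.84, 4), 2))   # ~0.50, as in the example
print(round(pc_mafc(0.95, 2), 2))   # ~0.75, the m = 2 case
```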
For Choice Theory, again assuming no bias, ln(α) can be calculated using an equation given by Luce (1963a, p. 140):

ln(α) = (1/√2) ln[(m − 1)p(c)/(1 − p(c))]              (10.1)

Notice that ln(α) = 0 when p(c) = 1/m (the chance level) and that Equation 10.1 reduces to the unbiased case of Equation 7.3 when m = 2. According to Equation 10.1, a score of .5 in 4AFC and a score of .75 in 2AFC each indicate that ln(α) equals 0.78.
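Equation 10.1, as reconstructed above, is simple enough to verify directly:

```python
from math import log, sqrt

def ln_alpha(pc, m):
    """Equation 10.1 (as reconstructed above): unbiased mAFC Choice Theory."""
    return log((m - 1) * pc / (1 - pc)) / sqrt(2)

print(round(ln_alpha(.50, 4), 2))   # 0.78
print(round(ln_alpha(.75, 2), 2))   # 0.78
```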
In the general Choice Theory model, sensitivity is related to the product of the odds ratios that, for each cell, compare the probability of the response actually given to the probability of the correct response (Equation 10.2). The product Π is over all cells in the stimulus-response matrix. When i = j, the ratio P(Rj|Si)/P(Ri|Si) equals 1 and may be ignored, so the product is effectively over all the nondiagonal cells (those in which i ≠ j). Equation 10.1 is the special case in which all the proportions of correct responding P(Ri|Si) are equal to p and all proportions of incorrect responding P(Rj|Si) (for i ≠ j) are equal to (1 − p)/(m − 1).
A Model Assuming "Bias Constancy":
The Constant-Ratio Rule
Luce (1959) developed Choice Theory from an axiom about the relation
among recognition tasks using different-sized subsets from a common universe (see chap. 4). According to the choice axiom, the ratios of response
frequencies in a confusion matrix do not depend on the number of stimuli in
the experiment. This constant ratio rule (Clarke, 1957) can be applied, for
example, to the data from Example 5f, in which the four stimuli were tones
differing in intensity. Table 10.1 gives the number of responses out of 50
presentations for each stimulus in a set of four.


If we were to eliminate Stimulus 4, according to the constant ratio rule
we should find the results in the lower part of Table 10.1. The proportions in
this table are calculated by dividing original frequencies by the total frequency in the first three columns of each row. The response proportions for
Stimulus 3, for instance, are equal to the frequencies 11, 10, and 12, each
divided by their total, 33.
TABLE 10.1 Results of an Identification Experiment With Four Stimuli and Four Responses

(a) Original Data

                    Response
Stimulus      1      2      3      4
    1        39      7      3      1
    2        17     12     10     11
    3        11     10     12     17
    4         3      5      9     33

(b) Proportions in Three-Stimulus Identification Predicted by the Constant Ratio Rule

                    Response
Stimulus      1      2      3
    1       .80    .14    .06
    2       .44    .31    .26
    3       .33    .30    .36

Because the constant ratio rule can extract a 2 × 2 matrix from a larger one, it can be used to calculate sensitivity for discriminating any stimulus pair. In an experiment involving only Stimuli 2 and 3, the constant ratio rule predicts P(R2|S2) = 12/(12 + 10) = .55 and P(R2|S3) = .45. Either SDT or Choice Theory can be used to calculate sensitivity. The predicted value of ln(α) is ln{[(.55 × .55)/(.45 × .45)]^1/2} = 0.2. Predicted d'23 = z(.55) − z(.45) = 0.25, lower than that predicted under unidimensional assumptions in chapter 5, where we found d'23 = 0.4 for this example. Hodge (1967; Hodge & Pollack, 1962) concluded that the constant ratio rule was more successful when applied to multidimensional than one-dimensional stimulus domains.
The constant ratio rule is a variant of Choice Theory, in which bias is presumed not to depend on the stimulus subset being studied (Luce, 1963a).
This is a strong and (Luce suggested) uncongenial assumption. One way in


which the assumption could be correct is for participants to be unbiased in
all conditions.
A Complete Model
Bias assumptions can be avoided by using a more systematic approach. A
complete Choice Theory solution, which calculates bias parameters for
each response and discriminability measures for each pair of stimuli, is provided by Smith (1982b, Appendix B).
Interval Identification: m-Alternative Forced Choice (mAFC)

Now that we know how to model the identification of the correct object in a set of any size, it is easy to translate to the identification of one interval in which a stimulus might be presented. The analogous task is one in which there are m spatial or temporal intervals, one containing S2 and the others S1. The analytic problem is formally the same as for identification of objects, just as the same-different discrimination task was formally the same as divided attention.
Example 10b: Multiple-Choice Exams

An obvious educational application of mAFC is the venerable multiple-choice exam, in which one correct and m − 1 incorrect choices are provided for each question. We wish to estimate true sensitivity for a student for whom p(c) in a four-alternative exam is .5, perhaps for comparison with another student who scores .75 on a two-alternative version. To make such comparisons possible, our models must apply to any number of alternatives.
Representation and Sensitivity
In the initial statement of the 2AFC problem (Fig. 7.1), each interval corresponded to a separate dimension in the decision space. This representation
is also appropriate for m > 2 intervals, so there are as many dimensions in the
representation as there are intervals in the task. The optimal unbiased strategy is to choose the interval with the largest observation. In 2AFC, this rule
is equivalent to basing the decision on A - B and using a criterion of 0, so the
task can be analyzed as a comparison design. No such shortcut is possible
for m > 2.


The models of the previous section apply directly to the mAFC problem, and the assumptions of equal sensitivity and independent effects for all alternatives are apparently quite reasonable. If we are still willing to assume unbiased responding (a less compelling assumption), we can use Equation 10.1 to convert p(c) to a distance measure. The calculations of the previous section allow estimation of SDT and Choice Theory sensitivity parameters. Thus, p(c) = .5 in 4AFC corresponds to a d' of 0.84 and a ln(α) of 0.78. According to SDT, the comparison student for whom p(c) = .75 in 2AFC is superior: d' = 0.95 (Eq. 7.1). According to Choice Theory, ln(α) = 0.78 for both students.
For forced-choice experiments with unbiased decision rules, distributions consistent with Choice Theory are not logistic, but double-exponential (Yellott, 1977). Tables of mAFC performance for logistic distributions
have been published as well (Frijters, Kooistra, & Vereijken, 1980).
Response Bias in mAFC

The Choice Theory model for sensitivity given by Equation 10.2 also specifies m − 1 bias parameters. Each response Ri has a corresponding bias bi, but only the ratios between biases can be estimated (Equation 10.3); the product in that equation is over all possible stimuli Sk.
As we have seen, bias is customarily ignored in analyzing mAFC data.
That it does not therefore go away is shown in some 4AFC experiments of
Nisbett and Wilson (1977), whom we quote:
In both studies, conducted in commercial establishments under the guise
of a consumer survey, passersby were invited to evaluate articles of clothing—four different nightgowns in one study ... and four identical pairs of
nylon stockings in the other .... [T]he right-most object in the array was
heavily overchosen. For the stockings, the effect was quite large, with the
right-most stockings being preferred over the left-most by a factor of almost four to one. (p. 243)

This study provides an interesting insight into the unconscious nature of response bias:
When asked the reasons for their choices, no subject ever mentioned
spontaneously the position of the article in the array. And, when asked directly about a possible effect of the position of the article, virtually all


subjects denied it, usually with a worried glance at the interviewer suggesting that they felt either that they had misunderstood the question or were dealing with a madman. (pp. 243-244)

Nisbett and Wilson's experiments were unusual in that d' sometimes
equaled 0, so that there was no basis for choice in the task they put to their
participants. In other conditions, however, the stimuli really did differ. As in
2AFC, proportion correct is highest when bias is least, so the effect of asymmetrical responding is to depress measures of sensitivity that ignore the
possibility of bias.
Interval effects were found in psychophysical tasks by Johnson, Watson, and Kelly (1984), who observed that p(c) was higher for the third interval of a 3AFC design than for the first. Such a result could arise from either bias or sensitivity changes across intervals. Bias effects can be diagnosed by application of Equation 10.3; to uncover sensitivity effects requires a more complex model.
mAFC Compared With 2AFC and Yes-No
Do our equations and tables for mAFC accurately describe the relations
among two-, three-, and higher-choice paradigms? We know of few data
that address this question, but in an early experiment on tone detection
Swets (1959; cited in Green & Swets, 1966, pp. 111-113) found performance to be well predicted by SDT for up to eight choices. Equation 10.1
makes similar, and thus equally compelling, predictions for Choice Theory.
The Boundary Theorem. Detection theory models make, as always, distributional assumptions. Shaw (1980) showed that if the decision rule is unbiased, a lower limit can be placed on mAFC performance regardless of the shape of the underlying distribution. Her boundary theorem relates this lower limit, in mAFC, to 2AFC performance by a generalization of the area theorem (Equation 10.4). Table 10.2 compares this lower bound with the level of performance predicted from SDT. For moderate to high sensitivities, the two values are quite close.
Threshold Measures (Correction for Guessing) in mAFC

We saw earlier that 2AFC data could be "corrected for guessing" (Eq. 7.8). An equivalent correction has also been applied to mAFC; because the guessing rate is 1/m,

p(c)* = [p(c) − 1/m] / (1 − 1/m)

TABLE 10.2 Gaussian Predictions (Table A5.7) and Boundary-Theorem Limits (Eq. 10.4) for Proportion Correct in mAFC by an Unbiased Observer

d'       p(c)2AFC
0.5        .64
1.0        .76
2.0        .92
3.0        .983
Consider, as an example of redescribing a paradigm, a four-interval design with the sequences <S1S2S1S1> and <S1S1S2S1>. The observer is instructed to decide whether the second or third interval is unlike the others. Analytically, the end intervals are reminders and are ignored by an optimal participant, so this design is simply 2AFC. Of course it is an empirical question whether performance will be the same: Perhaps the presence of the end intervals and the instructions will serve to encourage inappropriate difference judgments and lower performance, or perhaps the reminders will reduce memory variance and actually improve accuracy.
Occasionally an incorrect characterization of the paradigm leads to substantive confusion. In chapter 9, we mentioned an experiment by Byers and Abrams (1953) in which proportion correct was .47 in a three-alternative oddity task. These investigators reported a "paradox": When the tasters were presented a second time with the 24 triplets to which they had not responded correctly and were asked to choose the weakest or strongest stimulus (whichever was appropriate), they were successful in 17 (71%) of these cases. The paradox lay in the ability to give relatively accurate reports to triplets that had previously not been discriminable.

The paradox depends on the assumption that incorrect responses are based on a total lack of knowledge (see Frijters, 1979b). The use of p(c) to compare the original oddity task with the later 3AFC introduces a threshold model. According to our continuous differencing model (Table A5.5), the sensitivity corresponding to p(c) = .71 is d' = 1.28, in good agreement with the d' of 1.31 found earlier for oddity. Our analysis leads to a prediction: The same level of performance, 71%, should be found for the 21 stimulus triplets that were correctly responded to in the initial oddity task.
Some final thoughts on comparing discrimination paradigms are contained in this chapter's Essay.
Simultaneous Detection and Identification
In some situations, detection and identification are both interesting. (Obviously, the detection must be under uncertainty or else there is nothing to
identify.) In the laboratory, participants may try to detect a grating that has
one of several frequencies and also to identify which grating was seen. In
eyewitness testimony, the witness must both "detect" whether a perpetrator
is present (in the lineup, or in court) and also identify which person that is.

Example 10c: Measuring Detection and Identification Performance

The next example mimics the X-ray detection/spatial identification task of
Starr, Metz, Lusted, and Goodenough (1975). Possible data are shown in Table 10.3a. Two of the table's three rows are familiar from chapter 3: The top
row gives the proportion of responses (R) in each rating category when a
shadow was present in one quadrant of the X-ray stimulus, and the bottom
row gives false-alarm data from trials on which no signal was presented. The
second line of the table, which is new to this design, gives the proportion of
Signal trials that were assigned a particular rating and whose location was
correctly identified. The notation P(R&C\S) means "the probability of the rating and a correct recognition, given a Signal presentation."
Cumulating these proportions to give the coordinates of an ROC, as in
chapter 3, leads to Table 10.3b, and to two curves, for detection and combined detection/identification. There is only one set of false-alarm probabilities—it makes no sense to ask the likelihood of being right in identification
when no signal is present. Figure 10.3 shows the two performance curves:
the familiar ROC and (below it) the new identification operating characteristic (IOC).
TABLE 10.3 Detection and Detection/Identification Responses

(a) Proportions for Five Rating Categories

Rating          5      4      3      2      1
P(R|S)        .10    .25    .26    .25    .14
P(R&C|S)      .08    .24    .21    .10    .06
P(R|N)        .01    .07    .15    .42    .35

(b) ROC and IOC Curve Coordinates Accumulated Across Rating Categories

P(R|S)        .10    .35    .61    .86   1.00
P(R&C|S)      .08    .32    .53    .63    .69
P(R|N)        .01    .08    .23    .65   1.00
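The cumulation that turns Table 10.3a into Table 10.3b is mechanical; a minimal check in Python:

```python
import numpy as np

# Table 10.3a, rating categories ordered 5 (strictest) down to 1.
p_r_s  = np.array([.10, .25, .26, .25, .14])   # P(R|S)
p_rc_s = np.array([.08, .24, .21, .10, .06])   # P(R & Correct|S)
p_r_n  = np.array([.01, .07, .15, .42, .35])   # P(R|N)

# Cumulating from the strictest category down reproduces Table 10.3b.
for name, row in [("P(R|S)", p_r_s), ("P(R&C|S)", p_rc_s), ("P(R|N)", p_r_n)]:
    print(name, np.round(np.cumsum(row), 2))
# P(R|S)   [0.1  0.35 0.61 0.86 1.  ]
# P(R&C|S) [0.08 0.32 0.53 0.63 0.69]
# P(R|N)   [0.01 0.08 0.23 0.65 1.  ]
```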

Relation Between Identification and Uncertain Detection
An independent-observation model can be used to predict the identification
operating curve of Fig. 10.3 from the uncertain-detection ROC
(Benzschawel & Cohn, 1985; Green, Weber, & Duncan, 1977; Starr et al.,


FIG. 10.3. ROC [P(R|S)] and IOC [P(R&C|S)] for the data of Table 10.3. The IOC (identification operating characteristic) plots the proportion of trials on which identification and detection responses were both correct.

1975). Within this model, there is a natural decision rule: The channel with
the maximum output determines the identification response and is compared to a criterion to determine the detection response. Integration models
are not so easily adapted to identification.
To understand the relation between the two operating characteristics,
consider the decision space. Figure 10.4 shows a single detection boundary
of the independent-observation type used in uncertain detection (as in Fig.
8.6a). The identification boundary is symmetric because the observer is
simply choosing the dimension (channel) with the larger output. The two
criteria divide the space into four regions, those in which the observer responds "yes-1" (there was a signal, and it was S1), "yes-2," "no-1," and "no-2." All four regions are labeled in the figure.

Now we compare detection and identification for S1. (Because the stimuli are equally detectable and the identification decision rule is symmetric, we need to think about only one of the two signals.) The probability of both detecting and correctly identifying S1 (the height of the IOC) is that part of the S1 distribution in the "yes-1" area. The probability of just detecting it (the height of the ROC) includes both the "yes-1" and "yes-2" areas and must therefore be larger.
To trace out the IOC and ROC by increasing the false-alarm rate, the detection criterion curve is moved down and to the left. When the curve has
been moved as far as possible in this direction, both the false-alarm rate and
the detection (ROC) hit rate will equal 1. The identification (IOC) success


FIG. 10.4. Decision space showing criteria for the simultaneous detection and identification task. The observer gives both a detection response ("yes" or "no") and an identification response ("1" or "2" was presented). The space is therefore divided into four regions, one for each response.

rate will equal the proportion correct by an unbiased observer in mAFC, as
can be seen by comparing Fig. 10.4 with Fig. 7.1. For m = 2, the area theorem (see chap. 7) implies that the asymptote of the IOC will equal the area
under the ROC. Green et al. (1977) generalized the 2AFC area theorem to
the case of m signals.
Subliminal Perception
Of the various stimulus-response events that can occur in simultaneous detection and identification, one has attracted special attention. Is it possible
for an observer to identify an undetected stimulus? If so, how is the combination to be understood? Such a result has generally been considered paradoxical at best; early attempts to demonstrate "subliminal" (literally, below
the threshold) perception were driven by a search for "motivational" determinants of perception.
A reexamination of Fig. 10.4 shows that identification without detection,
on some trials, is to be expected when orthogonal signals are used


(Macmillan, 1986). All observations to the left of the detection boundary lead to "no" responses; if they arise from either the S1 or the S2 distribution, they correspond to failures to detect. Yet such observations (all those in the "no-1" and "no-2" regions) fall more often than not on the correct side of the identification boundary (i.e., nearer to the Signal distribution responsible for the observation than to the other), so identification performance will clearly be above chance.
At least some psychophysically oriented tests of identification without
detection have supported this interpretation. In one that did not, Shipley
(1960) presented tones having one of two frequencies in a 2AFC uncertain
detection experiment and also asked her observers to state on each trial
which signal had been presented. She found chance recognition performance on trials for which the detection response was incorrect. But Lindner
(1968) was able to reverse Shipley's results by explaining to his subjects the
nonreality of thresholds. He found that the proportion of correct identifications increased with criterion (as the IOC suggests), and that identification
was above chance on incorrect detection trials.
Subliminal perception results seem surprising to the degree that an inappropriate, threshold model guides theorizing and experimentation: Failure to
detect is understood as a drop below threshold, rather than below criterion.
This interpretation meshes well with the idea that the threshold is the dividing
line between consciousness and its absence. As noted in chapter 4, however,
detection theory has no construct corresponding to consciousness. It is true
that instructions of which participants presumably are conscious can lead to
criterion changes, but the converse implication need not hold.
Using Identification to Test for Perceptual Interaction
GRT Analysis of Identification
Identification experiments are a valuable tool for testing whether perceptual
dimensions interact or are perceived independently. General Recognition
Theory has clarified various types of independence (Ashby & Townsend,
1986) and provides two general approaches to testing it with identification
designs. We consider one such method here.2
²The method we do not discuss, hierarchical model fitting (Ashby & Lee, 1991), is more computationally intensive. A set of models is constructed in which more complex models are "nested" within and tested against simpler ones. For example, a model that includes decisional separability might be compared with one that does not; failure to find a statistically significant improvement in fit for the latter model is considered evidence for decisional separability.


In the basic stimulus set for testing independence, each value of one dimension is factorially combined with each value of the others. In two dimensions (the only case we consider), choosing two values on each dimension leads to four stimuli; two on one and three on the other lead to six, and so forth. As in all identification experiments, one stimulus is presented on each trial, and the task is to assign a unique label to each stimulus; this may be done by reporting the value on each dimension separately. Such an experiment implements the feature-complete identification design.
In chapters 6 and 8, we distinguished three meanings of independence. Perceptual independence (PI) is the independence of two variables and applies to a single stimulus. If X and Y are perceptually independent, their joint distribution is the product of the marginal distributions,

f(x, y) = f(x)f(y),

and has circular or elliptical equal-likelihood contours that display no correlation. Perceptual separability (PS) refers to sets of stimuli, and it is present if the marginal distributions on one dimension, say X, are the same for different values of Y, that is,

f(x | Y = y1) = f(x | Y = y2),
and so forth for other values of Y. Decisional separability (DS) also refers to
sets of stimuli and means that the decision criterion on one variable does not
depend on the value of the other. When decisional separability occurs, decision bounds are straight lines perpendicular to a perceptual axis.
These independence qualities, or their opposites, are theoretical characteristics of the perceptual representation. Certain empirical features of the identification data provide information about each type of independence. We introduce a GRT method, multidimensional signal detection analysis (MSDA; Kadlec & Townsend, 1992a, 1992b), that can be implemented using a straightforward computer program (Kadlec, 1995, 1999).³ It is helpful to refer to an example.
Example 10d: Perception of Curvature and Orientation

Kadlec (1995) asked her observers to identify stimuli that varied in curvature and orientation (and also location, which we ignore here). There were two
³Kadlec's program msda_2a is available at http://castle.uvic.ca/psyc/kadlec/research.htm.


levels of each variable, and thus four possible stimuli. The data from 200 trials per stimulus filled a 4 x 4 contingency table as shown in Table 10.4.
TABLE 10.4 Stimulus-Response Matrix for Identification of Curve/Orientation Stimuli

                                      Response Pair
Stimulus            "Curvature 1;  "Curvature 1;  "Curvature 2;  "Curvature 2;
                        50°"           55°"           50°"           55°"
Curvature 1, 50°        172             13             11              4
Curvature 1, 55°         82             98             12              8
Curvature 2, 50°          2              2            156             40
Curvature 2, 55°          1             15             54            129

The MSDA technique includes several distinct analyses; we illustrate the
approach by examining a "macroanalysis" of perceptual and decisional
separability. The question to be asked is whether judgments of curvature are
perceptually or decisionally independent of orientation. Three aspects of
the data are relevant:
1. Marginal response rates. Does the probability of a particular curvature response depend on the orientation? In the table, first look just at
the cases for which orientation was 50° (Rows 1 and 3). The hit rate for
curvature (probability of using response "1" for curvature 1) is (172 +
13)7200 = .925, and the false-alarm rate is (2 + 2)/200 = .02. Compare
these with the same proportions for cases in which orientation was 55°,
which are (82 + 98)/200 = .90 and (1 + 15)7200 = .08. Are the hit and
false-alarm rates invariant? Use of the MSDA program reveals that the
false-alarm rates are reliably different, but the hit rates are not.
2. Marginal d' values. The hit and false-alarm rates can be used to
find curvature d' for both values of orientation; the values are 3.49 and
2.69, which are significantly different.
3. Marginal criterion values. The hit and false-alarm rates can be
used to find curvature criterion values (relative to the means of the curvature-1 distributions) for both values of orientation; the values are
1.44 and 1.28, which are not significantly different.
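These three computations are easy to reproduce. Here is a minimal Python sketch (ours, not Kadlec's msda_2a program, and with the hypothesis tests omitted) that recovers the rates, d' values, and criteria just quoted from the Table 10.4 frequencies:

from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse-normal (z) transform

# Table 10.4 rows, keyed by (curvature, orientation); the four columns are
# the response pairs in the same order; 200 trials per stimulus.
counts = {
    (1, 50): [172, 13, 11, 4],
    (1, 55): [82, 98, 12, 8],
    (2, 50): [2, 2, 156, 40],
    (2, 55): [1, 15, 54, 129],
}

for orient in (50, 55):
    h = sum(counts[(1, orient)][:2]) / 200  # "curvature 1" responses: columns 1-2
    f = sum(counts[(2, orient)][:2]) / 200  # same responses to curvature 2
    print(orient, h, f, round(z(h) - z(f), 2), round(z(h), 2))
# 50 0.925 0.02 3.49 1.44   (marginal d' and criterion at 50 deg)
# 55 0.9 0.08 2.69 1.28     (the same at 55 deg)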
What can we conclude from these calculations about independence of curvature and orientation? Table 10.5 (from Kadlec, 1995; Kadlec & Townsend, 1992b) summarizes the implications of the data. The left-hand columns give the possible outcomes of the three statistical comparisons, in which the marginal statistics can be equal (T, or true, in the table) or not (F, or false); conclusions about separability are in the right-hand columns.

TABLE 10.5 Inferences About Perceptual and Decisional Separability From Identification Data

              Observed Results                            Conclusions
Marginal Response   Marginal d'   Marginal Criteria   Perceptual     Decisional
Invariance?         Equal?        Equal?              Separability   Separability
T                   T             T                   yes            yes
T                   T             F                   yes            no
T                   F             T                   no             yes
T                   F             F                   no             no
F                   T             T                   yes            possibly no
F                   T             F                   yes            no
F                   F             T                   no             unknown
F                   F             F                   no             unknown

Notice that if the marginal responses are invariant, then perceptual separability is associated with equal marginal d' and decisional separability with equal criteria. In the absence of marginal response invariance, as in the example, conclusions are less firm. The
next-to-last row of the table tells us that Kadlec's data do not display perceptual separability and are inconclusive about decisional separability.
This example portrays only part of the MSDA method. For example, we
have considered the macroanalysis of curvature across orientation, but not
orientation across curvature (see Problem 10.3). Completely different calculations are needed to evaluate perceptual independence. Identification
tasks build on a detailed theoretical analysis (Ashby & Townsend, 1986;
Kadlec & Townsend, 1992b) and are a powerful tool for analyzing
interaction and independence.
Essay: How to Choose an Experimental Design
In this section, we offer some final comments on discrimination paradigms.
Are all the designs we have described—yes-no, identification, and several
examples of comparison and classification—equally useful? Although all
should, in principle, yield the same d' values, many factors influence the
choice of a paradigm. We begin with considerations that derive from our detection theory models and then discuss the possibility that tasks differ in the
cognitive processes they require, and thus are not related as detection theory
says they should be after all.
Detection Theory Factors
Figures 10.1 and 10.2 suggest one important consideration in choosing a
design: level of performance. An observer with a particular value of sensitivity will do best, in proportion correct terms, in a 2AFC task, and worst in
oddity. Knowing this, which (if either) should the experimenter select?
Many sensory psychologists would opt for 2AFC, arguing that this produces the "best" performance of which the observer is capable. But considering only the detection theory models (as we are doing in this section), all
paradigms yield the same performance, that is, the same d'. Showing that a
participant can obtain d' = 1.5 in same-different is just as impressive as the
same demonstration in 2AFC, even though p(c) = .86 in 2AFC and only .55
in same-different (differencing model).
A more important consideration is the possibility of floor and ceiling effects. Most experiments aim to compare the discriminability of several different stimulus pairs, a goal that is hard to realize if p(c) is near chance or
near perfect. When p(c) is near chance, making the task more difficult cannot produce a corresponding drop in performance; when p(c) is near perfect, not only can improvements not be seen, but detection theory measures
cannot even be calculated. Thus, 2AFC (and other high-performance paradigms) are desirable when sensitivity is low, but oddity and its low-performance relatives are desirable when sensitivity is high.
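For an unbiased observer, the 2AFC percentages quoted in this section follow from d' via p(c) = Φ(d'/√2); a two-line check in Python (standard library only):

from statistics import NormalDist

# unbiased 2AFC: p(c) = Phi(d' / sqrt(2))
print(round(NormalDist().cdf(1.5 / 2 ** 0.5), 2))  # 0.86 for d' = 1.5, as above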
Processing Differences
In assuming optimal processing, detection theory models make a significant simplification. Although people surely fall short of the ideal, inefficiency itself is not usually a serious problem in application. (In chap. 12, we
discuss methods for investigating such inefficiencies in their own right.)
What is worth worrying about is the possibility that inefficiency characterizes some designs more than others, so that the relation among paradigms is
not as expected.
Three predictions of the models we have described have proved overly
simple in this way: (a) One-interval experiments often yield poorer performance than do other designs, (b) discrimination deteriorates with interstimulus interval, and (c) sensitivities measured in fixed and roving experiments differ by more than the models predict. Models have been proposed, and successfully tested, that account for each of these effects in
terms of memory limitation (see chap. 7).
Many experimenters entertain untested beliefs about relative performance across designs. For example, same-different, ABX, and oddity designs are generally thought to be easier for participants to understand than
2AFC or 3AFC. Also, people who serve as participants in psychophysical
experiments sometimes develop strong opinions about the decision rule
that best describes behavior. For example, observers in ABX (and even in
2AFC) sometimes report ignoring the first interval. Participants in multi-interval designs may report covert classification of the sort postulated by
threshold theory. Models quantifying these and other types of nonoptimal
processing have been developed and may seem more "psychological" than
the normative decision rules of detection theory.
Introspection is a useful source of ideas, but not of experimental truths.
Because the mental processing of which we are aware may not be significant in determining our performance, quantitative tests are necessary for intuitively appealing and unappealing theories alike. Furthermore, substantive theorizing is most likely to succeed when it starts from a solid methodological base. With our present understanding of the in-principle relation
between, say, 2AFC and same-different, the folly of building a memory
model to explain differences in p(c) between them is evident.
Finally, a corollary caution for the innovator: New designs need detection analysis just as much as old ones. Driven by the demands of new content areas, investigators continue to invent new ways to measure sensitivity.
Before results obtained with the new technique can be compared with older
data, a model of the new design is essential.
Summary
In identification tasks, observers provide distinct labels for each of m > 2
possible stimuli. If all stimuli are assumed independent, detection theory
analyses can estimate sensitivity for each stimulus and (for some models)
bias parameters for each response. Data from one identification task can be
used to predict the results of an experiment using a subset of the same stimuli using the constant ratio rule.
Multi-interval forced-choice, in which sequences of length m longer than 2 are constructed that contain one sample of S2 and m − 1 samples of S1, is a special case of identification in which intervals rather than objects are identified. Such experiments can be analyzed with or without a no-bias assumption using Choice Theory. An unbiased SDT model has also been
developed.
Experiments in which stimuli are both detected and identified can be analyzed using identification operating characteristics, which are theoretically related to receiver operating characteristics (ROCs) for the same data.
Identification of stimuli constructed factorially from values on two or more dimensions provides data from which various types of perceptual interaction can be evaluated.
Methods appropriate for finding sensitivity in these paradigms are given
in Chart 6 of Appendix 3, those for finding response bias in Chart 7.
Problems
10.1. Predict identification performance for a set of 25 orthogonal stimuli if detection d' for each is 1.0. What is the identification performance for a subset of five of these? two of these?

10.2. Suppose p(c) = .75 in a 2AFC experiment. What should p(c) be in a 3-, 4-, 8-, 32-, and 1,000-alternative forced-choice experiment according to SDT? according to Choice Theory? What are the minimum values according to Shaw's boundary theorem? (Assume unbiased responding.)

10.3. Here is a confusion matrix for three speech sounds:

        R1    R2    R3
S1      60    10    10
S2      10    60    10
S3      10    10    60

(a) Using the constant ratio rule, predict d' for an experiment with only two stimuli from the set. Do this for each possible stimulus pair. (b) Use the d' values from part (a) to infer the perceptual representation. How many dimensions are required? (c) What would happen if the methods of chapter 5 were applied to these data?

10.4. A three-alternative experiment yields the following response frequencies:

        R1    R2    R3
S1       4     3     2
S2       2     6     2
S3       1     1     8

(a) Estimate d' and ln(a), ignoring bias. (b) Using the general Choice Theory model (Equations 10.2 and 10.3), estimate ln(a) and the bias parameters ln(b1), ln(b2), and ln(b3).

10.5. In fixed experiments, p(c) is lower in 3AFC than in yes-no for small d', but greater for large d'. At what value are the two equal? Answer the same question for 4AFC, 8AFC, 32AFC, and 1,000AFC.

10.6. In roving paradigms, at what d' does p(c) in 3AFC equal p(c) in ABX? in oddity? Answer both questions for 4AFC, 8AFC, 32AFC, and 1,000AFC.

10.7. Use MSDA to evaluate the perceptual and decisional independence of orientation over curvature. (Mimic the analysis of curvature over orientation, and summarize the results qualitatively rather than conducting actual hypothesis tests.)

III
Stimulus Factors

One way to characterize the shift in the attitude of psychologists toward
their work that came with the cognitive revolution is as a decline in interest
in "the stimulus." In the behaviorist period, understanding the effect of presenting a conditioned or unconditioned stimulus, or a reward, was central,
and that effect was usually a more or less overt "response." In the cognitive
era, the focus has shifted to representations and processing, both nonobservable, and in this respect detection theory is a prototypical cognitive enterprise. In this book, we have repeatedly asked how experimental
situations are represented internally, and what sorts of decision processes
are applied to them. Details of the stimuli being used have been missing,
and in our treatment of data they have not been missed.
This story line is too simple, however, and in the next two chapters we
look at two important detection theory scripts that offer the stimulus a lead
role. Chapter 11, "Adaptive Methods for Estimating Empirical Thresholds," summarizes strategies for determining a stimulus whose detectability
or discriminability is at a preset level. Finding the stimulus corresponding
to a performance level is the inverse of the one-dimension problems in Part I
and assumes the same kinds of representations. The stimulus sets to which
adaptive methods have most often been applied are simple perceptual ones,
although advancing technology is broadening the scope.
Chapter 12, "Components of Sensitivity," is an introduction to the use of
detection theory in partitioning discriminability between the stimulus and
its processing, and among different types of processing. One of the first applications of SDT was in comparing the performance of human listeners to
ideal observers, hypothetical processors who make optimal use of the information in the stimulus in making their decisions. In this early work, sensory
applications dominated, but more recently the approach has advanced into
cognitive and even social domains.

11
Adaptive Methods
for Estimating Empirical Thresholds

Detection theory provides tools for exploring the relation between stimuli
and their psychological magnitudes. In the examples discussed so far, stimulus parameters have been chosen for their inherent interest, and the dependent variable has been d', ln(a), or some other measure of performance.
Often it is natural to turn this experimental question around and try to
find the stimulus difference that leads to a preselected level of performance.
Such a stimulus difference we have called the empirical threshold or simply
the threshold. For example, an experimenter may seek a physical difference
just large enough so that an observer in 2AFC obtains a d' of 1.0 or 1.5, or
(equivalently, for an unbiased observer) so that p(c) equals .76 or .86. Empirical thresholds are unrelated to those of threshold theory (discussed in
chap. 4); indeed, they can be measured in either detection theory [d'] or
threshold theory [p(c)] terms. The double meaning of threshold, although
unfortunate, is unavoidable and need cause no confusion.1

1 The dual use of the term has caused confusion, in our view, in treatments of "subliminal perception." See chapter 10 for a discussion of this issue.
Measuring a threshold requires access to a set of stimuli that range, on
some physical variable, from too small to too large for the desired level of performance. A field in which threshold measurement has been widely used is
audiology, which assesses sensitivity as an aid to the diagnosis of hearing
problems. Audiologists estimate thresholds by straightforward manipulation
of tone intensity using a Bekesy Audiometer (von Bekesy, 1947). The intensity of the tone being detected is either continuously increased or continuously decreased, and the observer is told to press a switch whenever the
stimulus is audible. The switch is connected to an automatic attenuator in
such a way that holding down the switch decreases the intensity and letting it
go increases the intensity. A graphic recorder marks the resulting up-down
swings in stimulus intensity over time. Threshold is ordinarily determined by
freehand averaging of the extremes, and clinical workers in audiometry expect accuracy of 5 dB from the result. For clinical diagnosis of disruptive
hearing loss, this degree of accuracy is sufficient.
In some other areas to which detection theory has been applied, stimulus
measurement and control are not simple, and not all sensitivity experiments
can be converted into threshold ones. For example, memory for words is affected by similarity in meaning within a list, but list similarity is difficult to
quantify (M. B. Creelman, 1966), and the prospects for measuring a
"threshold" of semantic relatedness are not very bright. The examples in
this chapter are from sensory experiments, and the stimulus variables are
simple attributes of auditory and visual signals.
Two Examples
Our examples illustrate problems for which threshold measurement makes
more sense than sensitivity measurement. In the first, two conditions leading
to very different sensitivities are compared; if the same stimulus difference
were used in each condition, at least one would necessarily give either perfect
or chance performance. In the second, no correspondence function is defined
by the experimenter. Sensitivity can therefore not be estimated by the methods in Parts I and II of this book, but threshold estimation is still possible.
Example 11a: Auditory Thresholds at Different Air Pressures
What effect does a difference in air pressure across the eardrum have on
hearing? Creelman (1963a) addressed this question using tones of several
different frequencies as stimuli. The detectability of pure tones changes
greatly with frequency; because measurement of either very small or very
large sensitivities is difficult, using the same tone intensity for all frequencies was impractical. Suppose d' values for 100- and 1,000-Hz tones of the
same intensity are in the ratio 1 to 10. If the actual values are 0.5 and 5.0,
then p(c) by an unbiased observer in 2AFC will equal .64 and .9998. The
second of these numbers means that only one error will be made in about
5,000 trials. Few experimenters wish to squander 5,000 trials on a single
sensitivity estimate, and in any case a single error in that span could easily
be due to a motor slip, attention lapse, or some other nonsensory factor. On
the other hand, if the stimulus intensity is reduced so that d' equals 0.1 and
1.0, p(c) will be .53 and .76. The first of these numbers is uncomfortably


close to chance; a further halving of d' from 0.1 to 0.05 will change p(c) by
less than two percentage points. Clearly, the problem requires that stimulus
intensity not be held constant.
Another important consideration in Creelman's study was the length of
an experimental session. Even small differences in air pressure across the
eardrum cannot be maintained for long, so relatively short runs and rapid
threshold estimation were essential.
Creelman chose an adaptive procedure to estimate thresholds for all
stimulus conditions; that is, the intensity of the stimulus being detected was
changed every few trials in response to the listener's performance. Such a
procedure can yield useful data in a short experimental run. Because calculating d' from a small number of trials is problematic, the sensitivity target
was a value of proportion correct, p(c) = .80. We have seen that p(c) is most
acceptable as a sensitivity measure if performance is unbiased (chap. 4),
and that 2AFC tends to produce unbiased responding (chap. 7). Creelman's
experimental paradigm was 2AFC.
Example 11b: Brightness Matching by Pigeons
Blough (1958) wished to measure equal-brightness contours for visual
stimuli of different wavelengths using pigeons as observers. He first trained
birds to peck at a button corresponding to the brighter of two illuminated
disks. When training was complete, the two disks were illuminated with
lights of different wavelengths, one of 450 nm (blue), the other 600 nm (yellow). When both lights were presented at an intensity of 100 units, the pigeon pecked the yellow button, indicating that the yellow spot was brighter.
The experimenter then increased the intensity of the blue light to 110 units
for the next trial. Whenever the yellow button was pecked, the blue light was
made 10 units more intense; whenever the blue key was pecked, the blue
light was decreased in intensity by 10 units. After a block of trials, the average level at which the blue light was presented provided an estimate of the
"threshold" intensity needed to match the yellow one in brightness.
Although two lights were presented in Blough's experiment, the design
was not 2AFC, but yes-no with a "reminder" (chap. 7). The pigeon could, at
least in theory, compare the brightness of the blue light to a remembered criterion corresponding to the (constant) intensity of the yellow light. An important difference between our examples concerns the events controlling
the change in stimulus level. In the hearing sensitivity study, intensity was
adjusted in response to the observer's sensitivity to the experimenter-defined correspondence. In the brightness matching experiment, intensity depended only on the response: No objective correspondence existed. In both
cases, performance was vulnerable to the effects of response bias, but with
different consequences. In the 2AFC task we needed to assume bias to be
slight to have faith in p(c) as an index, whereas in the matching task
whatever bias existed was part of the phenomenon being investigated.
Psychometric Functions
Definitions and Illustrations
Adaptive procedures work because, over some range, sensitivity (or, in the
matching case, sensory magnitude) increases with stimulus level. The experimenter knows, therefore, that performance will rise if the stimulus
value increases and fall if it decreases. The underlying relation between sensitivity and stimulus level, the psychometric function, was introduced in
chapter 5. Figure 11.1 presents illustrative psychometric functions for the
two examples just described.

FIG. 11.1. Two psychometric
functions, (a) Proportion correct
versus stimulus intensity in a
2AFC tone-detection experiment, with threshold corresponding top(c) = .8. (b) Proportion of
"brighter" judgments versus
stimulus intensity in a yes-no
brightness matching experiment,
with the point of subjective
equality (PSE) corresponding to
/T'brighter") = .5.


In the first panel, proportion correct is plotted against tone intensity for
one condition of our auditory detection example. The graph represents the
outcome of a conventional, nonadaptive experiment: A number of 2AFC
trials are presented at each of six intensities, and p(c) is estimated for each.
The threshold we seek is the intensity for which p(c) = .8. To estimate it, we
draw a smooth curve (exactly what curve we discuss later) through the
points. The threshold equals the stimulus level that corresponds to p(c) = .8
on this curve.
A psychometric function for the brightness matching example is shown
in the second panel. Because this experiment has a yes-no decision (with a
reminder), the proportion of "brighter" responses ranges from 0 to 1 rather
than from .5 to 1. The "threshold" in this case is usually chosen to be the
50% point, the intensity for which judgments of "brighter" and "dimmer"
are equally likely. As we learned in chapter 5, this type of threshold is called
the point of subjective equality (PSE). Remember that this experiment has
no objective correspondence; therefore, there is no way to plot the data that
takes account of response bias. The procedure for finding the PSE in Example 11b is formally the same as that for finding the threshold in Example
11a, but the result is likely to be tainted by response bias.
The Shape of the Psychometric Function
General Considerations. Fitting a normal ogive (i.e., the Gaussian distribution function) to data of the type shown in Fig. 11.1 is traditional
(Woodworth, 1938); like many traditions, the procedure still commands respect, but not obeisance. We ask here whether this is truly the appropriate
form of the function and, if so, under what circumstances and for what reason.
The form of the function does not follow directly from detection theory
as we have described it so far. The psychometric function is a plot of sensitivity against stimulus value, whereas the underlying distributions in a decision space take values along an internal, psychological dimension.
Predictions about psychometric function shape can be made when the detection theory approach is joined to a model for stimulus transduction. To
take the simplest example, if stimulus intensity is converted linearly into
mean location on the decision axis and variance is constant, then the likelihood of judging a stimulus "brighter" can be obtained by moving a Gaussian distribution from low to high values relative to a fixed criterion.
If this same approach is applied to 2AFC experiments, two distributions
that move with respect to each other must be considered. An unbiased observer places a criterion halfway between the two means, and proportion


correct equals the area under either of these distributions on the correct side
of the criterion. As the stimulus difference decreases toward zero, a normal
ogive is traced out, but the curve ends at p(c) = .5, not p(c) = 0, and it looks
like the upper half of the curve in Fig. 11.1b.
Some 2AFC data do take this form, but a probably greater number resemble instead the complete curve of Fig. 11.1a. There are two reasons
for this variability: differences in the stimuli and/or their processing, and
differences in how the stimuli are measured. An example of the first was
provided by Foley and Legge (1981), who found 2AFC functions resembling full ogives in a visual detection task, but half-ogive curves in a discrimination task with the same stimuli. The second reason is that if the
stimulus variable is monotonically but nonlinearly transformed, the
shape of the psychometric function must be affected. The very common
logarithmic transformation used in vision and hearing (the decibel is
such a transformation) tends to turn steep functions like the half ogive
into shallower ones.
The exact form of the psychometric function cannot be specified in the
absence of a stimulus theory. Many such theories have been proposed, and
we provide some illustrations in chapter 12 (e.g., quantum theory in vision,
Cornsweet, 1970; ideal-observer theory in audition, Green & Swets, 1966).
Except when stimulus theories are being used, it is appropriate to choose a
shape for the psychometric function on the basis of experience and convenience. Such criteria have led to three prominent candidates, whose
credentials we now consider.
Specific Quantitative Functions. The functions to be discussed
are mathematically different, but have similar shapes (Fig. 11.2). Each has
two parameters of substantive interest: One reflects the location of the function along the x-axis and primarily determines the threshold (or PSE); the
other is a measure of slope, which indicates the rate of change in response
probability over the range.
The cumulative normal (Gaussian) distribution Φ has, as already mentioned, long been used to describe psychometric functions. The Gaussian
distribution function cannot be written as an algebraic expression, but is the
integral of the normal density (given in Eq. 2.5). The mean and standard deviation of the variable corresponding to the distribution determine the
psychometric function's threshold and slope. Probit analysis (Finney,
1971) is a set of procedures for fitting Φ to data that take account of differences in binomial variance at different points on the curve.


FIG. 11.2. Cumulative normal, logistic, and Weibull functions compared on (a) linear and (b) normal coordinates. The three curves have been scaled to have similar slopes and intercepts.

The logistic distribution function is

p(x) = 1/{1 + exp[−(x − μ)/θ]} ,     (11.1)

where μ is the threshold parameter and θ is the slope parameter. We saw in
chapter 4 that the underlying distributions implied by Choice Theory for the
yes-no task are logistic in form.
The third and final candidate is the Weibull function

p(x) = 1 − exp[−(x/α)^β] .     (11.2)

The parameter α corresponds to the threshold and β to the slope. The
Weibull function has valuable theoretical properties (Green & Luce, 1975;


Quick, 1974) and is extensively used in vision research (Graham, 1989;
Nachmias, 1981; Pelli, 1985).
All these functions increase from 0 to 1 as x increases. The lower asymptote of real psychometric functions is often greater than 0, however, for two
reasons. First, in a yes-no experiment the observer may well produce some
"yes" responses to the weakest stimulus; if this stimulus is a blank, or null,
these are false alarms. Second, chance performance may be higher than 0: in
2AFC it is .5, and in mAFC it is 1/m. In either case, the curve is often "corrected for guessing" so that the observed function P(x) is related to the true
function p(x) by

P(x) = γ + (1 − γ) p(x) .     (11.3)

The consequence of this rescaling is that the function has the shape of a full
ogive, but ranges only from γ to 1—for example, from .5 to 1 in 2AFC
(McKee, Klein, & Teller, 1985).2
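To make the candidate forms concrete, here is a short Python sketch of Equations 11.1 through 11.3; the parameter values are arbitrary illustrations, not recommendations:

import math

def logistic(x, mu, theta):
    # Equation 11.1: threshold parameter mu, slope parameter theta
    return 1 / (1 + math.exp(-(x - mu) / theta))

def weibull(x, alpha, beta):
    # Equation 11.2: threshold parameter alpha, slope parameter beta (x > 0)
    return 1 - math.exp(-((x / alpha) ** beta))

def observed(p, gamma):
    # Equation 11.3: rescaling for chance rate gamma ("guessing correction")
    return gamma + (1 - gamma) * p

# a 2AFC observer (gamma = .5) described by a logistic true function:
for x in (0.5, 1.0, 1.5, 2.0):
    print(x, round(observed(logistic(x, mu=1.0, theta=0.25), 0.5), 3))
# the printed values rise from about .56 toward 1, with .75 at x = mu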
Adaptive Versus Nonadaptive Methods
To obtain a threshold, the experimenter must present the observer with stimuli at different levels, but has many options in choosing the sequence of
stimuli. There are two general strategies: Decide in advance which stimuli
to use and how many of each to present; or decide about the next stimulus on
the basis of the observer's performance so far. The advance-planning approach has the longer history (Urban, 1908); we commented on several
variants and presented appropriate data analysis techniques in chapter 5.
The adaptive approaches, described in this chapter, do not require that the
experimenter know beforehand what stimuli are most relevant.
Psychometric functions take on useful values (neither near chance nor
near perfect) over a narrow range, and the investigator can locate the critical region only with the cooperation of the observer. Because the goal is
usually to locate the threshold, adaptive methods concentrate testing on
stimuli near it. An adaptive psychophysical procedure is a collaboration in
which the experimenter adjusts the stimulus in a detection or discrimination task on the fly, presenting on each trial a stimulus that is likely to yield
information about the location of the desired stimulus level appropriate
for the observer.
2 The form of the observed psychometric function is sometimes further modified to take account of "lapses," trials on which the observer fails to give the correct response to large values of x.


The Tracking Algorithm: Choices for the Adaptive Tester
To define an adaptive procedure, the experimenter must answer five separate questions: (a) Under what conditions should testing end at the present
level and shift to a new one? (b) What target performance should be sought?
(c) When the stimulus level changes, what new level should it change to? (d)
When does an experimental run end? and (e) How should an estimate of
threshold be calculated? Because these questions are largely independent of
each other, an adaptive method can be constructed out of virtually any set of
answers. In this section, we provide multiple-choice alternatives for each
question based on answers given by past investigators.
Decision Rules: When to Change the Stimulus Level
Rules for deciding to change the stimulus level operate in one of three ways.
A new stimulus may be presented on every trial, when the trial-by-trial results match a set sequence, or when the observer's performance deviates
from its target by a specified amount. The target is the response proportion
sought by the experimenter.
After Each Trial. In the pigeon brightness-matching experiment,
the blue stimulus is changed after every trial. When the blue patch is
brighter than the yellow, its magnitude is decreased; when it is dimmer, it is
made more intense. The proportion of "brighter" responses after one trial is
either 1, which is higher than the target, or 0, which is lower. In the long run,
the procedure narrows in on the neutral stimulus to which the observer is as
likely to respond "brighter" as "dimmer." The proportion of "brighter" responses at that threshold stimulus equals the target proportion p(T), and
p(T) = P("brighter") = P("dimmer") .     (11.4)

Because P("dimmer") = 1 − P("brighter")—there are only the two possible
responses—p(T) equals .5, and the method estimates the 50% point.
Two other methods change the level on every trial, but are more flexible
in the target level they track. Kaernbach's (1991) method can estimate any
target percentage by systematically varying the size of the increasing and
decreasing steps. Maximum likelihood procedures consider the entire run
history in deciding on the new level. We discuss both of these approaches
further in the next section.


When Results Match a Predetermined Pattern. In the up-down
transformed-response (UDTR) method (Wetherill & Levitt, 1965), the sequence of correct and incorrect trials at the current stimulus level is compared after each trial to a list of possible patterns. Some patterns require an
upward change in stimulus level, others a downward change. If there is a
match, the stimulus level is changed appropriately, testing is started again,
and a new record of trial results is started. If there is no match, another trial
is run with the same stimulus, and the pattern of results is extended using the
result of that new trial.
A favorite UDTR rule, often applied to 2AFC experiments, has p(T) =
.71: A single incorrect trial leads to a more intense stimulus, a sequence of
two correct trials to a less intense one. Let us verify that this rule does indeed
track the 71% point, using the logic applied earlier to the 50% target. If p(T)
is the probability of a correct response, the likelihood of two correct trials in
a row is [p(T)]^2. At threshold, the patterns that yield a decision to decrease
the stimulus must be as likely as the patterns that call for an increase, so
[p(T)]^2 = .5 and p(T) = .5^(1/2) = .71.
Levitt (1971) listed several other sets of sequences along with the probabilistic equations for p(T) of each set. In one subset, a single incorrect response leads to an increase in level (as in the rule just described), but the
number of correct responses needed for a decrease is greater than two. If
three successive correct responses are required, p(T) = .5^(1/3) or .79; if four are
required, p(T) = .5^(1/4) or .84. The experimenter can choose whichever member of this family of rules tracks the desired level of accuracy.
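Both the family of targets and the convergence of the 2/1 rule are easy to verify by simulation. The sketch below assumes an arbitrary logistic psychometric function for a 2AFC task and runs a 2-down/1-up staircase; the mean level visited after a warm-up period sits near the 71% point:

import math
import random

def p_correct(level):
    # assumed 2AFC psychometric function, rising from .5 toward 1
    return 0.5 + 0.5 / (1 + math.exp(-level))

def udtr_2down_1up(trials=5000, step=0.1, start=3.0):
    level, run, visited = start, 0, []
    for _ in range(trials):
        if random.random() < p_correct(level):
            run += 1
            if run == 2:      # two correct in a row: harder stimulus
                level -= step
                run = 0
        else:                 # any error: easier stimulus
            level += step
            run = 0
        visited.append(level)
    return sum(visited[500:]) / len(visited[500:])  # mean after warm-up

print([round(0.5 ** (1 / n), 2) for n in (2, 3, 4)])  # targets .71, .79, .84
print(round(p_correct(udtr_2down_1up()), 2))          # close to .71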
When Performance Deviates From the Target by a Critical Amount.
The optimal rule for determining whether an observed proportion differs
from a target was described by Wald (1947). In Wald's application, a factory
wants to shut down an assembly line if the proportion of defective units exceeds some limit, such as one tenth (we assume we are discussing defective
dolls, not defective automobile steering assemblies). The aim is to react as
quickly as possible if quality slips to a lower proportion, with some acceptable likelihood of error. In adaptive psychophysics we are interested in correct responses instead of satisfactory units; p(T) is the proportion correct at
threshold, and we want to know if the current stimulus gives either better or
worse performance.
Figure 11.3 illustrates the Wald rule for our auditory detection example.
For a series of trials at a single level, the number of correct trials is plotted
against the total number of trials (N). If performance were perfect, the num-


FIG. 11.3. Trials correct versus total trials during a Wald run. Target performance is p(c) = .8, and deviations of one trial above or below target are needed to change level. In the example, a new (lower) level is finally called for after the 10th trial.

ber of correct trials would equal the number of trials, and the graph would be
a line through the origin with a slope of 1. The expected number of correct
trials at the target equals p(T)N, the target proportion correct times the number of trials. In the figure, this expectation is shown as a straight line with
slope p(T). The experimenter does not require that the observer perform exactly at the target—in general, this is not possible—but demands that performance be within a deviation limit. In the example, this limit is 1: If the
number correct deviates from the expected target by at least 1, the stimulus
level is changed. The lines labeled Deviation limit are parallel to the target
line and bracket a region of performance consistent (by this standard) with
the 80% goal.
Hypothetical data are represented in the figure by crosses. The observer
is correct on Trial 1, incorrect on Trial 2, and correct on Trials 3 through 10.
For the first 9 trials, the total number correct is within 1 of the target number,
but on the 10th trial the total of 9 correct is 1 greater than the expected 8. The
stimulus level is decreased before Trial 11.
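A minimal rendering of the Wald decision rule in Python; the trial sequence is the one just described for Fig. 11.3:

def wald_test(outcomes, target=0.8, limit=1.0):
    # Run a Wald test on a sequence of booleans (True = correct trial).
    # Returns ("lower" or "raise", trial number) when the running number
    # correct deviates from target * N by at least `limit`; else (None, N).
    correct = 0
    for n, ok in enumerate(outcomes, start=1):
        correct += ok
        if correct - target * n >= limit:
            return "lower", n   # above target: decrease the stimulus level
        if target * n - correct >= limit:
            return "raise", n   # below target: increase the stimulus level
    return None, len(outcomes)

# Fig. 11.3: correct, error, then eight correct trials in a row
print(wald_test([True, False] + [True] * 8))  # ('lower', 10)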
The deviation limit can be set to any value, and both narrow and wide
limits have advantages. Narrow limits reject response proportions that are
even slightly discrepant from the target, and therefore lead to rapid decisions. Figure 11.4 shows the mean number of trials to reach a decision for
p(T) = .8 and deviation limits of 1.0, 1.5, and 2.0. The speed advantage of
the narrower limit varies with the true p(c) at the level being tested, but can
be as great as 5 to 1.


FIG. 11.4. Number of trials to reach a decision in a Wald test aimed at p(T) = .8,
as a function of the true probability of a correct response, for three deviation limits.
Curves are based on Monte Carlo simulations.

Narrow limit decisions are quick, but they are often wrong. By waiting
for a larger discrepancy to occur, the experimenter can be more confident of
changing level in the right direction. In Fig. 11.5, the probability that a Wald
test decides that the level is too high is shown for each true p(c) value. For a
perfect test, this probability would be 0 for all values less than p(T) = .8 and
1 for all higher values. The deviation-2.0 limit comes closest to this ideal. If
the true proportion correct is .65 and the target .8, a deviation-1.0 test incorrectly decreases the stimulus level about 20% of the time, a deviation-2.0
test only about 5% of the time.
In psychophysical applications, narrow deviation tests are often used for
their speed; accuracy derives from the use of many repeated tests as an experimental run proceeds. Hall (1974) showed, on the basis of simulations,
that greater accuracy in threshold estimates could be obtained from a larger
number of fast, variable, narrow deviation tests than from a smaller number
of slow, accurate, wide limit tests.
Target: What Performance Level to Track
In subjective yes-no (matching) experiments the target percentage is almost
always 50%, but in objective sensitivity tests the choice is less obvious. Probably the most popular target percentage in application is the 71% tracked by
Levitt's (1971) 2/1 rule. This rule is easy to implement, but Green (1990) has


FIG. 11.5. Probability that a Wald test aimed at p(T) = .8 yields the result "too high," as a function of the true probability of a correct response, for three deviation limits. Curves are based on Monte Carlo simulations.

argued that the most efficient target percentage is much higher. The inherent
variability of a threshold estimate depends on the target percentage because
this percentage is influenced by both the slope of the psychometric function
and the binomial variance of observer responses. Steep slopes (which occur
near the midpoint of the function) and low variance (which occurs near the
extremes) are desirable. Thus, Green suggested a "sweet point" that represents a compromise between these two goals. For 2AFC, this optimum occurs
at 91%; for procedures in which chance is less than 50%, it is lower. Experimenters wishing not to venture too far from the 2/1 rule can improve the precision of their estimates by using 3/1, 4/1, or some higher criterion for
lowering the stimulus level. Kollmeier, Gilkey, and Sieben (1988) showed
that the 3/1 (79%) rule was more efficient than the 2/1 (71%) rule in an auditory masked threshold experiment.
Stepping Rules: What Size Change in Level to Make
Having decided to abandon the old stimulus level, we must now select a new
one. How large a "step" in the direction determined by our test shall we take:
always the same size, a size decreasing during the run, or an adjustable size?
If the last, how much prior data should enter into our decision?
Fixed Steps. The simplest rule is to change the stimulus up or
down by a fixed amount; by architectural analogy, this is called a staircase
procedure (e.g., Blough, 1958; Cornsweet, 1962). To use fixed steps, one
must know the appropriate step size beforehand. This is the sort of advance
planning from which adaptive procedures are supposed to free us, but some-


times the apparatus confines an experiment to fixed stimulus values, or the
experimenter is required to prepare a set of graded stimuli beforehand (as
when the stimuli are colored papers with differing spectral characteristics or
odorants). When fixed steps are unavoidable, the step size must not be too
small (lest performance change insignificantly between steps) or too large
(such that one step changes the task from trivially easy to impossibly difficult). In between, experimenters face one of the many tradeoffs of adaptive
procedures, that between inaccuracy and tedium.
Step Size Determined by Target Level. Kaernbach (1991) showed that any point on the psychometric function could be estimated by choosing increasing and decreasing step sizes that are in the appropriate ratio. In general, to reach the target proportion p the ratio of magnitudes must be p/(1 − p). For example, a target of 75% is reached by setting the increasing step to be three times the magnitude of a decreasing one.
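In code, the weighted up-down rule is a single ratio; a minimal sketch:

def kaernbach_steps(target, down_step=1.0):
    # At the target, target * down_step = (1 - target) * up_step,
    # so up_step / down_step = target / (1 - target).
    return down_step * target / (1 - target), down_step

print(kaernbach_steps(0.75))  # (3.0, 1.0): increasing step is 3x the decreasing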
Decreasing Steps. In an early paper, Dixon and Mood (1948)
suggested that steps in stimulus size be made smaller as an experimental run
progresses to take advantage of increasing certainty about threshold location. The Dixon-Mood rule prescribes a step size equal to the initial size divided by the number of steps taken to date in the experimental run. The
original application was in research on explosives, where amounts of various constituent chemicals could be chosen arbitrarily; stimulus continua in
many behavioral applications have this graded characteristic.
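A one-line sketch of the Dixon-Mood shrinking step:

def dixon_mood_step(initial_step, steps_taken):
    # step size = initial step size / number of steps taken so far in the run
    return initial_step / steps_taken

print([round(dixon_mood_step(8.0, n), 2) for n in (1, 2, 3, 4)])  # [8.0, 4.0, 2.67, 2.0]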
With steps of continually decreasing size, an incorrect decision about the direction of the next step can add to the time to reach a threshold estimate, because recovery from a bad decision takes longer with smaller steps. Most
decision rules are designed to be quick rather than highly accurate, so such an
incorrect decision—even a string of them—is likely. Some means to recover
from steps in the wrong direction by increasing step size is appropriate.
Adjustable Steps Determined by Immediately Preceding Trials.
The first proposal to address this problem was Parameter Estimation by Sequential Testing (Taylor & Creelman, 1967), or PEST (the field's first marginally clever acronym). PEST rules generally use decreasing step sizes, but
switch to increasing ones to recover from apparently incorrect decisions.
There are five rules:
1. After each reversal, halve the step size. A reversal is a step in the
opposite direction from the previous step, for example, an increase in


level following a decrease. A minimum value is specified below
which step size is not decreased.
2. A step in the same direction as the last uses the same step size as
previously, with the following exceptions.
3. A third step in the same direction calls for a doubled step size,
and each successive step in the same direction is also doubled until the
next reversal. This rule has its own exceptions.
4. If a reversal follows a doubling of step size, then an extra same-size step is taken after the original two before doubling.
5. A maximum step size is specified, at least 8 or 16 times the size
of the minimum step.
These rules are illustrated in Fig. 11.6, which shows the sequence of
stimuli used after each of several decisions to change level; the actual number of trials needed to make a decision varies and is not shown. We begin at a
level of 35, chosen to be relatively easy for the observer. Testing shows the
level to be too high, so it is changed downward by 20 units. This level yields
performance that is too low; applying Rule 1, the level is increased by half
the previous step size. The figure shows successive applications of the foregoing rules. From this point, the rules applied are as follows: 1, 2, 1, 2, 3, 1, 2, 4, 1, 1, and 1.

FIG. 11.6. Example of stimulus levels during a PEST run for successive blocks
of trials. The level changes for each block according to the indicated rule.
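The five rules reduce to a compact step-size update. The following is our own schematic rendering in Python (the caller must track reversals and run lengths, and the minimum and maximum step sizes are arbitrary illustrative values):

def pest_step(prev_step, reversal, run_length, doubled_before_reversal,
              min_step=0.3125, max_step=10.0):
    # reversal: this step reverses direction (Rule 1)
    # run_length: consecutive steps, counting this one, in the current direction
    # doubled_before_reversal: the last reversal followed a doubled step (Rule 4)
    if reversal:
        step = prev_step / 2    # Rule 1: halve the step after a reversal
    elif run_length == 2:
        step = prev_step        # Rule 2: second step in a direction, same size
    elif run_length == 3 and doubled_before_reversal:
        step = prev_step        # Rule 4: one extra same-size step before doubling
    else:
        step = prev_step * 2    # Rule 3: double from the third step onward
    return max(min_step, min(step, max_step))  # Rules 1 and 5: clip the size

# the second change of level in Fig. 11.6: a reversal halves the 20-unit step
print(pest_step(20.0, reversal=True, run_length=1,
                doubled_before_reversal=False))  # 10.0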


Adjustable Steps Determined by the Entire History of the Run.
PEST's computation of the next level depends on the past history of the run,
but only some of it. In maximum-likelihood methods, a best estimate of the
threshold is calculated after each trial from the entire run history, the result
of all trials at all levels tested so far. The new level is set to that estimate, and
testing is continued.
A maximum-likelihood procedure assumes the underlying psychometric function to have one of the specific forms discussed earlier. This
function, which we call p(x), specifies the proportion correct for every stimulus level x. If p(x) describes the data, then the likelihood L(x) of a particular
sequence of R correct responses followed by N − R incorrect responses to
stimulus x is

L(x) = [p(x)]^R [1 − p(x)]^(N−R) .     (11.5)
(The probability of the entire sequence equals the product of probabilities
because we assume that trials are independent.) Thus, if a particular theoretical function predicts that p(c) = .75 for a stimulus value x, and if in the testing to date there have been four correct trials followed by two incorrect
ones, then L(x) = (.75)^4(.25)^2 = .0198 for that sequence. An expression of
this form can be written for any sequence of trials and any possible theoretical function. The overall likelihood of the function is the product of L(x)
values for all stimulus levels x for which data have been collected.
The experimenter chooses a form for the psychometric function and uses the
data to determine which function of that form is correct. Choosing the form of
p(x) specifies a family of curves whose members differ in mean (threshold) and
variance (slope). The likelihood of the data is computed from Equation 11.5 for
each member of such a family, and the curve giving the largest value—the maximum likelihood—is selected. The current threshold is then the 80% point (or
some other point) on that curve. (For more discussion of maximum-likelihood
estimation, see Madigan & Williams, 1987; computational issues are considered in Press, Flannery, Teukolsky, & Vetterling, 1986).
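A minimal maximum-likelihood sketch in Python, assuming a logistic family with fixed slope, a grid search over candidate thresholds, and hypothetical run-history data:

import math

def p_correct(x, thresh, slope=1.0, gamma=0.5):
    # assumed 2AFC logistic psychometric function with chance rate gamma
    return gamma + (1 - gamma) / (1 + math.exp(-(x - thresh) / slope))

def log_likelihood(thresh, data):
    # sum over tested levels of the log of Equation 11.5
    total = 0.0
    for x, r, n in data:
        p = p_correct(x, thresh)
        total += r * math.log(p) + (n - r) * math.log(1 - p)
    return total

def ml_threshold(data, candidates):
    # the family member (threshold value) with the maximum likelihood
    return max(candidates, key=lambda t: log_likelihood(t, data))

data = [(1.0, 14, 20), (2.0, 16, 20), (3.0, 19, 20)]  # (level, correct, trials)
grid = [i / 100 for i in range(501)]                   # thresholds 0.00 to 5.00
print(ml_threshold(data, grid))                        # best-fitting threshold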
In a seminal paper, Robbins and Monro (1951) showed that this strategy
is the optimally efficient way to find the threshold. Among the modern
methods that use the maximum-likelihood approach are Pentland's "Best
PEST," which assumes the underlying function to be logistic (Lieberman &
Pentland, 1982; Pentland, 1980), and Watson and Pelli's (1983) QUEST,
which assumes the Weibull function. The calculations these procedures require between trials of an ongoing experiment are well within the capability
of laboratory computers.


In both methods, the slope is fixed by the experimenter, and the likelihood calculation gives the probability of the obtained data assuming each
possible psychometric function with that slope. QUEST differs from Best
PEST in requiring the experimenter to provide some initial guesses, an a
priori distribution of likely threshold locations. The program uses a
Bayesian strategy to successively revise the odds as data are collected. Not
having to predict where the threshold might lie is an advantage of adaptive
procedures, but the prior estimates have been shown to improve QUEST's
precision (Madigan & Williams, 1987).
Suspending the Rules. Many small choices the experimenter
makes are not specified by the adaptive procedure, and it is important that
these choices turn out to be truly irrelevant to the outcome. For example, the
rules for changing level, whatever they are, are often suspended at the beginning of a run in order to rapidly locate the region of the threshold. Many
experimenters begin testing with relatively large stimulus differences. To
reach the neighborhood of threshold, they may reduce the stimulus difference by a large step after only one or two correct responses, and revert to
normal rules after two incorrect responses. Other experimenters attempt to
begin each run at the current best estimate of threshold so that special
"run-in" rules are unnecessary. In maximum-likelihood techniques, outlying points (due to inattention or slow learning) can distort threshold estimates. If these points are truly far from threshold, however, they are visited
rarely and have only a small influence on threshold estimates late in a run.
Stopping Rules: When to End a Run
As with deciding when to change levels and what to change to, the experimenter can make more or less use of information from the observer in deciding when to stop.
Fixed Number of Trials. Many experimenters employ runs of
fixed length. Run length is, of course, perfectly predictable with this strategy, a significant advantage, but some of the flexibility of adaptive procedures is lost. For example, sufficient information to locate the threshold to
the desired degree of accuracy may be available before the run is complete.
Fixed Number of Reversals. The experimenter can decide to
stop the run after a fixed number of reversals, say 5 to 10. This strategy
avoids ending a run when the participant has not yet "settled down." The exact number of trials in the run varies, but not dramatically.


Minimum Step Size. For stepping rules in which the step size varies, one can terminate the run when some minimum-size step is demanded.
In the example of Fig. 11.6, for example, data collection might be stopped
when a sub-minimum step of 0.3125 is required. This rule leads to considerable unpredictability in run length, but ensures that when the run is stopped
the stimulus level is near threshold.
Minimum Confidence Interval. The maximum-likelihood QUEST computations allow a stronger version of the foregoing strategy.
After each trial, the prior distribution plus all data thus far collected are used
to calculate a confidence interval around the estimated threshold. The run is
ended when this interval is less than some minimum. Emerson (1986b) attributed some of QUEST's apparent advantage over other procedures, in
simulations, to this stopping rule.
Summary Rules: How to Calculate a Threshold Estimate
Most strategies for calculating a threshold ignore the earliest segment of a
run. Many possible summary statistics remain, notably the following:
Average of All Trials. The simplest approach is to find the mean,
or median, of all levels visited.
Average of a Fixed Sample of Trials. In an adaptive session, the
stimuli presented on successive trials are heavily dependent, and those used
on more separated trials less so. Kaplan (1975) estimated PEST thresholds
by averaging stimulus values obtained every 16 trials and was able to make
precise estimates of the auditory threshold of highly trained observers. This
has been called the rapid adaptive tracking (RAT) mode for PEST, as opposed to the previously described minimum overshoot and undershoot sequential estimation (MOUSE) mode, in which testing is stopped when a
step smaller than some minimum is called for. In the RAT mode, testing is
continued by taking the allowed minimum step if smaller steps are requested. RAT is PEST with a long tail.
Average of All Reversals. Averaging stimulus values at each reversal is equivalent to finding the midpoint between reversals and averaging
these, on the assumption that the threshold lies halfway between reversals.
Besides omitting early reversals, some experimenters calculate a "trimmed
mean" by leaving out the most extreme values, perhaps on the assumption
that the observer should be allowed at least one extended lapse in attention.


Final Testing Level or Point on Best-Fitting Psychometric Function.
The original PEST package used the final testing level—the one called for
by the final minimum-size step—as its estimate of threshold. With a
Bayesian or maximum-likelihood estimation run, this is also a logical summary datum because all the data contribute to it. However, the target level
could in principle be different from the desired definition of threshold. For
example, one might choose a high (say 90%) target but still wish to report
the stimulus level corresponding to the 75% point. To do this, one finds the
best-fitting psychometric function and calculates the level that leads to 75%
correct on that function.
2AFC Threshold Estimation Without Response Bias. None of these summary measures is completely free of bias. Does such a statistic exist? We illustrate several possible solutions for a 2AFC detection experiment with three levels of intensity; hypothetical data are shown in Table
11.1. The second column gives the proportions correct for each intensity,
and these values are plotted in Fig. 11.7a. A typical definition of threshold
in this case is the level leading to 75% correct; interpolating between points
yields an estimate of 1.57 units. Sometimes 2AFC data are corrected for
guessing, via a rearrangement of Equation 11.3:

p(x) = [P(x) − γ]/(1 − γ) .

Because .5 is the chance ("guessing") rate, γ is set to .5. Values of the equation thus corrected are shown in the third column of Table 11.1. With this
version of the psychometric function, the natural definition of threshold is
the 50% point, and interpolation shows that this value is still 1.57 units.
To see that neither approach takes account of response bias, we must distinguish the two types of trials on which a given stimulus level can occur.
The non-zero intensity can be presented either in Interval 1 (e.g., the sequence <3,0>) or in Interval 2 (e.g., <0,3>). Column 4 in the table lists the
trial types in this format, and Column 5 characterizes the various types of
trials by the difference in intensity between the two intervals. Remember
(from chap. 7) that the difference between the stimulus effects in the two intervals is the optimal decision variable in 2AFC. Column 6 provides the
proportion of "2" responses for each possible stimulus sequence, and the
overall p(c) values can be seen to be the averages of two quite different numbers at each level of intensity—for example, at Level 2, the proportion cor-


FIG. 11.7. Three treatments of the 2AFC data in Table 11.1. (a) Proportion correct versus stimulus intensity; (b) proportion correct adjusted for the false-alarm rate versus stimulus intensity; and (c) z score corresponding to proportion correct versus the (signed) difference in intensity between Intervals 2 and 1. Only the last approach permits separation of sensitivity from bias.

rect is .93 when the stimulus was in the second interval, but only 1 − .31 =
.69 when it was in the first.
When the proportion of "2" responses is plotted against this difference, as
in Fig. 11.7b, a complete psychometric function that increases from .16 to .98
is obtained. This function offers a natural measure of bias, the PSE; it is not
even necessary to interpolate (in this case) to find that the PSE is an intensity
difference of −1.0 units. To estimate the threshold from curves like this, one
normally finds the average difference between the 75% and 25% points. The
resulting value is .5[0.72 − (−2.40)] = 1.56. This strategy attempts to eliminate
the bias in the estimate by averaging it away, but does not succeed. The bottom line is the same as in the earlier calculations because each of the two points being averaged is still influenced by response bias.3

TABLE 11.1 Constructing Bias-Free Psychometric Functions

                 p(c)        Intensity   Intensity
Intensity  p(c)  Corrected   Pair        Difference   P("2")   z[P("2")]
3          .91   .82         <0,3>        3           .98       2.05
2          .81   .62         <0,2>        2           .93       1.48
1          .67   .34         <0,1>        1           .84       0.99
                             <1,0>       -1           .50       0.00
                             <2,0>       -2           .31      -0.50
                             <3,0>       -3           .16      -0.99

The best plan, according to detection theory, is to plot the psychometric
function in units of z scores rather than proportions, as shown in Fig. 11.7c
(Klein, 2001). A natural definition of threshold is the level difference required to obtain a specific value of d' for a stimulus compared with the null
difference, d' = z(x) - z(0). Many investigators choose d' = 1, the 76% correct point in unbiased 2AFC. In this example, z(0) is found by interpolation
to be 0.5 units and z(2) equals 1.48, so d' = 1 is obtained when x approximately equals 2. The threshold for detection is therefore estimated to be 2
units, a value that is unaffected by response bias. The bias toward "2" in the
raw data led to an exaggerated impression of the observer's sensitivity (i.e.,
an unduly low estimate of threshold).
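The Table 11.1 arithmetic is short enough to verify directly; a sketch using the standard library's inverse normal:

from statistics import NormalDist

z = NormalDist().inv_cdf

# P("2") from Table 11.1, keyed by signed intensity difference (Interval 2 - 1)
p2 = {3: .98, 2: .93, 1: .84, -1: .50, -2: .31, -3: .16}
zs = {d: z(p) for d, p in p2.items()}  # the z-score function of Fig. 11.7c

z0 = (zs[-1] + zs[1]) / 2              # z(0) by interpolation
print(round(z0, 2))                    # 0.5: the bias toward "2"
print(round(zs[2] - z0, 2))            # 0.98, so d' = 1 near x = 2: threshold = 2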
Evaluation of Tracking Algorithms
A very large number of adaptive packages can be constructed by combining
rules for changing levels, target percentage, rules for finding new levels,
stopping testing, and computing a threshold. Like Treutwein (1995), who
listed 21 separate procedures, we find the goal of deciding on a single
method that can be recommended in all circumstances to be beyond our
reach. Instead we attempt to set out rationales that might justify choices in
particular applications. A necessary preliminary step is to establish some
criteria for evaluation.
3 For subjective judgment UDTR tasks (such as Example 11b), Jesteadt (1980) suggested a similar strategy: Estimate each of two symmetrically located points (e.g., 71% and 29%) on subsets of trials and average them to obtain the PSE. The task is said to reduce response bias and provide the illusion of having a correspondence function, in compensation for a loss of statistical efficiency.

Evaluation Criteria

Statistical Characteristics of Threshold Estimates. An empirical threshold is a statistic (see Appendix 1), an estimate of a theoretical parameter, and can be evaluated by asking two questions: (a) On the average, is it equal to the parameter? The average discrepancy between a statistic and the corresponding parameter is called statistical bias (a usage unrelated to response bias). (b) Is its variability small? To find out, comparisons with other, competing measures are made.
The variability of a threshold (or any other) statistic ordinarily decreases as more trials are used in computing it. Taylor and Creelman (1967) suggested that the work accomplished by a procedure could be measured by the sweat factor, equal to the product of the number of trials and the variance of the estimate. The relative efficiency of a measure is its sweat factor divided into the sweat factor of an alternative index. One interesting basis for comparison is the "ideal" variability, that constrained only by inevitable binomial variance (see Appendix 1).
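In code these definitions are one-liners; a minimal sketch (the function names are ours):

def sweat_factor(n_trials, variance):
    # Taylor and Creelman's (1967) index: trials times variance of the estimate
    return n_trials * variance

def relative_efficiency(own_sweat, alternative_sweat):
    # the measure's sweat factor divided into that of an alternative index
    return alternative_sweat / own_sweat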
Computations and Experiments. Adaptive procedures can be
compared by conducting threshold measurements with human or animal
observers or by simulating the outcome. More computations than experiments have been done, not only because they can be more easily conducted,
but also because they provide a needed baseline for experimental data. The
threshold to be expected can sometimes be calculated by enumerating every
possible outcome of a series of trials (e.g., Madigan & Williams, 1987). For
less tractable calculations, a common approach is simulation using a Monte
Carlo method: An assumed underlying distribution determines the effective probability of each response at each stimulus "level," and "runs" of
many "trials" are presented (Press et al., 1986, ch. 7). The observer imagined by most simulators is ideal (see chap. 12), reaching the best possible
performance given the limitations of sensory and response variability (Taylor & Creelman, 1967, Appendix). Simulations must mirror all important
aspects of the threshold estimation problem. For example, runs must begin
at variable starting points because in real experiments we do not know the
relation of the initial stimulus to the observer's threshold; assuming knowledge about the starting point produces unrealistically precise estimates
(Watson & Fitzhugh, 1990).
We can now evaluate the main classes of adaptive methods, those depending on maximum likelihood, PESTilent rules, or staircases.


Maximum-Likelihood Methods
If adaptive procedures are compared solely in terms of statistical criteria
and assessed by simulation, then a choice is not difficult to make: Maximum-likelihood procedures (QUEST and Best PEST) have the greatest efficiency. Because of its use of a priori information, QUEST is the better of
the two in efficiency and bias for large stimulus ranges (Emerson, 1986b;
Madigan & Williams, 1987). Watson and Pelli (1983) found the efficiency
of QUEST to be 84%.
One reason that pure maximum-likelihood methods have not made
their competitors obsolete is that they make a number of assumptions. To
determine the psychometric function "most likely" to have produced the
data, one needs a constrained set of candidates; thus the form of the function must be known. Most methods also require knowledge of the slope so
that all the possible functions differ only in threshold. These assumptions
are more attractive in well-mapped research areas than in novel domains.
A further assumption is that the data result from an ideal observer: Whatever the psychometric function is, it is the same on every trial, with no
shifts in threshold, lapses in attention, or loss of memory for stimulus
characteristics. It is straightforward to simulate these kinds of
nonoptimality, but not to incorporate them into algorithms for choosing
the next stimulus level.
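A minimal sketch of the core computation under exactly these assumptions (a logistic form with known slope; the grid, data, and placement rule are our illustrative choices). Best PEST-style procedures present the next trial at the currently most likely threshold:

import math

def p_correct(level, threshold, spread=1.0, chance=0.5):
    # 2AFC psychometric function: candidate curves differ only in threshold
    return chance + (1.0 - chance) / (1.0 + math.exp(-(level - threshold) / spread))

def log_likelihood(data, threshold):
    # data is a list of (level, correct) pairs from the run so far
    total = 0.0
    for level, correct in data:
        p = p_correct(level, threshold)
        total += math.log(p if correct else 1.0 - p)
    return total

def ml_threshold(data, grid):
    return max(grid, key=lambda t: log_likelihood(data, t))

data = [(2.0, True), (1.0, False), (1.5, True), (1.2, True)]
grid = [i / 10.0 for i in range(-30, 31)]
print(ml_threshold(data, grid))  # the next stimulus would be placed here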
Nonparametric Methods Using PEST
PEST packages are nonparametric in that they make no assumptions about
the underlying psychometric function. Taylor and Creelman (1967) calculated PEST to have an efficiency of 40% to 50%, which is better than all but
the maximum-likelihood procedures. Madigan and Williams (1987) found,
in a word-recognition experiment, that PEST was no less efficient in practice than Best PEST or QUEST. PEST-estimated thresholds are biased low
for short experimental runs or large stimulus ranges (Emerson, 1986a;
Madigan & Willams, 1987).
Taylor, Forbes, and Creelman (1983) reported data suggesting that PEST
observers suffer less from sensitivity fluctuations than do participants in the
method of constant stimuli. Shifts that do occur can be detected by examining a plot of stimulus level against trials (Hall, 1983; Leek, Hanna, & Marshall, 1991). Such trends are more evident in PEST or UDTR than in a
similar plot derived from the sort of continual adjustment made by
maximum-likelihood methods.

Nonparametric Staircase Methods

The UDTR method exercises simple staircase control over stimulus intensity. Its major advantage over the other methods is that computation and
stimulus control are simple. Computational complexity is an issue of diminishing importance, but it is still true that continuous adjustment of the stimulus variable is not practical in some domains. When experimenters know
fairly well what step size to use, and the ballpark of the threshold, UDTR
can be a good choice. Kaernbach's (1991) step-size method provides a
modest improvement in efficiency for short runs (Rammsayer, 1992).
UDTR decision rules are, in general, slightly less efficient than Wald rules
for the same target proportion.
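For concreteness, a minimal sketch of the 2-down/1-up version of the rule (the variant targeting 70.7% correct); the starting level and step size are illustrative:

def udtr_track(responses, start=32, step=4):
    # responses: booleans (correct/incorrect); returns the level on each trial
    level, run, levels = start, 0, []
    for correct in responses:
        levels.append(level)
        if correct:
            run += 1
            if run == 2:         # two consecutive correct: step down
                level -= step
                run = 0
        else:                     # any error: step up
            level += step
            run = 0
    return levels

print(udtr_track([True, True, True, False, True, True]))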
A strategy often used in conjunction with UDTR is the interleaving of
multiple adaptive tracks. The particular adaptive track to be used on a trial is
chosen at random, and its current stimulus is presented. This reduces the
predictability of the next stimulus level and aids memory in that if stimuli on
one track are at a low, hard-to-remember level, then those on the other track
may not be. An apparent disadvantage is that twice as many trials are required, but in compensation one obtains two distinct estimates of threshold.
Two More Choices: Discrimination Paradigm
and the Issue of Slope
Two other choices, not logically part of the tracking algorithm, must also be
faced. Of the many available discrimination paradigms discussed in earlier
chapters, which is to be used? And is the threshold the only important feature of the psychometric function, or should the slope also be estimated?
Discrimination Paradigm
2AFC. Most modern adaptive psychophysical procedures have
used the two-alternative forced-choice paradigm, probably because of its reputation for minimal response bias. Although this reputation is deserved,
2AFC is less efficient and more statistically biased than the yes-no paradigm
(Kershaw, 1985; Madigan & Williams, 1987; McKee et al., 1985). The inefficiency results from the reduced range: p(c) takes values between .5 and 1 in 2AFC, whereas the yes-no hit rate increases from 0 (or the false-alarm rate) to 1. The response-bias-free method described earlier solves this reduced-range problem. The statistical bias arises because the lowest values of the psychometric function near an asymptote can still yield erroneous decisions, whereas the upper values, near the 100% point, are unlikely to give wrong answers. Threshold estimates are therefore systematically too low.


mAFC. Simulations show that offering more than two alternatives per forced-choice trial results in improved efficiency and smaller bias
(e.g., McKee et al., 1985). Some of the advantage arises from the increased
range of the psychometric function, the lower limit of which is 33.3% in
3AFC and 25% in 4AFC. In addition, mAFC designs make it easier to place
the target above the midpoint of the psychometric function—a desirable
goal (Leek, 2001). Auditory detection experiments confirm that 3AFC and
4AFC are to be preferred over 2AFC (Kollmeier et al., 1988; Shelton &
Scarrow, 1984). Of course more presentations per trial mean longer trials;
thus, Schlauch and Rose (1990) recommended three alternatives over four.
Yes-No. The task that maximizes the range of the psychometric
function is yes-no, which has a minimum response rate of 0%. An additional advantage, in most applications, is that each trial contains a single
temporal interval, reducing experiment time. Kaernbach (1990) developed
a "single-interval adjustment-matrix" procedure in which different targets
are reached by manipulating step size. Both signal and noise trials can occur, and the method aims at a response rate target of t = H - F. Stimulus level is lowered 1 unit following a hit, increased t/(1 - t) units following a miss, and increased 1/(1 - t) units following a false alarm. For a 75% target, these
values are -1, +3, and +4 units. In simulations and experiments, Kaernbach
showed a substantial benefit of this method over Bekesy tracking and
2AFC, especially when the time per trial was taken into account.
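The adjustment matrix is trivial to compute for any target; a minimal sketch (the zero step after a correct rejection completes the matrix):

def siam_steps(t):
    # level changes after (hit, miss, false alarm, correct rejection),
    # in units of the downward step, for a target t = H - F
    return (-1.0, t / (1.0 - t), 1.0 / (1.0 - t), 0.0)

print(siam_steps(0.75))  # (-1.0, 3.0, 4.0, 0.0): the -1, +3, +4 of the text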
Slope
Most of our effort in this chapter has been aimed at estimating a single point on
a psychometric function, but it is often useful to know more. Knowledge of a
complete function is best obtained with the method of constant stimuli; Miller
and Ulrich (2001) provided a nonparametric method for estimating the function in some detail. Several investigators have designed adaptive methods with
a more modest goal: an accurate estimate of the function's slope. This statistic
gives information about the reliability of threshold estimates and is helpful in
maximum-likelihood calculations in which a slope must be assumed.
At the end of an adaptive run, response proportions for a number of stimulus levels are available, and in principle slope could be estimated by fitting
functions of differing slopes to the data. In fact, however, a set of levels that
is well chosen for the goal of estimating threshold is not ideal for estimating
slope—typically, the points are too close together. The first step in modifying
adaptive methods for slope estimation is to adjust the rules for selecting
stimuli.


Consider, for example, the adaptive probit estimation (APE) procedure
of Watt and Andrews (1981). Four stimuli are selected that are thought to
span the major portion of the psychometric function, and a short run using
these values is presented. At the end of the run, the observed response proportions are fit by the probit method, and four new values are selected that
cover the new estimate of the function. The procedure continues in this
adaptive manner.
One way to view the simultaneous estimation of threshold and slope is
as a search through a two-dimensional parameter space, and current methods approach the problem from this point of view. In the earliest of these,
Hall (1981) used the PEST tracking rules combined with a large initial
step to guarantee a dispersion of stimulus values; both the starting point
and the initial step are adjusted between runs. The summary psychometric
function is chosen by maximum likelihood from a set varying in both
mean and variance. King-Smith and Rose (1997) and Kontsevich and Tyler (1999) improved on this basic approach by the use of maximum likelihood and Bayesian methods in stimulus selection as well as for data
summary. One cautionary conclusion from this body of work is that the
number of trials needed for an accurate assessment of slope is far greater
than the number needed for a threshold. Leek, Hanna, and Marshall
(1992) recommend 200 trials to find a slope value. Kontsevich and Tyler
(1999) estimated that 300 are required, versus 30 for a simple threshold
estimate. Clearly one needs a good reason for all this extra work; one compelling rationale would be the existence of theories that make predictions
about psychometric function slope.
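In its simplest (nonadaptive) form, the two-dimensional search is a grid maximization; a minimal sketch with illustrative data and grid:

import math

def p_yes(level, threshold, spread):
    return 1.0 / (1.0 + math.exp(-(level - threshold) / spread))

def log_likelihood(data, threshold, spread):
    total = 0.0
    for level, said_yes in data:
        p = p_yes(level, threshold, spread)
        total += math.log(p if said_yes else 1.0 - p)
    return total

data = [(-2, False), (-1, False), (0, False), (0, True), (1, True), (2, True)]
grid = [(t / 4.0, s / 4.0)
        for t in range(-8, 9)    # candidate thresholds, -2 to 2
        for s in range(1, 9)]    # candidate spreads (inverse slopes), 0.25 to 2
best_t, best_s = max(grid, key=lambda ts: log_likelihood(data, *ts))
print(best_t, best_s)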
Summary
Adaptive procedures estimate the stimulus level needed for a fixed level of
performance. The stimulus that yields some proportion of correct responding in a forced-choice task (or a specified hit rate in yes-no) is found by systematically varying the stimulus difference during an experimental run.
Procedures differ in the rules by which they decide to change stimulus
level, the target performance accuracy, the rules by which the new level is
computed, the criterion for ending a run, and the method of computing a
threshold from the data. The simplest rules change level after a fixed number of trials, by a fixed step size, stop after a fixed number of trials or reversals of direction, and compute threshold from one or a few points. More
complex rules make greater use of the history of the run, the prior judgments
of the experimenter, and the expected form of the psychometric function.


Computer simulations show the more complex rules to be more efficient, that is, to produce less variable and less biased estimates, provided
that their assumptions are correct. The most popular discrimination paradigm, 2AFC, is inferior to mAFC and yes-no, especially at low performance levels. Psychometric function slope can be estimated with
appropriate modifications in adaptive procedures, but at considerable experimental cost.
Problems
11.1.

Following are some stimulus values, together with the number of correct and incorrect responses to date at each level in a 2AFC detection experiment:

Stimulus   Number Correct   Number Incorrect
-2.5       0                2
-2.0       0                1
-1.0       1                4
-0.5       2                3
0.0        3                2
0.5        4                1
1.0        4                0
2.0        2                1
2.5        1                0

For each of the following logistic psychometric functions, find the likelihood of these data using the method of Equation 11.5:

p(x) = 1/(1 + e^(-x)),   p(x) = 1/(1 + e^(-(x-1))),   p(x) = 1/(1 + e^(-(x+1)))

Which function is most likely for these data? Plot both the observed data points and the three theoretical curves.


11.2.  A new trial is run at stimulus intensity 0, and the observer is correct. Recompute the likelihoods of the three functions with the new data. Which is now most likely? Plot the new data point on the graph you drew for the previous problem.
11.3.  A string of trials in an adaptive 2AFC experiment leads to the following correct (+) and incorrect (0) responses:

+0+0++0+++0000++++++

(a) Apply the Wald sequential test with p(T) = .75 and deviation limit 1.0 until a decision to change level is made. Continue until you run out of data. On what trials is the decision made, and in which direction is the change?
(b) Apply the 2/1 UDTR rule in which p(T) = .71 to the same set of responses, and answer the same questions.
(c) Apply the 4/1 UDTR rule in which p(T) = .84 to the same set of responses, and answer the same questions.
11.4.  For each of the three rules in Problem 11.3, find the stimulus level after each decision:
(a) using the PEST stepping rules with initial level of 32, initial step of 16, maximum step of 16, and minimum step of 1;
(b) and (c) using the UDTR stepping rules with initial level of 32 and all steps of size 4.

12
Components of Sensitivity

What determines the degree to which two stimuli can be distinguished? Detection theory offers a two-part answer: Sensitivity is high if the difference
in the average neural effects of the two is large or if the variability arising
from repeated presentations is small. Common measures of accuracy like d'
are accordingly expressed as a mean difference divided by a standard deviation. In most of the applications we have considered, changes in sensitivity
are equally well interpreted as changes in mean difference or variability,
and attributing such effects to one source or the other is both impossible and
unnecessary. In the early chapters of this book, we therefore suppressed the
role of distribution variances, dealing only with mean differences and standard deviation ratios.
When the experimental situation is expanded beyond two stimuli, the locus of a sensitivity effect may become clear. If three stimuli differ along a
single dimension—light flashes varying only in luminance, for example—
and the extreme stimuli are more discriminable than the adjacent ones, systematic increases in mean effect provide the simplest interpretation. If the
perceptibility of a stimulus decreases when another must also be detected,
as in uncertain detection designs, it is natural to imagine that variance rather
than mean difference has been affected by the demands of attention. Our
treatments of these problems in chapters 5 and 8 adopted exactly these interpretations.
In the pure two-stimulus world, disentangling these two contributions to
sensitivity requires another approach. A starting point is to ask whether
there is variability within a stimulus class itself, and perusal of our several
examples reveals the answer to be: sometimes. Absolute auditory detection
typifies one case: Every presentation of a weak tone burst is the same, so all
the variability must arise from processing. The variance is entirely internal.

Recognition memory is quite different: No matter how carefully the stimulus set is constructed, the items in it must differ in familiarity (or whatever
the decision variable is). If recognition is represented as a task of distinguishing two distributions of familiarity, external variance contributed by
the stimulus set combines with the internal variance of the fallible observer.
In this chapter, we examine efforts to partition blame for imperfect sensitivity between external and internal sources, and among components of
each of these. We begin with the simplest case, two distributions on one dimension arising from stimulus classes that are nonconstant. The primary
question is the relative importance of internal and external variance—the
efficiency of the observer compared with the best possible performance. In
the second section, we extend the information combination ideas of chapters 6 to 10 to partition variance among components of a stimulus and
among observers in a "team." Finally, we discuss hierarchical models in
which variance may arise at multiple levels of processing. The application
of these ideas in perception is too widespread to cover in a chapter, and a
thorough understanding requires sophistication in particular content areas.
We intend this introductory presentation to illustrate an important use of detection theory that is not treated elsewhere in the book.
Stimulus Determinants of d' in One Dimension
Example 12a: The Dice Game
The first detection theory problem encountered by many students is a "dice
game" described in the first chapter of Green and Swets' (1966) classic
treatment. On each trial, three dice are rolled, two conventional dice with 1
to 6 spots, and a third die that contains 0 spots on 3 sides and 3 spots on the
other 3 sides. You are given the total number of spots on the three dice and
asked to judge the value of the third, critical die. What decision rule should
you adopt, and how well can you expect to do?
In this artificial problem, the decision maker is obviously discriminating
between distributions, not constant stimuli.¹ Possible totals range from 2 to
12 if the third die is 0 and 5 to 15 if it is 3, so values between 5 and 12 could
arise from either 0 or 3. Figure 12.1 shows the distributions, which are triangular in shape. The natural decision rule (natural to the reader of this book,
¹A given trial does not contain a distribution, but only an event—in this case, a number. We nonetheless use the term distribution discrimination for cases in which the possible events in S1 and S2 are explicitly varied by the experimenter.


FIG. 12.1. Distributions of totals for the dice game. The S1 distribution gives the possible totals for two conventional dice plus 0, and the S2 distribution gives the possible totals for two conventional dice plus 3.

who encounters the problem in chap. 12 rather than chap. 1) is to establish a
criterion at some value between 5 and 12.
If presentation probabilities are equal, what is the highest success rate the
player can obtain? In this case, the criterion should be at the crossover point
of the two distributions, and the decision rule is to respond "0" for totals of 8
or less and "3" for totals of 9 or more. Examination of the distributions reveals that the "hit rate" (correctly saying "3") and the correct rejection rate
(correctly saying "0") will each equal 26/36 = .72. This is the best performance level the observer can reach.
Models of optimal performance of this kind are called ideal observers.
The strategy described for the dice example is ideal in three senses. First, the
decision variable is the total number of spots, which is perceived without error. Second, the observed value is compared with a fixed criterion. Third,
the criterion is placed at a location that maximizes accuracy as measured by
p(c). When either of the first two characteristics is violated by a human observer, as when there is an error in perception or variability in the criterion
location, lower than ideal sensitivity results. If only the third characteristic
is violated, a point on the ideal ROC is obtained, and performance might or
might not be considered ideal depending on the application. Performance
that is reliably better than ideal is not possible.
Can we compute a d' for the dice game? The distributions are obviously
not normal, but the analogous statistic is easily found. The mean difference
is 3, the standard deviation can be shown to be 2.45, and the ratio of these is

1.22. If this were a d', it would correspond to a p(c)max of .73, quite close to
the true value based on the actual triangular distributions.
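Both figures are easy to check by enumeration; a short sketch:

from itertools import product
from statistics import NormalDist

two_dice = [a + b for a, b in product(range(1, 7), repeat=2)]  # all 36 outcomes

hit_rate = sum(d + 3 >= 9 for d in two_dice) / 36   # correct "3" responses
cr_rate = sum(d + 0 <= 8 for d in two_dice) / 36    # correct "0" responses
print(hit_rate, cr_rate)            # each 26/36 = .72, the ideal p(c)

print(NormalDist().cdf(1.22 / 2))   # normal analogy: p(c)max of about .73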
Distribution Discrimination
Green and Swets intended the dice game as a pedagogical device, and for
that matter so do we. But a number of investigators have used similar tasks
to compare actual performance with that of an ideal observer who behaves
optimally.
Lee and Janke (1964, 1965), in fact, used numerical distributions as in
the dice game, except that the distributions were normal. In a later experiment of this type, Kubovy, Rapoport, and Tversky (1971) found that responding was close to ideal, but that about 6% of responses could not be
predicted by the optimal rule. This is what would be expected if the observer
shifted the response criterion from trial to trial; we discuss this further later.
Lee and his colleagues drew similar conclusions from one-dimensional
non-numeric distributions (e.g., grayness of a patch of paper).
Comparisons of Real and Ideal Observers
In measuring detection thresholds, it has been important to find out whether
limitations in sensitivity are inherent in the stimulus or derive from shortcomings of the observer. We consider one example from vision and one
from hearing, and then we consider how the relative contributions of these
two sources of "noise" can be estimated.
Absolute Visual Detection. An early experiment that compared
real and ideal observers predates the development of detection theory.
Hecht, Schlaer, and Pirenne (1942) asked how many quanta of light need to
be absorbed for a viewer to detect a weak visual stimulus (see Cornsweet,
1970, and Luce & Krumhansl, 1988, for summaries). Figure 12.2a shows
the percentage of "yes, I see it" responses as a function of stimulus intensity.
Because of uncertainties about how many quanta are filtered out by the optical system of the eye, these data cannot be used directly to determine the
minimum number of quanta required for detection. Thus, the stimulus axis
is labeled in arbitrary units: The values are only proportional to the number of
quanta reaching the receptors and cannot be used directly to find the exact
number of quanta being absorbed.
Ideal observers enter the picture because for a fixed light intensity the
number of quanta reaching the retina is not constant, but has a Poisson distribution.

FIG. 12.2. (a) A psychometric function obtained by Hecht, Schlaer, and Pirenne (1942) for visual detection. (b) The same data with theoretical Poisson functions overlaid on them. Each curve corresponds to a different hypothetical number of quanta required for detection, and each has a different slope. The curve that assumes 7 or more quanta to be required for detection provides the best fit. (Adapted with permission from Figs. 4.6 and 4.7 in Cornsweet, 1970.)

This distribution can be used to predict the shape of the psychometric function. If the data fit the prediction, we could conclude that the
only limitation in detecting weak lights lies in the stimulus itself. The Poisson is a family of one-parameter distributions for which the mean and variance are equal, and as a consequence the predicted slope of the
psychometric function depends on the number of quanta required for detection. Figure 12.2b shows a family of psychometric functions derived from
the Poisson. When the data points obtained by Hecht et al. are superimposed on these theoretical curves, the best fit is the case in which 7 quanta
are required for the observer to say "yes."
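A sketch of how the theoretical family in Fig. 12.2b can be generated; the intensity values here are arbitrary, as in the text:

import math

def p_detect(mean_quanta, k):
    # P(at least k quanta absorbed) for a Poisson with the given mean
    p_below = sum(math.exp(-mean_quanta) * mean_quanta ** n / math.factorial(n)
                  for n in range(k))
    return 1.0 - p_below

for k in (5, 6, 7, 8):  # candidate numbers of quanta required for detection
    print(k, [round(p_detect(m, k), 2) for m in (2, 4, 6, 8, 10)])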
We know, therefore, that if the observer is ideal the number of quanta
needed for detection is 7, but we do not know whether the observer is in fact
ideal. As mentioned earlier, one sort of nonoptimality is variation in the observer's response criterion from trial to trial, and it turns out that the effect of
such variation is to decrease the slope of the psychometric function. The
psychometric function measured by Hecht et al. could, therefore, be steeper
than the true function, and the slope of that function would correspond to a
higher threshold. Hecht et al. were able to resolve this indeterminacy in a
more precise experiment at a single performance level (60%), and found
that 8 quanta were sufficient to produce this hit rate. The small difference
between this number and the good fit of the 7-quantum, ideal observer theoretical curve led them to conclude that the major limitation on visual detection was the variability of the stimulus. The human observer was, in this
case, almost ideal.
Detection of Pure Tones in Noise. One of the early applications
of detection theory, summarized in Green and Swets (1966), was to the detection of tones in noisy backgrounds. The ideal observer for this problem
uses all the detail of the stimulus waveform, and is thus termed a signal-known-exactly observer. The optimal analysis is to calculate a "cross-correlation" between the observation interval and a remembered copy of the signal. To predict d', one calculates the average output of the cross-correlator
for Noise and Signal trials and divides the difference by the standard deviation of this device. Human observers do not meet the prediction made in this
way. Their d' values are lower—they have efficiencies less than 1.0—indicating that they are not able to use all the information (e.g., phase) required
by the cross-correlation strategy.
If observers are not ideal in solving this problem, what aspects of the
waveform do they take advantage of? One way to answer this question is to
examine the potential performance of some nonoptimal strategies. For example, perhaps the observer simply calculates the energy in the observation
interval, discarding other information like phase and frequency. Energy detectors do a much better job of predicting performance. An exact correspondence could be interpreted to mean that human observers are ideal in using
the information they collect, even if they do not incorporate other information that could raise their accuracy.
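The contrast between the two decision statistics is easy to simulate; in this sketch the signal, noise level, and trial counts are our illustrative choices. For the same stimuli, the cross-correlator attains a higher d' than the energy detector:

import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100)
signal = 0.3 * np.sin(2 * np.pi * t / 20)  # the remembered copy of the signal
sigma = 1.0                                # noise standard deviation

def cross_correlator(x):
    return float(np.dot(x, signal))  # compare observation with the stored copy

def energy(x):
    return float(np.sum(x ** 2))     # discard phase and frequency information

def d_prime(stat, n=5000):
    noise = np.array([stat(rng.normal(0.0, sigma, t.size)) for _ in range(n)])
    sig = np.array([stat(signal + rng.normal(0.0, sigma, t.size)) for _ in range(n)])
    pooled_sd = np.sqrt((noise.var() + sig.var()) / 2.0)
    return (sig.mean() - noise.mean()) / pooled_sd

print(d_prime(cross_correlator), d_prime(energy))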
Combining Internal and External Noise. Because internal noise
is never exactly zero, sensitivity in tasks containing stimulus variation is
limited by a combination of internal and external noise. The most common
approach to modeling this situation is to imagine that the two types of variability are additive, so that the total variance limiting performance is simply
the sum of the external and internal contributions.


Suppose, as in Lee and Janke (1964), that an observer is discriminating
two distributions of line lengths, with means of 10 and 14 cm and a common standard deviation of 2 cm. The ideal observer's sensitivity can be
written as

d' = (M2 - M1)/σE    (12.1)

where M1 and M2 are the means of the distributions and σE is the external standard deviation. Thus the situation has a d' of 2 and an unbiased p(c) of .84. If actual (unbiased) accuracy is only .76, so that d' = 1.41, how much of the variability is internal and how much is external?
We assume that the effective variability is the sum of the external and internal variances and these two components are independent. Then

d' = (M2 - M1)/(σE² + σI²)^(1/2)    (12.2)

where σI is the internal standard deviation. The ratio of ideal to observed d' can be used to estimate the ratio of internal to external variance because (combining Eqs. 12.1 and 12.2)

d'ideal/d'observed = (1 + σI²/σE²)^(1/2) .    (12.3)
In the example, this ratio equals 1.0, leading to the conclusion that the
amounts of internal and external variance are equal.
What exactly is "internal noise"? In Lee and Janke's length discrimination task, the fault probably lies in the decision rule rather than the encoding process. Two ways in which decision making might be imperfect were raised in the Essay in chapter 2. If the observer is completely inattentive on a proportion γ of trials, performance declines. In this case, the average p(c) of .76 would equal (.5)γ + (.84)(1 - γ), so γ = .24—the observer is not attending to about one quarter of the trials. Alternatively, as mentioned earlier, the observer allows the criterion to vary. The equality of internal and external variances implies that the standard deviation of criterion location, like that of the length distribution itself, equals 2 cm.
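Both accounts can be checked with a few lines of arithmetic, using the numbers of the example:

import math

ideal_d, observed_d = 2.0, 1.41
sigma_e = 2.0                                   # external SD, in cm

var_ratio = (ideal_d / observed_d) ** 2 - 1.0   # sigma_I²/sigma_E², from Eq. 12.3
sigma_i = sigma_e * math.sqrt(var_ratio)
print(round(var_ratio, 2), round(sigma_i, 2))   # about 1.0, and about 2 cm

gamma = (0.84 - 0.76) / (0.84 - 0.5)            # the inattention account
print(round(gamma, 2))                          # about .24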
Our simple model for combining internal and external noise allows us to calculate the observer's efficiency, defined as the square of the ratio of observed to ideal d'. A rearrangement of Equation 12.3 shows that in this case it equals σE²/(σE² + σI²).

The same ideas can be extended to stimulus classes that vary on more than one dimension. Ashby and Gott (1988) studied such a case, shown in Fig. 12.3.

FIG. 12.3. Data reported by Ashby and Gott (1988) for one observer in a classification experiment. Stimuli consisted of a horizontal line segment joined to a vertical line segment. The lengths of both were varied by the experimenters and are represented on the horizontal and vertical axes. xs indicate stimuli that the observer judged to come from a distribution centered at the point (400, 500), filled circles indicate stimuli judged to come from one centered at (500, 400), and a third symbol marks response inconsistency. (Adapted with permission from Fig. 7 in Ashby and Gott, 1988.)

If the observer bases the decision on the difference between the vertical and horizontal lengths, v - h (or something very much like it), the problem is converted into a one-dimensional one like that of Lee and Janke (1964). To determine the optimal value of d', we first find the difference between the two means from the Pythagorean Theorem; it is (100² + 100²)^(1/2) = 141. The standard deviation of 84 is the same in all directions, so d' = 141/84 = 1.68, and the value of p(c) expected for an unbiased observer is .80. All three observers obtained scores close to this (.88, .82, and .79), leading to the conclusion that virtually all the variability limiting a decision in this case was in the stimulus distributions themselves.

This conclusion is supported by a closer look at Fig. 12.3. The diagonal line is the ideal decision boundary, v - h = 0. The xs and filled circles do not correspond to the A versus B distributions, but to the responses given by the observer. Virtually all the xs ("A" responses) and filled circles ("B" responses) were consistent with the use of the ideal boundary, and the "errors" arose from cases in which the A distribution produced stimuli in the "B" region and vice versa. Put in terms of the previous section, these observers displayed little internal noise.

The questions we have asked about classification of two-segment line figures focus on overall performance, and the analysis has been quite similar to that applied by Lee and Janke (1964) to classification of single lines. The two-dimensional distributions also allow us to explore the way in which the two components combine to determine overall performance, and Ashby and Gott used their stimulus set to compare different integration models. In Berg's (1989) conditional-on-a-single-stimulus (COSS) analysis, the proportion of trials on which a given response is made is plotted against the value of a single stimulus component; for the horizontal segment, the plotted proportion is P("B") rather than P("A"). COSS functions for both segments are given in Table 12.1.

TABLE 12.1 COSS Functions for the Ashby and Gott (1988) Experiment

            Vertical                 Horizontal
Lengths     P("A")    z[P("A")]      P("B")    z[P("B")]
<250        0.00                     0.00
250-300     0.09      -1.34          0.00
300-350     0.03      -1.88          0.11      -1.23
350-400     0.17      -0.95          0.13      -1.13
400-450     0.44      -0.15          0.36      -0.36
450-500     0.72       0.71          0.55       0.58
500-550     0.94       1.56          0.85       1.04
550-600     0.95       1.64          0.92       1.40
>600        1.00                     1.00

FIG. 12.4. COSS functions constructed from the Ashby and Gott data in Fig. 12.3.

Because the A and B distributions are approximately normal, converting the COSS functions to z scores should produce straight lines, and Fig. 12.4 shows that this is roughly true. The slopes of the lines (0.0124 for vertical
location and 0.0118 for horizontal) measure the effectiveness of the two
components in making the classification judgment and can be used to determine the weighting given to each dimension.
The decision rule assumed by COSS is to compare a weighted sum of component observations with a criterion. Calling the horizontal dimension x1 and the vertical dimension x2, the rule is to respond "A" if a2x2 - a1x1 > c. (The ai are weighting constants, and the minus sign arises from the negative relation between the two components.) Solving this relation, we find that x2 must be greater than y2, where

y2 = (c + a1x1)/a2 .    (12.4)

This equation, which describes a line in the space of Fig. 12.3 with slope a1/a2, is the decision bound for the vertical segments. There is an analogous boundary y1 for the horizontal segment. Berg showed that the values of the weights a1 and a2 depend on the variance of these variables in the following way:

a1²/a2² = (var[y2] + σ2²)/(var[y1] + σ1²)    (12.5)

where σ1² and σ2² are the external variances of the vertical and horizontal length distributions. Intuitively, one component receives more weight than another if the variance of either the stimulus distribution or the observer's
responding is smaller.
Applying this analysis is straightforward. It turns out that the slope of each COSS function [z(P("A"|v)) vs. v] equals the square root of the inverse of var[yi], so

a1²/a2² = (1/0.0118² + 84²)/(1/0.0124² + 84²) = 1.05 .    (12.6)

Because a1 and a2 must add to 1, the equation can be solved: a1 = .506 and a2 = .494. The two segments contribute nearly equally to the categorization decision, and the decision bound in Fig. 12.3 has a slope of 1.02.
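A minimal sketch of the weight computation (our arrangement of Eqs. 12.5 and 12.6):

import math

slope_v, slope_h = 0.0124, 0.0118  # COSS slopes reported above
sigma = 84.0                       # external SD of each length distribution

ratio_sq = (1.0 / slope_h ** 2 + sigma ** 2) / (1.0 / slope_v ** 2 + sigma ** 2)
a1_over_a2 = math.sqrt(ratio_sq)
a1 = a1_over_a2 / (1.0 + a1_over_a2)  # weights constrained to sum to 1
a2 = 1.0 - a1
print(round(a1, 3), round(a2, 3))     # .506 and .494, as in the text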
This conclusion seconds that of Ashby and Gott, which was based on a different analysis of the data. They also found that the best-fitting linear bound for this participant in the space of Fig. 12.3 had a slope of about 1, implying that x1 and x2 were equally weighted by their observers. Although this example suggests that GRT and COSS analyses are just two mathematical translations of the same text, more complex perceptual situations reveal advantages for each. GRT is more flexible when the independence assumption is abandoned. As we saw in Part II, dependence between dimensions can be defined in several diagnosable ways.
COSS analysis has an advantage when the number of components contributing to a decision is greater than two. In an auditory experiment analogous to those we have been discussing, Berg, Robinson, and Grantham
(1989; summarized in Berg, 1989) presented listeners with a sequence of up
to 10 tones, each drawn from a normal distribution with a mean of 1000 Hz
and a standard deviation of 100 Hz; or each drawn from a normal distribution with a mean of 1100 Hz and the same standard deviation. The COSS
approach assigns each position in the sequence its own weight ai, and any two positions can be compared using Equation 12.5. The requirement that Σai = 1 allows just enough equations to estimate all the weights. One finding
in the Berg et al. study was that the greatest weight was assigned to the last
item in the sequence.
Groups of Observers
In some important real-life situations, groups of individuals (such as juries
or committees) must make a decision based on the same evidence. The
framework we have been describing for combining "dimensions" within a


stimulus can be extended to the problem of combining information from
group members. Sorkin and his colleagues (Sorkin & Dai, 1994; Sorkin,
Hays, & West, 2001) have studied an experimental situation in which performance of a group can be compared with various ideal models.
In a typical experiment, each member of a team of observers makes a
judgment in a visual discrimination task, and the votes are somehow combined into a group response. One kind of model that can be used as a baseline is the Condorcet group. In such a group, each individual casts an
unbiased vote, the votes are weighted equally, the judgments are treated as
independent, and a decision is reached by some type of majority rule. The
decision rule may be a simple majority, unanimity, or anything in between.
Ideal groups are superior to Condorcet groups in three respects: Individuals
make graded judgments rather than binary ones, their judgments are
weighted in proportion to their expertise (i.e., their d' values) and summed,
and the summed d' statistic is compared with a criterion to make a decision.
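A simulation sketch of the two baselines; the individual d' values, the criterion placements, and the trial count are our illustrative assumptions:

import numpy as np

rng = np.random.default_rng(2)
d = np.array([0.8, 1.0, 1.2, 1.5, 0.9])  # each member's sensitivity
n = 20000

obs_s = rng.normal(d, 1.0, size=(n, d.size))    # observations on signal trials
obs_n = rng.normal(0.0, 1.0, size=(n, d.size))  # observations on noise trials

# Condorcet group: unbiased binary votes, combined by simple majority
votes_s = (obs_s > d / 2).sum(axis=1)
votes_n = (obs_n > d / 2).sum(axis=1)
majority = d.size / 2.0
pc_condorcet = ((votes_s > majority).mean() + (votes_n <= majority).mean()) / 2

# Ideal group: graded judgments weighted by d', summed, one criterion
crit = (d * d / 2).sum()  # unbiased criterion for the weighted sum
pc_ideal = ((obs_s @ d > crit).mean() + (obs_n @ d <= crit).mean()) / 2

print(round(pc_condorcet, 3), round(pc_ideal, 3))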
Both the ideal group and Condorcet groups with different majority rules
predict that performance will increase with the size of the group as shown in
Fig. 12.5. Also plotted in the figure are data points from an experiment constructed to remove any artificial constraints on group performance—graded
responses were made, a decision was made by consultation rather than strict
voting, and the expertise of the group members was known. As can be seen,
performance increased with group size in the manner predicted by a
Condorcet group operating on the basis of unanimity or near unanimity—a
nonoptimal rule. The discrepancy between observed performance and ideal
increased with group size, in that efficiency (the square of the d' ratio)
dropped from about 90% for a two-person group to 45% for seven people.
Because the analysis follows that developed in the multicomponent stimulus context, tools developed there can be used. For example, Sorkin et al.
(2001) were able to apply the COSS method to determine the weights assigned to each individual in a group. Groups in this experiment weighted
observations roughly according to the expertise of the contributor, a comforting result. Judgments were essentially uncorrelated and, as predicted,
when a correlation was introduced experimentally (by correlating the
stimulus arrays), performance dropped.
Decision making by groups has been studied extensively by social psychologists, and some of their findings are illuminated by these results. For
example, participants sometimes reduce their efforts when participating in
a group, a finding that has been called social loafing. In SDT modeling, this
result can be understood as a response to high correlations between individ-


FIG. 12.5. Group performance in a signal-detection task as a function of group size. Theoretical
curves describe the ideal
method of combining information and various simpler "Condorcet" rules
based on unweighted tallying of individual votes.
The data are most consistent with the least optimal
Condorcet rules. (Adapted
with permission from Fig.
4 in Sorkin, Hays, & West,
2001.)

ual judgments or a low weighting assigned in group decision making. The
existence of Condorcet and ideal comparison models allows for more specific hypotheses in the study of such interesting phenomena.
Hierarchical Models
We have been describing models in which observer performance is compared with ideal performance based on stimulus structure, but the general
approach can be elevated one step so that the comparison is between different levels of processing. To make this work, we must construct two tasks
that use the same stimuli but require different kinds of treatment by the observer. In the simplest case (and the only one we pursue here), one task depends only on low-level mechanisms, the other on both low- and high-level
mechanisms.
An example of such an analysis was presented in chapters 5 and 7.
Durlach and Braida (1969) postulated that performance in a 2AFC discrimination paradigm in which the same two stimuli (auditory pure tones) were
discriminated on every trial was essentially limited only by the sensory
noise that arises inevitably from neural coding. Identification accuracy is
also limited by this sensory variance, but also by context coding memory
noise; a comparison of discrimination and identification allows calculation
of the relative magnitude of these two sources of variance. Because Durlach


and Braida assumed the two types of noise to be additive, this proportion
can be found from a variant of Equation 12.3 (cf. Eq. 5.5). A similar analysis
of 2AFC roving discrimination designs, in which the two stimuli to be discriminated vary from trial to trial, proposes that sensory variance, context
coding, and time-dependent trace coding combine to determine sensitivity.
Each component can be estimated from a suitable data set; see chapters 5
and 7 for more detail.
A similar strategy has been applied to visual search experiments, which
are rather more complicated than pure-tone resolution. In a typical visual
search design, the observer must determine whether a target (say, a horizontal line segment) is present in an array of distractors (say, vertical line segments). The dependent variable is usually response time, but accuracy may
also be measured. An important finding in this literature is that performance
is better when target and distractor differ by a single "feature" (as in the line
orientation case) than when two features are relevant (as in finding a red
horizontal segment in a field of red vertical and green horizontal segments).
A common interpretation of this finding (Treisman & Gelade, 1980) is that
additional processing is required in "conjunction" conditions to integrate
the two features.
Geisler and Chou (1995) asked whether low-level factors might be responsible for such differences in search performance and sought to measure
the low-level baseline for this task. Like Durlach and Braida, they measured
2AFC discrimination (in their case of a field with a target from a field that
consisted only of distractors), limited the range of stimuli (using an adaptive procedure like those described in chap. 11), and provided trial-by-trial
feedback. Data from these tests were summarized by a discrimination window describing performance over spatial area and stimulus duration; the
wider this window, the better the low-level processing.
If low-level mechanisms account substantially for visual search speed
and accuracy, then experimental conditions with larger windows should
produce faster and more accurate searches. A strong correlation of this type
is exactly what Geisler and Chou (1995) observed, and they concluded that
the slowness of conjunction searches compared to feature searches, "may
be due (at least in part) to low-level factors and not to complex aspects of the
attentional mechanisms" (p. 370). It is clear that high-level processes like
attention allocation play an important role in visual search, particularly in
multiple-fixation conditions, but accounting for more variance with low-level mechanisms makes the task of developing a general model of visual
search both more manageable and more accurate.

Essay: Psychophysics Versus Psychoacoustics (etc.)

In the community of auditory researchers, two terms are used to describe research on the discrimination and classification of sounds, and it is useful to
draw a distinction between them. Psychophysics is the use of theoretically
grounded methodology to interpret perceptual measurements of sounds,
and psychoacoustics is the project of relating those measurements to the
sounds' physical characteristics. A similar distinction can be made in other
modalities and in cognitive applications as well.
Until the present chapter, this book has been almost entirely about
psychophysics: The questions of how to separate sensitivity and bias, compare different discrimination paradigms, and relate classification to discrimination data have been taken up with the most modest of stimulus descriptions.
The variability limiting performance has often been partitioned between
components (sensory and memory, attention to one or more sources of information), but these components have kept their distance from the stimuli.
In this chapter, we have taken a few steps toward adjusting the balance.
Although psychoacoustics has by no means received equal time, the models
here do include stimulus factors as explicit contributors to sensitivity or to
its limitations. Detection theory unifies approaches with varying degrees of
reliance on stimulus factors, and there is a discernable continuum of application, from complete reliance on the stimulus to explain the data to
complete indifference to it.
One early line of detection theory research was heavily psychoacoustic. The path of this program moves from the ideal observer models
summarized in Green and Swets (1966) to studies of profile analysis
(Green, 1988) and Berg's (1989) COSS analysis. The undoubted success
of this body of work, however, was obtained at the cost of the introduction
of explicit variability into the stimulus sets. Detection of a noise increment in a noise background is well understood in terms of energy detection (Green, 1960), but the background noise is necessary to calculate the
variance that limits performance. This kind of ideal observer model would
not be able to predict the detectability of a noise burst in silence. Similarly,
both GRT and COSS do a good job of accounting for the discriminability
of distributions of line-segment pairs, but could not make much progress
if only two such pairs were being distinguished. When no external noise is
present, detection theoretic models still describe data (such as ROCs)
well, but the limiting variance is internal and not identified with any aspect
of the experimental situation. This is pure psychophysics, the other end of
the continuum.


It is the in-between cases that are the most interesting. In the "ideal
group" analyses of Sorkin and his colleagues, experimental results are
compared with a baseline that represents the ideal performance under
particular stimulus conditions. The discrepancy is then interpreted in
terms of nonoptimal decision processes. Kingston and Macmillan's
(1995) analysis of the Garner paradigm showed that some tasks are inherently harder than others, so that the degree of "filtering loss" must be
understood in terms of ideal observers, not simple performance measures. For that matter, the many comparisons between discrimination
paradigms discussed in earlier chapters (and summarized in Figs. 10.1
and 10.2) show that inherent limitations in the decision space can account for a great deal of variance that might otherwise be understood in
psychological terms.
The "internal noise" that remains when inherent limitations are factored out can be conveniently (if not precisely) divided into cognitive and
sensory components. The cognitive category includes Durlach and
Braida's context memory, Geisler and Chou's high-level processes, and
explicit attentional manipulations. The sensory category is neural noise,
of the sort identified by Hecht et al., in vision and by auditory-nerve-based
models in hearing. This category includes processes whose neural substrates are well understood, but of course all internal noise is neurally
based. One direction in which progress is being made is in providing a
neuroscience explanation of cognitive processes as well. As an example,
Patalano, Smith, and Jonides (2001) have shown that different parts of the
brain underlie prototype- and exemplar-based strategies in categorization
tasks, even within a single participant.
A sign of progress in research using resolution designs is that increasingly complex tasks are being used; another is that detection theory
models are keeping pace. The psychophysical and psychoacoustic poles
with which we introduced this essay define an increasingly false dichotomy. Psychoacoustics is becoming more modest in its contributions to
moderately complex problems, as other components are better understood, but more compelling in complex situations (e.g., sound localization) in which models of the stimulus situation have advanced faster than
those of internal processing. Psychophysics is becoming more ambitious, attempting to incorporate cognitive and neural processes into its
models. As our understanding of perception deepens, we can expect to
see fewer and fewer theories that rely on either one alone and more that
draw from many components of sensitivity.

Summary

Detection theory provides strategies for partitioning the variance that limits
performance between external and internal sources, and among subcategories of each. An important tool is the distribution discrimination task, in
which participants must determine which of two overlapping distributions
led to an observation. The magnitude of the external noise arises from the
overlap, and the best possible, ideal performance is what would be found if
this were the only limiting factor. If accuracy falls below this level, the discrepancy is attributed to internal noise.
When stimulus classes vary on more than one dimension, external variance can be divided among the dimensions. Discrimination can be described with multidimensional detection theory, and the decision bound
between the distributions depends on both dimensions. The weighting assigned to a dimension depends on the variability of the distributions along
both dimensions (for the ideal observer) and on the slope of the
psychometric COSS functions (for real observers). Similar analyses can be
applied to features of geometric shapes, frequency regions of noise stimuli,
and individuals within a group.
A comparison of different experimental tasks permits division of internal variance among multiple levels of processing, given a theoretical perspective
that determines what kind of processing is required for each task. This approach has been successful in such disparate areas as pure-tone resolution
and visual search.
Problems
12.1.
12.2.

12.3.

Draw the ROC curve for the dice game. How is it similar to, and different from, other theoretical ROCs?
In the original dice game, there are 3 dice, 2 regular and 1 with half 3s and half 0s. Consider two modified games: (a) There are 4 dice, 3 regular and 1 with half 3s and half 0s. (b) There are 3 dice, 2 regular and 1 with half 2s and half 0s. For each game, what are the underlying distributions for total score? How do their means and variances
compare with those of the original game? (More difficult:) What is
the maximum possible proportion correct?
Suppose the observer in the Lee and Janke experiment adopts a criterion location of 10 cm on half the trials and 14 cm on the other
half. What is the standard deviation of the criterion location? What
proportion correct will be obtained?

12.4.  (a) In an auditory experiment, the listener hears two noise bursts. The average intensity of both bursts is 1 for samples of S1 and 2 for samples of S2, and both samples have a variance of 1 (see Fig. 12.6a). The task is to decide which distribution generated the bursts. The experimenter plots COSS functions for the first and second intervals separately and finds that both have slope 1. What weights has the listener assigned to the two bursts? What is the slope of the decision bound?
(b) Same as (a), but now the COSS function for Interval 2 has a slope of 0.5.
12.5.  Same as Problem 12.4, but the two samples have unequal variance: Samples of S1 have variance 1 and samples of S2 have variance 4 (see Fig. 12.6b). The experimenter plots COSS functions for the first and second intervals separately and finds that both have slope 1. What weights has the listener assigned to the two bursts? What decision bound does this imply? (b) Same as (a), but now the COSS function for Interval 2 has a slope of 2.

FIG. 12.6. Distributions of intensity
for an experiment in which two noise
bursts are presented on each trial. The
value of burst 1 is represented on the
horizontal axis and that of burst 2 on
the vertical axis; the circles and ellipses are equal-likelihood contours, as in
previous chapters. Both bursts are
drawn either from a distribution with
mean = 1 or from a distribution with
mean = 2. (a) variance = 1 in both intervals (Problem 12.4), and (b) variance = 1 in interval 1, variance = 4 in
interval 2 (Problem 12.5).


IV
Statistics

13
Statistics and Detection Theory

Statistics is commonly divided into two parts. In descriptive statistics, a data
set is reduced to a useful measure—a statistic—such as the sample mean or
observed proportion. Detection theory includes many possible statistics of
sensitivity [d', α, p(c), etc.] and of bias, and this book has been well stocked
with (descriptive) statistics.
Inferential statistics, on the other hand, provides strategies for generalizing beyond the data. In chapter 2, for example, we met an observer who was
able to correctly recognize 69 of 100 Old faces while producing only 31%
false alarms, and thus boasted a d' of 1.0. As a measure of sensitivity for
these 200 trials, this value cannot be gainsaid, but how much faith can we
have in it as a predictor of future performance? If the same observer were
tested again with another set of faces, might d' be only 0.6 or even 0.0?
The statistician views statistics, such as sensitivity measures, as estimates of true or population parameters. In this chapter, we consider how
statistics can be used to draw conclusions about parameters. The two primary issues are: (a) How good an estimate have we made? What values, for
example, might true d' plausibly have? and (b) Can we be confident that the
parameter values, whatever they are, differ from particular values of interest
(like 0) or from each other? These two problems are called estimation and
hypothesis testing.
The chapter is in four sections. First, we consider the least processed statistics, hit and false-alarm rates. Second, we examine sensitivity and bias
measures. The third section treats an important side issue—the effects of
averaging data across stimuli, experimental sessions, or observers. For all
these topics, the primary model considered is equal-variance SDT, and the
discussion of hypothesis testing is limited to hypotheses about one parameter or the difference between two parameters. The final section shows how

the standard statistical technique of logistic regression can be used in testing hypotheses within the basic model of Choice Theory and can be extended to SDT and other models.
Like the rest of the book, most of this chapter should be accessible to the
survivor of a one-semester undergraduate statistics course. Relevant concepts from probability and statistics, some of which may be unfamiliar to
such a reader, are summarized in Appendix 1.
Hit and False-Alarm Rates
A Single Observed Proportion
To start, consider a face-recognition experiment in which the proportion
correct is reported to be .69. Observed proportions vary from sample to
sample according to a well-known distribution, the binomial. If the true proportion recognized is p, then the observed proportion P varies across separate tests. The expected value of P is the true value p, and the variance of P is p(1 - p)/N, where N is the number of trials. We can estimate the variance in our example by using the observed proportion P instead of p; this estimate is (.69)(.31)/100 = 0.002139, and the standard error is 0.0462.
When N is fairly large, as in this example, and the products Np and
N(1 − p) are not too small, the distribution of P is approximately normal. (For appropriate methods when these conditions are not satisfied,
see Darlington & Carlson, 1987.) In a normal distribution, about 95%
of scores are within 1.96 standard deviations of the mean, the remaining 5% being equally divided between the two extreme "tails." We can
use this fact to construct a 95% confidence interval around P:

p = P ± 1.96[P(1 − P)/N]^½ .    (13.1)

In our example, approximating p by P,

p = .69 ± (1.96)(0.0462) = .69 ± .09 .

That is, the true proportion is probably between .60 and .78.
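The arithmetic is easy to script. A minimal sketch in Python (the function name is ours; 1.96 is the two-tailed 95% normal criterion used throughout this chapter):

    import math

    def proportion_ci(successes, n, z_crit=1.96):
        # Normal approximation to the binomial: valid when Np and
        # N(1 - p) are not too small (see text).
        p = successes / n
        se = math.sqrt(p * (1 - p) / n)   # standard error of P
        return p, se, (p - z_crit * se, p + z_crit * se)

    p, se, (lo, hi) = proportion_ci(69, 100)
    print(f"P = {p:.2f}, SE = {se:.4f}, 95% CI = ({lo:.2f}, {hi:.2f})")
    # P = 0.69, SE = 0.0462, 95% CI = (0.60, 0.78)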
The same strategy leads to hypothesis tests about binomial data. The experimenter can test the hypothesis that the true recognition proportion is .5
by simply noting that .5 is not in the 95% confidence interval. Thus, this hypothesis can be rejected "at the .05 level."
Large and small proportions have less variance than intermediate ones:
p(1 − p) equals 0.25 when p = .5, but is only 0.09 when p = .9 and falls to 0.01
when p = .99. This suggests that estimation will be most accurate, other
things being equal, for large values of d'. Does this mean that, in choosing
experimental conditions, one should aim for very high performance levels?
For at least two reasons, the answer is no. The first reason is this: As proportions near 0 or 1, their variability does indeed decrease, but the probability of obtaining an observed proportion of exactly 0 or 1 increases.
Observed proportions of 0 or 1 can be converted to z scores (or to log odds,
the Choice Theory transformation) only by a somewhat arbitrary adjustment and are thus worth avoiding. Techniques for dealing with perfect proportions attained by individuals in a group were introduced in chapter 1 and
are discussed further later. The second reason for avoiding very large and
small proportions in the first place is also addressed later.
Comparing Two Proportions
Binomial variability also affects comparisons of two data points, each involving only one proportion. To extend the example, suppose a second observer recognizes 89 of 100 faces: Is this a significantly greater proportion
than the first observer's .69? An important statistical theorem (see Appendix 1, Equations A1.8 and A1.9) concerns differences between independent
variables: The mean of the difference is the difference between the means,
and the variance is the sum of the variances. In this case, P₁ − P₂, the difference in the success rates, has a mean value of p₁ − p₂, the true population difference. The variance of P₁ is p₁(1 − p₁)/N₁, the variance of P₂ is p₂(1 − p₂)/N₂,
and if the two proportions are independent—as we assume—the variance of
the difference is the sum of these.¹
Finally, the difference between two normal variables is also normal, so
the 95% confidence interval around the observed difference is:

p₁ − p₂ = P₁ − P₂ ± 1.96[P₁(1 − P₁)/N₁ + P₂(1 − P₂)/N₂]^½ .    (13.2)

For success rates of .89 and .69, again using observed P values to estimate
the p parameters,

p₁ − p₂ = .20 ± (1.96)[0.00098 + 0.00214]^½ = .20 ± .11 .
¹According to detection theory, H and F are related across conditions by an ROC or isobias curve, and
this is a form of dependence. But the statistical independence assumed here is that within an experimental condition the "yes" rates on S₁ and S₂ trials do not affect each other. This could be false if, for example, the criterion shifts gradually during a set of trials.

The true difference between the two proportions, we can be 95% sure, is between .09 and .31. Because 0 is not in this interval, we can reject the possibility that the two observers' memories are equally good.
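A sketch of the same comparison in Python (Equation 13.2 with the 1.96 criterion; the function name is ours):

    import math

    def proportion_difference_ci(s1, n1, s2, n2, z_crit=1.96):
        # Independent proportions: the variance of the difference is
        # the sum of the two binomial variances (Equation 13.2).
        p1, p2 = s1 / n1, s2 / n2
        var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
        diff = p1 - p2
        half = z_crit * math.sqrt(var)
        return diff, (diff - half, diff + half)

    diff, (lo, hi) = proportion_difference_ci(89, 100, 69, 100)
    print(f"difference = {diff:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
    # difference = 0.20, 95% CI = (0.09, 0.31)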
(False-Alarm, Hit) Pairs
When S₁ and S₂ trials—New and Old faces, say—are distinguished in an experiment, two proportions (false-alarm and hit rates) are estimated, each
with its own binomial variability. The first step in comparing such pairs is to
apply the logic of the preceding section to each proportion.
Let us compare a control condition in which (F, H) = (.31, .69) with a condition in which the observer is hypnotized and (F, H) = (.59, .89), assuming
that each of the four proportions is based on 100 trials. Both conditions can
be represented as points in ROC space. Because F and H are normally distributed, the distribution of the point (F, H) is bivariate normal (see Appendix 1). Confidence regions around these points that include 95% of the mass
of the bivariate distribution are shown in Fig. 13.1. All points at the edge of
a region have the same value of likelihood ratio. The regions turn out to be elliptical in shape, with a maximum radius of about 2.5 univariate standard-deviation units. The two contours do not overlap, suggesting that the two
points are reliably different.
Figure 13.1 gives an indication of the variation we can expect in detection theory parameters. The confidence region for the ROC point from the
control condition includes the points (.23, .77) and (.39, .61), which correspond to d' values of 1.48 and 0.56. Also included are the points (.39, .77)
and (.23, .61), which correspond to c values of −0.23 and +0.23. Confidence
regions based on fewer trials are larger, the radii being inversely related to
the square root of the number of trials.

FIG. 13.1. Bivariate binomial (and therefore approximately normal) distributions of (false-alarm, hit) pairs in ROC space. Ellipses indicate regions containing 95% of the distributions.
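The d' and c values just quoted follow directly from the z transforms of the (F, H) coordinates; a sketch using Python's built-in inverse normal distribution function:

    from statistics import NormalDist

    z = NormalDist().inv_cdf   # the z transform

    def dprime_and_c(F, H):
        # Equal-variance SDT indexes for a single ROC point.
        return z(H) - z(F), -0.5 * (z(H) + z(F))

    for F, H in [(.23, .77), (.39, .61), (.39, .77), (.23, .61)]:
        d, c = dprime_and_c(F, H)
        print(f"F = {F:.2f}, H = {H:.2f}:  d' = {d:.2f}, c = {c:+.2f}")
    # The first two points give d' = 1.48 and 0.56 (both with c = 0);
    # the last two give c = -0.23 and +0.23.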
Sensitivity and Bias Measures
Usually our interest is not just in whether two ROC points could have arisen
from the same underlying (false-alarm, hit) pair, but in whether the two
points reflect the same sensitivity, or the same bias. (In the hypnotic recognition example, both parameters are important.) We consider, in turn, the d'
and c parameters of SDT.
Sensitivity
Remember that a sensitivity parameter is computed by subtracting the
transformed hit and false-alarm rates: for example, d' = z(H) - z(F). Two
separate questions can be asked about this statistic: Is it, on average, equal to
true d' ? What is its variance? In our discussion of hit and false-alarm rates,
we were able to ignore the first, statistical bias, issue because observed proportions are accurate—unbiased—estimators of population proportions.
Things are not so simple with d', and we must answer both questions.2
Statistical Bias of d'. Miller (1996) evaluated the statistical accuracy problem in a straightforward way. Suppose the true hit rate in a yes-no
experiment is .69 and the true false-alarm rate is .31, so that true d' = 1.0. In an
experiment with 16 signal and 16 noise trials, what should we "expect" our
estimate of d' to be? The expected value is the one obtained, on the average,
in experiments of this type. One experiment might yield H = 12/16 = .75 and
F = 5/16 = .31 for an estimated d' of 1.170; in another, perhaps H = 10/16 =
.62 and F = 4/16 = .25, so d' = 0.979. The expected value can be calculated
from the binomial distributions of H and F. For this particular situation,
Miller found it to be 1.064, 6.4% greater than the true value.
Miller conducted this calculation for several values of true d' and the
number of trials; some results are shown in Table 13.1, an abbreviated version of Miller's Table 1.
²The terminology is potentially confusing: Statistical bias is conceptually unrelated to response bias,
and statistical accuracy (unbiasedness) is unrelated to accuracy as measured by a sensitivity measure.
In this chapter, we avoid using the terms bias and accuracy without a qualifier unless the context makes
the usage clear.

TABLE 13.1
Expected Yes-No d' and Percent Bias

                          Number of Signal and Noise Trials
              8                32               128              512
True d'   d'      % bias   d'      % bias   d'      % bias   d'      % bias
0.5       0.555    11.0    0.514     2.8    0.503     0.6    0.501     0.2
1.0       1.096     9.6    1.029     2.9    1.007     0.7    1.002     0.2
2.0       2.036     1.8    2.084     4.2    2.019     1.0    2.004     0.2
3.0       2.641   -12.0    3.135     4.5    3.048     1.6    3.011     0.4
4.0       2.926   -26.8    3.879    -3.0    4.122     3.0    4.032     0.8
Miller simplified his calculations by assuming equal response bias in all cases and also had to decide what to do about observed hit rates of 1 and false-alarm rates of 0, a problem we examined in
chapter 1. Table 13.1 uses the correction in which a frequency of 0 is converted to ½ and a frequency of N is converted to N − ½.
Estimation is accurate if the appropriate table entry equals true d'. This is
most nearly true for estimates based on large numbers of trials—if true d' =
1, for example, the table shows that with 512 trials per stimulus the average
observed d' is 1.002, an error of just 0.2%. With fewer trials, unsurprisingly,
estimates are less accurate. The most problematic cases—those with the
greatest bias—are those in which the number of trials is small and true sensitivity is high; these are the results for which the correction for 0 and 1 cells
is most often needed, and any correction leads to some distortion in the
estimate.
There are at least two ways in which the pattern in Table 13.1 might affect
substantive conclusions. First, sensitivity comparisons involving different
numbers of trials entail a constant error. If d' = 3.14 in a condition with 32
trials and d' = 3.05 in a condition with 128 trials, the apparent difference is
entirely attributable to different amounts of bias applied to a true d' of 3.0.
Typically, one can avoid such comparisons. The second threat is more insidious: Comparisons of different sensitivity values based on the same number
of trials are also contaminated by error. For example, if in two conditions
with 32 trials each one measures d' values of 3.14 and 3.88, for a difference
of 0.74, the bias pattern implies that the true d' values are 3.0 and 4.0, for a
difference of 1.00.
Table 13.1 illustrates a potential distortion in data analysis, but fortunately also contains the information needed to avoid the problem by "correcting" estimates of d' for statistical bias. A d' of 1.10 based on eight trials
per stimulus should be adjusted, according to the table, to its most likely
true value of 1.00.
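Miller's expected values are easy to reproduce: weight every possible hit and false-alarm frequency by its binomial probability, converting each frequency with the ½-correction just described. A sketch (it assumes, as Miller did, an unbiased observer, so that H = Φ(d'/2) and F = Φ(−d'/2)):

    from math import comb
    from statistics import NormalDist

    norm = NormalDist()

    def corrected_rate(k, n):
        # The 1/2-correction: frequency 0 becomes 1/2, frequency n becomes n - 1/2.
        return min(max(k, 0.5), n - 0.5) / n

    def expected_dprime(true_d, n):
        # E[d'] = E[z(H_obs)] - E[z(F_obs)], each expectation taken over
        # the binomial distribution of the observed frequency.
        H, F = norm.cdf(true_d / 2), norm.cdf(-true_d / 2)
        zs = [norm.inv_cdf(corrected_rate(k, n)) for k in range(n + 1)]
        def ez(p):
            return sum(comb(n, k) * p**k * (1 - p)**(n - k) * zs[k]
                       for k in range(n + 1))
        return ez(H) - ez(F)

    for n in (8, 32, 128, 512):
        e = expected_dprime(1.0, n)
        print(f"n = {n:3d}: E[d'] = {e:.3f} ({100 * (e - 1):+.1f}% bias)")
    # Compare the true d' = 1.0 row of Table 13.1.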
Standard Error of d'. The problem of finding the standard error
of d' was first solved by Gourevitch and Galanter (1967) using an approximation. We begin with their approach, and then we consider the more exact
calculations of Miller (1996).
Because d' = z(H) − z(F), the first step in finding the variance (square of
the standard error) of d' is to compute the variances of the transformed proportions. The variance of the difference between the two (independent)
variables is then the sum of their variances.
Gourevitch and Galanter showed that observed z scores have an approximately normal distribution, with variance

var[z(P)] = p(1 − p)/[N φ(p)²] ,    (13.3)

where N is the number of trials and φ(p) is the height of the normal density
function at z(p). As a result,

var(d') = H(1 − H)/[N₂ φ(H)²] + F(1 − F)/[N₁ φ(F)²] ,    (13.4)

where N₂ and N₁ are the numbers of Signal (S₂) and Noise (S₁) trials.
Values of the function φ can be found in Table A5.1 or computed (compare Equations 2.9 and A1.10):

φ(p) = (2π)^(−½) exp[−z(p)²/2] .    (13.5)
Continuing our example, let us find a 95% confidence interval around
d' in the hypnotic condition. The hit and false-alarm rates are .89 and .59,
each based on 100 trials. Equation 13.5 reveals that φ(.89) = 0.1880 and
φ(.59) = 0.3887. According to Equation 13.4, the variance associated with
d' is 0.0277 + 0.0160 = 0.0437, and the standard error is (0.0437)^½ =
0.209. The center of the confidence interval is about 1.00—Table 13.1
shows that the statistical bias is less than 1% when N is approximately 100.
The confidence interval extends 1.96 standard errors above and below observed d', that is, 1.00 ± (1.96)(0.209) = 1.00 ± 0.41. We can be 95% confident that true d' is between 0.59 and 1.41, and in particular that it is not 0.
The approach can be extended to hypothesis tests about more than two
ROC points (Marascuilo, 1970).
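A sketch of the Gourevitch and Galanter computation in Python, reproducing the example (Equations 13.3-13.5):

    from statistics import NormalDist

    norm = NormalDist()

    def dprime_variance(H, F, n_signal, n_noise):
        # var(d') = H(1-H)/[N2 phi(H)^2] + F(1-F)/[N1 phi(F)^2],
        # where phi(p) is the normal density at z(p) (Eq. 13.5).
        def var_z(p, n):
            phi = norm.pdf(norm.inv_cdf(p))
            return p * (1 - p) / (n * phi ** 2)
        return var_z(H, n_signal) + var_z(F, n_noise)

    H, F = .89, .59
    d = norm.inv_cdf(H) - norm.inv_cdf(F)
    v = dprime_variance(H, F, 100, 100)
    half = 1.96 * v ** 0.5
    print(f"d' = {d:.2f}, var = {v:.4f}, 95% CI = ({d - half:.2f}, {d + half:.2f})")
    # d' = 1.00, var = 0.0437, 95% CI = (0.59, 1.41)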
An interesting aspect of this example is that the variance associated with
the hit rate (0.0277) is substantially greater than the variance associated
with the false-alarm rate (0.0160). The general finding, pictured in Fig.
13.2, is that the variance associated with z scores increases as proportions
approach 0 or 1; this is true even though the variance associated with the
proportions themselves decreases in this region. Here is the promised second reason to avoid extremely large or small proportions: Even if observations of 0 or 1 can be avoided, the variability associated with d' is large.
Miller (1996) extended the computational approach he used to estimate
bias, discussed earlier, to standard errors. Table 13.2 gives an abbreviated
version of his Table 2 (again for the ½, N − ½ correction) and includes a
comparison of the direct calculation with the Gourevitch and Galanter
approximation.
A large number of trials leads to a small standard error. The variance should
be proportional to 1/N, and this is almost exactly true for the approximation:
For every increase in N by a factor of 4, the variance decreases by that factor.
Direct computation shows a much less regular pattern for small N, particularly
at high sensitivity levels. Because they are exact, Miller's computations are to
be preferred to the Gourevitch and Galanter approximation, especially because in some cases the degree of discrepancy is quite large.
The approximation and direct method give exactly the same result, to
two decimal places, for the H = .89, F = .59 running example (letting N₁ =
N₂ = 128 to allow for more direct reference to Tables 13.1 and 13.2). But

FIG. 13.2. The variance of a proportion p, and of its z transform z(p), as a function of p. Variability of p is greatest when p = .5; variability of z(p) is greatest when p is near 0 or 1.

TABLE 13.2
Variance of Yes-No d'

                                Number of Signal and Noise Trials
Computation        True d'     8        32       128      512
Direct             0.5         0.491    0.106    0.026    0.0063
                   1.0         0.482    0.117    0.027    0.0068
                   2.0         0.358    0.173    0.037    0.0090
                   3.0         0.168    0.224    0.067    0.0150
                   4.0         0.056    0.121    0.141    0.0333
Approximation      0.5         0.402    0.100    0.025    0.0063
                   1.0         0.430    0.108    0.027    0.0067
                   2.0         0.570    0.142    0.036    0.0089
                   3.0         0.929    0.232    0.058    0.0145
                   4.0         1.907    0.477    0.119    0.0298
Percent Error in   0.5         -18.1    -5.7     -3.8      0.0
Approximation      1.0         -10.8    -7.7      0.0     -1.5
                   2.0          59.2    -36.6    -2.8     -1.1
                   3.0         453       3.6    -13.4     -3.3
                   4.0        3305     294      -15.6    -10.5
consider two cases with smaller N and higher d'. First, if true d' = 2 and N₁
= N₂ = 32, the approximation gives a variance of 0.142 (in Table 13.2), so
the 95% confidence interval is 2.00 ± (1.96)(0.142)^½ = 2.00 ± 0.74. Direct
computation reveals a bias of 4.2% (Table 13.1) and a variance of 0.173
(Table 13.2), so the confidence interval for d' is 2.08 ± (1.96)(0.173)^½ =
2.08 ± 0.82. The discrepancy here is moderate. For a more extreme example, consider the case in which true d' = 3 and N₁ = N₂ = 8. Now the approximation leads to a confidence interval of 3.00 ± 5.98 (i.e., it is uncertain whether d' is even positive). Direct calculation is more reassuring (d' = 2.64 ± 0.80), but this result is deceptive. The smaller standard
error results from the use of an approximation to eliminate undefined
values of d'. As Miller (1996) noted, if true d' is high enough, all the data
will fall at the maximum value allowed for perfect data, which in this case
is d' = z(7.5/8) - z(0.5/8) = 3.07. This is not really a precise estimate of
anything. Clearly one needs a very good excuse, and considerable caution,
to estimate sensitivity from a mere 16 trials.

Response Bias

Bias and sensitivity are much alike statistically, because they are much alike
algebraically: d' is the difference between z(H) and z(F), and c is −0.5 times
the sum of these terms. The variance of c is found to be just one quarter the
variance of d':

var(c) = var[−0.5(z(H) + z(F))]
       = 0.25 var[z(H) + z(F)]
       = 0.25 var[z(H) − z(F)]
       = 0.25 var(d') .    (13.6)

(The second and third lines are equal because z(H) and z(F) are independent, so the variance of their sum equals the variance of their difference.)
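A sketch continuing the hypnosis example, using Equation 13.6 and the z-score variances of the previous section:

    from statistics import NormalDist

    norm = NormalDist()

    H, F, n = .89, .59, 100

    def var_z(p):
        phi = norm.pdf(norm.inv_cdf(p))
        return p * (1 - p) / (n * phi ** 2)

    c = -0.5 * (norm.inv_cdf(H) + norm.inv_cdf(F))
    var_c = 0.25 * (var_z(H) + var_z(F))   # Equation 13.6
    half = 1.96 * var_c ** 0.5
    print(f"c = {c:.2f}, var(c) = {var_c:.4f}, 95% CI = ({c - half:.2f}, {c + half:.2f})")
    # c = -0.73, var(c) = 0.0109, 95% CI = (-0.93, -0.52)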
Appendix 2

…e³ = 20.09, and so on. Using a calculator, entering 1 followed by the eˣ key should produce 2.718281828 (i.e., e).
The properties of exponentials parallel those of logarithms:
1. Multiplication of exponentials corresponds to addition of their exponents: eˣeʸ = eˣ⁺ʸ.
2. Division of exponentials corresponds to subtraction of their exponents: eˣ/eʸ = eˣ⁻ʸ.
3. Raising an exponential to a constant power corresponds to multiplying the exponent by that power: (eˣ)ᵃ = eᵃˣ.
4. Taking the reciprocal of an exponential corresponds to negating the exponent: 1/eˣ = e⁻ˣ.
Because exponentials and logarithms are inverse functions, performing
the two operations successively leaves the initial value unchanged: ln(eˣ) = x
and e^ln(x) = x. Thus, if eˣ has been calculated and x is desired, take the logarithm of eˣ; if ln(x) has been calculated but x is desired, take the exponential
of ln(x).

Appendix 3

Flowcharts to Sensitivity and Bias Calculations

The following charts will guide you to the appropriate equations, tables, or
computer programs for finding sensitivity and bias. Start with Chart 1,
which directs you to other charts depending on the paradigm.
To use the charts, proceed from left to right. Whenever more than one
path is available, choose the one that corresponds to your specific application. Each path ends with an outcome in the right-hand column.

Chart 1: Guide to Subsequent Charts

A discrimination experiment measures the ability of an observer to distinguish two stimuli, A and B. If the experiment has one observation interval,
either A or B is presented on each trial. If it has more than one interval, a sequence of stimuli, each of which is either A or B, is presented on each trial.
Three separate charts are needed to analyze experiments of the second type:
one to determine the design of the experiment (Chart 5), one to find the appropriate index of sensitivity (Chart 6), and the last to find the bias index
(Chart 7).
Classification experiments measure the ability of an observer to label
stimuli (from sets of more than two).
In some designs, the stimuli to be judged are preceded or followed by a
specific, constant stimulus on each trial. The appropriate analyses for such
experiments are found by ignoring the constant stimulus.
1 interval, 2 stimuli (discrimination), 2 responses (yes-no):
    sensitivity ................................. Chart 2
    bias ........................................ Chart 3
1 interval, 2 stimuli, >2 responses (rating) .... Chart 4
1 interval, >2 stimuli (classification) ......... Chart 8
>1 interval, 2 stimuli (discrimination):
    sensitivity ................................. Charts 5 & 6
    bias ........................................ Charts 5 & 7

Chart 2: Yes-No Sensitivity
In this and later charts, two types of decisions are often made, one based on
the shape of the assumed underlying distributions, the other on the format of
the data. In choosing among the various distributional assumptions, we recommend Gaussian or logistic models. Rectangular-distribution models entail undesirable threshold assumptions (see chap. 4); their only advantage is
that they are sometimes simpler to compute.
Data normally are reduced to hit and false-alarm rates (H and F), which
should be used whenever they are available. If only proportion correct
[p(c)] is given, it is necessary to assume that responding is unbiased.
For discussion, see chapters 1 and 4.
Gaussian distributions
    distance measure d'
        from H and F ..................... Eq. 1.5
        from p(c) ........................ Eq. 1.7
    proportion measure p(c)max
        from H and F ..................... Eq. 7.5
        from p(c) ........................ unchanged
logistic distributions (Choice Theory)
    distance measure ln(α)
        from H and F ..................... Eqs. 4.8, 4.9
        from p(c) ........................ Eq. 4.19
rectangular distributions (threshold theory)
    1-threshold model: proportion measure q ........ Eq. 4.1
    2-threshold model: proportion measure p(c) ..... unchanged
logistic distributions for low sensitivity, rectangular
distributions for high sensitivity
    area measure A' .................... Eqs. 4.20, 4.21
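For the most common branches of this chart the computations are short; a sketch (the A' line assumes H ≥ F, the usual case, and the formulas are the standard ones cited in the chart):

    import math
    from statistics import NormalDist

    z = NormalDist().inv_cdf

    def yes_no_sensitivity(H, F):
        d_prime = z(H) - z(F)                                      # Gaussian (Eq. 1.5)
        ln_alpha = 0.5 * math.log(H * (1 - F) / ((1 - H) * F))     # logistic, Choice Theory
        a_prime = 0.5 + (H - F) * (1 + H - F) / (4 * H * (1 - F))  # area measure, H >= F
        return d_prime, ln_alpha, a_prime

    d, ln_a, A = yes_no_sensitivity(.69, .31)
    print(f"d' = {d:.2f}, ln(alpha) = {ln_a:.2f}, A' = {A:.2f}")
    # d' = 0.99, ln(alpha) = 0.80, A' = 0.78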

Chart 3: Yes-No Response Bias

For discussion, see chapters 2 and 4.
Gaussian distributions
    criterion location c ................. Eq. 2.1
    relative criterion c' ................ Eq. 2.3
    likelihood ratio β ................... Eq. 2.6
logistic distributions
    criterion location ln(b) ............. Eqs. 4.11, 4.12
    relative criterion b' ................ Eq. 4.13
    likelihood ratio (equivalent) ........ Eq. 4.14
    B″ ................................... Eq. 4.23
rectangular distributions
    criterion location ................... yes rate = ½(H + F)
    relative criterion ................... error ratio = (1 − H)/F
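A companion sketch for the bias chart. It uses the identity ln β = c·d' to obtain the likelihood ratio from the Gaussian measures; the rectangular-model measures are the simple formulas shown above:

    import math
    from statistics import NormalDist

    z = NormalDist().inv_cdf

    def yes_no_bias(H, F):
        c = -0.5 * (z(H) + z(F))       # criterion location (Eq. 2.1)
        d_prime = z(H) - z(F)
        c_rel = c / d_prime            # relative criterion c'
        beta = math.exp(c * d_prime)   # likelihood ratio, ln(beta) = c d'
        yes_rate = 0.5 * (H + F)       # rectangular: criterion location
        error_ratio = (1 - H) / F      # rectangular: relative criterion
        return c, c_rel, beta, yes_rate, error_ratio

    print(yes_no_bias(.89, .59))
    # c = -0.73: a liberal criterion in the hypnosis condition.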

Chart 4: Rating-Design Sensitivity
To analyze a rating experiment with normal-distribution assumptions, an
ROC is fitted to (F, H) pairs, and sensitivity and slope statistics are calculated from the curve. Fitting is best done by a maximum-likelihood computer method (see Appendix 6 for pointers to such programs). The chart
assumes that ROC slope and sensitivity are obtained by one of these methods and shows how to obtain other measures.
For discussion, see chapter 3.
Gaussian distributions (any slope)
    sensitivity
        distance measures
            rms standard deviation, d_a ........ Eqs. 3.4, 3.5
            mean standard deviation, d_e ....... Eqs. 3.6, 3.7
        area measure A_z ....................... Eq. 3.8
    bias (criterion location)
        rms standard deviation, c_a ............ Eq. 3.13
        mean standard deviation, c_e ........... Eq. 3.14
nonparametric: fit ROC using trapezoidal rule
    area measure A ............................. Eq. 3.9

Chart 5: Definitions of Multi-Interval Designs

A discrimination experiment tests the ability to distinguish two stimuli (A
and B), but may use a temporal or spatial sequence of stimuli on each trial.
We denote such sequences as bracketed lists; for example, <AB> means
Stimulus A followed by Stimulus B. In the lists of possible sequences, the
notation "vs" separates sequences with distinct corresponding (correct)
responses.
In some designs, the stimulus sequence to be judged is preceded or followed by a specific, constant stimulus on each trial. The appropriate analysis is the same as if these fixed stimuli were not present. Thus, if the only
possible sequences are <AAB> and <ABA>, the design is the same as if
the possibilities were just <AB> and <BA> (i.e., 2AFC). As another example, if the possible sequences are just <AA> and <AB>, the design is the
same as if the possibilities were just <A> and <B> (i.e., one-interval), and
Charts 2 and 3 should be consulted instead of Charts 6 and 7.
Number of   Number of
intervals   responses   Sequences                                  Paradigm         Chapter
2           2           <AB> vs <BA>                               2AFC             7
2           2           <AA>, <BB> vs <AB>, <BA>                   same-different   9
3           2           <ABA>, <BAB> vs <ABB>, <BAA>               ABX              9
3           3           <ABB> vs <BAB> vs <BBA>                    3AFC             10
3           3           <BAA>, <ABB> vs <ABA>, <BAB>
                        vs <AAB>, <BBA>                            oddity           9
m (m ≥ 4)   m (m ≥ 4)   <AB…B> vs … vs <B…BA>                      mAFC             10
Chart 6: Multi-Interval Sensitivity
Some designs in this chart have two models: one for "independent observations," the other for "differencing." As a rule of thumb, independent observation models are used for fixed designs (only two stimuli in a block of
trials) and differencing models for roving designs. As in the one-interval
designs, SDT models assume normal distributions. Choice Theory models, however, do not assume logistic distributions in designs other than
one-interval, even though a parameter of the one-interval experiment
[ln(α)] is estimated.
For discussion, see chapters 7, 9, and 10.
2AFC
    SDT
        distance measure d' ............. Eqs. 7.2, 7.7
        proportion measure p(c)max ...... Eq. 7.6
    Choice Theory
        distance measure ln(α) .......... Eq. 7.3
mAFC
    SDT: d' from p(c) ................... Table A5.7
    Choice Theory: ln(α) from p(c) ...... Eq. 10.1
                   from full matrix ..... Eq. 10.2
reminder: same as yes-no ................ Chart 2
same-different (Gaussian)
    independent-observation model
        from H and F .................... Eq. 9.7 and Table A5.3
        from p(c) ....................... Eq. 9.3
    differencing model .................. Table A5.4
ABX (Gaussian)
    independent-observation model ....... Table A5.3
    differencing model .................. Table A5.3
oddity [from p(c) only]
    Gaussian ............................ Tables A5.5, A5.6
    logistic, ln(α) ..................... Table A5.7
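The 2AFC and mAFC entries can also be computed directly rather than looked up. A sketch of the standard unbiased-observer SDT expressions (p(c) = Φ(d'/√2) for 2AFC; for mAFC, the integral of φ(x − d')Φ(x)^(m−1), evaluated here by the trapezoidal rule):

    from statistics import NormalDist

    norm = NormalDist()

    def pc_2afc(d):
        return norm.cdf(d / 2 ** 0.5)

    def pc_mafc(d, m, lo=-8.0, hi=8.0, steps=4000):
        # p(c) = integral of phi(x - d') * Phi(x)**(m - 1) dx
        dx = (hi - lo) / steps
        total = 0.0
        for i in range(steps + 1):
            x = lo + i * dx
            w = 0.5 if i in (0, steps) else 1.0
            total += w * norm.pdf(x - d) * norm.cdf(x) ** (m - 1)
        return total * dx

    print(f"{pc_2afc(1.0):.3f}")      # 0.760
    print(f"{pc_mafc(1.0, 2):.3f}")   # 0.760: the same design
    print(f"{pc_mafc(1.0, 4):.3f}")   # lower with more alternatives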

Chart 7: Multi-Interval Bias

For many designs, no bias measures have been developed. (Likelihood ratio
is always a possible statistic, but is often difficult to calculate.) If hit and
false-alarm rates are available, the yes-no methods of Chart 2 may be used,
although the yes-no interpretation (criterion location, likelihood ratio, etc.)
cannot be made. We call this a heuristic use of these methods.
2AFC: same as yes-no ........................ Chart 3
mAFC
    Choice Theory ........................... Eq. 10.3
    heuristic use ........................... Chart 3
same-different
    independent-observation model
        criterion c ......................... Chapter 9
        likelihood ratio β .................. Chapter 9
        heuristic use ....................... Chart 3
    differencing model
        criterion c_d ....................... Chapter 9
        likelihood ratio β_d ................ Eq. 9.10
        heuristic use ....................... Chart 3
other 2-response designs: yes-no methods (heuristic use) ... Chart 3
oddity: no method
Chart 8: Classification
All models view sets of more than two stimuli as arranged in a perceptual
space. In general, the space may be of any dimension up to one less than the
number of stimuli. We consider in this chart only three special (but important) cases: (a) all stimuli are represented on the same dimension, (b) the
stimulus set is feature-complete (i.e., orthogonally combines values on
multiple dimensions), and (c) all stimuli are orthogonal (i.e., each differs
from the other on a distinct dimension).
For discussion, see chapters 5 (one dimension) and 10 (more than one dimension).
one dimension
    Thurstonian models (Gaussian)
        unequal variances ............ Schönemann & Tucker (1967)
        equal variances .............. Braida & Durlach (1972)
feature-complete
    GRT models: MSDA methods ......... Kadlec & Townsend (1992a, 1992b)
orthogonal
    all stimuli equally discriminable, no bias
        SDT .......................... Table A5.7
        Choice Theory ................ Eqs. 10.1, 10.2
    above assumptions not made
        "constant" bias (Choice Theory) ... see chap. 10 for example
        arbitrary bias ............... Smith (1982b)


Appendix 4

Some Useful Equations

The equations listed here are taken directly from the text. Only equations
useful for computing sensitivity and bias indexes (including all those to
which the user of the Appendix 3 flowcharts is directed), or for comparing
paradigms, are given. To find out when specific measures are appropriate,
see the flowcharts in Appendix 3. For further discussion, refer back to the
relevant chapter.

Yes-No Sensitivity: Eqs. 1.5, 1.7, 4.8, 4.9, 7.4, 7.5
Yes-No Response Bias: Eqs. 2.1, 2.6, 4.11, 4.12, 4.13, 4.14
Rating Experiments: Eqs. 3.1, 3.2, 3.3, 3.5, 3.7, 3.8
Threshold and "Nonparametric" Measures: see chapter 4
One-Dimensional Classification: Eqs. 5.4, 5.5
Multi-interval and classification designs (chaps. 7, 9, 10): Eqs. 7.2, 7.3, 7.4, 7.6, 7.9, 7.12, 7.14, 9.3, 9.4, 9.9, 10.1, 10.2, 10.3, 10.5
Statistics (chap. 13): Eqs. 13.1, 13.3, 13.4, 13.6

Appendix 5

Tables

Table A5.1. Normal Distribution (p to z), for finding z(p) and φ(p)

Table A5.2. Normal Distribution (z to p)

z    Φ(z)       z    Φ(z)       z    Φ(z)       z    Φ(z)       z    Φ(z)
.91  .8185887   .92  .8212136   .93  .8238145   .94  .8263912   .95  .8289439
.96  .8314724   .97  .8339768   .98  .8364569   .99  .8389129  1.00  .8413447
1.01 .8437524  1.02  .8461358  1.03  .8484950  1.04  .8508300  1.05  .8531409
1.06 .8554277  1.07  .8576903  1.08  .8599289  1.09  .8621434  1.10  .8643339
1.11 .8665005  1.12  .8686431  1.13  .8707619  1.14  .8728568  1.15  .8749281
1.16 .8769756  1.17  .8789995  1.18  .8809999  1.19  .8829768  1.20  .8849303
1.21 .8868606  1.22  .8887676  1.23  .8906514  1.24  .8925123  1.25  .8943502
1.26 .8961653  1.27  .8979577  1.28  .8997274  1.29  .9014747  1.30  .9031995
1.31 .9049021  1.32  .9065825  1.33  .9082409  1.34  .9098773  1.35  .9114920
1.36 .9130850  1.37  .9146565  1.38  .9162067  1.39  .9177356  1.40  .9192433
1.41 .9207302  1.42  .9221962  1.43  .9236415  1.44  .9250663  1.45  .9264707
1.46 .9278550  1.47  .9292191  1.48  .9305634  1.49  .9318879  1.50  .9331928
1.51 .9344783  1.52  .9357445  1.53  .9369916  1.54  .9382198  1.55  .9394292
1.56 .9406201  1.57  .9417924  1.58  .9429466  1.59  .9440826  1.60  .9452007
1.61 .9463011  1.62  .9473839  1.63  .9484493  1.64  .9494974  1.65  .9505285
1.66 .9515428  1.67  .9525403  1.68  .9535213  1.69  .9544860  1.70  .9554345
1.71 .9563671  1.72  .9572838  1.73  .9581849  1.74  .9590705  1.75  .9599408
1.76 .9607961  1.77  .9616364  1.78  .9624620  1.79  .9632730  1.80  .9640697
1.81 .9648521  1.82  .9656205  1.83  .9663750  1.84  .9671159  1.85  .9678432
1.86 .9685572  1.87  .9692581  1.88  .9699460  1.89  .9706210  1.90  .9712834
1.91 .9719334  1.92  .9725711  1.93  .9731966  1.94  .9738102  1.95  .9744119
1.96 .9750021  1.97  .9755808  1.98  .9761482  1.99  .9767045  2.00  .9772499
2.01 .9777844  2.02  .9783083  2.03  .9788217  2.04  .9793248  2.05  .9798178
2.06 .9803007  2.07  .9807738  2.08  .9812372  2.09  .9816911  2.10  .9821356
2.11 .9825708  2.12  .9829970  2.13  .9834142  2.14  .9838226  2.15  .9842224
2.16 .9846137  2.17  .9849966  2.18  .9853713  2.19  .9857379  2.20  .9860966
2.21 .9864474  2.22  .9867906  2.23  .9871263  2.24  .9874545  2.25  .9877755
2.26 .9880894  2.27  .9883962  2.28  .9886962  2.29  .9889893  2.30  .9892759
2.31 .9895559  2.32  .9898296  2.33  .9900969  2.34  .9903581  2.35  .9906133
2.36 .9908625  2.37  .9911060  2.38  .9913437  2.39  .9915758  2.40  .9918025
2.41 .9920237  2.42  .9922397  2.43  .9924506  2.44  .9926564  2.45  .9928572
2.46 .9930531  2.47  .9932443  2.48  .9934309  2.49  .9936128  2.50  .9937903
2.51 .9939634  2.52  .9941323  2.53  .9942969  2.54  .9944574  2.55  .9946139
2.56 .9947664  2.57  .9949151  2.58  .9950600  2.59  .9952012  2.60  .9953388
2.70 .9965330  2.80  .9974449  2.90  .9981342  3.00  .9986501
3.20 .9993129  3.40  .9996631  3.60  .9998409  3.80  .9999277  4.00  .9999683
4.50 .9999966  5.00  .9999997  5.50  .9999999

Source: Tables A5.1 and A5.2 excerpted from Tables for Statisticians and Biometricians, Part II, edited by K. Pearson (1931). Reprinted by permission of the Biometrika Trustees.
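Tables like these predate cheap computation; the same values now come from the error function in any standard library. A sketch in Python:

    from math import erf, sqrt
    from statistics import NormalDist

    def Phi(z):
        # Normal distribution function (z to p), the body of Table A5.2.
        return 0.5 * (1 + erf(z / sqrt(2)))

    print(f"{Phi(1.00):.7f}")                        # .8413447, as in the table
    print(f"{NormalDist().inv_cdf(.8413447):.2f}")   # 1.00: the p-to-z direction (Table A5.1)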

Table A5.3. Values of d' for Same-Different (Independent-Observation
Model) and ABX (Independent-Observation and Differencing Models)

To find d' from H and F, first calculate z(H) − z(F) and find the result in
the first column. Then look across to the appropriate design and model. If H
and F are not available, assume that the observer is unbiased: Find p(c) in
the second column and look across for d'.

S-D (IO) = Same-Different, independent-observation model; ABX (IO) and
ABX (Diff) = ABX, independent-observation and differencing models.

z(H)−z(F)  p(c)unb  S-D (IO)  ABX (IO)  ABX (Diff)
0.01   0.502   0.16   0.13   0.15
0.02   0.504   0.22   0.19   0.21
0.03   0.506   0.28   0.23   0.26
0.04   0.508   0.32   0.27   0.30
0.05   0.510   0.36   0.30   0.33
0.06   0.512   0.39   0.33   0.36
0.07   0.514   0.42   0.35   0.39
0.08   0.516   0.45   0.38   0.42
0.09   0.518   0.48   0.40   0.45
0.10   0.520   0.51   0.43   0.47
0.11   0.522   0.53   0.45   0.50
0.12   0.524   0.56   0.47   0.52
0.13   0.526   0.58   0.49   0.54
0.14   0.528   0.60   0.51   0.56
0.15   0.530   0.62   0.52   0.58
0.16   0.532   0.64   0.54   0.60
0.17   0.534   0.66   0.56   0.62
0.18   0.536   0.68   0.58   0.64
0.19   0.538   0.70   0.59   0.66
0.20   0.540   0.72   0.61   0.68
0.21   0.542   0.74   0.62   0.69
0.22   0.544   0.76   0.64   0.71
0.23   0.546   0.78   0.65   0.73
0.24   0.548   0.80   0.67   0.74
0.25   0.550   0.81   0.68   0.76
0.26   0.552   0.83   0.70   0.78
0.27   0.554   0.85   0.71   0.79
0.28   0.556   0.86   0.73   0.81
0.29   0.558   0.88   0.74   0.82
0.30   0.560   0.89   0.75   0.84
0.31   0.562   0.91   0.77   0.85
0.32   0.564   0.93   0.78   0.87
0.33   0.566   0.94   0.79   0.88
0.34   0.567   0.96   0.81   0.90
0.35   0.569   0.97   0.82   0.91
0.36   0.571   0.99   0.83   0.92
0.37   0.573   1.00   0.84   0.94
0.38   0.575   1.01   0.86   0.95
0.39   0.577   1.03   0.87   0.96
0.40   0.579   1.04   0.88   0.98
0.41   0.581   1.06   0.89   0.99
0.42   0.583   1.07   0.90   1.00
0.43   0.585   1.09   0.92   1.02
0.44   0.587   1.10   0.93   1.03
0.45   0.589   1.11   0.94   1.04
0.46   0.591   1.13   0.95   1.06
0.47   0.593   1.14   0.96   1.07
0.48   0.595   1.15   0.97   1.08
0.49   0.597   1.17   0.98   1.09
0.50   0.599   1.18   0.99   1.11
0.51   0.601   1.19   1.01   1.12
0.52   0.603   1.20   1.02   1.13
0.53   0.604   1.22   1.03   1.14
0.54   0.606   1.23   1.04   1.16
0.55   0.608   1.24   1.05   1.17
0.56   0.610   1.25   1.06   1.18
0.57   0.612   1.27   1.07   1.19
0.58   0.614   1.28   1.08   1.20
0.59   0.616   1.29   1.09   1.22
0.60   0.618   1.30   1.10   1.23
0.61   0.620   1.32   1.11   1.24
0.62   0.622   1.33   1.12   1.25
0.63   0.624   1.34   1.13   1.26
0.64   0.626   1.35   1.14   1.27
0.65   0.627   1.36   1.15   1.29
0.66   0.629   1.38   1.16   1.30
0.67   0.631   1.39   1.17   1.31
0.68   0.633   1.40   1.18   1.32
0.69   0.635   1.41   1.19   1.33
0.70   0.637   1.42   1.20   1.34
0.71   0.639   1.43   1.21   1.35
0.72   0.641   1.45   1.22   1.36
0.73   0.642   1.46   1.23   1.38
0.74   0.644   1.47   1.24   1.39
0.75   0.646   1.48   1.25   1.40
0.76   0.648   1.49   1.26   1.41
0.77   0.650   1.50   1.27   1.42
0.78   0.652   1.51   1.28   1.43
0.79   0.654   1.52   1.29   1.44
0.80   0.655   1.54   1.30   1.45
0.81   0.657   1.55   1.31   1.46
0.82   0.659   1.56   1.32   1.47
0.83   0.661   1.57   1.33   1.49
0.84   0.663   1.58   1.34   1.50
0.85   0.665   1.59   1.35   1.51
0.86   0.666   1.60   1.36   1.52
0.87   0.668   1.61   1.37   1.53
0.88   0.670   1.62   1.38   1.54
0.89   0.672   1.63   1.38   1.55
0.90   0.674   1.65   1.39   1.56
0.91   0.675   1.66   1.40   1.57
0.92   0.677   1.67   1.41   1.58
0.93   0.679   1.68   1.42   1.59
0.94   0.681   1.69   1.43   1.60
0.95   0.683   1.70   1.44   1.61
0.96   0.684   1.71   1.45   1.62
0.97   0.686   1.72   1.46   1.63
0.98   0.688   1.73   1.47   1.64
0.99   0.690   1.74   1.48   1.65
1.00   0.691   1.75   1.48   1.66
1.01   0.693   1.76   1.49   1.68
1.02   0.695   1.77   1.50   1.69
1.03   0.697   1.78   1.51   1.70
1.04   0.698   1.79   1.52   1.71
1.05   0.700   1.80   1.53   1.72
1.06   0.702   1.81   1.54   1.73
1.07   0.704   1.82   1.55   1.74
1.08   0.705   1.83   1.56   1.75
1.09   0.707   1.84   1.57   1.76
1.10   0.709   1.85   1.57   1.77
1.11   0.711   1.87   1.58   1.78
1.12   0.712   1.88   1.59   1.79
1.13   0.714   1.89   1.60   1.80
1.14   0.716   1.90   1.61   1.81
1.15   0.717   1.91   1.62   1.82
1.16   0.719   1.92   1.63   1.83
1.17   0.721   1.93   1.64   1.84
1.18   0.722   1.94   1.64   1.85
1.19   0.724   1.95   1.65   1.86
1.20   0.726   1.96   1.66   1.87
1.21   0.727   1.97   1.67   1.88
1.22   0.729   1.98   1.68   1.89
1.23   0.731   1.99   1.69   1.90
1.24   0.732   2.00   1.70   1.91
1.25   0.734   2.01   1.71   1.92
1.26   0.736   2.02   1.71   1.93
1.27   0.737   2.03   1.72   1.94
1.28   0.739   2.04   1.73   1.95
1.29   0.741   2.05   1.74   1.96
1.30   0.742   2.06   1.75   1.97
1.31   0.744   2.07   1.76   1.98
1.32   0.745   2.08   1.77   1.99
1.33   0.747   2.09   1.77   2.00
1.34   0.749   2.09   1.78   2.01
1.35   0.750   2.10   1.79   2.02
1.36   0.752   2.11   1.80   2.03
1.37   0.753   2.12   1.81   2.04
1.38   0.755   2.13   1.82   2.05
1.39   0.756   2.14   1.83   2.06
1.40   0.758   2.15   1.83   2.07
1.41   0.760   2.16   1.84   2.08
1.42   0.761   2.17   1.85   2.09
1.43   0.763   2.18   1.86   2.10
1.44   0.764   2.19   1.87   2.11
1.45   0.766   2.20   1.88   2.12
1.46   0.767   2.21   1.88   2.13
1.47   0.769   2.22   1.89   2.14
1.48   0.770   2.23   1.90   2.15
1.49   0.772   2.24   1.91   2.16
1.50   0.773   2.25   1.92   2.17
1.51   0.775   2.26   1.93   2.18
1.52   0.776   2.27   1.94   2.19
1.53   0.778   2.28   1.94   2.20
1.54   0.779   2.29   1.95   2.21
1.55   0.781   2.30   1.96   2.22
1.56   0.782   2.31   1.97   2.23
1.57   0.784   2.32   1.98   2.24
1.58   0.785   2.33   1.99   2.25
1.59   0.787   2.34   1.99   2.26
1.60   0.788   2.35   2.00   2.27
1.61   0.790   2.36   2.01   2.28
1.62   0.791   2.36   2.02   2.29
1.63   0.792   2.37   2.03   2.30
1.64   0.794   2.38   2.04   2.31
1.65   0.795   2.39   2.04   2.32
1.66   0.797   2.40   2.05   2.33
1.67   0.798   2.41   2.06   2.34
1.68   0.800   2.42   2.07   2.35
1.69   0.801   2.43   2.08   2.36
1.70   0.802   2.44   2.09   2.37
1.71   0.804   2.45   2.10   2.38
1.72   0.805   2.46   2.10   2.39
1.73   0.806   2.47   2.11   2.40
1.74   0.808   2.48   2.12   2.41
1.75   0.809   2.49   2.13   2.42
1.76   0.811   2.50   2.14   2.43
1.77   0.812   2.51   2.15   2.44
1.78   0.813   2.52   2.15   2.45
1.79   0.815   2.53   2.16   2.46
1.80   0.816   2.53   2.17   2.47
1.81   0.817   2.54   2.18   2.48
1.82   0.819   2.55   2.19   2.49
1.83   0.820   2.56   2.20   2.50
1.84   0.821   2.57   2.20   2.51
1.85   0.823   2.58   2.21   2.52
1.86   0.824   2.59   2.22   2.53
1.87   0.825   2.60   2.23   2.54
1.88   0.826   2.61   2.24   2.55
1.89   0.828   2.62   2.25   2.56
1.90   0.829   2.63   2.25   2.57
1.91   0.830   2.64   2.26   2.58
1.92   0.831   2.65   2.27   2.59
1.93   0.833   2.66   2.28   2.60
1.94   0.834   2.67   2.29   2.61
1.95   0.835   2.67   2.30   2.62
1.96   0.836   2.68   2.30   2.63
1.97   0.838   2.69   2.31   2.64
1.98   0.839   2.70   2.32   2.65
1.99   0.840   2.71   2.33   2.67
2.00   0.841   2.72   2.34   2.68
2.01   0.843   2.73   2.35   2.69
2.02   0.844   2.74   2.35   2.70
2.03   0.845   2.75   2.36   2.71
2.04   0.846   2.76   2.37   2.72
2.05   0.847   2.77   2.38   2.73
2.06   0.848   2.78   2.39   2.74
2.07   0.850   2.79   2.40   2.75
2.08   0.851   2.79   2.40   2.76
2.09   0.852   2.80   2.41   2.77
2.10   0.853   2.81   2.42   2.78
2.11   0.854   2.82   2.43   2.79
2.12   0.855   2.83   2.44   2.80
2.13   0.857   2.84   2.45   2.81
2.14   0.858   2.85   2.45   2.82
2.15   0.859   2.86   2.46   2.83
2.16   0.860   2.87   2.47   2.84
2.17   0.861   2.88   2.48   2.85
2.18   0.862   2.89   2.49   2.86
2.19   0.863   2.90   2.50   2.87
2.20   0.864   2.91   2.50   2.88
2.21   0.865   2.91   2.51   2.89
2.22   0.867   2.92   2.52   2.90
2.23   0.868   2.93   2.53   2.91
2.24   0.869   2.94   2.54   2.92
2.25   0.870   2.95   2.55   2.93
2.26   0.871   2.96   2.55   2.94
2.27   0.872   2.97   2.56   2.95
2.28   0.873   2.98   2.57   2.96
2.29   0.874   2.99   2.58   2.97
2.30   0.875   3.00   2.59   2.99
2.31   0.876   3.01   2.60   3.00
2.32   0.877   3.02   2.60   3.01
2.33   0.878   3.02   2.61   3.02
2.34   0.879   3.03   2.62   3.03
2.35   0.880   3.04   2.63   3.04
2.36   0.881   3.05   2.64   3.05
2.37   0.882   3.06   2.65   3.06
2.38   0.883   3.07   2.65   3.07
2.39   0.884   3.08   2.66   3.08
2.40   0.885   3.09   2.67   3.09
2.41   0.886   3.10   2.68   3.10
2.42   0.887   3.11   2.69   3.11
2.43   0.888   3.12   2.70   3.12
2.44   0.889   3.13   2.70   3.13
2.45   0.890   3.13   2.71   3.14
2.46   0.891   3.14   2.72   3.15
2.47   0.892   3.15   2.73   3.16
2.48   0.893   3.16   2.74   3.18
2.49   0.893   3.17   2.75   3.19
2.50   0.894   3.18   2.76   3.20
2.51   0.895   3.19   2.76   3.21
2.52   0.896   3.20   2.77   3.22
2.53   0.897   3.21   2.78   3.23
2.54   0.898   3.22   2.79   3.24
2.55   0.899   3.23   2.80   3.25
2.56   0.900   3.24   2.81   3.26
2.57   0.901   3.24   2.81   3.27
2.58   0.901   3.25   2.82   3.28
2.59   0.902   3.26   2.83   3.29
2.60   0.903   3.27   2.84   3.30
2.61   0.904   3.28   2.85   3.32
2.62   0.905   3.29   2.86   3.33
2.63   0.906   3.30   2.87   3.34
2.64   0.907   3.31   2.87   3.35
2.65   0.907   3.32   2.88   3.36
2.66   0.908   3.33   2.89   3.37
2.67   0.909   3.34   2.90   3.38
2.68   0.910   3.35   2.91   3.39
2.69   0.911   3.35   2.92   3.40
2.70   0.911   3.36   2.92   3.41
2.71   0.912   3.37   2.93   3.42
2.72   0.913   3.38   2.94   3.44
2.73   0.914   3.39   2.95   3.45
2.74   0.915   3.40   2.96   3.46
2.75   0.915   3.41   2.97   3.47
2.76   0.916   3.42   2.98   3.48
2.77   0.917   3.43   2.98   3.49
2.78   0.918   3.44   2.99   3.50
2.79   0.918   3.45   3.00   3.51
2.80   0.919   3.45   3.01   3.52
2.81   0.920   3.46   3.02   3.53
2.82   0.921   3.47   3.03   3.55
2.83   0.921   3.48   3.04   3.56
2.84   0.922   3.49   3.04   3.57
2.85   0.923   3.50   3.05   3.58
2.86   0.924   3.51   3.06   3.59
2.87   0.924   3.52   3.07   3.60
2.88   0.925   3.53   3.08   3.61
2.89   0.926   3.54   3.09   3.62
2.90   0.926   3.55   3.10   3.63
2.91   0.927   3.56   3.11   3.65
2.92   0.928   3.56   3.11   3.66
2.93   0.929   3.57   3.12   3.67
2.94   0.929   3.58   3.13   3.68
2.95   0.930   3.59   3.14   3.69
2.96   0.931   3.60   3.15   3.70
2.97   0.931   3.61   3.16   3.71
2.98   0.932   3.62   3.17   3.72
2.99   0.933   3.63   3.17   3.74
3.00   0.933   3.64   3.18   3.75
3.01   0.934   3.65   3.19   3.76
3.02   0.934   3.66   3.20   3.77
3.03   0.935   3.67   3.21   3.78
3.04   0.936   3.67   3.22   3.79
3.05   0.936   3.68   3.23   3.80
3.06   0.937   3.69   3.24   3.82
3.07   0.938   3.70   3.24   3.83
3.08   0.938   3.71   3.25   3.84
3.09   0.939   3.72   3.26   3.85
3.10   0.939   3.73   3.27   3.86
3.11   0.940   3.74   3.28   3.87
3.12   0.941   3.75   3.29   3.88
3.13   0.941   3.76   3.30   3.90
3.14   0.942   3.77   3.31   3.91
3.15   0.942   3.78   3.32   3.92
3.16   0.943   3.78   3.32   3.93
3.17   0.944   3.79   3.33   3.94
3.18   0.944   3.80   3.34   3.95
3.19   0.945   3.81   3.35   3.96
3.20   0.945   3.82   3.36   3.98
3.21   0.946   3.83   3.37   3.99
3.22   0.946   3.84   3.38   4.00
3.23   0.947   3.85   3.39   4.01
3.24   0.947   3.86   3.39   4.02
3.25   0.948   3.87   3.40   4.03
3.26   0.948   3.88   3.41   4.05
3.27   0.949   3.89   3.42   4.06
3.28   0.949   3.90   3.43   4.07
3.29   0.950   3.90   3.44   4.08
3.30   0.951   3.91   3.45   4.09
3.31   0.951   3.92   3.46   4.10
3.32   0.952   3.93   3.47   4.12
3.33   0.952   3.94   3.48   4.13
3.34   0.953   3.95   3.48   4.14
3.35   0.953   3.96   3.49   4.15
3.36   0.954   3.97   3.50   4.16
3.37   0.954   3.98   3.51   4.18
3.38   0.954   3.99   3.52   4.19
3.39   0.955   4.00   3.53   4.20
3.40   0.955   4.01   3.54   4.21
3.41   0.956   4.02   3.55   4.22
3.42   0.956   4.02   3.56   4.23
3.43   0.957   4.03   3.57   4.25
3.44   0.957   4.04   3.57   4.26
3.45   0.958   4.05   3.58   4.27
3.46   0.958   4.06   3.59   4.28
3.47   0.959   4.07   3.60   4.29
3.48   0.959   4.08   3.61   4.31
3.49   0.960   4.09   3.62   4.32
3.50   0.960   4.10   3.63   4.33
3.51   0.960   4.11   3.64   4.34
3.52   0.961   4.12   3.65   4.35
3.53   0.961   4.13   3.66   4.37
3.54   0.962   4.14   3.67   4.38
3.55   0.962   4.15   3.67   4.39
3.56   0.962   4.16   3.68   4.40
3.57   0.963   4.16   3.69   4.41
3.58   0.963   4.17   3.70   4.43
3.59   0.964   4.18   3.71   4.44
3.60   0.964   4.19   3.72   4.45
3.61   0.964   4.20   3.73   4.46
3.62   0.965   4.21   3.74   4.47
3.63   0.965   4.22   3.75   4.49
3.64   0.966   4.23   3.76   4.50
3.65   0.966   4.24   3.77   4.51
3.66   0.966   4.25   3.78   4.52
3.67   0.967   4.26   3.79   4.53
3.68   0.967   4.27   3.79   4.55
3.69   0.967   4.28   3.80   4.56
3.70   0.968   4.29   3.81   4.57
3.71   0.968   4.30   3.82   4.58
3.72   0.969   4.31   3.83   4.60
3.73   0.969   4.31   3.84   4.61
3.74   0.969   4.32   3.85   4.62
3.75   0.970   4.33   3.86   4.63
3.76   0.970   4.34   3.87   4.64
3.77   0.970   4.35   3.88   4.66
3.78   0.971   4.36   3.89   4.67
3.79   0.971   4.37   3.90   4.68
3.80   0.971   4.38   3.91   4.69
3.81   0.972   4.39   3.92   4.71
3.82   0.972   4.40   3.93   4.72
3.83   0.972   4.41   3.94   4.73
3.84   0.973   4.42   3.95   4.74
3.85   0.973   4.43   3.95   4.76
3.86   0.973   4.44   3.96   4.77
3.87   0.974   4.45   3.97   4.78
3.88   0.974   4.46   3.98   4.79
3.89   0.974   4.47   3.99   4.81
3.90   0.974   4.48   4.00   4.82
3.91   0.975   4.49   4.01   4.83
3.92   0.975   4.50   4.02   4.84
3.93   0.975   4.50   4.03   4.86
3.94   0.976   4.51   4.04   4.87
3.95   0.976   4.52   4.05   4.88
3.96   0.976   4.53   4.06   4.89
3.97   0.976   4.54   4.07   4.91
3.98   0.977   4.55   4.08   4.92
3.99   0.977   4.56   4.09   4.93
4.00   0.977   4.57   4.10   4.94
4.01   0.978   4.58   4.11   4.96
4.02   0.978   4.59   4.12   4.97
4.03   0.978   4.60   4.13   4.98
4.04   0.978   4.61   4.14   4.99
4.05   0.979   4.62   4.15   5.01
4.06   0.979   4.63   4.16   5.02
4.07   0.979   4.64   4.17   5.03
4.08   0.979   4.65   4.18   5.05
4.09   0.980   4.66   4.19   5.06
4.10   0.980   4.67   4.20   5.07
4.11   0.980   4.68   4.21   5.08
4.12   0.980   4.69   4.22   5.10
4.13   0.981   4.70   4.23   5.11
4.14   0.981   4.71   4.24   5.12
4.15   0.981   4.72   4.25   5.13
4.16   0.981   4.73   4.26   5.15
4.17   0.981   4.74   4.27   5.16
4.18   0.982   4.75   4.28   5.17
4.19   0.982   4.76   4.29   5.19
4.20   0.982   4.77   4.30   5.20
4.21   0.982   4.78   4.31   5.21
4.22   0.983   4.79   4.32   5.23
4.23   0.983   4.80   4.33   5.24
4.24   0.983   4.81   4.34   5.25
4.25   0.983   4.82   4.35   5.26
4.26   0.983   4.83   4.36   5.28
4.27   0.984   4.84   4.37   5.29
4.28   0.984   4.85   4.38   5.30
4.29   0.984   4.86   4.39   5.32
4.30   0.984   4.87   4.40   5.33
4.31   0.984   4.88   4.41   5.34
4.32   0.985   4.89   4.42   5.36
4.33   0.985   4.90   4.43   5.37
4.34   0.985   4.91   4.44   5.38
4.35   0.985   4.92   4.45   5.40
4.36   0.985   4.93   4.46   5.41
4.37   0.986   4.94   4.47   5.42
4.38   0.986   4.95   4.48   5.43
4.39   0.986   4.96   4.49   5.45
4.40   0.986   4.97   4.50   5.46
4.41   0.986   4.98   4.51   5.47
4.42   0.986   4.99   4.52   5.49
4.43   0.987   5.00   4.53   5.50
4.44   0.987   5.01   4.54   5.51
4.45   0.987   5.02   4.55   5.53
4.46   0.987   5.03   4.56   5.54
4.47   0.987   5.04   4.57   5.56
4.48   0.987   5.05   4.58   5.57
4.49   0.988   5.06   4.59   5.58
4.50   0.988   5.07   4.60   5.60
4.51   0.988   5.08   4.62   5.61
4.52   0.988   5.09   4.63   5.62
4.53   0.988   5.10   4.64   5.64
4.54   0.988   5.11   4.65   5.65
4.55   0.989   5.12   4.66   5.66
4.56   0.989   5.13   4.67   5.68
4.57   0.989   5.14   4.68   5.69
4.58   0.989   5.15   4.69   5.70
4.59   0.989   5.16   4.70   5.72
4.60   0.989   5.17   4.71   5.73
4.61   0.989   5.18   4.72   5.75
4.62   0.990   5.19   4.73   5.76
4.63   0.990   5.20   4.74   5.77
4.64   0.990   5.21   4.76   5.79
4.65   0.990   5.22   4.77   5.80
4.66   0.990   5.23   4.78   5.81
4.67   0.990   5.25   4.79   5.83
4.68   0.990   5.26   4.80   5.84
4.69   0.990   5.27   4.81   5.86
4.70   0.991   5.28   4.82   5.87
4.71   0.991   5.29   4.83   5.88
4.72   0.991   5.30   4.84   5.90
4.73   0.991   5.31   4.85   5.91
4.74   0.991   5.32   4.87   5.93
4.75   0.991   5.33   4.88   5.94
4.76   0.991   5.34   4.89   5.96
4.77   0.991   5.35   4.90   5.97
4.78   0.992   5.36   4.91   5.98
4.79   0.992   5.37   4.92   6.00
4.80   0.992   5.39   4.93   6.01
4.81   0.992   5.40   4.94   6.03
4.82   0.992   5.41   4.96   6.04
4.83   0.992   5.42   4.97   6.06
4.84   0.992   5.43   4.98   6.07
4.85   0.992   5.44   4.99   6.09
4.86   0.992   5.45   5.00   6.10
4.87   0.993   5.46   5.01   6.11
4.88   0.993   5.47   5.03   6.13
4.89   0.993   5.49   5.04   6.14
4.90   0.993   5.50   5.05   6.16
4.91   0.993   5.51   5.06   6.17
4.92   0.993   5.52   5.07   6.19
4.93   0.993   5.53   5.08   6.20
4.94   0.993   5.54   5.10   6.22
4.95   0.993   5.55   5.11   6.23
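Rather than interpolating in Table A5.3, d' can be recovered by numerically inverting the unbiased percent-correct predictions of the three models. A sketch, assuming the chapter 9 expressions quoted in the comments:

    from statistics import NormalDist

    norm = NormalDist()
    R2, R6 = 2 ** 0.5, 6 ** 0.5

    def pc_same_diff_io(d):
        # same-different, independent observations, unbiased:
        # p(c) = Phi(d/2)^2 + Phi(-d/2)^2
        return norm.cdf(d / 2) ** 2 + norm.cdf(-d / 2) ** 2

    def pc_abx_io(d):
        # ABX, independent observations:
        # p(c) = Phi(d/sqrt(2)) Phi(d/2) + Phi(-d/sqrt(2)) Phi(-d/2)
        return norm.cdf(d / R2) * norm.cdf(d / 2) + norm.cdf(-d / R2) * norm.cdf(-d / 2)

    def pc_abx_diff(d):
        # ABX, differencing: the comparison variances are 2 and 6.
        return norm.cdf(d / R2) * norm.cdf(d / R6) + norm.cdf(-d / R2) * norm.cdf(-d / R6)

    def invert(pc_fn, target, lo=0.0, hi=20.0):
        # Bisection: each p(c) function increases monotonically with d'.
        for _ in range(60):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if pc_fn(mid) < target else (lo, mid)
        return (lo + hi) / 2

    dz = 1.00                       # z(H) - z(F)
    pc_unb = norm.cdf(dz / 2)       # 0.691, the table's second column
    for fn in (pc_same_diff_io, pc_abx_io, pc_abx_diff):
        print(f"{fn.__name__}: d' = {invert(fn, pc_unb):.2f}")
    # 1.75, 1.48, 1.66: the z(H) - z(F) = 1.00 row of the table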

Table A5.4. Values of d' for Same-Different (Differencing Model)

H = hit rate = P("different" | Different); F = false-alarm rate =
P("different" | Same). Rows are H, columns are F.

H \ F  0.01  0.02  0.03  0.04  0.05  0.06  0.07  0.08  0.09  0.10
0.01   0.00
0.02   0.71  0.00
0.03   0.97  0.56  0.00
0.04   1.16  0.78  0.49  0.00
0.05   1.31  0.94  0.69  0.45  0.00
0.06   1.44  1.08  0.84  0.63  0.42  0.00
0.07   1.55  1.19  0.96  0.77  0.59  0.39  0.00
0.08   1.65  1.30  1.07  0.88  0.72  0.56  0.38  0.00
0.09   1.75  1.39  1.16  0.98  0.83  0.68  0.53  0.36  0.00
0.10   1.83  1.47  1.25  1.07  0.92  0.79  0.65  0.51  0.35  0.00
0.11   1.91  1.55  1.33  1.15  1.01  0.88  0.75  0.63  0.50  0.34
0.12   1.98  1.63  1.40  1.23  1.09  0.96  0.84  0.73  0.61  0.49
0.13   2.05  1.70  1.47  1.30  1.16  1.04  0.92  0.81  0.71  0.60
0.14   2.11  1.76  1.54  1.37  1.23  1.11  1.00  0.89  0.79  0.69
0.15   2.18  1.82  1.60  1.43  1.29  1.17  1.07  0.96  0.87  0.77
0.16   2.24  1.88  1.66  1.49  1.36  1.24  1.13  1.03  0.94  0.85
0.17   2.29  1.94  1.72  1.55  1.41  1.30  1.19  1.09  1.00  0.91
0.18   2.35  1.99  1.77  1.61  1.47  1.35  1.25  1.15  1.06  0.98
0.19   2.40  2.05  1.83  1.66  1.52  1.41  1.30  1.21  1.12  1.04
0.20   2.45  2.10  1.88  1.71  1.58  1.46  1.36  1.26  1.18  1.10
0.21   2.50  2.15  1.93  1.76  1.63  1.51  1.41  1.32  1.23  1.15
0.22   2.55  2.20  1.98  1.81  1.68  1.56  1.46  1.37  1.28  1.20
0.23   2.60  2.24  2.02  1.86  1.72  1.61  1.51  1.42  1.33  1.25
0.24   2.64  2.29  2.07  1.90  1.77  1.66  1.56  1.46  1.38  1.30
0.25   2.69  2.34  2.11  1.95  1.82  1.70  1.60  1.51  1.43  1.35
0.26   2.73  2.38  2.16  1.99  1.86  1.75  1.65  1.56  1.47  1.40
0.27   2.78  2.42  2.20  2.04  1.90  1.79  1.69  1.60  1.52  1.44
0.28   2.82  2.47  2.24  2.08  1.95  1.83  1.73  1.64  1.56  1.49
0.29   2.86  2.51  2.29  2.12  1.99  1.87  1.78  1.69  1.61  1.53
0.30   2.90  2.55  2.33  2.16  2.03  1.92  1.82  1.73  1.65  1.57
0.31   2.94  2.59  2.37  2.20  2.07  1.96  1.86  1.77  1.69  1.61
0.32   2.98  2.63  2.41  2.24  2.11  2.00  1.90  1.81  1.73  1.66
0.33   3.02  2.67  2.45  2.28  2.15  2.04  1.94  1.85  1.77  1.70
0.34   3.06  2.71  2.49  2.32  2.19  2.07  1.98  1.89  1.81  1.73
0.35   3.10  2.74  2.52  2.36  2.23  2.11  2.02  1.93  1.85  1.77
0.36   3.14  2.78  2.56  2.40  2.26  2.15  2.05  1.97  1.89  1.81
0.37   3.17  2.82  2.60  2.43  2.30  2.19  2.09  2.00  1.92  1.85
0.38   3.21  2.86  2.64  2.47  2.34  2.23  2.13  2.04  1.96  1.89
0.39   3.25  2.89  2.67  2.51  2.38  2.26  2.17  2.08  2.00  1.93
0.40   3.28  2.93  2.71  2.55  2.41  2.30  2.20  2.12  2.04  1.96
0.41   3.32  2.97  2.75  2.58  2.45  2.34  2.24  2.15  2.07  2.00
0.42   3.36  3.00  2.78  2.62  2.49  2.37  2.28  2.19  2.11  2.04
0.43   3.39  3.04  2.82  2.65  2.52  2.41  2.31  2.22  2.15  2.07
0.44   3.43  3.08  2.86  2.69  2.56  2.45  2.35  2.26  2.18  2.11
0.45   3.47  3.11  2.89  2.73  2.59  2.48  2.38  2.30  2.22  2.15
0.46   3.50  3.15  2.93  2.76  2.63  2.52  2.42  2.33  2.25  2.18
0.47   3.54  3.18  2.96  2.80  2.67  2.55  2.46  2.37  2.29  2.22
0.48   3.57  3.22  3.00  2.83  2.70  2.59  2.49  2.40  2.33  2.25
0.49   3.61  3.25  3.03  2.87  2.74  2.62  2.53  2.44  2.36  2.29
0.50   3.64  3.29  3.07  2.90  2.77  2.66  2.56  2.48  2.40  2.32
0.51   3.68  3.33  3.10  2.94  2.81  2.70  2.60  2.51  2.43  2.36
0.52   3.71  3.36  3.14  2.98  2.84  2.73  2.63  2.55  2.47  2.40
0.53   3.75  3.40  3.18  3.01  2.88  2.77  2.67  2.58  2.50  2.43
0.54   3.78  3.43  3.21  3.05  2.91  2.80  2.70  2.62  2.54  2.47
0.55   3.82  3.47  3.25  3.08  2.95  2.84  2.74  2.65  2.57  2.50
0.56   3.86  3.50  3.28  3.12  2.99  2.87  2.78  2.69  2.61  2.54
0.57   3.89  3.54  3.32  3.15  3.02  2.91  2.81  2.72  2.65  2.57
0.58   3.93  3.58  3.35  3.19  3.06  2.95  2.85  2.76  2.68  2.61
0.59   3.96  3.61  3.39  3.23  3.09  2.98  2.88  2.80  2.72  2.65
0.60   4.00  3.65  3.43  3.26  3.13  3.02  2.92  2.83  2.76  2.68
0.61   4.04  3.68  3.46  3.30  3.17  3.05  2.96  2.87  2.79  2.72
0.62   4.07  3.72  3.50  3.34  3.20  3.09  2.99  2.91  2.83  2.76
0.63   4.11  3.76  3.54  3.37  3.24  3.13  3.03  2.94  2.87  2.79
0.64   4.15  3.80  3.58  3.41  3.28  3.17  3.07  2.98  2.90  2.83
0.65   4.19  3.83  3.61  3.45  3.32  3.20  3.11  3.02  2.94  2.87
0.66   4.23  3.87  3.65  3.49  3.36  3.24  3.15  3.06  2.98  2.91
0.67   4.26  3.91  3.69  3.53  3.39  3.28  3.18  3.10  3.02  2.95
0.68   4.30  3.95  3.73  3.57  3.43  3.32  3.22  3.14  3.06  2.99
0.69   4.34  3.99  3.77  3.61  3.47  3.36  3.26  3.18  3.10  3.03
0.70   4.38  4.03  3.81  3.65  3.51  3.40  3.30  3.22  3.14  3.07
0.71   4.43  4.07  3.85  3.69  3.55  3.44  3.34  3.26  3.18  3.11
0.72   4.47  4.11  3.89  3.73  3.60  3.48  3.39  3.30  3.22  3.15
0.73   4.51  4.16  3.94  3.77  3.64  3.53  3.43  3.34  3.26  3.19
0.74   4.55  4.20  3.98  3.81  3.68  3.57  3.47  3.39  3.31  3.24
0.75   4.60  4.24  4.02  3.86  3.73  3.61  3.52  3.43  3.35  3.28
0.76   4.64  4.29  4.07  3.90  3.77  3.66  3.56  3.47  3.40  3.32
0.77   4.69  4.33  4.11  3.95  3.82  3.70  3.61  3.52  3.44  3.37
0.78   4.73  4.38  4.16  4.00  3.86  3.75  3.65  3.57  3.49  3.42
0.79   4.78  4.43  4.21  4.04  3.91  3.80  3.70  3.62  3.54  3.47
0.80   4.83  4.48  4.26  4.09  3.96  3.85  3.75  3.67  3.59  3.52
0.81   4.88  4.53  4.31  4.15  4.01  3.90  3.80  3.72  3.64  3.57
0.82   4.94  4.58  4.36  4.20  4.07  3.95  3.86  3.77  3.69  3.62
0.83   4.99  4.64  4.42  4.25  4.12  4.01  3.91  3.83  3.75  3.68
0.84   5.05  4.70  4.48  4.31  4.18  4.07  3.97  3.88  3.80  3.73
0.85   5.11  4.76  4.53  4.37  4.24  4.13  4.03  3.94  3.86  3.79
0.86   5.17  4.82  4.60  4.43  4.30  4.19  4.09  4.00  3.93  3.85
0.87   5.24  4.88  4.66  4.50  4.36  4.25  4.16  4.07  3.99  3.92
0.88   5.30  4.95  4.73  4.57  4.43  4.32  4.22  4.14  4.06  3.99
0.89   5.38  5.02  4.80  4.64  4.51  4.39  4.30  4.21  4.13  4.06
0.90   5.46  5.10  4.88  4.72  4.58  4.47  4.37  4.29  4.21  4.14
0.91   5.54  5.19  4.97  4.80  4.67  4.56  4.46  4.37  4.29  4.22
0.92   5.63  5.28  5.06  4.89  4.76  4.65  4.55  4.46  4.38  4.31
0.93   5.73  5.38  5.16  4.99  4.86  4.75  4.65  4.56  4.48  4.41
0.94   5.84  5.49  5.27  5.10  4.97  4.86  4.76  4.67  4.60  4.52
0.95   5.97  5.62  5.40  5.23  5.10  4.99  4.89  4.80  4.72  4.65
0.96   6.12  5.77  5.54  5.38  5.25  5.14  5.04  4.95  4.87  4.80
0.97   6.30  5.95  5.73  5.56  5.43  5.32  5.22  5.14  5.06  4.99
0.98   6.55  6.19  5.97  5.81  5.68  5.56  5.47  5.38  5.30  5.23
0.99   6.93  6.58  6.36  6.19  6.06  5.95  5.85  5.77  5.69  5.62

H \ F  0.11  0.12  0.13  0.14  0.15  0.16  0.17  0.18  0.19  0.20
0.11   0.00
0.12   0.34  0.00
0.13   0.48  0.33  0.00
0.14   0.58  0.46  0.32  0.00
0.15   0.67  0.57  0.45  0.31  0.00
0.16   0.75  0.66  0.56  0.44  0.31  0.00
0.17   0.83  0.74  0.65  0.55  0.44  0.31  0.00
0.18   0.89  0.81  0.73  0.64  0.55  0.43  0.30  0.00
0.19   0.96  0.88  0.80  0.71  0.63  0.53  0.43  0.30  0.00
0.20   1.02  0.94  0.86  0.79  0.71  0.62  0.53  0.43  0.30  0.00
0.21   1.07  1.00  0.93  0.85  0.78  0.70  0.62  0.53  0.43  0.30
0.22   1.13  1.05  0.98  0.91  0.84  0.77  0.69  0.61  0.52  0.42
0.23   1.18  1.11  1.04  0.97  0.90  0.83  0.76  0.69  0.61  0.52
0.24   1.23  1.16  1.09  1.02  0.96  0.89  0.82  0.75  0.68  0.60
0.25   1.28  1.21  1.14  1.08  1.01  0.95  0.88  0.82  0.75  0.68
0.26   1.33  1.26  1.19  1.13  1.06  1.00  0.94  0.87  0.81  0.74
0.27   1.37  1.30  1.24  1.17  1.11  1.05  0.99  0.93  0.87  0.80
0.28   1.42  1.35  1.28  1.22  1.16  1.10  1.04  0.98  0.92  0.86
0.29   1.46  1.39  1.33  1.27  1.21  1.15  1.09  1.03  0.97  0.92
0.30   1.50  1.44  1.37  1.31  1.25  1.20  1.14  1.08  1.03  0.97
0.31   1.54  1.48  1.42  1.36  1.30  1.24  1.18  1.13  1.07  1.02
0.32   1.59  1.52  1.46  1.40  1.34  1.28  1.23  1.17  1.12  1.06
0.33   1.63  1.56  1.50  1.44  1.38  1.33  1.27  1.22  1.16  1.11
0.34   1.67  1.60  1.54  1.48  1.42  1.37  1.32  1.26  1.21  1.16
0.35   1.71  1.64  1.58  1.52  1.47  1.41  1.36  1.31  1.25  1.20
0.36   1.74  1.68  1.62  1.56  1.51  1.45  1.40  1.35  1.30  1.25
0.37   1.78  1.72  1.66  1.60  1.55  1.49  1.44  1.39  1.34  1.29
0.38   1.82  1.76  1.70  1.64  1.58  1.53  1.48  1.43  1.38  1.33
0.39   1.86  1.80  1.73  1.68  1.62  1.57  1.52  1.47  1.42  1.37
0.40   1.90  1.83  1.77  1.72  1.66  1.61  1.56  1.51  1.46  1.41
0.41   1.93  1.87  1.81  1.75  1.70  1.65  1.60  1.55  1.50  1.45
0.42   1.97  1.91  1.85  1.79  1.74  1.68  1.63  1.59  1.54  1.49
0.43   2.01  1.94  1.88  1.83  1.77  1.72  1.67  1.62  1.58  1.53
0.44   2.04  1.98  1.92  1.86  1.81  1.76  1.71  1.66  1.61  1.57
0.45   2.08  2.02  1.96  1.90  1.85  1.80  1.74  1.70  1.65  1.61
0.46   2.11  2.05  1.99  1.94  1.88  1.83  1.78  1.73  1.69  1.64
0.47   2.15  2.09  2.03  1.97  1.92  1.87  1.82  1.77  1.73  1.68
0.48   2.19  2.12  2.07  2.01  1.96  1.91  1.86  1.81  1.76  1.72
0.49   2.22  2.16  2.10  2.05  1.99  1.94  1.89  1.85  1.80  1.76
0.50   2.26  2.20  2.14  2.08  2.03  1.98  1.93  1.88  1.84  1.80
0.51   2.29  2.23  2.17  2.12  2.06  2.01  1.96  1.92  1.87  1.83
0.52   2.33  2.27  2.21  2.15  2.10  2.05  2.00  1.95  1.91  1.87
0.53   2.36  2.30  2.24  2.19  2.14  2.09  2.04  1.99  1.95  1.90
0.54   2.40  2.34  2.28  2.23  2.17  2.12  2.07  2.03  1.98  1.94
0.55   2.44  2.37  2.32  2.26  2.21  2.16  2.11  2.06  2.02  1.98
0.56   2.47  2.41  2.35  2.30  2.24  2.20  2.15  2.10  2.06  2.01
0.57   2.51  2.45  2.39  2.33  2.28  2.23  2.18  2.14  2.09  2.05
0.58   2.54  2.48  2.42  2.37  2.32  2.27  2.22  2.17  2.13  2.09
0.59   2.58  2.52  2.46  2.41  2.35  2.30  2.26  2.21  2.17  2.12
0.60   2.62  2.56  2.50  2.44  2.39  2.34  2.29  2.25  2.20  2.16
0.61   2.65  2.59  2.53  2.48  2.43  2.38  2.33  2.29  2.24  2.20
0.62   2.69  2.63  2.57  2.52  2.47  2.42  2.37  2.32  2.28  2.24
0.63   2.73  2.67  2.61  2.55  2.50  2.45  2.41  2.36  2.32  2.27
0.64   2.77  2.70  2.65  2.59  2.54  2.49  2.44  2.40  2.35  2.31
0.65   2.80  2.74  2.68  2.63  2.58  2.53  2.48  2.44  2.39  2.35
0.66   2.84  2.78  2.72  2.67  2.62  2.57  2.52  2.48  2.43  2.39
0.67   2.88  2.82  2.76  2.71  2.66  2.61  2.56  2.51  2.47  2.43
0.68   2.92  2.86  2.80  2.75  2.70  2.65  2.60  2.55  2.51  2.47
0.69   2.96  2.90  2.84  2.79  2.74  2.69  2.64  2.59  2.55  2.51
0.70   3.00  2.94  2.88  2.83  2.78  2.73  2.68  2.63  2.59  2.55
0.71   3.04  2.98  2.92  2.87  2.82  2.77  2.72  2.68  2.63  2.59
0.72   3.08  3.02  2.96  2.91  2.86  2.81  2.76  2.72  2.67  2.63
0.73   3.13  3.07  3.01  2.95  2.90  2.85  2.81  2.76  2.72  2.68
0.74   3.17  3.11  3.05  3.00  2.94  2.90  2.85  2.80  2.76  2.72
0.75   3.21  3.15  3.09  3.04  2.99  2.94  2.89  2.85  2.81  2.76
0.76   3.26  3.20  3.14  3.09  3.03  2.98  2.94  2.89  2.85  2.81
0.77   3.30  3.24  3.19  3.13  3.08  3.03  2.98  2.94  2.90  2.86
0.78   3.35  3.29  3.23  3.18  3.13  3.08  3.03  2.99  2.94  2.90
0.79   3.40  3.34  3.28  3.23  3.18  3.13  3.08  3.04  2.99  2.95
0.80   3.45  3.39  3.33  3.28  3.23  3.18  3.13  3.09  3.04  3.00
0.81   3.50  3.44  3.38  3.33  3.28  3.23  3.18  3.14  3.09  3.05
0.82   3.55  3.49  3.44  3.38  3.33  3.28  3.23  3.19  3.15  3.11
0.83   3.61  3.55  3.49  3.44  3.38  3.34  3.29  3.24  3.20  3.16
0.84   3.67  3.61  3.55  3.49  3.44  3.39  3.35  3.30  3.26  3.22
0.85   3.73  3.66  3.61  3.55  3.50  3.45  3.41  3.36  3.32  3.28

0.86

3.79

3.73

3.67

3.61

3.56

3.51

3.47

3.42

3.38

3.34

0.87

3.85

3.79

3.73

3.68

3.63

3.58

3.53

3.49

3.45

3.40

0.88

3.92

3.86

3.80

3.75

3.70

3.65

3.60

3.56

3.51

3.47

0.89

3.99

3.93

3.88

3.82

3.77

3.72

3.67

3.63

3.59

3.55

0.90

4.07

4.01

3.95

3.90

3.85

3.80

3.75

3.71

3.67

3.62

"H = hit rate = P("different" I Different).
*F = false-alarm rate = P("different" I Same).

Tables
TABLE A5.4
(cont.)

Values of d' for Same-Different (Differencing

407

Model)

pb
a

H

0.11

0.12

0.13

0.14

0.15

0.16

0.17

0.18

0.19

0.20

0.91

4.16

4.09

4.04

3.98

3.93

3.88

3.84

3.79

3.75

3.71

0.92

4.25

4.19

4.13

4.07

4.02

3.97

3.93

3.88

3.84

3.80

0.93

4.35

4.29

4.23

4.17

4.12

4.07

4.03

3.98

3.94

3.90

0.94

4.46

4.40

4.34

4.29

4.23

4.19

4.14

4.09

4.05

4.01

0.95

4.59

4.52

4.47

4.41

4.36

4.31

4.27

4.22

4.18

4.14

0.96

4.74

4.67

4.62

4.56

4.51

4.46

4.42

4.37

4.33

4.29

0.97

4.92

4.86

4.80

4.75

4.70

4.65

4.60

4.56

4.51

4.47

0.98

5.16

5.10

5.05

4.99

4.94

4.89

4.85

4.80

4.76

4.72

0.99

5.55

5.49

5.43

5.38

5.33

5.28

5.23

5.19

5.14

5.10

0.22

0.23

0.24

0.25

0.26

0.27

0.28

0.29

0.30

pb
a

H

0.21

0.21

0.00

0.22

0.30

0.00

0.23

0.42

0.30

0.00

0.24

0.52

0.42

0.29

0.00

0.25

0.60

0.51

0.42

0.29

0.26

0.67

0.60

0.51

0.41

0.29

0.00

0.27

0.74

0.67

0.59

0.51

0.41

0.29

0.00

0.28

0.80

0.73

0.66

0.59

0.51

0.41

0.29

0.00

0.29

0.86

0.79

0.73

0.66

0.59

0.51

0.41

0.29

0.00

0.30

0.91

0.85

0.79

0.73

0.66

0.59

0.51

0.41

0.29

0.00

0.31

0.96

0.91

0.85

0.79

0.73

0.66

0.59

0.50

0.41

0.29

0.32

1.01

0.96

0.90

0.85

0.79

0.72

0.66

0.58

0.50

0.41

0.33

1.06

1.01

0.95

0.90

0.84

0.78

0.72

0.66

0.58

0.50

0.34

1.11

1.06

1.00

0.95

0.90

0.84

0.78

0.72

0.65

0.58

0.35

1.15

1.10

1.05

1.00

0.95

0.89

0.84

0.78

0.72

0.65

0.36

1.20

1.15

1.10

1.05

1.00

0.95

0.89

0.84

0.78

0.72

0.37

1.24

1.19

1.14

1.10

1.05

1.00

0.94

0.89

0.84

0.78

0.38

1.28

1.24

1.19

1.14

1.09

1.04

0.99

0.94

0.89

0.84

0.39

1.32

1.28

1.23

1.18

1.14

1.09

1.04

0.99

0.94

0.89

0.40

1.37

1.32

1.27

1.23

1.18

1.14

1.09

1.04

0.99

0.94

"H = hit rate = /T'different" I Different).
F = false-alarm rate = /"("different" I Same).

b

0.00

408

Appendix 5

TABLE A5.4
(cont.)

Values of d' for Same-Different (Differencing

F*
0.21
1.41

0.22

0.42

1.45

0.43
0.44

1.49

a

Model)

0.25
1.22

0.26
1.18

0.27

0.28

1.13

1.09

0.29
1.04

1.31

1.27

1.22

1.18

1.13

1.09

0.99
1.04

1.35

1.31

1.27

1.22

1.13

1.09

1.44

1.39

1.35

1.31

1.26

1.18
1.22

1.18

1.13

1.52

1.48

1.43

1.39

1.35

1.31

1.26

1.22

1.18

1.56

1.51

1.47

1.13

1.39

1.35

1.31

1.26

1.22

1.64

1.60

1.55

1.51

1.47

1.43

1.39

1.35

1.31

1.63
1.67

1.59
1.63

1.55
1.59

1.51
1.55

1.47

0.49

1.68
1.71

1.51

1.43
1.47

1.39
1.43

1.35
1.39

1.26
1.31

0.50

1.75

1.71

1.67

1.63

1.59

1.55

1.51

1.47

1.43

1.39

0.51
0.52

1.79

1.75

1.66

1.62

1.58

1.55

1.51

1.47

1.43

1.82

1.78

1.70
1.74

1.70

1.66

1.62

1.58

1.55

1.51

1.47

0.53

1.86

1.82

1.78

1.74

1.70

1.66

1.62

1.59

1.55

1.51

0.54

1.90

1.86

1.82

1.78

1.74

1.66

1.62

1.59

1.55

0.55

1.93
1.97

1.89
1.93

1.85

1.81

1.78

1.70
1.74

1.85

1.78

1.70
1.74

1.63

1.89

1.59
1.63

0.23
1.32

0.24

1.40

1.36

1.44

1.40

1.52

1.48

0.45

1.56

0.46

1.60

0.47
0.48

H

0.41

1.36

1.27

0.30

1.35

0.56
0.57

2.01

1.97

1.93

1.89

1.81
1.85

1.81

1.78

1.66
1.70
1.74

0.58

2.05

2.01

1.97

1.93

1.89

1.85

1.81

1.78

1.74

1.67
1.71

0.59

2.04

1.93

1.89

1.85

1.82

1.78

1.75

2.08

2.00
2.04

1.96

0.60

2.08
2.12

2.00

1.96

1.93

1.89

1.86

1.82

1.78

0.61

2.16

2.12

2.08

2.04

2.00

1.97

1.93

1.89

1.86

1.82

2.00
2.04

1.97
2.01

1.93
1.97

1.86
1.90
1.94

1.66
1.70

0.62

2.20

2.16

0.63
0.64

2.23
2.27

2.19
2.23

2.12
2.15
2.19

2.08
2.12
2.16

2.04

2.08

2.05

2.01

1.90
1.94
1.98

0.65

2.31

2.27

2.23

2.19

2.16

2.12

2.09

2.05

2.02

1.98

0.66

2.35

2.31

2.27

2.23

2.20

2.16

2.13

2.09

2.06

2.02

2.20

2.17

2.13

2.10

2.06

2.24

2.21

2.17

2.14

2.10

2.08
2.12

0.67

2.39

2.35

2.31

2.27

2.24

0.68

2.43

2.39

2.35

2.31

2.28

0.69

2.47

2.43

2.39

2.35

2.32

2.28

2.25

2.21

2.18

2.15

0.70

2.51

2.47

2.43

2.39

2.36

2.32

2.29

2.25

2.22

2.19

"H = hit rate = P("different" I Different).
b
F = false-alarm rate = P("different" I Same).

Tables
TABLE A5.4
(cont.)

Values ofd' for Same-Different (Differencing

409

Model)

**

Ha

0.21

0.71

2.55

0.72

2.59

0.73

2.64

0.74
0.75

2.68
2.72

0.22

0.23

0.24

0.25

0.26

0.27

0.28

0.29

0.30

2.51

2.47

2.44

2.40

2.36

2.33

2.30

2.26

2.23

2.55

2.52

2.48

2.44

2.41

2.37

2.34

2.31

2.27

2.60

2.56

2.52

2.49

2.45

2.42

2.38

2.35

2.32

2.64
2.68

2.60
2.65

2.57
2.61

2.53
2.57

2.49
2.54

2.46
2.50

2.43
2.47

2.39
2.44

2.36
2.41

0.76

2.77

2.73

2.69

2.66

2.62

2.58

2.55

2.52

2.48

2.45

0.77

2.81

2.78

2.74

2.70

2.67

2.63

2.60

2.56

2.53

2.50

0.78

2.86

2.82

2.79

2.75

2.71

2.68

2.64

2.61

2.58

2.55

0.79

2.91

2.87

2.83

2.80

2.76

2.73

2.69

2.66

2.63

2.60

0.80

2.96

2.92

2.88

2.85

2.81

2.78

2.74

2.71

2.68

2.65

0.81
0.82

3.01
3.07

2.97
3.03

2.94

2.90
2.95

2.86
2.92

2.83
2.88

2.80
2.85

2.76
2.82

2.73
2.78

2.70

2.99

0.83

3.12

3.08

3.04

3.01

2.97

2.94

2.90

2.87

2.84

2.81

0.84

3.14

3.10

3.07

3.03

2.00

2.96

2.93

2.90

2.87

0.85

3.18
3.24

3.20

3.16

3.13

3.09

3.02

2.99

2.96

2.93

0.86

3.30

3.26

3.22

3.19

3.15

3.06
3.12

3.08

3.05

3.02

2.99

0.87

3.36

3.33

3.29

3.25

3.22

3.18

3.15

3.12

3.09

3.05

0.88
0.89
0.90

3.43

3.40
3.47
3.55

3.36

3.32

3.22

3.39
3.47

3.33
3.40

3.29
3.37

3.19
3.26
3.34

3.15
3.23
3.31

3.12

3.43
3.51

3.29
3.36
3.44

3.25

3.51
3.58

0.91

3.67

3.63

3.59

3.56

3.52

3.49

3.45

3.42

3.39

3.36

0.92

3.76

3.72

3.68

3.65

3.61

3.58

3.55

3.51

3.48

3.45

2.75

3.20
3.27

0.93

3.86

3.82

3.78

3.75

3.71

3.68

3.65

3.61

3.58

3.55

0.94

3.97

3.93

3.90

3.86

3.82

3.79

3.76

3.73

3.69

3.66

0.95
0.96
0.97

4.10
4.25
4.43

4.06
4.21
4.39

4.02
4.17

3.95
4.10
4.29

3.92

4.36

3.99
4.14
4.32

4.07
4.25

3.89
4.04
4.22

3.85
4.00
4.19

3.82
3.97
4.16

3.79
3.94
4.12

0.98

4.68

4.64

4.60

4.57

4.53

4.50

4.46

4.43

4.40

4.37

0.99

5.06

5.02

4.99

4.95

4.92

4.88

4.85

4.82

4.79

4.76

"H = hit rate = /^'different" I Different).
*F = false-alarm rate = P("different" I Same).

410

Appendix 5

TABLE A5.4
(cont.)

Values ofd' for Same-Different (Differencing

Model)

pb

H

a

0.31

0.31

0.00

0.32
0.33
0.34

0.32

0.33

0.29
0.41

0.00
0.29

0.00

0.50

0.41

0.29

0.34

0.35

0.36

0.37

0.38

0.39

0.40

0.00

0.35

0.58

0.50

0.41

0.29

0.00

0.36

0.65

0.58

0.50

0.41

0.29

0.00

0.37

0.72

0.65

0.58

0.50

0.41

0.29

0.38

0.78

0.72

0.65

0.58

0.50

0.41

0.29

0.00

0.39
0.40

0.84
0.89

0.78
0.84

0.72
0.78

0.66
0.72

0.59
0.66

0.51
0.59

0.41
0.51

0.29
0.41

0.00
0.29

0.00

0.41

0.94

0.89

0.84

0.78

0.72

0.66

0.59

0.51

0.41

0.29

0.42

0.99
1.04

0.94

0.89
0.94

0.84

0.78

0.72

0.66

0.59

0.51

0.42

0.84

0.73

0.66

0.59

0.51
0.59
0.67

0.00

1.09

0.99
1.04

0.99

0.89
0.95

0.90

0.78
0.84

0.79

0.73

0.66

0.45

1.13

1.09

1.04

1.01

0.95

0.90

0.84

0.79

0.73

0.46

1.18
1.22

1.13
1.18

1.09
1.14

1.04

1.00

0.95

0.90

0.85

1.00

1.18
1.23

1.09
1.14

1.05
1.10

0.95
1.00
1.05

0.90

1.22
1.27

1.09
1.14
1.18

1.05

1.26

0.79
0.85
0.91
0.96

0.85
0.91

1.31

1.27

1.23

1.19

1.14

1.10

1.06

1.01

0.96

0.43
0.44

0.47
0.48
0.49

0.96
1.01

0.73
0.79

0.50

1.31
1.35

0.51

1.39

1.35

1.31

1.27

1.23

1.19

1.15

1.10

1.06

1.01

0.52

1.43

1.39

1.35

1.31

1.28

1.23

1.19

1.15

1.11

1.06

0.53
0.54

1.47

1.43

1.40

1.32

1.28

1.24

1.11

1.47

1.44
1.48
1.52

1.36

1.32
1.37
1.41

1.28
1.33

1.20
1.24

1.16

1.51
1.55

1.36
1.40
1.44

1.29
1.33

1.20
1.25
1.29

1.16
1.21
1.25

1.38

1.34

1.30

1.38

1.34

1.59

1.51
1.56

1.48

1.40
1.45

0.57

1.63

1.60

1.56

1.52

1.49

1.45

1.37
1.41

0.58

1.67

1.64

1.56

1.53

1.49

1.46

1.42

0.59

1.71

1.68

1.60
1.64

1.60

1.57

1.53

1.50

1.46

1.43

1.39

0.60

1.75

1.71

1.68

1.65

1.61

1.58

1.54

1.51

1.47

1.43

0.55
0.56

"H = hit rate = P("different" I Different).
b
F = false-alarm rate = P("different" I Same).

Tables
TABLE A5.4
(cont.)

Values of d' for Same-Different (Differencing

411

Model)

pb
Ha

0.31

0.32

0.33

0.34

0.35

0.36

0.37

0.38

0.39

0.40

0.61

1.79

1.75

1.73

1.69

1.65

1.62

1.58

1.55

1.51

1.48

0.62

1.83

1.79

1.76

1.73

1.69

1.66

1.62

1.59

1.56

1.52

0.63

1.87

1.83

1.80

1.77

1.73

1.70

1.67

1.63

1.60

1.56

0.64

1.91

1.88

1.84

1.81

1.78

1.74

1.71

1.68

1.64

1.61

0.65

1.95

1.88
1.92

1.82

1.78

1.75

1.72

1.89

1.86

1.83

1.79

1.76

1.68
1.73

1.65

1.99

1.92
1.96

1.85

0.66
0.67

1.96

1.93

1.90

1.87

1.84

1.80

1.77

2.03

1.69
1.74

0.68

2.07

2.00
2.04

2.01

1.97

1.94

1.91

1.88

1.85

1.81

1.78

0.69

2.11

2.08

2.05

2.02

1.98

1.95

1.92

1.89

1.86

1.83

0.70

2.15

2.12

2.09

2.06

2.03

2.00

1.96

1.93

1.90

1.87

0.71

2.20

2.17

2.13

2.10

2.07

2.04

2.01

1.98

1.95

1.92

0.72

2.24

2.21

2.11

2.08

2.05

2.02

2.28

2.19
2.23

2.17

2.10
2.14

2.07
2.11

1.99
2.04

2.27

2.16
2.20

2.13

2.33

2.25
2.30

2.18
2.22

2.15

0.73
0.74
0.75

2.37

2.34

2.31

2.28

2.25

2.22

2.19

0.76

2.42

2.39

2.36

2.33

2.30

2.27

2.24

0.77

2.47

2.44

2.41

2.37

2.34

2.31

0.78

2.52

2.48

2.45

2.42

2.36

0.79
0.80

2.56

2.53

2.47

2.41

2.62

2.58

2.50
2.55

2.39
2.44

2.52

2.49

2.47

0.81

2.67

2.64

2.55

2.72

2.69

2.61
2.66

2.58

0.82

2.63

2.60

0.83

2.78

2.75

2.69

2.66

1.96

2.08

2.01
2.05

2.16

2.13

2.10

2.21

2.18

2.15

2.28

2.26

2.23

2.20

2.33

2.30

2.28

2.25

2.38
2.44

2.36

2.30

2.41

2.33
2.38

2.52

2.49

2.46

2.43

2.40

2.57

2.54

2.51

2.46

2.63

2.60

2.57

2.49
2.54

2.35

2.52

0.84

2.83

2.80

2.72
2.77

2.74

2.72

2.69

2.66

2.63

2.60

2.57

0.85

2.89

2.86

2.83

2.81

2.78

2.75

2.72

2.69

2.66

2.64

0.86

2.96

2.93

2.90

2.87

2.84

2.81

2.78

2.75

2.73

2.70

0.87

3.02
3.09
3.17

2.99

2.96

3.06
3.14

2.91
2.98
3.05

2.88
2.95
3.02

2.85
2.92
2.99

2.89
2.97

2.79
2.86
2.94

2.77
2.84
2.91

0.90

3.24

3.21

3.03
3.11
3.18

2.93
3.00
3.08

2.82

0.88
0.89

3.16

3.13

3.10

3.07

3.04

3.02

2.99

"H = hit rate = P("different" I Different).
*F = false-alarm rate = /'("different" I Same).

412

Appendix 5

TABLE A5.4
(cont.)

Values ofd' for Same-Different (Differencing

Model)

pb
Ha

0.31

0.32

0.33

0.34

0.35

0.36

0.37

0.38

0.39

0.40

0.91

3.33

3.30

3.27

3.24

3.21

3.18

3.16

3.13

3.10

3.08

0.92

3.42

3.39

3.36

3.33

3.30

3.28

3.25

3.22

3.19

3.17

0.93

3.52

3.49

3.35

3.30
3.41

3.76

3.73

3.70

3.67

3.49
3.62

3.46

0.95

3.52
3.64

3.32
3.43

3.27

3.60

3.43
3.54

3.38

3.63

3.46
3.57

3.40

0.94

3.54

3.91

3.88

3.82

3.80

3.77

3.71

4.09

4.07

4.01

3.98

3.95

3.93

3.90

3.69
3.87

3.66

0.97

3.85
4.04

3.59
3.74

3.56

0.96
0.98

4.34

4.31

4.28

4.25

4.22

4.20

4.17

4.14

4.12

4.09

0.99

4.73

4.70

4.67

4.64

4.61

4.58

4.56

4.53

4.50

4.48

H

f*
0.41

0.42

0.43

0.44

0.45

0.46

0.47

0.48

0.49

0.50

0.41
0.42

0.00
0.29

0.00

0.43

0.42

0.29

0.00

0.44

0.51

0.42

0.30

0.00

0.45

0.59

0.51

0.42

0.30

0.46

0.67

0.60

0.52

0.42

0.30

0.00

0.47
0.48

0.73
0.80

0.67
0.74

0.60
0.67

0.52
0.60

0.42
0.52

0.30
0.43

0.00
0.30

0.00

0.49
0.50

0.86
0.91

0.80
0.86

0.74
0.80

0.68
0.74

0.61
0.68

0.52
0.61

0.43
0.53

0.30
0.43

0.00
0.31

0.00

0.51

0.97

0.92

0.86

0.81

0.75

0.68

0.61

0.53

0.43

0.52

1.07

0.97

0.92

0.87

0.81

0.75

0.69

0.62

0.53

0.31
0.44

0.53
0.54

1.07

1.02

0.98

0.93

0.87

0.82

0.76

0.69

0.62

0.54

1.12

1.07

1.03

0.98

0.93

0.82

0.76

0.70

0.62

0.55
0.56

1.17
1.21

1.12

1.08
1.13

1.03

0.99
1.04

0.88
0.94

0.88
0.94

0.83
0.88

0.77

0.70

0.99

0.57

1.22
1.27

1.09
1.14

1.05

1.00

1.10

1.05

0.95
1.01

0.77
0.84

0.58

1.26
1.31

0.83
0.90
0.95

0.90

0.59

1.35

1.15

1.11

1.01

0.96

0.60

1.40

1.19
1.24

1.20

1.16

1.07

1.02

a

1.17

1.18

1.09
1.14
1.19

1.31

1.23
1.27

1.23

1.36

1.32

1.28

"H = hit rate = P("different" I Different).
b
F = false-alarm rate = P("different" I Same).

3.38
3.51
3.85

0.00

1.06
1.11

Tables
TABLE A5.4
(cont.)

Values ofd' for Same-Different (Differencing

413

Model)

pb
H

a

0.41

0.42

0.43

0.44

0.45

0.46

0.47

0.48

0.61

1.44

1.40

1.37

1.33

1.29

1.25

1.21

0.62

1.49

1.45

1.41

1.38

1.34

1.30

1.26

0.63

1.53

1.49

1.46

1.42

1.39

1.35

0.64

1.57

1.54

1.50

1.47

1.43

0.65

1.62

1.58

1.55

1.51

0.66

1.66

1.63

1.59

0.67

1.71

1.67

1.64

0.68

1.75

1.72

1.68

0.69

1.79

1.76

0.70

1.84

1.81

0.71

1.88

0.72

0.49

0.50

1.17

1.12

1.08

1.22

1.18

1.13

1.31

1.27

1.23

1.18

1.39

1.36

1.32

1.28

1.24

1.48

1.44

1.41

1.37

1.33

1.29

1.56

1.53

1.49

1.45

1.42

1.38

1.34

1.61

1.57

1.54

1.50

1.47

1.43

1.39

1.65

1.62

1.58

1.55

1.51

1.48

1.44

1.73

1.70

1.66

1.63

1.60

1.56

1.53

1.49

1.78

1.74

1.71

1.68

1.65

1.61

1.58

1.54

1.85

1.82

1.79

1.76

1.73

1.69

1.66

1.63

1.59

1.93

1.90

1.87

1.84

1.81

1.77

1.74

1.71

1.68

1.64

0.73

1.98

1.95

1.92

1.88

1.85

1.82

1.79

1.76

1.73

1.69

0.74

2.02

1.99

1.96

1.93

1.90

1.87

1.84

1.81

1.78

1.74

0.75

2.07

2.04

2.01

1.98

1.95

1.92

1.89

1.86

1.83

1.80

0.76

2.12

2.09

2.06

2.03

2.00

1.97

1.94

1.91

1.88

1.85

0.77

2.17

2.14

2.11

2.08

2.05

2.02

1.99

1.96

1.93

1.90

0.78

2.22

2.19

2.16

2.13

2.10

2.07

2.04

2.01

1.98

1.95

0.79

2.27

2.24

2.21

2.18

2.15

2.13

2.10

2.07

2.04

2.01

0.80

2.32

2.29

2.26

2.24

2.21

2.18

2.15

2.12

2.09

2.06

0.81

2.37

2.35

2.32

2.29

2.26

2.23

2.21

2.18

2.15

2.12

0.82

2.43

2.40

2.37

2.35

2.32

2.29

2.26

2.23

2.21

2.18

0.83

2.49

2.46

2.43

2.40

2.38

2.35

2.32

2.29

2.27

2.24

0.84

2.55

2.52

2.49

2.46

2.44

2.41

2.38

2.35

2.33

2.30

0.85

2.61

2.58

2.55

2.53

2.50

2.47

2.44

2.42

2.39

2.36

0.86

2.67

2.64

2.62

2.59

2.56

2.54

2.51

2.48

2.46

2.43

0.87

2.74

2.71

2.69

2.66

2.63

2.61

2.58

2.55

2.53

2.50

0.88

2.81

2.78

2.76

2.73

2.70

2.68

2.65

2.62

2.60

2.57

0.89

2.96

2.86

2.83

2.80

2.78

2.75

2.73

2.70

2.67

2.65

0.90

2.96

2.94

2.91

2.88

2.86

2.83

2.81

2.78

2.76

2.73

"H = hit rate = P("different" I Different).
b
F = false-alarm rate = /'("different" I Same).

414

Appendix 5

TABLE A5.4
(cont.)

Values of d' for Same-Different (Differencing

Model)

pb
a

H

0.41

0.42

0.43

0.44

0.45

0.46

0.47

0.48

0.49

0.50

0.91

3.05

3.02

3.00

2.97

2.94

2.92

2.89

2.87

2.84

2.82

0.92

3.14

3.12

3.09

3.06

3.04

3.01

2.99

2.96

2.94

2.91

0.93

3.24

3.22

3.19

3.17

3.14

3.11

3.09

3.06

3.04

3.01

0.94

3.36

3.33

3.30

3.28

3.25

3.23

3.20

3.18

3.15

3.13

0.95

3.48

3.46

3.43

3.41

3.38

3.36

3.33

3.31

3.28

3.26

0.96

3.64

3.61

3.58

3.56

3.53

3.51

3.49

3.46

3.44

3.41

0.97

3.82

3.80

3.77

3.75

3.72

3.70

3.67

3.65

3.62

3.60

0.98

4.07

4.04

4.02

3.99

3.97

3.94

3.92

3.90

3.87

3.85

0.99

4.45

4.43

4.40

4.38

4.35

4.33

4.31

4.28

4.26

4.24

0.52

0.53

0.54

0.55

0.56

0.57

0.58

0.59

0.60

f*
a

H

0.51

0.51

0.00

0.52

0.31

0.00

0.53

0.44

0.31

0.00

0.54

0.54

0.44

0.31

0.55

0.63

0.54

0.45

0.32

0.00

0.56

0.71

0.63

0.55

0.45

0.32

0.00
0.00

0.57

0.78

0.71

0.64

0.55

0.45

0.32

0.00

0.58

0.84

0.78

0.72

0.64

0.56

0.46

0.32

0.00

0.59

0.91

0.85

0.79

0.72

0.65

0.56

0.46

0.33

0.00

0.60

0.97

0.92

0.86

0.80

0.73

0.65

0.57

0.47

0.33

0.00

0.61

1.03

0.98

0.92

0.87

0.80

0.74

0.66

0.57

0.47

0.33

0.62

1.08

1.04

0.99

0.93

0.87

0.81

0.74

0.67

0.58

0.47

0.63

1.14

1.09

1.05

0.99

0.94

0.88

0.82

0.75

0.67

0.58

0.64

1.19

1.15

1.10

1.05

1.00

0.95

0.89

0.83

0.76

0.68

0.65

1.25

1.20

1.16

1.11

1.06

1.01

0.96

0.90

0.84

0.77

0.66

1.30

1.26

1.22

1.17

1.12

1.07

1.02

0.97

0.91

0.84

0.67

1.35

1.31

1.27

1.23

1.18

1.14

1.09

1.03

0.98

0.92

0.69

1.45

1.42

1.38

1.34

1.30

1.25

1.21

1.16

1.11

1.06

0.70

1.51

1.47

1.43

1.39

1.35

1.31

1.27

1.22

1.17

1.12

"H = hit rate = P("different" I Different).
*F = false-alarm rate = P("different" I Same).

416

Appendix 5

TABLE A5.4
(cont.)

Values ofd' for Same-Different (Differencing

Model)

Fb
Ha

0.61

0.61

0.62

0.00
0.34

0.63
0.64
0.65

0.64

0.65

0.67

0.62

0.63

0.48

0.00
0.34

0.00

0.59

0.48

0.34

0.00

0.69

0.60

0.49

0.35

0.00
0.35

0.00
0.36

0.00

0.66

0.68

0.69

0.70

0.66

0.77

0.70

0.60

0.50

0.67

0.85

0.78

0.70

0.61

0.68

0.93

0.86

0.79

0.71

0.50
0.62

0.51

0.36

0.00

0.69

1.00

0.94

0.87

0.80

0.72

0.63

0.52

0.37

0.00

0.70

1.07

1.01

0.95

0.89

0.81

0.73

0.64

0.52

0.37

0.00

0.71

1.14

1.08

1.03

0.96

0.90

0.82

0.74

0.65

0.53

0.38

0.72

1.20

1.15

1.10

1.04

0.98

0.91

0.84

0.75

0.66

0.54

0.73

1.27

1.22

1.17

1.11

1.05

0.99

0.92

0.85

0.76

0.67

0.74

1.33

1.28

1.23

1.18

1.13

1.07

1.01

0.94

0.86

0.78

0.75

1.39
1.46

1.35
1.41

1.30

1.25

1.20

1.15

1.09

1.02

0.76

1.37

1.32

1.27

1.22

1.10

0.95
1.04

0.97

0.77

1.52

1.43

1.39

1.34

1.29

1.16
1.24

1.18

1.12

1.06

1.50

1.46

1.41

1.36

1.31

1.26

1.20

1.14

0.88

0.78

1.58

1.48
1.54

0.79

1.64

1.60

1.56

1.52

1.48

1.43

1.39

1.34

1.28

1.22

0.80

1.71

1.67

1.63

1.59

1.55

1.50

1.46

1.41

1.36

1.31

0.81
0.82

1.77

1.73

1.70

1.66

1.62

1.58

1.53

1.49

1.44

1.39

1.84

1.80

1.69

1.47

1.76

1.68

1.90

1.80
1.87

1.56
1.64

1.51

1.87
1.94

1.65
1.72

1.60

1.90
1.97

1.76
1.83

1.73

0.83
0.84

1.83

1.79

1.75

1.71

1.59
1.67

1.55
1.63

0.85

2.04

2.01

1.97

1.94

1.90

1.87

1.83

1.75

1.71

0.86

2.11

2.08

2.05

2.02

1.98

1.95

1.91

1.79
1.87

1.83

1.79

0.87

2.19

2.16

2.13

2.09

2.06

2.02

1.99

1.95

1.91

1.87

0.88

2.27

2.24

2.21

2.17

2.14

2.11

2.07

2.04

2.00

1.96

2.26
2.35

2.23

2.19
2.28

2.16

2.32

2.25

2.13
2.22

2.09
2.18

2.15

0.89

2.35

2.32

2.29

0.90

2.43

2.41

2.38

"H = hit rate = PO'different" I Different).
b
F = false-alarm rate = P("different" I Same).

2.05

Tables
TABLE A5.4
(com.)
a

H

0.71
0.72

F*
0.51

Values ofd' for Same-Different (Differencing

0.52

1.56
1.61

1.52
1.57

0.53
1.48
1.54

0.54
1.45
1.50

0.55
1.41

0.56
1.37

0.57

0.58

1.32

1.46

1.42

1.38

1.28
1.34

415

Model)

0.59
1.24

0.60

1.30

1.19
1.25

1.55

1.52

1.40

1.36

1.31

1.57

1.48
1.54

1.44
1.50

1.46

1.42

1.70

1.61
1.66

1.63

1.59

1.55

1.52

1.48

1.37
1.44

1.78

1.75

1.72

1.68

1.65

1.61

1.57

1.54

1.50

1.84

1.80

1.71

1.74

1.70

1.67

1.63

1.60

1.56

1.92

1.89

1.73

1.69

1.91

1.79
1.85

1.75

1.97

1.82
1.88

1.66
1.72

1.62

1.85

2.03

1.83
1.88
1.94

1.76

1.95
2.00

1.86
1.92

1.80

1.98

1.81

1.78

1.68
1.74

0.81

2.09

2.06

2.03

2.00

1.97

1.94

1.91

1.87

1.84

1.81

0.82

2.15

2.12

2.06

2.03

1.97

1.94

1.90

1.87

2.12

2.09

2.00
2.06

2.03

2.00

1.97

1.94

0.73

1.66

1.62

0.74

1.71

1.68

1.59
1.64

0.75

1.73

0.76

1.76
1.82

0.77

1.87

0.78
0.79
0.80

0.83

2.21

2.18

2.09
2.15

0.84

2.27

2.24

2.21

2.19

2.16

2.13

2.10

2.07

2.04

2.00

0.85

2.34
2.40

2.31

2.28

2.25

2.22

2.19

2.16

2.13

2.10

2.07

2.37

2.32

2.47

2.45

2.35
2.42

2.29
2.36

2.26
2.33

2.23
2.31

2.20
2.28

2.17
2.25

2.14
2.22

0.88

2.55

2.52

2.49

2.41

2.38

2.35

2.33

2.30

2.62

2.60

2.57

2.46
2.54

2.44

0.89

2.52

2.46

2.70

2.68

2.65

2.63

2.60

2.55

2.43
2.52

2.41

0.90

2.49
2.57

2.49

2.38
2.46

0.91

2.79

2.77

2.74

2.71

2.69

2.66

2.64

2.61

2.58

2.56

0.92

2.89

2.76

2.73

2.71

2.68

2.65

2.94

2.81
2.92

2.78

2.99
3.11

2.86
2.97

2.84

0.93
0.94

2.84

3.06

3.03

2.76
2.88

3.16

3.04

3.02

3.39

3.37

3.32

3.29

3.11
3.27

3.09

0.96

3.19
3.34

2.81
2.93
3.07

2.79
2.91

3.24

3.08
3.21

2.86
2.98

0.95

2.89
3.01
3.14

3.22

3.20

3.17

0.97

3.58

3.55

3.53

3.51

3.48

3.46

3.25
3.44

3.41

3.83

3.80

3.78

3.76

3.73

3.71

3.69

3.66

3.39
3.64

3.36

0.98
0.99

4.21

4.19

4.17

4.15

4.12

4.10

4.08

4.06

4.03

4.01

0.86
0.87

2.39

"H= hit rate = P("different" I Different).
b
F = false-alarm rate = P("different" I Same).

2.96

3.62

Tables
TABLE A5.4
(cont.)

Values of d' for Same-Different (Differencing

417

Model)

F*
Ha

0.61

0.62

0.63

0.64

0.65

0.66

0.67

0.68

0.91

2.53

2.50

2.47

2.44

2.41

2.38

2.35

2.32

2.28

2.25

0.92

2.63

2.60

2.57

2.54

2.51

2.48

2.45

2.42

2.39

2.36

0.93
0.94

2.73

2.71

2.68

2.65

2.62

2.60

2.57

2.54

2.50

2.47

2.85

2.83

2.80

2.77

2.75

2.72

2.69

2.66

2.63

2.60

0.95

2.99

2.96

2.94

2.91

2.89

2.86

2.83

2.80

2.77

2.74

0.96

3.15

3.12

3.10

3.07

3.05

3.02

2.99

2.97

2.94

2.91

0.97

3.34

3.32

3.29

3.27

3.24

3.22

3.19

3.16

3.14

3.11

0.98

3.59

3.57

3.55

3.52

3.50

3.47

3.45

3.42

3.40

3.37

0.99

3.99

3.97

3.94

3.92

3.90

3.87

3.85

3.83

3.80

3.78

0.72

0.73

0.74

0.75

0.76

0.77

0.78

0.79

0.80

0.69

0.70

Fb

H

a

0.71

0.71

0.00

0.72

0.38

0.00

0.73

0.55

0.39

0.00

0.74

0.68

0.56

0.40

0.00

0.75

0.79

0.69

0.57

0.40

0.00

0.76

0.89

0.80

0.70

0.58

0.41

0.00

0.77

0.99

0.91

0.82

0.71

0.59

0.42

0.00

0.78

1.08

1.00

0.92

0.83

0.73

0.60

0.43

0.00

0.79

1.16

1.10

1.02

0.94

0.85

0.74

0.61

0.44

0.00

0.80

1.25

1.19

1.12

1.04

0.96

0.87

0.76

0.63

0.45

0.00

0.81

1.33

1.27

1.21

1.14

1.07

0.98

0.89

0.78

0.64

0.46

0.82

1.41

1.36

1.30

1.24

1.17

1.09

1.00

0.91

0.79

0.66

0.83

1.50

1.44

1.39

1.33

1.26

1.19

1.12

1.03

0.93

0.81

0.84

1.58

1.53

1.48

1.42

1.36

1.29

1.22

1.14

1.06

0.95

0.85

1.66

1.61

1.56

1.51

1.45

1.39

1.33

1.25

1.17

1.08

0.86

1.75

1.70

1.65

1.60

1.55

1.49

1.43

1.36

1.29

1.21

0.87

1.83

1.79

1.74

1.70

1.64

1.59

1.53

1.47

1.40

1.33

0.88

1.92

1.88

1.84

1.79

1.74

1.69

1.64

1.58

1.51

1.44

1.69

1.63

1.56

1.80

1.74

1.68

0.89

2.01

1.97

1.93

1.89

1.84

1.79

1.74

0.90

2.11

2.07

2.03

1.99

1.95

1.90

1.85

"H = hit rate = P("different" I Different).
b
F = false-alarm rate = PC'different" I Same).

418

Appendix 5

TABLE A5.4
(cont.)

Values ofd' for Same-Different (Differencing

Model)

Fb

a

H

0.71

0.72

0.73

0.74

0.75

0.76

0.77

0.78

0.79

0.80

1.92

0.91

2.21

2.18

2.14

2.10

2.06

2.01

1.97

1.86

1.81

0.92

2.32

2.29

2.25

2.21

2.17

2.13

2.08

2.04

1.99

1.94

0.93

2.44

2.41

2.37

2.33

2.29

2.25

2.21

2.17

2.12

2.07

0.94

2.57

2.54

2.50

2.47

2.43

2.39

2.35

2.31

2.27

2.22

0.95

2.71

2.68

2.65

2.62

2.58

2.54

2.51

2.47

2.42

2.38

0.96

2.88

2.85

2.82

2.79

2.75

2.72

2.68

2.65

2.61

2.56

0.97

3.08

3.05

3.02

2.99

2.96

2.93

2.89

2.86

2.82

2.78

0.98

3.35

3.32

3.29

3.26

3.23

3.20

3.17

3.14

3.10

3.07

0.99

3.75

3.73

3.70

3.68

3.65

3.62

3.59

3.56

3.53

3.50

0.82

0.83

0.84

0.85

0.86

0.87

0.88

0.89

0.90

F»
H

a

0.81

0.81

0.00

0.82

0.47

0.00

0.83

0.67

0.48

0.00

0.84

0.84

0.69

0.50

0.00

0.85

0.98

0.86

0.71

0.51

0.00

0.86

1.12

1.01

0.89

0.74

0.53

0.00

0.87

1.24

1.15

1.04

0.92

0.75

0.55

0.00

0.88

1.37

1.28

1.19

1.08

0.95

0.79

0.57

0.00

0.89

1.49

1.42

1.33

1.23

1.12

0.99

0.82

0.59

0.00

0.90

1.62

1.55

1.47

1.38

1.28

1.17

1.03

0.86

0.62

"H = hit rate = P("different" I Different).
b
F = false-alarm rate = P("different" I Same).

0.00

Tables
TABLE A5.4
(cont.)

Values ofd' for Same-Different (Differencing

419

Model)

f*

Ha
0.91

0.81

0.82

0.83

0.84

0.85

0.86

0.87

0.88

0.89

0.90

1.75

1.68

1.61

1.53

1.44

1.34

1.22

1.08

0.90

0.65

0.92

1.88

1.82

1.75

1.68

1.60

1.50

1.40

1.28

1.13

0.95

0.93

2.02

1.96

1.90

1.83

1.76

1.67

1.58

1.47

1.35

1.20

0.94

2.17

2.11

2.06

1.99

1.93

1.85

1.77

1.67

1.56

1.43

0.95

2.33

2.28

2.23

2.17

2.11

2.04

1.96

1.88

1.78

1.67

0.96

2.52

2.47

2.42

2.37

2.31

2.25

2.18

2.10

2.02

1.92

0.97

2.74

2.70

2.65

2.61

2.55

2.50

2.43

2.36

2.29

2.20

0.98

3.03

2.99

2.95

2.90

2.86

2.80

2.75

2.69

2.62

2.54

0.99

3.46

3.43

3.39

3.35

3.31

3.27

3.22

3.16

3.11

3.04

0.92

0.93

0.94

0.95

0.96

0.97

0.98

0.99

pb
a

H

0.91

0.91

0.00

0.92

0.69

0.00

0.93

1.00

0.73

0.00

0.94

1.28

1.07

0.79

0.00

0.95

1.54

1.37

1.16

0.85

0.00

0.96

1.80

1.67

1.50

1.27

0.95

0.00

0.97

2.10

1.98

1.84

1.67

1.43

1.07

0.00

0.98

2.46

2.36

2.24

2.10

1.92

1.67

1.27

0.00

0.99

2.97

2.89

2.79

2.68

2.54

2.36

2.10

1.67

0.00

"H = hit rate = P("different" I Different).
*F = false-alarm rate = P("different" I Same).
Source: Adapted from Kaplan etal. (1978) by permission of The Psychonomic Society, Inc.
Hit and false-alarm rates, defined here in terms of the "different" response, were defined by Kaplan et al.
in terms of the "same" response. In addition, some entries are slightly different because of an improved
algorithm.
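
Where an entry from Table A5.4 is needed, d' can also be computed directly from the differencing model that the table inverts. On Same trials the difference between the two observations is normal with mean 0 and variance 2; on Different trials it is normal with mean d' and variance 2. An observer who responds "different" whenever the absolute difference exceeds a criterion k therefore has

   F = 2Φ(-k/√2)
   H = Φ((d' - k)/√2) + Φ((-d' - k)/√2).

(For example, H = .99 and F = .01 give d' = 6.93.) The following sketch, written in the same Pascal style as the program of Appendix 6, solves these equations numerically. It is ours, not part of the book's software: the function names Phi, SDHit, and SDdprime are hypothetical, Phi uses a standard erf approximation (Abramowitz & Stegun), the routine calls the inverse-normal function z from the Appendix 6 listing, and the bisection assumes 0 <= d' <= 10.

function Phi(x: real): real;
{standard normal distribution function, via an erf approximation}
var t, e, ax: real;
begin
  ax := abs(x)/sqrt(2);
  t := 1/(1 + 0.3275911*ax);
  e := 1 - ((((1.061405429*t - 1.453152027)*t + 1.421413741)*t
       - 0.284496736)*t + 0.254829592)*t*exp(-ax*ax);
  if x >= 0 then Phi := 0.5*(1 + e)
  else Phi := 0.5*(1 - e)
end;

function SDHit(dprime, k: real): real;
{hit rate predicted by the differencing model for criterion k}
begin
  SDHit := Phi((dprime - k)/sqrt(2)) + Phi((-dprime - k)/sqrt(2))
end;

function SDdprime(H, F: real): real;
{invert the model: recover k from F, then bisect on d'}
var k, lo, hi, mid: real;
    i: integer;
begin
  k := -sqrt(2)*z(F/2);  {z is the inverse-normal routine of Appendix 6}
  lo := 0; hi := 10;
  for i := 1 to 40 do begin
    mid := (lo + hi)/2;
    {SDHit increases with d' for fixed k, so bisection converges}
    if SDHit(mid, k) < H then lo := mid else hi := mid
  end;
  SDdprime := (lo + hi)/2
end;

Called with H = 0.99 and F = 0.01, SDdprime returns 6.93, matching the tabled value.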

TABLE A5.5 Values of d' for Oddity, Gaussian Model (M = Number of Intervals)

p(c)   M = 3  4     5     6     7     8     9     10    11    12    16    24    32

[Rows for p(c) = .04 through .60 are not reproduced here; their layout was lost in extraction.]

0.61   2.03  2.04  2.09  2.13  2.17  2.22  2.25  2.29  2.32  2.35  2.44  2.58  2.68
0.62   2.08  2.08  2.12  2.17  2.21  2.25  2.29  2.32  2.35  2.38  2.47  2.61  2.71
0.63   2.13  2.12  2.16  2.20  2.24  2.28  2.32  2.35  2.38  2.41  2.50  2.64  2.74
0.64   2.18  2.17  2.20  2.24  2.28  2.32  2.35  2.38  2.41  2.44  2.54  2.67  2.77
0.65   2.23  2.21  2.24  2.27  2.31  2.35  2.38  2.42  2.44  2.47  2.57  2.70  2.80
0.66   2.29  2.25  2.27  2.31  2.35  2.38  2.42  2.45  2.48  2.51  2.60  2.73  2.83
0.67   2.34  2.29  2.31  2.35  2.38  2.42  2.45  2.48  2.51  2.54  2.63  2.77  2.86
0.68   2.39  2.34  2.35  2.38  2.42  2.45  2.48  2.52  2.54  2.57  2.66  2.80  2.89
0.69   2.45  2.38  2.39  2.42  2.45  2.49  2.52  2.55  2.58  2.61  2.70  2.83  2.92
0.70   2.50  2.42  2.43  2.46  2.49  2.52  2.55  2.59  2.61  2.64  2.73  2.86  2.96
0.71   2.56  2.47  2.47  2.50  2.53  2.56  2.59  2.62  2.65  2.67  2.76  2.89  2.99
0.72   2.62  2.52  2.51  2.54  2.57  2.60  2.63  2.66  2.68  2.71  2.80  2.93  3.02
0.73   2.68  2.56  2.56  2.58  2.61  2.64  2.67  2.69  2.72  2.74  2.83  2.96  3.05
0.74   2.74  2.61  2.60  2.62  2.65  2.67  2.70  2.73  2.76  2.78  2.87  3.00  3.09
0.75   2.80  2.66  2.64  2.66  2.69  2.71  2.74  2.77  2.79  2.82  2.90  3.03  3.12
0.76   2.86  2.71  2.69  2.70  2.73  2.75  2.78  2.81  2.83  2.86  2.94  3.07  3.16
0.77   2.92  2.76  2.74  2.75  2.77  2.80  2.82  2.85  2.87  2.90  2.98  3.11  3.20
0.78   2.99  2.81  2.78  2.79  2.81  2.84  2.86  2.89  2.91  2.94  3.02  3.14  3.23
0.79   3.06  2.87  2.83  2.84  2.86  2.88  2.91  2.93  2.96  2.98  3.06  3.18  3.27
0.80   3.13  2.92  2.88  2.89  2.90  2.93  2.95  2.97  3.00  3.02  3.10  3.22  3.31
0.81   3.20  2.98  2.94  2.94  2.95  2.97  3.00  3.02  3.04  3.06  3.14  3.26  3.35
0.82   3.28  3.04  2.99  2.99  3.00  3.02  3.04  3.06  3.09  3.11  3.19  3.31  3.40
0.83   3.35  3.10  3.05  3.04  3.05  3.07  3.09  3.11  3.13  3.16  3.23  3.35  3.44
0.84   3.44  3.17  3.11  3.10  3.10  3.12  3.14  3.16  3.18  3.21  3.28  3.40  3.49
0.85   3.52  3.24  3.17  3.15  3.16  3.17  3.19  3.22  3.23  3.26  3.33  3.45  3.53
0.86   3.61  3.31  3.23  3.21  3.22  3.23  3.25  3.27  3.29  3.31  3.38  3.50  3.58
0.87   3.71  3.38  3.30  3.27  3.28  3.29  3.31  3.33  3.35  3.37  3.44  3.55  3.64
0.88   3.81  3.46  3.37  3.34  3.34  3.36  3.37  3.39  3.41  3.42  3.49  3.61  3.69
0.89   3.91  3.55  3.44  3.41  3.41  3.42  3.44  3.45  3.47  3.49  3.55  3.67  3.75
0.90   4.03  3.64  3.52  3.49  3.48  3.49  3.51  3.52  3.54  3.55  3.62  3.73  3.81
0.91   4.15  3.73  3.61  3.57  3.56  3.57  3.58  3.59  3.61  3.63  3.69  3.80  3.88
0.92   4.29  3.84  3.71  3.66  3.65  3.65  3.66  3.67  3.69  3.71  3.77  3.87  3.95
0.93   4.44  3.96  3.81  3.76  3.74  3.74  3.75  3.76  3.78  3.79  3.85  3.95  4.03
0.94   4.61  4.09  3.93  3.87  3.85  3.85  3.85  3.86  3.88  3.89  3.95  4.04  4.12
0.95   4.80  4.24  4.07  4.00  3.97  3.97  3.97  3.98  3.99  4.00  4.05  4.15  4.23
0.96   5.03  4.42  4.23  4.15  4.12  4.11  4.11  4.11  4.12  4.13  4.18  4.27  4.35
0.97   5.32  4.64  4.43  4.34  4.30  4.28  4.27  4.28  4.29  4.29  4.34  4.43  4.50
0.98   5.70  4.94  4.70  4.59  4.54  4.51  4.50  4.50  4.50  4.51  4.55  4.63  4.70
0.99   6.31  5.42  5.12  4.99  4.92  4.88  4.86  4.85  4.85  4.86  4.88  4.95  5.02

Source: Reprinted from Craven (1992) by permission of the author and the Psychonomic Society.
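
To read the table, find the obtained proportion correct in the left column and the number of intervals M across the top. For example, an unbiased p(c) of .70 in a three-interval oddity task (M = 3) corresponds to d' = 2.50, whereas the same p(c) with M = 32 corresponds to d' = 2.96.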

TABLE A5.6 Values of p(c) given d' for Oddity (Differencing and Independent-Observation Model, Normal), and for mAFC

[For m = 3, 4, and 5, the columns give mAFC p(c), oddity p(c) under the differencing model (e2 = 0), and oddity p(c) under the independent-observations model (e2 = 1), for d' from 0.0 to 5.0 in steps of 0.1. The numerical entries are not reproduced here; their layout was lost in extraction.]

Note: e2 = 0 is the differencing model, e2 = 1 is independent-observations. Table from Versfeld et al. (1996) with permission of the authors and The Psychonomic Society.
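
Unlike the other tables in this appendix, Table A5.6 runs from d' to p(c). For example, at d' = 1.0 the m = 3 entries are .633 (3AFC), .418 (oddity, differencing model), and .446 (oddity, independent observations), illustrating how much harder the oddity task is than forced choice at the same sensitivity.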

TABLE A5.7 Values of d' for m-Interval Forced Choice or Identification

[Rows give p(c) from .01 to .99; columns give m = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 16, 24, 32, 256, and 1000. The numerical entries are not reproduced here; their layout was lost in extraction.]

Note: Equal detectability and unbiased responding is assumed.
Source: Reprinted from Hacker and Ratcliff (1979) by permission of the authors and The Psychonomic Society, Inc.
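
For m = 2 the table is equivalent to the familiar two-alternative forced-choice relation d' = √2 z[p(c)]: for example, p(c) = .75 gives d' = 0.95, and p(c) = .99 gives d' = 3.29. For larger m, the same p(c) corresponds to a larger d', because unbiased identification among many alternatives is harder.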

Appendix 6

Software for Detection Theory

In this appendix, we describe software available for doing the calculations presented in the book. The descriptions take two forms. First, the listing of a program for basic calculation (e.g., of yes-no d', c, and β) is given. Second, we list Web sites from which useful programs can be downloaded. We thank the colleagues who have agreed to publicize their sites in this way.

Listing

The following Pascal program calculates d', c, and β for frequency data typed into a keyboard. It consists of a subroutine for estimating z and a simple driver program. To deal with perfect proportions, it uses the 1/2, N - 1/2 rule described in chapter 1.
program sdtdrive;

function z(p: real): real; {inverse of the normal distribution function; Odeh & Evans}
var y: real;
begin
  y := sqrt(-2*ln(p));
  z := -y + ((((0.0000453642210148*y + 0.0204231210245)*y +
       0.342242088547)*y + 1)*y + 0.322232431088)/
       ((((0.0038560700634*y + 0.10353775285)*y + 0.531103462366)*y +
       0.588581570495)*y + 0.099348462606)
end;

var
  Nhits, Nmisses, Nfa, Ncr, N2, N1: real;
  hitrate, farate, dp, cr, beta, zh, zf: real;
  answer: char;
  adjustment: boolean;

begin
  writeln('Follow all responses by ');
  writeln;
  repeat
    adjustment := false;
    writeln;
    write('# of hits: ');
    read(Nhits);
    write('  # of misses: ');
    readln(Nmisses);
    write('# of false alarms: ');
    read(Nfa);
    write('  # of correct rejections: ');
    readln(Ncr);
    N2 := Nhits + Nmisses;
    N1 := Nfa + Ncr;
    if (Nhits <= 0) or (Nmisses < 0) or (Nfa < 0) or (Ncr <= 0)
      then writeln('Bad data')
    else begin
      {apply the 1/2, N - 1/2 rule to perfect proportions}
      if Nmisses = 0 then begin
        Nmisses := 1/2;
        Nhits := Nhits - 1/2;
        adjustment := true
      end;
      if Nfa = 0 then begin
        Nfa := 1/2;
        Ncr := Ncr - 1/2;
        adjustment := true
      end;
      hitrate := Nhits/N2;
      farate := Nfa/N1;
      zh := z(hitrate);
      zf := z(farate);
      dp := zh - zf;                       {d' = z(H) - z(F)}
      cr := -0.5*(zh + zf);                {c = -[z(H) + z(F)]/2}
      beta := exp(-0.5*(zh*zh - zf*zf));   {beta = exp(c times d')}
      writeln;
      if adjustment = true
        then writeln('Data have been adjusted');
      writeln('H = ', hitrate:4:3, ' F = ', farate:6:3);
      writeln('d'' = ', dp:4:3, ', c = ', cr:6:3, ', beta = ', beta:6:3)
    end;
    writeln;
    write('Continue? (y/n) ');
    readln(answer);
  until answer = 'n'
end.
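
As an illustration, here is the sort of exchange the program produces. The frequencies are made-up data, not from the text; the printed values follow from the formulas in the listing, with H = 20/25 = .8 and F = 10/25 = .4:

# of hits: 20   # of misses: 5
# of false alarms: 10   # of correct rejections: 15

H = 0.800 F = 0.400
d' = 1.095, c = -0.294, beta = 0.725
Continue? (y/n) n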

Web Sites
A useful site for exploring the use of spreadsheets in detection theory (Bob
Sorkin):
http://www.psych.ufl.edu/~sorkin
A program for finding d' and other statistics for data collected under a wide
variety of paradigms is d'plus (Macmillan & Creelman, 1997), available at
http://psych.utoronto.ca/~creelman
Programs for fitting ROCs using maximum-likelihood techniques can be
found at sites maintained by Lew Harvey and Charles Metz:
http://psych.colorado.edu/~lharvey
http://www.xray.bsd.uchicago.edu/krl/KRL_ROC/software_index.htm
The MSDA method (Helena Kadlec) is available at
http://castle.uvic.ca/psyc/kadlec/research.htm


Several programs permitting statistical evaluation of SDT data (Larry DeCarlo)
are at
http://www.columbia.edu/~ld208
Information about the statistical accuracy and efficiency of ROC parameters
(Caren Rotello) is located at
http://www-unix.oit.umass.edu/~caren/Design/Assets/index.htm

Appendix 7

Solutions to Selected Problems

Most answers were obtained using the tables (Appendix 5). Answers that
have been found by interpolation carry asterisks.
Chapter 1
1.1. If the person being tested tells a lie, a hit occurs if the polygraph responds positively, a miss if it responds negatively. If the person tells the
truth, a false alarm is a positive response, a correct rejection a negative one.

1.2.

Problem   H    F    H - F   p(c)   p(c)*
(a)       .6   .47  .13     .57    .57
(b)       .55  .17  .38     .69    .62
(c)       .45  .83  -.38    .31    .38

p(c) is always greater than H - F except when both equal 1 (draw a graph of Eq. 1.3).

1.3. (a) Computationally, "base rates" do not affect the calculation of conditional probabilities or p(c), but do affect p(c)*. Experimentally, the likelihood of a "yes" response may well depend on these rates. For more detail, see chapter 2. (b) Yes, p(c) for S2 trials is simply the hit rate.

1.4.

(a) 1.99, 4.65, 0 (b) 1.03, 1.28, 2.93.

1.5.

(a) 1.68 assuming no bias, (b) H = .65 and d' = 2.03.

1.7.

Problem   d' from H and F   d' from p(c)
(a)       0.336*            0.330*
(b)       1.080             0.992
(c)       -1.080            -0.992

1.8. Yes, in both cases, because the implied ROCs of both d' and p(c) are
symmetric.
1.9.

For (.2, .6), d' = 1.095. For (.2, .91) and (.03, .6), d' = 2.190.

1.11. If no cell contains a frequency of 0, the largest d' is 1.85, and the
smallest -1.11. If a 0 cell does occur, the largest d' is 2.31 and the smallest is
-1.47.
Chapter 2
2.1.

p(c) = .65 before training, .785, .656, and .825 after.

2.2.

H    F    d'      c       c'      ln(β)   β
.6   .47  0.328   -0.089  -0.271  -0.029  0.971
.55  .17  1.080   0.414   0.383   0.447   1.564
.45  .83  -1.080  0.414   -0.383  -0.447  0.639
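
Each row obeys the chapter 2 relations c' = c/d' and ln(β) = c · d'; in the first row, for example, -0.089/0.328 = -0.271 and (-0.089)(0.328) = -0.029.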

2.3. (a) A vertical line, (b) A line with slope -1.

2.4.

H    F    d'     c       c'      β
.6   .2   1.095  0.2945  0.269   1.38
.79  .08  2.211  0.2995  0.268   1.35
.71  .05  2.141
.83  .11  2.181

Solutions to Selected Problems
2.6.

437

The false-alarm rate [1 - P("truth"ltruth)], c, and d' are as follows:
Experimental group

F

e

d

'

Interrogators

.34

-0.215

1.254

Sheriffs

.44

-0.294

0.890

Clinical psychologists

.36

-0.098

0.911

Academic psychologists

.42

0.013

0.378

Trained interrogators are probably a bit better at detecting lying than sheriffs
or clinical psychologists, and academic psychologists are—as a group—terrible. It is also interesting that sheriffs and interrogators showed a strong bias
toward stating that a person was lying, whereas the psychologists were relatively neutral.
2.7.

(a) Payoff matrix is 10, 0, 0, 10.

2.7. (b-d)

P(S2)   LR   c     H     F
.5      1    0     .69   .31
.25     3    1.1   .27   .05
.1      9    2.2   .04   .003
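
These entries illustrate the decision-rule discussion of chapter 2. The tabled rates are consistent with d' = 1, so the optimal likelihood-ratio criterion LR corresponds to c = ln(LR)/d', with H = Φ(d'/2 - c) and F = Φ(-d'/2 - c). For LR = 3, for instance, c = ln 3 = 1.1, H = Φ(-0.6) = .27, and F = Φ(-1.6) = .05.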

2.10. (a) (F, H) = (.07, .69), (.31, .93); (b) (F, H) = (.02, .5), (.5, .98).
Chapter 3

3.1. (a) The first (F, H) point is (.12, .52), the second (.36, .80). (If you did
not get the second point, remember that the response categories have to be
in order of confidence from one alternative to the other.) (b) d' = 1.225 and
1.200.
3.2. If three "not sure" responses of each type are assigned to "sure in
tune" and the rest to "sure out of tune," estimated d' will drop to 1.064.
Condition 1: da = 2.44, s = 0.59, Az = .96, ca = 0.453, -0.147, -0.287.
Condition 2: da = 1.15, s = 0.76, Az = .79, ca = 1.00, -0.433, -0.299.

3.4. 1.235, 0.850, 0.152.

3.5. da = 2.21, d'e = 2.22, Az = .94.

3.6. For low-frequency words, Ag = .879 and Az = .90. For high-frequency words, Ag = .752 and Az = .76.
Chapter 4
4.1.

(a) .75, .6, 0. (b) .2. u always equals F.

4.2. (a) You can't, (b) On the upper limb, because it is above the minor diagonal.
4.3. p(c) = .7 at F = 0, and .5 at F = 1; d' = 2.098 at F = .01, and 0.186 at F = .99.
4.4. (a) For (.4, .9), p(c) = .75, yes rate = .65, error ratio = (1 - H)/F = 0.25. For (.2, .9), p(c) = .85, yes rate = .55, error ratio = 0.5. (b) The y-intercept (lowest hit rate) is 2p(c) - 1, and the x-intercept (highest false-alarm rate) is 2[1 - p(c)], both of which follow from p(c) = (H + 1 - F)/2.

4.5. .83, .75, .875.

4.6. .8, .2.

4.8. (a) Area under two-limbed curve = .75, A' = .835, d' = 1.366, Az =
.833. (b) Area under two-limbed curve = .695, A' = .842, d' = 2.073, Az =
.93.
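For a single (F, H) point, the two-limbed area is (H + 1 - F)/2, A' is the standard one-point area estimate, and Az = Φ(d'/√2). The sketch below is illustrative; the point (F, H) = (.2, .7) is an assumption that happens to reproduce all four values in part (a):

    from scipy.stats import norm

    def one_point_areas(F, H):
        two_limbed = (H + 1 - F) / 2    # area under the two-limbed ROC
        a_prime = 0.5 + (H - F) * (1 + H - F) / (4 * H * (1 - F))
        d = norm.ppf(H) - norm.ppf(F)
        a_z = norm.cdf(d / 2 ** 0.5)    # area under the normal-model ROC
        return two_limbed, a_prime, d, a_z

    print(one_point_areas(.2, .7))      # -> (.75, .835, 1.366, .833)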
4.9. It is systematically greater than for the low-threshold area. It will be
similar to the area for high-threshold theory if the point is near the minor diagonal and similar to that for double high-threshold theory if accuracy is
high.
4.10. (a) For both points, d' = 1.095, ln(α) = 0.90, p(c) = .7, and A' = .646. (b) .667, .5. (c) H = .75, F = .25.
4.13. For both points, β = 0.429 and B" = -0.4. ln(b) = -1.52 and -0.675.

Chapter 5

5.1. (a) d' = 3. (b) p(c) = .93. There is no additivity in p(c) itself; one needs to convert to z scores: z[p(c)AB] + z[p(c)BC] = z[p(c)AC]. This is the same as Equation 5.1 (except for a factor of 2). (c) d' = 5, so p(c) is very close to 1.0.
5.2. No. If s = 0.5 for both comparisons, then the standard deviations of distributions A, B, and C can be set to 1, 2, and 4, with means at 0, 1, and 5.

5.3. (a) 1.168 for 1 presentation, 1.621 for 2, and 2.320 for 4. The criterion is .915 above the mean of the New distribution. (b) d' = 1.15, p(c) = .72.
5.4. For each stimulus, find P("higher") and convert this to a z score. These scores are -0.842, -0.253, 0.253, 0.842, and 1.282. The PSE is at 999.5 Hz. The jnd is ½(1000.7 - 998.3) = 1.2 Hz (interpolate to find the values for which z = ±0.675).
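The interpolation itself is mechanical. In the sketch below (illustrative only), the five comparison frequencies are hypothetical, since they are not restated in this answer; only the method, finding where z crosses 0 and ±0.675, follows the answer above.

    import numpy as np

    freqs = np.array([998., 999., 1000., 1001., 1002.])  # hypothetical stimuli (Hz)
    z = np.array([-0.842, -0.253, 0.253, 0.842, 1.282])  # z[P("higher")] from 5.4

    pse = np.interp(0.0, z, freqs)        # frequency where z = 0
    lo = np.interp(-0.675, z, freqs)      # z = -0.675 crossing
    hi = np.interp(+0.675, z, freqs)      # z = +0.675 crossing
    print(pse, (hi - lo) / 2)             # PSE and jnd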

5.5. 84% and 16%.

5.6.

   Stimulus   cumulative d'   d' (to next stimulus)
   10             0                0.440
   15             0.440            0.457
   20             0.897            0.638
   25             1.535            0.392
   30             1.927            0.675
   35             2.602            0.524
   40             3.126

Criteria are at cumulative d' = 1.282 and 2.602, which correspond to approximately 23 and 35 cm.
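Locating a criterion on the cumulative-d' axis is linear interpolation between adjacent stimuli; an illustrative check (not the book's code):

    import numpy as np

    stimuli = np.array([10, 15, 20, 25, 30, 35, 40])  # cm
    cum_d = np.array([0, 0.440, 0.897, 1.535, 1.927, 2.602, 3.126])

    # criteria on the cumulative-d' axis, from 5.6
    print(np.interp([1.282, 2.602], cum_d, stimuli))  # -> about [23., 35.]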

5.7.

   Stimulus   cumulative d'   d' (to next stimulus)
   10             0                0.363
   15             0.363            0.348
   20             0.711            0.616
   25             1.327            0.318
   30             1.645            0.674
   35             2.319            0.362
   40             2.681

Criterion is at cumulative d' = 1.645, which corresponds to 30 cm.

5.9.

   Stimulus           cumulative d'   d' (to next stimulus)
   1 (the letter E)       0               0.599
   2                      0.599           1.029
   3                      1.628           1.095
   4                      2.723           0.440
   5 (the letter F)       3.163

Criterion is at cumulative d' = 1.881, about stimulus 3.

5.10.

   Stimulus pair   SDT    threshold
   1, 2            .62    .502
   2, 3            .70    .545
   3, 4            .71    .580
   4, 5            .59    .505
   1, 3            .79    .568
   2, 4            .86    .745
   3, 5            .78    .625

5.11. 0.599, 2.60, and 5.40.

5.12. Distribution means are 0, 0.589, 1.366, and 2.732. Criteria are
0.253, 0.842, and 2.208.
Chapter 6

6.1. .98, .50, .16, .31.

6.2. (a) .92. (b) No change; .84. (c) Ignore sound; .98.

6.3. (a) Same decision axis; .5. (b) Decision axis and criterion are both rotated clockwise compared to Fig. 6.5, by an angle less than 45°.

6.4. (a) (i) and (ii) .5; (iii) and (iv) .25. (b) (i) no, (ii) no, (iii) yes. (c) (i) and (ii) .07; (iii) .07 × .07; (iv) .93 × .93. (d) (i) no, (ii) yes, (iii) yes.
6.5. .61, .63, .65.

6.6. Additivity holds in both cases although the d' calculation in (b) is
heuristic only.

Chapter 7

7.1.

   Matrix   2AFC d'   yes-no d'     c
   A         0.358      0.506      0
   B         0.536      0.758     -0.903
   C         0.568      0.803      1.244
   D        -0.132     -0.187     -0.346
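The two d' columns differ by the usual factor of √2 between yes-no and 2AFC interpretations of the same response matrix. An illustrative check (matrix A appears to correspond to H = .6, F = .4, an assumption that reproduces its row):

    from scipy.stats import norm

    z = norm.ppf

    H, F = .6, .4                  # assumed rates for matrix A
    d_yes_no = z(H) - z(F)         # the matrix read as yes-no data
    d_2afc = d_yes_no / 2 ** 0.5   # the same matrix read as 2AFC data
    print(round(d_2afc, 3), round(d_yes_no, 3))  # -> 0.358, 0.507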

7.2.

   Matrix   p(c)max   (H + 1 - F)/2   p(c)*
   A          .6           .6           .6
   B          .65          .6           .6
   C          .66          .575         .575
   D          .46          .467         .556

Notes: (a) p(c)max is the same for 2AFC and yes-no—it depends only on H and F. (b) p(c)max is actually smaller than p(c) for Matrix D; it is a "maximum" in that it represents a point that is maximally different from chance.
7.3. (a) d' = 1.33 in both cases, c = 0.25 for item recognition and -0.33 for source discrimination. (b) p(c)max = .83, so the prediction is exactly right. (c) See if you can account for this by assuming decisional separability.
7.4. (a) da = 1.19, independent of s. (b) d'2 = 0.94 assuming s = 0.5, but 1.88 assuming s = 2. An advantage of da is that it can be predicted from p(c)max without knowing s.

Chapter 8
8.1. d'(1000) = 1.478, d'(1200) = 1.079, identification d' = 1.830.
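With the two signals on orthogonal axes, identification d' is the Euclidean distance between the signal distributions, which checks out numerically (an illustrative one-liner, not the book's code):

    # identification d' for orthogonal signals: sqrt(d1'^2 + d2'^2)
    d1, d2 = 1.478, 1.079
    print((d1 ** 2 + d2 ** 2) ** 0.5)  # -> 1.830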
8.2. p(c) = .86, .71.
8.3. (a) Yes: p(c) in the uncertain condition is about .59.

8.4. Unbiasedp(c) for Sl detection is .84, for 52 detection and 512 recognition, .69.

8.7. Matrix 1 supports the independent-observation model, and Matrix 2 the integration model.

8.9. p(c) = .79.

8.10. p(c) = .71.

8.11. p(c) = .91.
Chapter 9

9.1.

   H     F       d' (independent-observation)   d' (differencing)
   .6    .4            1.19                         1.43
   .9    .7            1.49                         1.54
   .2    .05           2.15                         1.58
   .6    .667         -0.70                        -0.92

9.2. p(c)yes-no = .9; p(c)same-different = .82 according to the threshold and independent-observation models, .75 for differencing.
9.3. The threshold model makes no obvious prediction; if p(c) is still .82, then H is .69. According to the independent-observation model, H = .57 and p(c) = .76; according to the differencing model, H = .44 and p(c) = .70.
9.4. S1 versus S2: H = .55, F = .25. S2 versus S3: H = .65, F = .15. Same trials count more heavily. Overall, p(c) = .686, but the average of H and 1 - F is .70.

9.5. .997* and .974*.

9.7.
   Matrix   d' (independent-observation)   d' (differencing)
   A            1.01                           1.12
   B            1.26                           1.41
   C            1.30                           1.45
   D           -0.59                          -0.66

9.8. Entries are p(c):

   Design           d' = 1   d' = 2
   yes-no            .69      .84
   2AFC              .76      .92
   ABX               .600     .788
   same-different    .573     .732
   oddity            .45      .68

9.9. Entries are p(c):

   Design           d' = 1   d' = 2
   ABX               .583     .747
   same-different    .55      .675*
   oddity            .42      .60

Results for other paradigms are the same as in Problem 9.8.
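The first three rows of the 9.8 table have closed-form, unbiased-observer expressions; a short illustrative check (not the book's code) for yes-no, 2AFC, and the independent-observation ABX model:

    from scipy.stats import norm

    P = norm.cdf
    r2 = 2 ** 0.5

    for d in (1.0, 2.0):
        yes_no = P(d / 2)                                     # unbiased yes-no
        two_afc = P(d / r2)                                   # 2AFC
        abx = P(d / r2) * P(d / 2) + P(-d / r2) * P(-d / 2)   # ABX
        print(round(yes_no, 2), round(two_afc, 2), round(abx, 3))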
Chapter 10
10.1. p(c)25AFC = .20; p(c)5AFC = .49; p(c)2AFC = .76.
10.2. Entries in the last three columns are p(c).

   m       SDT     Choice Theory   Boundary
   3       .62        .60           .56
   4       .54        .50           .42
   8       .37        .30           .13
   32      .16        .088          .00013
   1,000   .015*      .003          (.75)^999
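The SDT column is the standard mAFC integral, p(c) = ∫ φ(x - d') Φ(x)^(m-1) dx, for an unbiased observer. A numeric sketch (illustrative; with d' = 1 it gives values close to the tabled ones, e.g., about .63 for m = 3):

    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import quad

    def pc_mafc(d, m):
        """p(c) for an unbiased observer in mAFC with sensitivity d'."""
        f = lambda x: norm.pdf(x - d) * norm.cdf(x) ** (m - 1)
        val, _ = quad(f, -np.inf, np.inf)
        return val

    for m in (2, 3, 4, 8, 32):
        print(m, round(pc_mafc(1.0, m), 3))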

10.3. d' = 2.160 for any pair; the points in the representation form an equilateral triangle.

10.4. (a) 0.777; (b) 0.817, -0.217, -0.327, -0.597.

10.5.

10.7. Hit rates are .915 and .790; false-alarm rates are .47 and .28. Both are reliably different, so there is no marginal response invariance. Values of d' are 1.45 and 1.39 (not different); criteria are 1.37 and 0.81 (different). This pattern implies PS, but not DS.
Chapter 11
11.1. (a) .0000558 (most likely). (b) .000046. (c) .000035.

11.2. (a) .000028. (b) .000029 (most likely). (c) .000017.

11.3. (a) Trials 4 (+), 12 (+), 14 (+), 18 (-).
(b) Trials 2 (+), 4 (+), 6 (-), 7 (+), 9 (-), 11 (+), 12 (+), 13 (+), 14 (+), 16 (-).
(c) Trials 2 (+), 4 (+), 7 (+), 11 (+), 12 (+), 13 (+), 14 (+), 18 (-).

11.4. After 20 trials, the level is (a) 80, (b) 40, (c) 56.
Chapter 12
12.1. Smooth and symmetric, like a normal-normal curve. It consists of points corresponding to the possible cutpoint decision rules, and line segments connecting them corresponding to mixtures of two adjacent criteria.

12.2. (a) Mean difference unchanged, both variances increase. Best p(c) is .70. (b) Mean difference decreases by 0.5, variances unchanged. Best p(c) is .65.

12.4. (a) a1 and a2 both equal 0.5; the decision bound has slope -a2/a1 = -1. (b) a1 = 0.4, a2 = 0.6; the decision bound has slope -a2/a1 = -1.5.

12.5. (a) a1 = 0.4, a2 = 0.6; the decision bound has slope -a2/a1 = -1.5. (b) a1 = 0.33, a2 = 0.67; the decision bound has slope -a2/a1 = -2.

Chapter 13

Note: Statistically significant results are indicated by $.

13.1. (a) .38 to .62. (b) .25 to .55$. (c) .44 to .56; .33 to .47$.

13.2. d'max = … (d' = 1.438 for both).

13.3. (a) Matrix 1: d' = 0.506 ± 0.786, c = 0 ± 0.393.
Matrix 2: d' = 0.758 ± 0.946, c = -0.903 ± 0.473$.
Matrix 3: d' = 0.803 ± 1.118, c = 1.244 ± 0.559$.
Matrix 4: d' = -0.187 ± 1.467, c = -0.347 ± 0.735$.
(b) d'3 - d'4 = 0.990 ± 1.847; c1 - c2 = 0.903 ± 0.615$.
13.4. (a) 0.412 ± 0.499. (b) 0.291 ± 0.353.

13.5. (a) Average = 0.641, pooled = 0.516. (b) 0.651. (c) 0.511.
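The intervals in 13.3 follow from the usual variance estimate var(d') = H(1-H)/[N2·φ(z(H))²] + F(1-F)/[N1·φ(z(F))²], with SE(c) = SE(d')/2. The sketch below is illustrative; it assumes 20 trials per stimulus class and H = .6, F = .4 for matrix 1 (implied by d' = 0.506 and c = 0), which together reproduce the printed ±0.786 and ±0.393.

    from scipy.stats import norm

    def dprime_ci(H, F, n1=20, n2=20, crit=1.96):
        """95% half-widths for d' and c from one yes-no matrix."""
        zH, zF = norm.ppf(H), norm.ppf(F)
        var_d = (H * (1 - H) / (n2 * norm.pdf(zH) ** 2)
                 + F * (1 - F) / (n1 * norm.pdf(zF) ** 2))
        half_d = crit * var_d ** 0.5
        return half_d, half_d / 2   # SE(c) is half of SE(d')

    print(dprime_ci(.6, .4))        # -> (0.786, 0.393)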

Glossary

The number in parentheses following each entry gives the chapter in which the term is introduced. Part I is denoted by I, Appendix 1 by A1, and so on.

2AFC (7). Two-alternative forced-choice.
α (4). Sensitivity measure for Choice Theory.
A' (area under the ROC) (4). An estimate of the area under the ROC based on a single point in ROC space. A measure of sensitivity.
Ag (minimum area under the ROC) (4). An estimate of the area under the ROC based on more than one point in ROC space. A measure of sensitivity.
Az (3). Area under an SDT ROC (i.e., one that is linear on z coordinates). A measure of sensitivity.
absolute identification (5). A classification experiment in which the number of responses equals the number of stimuli.
absolute judgment (5). Same as absolute identification.
ABX (9). A discrimination design in which three stimuli are presented on each trial, and the observer must decide whether the third matches the first or the second.
accuracy (1, 13). (a) Same as sensitivity. (b) In statistics, the degree to which the expected value of an estimator equals the parameter being estimated.
adaptation level theory (5). A theory that states that judgments in identification are relative to a central point, the adaptation level.
adaptive probit estimation (APE) (11). An adaptive procedure that estimates psychometric function slope as well as threshold.

adaptive procedure (11). A method for estimating empirical thresholds by
choosing stimuli in reaction to the observer's previous responses.
area theorem (7). The equivalence between area under the yes-no ROC and
the proportion correct obtainable by an unbiased observer in 2AFC.
attention operating characteristic (AOC) (8). In a divided attention paradigm, accuracy on one task versus accuracy on another as attention
is shifted from one to the other.
β (2). In SDT, likelihood ratio for two Gaussian distributions. A measure of response bias.
β (4). In Choice Theory, likelihood ratio for two logistic distributions. A measure of response bias.
βd (9). Likelihood ratio for the differencing model of the same-different paradigm. A measure of response bias.
βi (9). Likelihood ratio for the independent-observation model of the same-different paradigm. A measure of response bias.
b (4). In Choice Theory, ln(b) is the location of the criterion in standard deviation units from the equal-bias point. A measure of response bias.
b' (4). In Choice Theory, the relative criterion. A measure of response bias.
B", B'H (4). Bias measures based on the geometry of ROC space.
Bayesian (11). Referring to the result that the odds in favor of a hypothesis
before an observation is made, multiplied by the likelihood ratio of
the observation, equal the odds after the observation.
Békésy audiometry (11). An adaptive procedure, psychophysically informal, in which the observer provides a continuous detection response to a continuously changing stimulus.
Bernoulli random variable (A1). A random variable that can take on only two values, 0 and 1.
bias, response (2). See response bias.
bias, statistical (A1, 11, 13). The average amount by which an estimate differs from the parameter being estimated.
binomial distribution (13, A1). Distribution of a binomial random variable.
binomial proportion distribution (A1). Distribution of the proportion of successes in N trials (i.e., of a binomial random variable divided by N).
binomial random variable (A1). A random variable that is the sum of N Bernoulli random variables; the number of successes in N trials.
bivariate distribution (13, A1). Probability distribution of two variables.
boundary theorem (10). A generalization of the area theorem that predicts a
lower bound on mAFC performance, given 2AFC performance.
c (2). In SDT, the location of the criterion in z units from the equal-bias
point. A measure of response bias.
c' (2). In SDT, the relative criterion. A measure of response bias.
ca (3). Criterion location in units of the root mean square standard deviation.
cd (9). Criterion location for the differencing model for the same-different
task. A measure of response bias.
ce (3). Criterion location in units of the average standard deviation.
ci (9). Criterion location for the independent-observation model for the
same-different task. A measure of response bias.
categorical perception hypothesis (5). The hypothesis that sensitivity in
classification is the same as in discrimination and/or that discrimination sensitivity reaches a peak at an intermediate point on a continuum.
categorization (5). A classification experiment in which the number of responses is less than the number of possible stimuli.
category scaling (5). Categorization, usually with stimuli that vary along a
single continuum.
central limit theorem (A1). The result that the sum of many independent variables, each with the same distribution, has a normal distribution.
channels (8). Theoretical analyzers of multidimensional stimuli, often assumed independent.
choice axiom (4). Basic tenet of Choice Theory. States that the odds of
choosing one stimulus over a second are unaffected by the availability of other possible stimuli.
Choice Theory (4). (a) A theory of choice behavior, derived from the choice axiom, in which responses are determined by the strengths of corresponding stimuli and by response biases. (b) For the yes-no experiment, equivalent to a version of detection theory in which underlying distributions are assumed to be logistic.

450

Glossary

city-block metric (1, 8). A distance measure for multidimensional stimuli
equal to the sum of the distances on each dimension.
classification experiment (5). An experiment in which one stimulus, from a
set of more than two, is presented on each trial.
comparison design (7). Discrimination paradigm with two intervals that can
be represented with two underlying distributions.
comparison stimulus (5). Stimulus that varies from trial to trial, in an experiment that contains a standard stimulus (which does not).
complete correspondence experiment (Intro). See correspondence experiment.
compound detection (6). Detection of a multidimensional stimulus.
conditional-on-single-stimulus analysis (12). Method for determining the
sensitivity of single components in multidimensional stimuli.
conditional probability (A1). A probability defined on a subset of a sample space; the probability of one event given that another occurs.
Condorcet group (12). Group that makes decisions by counting unweighted
votes.
confidence interval (13, A1). Interval within which, with some degree of
confidence, a population parameter falls.
constant-ratio rule (10). The assertion that the ratio of response frequencies
in a stimulus-response matrix is unchanged by the addition or removal of items to the stimulus set.
context coding (5). Perceptual process in which stimuli being judged are
compared with the context provided by previous trials.
context variance (5). Variability in perceptual process contributed by context coding.
continuous random variable (A1). A random variable that can take on any value in a (finite or infinite) interval.
correct rejection (1). In a yes-no experiment, a response of "no" to S1 (the stimulus class for which "no" is correct).
correct rejection rate (1). The proportion of correct rejections on S1 trials.
correction for guessing (4). A formula for computing q, the "corrected" hit
rate. Equivalent to high-threshold theory.
correlation (A1). The tendency for two variables to covary.


correspondence experiment (Intro). An experiment in which each possible
stimulus class is associated with one "correct" response from
among a finite set. The determination of which response is correct
may be rigidly set by the experimenter or may be limited to a class
of possibilities.
COSS (12). See conditional-on-single-stimulus analysis.
criterion (1). The point on a decision axis that divides one response from another. See also decision boundary.
criterion variability (2). Setting the criterion at different locations on different trials in the same experimental condition.
cumulative sensitivity (5). Sensitivity to the difference between a stimulus
and an endpoint stimulus, sometimes inferred from sensitivities to
adjacent stimulus pairs.
d' (1). Sensitivity measure for SDT, assuming equal-variance distributions.
da (3). Measure of sensitivity for SDT, assuming unequal-variance underlying distributions and using the root-mean-square standard
deviation.
d'e (3). Measure of sensitivity for SDT, assuming unequal-variance underlying distributions and using the average standard deviation.
DYN (3). In ROC space on z coordinates, the minimum distance from the origin to the ROC.
decision space (1). The underlying distributions in an experiment, together
with the observer's decision rule for making responses.
decision boundary (6). Multidimensional generalization of the criterion:
the locus of points in a decision space that divides one response
from another.
decisional separability (6). In a multidimensional representation, a decision
rule that depends on only one dimension.
density function (A1). Function representing the likelihoods of possible values of a continuous random variable.
detection (I). Discrimination experiment in which one stimulus is the Null
stimulus, or noise.
detection theory (1). A theory relating choice behavior to a psychological
decision space. An observer's choices are determined by the distances between distributions due to different stimuli in this space

(sensitivities) and by the manner in which the space is partitioned
into regions corresponding to the possible responses.

detection with uncertainty (8). An experiment in which the stimulus to be
detected varies from trial to trial.
deviation limit (11). In PEST and other adaptive procedures, the extent to
which observed p(c) must differ from p(T) before the stimulus is
changed.
difference threshold (1,5). Empirical threshold in a discrimination experiment not involving the Null stimulus.
differencing models (7, 9). Models in which the observer uses the difference between observations from multiple intervals or dimensions
as the basis for decision.
discrete random variable (A1). A random variable that takes on only a finite or countable number of values.
discrimination (I). The ability to distinguish between two stimulus classes,
one of which may or may not be the Null stimulus, or noise. Also an
experiment to measure this ability.
distance measure (1). A measure that has the characteristics of a distance. The sensitivity statistics d' and ln(α) are distance measures.
distribution discrimination (12). Task in which stimuli are drawn from explicit distributions and the observer must determine which of them
is the source of the stimulus presented.
distribution function (A1). Function giving the probability that a random variable is less than or equal to some value. For continuous variables, the integral of the density function up to that value; for discrete ones, the sum.
divided attention (8). Task in which attention to more than one dimension or
channel is required for an optimal decision.
double high-threshold theory (4). A theory with three internal states and
two high thresholds.
efficiency, relative (11, 12, 13). (a) In adaptive procedures, the ratio between the sweat factors of two statistics (or of the variances, if the number of trials is equal); a measure of relative precision, or repeatability over estimates. (b) In model comparisons, the square of the d' ratio. (c) In statistics, the ratio of the variances of two estimators of a parameter.


elementary event (A1). One of a finite number of equally probable events in a sample space.
empirical ROC (3). See ROC.
equivalent measures (1). Measures that are related by a monotonic transformation. Equivalent sensitivity measures have the same implied
ROC, and equivalent bias measures have the same implied isobias
curve.
error ratio (4). In a yes-no experiment, the ratio of misses to false alarms. A
measure of response bias.
estimated probability (A1). A proportion used to estimate a true probability.
estimation (13). Process of approximating a population parameter from
data.
Euclidean distance (1). Distance measured by the Pythagorean theorem.
expectation (A1). The mean of a random variable.
external noise (12). Variability limiting performance that arises from the
stimulus itself rather than within the observer.
extrinsic uncertainty (8). Decline in performance due to uncertainty because of inherent limitation in the stimulus array.
F (1). The false-alarm rate.
false alarm (1). In a yes-no experiment, a response of "yes" to S1 (the stimulus class for which "no" is correct).
false-alarm/hit pair (1). The false-alarm and hit rates considered as an ordered pair; graphically, a point in ROC space.
false-alarm rate (1). The proportion of false alarms on S1 trials.
feature-complete factorial design (10). Identification design in which each
value of one variable is combined with each value of the others.
feedback (3,5). Information provided at the end of a trial about whether the
response was correct.
fixed discrimination (5). A discrimination task in which only two stimulus
classes can occur in a block of trials, so that only one sensitivity parameter is estimated.
forced choice (7, 10). A discrimination experiment in which m stimuli are presented on each trial, one containing a sample of S2, the others samples of S1.


General Recognition Theory (GRT) (6). Formulation of multidimensional SDT.
H (1). The hit rate.
high-threshold theory (4). A threshold theory with a finite number of internal states, one or more of which can only be activated by a specific
corresponding stimulus. See single high-threshold theory and double high-threshold theory.
hit (1). In a yes-no experiment, a response of "yes" to S2 (the stimulus class
for which "yes" is correct).
hit rate (1). The proportion of hits on S2 trials.
hypothesis testing (13). Statistical evaluation of statements about population parameters.
ideal observer (12). Decision strategy that uses all available information
and thus maximizes performance.
identification (I, 5, 10). (a) absolute identification; (b) classification.
identification operating characteristic (10). In a simultaneous detection and
identification experiment, the function relating the probability of
both a correct detection and a correct identification to the probability of a false alarm.
implied ROC (1). See ROC.
incomplete correspondence experiments (Introduction). See correspondence experiment.
independent channels (10). Channels whose outputs are independent random variables.
independent-observation rule (8,9). Rule by which the observer independently combines the observations in multiple intervals or channels to
reach a decision.
independent random variables (A1). Two or more variables whose joint distribution is such that the value of one variable does not affect the value of another.
index (1). Same as statistic.
integrality (8). Dependence between dimensions, as measured operationally in the Garner paradigm.
integration rule (6). Rule for combining information by adding or subtracting values of multiple dimensions.


internal noise (12). Variability limiting performance that arises within the
observer rather than from the stimulus itself.
internal representation (1). Same as decision space.
intrinsic uncertainty (8). Decline in performance due to uncertainty because
of nonoptimal processing by the observer.
IOC (10). See identification operating characteristic.
isobias curve (2). A curve in ROC space connecting points with the same response bias but different sensitivities. An isobias curve may be theoretical (implied by a theory or sensitivity parameter) or empirical
(observed in an experiment).
isosensitivity curve (1). Same as ROC.
joint distribution (6). Distribution of more than one variable.
just-noticeable-difference (jnd) (1,5). See difference threshold.
least-squares fit (A1). Method of approximating data by a model so that the sum of the squared deviations between the model and data is as small as possible.
likelihood ratio (2). The odds that an event arose from one distribution
rather than another. When the distributions are underlying ones due
to two possible stimulus classes, a measure of response bias.
log odds transformation (1). A transformation that converts a proportion p to the natural log of p/(1 - p).
logistic distribution (1). The form of underlying distribution assumed by
Choice Theory for the one-interval design.
logistic regression (13). A statistical technique that can be used to test hypotheses about detection theory parameters.
logit (4). Unit proportional to the standard deviation of the logistic distribution, and equal to the natural log of p/(1 - p).
low-threshold theory (4). A threshold theory with two internal states, each
of which can be activated by either of the possible stimuli.
m-alternative forced choice (mAFC) (10). An m-interval experiment in which the observer must determine which interval contains a sample of S2 (all others containing samples of S1).
matching to sample (9). Same as ABX.
maximum-likelihood estimation (11, 13). Estimation of a parameter by
finding the value for which the observed data are most likely.



maximum rule (6). A decision rule in which observations on all dimensions
must exceed the respective criteria for a positive response to be
made.
maximum p(c) (6). The highest value of p(c) that could be obtained by an
observer with a given value of sensitivity (e.g., value of d').
mean category scale (5). A scale constructed from category scaling data, assigning to each stimulus the average of the categories used in responding to it.
mean sensitivity (13). Estimate of a parameter obtained by averaging estimates based on individual stimuli, sessions, or subjects.
mean (shift) integrality (8). Type of perceptual integrality in which the dependence between two dimensions is reflected by distribution
means.
measure (1). Same as statistic.
method of constant stimuli (5,11). A classification design in which a standard stimulus is followed by one of a set of comparison stimuli.
minimum rule (6). A decision rule in which an observation above criterion
on any dimension is sufficient for a positive response to be made.
miss (1). In a yes-no experiment, a response of "no" to S2 (the stimulus class
for which "yes" is correct).
miss rate (1). The proportion of misses on S2 trials.
Multidimensional Signal Detection Analysis (MSDA) (10). Method for assessing various types of independence in a feature-complete identification design.
multiple-look experiment (8). Design in which a sample of one stimulus
class or the other is presented in each of several intervals.
Neyman-Pearson objective (2). Maximizing the hit rate while keeping the false-alarm rate at some fixed low level.
nonparametric measure (4). A measure making no distributional assumptions.
normal distribution (1). The form of underlying distribution assumed by
SDT.
oddity (9). A design in which three (or more) stimuli are presented on each
trial, one from one stimulus class, the rest from the other. The observer must choose the "odd" interval.
one-interval design (I). A paradigm in which one stimulus is presented on each trial.
optimal decision rule (3). A decision rule that serves to maximize some performance criterion, such as payoffs.
parameter (13). A characteristic of some population, according to a theory.
Parameter Estimation by Sequential Testing (PEST) (11). An adaptive procedure in which the decision to change stimuli is based on a Wald test, and the amount by which the stimulus is changed depends on the past history of the experimental run.
payoff function or matrix (3). The rewards associated with each stimulus-response outcome in a correspondence experiment.
p(c) (1). Proportion correct.
p(c)max (6). Maximum possible value of p(c).
perceptual dimensionality (6). Number of dimensions needed to describe sensitivities to all pairs of stimuli in the stimulus set.
perceptual independence (6). Property of a joint distribution in a decision space that is equal to the product of its marginal distributions.
perceptual integrality (8). Dependence between two dimensions of an underlying representation, measured across the entire stimulus set.
perceptual separability (8). Dependence between two dimensions of an underlying representation, measured within a single stimulus class.
point of subjective equality (5). In a two-response classification experiment, the stimulus for which each response is equally likely.
pooled estimate (13). Estimate of a parameter obtained by averaging response frequencies before other calculations.
positivity (1). The property of being always positive or zero.
presentation probability (3). The probability of presenting one of the possible stimulus classes.
probability function (A1). For a discrete random variable, the function giving the probability of each value of the variable.
probability summation (6). Advantage in performance due to multiple chances at success.
probit analysis (11). A procedure for fitting the normal distribution function to psychometric function data.

product rule (6). In a two-dimensional representation, the probability that X < a and Y < b equals the probability that X < a times the probability that Y < b.

Subject Index

φ (normal density), see Normal distribution
Φ (normal distribution function), see Normal distribution
Point of subjective equality, see PSE
Poisson distribution, 301-302
Pooled data, 331-337
Presentation probabilities, 7
and bias, 42-44
in one-dimensional classification,
129-130
and ROC generation, 72
Probability, 343-351
Probit analysis, 274, 293
see also Normal distribution, as
psychometric function
Product rule, 151, 350
Projection of multidimensional distributions, 146-149
Proportion correct, see p(c)
PSE, 120-121, 273
Pseudo-d', 122, 124
Psychometric function, 119, 272-276
shape of, 273-276
slope, 293
in 2AFC, 273-274
in mAFC, 253
see also Logistic distribution;
Normal distribution;
Weibull function
Psychophysics
history, 22-24
vs psychoacoustics, 312-313

Q
QUEST, 284-286
vs other methods, 286, 291


R
Radiology, see X-ray reading
Random variables, 345-349
Range-frequency model, 130
Rating experiment, 2, 51-70
calculating response rates, 53-57
decision space, 64-69
design, 51-52
graphing data, 55-57
response sets, 52
see also ROC
Receiver operating characteristic, see ROC
Recognition, 1
of faces, 3-6
of letters, 246-249
of odors, 51-57, 64-66
of words, 40, 57-59, 90-92, 160-161,
166-170,185, 193-194
Rectangular distribution, see Underlying
distributions, in threshold theory
Relative operating characteristic, see ROC
Reminder paradigm, 180-182, 255
vs other designs, 181-183
see also Standard stimulus
Response bias, 27-44, 362, 366
in below-chance performance, 41
as criterion location, 29-31
in multi-interval designs, see "response
bias " under specific design
Response bias measures, 362, 363, 366
characteristics of, 28-29
comparisons of, 36-42
Choice Theory, see b, b', B", B'H, β
in multi-interval designs, see "response
bias " under specific design
nonparametric, see B",B'H
for rating experiment, 64-69
SDT, see c, c', β
and sensitivity measures, 41—42
threshold theory, 85-86
see also Error ratio; False-alarm rate, as
response-bias measure; Yes
rate
Reversal (in adaptive methods), 283-286
Reward function, see Payoffs
ROC, 10,51-77
for A'
empirical, 55-59, 66-77
fitting to data, 70, 330,433
generation methods, 71-72

490

Subject Index

for group data, 337
implied, 9-13
in multi-interval designs, see "ROC"
under specific design

regularity, 11,18
symmetry, 14
threshold, 12-13, 83-84, 89-92, 110
Type-2, 73-74
in z-coordinates, 11, 55-59
see also Maximum-likelihood estimation, ROC; Rating experiment
ROC slope (linear coordinates), 11, 33-34
ROC slope (z-coordinates), 14, 59,
330-331
in multi-interval designs, see "ROC"
under specific design

nonunit slope, 57-59
and sensitivity, 74-77
and uncertainty, 76
unit slope, 14
ROC space, 10
Roving discrimination, see Designs, roving

S

s, see ROC slope (z-coordinates)
S' (sensitivity measure for rating design), 104
Same-different, 214-228
decision space, 215-218, 222-224
differencing model, 221-227
hit and false-alarm rates, 215, 223
independent-observation model,
216-217
isobias curves, 219-220, 226-227
vs other designs, 216-217,228,253-255
response bias, 218-220, 225-227
ROC, 220, 223-225
sensitivity, 216-220, 223-225,
380-419
statistical properties of d', 329-330
threshold model, 217-218
Sampling distribution, 351-352
Saturated model, see Logistic regression
Sensitivity, 3, 361, 363, 365
as mean difference in decision space,
18-20
medical use of term, 6
in multi-interval designs, see "sensitivity" under specific design

near-chance, 8-9,40-41
near-perfect, 8-9,129, 224-225, 321,
336
as perceptual distance, 15
Sensitivity measures, 3, 361, 362, 365
area-based, see A', Ag, Area theorem, Az
and bias measures,
41-42
characteristics of, 5-7
in Choice Theory, see α
in classification, one-dimensional; see
Classification, one-dimensional, sensitivity
in multi-interval designs, see "sensitivity" under specific design
nonparametric, see A', Ag, S'
for nonunit-slope ROCs, see Az, d'1, d'2, da, d'e

in ROC space, 12, 59-64
in SDT, see Az, d', d'1, d'2, da, d'e
in threshold theory, 82-89
for unit-slope ROCs, see d', a
see also p(c), p(c)max
Separability, 194-195
Sequential effects, 183
Simulations, see Computer simulations
Simultaneous detection and identification,
255-259
Simultaneous simple and compound detection, 200-202
Specificity, 6
Staircase procedure, 281-282
Standard stimulus, 113-114
see also Reminder experiment
State diagram, 81
Statistical bias, 352
of d' estimates, 323-325
of pooled sensitivity estimates,
331-335
of threshold estimates, 290
Statistics, 351-355
and detection theory, 319-341
see also Hypothesis testing; Maximum-likelihood estimation;
Parameter estimation
Stimulus repetition, see Compound
detection; Multiple look
experiments
Subliminal perception, 105-106, 258-259
Sweat factor, 290

Subject Index

491

T
Target proportion (of an adaptive method),
see Adaptive methods, target
proportion
3AFC (three-alternative forced-choice),
249-252, 426-430
vs other designs, 251-255
Threshold, compared with criterion, 22-23
Threshold, empirical, 119-120, 269-296
and response bias, 287-289
Threshold theories, 81-94,104-107
double high-threshold, 88-94
for multi-interval designs, see specific
design
low threshold, 86-88
single high-threshold, 82-86
three-state, 110
Thurstonian scaling, see Classification,
one-dimensional
Time order errors, 176-177
Total d', see d', total
Trace coding, 178-179
Trace-context theory, 133-135, 178-179,
310-311
Trading relations, 114, 124-126
Training, effects of, 46
Transformations
arcsine, 103
logarithmic and exponential, 274,357-358
log-odds, 95
z, see z-transformation
Triangular method, see Oddity
2AFC (two-alternative forced-choice),
166-179
advantages, 179
decision space, 168-170
hit and false-alarm rates, 167
vs one-interval, 167-168, 175-176,
181-182
vs other designs, 181-183, 234, 251,
253-255
for psychometric functions, see
psychometric function, 2AFC
response bias, 170, 287-289
ROC, 173-174
sensitivity, 168, 170-175, 426-430
statistical properties of d', 328-329
unbiased performance, 170-171
Type-I error, 44

U

UDTR, 278, 281, 289
decision rule, 278
vs other methods, 292
Unbiased performance, see p(c)max
Uncertain detection, 188-202
vs identification, see Simultaneous detection and identification
independent-observation rule, 197-199
on one dimension, 189-191
optimal model, 199
summation rule, 196-197
Uncertainty
extrinsic vs intrinsic, 188
see also Uncertain detection
Underlying distributions, 16
in Choice Theory, 98-100
multidimensional, 144-152
in multi-interval designs, see "decision
space " under specific design
in SDT, 16-20
in threshold theory, 82-91
and transformations, 19-20
with unequal variances, 57-64,
173-175
yes-no, 16
see also Decision space
Unsaturated model, see Logistic regression
Up-Down Transformed Method, see
UDTR

V

Variance, external, 297-298, 302-303
Variance, internal, 297-298, 302-303
context, 134-135
sensory, 134-135, 178-179
trace, 178-179
see also Attention, incomplete
Variance (in statistics), see "confidence interval" under specific statistic
Visual search, 311

W
Wald rule in adaptive methods, 278-280
Weibull function, 275-276


X

X-ray reading, 28-35
Y
Yes-no design, 1-50, 361-362
in adaptive methods, 271-272, 293
vs other designs, 167-168, 175-176, 181-182, 228, 234, 253-255
Yes rate, 92-93

Z

z-transformation, 8
for one-dimensional classification, 117-128
for psychometric functions, 117-121
of ROCs, 11, 55-59
variance of, 325-327


