SRI Tech Guide


Technical Guide


Parts of this compilation originally appeared in the following Scholastic Inc. products:
Scholastic Reading Inventory Target Success with the Lexile Framework for Reading,
copyright © 2005, 2003, 1999; Scholastic Reading Inventory Using the Lexile Framework,
Technical Manual Forms A and B, copyright © 1999; Scholastic Reading Inventory Technical
Guide, copyright © 2001, 1999; Lexiles: A System for Measuring Reader Ability and
Text Difficulty, A Guide for Educators, copyright © Scholastic Inc.
No part of this publication may be reproduced in whole or in part, or stored in a retrieval
system, or transmitted in any form or by any means, electronic, mechanical, photocopying,
recording, or otherwise, without written permission of the publisher. For information regarding
permission, write to Scholastic Inc., Education Group, 557 Broadway, New York, NY 10012.
Copyright © 2007 by Scholastic Inc.
All rights reserved. Published by Scholastic Inc. Printed in the U.S.A.
ISBN-13: 978-0-439-74216-0
ISBN-10: 0-439-74216-1
SCHOLASTIC, SCHOLASTIC READING INVENTORY, SCHOLASTIC READING
COUNTS!, and associated logos and designs are trademarks and/or registered trademarks
of Scholastic Inc.
LEXILE and LEXILE FRAMEWORK are registered trademarks of MetaMetrics, Inc.
Other company names, brand names, and product names are the property and/or trademarks of their respective owners.

TABLE OF CONTENTS
Introduction
Features of Scholastic Reading Inventory . . . . . . . . . . . . . . . 8
Purposes and Uses of Scholastic Reading Inventory . . . . . . . . . . . 9
Limitations of Scholastic Reading Inventory . . . . . . . . . . . . . . 10

Theoretical Framework of Reading Ability and The Lexile Framework for Reading
Readability Formulas and Reading Levels . . . . . . . . . . . . . . . . 11
The Lexile Framework for Reading . . . . . . . . . . . . . . . . . . . 14
Validity of The Lexile Framework for Reading . . . . . . . . . . . . . 18
Lexile Item Bank . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

Description of the Test
Test Materials . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Test Administration and Scoring . . . . . . . . . . . . . . . . . . . . 26
Interpreting Scholastic Reading Inventory Scores . . . . . . . . . . . 28
Using Scholastic Reading Inventory Results . . . . . . . . . . . . . . 37

Development of Scholastic Reading Inventory
Development of the Scholastic Reading Inventory Item Bank . . . . . . . 43
Scholastic Reading Inventory Computer-Adaptive Algorithm . . . . . . . 47
Scholastic Reading Inventory Algorithm Testing During Development . . . 55

Reliability
Standard Error of Measurement . . . . . . . . . . . . . . . . . . . . . 61
Sources of Measurement Error—Text . . . . . . . . . . . . . . . . . . . 62
Sources of Measurement Error—Item Writers . . . . . . . . . . . . . . . 67
Sources of Measurement Error—Reader . . . . . . . . . . . . . . . . . . 71
Forecasted Comprehension Error . . . . . . . . . . . . . . . . . . . . 73

Validity
Content Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Criterion-Related Validity . . . . . . . . . . . . . . . . . . . . . . 76
Construct Validity . . . . . . . . . . . . . . . . . . . . . . . . . . 84

Appendices
Appendix 1: Lexile Framework Map . . . . . . . . . . . . . . . . . . . 88
Appendix 2: Norm Reference Tables . . . . . . . . . . . . . . . . . . . 90
Appendix 3: References . . . . . . . . . . . . . . . . . . . . . . . . 92


List of Tables
Table 1: Results from linking studies conducted with The Lexile Framework for Reading.
page 19
Table 2: Correlations between theory-based calibrations produced by the Lexile equation
and rank order of unit in basal readers. page 20
Table 3: Correlations between theory-based calibrations produced by the Lexile equation
and the empirical item difficulty. page 21
Table 4: Comprehension rates for the same individual with materials of varying
comprehension difficulty. page 33
Table 5: Comprehension rates of different-ability readers with the same material. page 34
Table 6: Performance standard proficiency bands for SRI, in Lexiles, by grade. page 36
Table 7: Distribution of items in SRI item bank by Lexile zone. page 46
Table 8: Student responses to Question 7: preferred test format. page 56
Table 9: Relationship between SRI and SRI-print version. page 58
Table 10: Relationship between SRI and other measures of reading comprehension. page 58
Table 11: Descriptive statistics for each test administration group in the comparison study,
April/May 2005. page 59
Table 12: Mean SEM on SRI by extent of prior knowledge. page 62
Table 13: Standard errors for selected values of the length of the text. page 64
Table 14: Analysis of 30 item ensembles providing an estimate of the theory
misspecification error. page 66
Table 15: Old method text readabilities, resampled SEMs, and new SEMs for
selected books. page 68
Table 16: Lexile measures and standard errors across item writers. page 69
Table 17: SRI reader consistency estimates over a four-month period, by grade. page 72
Table 18: Confidence intervals (90%) for various combinations of comprehension rates and
standard error of differences (SED) between reader and text measures. page 74
Table 19: Clark County (NV) School District: Normal curve equivalents of SRI by
grade level. page 78
Table 20: Indian River (DE) School District: SRI average scores (Lexiles) for READ 180
students in 2004–2005. page 80
Table 21: Large Urban School District: SRI scores by student demographic
classification. page 82
Table 22: Large Urban School District: Descriptive statistics for SRI and the SAT-9/10,
matched sample. page 85
Table 23: Large Urban School District: Descriptive statistics for SRI and the SSS,
matched sample. page 85
Table 24: Large Urban School District: Descriptive statistics for SRI and the PSAT,
matched sample. page 86


List of Figures
Figure 1: An example of an SRI test item. page 9
Figure 2: Sample administration of SRI for a sixth-grade student with a prior Lexile
measure of 880L. page 27
Figure 3: Normal distribution of scores described in scale scores, percentiles, stanines, and
normal curve equivalents (NCEs). page 29
Figure 4: Relationship between reader-text discrepancy and forecasted reading
comprehension rate. page 33
Figure 5: The Rasch Model—the probability person n responds correctly to item i.
page 49
Figure 6: The “start” phase of the SRI computer-adaptive algorithm. page 51
Figure 7: The “step” phase of the SRI computer-adaptive algorithm. page 53
Figure 8: The “stop” phase of the SRI computer-adaptive algorithm. page 54
Figure 9: Scatter plot between observed item difficulty and theoretical item
difficulty. page 64
Figure 10a: Plot of observed ensemble means and theoretical calibrations (RMSE = 111L).
page 67
Figure 10b: Plot of simulated “true” ensemble means and theoretical calibrations
(RMSE = 64L). page 67
Figure 11: Examination of item writer error across items and occasions. page 70
Figure 12: Growth on SRI—Median and upper and lower quartiles, by grade. page 77
Figure 13: Memphis (TN) Public Schools: Distribution of initial and final SRI scores for
READ 180 participants. page 78
Figure 14: Des Moines (IA) Independent Community School District: Group SRI mean
Lexile measures, by starting grade level in READ 180. page 79
Figure 15: Kirkwood (MO) School District: Pretest and posttest SRI scores, school year
2000–2001, general education students. page 82
Figure 16: Kirkwood (MO) School District: Pretest and posttest SRI scores, school year
2001–2002, general education students. page 83
Figure 17: Kirkwood (MO) School District: Pretest and posttest SRI scores, school year
2002–2003, general education students. page 83
Figure 18: Large Urban School District: Fit of quadratic growth model to SRI data for
students in Grades 2 through 10. page 87


INTRODUCTION
Scholastic Reading Inventory™ (SRI), developed by Scholastic Inc., is an objective assessment of
a student’s reading comprehension level (Scholastic, 2006a). The assessment can be administered to students in Grades 1 through 12 by paper and pencil or by computer; the result
of either mode is a Lexile® measure for the reader. The assessment is based on the Lexile
Framework® for Reading and can be used for two purposes: (1) to assess a student’s reading
comprehension level, and (2) to match students with appropriate texts for successful reading
experiences. Using the Lexile score reported by SRI, teachers and administrators can:
• identify struggling readers,
• plan for instruction,
• gauge the effectiveness of a curriculum, and
• demonstrate accountability.
Scholastic Reading Inventory was initially developed in 1998 and 1999 as a print-based
assessment of reading comprehension. In late 1998, Scholastic began developing a
computer-based version. Pilot studies of the computer application were conducted in
fall and winter 1998. Version 1 of the interactive presentation was launched in fall 1999.
Subsequent versions were launched between 1999 and 2003, with Version 4.0/Enterprise
Edition appearing in winter 2006.
This technical guide for the interactive version of SRI is intended to provide users with the
broad research foundation essential for deciding if and how SRI should be used and what
kinds of inferences about readers and texts can be drawn from it. SRI Technical Report #2 is
the second in a series of technical publications describing the development and psychometric characteristics of SRI. SRI Technical Report #1 described the development and validation
of the print version of SRI. Subsequent publications are forthcoming as additional data
become available.


Features of Scholastic Reading Inventory
SRI is designed to measure how well readers comprehend literary and expository texts.
It measures reading comprehension by focusing on the skills readers use to understand
written materials sampled from various content areas. These skills include referring to
details in the passage, drawing conclusions, and making comparisons and generalizations.
SRI does not require prior knowledge of ideas beyond the test passages, vocabulary taken
out of context, or formal logic. SRI is composed of authentic passages that are typical of
the materials students read both in and out of school, including topics in prose fiction, the
humanities, social studies, science, and everyday texts such as magazines and newspapers.
The purpose of SRI is to locate the reader on the Lexile Map for Reading (see Appendix
1). Once a reader has been measured, it is possible to forecast how well the reader will likely
comprehend hundreds of thousands of texts that have been analyzed using the Lexile metric.
Several features of SRI are noteworthy.
• Passages are authentic: they are sampled from best-selling literature,
curriculum texts, and familiar periodicals.
• The “embedded completion” item format used by SRI has been
shown to measure the same core reading competency measured by
norm-referenced, criterion-referenced, and individually administered
reading tests (Stenner, Smith, Horabin, and Smith, 1987).
• A decade of research defined the rules for sampling text and developing embedded completion items. A multi-stage review process ensured
conformity with item-writing specifications.
• SRI is the first among available reading tests in using the Lexile Theory
to convert a raw score (number correct) into the Lexile metric. The
equation used to calibrate SRI test items is the same equation used
to measure texts. Thus, readers and texts are measured using the same
metric.
• SRI is a full-range instrument capable of accurately measuring reading
performance from the middle of first grade to college.
• The test format supports quick administration in an untimed, low-pressure format.
• SRI employs a computer-adaptive algorithm to adapt the test to the
specific level of the reader. This methodology continuously targets the
reading level of the student and produces more precise measurements
than “fixed-form” assessments.
• SRI applies a Bayesian scoring algorithm that uses past performance
to predict future performance. This methodology connects each test
administration to every other administration to produce more precise
measurements when compared with independent assessments. (A generic
sketch of this kind of adaptive, prior-informed loop appears after this list.)


• Little specialized preparation is needed to administer SRI, though
proper interpretation and use of the results requires knowledge of the
Lexile Framework.
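
The computer-adaptive and Bayesian scoring bullets above describe the test engine only in general terms. The sketch below is a generic illustration, not Scholastic's proprietary algorithm: it shows how a computer-adaptive test can repeatedly select the item whose difficulty is closest to the current ability estimate and update that estimate in a Bayesian fashion, so that a prior measure (for example, from an earlier administration) carries forward into scoring. All names and values are hypothetical.

```python
import math
import random

def rasch_p(ability, difficulty):
    """Probability of a correct response under the Rasch model (logit metric)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def bayes_update(mean, var, difficulty, correct):
    """One-step normal approximation to a Bayesian ability update (illustrative only)."""
    p = rasch_p(mean, difficulty)
    info = p * (1.0 - p)                    # Fisher information contributed by the item
    post_var = 1.0 / (1.0 / var + info)     # combine prior precision with item information
    post_mean = mean + post_var * ((1.0 if correct else 0.0) - p)
    return post_mean, post_var

def adaptive_test(item_bank, prior_mean, prior_var=1.0, max_items=20, respond=None):
    """Administer up to max_items, always targeting the current ability estimate."""
    pool, mean, var = list(item_bank), prior_mean, prior_var
    for _ in range(min(max_items, len(pool))):
        item = min(pool, key=lambda d: abs(d - mean))   # closest-difficulty item
        pool.remove(item)
        correct = respond(item) if respond else (random.random() < rasch_p(mean, item))
        mean, var = bayes_update(mean, var, item, correct)
    return mean, var

# Hypothetical bank of item difficulties (logits) and a prior from an earlier test.
bank = [-3.0 + 0.1 * i for i in range(61)]
ability, uncertainty = adaptive_test(bank, prior_mean=0.5)
print(round(ability, 2), round(uncertainty, 3))
```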

Purposes and Uses of Scholastic Reading Inventory
SRI is designed to measure a reader’s ability to comprehend narrative and expository texts
of increasing difficulty. Students are generally well measured when they are administered a
test that is targeted near their true reading ability. When students take poorly targeted tests,
there is considerable uncertainty about their location on the Lexile Map.
SRI’s lowest-level item passages are sampled from beginning first-grade literature; the
highest-level item passages are sampled from high school (and more difficult) literature and
other print materials. Figure 1 shows an example of an 800L item from SRI.

Figure 1. An example of an SRI test item.
Wilbur liked Charlotte better and better each day. Her campaign against insects seemed
sensible and useful. Hardly anybody around the farm had a good word to say for a fly.
Flies spent their time pestering others. The cows hated them. The horses hated them.
The sheep loathed them. Mr. and Mrs. Zuckerman were always complaining about
them, and putting up screens.
Everyone __________ about them.

A. agreed        B. gathered        C. laughed        D. learned

From Charlotte’s Web by E. B. White, 1952, New York: Harper & Row.

Readers and texts are measured using the same Lexile metric, making it possible to directly
compare reader and text. When reader and text measures match, the Lexile Framework
forecasts 75% comprehension. The operational definition of 75% comprehension is that
given 100 items from a text, the reader will be able to correctly answer 75. When a text
has a Lexile measure 250L higher than the reader’s measure, the Framework forecasts 50%
comprehension. When the reader measure exceeds the text measure by 250L, the forecasted comprehension is 90%.
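
The three forecasts quoted above (75% at a 0L difference, 50% when the text measures 250L above the reader, and 90% when the reader measures 250L above the text) lie on a logistic curve in the reader-minus-text difference. The helper below simply interpolates such a curve through those published anchor points; it is a sketch of the relationship, not MetaMetrics' operational equation.

```python
import math

def forecast_comprehension(reader_lexile, text_lexile):
    """Forecast comprehension rate from the reader-text difference (in Lexiles).

    Logistic curve interpolated through the published anchor points:
    0L difference -> 75%, text 250L harder -> 50%, text 250L easier -> 90%.
    Illustrative only; not MetaMetrics' operational formula.
    """
    difference = reader_lexile - text_lexile
    logit = math.log(3.0) * (1.0 + difference / 250.0)
    return 1.0 / (1.0 + math.exp(-logit))

for diff in (-250, 0, 250):
    print(diff, round(forecast_comprehension(1000 + diff, 1000), 2))
# prints: -250 0.5, 0 0.75, 250 0.9
```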


Limitations of Scholastic Reading Inventory
A well-targeted SRI assessment can provide useful information for matching texts and
readers. SRI, like any other assessment, is just one source of evidence about a reader’s
level of comprehension. Obviously, decisions are best made when using multiple sources
of evidence about a reader. Other sources include other reading test data, reading group
placement, lists of books read, and, most importantly, teacher judgment. One measure of
reader performance, taken on one day, is not sufficient to make high-stakes, student-level
decisions such as summer school placement or retention.
The Lexile Framework provides a common metric for combining different sources of
information about a reader into a best overall judgment of the reader’s ability expressed in
Lexiles. Scholastic encourages users of SRI to employ multiple measures when deciding
where to locate a reader on the Lexile scale.


Theoretical Framework of Reading Ability
and The Lexile Framework for Reading
All symbol systems share two features: a semantic component and a syntactic component. In
language, the semantic units are words. Words are organized according to rules of syntax into
thought units and sentences (Carver, 1974). In all cases, the semantic units vary in familiarity and the syntactic structures vary in complexity. The comprehensibility or difficulty of a
message is dominated by the familiarity of the semantic units and by the complexity of the
syntactic structures used in constructing the message.

Readability Formulas and Reading Levels
Readability Formulas. Readability formulas have been in use for more than 60 years.
These formulas are generally based on a theory about written language and use mathematical equations to calculate text difficulty. While each formula has discrete features, nearly
all attempt to assign difficulty based on a combination of semantic (vocabulary) features
and syntactic (sentence length) features. Traditional readability formulas are all based on a
simple theory about written language and a simple equation to calculate text difficulty.
Unless users are interested in conducting research, there is little to be gained by choosing a
highly complex readability formula. A simple two-variable formula is sufficient, especially
if one of the variables is a word or semantic variable and the other is a sentence or syntactic
variable. Beyond these two variables, more data adds relatively little predictive validity
while increasing the application time involved. Moreover, a formula with many variables is
likely to be difficult to calculate by hand.
The earliest readability formulas appeared in the 1920s. Some of them were esoteric
and primarily intended for chemistry and physics textbooks or for shorthand dictation
materials. The first milestone that provided an objective way to estimate word difficulty
was Thorndike’s The Teacher’s Word Book, published in 1921. The concepts discussed in
Thorndike’s book led Lively and Pressey in 1923 to develop the first readability formula
based on tabulations of the frequency with which words appear. In 1928, Vogel and
Washburne developed a formula that took the form of a regression equation involving
more than one language variable. This format became the prototype for most of the
formulas that followed. The work of Washburne and Morphett in 1938 provided a formula
that yielded scores on a grade-placement scale. The trend to make the formulas easy to
apply resulted in the most widely used of all readability formulas—Flesch’s Reading Ease
Formula (1948). Dale and Chall (1948) published another two-variable formula that
became very popular in educational circles. Spache designed his renowned formula using a
word-list approach in 1953. This design was useful for Grades 1 through 3 at a time when
most formulas were designed for the upper grade levels. That same year, Taylor proposed
the cloze procedure for measuring readability. Twelve years later, Coleman used this
procedure to develop his fill-in-the-blank method as a criterion for his formula. Danielson

and Bryan developed the first computer-generated formulas in 1963. Also in 1963, Fry
simplified the process of interpreting readability formulas by developing a readability graph.
Later, in 1977, he extended his readability graph, and his method is the most widely used of
all current methods (Klare, 1984; Zakaluk and Samuels, 1988).
Two often-used formulas—the Fog Index and the Flesch-Kincaid Readability Formula—
can be calculated by hand for short passages. First, a passage is selected that contains
100 words. For a lengthy text, several different 100-word passages are selected.
For the Fog Index, first the average number of words per sentence is determined. If
the passage does not end at a sentence break, the percentage of the final sentence to be
included in the passage is calculated and added to the total number of sentences. Then,
the percentage of “long” words (words with three or more syllables) is determined. Finally,
the two measures are added together and multiplied by 0.4. This number indicates the
approximate Reading Grade Level (RGL) of the passage.
For the Flesch-Kincaid Readability Formula the following equation is used:
RGL = 0.39 (average number of words per sentence) + 11.8 (average number of syllables per word) - 15.59
For a lengthy text, using either formula, the RGLs are averaged for the several different
100-word passages.
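
As a concrete illustration of the arithmetic described above, the sketch below applies both hand formulas to a short text sample. The syllable counter is a rough vowel-group heuristic, so its output will differ slightly from careful hand counts.

```python
import re

def syllables(word):
    """Rough syllable count: runs of vowels, minimum of one (a heuristic, not a dictionary)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog_index(text):
    """Fog Index: 0.4 * (average words per sentence + percentage of words with 3+ syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    avg_words = len(words) / len(sentences)
    pct_long = 100.0 * sum(1 for w in words if syllables(w) >= 3) / len(words)
    return 0.4 * (avg_words + pct_long)

def flesch_kincaid_rgl(text):
    """Flesch-Kincaid grade level: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * sum(syllables(w) for w in words) / len(words)
            - 15.59)

sample = ("Flies spent their time pestering others. The cows hated them. "
          "The horses hated them. The sheep loathed them.")
print(round(fog_index(sample), 1), round(flesch_kincaid_rgl(sample), 1))
```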
Another commonly used readability formula is ATOS™ for Books developed by Advantage Learning Systems. ATOS is based on the following variables related to the reading
demands of text: words per sentence, characters per word, and average grade level of the
words. ATOS uses whole-book scans instead of text samples, and results are reported on a
grade-level scale.
Guided Reading Levels. Within the Guided Reading framework (Fountas & Pinnell, 1996),
books are assigned to levels by teachers according to specific characteristics. These characteristics include the level of support provided by the text (e.g., the use and role of illustrations, the size and layout of the print) and the predictability and pattern of language (e.g.,
oral language compared to written language). An initial list of leveled books is provided so
teachers have models to compare when leveling a book.
For students in kindergarten through Grade 3, there are 18 Guided Reading Levels, A
through R (kindergarten: Levels A–C; first grade: Levels A–I; second grade: Levels C–P; and
third grade: Levels J–R). The books include several genres: informational texts on a variety
of topics, “how to” books, mysteries, realistic fiction, historical fiction, biography, fantasy,
traditional folk and fairy tales, science fiction, and humor.
How do readability formulas and reading levels relate to readers? The previous section described
how to level books in terms of grade levels and reading levels based on the characteristics
of the text. But how can these levels be connected to the reader? Do we say that a reader
in Grade 6 should read only books whose readability measures between 6.0 and 6.9?


How do we know that a student is reading at Guided Reading Level “G” and when is he
or she ready to move on to Level “H”? What is needed is some way to put readers on
these scales.
To match students with readability levels, their "reading" grade level needs to be determined, which is often not the same as their "nominal" grade level (the grade level of the
class they are in). On a test, a grade equivalent (GE) is a score that represents the typical
(mean or median) performance of students tested in a given month of the school year. For
example, if Alicia, a fourth-grade student, obtained a GE of 4.9 on a fourth-grade reading
test, her score is the score that a student at the end of the ninth month of fourth grade
would likely achieve on that same reading test. But there are two main problems with
grade equivalents:
How grade equivalents are derived determines the appropriate conclusions that may be drawn from
the scores. For example, if Stephanie scores 5.9 on a fourth-grade mathematics test, it is
not appropriate to conclude that Stephanie has mastered the mathematics content of the
fifth grade (in fact, it may be unknown how fifth-grade students would perform on the
fourth-grade test). It certainly cannot be assumed that Stephanie has the prerequisites
for sixth-grade mathematics. All that is known for certain is that Stephanie is well above
average in mathematics.
Grade equivalents represent unequal units. The content of instruction varies somewhat from
grade to grade (as in high school, where subjects may be studied only one or two years), and
the emphasis placed on a subject may vary from grade to grade. Grade units are unequal, and
these inequalities occur irregularly in different subjects. A difference of one grade equivalent in
elementary school reading (2.6 to 3.6) is not the same as a difference of one grade equivalent
in middle school (7.6 to 8.6).
To match students with Guided Reading Levels, the teacher makes decisions based on
observations of what the child can or cannot do to construct meaning. Teachers also use
ongoing assessments—such as running records, individual conferences, and observations of
students’ reading—to monitor and support student progress.
Both of these approaches to helping readers select books appropriate to their reading
level—readability formulas and reading levels—are subjective and prone to misinterpretation. What is needed is one scale that can describe the reading demands of a piece of text
and the reading ability of a child. The Lexile Framework for Reading is a powerful tool for
determining the reading ability of children and finding texts that provide the appropriate
level of challenge.
Jack Stenner, a leading psychometrician and one of the developers of the Lexile Framework, likens this situation to an experience he had several years ago with his son.
Some time ago I went into a shoe store and asked for a fifth-grade
shoe. The clerk looked at me suspiciously and asked if I knew how
much shoe sizes varied among eleven-year-olds. Furthermore, he


pointed out that shoe size was not nearly as important as purpose,
style, color, and so on. But if I would specify the features I wanted
and the size, he could walk to the back and quickly reappear with
several options to my liking. The clerk further noted, somewhat
condescendingly, that the store used the same metric to measure
feet and shoes, and when there was a match between foot and shoe,
the shoes got worn, there was no pain, and the customer was happy
and became a repeat customer. I called home and got my son’s
shoe size and then asked the clerk for a “size 8, red hightop Penny
Hardaway basketball shoe.” After a brief transaction, I had the shoes.
I then walked next door to my favorite bookstore and asked for
a fifth-grade fantasy novel. Without hesitation, the clerk led me
to a shelf where she gave me three choices. I selected one and
went home with The Hobbit, a classic that I had read three times
myself as a youngster. I later learned my son had yet to achieve
the reading fluency needed to enjoy The Hobbit. His understandable response to my gifts was to put the book down in favor of
passionately practicing free throws in the driveway.
The next section of this technical report describes the development and validation of
the Lexile Framework for Reading.

The Lexile Framework for Reading
A reader’s comprehension of text depends on several factors: the purpose for reading, the
ability of the reader, and the text being read. The reader can read a text for entertainment
(literary experience), to gain information, or to perform a task. The reader brings to the
reading experience a variety of important factors: reading ability, prior knowledge, interest
level, and developmental appropriateness. For any text, three factors determine readability:
difficulty, support, and quality. All of these factors are important to consider when evaluating the appropriateness of a text for a reader. The Lexile Framework focuses primarily on
two: reader ability and text difficulty.
Like other readability formulas, the Lexile Framework examines two features of text
to determine its readability—semantic difficulty and syntactic complexity. Within the
Lexile Framework, text difficulty is determined by examining the characteristics of word
frequency and sentence length. Text measures typically range from 200L to 1700L, but
they can go below zero (reported as “Beginning Reader”) and above 2000L. Within any
one classroom, the reading materials will span a range of difficulty levels.
All symbol systems share two features: a semantic component and a syntactic component.
In language, the semantic units are words. Words are organized according to rules of
syntax into thought units and sentences (Carver, 1974). In all cases, the semantic units
vary in familiarity and the syntactic structures vary in complexity. The comprehensibility


or difficulty of a message is dominated by the familiarity of the semantic units and by the
complexity of the syntactic structures used in constructing the message.
The Semantic Component. Most operationalizations of semantic difficulty are proxies for the
probability that an individual will encounter a word in a familiar context and thus be able
to infer its meaning (Bormuth, 1966). This is the basis of exposure theory, which explains
the way receptive or hearing vocabulary develops (Miller and Gildea, 1987; Stenner, Smith,
and Burdick, 1983). Klare (1963) hypothesized that the semantic component varied along
a familiar-to-rare continuum. This concept was further developed by Carroll, Davies,
and Richman (1971), whose word-frequency study examined the reoccurrence of words
in a five-million-word corpus of running text. Knowing the frequency of words as they
are used in written and oral communication provided the best means of inferring the
likelihood that a word would be encountered by a reader and thus become part of that
individual’s receptive vocabulary.
Variables such as the average number of letters or syllables per word have been observed
to be proxies for word frequency. There is a high negative correlation between the length
of a word and the frequency of its usage. Polysyllabic words are used less frequently than
monosyllabic words, making word length a good proxy for the likelihood that an individual
will be exposed to a word.
In a study examining receptive vocabulary, Stenner, Smith, and Burdick (1983) analyzed
more than 50 semantic variables in order to identify those elements that contributed to the
difficulty of the 350 vocabulary items on Forms L and M of the Peabody Picture Vocabulary
Test—Revised (Dunn and Dunn, 1981). Variables included part of speech, number of letters,
number of syllables, the modal grade at which the word appeared in school materials,
content classification of the word, the frequency of the word from two different word
counts, and various algebraic transformations of these measures.
The word frequency measure used was the raw count of how often a given word
appeared in a corpus of 5,088,721 words sampled from a broad range of school materials
(Carroll, Davies, and Richman, 1971). A “word family” included: (1) the stimulus word;
(2) all plurals (adding “-s” or changing “-y” to “-ies”); (3) adverbial forms; (4) comparatives and superlatives; (5) verb forms (“-s,” “-d,” “-ed,” and “-ing”); (6) past participles;
and (7) adjective forms. Correlations were computed between algebraic transformations of these means and the rank order of the test items. Since the items were ordered
according to increasing difficulty, the rank order was used as the observed item difficulty.
The mean log word frequency provided the highest correlation with item rank order
(r = 0.779) for the items on the combined form.
The Lexile Framework currently employs a 600-million-word corpus when examining
the semantic component of text. This corpus was assembled from the thousands of texts
publishers have measured. When text is analyzed by MetaMetrics, all electronic files are
initially edited according to established guidelines used with the Lexile Analyzer software.
These guidelines include the removal of all incomplete sentences, chapter titles, and paragraph headings; running of a spell check; and repunctuating where necessary to correspond


to how the book would be read by a child (for example, at the end of a page). The text
is then submitted to the Lexile Analyzer that examines the lengths of the sentences and
the frequencies of the words and reports a Lexile measure for the book. When enough
additional texts have been analyzed to make an adjustment to the corpus necessary and
desirable, a linking study will be conducted to adjust the calibration equation such that
the Lexile measure of a text based on the current corpus will be equivalent to the Lexile
measure based on the new corpus.
The Syntactic Component. Klare (1963) provided a possible interpretation for how sentence
length works in predicting passage difficulty. He speculated that the syntactic component
varied with the load placed on short-term memory. Crain and Shankweiler (1988),
Shankweiler and Crain (1986), and Liberman, Mann, Shankweiler, and Werfelman (1982)
have also supported this explanation. The work of these individuals has provided evidence
that sentence length is a good proxy for the demand that structural complexity places upon
verbal short-term memory.
While sentence length has been shown to be a powerful proxy for the syntactic complexity of a passage, an important caveat is that sentence length is not the underlying causal
influence (Chall, 1988). Researchers sometimes incorrectly assume that manipulation of
sentence length will have a predictable effect on passage difficulty. Davidson and Kantor
(1982), for example, illustrated rather clearly that sentence length can be reduced and
difficulty increased and vice versa.
Based on previous research, it was decided to use sentence length as a proxy for the
syntactic component of reading difficulty in the Lexile Framework.
Calibration of Text Difficulty. A research study on semantic units conducted by Stenner,
Smith, and Burdick (1983) was extended to examine the relationship of word frequency
and sentence length to reading comprehension. Stenner, Smith, Horabin, and Smith (1987a)
performed an exploratory regression analysis to test the explanatory power of these
variables. This analysis involved calculating the mean word frequency and the log of
the mean sentence length for each of the 66 reading comprehension passages on the
Peabody Individual Achievement Test. The observed difficulty of each passage was the mean
difficulty of the items associated with the passage (provided by the publisher) converted
to the logit scale. A regression analysis based on the word-frequency and sentence-length
measures produced a regression equation that explained most of the variance found in the
set of reading comprehension tasks. The resulting correlation between the observed logit
difficulties and the theoretical calibrations was 0.97 after correction for range restriction
and measurement error. The regression equation was further refined based on its use in
predicting the observed difficulty of the reading comprehension passages on eight other
standardized tests. The resulting correlation between the observed logit difficulties and
the theoretical calibrations when the nine tests were combined into one was 0.93 after
correction for range restriction and measurement error.
Once a regression equation was established linking the syntactic and semantic features of a
text to its difficulty, that equation was used to calibrate test items and text.
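
The guide does not reproduce the regression equation itself, so the sketch below only illustrates the general procedure: regress observed logit difficulties on two predictors, a mean log word frequency and the log of mean sentence length. All feature values and the resulting coefficients are made up for illustration.

```python
import numpy as np

# Hypothetical per-passage features and observed difficulties (logits).
mean_log_word_freq = np.array([3.8, 3.6, 3.0, 2.9, 2.5, 2.2])    # higher = more familiar words
log_mean_sentence_len = np.array([2.0, 2.6, 2.3, 3.0, 2.8, 3.3])
observed_logit_difficulty = np.array([-2.1, -1.0, -0.2, 0.6, 1.5, 2.3])

# Ordinary least squares: difficulty ~ b0 + b1*word frequency + b2*sentence length.
X = np.column_stack([np.ones_like(mean_log_word_freq),
                     mean_log_word_freq,
                     log_mean_sentence_len])
coefficients, *_ = np.linalg.lstsq(X, observed_logit_difficulty, rcond=None)
predicted = X @ coefficients

print("fitted coefficients:", np.round(coefficients, 2))
print("correlation (observed vs. theory):",
      round(float(np.corrcoef(observed_logit_difficulty, predicted)[0, 1]), 3))
```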


The Lexile scale. In developing the Lexile scale, the Rasch item response theory model
(Wright and Stone, 1979) was used to estimate the difficulties of items and the abilities
of readers on the logit scale.
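
Figure 5 (see the List of Figures) presents this model. In the notation of Wright and Stone, the probability that person n responds correctly to item i is

P(Xni = 1) = exp(Bn - Di) / [1 + exp(Bn - Di)]

where Bn is the ability of person n and Di is the difficulty of item i, both expressed in logits.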
The calibrations of the items from the Rasch model are objective in the sense that the
relative difficulties of the items will remain the same across different samples of readers
(i.e., specific objectivity). When two items are administered to the same person, which
item is harder and which one is easier can be determined. This ordering is likely to hold
when the same two items are administered to a second person. If two different items are
administered to the second person, there is no way to know which set of items is harder
and which set is easier. The problem is that the location of the scale is not known. General
objectivity requires that scores obtained from different test administrations be tied to a
common zero—absolute location must be sample independent (Stenner, 1990). To achieve
general objectivity, the theoretical logit difficulties must be transformed to a scale where
the ambiguity regarding the location of zero is resolved.
The first step in developing a scale with a fixed zero was to identify two anchor points for
the scale. The following criteria were used to select the two anchor points: they should be
intuitive, easily reproduced, and widely recognized. For example, with most thermometers
the anchor points are the freezing and boiling points of water. For the Lexile scale, the
anchor points are text from seven basal primers for the low end and text from The Electronic
Encyclopedia (Grolier, Inc., 1986) for the high end. These points correspond to medium-difficulty first-grade text and medium-difficulty workplace text.
The next step was to determine the unit size for the scale. For the Celsius thermometer,
the unit size (a degree) is 1/100th of the difference between freezing (0 degrees) and
boiling (100 degrees) water. For the Lexile scale, the unit size was defined as 1/1000th of
the difference between the mean difficulty of the primer material and the mean difficulty
of the encyclopedia samples. Therefore, a Lexile by definition equals 1/1000th of the
difference between the comprehensibility of the primers and the comprehensibility of the
encyclopedia.
The third step was to assign a value to the lower anchor point. The low-end anchor on the
Lexile scale was assigned a value of 200.
Finally, a linear equation of the form
[(Logit + constant) × CF] + 200 = Lexile text measure          (Equation 1)

was developed to convert logit difficulties to Lexile calibrations. The values of the conversion factor (CF) and the constant were determined by substituting in the anchor points and
then solving the system of equations.
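
Because Equation 1 has two unknowns (CF and the constant) and two anchor points, the system can be solved directly. In the sketch below the logit difficulties assigned to the primer and encyclopedia anchors are hypothetical (the guide does not report them); the target Lexile values follow from the definitions above, since the lower anchor is set to 200L and the two anchors are 1,000 Lexiles apart by construction.

```python
import numpy as np

# Hypothetical logit difficulties for the two anchor texts (not reported in the guide).
primer_logit = -4.0          # mid-first-grade basal primer text
encyclopedia_logit = 1.5     # The Electronic Encyclopedia samples

# By construction, the primer anchor is 200L and the encyclopedia anchor is
# 200L + 1000 Lexiles = 1200L (one Lexile = 1/1000 of the anchor difference).
targets = np.array([200.0, 1200.0])

# Equation 1: (logit + constant) * CF + 200 = Lexile.
# Rewrite as CF*logit + (CF*constant) = Lexile - 200 and solve for the two unknowns.
A = np.array([[primer_logit, 1.0],
              [encyclopedia_logit, 1.0]])
cf, cf_times_constant = np.linalg.solve(A, targets - 200.0)
constant = cf_times_constant / cf

print(round(cf, 2), round(constant, 2))
# Check: both anchors reproduce their assigned Lexile values.
assert round((primer_logit + constant) * cf + 200) == 200
assert round((encyclopedia_logit + constant) * cf + 200) == 1200
```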


Validity of The Lexile Framework for Reading
Validity is the “extent to which a test measures what its authors or users claim it measures;
specifically, test validity concerns the appropriateness of inferences that can be made on
the basis of test results” (Salvia and Ysseldyke, 1998). The 1999 Standards for Educational and
Psychological Testing (American Educational Research Association, American Psychological
Association, and National Council on Measurement in Education) state that “validity
refers to the degree to which evidence and theory support the interpretations of test scores
entailed in the uses of tests” (p. 9). In other words, does the test measure what it is supposed
to measure? For the Lexile Framework, which measures a skill, the most important aspect
of validity that should be examined is construct validity. The construct validity of The
Lexile Framework for Reading can be evaluated by examining how well Lexile measures
relate to other measures of reading comprehension and text difficulty.
Lexile Framework Linked to Other Measures of Reading Comprehension. The Lexile Framework
for Reading has been linked to numerous standardized tests of reading comprehension.
When assessment scales are linked, a common frame of reference can be used to interpret
the test results. This frame of reference can be “used to convey additional normative
information, test-content information, and information that is jointly normative and
content-based. For many test uses, [this frame of reference] conveys information that is
more crucial than the information conveyed by the primary score scale” (Petersen, Kolen,
and Hoover, 1989, p. 222).
Table 1 presents the results from linking studies conducted with the Lexile Framework for
Reading. For each of the tests listed, student reading comprehension scores can also be
reported as Lexile measures. This dual reporting provides a rich, criterion-related frame
of reference for interpreting the standardized test scores. When a student takes one of the
standardized tests, in addition to receiving his norm-referenced test results, he can receive a
reading list that is targeted to his specific reading level.
Lexile Framework and the Difficulty of Basal Readers. In a study conducted by Stenner,
Smith, Horabin, and Smith (1987b), Lexile calibrations were obtained for units in eleven
basal series. It was hypothesized that each basal series was sequenced by difficulty. So, for
example, the latter portion of a third-grade reader is presumably more difficult than the
first portion of the same book. Likewise, a fourth-grade reader is presumed to be more
difficult than a third-grade reader. Observed difficulties for each unit in a basal series were
estimated by the rank order of the unit in the series. Thus, the first unit in the first book of
the first grade was assigned a rank order of one, and the last unit of the eighth-grade reader
was assigned the highest rank order number.


Table 1. Results from linking studies conducted with The Lexile Framework for Reading.

| Standardized Test | Grades in Study | N | Correlation between Test Score and Lexile Measure |
|---|---|---|---|
| Stanford Achievement Tests (Ninth Edition) | 4, 6, 8, 10 | 1,167 | 0.92 |
| Stanford Diagnostic Reading Test (Version 4.0) | 4, 6, 8, 10 | 1,169 | 0.91 |
| North Carolina End-of-Grade Tests (Reading Comprehension) | 3, 4, 5, 8 | 956 | 0.90 |
| TerraNova (CTBS/5) | 2, 4, 6, 8 | 2,713 | 0.92 |
| Texas Assessment of Academic Skills (TAAS) | 3–8 | 3,623 | 0.73 to 0.78* |
| Metropolitan Achievement Test (Eighth Edition) | 2, 4, 6, 8, and 10 | 2,382 | 0.93 |
| Gates-MacGinitie Reading Test (Version 4.0) | 2, 4, 6, 8, and 10 | 4,644 | 0.92 |
| Utah Core Assessments | 3–6 | 1,551 | 0.73 |
| Texas Assessment of Knowledge and Skills (TAKS) | 3, 5, and 8 | 1,960 | 0.60 to 0.73* |
| The Iowa Tests (Iowa Tests of Basic Skills and Iowa Tests of Educational Development) | 3, 5, 7, 9, and 11 | 4,666 | 0.88 |
| Stanford Achievement Test (Tenth Edition) | 2, 4, 6, 8, and 10 | 3,064 | 0.93 |
| Oregon Knowledge and Skills | 3, 5, 8, and 10 | 3,180 | 0.89 |
| California Standards Test (CST) | 2–12 | 55,564 | NA** |
| Mississippi Curriculum Test (MCT) | 2, 4, 6, and 8 | 7,045 | 0.90 |
| Georgia Criterion-Referenced Competency Test (CRCT) | 1–8 | 16,363 | 0.72 to 0.88* |

Notes: Results are based on final samples used with each linking study.
*TAAS, TAKS, and CRCT were not vertically equated; separate linking equations were derived for each grade.
**CST was linked using a set of Lexile-calibrated items embedded in the CST research blocks. CST items were calibrated to the Lexile scale.


Correlations were computed between the rank order and the Lexile calibration of each
unit in each series. After correction for range restriction and measurement error, the
average disattenuated correlation between the Lexile calibration of text comprehensibility
and the rank order of the basal units was 0.995 (see Table 2).

Table 2. Correlations between theory-based calibrations produced by the Lexile equation and rank order of unit in basal readers.

| Basal Series | Number of Units | rOT | ROT | R′OT |
|---|---|---|---|---|
| Ginn Rainbow Series (1985) | 53 | .93 | .98 | 1.00 |
| HBJ Eagle Series (1983) | 70 | .93 | .98 | 1.00 |
| Scott Foresman Focus Series (1985) | 92 | .84 | .99 | 1.00 |
| Riverside Reading Series (1986) | 67 | .87 | .97 | 1.00 |
| Houghton-Mifflin Reading Series (1983) | 33 | .88 | .96 | .99 |
| Economy Reading Series (1986) | 67 | .86 | .96 | .99 |
| Scott Foresman American Tradition (1987) | 88 | .85 | .97 | .99 |
| HBJ Odyssey Series (1986) | 38 | .79 | .97 | .99 |
| Holt Basic Reading Series (1986) | 54 | .87 | .96 | .98 |
| Houghton-Mifflin Reading Series (1986) | 46 | .81 | .95 | .98 |
| Open Court Headway Program (1985) | 52 | .54 | .94 | .97 |
| Total/Means | 660 | .839 | .965 | .995 |

rOT = raw correlation between observed difficulties (O) and theory-based calibrations (T).
ROT = correlation between observed difficulties (O) and theory-based calibrations (T) corrected for range restriction.
R′OT = correlation between observed difficulties (O) and theory-based calibrations (T) corrected for range restriction and measurement error.
Mean correlations are the weighted averages of the respective correlations.

Based on the consistency of the results in Table 2, the Lexile theory was able to account
for the unit rank ordering of the eleven basal series despite numerous differences among
them—prose selections, developmental range addressed, types of prose introduced (e.g.,
narrative versus expository), and purported skills and objectives emphasized.
Lexile Framework and the Difficulty of Reading Test Items. In a study conducted by Stenner,
Smith, Horabin, and Smith (1987a), 1,780 reading comprehension test items appearing on
nine nationally normed tests were analyzed. The study correlated empirical item difficulties
provided by the publisher with the Lexile calibrations specified by computer analysis of the
text of each item. The empirical difficulties were obtained in one of three ways. Three of
the tests included observed logit difficulties from either a Rasch or three-parameter analysis
(e.g., NAEP). For four of the tests, logit difficulties were estimated from item p-values and
raw score means and standard deviations (Poznansky, 1990; Stenner, Wright, and Linacre,


1994). Two of the tests provided no item parameters, but in each case items were ordered
on the test in terms of difficulty (e.g., PIAT). For these two tests, the empirical difficulties
were approximated by the difficulty rank order of the items. In those cases where multiple
questions were asked about a single passage, empirical item difficulties were averaged to
yield a single observed difficulty for the passage.
Once theory-specified calibrations and empirical item difficulties were computed, the two
arrays were correlated and plotted separately for each test. The plots were checked for
unusual residual distributions and curvature, and it was discovered that the equation did
not fit poetry items and noncontinuous prose items (e.g., recipes, menus, or shopping lists).
This indicated that the universe to which the Lexile equation could be generalized was
limited to continuous prose. The poetry and noncontinuous prose items were removed and
correlations were recalculated. Table 3 contains the results of this analysis.
Table 3. Correlations between theory-based calibrations produced by the Lexile equation and empirical item difficulty.

| Test | Number of Questions | Number of Passages | Mean | SD | Range | Min | Max | rOT | ROT | R′OT |
|---|---|---|---|---|---|---|---|---|---|---|
| SRA | 235 | 46 | 644 | 353 | 1303 | 33 | 1336 | .95 | .97 | 1.00 |
| CAT-E | 418 | 74 | 789 | 258 | 1339 | 212 | 1551 | .91 | .95 | .98 |
| Lexile | 262 | 262 | 771 | 463 | 1910 | –304 | 1606 | .93 | .95 | .97 |
| PIAT | 66 | 66 | 939 | 451 | 1515 | 242 | 1757 | .93 | .94 | .97 |
| CAT-C | 253 | 43 | 744 | 238 | 810 | 314 | 1124 | .83 | .93 | .96 |
| CTBS | 246 | 50 | 703 | 271 | 1133 | 173 | 1306 | .74 | .92 | .95 |
| NAEP | 189 | 70 | 833 | 263 | 1162 | 169 | 1331 | .65 | .92 | .94 |
| Battery | 26 | 26 | 491 | 560 | 2186 | –702 | 1484 | .88 | .84 | .87 |
| Mastery | 85 | 85 | 593 | 488 | 2135 | –586 | 1549 | .74 | .75 | .77 |
| Total/Mean | 1780 | 722 | 767 | 343 | 1441 | 50 | 1491 | .84 | .91 | .93 |

rOT = raw correlation between observed difficulties (O) and theory-based calibrations (T).
ROT = correlation between observed difficulties (O) and theory-based calibrations (T) corrected for range restriction.
R′OT = correlation between observed difficulties (O) and theory-based calibrations (T) corrected for range restriction and measurement error.
Means are computed on Fisher Z transformed correlations.

The last three columns in Table 3 show the raw correlations between observed (O) item
difficulties and theoretical (T) item calibrations, with the correlations corrected for restriction in range and measurement error. The Fisher Z mean of the raw correlations (rOT) is
0.84. When corrections are made for range restriction and measurement error, the Fisher
Z mean disattenuated correlation between theory-based calibration and empirical difficulty
in an unrestricted group of reading comprehension items (R′OT) is 0.93. These results
show that most attempts to measure reading comprehension—no matter what the item
form, type of skill objectives assessed, or response requirement used—measure a common
comprehension factor specified by the Lexile Theory.


Lexile Item Bank
The Lexile Item Bank contains over 10,000 items that were developed between 1986 and
2003 for research purposes with the Lexile Framework.
Passage Selection. Passages selected for use came from “real-world” reading materials that
students may encounter both in and out of the classroom. Sources include textbooks,
literature, and periodicals from a variety of interest areas and material written by authors of
different backgrounds. The following criteria were used to select passages:
• the passage must develop one main idea or contain one complete
piece of information,
• understanding of the passage is independent of the information that
comes before or after the passage in the source text, and
• understanding of the passage is independent of prior knowledge not
contained in the passage.
With the aid of a computer program, item writers examined blocks of text (minimum
of three sentences) that were calibrated to be within 100L of the source text. From these
blocks of text item writers were asked to select four to five that could be developed as
items. If it was necessary to shorten or lengthen the passage in order to meet the criteria
for passage selection, the item writer could immediately recalibrate the text to ensure that
it was still targeted within 100L of the complete text (i.e., source targeting).
Item Format. The native-Lexile item format is embedded completion. The embedded
completion format is similar to the fill-in-the-blank format. When properly written,
this format directly assesses the reader’s ability to draw inferences and establish logical
connections between the ideas in the passage. The reader is presented with a passage of
approximately 30 to 150 words in length. The passages are shorter for beginning readers
and longer for more advanced readers. The passage is then response illustrated—a statement with a word or phrase missing is added at the end of the passage, followed by four
options. From the four presented options, the reader is asked to select the “best” option
that completes the statement. With this format, all options are semantically and syntactically
appropriate completions of the sentence, but one option is unambiguously the “best”
option when considered in the context of the passage.
The statement portion of the embedded completion item can assess a variety of skills
related to reading comprehension: paraphrase information in the passage, draw a logical
conclusion based on information in the passage, make an inference, identify a supporting detail, or make a generalization based on information in the passage. The statement
is written to ensure that by reading and comprehending the passage, the reader is able to
select the correct option. When the embedded completion statement is read by itself, each
of the four options is plausible.


Item Writer Training. Item writers were classroom teachers and other educators who had
experience with the everyday reading ability of students at various levels. The use of
individuals with these types of experiences helped to ensure that the items are valid
measures of reading comprehension. Item writers were provided with training materials
concerning the embedded completion item format and guidelines for selecting passages,
developing statements, and creating options. The item writing materials also contained
incorrect items that illustrated the criteria used to evaluate items and corrections based
on those criteria. The final phase of item writer training was a short practice session with
three items.
Item writers were provided vocabulary lists to use during statement and option development. The vocabulary lists were compiled from spelling books one grade level below the
level targeted by the item. The rationale was that these words should be part of a reader’s
“working” vocabulary if they were learned the previous year.
Item writers were also given extensive training related to sensitivity issues. Part of the
item-writing materials addressed these issues and identified areas to avoid when selecting
passages and developing items. The following areas were covered: violence and crime,
depressing situations/death, offensive language, drugs/alcohol/tobacco, sex/attraction,
race/ethnicity, class, gender, religion, supernatural/magic, parent/family, politics, animals/
environment, and brand names/junk food. These materials were developed to be compliant
with standards of universal design and fair access—equal treatment of the sexes, fair
representation of minority groups, and the fair representation of disabled individuals.
Item Review. All items were subjected to a two-stage review process. First, items were
reviewed and edited according to the 19 criteria identified in the item-writing materials
and for sensitivity issues. Approximately 25% of the items developed were deleted for
various reasons. Where possible, items were edited and maintained in the item bank.
Items were then reviewed and edited by a group of specialists representing various
perspectives: test developers, editors, and curriculum specialists. These individuals examined
each item for sensitivity issues and the quality of the response options. During the second
stage of the item review process, items were either “approved as presented,” “approved with
edits,” or “deleted.” Approximately 10% of the items written were “approved with edits” or
“deleted” at this stage. When necessary, item writers received additional ongoing feedback
and training.


Item Analyses. As part of the linking studies and research studies conducted by MetaMetrics,
items in the Lexile Item Bank were evaluated for difficulty (relationship between logit
[observed Lexile measure] and theoretical Lexile measure), internal consistency (point-biserial correlation), and bias (ethnicity and gender where possible). Where necessary, items
were deleted from the item bank or revised and recalibrated.
During the spring of 1999, eight levels of a Lexile assessment were administered in a large
urban school district to students in Grades 1 through 12. The eight test levels were administered in Grades 1, 2, 3, 4, 5, 6, 7–8, and 9–12 and ranged from 40 to 70 items depending
on the grade level. A total of 427 items were administered across the eight test levels. Each
item was answered by at least 9,000 students (the number of students per level ranged
from 9,286 in Grade 2 to 19,056 in Grades 9–12). The item responses were submitted
to a Winsteps IRT analysis. The resulting item difficulties (in logits) were assigned Lexile
measures by multiplying by 180 and anchoring each set of items to the mean theoretical
difficulty of the items on the form.
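
A small sketch of the conversion just described: Winsteps logit difficulties are multiplied by 180, and the whole set is then shifted so that its mean equals the mean theoretical difficulty of the items on the form. The item values below are invented for illustration.

```python
import numpy as np

def anchor_to_theory(observed_logits, theoretical_lexiles, scale=180.0):
    """Convert logit difficulties to Lexile measures, anchoring the set so that its mean
    matches the mean theoretical difficulty of the form (illustrative values only)."""
    scaled = np.asarray(observed_logits, dtype=float) * scale
    shift = np.mean(theoretical_lexiles) - np.mean(scaled)
    return scaled + shift

# Hypothetical form: Winsteps logit difficulties and theory-based calibrations for the same items.
observed_logits = [-1.2, -0.4, 0.1, 0.8, 1.5]
theoretical_lexiles = [520, 660, 740, 870, 1010]

measures = anchor_to_theory(observed_logits, theoretical_lexiles)
print(np.round(measures))                                   # anchored Lexile measures
print(round(float(np.mean(measures))),
      round(float(np.mean(theoretical_lexiles))))           # the two means match
```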


Description of the Test
Test Materials
SRI is “an interactive reading comprehension test that provides an assessment of reading
levels, reported in Lexile measures” (Scholastic, 2006a, p. 1). The results can be used to
measure how well readers comprehend literary and expository texts of varying difficulties.
Item Bank. SRI consists of a bank of approximately 5,000 multiple-choice items that are
presented as embedded completion items. In this question format the student is asked to
read a passage taken from an actual text and then choose the option that best fills the blank
in the last statement. In order to complete the statement, the student must respond on a
literal level (recall a fact) or an inferential level (determine the main idea of the passage,
draw an inference from the material presented, or make a connection between sentences in
the passage).
Educator’s Guide. This guide provides an overview of the SRI software and software
support. Educators are provided information on getting started with the software (installing it, enrolling students, reporting results), how the SRI student program works (login,
book interest screen, Practice Test, Locator Test, SRI test, and reports), and working with
the Scholastic Achievement Manager (SAM). SAM is the learning management system
for all Scholastic software programs including READ 180, Scholastic Reading Counts!, and
ReadAbout. Educators use SAM to collect and organize student-produced data. SAM
helps educators understand and implement data-driven instruction by
• managing student rosters;
• generating reports that capture student performance data at various
levels of aggregation (student, classroom, group, school, and district);
• locating helpful resources for classroom instruction and aligning the
instruction to standards; and
• communicating student progress to parents, teachers, and administrators.
The Educator’s Guide also provides teachers with information on how to use the results
from SRI in the classroom. Teachers can access their students’ reading levels and prescribe
appropriate instructional support material to aid in developing their students’ reading
skills and growth as readers. Information related to best practices for test administration,
interpreting reports, and using Lexiles in the classroom is provided. Reproducibles are
also provided to help educators communicate SRI results to parents, monitor growth, and
recommend books.


Test Administration and Scoring
Administration Time. SRI can be administered at any time during the school year. The
tests are intended to be untimed. Typically, students take 20–30 minutes to complete the
test. There should be at least eight weeks of elapsed time between administrations to allow
for growth in reading ability.
Administration Setting. SRI can be administered in a group setting or individually—
wherever computers are available: in the classroom, in a computer lab, or in the library
media center. The setting should be quiet and free from distractions. Teachers should make
sure that students have the computer skills needed to complete the test. Practice items
are provided to ensure that students understand the directions and know how to use the
computer to take the test.
Administration and Scoring. The student experience with SRI consists of three phases:
practice test, locator test, and SRI test. Prior to testing, the teacher or administrator inputs
information into the computer-adaptive algorithm that controls the administration of the
test. The student’s identification number and grade level must be input; prior standardized
reading results (Lexile measure, percentile, stanine, or NCE) and the teacher’s judgment of
the student’s reading level (Far Below, Below, On, Above, or Far Above) should be input.
This information is used to determine the best starting point for the student.
The Practice Test consists of three items that are significantly below the student’s reading level
(approximately 10th percentile for grade level). The practice items are administered only
during the student’s first experience with SRI and are designed to ensure that the student
understands the directions and how to use the computer to take the test.
For students in Grades 7 and above and for whom the only data to set the starting item
difficulty is their grade level, a Locator Test is presented to better target the students. The
Locator Test consists of 2–5 items that have a reading demand 500L below the “On Level”
designation for the grade. The results are used to establish the student’s prior reading ability
level. If students respond incorrectly to one or more items, their prior reading ability is set
to “Far Below Grade Level.”
SRI uses a three-phase approach to assess a student’s level of reading comprehension: Start,
Step, Stop. During test administration, the computer adapts the test continually according
to the student’s responses to the items. The student starts the test; the test steps up or down
according to the student’s performance; and, when the computer has enough information
about the student’s reading level, the test stops.
The first phase, Start, determines the best point on the Lexile scale to begin testing the
student. The more information that is input into the algorithm, the better targeted the
beginning of the test. Research has shown that well-targeted tests include less error in
reporting student scores than poorly targeted tests. A student is targeted in one of three
ways: (1) the teacher or test administrator enters the student’s Estimated Reading Level;
(2) the student is in Grade 6 or below and the student’s grade level is used; or (3) the
student is in Grade 7 or above and the Locator Test is administered.
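
A much-simplified sketch of this targeting logic follows. The function, the grade-norm table, and the handling of the Locator Test result are hypothetical placeholders; the guide specifies only the three targeting routes, not the program's internal code or values.

    # Hypothetical, simplified sketch of the Start phase: choose a starting point
    # from (1) an Estimated Reading Level or prior score entered by the teacher,
    # (2) the grade level for Grades 6 and below, or (3) the Locator Test for
    # Grades 7 and above. The grade-norm table is illustrative only.

    ILLUSTRATIVE_ON_LEVEL = {5: 800, 6: 880, 7: 950, 8: 1010}   # assumed 50th-percentile Lexiles

    def choose_start(grade, estimated_lexile=None, locator_misses=0):
        """Return a starting Lexile or, for the Locator route, a prior-ability label."""
        if estimated_lexile is not None:                 # route 1: prior data entered by the teacher
            return estimated_lexile
        if grade <= 6:                                   # route 2: grade-level norm
            return ILLUSTRATIVE_ON_LEVEL.get(grade, 880)
        # Route 3 (Grade 7+ with grade level only): Locator items sit 500L below
        # the grade's "On Level" point; one or more misses sets the prior ability
        # to "Far Below Grade Level" (how that maps to a starting Lexile is not
        # specified in this guide).
        return "Far Below Grade Level" if locator_misses >= 1 else "On Grade Level"

    print(choose_start(grade=6, estimated_lexile=880))   # 880
    print(choose_start(grade=7, locator_misses=1))       # Far Below Grade Level
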


For the student whose test administration is illustrated in Figure 2, the teacher input the
student’s grade (6) and Lexile measure from the previously administered SRI Print.

Figure 2. Sample administration of SRI for a sixth-grade student with a prior Lexile
measure of 880L. [Figure: item difficulty (790L to 900L on the vertical axis) plotted
for items Q1 through Q14 across the SRI administration.]
The second phase, Step, controls the selection of items presented to the student. If only the
student’s grade level was input during the first phase, then the student is presented with an
item that has a Lexile measure at the 50th percentile for her grade. If more information
about the student’s reading ability was input during the first phase, then the student is
presented with an item that is nearer her true ability. If the student answers the item
correctly, then she is presented with an item that is slightly more difficult. If the student
responds incorrectly to the item, then she is presented with an item that is slightly easier.
After the student responds to each item, her SRI score (Lexile measure) is recomputed.
Figure 2 above shows how SRI could be administered. The first item presented to the
student measured 800L. Because she answered the item correctly, the next item was slightly
more difficult (810L); her third item measured 830L. Because she responded incorrectly to
this item, the next item was slightly easier (820L).
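
The Step logic can be sketched as a simple loop. The fixed 10L increment below is an illustrative assumption; as Figure 2 shows, the actual step sizes vary because the student's Lexile measure is re-estimated after every response.

    # Simplified illustration of the Step phase: present an item near the current
    # estimate, step up after a correct response and down after an incorrect one.
    # A fixed 10L step is assumed here; the real algorithm recomputes the
    # student's Lexile measure after each item.

    def run_step_phase(start_lexile, responses, step=10):
        """responses: list of True (correct) / False (incorrect) answers."""
        difficulty = start_lexile
        administered = []
        for correct in responses:
            administered.append(difficulty)
            difficulty += step if correct else -step
        return administered

    print(run_step_phase(800, [True, True, False]))   # [800, 810, 820]
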
The final phase, Stop, controls the termination of the test. Each student will be presented
15–25 items. The exact number of items a student receives depends on how the student
responds to the items as they are presented. In addition, the number of items presented to
the student is affected by how well the test is targeted in the beginning. Well-targeted tests


begin with less measurement error and, therefore, the student will be asked to respond to
fewer items.
Because the test administered to the student in Figure 2 was well-targeted to her reading
level (50th percentile for Grade 6 is 880L), only 15 items were administered to the student
to determine her Lexile measure.
Results from SRI are reported as scale scores (Lexile measures). This scale extends from
Beginning Reader (less than 100L) to 1500L. A scale score is determined by the difficulty
of the items a student answered both correctly and incorrectly. Scale scores can be used to
report the results of both criterion-referenced tests and norm-referenced tests.
There are many reasons to use scale scores rather than raw scores to report test results.
Scale scores overcome the disadvantage of many other types of scores (e.g., percentiles and
raw scores) in that equal differences between scale score points represent equal differences
in achievement. Each question on a test has a unique level of difficulty; therefore, answering 23 items correctly on one form of a test requires a slightly different level of achievement than answering 23 items correctly on another form of the test. But receiving a scale
score (in this case, a Lexile measure) of 675L on one form of a test represents the same level
of reading ability as receiving a scale score of 675L on another form of the test.
Keep in mind that no one test should be the sole determinant when making high-stakes
decisions about students (e.g., summer-school placement or retention). Consider the
student’s interests and experiences, as well as knowledge of each student’s reading abilities,
when making these kinds of decisions.
SRI begins with the concept of targeted level testing and takes it a step further. With the
Lexile Framework as the yardstick of text difficulty, SRI produces a measure that places texts
and readers on the same scale. The Lexile measure connects each student to actual reading
materials—school texts, story books, magazines, newspapers, employee instructions—which
can be readily understood by that student. Because SRI provides an accurate measure of
where each student reads among the variety of reading materials calibrated in the Lexile
Titles Database, the instructional approach and reading assignments for optimal growth are
explicit. SRI targeted testing not only measures how well each student can actually read, but
also locates each student among the real reading materials that are most useful to him or her.
In addition, because a targeted test is by design both challenging and reassuring, the
experience of taking it brings out the best in students.

Interpreting Scholastic Reading Inventory Scores
SRI provides both criterion-referenced and norm-referenced interpretations of the
Lexile measures. Criterion-referenced interpretations of test results provide a rich frame
of reference that can be used to guide instruction and text selection for optimal student
reading growth. While norm-referenced interpretations of test results are often required for
accountability purposes, they indicate only how well the student is reading in relation to
how other, similar students read.


Norm-Referenced Interpretations. A norm-referenced interpretation of a test score expresses
how a student performed on the test compared to other students of the same age or grade.
Norm-referenced interpretations of reading test results, however, do not provide any information about what a student can or cannot read. For accountability purposes, percentiles,
normal curve equivalents (NCEs), and stanines are used to report test results when making
comparisons (norm-referenced interpretations). For a comparison of these measures, refer
to Figure 3.

Figure 3. Normal distribution of scores described in scale scores, percentiles, stanines,
and normal curve equivalents (NCEs). [Figure: the normal curve showing the percentage
of area under each segment, aligned with the NCE, percentile, and stanine scales.]
The percentile rank of a score indicates the percentage of scores less than or equal to that score.
Percentile ranks range from 1 to 99. For example, if a student scores at the 65th percentile,
it means that he or she performed as well as or better than 65% of the norm group. Real
differences in performance are greater at the ends of the percentile range than in the middle.
Percentile ranks of scores can be compared across two or more distributions; however,
percentile ranks cannot be used to quantify differences in achievement, because the intervals
between adjacent percentile ranks do not necessarily represent equal raw score intervals. Note
that the percentile rank does not refer to the percentage of items answered correctly.
A normal curve equivalent (NCE) is a normalized student score with a mean of 50 and a
standard deviation of 21.06. NCEs range from 1 to 99. NCEs allow comparisons between
different tests for the same student or group of students and between different students on


the same test. NCEs have many of the same characteristics as percentile ranks, but have the
additional advantage of being based on an interval scale. That is, the difference between
two consecutive scores on the scale has the same meaning throughout the scale. NCEs are
required by many categorical funding agencies (for example, Title I).
A stanine is a standardized student score with a mean of 5 and a standard deviation of 2.
Stanines range from 1 to 9. In general, stanines of 1–3 are considered below average, stanines
of 4–6 are considered average, and stanines of 7–9 are considered above average. A difference
of 2 between the stanines for two measures indicates that the two measures are significantly
different. Stanines, like percentiles, indicate a student’s relative standing in a norm group.
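
Because NCEs and stanines are both defined on the normal curve, either can be derived from a percentile rank. The sketch below applies the definitions given above (NCE mean 50 and standard deviation 21.06; stanine mean 5 and standard deviation 2); it is an illustration, not part of the SRI software, and the simple rounding rule for stanines is an assumption.

    # Convert a percentile rank to an NCE (mean 50, SD 21.06) and an approximate
    # stanine (mean 5, SD 2), using the normal-curve definitions given above.
    from statistics import NormalDist

    def percentile_to_nce(percentile_rank):
        z = NormalDist().inv_cdf(percentile_rank / 100.0)     # z-score for the percentile rank
        return max(1, min(99, round(50 + 21.06 * z)))

    def percentile_to_stanine(percentile_rank):
        z = NormalDist().inv_cdf(percentile_rank / 100.0)
        return max(1, min(9, round(5 + 2 * z)))

    print(percentile_to_nce(65), percentile_to_stanine(65))   # 58 6
    print(percentile_to_nce(50), percentile_to_stanine(50))   # 50 5
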
While not very useful at the student level, normative information can be useful (and often
required) at the aggregate levels for program evaluation. Appendix 2 contains normative
data (percentiles, stanines, and NCEs) for some levels of SRI. Complete levels are found
in the SRI program under the Resource Section in the Scholastic Achievement Manager
(SAM).
A linking study conducted with the Lexile Framework developed normative information
based on a sample of 512,224 students from a medium-to-large state. The majority of the
students in the norming population were Caucasian (66.3%), with 29.3% African American,
1.7% Native American, 1.2% Hispanic, 1.0% Asian, and 0.6% Other. Less than 1% (0.7%) of
the students were classified as “limited English proficient,” and 10.1% of the students were
classified as “Students with Disabilities.” Approximately 40% of the students were eligible
for the free or reduced-price lunch program. Approximately half of the schools in the state
had some form of Title I program (either school-wide or targeted assistance). The sample’s
distributions of scores on norm-referenced and other standardized measures of reading
comprehension are similar to those reported for national distributions.
Criterion-Referenced Interpretations. An important feature of the Lexile Framework is that it
also provides criterion-referenced interpretations of every measure. A criterion-referenced
interpretation of a test score compares the specific knowledge and skills measured by the
test to the student’s proficiency with the same knowledge and skills. Criterion-referenced
scores have meaning in terms of what the student knows or can do, rather than in relation
to the scores produced by some external reference (or norm) group.
When a reader’s measure is equal to the task’s calibration, then the Lexile scale forecasts that
the individual has a 75% comprehension rate on that task. When 20 such tasks are given
to this reader, one expects three-fourths of the responses to be correct. If the task is more
difficult than the reader's measure, then the probability is less than 75% that the reader will
respond to the task correctly. Similarly, when the task is easier than the reader's measure,
then the probability is greater than 75% that the response will be correct.
There is empirical evidence supporting the choice of a 75% target comprehension rate,
as opposed to, say, a 50% or a 90% rate. Squires, Huitt, and Segars (1983) observed that
reading achievement for second-graders peaked when the success rate reached 75%. A
75% success rate also is supported by the findings of Crawford, King, Brophy, and Evertson
(1975), Rim (1980), and Huynh (1998). It may be, however, that there is no one optimal

rate of reading comprehension. It may be that there is a range in which individuals can
operate to optimally improve their reading ability.
Since the Lexile Theory provides complementary procedures for measuring people and text,
the scale can be used to match a person’s level of comprehension with books that the person
is forecast to read with a high comprehension rate. Trying to identify possible supplemental
reading materials for students has, for the most part, relied on a teacher’s familiarity with the
titles. For example, an eighth-grade girl who is interested in sports but is not reading at grade
level may be interested in reading a biography about Chris Evert. The teacher may not know,
however, whether a specific biography is too difficult or too easy for the student. The Lexile
Framework provides a reader measure and a text measure on the same scale. Armed with this
information, a teacher, librarian, media specialist, student, or parent can plan for success.
Readers develop reading comprehension skills by reading. Skill development is enhanced
when their reading is accompanied by frequent response requirements. Response requirements may be structured in a variety of ways. An instructor may ask oral questions as
the reader progresses through the prose or written questions may be embedded in the
text, much as is done with Scholastic Reading Inventory items. Response requirements are
important; unless there is some evaluation and self-assessment, there can be no assurance
that the reader is properly targeted and comprehending the material. Students need to
be given a text on which they can practice being a competent reader (Smith, 1973). The
above approach does not complete a fully articulated instructional theory, but its prescription is straightforward. Students need to read more and teachers need to monitor this
reading with some efficient response requirement. One implication of these notions is that
some of the time spent on skill sheets might be better spent reading targeted prose with
concomitant response requirements (Anderson, Hiebert, Scott, and Wilkinson, 1985). This
approach has been supported by the research of Five (1980) and Hiebert (1998).
As the reader improves, new titles with higher text measures can be chosen to match the
growing reader ability. This results in a constantly growing person-measure, thus keeping
the comprehension rate at the most productive level. We need to locate a reader’s “edge”
and then expose the reader to text that plays on that edge. When this approach is followed
in any domain of human development, the edge moves and the capacities of the individual
are enhanced.
What happens when the “edge” is over-estimated and repeatedly exceeded? In physical exertion, if you push beyond the edge you feel pain; if you demand even more from the muscle,
you will experience severe muscle strain or ligament damage. In reading, playing on the edge
is a satisfying and confidence-building activity, but exceeding that edge by over-challenging
readers with out-of-reach materials reduces self-confidence, stunts growth, and results in the
individual “tuning out.” The tremendous emphasis on reading in daily activities makes every
encounter with written text a reconfirmation of a poor reader’s inadequacy.
For individuals to become competent readers, they need to be exposed to text that results
in a comprehension rate of 75% or better. If an 850L reader is faced with an 1100L text
(resulting in a 50% comprehension rate), there will be too much unfamiliar vocabulary


and too much of a load placed on the reader’s tolerance for syntactical complexity for that
reader to attend to meaning. The rhythm and flow of familiar sentence structures will
be interrupted by frequent unfamiliar vocabulary, resulting in inefficient chunking and
short-term memory overload. When readers are correctly targeted, they read fluidly with
comprehension; when incorrectly targeted, they struggle both with the material and with
maintaining their self-esteem. Within the Lexile Framework, there are no poor readers—only
mistargeted readers who are being overchallenged.
Forecasting Comprehension Rates. A reader with a measure of 600L who is given a text
measured at 600L is expected to have a 75% comprehension rate. This 75% comprehension
rate is the basis for selecting text that is targeted to a reader’s ability, but what exactly does
it mean? And what would the comprehension rate be if this same reader were given a text
measured at 350L or one at 850L?
The 75% comprehension rate for a reader-text pairing can be given an operational meaning by imagining the text is carved into item-sized "chunks" of approximately 125–140
words with a question embedded in each chunk. A reader who answers three-fourths of
the questions correctly has a 75% comprehension rate.
Suppose instead that the text and reader measures are not the same. The difference in
Lexiles between reader and text governs comprehension. If the text measure is less than
the reader measure, the comprehension rate will exceed 75%. If the text measure is much
less, the comprehension rate will be much greater. But how much greater? What is the
expected comprehension rate when a 600L reader reads a 350L text?
If all the item-sized chunks in the 350L text had the same calibration, the comprehension
rate implied by the 250L difference between the 600L reader and the 350L text could be
determined using the Rasch model equation (Equation 2 on page 37). This equation describes
the relationship between the measure of a student's level of reading comprehension and the
calibration of the items.
Unfortunately, comprehension rates calculated only by this procedure would be biased
because the calibrations of the slices in ordinary prose are not all the same. The average
difficulty level of the slices and their variability both affect the comprehension rate.
Figure 4 shows the general relationship between reader-text discrepancy and forecasted
comprehension rate. When the reader measure and the text calibration are the same, then
the forecasted comprehension rate is 75%. In the example from the preceding paragraph,
the difference between the reader measure of 600L and the text calibration of 350L is
250L. Referring to Figure 4 and using +250L (reader minus text), the forecasted comprehension rate for this reader-text combination would be 90%.
The subjective experience of 50%, 75%, and 90% comprehension as reported by readers
varies greatly. A 1000L reader reading 1000L text (75% comprehension) reports confidence
and competence. Teachers listening to such a reader report that the reader can sustain the
meaning thread of the text and can read with motivation and appropriate emotion and
emphasis. In short, such readers appear to comprehend what they are reading. A 1000L
reader reading 1250L text (50% comprehension) encounters so much unfamiliar vocabulary and difficult syntax that the meaning thread is frequently lost.

Figure 4. Relationship between reader-text discrepancy and forecasted reading
comprehension rate. [Figure: forecasted comprehension rate (0.00 to 1.00) plotted
against Reader – Text difference in Lexiles, from –1000 to +1000.]
Tables 4 and 5 show comprehension rates calculated for various combinations of reader
measures and text calibrations.

Table 4. Comprehension rates for the same individual with materials of varying
comprehension difficulty.

Reader Measure   Text Calibration   Sample Titles                                Forecasted Comprehension
1000L            500L               Tornado (Byars)                              96%
1000L            750L               The Martian Chronicles (Bradbury)            90%
1000L            1000L              Reader's Digest                              75%
1000L            1250L              The Call of the Wild (London)                50%
1000L            1500L              On the Equality Among Mankind (Rousseau)     25%

Such readers report frustration and seldom choose to read independently at this level of
comprehension. Finally, a 1000L reader reading 750L text (90% comprehension) reports total
control of the text, reads with speed, and experiences automaticity during the reading process.
The primary utility of the Lexile Framework is its ability to forecast what happens when
readers confront text. Every application by a teacher, student, librarian, or parent is a test
of the Lexile Framework's accuracy. The Lexile Framework makes a point prediction every
time a text is chosen for a reader. Anecdotal evidence suggests that the Lexile Framework


Table 5. Comprehension rates of different-ability readers with the same material.

Reader Measure   Calibration of Typical Grade 10 Textbook   Forecasted Comprehension Rate
500L             1000L                                      25%
750L             1000L                                      50%
1000L            1000L                                      75%
1250L            1000L                                      90%
1500L            1000L                                      96%
predicts as intended. That is not to say the forecasted comprehension is error-free. There is
error in text measures, reader measures, and their difference modeled as forecasted comprehension. However, the error is sufficiently small that the judgments about readers, texts, and
comprehension rates are useful.
Performance Standard Proficiency Bands. A growing trend in education is to differentiate
between content standards—curricular frameworks that specify what should be taught at each
grade level—and performance standards—what students must do to demonstrate proficiency
with respect to the specific content. Increasingly, educators and parents want to know more
than just how a student’s performance compares with that of other students: they ask, “What
level of performance does a score represent?” and “How good is good enough?”
The Lexile Framework for Reading, in combination with Scholastic Reading Inventory,
provides a context for examining performance standards from two perspectives—
reader-based standards and text-based standards. Reader-based standards are determined
by examining the skills and knowledge of students identified as being at the requisite level
(the examinee-centered method) or by examining the test items and defining what level of
skills and knowledge the student must have to be at the requisite level (the task-centered
method). A cut score is established that differentiates between students who have the
desired level of skills and knowledge to be considered as meeting the standard and those
who do not. Text-based standards are determined by specifying those texts that students
with a certain level of skills and knowledge (for example, a high school graduate) should be
able to read with a specified level of comprehension. A cut score is established that reflects
this level of ability and is then annotated with benchmark texts descriptive of the standard.
In 1999, four performance standards were set at each grade level in SRI—Below Basic,
Basic, Proficient, and Advanced. Proficient was defined as performance that exhibited
competent academic performance when students read grade-level appropriate text and
could be considered as reading “on Grade Level.” Students performing at this level should
be able to identify details, draw conclusions, and make comparisons and generalizations
when reading materials developmentally appropriate for their nominal grade level.


The standard-setting group consisted of curriculum specialists, test development consultants, and other educators. A general description of the process used by the standard-setting
group to arrive at the final cut scores follows:
• Group members reviewed previously established performance standards
for Grades 1–12 that could be reported in terms of the Lexile scale.
Information that defined and/or described each of the measures was
provided to the group. In addition, for the reader-based standards,
information was provided concerning when the standards were set, the
policy definition of the standards, the performance descriptors of the
standards (where available), the method used to set the standards, and the
type of impact data provided to the panelists.
• Reader-based standards included the following: the Stanford Achievement
Test, Version 9 (Harcourt Brace Educational Measurement, 1997); the
North Carolina End-of-Grade Test (North Carolina Department of Public
Instruction, 1996); and the National Assessment of Educational Progress
(National Assessment Governing Board, 1997).
• Text-based standards included the following: Miami-Dade Public
Schools (Miami, Florida, 1998); text on the National Assessment of
Educational Progress at Grades 4, 8, and 12; text-based materials found
in classrooms and delineated on the Lexile Map; materials associated
with adult literacy (workplace—1100L–1400L; continuing education—1100L–1400L; citizenship—newspapers 1200L–1400L; morals,
ethics, and religion—1400L–1500L; and entertainment—typical
novels 900L–1100L); and grade-level based curriculum materials such
as READ 180 by Scholastic Inc.
• Round 1. Members of the standard-setting group individually studied
the previously established performance standards and determined
corresponding Lexile measures for student performance at the top and
bottom of the “Proficient” standard.
• Round 2. The performance levels identified for each grade in Round
1 were distributed to all members of the standard-setting group. The
group discussed the range of cut scores identified for a grade level
until consensus was reached. The process was repeated for each grade,
1–11. In addition, lower “intervention” points were identified that
could be used to flag results that indicated a student was significantly
below grade level (the “Below Basic” performance standard).
• Round 3. In this round impact data were provided to the members
of the standard-setting group. This information was based on the
reader-based standards that had been previously established (Stanford
Achievement Test, Version 9 national percentiles).


The policy descriptions for each of the performance standard proficiency bands used at
each grade level are as follows:
• Advanced: Students scoring in this range exhibit superior performance
when reading grade-level appropriate text and can be considered as
reading “above Grade Level.”
• Proficient: Students scoring in this range exhibit competent performance
when reading grade-level appropriate text and can be considered as
reading “on Grade Level.” Students performing at this level should be
able to identify details, draw conclusions, and make comparisons and
generalizations when reading materials developmentally appropriate for
the grade level.
• Basic: Students scoring in this range exhibit minimally competent
performance when reading grade-level appropriate text and can be
considered as reading “Below Grade Level.”
• Below Basic: Students scoring in this range do not exhibit minimally
competent performance when reading grade-level appropriate text
and can be considered as reading significantly “Below Grade Level.”
The final cut scores for each grade level in Scholastic Reading Inventory are presented in Table 6.

Table 6. Performance standard proficiency bands for SRI, in Lexiles, by grade.

Grade   Below Basic     Basic           Proficient       Advanced
1       —               99 and Below    100 to 400       401 and Above
2       99 and Below    100 to 299      300 to 600       601 and Above
3       249 and Below   250 to 499      500 to 800       801 and Above
4       349 and Below   350 to 599      600 to 900       901 and Above
5       449 and Below   450 to 699      700 to 1000      1001 and Above
6       499 and Below   500 to 799      800 to 1050      1051 and Above
7       549 and Below   550 to 849      850 to 1100      1101 and Above
8       599 and Below   600 to 899      900 to 1150      1151 and Above
9       649 and Below   650 to 999      1000 to 1200     1201 and Above
10      699 and Below   700 to 1024     1025 to 1250     1251 and Above
11      799 and Below   800 to 1049     1050 to 1300     1301 and Above

Note: The original standards for Grade 2 were revised by Scholastic Inc. (December 1999) and are presented above. The original
standards for Grades 9, 10, and 11 were revised by Scholastic Inc. (January 2000) and are presented above.
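
Table 6 can be applied directly as a lookup. The sketch below encodes three of the grade rows from Table 6 to show how a Lexile measure maps to a proficiency band; the cut scores are taken from the table, while the function itself is only an illustration.

    # Look up a proficiency band using cut scores from Table 6. Only three grades
    # are encoded here for illustration; each entry gives the upper bounds of the
    # Below Basic, Basic, and Proficient bands.
    CUTS = {
        4: (349, 599, 900),      # Grade 4
        6: (499, 799, 1050),     # Grade 6
        10: (699, 1024, 1250),   # Grade 10
    }

    def proficiency_band(grade, lexile):
        below_basic_max, basic_max, proficient_max = CUTS[grade]
        if lexile <= below_basic_max:
            return "Below Basic"
        if lexile <= basic_max:
            return "Basic"
        if lexile <= proficient_max:
            return "Proficient"
        return "Advanced"

    print(proficiency_band(6, 880))    # Proficient
    print(proficiency_band(10, 650))   # Below Basic
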


Using SRI Results
The Lexile Framework for Reading provides teachers and educators with tools to help
them link the results of assessment with subsequent instruction. Tests such as SRI that are
linked to the Lexile scale provide tools for monitoring the progress of students at any time
during the school year.
When a reader takes an SRI test, his or her results are reported as a Lexile measure. This
means, for example, that a student whose reading skills have been measured at 500L is
expected to read with 75% comprehension a book that is also measured at 500L. When the
reader and text are matched by their Lexile measures, the reader is “targeted.” A targeted
reader reports confidence, competence, and control over the text. When a text measure
is 250L above the reader’s measure, comprehension is predicted to drop to 50% and the
reader experiences frustration and inadequacy. Conversely, when a text measure is 250L
below the reader’s measure, comprehension is predicted to increase to 90% and the reader
experiences total control and automaticity.
Lexile Framework. The Lexile Framework for Reading is a tool that can help determine
the reading level of written material—from a book, to a test item, to a magazine article,
to a Web site, to a textbook. After test results are converted into Lexile measures, readers
can be matched with materials on their own level. More than 100,000 books, 80 million
periodical articles, and many newspapers have been leveled using this tool to assist in selecting reading materials.
Developed by the psychometric research company MetaMetrics, Inc., the Lexile Framework was funded in part by a series of grants from the National Institute of Child Health
and Human Development. The Lexile Framework makes provisions for students who
read below or beyond their grade level. See the Lexile Framework Map in Appendix 1
for fiction and nonfiction titles, leveled reading samples, and approximate grade ranges.
A Lexile measure is the specific number assigned to any text. A computer program called
the Lexile Analyzer® computes it. The Lexile Analyzer carefully examines the complete text
to measure such characteristics as sentence length and word frequency—characteristics
that are highly related to overall reading comprehension. The Lexile Analyzer then reports a
Lexile measure for the text.
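
As a rough illustration of the kinds of characteristics the Lexile Analyzer examines, the sketch below computes a mean sentence length and a crude word-frequency summary for a passage. It is emphatically not the Lexile Analyzer, it does not produce Lexile measures, and its tiny word list is a stand-in for the large frequency corpus the Analyzer actually uses.

    # Rough illustration of the text characteristics mentioned above (sentence
    # length and word frequency). This is NOT the Lexile Analyzer and does not
    # produce a Lexile measure; the common-word list is a toy stand-in for a
    # real frequency corpus.
    import re

    COMMON_WORDS = {"the", "a", "and", "of", "to", "in", "was", "it", "at", "is"}

    def text_features(text):
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text.lower())
        mean_sentence_length = len(words) / len(sentences)
        rare_word_share = sum(w not in COMMON_WORDS for w in words) / len(words)
        return mean_sentence_length, rare_word_share

    sample = "The dog ran to the gate. It barked at the mail carrier."
    print(text_features(sample))   # (6.0, 0.5)
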
Using the Lexile Framework to Select Books. Teachers, parents, and students can use the
tools provided by the Lexile Framework to plan instruction. When teachers provide parents
and students with lists of titles that match the students’ Lexile measures, they can then work
together to choose appropriate titles that also match the students’ interest and background
knowledge. The Lexile Framework does not prescribe a reading program; it is a tool that gives
educators more control over the variables involved when they design reading instruction. The Lexile
Framework yields multiple opportunities for use in a variety of instructional activities.
After becoming familiar with the Lexile Framework, teachers are likely to think of a
variety of additional creative ways to use this tool to match students to books that they find
challenging but not frustrating.


The Lexile Framework is a system that helps match readers with literature appropriate for
their reading skills. When reading a book within their Lexile range (50L above to 100L
below their Lexile measure), readers should comprehend enough of the text to make sense
of it, while still being challenged enough to maintain interest and learning.
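
The targeted range described above (from 100L below to 50L above a reader's measure) is easy to compute. A minimal sketch follows; the booklist titles and measures are invented for the example.

    # Compute a reader's targeted range (100L below to 50L above the measure)
    # and filter a booklist by it. The titles and measures are invented.
    def lexile_range(reader_lexile):
        return reader_lexile - 100, reader_lexile + 50

    def books_in_range(reader_lexile, booklist):
        low, high = lexile_range(reader_lexile)
        return [title for title, measure in booklist if low <= measure <= high]

    booklist = [("Invented Title A", 620), ("Invented Title B", 710), ("Invented Title C", 840)]
    print(lexile_range(700))               # (600, 750)
    print(books_in_range(700, booklist))   # ['Invented Title A', 'Invented Title B']
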
Remember, there are many factors that affect the relationship between a reader and a book.
These factors include content, age of the reader, interest, suitability of the text, and text
difficulty. The Lexile measure of a text, a measure of text difficulty, is a good starting point
for the selection process; other factors should then be considered. The Lexile measure
should never be the sole factor considered when selecting a text.
Helping Students Set Appropriate Learning Goals. Students’ Lexile measures can be
used to identify reading materials that they are likely to comprehend with 75% accuracy.
Students can set goals for improving their reading comprehension, and plan clear strategies
to reach those goals, using literature from the appropriate Lexile ranges. Students can be
retested using SRI during the school year to monitor their progress toward their goals.
Monitoring Progress Toward Reading Program Goals. As students’ Lexile measures increase,
their reading comprehension ability increases, and the set of reading materials they can
comprehend at 75% accuracy expands. Many school districts are required to write school
improvement plans that include measurable goals. Schools also write grant applications in
which they are required to state how they will monitor progress of the intervention funded
by the grant. For example, schools that receive Reading Excellence Act funds can use the
Lexile Framework for evaluation purposes. Schools can use student-level and district-level
Lexile information to monitor and evaluate interventions designed to improve reading skills.
Examples of measurable goals and clearly related strategies for reading intervention
programs might include:
Goal: At least half of the students will improve their reading comprehension abilities by 100L after one year’s use of an intervention.
Goal: Students’ attitudes about reading will improve after reading
10 books at their 75% comprehension rate.
These examples of goals emphasize the fact that the Lexile Framework is not an intervention,
but a tool to help educators plan instruction and measure the success of the reading program.
Including Parents in the Educational Process. Teachers can use the Lexile Framework to
engage parents in the following sample exchanges: “Your child will be able to read with at
least 75% comprehension these materials from the next grade level”; “Your child will need
to improve by 400–500 Lexiles to prepare for college in the next few years. Here is a list of
appropriate titles your child can choose from for reading this summer.”
Challenging the Best Readers. A variety of instructional programs are available for the
poorest readers, but few resources are available to help teachers challenge their best readers.
The Lexile Framework links reading comprehension levels to reading material for the
entire range of reading abilities and will help teachers identify age-appropriate reading
material to challenge the best readers.

Studies have shown that students who succeed in school without being challenged often
develop poor work habits and unrealistic expectations of effortless success as adults. Even
though these problems are not likely to be evidenced until the reader is beyond school age,
providing appropriate-level curriculum to the best students may be as important as it is for
the poorest-reading students.
Improving Students’ Reading Fluency. Educational researchers have found that students
who spend a minimum of three hours a week reading at their own level develop reading
fluency that leads to improved mastery. Researchers have also found that students who read
age-appropriate materials with a high level of comprehension also learn to enjoy reading.
Teaching Learning Strategies by Controlling Comprehension Match. The Lexile Framework permits teachers to intentionally under- or over-target students when they want
students to work on fluency and automaticity or new skills. Metacognitive ability has
been well documented to play an important role in reading comprehension performance.
When teachers know the level of texts that would challenge a group of readers, they can
systematically target instruction that will allow students to encounter difficult text in a
controlled fashion. Teachers can model appropriate learning strategies for students, such as
rereading or rephrasing text in one’s own words, so that students can then learn what to do
when comprehension breaks down. Then students can practice metacognitive strategies on
selected text while the teacher monitors their progress.
Teachers can use Lexiles to guide a struggling student toward texts at the lower end of the
student's Lexile range (which extends from 100L below to 50L above the student's Lexile
measure). Similarly, advanced
students can be adequately challenged by reading texts at the midpoint of their Lexile
range, or slightly above. Challenging new topics may be approached in the same way.
Reader-focused adjustment of the learning experience relates to the student’s motivation
and purpose. If a student is highly motivated for a particular reading task, the teacher
may suggest books higher in the student’s Lexile range. If the student is less motivated
or intimidated by a reading task, material at the lower end of his or her Lexile range can
provide the comprehension support to keep the student from feeling overwhelmed.
Targeting Instruction to Students’ Abilities. To encourage optimal progress with reading,
teachers need to be aware of the difficulty level of the text relative to a student’s reading
level. A text that is too difficult serves to undermine a student’s confidence and diminishes
learning itself. A text that is too easy fosters bad work habits and unrealistic expectations.
When students confront new kinds of texts, their introduction can be softened and made
less intimidating by guiding students to easier reading. On the other hand, students who
are comfortable with a particular genre or format can be challenged with more difficult
material, which will prevent boredom and promote the greatest improvement in
vocabulary and comprehension skills.


To become better readers, students need to be continually challenged—they need to be
exposed to less common and more difficult vocabulary in meaningful contexts. A 75%
comprehension rate provides an appropriate level of challenge. If text is too difficult for
a reader, the result is frustration and a probable dislike for reading. If text is too easy, the
result is often boredom. Reading at an appropriately challenging level promotes growth and
literacy by providing the optimal balance. Reading just 20 minutes a day can be vital.
Applying Lexiles Across the Curriculum. Over 450 publishers Lexile their titles, enabling
educators to link all the different components of the curriculum to target instruction more
effectively. Equipped with a student’s Lexile measure, teachers can connect him or her to
books and newspaper and magazine articles that have Lexile measures (visit www.Lexile.
com for more details).
Using Lexiles in the Classroom
• Develop individualized reading lists that are tailored to provide
appropriately challenging reading.
• Enhance thematic teaching by building a bank of titles at varying
levels that not only support the theme, but also provide a way for all
students to participate in the theme successfully.
• Sequence reading materials according to their difficulty. For example,
choose one book a month for use as a read-aloud throughout the school
year, then increase the difficulty of the books throughout the year. This
approach is also useful for core programs or textbooks organized in
anthology format. (Educators often find that they need to rearrange the
order of the anthologies to best meet their students’ needs.)
• Develop a reading folder that goes home with students and returns
weekly for review. The folder can contain a reading list of books
within the student’s Lexile range, reports of recent assessments, and a
parent form to record reading that occurs at home.
• Choose texts lower in a student’s Lexile range when factors make the
reading situation more challenging, threatening, or unfamiliar. Select
texts at or above a student’s range to stimulate growth, when a topic
holds high interest for a student, or when additional support such as
background teaching or discussion is provided.
• Use the Lexile Titles Database (at www.Lexile.com) to support book
selection and create booklists within a student’s Lexile range to inform
students’ choices of texts.
• Use the Lexile Calculator (at www.Lexile.com) to gauge expected reading comprehension at different Lexile measures for readers and texts.


Using Lexiles in the Library
• Label books with Lexile measures to help students find interesting
books at their reading level.
• Compare student Lexile levels with the Lexile levels of the books and
periodicals in the library to help educators analyze and develop the
collection to more fully meet the needs of all students.
• Use the Lexile Titles Database (at www.Lexile.com) to support book
selection and create booklists within a student’s Lexile range to help
educators guide student reading selections.
Using Lexiles at Home
• Ensure that each child gets plenty of reading practice, concentrating
on material within his or her Lexile range. Parents can ask their child’s
teacher or school librarian to print a list of books in their child’s range
or search the Lexile Titles Database.
• Communicate with the child’s teacher and school librarian about the
child’s reading needs and accomplishments. They can use the Lexile
scale to describe their assessment of the child’s reading ability.
• When a reading assignment proves too challenging for a child, use
activities to help. For example, review the words and definitions from
the glossary and the study questions at the end of a chapter before the
child reads the text. Afterwards, be sure to return to the glossary and
study questions to make certain the child understands the material.
• Celebrate a child’s reading accomplishments. The Lexile Framework
provides an easy way for readers to track their own growth. Parents
and children can set goals for reading—following a reading schedule,
reading a book with a higher Lexile measure, trying new kinds of
books and articles, or reading a certain number of pages per week.
When children reach the goal, make it an occasion!
Limitations of the Lexile Framework. Just as variables other than temperature affect
comfort, variables other than semantic and syntactic complexity affect reading comprehension ability. A student’s personal interests and background knowledge are known to
affect comprehension. We do not dismiss the importance of temperature simply because it
alone does not dictate the comfort of an environment. Similarly, though the information
communicated by the Lexile Framework is valuable, the inclusion of other information
enhances instructional decisions. Parents and students should have the opportunity to give
input regarding students’ interests and background knowledge when test results are linked
to instruction.


SRI Results and Grade Levels. Lexile measures do not translate precisely to grade levels.
Any grade will encompass a range of readers and reading materials. A fifth-grade classroom
will include some readers who are far ahead of the rest (about 250L above) and some readers who are far below the rest (about 250L below). To say that some books are “just right”
for fifth graders assumes that all fifth graders are reading at the same level. The Lexile
Framework can be used to match readers with texts at whatever level is appropriate.
Just because a student is an excellent reader does not mean that he or she would comprehend a text typical of a higher grade level. Without the requisite background knowledge,
a student will still struggle to make sense of the text. A high Lexile measure for a grade
indicates only that the student can read grade-level appropriate materials at a higher level
of comprehension (say 90%).
The real power of the Lexile Framework is in tracking readers’ growth—wherever they
may be in the development of their reading skills. Readers can be matched with texts that
they are forecasted to read with 75% comprehension. As readers grow, they can be matched
with more demanding texts. And, as texts become more demanding, readers grow.


Development of Scholastic Reading Inventory
Scholastic Reading Inventory was developed to assess a student’s overall level of reading
comprehension based on the Lexile Framework. SRI is an extension of the test development work begun in the 1980s and 1990s on the Early Learning Inventory (MetaMetrics,
1995) and the Lexile Framework which was funded by a series of grants from the National
Institute of Child Health and Human Development. The Early Learning Inventory was
developed for use in Grades 1 through 3 as an alternative to many standardized assessments
of reading comprehension; it was neither normed nor timed and was designed to examine
a student’s ability to read text for meaning.
Item development and test development are interrelated processes; for the purpose of this
document they will be treated as independent activities. A bank of approximately 3,000
items was developed for the initial implementation of SRI. Two subsequent item development phases were completed in 2002 and 2003. SRI was first developed as a print-based
assessment. Two parallel forms of the assessment (A and B) were developed during 1998
and 1999. Also in 1998, Scholastic decided to develop a computer-based, interactive
version of the assessment. The interactive Version 1 of SRI was launched in fall 1999.
Subsequent versions were launched between 1999 and 2003 with Version 1.0/Enterprise
Edition launched in winter 2006.

Development of the SRI Item Bank
Passage Selection. Passages selected for use on Scholastic Reading Inventory came from “real
world” reading materials that students may encounter both in and out of the classroom.
Sources included school textbooks, literature, and periodicals from a variety of interest areas
and material written by authors of different backgrounds. The following criteria were used
to select passages:
• the passage must develop one main idea or contain one complete
piece of information,
• understanding of the passage is independent of the information that
comes before or after the passage in the source text, and
• understanding of the passage is independent of prior knowledge not
contained in the passage.
With the aid of a computer program, item writers examined prose excerpts of 125 words
in length that included a minimum of three sentences and were calibrated to within
100L of the source text. This process, called source targeting, uses information from an
entire text to ensure that the estimated syntactic complexity and semantic demand of an
excerpted passage are consistent with the “true” reading demand of the source text. From
these passages the item writers were asked to select four to five that could be developed as
items. If it was necessary to shorten or lengthen the passage in order to meet the criteria
for selection, the item writer could immediately recalibrate the passage to ensure that it was
still targeted within 100L of the complete text.
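
The selection criteria above lend themselves to a simple screening check. In the sketch below, measure_lexile is a hypothetical stand-in for the calibration tool the item writers used, and the 15-word tolerance around the 125-word target is an assumption; the three-sentence minimum and the 100L tolerance come from the text above.

    # Screening check for a candidate passage: roughly 125 words, at least three
    # sentences, and a calibration within 100L of the source text.
    # measure_lexile is a hypothetical stand-in for the calibration tool;
    # the 15-word tolerance is an assumption.
    import re

    def passes_source_targeting(passage, source_lexile, measure_lexile):
        words = re.findall(r"[A-Za-z']+", passage)
        sentences = [s for s in re.split(r"[.!?]+", passage) if s.strip()]
        near_125_words = abs(len(words) - 125) <= 15
        enough_sentences = len(sentences) >= 3
        within_100L = abs(measure_lexile(passage) - source_lexile) <= 100
        return near_125_words and enough_sentences and within_100L
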

Item Writing—Format. The traditional cloze procedure for item creation is based on
deleting every fifth to seventh word (or some variation) regardless of its part of speech
(Bormuth, 1967, 1968, 1970). Certain categories of words can also be selectively deleted.
Selective deletions have shown greater instructional effects than random deletions.
Evidence shows that cloze items reveal both text comprehension and language mastery
levels. Some of the research on metacognition shows that better readers use more strategies
(and, more importantly, appropriate strategies) when they read. Cloze items have been
shown to require more rereading of the passage and increased use of context clues.
Scholastic Reading Inventory consists of embedded completion items. Embedded completion
items are an extension of the cloze format, similar to fill-in-the-blank. When properly
written, this item type directly assesses a reader’s ability to draw inferences and establish
logical connections among the ideas in a passage. SRI presents a reader with a passage of
approximately 30 to 150 words in length. Passages are shorter for beginning readers and
longer for more advanced readers. The passage is then response illustrated—a statement
with a word or phrase missing is added at the end of the passage, followed by four options.
From the four presented options, which may be a single word or phrase, a reader is asked to
select the “best” option to complete the statement.
Items were written so that the correct response is not stated directly in the passage, and
the correct answer cannot be suggested by the item itself. Rather, the examinee must
determine the correct answer by comprehending the passage. The four options derive from
the Lexile Vocabulary Analyzer word list that corresponds with the Lexile measure of the
passage. In this format, all options are semantically and syntactically appropriate completions of the sentence, but one option is unambiguously “best” when considered in the
context of the passage. This format is “well-suited for testing a student’s ability to evaluate”
(Haladyna, 1994, p. 62). In addition, this format is useful instructionally.
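
One way to picture the embedded completion format described above is as a small data record: an authentic passage, a completion statement with a blank, four options, and the index of the single best option. The structure below is only an illustration of the format, not Scholastic's internal item representation, and the sample content is invented.

    # Illustrative record for an embedded completion item. This is not
    # Scholastic's internal item format; the sample content is invented.
    from dataclasses import dataclass

    @dataclass
    class EmbeddedCompletionItem:
        passage: str             # authentic excerpt, roughly 30 to 150 words
        statement: str           # final statement with a blank, easier than the passage
        options: tuple           # four plausible completions drawn from leveled vocabulary
        best_option: int         # index of the unambiguously best option
        lexile_calibration: int  # item difficulty on the Lexile scale

    item = EmbeddedCompletionItem(
        passage="(Invented example.) The hikers checked the darkening sky, packed their gear, and started down the trail early.",
        statement="The hikers wanted to beat the afternoon ______.",
        options=("storm", "crowd", "heat", "traffic"),
        best_option=0,
        lexile_calibration=600,
    )
    print(item.options[item.best_option])   # storm
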
The statement portion of the embedded completion item can assess a variety of skills
related to reading comprehension: paraphrase information in the passage; draw a logical
conclusion based on information in the passage; make an inference; identify a supporting
detail; or make a generalization based on information in the passage. The statements were
written to ensure that by reading and comprehending the passage, the reader can select the
correct option. When the statement is read by itself, each of the four options is plausible.
There are two main advantages to using embedded completion items on SRI. The first
is that the reading difficulty of the statement and the four options is easier than the most
difficult word in the passage. The second advantage of the embedded completion format
is that only authentic passages are used, with no attempt to control the length of sentences
or level of vocabulary in the passage. The embedded completion statement is as short as or
shorter than the briefest sentence in the passage. These two advantages help ensure that the
statement is easier than the accompanying passage.


Item Writing—Training. Item writers for Scholastic Reading Inventory were classroom
teachers and other educators who had experience with the everyday reading ability of
students at various levels. In 1998 and 1999, twelve individuals developed items for Forms
A and B of SRI and the second set of items. In 2003, six individuals developed items for
the third set. Using individuals with classroom teaching experience helped to ensure that
the items are valid measures of reading comprehension. Item writers were provided with
training materials concerning the embedded completion item format and guidelines for
selecting passages, developing statements, and selecting options. The item writing materials
also contained model items that illustrated the criteria used to evaluate items and corrections based on those criteria. The final phase of item writer training was a short practice
session with three items.
Item writers were provided vocabulary lists to use during statement and option development. The vocabulary lists were compiled by MetaMetrics based on research to determine
the Lexile measures of words (i.e., their difficulty). The Lexile Vocabulary Analyzer (LVA)
determines the Lexile measure of a word using a set of features related to the source text and
the word’s prevalence in the MetaMetrics corpus (MetaMetrics, 2006b). The rationale used to
compile the vocabulary lists was that the words should be part of a reader’s “working” vocabulary if they had likely been encountered in easier text (those with lower Lexile measures).
Item writers were also given extensive training related to “sensitivity” issues. Part of the
item writing materials addressed these issues and identified areas to avoid when selecting
passages and developing items. The following areas were covered: violence and crime,
depressing situations/death, offensive language, drugs/alcohol/tobacco, sex/attraction,
race/ethnicity, class, gender, religion, supernatural/magic, parent/family, politics, animals/
environment, and brand names/junk food. These materials were developed based on
standards published by CTB/McGraw-Hill for universal design and fair access—equal
treatment of the sexes, fair representation of minority groups, and the fair representation of
disabled individuals (Guidelines for Bias-Free Publishing).
Item writers were first asked to develop 10 items independently. The items were then
reviewed for item format, grammar, and sensitivity. Based on this review, item writers
received feedback and more training if necessary. Item writers were then asked to develop
additional items.
Item Writing—Review. All items were subjected to a two-stage review process. First, items
were reviewed and edited according to the 19 criteria identified in the item-writing materials and for sensitivity issues. Approximately 25% of the items developed were rejected for
various reasons. Where possible, items were edited and maintained in the item bank.
Items were then reviewed and edited by a group of specialists representing various perspectives
—test developers, editors, and curriculum specialists. These individuals examined each item for
sensitivity issues and the quality of the response options. During the second stage of the item
review process, items were either “approved as presented,” “approved with edits,” or “deleted.”
Approximately 10 percent of the items written were approved with edits or deleted at this
stage. When necessary, item writers received additional feedback and training.


SRI Item Bank Specifications. Three sets of items were developed between 1998 and 2003.
Set 1 was developed in 1998 and used with the print and online versions of the test. Item
specifications required that the majority of the items be developed for the 500L through
1100L range (70% of the total number of items; 10% per Lexile zone) with 15% below this
range and 15% above this range. This range is typical of the majority of readers in Grades
3 through 9. Set 2 was written in fall 2002 and followed the same specifications. Set 3
was written in spring and summer of 2003. This set of items was developed for a different
purpose—to provide items that would be interesting and developmentally appropriate for
students in middle and high school, but written at a lower Lexile level (below the 50th
percentile) than would typically be administered to students in these grades. A total of
4,879 items were submitted to Scholastic for inclusion in SRI. Table 7 presents the number
of items developed for each item set by Lexile zone.

Table 7. Distribution of items in SRI item bank by Lexile zone.

Lexile Zone           | Item Set 1 (Original Item Bank) | Item Set 2 | Item Set 3 ("Hi-Lo" Item Bank)
BR (0L and Below)     | 22    | 15    | --
5L to 100L            | 10    | 6     | --
105L to 200L          | 45    | 13    | --
205L to 300L          | 55    | 23    | 16
305L to 400L          | 129   | 30    | 91
405L to 500L          | 225   | 58    | 169
505L to 600L          | 314   | 96    | 172
605L to 700L          | 277   | 91    | 170
705L to 800L          | 332   | 83    | 131
805L to 900L          | 294   | 83    | 76
905L to 1000L         | 294   | 83    | 37
1005L to 1100L        | 335   | 84    | 2
1105L to 1200L        | 304   | 88    | --
1205L to 1300L        | 212   | 76    | --
1305L to 1400L        | 110   | 79    | --
1405L to 1500L        | 42    | 57    | --
1500+L (Above 1500L)  | 15    | 35    | --
Total                 | 3,015 | 1,000 | 864


SRI Computer-Adaptive Algorithm
Schoolwide tests are often administered at grade level to large groups of students in order
to make decisions about students and schools. Consequently, since all students in a grade
are given the same test, each test must include a wide range of items to cover the needs of
both low- and high-achieving students. These wide-range tests are often unable to measure
some students as precisely as a more focused assessment could.
To provide the most accurate measure of a student’s level of reading comprehension, it is
important to assess the student’s reading level as precisely as possible. One method is to use as
much background information as possible to target a specific test level for each student. This
information can consist of the student’s grade level, a teacher’s judgment concerning the reading level of the student, or the student’s standardized test results (e.g., scale scores, percentiles,
stanines). This method requires the test administrator to administer multiple test forms during
one test session, which can be cumbersome and may introduce test security problems.
With the widespread availability of computers in classrooms and schools, another more efficient
method is to administer a test tailored to each student—Computer-Adaptive Testing (CAT).
Computer-adaptive testing is conducted individually with the aid of a computer algorithm
to select each item so that the greatest amount of information about the student’s ability is
obtained before the next item is selected. SRI employs such a methodology for testing online.
What are the benefits of CAT? Many benefits of computer-adaptive testing have
been described in the literature (Wainer et al., 1990; Stone and Lunz, 1994; Wang and
Vispoel, 1998). Each test is tailored to the student. Item selection is based on the student’s
ability and responses to each question. The benefits include the following:
• increased efficiency through reduced testing time and targeted testing;
• immediate scoring. A score can be reported as soon as the student
finishes the test; and
• more control over the test item bank. Because the test forms do not
have to be physically developed, printed, shipped, administered, or
scored, a broader range of forms can be used.
In addition, studies conducted by Hardwicke and Yoes (1984) and Schinoff and Steed
(1988) provide evidence that below-level students tend to prefer computer-adaptive tests
because they do not discourage students by presenting a large number of questions that are
too hard for them (cited in Wainer, 1992).
Bayesian Paradigm and the Rasch Model. Bayesian methodology provides a paradigm for
combining prior information with current data, both subject to uncertainty, to produce an
estimate of current status, which is again subject to uncertainty. Uncertainty is modeled
mathematically using probability.
Within SRI, prior information can be the student’s current grade level, the student’s
performance on previous assessments, or teacher estimates of the student’s abilities. The
current data in this context is the student’s performance on SRI, which can be summarized
as the number of items answered correctly from the total number of items attempted.

Both prior information and current data are represented by probability models reflecting
uncertainty. The need to incorporate uncertainty when modeling prior information is
intuitively clear. The need to incorporate uncertainty when modeling test performance
is perhaps less intuitive. When the test has been taken and scored, and assuming that no
scoring errors were made, the performance, i.e., the raw score, is known with certainty.
Uncertainty arises because test performance is associated with, but not wholly determined
by, the ability of the student, and it is that ability, rather than the test performance per se,
that we are trying to measure. Thus, though the test results reflect the test performance
with certainty, we remain uncertain about the ability that produced the performance.
The uncertainty associated with prior knowledge is modeled by a probability distribution
for the ability parameter. This distribution is called the prior distribution, and it is usually
represented by a probability density function, e.g., the normal bell-shaped curve. The
uncertainty arising from current data is modeled by a probability function for the data
when the ability parameter is held fixed. When roles are reversed so that the data are held
fixed and the ability parameter is allowed to vary, this function is called the likelihood
function. In the Bayesian paradigm, the posterior probability density for the ability
parameter is proportional to the product of the prior density and the likelihood, and this
posterior density is used to obtain the new ability estimate along with its uncertainty.
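As a concrete illustration of this paradigm (a minimal sketch in the logit metric with made-up numbers, not Scholastic's production algorithm), a normal prior on ability can be combined with the Rasch likelihood of a response pattern on a grid of ability values to produce a posterior estimate and its uncertainty:

import numpy as np

def rasch_p(b, d):
    """Rasch probability of a correct response for ability b and item difficulty d (logits)."""
    return 1.0 / (1.0 + np.exp(-(b - d)))

def posterior_ability(prior_mean, prior_sd, difficulties, responses):
    """Combine a normal prior with the Rasch likelihood; return posterior mean and SD."""
    grid = np.linspace(prior_mean - 4 * prior_sd, prior_mean + 4 * prior_sd, 401)
    prior = np.exp(-0.5 * ((grid - prior_mean) / prior_sd) ** 2)      # prior density on the grid
    likelihood = np.ones_like(grid)
    for d, x in zip(difficulties, responses):                         # x is 1 (correct) or 0
        p = rasch_p(grid, d)
        likelihood *= p if x else (1.0 - p)
    posterior = prior * likelihood
    posterior /= posterior.sum()
    mean = float(np.sum(grid * posterior))
    sd = float(np.sqrt(np.sum((grid - mean) ** 2 * posterior)))
    return mean, sd

# Prior centered at 0 logits; three items answered correct, correct, incorrect.
print(posterior_ability(0.0, 1.0, [-0.5, 0.2, 0.8], [1, 1, 0]))

Here the posterior mean plays the role of the updated ability estimate, and the posterior SD plays the role of the updated uncertainty.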
The computer-adaptive algorithm used with SRI is also based on the Rasch (one-parameter)
item response theory model. Classical test theory has two basic shortcomings: (1) the use of
item indices whose values depend on the particular group of examinees from which they
were obtained, and (2) the use of examinee ability estimates that depend on the particular choice of items selected for a test. The basic premises of item response theory (IRT)
overcome these shortcomings by predicting the performance of an examinee on a test item
based on a set of underlying abilities (Hambleton and Swaminathan, 1985). The relationship
between an examinee’s item performance and the set of traits underlying item performance
can be described by a monotonically increasing function called an item characteristic curve
(ICC). This function specifies that as the level of the trait increases, the probability of a correct response to an item increases.
The conversion of observations into measures can be accomplished using the Rasch (1980)
model, which requires that item calibrations and observations (count of correct items)
interact in a probability model to produce measures. The Rasch item response theory
model expresses the probability that a person (n) answers a certain item (i) correctly by the
following relationship:
    P_{ni} = \frac{e^{b_n - d_i}}{1 + e^{b_n - d_i}}        (Equation 2)

where d_i is the difficulty of item i (i = 1, 2, …, number of items);
b_n is the ability of person n (n = 1, 2, …, number of persons);
b_n - d_i is the difference between the ability of person n and the difficulty of item i; and
P_{ni} is the probability that examinee n responds correctly to item i
(Hambleton and Swaminathan, 1985; Wright and Linacre, 1994).


This measurement model assumes that item difficulty is the only item characteristic that
influences the examinee’s performance such that all items are equally discriminating in
their ability to identify low-achieving persons and high-achieving persons (Bond and Fox,
2001; and Hambleton, Swaminathan, and Rogers, 1991). In addition, the lower asymptote
is zero, which specifies that examinees of very low ability have zero probability of correctly
answering the item. The Rasch model has the following assumptions: (1) unidimensionality—only one ability is assessed by the set of items; and (2) local independence—when
abilities influencing test performance are held constant, an examinee’s responses to any pair
of items are statistically independent (conditional independence, i.e., the only reason an
examinee scores similarly on several items is because of his or her ability, not because the
items are correlated). The Rasch model is based on fairly restrictive assumptions, but it is
appropriate for criterion-referenced assessments. Figure 5 shows the relationship between
the difference of a person’s ability and an item’s difficulty and the probability that a person
will respond correctly to the item.

Figure 5. The Rasch Model—the probability person n responds correctly to item i.
[Figure: logistic curve showing the probability of a correct response (0.0 to 1.0) as a function of the difference b(n) - d(i), plotted from -4 to 4.]
An assumption of the Rasch model is that the probability of a response to an item is
governed by the difference between the item calibration (d_i) and the person's measure (b_n).
As the graph in Figure 5 shows, when the ability of the person matches the difficulty
of the item (b_n - d_i = 0), the person has a 50% probability of responding
to the item correctly. With the Lexile Framework, 75% comprehension is modeled by
subtracting a constant.
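As a quick numerical check of Equation 2 (an illustration only, not part of the SRI algorithm), the modeled probability is 0.5 when ability equals difficulty, and it reaches 0.75 when the ability exceeds the item difficulty by ln(3), roughly 1.1 logits; how the Lexile Framework expresses that offset on the Lexile scale is not reproduced here.

import math

def rasch_p(b, d):
    # Equation 2: probability of a correct response for ability b and difficulty d (logits)
    return math.exp(b - d) / (1 + math.exp(b - d))

print(rasch_p(1.0, 1.0))              # ability equals difficulty -> 0.5
print(rasch_p(math.log(3), 0.0))      # ability exceeds difficulty by ln(3) -> 0.75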
The number correct for a person is the probability of a correct response summed over
the number of items. When the measure of a person greatly exceeds the calibration
(difficulties) of the items (b_n - d_i > 0), then the expected probabilities will be high and
the sum of these probabilities will yield an expectation of a high number correct. Conversely, when the item calibrations generally exceed the person measure (b_n - d_i < 0),
the modeled probabilities of a correct response will be low and a low number correct is
expected.
Thus, Equation 2 can be rewritten in terms of a person’s number of correct responses on
a test
    O_p = \sum_{i=1}^{L} \frac{e^{b_n - d_i}}{1 + e^{b_n - d_i}}        (Equation 3)

where O_p is the number of person p's correct responses and L is the number of items on
the test.
When the sum of the correct responses and the item calibrations (d_i) is known, an iterative
procedure can be used to find the person measure (b_n) that will make the sum of the modeled probabilities most similar to the number of correct responses. One of the key features
of the Rasch item response model is its ability to place both persons and items on the
same scale. It is possible to predict the odds of two individuals answering an item correctly
based on knowledge of the relationship between the abilities of the two individuals. If one
person has an ability measure double that of another person (as measured by b—the ability
scale), then he or she has double the odds of answering the item correctly.
Equation 3 has several distinguishing characteristics:
• The key terms from the definition of measurement are placed in a
precise relationship to one another.
• The individual responses of a person to each item on an instrument
are absent from the equation. The only piece of data that survives the
act of observation is the "count correct" (O_p), thus confirming that
the observation is “sufficient” for estimating the measure.
For any set of items the possible raw scores are known. When it is possible to know the
item calibrations (either theoretically or empirically from field studies), the only parameter
that must be estimated in Equation 3 is the measure that corresponds to each observable count correct. Thus, when the calibrations (d_i) are known, a correspondence table
linking observation and measure can be constructed without reference to data from other
individuals.
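A minimal sketch of that iterative step (illustrative only; the item calibrations below are made up and this is not the SRI scoring code): given known calibrations, find the measure whose expected number correct under Equation 3 matches each observable raw score, which yields exactly such a correspondence table.

import math

def expected_correct(b, difficulties):
    """Equation 3: sum of Rasch probabilities for ability b over all items."""
    return sum(math.exp(b - d) / (1 + math.exp(b - d)) for d in difficulties)

def measure_for_raw_score(raw, difficulties, lo=-8.0, hi=8.0, tol=1e-6):
    """Bisection search for the ability whose expected number correct equals raw."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_correct(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Correspondence table for a hypothetical five-item test (difficulties in logits);
# the extreme raw scores (0 and 5) have no finite estimate and are omitted.
calibrations = [-1.0, -0.5, 0.0, 0.5, 1.0]
for raw in range(1, len(calibrations)):
    print(raw, round(measure_for_raw_score(raw, calibrations), 2))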
How does CAT work with SRI? As described earlier, SRI uses a three-phase
approach to assess a student’s level of reading ability: Start, Step, Stop. During test administration, the computer adapts the test continually according to the student’s responses to the
questions. The student starts the test; the test steps up or down according to the student’s
performance; and, when the computer has enough information about the student’s reading
level, the test stops.
The first phase, Start, determines the best point on the Lexile scale to begin testing the
student. Figure 6 presents a flowchart of the “start” phase of SRI.


Figure 6: The "start" phase of the SRI computer-adaptive algorithm.
[Flowchart summarizing the "start" phase: student data (grade level, other test scores, teacher judgment) is input; a decision is made whether to administer the Locator Test based on whether other test scores or teacher judgments were entered; first-time students take a practice test with questions at the 10th percentile of grade level, with interface help from the teacher if the practice test is not passed; Bayesian priors are then determined (ability = b, uncertainty = σ) and a randomly selected item is administered at the 75% success level (difficulty of item = b).]

Prior to testing, the teacher or administrator inputs information into the computer-adaptive
algorithm that controls the administration of the test. The student’s identification number
and grade level must be input; prior standardized reading results (e.g., a Lexile measure from
SRI-print) and the teacher’s estimate of the student’s reading level may also be input. This
information is used to determine the best starting point (Reader Measure) for the student.
The more information input into the algorithm, the better targeted the beginning of the
test. Research has shown that well-targeted tests report less error in student scores than
poorly-targeted tests.
Within the Bayesian algorithm, initial Reader Measures (ability [b]) are determined by the
following information: grade level, prior SRI test score, or teacher estimate of the student’s
reading level. If only grade level is entered, the student starts SRI with a Reader Measure
equal to the 50th percentile for his or her grade. If a prior SRI test score and administration date are entered, then this Lexile measure is used as the student’s Reader Measure.


The Reader Measure is adjusted based on the amount of growth expected per month since
the prior test was administered. The amount of growth expected in Lexiles per month
is based on research by MetaMetrics, Inc. related to cross-sectional norms. If the teacher
enters an estimated reading level, then the Lexile measure associated with the corresponding percentile
for the grade is used as the student's Reader Measure. Teachers can enter the following
estimated reading levels: far below grade level (5th percentile), below grade level (25th
percentile), on grade level (50th percentile), above grade level (75th percentile), and far
above grade level (95th percentile).
Initial uncertainties (sigma ) are determined by a prior Reader Measure (if available),
when the measure was collected, and the reliability of the measure. If a prior Reader
Measure is unavailable or if teacher estimation is the basis of the prior Reader Measure,
then maximum uncertainty (225L) is assumed. This value is based on prior research
conducted by MetaMetrics, Inc. (2006a). If a prior Reader Measure is available, then the
elapsed time, measured in months, is used to prorate the maximum uncertainty associated
with three years of elapsed time.
If the administration is the student’s first time interacting with SRI, three practice items
are presented. The practice items are selected at the 10th percentile for the grade level.
The practice items are not counted in the student’s score; their purpose is solely to
familiarize the student with the embedded completion item format and the test’s internal
navigation.
If the student is enrolled in middle or high school (Grade 7 or above) and no prior reading
ability information (i.e., other test scores or teacher estimate) is provided, a short Locator Test is
administered. The purpose of the Locator Test is to ensure that students who read significantly
below grade level receive a valid Lexile measure from the first administration of SRI. When
a student is initially mis-targeted, it is difficult for the algorithm to produce a valid Lexile
measure given the logistical parameters of the program. The items administered as the Locator
Test are 500L below the “on grade level” (50th percentile) estimated reading level.
For subsequent administrations of SRI, the Reader Measure and uncertainty are the prior
values adjusted for time. The Reader Measure is adjusted based on the amount of growth
expected per month during the elapsed time. The elapsed time (measured in months) is
used to prorate the maximum uncertainty associated with three years of elapsed time.
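The start logic just described can be sketched as follows. This is an illustration only: the grade-level percentile tables, the expected monthly growth values, and the exact proration rule are MetaMetrics/Scholastic data not published in this guide, so the numbers below (GRADE_P50, GROWTH_PER_MONTH) are placeholders.

MAX_UNCERTAINTY = 225        # Lexiles; maximum prior uncertainty
MAX_ELAPSED_MONTHS = 36      # after three years a reader returns to maximum uncertainty

GRADE_P50 = {3: 600, 7: 900}          # hypothetical 50th-percentile starting measures
GROWTH_PER_MONTH = {3: 12, 7: 5}      # hypothetical expected Lexile growth per month

def starting_values(grade, prior_measure=None, prior_uncertainty=None, months_since_prior=0):
    """Return the (Reader Measure, uncertainty) pair used to begin an administration."""
    if prior_measure is None:
        # Only grade level is known: start at the grade's 50th percentile, maximum uncertainty.
        return GRADE_P50[grade], MAX_UNCERTAINTY
    if prior_uncertainty is None:
        prior_uncertainty = MAX_UNCERTAINTY
    months = min(months_since_prior, MAX_ELAPSED_MONTHS)
    # Prior measure adjusted for the growth expected during the elapsed time.
    measure = prior_measure + GROWTH_PER_MONTH[grade] * months
    # Uncertainty drifts from its prior value back toward the maximum over three years.
    uncertainty = prior_uncertainty + (MAX_UNCERTAINTY - prior_uncertainty) * months / MAX_ELAPSED_MONTHS
    return measure, uncertainty

print(starting_values(7))                                                          # grade level only
print(starting_values(7, prior_measure=850, prior_uncertainty=90, months_since_prior=6))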
The second phase, Step, controls the selection of questions presented to the student.
Figure 7 presents a flowchart of the “step” phase of SRI.
If only the student’s grade level was input during the first phase, then the student is
presented with a question that has a Lexile measure at the 50th percentile for his or her
grade. If more information about the student’s reading ability was input during the first
phase, then the student is presented with a question that is nearer his or her true ability.
If the student responds correctly to the question, then he or she is presented with a question
that is slightly more difficult. If the student responds incorrectly to the question, then he
or she is presented with a question that is slightly easier. After the student responds to each
question, his or her SRI score (Lexile measure) is recomputed.

Figure 7: The "step" phase of the SRI computer-adaptive algorithm.
[Flowchart summarizing the "step" phase: a randomly selected item at the 75% success level (difficulty of item = b) is administered; after a correct or an incorrect response, a new ability estimate (b) is found iteratively; if the number incorrect is 0, b is set to the new estimate; otherwise the uncertainty (σ) is also adjusted.]

Questions are randomly selected from all possible items that are within 10L of the student’s
current Reader Measure. If necessary, the range of items available for selection can be
broadened to 50L. The frequency with which items appear is controlled by marking an
item “Do Not Use” once it has been administered to a student. The item is then unavailable for selection in the next three test administrations.
If the student is in Grade 6 or above and his or her Lexile measure is below the specified
minimum measure for the grade (15th percentile), then he or she is administered items
from the Hi-Lo pool. This set of items has been identified from all items developed for
SRI based on the following criteria: (1) developmentally appropriate for middle and high
school students (high interest), and (2) Lexile text measure between 200L and 1000L (low
difficulty).
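A simplified sketch of this selection rule follows. The item-bank representation and the "Do Not Use" bookkeeping (a per-student dictionary counting locked-out administrations) are assumptions made for illustration, not Scholastic's data structures, and the switch to the Hi-Lo pool is omitted.

import random

def select_item(item_bank, reader_measure, do_not_use, window=10, widened=50):
    """Randomly select an item near the current Reader Measure (Lexiles)."""
    def candidates(width):
        return [(item_id, difficulty) for item_id, difficulty in item_bank
                if abs(difficulty - reader_measure) <= width
                and do_not_use.get(item_id, 0) == 0]
    # Broaden the 10L window to 50L if necessary (assumes the widened pool is never empty).
    pool = candidates(window) or candidates(widened)
    item_id, difficulty = random.choice(pool)
    do_not_use[item_id] = 3          # unavailable for the next three administrations
    return item_id, difficulty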
The final phase, Stop, controls the termination of the test. Figure 8 presents a flowchart of
the “stop” phase of SRI.
Approximately 20 items are presented to every student. The exact number of questions
administered depends on how the student responds to the items as they are presented. In
addition, how well-targeted the test is at its start affects the number of questions presented
to the student.


Figure 8: The "stop" phase of the SRI computer-adaptive algorithm.
[Flowchart summarizing the "stop" phase: after each item, the stopping conditions (number of items answered, number of correct/incorrect responses, amount of elapsed time) are checked; if they are not satisfied, another randomly selected item at the 75% success level (difficulty of item = b) is administered and the uncertainty (σ) is adjusted; if they are satisfied, the Reader Measure is converted to Lexiles and the test stops.]

Well-targeted tests begin with less measurement error and, subsequently, the student will
be asked to respond to fewer items. After the student responds to each item, his or her
Reader Measure is calculated through an iterative process using the Rasch model
(Equation 2, page 48).
The testing session ends when one of the following conditions is met:
• the student has responded to at least 20 items and has responded
correctly to at least 6 items and incorrectly to at least 3 items,
• the student has responded to 30 items, and
• the elapsed test administration time is at least 40 minutes and the
student has responded to at least 10 items.
At this time the student’s resulting Lexile measure and uncertainty are converted to Lexiles.
Lexile measures are reported as a number followed by a capital “L.” There is no space
between the measure and the “L,” and measures of 1,000 or greater are reported without
a comma (e.g., 1050L). Within SRI, Lexile measures are reported to the nearest whole
number. As with any test score, uncertainty in the form of measurement error is present.
Lexile measures below 100L are reported as “BR” for “Beginning Reader.”
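The stopping rule and the reporting conventions just described translate directly into code (a sketch; the argument names are illustrative, not Scholastic's API):

def should_stop(items_answered, num_correct, num_incorrect, elapsed_minutes):
    """True when any one of the three stopping conditions is met."""
    return ((items_answered >= 20 and num_correct >= 6 and num_incorrect >= 3)
            or items_answered >= 30
            or (elapsed_minutes >= 40 and items_answered >= 10))

def format_lexile(measure):
    """Nearest whole number, no comma, 'L' suffix; measures below 100L report as 'BR'."""
    measure = round(measure)
    return "BR" if measure < 100 else f"{measure}L"

print(should_stop(22, 15, 7, 18))    # True: first condition satisfied
print(format_lexile(1050.4))         # '1050L'
print(format_lexile(60))             # 'BR'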


SRI Algorithm Testing During Development
Feasibility Study. SRI was field tested with 879 students in Grades 3, 4, 5, and 7 from
four schools in North Carolina and Florida. The schools were selected according to the
following criteria: school location (urban versus rural), school size (small, medium, or large
based on the number of students and staff), and availability of Macintosh computers within
a laboratory setting.
• In School 1 (suburban K–5), 72.1% of the students were Caucasian, 22.5%
African American, 4.8% Hispanic, 0.3% Asian, and 0.2% Native American. The computer lab was equipped with Power Mac G3s with 32 MB
RAM. A total of 28 computers were in the lab arranged in 4 rows with a
teacher station. There were also two video monitor displays in the lab.
• In School 2 (rural K–5), 60.5% of the students were Caucasian, 29.7%
African American, 8.6% Hispanic, 0.7% Asian, and 0.5% Native
American. Of the students sampled, 60% were male and 40% were
female. The computer lab was equipped with Macintosh LC 580s.
• School 3 (urban 6–8) was predominately Caucasian (91%), with 5%
of the students classified as African American, 2% of the students
Hispanic, and 2% Asian. At the school, 17% of the students qualified
for the Free and Reduced Price Lunch Program, 14% were classified
as having a disability, 6% were classified as gifted, and 0.1% were classified as limited English proficient. Of the students sampled, 49% were
male and 51% were female.
• School 4 (urban K–5) was predominately Caucasian (86%), with 14% of
the students classified as minority. Of the students sampled, 58% were
male and 42% were female. At the school 46% of the students qualified
for the Free and Reduced Price Lunch Program, 21% were classified as
having a disability, 4% were classified as gifted, and 0.1% were classified
as limited English proficient. Technology was integrated into all subjects
and content areas, and the curriculum included a variety of hands-on activities and projects. The school had a school-wide computer
network and at least one computer for every three students. Multimedia
development stations with video laser and CD-ROM technology were
also available.
The purpose of this phase of the study was to examine the algorithm and the software used
to administer the computer-adaptive test. In addition, other reading test data was collected
to examine the construct validity of the assessment.
Based on the results of the first administration in School 1, it was determined that the item
selection routine was not selecting the optimal item each time. As a result, the calculation
of the ability estimate was changed to occur after the administration of each item, and a
specified minimum number of responses was required before the program terminated.


The Computer-Adaptive Test Survey was completed by 255 students (Grade 3, N = 71;
Grade 5, N = 184). There were no significant differences by grade (Grade 3 versus Grade
5) or by school within grade (Grade 5: School 1 versus School 2) in the responses to any of
the questions on the survey.
Question 1 asked students if they had understood how to take the computer-adaptive test.
On a scale with 0 being “no” and 2 being “yes,” the mean was 1.83. Students in Grades 3
and 5 responded the same way. This information was also confirmed in the written student
comments and in the discussion at the end of the session. The program was easy to use and
follow.
Question 2 asked students whether they used the mouse, the keyboard, or both to respond
to the test. Of the 254 students responding to this question, 76% (194) used the mouse,
20% (52) used the keyboard, and 3% (8) used both the keyboard and the mouse. Several
students commented that they liked the computer-adaptive test because it allowed them to
use the mouse.
Question 7 asked students which testing format they preferred—paper-and-pencil,
computer-adaptive, or both formats equally. Sixty-five percent of the sample liked the
computer-adaptive test format better. There were no significant differences between the
responses for students in Grade 3 compared to those in Grade 5. The results for each grade
and the total sample are presented in Table 8.

Table 8. Student responses to Question 7: preferred test format.

Grade | Paper-and-Pencil Format | Computer-Adaptive Format | Both Formats Equally
3     | 9%                      | 71%                      | 20%
5     | 17%                     | 62%                      | 21%
Total | 15%                     | 65%                      | 21%

Students offered a variety of reasons for liking the computer-adaptive test format better:
✓ "I liked that you don't have to turn the pages."
✓ "I liked that you didn't have to write."
✓ "I liked that you only had to point and click."
✓ "I liked the concept that you don't have a certain amount of questions to answer."
✓ "You don't write and don't have to worry about lead breaking or black stuff on your fingers."
✓ "I like working on computers."
✓ "Because you didn't have to circle the answer with a pencil and your hand won't hurt."


Of the 21% of students who liked both test formats equally, several students provided reasons:
✓ “They’re about the same thing except on the computer your
hand doesn’t get tired.”
✓ “On number 7, I put about the same because I like just the
point that we don’t have to write.”
A greater percentage of Grade 5 students (17%) than Grade 3 students (9%) stated that
they preferred the paper-and-pencil test format. This may be explained by the further
development of test-taking strategies by the Grade 5 students. Their reasons for preferring
the paper-and-pencil version generally dealt with test-taking strategies that the paper format
supports, such as skipping questions, looking back at the passage, and reviewing and changing answers:
✓ “I liked the computer test, but I like paper-and-pencil because
I can check over.”
✓ “Because I can skip a question and look back on the story.”
Four students stated that they preferred the paper-and-pencil format because of the
computer environment:
✓ “I liked the paper-and-pencil test better because you don’t
have to stare at a screen with a horrible glare!”
✓ “Because it would be much easier for me because I didn’t feel
comfortable at a computer.”
✓ “Because it is easier to read because my eyesight is bad.”
✓ “I don’t like reading on a computer.”
Questions 4 and 5 on the survey dealt with the student’s test-taking strategies—the ability to
skip questions and to review and change responses. Question 4 asked students whether they
had skipped any of the questions on the computer-adaptive test. Seventy-three percent (73%)
of the students skipped at least one item on the test. From the students' comments, this was
one of the features of the computer-adaptive test that they really liked. Several students commented that they were not allowed enough passes. One student stated, “It’s [the CAT] very
easy to control and we can pass on the hard ones.” Another student stated that, “I like the part
where you could pass some [questions] where you did not understand.”
Question 5 asked students whether they went back and changed answers when they took
tests on paper. On a scale with 0 being “never” and 2 being “always,” the mean was 0.98.
According to many students' comments, this was one of the features of the computer-adaptive test that they did not like.
Several students commented on the presentation of the text in the computer-adaptive test
format.
✓ “I liked the way you answered the questions. I like the way it
changes colors.”
✓ “The words keep getting little, then big.”

Questions 3 and 6 dealt with the student’s perceptions of the computer-adaptive test’s difficulty. The information from these questions was not analyzed due to the redevelopment
of the algorithm for selecting items.
When SRI was field tested with this sample of students in Grades 3, 4, 5, and 7 (N = 879)
during the 1998–1999 school year, other measures of reading were collected. Tables 9 and
10 present the correlations between SRI and other measures of reading comprehension.

Table 9. Relationship between SRI and SRI-print version.

Grade | N   | Correlation with SRI-print version
3     | 226 | 0.72
4     | 104 | 0.74
5     | 93  | 0.73
7     | 122 | 0.62
Total | 545 | 0.83

Table 10. Relationship between SRI and other measures of reading comprehension.

Test                                             | Grade | N   | Correlation
North Carolina End-of-Grade Tests (NCEOG)        | 3     | 109 | 0.73
North Carolina End-of-Grade Tests (NCEOG)        | 4     | 104 | 0.67
Pinellas Instructional Assessment Program (PIAP) | 3     | 107 | 0.62
Comprehensive Test of Basic Skills (CTBS)        | 5     | 110 | 0.74
Comprehensive Test of Basic Skills (CTBS)        | 7     | 117 | 0.56

From the results it can be concluded that SRI measures a construct similar to that
measured by other standardized tests designed to measure reading comprehension. The
magnitude of the within-grade correlations with SRI-print version is close to that of the
observed correlations for parallel test forms (i.e., alternate forms reliability), thus suggesting
that the different tests are measuring the same construct. The NCEOG, PIAP, and CTBS
tests consist of passages followed by traditional multiple-choice items, and SRI consists of
embedded completion multiple-choice items. Given the differences in format, the limited
range of scores (within-grade), and the small sample sizes, the correlations suggest that the
four assessments are measuring a similar construct.


Comparison of SRI v3.0 and SRI v4.0. The newest edition of SRI, the Enterprise Edition
of the suite of Scholastic technology products, is built on Industry-Standard Technology
that is smarter and faster, featuring SAM (Scholastic Achievement Manager)—a robust
new management system. SRI provides district-wide data aggregation capabilities to help
administrators meet AYP accountability requirements and provide teachers with data to
differentiate instruction effectively.
Prior to the integration of Version 4.0/Enterprise Edition (April/May 2005), a study was
conducted to compare results from version 3.0 with those from Version 4.0 (Scholastic,
May 2005). A sample of 144 students in Grades 9 through 12 participated in the study.
Each student was randomly assigned to one of four groups: (A) Test 1/v4.0; Test 2/v3.0;
(B) Test 1/v3.0; Test 2/v4.0; (C) Test 1/v3.0; Test 2/v3.0; and (D) Test 1/v4.0; Test 2/v4.0.
Each student’s grade level was set and verified prior to testing. For students in groups (C)
and (D), two accounts were established for each student to ensure that the starting criteria
were the same for both test administrations. The final sample of students (N = 122) consisted of students who completed both assessments. Table 11 presents the summary results
from the two testing groups that completed different versions of SRI.

Table 11. Descriptive statistics for each test administration group in the comparison study, April/May 2005.

Test Group                  | N  | Test 1 Mean (SD)  | N  | Test 2 Mean (SD)  | Difference
A: Test 1/v4.0; Test 2/v3.0 | 32 | 1085.00 (179.13)  | 32 | 1103.34 (194.72)  | 18.34
B: Test 1/v3.0; Test 2/v4.0 | 30 | 1114.83 (198.24)  | 30 | 1094.67 (232.51)  | 20.16

p > .05

The differences between the two versions of the test for each group were not significant
(paired t-test) at the .05 level. It can be concluded that scores from versions 3.0 and 4.0
for groups (A) and (B) were not significantly different. A modest correlation of 0.69 was
observed between the two sets of scores (v3.0 and v4.0). Given the small sample size
(N = 62) that took the two different versions, the correlation meets expectations.
Locator Test Introduction Simulations. In 2005, with the move to SRI Enterprise Edition,
Scholastic introduced the Locator Test. The purpose of the Locator Test is to ensure that
students who read significantly below grade level (grade level = 50th percentile) receive
a valid Lexile measure from the first administration of SRI. Two studies were conducted to
examine whether the Locator Test was serving the purpose for which it was designed.


Study 1. The first study was conducted in September 2005 and consisted of simulating
the responses of approximately 90 test administrations “by hand.” The results showed that
students who failed the Locator Test could get BR scores (Scholastic, 2006b, p.1).
Study 2. The second study was conducted in 2006 and consisted of the simulation of
6,900 students under five different test conditions. Each simulated student took all five tests
(three tests included the Locator Test and two excluded it).
The first simulation tested whether students who perform as well on the Locator Test as they
perform on the rest of SRI can expect to receive higher or lower scores (Trial 1) than if
they never receive the Locator Test (Trial 4). A total of 4,250 simulated students participated
in this study, and a correlation of .96 was observed between the two test scores (with and
without the Locator Test). The results showed that performance on the Locator Test did not
affect SRI scores for students who had reading abilities above BR (N = 4,150; Wilcoxon
Rank Sum Test = 1.7841e07; p = .0478). In addition, the proportion of students who
scored BR from each administration was examined. As expected, the proportion of
students who scored BR without the Locator Test was 12.17% (840 out of 6,900) compared
to 22.16% (1,529 out of 6,900) who scored BR with the Locator Test. The results confirmed
the hypothesis that the Locator Test allows students to start SRI at a much lower Reader
Measure and, thus, descend to the BR level with more reliability.
The third simulation tested whether students who failed the Locator Test (Trial 3) received
basically the same score as when they had a prior Reader Measure 500L below grade level
and were administered SRI without the Locator Test (Trial 5). The results showed that failing
the Locator Test produced results similar to inputting a “below basic” estimated reading level
(N = 6,900; Wilcoxon Rank Sum Test = 4.7582e07; p = .8923).


Reliability
To be useful, a piece of information should be reliable—stable, consistent, and dependable. In reality, all test scores include some measure of error (or level of uncertainty). This
uncertainty in the measurement process is related to three factors: the statistical model used
to compute the score, the questions used to determine the score, and the condition of the
test taker when the questions used to determine the score were administered. Once the
level of uncertainty in a test score is known, then it can be taken into account when the
test results are used.
Reliability, or the consistency of scores obtained from an assessment, is a major consideration in evaluating any assessment procedure. Two sources of uncertainty have been
examined for SRI—text error and reader error.

Standard Error of Measurement
Uncertainty and Standard Error of Measurement. There is always some uncertainty about a
student’s true score because of the measurement error associated with test unreliability.
This uncertainty is known as the standard error of measurement (SEM). The magnitude of
the SEM of an individual student’s score depends on the following characteristics of
the test:
• the number of test items—smaller standard errors are associated with
longer tests;
• the quality of the test items—in general, smaller standard errors are
associated with highly discriminating items for which correct answers
cannot be obtained by guessing; and
• the match between item difficulty and student ability—smaller standard errors are associated with tests composed of items with difficulties approximately equal to the ability of the student (targeted tests).
(Hambleton, Swaminathan, and Rogers, 1991).

SRI was developed using the Rasch one-parameter item response theory model to relate
a reader’s ability to the difficulty of the items. There is a unique amount of measurement
error due to model misspecification (violation of model assumptions) associated with each
score on SRI. The computer algorithm that controls the administration of the assessment
uses a Bayesian procedure to estimate each student’s reading comprehension ability. This
procedure uses prior information about students to control the selection of questions and
the recalculation of each student’s reading ability after responding to each question.
Compared to a fixed-item test where all students answer the same questions, a computer-adaptive test produces a different test for every student. When students take a computer-adaptive test, they all receive approximately the same raw score or number of items correct.
This occurs because all students are answering questions that are targeted for their unique
ability—not questions that are too easy or too hard. Because each student takes a unique
test, the error associated with any one score or student is also unique.
The initial uncertainty for an SRI score is 225L (within-grade standard deviation from
previous research conducted by MetaMetrics, Inc.). When a student retests with SRI, the
uncertainty of his or her score is the uncertainty that resulted from the previous assessment adjusted for the time elapsed between administrations. An assumption is made that
after three years without a test, the student’s ability should again be measured at maximum
uncertainty. Average SEMs are presented in Table 12. These values can be used as a general
“rule of thumb” when reviewing SRI results. It bears repeating that because each student
takes a unique test and the results rely partly on prior information, the error associated with any one
score or student is also unique.

Table 12. Mean SEM on SRI by extent of prior knowledge.

Number of Items | SEM (Grade Level Known) | SEM (Grade and Reading Level Known)
15              | 104L                    | 58L
16              | 102L                    | 57L
17              | 99L                     | 57L
18              | 96L                     | 57L
19              | 93L                     | 57L
20              | 91L                     | 56L
21              | 89L                     | 56L
22              | 87L                     | 55L
23              | 86L                     | 54L
24              | 84L                     | 54L

As can be seen from the information in Table 12, when the test is well-targeted (grade level
and prior reading level of the student are known), the student can respond to fewer test
questions and not increase the error associated with the measurement process. When only
the grade level of the student is known, the more questions the student responds to, the less
error in the score associated with the measurement process.

Sources of Measurement Error—Text
SRI is a theory-referenced measurement system for reading comprehension. Internal
consistency and other traditional indices of test quality are not critical considerations.
What matters is how well individual and group performances conform to theoretical
expectations. The Lexile Framework states an invariant and absolute requirement that the
performance of items and test takers must match.


Measurement is the process of converting observations into quantities via theory. There
are many sources of error in the measurement process: the model used to relate observed
measurements to theoretical ones, the method used to determine measurements, and the
moment when measurements are made.
To determine a Lexile measure for a text, the standard procedure is to process the entire
text. All pages in the work are concatenated into an electronic file that is processed by a
software package called the Lexile Analyzer (developed by MetaMetrics, Inc.). The Analyzer
“slices” the text file into as many 125-word passages as possible, analyzes the set of slices,
and then calibrates each slice in terms of the logit metric. That set of calibrations is then
processed to determine the Lexile measure corresponding to a 75% comprehension rate.
The analyzer uses the slice calibrations as test item calibrations and then solves for the
measure corresponding to a raw score of 75% (e.g., 30 out of 40 correct, as if the slices
were test items). Obviously, the measure corresponding to a raw score of 75% on Goodnight
Moon (300L) slices would be lower than the measure corresponding to a comparable raw
score on USA Today (1200L) slices. The Lexile Analyzer automates this process, but what
“certainty” can be attached to each text measure?
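The slicing step just described can be sketched as follows (an illustration only; the regression equation the Analyzer uses to calibrate each slice is proprietary and is not reproduced here):

def slice_text(full_text, words_per_slice=125):
    """Cut a concatenated text into as many complete 125-word slices as possible."""
    words = full_text.split()
    n_slices = len(words) // words_per_slice
    return [" ".join(words[k * words_per_slice:(k + 1) * words_per_slice])
            for k in range(n_slices)]

sample = "word " * 520                  # a 520-word text yields four complete slices
print(len(slice_text(sample)))          # -> 4

The resulting slice calibrations are then treated as item calibrations, and solving for the measure at a 75% raw score mirrors the correspondence-table sketch given earlier for Equation 3.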
Using the bootstrap procedure to examine error due to the text samples, the above analysis
could be repeated. The result would be an identical text measure to the first because there
is no sampling error when a complete text is calibrated.
There is, however, another source of error that increases the uncertainty about where a
text is located on the Lexile Map. The Lexile Theory is imperfect in its calibration of the
difficulty of individual text slices. To examine this source of error, 200 items that had been
previously calibrated and shown to fit the model were administered to 3,026 students in
Grades 2 through 12 in a large urban school district. The sample of students was socioeconomically and ethnically diverse. For each item the observed item difficulty calibrated
from the Rasch model was compared with the theoretical item difficulty calibrated from
the regression equation used to calibrate texts. A scatter plot of the data is presented in
Figure 9.
The correlation between the observed and the theoretical calibrations for the 200 items
was .92 and the root mean square error was 178L. Therefore, for an individual slice of text
the measurement error is 178L.
The standard error of measurement associated with a text is a function of the error
associated with one slice of text (178L) and the number of slices that are calibrated from a
text. Very short books have larger uncertainties than longer books. A book with only four
slices would have an uncertainty of 89 Lexiles whereas a longer book such as War and Peace
(4,082 slices of text) would only have an uncertainty of three Lexiles (Table 13).
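These quoted uncertainties are consistent with treating the text measure as the mean of independently calibrated slices, so that the standard error is 178L divided by the square root of the number of slices; a quick check against the values cited above and in Table 13:

import math

def text_sem(n_slices, slice_error=178):
    # Standard error of a text measure based on n independently calibrated slices
    return slice_error / math.sqrt(n_slices)

print(round(text_sem(4)))        # four slices -> 89 Lexiles
print(round(text_sem(46)))       # The Stories Julian Tells -> 26 Lexiles (Table 13)
print(round(text_sem(4082)))     # War and Peace -> 3 Lexiles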
Study 2. A second study was conducted by Stenner, Burdick, Sanford, and Burdick (2006)
during 2002 to examine ensemble differences across items. An ensemble consists of all
of the items that could be developed from a selected piece of text. The Lexile measure of a
piece of text is the mean difficulty of its item ensemble.


Figure 9. Scatter plot between observed item difficulty and theoretical item difficulty.
[Figure: scatter plot of observed difficulties versus theoretical difficulties, with both axes ranging from -500 to 2000.]

Table 13. Standard errors for selected values of the length of the text.

Title                           | Number of Slices | Text Measure | Standard Error of Text
The Stories Julian Tells        | 46               | 520L         | 26L
Bunnicula                       | 102              | 710L         | 18L
The Pizza Mystery               | 137              | 620L         | 15L
Meditations of First Philosophy | 206              | 1720L        | 12L
Metaphysics of Morals           | 209              | 1620L        | 12L
Adventures of Pinocchio         | 294              | 780L         | 10L
Red Badge of Courage            | 348              | 900L         | 10L
Scarlet Letter                  | 597              | 1420L        | 7L
Pride and Prejudice             | 904              | 1100L        | 6L
Decameron                       | 2431             | 1510L        | 4L
War and Peace                   | 4082             | 1200L        | 3L

Participants. Participants in this study were students from four school districts in a large
southwestern state. These students were participating in a larger study that was designed to
assess reading comprehension with the Lexile scale. The total sample included 1,186 Grade
3 students, 893 Grade 5 students, and 1,531 Grade 8 students. The mean tested abilities
of the three samples were similar to the mean tested abilities of all students in each grade
on the state reading assessment. Though 3,610 students participated in the study, the data
records for only 2,867 of these students were used for determining the ensemble item
difficulties presented in this paper. The students were administered one of four forms at
each grade level. The reduction in sample size is because one of the four forms was created
using the same ensemble items as another form. For consistency of sample size across
forms, the data records from this fourth form were not included in the ensemble study.
Instrument. Thirty text passages were response-illustrated by three different item writing
teams resulting in three items nested within each of 30 passages for a total of 90 items. All
three teams employed a similar item-writing protocol. The ensemble items were spiraled
into test forms at the grade level (3, 5, or 8) that most closely corresponded with the item’s
theoretical calibration.
Winsteps (Wright & Linacre, 2003) was used to estimate item difficulties for the 90
ensemble study items. Of primary interest in this study was the correspondence between
theoretical text calibrations and ensemble means, and the consequences that theory misspecification holds for text measure standard errors.
Results. Table 14 presents the ensemble study data in which three independent teams wrote
one item for each of thirty passages for ninety items. Observed ensemble means taken over
the three ensemble item difficulties for each passage are given along with an estimate of the
within ensemble standard deviation for each passage.
The difference between passage text calibration and observed ensemble mean is provided
in the last column. The RMSE from regressing observed ensemble means on text calibrations is 110L. Figures 10a and 10b show plots of observed ensemble means compared to
theoretical text calibrations.
Note that some of the deviations about the identity line occur because the ensemble means are
poorly estimated, given that each mean is based on only three items. The bottom panel,
Figure 10b, depicts simulated data in which an error term [distributed N(0, σ = 64L)] is
added to each theoretical value. Contrasting the two plots in Figures 10a and 10b provides
a visual depiction of the difference between regressing observed ensemble means on theory
and regressing "true" ensemble means on theory. An estimate of the RMSE when "true"
ensemble means are regressed on the Lexile Theory is 64L (√(110² - 89²) ≈ √4,038 = 63.54).
This is the average error at the passage level when predicting "true" ensemble
means from the Lexile Theory.
Since the RMSE equal to 64L applies to the expected error at the passage/slice level, a text
made up of n_i slices would have an expected error of 64/√n_i. Thus, a short periodical article of 500 words (n_i = 4) would have a SEM of 32L (64/√4), whereas a much
longer text like the novel Harry Potter and the Chamber of Secrets (880L, Rowling, 2001)
would have a SEM of 2L (64/√900). Table 15 contrasts the SEMs computed using the
old method with SEMs computed using the Lexile Framework for several books across a
broad range of Lexile measures.
As can be seen in Table 15, the uncertainty associated with the measurement of the reading
demand of the text is small.
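The new-method standard errors in Table 15 follow the same rule with the ensemble-based slice error of 64L in place of 178L; a quick check against the values quoted above and in Table 15:

import math

def new_text_sem(n_slices, slice_rmse=64):
    # Standard error of a text measure using the 64L theory-misspecification error per slice
    return slice_rmse / math.sqrt(n_slices)

print(round(new_text_sem(4)))       # 500-word article -> 32L
print(round(new_text_sem(900)))     # Harry Potter and the Chamber of Secrets -> 2L
print(round(new_text_sem(257)))     # The Boy Who Drank Too Much (Table 15) -> 4L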


Table 14. Analysis of 30 item ensembles providing an estimate of the theory misspecification error.

Item Number | Theory (T) | Team A | Team B | Team C | Mean (O)^a | SD^b | Within Ensemble Variance | T-O
1           | 400L       | 456    | 553    | 303    | 437        | 126  | 15,909                   | 37
2           | 430L       | 269    | 632    | 704    | 535        | 234  | 54,523                   | 105
3           | 460L       | 306    | 407    | 483    | 399        | 88   | 7,832                    | 61
4           | 490L       | 553    | 508    | 670    | 577        | 84   | 6,993                    | 87
11          | 510L       | 267    | 602    | 468    | 446        | 169  | 28,413                   | 64
5           | 540L       | 747    | 825    | 654    | 742        | 86   | 7,332                    | 202
6           | 569L       | 909    | 657    | 582    | 716        | 172  | 29,424                   | 147
7           | 580L       | 594    | 683    | 807    | 695        | 107  | 11,386                   | 115
8           | 620L       | 897    | 805    | 497    | 733        | 209  | 43,808                   | 113
9           | 720L       | 584    | 850    | 731    | 722        | 133  | 17,811                   | 2
12          | 720L       | 953    | 587    | 774    | 771        | 183  | 33,386                   | 51
13          | 745L       | 791    | 972    | 490    | 751        | 244  | 59,354                   | 6
14          | 770L       | 855    | 1017   | 958    | 944        | 82   | 6,717                    | 74
16          | 770L       | 1077   | 1095   | 893    | 1022       | 112  | 12,446                   | 252
15          | 790L       | 866    | 557    | 553    | 659        | 180  | 32,327                   | 131
21          | 812L       | 902    | 1133   | 715    | 917        | 209  | 43,753                   | 105
10          | 820L       | 967    | 740    | 675    | 794        | 153  | 23,445                   | 26
17          | 850L       | 747    | 864    | 674    | 762        | 96   | 9,257                    | 88
22          | 866L       | 819    | 809    | 780    | 803        | 20   | 419                      | 63
18          | 870L       | 974    | 1197   | 870    | 1014       | 167  | 28,007                   | 144
19          | 880L       | 1093   | 733    | 692    | 839        | 221  | 48,739                   | 41
23          | 940L       | 945    | 1057   | 965    | 989        | 60   | 3,546                    | 49
24          | 960L       | 1124   | 1205   | 1170   | 1166       | 41   | 1,653                    | 206
25          | 1010L      | 926    | 1172   | 899    | 999        | 151  | 22,733                   | 11
20          | 1020L      | 888    | 1372   | 863    | 1041       | 287  | 82,429                   | 21
26          | 1020L      | 1260   | 987    | 881    | 1043       | 196  | 38,397                   | 23
27          | 1040L      | 1503   | 1361   | 1239   | 1368       | 132  | 17,536                   | 328
28          | 1060L      | 1109   | 1091   | 981    | 1061       | 69   | 4,785                    | 1
29          | 1150L      | 1014   | 1104   | 1055   | 1058       | 45   | 2,029                    | 92
30          | 1210L      | 1270   | 1291   | 1014   | 1193       | 156  | 24,204                   | 17

Total MSE = Average of (T - O)² = 12,022; Pooled within variance for ensembles = 7,984; Remaining between ensemble variance = 4,038; Theory misspecification error = 64L.
Bartlett's test for homogeneity of variance produced an approximate chi-square statistic of 24.6 on 29 degrees of freedom and sustained the null hypothesis that the variances are equal across ensembles.
Note. All data are reported in Lexiles.
a. Mean (O) is the observed ensemble mean.
b. SD is the standard deviation within ensemble.


Figure 10a. Plot of observed ensemble means and theoretical calibrations (RMSE = 111L).
[Figure: observed ensemble means (0 to 1600) plotted against theoretical calibrations (0 to 1600).]

Figure 10b. Plot of simulated "true" ensemble means and theoretical calibrations (RMSE = 64L).
[Figure: simulated "true" ensemble means (0 to 1600) plotted against theoretical calibrations (0 to 1600).]
Sources of Measurement Error—Item Writers
Another source of uncertainty in a test measure is due to the writers who develop the test
items. Item writers are trained to develop items according to a set of procedures, but item
writers are individuals and therefore subject to differences in behavior. General objectivity
requires that the origin and unit of measure be maintained independently of the instant
and particulars of the measurement process (Stenner, 1994). SRI purports to yield generally
objective measures of reader performance.

Table 15. Old method text readabilities, resampled SEMs, and new SEMs for selected books.

Book                            | Number of Slices | Lexile Measure | Resampled Old SEM^a | New SEM
The Boy Who Drank Too Much      | 257              | 447L           | 102                 | 4
Leroy and the Old Man           | 309              | 647L           | 9                   | 4
Angela and the Broken Heart     | 157              | 555L           | 118                 | 5
The Horse of Her Dreams         | 277              | 768L           | 126                 | 4
Little House by Boston Bay      | 235              | 852L           | 126                 | 4
Marsh Cat                       | 235              | 954L           | 125                 | 4
The Riddle of the Rosetta Stone | 49               | 1063L          | 70                  | 9
John Tyler                      | 223              | 1151L          | 89                  | 4
A Clockwork Orange              | 419              | 1260L          | 268                 | 3
Geometry and the Visual Arts    | 481              | 1369L          | 140                 | 3
The Patriot Chiefs              | 790              | 1446L          | 139                 | 2
Traitors                        | 895              | 1533L          | 140                 | 2

a. Three slices selected for each replicate: one slice from the first third of the book, one from the middle third, and one from the last third. Resampled 1,000 times. SEM = SD of the resampled distribution.

Prior to working on SRI, five item writers attended a four-hour training session that included
an introduction to the Lexile Framework, rules for writing native-Lexile format items, practice
in writing items, and instruction in how to use the Lexile Analyzer software to calibrate test
items. Each item writer was instructed to write 60 items uniformly distributed over the range
from 900L to 1300L. Items were edited for rule compliance by two trained item writers.
The resulting 300 items were organized into five test forms of 60 items each. Each item
writer contributed twelve items to each form. Items on a form were ordered from lowest
calibration to highest. The five forms were administered in random order over five days
to seven students (two sixth graders and five seventh graders). Each student responded
to all 300 items. Raw score performances were converted via the Rasch model to Lexile
measures using the theoretical calibrations provided by the Lexile Analyzer.
Table 16 displays the students’ scores by item writer. A part measure is the Lexile measure for
the student on the cross-referenced writer's items (n = 60). Part-measure resampled SEMs
describe expected variability in student performances when generalizing over items and days.
Two methods were used to determine each student’s Lexile measure: (1) across all 300
items and (2) by item writer. By employing two methods, different aspects of uncertainty
could be examined. Using the first method, resampling using the bootstrap procedure
accounted for uncertainty across item writers, items, and occasions. The reading comprehension abilities of the students ranged from 972L to 1360L. Since the items were targeted
at 900L to 1300L, only student D was mis-targeted. Mis-targeting resulted in the SEM of
the student’s score being almost twice that of the other students measured.


Table 16. Lexile measures and standard errors across item writers.

Writer                     Student A     Student B     Student C     Student D     Student E     Student F     Student G
1                          937 (58)      964 (74)      1146 (105)    1375 (70)     1204 (73)     1128 (93)     1226 (155)
2                          1000 (114)    927 (85)      1156 (72)     1249 (76)     1047 (118)    1156 (83)     1136 (129)
3                          1002 (94)     1078 (72)     1095 (86)     1323 (127)    1189 (90)     1262 (90)     1236 (111)
4                          952 (74)      1086 (71)     1251 (108)    1451 (126)    1280 (115)    1312 (95)     1251 (114)
5                          973 (77)      945 (88)      1163 (82)     1452 (85)     1163 (77)     1223 (71)     1109 (116)
Across Items & Days        972 (13)      1000 (34)     1162 (25)     1370 (39)     1176 (38)     1216 (42)     1192 (29)
Across IWs, Items, Days    972 (48)      998 (46)      1158 (50)     1360 (91)     1170 (51)     1209 (54)     1187 (47)

Using the second method (level determined by analysis of the part scores of the items
written by each item writer), resampling using the bootstrap procedure accounted
for uncertainty across days and items. Error due to differences in occasions and items
accounted for about two-thirds of the errors in the student measures.
The box-and-whisker plots in Figure 11 display each student’s results with the box
representing the 90% confidence interval. The long line through each graph shows where
the student’s overall measure falls in relation to the part scores computed separately for each
item writer. For each student, his or her measure line passes through every box on the plot.
By chance alone, at least three of the 35 intervals would be expected not to contain the corresponding student’s measure.
Thus, the item writer’s effect on the student’s measure is negligible. Item writer is a proxy
for (1) mode of the text—whether the writer chose a narrative or expository passage,
(2) source of the text—no two writers wrote items for the same passage, and
(3) style variation—how writers created embedded completion items. A combination
of item-writing specification and the Lexile Analyzer’s calibration of items resulted in
reproducible reader measures based on theory alone.
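The coverage check behind Figure 11 can be reproduced directly from Table 16. The sketch below uses Student A’s part measures and SEMs and assumes a normal approximation (90% interval = measure ± 1.645 × SEM); the guide does not state the exact rule used to draw the boxes.

    # Student A's part measures and SEMs from Table 16 (writers 1-5),
    # and the overall measure across item writers, items, and days.
    part_measures = [(937, 58), (1000, 114), (1002, 94), (952, 74), (973, 77)]
    overall = 972

    Z_90 = 1.645  # normal-approximation multiplier for a 90% interval (assumption)
    for writer, (measure, sem) in enumerate(part_measures, start=1):
        low, high = measure - Z_90 * sem, measure + Z_90 * sem
        print("Writer %d: %.0fL-%.0fL, contains overall %dL? %s"
              % (writer, low, high, overall, low <= overall <= high))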
General objectivity requires that the origin and unit of measure be maintained independently of the instant and particulars of the measurement process. This study demonstrates
that SRI produces reproducible measures of reader performance independently of item
author, source of text, and occasion of measurement.
The Lexile unit is specified through the calibration equations that operationalize the
construct theory. These equations are used to define and maintain the unit of measurement
independently of the method and instant of measurement. A Lexile unit transcends the
instrument and thereby achieves the status of a quantity. Without this transcendent quality,
units remain local and dependent on particular instruments and samples for their absolute
expression (Stenner, 1994).


Figure 11. Examination of item writer error across items and occasions. [Seven box-and-whisker panels, one per student (A through G), plotting Lexile measures (700L–1700L) for each of the five writers.]

Sources of Measurement Error—Reader
Resampling of reader performance implies a different set of items (method) on a different
occasion (moment)—method and moment are random facets and are expected to vary
with each replication of the measurement process. With this definition of a replication
there is nothing special about one set of items compared with another set, nor is there anything special about one Tuesday morning compared to another. Any calibrated set of items
given on any day within a two-week period is considered interchangeable with any other
set of items given on another day (method and moment). The interchangeability of the
item sets suggests there is no a priori basis for believing that one particular method-moment
combination will yield a higher or lower measure than any other. That is not to say that
the resulting measures are expected to be the same. On the contrary, they are expected to
be different. It is unknown which method-moment combination will prove more difficult
and which easier. The anticipated variance among replications due to method-moment
combinations and their interactions is error.
A better understanding of how these sources of error come about can be gained by describing some of the measurement and behavior factors that may vary from administration to
administration. Suppose that most of the SRI items that Sally responds to are sampled from
books in the Baby Sitter series (by Ann M. Martin), which is Sally’s favorite series. When Sally
is measured again, the items are sampled from less familiar texts. The differences in Lexile
measures resulting from highly familiar and unfamiliar texts would be error. The items on
each level of SRI were selected to minimize this source of error. It was specified during item
development that no more than two items could be developed from a single source or series.
Characteristics of the moment and context of measurement can contribute to variation in
replicate measures. Suppose, unknown to the test developer, scores increase with each replication due to practice effects. This “occasion main effect” also would be treated as error. Again,
suppose Sally is fed breakfast and rides the bus on Tuesdays and Thursdays, but on other days
Sally gets no breakfast and must walk one mile to school. Some of the test administrations occur
on what Sally calls her “good days” and some occur on “bad days.” Variation in her reading
performance due to these context factors contributes to error. (For more information related
to why scores change, see the paper entitled “Why do Scores Change?” by Gary L. Williamson
(2004) located at www.Lexile.com.)
The best approach to attaching uncertainty to a reader’s measure is to resample the item response
record (i.e., simulating what would happen if the reader were actually assessed again). Suppose
eight-year-old José takes two 40-item SRI tests one week apart. Occasions (the two different
days) and the 40 items nested within each occasion can be independently resampled (two-stage
resampling), and the resulting two measures averaged for each replicate. One thousand replications would result in a distribution of replicate measures. The standard deviation of this distribution is the resampled SEM, and it describes uncertainty in José’s reading measure by treating
methods (items), moments (occasion and context), and their interactions as error. Furthermore,
in computing José’s reading measure and the uncertainty in that measure, he is treated as an
individual without reference to the performance of other students. In general, on SRI, typical
reader measure error across items (method) and days (moment) is 70L (Stenner, 1996).
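A minimal sketch of the two-stage resampling just described follows. The scoring rule used to turn a resampled response record into a Lexile measure is a placeholder (the real conversion would use the Rasch procedure), and the response records are invented.

    import random
    import statistics

    def two_stage_resampled_sem(occasions, responses_to_measure, n_replicates=1000, seed=7):
        # Stage 1: resample the occasions (days) with replacement.
        # Stage 2: resample the items within each sampled occasion.
        # Each replicate averages the resulting measures; the resampled SEM is
        # the standard deviation of the replicate distribution.
        rng = random.Random(seed)
        replicates = []
        for _ in range(n_replicates):
            sampled_days = [rng.choice(occasions) for _ in occasions]
            day_measures = []
            for day in sampled_days:
                resampled_items = [rng.choice(day) for _ in day]
                day_measures.append(responses_to_measure(resampled_items))
            replicates.append(statistics.mean(day_measures))
        return statistics.stdev(replicates)

    # Two invented 40-item response records (1 = correct, 0 = incorrect) and a
    # placeholder scoring rule (500L plus 10L per correct answer) standing in
    # for the real raw-score-to-Lexile conversion.
    day1 = [1] * 26 + [0] * 14
    day2 = [1] * 29 + [0] * 11
    toy_measure = lambda responses: 500 + 10 * sum(responses)
    print("Resampled SEM = %.0fL" % two_stage_resampled_sem([day1, day2], toy_measure))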
Reader Measure Consistency. Alternate-form reliability examines the extent to which two
equivalent forms of an assessment yield the same results (i.e., students’ scores have the same
rank order on both tests). Test-retest reliability examines the extent to which two administrations of the same test yield similar results. When taken together, alternate-form reliability
and test-retest reliability are estimates of reader measure consistency. A study has examined
the consistency of reader measures. If decisions about individuals are to be made on the
basis of assessment data (for example, placement or instructional program decisions), then
the assessment results should exhibit a reliability coefficient of at least 0.85.
Study 1. In a large urban school district, SRI was administered to all students in Grades 2
through 10. Table 17 shows the reader consistency estimates for each grade level and across
all grades over a four-month period. The data is from the first and second SRI administrations during the 2004–2005 school year.

Table 17. SRI reader consistency estimates over a four-month period, by grade.

Grade    N         Reader Consistency Correlation
3        1,241     0.829
4        7,236     0.832
5        8,253     0.854
6        6,339     0.848
7        3,783     0.860
8        3,581     0.877
9        2,694     0.853
10       632       0.901
Total    33,759    0.894
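The correlations in Table 17 are product-moment correlations between two administrations of SRI for the same students. A minimal sketch with invented paired scores:

    import statistics

    def pearson_r(x, y):
        # Product-moment correlation between two SRI administrations.
        mx, my = statistics.mean(x), statistics.mean(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / (sxx * syy) ** 0.5

    # Invented paired Lexile measures for six students.
    first_administration = [620, 745, 810, 930, 1015, 1100]
    second_administration = [655, 720, 845, 960, 1050, 1085]
    print("reader consistency r = %.3f" % pearson_r(first_administration, second_administration))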


Forecasted Comprehension Error
The difference between a text measure and a reader measure can be used to forecast the
reader’s comprehension of the text. If a 1200L reader reads USA Today (1200L), the Lexile
Framework forecasts 75% comprehension. This forecast means that if a 1200L reader
responds to 100 items developed from USA Today, the number correct is estimated to be
75, or 75% of the items administered. The same 1200L reader is forecast to have 50%
comprehension of senior-level college text (1450L) and 90% comprehension of The Secret
Garden (950L). How much error is present in such a forecast? That is, if the forecast were
recalculated, what kind of variability in the comprehension rate would be expected?
The comprehension rate is determined by the relationship between the reader measure and
the text measure. Consequently, error variation in the comprehension rate derives from error
variation in those two quantities. Under resampling, a small amount of variation is expected in the text measure and considerably more in the reader measure.
The result of resampling is a new text measure and a new reader measure, which combine to
forecast a new comprehension rate. Thus, errors in reader measure and text measure combine
to create variability in the replicated comprehension rate. Unlike text and reader error,
comprehension rate error is not symmetrical about the forecasted comprehension rate.
It is possible to determine a confidence interval for the forecasted comprehension rate.
Suppose a 1000L reader measured with 71L of error reads a 1000L text measured with 30L
of error. The error associated with the difference between the reader measure and the text
measure (0L) is 77L (Stenner and Burdick, 1997). Referring to Table 18, the 90% confidence interval for a 75% forecasted comprehension rate is 63% to 84% comprehension
(rounding the SED of 77L to 80L, the nearest tabled value).
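The forecasted comprehension rates quoted above (75% when reader and text measures are equal, roughly 50% at a 250L deficit and 90% at a 250L surplus) are consistent with a Rasch-style curve. The sketch below uses a 225L scale factor and a 1.1-logit offset inferred from those rates; the exact constants used by MetaMetrics are not given in this guide.

    import math

    def forecast_comprehension(reader_lexile, text_lexile):
        # Rasch-style forecast: 75% comprehension when the measures are equal;
        # the 225L divisor and 1.1 offset are inferred, not quoted from the guide.
        logit = (reader_lexile - text_lexile) / 225.0 + 1.1
        return 1.0 / (1.0 + math.exp(-logit))

    # A 1200L reader facing The Secret Garden (950L), USA Today (1200L),
    # and a senior-level college text (1450L).
    for text in (950, 1200, 1450):
        print("1200L reader, %dL text: %.0f%%" % (text, 100 * forecast_comprehension(1200, text)))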


Table 18. Confidence intervals (90%) for various combinations of comprehension rates and standard error differences (SED) between reader and text measures.

Reader - Text    Forecasted            90% Confidence Interval
(in Lexiles)     Comprehension Rate    SED 40    SED 60    SED 80    SED 100    SED 120
-250             50%                   43–57     39–61     36–64     33–67      30–70
-225             53%                   46–60     42–63     38–67     35–70      32–73
-200             55%                   48–62     45–66     41–69     38–72      34–75
-175             58%                   51–65     47–68     44–71     40–74      37–77
-150             61%                   54–67     50–71     47–73     43–76      39–79
-125             63%                   56–70     53–73     49–76     46–78      42–81
-100             66%                   59–72     56–75     52–78     48–80      45–82
-75              68%                   62–74     58–77     55–79     51–82      48–84
-50              71%                   64–76     61–79     57–81     54–83      50–85
-25              73%                   67–78     64–81     60–83     57–85      53–87
0                75%                   69–80     66–82     63–84     59–86      56–88
25               77%                   72–82     68–84     65–86     62–87      58–89
50               79%                   74–83     71–85     68–87     64–89      61–90
75               81%                   76–85     73–87     70–88     67–90      64–91
100              82%                   78–86     75–88     72–89     69–91      66–92
125              84%                   80–87     77–89     74–90     72–91      69–93
150              85%                   81–89     79–90     77–91     74–92      71–93
175              87%                   83–90     81–91     78–92     76–93      73–94
200              88%                   84–91     82–92     80–93     78–94      76–95
225              89%                   86–92     84–93     82–94     80–94      77–95
250              90%                   87–92     85–93     83–94     81–95      79–96


Validity
Validity is the “extent to which a test measures what its authors or users claim it measures;
specifically, test validity concerns the appropriateness of inferences that can be made on
the basis of test results” (Salvia and Ysseldyke, 1998). The 1999 Standards for Educational and
Psychological Testing (American Educational Research Association, American Psychological
Association, and National Council on Measurement in Education) state that “validity
refers to the degree to which evidence and theory support the interpretations of test scores
entailed in the uses of tests” (p. 9). In other words, does the test measure what it is supposed
to measure?
“The process of ascribing meaning to scores produced by a measurement procedure is
generally recognized as the most important task in developing an educational or psychological measure, be it an achievement test, interest inventory, or personality scale” (Stenner,
Smith, and Burdick, 1983). The appropriateness of any conclusions drawn from the results
of a test is a function of the test’s validity. The validity of a test is the degree to which the
test actually measures what it purports to measure. Validity provides a direct check on how
well the test fulfills its purpose.
The sections that follow describe the studies conducted to establish the validity of SRI. As
additional validity studies are conducted, they will be described in future editions of the SRI
Technical Manual. For the sake of clarity, the various components of test validity—content
validity, criterion-related validity, and construct validity—will be described as if they are
unique, independent components rather than interrelated parts.

Content Validity
The content validity of a test refers to the adequacy with which relevant content has been
sampled and represented in the test. Content validity was built into SRI during its development. All texts sampled for SRI items are authentic and developmentally appropriate, and
the student is asked to respond to the texts in ways that are relevant to the texts’ genres
(e.g., a student is asked specific questions related to a nonfiction text’s content rather than
asked to make predictions about what would happen next in the text—a question more
appropriate for fiction). For middle school and high school students who read below
grade level, a subset of items from the main item pool is classified “Hi-Lo.” The Hi-Lo
pool of items was identified from all items developed for SRI based on whether they were
developmentally appropriate for middle school and high school students (high interest) and
had Lexile measures between 200L and 1000L (low difficulty). The administration of these
items ensures that students will read developmentally appropriate content.
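The Hi-Lo pool is simply a filtered subset of the main item bank. A minimal sketch of the selection rule follows; the item-record fields are hypothetical.

    # Hypothetical item records: Lexile measure plus a flag indicating that the
    # source text is developmentally appropriate for older struggling readers.
    items = [
        {"id": 101, "lexile": 350, "high_interest": True},
        {"id": 102, "lexile": 1150, "high_interest": True},
        {"id": 103, "lexile": 640, "high_interest": False},
        {"id": 104, "lexile": 920, "high_interest": True},
    ]

    # Hi-Lo rule: high interest and a measure between 200L and 1000L.
    hi_lo_pool = [item for item in items
                  if item["high_interest"] and 200 <= item["lexile"] <= 1000]
    print([item["id"] for item in hi_lo_pool])  # -> [101, 104]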


Criterion-Related Validity
The criterion-related validity of a test indicates the test’s effectiveness in predicting an
individual’s behavior in a specific situation. Convergent validity examines those situations in
which test scores are expected to be influenced by behavior; conversely, discriminant validity
examines those situations in which test scores are not expected to be influenced by behavior.
Convergent validity looks at the relationships between test scores and other criterion
variables (e.g., number of class discussions, reading comprehension grade equivalent,
library usage, remediation). Because targeted reading intervention programs are specifically
designed to improve students’ reading comprehension, an effective intervention would be
expected to improve students’ reading test scores.
READ 180 ® is a research-based reading intervention program designed to meet the needs
of students in Grades 4 through 12 whose reading achievement is significantly below
the proficient level. READ 180 was initially developed through a collaboration between
Vanderbilt University and the Orange County (FL) Public School System between 1991
and 1999. It combines research-based reading practices with the effective use of technology to offer students an opportunity to achieve reading success through a combination of
instructional, modeled, and independent reading components. Because READ 180 is a
reading intervention program, students who participate in the program would be expected
to show improvement in their reading comprehension as measured by SRI.
Reading comprehension generally increases as a student progresses through school. It
increases rapidly during elementary school because students are specifically instructed in
reading. In middle school, reading comprehension grows at a slower rate because instruction concentrates on specific content areas, such as science, literature, and social studies.
SRI was designed to be a developmental measure of reading comprehension. Figure 12
shows the median performance (and upper and lower quartiles) on SRI for students at each
grade level. As predicted, student scores on SRI climb rapidly in elementary grades and
level off in middle school.
Discriminant validity looks at the relationships between test scores and other criterion
variables that the scores should not be related to (e.g., gender, race/ethnicity). SRI scores
would not be expected to fluctuate according to the demographic characteristics of the
students taking the test.
Study 1. During the 2003–2004 school year, the Memphis (TN) Public Schools remediated 525 students with READ 180 (Memphis Public Schools, no date). Pretests were
administered between May 1, 2003 and December 1, 2003, and posttests were administered
between January 1, 2004 and August 1, 2004. A minimum of one month and a maximum
of 15 months elapsed between the pretest and posttest. Pretest scores ranged from 24L to
1070L with a mean of 581L (standard deviation of 218L). Posttest scores ranged from 32L
to 1261L with a mean of 667L (standard deviation of 214L). The mean gain from pretest to
posttest was 85.2L (standard deviation of 183L). Figure 13 shows the distribution of scores
on the pretest and the posttest for all students.


Figure 12. Growth on SRI—Median and upper and lower quartiles, by grade. [Plot of Lexile measures (0L–1400L) against grade levels 1 through 11.]
The results of the study show a positive relationship between SRI scores and enrollment in
a reading intervention program.
Study 2. During the 2002–2003 school year, students at 14 middle schools in Clark
County (NV) School District participated in READ 180 and completed SRI. Of the
4,223 students pretested in August through October and posttested in March through May,
399 students had valid numerical data for both the pretest and the posttest. Table 19 shows
the mean gains in Lexile measures by grade level.
The results of the study show a positive relationship between SRI scores and enrollment in
a reading intervention program.
Study 3. During the 2000–2001 through 2004–2005 school years, the Des Moines (IA)
Independent Community School District administered READ 180 to 1,213 special education middle school and high school students (Hewes, Mielke, and Johnson, 2006; Palmer,
2003). SRI was administered as a pretest to students entering the intervention program and
as a posttest at the end of each school year. SRI pretest scores were collected for 1,168 of
the sampled students; posttest 1 scores were collected for 1,122 of the sampled students; and
posttest 2 scores were collected for 361 of the sampled students. Figure 14 shows the mean
pretest and posttest scores (1 and 2) for students in various cohorts. The standard deviation
across all students was 257.40 Lexiles.
As shown in Figure 14, reading ability as measured by SRI increased from the initial grade
level of the student. In addition, when the students’ cohort, starting grade, pattern of

Figure 13. Memphis (TN) Public Schools: Distribution of initial and final SRI scores for READ 180 participants. [Histogram of the distribution of SRI scores (percent of fall 2003 READ 180 participants, N = 314) by Lexile scale score, with the mean initial and mean final scores marked. Initial SRI scores: mean 581L (±18.7L), standard deviation 218L, median 606L, minimum 24L, maximum 1070L. Final SRI scores: mean 667L (±17.7L), standard deviation 214L, median 698L, minimum 32L, maximum 1261L. Adapted from Memphis Public Schools (no date), Exhibit 2.]

Table 19. Clark County (NV) School District: SRI Lexile measures and gains by grade level.

Grade    N      SRI Pretest Mean (SD)    SRI Posttest Mean (SD)    Gain (SD)
6        159    N/A                      N/A                       88.91 (157.24)**
7        128    N/A                      N/A                       137.84 (197.44)**
8        52     N/A                      N/A                       163.12 (184.20)**
Total    399    461.09 (204.57)          579.86 (195.74)           118.77

Adapted from Papalewis (2003), Table 4.
** p < .01, pretest to posttest paired t test.

participation, and level of special education were controlled for, students grew at a rate of
39.68 Lexiles for each year of participation in READ 180 (effect size = .15; NCE = 3.16).
“These were annual gains associated with READ 180 above and beyond yearly growth in
achievement” (Hewes, Mielke, and Johnson, 2006, p. 14). Students who started READ 180
in middle school (Grades 6 and 7) improved the most.


Figure 14. Des Moines (IA) Independent Community School District: Group SRI mean Lexile measures, by starting grade level in READ 180. [Bar chart of mean Lexile measures (0L–800L) for the pretest, posttest 1, and posttest 2, by initial grade level 6 through 11.]
Study 4. The St. Paul (MN) School District implemented READ 180 in middle schools
during the 2003–2004 school year (St. Paul School District, no date). A total of 820
students were enrolled in READ 180 (45% regular education, 34% English language learners, 15% special education, and 6% ELL/SPED), and of those students 44% were African
American, 30% Asian, 15% Caucasian, 9% Hispanic, and 2% Native American. Of the 820
students in the program, 573 students in Grades 7 and 8 had complete data for SRI. The
mean group pretest score was 659.0L, and the mean group posttest score was 768.5L with a
gain of 109.5L (p < .01). The results of the study show a positive relationship between SRI
scores and enrollment in a reading intervention program.
Study 5. Fairfax County (VA) Public Schools implemented READ 180 for 548 students
in Grades 7 and 8 at 11 middle schools during the 2002–2003 school year (Pearson and
White, 2004). The general population at the 11 schools was as follows: 45% Caucasian, 22%
Hispanic, and 18% African American; 55% male and 45% female; 16% classified as English
for Speakers of Other Languages (ESOL); and 25% classified as receiving special education
services. The sample of students enrolled in READ 180 can be described as follows: 15%
Caucasian, 37% Hispanic, and 29% African American; 52% male and 48% female; 42% classified as ESOL; and 14% classified as receiving special education services. The population
that participated in the READ 180 program can be considered significantly different from
the general population in terms of race/ethnicity, ESOL classification, and special education services received.
Pretest Lexile scores from SRI ranged from 136L to 1262L with a mean of 718L (standard
deviation of 208L). Posttest Lexile scores from SRI ranged from 256L to 1336L with a
mean of 815L (standard deviation of 203L). The mean gain from pretest to posttest was 95.9L (standard deviation of 111.3L). The gains in Lexile scores were statistically significant,
and the effect size was 0.46 standard deviations. The results of the study showed a positive
relationship between SRI scores and enrollment in a reading intervention program.
The study also examined the gains of various subgroups of students and observed that “no
statistically significant differences in the magnitude of pretest-posttest changes in reading
ability were found to be associated with other characteristics of READ 180 participants:
gender, race, eligibility for ESOL, eligibility for special education, and the number of days
the student was absent from school during 2002–03” (Pearson and White, 2004, p. 13).
Study 6. Indian River (DE) School District piloted READ 180 at Selbyville Middle
School during the 2003–2004 school year for students in Grades 6 through 8 performing
in the bottom quartile of standardized assessments (Indian River School District, no date).
During the 2004–2005 school year, SRI was administered to all students in the district
enrolled in READ 180 (the majority of students also received special education services).
Table 20 presents the descriptive statistics for students enrolled in READ 180 at Selbyville
Middle School and Sussex Central Middle School.

Table 20. Indian River (DE) School District: SRI average scores (Lexiles) for READ 180 students in 2004–2005.

Grade    N     Fall SRI Lexile Measure, Mean (SD)    Spring SRI Lexile Measure, Mean (SD)
6        65    498.0 (242.1)                         651.2 (231.7)
7        57    518.0 (247.7)                         734.8 (182.0)
8        62    651.5 (227.8)                         818.6 (242.9)

Adapted from Indian River School District (no date), Table 1.

Based on the results, the increase in students classified as “Reading at Grade Level” was
18.5% in Grade 6, 13.4% in Grade 7, and 26.2% in Grade 8. “Students not only showed
improvement in the quantitative data, they also showed an increase in their positive
attitudes toward reading in general” (Indian River School District, no date, p. 1). The results
of the study show a positive relationship between SRI scores and enrollment in a reading
intervention program. In addition, SRI scores monotonically increased across grade levels.
Study 7. In response to a drop-out problem with special education students at Fulton
Middle School (Callaway County, GA), READ 180 was implemented in 2005 (Sommerhauser, 2006). Students in Grades 6 and 7 whose reading skills were significantly below
grade level (N = 24) participated in the program. The results showed that “20 of the 24
students have shown improvement in their Lexile scores, a basic reading test.”
Study 8. East Elementary School in Kodiak, Alaska, instituted a reading program in 2000
that matched readers with text at their level of comprehension (MetaMetrics, 2006c).
Students were administered SRI as part of the Scholastic Reading Counts! ® program and
encouraged to read books at their Lexile level. Reed, the school reading specialist, stated that the program has led to more books being checked out of the library, increased student
enthusiasm for reading, and increased teacher participation in the program (e.g., lesson
planning, materials selection across all content areas).
Study 9. The Kirkwood (MO) School District implemented READ 180 between 1999
and 2003 (Thomas, 2003). Initially, students in Grades 6 through 8 were enrolled. In subsequent years, the program was expanded to include students in Grades 4 through 8. The
program served: 379 students during the 2000–2001 school year (34% classified as Special
Education/SSD); 311 students during the 2001–2002 school year (43% classified as Special
Education/SSD); and 369 students during the 2002–2003 school year (41% classified as
Special Education/SSD). Figures 15 through 17 show the pretest and posttest scores of
general education students for three years of the program.
The results of the study show a positive relationship between SRI scores and enrollment
in a reading intervention program (within school year gains for 90% of students enrolled
in the program). The study concluded that “fourth and fifth grade students have higher
increases than middle school students, reinforcing the need for earliest intervention.
Middle school scores, however, are influenced by higher numbers of new students needing
reading intervention” (Thomas, 2003, p. 7).
Study 10. In fall 2003, the Phoenix (AZ) Union High School District began using
Stage C of READ 180 to help struggling ninth- and tenth-grade students become
proficient readers and increase their opportunities for success in school (White and
Haslam, 2005). Of the Grade 9 students (N = 882) who participated, 49% were classified
as ELL and 9% were eligible for Special Education services. Information was not provided
for the Grade 10 students (N = 697).
For students in Grade 9, the mean gain from SRI pretest to posttest was 110.9L. For
students in Grade 10, the mean gain from pretest to posttest was 68.8L for the fall cohort
and 110.9L for the spring cohort. The gains in Lexile scores were statistically significant at
the .05 level. The results of the study showed a positive relationship between SRI scores
and enrollment in a reading intervention program.
The study also examined the gains of various subgroups of students. No significant differences were observed between students classified as ELL and those who were not (ELL gain of 13.3 NCEs versus non-ELL gain of 13.5 NCEs, p = .86), or between students eligible for Special Education services and those who were not (Special Education gain of 13.7 NCEs versus non-Special Education gain of 13.5 NCEs, p = .88).
Study 11. A large urban school district administers SRI to all students in Grades 2 through
10. Data has been collected since the 2000–2001 school year and matched at the student
level. All students are administered SRI at the beginning of the school year (September)
and in March, and a sample of students in intervention programs are administered SRI in
December also. Information is collected on race/ethnicity, gender, and limited English
proficiency (LEP) classification. The student demographic data presented in Table 21 is from
the 2004–2005 school year.


Table 21. Large Urban School District: SRI scores by student demographic classification.

Student Demographic Characteristic         N          Mean (SD)
Race/Ethnicity
  Asian                                    3,498      979.90 (316.21)
  African American                         35,500     753.43 (316.55)
  Hispanic                                 27,260     790.24 (338.11)
  Indian                                   723        868.41 (311.20)
  Multiracial                              5,305      906.42 (310.10)
  Caucasian                                65,124     982.54 (303.79)
Gender
  Female                                   68,454     898.21 (316.72)
  Male                                     68,956     865.10 (345.26)
Limited English Proficiency Status
  Former LEP student                       6,926      689.73 (258.22)
  Limited English and in ESOL program      7,459      435.98 (292.68)
  Exited from ESOL program                 13,917     890.52 (288.37)
  Never in ESOL program                    109,108    923.10 (316.67)

Figure 15. Kirkwood (MO) School District: Pretest and posttest SRI scores, school year 2000–2001, general education students. [Bar chart of mean Lexile measures (0L–1000L) for the pretest and posttest, by grade level 2 through 9.]


Figure 16. Kirkwood (MO) School District: Pretest and posttest SRI scores, school year 2001–2002, general education students. [Bar chart of mean Lexile measures (0L–1000L) for the pretest and posttest, by grade level 2 through 9.]
Figure 17. Kirkwood (MO) School District: Pretest and posttest SRI scores, school year 2002–2003, general education students. [Bar chart of mean Lexile measures (0L–1000L) for the pretest and posttest, by grade level 2 through 9.]
Given the sample sizes, the contrasts are significant. Using the rule of thumb that a quarter
of a standard deviation represents an educationally meaningful difference, the data shows that Caucasian
students score significantly higher than all other groups except Asian students. The data does
not show any differences based on gender, and the observed differences based on LEP status
are expected.


Construct Validity
The construct validity of a test is the extent to which the test may be said to measure a
theoretical construct or trait, such as reading comprehension. Anastasi (1982) identifies a
number of ways that the construct validity of a test can be examined. Two of the techniques are appropriate for examining the construct validity of Scholastic Reading Inventory.
One technique is to examine developmental changes in test scores for traits that are
expected to increase with age. Another technique is to examine the “correlations between
a new test and other similar tests . . . [the correlations are] evidence that the new test
measures approximately the same general areas of behavior as other tests designated by the
same name” (p. 145).
Construct validity is the most important aspect of validity related to the computer adaptive
test of SRI. This product is designed to measure the development of reading comprehension; therefore, how well it measures reading comprehension and how well it measures the
development of reading comprehension must be examined.
Reading Comprehension Construct. Reading comprehension is the process of independently constructing meaning from text. Scores from tests purporting to measure the
same construct, for example “reading comprehension,” should be moderately correlated
(Anastasi, 1982). (For more information related to how to interpret multiple test scores
reported in the same metric, see the paper entitled “Managing Multiple Measures” by Gary
L. Williamson (2006) located at www.Lexile.com.)
Study 1. During the 2000–2001 through 2004–2005 school years, the Des Moines (IA)
Independent Community School District enrolled 1,213 special education middle and
high school students in READ 180. SRI was administered as a pretest to students entering
READ 180 and annually at the end of each school year as a posttest. A correlation of 0.65
(p < .05) was observed between SRI and the Stanford Diagnostic Reading Test (SDRT4)
Comprehension subtest; a correlation of 0.64 (p < .05) was observed between SRI and the
SDRT4 Vocabulary subtest; and a correlation of 0.65 (p < .05) was observed between SRI
and the SDRT4 total score. “The low correlations observed for this sample of students may
be related to the fact that this sample is composed exclusively of special education students”
(Hewes, Mielke, and Johnson, 2006, p. A-3).


Study 2. A large urban school district administers SRI to all students in Grades 2 through
10. Data has been collected since the 2000–2001 school year and matched at the student
level. All students are administered SRI at the beginning of the school year (September)
and in March, and a sample of students in intervention programs are administered SRI in
December also. Students are also administered the state assessment, the Florida Comprehensive Assessment Test, which consists of a norm-referenced assessment (Stanford Achievement
Tests, Ninth or Tenth Edition [SAT-9/10]) and a criterion-referenced assessment (Sunshine
State Standards Test [SSS]). In addition, a sample of students takes the PSAT. Tables 22
through 24 show the descriptive statistics for matched samples of students during four years
of data collection.

Table 22. Large Urban School District: Descriptive statistics for SRI and the SAT-9/10, matched sample.

School Year    SRI N     SRI Mean (SD)      SAT-9/10 N    SAT-9/10 Mean (SD)    r
2001–2002      79,423    848.22 (367.65)    87,380        899.47 (244.30)       0.824
2002–2003      80,677    862.42 (347.03)    88,962        909.54 (231.29)       0.800
2003–2004      84,707    895.70 (344.45)    91,018        920.94 (226.30)       0.789
2004–2005      85,486    885.07 (349.40)    101,776       881.11 (248.53)       0.821

Note: SAT-9/10 scores reported in Lexiles.

From the results it can be concluded that SRI measures a construct similar to that measured
by other standardized tests designed to measure reading comprehension. The magnitude
of the within-grade correlations between SRI and the PSAT is close to the observed
correlations for parallel test forms (i.e., alternate forms reliability), thus suggesting that the
different tests are measuring the same construct. The SAT-9/10, SSS, and PSAT consist
of passages followed by traditional multiple-choice items, and SRI consists of embedded
completion multiple-choice items. Despite the differences in format, the correlations
suggest that the four assessments are measuring a similar construct.

Table 23. Large Urban School District: Descriptive statistics for SRI and the SSS, matched sample.

School Year    SRI N     SRI Mean (SD)      SSS N      SSS Mean (SD)    r
2001–2002      79,423    848.22 (367.65)    87,969     1641 (394.98)    0.835
2002–2003      80,677    862.42 (347.03)    90,770     1679 (368.26)    0.823
2003–2004      84,707    895.70 (344.45)    92,653     1699 (361.46)    0.817
2004–2005      85,486    885.07 (349.40)    104,803    1683 (380.13)    0.825

Table 24. Large Urban School District: Descriptive statistics for SRI and the PSAT, matched sample.

School Year    SRI N     SRI Mean (SD)      PSAT N    PSAT Mean (SD)    r
2002–2003      80,677    862.42 (347.03)    2,219     44.48 (11.70)     0.730
2003–2004      84,707    895.70 (344.45)    2,146     41.86 (12.14)     0.696
2004–2005      85,486    885.07 (349.40)    1,731     44.64 (11.40)     0.753

Study 3. In 2005, a group of 20 Grade 4 students at a Department of Defense Education Activity (DoDEA) school in Fort Benning (GA), were administered both SRI and
SRI-Print (Level 14, Form B). The correlation between the two Lexile measures was 0.92
(MetaMetrics, 2005). The results show that the two tests measure similar reading constructs.
Developmental Nature of Scholastic Reading Inventory. Reading is a skill that is expected
to develop with age—as students read more, their skills improve, and therefore they are able
to read more complex material. Because growth in reading comprehension is uneven, with
the greatest growth usually taking place in earlier grades, SRI scores should show a similar
trend of decreasing gains as grade level increases.
Study 1. A middle school in Pasco County (FL) School District administered SRI during
the 2005–2006 school year to 721 students. Growth in reading ability was examined by
collecting data in September and April. The mean Lexile measure in September across all
grades was 978.26L (standard deviation of 194.92), and the mean Lexile measure in April was
1026.12L (standard deviation of 203.20). The mean growth was 47.87L (standard deviation
of 143.09). The typical growth for middle school students is approximately 75L across a
calendar year (see Williamson, Thompson, and Baker, 2006). When the growth for the sample
of students in Pasco County was prorated to a full calendar year, the resulting growth of 73.65L is consistent with prior research. In addition, when the data was examined by grade level,
it was observed that Grade 6 exhibited the most growth, while growth tapered off in later
grades (Grade 6, N = 211, Growth = 56L [prorated 87L]; Grade 7, N = 254, Growth = 52L [prorated 79L]; Grade 8, N = 256, Growth = 37L [prorated 58L]).
Study 2. A large urban school district administers SRI to all students in Grades 2 through
10. Data has been collected since the 2000–2001 school year and matched at the student
level. All students are administered SRI at the beginning of the school year (September)
and in March, and a sample of students in intervention programs are administered SRI in
December also.
The data was examined to estimate growth in reading ability using a quadratic regression
equation. Students with at least seven SRI scores were included in the analyses (45,495
students out of a possible 172,412). The resulting quadratic regression slope was slightly
more than 0.50L/day (about 100L of growth between fall and spring), which is consistent
with prior research conducted by MetaMetrics, Inc. (see Williamson, Thompson, and Baker,
2006). The median R-squared coefficient was between .800 and .849, which indicates that
the correlation between reading ability and time is approximately 0.91. Figure 18 shows
the fit of the model compared to observed SRI data.
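A quadratic growth curve of this kind can be fit with ordinary least squares. The sketch below fits one student’s (invented) longitudinal record and reports the R-squared, whose square root approximates the correlation between reading ability and time cited above.

    import numpy as np

    # Invented longitudinal record: days since the first SRI administration
    # and the observed Lexile measures (not actual district data).
    days = np.array([0, 180, 365, 545, 730, 910, 1095], dtype=float)
    lexiles = np.array([520, 610, 700, 770, 840, 890, 930], dtype=float)

    # Quadratic growth model fit by ordinary least squares.
    coefficients = np.polyfit(days, lexiles, deg=2)
    fitted = np.polyval(coefficients, days)

    ss_res = np.sum((lexiles - fitted) ** 2)
    ss_tot = np.sum((lexiles - lexiles.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    print("R-squared = %.3f, r = %.2f" % (r_squared, r_squared ** 0.5))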

Figure 18. Large Urban School District: Fit of quadratic growth model to SRI data for students in Grades 3 through 10. [Plot of observed SRI Lexile measures and the fitted quadratic curve (0L–1400L) against grade.]


Appendix 1: Lexile Framework Map
Connecting curriculum-based reading to the Lexile Framework, the titles in this chart are typical of texts that developmentally correspond to each Lexile® level. There are many readily available texts that have older interest levels but a lower Lexile level (hi-lo titles). Conversely, there are many books that have younger interest levels but are written at a higher Lexile level (adult-directed picture books). By evaluating the Lexile level of any text, educators can provide reading opportunities that foster student growth.
For more information on the Lexile ranges for additional titles, please visit www.Lexile.com or the
Scholastic Reading Counts! ® e-Catalog at www.Scholastic.com.

200L
  Benchmark Literature: Clifford The Big Red Dog by Norman Bridwell (220L); Inch by Inch by Leo Lionni (210L); Amanda Pig, Schoolgirl by Jean Van Leeuwen (240L); The Cat in the Hat by Dr. Seuss (260L)
  Benchmark Nonfiction Texts: Harbor by Donald Crews (220L); Ms. Frizzle’s Adventure: Medieval Castles by Joanna Cole (270L)

300L
  Benchmark Literature: Hey, Al! by Arthur Yorinks (320L); “A” My Name is Alice by Jane Bayer (370L); Arthur Goes to Camp by Marc Brown (380L)
  Benchmark Nonfiction Texts: You Forgot Your Skirt, Amelia Bloomer by Shana Corey (350L); George Washington and the General’s Dog by Frank Murphy (380L); How A Book is Made by Aliki (390L)

400L
  Benchmark Literature: Frog and Toad are Friends by Arnold Lobel (400L); Cam Jansen and the Mystery of the Stolen Diamonds by David A. Adler (420L); Bread and Jam for Frances by Russell Hoban (490L)
  Benchmark Nonfiction Texts: How My Parents Learned to Eat by Ina R. Friedman (450L); Finding Providence by Avi (450L); When I Was Nine by James Stevenson (470L)

500L
  Benchmark Literature: Bicycle Man by Allen Say (500L); Can I Keep Him? by Steven Kellogg (510L); The Music of Dolphins by Karen Hesse (560L); Artemis Fowl by Eoin Colfer (600L)
  Benchmark Nonfiction Texts: By My Brother’s Side by Tiki Barber (500L); The Wild Boy by Mordicai Gerstein (530L)

600L
  Benchmark Literature: Sadako and the Thousand Paper Cranes by Eleanor Coerr (630L); Charlotte’s Web by E.B. White (680L)
  Benchmark Nonfiction Texts: The Emperor’s Egg by Martin Jenkins (570L); Koko’s Kitten by Dr. Francine Patterson (610L); Lost City: The Discovery of Machu Picchu by Ted Lewin (670L); Passage to Freedom: The Sugihara Story by Ken Mochizuki (670L)

700L
  Benchmark Literature: Bunnicula by Deborah Howe, James Howe (710L); Harriet the Spy by Louise Fitzhugh (760L)
  Benchmark Nonfiction Texts: Journey to Ellis Island: How My Father Came to America by Carol Bierman (750L); Beethoven Lives Upstairs by Barbara Nichol (750L); The Red Scarf Girl by Ji-li Jiang (780L); Four Against the Odds by Stephen Krensky (790L)

800L
  Benchmark Literature: Interstellar Pig by William Sleator (810L); Charlie and the Chocolate Factory by Roald Dahl (810L); Julie of the Wolves by Jean Craighead George (860L)
  Benchmark Nonfiction Texts: Can’t You Make Them Behave, King George? by Jean Fritz (800L); Anthony Burns: The Defeat and Triumph of a Fugitive Slave by Virginia Hamilton (860L); Having Our Say: The Delany Sisters’ First 100 Years by Sarah L. Delany and A. Elizabeth Delany (890L)

900L
  Benchmark Literature: Roll of Thunder, Hear My Cry by Mildred D. Taylor (920L); Abel’s Island by William Steig (920L); The Slave Dancer by Paula Fox (970L)
  Benchmark Nonfiction Texts: October Sky by Homer H. Hickam, Jr. (900L); Black Boy by Richard Wright (950L); All Creatures Great and Small by James Herriott (990L)

1000L
  Benchmark Literature: Hatchet by Gary Paulsen (1020L); The Great Gatsby by F. Scott Fitzgerald (1070L); Their Eyes Were Watching God by Zora Neale Hurston (1080L)
  Benchmark Nonfiction Texts: The Greatest: Muhammad Ali by Walter Dean Myers (1030L); My Thirteenth Winter by Samantha Abeel (1050L); Anne Frank: Diary of A Young Girl by Anne Frank (1080L)

1100L
  Benchmark Literature: Pride and Prejudice by Jane Austen (1100L); Ethan Frome by Edith Wharton (1160L); Animal Farm by George Orwell (1170L)
  Benchmark Nonfiction Texts: Black Diamond by Patricia McKissack (1100L); Dead Man Walking by Helen Prejean (1140L); Hiroshima by John Hersey (1190L)

1200L
  Benchmark Literature: Great Expectations by Charles Dickens (1200L); The Midwife’s Apprentice by Karen Cushman (1240L); The House of the Spirits by Isabel Allende (1280L)
  Benchmark Nonfiction Texts: In the Shadow of Man by Jane Goodall (1220L); Fast Food Nation: The Dark Side of the All-American Meal by Eric Schlosser (1240L)

1300L
  Benchmark Literature: The Metamorphosis by Franz Kafka (1320L); Silas Marner by George Eliot (1330L); Eight Tales of Terror by Edgar Allan Poe (1340L)
  Benchmark Nonfiction Texts: Into the Wild by Jon Krakauer (1270L); Common Sense by Thomas Paine (1330L); Never Cry Wolf by Farley Mowat (1330L); The Life and Times of Frederick Douglass by Frederick Douglass (1400L)


Appendix 2: Fall Norm Tables
Fall scores are based on a norming study performed by MetaMetrics to determine a baseline for growth.

Fall
Percentile    Grade 1    Grade 2    Grade 3    Grade 4    Grade 5    Grade 6
1             BR         BR         BR         BR         50         160
5             BR         BR         75         225        350        425
10            BR         BR         160        295        430        490
25            BR         115        360        470        610        670
35            BR         200        455        560        695        760
50            BR         310        550        670        795        845
65            BR         425        645        770        875        925
75            BR         520        715        835        945        985
90            105        650        850        960        1060       1095
95            205        750        945        1030       1125       1180

Fall
Percentile    Grade 7    Grade 8    Grade 9    Grade 10    Grade 11    Grade 12
1             210        285        380        415         455         460
5             510        550        655        670         720         745
10            590        630        720        735         780         805
25            760        815        865        880         930         945
35            825        885        935        960         995         1010
50            910        970        1015       1045        1080        1090
65            985        1045       1095       1125        1155        1165
75            1050       1105       1150       1180        1205        1215
90            1160       1210       1260       1290        1315        1325
95            1245       1295       1345       1365        1390        1405
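The norm tables can be used to place a student’s measure within a percentile band. The sketch below looks up the highest tabled fall percentile at or below a Grade 6 measure; the guide does not prescribe an interpolation rule between tabled points, so none is applied.

    import bisect

    # Fall Grade 6 norms from the table above: (percentile, Lexile measure).
    GRADE_6_FALL = [(1, 160), (5, 425), (10, 490), (25, 670), (35, 760),
                    (50, 845), (65, 925), (75, 985), (90, 1095), (95, 1180)]

    def fall_percentile_band(lexile, norms=GRADE_6_FALL):
        # Highest tabled percentile whose cut score is at or below the measure;
        # returns None for measures below the 1st-percentile cut.
        cuts = [measure for _, measure in norms]
        index = bisect.bisect_right(cuts, lexile)
        return None if index == 0 else norms[index - 1][0]

    print(fall_percentile_band(880))  # a fall Grade 6 student at 880L -> 50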


Appendix 2: Spring Norm Tables

Spring
Percentile    Grade 1    Grade 2    Grade 3    Grade 4    Grade 5    Grade 6
1             BR         BR         BR         BR         BR         190
5             BR         BR         125        255        390        455
10            BR         BR         210        325        475        525
25            BR         275        390        505        630        700
35            BR         400        480        595        710        775
50            150        475        590        700        810        880
65            270        575        690        800        905        975
75            345        645        755        865        970        1035
90            550        780        890        990        1085       1155
95            635        870        965        1060       1155       1220

Spring
Percentile    Grade 7    Grade 8    Grade 9    Grade 10    Grade 11    Grade 12
1             240        295        400        435         465         465
5             545        560        670        720         745         755
10            625        645        730        780         810         820
25            780        835        880        930         945         955
35            860        905        960        995         1010        1020
50            955        1000       1045       1080        1090        1100
65            1040       1090       1125       1155        1165        1175
75            1095       1145       1180       1205        1215        1225
90            1210       1265       1290       1320        1330        1340
95            1270       1330       1365       1290        1405        1415


Appendix 3: References
American Educational Research Association, American Psychological Association, and
National Council on Measurement in Education. (1999). Standards for educational and
psychological testing. Washington, DC: American Educational Research Association.
Anastasi, A. (1982). Psychological Testing (Fifth Edition). New York: MacMillan Publishing
Company, Inc.
Anderson, R.C., Hiebert, E.H., Scott, J.A., & Wilkinson, I. (1985). Becoming a nation of readers: The report of the commission on reading. Washington, DC: U.S. Department of Education.
Bond, T.G. & Fox, C.M. (2001). Applying the Rasch model: Fundamental measurement in the
human sciences. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
Bormuth, J.R. (1966). Readability: New approach. Reading Research Quarterly, 7, 79–132.
Bormuth, J.R. (1967). Comparable cloze and multiple-choice comprehension test scores.
Journal of Reading, February 1967, 292–299.
Bormuth, J.R. (1968). Cloze test readability: Criterion reference scores. Journal of Educational Measurement, 3(3), 189–196.
Bormuth, J.R. (1970). On the theory of achievement test items. Chicago: The University of
Chicago Press.
Carroll, J.B., Davies, P., & Richman, B. (1971). Word frequency book. Boston: Houghton
Mifflin.
Carver, R.P. (1974). Measuring the primary effect of reading: Reading storage technique,
understanding judgments and cloze. Journal of Reading Behavior, 6, 249–274.
Chall, J.S. (1988). “The beginning years.” In B.L. Zakaluk and S.J. Samuels (Eds.), Readability: Its past, present, and future. Newark, DE: International Reading Association.
Crain, S. & Shankweiler, D. (1988). “Syntactic complexity and reading acquisition.” In
A. Davidson and G.M. Green (Eds.), Linguistic complexity and text comprehension: Readability
issues reconsidered. Hillsdale, NJ: Erlbaum Associates.
Crawford, W.J., King, C.E., Brophy, J.E., & Evertson, C.M. (1975, March). Error rates and
question difficulty related to elementary children’s learning. Paper presented at the annual
meeting of the American Educational Research Association, Washington, D.C.
Davidson, A. & Kantor, R.N. (1982). On the failure of readability formulas to define readable text: A case study from adaptations. Reading Research Quarterly, 17, 187–209.
Dunn, L.M. & Dunn, L.M. (1981). Peabody Picture Vocabulary Test-Revised, Forms L and M.
Circle Pines, MN: American Guidance Service.
Five, C. L. (1986). Fifth graders respond to a changed reading program. Harvard Educational
Review, 56, 395-405.


Fountas, I.C. & Pinnell, G.S. (1996). Guided Reading: Good First Teaching for All Children.
Portsmouth, NH: Heinemann Press.
Grolier, Inc. (1986). The Electronic Encyclopedia, a computerized version of the Academic
American Encyclopedia. Danbury, CT: Author.
Haladyna, T.M. (1994). Developing and validating multiple-choice test items. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Hambleton, R.K. & Swaminathan, H. (1985). Item response theory: Principles and applications.
Boston: Kluwer-Nijhoff Publishing.
Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response
theory (Measurement methods for the social sciences,Volume 2). Newbury Park, CA: Sage
Publications, Inc.
Hardwicke, S.B. & Yoes, M.E. (1984). Attitudes and performance on computerized adaptive testing. San Diego: Rehab Group.
Hewes, G.M., Mielke, M.B., & Johnson, J.C. (2006, January). Five years of READ 180 in Des
Moines: Middle and high school special education students. Policy Studies Associates: Washington,
DC.
Hiebert, E.F. (1998, November). Text matters in learning to read. CIERA Report 1-001.
Ann Arbor, MI: Center for the Improvement of Early Reading Achievement (CIERA).
Huynh, H. (1998). On score locations of binary and partial credit items and their applications to item mapping and criterion-referenced interpretation. Journal of Educational and
Behavioral Statistics, 23(1), 38–58.
Indian River School District. (no date). Special education students: Shelbyville Middle and
Sussex Central Middle Schools. [Draft manuscript provided by Scholastic Inc., January 25,
2006.]
Klare, G.R. (1963). The measurement of readability. Ames, IA: Iowa State University Press.
Klare, G.R. (1984). Readability. In P.D. Pearson (Ed.), Handbook of reading research (Volume
1, 681–744). Newark, DE: International Reading Association.
Liberman, I.Y., Mann,V.A., Shankweiler, D., & Westelman, M. (1982). Children’s memory
for recurring linguistic and non-linguistic material in relation to reading ability. Cortex, 18,
367–375.
Memphis Public Schools. (no date). How did MPS students perform at the initial administration of SRI? [Draft manuscript provided by Scholastic Inc., January 25, 2006.]
MetaMetrics, Inc. (2005, December). SRI paper vs. SRI Interactive [unpublished data].
Durham, NC: Author.
MetaMetrics, Inc. (2006a, January). Brief description of Bayesian grade level priors [unpublished manuscript]. Durham, NC: Author.


MetaMetrics, Inc. (2006b, August). Lexile Vocabulary Analyzer:Technical report. Durham, NC:
Author.
MetaMetrics, Inc. (2006c, October). “Lexiles help Alaska elementary school foster strong
reading habits, increase students reading proficiency.” Lexile Case Studies, October 2006
[available at www.Lexile.com]. Durham, NC: Author.
Miller, G.A. & Gildea, P.M. (1987). How children learn words. Scientific American, 257,
94–99.
Palmer, N. (2003, July). An evaluation of READ 180 with special education students. New
York: Scholastic Research and Evaluation Department/Scholastic Inc.
Papalewis, R. (2003, December). A study of READ 180 in middle schools in Clark County
School District, Las Vegas, Nevada. New York: Scholastic Research and Evaluation Department/Scholastic Inc.
Pearson, L.M. & White, R.N. (2004, June). Study of the impact of READ 180 on student
performance in Fairfax County Public Schools. [Draft manuscript provided by Scholastic
Inc., January 25, 2006.]
Petersen, N.S., Kolen, M.J., & Hoover, H.D. (1989). “Scaling, Norming, and Equating.”
In R.L. Linn (Ed.), Educational Measurement (Third Edition) (pp. 221–262). New York:
American Council on Education and Macmillan Publishing Company.
Petty, R. (1995, May 24). Touting computerized tests’ potential for K–12 arena. Education
Week on the web, Letters To the Editor, pp. 1–2.
Poznanski, J.B. (1990). A meta-analytic approach to the estimation of item difficulties.
Unpublished doctoral dissertation, Duke University, Durham, NC.
Rasch, G. (1980). Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: The
University of Chicago Press (first published in 1960).
Rim, E-D. (1980). Personal communication to Squires, Huitt, and Segars.
Salvia, J. & Ysseldyke, J.E. (1998). Assessment (Seventh Edition). Boston: Houghton Mifflin
Company.
Scholastic Inc. (2005, May). SRI 3.0/4.0 comparison study [unpublished manuscript]. New
York: Author.
Scholastic Inc. (2006a). Scholastic Reading Inventory: Educator’s Guide. New York: Author.
Scholastic Inc. (2006b). Analysis of the effect of the “locator test” on SRI scores on a large
population of simulated students [unpublished manuscript]. New York: Author.
School Renaissance Institute. (2000). Comparison of the STAR Reading Computer-Adaptive Test and the Scholastic Reading Inventory-Interactive Test. Madison, WI: Author.
Shankweiler, D. & Crain, S. (1986). Language mechanisms and reading disorder: A modular
approach. Cognition, 14, 139-168.


Smith, F. (1973). Psycholinguistics and reading. New York: Holt, Rinehart and Winston.
Sommerhauser, M. (2006, January 16). Read 180 sparks turnaround for FMS special-needs
students. Fulton Sun, Callaway County, Georgia. Retrieved January 17, 2006, from http://
www.fultonsun.com/articles/2006/01/15/news/351news13.txt.
Squires, D.A., Huitt, W.G., & Segars, J.K. (1983). Effective schools and classrooms. Alexandria,
VA: Association for Supervision and Curriculum Development.
St. Paul School District. (no date). Read 180 Stage B: St. Paul School District, Minnesota.
[Draft manuscript provided by Scholastic Inc., January 25, 2006.]
Stenner, A.J. (1990). Objectivity: Specific and general. Rasch Measurement Transactions, 4, 111.
Stenner, A.J. (1994). Specific objectivity—local and general. Rasch Measurement Transactions,
8, 374.
Stenner, A.J. (1996, October). Measuring reading comprehension with the Lexile Framework. Paper presented at the California Comparability Symposium, Burlingame, CA.
Stenner, A.J. & Burdick, D.S. (1997, January). The objective measurement of reading
comprehension in response to technical questions raised by the California Department of
Education Technical Study Group. Durham, NC: MetaMetrics, Inc.
Stenner, A.J., Burdick, H., Sanford, E.E., & Burdick, D.S. (2006). How accurate are Lexile
text measures? Journal of Applied Measurement, 7(3), 307–322.
Stenner, A.J., Smith, M., & Burdick, D.S. (1983). Toward a theory of construct definition.
Journal of Educational Measurement, 20(4), 305–315.
Stenner, A.J., Smith, D.R., Horabin, I., & Smith, M. (1987a). Fit of the Lexile Theory to
item difficulties on fourteen standardized reading comprehension tests. Durham, NC:
MetaMetrics, Inc.
Stenner, A.J., Smith, D.R., Horabin, I., & Smith, M. (1987b). Fit of the Lexile Theory to
sequenced units from eleven basal series. Durham, NC: MetaMetrics, Inc.
Stone, G.E. & Lunz, M.E. (1994). The effect of review on the psychometric characteristics
of computerized adaptive tests. Applied Measurement in Education, 7, 211–222.
Thomas, J. (2003, November). Reading program evaluation: READ 180, Grades 4–8.
[Draft manuscript provided by Scholastic Inc., January 25, 2006.]
Wainer, H. (1992). Some practical considerations when converting a linearly administered
test to an adaptive format. (Program Statistics Research Technical Report No. 92-21).
Princeton, NJ: Educational Testing Service.
Wainer, H., Dorans, N.J., Flaugher, R., Green, B.F., Mislevy, R.J., Steinberg, L., & Thissen,
D. (1990). Computerized adaptive testing: A primer. Hillsdale, NJ: Lawrence Erlbaum Associates,
Publishers.


Wang, T. & Vispoel, W.P. (1998). Properties of ability estimation methods in computerized
adaptive testing. Journal of Educational Measurement, 35, 109–135.
White, E.B. (1952). Charlotte’s Web. New York: Harper and Row.
White, R.N. & Haslam, M.B. (2005, June). Study of performance of READ 180 participants
in the Phoenix Union High School District – 2003–04. Policy Studies Associates: Washington,
DC.
Williamson, G.L. (2004). Why do scores change? Durham, NC: MetaMetrics, Inc.
Williamson, G.L. (2006). Managing multiple measures. Durham, NC: MetaMetrics, Inc.
Williamson, G.L., Thompson, C.L., & Baker, R.F. (2006, March). North Carolina’s growth
in reading and mathematics. Paper presented at the annual meeting of the North Carolina
Association for Research in Education (NCARE), Hickory, NC.
Wright, B.D. & Linacre, J.M. (1994). The Rasch model as a foundation for the Lexile
Framework. Unpublished manuscript.
Wright, B.D., & Linacre, J.M. (2003). A user’s guide to WINSTEPS Rasch-Model computer
program, 3.38. Chicago, Illinois: Winsteps.com.
Wright, B.D. & Stone, M.H. (1979). Best Test Design. Chicago: MESA Press.
Zakaluk, B.L. & Samuels, S.J. (1988). Readability: Its past, present, and future. Newark, DE:
International Reading Association.




