# A Beginner's Guide to Structural Equation Modeling, 3rd Edition


- A Beginner’s Guide to Structural Equation Modeling
- Copyright
- Contents
- About the Authors
- Preface
- 1 Introduction
- 2 Data Entry and Data Editing Issues
- 3 Correlation
- 4 SEM Basics
- 5 Model Fit
- 6 Regression Models
- 7 Path Models
- 8 Confirmatory Factor Models
- 9 Developing Structural Equation Models: Part I
- 10 Developing Structural Equation Models: Part II
- 11 Reporting SEM Research: Guidelines and Recommendations
- 12 Model Validation
- 13 Multiple Sample, Multiple Group, and Structured Means Models
- 14 Second-Order, Dynamic, and Multitrait Multimethod Models
- 15 Multiple Indicator–Multiple Cause, Mixture, and Multilevel Models
- 16 Interaction, Latent Growth, and Monte Carlo Methods
- 17 Matrix Approach to Structural Equation Modeling
- Appendix A: Introduction to Matrix Operations
- Appendix B: Statistical Tables
- Answers to Selected Exercises
- Author Index
- Subject Index

A Beginner's Guide to Structural Equation Modeling

Third Edition

Randall E. Schumacker
The University of Alabama

Richard G. Lomax
The Ohio State University

Y102005.indb 3 4/3/10 4:25:16 PM

Routledge

Taylor & Francis Group

711 Third Avenue

New York, NY 10017

Routledge

Taylor & Francis Group

27 Church Road

Hove, East Sussex BN3 2FA

© 2010 by Taylor and Francis Group, LLC

Routledge is an imprint of Taylor & Francis Group, an Informa business

International Standard Book Number: 978-1-84169-890-8 (Hardback) 978-1-84169-891-5 (Paperback)

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Schumacker, Randall E.
A beginner's guide to structural equation modeling / authors, Randall E. Schumacker, Richard G. Lomax. -- 3rd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-84169-890-8 (hardcover : alk. paper) -- ISBN 978-1-84169-891-5 (pbk. : alk. paper)
1. Structural equation modeling. 2. Social sciences--Statistical methods. I. Lomax, Richard G. II. Title.

QA278.S36 2010
519.5'3--dc22 2010009456

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the Psychology Press Web site at http://www.psypress.com.


Contents

About the Authors ...........................................................................................xv

Preface ............................................................................................................. xvii

1 Introduction ................................................................................................1

1.1 What Is Structural Equation Modeling? .......................................2

1.2 History of Structural Equation Modeling .................................... 4

1.3 Why Conduct Structural Equation Modeling? ............................ 6

1.4 Structural Equation Modeling Software Programs ....................8

1.5 Summary ......................................................................................... 10

References .................................................................................................. 11

2 Data Entry and Data Editing Issues ..................................................... 13

2.1 Data Entry ....................................................................................... 14

2.2 Data Editing Issues ........................................................................ 18

2.2.1 Measurement Scale ........................................................... 18

2.2.2 Restriction of Range ......................................................... 19

2.2.3 Missing Data ...................................................................... 20

2.2.4 LISREL–PRELIS Missing Data Example........................ 21

2.2.5 Outliers ............................................................................... 27

2.2.6 Linearity ............................................................................. 27

2.2.7 Nonnormality .................................................................... 28

2.3 Summary ......................................................................................... 29

References .................................................................................................. 31

3 Correlation ................................................................................................33

3.1 Types of Correlation Coefficients .................................................33

3.2 Factors Affecting Correlation Coefficients ................................. 35

3.2.1 Level of Measurement and Range of Values ................. 35

3.2.2 Nonlinearity ...................................................................... 36

3.2.3 Missing Data ......................................................................38

3.2.4 Outliers ............................................................................... 39

3.2.5 Correction for Attenuation .............................................. 39

3.2.6 Nonpositive Definite Matrices ........................................ 40

3.2.7 Sample Size ........................................................................ 41

3.3 Bivariate, Part, and Partial Correlations .....................................42

3.4 Correlation versus Covariance .....................................................46

3.5 Variable Metrics (Standardized versus Unstandardized) ........ 47

3.6 Causation Assumptions and Limitations ...................................48

3.7 Summary ......................................................................................... 49

References .................................................................................................. 51


4 SEM Basics ................................................................................................ 55

4.1 Model Specification ........................................................................55

4.2 Model Identification ....................................................................... 56

4.3 Model Estimation ........................................................................... 59

4.4 Model Testing ................................................................................. 63

4.5 Model Modification ....................................................................... 64

4.6 Summary ......................................................................................... 67

References .................................................................................................. 69

5 Model Fit .................................................................................................... 73

5.1 Types of Model-Fit Criteria ........................................................... 74

5.1.1 LISREL–SIMPLIS Example ..............................................77

5.1.1.1 Data .....................................................................77

5.1.1.2 Program ..............................................................80

5.1.1.3 Output ................................................................. 81

5.2 Model Fit ..........................................................................................85

5.2.1 Chi-Square (χ2) .................................................................. 85

5.2.2 Goodness-of-Fit Index (GFI) and Adjusted Goodness-of-Fit Index (AGFI) .........................................86

5.2.3 Root-Mean-Square Residual Index (RMR) .................... 87

5.3 Model Comparison ........................................................................ 88

5.3.1 Tucker–Lewis Index (TLI) ................................................ 88

5.3.2 Normed Fit Index (NFI) and Comparative Fit Index (CFI) .........................................................88

5.4 Model Parsimony ........................................................................... 89

5.4.1 Parsimony Normed Fit Index (PNFI) ............................. 90

5.4.2 Akaike Information Criterion (AIC) .............................. 90

5.4.3 Summary ............................................................................ 91

5.5 Parameter Fit ................................................................................... 92

5.6 Power and Sample Size ................................................................. 93

5.6.1 Model Fit ............................................................................ 94

5.6.1.1 Power ................................................................... 94

5.6.1.2 Sample Size ........................................................99

5.6.2 Model Comparison ......................................................... 108

5.6.3 Parameter Significance ....................................................111

5.6.4 Summary ...........................................................................113

5.7 Two-Step Versus Four-Step Approach to Modeling ................114

5.8 Summary ........................................................................................116

Chapter Footnote .....................................................................................118

Standard Errors ........................................................................................118

Chi-Squares ...............................................................................................118

References ................................................................................................ 120


6 Regression Models ................................................................................ 125

6.1 Overview ....................................................................................... 126

6.2 An Example ................................................................................... 130

6.3 Model Specification ...................................................................... 130

6.4 Model Identification ..................................................................... 131

6.5 Model Estimation ......................................................................... 131

6.6 Model Testing ............................................................................... 133

6.7 Model Modification ..................................................................... 134

6.8 Summary ....................................................................................... 135

6.8.1 Measurement Error ......................................................... 136

6.8.2 Additive Equation ........................................................... 137

Chapter Footnote .................................................................................... 138

Regression Model with Intercept Term ..................................... 138

LISREL–SIMPLIS Program (Intercept Term) ...................................... 138

References ................................................................................................ 139

7 Path Models ............................................................................................ 143

7.1 An Example ................................................................................... 144

7.2 Model Specification ...................................................................... 147

7.3 Model Identification ..................................................................... 150

7.4 Model Estimation ......................................................................... 151

7.5 Model Testing ............................................................................... 154

7.6 Model Modification ..................................................................... 155

7.7 Summary ....................................................................................... 156

Appendix: LISREL–SIMPLIS Path Model Program ........................... 156

Chapter Footnote .................................................................................... 158

Another Traditional Non-SEM Path Model-Fit Index ............ 158

LISREL–SIMPLIS program ......................................................... 158

References .................................................................................................161

8 Confirmatory Factor Models ............................................................... 163

8.1 An Example ................................................................................... 164

8.2 Model Specification ...................................................................... 166

8.3 Model Identification ......................................................................167

8.4 Model Estimation ......................................................................... 169

8.5 Model Testing ............................................................................... 170

8.6 Model Modification ..................................................................... 173

8.7 Summary ........................................................................................174

Appendix: LISREL–SIMPLIS Confirmatory Factor Model Program ....174

References ................................................................................................ 177

9 Developing Structural Equation Models: Part I.............................. 179

9.1 Observed Variables and Latent Variables ................................. 180

9.2 Measurement Model .................................................................... 184


9.3 Structural Model .......................................................................... 186

9.4 Variances and Covariance Terms .............................................. 189

9.5 Two-Step/Four-Step Approach .................................................. 191

9.6 Summary ....................................................................................... 192

References ................................................................................................ 193

10 Developing Structural Equation Models: Part II ............................ 195

10.1 An Example ................................................................................... 195

10.2 Model Specification ...................................................................... 197

10.3 Model Identification ..................................................................... 200

10.4 Model Estimation ......................................................................... 202

10.5 Model Testing ............................................................................... 203

10.6 Model Modification ..................................................................... 205

10.7 Summary ....................................................................................... 207

Appendix: LISREL–SIMPLIS Structural Equation Model Program .....207

References ................................................................................................ 208

11 Reporting SEM Research: Guidelines and Recommendations ... 209

11.1 Data Preparation .......................................................................... 212

11.2 Model Specification ...................................................................... 213

11.3 Model Identification ..................................................................... 215

11.4 Model Estimation ..........................................................................216

11.5 Model Testing ............................................................................... 217

11.6 Model Modification ..................................................................... 218

11.7 Summary ....................................................................................... 219

References ................................................................................................ 220

12 Model Validation ................................................................................... 223

Key Concepts ........................................................................................... 223

12.1 Multiple Samples .......................................................................... 223

12.1.1 Model A Computer Output ...........................................226

12.1.2 Model B Computer Output ............................................ 227

12.1.3 Model C Computer Output ........................................... 228

12.1.4 Model D Computer Output ...........................................229

12.1.5 Summary .......................................................................... 229

12.2 Cross Validation ........................................................................... 229

12.2.1 ECVI .................................................................................. 230

12.2.2 CVI .................................................................................... 231

12.3 Bootstrap .......................................................................................234

12.3.1 PRELIS Graphical User Interface .................................. 234

12.3.2 LISREL and PRELIS Program Syntax .......................... 237

12.4 Summary ....................................................................................... 241

References ................................................................................................ 243


13 Multiple Sample, Multiple Group, and Structured Means Models ........................................................................ 245

13.1 Multiple Sample Models ............................................................. 245

Sample 1 ........................................................................................ 247

Sample 2 ........................................................................................ 247

13.2 Multiple Group Models ...............................................................250

13.2.1 Separate Group Models .................................................. 251

13.2.2 Similar Group Model .....................................................255

13.2.3 Chi-Square Difference Test ............................................ 258

13.3 Structured Means Models .......................................................... 259

13.3.1 Model Specification and Identification ........................ 259

13.3.2 Model Fit .......................................................................... 261

13.3.3 Model Estimation and Testing ...................................... 261

13.4 Summary ....................................................................................... 263

Suggested Readings ................................................................................ 267

Multiple Samples ......................................................................... 267

Multiple Group Models .............................................................. 267

Structured Means Models ........................................................... 267

Chapter Footnote .................................................................................... 268

SPSS ................................................................................................ 268

References ................................................................................................ 269

14 Second-Order, Dynamic, and Multitrait Multimethod Models .....271

14.1 Second-Order Factor Model ....................................................... 271

14.1.1 Model Specification and Identification ........................ 271

14.1.2 Model Estimation and Testing ...................................... 272

14.2 Dynamic Factor Model .................................................................274

14.3 Multitrait Multimethod Model (MTMM) ................................. 277

14.3.1 Model Specification and Identification ........................ 279

14.3.2 Model Estimation and Testing ...................................... 280

14.3.3 Correlated Uniqueness Model ...................................... 281

14.4 Summary ....................................................................................... 286

Suggested Readings ................................................................................ 290

Second-Order Factor Models ...................................................... 290

Dynamic Factor Models .............................................................. 290

Multitrait Multimethod Models ................................................. 290

Correlated Uniqueness Model ................................................... 291

References ................................................................................................ 291

15 Multiple Indicator–Multiple Cause, Mixture, and Multilevel Models ......................................................... 293

15.1 Multiple Indicator–Multiple Cause (MIMIC) Models ............. 293

15.1.1 Model Specification and Identification ........................ 294

15.1.2 Model Estimation and Model Testing .......................... 294


15.1.3 Model Modification ........................................................ 297

Goodness-of-Fit Statistics .............................................. 297

Measurement Equations ................................................ 297

Structural Equations ....................................................... 298

15.2 Mixture Models ............................................................................ 298

15.2.1 Model Specification and Identification ........................ 299

15.2.2 Model Estimation and Testing ...................................... 301

15.2.3 Model Modification ........................................................ 302

15.2.4 Robust Statistic ................................................................305

15.3 Multilevel Models ........................................................................ 307

15.3.1 Constant Effects .............................................................. 313

15.3.2 Time Effects ..................................................................... 313

15.3.3 Gender Effects ................................................................. 315

15.3.4 Multilevel Model Interpretation ....................................318

15.3.5 Intraclass Correlation ..................................................... 319

15.3.6 Deviance Statistic ............................................................ 320

15.4 Summary ....................................................................................... 320

Suggested Readings ................................................................................ 324

Multiple Indicator–Multiple Cause Models ............................. 324

Mixture Models ............................................................................ 325

Multilevel Models ........................................................................ 325

References ................................................................................................ 325

16 Interaction, Latent Growth, and Monte Carlo Methods ................ 327

16.1 Interaction Models ....................................................................... 327

16.1.1 Categorical Variable Approach ..................................... 328

16.1.2 Latent Variable Interaction Model ................................ 331

16.1.2.1 Computing Latent Variable Scores ............... 331

16.1.2.2 Computing Latent Interaction Variable ....... 333

16.1.2.3 Interaction Model Output ..............................335

16.1.2.4 Model Modification ......................................... 336

16.1.2.5 Structural Equations—No Latent Interaction Variable ......................................... 336

16.1.3 Two-Stage Least Squares (TSLS) Approach ................ 337

16.2 Latent Growth Curve Models..................................................... 341

16.2.1 Latent Growth Curve Program ..................................... 343

16.2.2 Model Modification ........................................................344

16.3 Monte Carlo Methods ..................................................................345

16.3.1 PRELIS Simulation of Population Data........................ 346

16.3.2 Population Data from Specified Covariance Matrix .......................................................... 352

16.3.2.1 SPSS Approach ................................................ 352

16.3.2.2 SAS Approach ..................................................354

16.3.2.3 LISREL Approach ............................................ 355


16.3.3 Covariance Matrix from Specified Model ................... 359

16.4 Summary ....................................................................................... 365

Suggested Readings ................................................................................ 368

Interaction Models ....................................................................... 368

Latent Growth-Curve Models .................................................... 368

Monte Carlo Methods .................................................................. 368

References ................................................................................................ 369

17 Matrix Approach to Structural Equation Modeling ....................... 373

17.1 General Overview of Matrix Notation ...................................... 373

17.2 Free, Fixed, and Constrained Parameters ................................. 379

17.3 LISREL Model Example in Matrix Notation ............................ 382

LISREL8 Matrix Program Output (Edited and Condensed)..385

17.4 Other Models in Matrix Notation ..............................................400

17.4.1 Path Model .......................................................................400

17.4.2 Multiple-Sample Model ................................................. 404

17.4.3 Structured Means Model ............................................... 405

17.4.4 Interaction Models .......................................................... 410

PRELIS Computer Output .......................................................... 412

LISREL Interaction Computer Output .......................................416

17.5 Summary ....................................................................................... 421

References ................................................................................................ 423

Appendix A: Introduction to Matrix Operations ...................................425

Appendix B: Statistical Tables ...................................................................439

Answers to Selected Exercises ................................................................... 449

Author Index .................................................................................................. 489

Subject Index ................................................................................................. 495


About the Authors

RANDALL E. SCHUMACKER received his Ph.D. in educational psychology from Southern Illinois University. He is currently professor of educational research at the University of Alabama, where he teaches courses in structural equation modeling, multivariate statistics, multiple regression, and program evaluation. His research interests are varied, including modeling interaction in SEM, robust statistics (normal scores, centering, and variance inflation factor issues), and SEM specification search issues as well as measurement model issues related to estimation, mixed-item formats, and reliability.

He has published in several journals including Academic Medicine, Educational and Psychological Measurement, Journal of Applied Measurement, Journal of Educational and Behavioral Statistics, Journal of Research Methodology, Multiple Linear Regression Viewpoints, and Structural Equation Modeling. He has served on the editorial boards of numerous journals and is a member of the American Educational Research Association, American Psychological Association—Division 5, as well as past-president of the Southwest Educational Research Association, and emeritus editor of Structural Equation Modeling journal. He can be contacted at the University of Alabama College of Education.

RICHARD G. LOMAX received his Ph.D. in educational research methodology from the University of Pittsburgh. He is currently a professor in the School of Educational Policy and Leadership, Ohio State University, where he teaches courses in structural equation modeling, statistics, and quantitative research methodology.

His research primarily focuses on models of literacy acquisition, multivariate statistics, and assessment. He has published in such diverse journals as Parenting: Science and Practice; Understanding Statistics: Statistical Issues in Psychology, Education, and the Social Sciences; Violence Against Women; Journal of Early Adolescence; and Journal of Negro Education. He has served on the editorial boards of numerous journals, and is a member of the American Educational Research Association, the American Statistical Association, and the National Reading Conference. He can be contacted at Ohio State University College of Education and Human Ecology.


Preface

Approach

This book presents a basic introduction to structural equation modeling (SEM). Readers will find that we have kept to our tradition of keeping examples rudimentary and easy to follow. The reader is provided with a review of correlation and covariance, followed by multiple regression, path, and factor analyses in order to better understand the building blocks of SEM. The book describes a basic structural equation model followed by the presentation of several different types of structural equation models.

Our approach in the text is both conceptual and application oriented. Each chapter covers basic concepts, principles, and practice and then utilizes SEM software to provide meaningful examples. Each chapter also features an outline, key concepts, a summary, numerous examples from a variety of disciplines, tables, and figures, including path diagrams, to assist with conceptual understanding. Chapters with examples follow the conceptual sequence of SEM steps known as model specification, identification, estimation, testing, and modification.

The book now uses LISREL 8.8 student version to make the software and

examples readily available to readers. Please be aware that the student

version, although free, does not contain all of the functional features as a

full licensed version. Given the advances in SEM software over the past

decade, you should expect updates and patches of this software package

and therefore become familiar with any new features as well as explore the

excellent library of examples and help materials. The LISREL 8.8 student

version is an easy-to-use Windows PC based program with pull-down

menus, dialog boxes, and drawing tools. To access the program, and/or

if you’re a Mac user and are interested in learning about Mac availability,

please check with Scientific Software (http://www.ssicentral.com). There

is also a hotlink to the Scientic Software site from the book page for A

Beginner’s Guide to Structural Equation Modeling, 3rd edition on the Textbook

Resources tab at www.psypress.com.

The SEM model examples in the book do not require complicated pro-

gramming skills nor does the reader need an advanced understanding of

statistics and matrix algebra to understand the model applications. We have

provided a chapter on the matrix approach to SEM as well as an appendix

on matrix operations for the interested reader. We encourage the under-

standing of the matrices used in SEM models, especially for some of the

more advanced SEM models you will encounter in the research literature.


Goals and Content Coverage

Our main goal in this third edition is for students and researchers to be

able to conduct their own SEM model analyses, as well as be able to under-

stand and critique published SEM research. These goals are supported by

the conceptual and applied examples contained in the book and several

journal article references for each advanced SEM model type. We have

also included a SEM checklist to guide your model analysis according to

the basic steps a researcher takes.

As for content coverage, the book begins with an introduction to SEM

(what it is, some history, why conduct it, and what software is available),

followed by chapters on data entry and editing issues, and correlation.

These early chapters are critical to understanding how missing data, non-

normality, scale of measurement, non-linearity, outliers, and restriction of

range in scores affect SEM analysis. Chapter 4 lays out the basic steps of

model specification, identification, estimation, testing, and modification,

followed by Chapter 5, which covers issues related to model fit indices,

power, and sample size. Chapters 6 through 10 follow the basic SEM steps

of modeling, with actual examples from different disciplines, using

regression, path, confirmatory factor, and structural equation models. Logically,

the next chapter presents information about reporting SEM research and

includes a SEM checklist to guide decision-making. Chapter 12 presents

different approaches to model validation, an important final step after

obtaining an acceptable theoretical model. Chapters 13 through 16 provide

SEM examples that introduce many of the different types of SEM model

applications. The final chapter describes the matrix approach to structural

equation modeling by using examples from the previous chapters.

Theoretical models are present in every discipline, and therefore can be

formulated and tested. This third edition expands SEM models and appli-

cations to provide the students and researchers in medicine, political sci-

ence, sociology, education, psychology, business, and the biological sciences

the basic concepts, principles, and practice necessary to test their theoreti-

cal models. We hope you become more familiar with structural equation

modeling after reading the book, and use SEM in your own research.

New to the Third Edition

The first edition of this book was one of the first books published on SEM,

while the second edition greatly expanded knowledge of advanced SEM

models. Since that time, we have had considerable experience utilizing the


book in class with our students. As a result of those experiences, the third

edition represents a more usable book for teaching SEM. As such, it is an

ideal text for introductory graduate level courses in structural equation

modeling or factor analysis taught in departments of psychology, educa-

tion, business, and other social and healthcare sciences. An understand-

ing of correlation is assumed.

The third edition offers several new surprises, namely:

1. Our instruction and examples are now based on freely available

software: LISREL 8.8 student version.

2. More examples presented from more disciplines, including input,

output, and screenshots.

3. Every chapter has been updated and enhanced with additional

material.

4. A website with raw data sets for the book’s examples and exer-

cises so they can be used with any SEM program, all of the book’s

exercises, hotlinks to related websites, and answers to all of the

exercises for instructors only. To access the website visit the book

page or the Textbook Resource page at www.psypress.com.

5. Expanded coverage of advanced models with more on multiple-

group, multi-level, and mixture modeling (Chs. 13 and 15), second-

order and dynamic factor models (Ch. 14), and Monte Carlo

methods (Ch. 16).

6. Increased coverage of sample size and power (Ch. 5), including

software programs, and reporting research (Ch. 11).

7. New journal article references help readers better understand

published research (Chs. 13–17).

8. Troubleshooting tips on how to address the most frequently

encountered problems are found in Chapters 3 and 11.

9. Chapters 13 to 16 now include additional SEM model examples.

10. 25% new exercises with answers to half in the back of the book

for student review (and answers to all for instructors only on the

book and/or Textbook Resource page at www.psypress.com).

11. Added matrix examples for several models in Chapter 17.

12. Updated references in all chapters on all key topics.

Overall, we believe this third edition is a more complete book that can

be used to teach a full course in SEM. The past several years have seen an

explosion in SEM coursework, books, websites, and training courses. We

are proud to have been considered a starting point for many beginners

to SEM. We hope you find that this third edition expands on many of the

programming tools, trends, and topics in SEM today.


Acknowledgments

The third edition of this book represents more than thirty years of inter-

acting with our colleagues and students who use structural equation

modeling. As before, we are most grateful to the pioneers in the field of

structural equation modeling, particularly to Karl Jöreskog, Dag Sörbom,

Peter Bentler, James Arbuckle, and Linda and Bengt Muthén. These

individuals have developed and shaped the new advances in the SEM field as

well as the content of this book, plus provided SEM researchers with soft-

ware programs. We are also grateful to Gerhard Mels who answered our

questions and inquiries about SEM programming problems in the

chapters. We also wish to thank the reviewers: James Leeper, The University

of Alabama; Philip Smith, Augusta State University; Phil Wood, the

University of Missouri–Columbia; and Ke-Hai Yuan, the University of

Notre Dame.

This book was made possible through the encouragement of Debra

Riegert at Routledge/Taylor & Francis who insisted it was time for a third

edition. We wish to thank her and her editorial assistant, Erin M. Flaherty,

for coordinating all of the activity required to get a book into print. We

also want to thank Suzanne Lassandro at Taylor & Francis Group, LLC

for helping us through the difficult process of revisions, galleys, and final

book copy.

Randall E. Schumacker

The University of Alabama

Richard G. Lomax

The Ohio State University

1

Introduction

Key Concepts

Latent and observed variables

Independent and dependent variables

Types of models

Regression

Path

Confirmatory factor

Structural equation

History of structural equation modeling

Structural equation modeling software programs

Structural equation modeling can be easily understood if the researcher

has a grounding in basic statistics, correlation, and regression analysis.

The first three chapters provide a brief introduction to structural equation

modeling (SEM), basic data entry and editing issues in statistics, and

concepts related to the use of correlation coefficients in structural equation

modeling. Chapter 4 covers the essential concepts of SEM: model

specification, identification, estimation, testing, and modification. This basic

understanding provides the framework for understanding the material

presented in chapters 5 through 8 on model-fit indices, regression

analysis, path analysis, and confirmatory factor analysis models (measurement

models), which form the basis for understanding the structural equation

models (latent variable models) presented in chapters 9 and 10. Chapter 11

provides guidance on reporting structural equation modeling research.

Chapter 12 addresses techniques used to establish model validity and

generalization of findings. Chapters 13 to 16 present many of the advanced

SEM models currently appearing in journal articles: multiple group, mul-

tiple indicators–multiple causes, mixture, multilevel, structured means,

multitrait–multimethod, second-order factor, dynamic factor, interaction


models, latent growth curve models, and Monte Carlo studies. Chapter 17

presents matrix notation for one of our SEM applications, covers the differ-

ent matrices used in structural equation modeling, and presents multiple

regression and path analysis solutions using matrix algebra. We include

an introduction to matrix operations in the Appendix for readers who

want a more mathematical understanding of matrix operations. To start

our journey of understanding, we first ask, What is structural equation

modeling? Then, we give a brief history of SEM, discuss the importance of

SEM, and note the availability of SEM software programs.

1.1 What Is Structural Equation Modeling?

Structural equation modeling (SEM) uses various types of models to

depict relationships among observed variables, with the same basic goal

of providing a quantitative test of a theoretical model hypothesized by

the researcher. More specifically, various theoretical models can be tested

in SEM that hypothesize how sets of variables define constructs and

how these constructs are related to each other. For example, an educa-

tional researcher might hypothesize that a student’s home environment

influences her later achievement in school. A marketing researcher may

hypothesize that consumer trust in a corporation leads to increased prod-

uct sales for that corporation. A health care professional might believe

that a good diet and regular exercise reduce the risk of a heart attack.

In each example, the researcher believes, based on theory and empirical

research, that sets of variables define the constructs that are hypothesized to be

related in a certain way. The goal of SEM analysis is to determine the extent to

which the theoretical model is supported by sample data. If the sample data

support the theoretical model, then more complex theoretical models can be

hypothesized. If the sample data do not support the theoretical model, then

either the original model can be modied and tested, or other theoretical

models need to be developed and tested. Consequently, SEM tests theoreti-

cal models using the scientic method of hypothesis testing to advance our

understanding of the complex relationships among constructs.

SEM can test various types of theoretical models. Basic models include

regression (chapter 6), path (chapter 7), and confirmatory factor

(chapter 8) models. Our reason for covering these basic models is that they

provide a basis for understanding structural equation models (chapters

9 and 10). To better understand these basic models, we need to define a

few terms. First, there are two major types of variables: latent variables

and observed variables. Latent variables (constructs or factors) are vari-

ables that are not directly observable or measured. Latent variables are


indirectly observed or measured, and hence are inferred from a set of

observed variables that we actually measure using tests, surveys, and

so on. For example, intelligence is a latent variable that represents a

psychological construct. The confidence of consumers in American business

is another latent variable, one representing an economic construct. The

physical condition of adults is a third latent variable, one representing a

health-related construct.

The observed, measured, or indicator variables are a set of variables that

we use to define or infer the latent variable or construct. For example, the

Wechsler Intelligence Scale for Children—Revised (WISC-R) is an instru-

ment that produces a measured variable (scores), which one uses to infer

the construct of a child’s intelligence. Additional indicator variables, that

is, intelligence tests, could be used to indicate or define the construct of

intelligence (latent variable). The Dow-Jones index is a standard measure

of the American corporate economy construct. Other measured variables

might include gross national product, retail sales, or export sales. Blood

pressure is one of many health-related variables that could indicate a

latent variable defined as “fitness.” Each of these observed or indicator

variables represents one definition of the latent variable. Researchers use

sets of indicator variables to define a latent variable; thus, other

measurement instruments are used to obtain indicator variables, for example, the

Stanford–Binet Intelligence Scale, the NASDAQ index, and an individual’s

cholesterol level, respectively.

Variables, whether they are observed or latent, can also be defined

as either independent variables or dependent variables. An independent

variable is a variable that is not influenced by any other variable in

the model. A dependent variable is a variable that is influenced by

another variable in the model. Let us return to the previous examples

and specify the independent and dependent variables. The educational

researcher hypothesizes that a student’s home environment

(independent latent variable) influences school achievement (dependent latent

variable). The marketing researcher believes that consumer trust in a

corporation (independent latent variable) leads to increased product

sales (dependent latent variable). The health care professional wants to

determine whether a good diet and regular exercise (two independent

latent variables) influence the frequency of heart attacks (dependent

latent variable).

The basic SEM models in chapters 6 through 8 illustrate the use of

observed variables and latent variables when defined as independent

or dependent. A regression model consists solely of observed variables

where a single dependent observed variable is predicted or explained by

one or more independent observed variables; for example, a parent’s edu-

cation level (independent observed variable) is used to predict his or her

child’s achievement score (dependent observed variable). A path model is


also specified entirely with observed variables, but the flexibility allows

for multiple independent observed variables and multiple dependent

observed variables—for example, export sales, gross national product,

and NASDAQ index inuence consumer trust and consumer spending

(dependent observed variables). Path models, therefore, test more

complex models than regression models. Confirmatory factor models

consist of observed variables that are hypothesized to measure one or more

latent variables (independent or dependent); for example, diet, exercise,

and physiology are observed measures of the independent latent variable

“fitness.” An understanding of these basic models will help in

understanding structural equation modeling, which combines path and factor

analytic models. Structural equation models consist of observed variables

and latent variables, whether independent or dependent; for example, an

independent latent variable (home environment) inuences a dependent

latent variable (achievement), where both types of latent variables are

measured, dened, or inferred by multiple observed or measured indica-

tor variables.

1.2 History of Structural Equation Modeling

To discuss the history of structural equation modeling, we explain the fol-

lowing four types of related models and their chronological order of devel-

opment: regression, path, conrmatory factor, and structural equation

models.

The first model involves linear regression models that use a correlation

coefficient and the least squares criterion to compute regression weights.

Regression models were made possible because Karl Pearson created a

formula for the correlation coefficient in 1896 that provides an index for

the relationship between two variables (Pearson, 1938). The regression

model permits the prediction of dependent observed variable scores

(Y scores), given a linear weighting of a set of independent observed

scores (X scores) that minimizes the sum of squared residual error val-

ues. The mathematical basis for the linear regression model is found in

basic algebra. Regression analysis provides a test of a theoretical model

that may be useful for prediction (e.g., admission to graduate school or

budget projections). In an example study, regression analysis was used

to predict student exam scores in statistics (dependent variable) from a

series of collaborative learning group assignments (independent vari-

ables; Delucchi, 2006). The results provided some support for collabora-

tive learning groups improving statistics exam performance, although

not for all tasks.
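The least squares criterion described above can be sketched in a few lines of code. The numbers below are invented for illustration (they are not from the Delucchi study); the computation simply finds the linear weights that minimize the sum of squared residuals:

```python
import numpy as np

# Invented illustration: predict an exam score (Y) from two
# independent observed scores (X1, X2). The column of 1s
# estimates the intercept.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 1.0, 4.0],
              [1.0, 3.0, 1.0],
              [1.0, 4.0, 2.0],
              [1.0, 2.5, 2.5]])
y = np.array([60.0, 55.0, 70.0, 80.0, 66.0])

# b minimizes the sum of squared residuals ||y - Xb||^2
b, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ b

print(b)                     # intercept and two regression weights
print(np.sum(residuals**2))  # the minimized residual sum of squares
```

Any SEM program estimates a regression model in essentially this way; chapter 6 takes up regression models within SEM software.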


Some years later, Charles Spearman (1904, 1927) used the correlation

coefficient to determine which items correlated or went together to create

the factor model. His basic idea was that if a set of items correlated or

went together, individual responses to the set of items could be summed

to yield a score that would measure, define, or infer a construct. Spearman

was the first to use the term factor analysis in defining a two-factor

construct for a theory of intelligence. D. N. Lawley and L. L. Thurstone in 1940

further developed applications of factor models, and proposed instru-

ments (sets of items) that yielded observed scores from which constructs

could be inferred. Most of the aptitude, achievement, and diagnostic

tests, surveys, and inventories in use today were created using factor

analytic techniques. The term confirmatory factor analysis (CFA) is used today

based in part on earlier work by Howe (1955), Anderson and Rubin (1956),

and Lawley (1958). The CFA method was more fully developed by Karl

Jöreskog in the 1960s to test whether a set of items defined a construct.

Jöreskog completed his dissertation in 1963, published the first article on

CFA in 1969, and subsequently helped develop the first CFA software

program. Factor analysis has been used for over 100 years to create

measurement instruments in many academic disciplines, while today CFA is used

to test the existence of these theoretical constructs. In an example study,

CFA was used to confirm the “Big Five” model of personality by Goldberg

(1990). The five-factor model of extraversion, agreeableness,

conscientiousness, neuroticism, and intellect was confirmed through the use of multiple

indicator variables for each of the ve hypothesized factors.

Sewell Wright (1918, 1921, 1934), a biologist, developed the third type of

model, a path model. Path models use correlation coefficients and

regression analysis to model more complex relationships among observed

variables. The first applications of path analysis dealt with models of

animal behavior. Unfortunately, path analysis was largely overlooked

until econometricians reconsidered it in the 1950s as a form of simultane-

ous equation modeling (e.g., H. Wold) and sociologists rediscovered it in

the 1960s (e.g., O. D. Duncan and H. M. Blalock). In many respects, path

analysis involves solving a set of simultaneous regression equations that

theoretically establish the relationship among the observed variables in

the path model. In an example path analysis study, Walberg’s theoretical

model of educational productivity was tested for fifth- through eighth-

grade students (Parkerson et al., 1984). The relations among the follow-

ing variables were analyzed in a single model: home environment, peer

group, media, ability, social environment, time on task, motivation, and

instructional strategies. All of the hypothesized paths among those

variables were shown to be statistically significant, providing support for the

educational productivity model.
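The idea of solving simultaneous regression equations can be sketched with simulated data. The three-variable model below is our own invention (not the Parkerson et al. model): each dependent variable in the path model gets its own regression equation, and solving the set of equations recovers the path coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical path model: home -> motivation -> achievement,
# plus a direct path from home to achievement.
home = rng.normal(size=n)
motivation = 0.6 * home + rng.normal(size=n)
achievement = 0.5 * motivation + 0.2 * home + rng.normal(size=n)

def ols(y, *xs):
    """Least squares weights for one regression equation
    (no intercept; the simulated variables are centered at
    zero in expectation)."""
    X = np.column_stack(xs)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

# One regression equation per dependent variable in the path model.
p_mot = ols(motivation, home)               # close to the generating 0.6
p_ach = ols(achievement, motivation, home)  # close to 0.5 and 0.2
print(p_mot, p_ach)
```

With a reasonably large sample, the estimated weights land near the coefficients used to generate the data, which is the sense in which the set of equations "establishes the relationship among the observed variables."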

The final model type is structural equation modeling (SEM). SEM

models essentially combine path models and confirmatory factor models;


that is, SEM models incorporate both latent and observed variables. The

early development of SEM models was due to Karl Jöreskog (1969, 1973),

Ward Keesling (1972), and David Wiley (1973); this approach was initially

known as the JKW model, but became known as the linear structural rela-

tions model (LISREL) with the development of the rst software program,

LISREL, in 1973. Since then, many SEM articles have been published; for

example, Shumow and Lomax (2002) tested a theoretical model of

parental efficacy for adolescent students. For the overall sample, neighborhood

quality predicted parental efficacy, which predicted parental involvement

and monitoring, both of which predicted academic and social-emotional

adjustment.

Jöreskog and van Thillo originally developed the LISREL software pro-

gram at the Educational Testing Service (ETS) using a matrix command

language (i.e., involving Greek and matrix notation), which is described

in chapter 17. The first publicly available version, LISREL III, was released

in 1976. Later, in 1993, LISREL 8 was released; it introduced the SIMPLIS

(SIMPle LISrel) command language in which equations are written

using variable names. In 1999, the first interactive version of LISREL was

released. LISREL 8 introduced the dialog box interface using pull-down

menus and point-and-click features to develop models, and the path

diagram mode, a drawing program to develop models. Karl Jöreskog was

recognized by Cudeck, Du Toit, and Sörbom (2001), who edited a Festschrift

in honor of his contributions to the field of structural equation modeling.

Their volume contains chapters by scholars who address the many top-

ics, concerns, and applications in the eld of structural equation model-

ing today, including milestones in factor analysis; measurement models;

robustness, reliability, and t assessment; repeated measurement designs;

ordinal data; and interaction models. We cover many of these topics in

this book, although not in as much depth. The field of structural

equation modeling across all disciplines has expanded since 1994. Hershberger

(2003) found that between 1994 and 2001 the number of journal articles

concerned with SEM increased, the number of journals publishing SEM

research increased, SEM became a popular choice amongst multivariate

methods, and the journal Structural Equation Modeling became the primary

source for technical developments in structural equation modeling.

1.3 Why Conduct Structural Equation Modeling?

Why is structural equation modeling popular? There are at least four

major reasons for the popularity of SEM. The first reason is that

researchers are becoming more aware of the need to use multiple observed


variables to better understand their area of scientific inquiry. Basic

statistical methods utilize only a limited number of variables and are not

capable of dealing with the sophisticated theories being developed. The

use of a small number of variables to understand complex phenomena is

limiting. For instance, the use of simple bivariate correlations is not

sufficient for examining a sophisticated theoretical model. In contrast,

structural equation modeling permits complex phenomena to be statistically

modeled and tested. SEM techniques are therefore becoming the preferred

method for confirming (or disconfirming) theoretical models in a

quantitative fashion.

A second reason involves the greater recognition given to the valid-

ity and reliability of observed scores from measurement instruments.

Specifically, measurement error has become a major issue in many

disciplines, but measurement error and statistical analysis of data have

been treated separately. Structural equation modeling techniques explic-

itly take measurement error into account when statistically analyzing

data. As noted in subsequent chapters, SEM analysis includes latent and

observed variables as well as measurement error terms in certain SEM

models.

A third reason pertains to how structural equation modeling has matured

over the past 30 years, especially the ability to analyze more advanced the-

oretical SEM models. For example, group differences in theoretical models

can be assessed through multiple-group SEM models. In addition, analyz-

ing educational data collected at more than one level—for example, school

districts, schools, and teachers with student data—is now possible using

multilevel SEM modeling. As a final example, interaction terms can now

be included in an SEM model so that main effects and interaction effects

can be tested. These advanced SEM models and techniques have provided

many researchers with an increased capability to analyze sophisticated

theoretical models of complex phenomena, thus requiring less reliance on

basic statistical methods.

Finally, SEM software programs have become increasingly user-

friendly. For example, until 1993 LISREL users had to input the pro-

gram syntax for their models using Greek and matrix notation. At

that time, many researchers sought help because of the complex

programming requirements and knowledge of the SEM syntax that were

required. Today, most SEM software programs are Windows-based

and use pull-down menus or drawing programs to generate the pro-

gram syntax internally. Therefore, the SEM software programs are now

easier to use and contain features similar to other Windows-based

software packages. However, such ease of use necessitates statisti-

cal training in SEM modeling and software via courses, workshops,

or textbooks to avoid mistakes and errors in analyzing sophisticated

theoretical models.


1.4 Structural Equation Modeling Software Programs

Although the LISREL program was the first SEM software program,

other software programs have subsequently been developed since the

mid-1980s. Some of the other programs include AMOS, EQS, Mx, Mplus,

RAMONA, and SEPATH, to name a few. These software programs are each

unique in their own way, with some offering specialized features for

conducting different SEM applications. Many of these SEM software

programs provide statistical analysis of raw data (e.g., means, correla-

tions, missing data conventions), provide routines for handling missing

data and detecting outliers, generate the program’s syntax, diagram the

model, and provide for import and export of data and figures of a

theoretical model. Also, many of the programs come with sets of data and

program examples that are clearly explained in their user guides. Many

of these software programs have been reviewed in the journal Structural

Equation Modeling.

The pricing information for SEM software varies depending on indi-

vidual, group, or site license arrangements; corporate versus educa-

tional settings; and even whether one is a student or faculty member.

Furthermore, newer versions and updates necessitate changes in pric-

ing. Most programs will run in the Windows environment; some run

on Macintosh personal computers. We are often asked to recommend

a software package to a beginning SEM researcher; however, given the

different individual needs of researchers and the multitude of different

features available in these programs, we are not able to make such a rec-

ommendation. Ultimately the decision depends upon the researcher’s

needs and preferences. Consequently, with so many software packages,

we felt it important to narrow our examples in the book to LISREL–

SIMPLIS programs.

We will therefore be using the LISREL 8.8 student version in the book

to demonstrate the many different SEM applications, including

regression models, path models, confirmatory factor models, and the various

SEM models in chapters 13 through 16. The free student version of the

LISREL software program (Windows, Mac, and Linux editions) can be

downloaded from the website: http://www.ssicentral.com/lisrel/student.

html. (Note: When installing the software, the LISREL 8.8 Student Examples

folder is placed in the main directory C:\ of your computer, not in the

LISREL folder under C:\Program Files.)


Once the LISREL software is downloaded, place an icon on your desk-

top by creating a shortcut to the LISREL icon. The LISREL icon should

look something like this:

LISREL 8.80 Student.lnk

When you click on the icon, an empty dialog box will appear that should

look like this:

NOTE: Nothing appears until you open a program file or data set using

the File or open folder icon; more about this in the next chapter.

We do want to mention the very useful HELP menu. Click on the

question mark (?) and a HELP menu will appear; then enter Output Questions in

the search window to find answers to key questions you may have when

going over examples in the Third Edition.


1.5 Summary

In this chapter we introduced structural equation modeling by describ-

ing basic types of variables—that is, latent, observed, independent, and

dependent—and basic types of SEM models—that is, regression, path,

confirmatory factor, and structural equation models. In addition, a brief

history of structural equation modeling was provided, followed by a dis-

cussion of the importance of SEM. This chapter concluded with a brief

listing of the different structural equation modeling software programs

and where to obtain the LISREL 8.8 student version for use with examples


in the book, including what the dialog box will first appear like and a very

useful HELP menu.

In chapter 2 we consider the importance of examining data for issues

related to measurement level (nominal, ordinal, interval, or ratio), restric-

tion of range (fewer than 15 categories), missing data, outliers (extreme

values), linearity or nonlinearity, and normality or nonnormality, all of

which can affect statistical methods, and especially SEM applications.

Exercises

1. Define the following terms:

a. Latent variable

b. Observed variable

c. Dependent variable

d. Independent variable

2. Explain the difference between a dependent latent variable and

a dependent observed variable.

3. Explain the difference between an independent latent variable

and an independent observed variable.

4. List the reasons why a researcher would conduct structural

equation modeling.

5. Download and activate the student version of LISREL: http://

www.ssicentral.com

6. Open and import an SPSS or other data file.

References

Anderson, T. W., & Rubin, H. (1956). Statistical inference in factor analysis. In J. Neyman (Ed.), Proceedings of the third Berkeley symposium on mathematical statistics and probability, Vol. V (pp. 111–150). Berkeley: University of California Press.
Cudeck, R., Du Toit, S., & Sörbom, D. (Eds.). (2001). Structural equation modeling: Present and future. A Festschrift in honor of Karl Jöreskog. Lincolnwood, IL: Scientific Software International.
Delucchi, M. (2006). The efficacy of collaborative learning groups in an undergraduate statistics course. College Teaching, 54, 244–248.
Goldberg, L. (1990). An alternative “description of personality”: Big Five factor structure. Journal of Personality and Social Psychology, 59, 1216–1229.
Hershberger, S. L. (2003). The growth of structural equation modeling: 1994–2001. Structural Equation Modeling, 10(1), 35–46.
Howe, W. G. (1955). Some contributions to factor analysis (Report No. ORNL-1919). Oak Ridge, TN: Oak Ridge National Laboratory.
Jöreskog, K. G. (1963). Statistical estimation in factor analysis: A new technique and its foundation. Stockholm: Almqvist & Wiksell.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34, 183–202.
Jöreskog, K. G. (1973). A general method for estimating a linear structural equation system. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences (pp. 85–112). New York: Seminar.
Keesling, J. W. (1972). Maximum likelihood approaches to causal flow analysis. Unpublished doctoral dissertation, University of Chicago, Chicago.
Lawley, D. N. (1958). Estimation in factor analysis under various initial assumptions. British Journal of Statistical Psychology, 11, 1–12.
Parkerson, J. A., Lomax, R. G., Schiller, D. P., & Walberg, H. J. (1984). Exploring causal models of educational achievement. Journal of Educational Psychology, 76, 638–646.
Pearson, E. S. (1938). Karl Pearson: An appreciation of some aspects of his life and work. Cambridge: Cambridge University Press.
Shumow, L., & Lomax, R. G. (2002). Parental efficacy: Predictor of parenting behavior and adolescent outcomes. Parenting: Science and Practice, 2, 127–150.
Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72–101.
Spearman, C. (1927). The abilities of man. New York: Macmillan.
Wiley, D. E. (1973). The identification problem for structural equation models with unmeasured variables. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences (pp. 69–83). New York: Seminar.
Wright, S. (1918). On the nature of size factors. Genetics, 3, 367–374.
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557–585.
Wright, S. (1934). The method of path coefficients. Annals of Mathematical Statistics, 5, 161–215.


2
Data Entry and Data Editing Issues

Key Concepts

Importing data file
System file

Measurement scale

Restriction of range

Missing data

Outliers

Linearity

Nonnormality

An important first step in using LISREL is to be able to enter raw data and/or import data, such as files from other programs (SPSS, SAS, EXCEL, etc.). Other important steps involve being able to use LISREL–PRELIS to save a system file, as well as output and save files that contain the variance–covariance matrix, the correlation matrix, means, and standard deviations of variables so they can be input into command syntax programs. The LISREL–PRELIS program will be briefly explained in this chapter to demonstrate how it handles raw data entry, importing of data, and the output of saved files.

There are several key issues in the field of statistics that impact our analyses once data have been imported into a software program. These data issues are commonly referred to as the measurement scale of variables, restriction in the range of data, missing data values, outliers, linearity, and nonnormality. Each of these data issues will be discussed because they not only affect traditional statistics, but present additional problems and concerns in structural equation modeling.

We use LISREL software throughout the book, so you will need to use that software and become familiar with its Web site. You should by now have downloaded the free student version of the LISREL software. We use some of the data and model examples available in the free student version to illustrate SEM applications. (Note: The LISREL 8.8 Student Examples folder is placed in the main directory C:/ of your computer.) The free student version of the software has a user guide, help functions, and tutorials. The Web site also contains important research, documentation, and information about structural equation modeling. However, be aware that the free student version does not contain the full capabilities available in the full licensed version (e.g., it is restricted to 15 observed variables in SEM analyses). These limitations are spelled out on the Web site.

2.1 Data Entry

The LISREL software program interfaces with PRELIS, a preprocessor of data prior to running LISREL (matrix command language) or SIMPLIS (easier-to-use variable name syntax) programs. The newer Interactive LISREL uses a spreadsheet format for data with pull-down menu options. LISREL offers several different options for inputting data and importing files from numerous other programs. The New, Open, and Import Data functions provide maximum flexibility for inputting data.

The New option permits the creation of a command syntax language program (PRELIS, LISREL, or SIMPLIS) to read in a PRELIS data file, or to open SIMPLIS and LISREL saved projects as well as a previously saved Path Diagram.

The Open option permits you to browse and locate previously saved PRELIS (.pr2), LISREL (.ls8), or SIMPLIS (.spl) programs, each with its unique file extension. The student version has distinct folders containing several program examples, for example, LISREL (LS8EX folder), PRELIS (PR2EX folder), and SIMPLIS (SPLEX folder).

The Import Data option permits inputting raw data files or SPSS saved files. The raw data file, lsat6.dat, is in the PRELIS folder (PR2EX). When selecting this file, you will need to know the number of variables in the file.

An SPSS saved file, data100.sav, is in the SPSS folder (SPSSEX). Once you open this file, a PRELIS system file is created.


Once the PRELIS system file becomes active, it needs to be saved for future use. (Note: A # symbol may appear if columns are too narrow; simply use your mouse to expand the columns so that the missing values—999999.00—will appear. Also, if you right-mouse click on the variable names, a menu appears to define missing values, etc.) The PRELIS system file (.psf) activates a pull-down menu that permits data editing features, data transformations, statistical analysis of data, graphical display of data, multilevel modeling, and many other related features.


The statistical analysis of data includes factor analysis, probit regression, least squares regression, and two-stage least squares methods. Other important data editing features include imputing missing values, a homogeneity test, creation of normal scores, bootstrapping, and data output options. The data output options permit saving different types of variance–covariance matrices and descriptive statistics in files for use in LISREL and SIMPLIS command syntax programs. This capability is very important, especially when advanced SEM models are analyzed in chapters 13 to 16. We will demonstrate the use of this Output Options dialog box in this chapter and in some of our other chapter examples.

2.2 Data Editing Issues

2.2.1 Measurement Scale

How variables are measured or scaled influences the type of statistical analyses we perform (Anderson, 1961; Stevens, 1946). Properties of scale also guide our understanding of permissible mathematical operations.


For example, a nominal variable implies mutually exclusive groups; biological gender has two mutually exclusive groups, male and female. An individual can only be in one of the groups that define the levels of the variable. In addition, it would not be meaningful to calculate a mean and a standard deviation on the variable gender. Consequently, the number or percentage of individuals at each level of the gender variable is the only mathematical property of scale that makes sense. An ordinal variable, for example, attitude toward school, that is scaled strongly agree, agree, neutral, disagree, and strongly disagree, implies mutually exclusive categories that are ordered or ranked. When levels of a variable have properties of scale that involve mutually exclusive groups that are ordered, only certain mathematical operations are meaningful, for example, a comparison of ranks between groups. SEM final exam scores, an example of an interval variable, possess the property of scale implying equal intervals between the data points, but no true zero point. This property of scale permits the mathematical operations of computing a mean and a standard deviation. Similarly, a ratio variable, for example, weight, has the property of scale that implies equal intervals and a true zero point (weightlessness). Therefore, ratio variables also permit the mathematical operations of computing a mean and a standard deviation. Our use of different variables requires us to be aware of their properties of scale and what mathematical operations are possible and meaningful, especially in SEM, where variance–covariance (correlation) matrices are used with means and standard deviations of variables. Different correlations among variables are therefore possible depending upon the level of measurement, but they create unique problems in SEM (see chapter 3). PRELIS designates continuous variables (CO), ordinal variables (OR), and categorical variables (CL) to make these distinctions.

2.2.2 Restriction of Range

Data values at the interval or ratio level of measurement can be further defined as being discrete or continuous. For example, SEM final exam scores could be reported in whole numbers (discrete). Similarly, the number of children in a family would be considered a discrete level of measurement—for example, 5 children. In contrast, a continuous variable is reported using decimal values; for example, a student’s grade point average would be reported as 3.75 on a 5-point scale.

Karl Jöreskog (1996) provided a criterion in the PRELIS program, based on his research, that defines whether a variable is ordinal or interval, based on the presence of 15 distinct scale points. If a variable has fewer than 15 categories or scale points, it is referenced in PRELIS as ordinal (OR), whereas a variable with 15 or more categories is referenced as continuous (CO). This 15-point criterion allows Pearson correlation coefficient values to vary between +/−1.0. Variables with fewer distinct scale points restrict the value of the Pearson correlation coefficient such that it may only vary between +/−0.5. Other factors that affect the Pearson correlation coefficient are presented in this chapter and discussed further in chapter 3.
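The 15-point criterion is easy to express in code. The following Python sketch is an illustration only (the function name and data are ours, not part of PRELIS); it labels a variable the way PRELIS would:

```python
# Sketch of the PRELIS 15-category criterion: fewer than 15 distinct scale
# points -> ordinal (OR); 15 or more -> continuous (CO).
def classify_scale(values, threshold=15):
    """Label a variable 'OR' or 'CO' by its number of distinct scale points."""
    return "CO" if len(set(values)) >= threshold else "OR"

likert_item = [1, 2, 3, 4, 5, 4, 3, 2, 1, 5]   # only 5 distinct scale points
exam_scores = list(range(40, 100, 3))           # 20 distinct scores

print(classify_scale(likert_item))   # OR
print(classify_scale(exam_scores))   # CO
```

A Likert item is flagged as ordinal because it has only five distinct scale points, whereas the exam scores clear the 15-point threshold.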

2.2.3 Missing Data

The statistical analysis of data is affected by missing data values in variables. That is, not every subject has an actual value for every variable in the dataset, as some values are missing. It is common practice in statistical packages to have default values for handling missing values. The researcher has the options of deleting subjects who have missing values, replacing the missing data values, or using robust statistical procedures that accommodate the presence of missing data.

The various SEM software packages handle missing data differently and have different options for replacing missing data values. Table 2.1 lists many of the various options for dealing with missing data. These options can dramatically affect the number of subjects available for analysis, the magnitude and direction of the correlation coefficient, or create problems if means, standard deviations, and correlations are computed based on different sample sizes. Listwise deletion of cases and pairwise deletion of cases are not always recommended options due to the possibility of losing a large number of subjects, thus dramatically reducing the sample size. Mean substitution works best when only a small number of missing values is present in the data, whereas regression imputation provides a useful approach with a moderate amount of missing data. In LISREL–PRELIS, the expectation maximization (EM), Monte Carlo Markov Chain (MCMC), and matching response pattern approaches are recommended when larger amounts of data are missing at random.

TABLE 2.1
Options for Dealing with Missing Data

Listwise: Delete subjects with missing data on any variable
Pairwise: Delete subjects with missing data on each pair of variables used
Mean substitution: Substitute the mean for missing values of a variable
Regression imputation: Substitute a predicted value for the missing value of a variable
Expectation maximization (EM): Find expected value based on the expectation maximization algorithm
Matching response pattern: Match cases with incomplete data to cases with complete data to determine a missing value
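Two of the simpler options in Table 2.1 can be sketched in a few lines of Python. This is an illustration only, not LISREL–PRELIS code; the function names are ours, and None marks a missing value:

```python
def listwise_delete(cases):
    """Listwise deletion: keep only cases complete on every variable."""
    return [case for case in cases if None not in case]

def mean_substitute(cases, var):
    """Mean substitution: replace missing values of variable index `var`
    with the mean of its observed values."""
    observed = [case[var] for case in cases if case[var] is not None]
    mean = sum(observed) / len(observed)
    return [[mean if (i == var and v is None) else v
             for i, v in enumerate(case)] for case in cases]

# Three cholesterol-style measurements; the second case is missing VAR3.
data = [[220.0, 200.0, 210.0],
        [240.0, 220.0, None],
        [260.0, 230.0, 250.0]]

print(len(listwise_delete(data)))       # 2 cases survive listwise deletion
print(mean_substitute(data, 2)[1][2])   # 230.0, the mean of 210 and 250
```

Note how listwise deletion discards a third of this tiny sample, which is exactly the sample-size concern raised above.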


More information about missing data is available in resources such as

Enders (2006), McKnight, McKnight, Sidani and Aurelio (2007), and

Peng, Harwell, Liou, and Ehman (2007). Davey and Savla (2010) have

more recently published an excellent book with SAS, SPSS, STATA, and

Mplus source programs to handle missing data in SEM in the context of

power analysis.

2.2.4 LISREL–PRELIS Missing Data Example

Imputation of missing values is possible for a single variable (Impute Missing Values) or several variables simultaneously (Multiple Imputation) by selecting Statistics from the tool bar menu. The Impute Missing Values option uses the matching response pattern approach. The value to be substituted for the missing value of a single case is obtained from another case (or cases) having a similar response pattern over a set of matching variables. In data sets where missing values occur on more than one variable, you can use multiple imputation of missing values with mean substitution, delete cases, or leave the variables with defined missing values as options in the dialog box. In addition, the Multiple Imputation option uses either the expectation maximization (EM) or Monte Carlo Markov Chain (MCMC, generating random draws from probability distributions via Markov chains) approach to replacing missing values across multiple variables.

We present an example from LISREL–PRELIS involving the cholesterol levels of 28 patients treated for heart attacks. We assume the data to be missing at random (MAR) with an underlying multivariate normal distribution. Cholesterol levels were measured after 2 days (VAR1), after 4 days (VAR2), and after 14 days (VAR3), but were only complete for 19 of the 28 patients. The data are shown from the PRELIS System File, chollev.psf. The PRELIS system file was created by selecting File, Import Data, and selecting the raw data file chollev.raw located in the Tutorial folder [C:\LISREL 8.8 Student Examples\Tutorial]. We must know the number of variables in the raw data file. We must also select Data, then Define Variables, and then select −9.00 as the missing value for the VAR3 variable [optionally, right mouse click on VAR1 in the PRELIS chollev file].


We now click on Statistics on the tool bar menu and select Impute Missing Values from the pull-down menu.

We next select Output Options, save the transformed data in a new PRELIS system file, cholnew.psf, and output the new correlation matrix, mean, and standard deviation files.


We should examine our data both before (Table 2.2) and after (Table 2.3) imputation of missing values. Here, we used the matching response pattern method. This comparison provides us with valuable information about the nature of the missing data.

We can also view our new transformed PRELIS System File, cholnew.psf, to verify that the missing values were in fact replaced; for example, VAR3 has values replaced for Case 2 = 204, Case 4 = 142, Case 5 = 182, Case 10 = 280, and so on.


TABLE 2.2
Data Before Imputation of Missing Values

Number of Missing Values per Variable
    VAR1    VAR2    VAR3
       0       0       9

Distribution of Missing Values
Total Sample Size = 28
Number of Missing Values     0     1
Number of Cases             19     9

Effective Sample Sizes
Univariate (in Diagonal) and Pairwise Bivariate (off Diagonal)
        VAR1    VAR2    VAR3
VAR1      28
VAR2      28      28
VAR3      19      19      19

Percentage of Missing Values
Univariate (in Diagonal) and Pairwise Bivariate (off Diagonal)
        VAR1    VAR2    VAR3
VAR1    0.00
VAR2    0.00    0.00
VAR3   32.14   32.14   32.14

Correlation Matrix
        VAR1    VAR2    VAR3
VAR1   1.000
VAR2   0.673   1.000
VAR3   0.395   0.665   1.000

Means
     VAR1      VAR2      VAR3
  253.929   230.643   221.474

Standard Deviations
     VAR1      VAR2      VAR3
   47.710    46.967    43.184


We have noticed that selecting matching variables with a higher correlation to the variable with missing values provides better imputed values for the missing data. We highly recommend comparing any analyses before and after the replacement of missing data values to fully understand the impact missing data values have on the parameter estimates and standard errors. LISREL–PRELIS also permits replacement of missing values using the EM and MCMC approaches, which may be practical when matching sets of variables are not possible. A comparison of EM and MCMC is also warranted in multiple imputations to determine the effect of using a different algorithm on the replacement of missing values.

TABLE 2.3
Data After Imputation of Missing Values

Number of Missing Values per Variable
    VAR1    VAR2    VAR3
       0       0       9

Imputations for VAR3
Case  2 imputed with value 204 (Variance Ratio = 0.000), NM = 1
Case  4 imputed with value 142 (Variance Ratio = 0.000), NM = 1
Case  5 imputed with value 182 (Variance Ratio = 0.000), NM = 1
Case 10 imputed with value 280 (Variance Ratio = 0.000), NM = 1
Case 13 imputed with value 248 (Variance Ratio = 0.000), NM = 1
Case 16 imputed with value 256 (Variance Ratio = 0.000), NM = 1
Case 18 imputed with value 216 (Variance Ratio = 0.000), NM = 1
Case 23 imputed with value 188 (Variance Ratio = 0.000), NM = 1
Case 25 imputed with value 256 (Variance Ratio = 0.000), NM = 1

Number of Missing Values per Variable After Imputation
    VAR1    VAR2    VAR3
       0       0       0

Total Sample Size = 28

Correlation Matrix
        VAR1    VAR2    VAR3
VAR1   1.000
VAR2   0.673   1.000
VAR3   0.404   0.787   1.000

Means
     VAR1      VAR2      VAR3
  253.929   230.643   220.714

Standard Deviations
     VAR1      VAR2      VAR3
   47.710    46.967    42.771
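The matching response pattern idea can be conveyed with a deliberately simplified Python sketch. This is our own simplification, not the PRELIS algorithm: the missing third score is borrowed from the complete case whose scores on the two matching variables are closest.

```python
def match_impute(case, complete_cases):
    """Impute the missing third value of `case` from the complete case
    with the nearest (squared-distance) response pattern on the first
    two (matching) variables."""
    def distance(donor):
        return sum((d - c) ** 2 for d, c in zip(donor[:2], case[:2]))
    donor = min(complete_cases, key=distance)
    return case[:2] + [donor[2]]

complete = [[250.0, 230.0, 220.0],
            [160.0, 150.0, 142.0]]

# The incomplete case (158, 148, missing) best matches the second donor,
# so its VAR3 value of 142 is borrowed.
print(match_impute([158.0, 148.0, None], complete))
```

This also shows why highly correlated matching variables matter: the closer the donor's pattern, the more plausible the borrowed value.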

2.2.5 Outliers

Outliers or influential data points can be defined as data values that are extreme or atypical on either the independent (X) variables, the dependent (Y) variables, or both. Outliers can occur as a result of observation errors, data entry errors, instrument errors based on layout or instructions, or actual extreme values from self-report data. Because outliers affect the mean, the standard deviation, and correlation coefficient values, they must be explained, deleted, or accommodated by using robust statistics. Sometimes, additional data will need to be collected to fill in the gap along either the Y or X axis. LISREL–PRELIS has outlier detection methods available that include the following: box plot display, scatterplot, histogram, and frequency distributions.
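The box plot display rests on a simple fence rule that can be sketched in Python (stdlib only; a textbook illustration, not the LISREL–PRELIS implementation): values beyond 1.5 interquartile ranges outside the quartiles are flagged.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside the box-plot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

scores = [10, 11, 12, 13, 14, 15, 100]   # 100 is an atypical data value
print(iqr_outliers(scores))  # [100]
```

A flagged value is a candidate for explanation, deletion, or accommodation; the rule itself does not say which response is appropriate.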

2.2.6 Linearity

Some statistical techniques, such as SEM, assume that the variables are linearly related to one another. Thus, a standard practice is to visualize the coordinate pairs of data points of two continuous variables by plotting the data in a scatterplot. These bivariate plots depict whether the data are linearly increasing or decreasing. The presence of curvilinear data reduces the magnitude of the Pearson correlation coefficient, even resulting in the presence of a zero correlation. Recall that the Pearson correlation value indicates the magnitude and direction of the linear relationship between two variables. Figure 2.1 shows the importance of visually displaying the bivariate data scatterplot.

FIGURE 2.1

Left: correlation is linear. Right: correlation is nonlinear.
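A small Python computation (our illustration) makes the point of Figure 2.1 concrete: a perfectly curvilinear relationship can yield a Pearson correlation of zero.

```python
import statistics

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v for v in x]     # straight-line relationship
curved = [v ** 2 for v in x]    # perfect U-shaped relationship

print(round(pearson_r(x, linear), 3))  # 1.0
print(round(pearson_r(x, curved), 3))  # 0.0, though the relationship is perfect
```

The U-shaped data are perfectly predictable from x, yet the linear measure reports no association, which is exactly why the scatterplot should be inspected first.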


2.2.7 Nonnormality

In basic statistics, several transformations are given to handle issues with nonnormal data. Some of these common transformations are in Table 2.4. Inferential statistics often rely on the assumption that the data are normally distributed. Data that are skewed (lack symmetry) or more frequently occurring along one part of the measurement scale will affect the variance–covariance among variables. In addition, kurtosis (peakedness) in data will impact statistics. Leptokurtic data values are more peaked than the normal distribution, whereas platykurtic data values are flatter and more dispersed along the X axis, but have a consistent low frequency on the Y axis—that is, the frequency distribution of the data appears more rectangular in shape.

Nonnormal data can occur because of the scaling of variables (e.g., ordinal rather than interval) or the limited sampling of subjects. Possible solutions for skewness are to resample more participants or to perform a linear transformation as outlined above. Our experience is that a probit data transformation works best in correcting skewness. Kurtosis in data is more difficult to resolve; some possible solutions in LISREL–PRELIS include additional sampling of subjects, or the use of bootstrap methods, normalizing scores, or alternative methods of estimation (e.g., WLS or ADF).

The presence of skewness and kurtosis can be detected in LISREL–PRELIS using univariate tests, multivariate tests, and measures of skewness and kurtosis that are available in the pull-down menus or output. One recommended method of handling nonnormal data is to use an asymptotic covariance matrix as input along with the sample covariance matrix in the LISREL–PRELIS program, as follows:

TABLE 2.4
Data Transformation Types

y = ln(x), y = log10(x), or y = ln(x + 0.5): Useful with clustered data or cases where the standard deviation increases with the mean
y = sqrt(x): Useful with Poisson counts
y = arcsin((x + 0.375)/(n + 0.75)): Useful with binomial proportions [0.2 < p = x/n < 0.8]
y = 1/x: Useful with gamma-distributed x variable
y = logit(x) = ln(x/(1 − x)): Useful with binomial proportions x = p
y = normit(x): Quantile of normal distribution for standardized x
y = probit(x) = 5 + normit(x): Most useful to resolve nonnormality of data

Note: probit(x) is the same as normit(x) plus 5 to avoid negative values.
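Several of the Table 2.4 transformations map directly onto Python's standard library. This sketch (function names are ours) implements the logit, normit, and probit definitions, including the shift of 5 noted in the table:

```python
import math
from statistics import NormalDist

def logit(p):
    """y = ln(p / (1 - p)) for a binomial proportion p in (0, 1)."""
    return math.log(p / (1 - p))

def normit(p):
    """Quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(p)

def probit(p):
    """probit(p) = 5 + normit(p); the shift avoids negative values."""
    return 5 + normit(p)

print(logit(0.5))    # 0.0 at the midpoint proportion
print(probit(0.5))   # 5.0, since normit(0.5) = 0
print(math.log(3 + 0.5))   # the ln(x + 0.5) transform for x = 3
```

The same pattern extends to the remaining rows of the table (square root, reciprocal, arcsine) using `math.sqrt`, `1 / x`, and `math.asin`.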


LISREL:
    CM = boy.cov
    AC = boy.acm

SIMPLIS:
    Covariance matrix from file boy.cov
    Asymptotic covariance matrix from file boy.acm

We can use the asymptotic covariance matrix in two different ways: (a) as a weight matrix when specifying the method of estimation as weighted least squares (WLS), and (b) as a weight matrix that adjusts the normal-theory weight matrix to correct for bias in standard errors and fit statistics. The appropriate moment matrix in PRELIS, using OUTPUT OPTIONS, must be selected before requesting the calculation of the asymptotic covariance matrix.

PRELIS recognizes data as being continuous (CO), ordinal (OR), or classes (CL), that is, gender (boy, girl). Different correlations are possible depending upon the level of measurement. A variance–covariance matrix with continuous variables would use Pearson correlations, while ordinal variables would use tetrachoric correlations. If skewed nonnormal data are present, then consider a linear transformation using probit. In SEM, researchers typically output and use an asymptotic variance–covariance matrix. When using a PRELIS data set, consider the normal score option in the menu to correct for nonnormal variables.

2.3 Summary

Structural equation modeling is a correlation research method; therefore, the measurement scale, restriction of range in the data values, missing data, outliers, nonlinearity, and nonnormality of data affect the variance–covariance among variables and thus can impact the SEM analysis. Researchers should use the built-in menu options to examine, graph, and test for any of these problems in the data prior to conducting any SEM model analysis. Basically, researchers should know their data characteristics. Data screening is a very important first step in structural equation modeling. The next chapter illustrates in more detail issues related to the use of correlation and variance–covariance in SEM models. There, we provide specific examples to illustrate the importance of topics covered in this chapter. A troubleshooting box summarizing these issues is provided in Box 2.1.


BOX 2.1 TROUBLESHOOTING TIPS

Measurement scale: Need to take the measurement scale of the variables into account when computing statistics such as means, standard deviations, and correlations.

Restriction of range: Need to consider the range of values obtained for variables, as a restricted range in one or more variables can reduce the magnitude of correlations.

Missing data: Need to consider missing data on one or more subjects for one or more variables, as this can affect SEM results. Cases are lost with listwise deletion, pairwise deletion is often problematic (e.g., different sample sizes), and thus modern imputation methods are recommended.

Outliers: Need to consider outliers, as they can affect statistics such as means, standard deviations, and correlations. They can either be explained, deleted, or accommodated (using either robust statistics or obtaining additional data to fill in). Can be detected by methods such as box plots, scatterplots, histograms, or frequency distributions.

Linearity: Need to consider whether variables are linearly related, as nonlinearity can reduce the magnitude of correlations. Can be detected by scatterplots. Can be dealt with by transformations or deleting outliers.

Nonnormality: Need to consider whether the variables are normally distributed, as nonnormality can affect resulting SEM statistics. Can be detected by univariate tests, multivariate tests, and skewness and kurtosis statistics. Can be dealt with by transformations, additional sampling, bootstrapping, normalizing scores, or alternative methods of estimation.

Exercises

1. LISREL uses which command to import data sets?

a. File, then Export Data

b. File, then Open

c. File, then Import Data

d. File, then New

2. Define the following levels of measurement.

a. Nominal

b. Ordinal

c. Interval

d. Ratio

3. Mark each of the following statements true (T) or false (F).

a. LISREL can deal with missing data.

b. PRELIS can deal with missing data.


c. LISREL can compute descriptive statistics.

d. PRELIS can compute descriptive statistics.

4. Explain how each of the following affects statistics:

a. Restriction of range

b. Missing data

c. Outliers

d. Nonlinearity

e. Nonnormality

References

Anderson, N. H. (1961). Scales and statistics: Parametric and non-parametric. Psychological Bulletin, 58, 305–316.
Davey, A., & Savla, J. (2010). Statistical power analysis with missing data: A structural equation modeling approach. New York: Routledge.
Enders, C. K. (2006). Analyzing structural equation models with missing data. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (pp. 313–342). Greenwich, CT: Information Age.
Jöreskog, K. G., & Sörbom, D. (1996). PRELIS2: User’s reference guide. Lincolnwood, IL: Scientific Software International.
McKnight, P. E., McKnight, K. M., Sidani, S., & Aurelio, J. F. (2007). Missing data: A gentle introduction. New York: Guilford.
Peng, C.-Y. J., Harwell, M., Liou, S.-M., & Ehman, L. H. (2007). Advances in missing data methods and implications for educational research. In S. S. Sawilowsky (Ed.), Real data analysis. Charlotte, NC: Information Age.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680.


3
Correlation

Key Concepts

Types of correlation coefficients
Factors affecting correlation
Correction for attenuation
Nonpositive definite matrices

Bivariate, part, and partial correlation

Suppressor variable

Covariance and causation

In chapter 2 we considered a number of data preparation issues in structural equation modeling. In this chapter, we move beyond data preparation in describing the important role that correlation (covariance) plays in SEM. We also include a discussion of a number of factors that affect correlation coefficients, as well as the assumptions and limitations of correlation methods in structural equation modeling.

3.1 Types of Correlation Coefficients

Sir Francis Galton conceptualized the correlation and regression procedure for examining covariance in two or more traits, and Karl Pearson (1896) developed the statistical formula for the correlation coefficient and regression based on his suggestion (Crocker & Algina, 1986; Ferguson & Takane, 1989; Tankard, 1984). Shortly thereafter, Charles Spearman (1904) used the correlation procedure to develop a factor analysis technique. The correlation, regression, and factor analysis techniques have for many decades formed the basis for generating tests and defining constructs. Today, researchers are expanding their understanding of the roles that correlation, regression, and factor analysis play in theory and construct definition to include latent variable, covariance structure, and confirmatory factor measurement models.

The relationships and contributions of Galton, Pearson, and Spearman to the field of statistics, especially correlation, regression, and factor analysis, are quite interesting (Tankard, 1984). In fact, the basis of association between two variables—that is, correlation or covariance—has played a major role in statistics. The Pearson correlation coefficient provides the basis for point estimation (test of significance), explanation (variance accounted for in a dependent variable by an independent variable), prediction (of a dependent variable from an independent variable through linear regression), reliability estimates (test–retest, equivalence), and validity (factorial, predictive, concurrent).

The Pearson correlation coefficient also provides the basis for establishing and testing models among measured and/or latent variables. The partial and part correlations further permit the identification of specific bivariate relationships between variables that allow for the specification of unique variance shared between two variables while controlling for the influence of other variables. Partial and part correlations can be tested for significance, similar to the Pearson correlation coefficient, by simply using the degrees of freedom, n – 2, in the standard correlation table of significance values (Table A.3), or by an F test in multiple regression which tests the difference in R² values between full and restricted models (Table A.5).
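The significance test with n – 2 degrees of freedom can be sketched in a few lines of Python. The conversion of r to a t statistic shown here is the standard one, and the r and n values are illustrative (they match the complete data set discussed later in this chapter):

```python
import math

def t_for_r(r, n):
    """Convert a Pearson correlation to a t statistic with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Illustrative values: r = .782 with n = 10 pairs of scores.
t = t_for_r(0.782, 10)
print(round(t, 2))  # 3.55, beyond the two-tailed .05 critical t of 2.306 for df = 8
```

Because 3.55 exceeds the tabled critical value for df = 8, the correlation is statistically significant at the .05 level.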

Although the Pearson correlation coefficient has had a major impact in the field of statistics, other correlation coefficients have emerged depending upon the level of variable measurement. Stevens (1946) provided the properties of scales of measurement that have become known as nominal, ordinal, interval, and ratio. The types of correlation coefficients developed for these various levels of measurement are categorized in Table 3.1.

TABLE 3.1
Types of Correlation Coefficients

Correlation Coefficient        Level of Measurement
Pearson product-moment         Both variables interval
Spearman rank, Kendall's tau   Both variables ordinal
Phi, contingency               Both variables nominal
Point biserial                 One variable interval, one variable dichotomous
Gamma, rank biserial           One variable ordinal, one variable nominal
Biserial                       One variable interval, one variable artificial*
Polyserial                     One variable interval, one variable ordinal with underlying continuity
Tetrachoric                    Both variables dichotomous (nominal, artificial*)
Polychoric                     Both variables ordinal with underlying continuities

* Artificial refers to recoding variable values into a dichotomy.


Many popular computer programs, for example, SAS and SPSS, typically do not compute all of these correlation types. Therefore, you may need to check a popular statistics book or look around for a computer program that will compute the type of correlation coefficient you need—for example, the phi and point-biserial coefficients are not readily available. In SEM analyses, the Pearson coefficient, the tetrachoric or polychoric coefficient (for several ordinal variable pairs), and the biserial or polyserial coefficient (for several continuous and ordinal variable pairs) are typically used (see PRELIS for the use of Kendall's tau-c or tau-b, and canonical correlation). LISREL permits mixture models, which use variables with both ordinal and interval-ratio levels of measurement (chapter 15). Although SEM software programs are now demonstrating how mixture models can be analyzed, the use of variables with different levels of measurement has traditionally been a problem in the field of statistics—for example, in multiple regression and multivariate statistics.

3.2 Factors Affecting Correlation Coefficients

Given the important role that correlation plays in structural equation modeling, we need to understand the factors that affect establishing relationships among multivariable data points. The key factors are the level of measurement, restriction of range in data values (variability, skewness, kurtosis), missing data, nonlinearity, outliers, correction for attenuation, and issues related to sampling variation, confidence intervals, effect size, significance, sample size, and power.

3.2.1 Level of Measurement and Range of Values

Four types or levels of measurement typically define whether the characteristic or scale interpretation of a variable is nominal, ordinal, interval, or ratio (Stevens, 1946). In structural equation modeling, each of these types of scaled variables can be used. However, it is not recommended that they be included together or mixed in a correlation (covariance) matrix. Instead, the PRELIS data output option should be used to save an asymptotic covariance matrix for input along with the sample variance–covariance matrix into a LISREL or SIMPLIS program.

Initially, SEM required variables measured at the interval or ratio level of measurement, so the Pearson product-moment correlation coefficient was used in regression, path, factor, and structural equation modeling. The interval or ratio scaled variable values should also have a sufficient range of score values to introduce variance (15 or more scale points). If the range of scores is restricted, the magnitude of the correlation value is decreased. Basically, as a group of subjects becomes more homogeneous, score variance decreases, reducing the correlation value between the variables. So, there must be enough variation in scores to allow a correlation relationship to manifest itself between variables. Variables with fewer than 15 categories are treated as ordinal variables in LISREL–PRELIS, so if you are assuming continuous interval-level data, you will need to check whether the variables meet this condition. Also, the use of the same scale values for variables can help in the interpretation of results and/or relative comparison among variables. The meaningfulness of a correlation relationship will depend on the variables employed; hence, your theoretical perspective is very important. You may recall from your basic statistics course that a spurious correlation is possible when two sets of scores correlate significantly, but their relationship is not meaningful or substantive in nature.

If the distributions of variables are widely divergent, correlation can also be affected, and so several data transformations are suggested by Ferguson and Takane (1989) to provide a closer approximation to a normal, homogeneous variance for skewed or kurtotic data. Some possible transformations are the square root transformation (sqrt X), the logarithmic transformation (log X), the reciprocal transformation (1/X), and the arcsine transformation (arcsin X). The probit transformation appears to be most effective in handling univariate skewed data.
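As a sketch of how such transformations pull in a skewed distribution, the snippet below generates hypothetical positively skewed scores (simulated data, not from the text) and compares sample skewness before and after the transformations:

```python
import numpy as np

def skewness(x):
    """Sample skewness: mean of cubed standardized deviations."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(0)
scores = rng.lognormal(mean=2.0, sigma=0.6, size=500)  # positively skewed scores

sqrt_scores = np.sqrt(scores)   # square root transformation
log_scores = np.log(scores)     # logarithmic transformation
recip_scores = 1.0 / scores     # reciprocal transformation

print(round(skewness(scores), 2), round(skewness(log_scores), 2))
```

Here the log transformation brings the skewness close to zero (the log of a lognormal variable is normal); in practice the best choice depends on the direction and severity of the skew, and the arcsine transformation applies to proportions.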

Consequently, the type of scale used and the range of values for the

measured variables can have profound effects on your statistical analysis

(in particular, on the mean, variance, and correlation). The scale and range

of a variable’s numerical values affects statistical methods, and this is no

different in structural equation modeling. The PRELIS program is avail-

able to provide tests of normality, skewness, and kurtosis on variables

and to compute an asymptotic covariance matrix for input into LISREL if

required. The use of normal scores is also an option in PRELIS.

3.2.2 Nonlinearity

The Pearson correlation coefficient indicates the degree of linear relationship between two variables. It is possible that two variables can indicate no correlation if they have a curvilinear relationship. Thus, the extent to which the variables deviate from the assumption of a linear relationship will affect the size of the correlation coefficient. It is therefore important to check for linearity of the scores; the common method is to graph the coordinate data points in a scatterplot. The linearity assumption should not be confused with recent advances in testing interaction in structural equation models discussed in chapter 16. You should also be familiar with the eta coefficient as an index of nonlinear relationship between two variables and with the testing of linear, quadratic, or cubic effects. Consult an intermediate statistics text, for example, Lomax (2007), to review these basic concepts.

The heuristic data sets in Table 3.2 demonstrate the dramatic effect a lack of linearity has on the Pearson correlation coefficient value. In the first data set, the Y values increase from 1 to 10, and the X values increase from 1 to 5, then decrease from 5 to 1 (nonlinear). The result is a Pearson correlation coefficient of r = 0; although a nonlinear relationship does exist in the data, it is not indicated by the Pearson correlation coefficient. The restriction of range in values can be demonstrated using the fourth heuristic data set in Table 3.2. The Y values only range between 3 and 7, and the X values only range from 1 to 4. The Pearson correlation coefficient is also r = 0 for these data. The fifth data set indicates how limited sampling can affect the Pearson coefficient. In these sample data, only three pairs of scores are sampled, and the Pearson correlation is r = –1.0, or perfectly negatively correlated.

TABLE 3.2
Heuristic Data Sets

Nonlinear Data    Complete Data    Missing Data
  Y      X          Y      X         Y      X
 1.00   1.00       8.00   6.00      8.00    —
 2.00   2.00       7.00   5.00      7.00   5.00
 3.00   3.00       8.00   4.00      8.00    —
 4.00   4.00       5.00   2.00      5.00   2.00
 5.00   5.00       4.00   3.00      4.00   3.00
 6.00   5.00       5.00   2.00      5.00   2.00
 7.00   4.00       3.00   3.00      3.00   3.00
 8.00   3.00       5.00   4.00      5.00    —
 9.00   2.00       3.00   1.00      3.00   1.00
10.00   1.00       2.00   2.00      2.00   2.00

Range of Data     Sampling Effect
  Y      X          Y      X
 3.00   1.00       8.00   3.00
 3.00   2.00       9.00   2.00
 4.00   3.00      10.00   1.00
 4.00   4.00
 5.00   1.00
 5.00   2.00
 6.00   3.00
 6.00   4.00
 7.00   1.00
 7.00   2.00
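These coefficients are easy to verify. A short check of the nonlinear, restricted-range, and limited-sampling data sets from Table 3.2:

```python
import numpy as np

def pearson_r(y, x):
    """Pearson product-moment correlation between two score vectors."""
    return float(np.corrcoef(y, x)[0, 1])

# Data sets from Table 3.2.
nonlinear_y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
nonlinear_x = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]   # rises, then falls

range_y = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]       # restricted range of values
range_x = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]

sample_y = [8, 9, 10]                           # only three pairs sampled
sample_x = [3, 2, 1]

print(pearson_r(nonlinear_y, nonlinear_x))  # r is 0: the curvilinear relation is invisible
print(pearson_r(range_y, range_x))          # r is 0 for the restricted-range data
print(pearson_r(sample_y, sample_x))        # r is -1.0 for the three sampled pairs
```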


3.2.3 Missing Data

A complete data set is also given in Table 3.2, where the Pearson correlation coefficient is r = .782, p = .007, for n = 10 pairs of scores. If missing data were present, the Pearson correlation coefficient would drop to r = .659, p = .108, for n = 7 pairs of scores. The Pearson correlation coefficient changes from statistically significant to not statistically significant. More importantly, in a correlation matrix with several variables, the various correlation coefficients could be computed on different sample sizes. If we used listwise deletion of cases, then any variable in the data set with a missing value would cause a subject to be deleted, possibly causing a substantial reduction in our sample size, whereas pairwise deletion of cases would result in different sample sizes for our correlation coefficients in the correlation matrix.

Researchers have examined various aspects of how to handle or treat missing data beyond our introductory example using a small heuristic data set. One basic approach is to eliminate any observations where some of the data are missing, listwise deletion. Listwise deletion is not recommended because of the loss of information on other variables and because statistical estimates are based on a reduced sample size. Pairwise deletion excludes data only when they are missing on the pairs of variables selected for analysis. However, this could lead to different sample sizes for the different correlations and related statistical estimates. A third approach, data imputation, replaces missing values with an estimate, for example, the mean value on a variable for all subjects who did not report any data for that variable (Beale & Little, 1975; also see chapter 2).
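The listwise-deletion effect can be sketched with the complete and missing columns of Table 3.2 (NaN marks a missing X score):

```python
import numpy as np

# Complete and missing versions of the Table 3.2 data.
y = np.array([8, 7, 8, 5, 4, 5, 3, 5, 3, 2], dtype=float)
x_complete = np.array([6, 5, 4, 2, 3, 2, 3, 4, 1, 2], dtype=float)
x_missing = np.array([np.nan, 5, np.nan, 2, 3, 2, 3, np.nan, 1, 2])

def pearson_r(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Listwise deletion: drop every case with a missing value before correlating.
keep = ~np.isnan(x_missing)
r_full = pearson_r(y, x_complete)                  # n = 10 pairs
r_listwise = pearson_r(y[keep], x_missing[keep])   # n = 7 pairs remain

print(round(r_full, 3), round(r_listwise, 3))  # 0.782 0.659
```

With several variables, pairwise deletion would instead keep each correlation's own available cases, so each coefficient in the matrix could rest on a different n.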

Missing data can arise in different ways (Little & Rubin, 1987, 1990). Missing completely at random (MCAR) implies that data on variable X are missing in a way that is statistically unrelated to the values that have been observed for other variables as well as for X itself. Missing at random (MAR) implies that data values on variable X are missing conditional on other variables, but are unrelated to the values of X. A third situation, nonignorable data, implies that the missingness carries probabilistic information about the values that would have been observed. For MCAR data, mean substitution yields biased variance and covariance estimates, whereas listwise and pairwise deletion methods yield consistent solutions. For MAR data, mean substitution, listwise, and pairwise deletion methods produce biased results. When missing data are nonignorable, all approaches yield biased results. It would be prudent for the researcher to investigate how parameter estimates are affected by the use or nonuse of a data imputation method. A few references are provided to give a more detailed understanding of missing data (Arbuckle, 1996; Davey & Savla, 2009; Enders, 2006; McKnight, McKnight, Sidani, & Aurelio, 2007; Peng, Harwell, Liou, & Ehman, 2007; Wothke, 2000).


3.2.4 Outliers

The Pearson correlation coefficient can be drastically affected by a single outlier on X or Y. For example, the two data sets in Table 3.3 indicate a Y = 27 value (Set A) versus a Y = 2 value (Set B) for the last subject. In the first set of data, r = .524, p = .37, whereas in the second set of data, r = –.994, p = .001. Is the Y = 27 data value an outlier based on limited sampling, or is it a data entry error? A large body of research has examined how different outliers on X, Y, or both X and Y affect correlation relationships, and how to better analyze the data using robust statistics (Anderson & Schumacker, 2003; Ho & Naugher, 2000; Huber, 1981; Rousseeuw & Leroy, 1987; Staudte & Sheather, 1990).

TABLE 3.3
Outlier Data Sets

    Set A          Set B
  X     Y        X     Y
  1     9        1     9
  2     7        2     7
  3     5        3     5
  4     3        4     3
  5    27        5     2
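The effect of the single outlying Y score is easy to reproduce; a quick check of the two r values reported above:

```python
import numpy as np

x = [1, 2, 3, 4, 5]
set_a = [9, 7, 5, 3, 27]   # last Y value is the outlier
set_b = [9, 7, 5, 3, 2]

r_a = float(np.corrcoef(x, set_a)[0, 1])
r_b = float(np.corrcoef(x, set_b)[0, 1])
print(round(r_a, 3), round(r_b, 3))  # 0.524 -0.994
```

One aberrant score flips a nearly perfect negative correlation into a nonsignificant positive one.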

3.2.5 Correction for Attenuation

A basic assumption in psychometric theory is that observed data contain measurement error. A test score (observed data) is a function of a true score and measurement error. A Pearson correlation coefficient will have different values, depending on whether it was computed with observed scores or with the true scores where measurement error has been removed. The Pearson correlation coefficient can be corrected for attenuation, or unreliable measurement error in scores, thus yielding a true score correlation; however, the corrected correlation coefficient can become greater than 1.0! Low reliability in the independent and/or dependent variables, coupled with a high correlation between the independent and dependent variable, can result in correlations greater than 1.0. For example, given a correlation of r = .90 between the observed scores on X and Y, a Cronbach alpha reliability coefficient of .60 for the X scores, and a Cronbach alpha reliability coefficient of .70 for the Y scores, the Pearson correlation coefficient corrected for attenuation (r*) is greater than 1.0:

r*_xy = r_xy / √(r_xx r_yy) = .90 / √((.60)(.70)) = .90 / .648 = 1.389


When this happens, a nonpositive definite error message occurs, stopping the SEM program.
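A one-line check of the attenuation formula above (the function name is ours):

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """True-score correlation: observed r divided by the square root
    of the product of the two reliability coefficients."""
    return r_xy / math.sqrt(r_xx * r_yy)

r_star = correct_for_attenuation(0.90, 0.60, 0.70)
print(round(r_star, 3))  # 1.389, an inadmissible correlation greater than 1.0
```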

3.2.6 Nonpositive Definite Matrices

Correlation coefficients greater than 1.0 in a correlation matrix cause the correlation matrix to be nonpositive definite. In other words, the solution is not admissible, indicating that parameter estimates cannot be computed. Correction for attenuation is not the only situation that causes nonpositive definite matrices to occur (Wothke, 1993). Sometimes the ratio of a covariance to the product of the variables' standard deviations yields a correlation greater than 1.0. The following variance–covariance matrix is nonpositive definite because it contains a correlation coefficient greater than 1.0 between the Relations and Attribute latent variables (denoted by an asterisk):

Variance–Covariance Matrix
Task        1.043
Relations    .994  1.079
Management   .892   .905   .924
Attribute   1.065  1.111   .969  1.12

Correlation Matrix
Task        1.000
Relations    .937  1.000
Management   .908   .906  1.000
Attribute    .985  1.010*  .951  1.000

Nonpositive definite covariance matrices occur when the determinant of the matrix is zero or the inverse of the matrix is not possible. This can be caused by correlations greater than 1.0, linear dependency among the observed variables, multicollinearity among the observed variables, a variable that is a linear combination of other variables, a sample size less than the number of variables, the presence of a negative or zero variance (a Heywood case), variance–covariance (correlation) values outside the permissible range, for example, a correlation beyond +/−1.0, and bad start values in the user-specified model. A Heywood case also occurs when the communality estimate is greater than 1.0. Possible solutions to resolve this error are to reduce the communality or fix the communality to less than 1.0, extract a different number of factors (possibly by dropping paths), rescale observed variables to create a more linear relationship, or eliminate a bad observed variable that indicates linear dependency or multicollinearity.
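To see this numerically, the sketch below rescales the variance–covariance matrix above into correlations and inspects its eigenvalues (a positive definite matrix must have all-positive eigenvalues):

```python
import numpy as np

# Variance–covariance matrix from the text (Task, Relations, Management, Attribute).
cov = np.array([
    [1.043, 0.994, 0.892, 1.065],
    [0.994, 1.079, 0.905, 1.111],
    [0.892, 0.905, 0.924, 0.969],
    [1.065, 1.111, 0.969, 1.120],
])

# Rescale covariances to correlations: r_ij = cov_ij / (sd_i * sd_j).
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)
print(round(float(corr[1, 3]), 2))  # 1.01: Relations–Attribute exceeds 1.0

# The smallest eigenvalue is negative, so the matrix is nonpositive definite.
print(bool(np.linalg.eigvalsh(cov).min() < 0))  # True
```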

Regression, path, factor, and structural equation models mathematically solve a set of simultaneous equations, typically using ordinary least squares (OLS) estimates as initial estimates of the coefficients in the model. However, these initial estimates or coefficients are sometimes distorted or too different from the final admissible solution. When this happens, more reasonable start values need to be chosen. It is easy to see from the basic regression coefficient formula that the correlation coefficient value and the standard deviation values of the two variables affect the initial OLS estimates:

b = r_xy (s_y / s_x).
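As a quick check of the formula, using the complete data from Table 3.2 (any paired scores would do), the slope computed from r and the two standard deviations matches the ordinary least squares slope:

```python
import numpy as np

# Complete data from Table 3.2.
y = np.array([8, 7, 8, 5, 4, 5, 3, 5, 3, 2], dtype=float)
x = np.array([6, 5, 4, 2, 3, 2, 3, 4, 1, 2], dtype=float)

r_xy = np.corrcoef(x, y)[0, 1]
b_from_r = r_xy * y.std(ddof=1) / x.std(ddof=1)   # b = r_xy * (s_y / s_x)
b_ols = np.polyfit(x, y, 1)[0]                    # slope from least squares

print(round(b_from_r, 4), round(b_ols, 4))  # 1.0648 1.0648
```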

3.2.7 Sample Size

A common formula used to determine sample size when estimating means of variables was given by McCall (1982): n = (Zs/e)², where n is the sample size needed for the desired level of precision, e is the effect size, Z is the confidence level, and s is the population standard deviation of scores (s can be estimated from prior research studies, test norms, or the range of scores divided by 6). For example, given a random sample of ACT scores from a defined population with a standard deviation of 100, a desired confidence level of 1.96 (which corresponds to a .05 level of significance), and an effect size of 20 (the difference between the sampled ACT mean and the population ACT mean), the sample size needed would be [1.96(100)/20]² = 96.
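McCall's formula is straightforward to confirm (the function name is ours):

```python
def sample_size(z, s, e):
    """McCall (1982): n = (Z * s / e) ** 2 for estimating a mean."""
    return (z * s / e) ** 2

# Values from the text: Z = 1.96, s = 100, effect size e = 20.
n = sample_size(z=1.96, s=100, e=20)
print(round(n))  # 96
```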

In structural equation modeling, however, the researcher often requires a much larger sample size to maintain power and obtain stable parameter estimates and standard errors. The need for larger sample sizes is also due in part to the program requirements and the multiple observed variables used to define latent variables. Hoelter (1983) proposed the critical N statistic, which indicates the sample size needed to obtain a chi-square value that would reject the null hypothesis in a structural equation model. The required sample size and the power estimates that indicate whether a researcher's data fit their theoretical model, or that are needed to estimate parameters, are discussed in more detail in chapter 5.

SEM software programs estimate coefficients based on the user-specified theoretical model, or implied model, but also must work with the saturated and independence models. A saturated model is the model with all parameters indicated, while the independence model is the null model, or model with no parameters estimated. A saturated model with p observed variables has p(p + 3)/2 free parameters. [Note: The number of independent elements in the symmetric covariance matrix is p(p + 1)/2, and the number of means is p, so the total number of independent elements is p(p + 1)/2 + p = p(p + 3)/2.] For example, with 10 observed variables, 10(10 + 3)/2 = 65 free parameters. If the sample size is small, then there is not enough information to estimate the parameters in the saturated model for a large number of variables. Consequently, the chi-square fit statistic and derived statistics such as Akaike's Information Criterion (AIC) and the root-mean-square error of approximation (RMSEA) cannot be computed. In addition, the fit of the independence model is required to calculate other fit indices such as the Comparative Fit Index (CFI) and the Normed Fit Index (NFI).
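The parameter count for the saturated model can be sketched as:

```python
def free_parameters(p):
    """Free parameters in a saturated model with p observed variables:
    p(p + 1)/2 covariance elements plus p means = p(p + 3)/2."""
    return p * (p + 3) // 2

print(free_parameters(10))  # 65, as in the text's example
```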

Ding, Velicer, and Harlow (1995) located numerous studies (e.g., Anderson & Gerbing, 1988) that were in agreement that 100 to 150 subjects is the minimum satisfactory sample size when conducting structural equation models. Boomsma (1982, 1983) recommended 400, and Hu, Bentler, and Kano (1992) indicated that in some cases 5,000 is insufficient! Many of us may recall rules of thumb in our statistics texts, for example, 10 subjects per variable or 20 subjects per variable. Costello and Osborne (2005) demonstrated in their Monte Carlo study that 20 subjects per variable is recommended for best practices in factor analysis. In our examination of published SEM research, we have found that many articles used from 250 to 500 subjects, although the greater the sample size, the more likely it is one can validate the model using cross-validation (see chapter 12). For example, Bentler and Chou (1987) suggested that a ratio as low as five subjects per variable would be sufficient for normal and elliptical distributions when the latent variables have multiple indicators, and that a ratio of at least 10 subjects per variable would be sufficient for other distributions. Determination of sample size is now better understood in SEM modeling and is further discussed in chapter 5.

3.3 Bivariate, Part, and Partial Correlations

The types of correlations indicated in Table 3.1 are considered bivariate correlations, or associations between two variables. Cohen and Cohen (1983), in describing correlation research, further presented the correlation between two variables controlling for the influence of a third variable. These correlations are referred to as part and partial correlations, depending upon how variables are controlled or partialled out. Some of the various ways in which three variables can be depicted are illustrated in Figure 3.1. The diagrams illustrate different situations among variables where (a) all the variables are uncorrelated (Case 1), (b) only one pair of variables is correlated (Cases 2 and 3), (c) two pairs of variables are correlated (Cases 4 and 5), and (d) all of the variables are correlated (Case 6). It is obvious that with more than three variables the possibilities become overwhelming. It is therefore important to have a theoretical perspective to suggest why certain variables are correlated and/or controlled in a study. A theoretical perspective is essential in specifying a model and forms the basis for testing a structural equation model.

The partial correlation coefficient measures the association between two variables while controlling for a third variable, for example, the association between age and reading comprehension, controlling for reading level. Controlling for reading level in the correlation between age and comprehension partials out the correlation of reading level with age and the correlation of reading level with comprehension. Part correlation, in contrast, is the correlation between age and comprehension with reading level controlled for, where only the correlation between comprehension and reading level is removed before age is correlated with comprehension.
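This distinction can be sketched with the standard three-variable formulas, which this excerpt applies but does not print; the correlations below are hypothetical values for age (X), comprehension (Y), and reading level (Z):

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y, controlling Z on both variables:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def part_r(r_xy, r_xz, r_yz):
    """Part (semipartial) correlation: Z is removed from Y only:
    r_x(y.z) = (r_xy - r_xz * r_yz) / sqrt(1 - r_yz^2)."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz ** 2)

# Hypothetical bivariate correlations among age, comprehension, and reading level.
r_xy, r_xz, r_yz = 0.50, 0.60, 0.70
print(round(partial_r(r_xy, r_xz, r_yz), 3))  # 0.14
print(round(part_r(r_xy, r_xz, r_yz), 3))     # 0.112
```

Note that the partial correlation is at least as large in absolute value as the part correlation, because its denominator removes the controlled variance from both variables.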

Whether a part or partial correlation is used depends on the specific model or research questio