Linear Mixed Models: A Practical Guide Using Statistical Software, Second Edition
LINEAR MIXED MODELS, SECOND EDITION
West, Welch, and Gałecki

Ideal for anyone who uses software for statistical modeling, this book eliminates the need to read multiple software-specific texts by covering the most popular software programs for fitting LMMs in one handy guide. The authors illustrate the models and methods through real-world examples that enable comparisons of model-fitting options and results across the software procedures.

New to the Second Edition
• A new chapter on models with crossed random effects that uses a case study to illustrate software procedures capable of fitting these models
• Power analysis methods for longitudinal and clustered study designs, including software options for power analyses and suggested approaches to writing simulations
• Use of the lmer() function in the lme4 R package
• New sections on fitting LMMs to complex sample survey data and Bayesian approaches to making inferences based on LMMs
• Updated graphical procedures in the software packages
• Substantially revised index to enable more efficient reading and easier location of material on selected topics or software options
• More practical recommendations on using the software for analysis
• A new R package (WWGbook) that contains all of the data sets used in the examples

Highly recommended by JASA, Technometrics, and other journals, the first edition of this bestseller showed how to easily perform complex linear mixed model (LMM) analyses via a variety of software programs. Linear Mixed Models: A Practical Guide Using Statistical Software, Second Edition continues to lead you step by step through the process of fitting LMMs. This second edition covers additional topics on the application of LMMs that are valuable for data analysts in all fields. It also updates the case studies using the latest versions of the software procedures and provides up-to-date information on the options and features of the software procedures available for fitting LMMs in SAS, SPSS, Stata, R/S-plus, and HLM.

LINEAR MIXED MODELS: A Practical Guide Using Statistical Software, Second Edition
Brady T. West, Kathleen B. Welch, Andrzej T. Gałecki
University of Michigan, Ann Arbor, USA
with contributions from Brenda W. Gillespie

First edition published in 2006.
CRC Press, Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
© 2015 by Taylor & Francis Group, LLC. CRC Press is an imprint of Taylor & Francis Group, an Informa business.
No claim to original U.S. Government works. Version Date: 20140326
International Standard Book Number-13: 978-1-4665-6102-1 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S.
Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

To Laura and Carter
To all of my mentors, advisors, and teachers, especially my parents and grandparents
—B.T.W.

To Jim, my children, and grandchildren
To the memory of Fremont and June
—K.B.W.

To Viola, my children, and grandchildren
To my teachers and mentors
In memory of my parents
—A.T.G.

Contents

List of Tables
List of Figures
Preface to the Second Edition
Preface to the First Edition
The Authors
Acknowledgments
1 Introduction
  1.1 What Are Linear Mixed Models (LMMs)?
    1.1.1 Models with Random Effects for Clustered Data
    1.1.2 Models for Longitudinal or Repeated-Measures Data
    1.1.3 The Purpose of This Book
    1.1.4 Outline of Book Contents
  1.2 A Brief History of LMMs
    1.2.1 Key Theoretical Developments
    1.2.2 Key Software Developments
2 Linear Mixed Models: An Overview
  2.1 Introduction
    2.1.1 Types and Structures of Data Sets
      2.1.1.1 Clustered Data vs. Repeated-Measures and Longitudinal Data
      2.1.1.2 Levels of Data
    2.1.2 Types of Factors and Their Related Effects in an LMM
      2.1.2.1 Fixed Factors
      2.1.2.2 Random Factors
      2.1.2.3 Fixed Factors vs. Random Factors
      2.1.2.4 Fixed Effects vs. Random Effects
      2.1.2.5 Nested vs. Crossed Factors and Their Corresponding Effects
  2.2 Specification of LMMs
    2.2.1 General Specification for an Individual Observation
    2.2.2 General Matrix Specification
      2.2.2.1 Covariance Structures for the D Matrix
      2.2.2.2 Covariance Structures for the Ri Matrix
      2.2.2.3 Group-Specific Covariance Parameter Values for the D and Ri Matrices
    2.2.3 Alternative Matrix Specification for All Subjects
    2.2.4 Hierarchical Linear Model (HLM) Specification of the LMM
  2.3 The Marginal Linear Model
    2.3.1 Specification of the Marginal Model
    2.3.2 The Marginal Model Implied by an LMM
  2.4 Estimation in LMMs
    2.4.1 Maximum Likelihood (ML) Estimation
      2.4.1.1 Special Case: Assume θ Is Known
      2.4.1.2 General Case: Assume θ Is Unknown
    2.4.2 REML Estimation
    2.4.3 REML vs. ML Estimation
  2.5 Computational Issues
    2.5.1 Algorithms for Likelihood Function Optimization
    2.5.2 Computational Problems with Estimation of Covariance Parameters
  2.6 Tools for Model Selection
    2.6.1 Basic Concepts in Model Selection
      2.6.1.1 Nested Models
      2.6.1.2 Hypotheses: Specification and Testing
    2.6.2 Likelihood Ratio Tests (LRTs)
      2.6.2.1 Likelihood Ratio Tests for Fixed-Effect Parameters
      2.6.2.2 Likelihood Ratio Tests for Covariance Parameters
    2.6.3 Alternative Tests
      2.6.3.1 Alternative Tests for Fixed-Effect Parameters
      2.6.3.2 Alternative Tests for Covariance Parameters
    2.6.4 Information Criteria
  2.7 Model-Building Strategies
    2.7.1 The Top-Down Strategy
    2.7.2 The Step-Up Strategy
  2.8 Checking Model Assumptions (Diagnostics)
    2.8.1 Residual Diagnostics
      2.8.1.1 Raw Residuals
      2.8.1.2 Standardized and Studentized Residuals
    2.8.2 Influence Diagnostics
    2.8.3 Diagnostics for Random Effects
  2.9 Other Aspects of LMMs
    2.9.1 Predicting Random Effects: Best Linear Unbiased Predictors
    2.9.2 Intraclass Correlation Coefficients (ICCs)
    2.9.3 Problems with Model Specification (Aliasing)
    2.9.4 Missing Data
    2.9.5 Centering Covariates
    2.9.6 Fitting Linear Mixed Models to Complex Sample Survey Data
      2.9.6.1 Purely Model-Based Approaches
      2.9.6.2 Hybrid Design- and Model-Based Approaches
    2.9.7 Bayesian Analysis of Linear Mixed Models
  2.10 Power Analysis for Linear Mixed Models
    2.10.1 Direct Power Computations
    2.10.2 Examining Power via Simulation
  2.11 Chapter Summary
3 Two-Level Models for Clustered Data: The Rat Pup Example
  3.1 Introduction
  3.2 The Rat Pup Study
    3.2.1 Study Description
    3.2.2 Data Summary
  3.3 Overview of the Rat Pup Data Analysis
    3.3.1 Analysis Steps
    3.3.2 Model Specification
      3.3.2.1 General Model Specification
      3.3.2.2 Hierarchical Model Specification
    3.3.3 Hypothesis Tests
  3.4 Analysis Steps in the Software Procedures
    3.4.1 SAS
    3.4.2 SPSS
    3.4.3 R
      3.4.3.1 Analysis Using the lme() Function
      3.4.3.2 Analysis Using the lmer() Function
    3.4.4 Stata
    3.4.5 HLM
      3.4.5.1 Data Set Preparation
      3.4.5.2 Preparing the Multivariate Data Matrix (MDM) File
  3.5 Results of Hypothesis Tests
    3.5.1 Likelihood Ratio Tests for Random Effects
    3.5.2 Likelihood Ratio Tests for Residual Variance
    3.5.3 F-tests and Likelihood Ratio Tests for Fixed Effects
  3.6 Comparing Results across the Software Procedures
    3.6.1 Comparing Model 3.1 Results
    3.6.2 Comparing Model 3.2B Results
    3.6.3 Comparing Model 3.3 Results
  3.7 Interpreting Parameter Estimates in the Final Model
    3.7.1 Fixed-Effect Parameter Estimates
    3.7.2 Covariance Parameter Estimates
  3.8 Estimating the Intraclass Correlation Coefficients (ICCs)
  3.9 Calculating Predicted Values
    3.9.1 Litter-Specific (Conditional) Predicted Values
    3.9.2 Population-Averaged (Unconditional) Predicted Values
  3.10 Diagnostics for the Final Model
    3.10.1 Residual Diagnostics
      3.10.1.1 Conditional Residuals
      3.10.1.2 Conditional Studentized Residuals
    3.10.2 Influence Diagnostics
      3.10.2.1 Overall Influence Diagnostics
      3.10.2.2 Influence on Covariance Parameters
      3.10.2.3 Influence on Fixed Effects
  3.11 Software Notes and Recommendations
    3.11.1 Data Structure
    3.11.2 Syntax vs. Menus
    3.11.3 Heterogeneous Residual Variances for Level 2 Groups
    3.11.4 Display of the Marginal Covariance and Correlation Matrices
    3.11.5 Differences in Model Fit Criteria
    3.11.6 Differences in Tests for Fixed Effects
    3.11.7 Post-Hoc Comparisons of Least Squares (LS) Means (Estimated Marginal Means)
    3.11.8 Calculation of Studentized Residuals and Influence Statistics
    3.11.9 Calculation of EBLUPs
    3.11.10 Tests for Covariance Parameters
    3.11.11 Reference Categories for Fixed Factors
4 Three-Level Models for Clustered Data: The Classroom Example
  4.1 Introduction
  4.2 The Classroom Study
    4.2.1 Study Description
    4.2.2 Data Summary
      4.2.2.1 Data Set Preparation
      4.2.2.2 Preparing the Multivariate Data Matrix (MDM) File
  4.3 Overview of the Classroom Data Analysis
    4.3.1 Analysis Steps
    4.3.2 Model Specification
      4.3.2.1 General Model Specification
      4.3.2.2 Hierarchical Model Specification
    4.3.3 Hypothesis Tests
  4.4 Analysis Steps in the Software Procedures
    4.4.1 SAS
    4.4.2 SPSS
    4.4.3 R
      4.4.3.1 Analysis Using the lme() Function
      4.4.3.2 Analysis Using the lmer() Function
    4.4.4 Stata
    4.4.5 HLM
  4.5 Results of Hypothesis Tests
    4.5.1 Likelihood Ratio Tests for Random Effects
    4.5.2 Likelihood Ratio Tests and t-Tests for Fixed Effects
  4.6 Comparing Results across the Software Procedures
    4.6.1 Comparing Model 4.1 Results
    4.6.2 Comparing Model 4.2 Results
    4.6.3 Comparing Model 4.3 Results
    4.6.4 Comparing Model 4.4 Results
  4.7 Interpreting Parameter Estimates in the Final Model
    4.7.1 Fixed-Effect Parameter Estimates
    4.7.2 Covariance Parameter Estimates
  4.8 Estimating the Intraclass Correlation Coefficients (ICCs)
  4.9 Calculating Predicted Values
    4.9.1 Conditional and Marginal Predicted Values
    4.9.2 Plotting Predicted Values Using HLM
  4.10 Diagnostics for the Final Model
    4.10.1 Plots of the EBLUPs
    4.10.2 Residual Diagnostics
  4.11 Software Notes
    4.11.1 REML vs. ML Estimation
    4.11.2 Setting up Three-Level Models in HLM
    4.11.3 Calculation of Degrees of Freedom for t-Tests in HLM
    4.11.4 Analyzing Cases with Complete Data
    4.11.5 Miscellaneous Differences
  4.12 Recommendations
5 Models for Repeated-Measures Data: The Rat Brain Example
  5.1 Introduction
  5.2 The Rat Brain Study
    5.2.1 Study Description
    5.2.2 Data Summary
  5.3 Overview of the Rat Brain Data Analysis
    5.3.1 Analysis Steps
    5.3.2 Model Specification
      5.3.2.1 General Model Specification
      5.3.2.2 Hierarchical Model Specification
    5.3.3 Hypothesis Tests
  5.4 Analysis Steps in the Software Procedures
    5.4.1 SAS
    5.4.2 SPSS
    5.4.3 R
      5.4.3.1 Analysis Using the lme() Function
      5.4.3.2 Analysis Using the lmer() Function
    5.4.4 Stata
    5.4.5 HLM
      5.4.5.1 Data Set Preparation
      5.4.5.2 Preparing the MDM File
  5.5 Results of Hypothesis Tests
    5.5.1 Likelihood Ratio Tests for Random Effects
    5.5.2 Likelihood Ratio Tests for Residual Variance
    5.5.3 F-Tests for Fixed Effects
  5.6 Comparing Results across the Software Procedures
    5.6.1 Comparing Model 5.1 Results
    5.6.2 Comparing Model 5.2 Results
  5.7 Interpreting Parameter Estimates in the Final Model
    5.7.1 Fixed-Effect Parameter Estimates
    5.7.2 Covariance Parameter Estimates
  5.8 The Implied Marginal Variance-Covariance Matrix for the Final Model
  5.9 Diagnostics for the Final Model
  5.10 Software Notes
    5.10.1 Heterogeneous Residual Variances for Level 1 Groups
    5.10.2 EBLUPs for Multiple Random Effects
  5.11 Other Analytic Approaches
    5.11.1 Kronecker Product for More Flexible Residual Covariance Structures
    5.11.2 Fitting the Marginal Model
    5.11.3 Repeated-Measures ANOVA
  5.12 Recommendations
6 Random Coefficient Models for Longitudinal Data: The Autism Example
  6.1 Introduction
  6.2 The Autism Study
    6.2.1 Study Description
    6.2.2 Data Summary
  6.3 Overview of the Autism Data Analysis
    6.3.1 Analysis Steps
    6.3.2 Model Specification
      6.3.2.1 General Model Specification
      6.3.2.2 Hierarchical Model Specification
    6.3.3 Hypothesis Tests
  6.4 Analysis Steps in the Software Procedures
    6.4.1 SAS
    6.4.2 SPSS
    6.4.3 R
      6.4.3.1 Analysis Using the lme() Function
      6.4.3.2 Analysis Using the lmer() Function
    6.4.4 Stata
    6.4.5 HLM
      6.4.5.1 Data Set Preparation
      6.4.5.2 Preparing the MDM File
  6.5 Results of Hypothesis Tests
    6.5.1 Likelihood Ratio Test for Random Effects
    6.5.2 Likelihood Ratio Tests for Fixed Effects
  6.6 Comparing Results across the Software Procedures
    6.6.1 Comparing Model 6.1 Results
    6.6.2 Comparing Model 6.2 Results
    6.6.3 Comparing Model 6.3 Results
  6.7 Interpreting Parameter Estimates in the Final Model
    6.7.1 Fixed-Effect Parameter Estimates
    6.7.2 Covariance Parameter Estimates
  6.8 Calculating Predicted Values
    6.8.1 Marginal Predicted Values
    6.8.2 Conditional Predicted Values
  6.9 Diagnostics for the Final Model
    6.9.1 Residual Diagnostics
    6.9.2 Diagnostics for the Random Effects
    6.9.3 Observed and Predicted Values
  6.10 Software Note: Computational Problems with the D Matrix
    6.10.1 Recommendations
  6.11 An Alternative Approach: Fitting the Marginal Model with an Unstructured Covariance Matrix
    6.11.1 Recommendations
7 Models for Clustered Longitudinal Data: The Dental Veneer Example
  7.1 Introduction
  7.2 The Dental Veneer Study
    7.2.1 Study Description
    7.2.2 Data Summary
  7.3 Overview of the Dental Veneer Data Analysis
    7.3.1 Analysis Steps
    7.3.2 Model Specification
      7.3.2.1 General Model Specification
      7.3.2.2 Hierarchical Model Specification
    7.3.3 Hypothesis Tests
  7.4 Analysis Steps in the Software Procedures
    7.4.1 SAS
    7.4.2 SPSS
    7.4.3 R
      7.4.3.1 Analysis Using the lme() Function
      7.4.3.2 Analysis Using the lmer() Function
    7.4.4 Stata
    7.4.5 HLM
      7.4.5.1 Data Set Preparation
      7.4.5.2 Preparing the Multivariate Data Matrix (MDM) File
  7.5 Results of Hypothesis Tests
    7.5.1 Likelihood Ratio Tests for Random Effects
    7.5.2 Likelihood Ratio Tests for Residual Variance
    7.5.3 Likelihood Ratio Tests for Fixed Effects
  7.6 Comparing Results across the Software Procedures
    7.6.1 Comparing Model 7.1 Results
    7.6.2 Comparing Results for Models 7.2A, 7.2B, and 7.2C
    7.6.3 Comparing Model 7.3 Results
  7.7 Interpreting Parameter Estimates in the Final Model
    7.7.1 Fixed-Effect Parameter Estimates
    7.7.2 Covariance Parameter Estimates
  7.8 The Implied Marginal Variance-Covariance Matrix for the Final Model
  7.9 Diagnostics for the Final Model
    7.9.1 Residual Diagnostics
    7.9.2 Diagnostics for the Random Effects
  7.10 Software Notes and Recommendations
    7.10.1 ML vs. REML Estimation
    7.10.2 The Ability to Remove Random Effects from a Model
    7.10.3 Considering Alternative Residual Covariance Structures
    7.10.4 Aliasing of Covariance Parameters
    7.10.5 Displaying the Marginal Covariance and Correlation Matrices
    7.10.6 Miscellaneous Software Notes
  7.11 Other Analytic Approaches
    7.11.1 Modeling the Covariance Structure
    7.11.2 The Step-Up vs. Step-Down Approach to Model Building
    7.11.3 Alternative Uses of Baseline Values for the Dependent Variable
8 Models for Data with Crossed Random Factors: The SAT Score Example
  8.1 Introduction
  8.2 The SAT Score Study
    8.2.1 Study Description
    8.2.2 Data Summary
  8.3 Overview of the SAT Score Data Analysis
    8.3.1 Model Specification
      8.3.1.1 General Model Specification
      8.3.1.2 Hierarchical Model Specification
    8.3.2 Hypothesis Tests
  8.4 Analysis Steps in the Software Procedures
    8.4.1 SAS
    8.4.2 SPSS
    8.4.3 R
    8.4.4 Stata
    8.4.5 HLM
      8.4.5.1 Data Set Preparation
      8.4.5.2 Preparing the MDM File
      8.4.5.3 Model Fitting
  8.5 Results of Hypothesis Tests
    8.5.1 Likelihood Ratio Tests for Random Effects
    8.5.2 Testing the Fixed Year Effect
  8.6 Comparing Results across the Software Procedures
  8.7 Interpreting Parameter Estimates in the Final Model
    8.7.1 Fixed-Effect Parameter Estimates
    8.7.2 Covariance Parameter Estimates
  8.8 The Implied Marginal Variance-Covariance Matrix for the Final Model
  8.9 Recommended Diagnostics for the Final Model
  8.10 Software Notes and Additional Recommendations
A Statistical Software Resources
  A.1 Descriptions/Availability of Software Packages
    A.1.1 SAS
    A.1.2 IBM SPSS Statistics
    A.1.3 R
    A.1.4 Stata
    A.1.5 HLM
  A.2 Useful Internet Links
B Calculation of the Marginal Variance-Covariance Matrix
C Acronyms/Abbreviations
Bibliography
Index

List of Tables

2.1 Hierarchical Structures of the Example Data Sets Considered in Chapters 3 through 7
2.2 Multiple Levels of the Hierarchical Data Sets Considered in Each Chapter
2.3 Examples of the Interpretation of Fixed and Random Effects in an LMM Based on the Autism Data Analyzed in Chapter 6
2.4 Computational Algorithms Used by the Software Procedures for Estimation of the Covariance Parameters in an LMM
2.5 Summary of Influence Diagnostics for LMMs
3.1 Examples of Two-Level Data in Different Research Settings
3.2 Sample of the Rat Pup Data Set
3.3 Selected Models Considered in the Analysis of the Rat Pup Data
3.4 Summary of Hypotheses Tested in the Rat Pup Analysis
3.5 Summary of Hypothesis Test Results for the Rat Pup Analysis
3.6 Comparison of Results for Model 3.1
3.7 Comparison of Results for Model 3.2B
3.8 Comparison of Results for Model 3.3
4.1 Examples of Three-Level Data in Different Research Settings
4.2 Sample of the Classroom Data Set
4.3 Summary of Selected Models Considered for the Classroom Data
4.4 Summary of Hypotheses Tested in the Classroom Analysis
4.5 Summary of Hypothesis Test Results for the Classroom Analysis
4.6 Comparison of Results for Model 4.1
4.7 Comparison of Results for Model 4.2
4.8 Comparison of Results for Model 4.3
4.9 Comparison of Results for Model 4.4
5.1 Examples of Repeated-Measures Data in Different Research Settings
5.2 The Rat Brain Data in the Original "Wide" Data Layout. Treatments are "Carb" and "Basal"; brain regions are BST, LS, and VDB
5.3 Sample of the Rat Brain Data Set Rearranged in the "Long" Format
5.4 Summary of Selected Models for the Rat Brain Data
5.5 Summary of Hypotheses Tested in the Analysis of the Rat Brain Data
5.6 Summary of Hypothesis Test Results for the Rat Brain Analysis
5.7 Comparison of Results for Model 5.1
5.8 Comparison of Results for Model 5.2
6.1 Examples of Longitudinal Data in Different Research Settings
6.2 Sample of the Autism Data Set
6.3 Summary of Selected Models Considered for the Autism Data
6.4 Summary of Hypotheses Tested in the Autism Analysis
6.5 Summary of Hypothesis Test Results for the Autism Analysis
6.6 Comparison of Results for Model 6.1
6.7 Comparison of Results for Model 6.2
6.8 Comparison of Results for Model 6.3
7.1 Examples of Clustered Longitudinal Data in Different Research Settings
7.2 Sample of the Dental Veneer Data Set
7.3 Summary of Models Considered for the Dental Veneer Data
7.4 Summary of Hypotheses Tested for the Dental Veneer Data
7.5 Summary of Hypothesis Test Results for the Dental Veneer Analysis
7.6 Comparison of Results for Model 7.1
7.7 Comparison of Results for Models 7.2A, 7.2B (Both with Aliased Covariance Parameters), and 7.2C
7.8 Comparison of Results for Model 7.3
8.1 Sample of the SAT Score Data Set in the "Long" Format
8.2 Comparison of Results for Model 8.1

List of Figures

3.1 Box plots of rat pup birth weights for levels of treatment by sex.
3.2 Litter-specific box plots of rat pup birth weights by treatment level and sex. Box plots are ordered by litter size.
3.3 Model selection and related hypotheses for the analysis of the Rat Pup data.
3.4 Histograms for conditional raw residuals in the pooled high/low and control treatment groups, based on the fit of Model 3.3.
3.5 Normal Q–Q plots for the conditional raw residuals in the pooled high/low and control treatment groups, based on the fit of Model 3.3.
3.6 Scatter plots of conditional raw residuals vs. predicted values in the pooled high/low and control treatment groups, based on the fit of Model 3.3.
3.7 Box plot of conditional studentized residuals by new litter ID, based on the fit of Model 3.3.
3.8 Effect of deleting each litter on the REML likelihood distance for Model 3.3.
3.9 Effects of deleting one litter at a time on summary measures of influence for Model 3.3.
3.10 Effects of deleting one litter at a time on measures of influence for the covariance parameters in Model 3.3.
3.11 Effect of deleting each litter on measures of influence for the fixed effects in Model 3.3.
4.1 Nesting structure of a clustered three-level data set in an educational setting.
4.2 Box plots of the MATHGAIN responses for students in the selected classrooms in the first eight schools for the Classroom data set.
4.3 Model selection and related hypotheses for the Classroom data analysis.
4.4 Marginal predicted values of MATHGAIN as a function of MATHKIND and MINORITY, based on the fit of Model 4.2 in HLM3.
4.5 EBLUPs of the random classroom effects from Model 4.2, plotted using SPSS.
4.6 EBLUPs of the random school effects from Model 4.2, plotted using SPSS.
4.7 Normal quantile–quantile (Q–Q) plot of the residuals from Model 4.2, plotted using SPSS.
4.8 Residual vs. fitted plot from SPSS.
5.1 Line graphs of activation for each animal by region within levels of treatment for the Rat Brain data.
5.2 Model selection and related hypotheses for the analysis of the Rat Brain data.
5.3 Distribution of conditional residuals from Model 5.2.
5.4 Scatter plot of conditional residuals vs. conditional predicted values based on the fit of Model 5.2.
6.1 Observed VSAE values plotted against age for children in each SICD group.
6.2 Mean profiles of VSAE values for children in each SICD group.
6.3 Model selection and related hypotheses for the analysis of the Autism data.
6.4 Marginal predicted VSAE trajectories in the three SICDEGP groups for Model 6.3.
6.5 Conditional (dashed lines) and marginal (solid lines) trajectories, for the first 12 children with SICDEGP = 3.
6.6 Residual vs. fitted plot for each level of SICDEGP, based on the fit of Model 6.3.
6.7 Plot of conditional raw residuals versus AGE.2.
6.8 Normal Q–Q plots of conditional residuals within each level of SICDEGP.
6.9 Normal Q–Q plots for the EBLUPs of the random effects.
6.10 Scatter plots of EBLUPs for age-squared vs. age by SICDEGP.
6.11 Agreement of observed VSAE scores with conditional predicted VSAE scores for each level of SICDEGP, based on Model 6.3.
7.1 Structure of the clustered longitudinal data for the first patient in the Dental Veneer data set.
7.2 Raw GCF values for each tooth vs. time, by patient. Panels are ordered by patient age.
7.3 Guide to model selection and related hypotheses for the analysis of the Dental Veneer data.
7.4 Residual vs. fitted plot based on the fit of Model 7.3.
7.5 Normal Q–Q plot of the standardized residuals based on the fit of Model 7.3.
7.6 Normal Q–Q plots for the EBLUPs of the random patient effects.
8.1 Box plots of SAT scores by year of measurement.
8.2 Box plots of SAT scores for each of the 13 teachers in the SAT score data set.
8.3 Box plots of SAT scores for each of the 122 students in the SAT score data set.
8.4 Predicted SAT math score values by YEAR based on Model 8.1.
Preface to the Second Edition

Books attempting to serve as practical guides on the use of statistical software are always at risk of becoming outdated as the software continues to develop, especially in an area of statistics and data analysis that has received as much research attention as linear mixed models. In fact, much has changed since the first publication of this book in early 2007, and while we tried to keep pace with these changes on the web site for this book, the demand for a second edition quickly became clear. There were also a number of topics that were only briefly referenced in the first edition, and we wanted to provide more comprehensive discussions of those topics in a new edition.

This second edition of Linear Mixed Models: A Practical Guide Using Statistical Software aims to update the case studies presented in the first edition using the newest versions of the various software procedures, provide coverage of additional topics in the application of linear mixed models that we believe to be valuable for data analysts from all fields, and also provide up-to-date information on the options and features of the software procedures currently available for fitting linear mixed models in SAS, SPSS, Stata, R/S-plus, and HLM. Based on feedback from readers of the first edition, we have included coverage of the following topics in this second edition:

• Models with crossed random effects, and software procedures capable of fitting these models (see Chapter 8 for a new case study);
• Power analysis methods for longitudinal and clustered study designs, including software options for power analyses and suggested approaches to writing simulations;
• Use of the lmer() function in the lme4 package in R;
• Fitting linear mixed models to complex sample survey data;
• Bayesian approaches to making inferences based on linear mixed models; and
• Updated graphical procedures in the various software packages.

We hope that readers will find the updated coverage of these topics helpful for their research activities. We have substantially revised the subject index for the book to enable more efficient reading and easier location of material on selected topics or software options. We have also added more practical recommendations based on our experiences using the software throughout each of the chapters presenting analysis examples. New sections discussing overall recommendations can be found at the end of each of these chapters. Finally, we have created an R package named WWGbook that contains all of the data sets used in the example chapters.

We will once again strive to keep readers updated on the web site for the book, and also continue to provide working, up-to-date versions of the software code used for all of the analysis examples on the web site. Readers can find the web site at the following address: http://www.umich.edu/~bwest/almmussp.html.

Preface to the First Edition

The development of software for fitting linear mixed models was propelled by advances in statistical methodology and computing power in the late twentieth century. These developments, while providing applied researchers with new tools, have produced a sometimes confusing array of software choices. At the same time, parallel development of the methodology in different fields has resulted in different names for these models, including mixed models, multilevel models, and hierarchical linear models.
This book provides a reference on the use of procedures for fitting linear mixed models available in five popular statistical software packages (SAS, SPSS, Stata, R/S-plus, and HLM). The intended audience includes applied statisticians and researchers who want a basic introduction to the topic and an easy-to-navigate software reference. Several existing texts provide excellent theoretical treatment of linear mixed models and the analysis of variance components (e.g., McCulloch & Searle, 2001; Searle, Casella, & McCulloch, 1992; Verbeke & Molenberghs, 2000); this book is not intended to be one of them. Rather, we present the primary concepts and notation, and then focus on the software implementation and model interpretation. This book is intended to be a reference for practicing statisticians and applied researchers, and could be used in an advanced undergraduate or introductory graduate course on linear models.

Given the ongoing development and rapid improvements in software for fitting linear mixed models, the specific syntax and available options will likely change as newer versions of the software are released. The most up-to-date versions of selected portions of the syntax associated with the examples in this book, in addition to many of the data sets used in the examples, are available at the following web site: http://www.umich.edu/~bwest/almmussp.html.

The Authors

Brady T. West is a research assistant professor in the Survey Methodology Program, located within the Survey Research Center at the Institute for Social Research (ISR) on the University of Michigan–Ann Arbor campus. He also serves as a statistical consultant at the Center for Statistical Consultation and Research (CSCAR) on the University of Michigan–Ann Arbor campus. He earned his PhD from the Michigan program in survey methodology in 2011. Before that, he received an MA in applied statistics from the University of Michigan Statistics Department in 2002, being recognized as an Outstanding First-Year Applied Masters student, and a BS in statistics with Highest Honors and Highest Distinction from the University of Michigan Statistics Department in 2001. His current research interests include applications of interviewer observations in survey methodology, the implications of measurement error in auxiliary variables and survey paradata for survey estimation, survey nonresponse, interviewer variance, and multilevel regression models for clustered and longitudinal data. He has developed short courses on statistical analysis using SPSS, R, and Stata, and regularly consults on the use of procedures in SAS, SPSS, R, Stata, and HLM for the analysis of longitudinal and clustered data. He is also a coauthor of a book entitled Applied Survey Data Analysis (with Steven Heeringa and Patricia Berglund), which was published by Chapman & Hall in April 2010. He lives in Dexter, Michigan with his wife Laura, his son Carter, and his American Cocker Spaniel Bailey.

Kathy Welch is a senior statistician and statistical software consultant at the Center for Statistical Consultation and Research (CSCAR) at the University of Michigan–Ann Arbor. She received a BA in sociology (1969), an MPH in epidemiology and health education (1975), and an MS in biostatistics (1984) from the University of Michigan (UM).
She regularly consults on the use of SAS, SPSS, Stata, and HLM for analysis of clustered and longitudinal data, teaches a course on statistical software packages in the University of Michigan Department of Biostatistics, and teaches short courses on SAS software. She has also codeveloped and cotaught short courses on the analysis of linear mixed models and generalized linear models using SAS.

Andrzej Gałecki is a research professor in the Division of Geriatric Medicine, Department of Internal Medicine, and Institute of Gerontology at the University of Michigan Medical School, and has a joint appointment in the Department of Biostatistics at the University of Michigan School of Public Health. He received an MSc in applied mathematics (1977) from the Technical University of Warsaw, Poland, and an MD (1981) from the Medical Academy of Warsaw. In 1985 he earned a PhD in epidemiology from the Institute of Mother and Child Care in Warsaw (Poland). Since 1990, Dr. Gałecki has collaborated with researchers in gerontology and geriatrics. His research interests lie in the development and application of statistical methods for analyzing correlated and overdispersed data. He developed the SAS macro NLMEM for nonlinear mixed-effects models, specified as a solution of ordinary differential equations. His research (Gałecki, 1994) on a general class of covariance structures for two or more within-subject factors is considered to be one of the very first approaches to the joint modeling of multiple outcomes. Examples of these structures have been implemented in SAS proc mixed. He is also a co-author of more than 90 publications.

Brenda Gillespie is the associate director of the Center for Statistical Consultation and Research (CSCAR) at the University of Michigan in Ann Arbor. She received an AB in mathematics (1972) from Earlham College in Richmond, Indiana, an MS in statistics (1975) from The Ohio State University, and earned a PhD in statistics (1989) from Temple University in Philadelphia, Pennsylvania. Dr. Gillespie has collaborated extensively with researchers in health-related fields, and has worked with mixed models as the primary statistician on the Collaborative Initial Glaucoma Treatment Study (CIGTS), the Dialysis Outcomes Practice Pattern Study (DOPPS), the Scientific Registry of Transplant Recipients (SRTR), the University of Michigan Dioxin Study, and at the Complementary and Alternative Medicine Research Center at the University of Michigan.

Acknowledgments

First and foremost, we wish to thank Brenda Gillespie for her vision and the many hours she spent on making the first edition of this book a reality. Her contributions were invaluable. We sincerely wish to thank Caroline Beunckens at the Universiteit Hasselt in Belgium, who patiently and consistently reviewed our chapters, providing her guidance and insight. We also wish to acknowledge, with sincere appreciation, the careful reading of our text and invaluable suggestions for its improvement provided by Tomasz Burzykowski at the Universiteit Hasselt in Belgium; Oliver Schabenberger at the SAS Institute; Douglas Bates and José Pinheiro, codevelopers of the lme() and gls() functions in R; Sophia Rabe-Hesketh, developer of the gllamm procedure in Stata; Chun-Yi Wu, Shu Chen, and Carrie Disney at the University of Michigan–Ann Arbor; and John Gillespie at the University of Michigan–Dearborn.
We would also like to thank the technical support staff at SAS and SPSS for promptly responding to our inquiries about the mixed modeling procedures in those software packages. We also thank the anonymous reviewers provided by Chapman & Hall/CRC Press for their constructive suggestions on our early draft chapters. The Chapman & Hall/CRC Press staff has consistently provided helpful and speedy feedback in response to our many questions, and we are indebted to Kirsty Stroud for her support of this project in its early stages. We especially thank Rob Calver at Chapman & Hall/CRC Press for his support and enthusiasm for this project, and his deft and thoughtful guidance throughout. We thank our colleagues from the University of Michigan, especially Myra Kim and Julian Faraway (now at the University of Bath), for their perceptive comments and useful discussions. Our colleagues at the University of Michigan Center for Statistical Consultation and Research (CSCAR) have been wonderful, particularly Ed Rothman, who has provided encouragement and advice.

We are very grateful to our clients who have allowed us to use their data sets as examples. We are also thankful to individuals who have participated in our statistics.com course on mixed-effects modeling over the years, and provided us with feedback on the first edition of this book. In particular, we acknowledge Rickie Domangue from James Madison University, Robert E. Larzelere from the University of Nebraska, and Thomas Trojian from the University of Connecticut. We also gratefully acknowledge support from the Claude Pepper Center Grants AG08808 and AG024824 from the National Institute on Aging.

The transformation of the first edition of this book from Microsoft Word to LaTeX was not an easy one. This would not have been possible without the careful work and attention to detail provided by Alexandra Birg, who is currently a graduate student at Ludwig Maximilians University in Munich, Germany. We are extremely grateful to Alexandra for her extraordinary LaTeX skills and all of her hard work. We would also like to acknowledge the CRC Press/Chapman and Hall typesetting staff for their hard work and careful review of the new edition.

As was the case with the first edition of this book, we are once again especially indebted to our families and loved ones for their unconditional patience and support. It has been a long and sometimes arduous process that has been filled with hours of discussions and many late nights. The time we have spent writing this book has been a period of great learning and has developed a fruitful exchange of ideas that we have all enjoyed.

Brady, Kathy, and Andrzej

1 Introduction

1.1 What Are Linear Mixed Models (LMMs)?

LMMs are statistical models for continuous outcome variables in which the residuals are normally distributed but may not be independent or have constant variance. Study designs leading to data sets that may be appropriately analyzed using LMMs include (1) studies with clustered data, such as students in classrooms, or experimental designs with random blocks, such as batches of raw material for an industrial process, and (2) longitudinal or repeated-measures studies, in which subjects are measured repeatedly over time or under different conditions. These designs arise in a variety of settings throughout the medical, biological, physical, and social sciences. LMMs provide researchers with powerful and flexible analytic tools for these types of data.
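As a brief preview of the notation developed in Chapter 2, one standard way of writing an LMM for the vector of responses on subject (or cluster) i is sketched below. This is only an orientation: the exact specification, and the covariance structures allowed for the D and Ri matrices, are presented in Chapter 2, and the symbols used for the random effects vary slightly across texts.

\[
Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim N(0, D), \qquad \varepsilon_i \sim N(0, R_i),
\]

where $Y_i$ is the vector of responses for subject $i$, $X_i$ and $Z_i$ are design matrices, $\beta$ contains the fixed effects, $b_i$ contains the random effects, and $D$ and $R_i$ are the covariance matrices of the random effects and of the residuals, respectively. The fixed effects describe relationships shared by all subjects, whereas the random effects and residuals capture the subject-specific deviations that make observations within the same cluster correlated.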
Although software capable of fitting LMMs has become widely available in the past three decades, different approaches to model specification across software packages may be confusing for statistical practitioners. The available procedures in the general-purpose statistical software packages SAS, SPSS, R, and Stata take a similar approach to model specification, which we describe as the "general" specification of an LMM. The hierarchical linear model (HLM) software takes a hierarchical approach (Raudenbush & Bryk, 2002), in which an LMM is specified explicitly in multiple levels, corresponding to the levels of a clustered or longitudinal data set. In this book, we illustrate how the same models can be fitted using either of these approaches. We also discuss model specification in detail in Chapter 2 and present explicit specifications of the models fitted in each of our six example chapters (Chapters 3 through 8).

The name linear mixed models comes from the fact that these models are linear in the parameters, and that the covariates, or independent variables, may involve a mix of fixed and random effects. Fixed effects may be associated with continuous covariates, such as weight, baseline test score, or socioeconomic status, which take on values from a continuous (or sometimes a multivalued ordinal) range, or with factors, such as gender or treatment group, which are categorical. Fixed effects are unknown constant parameters associated with either continuous covariates or the levels of categorical factors in an LMM. Estimation of these parameters in LMMs is generally of intrinsic interest, because they indicate the relationships of the covariates with the continuous outcome variable. Readers familiar with linear regression models but not LMMs specifically may know fixed effects as regression coefficients.

When the levels of a categorical factor can be thought of as having been sampled from a sample space, such that each particular level is not of intrinsic interest (e.g., classrooms or clinics that are randomly sampled from a larger population of classrooms or clinics), the effects associated with the levels of those factors can be modeled as random effects in an LMM. In contrast to fixed effects, which are represented by constant parameters in an LMM, random effects are represented by (unobserved) random variables, which are usually assumed to follow a normal distribution. We discuss the distinction between fixed and random effects in more detail and give examples of each in Chapter 2.

With this book, we illustrate (1) a heuristic development of LMMs based on both general and hierarchical model specifications, (2) the step-by-step development of the model-building process, and (3) the estimation, testing, and interpretation of both fixed-effect parameters and covariance parameters associated with random effects. We work through examples of analyses of real data sets, using procedures designed specifically for the fitting of LMMs in SAS, SPSS, R, Stata, and HLM. We compare output from fitted models across the software procedures, address the similarities and differences, and give an overview of the options and features available in each procedure.

1.1.1 Models with Random Effects for Clustered Data

Clustered data arise when observations are made on subjects within the same randomly selected group.
For example, data might be collected from students within the same classroom, patients in the same clinic, or rat pups in the same litter. These designs involve units of analysis nested within clusters. If the clusters can be considered to have been sampled from a larger population of clusters, their effects can be modeled as random effects in an LMM. In a designed experiment with blocking, such as a randomized block design, the blocks are crossed with treatments, meaning that each treatment occurs once in each block. Block effects are usually considered to be random. We could also think of blocks as clusters, where treatment is a factor with levels that vary within clusters.

LMMs allow for the inclusion of both individual-level covariates (such as age and sex) and cluster-level covariates (such as cluster size), while adjusting for the random effects associated with each cluster. Although individual cluster-specific coefficients are not explicitly estimated, most LMM software produces cluster-specific "predictions" (EBLUPs, or empirical best linear unbiased predictors) of the random cluster-specific effects. Estimates of the variability of the random effects associated with clusters can then be obtained, and inferences about the variability of these random effects in a greater population of clusters can be made.

We note that traditional approaches to analysis of variance (ANOVA) models with both fixed and random effects used expected mean squares to determine the appropriate denominator for each F-test. Readers who learned mixed models under the expected mean squares system will begin the study of LMMs with valuable intuition about model building, although expected mean squares per se are now rarely mentioned.

We examine a model with random cluster-specific intercepts for a two-level clustered data set in Chapter 3 (the Rat Pup data). We then consider a three-level model for data from a study with students nested within classrooms and classrooms nested within schools in Chapter 4 (the Classroom data).

1.1.2 Models for Longitudinal or Repeated-Measures Data

Longitudinal data arise when multiple observations are made on the same subject or unit of analysis over time. Repeated-measures data may involve measurements made on the same unit over time, or under changing experimental or observational conditions. Measurements made on the same variable for the same subject are likely to be correlated (e.g., measurements of body weight for a given subject will tend to be similar over time). Models fitted to longitudinal or repeated-measures data involve the estimation of covariance parameters to capture this correlation.

The software procedures (e.g., the GLM, or General Linear Model, procedures in SAS and SPSS) that were available for fitting models to longitudinal and repeated-measures data prior to the advent of software for fitting LMMs accommodated only a limited range of models. These traditional repeated-measures ANOVA models assumed a multivariate normal (MVN) distribution of the repeated measures and required either estimation of all covariance parameters of the MVN distribution or an assumption of "sphericity" of the covariance matrix (with corrections, such as those proposed by Geisser & Greenhouse (1958) or Huynh & Feldt (1976), that provide approximate adjustments to the test statistics for violations of this assumption).
In contrast, LMM software, although assuming an MVN distribution for the repeated measures, allows users to fit models with a broad selection of parsimonious covariance structures, offering greater efficiency than estimating the full variance-covariance structure of the MVN model, and more flexibility than models assuming sphericity. Some of these covariance structures may satisfy sphericity (e.g., independence or compound symmetry), and other structures may not (e.g., autoregressive or various types of heterogeneous covariance structures). The LMM software procedures considered in this book allow varying degrees of flexibility in fitting and testing covariance structures for repeated-measures or longitudinal data.

Software for LMMs has other advantages over software procedures capable of fitting traditional repeated-measures ANOVA models. First, LMM software procedures allow subjects to have missing time points. In contrast, software for traditional repeated-measures ANOVA drops an entire subject from the analysis if the subject has missing data for a single time point, which is known as complete-case analysis (Little & Rubin, 2002). Second, LMMs allow for the inclusion of time-varying covariates in the model (in addition to a covariate representing time), whereas software for traditional repeated-measures ANOVA does not. Finally, LMMs provide tools for the situation in which the trajectory of the outcome varies over time from one subject to another. Examples of such models include growth curve models, which can be used to make inferences about the variability of growth curves in the larger population of subjects. Growth curve models are examples of random coefficient models (or Laird–Ware models), which will be discussed when considering the longitudinal data in Chapter 6 (the Autism data).

In Chapter 5, we consider LMMs for a small repeated-measures data set with two within-subject factors (the Rat Brain data). We consider models for a data set with features of both clustered and longitudinal data in Chapter 7 (the Dental Veneer data). Finally, we consider a unique educational data set with repeated measures on both students and teachers over time in Chapter 8 (the SAT score data), to illustrate the fitting of models with crossed random effects.

1.1.3 The Purpose of This Book

This book is designed to help applied researchers and statisticians use LMMs appropriately for their data analysis problems, employing procedures available in the SAS, SPSS, Stata, R, and HLM software packages. It has been our experience that examples are the best teachers when learning about LMMs. By illustrating analyses of real data sets using the different software procedures, we demonstrate the practice of fitting LMMs and highlight the similarities and differences in the software procedures.

We present a heuristic treatment of the basic concepts underlying LMMs in Chapter 2. We believe that a clear understanding of these concepts is fundamental to formulating an appropriate analysis strategy. We assume that readers have a general familiarity with ordinary linear regression and ANOVA models, both of which fall under the heading of general (or standard) linear models. We also assume that readers have a basic working knowledge of matrix algebra, particularly for the presentation in Chapter 2. Nonlinear mixed models and generalized LMMs (in which the dependent variable may be a binary, ordinal, or count variable) are beyond the scope of this book.
For a discussion of nonlinear mixed models, see Davidian & Giltinan (1995); for references on generalized LMMs, see Diggle et al. (2002) or Molenberghs & Verbeke (2005). We also do not consider spatial correlation structures; for more information on spatial data analysis, see Gregoire et al. (1997). A general overview of current research and practice in multilevel modeling for all types of dependent variables can be found in the recently published (2013) edited volume entitled The Sage Handbook of Multilevel Modeling.

This book should not be substituted for the manuals of any of the software packages discussed. Although we present aspects of the LMM procedures available in each of the five software packages, we do not present an exhaustive coverage of all available options.

1.1.4 Outline of Book Contents

Chapter 2 presents the notation and basic concepts behind LMMs and is strongly recommended for readers whose aim is to understand these models. The remaining chapters are dedicated to case studies, illustrating some of the more common types of LMM analyses with real data sets, most of which we have encountered in our work as statistical consultants. Each chapter presenting a case study describes how to perform the analysis using each software procedure, highlighting features in one of the statistical software packages in particular.

In Chapter 3, we begin with an illustration of fitting an LMM to a simple two-level clustered data set and emphasize the SAS software. Chapter 3 presents the most detailed coverage of setting up the analyses in each software procedure; subsequent chapters do not provide as much detail when discussing the syntax and options for each procedure. Chapter 4 introduces models for three-level data sets and illustrates the estimation of variance components associated with nested random effects; we focus on the HLM software in Chapter 4. Chapter 5 illustrates an LMM for repeated-measures data arising from a randomized block design, focusing on the SPSS software. Examples in the second edition of this book were constructed using IBM SPSS Statistics Version 21, and all SPSS syntax presented should work in earlier versions of SPSS. Chapter 6 illustrates the fitting of a random coefficient model (specifically, a growth curve model), and emphasizes the R software. The R examples have been constructed using the lme() and lmer() functions, which are available in the nlme and lme4 packages, respectively. Relative to the lme() function, the lmer() function offers improved estimation of LMMs with crossed random effects. More generally, each of these functions has particular advantages depending on the data structure and the model being fitted, and we consider these differences in our example chapters. Chapter 7 highlights the Stata software and combines many of the concepts introduced in the earlier chapters by introducing a model for clustered longitudinal data, which includes both random effects and correlated residuals. Finally, Chapter 8 discusses a case study involving crossed random effects, and highlights the use of the lmer() function in R. The analyses of examples in Chapters 3, 5, and 7 all consider alternative, heterogeneous covariance structures for the residuals, which is a very important feature of LMMs that makes them much more flexible than alternative linear modeling tools.
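To give a flavor of the two R functions before the detailed examples, the following is a minimal sketch that fits the same two-level random-intercept model with lme() and with lmer(). The data frame dat and the variable names (y, x, clusterid) are hypothetical placeholders, not data or code from the book's case studies.

```r
library(nlme)
library(lme4)

# nlme: the fixed and random parts of the model are given in separate arguments
fit.lme  <- lme(y ~ x, random = ~ 1 | clusterid, data = dat, method = "REML")

# lme4: the random part appears directly in the model formula
fit.lmer <- lmer(y ~ x + (1 | clusterid), data = dat, REML = TRUE)

summary(fit.lme)
summary(fit.lmer)
```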
At the end of each chapter presenting a case study, we consider the similarities and differences in the results generated by the software procedures. We discuss reasons for any discrepancies and make recommendations for use of the various procedures in different settings.

Appendix A presents several statistical software resources. Information on the background and availability of the statistical software packages SAS (Version 9.3), IBM SPSS Statistics (Version 21), Stata (Release 13), R (Version 3.0.2), and HLM (Version 7) is provided, in addition to links to other useful mixed modeling resources, including web sites for important materials from this book. Appendix B revisits the Rat Brain analysis from Chapter 5 to illustrate the calculation of the marginal variance-covariance matrix implied by one of the LMMs considered in that chapter. This appendix is designed to provide readers with a detailed idea of how one models the covariance of dependent observations in clustered or longitudinal data sets. Finally, Appendix C presents some commonly used abbreviations and acronyms associated with LMMs.

1.2 A Brief History of LMMs

Some historical perspective on this topic is useful. At the very least, while LMMs might seem difficult to grasp at first, it is comforting to know that scores of people have spent over a hundred years sorting it all out. The following subsections highlight many (but not nearly all) of the important historical developments that have led to the widespread use of LMMs today. We divide the key historical developments into two categories: theory and software. Some of the terms and concepts introduced in this timeline will be discussed in more detail later in the book. For additional historical perspective, we refer readers to Brown & Prescott (2006).

1.2.1 Key Theoretical Developments

The following timeline presents the evolution of the theoretical basis of LMMs:

1861: The first known formulation of a one-way random-effects model (an LMM with one random factor and no fixed factors) is that by Airy, which was further clarified by Scheffé in 1956. Airy made several telescopic observations on the same night (clustered data) for several different nights and analyzed the data by separating the variance of the random night effects from the random within-night residuals.

1863: Chauvenet calculated variances of random effects in a simple random-effects model.

1925: Fisher's book Statistical Methods for Research Workers outlined the general method for estimating variance components, or partitioning random variation into components from different sources, for balanced data.

1927: Yule assumed explicit dependence of the current residual on a limited number of the preceding residuals in building pure serial correlation models.

1931: Tippett extended Fisher's work into the linear model framework, modeling quantities as a linear function of random variations due to multiple random factors. He also clarified an ANOVA method of estimating the variances of random effects.

1935: Neyman, Iwaszkiewicz, and Kolodziejczyk examined the comparative efficiency of randomized blocks and Latin squares designs and made extensive use of LMMs in their work.

1938: The seventh edition of Fisher's 1925 work discusses estimation of the intraclass correlation coefficient (ICC).

1939: Jackson assumed normality for random effects and residuals in his description of an LMM with one random factor and one fixed factor. This work introduced the term effect in the context of LMMs.
In the same year, Cochran presented a one-way random-effects model for unbalanced data.

1940: Winsor and Clarke, and also Yates, focused on estimating variances of random effects in the case of unbalanced data. Wald considered confidence intervals for ratios of variance components. At this point, estimates of variance components were still not unique.

1941: Ganguli applied ANOVA estimation of variance components associated with random effects to nested mixed models.

1946: Crump applied ANOVA estimation to mixed models with interactions. Ganguli and Crump were the first to mention the problem that ANOVA estimation can produce negative estimates of variance components associated with random effects. Satterthwaite worked with approximate sampling distributions of variance component estimates and defined a procedure for calculating approximate degrees of freedom for approximate F-statistics in mixed models.

1947: Eisenhart introduced the "mixed model" terminology and formally distinguished between fixed- and random-effects models.

1950: Henderson provided the equations to which the BLUPs of random effects and fixed effects were the solutions, known as the mixed model equations (MMEs).

1952: Anderson and Bancroft published Statistical Theory in Research, a book providing thorough coverage of the estimation of variance components from balanced data and introducing the analysis of unbalanced data in nested random-effects models.

1953: Henderson produced the seminal paper "Estimation of Variance and Covariance Components" in Biometrics, focusing on the use of one of three sums of squares methods in the estimation of variance components from unbalanced data in mixed models (the Type III method is frequently used, being based on a linear model, but all types are available in statistical software packages). Various other papers in the late 1950s and 1960s built on these three methods for different mixed models.

1965: Rao was responsible for the systematic development of the growth curve model, a model with a common linear time trend for all units and unit-specific random intercepts and random slopes.

1967: Hartley and Rao showed that unique estimates of variance components could be obtained using maximum likelihood methods, using the equations resulting from the matrix representation of a mixed model (Searle et al., 1992). However, the estimates of the variance components were biased downward, because this method assumes that the fixed effects are known rather than estimated from the data.

1968: Townsend was the first to look at finding minimum variance quadratic unbiased estimators of variance components.

1971: Restricted maximum likelihood (REML) estimation was introduced by Patterson & Thompson (1971) as a method of estimating variance components (without assuming that fixed effects are known) in a general linear model with unbalanced data. Likelihood-based methods developed slowly because they were computationally intensive. Searle described confidence intervals for estimated variance components in an LMM with one random factor.

1972: Gabriel developed the terminology of ante-dependence of order p to describe a model in which the conditional distribution of the current residual, given its predecessors, depends only on its p predecessors.
This led to the development of the first-order autoregressive, AR(1), process (appropriate for equally spaced measurements on an individual over time), in which the current residual depends stochastically on the previous residual. Rao completed work on minimum-norm quadratic unbiased estimators (MINQUE) of variance components, which demand no distributional form for the random effects or residual terms (Rao, 1972). Lindley and Smith introduced HLMs.

1976: Albert showed that without any distributional assumptions at all, ANOVA estimators are the best quadratic unbiased estimators of variance components in LMMs, and the best unbiased estimators under an assumption of normality.

Mid-1970s onward: LMMs are frequently applied in agricultural settings, specifically split-plot designs (Brown & Prescott, 2006).

1982: Laird and Ware described the theory for fitting a random coefficient model in a single stage (Laird & Ware, 1982). Random coefficient models were previously handled in two stages: estimating time slopes and then performing an analysis of the time slopes for individuals.

1985: Khuri and Sahai provided a comprehensive survey of work on confidence intervals for estimated variance components.

1986: Jennrich and Schluchter described the use of different covariance pattern models for analyzing repeated-measures data and how to choose between them (Jennrich & Schluchter, 1986). Smith and Murray formulated variance components as covariances and estimated them from balanced data using the ANOVA procedure based on quadratic forms; Green would complete this formulation for unbalanced data. Goldstein introduced iteratively reweighted generalized least squares.

1987: Results from Self & Liang (1987), and later from Stram & Lee (1994), made testing the significance of variance components feasible.

1990: Verbyla and Cullis applied REML in a longitudinal data setting.

1994: Diggle, Liang, and Zeger distinguished between three types of random variance components: random effects and random coefficients, serial correlation (residuals close to each other in time are more similar than residuals farther apart), and random measurement error (Diggle et al., 2002).

1990s onward: LMMs become increasingly popular in medicine (Brown & Prescott, 2006) and in the social sciences (Raudenbush & Bryk, 2002), where they are also known as multilevel models or hierarchical linear models (HLMs).

1.2.2 Key Software Developments

Some important landmarks are highlighted here:

1982: Bryk and Raudenbush first published the HLM computer program.

1988: Schluchter and Jennrich first introduced the BMDP5-V software routine for unbalanced repeated-measures models.

1992: SAS introduced proc mixed as a part of the SAS/STAT analysis package.

1995: StataCorp released Stata Release 5, which offered the xtreg procedure for fitting models with random effects associated with a single random factor, and the xtgee procedure for fitting models to panel data using the Generalized Estimating Equations (GEE) methodology.

1998: Bates and Pinheiro introduced the generic linear mixed-effects modeling function lme() for the R software package.

2001: Rabe-Hesketh et al. collaborated to write the Stata command gllamm for fitting LMMs (among other types of models). SPSS released the first version of the MIXED procedure as part of SPSS version 11.0.

2005: Stata made the general LMM command xtmixed available as a part of Stata Release 9; this would later become the mixed command in Stata Release 13.
Bates introduced the lmer() function for the R software package.

2 Linear Mixed Models: An Overview

2.1 Introduction

A linear mixed model (LMM) is a parametric linear model for clustered, longitudinal, or repeated-measures data that quantifies the relationships between a continuous dependent variable and various predictor variables. An LMM may include both fixed-effect parameters associated with one or more continuous or categorical covariates and random effects associated with one or more random factors. The mix of fixed and random effects gives the linear mixed model its name. Whereas fixed-effect parameters describe the relationships of the covariates to the dependent variable for an entire population, random effects are specific to clusters or subjects within a population. Consequently, random effects are directly used in modeling the random variation in the dependent variable at different levels of the data.

In this chapter, we present a heuristic overview of selected concepts important for an understanding of the application of LMMs. In Subsection 2.1.1, we describe the types and structures of data that we analyze in the example chapters (Chapters 3 through 8). In Subsection 2.1.2, we present basic definitions and concepts related to fixed and random factors and their corresponding effects in an LMM. In Sections 2.2 through 2.4, we specify LMMs in the context of longitudinal data and discuss parameter estimation methods. In Sections 2.5 through 2.10, we present other aspects of LMMs that are important when fitting and evaluating models.

We assume that readers have a basic understanding of standard linear models, including ordinary least-squares regression, analysis of variance (ANOVA), and analysis of covariance (ANCOVA) models. For those interested in a more advanced presentation of the theory and concepts behind LMMs, we recommend Verbeke & Molenberghs (2000).

2.1.1 Types and Structures of Data Sets

2.1.1.1 Clustered Data vs. Repeated-Measures and Longitudinal Data

In the example chapters of this book, we illustrate fitting linear mixed models to clustered, repeated-measures, and longitudinal data. Because different definitions exist for these types of data, we provide our definitions for the reader's reference.

We define clustered data as data sets in which the dependent variable is measured once for each subject (the unit of analysis), and the units of analysis are grouped into, or nested within, clusters of units. For example, in Chapter 3 we analyze the birth weights of rat pups (the units of analysis) nested within litters (clusters of units). We describe the Rat Pup data as a two-level clustered data set. In Chapter 4 we analyze the math scores of students (the units of analysis) nested within classrooms (clusters of units), which are in turn nested within schools (clusters of clusters). We describe the Classroom data as a three-level clustered data set.

We define repeated-measures data quite generally as data sets in which the dependent variable is measured more than once on the same unit of analysis across levels of a repeated-measures factor (or factors). The repeated-measures factors, which may be time or other experimental or observational conditions, are often referred to as within-subject factors.
For example, in the Rat Brain example in Chapter 5, we analyze the activation of a chemical measured in response to two treatments across three brain regions within each rat (the unit of analysis). Both brain region and treatment are repeated-measures factors. Dropout of subjects is not usually a concern in the analysis of repeated-measures data, although there may be missing data because of an instrument malfunction or other unanticipated reasons.

By longitudinal data, we mean data sets in which the dependent variable is measured at several points in time for each unit of analysis. We usually conceptualize longitudinal data as involving at least two repeated measurements made over a relatively long period of time. For example, in the Autism example in Chapter 6, we analyze the socialization scores of a sample of autistic children (the subjects or units of analysis), who are each measured at up to five time points (ages 2, 3, 5, 9, and 13 years). In contrast to repeated-measures data, dropout of subjects is often a concern in the analysis of longitudinal data.

In some cases, when the dependent variable is measured over time, it may be difficult to classify data sets as either longitudinal or repeated-measures data. In the context of analyzing data using LMMs, this distinction is not critical. The important feature of both of these types of data is that the dependent variable is measured more than once for each unit of analysis, with the repeated measures likely to be correlated.

Clustered longitudinal data sets combine features of both clustered and longitudinal data. More specifically, the units of analysis are nested within clusters, and each unit is measured more than once. In Chapter 7 we analyze the Dental Veneer data, in which teeth (the units of analysis) are nested within a patient (a cluster of units), and each tooth is measured at multiple time points (i.e., at 3 months and 6 months post-treatment).

We refer to clustered, repeated-measures, and longitudinal data as hierarchical data sets, because the observations can be placed into levels of a hierarchy in the data. In Table 2.1, we present the hierarchical structures of the example data sets.

TABLE 2.1: Hierarchical Structures of the Example Data Sets Considered in Chapters 3 through 7

                                Clustered Data                   Repeated-Measures/Longitudinal Data
Data Type                       Two-Level      Three-Level       Repeated-Measures        Longitudinal    Clustered Longitudinal
Data set (Chap.)                Rat Pup (3)    Classroom (4)     Rat Brain (5)            Autism (6)      Dental Veneer (7)
Repeated/longitudinal
  measures (t)                  --             --                Spanned by brain         Age in years    Time in months
                                                                 region and treatment
Subject/unit of analysis (i)    Rat Pup        Student           Rat                      Child           Tooth
Cluster of units (j)            Litter         Classroom         --                       --              Patient
Cluster of clusters (k)         --             School            --                       --              --

Note: The "Subject/unit of analysis" row indicates the unit of analysis for each study; the (t, i, j, k) indices shown here are used in the model notation presented later in this book.

The distinction between repeated-measures/longitudinal data and clustered data is reflected in the presence or absence of a blank cell in the row of Table 2.1 labeled "Repeated/longitudinal measures." In Table 2.1 we also introduce the index notation used in the remainder of the book. In particular, we use the index t to denote repeated/longitudinal measurements, the index i to denote subjects or units of analysis, and the index j to denote clusters.
The index k is used in models for three-level clustered data to denote "clusters of clusters." We note that Table 2.1 does not include the example data set from Chapter 8, which features crossed random factors. In these cases, there is not an explicit hierarchy present in the data. We discuss crossed random effects in more detail in Subsection 2.1.2.5.

2.1.1.2 Levels of Data

We can also think of clustered, repeated-measures, and longitudinal data sets as multilevel data sets, as shown in Table 2.2. The concept of "levels" of data is based on ideas from the hierarchical linear modeling (HLM) literature (Raudenbush & Bryk, 2002). All data sets appropriate for an analysis using LMMs have at least two levels of data. We describe the example data sets that we analyze as two-level or three-level data sets, depending on how many levels of data are present. One notable exception is data sets with crossed random factors (Chapter 8), which do not have an explicit hierarchy, because levels of one random factor are not nested within levels of other random factors (see Subsection 2.1.2.5). We consider data with at most three levels (denoted as Level 1, Level 2, or Level 3) in the examples illustrated in this book, although data sets with additional levels may be encountered in practice:

Level 1 denotes observations at the most detailed level of the data. In a clustered data set, Level 1 represents the units of analysis (or subjects) in the study. In a repeated-measures or longitudinal data set, Level 1 represents the repeated measures made on the same unit of analysis. The continuous dependent variable is always measured at Level 1 of the data.

Level 2 represents the next level of the hierarchy. In clustered data sets, Level 2 observations represent clusters of units. In repeated-measures and longitudinal data sets, Level 2 represents the units of analysis.

Level 3 represents the next level of the hierarchy, and generally refers to clusters of units in clustered longitudinal data sets, or clusters of Level 2 units (clusters of clusters) in three-level clustered data sets.

We measure continuous and categorical variables at different levels of the data, and we refer to the variables as Level 1, Level 2, or Level 3 variables (with the exception of models with crossed random effects, as in Chapter 8). The idea of levels of data is explicit when using the HLM software, but it is implicit when using the other four software packages. We have emphasized this concept because we find it helpful to think about LMMs in terms of simple models defined at each level of the data hierarchy (the approach to specifying LMMs in the HLM software package), instead of only one model combining sources of variation from all levels (the approach to LMMs used in the other software procedures). However, when using the paradigm of levels of data, the distinction between clustered vs. repeated-measures/longitudinal data becomes less obvious, as illustrated in Table 2.2.

TABLE 2.2: Multiple Levels of the Hierarchical Data Sets Considered in Each Chapter

                   Clustered Data                   Repeated-Measures/Longitudinal Data
Data Type          Two-Level      Three-Level       Repeated-Measures        Longitudinal             Clustered Longitudinal
Data set (Chap.)   Rat Pup (3)    Classroom (4)     Rat Brain (5)            Autism (6)               Dental Veneer (7)
Level 1            Rat Pup        Student           Repeated measures        Longitudinal measures    Longitudinal measures
                                                    (spanned by brain        (age in years)           (time in months)
                                                    region and treatment)
Level 2            Litter         Classroom         Rat                      Child                    Tooth
Level 3            --             School            --                       --                       Patient

Note: The unit of analysis for each study is the Level 1 entry for the clustered data sets and the Level 2 entry for the repeated-measures and longitudinal data sets.

2.1.2 Types of Factors and Their Related Effects in an LMM

The distinction between fixed and random factors and their related effects on a dependent variable is critical in the context of LMMs. We therefore devote separate subsections to these topics.

2.1.2.1 Fixed Factors

The concept of a fixed factor is most commonly used in the setting of a standard ANOVA or ANCOVA model. We define a fixed factor as a categorical or classification variable, for which the investigator has included all levels (or conditions) that are of interest in the study. Fixed factors might include qualitative covariates, such as gender; classification variables implied by a survey sampling design, such as region or stratum, or by a study design, such as the treatment method in a randomized clinical trial; or ordinal classification variables in an observational study, such as age group. Levels of a fixed factor are chosen so that they represent specific conditions, and they can be used to define contrasts (or sets of contrasts) of interest in the research study.

2.1.2.2 Random Factors

A random factor is a classification variable with levels that can be thought of as being randomly sampled from a population of levels being studied. All possible levels of the random factor are not present in the data set, but it is the researcher's intention to make inferences about the entire population of levels. The classification variables that identify the Level 2 and Level 3 units in both clustered and repeated-measures/longitudinal data sets are often considered to be random factors. Random factors are considered in an analysis so that variation in the dependent variable across levels of the random factors can be assessed, and the results of the data analysis can be generalized to a greater population of levels of the random factor.

2.1.2.3 Fixed Factors vs. Random Factors

In contrast to the levels of fixed factors, the levels of random factors do not represent conditions chosen specifically to meet the objectives of the study. However, depending on the goals of the study, the same factor may be considered either as a fixed factor or a random factor, as we note in the following paragraph.

In the Dental Veneer data analyzed in Chapter 7, the dependent variable (Gingival Crevicular Fluid, or GCF) is measured repeatedly on selected teeth within a given patient, and the teeth are numbered according to their location in the mouth. In our analysis, we assume that the teeth measured within a given patient represent a random sample of all teeth within the patient, which allows us to generalize the results of the analysis to the larger hypothetical "population" of "teeth within patients." In other words, we consider "tooth within patient" to be a random factor. If the research had been focused on the specific differences between the selected teeth considered in the study, we might have treated "tooth within patient" as a fixed factor. In this latter case, inferences would have only been possible for the selected teeth in the study, and not for all teeth within each patient.
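As a concrete illustration of this choice, the sketch below shows how the two treatments of "tooth within patient" might be written using the lmer() function. The data frame veneer and the variable names (gcf, time, patient, tooth) are hypothetical placeholders, not the book's Chapter 7 code.

```r
library(lme4)

# Tooth within patient treated as a random (nested) factor: inferences extend
# to the larger population of teeth within patients.
fit.random <- lmer(gcf ~ time + (1 | patient/tooth), data = veneer)

# Tooth treated as a fixed factor: inferences apply only to the specific tooth
# locations included in the study.
fit.fixed  <- lmer(gcf ~ time + factor(tooth) + (1 | patient), data = veneer)
```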
2.1.2.4 Fixed Effects vs. Random Effects

Fixed effects, also called regression coefficients or fixed-effect parameters, describe the relationships between the dependent variable and predictor variables (i.e., fixed factors or continuous covariates) for an entire population of units of analysis, or for a relatively small number of subpopulations defined by levels of a fixed factor. Fixed effects may describe contrasts or differences between levels of a fixed factor (e.g., between males and females) in terms of mean responses for the continuous dependent variable, or they may describe the relationship of a continuous covariate with the dependent variable. Fixed effects are assumed to be unknown fixed quantities in an LMM, and we estimate them based on our analysis of the data collected in a given research study.

Random effects are random values associated with the levels of a random factor (or factors) in an LMM. These values, which are specific to a given level of a random factor, usually represent random deviations from the relationships described by fixed effects. For example, random effects associated with the levels of a random factor can enter an LMM as random intercepts (representing random deviations for a given subject or cluster from the overall fixed intercept), or as random coefficients (representing random deviations for a given subject or cluster from the overall fixed effects) in the model. In contrast to fixed effects, random effects are represented as random variables in an LMM.

In Table 2.3, we provide examples of the interpretation of fixed and random effects in an LMM, based on the analysis of the Autism data (a longitudinal study of socialization among autistic children) presented in Chapter 6. There are two covariates under consideration in this example: the continuous covariate AGE, which represents a child's age in years at which the dependent variable was measured, and the fixed factor SICDEGP, which identifies groups of children based on their expressive language score at baseline (age 2). The fixed effects associated with these covariates apply to the entire population of children. The classification variable CHILDID is a unique identifier for each child and is considered to be a random factor in the analysis. The random effects associated with the levels of CHILDID apply to specific children.

TABLE 2.3: Examples of the Interpretation of Fixed and Random Effects in an LMM Based on the Autism Data Analyzed in Chapter 6

Fixed effects (each effect applies to the entire population):
- Variable corresponding to the intercept (i.e., equal to 1 for all observations): the mean of the dependent variable when all covariates are equal to zero.
- AGE: the fixed slope for AGE (i.e., the expected change in the dependent variable for a 1-year increase in AGE).
- SICDEGP1, SICDEGP2 (indicators for the baseline expressive language groups; the reference level is SICDEGP3): contrasts for different levels of SICDEGP (i.e., mean differences in the dependent variable for children in Level 1 and Level 2 of SICDEGP, relative to Level 3); these effects apply to the entire population within each subgroup of SICDEGP.

Random effects (each effect applies to an individual child, identified by CHILDID):
- Variable corresponding to the intercept: a child-specific random deviation from the fixed intercept.
- AGE: a child-specific random deviation from the fixed slope for AGE.

2.1.2.5 Nested vs.
Crossed Factors and Their Corresponding Effects

When a particular level of a factor (random or fixed) can only be measured within a single level of another factor, and not across multiple levels, the levels of the first factor are said to be nested within levels of the second factor. The effects of the nested factor on the response are known as nested effects. For example, in the Classroom data set analyzed in Chapter 4, both schools and classrooms within schools were randomly sampled. Levels of classroom (one random factor) are nested within levels of school (another random factor), because each classroom can appear within only one school.

When a given level of a factor (random or fixed) can be measured across multiple levels of another factor, one factor is said to be crossed with another, and the effects of these factors on the dependent variable are known as crossed effects. For example, in the analysis of the Rat Pup data in Chapter 3, we consider two crossed fixed factors: TREATMENT and SEX. Specifically, levels of TREATMENT are crossed with the levels of SEX, because both male and female rat pups are studied for each level of treatment.

We consider crossed random factors and their associated random effects in Chapter 8 of this book. In this chapter, we analyze a data set from an educational study in which there are multiple measures collected over time on a random sample of students within a school, and multiple students are instructed by the same randomly sampled teacher at a given point in time. A reasonable LMM for these data would include random student effects and random teacher effects, and the levels of the random student and teacher factors are crossed with each other.

Software Note: Estimation of the parameters in LMMs with crossed random effects is more computationally intensive than for LMMs with nested random effects, primarily because the design matrices associated with the crossed random effects are no longer block-diagonal; see Chapter 15 of Galecki & Burzykowski (2013) for more discussion of this point. The lmer() function in R, which is available in the lme4 package, was designed to optimize the estimation of parameters in LMMs with crossed random effects via the use of sparse matrices (see http://pages.cs.wisc.edu/~bates/reports/MixedEffects.pdf, or Fellner (1987)), and we recommend its use for such problems. SAS proc hpmixed also uses sparse matrices when fitting these models. Each of these procedures in SAS and R can increase the efficiency of model-fitting algorithms for larger data sets with crossed random factors; models with crossed random effects can also be fitted using SAS proc mixed. We present examples of fitting models with crossed random effects in the various software packages in Chapter 8.

Crossed and nested effects also apply to interactions of continuous covariates and categorical factors. For example, in the analysis of the Autism data in Chapter 6, we discuss the crossed effects of the continuous covariate, AGE, and the categorical factor, SICDEGP (expressive language group), on children's socialization scores.
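Returning to the crossed random effects just described, the following is a minimal sketch of how such a model could be specified with lmer(). The data frame scores and the variable names (score, year, studid, tchrid) are hypothetical placeholders rather than code from Chapter 8.

```r
library(lme4)

# Random intercepts for students and for teachers. Because students are not
# nested within teachers (a student may be taught by different teachers over
# time), the two random factors are crossed.
fit.crossed <- lmer(score ~ year + (1 | studid) + (1 | tchrid), data = scores)
summary(fit.crossed)
```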
2.2 Specification of LMMs

The general specification of an LMM presented in this section refers to a model for a longitudinal two-level data set, with the first index, t, being used to indicate a time point, and the second index, i, being used for subjects. We use a similar indexing convention (index t for Level 1 units, and index i for Level 2 units) in Chapters 5 through 7, which illustrate analyses involving repeated-measures and longitudinal data. In Chapters 3 and 4, in which we consider analyses of clustered data, we specify the models in a similar way but follow a modified indexing convention. More specifically, we use the first index, i, for Level 1 units, the second index, j, for Level 2 units (in both chapters), and the third index, k, for Level 3 units (in Chapter 4 only). Refer to Table 2.1 for more details. In both of these conventions, the unit of analysis is indexed by i. We define the index notation in Table 2.1 and in each of the chapters presenting example analyses.

2.2.1 General Specification for an Individual Observation

We begin with a simple and general formula that indicates how most of the components of an LMM can be written at the level of an individual observation in the context of a longitudinal two-level data set. The specification of the remaining components of the LMM, which in general requires matrix notation, is deferred to Subsection 2.2.2. In the example chapters we proceed in a similar manner; that is, we specify the models at the level of an individual observation for ease of understanding, followed by elements of matrix notation.

For the sake of simplicity, we specify an LMM in (2.1) for a hypothetical two-level longitudinal data set. In this specification, Yti represents the measure of the continuous response variable Y taken on the t-th occasion for the i-th subject:

$$
Y_{ti} = \underbrace{\beta_1 \times X^{(1)}_{ti} + \beta_2 \times X^{(2)}_{ti} + \beta_3 \times X^{(3)}_{ti} + \cdots + \beta_p \times X^{(p)}_{ti}}_{\text{fixed}}
\;+\; \underbrace{u_{1i} \times Z^{(1)}_{ti} + \cdots + u_{qi} \times Z^{(q)}_{ti} + \varepsilon_{ti}}_{\text{random}}
\tag{2.1}
$$

The value of t (t = 1, ..., ni) indexes the ni longitudinal observations of the dependent variable for a given subject, and i (i = 1, ..., m) indicates the i-th subject (unit of analysis). We assume that the model involves two sets of covariates, namely the X and Z covariates. The first set contains p covariates, X(1), ..., X(p), associated with the fixed effects β1, ..., βp. The second set contains q covariates, Z(1), ..., Z(q), associated with the random effects u1i, ..., uqi that are specific to subject i. The X and/or Z covariates may be continuous or indicator variables. The indices for the X and Z covariates are denoted by superscripts so that they do not interfere with the subscript indices, t and i, for the elements in the design matrices, Xi and Zi, presented in Subsection 2.2.2. (In Chapters 3 through 7, in which we analyze real data sets, the superscript notation for the covariates in (2.1) is replaced by actual variable names; e.g., for the Autism data in Chapter 6, X(1)ti might be replaced by AGEti, the t-th age at which child i is measured.)

For each X covariate, X(1), ..., X(p), the elements X(1)ti, ..., X(p)ti represent the t-th observed value of the corresponding covariate for the i-th subject. We assume that the p covariates may be either time-invariant characteristics of the individual subject (e.g., gender) or time-varying for each measurement (e.g., time of measurement, or weight at each time point). Each β parameter represents the fixed effect of a one-unit change in the corresponding X covariate on the mean value of the dependent variable, Y, assuming that the other covariates remain constant at some value. These β parameters are fixed effects that we wish to estimate, and their linear combination with the X covariates defines the fixed portion of the model.

The effects of the Z covariates on the response variable are represented in the random portion of the model by the q random effects, u1i, ..., uqi, associated with the i-th subject. In addition, εti represents the residual associated with the t-th observation on the i-th subject. The random effects and residuals in (2.1) are random variables, with values drawn from distributions that are defined in (2.3) and (2.4) in the next section using matrix notation. We assume that for a given subject, the residuals are independent of the random effects.

The individual observations for the i-th subject in (2.1) can be combined into vectors and matrices, and the LMM can be specified more efficiently using matrix notation, as shown in the next section. Specifying an LMM in matrix notation also simplifies the presentation of estimation and hypothesis tests in the context of LMMs.
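As a quick illustration of the structure in (2.1), the following sketch (not taken from the book) simulates data from a model with a fixed intercept, one fixed slope, and a single random effect per subject (a random intercept, so q = 1 and Z(1) is equal to 1 for every observation), and then recovers the parameters with lmer().

```r
set.seed(123)
m  <- 50                                   # number of subjects, i = 1, ..., m
ni <- 4                                    # observations per subject, t = 1, ..., ni
subject <- rep(1:m, each = ni)             # subject index i for each observation
x   <- rep(0:(ni - 1), times = m)          # a time-varying X covariate
u1  <- rnorm(m, mean = 0, sd = 2)          # random intercepts u_1i, one per subject
eps <- rnorm(m * ni, mean = 0, sd = 1)     # residuals epsilon_ti
y   <- 10 + 0.5 * x + u1[subject] + eps    # fixed intercept 10, fixed slope 0.5

simdat <- data.frame(y, x, subject = factor(subject))

library(lme4)
fit <- lmer(y ~ x + (1 | subject), data = simdat)
fixef(fit)     # estimated fixed effects (intercept and slope for x)
VarCorr(fit)   # estimated variance of the random intercepts and of the residuals
```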
2.2.2 General Matrix Specification

We now consider the general matrix specification of an LMM for a given subject i, obtained by stacking the formulas specified in Subsection 2.2.1 for the individual observations indexed by t into vectors and matrices:

$$
\mathbf{Y}_i = \underbrace{X_i \boldsymbol{\beta}}_{\text{fixed}} + \underbrace{Z_i \mathbf{u}_i + \boldsymbol{\varepsilon}_i}_{\text{random}}
\tag{2.2}
$$

$$
\mathbf{u}_i \sim N(\mathbf{0}, D), \qquad \boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, R_i)
$$

In (2.2), Yi represents a vector of continuous responses for the i-th subject. We present the elements of the Yi vector as follows, drawing on the notation used for an individual observation in (2.1):

$$
\mathbf{Y}_i = \begin{pmatrix} Y_{1i} \\ Y_{2i} \\ \vdots \\ Y_{n_i i} \end{pmatrix}
$$

Note that the number of elements, ni, in the vector Yi may vary from one subject to another.

The Xi in (2.2) is an ni × p design matrix, which represents the known values of the p covariates, X(1), ..., X(p), for each of the ni observations collected on the i-th subject:

$$
X_i = \begin{pmatrix}
X^{(1)}_{1i} & X^{(2)}_{1i} & \cdots & X^{(p)}_{1i} \\
X^{(1)}_{2i} & X^{(2)}_{2i} & \cdots & X^{(p)}_{2i} \\
\vdots & \vdots & \ddots & \vdots \\
X^{(1)}_{n_i i} & X^{(2)}_{n_i i} & \cdots & X^{(p)}_{n_i i}
\end{pmatrix}
$$

In a model including an intercept term, the first column would simply be equal to 1 for all observations. Note that all elements in a column of the Xi matrix corresponding to a time-invariant (or subject-specific) covariate will be the same. For ease of presentation, we assume that the Xi matrices are of full rank; that is, none of the columns (or rows) is a linear combination of the remaining ones. In general, Xi matrices may not be of full rank, and this may lead to an aliasing (or parameter identifiability) problem for the fixed effects stored in the vector β (see Subsection 2.9.3).

The β in (2.2) is a vector of p unknown regression coefficients (or fixed-effect parameters) associated with the p covariates used in constructing the Xi matrix:

$$
\boldsymbol{\beta} = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}
$$

The ni × q matrix Zi in (2.2) is a design matrix that represents the known values of the q covariates, Z(1), ..., Z(q), for the i-th subject. This matrix is very much like the Xi matrix in that it represents the observed values of covariates; however, it usually has fewer columns than the Xi matrix:

$$
Z_i = \begin{pmatrix}
Z^{(1)}_{1i} & Z^{(2)}_{1i} & \cdots & Z^{(q)}_{1i} \\
Z^{(1)}_{2i} & Z^{(2)}_{2i} & \cdots & Z^{(q)}_{2i} \\
\vdots & \vdots & \ddots & \vdots \\
Z^{(1)}_{n_i i} & Z^{(2)}_{n_i i} & \cdots & Z^{(q)}_{n_i i}
\end{pmatrix}
$$

The columns in the Zi matrix represent the observed values of the q predictor variables for the i-th subject that have effects on the continuous response variable that vary randomly across subjects. In many cases, predictors with effects that vary randomly across subjects are represented in both the Xi matrix and the Zi matrix. In an LMM in which only the intercepts are assumed to vary randomly from subject to subject, the Zi matrix would simply be a column of 1's.
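The sketch below (with made-up values and hypothetical variable names) shows how the Xi and Zi matrices for a single subject could be generated in R with model.matrix(), for a model with a fixed intercept, a fixed slope for age, a fixed effect of a time-invariant indicator, and a random intercept plus a random age slope.

```r
# Three measurement occasions for one hypothetical subject
subj.data <- data.frame(age    = c(2, 3, 5),   # time-varying covariate
                        female = c(1, 1, 1))   # time-invariant indicator

# X_i: columns for the intercept, age, and the time-invariant indicator (p = 3)
Xi <- model.matrix(~ age + female, data = subj.data)

# Z_i: columns for the random intercept and the random age slope (q = 2)
Zi <- model.matrix(~ age, data = subj.data)

Xi
Zi
```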
The ui vector for the i-th subject in (2.2) represents a vector of q random effects (defined in Subsection 2.1.2.4) associated with the q covariates in the Zi matrix:

$$
\mathbf{u}_i = \begin{pmatrix} u_{1i} \\ u_{2i} \\ \vdots \\ u_{qi} \end{pmatrix}
$$

Recall that by definition, random effects are random variables. We assume that the q random effects in the ui vector follow a multivariate normal distribution, with mean vector 0 and a variance-covariance matrix denoted by D:

$$
\mathbf{u}_i \sim N(\mathbf{0}, D)
\tag{2.3}
$$

Elements along the main diagonal of the D matrix represent the variances of each random effect in ui, and the off-diagonal elements represent the covariances between two corresponding random effects. Because there are q random effects in the model associated with the i-th subject, D is a q × q matrix that is symmetric and positive-definite (for more details on positive-definite matrices, interested readers can visit http://en.wikipedia.org/wiki/Positive-definite_matrix). The elements of this matrix are shown as follows:

$$
D = \operatorname{Var}(\mathbf{u}_i) = \begin{pmatrix}
\operatorname{Var}(u_{1i}) & \operatorname{cov}(u_{1i}, u_{2i}) & \cdots & \operatorname{cov}(u_{1i}, u_{qi}) \\
\operatorname{cov}(u_{1i}, u_{2i}) & \operatorname{Var}(u_{2i}) & \cdots & \operatorname{cov}(u_{2i}, u_{qi}) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{cov}(u_{1i}, u_{qi}) & \operatorname{cov}(u_{2i}, u_{qi}) & \cdots & \operatorname{Var}(u_{qi})
\end{pmatrix}
$$

The elements (variances and covariances) of the D matrix are defined as functions of a (usually) small set of covariance parameters stored in a vector denoted by θD. Note that the vector θD imposes structure (or constraints) on the elements of the D matrix. We discuss different structures for the D matrix in Subsection 2.2.2.1.

Finally, the εi vector in (2.2) is a vector of ni residuals, with each element in εi denoting the residual associated with an observed response at occasion t for the i-th subject. Because some subjects might have more observations collected than others (e.g., if data for one or more time points are not available when a subject drops out), the εi vectors may have different numbers of elements:

$$
\boldsymbol{\varepsilon}_i = \begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \\ \vdots \\ \varepsilon_{n_i i} \end{pmatrix}
$$

In contrast to the standard linear model, the residuals associated with repeated observations on the same subject in an LMM can be correlated. We assume that the ni residuals in the εi vector for a given subject, i, are random variables that follow a multivariate normal distribution with a mean vector 0 and a positive-definite symmetric variance-covariance matrix Ri:

$$
\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, R_i)
\tag{2.4}
$$

We also assume that residuals associated with different subjects are independent of each other. Further, we assume that the vectors of residuals, ε1, ..., εm, and random effects, u1, ..., um, are independent of each other. We represent the general form of the Ri matrix as shown below:

$$
R_i = \operatorname{Var}(\boldsymbol{\varepsilon}_i) = \begin{pmatrix}
\operatorname{Var}(\varepsilon_{1i}) & \operatorname{cov}(\varepsilon_{1i}, \varepsilon_{2i}) & \cdots & \operatorname{cov}(\varepsilon_{1i}, \varepsilon_{n_i i}) \\
\operatorname{cov}(\varepsilon_{1i}, \varepsilon_{2i}) & \operatorname{Var}(\varepsilon_{2i}) & \cdots & \operatorname{cov}(\varepsilon_{2i}, \varepsilon_{n_i i}) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{cov}(\varepsilon_{1i}, \varepsilon_{n_i i}) & \operatorname{cov}(\varepsilon_{2i}, \varepsilon_{n_i i}) & \cdots & \operatorname{Var}(\varepsilon_{n_i i})
\end{pmatrix}
$$

The elements (variances and covariances) of the Ri matrix are defined as functions of another (usually) small set of covariance parameters stored in a vector denoted by θR. Many different covariance structures are possible for the Ri matrix; we discuss some of these structures in Subsection 2.2.2.2.
To complete our notation for the LMM, we introduce the vector θ, used in subsequent sections, which combines all covariance parameters contained in the vectors θD and θR.

2.2.2.1 Covariance Structures for the D Matrix

We consider different covariance structures for the D matrix in this subsection. A D matrix with no additional constraints on the values of its elements (aside from positive-definiteness and symmetry) is referred to as an unstructured (or general) D matrix. This structure is often used for random coefficient models (discussed in Chapter 6). The symmetry of the q × q matrix D implies that the θD vector has q × (q + 1)/2 parameters. The following matrix is an example of an unstructured D matrix, in the case of an LMM having two random effects associated with the i-th subject:

$$
D = \operatorname{Var}(\mathbf{u}_i) = \begin{pmatrix} \sigma^2_{u1} & \sigma_{u1,u2} \\ \sigma_{u1,u2} & \sigma^2_{u2} \end{pmatrix}
$$

In this case, the vector θD contains three covariance parameters:

$$
\boldsymbol{\theta}_D = \begin{pmatrix} \sigma^2_{u1} \\ \sigma_{u1,u2} \\ \sigma^2_{u2} \end{pmatrix}
$$

We also define other, more parsimonious structures for D by imposing certain constraints on its elements. A very commonly used structure is the variance components (or diagonal) structure, in which each random effect in ui has its own variance, and all covariances in D are defined to be zero. In general, the θD vector for the variance components structure requires q covariance parameters, defining the variances on the diagonal of the D matrix. For example, in an LMM having two random effects associated with the i-th subject, a variance components D matrix has the following form:

$$
D = \operatorname{Var}(\mathbf{u}_i) = \begin{pmatrix} \sigma^2_{u1} & 0 \\ 0 & \sigma^2_{u2} \end{pmatrix}
$$

In this case, the vector θD contains two parameters:

$$
\boldsymbol{\theta}_D = \begin{pmatrix} \sigma^2_{u1} \\ \sigma^2_{u2} \end{pmatrix}
$$

The unstructured and variance components structures for the D matrix are the most commonly used in practice, although other structures are available in some software procedures. For example, the parameters representing the variances and covariances of the random effects in the vector θD could be allowed to vary across different subgroups of cases (e.g., males and females in a longitudinal study), if greater between-subject variance in selected effects were expected in one subgroup compared to another (e.g., males have more variability in their intercepts); see Subsection 2.2.2.3. We discuss the structure of the D matrices for specific models in the example chapters.
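In the lmer() function, for example, these two structures for D correspond to different ways of writing the random-effects term. The sketch below assumes a hypothetical data frame autism with variables vsae (the outcome), age, and childid; it is a sketch of the syntax, not the book's Chapter 6 code.

```r
library(lme4)

# Unstructured D: variances of both random effects plus their covariance
# (3 covariance parameters)
fit.un <- lmer(vsae ~ age + (1 + age | childid), data = autism)

# Variance components (diagonal) D: the covariance between the random intercept
# and the random age slope is constrained to zero (2 covariance parameters)
fit.vc <- lmer(vsae ~ age + (1 | childid) + (0 + age | childid), data = autism)

VarCorr(fit.un)   # displays the estimated elements of the D matrix
```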
2.2.2.2 Covariance Structures for the Ri Matrix

In this section, we discuss some of the more commonly used covariance structures for the Ri matrix. The simplest covariance matrix for Ri is the diagonal structure, in which the residuals associated with observations on the same subject are assumed to be uncorrelated and to have equal variance. The diagonal Ri matrix for each subject i has the following structure:

$$
R_i = \operatorname{Var}(\boldsymbol{\varepsilon}_i) = \sigma^2 I_{n_i} = \begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}
$$

The diagonal structure requires one parameter in θR, which defines the constant variance at each time point:

$$
\boldsymbol{\theta}_R = (\sigma^2)
$$

All software procedures that we discuss use the diagonal structure as the default structure for the Ri matrix.

The compound symmetry structure is frequently used for the Ri matrix. The general form of this structure for each subject i is as follows:

$$
R_i = \operatorname{Var}(\boldsymbol{\varepsilon}_i) = \begin{pmatrix}
\sigma^2 + \sigma_1 & \sigma_1 & \cdots & \sigma_1 \\
\sigma_1 & \sigma^2 + \sigma_1 & \cdots & \sigma_1 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_1 & \sigma_1 & \cdots & \sigma^2 + \sigma_1
\end{pmatrix}
$$

In the compound symmetry covariance structure, there are two parameters in the θR vector that define the variances and covariances in the Ri matrix:

$$
\boldsymbol{\theta}_R = \begin{pmatrix} \sigma^2 \\ \sigma_1 \end{pmatrix}
$$

Note that the ni residuals associated with the observed response values for the i-th subject are assumed to have a constant covariance, σ1, and a constant variance, σ² + σ1, in the compound symmetry structure. This structure is often used when an assumption of equal correlation of residuals is plausible (e.g., repeated trials under the same condition in an experiment).

The first-order autoregressive structure, denoted by AR(1), is another commonly used covariance structure for the Ri matrix. The general form of the Ri matrix for this covariance structure is as follows:

$$
R_i = \operatorname{Var}(\boldsymbol{\varepsilon}_i) = \begin{pmatrix}
\sigma^2 & \sigma^2 \rho & \cdots & \sigma^2 \rho^{\,n_i - 1} \\
\sigma^2 \rho & \sigma^2 & \cdots & \sigma^2 \rho^{\,n_i - 2} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma^2 \rho^{\,n_i - 1} & \sigma^2 \rho^{\,n_i - 2} & \cdots & \sigma^2
\end{pmatrix}
$$

The AR(1) structure has only two parameters in the θR vector that define all the variances and covariances in the Ri matrix: a variance parameter, σ², and a correlation parameter, ρ:

$$
\boldsymbol{\theta}_R = \begin{pmatrix} \sigma^2 \\ \rho \end{pmatrix}
$$

Note that σ² must be positive, whereas ρ can range from −1 to 1. In the AR(1) covariance structure, the variance of the residuals, σ², is assumed to be constant, and the covariance of residuals of observations that are w units apart is assumed to be equal to σ²ρ^w. This means that all adjacent residuals (i.e., the residuals associated with observations next to each other in a sequence of longitudinal observations for a given subject) have a covariance of σ²ρ, residuals associated with observations two units apart in the sequence have a covariance of σ²ρ², and so on. The AR(1) structure is often used to fit models to data sets with equally spaced longitudinal observations on the same units of analysis. This structure implies that observations closer to each other in time exhibit higher correlation than observations farther apart in time.

Other covariance structures, such as the Toeplitz structure, allow more flexibility in the correlations, but at the expense of using more covariance parameters in the θR vector. In any given analysis, we try to determine the structure for the Ri matrix that seems most appropriate and parsimonious, given the observed data and knowledge about the relationships between observations on an individual subject.

2.2.2.3 Group-Specific Covariance Parameter Values for the D and Ri Matrices

The D and Ri covariance matrices can also be specified to allow heterogeneous variances for different groups of subjects (e.g., males and females). Specifically, we might assume the same structures for the matrices in different groups, but with different values for the covariance parameters in the θD and θR vectors. Examples of heterogeneous Ri matrices defined for different groups of subjects and observations are given in Chapter 3, Chapter 5, and Chapter 7. We do not consider examples of heterogeneity in the D matrix. For a recently published example of this type of heterogeneity (many exist in the literature), interested readers can refer to West & Elliott (Forthcoming in 2014).
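To make the Ri structures just described concrete, the sketch below shows one way they can be requested in R with the lme() function from the nlme package (the lmer() function in lme4 uses the default diagonal Ri). The data frame dat and the variable names (y, time, subject) are hypothetical, and time is assumed to be an integer-valued occasion index.

```r
library(nlme)

# Compound symmetry for the residuals of each subject
fit.cs <- lme(y ~ time, random = ~ 1 | subject, data = dat,
              correlation = corCompSymm(form = ~ 1 | subject))

# First-order autoregressive, AR(1), residual correlation for equally spaced
# measurement occasions
fit.ar1 <- lme(y ~ time, random = ~ 1 | subject, data = dat,
               correlation = corAR1(form = ~ time | subject))
```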
An alternative specification, based on all subjects under study, is presented in (2.5): 22 Linear Mixed Models: A Practical Guide Using Statistical Software Y = Xβ + Zu + ε fixed random (2.5) u ∼ N (0, G) ε ∼ N (0, R) In (2.5), the n × 1 vector Y, where n = i ni , is the result of “stacking” the Yi vectors for all subjects vertically. The n ×p design matrix X is obtained by stacking all Xi matrices vertically as well. In two-level models or models with nested random effects, the Z matrix is a block-diagonal matrix, with blocks on the diagonal defined by the Zi matrices. The u vector stacks all ui vectors vertically, and the vector ε stacks all εi vectors vertically. The G matrix is a block-diagonal matrix representing the variance-covariance matrix for all random effects (not just those associated with a single subject i), with blocks on the diagonal defined by the D matrix. The n × n matrix R is a block-diagonal matrix representing the variance-covariance matrix for all residuals, with blocks on the diagonal defined by the Ri matrices. This “all subjects” specification is used in the documentation for SAS proc mixed and the MIXED command in SPSS, but we primarily refer to the D and Ri matrices for a single subject (or cluster) throughout the book. 2.2.4 Hierarchical Linear Model (HLM) Specification of the LMM It is often convenient to specify an LMM in terms of an explicitly defined hierarchy of simpler models, which correspond to the levels of a clustered or longitudinal data set. When LMMs are specified in such a way, they are often referred to as hierarchical linear models (HLMs), or multilevel models (MLMs). The HLM software is the only program discussed in this book that requires LMMs to be specified in a hierarchical manner. The HLM specification of an LMM is equivalent to the general LMM specification introduced in Subsection 2.2.2, and may be implemented for any LMM. We do not present a general form for the HLM specification of LMMs here, but rather introduce examples of the HLM specification in Chapters 3 through 8. The levels of the example data sets considered in the HLM specification of models for these data sets are displayed in Table 2.2. 2.3 The Marginal Linear Model In Section 2.2, we specified the general LMM. In this section, we specify a closely related marginal linear model. The key difference between the two models lies in the presence or absence of random effects. Specifically, random effects are explicitly used in LMMs to explain the between-subject or between-cluster variation, but they are not used in the specification of marginal models. This difference implies that the LMM allows for subjectspecific inference, whereas the marginal model does not. For the same reason, LMMs are often referred to as subject-specific models, and marginal models are called populationaveraged models. In Subsection 2.3.1, we specify the marginal model in general, and in Subsection 2.3.2, we present the marginal model implied by an LMM. Linear Mixed Models: An Overview 2.3.1 23 Specification of the Marginal Model The general matrix specification of the marginal model for subject i is Yi = Xi β + ε∗i where (2.6) ε∗i ∼ N (0, Vi∗ ) In (2.6), the ni ×p design matrix Xi is constructed the same way as in an LMM. Similarly, β is a vector of fixed effects. The vector ε∗i represents a vector of marginal residual errors. Elements in the ni × ni marginal variance-covariance matrix Vi∗ are usually defined by a small set of covariance parameters, which we denote as θ ∗ . 
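As a concrete illustration of the structures described in Subsection 2.2.2.2 (which can serve for either the Ri matrix or Vi*), the short R sketch below builds a compound symmetry matrix and an AR(1) matrix for a subject with ni = 4 observations; the parameter values are arbitrary and chosen purely for illustration.

```r
# Illustrative covariance parameter values (not taken from any example data set)
sigma2 <- 2.0   # residual variance
sigma1 <- 0.5   # common covariance for the compound symmetry structure
rho    <- 0.4   # AR(1) correlation parameter
n_i    <- 4     # number of observations for subject i

# Compound symmetry: constant variance sigma2 + sigma1, constant covariance sigma1
R_cs <- matrix(sigma1, nrow = n_i, ncol = n_i)
diag(R_cs) <- sigma2 + sigma1

# AR(1): constant variance sigma2, covariance sigma2 * rho^|j - k| for observations j and k
R_ar1 <- sigma2 * rho^abs(outer(1:n_i, 1:n_i, "-"))

R_cs
R_ar1
```

Either matrix requires only the small set of parameters in its theta vector, no matter how large ni becomes.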
All structures used for the Ri matrix in LMMs (described in Subsection 2.2.2.2) can be used to specify a structure for Vi∗ . Other structures for Vi∗ , such as those shown in Subsection 2.3.2, are also allowed. Note that the entire random part of the marginal model is described in terms of the marginal residuals ε∗i only. In contrast to the LMM, the marginal model does not involve the random effects, ui , so inferences cannot be made about them and consequently this model is not a mixed model. Software Note: Several software procedures designed for fitting LMMs, including the procedures in SAS, SPSS, R, and Stata, also allow users to specify a marginal model directly. The most natural way to specify selected marginal models in these procedures is to make sure that random effects are not included in the model, and then specify an appropriate covariance structure for the Ri matrix, which in the context of the marginal model will be used for Vi∗ . A marginal model of this form is not an LMM, because no random effects are included in the model. This type of model cannot be specified using the HLM software, because HLM generally requires the specification of at least one set of random effects (e.g., a random intercept). Examples of fitting a marginal model by omitting random effects and using an appropriate Ri matrix are given in alternative analyses of the Rat Brain data at the end of Chapter 5, and the Autism data at the end of Chapter 6. 2.3.2 The Marginal Model Implied by an LMM The LMM introduced in (2.2) implies the following marginal linear model: Yi = Xi β + ε∗i where (2.7) ε∗i ∼ N (0, Vi ) and the variance-covariance matrix, Vi , is defined as Vi = Zi DZi + Ri A few observations are in order. First, the implied marginal model is an example of the marginal model defined in Subsection 2.3.1. Second, the LMM in (2.2) and the corresponding implied marginal model in (2.7) involve the same set of covariance parameters θ (i.e., the θ D and θ R vectors combined). The important difference is that there are more restrictions imposed on the covariance parameter space in the LMM than in the implied marginal model. In general, the D and Ri matrices in LMMs have to be positive-definite, whereas the only 24 Linear Mixed Models: A Practical Guide Using Statistical Software requirement in the implied marginal model is that the Vi matrix be positive-definite. Third, interpretation of the covariance parameters in a marginal model is different from that in an LMM, because inferences about random effects are no longer valid. The concept of the implied marginal model is important for at least two reasons. First, estimation of fixed-effect and covariance parameters in the LMM (see Subsection 2.4.1.2) is carried out in the framework of the implied marginal model. Second, in the case in which a software procedure produces a nonpositive-definite (i.e., invalid) estimate of the D matrix in an LMM, we may be able to fit the implied marginal model, which has fewer restrictions. Consequently, we may be able to diagnose problems with nonpositive-definiteness of the D matrix or, even better, we may be able to answer some relevant research questions in the context of the implied marginal model. 
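The following R sketch (using the nlme package with a small simulated data set; all variable names and values are hypothetical) illustrates the two ideas just described: specifying a marginal model directly by omitting random effects and choosing a residual covariance structure, and computing the marginal matrix Vi = Zi D Zi' + Ri implied by a random-intercept LMM.

```r
library(nlme)

# Hypothetical long-format data: response y, covariate time, cluster identifier id
set.seed(1)
df <- data.frame(id   = factor(rep(1:20, each = 4)),
                 time = rep(1:4, times = 20))
df$y <- 1 + 0.5 * df$time + rep(rnorm(20), each = 4) + rnorm(80)

# (a) A marginal model specified directly: no random effects, with a
#     compound symmetry structure for the within-cluster covariance
marg <- gls(y ~ time, data = df,
            correlation = corCompSymm(form = ~ 1 | id))

# (b) A random-intercept LMM and its implied marginal matrix Vi = Zi D Zi' + Ri
lmm <- lme(y ~ time, random = ~ 1 | id, data = df)
d11 <- as.numeric(VarCorr(lmm)["(Intercept)", "Variance"])  # estimated random-intercept variance
Zi  <- matrix(1, nrow = 4, ncol = 1)     # random-intercept design matrix for one cluster
Ri  <- lmm$sigma^2 * diag(4)             # estimated diagonal Ri (constant residual variance)
Vi  <- Zi %*% matrix(d11, 1, 1) %*% t(Zi) + Ri
Vi                                        # has the compound symmetry form
```

The computed Vi makes the point discussed in the Software Note concrete: the marginal covariance implied by a random-intercept LMM has the compound symmetry form, so the marginal model in (a) can reproduce it without any random effects.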
The implied marginal model defines the marginal distribution of the Y i vector: Y i ∼ N (Xi β, Zi DZi + Ri ) (2.8) The marginal mean (or expected value) and the marginal variance-covariance matrix of the vector Y i are equal to E(Y i ) = Xi β (2.9) and V ar(Y i ) = Vi = Zi DZi + Ri The off-diagonal elements in the ni × ni matrix Vi represent the marginal covariances of the Y i vector. These covariances are in general different from zero, which means that in the case of a longitudinal data set, repeated observations on a given individual i are correlated. We present an example of calculating the Vi matrix for the marginal model implied by an LMM fitted to the Rat Brain data (Chapter 5) in Appendix B. The marginal distribution specified in (2.8), with mean and variance defined in (2.9), is a focal point of the likelihood estimation in LMMs outlined in the next section. Software Note: The software discussed in this book is primarily designed to fit LMMs. In some cases, we may be interested in fitting the marginal model implied by a given LMM using this software: 1. For some fairly simple LMMs, it is possible to specify the implied marginal model directly using the software procedures in SAS, SPSS, R, and Stata, as described in Subsection 2.3.1. As an example, consider an LMM with random intercepts and constant residual variance. The Vi matrix for the marginal model implied by this LMM has a compound symmetry structure (see Appendix B), which can be specified by omitting the random intercepts from the model and choosing a compound symmetry structure for the Ri matrix. 2. Another very general method available in the LMM software procedures is to “emulate” fitting the implied marginal model by fitting the LMM itself. By emulation, we mean using the same syntax as for an LMM, i.e., including specification of random effects, but interpreting estimates and other results as if they were obtained for the marginal model. In this approach, we simply take advantage of the fact that estimation of the LMM and of the implied marginal model are performed using the same algorithm (see Section 2.4). Linear Mixed Models: An Overview 25 3. Note that the general emulation approach outlined in item 2 has some limitations related to less restrictive constraints in the implied marginal model compared to LMMs. In most software procedures that fit LMMs, it is difficult to relax the positive-definiteness constraints on the D and Ri matrices as required by the implied marginal model. The nobound option in SAS proc mixed is the only exception among the software procedures discussed in this book that allows users to remove the positive-definiteness constraints on the D and Ri matrices and allows user-defined constraints to be imposed on the covariance parameters in the θ D and θR vectors. An example of using the nobound option to relax the constraints on covariance parameters applicable to the fitted linear mixed model is given in Subsection 6.4.1. 2.4 Estimation in LMMs In the LMM, we estimate the fixed-effect parameters, β, and the covariance parameters, θ (i.e., θ D and θ R for the D and Ri matrices, respectively). In this section, we discuss maximum likelihood (ML) and restricted maximum likelihood (REML) estimation, which are methods commonly used to estimate these parameters. 2.4.1 Maximum Likelihood (ML) Estimation In general, maximum likelihood (ML) estimation is a method of obtaining estimates of unknown parameters by optimizing a likelihood function. 
To apply ML estimation, we first construct the likelihood as a function of the parameters in the specified model, based on distributional assumptions. The maximum likelihood estimates (MLEs) of the parameters are the values of the arguments that maximize the likelihood function (i.e., the values of the parameters that make the observed values of the dependent variable most likely, given the distributional assumptions). See Casella & Berger (2002) for an in-depth discussion of ML estimation.

In the context of the LMM, we construct the likelihood function of β and θ by referring to the marginal distribution of the dependent variable Yi defined in (2.8). The corresponding multivariate normal probability density function, f(Yi | β, θ), is

$$f(Y_i \mid \beta, \theta) = (2\pi)^{-n_i/2}\,\det(V_i)^{-1/2}\exp\bigl(-0.5 \times (Y_i - X_i\beta)'\,V_i^{-1}(Y_i - X_i\beta)\bigr) \quad (2.10)$$

where det refers to the determinant. Recall that the elements of the Vi matrix are functions of the covariance parameters in θ.

Based on the probability density function (pdf) defined in (2.10), and given the observed data Yi = yi, the likelihood function contribution for the i-th subject is defined as follows:

$$L_i(\beta, \theta; y_i) = (2\pi)^{-n_i/2}\,\det(V_i)^{-1/2}\exp\bigl(-0.5 \times (y_i - X_i\beta)'\,V_i^{-1}(y_i - X_i\beta)\bigr) \quad (2.11)$$

We write the likelihood function, L(β, θ), as the product of the m independent contributions defined in (2.11) for the individuals (i = 1, ..., m):

$$L(\beta, \theta) = \prod_i L_i(\beta, \theta) = \prod_i (2\pi)^{-n_i/2}\,\det(V_i)^{-1/2}\exp\bigl(-0.5 \times (y_i - X_i\beta)'\,V_i^{-1}(y_i - X_i\beta)\bigr) \quad (2.12)$$

The corresponding log-likelihood function, ℓ(β, θ), is defined using (2.12) as

$$\ell(\beta, \theta) = \ln L(\beta, \theta) = -0.5\,n \times \ln(2\pi) - 0.5 \times \sum_i \ln(\det(V_i)) - 0.5 \times \sum_i (y_i - X_i\beta)'\,V_i^{-1}(y_i - X_i\beta) \quad (2.13)$$

where n (= Σi ni) is the number of observations (rows) in the data set, and "ln" refers to the natural logarithm. Although it is often possible to find estimates of β and θ simultaneously, by optimization of ℓ(β, θ) with respect to both β and θ, many computational algorithms simplify the optimization by profiling out the β parameters from ℓ(β, θ), as shown in Subsections 2.4.1.1 and 2.4.1.2.

2.4.1.1 Special Case: Assume θ Is Known

In this section, we consider a special case of ML estimation for LMMs, in which we assume that θ, and as a result the matrix Vi, are known. Although this situation does not occur in practice, it has important computational implications, so we present it separately. Because we assume that θ is known, the only parameters that we estimate are the fixed effects, β. The log-likelihood function, ℓ(β, θ), thus becomes a function of β only, and its optimization is equivalent to finding a minimum of an objective function q(β), defined by the last term in (2.13):

$$q(\beta) = 0.5 \times \sum_i (y_i - X_i\beta)'\,V_i^{-1}(y_i - X_i\beta) \quad (2.14)$$

The function in (2.14) looks very much like the matrix formula for the sum of squared errors that is minimized in the standard linear model, but with the addition of the nondiagonal "weighting" matrix Vi⁻¹. Note that optimization of q(β) with respect to β can be carried out by applying the method of generalized least squares (GLS). The optimal value of β can be obtained analytically:

$$\hat{\beta} = \left(\sum_i X_i'\,V_i^{-1}X_i\right)^{-1}\sum_i X_i'\,V_i^{-1}y_i \quad (2.15)$$

The estimate β̂ has the desirable statistical property of being the best linear unbiased estimator (BLUE) of β. The closed-form formula in (2.15) also defines a functional relationship between the covariance parameters, θ, and the value of β that maximizes ℓ(β, θ).
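As a small numerical illustration of the GLS formula in (2.15), the R sketch below assumes a known Vi (here an arbitrary compound symmetry matrix) and simulated data; all values and variable names are hypothetical and serve only to show the calculation.

```r
set.seed(2)
m      <- 10                              # number of subjects
n_i    <- 4                               # observations per subject
Vi     <- 1.5 * diag(n_i) + 0.5           # assumed-known Vi (compound symmetry form)
Vi_inv <- solve(Vi)

A <- matrix(0, 2, 2)                      # accumulates the sum of Xi' Vi^{-1} Xi
b <- matrix(0, 2, 1)                      # accumulates the sum of Xi' Vi^{-1} yi
for (i in 1:m) {
  Xi <- cbind(1, 1:n_i)                   # intercept and a time covariate
  yi <- Xi %*% c(1, 0.5) + t(chol(Vi)) %*% rnorm(n_i)   # simulated response with covariance Vi
  A  <- A + t(Xi) %*% Vi_inv %*% Xi
  b  <- b + t(Xi) %*% Vi_inv %*% yi
}
beta_hat <- solve(A, b)                   # the GLS estimate in (2.15)
beta_hat
```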
We use this relationship in the next section to profile out the fixed-effect parameters, β, from the log-likelihood, and make it strictly a function of θ. 2.4.1.2 General Case: Assume θ Is Unknown In this section, we consider ML estimation of the covariance parameters, θ, and the fixed effects, β, assuming θ is unknown. Linear Mixed Models: An Overview 27 First, to obtain estimates for the covariance parameters in θ, we construct a profile loglikelihood function ML (θ). The function ML (θ) is derived from (β, θ) by replacing the in (2.15). The resulting function is β parameters with the expression defining β ML (θ) = −0.5n × ln(2π) − 0.5 × ln(det(Vi )) − 0.5 × i ⎛ where ri = yi − Xi ⎝ ri Vi−1 ri (2.16) i −1 Xi Vi−1 Xi i ⎞ Xi Vi−1 yi ⎠ (2.17) i In general, maximization of ML (θ), as shown in (2.16), with respect to θ is an example of a nonlinear optimization, with inequality constraints imposed on θ so that positivedefiniteness requirements on the D and Ri matrices are satisfied. There is no closed-form solution for the optimal θ, so the estimate of θ is obtained by performing computational iterations until convergence (see Subsection 2.5.1). After the ML estimates of the covariance parameters in θ (and consequently, estimates of the variances and covariances in D and Ri ) are obtained through an iterative computational This can be done without an iterative process, using process, we are ready to calculate β. (2.18) and (2.19). First, we replace the D and Ri matrices in (2.9) by their ML estimates, and R i , to calculate Vi , an estimate of Vi : D +R i Vi = Zi DZ i (2.18) with Vi replaced by Then, we use the generalized least-squares formula, (2.15), for β, its estimate defined in (2.18) to obtain β: β = −1 Xi Vi−1 Xi i Xi Vi−1 yi (2.19) i is the empirical best linear Because we replaced Vi by its estimate, Vi , we say that β unbiased estimator (EBLUE) of β. var(β), is a p × p variance-covariance matrix calculated as follows: The variance of β, = var(β) −1 Xi Vi−1 Xi (2.20) i in Subsection 2.4.3, because they We discuss issues related to the estimates of var(β) apply to both ML and REML estimation. The ML estimates of θ are biased because they do not take into account the loss of degrees of freedom that results from estimating the fixed-effect parameters in β (see Verbeke & Molenberghs (2000) for a discussion of the bias in ML estimates of θ in the context of LMMs). An alternative form of the maximum likelihood method known as REML estimation is frequently used to eliminate the bias in the ML estimates of the covariance parameters. We discuss REML estimation in Subsection 2.4.2. 2.4.2 REML Estimation REML estimation is an alternative way of estimating the covariance parameters in θ. REML estimation (sometimes called residual maximum likelihood estimation) was introduced in the early 1970s by Patterson & Thompson (1971) as a method of estimating 28 Linear Mixed Models: A Practical Guide Using Statistical Software variance components in the context of unbalanced incomplete block designs. Alternative and more general derivations of REML are given by Harville (1977), Cooper & Thompson (1977), and Verbyla (1990). REML is often preferred to ML estimation, because it produces unbiased estimates of covariance parameters by taking into account the loss of degrees of freedom that results from estimating the fixed effects in β. 
The REML estimates of θ are based on optimization of the following REML loglikelihood function: REML (θ) = −0.5 × (n − p) × ln(2π) − 0.5 × −0.5 × ri Vi−1 ri − 0.5 × i ln(det(Vi )) i ln(det(Xi Vi−1 Xi )) (2.21) i In the function shown in (2.21), ri is defined as in (2.17). Once an estimate, Vi , of the Vi matrix has been obtained, REML-based estimates of the fixed-effect parameters, β, and var(β) can be computed. In contrast to ML estimation, the REML method does not provide a formula for the estimates. Instead, we use (2.18) and (2.19) from ML estimation to estimate the fixed-effect parameters and their standard errors. Although we use the same formulas in (2.18) and (2.19) for REML and ML estimation of and corresponding the fixed-effect parameters, it is important to note that the resulting β from REML and ML estimation are different, because the Vi matrix is different in var(β) each case. 2.4.3 REML vs. ML Estimation In general terms, we use maximum likelihood methods (either REML or ML estimation) to obtain estimates of the covariance parameters in θ in an LMM. We then obtain estimates of the fixed-effect parameters in β using results from generalized least squares. However, ML estimates of the covariance parameters are biased, whereas REML estimates are not. When used to estimate the covariance parameters in θ, ML and REML estimation are computationally intensive; both involve the optimization of some objective function, which generally requires starting values for the parameter estimates and several subsequent iterations to find the values of the parameters that maximize the likelihood function (iterative methods for optimizing the likelihood function are discussed in Subsection 2.5.1). Statistical software procedures capable of fitting LMMs often provide a choice of either REML or ML as an estimation method, with the default usually being REML. Table 2.4 provides information on the estimation methods available in the software procedures discussed in this book. Note that the variances of the estimated fixed effects, i.e., the diagonal elements in as presented in (2.20), are biased downward in both ML and REML estimation, var(β) because they do not take into account the uncertainty introduced by replacing Vi with Vi are also in (2.15). Consequently, the standard errors of the estimated fixed effects, se(β), biased downward. In the case of ML estimation, this bias is compounded by the bias in the estimation of θ and hence in the elements of Vi . To take this bias into account, approximate degrees of freedom are estimated for the t-tests or F -tests that are used for hypothesis tests about the fixed-effect parameters (see Subsection 2.6.3.1). Kenward & Roger (1997) proposed an adjustment to account for the extra variability in using Vi as an estimator of Vi , which has been implemented in SAS proc mixed. Linear Mixed Models: An Overview 29 The estimated variances of the estimated fixed-effect parameters contained in var(β) depend on how close Vi is to the “true” value of Vi . To get the best possible estimate of Vi in practice, we often use REML estimation to fit LMMs with different structures for the D and Ri matrices and use model selection tools (discussed in Section 2.6) to find the best estimate for Vi . We illustrate the selection of appropriate structures for the D and Ri variance-covariance matrices in detail for the LMMs that we fit in the example chapters. 
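As a brief illustration of the difference between the two methods in practice, the following R sketch fits the same random-intercept model by REML and by ML using the lmer() function in the lme4 package; the data are simulated and purely illustrative.

```r
library(lme4)

# Hypothetical long-format data with a random intercept for each cluster
set.seed(3)
df <- data.frame(id = factor(rep(1:30, each = 5)), time = rep(0:4, times = 30))
df$y <- 2 + 0.3 * df$time + rep(rnorm(30, sd = 1), each = 5) + rnorm(150, sd = 0.8)

fit_reml <- lmer(y ~ time + (1 | id), data = df, REML = TRUE)   # REML (the default)
fit_ml   <- lmer(y ~ time + (1 | id), data = df, REML = FALSE)  # ML

# The fixed-effect estimates are typically very close, while the ML estimates
# of the variance components are smaller (biased downward) relative to REML
VarCorr(fit_reml)
VarCorr(fit_ml)
fixef(fit_reml)
fixef(fit_ml)
```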
Although we dealt with estimation in the LMM in this section, a very similar algorithm can be applied to the estimation of fixed effects and covariance parameters in the marginal model specified in Section 2.3. 2.5 2.5.1 Computational Issues Algorithms for Likelihood Function Optimization Having defined the ML and REML estimation methods, we briefly introduce the computational algorithms used to carry out the estimation for an LMM. The key computational difficulty in the analysis of LMMs is estimation of the covariance parameters, using iterative numerical optimization of the log-likelihood functions introduced in Subsection 2.4.1.2 for ML estimation and in Subsection 2.4.2 for REML estimation, subject to constraints imposed on the parameters to ensure positive-definiteness of the D and Ri matrices. The most common iterative algorithms used for this optimization problem in the context of LMMs are the expectation-maximization (EM) algorithm, the Newton–Raphson (N–R) algorithm (the preferred method), and the Fisher scoring algorithm. The EM algorithm is often used to maximize complicated likelihood functions or to find good starting values of the parameters to be used in other algorithms (this latter approach is currently used by the procedures in R, Stata, and HLM, as shown in Table 2.4). General descriptions of the EM algorithm, which alternates between expectation (E) and maximization (M) steps, can be found in Dempster et al. (1977) and Laird et al. (1987). For “incomplete” data sets arising from studies with unbalanced designs, the E-step involves, at least conceptually, creation of a “complete” data set based on a hypothetical scenario, in which we assume that data have been obtained from a balanced design and there are no missing observations for the dependent variable. In the context of the LMM, the complete data set is obtained by augmenting observed values of the dependent variable with expected values of the sum of squares and sum of products of the unobserved random effects and residuals. The complete data are obtained using the information available at the current iteration of the algorithm, i.e., the current values of the covariance parameter estimates and the observed values of the dependent variable. Based on the complete data, an objective function called the complete data log-likelihood function is constructed and maximized in the M-step, so that the vector of estimated θ parameters is updated at each iteration. The underlying assumption behind the EM algorithm is that optimization of the complete data log-likelihood function is simpler than optimization of the likelihood based on the observed data. The main drawback of the EM algorithm is its slow rate of convergence. In addition, the precision of estimators derived from the EM algorithm is overly optimistic, because the estimators are based on the likelihood from the last maximization step, which uses complete data instead of observed data. Although some solutions have been proposed to overcome 30 Linear Mixed Models: A Practical Guide Using Statistical Software these shortcomings, the EM algorithm is rarely used to fit LMMs, except to provide starting values for other algorithms. The N–R algorithm and its variations are the most commonly used algorithms in ML and REML estimation of LMMs. The N–R algorithm minimizes an objective function defined as –2 times the log-likelihood function for the covariance parameters specified in Subsection 2.4.1.2 for ML estimation or in Subsection 2.4.2 for REML estimation. 
At every iteration, the N–R algorithm requires calculation of the vector of partial derivatives (the gradient), and the second derivative matrix with respect to the covariance parameters (the observed Hessian matrix). Analytical formulas for these matrices are given in Jennrich & Schluchter (1986) and Lindstrom & Bates (1988). Owing to Hessian matrix calculations, N–R iterations are more time consuming, but convergence is usually achieved in fewer iterations than when using the EM algorithm. Another advantage of using the N–R algorithm is that the Hessian matrix from the last iteration can be used to obtain an asymptotic variance-covariance matrix for the estimated covariance parameters in θ, allowing for calculation of standard errors of θ. The Fisher scoring algorithm can be considered as a modification of the N–R algorithm. The primary difference is that Fisher scoring uses the expected Hessian matrix rather than the observed one. Although Fisher scoring is often more stable numerically, more likely to converge, and calculations performed at each iteration are simplified compared to the N–R algorithm, Fisher scoring is not recommended to obtain final estimates. The primary disadvantage of the Fisher scoring algorithm, as pointed out by Little & Rubin (2002), is that it may be difficult to determine the expected value of the Hessian matrix because of difficulties with identifying the appropriate sampling distribution. To avoid problems with determining the expected Hessian matrix, use of the N–R algorithm instead of the Fisher scoring algorithm is recommended. To initiate optimization of the N–R algorithm, a sensible choice of starting values for the covariance parameters is needed. One method for choosing starting values is to use a noniterative method based on method-of-moment estimators (Rao, 1972). Alternatively, a small number of EM iterations can be performed to obtain starting values. In other cases, initial values may be assigned explicitly by the analyst. The optimization algorithms used to implement ML and REML estimation need to ensure that the estimates of the D and Ri matrices are positive-definite. In general, it is preferable to ensure that estimates of the covariance parameters in θ, updated from one iteration of an optimization algorithm to the next, imply positive-definiteness of D and Ri at every step of the estimation process. Unfortunately, it is difficult to meet these requirements, so software procedures set much simpler conditions that are necessary, but not sufficient, to meet positive-definiteness constraints. Specifically, it is much simpler to ensure that elements on the diagonal of the estimated D and Ri matrices are greater than zero during the entire iteration process, and this method is often used by software procedures in practice. At the last iteration, estimates of the D and Ri matrices are checked for being positive-definite, and a warning message is issued if the positive-definiteness constraints are not satisfied. See Subsection 6.4.1 for a discussion of a nonpositive-definite D matrix (called the G matrix in SAS), in the analysis of the Autism data using proc mixed in SAS. An alternative way to address positive-definiteness constraints is to apply a log-Cholesky decomposition (or other transformations) to the D and/or Ri matrices, which results in substantial simplification of the optimization problem. 
This method changes the problem from a constrained to an unconstrained one and ensures that the D matrix, the Ri matrix, or both matrices are positive-definite during the entire estimation process (see Pinheiro & Bates, 1996, for more details on the log-Cholesky decomposition method).

Table 2.4 details the computational algorithms used to implement both ML and REML estimation by the LMM procedures in the five software packages presented in this book.

TABLE 2.4: Computational Algorithms Used by the Software Procedures for Estimation of the Covariance Parameters in an LMM

Software Procedure | Available Estimation Methods, Default Method | Computational Algorithms
SAS: proc mixed | ML, REML | Ridge-stabilized N–R (a), Fisher scoring
SPSS: MIXED | ML, REML | N–R, Fisher scoring
R: lme() function | ML, REML | EM (b) algorithm (c), N–R
R: lmer() function | ML, REML | EM (b) algorithm (c), N–R
Stata: mixed command | ML, REML | EM algorithm, N–R (default)
HLM: HLM2 (Chapters 3, 5, 6) | ML, REML | EM algorithm, Fisher scoring
HLM: HLM3 (Chapter 4) | ML | EM algorithm, Fisher scoring
HLM: HMLM2 (Chapter 7) | ML | EM algorithm, Fisher scoring
HLM: HCM2 (Chapter 8) | ML | EM algorithm, Fisher scoring

(a) N–R denotes the Newton–Raphson algorithm (see Subsection 2.5.1).
(b) EM denotes the Expectation-Maximization algorithm (see Subsection 2.5.1).
(c) The lme() function in R actually uses the ECME (expectation/conditional maximization either) algorithm, which is a modification of the EM algorithm. For details, see Liu and Rubin (1994).

2.5.2 Computational Problems with Estimation of Covariance Parameters

The random effects in the ui vector in an LMM are assumed to arise from a multivariate normal distribution with variances and covariances described by the positive-definite variance-covariance matrix D. Occasionally, when one is using a software procedure to fit an LMM, depending on (1) the nature of a clustered or longitudinal data set, (2) the degree of similarity of observations within a given level of a random factor, or (3) model misspecification, the iterative estimation routines converge to a value for the estimate of a covariance parameter in θD that lies very close to or outside the boundary of the parameter space. Consequently, the estimate of the D matrix may not be positive-definite. Note that in the context of estimation of the D matrix, we consider positive-definiteness in a numerical, rather than mathematical, sense. By numerical, we mean that we take into account the finite numeric precision of a computer.

Each software procedure produces different error messages or notes when computational problems are encountered in estimating the D matrix. In some cases, software procedures (e.g., proc mixed in SAS, or MIXED in SPSS) stop the estimation process, assume that an estimated variance in the D matrix lies on a boundary of the parameter space, and report that the estimated D matrix is not positive-definite (in a numerical sense). In other cases, the computational algorithms bypass the positive-definiteness checks and converge to an estimate of the D matrix that is outside the allowed parameter space (a nonpositive-definite matrix). We encounter this type of problem when fitting Model 6.1 in Chapter 6 (see Subsection 6.4.1).
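One simple way to investigate such problems in R is to extract the estimated D matrix and inspect its eigenvalues; recent versions of lme4 also provide a direct check for boundary ("singular") fits. The sketch below uses simulated data, and all names are hypothetical.

```r
library(lme4)

# Hypothetical data with random intercepts and random slopes for time
set.seed(4)
df <- data.frame(id = factor(rep(1:25, each = 6)), time = rep(0:5, times = 25))
u  <- cbind(rnorm(25, sd = 1), rnorm(25, sd = 0.3))
df$y <- 1 + 0.4 * df$time + u[as.integer(df$id), 1] +
        u[as.integer(df$id), 2] * df$time + rnorm(150, sd = 0.7)

fit <- lmer(y ~ time + (time | id), data = df)

# Extract the estimated D matrix for the random effects associated with id
D_hat <- as.matrix(VarCorr(fit)$id)   # 2 x 2 estimated D matrix
eigen(D_hat)$values                   # eigenvalues <= 0 signal a non-positive-definite estimate

# lme4 also flags boundary fits directly
isSingular(fit)
```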
In general, when fitting an LMM, analysts should be aware of warning messages indicating that the estimated D matrix is not positive-definite and interpret parameter estimates with extreme caution when these types of messages are produced by a software procedure. We list some alternative approaches for fitting the model when problems arise with estimation of the covariance parameters: 1. Choose alternative starting values for covariance parameter estimates: If a computational algorithm does not converge or converges to possibly suboptimal values for the covariance parameter estimates, the problem may lie in the choice of starting values for covariance parameter estimates. To remedy this problem, we may choose alternative starting values or initiate computations using a more stable algorithm, such as the EM algorithm (see Subsection 2.5.1). 2. Rescale the covariates: In some cases, covariance parameters are very different in magnitude and may even be several orders of magnitude apart. Joint estimation of covariance parameters may cause one of the parameters to become extremely small, approaching the boundary of the parameter space, and the D matrix may become nonpositive-definite (within the numerical tolerance of the computer being used). If this occurs, one could consider rescaling the covariates associated with the small covariance parameters. For example, if a covariate measures time in minutes and a study is designed to last several days, the values on the covariate could become very large and the associated variance component could be small (because the incremental effects of time associated with different subjects will be relatively small). Dividing the time covariate by a large number (e.g., 60, so that time would be measured in hours instead of minutes) may enable the corresponding random effects and their variances to be on a scale more similar to that of the other covariance parameters. Such rescaling may improve numerical stability of the optimization algorithm and may circumvent convergence problems. We do not consider this alternative in any of the examples that we discuss. 3. Based on the design of the study, simplify the model by removing random effects that may not be necessary: In general, we recommend removing higher-order terms (e.g., higher-level interactions and higher-level polynomials) from a model first for both random and fixed effects. This method helps to ensure that the reduced model remains well formulated (Morrell et al., 1997). However, in some cases, it may be appropriate to remove lower-order random effects first, while retaining higher-order random effects in a model; such an approach requires thorough justification. For instance, in the analysis of the longitudinal data for the Autism example in Chapter 6, we remove the random effects associated with the intercept (which contribute to variation at all time points for a given subject) first, while retaining random effects associated with the linear and quadratic effects of age. By doing this, we assume that all variation between measurements of the dependent variable at the initial time point is attributable to residual variation (i.e., we assume that none of the overall variation at the first time point is attributable to between-subject variation). 
To implement this in an LMM, we define additional random effects (i.e., the random linear and quadratic effects associated with age) in such a way that they do not contribute to the vari- Linear Mixed Models: An Overview 33 ation at the initial time point, and consequently, all variation at this time point is due to residual error. Another implication of this choice is that between-subject variation is described using random linear and quadratic effects of age only. 4. Fit the implied marginal model: As mentioned in Section 2.3, one can sometimes fit the marginal model implied by a given LMM. The important difference when fitting the implied marginal model is that there are fewer restrictions on the covariance parameters being estimated. We present two examples of this approach: (a) If one is fitting an LMM with random intercepts only and a homogeneous residual covariance structure, one can directly fit the marginal model implied by this LMM by fitting a model with random effects omitted, and with a compound symmetry covariance structure for the residuals. We present an example of this approach in the analysis of the Dental Veneer data in Subsection 7.11.1. (b) Another approach is to “emulate” the fit of an implied marginal model by fitting an LMM and, if needed, removing the positive-definiteness constraints on the D and the Ri matrices. The option of relaxing constraints on the D and Ri matrices is currently only available in SAS proc mixed, via use of the nobound option. We consider this approach in the analysis of the Autism data in Subsection 6.4.1. 5. Fit the marginal model with an unstructured covariance matrix: In some cases, software procedures are not capable of fitting an implied marginal model, which involves less restrictive constraints imposed on the covariance parameters. If measurements are taken at a relatively small number of prespecified time points for all subjects, one can instead fit a marginal model (without any random effects specified) with an unstructured covariance matrix for the residuals. We consider this alternative approach in the analysis of the Autism data in Chapter 6. Note that none of these alternative methods guarantees convergence to the optimal and properly constrained values of covariance parameter estimates. The methods that involve fitting a marginal model (items 4 and 5 in the preceding text) shift a more restrictive requirement for the D and Ri matrices to be positive-definite to a less restrictive requirement for the matrix Vi (or Vi∗ ) to be positive-definite, but they still do not guarantee convergence. In addition, methods involving marginal models do not allow for inferences about random effects and their variances. 2.6 Tools for Model Selection When analyzing clustered and repeated-measures/longitudinal data sets using LMMs, researchers are faced with several competing models for a given data set. These competing models describe sources of variation in the dependent variable and at the same time allow researchers to test hypotheses of interest. It is an important task to select the “best” model, i.e., a model that is parsimonious in terms of the number of parameters used, and at the same time is best at predicting (or explaining variation in) the dependent variable. In selecting the best model for a given data set, we take into account research objectives, sampling and study design, previous knowledge about important predictors, and important subject matter considerations. 
We also use analytic tools, such as the hypothesis tests 34 Linear Mixed Models: A Practical Guide Using Statistical Software and the information criteria discussed in this section. Before we discuss specific hypothesis tests and information criteria in detail, we introduce the basic concepts of nested models and hypothesis specification and testing in the context of LMMs. For readers interested in additional details on current practice and research in model selection techniques for LMMs, we suggest Steele (2013). 2.6.1 2.6.1.1 Basic Concepts in Model Selection Nested Models An important concept in the context of model selection is to establish whether, for any given pair of models, there is a “nesting” relationship between them. Assume that we have two competing models: Model A and Model B. We define Model A to be nested in Model B if Model A is a “special case” of Model B. By special case, we mean that the parameter space for the nested Model A is a subspace of that for the more general Model B. Less formally, we can say that the parameters in the nested model can be obtained by imposing certain constraints on the parameters in the more general model. In the context of LMMs, a model is nested within another model if a set of fixed effects and/or covariance parameters in a nested model can be obtained by imposing constraints on parameters in a more general model (e.g., constraining certain parameters to be equal to zero or equal to each other). 2.6.1.2 Hypotheses: Specification and Testing Hypotheses about parameters in an LMM are specified by providing null (H0 ) and alternative (HA ) hypotheses about the parameters in question. Hypotheses can also be formulated in the context of two models that have a nesting relationship. A more general model encompasses both the null and alternative hypotheses, and we refer to it as a reference model. A second simpler model satisfies the null hypothesis, and we refer to this model as a nested (null hypothesis) model. Briefly speaking, the only difference between these two models is that the reference model contains the parameters being tested, but the nested (null) model does not. Hypothesis tests are useful tools for making decisions about which model (nested vs. reference) to choose. The likelihood ratio tests presented in Subsection 2.6.2 require analysts to fit both the reference and nested models. In contrast, the alternative tests presented in Subsection 2.6.3 require fitting only the reference model. We refer to nested and reference models explicitly in the example chapters when testing various hypotheses. We also include a diagram in each of the example chapters (e.g., Figure 3.3) that indicates the nesting of models, and the choice of preferred models based on results of formal hypothesis tests or other considerations. 2.6.2 Likelihood Ratio Tests (LRTs) LRTs are a class of tests that are based on comparing the values of likelihood functions for two models (i.e., the nested and reference models) defining a hypothesis being tested. LRTs can be employed to test hypotheses about covariance parameters or fixed-effect parameters in the context of LMMs. In general, LRTs require that both the nested (null hypothesis) model and reference model corresponding to a specified hypothesis are fitted to the same subset of the data. 
The LRT statistic is calculated by subtracting –2 times the log-likelihood for the reference model from that for the nested model, as shown in the following equation:

$$-2\ln\left(\frac{L_{\text{nested}}}{L_{\text{reference}}}\right) = -2\ln(L_{\text{nested}}) - \bigl(-2\ln(L_{\text{reference}})\bigr) \sim \chi^2_{df} \quad (2.22)$$

In (2.22), L_nested refers to the value of the likelihood function evaluated at the ML or REML estimates of the parameters in the nested model, and L_reference refers to the value of the likelihood function in the reference model. Likelihood theory states that under mild regularity conditions the LRT statistic asymptotically follows a χ2 distribution, in which the number of degrees of freedom, df, is obtained by subtracting the number of parameters in the nested model from the number of parameters in the reference model.

Using the result in (2.22), hypotheses about the parameters in LMMs can be tested. The significance of the likelihood ratio test statistic can be determined by referring it to a χ2 distribution with the appropriate degrees of freedom. If the LRT statistic is sufficiently large, there is evidence against the null hypothesis model and in favor of the reference model. If the likelihood values of the two models are very close, and the resulting LRT statistic is small, we have evidence in favor of the nested (null hypothesis) model.

2.6.2.1 Likelihood Ratio Tests for Fixed-Effect Parameters

The likelihood ratio tests that we use to test linear hypotheses about fixed-effect parameters in an LMM are based on ML estimation; using REML estimation is not appropriate in this context (Morrell, 1998; Pinheiro & Bates, 2000; Verbeke & Molenberghs, 2000). For LRTs of fixed effects, the nested and reference models have the same set of covariance parameters but different sets of fixed-effect parameters. The test statistic is calculated by subtracting the –2 ML log-likelihood for the reference model from that for the nested model. The asymptotic null distribution of the test statistic is a χ2 with degrees of freedom equal to the difference in the number of fixed-effect parameters between the two models.

2.6.2.2 Likelihood Ratio Tests for Covariance Parameters

When testing hypotheses about covariance parameters in an LMM, REML estimation should be used for both the reference and nested models, especially in the context of small sample size. REML estimation has been shown to reduce the bias inherent in ML estimates of covariance parameters (Morrell, 1998). We assume that the nested and reference models have the same set of fixed-effect parameters, but different sets of covariance parameters. To carry out a REML-based likelihood ratio test for covariance parameters, the –2 REML log-likelihood value for the reference model is subtracted from that for the nested model. The null distribution of the test statistic depends on whether or not the null hypothesis values for the covariance parameters lie on the boundary of the parameter space.

Case 1: The covariance parameters satisfying the null hypothesis do not lie on the boundary of the parameter space.

When carrying out a REML-based likelihood ratio test for covariance parameters in which the null hypothesis does not involve testing whether any parameters lie on the boundary of the parameter space (e.g., testing a model with heterogeneous residual variance vs.
a model with constant residual variance, or testing whether a covariance between two random effects is equal to zero), the test statistic is asymptotically distributed as a χ2 with degrees of freedom calculated by subtracting the number of covariance parameters in the nested model from that in the reference model. An example of such a test is given in Subsection 5.5.2, in which we test a heterogeneous residual variance model vs. a model with constant residual variance (Hypothesis 5.2 in the Rat Brain example). Case 2: The covariance parameters satisfying the null hypothesis lie on the boundary of the parameter space. 36 Linear Mixed Models: A Practical Guide Using Statistical Software Tests of null hypotheses in which covariance parameters have values that lie on the boundary of the parameter space often arise in the context of testing whether a given random effect should be kept in a model or not. We do not directly test hypotheses about the random effects themselves. Instead, we test whether the corresponding variances and covariances of the random effects are equal to zero. In the case in which we have a single random effect in a model, we might wish to test the null hypothesis that the random effect can be omitted. Self & Liang (1987), Stram & Lee (1994), and Verbeke & Molenberghs (2000) have shown that the test statistic in this case has an asymptotic null distribution that is a mixture of χ20 and χ21 distributions, with each having an equal weight of 0.5. Note that the χ20 distribution is concentrated entirely at zero, so calculations of p-values can be simplified and effectively are based on the χ21 distribution only. An example of this type of test is given in the analysis of the Rat Pup data, in which we test whether the variance of the random intercepts associated with litters is equal to zero in Subsection 3.5.1 (Hypothesis 3.1). In the case in which we have two random effects in a model and we wish to test whether one of them can be omitted, we need to test whether the variance for the given random effect that we wish to test and the associated covariance of the two random effects are both equal to zero. The asymptotic null distribution of the test statistic in this case is a mixture of χ21 and χ22 distributions, with each having an equal weight of 0.5 (Verbeke & Molenberghs, 2000). An example of this type of likelihood ratio test is shown in the analysis of the Autism data in Chapter 6, in which we test whether the variance associated with the random quadratic age effects and the associated covariance of these random effects with the random linear age effects are both equal to zero in Subsection 6.5.1 (Hypothesis 6.1). Because most statistical software procedures capable of fitting LMMs provide the option of using either ML estimation or REML estimation for a given model, one can choose to use REML estimation to fit the reference and nested models when testing hypotheses about covariance parameters, and ML estimation when testing hypotheses about fixed effects. Finally, we note that Crainiceanu & Ruppert (2004) have defined the exact null distribution of the likelihood ratio test statistic for a single variance component under more general conditions (including small samples), and software is available in R implementing exact likelihood ratio tests based on simulations from this distribution (the exactLRT() function in the RLRsim package). Galecki & Burzykowski (2013) present examples of the use of this function. 
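The R sketch below (using the nlme package and simulated data; all names are hypothetical) illustrates a REML-based likelihood ratio test for a single random intercept variance, with the p-value computed from the 0.5(χ²₀ + χ²₁) mixture described above. The simulation-based exactLRT() and exactRLRT() functions in the RLRsim package provide an alternative to this asymptotic mixture.

```r
library(nlme)

# Hypothetical clustered data
set.seed(5)
df <- data.frame(id = factor(rep(1:30, each = 4)), time = rep(1:4, times = 30))
df$y <- 1 + 0.2 * df$time + rep(rnorm(30, sd = 0.6), each = 4) + rnorm(120)

# Reference model: random intercepts; nested model: no random effects.
# Both are fitted by REML with the same fixed effects.
ref  <- lme(y ~ time, random = ~ 1 | id, data = df, method = "REML")
null <- gls(y ~ time, data = df, method = "REML")

lrt <- as.numeric(2 * (logLik(ref) - logLik(null)))

# Naive chi-square(1) p-value, and the 0.5*(chi2_0 + chi2_1) mixture p-value,
# which halves the naive p-value because chi2_0 places all of its mass at zero
p_naive <- pchisq(lrt, df = 1, lower.tail = FALSE)
p_mix   <- 0.5 * p_naive
c(LRT = lrt, p_naive = p_naive, p_mixture = p_mix)

anova(null, ref)   # reports the same LRT with the naive chi-square(1) p-value
```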
Appropriate null distributions of likelihood ratio test statistics for multiple covariance parameters have not been derived to date, and classical likelihood ratio tests comparing nested models with multiple variance components constrained to be 0 in the reduced model should be considered conservative. The mixed command in Stata, for example, makes explicit note of this when users fit models with multiple random effects (for example, see Section 5.4.4). 2.6.3 Alternative Tests In this section we present alternatives to likelihood ratio tests of hypotheses about the parameters in a given LMM. Unlike the likelihood ratio tests discussed in Subsection 2.6.2, these tests require fitting only a reference model. 2.6.3.1 Alternative Tests for Fixed-Effect Parameters A t-test is often used for testing hypotheses about a single fixed-effect parameter (e.g., H0 : β = 0 vs. HA : β = 0) in an LMM. The corresponding t-statistic is calculated as follows: Linear Mixed Models: An Overview 37 t= β se(β) (2.23) In the context of an LMM, the null distribution of the t-statistic in (2.23) does not in general follow an exact t distribution. Unlike the case of the standard linear model, the number of degrees of freedom for the null distribution of the test statistic is not equal to n − p (where p is the total number of fixed-effect parameters estimated). Instead, we use approximate methods to estimate the degrees of freedom. The approximate methods for degrees of freedom for both t-tests and F -tests are discussed later in this section. Software Note: The mixed command in Stata calculates z-statistics for tests of single fixed-effect parameters in an LMM using the same formula as specified for the t-test in (2.23). These z-statistics assume large sample sizes and refer to the standard normal distribution, and therefore do not require the calculation of degrees of freedom to derive a p-value. An F -test can be used to test linear hypotheses about multiple fixed effects in an LMM. For example, we may wish to test whether any of the parameters associated with the levels of a fixed factor are different from zero. In general, when testing a linear hypothesis of the form H0 : Lβ = 0 vs. HA : Lβ = 0 where L is a known matrix, the F -statistic defined by ⎛ −1 ⎞−1 L ⎝L β Xi Vi−1 Xi L ⎠ Lβ F = i rank(L) (2.24) follows an approximate F distribution, with numerator degrees of freedom equal to the rank of the matrix L (recall that the rank of a matrix is the number of linearly independent rows or columns), and an approximate denominator degrees of freedom that can be estimated using various methods (Verbeke & Molenberghs, 2000). Similar to the case of the t-test, the F -statistic in general does not follow an exact F distribution, with known numerator and denominator degrees of freedom. Instead, the denominator degrees of freedom are approximated. The approximate methods that apply to both t-tests and F -tests take into account the presence of random effects and correlated residuals in an LMM. Several of these approximate methods (e.g., the Satterthwaite method, or the “between-within” method) involve different choices for the degrees of freedom used in the approximate t-tests and F -tests. The Kenward–Roger method goes a step further. In addition to adjusting the degrees of freedom using the Satterthwaite method, this method also modifies the estimated covariance matrix to reflect uncertainty in using Vi as a substitute for Vi in (2.19) and (2.20). We discuss these approximate methods in more detail in Subsection 3.11.6. 
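In R, degrees-of-freedom approximations of this kind are available through add-on packages rather than through lme4 itself; the sketch below uses the lmerTest package (Satterthwaite approximation by default, with the Kenward–Roger method available via the pbkrtest package). The data and model are hypothetical.

```r
library(lmerTest)   # loads lme4 and extends lmer() with degrees-of-freedom approximations

set.seed(6)
df <- data.frame(id    = factor(rep(1:20, each = 5)),
                 time  = rep(0:4, times = 20),
                 group = rep(c("A", "B"), each = 50))
df$y <- 1 + 0.3 * df$time + 0.5 * (df$group == "B") +
        rep(rnorm(20, sd = 0.8), each = 5) + rnorm(100, sd = 0.6)

fit <- lmer(y ~ time + group + (1 | id), data = df)

summary(fit)                        # t-tests with Satterthwaite degrees of freedom
anova(fit)                          # F-tests with Satterthwaite degrees of freedom (default)
anova(fit, ddf = "Kenward-Roger")   # Kenward-Roger adjustment (requires the pbkrtest package)
```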
Different types of F -tests are often used in practice. We focus on Type I F -tests and Type III F -tests. Briefly, Type III F -tests are conditional on the effects of all other terms in a given model, whereas Type I (sequential) F -tests are conditional on just the fixed effects listed in the model prior to the effects being tested. Type I and Type III F -tests are therefore equivalent only for the term entered last in the model (except for certain models 38 Linear Mixed Models: A Practical Guide Using Statistical Software for balanced data). We compare these two types of F -tests in more detail in the example chapters. An omnibus Wald test can also be used to test linear hypotheses of the form H0 : Lβ = 0 vs. HA : Lβ = 0. The test statistic for a Wald test is the numerator in (2.24), and it asymptotically follows a χ2 distribution with degrees of freedom equal to the rank of the L matrix. We consider Wald tests for fixed effects using the Stata and HLM software in the example chapters. 2.6.3.2 Alternative Tests for Covariance Parameters A simple test for covariance parameters is the Wald z-test. In this test, a z-statistic is computed by dividing an estimated covariance parameter by its estimated standard error. The p-value for the test is calculated by referring the test statistic to a standard normal distribution. The Wald z-test is asymptotic, and requires that the random factor with which the random effects are associated has a large number of levels. This test statistic also has unfavorable properties when a hypothesis test about a covariance parameter involves values on the boundary of its parameter space. Because of these drawbacks, we do not recommend using Wald z-tests for covariance parameters, and instead recommend the use of likelihood ratio tests, with p-values calculated using appropriate χ2 distributions or mixtures of χ2 distributions. The procedures in the HLM software package by default generate alternative chi-square tests for covariance parameters in an LMM (see Subsection 4.7.2 for an example). These tests are described in detail in Raudenbush & Bryk (2002). 2.6.4 Information Criteria Another set of tools useful in model selection are referred to as information criteria. The information criteria (sometimes referred to as fit criteria) provide a way to assess the fit of a model based on its optimum log-likelihood value, after applying a penalty for the number of parameters that are estimated in fitting the model. A key feature of the information criteria discussed in this section is that they provide a way to compare any two models fitted to the same set of observations; i.e., the models do not need to be nested. We use the “smaller is better” form for the information criteria discussed in this section; that is, a smaller value of the criterion indicates a “better” fit. The Akaike information criterion (AIC) may be calculated based on the (ML or REML) log-likelihood, (β, θ), of a fitted model as follows (Akaike, 1973): θ) + 2p AIC = −2 × (β, (2.25) In (2.25), p represents the total number of parameters being estimated in the model for both the fixed and random effects. Note that the AIC in effect “penalizes” the fit of a model for the number of parameters being estimated by adding 2p to the –2 log-likelihood. Some software procedures calculate the AIC using slightly different formulas, depending on whether ML or REML estimation is being used (see Subsection 3.6.1 for a discussion of the calculation formulas used for the AIC in the different software procedures). 
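As a quick check of (2.25), the AIC reported by software can usually be reproduced from the optimum log-likelihood and the number of estimated parameters; the R sketch below does this for an ML fit with lme4 (keeping in mind, as noted above, that procedures may count parameters differently, particularly under REML). The data are hypothetical.

```r
library(lme4)

set.seed(7)
df <- data.frame(id = factor(rep(1:25, each = 4)), time = rep(1:4, times = 25))
df$y <- 1 + 0.2 * df$time + rep(rnorm(25), each = 4) + rnorm(100)

fit <- lmer(y ~ time + (1 | id), data = df, REML = FALSE)   # ML fit

ll <- logLik(fit)
p  <- attr(ll, "df")   # number of estimated parameters (2 fixed effects + 2 variances)
aic_by_hand <- -2 * as.numeric(ll) + 2 * p
c(aic_by_hand = aic_by_hand, AIC = AIC(fit))   # the two values should agree
```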
The Bayes information criterion (BIC) is also commonly used and may be calculated as follows: θ) + p × ln(n) BIC = −2 × (β, (2.26) Linear Mixed Models: An Overview 39 The BIC in (2.26) applies a greater penalty for models with more parameters than does the AIC, because we multiply the number of parameters being estimated by the natural logarithm of n, where n is the total number of observations used in estimation of the model. Recent work (Steele, 2013; Gurka, 2006) suggests that no one information criterion stands apart as the best criterion to be used when selecting LMMs, and that more work still needs to be done in understanding the role that information criteria play in the selection of LMMs. Consistent with Steele (2013), we recommend that analysts compute a variety of information criteria when choosing among competing models, and identify models favored by multiple criteria. 2.7 Model-Building Strategies A primary goal of model selection is to choose the simplest model that provides the best fit to the observed data. There may be several choices concerning which fixed and random effects should be included in an LMM. There are also many possible choices of covariance structures for the D and Ri matrices. All of these considerations have an impact on both the estimated marginal mean (Xi β) and the estimated marginal variance-covariance matrix Vi (= Zi DZi + Ri ) for the observed responses in Yi based on the specified model. The process of building an LMM for a given set of longitudinal or clustered data is an iterative one that requires a series of model-fitting steps and investigations, and selection of appropriate mean and covariance structures for the observed data. Model building typically involves a balance of statistical and subject matter considerations; there is no single strategy that applies to every application. 2.7.1 The Top-Down Strategy The following broadly defined steps are suggested by Verbeke & Molenberghs (2000) for building an LMM for a given data set. We refer to these steps as a top-down strategy for model building, because they involve starting with a model that includes the maximum number of fixed effects that we wish to consider in a model. 1. Start with a well-specified mean structure for the model: This step typically involves adding the fixed effects of as many covariates (and interactions between the covariates) as possible to the model to make sure that the systematic variation in the responses is well explained before investigating various covariance structures to describe random variation in the data. In the example chapters we refer to this as a model with a loaded mean structure. 2. Select a structure for the random effects in the model: This step involves the selection of a set of random effects to include in the model. The need for including the selected random effects can be tested by performing REML-based likelihood ratio tests for the associated covariance parameters (see Subsection 2.6.2.2 for a discussion of likelihood ratio tests for covariance parameters). 3. Select a covariance structure for the residuals in the model: Once fixed effects and random effects have been added to the model, the remaining variation in the observed responses is due to residual error, and an appropriate covariance structure for the residuals should be investigated. 40 Linear Mixed Models: A Practical Guide Using Statistical Software 4. 
4. Reduce the model: This step involves using appropriate statistical tests (see Subsections 2.6.2.1 and 2.6.3.1) to determine whether certain fixed-effect parameters are needed in the model.

In general, one would iterate between Steps 2 and 4 when building a model. We use this top-down approach to model building for the data sets that we analyze in Chapter 3 and Chapters 5 through 7.

2.7.2 The Step-Up Strategy

An alternative approach to model building, which we refer to as the step-up strategy, has been developed in the literature on HLMs. We use the step-up model-building strategy in the analysis of the Classroom data in Chapter 4. This approach is outlined in both Raudenbush & Bryk (2002) and Snijders & Bosker (1999), and is described in the following text:

1. Start with an "unconditional" (or means-only) Level 1 model for the data: This step involves fitting an initial Level 1 model having the fixed intercept as the only fixed-effect parameter. The model also includes random effects associated with the Level 2 units, and with the Level 3 units in the case of a three-level data set. This model allows one to assess the variation in the response values across the different levels of the clustered or longitudinal data set without adjusting for the effects of any covariates.

2. Build the model by adding Level 1 covariates to the Level 1 model. In the Level 2 model, consider adding random effects to the equations for the coefficients of the Level 1 covariates: In this step, Level 1 covariates and their associated fixed effects are added to the Level 1 model. These Level 1 covariates may help to explain variation in the residuals associated with the observations on the Level 1 units. The Level 2 model can also be modified by adding random effects to the equations for the coefficients of the Level 1 covariates. These random effects allow for random variation in the effects of the Level 1 covariates across Level 2 units.

3. Build the model by adding Level 2 covariates to the Level 2 model. For three-level models, consider adding random effects to the Level 3 equations for the coefficients of the Level 2 covariates: In this step, Level 2 covariates and their associated fixed effects can be added to the Level 2 model. These Level 2 covariates may explain some of the random variation in the effects of the Level 1 covariates that is captured by the random effects in the Level 2 models. In the case of a three-level data set, the effects of the Level 2 covariates in the Level 2 model might also be allowed to vary randomly across Level 3 units.

After appropriate equations for the effects of the Level 1 covariates have been specified in the Level 2 model, one can assess assumptions about the random effects in the Level 2 model (e.g., normality and constant variance). This process is then repeated for the Level 3 model in the case of a three-level analysis (e.g., Chapter 4).

The model-building steps that we present in this section are meant to be guidelines, and are not hard-and-fast rules for model selection. In the example chapters, we illustrate aspects of the top-down and step-up model-building strategies when fitting LMMs to real data sets. Our aim is to illustrate specific concepts in the analysis of longitudinal or clustered data, rather than to construct the best LMM for a given data set.
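As a deliberately simplified illustration of the top-down steps, the following R sketch uses lmer() from the lme4 package with its built-in sleepstudy data; the data set and models are stand-ins, not one of the book's case studies. Step 3 (a non-default residual covariance structure) is not shown, because lmer() assumes conditionally independent residuals with constant variance.

library(lme4)

# Step 1: loaded mean structure (here just Days) with a random intercept, fit by REML
m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = TRUE)

# Step 2: add a random slope for Days and compare the random-effects structures
# with a REML-based LRT (refit = FALSE keeps the REML fits); the naive p-value is
# conservative because of the boundary issue discussed in Subsection 2.6.2.2
m2 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = TRUE)
anova(m2, m1, refit = FALSE)

# Step 4: test a fixed effect by comparing ML fits of nested mean structures
m2_ml <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE)
m0_ml <- lmer(Reaction ~ 1 + (Days | Subject), data = sleepstudy, REML = FALSE)
anova(m2_ml, m0_ml)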
2.8 Checking Model Assumptions (Diagnostics)

After fitting an LMM, it is important to carry out model diagnostics to check whether distributional assumptions for the residuals are satisfied and whether the fit of the model is sensitive to unusual observations. The process of carrying out model diagnostics involves several informal and formal techniques. Diagnostic methods for standard linear models are well established in the statistics literature. In contrast, diagnostics for LMMs are more difficult to perform and interpret, because the model itself is more complex, due to the presence of random effects and different covariance structures. In this section, we focus on the definitions of a selected set of terms related to residual and influence diagnostics in LMMs. We refer readers to Claeskens (2013) and Schabenberger (2004) for more detailed descriptions of existing diagnostic methods for LMMs.

In general, model diagnostics should be part of the model-building process throughout the analysis of a clustered or longitudinal data set. For simplicity of presentation, we consider diagnostics only for the final model fitted in each of the example chapters.

2.8.1 Residual Diagnostics

Informal techniques are commonly used to check residual diagnostics; these techniques rely on the human mind and eye, and are used to decide whether or not a specific pattern exists in the residuals. In the context of the standard linear model, the simplest example is to decide whether a given set of residuals plotted against predicted values represents a random pattern or not. These residual vs. fitted plots are used to verify model assumptions and to detect outliers and potentially influential observations. In general, residuals should be assessed for normality, constant variance, and outliers. In the context of LMMs, we consider conditional residuals and their "studentized" versions, as described in the following subsections.

2.8.1.1 Raw Residuals

A conditional residual is the difference between the observed value and the conditional predicted value of the dependent variable. For example, we write an equation for the vector of conditional residuals for a given individual i in a two-level longitudinal data set as follows (refer to Subsection 2.9.1 for the calculation of ûi):

ε̂i = yi − Xi β̂ − Zi ûi    (2.27)

In general, conditional residuals in their basic form in (2.27) are not well suited for verifying model assumptions and detecting outliers. Even if the true model residuals are uncorrelated and have equal variance, conditional residuals will tend to be correlated, and their variances may be different for different subgroups of individuals. The shortcomings of raw conditional residuals apply to models other than LMMs as well. We discuss alternative forms of the conditional residuals in Subsection 2.8.1.2.

In contrast to conditional residuals, marginal residuals are based on models that do not include explicit random effects:

ε̂*i = yi − Xi β̂    (2.28)

Although we consider diagnostic tools for conditional residuals in this section, a separate class of diagnostic tools exists for the marginal residuals defined in (2.28) (see Gałecki & Burzykowski, 2013, for more details). We consider examples of different covariance structures for marginal residuals in later chapters.
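The distinction between (2.27) and (2.28) can be seen directly in R. The following hedged sketch again uses lme4's sleepstudy data purely for illustration; the informal plots at the end are the kind of residual-vs-fitted and normal Q-Q checks described above.

library(lme4)

# Illustrative fit (lme4's built-in sleepstudy data, not one of the book's examples)
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Conditional residuals (2.27): observed value minus Xi*beta-hat minus Zi*u-hat
res_cond <- residuals(fit)

# Marginal residuals (2.28): observed value minus Xi*beta-hat only
# (predict() with re.form = NA drops the predicted random effects)
res_marg <- sleepstudy$Reaction - predict(fit, re.form = NA)

# Informal checks: residual-vs-fitted plot and a normal Q-Q plot
plot(fitted(fit), res_cond, xlab = "Conditional fitted values",
     ylab = "Conditional residuals"); abline(h = 0, lty = 2)
qqnorm(res_cond); qqline(res_cond)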
2.8.1.2 Standardized and Studentized Residuals

To alleviate problems with the interpretation of conditional residuals that may have unequal variances, we consider scaling (i.e., dividing) the residuals by their true or estimated standard deviations. Ideally, we would like to scale residuals by their true standard deviations to obtain standardized residuals. Unfortunately, the true standard deviations are rarely known in practice, so scaling is done using estimated standard deviations instead. Residuals obtained in this manner are called studentized residuals.

Another method of scaling residuals is to divide them by the estimated standard deviation of the dependent variable. The resulting residuals are called Pearson residuals. Pearson-type scaling is appropriate if we assume that the variability of β̂ can be ignored. Other scaling choices are also possible, although we do not consider them.

The calculation of a studentized residual may also depend on whether the observation corresponding to the residual in question is included in the estimation of the standard deviation or not. If the corresponding observation is included, we refer to it as internal studentization. If the observation is excluded, we refer to it as external studentization. We discuss studentized residuals in the model diagnostics section in the analysis of the Rat Pup data in Chapter 3. Studentized residuals are directly available in SAS proc mixed, but are not readily available in the other software that we feature, and require additional calculation.

2.8.2 Influence Diagnostics

Likelihood-based estimation methods (both ML and REML) are sensitive to unusual observations. Influence diagnostics are formal techniques that allow one to identify observations that heavily influence estimates of the parameters in either β or θ. Influence diagnostics for LMMs are an active area of research. The idea of influence diagnostics for a given observation (or subset of observations) is to quantify the effect of omitting those observations on the results of the analysis of the entire data set. Schabenberger discusses several influence diagnostics for LMMs in detail (Schabenberger, 2004), and a recent review of current practice and research in this area can be found in Claeskens (2013).

Influence diagnostics may be used to investigate various aspects of the model fit. Because LMMs are more complicated than standard linear models, the influence of observations on the model fit can manifest itself in more varied and complicated ways. It is generally recommended to follow a top-down approach when carrying out influence diagnostics in mixed models. First, check overall influence diagnostics. Assuming that there are influential sets of observations based on the overall influence diagnostics, proceed with other diagnostics to see what aspect of the model a given subset of observations affects: fixed effects, covariance parameters, the precision of the parameter estimates, or predicted values.

Influence diagnostics play an important role in the interpretation of the results. If a given subset of data has a strong influence on the estimates of covariance parameters, but limited impact on the fixed effects, then it is appropriate to interpret the model with respect to prediction. However, we need to keep in mind that changes in estimates of covariance parameters may affect the precision of tests for fixed effects and, consequently, confidence intervals.
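As a hands-on complement to the formal statistics summarized in the next subsection (Table 2.5), the following R sketch refits the model without each cluster in turn and computes a Cook's-distance-type statistic for the fixed effects. This is a hand-rolled, hedged illustration on lme4's sleepstudy data, not the implementations available in proc mixed or the R packages referenced below; cluster-level deletion is used here simply because it is the most common choice of the subset U in clustered data.

library(lme4)

# Illustrative fit (built-in sleepstudy data; not one of the book's case studies)
fit  <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
beta <- fixef(fit)
Vb   <- as.matrix(vcov(fit))   # estimated var(beta-hat) from the full data

# Leave-one-cluster-out influence on the fixed effects:
# (beta - beta_(U))' var(beta-hat)^{-1} (beta - beta_(U)) / rank(X)
subjects <- levels(sleepstudy$Subject)
cooks_d <- sapply(subjects, function(s) {
  fit_u <- update(fit, data = subset(sleepstudy, Subject != s))
  d     <- beta - fixef(fit_u)
  as.numeric(t(d) %*% solve(Vb) %*% d) / length(beta)
})
sort(cooks_d, decreasing = TRUE)[1:5]   # subjects with the largest influence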
We focus on a selected group of influence diagnostics, which are summarized in Table 2.5. Following Schabenberger's notation, we use the subscript (U) to denote quantities calculated based on the data with a subset, U, excluded from the calculations. For instance, consider the overall influence calculations for an arbitrarily chosen vector of parameters, ψ (which can include parameters in β or θ). The vector ψ̂(U) used in the calculation formulas denotes an estimate of ψ computed based on the reduced "leave-U-out" data. These methods include, but are not limited to, overall influence, change in parameter estimates, change in precision of parameter estimates, and effect on predicted values. All methods for influence diagnostics presented in Table 2.5 clearly depend on the subset, U, of observations that is being considered.

The main difference between the Cook's distance statistic and the MDFFITS statistic shown in Table 2.5 is that the MDFFITS statistic uses "externalized" estimates of var(β̂), which are based on covariance estimates recalculated from the reduced data, whereas Cook's distance does not recalculate the covariance parameter estimates in var(β̂) (see (2.20)).

Calculations for influence statistics can be performed using either noniterative or iterative methods. Noniterative methods are based on explicit (closed-form) update formulas (not shown in Table 2.5). The advantage of noniterative methods is that they are more time-efficient than iterative methods. The disadvantage is that they require the rather strong assumption that all covariance parameters are known, and thus are not updated, with the exception of the profiled residual variance. Iterative influence diagnostics require refitting the model without the observations in question; consequently, the covariance parameters are updated at each iteration, and computational execution time is much longer.

Software Note: All the influence diagnostic methods presented in Table 2.5 are currently supported by proc mixed in SAS. A class of leverage-based methods is also available in proc mixed, but we do not discuss them in the example chapters. In Chapter 3, we present and interpret several influence diagnostics generated by proc mixed for the final model fitted to the Rat Pup data. Selected influence diagnostics can also be computed when using the nlmeU and HLMdiag packages in R (see Section 20.3 of Gałecki & Burzykowski, 2013, or Loy & Hofmann, 2014, for computational details). To our knowledge, influence diagnostic methods are not currently available in the other software procedures. The theory behind these and other diagnostic methods is outlined in more detail by Claeskens (2013).

2.8.3 Diagnostics for Random Effects

The natural choice to diagnose random effects is to consider the empirical Bayes (EB) predictors defined in Subsection 2.9.1. EB predictors are also referred to as random-effects predictors or, due to their properties, empirical best linear unbiased predictors (EBLUPs). We recommend using standard diagnostic plots (e.g., histograms, quantile–quantile (Q–Q) plots, and scatterplots) to investigate EBLUPs for potential outliers that may warrant further investigation. In general, checking EBLUPs for normality is of limited value, because their distribution does not necessarily reflect the true distribution of the random effects. We consider informal diagnostic plots for EBLUPs in the example chapters.
TABLE 2.5: Summary of Influence Diagnostics for LMMs

Overall influence
  Likelihood distance / displacement (parameter of interest: ψ):
    LD(U) = 2{ℓ(ψ̂) − ℓ(ψ̂(U))}
    Change in the ML log-likelihood for all data, with ψ estimated for all data vs. the reduced data.
  Restricted likelihood distance / displacement (ψ):
    RLD(U) = 2{ℓR(ψ̂) − ℓR(ψ̂(U))}
    Change in the REML log-likelihood for all data, with ψ estimated for all data vs. the reduced data.

Change in parameter estimates
  Cook's D (β):
    D(β) = (β̂ − β̂(U))′ v̂ar[β̂]⁻¹ (β̂ − β̂(U)) / rank(X)
    Scaled change in the entire estimated β vector.
  Cook's D (θ):
    D(θ) = (θ̂ − θ̂(U))′ v̂ar[θ̂]⁻¹ (θ̂ − θ̂(U))
    Scaled change in the entire estimated θ vector.
  Multivariate DFFITS statistic (β):
    MDFFITS(β) = (β̂ − β̂(U))′ v̂ar[β̂(U)]⁻¹ (β̂ − β̂(U)) / rank(X)
    Scaled change in the entire estimated β vector, using externalized estimates of var(β̂).
  Multivariate DFFITS statistic (θ):
    MDFFITS(θ) = (θ̂ − θ̂(U))′ v̂ar[θ̂(U)]⁻¹ (θ̂ − θ̂(U))
    Scaled change in the entire estimated θ vector, using externalized estimates of var(θ̂).

Change in precision of parameter estimates
  Trace of covariance matrix (β):
    COVTRACE(β) = |trace(v̂ar[β̂]⁻¹ v̂ar[β̂(U)]) − rank(X)|
    Change in precision of the estimated β vector, based on the trace of var(β̂).
  Trace of covariance matrix (θ):
    COVTRACE(θ) = |trace(v̂ar[θ̂]⁻¹ v̂ar[θ̂(U)]) − q|
    Change in precision of the estimated θ vector, based on the trace of var(θ̂).
  Covariance ratio (β):
    COVRATIO(β) = detns(v̂ar[β̂(U)]) / detns(v̂ar[β̂])
    Change in precision of the estimated β vector, based on the determinant of var(β̂).
  Covariance ratio (θ):
    COVRATIO(θ) = detns(v̂ar[θ̂(U)]) / detns(v̂ar[θ̂])
    Change in precision of the estimated θ vector, based on the determinant of var(θ̂).

Effect on predicted values
  PRESS statistic (parameter of interest: N/A):
    PRESS(U) = Σ i∈U (yi − xi′β̂(U))²
    Sum of squared PRESS residuals, calculated by deleting the observations in U.

Notes: The "change" in the parameter estimates for each influence statistic is calculated by comparing estimates based on all data to estimates based on the reduced "leave-U-out" data. detns denotes the determinant of the nonsingular part of the matrix.

2.9 Other Aspects of LMMs

In this section, we discuss additional aspects of fitting LMMs that may be considered when analyzing clustered or longitudinal data sets.

2.9.1 Predicting Random Effects: Best Linear Unbiased Predictors

One aspect of LMMs that is different from standard linear models is the prediction of the values in the random-effects vector, ui. The values in ui are not fixed, unknown parameters that can be estimated, as is the case for the values of β in a linear model. Rather, they are random variables that are assumed to follow some multivariate normal distribution. As a result, we predict the values of these random effects, rather than estimate them (Carlin & Louis, 2009).

Thus far, we have discussed the variances and covariances of the random effects in the D matrix without being particularly interested in predicting the values that these random effects may take. However, in some research settings, it may be useful to predict the values of the random effects associated with specific levels of a random factor. Unlike fixed effects, we are not interested in estimating the mean (i.e., the expected value) of a set of random effects, because we assume that the expected value of the multivariate normal distribution of random effects is a vector of zeroes. However, assuming that the expected value of a random effect is zero does not make any use of the observed data.
In the context of an LMM, we take advantage of all the data collected for those observations sharing the same level of a particular random factor, and use that information to predict the values of the random effects in the LMM. To do this, we look at the conditional expectations of the random effects, given the observed response values, yi, in Yi. The conditional expectation for ui is

ûi = E(ui | Yi = yi) = D̂ Zi′ V̂i⁻¹ (yi − Xi β̂)    (2.29)

The predicted values in (2.29) are the expected values of the random effects, ui, associated with the i-th level of a random factor, given the observed data in yi. These conditional expectations are known as best linear unbiased predictors (BLUPs) of the random effects. We refer to them as EBLUPs (or empirical BLUPs), because they are based on the estimates of the β and θ parameters. The variance-covariance matrix of the EBLUPs can be written as follows:

Var(ûi) = D̂ Zi′ (V̂i⁻¹ − V̂i⁻¹ Xi (Xi′ V̂i⁻¹ Xi)⁻¹ Xi′ V̂i⁻¹) Zi D̂    (2.30)

EBLUPs are "linear" in that they are linear functions of the observed data, yi. They are "unbiased" in that their expectation is equal to the expectation of the random effects for a single subject i. They are "best" in that they have minimum variance (see (2.30)) among all linear unbiased estimators (i.e., they are the most precise linear unbiased estimators; Robinson, 1991). And finally, they are "predictions" of the random effects based on the observed data. EBLUPs are also known as shrinkage estimators, because they tend to be closer to zero than the estimated effects would be if they were computed by treating a random factor as if it were fixed. We include a discussion of shrinkage estimators on the web page for the book (see Appendix A).

2.9.2 Intraclass Correlation Coefficients (ICCs)

In general, the intraclass correlation coefficient (ICC) is a measure describing the similarity (or homogeneity) of the responses on the dependent variable within a cluster (in a clustered data set) or a unit of analysis (in a repeated-measures or longitudinal data set). We consider different forms of the ICC in the analysis of a two-level clustered data set (the Rat Pup data) in Chapter 3, and the analysis of a three-level data set (the Classroom data) in Chapter 4.

2.9.3 Problems with Model Specification (Aliasing)

In this subsection we informally discuss aliasing (and related concepts) in general terms. We then illustrate these concepts with two hypothetical examples. In our explanation, we follow the work of Nelder (1977).

We can think of aliasing as an ambiguity that may occur in the specification of a parametric model (e.g., an LMM), in which multiple parameter sets (aliases) imply models that are indistinguishable from each other. There are two types of aliasing:

1. Intrinsic aliasing: Aliasing attributable to the model formula specification.

2. Extrinsic aliasing: Aliasing attributable to the particular characteristics of a given data set.

Nonidentifiability and overparameterization are other terms often used to refer to intrinsic aliasing. In this section we use the term aliasing to mean intrinsic aliasing; however, most of the remarks apply to both intrinsic and extrinsic aliasing. Aliasing should be detected by the researcher at the time that a model is specified; otherwise, if unnoticed, it may lead to difficulties in the estimation of the model parameters and/or incorrect interpretation of the results. Aliasing has important implications for parameter estimation.
More specifically, aliasing implies that only certain linear combinations of parameters are estimable, while other combinations of the parameters are not. "Nonestimability" due to aliasing is caused by the fact that there are infinitely many sets of parameters that lead to the same set of predicted values (i.e., imply the same model). Consequently, each value of the likelihood function (including the maximum value) can be obtained with infinitely many sets of parameters.

To resolve a problem with aliasing so that a unique solution in a given parameter space can be obtained, the common practice is to impose additional constraints on the parameters in a specified model. Although constraints can be chosen arbitrarily out of infinitely many possibilities, some choices are more natural than others. We choose constraints in such a way as to facilitate interpretation of the parameters in the model. At the same time, it is worthwhile to point out that the choice of constraints does not affect the meaning (or interpretation) of the model itself. It should also be noted that constraints imposed on parameters should not be considered part of the model specification. Rather, constraints are a convenient way to resolve the issue of nonestimability caused by aliasing. In the case of aliasing of the β parameters in standard linear models (Example 1 following), many software packages by default impose constraints on the parameters to avoid aliasing, and it is the user's responsibility to determine what constraints are used. In Example 2, we consider aliasing of covariance parameters.

Example 1. A linear model with an intercept and a gender factor (Model E1).

Most commonly, intrinsic aliasing is encountered in linear models involving categorical fixed factors as covariates. Consider, for instance, a hypothetical linear model, Model E1, with an intercept and gender considered as a fixed factor. Suppose that this model involves three corresponding fixed-effect parameters: μ (the intercept), μF for females, and μM for males. The X design matrix for this model has three columns: a column containing an indicator variable for the intercept (a column of ones), a column containing an indicator variable for females, and a column containing an indicator variable for males. Note that this design matrix is not of full rank.

Consider a transformation T1 of the fixed-effect parameters μ, μF, and μM, such that a constant C is added to μ and the same constant is subtracted from both μF and μM. Transformation T1 is artificially constructed in such a way that any transformed set of parameters μ′ = μ + C, μF′ = μF − C, and μM′ = μM − C generates predicted values that are the same as in Model E1. In other words, the model implied by any transformed set of parameters is indistinguishable from Model E1.

Note that the linear combinations μ + μF, μ + μM, or C1 × μF + C2 × μM, where C1 + C2 = 0, are not affected by transformation T1, because μ′ + μF′ = μ + μF, μ′ + μM′ = μ + μM, and C1 × μF′ + C2 × μM′ = C1 × μF + C2 × μM. All linear combinations of parameters unaffected by transformation T1 are estimable. In contrast, the individual parameters μ, μF, and μM are affected by transformation T1 and, consequently, are not estimable. To resolve this issue of nonestimability, we impose constraints on μ, μF, and μM. Out of an infinite number of possibilities, we arbitrarily constrain μ to be zero.
This constraint was selected so that it allows us to directly interpret μF and μM as the means of the dependent variable for females and males, respectively. In SAS proc mixed, for example, such a constraint can be accomplished by using the noint option in the model statement. By default, using the solution option in the model statement of proc mixed would constrain μM to be equal to zero, meaning that μ would be interpreted as the mean of the dependent variable for males, and μF would represent the difference in the mean for females compared to males.

Example 2. An LMM with aliased covariance parameters (Model E2).

Consider an LMM (Model E2) with the only fixed effect being the intercept, one random effect associated with the intercept for each subject (resulting in a single covariance parameter, σ²_int), and a compound symmetry covariance structure for the residuals associated with repeated observations on the same subject (resulting in two covariance parameters, σ² and σ1; see the compound symmetry covariance structure for the Ri matrix in Subsection 2.2.2.2). In the marginal Vi matrix for observations on subject i that is implied by this model, the diagonal elements (i.e., the marginal variances) are equal to σ² + σ1 + σ²_int, and the off-diagonal elements (i.e., the marginal covariances) are equal to σ1 + σ²_int.

Consider a transformation T2, such that a constant C is added to σ²_int, and the same constant C is subtracted from σ1. We assume that the possible values of C in transformation T2 are constrained to those for which the matrices D and Ri remain positive-definite. Transformation T2 is constructed in such a way that any transformed set of parameters σ²_int + C and σ1 − C implies the same marginal variance-covariance matrix, Vi, and consequently, the marginal distribution of the dependent variable is the same as in Model E2, which means that all of these models are indistinguishable. Moreover, after applying transformation T2, the matrix Ri remains compound symmetric, as needed.

The linear combinations of covariance parameters σ² + σ1 + σ²_int and σ1 + σ²_int (i.e., the elements in the Vi matrix) are not affected by transformation T2. In other words, these linear combinations are estimable. Due to aliasing, the individual parameters σ²_int and σ1 are not estimable.

To resolve this issue of nonestimability, we impose constraints on σ²_int and σ1. One possible constraint to consider, out of infinitely many, is σ1 = 0, which is equivalent to assuming that the residuals are not correlated and have constant variance (σ²). In other words, the Ri matrix no longer has a compound symmetry structure, but rather a structure with constant variance on the diagonal and all covariances equal to zero. If such a constraint is not defined by the user, then the corresponding likelihood function based on all parameters has an infinite number of ML solutions. Consequently, the algorithm used for optimization in software procedures may not converge to a solution at all, or it may impose arbitrary constraints on the parameters and converge. In such a case, software procedures will generally issue a warning, such as "Invalid likelihood," "Hessian not positive-definite," or "Convergence not achieved" (among others). In all these instances, parameter estimates and their standard errors may be invalid and should be interpreted with caution.
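The rank deficiency behind Example 1 can be seen directly in R. The sketch below uses hypothetical toy data (not Model E1 from any of the book's case studies); note that R's lm() resolves the aliasing with its own default constraint (the effect of the first factor level is set to zero), which differs from the proc mixed default described above.

# Hypothetical toy data: two females and two males
gender <- factor(c("F", "F", "M", "M"))
y <- c(5.1, 4.9, 6.2, 5.8)

# Design matrix with an intercept plus indicators for both genders (Model E1 style)
X <- cbind(intercept = 1, female = as.numeric(gender == "F"),
           male = as.numeric(gender == "M"))
qr(X)$rank    # rank 2 with 3 columns: the three parameters are intrinsically aliased

# R's default constraint: the first level ("F") is the reference, so the intercept
# is the female mean and the genderM coefficient is the male-female difference
coef(lm(y ~ gender))

# Dropping the intercept instead estimates mu_F and mu_M directly
coef(lm(y ~ 0 + gender))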
We discuss aliasing of covariance parameters, and illustrate how each software procedure handles it, in the analysis of the Dental Veneer data in Chapter 7.

2.9.4 Missing Data

In general, analyses using LMMs are carried out under the assumption that missing data in clustered or longitudinal data sets are missing at random (MAR) (see Little & Rubin, 2002, or Allison, 2001, for a more thorough discussion of missing data patterns and mechanisms). Under the assumption that missing data are MAR, inferences based on methods of ML estimation in LMMs are valid (Verbeke & Molenberghs, 2000).

The MAR pattern means that the probability of having missing data on a given variable may depend on other observed information, but does not depend on the data that would have been observed but were in fact missing. For example, if subjects in a study do not report their weight because the actual (unobserved or missing) weights are too large or too small, then the missing weight data are not MAR. Likewise, if a rat pup's birth weight is not collected because it is too small or too large for a measurement device to accurately detect it, the information is not MAR. However, if a subject's current weight is not related to the probability that he or she reports it, but rather the likelihood of failing to report it depends on other observed information (e.g., illness or previous weight), then the data can be considered MAR. In this case, an LMM for the outcome of current weight should consider the inclusion of covariates, such as previous weight and illness, that are related to the nonavailability of current weight.

Missing data are quite common in longitudinal studies, often due to dropout. Multivariate repeated-measures ANOVA models are often used in practice to analyze repeated-measures or longitudinal data sets, but LMMs offer two primary advantages over these multivariate approaches when there are missing data. First, they allow subjects being followed over time to have unequal numbers of measurements (i.e., some subjects may have missing data at certain time points). If a subject does not have data for the response variable present at all time points in a longitudinal or repeated-measures study, the subject's entire set of data is omitted in a multivariate ANOVA (this is known as listwise deletion); the analysis therefore involves complete cases only. In an LMM analysis, all observations that are available for a given subject are used in the analysis. Second, when analyzing longitudinal data with repeated-measures ANOVA techniques, time is considered to be a within-subject factor, where the levels of the time factor are assumed to be the same for all subjects. In contrast, LMMs allow the time points when measurements are collected to vary for different subjects. Because of these key differences, LMMs are much more flexible analytic tools for longitudinal data than repeated-measures ANOVA models, under the assumption that any missing data are MAR.

We advise readers to inspect longitudinal data sets thoroughly for problems with missing data. If the vast majority of subjects in a longitudinal study have data present at only a single time point, an LMM approach may not be warranted, because there may not be enough information present to estimate all of the desired covariance parameters in the model. In this situation, simpler regression models should probably be considered, because issues of within-subject dependency in the data may no longer apply.
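The first advantage noted above (use of all available observations rather than complete cases only) is easy to see with a small simulated example. The following R sketch uses made-up long-format data and is purely illustrative of the general point.

library(lme4)

# Made-up long-format longitudinal data: 10 subjects, 4 time points,
# with six response values deleted completely at random
set.seed(1)
d <- expand.grid(id = factor(1:10), time = 0:3)
d$y <- 10 + 2 * d$time + rep(rnorm(10), times = 4) + rnorm(40)
d$y[sample(nrow(d), 6)] <- NA

# The LMM keeps every subject's available rows (only rows with NA are dropped)
fit <- lmer(y ~ time + (1 | id), data = d)
nobs(fit)    # 34 of the 40 observations are used

# A classical repeated-measures analysis on the wide data would use complete
# cases only, discarding every subject with any missing time point
wide <- reshape(as.data.frame(d), idvar = "id", timevar = "time",
                direction = "wide")
sum(complete.cases(wide))    # number of subjects contributing complete data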
When analyzing clustered data sets (such as students nested within classrooms), clusters may be of unequal sizes, or there may be data within clusters that are MAR. These problems result in unbalanced data sets, in which an unequal number of observations is collected for each cluster. LMMs can be fitted to unbalanced clustered data sets, again under the assumption that any missing data are MAR. Quite similar to the analysis of longitudinal data sets, multivariate techniques or techniques requiring balanced data break down when attempting to analyze unbalanced clustered data. LMMs allow one to make valid inferences when modeling these types of clustered data, which arise frequently in practice.

2.9.5 Centering Covariates

Centering covariates at specific values (i.e., subtracting a specific value, such as the mean, from the observed values of a continuous covariate) has the effect of changing the intercept in the model, so that it represents the expected value of the dependent variable at a specific value of the covariate (e.g., the mean), rather than the expected value when the covariate is equal to zero (which is often outside the range of the observed data). This type of centering is often known as grand mean centering. We consider grand mean centering of covariates in the analysis of the Autism data in Chapter 6.

An alternative centering procedure is to center continuous covariates at the mean values of the higher-level clusters or groups within which the units measured on the continuous covariates are nested. This type of centering procedure, sometimes referred to as group mean centering, changes the interpretation of both the intercept and the estimated fixed effects for the centered covariates, unlike grand mean centering. This type of centering requires an interpretation of the fixed effects associated with the centered covariates that is relative to other units within a higher-level group (or cluster). A fixed effect in this case now reflects the expected change in the dependent variable for a one-unit within-cluster increase in the centered covariate. A thorough overview of these centering options in the context of LMMs can be found in Enders (2013). We revisit this issue in Chapter 6.

2.9.6 Fitting Linear Mixed Models to Complex Sample Survey Data

In many applied fields, scientific advances are driven by secondary analyses of large survey data sets. Survey data are often collected from large, representative probability samples of specified populations, where a formal sample design assigns a known probability of selection to all individual elements (people, households, establishments, etc.) in the target population. Many of the larger, nationally representative survey data sets that survey organizations make available for public consumption (e.g., the National Health and Nutrition Examination Survey, or NHANES, in the United States; see http://www.cdc.gov/nchs/nhanes/about_nhanes.htm) are collected from probability samples with so-called "complex" designs, giving rise to the term complex sample survey data. These complex sample designs generally have the following features (Heeringa et al., 2010):

• The sampling frame, or list of population elements (or clusters of population elements, e.g., schools where students will be sampled and surveyed), is stratified into mutually exclusive divisions of elements that are homogeneous within (in terms of the features of survey interest) and heterogeneous between.
This stratification serves to increase the precision of survey estimates and to ensure that the sample is adequately representative of the target population. Public-use survey data sets will sometimes include a variable containing unique codes for each stratum of the population.

• To reduce the costs associated with data collection, naturally occurring clusters of elements (e.g., schools, hospitals, neighborhoods, Census blocks, etc.) are randomly sampled within strata, rather than randomly sampling individual elements. This is often done because sampling frames only contain clusters, rather than the individual elements of interest. This type of cluster sampling gives rise to a hierarchical structure in the data set, where sampled elements nested within the sampled clusters tend to have similar values on the survey features of interest (e.g., similar attitudes within a neighborhood, or similar test performance within a school). This intracluster correlation needs to be accounted for in analyses, so that standard errors of parameter estimates appropriately reflect this dependency among the survey respondents. Public-use survey data sets will sometimes include a variable containing unique codes for each sampled cluster.

• The known probabilities of selection assigned to each case in the target population need not be equal (as in the case of a simple random sample), and these probabilities are inverted to compute what is known as a design weight for each sampled case. These design weights may also be adjusted to account for differential nonresponse among different subgroups of the sample, and further calibrated to produce weighted sample distributions that match known population distributions (Valliant et al., 2013). Public-use survey data sets will include variables containing the final survey weight values for responding cases in a survey, and these weights enable analysts to compute unbiased estimates of population quantities.

Analysts of complex sample survey data are therefore faced with some decisions when working with data sets containing variables that reflect these sampling features. When fitting linear mixed models to complex sample survey data that recognize the hierarchical structures of these data sets (given the cluster sampling), how exactly should these complex design features be accounted for? Completely ignoring these design features when fitting linear mixed models can be problematic for inference purposes, especially if the design features are predictive of (or informative about) the dependent variable of interest. The existing literature generally suggests two approaches that could be used to fit linear mixed models to survey data sets, and we discuss implementation of these two approaches using existing software in the following subsections.

2.9.6.1 Purely Model-Based Approaches

The majority of the analyses discussed in this book can be thought of as "model-based" approaches to data analysis, where a probability model is specified for a dependent variable of interest, and one estimates the parameters defining that probability model. The probability models in this book are defined by fixed effects and covariance parameters associated with random effects, and the dependent variables considered have marginal normal distributions defined by these parameters.
When analyzing survey data following this framework, the key complex sample design features outlined above are simply thought of as additional relevant information for the dependent variable that should be accounted for in the specification of a given probability model.

First, the sampling strata are generally thought of as categorical characteristics of the randomly sampled clusters (which may define Level 2 or Level 3 units, depending on the sample design), and the effects of the strata (or recoded versions of the original stratum variable, to reduce the number of categories) are considered as fixed effects. The reason that the effects of strata enter into linear mixed models as fixed effects is that all possible strata (or divisions of the population) will be included in each hypothetical replication of the sample; there is no sampling variability introduced by having different strata in different samples. The effects associated with the different strata are therefore considered as fixed. The fixed effects of strata can be incorporated into models by following the standard approaches for categorical fixed factors discussed in detail in this book.

Second, the intracluster correlations in the survey data introduced by the cluster sampling are generally accounted for by including random effects of the randomly sampled clusters in the specification of a given linear mixed model. In this sense, the data hierarchy of a given survey data set needs to be carefully considered. For example, an analyst may be interested in modeling between-school variance when fitting a linear mixed model, but the cluster variable in that survey data set may represent a higher-level unit of clustering (such as a county, where multiple schools may be included in the sample from a given county). In this example, the analyst may need to consider a three-level linear mixed model, where students are nested within schools, and schools are nested within clusters (or counties). Model-based approaches to fitting linear mixed models will be most powerful when the models are correctly specified, and correctly accounting for the different possible levels of clustering in a given data set therefore becomes quite important, so that important random effects are not omitted from a model.

Third, when following a purely model-based approach to the analysis of complex sample survey data using linear mixed models, the weights (or characteristics of sample units used to form the weights, e.g., ethnicity in a survey that over-samples minority ethnic groups relative to known population distributions) are sometimes considered as covariates associated with the sampled units. As discussed by Gelman (2007), model-based approaches to the analysis of survey data arising from complex samples need to account for covariates that affect sample selection or nonresponse (and of course also potentially predict the dependent variable of interest). Gelman (2007) proposed a model-based approach to estimating population means and differences in population means (which can be estimated using regression) based on the ideas of post-stratification weighting, a technique commonly used to adjust for differences between samples and populations in survey data.
In this approach, implicit adjustment weights for survey respondents are based on regression models that condition on variables defining post-stratification cells, where corresponding population distributions across those cells are also known. Gelman (2007) called for more work to develop model-based approaches to fitting regression models (possibly with random cluster effects) that simultaneously adjust for these differences between samples and populations using the ideas of post-stratification. Other purely model-based approaches to accounting for sampling weights in multilevel models for survey data are outlined by Korn & Graubard (1999) and Heeringa et al. (2010). In general, these model-based approaches to accounting for survey weights can be readily implemented using existing software capable of fitting multilevel models, given that the weights (or features of respondents used to define the weights) can be included as covariates in the specified models.

2.9.6.2 Hybrid Design- and Model-Based Approaches

In contrast to the purely model-based approaches discussed above, an alternative approach is to incorporate complex sampling weights into the estimation of the parameters in a linear mixed model. We refer to this as a "hybrid" approach, given that it includes features of model-based approaches (where the models include random effects of sampling clusters and fixed effects of sampling strata) and design-based approaches (incorporating sampling weights into the likelihood functions used for estimation, in an effort to compute design-unbiased estimates of model parameters). It is well known that if sampling weights informative about a dependent variable in a linear mixed model are not accounted for when estimating that model, estimates of the parameters may be biased (Carle, 2009).

These "hybrid" approaches offer analysts an advantage over purely model-based approaches, in that the variables used to compute sampling weights may not always be well known (or well communicated by survey organizations in the documentation for their surveys) for inclusion in models. Incorporating the sampling weights into estimation of the parameters in a linear mixed model will generally lead to unbiased estimates of the parameters defining the larger population model that is the objective of the estimation, even if the model has been misspecified in some way. In this sense, unbiased estimates of the parameters in a poorly specified model would be preferred over biased estimates of the parameters in a poorly specified model.

What is critical for these "hybrid" approaches is that software be used that correctly implements the theory of weighted estimation for multilevel models. Initial theory for parameter estimation and variance estimation in this context was communicated by Pfeffermann et al. (1998), and later elaborated on by Rabe-Hesketh & Skrondal (2006), who have implemented the theory in the Stata software procedure gllamm (available at www.gllamm.org). This theory requires that conditional weights be used for estimation at lower levels of the data hierarchy; for example, in a two-level cross-sectional data set with people nested within clusters, the weights associated with individual people need to be conditional on their cluster having been sampled.
Making this more concrete, suppose that the weight provided in a survey data set for a given individual is 100, meaning that (ignoring possible nonresponse and post-stratification adjustments for illustration purposes) their probability of selection into the sample was 1/100. This probability of selection was actually determined based on the probability that their cluster was selected into the sample (say, 1/20), and then the conditional probability that the individual was selected given that the cluster was sampled (say, 1/5). In this sense, the appropriate "Level-1 weight" for implementing this theory for that individual would be 5, or the inverse of their conditional selection probability. Furthermore, the "Level-2 weight" of 20 for the sampled cluster is also needed in the data set; this is because the theory requires these two components of the overall weight at different places in the likelihood function.

There are two important issues that arise in this case. First, survey agencies seldom release these conditional weights (or the information needed to compute them) in public-use data files, and cluster-level weights are also seldom released; in general, only weights for the ultimate Level-1 units accounting for the final probabilities of selection across all stages of sampling are released. Second, these weights need to be scaled for inclusion in the likelihood function used to estimate the model parameters. Scaling becomes most important when an analyst only has access to final weights (as opposed to conditional weights) for Level-1 and Level-2 units, and the final Level-1 weights (representing overall probabilities of selection) need to be appropriately re-scaled to remove their dependence on the Level-2 weights (for the clusters). Scaling can still be important even if the conditional Level-1 weights are in fact provided in a data set (Rabe-Hesketh & Skrondal, 2006). Carle (2009) presents the results of simulation studies examining the performance of the two most popular scaling methods that have emerged in this literature, concluding that they do not tend to differ in performance and still provide better inference than ignoring the weights entirely. Based on this work, we recommend that analysts following these "hybrid" approaches use existing software to examine the sensitivity of their inferences to the different scaling approaches (and differences in inference compared to unweighted analyses).

At the time of the writing of the second edition of this book, two of the software procedures emphasized in this book implement the appropriate theory for this hybrid type of analysis approach: the mixed command in Stata, and the various procedures in the HLM software package. When using the mixed command, the appropriate scaling approaches are currently implemented only for two-level models, and in general, conditional weights need to be computed at each level for higher-level models. The following syntax indicates the general setup of the mixed command for a two-level cross-sectional survey data set with individuals nested within sampling clusters:

. mixed depvar indvar1 i.strata ... [pw = finwgt] || cluster:, pweight(level2wgt) pwscale(size)

In this syntax, we assume that finwgt is the final weight associated with Level-1 respondents, and reflects the overall probability of selection (including the conditional probability of selection for the individuals and the cluster probability of selection).
This is the type of weight that is commonly provided in survey data sets. The pwscale(size) option is then a tool for re-scaling these weights to remove their dependence on the cluster-level weights, which reflect the probabilities of selection for the clusters (level2wgt). If conditional Level-1 weights were available in the data set, the pwscale(gk) scaling option could also be used. In general, all three scaling options (including pwscale(effective)) should be considered in a given analysis, to examine the sensitivity of the results to the different scaling approaches. We note that random cluster intercepts are included in the model to reflect the within-cluster dependency in values on the dependent variable, and fixed effects of the categorical strata are included as well (using i.strata). Obviously the use of these options requires the presence of cluster weights and possibly conditional Level-1 weights in the survey data set. In the absence of cluster weights, one would have to assume that the weights associated with each cluster were 1 (essentially, a simple random sample of clusters was selected). This is seldom the case in practice, and would likely lead to biased parameter estimates. In the HLM software, the conditional lower-level weights and the higher-level cluster weights can be selected by clicking on the Other Settings menu, and then Estimation Settings. Next, click on the button for Weighting, and select the appropriate weight variables at each level. If weights are only available at Level 1 and are not conditional (the usual case), HLM will automatically normalize the weights so that they have a mean of 1. If Level-1 and Level-2 weights are specified, HLM will assume that the Level-1 weights are conditional weights, and normalize these weights so that they sum to the size of the sample within each Level-2 unit (or cluster). In this sense, HLM automatically performs scaling of the weights. See http://www.ssicentral.com/hlm/example6-2.html for more details for higher-level models. Although the approaches above are described for two-level, cross-sectional, clustered survey data sets, there are also interesting applications of these hybrid modeling approaches for longitudinal survey data, where repeated measures are nested within individuals, and individuals may be nested within sampling clusters. The same basic theory with regard to weighting at each level still holds, but the longitudinal data introduce the possibility for unique error covariance structures at Level 1. Heeringa et al. (2010) present an example of fitting a model to data from the longitudinal Health and Retirement Study (HRS) using this type of hybrid approach (as implemented in the gllamm procedure of Stata), and review some of the literature that has advanced the use of these methods for longitudinal survey data. More recently, Veiga et al. (2014) described the theory and methods required to fit multilevel models with specific error covariance structures for the repeated measures at Level 1, and showed how accounting for the survey weights at each level can make a difference in terms of inferences related to trends over time. These authors also make software available that implements this hybrid approach for this specific class of multilevel models. The analysis of data from longitudinal surveys is a ripe area for future research, and we anticipate more developments in this area in the near future. 
Other software packages that currently implement the theory for this hybrid approach include Mplus (see Asparouhov, 2006, 2008), the gllamm add-on command for Stata (Rabe-Hesketh & Skrondal, 2006; see www.gllamm.org for examples), and the MLwiN software package (http://www.bristol.ac.uk/cmm/software/mlwin/). We are hopeful that these approaches will become available in more of the general-purpose software packages in the near future, and we will provide any updates in this regard on the web page for this book.

2.9.7 Bayesian Analysis of Linear Mixed Models

In general, we present a frequentist approach to the analysis of linear mixed models in this book. These approaches involve the specification of a likelihood function corresponding to a given linear mixed model, and subsequent application of the various numerical optimization algorithms discussed earlier in this chapter to find the parameter estimates that maximize that likelihood function. In contrast, one could also take a Bayesian approach to the analysis of linear mixed models. Heuristically speaking, this approach makes use of Bayes' Theorem, and defines a posterior distribution for a parameter vector θ given data denoted by y, f(θ|y), as a function of the product of a prior distribution for that parameter vector, f(θ), and the likelihood function defined by the model of interest and the observed data, f(y|θ). The Bayesian approach treats the data y as fixed and the parameter vector θ as random, allowing analysts to ascribe uncertainty to the parameters of interest based on previous investigations via the prior distributions. Another important distinction between the Bayesian and frequentist approaches is that uncertainty in estimates of variance components describing distributions of random coefficients is incorporated into inferential procedures, as described below.

Given a prior distribution for the vector of parameters θ (specified by the analyst) and the likelihood function defined by the data and a given model, various computational methods (e.g., Markov chain Monte Carlo (MCMC) methods) can be used to simulate draws from the resulting posterior distribution for the parameters of interest. The prior distribution is essentially "updated" by the likelihood defined by a given data set and model to form the posterior distribution. Inferences for the parameters are then based on many simulated draws from the posterior distribution; for example, a 95% credible set for a parameter of interest could be defined by the 0.025 and 0.975 percentiles of the simulated draws from the posterior distribution for that parameter. This approach to inference is considered by Bayesians to be much more "natural" than frequentist inferences based on confidence intervals (which would suggest, for example, that 95% of intervals computed a certain way will cover a true population parameter across hypothetical repeated samples). With the Bayesian approach, which essentially treats parameters of interest as random variables, one could suggest that the probability that a parameter takes a value between the limits defined by a 95% credible set is actually 0.95.

Bayesian approaches also provide a natural method of inference for data sets having a hierarchical structure, such as those analyzed in the case studies in this book, given that the parameters of interest are treated as random variables rather than fixed constants.
For example, in a two-level cross-sectional clustered data set, observations on the dependent variable at Level 1 (the "data") may arise from a normal distribution governed by random coefficients specific to a higher-level cluster; the coefficients in this model define the likelihood function. The random cluster coefficients themselves may arise from a normal distribution governed by mean and variance parameters. Then, these parameters may be governed by some prior distribution, reflecting uncertainty associated with those parameters based on prior studies. Following a Bayesian approach, one can then simulate draws from the posterior distribution for all of the random coefficients and parameters that is defined by these conditional distributions, making inferences about either the random coefficients or the parameters of interest.

There is a vast literature on Bayesian approaches to fitting linear mixed models; we only provide a summary of the approach in this section. General texts providing very accessible overviews of this topic, with many worked examples using existing software, include Gelman et al. (2004), Carlin & Louis (2009), and Jackman (2009). Gelman & Hill (2006) provide a very practical overview of using Bayesian approaches to fit multilevel models using the R software in combination with the BUGS software. We will aim to include additional examples using other software on the book's web page as the various software packages covered in this book progress in making Bayesian approaches more available and accessible. At the time of this writing, fully Bayesian analysis approaches are only available in proc mixed in SAS, and we will provide examples of these approaches on the book's web page.

2.10 Power Analysis for Linear Mixed Models

At the stage where researchers are designing studies that will produce longitudinal or clustered data, a power analysis is often necessary to ensure that sample sizes at each level of the data hierarchy will be large enough to permit detection of effects (or parameters) of scientific interest. In this section, we review methods that can be used to perform power analyses for linear mixed models. We focus on a priori power calculations, which would be performed at the stage of designing a given study. We first consider direct power computations based on known analytic results. Readers interested in more details with regard to study design considerations for LMMs should review van Breukelen & Moerbeek (2013).

2.10.1 Direct Power Computations

Methods and software for performing a priori power calculations for a study that will employ linear mixed models are readily available, and much of the available software can fortunately be obtained free of charge. Helms (1992) and Verbeke & Molenberghs (2000) describe power calculations for linear mixed models based on known analytic formulas, and Spybrook et al. (2011) provide a detailed account of power analysis methods for linear mixed models that do not require simulations.

In terms of existing software, Gałecki & Burzykowski (2013) describe the use of available tools in the freely available R software for performing a priori power calculations for linear mixed models. Spybrook et al. (2011) have developed the freely available Optimal Design software package, which can be downloaded from http://sitemaker.umich.edu/group-based/optimal_design_software.
The documentation for this software, which can also be downloaded free-of-charge from the same web site, provides several detailed examples of a priori power calculations using known analytic results for linear mixed models that are employed by this software. We find these tools to be extremely valuable for researchers designing studies that will collect longitudinal or clustered data sets, and we urge readers to consult these references when designing such studies. For some study designs where closed-form results for power calculations are not readily available, simulations will likely need to be employed. In the next section, we discuss a general approach to the use of simulations to perform power analyses for studies that will employ linear mixed models. Linear Mixed Models: An Overview 2.10.2 57 Examining Power via Simulation In general, there are three steps involved in using simulations to compute the power of a given study design to detect specified parameters of interest in a linear mixed model: 1. Specify the model of interest, using actual values for the parameters of interest which correspond to the values that the research team would like to be able to detect at a specified level of significance. For example, in a simple random intercept model for a continuous outcome variable in a two-level clustered data set with one binary covariate at Level 1 (e.g., treatment), there are four parameters to be estimated: the fixed intercept (or the mean for the control group), the fixed treatment effect, the variance of the random intercepts, and the variance of the errors at Level 1. A researcher may wish to detect a mean difference of 5 units between the treatment and control groups, where the control group is expected to have a mean of 50 on the continuous outcome. The researcher could also use estimated values of the variance of the random intercepts and the variance of the errors at Level 1 based on pilot studies or previous publications. 2. Randomly generate a predetermined number of observations based on the specified model, at each level of the data hierarchy. In the context of the two-level data set described above, the researcher may be able afford data collection in 30 clusters, with 20 observations per cluster. This would involve randomly selecting 30 random effects from the specified distribution, and then randomly selecting 20 draws of values for the outcome variable conditional on each selected random effect (based on the prespecified values of the fixed-effect parameters and 20 random draws of residuals from the specified distribution for the errors). 3. After generating a hypothetical data set, fit the linear mixed model of interest, and use appropriate methods to test the hypothesis of interest or generate inference regarding the target parameter(s). Record the result of the hypothesis test, and repeat these last two steps several hundred (or even thousands) of times, recording the proportion of simulated data sets where the effect of interest can be detected at a specified level of significance. This proportion is the power of the proposed design to detect the effect of interest. We now present an example of a macro in SAS that illustrates this three-step simulation methodology, in the case of a fairly simple linear mixed model. We suppose that a researcher is interested in being able to detect a between-cluster variance component in a two-level cross-sectional design, where individual observations on a continuous dependent variable are nested within clusters. 
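Before filling in the specifics of that example, the program below sketches what such a three-step simulation might look like in SAS. This is our own minimal illustration with placeholder values set via %let statements; it is not the program distributed on the book's web page, and the ODS table and column names used to harvest the likelihood ratio test p-values (CovTests, ProbChiSq) should be confirmed with ods trace in your release of SAS. The specific inputs for the worked example are described next.

   /* Step 1: assumed parameter values (placeholders) */
   %let nsim  = 100;   /* number of simulated data sets       */
   %let nclus = 30;    /* number of clusters                  */
   %let nper  = 40;    /* observations per cluster            */
   %let mu    = 45;    /* overall mean                        */
   %let varb  = 2;     /* between-cluster variance to detect  */
   %let varw  = 64;    /* within-cluster (error) variance     */

   /* Step 2: randomly generate the hypothetical data sets */
   data sim;
      call streaminit(2014);
      do sim = 1 to &nsim;
         do cluster = 1 to &nclus;
            u = rand('normal', 0, sqrt(&varb));      /* random cluster effect */
            do i = 1 to &nper;
               y = &mu + u + rand('normal', 0, sqrt(&varw));
               output;
            end;
         end;
      end;
   run;

   /* Step 3: fit the random intercept model to each simulated data set and
      save the mixture chi-square LRT p-value for the between-cluster variance */
   ods exclude all;
   proc glimmix data = sim;
      by sim;
      model y = ;
      random intercept / subject = cluster;
      covtest glm;                       /* LRT that the G-side variance is zero */
      ods output CovTests = lrtresult;   /* confirm table name with ods trace    */
   run;
   ods exclude none;

   /* Estimated power = proportion of simulations with p < 0.05 */
   data power;
      set lrtresult;
      reject = (ProbChiSq < 0.05);
   run;
   proc means data = power mean;
      var reject;
   run;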
The research question is whether the between-cluster variance component is greater than zero. Based on previous studies, the researcher estimates that the dependent variable has a mean of 45, and a within-cluster variance (error variance) of 64. We do not consider any covariates in this example, meaning that 64 would be the “raw” (or unconditional) variance of the dependent variable within a given cluster. The researcher would like to be able to detect a between-cluster variance component of 2 as being significantly greater than zero with at least 80 percent power when using a 0.05 level of significance. This variance component corresponds to a within-cluster correlation (or ICC) of 0.03. The 80 percent power means that if this ICC exists in the target population, the researcher would be able to detect it 80 times out of 100 when using a significance level of 0.05 and a likelihood ratio test based on a mixture of chi-square distributions (see Subsection 2.6.2.2). The researcher needs to know how many clusters to sample, and how many individuals to sample within each cluster. We have prepared a SAS program on the book’s web page (see Appendix A) for performing this simulation. We use proc glimmix in SAS to fit the linear mixed model to 58 Linear Mixed Models: A Practical Guide Using Statistical Software each simulated data set, which enables the appropriate likelihood ratio test of the null hypothesis that the between-cluster variance is equal to zero based on a mixture of chi-square distributions (via the covtest glm statement). We also take advantage of the ods output functionality in SAS to save the likelihood ratio test p-values in a SAS data file (lrtresult) that can be analyzed further. This program requests 100 simulated data sets, where 40 observations are sampled from each of 30 clusters. One run of this macro results in a computed power value of 0.91, suggesting that these choices for the number of clusters and the number of observations per cluster would result in sufficient power (and that the researcher may be able to sample fewer clusters or observations, given that a power value of 0.8 was desired). This simple example introduces the idea of using simulation for power analysis when fitting linear mixed models and designing longitudinal studies or cluster samples. With a small amount of SAS code, these simulations are straightforward to implement, and similar macros can be written using other languages (SPSS, Stata, R, etc.). For additional reading on the use of simulations to conduct power analyses for linear mixed models, we refer readers to Gelman & Hill (2006) and Galecki & Burzykowski (2013), who provide worked examples of writing simulations for power analysis using the freely available R software. We also refer readers to the freely available MLPowSim software that can be used for similar types of simulation-based power analyses (http://www.bristol.ac.uk/cmm/software/mlpowsim/). 2.11 Chapter Summary LMMs are flexible tools for the analysis of clustered and repeated-measures/longitudinal data that allow for subject-specific inferences. LMMs extend the capabilities of standard linear models by allowing: 1. Unbalanced and missing data, as long as the missing data are MAR; 2. The fixed effects of time-varying covariates to be estimated in models for repeatedmeasures or longitudinal data sets; 3. Structured variance-covariance matrices for both the random effects (the D matrix) and the residuals (the Ri matrix). 
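Together, these two covariance structures determine the implied marginal variance-covariance matrix of the responses for a given cluster or subject i; in the notation of this chapter (a brief reminder rather than new material),

   Var(Y_i) = Z_i D Z_i′ + R_i,

where Z_i is the design matrix for the random effects. Choices about D and R_i therefore jointly govern the variances of, and the correlations among, the observed responses.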
In building an LMM for a specific data set, we aim to specify a model that is appropriate both for the mean structure and the variance-covariance structure of the observed responses. The variance-covariance structure in an LMM should be specified in light of the observed data and a thorough understanding of the subject matter. From a statistical point of view, we aim to choose a simple (or parsimonious) model with a mean and variance-covariance structure that reflects the basic relationships among observations, and maximizes the likelihood of the observed data. A model with a variance-covariance structure that fits the data well leads to more accurate estimates of the fixed-effect parameters and to appropriate statistical tests of significance. 3 Two-Level Models for Clustered Data: The Rat Pup Example 3.1 Introduction In this chapter, we illustrate the analysis of a two-level clustered data set. Such data sets typically include randomly sampled clusters (Level 2) and units of analysis (Level 1), which are randomly selected from each cluster. Covariates can measure characteristics of the clusters or of the units of analysis, so they can be either Level 2 or Level 1 variables. The dependent variable, which is measured on each unit of analysis, is always a Level 1 variable. The models fitted to clustered data sets with two or more levels of data (or to longitudinal data) are often called multilevel models (see Subsection 2.2.4). Two-level models are the simplest examples of multilevel models and are often used to analyze two-level data sets. In this chapter, we consider two-level random intercept models that include only a single random effect associated with the intercept for each cluster. We formally define an example of a two-level random intercept model in Subsection 3.3.2. Study designs that can result in two-level clustered data sets include observational studies on units within clusters, in which characteristics of both the clusters and the units are measured; cluster-randomized trials, in which a treatment is randomly assigned to all units within a cluster; and randomized block design experiments, in which the blocks represent clusters and treatments are assigned to units within blocks. Examples of two-level data sets and related study designs are presented in Table 3.1. This is the first chapter in which we illustrate the analysis of a data set using the five software procedures discussed in this book: proc mixed in SAS, the MIXED command in SPSS, the lme() and lmer() functions in R, the mixed command in Stata, and the HLM2 procedure in HLM. We highlight the SAS software in this chapter. SAS is used for the initial data summary, and for the model diagnostics at the end of the analysis. We also go into the modeling steps in more detail in SAS. 59 60 Linear Mixed Models: A Practical Guide Using Statistical Software TABLE 3.1: Examples of Two-Level Data in Different Research Settings Research Setting/Study Design Sociology Education Toxicology Observational Study ClusterRandomized Trial ClusterRandomized Trial Cluster (random factor) City Block Classroom Litter Covariates Urban vs. 
rural indicator, percentage single-family dwellings Teaching method, teacher years of experience Treatment, litter size Unit of analysis Household Student Rat Pup Dependent variable Household income Test score Birth weight Covariates Number of people in household, own or rent home Gender, age Sex Level of Data Level 2 Level 1 3.2 3.2.1 The Rat Pup Study Study Description Jose Pinheiro and Doug Bates, authors of the lme() function in R, provide the Rat Pup data in their book Mixed-Effects Models in S and S-PLUS (2000). The data come from a study in which 30 female rats were randomly assigned to receive one of three doses (high, low, or control) of an experimental compound. The objective of the study was to compare the birth weights of pups from litters born to female rats that received the high- and low-dose treatments to the birth weights of pups from litters that received the control treatment. Although 10 female rats were initially assigned to receive each treatment dose, three of the female rats in the high-dose group died, so there are no data for their litters. In addition, litter sizes varied widely, ranging from 2 to 18 pups. Because the number of litters per treatment and the number of pups per litter were unequal, the study has an unbalanced design. The Rat Pup data is an example of a two-level clustered data set obtained from a cluster-randomized trial: each litter (cluster) was randomly assigned to a specific level of treatment, and rat pups (units of analysis) were nested within litters. The birth weights of rat pups within the same litter are likely to be correlated because the pups shared the same maternal environment. In models for the Rat Pup data, we include random litter effects (which imply that observations on the same litter are correlated) and fixed effects associated Two-Level Models for Clustered Data:The Rat Pup Example 61 TABLE 3.2: Sample of the Rat Pup Data Set Litter (Level 2) Rat Pup (Level 1) Cluster ID Covariates Unit ID Dependent Variable Covariate LITTER TREATMENT LITSIZE PUP ID WEIGHT SEX 1 Control 12 1 1 Control 12 2 1 Control 12 3 1 Control 12 4 1 Control 12 5 1 Control 12 6 1 Control 12 7 1 Control 12 8 1 Control 12 9 1 Control 12 10 ... 11 Low 16 132 11 Low 16 133 ... 21 High 14 258 21 High 14 259 21 High 14 260 21 High 14 261 ... Note: “...” indicates that a portion of data is not displayed. 6.60 7.40 7.15 7.24 7.10 6.04 6.98 7.05 6.95 6.29 Male Male Male Male Male Male Male Male Female Female 5.65 5.78 Male Male 5.09 5.57 5.69 5.50 Male Male Male Male with treatment. Our analysis uses a two-level random intercept model to compare the mean birth weights of rat pups from litters assigned to the three different doses, after taking into account variation both between litters and between pups within the same litter. A portion of the 322 observations in the Rat Pup data set is shown in Table 3.2, in the “long”1 format appropriate for a linear mixed model (LMM) analysis using the procedures in SAS, SPSS, R, and Stata. Each data row represents the values for an individual rat pup. The litter ID and litter-level covariates TREATMENT and LITSIZE are included, along with the pup-level variables WEIGHT and SEX. Note that the values of TREATMENT and LITSIZE are the same for all rat pups within a given litter, whereas SEX and WEIGHT vary from pup to pup. 
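As a quick data-management check (our own suggestion, not part of the original analysis), once the data have been read into SAS as described in Subsection 3.2.2, one can verify that the litter-level covariates really are constant within each litter; any litter listed by the query below would indicate a data-entry problem:

   proc sql;
      /* list litters whose supposedly litter-level covariates vary across pups */
      select litter
      from ratpup
      group by litter
      having count(distinct treatment) > 1 or count(distinct litsize) > 1;
   quit;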
Each variable in this data set is classified as either a Level 2 or Level 1 variable, as follows: Litter (Level 2) Variables • LITTER = Litter ID number 1 The HLM software requires a different data setup, which will be discussed in Subsection 3.4.5. 62 Linear Mixed Models: A Practical Guide Using Statistical Software • TREATMENT = Dose level of the experimental compound assigned to the litter (high, low, control) • LITSIZE = Litter size (i.e., number of pups per litter) Rat Pup (Level 1) Variables • PUP ID = Unique identifier for each rat pup • WEIGHT = Birth weight of the rat pup (the dependent variable) • SEX = Sex of the rat pup (male, female) 3.2.2 Data Summary The data summary for this example was generated using SAS Release 9.3. A link to the syntax and commands that can be used to carry out a similar data summary in the other software packages is included on the web page for the book (see Appendix A). We first create the ratpup data set in SAS by reading in the tab-delimited raw data file, rat_pup.dat, assumed to be located in the C:\temp directory of a Windows machine. Note that SAS users can optionally import the data directly from a web site (see the second filename statement that has been “commented out”): filename ratpup "C:\temp\rat_pup.dat"; *filename ratpup url "http://www-personal.umich.edu/~bwest/rat_pup.dat"; data ratpup; infile ratpup firstobs = 2 dlm = "09"X; input pup_id weight sex $ litter litsize treatment $; if treatment = "High" then treat = 1; if treatment = "Low" then treat = 2; if treatment = "Control" then treat = 3; run; We skip the first row of the raw data file, containing variable names, by using the firstobs = 2 option in the infile statement. The dlm = "09"X option tells SAS that tabs, having the hexadecimal code of 09, are the delimiters in the data file. We create a new numeric variable, TREAT, that represents levels of the original character variable, TREATMENT, recoded into numeric values (High = 1, Low = 2, and Control = 3). This recoding is carried out to facilitate interpretation of the parameter estimates for TREAT in the output from proc mixed in later analyses (see Subsection 3.4.1 for an explanation of how this recoding affects the output from proc mixed). Next we create a user-defined format, trtfmt, to label the levels of TREAT in the output. Note that assigning a format to a variable can affect the order in which levels of the variable are processed in different SAS procedures; we will provide notes on the ordering of the TREAT variable in each procedure that we use. proc format; value trtfmt 1 = "High" 2 = "Low" 3 = "Control"; run; Two-Level Models for Clustered Data:The Rat Pup Example 63 The following SAS syntax can be used to generate descriptive statistics for the birth weights of rat pups at each level of treatment by sex. The maxdec = 2 option specifies that values displayed in the output from proc means are to have only two digits after the decimal. Software Note: By default the levels of the class variable, TREAT, are sorted by their numeric (unformatted) values in the proc means output, rather than by their alphabetic (formatted) values. The values of SEX are ordered alphabetically, because no format is applied. title "Summary statistics for weight by treatment and sex"; proc means data = ratpup maxdec = 2; class treat sex; var weight; format treat trtfmt.; run; The SAS output displaying descriptive statistics for each level of treatment and sex are shown in the Analysis Variable: weight table below. 
Analysis Variable : weight

Treat     Sex       N Obs     N     Mean   Std Dev   Minimum   Maximum
High      Female       32    32     5.85      0.60      4.48      7.68
          Male         33    33     5.92      0.69      5.01      7.70
Low       Female       65    65     5.84      0.45      4.75      7.73
          Male         61    61     6.03      0.38      5.25      7.13
Control   Female       54    54     6.12      0.69      3.68      7.57
          Male         77    77     6.47      0.75      4.57      8.33

The experimental treatments appear to have a negative effect on mean birth weight: the sample means of birth weight for pups born in litters that received the high- and low-dose treatments are lower than the mean birth weights of pups born in litters that received the control dose. We note this pattern in both female and male rat pups. We also see that the sample mean birth weights of male pups are consistently higher than those of females within all levels of treatment.

We use box plots to compare the distributions of birth weights for each treatment by sex combination graphically. We generate these box plots using the sgpanel procedure, creating panels for each treatment and showing box plots for each sex within each treatment:

title "Boxplots of rat pup birth weights (Figure 3.1)";
ods listing style = journal2;
proc sgpanel data = ratpup;
   panelby treat / novarname columns=3;
   vbox weight / category = sex;
   format treat trtfmt.;
run;

FIGURE 3.1: Box plots of rat pup birth weights for levels of treatment by sex.

The pattern of lower birth weights for the high- and low-dose treatments compared to the control group is apparent in Figure 3.1. Male pups appear to have higher birth weights than females in both the low and control groups, but not in the high group. The distribution of birth weights appears to be roughly symmetric at each level of treatment and sex. The variances of the birth weights are similar for males and females within each treatment but appear to differ across treatments (we will check the assumption of constant variance across the treatment groups as part of the analysis). We also note potential outliers, which are investigated in the model diagnostics (Section 3.10).

Because each box plot pools measurements for rat pups from several litters, the possible effects of litter-level covariates, such as litter size, are not shown in this graph. In Figure 3.2, we use box plots to illustrate the relationship between birth weight and litter size. Each panel shows the distributions of birth weights for all litters ranked by size, within a given level of treatment and sex. We first create a new variable, RANKLIT, to order the litters by size. The smallest litter has a size of 2 pups (RANKLIT = 1), and the largest litter has a size of 18 pups (RANKLIT = 27). After creating RANKLIT, we create box plots as a function of RANKLIT for each combination of TREAT and SEX by using proc sgpanel:

proc sort data = ratpup;
   by litsize litter;
run;

data ratpup2;
   set ratpup;
   by litsize litter;
   if first.litter then ranklit + 1;
   label ranklit = "New Litter ID (Ordered by Size)";
run;

FIGURE 3.2: Litter-specific box plots of rat pup birth weights by treatment level and sex. Box plots are ordered by litter size.
/* Box plots for weight by litsize, for each level of treat and sex */ proc sgpanel data=ratpup2; panelby sex treat / novarname columns=3 ; vbox weight / category = ranklit meanattrs=(size=6) ; format treat trtfmt.; colaxis display=(novalues noticks); run; Figure 3.2 shows a strong tendency for birth weights to decrease as a function of litter size in all groups except males from litters in the low-dose treatment. We consider this important relationship in our models for the Rat Pup data. 3.3 Overview of the Rat Pup Data Analysis For the analysis of the Rat Pup data, we follow the “top-down” modeling strategy outlined in Subsection 2.7.1. In Subsection 3.3.1 we outline the analysis steps, and informally introduce related models and hypotheses to be tested. Subsection 3.3.2 presents a more formal specification of selected models that are fitted to the Rat Pup data, and Subsection 3.3.3 details the hypotheses tested in the analysis. To follow the analysis steps outlined in this section, we refer readers to the schematic diagram presented in Figure 3.3. 66 Linear Mixed Models: A Practical Guide Using Statistical Software FIGURE 3.3: Model selection and related hypotheses for the analysis of the Rat Pup data. 3.3.1 Analysis Steps Step 1: Fit a model with a “loaded” mean structure (Model 3.1). Fit a two-level model with a “loaded” mean structure and random litter-specific intercepts. Model 3.1 includes the fixed effects of treatment, sex, litter size, and the interaction between treatment and sex. The model also includes a random effect associated with the intercept for each litter and a residual associated with each birth weight observation. The residuals are assumed to be independent and identically distributed, given the random and fixed effects, with constant variance across the levels of treatment and sex. Step 2: Select a structure for the random effects (Model 3.1 vs. Model 3.1A). Decide whether to keep the random litter effects in Model 3.1. In this step we test whether the random litter effects associated with the intercept should be omitted from Model 3.1 (Hypothesis 3.1), by fitting a nested model omitting the random effects (Model 3.1A) and performing a likelihood ratio test. Based on the result of this test, we decide to retain the random litter effects in all subsequent models. Two-Level Models for Clustered Data:The Rat Pup Example 67 Step 3: Select a covariance structure for the residuals (Models 3.1, 3.2A, or 3.2B). Decide whether the model should have homogeneous residual variances (Model 3.1), heterogeneous residual variances for each of the treatment groups (Model 3.2A), or grouped heterogeneous residual variances (Model 3.2B). We observed in Figure 3.2 that the within-litter variance in the control group appears to be larger than the within-litter variance in the high and low treatment groups, so we investigate heterogeneity of residual variance in this step. In Model 3.1, we assume that the residual variance is homogeneous across all treatment groups. In Model 3.2A, we assume a heterogeneous residual variance structure, i.e., that the residual variance of the birth weight observations differs for each level of treatment (high, low, and control). In Model 3.2B, we specify a common residual variance for the high and low treatment groups, and a different residual variance for the control group. We test Hypotheses 3.2, 3.3, and 3.4 (specified in Section 3.3) in this step to decide which covariance structure to choose for the residual variance. 
Based on the results of these tests, we choose Model 3.2B as our preferred model at this stage of the analysis. Step 4: Reduce the model by removing nonsignificant fixed effects, test the main effects associated with treatment, and assess model diagnostics. Decide whether to keep the treatment by sex interaction in Model 3.2B (Model 3.2B vs. Model 3.3). Test the significance of the treatment effects in our final model, Model 3.3 (Model 3.3 vs. Model 3.3A). Assess the assumptions for Model 3.3. We first test whether we wish to keep the treatment by sex interaction in Model 3.2B (Hypothesis 3.5). Based on the result of this test, we conclude that the treatment by sex interaction is not significant, and can be removed from the model. Our new model is Model 3.3. The model-building process is complete at this point, and Model 3.3 is our final model. We now focus on testing the main hypothesis of the study: whether the main effects of treatment are equal to zero (Hypothesis 3.6). We use ML estimation to refit Model 3.3 and to fit a nested model, Model 3.3A, excluding the fixed treatment effects. We then carry out a likelihood ratio test for the fixed effects of treatment. Based on the result of the test, we conclude that the fixed effects of treatment are significant. The estimated fixed-effect parameters indicate that, controlling for sex and litter size, treatment lowers the mean birth weight of rat pups in litters receiving the high and low dose of the drug compared to the birth weight of rat pups in litters receiving the control dose. Finally, we refit Model 3.3 using REML estimation to reduce the bias of the estimated covariance parameters, and carry out diagnostics for Model 3.3 using SAS (see Section 3.10 for diagnostics). Software Note: Steps 3 and 4 of the analysis are carried out using proc mixed in SAS, the GENLINMIXED procedure in SPSS, the lme() function in R, and the mixed command in Stata only. The other procedures discussed in this book do not allow us to fit models that have heterogeneous residual variance for groups defined by Level 2 (cluster-level) factors. In Figure 3.3, we summarize the model selection process and hypotheses considered in the analysis of the Rat Pup data. Each box corresponds to a model and contains a brief description of the model. Each arrow corresponds to a hypothesis and connects two models 68 Linear Mixed Models: A Practical Guide Using Statistical Software involved in the specification of that hypothesis. The arrow starts at the box representing the reference model for the hypothesis and points to the simpler (nested) model. A dashed arrow indicates that, based on the result of the hypothesis test, we chose the reference model as the preferred one, and a solid arrow indicates that we chose the nested (null) model. The final model is included in a bold box. 3.3.2 Model Specification In this section we specify selected models considered in the analysis of the Rat Pup data. We summarize the models in Table 3.3. 3.3.2.1 General Model Specification The general specification of Model 3.1 for an individual birth weight observation (WEIGHTij ) on rat pup i within the j-th litter is shown in (3.1). This specification corresponds closely to the syntax used to fit the model using the procedures in SAS, SPSS, R, and Stata. 
WEIGHT_ij = β0 + β1 × TREAT1_j + β2 × TREAT2_j + β3 × SEX1_ij + β4 × LITSIZE_j
            + β5 × TREAT1_j × SEX1_ij + β6 × TREAT2_j × SEX1_ij              (fixed)
            + u_j + ε_ij                                                     (random)        (3.1)

In Model 3.1, WEIGHT_ij is the dependent variable, and TREAT1_j and TREAT2_j are Level 2 indicator variables for the high and low levels of treatment, respectively. SEX1_ij is a Level 1 indicator variable for female rat pups. In this model, WEIGHT_ij depends on the β parameters (i.e., the fixed effects) in a linear fashion. The fixed intercept parameter, β0, represents the expected value of WEIGHT_ij for the reference levels of treatment and of sex (i.e., males in the control group) when LITSIZE_j is equal to zero. We do not interpret the fixed intercept, because a litter size of zero is outside the range of the data (alternatively, the LITSIZE variable could be centered to make the intercept interpretable; see Subsection 2.9.5). The parameters β1 and β2 represent the fixed effects of the dummy variables (TREAT1_j and TREAT2_j) for the high and low treatment levels vs. the control level, respectively. The β3 parameter represents the effect of SEX1_ij (female vs. male), and β4 represents the fixed effect of LITSIZE_j. The two parameters, β5 and β6, represent the fixed effects associated with the treatment by sex interaction.

The random effect associated with the intercept for litter j is indicated by u_j, which is assumed to have a normal distribution with mean of 0 and constant variance σ²_litter. We write the distribution of these random effects as:

   u_j ~ N(0, σ²_litter)

where σ²_litter represents the variance of the random litter effects. In Model 3.1, the distribution of the residual ε_ij, associated with the observation on an individual rat pup i within litter j, is assumed to be the same for all levels of treatment:

   ε_ij ~ N(0, σ²)

In Model 3.2A, we allow the residual variances for observations at different levels of treatment to differ:

   High Treatment:     ε_ij ~ N(0, σ²_high)
   Low Treatment:      ε_ij ~ N(0, σ²_low)
   Control Treatment:  ε_ij ~ N(0, σ²_control)

In Model 3.2B, we consider a separate residual variance for the combined high/low treatment group and for the control group:

   High/Low Treatment: ε_ij ~ N(0, σ²_high/low)
   Control Treatment:  ε_ij ~ N(0, σ²_control)

In Model 3.3, we include the same residual variance structure as in Model 3.2B, but remove the fixed effects, β5 and β6, associated with the treatment by sex interaction from the model. In all models, we assume that the random effects, u_j, associated with the litter-specific intercepts and the residuals, ε_ij, are independent.

3.3.2.2 Hierarchical Model Specification

We now present Model 3.1 in the hierarchical form used by the HLM software, with the same notation as in (3.1). The correspondence between this notation and the HLM software notation is defined in Table 3.3. The hierarchical model has two components, reflecting two sources of variation: namely, variation between litters, which we attempt to explain using the Level 2 model, and variation between pups within a given litter, which we attempt to explain using the Level 1 model. We write the Level 1 model as:

Level 1 Model (Rat Pup)

   WEIGHT_ij = b0j + b1j × SEX1_ij + ε_ij                                    (3.2)

   where ε_ij ~ N(0, σ²)

The Level 1 model in (3.2) assumes that WEIGHT_ij, i.e., the birth weight of rat pup i within litter j, follows a simple ANOVA-type model defined by the litter-specific intercept, b0j, and the litter-specific effect of SEX1_ij, b1j.
Both b0j and b1j are unobserved quantities that are defined as functions of Level 2 covariates in the Level 2 model:

Level 2 Model (Litter)

   b0j = β0 + β1 × TREAT1_j + β2 × TREAT2_j + β4 × LITSIZE_j + u_j
   b1j = β3 + β5 × TREAT1_j + β6 × TREAT2_j                                  (3.3)

   where u_j ~ N(0, σ²_litter)

TABLE 3.3: Selected Models Considered in the Analysis of the Rat Pup Data

Term/Variable                        General      HLM                     Model
                                     Notation     Notation    3.1     3.2A^a    3.2B^a    3.3^a
Fixed effects
  Intercept                          β0           γ00         √       √         √         √
  TREAT1 (High vs. control)          β1           γ02         √       √         √         √
  TREAT2 (Low vs. control)           β2           γ03         √       √         √         √
  SEX1 (Female vs. male)             β3           γ10         √       √         √         √
  LITSIZE                            β4           γ01         √       √         √         √
  TREAT1 × SEX1                      β5           γ11         √       √         √
  TREAT2 × SEX1                      β6           γ12         √       √         √
Random effects: Litter (j)
  Intercept                          u_j          u_j         √       √         √         √
Residuals: Rat pup (pup i in litter j)
  Residual                           ε_ij         r_ij        √       √         √         √
Covariance parameters (θ_D) for D matrix: Litter level
  Variance of intercepts             σ²_litter    τ           √       √         √         √
Covariance parameters (θ_R) for R_i matrix: Rat pup level
  Variances of residuals                                      σ²      σ²_high,  σ²_high/low,  σ²_high/low,
                                                                      σ²_low,   σ²_control    σ²_control
                                                                      σ²_control

^a Models 3.2A, 3.2B, and 3.3 (with heterogeneous residual variances) can only be fit using selected procedures in SAS (proc mixed), SPSS (GENLINMIXED), R (the lme() function), and Stata (mixed).

The Level 2 model in (3.3) assumes that b0j, the intercept for litter j, depends on the fixed intercept, β0, and, for pups in litters assigned to the high- or low-dose treatments, on the fixed effect associated with their level of treatment vs. control (β1 or β2, respectively). The intercept also depends on the fixed effect of litter size, β4, and a random effect, u_j, associated with litter j. The effect of SEX1 within each litter, b1j, depends on an overall fixed SEX1 effect, denoted by β3, and an additional fixed effect of either the high or low treatment vs. control (β5 or β6, respectively). Note that the effect of sex varies from litter to litter only through the fixed effect of the treatment assigned to the litter; there is no random effect associated with sex.

By substituting the expressions for b0j and b1j from the Level 2 model into the Level 1 model, we obtain the general LMM specified in (3.1). The fixed treatment effects, β5 and β6, for TREAT1_j and TREAT2_j in the Level 2 model for the effect of SEX1 correspond to the interaction effects for treatment by sex (TREAT1_j × SEX1_ij and TREAT2_j × SEX1_ij) in the general model specification.

3.3.3 Hypothesis Tests

Hypothesis tests considered in the analysis of the Rat Pup data are summarized in Table 3.4.

Hypothesis 3.1. The random effects, u_j, associated with the litter-specific intercepts can be omitted from Model 3.1. We do not directly test the significance of the random litter-specific intercepts, but rather test a hypothesis related to the variance of the random litter effects. We write the null and alternative hypotheses as follows:

   H0: σ²_litter = 0
   HA: σ²_litter > 0

To test Hypothesis 3.1, we use a REML-based likelihood ratio test. The test statistic is calculated by subtracting the –2 REML log-likelihood value for Model 3.1 (the reference model) from the value for Model 3.1A (the nested model, which omits the random litter effects).
The asymptotic null distribution of the test statistic is a mixture of χ² distributions, with 0 and 1 degrees of freedom, and equal weights of 0.5.

Hypothesis 3.2. The variance of the residuals is the same (homogeneous) for the three treatment groups (high, low, and control). The null and alternative hypotheses for Hypothesis 3.2 are:

   H0: σ²_high = σ²_low = σ²_control = σ²
   HA: At least one pair of residual variances is not equal to each other

We use a REML-based likelihood ratio test for Hypothesis 3.2. The test statistic is obtained by subtracting the –2 REML log-likelihood value for Model 3.2A (the reference model, with heterogeneous variances) from that for Model 3.1 (the nested model). The asymptotic null distribution of this test statistic is a χ² with 2 degrees of freedom, where the 2 degrees of freedom correspond to the 2 additional covariance parameters (i.e., the 2 additional residual variances) in Model 3.2A compared to Model 3.1.

TABLE 3.4: Summary of Hypotheses Tested in the Rat Pup Analysis

Label  Null (H0)                          Alternative (HA)                   Test              Nested         Ref.           Est.     Test Stat. Dist.
                                                                                               Model (H0)     Model (HA)     Method   under H0
3.1    Drop u_j (σ²_litter = 0)           Retain u_j (σ²_litter > 0)         LRT               Model 3.1A     Model 3.1      REML     0.5χ²_0 + 0.5χ²_1
3.2    Homogeneous residual variance      Residual variances are             LRT               Model 3.1      Model 3.2A     REML     χ²_2
       (σ²_high = σ²_low =                not all equal
       σ²_control = σ²)
3.3    Grouped heterogeneous residual     σ²_high ≠ σ²_low                   LRT               Model 3.2B     Model 3.2A     REML     χ²_1
       variance (σ²_high = σ²_low)
3.4    Homogeneous residual variance      Grouped heterogeneous residual     LRT               Model 3.1      Model 3.2B     REML     χ²_1
       (σ²_high/low = σ²_control = σ²)    variance (σ²_high/low ≠ σ²_control)
3.5    Drop TREATMENT × SEX effects       β5 ≠ 0, or β6 ≠ 0                  Type III F-test   N/A            Model 3.2B     REML     F(2, 194)^a
       (β5 = β6 = 0)
3.6    Drop TREATMENT effects             β1 ≠ 0, or β2 ≠ 0                  LRT               Model 3.3A     Model 3.3      ML       χ²_2
       (β1 = β2 = 0)                                                         Type III F-test   N/A            Model 3.3      REML     F(2, 24.3)

^a Different methods for calculating denominator degrees of freedom are available in the software procedures; we report the Satterthwaite estimate of degrees of freedom calculated by proc mixed in SAS.

Hypothesis 3.3. The residual variances for the high and low treatment groups are equal. The null and alternative hypotheses are as follows:

   H0: σ²_high = σ²_low
   HA: σ²_high ≠ σ²_low

We test Hypothesis 3.3 using a REML-based likelihood ratio test. The test statistic is calculated by subtracting the –2 REML log-likelihood value for Model 3.2A (the reference model) from the corresponding value for Model 3.2B (the nested model, with pooled residual variance for the high and low treatment groups). The asymptotic null distribution of this test statistic is a χ² with 1 degree of freedom, where the single degree of freedom corresponds to the one additional covariance parameter (i.e., the one additional residual variance) in Model 3.2A compared to Model 3.2B.

Hypothesis 3.4. The residual variance for the combined high/low treatment group is equal to the residual variance for the control group. In this case, the null and alternative hypotheses are:

   H0: σ²_high/low = σ²_control = σ²
   HA: σ²_high/low ≠ σ²_control

We test Hypothesis 3.4 using a REML-based likelihood ratio test. The test statistic is obtained by subtracting the –2 REML log-likelihood value for Model 3.2B (the reference model) from that for Model 3.1 (the nested model).
The asymptotic null distribution of this test statistic is a χ² with 1 degree of freedom, corresponding to the one additional variance parameter in Model 3.2B compared to Model 3.1.

Hypothesis 3.5. The fixed effects associated with the treatment by sex interaction are equal to zero in Model 3.2B. The null and alternative hypotheses are:

   H0: β5 = β6 = 0
   HA: β5 ≠ 0 or β6 ≠ 0

We test Hypothesis 3.5 using an approximate F-test, based on the results of the REML estimation of Model 3.2B. Because this test is not significant, we remove the treatment by sex interaction term from Model 3.2B and obtain Model 3.3.

Hypothesis 3.6. The fixed effects associated with treatment are equal to zero in Model 3.3. This hypothesis differs from the previous ones, in that it is not being used to select a model, but is testing the primary research hypothesis. The null and alternative hypotheses are:

   H0: β1 = β2 = 0
   HA: β1 ≠ 0 or β2 ≠ 0

We test Hypothesis 3.6 using an ML-based likelihood ratio test. The test statistic is calculated by subtracting the –2 ML log-likelihood value for Model 3.3 (the reference model) from that for Model 3.3A (the nested model excluding the fixed treatment effects). The asymptotic null distribution of this test statistic is a χ² with 2 degrees of freedom, corresponding to the two additional fixed-effect parameters in Model 3.3 compared to Model 3.3A. Alternatively, we can test Hypothesis 3.6 using an approximate F-test for TREATMENT, based on the results of the REML estimation of Model 3.3. For the results of these hypothesis tests see Section 3.5.

3.4 Analysis Steps in the Software Procedures

In this section, we illustrate fitting the LMMs for the Rat Pup example using the software procedures in SAS, SPSS, R, Stata, and HLM. Because we introduce the use of the software procedures in this chapter, we present a more detailed description of the steps and options for fitting each model than we do in Chapters 4 through 8. We compare results for selected models across the software procedures in Section 3.6.

3.4.1 SAS

Step 1: Fit a model with a "loaded" mean structure (Model 3.1).

We assume that the ratpup data set has been created in SAS, as illustrated in the data summary (Subsection 3.2.2). The SAS commands used to fit Model 3.1 to the Rat Pup data using proc mixed are as follows:

ods output fitstatistics = fit1;
title "Model 3.1";
proc mixed data = ratpup order = internal covtest;
   class treat sex litter;
   model weight = treat sex litsize treat*sex / solution ddfm = sat;
   random int / subject = litter;
   format treat trtfmt.;
run;

The ods statement is used to create a data set, fit1, containing the –2 REML log-likelihood and other fit statistics for Model 3.1. We will use the fit1 data set later to perform likelihood ratio tests for Hypotheses 3.1, 3.2, and 3.4. The proc mixed statement invokes the analysis, using the default REML estimation method. We use the covtest option to obtain the standard errors of the estimated covariance parameters for comparison with the results from the other software procedures. The covtest option also causes SAS to display a Wald z-test of whether the variance of the random litter effects equals zero (i.e., Hypothesis 3.1), but we do not recommend using this test (see the discussion of Wald tests for covariance parameters in Subsection 2.6.3.2).
The order = internal option requests that levels of variables declared in the class statement be ordered based on their (unformatted) internal numeric values and not on their formatted values. The class statement includes the two categorical factors, TREAT and SEX, which will be included as fixed predictors in the model statement, as well as the classification factor, LITTER, that defines subjects in the random statement. The model statement sets up the fixed effects. The dependent variable, WEIGHT, is listed on the left side of the equal sign, and the covariates having fixed effects are included on the right of the equal sign. We include the fixed effects of TREAT, SEX, LITSIZE, and the TREAT × SEX interaction in this model. The solution option follows a slash (/), and instructs SAS to display the fixed-effect parameter estimates in the output (they are not displayed by default). The ddfm = option specifies the method used to estimate denominator degrees of freedom for F -tests of the fixed effects. In this case, we use ddfm = sat for the Satterthwaite approximation (see Subsection 3.11.6 for more details on denominator degrees of freedom options in SAS). 76 Linear Mixed Models: A Practical Guide Using Statistical Software Software Note: By default, SAS generates an indicator variable for each level of a class variable included in the model statement. This typically results in an overparameterized model, in which there are more columns in the X matrix than there are degrees of freedom for a factor or an interaction term involving a factor. SAS then uses a generalized inverse (denoted by – in the following formula) to calculate the fixed-effect parameter estimates (see the SAS documentation for proc mixed for more information): = (X V −1 X)− X V −1 y β In the output for the fixed-effect parameter estimates produced by requesting the solution option as part of the model statement, the estimate for the highest level of a class variable is by default set to zero; and the level that is considered to be the highest level for a variable will change depending on whether there is a format associated with the variable or not. In the analysis of the Rat Pup data, we wish to contrast the effects of the high- and low-dose treatments to the control dose, so we use the order = internal option to order levels of TREAT. This results in the parameter for Level 3 of TREAT (i.e., the control dose, which is highest numerically) being set to zero, so the parameter estimates for the other levels of TREAT represent contrasts with TREAT = 3 (control). This corresponds to the specification of Model 3.1 in (3.1). The value of TREAT = 3 is labeled “Control” in the output by the user-defined format. We refer to TREAT = 3 as the reference category throughout our discussion. In general, we refer to the highest level of a class variable as the “reference” level when we estimate models using proc mixed throughout the book. For example, in Model 3.1, the “reference category” for SEX is “Male” (the highest level of SEX alphabetically), which corresponds to our specification in (3.1). The random statement specifies that a random intercept, int, is to be associated with each litter, and litters are specified as subjects by using the subject = litter option. Alternative syntax for the random statement is: random litter ; This syntax results in a model that is equivalent to Model 3.1, but is much less efficient computationally. 
Because litter is specified as a random factor, we get the same blockdiagonal structure for the variance-covariance matrix for the random effects, which SAS refers to as the G matrix, as when we used subject = litter in the previous syntax (see Subsection 2.2.3 for a discussion of the G matrix). However, all observations are assumed to be from one “subject,” and calculations for parameter estimation use much larger matrices and take more time than when subject = litter is specified. The format statement attaches the user-defined format, trtfmt., to values of the variable TREAT. Step 2: Select a structure for the random effects (Model 3.1 vs. Model 3.1A). To test Hypothesis 3.1, we first fit Model 3.1A without the random effects associated with litter, by using the same syntax as for Model 3.1 but excluding the random statement: title "Model 3.1A"; ods output fitstatistics = fitla; proc mixed data = ratpup order = internal covtest; Two-Level Models for Clustered Data:The Rat Pup Example 77 class treat sex litter; model weight = treat sex treat*sex litsize / solution ddfm = sat; format treat trtfmt.; run; The SAS code below can be used to calculate the likelihood ratio test statistic for Hypothesis 3.1, compute the corresponding p-value, and display the resulting p-value in the SAS log. To apply this syntax, the user has to manually extract the –2 REML log-likelihood value for the reference model, Model 3.1 (–2 REML log-likelihood = 401.1), and for the nested model, Model 3.1A (–2 REML log-likelihood = 490.5), from the output and include these values in the code. Recall that the asymptotic null distribution of the test statistic for Hypothesis 3.1 is a mixture of χ20 and χ21 distributions, each having equal weight of 0.5. Because the χ20 has all of its mass concentrated at zero, its contribution to the p-value is zero, so it is not included in the following code. title "P-value for Hypothesis 3.1: Simple syntax"; data _null_; lrtstat = 490.5 - 401.1; df = 1; pvalue = 0.5*(1 - probchi(lrtstat,df)); format pvalue 10.8; put lrtstat = df = pvalue = ; run; Alternatively, we can use the data sets fit1 and fit1a, containing the fit statistics for Models 3.1 and 3.1A, respectively, to perform this likelihood ratio test. The information contained in these two data sets is displayed below, along with more advanced SAS code to merge the data sets, derive the difference of the –2 REML log-likelihoods, calculate the appropriate degrees of freedom, and compute the p-value for the likelihood ratio test. The results of the likelihood ratio test will be included in the SAS log. 
title "Fit 1"; proc print data = fit1; run; title "Fit 1a"; proc print data = fit1a; run; ' $ Fit 1 Obs Descr Value 1 2 3 4 -2 Res Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) 401.1 405.1 405.1 407.7 Fit 1a Obs Descr 1 -2 Res Log Likelihood 2 AIC (smaller is better) 3 AICC (smaller is better) 4 BIC (smaller is better) & Value 490.5 492.5 492.5 496.3 % 78 Linear Mixed Models: A Practical Guide Using Statistical Software title "p-value for Hypothesis 3.1: Advanced syntax"; data _null_; merge fit1(rename = (value = reference)) fit1a(rename = (value = nested)); retain loglik_diff; if descr = "-2 Res Log Likelihood" then loglik_diff = nested - reference; if descr = "AIC (smaller is better)" then do; df = floor((loglik_diff - nested + reference)/2); pvalue = 0.5*(1 - probchi(loglik_diff,df)); put loglik_diff = df = pvalue = ; format pvalue 10.8; end; run; The data _null_ statement causes SAS to execute the data step calculations without creating a new data set. The likelihood ratio test statistic for Hypothesis 3.1 is calculated by subtracting the –2 REML log-likelihood value for the reference model (contained in the data set fit1) from the corresponding value for the nested model (contained in fit1a). To calculate degrees of freedom for this test, we take advantage of the fact that SAS defines the AIC statistic as AIC = –2 REML log-likelihood + 2 × number of covariance parameters. Software Note: When a covariance parameter is estimated to be on the boundary of a parameter space by proc mixed (e.g., when a variance component is estimated to be zero), SAS will not include it when calculating the number of covariance parameters for the AIC statistic. Therefore, the advanced SAS code presented in this section for computing likelihood ratio tests for covariance parameters is only valid if the estimates of the covariance parameters being tested do not lie on the boundaries of their respective parameter spaces (see Subsection 2.5.2). Results from the likelihood ratio test of Hypothesis 3.1 and other hypotheses are presented in detail in Subsection 3.5.1. Step 3: Select a covariance structure for the residuals (Models 3.1, 3.2A, or 3.2B). The following SAS commands can be used to fit Model 3.2A, which allows unequal residual variance for each level of treatment. The only change in these commands from those used for Model 3.1 is the addition of the repeated statement: title "Model 3.2A"; proc mixed data = ratpup order = internal covtest; class treat litter sex; model weight = treat sex treat*sex litsize / solution ddfm = sat; random int / subject = litter; repeated / group = treat; format treat trtfmt.; run; In Model 3.2A, the option group = treat in the repeated statement allows a heterogeneous variance structure for the residuals, with each level of treatment having its own residual variance. Two-Level Models for Clustered Data:The Rat Pup Example 79 Software Note: In general, the repeated statement in proc mixed specifies the structure of the Rj matrix, which contains the variances and covariances of the residuals for the j-th cluster (e.g., litter). If no repeated statement is used, the default covariance structure for the residuals is employed, i.e., Rj = σ 2 Inj , where Inj is an nj ×nj identity matrix, with nj equal to the number of observations in a cluster (e.g., the number of rat pups in litter j). In other words, the default specification is homogeneous residual variance. 
Because this default Rj matrix is used for Model 3.1, we do not include a repeated statement for this model. To test Hypothesis 3.2, we calculate a likelihood ratio test statistic by subtracting the value of the –2 REML log-likelihood for Model 3.2A (the reference model with heterogeneous variance, –2 REML LL = 359.9) from that for Model 3.1 (the nested model, –2 REML LL = 401.1). The simple SAS syntax used to calculate this likelihood ratio test statistic is similar to that used for Hypothesis 3.1. The p-value is calculated by referring the test statistic to a χ2 distribution with two degrees of freedom, which correspond to the two additional variance parameters in Model 3.2A compared to Model 3.1. We do not use a mixture of χ20 and χ21 distributions, as in Hypothesis 3.1, because we are not testing a null hypothesis with values of the variances on the boundary of the parameter space: title "p-value for Hypothesis 3.2"; data _null_; lrtstat = 401.1 - 359.9; df = 2; pvalue = 1 - probchi(lrtstat,df); format pvalue 10.8; put lrtstat = df = pvalue = ; run; The test result is significant (p < 0.001), so we choose Model 3.2A, with heterogeneous residual variance, as our preferred model at this stage of the analysis. Before fitting Model 3.2B, we create a new variable named TRTGRP that combines the high and low treatment groups, to allow us to test Hypotheses 3.3 and 3.4. We also define a new format, TGRPFMT, for the TRTGRP variable. title "RATPUP3 dataset"; data ratpup3; set ratpup2; if treatment in ("High", "Low") then TRTGRP = 1; if treatment = "Control" then TRTGRP = 2; run; proc format; value tgrpfmt 1 = "High/Low" 2 = "Control"; run; We now fit Model 3.2B using the new data set, ratpup3, and the new group variable in the repeated statement (group = trtgrp). We also include TRTGRP in the class statement so that SAS will properly include it as the grouping variable for the residual variance. 80 Linear Mixed Models: A Practical Guide Using Statistical Software title "Model 3.2B"; proc mixed data = ratpup3 order = internal covtest; class treat litter sex trtgrp; model weight = treat sex treat*sex litsize / solution ddfm = sat; random int / subject = litter; repeated / group = trtgrp; format treat trtfmt. trtgrp tgrpfmt.; run; We use a likelihood ratio test for Hypothesis 3.3 to decide if we can use a common residual variance for both the high and low treatment groups (Model 3.2B) rather than different residual variances for each treatment group (Model 3.2A). For this hypothesis Model 3.2A is the reference model and Model 3.2B is the nested model. To calculate the test statistic we subtract the –2 REML log-likelihood for Model 3.2A from that for Model 3.2B (–2 REML LL = 361.1). This test has 1 degree of freedom, corresponding to the one fewer covariance parameter in Model 3.2B compared to Model 3.2A. title "p-value for Hypothesis 3.3"; data _null_; lrtstat = 361.1 - 359.9; df = 1; pvalue = 1 - probchi(lrtstat, df); format pvalue 10.8; put lrtstat = df = pvalue = ; run; The likelihood ratio test statistic for Hypothesis 3.3 is not significant (p = 0.27), so we choose the simpler grouped residual variance model, Model 3.2B, as our preferred model at this stage of the analysis. To test Hypothesis 3.4, and decide whether we wish to have a grouped heterogeneous residual variance structure vs. a homogeneous variance structure, we subtract the –2 REML log-likelihood of Model 3.2B (= 361.1) from that of Model 3.1 (= 401.1). 
The test statistic has 1 degree of freedom, corresponding to the 1 additional covariance parameter in Model 3.2B as compared to Model 3.1. The syntax for this comparison is not shown here. Based on the significant result of this likelihood ratio test (p < 0.001), we conclude that Model 3.2B (with grouped heterogeneous variances) is our preferred model. Step 4: Reduce the model by removing nonsignificant fixed effects (Model 3.2B vs. 3.3, and Model 3.3 vs. 3.3A). We test Hypothesis 3.5 to decide whether we can remove the treatment by sex interaction term, making use of the default Type III F -test for the TREAT × SEX interaction in Model 3.2B. Because the result of this test is not significant (p = 0.73), we drop the TREAT × SEX interaction term from Model 3.2B, which gives us Model 3.3. We now test Hypothesis 3.6 to decide whether the fixed effects associated with treatment are equal to zero, using a likelihood ratio test. This test is not used as a tool for possible model reduction but as a way of assessing the impact of treatment on birth weights. To carry out the test, we first fit the reference model, Model 3.3, using maximum likelihood (ML) estimation: title "Model 3.3 using ML"; proc mixed data = ratpup3 order = internal method = ml; Two-Level Models for Clustered Data:The Rat Pup Example 81 class treat litter sex trtgrp; model weight = treat sex litsize / solution ddfm = sat; random int / subject = litter; repeated / group = trtgrp; format treat trtfmt.; run; The method = ml option in the proc mixed statement requests maximum likelihood estimation. To complete the likelihood ratio test for Hypothesis 3.6, we fit a nested model, Model 3.3A, without the fixed treatment effects, again requesting ML estimation, by making the following modifications to the SAS code for Model 3.3: title "Model 3.3A using ML"; proc mixed data = ratpup3 order = internal method = ml; ... model weight = sex litsize / solution ddfm = sat; ... The likelihood ratio test statistic used to test Hypothesis 3.6 is calculated by subtracting the –2 ML log-likelihood for Model 3.3 (the reference model) from that for Model 3.3A (the nested model without the fixed effects associated with treatment). The SAS code for this test is not shown here. Because the result of this test is significant (p < 0.001), we conclude that treatment has an effect on rat pup birth weights, after adjusting for the fixed effects of sex and litter size and the random litter effects. We now refit Model 3.3, our final model, using the default REML estimation method to get unbiased estimates of the covariance parameters. We also add a number of options to the SAS syntax. We request diagnostic plots in the proc mixed statement by adding a plots= option. We add options to the model statement to get output data sets containing the conditional predicted and residual values (outpred = pdat1) and we get another data set containing the marginal predicted and residual values (outpredm = pdat2). We also request that estimates of the implied marginal variance-covariance and correlation matrices, v and vcorr, for the third litter be displayed in the output by adding the options v = 3 and vcorr = 3 to the random statement. Post-hoc tests for all estimated differences among treatment means using the Tukey–Kramer adjustment for multiple comparisons are requested by adding the lsmeans statement. Finally, we request that the EBLUPs for the random intercept for each litter be saved in a new file called eblupsdat, by using an ods output statement. 
We first sort the ratpup3 data set by PUP ID, because the diagnostic plots identify individual points by row numbers in the data set, and the sorting will make the PUP ID variable equal to the row number in the data set for ease in reading the graphical output. proc sort data = ratpup3; by pup_id; run; ods graphics on; ods rtf file = "c:\temp\ratpup_diagnostics.rtf" style = journal; ods exclude influence; title "Model 3.3 using REML. Model diagnostics"; proc mixed data = ratpup3 order = internal covtest 82 Linear Mixed Models: A Practical Guide Using Statistical Software plots = (residualpanel boxplot influencestatpanel); class treat litter sex trtgrp; model weight = treat sex litsize / solution ddfm = sat residual outpred = pdat1 outpredm = pdat2 influence (iter = 5 effect = litter est) ; id pup_id litter treatment trtgrp ranklit litsize; random int / subject=litter solution v = 3 vcorr = 3 ; repeated / group=trtgrp ; lsmeans treat / adjust = tukey ; format treat trtfmt. trtgrp tgrpfmt. ; ods output solutionR = eblupsdat ; run; ods graphics off; ods rtf close; Software Note: In earlier versions of SAS, the default output mode was “Listing,” which produced text in the Output window. ODS (Output Delivery System) graphics were not automatically produced. However, in Version 9.3, the default output mode has changed to HTML, and ODS graphics are automatically produced. When you use the default HTML output, statistical tables and graphics are included together in the Results Viewer Window. You can change these default behaviors through SAS commands, or by going to Tools > Options > Preferences and clicking on the Results tab. By selecting “Create listing,” you will add text output to the Output window, as in previous versions of SAS. This will be in addition to the HTML output, which can be turned off by deselecting “Create HTML.” To turn off ODS graphics, which can become cumbersome when running large jobs, deselect “Use ODS graphics.” Click “OK” to confirm these changes. These same changes can be accomplished by the following commands: ods listing; ods html close; ods graphics off; The type of output that is produced can be modified by choosing an output type (e.g., .rtf) and sending the output and SAS ODS graphics to that file. There are a number of possible styles, with Journal being a black-and-white option suitable for a manuscript. If no style is selected, the default will be used, which includes color graphics. The .rtf file can be closed when the desired output has been captured. ods graphics on; ods rtf file="example.rtf" style=journal; proc mixed ...; run; ods rtf close; Any portion of the SAS output can be captured in a SAS data set, or can be excluded from the output by using ODS statements. For example, in the SAS code for Model 3.3, we used ods exclude influence. This prevents the influence statistics for each individual observation from being printed in the output, which can get very long if there are a large number of cases, but it still allows the influence diagnostic plots to be Two-Level Models for Clustered Data:The Rat Pup Example 83 displayed. We must also request that the influence statistics be calculated by including an influence option as part of the model statement. Output that we wish to save to a SAS data set can be requested by using an ods output statement. The statement below requests that the solutions for the random effects (the SolutionR table from the output) be output to a new data set called eblupsdat. 
ods output solutionR = eblupsdat; To view the names of SAS output tables so that they can be captured in a data set, use the following code, which will place the names of each portion of the output in the SAS log. ods trace on; proc mixed ...; run; ods trace off; The proc mixed statement for Model 3.3 has been modified by the addition of the plots = option, with the specific ODS plots that are requested listed within parentheses: (residualpanel boxplot influencestatpanel). The residualpanel suboption generates diagnostic plots for both conditional and marginal residuals. Although these residual plots were requested, we do not display them here, because these plots are not broken down by TRTGRP. Histograms and normal quantile–quantile (Q–Q) plots of residuals for each treatment group are displayed in Figures 3.4, 3.5, and 3.6. The model statement has also been modified by adding the residual option, to allow the generation of panels of residual diagnostic plots as part of the ODS graphics output. The boxplot suboption requests box plots of the marginal and conditional residuals by the levels of each class variable, including class variables specified in the subject = and group = options, to be created in the ODS graphics output. SAS also generates box plots for levels of the “subject” variable (LITTER), but only if we do not use nesting specification for litter in the random statement (i.e., we must use subject = litter rather than subject = litter(treat)). Box plots showing the studentized residuals for each litter are shown in Figure 3.7. The influencestatpanel suboption requests that influence plots for the model fit (REML distance), overall model statistics, covariance parameters, and fixed effects be produced. These plots are illustrated in Figures 3.8 through 3.11. Diagnostic plots generated for Model 3.3 are presented in Section 3.10. The influence option has also been added to the model statement for Model 3.3 to obtain influence plots as part of the ODS graphics output (see Subsection 3.10.2). The inclusion of the iter = suboption is used to produce iterative updates to the model, by removing the effect of each litter, and then re-estimating all model parameters (including both fixed-effect and random-effect parameters). The option effect = specifies an effect according to which observations are grouped, i.e., observations sharing the same level of the effect are removed as a group when calculating the influence diagnostics. The effect must contain only class variables, but these variables do not need to be contained in the model. Without the effect = suboption, influence statistics would be created for the values for the individual rat pups and not for litters. The influence diagnostics are discussed in Subsection 3.10.2. 84 Linear Mixed Models: A Practical Guide Using Statistical Software We caution readers that running proc mixed with these options (e.g., iter = 5 and effect = litter) can cause the procedure to take considerably longer to finish running. In our example, with 27 litters and litter-specific subset deletion, the longer execution time is a result of the fact that proc mixed needs to fit 28 models, i.e., the initial model and the model corresponding to each deleted litter, with up to five iterations per model. Clearly, this can become very time-consuming if there are a large number of levels of a variable that is being checked for influence. 
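To make the litter-deletion idea concrete, the following sketch refits the model once with each litter removed and compares the resulting fixed-effect estimates with those from the full data set. It is written in R using the lme() function (it parallels, but is not, the computation that proc mixed performs when effect = litter is specified), and it assumes the ratpup data frame with the SEX1 and TRTGRP variables that are created in the R analysis of Subsection 3.4.3.

library(nlme)

# Final model (Model 3.3) fit to the full data set
full.fit <- lme(weight ~ sex1 + litsize + treatment,
                random = ~ 1 | litter, data = ratpup, method = "REML",
                weights = varIdent(form = ~ 1 | trtgrp))

# Refit the model once with each litter deleted, and collect the
# fixed-effect estimates from each reduced-data fit
litters <- unique(ratpup$litter)
beta.minus.j <- t(sapply(litters, function(j) {
  fit.j <- lme(weight ~ sex1 + litsize + treatment,
               random = ~ 1 | litter,
               data = ratpup[ratpup$litter != j, ],
               method = "REML",
               weights = varIdent(form = ~ 1 | trtgrp))
  fixef(fit.j)
}))
rownames(beta.minus.j) <- litters

# Change in each fixed-effect estimate when a given litter is deleted;
# litters producing comparatively large changes are potentially influential
round(sweep(beta.minus.j, 2, fixef(full.fit)), 4)

As in the SAS run, this requires one additional model fit per litter (28 fits in total for the Rat Pup data), so the same caution about run time applies.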
The outpred = option in the model statement causes SAS to output the predicted values and residuals conditional on the random effects in the model. In this example, we output the conditional residuals and predicted values to a data set called pdat1 by specifying outpred = pdat1. The outpredm = option causes SAS to output the marginal predicted and residual values to another data set. In this case, we request that these variables be output to the pdat2 data set by specifying outpredm = pdat2. We discuss these residuals in Subsection 3.10.1. The id statement allows us to place variables in the output data sets, pdat1 and pdat2, that identify each observation. We specify that PUP ID, LITTER, TREATMENT, TRTGRP, RANKLIT, and LITSIZE be included, so that we can use them later in the residual diagnostics. The random statement has been modified to include the v = and vcorr = options, so that SAS displays the estimated marginal Vj matrix and the corresponding marginal correlation matrix implied by Model 3.3 for birth weight observations from the third litter (v = 3 and vcorr = 3). We chose the third litter in this example because it has only four rat pups, to keep the size of the estimated Vj matrix in the output manageable (see Section 3.8 for a discussion of the implied marginal covariance matrix). We also include the solution option, to display the EBLUPs for the random litter effects in the output. The lsmeans statement allows us to obtain estimates of the least-squares means of WEIGHT for each level of treatment, based on the fixed-effect parameter estimates for TREAT. The least-squares means are evaluated at the mean of LITSIZE, and assuming that there are equal numbers of rat pups for each level of SEX. We also carry out post-hoc comparisons among all pairs of the least-squares means using the Tukey–Kramer adjustment for multiple comparisons by specifying adjust = tukey. Many other adjustments for multiple comparisons can be obtained, such as Dunnett’s and Bonferroni. Refer to the SAS documentation for proc mixed for more information on post-hoc comparison methods available in the lsmeans statement. Diagnostics for this final model using the REML fit for Model 3.3 are presented in Section 3.10. 3.4.2 SPSS Most analyses in SPSS can be performed using either the menu system or SPSS syntax. The syntax for LMMs can be obtained by specifying a model using the menu system and then pasting the syntax into the syntax window. We recommend pasting the syntax for any LMM that is fitted using the menu system, and then saving the syntax file for documentation. We present SPSS syntax throughout the example chapters for ease of presentation, although the models were usually set up using the menu system. A link to an example of setting up an LMM using the SPSS menus is included on the web page for this book (see Appendix A). For the analysis of the Rat Pup data, we first read in the raw data from the tab-delimited file rat pup.dat (assumed to be located in the C:\temp folder) using the following syntax. This SPSS syntax was pasted after reading in the data using the SPSS menu system. * Read in Rat Pup data. GET DATA Two-Level Models for Clustered Data:The Rat Pup Example 85 /TYPE = TXT /FILE = "C:\temp\rat_pup.dat" /DELCASE = LINE /DELIMITERS = "\t" /ARRANGEMENT = DELIMITED /FIRSTCASE = 2 /IMPORTCASE = ALL /VARIABLES = pup_id F2.1 weight F4.2 sex A6 litter F1. litsize F2.1 treatment A7 . CACHE. EXECUTE. 
Because the MIXED command in SPSS sets the fixed-effect parameter associated with the highest-valued level of a fixed factor to 0 by default, to prevent overparameterization of models (similar to proc mixed in SAS; see Subsection 3.4.1), the highest-valued levels of fixed factors can be thought of as “reference categories” for the factors. As a result, we recode TREATMENT into a new variable named TREAT, so that the control group (TREAT = 3) will be the reference category. * Recode TREATMENT variable . RECODE Treatment ("High"=1) ("Low"=2) ("Control"=3) INTO treat . EXECUTE . VARIABLE LABEL treat "Treatment" . VALUE LABELS treat 1 "High" 2 "Low" 3 "Control" . Step 1: Fit a model with a “loaded” mean structure (Model 3.1). The following SPSS syntax can be used to fit Model 3.1: * Model 3.1. MIXED weight BY treat sex WITH litsize /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = treat sex litsize treat*sex | SSTYPE(3) /METHOD = REML /PRINT = SOLUTION /RANDOM INTERCEPT | SUBJECT(litter) COVTYPE(VC) /SAVE = PRED RESID . The first variable listed after invoking the MIXED command is the dependent variable, WEIGHT. The BY keyword indicates that the TREAT and SEX variables are to be considered as categorical factors (they can be either fixed or random). Note that we do not need 86 Linear Mixed Models: A Practical Guide Using Statistical Software to include LITTER as a factor, because this variable is identified as a SUBJECT variable later in the code. The WITH keyword identifies continuous covariates, and in this case, we specify LITSIZE as a continuous covariate. The CRITERIA subcommand specifies default settings for the convergence criteria obtained by specifying the model using the menu system. In the FIXED subcommand, we include terms that have fixed effects associated with them in the model: TREAT, SEX, LITSIZE and the TREAT × SEX interaction. The SSTYPE(3) option after the vertical bar indicates that the default Type III analysis is to be used when calculating F -statistics. We also use the METHOD = REML subcommand, which requests that the REML estimation method (the default) be used. The SOLUTION keyword in the PRINT subcommand specifies that the estimates of the fixed-effect parameters, covariance parameters, and their associated standard errors are to be included in the output. The RANDOM subcommand specifies that there is a random effect in the model associated with the INTERCEPT for each level of the SUBJECT variable (i.e., LITTER). The information about the “subject” variable is specified after the vertical bar (|). Note that because we included LITTER as a “subject” variable, we did not need to list it after the BY keyword (including LITTER after BY does not affect the analysis if LITTER is also indicated as a SUBJECT variable). The COVTYPE(VC) option indicates that the default Variance Components (VC) covariance structure for the random effects (the D matrix) is to be used. We did not need to specify a COVTYPE here because only a single variance associated with the random effects is being estimated. Conditional predicted values and residuals are saved in the working data set by specifying PRED and RESID in the SAVE subcommand. The keyword PRED saves litter-specific predicted values that incorporate both the estimated fixed effects and the EBLUPs of the random litter effects for each observation. 
The keyword RESID saves the conditional residuals that represent the difference between the actual value of WEIGHT and the predicted value for each rat pup, based on the estimated fixed effects and the EBLUP of the random effect for each observation. The set of population-averaged predicted values, based only on the estimated fixed-effect parameters, can be obtained by adding the FIXPRED keyword to the SAVE subcommand, as shown later in this chapter (see Section 3.9 for more details):

/SAVE = PRED RESID FIXPRED

Software Note: There is currently no option to display or save the predicted values of the random litter effects (EBLUPs) in the output in SPSS. However, because all models considered for the Rat Pup data contain a single random intercept for each litter, the EBLUPs can be calculated by simply taking the difference between the "population-averaged" and "litter-specific" predicted values. The values of FIXPRED from the first LMM can be stored in a variable called FIXPRED_1, and the values of PRED from the first model can be stored as PRED_1. We can then compute the difference between these two predicted values and store the result in a new variable that we name EBLUP:

COMPUTE eblup = pred_1 - fixpred_1 .
EXECUTE .

The values of the EBLUP variable, which are constant for each litter, can then be displayed in the output by using this syntax:

SORT CASES BY litter.
SPLIT FILE
  LAYERED BY litter.
DESCRIPTIVES VARIABLES = eblup
  /STATISTICS=MEAN STDDEV MIN MAX.
SPLIT FILE OFF.

Step 2: Select a structure for the random effects (Model 3.1 vs. Model 3.1A).

We now use a likelihood ratio test of Hypothesis 3.1 to decide if the random effects associated with the intercept for each litter can be omitted from Model 3.1. To carry out the likelihood ratio test we first fit a nested model, Model 3.1A, using the same syntax as for Model 3.1 but with the RANDOM subcommand omitted:

* Model 3.1A .
MIXED weight BY treat sex WITH litsize
  /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1)
  SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE)
  LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
  /FIXED = treat sex litsize treat*sex | SSTYPE(3)
  /METHOD = REML
  /PRINT = SOLUTION
  /SAVE = PRED RESID FIXPRED .

The test statistic for Hypothesis 3.1 is calculated by subtracting the –2 REML log-likelihood value associated with the fit of Model 3.1 (the reference model) from that for Model 3.1A (the nested model). These values are displayed in the SPSS output for each model. The null distribution for this test statistic is a mixture of χ²₀ and χ²₁ distributions, each with equal weight of 0.5 (see Subsection 3.5.1). Because the result of this test is significant (p < 0.001), we choose to retain the random litter effects.

Step 3: Select a covariance structure for the residuals (Models 3.1, 3.2A, or 3.2B).

To fit Model 3.2A, with heterogeneous error variances as a function of the TREAT factor, we need to use the GENLINMIXED command in IBM SPSS Statistics (Version 21). This command can fit linear mixed models in addition to generalized linear mixed models, which are not covered in this book. Models with heterogeneous error variances for different groups of clusters cannot be fitted using the MIXED command. We use the following syntax to fit Model 3.2A using the GENLINMIXED command:

* Model 3.2A.
GENLINMIXED /DATA_STRUCTURE SUBJECTS=litter REPEATED_MEASURES=pup_id GROUPING=treat COVARIANCE_TYPE=IDENTITY /FIELDS TARGET=weight TRIALS=NONE OFFSET=NONE /TARGET_OPTIONS DISTRIBUTION=NORMAL LINK=IDENTITY /FIXED EFFECTS=sex litsize treat sex*treat USE_INTERCEPT=TRUE /RANDOM USE_INTERCEPT=TRUE SUBJECTS=litter COVARIANCE_TYPE=VARIANCE_COMPONENTS 88 Linear Mixed Models: A Practical Guide Using Statistical Software /BUILD_OPTIONS TARGET_CATEGORY_ORDER=ASCENDING INPUTS_CATEGORY_ORDER=ASCENDING MAX_ITERATIONS=100 CONFIDENCE_LEVEL=95 DF_METHOD=SATTERTHWAITE COVB=MODEL /EMMEANS_OPTIONS SCALE=ORIGINAL PADJUST=LSD. There are several important notes to consider when fitting this type of model using the GENLINMIXED command: • First, the LITTER variable (or the random factor identifying clusters of observations, more generally) needs to be declared as a Nominal variable in the SPSS Variable View, under the “Measure” column. Continuous predictors with fixed effects included in the model (e.g., LITSIZE) need to be declared as Scale variables in Variable View, and categorical fixed factors with fixed effects included in the model should be declared as Nominal variables. • Second, heterogeneous error variance structures for clustered data can only be set up if some type of repeated measures index is defined for each cluster. We arbitrarily defined the variable PUP ID, which has a unique value for each pup within a litter, as this index (in the REPEATED_MEASURES option of the DATA_STRUCTURE subcommand). This index variable should be a Scale variable. • Third, the grouping factor that defines cases that will have different error variances is defined in the GROUPING option (TREAT). • Fourth, if we desire a simple error covariance structure for each TREAT group of litters, defined only by a constant error variance, we need to use COVARIANCE_TYPE=IDENTITY. Other error covariance structures are possible, but in models for cross-sectional clustered data that already include random cluster effects, additional covariance among the errors (that is not already accounted for by the random effects) is generally unlikely. • Fifth, note that the dependent variable is identified as the TARGET variable, the marginal distribution of the dependent variable is identified as NORMAL, and the IDENTITY link is used (in the context of generalized linear mixed models, these options set up a linear mixed model). • Sixth, the RANDOM subcommand indicates random intercepts for each litter by USE_INTERCEPT=TRUE, with SUBJECTS=litter and COVARIANCE_TYPE=VARIANCE _COMPONENTS. After submitting this syntax, SPSS will generate output for the model in the output viewer. The output generated by the GENLINMIXED command is fairly unusual relative to the other procedures. By default, most of the output appears in “graphical” format, and users need to double-click on the “Model Viewer” portion of the output to open the full set of output windows. To test Hypothesis 3.2, we first need to find the –2 REML loglikelihood value for this model (Model 3.2A). This value can be found in the very first “Model Summary” window, in the footnote (359.9). We subtract this value from the –2 REML log-likelihood value of 401.1 for Model 3.1 (with constant error variance across the treatment groups), and compute a p-value for the resulting chi-square test statistic using the following syntax: COMPUTE hyp32a = SIG.CHISQ(401.1 - 359.9, 2) . EXECUTE . We use two degrees of freedom given the two additional error variance parameters in Model 3.2A relative to Model 3.1. 
The resulting p-value will be shown in the last column of Two-Level Models for Clustered Data:The Rat Pup Example 89 the SPSS data set, and suggests that we reject the null hypothesis (Model 3.1, with equal error variance across treatment groups) and proceed with Model 3.2A (p < 0.001). Before fitting Model 3.2B, we recode the original TREAT variable into a new variable, TRTGRP, that combines the high and low treatment groups (for testing Hypotheses 3.3 and 3.4): RECODE treat (1 = 1) (2 = 1) (3 = 2) into trtgrp . EXECUTE . VALUE LABELS trtgrp 1 "High/Low" 2 "Control". We now fit Model 3.2B using the GENLINMIXED command once again: * Model 3.2B . GENLINMIXED /DATA_STRUCTURE SUBJECTS=litter REPEATED_MEASURES=pup_id GROUPING=trtgrp COVARIANCE_TYPE=IDENTITY /FIELDS TARGET=weight TRIALS=NONE OFFSET=NONE /TARGET_OPTIONS DISTRIBUTION=NORMAL LINK=IDENTITY /FIXED EFFECTS=sex litsize treat sex*treat USE_INTERCEPT=TRUE /RANDOM USE_INTERCEPT=TRUE SUBJECTS=litter COVARIANCE_TYPE=VARIANCE_COMPONENTS /BUILD_OPTIONS TARGET_CATEGORY_ORDER=ASCENDING INPUTS_CATEGORY_ORDER=ASCENDING MAX_ITERATIONS=100 CONFIDENCE_LEVEL=95 DF_METHOD=SATTERTHWAITE COVB=MODEL /EMMEANS_OPTIONS SCALE=ORIGINAL PADJUST=LSD. Note that the only difference from the syntax used for Model 3.2A is the use of the newly recoded TRTGRP variable in the GROUPING option, which will define a common error variance for the high and low treatment groups, and a different error variance for the control group. The resulting –2 REML log-likelihood value for this model is 361.1, and we compute a likelihood ratio test p-value to test this model against Model 3.2A: COMPUTE hyp33 = SIG.CHISQ(361.1 - 359.9, 1) . EXECUTE . The resulting p-value added to the data set (p = 0.27) suggests that we choose the simpler model (Model 3.2B) moving forward. To test Hypothesis 3.4, and compare the fit of Model 3.2B with Model 3.1, we perform another likelihood ratio test: COMPUTE hyp34 = SIG.CHISQ(401.1 - 359.9, 1) . EXECUTE . This test result (p < 0.001) indicates that we should reject the null hypothesis of constant error variance across the treatment groups, and proceed with Model 3.2B, allowing for different error variances in the high/low treatment group and the control group. Software Note: When processing the GENLINMIXED output for Model 3.2B, SPSS users can navigate the separate windows within the “Model Viewer” window. In windows showing the tests for the fixed effects and the estimated fixed effects themselves, we recommend changing the display style to “Table” in the lower left corner of each window. This will show the actual tests and estimates rather than a graphical display. 90 Linear Mixed Models: A Practical Guide Using Statistical Software In the window showing estimates of the covariance parameters, users can change the “Effect” being shown to examine the estimated variance of the random litter effects or the estimates of the error variance parameters, and when examining the estimates of the error variance parameters, users can toggle the (treatment) group being shown in the lower left corner of the window. Step 4: Reduce the model by removing nonsignificant fixed effects (Model 3.2B vs. 3.3, and Model 3.3 vs. 3.3A). We test Hypothesis 3.5 to decide whether we can remove the treatment by sex interaction term, making use of the default Type III F -test for the TREAT × SEX interaction in Model 3.2B. 
The result of this test can be found in the “Model Viewer” window for Model 3.2B, in the window entitled “Fixed Effects.” Because the result of this test is not significant (p = 0.73), we drop the TREAT × SEX interaction term from Model 3.2B, which gives us Model 3.3. We now test Hypothesis 3.6 to decide whether the fixed effects associated with treatment are equal to 0 (in the model omitting the interaction term). While we illustrate the use of likelihood ratio tests based on maximum likelihood estimation to test this hypothesis in the other procedures, we do not have the option of fitting a model using ML estimation when using GENLINMIXED. This is because this command takes the general approach of using penalized quasi-likelihood (PQL) estimation, which can be used for a broad class of generalized linear mixed models. For this reason, we simply refer to the Type III F -test for treatment based on Model 3.3. Here is the syntax to fit Model 3.3: * Model 3.3 . GENLINMIXED /DATA_STRUCTURE SUBJECTS=litter REPEATED_MEASURES=pup_id GROUPING=trtgrp COVARIANCE_TYPE=IDENTITY /FIELDS TARGET=weight TRIALS=NONE OFFSET=NONE /TARGET_OPTIONS DISTRIBUTION=NORMAL LINK=IDENTITY /FIXED EFFECTS=sex litsize treat USE_INTERCEPT=TRUE /RANDOM USE_INTERCEPT=TRUE SUBJECTS=litter COVARIANCE_TYPE=VARIANCE_COMPONENTS /BUILD_OPTIONS TARGET_CATEGORY_ORDER=ASCENDING INPUTS_CATEGORY_ORDER=ASCENDING MAX_ITERATIONS=100 CONFIDENCE_LEVEL=95 DF_METHOD=SATTERTHWAITE COVB=MODEL /EMMEANS_OPTIONS SCALE=ORIGINAL PADJUST=LSD. Note that the interaction term has been dropped from the FIXED subcommand. The resulting F -test for TREAT in the “Fixed Effects” window suggests that TREAT is strongly significant. Pairwise comparisons of the marginal means for each treatment group along with fitted values and residuals (for diagnostic purposes) can be generated using the following syntax: * Model 3.3, pairwise comparisons and diagnostics . GENLINMIXED /DATA_STRUCTURE SUBJECTS=litter REPEATED_MEASURES=pup_id GROUPING=trtgrp COVARIANCE_TYPE=IDENTITY /FIELDS TARGET=weight TRIALS=NONE OFFSET=NONE /TARGET_OPTIONS DISTRIBUTION=NORMAL LINK=IDENTITY /FIXED EFFECTS=sex litsize treat USE_INTERCEPT=TRUE Two-Level Models for Clustered Data:The Rat Pup Example 91 /RANDOM USE_INTERCEPT=TRUE SUBJECTS=litter COVARIANCE_TYPE=VARIANCE_COMPONENTS /BUILD_OPTIONS TARGET_CATEGORY_ORDER=ASCENDING INPUTS_CATEGORY_ORDER=ASCENDING MAX_ITERATIONS=100 CONFIDENCE_LEVEL=95 DF_METHOD=SATTERTHWAITE COVB=MODEL /EMMEANS TABLES=treat COMPARE=treat CONTRAST=PAIRWISE /EMMEANS_OPTIONS SCALE=ORIGINAL PADJUST=SEQBONFERRONI /SAVE PREDICTED_VALUES(PredictedValue) PEARSON_RESIDUALS(PearsonResidual) . Note the two new EMMEANS subcommands: the first requests a table showing pairwise comparisons of the means for TREAT, while the second indicates a sequential Bonferroni adjustment for the multiple comparisons. The resulting comparisons can be found in the “Estimated Means” window of the “Model Viewer” output (where we again recommend using a Table style for the display). In addition, the new SAVE subcommand generates fitted values and residuals based on this model in the data set, which can be used for diagnostic purposes (see Section 3.10). 
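As one simple illustration of how these saved values can be used, the sketch below plots the Pearson residuals against the fitted values separately by treatment group. It is written in R and assumes, purely for illustration, that the SPSS working file (including the saved PredictedValue and PearsonResidual variables) has been exported to a hypothetical CSV file named ratpup_fits.csv.

# Hypothetical export of the SPSS working data set (including the saved
# PredictedValue and PearsonResidual variables) to a CSV file
fits <- read.csv("C:/temp/ratpup_fits.csv")

library(lattice)

# Pearson residuals vs. fitted values, one panel per treatment group
xyplot(PearsonResidual ~ PredictedValue | factor(treat), data = fits,
       xlab = "Predicted weight", ylab = "Pearson residual",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.abline(h = 0, lty = 2)
       })

Plots of this kind serve the same purpose as the residual diagnostics discussed for the other procedures in Section 3.10.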
3.4.3 R Before starting the analysis in R, we first import the tab-delimited data file, rat_pup.dat (assumed to be located in the C:\temp directory), into a data frame object in R named ratpup.2 > ratpup <- read.table("c:\\temp\\rat_pup.dat", h = T) > attach(ratpup) The h = T argument in the read.table() function indicates that the first row (the header) in the rat_pup.dat file contains variable names. After reading the data, we “attach” the ratpup data frame to R’s working memory so that the columns (i.e., variables) in the data frame can be easily accessed as separate objects. Note that we show the “>”prompt for each command as it would appear in R, but this prompt is not typed as part of the commands. To facilitate comparisons with the analyses performed using the other software procedures, we recode the variable SEX into SEX1, which is an indicator variable for females, so that males will be the reference group: > ratpup$sex1[sex == "Female"] <- 1 > ratpup$sex1[sex == "Male"] <- 0 We first consider the analysis using the lme() function (from the nlme package), and then replicate as many steps of the analysis as possible using the newer lmer() function (from the lme4 package). 3.4.3.1 Analysis Using the lme() Function Step 1: Fit a model with a “loaded” mean structure (Model 3.1). We first load the nlme package, so that the lme() function will be available for model fitting: > library(nlme) 2 The Rat Pup data set is also available as a data frame object in the nlme package. After loading the package, the name of the data frame object is RatPupWeight. 92 Linear Mixed Models: A Practical Guide Using Statistical Software We next fit the initial LMM, Model 3.1, to the Rat Pup data using the lme() function: > # Model 3.1. > model3.1.fit <- lme(weight ~ treatment + sex1 + litsize + treatment:sex1, random = ~ 1 | litter, data = ratpup, method = "REML") We explain each part of the syntax used for the lme() function below: • model3.1.fit is the name of the object that will contain the results of the fitted model. • The first argument of the function, weight ~ treatment + sex1 + litsize + treatment:sex1, defines a model formula. The response variable, WEIGHT, and the terms that have associated fixed effects in the model (TREATMENT, SEX1, LITSIZE, and the TREATMENT × SEX1 interaction), are listed. The factor() function is not necessary for the categorical variable TREATMENT, because the original treatment variable has string values High, Low, and Control, and will therefore be considered as a factor automatically. We also do not need to declare SEX1 as a factor, because it is an indicator variable having only values of 0 and 1. • The second argument, random = ~ 1 | litter, includes a random effect for each level of LITTER in the model. These random effects will be associated with the intercept, as indicated by ~ 1. • The third argument of the function, ratpup, indicates the name of the data frame object to be used in the analysis. • The final argument, method = "REML", specifies that the default REML estimation method is to be used. By default, the lme() function treats the lowest level (alphabetically or numerically) of a categorical fixed factor as the reference category. This means that “Control” will be the reference category of TREATMENT because “Control” is the lowest level of treatment alphabetically. The relevel() function can also be used to change the reference categories of factors. 
For example, if one desired "High" to be the reference category of treatment, one could use the following function:

> treatment <- relevel(treatment, ref = "High")

We obtain estimates from the model fit by using the summary() function:

> summary(model3.1.fit)

Additional results for the fit of Model 3.1 can be obtained by using other functions in conjunction with the model3.1.fit object. For example, we can obtain F-tests for the fixed effects in the model by using the anova() function:

> anova(model3.1.fit)

The anova() function performs a series of Type I (or sequential) F-tests for the fixed effects in the model, each of which is conditional on the preceding terms in the model specification. For example, the F-test for SEX1 is conditional on the TREATMENT effects, but the F-test for TREATMENT is not conditional on the SEX1 effect. The random.effects() function can be used to display the EBLUPs for the random litter effects:

> # Display the random effects (EBLUPs) from the model.
> random.effects(model3.1.fit)

Step 2: Select a structure for the random effects (Model 3.1 vs. Model 3.1A).

We now test Hypothesis 3.1 to decide whether the random effects associated with the intercept for each litter can be omitted from Model 3.1, using a likelihood ratio test. We do this indirectly by testing whether the variance of the random litter effects, σ²litter, is zero vs. the alternative that the variance is greater than zero. We fit Model 3.1A, which is nested within Model 3.1, by excluding the random litter effects. Because the lme() function requires the specification of at least one random effect, we use the gls() function, which is also available in the nlme package, to fit Model 3.1 without the random litter effects. The gls() function fits marginal linear models using REML estimation. We fit Model 3.1A using the gls() function and then compare the –2 REML log-likelihood values for Models 3.1 and 3.1A using the anova() function:

> # Model 3.1A.
> model3.1a.fit <- gls(weight ~ treatment + sex1 + litsize +
    treatment:sex1, data = ratpup)
> anova(model3.1.fit, model3.1a.fit) # Test Hypothesis 3.1.

The anova() function performs a likelihood ratio test by subtracting the –2 REML log-likelihood value for Model 3.1 (the reference model) from the corresponding value for Model 3.1A (the nested model) and referring the difference to a χ² distribution with 1 degree of freedom. The result of this test (p < 0.001) suggests that the random litter effects should be retained in this model. To get the correct p-value for Hypothesis 3.1, however, we need to divide the p-value reported by the anova() function by 2; this is because we are testing the null hypothesis that the variance of the random litter effects equals zero, which is on the boundary of the parameter space for a variance. The null distribution of the likelihood ratio test statistic for Hypothesis 3.1 follows a mixture of χ²₀ and χ²₁ distributions, with equal weight of 0.5 (see Subsection 3.5.1 for more details). Based on the significant result of this test (p < 0.0001), we keep the random litter effects in this model and in all subsequent models.

Step 3: Select a covariance structure for the residuals (Models 3.1, 3.2A, or 3.2B).

We now fit Model 3.2A, with a separate residual variance for each treatment group (σ²high, σ²low, and σ²control).

> # Model 3.2A.
> model3.2a.fit <- lme(weight ~ treatment + sex1 + litsize +
    treatment:sex1, random = ~1 | litter, ratpup, method = "REML",
    weights = varIdent(form = ~1 | treatment))

The arguments of the lme() function are the same as those used to fit Model 3.1, with the addition of the weights argument. The weights = varIdent(form = ~ 1 | treatment) argument sets up a heterogeneous residual variance structure, with observations at different levels of TREATMENT having different residual variance parameters. We apply the summary() function to review the results of the model fit:

> summary(model3.2a.fit)

In the Variance function portion of the following output, note the convention used by the lme() function to display the heterogeneous variance parameters:

Random effects:
 Formula: ~1 | litter
        (Intercept)  Residual
StdDev:   0.3134714 0.5147866

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~ 1 | treatment
 Parameter estimates:
  Control       Low      High
1.0000000 0.5650369 0.6393779

We first note in the Random effects portion of the output that the estimated Residual standard deviation is equal to 0.5147866. The Parameter estimates specify the values by which the residual standard deviation should be multiplied to obtain the estimated standard deviation of the residuals in each treatment group. This multiplier is 1.0 for the control group (the reference group). The multipliers reported for the low and high treatment groups are very similar (0.565 and 0.639, respectively), suggesting that the residual standard deviation is smaller in these two treatment groups than in the control group. The estimated residual variance for each treatment group can be obtained by squaring their respective standard deviations.

To test Hypothesis 3.2, we subtract the –2 REML log-likelihood for the heterogeneous residual variance model, Model 3.2A, from the corresponding value for the model with homogeneous residual variance for all treatment groups, Model 3.1, by using the anova() function.

> # Test Hypothesis 3.2.
> anova(model3.1.fit, model3.2a.fit)

We do not need to adjust the p-value returned by the anova() function for Hypothesis 3.2, because the null hypothesis (stating that the residual variance is identical for each treatment group) does not set a covariance parameter equal to the boundary of its parameter space, as in Hypothesis 3.1. Because the result of this likelihood ratio test is significant (p < 0.001), we choose the heterogeneous variances model (Model 3.2A) as our preferred model at this stage of the analysis.

We next test Hypothesis 3.3 to decide if we can pool the residual variances for the high and low treatment groups. To do this, we first create a pooled treatment group variable, TRTGRP:

> ratpup$trtgrp[treatment == "Control"] <- 1
> ratpup$trtgrp[treatment == "Low" | treatment == "High"] <- 2

We now fit Model 3.2B, using the new TRTGRP variable in the weights = argument to specify the grouped heterogeneous residual variance structure:

> model3.2b.fit <- lme(weight ~ treatment + sex1 + litsize +
    treatment:sex1, random = ~ 1 | litter, ratpup, method = "REML",
    weights = varIdent(form = ~1 | trtgrp))

We test Hypothesis 3.3 using a likelihood ratio test, by applying the anova() function to the objects containing the fits for Model 3.2A and Model 3.2B:

> # Test Hypothesis 3.3.
> anova(model3.2a.fit, model3.2b.fit)

The null distribution of the test statistic in this case is a χ² distribution with one degree of freedom. Because the test is not significant (p = 0.27), we select the nested model, Model 3.2B, as our preferred model at this stage of the analysis.

We use a likelihood ratio test for Hypothesis 3.4 to decide whether we wish to retain the grouped heterogeneous error variances in Model 3.2B or choose the homogeneous error variance model, Model 3.1. The anova() function is also used for this test:

> # Test Hypothesis 3.4.
> anova(model3.1.fit, model3.2b.fit)

The result of this likelihood ratio test is significant (p < 0.001), so we choose the pooled heterogeneous residual variances model, Model 3.2B, as our preferred model. We can view the parameter estimates from the fit of this model using the summary() function:

> summary(model3.2b.fit)

Step 4: Reduce the model by removing nonsignificant fixed effects (Model 3.2B vs. Model 3.3, and Model 3.3 vs. Model 3.3A).

We test Hypothesis 3.5 to decide whether the fixed effects associated with the treatment by sex interaction are equal to zero in Model 3.2B, using a Type I F-test in R. To obtain the results of this test (along with Type I F-tests for all of the other fixed effects in the model), we apply the anova() function to the model3.2b.fit object:

> # Test Hypothesis 3.5.
> anova(model3.2b.fit)

Based on the nonsignificant Type I F-test (p = 0.73), we delete the TREATMENT × SEX1 interaction term from the model and obtain our final model, Model 3.3.

We test Hypothesis 3.6 to decide whether the fixed effects associated with treatment are equal to zero in Model 3.3, using a likelihood ratio test based on maximum likelihood (ML) estimation. We first fit the reference model, Model 3.3, using ML estimation (method = "ML"), and then fit a nested model, Model 3.3A, without the TREATMENT term, also using ML estimation.

> model3.3.ml.fit <- lme(weight ~ treatment + sex1 + litsize,
    random = ~1 | litter, ratpup, method = "ML",
    weights = varIdent(form = ~1 | trtgrp))
> model3.3a.ml.fit <- lme(weight ~ sex1 + litsize,
    random = ~1 | litter, ratpup, method = "ML",
    weights = varIdent(form = ~1 | trtgrp))

We then use the anova() function to carry out the likelihood ratio test of Hypothesis 3.6:

> # Test Hypothesis 3.6.
> anova(model3.3.ml.fit, model3.3a.ml.fit)

The likelihood ratio test result is significant (p < 0.001), so we retain the significant fixed treatment effects in the model. We keep the fixed effects associated with SEX1 and LITSIZE without testing them, to adjust for these fixed effects when assessing the treatment effects. See Section 3.5 for a discussion of the results of all hypothesis tests for the Rat Pup data analysis.

We now refit our final model, Model 3.3, using REML estimation to get unbiased estimates of the variance parameters. Note that we now specify TREATMENT as the last term in the fixed-effects portion of the model, so the Type I F-test reported for TREATMENT by the anova() function will be comparable to the Type III F-test reported by SAS proc mixed.

> # Model 3.3: Final Model.
> model3.3.reml.fit <- lme(weight ~ sex1 + litsize + treatment,
    random = ~1 | litter, ratpup, method = "REML",
    weights = varIdent(form = ~1 | trtgrp))
> summary(model3.3.reml.fit)
> anova(model3.3.reml.fit)

3.4.3.2 Analysis Using the lmer() Function

Step 1: Fit a model with a "loaded" mean structure (Model 3.1).
We first load the lme4 package, so that the lmer() function will be available for model fitting: > library(lme4) We next fit the initial LMM, Model 3.1, to the Rat Pup data using the lmer() function: > # Model 3.1. > model3.1.fit.lmer <- lmer(weight ~ treatment + sex1 + litsize + treatment:sex1 + (1 | litter), ratpup, REML = T) We explain each part of the syntax used for the lmer() function below: • model3.1.fit.lmer is the name of the object that will contain the results of the fitted model. • The first portion of the first argument of the function, weight ~ treatment + sex1 + litsize + treatment:sex1, partly defines the model formula. The response variable, WEIGHT, and the terms that have associated fixed effects in the model (TREATMENT, SEX1, LITSIZE, and the TREATMENT × SEX1 interaction), are listed. The factor() function is not necessary for the categorical variable TREATMENT, because the original treatment variable has string values High, Low, and Control, and will therefore be considered as a factor automatically. We also do not need to declare SEX1 as a factor, because it is an indicator variable having only values of 0 and 1. • The model formula also includes the term (1 | litter), which includes a random effect associated with the intercept (1) for each unique level of LITTER. This is a key difference from the lme() function, where there is a separate random argument needed to specify the random effects in a given model. • The third argument of the function, ratpup, indicates the name of the data frame object to be used in the analysis. • The final argument, REML = T, specifies that the default REML estimation method is to be used. Two-Level Models for Clustered Data:The Rat Pup Example 97 Like the lme() function, the lmer() function treats the lowest level (alphabetically or numerically) of a categorical fixed factor as the reference category by default. This means that “Control” will be the reference category of TREATMENT because “Control” is the lowest level of treatment alphabetically. The relevel() function can also be used in conjunction with the lmer() function to change the reference categories of factors. We obtain estimates from the model fit by using the summary() function: > summary(model3.1.fit.lmer) In the resulting output, we see that the lmer() function only produces t-statistics for the fixed effects, with no corresponding p-values. This is primarily due to the lack of agreement in the literature over appropriate degrees of freedom for these test statistics. The anova() function also does not provide p-values for the F -statistics when applied to a model fit object generated by using the lmer() function. In general, we recommend use of the lmerTest package in R for users interested in testing hypotheses about parameters estimated using the lmer() function. In this chapter and others, we illustrate likelihood ratio tests using selected functions available in the lme4 and lmerTest packages. The ranef() function can be used to display the EBLUPs for the random litter effects: > # Display the random effects (EBLUPs) from the model. > ranef(model3.1.fit.lmer) Step 2: Select a structure for the random effects (Model 3.1 vs. Model 3.1A). For this step, we first load the lmerTest package, which enables likelihood ratio tests of hypotheses concerning the variances of random effects in models fitted using the lmer() function (including models with only a single random effect, such as Model 3.1). 
As an alternative to loading the lme4 package, R users may also simply load the lmerTest package first, which includes all related packages required for its use. > library(lmerTest) We once again employ the lmer() function and fit Model 3.1 to the Rat Pup data, after loading the lmerTest package: > # Model 3.1. > model3.1.fit.lmer <- lmer(weight ~ treatment + sex1 + litsize + treatment:sex1 + (1 | litter), ratpup, REML = T) We then apply the summary() function to this model fit object: > summary(model3.1.fit.lmer) We note that the lmer() function now computes p-values for all of the fixed effects included in this model, using a Satterthwaite approximation of the degrees of freedom for this test (similar to the MIXED command in SPSS). For testing Hypothesis 3.1 (i.e., is the variance of the random litter effects greater than zero?), we can use the rand() function to perform a likelihood ratio test: > rand(model3.1.fit.lmer) In this case, the rand() function fits Model 3.1A (without the random litter effects) and computes the appropriate likelihood ratio test statistic, representing the positive difference in the –2 REML log-likelihood values of the two models (89.4). The corresponding p-value based on a mixture of chi-square distributions (p < 0.001) suggests a strong rejection of the null hypothesis, and we therefore retain the random litter effects in all subsequent models. 98 Linear Mixed Models: A Practical Guide Using Statistical Software Step 3: Select a covariance structure for the residuals (Models 3.1, 3.2A, or 3.2B). At the time of this writing, the lmer() function does not allow users to fit models with heterogeneous error variance structures. We therefore do not consider Models 3.2A or 3.2B in this analysis. The fixed effects in the model including the random litter effects and assuming constant error variance across the treatment groups (Model 3.1) can be tested using the lmerTest package, and the lme() function can be used to test for the possibility of heterogeneous error variances, as illustrated in Section 3.4.3.1. 3.4.4 Stata We begin by importing the tab-delimited version of the Rat Pup data set into Stata, assuming that the rat_pup.dat data file is located in the C:\temp directory. Note that we present the Stata commands including the prompt (.), which is not entered as part of the commands. . insheet using "C:\temp\rat_pup.dat", tab Alternatively, users of web-aware Stata can import the Rat Pup data set directly from the book’s web site: . insheet using http://www-personal.umich.edu/~bwest/rat_pup.dat, tab We now utilize the mixed command to fit the models for this example. Step 1: Fit a model with a “loaded” mean structure (Model 3.1). Because string variables cannot be used as categorical factor variables in Stata, we first recode the TREATMENT and SEX variables into numeric format: . . . . gen female = (sex == "Female") gen treatment2 = 1 if treatment == "Control" replace treatment2 = 2 if treatment == "Low" replace treatment2 = 3 if treatment == "High" The mixed command used to fit Model 3.1 (in Version 13+ of Stata) is then specified as follows: . * Model 3.1 fit . . mixed weight ib1.treatment2 female litsize ib1.treatment2#c.female || litter:, covariance(identity) variance reml The mixed command syntax has three parts. The first part specifies the dependent variable and the fixed effects, the second part specifies the random effects, and the third part specifies the covariance structure for the random effects, in addition to miscellaneous options. 
We note that although we have split the single command onto two lines, readers should attempt to submit the command on a single line in Stata. We discuss these parts of the syntax in detail below. The first variable listed after the mixed command is the continuous dependent variable, WEIGHT. The variables following the dependent variable are the terms that will have associated fixed effects in the model. We include fixed effects associated with TREATMENT (where ib1. indicates that the recoded variable TREATMENT2 is a categorical factor, with Two-Level Models for Clustered Data:The Rat Pup Example 99 the category having value 1 (Control) as the reference, or baseline category), the FEMALE indicator, LITSIZE, and the interaction between TREATMENT2 and FEMALE (indicated using #). We note that even though FEMALE is a binary indicator, it needs to be specified as “continuous” in the interaction term using c., because it has not been specified as a categorical factor previously in the variable list using i.. The two vertical bars (||) precede the variable that defines clusters of observations (litter:) in this two-level data set. The absence of additional variables after the colon indicates that there will only be a single random effect associated with the intercept for each level of LITTER in the model. The covariance option after the comma specifies the covariance structure for the random effects (or the D matrix). Because Model 3.1 includes only a single random effect associated with the intercept (and therefore a single variance parameter associated with the random effects), it has an identity covariance structure. The covariance option is actually not necessary in this simple case. Finally, the variance option requests that the estimated variances of the random effects and the residuals be displayed in the output, rather than their estimated standard deviations, which is the default. The mixed procedure also uses ML estimation by default, so we also include the reml option to request REML estimation for Model 3.1. The AIC and BIC information criteria for this model can be obtained by using the following command after the mixed command has finished running: . * Information criteria. . estat ic By default, the mixed command does not display F -tests for the fixed effects in the model. Instead, omnibus Wald chi-square tests for the fixed effects in the model can be performed using the test command. For example, to test the overall significance of the fixed treatment effects, the following command can be used: . * Test overall significance of the fixed treatment effects. . test 2.treatment2 3.treatment2 The two terms listed after the test command are the dummy variables automatically generated by Stata for the fixed effects of the Low and High levels of treatment (as indicated in the estimates of the fixed-effect parameters). The test command is testing the null hypothesis that the two fixed effects associated with these dummy variables are both equal to zero (i.e., the null hypothesis that the treatment means are all equal for males, given that the interaction between TREATMENT2 and FEMALE has been included in this model). Similar omnibus tests may be obtained for the fixed FEMALE effect, the fixed LITSIZE effect, and the interaction between TREATMENT2 and SEX: . . . . . * Omnibus tests for FEMALE, LITSIZE and the * TREATMENT2*FEMALE interaction. 
. test female
. test litsize
. test 2.treatment2#c.female 3.treatment2#c.female

Once a model has been fitted using the mixed command, EBLUPs of the random effects associated with the levels of the random factor (LITTER) can be saved in a new variable (named EBLUPS) using the following command:

. predict eblups, reffects

The saved EBLUPs can then be used to check for random effects that may be outliers.

Step 2: Select a structure for the random effects (Model 3.1 vs. Model 3.1A).

We perform a likelihood ratio test of Hypothesis 3.1 to decide whether the random effects associated with the intercept for each litter can be omitted from Model 3.1. In the case of two-level models with random intercepts, the mixed command performs the appropriate likelihood ratio test automatically. We read the following output after fitting Model 3.1:

LR test vs. linear regression: chibar2(01) = 89.41   Prob >= chibar2 = 0.0000

Stata reports chibar2(01), indicating that it uses the correct null hypothesis distribution of the test statistic, which in this case is a mixture of χ²₀ and χ²₁ distributions, each with equal weight of 0.5 (see Subsection 3.5.1). The likelihood ratio test reported by the mixed command is an overall test of the covariance parameters associated with all random effects in the model. In models with a single random effect for each cluster, as in Model 3.1, it is appropriate to use this test to decide if that random effect should be included in the model. The significant result of this test (p < 0.001) suggests that the random litter effects should be retained in Model 3.1.

Step 3: Select a covariance structure for the residuals (Models 3.1, 3.2A, or 3.2B).

We now fit Model 3.2A, with a separate residual variance for each treatment group (σ²high, σ²low, and σ²control).

. * Model 3.2A.
. mixed weight ib1.treatment2 female litsize ib1.treatment2#c.female || litter:, covariance(identity) variance reml residuals(independent, by(treatment2))

We use the same mixed command that was used to fit Model 3.1, with one additional option: residuals(independent, by(treatment2)). This option specifies that the residuals in this model are independent of each other (which is reasonable for two-level clustered data if a random effect associated with each cluster has already been included in the model), and that residuals associated with different levels of TREATMENT2 have different variances. The resulting output will provide estimates of the residual variance for each treatment group.

To test Hypothesis 3.2, we subtract the –2 REML log-likelihood for the heterogeneous residual variance model, Model 3.2A, from the corresponding value for the model with homogeneous residual variance for all treatment groups, Model 3.1. We can do this automatically by fitting both models and saving the results from each model in new objects. We first save the results from Model 3.2A in an object named model32Afit, refit Model 3.1 and save those results in an object named model31fit, and then use the lrtest command to perform the likelihood ratio test using the two objects (where Model 3.1 is nested within Model 3.2A, and listed second):

. est store model32Afit
. mixed weight ib1.treatment2 female litsize ib1.treatment2#c.female || litter:, covariance(identity) variance reml
. est store model31fit
.
lrtest model32Afit model31fit We do not need to modify the p-value returned by the lrtest command for Hypothesis 3.2, because the null hypothesis (stating that the residual variance is identical for each treatment group) does not set a covariance parameter equal to the boundary of its parameter Two-Level Models for Clustered Data:The Rat Pup Example 101 space, as in Hypothesis 3.1. Because the result of this likelihood ratio test is significant (p < 0.001), we choose the heterogeneous variances model (Model 3.2A) as our preferred model at this stage of the analysis. We next test Hypothesis 3.3 to decide if we can pool the residual variances for the high and low treatment groups. To do this, we first create a pooled treatment group variable, TRTGRP: . gen trtgrp = 1 if treatment2 == 1 . replace trtgrp = 2 if treatment2 == 2 | treatment2 == 3 We now fit Model 3.2B, using the new TRTGRP variable to specify the grouped heterogeneous residual variance structure and saving the results in an object named model32Bfit: . * Model 3.2B. . mixed weight ib1.treatment2 female litsize ib1.treatment2#c.female || litter:, covariance(identity) variance reml residuals(independent, by(trtgrp)) . est store model32Bfit We test Hypothesis 3.3 using a likelihood ratio test, by applying the lrtest command to the objects containing the fits for Model 3.2A and Model 3.2B: . lrtest model32Afit model32Bfit The null distribution of the test statistic in this case is a χ2 with one degree of freedom. Because the test is not significant (p = 0.27), we select the nested model, Model 3.2B, as our preferred model at this stage of the analysis. We use a likelihood ratio test for Hypothesis 3.4 to decide whether we wish to retain the grouped heterogeneous error variances in Model 3.2B or choose the homogeneous error variance model, Model 3.1. The lrtest command is also used for this test: . * Test Hypothesis 3.4. . lrtest model32Bfit model31fit The result of this likelihood ratio test is significant (p < 0.001), so we choose the pooled heterogeneous residual variances model, Model 3.2B, as our preferred model at this stage. Step 4: Reduce the model by removing nonsignificant fixed effects (Model 3.2B vs. Model 3.3, and Model 3.3 vs. Model 3.3A). We test Hypothesis 3.5 to decide whether the fixed effects associated with the treatment by sex interaction are equal to zero in Model 3.2B, using a Type III F -test in Stata. To obtain the results of this test, we execute the test command below after fitting Model 3.2B: . * Test Hypothesis 3.5. . test 2.treatment2#c.female 3.treatment2#c.female Based on the nonsignificant Type III F -test (p = 0.73), we delete the TREATMENT2 × FEMALE interaction term from the model and obtain our final model, Model 3.3. We test Hypothesis 3.6 to decide whether the fixed effects associated with treatment are equal to zero in Model 3.3, using a likelihood ratio test based on maximum likelihood (ML) estimation. We first fit the reference model, Model 3.3, using ML estimation (the default of the mixed command, meaning that we drop the reml option along with the interaction term), and then fit a nested model, Model 3.3A, without the TREATMENT2 term, also using ML estimation. We then use the lrtest command to carry out the likelihood ratio test of Hypothesis 3.6: 102 Linear Mixed Models: A Practical Guide Using Statistical Software . * Test Hypothesis 3.6. . mixed weight ib1.treatment2 female litsize || litter:, covariance(identity) variance residuals(independent, by(trtgrp)) . est store model33fit . 
mixed weight female litsize || litter:, covariance(identity) variance residuals(independent, by(trtgrp)) . est store model33Afit . lrtest model33fit model33Afit The likelihood ratio test result is significant (p < 0.001), so we retain the significant fixed treatment effects in the model. We keep the fixed effects associated with FEMALE and LITSIZE without testing them, to adjust for these fixed effects when assessing the treatment effects. See Section 3.5 for a discussion of the results of all hypothesis tests for the Rat Pup data analysis. We now refit our final model, Model 3.3, using REML estimation to get unbiased estimates of the variance parameters: . * Model 3.3: Final Model. . mixed weight ib1.treatment2 female litsize || litter:, covariance(identity) variance reml residuals(independent, by(trtgrp)) 3.4.5 3.4.5.1 HLM Data Set Preparation To perform the analysis of the Rat Pup data using the HLM software package, we need to prepare two separate data sets. 1. The Level 1 (pup-level) data set contains a single observation (row of data) for each rat pup. This data set includes the Level 2 cluster identifier variable, LITTER, and the variable that identifies the units of analysis, PUP ID. The response variable, WEIGHT, which is measured for each pup, must also be included, along with any pup-level covariates. In this example, we have only a single puplevel covariate, SEX. In addition, the data set must be sorted by the cluster-level identifier, LITTER. 2. The Level 2 (litter-level) data set contains a single observation for each LITTER. The variables in this data set remain constant for all rat pups within a given litter. The Level 2 data set needs to include the cluster identifier, LITTER, and the litter-level covariates, TREATMENT and LITSIZE. This data set must also be sorted by LITTER. Because the HLM program does not automatically create dummy variables for categorical predictors, we need to create dummy variables to represent the nonreference levels of the categorical predictors prior to importing the data into HLM. We first need to add an indicator variable for SEX to represent female rat pups in the Level 1 data set, and we need to create two dummy variables in the Level 2 data set for TREATMENT, to represent the high and low dose levels. If the input data files were created in SPSS, the SPSS syntax to create these indicator variables in the Level 1 and Level 2 data files would look like this: Two-Level Models for Clustered Data:The Rat Pup Example 103 Level 1 data COMPUTE sexl = (sex = "Female") . EXECUTE . Level 2 data COMPUTE EXECUTE COMPUTE EXECUTE 3.4.5.2 treat1 = (treatment = "High") . . treat2 = (treatment = "Low") . . Preparing the Multivariate Data Matrix (MDM) File We create a new MDM file, using the Level 1 and Level 2 data sets described earlier. In the main HLM menu, click File, Make new MDM file, and then Stat package input. In the window that opens, select HLM2 to fit a two-level hierarchical linear model, and click OK. In the Make MDM window that opens, select the Input File Type as SPSS/Windows. Now, locate the Level-1 Specification area of the MDM window, and Browse to the location of the Level 1 SPSS data set. 
Once the data file has been selected, click on the Choose Variables button and select the following variables from the Level 1 file: LITTER (check “ID” for the LITTER variable, because this variable identifies the Level 2 units), WEIGHT (check “in MDM” for this variable, because it is the dependent variable), and the indicator variable for females, SEX1 (check “in MDM”). Next, locate the Level-2 Specification area of the MDM window and Browse to the location of the Level 2 SPSS data set that has one record per litter. Click on the Choose Variables button to include LITTER (check “ID”), TREAT1 and TREAT2 (check “in MDM” for each indicator variable), and finally LITSIZE (check “in MDM”). After making these choices, check the cross sectional (persons within groups) option for the MDM file, to indicate that the Level 1 data set contains measures on individual rat pups (“persons” in this context), and that the Level 2 data set contains litter-level information (the litters are the “groups”). Also, select No for Missing Data? in the Level 1 data set, because we do not have any missing data for any of the litters in this example. Enter a name for the MDM file with an .mdm extension (e.g., ratpup.mdm) in the upper right corner of the MDM window. Finally, save the .mdmt template file under a new name (click Save mdmt file), and click the Make MDM button. After HLM has processed the MDM file, click the Check Stats button to see descriptive statistics for the variables in the Level 1 and Level 2 data sets (HLM 7+ will show these automatically). This step, which is required prior to fitting a model, allows you to check that the correct number of records has been read into the MDM file and that there are no unusual values for the variables included in the MDM file (e.g., values of 999 that were previously coded as missing data; such values would need to be set to system-missing in SPSS prior to using the data file in HLM). Click Done to proceed to the model-building window. Step 1: Fit a model with a loaded mean structure (Model 3.1). In the model-building window, select WEIGHT from the list of variables, and click Outcome variable. The initial “unconditional” (or “means-only”) model for WEIGHT, broken down into Level 1 and Level 2 models, is now displayed in the model-building window. The initial Level 1 model is: 104 Linear Mixed Models: A Practical Guide Using Statistical Software Model 3.1: Level 1 Model (Initial) WEIGHT = β0 + r To add more informative subscripts to the models (if they are not already shown), click File and Preferences, and choose Use level subscripts. The Level 1 model now includes the subscripts i and j, where i indexes individual rat pups and j indexes litters, as follows: Model 3.1: Level 1 Model (Initial) With Subscripts WEIGHTij = β0j + rij This initial Level 1 model shows that the value of WEIGHTij for an individual rat pup i, within litter j, depends on the intercept, β0j , for litter j, plus a residual, rij , associated with the rat pup. The initial Level 2 model for the litter-specific intercept, β0j , is also displayed in the model-building window. Model 3.1: Level 2 Model (Initial) β0j = γ00 + u0j This model shows that at Level 2 of the data set, the litter-specific intercept depends on the fixed overall intercept, γ00 , plus a random effect, u0j , associated with litter j. In this “unconditional” model, β0j is allowed to vary randomly from litter to litter. 
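The Level 1 and Level 2 equations above combine into a single random-intercept model for WEIGHT. For readers who want to cross-check the HLM setup in R, a minimal lme4 sketch of this unconditional model is given below; the data-frame name ratpup is assumed (any long-format data set containing WEIGHT and LITTER will do).

library(lme4)

# Unconditional (means-only) model: WEIGHTij = gamma00 + u0j + rij
# 'ratpup' is an assumed data-frame name for the long-format Rat Pup data
model.uncond <- lmer(weight ~ 1 + (1 | litter), data = ratpup, REML = TRUE)
summary(model.uncond)   # fixed intercept = gamma00; litter variance = Var(u0j)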
After clicking the Mixed button for this model (in the lower-right corner of the model-building window), the initial means-only mixed model is displayed. Model 3.1: Overall Mixed Model (Initial) WEIGHTij = γ00 + u0j + rij To complete the specification of Model 3.1, we add the pup-level covariate, SEX1. Click the Level 1 button in the model-building window and then select SEX1. Choose add variable uncentered. SEX1 is then added to the Level 1 model along with a litter-specific coefficient, β1j , for the effect of this covariate. Model 3.1: Level 1 Model (Final) WEIGHTij = β0j + β1j (SEX1ij ) + rij The Level 2 model now has equations for both the litter-specific intercept, β0j , and for β1j , the litter-specific coefficient associated with SEX1. Model 3.1: Level 2 Model (Intermediate) β0j = γ00 + u0j β1j = γ10 The equation for the litter-specific intercept is unchanged. The value of β1j is defined as a constant (equal to the fixed effect γ10 ) and does not include any random effects, because we assume that the effect of SEX1 (i.e., the effect of being female) does not vary randomly from litter to litter. Two-Level Models for Clustered Data:The Rat Pup Example 105 To finish the specification of Model 3.1, we add the uncentered versions of the two litterlevel dummy variables for treatment, TREAT1 and TREAT2, to the Level 2 equations for the intercept, β0j , and for the effect of being female, β1j . We add the effect of the uncentered version of the LITSIZE covariate to the Level 2 equation for the intercept only, because we do not wish to allow the effect of being female to vary as a function of litter size. Click the Level 2 button in the model-building window. Then, click on each Level 2 equation and click on the specific variables (uncentered) to add. Model 3.1: Level 2 Model (Final) β0j = γ00 + γ01 (LITSIZEj ) + γ02 (TREAT1j ) + γ03 (TREAT2j ) + u0j β1j = γ10 + γ11 (TREAT1j ) + γ12 (TREAT2j ) In this final Level 2 model, the main effects of TREAT1 and TREAT2, i.e., γ02 and γ03 , enter the model through their effect on the litter-specific intercept, β0j . The interaction between treatment and sex enters the model by allowing the litter-specific effect for SEX1, β1j , to depend on fixed effects associated with TREAT1 and TREAT2 (γ11 and γ12 , respectively). The fixed effect associated with LITSIZE, γ01 , is only included in the equation for the litter-specific intercept and is not allowed to vary by sex (i.e., our model does not include a LITSIZE × SEX1 interaction). We can view the final LMM by clicking the Mixed button in the HLM model-building window: Model 3.1: Overall Mixed Model (Final) WEIGHTij = γ00 + γ01 ∗ LITSIZEj + γ02 ∗ TREAT1j + γ03 ∗ TREAT2j + γ10 ∗ SEX1ij + γ11 ∗ TREAT1j ∗ SEX1ij + γ12 ∗ TREAT2j ∗ SEX1ij + u0j + rij The final mixed model in HLM corresponds to Model 3.1 as it was specified in (3.1), but with somewhat different notation. Table 3.3 shows the correspondence of this notation with the general LMM notation used in (3.1). After specifying Model 3.1, click Basic Settings to enter a title for this analysis (such as “Rat Pup Data: Model 3.1”) and a name for the output (.html) file. Note that the default outcome variable distribution is Normal (Continuous), so we do not need to specify it. The HLM2 procedure automatically creates two residual data files, corresponding to the two levels of the model. The “Level-1 Residual File” contains the conditional residuals, rij , and the “Level-2 Residual File” contains the EBLUPs of the random litter effects, u0j . 
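The final form of Model 3.1 shown above, together with the quantities stored in the two residual files, has a direct analogue in R. The following is a rough lme4 sketch (not the HLM computation itself), assuming the dummy variables TREAT1, TREAT2, and SEX1 created earlier are available in an assumed data frame named ratpup:

library(lme4)

# Model 3.1: loaded fixed-effect structure plus random litter intercepts
model3.1.fit <- lmer(weight ~ treat1 + treat2 + sex1 + litsize +
                       treat1:sex1 + treat2:sex1 + (1 | litter),
                     data = ratpup, REML = TRUE)

ranef(model3.1.fit)$litter     # EBLUPs of u0j (analogue of HLM's Level-2 residual file)
residuals(model3.1.fit)        # conditional residuals rij (analogue of the Level-1 residual file)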
To change the names and/or file formats of these residual files, click on either of the two buttons for the files in the Basic Settings window. Click OK to return to the model-building window. Click File ... Save As to save this model specification to a new .hlm file. Finally, click Run Analysis to fit the model. HLM2 by default uses REML estimation for two-level models such as Model 3.1. Click on File ... View Output to see the estimates for this model.

Step 2: Select a structure for the random effects (Model 3.1 vs. Model 3.1A).

In this step, we test Hypothesis 3.1 to decide whether the random effects associated with the intercept for each litter can be omitted from Model 3.1. We cannot perform a likelihood ratio test for the variance of the random litter effects in this model because HLM does not allow us to remove the random effects in the Level 2 model (there must be at least one random effect associated with each level of the data set in HLM). Because we cannot use a likelihood ratio test for the variance of the litter-specific intercepts, we instead use the χ² tests for the covariance parameters provided by HLM2. These χ² statistics are calculated using methodology described in Raudenbush & Bryk (2002) and are displayed near the bottom of the output file.

Step 3: Select a covariance structure for the residuals (Models 3.1, 3.2A, or 3.2B).

Models 3.2A, 3.2B, and 3.3, which have heterogeneous residual variance for different levels of treatment, cannot be fitted using HLM2, because this procedure does not allow the Level 1 variance to depend on a factor measured at Level 2 of the data. However, HLM does provide an option labeled Test homogeneity of Level 1 variance under the Hypothesis Testing settings, which can be used to obtain a test of whether the assumption of homogeneous residual variance is met [refer to Raudenbush & Bryk (2002) for more details].

Step 4: Reduce the model by removing nonsignificant fixed effects (Model 3.2B vs. Model 3.3, and Model 3.3 vs. Model 3.3A).

We can set up general linear hypothesis tests for the fixed effects in a model in HLM by clicking Other Settings, and then Hypothesis Testing prior to fitting a model. In the hypothesis-testing window, each numbered button corresponds to a test of a null hypothesis that Cγ = 0, where C is a known matrix for a given hypothesis, and γ is a vector of fixed-effect parameters. This specification of the linear hypothesis in HLM corresponds to the linear hypothesis specification, Lβ = 0, described in Subsection 2.6.3.1. For each hypothesis, HLM computes a Wald-type test statistic, which has a χ² null distribution, with degrees of freedom equal to the rank of C [see Raudenbush & Bryk (2002) for more details].

For example, to test the overall effect of treatment in Model 3.1, which has seven fixed-effect parameters (γ00, associated with the intercept term; γ01, with litter size; γ02 and γ03, with the treatment dummy variables TREAT1 and TREAT2; γ10, with sex; and γ11 and γ12, with the treatment by sex interaction terms), we would need to set up the following C matrix and γ vector:

      ( 0  0  1  0  0  0  0 )
C =   ( 0  0  0  1  0  0  0 ),     γ = (γ00, γ01, γ02, γ03, γ10, γ11, γ12)′

This specification of C and γ corresponds to the null hypothesis H0: γ02 = 0 and γ03 = 0. Each row in the C matrix corresponds to a column in the HLM Hypothesis Testing window.
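The Wald-type statistic that HLM computes for Cγ = 0 can be reproduced, at least approximately, from any fitted model's estimated fixed effects and their covariance matrix. The sketch below is a generic illustration (it is not HLM's internal code), continuing with the hypothetical model3.1.fit object from the R sketch above; note that the positions of the 1s in C must match the ordering of the coefficients in that fit, which differs from the ordering of the γ vector in HLM.

# Wald chi-square test of H0: C %*% beta = 0, with df = number of rows of C
beta.hat <- fixef(model3.1.fit)            # estimated fixed effects
V.hat    <- as.matrix(vcov(model3.1.fit))  # their estimated covariance matrix

# In this fit the treatment dummies are the 2nd and 3rd coefficients
C <- rbind(c(0, 1, 0, 0, 0, 0, 0),
           c(0, 0, 1, 0, 0, 0, 0))

wald <- t(C %*% beta.hat) %*% solve(C %*% V.hat %*% t(C)) %*% (C %*% beta.hat)
pchisq(as.numeric(wald), df = nrow(C), lower.tail = FALSE)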
To set up this hypothesis test, click on the first numbered button under Multivariate Hypothesis Tests in the Hypothesis Testing window. In the first column of zeroes, corresponding to the first row of the C matrix, enter a 1 for the fixed effect γ02. In the second column of zeroes, enter a 1 for the fixed effect γ03. To complete the specification of the hypothesis, click on the third column, which will be left as all zeroes, and click OK. Additional hypothesis tests can be obtained for the fixed effects associated with other terms in Model 3.1 (including the interaction terms) by entering additional C matrices under different numbered buttons in the Hypothesis Testing window. After setting up all hypothesis tests of interest, click OK to return to the main model-building window.

3.5 Results of Hypothesis Tests

The hypothesis test results reported in Table 3.5 were derived from output produced by SAS proc mixed. See Table 3.4 and Subsection 3.3.3 for more information about the specification of each hypothesis.

3.5.1 Likelihood Ratio Tests for Random Effects

Hypothesis 3.1. The random effects, u0j, associated with the litter-specific intercepts can be omitted from Model 3.1.

To test Hypothesis 3.1, we perform a likelihood ratio test. The test statistic is calculated by subtracting the –2 REML log-likelihood value of the reference model, Model 3.1, from the corresponding value for a nested model omitting the random effects, Model 3.1A. Because a variance cannot be less than zero, the null hypothesis value of σ²litter = 0 is at the boundary of the parameter space, and the asymptotic null distribution of the likelihood ratio test statistic is a mixture of χ²0 and χ²1 distributions, each with equal weight of 0.5 (Verbeke & Molenberghs, 2000). We illustrate calculation of the p-value for the likelihood ratio test statistic:

p-value = 0.5 × P(χ²0 > 89.4) + 0.5 × P(χ²1 > 89.4) < 0.001

The resulting test statistic is significant (p < 0.001), so we retain the random effects associated with the litter-specific intercepts in Model 3.1 and in all subsequent models. As noted in Subsection 3.4.1, the χ²0 distribution has all of its mass concentrated at zero, so its contribution to the p-value is zero and the first term can be omitted from the p-value calculation.

3.5.2 Likelihood Ratio Tests for Residual Variance

Hypothesis 3.2. The variance of the residuals is the same (homogeneous) for the three treatment groups (high, low, and control).

We use a REML-based likelihood ratio test for Hypothesis 3.2. The test statistic is calculated by subtracting the value of the –2 REML log-likelihood for Model 3.2A (the reference model) from that for Model 3.1 (the nested model). Under the null hypothesis, the variance parameters are not on the boundary of their parameter space (i.e., the null hypothesis does not specify that they are equal to zero). The test statistic has a χ² distribution with 2 degrees of freedom because Model 3.2A has 2 more covariance parameters (i.e., the 2 additional residual variances) than Model 3.1. The test result is significant (p < 0.001). We therefore reject the null hypothesis and decide that the model with heterogeneous residual variances, Model 3.2A, is our preferred model at this stage of the analysis.

Hypothesis 3.3. The residual variances for the high and low treatment groups are equal.

To test Hypothesis 3.3, we again carry out a REML-based likelihood ratio test.
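(Returning for a moment to Hypothesis 3.1: the 0.5 : 0.5 mixture p-value shown above is easy to verify numerically. A minimal R sketch, using the chi-square statistic of 89.4 reported in Table 3.5, follows; we then pick up the calculation for Hypothesis 3.3.)

# p-value for a likelihood ratio statistic whose null distribution is an
# equal mixture of chi-square(0) and chi-square(1) (variance on the boundary)
lr.stat   <- 89.4
p.mixture <- 0.5 * 0 + 0.5 * pchisq(lr.stat, df = 1, lower.tail = FALSE)
# chi-square(0) places all its mass at zero, so its tail probability is 0
p.mixture   # < 0.001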
The test statistic is calculated by subtracting the value of the –2 REML log-likelihood for Model 3.2A (the reference model) from that for Model 3.2B (the nested model). Under the null hypothesis, the test statistic has a χ2 distribution with 1 degree of freedom. The nested model, Model 3.2B, has one fewer covariance parameter (i.e., one less residual variance) than the reference model, Model 3.2A, and the null hypothesis value of 108 Linear Mixed Models: A Practical Guide Using Statistical Software TABLE 3.5: Summary of Hypothesis Test Results for the Rat Pup Analysis Hypothesis Label Test Estimation Method Models Compared (Nested vs. Reference) Test Statistic Values (Calculation) p-value 3.1 LRT REML 3.1A vs. 3.1 χ2 (0 : 1) = 89.4 (490.5 – 401.1) < .001 3.2 LRT REML 3.1 vs. 3.2A χ2 (2) = 41.2 (401.1 – 359.9) < .001 3.3 LRT REML 3.2B vs. 3.2A χ2 (1) = 1.2 (361.1 – 359.9) 0.27 3.4 LRT REML 3.1 vs. 3.2B χ2 (1) = 40.0 (401.1 – 361.1) < .001 3.5 Type III F -test REML 3.2Ba F (2, 194) = 0.3 0.73 3.6 LRT ML 3.3A vs. 3.3 χ2 (2) = 18.6 (356.4 – 337.8) < .001 Type III REML 3.3b F (2, 24.3) = 11.4 < .001 F -test Note: See Table 3.4 for null and alternative hypotheses, and distributions of test statistics under H0 . a We use an F -test for the fixed effects associated with TREATMENT × SEX based on the fit of Model 3.2B only. b We use an F -test for the fixed effects associated with TREATMENT based on the fit of Model 3.3 only. the parameter does not lie on the boundary of the parameter space. The test result is not significant (p = 0.27). We therefore do not reject the null hypothesis, and decide that Model 3.2B, with pooled residual variance for the high and low treatment groups, is our preferred model. Hypothesis 3.4. The residual variance for the combined high/low treatment group is equal to the residual variance for the control group. To test Hypothesis 3.4, we carry out an additional REML-based likelihood ratio test. The test statistic is calculated by subtracting the value of the –2 REML log-likelihood for Model 3.2B (the reference model) from that of Model 3.1 (the nested model). Under the null hypothesis, the test statistic has a χ2 distribution with 1 degree of freedom: the reference model has 2 residual variances, and the nested model has 1. The test result is significant (p < 0.001). We therefore reject the null hypothesis and choose Model 3.2B as our preferred model. 3.5.3 F -tests and Likelihood Ratio Tests for Fixed Effects Hypothesis 3.5. The fixed effects, β5 and β6 , associated with the treatment by sex interaction are equal to zero in Model 3.2B. Two-Level Models for Clustered Data:The Rat Pup Example 109 We test Hypothesis 3.5 using a Type III F -test for the treatment by sex interaction in Model 3.2B. The results of the test are not significant (p = 0.73). Therefore, we drop the fixed effects associated with the treatment by sex interaction and select Model 3.3 as our final model. Hypothesis 3.6. The fixed effects associated with treatment, β1 and β2 , are equal to zero in Model 3.3. Hypothesis 3.6 is not part of the model selection process but tests the primary hypothesis of the study. We would not remove the effect of treatment from the model even if it proved to be nonsignificant because it is the main focus of the study. The Type III F -test, reported by SAS for treatment in Model 3.3, is significant (p < 0.001), and we conclude that the mean birth weights differ by treatment group, after controlling for litter size, sex, and the random effects associated with litter. 
We also carry out an ML-based likelihood ratio test for Hypothesis 3.6. To do this, we refit Model 3.3 using ML estimation. We then fit a nested model without the fixed effects of treatment (Model 3.3A), again using ML estimation. The test statistic is calculated by subtracting the –2 log-likelihood value for Model 3.3 from the corresponding value for Model 3.3A. The result of this test is also significant (p < 0.001). 3.6 3.6.1 Comparing Results across the Software Procedures Comparing Model 3.1 Results Table 3.6 shows selected results generated using each of the six software procedures to fit Model 3.1 to the Rat Pup data. This model is “loaded” with fixed effects, has random effects associated with the intercept for each litter, and has a homogeneous residual variance structure. All five procedures agree in terms of the estimated fixed-effect parameters and their estimated standard errors for Model 3.1. They also agree on the estimated variance compo2 nents (i.e., the estimates of σlitter and σ 2 ) and their respective standard errors, when they are reported. Portions of the model fit criteria differ across the software procedures. Reported values of the –2 REML log-likelihood are the same for the procedures in SAS, SPSS, R, and Stata. However, the reported value in HLM (indicated as the deviance statistic in the HLM output) is lower than the values reported by the other four procedures. According to correspondence with HLM technical support staff, the difference arising in this illustration is likely due to differences in default convergence criteria between HLM and the other procedures, or in implementation of the iterative REML procedure within HLM. This minor difference is not critical in this case. We also note that the values of the AIC and BIC statistics vary because of the different computing formulas being used (the HLM2 procedure does not compute these information criteria). SAS and SPSS compute the AIC as –2 REML log-likelihood + 2 × (# covariance parameters in the model). Stata and R compute the AIC as –2 REML log-likelihood + 2 × (# fixed effects + # covariance parameters in the model). Although the AIC and BIC statistics are not always comparable across procedures, they can be used to compare the fits of models within any given procedure. For details on how the computation of the BIC criteria varies from the presentation in Subsection 2.6.4 across the different software procedures, refer to the documentation for each procedure. 110 TABLE 3.6: Comparison of Results for Model 3.1 SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HLM2 Estimation Method REML REML REML REML REML REML Fixed-Effect Parameter Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) a 8.32(0.27) −0.91(0.19) −0.47(0.16) −0.41(0.07) −0.13(0.02) 0.11(0.13) 0.08(0.11) 8.32(0.27) −0.91(0.19) −0.47(0.16) −0.41(0.07) −0.13(0.02) 0.11(0.13) 0.08(0.11) 8.32(0.27) −0.91(0.19) −0.47(0.16) −0.41(0.07) −0.13(0.02) 0.11(0.13) 0.08(0.11) 8.32(0.27) −0.91(0.19) −0.47(0.16) −0.41(0.07) −0.13(0.02) 0.11(0.13) 0.08(0.11) 8.32(0.27) −0.91(0.19) −0.47(0.16) −0.41(0.07) −0.13(0.02) 0.11(0.13) 0.08(0.11) 8.32(0.27) −0.91(0.19) −0.47(0.16) −0.41(0.07) −0.13(0.02) 0.11(0.13) 0.08(0.11) Estimate (SE) Estimate (SE) Estimate (n.c.) Estimate (n.c.) Estimate (SE) Estimate (n.c.) β0 (Intercept) β1 (High vs. Control) β2 (Low vs. Control) β3 (Female vs. 
male) β4 (Litter size) β5 (High × Female) β6 (Low × Female) Covariance Parameter 2 σlitter 2 0.10(0.03) 0.16(0.01) σ (Residual variance) 0.10(0.03) 0.16(0.01) 0.10 0.16 b 0.10 0.16 0.10(0.03) 0.16(0.01) 0.10 0.16 Model Information Criteria –2 REML log-likelihood AIC BIC Tests for Fixed Effects Intercept 401.1 405.1c 407.7 401.1 405.1c 412.6 401.1 419.1d 452.9 401.1 419.1d 453.1 401.1 419.1d 453.1 399.3 n.c. n.c. Type III F -Tests Type III F -Tests Type I F -Tests t(32.9) = 30.5 F (1, 34.0) = 1076.2 F (1, 292) = 9093.8 Type I F -Tests N/Ae Wald χ2 -Tests Z = 30.5 Wald χ2 -Tests χ2 (1) = 927.3 F (2, 24.3) = 11.5 F (2, 24.3) = 11.5 F (2, 23.0) = 5.08 F (2,xx) = 5.08e χ2 (2) = 23.7 χ2 (2) = 23.7 p < .01 TREATMENT p < .01 p < .01 p < .01 p < .01 p = 0.01 p < .01 p < .01 p < .01 p < .01 Linear Mixed Models: A Practical Guide Using Statistical Software SAS: proc mixed SAS: proc mixed SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HLM2 REML REML REML REML REML REML SEX F (1, 303) = 47.0 F (1, 302.9) = 47.0 F (1, 292) = 52.6 F (1,xx) = 52.6e χ2 (1) = 31.7 χ2 (l) = 31.7 LITSIZE F (1, 31.8) = 46.9 F (1, 31.8) = 46.9 F (1, 23.0) = 47.4 F (1,xx) = 47.4e χ2 (l) = 46.9 χ2 (1) = 46.9 TREATMENT × SEX F (2, 302.0) = 0.5 F (2, 302.3) = 0.5 F (2, 292.0) = 0.5 F (2,xx) = 0.5e χ2 (2) = 0.9 χ2 (2) = 0.9 Estimation Method p < .01 EBLUPs p < .01 p = .63 p < .01 p < .01 p = .63 p < .01 p < .01 p = .63 p < .01 p < .01 p = .63 p < .01 p < .01 p > .50 Output Computed Can be Can be Can be Saved by (w/sig. tests) (Subsection 3.3.2) saved saved saved default Note: (n.c.) = not computed Note: 322 Rat Pups at Level 1; 27 Litters at Level 2 a HLM2 also reports “robust” standard errors for the estimated fixed effects in the output by default. We report the model-based standard errors here. b Users of the lme() function in R can use the function intervals(model3.1.ratpup) to obtain approximate 95% confidence intervals for covariance parameters. The estimated standard deviations reported by the summary() function have been squared to obtain variances. c SAS and SPSS compute the AIC as –2 REML log-likelihood + 2 × (# covariance parameters in the model). d Stata and R compute the AIC as –2 REML log-likelihood + 2 × (# fixed effects + # covariance parameters in the model). e Likelihood ratio tests are recommended for making inferences about fixed effects when using the lmer() function in R, given that denominator degrees of freedom for the test statistics are not computed by default. Satterthwaite approximations of the denominator degrees of freedom are computed when using the lmerTest package. Two-Level Models for Clustered Data:The Rat Pup Example TABLE 3.6: (Continued) 111 112 TABLE 3.7: Comparison of Results for Model 3.2B SPSS: GENLINMIXED R: lme() function Stata: mixed Estimation Method REML REML REML REML Fixed-Effect Parameter Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) 8.35(0.28) −0.90(0.19) −0.47(0.16) −0.41(0.09) −0.13(0.02) 0.09(0.12) 0.08(0.11) 8.35(0.28) −0.90(0.19) −0.47(0.16) −0.41(0.09) −0.13(0.02) 0.09(0.12) 0.08(0.11) 8.35(0.28) −0.90(0.19) −0.47(0.16) −0.41(0.09) −0.13(0.02) 0.09(0.12) 0.08(0.11) 8.35(0.28) −0.90(0.19) −0.47(0.16) −0.41(0.09) −0.13(0.02) 0.09(0.12) 0.08(0.11) Estimate (SE) Estimate (SE) Estimate (n.c.) Estimate (SE) β0 (Intercept) β1 (High vs. control) β2 (Low vs. control) β3 (Female vs. 
male) β4 (Litter size) β5 (High × female) β6 (Low × female) Covariance Parameter 2 σlitter 2 σhigh/low 2 σcontrol Tests for Fixed Effects Intercept TREATMENT 0.10(0.03) 0.09(0.01) 0.27(0.03) 0.10(0.03) 0.09(0.01) 0.27(0.03) 0.10 0.09 0.27 Type III Type III Type I F -Tests F -Tests F -Tests t(34) = 30.29, t(34) = 30.29, F (1, 292) = 9027.94, p < .001 p < .001 p < .001 F (2, 24.4) = 11.18, F (2, 24.4) = 11.18, F (2, 23.0) = 4.24, p < .001 p < .001 p = .027 0.10(0.03) 0.09(0.01) 0.27(0.03) Wald χ2 -Tests t(34) = 30.30, p < .001 χ2 (2) = 22.9, p < .01 Linear Mixed Models: A Practical Guide Using Statistical Software SAS: proc mixed Estimation Method SEX LITSIZE TREATMENT × SEX SAS: proc mixed SPSS: GENLINMIXED R: lme() function Stata: mixed REML REML REML REML F (1, 292) = 61.57, p < .001 F (1, 23.0) = 49.58, p < .001 F (2, 292) = 0.32, p = .73 χ2 (1) = 19.3, p < .01 χ2 (1) = 49.33, p < .01 χ2 (2) = 0.63, p = .73 361.1 381.1 418.6 361.1 381.1 418.8 F (1, 29.6) = 59.17, F (1, 296) = 59.17, p < .001 p < .001 F (1, 31.2) = 49.33, F (1, 31.0) = 49.33, p < .001 p < .001 F (2, 194) = 0.32, F (2, 194) = 0.32, p = .73 p = .73 Model Information Criteria –2 REML log-likelihood 361.1 AIC 367.1 BIC 371.0 Note: (n.c.) = not computed Note: 322 Rat Pups at Level 1; 27 Litters at Level 2 361.1 367.2 378.3 Two-Level Models for Clustered Data:The Rat Pup Example TABLE 3.7: (Continued) 113 114 Linear Mixed Models: A Practical Guide Using Statistical Software The significance tests for the fixed intercept are also different across the software procedures. SPSS and R report F -tests and t-tests for the fixed intercept, whereas SAS only reports a t-test by default (an F -test can be obtained for the intercept in SAS by specifying the intercept option in the model statement of proc mixed), Stata reports a z-test, and the HLM2 procedure can optionally report a Wald chi-square test in addition to the default t-test. The procedures that report F -tests for fixed effects (other than the intercept) differ in the F -statistics that they report. Type III F -tests are reported in SAS and SPSS, and Type I (sequential) F -tests are reported in R. There are also differences in the denominator degrees of freedom used for the F -tests (see Subsection 3.11.6 for a discussion of different denominator degrees of freedom options in SAS). Using the test command for fixed effects in Stata or requesting general linear hypothesis tests for fixed effects in HLM (as illustrated in Subsection 3.4.5) results in Type III Wald chi-square tests being reported for the fixed effects. The p-values for all tests of the fixed effects are similar for this model across the software packages, despite differences in the tests being used. 3.6.2 Comparing Model 3.2B Results Table 3.7 shows selected results for Model 3.2B, which has the same fixed effects as Model 3.1, random effects associated with the intercept for each litter, and heterogeneous residual variances for the combined high/low treatment group and the control group. We report results for Model 3.2B generated by proc mixed in SAS, GENLINMIXED in SPSS, the lme() function in R, and the mixed command in Stata, because these are the only procedures that currently accommodate models with heterogeneous residual (i.e., Level 1) variances in different groups defined by categorical Level 2 variables. The estimated variance of the random litter effects and the estimated residual variances for the pooled high/low treatment group and the control treatment group are the same across these procedures. 
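To make the R column of Table 3.7 concrete, a rough nlme sketch of Model 3.2B is shown below; this is a sketch under assumed variable names (treat1, treat2, sex1, litsize, trtgrp in an assumed ratpup data frame), not a reproduction of the syntax in Subsection 3.4.3. The varIdent() structure is what produces the "multiplier" parameterization of the residual variances discussed next.

library(nlme)

# Model 3.2B: random litter intercepts, grouped heterogeneous residual variances
model3.2b.fit <- lme(weight ~ treat1 + treat2 + sex1 + litsize +
                       treat1:sex1 + treat2:sex1,
                     random = ~ 1 | litter,
                     weights = varIdent(form = ~ 1 | trtgrp),   # separate variance per TRTGRP
                     data = ratpup, method = "REML")

summary(model3.2b.fit)    # residual SDs reported as a reference SD times group multipliers
VarCorr(model3.2b.fit)    # variance of the random litter intercepts and residual variance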
However, these parameters are displayed differently in R; R displays multipliers of a single parameter estimate (see Subsection 3.4.3 for an example of the R output for covariance parameters, and Table 3.7 for an illustration of how to calculate the covariance parameters based on the R output in Subsection 3.4.3). In terms of tests for fixed effects, only the F -statistics reported for the treatment by sex interaction are similar in R. This is because the other procedures are using Type III tests by default (considering all other terms in the model), and R only produces Type I F -tests. The Type I and Type III F -test results correspond only for the last term entered into the model formula. Type I F -tests can also be obtained in proc mixed by using the htype = 1 option in the model statement. Stata uses Wald chi-square tests when the test command is used to perform these tests. The values of the –2 REML log-likelihoods for the fitted models are the same across these procedures. Again, we note that the information criteria (AIC and BIC) differ because of different calculation formulas being used by the two procedures. 3.6.3 Comparing Model 3.3 Results Table 3.8 shows selected results from the fit of Model 3.3 (our final model) using the procedures in SAS, SPSS, R, and Stata. This model has fixed effects associated with sex, treatment and litter size, and heterogeneous residual variances for the high/low vs. control treatment groups. These four procedures agree on the reported values of the estimated fixed-effect parameters and their standard errors, as well as the estimated covariance parameters and their standard errors. We note the same differences between the procedures in Table 3.8 that were discussed for Table 3.7, for the model fit criteria (AIC and BIC) and the F -tests for the fixed effects. Two-Level Models for Clustered Data:The Rat Pup Example 3.7 115 Interpreting Parameter Estimates in the Final Model The results discussed in this section were obtained from the fit of Model 3.3 using SAS proc mixed. Similar results can be obtained when using the GENLINMIXED procedure in SPSS, the lme() function in R, or the mixed command in Stata. 3.7.1 Fixed-Effect Parameter Estimates The SAS output below displays the fixed-effect parameter estimates and their corresponding t-tests. ' $ Solution for Fixed Effects ------------------------------------------------------------------------Standard Effect Sex Treat Estimate Error DF t Value Pr<|t| Intercept treat treat treat sex sex litsize & High Low Control Female Male 8.3276 - 0.8623 - 0.4337 0.0000 - 0.3434 0.0000 - 0.1307 0.27410 0.18290 0.15230 33.3 25.7 24.3 30.39 - 4.71 - 2.85 <.0001 <.0001 .0088 0.04204 256.0 - 8.17 <.0001 % The main effects of treatment (high vs. control and low vs. control) are both negative and significant (p < 0.01). We estimate that the mean birth weights of rat pups in the high and low treatment groups are 0.86 g and 0.43 g lower, respectively, than the mean birth weight of rat pups in the control group, adjusting for sex and litter size. Note that the degrees of freedom (DF) reported for the t-statistics are computed using the Satterthwaite method (requested by using the ddfm = sat option in proc mixed). The main effect of sex on birth weight is also negative and significant (p < 0.001). We estimate that the mean birth weight of female rat pups is 0.34 g less than that of male rat pups, after adjusting for treatment level and litter size. The main effect of litter size is also negative and significant (p < 0.001). 
We estimate that the mean birth weight of a rat pup in a litter with one additional pup is decreased by 0.13 g, adjusting for treatment and sex. The relationship between litter size and birth weight confirms our impression based on Figure 3.2, and corresponds with our intuition that larger litters would on average tend to have smaller pups. Post-hoc comparisons of the least-squares means, using the Tukey–Kramer adjustment method to compute p-values for all pairwise comparisons, are shown in the SAS output below. These comparisons are obtained by including the lsmeans statement in proc mixed: 0.01855 31.1 - 7.04 <.0001 lsmeans treat / adjust = tukey; # Differences of Least-Squares Means -------------------------------------------------------------------------------------------Effect treat _treat Estimate Std Error DF t Value Pr t Adjustment Adj P treat treat treat " High High Low Low Control Control - 0.4286 - 0.8623 - 0.4337 0.1755 0.1829 0.1523 23.4 25.7 24.3 2.44 4.71 2.85 .0225 <.0001 .0088 Tukey--Kramer Tukey--Kramer Tukey--Kramer .0558 .0002 .0231 ! 116 TABLE 3.8: Comparison of Results for Model 3.3 SPSS: GENLINMIXED R: lme() function Stata: mixed Estimation Method REML REML REML REML Fixed-Effect Parameter Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) 8.33(0.27) −0.86(0.18) −0.43(0.15) −0.34(0.04) −0.13(0.02) 8.33(0.27) −0.86(0.18) −0.43(0.15) −0.34(0.04) −0.13(0.02) 8.33(0.27) −0.86(0.18) −0.43(0.15) −0.34(0.04) −0.13(0.02) 8.33(0.27) −0.86(0.18) −0.43(0.15) −0.34(0.04) −0.13(0.02) Estimate (SE) Estimate (SE) Estimate (n.c.) Estimate (SE) β0 (Intercept) β1 (High vs. Control) β2 (Low vs. Control) β3 (Female vs. Male) β4 (Litter size) Covariance Parameter 2 σlitter 2 σhigh/low 2 σcontrol 0.10(0.03) 0.09(0.01) 0.26(0.03) 0.10(0.03) 0.09(0.01) 0.26(0.03) 0.10 0.09 0.26 0.10(0.03) 0.09(0.01) 0.26(0.03) 356.3 362.3 366.2 356.3 362.3 373.6 356.3 372.3 402.3 356.3 372.3 402.5 Type III F -Tests t(33.3) = 30.4, p < .01 Type III F -Tests t(33.0) = 30.4, p < .01 Type I F -Tests F (1, 294) = 9029.6, p < .01 Wald χ2 -Tests Z = 30.39, p < .01 Model Information Criteria –2 REML log-likelihood AIC BIC Tests for Fixed Effects Intercept Linear Mixed Models: A Practical Guide Using Statistical Software SAS: proc mixed Estimation Method SEX LITSIZE TREATMENT SAS: proc mixed SPSS: GENLINMIXED R: lme() function Stata: mixed REML REML REML REML F (1, 294) = 63.6, p < .01 F (1, 23) = 33.7, p < .01 F (2, 23) = 11.4, p < .01 χ2 (1) = 66.7, p < .01 χ2 (1) = 49.6, p < .01 χ2 (2) = 22.8, p < .01 F (1, 256) = 66.7, F (1, 256) = 66.7, p < .01 p < .01 F (1, 31.1) = 49.6, F (1, 31.1) = 49.6, p < .01 p < .01 F (2, 24.3) = 11.4, F (2, 24) = 11.4, p < .01 p < .01 Note: (n.c.) = not computed Note: 322 Rat Pups at Level 1; 27 Litters at Level 2 Two-Level Models for Clustered Data:The Rat Pup Example TABLE 3.8: (Continued) 117 118 Linear Mixed Models: A Practical Guide Using Statistical Software Both the high and low treatment means of birth weight are significantly different from the Control mean at α = 0.05 (p = .0002 and p = .0231, respectively), and the high and low treatment means are not significantly different (p = .0558). 3.7.2 Covariance Parameter Estimates The SAS output below displays the covariance parameter estimates reported by proc mixed for the REML-based fit of Model 3.3. 
Covariance Parameter Estimates -------------------------------------------------------------------------------Cov Parm Subject Group Estimate Std Error Z Value Pr Z Intercept litter 0.09900 0.03288 3.01 .0013 Residual TRTGRP High/low 0.09178 0.009855 9.31 <.0001 Residual TRTGRP control 0.2646 0.03395 7.79 <.0001 The estimated variance of the random effects associated with the intercept for litters is 0.099. The estimated residual variances for the high/low and control treatment groups are 0.092 and 0.265, respectively. That is, the estimated residual variance for the combined high/low treatment group is only about one-third as large as that of the control group. Based on this result, it appears that treatment not only lowers the mean birth weights of rat pups, but reduces their within-litter variance as well. The difference in variability based on treatment groups is apparent in the box plots in Figure 3.2. We do not consider the Wald z-tests for the covariance parameters included in this output, but rather recommend the use of likelihood ratio tests for testing the covariance parameters, as discussed in Subsections 3.5.1 and 3.5.2. 3.8 Estimating the Intraclass Correlation Coefficients (ICCs) In the context of a two-level model specification with random intercepts, the intraclass correlation coefficient (ICC) is a measure describing the similarity (or homogeneity) of the responses within a cluster (e.g., litter) and can be defined as a function of the variance components in the model. For the models investigated in this chapter, the litter-level ICC is defined as the proportion of the total random variation in the responses (the denominator in (3.4)) that is due to the variance of the random litter effects (the numerator in (3.4)): ICClitter = 2 σlitter 2 σlitter + σ2 (3.4) ICClitter will be high if the total random variation in the denominator is dominated by the variance of the random litter effects. Because variance components are by definition greater than or equal to zero, the ICCs derived using (3.4) are also greater than or equal to zero. Although the classical definition of an ICC arises from a variance components model, in which there are only random intercepts and no fixed effects, we illustrate the calculation of adjusted ICCs in this chapter (Raudenbush & Bryk, 2002). The definition of an adjusted ICC is based on the variance components used in the model containing both random intercepts and fixed effects. Two-Level Models for Clustered Data:The Rat Pup Example 119 We obtain an estimate of the ICC by substituting the estimated variance components from a fitted model into (3.4). The variance component estimates are clearly labeled in the output provided by each software procedure and can be used for this calculation. Another way to think about the intraclass correlation is in the context of a marginal model implied by a mixed model with random intercepts, which has a marginal variancecovariance matrix with a compound symmetry structure (see Subsection 2.3.2). In the marginal model, the ICC can be defined as a correlation coefficient between responses, common for any pair of individuals within the same cluster. For the Rat Pup example, the marginal ICC is defined as the correlation of observed birth weight responses on any two rat pups i and i within the same litter j, i.e., corr (WEIGHTij , WEIGHTi j ). Unlike the ICC calculated in (3.4), an ICC calculated using the marginal model approach could be either positive or negative. 
A negative marginal correlation between the weights of pairs of rat pups within the same litter would result in a negative intraclass correlation. This could occur, for example, in the context of competition for maternal resources in the womb, where rat pups that grow to be large might do so at the expense of other pups in the same litter. In practice, there are arguments for and against using each form of the ICC. The ICC in (3.4) is easy to understand conceptually, but is restricted to be positive by the definition of variance components. The ICC estimated from the marginal model allows for negative correlations, but loses the nice interpretation of the variance components made possible with random intercepts. In general, models with explicitly specified random effects and the corresponding ICC definition in (3.4) are preferred when the pairwise correlations within clusters are positive. However, we advise fitting a marginal model without random effects and with a compound symmetry variance-covariance structure for the residuals (see Subsection 2.3.1) to check for a negative ICC. If the ICC is in fact negative, a marginal model must be used. We illustrate fitting marginal models in Chapters 5 and 6, and discuss marginal models in more detail in Section 2.3. In this section, we calculate ICCs using the estimated variance components from Model 3.3 and also show how to obtain the intraclass correlations from the variance-covariance matrix of the marginal model implied by Model 3.3. Recall that Model 3.3 has a heterogeneous residual variance structure, with one residual variance for the control group and another for the pooled high/low treatment group. As a result, the observations on rat pups in the control group also have a different estimated intraclass correlation than observations on rat pups in the high/low treatment group. Equation (3.5) shows the estimated ICC for the control group, and (3.6) shows the estimated ICC for the pooled high/low treatment group. Because there is more within-litter variation in the control group, as noted in Figure 3.2, the estimated ICC is lower in the control group than in the pooled high/low group. litter = Control: ICC litter = High/Low: ICC 2 σ litter 0.0990 = = 0.2722 2 +σ control 0.0990 + 0.2646 2 σ litter 2 σ litter 0.0990 = = 0.5189 2 2 σ litter +σ high/low 0.0990 + 0.0918 (3.5) (3.6) We can also obtain estimates of the ICC for a single litter from the marginal variancecovariance matrix (Vj ) implied by Model 3.3. When we specify v = 3 in the random statement in proc mixed in SAS, the estimated marginal variance-covariance matrix for the observations on the four rat pups in litter 3 (V3 ) is displayed in the following output. Note that this litter was assigned to the control treatment. 120 Linear Mixed Models: A Practical Guide Using Statistical Software ' $ Estimated V Correlation Matrix for litter 3 Row Col1 Col2 Col3 Col4 -----------------------------------------1 0.3636 0.0990 0.0990 0.0990 2 0.0990 0.3636 0.0990 0.0990 3 0.0990 0.0990 0.3636 0.0990 4 0.0990 0.0990 0.0990 0.3636 & % The marginal variance-covariance matrices for observations on rat pups within every litter assigned to the control dose would have the same structure, with constant variance on the diagonal and constant covariance in the off-diagonal cells. The size of the matrix for each litter would depend on the number of pups in that litter. Observations on rat pups from different litters would have zero marginal covariance because they are assumed to be independent. 
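Because the adjusted ICCs in (3.5) and (3.6) are simple functions of the reported variance components, they can be recomputed directly from the output of any of the procedures. A small R sketch using the Model 3.3 estimates quoted above:

# Adjusted ICCs from the Model 3.3 variance component estimates
var.litter  <- 0.0990    # variance of the random litter effects
var.control <- 0.2646    # residual variance, control group
var.highlow <- 0.0918    # residual variance, pooled high/low group

var.litter / (var.litter + var.control)   # 0.2722, the control-group ICC
var.litter / (var.litter + var.highlow)   # 0.5189, the high/low-group ICC

# 0.2722 is also the off-diagonal entry implied by the marginal matrix above
# (covariance 0.0990 divided by marginal variance 0.3636)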
The estimated marginal correlations of observations collected on rat pups in litter 3 can be derived using the SAS output obtained from the vcorr = 3 option in the random statement. The estimated marginal correlation matrix for rat pups within litter 3, which received the control treatment, is as follows:

Estimated V Correlation Matrix for litter 3
Row      Col1      Col2      Col3      Col4
 1     1.0000    0.2722    0.2722    0.2722
 2     0.2722    1.0000    0.2722    0.2722
 3     0.2722    0.2722    1.0000    0.2722
 4     0.2722    0.2722    0.2722    1.0000

We see the same estimate of the marginal within-litter correlation for control litters, 0.2722, that was found when we calculated the ICC using the estimated variance components. Similar estimates of the marginal variance-covariance matrices implied by Model 3.3 can be obtained in R using the getVarCov() function:

> getVarCov(model3.3.reml.fit, individual="3", type="marginal")

Estimates for litters in different treatment groups can be obtained by changing the value of the “individual” argument in the preceding function in R, or by specifying different litter numbers in the v = and vcorr = options in SAS proc mixed. The option of displaying these marginal variance-covariance matrices is currently not available in the other three software procedures.

3.9 Calculating Predicted Values

3.9.1 Litter-Specific (Conditional) Predicted Values

Using the estimates obtained from the fit of Model 3.3, a formula for the predicted birth weight of an individual observation on rat pup i from litter j is written as:

WEIGHTij = 8.33 − 0.86 × TREAT1j − 0.43 × TREAT2j − 0.34 × SEX1ij − 0.13 × LITSIZEj + ûj   (3.7)

Recall that TREAT1j and TREAT2j in (3.7) are dummy variables, indicating whether or not an observation is for a rat pup in a litter that received the high or low dose treatment, SEX1ij is a dummy variable indicating whether or not a particular rat pup is a female, and LITSIZEj is a continuous variable representing the size of litter j. The predicted value ûj is the realization (or EBLUP) of the random litter effect for the j-th litter. This equation can also be used to write three separate equations for predicting birth weight in the three treatment groups:

High:     WEIGHTij = 7.47 − 0.34 × SEX1ij − 0.13 × LITSIZEj + ûj
Low:      WEIGHTij = 7.90 − 0.34 × SEX1ij − 0.13 × LITSIZEj + ûj
Control:  WEIGHTij = 8.33 − 0.34 × SEX1ij − 0.13 × LITSIZEj + ûj

For example, after fitting Model 3.3 using proc mixed, we see the EBLUPs of the random litter effects displayed in the SAS output. The EBLUP of the random litter effect for litter 7 (assigned to the control dose) is 0.3756, suggesting that pups in this litter tend to have higher birth weights than expected. The formula for the predicted birth weight values in this litter (which has LITSIZEj = 18) is:

WEIGHTij = 8.33 − 0.34 × SEX1ij − 0.13 × LITSIZEj + 0.3756
         = 8.71 − 0.34 × SEX1ij − 0.13 × 18
         = 6.37 − 0.34 × SEX1ij

We present this calculation for illustrative purposes only. The conditional predicted values for each rat pup can be placed in an output data set by using the outpred = option in the model statement in proc mixed, as illustrated in the final syntax for Model 3.3 in Subsection 3.4.1.
3.9.2 Population-Averaged (Unconditional) Predicted Values Population-averaged (unconditional) predicted values for birth weight in the three treatment groups, based only on the estimated fixed effects in Model 3.3, can be calculated using the following formulas: WEIGHT ij = 7.47 − 0.34 × SEX1ij − 0.13 × LITSIZEj Low: WEIGHT ij = 7.90 − 0.34 × SEX1ij − 0.13 × LITSIZEj Control: WEIGHTij = 8.33 − 0.34 × SEX1ij − 0.13 × LITSIZEj High: Note that these formulas, which can be used to determine the expected values of birth weight for rat pups of a given sex in a litter of a given size, simply omit the random litter effects. These population-averaged predicted values represent expected values of birth weight based on the marginal distribution implied by Model 3.3 and are therefore sometimes referred to as marginal predicted values. The marginal predicted values for each rat pup can be placed in an output data set by using the outpredm = option in the model statement in proc mixed, as illustrated in the final syntax for Model 3.3 in Subsection 3.4.1. 122 3.10 Linear Mixed Models: A Practical Guide Using Statistical Software Diagnostics for the Final Model The model diagnostics considered in this section are for the final model, Model 3.3, and are based on REML estimation of this model using SAS proc mixed. The syntax for this model is shown in Subsection 3.4.1. 3.10.1 Residual Diagnostics We first examine conditional raw residuals in Subsection 3.10.1.1, and then discuss studentized conditional raw residuals in Subsection 3.10.1.2. The conditional raw residuals are not as well suited to detecting outliers as are the studentized conditional residuals (Schabenberger, 2004). 3.10.1.1 Conditional Residuals In this section, we examine residual plots for the conditional raw residuals, which are the realizations of the residuals, εij , based on the REML fit of Model 3.3. They are the differences between the observed values and the litter-specific predicted values for birth weight, based on the estimated fixed effects and the EBLUPs of the random effects. Similar plots for the conditional residuals are readily obtained for this model using the other software procedures. Instructions on how to obtain these diagnostic plots in the other software procedures are included on the book’s web page (see Appendix A). The conditional raw residuals for Model 3.3 were saved in a new SAS data set (pdat1) by using the outpred = option in the model statement of proc mixed in Subsection 3.4.1. To check the assumption of normality for the conditional residuals, a histogram and a normal quantile–quantile plot (Q–Q plot) are obtained separately for the high/low treatment group and the control treatment group. To do this, we first use proc sgpanel with the histogram keyword, and then use proc univariate with the qqplot keyword: /* Figure 3.4: Histograms of residuals by treatment group */ title; proc sgpanel data=pdat1; panelby trtgrp / novarname rows=2; histogram resid; format trtgrp tgrpfmt.; run; /* Figure 3.5: Normal Q-Q plots of residuals by treatment group */ proc univariate data=pdat1; class trtgrp ; var resid; qqplot / normal(mu=est sigma=est); format trtgrp tgrpfmt.; run; We look at separate graphs for the high/low and control treatment groups. It would be inappropriate to combine these graphs for the conditional raw residuals, because of the differences in the residual variances for levels of TRTGRP in Model 3.3. 
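As noted above, instructions for producing these diagnostic plots in the other packages are on the book's web page. Purely as an illustration, a rough R sketch of the same checks is given below, assuming an nlme fit of Model 3.3 stored as model3.3.reml.fit (the object name used with getVarCov() in Section 3.8) and a TRTGRP variable in the assumed ratpup data frame:

# Conditional (litter-specific) raw residuals from the nlme fit of Model 3.3
resid.cond <- residuals(model3.3.reml.fit, level = 1)
trtgrp     <- ratpup$trtgrp                  # pooled treatment-group indicator (assumed name)

# Histograms and normal Q-Q plots, drawn separately by treatment group
par(mfrow = c(2, 2))
for (g in unique(trtgrp)) {
  r <- resid.cond[trtgrp == g]
  hist(r, main = paste("Residuals:", g), xlab = "Conditional residual")
  qqnorm(r, main = paste("Normal Q-Q:", g)); qqline(r)
}

# Shapiro-Wilk tests of normality within each group
tapply(resid.cond, trtgrp, shapiro.test)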
The plots in Figures 3.4 and 3.5 show that there are some outliers at the low end of the distribution of the conditional residuals and potentially negatively skewed distributions for both the high/low treatment and control groups. However, the skewness is not severe. Two-Level Models for Clustered Data:The Rat Pup Example 123 FIGURE 3.4: Histograms for conditional raw residuals in the pooled high/low and control treatment groups, based on the fit of Model 3.3. The Shapiro–Wilk test of normality for the conditional residuals reported by proc univariate (with the normal option) reveals that the assumption of normality for the conditional residuals is violated in both the pooled high/low treatment group (Shapiro– Wilk W = 0.982, p = 0.015)3 and the control group (W = 0.868, p < 0.001). These results suggest that the model still requires some additional refinement (e.g., transformation of the response variable, or further investigation of outliers). A plot of the conditional raw residuals vs. the predicted values by levels of TRTGRP can be generated using the following syntax: 3 SAS proc univariate also reports three additional tests for normality (including Kolmogorov– Smirnov), the results of which indicate that we would not reject a null hypothesis of normality for the conditional residuals in the high/low group (KS p > .15). 124 Linear Mixed Models: A Practical Guide Using Statistical Software FIGURE 3.5: Normal Q–Q plots for the conditional raw residuals in the pooled high/low and control treatment groups, based on the fit of Model 3.3. /* Figure 3.6: Plot of conditional raw residuals vs. predicted values */ proc sgpanel data = pdat1; panelby trtgrp / novarname rows=2; scatter y = resid x = pred ; format trtgrp tgrpfmt.; run; We do not see strong evidence of nonconstant variance for the pooled high/low group in Figure 3.6. However, in the control group, there is evidence of an outlier, which after some additional investigation was identified as being for PUP ID = 66. An analysis without the values for this rat pup resulted in similar estimates for each of the fixed effects included in Model 3.3, suggesting that this observation, though poorly fitted, did not have a great deal of influence on these estimates. 3.10.1.2 Conditional Studentized Residuals The conditional studentized residual for an observation is the difference between the observed value and the predicted value, based on both the fixed and random effects in the model, divided by its estimated standard error. The standard error of a residual depends on both the residual variance for the treatment group (high/low or control), and the leverage of the observation. The conditional studentized residuals are more appropriate to examine model assumptions and to detect outliers and potentially influential points than are the raw residuals because the raw residuals may not come from populations with equal variance, as was the case in this analysis. Furthermore, if an observation has a large conditional residual, we cannot know if it is large because it comes from a population with a larger variance, or if it is just an unusual observation (Schabenberger, 2004). Two-Level Models for Clustered Data:The Rat Pup Example 125 FIGURE 3.6: Scatter plots of conditional raw residuals vs. predicted values in the pooled high/low and control treatment groups, based on the fit of Model 3.3. We first examine the distribution of the conditional studentized residuals by obtaining box plots of the distribution of these residuals for each litter. 
These box plots are produced as a result of the boxplot option being used in the plots = specification of proc mixed, or they can be obtained by using the following syntax: proc sgplot data = pdat1 noautolegend; vbox studentresid / category=litter; run; In Figure 3.7, we notice that the rat pup in row 66 of the data set (corresponding to PUP ID = 66, from litter 6, because we sorted the ratpup3 data set by PUP ID before fitting Model 3.3) is an outlier. These box plots also show that the distribution of the conditional studentized residuals is approximately homogeneous across litters, unlike the raw birth weight observations displayed in Figure 3.2. Note that there is no need to plot studentized residuals separately for different treatments as there was for the raw conditional residuals. This plot gives us some confidence in the appropriateness of Model 3.3. 126 Linear Mixed Models: A Practical Guide Using Statistical Software FIGURE 3.7: Box plot of conditional studentized residuals by new litter ID, based on the fit of Model 3.3. 3.10.2 Influence Diagnostics In this section, we examine influence diagnostics generated by proc mixed for the rat pups and their litters. 3.10.2.1 Overall Influence Diagnostics The graphs displayed here were obtained by using the influence (iter = 5 effect = litter est) option in the model statement. The iter = suboption controls the maximum number of additional iterations that will be performed by proc mixed to update the parameter estimates when a given litter is deleted. The est suboption causes SAS to display a graph of the effects of each litter on the estimates of the fixed effects (shown in Figure 3.11). model weight = treat sex litsize / solution ddfm=sat influence(iter=5 effect=litter est) ; Figure 3.8 shows the effect of deleting one litter at a time on the restricted (REML) likelihood distance, a measure of the overall fit of the model. Based on this graph, litter 6 is shown to have a very large influence on the REML likelihood. In the remaining part of the model diagnostics, we will investigate which aspect of the model fit is influenced by individual litters (and litter 6 in particular). Figure 3.9 shows the effect of deleting each litter on selected model summary statistics (Cook’s D and Covratio). The upper left panel, Cook’s D Fixed Effects, shows the overall effect of the removal of each litter on the vector of fixed-effect parameter estimates. Litter 6 does not have a high value of Cook’s D statistic, indicating that it does not have a large effect on the fixed-effect parameter estimates, even though it has a large influence on the Two-Level Models for Clustered Data:The Rat Pup Example 127 FIGURE 3.8: Effect of deleting each litter on the REML likelihood distance for Model 3.3. FIGURE 3.9: Effects of deleting one litter at a time on summary measures of influence for Model 3.3. 128 Linear Mixed Models: A Practical Guide Using Statistical Software FIGURE 3.10: Effects of deleting one litter at a time on measures of influence for the covariance parameters in Model 3.3. REML likelihood. However, deletion of litter 6 has a very large influence on the estimated covariance parameters, as shown in the Cook’s D Covariance Parameters panel. The Covratio Fixed Effects panel illustrates the change in the precision of the fixed-effect parameter estimates brought about by deleting the j-th litter. A reference line is drawn at 1.0 for this graph. 
A litter with a value of covratio = 1.0 indicates that deletion of that litter has “no effect” on the precision of the fixed-effect parameter estimates, and a value of covratio substantially below 1.0 indicates that deleting that litter will improve the precision of the fixed-effect parameter estimates. A litter that greatly affects the covratio for the fixed effects could have a large influence on statistical tests (such as t-tests and F -tests). There do not appear to be any litters that have a large influence on the precision of the estimated fixed effects (i.e., much lower covratio than the others). However, the very small value of covratio = 0.4 for litter 6 in the panel of Covratio for Covariance Parameters indicates that the precision of the estimated covariance parameters can be substantially improved by removing litter 6. 3.10.2.2 Influence on Covariance Parameters The ODS graphics output from SAS also produces influence diagnostics for the covariance parameter estimates. These can be useful in identifying litters that may have a large effect 2 on the estimated between-litter variance (σlitter ), or on the residual variance estimates for the two treatment groups (high/low and control). We have included the panel of plots displaying the influence of each litter on the covariance parameters in Figure 3.10. Each panel of Figure 3.10 depicts the influence of individual litters on one of the covariance parameters in the model. There is a horizontal line showing the estimated value of the covariance parameter when all litters are used in the model. For example, in the first panel Two-Level Models for Clustered Data:The Rat Pup Example 129 FIGURE 3.11: Effect of deleting each litter on measures of influence for the fixed effects in Model 3.3. of the graph (labeled Intercept litter), the between-litter variance is displayed, with a horizontal line at 0.099, the estimated between-litter variance from Model 3.3. The change in the between-litter variance that would result from deletion of each litter is depicted. Litter 6 has little influence on the between-litter variance estimate. The two lower panels depict the residual variance estimates for the high/low treatment group (labeled Residual TRTGRP HIGH/LOW), and for the control group (labeled Residual TRTGRP Control). This graph shows that deletion of litter 6 would decrease the residual variance estimated for the control group from 0.2646 (as in Model 3.3) to about 0.155. We refitted Model 3.3 excluding litter 6, and found that this was in fact the case, with the residual variance for the control group decreasing from 0.2646 to 0.1569. 3.10.2.3 Influence on Fixed Effects Figure 3.11 shows the effect of the deletion of each litter on the fixed-effect parameter estimates, with one panel for each fixed effect. A horizontal line is drawn to represent the value of the parameter estimate when all cases are used in the analysis, and the effect of deleting each litter is depicted. For example, there is very little effect of litter 6 on the estimated intercept in the model (because litter 6 is a control litter, its effect would be included in the intercept). We refitted Model 3.3 excluding litter 6, and found that the estimated fixed effects were very similar (litter 6 received the control treatment). The residual variance for the control group is much smaller without the presence of this litter, as discussed in Subsection 3.10.2.2. 
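These cluster-deletion diagnostics are specific to proc mixed. In R, a crude but transparent alternative is to refit the model with each litter excluded in turn and compare the resulting fixed-effect estimates, as in the following sketch (variable names assumed as before; individual refits may need adjusted control settings if convergence problems arise).

# Leave-one-litter-out sensitivity check (a sketch, not the SAS influence() method)
library(nlme)
litters <- unique(ratpup$litter)
del_fits <- sapply(litters, function(l) {
  fit <- lme(weight ~ treatment + sex + litsize,
             random = ~ 1 | litter,
             weights = varIdent(form = ~ 1 | trtgrp),
             data = subset(ratpup, litter != l), method = "REML")
  fixef(fit)  # fixed-effect estimates without litter l
})
# Each column holds the estimates from one deletion fit; compare them
# to the estimates based on all litters.
round(t(del_fits), 3)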
The likelihood ratio test for Hypothesis 3.4 was still significant after deleting litter 6 (p < 0.001), which suggested retaining the heterogeneous residual variance model (Model 3.2B) rather than assuming homogeneous residual variance (Model 3.1). 130 3.11 3.11.1 Linear Mixed Models: A Practical Guide Using Statistical Software Software Notes and Recommendations Data Structure SAS proc mixed, the MIXED and GENLINMIXED commands in SPSS, the lme() and lmer() functions in R, and the mixed command in Stata all require a single data set in the “long” format, containing both Level 1 and Level 2 variables. See Table 3.2 for an illustration of the Rat Pup data set in the “long” format. The HLM2 procedure requires the construction of two data sets for a two-level analysis: a Level 1 data set containing information on the dependent variable, and covariates measured at Level 1 of the data (e.g., rat pups); and a Level 2 data set containing information on the clusters (e.g., litters). See Subsection 3.4.5 for more details on setting up the two data sets for HLM. 3.11.2 Syntax vs. Menus SAS proc mixed and the lme() and lmer() functions in R require users to create syntax to specify an LMM. The MIXED and GENLINMIXED procedures in SPSS allows users to specify a model using the menu system and then paste the syntax into a syntax window, where it can be saved or run later. Stata allows users to specify a model either through the menu system or by typing syntax for the mixed command. The HLM program works mainly by menus (although batch processing of syntax for fitting HLM models is possible; see the HLM documentation for more details). 3.11.3 Heterogeneous Residual Variances for Level 2 Groups In the Rat Pup analysis we considered models (e.g., Model 3.2A, Model 3.2B, and Model 3.3) in which rat pups from different treatment groups were allowed to have different residual variances. In other words, the Level 1 (residual) variance was allowed to vary as a function of a Level 2 variable (TREATMENT, which is a characteristic of the litter). The option of specifying a heterogeneous structure for the residual variance matrix, Rj , based on groups defined by Level 2 variables, is currently only available in SAS proc mixed, the GENLINMIXED command in SPSS, the lme() function in R, and the mixed command in Stata. The HLM2 procedure currently allows the residual variance at Level 1 of a two-level data set to be a function of Level 1 variables, but not of Level 2 variables. The ability to fit models with heterogeneous residual variance structures for different groups of observations can be very helpful in certain settings and was found to improve the fits of the models in the Rat Pup example. In some analyses, fitting a heterogeneous residual variance structure could make a critical difference in the results obtained and in the conclusions for the fixed effects in a model. 3.11.4 Display of the Marginal Covariance and Correlation Matrices SAS proc mixed and the lme() function in R allow users to request that the marginal covariance matrix implied by a fitted model be displayed. The option of displaying the implied marginal covariance matrix is currently not available in the other three software procedures. We have found that it is helpful to investigate the implied marginal variancecovariance and correlation matrices for a linear mixed model to see whether the estimated variance-covariance structure is appropriate for the data set being analyzed. 
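In R, one way (as we understand the nlme interface) to display the implied marginal matrices after an lme() fit is the getVarCov() function; the sketch below uses a simple random-intercept fit and assumes that "1" is a valid litter identifier in the data.

library(nlme)
fit3.1 <- lme(weight ~ treatment + sex + litsize,
              random = ~ 1 | litter, data = ratpup, method = "REML")
# Implied marginal variance-covariance matrix for the pups in litter "1";
# the corresponding correlation matrix can be obtained by coercing the
# result to a plain matrix and applying cov2cor().
getVarCov(fit3.1, individuals = "1", type = "marginal")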
Two-Level Models for Clustered Data:The Rat Pup Example 3.11.5 131 Differences in Model Fit Criteria The –2 REML log-likelihood values obtained by fitting the two-level random intercept models discussed in this chapter are nearly identical across the software procedures (See Tables 3.6 to 3.8). However, there are some minor differences in this statistic, as calculated by the HLM2 procedure, which are perhaps due to computational differences. The AIC statistic differs across the software procedures that calculate it, because of differences in the formulas used (see Subsection 3.6.1 for a discussion of the different calculation formulas for AIC). 3.11.6 Differences in Tests for Fixed Effects SAS proc mixed, the anova() function in R, and the MIXED and GENLINMIXED commands in SPSS all compute F -tests for the overall effects of fixed factors and covariates in the model, in addition to t-tests for individual fixed-effect parameters. In contrast, the mixed command in Stata calculates z-tests for individual fixed-effect parameters, or alternatively, Wald chi-square tests for fixed effects via the use of the test command. The HLM2 program calculates t-tests for fixed-effect parameters, or alternatively, Wald chi-square tests for sets of fixed effects. By default, the procedures in SAS and SPSS both calculate Type III F -tests, in which the significance of each term is tested conditional on the fixed effects of all other terms in the model. When the anova() function in R is applied to a model fit object, it returns Type I (sequential) F -tests, which are conditional on just the fixed effects listed in the model prior to the effects being tested. The Type III and Type I F -tests are comparable only for the term entered last in the model (except for certain models for balanced data). When making comparisons between the F -test results for SAS and R, we used the F -tests only for the last term entered in the model to maintain comparability. Note that SAS proc mixed can calculate Type I F -tests by specifying the htype = 1 option in the model statement. The programs that calculate F -tests and t-tests use differing methods to calculate denominator degrees of freedom (or degrees of freedom for t-tests) for these approximate tests. SAS allows the most flexibility in specifying methods for calculating denominator degrees of freedom, as discussed below. The MIXED command in SPSS and the lmerTest package in R both use the Satterthwaite approximation when calculating denominator degrees of freedom for these tests and do not provide other options. A note on denominator degrees of freedom (df ) methods for F -tests in SAS The containment method is the default denominator degrees of freedom method for proc mixed when a random statement is used, and no denominator degrees of freedom method is specified. The containment method can be explicitly requested by using the ddfm = contain option. The containment method attempts to choose the correct degrees of freedom for fixed effects in the model, based on the syntax used to define the random effects. For example, the df for the variable TREAT would be the df for the random effect that contains the word “treat” in it. The syntax random int / subject = litter(treat); would cause SAS to use the degrees of freedom for litter(treat) as the denominator degrees of freedom for the F -test of the fixed effects of TREAT. If no random effect syntactically includes the word “treat,” the residual degrees of freedom would be used. 
The Satterthwaite approximation (ddfm = sat) is intended to produce a more accurate F -test approximation, and hence a more accurate p-value for the F -test. The 132 Linear Mixed Models: A Practical Guide Using Statistical Software SAS documentation warns that the small-sample properties of the Satterthwaite approximation have not been thoroughly investigated for all types of models implemented in proc mixed. The Kenward–Roger method (ddfm = kr) is an attempt to make a further adjustment to the F -statistics calculated by SAS, to take into account the fact that the REML estimates of the covariance parameters are in fact estimates and not known quantities. This method inflates the marginal variance-covariance matrix and then uses the Satterthwaite method on the resulting matrix. The between-within method (ddfm = bw) is the default for repeated-measures designs, and may also be specified for analyses that include a random statement. This method divides the residual degrees of freedom into a between-subjects portion and a within-subjects portion. If levels of a covariate change within a subject (e.g., a timedependent covariate), then the degrees of freedom for effects associated with that covariate are the within-subjects df. If levels of a covariate are constant within subjects (e.g., a time-independent covariate), then the degrees of freedom are assigned to the between-subjects df for F -tests. The residual method (ddfm = resid) assigns n − rank(X) as the denominator degrees of freedom for all fixed effects in the model. The rank of X is the number of linearly independent columns in the X matrix for a given model. This is the same as the degrees of freedom used in ordinary least squares (OLS) regression (i.e., n−p, where n is the total number of observations in the data set and p is the number of fixed-effect parameters being estimated). In R, when applying the summary() or anova() functions to the object containing the results of an lme() fit, denominator degrees of freedom for tests associated with specific fixed effects are calculated as follows: Denominator degrees of freedom (df ) methods for F -tests in R Level 1: df = # of Level 1 observations – # of Level 2 clusters – # of Level 1 fixed effects Level 2: df = # of Level 2 clusters – # of Level 2 fixed effects For example, in the Type I F -test for TREATMENT from R reported in Table 3.8, the denominator degrees of freedom is 23, which is calculated as 27 (the number of litters) minus 4 (the number of fixed-effect parameters associated with the Level 2 variables). In the Type I F -test for SEX, the denominator degrees of freedom is 294, which is calculated as 322 (the number of pups in the data set) minus 27 (the number of litters), minus 1 (the number of fixed-effect parameters associated with the Level 1 variables). In general, the differences in the denominator degrees of freedom calculations are not critical for the t-tests and F -tests when working with data sets and relatively large (> 100) samples at each level (e.g., rat pups at Level 1 and litters at Level 2) of the data set. However, more attention should be paid when working with small samples, and these cases are best handled using the different options provided by proc mixed in SAS. We recommend using the Kenward–Roger method in these situations (Verbeke & Molenberghs, 2000). 
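For lmer() fits in R, similar small-sample adjustments are available through the lmerTest package (with the pbkrtest package supplying the Kenward–Roger computations). A hedged sketch, again assuming the Rat Pup variable names, is shown below; note that this random-intercept fit assumes a homogeneous residual variance, because lmer() does not allow group-specific residual variances.

library(lmerTest)  # loads lme4 and adds Satterthwaite / Kenward-Roger tests
fit <- lmer(weight ~ treatment + sex + litsize + (1 | litter), data = ratpup)
anova(fit, ddf = "Satterthwaite")    # the lmerTest default
anova(fit, ddf = "Kenward-Roger")    # requires the pbkrtest package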
We also remind readers that denominator degrees of freedom are not computed by default when using the lme4 package version of the lmer() function in R, and p-values for computed test statistics are not displayed as a result. Users of the lmer() function can Two-Level Models for Clustered Data:The Rat Pup Example 133 consider loading the lmerTest package to compute Satterthwaite approximations of these degrees of freedom (and p-values for the corresponding test statistics). 3.11.7 Post-Hoc Comparisons of Least Squares (LS) Means (Estimated Marginal Means) A very useful feature of SAS proc mixed and the MIXED and GENLINMIXED commands in SPSS is the ability to obtain post-hoc pairwise comparisons of the least-squares means (called estimated marginal means in SPSS) for different levels of a fixed factor, such as TREATMENT. This can be accomplished by the use of the lsmeans statement in SAS and the EMMEANS subcommand in SPSS. Different adjustments are available for multiple comparisons in these two software procedures. SAS has the wider array of choices (e.g., Tukey–Kramer, Bonferroni, Dunnett, Sidak, and several others). SPSS currently offers only the Bonferroni and Sidak methods. Post-hoc comparisons can also be computed in R with some additional programming (Faraway, 2005), and users of Stata can apply the margins command after fitting a model using mixed to explore alternative differences in marginal means (see help margins in Stata). 3.11.8 Calculation of Studentized Residuals and Influence Statistics Whereas each software procedure can calculate both conditional and marginal raw residuals, SAS proc mixed is currently the only program that automatically provides studentized residuals, which are preferred for model diagnostics (detection of outliers and identification of influential observations). SAS has also developed powerful and flexible influence diagnostics that can be used to explore the potential influence of individual observations or clusters of observations on both fixed-effect parameter estimates and covariance parameter estimates, and we have illustrated the use of these diagnostic tools in this chapter. 3.11.9 Calculation of EBLUPs The procedures in SAS, R, Stata, and HLM all allow users to generate and save the EBLUPs for any random effects (and not just random intercepts, as in this example) included in an LMM. In general, the current version of SPSS does not allow users to extract the values of the EBLUPs. However, for models with a single random intercept for each cluster, the EBLUPs can be easily calculated in SPSS, as illustrated for Model 3.1 in Subsection 3.4.2. 3.11.10 Tests for Covariance Parameters By default, the procedures in SAS and SPSS do not report Wald tests for covariance parameters, but they can be requested as a means of obtaining the standard errors of the estimated covariance parameters. We do not recommend the use of Wald tests for testing hypotheses about the covariance parameters. REML-based likelihood ratio tests for covariance parameters can be carried out (with a varying amount of effort by the user) in proc mixed in SAS, the MIXED and GENLINMIXED commands in SPSS, and by using the anova() function after fitting an LMM with the either the lme() or the lmer() function in R. The syntax to perform these likelihood ratio tests is most easily implemented using R, although the user may need to adjust the p-value obtained, to take into account mixtures of χ2 distributions when appropriate. 
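As an illustration, a REML-based likelihood ratio test for a single random intercept can be pieced together in R along the following lines (a sketch with assumed variable names); the naive chi-square p-value with 1 degree of freedom is halved to account for the 0.5χ²(0) + 0.5χ²(1) mixture under the null hypothesis.

library(nlme)
fit_ri  <- lme(weight ~ treatment + sex + litsize,
               random = ~ 1 | litter, data = ratpup, method = "REML")
fit_ols <- gls(weight ~ treatment + sex + litsize,
               data = ratpup, method = "REML")
lrt <- 2 * (logLik(fit_ri) - logLik(fit_ols))
0.5 * pchisq(as.numeric(lrt), df = 1, lower.tail = FALSE)  # mixture p-value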
The default likelihood ratio test reported by the mixed command in Stata for the covariance parameters is an overall test of the covariance parameters associated with all random 134 Linear Mixed Models: A Practical Guide Using Statistical Software effects in the model. In models with a single random effect for each cluster, as in Model 3.1, it is appropriate to use this test to decide if that effect should be included in the model. However, the likelihood ratio test reported by Stata should be considered conservative in models including multiple random effects. The HLM procedures utilize alternative chi-square tests for covariance parameters associated with random effects, the definition of which depend on the number of levels of data (e.g., whether one is studying a two-level or three-level data set). For details on these tests in HLM, refer to Raudenbush & Bryk (2002). 3.11.11 Reference Categories for Fixed Factors By default, proc mixed in SAS and the MIXED and GENLINMIXED commands in SPSS both prevent overparameterization of LMMs with categorical fixed factors by setting the fixed effects associated with the highest-valued levels of the factors to zero. This means that when using these procedures, the highest-valued level of a fixed factor can be thought of as the “reference” category for the factor. The lme() and lmer() functions in R and the mixed command in Stata both use the lowest-valued levels of factors as the reference categories, by default. These differences will result in different parameterizations of the model being fitted in the different software procedures unless recoding is carried out prior to fitting the model. However, the choice of parameterization will not affect the overall model fit criteria, predicted values, hypothesis tests, or residuals. Unlike the other four procedures, the HLM procedures in general (e.g., HLM2 for twolevel models, HLM3 for three-level models, etc.) do not automatically generate dummy variables for levels of categorical fixed factors to be included in a mixed model. Indicator variables for the nonreference levels of fixed factors need to be generated first in another software package before importing the data sets into HLM. This leaves the choice of the reference category to the user. 4 Three-Level Models for Clustered Data: The Classroom Example 4.1 Introduction In this chapter, we illustrate models for clustered study designs having three levels of data. In three-level clustered data sets, the units of analysis (Level 1) are nested within randomly sampled clusters (Level 2), which are in turn nested within other larger randomly sampled clusters (Level 3). Such study designs allow us to investigate whether covariates measured at each level of the data hierarchy have an impact on the dependent variable, which is always measured on the units of analysis at Level 1. Designs that lead to three-level clustered data sets can arise in many fields of research, as illustrated in Table 4.1. For example, in the field of education research, as in the Classroom data set analyzed in this chapter, students’ math achievement scores are studied by first randomly selecting a sample of schools, then sampling classrooms within each school, and finally sampling students from each selected classroom (Figure 4.1). In medical research, a study to evaluate treatments for blood pressure may be carried out in multiple clinics, with several doctors from each clinic selected to participate and multiple patients treated by each doctor participating in the study. 
In a laboratory research setting, a study of birth weights in rat pups similar to the Rat Pup study that we analyzed in Chapter 3 might have replicate experimental runs, with several litters of rats involved in each run, and several rat pups in each litter. In Table 4.1, we provide possible examples of three-level clustered data sets in different research settings. In these studies, the dependent variable is measured on one occasion for each Level 1 unit of analysis, and covariates may be measured at each level of the data hierarchy. For example, in a study of student achievement, the dependent variable, math achievement score, is measured once for each student. In addition, student characteristics such as age, classroom characteristics such as class size, and school characteristics such as neighborhood poverty level are all measured. In the multilevel models that we fit to threelevel data sets, we relate the effects of covariates measured at each level of the data to the dependent variable. Thus, in an analysis of the study of student achievement, we might use student characteristics to help explain between-student variability in math achievement scores, classroom characteristics to explain between-classroom variability in the classroomspecific average math scores, and school-level covariates to help explain between-school variation in school-specific mean math achievement scores. Figure 4.1 illustrates the hierarchical structure of the data for an educational study similar to the Classroom study analyzed in this chapter. This figure depicts the data structure for a single (hypothetical) school with two randomly selected classrooms. Models applicable for data from such sampling designs are known as three-level hierarchical linear models (HLMs) and are extensions of the two-level models introduced in 135 136 Linear Mixed Models: A Practical Guide Using Statistical Software TABLE 4.1: Examples of Three-Level Data in Different Research Settings Research Setting Level of Data Level 3 Level 2 Level 1 Education Medicine Biology Cluster of clusters (random factor) School Clinic Replicate Covariates School size, poverty level of neighborhood surrounding the school Number of doctors in the clinic, clinic type (public or private) Experimental run, instrument calibration, ambient temperature, run order Cluster of units (random factor) Classroom Doctor Litter Covariates Class size, years of experience of teacher Specialty, years of experience Litter size, weight of mother rat Unit of analysis Student Patient Rat Pup Dependent variable Test scores Blood pressure Birth weight Covariates Sex, age Age, illness severity Sex FIGURE 4.1: Nesting structure of a clustered three-level data set in an educational setting. Three-Level Models for Clustered Data:The Classroom Example 137 Chapter 3. Two-level, three-level, and higher-level models are generally referred to as multilevel models. Although multilevel models in general may have random effects associated with both the intercept and with other covariates, in this chapter we restrict our discussion to random intercept models, in which random effects at each level of clustering are associated with the intercept. The random effects associated with the clusters at Level 2 of a three-level data set are often referred to as nested random effects, because the Level 2 clusters are nested within the clusters at Level 3. In the absence of fixed effects associated with covariates, a three-level HLM is also known as a variance components model. 
The HLM software (Raudenbush et al., 2005; Raudenbush & Bryk, 2002), which we highlight in this chapter, was developed specifically for analyses involving these and related types of models. 4.2 4.2.1 The Classroom Study Study Description The Study of Instructional Improvement,1 or SII (Anderson et al., 2009), was carried out by researchers at the University of Michigan to study the math achievement scores of firstand third-grade students in randomly selected classrooms from a national U.S. sample of elementary schools. In this example, we analyze data for 1,190 first-grade students sampled from 312 classrooms in 107 schools. The dependent variable, MATHGAIN, measures change in student math achievement scores from the spring of kindergarten to the spring of first grade. The SII study design resulted in a three-level data set, in which students (Level 1) are nested within classrooms (Level 2), and classrooms are nested within schools (Level 3). We examine the contribution of selected student-level, classroom-level and school-level covariates to the variation in students’ math achievement gain. Although one of the original study objectives was to compare math achievement gain scores in schools participating in comprehensive school reforms (CSRs) to the gain scores from a set of matched comparison schools not participating in the CSRs, this comparison is not considered here. A sample of the Classroom data is shown in Table 4.2. The first two rows in the table are included to distinguish between the different types of variables; in the actual electronic data set, the variable names are defined in the first row. Each row of data in Table 4.2 contains the school ID, classroom ID, student ID, the value of the dependent variable, and values of three selected covariates, one at each level of the data. The layout of the data in the table reflects the hierarchical nature of the data set. For instance, the value of MATHPREP, a classroom-level covariate, is the same for all students in the same classroom, and the value of HOUSEPOV, a school-level covariate, is the same for all students within a given school. Values of student-level variables (e.g., the dependent variable, MATHGAIN, and the covariate, SEX) vary from student to student (row to row) in the data. 1 Work on this study was supported by grants from the U.S. Department of Education to the Consortium for Policy Research in Education (CPRE) at the University of Pennsylvania (Grant # OERI-R308A60003), the National Science Foundation’s Interagency Educational Research Initiative to the University of Michigan (Grant #s REC-9979863 & REC-0129421), the William and Flora Hewlett Foundation, and the Atlantic Philanthropies. Opinions expressed in this book are those of the authors and do not reflect the views of the U.S. Department of Education, the National Science Foundation, the William and Flora Hewlett Foundation, or the Atlantic Philanthropies. 
TABLE 4.2: Sample of the Classroom Data Set

             School (Level 3)        Classroom (Level 2)     Student (Level 1)
             Cluster ID  Covariate   Cluster ID  Covariate   Unit ID  Dep. Variable  Covariate
             SCHOOLID    HOUSEPOV    CLASSID     MATHPREP    CHILDID  MATHGAIN       SEX
                 1        0.0817       160         2.00          1        32          1
                 1        0.0817       160         2.00          2       109          0
                 1        0.0817       160         2.00          3        56          1
                 1        0.0817       217         3.25          4        83          0
                 1        0.0817       217         3.25          5        53          0
                 1        0.0817       217         3.25          6        65          1
                 1        0.0817       217         3.25          7        51          0
                 1        0.0817       217         3.25          8        66          0
                 1        0.0817       217         3.25          9        88          1
                 1        0.0817       217         3.25         10         7          0
                 1        0.0817       217         3.25         11        60          0
                 2        0.0823       197         2.50         12         2          1
                 2        0.0823       197         2.50         13       101          0
                 2        0.0823       211         2.33         14        30          0
                 2        0.0823       211         2.33         15        65          0
                ...

Note: "..." indicates portion of the data not displayed.

The following variables are considered in the analysis of the Classroom data:

School (Level 3) Variables
• SCHOOLID = School ID number (a)
• HOUSEPOV = Percentage of households in the neighborhood of the school below the poverty level

Classroom (Level 2) Variables
• CLASSID = Classroom ID number (a)
• YEARSTEA (b) = First-grade teacher's years of teaching experience
• MATHPREP = First-grade teacher's mathematics preparation: number of mathematics content and methods courses
• MATHKNOW (b) = First-grade teacher's mathematics content knowledge; based on a scale composed of 30 items (higher values indicate higher content knowledge)

Student (Level 1) Variables
• CHILDID = Student ID number (a)
• MATHGAIN = Student's gain in math achievement score from the spring of kindergarten to the spring of first grade (the dependent variable)
• MATHKIND (b) = Student's math score in the spring of their kindergarten year
• SEX = Indicator variable (0 = boy, 1 = girl)
• MINORITY (b) = Indicator variable (0 = non-minority student, 1 = minority student)
• SES (b) = Student socioeconomic status

(a) The original ID numbers in the study were randomly reassigned for the data set used in this example.
(b) Not shown in Table 4.2.

4.2.2 Data Summary

The descriptive statistics and plots for the Classroom data presented in this section are obtained using the HLM software (Version 7). Syntax to carry out these descriptive analyses using the other software packages is included on the web page for the book (see Appendix A).

4.2.2.1 Data Set Preparation

To perform the analyses for the Classroom data set using the HLM software, we need to prepare three separate data sets:

1. The Level 1 (student-level) Data Set has one record per student, and contains variables measured at the student level, including the response variable, MATHGAIN, the student-level covariates, and the identifying variables: SCHOOLID, CLASSID, and CHILDID. The Level 1 data set is sorted in ascending order by SCHOOLID, CLASSID, and CHILDID.

2. The Level 2 (classroom-level) Data Set has one record per classroom, and contains classroom-level covariates, such as YEARSTEA and MATHKNOW, that are measured for the teacher. Each record contains the identifying variables SCHOOLID and CLASSID, and the records are sorted in ascending order by these variables.

3. The Level 3 (school-level) Data Set has one record per school, and contains school-level covariates, such as HOUSEPOV, and the identifying variable, SCHOOLID. The data set is sorted in ascending order by SCHOOLID.

The Level 1, Level 2, and Level 3 data sets can easily be derived from a single data set having the "long" structure illustrated in Table 4.2.
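If the long-format file is already available in R, for example, the three input files can be derived along the following lines (a sketch; lower-case variable names are assumed):

# Level 1 (student), Level 2 (classroom), and Level 3 (school) data sets
level1 <- classroom[, c("schoolid", "classid", "childid", "mathgain",
                        "mathkind", "sex", "minority", "ses")]
level2 <- unique(classroom[, c("schoolid", "classid", "yearstea",
                               "mathprep", "mathknow")])
level3 <- unique(classroom[, c("schoolid", "housepov")])
# HLM expects each file to be sorted by its identifying variables.
level1 <- level1[order(level1$schoolid, level1$classid, level1$childid), ]
level2 <- level2[order(level2$schoolid, level2$classid), ]
level3 <- level3[order(level3$schoolid), ]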
All three data sets should be stored in a format readable by HLM, such as ASCII (i.e., raw data in text files), or a data set specific to a statistical software package. For ease of presentation, we assume that the three data sets for this analysis have been set up in SPSS format (Version 21+). 4.2.2.2 Preparing the Multivariate Data Matrix (MDM) File We prepare two MDM files for this initial data summary. One includes only variables that do not have any missing values (i.e., it excludes MATHPREP and MATHKNOW), and 140 Linear Mixed Models: A Practical Guide Using Statistical Software consequently, all students (n = 1190) are included. The second MDM file contains all variables, including MATHPREP and MATHKNOW, and thus includes complete cases (n = 1081) only. We create the first Multivariate Data Matrix (MDM) file using the Level 1, Level 2, and Level 3 data sets defined earlier. After starting HLM, locate the main menu, and click on File, Make new MDM file, and then Stat package input. In the dialog box that opens, select HLM3 to fit a three-level hierarchical linear model with nested random effects, and click OK. In the next window, choose the Input File Type as SPSS/Windows. In the Level-1 Specification area of this window, we select the Level 1 data set. Browse to the location of the student-level file. Click Choose Variables, and select the following variables: CLASSID (check “L2id,” because this variable identifies clusters of students at Level 2 of the data), SCHOOLID (check “L3id,” because this variable identifies clusters of classrooms at Level 3 of the data), MATHGAIN (check “in MDM,” because this is the dependent variable), and the student-level covariates MATHKIND, SEX, MINORITY, and SES (check “in MDM” for all of these variables). Click OK when finished selecting these variables. In the Missing Data? box, select “Yes,” because some students may have missing values on some of the variables. Then choose Delete missing level-1 data when: running analyses, so that observations with missing data will be deleted from individual analyses and not from the MDM file. In the Level-2 Specification area of this window, select the Level 2 (classroom-level) data set defined above. Browse to the location of the classroom-level data set. In the Choose Variables dialog box, choose the CLASSID variable (check “L2id”) and the SCHOOLID variable (check “L3id”). In addition, select the classroom-level covariate that has complete data for all classrooms,2 which is YEARSTEA (check “in MDM”). Click OK when finished selecting these variables. In the Level-3 Specification area of this window, select the Level 3 (school-level) data set defined earlier. Browse to the location of the school-level data set, and choose the SCHOOLID variable (click on “L3id”) and the HOUSEPOV variable, which has complete data for all schools (check “in MDM”). Click OK when finished. Once all three data sets have been identified and the variables have been selected, go to the MDM template file portion of the window, where you will see a white box for an MDM File Name (with an .mdm suffix). Enter a name for the MDM file (such as classroom.mdm), including the .mdm suffix, and then click Save mdmt file to save an MDM Template file that can be used later when creating the second MDM file. Finally, click on Make MDM to create the MDM file using the three input files. 
After HLM finishes processing the MDM file, click Check Stats to view descriptive statistics for the selected variables at each level of the data, as shown in the following HLM output:

LEVEL-1 DESCRIPTIVE STATISTICS
VARIABLE NAME      N      MEAN       SD    MINIMUM   MAXIMUM
SEX             1190      0.51     0.50      0.00      1.00
MINORITY        1190      0.68     0.47      0.00      1.00
MATHKIND        1190    466.66    41.46    290.00    629.00
MATHGAIN        1190     57.57    34.61   -110.00    253.00
SES             1190     -0.01     0.74     -1.61      3.21

LEVEL-2 DESCRIPTIVE STATISTICS
VARIABLE NAME      N      MEAN       SD    MINIMUM   MAXIMUM
YEARSTEA         312     12.28     9.65      0.00     40.00

LEVEL-3 DESCRIPTIVE STATISTICS
VARIABLE NAME      N      MEAN       SD    MINIMUM   MAXIMUM
HOUSEPOV         107      0.19     0.14      0.01      0.56

2 We do not select the classroom-level variables MATHPREP and MATHKNOW when setting up the first MDM file, because information on these two variables is missing for some classrooms, which would result in students from these classrooms being omitted from the initial data summary.

Because none of the selected variables in the MDM file have any missing data, this is a descriptive summary for all 1190 students, within the 312 classrooms and 107 schools. We note that 51% of the 1190 students are female and that 68% of them are of minority status. The average number of years teaching for the 312 teachers is 12.28, and the mean proportion of households in poverty in the neighborhoods of the 107 schools is 0.19 (19%).

We now construct the second MDM file. After closing the window containing the descriptive statistics, click Choose Variables in the Level-2 Specification area. At this point, we add the MATHKNOW and MATHPREP variables to the MDM file by checking "in MDM" for each of these variables; click OK after selecting them. Then, save the .mdm file and the .mdmt template file under different names and click Make MDM to generate a new MDM file containing these additional Level 2 variables. After clicking Check Stats, the following output is displayed:

LEVEL-1 DESCRIPTIVE STATISTICS
VARIABLE NAME      N      MEAN       SD    MINIMUM   MAXIMUM
SEX             1081      0.50     0.50      0.00      1.00
MINORITY        1081      0.67     0.47      0.00      1.00
MATHKIND        1081    467.15    42.00    290.00    629.00
MATHGAIN        1081     57.84    34.70    -84.00    253.00
SES             1081     -0.01     0.75     -1.61      3.21

LEVEL-2 DESCRIPTIVE STATISTICS
VARIABLE NAME      N      MEAN       SD    MINIMUM   MAXIMUM
YEARSTEA         285     12.28     9.80      0.00     40.00
MATHKNOW         285     -0.08     1.00     -2.50      2.61
MATHPREP         285      2.58     0.96      1.00      6.00

LEVEL-3 DESCRIPTIVE STATISTICS
VARIABLE NAME      N      MEAN       SD    MINIMUM   MAXIMUM
HOUSEPOV         105      0.20     0.14      0.01      0.56

Note that 27 classrooms have been dropped from the Level 2 file because of missing data on the additional Level 2 variables, MATHKNOW and MATHPREP. Consequently, there were two schools (SCHOOLID = 48 and 58) omitted from the Check Stats summary, because none of the classrooms in these two schools had information on MATHKNOW and MATHPREP. This resulted in only 1081 students being analyzed in the data summary based on the second MDM file, which is 109 fewer than were included in the first summary.

We close the window containing the descriptive statistics and click Done to proceed to the HLM model-building window. In the model-building window we continue the data summary by examining box plots of the observed MATHGAIN responses for students in the selected classrooms from the first eight schools in the first (complete) MDM file. Select File, Graph Data, and box-whisker plots.
To generate these box plots, select MATHGAIN as the Y-axis variable and select Group at level-3 to generate a separate panel of box plots for each school. For Number of groups, we select First ten groups (corresponding to the first 10 schools). Finally, click OK. HLM produces box plots for the first 10 schools, and we display the plots for the first eight schools in Figure 4.2. In Figure 4.2, we note evidence of both between-classroom and between-school variability in the MATHGAIN responses. We also see differences in the within-classroom variability, which may be explained by student-level covariates. By clicking Graph Settings in the HLM graph window, one can select additional Level 2 (e.g., YEARSTEA) or Level 3 (e.g., HOUSEPOV) variables as Z-focus variables that color-code box plots, based on values of the classroom- and school-level covariates. Readers should refer to the HLM manual for more information on additional graphing features. 4.3 Overview of the Classroom Data Analysis For the analysis of the Classroom data, we follow the “step-up” modeling strategy outlined in Subsection 2.7.2. This approach differs from the strategy used in Chapter 3, in that it starts with a simple model, containing only a single fixed effect (the overall intercept), random effects associated with the intercept for classrooms and schools, and residuals, and then builds the model by adding fixed effects of covariates measured at the various levels. In Subsection 4.3.1 we outline the analysis steps, and informally introduce related models and hypotheses to be tested. In Subsection 4.3.2 we present the specification of selected models that will be fitted to the Classroom data, and in Subsection 4.3.3 we detail the hypotheses tested in the analysis. To follow the analysis steps outlined in this section, we refer readers to the schematic diagram presented in Figure 4.3. 4.3.1 Analysis Steps Step 1: Fit the initial “unconditional” (variance components) model (Model 4.1). Fit a three-level model with a fixed intercept, and random effects associated with the intercept for classrooms (Level 2) and schools (Level 3), and decide whether to keep the random intercepts for classrooms (Model 4.1 vs. Model 4.1A). Because Model 4.1 does not include fixed effects associated with any covariates, it is referred to as a “means-only” or “unconditional” model in the HLM literature and is known as a variance components model in the classical ANOVA context. We include the term Three-Level Models for Clustered Data:The Classroom Example 143 FIGURE 4.2: Box plots of the MATHGAIN responses for students in the selected classrooms in the first eight schools for the Classroom data set. “unconditional” in quotes because it indicates a model that is not conditioned on any fixed effects other than the intercept, although it is still conditional on the random effects. The model includes a fixed overall intercept, random effects associated with the intercept for classrooms within schools, and random effects associated with the intercept for schools. After fitting Model 4.1, we obtain estimates of the initial variance components, i.e., the variances of the random effects at the school level and the classroom level, and the residual variance at the student level. We use the variance component estimates from the Model 4.1 fit to estimate intraclass correlation coefficients (ICCs) of MATHGAIN responses at the school level and at the classroom level (see Section 4.8). 
144 Linear Mixed Models: A Practical Guide Using Statistical Software FIGURE 4.3: Model selection and related hypotheses for the Classroom data analysis. We also test Hypothesis 4.1 to decide whether the random effects associated with the intercepts for classrooms nested within schools can be omitted from Model 4.1. We fit a model without the random classroom effects (Model 4.1A) and perform a likelihood ratio test. We decide to retain the random intercepts associated with classrooms nested within schools, and we also retain the random intercepts associated with schools in all subsequent models, to preserve the hierarchical structure of the data. Model 4.1 is preferred at this stage of the analysis. Step 2: Build the Level 1 Model by adding Level 1 Covariates (Model 4.1 vs. Model 4.2). Add fixed effects associated with covariates measured on the students to the Level 1 Model to obtain Model 4.2, and evaluate the reduction in the residual variance. In this step, we add fixed effects associated with the four student-level covariates (MATHKIND, SEX, MINORITY, and SES) to Model 4.1 and obtain Model 4.2. We in- Three-Level Models for Clustered Data:The Classroom Example 145 formally assess the related reduction in the between-student variance (i.e., the residual variance). We also test Hypothesis 4.2, using a likelihood ratio test, to decide whether we should add the fixed effects associated with all of the student-level covariates to Model 4.1. We decide to add these fixed effects and choose Model 4.2 as our preferred model at this stage of the analysis. Step 3: Build the Level 2 Model by adding Level 2 covariates (Model 4.3). Add fixed effects associated with the covariates measured on the Level 2 clusters (classrooms) to the Level 2 model to create Model 4.3, and decide whether to retain the effects of the Level 2 covariates in the model. We add fixed effects associated with the three classroom-level covariates (YEARSTEA, MATHPREP, and MATHKNOW) to Model 4.2 to obtain Model 4.3. At this point, we would like to assess whether the Level 2 component of variance (i.e., the variance of the nested random classroom effects) is reduced when we include the effects of these Level 2 covariates in the model. However, Models 4.2 and 4.3 are fitted using different sets of observations, owing to the missing values on the MATHPREP and MATHKNOW covariates, and a simple comparison of the classroom-level variance components obtained from these two models is therefore not appropriate. We also test Hypotheses 4.3, 4.4, and 4.5, to decide whether we should keep the fixed effects associated with YEARSTEA, MATHPREP, and MATHKNOW in Model 4.3, using individual t-tests for each hypothesis. Based on the results of the t-tests, we decide that none of the fixed effects associated with these Level 2 covariates should be retained and choose Model 4.2 as our preferred model at this stage. We do not use a likelihood ratio test for the fixed effects of all Level 2 covariates at once, as we did for all Level 1 covariates in Step 2, because different sets of observations were used to fit Models 4.2 and 4.3. A likelihood ratio test is only possible if both models were fitted using the same set of observations. In Subsection 4.11.4, we illustrate syntax that could be used to construct a “complete case” data set in each software package. Step 4: Build the Level 3 Model by adding the Level 3 covariate (Model 4.4). 
Add a fixed effect associated with the covariate measured on the Level 3 clusters (schools) to the Level 3 model to create Model 4.4, and evaluate the reduction in the variance component associated with the Level 3 clusters.

In this last step, we add a fixed effect associated with the only school-level covariate, HOUSEPOV, to Model 4.2 and obtain Model 4.4. We assess whether the variance component at the school level (i.e., the variance of the random school effects) is reduced when we include this fixed effect at Level 3 of the model. Because the same set of observations was used to fit Models 4.2 and 4.4, we informally assess the relative reduction in the between-school variance component in this step.

We also test Hypothesis 4.6 to decide whether we should add the fixed effect associated with the school-level covariate to Model 4.2. Based on the result of a t-test for the fixed effect of HOUSEPOV in Model 4.4, we decide not to add this fixed effect, and choose Model 4.2 as our final model. We consider diagnostics for Model 4.2 in Section 4.9, using residual files generated by the HLM software.

Figure 4.3 provides a schematic guide to the model selection process and hypotheses considered in the analysis of the Classroom data. See Subsection 3.3.1 for a detailed interpretation of the elements of this figure. Table 4.3 provides a summary of the various models considered in the Classroom data analyses.

4.3.2 Model Specification

4.3.2.1 General Model Specification

We specify Model 4.3 in (4.1), because it is the model with the most fixed effects that we consider in the analysis of the Classroom data. Models 4.1, 4.1A, and 4.2 are simplifications of this more general model. Selected models are summarized in Table 4.3. The general specification for Model 4.3 is:

MATHGAIN_ijk = β0 + β1 × MATHKIND_ijk + β2 × SEX_ijk + β3 × MINORITY_ijk
               + β4 × SES_ijk + β5 × YEARSTEA_jk + β6 × MATHPREP_jk
               + β7 × MATHKNOW_jk                                          (fixed)
               + u_k + u_j|k + ε_ijk                                       (random)    (4.1)

In this specification, MATHGAIN_ijk represents the value of the dependent variable for student i in classroom j nested within school k; β0 through β7 represent the fixed intercept and the fixed effects of the covariates (e.g., MATHKIND, ..., MATHKNOW); u_k is the random effect associated with the intercept for school k; u_j|k is the random effect associated with the intercept for classroom j within school k; and ε_ijk represents the residual. To obtain Model 4.4, we add the fixed effect (β8) of the school-level covariate HOUSEPOV and omit the fixed effects of the classroom-level covariates (β5 through β7).

The distribution of the random effects associated with the schools in Model 4.3 is written as

u_k ∼ N(0, σ²_int:school)

where σ²_int:school represents the variance of the school-specific random intercepts. The distribution of the random effects associated with classrooms nested within a given school is

u_j|k ∼ N(0, σ²_int:classroom)

where σ²_int:classroom represents the variance of the random classroom-specific intercepts at any given school. This between-classroom variance is assumed to be constant for all schools. The distribution of the residuals associated with the student-level observations is

ε_ijk ∼ N(0, σ²)

where σ² represents the residual variance. We assume that the random effects, u_k, associated with schools, the random effects, u_j|k, associated with classrooms nested within schools, and the residuals, ε_ijk, are all mutually independent.
The general specification of Model 4.3 corresponds closely to the syntax that is used to fit the model in SAS, SPSS, Stata, and R. In Subsection 4.3.2.2 we provide the hierarchical specification that more closely corresponds to the HLM setup of Model 4.3.

4.3.2.2 Hierarchical Model Specification

We now present a hierarchical specification of Model 4.3. The following model is in the form used in the HLM software, but employs the notation used in the general specification of Model 4.3 in (4.1), rather than the HLM notation. The correspondence between the notation used in this section and that used in the HLM software is shown in Table 4.3.

TABLE 4.3: Summary of Selected Models Considered for the Classroom Data

Term / Variable                              General        HLM            Model
                                             Notation       Notation       4.1   4.2   4.3   4.4
Fixed effects
  Intercept                                  β0             γ000            √     √     √     √
  MATHKIND                                   β1             γ300                  √     √     √
  SEX                                        β2             γ100                  √     √     √
  MINORITY                                   β3             γ200                  √     √     √
  SES                                        β4             γ400                  √     √     √
  YEARSTEA                                   β5             γ010                        √
  MATHPREP                                   β6             γ030                        √
  MATHKNOW                                   β7             γ020                        √
  HOUSEPOV                                   β8             γ001                              √
Random effects
  Classroom (j): Intercept                   u_j|k          r_jk            √     √     √     √
  School (k): Intercept                      u_k            u_00k           √     √     √     √
Residuals
  Student (i)                                ε_ijk          e_ijk           √     √     √     √
Covariance parameters (θ_D) for D matrix
  Classroom level: variance of intercepts    σ²_int:class   τ_π             √     √     √     √
  School level: variance of intercepts       σ²_int:school  τ_β             √     √     √     √
Covariance parameters (θ_R) for R_i matrix
  Student level: residual variance           σ²             σ²              √     √     √     √

The hierarchical model has three components, reflecting contributions from the three levels of data shown in Table 4.2. First, we write the Level 1 component as

Level 1 Model (Student)

MATHGAIN_ijk = b_0j|k + β1 × MATHKIND_ijk + β2 × SEX_ijk
               + β3 × MINORITY_ijk + β4 × SES_ijk + ε_ijk     (4.2)

where ε_ijk ∼ N(0, σ²)

The Level 1 model (4.2) shows that at the student level of the data, we have a set of simple classroom-specific linear regressions of MATHGAIN on the student-level covariates. The unobserved classroom-specific intercepts, b_0j|k, are related to several other fixed and random effects at the classroom level, and are defined in the following Level 2 model.

The Level 2 model for the classroom-specific intercepts can be written as

Level 2 Model (Classroom)

b_0j|k = b_0k + β5 × YEARSTEA_jk + β6 × MATHPREP_jk + β7 × MATHKNOW_jk + u_j|k     (4.3)

where u_j|k ∼ N(0, σ²_int:classroom)

The Level 2 model (4.3) assumes that the intercept, b_0j|k, for classroom j nested within school k, depends on the unobserved intercept specific to the k-th school, b_0k, the classroom-specific covariates associated with the teacher for that classroom (YEARSTEA, MATHPREP, and MATHKNOW), and a random effect, u_j|k, associated with classroom j within school k.

The Level 3 model for the school-specific intercepts in Model 4.3 is:

Level 3 Model (School)

b_0k = β0 + u_k     (4.4)

where u_k ∼ N(0, σ²_int:school)

The Level 3 model in (4.4) shows that the school-specific intercept in Model 4.3 depends on the overall fixed intercept, β0, and the random effect, u_k, associated with the intercept for school k.

By substituting the expression for b_0k from the Level 3 model into the Level 2 model, and then substituting the resulting expression for b_0j|k from the Level 2 model into the Level 1 model, we recover Model 4.3 as it was specified in (4.1). We specify Model 4.4 by omitting the fixed effects of the classroom-level covariates from Model 4.3, and adding the fixed effect of the school-level covariate, HOUSEPOV, to the Level 3 model.
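As an illustration of how this specification translates into software syntax, a hedged sketch of Model 4.3 using the lmer() function in R is shown below (lower-case variable names are assumed, and only complete cases are used because MATHPREP and MATHKNOW contain missing values):

library(lme4)
fit4.3 <- lmer(mathgain ~ mathkind + sex + minority + ses +
                 yearstea + mathprep + mathknow +
                 (1 | schoolid) + (1 | classid:schoolid),
               data = classroom, REML = TRUE)
summary(fit4.3)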
4.3.3 Hypothesis Tests

Hypothesis tests considered in the analysis of the Classroom data are summarized in Table 4.4.

TABLE 4.4: Summary of Hypotheses Tested in the Classroom Analysis

Label  Null (H0)                                   Alternative (HA)                           Test     Nested Model (H0)  Ref. Model (HA)  Est. Method  Test Stat. Dist. under H0
4.1    Drop u_j|k (σ²_int:classroom = 0)           Retain u_j|k (σ²_int:classroom > 0)        LRT      Model 4.1A         Model 4.1        REML         0.5χ²(0) + 0.5χ²(1)
4.2    Fixed effects of student-level covariates   At least one fixed effect at the student   LRT      Model 4.1          Model 4.2        ML           χ²(4)
       are all zero (β1 = β2 = β3 = β4 = 0)        level is different from zero
4.3    Fixed effect of YEARSTEA is zero (β5 = 0)   β5 ≠ 0                                     t-test   N/A                Model 4.3        REML/ML      t(177) (a)
4.4    Fixed effect of MATHPREP is zero (β6 = 0)   β6 ≠ 0                                     t-test   N/A                Model 4.3        REML/ML      t(177) (a)
4.5    Fixed effect of MATHKNOW is zero (β7 = 0)   β7 ≠ 0                                     t-test   N/A                Model 4.3        REML/ML      t(177) (a)
4.6    Fixed effect of HOUSEPOV is zero (β8 = 0)   β8 ≠ 0                                     t-test   N/A                Model 4.4        REML/ML      t(105) (a)

(a) Degrees of freedom for the t-statistics are those reported by the HLM3 procedure.

Hypothesis 4.1. The random effects associated with the intercepts for classrooms nested within schools can be omitted from Model 4.1.

We do not directly test the significance of the random classroom-specific intercepts, but rather test null and alternative hypotheses about the variance of the classroom-specific intercepts. The null and alternative hypotheses are:

H0: σ²_int:classroom = 0
HA: σ²_int:classroom > 0

We use a REML-based likelihood ratio test for Hypothesis 4.1 in SAS, SPSS, R, and Stata. The test statistic is calculated by subtracting the –2 REML log-likelihood for Model 4.1 (the reference model, including the nested random classroom effects) from the corresponding value for Model 4.1A (the nested model). To obtain a p-value for this test statistic, we refer it to a mixture of χ² distributions, with 0 and 1 degrees of freedom and equal weight 0.5.

The HLM3 procedure does not use REML estimation and is not able to fit a model without any random effects at a given level of the model, such as Model 4.1A. Therefore, an LRT cannot be performed, so we consider an alternative chi-square test for Hypothesis 4.1 provided by HLM in Subsection 4.7.2.

We decide that the random effects associated with the intercepts for classrooms nested within schools should be retained in Model 4.1. We do not explicitly test the variance of the random school-specific intercepts, but we retain them in Model 4.1 and all subsequent models to reflect the hierarchical structure of the data.

Hypothesis 4.2. The fixed effects associated with the four student-level covariates should be added to Model 4.1.

The null and alternative hypotheses for the fixed effects associated with the student-level covariates, MATHKIND, SEX, MINORITY, and SES, are:

H0: β1 = β2 = β3 = β4 = 0
HA: At least one fixed effect is not equal to zero

We test Hypothesis 4.2 using a likelihood ratio test, based on maximum likelihood (ML) estimation. The test statistic is calculated by subtracting the –2 ML log-likelihood for Model 4.2 (the reference model with the fixed effects of all student-level covariates included) from that for Model 4.1 (the nested model). Under the null hypothesis, the distribution of this test statistic is asymptotically a χ² with 4 degrees of freedom.
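In R, for example, this ML-based comparison of Models 4.1 and 4.2 could be carried out roughly as follows (a sketch; lower-case variable names assumed, and both models must be fitted by ML rather than REML):

library(lme4)
fit4.1 <- lmer(mathgain ~ 1 + (1 | schoolid) + (1 | classid:schoolid),
               data = classroom, REML = FALSE)
fit4.2 <- update(fit4.1, . ~ . + mathkind + sex + minority + ses)
anova(fit4.1, fit4.2)   # likelihood ratio chi-square test with 4 df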
We decide that the fixed effects associated with the Level 1 covariates should be added and select Model 4.2 as our preferred model at this stage of the analysis.

Hypotheses 4.3, 4.4, and 4.5. The fixed effects associated with the classroom-level covariates should be retained in Model 4.3.

The null and alternative hypotheses for the fixed effects associated with the classroom-level covariates YEARSTEA, MATHPREP, and MATHKNOW are written as follows:

• Hypothesis 4.3 for the fixed effect associated with YEARSTEA:
H0: β5 = 0
HA: β5 ≠ 0

• Hypothesis 4.4 for the fixed effect associated with MATHPREP:
H0: β6 = 0
HA: β6 ≠ 0

• Hypothesis 4.5 for the fixed effect associated with MATHKNOW:
H0: β7 = 0
HA: β7 ≠ 0

We test each of these hypotheses using t-tests based on the fit of Model 4.3. We decide that none of the fixed effects associated with the classroom-level covariates should be retained, and keep Model 4.2 as our preferred model at this stage of the analysis.

Because there are classrooms with missing values for the classroom-level covariate MATHKNOW, there are different sets of observations used to fit Models 4.2 and 4.3. In this case, we cannot use a likelihood ratio test to decide if we should keep the fixed effects of all covariates at Level 2 of the model, as we did for the fixed effects of the covariates at Level 1 of the model when we tested Hypothesis 4.2. To perform a likelihood ratio test for the effects of all of the classroom-level covariates, we would need to fit both Models 4.2 and 4.3 using the same cases (i.e., only those cases with no missing data on any of the covariates would be considered for both models). See Subsection 4.11.4 for a discussion of how to set up the necessary data set, with complete data for all covariates, in each of the software procedures.

Hypothesis 4.6. The fixed effect associated with the school-level covariate HOUSEPOV should be added to Model 4.2.

The null and alternative hypotheses for the fixed effect associated with HOUSEPOV are specified as follows:

H0: β8 = 0
HA: β8 ≠ 0

We test Hypothesis 4.6 using a t-test for the significance of the fixed effect of HOUSEPOV in Model 4.4. We decide that the fixed effect of this school-level covariate should not be added to the model, and choose Model 4.2 as our final model. For the results of these hypothesis tests, see Section 4.5.

4.4 Analysis Steps in the Software Procedures

We compare results for selected models across the software procedures in Section 4.6.

4.4.1 SAS

We begin by reading the comma-delimited data file, classroom.csv (assumed to have the "long" data structure displayed in Table 4.2, and located in the C:\temp directory), into a temporary SAS data set named classroom:

proc import out = WORK.classroom
     datafile = "C:\temp\classroom.csv"
     dbms = csv replace;
     getnames = YES;
     datarow = 2;
     guessingrows = 20;
run;

Step 1: Fit the initial "unconditional" (variance components) model (Model 4.1), and decide whether to omit the random classroom effects (Model 4.1 vs. Model 4.1A).

Prior to fitting the models in SAS, we sort the data set by SCHOOLID and CLASSID. Although not necessary for fitting the models, this sorting makes it easier to interpret elements in the marginal variance-covariance and correlation matrices that we display later.

proc sort data = classroom;
   by schoolid classid;
run;

The SAS code for fitting Model 4.1 using REML estimation is shown below.
Because the only fixed effect in Model 4.1 is the intercept, we use the model statement with no covariates on the right-hand side of the equal sign; the intercept is included by default. The two random statements specify that a random intercept should be included for each school and for each classroom nested within a school.

title "Model 4.1";
proc mixed data = classroom noclprint covtest;
   class classid schoolid;
   model mathgain = / solution;
   random intercept / subject = schoolid v vcorr;
   random intercept / subject = classid(schoolid);
run;

We specify two options in the proc mixed statement. The noclprint option suppresses the listing of levels of the class variables in the output to save space; we recommend using this option only after an initial examination of the levels of the class variables to be sure the data set is being read correctly. The covtest option requests that the standard errors of the estimated variance components be included in the output. The Wald tests of covariance parameters that are reported when the covtest option is specified should not be used to test hypotheses about the variance components.

The solution option in the model statement causes SAS to print estimates of the fixed-effect parameters included in the model, which for Model 4.1 is simply the overall intercept. We use the v option to request that a single block (corresponding to the first school) of the estimated V matrix for the marginal model implied by Model 4.1 be displayed in the output (see Subsection 2.2.3 for more details on the V and Z matrices in SAS). In addition, we specify the vcorr option to request that the corresponding marginal correlation matrix be displayed. The v and vcorr options can be added to one or both of the random statements, with the same results. We display the SAS output from these commands in Section 4.8.

Software Note: When the proc mixed syntax above is used, the overall subject is considered by SAS to be SCHOOLID. Because CLASSID is specified in the class statement, SAS sets up the same number of columns in the Z matrix for each school. The SAS output below, obtained by fitting Model 4.1 to the Classroom data, indicates that there are 107 subjects included in this model (corresponding to the 107 schools). SAS sets up 10 columns in the Z matrix for each school, which corresponds to the maximum number of classrooms in any given school (9) plus a single column for the school. The maximum number of students in any school is 31.

Dimensions
   Covariance Parameters            3
   Columns in X                     1
   Columns in Z Per Subject        10
   Subjects                       107
   Max Obs Per Subject             31

This syntax for Model 4.1 allows SAS to fit the model efficiently, by taking blocks of the marginal V matrix into account. However, it also means that if EBLUPs are requested by specifying the solution option in the corresponding random statements, as illustrated below, SAS will display output for 10 classrooms within each school, even for schools that have fewer than 10 classrooms. This results in extra unwanted output being generated. We present alternative syntax to avoid this potential problem below.

Alternative syntax for Model 4.1 may be specified by omitting the subject = option in the random statement for SCHOOLID. By including the solution option in one or both random statements, we obtain the predicted values of the EBLUPs for each school, followed by the EBLUPs for the classrooms, with no extra output being produced.
title "Model 4.1: Alternative Syntax"; proc mixed data = classroom noclprint covtest; class classid schoolid; model mathgain = / solution; random schoolid / solution; random int / subject = classid(schoolid); run; The resulting “Dimensions” output for this model specification is as follows: ' Covariance Parameters Columns in x Columns in Z Per Subject Subjects Max obs Per Subject & $ Dimensions 3 1 419 1 1190 % SAS now considers there to be only one subject in this model, even though we specified a subject = option in the second random statement. In this case, the Z matrix has 419 columns, which correspond to the total number of schools (107) plus classrooms (312). A portion of the resulting output for the EBLUPs is as follows: 154 ' Linear Mixed Models: A Practical Guide Using Statistical Software $ & % Solution for Random Effects --------------------------------------------------------------------Std Err Effect Classid Schoolid Estimate Pred DF t Value pr > |t| --------------------------------------------------------------------schoolid 1 0.94110 7.1657 878 0.13 0.8955 schoolid 2 2.54710 7.0625 878 0.36 0.7184 schoolid 3 13.67300 6.6390 878 2.06 0.0397 ... Intercept 160 1 1.63800 8.9188 878 0.18 0.8543 Intercept 217 1 -0.43260 8.1149 878 -0.05 0.9575 Intercept 197 2 -1.69390 9.1905 878 -0.18 0.8538 Intercept 211 2 2.51290 8.6881 878 0.29 0.7725 Intercept 307 2 2.44330 8.6881 878 0.28 0.7786 Intercept 11 3 3.86990 8.6607 878 0.45 0.6551 Intercept 137 3 5.56300 9.1818 878 0.61 0.5448 Intercept 145 3 8.10160 8.4622 878 0.96 0.3386 Intercept 228 3 -0.02247 8.8971 878 -0.00 0.9980 Note that the standard error, along with a t-test, is displayed for each EBLUP. The test statistic is calculated by dividing the predicted EBLUP by its estimated standard error, with the degrees of freedom being the same as indicated by the ddfm = option. If no ddfm = option is specified, SAS will use the default “containment” method (see Subsection 3.11.6 for a discussion of different methods of calculating denominator degrees of freedom in SAS). Although it may not be of particular interest to test whether the predicted random effect for a given school or classroom is equal to zero, large values of the t-statistic may indicate an outlying value for a given school or classroom. To suppress the display of the EBLUPs in the output, but obtain them in a SAS data set, we use the following ODS statements prior to invoking proc mixed: ods exclude solutionr; ods output solutionr=eblupdat; The first ods statement prevents the EBLUPs from being displayed in the output. The second ods statement requests that the EBLUPs be placed in a new SAS data set named eblupdat. The distributions of the EBLUPs contained in the eblupdat data set can be investigated graphically to check for possible outliers (see Subsection 4.10.1 for diagnostic plots of the EBLUPs associated with Model 4.2). Note that the eblupdat data set is only created if the solution option is added to either of the random statements. We recommend that readers be cautious about including options, such as v and vcorr, for displaying the marginal variance-covariance and correlation matrices when using random statements with no subject = option, as shown in the first random statement in the alternative syntax for Model 4.1. In this case, SAS will display matrices that are of the same dimension as the total number of observations in the entire data set. 
In the case of the Classroom data, this would result in a 1190 × 1190 matrix being displayed for both the marginal variance-covariance and correlation matrices. We now fit Model 4.1A using REML estimation (the default), so that we can perform a likelihood ratio test of Hypothesis 4.1 to decide if we need the nested random classroom effects in Model 4.1. We omit the random classroom-specific intercepts in Model 4.1A by adding an asterisk at the beginning of the second random statement, to comment it out: Three-Level Models for Clustered Data:The Classroom Example 155 title "Model 4.1A"; proc mixed data = classroom noclprint covtest; class classid schoolid; model mathgain = / solution; random intercept / subject = schoolid; *random intercept / subject = classid(schoolid); run; Results of the REML-based likelihood ratio test for Hypothesis 4.1 are discussed in detail in Subsection 4.5.1. We decide to retain the nested random classroom effects in Model 4.1 based on the significant (p < 0.001) result of this test. We also keep the random school-specific intercepts in the model to reflect the hierarchical structure of the data. Step 2: Build the Level 1 Model by adding Level 1 Covariates (Model 4.1 vs. Model 4.2). To fit Model 4.2, again using REML estimation, we modify the model statement to add the fixed effects of the student-level covariates MATHKIND, SEX, MINORITY, and SES: title "Model 4.2"; proc mixed data = classroom noclprint covtest; class classid schoolid; model mathgain = mathkind sex minority ses / solution; random intercept / subject = schoolid; random intercept / subject = classid(schoolid); run; Because the covariates SEX and MINORITY are indicator variables, having values of 0 and 1, they do not need to be identified as categorical variables in the class statement. We formally test Hypothesis 4.2 (whether any of the fixed effects associated with the student-level covariates are different from zero) by performing an ML-based likelihood ratio test, subtracting the –2 ML log-likelihood of Model 4.2 (the reference model) from that of Model 4.1 (the nested model, excluding the four fixed effects being tested), and referring the difference to a χ2 distribution with 4 degrees of freedom. Note that the method = ML option is used to request maximum likelihood estimation of both models (see Subsection 2.6.2.1 for a discussion of likelihood ratio tests for fixed effects): title "Model 4.1: ML Estimation"; proc mixed data = classroom noclprint covtest method = ML; class classid schoolid; model mathgain = / solution; random intercept / subject = schoolid; random intercept / subject = classid(schoolid); run; title "Model 4.2: ML Estimation"; proc mixed data = classroom noclprint covtest method = ML; class classid schoolid; model mathgain = mathkind sex minority ses / solution; random intercept / subject = schoolid; random intercept / subject = classid(schoolid); run; 156 Linear Mixed Models: A Practical Guide Using Statistical Software We reject the null hypothesis that all fixed effects associated with the student-level covariates are equal to zero, based on the result of this test (p < 0.001), and choose Model 4.2 as our preferred model at this stage of the analysis. The significant result for the test of Hypothesis 4.2 suggests that the fixed effects of the student-level covariates explain at least some of the variation at the student level of the data. 
Informal examination of the estimated residual variance in Model 4.2 suggests that the Level 1 (student-level) residual variance is substantially reduced compared to that for Model 4.1 (see Section 4.6). Step 3: Build the Level 2 Model by adding Level 2 Covariates (Model 4.3). To fit Model 4.3, we add the fixed effects of the classroom-level covariates YEARSTEA, MATHPREP, and MATHKNOW to the model statement that we specified for Model 4.2: title "Model 4.3"; proc mixed data = classroom noclprint covtest; class classid schoolid; model mathgain = mathkind sex minority ses yearstea mathprep mathknow / solution; random intercept / subject = schoolid; random intercept / subject = classid(schoolid); run; We consider t-tests of Hypotheses 4.3, 4.4, and 4.5, for the individual fixed effects of the classroom-level covariates added to the model in this step. The results of these t-tests indicate that none of the fixed effects associated with the classroom-level covariates are significant, so we keep Model 4.2 as the preferred model at this stage of the analysis. Note that results of the t-tests are based on an analysis with 109 observations omitted. Step 4: Build the Level 3 Model by adding the Level 3 Covariate (Model 4.4). To fit Model 4.4, we add the fixed effect of the school-level covariate, HOUSEPOV, to the model statement for Model 4.2: title "Model 4.4"; proc mixed data = classroom noclprint covtest; class classid schoolid; model mathgain = mathkind sex minority ses housepov / solution; random intercept / subject = schoolid; random intercept / subject = classid(schoolid); run; To test Hypothesis 4.6, we carry out a t-test for the fixed effect of HOUSEPOV, based on the REML fit of Model 4.4. The result of the t-test indicates that the fixed effect for this school-level covariate is not significant (p = 0.25). We also note that the estimated variance of the random school effects in Model 4.4 has not been reduced compared to that of Model 4.2. We therefore do not retain the fixed effect of the school-level covariate and choose Model 4.2 as our final model. We now refit our final model, using REML estimation: title "Model 4.2 (Final)"; proc mixed data = classroom noclprint covtest; class classid schoolid; Three-Level Models for Clustered Data:The Classroom Example 157 model mathgain = mathkind sex minority ses / solution outpred = pdatl; random intercept / subject = schoolid solution; random intercept / subject = classid(schoolid); run; The data set pdat1 (requested with the outpred = option in the model statement) contains the conditional predicted values for each student (based on the estimated fixed effects in the model, and the EBLUPs for the random school and classroom effects) and the conditional residuals for each observation. This data set can be used to visually assess model diagnostics (see Subsection 4.10.2). 4.4.2 SPSS We first import the comma-delimited data file named classroom.csv, which has the “long” format displayed in Table 4.2, using the following SPSS syntax: GET DATA /TYPE = TXT /FILE = "C:\temp\classroom.csv" /DELCASE = LINE /DELIMITERS = "," /ARRANGEMENT = DELIMITED /FIRSTCASE = 2 /IMPORTCASE = ALL /VARIABLES = sex F1.0 minority F1.0 mathkind F3.2 mathgain F4.2 ses F5.2 yearstea F5.2 mathknow F5.2 housepov F5.2 mathprep F4.2 classid F3.2 schoolid F1.0 childid F2.1 . CACHE. EXECUTE. Now that we have a data set in SPSS in the “long” form appropriate for fitting linear mixed models, we proceed with the analysis. 
Step 1: Fit the initial "unconditional" (variance components) model (Model 4.1), and decide whether to omit the random classroom effects (Model 4.1 vs. Model 4.1A).

There are two ways to set up a linear mixed model with nested random effects using SPSS syntax. We begin by illustrating how to set up Model 4.1 without specifying any "subjects" in the RANDOM subcommands:

* Model 4.1 (more efficient syntax).
MIXED mathgain BY classid schoolid
  /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1)
  SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE)
  PCONVERGE(0.000001, ABSOLUTE)
  /FIXED = | SSTYPE(3)
  /METHOD = REML
  /PRINT = SOLUTION
  /RANDOM classid(schoolid) | COVTYPE(VC)
  /RANDOM schoolid | COVTYPE(VC) .

The first variable listed after invoking the MIXED command is the dependent variable, MATHGAIN. The two random factors (CLASSID and SCHOOLID) have been declared as categorical factors by specifying them after the BY keyword. The CRITERIA subcommand specifies the estimation criteria to be used when fitting the model, which are the defaults. The FIXED subcommand has no variables on the right-hand side of the equal sign before the vertical bar (|), which indicates that an intercept-only model is requested (i.e., the only fixed effect in the model is the overall intercept). The SSTYPE(3) option after the vertical bar specifies that SPSS should use the default "Type 3" analysis, in which the tests for the fixed effects are adjusted for all other fixed effects in the model. This option is not critical for Model 4.1, because the only fixed effect in this model is the intercept. We use the default REML estimation method by specifying REML in the METHOD subcommand. The PRINT subcommand specifies that the printed output should include the estimated fixed effects (SOLUTION).

Note that there are two RANDOM subcommands: the first identifies CLASSID nested within SCHOOLID as a random factor, and the second specifies SCHOOLID as a random factor. No "subject" variables are specified in either of these RANDOM subcommands. The COVTYPE(VC) option specified after the vertical bar indicates that a variance components (VC) covariance structure for the random effects is desired.

In the following syntax, we show an alternative specification of Model 4.1 that is less efficient computationally for larger data sets. We specify INTERCEPT before the vertical bar and a SUBJECT variable after the vertical bar for each RANDOM subcommand. This syntax means that for each level of the SUBJECT variables, we add a random effect to the model associated with the INTERCEPT:

* Model 4.1 (less efficient syntax).
MIXED mathgain
  /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1)
  SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE)
  PCONVERGE(0.000001, ABSOLUTE)
  /FIXED = | SSTYPE(3)
  /METHOD = REML
  /PRINT = SOLUTION
  /RANDOM INTERCEPT | SUBJECT(classid*schoolid) COVTYPE(VC)
  /RANDOM INTERCEPT | SUBJECT(schoolid) COVTYPE(VC) .

There is no BY keyword in this syntax, meaning that the random factors in this model are not identified initially as categorical factors.

The first RANDOM subcommand declares that there is a random effect associated with the INTERCEPT for each subject identified by a combination of the classroom ID and school ID variables (CLASSID*SCHOOLID).
Even though this syntax appears to be setting up a crossed effect for classroom by school, it is equivalent to the nested specification for these two random factors that was seen in the previous syntax (CLASSID(SCHOOLID)). The asterisk is necessary because nested SUBJECT variables cannot currently be specified when using the MIXED command. The second RANDOM subcommand specifies that there is a random effect in the model associated with the INTERCEPT for each subject identified by the SCHOOLID variable.

Software Note: Fitting Model 4.1 using the less efficient SPSS syntax takes considerably longer than fitting the equivalent model in the other software packages, and also takes longer than the more efficient version of the SPSS syntax. We use the more computationally efficient syntax for the remainder of the analysis in SPSS.

We now carry out a likelihood ratio test of Hypothesis 4.1 to decide if we wish to omit the nested random effects associated with classrooms from Model 4.1. To do this, we fit Model 4.1A by removing the first RANDOM subcommand from the more efficient syntax for Model 4.1:

* Model 4.1A .
MIXED mathgain BY classid schoolid
  /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1)
  SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE)
  PCONVERGE(0.000001, ABSOLUTE)
  /FIXED = | SSTYPE(3)
  /METHOD = REML
  /PRINT = SOLUTION
  /RANDOM schoolid | COVTYPE(VC) .

To calculate the test statistic for Hypothesis 4.1, we subtract the –2 REML log-likelihood value for the reference model, Model 4.1, from the corresponding value for Model 4.1A (the nested model). We refer the resulting test statistic to a mixture of χ2 distributions, with 0 and 1 degrees of freedom and equal weights of 0.5 (see Subsection 4.5.1 for a discussion of this test). Based on the significant result of this test (p = 0.002), we decide to retain the nested random classroom effects in Model 4.1 and all future models. We also retain the random effects associated with schools in all models without testing their significance, to reflect the hierarchical structure of the data in the model specification.

Step 2: Build the Level 1 Model by adding Level 1 Covariates (Model 4.1 vs. Model 4.2).

Model 4.2 adds the fixed effects of the student-level covariates MATHKIND, SEX, MINORITY, and SES to Model 4.1, using the following syntax:

* Model 4.2 .
MIXED mathgain WITH mathkind sex minority ses BY classid schoolid
  /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1)
  SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE)
  PCONVERGE(0.000001, ABSOLUTE)
  /FIXED = mathkind sex minority ses | SSTYPE(3)
  /METHOD = REML
  /PRINT = SOLUTION
  /RANDOM classid(schoolid) | COVTYPE(VC)
  /RANDOM schoolid | COVTYPE(VC) .

Note that the four student-level covariates have been identified as continuous by listing them after the WITH keyword. This is an acceptable approach for SEX and MINORITY, even though they are categorical, because they are both indicator variables having values 0 and 1. All four student-level covariates are also listed in the FIXED = subcommand, so that fixed-effect parameters associated with each covariate will be added to the model.

Software Note: If we had included SEX and MINORITY as categorical factors by listing them after the BY keyword, SPSS would have used the highest levels of each of these variables as the reference categories (i.e., SEX = 1, girls; and MINORITY = 1, minority students).
The resulting parameter estimates would have given us the estimated fixed effect of being a boy, and of being a nonminority student, respectively, on math achievement score. These parameter estimates would have had the opposite signs of the estimates resulting from the syntax that we used, in which MINORITY and SEX were listed after the WITH keyword. We test Hypothesis 4.2 to decide whether all fixed effects associated with the Level 1 (student-level) covariates are equal to zero, by performing a likelihood ratio test based on ML estimation (see Subsection 2.6.2.1 for a discussion of likelihood ratio tests for fixed effects). The test statistic is calculated by subtracting the –2 ML log-likelihood for Model 4.2 (the reference model) from the corresponding value for Model 4.1 (the nested model, excluding all fixed effects associated with the Level 1 covariates). We use the /METHOD = ML subcommand to request Maximum Likelihood estimation of both models: * Model 4.1 (ML Estimation). MIXED mathgain BY classid schoolid /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(l) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE (0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = | SSTYPE(3) /METHOD = ML /PRINT = SOLUTION /RANDOM classid(schoolid) | COVTYPE(VC) /RANDOM schoolid | COVTYPE(VC) . * Model 4.2 (ML Estimation). MIXED mathgain WITH mathkind sex minority ses BY classid schoolid /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(l) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE (0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = mathkind sex minority ses | SSTYPE(3) /METHOD = ML Three-Level Models for Clustered Data:The Classroom Example 161 /PRINT = SOLUTION /RANDOM classid(schoolid) | COVTYPE(VC) /RANDOM schoolid | COVTYPE(VC) . We reject the null hypothesis (p < 0.001) and decide to retain all fixed effects associated with the Level 1 covariates. This result suggests that the Level 1 fixed effects explain at least some of the variation at Level 1 (the student level) of the data, which we had previously attributed to residual variance in Model 4.1. Step 3: Build the Level 2 Model by adding Level 2 Covariates (Model 4.3). We fit Model 4.3 using the default REML estimation method by adding the Level 2 (classroom-level) covariates YEARSTEA, MATHPREP, and MATHKNOW to the SPSS syntax. We add these covariates to the WITH subcommand, so that they will be treated as continuous covariates, and we also add them to the FIXED subcommand: * Model 4.3 . MIXED mathgain WITH mathkind sex minority ses yearstea mathprep mathknow BY classid schoolid /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(l) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE (0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = mathkind sex minority ses yearstea mathprep mathknow | SSTYPE(3) /METHOD = REML /PRINT = SOLUTION /RANDOM classid(schoolid) | COVTYPE(VC) /RANDOM schoolid | COVTYPE(VC) . We cannot perform a likelihood ratio test of Hypotheses 4.3 through 4.5, owing to the presence of missing data on some of the classroom-level covariates; we instead use t-tests for the fixed effects of each of the Level 2 covariates added at this step. The results of these t-tests are reported in the SPSS output for Model 4.3. Because none of these t-tests are significant, we do not add these fixed effects to the model, and select Model 4.2 as our preferred model at this stage of the analysis. Step 4: Build the Level 3 Model by adding the Level 3 Covariate (Model 4.4). 
We fit Model 4.4 using REML estimation, by updating the syntax used to fit Model 4.2. We add the Level 3 (school-level) covariate, HOUSEPOV, to the WITH subcommand and to the FIXED subcommand, as shown in the following syntax: * Model 4.4. MIXED mathgain WITH mathkind sex minority ses housepov BY classid schoolid /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(l) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE (0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = mathkind sex minority ses housepov | SSTYPE(3) /METHOD = REML /PRINT = SOLUTION /RANDOM classid(schoolid) | COVTYPE(VC) /RANDOM schoolid | COVTYPE(VC) . 162 Linear Mixed Models: A Practical Guide Using Statistical Software We use a t-test for Hypothesis 4.6, to decide whether we wish to add the fixed effect of HOUSEPOV to the model. The result of this t-test is reported in the SPSS output, and indicates that the fixed effect of HOUSEPOV is not significant (p = 0.25). We therefore choose Model 4.2 as our final model. 4.4.3 R We begin by reading the comma-delimited raw data file, having the structure described in Table 4.2 and with variable names in the first row, into a data frame object named class. The h = T option instructs R to read a header record containing variable names from the first row of the raw data: > class <- read.csv("c:\\temp\\classroom.csv", h = T) 4.4.3.1 Analysis Using the lme() Function We first load the nlme package, so that the lme() function can be used in the analysis: > library(nlme) We now proceed with the analysis steps. Step 1: Fit the initial “unconditional” (variance components) model (Model 4.1), and decide whether to omit the random classroom effects (Model 4.1 vs. Model 4.1A). We fit Model 4.1 to the Classroom data using the lme() function as follows: > # Model 4.1. > model4.1.fit <- lme(mathgain ~ 1, random = ~ 1 | schoolid/classid, class, method = "REML") We describe the syntax used in the lme() function below: • model4.1.fit is the name of the object that will contain the results of the fitted model. • The first argument of the function, mathgain ~ 1, defines the response variable, MATHGAIN, and the single fixed effect in this model, which is associated with the intercept (denoted by a 1 after the ~). • The second argument, random = ~ 1 | schoolid/classid, indicates the nesting structure of the random effects in the model. The random = ~ 1 portion of the argument indicates that the random effects are to be associated with the intercept. The first variable listed after the vertical bar (|) is the random factor at the highest level (Level 3) of the data (i.e., SCHOOLID). The next variable after a forward slash (/) indicates the random factor (i.e., CLASSID) with levels nested within levels of the first random factor. This notation for the nesting structure is known as the Wilkinson–Rogers notation. • The third argument, class, indicates the name of the data frame object to be used in the analysis. • The final argument of the function, method = "REML", tells R that REML estimation should be used for the variance components in the model. REML is the default estimation method in the lme() function. 
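The schoolid/classid shorthand described above can also be written as an explicit named list, with grouping factors ordered from outermost to innermost. The following sketch (not part of the original analysis) should produce the same fit as model4.1.fit, assuming the nlme package is loaded and the class data frame has been created as above:

> # Sketch: an equivalent, more explicit specification of the nested random
> # intercepts (outermost grouping factor listed first).
> model4.1.fit.alt <- lme(mathgain ~ 1, random = list(schoolid = ~ 1, classid = ~ 1), data = class, method = "REML")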
Estimates saved in the model fit object can be obtained by applying the summary() function:

> summary(model4.1.fit)

Software Note: The getVarCov() function, which can be used to display blocks of the estimated marginal V matrix for a two-level model, currently does not have the capability of displaying blocks of the estimated V matrix for the models considered in this example, due to the multiple levels of nesting in the Classroom data set.

The EBLUPs of the random school effects and the nested random classroom effects in the model can be obtained by using the random.effects() function:

> random.effects(model4.1.fit)

At this point we perform a likelihood ratio test of Hypothesis 4.1, to decide if we need the nested random classroom effects in the model. We first fit a nested model, Model 4.1A, omitting the random effects associated with the classrooms. We do this by excluding the CLASSID variable from the nesting structure for the random effects in the lme() function:

> # Model 4.1A.
> model4.1A.fit <- lme(mathgain ~ 1, random = ~ 1 | schoolid, data = class, method = "REML")

The anova() function can now be used to carry out a likelihood ratio test for Hypothesis 4.1, to decide if we wish to retain the nested random effects associated with classrooms.

> anova(model4.1.fit, model4.1A.fit)

The anova() function subtracts the –2 REML log-likelihood value for Model 4.1 (the reference model) from that for Model 4.1A (the nested model), and refers the resulting test statistic to a χ2 distribution with 1 degree of freedom. However, because the appropriate null distribution for the likelihood ratio test statistic for Hypothesis 4.1 is a mixture of two χ2 distributions, with 0 and 1 degrees of freedom and equal weights of 0.5, we multiply the p-value provided by the anova() function by 0.5 to obtain the correct p-value. Based on the significant result of this test (p < 0.01), we retain the nested random classroom effects in Model 4.1 and in all future models. We also retain the random school effects, to reflect the hierarchical structure of the data in the model specification.

Step 2: Build the Level 1 Model by adding Level 1 Covariates (Model 4.1 vs. Model 4.2).

After obtaining the estimates of the fixed intercept and the variance components in Model 4.1, we modify the syntax to fit Model 4.2, which includes the fixed effects of the four Level 1 (student-level) covariates MATHKIND, SEX, MINORITY, and SES. Note that these covariates are added on the right-hand side of the ~ in the first argument of the lme() function:

> # Model 4.2.
> model4.2.fit <- lme(mathgain ~ mathkind + sex + minority + ses, random = ~ 1 | schoolid/classid, class, na.action = "na.omit", method = "REML")

Because some of the students might have missing data on these covariates (which is actually not the case for the Level 1 covariates in the Classroom data set), we include the argument na.action = "na.omit" to tell the lme() function to drop cases with missing data from the analysis.

Software Note: Without the na.action = "na.omit" specification, the lme() function will not run if there are missing data on any of the variables input to the function.

The 1 that was used to identify the intercept in the fixed part of Model 4.1 does not need to be specified in the syntax for Model 4.2, because the intercept is automatically included in any model with at least one fixed effect.
We assess the results of fitting Model 4.2 using the summary() function:

> summary(model4.2.fit)

We now test Hypothesis 4.2, to decide whether the fixed effects associated with all Level 1 (student-level) covariates in Model 4.2 are equal to zero, by carrying out a likelihood ratio test using the anova() function. To do this, we refit the nested model, Model 4.1, and the reference model, Model 4.2, using ML estimation. The test statistic is calculated by the anova() function by subtracting the –2 ML log-likelihood for Model 4.2 (the reference model) from that for Model 4.1 (the nested model), and referring the test statistic to a χ2 distribution with 4 degrees of freedom.

> # Model 4.1: ML estimation with lme().
> model4.1.ml.fit <- lme(mathgain ~ 1, random = ~ 1 | schoolid/classid, class, method = "ML")
> # Model 4.2: ML estimation with lme().
> model4.2.ml.fit <- lme(mathgain ~ mathkind + sex + minority + ses, random = ~ 1 | schoolid/classid, class, na.action = "na.omit", method = "ML")
> anova(model4.1.ml.fit, model4.2.ml.fit)

We see that at least one of the fixed effects associated with the Level 1 covariates is significant, based on the result of this test (p < 0.001); Subsection 4.5.2 presents details on testing Hypothesis 4.2. We therefore proceed with Model 4.2 as our preferred model.

Step 3: Build the Level 2 Model by adding Level 2 Covariates (Model 4.3).

We fit Model 4.3 by adding the fixed effects of the Level 2 (classroom-level) covariates, YEARSTEA, MATHPREP, and MATHKNOW, to Model 4.2:

> # Model 4.3.
> model4.3.fit <- update(model4.2.fit, fixed = ~ mathkind + sex + minority + ses + yearstea + mathprep + mathknow)

We investigate the resulting parameter estimates and standard errors for the estimated fixed effects by applying the summary() function to the model fit object:

> summary(model4.3.fit)

We cannot consider a likelihood ratio test for the fixed effects added to Model 4.2, because some classrooms have missing data on the MATHKNOW variable, and Models 4.2 and 4.3 are fitted using different observations as a result. Instead, we test the fixed effects associated with the classroom-level covariates (Hypotheses 4.3 through 4.5) using t-tests. None of these fixed effects are significant based on the results of these t-tests (provided by the summary() function), so we choose Model 4.2 as the preferred model at this stage of the analysis.

Step 4: Build the Level 3 Model by adding the Level 3 Covariate (Model 4.4).

Model 4.4 can be fitted by adding the Level 3 (school-level) covariate to the formula for the fixed-effects portion of the model in the lme() function. We add the fixed effect of the HOUSEPOV covariate to the model by updating the fixed = argument for Model 4.2:

> # Model 4.4.
> model4.4.fit <- update(model4.2.fit, fixed = ~ mathkind + sex + minority + ses + housepov)

We apply the summary() function to the model fit object to obtain the resulting parameter estimates and t-tests for the fixed effects (in this case, for model4.4.fit):

> summary(model4.4.fit)

The t-test for the fixed effect of HOUSEPOV is not significant, so we choose Model 4.2 as our final model for the Classroom data set.

4.4.3.2 Analysis Using the lmer() Function

We begin by loading the lme4 package, so that the lmer() function can be used in the analysis:

> library(lme4)

We now proceed with the analysis steps.
Step 1: Fit the initial “unconditional” (variance components) model (Model 4.1), and decide whether to omit the random classroom effects (Model 4.1 vs. Model 4.1A). We fit Model 4.1 to the Classroom data using the lmer() function as follows: > # Model 4.1. > model4.1.fit.lmer <- lmer(mathgain ~ 1 + (1|schoolid) + (1|classid), class, REML = T) We describe the syntax used in the lmer() function below: • model4.1.fit.lmer is the name of the object that will contain the results of the fitted model. • Like the lme() function, the first argument of the function, mathgain ~ 1, defines the response variable, MATHGAIN, and the single fixed effect in this model, which is associated with the intercept (denoted by a 1 after the ~). • Next, random intercepts associated with each level of SCHOOLID and CLASSID are added to the model formula (using + notation), using the syntax (1|schoolid) and (1|classid). Note that a specific nesting structure does not need to be indicated here. 166 Linear Mixed Models: A Practical Guide Using Statistical Software • The third argument, class, once again indicates the name of the data frame object to be used in the analysis. • The final argument of the function, REML = T, tells R that REML estimation should be used for the variance components in the model. REML is also the default estimation method in the lmer() function. Estimates from the model fit can be obtained using the summary() function: > summary(mode14.1.fit.lmer) The EBLUPs of the random school effects and the nested random classroom effects in the model can be obtained using the ranef() function: > ranef(model4.1.fit.lmer) At this point we perform a likelihood ratio test of Hypothesis 4.1, to decide if we need the nested random classroom effects in the model. We first fit a nested model, Model 4.1A, omitting the random effects associated with the classrooms. We do this by excluding (1|CLASSID) from the model formula for the lmer() function: > # Model 4.1A. > model4.1A.fit.lmer <- lmer(mathgain ~ 1 + (1|schoolid), class, REML = T) The anova() function can now be used to carry out a likelihood ratio test for Hypothesis 4.1, to decide if we wish to retain the nested random effects associated with classrooms. > anova(model4.1.fit.lmer, model4.1A.fit.lmer) The anova() function subtracts the –2 REML log-likelihood value for Model 4.1 (the reference model) from that for Model 4.1A (the nested model), and refers the resulting test statistic to a χ2 distribution with 1 degree of freedom. However, because the appropriate null distribution for the likelihood ratio test statistic for Hypothesis 4.1 is a mixture of two χ2 distributions, with 0 and 1 degrees of freedom and equal weights of 0.5, we multiply the p-value provided by the anova() function by 0.5 to obtain the correct p-value. Based on the significant result of this test (p < 0.01), we retain the nested random classroom effects in Model 4.1 and in all future models. We also retain the random school effects as well, to reflect the hierarchical structure of the data in the model specification. Step 2: Build the Level 1 Model by adding Level 1 Covariates (Model 4.1 vs. Model 4.2). After obtaining the estimates of the fixed intercept and the variance components in Model 4.1, we modify the syntax to fit Model 4.2, which includes the fixed effects of the four Level 1 (student-level) covariates MATHKIND, SEX, MINORITY, and SES. Note that these covariates are added on the right-hand side of the ~ in the first argument of the lmer() function: > # Model 4.2. 
> model4.2.fit.lmer <- lmer(mathgain ~ mathkind + sex + minority + ses + (1|schoolid) + (1|classid), class, na.action = "na.omit", REML = T) Three-Level Models for Clustered Data:The Classroom Example 167 Because some of the students might have missing data on these covariates (which is actually not the case for the Level 1 covariates in the Classroom data set), we include the argument na.action = "na.omit", to tell the two functions to drop cases with missing data from the analysis. Software Note: Without the na.action = "na.omit" specification, the lmer() function will not run if there are missing data on any of the variables input to the function. Software Note: The version of the lmer() function in the lme4 package does not automatically compute p-values for the t-statistics that are generated by dividing the fixed-effect parameter estimates by their standard errors (for testing the hypothesis that a given fixed-effect parameter is equal to zero). This is primarily due to the lack of agreement in the literature over an appropriate distribution for this test statistic under the null hypothesis. Instead, approximate tests available in the lmerTest package can be used; see Chapter 3 for an example using two-level models. When possible, we use likelihood ratio tests for the fixed-effect parameters in this chapter. The 1 that was used to identify the intercept in the fixed part of Model 4.1 does not need to be specified in the syntax for Model 4.2, because the intercept is automatically included by the two functions in any model with at least one fixed effect. We assess the results of fitting Model 4.2 using the summary() function: > summary(model4.2.fit.lmer) We now test Hypothesis 4.2, to decide whether the fixed effects associated with all Level 1 (student-level) covariates in Model 4.2 are equal to zero, by carrying out a likelihood ratio test using the anova() function. To do this, we refit the nested model, Model 4.1, and the reference model, Model 4.2, using ML estimation (note the REML = F arguments below). The test statistic is calculated by the anova() function by subtracting the –2 ML loglikelihood for Model 4.2 (the reference model) from that for Model 4.1 (the nested model), and referring the test statistic to a χ2 distribution with 4 degrees of freedom. > # Model 4.1: ML estimation with lmer(). > model4.1.lmer.ml.fit <- lmer(mathgain ~ 1 + (1|schoolid) + (1|classid), class, REML = F) > # Model 4.2: ML estimation with lmer(). > model4.2.lmer.ml.fit <- lmer(mathgain ~ mathkind + sex + minority + ses + (1|schoolid) + (1|classid), class, REML = F) > anova(model4.1.lmer.ml.fit, model4.2.lmer.ml.fit) We see that at least one of the fixed effects associated with the Level 1 covariates is significant, based on the result of this test (p < 0.001); Subsection 4.5.2 presents details on testing Hypothesis 4.2. We therefore proceed with Model 4.2 as our preferred model. Step 3: Build the Level 2 Model by adding Level 2 Covariates (Model 4.3). We fit Model 4.3 by adding the fixed effects of the Level 2 (classroom-level) covariates, YEARSTEA, MATHPREP, and MATHKNOW, to Model 4.2: 168 Linear Mixed Models: A Practical Guide Using Statistical Software > # Model 4.3. 
> model4.3.fit.lmer <- lmer(mathgain ~ mathkind + sex + minority + ses + yearstea + mathprep + mathknow + (1|schoolid) + (1|classid), class, na.action = "na.omit", REML = T) We investigate the resulting parameter estimates and standard errors for the estimated fixed effects by applying the summary() function to the model fit object: > summary(model4.3.fit.lmer) We cannot consider a likelihood ratio test for the fixed effects added to Model 4.2, because some classrooms have missing data on the MATHKNOW variable, and Models 4.2 and 4.3 are fitted using different observations as a result. Instead, we can refer the test statistics provided by the summary() function to standard normal distributions to make approximate inferences about the importance of the effects (where a test statistic larger than 1.96 in absolute value would suggest a significant fixed effect at the 0.05 significance level, under asymptotic assumptions). These tests would suggest that none of these fixed effects are significant, so we would not retain them in this model. Step 4: Build the Level 3 Model by adding the Level 3 Covariate (Model 4.4). Model 4.4 can be fitted by adding the Level 3 (school-level) covariate HOUSEPOV to the formula for the fixed-effects portion of Model 4.2: > # Model 4.4. > model4.4.fit.lmer <- lmer(mathgain ~ mathkind + sex + minority + ses + housepov + (1|schoolid) + (1|classid), class, na.action = "na.omit", REML = T) We apply the summary() function to the model fit object to obtain the resulting parameter estimates and standard errors: > summary(mode14.4.fit.lmer) Based on the test statistic for the fixed effect of HOUSEPOV (−1.151), we once again do not have enough evidence to say that this effect is different from 0 (at the 0.05 level), so we choose Model 4.2 as our final model for the Classroom data set. 4.4.4 Stata First, we read the raw comma-delimited data into Stata using the insheet command: . insheet using "C:\temp\classroom.csv", comma clear Users of web-aware Stata can also import the data directly from the book’s web page: . insheet using http://www-personal.umich.edu/~bwest/classroom.csv The mixed command can then be used to fit three-level hierarchical models with nested random effects. Three-Level Models for Clustered Data:The Classroom Example 169 Step 1: Fit the initial “unconditional” (variance components) model (Model 4.1), and decide whether to omit the random classroom effects (Model 4.1 vs. Model 4.1A). We first specify the mixed syntax to fit Model 4.1, including the random effects of schools and of classrooms nested within schools: . * Model 4.1. . mixed mathgain || schoolid: || classid:, variance reml The first variable listed after invoking mixed is the continuous dependent variable, MATHGAIN. No covariates are specified after the dependent variable, because the only fixed effect in Model 4.1 is the intercept, which is included by default. After the first clustering indicator (||), we list the random factor identifying clusters at Level 3 of the data set, SCHOOLID, followed by a colon (:). We then list the nested random factor, CLASSID, after a second clustering indicator. This factor identifies clusters at Level 2 of the data set, and is again followed by a colon. Software Note: If a multilevel data set is organized by a series of nested groups, such as classrooms nested within schools as in this example, the random effects structure of the mixed model is specified in mixed by listing the random factors defining the structure, separated by two vertical bars (||). 
The nesting structure reads left to right; e.g., SCHOOLID is the highest level of clustering, with levels of CLASSID nested within each school. If no variables are specified after the colon at a given level of the nesting structure, the model will only include a single random effect (associated with the intercept) for each level of the random factor. Additional covariates with random effects at a given level of the nesting structure can be specified after the colon. Finally, the variance and reml options specified after a comma request that the estimated variances of the random school and classroom effects, rather than their estimated standard deviations, should be displayed in the output, and that REML estimation should be used to fit this model (ML estimation is the default in Stata 12+). Information criteria, including the REML log-likelihood, can be obtained by using the estat ic command after submitting the mixed command: . estat ic In the output associated with the fit of Model 4.1, Stata automatically reports a likelihood ratio test, calculated by subtracting the –2 REML log-likelihood of Model 4.1 (including the random school effects and nested random classroom effects) from the –2 REML log-likelihood of a simple linear regression model without the random effects. Stata reports the following note along with the test: Note: LR test is conservative and provided only for reference Stata performs a classical likelihood ratio test here, where the distribution of the test statistic (under the null hypothesis that both variance components are equal to zero) is asymptotically χ22 (where the 2 degrees of freedom correspond to the two variance components in Model 4.1). Appropriate theory for testing a model with multiple random effects (e.g., Model 4.1) vs. a model without any random effects has yet to be developed, and Stata 170 Linear Mixed Models: A Practical Guide Using Statistical Software discusses this issue in detail if users click on the LR test is conservative note. The pvalue for this test statistic is known to be larger than it should be (making it conservative). We recommend testing the need for the random effects by using individual likelihood ratio tests, based on REML estimation of nested models. To test Hypothesis 4.1, and decide whether we want to retain the nested random effects associated with classrooms in Model 4.1, we fit a nested model, Model 4.1A, again using REML estimation: . * Model 4.1A. . mixed mathgain || schoolid:, variance reml The test statistic for Hypothesis 4.1 can be calculated by subtracting the –2 REML loglikelihood for Model 4.1 (the reference model) from that of Model 4.1A (the nested model). The p-value for the test statistic (7.9) is based on a mixture of χ2 distributions with 0 and 1 degrees of freedom, and equal weight 0.5. Because of the significant result of this test (p = 0.002), we retain the nested random classroom effects in Model 4.1 and in all future models (see Subsection 4.5.1 for a discussion of this test). We also retain the random effects associated with schools, to reflect the hierarchical structure of the data set in the model. Step 2: Build the Level 1 Model by adding Level 1 Covariates (Model 4.1 vs. Model 4.2). We fit Model 4.2 by adding the fixed effects of the four student-level covariates, MATHKIND, SEX, MINORITY, and SES, using the following syntax: . * Model 4.2. . 
mixed mathgain mathkind sex minority ses || schoolid: || classid:, variance reml Information criteria associated with the fit of this model can be obtained by using the estat ic command after the mixed command has finished running. We test Hypothesis 4.2 to decide whether the fixed effects that were added to Model 4.1 to form Model 4.2 are all equal to zero, using a likelihood ratio test. We first refit the nested model, Model 4.1, and the reference model, Model 4.2, using ML estimation. We specify the mle option to request maximum likelihood estimation for each model. The est store command is then used to store the results of each model fit in new objects. . * Model 4.1: ML Estimation. . mixed mathgain || schoolid: || classid:, variance mle . est store model4_1_ml_fit . * Model 4.2: ML Estimation. . mixed mathgain mathkind sex minority ses || schoolid: || classid:, variance mle . est store model4_2_ml_fit We use the lrtest command to perform the likelihood ratio test. The likelihood ratio test statistic is calculated by subtracting the –2 ML log-likelihood for Model 4.2 from that for Model 4.1, and referring the difference to a χ2 distribution with 4 degrees of freedom. The likelihood ratio test requires that both models are fitted using the same cases. . lrtest model4_1_ml_fit model4_2_ml_fit Based on the significant result (p < 0.001) of this test, we choose Model 4.2 as our preferred model at this stage of the analysis. We discuss the likelihood ratio test for Hypothesis 4.2 in more detail in Subsection 4.5.2. Three-Level Models for Clustered Data:The Classroom Example 171 Step 3: Build the Level 2 Model by adding Level 2 Covariates (Model 4.3). To fit Model 4.3, we modify the mixed command used to fit Model 4.2 by adding the fixed effects of the classroom-level covariates, YEARSTEA, MATHPREP, and MATHKNOW, to the fixed portion of the command. We again use the default REML estimation for this model, and obtain the model information criteria by using the post-estimation command estat ic. . * Model 4.3. . mixed mathgain mathkind sex minority ses yearstea mathprep mathknow || schoolid: || classid:, variance reml . estat ic We do not consider a likelihood ratio test for the fixed effects added to Model 4.2 to form Model 4.3, because Model 4.3 was fitted using different cases, owing to the presence of missing data on some of the classroom-level covariates. Instead, we consider the z-tests reported by Stata for Hypotheses 4.3 through 4.5. None of the z-tests reported for the fixed effects of the classroom-level covariates are significant. Therefore, we do not retain these fixed effects in Model 4.3, and choose Model 4.2 as our preferred model at this stage of the analysis. Step 4: Build the Level 3 Model by adding the Level 3 Covariate (Model 4.4). To fit Model 4.4, we add the fixed effect of the school-level covariate, HOUSEPOV, to the model, by updating the mixed command that was used to fit Model 4.2. We again use the default REML estimation method, and use the estat ic post-estimation command to obtain information criteria for Model 4.4. . * Model 4.4. . mixed mathgain mathkind sex minority ses housepov || schoolid: || classid:, variance reml . estat ic To test Hypothesis 4.6, we use the z-test reported by the mixed command for the fixed effect of HOUSEPOV. Because of the nonsignificant test result (p = 0.25), we do not retain this fixed effect, and choose Model 4.2 as our final model for the analysis of the Classroom data. 
4.4.5 HLM We assume that the first MDM file discussed in the initial data summary (Subsection 4.2.2) has been generated using HLM3, and proceed to the model-building window. Step 1: Fit the initial “unconditional” (variance components) model (Model 4.1), and decide whether to omit the random classroom effects (Model 4.1 vs. Model 4.1A). We begin by specifying the Level 1 (student-level) model. In the model-building window, click on MATHGAIN, and identify it as the Outcome variable. Go to the Basic Settings menu and identify the outcome variable as a Normal (Continuous) variable. Choose a title for this analysis (such as “Classroom Data: Model 4.1”), and choose a location and name for the output (.html) file that will contain the results of the model fit. Click OK to return to the model-building window. Under the File menu, click Preferences, and then click Use level subscripts to display subscripts in the model-building window. 172 Linear Mixed Models: A Practical Guide Using Statistical Software Three models will now be displayed. The Level 1 model describes the “means-only” model at the student level. We show the Level 1 model below as it is displayed in the HLM model-building window: Model 4.1: Level 1 Model MATHGAINijk = π0jk + eijk The value of MATHGAIN for an individual student i, within classroom j nested in school k, depends on the intercept for classroom j within school k, π0jk , plus a residual, eijk , associated with the student. The Level 2 model describes the classroom-specific intercept in Model 4.1 at the classroom level of the data set: Model 4.1: Level 2 Model π0jk = β00k + r0jk The classroom-specific intercept, π0jk , depends on the school-specific intercept, β00k , and a random effect, r0jk , associated with the j-th classroom within school k. The Level 3 model describes the school-specific intercept in Model 4.1: Model 4.1: Level 3 Model β00k = γ000 + u00k The school-specific intercept, β00k , depends on the overall (grand) mean, γ000 , plus a random effect, u00k associated with the school. The overall “means-only” mixed model derived from the preceding Level 1, Level 2, and Level 3 models can be displayed by clicking on the Mixed button: Model 4.1: Overall Mixed Model MATHGAINijk = γ000 + r0jk + u00k + eijk An individual student’s MATHGAIN depends on an overall fixed intercept, γ000 (which represents the overall mean of MATHGAIN across all students), a random effect associated with the student’s classroom, r0jk , a random effect associated with the student’s school, u00k , and a residual, eijk . Table 4.3 shows the correspondence of this HLM notation with the general notation used in (4.1). To fit Model 4.1, click Run Analysis, and select Save as and Run to save the .hlm command file. You will be prompted to supply a name and location for this .him file. After the estimation has finished, click on File, and select View Output to see the resulting parameter estimates and fit statistics. At this point, we test the significance of the random effects associated with classrooms nested within schools (Hypothesis 4.1). However, because the HLM3 procedure does not allow users to remove all random effects from a given level of a hierarchical model (in this example, the classroom level, or the school level), we cannot perform a likelihood ratio test of Hypothesis 4.1, as was done in the other software procedures. Instead, HLM provides chi-square tests that are calculated using methodology described in Raudenbush & Bryk (2002). 
The following output is generated by HLM3 after fitting Model 4.1:

Final estimation of level-1 and level-2 variance components:
Random Effect      Standard Deviation   Variance Component    df    Chi-square    P-value
INTRCPT1, r0              10.02212            100.44281      205     301.95331     <0.001
level-1, e                32.05828           1027.73315

Final estimation of level-3 variance components:
Random Effect              Standard Deviation   Variance Component    df    Chi-square    P-value
INTRCPT1/INTRCPT2, u00             8.66240             75.03712      106     165.74813     <0.001

The chi-square test statistic for the variance of the nested random classroom effects (301.95) is significant (p < 0.001), so we reject the null hypothesis for Hypothesis 4.1 and retain the random effects associated with both classrooms and schools in Model 4.1 and all future models. We now proceed to fit Model 4.2.

Step 2: Build the Level 1 Model by adding Level 1 Covariates (Model 4.1 vs. Model 4.2).

We specify the Level 1 Model for Model 4.2 by clicking on Level 1 to add fixed effects associated with the student-level covariates to the model. We first select the variable MATHKIND, choose add variable uncentered, and then repeat this process for the variables SEX, MINORITY, and SES. Notice that as each covariate is added to the Level 1 model, the Level 2 and Level 3 models are also updated. The new Level 1 model is as follows:

Model 4.2: Level 1 Model

MATHGAINijk = π0jk + π1jk(SEXijk) + π2jk(MINORITYijk) + π3jk(MATHKINDijk) + π4jk(SESijk) + eijk

This updated Level 1 model shows that a student's MATHGAIN now depends on the intercept specific to classroom j, π0jk, the classroom-specific effects (π1jk, π2jk, π3jk, and π4jk) of each of the student-level covariates, and a residual, eijk.

The Level 2 portion of the model-building window displays the classroom-level equations for the student-level intercept (π0jk) and for each of the student-level effects (π1jk through π4jk) defined in this model. The equation for each effect from HLM is as follows:

Model 4.2: Level 2 Model

π0jk = β00k + r0jk
π1jk = β10k
π2jk = β20k
π3jk = β30k
π4jk = β40k

The equation for the student-level intercept (π0jk) has the same form as in Model 4.1. It includes an intercept specific to school k, β00k, plus a random effect, r0jk, associated with each classroom in school k. Thus, the student-level intercepts are allowed to vary randomly from classroom to classroom within the same school. The equations for each of the effects associated with the four student-level covariates (π1jk through π4jk) are all constant at the classroom level. This means that the effects of being female, being a minority student, kindergarten math achievement, and student-level SES are assumed to be the same for students within all classrooms (i.e., these coefficients do not vary across classrooms within a given school).
The Level 3 portion of the model-building window shows the school-level equations for the school-specific intercept, β00k, and for each of the school-specific effects in the classroom-level model, β10k through β40k:

Model 4.2: Level 3 Model

β00k = γ000 + u00k
β10k = γ100
β20k = γ200
β30k = γ300
β40k = γ400

The equation for the school-specific intercept includes a parameter for an overall fixed intercept, γ000, plus a random effect, u00k, associated with the school. Thus, the intercepts are allowed to vary randomly from school to school, as in Model 4.1. However, the effects (β10k through β40k) associated with each of the covariates measured at the student level are not allowed to vary from school to school. This means that the effects of being female, being a minority student, kindergarten math achievement, and student-level SES (socioeconomic status) are assumed to be the same across all schools.

Click the Mixed button to view the overall linear mixed model specified for Model 4.2:

Model 4.2: Overall Mixed Model

MATHGAINijk = γ000 + γ100 ∗ SEXijk + γ200 ∗ MINORITYijk + γ300 ∗ MATHKINDijk + γ400 ∗ SESijk + r0jk + u00k + eijk

The HLM specification of the model at each level results in the same overall linear mixed model (Model 4.2) that is fitted in the other software procedures. Table 4.3 shows the correspondence of the HLM notation with the general model notation used in (4.1).

At this point we wish to test Hypothesis 4.2, to decide whether the fixed effects associated with the Level 1 (student-level) covariates should be added to Model 4.1. We set up the likelihood ratio test for Hypothesis 4.2 in HLM before running the analysis for Model 4.2. To set up a likelihood ratio test of Hypothesis 4.2, click on Other Settings and select Hypothesis Testing. Enter the Deviance (or –2 ML log-likelihood)3 displayed in the output for Model 4.1 (deviance = 11771.33) and the Number of Parameters from Model 4.1 (number of parameters = 4: the fixed intercept and the three variance components) in the Hypothesis Testing window. After fitting Model 4.2, HLM calculates the appropriate likelihood ratio test statistic and corresponding p-value for Hypothesis 4.2 by subtracting the deviance statistic for Model 4.2 (the reference model) from that for Model 4.1 (the nested model).

After setting up the analysis for Model 4.2, click Basic Settings, and enter a new title for this analysis, in addition to a new file name for the saved output. Finally, click Run Analysis, and choose Save as and Run to save a new .hlm command file for this model. After the analysis has finished running, click File and View Output to see the results.

Based on the significant (p < 0.001) result of the likelihood ratio test for the student-level fixed effects, we reject the null for Hypothesis 4.2 and conclude that the fixed effects at Level 1 should be retained in the model. The results of the test of Hypothesis 4.2 are discussed in more detail in Subsection 4.5.2. The significant test result for Hypothesis 4.2 also indicates that the fixed effects at Level 1 help to explain residual variation at the student level of the data. A comparison of the estimated residual variance for Model 4.2 vs. that for Model 4.1, both calculated using ML estimation in HLM3, provides evidence that the residual variance at Level 1 is in fact substantially reduced in Model 4.2 (as discussed in Subsection 4.7.2).
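The deviance-difference calculation that HLM performs for this test can also be reproduced by hand. A minimal sketch in R, using the deviance values reported for Models 4.1 and 4.2 in this chapter (11771.33 and 11390.9, respectively; see Table 4.5), is shown below.

    # Likelihood ratio test for Hypothesis 4.2, reproduced from the reported
    # -2 ML log-likelihoods (deviances) for Models 4.1 and 4.2.
    dev.model41 <- 11771.33   # nested model (Model 4.1)
    dev.model42 <- 11390.9    # reference model (Model 4.2), as reported in Table 4.5
    lrt.stat <- dev.model41 - dev.model42                       # approximately 380.4
    p.value  <- pchisq(lrt.stat, df = 4, lower.tail = FALSE)    # 4 added fixed effects
    c(lrt.stat, p.value)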
We retain the fixed effects of the Level 1 covariates in Model 4.2 and proceed to consider Model 4.3.

3 HLM reports the value of the –2 ML log-likelihood for a given model as the model deviance.

Step 3: Build the Level 2 Model by adding Level 2 covariates (Model 4.3).

Before fitting Model 4.3, we need to add the MATHPREP and MATHKNOW variables to the MDM file (as discussed in Subsection 4.2.2). We then need to recreate Model 4.2 in the model-building window. We obtain Model 4.3 by adding the Level 2 (classroom-level) covariates to Model 4.2. To do this, first click on Level 2, then click on the Level 2 model for the intercept term (π0jk); include the nested random classroom effects, r0jk, and add the uncentered versions of the classroom-level variables, YEARSTEA, MATHPREP, and MATHKNOW, to the Level 2 model for the intercept. This results in the following Level 2 model for the classroom-specific intercepts:

Model 4.3: Level 2 Model for Classroom-Specific Intercepts

π0jk = β00k + β01k(YEARSTEAjk) + β02k(MATHKNOWjk) + β03k(MATHPREPjk) + r0jk

We see that adding the classroom-level covariates to the model implies that the randomly varying intercepts at Level 1 (the values of π0jk) depend on the school-specific intercept (β00k), the classroom-level covariates, and the random effect associated with each classroom (i.e., the value of r0jk). The effects of the student-level covariates (π1jk through π4jk) have the same expressions as in Model 4.2 (they are again assumed to remain constant from classroom to classroom).

Adding the classroom-level covariates to the Level 2 model for the intercept causes HLM to include additional Level 3 equations for the effects of the classroom-level covariates in the model-building window, as follows:

Model 4.3: Level 3 Model (Additional Equations)

β01k = γ010
β02k = γ020
β03k = γ030

These equations show that the effects of the Level 2 (classroom-level) covariates are constant at the school level. That is, the classroom-level covariates are not allowed to have effects that vary randomly at the school level, although we could set up the model to allow this. Click the Mixed button in the HLM model-building window to view the overall mixed model for Model 4.3:

Model 4.3: Overall Mixed Model

MATHGAINijk = γ000 + γ010 ∗ YEARSTEAjk + γ020 ∗ MATHKNOWjk + γ030 ∗ MATHPREPjk + γ100 ∗ SEXijk + γ200 ∗ MINORITYijk + γ300 ∗ MATHKINDijk + γ400 ∗ SESijk + r0jk + u00k + eijk

We see that the linear mixed model specified here is the same model that is being fit using the other software procedures. Table 4.3 shows the correspondence of the HLM model parameters with the parameters that we use in (4.1).

After setting up Model 4.3, click Basic Settings to enter a new name for this analysis and a new name for the .html output file. Click OK, and then click Run Analysis, and choose Save as and Run to save a new .hlm command file for this model before fitting the model. After the analysis has finished running, click File and View Output to see the results.

We use t-tests for Hypotheses 4.3 through 4.5 to decide if we want to keep the fixed effects associated with the Level 2 covariates in Model 4.3 (a likelihood ratio test based on the deviance statistics for Models 4.2 and 4.3 is not appropriate, due to the missing data on the classroom-level covariates).
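If a likelihood ratio test for the classroom-level fixed effects were desired, both models would first need to be refitted by ML to the same complete cases, as discussed in Subsection 4.11.4. The sketch below illustrates the idea in R; the object and variable names (class, mathkind, sex, minority, ses, yearstea, mathprep, classid, schoolid) are assumptions consistent with the chapter's other R syntax, not output shown in this subsection.

    # Refit Models 4.2 and 4.3 by ML on the complete cases, then compare them
    # with a likelihood ratio test (3 degrees of freedom).
    library(lme4)
    class.nomiss <- subset(class, !is.na(mathknow))

    model4.2.ml <- lmer(mathgain ~ mathkind + sex + minority + ses +
                        (1 | schoolid) + (1 | classid),
                        data = class.nomiss, REML = FALSE)
    model4.3.ml <- update(model4.2.ml, . ~ . + yearstea + mathprep + mathknow)
    anova(model4.2.ml, model4.3.ml)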
Based on the nonsignificant t-tests for each of the classroom-level fixed effects displayed in the HLM output, we choose Model 4.2 as our preferred model at this stage of the analysis.

Step 4: Build the Level 3 Model by adding the Level 3 covariate (Model 4.4).

In this step, we add the school-level covariate to Model 4.2 to obtain Model 4.4. We first open the .hlm file corresponding to Model 4.2 from the model-building window by clicking File, and then Edit/Run old command file. After locating the .hlm file saved for Model 4.2, open the file, and click the Level 3 button. Click on the first Level 3 equation for the intercept that includes the random school effects (u00k). Add the uncentered version of the school-level covariate, HOUSEPOV, to this model for the intercept. The resulting Level 3 model is as follows:

Model 4.4: Level 3 Model for School-Specific Intercepts

β00k = γ000 + γ001(HOUSEPOVk) + u00k

The school-specific intercepts, β00k, in this model now depend on the overall fixed intercept, γ000, the fixed effect, γ001, of HOUSEPOV, and the random effect, u00k, associated with school k.

After setting up Model 4.4, click Basic Settings to enter a new name for this analysis and a new name for the .html output file. Click OK, and then click Run Analysis, and choose Save as and Run to save a new .hlm command file before fitting the model. After the analysis has finished running, click File and View Output to see the results.

We test Hypothesis 4.6 using a t-test for the fixed effect associated with HOUSEPOV in Model 4.4. Based on the nonsignificant result of this t-test (p = 0.25), we do not retain the fixed effect of HOUSEPOV, and choose Model 4.2 as our final model in the analysis of the Classroom data set.

We now generate residual files to be used in checking model diagnostics (discussed in Section 4.10) for Model 4.2. First, open the .hlm file for Model 4.2, and click Basic Settings. In this window, specify names and file types (we choose to save SPSS-format data files in this example) for the Level 1, Level 2, and Level 3 “Residual” files (click on the buttons for each of the three files). The Level 1 file will contain the Level 1 residuals in a variable named l1resid, and the conditional predicted values of the dependent variable in a variable named fitval. The Level 2 residual file will include a variable named ebintrcp, and the Level 3 residual file will include a variable named eb00; these variables will contain the Empirical Bayes (EB) predicted values (i.e., the EBLUPs) of the random classroom and school effects, respectively. These three files can be used for exploration of the distributions of the EBLUPs and the Level 1 residuals. Covariates measured at the three levels of the Classroom data set can also be included in the three files, although we do not use that option here. Rerun the analysis for Model 4.2 to generate the residual files, which will be saved in the same folder where the .html output file was saved. We apply SPSS syntax to the resulting residual files in Section 4.10, to check the diagnostics for Model 4.2.

4.5 Results of Hypothesis Tests

4.5.1 Likelihood Ratio Tests for Random Effects

When the “step-up” approach to model building is used for three-level random intercept models, as for the Classroom data, random effects are usually retained in the model, regardless of the results of significance tests for the associated covariance parameters.
However, when tests of significance for random effects are desired, we recommend using likelihood ratio tests, which require fitting a nested model (in which the random effects in question are omitted) and a reference model (in which the random effects are included). Both the nested and reference models should be fitted using REML estimation. Likelihood ratio tests for the random effects in a three-level random intercept model are not possible when using the HLM3 procedure, because (1) HLM3 uses ML rather than REML estimation, and (2) HLM in general will not allow models to be specified that do not include random effects at each level of the data. Instead, HLM implements alternative chi-square tests for the variance of random effects, which are discussed in more detail in Raudenbush & Bryk (2002). In this section, we present the results of a likelihood ratio test for the random effects in Model 4.1, based on fitting the reference and nested models using SAS.

Hypothesis 4.1. The random effects associated with classrooms nested within schools can be omitted from Model 4.1.

We calculate the likelihood ratio test statistic for Hypothesis 4.1 by subtracting the value of the –2 REML log-likelihood for Model 4.1 (the reference model) from the value for Model 4.1A (the nested model excluding the random classroom effects). The resulting test statistic is equal to 7.9 (see Table 4.5). Because a variance cannot be less than zero, the null hypothesis value of σ²int:classroom = 0 is at the boundary of the parameter space, and the null distribution of the likelihood ratio test statistic is a mixture of χ²₀ and χ²₁ distributions, each having equal weight 0.5 (Verbeke & Molenberghs, 2000). The calculation of the p-value for the likelihood ratio test statistic is as follows:

p-value = 0.5 × p(χ²₀ > 7.9) + 0.5 × p(χ²₁ > 7.9) < 0.01

Based on the result of this test, we conclude that there is significant variance in the MATHGAIN means between classrooms nested within schools, and we retain the random effects associated with classrooms in Model 4.1 and in all subsequent models. We also retain the random school effects, without testing them, to reflect the hierarchical structure of the data in the model specification.

4.5.2 Likelihood Ratio Tests and t-Tests for Fixed Effects

Hypothesis 4.2. The fixed effects, β1, β2, β3, and β4, associated with the four student-level covariates, MATHKIND, SEX, MINORITY, and SES, should be added to Model 4.1.

We test Hypothesis 4.2 using a likelihood ratio test, based on ML estimation. We calculate the likelihood ratio test statistic by subtracting the –2 ML log-likelihood for Model 4.2 (the reference model including the four student-level fixed effects) from the corresponding value for Model 4.1 (the nested model excluding the student-level fixed effects). The distribution of the test statistic, under the null hypothesis that the four fixed-effect parameters are all equal to zero, is asymptotically a χ² with 4 degrees of freedom. Because the p-value is significant (p < 0.001), we add the fixed effects associated with the four student-level covariates to the model and choose Model 4.2 as our preferred model at this stage of the analysis. Recall that likelihood ratio tests are only valid if both the reference and nested models are fitted using the same observations, and the fits of these two models are based on all 1190 cases in the data set.

TABLE 4.5: Summary of Hypothesis Test Results for the Classroom Analysis

Hypothesis   Test     Models Compared            Estimation   Test Statistic Values                 p-value
Label                 (Nested vs. Reference)a    Methodb      (Calculation)
4.1          LRT      4.1A vs. 4.1               REML         χ²(0:1) = 7.9 (11776.7 − 11768.8)     < 0.01
4.2          LRT      4.1 vs. 4.2                ML           χ²(4) = 380.4 (11771.3 − 11390.9)     < 0.01
4.3          t-test   4.3                        REML         t(792) = 0.34                         0.73
                                                 ML           t(177) = 0.35                         0.72
4.4          t-test   4.3                        REML         t(792) = 0.97                         0.34
                                                 ML           t(177) = 0.97                         0.34
4.5          t-test   4.3                        REML         t(792) = 1.67                         0.10
                                                 ML           t(177) = 1.67                         0.10
4.6          t-test   4.4                        REML         t(873) = −1.15                        0.25
                                                 ML           t(105) = −1.15                        0.25

Note: See Table 4.4 for null and alternative hypotheses, and distributions of test statistics under H0.
a Nested models are not necessary for the t-tests of Hypotheses 4.3 through 4.6.
b The HLM3 procedure uses ML estimation only; we also report results based on REML estimation from SAS proc mixed.

Hypotheses 4.3, 4.4, and 4.5. The fixed effects, β5, β6, and β7, associated with the classroom-level covariates, YEARSTEA, MATHKNOW, and MATHPREP, should be retained in Model 4.3.

We are unable to use a likelihood ratio test for the fixed effects of all the Level 2 (classroom-level) covariates, because cases are lost due to missing data on the MATHKNOW variable. Instead, we consider individual t-tests for the fixed effects of the classroom-level covariates in Model 4.3. To illustrate testing Hypothesis 4.3, we consider the t-test reported by HLM3 for the fixed effect, β5, of YEARSTEA in Model 4.3. Note that HLM used ML estimation for all models fitted in this chapter, so the estimates of the three variance components will be biased and, consequently, the t-tests calculated by HLM will also be biased (see Subsection 2.4.1). However, under the null hypothesis that β5 = 0, the test statistic reported by HLM approximately follows a t-distribution with 177 degrees of freedom (see Subsection 4.11.3 for a discussion of the calculation of degrees of freedom in HLM3). Because the t-test for Hypothesis 4.3 is not significant (p = 0.724), we decide not to include the fixed effect associated with YEARSTEA in the model and conclude that there is not a relationship between the MATHGAIN score of the student and the years of experience of their teacher.

Similarly, we use t-statistics to test Hypotheses 4.4 and 4.5. Because neither of these tests is significant (see Table 4.5), we conclude that there is not a relationship between the MATHGAIN score of the student and the math knowledge or math preparation of their teacher, as measured for this study. Because the results of hypothesis tests 4.3 through 4.5 were not significant, we do not add the fixed effects associated with any classroom-level covariates to the model, and proceed with Model 4.2 as our preferred model at this stage of the analysis.

Hypothesis 4.6. The fixed effect, β8, associated with the school-level covariate, HOUSEPOV, should be retained in Model 4.4.

We consider a t-test for Hypothesis 4.6. Under the null hypothesis, the t-statistic reported by HLM3 for the fixed effect of HOUSEPOV in Model 4.4 approximately follows a t-distribution with 105 degrees of freedom (see Table 4.5).
Because this test is not significant (p = 0.253), we do not add the fixed effect associated with HOUSEPOV to the model, and conclude that the MATHGAIN score of a student is not related to the poverty level of the households in the neighborhood of their school. We choose Model 4.2 as our final model for the Classroom data analysis. 4.6 Comparing Results across the Software Procedures In Tables 4.6 to 4.9, we present comparisons of selected results generated by the five software procedures after fitting Models 4.1, 4.2, 4.3, and 4.4, respectively, to the Classroom data. 4.6.1 Comparing Model 4.1 Results The initial model fitted to the Classroom data, Model 4.1, is variously described as an unconditional, variance components, or “means-only” model. It has a single fixed-effect parameter, the intercept, which represents the mean value of MATHGAIN for all students. Despite the fact that HLM3 uses ML estimation, and the other five software procedures use REML estimation for this model, all six procedures produce the same estimates for the intercept and its standard error. The REML estimates of the variance components and their standard errors are very similar across the procedures in SAS, SPSS, R, and Stata, whereas the ML estimates from HLM are somewhat different. Looking at the REML estimates, the estimated vari2 ance of the random school effects (σint:school ) is 77.5, the estimated variance of the 2 nested random classroom effects (σint:classroom ) is 99.2, and the estimated residual variance (σ 2 ) is approximately 1028.2; the largest estimated variance component is the residual variance. Table 4.6 also shows that the –2 REML log-likelihood values calculated for Model 4.1 agree across the procedures in SAS, SPSS, R, and Stata. The AIC and BIC information criteria based on the –2 REML log-likelihood values disagree across the procedures that compute them, owing to the different formulas that are used to calculate them (as discussed in Subsection 3.6.1). The HLM3 procedure does not calculate these information criteria. 180 TABLE 4.6: Comparison of Results for Model 4.1 SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HLM3 Estimation Method REML REML REML REML REML ML Fixed-Effect Parameter Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) 57.43(1.44) 57.43(1.44) 57.43(1.44) 57.43(1.44) 57.43(1.44) Estimate (SE) Estimate (SE) Estimate (SE) b Estimate (n.c.) Estimate (SE) β0 (intercept) Covariance Parameter 2 σint:school 2 σint:classroom 2 σ (residual variance) 77.44(32.61) 77.49(32.62) 99.19(41.80) 99.23(41.81) 1028.28(49.04) 1028.23(49.04) 77.49 99.23 1028.23 c 77.49 99.23 1028.23 77.50(32.62) 99.23(41.81) 1028.23(49.04) 57.43(1.44)a Estimate (SE) 75.04(31.70) 100.44(38.45) 1027.73(48.06) Model Information Criteria –2 RE/ML log-likelihood 11768.8 11768.8 11768.8 11768 11768.8 AIC 11774.8 11774.8 11776.8 11777 11776.8 BIC 11782.8 11790.0 11797.1 11797 11797.1 Note: (n.c.) = not computed Note: 1190 Students at Level 1; 312 Classrooms at Level 2; 107 Schools at Level 3. a Model-based standard errors are presented for the fixed-effect parameter estimates in HLM; robust (sandwich-type) standard errors are also produced in HLM by default. 
b Standard errors for the estimated covariance parameters are not reported in the output generated by the summary() function in R; 95% confidence intervals for the parameter estimates can be generated by applying the intervals() function in the nlme package to the object containing the results of an lme() fit (e.g., intervals(model4.1.fit)). c These are squared values of the estimated standard deviations reported by the nlme version of the lme() function in R. 11771.3 n.c. n.c. Linear Mixed Models: A Practical Guide Using Statistical Software SAS: proc mixed Three-Level Models for Clustered Data:The Classroom Example 4.6.2 181 Comparing Model 4.2 Results Model 4.2 includes four additional parameters, representing the fixed effects of the four student-level covariates. As Table 4.7 shows, the estimates of these fixed-effect parameters and their standard errors are very similar across the six procedures. The estimates produced using ML estimation in HLM3 are only slightly different from the estimates produced by the other four procedures. The estimated variance components in Table 4.7 are all smaller than the estimates in Table 4.6, across the six procedures, owing to the inclusion of the fixed effects of the Level 1 (student-level) covariates in Model 4.2. The estimate of the variance between schools was the least affected, while the residual variance was the most affected (as expected). Table 4.7 also shows that the –2 REML log-likelihood values agree across the procedures that use REML estimation, as was noted for Model 4.1. 4.6.3 Comparing Model 4.3 Results Table 4.8 shows that the estimates of the fixed-effect parameters in Model 4.3 (and their standard errors) are once again nearly identical across the five procedures that use REML estimation of the variance components (SAS, SPSS, R, and Stata). The parameter estimates are slightly different when using HLM3, due to the use of ML estimation (rather than REML) by this procedure. The five procedures that use REML estimation agree quite well (with small differences likely due to rounding error) on the values of the estimated variance components. The variance component estimates from HLM3 are somewhat smaller. The –2 REML log-likelihood values agree across the procedures in SAS, SPSS, R, and Stata. The AIC and BIC model fit criteria calculated using the –2 REML log-likelihood values for each program differ, due to the different calculation formulas used for the information criteria across the software procedures. We have included the t-tests and z-tests reported by five of the six procedures for the fixed-effect parameters associated with the classroom-level covariates in Table 4.8, to illustrate the differences in the degrees of freedom computed by the different procedures for the approximate t-statistics. Despite the different methods used to calculate the approximate degrees of freedom for the t-tests (see Subsections 3.11.6 or 4.11.3), the results are nearly identical across the procedures. Note that the z-statistics calculated by Stata do not involve degrees of freedom; Stata refers these test statistics to a standard normal distribution to calculate p-values, and this methodology yields very similar test results. We remind readers that p-values for the t-statistics are not computed when using the lme4 package version of the lmer() function in R to fit the model, because of the different approaches that exist for defining the reference distribution. 
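For readers who want rough p-values to accompany the t-statistics from the lmer() fit, one common approach is to refer those statistics to a standard normal distribution, which is essentially what the z-tests in Stata do. The sketch below assumes a hypothetical object model4.3.fit containing an lmer() fit of Model 4.3; it is an approximation, not the approach used by the other procedures in this chapter.

    # Approximate p-values for the lmer() fixed effects by treating the
    # t-statistics as standard normal (large-sample approximation).
    tvals <- coef(summary(model4.3.fit))[, "t value"]
    pvals <- 2 * pnorm(abs(tvals), lower.tail = FALSE)
    round(pvals, 2)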
4.6.4 Comparing Model 4.4 Results The comparison of the results produced by the software procedures in Table 4.9 is similar to the comparisons in the other three tables. Test statistics calculated for the fixed effect of HOUSEPOV again show that the procedures that compute the test statistics agree in terms of the results of the tests, despite the different degrees of freedom calculated for the approximate t-statistics. 182 TABLE 4.7: Comparison of Results for Model 4.2 SAS: proc mixed SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HLM3 Estimation Method REML REML REML REML REML ML Fixed-Effect Parameter Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Covariance Parameter 282.79(10.85) −0.47(0.02) −1.25(1.66) −8.26(2.34) 5.35(1.24) Estimate (SE) 2 σint:school 2 σint:classroom 2 σ (residual variance) 75.22(25.92) 83.24(29.37) 734.59(34.70) 282.79(10.85) −0.47(0.02) −1.25(1.66) −8.26(2.34) 5.35(1.24) Estimate (SE) 75.20(25.92) 83.28(29.38) 734.57(34.70) 282.79(10.85) −0.47(0.02) −1.25(1.66) −8.26(2.34) 5.35(1.24) Estimate (n.c.) 282.79(10.85) −0.47(0.02) −1.25(1.66) −8.26(2.34) 5.35(1.24) Estimate (n.c.) 75.20 83.28 734.57 75.20 83.29 734.57 282.79(10.85) −0.47(0.02) −1.25(1.66) −8.26(2.34) 5.35(1.24) Estimate (SE) 75.20(25.92) 83.28(29.38) 734.57(34.70) 282.73(10.83) −0.47(0.02) −1.25(1.65) −8.25(2.33) 5.35(1.24) Estimate (SE) 72.88(26.10) 82.98(28.82) 732.22(34.30) Model Information Criteria –2 RE/ML log-likelihood AIC BIC Note: (n.c.) = not computed Note: 1190 Students at Level 1; 11385.8 11391.8 11399.8 11385.8 11391.8 11407.0 11385.8 11401.8 11442.4 312 Classrooms at Level 2; 107 Schools at Level 3. 11386 11402 11442 11385.8 11401.8 11442.5 11390.9 n.c. n.c. Linear Mixed Models: A Practical Guide Using Statistical Software β0 (Intercept) β1 (MATHKIND) β2 (SEX) β3 (MINORITY) β4 (SES) SAS: proc mixed SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HLM3 Estimation Method REML REML REML REML REML ML Fixed-Effect Parameter Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) β0 (Intercept) β1 (MATHKIND) β2 (SEX) β3 (MINORITY) β4 (SES) β5 (YEARSTEA) β6 (MATHPREP) β7 (MATHKNOW) Covariance Parameter 2 σint:school 2 σint:classroom σ2 (residual variance) 282.02(11.70) −0.48(0.02) −1.34(1.72) −7.87(2.42) 5.42(1.28) 0.04(0.12) 1.09(1.15) 1.91(1.15) Estimate (SE) 282.02(11.70) −0.48(0.02) −1.34(1.72) −7.87(2.42) 5.42(1.28) 0.04(0.12) 1.09(1.15) 1.91(1.15) Estimate (SE) 282.02(11.70) −0.48(0.02) −1.34(1.72) −7.87(2.42) 5.42(1.28) 0.04(0.12) 1.09(1.15) 1.91(1.15) Estimate (n.c.) 282.02(11.70) −0.48(0.02) −1.34(1.72) −7.87(2.42) 5.42(1.28) 0.04(0.12) 1.09(1.15) 1.91(1.15) Estimate (n.c.) 282.02(11.70) −0.48(0.02) −1.34(1.72) −7.87(2.42) 5.42(1.28) 0.04(0.12) 1.09(1.15) 1.91(1.15) Estimate (SE) 281.90(11.65) −0.47(0.02) −1.34(1.71) −7.83(2.40) 5.43(1.27) 0.04(0.12) 1.10(1.14) 1.89(1.14) Estimate (SE) 75.24(27.35) 75.19(27.35) 75.19 75.19 75.19(27.35) 86.52(31.39) 86.68(31.43) 86.68 86.68 86.68(31.43) 72.16(27.44) 82.69(30.32) 713.91(35.47) 713.83(35.47) 713.83 713.83 713.83(35.47) 711.50(35.00) Model Information Criteria –2 RE/ML log-likelihood AIC BIC Tests for Fixed Effects 10313.0 10319.0 10327.0 10313.0 10319.0 10333.9 10313.0 10335.0 10389.8 10312.0 10335.0 10390.0 N/A 10313.0 10335.0 10389.8 10320.1 n.c. n.c. 
t-tests t-tests t-tests z-tests t-tests β5 (YEARSTEA) t(792.0) = 0.34, t(227.7) = 0.34, t(177.0) = 0.34, Z = 0.34, t(177.0) = 0.35, β6 (MATHPREP) t(792.0) = 0.95, t(206.2) = 0.95, t(177.0) = 0.95, Z = 0.95, t(177.0) = 0.97, β7 (MATHKNOW) t(792.0) = 1.67, t(232.3) = 1.67, t(177.0) = 1.67, Z = 1.67, t(177.0 = 1.67,) p = 0.73 p = 0.34 p = 0.10 p = 0.74 p = 0.34 p = 0.10 p = 0.73 p = 0.34 p = 0.10 p = 0.34 p = 0.10 p = 0.72 p = 0.34 p = 0.10 183 Note: (n.c.) = not computed Note: 1081 Students at Level 1; 285 Classrooms at Level 2; 105 Schools at Level 3. p = 0.73 Three-Level Models for Clustered Data:The Classroom Example TABLE 4.8: Comparison of Results for Model 4.3 184 TABLE 4.9: Comparison of Results for Model 4.4 SAS: proc mixed SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HLM3 Estimation Method REML REML REML REML REML ML Fixed-Effect Parameter Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Covariance Parameter 2 σint:school 2 σint:classroom 2 σ (residual variance) 285.06(11.02) −0.47(0.02) −1.23(1.66) −7.76(2.39) 5.24(1.25) −11.44(9.94) Estimate (SE) 77.77(25.99) 81.52(29.07) 734.44(34.67) 285.06(11.02) −0.47(0.02) −1.23(1.66) −7.76(2.38) 5.24(1.25) −11.44(9.94) Estimate (SE) 77.76(25.99) 81.56(29.07) 734.42(34.67) 285.06(11.02) −0.47(0.02) −1.23(1.66) −7.76(2.38) 5.24(1.25) −11.44(9.94) Estimate (n.c.) 77.76 81.56 734.42 285.06(11.02) −0.47(0.02) −1.23(1.66) −7.76(2.38) 5.24(1.25) −11.44(9.94) Estimate (n.c.) 77.76 81.56 734.42 285.06(11.02) −0.47(0.02) −1.23(1.66) −7.76(2.38) 5.24(1.25) −11.44(9.94) Estimate (SE) 77.76(25.99) 81.56(29.07) 734.42(34.67) 284.92(10.99) −0.47(0.02) −1.23(1.65) −7.74(2.37) 5.24(1.24) −11.30(9.83) Estimate (SE) 74.14(26.16) 80.96(28.61) 732.08(34.29) Model Information Criteria –2 RE/ML log-likelihood AIC BIC Tests for Fixed Effects β8 (HOUSEPOV) 11378.1 11384.1 11392.1 11378.1 11384.1 11399.3 11378.1 11396.1 11441.8 11378 11396 11442.0 11378.1 11396.1 11441.8 11389.6 n.c. n.c. t-tests t-tests t-tests N/A z-tests t-tests t(873.0) = −1.15, t(119.5) = −1.15, t(105.0) = −1.15, Z = −1.15, t(105.0) = −1.15, p = 0.25 p = 0.25 p = 0.25 Note: (n.c.) = not computed Note: 1190 Students at Level 1; 312 Classrooms at Level 2; 107 Schools at Level 3. p = 0.25 p = 0.25 Linear Mixed Models: A Practical Guide Using Statistical Software β0 (Intercept) β1 (MATHKIND) β2 (SEX) β3 (MINORITY) β4 (SES) β8 (HOUSEPOV) Three-Level Models for Clustered Data:The Classroom Example 4.7 185 Interpreting Parameter Estimates in the Final Model We consider results generated by the HLM3 procedure in this section. 4.7.1 Fixed-Effect Parameter Estimates Based on the results from Model 4.2, we see that gain in math score in the spring of first grade (MATHGAIN) is significantly related to math achievement score in the spring of kindergarten (MATHKIND), minority status (MINORITY), and student SES. The portion of the HLM3 output for Model 4.2 presented below shows that the individual tests for each of these fixed-effect parameters are significant (p < 0.05). The estimated fixed effect of SEX (females relative to males) is the only nonsignificant fixed effect in Model 4.2 (p = 0.45). ' The outcome variable is MATHGAIN Final estimation of fixed effects: ------------------------------------------------------------------------------Standard Approx. Fixed Effect Coefficient Error T-ratio d.f. 
P-value
-------------------------------------------------------------------------------
For       INTRCPT1, P0
   For INTRCPT2, B00
      INTRCPT3, G000         282.726785    10.828453     26.110     106    0.000
For       SEX slope, P1
   For INTRCPT2, B10
      INTRCPT3, G100          -1.251422     1.654663     -0.756     767    0.450
For  MINORITY slope, P2
   For INTRCPT2, B20
      INTRCPT3, G200          -8.253782     2.331248     -3.540     767    0.001
For  MATHKIND slope, P3
   For INTRCPT2, B30
      INTRCPT3, G300          -0.469668     0.022216    -21.141     767    0.000
For       SES slope, P4
   For INTRCPT2, B40
      INTRCPT3, G400           5.348526     1.238400      4.319     767    0.000
-------------------------------------------------------------------------------

The Greek letters for the fixed-effect parameters in the HLM version of Model 4.2 (see Table 4.3 and Subsection 4.4.5) are shown in the left-most column of the output, in their Latin form, along with the name of the variable whose fixed effect is included in the table. For example, G100 represents the overall fixed effect of SEX (γ100 in HLM notation). This fixed effect is actually the intercept in the Level 3 equation for the school-specific effect of SEX (hence, the INTRCPT3 notation). The column labeled “Coefficient” contains the fixed-effect parameter estimate for each of these covariates. The standard errors of the parameter estimates are also provided, along with the T-ratios (t-test statistics), approximate degrees of freedom (d.f.) for the T-ratios, and the p-value. We describe the HLM calculation of degrees of freedom for these approximate t-tests in Subsection 4.11.3.

The estimated fixed effect of kindergarten math score, MATHKIND, on the gain in math achievement in first grade, MATHGAIN, is negative (−0.47), suggesting that students with higher math scores in the spring of their kindergarten year have a lower predicted gain in math achievement in the spring of first grade, after adjusting for the effects of other covariates (i.e., SEX, MINORITY, and SES). That is, students doing well in math in kindergarten will not improve as much over the next year as students doing poorly in kindergarten. Minority students are predicted to have a mean MATHGAIN score that is 8.25 units lower than their nonminority counterparts, after adjusting for the effects of other covariates. In addition, students with higher SES are predicted to have higher math achievement gain than students with lower SES, controlling for the effects of the other covariates in the model.

4.7.2 Covariance Parameter Estimates

The HLM output below presents the estimated variance components for Model 4.2, based on the HLM3 fit of this model.

Final estimation of level-1 and level-2 variance components:
--------------------------------------------------------------------
Random Effect      Standard      Variance      df     Chi-square     P-value
                   Deviation     Component
--------------------------------------------------------------------
INTRCPT1, r0        9.10959       82.98470     205    298.96800      0.000
Level-1,  e        27.05951      732.21715
--------------------------------------------------------------------

Final estimation of level-3 variance components:
---------------------------------------------------------------------------
Random Effect             Standard      Variance      df     Chi-square     P-value
                          Deviation     Component
---------------------------------------------------------------------------
INTRCPT1/INTRCPT2, u00     8.53721       72.88397     106    183.59757      0.000
---------------------------------------------------------------------------

The variance components in this three-level model are reported in two blocks of output.
The first block of output contains the estimated standard deviation of the nested random effects associated with classrooms (labeled r0, and equal to 9.11), and the corresponding estimated variance component (equal to 82.98). In addition, a chi-square test (discussed in the following text) is reported for the significance of this variance component. The first block of output also contains the estimated standard deviation of the residuals (labeled e, and equal to 27.06), and the corresponding estimated variance component (equal to 732.22). No test of significance is reported for the residual variance. The second block of output above contains the estimated standard deviation of the random effects associated with schools (labeled u00), and the corresponding estimated variance component (equal to 72.88). HLM also reports a chi-square test of significance for the variance component at the school level. The addition of the fixed effects of the student-level covariates to Model 4.1 (to produce Model 4.2) reduced the estimated residual variance by roughly 29% (estimated residual variance = 1027.73 in Model 4.1, vs. 732.22 in Model 4.2). The estimates of the classroomand school-level variance components were also reduced by the addition of the fixed effects associated with the student-level covariates, although not substantially (the estimated classroom-level variance was reduced by roughly 17.4%, and the estimated school-level variance was reduced by about 2.9%). This suggests that the four student-level covariates are effectively explaining some of the random variation in the response values at the different levels of the data set, especially at the student level (as expected). The magnitude of the variance components in Model 4.2 (and the significant chi-square tests reported for the variance components by HLM3) suggests that there is still unexplained random variation in the response values at all three levels of this data set. Three-Level Models for Clustered Data:The Classroom Example 187 We see in the output above that HLM3 produces chi-square tests for the variance components in the output (see Raudenbush & Bryk (2002) for details on these tests). These tests suggest that the variances of the random effects at the school level (u00) and the classroom level (r0) in Model 4.2 are both significantly greater than zero, even after the inclusion of the fixed effects of the student-level covariates. These test results indicate that a significant amount of random variation in the response values at all three levels of this data set remains unexplained. At this point, fixed effects associated with additional covariates could be added to the model, to see if they help to explain random variation at the different levels of the data. 4.8 Estimating the Intraclass Correlation Coefficients (ICCs) In the context of a three-level hierarchical model with random intercepts, the intraclass correlation coefficient (ICC) is a measure describing the similarity (or homogeneity) of observed responses within a given cluster. For each level of clustering (e.g., classroom or school), an ICC can be defined as a function of the variance components. For brevity in this section, we represent the variance of the random effects associated with schools as σs2 2 (instead of σint:school ), and the variance of the random effects associated with classrooms 2 nested within schools as σc2 (instead of σint:classroom ). 
The school-level ICC is defined as the proportion of the total random variation in the observed responses (the denominator in (4.5)) due to the variance of the random school effects (the numerator in (4.5)):

ICCschool = σs² / (σs² + σc² + σ²)    (4.5)

The value of ICCschool is high if the total random variation is dominated by the variance of the random school effects. In other words, the ICCschool is high if the MATHGAIN scores of students in the same school are relatively homogeneous, but the MATHGAIN scores across schools tend to vary widely. Similarly, the classroom-level ICC is defined as the proportion of the total random variation (the denominator in (4.6)) due to random between-school and between-classroom variation (the numerator in (4.6)):

ICCclassroom = (σs² + σc²) / (σs² + σc² + σ²)    (4.6)

This ICC is high if there is little variation in the responses of students within the same classroom (σ² is low) compared to the total random variation. The ICCs for classrooms and for schools are estimated by substituting the estimated variance components from a random intercept model into the preceding formulas. Because variance components are positive or zero by definition, the resulting ICCs are also positive or zero. The software procedures discussed in this chapter provide clearly labeled variance component estimates in the computer output when fitting a random intercepts model, allowing for easy calculation of estimates of these ICCs.

We can use the estimated variance components from Model 4.1 to compute estimates of the intraclass correlation coefficients (ICCs) defined in (4.5) and (4.6). We estimate the ICC of observations on students within the same school to be 77.5/(77.5 + 99.2 + 1028.2) = 0.064, and we estimate the ICC of observations on students within the same classroom nested within a school to be (77.5 + 99.2)/(77.5 + 99.2 + 1028.2) = 0.147. Observations on students in the same school are modestly correlated, while observations on students within the same classroom have a somewhat higher correlation.

To further illustrate ICC calculations, we consider the marginal variance-covariance matrix Vk implied by Model 4.1 for a hypothetical school, k, having two classrooms, with the first classroom having two students, and the second having three students. The first two rows and columns of this matrix correspond to observations on the two students from the first classroom, and the last three rows and columns correspond to observations on the three students from the second classroom:

         | σs²+σc²+σ²   σs²+σc²      σs²           σs²           σs²          |
         | σs²+σc²      σs²+σc²+σ²   σs²           σs²           σs²          |
   Vk =  | σs²          σs²          σs²+σc²+σ²    σs²+σc²       σs²+σc²      |
         | σs²          σs²          σs²+σc²       σs²+σc²+σ²    σs²+σc²      |
         | σs²          σs²          σs²+σc²       σs²+σc²       σs²+σc²+σ²   |

The corresponding marginal correlation matrix for these observations can be calculated by dividing all elements in the matrix above by the total variance of a given observation, var(yijk) = σs² + σc² + σ², as shown below.
The ICCs defined in (4.5) and (4.6) can easily be identified in this implied correlation matrix: each off-diagonal entry for two students in the same classroom equals ICCclassroom = (σs² + σc²)/(σs² + σc² + σ²) from (4.6), and each entry for two students in the same school but different classrooms equals ICCschool = σs²/(σs² + σc² + σ²) from (4.5):

               | 1              ICCclassroom   ICCschool       ICCschool       ICCschool     |
               | ICCclassroom   1              ICCschool       ICCschool       ICCschool     |
  Vk(corr) =   | ICCschool      ICCschool      1               ICCclassroom    ICCclassroom  |
               | ICCschool      ICCschool      ICCclassroom    1               ICCclassroom  |
               | ICCschool      ICCschool      ICCclassroom    ICCclassroom    1             |

We obtain estimates of the ICCs from the marginal variance-covariance matrix for the MATHGAIN observations implied by Model 4.1 by using the v option in the random statement in SAS proc mixed. The estimated 11 × 11 V1 matrix for the observations on the 11 students from school 1 is displayed as follows:

Estimated V Matrix for schoolid 1

Row      Col1      Col2      Col3      Col4      Col5      Col6      Col7      Col8      Col9      Col10     Col11
 1   1204.910   176.630   176.630    77.442    77.442    77.442    77.442    77.442    77.442    77.442    77.442
 2    176.630  1204.910   176.630    77.442    77.442    77.442    77.442    77.442    77.442    77.442    77.442
 3    176.630   176.630  1204.910    77.442    77.442    77.442    77.442    77.442    77.442    77.442    77.442
 4     77.442    77.442    77.442  1204.910   176.630   176.630   176.630   176.630   176.630   176.630   176.630
 5     77.442    77.442    77.442   176.630  1204.910   176.630   176.630   176.630   176.630   176.630   176.630
 6     77.442    77.442    77.442   176.630   176.630  1204.910   176.630   176.630   176.630   176.630   176.630
 7     77.442    77.442    77.442   176.630   176.630   176.630  1204.910   176.630   176.630   176.630   176.630
 8     77.442    77.442    77.442   176.630   176.630   176.630   176.630  1204.910   176.630   176.630   176.630
 9     77.442    77.442    77.442   176.630   176.630   176.630   176.630   176.630  1204.910   176.630   176.630
10     77.442    77.442    77.442   176.630   176.630   176.630   176.630   176.630   176.630  1204.910   176.630
11     77.442    77.442    77.442   176.630   176.630   176.630   176.630   176.630   176.630   176.630  1204.910

The 3 × 3 submatrix in the upper-left corner of this matrix corresponds to the marginal variances and covariances of the observations for the three students in the first classroom, and the 8 × 8 submatrix in the lower-right corner represents the corresponding values for the eight students from the second classroom.

We note that the estimated covariance of observations collected on students in the same classroom is 176.63. This is the sum of the estimated variance of the nested random classroom effects, 99.19, and the estimated variance of the random school effects, 77.44. Observations collected on students attending the same school but having different teachers are estimated to have a common covariance of 77.44, which is the variance of the random school effects. Finally, all observations have a common estimated variance, 1204.91, which is equal to the sum of the three estimated variance components in the model (99.19 + 77.44 + 1028.28 = 1204.91), and is the value along the diagonal of this matrix. The marginal variance-covariance matrices for observations on students within any given school would have the same structure, but would be of different dimensions, depending on the number of students within the school. Observations on students in different schools will have zero covariance, because they are assumed to be independent of each other.
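These quantities can also be reproduced directly from the estimated variance components. A minimal sketch in R, using the REML estimates for Model 4.1 reported earlier in this chapter, is shown below.

    # ICCs and implied marginal (co)variances from the Model 4.1 REML estimates.
    sigma2.school    <- 77.44
    sigma2.classroom <- 99.19
    sigma2.resid     <- 1028.28

    total.var <- sigma2.school + sigma2.classroom + sigma2.resid    # 1204.91
    icc.school    <- sigma2.school / total.var                      # about 0.064
    icc.classroom <- (sigma2.school + sigma2.classroom) / total.var # about 0.147

    cov.same.classroom <- sigma2.school + sigma2.classroom          # 176.63
    cov.same.school    <- sigma2.school                             # 77.44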
The estimated marginal correlations of observations for students within school 1 implied by Model 4.1 can be derived by using the vcorr option in the random statement in SAS proc mixed. Note in the corresponding SAS output below that observations on different students within the same classroom in this school have an estimated marginal correlation of 0.1466, and observations on students in different classrooms within this school have an estimated correlation of 0.06427. These results match our initial ICC calculations based on the estimated variance components. Covariates are not considered in the classical definitions of the ICC, either based on the random intercept model or the marginal model; however, covariates can easily be accommodated in the mixed model framework in either model setting. The ICC may be calculated from a model without fixed effects of other covariates (e.g., Model 4.1) or for a model including these fixed effects (e.g., Models 4.2 or 4.3). In either case, we can obtain the ICCs from the labeled variance component estimates or from the estimated marginal correlation matrix, as described earlier. ' $ Estimated V Correlation Matrix for schoolid 1 Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10 Col11 ------------------------------------------------------------------------------1 1.000 0.147 0.147 0.064 0.064 0.064 0.064 0.064 0.064 0.064 0.064 2 0.147 1.000 0.147 0.064 0.064 0.064 0.064 0.064 0.064 0.064 0.064 3 0.147 0.147 1.000 0.064 0.064 0.064 0.064 0.064 0.064 0.064 0.064 4 0.064 0.064 0.064 1.000 0.147 0.147 0.147 0.147 0.147 0.147 0.147 5 0.064 0.064 0.064 0.147 1.000 0.147 0.147 0.147 0.147 0.147 0.147 6 0.064 0.064 0.064 0.147 0.147 1.000 0.147 0.147 0.147 0.147 0.147 7 0.064 0.064 0.064 0.147 0.147 0.147 1.000 0.147 0.147 0.147 0.147 8 0.064 0.064 0.064 0.147 0.147 0.147 0.147 1.000 0.147 0.147 0.147 9 0.064 0.064 0.064 0.147 0.147 0.147 0.147 0.147 1.000 0.147 0.147 10 0.064 0.064 0.064 0.147 0.147 0.147 0.147 0.147 0.147 1.000 0.147 11 0.064 0.064 0.064 0.147 0.147 0.147 0.147 0.147 0.147 0.147 1.000 & 4.9 4.9.1 % Calculating Predicted Values Conditional and Marginal Predicted Values In this section, we use the estimated fixed effects in Model 4.2, generated by the HLM3 procedure, to write formulas for calculating predicted values of MATHGAIN. Recall that three 190 Linear Mixed Models: A Practical Guide Using Statistical Software different sets of predicted values can be generated: conditional predicted values including the EBLUPs of the random school and classroom effects, and marginal predicted values based only on the estimated fixed effects. For example, considering the estimates for the fixed effects in Model 4.2, we can write a formula for the conditional predicted values of MATHGAIN for a student in a given classroom: MATHGAIN ijk = 282.73 − 0.47 × MATHKINDijk − 1.25 × SEXijk − 8.25 × MINORITYijk + 5.35 × SESijk + u k + u j|k (4.7) This formula includes the EBLUPs of the random effect for this student’s school, uk , and the random classroom effect for this student, uj|k . Residuals calculated based on these conditional predicted values should be used to assess assumptions of normality and constant variance for the residuals (see Subsection 4.10.2). 
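In R, all three sets of predicted values can be obtained without writing out the formulas by hand. The sketch below assumes a hypothetical lmer() fit of Model 4.2 named model4.2.fit, with schoolid as the school identifier; the school-only and marginal variants correspond to the formulas described next.

    # Conditional and marginal predicted values from an (assumed) lmer() fit
    # of Model 4.2.
    pred.conditional <- predict(model4.2.fit)                            # as in (4.7)
    pred.school.only <- predict(model4.2.fit, re.form = ~(1 | schoolid)) # school EBLUPs only
    pred.marginal    <- predict(model4.2.fit, re.form = NA)              # fixed effects only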
A formula similar to (4.7) that omits the EBLUPs of the random classroom effects (uj|k ) could be written for calculating a second set of conditional predicted values specific to schools (4.8): MATHGAIN ijk = 282.73 − 0.47 × MATHKINDijk − 1.25 × SEXijk − 8.25 × MINORITYijk + 5.35 × SESijk + ûk (4.8) A third set of marginal predicted values, based on the marginal distribution of MATHGAIN responses implied by Model 4.2, can be calculated based only on the estimated fixed effects: MATHGAIN ijk = 282.73 − 0.47 × MATHKINDijk − 1.25 × SEXijk − 8.25 × MINORITYijk + 5.35 × SESijk These predicted values represent average values of the MATHGAIN response (across schools and classrooms) for all students having given values on the covariates. We discuss how to obtain both conditional and marginal predicted values based on the observed data using SAS, SPSS, R, and Stata in Chapter 3 and Chapters 5 through 7, respectively. Readers can refer to Subsection 4.4.5 for details on obtaining conditional predicted values in HLM. 4.9.2 Plotting Predicted Values Using HLM The HLM software has several convenient graphical features that can be used to visualize the fit of a linear mixed model. For example, after fitting Model 4.2 in HLM, we can plot the marginal predicted values of MATHGAIN as a function of MATHKIND for each level of MINORITY, based on the estimated fixed effects in Model 4.2. In the model-building window of HLM, click File, Graph Equations, and then Model graphs. In the Equation Graphing window, we set the parameters of the plot. First, set the Level 1 X focus to be MATHKIND, which will set the horizontal axis of the graph. Next, set the first Level 1 Z focus to be MINORITY. Finally, click on OK in the main Equation Graphing window to generate the graph in Figure 4.4. We can see the significant negative effect of MATHKIND on MATHGAIN in Figure 4.4, along with the gap in predicted MATHGAIN for students with different minority status. The fitted lines are parallel because we did not include an interaction between MATHKIND and MINORITY in Model 4.2. We also note that the values of SES and SEX are held fixed at their mean when calculating the marginal predicted values in Figure 4.4. Three-Level Models for Clustered Data:The Classroom Example 191 FIGURE 4.4: Marginal predicted values of MATHGAIN as a function of MATHKIND and MINORITY, based on the fit of Model 4.2 in HLM3. We can also generate a graph displaying the fitted conditional MATHGAIN values as a function of MATHKIND for a sample of individual schools, based on both the estimated fixed effects and the predicted random school effects (i.e., EBLUPs) resulting from the fit of Model 4.2. In the HLM model-building window, click File, Graph Equations, and then Level 1 equation graphing. First, choose MATHKIND as the Level 1 X focus. For Number of groups (Level 2 units or Level 3 units), select First ten groups. Finally, set Grouping to be Group at level 3, and click OK. This plots the conditional predicted values of MATHGAIN as a function of MATHKIND for the first ten schools in the data set, in separate panels (not displayed here). 4.10 Diagnostics for the Final Model In this section we consider diagnostics for our final model, Model 4.2, fitted using ML estimation in HLM. 4.10.1 Plots of the EBLUPs Plots of the EBLUPs for the random classroom and school effects from Model 4.2 were generated by first saving the EBLUPs from the HLM3 procedure in SPSS data files (see Subsection 4.4.5), and then generating the plots in SPSS. 
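A comparable check could also be carried out in R for an equivalent fit of Model 4.2. The sketch below assumes a hypothetical lmer() object model4.2.fit with grouping factors classid and schoolid; it extracts the EBLUPs with ranef() and examines normal Q–Q plots analogous to those shown in this section.

    # Extract the EBLUPs from an (assumed) lmer() fit and check their distributions.
    eblups <- ranef(model4.2.fit)
    qqnorm(eblups$classid[, "(Intercept)"], main = "EBLUPs: random classroom effects")
    qqline(eblups$classid[, "(Intercept)"])
    qqnorm(eblups$schoolid[, "(Intercept)"], main = "EBLUPs: random school effects")
    qqline(eblups$schoolid[, "(Intercept)"])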
Figure 4.5 below presents a normal Q–Q plot of the EBLUPs for the random classroom effects. This plot was created using the EBINTRCP variable saved in the Level 2 residual file by the HLM3 procedure: PPLOT /VARIABLES=ebintrcp /NOLOG /NOSTANDARDIZE /TYPE=Q-Q 192 Linear Mixed Models: A Practical Guide Using Statistical Software FIGURE 4.5: EBLUPs of the random classroom effects from Model 4.2, plotted using SPSS. /FRACTION=BLOM /TIES=MEAN /DIST=NORMAL. We do not see evidence of any outliers in the random classroom effects, and the distribution of the EBLUPs for the random classroom effects is approximately normal. In the next plot (Figure 4.6), we investigate the distribution of the EBLUPs for the random school effects, using the EB00 variable saved in the Level 3 residual file by the HLM3 procedure: PPLOT /VARIABLES=eb00 /NOLOG /NOSTANDARDIZE /TYPE=Q-Q /FRACTION=BLOM /TIES=MEAN /DIST=NORMAL. We do not see any evidence of a deviation from a normal distribution for the EBLUPs of the random school effects, and more importantly, we do not see any extreme outliers. Plots such as these can be used to identify EBLUPs that are potential outliers, and further investigate the clusters (e.g., schools or classrooms) associated with the extreme EBLUPs. Note that evidence of a normal distribution in these plots does not always imply that the distribution of the random effects is in fact normal (see Subsection 2.8.3). Three-Level Models for Clustered Data:The Classroom Example 193 FIGURE 4.6: EBLUPs of the random school effects from Model 4.2, plotted using SPSS. 4.10.2 Residual Diagnostics In this section, we investigate the assumptions of normality and constant variance for the residuals, based on the fit of Model 4.2. These plots were created in SPSS, using the Level 1 residual file generated by the HLM3 procedure. We first investigate a normal Q–Q plot for the residuals: PPLOT /VARIABLES=l1resid /NOLOG /NOSTANDARDIZE /TYPE=Q-Q /FRACTION=BLOM /TIES=MEAN /DIST=NORMAL. If the residuals based on Model 4.2 followed an approximately normal distribution, all of the points in Figure 4.7 would lie on or near the straight line included in the figure. We see a deviation from this line at the tails of the distribution, which suggests a longtailed distribution of the residuals (since only the points at the ends of the distribution deviate from normality). There appear to be small sets of extreme negative and positive residuals that may warrant further investigation. Transformations of the response variable (MATHGAIN) could also be performed, but the scale of the MATHGAIN variable (where some values are negative) needs to be considered; for example, a log transformation of the response would not be possible without first adding a constant to each response to produce a positive value. 194 Linear Mixed Models: A Practical Guide Using Statistical Software FIGURE 4.7: Normal quantile–quantile (Q–Q) plot of the residuals from Model 4.2, plotted using SPSS. Next, we investigate a scatter plot of the conditional residuals vs. the fitted MATHGAIN values, which include the EBLUPs of the random school effects and the nested random classroom effects. These fitted values are saved by the HLM3 procedure in a variable named FITVAL in the Level 1 residual file. We investigate this plot to get a visual sense of whether or not the residuals have constant variance: GRAPH /SCATTERPLOT(BIVAR) = fitval WITH llresid /MISSING = LISTWISE . 
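An analogous residual-versus-fitted check could be produced in R for an equivalent fit of Model 4.2; the sketch below assumes a hypothetical lmer() object model4.2.fit and adds a lowess smoother and a zero reference line, mirroring the SPSS plot discussed next.

    # Conditional residuals vs. fitted values, with a smoother and reference line.
    plot(fitted(model4.2.fit), resid(model4.2.fit),
         xlab = "Fitted MATHGAIN", ylab = "Conditional residual")
    abline(h = 0, lty = 2)
    lines(lowess(fitted(model4.2.fit), resid(model4.2.fit)), col = "gray40")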
We have edited the scatter plot in SPSS (Figure 4.8) to include the fit of a smooth Loess curve, indicating the relationship of the fitted values with the residuals, in addition to a dashed reference line set at zero. We see evidence of nonconstant variance in the residuals in Figure 4.8. We would expect there to be no relationship between the fitted values and the residuals (a line fitted to the points in this plot should look like the reference line, representing the zero mean of the residuals), but the Loess smoother shows that the residuals tend to get larger for larger predicted values of MATHGAIN. This problem suggests that the model may be misspecified; there may be omitted covariates that would explain the large positive values and the low negative values of MATHGAIN that are not being well fitted. Scatter plots of the residuals against other covariates would be useful to investigate at this point, as there might be nonlinear relationships of the covariates with the MATHGAIN response that are not being captured by the strictly linear fixed effects in Model 4.2. Three-Level Models for Clustered Data:The Classroom Example 195 FIGURE 4.8: Residual vs. fitted plot from SPSS. 4.11 4.11.1 Software Notes REML vs. ML Estimation The procedures in SAS, SPSS, R, and Stata use restricted maximum likelihood (REML) estimation as the default estimation method for fitting models with nested random effects to three-level data sets. These four procedures estimate the variance and covariance parameters using REML (where ML estimation is also an option), and then use the estimated marginal V matrix to estimate the fixed-effect parameters in the models using generalized least squares (GLS). The procedure available in HLM (HLM3) utilizes ML estimation when fitting three-level models with nested random effects. 4.11.2 Setting up Three-Level Models in HLM In the following text, we note some important differences in setting up three-level models using the HLM software as opposed to the other four packages: • Three data sets, corresponding to the three levels of the data, are required to fit an LMM to a three-level data set. The other procedures require that all variables for each level of the data be included in a single data set, and that the data be arranged in the “long” format displayed in Table 4.2. 196 Linear Mixed Models: A Practical Guide Using Statistical Software • Models in HLM are specified in multiple parts. For a three-level data set, Level 1, Level 2, and Level 3 models are identified. The Level 2 models are for the effects of covariates measured on the Level 1 units and specified in the Level 1 model; and the Level 3 models are for the effects of covariates measured on the Level 2 units and specified in the Level 2 models. • In models for three-level data sets, the effects of any of the Level 1 predictors (including the intercept) are allowed to vary randomly across Level 2 and Level 3 units. Similarly, the effects of Level 2 predictors (including the intercept) are allowed to vary randomly across Level 3 units. In the models fitted in this chapter, we have allowed only the intercepts to vary randomly at different levels of the data. 4.11.3 Calculation of Degrees of Freedom for t-Tests in HLM The degrees of freedom for the approximate t-statistics calculated by the HLM3 procedure, and reported for Hypotheses 4.3 through 4.6, are described in this subsection. 
Level 1 Fixed Effects: df = number of Level 1 observations (i.e., number of students) – number of random effects at Level 2 – number of random effects at Level 3 – number of fixed-effect parameters associated with the covariates at Level 1. For example, for the t-tests for the fixed effects associated with the Level 1 (student-level) covariates in Model 4.2, we have df = 1190 – 312 – 107 – 4 = 767. Level 2 Fixed Effects: df = number of random effects at Level 2 – number of random effects at Level 3 – number of fixed effects at Level 2. For example, in Model 4.3, we have df = 285 – 105 – 3 = 177 for the t-tests for the fixed effects associated with the Level 2 (classroom-level) covariates, as shown in Table 4.5. Note that there are three fixed effects at Level 2 (the classroom level) in Model 4.3. Level 3 Fixed Effects: df = number of random effects at Level 3 – number of fixed effects at Level 3. Therefore, in Model 4.4, we have df = 107 – 2 = 105 for the t-test for the fixed effect associated with the Level 3 (school-level) covariate, as shown in Table 4.5. The fixed intercept is considered to be a Level 3 fixed effect, and there is one additional fixed effect at Level 3. 4.11.4 Analyzing Cases with Complete Data We mention in the analysis of the Classroom data that likelihood ratio tests are not possible for Hypotheses 4.3 through 4.5, due to the presence of missing data for some of the Level 2 covariates. An alternative way to approach the analyses in this chapter would be to begin with a data set having cases with complete data for all covariates. This would make either likelihood ratio tests or alternative tests (e.g., t-tests) appropriate for any of the hypotheses that we test. In the Classroom data set, MATHKNOW is the only classroom-level covariate with missing data. Taking that into consideration, we include the following syntax for each of the software packages that could be used to derive a data set where only cases with complete data on all covariates are included. In SAS, the following data step could be used to create a new SAS data set, classroom_nomiss, which contains only observations with complete data: Three-Level Models for Clustered Data:The Classroom Example 197 data classroom_nomiss; set classroom; if mathknow ne .; run; In SPSS, the following syntax can be used to select cases that do not have missing data on MATHKNOW (the resulting data set should be saved under a different name): FILTER OFF. USE ALL. SELECT IF (not MISSING (mathknow)). EXECUTE. In R, we could create a new data frame object excluding those cases with missing data on MATHKNOW: > class.nomiss <- subset(class, !is.na(mathknow)) In Stata, the following command could be used to delete cases with missing data on MATHKNOW: . keep if mathknow ! = . Finally, in HLM, this can be accomplished by selecting Delete data when ... making MDM (rather than when running the analysis) when setting up the MDM file. 4.11.5 Miscellaneous Differences Less critical differences between the five software procedures in terms of fitting three-level models are highlighted in the following text: • Procedures in the HLM software package automatically generate both model-based standard errors and robust (or sandwich-type) standard errors for estimated fixed effects. The two different sets of standard errors are clearly distinguished in the HLM output. 
The robust standard errors are useful to report if one is unsure about whether the marginal variance-covariance matrix for the data has been correctly specified; if the robust standard errors differ substantially from the model-based standard errors, we would recommend reporting the robust standard errors (for more details see Raudenbush & Bryk (2002)). Robust standard errors can be obtained in SAS by using the empirical option when invoking proc mixed. • EBLUPs for random effects cannot be calculated in SPSS when fitting models to data sets with multiple levels of clustering, such as the Classroom data. • Fitting three-level random intercept models using the MIXED command in SPSS tends to be computationally intensive, and can take longer than in the other software procedures. • The mixed command in Stata reports z-tests for the fixed effects, rather than the t-tests reported by the other four procedures. The z-tests are asymptotic, and thus require large sample sizes at all three levels of the data. 198 4.12 Linear Mixed Models: A Practical Guide Using Statistical Software Recommendations Three-level models for cross-sectional data introduce the possibility of extremely complex random-effects structures. In the example analyses presented in this chapter, we only considered models with random intercepts; we could have allowed the relationships of selected covariates at Level 1 (students) of the Classroom data to randomly vary across classrooms and schools, and the relationships of selected covariates at Level 2 (classrooms) to vary across schools. The decision to include many additional random effects in a three-level model will result in a much more complex implied covariance structure for the dependent variable, including several covariance parameters (especially if an unstructured D matrix is used, which is the default random-effects covariance structure in HLM and the two R functions). This may result in estimation difficulties, or the software appearing to “hang” or “freeze” when attempting to estimate a model. For this reason, we only recommend including a large number of random effects (above and beyond random intercepts) at higher levels if there is explicit research interest in empirically describing (and possibly attempting to explain) the variance in the relationships of selected covariates with the dependent variable across higher-level units. Including random intercepts at Level 2 and Level 3 of a given three-level data set will typically result in a reasonable implied covariance structure for a given continuous dependent variable in a cross-sectional three-level data set. Because three-level models do introduce the possibility of allowing many relationships to vary across higher levels of the data hierarchy (e.g., the relationship of student-level SES with mathematical performance varying across schools), the ability to graphically explore variance in both the means of a dependent variable and the relationships of key independent variables with the dependent variable across higher-level units becomes very important when analyzing three-level data. For this reason, having good graphical tools “built in” to a given software package becomes very important. We find that the HLM software provides users with a useful set of “point-and-click” graphing procedures for exploring random coefficients without too much additional work (see Subsection 4.2.2.2). 
Creating similar graphs and figures in the other software tends to take more work and some additional programming, but is still possible. 5 Models for Repeated-Measures Data: The Rat Brain Example 5.1 Introduction This chapter introduces the analysis of repeated-measures data, in which multiple measurements are made on the same subject (unit of analysis) under different conditions or across time. Repeated-measures data sets can be considered to be a type of two-level data, in which Level 2 represents the subjects and Level 1 represents the repeated measurements made on each subject. Covariates measured at Level 2 of the data (the subject level) describe between-subject variation and Level 1 covariates describe within-subject variation. The data that we analyze are from an experimental study of rats in which the dependent variable was measured in three brain regions for two levels of a drug treatment. Brain region and treatment are crossed within-subject factors; measurements were made for the same brain regions and the same treatments within each rat. Between-subject factors (e.g., sex or genotype) are not considered in this example. Repeated-measures data typically arise in an experimental setting, and often involve measurements made on the same subject over time, although time is not a within-subject factor in this example. In Table 5.1, we present examples of repeated-measures data in different research settings. In this chapter we highlight the SPSS software. 5.2 5.2.1 The Rat Brain Study Study Description The data used in this example were originally reported by Douglas et al. (2004).1 The aim of their experiment was to examine nucleotide activation (guanine nucleotide bonding) in seven different brain nuclei (i.e., brain regions) among five adult male rats. The basal nucleotide activation, measured after treatment with saline solution, was compared to activation in the same region after treatment with the drug carbachol. Activation was measured as the mean optical density produced by autoradiography. We compare activation in a subset of three of the original seven brain regions studied by the authors: the bed nucleus of the stria terminalis (BST), the lateral septum (LS), and the diagonal band of Broca (VDB). The original data layout for this study is shown in Table 5.2. 1 Data from this study are used with permission of the authors and represent a part of their larger study. Experiments were conducted in accordance with the National Institutes of Health Policy on Humane Care and Use of Laboratory Animals. 
199 200 Linear Mixed Models: A Practical Guide Using Statistical Software TABLE 5.1: Examples of Repeated-Measures Data in Different Research Settings Research Setting Level of Data Unit of analysis (Level 2) Linguistics Medicine Anesthesiology Subject variable (random factor) Person Patient Rat Subjectlevel covariates Age, native language Sex, severity score Sex, genotype Word type, context Time (minutes after administration of drug) Brain region, treatment Vowel duration (msec) Pain relief (visual analog scale) Nucleotide activation (optical density) Repeated Withinmeasures subject (Level 1) factors Dependent variable The following SPSS syntax can be used to read in the tab-delimited raw data from the original ratbrain.dat file, assumed to be in the C:\temp directory: GET DATA /TYPE = TXT /FILE = "C:\temp\ratbrain.dat" /DELCASE = LINE /DELIMITERS = "\t" /ARRANGEMENT = DELIMITED /FIRSTCASE = 2 /IMPORTCASE = ALL /VARIABLES = animal A7 Carb.BST F6.2 Carb.LS F6.2 Carb.VDB F6.2 Basal.BST F6.2 Basal.LS F6.2 Basal.VDB F6.2 . CACHE. EXECUTE. Before we carry out an analysis of the Rat Brain data using SAS, SPSS, R, Stata, or HLM, we need to restructure the data set into the “long” format. The SPSS syntax to restructure the data is shown as follows. A portion of the restructured Rat Brain data is shown in Table 5.3. Models for Repeated-Measures Data:The Rat Brain Example 201 TABLE 5.2: The Rat Brain Data in the Original “Wide” Data Layout. Treatments are “Carb” and “Basal”; brain regions are BST, LS, and VDB Animal Carb BST Carb LS Carb VDB Basal BST Basal LS Basal VDB R111097 R111397 R100797 R100997 R110597 371.71 492.58 664.72 515.29 589.25 302.02 355.74 587.10 437.56 493.93 449.70 459.58 726.96 604.29 621.07 366.19 375.58 458.16 479.81 462.79 199.31 204.85 245.04 261.19 278.33 187.11 179.38 237.42 195.51 262.05 VARSTOCASES /MAKE activate FROM Basal BST Basal LS Basal VDB Carb BST Carb LS Carb VDB /INDEX = treatment(2) region(3) /KEEP = animal /NULL = KEEP. VALUE LABELS treatment 1 ’Basal’ 2 ’Carbachol’ / region 1 ‘BST’ 2 ‘LS’ 3 ’VDB’. The following variables are included in the Rat Brain data set (note that there are no Level 2 covariates included in this study): TABLE 5.3: Sample of the Rat Brain Data Set Rearranged in the “Long” Format Rat (Level 2) Unit ID ANIMAL Repeated Measures (Level 1) Within-Subject Fixed Factors TREATMENT REGION R111097 1 1 R111097 1 2 R111097 1 3 R111097 2 1 R111097 2 2 R111097 2 3 R111397 1 1 R111397 1 2 R111397 1 3 R111397 2 1 R111397 2 2 R111397 2 3 R100797 1 1 R100797 1 2 R100797 1 3 ... Note: “...” indicates portion of the data not displayed. Dependent Variable ACTIVATE 366.19 199.31 187.11 371.71 302.02 449.70 375.58 204.85 179.38 492.58 355.74 459.58 458.16 245.04 237.42 202 Linear Mixed Models: A Practical Guide Using Statistical Software Rat (Level 2) Variable • ANIMAL = Unique identifier for each rat Repeated-Measures (Level 1) Variables • TREATMENT = Level of drug treatment (1 = basal, 2 = carbachol) • REGION = Brain nucleus (1 = BST, 2 = LS, 3 = VDB) • ACTIVATE = Nucleotide activation (the dependent variable) We recommend sorting the data in ascending order by ANIMAL and then by TREATMENT and REGION within each level of ANIMAL prior to running the analysis. Although this sorting is not necessary for the analysis, it makes the output displayed later (e.g., marginal variance-covariance matrices) easier to read. 
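The sorting step itself is not accompanied by syntax in the text; a minimal SPSS sketch (assuming the restructured "long" data set is the active file, with the variables ANIMAL, TREATMENT, and REGION defined above) is:

SORT CASES BY animal treatment region.

This simply orders the cases by animal, and by treatment and region within each animal, so that listings and the marginal variance-covariance matrices displayed later appear in a predictable order.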
5.2.2 Data Summary The following SPSS syntax will generate descriptive statistics for the dependent variable, ACTIVATE, for each level of REGION by TREATMENT: MEANS TABLES = activate BY treatment BY region /CELLS MEAN COUNT STDDEV MIN MAX. The following table displays the SPSS output generated by submitting the syntax above. ' $ Report activate treatment region Mean N Basal BST LS VDB Total 428.5060 237.7440 212.2940 292.8480 5 5 5 15 53.31814 34.67477 35.72899 107.21452 366.19 199.31 179.38 179.38 479.81 278.33 262.05 479.81 Carbachol BST LS VDB Total 526.7100 435.2700 572.3200 511.4333 5 5 5 15 109.86160 112.44907 117.32236 120.30398 371.71 302.02 449.70 302.02 664.72 587.10 726.96 726.96 Total BST LS VDB Total 477.6080 336.5070 392.3070 402.1407 10 10 10 30 96.47086 130.35415 206.61591 157.77534 366.19 199.31 179.38 179.38 664.72 587.10 726.96 726.96 & Std. Deviation Minimum Maximum % The mean activation level is generally higher for carbachol than for the basal treatment in each region. The mean activation also appears to differ by region, with BST having the highest mean activation in the basal condition and VDB having the highest mean activation in the carbachol condition. The standard deviations of activation appear to be much larger for the carbachol treatment than for the basal treatment. We investigate the data by creating line graphs of the activation in the three brain regions for each animal for both the basal and carbachol treatments: Models for Repeated-Measures Data:The Rat Brain Example 203 Treatment Basal animal R10079 7 R10099 7 R11059 7 R11109 7 R11139 7 Carbachol 800.00 600.00 noitavitcA 400.00 200.00 BST LS BST VDB LS VDB Region FIGURE 5.1: Line graphs of activation for each animal by region within levels of treatment for the Rat Brain data. GRAPH /LINE(MULTIPLE) MEAN(activate) BY region BY animal /PANEL COLVAR = treatment COLOP = CROSS. The effects of treatment and region on the activation within each animal are clear in Figure 5.1. Activation values are consistently higher in the carbachol treatment than in the basal treatment across all regions, and the LS region has a lower mean than the BST region for both carbachol and basal treatments. There also appears to be a greater effect of carbachol treatment in the VDB region than in either the BST or LS region. These observations suggest that the fixed effects of TREATMENT and REGION and the TREATMENT × REGION interaction are likely to be significant in a mixed model for the data. In Figure 5.1, we also note characteristics of these data that will be useful in specifying the random effects in a mixed model. First, between-animal variation is apparent in the basal treatment, but is even greater in the carbachol treatment. To capture the between-animal variation in both treatments, we initially include in the model a random effect associated with the intercept for each rat. To address the greater between-animal variation in the carbachol treatment, we include a second random animal effect associated with TREATMENT (carbachol vs. basal). This results in each animal having two random intercepts, one for the basal condition and another for the carbachol treatment. 
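As a complement to the line graphs in Figure 5.1, side-by-side boxplots of ACTIVATE by treatment provide a quick visual check of the larger spread under the carbachol condition. The following SPSS sketch is not part of the original example and is included only as one possible way to request such a display:

EXAMINE VARIABLES=activate BY treatment
  /PLOT=BOXPLOT
  /STATISTICS=DESCRIPTIVES
  /NOTOTAL.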
FIGURE 5.2: Model selection and related hypotheses for the analysis of the Rat Brain data. [Flowchart: Step 1 fits Model 5.1 (fixed treatment by region interaction, random intercepts). Step 2 compares Model 5.1 with Model 5.2 (which adds random treatment effects) via Hypothesis 5.1. Step 3 compares Model 5.2 with Model 5.3 (heterogeneous residual variances for the Basal and Carbachol treatments) via Hypothesis 5.2, and with the model obtained by removing the fixed effects of the treatment by region interaction from Model 5.2 via Hypothesis 5.3. The legend distinguishes comparisons in which the nested (null hypothesis) model is preferred from those in which the reference model is preferred.]

5.3 Overview of the Rat Brain Data Analysis

We apply the “top-down” modeling strategy discussed in Chapter 2 (Subsection 2.7.1) to the analysis of the Rat Brain data. Subsection 5.3.1 outlines the analysis steps and informally introduces the related models and hypotheses to be tested. Subsection 5.3.2 presents a more formal specification of selected models, and Subsection 5.3.3 presents details about the hypotheses tested. The analysis steps outlined in this section are shown schematically in Figure 5.2. See Subsection 5.3.3 for details on the interpretation of Figure 5.2.

5.3.1 Analysis Steps

Step 1: Fit a model with a “loaded” mean structure (Model 5.1). Fit a two-level model with a loaded mean structure and random animal-specific intercepts.

Model 5.1 includes fixed effects associated with region, treatment, and the interaction between region and treatment. This model also includes a random effect associated with the intercept for each animal and a residual associated with each observation. The residuals are assumed to be independent and to have the same variance across all levels of region and treatment. The assumption of homogeneous variance for the residuals, in conjunction with the single random effect associated with the intercept for each animal, implies that the six observations on each animal have the same marginal variance and that all pairs of observations have the same (positive) marginal covariance (i.e., a compound symmetry covariance structure).

Step 2: Select a structure for the random effects (Model 5.1 vs. Model 5.2). Fit Model 5.2 by adding random animal-specific effects of treatment to Model 5.1, and decide whether to retain them in the model. (Note: In this step we do not carry out a formal test of whether the random intercepts should be retained in Model 5.1, but assume that they should be kept, based on the study design.)

We observed in Figure 5.1 that the between-animal variation was greater for the carbachol treatment than for the basal treatment. To accommodate this difference in variation, we add a random animal-specific effect of treatment to Model 5.1 to obtain Model 5.2. The effect of treatment is fixed in Model 5.1 and therefore constant across all animals. The additional random effect associated with treatment included in Model 5.2 allows the implied marginal variance of observations for the carbachol treatment to differ from that for the basal treatment. Model 5.2 can also be interpreted as having two random intercepts per rat, i.e., one for the carbachol treatment and an additional one for the basal treatment. We test Hypothesis 5.1 to decide if the random treatment effects should be kept in Model 5.2.
Based on the test result, we retain the random treatment effects and keep Model 5.2 as the preferred model at this stage of the analysis. Step 3: Select a covariance structure for the residuals (Model 5.2 vs. Model 5.3). Fit Model 5.3 with heterogeneous residual variances for the basal and carbachol treatments and decide whether the model should have homogeneous residual variances (Model 5.2) or heterogeneous residual variances (Model 5.3) We observed in Figure 5.1 that the variance of individual measurements appeared to be greater for the carbachol treatment than for basal treatment. In Step 2, we explored whether this difference can be attributed to between-subject variation by considering random treatment effects in addition to random intercepts. In this step, we investigate whether there is heterogeneity in the residual variances. In Model 5.2, we assume that the residual variance is constant across all levels of region and treatment. Model 5.3 allows a more flexible specification of the residual variance; i.e., it allows the variance of the residuals to differ across levels of treatment. We test Hypothesis 5.2 to decide if the residual variance is equal for the carbachol and basal treatments. Based on the results of this test, we keep Model 5.2, with homogeneous residual variance, as our preferred model. Step 4: Reduce the model by removing nonsignificant fixed effects and assess model diagnostics for the final model (Model 5.2). Decide whether to keep the fixed effects of the region by treatment interaction in Model 5.2. Based on the result of the test for Hypothesis 5.3, we conclude that the region by treatment interaction effects are significant and consider Model 5.2 to be our final model. We carry out diagnostics for the fit of Model 5.2 using informal graphical methods in SPSS in Section 5.9. 206 5.3.2 5.3.2.1 Linear Mixed Models: A Practical Guide Using Statistical Software Model Specification General Model Specification We specify Models 5.2 and 5.3 in this subsection. We do not explicitly specify the simpler Model 5.1. All three models are summarized in Table 5.4. The general specification of Model 5.2 corresponds closely to the syntax used to fit this model in SAS, SPSS, Stata, and R. The value of ACTIVATEti for a given observation indexed by t (t = 1, 2, ..., 6) on the i-th animal (i = 1, 2, ..., 5) can be written as follows: ACTIVATEti = β0 + β1 × REGION1ti + β2 × REGION2ti ⎫ ⎪ ⎬ + β3 × TREATMENTti + β4 × REGION1ti × TREATMENTti fixed ⎪ ⎭ + β5 × REGION2ti × TREATMENTti +u0i + u3i × TREATMENTti + εti } random (5.1) In Model 5.2, we include two indicator variables for region, REGION1ti and REGION2ti , which represent the BST and LS regions, respectively. TREATMENTti is an indicator variable that indicates the carbachol treatment. In our parameterization of the model, we assume that fixed effects associated with REGION=“VDB” and TREATMENT=“Basal” are set to zero. The fixed-effect parameters are represented by β0 through β5 . The fixed intercept β0 represents the expected value of ACTIVATEti for the reference levels of region and treatment (i.e., the VDB region under the basal treatment). The parameters β1 and β2 represent the fixed effects of the BST and LS regions vs. the VDB region, respectively, for the basal treatment (given that the model includes an interaction between region and treatment). The parameter β3 represents the fixed effect associated with TREATMENTti (carbachol vs. basal) for the VDB region. 
The parameters β4 and β5 represent the fixed effects associated with the region by treatment interaction. These parameters can either be interpreted as changes in the carbachol effect for the BST and LS regions relative to the VDB region, or changes in the BST and LS region effects for the carbachol treatment relative to the basal treatment.

The u0i term represents the random intercept associated with animal i, and u3i represents the random effect associated with treatment (carbachol vs. basal) for animal i. We denote the random treatment effect as u3i because this random effect is coupled with the fixed effect of TREATMENT (β3). We assume that the distribution of the random effects associated with animal i, u0i and u3i, is bivariate normal:

$$
u_i = \begin{pmatrix} u_{0i} \\ u_{3i} \end{pmatrix} \sim N(\mathbf{0}, D)
$$

A set of three covariance parameters, σ²int, σ²treat, and σint,treat, defines the D matrix of variances and covariances for the two random effects in Model 5.2, as follows:

$$
D = \begin{pmatrix} \sigma^2_{\mathrm{int}} & \sigma_{\mathrm{int,treat}} \\ \sigma_{\mathrm{int,treat}} & \sigma^2_{\mathrm{treat}} \end{pmatrix}
$$

In the D matrix, σ²int is the variance of the random intercepts, σint,treat is the covariance of the two random effects, and σ²treat is the variance of the random treatment effects.

The residuals associated with the six observations on animal i are assumed to follow a multivariate normal distribution:

$$
\varepsilon_i = \begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \\ \varepsilon_{3i} \\ \varepsilon_{4i} \\ \varepsilon_{5i} \\ \varepsilon_{6i} \end{pmatrix} \sim N(\mathbf{0}, R_i)
$$

where Ri is a 6 × 6 covariance matrix. We assume that the six components of the εi vector are ordered in the same manner as in Table 5.3: The first three residuals are associated with the basal treatment in the BST, LS, and VDB regions, and the next three are associated with the carbachol treatment in the same three regions.

In Model 5.2, the Ri matrix is simply σ²I6. The diagonal elements of the 6 × 6 matrix represent the variances of the residuals, all equal to σ² (i.e., the variances are homogeneous). The off-diagonal elements represent the covariances of the residuals, which are all zero:

$$
R_i = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I_6
$$

In Model 5.3, we allow the residual variances to differ for each level of treatment, by including separate residual variances (σ²basal and σ²carb) for the basal and carbachol treatments. This heterogeneous structure for the 6 × 6 Ri matrix is as follows:

$$
R_i = \begin{pmatrix} \sigma^2_{\mathrm{basal}} I_3 & \mathbf{0} \\ \mathbf{0} & \sigma^2_{\mathrm{carb}} I_3 \end{pmatrix}
$$

The upper-left 3 × 3 submatrix corresponds to observations for the three regions in the basal treatment, and the lower-right submatrix corresponds to the regions in the carbachol treatment. The treatment-specific residual variance is on the diagonal of the two 3 × 3 submatrices, and zeroes are off the diagonal. We assume that the residuals, εti (t = 1, ..., 6), conditional on a given animal i, are independent in both Models 5.2 and 5.3.

5.3.2.2 Hierarchical Model Specification

We now present an equivalent hierarchical specification of Model 5.2, using the same notation as in Subsection 5.3.2.1. The correspondence between this notation and the HLM software notation is shown in Table 5.4. The hierarchical model has two components that reflect contributions from the two levels of the data: the repeated measures at Level 1 and the rats at Level 2.
We write the Level 1 component as: Level 1 Model (Repeated Measures) ACTIVATEti = b0i + b1i × REGION1ti + b2i × REGION2ti + b3i × TREATMENTti + b4i × REGION1ti × TREATMENTti + b5i × REGION2ti × TREATMENTti + εti (5.2) where the residuals εti have the distribution defined in the general specification of Model 5.2 in Subsection 5.3.2.1, with constant variance and zero covariances in the Ri matrix. 208 TABLE 5.4: Summary of Selected Models for the Rat Brain Data HLM Notation Notation Fixed effects Intercept REGION1 (BST vs. VDB) REGION2 (LS vs. VDB) TREATMENT (carbachol vs. basal) REGION1 × TREATMENT REGION2 × TREATMENT β0 β1 β00 β10 β2 β20 β3 β30 β4 β40 β5 β50 Random effects Rat (i) Intercept u0i r0i TREATMENT (carbachol vs. basal) u3i r3i Residuals Measure (t) on rat (i) εti eti Covariance parameters (θ D ) for D matrix Rat level 2 σint τ [1,1] Variance of intercepts Model 5.1 5.2 5.3 √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ Linear Mixed Models: A Practical Guide Using Statistical Software General Term/Variable General HLM Notation Notation Covariance of intercepts and treatment effects Variance of treatment effects σint,treat τ [2,1] 2 σtreat τ [2,2] Variances of residuals 2 2 σbasal , σcarb σ2 Term/Variable Covariance parameters (θ R ) for Ri matrix a Repeated measures Structure Ri Heterogeneous residual variance across treatments (see Subsection 5.3.2.1). Model 5.1 5.2 5.3 √ √ √ √ 2 2 2 σbasal = σbasal = σbasal = 2 2 2 σcarb = σcarb = σcarb σ2 σ2 σ 2 I6 σ 2 I6 Heta Models for Repeated-Measures Data:The Rat Brain Example TABLE 5.4: (Continued) 209 210 Linear Mixed Models: A Practical Guide Using Statistical Software In the Level 1 model shown in (5.2), we assume that ACTIVATEti , the nucleotide activation for an individual observation t on rat i, follows a linear model, defined by the animalspecific intercept b0i , the animal-specific effects b1i and b2i of REGION1ti and REGION2ti (BST and LS relative to the VDB region, respectively), the animal-specific effect b3i of TREATMENTti (carbachol vs. basal treatment), and the animal-specific interaction effects b4i and b5i . The Level 2 model describes variation between animals in terms of the animal-specific intercepts (b0i ) and the remaining animal-specific effects (b1i through b5i ). Although the Level 2 model has six equations (which is consistent with the HLM software specification), there is a simple expression for each one: Level 2 Model (Rat) b0i b1i b2i b3i b4i b5i = β0 + u0i = β1 = β2 = β3 + u3i = β4 = β5 (5.3) where ui = u0i u3i ∼ N (0, D) In this Level 2 model, the intercept b0i for rat i depends on the fixed intercept β0 (i.e., the ACTIVATE mean for the VDB region in the basal treatment), and a random effect, u0i , associated with rat i. The effect of treatment for rat i, b3i , also depends on a fixed effect (β3 ) and a random effect associated with rat i (u3i ). All remaining animal-specific effects (bli , b2i , b4i , and b5i ) are defined only by their respective fixed effects β1 , β2 , β4 , and β5 . By substituting the expressions for b0i through b5i from the Level 2 model into the Level 1 model, we obtain the general linear mixed model (LMM) as specified in (5.1). In Model 5.3, the residuals in the Level 1 model have the heterogeneous variance structure that was defined in Subsection 5.3.2.1. 5.3.3 Hypothesis Tests Hypothesis tests considered in the analysis of the Rat Brain data are summarized in Table 5.5. Hypothesis 5.1. 
The random effects (u3i) associated with treatment for each animal can be omitted from Model 5.2.

Model 5.1 has a single random effect (u0i) associated with the intercept for each animal, and Model 5.2 includes an additional random effect (u3i) associated with treatment for each animal. We do not directly test the significance of the random animal-specific treatment effects in Model 5.2, but rather, we test a null version of the D matrix (for Model 5.1) vs. an alternative version for Model 5.2 (Verbeke & Molenberghs, 2000). The null hypothesis D matrix has a single positive element, σ²int, which is the variance of the single random effect (u0i). The alternative hypothesis D matrix is positive semidefinite and contains two additional parameters: σ²treat > 0, which is the variance of the random treatment effects (u3i), and σint,treat, which is the covariance of the two random effects, u0i and u3i, associated with each animal.

TABLE 5.5: Summary of Hypotheses Tested in the Analysis of the Rat Brain Data

Hypothesis 5.1
  Null (H0): Drop u3i. Alternative (HA): Retain u3i.
  Test: LRT. Nested model (H0): Model 5.1. Reference model (HA): Model 5.2.
  Estimation method: REML. Test stat. dist. under H0: 0.5χ²₁ + 0.5χ²₂.

Hypothesis 5.2
  Null (H0): Homogeneous residual variance (σ²carb = σ²basal). Alternative (HA): Heterogeneous residual variances (σ²carb ≠ σ²basal).
  Test: LRT. Nested model (H0): Model 5.2. Reference model (HA): Model 5.3.
  Estimation method: REML. Test stat. dist. under H0: χ²₁.

Hypothesis 5.3
  Null (H0): Drop TREATMENT × REGION effects (β4 = β5 = 0). Alternative (HA): Retain TREATMENT × REGION effects (β4 ≠ 0, or β5 ≠ 0).
  Test: Type III F-test / Wald χ² test. Nested model (H0): N/A. Reference model (HA): Model 5.2.
  Estimation method: REML. Test stat. dist. under H0: F(2, 16)ᵃ / χ²₂.

Note: N/A = Not applicable.
ᵃ We report the distribution of the F-statistic used by the MIXED command in SPSS, with the Satterthwaite approximation used to compute the denominator degrees of freedom.

The null and alternative versions of the D matrix for Hypothesis 5.1 are:

$$
H_0: D = \begin{pmatrix} \sigma^2_{\mathrm{int}} & 0 \\ 0 & 0 \end{pmatrix} \qquad
H_A: D = \begin{pmatrix} \sigma^2_{\mathrm{int}} & \sigma_{\mathrm{int,treat}} \\ \sigma_{\mathrm{int,treat}} & \sigma^2_{\mathrm{treat}} \end{pmatrix}
$$

We use a REML-based likelihood ratio test for Hypothesis 5.1. The test statistic is calculated by subtracting the –2 REML log-likelihood value for Model 5.2 (the reference model) from that for Model 5.1 (the nested model). The asymptotic null distribution of the test statistic is a mixture of χ²₁ and χ²₂ distributions, with equal weights of 0.5 (see Subsection 2.6.2.2). Note that likelihood ratio tests, such as the one used for Hypothesis 5.1, rely on asymptotic (large-sample) theory, so we would not usually carry out this type of test for such a small data set (five rats). Rather, in practice, the random effects would probably be retained without testing, so that the appropriate marginal variance-covariance structure would be obtained for the data set. We present the calculation of this likelihood ratio test for the random effects (and those that follow in this chapter) strictly for illustrative purposes.

Hypothesis 5.2. The variance of the residuals is constant across both treatments. The null and alternative hypotheses are

$$
H_0: \sigma^2_{\mathrm{basal}} = \sigma^2_{\mathrm{carb}} \qquad
H_A: \sigma^2_{\mathrm{basal}} \neq \sigma^2_{\mathrm{carb}}
$$

We test Hypothesis 5.2 using a REML-based likelihood ratio test. The test statistic is calculated by subtracting the –2 REML log-likelihood value for Model 5.3 (the reference model with heterogeneous residual variances) from that for Model 5.2 (the nested model). The asymptotic distribution of the test statistic under the null hypothesis is a χ²₁ distribution, and not a mixture of χ² distributions, as in the case of Hypothesis 5.1.
This is because we are testing whether two variances are equal, which does not involve testing any parameter values that are on the boundary of a parameter space. The single degree of freedom corresponds to the one additional variance parameter in Model 5.3. Hypothesis 5.3. The fixed effects associated with the region by treatment interaction can be omitted from Model 5.2. The null and alternative hypotheses are H0 : β4 = β5 = 0 HA : β4 = 0 or β5 = 0 We test Hypothesis 5.3 using F -tests in the software procedures that provide them (SAS, SPSS, and R), based on REML estimation of the parameters in Model 5.2. By default, the procedures in SAS and SPSS both calculate Type III F -tests, whereas R provides a Type I F -test only. In this case, because the interaction term is the last one added to the model, the Type III and Type I F -tests for the fixed interaction effects are comparable. We consider Wald chi-square tests for the fixed interaction effects in Stata and HLM. For the results of these hypothesis tests, see Section 5.5. 5.4 Analysis Steps in the Software Procedures The modeling results for all software procedures are presented and compared in Section 5.6. Models for Repeated-Measures Data:The Rat Brain Example 5.4.1 213 SAS We first read in the raw data from the tab-delimited file, which we assume is stored in the C:\temp folder. Note that the data actually begin on the second row of the file, so we use the firstobs=2 option in the infile statement. We also use the dlm="09"X option to specify that the raw data file is tab-delimited. In addition, we create an indicator variable TREAT for the carbachol treatment: data ratbrain; infile "c:\temp\rat brain.dat" firstobs=2 dlm="09"X; input animal $ treatment region activate; if treatment = 1 then treat = 0; if treatment = 2 then treat = 1; run; We now proceed with the model-fitting steps. Step 1: Fit a model with a “loaded” mean structure (Model 5.1). The SAS syntax to fit Model 5.1 using proc mixed is as follows: title "Model 5.1"; proc mixed data = ratbrain covtest; class region animal; model activate = region treat region*treat / solution; random int / subject = animal type = vc solution v vcorr; run; We have specified the covtest option in the proc mixed statement to obtain the standard errors of the estimated variance components in the output for comparison with the other software procedures. This option also causes SAS to display a Wald test for the variance of the random effects associated with the animals, which we do not recommend for use in testing whether to include random effects in a model (see Subsection 2.6.3.2). The class statement identifies the categorical variables that are required to specify the model. We include the fixed factor, REGION, and the random factor, ANIMAL, in the class statement. The variable TREAT is an indicator variable with a value of one for carbachol and zero for basal treatment, and therefore does not need to be included in the class statement. The model statement sets up the fixed-effects portion of Model 5.1. We specify that the dependent variable, ACTIVATE, is a linear function of the fixed effects associated with the REGION factor, the TREAT indicator, and the REGION × TREAT interaction. The solution option requests that the estimate of each fixed-effect parameter be displayed in the output, along with its standard error and a t-test for the parameter. 
Software Note: Because REGION is included in the class statement, SAS creates three indicator variables for REGION in the model: the first variable is for the BST region (REGION = l), the second is for the LS region (REGION = 2), and the third is for the VDB region (REGION = 3). The solution option in the model statement sets the fixed-effect parameter for the highest level of REGION (the VDB region) to zero, and consequently, VDB becomes the reference category. 214 Linear Mixed Models: A Practical Guide Using Statistical Software The random statement sets up the random-effects structure for the model. In this case, ANIMAL is identified as the subject, indicating that it is a random factor. By specifying random int, we include a random effect associated with the intercept for each animal. We have included several options after the slash (/) in the random statement. The type= option defines the structure of the D covariance matrix for random effects. Because there is only one random effect specified in Model 5.1, the D matrix is 1 × 1, and defining its structure is not necessary. Although proc mixed uses a variance components structure by default, we specify it explicitly using the type=vc option. The type option becomes more important later when we specify models with two random effects, one associated with the intercept and another associated with the effect of treatment (Models 5.2 and 5.3). We have also requested that the estimated marginal covariance matrix Vi and the estimated marginal correlation matrix be displayed in the output for the first animal, by specifying the v and vcorr options in the random statement. The solution option in the random statement requests that the predicted random effect (i.e., the EBLUP) associated with each animal also be displayed in the output. Step 2: Select a structure for the random effects (Model 5.1 vs. Model 5.2). We now fit Model 5.2 by including an additional random effect associated with the treatment (carbachol vs. basal) for each animal. title "Model 5.2"; proc mixed data = ratbrain covtest; class animal region; model activate = region treat region*treat / solution; random int treat / subject = animal type = un solution v vcorr; run; The random statement from Model 5.1 has been updated to include TREAT. The type = option has also been changed to type = un, which specifies an “unstructured” covariance structure for the 2 × 2 D matrix. The SAS syntax for an alternative random effects specification of Model 5.2 is as follows: title "Alternative Random Effects Specification for Model 5.2"; proc mixed data = ratbrain covtest; class region animal treatment; model activate = region treatment region*treatment / solution; random treatment / subject=animal type=un solution v vcorr; lsmeans region*treatment / slice=region; run; Because we include the original TREATMENT variable in the class statement, SAS generates two dummy variables that will be used in the Zi matrix, one for carbachol and the other for basal treatment. This has the effect of requesting two random intercepts, one for each level of treatment. The first syntax for Model 5.2, in which a dummy variable, TREAT, was used for treatment, was specified to facilitate comparisons across software procedures. Specifically, the HLM2 procedure cannot directly accommodate categorical variables as predictors, unless they are specified as dummy variables. 
We also include an lsmeans statement for the region*treatment interaction term to create a comparison of the effects of treatment within each region (/slice=region). We do not display these results generated by the lsmeans statement. Similar results obtained using SPSS are presented in Section 5.7. Models for Repeated-Measures Data:The Rat Brain Example 215 To test Hypothesis 5.1, we use a REML-based likelihood ratio test. See the SAS section in Chapter 3 (Subsection 3.4.1) for two different versions of the syntax to carry out a likelihood ratio test for random effects, and Subsection 5.5.1 for the results of the test. Step 3: Select a covariance structure for the residuals (Model 5.2 vs. Model 5.3). To fit Model 5.3, which has heterogeneous residual variances for the two levels of TREATMENT, we add a repeated statement to the syntax used for Model 5.2 and include the original TREATMENT variable in the class statement: title "Model 5.3"; proc mixed data = ratbrain; class animal region treatment; model activate = region treat region*treat / solution; random int treat / subject = animal type = un solution v vcorr; repeated region / subject = animal*treatment group = treatment; run; The subject = animal*treatment option in the repeated statement defines two 3 × 3 blocks on the diagonal of the 6 × 6 Ri matrix, with each block corresponding to a level of treatment (see Subsection 5.3.2.1). Because the type= option is omitted, the default “variance component” structure for each block is used. The group = treatment option specifies that the variance components defining the blocks of the Ri matrix be allowed to vary for each level of TREATMENT. We carry out a likelihood ratio test of Hypothesis 5.2 to decide whether to retain the heterogeneous residual variance structure for the two treatments in Model 5.3. SAS syntax for this test is not shown here; refer to Subsection 3.4.1 for examples of relevant SAS syntax. Based on the nonsignificant result of this test (p = 0.66), we decide not to include the heterogeneous residual variances, and keep Model 5.2 as our preferred model at this stage of the analysis. Step 4: Reduce the model by removing nonsignificant fixed effects (Model 5.2). The result of the F -test for Hypothesis 5.3 in Model 5.2 is reported in Sections 5.5 and 5.6. The fixed effects associated with the REGION × TREAT interaction are significant (p < 0.001), so we retain these fixed effects and keep Model 5.2 as our final model. 5.4.2 SPSS We assume that the data set created to carry out the data summary in Subsection 5.2.2 is currently open in SPSS. Prior to fitting any models we generate TREAT, an indicator variable for the carbachol treatment: IF (treatment = 1) treat = 0. IF (treatment = 2) treat = 1. EXECUTE . Step 1: Fit a model with a “loaded” mean structure (Model 5.1). We begin the SPSS analysis by setting up the syntax to fit Model 5.1 using the MIXED command: 216 Linear Mixed Models: A Practical Guide Using Statistical Software * Model 5.1. MIXED activate BY region WITH treat /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(l) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = region treat region*treat | SSTYPE(3) /METHOD = REML /PRINT = SOLUTION /RANDOM INTERCEPT | SUBJECT(animal) COVTYPE(VC). In this syntax, ACTIVATE is listed as the dependent variable. The variable REGION is specified as a categorical factor because it appears after the BY keyword. 
This causes SPSS to generate the appropriate indicator variables for the REGION factor and for any interactions involving REGION. The variable TREAT is specified as a covariate because it appears after the WITH keyword. The /FIXED subcommand lists the variables and interactions that have associated fixed effects in Model 5.1. These terms include REGION, TREAT, and the REGION × TREAT interaction. The /METHOD subcommand specifies the REML estimation method, which is the default. The /PRINT subcommand requests that the SOLUTION for the estimated fixed-effect parameters in the model be displayed in the output. The /RANDOM subcommand indicates that the only random effect for each level of the SUBJECT variable (ANIMAL) is associated with the INTERCEPT. The covariance structure for the random effects is specified as variance components (the default), using the COVTYPE(VC) syntax. Because there is only a single random effect for each animal included in Model 5.1, there is no need to use a different covariance structure (D is a 1 × 1 matrix). Step 2: Select a structure for the random effects (Model 5.1 vs. Model 5.2). We now fit Model 5.2, which adds a random effect associated with treatment for each animal. This allows the effect of the carbachol treatment (vs. basal) to vary from animal to animal. * Model 5.2. MIXED activate BY region WITH treat /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(l) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = region treat region*treat | SSTYPE(3) /METHOD = REML /PRINT = SOLUTION G /RANDOM INTERCEPT treat | SUBJECT(animal) COVTYPE(UN). The /RANDOM subcommand has been updated by adding TREAT, which adds a random treatment effect to the model for each animal. The option COVTYPE(UN) specifies that the 2 × 2 D matrix defined in Subsection 5.3.2.1 for Model 5.2 is “unstructured.” We request that the estimated D matrix be displayed in the output by including the G option in the /PRINT subcommand. Note that the G option requests that a single block of the estimated G matrix (defined in Subsection 2.2.3) be displayed in the output. To test Hypothesis 5.1, and determine whether the random treatment effects can be omitted from Model 5.2, we perform a likelihood ratio test. We decide to retain the random Models for Repeated-Measures Data:The Rat Brain Example 217 treatment effects as a result of this significant test (see Subsection 5.5.1) and keep Model 5.2 as the preferred one at this stage of the analysis. Step 3: Select a covariance structure for the residuals (Model 5.2 vs. Model 5.3). In this step of the analysis we fit Model 5.3, in which the residual variances in the Ri matrix are allowed to vary for the different treatments. * Model 5.3. MIXED activate BY region WITH treat /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = region treat region*treat | SSTYPE(3) /METHOD = REML /PRINT = SOLUTION /RANDOM INTERCEPT treat | SUBJECT(animal) COVTYPE(UN) /REPEATED = treatment | SUBJECT(animal*region) COVTYPE(DIAG). The /REPEATED subcommand specifies that repeated measures, uniquely indexed by levels of the TREATMENT variable, are collected for each combination of levels of the ANIMAL and REGION variables, by including the SUBJECT(ANIMAL*REGION) option. Note that this specification of the /REPEATED subcommand is different from that used for proc mixed in SAS. 
This setup is required in SPSS to model heterogenous residual variances for each level of TREATMENT via the DIAG covariance structure. A diagonal covariance structure for the Ri matrix is specified by COVTYPE(DIAG). The DIAG covariance type in SPSS means that there are heterogeneous variances for each level of TREATMENT on the diagonal of the 6 × 6 Ri matrix and that the residuals are not correlated (see Subsection 5.3.2.1). This specification allows observations at different levels of treatment on the same animal to have different residual variances. After fitting Model 5.3, we test Hypothesis 5.2 using a likelihood ratio test to decide whether we should retain the heterogeneous residual variance structure. Based on the nonsignificant result of this test, we keep the simpler model (Model 5.2) at this stage of the analysis as our preferred model. See Subsection 5.5.2 for the result of this likelihood ratio test. Step 4: Reduce the model by removing nonsignificant fixed effects (Model 5.2). We use a Type III F -test to test Hypothesis 5.3 to decide if we wish to retain the fixed effects associated with the REGION × TREAT interaction in Model 5.2. Because this test (shown in the SPSS output for Model 5.2) is significant, we retain these fixed effects and select Model 5.2 is our final model (see Table 5.8). We refit Model 5.2 in SPSS, and add syntax to generate pairwise comparisons of the means at each brain region for each treatment to aid interpretation of the significant region by treatment interaction: * Model 5.2 (w/ interaction means). MIXED activate BY region WITH treat /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) 218 Linear Mixed Models: A Practical Guide Using Statistical Software LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = region treat region*treat | SSTYPE(3) /METHOD = REML /PRINT = SOLUTION G /RANDOM INTERCEPT treat | SUBJECT(animal) COVTYPE(UN) /EMMEANS = TABLES(region) WITH(treat=1) COMPARE ADJ(BON) /EMMEANS = TABLES(region) WITH(treat=0) COMPARE ADJ(BON). Note that there are two instances of the /EMMEANS subcommand in the preceding syntax. These subcommands request pairwise comparisons of the estimated marginal activation means for each brain region, first for the carbachol treatment (treat=l) and then for the basal treatment (treat=0). A Bonferroni adjustment for multiple comparisons is requested with the ADJ(BON) option. We consider diagnostics for Model 5.2 using SPSS in Section 5.9. 5.4.3 R We begin the analysis of the Rat Brain data using R by reading the tab-delimited raw data set (having the “long” structure described in Table 5.3 and variable names in the first row of the raw data) into a data frame object. Recall that the h = T option tells R that the raw data file has a header (first row) containing variable names. > rat.brain <- read.table("C:\\temp\\rat brain.dat", h = T) We then attach the vectors (or variables) in the rat.brain data frame object to R’s working memory: > attach(rat.brain) Because R by default treats the lowest category (alphabetically or numerically) of a categorical fixed factor as the reference category in a model, we recode the REGION and TREATMENT variables to obtain results consistent with those in SAS, SPSS, and Stata. We first create a new factor REGION.F, which has VDB (REGION = 3) as the lowest value (equal to zero). We then create TREAT, which is an indicator variable for the carbachol treatment (TREAT = 1 for carbachol, TREAT = 0 for basal). 
> > > > > > > > region.f <- region region.f[region == 1] <- 1 region.f[region == 2] <- 2 region.f[region == 3] <- 0 region.f <- factor(region.f) treat <- treatment treat[treatment == 1] <- 0 treat[treatment == 2] <- 1 We add these new recoded variables to the rat.brain data frame object: > rat.brain <- data.frame(rat.brain, region.f, treat) Now that the appropriate variables in the Rat Brain data set have been attached to memory, we can analyze the data using the available functions in the nlme and lme4 packages. Models for Repeated-Measures Data:The Rat Brain Example 5.4.3.1 219 Analysis Using the lme() Function The nlme package first needs to be loaded, so that the lme() function can be used in the analysis: > library(nlme) Step 1: Fit a model with a “loaded” mean structure (Model 5.1). Model 5.1 is fitted to the data using the lme() function: > # Model 5.1. > model5.1.fit <- lme(activate ~ region.f*treat, random = ~1 | animal, method = "REML", data = rat.brain) We describe each part of this specification of the lme() function: • model5.1.fit is the name of the object that contains the results of the fitted linear mixed model. • The first argument of the function, activate ~ region.f*treat, is the model formula, which defines the response variable (activate), and the terms with associated fixed effects in the model (region.f and treat). The asterisk (*) requests that the main effects associated with each variable (the REGION factor and the TREAT indicator) be included in the model, in addition to the fixed effects associated with the interaction between the variables. • The second argument of the function, random = ~1 | animal, includes a random effect associated with the intercept (~1) for each level of the categorical random factor (animal). • The third argument of the function, method = "REML", tells R that REML estimation should be used for the desired covariance parameters in the model. This is the default estimation method for the lme() function. • The final argument of the function, data = rat.brain, indicates the name of the data frame object to be used. After the function is executed, estimates from the model fit can be obtained using the summary() function: > summary(model5.1.fit) Additional results of interest for this LMM fit can be obtained by using other functions in conjunction with the model5.1.fit object. For example, one can look at Type I (sequential) F -tests for the fixed effects in this model using the anova() function: > anova(model5.1.fit) Step 2: Select a structure for the random effects (Model 5.1 vs. Model 5.2). We now fit Model 5.2 by updating Model 5.1 to include a second animal-specific random effect associated with the treatment (carbachol vs. basal): > # Model 5.2. > model5.2.fit <- update(model5.1.fit, random = ~ treat | animal) 220 Linear Mixed Models: A Practical Guide Using Statistical Software Note that we use the update() function to specify Model 5.2, by modifying the random effects specification in Model 5.1 to include random TREAT effects associated with each animal. The random effects associated with the intercept for each animal will be included in the model by default, and an “unstructured” covariance structure for the D matrix will also be used by default (see Subsection 5.3.2.1 for specification of the D matrix for Models 5.2 and 5.3). 
We use the summary() function to display results generated by fitting Model 5.2: > summary(model5.2.fit) We also use the anova() function to obtain Type I F -tests of the fixed effects in Model 5.2: > anova(model5.2.fit) We test Hypothesis 5.1 to decide if we need the random treatment effects, using a likelihood ratio test. This would typically not be done with such a small sample of animals (given the asymptotic nature of likelihood ratio tests), but we perform this test for illustrative purposes. The test statistic is calculated by subtracting the –2 REML log-likelihood value for Model 5.2 (the reference model) from that for Model 5.1 (the value of the test statistic is 275.3 − 249.2 = 26.1). The –2 REML log-likelihood values can be obtained from the output provided by the summary() function for each model. The test statistic has a null distribution that is a mixture of χ21 and χ22 distributions with equal weights of 0.5, so the anova() function cannot be used for the p-value. Instead, we calculate a p-value for the test statistic as follows: > 0.5*(1 - pchisq(26.1,1)) + 0.5*(1 - pchisq(26.1,2)) [1] 1.237138e-06 See Subsection 5.5.1 for details. The test statistic is significant (p < 0.001), so we decide to reject the null hypothesis and retain the random treatment effects in the model; Model 5.2 is our preferred model at this stage of the analysis. Step 3: Select a covariance structure for the residuals (Model 5.2 vs. Model 5.3). We now fit Model 5.3, which has a heterogeneous variance structure for the Ri matrix (i.e., it allows the residual variances to differ for the two levels of TREAT). This is accomplished by using the weights argument, as shown in the following syntax: > # Model 5.3. > model5.3.fit <- lme(activate ~ region.f*treat, random = ~ treat | animal, weights = varIdent(form = ~ 1 | treat), data = rat.brain) We use the summary() function to obtain estimates of the parameters in Model 5.3: > summary(model5.3.fit) We now test Hypothesis 5.2 with a likelihood ratio test, to decide if we wish to retain the heterogeneous residual variance structure. To calculate the test statistic, we subtract the –2 REML log-likelihood of Model 5.3 (the reference model, with heterogeneous residual variances) from that for Model 5.2 (the nested model), using the anova() function: Models for Repeated-Measures Data:The Rat Brain Example 221 > # Likelihood ratio test for Hypothesis 5.2. > anova(model5.2.fit, model5.3.fit) The result of this test is not significant (p = 0.66), so we keep Model 5.2 as our preferred model at this stage of the analysis (see Subsection 5.5.2). Step 4: Reduce the model by removing nonsignificant fixed effects (Model 5.2). The Type I F -tests reported for the fixed effects in Model 5.2 (see Section 5.6) indicate that the fixed effects associated with the REGION × TREAT interaction are significant (p < 0.05), so we reject the null hypothesis for Hypothesis 5.3. We therefore retain these fixed effects, and select Model 5.2 as our final model. Software Note: The Type I F -test in R for the fixed effects associated with the REGION × TREAT interaction is comparable to the Type III F -tests performed in SAS and SPSS because the interaction is added last to the model, and the test is therefore conditional on the main fixed effects of REGION and TREAT (which is also the case for the Type III F -tests reported by SAS and SPSS). 
5.4.3.2 Analysis Using the lmer() Function The lme4 package first needs to be loaded, so that the lmer() function can be used in the analysis: > library(lme4) Step 1: Fit a model with a “loaded” mean structure (Model 5.1). Model 5.1 is fitted to the data using the lmer() function: > # Model 5.1. > model5.1.fit.lmer <- lmer(activate ~ region.f*treat + (1|animal), data = rat.brain, REML = T) We describe each part of this specification of the lmer() function: • model5.1.fit.lmer is the name of the object that contains the results of the fitted linear mixed model. • The first argument of the function, activate ~ region.f*treat + (1|animal), is the model formula, which defines the response variable (activate), and the terms with associated fixed effects in the model (region.f and treat). The asterisk (*) requests that the main effects associated with each variable (the REGION factor and the TREAT indicator) be included in the model, in addition to the fixed effects associated with the interaction between the variables. • The (1|animal) term in the model formula indicates that a random effect associated with the intercept should be included for each level of the categorical random factor animal. • The third argument of the function, data = rat.brain, indicates the name of the data frame object to be used. • The final argument of the function, REML = T, tells R that REML estimation should be used for the desired covariance parameters in the model. This is the default estimation method for the lmer() function. 222 Linear Mixed Models: A Practical Guide Using Statistical Software After the function is executed, estimates from the model fit can be obtained using the summary() function: > summary(model5.1.fit.lmer) Additional results of interest for this LMM fit can be obtained by using other functions in conjunction with the model5.1.fit.lmer object. For example, one can look at Type I (sequential) F -tests for the fixed effects in this model using the anova() function: > anova(model5.1.fit.lmer) Software Note: Applying the summary() and anova() functions to model fit objects produced by the lme4 package version of the lmer() function does not result in pvalues for the computed t-statistics and F -statistics. This is primarily due to the lack of consensus in the literature over appropriate degrees of freedom for these test statistics under the null hypothesis. In general, we recommend use of the lmerTest package in R for users interested in testing hypotheses about parameters estimated using the lmer() function. In this chapter, we illustrate likelihood ratio tests using selected functions available in the lme4 package. See Chapter 3 for an example of using the lmerTest package, in the case of a random intercept model. EBLUPs of the random effects for each animal can also be displayed: > ranef(model5.1.fit.lmer) Step 2: Select a structure for the random effects (Model 5.1 vs. Model 5.2). We now fit Model 5.2 by updating Model 5.1 to include a second animal-specific random effect associated with the treatment (carbachol vs. basal): > # Model 5.2. > model5.2.fit.lmer <- lmer(activate ~ region.f*treat + (treat|animal), data = rat.brain, REML = T) Note that we modify the random-effects specification in Model 5.1 to include random TREAT effects associated with each animal. 
The random effects associated with the intercept for each animal will be included in the model by default, and an “unstructured” covariance structure for the D matrix will also be used by default (see Subsection 5.3.2.1 for specification of the D matrix for Models 5.2 and 5.3). We use the summary() function to display results generated by fitting Model 5.2: > summary(model5.2.fit.lmer) We next test Hypothesis 5.1 to decide if we need the random treatment effects, using a likelihood ratio test. This would typically not be done with such a small sample of animals (given the asymptotic nature of likelihood ratio tests), but we perform this test for illustrative purposes. The test statistic is calculated by subtracting the –2 REML log-likelihood value for Model 5.2 (the reference model) from that for Model 5.1 (the value of the test statistic is 275.3 − 249.2 = 26.1). The –2 REML log-likelihood values can be obtained from the output provided by the summary() function for each model. The test statistic has a null distribution that is a mixture of χ21 and χ22 distributions with equal weights of 0.5, so the anova() function cannot be used for the p-value. Instead, we calculate a p-value for the test statistic as follows: Models for Repeated-Measures Data:The Rat Brain Example 223 > 0.5*(1 - pchisq(26.1,1)) + 0.5*(1 - pchisq(26.1,2)) [1] 1.237138e-06 See Subsection 5.5.1 for details. The test statistic is significant (p < 0.001), so we decide to reject the null hypothesis and retain the random treatment effects in the model; Model 5.2 is our preferred model at this stage of the analysis. Step 3: Select a covariance structure for the residuals (Model 5.2 vs. Model 5.3). Unfortunately, we cannot fit models with heterogeneous residual variances when using the current implementation of the lmer() function. See Section 5.4.3.1 for an example of how this model can be fitted using the lme() function, and the results of the corresponding hypothesis tests. Step 4: Reduce the model by removing nonsignificant fixed effects (Model 5.2). The Type I F -tests reported for the fixed effects in Model 5.2 (see Section 5.6) suggest that the fixed effects associated with the REGION × TREAT interaction are significant (given the large F -statistics, despite the absence of computed p-values), so we reject the null hypothesis for Hypothesis 5.3. We remind readers that ML-based likelihood ratio tests could also be performed at this point to test fixed effects for significance (given larger sample sizes), by fitting a model without the interaction using ML estimation (using the REML = F argument in lmer()), and then using the anova() function to compare the two models (see, for example, Section 4.4.3.2). We therefore retain these fixed effects, and select Model 5.2 as our final model. 5.4.4 Stata We begin the analysis in Stata by reading the tab-delimited raw data file (having the “long” structure) into Stata’s working memory from the C:\temp folder: . insheet using "C:\temp\rat brain.dat", tab Users of web-aware Stata can also import the data directly from the book’s web page: . insheet using http://www-personal.umich.edu/~bwest/rat brain.dat Next, we generate an indicator variable (TREAT) for the carbachol treatment: . gen treat = 0 if treatment == 1 . replace treat = 1 if treatment == 2 We now proceed with the analysis steps. Step 1: Fit a model with a “loaded” mean structure (Model 5.1). We first fit Model 5.1 using the mixed command. 
Because Stata by default treats the lowestvalued level (alphabetically or numerically) of a categorical factor as the reference category (i.e., REGION = l, or the BST region), we explicitly declare the value 3 for REGION to be the reference region, using the ib3. modifier as indicated below: . * Model 5.1. . mixed activate ib3.region treat ib3.region#c.treat || animal:, covariance(identity) variance reml 224 Linear Mixed Models: A Practical Guide Using Statistical Software The mixed command has three parts. The first part specifies the fixed effects; the second part, the random effects; and the third part (after a comma), the covariance structure for the random effects, together with miscellaneous options. We describe each portion of the mixed syntax for Model 5.1 in the following paragraphs. The first variable listed after the mixed command is the continuous dependent variable, ACTIVATE. In this particular command, we then include fixed effects associated with values 1 and 2 of the categorical factor REGION (using ib3.region), a fixed effect associated with value 1 of TREAT, and fixed effects associated with the interaction between REGION and TREAT (using ib3.region#c.treat). We note the use of c.treat, which more generally indicates that TREAT is a “continuous” variable in this specification of the interaction (we did not indicate factor variable coding for the TREAT effect included in the model). Although the indicator variable for TREAT is of course not strictly continuous, this simplifies the specification of the random effects in Model 5.2, which we will explain shortly. After the dependent variable and the fixed effects have been identified, two vertical bars (||) precede the specification of the random effects in the model. We list the ANIMAL variable (animal:) as the variable that defines clusters of observations. Because no additional variables are listed after the colon, there will only be a single random effect in the model, associated with the intercept for each animal. The covariance structure for the random effects is specified after the comma following the random effects. The covariance(identity) option tells Stata that an identity covariance structure is to be used for the single random effect associated with the intercept in Model 5.1 (this option is not necessary for models that only include a single random effect, because the D matrix will only have a single variance component). Finally, the variance option requests that the estimated variances of the random animal effects and the residuals be displayed in the output (rather than the default estimated standard deviations), and the reml option requests REML estimation (as opposed to ML estimation, which is the default). Information criteria associated with the model fit (i.e., the AIC and BIC statistics) can be obtained by using the following command: . estat ic Once a model has been fitted using the mixed command, EBLUPs of the random effects associated with the levels of the random factor (ANIMAL) can be saved in a new variable (named EBLUPS) using the following command: . predict eblups, reffects The saved EBLUPs can then be used to check for random animal effects that appear to be outliers. Step 2: Select a structure for the random effects (Model 5.1 vs. Model 5.2). We now fit Model 5.2, including a second random effect associated with each animal that allows the animals to have unique treatment effects (carbachol vs. basal): . * Model 5.2. . 
mixed activate ib3.region treat ib3.region#c.treat, || animal: treat, covariance(unstruct) variance reml The random-effects portion of the model specified after the two vertical bars (||) has been changed. We have added the TREAT variable after the colon (animal: treat) to Models for Repeated-Measures Data:The Rat Brain Example 225 indicate that there will be an additional random effect in the model for each animal, associated with the effect of the carbachol treatment. Stata includes a random effect associated with the intercept for each level of ANIMAL by default. The covariance structure of the random effects has also been changed to covariance(unstruct). This will fit a model with an unstructured D matrix, as defined in Subsection 5.3.2.1. Alternatively, we could have used the following command to try and fit Model 5.2 (and also Model 5.1, minus the random treatment effects): . * Model 5.2. . mixed activate ib3.region ib0.treat ib3.region#ib0.treat, || animal: R.treat, covariance(unstruct) variance reml Note in this command that we identify treat as a categorical factor, with 0 as the reference category, when including the fixed effect of TREAT in addition to the interaction term. Because we have specified TREAT as a categorical factor, and we wish to allow the TREAT effects to randomly vary across levels of ANIMAL, we need to use R.treat in the random effects specification (use of ib0.treat after the colon would produce an error message). Unfortunately, when random effects associated with the nonreference levels of a categorical factor are included in a model, Stata does not allow the user to specify an unstructured covariance structure (most likely to prevent the estimation of too many covariance parameters for the random effects of categorical factors with several levels). This is the reason that we decided to specify the TREAT indicator as if it were a “continuous” predictor, which is perfectly valid for binary predictor variables. After fitting Model 5.2 (using the first command above), Stata automatically displays the following output, which is a likelihood ratio test for the two random effects in the model: LR test vs. linear regression: chi2(3) = 42.07 Prob > chi2 = 0.0000 Note: LR test is conservative and provided only for reference This likelihood ratio test is a conservative test of the need for both random effects associated with each animal in Model 5.2. The test statistic is calculated by subtracting the –2 REML log-likelihood value for Model 5.2 from that for a linear regression model with no random effects. The p-value for the test is obtained by referring the test statistic to a χ2 distribution with 3 degrees of freedom, displayed as chi2(3), because the reference model contains 3 more covariance parameters than the nested model. Software Note: Stata uses a standard likelihood ratio test, which is a conservative approach. The appropriate theory for testing a model with multiple random effects (e.g., Model 5.2) vs. a nested model without any random effects (i.e., an ordinary linear regression model) has not yet been developed. Stata users can click on the link LR test is conservative in the Stata output for a detailed explanation, including references. We do not consider this test in any of the other software procedures. We can now perform a likelihood ratio test of Hypothesis 5.1, to decide if we need the random treatment effects in Model 5.2, by subtracting the –2 REML log-likelihood for Model 5.2 (the reference model) from that for Model 5.1 (the nested model). 
The asymptotic null distribution of the test statistic is a mixture of χ21 and χ22 distributions with equal weights of 0.5. Because the test is significant at α = 0.05, we choose Model 5.2 as our preferred model at this stage of the analysis (see Subsection 5.5.1 for a discussion of this test and its results). 226 Linear Mixed Models: A Practical Guide Using Statistical Software Step 3: Select a covariance structure for the residuals (Model 5.2 vs. Model 5.3). We now fit Model 5.3, allowing the variance of the residuals to vary depending on the level of the TREAT variable: . mixed activate ib3.region treat ib3.region#c.treat, || animal: treat, covariance(unstruct) residuals(independent, by(treat)) variance reml Note the addition of residuals(independent, by(treat)) in the command above. This option indicates that the residuals remain independent, but that the residual variance should be allowed to vary across levels of the TREAT variable. In preparation for testing Hypothesis 5.2 and the need for heterogeneous residual variance, we save the results from this model fit in a new object: . est store model53 We now fit Model 5.2 once again, save the results from that model fit in a separate object, and perform the likelihood ratio test using the lrtest command (where Model 5.2 is the nested model): . mixed activate ib3.region treat ib3.region#c.treat, || animal: treat, covariance(unstruct) variance reml . est store model52 . lrtest model53 model52 The resulting likelihood ratio test statistic suggests that the null hypothesis (i.e., the nested model with constant residual variance across the treatment groups) should not be rejected, and we therefore proceed with Model 5.2. Step 4: Reduce the model by removing nonsignificant fixed effects (Model 5.2). We now test Hypothesis 5.3 to decide whether we need to retain the fixed effects associated with the REGION × TREAT interaction in Model 5.2. To do this, we first refit Model 5.2, and then use the test command to generate a Wald χ2 test: . mixed activate ib3.region treat ib3.region#c.treat, || animal: treat, covariance(unstruct) variance reml . test 1.region#c.treat 2.region#c.treat The two terms following the test command are the indicator variables generated by Stata to represent the REGION × TREAT interaction (the products of the indicator variables for the nonreference categories of each factor). These product terms are listed in the output displayed for the estimated fixed effects, where the values of REGION associated with the coefficients for the interaction term (1 and 2) indicate the values that should be used in the specification of the test statement above. The result of this test is significant (p < 0.05), so we retain the fixed interaction effects and select Model 5.2 as our final model. 5.4.5 5.4.5.1 HLM Data Set Preparation To perform the analysis of the Rat Brain data using the HLM software, two separate data sets need to be prepared: Models for Repeated-Measures Data:The Rat Brain Example 227 1. The Level 1 (repeated-measures) data set: Each row in this data set corresponds to an observation for a given level of REGION and TREATMENT on a given rat. This data set is similar in structure to the data set displayed in Table 5.3. The Level 1 data set includes ANIMAL, the REGION variable, the TREATMENT variable, and the response variable, ACTIVATE. This data set should be sorted by ANIMAL, TREATMENT, and REGION. 2. The Level 2 (rat-level) data set: This contains a single observation for each level of ANIMAL. 
The variables in this file represent measures that remain constant for a given rat. HLM requires that the Level 2 data set must include ANIMAL and at least one other variable measured at the rat level for the purpose of generating the MDM file. In this example, we do not have any covariates that remain constant for a given rat, so we include a variable (NOBS) representing the number of repeated measures for each rat (six). This data set should be sorted by ANIMAL. To fit the LMMs used for this example in HLM, we need to create several new indicator variables and add them to the Level 1 data set. First, we need to create an indicator variable, TREAT, for the carbachol treatment. We also need to add two indicator variables representing the nonreference levels of REGION and two variables representing the interaction between REGION and TREAT. These new variables need to be created in the software package used to create the input data files prior to importing the data into HLM. We import the Level 1 and Level 2 data sets from SPSS to HLM for this example and show the SPSS syntax to create these five variables in the following Level 1 file: IF (treatment = 1) treat = 0 . IF (treatment = 2) treat = 1 . EXECUTE . COMPUTE region1 = (region = 1) . COMPUTE region2 = (region = 2) . EXECUTE . COMPUTE reg1 tre = region1 * treat . COMPUTE reg2 tre = region2 * treat . EXECUTE . After these two data sets have been created, we can proceed to create the multivariate data matrix (MDM), and fit Model 5.1. 5.4.5.2 Preparing the MDM File In the main HLM menu, click File, Make new MDM file and then Stat Package Input. In the window that opens, select HLM2 to fit a two-level hierarchical linear model, and click OK. Select the Input File Type as SPSS/Windows. To prepare the MDM file for Model 5.1, locate the Level 1 Specification area, and Browse to the location of the Level 1 data set. Click Open after selecting the Level 1 SPSS file, click the Choose Variables button, and select the following variables: ANIMAL (click “ID” for the ANIMAL variable, because this variable identifies the Level 2 units), REGION1, REGION2, TREAT, REG1 TRT, and REG2 TRT (click “in MDM” for each of these variables). Check “in MDM” for the continuous response variable, ACTIVATE, as well. Click OK when finished. 228 Linear Mixed Models: A Practical Guide Using Statistical Software Next, locate the Level 2 Specification area, and Browse to the location of the Level 2 SPSS data set. Click Open after selecting the file, and click the Choose Variables button to include ANIMAL (click on “ID”) and the variable indicating the number of repeated measures on each animal, NOBS (click on “in MDM”). Software Note: Specifying at least one variable to be included in the MDM file at Level 2 (besides the ID variable) is not optional. This can be any variable, as long as it has nonmissing values for all levels of the ID variable (i.e., ANIMAL). This variable does not need to be included in the analysis and is not part of the analysis for this example. After making these choices, check the longitudinal (occasions within persons) option for this repeated-measures data set (we use this option, although we actually have measures within rats; this selection will not affect the analysis, and only determines the notation that HLM uses to display the model). Select No for Missing Data? in the Level 1 data set, because we do not have any missing data in this analysis. In the upper-right corner of the MDM window, enter a name with a .mdm extension for the MDM file. 
Save the .mdmt template file under a new name (click Save mdmt file), and click Make MDM. After HLM has processed the MDM file, click the Check Stats button to display descriptive statistics for the variables in the Level 1 and Level 2 files (this is not optional). Click Done to begin building Model 5.1. Step 1: Fit a model with a “loaded” mean structure (Model 5.1). In the model-building window, identify ACTIVATE as the Outcome variable. To add more informative subscripts to the models, click File and Preferences and then choose Use level subscripts. To complete the specification of Model 5.1, we first add the effects of the uncentered Level 1 indicator variables TREAT, REGION1, and REGION2 to the model, in addition to the effects of the interaction variables REG1 TRE and REG2 TRE. The Level 1 model is displayed in HLM as follows: Model 5.1: Level 1 Model ACTIVATEti = π0i + π1i (TREATti ) + π2i (REGION1ti ) + π3i (REGION2ti ) + π4i (REG1 TREti ) + π5i (REG2 TREti ) + eti The Level 2 equation for the rat-specific intercept (π0i ) includes a constant fixed effect (β00 ) and a random effect associated with the rat (r0i ), which allows the intercept to vary randomly from rat to rat. The Level 2 equations for the five rat-specific coefficients for TREAT, REGION1, REGION2, REG1 TRE, and REG2 TRE (π1i through π5i respectively) are simply defined as constant fixed effects (β10 through β50 ): Model 5.1: Level 2 Model π0i π1i π2i ... π5i = β00 + r0i = β10 = β20 = β50 We display the overall LMM by clicking the Mixed button in the model building window: Models for Repeated-Measures Data:The Rat Brain Example 229 Model 5.1: Overall Mixed Model ACTIVATEti = β00 + β10 ∗ TREATti + β20 ∗ REGION1ti + β30 ∗ REGION2ti + β40 ∗ REG1 TREti + β50 ∗ REG2 TREti + r0i + eti This model is the same as the general specification of Model 5.1 introduced in Subsection 5.3.2.1, although the notation is somewhat different. The correspondence between the HLM notation and the notation used in Subsection 5.3.2.1 is shown in Table 5.4. After specifying Model 5.1, click Basic Settings to enter a title for this analysis (such as “Rat Brain Data: Model 5.1”) and a name for the output (.html) file that HLM generates when fitting this model. Note that the default outcome variable distribution is Normal (Continuous). Click OK to return to the model-building window, and then click Other Settings and Hypothesis Testing to set up multivariate hypothesis tests for the fixed effects in Model 5.1 (see Subsection 3.4.5 for details). Set up the appropriate general linear hypothesis tests for the fixed effects associated with REGION, TREAT, and the REGION × TREAT interaction. After setting up the desired tests for the fixed effects in the model, return to the modelbuilding window, and click File and Save As to save this model specification in a new .hlm file. Finally, click Run Analysis to fit the model. After the estimation of the parameters in Model 5.1 has finished, click on File and View Output to see the resulting estimates. Step 2: Select a structure for the random effects (Model 5.1 vs. Model 5.2). We now fit Model 5.2, including a second random effect associated with each animal that allows for animal-specific treatment effects (carbachol vs. basal). We include this second random effect by clicking on the shaded r1i term in the Level 2 equation for the rat-specific effect of TREAT. 
Model 5.2: Level 2 Equation for Treatment Effects π1i = β10 + r1i The Level 2 equation for TREAT now implies that the rat-specific effect of TREAT (π1i ) depends on an overall fixed effect (β10 ) and a random effect (r1i ) associated with rat i. After adding the random effect to this equation, click Basic Settings to enter a different title for the analysis and change the name of the associated output file. HLM will again perform the same general linear hypothesis tests for the fixed effects in the model that were specified for Model 5.1. Save the .hlm file under a different name (File, Save As ...), and click Run Analysis to fit the model. Click File and View Output to see the resulting parameter estimates and significance tests when HLM has finished processing the model. To test Hypothesis 5.1 (to decide if we need the random treatment effects), we subtract the –2 REML log-likelihood for Model 5.2 (reported as the deviance in the Model 5.2 output) from that for Model 5.1 and refer the difference to the appropriate mixture of χ2 distributions (see Subsection 5.5.1 for more details). Step 3: Select a covariance structure for the residuals (Model 5.2 vs. Model 5.3). We now fit Model 5.3, which has the same fixed and random effects as Model 5.2 but allows different residual variances for the basal and carbachol treatments. To specify a model with heterogeneous residual variances at Level 1, click Other Settings in the menu of the model-building window, and then click Estimation Settings. 230 Linear Mixed Models: A Practical Guide Using Statistical Software In the estimation settings window, click the Heterogeneous sigma2 button, which allows us to specify variables measured at Level 1 of the data set that define the Level 1 residual variance. In the window that opens, double-click on TREAT to identify it as a predictor of the Level 1 residual variance. Click OK to return to the estimation settings window, and then click OK to return to the model-building window. Note that HLM has added another equation to the Level 1 portion of the model, shown as follows: Model 5.3: Level 1 Model (Heterogeneous Residual Variance) 2 2 Var(eti ) = σti and log(σti ) = α0 + α1 (TREATti ) This equation defines the parameterization of the heterogeneous residual variance at Level 1: two parameters (α0 and α1 ) are estimated, which define the variance as a function of TREAT. Because HLM2 models a log-transformed version of the residual variance, both sides of the preceding equation need to be exponentiated to calculate the estimated residual variance for a given level of treatment. For example, once estimates of α0 and α1 have been computed and displayed in the HLM output, the estimated residual variance for the carbachol treatment can be calculated as exp(α0 + α1 ). We enter a new title and a new output file name for Model 5.3 under Basic Settings, and then save the .hlm file under a different name. Click Run Analysis to fit the new model. HLM divides the resulting output into two sections: a section for Model 5.2 with homogeneous variance and a section for Model 5.3 with heterogeneous residual variance for observations from the carbachol and basal treatments. Software Note: The current version of HLM only allows models with heterogeneous variances for residuals defined by groups at Level 1 of the data. In addition, REML estimation for models with heterogeneous residual variances is not available, so ML estimation is used instead. 
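Because HLM2 uses ML estimation for models with heterogeneous residual variances, readers who wish to compare its results with the other packages can refit Model 5.3 with ML estimation in R as well. The following sketch reuses the variable names from Subsection 5.4.3.1; the object name model5.3.fit.ml is ours:
> # Model 5.3 refitted with ML (rather than REML) estimation in the lme() function,
> # to parallel the ML-based fit produced by HLM2.
> model5.3.fit.ml <- lme(activate ~ region.f*treat, random = ~ treat | animal, weights = varIdent(form = ~ 1 | treat), method = "ML", data = rat.brain)
> summary(model5.3.fit.ml)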
In the output section for Model 5.3, HLM automatically performs a likelihood ratio test of Hypothesis 5.2. The test statistic is obtained by subtracting the –2 ML log-likelihood (i.e., the deviance statistic) of Model 5.3 (the reference model) from that of Model 5.2 (the nested model). This test is a helpful guide to ascertaining whether specifying heterogeneous residual variance improves the fit of the model; however, it is based on ML estimation, which results in biased estimates of variance parameters: ' $ Summary of Model Fit -----------------------------------------------------------------Model Number of Parameters Deviance 1. Homogeneous sigma squared 10 292.72297 2. Heterogeneous sigma squared 11 292.53284 -----------------------------------------------------------------Model Comparison Chi-square df P-value Model 1 vs Model 2 0.19012 1 >.500 & % The resulting likelihood ratio test statistic is not significant when referred to a χ2 distribution with 1 degree of freedom (corresponding to the extra covariance parameter, α1 , in Model 5.3). This nonsignificant result suggests that we should not include heterogeneous residual variances for the two treatments in the model. At this stage, we keep Model 5.2 as our preferred model. Although HLM uses an estimation method (ML) different from the Models for Repeated-Measures Data:The Rat Brain Example 231 procedures in SAS, SPSS, R, and Stata (see Subsection 5.5.2 for more details), the choice of the final model based on the likelihood ratio test is consistent with the other procedures. Step 4: Reduce the model by removing nonsignificant fixed effects (Model 5.2). We return to the output for Model 5.2 to investigate tests for the fixed effects of the region by treatment interaction (Hypothesis 5.3). Specifically, locate the “Results of General Linear Hypothesis Testing” in the HLM output. Investigation of the Wald chi-square tests reported for the REGION by TREATMENT interaction in Model 5.2 indicates that Hypothesis 5.3 should be rejected, because the fixed effects associated with the REGION × TREAT interaction are significant. See Sections 5.5 and 5.6 for more details. 5.5 Results of Hypothesis Tests Table 5.6 presents results of the hypothesis tests carried out in the analysis of the Rat Brain data. The test results reported in this section were calculated based on the analysis in SPSS. 5.5.1 Likelihood Ratio Tests for Random Effects Hypothesis 5.1. The random effects (u3i ) associated with treatment for each animal can be omitted from Model 5.2. The likelihood ratio test statistic for Hypothesis 5.1 is calculated by subtracting the –2 REML log-likelihood for Model 5.2 (the reference model including the random treatment effects) from that for Model 5.1 (the nested model without the random treatment effects). This difference is calculated as 275.3 – 249.2 = 26.1. Because the null hypothesis value for the variance of the random treatment effects is on the boundary of the parameter space (i.e., zero), the asymptotic null distribution of this test statistic is a mixture of χ21 and χ22 distributions, each with equal weights of 0.5 (Verbeke & Molenberghs, 2000). To evaluate the significance of the test, we calculate the p-value as follows: p-value = 0.5 × P (χ22 > 26.1) + 0.5 × P (χ21 > 26.1) < 0.001 We reject the null hypothesis and retain the random effects associated with treatment in Model 5.2 and all subsequent models. 5.5.2 Likelihood Ratio Tests for Residual Variance Hypothesis 5.2. 
The residual variance is homogeneous for both the carbachol and basal treatments. We use a likelihood ratio test for Hypothesis 5.2. The test statistic is calculated by subtracting the –2 REML log-likelihood for Model 5.3, the reference model with heterogeneous residual variances, from that for Model 5.2, the nested model with homogeneous residual variance. The asymptotic null distribution of the test statistic is a χ2 with one degree of freedom. The single degree of freedom is a consequence of the reference model having one additional covariance parameter (i.e., the additional residual variance for the carbachol treatment) in the Ri matrix. We do not reject the null hypothesis in this case (p = 0.66), so we conclude that Model 5.2, with homogeneous residual variance, is our preferred model. 232 Linear Mixed Models: A Practical Guide Using Statistical Software TABLE 5.6: Summary of Hypothesis Test Results for the Rat Brain Analysis Hypothesis Label Test Estimation Method Models Compared (Nested vs. Reference) Test Statistic Values (Calculation) p-Value 5.1 LRTa REML 5.1 vs. 5.2 χ2 (1 : 2) = 26.1 (275.3 – 249.2) < .001 5.2 LRT REML 5.2 vs. 5.3 χ2 (1) = 0.2 (249.2 – 249.0) 0.66 5.3 Type-III F -test REML F (2, 16) = 81.0 < .001 Wald 5.2b χ2 (1) = 162.1 < .001 2 χ -test Note: See Table 5.5 for null and alternative hypotheses and distributions of test statistics under H0 . a Likelihood ratio test; the test statistic is calculated by subtracting the –2 REML loglikelihood for the reference model from that of the nested model. b The use of an F -test (SAS, SPSS, or R) or a Wald χ2 -test (Stata or HLM) does not require fitting a nested model. 5.5.3 F -Tests for Fixed Effects Hypothesis 5.3. The REGION by TREATMENT interaction effects can be omitted from Model 5.2. To test Hypothesis 5.3, we use an F -test based on the results of the REML estimation of Model 5.2. We present the Type III F -test results based on the SPSS output in this section. This test is significant at α = 0.05 (p < .001), which indicates that the fixed effect of the carbachol treatment on nucleotide activation differs by region, as we noted in our original data summary. We retain the fixed effects associated with the region by treatment interaction and select Model 5.2 as our final model. 5.6 Comparing Results across the Software Procedures Table 5.7 shows a comparison of selected results obtained by using the five software procedures to fit Model 5.1 to the Rat Brain data. This model is “loaded” with fixed effects, has a random effect associated with the intercept for each animal, and has homogeneous residual variance across levels of TREATMENT and REGION. Models for Repeated-Measures Data:The Rat Brain Example 233 Table 5.8 presents a comparison of selected results from the five procedures for Model 5.2, which has the same fixed and random effects as Model 5.1, and an additional random effect associated with treatment for each animal. 5.6.1 Comparing Model 5.1 Results Table 5.7 shows that results for Model 5.1 agree across the software procedures in terms of the fixed-effect parameter estimates and their estimated standard errors. They also agree on 2 the values of the estimated variances, σint and σ 2 , and their standard errors, when reported. The value of the –2 REML log-likelihood is the same across all five software procedures. However, there is some disagreement in the values of the information criteria (AIC and BIC) because of different calculation formulas that are used (see Subsection 3.6.1). 
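For example, the information criteria reported for the two R fits of Model 5.1 can be recomputed directly from the model objects (a sketch, assuming that model5.1.fit and model5.1.fit.lmer from Subsections 5.4.3.1 and 5.4.3.2 are still available in the workspace); because the packages use different formulas, the values they return need not agree:
> # Information criteria for the lme() and lmer() fits of Model 5.1
> AIC(model5.1.fit)        # nlme
> BIC(model5.1.fit)
> AIC(model5.1.fit.lmer)   # lme4
> BIC(model5.1.fit.lmer)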
There are also differences in the types of tests reported for the fixed-effect parameters and, thus, in the results for these tests. SAS and SPSS report Type III F -tests, R reports Type I F -tests, and Stata and HLM report Wald χ2 tests (see Subsection 2.6.3.1 for a detailed discussion of the differences in these tests of fixed effects). The values of the test statistics for the software procedures that report the same tests agree closely. We note that the lmer() function in R does not compute denominator degrees of freedom for F -test statistics or degrees of freedom for t-statistics, primarily due to the variability in methods that are available for approximating these degrees of freedom and the absence of an accepted standard. As noted in the footnotes for Table 5.7, users of this function can use the lmerTest package to compute approximate degrees of freedom for these test statistics, along with p-values. 5.6.2 Comparing Model 5.2 Results Table 5.8 shows that the estimated fixed-effect parameters and their respective standard errors for Model 5.2 agree across all five software procedures. The estimated covariance parameters differ slightly, likely due to rounding differences. It is also noteworthy that R reports the estimated correlation of the two random effects in Model 5.2, as opposed to the covariances reported by the other four software procedures. There are also differences in the types of the F -tests for fixed effects computed in SAS, SPSS, and R. These differences are discussed in general in Subsections 2.6.3.1 and 3.11.6. 5.7 Interpreting Parameter Estimates in the Final Model The results that we present in this section were obtained by fitting the final model (Model 5.2) to the Rat Brain data, using REML estimation in SPSS. 5.7.1 Fixed-Effect Parameter Estimates The fixed-effect parameter estimates, standard errors, significance tests (t-tests with Satterthwaite approximations for the degrees of freedom), and 95% confidence intervals obtained by fitting Model 5.2 to the Rat Brain data in SPSS are reported in the following SPSS output: 234 TABLE 5.7: Comparison of Results for Model 5.1 SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HLM: HLM2 Estimation Method REML REML REML REML REML REML Fixed-Effect Parameter Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) β0 (Intercept) β1 (BST vs. VDB) β2 (LS vs. VDB) β3 (TREATMENT) β4 (BST × TREATMENT) β5 (LS × TREATMENT) 212.29(38.21) 216.21(31.31) 25.45(31.31) 360.03(31.31) −261.82(44.27) −162.50(44.27) 212.29(38.21) 216.21(31.31) 25.45(31.31) 360.03(31.31) −261.82(44.27) −162.50(44.27) 212.29(38.21) 216.21(31.31) 25.45(31.31) 360.03(31.31) −261.82(44.27) −162.50(44.27) 212.29(38.21) 216.21(31.31) 25.45(31.31) 360.03(31.31) −261.82(44.27) −162.50(44.27) 212.29(38.21) 212.29(38.21) 216.21(31.31) 216.21(31.31) 25.45(31.31) 25.45(31.31) 360.03(31.31) 360.03(31.31) −261.82(44.27) −261.82(44.28) −162.50(44.27) −162.50(44.28) Covariance Parameter Estimate (SE) Estimate (SE) 2 σint 2 σ (Residual variance) Estimate (n.c.)a 4849.81(3720.35) 4849.81(3720.35) 4849.81 2450.29(774.85) 2450.30(774.85) 2450.30 b Estimate (n.c.) 4849.8 2450.30 Estimate (SE) Estimate (n.c.) 4849.81(3720.35) 4849.74 2450.30(774.85) 2450.38 Model Information Criteria –2 RE/ML log-likelihood AIC BIC Tests for Fixed Effects Intercept 275.3 279.3 278.5 275.3 279.3 281.6 275.3 291.3 300.7 275.3 291.3 302.5 275.3 291.3 302.5 275.3 n.c. n.c. 
Type III F -Testsc Type III F -Tests Type I F -Tests Type I F -Tests t(4) = 5.6, F (1, 4.7) = 75.7, F (1, 20) = 153.8, Wald χ2 -Testsc Wald χ2 -Testsc p <.01 p <.01 p <.01 p <.01 p <.01 t = 5.6 (no d.f.)d Z = 5.6, t(4) = 5.6, REGION F (2, 20) = 28.5, F (2, 20) = 28.5, F (2, 20) = 20.6, F (1,no d.f.) = 20.6 χ2 (2) = 57.0, χ2 (2) = 57.0, TREATMENT F (1, 20) = 146.3, F (1, 20) = 146.2, F (1, 20) = 146.2, F (1,no d.f.) = 146.2 χ2 (1) = 132.3, χ2 (1) = 132.2, p <.01 p <.01 p <.01 p <.01 p <.01 Linear Mixed Models: A Practical Guide Using Statistical Software SAS: proc mixed Estimation Method SAS: proc mixed SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HLM: HLM2 REML REML REML REML REML REML p <.01 p <.01 p <.01 REGION × F (2, 20) = 17.8, F (2, 20) = 17.8, F (2, 20) = 17.8, F (2,no d.f.) = 17.8 TREATMENT p <.01 p <.01 p <.01 Note: (n.c.) = not computed Note: 30 Repeated Measures at Level 1; 5 Rats at Level 2 a The nlme version of the lme() function in R reports the estimated standard deviations of the random effects and the residuals; these estimates are squared to get the estimated variances. b Users of the lme() function in R can use the function intervals(model5.1.fit) to obtain approximate (i.e., large-sample) 95% confidence intervals for covariance parameters. c The test reported for the intercept differs from the tests for the other fixed effects in the model. d Users of the lmer() function in R can use the lmerTest package to obtain approximate degrees of freedom for these test statistics based on a Satterthwaite approximation (which enables the computation of p-values for the test statistics). p <.01 χ2 (2) = 35.7, p <.01 p <.01 χ2 (2) = 35.7, p <.01 Models for Repeated-Measures Data:The Rat Brain Example TABLE 5.7: (Continued) 235 236 TABLE 5.8: Comparison of Results for Model 5.2 SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HLM: HLM2 Estimation Method REML REML REML REML REML REML Fixed-Effect Parameter Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) β0 (Intercept) β1 (BST vs. VDB) β2 (LS vs. VDB) β3 (TREATMENT) β4 (BST × TREATMENT) β5 (LS × TREATMENT) 212.29(19.10) 212.29(19.10) 216.21(14.68) 216.21(14.68) 25.45(14.68) 25.45(14.68) 360.03(38.60) 360.03(38.60) −261.82(20.76) −261.82(20.76) −162.50(20.76) −162.50(20.76) 212.29(19.10) 216.21(14.68) 25.45(14.68) 360.03(38.60) −261.82(20.76) −162.50(20.76) 212.29(19.10) 216.21(14.68) 25.45(14.68) 360.03(38.60) −261.82(20.76) −162.50(20.76) 212.29(19.10) 212.29(19.10) 216.21(14.68) 216.21(14.68) 25.45(14.68) 25.45(14.68) 360.03(38.60) 360.03(38.60) −261.82(20.76) −261.82(20.76) −162.50(20.76) −162.50(20.76) Covariance Parameter Estimate (SE) 2 σint σint,treat 2 σtreat σ2 Estimate (n.c.)a Estimate (n.c.) 1284.32(1037.12)1284.32(1037.12) 1284.29 2291.22(1892.63)2291.22(1892.63) 0.80(corr.) 6371.33(4760.94)6371.33(4760.94) 6371.25 538.90(190.53) 538.90(190.53) 538.90 1284.30 0.80(corr.) 6371.30 538.90 Estimate (SE) Estimate (SE) Estimate (n.c.) 1284.32(1037.12) 1284.30 2291.23(1892.63) 2291.25 6371.34(4760.95) 6371.29 538.90(190.53) 538.90 Model Information Criteria –2 RE/ML log-likelihood AIC BIC 249.2 257.2 255.6 249.2 257.2 261.9 249.2 269.2 281.0 249.2 269.2 283.2 249.2 269.2 283.2 249.2 n.c. n.c. Tests for Fixed Effects Type III F -Testsa Type III F -Tests Type I F -Tests Type I F -Tests Wald χ2 -Testsa Wald χ2 -Testsa Intercept t(4) = 11.1, F (1, 4) = 292.9, F (1, 20) = 313.8, t(no d.f.) 
= 11.1 Z = 11.1, t(4) = 11.1, t(2, 16) = 129.6, t(2, 16) = 129.6, t(2, 20) = 93.7, F (2,no d.f.) = 93.7 χ2 (2) = 259.1, χ2 (2) = 259.1, p <.01 REGION p <.01 p <.01 p <.01 p <.01 Linear Mixed Models: A Practical Guide Using Statistical Software SAS: proc mixed Estimation Method SAS: proc mixed SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HLM: HLM2 REML REML REML REML REML REML F (1,no d.f.) = 35.1 χ2 (1) = 87.0, χ2 (1) = 87.0, χ2 (2) = 162.1, χ2 (2) = 162.1, p <.01 TREATMENT F (1, 4) = 35.5, p <.01 F (1, 4) = 35.5, p <.01 F (1, 20) = 35.5, p <.01 p <.01 p <.01 REGION × F (2, 16) = 81.1, F (2, 16) = 81.0, F (2, 20) = 81.0, F (2,no d.f.) = 81.0 TREATMENT p <.01 p <.01 p <.01 Note: (n.c.) = not computed Note: 30 Repeated Measures at Level 1; 5 Rats at Level 2 a The test used for the intercept differs from the tests for the other fixed effects in the model. p <.01 p <.01 p <.01 p <.01 p <.01 p <.01 Models for Repeated-Measures Data:The Rat Brain Example TABLE 5.8: (Continued) 237 238 Linear Mixed Models: A Practical Guide Using Statistical Software ' $ Estimates of Fixed Effects (a) Parameter Estimate Intercept 212.294000 [region=1] 216.212000 [region=2] 25.450000 [region=3] 0(b) treat 360.026000 [region=1]*treat -261.822000 [region=2]*treat -162.500000 [region=3]*treat 0(b) Std. Error 19.095630 14.681901 14.681901 0 38.598244 20.763343 20.763343 0 df 6.112 16 16 . 4.886 16 16 . t 11.117 14.726 1.733 . 9.328 -12.610 -7.826 . Sig. .000 .000 .102 . .000 .000 .000 . 95% Confidence Interval Lower Bound Upper Bound 165.775675 258.812325 185.087761 247.336239 -5.674239 56.574239 . . 260.103863 459.948137 -305.838322 -217.805678 -206.516322 -118.483678 . . a. Dependent Variable: activate. b. This parameter is set to zero because it is redundant. & % Because of the presence of the REGION × TREAT interaction in the model, we need to be careful when interpreting the main effects associated with these variables. To aid interpretation of the results in the presence of the significant interaction, we investigate the estimated marginal means (EMMEANS) of activation for each region within each level of treatment, requested in the SPSS syntax for Model 5.2: ' $ Estimates (a) region BST LS VDB Mean 526.710(b) 435.270(b) 572.320(b) Std. Error 50.551 50.551 50.551 df 4.234 4.234 4.234 95% Confidence Interval Lower Bound Upper Bound 389.364 664.056 297.924 572.616 434.974 709.666 a. Dependent Variable: activate. b. Covariates appearing in the model are evaluated at the following values: treat = 1.00. & % These are the estimated means for activation at the three regions when TREAT = 1 (carbachol). The estimated marginal means are calculated using the estimates of the fixed effects displayed earlier. SPSS also performs pairwise comparisons of the estimated marginal means for the three regions for carbachol treatment, requested with the COMPARE option in the /EMMEANS subcommand: ' $ (I) region BST LS VDB (J) LS VDB BST VDB BST LS Pairwise Comparisons (a) Mean Difference region (I-J) Std. Error df 91.440* 14.682 16 -45.610* 14.682 16 -91.440* 14.682 16 -137.050* 14.682 16 45.610* 14.682 16 137.050* 14.682 16 Based on estimated marginal means *. The mean difference is significant at the .05 level. a. Dependent Variable: activate. c. Adjustment for multiple comparisons: Bonferroni. 
& 95% Confidence Interval Sig.(c) Lower Bound Upper Bound .000 52.195 130.685 .020 -84.855 -6.365 .000 -130.685 -52.195 .000 -176.295 -97.805 .020 6.365 84.855 .000 97.805 176.295 % We see that all of the estimated marginal means are significantly different at α = 0.05 after performing a Bonferroni adjustment for the multiple comparisons. We display similar tables for the basal treatment (TREAT = 0): Models for Repeated-Measures Data:The Rat Brain Example ' 239 $ Estimates (a) region BST LS VDB Mean 428.506(b) 237.744(b) 212.294(b) Std. Error 19.096 19.096 19.096 95% Confidence Interval Lower Bound Upper Bound 381.988 475.024 191.226 284.262 165.776 258.812 df 6.112 6.112 6.112 a. Dependent Variable: activate. b. Covariates appearing in the model are evaluated at the following values: treat = .00. & % Note that the activation means at the LS and VDB regions are not significantly different for the basal treatment (p = 0.307): ' $ (I) region BST LS VDB (J) LS VDB BST VDB BST LS Pairwise Comparisons (a) Mean Difference region (I-J) Std. Error df 190.762* 14.682 16 216.212* 14.682 16 -190.762* 14.682 16 25.450 14.682 16 -216.212* 14.682 16 -25.450* 14.682 16 95% Confidence Interval Sig.(c) Lower Bound Upper Bound .000 151.517 230.007 .000 176.967 255.457 .000 -230.007 -151.517 .307 -13.795 64.695 .000 -255.457 -176.967 .307 -64.695 13.795 Based on estimated marginal means *. The mean difference is significant at the .05 level. a. Dependent Variable: activate. c. Adjustment for multiple comparisons: Bonferroni. & % These results are in agreement with what we observed in our initial graph of the data (Figure 5.1), in which the LS and VDB regions had very similar activation means for the basal treatment, but different activation means for the carbachol treatment. 5.7.2 Covariance Parameter Estimates The estimated covariance parameters obtained by fitting Model 5.2 to the Rat Brain data using the MIXED command in SPSS with REML estimation are reported in the following output: ' $ Estimates of Covariance Parameters (a) Parameter Estimate Residual 538.895532 190.528343 1284.319876 2291.223258 6371.331082 1037.116565 1892.631626 4760.943896 Intercept + treat [subject = animal] UN (1,1) UN (2,1) UN (2,2) a. Dependent Variable: activate. & Std. Error % The first part of the output contains the estimated residual variance, which has a value of 538.9. The next part of the output, labeled “Intercept + treat [subject=animal],” lists the three elements of the estimated D covariance matrix for the two random effects in the model. These elements are labeled according to their position (row, column) in the D matrix. We specified the D matrix to be unstructured by using the COVTYPE(UN) option in the /RANDOM subcommand of the SPSS syntax. 240 Linear Mixed Models: A Practical Guide Using Statistical Software The variance of the random intercepts, labeled UN(1,1) in this unstructured matrix, is estimated to be 1284.32, and the variance of the random treatment effects, labeled UN(2,2), is estimated to be 6371.33. The positive estimated covariance between the random intercepts and random treatment effects, denoted by UN(2,1) in the output, is 2291.22. The estimated D matrix, referred to as the G matrix by SPSS, is shown as follows in matrix form. This output was requested by using the /PRINT G subcommand in the SPSS syntax for Model 5.2. ' $ Random Effect Covariance Structure (G) (a) Intercept | animal treat | animal Intercept | animal 1284.319876 2291.223258 treat | animal 2291.223258 6371.331082 Unstructured a. 
Dependent Variable: activate. & 5.8 % The Implied Marginal Variance-Covariance Matrix for the Final Model The current version of the MIXED command in SPSS does not provide an option to display the estimated Vi covariance matrix for the marginal model implied by Model 5.2 in the output, so we use output from SAS and R in this section. The matrices of marginal covariances and marginal correlations for an individual subject can be obtained in SAS by including the v and vcorr options in the random statement in the proc mixed syntax for Model 5.2: random int treat / subject = animal type = un v vcorr; By default, SAS displays the marginal variance-covariance and corresponding correlation matrices for the first subject in the data file (in this case the matrices displayed correspond to animal R100797). Note that these matrices have the same structure for any given animal. Both the marginal Vi matrix and the marginal correlation matrix are of dimension 6 × 6, corresponding to the values of activation for each combination of region by treatment for a given rat i. The “Estimated V Matrix for animal R100797” displays the estimated marginal variances of activation on the diagonal and the estimated marginal covariances off the diagonal. The 3 × 3 submatrix in the upper-left corner represents the marginal covariance matrix for observations on the BST, LS, and VDB regions in the basal treatment, and the 3 × 3 submatrix in the lower-right corner represents the marginal covariance matrix for observations on the three brain regions in the carbachol treatment. The remainder of the Vi matrix represents the marginal covariances of observations on the same rat across treatments. ' $ Estimated V Matrix for animal R100797 Row Col1 1 1823.22 2 1284.32 3 1284.32 4 3575.54 5 3575.54 6 3575.54 & Col2 1284.32 1823.22 1284.32 3575.54 3575.54 3575.54 Col3 1284.32 1284.32 1823.22 3575.54 3575.54 3575.54 Col4 3575.54 3575.54 3575.54 12777 12238 12238 Col5 3575.54 3575.54 3575.54 12238 12777 12238 Col6 3575.54 3575.54 3575.54 12238 12238 12777 % Models for Repeated-Measures Data:The Rat Brain Example 241 The inclusion of the random treatment effects in Model 5.2 implies that the marginal variances and covariances differ for the carbachol and basal treatments. We see in the estimated Vi matrix that observations for the carbachol treatment have a much larger estimated marginal variance (12777) than observations for the basal treatment (1823.22). This result is consistent with the initial data summary and with Figure 5.1, in which we noted that the between-rat variability in the carbachol treatment is greater than in the basal treatment. The implied marginal covariances of observations within a given treatment are assumed to be constant, which might be viewed as a fairly restrictive assumption. We consider alternative models that allow these marginal covariances to vary in Section 5.11. The 6 × 6 matrix of estimated marginal correlations implied by Model 5.2 (taken from the SAS output) is displayed below. The estimated marginal correlations of observations in the basal treatment and the carbachol treatment are both very high (.70 and .96, respectively). The estimated marginal correlation of observations for the same rat across the two treatments is also high (.74). 
' $ Estimated V Correlation Matrix for animal R100797 Row Col1 1 1.0000 2 0.7044 3 0.7044 4 0.7408 5 0.7408 6 0.7408 & Col2 0.7044 1.0000 0.7044 0.7408 0.7408 0.7408 Col3 0.7044 0.7044 1.0000 0.7408 0.7408 0.7408 Col4 0.7408 0.7408 0.7408 1.0000 0.9578 0.9578 Col5 0.7408 0.7408 0.7408 0.9578 1.0000 0.9578 Col6 0.7408 0.7408 0.7408 0.9578 0.9578 1.0000 % In R, the estimated marginal variance-covariance matrix can be displayed by applying the getVarCov() function to a model fit object produced by the lme() function: > getVarCov(model5.2.fit, individual = "R100797", type = "marginal") These findings support our impressions in the initial data summary (Figure 5.1), in which we noted that observations on the same animal appeared to be very highly correlated (i.e., the level of activation for a given animal tended to “track” across regions and treatments). We present a detailed example of the calculation of the implied marginal variancecovariance matrix for the simpler Model 5.1 in Appendix B. 5.9 Diagnostics for the Final Model In this section we present an informal graphical assessment of the diagnostics for our final model (Model 5.2), fitted using REML estimation in SPSS. The syntax in the following text was used to refit Model 5.2 with the MIXED command in SPSS, using REML estimation to get unbiased estimates of the covariance parameters. The /SAVE subcommand requests that the conditional predicted values, PRED, and the conditional residuals, RESID, be saved in the current working data set. The predicted values and the residuals are conditional on the random effects in the model and are saved in two new variables in the working SPSS data file. Optionally, marginal predicted values can be saved in the data set, using the FIXPRED option in the /SAVE subcommand. See Section 3.10 for a more general discussion of conditional predicted values and conditional residuals. 242 Linear Mixed Models: A Practical Guide Using Statistical Software Software Note: The variable names used for the conditional predicted values and the conditional residuals saved by SPSS depend on how many previously saved versions of these variables already exist in the data file. If the current model is the first for which these variables have been saved, they will be named PRED 1 and RESID 1 by default. SPSS numbers successively saved versions of these variables as PRED n and RESID n, where n increments by one for each new set of conditional predicted and residual values. * Model 5.2 (Diagnostics). MIXED activate BY region WITH treat /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = region treat region*treat | SSTYPE(3) /METHOD = REML /PRINT = SOLUTION G /RANDOM INTERCEPT treat | SUBJECT(animal) COVTYPE(UN) /SAVE = PRED RESID . We include the following syntax to obtain a normal Q–Q plot of the conditional residuals: PPLOT /VARIABLES=RESID_1 /NOLOG /NOSTANDARDIZE /TYPE=Q-Q /FRACTION=BLOM /TIES=MEAN /DIST=NORMAL. The conditional residuals from this analysis appear to follow a normal distribution fairly well (see Figure 5.3). However, it is difficult to assess the distribution of the conditional residuals, because there are only 30 total observations (= 5 rats × 6 observations per rat). A Kolmogorov–Smirnov test for normality of the conditional residuals can be carried out using the following syntax: NPAR TESTS /K-S(NORMAL)= RESID_1 /MISSING ANALYSIS. 
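An analogous normality check can be carried out in R on the conditional residuals from the lme() fit of Model 5.2 (Subsection 5.4.3.1). The following sketch is our addition and is not part of the SPSS analysis; the object name res is ours, and we use the Shapiro–Wilk test (shapiro.test() in base R), which, as noted in the footnote below, is not available in SPSS:
> # Normal Q-Q plot and Shapiro-Wilk test for the conditional (within-animal)
> # residuals from the lme() fit of Model 5.2
> res <- resid(model5.2.fit, type = "pearson")
> qqnorm(res)
> qqline(res)
> shapiro.test(res)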
The result of the Kolmogorov–Smirnov test for normality2 is not significant (p = 0.95). We consider normality of the residuals to be a reasonable assumption for this model. We also investigate the assumption of equal residual variance in both treatments by examining a scatter plot of the conditional residuals vs. the conditional predicted values: GRAPH /SCATTERPLOT(BIVAR)=PRED_1 WITH RESID_1 BY treatment /MISSING=LISTWISE . 2 In general, the Shapiro–Wilk test for normality is more powerful than the Kolmogorov–Smirnov test when working with small sample sizes. Unfortunately, this test is not available in SPSS. Models for Repeated-Measures Data:The Rat Brain Example 243 FIGURE 5.3: Distribution of conditional residuals from Model 5.2. Figure 5.4 suggests that the residual variance is fairly constant across treatments (there is no pattern, and the residuals are symmetric). We formally tested the assumption of equal residual variances across treatments in Hypothesis 5.2, and found no significant difference in the residual variance for the carbachol treatment vs. the basal treatment in Model 5.2 (see Subsection 5.5.2). The distributions of the EBLUPs of the random effects should also be investigated to check for possible outliers. Unfortunately, EBLUPs for the two random effects associated with each animal in Model 5.2 cannot be generated in the current version of the MIXED command in IBM SPSS Statistics (Version 21). Because we have a very small number of animals, we do not investigate diagnostics for the EBLUPs for this model. 5.10 5.10.1 Software Notes Heterogeneous Residual Variances for Level 1 Groups Recall that in Chapter 3 we used a heterogeneous residual variance structure for groups defined by a Level 2 variable (treatment) in Model 3.2B. The ability to fit such models is available only in proc mixed in SAS, the GENLINMIXED command in SPSS, the lme() function in R, and the mixed command in Stata. When we fit Model 5.3 to the Rat Brain data in this chapter, we defined a heterogeneous residual variance structure for different values of a Level 1 variable (TREATMENT). We were able to fit this model using all of the software procedures, with the exception of the lmer() function in R. The HLM2 procedure only allows maximum likelihood estimation for models that are fitted with a heterogeneous residual variance structure. SAS, SPSS, the the lme() function 244 Linear Mixed Models: A Practical Guide Using Statistical Software FIGURE 5.4: Scatter plot of conditional residuals vs. conditional predicted values based on the fit of Model 5.2. in R, and the mixed command in Stata all allow ML or REML (default) estimation for these models. The parameterization of the heterogeneous residual variances in HLM2 employs a logarithmic transformation, so the parameter estimates for the variances from HLM2 need to be exponentiated before they can be compared with results from the other software procedures (see Subsection 5.4.5). 5.10.2 EBLUPs for Multiple Random Effects Model 5.2 specified two random effects for each animal: one associated with the intercept and a second associated with treatment. The EBLUPs for multiple random effects per subject can be displayed using SAS, R, Stata, and HLM. However, it is not possible to obtain separate estimates of the EBLUPs for more than one random effect per subject when using the MIXED command in the current version of IBM SPSS Statistics (Version 21). 
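In R, for example, the EBLUPs of both random effects for each animal can be listed directly from the fitted model objects (a sketch, assuming the fits from Subsections 5.4.3.1 and 5.4.3.2 are still available in the workspace):
> # EBLUPs of the random intercepts and random treatment effects for each animal
> ranef(model5.2.fit)        # lme() fit from the nlme package
> ranef(model5.2.fit.lmer)   # lmer() fit from the lme4 package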
5.11 5.11.1 Other Analytic Approaches Kronecker Product for More Flexible Residual Covariance Structures Most residual covariance structures (e.g., AR(1) or compound symmetry) are designed for one within-subject factor (e.g., time). In the Rat Brain example, we have two within-subject Models for Repeated-Measures Data:The Rat Brain Example 245 factors: brain region and treatment. With such data, one can consider modeling a residual covariance structure using the Kronecker Product of the underlying within-subject factor-specific covariance matrices (Galecki, 1994). This method adds flexibility in building residual covariance structures and has an attractive interpretation in terms of independent within-subject factor-specific contributions to the overall within-subject covariance structure. Examples of this general methodology are implemented in SAS proc mixed. The SAS syntax that implements an example of this methodology for the Rat Brain data is provided below: title "Kronecker Product Covariance Structure"; proc mixed data = ratbrain; class animal region treatment; model activate = region treatment region*treatment / s; random int / subject = animal type=vc v vcorr solution; repeated region treatment / subject=animal type=un@un r rcorr; run; Note that both REGION and TREATMENT must be listed in the class statement. In the random statement we retain the random intercept but omit the random animal-specific treatment effects to avoid overparameterization of the model. The repeated statement includes the option type=un@un, which specifies the Kronecker product of the two matrices for the REGION and TREATMENT factors, with three and two levels, respectively. The syntax implies that REGION contributes an unstructured 3 × 3 matrix and TREATMENT contributes an unstructured 2 × 2 matrix to the overall 6 × 6 matrix Ri . To ensure identifiability of the matrices, we assume that the upper-left element in the matrix contributed by TREATMENT is equal to 1 (which is automatically done by the software). This syntax results in the following estimates of the elements of the underlying factorspecific matrices for both REGION and TREATMENT. 
' $ Covariance Parameter Estimates -------------------------------------Cov Parm Subject Estimate -------------------------------------Intercept animal 7637.3000 region UN(1,1) animal 2127.7400 UN(2,1) animal 1987.2900 UN(2,2) animal 2744.6600 UN(3,1) animal 1374.5100 UN(3,2) animal 2732.2200 UN(3,3) animal 3419.7000 treatment UN(1,1) animal 1.0000 UN(2,1) animal -0.4284 UN(2,2) animal 0.6740 & % We can use these estimates to determine the unstructured variance-covariance matrices for the residuals contributed by the REGION and TREATMENT factors: ⎛ ⎞ 2127.74 1987.29 1374.51 RREGION = ⎝ 1987.29 2744.66 2732.22 ⎠ 1374.51 2732.22 3419.70 RT REAT MEN T = 1.00 −0.43 −0.43 0.67 246 Linear Mixed Models: A Practical Guide Using Statistical Software The Kronecker product of these two factor-specific residual variance-covariance matrices implies the following overall estimated Ri correlation matrix for a given rat: ' $ Estimated R Correlation Matrix for animal R100797 Row Col1 1 1.0000 2 0.8224 3 0.5096 4 -0.5219 5 -0.4292 6 -0.2659 & Col2 0.8224 1.0000 0.8918 -0.4292 -0.5219 -0.4654 Col3 0.5096 0.8918 1.0000 -0.2659 -0.4654 -0.5219 Col4 -0.5219 -0.4292 -0.2659 1.0000 0.8224 0.5096 Col5 -0.4292 -0.5219 -0.4654 0.8224 1.0000 0.8918 Col6 -0.2659 -0.4654 -0.5219 0.5096 0.8918 1.0000 % The implied marginal correlation matrix of observations for a given rat based on this model is as follows. The structure of this marginal correlation matrix reveals a high level of correlation among observations on the same animal, as was observed for the implied marginal correlation matrix based on the fit of Model 5.2: ' $ Estimated V Correlation Matrix for animal R100797 Row Col1 1 1.0000 2 0.9559 3 0.8673 4 0.7146 5 0.7050 6 0.7153 & Col2 0.9559 1.0000 0.9678 0.6992 0.6511 0.6365 Col3 0.8673 0.9678 1.0000 0.7038 0.6314 0.5887 Col4 0.7146 0.6992 0.7038 1.0000 0.9676 0.9017 Col5 0.7050 0.6511 0.6314 0.9676 1.0000 0.9760 Col6 0.7153 0.6365 0.5887 0.9017 0.9760 1.0000 % The AIC for this model is 258.0, which is very close to the value for Model 5.2 (AIC = 257.2) and better (i.e., smaller) than the value for Model 5.1 (AIC = 279.3). Note that covariance structures based on Kronecker products can also be used for studies involving multiple dependent variables measured longitudinally (not considered in this book). 5.11.2 Fitting the Marginal Model We can also take a strictly marginal approach (in which random animal effects are not considered) to modeling the Rat Brain data. However, there are only 5 animals and 30 observations; therefore, fitting a marginal model with an unstructured residual covariance matrix is not recommended, because the unstructured Ri matrix would require the estimation of 21 covariance parameters. When attempting to fit a marginal model with an unstructured covariance structure for the residuals using REML estimation in SPSS, the MIXED command issues a warning and does not converge to a valid solution. We can consider marginal models with more restrictive residual covariance structures. For example, we can readily fit a model with a heterogeneous compound symmetry Ri matrix, which requires the estimation of seven parameters: six variances, i.e., one for each combination of treatment and region, and a constant correlation parameter (note that these are also a lot of parameters to estimate for this small data set). We use the following syntax in SPSS: * Marginal model with heterogeneous compound symmetry R(i) matrix. 
MIXED activate BY region treatment Models for Repeated-Measures Data:The Rat Brain Example 247 /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = region treatment region*treatment | SSTYPE(3) /METHOD = REML /PRINT = SOLUTION R /REPEATED Region Treatment | SUBJECT(animal) COVTYPE(CSH) . In this syntax for the marginal model, note that the /RANDOM subcommand is not included. The structure of the Ri matrix is specified as CSH (compound symmetric heterogeneous), which means that the residual marginal variance is allowed to differ for each combination of REGION and TREATMENT, although the correlation between observations on the same rat is constant (estimated to be 0.81). The AIC for this model is 267.8, as compared to the value of 257.2 for Model 5.2. So, it appears that we have a better fit using the LMM with explicit random effects (Model 5.2). 5.11.3 Repeated-Measures ANOVA A more traditional approach to repeated-measures ANOVA (Winer et al., 1991) starts with a data set in the wide format shown in Table 5.2. This type of analysis could be carried out, for example, using the GLM procedures in SPSS and SAS. However, if any missing values occur for a given subject (e.g., animal), that subject is dropped from the analysis altogether (complete case analysis). Refer to Subsection 2.9.4 for more details on the problems with this approach when working with missing data. The correlation structure assumed in a traditional repeated-measures ANOVA is spherical (i.e., compound symmetry with homogeneous variance), with adjustments (Greenhouse– Geisser or Huynh–Feldt) made to the degrees of freedom used in the denominator for the F -tests of the within-subject effects when the assumption of sphericity is violated. There are no explicit random effects in this approach, but a separate mean square error is estimated for each within-subject factor, which is then used in the F -tests for that factor. Thus, in an analysis of the Rat Brain data, there would be a separate residual variance estimated for REGION, for TREATMENT, and for the REGION × TREATMENT interaction. In general, the LMM approach to the analysis of repeated-measures data allows for much more flexible correlation structures than can be specified in a traditional repeated-measures ANOVA model. 5.12 Recommendations This chapter has illustrated some important points with respect to how categorical fixed factors are handled in the specification of random-effects structures for linear mixed models in the different software procedures. If, for example, a categorical fixed factor has four levels, a standard reference (or “dummy variable”) parameterization of the model is used (where fixed effects of indicators for the nonreference categories are included in the model), and there are multiple measurements at each level of the categorical factor within each of the higher-level units (as was the case for the two treatments in the Rat Brain example), then technically one could include random effects associated with each of the indicator variables. However, because some of the software procedures (e.g., R, HLM) use an unstructured covariance matrix for the random effects included in the model by default, analysts need to 248 Linear Mixed Models: A Practical Guide Using Statistical Software be wary of how many covariance parameters are being included when specifying random effects associated with categorical factors. 
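In the lme() function in R, for example, one way to keep the number of covariance parameters under control is to request a more restrictive structure for the random-effects covariance matrix explicitly, rather than accepting the unstructured default. The sketch below is purely illustrative (the data frame, grouping factor, and indicator variables are hypothetical); the pdDiag() constructor in the nlme package requests a diagonal covariance matrix, so that only variances, and no covariances, are estimated for the random effects:
> # Hypothetical example: a four-level fixed factor coded "manually" as three
> # indicators (ind1, ind2, ind3), with random effects for the intercept and
> # the three indicators. pdDiag() estimates 4 variances and no covariances,
> # rather than the 10 parameters required by an unstructured matrix.
> fit <- lme(y ~ ind1 + ind2 + ind3,
      random = list(group = pdDiag(~ ind1 + ind2 + ind3)),
      data = mydata)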
Allowing the effects of the three nonreference levels of the four-category fixed factor to randomly vary across higher-level units, for example, not only introduces three additional variance components that need to be estimated, but several additional covariances as well (e.g., the covariance of the random effects associated with nonreference levels 1 and 2 of the categorical fixed factor). Analysts need to be careful when specifying these random effects, and the approach that we took in the Stata example in this chapter underscores this issue (where Stata does not allow for the estimation of covariances of random effects when the effects associated with a categorical fixed factor are allowed to randomly vary at higher levels). For this reason, we recommend “manual” indicator coding for fixed factors, enabling more control over exactly what random effects covariance structure is being estimated. We illustrated this approach with our handling of the TREATMENT variable in the Rat Brain example. We also illustrated the ability of the different software procedures to allow the error variances to vary depending on the levels of categorical factors at Level 1 of the data hierarchy in a repeated measures design (i.e., within-subject factors). While allowing the error variances to vary across the two treatments did not improve the fits of the models to the Rat Brain data, we recommend that analysts always consider this possibility, beginning with graphical exploration of the data. It has been our experience that many real-world data sets have this feature, and the fits of models (and corresponding tests for various parameters) can be greatly improved by correctly modeling this aspect of the covariance structure for a given data set. Finally, we illustrated various likelihood ratio tests in this chapter, and we remind readers that likelihood ratio tests rely on asymptotic theory (i.e., large sample sizes at each of the levels of the data hierarchy, and especially at Level 2 in repeated measures designs). The Rat Brain data set is certainly not large enough to make these tests valid in an asymptotic sense, and the likelihood ratio tests are only included for illustration purposes. For smaller data sets, analysts might consider Bayesian methods for hypothesis testing, as was discussed in Chapter 2. 6 Random Coefficient Models for Longitudinal Data: The Autism Example 6.1 Introduction This chapter illustrates fitting random coefficient models to data arising from a longitudinal study of the social development of children with autism, whose socialization scores were observed at ages 2, 3, 5, 9 and 13 years. We consider models that allow the child-specific coefficients describing individual time trajectories to vary randomly. Random coefficient models are often used for the analysis of longitudinal data when the researcher is interested in modeling the effects of time and other time-varying covariates at Level 1 of the model on a continuous dependent variable, and also wishes to investigate the amount of between-subject variance in the effects of the covariates across Level 2 units (e.g., subjects in a longitudinal study). In the context of growth and development over time, random coefficient models are often referred to as growth curve models. Random coefficient models may also be employed in the analysis of clustered data, when the effects of Level 1 covariates, such as student’s socioeconomic status, tend to vary across clusters (e.g., classrooms or schools). 
Table 6.1 illustrates some examples of longitudinal data that may be analyzed using linear mixed models with random coefficients. We highlight the R software in this chapter. 6.2 6.2.1 The Autism Study Study Description The data used in this chapter were collected by researchers at the University of Michigan (Anderson et al., 2009) as part of a prospective longitudinal study of 214 children. The children were divided into three diagnostic groups when they were 2 years old: autism, pervasive developmental disorder (PDD), and nonspectrum children. We consider a subset of 158 autism spectrum disorder (ASD) children, including autistic and PDD children, for this example. The study was designed to collect information on each child at ages 2, 3, 5, 9, and 13 years, although not all children were measured at each age. One of the study objectives was to assess the relative influence of the initial diagnostic category (autism or PDD), language proficiency at age 2, and other covariates on the developmental trajectories of the socialization of these children. Study participants were children who had had consecutive referrals to one of two autism clinics before the age of 3 years. Social development was assessed at each age using the Vineland Adaptive Behavior Interview survey form, a parent-reported measure of socialization. The dependent variable, VSAE (Vineland Socialization Age Equivalent), was a 249 250 Linear Mixed Models: A Practical Guide Using Statistical Software TABLE 6.1: Examples of Longitudinal Data in Different Research Settings Research Setting Level of Data Subject (Level 2) Time (Level 1) Substance Abuse Business Autism Research Subject variable (random factor) College Company Child Covariates Geographic region, public/ private, rural/urban Industry, geographic region Gender, baseline language level Time variable Year Quarter Age Dependent variable Percent of students who use marijuana during each academic year Stock value in each quarter Socialization score at each age Covariates School ranking, cost of tuition Quarterly sales, workforce size Amount of therapy received combined score that included assessments of interpersonal relationships, play/leisure time activities, and coping skills. Initial language development was assessed using the Sequenced Inventory of Communication Development (SICD) scale; children were placed into one of three groups (SICDEGP) based on their initial SICD scores on the expressive language subscale at age 2. Table 6.2 displays a sample of cases from the Autism data in the “long” form appropriate for analysis using the LMM procedures in SAS, SPSS, Stata, and R. The data have been sorted in ascending order by CHILDID and by AGE within each level of CHILDID. This sorting is helpful when interpreting analysis results, but is not required for the model-fitting procedures. Note that the values of the subject-level variables, CHILDID and SICDEGP, are the same for each observation within a child, whereas the value of the dependent variable (VSAE) is different at each age. We do not consider any time-varying covariates other than AGE in this example. 
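For readers who receive such data in the wide format (one row per child, with a separate VSAE column for each age), a long-format version like the one in Table 6.2 can be constructed before model fitting. The R sketch below is only illustrative and assumes a hypothetical wide data frame named autism.wide with columns vsae.2 through vsae.13; it is not part of the analysis code used later in this chapter:
> # Reshape a hypothetical wide file (one row per child) into the long format
> # required by the LMM procedures, and sort by child and by age within child.
> autism.long <- reshape(autism.wide,
      varying = c("vsae.2", "vsae.3", "vsae.5", "vsae.9", "vsae.13"),
      v.names = "vsae", timevar = "age", times = c(2, 3, 5, 9, 13),
      idvar = "childid", direction = "long")
> autism.long <- autism.long[order(autism.long$childid, autism.long$age), ]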
The variables that will be used in the analysis are defined below: Subject (Level 2) Variables • CHILDID = Unique child identifier • SICDEGP = Sequenced Inventory of Communication Development Expressive Random Coefficient Models for Longitudinal Data: The Autism Example 251 TABLE 6.2: Sample of the Autism Data Set Child (Level 2) Subject ID Longitudinal Measures (Level 1) Covariate CHILDID SICDEGP 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 ... 3 3 3 3 3 1 1 1 1 1 3 3 3 3 3 Time Variable Dependent Variable AGE VSAE 2 3 5 9 13 2 3 5 9 13 2 3 5 9 13 6 7 18 25 27 6 7 7 8 14 17 18 12 18 24 Group: categorized expressive language score at age 2 years (1 = low, 2 = medium, 3 = high) Time-Varying (Level 1) Variables • AGE = Age in years (2, 3, 5, 9, 13); the time variable • VSAE = Vineland Socialization Age Equivalent: parent-reported socialization, the dependent variable, measured at each age 6.2.2 Data Summary The data summary for this example is carried out using the R software package. A link to the syntax and commands that can be used to perform similar analyses in the other software packages is included on the book’s web page (see Appendix A). We begin by reading the comma-separated raw data file (autism.csv) into R and “attaching” the data set to memory, so that shorter versions of the variable names can be used in R functions (e.g., age rather than autism$age). > autism <- read.csv("c:\\temp\\autism.csv", h = T) > attach(autism) 252 Linear Mixed Models: A Practical Guide Using Statistical Software Alternatively, R users can import text data files directly from a web site: > file <- "http://www-personal.umich.edu/~bwest/autism.csv" > autism <- read.csv(file, h = T) > attach(autism) Next, we apply the factor() function to the numeric variables SICDEGP and AGE to create categorical versions of these variables (SICDEGP.F and AGE.F), and add the new variables to the data frame. > > > > sicdegp.f <- factor(sicdegp) age.f <- factor(age) # Add the new variables to the data frame object. autism.updated <- data.frame(autism, sicdegp.f, age.f) After creating these factors, we request descriptive statistics for both the continuous and factor variables included in the analysis using the summary() function. Note that the summary() function produces different output for the continuous variable, VSAE, than for the factor variables. > # Number of Observations at each level of AGE > summary(age.f) 2 3 5 9 13 156 150 91 120 95 > # Number of Observations at each level of AGE within each group > # defined by the SICDEGP factor > table(sicdegp.f, age.f sicdegp.f 2 3 5 1 50 48 29 2 66 64 36 3 40 38 26 age.f) 9 37 48 35 13 28 41 26 > # Overall summary for VSAE > summary(vsae) Min. 1st Qu. Median Mean 3rd Qu. Max. NA’s 1.00 10.00 4.00 26.41 27.00 198.00 2.00 > # VSAE means at each AGE > tapply(vsae, age.f, mean, na.rm=TRUE) 2 3 5 9 13 9.089744 15.255034 21.483516 39.554622 60.600000 > # VSAE minimum values at each AGE Random Coefficient Models for Longitudinal Data: The Autism Example 253 > tapply(vsae, age.f, min, na.rm=TRUE) 2 3 5 9 13 1 4 4 3 7 > # VSAE maximum values at each AGE > tapply(vsae, age.f, max, na.rm=TRUE) 2 3 5 9 13 20 63 77 171 198 The number of children examined at each age differs due to attrition over time. We also have fewer children at age 5 years because one of the clinics did not schedule children to be examined at that age. There were two children for whom VSAE scores were not obtained at age 9, although the children were examined at that age (missing values of VSAE are displayed as NAs in the output). 
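In addition to the minimum, mean, and maximum values at each age, the spread of the VSAE scores can be summarized directly. A minimal sketch, assuming the data and the age.f factor have been set up as above:
> # Standard deviation of VSAE at each age; the spread increases markedly
> # with age, consistent with the individual profiles in Figure 6.1.
> tapply(vsae, age.f, sd, na.rm = TRUE)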
Overall, VSAE scores ranged from 1 to 198, with a mean of 26.41. The minimum values changed only slightly at each age, but the means and maximum values increased markedly at later ages. We next generate graphs that show the observed VSAE scores as a function of age for each child within levels of SICDEGP (Figure 6.1). We also display the mean VSAE profiles by SICDEGP (Figure 6.2). The R syntax that can be used to generate these graphs is provided below: > library(lattice) # Load the library for trellis graphics. > trellis.device(color=F) # Color is turned off. > # Load the nlme library, which is required for the > # plots as well as for subsequent models. > library(nlme) We generate Figure 6.1 using the R code below. We use the model formula vsae ~ age | childid as an argument in the groupedData() function to create a “grouped” data frame object, autism.g1, in which VSAE is the y-axis variable, AGE is the x-axis variable, and CHILDID defines the grouping of the observations (one line for each child in the plot). The one-sided formula ~ sicdegp.f in the outer = argument defines the outer groups for the plot (i.e., requests one plot per level of the SICDEGP factor). > autism.gl <- groupedData(vsae ~ age | childid, outer = ~ sicdegp.f, data = autism.updated) > # Generate individual profiles in Figure 6.1. > plot(autism.gl, display = "childid", outer = TRUE, aspect = 2, key = F, xlab = "Age (Years)", ylab = "VSAE", main = "Individual Data by SICD Group") For Figure 6.2, we create a grouped data frame object, autism.g2, where VSAE and AGE remain the y-axis and x-axis variables, respectively. However, by replacing “| childid” with “| sicdegp.f”, all children with the same value of SICDEGP are defined as a group and used to generate mean profiles. The argument order.groups = F preserves the numerical order of the SICDEGP levels. 254 Linear Mixed Models: A Practical Guide Using Statistical Software FIGURE 6.1: Observed VSAE values plotted against age for children in each SICD group. > autism.g2 <- groupedData(vsae ~ age | sicdegp.f, order.groups = F, data = autism.updated) > # Generate mean profiles in Figure 6.2. > plot(autism.g2, display = "sicdegp", aspect = 2, key = F, xlab = "Age (Years)", ylab = "VSAE", main = "Mean Profiles by SICD Group") The plots of the observed VSAE values for individual children in Figure 6.1 show substantial variation from child to child within each level of SICD group; the VSAE scores of some children tend to increase as the children get older, whereas the scores for other children remain relatively constant. On the other hand, we do not see much variability in the initial values of VSAE at age 2 years for any of the levels of SICD group. Overall, we observe increasing between-child variability in the VSAE scores at each successive year of age. The random coefficient models fitted to the data account for this important feature, as we shall see later in this chapter. The mean profiles displayed in Figure 6.2 show that the mean VSAE scores generally increase with age. There may also be a quadratic trend in VSAE scores, especially in SICD group two. This suggests that a model to predict VSAE should include both linear and quadratic fixed effects of age, and possibly interactions between the linear and quadratic effects of age and SICD group. Random Coefficient Models for Longitudinal Data: The Autism Example 255 FIGURE 6.2: Mean profiles of VSAE values for children in each SICD group. 
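The mean profiles in Figure 6.2 can also be summarized numerically by cross-classifying the VSAE means by age and SICD group. A short sketch, again assuming the factors created earlier in this subsection are available:
> # Mean VSAE for each combination of age (rows) and SICD group (columns);
> # these cell means are the numerical counterparts of the profiles in Figure 6.2.
> tapply(vsae, list(age.f, sicdegp.f), mean, na.rm = TRUE)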
6.3 Overview of the Autism Data Analysis For the analysis of the Autism data, we follow the “top-down” modeling strategy outlined in Subsection 2.7.1 of Chapter 2. In Subsection 6.3.1 we outline the analysis steps, and informally introduce related models and hypotheses to be tested. Subsection 6.3.2 presents a more formal specification of selected models that are fitted to the Autism data, and Subsection 6.3.3 details the hypotheses tested in the analysis. To follow the analysis steps outlined in this section, refer to the schematic diagram presented in Figure 6.3. 6.3.1 Analysis Steps Step 1: Fit a model with a “loaded” mean structure (Model 6.1). Fit an initial random coefficient model with a “loaded” mean structure and random child-specific coefficients for the intercept, age, and age-squared. In Model 6.1, we fit a quadratic regression model for each child, which describes their VSAE as a function of age. This initial model includes the fixed effects of age1 , age-squared, SICD group, the SICD group by age interaction, and the SICD group by age-squared interaction. We also include three random effects associated with each child: a random intercept, a random age effect, and a random age-squared effect. This allows each child to have a unique parabolic trajectory, with coefficients that vary randomly around the fixed effects defining 1 To simplify interpretation of the intercept, we subtract two from the value of AGE and create an auxiliary variable named AGE 2. The intercept can then be interpreted as the predicted VSAE score at age 2, rather than at age zero, which is outside the range of our data. 256 Linear Mixed Models: A Practical Guide Using Statistical Software FIGURE 6.3: Model selection and related hypotheses for the analysis of the Autism data. the mean growth curve for each SICD group. We use REML to estimate the variances and covariances of the three random effects. Model 6.1 also includes residuals associated with the VSAE observations, which conditional on a given child are assumed to be independent and identically distributed. We encounter some estimation problems when we fit Model 6.1 using the software procedures. SAS proc mixed reports problems with estimating the covariance parameters for the random effects; the procedures in SPSS and R do not achieve convergence; the mixed command in Stata converges to a solution, but encounters difficulties in estimating the standard errors of the covariance parameters; and HLM2 requires more than 1000 iterations to converge to a solution. As a result, the estimates of the covariance parameters defined in Model 6.1 differ widely across the packages. Step 2: Select a structure for the random effects (Model 6.2 vs. Model 6.2A). Fit a model without the random child-specific intercepts (Model 6.2), and test whether to keep the remaining random effects in the model. We noted in the initial data summary (Figures 6.1 and 6.2) that there was little variability in the VSAE scores at age 2, and therefore in Model 6.2 we attribute this variation Random Coefficient Models for Longitudinal Data: The Autism Example 257 entirely to random error rather than to between-subject variability. Compared to Model 6.1, we remove the random child-specific intercepts, while retaining the same fixed effects and the child-specific random effects of age and age-squared. 
Model 6.2 therefore implies that the children-specific predicted trajectories within a given level of SICD group have a common VSAE value at age 2 (i.e., there is no between-child variability in VSAE scores at age 2). We also assume for Model 6.2 that the age-related linear and quadratic random effects describe the between-subject variation. We formally test the need for the child-specific quadratic effects of age (Hypothesis 6.1) by using a REML-based likelihood ratio test. To perform this test, we fit a nested model (Model 6.2A) that omits the random quadratic effects. Based on the significant result of this test, we decide to retain both the linear and quadratic child-specific random effects in all subsequent models. Step 3: Reduce the model by removing nonsignificant fixed effects (Model 6.2 vs. Model 6.3), and check model diagnostics. In this step, we test whether the fixed effects associated with the AGE-squared × SICDEGP interaction can be omitted from Model 6.2 (Hypothesis 6.2). We conclude that these fixed effects are not significant, and we remove them to form Model 6.3. We then test whether the fixed effects associated with the AGE × SICDEGP interaction can be omitted from Model 6.3 (Hypothesis 6.3), and find that these fixed effects are significant and should be retained in the model. Finally, we refit Model 6.3 (our final model) using REML estimation to obtain unbiased estimates of the covariance parameters. We check the assumptions for Model 6.3 by examining the distribution of the residuals and of the EBLUPs for the random effects. We also investigate the agreement of the observed VSAE values with the conditional predicted values based on Model 6.3 using scatter plots. The diagnostic plots are generated using the R software package (Section 6.9). In Figure 6.3 we summarize the model selection process and hypotheses considered in the analysis of the Autism data. Refer to Subsection 3.3.1 for details on the notation in this figure. 6.3.2 Model Specification Selected models considered in the analysis of the Autism data are summarized in Table 6.3. 6.3.2.1 General Model Specification The general form of Model 6.1 for an individual response, VSAEti , on child i at the t-th visit (t = 1, 2, 3, 4, 5, corresponding to ages 2, 3, 5, 9 and 13), is shown in (6.1). This specification corresponds closely to the syntax used to fit the model using the procedures in SAS, SPSS, R, and Stata. 
VSAEti = β0 + β1 × AGE 2ti + β2 × AGE 2SQti + β3 × SICDEGP1i ⎫ ⎪ ⎪ ⎪ ⎬ + β4 × SICDEGP2i + β5 × AGE 2ti × SICDEGP1i fixed + β6 × AGE 2ti × SICDEGP2i × β7 × AGE 2SQti × SICDEGP1i ⎪ ⎪ ⎪ ⎭ + β8 × AGE 2SQti × SICDEGP2i +u0i + u1i × AGE 2ti + u2i × AGE 2SQti + εti } random (6.1) 258 TABLE 6.3: Summary of Selected Models Considered for the Autism Data Term/Variable Fixed effects Child (i) Residuals Time (t) Covariance Parameters (θ D ) for D Matrix Child level Covariance Parameters (θ R ) for Ri Matrix Time level HLM Notation Notation Intercept AGE 2 AGE 2SQ SICDEGP1 SICDEGP2 AGE 2 × SICDEGP1 AGE 2 × SICDEGP2 AGE 2SQ × SICDEGP1 AGE 2SQ × SICDEGP2 β0 β1 β2 β3 β4 β5 β6 β7 β8 β00 β10 β20 β01 β02 β11 β12 β21 β22 Intercept AGE 2 AGE 2SQ u0i u1i u2i r0i r1i r2i εti eti 2 σint τ [1,1] Covariance of intercepts, AGE 2 effects Covariance of intercepts, AGE 2SQ effects Variance of AGE 2 effects Covariance of AGE 2 effects, AGE 2SQ effects Variance of AGE 2SQ effects σint,age τ [1,2] σint,age-sq τ [1,3] 2 σage σage,age-sq τ [2,2] τ [2,3] σage-sq τ [3,3] Residual variance σ2 σ2 Variance of intercepts Model 6.1 6.2 6.3 √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ Linear Mixed Models: A Practical Guide Using Statistical Software Random effects General Random Coefficient Models for Longitudinal Data: The Autism Example 259 In (6.1), the AGE 2 variable represents the value of AGE minus 2, and AGE 2SQ represents AGE 2 squared. We include two dummy variables, SICDEGP1 and SICDEGP2, to indicate the first two levels of the SICD group. Because we set the fixed effect for the third level of the SICD group to 0, we consider SICDEGP = 3 as the “reference category.” There are two variables that represent the interaction between age and SICD group: AGE 2 × SICDEGP1 and AGE 2 × SICDEGP2. There are also two variables that represent the interaction between age-squared and SICD group: AGE 2SQ × SICDEGP1 and AGE 2SQ × SICDEGP2. The parameters β0 through β8 represent the fixed effects associated with the intercept, the covariates, and the interaction terms in the model. Because the fixed intercept, β0 , corresponds to the predicted VSAE score when all covariates, including AGE 2, are equal to zero, the intercept can be interpreted as the mean predicted VSAE score for children at 2 years of age in the reference category of the SICD group (SICDEGP = 3). The parameters β1 and β2 represent the fixed effects of age and age-squared for the reference category of the SICD group (SICDEGP = 3). The fixed effects β3 and β4 represent the difference in the intercept for the first two levels of the SICD group vs. the reference category. The fixed effects β5 and β6 represent the differences in the linear effect of age between the first two levels of SICD group and the linear effect of age in the reference category of the SICD group. Similarly, the fixed effects β7 and β8 represent the differences in the quadratic effect of age between the first two levels of the SICD group and the quadratic effect of age in the reference category. The terms u0i , u1i , and u2i in (6.1) represent the random effects associated with the intercept, linear effect of age, and quadratic effect of age for child i. 
The distribution of the vector of the three random effects, ui , associated with child i is assumed to be multivariate normal: ⎛ ⎞ u0i ui = ⎝ u1i ⎠ ∼ N (0, D) u2i Each of the three random effects has a mean of 0, and the variance-covariance matrix, D, for the random effects is: ⎛ ⎞ 2 σint σint,age σint,age-squared 2 σint,age σage σage,age-squared ⎠ D=⎝ 2 σint,age-squared σage,age-squared σage-squared The term εti in (6.1) represents the residual associated with the observation at time t on child i. The distribution of the residuals can be written as εti ∼ N (0, σ2 ) We assume that the residuals are independent and identically distributed, conditional on the random effects, and that the residuals are independent of the random effects. We do not include the specification of other models (e.g., Models 6.2 and 6.3) in this section. These models can be obtained by simplification of Model 6.1. For example, the D matrix in Model 6.2 has the following form, because the random intercepts are omitted from the model: 2 σage σage,age-squared D= 2 σage,age-squared σage-squared 260 6.3.2.2 Linear Mixed Models: A Practical Guide Using Statistical Software Hierarchical Model Specification We now present Model 6.1 in hierarchical form, using the same notation as in (6.1). The correspondence between this notation and the HLM software notation is defined in Table 6.3. The hierarchical model has two components, reflecting contributions from the two levels of the Autism data: Level 1 (the time level), and Level 2 (the child level). Level 1 Model (Time) VSAEti = b0i + b1i × AGE 2ti + b2i × AGE 2SQti + εti (6.2) where εti ∼ N (0, σ 2 ) This model shows that at Level 1 of the data, we have a set of child-specific quadratic regressions of VSAE on AGE 2 and AGE 2SQ. The intercept (b0i ), the linear effect of AGE 2 (b1i ), and the quadratic effect of AGE 2SQ (b2i ) defined in the Level 2 model are allowed to vary between children, who are indexed by i. The unobserved child-specific coefficients for the intercept, linear effect of age, and quadratic effect of age (b0i , b1i , and b2i ) in the Level 1 model depend on fixed effects associated with Level 2 covariates and random child effects, as shown in the Level 2 model below: Level 2 Model (Child) b0i = β0 + β3 × SICDEGP1i + β4 × SICDEGP2i + u0i b1i = β1 + β5 × SICDEGP1i + β6 × SICDEGP2i + u1i b2i = β2 + β7 × SICDEGP1i + β8 × SICDEGP2i + u2i (6.3) where ⎛ ⎞ u0i ui = ⎝ u1i ⎠ ∼ N (0, D) u2i The Level 2 model in (6.3) shows that the intercept (b0i ) for child i depends on the fixed overall intercept (β0 ), the fixed effects (β3 and β4 ) of the child-level covariates SICDEGP1 and SICDEGP2, and a random effect (u0i ) associated with child i. The child-specific linear effect of age (b1i ) depends on the overall fixed effect of age (β1 ), the fixed effect of SICDEGP1 (β5 ), the fixed effect of SICDEGP2 (β6 ), and a random effect (u1i ) associated with child i. The equation for the child-specific quadratic effect of age (b2i ) for child i is defined similarly. The random effects in the Level 2 model allow the child-specific intercepts, linear effects of age, and quadratic effects of age to vary randomly between children. The variance-covariance matrix (D) of the random effects is defined as in the general model specification. The hierarchical specification of Model 6.1 is equivalent to the general specification for this model presented in Subsection 6.3.2.1. 
We can derive the model as specified in (6.1) by substituting the expressions for b0i , b1i , and b2i from the Level 2 model (6.3) into the Level Random Coefficient Models for Longitudinal Data: The Autism Example 261 1 model (6.2). The fixed effects associated with the child-specific covariates SICDEGP1 and SICDEGP2 in the Level 2 equations for b1i and b2i represent interactions between these covariates and AGE 2 and AGE 2SQ in the general model specification. 6.3.3 Hypothesis Tests Hypothesis tests considered in the analysis of the Autism data are summarized in Table 6.4. Hypothesis 6.1. The random effects associated with the quadratic effect of AGE can be omitted from Model 6.2. We indirectly test whether these random effects can be omitted from Model 6.2. The null and alternative hypotheses are: 2 σage 0 H0 : D = 0 0 2 σage σage,age-squared HA : D = 2 σage,age-squared σage-squared To test Hypothesis 6.1, we use a REML-based likelihood ratio test. The test statistic is the value of the –2 REML log-likelihood value for Model 6.2A (the nested model excluding the random quadratic age effects) minus the value for Model 6.2 (the reference model). To obtain a p-value for this statistic, we refer it to a mixture of χ2 distributions, with 1 and 2 degrees of freedom and equal weight 0.5. Hypothesis 6.2. The fixed effects associated with the AGE-squared × SICDEGP interaction are equal to zero in Model 6.2. The null and alternative hypotheses in this case are specified as follows: H0 : β 7 = β 8 = 0 HA : β7 = 0 or β8 = 0 We test Hypothesis 6.2 using an ML-based likelihood ratio test. The test statistic is the value of the –2 ML log-likelihood for Model 6.3 (the nested model excluding the fixed effects associated with the interaction) minus the value for Model 6.2 (the reference model). To obtain a p-value for this statistic, we refer it to a χ2 distribution with 2 degrees of freedom, corresponding to the 2 additional fixed-effect parameters in Model 6.2. Hypothesis 6.3. The fixed effects associated with the AGE × SICDEGP interaction are equal to zero in Model 6.3. The null and alternative hypotheses in this case are specified as follows: H0 : β 5 = β 6 = 0 HA : β5 = 0 or β6 = 0 We test Hypothesis 6.3 using an ML-based likelihood ratio test. The test statistic is the value of the –2 ML log-likelihood for Model 6.4 (the nested model excluding the fixed effects associated with the interaction) minus the value for Model 6.3 (the reference model). To obtain a p-value for this statistic, we refer it to a χ2 distribution with 2 degrees of freedom, corresponding to the 2 additional fixed-effect parameters in Model 6.3. For the results of these hypothesis tests, see Section 6.5. 262 TABLE 6.4: Summary of Hypotheses Tested in the Autism Analysis Models Compared Label Null (H0 ) Alternative (HA ) Test Nested Model (H0 ) Ref. Model (HA ) Est. Method Test Stat. Dist. 
under H0 Drop u2i (random effects associated with AGE-squared) Retain u2i LRT Model 6.2A Model 6.2 REML 0.5χ21 +0.5χ22 6.2 Drop fixed effects associated with AGEsquared by SICDEGP interaction (β7 = β8 = 0) Either β7 = 0, or β8 = 0 LRT Model 6.3 Model 6.2 ML χ22 6.3 Drop fixed effects associated with AGE by SICDEGP interaction (β5 = β6 = 0) Either β5 = 0, or β6 = 0 LRT Model 6.4 Model 6.3 ML χ22 Linear Mixed Models: A Practical Guide Using Statistical Software 6.1 Random Coefficient Models for Longitudinal Data: The Autism Example 6.4 263 Analysis Steps in the Software Procedures In general, when fitting an LMM to longitudinal data using the procedures discussed in this book, all observations available for a subject at any time point are included in the analysis. For this approach to yield correct results, we assume that missing values are missing at random, or MAR (Little & Rubin, 2002). See Subsection 2.9.4 for a further discussion of the MAR concept, and how missing data are handled by software procedures that fit LMMs. We compare results for selected models across the software procedures in Section 6.6. 6.4.1 SAS We first import the comma-separated data file (autism.csv, assumed to be located in the C:\temp directory) into SAS, and create a temporary SAS data set named autism. PROC IMPORT OUT = WORK.autism DATAFILE="C:\temp\autism.csv" DBMS=CSV REPLACE; GETNAMES=YES; DATAROW=2; RUN; Next, we generate a data set named autism2 that contains the new variable AGE 2 (equal to AGE minus 2), and its square, AGE 2SQ. data autism2; set autism; age_2 = age - 2; age_2sq = age_2*age_2; run; Step 1: Fit a model with a “loaded” mean structure (Model 6.1). The SAS syntax for Model 6.1 is as follows: title "Model 6.1"; proc mixed data = autism2 covtest; class childid sicdegp; model vsae = age_2 age_2sq sicdegp age_2*sicdegp age_2sq*sicdegp / solution ddfm = sat; random int age_2 age_2sq / subject = childid type = un g v solution; run; The model statement specifies the dependent variable, VSAE, and lists the terms that have fixed effects in the model. The ddfm = sat option is used to request the Satterthwaite degrees of freedom approximation for the denominator in the F -tests of fixed effects (see Subsection 3.11.6 for a discussion of denominator degrees of freedom options in SAS). The random statement lists the child-specific random effects associated with the intercept (int), the linear effect of age (age_2), and the quadratic effect of age (age_2sq). We specify the structure of the variance-covariance matrix of the random effects (called the G matrix by SAS; see Subsection 2.2.3) as unstructured (type = un). The g option requests that a single block of the estimated G matrix (the 3 × 3 D matrix in our notation) be displayed in 264 Linear Mixed Models: A Practical Guide Using Statistical Software the output, while the v option requests that the estimate of the implied marginal variancecovariance matrix for observations on the first child (the subject variable) be displayed as well. The solution option in the random statement instructs SAS to display the EBLUPs of the three random effects associated with each level of CHILDID. This option can be omitted to shorten the output. Software Note: The following note is displayed in the SAS log after fitting Model 6.1: NOTE: Estimated G matrix is not positive-definite. This message is important, and should not be disregarded (even though SAS does not generate an error, which would cause the model fitting to terminate). 
When such a message is generated in the log, results of the model fit should be interpreted with extreme caution, and the model may need to be simplified or respecified. The NOTE means that proc mixed has converged to an estimated solution for the covariance parameters in the G matrix that results in G being nonpositive-definite. One reason for G being nonpositive-definite is that a variance parameter estimate might be either very small (close to zero), or lie outside the parameter space (i.e., is estimated to be negative). There can be nonpositive-definite G matrices in which this is not the case (i.e., there are cases in which all variance estimates are positive, but the matrix is still not positive-definite). We note in the SAS output that the estimate for the variance of 2 the random intercepts (σint ) is set to zero. One way to investigate the problem encountered with the estimation of the G matrix is to relax the requirement that it be positive-definite by using the nobound option in the proc mixed statement: proc mixed data=autism2 nobound; A single block of the G matrix obtained after specifying the nobound option (corresponding to the D matrix introduced in Subsection 6.3.2) is: # Estimated G Matrix Row Effect childid 1 Intercept 2 age_2 3 age_2sq " 1 1 1 Col1 Col2 Col3 -10.5406 4.2760 0.1423 4.2760 11.9673 -0.4038 0.1423 -0.4038 0.1383 ! Note that this block of the estimated G matrix is symmetric as needed, but that the value of the entry that corresponds to the estimated variance of the random intercepts is negative (-10.54); G is therefore not a variance-covariance matrix. Consequently, we are not able to make a valid statement about the variance of the child-specific intercepts in the context of Model 6.1. If we are not interested in making inferences about the between-child variability, this G matrix is valid in the context of the marginal model implied by Model 6.1. As long as the overall V matrix is positive-definite, we can use the marginal model to make valid inferences about the fixed-effect parameters. This alternative approach, which does not apply constraints to the D matrix (i.e., does not force it to be positive-definite) is not presently available in the other software procedures. Random Coefficient Models for Longitudinal Data: The Autism Example 265 In spite of the problems encountered in fitting Model 6.1, we consider the results generated by proc mixed (using the original syntax, without the nobound option) in Section 6.6 so that we can make comparisons across the software procedures. Step 2: Select a structure for the random effects (Model 6.2 vs. Model 6.2A). We now fit Model 6.2, which has the same fixed effects as Model 6.1 but omits the random effects associated with the child-specific intercepts. We then decide whether to keep the remaining random effects in the model. The syntax for Model 6.2 is as follows: title "Model 6.2"; proc mixed data = autism2 covtest; class childid sicdegp; model vsae = age_2 age_2sq sicdegp age_2*sicdegp age_2sq*sicdegp / solution ddfm = sat; random age_2 age_2sq / subject = childid type = un g; run; Note that the int (intercept) term has been removed from the random statement, which is the only difference between the syntax for Model 6.1 and Model 6.2. SAS does not indicate any problems with the estimation of the G matrix for Model 6.2. We next carry out a likelihood ratio test of Hypothesis 6.1, to decide whether we need to keep the random effects associated with age-squared in Model 6.2. 
To test Hypothesis 6.1, we fit a nested model (Model 6.2A) using syntax much like the syntax for Model 6.1, but omitting the AGE 2SQ term from the random statement, as shown in the following text. We retain the type = un and g options below: random age_2 / subject = childid type = un g; We calculate a likelihood ratio test statistic for Hypothesis 6.1 by subtracting the –2 REML log-likelihood of Model 6.2 (the reference model, –2 REML LL = 4615.3) from that of Model 6.2A (the nested model, –2 REML LL = 4699.2). The p-value for the resulting test statistic is derived by referring it to a mixture of χ2 distributions, with 1 and 2 degrees of freedom and weights equal to 0.5, as shown in the syntax that follows below. The p-value for this test will be displayed in the SAS log. Based on the significant result of this test (p < 0.001), we retain both the random linear and quadratic effects of age in Model 6.2. title "p-value for Hypothesis 6.1"; data _null_; lrtstat = 4699.2 - 4615.3; pvalue = 0.5*(1 - probchi(lrtstat,1)) + 0.5*(1 - probchi(lrtstat,2)); format pvalue 10.8; put pvalue =; run; Step 3: Reduce the model by removing nonsignificant fixed effects (Model 6.2 vs. Model 6.3). In this step, we investigate whether we can reduce the number of fixed effects in Model 6.2 while maintaining the random linear and quadratic age effects. To test Hypothesis 6.2 (where the null hypothesis is that there is no AGE-squared × SICDEGP interaction), we fit Model 6.2 and Model 6.3 using maximum likelihood estimation, by including the method = ML option in the proc mixed statement: 266 Linear Mixed Models: A Practical Guide Using Statistical Software title "Model 6.2 (ML)"; proc mixed data = autism2 covtest method = ML; class childid sicdegp; model vsae = age_2 age_2sq sicdegp age_2*sicdegp age_2sq*sicdegp / solution ddfm = sat; random age_2 age_2sq / subject = childid type = un; run; To fit a nested model, Model 6.3 (also using ML estimation), we remove the interaction term SICDEGP × AGE 2SQ from the model statement: title "Model 6.3 (ML)"; proc mixed data = autism2 covtest method = ML; class childid sicdegp; model vsae = age_2 age_2sq sicdegp age_2*sicdegp / solution ddfm = sat; random age_2 age_2sq / subject = childid type = un; run; We then compute a likelihood ratio test statistic for Hypothesis 6.2 by subtracting the −2 ML log-likelihood of Model 6.2 (the reference model, –2 LL = 4610.4) from that of Model 6.3 (the nested model, –2 LL = 4612.3). The SAS code for this likelihood ratio test is shown in the syntax that follows. Based on the nonsignificant test result (p = 0.39), we drop the fixed effects associated with the SICDEGP × AGE 2SQ interaction from Model 6.2 and obtain Model 6.3. Additional hypothesis tests for fixed effects (i.e., Hypothesis 6.3) do not suggest any further reduction of Model 6.3. title "P-value for Hypothesis 6.2"; data _null_; lrtstat = 4612.3 - 4610.4; df = 2; pvalue = 1 - probchi(lrtstat,df); format pvalue 10.8; put lrtstat= df= pvalue= ; run; We now refit Model 6.3 (our final model) using REML estimation. The ods output statement is included to capture the EBLUPs of the random effects in a data set, eblup_dat, and to get the conditional studentized residuals in another data set, inf_dat. The captured data sets can be used for checking model diagnostics. 
title "Model 6.3 (REML)"; ods output influence = inf_dat solutionR = eblup_dat; ods exclude influence solutionR; proc mixed data = autism2 covtest; class childid sicdegp; model vsae = sicdegp age_2 age_2sq age_2*sicdegp / solution ddfm = sat influence; random age_2 age_2sq / subject = childid solution g v vcorr type = un; run; The ods exclude statement requests that SAS not display the influence statistics for each observation or the EBLUPs for the random effects in the output, to save space. The Random Coefficient Models for Longitudinal Data: The Autism Example 267 ods exclude statement does not interfere with the ods output statement; influence statistics and EBLUPs are still captured in separate data sets, but they are omitted from the output. We must also include the influence option in the model statement and the solution option in the random statement for these data sets to be created. See Chapter 3 (Subsection 3.10.2) for information on obtaining influence statistics and graphics for the purposes of checking model diagnostics using SAS. 6.4.2 SPSS We first import the raw comma-separated data file, autism.csv, from the C:\temp folder into SPSS: GET DATA /TYPE = TXT /FILE ’C:\temp\autism.csv’ /DELCASE = LINE /DELIMITERS = "," /ARRANGEMENT = DELIMITED /FIRSTCASE = 2 /IMPORTCASE = ALL /VARIABLES = age F2.1 vsae F3.2 sicdegp F1.0 childid F2.1 . CACHE. EXECUTE. Next, we compute the new AGE variable (AGE 2) and the squared version of this new variable, AGE 2SQ: COMPUTE age_2 = age - 2 . EXECUTE . COMPUTE age_2sq = age_2*age_2 . EXECUTE. We now proceed with the analysis steps. Step 1: Fit a model with a “loaded” mean structure (Model 6.1). The SPSS syntax for Model 6.1 is as follows: * Model 6.1 . MIXED vsae WITH age_2 age_2sq BY sicdegp /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(l) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = age_2 age_2sq sicdegp age_2*sicdegp age_2sq*sicdegp | SSTYPE(3) /METHOD = REML /PRINT = G SOLUTION /RANDOM INTERCEPT age_2 age_2sq | SUBJECT(CHILDID) COVTYPE(UN) . 268 Linear Mixed Models: A Practical Guide Using Statistical Software The dependent variable, VSAE, is listed first after invocation of the MIXED command. The continuous covariates (AGE 2 and AGE 2SQ) appear after the WITH keyword. The categorical fixed factor, SICDEGP, appears after the BY keyword. The convergence criteria (listed after the /CRITERIA subcommand) are the defaults obtained when the model is set up using the SPSS menu system. The FIXED subcommand identifies the terms with associated fixed effects in the model. The METHOD subcommand identifies the estimation method for the covariance parameters (the default REML method is used). The PRINT subcommand requests that the estimated G matrix be displayed (the displayed matrix corresponds to the D matrix that we defined for Model 6.1 in Subsection 6.3.2). We also request that estimates of the fixed effects (SOLUTION) be displayed in the output. The RANDOM subcommand specifies that the model should include random effects associated with the intercept (INTERCEPT), the linear effect of age (AGE_2), and the quadratic effect of age (AGE_2SQ) for each level of CHILDID. The SUBJECT is specified as CHILDID in the RANDOM subcommand. The structure of the G matrix of variances and covariances of the random effects (COVTYPE) is specified as unstructured (UN) (see Subsection 6.3.2). 
When we attempt to fit Model 6.1 in IBM SPSS Statistics (Version 21), the following warning message appears in the SPSS output: Warnings Iteration was terminated but convergence has not been achieved. The MIXED procedure continues despite this warning. Subsequent results produced are based on the last iteration. Validity of the model fit is uncertain. Although this is a warning message and does not appear to be a critical error (which would cause the model fitting to terminate), it should not be ignored and the model fit should be viewed with caution. It is always good practice to check the SPSS output for similar warnings when fitting a linear mixed model. Investigation of the “Estimates of Covariance Parameters” table in the SPSS output reveals problems. ' $ Estimates of Covariance Parameters (a) Parameter Estimate Std. Error Residual 36.945035 2.830969 Intercept + age_2 + age_2sq [subject = childid] UN UN UN UN UN UN (1,1) 0.000000 (b) 0.000000 (2,1) -15.014722 2.406356 (2,2) 15.389867 3.258686 (3,1) 3.296464 0.237604 (3,2) -0.676210 0.254689 (3,3) 0.135217 0.028072 a. Dependent Variable: vsae. b. This covariance parameter is redundant. & % The second footnote states that the variance of the random effects associated with the INTERCEPT for each child (labeled UN(1,1) in the table) is “redundant.” This variance estimate is set to a value of 0.000000, with a standard error of zero. In spite of the estimation problems encountered, we display results from the fit of Model 6.1 in SPSS in Section 6.6, for comparison with the other software procedures. Random Coefficient Models for Longitudinal Data: The Autism Example 269 Step 2: Select a structure for the random effects (Model 6.2 vs. Model 6.2A). We now fit Model 6.2, which includes the same fixed effects as Model 6.1, but omits the random effects associated with the intercept for each CHILDID. The only change is in the RANDOM subcommand: * Model 6.2 . MIXED vsae WITH age 2 age 2sq BY sicdegp /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(l) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = age 2 age 2sq sicdegp age 2*sicdegp age 2sq*sicdegp | SSTYPE(3) /METHOD = REML /PRINT = G SOLUTION /RANDOM age 2 age 2sq | SUBJECT(CHILDID) COVTYPE(UN) . Note that the INTERCEPT term has been omitted from the RANDOM subcommand in the preceding code. The fit of Model 6.2 does not generate any warning messages. To test Hypothesis 6.1, we fit a nested model (Model 6.2A) by modifying the RANDOM subcommand for Model 6.2 as shown: * Model 6.2A modified RANDOM subcommand. /RANDOM age 2 | SUBJECT(CHILDID). Note that the AGE 2SQ term has been omitted. A likelihood ratio test can now be carried out by subtracting the –2 REML log-likelihood for Model 6.2 (the reference model) from that of Model 6.2A (the reduced model). The p-value for the test statistic is derived by referring it to a mixture of χ2 distributions, with equal weight 0.5 and 1 and 2 degrees of freedom (see Section 6.5.1). Based on the significant result (p < 0.001) of this test, we retain the random effects associated with the quadratic (and therefore linear) effects of age in Model 6.2. Step 3: Reduce the model by removing nonsignificant fixed effects (Model 6.2 vs. Model 6.3). We now proceed to reduce the number of fixed effects in the model, while maintaining the random-effects structure specified in Model 6.2. To test Hypothesis 6.2, we first refit Model 6.2 using ML estimation (/METHOD = ML): * Model 6.2 (ML) . 
MIXED vsae WITH age 2 age 2sq BY sicdegp /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(l) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = age 2 age 2sq sicdegp age 2*sicdegp age 2sq*sicdegp | SSTYPE(3) /METHOD = ML /PRINT = G SOLUTION /RANDOM age 2 age 2sq | SUBJECT(CHILDID) COVTYPE(UN) . 270 Linear Mixed Models: A Practical Guide Using Statistical Software Next, we fit a nested model (Model 6.3) by removing the term representing the interaction between SICDEGP and the quadratic effect of age, SICDEGP × AGE 2SQ, from the /FIXED subcommand. We again use /METHOD = ML (the other parts of the full MIXED command are not shown here): /FIXED = age_2 age_2sq sicdegp age_2*sicdegp | SSTYPE(3) /METHOD = ML Based on the nonsignificant likelihood ratio test (p = 0.39; see Subsection 6.5.2), we conclude that the fixed effects associated with this interaction can be dropped from Model 6.2, and we proceed with Model 6.3. Additional likelihood ratio tests (e.g., a test of Hypothesis 6.3) suggest no further reduction of Model 6.3. We now fit Model 6.3 (the final model in this example) using REML estimation: * Model 6.3 (REML). MIXED vsae WITH age_2 age_2sq BY sicdegp /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(l) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = age_2 age_2sq sicdegp age_2*sicdegp | SSTYPE(3) /METHOD = REML /PRINT = G SOLUTION /SAVE = PRED RESID /RANDOM age_2 age_2sq | SUBJECT(CHILDID) COVTYPE(UN) . The SAVE subcommand is added to the syntax to save the conditional predicted values (PRED) in the data set. These predicted values are based on the fixed effects and the EBLUPs of the random AGE 2 and AGE 2SQ effects. We also save the conditional residuals (RESID) in the data set. These variables can be used for checking assumptions about the residuals for this model. 6.4.3 R We start with the same data frame object (autism.updated) that was used for the initial data summary in R (Subsection 6.2.2), but we first create additional variables that will be used in subsequent analyses. Note that we create the new variable SICDEGP2, which has a value of zero for SICDEGP = 3, so that SICD group 3 will be considered the reference category (lowest value) for the SICDEGP2.F factor. We do this to be consistent with the output from the other software procedures. > # Compute age.2 (AGE minus 2) and age.2sq (AGE2 squared). > age.2 <- age - 2 > age.2sq <- age.2*age.2 > > > > > > # Recode the SICDEGP factor for model fitting. sicdegp2 <- sicdegp sicdegp2[sicdegp == 3] <- 0 sicdegp2[sicdegp == 2] <- 2 sicdegp2[sicdegp == 1] <- 1 sicdegp2.f <- factor(sicdegp2) Random Coefficient Models for Longitudinal Data: The Autism Example 271 > # Omit two records with VSAE = NA, and add the recoded > # variables to the new data frame object. > autism.updated <- subset(data.frame(autism, sicdegp2.f, age.2), !is.na(vsae)) Alternatively, the new variable SICDEGP2 can be obtained using this syntax: > sicdegp2 <- cut(sicdegp, breaks = 0:3, labels= FALSE) 6.4.3.1 Analysis Using the lme() Function We first load the nlme library, so that we can utilize the nlme version of the lme() function for this example: > library(nlme) Next, we create a “grouped” data frame object named autism.grouped, using the groupedData() function, to define the hierarchical structure of the Autism data set. 
The arguments of this function indicate that (1) VSAE is the dependent variable, (2) AGE 2 is the primary covariate, and (3) CHILDID defines the “groups” of observations with which random effects are associated when fitting the models. Note that the groupedData() function is only available after loading the nlme library: > autism.grouped <- groupedData(vsae ~ age.2 | childid, data = autism.updated, order.groups = F) The order.groups = F argument is specified to preserve the original order of the children in the input data set (children are sorted in descending order by SICDEGP, so SICDEGP = 3 is the first group in the data set). We now continue with the analysis steps. Step 1: Fit a model with a loaded mean structure (Model 6.1). We first fit Model 6.1 using the lme() function (note that the modified version of the AGE variable, AGE 2, is used): > model6.1.fit <- lme(vsae ~ age.2 + I(age.2^2) + sicdegp2.f + age.2:sicdegp2.f + I(age.2^2):sicdegp2.f, random = ~ age.2 + I(age.2^2), method= "REML", data = autism.grouped) This specification of the lme() function is described as follows: • The name of the created object that contains the results of the fitted model is model6.1.fit. • The model formula, vsae ~ age.2 + I(age.2^2) + sicdegp2.f +age.2:sicdegp2.f + I(age.2^2):sicdegp2.f, defines the continuous response variable (VSAE) and the terms that have fixed effects in the model (including the interactions). The lme() function automatically creates the appropriate dummy variables for the categories of the SICDEGP2.F factor, treating the lowest-valued category (which corresponds to the original value of SICDEGP=3) as the reference. The I() function is used to inhibit R from interpreting the “^” character as an operator in the fixed-effects formula (as opposed to an arithmetic operator meaning “raised to the power of”). A fixed intercept is included by default. 272 Linear Mixed Models: A Practical Guide Using Statistical Software • The second argument of the function, random = ~ age.2 + I(age.2^2), indicates the variables that have random effects associated with them. A random effect associated with the intercept for each level of CHILDID is included by default. These random effects are associated with each level of CHILDID because of the definition of the grouped data frame object. • The third argument of the function, method= "REML", requests that the default REML estimation method is to be used. • The fourth argument of the function, data = autism.grouped, indicates the “grouped” data frame object to be used. By default, the lme() function uses an unstructured D matrix for the variancecovariance matrix of the random effects. After fitting Model 6.1, the following message is displayed: Error in lme.formula(vsae) ~ age.2 + I(age.2^2) + sicdegp2.f + age.2:sicdegp2.f + : nlminb problem, convergence error code = 1 iteration limit reached without convergence (10) The estimation algorithm did not converge to a solution for the parameter estimates. As a result, the model6.1.fit object is not created, and estimates of the parameters in Model 6.1 cannot be obtained using the summary() function. We proceed to consider Model 6.2 as an alternative. Step 2: Select a structure for the random effects (Model 6.2 vs. Model 6.2A). We now fit Model 6.2, which includes the same fixed effects as in Model 6.1, but omits the random effects associated with the intercept for each child. 
We omit these random effects by including a -1 in the random-effects specification: > model6.2.fit <- lme(vsae ~ age.2 + I(age.2^2) + sicdegp2.f + age.2:sicdegp2.f + I(age.2^2):sicdegp2.f, random = ~ age.2 + I(age.2^2) - 1, method= "REML", data = autism.grouped) Results from the fit of Model 6.2 are accessible using summary(model6.2.fit). To decide whether to keep the random effects associated with the quadratic effects of age in Model 6.2, we test Hypothesis 6.1 using a likelihood ratio test. To do this, we fit a nested model (Model 6.2A) by removing AGE 2SQ (specifically, I(age.2^2)) from the random portion of the syntax: > model6.2a.fit <- update(model6.2.fit, random = ~ age.2 - 1) A likelihood ratio test is performed by subtracting the –2 REML log-likelihood for Model 6.2 (4615.3) from that of Model 6.2A (4699.2). This difference (83.9) follows a mixture of χ2 distributions, with equal weight 0.5 and 1 and 2 degrees of freedom; more details for this test are included in Subsection 6.5.1. We calculate a p-value for this test statistic in R by making use of the pchisq() function: > h6.1.pvalue <- 0.5*(1-pchisq(83.9,1)) + 0.5*(1-pchisq(83.9,2)) > h6.1.pvalue The significant (p < 0.001) likelihood ratio test for Hypothesis 6.1 indicates that the random effects associated with the quadratic (and therefore linear) effects of age should be retained in Model 6.2 and in all subsequent models. Random Coefficient Models for Longitudinal Data: The Autism Example 273 Step 3: Reduce the model by removing nonsignificant fixed effects (Model 6.2 vs. Model 6.3). To test the fixed effects associated with the age-squared by SICD group interaction (Hypothesis 6.2), we first refit Model 6.2 (the reference model) using maximum likelihood estimation (method = "ML"): > model6.2.ml.fit <- update(model6.2.fit, method = "ML") Next, we consider a nested model (Model 6.3), with the interaction between age-squared and SICD group omitted. To fit Model 6.3 using ML estimation, we update the fixed part of Model 6.2 by omitting the interaction between the squared version of AGE 2 and SICDEGP2.F: > model6.3.ml.fit <- update(model6.2.ml.fit, fixed = ~ age.2 + I(age.2^2) + sicdegp2.f + age.2:sicdegp2.f) We use the anova() function to perform a likelihood ratio test for Hypothesis 6.2: > anova(model6.2.ml.fit, model6.3.ml.fit) Based on the p-value for the test of Hypothesis 6.2 (p = 0.39; see Subsection 6.5.2), we drop the fixed effects associated with this interaction and obtain Model 6.3. An additional likelihood ratio test for the fixed effects associated with the age by SICD group interaction (i.e., Hypothesis 6.3) does not suggest that these fixed effects should be dropped from Model 6.3. We therefore refit our final model, Model 6.3, using REML estimation. To obtain Model 6.3 we update Model 6.2 with a previously used specification of the fixed argument: > model6.3.fit <- update(model6.2.fit, fixed = ~ age.2 + I(age.2^2) + sicdegp2.f + age.2:sicdegp2.f) The results obtained by applying the summary() function to the model6.3.fit object are displayed in Section 6.6. Section 6.9 contains R syntax for checking the Model 6.3 diagnostics. 6.4.3.2 Analysis Using the lmer() Function We first load the lme4 library, so that we can utilize the lmer() function for this example: > library(lme4) Step 1: Fit a model with a loaded mean structure (Model 6.1). 
Next, we fit Model 6.1 using the lmer() function (note that the modified version of the AGE variable, AGE 2, is used): > model6.1.fit.lmer <- lmer(vsae ~ age.2 + I(age.2^2) + sicdegp2.f + age.2*sicdegp2.f + I(age.2^2)*sicdegp2.f + (age.2 + I(age.2^2) | childid), REML = T, data = autism.updated) This specification of the lmer() function is described as follows: • The name of the created object that contains the results of the fitted model is model6.1.fit.lmer. 274 Linear Mixed Models: A Practical Guide Using Statistical Software • The first portion of the model formula, vsae ~ age.2 + I(age.2^2) + sicdegp2.f + age.2*sicdegp2.f + I(age.2^2)*sicdegp2.f, defines the continuous response variable (VSAE), and the terms that have fixed effects in the model (including the interactions). The lmer() function automatically creates the appropriate dummy variables for the categories of the SICDEGP2.F factor, treating the lowest-valued category (which corresponds to the original value of SICDEGP = 3) as the reference. The I() function is used to inhibit R from interpreting the “^” character as an operator in the fixed-effects formula (as opposed to an arithmetic operator meaning “raised to the power of”). A fixed intercept is included by default. • The second portion of the model formula, + (age.2 + I(age.2^2) | childid), indicates the variables that have random effects associated with them (in parentheses). A random effect associated with the intercept for each level of CHILDID is included by default. These random effects are associated with each level of CHILDID by the use of | childid to “condition” the effects on CHILDID. • The third argument of the function, REML = T, requests that the REML estimation method is to be used. • The fourth argument of the function, data = autism.updated, indicates the “updated” data frame object to be used. By default, the lmer() function uses an unstructured D matrix for the variancecovariance matrix of the random effects. After fitting Model 6.1, we do not see an error message indicating lack of convergence, as was the case when performing the analysis using the lme() function. However, when applying the summary() function to the model fit object to examine the estimates of the parameters in this model, the estimate of the variance of the random CHILDID intercepts is displayed as 1.9381e − 10, which indicates that this estimate is essentially equal to 0. Estimates of variance components that are set to zero generally indicate that the randomeffects specification of a model should be reconsidered. We proceed to consider Model 6.2 as an alternative. Step 2: Select a structure for the random effects (Model 6.2 vs. Model 6.2A). We now fit Model 6.2, which includes the same fixed effects as in Model 6.1, but omits the random effects associated with the intercept for each child. We omit these random effects by including a -1 in the random-effects specification: > model6.2.fit.lmer <- lmer(vsae ~ age.2 + I(age.2^2) + sicdegp2.f + age.2*sicdegp2.f + I(age.2^2)*sicdegp2.f + (age.2 + I(age.2^2) - 1 | childid), REML = T, data = autism.updated) Results from the fit of Model 6.2 are accessible using summary(model6.2.fit.lmer). To decide whether to keep the random effects associated with the quadratic effects of age in Model 6.2, we test Hypothesis 6.1 using a likelihood ratio test. 
To do this, we fit a nested model (Model 6.2A) by removing AGE 2SQ (specifically, I(age.2^2)) from the random portion of the syntax: > model6.2a.fit.lmer <- lmer(vsae ~ age.2 + I(age.2^2) + sicdegp2.f + age.2*sicdegp2.f + I(age.2^2)*sicdegp2.f + (age.2 - 1 | childid), REML = T, data = autism.updated) Random Coefficient Models for Longitudinal Data: The Autism Example 275 A likelihood ratio test is performed by subtracting the –2 REML log-likelihood for Model 6.2 (4615.3) from that of Model 6.2A (4699.2). This difference (83.9) follows a mixture of χ2 distributions, with equal weight 0.5 and 1 and 2 degrees of freedom; more details for this test are included in Subsection 6.5.1. We calculate a p-value for this test statistic in R by making use of the pchisq() function: > h6.1.pvalue <- 0.5*(1-pchisq(83.9,1)) + 0.5*(1-pchisq(83.9,2)) > h6.1.pvalue The significant (p < 0.001) likelihood ratio test for Hypothesis 6.1 indicates that the random effects associated with the quadratic (and therefore linear) effects of age should be retained in Model 6.2 and in all subsequent models. Step 3: Reduce the model by removing nonsignificant fixed effects (Model 6.2 vs. Model 6.3). To test the fixed effects associated with the age-squared by SICD group interaction (Hypothesis 6.2), we first refit Model 6.2 (the reference model) using maximum likelihood estimation (note the use of REML = F): > model6.2.ml.fit.lmer <- lmer(vsae ~ age.2 + I(age.2^2) + sicdegp2.f + age.2*sicdegp2.f + I(age.2^2)*sicdegp2.f + (age.2 + I(age.2^2) - 1 | childid), REML = F, data = autism.updated) Next, we consider a nested model (Model 6.3), with the interaction between age-squared and SICD group omitted. To fit Model 6.3 using ML estimation, we update the fixed part of Model 6.2 by omitting the interaction between these two terms: > model6.3.ml.fit.lmer <- lmer(vsae ~ age.2 + I(age.2^2) + sicdegp2.f + age.2*sicdegp2.f + (age.2 + I(age.2^2) - 1 | childid), REML = F, data = autism.updated) We then use the anova() function to perform a likelihood ratio test for Hypothesis 6.2: > anova(model6.2.ml.fit.lmer, model6.3.ml.fit.lmer) Based on the p-value for the test of Hypothesis 6.2 (p = 0.39; see Subsection 6.5.2), we drop the fixed effects associated with this interaction and obtain Model 6.3. An additional likelihood ratio test for the fixed effects associated with the age by SICD group interaction (i.e., Hypothesis 6.3) does not suggest that these tested fixed effects should be dropped from Model 6.3. We therefore refit our final model, Model 6.3, using REML estimation (REML = T). To obtain Model 6.3, we update Model 6.2 with a previously used specification of the fixed effects: > model6.3.fit.lmer <- lmer(vsae ~ age.2 + I(age.2^2) + sicdegp2.f + age.2*sicdegp2.f + (age.2 + I(age.2^2) - 1 | childid), REML = T, data = autism.updated) The results obtained by applying the summary() function to the model6.3.fit.lmer object are displayed in Section 6.6. Section 6.9 contains R syntax for checking the Model 6.3 diagnostics. 276 Linear Mixed Models: A Practical Guide Using Statistical Software 6.4.4 Stata We begin the analysis by importing the comma-separated values file (autism.csv) containing the Autism data into Stata: . insheet using "C:\temp\autism.csv", comma We generate the variable AGE 2 by subtracting 2 from AGE, and then square this new variable to generate AGE 2SQ: . gen age_2 = age - 2 . gen age_2sq = age_2 * age_2 We now proceed with the analysis. Step 1: Fit a model with a “loaded” mean structure (Model 6.1). 
We first fit Model 6.1 using the mixed command, with the reml estimation option: . * Model 6.1 (REML) . mixed vsae ib3.sicdegp age_2 age_2sq ib3.sicdegp#c.age_2 ib3.sicdegp#c.age_2sq || childid: age_2 age_2sq, covariance(unstruct) variance reml The ib3. notation causes Stata to generate the appropriate indicator variables for the SICDEGP factor, using the highest level of SICDEGP (3) as the reference category. The terms listed after the dependent variable (VSAE) represent the fixed effects that we wish to include in this model, and include the two-way interactions between SICDEGP (treated as a categorical variable with level 3 as the reference category, using ib3.) and both AGE 2 and AGE 2SQ (both specified as continuous variables in the interaction terms using the c. notation, which is necessary for correct specification of these interactions). The random effects portion of the model is specified following the fixed effects portion, after two vertical lines (||), as follows: || childid: age_2 age_2sq, covariance(unstruct) variance We specify CHILDID as the grouping variable that identifies the Level 2 units. The variables listed after the colon (:) are the Level 1 (time-varying) covariates with effects that vary randomly between children. We list AGE 2 and AGE 2SQ to allow both the linear and quadratic effects of age to vary from child to child. A random effect associated with the intercept for each child will be included by default, and does not need to be explicitly specified. The structure of the variance-covariance matrix of the random effects (D) is specified as unstructured, by using the option covariance(unstruct). The variance option is used so that Stata will display the estimated variances of the random effects and their standard errors in the output, rather than the default standard deviations of the random effects. When attempting to fit this model in Stata, the following error message appears in red in the output: Hessian is not negative semidefinite conformability error r(503) Random Coefficient Models for Longitudinal Data: The Autism Example 277 While this error message is slightly cryptic, recall from Chapter 2 that the Hessian matrix is used to compute standard errors of the estimated parameters in a linear mixed model. This error message therefore indicates that there are problems with the Hessian matrix that was computed based on the specified model, and that these problems are preventing estimation of the standard errors (Stata does not provide any output for this model). This message should be considered seriously when it is encountered, and the model will need to be simplified or respecified. We now consider Model 6.2 as an alternative. Step 2: Select a structure for the random effects (Model 6.2 vs. Model 6.2A). We now fit Model 6.2, which includes the same fixed effects as Model 6.1, but excludes the random effects associated with the intercept for each child, by using the noconst option: . * Model 6.2 . mixed vsae ib3.sicdegp age_2 age_2sq ib3.sicdegp#c.age_2 ib3.sicdegp#c.age_2sq || childid: age_2 age_2sq, noconst covariance(unstruct) variance reml The resulting output does not indicate a problem with the computation of the standard errors. The information criteria associated with the fit of Model 6.2 can be obtained by submitting the estat ic command after estimation of the model has finished: . estat ic A likelihood ratio test of Hypothesis 6.1 is performed by fitting a nested model (Model 6.2A) that omits the random effects associated with AGE 2SQ: . 
* Model 6.2A . mixed vsae ib3.sicdegp age_2 age_2sq ib3.sicdegp#c.age_2 ib3.sicdegp#c.age_2sq || childid: age_2, noconst covariance(unstruct) variance reml We calculate a test statistic by subtracting the –2 REML log-likelihood for Model 6.2 from that for Model 6.2A. We then refer this test statistic to a mixture of χ2 distributions, with equal weight 0.5 and 1 and 2 degrees of freedom: . di (-2 * -2349.6013) - (-2 * -2307.6378) 83.927 . di 0.5 * chi2tail(1,83.927) + 0.5 * chi2tail(2,83.927) 3.238e-19 The significant result (p < 0.001) of this likelihood ratio test indicates that the variance of the random effects associated with the quadratic effect of age (Hypothesis 6.1) is significant in Model 6.2, so both the quadratic and the linear random effects are retained in all subsequent models (see Subsection 6.5.1 for more details). Step 3: Reduce the model by removing nonsignificant fixed effects (Model 6.2 vs. Model 6.3). We now refit Model 6.2 using maximum likelihood estimation, so that we can carry out likelihood ratio tests for the fixed effects in the model (starting with Hypothesis 6.2). This is accomplished by specifying the mle option: 278 Linear Mixed Models: A Practical Guide Using Statistical Software . * Model 6.2 (ML) . mixed vsae ib3.sicdegp age_2 age_2sq ib3.sicdegp#c.age_2 ib3.sicdegp#c.age_2sq || childid: age_2 age_2sq, noconst covariance(unstruct) variance mle After fitting Model 6.2 with ML estimation, we store the results in an object named model6_2_ml: . est store model6_2_ml We now omit the two fixed effects associated with the SICDEGP × AGE 2SQ interaction, and fit a nested model (Model 6.3): . * Model 6.3 (ML) . mixed vsae ib3.sicdegp age_2 age_2sq ib3.sicdegp#c.age_2 || childid: age_2 age_2sq, noconst covariance(unstruct) variance mle We store the results from the Model 6.3 fit in a second object named model6_3_ml: . est store model6_3_ml Finally, we perform a likelihood ratio test of Hypothesis 6.2 using the lrtest command: . lrtest model6_2_ml model6_3_ml The resulting test statistic (which follows a χ2 distribution with 2 degrees of freedom, corresponding to the 2 fixed effects omitted from Model 6.2) is not significant (p = 0.39). We therefore simplify the model by excluding the fixed effects associated with the SICD group by age-squared interaction, and obtain Model 6.3. Additional likelihood ratio tests can be performed for the remaining fixed effects in the model, beginning with the SICD group by age interaction (Hypothesis 6.3). No further model reduction is indicated; all remaining fixed effects are significant at the 0.05 level, and are retained. We fit the final model (Model 6.3) using REML estimation, and obtain the model fit criteria: . * Model 6.3 (REML) . mixed vsae ib3.sicdegp age_2 age_2sq ib3.sicdegp#c.age_2 || childid: age_2 age_2sq, noconst covariance(unstruct) variance reml . estat ic 6.4.5 6.4.5.1 HLM Data Set Preparation To perform the analysis of the Autism data using the HLM software package, two separate data sets need to be prepared: 1. The Level 1 (time-level) data set contains a single observation (row of data) for each observed age on each child, and is similar in structure to Table 6.2. The data set should include CHILDID, the time variable (i.e., AGE), a variable representing AGE-squared, and any other variables of interest measured over time for each child, including the response variable, VSAE. This data set must be sorted by CHILDID and by AGE within CHILDID. 
2. The Level 2 (child-level) data set contains a single observation (row of data) for each level of CHILDID. The variables in this file represent measures that remain constant for a given child. This includes variables collected at baseline (e.g., SICDEGP) or demographic variables (e.g., GENDER) that do not change as the child gets older. This data set must also include the CHILDID variable, and be sorted by CHILDID.

Because the HLM program does not automatically create indicator variables for the levels of categorical predictors, we need to create two indicator variables representing the two nonreference levels of SICDEGP in the Level 2 data file prior to importing the data into HLM. For example, if the input data files were created in SPSS, the syntax used to compute the two appropriate indicator variables in the Level 2 file would look like this:

COMPUTE sicdegp1 = (sicdegp = 1) .
EXECUTE .
COMPUTE sicdegp2 = (sicdegp = 2) .
EXECUTE .

We also create a variable AGE 2, which is equal to the original AGE variable minus two (i.e., AGE 2 is equal to zero when the child is 2 years old), and create the squared version of this new variable (AGE 2SQ). Both of these variables are created in the Level 1 data set prior to importing it into HLM. SPSS syntax to compute these new variables is shown below:

COMPUTE age_2 = age - 2 .
EXECUTE .
COMPUTE age_2sq = age_2 * age_2 .
EXECUTE .

Once the Level 1 and Level 2 data sets have been created, we can proceed to prepare the multivariate data matrix (MDM) file in HLM.

6.4.5.2 Preparing the MDM File

We start by creating a new MDM file using the Level 1 and Level 2 data sets described above. In the main HLM menu, click File, Make new MDM file, and then Stat package input. In the window that opens, select HLM2 to fit a two-level hierarchical linear model with random coefficients, and click OK. Select the Input File Type as SPSS/Windows.

To make the MDM file for Model 6.1, locate the Level 1 Specification area and Browse to the location of the Level 1 data set. Click the Choose Variables button and select the following variables in the Level 1 file to be used in the Model 6.1 analysis: CHILDID (click on "ID" for the CHILDID variable), AGE 2 (click "in MDM" for this time-varying variable), AGE 2SQ (click "in MDM"), and the time-varying dependent variable, VSAE (click "in MDM"). Next, locate the Level 2 Specification area, and Browse to the location of the Level 2 data set. Click the Choose Variables button to include CHILDID (click "ID") and the two indicator variables for the nonreference levels of SICDEGP (SICDEGP1 and SICDEGP2) in the MDM file. Click "in MDM" for these two indicator variables.

After making these choices, select the longitudinal (occasions within persons) radio option for the structure of this longitudinal data set (for notation purposes only). Also, select Yes for Missing Data? in the Level 1 data set (because some children have missing data), and make sure that the option to Delete missing data when: running analyses is selected. Enter a name for the MDM file with a .mdm extension in the upper-right corner of the MDM window, save the .mdmt template file under a new name (click Save mdmt file), and click Make MDM. After HLM has processed the MDM file, click the Check Stats button to view descriptive statistics for the variables in the Level 1 and Level 2 files (this is not optional).
Be sure that the desired number of records has been read into the MDM file and that there are no unusual values for the variables. Click Done to begin building Model 6.1. Step 1: Fit a model with a loaded mean structure (Model 6.1). In the model-building window, select VSAE from the list of variables, and then click on Outcome variable. This will cause the initial “unconditional” model without covariates (or with intercepts only) to be displayed, broken down into the Level 1 and the Level 2 models. To add more informative subscripts to the model specification, click File and Preferences, and then choose Use level subscripts. We now set up the Level 1 portion of Model 6.1 by adding the effects of the time-varying covariates AGE 2 and AGE 2SQ. Click the Level 1 button in the model-building window, and then click the AGE 2 variable. Choose add variable uncentered. The Level 1 model shows that the AGE 2 covariate has been added, along with its child-specific coefficient, π1i . Repeat this process, adding the uncentered version of the AGE 2SQ variable to the Level 1 model: Model 6.1: Level 1 Model VSAEti = π0i + π1i (AGE 2ti ) + π2i (AGE 2SQti ) + eti At this point, the Level 2 equation for the child-specific intercept (π0i ) contains a random effect for each child (r0i ). However, the coefficients for AGE 2 and AGE 2SQ (π1i and π2i , respectively) are simply defined as constants, and do not include any random effects. We can change this by clicking the r1i and r2i terms in the Level 2 model to add random child-specific effects to the coefficients of AGE 2 and AGE 2SQ, as shown in the following preliminary Level 2 model: Model 6.1: Level 2 Model (Preliminary) π0i = β00 + r0i π1i = β10 + r1i π2i = β20 + r2i This specification implies that the child-specific intercept (π0i ) depends on the overall intercept (β00 ) and the child-specific random effect associated with the intercept (r0i ). The effects of AGE 2 and AGE 2SQ (π1i and π2i , respectively) depend on the overall fixed effects of AGE 2 and AGE 2SQ (β10 and β20 ) and the random effects associated with each child (r1i and r2i , respectively). As a result, the initial VSAE value at age 2 and the trajectory of VSAE are both allowed to vary randomly between children. To complete the specification of Model 6.1, we need to add the uncentered versions of the indicator variables for the first two levels of SICDEGP to the Level 2 equations for the intercept (π0i ), the linear effect of age (π1i ), and the quadratic effect of age (π2i ). Click the Level 2 button in the model-building window. Then, click on each Level 2 equation, and click on the two indicator variables to add them to the equations (uncentered). The completed Level 2 model now appears as follows: Random Coefficient Models for Longitudinal Data: The Autism Example 281 Model 6.1: Level 2 Model (Final) π0i = β00 + β01 (SICDEGP1i ) + β02 (SICDEGP2i ) + r0i π1i = β10 + β11 (SICDEGP1i ) + β12 (SICDEGP2i ) + r1i π2i = β20 + β21 (SICDEGP1i ) + β22 (SICDEGP2i ) + r2i Adding SICDEGP1i and SICDEGP2i to the Level 2 equation for the child-specific intercept (π0i ) shows that the main effects (β01 and β02 ) of these indicator variables represent changes in the intercept (i.e., the expected VSAE response at age 2) for SICDEGP groups 1 and 2 relative to the reference category (SICDEGP = 3). 
By including the SICDEGP1i and SICDEGP2i indicator variables in the Level 2 equations for the effects of AGE 2 and AGE 2SQ (π1i and π2i ), we imply that the interactions between these indicator variables and AGE 2 and AGE 2SQ will be included in the overall linear mixed model, as shown below. We can view the overall linear mixed model by clicking the Mixed button in the model-building window: Model 6.1: Overall Mixed Model VSAEti = β00 + β01 ∗ SICDEGP1i + β02 ∗ SICDEGP2i + β10 ∗ AGE 2ti + β11 ∗ SICDEGP1i ∗ AGE 2ti + β12 ∗ SICDEGP2i ∗ AGE 2ti + β20 ∗ AGE 2SQti + β21 ∗ SICDEGP1i ∗ AGE 2SQti + β22 ∗ SICDEGP2i ∗ AGE 2SQti + r0i + r1i ∗ AGE 2ti + r2i ∗ AGE 2SQti + eti This model is the same as the general specification of Model 6.1 (see (6.1)) introduced in Subsection 6.3.2.1, although the notation is somewhat different. The correspondence between the HLM notation and the general notation that we used in Subsection 6.3.2 is displayed in Table 6.3. Note that we can also derive this form of the overall linear mixed model by substituting the expressions for the child-specific effects in the Level 2 model into the Level 1 model. After specifying Model 6.1, click Basic Settings to enter a title for this analysis (such as “Autism Data: Model 6.1”) and a name for the output (.html) file that HLM generates when it fits this model. We do not need to alter the outcome variable distribution setting, because the default is Normal (Continuous). Click OK to return to the model-building window, and then click File and Save As to save this model specification in a new .hlm file. Finally, click Run Analysis to fit the model. HLM by default uses REML estimation to estimate the covariance parameters in Model 6.1. We see the following message generated by HLM when it attempts to fit Model 6.1: The maximum number of iterations has been reached, but the analysis has not converged. Do you want to continue until convergence? At this point, one can request that the iterative estimation procedure continue by typing a “y” and hitting enter. After roughly 1600 iterations, the analysis finishes running. The large number of iterations required for the REML estimation algorithm to converge to a solution indicates a potential problem in fitting the model. We can click on File and View Output to see the results from this model fit. 282 Linear Mixed Models: A Practical Guide Using Statistical Software Despite the large number of iterations required to fit Model 6.1, the HLM results for this model are displayed in Section 6.6 for comparison with the other software procedures. Step 2: Select a structure for the random effects (Model 6.2 vs. Model 6.2A). We now fit Model 6.2, which includes the same fixed effects as Model 6.1 but does not have child-specific random effects associated with the intercept. To remove the random effects associated with the intercept from the model, simply click the r0i term in the Level 2 equation for the child-specific intercept. The new Level 2 equation for the intercept is: Model 6.2: Level 2 Equation for Child-Specific Intercept π0i = β00 + β01 (SICDEGP1i ) + β02 (SICDEGP2i ) Model 6.2 implies that the intercept for a given child is a function of their SICD group, but does not vary randomly from child to child. Click Basic Settings to enter a different title for this analysis (such as “Autism Data: Model 6.2”) and to change the name of the output file. Then click File and Save As to save the new model specification in a different .hlm file, so that the previous .hlm file is not overwritten. 
Click Run Analysis to fit Model 6.2. The REML algorithm converges to a solution in only 19 iterations. The default χ2 tests for the covariance parameters reported by HLM in the output for Model 6.2 suggest that there is significant variability in the child-specific linear and quadratic effects of age (rejecting the null hypothesis for Hypothesis 6.1), so we retain the random effects associated with AGE 2 and AGE 2SQ in all subsequent models. Note that the tests reported by HLM are not likelihood ratio tests, with p-values based on a mixture of χ2 distributions, as we reported in the other software procedures; see pages 63-64 of Raudenbush & Bryk (2002) for more details on these tests. We do not illustrate how to carry out a REML-based likelihood ratio test of Hypothesis 6.1 using HLM2.2 We can perform this likelihood ratio test (with a p-value based on a mixture of χ2 distributions) in HLM2 by calculating the difference in the deviance statistics reported for a reference (Model 6.2) and a nested (Model 6.2A) model, as long as at least one random effect is retained in the Level 2 models that are being compared. Subsection 6.5.1 provides more detail on the likelihood ratio test for the random quadratic age effects considered in this example. Step 3: Reduce the model by removing nonsignificant fixed effects (Model 6.2 vs. Model 6.3). In this step, we test the fixed effects in the model, given that the random child effects associated with the linear and quadratic effects of age are included. We begin by testing the SICD group by age-squared interaction (Hypothesis 6.2) in Model 6.2. We first refit Model 6.2 using maximum likelihood (ML) estimation. To do this, click Other Settings and then Estimation Settings in the model-building window. Select the Full maximum likelihood option (as opposed to REML estimation, which is the default), and click OK. Then click Basic Settings, save the output file under a different name, and enter a different title for the analysis (such as “Model 6.2: Maximum Likelihood”). Save the .hlm file under a new name, and click Run Analysis to refit the Model 6.2 using ML estimation. 2 HLM uses chi-square tests for covariance parameters by default (see Chapter 3). Likelihood ratio tests may also be calculated, as long as at least one random effect is retained in the Level 2 model for both the reference and nested models. Random Coefficient Models for Longitudinal Data: The Autism Example 283 We now fit a nested model (Model 6.3) that omits the fixed effects associated with the SICD group by age-squared interaction. To do this, we remove the SICDEGP1 and SICDEGP2 terms from the Level 2 model for the child-specific effect of AGE 2SQ (π2i ). Click on this Level 2 equation in the model-building window, click the SICDEGP1 variable in the variable list, and then select Delete variable from model. Do the same for the SICDEGP2 variable. The equation for the child-specific quadratic effect of age (i.e., the effect of AGE 2SQ) now appears as follows: Model 6.3: Level 2 Equation for Child-Specific Quadratic Effect of Age π2i = β20 + r2i This reduced model implies that the child-specific effect of AGE 2SQ depends on an overall fixed effect (β20 ) and a random effect associated with the child (r2i ). The childspecific effect of AGE 2SQ no longer depends on the SICD group of the child in this reduced model. After removing these fixed effects, click on Basic Settings, and save the output in a different file, so that the original ML fit of Model 6.2 is not overwritten. 
To perform a likelihood ratio test comparing the fit of the nested model (Model 6.3) to the fit of Model 6.2, we locate the deviance (i.e., the –2 ML log-likelihood) and number of parameters associated with the ML fit of Model 6.2 in the previous output file (4610.44 and 13, respectively). Click Other Settings, and then click Hypothesis Testing. Enter the deviance and number of parameters from the ML fit of Model 6.2 (deviance = 4610.44 and number of parameters = 13) in the window that opens, and click OK. HLM will now compare the deviance associated with the ML fit of Model 6.3 to the deviance of the ML fit of Model 6.2, and perform the appropriate likelihood ratio test. Save the .hlm file under a new name, and fit the model by clicking Run Analysis.

HLM provides the result of the likelihood ratio test for Hypothesis 6.2 at the bottom of the resulting output file, which can be viewed by clicking File and View Output. The test is not significant in this case (HLM reports p > .50 for the resulting χ2 statistic), suggesting that the fixed effects associated with the SICDEGP × AGE 2SQ interaction can be dropped from the model. We refer to the model obtained after removing these fixed effects as Model 6.3. Additional likelihood ratio tests can be performed for other fixed effects (e.g., Hypothesis 6.3) in a similar manner. Based on these tests, we conclude that Model 6.3 is our final model.

We now refit Model 6.3 using REML estimation. This model has the same setup as Model 6.2, but without the fixed effects associated with the SICD group by age-squared interaction. To do this, the Estimation Settings need to be reset to REML, and the title of the output, as well as the output file name, should also be reset.

In the Basic Settings window, files containing the Level 1 and Level 2 residuals can be generated for the purpose of checking assumptions about the residuals and random child effects in the model, by clicking the Level 1 Residual File and Level 2 Residual File buttons. We choose to generate SPSS versions of these residual files. The Level 1 residual file will contain the conditional residuals associated with the longitudinal measures (labeled L1RESID) and the conditional predicted values (labeled FITVAL). The Level 2 residual file will contain the EBLUPs for the child-specific random effects associated with AGE 2 and AGE 2SQ (labeled EBAGE 2 and V9, because EBAGE 2SQ is more than eight characters long).

TABLE 6.5: Summary of Hypothesis Test Results for the Autism Analysis

Hypothesis  Test  Estimation  Models Compared          Test Statistic Values    p-Value
Label             Method      (Nested vs. Reference)   (Calculation)
6.1         LRT   REML        6.2A vs. 6.2             χ2(1:2) = 83.9           < .001
                                                       (4699.2 − 4615.3)
6.2         LRT   ML          6.3 vs. 6.2              χ2(2) = 1.9              0.39
                                                       (4612.3 − 4610.4)
6.3         LRT   ML          6.4 vs. 6.3              χ2(2) = 23.4             < .001
                                                       (4635.7 − 4612.3)

Note: See Table 6.4 for null and alternative hypotheses and distributions of test statistics under H0.

6.5 Results of Hypothesis Tests

6.5.1 Likelihood Ratio Test for Random Effects

In Step 2 of the analysis we used a likelihood ratio test to test Hypothesis 6.1, and decide whether to retain the random quadratic (and therefore linear) effects of age in Model 6.2. These likelihood ratio tests were carried out based on REML estimation in all software packages except HLM.

Hypothesis 6.1. The child-specific quadratic random effects of age can be omitted from Model 6.2.
We tested the need for the quadratic random effects of age indirectly, by carrying out tests for the corresponding elements in the D matrix. The null and alternative hypotheses for Hypothesis 6.1 are defined in terms of the D matrix, and shown in Subsection 6.3.3. We calculated the likelihood ratio test statistic for Hypothesis 6.1 by subtracting the –2 REML log-likelihood value for Model 6.2 (the reference model) from the value for Model 6.2A (the nested model). The resulting test statistic is equal to 83.9 (see Table 6.5).

The asymptotic distribution of the likelihood ratio test statistic under the null hypothesis is a mixture of χ2(1) and χ2(2) distributions, each with weight 0.5, rather than the usual χ2(2) distribution, because the null hypothesis value of one of the parameters (the variance of the random quadratic age effects, σ2age-squared = 0) lies on the boundary of the parameter space (Verbeke & Molenberghs, 2000). The p-value for this test statistic is computed as follows:

p-value = 0.5 × P(χ2(1) > 83.9) + 0.5 × P(χ2(2) > 83.9) < 0.001

We therefore decided to retain the random quadratic age effects in this model and in all subsequent models. We retain the random linear age effects as well, so that the model is well formulated in a hierarchical sense (Morrell et al., 1997). In other words, because we keep the higher-order quadratic effects, the lower-order linear effects are also kept in the model.

The child-specific linear and quadratic effects of age are in keeping with what we observed in Figure 6.1 in our initial data summary, in which we noted marked differences in the individual VSAE trajectories of children as they grew older. The random effects in the model capture the variability between these trajectories.

6.5.2 Likelihood Ratio Tests for Fixed Effects

In Step 3 of the analysis we carried out likelihood ratio tests for selected fixed effects using ML estimation in all software packages. Specifically, we tested Hypotheses 6.2 and 6.3.

Hypothesis 6.2. The age-squared by SICD group interaction effects can be dropped from Model 6.2 (β7 = β8 = 0).

To perform a test of Hypothesis 6.2, we used maximum likelihood (ML) estimation to fit Model 6.2 (the reference model) and Model 6.3 (the nested model with the AGE 2SQ × SICDEGP interaction term omitted). The likelihood ratio test statistic was calculated by subtracting the –2 ML log-likelihood for Model 6.2 from the value for Model 6.3. The asymptotic null distribution of the test statistic is a χ2 with 2 degrees of freedom; the 2 degrees of freedom arise from the two fixed effects omitted in Model 6.3. The result of the test was not significant (p = 0.39), so we dropped the AGE 2SQ × SICDEGP interaction term from Model 6.2.

Hypothesis 6.3. The age by SICD group interaction effects can be dropped from Model 6.3 (β5 = β6 = 0).

To test Hypothesis 6.3 we used ML estimation to fit Model 6.3 (the reference model) and Model 6.4 (a nested model without the AGE 2 × SICDEGP interaction). The test statistic was calculated by subtracting the –2 ML log-likelihood for Model 6.3 from that of Model 6.4. The asymptotic null distribution of the test statistic again was a χ2 with 2 degrees of freedom. The p-value for this test was significant (p < 0.001). We concluded that the linear effect of age on VSAE does differ for different levels of SICD group, and we kept the AGE 2 × SICDEGP interaction term in Model 6.3.
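The p-values summarized in Table 6.5 can be reproduced directly from the reported test statistics. The following R commands are a minimal sketch of those calculations (they repeat the pchisq() approach used earlier in this chapter); the numeric test statistics are taken from Table 6.5, and no fitted model objects are required.

> # Hypothesis 6.1: REML-based LRT statistic of 83.9, referred to a
> # 0.5/0.5 mixture of chi-square distributions with 1 and 2 df.
> 0.5 * (1 - pchisq(83.9, 1)) + 0.5 * (1 - pchisq(83.9, 2))

> # Hypotheses 6.2 and 6.3: ML-based LRT statistics of 1.9 and 23.4,
> # each referred to a chi-square distribution with 2 df.
> 1 - pchisq(1.9, 2)    # approximately 0.39
> 1 - pchisq(23.4, 2)   # well below 0.001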
6.6 6.6.1 Comparing Results across the Software Procedures Comparing Model 6.1 Results Table 6.6 shows a comparison of selected results obtained by fitting Model 6.1 to the Autism data, using four of the six software procedures (results were not available when using the lme() function in R or the mixed command in Stata, because of problems encountered when fitting this model). We present results for SAS, SPSS, the lmer() function in R, and HLM, despite the problems encountered when fitting Model 6.1 using each of the procedures, to highlight the differences and similarities across the procedures. Both warning and error messages were produced by the procedures in SAS, SPSS, R, and Stata, and a large number of iterations were required to fit this model when using the HLM2 procedure. See the data analysis steps for each software procedure in Section 6.4 for details on the problems encountered when fitting Model 6.1. Because of the estimation problems, the results in Table 6.6 should be regarded with a great deal of caution. 286 TABLE 6.6: Comparison of Results for Model 6.1 Estimation Method Fixed-Effect Parameter β0 (Intercept) β1 (AGE 2) β2 (AGE 2SQ) β3 (SICDEGP1) β4 (SICDEGP2) β5 (AGE 2 × SICDEGP1) β6 (AGE 2 × SICDEGP2) β7 (AGE 2SQ × SICDEGP1) β8 (AGE 2SQ × SICDEGP2) Covariance Parameter 2 σint σint,age σint,age-sq 2 σage σage,age-sq 2 σage-sq σ2 SPSS: MIXED R: lmer() function HLM2 REML REML REML REML None 1603 Iterations G Matrix Not Lack of Positive-Definite Convergence Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) 13.78(0.81) 5.61(0.79) 0.20(0.09) −5.43(1.10) −4.03(1.03) −3.29(1.08) −2.77(1.02) −0.14(0.12) 13.76(0.79) 5.60(0.80) 0.21(0.08) −5.41(1.07) −4.01(1.01) −3.25(1.10) −2.76(1.03) −0.14(0.11) 13.77(0.81) 5.60(0.79) 0.20(0.08) −5.42(1.09) −4.04(1.03) −3.30(1.09) −2.75(1.03) −0.13(0.11) 13.79(0.82) 5.60(0.79) 0.20(0.09) −5.44(1.11) −4.04(1.05) −3.28(1.08) −2.75(1.02) −0.14(0.12) −0.13(0.11) −0.13(0.11) −0.13(0.11) −0.13(0.11) Estimate (SE) Estimate (SE) Estimate (n.c.) Estimate (n.c.) 0.00(n.c.) 0.62(2.29) 0.57(0.22) 14.03(3.09) −0.64(0.26) 0.17(0.03) 38.71 0.00(0.00)a −15.01(2.41) 3.30(0.24) 15.39(3.26) −0.68(0.25) 0.14(0.03) 36.95 0.00 0.00 0.00 14.67 −0.32(corr.) 0.13 38.50 1.48 0.25 0.42 14.27 −0.59 0.16 37.63 Linear Mixed Models: A Practical Guide Using Statistical Software Warning Message SAS: proc mixed Estimation Method SAS: proc mixed SPSS: MIXED R: lmer() function HLM2 REML REML REML REML Model Information Criteria –2 REML log-likelihood 4604.7 4618.8 4610.0 4606.2 AIC 4616.7 4632.8 4647.0 n.c. BIC 4635.1 4663.6 4718.0 n.c. Note: (n.c.) = not computed Note: 610 Longitudinal Measures at Level 1; 158 Children at Level 2 a This covariance parameter is reported to be “redundant” by the MIXED command in SPSS. 
Random Coefficient Models for Longitudinal Data: The Autism Example TABLE 6.6: (Continued) 287 288 TABLE 6.7: Comparison of Results for Model 6.2 Estimation Method SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HLM: HLM2 REML REML REML REML REML REML Estimate (SE) 13.77(0.81) 5.60(0.79) 0.20(0.08) −5.42(1.09) −4.04(1.03) −3.30(1.09) −2.75(1.03) −0.13(0.11) Estimate (SE) 13.77(0.81) 5.60(0.79) 0.20(0.08) −5.42(1.09) −4.04(1.03) −3.30(1.09) −2.75(1.03) −0.13(0.11) Estimate (SE) 13.77(0.81) 5.60(0.79) 0.20(0.08) −5.42(1.09) −4.04(1.03) −3.30(1.09) −2.75(1.03) −0.13(0.11) Estimate (SE) 13.77(0.81) 5.60(0.79) 0.20(0.08) −5.42(1.09) −4.04(1.03) −3.30(1.09) −2.75(1.03) −0.13(0.11) Fixed-Effect Parameter β0 (Intercept) β1 (AGE 2) β2 (AGE 2SQ) β3 (SICDEGP1) β4 (SICDEGP2) β5 (AGE 2 × SICDEGP1) β6 (AGE 2 × SICDEGP2) β7 (AGE 2SQ × SICDEGP1) β8 (AGE 2SQ × SICDEGP2) Estimate (SE) Estimate (SE) 13.77(0.81) 13.77(0.81) 5.60(0.79) 5.60(0.79) 0.20(0.08) 0.20(0.08) −5.42(1.09) −5.42(1.09) −4.04(1.03) −4.04(1.03) −3.30(1.09) −3.30(1.09) −2.75(1.03) −2.75(1.03) −0.13(0.11) −0.13(0.11) −0.13(0.11) −0.13(0.11) −0.13(0.11) −0.13(0.11) −0.13(0.11) −0.13(0.11) Covariance Parameter Estimate (SE) Estimate (SE) Estimate (n.c.) Estimate (n.c.) Estimate (SE) Estimate (n.c.) 14.67(2.63) −0.44(0.21) 0.13(0.03) 38.50 14.67(2.63) −0.44(0.21) 0.13(0.03) 38.50 14.67 −0.32(corr.) 0.13 38.50 2 σage σage,age-sq 2 σage-sq σ2 14.67 −0.32 0.13 38.50 14.67(2.63) −0.44(0.21) 0.13(0.03) 38.50 14.67 −0.44 0.13 38.50 Model Information Criteria –2 RE/ML log-likelihood 4615.3 AIC 4623.3 BIC 4635.5 Note: (n.c.) = not computed Note: 610 Longitudinal Measures at Level 1; 158 4615.3 4623.3 4640.9 4615.3 4641.3 4698.5 Children at Level 2 4615.3 4641.0 4699.0 4615.3 4641.3 4698.7 4613.4 n.c. n.c. Linear Mixed Models: A Practical Guide Using Statistical Software SAS: proc mixed Random Coefficient Models for Longitudinal Data: The Autism Example 289 The major differences in the results for Model 6.1 across the software procedures are in the covariance parameter estimates and their standard errors. These differences are due to the violation of positive-definite constraints for the D matrix. Despite these differences, the fixed-effect parameter estimates and their standard errors are similar. The –2 REML log-likelihoods, which are a function of the fixed-effect and covariance parameter estimates, also differ across the software procedures. In general, warning messages that can result in these types of discrepancies depending on the software used should never be ignored when fitting linear mixed models with multiple random effects. Given the likely lack of variability in the intercepts in this particular model, we consider results from Model 6.2 (with random intercepts excluded) next. 6.6.2 Comparing Model 6.2 Results Selected results obtained by fitting Model 6.2 to the Autism data using each of the six software procedures are displayed in Table 6.7. The only difference between Models 6.1 and 6.2 is that the latter does not contain the random child-specific effects associated with the intercept. The difficulties in estimating the covariance parameters that were encountered when fitting Model 6.1 were not experienced when fitting this model. The five procedures agree very closely in terms of the estimated fixed effects, the covariance parameter estimates, and their standard errors. The –2 REML log-likelihoods reported by the procedures in SAS, SPSS, R, and Stata all agree. 
The –2 REML log-likelihood reported by HLM differs, perhaps because of differences in default convergence criteria (see Subsection 3.6.1). The other model information criteria (AIC and BIC), not reported by HLM, differ because of differences in the calculation formulas used across the software procedures (see Section 3.6 for a discussion of these differences). 6.6.3 Comparing Model 6.3 Results Table 6.8 compares the results obtained by fitting the final model, Model 6.3, across the six software procedures. As we noted in the comparison of the Model 6.2 results, there is agreement between the six procedures in terms of both the fixed-effect and covariance parameter estimates and their standard errors (when reported). The –2 REML log-likelihoods agree across the procedures in SAS, SPSS, R and Stata. The HLM value of the –2 REML log-likelihood is again different from that reported by the other procedures. Other differences in the model information criteria (e.g., AIC and BIC) are due to differences in the calculation formulas, as noted in Subsection 6.6.2. 6.7 Interpreting Parameter Estimates in the Final Model We now use the results obtained by using the lme() function in R to interpret the parameter estimates for Model 6.3. 6.7.1 Fixed-Effect Parameter Estimates We show a portion of the output for Model 6.3 below. This output includes the fixed-effect parameter estimates, their corresponding standard errors, the degrees of freedom, the t-test values, and the corresponding p-values. The output is obtained by applying the summary() function to the object model6.3.fit, which contains the results of the model fit. 290 TABLE 6.8: Comparison of Results for Model 6.3 SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HLM: HLM2 Estimation Method REML REML REML REML REML REML Fixed-Effect Parameter Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) 13.46(0.78) 6.15(0.69) 0.11(0.04) −4.99(1.04) −3.62(0.98) −4.07(0.88) −3.50(0.83) 13.46(0.78) 6.15(0.69) 0.11(0.04) −4.99(1.04) −3.62(0.98) −4.07(0.88) −3.50(0.83) 13.46(0.78) 6.15(0.69) 0.11(0.04) −4.99(1.04) −3.62(0.98) −4.07(0.88) −3.50(0.83) 13.46(0.78) 6.15(0.69) 0.11(0.04) −4.99(1.04) −3.62(0.98) −4.07(0.88) −3.50(0.83) 13.46(0.78) 6.15(0.69) 0.11(0.04) −4.99(1.04) −3.62(0.98) −4.07(0.88) −3.50(0.83) 13.46(0.78) 6.15(0.69) 0.11(0.04) −4.99(1.04) −3.62(0.98) −4.07(0.88) −3.50(0.83) Estimate (SE) Estimate (SE) Estimate (n.c.) Estimate (n.c.) Estimate (SE) Estimate (n.c.) 14.52(2.61) −0.42(0.20) 0.13(0.03) 38.79 14.52(2.61) −0.42(0.20) 0.13(0.03) 38.79 β0 (Intercept) β1 (AGE 2) β2 (AGE 2SQ) β3 (SICDEGP1) β4 (SICDEGP2) β5 (AGE 2 × SICDEGP1) β6 (AGE 2 × SICDEGP2) Covariance Parameter 2 σage σage,age-sq 2 σage-squared σ2 14.52 −0.31a 0.13 38.79 14.52 −0.31 0.13 38.79 14.52(2.61) −0.42(0.20) 0.13(0.03) 38.79 14.52 −0.42 0.13 38.79 Model Information Criteria –2 RE/ML log-likelihood 4611.6 AIC 4619.6 BIC 4631.8 Note: (n.c.) = not computed Note: 610 Longitudinal Measures at Level 1; 158 a (correlation). 4611.6 4619.6 4637.2 4611.6 4633.6 4682.0 Children at Level 2 4612.0 4634.0 4682.0 4611.6 4633.6 4682.2 4609.7 n.c. n.c. 
Linear Mixed Models: A Practical Guide Using Statistical Software SAS: proc mixed Random Coefficient Models for Longitudinal Data: The Autism Example ' 291 $ Fixed effects: vsae ~ age.2 + I(age.2 ^ 2) + sicdegp2.f + age.2:sicdegp2.f (Intercept) age.2 I(age.2 ^ 2) sicdegp2.f1 sicdegp2.f2 age.2:sicdegp2.f1 age.2:sicdegp2.f2 & Value 13.463533 6.148750 0.109008 -4.987639 -3.622820 -4.068041 -3.495530 Std.Error 0.7815177 0.6882638 0.0427795 1.0379064 0.9774516 0.8797676 0.8289509 DF 448 448 448 155 155 448 448 t-value 17.227419 8.933711 2.548125 -4.805480 -3.706394 -4.623995 -4.216812 p-value 0.0000 0.0000 0.0112 0.0000 0.0003 0.0000 0.0000 % The (Intercept) (= 13.46) represents the estimated mean VSAE score for children at 2 years of age in the reference category of SICDEGP2.F (i.e., Level 3 of SICDEGP: the children who had the highest initial expressive language scores). The value reported for sicdegp2.f1 represents the estimated difference between the mean VSAE score for 2-yearold children in Level 1 of SICDEGP vs. the reference category. In this case, the estimate is negative (−4.99), which means that the mean initial VSAE score for children in Level 1 of SICDEGP is 4.99 units lower than that of children in the reference category. Similarly, the effect of sicdegp2.f2 represents the estimated difference in the mean VSAE score for children at age 2 in Level 2 of SICDEGP vs. Level 3. Again, the value is negative (−3.62), which means that the children in Level 2 of SICDEGP are estimated to have an initial mean VSAE score at age 2 years that is 3.62 units lower than children in the reference category. The parameter estimates for age.2 and I(age.2^2) (6.15 and 0.11, respectively) indicate that both coefficients defining the quadratic regression model for children in the reference category of SICD group (SICDEGP = 3) are positive and significant, which suggests a trend in VSAE scores that is consistently accelerating as a function of age. The value associated with the interaction term age2:sicdegp2.f1 represents the difference in the linear effect of age for children in Level 1 of SICDEGP vs. Level 3. The linear coefficient of age for children in SICDEGP = 1 is estimated to be 4.07 units less than that of children in SICDEGP = 3. However, the estimated linear effect of age for children in SICDEGP = 1 is still positive: 6.15 − 4.07 = 2.08. The value for the interaction term age2:sicdegp2.f2 represents the difference in the linear coefficient of age for children in Level 2 of SICDEGP vs. Level 3, which is again negative. Despite this, the linear trend for age for children in Level 2 of SICDEGP is also estimated to be positive and very similar to that for children in SICDEGP Level 1: 6.15 − 3.50 = 2.65. 6.7.2 Covariance Parameter Estimates In this subsection, we discuss the covariance parameter estimates for the child-specific linear and quadratic age effects for Model 6.3. Notice in the output from R below that the estimated standard deviations (StdDev) and correlation (Corr) of the two random effects are reported in the R output, rather than their variances and covariances, as shown in Table 6.8. We remind readers that this output is based on the lme() function in R; when using the lmer() function, estimates of the variances and corresponding standard deviations will be displayed by the summary() function. 
Random effects:
 Formula: ~age.2 + I(age.2^2) - 1 | childid
 Structure: General positive-definite, Log-Cholesky parametrization
             StdDev     Corr
age.2        3.8110274  age.2
I(age.2^2)   0.3556805  -0.306
Residual     6.2281389

To calculate the estimated variance of the random linear age effects, we square the reported StdDev value for AGE.2 (3.81 × 3.81 = 14.52). We also square the reported StdDev of the random quadratic effects of age to obtain their estimated variance (0.36 × 0.36 = 0.13). The correlation of the random linear age effects and the random quadratic age effects is estimated to be −0.31. The residual variance is estimated to be 6.2281 × 6.2281 = 38.79. There is no entry in the Corr column for the Residual, because we assume that the residuals are independent of the random effects in the model.

We use the intervals() function to obtain the estimated standard errors for the covariance parameter estimates, and approximate 95% confidence intervals for the parameters. R calculates these estimates for the standard deviations and correlation of the random effects, rather than for their variances and covariance.

> intervals(model6.3.fit)

The approximate 95% confidence intervals for the standard deviations of the random linear and quadratic age effects do not contain zero. However, these confidence intervals are based on the asymptotic normality of the covariance parameter estimates, as are the Wald tests for covariance parameters produced by SAS and SPSS. Because these confidence intervals are only approximate, they should be interpreted with caution. For formal tests of the need for the random effects in the model (Hypothesis 6.1), we recommend likelihood ratio tests, with p-values calculated using a mixture of χ2 distributions, as discussed in Subsection 6.5.1. Based on the likelihood ratio test results, we concluded that there is significant between-child variability in the quadratic effects of age on VSAE score. We noted in the initial data summary (Figure 6.2) that the variability of the individual VSAE scores increased markedly with age.

The marginal Vi matrix (= Zi D Zi' + Ri) for the i-th child implied by Model 6.3 can be obtained for the first child by using the R syntax shown below. We note in this matrix that the estimated marginal variances of the VSAE scores (shown on the diagonal of the matrix) increase dramatically with age.

> getVarCov(model6.3.fit, individual = "1", type = "marginal")

CHILDID 1
Marginal variance covariance matrix
       1        2        3         4         5
1  38.79    0.000    0.000     0.000      0.00
2   0.00   52.610   39.728    84.617    120.27
3   0.00   39.728  157.330   273.610    425.24
4   0.00   84.617  273.610   769.400   1293.00
5   0.00  120.270  425.240  1293.000   2543.20

The fact that the implied marginal covariances associated with age 2 years (in the first row and first column of the Vi matrix) are zero is a direct result of our choice to delete the random effects associated with the intercepts from Model 6.3, and to use AGE – 2 as a covariate. The values in the first row of the Zi matrix correspond to the values of AGE – 2 and AGE – 2 squared for the first measurement (age 2 years). Because we used AGE – 2 as a covariate, the Zi matrix has values of 0 in the first row.
Z_i = \begin{pmatrix} 0 & 0 \\ 1 & 1 \\ 3 & 9 \\ 7 & 49 \\ 11 & 121 \end{pmatrix}

In addition, because we did not include a random intercept in Model 6.3, the only nonzero component corresponding to the first time point (age = 2) in the upper-left corner of the Vi matrix is contributed by the Ri matrix, which is simply σ²I_{n_i} (see Subsection 6.3.2 in this example). This means that the implied marginal variance at age 2 is equal to the estimated residual variance (38.79), and the corresponding marginal covariances are zero. This reflects our decision to attribute all the variance in VSAE scores at age 2 to residual variance.

6.8 Calculating Predicted Values

6.8.1 Marginal Predicted Values

Using the estimates of the fixed-effect parameters obtained by fitting Model 6.3 in R (Table 6.8), we can write a formula for the marginal predicted VSAE score at visit t for child i, as shown in (6.4):

\widehat{VSAE}_{ti} = 13.46 + 6.15 \times AGE2_{ti} + 0.11 \times AGE2SQ_{ti}
                      - 4.99 \times SICDEGP1_{i} - 3.62 \times SICDEGP2_{i}
                      - 4.07 \times AGE2_{ti} \times SICDEGP1_{i} - 3.50 \times AGE2_{ti} \times SICDEGP2_{i}      (6.4)

We can use the values in (6.4) to write three separate formulas for predicting the marginal VSAE scores for children in the three levels of SICDEGP. Recall that SICDEGP1 and SICDEGP2 are dummy variables that indicate whether a child is in the first or second level of SICDEGP. The marginal predicted values are the same for all children at a given age who share the same level of SICDEGP.

FOR SICDEGP = 1:
\widehat{VSAE}_{ti} = (13.46 - 4.99) + (6.15 - 4.07) \times AGE2_{ti} + 0.11 \times AGE2SQ_{ti}
                    = 8.47 + 2.08 \times AGE2_{ti} + 0.11 \times AGE2SQ_{ti}

FOR SICDEGP = 2:
\widehat{VSAE}_{ti} = (13.46 - 3.62) + (6.15 - 3.50) \times AGE2_{ti} + 0.11 \times AGE2SQ_{ti}
                    = 9.84 + 2.65 \times AGE2_{ti} + 0.11 \times AGE2SQ_{ti}

FOR SICDEGP = 3:
\widehat{VSAE}_{ti} = 13.46 + 6.15 \times AGE2_{ti} + 0.11 \times AGE2SQ_{ti}

FIGURE 6.4: Marginal predicted VSAE trajectories in the three SICDEGP groups for Model 6.3 (marginal predicted VSAE plotted against AGE minus 2).

The marginal intercept for children in the highest expressive language group at 2 years of age (SICDEGP = 3) is higher than that of children in group 1 or group 2. The marginal linear effect of age is also smaller for children in SICDEGP Level 1 and Level 2 than for children in Level 3 of SICDEGP, but the quadratic effect of age is assumed to be the same for the three levels of SICDEGP. Figure 6.4 graphically shows the marginal predicted values for children in each of the three levels of SICDEGP at each age, obtained using the following R syntax:

> curve(0.11*x^2 + 6.15*x + 13.46, 0, 11, xlab = "AGE minus 2",
    ylab = "Marginal Predicted VSAE", lty = 3, ylim = c(0,100), lwd = 2)
> curve(0.11*x^2 + 2.65*x + 9.84, 0, 11, add = T, lty = 2, lwd = 2)
> curve(0.11*x^2 + 2.08*x + 8.47, 0, 11, add = T, lty = 1, lwd = 2)
> # Add a legend to the plot; R will wait for the user to click
> # on the point in the plot where the legend is desired.
> legend(locator(1), c("SICDEGP = 1", "SICDEGP = 2", "SICDEGP = 3"),
    lty = c(1, 2, 3), lwd = c(2, 2, 2))

The different intercepts for each level of SICDEGP are apparent in Figure 6.4, and the differences in the predicted trajectories for each level of SICDEGP can be easily visualized. Children in SICDEGP = 3 are predicted to start at a higher initial level of VSAE at age 2 years, and also have predicted mean VSAE scores that increase more quickly as a function of age than children in the first or second SICD group.
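The three fitted curves can also be produced without typing the estimated coefficients by hand, by requesting marginal (level = 0) predictions from the fitted model. The following is only a sketch: it assumes that the factor sicdegp2.f was created with levels "1", "2", and "3", and the levels used in the new data frame must match those used when fitting Model 6.3.

> # Marginal (level = 0) predictions on a grid of ages for each SICD group
> newdat <- expand.grid(age.2 = seq(0, 11, by = 0.5),
    sicdegp2.f = factor(c("1", "2", "3"), levels = c("1", "2", "3")))
> newdat$pred.marg <- predict(model6.3.fit, newdata = newdat, level = 0)
> head(newdat)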
6.8.2 Conditional Predicted Values We can also write a formula for the predicted VSAE score at visit t for child i, conditional on the random linear and quadratic age effects in Model 6.3, as follows: ti = VSAE 13.46 + 6.15 × AGE 2ti + 0.11 × AGE 2SQti − 4.99 × SICDEGP1i − 3.62 × SICDEGP2i (6.5) − 4.07 × AGE 2ti × SICDEGP1i − 3.50 × AGE 2ti × SICDEGP2i + û1i × AGE 2ti + û2i × AGE 2SQti In general, the intercept will be the same for all children in a given level of SICDEGP, but their individual trajectories will differ, because of the random linear and quadratic effects of age that were included in the model. For the i-th child, the predicted values of u1i and u2i are the realizations of the EBLUPs of the random linear and quadratic age effects, respectively. The formula below can be used to calculate the conditional predicted values for a given child i in SICDEGP = 3: ti = VSAE 13.46 + 6.15 × AGE 2ti + 0.11 × AGE 2SQti + û1i × AGE 2ti + û2i × AGE 2SQti (6.6) For example, we can write a formula for the predicted value of VSAE at visit t for CHILDID = 4 (who is in SICDEGP = 3) by substituting the predicted values of the EBLUPs generated by R using the random.effects() function for the fourth child into the formula above. The EBLUP for u14 is 2.31, and the EBLUP for u24 is 0.61: t4 = 13.46 + 6.15 × AGE 2t4 + 0.11 × AGE 2SQ VSAE t4 + 2.31 × AGE 2t4 + 0.61 × AGE 2SQt4 (6.7) = 13.46 + 8.46 × AGE 2t4 + 0.72 × AGE 2SQt4 The conditional predicted VSAE value for child 4 at age 2 is 13.46, which is the same for all children in SICDEGP = 3. The predicted linear effect of age specific to child 4 is positive (8.46), and is larger than the predicted marginal effect of age for all children in SICDEGP = 3 (6.15). The quadratic effect of age for this child (0.72) is also much larger than the marginal quadratic effect of age (0.11) for all children in SICDEGP = 3. See the third panel in the bottom row of Figure 6.5 for a graphical depiction of the individual trajectory of CHILDID = 4. We graph the child-specific predicted values of VSAE for the first 12 children in SICDEGP = 3, along with the marginal predicted values for children in SICDEGP = 3, using the following R syntax: > # Load the lattice package. > # Set the trellis graphics device to have no color. > library(lattice) 296 Linear Mixed Models: A Practical Guide Using Statistical Software marginal mean profile subject−specific profile 0 2 4 6 8 10 0 2 4 6 8 10 46 48 49 51 21 27 36 42 250 200 150 100 50 Predicted VSAE 0 250 200 150 100 50 0 1 4 3 19 250 200 150 100 50 0 0 2 4 6 8 10 0 2 4 6 8 10 AGE minus 2 FIGURE 6.5: Conditional (dashed lines) and marginal (solid lines) trajectories, for the first 12 children with SICDEGP = 3. > trellis.device(color=F) > > > > > > # # # # # # Use the augPred function in the nlme package to plot conditional predicted values for the first twelve children with SICDEGP = 3, based on the fit of Model 6.3 (note that this requires the autism.csv data set to be sorted in descending order by SICDEGP, prior to being imported into R). > plot(augPred(model6.3.fit, level = 0:1), layout = c(4, 3, 1), xlab = "AGE minus 2", ylab = "Predicted VSAE", key = list( lines = list(lty = c(1, 2), col = c(1, 1), lwd = c(1, 1) ), text = list(c("marginal mean profile", "subject-specific profile")), columns = 2)) We can clearly see the variability in the fitted trajectories for different children in the third level of SICDEGP in Figure 6.5. 
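The arithmetic in (6.7) can be reproduced directly from the model fit object. The sketch below assumes the row and column labels that random.effects() typically returns for this model (a row labeled "4" for CHILDID = 4, and columns named age.2 and I(age.2^2)); because child 4 is in the reference category (SICDEGP = 3), the SICDEGP dummy variables drop out of the formula.

> # EBLUPs of the child-specific random effects (one row per child)
> re <- random.effects(model6.3.fit)
> re["4", ]   # approximately 2.31 for age.2 and 0.61 for I(age.2^2)
> # Conditional predicted values for child 4 at the study ages, as in (6.7)
> beta <- fixef(model6.3.fit)
> ages <- c(0, 1, 3, 7, 11)   # AGE minus 2
> beta["(Intercept)"] +
    (beta["age.2"] + re["4", "age.2"]) * ages +
    (beta["I(age.2^2)"] + re["4", "I(age.2^2)"]) * ages^2

The same values are returned by predict(model6.3.fit, level = 1) for the rows of the data set belonging to CHILDID = 4.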
In general, the fitted() function can be applied to a model fit object (e.g., model6.3.fit) to obtain conditional predicted values in the R software, and the random.effects() function (in the nlme package) can be applied to a model fit object to obtain EBLUPs of random effects. Refer to Section 6.9 for additional R syntax that can be used to obtain and plot conditional predicted values. Random Coefficient Models for Longitudinal Data: The Autism Example 0 50 100 150 200 3 2 1 297 Standardized residuals 6 4 2 0 −2 −4 0 50 100 150 0 200 50 100 150 200 Fitted values FIGURE 6.6: Residual vs. fitted plot for each level of SICDEGP, based on the fit of Model 6.3. 6.9 Diagnostics for the Final Model We now check the assumptions for Model 6.3, fitted using REML estimation, using informal graphical procedures in the R software. Similar plots can be generated in the other four software packages after saving the conditional residuals, the conditional predicted values, and the EBLUPs of the random effects based on the fit of Model 6.3 (see the book’s web page in Appendix A). 6.9.1 Residual Diagnostics We first assess the assumption of constant variance for the residuals in Model 6.3. Figure 6.6 presents a plot of the standardized conditional residuals vs. the conditional predicted values for each level of SICDEGP. > library(lattice) > trellis.device(color= F) > plot(model6.3.fit, resid(., type = "p") ~ fitted(.) | factor(sicdegp), layout=c(3,1), aspect=2, abline=0) The variance of the residuals appears to decrease for larger fitted values, and there are some possible outliers that may warrant further investigation. The preceding syntax may be modified by adding the id = 0.05 argument to produce a plot (not shown) that identifies outliers at the 0.05 significance level: > plot(model6.3.fit, resid(., type= "p") ~ fitted(.) | factor(sicdegp), id = 0.05, layout = c(3,1), aspect = 2, abline = 0) 298 Linear Mixed Models: A Practical Guide Using Statistical Software 40 Residuals 20 0 −20 0 2 4 6 8 10 age.2 FIGURE 6.7: Plot of conditional raw residuals versus AGE.2. Next, we investigate whether the residual variance is constant as a function of AGE.2. > plot(model6.3.fit, resid(.) ~ age.2, abline = 0) Figure 6.7 suggests that the variance of the residuals is fairly constant across the values of AGE – 2. We again note the presence of outliers. Next, we assess the assumption of normality of the residuals using Q–Q plots within each level of SICDEGP, and request that unusual points be identified by CHILDID using the id = 0.05 argument: > qqnorm(model6.3.fit, ~resid(.) | factor(sicdegp) , layout = c(3,1), aspect = 2, id = 0.05) Figure 6.8 suggests that the assumption of normality for the residuals seems acceptable. However, the presence of outliers in each level of SICDEGP (e.g., CHILDID = 46 in SICDEGP = 3) may warrant further investigation. 6.9.2 Diagnostics for the Random Effects We now check the distribution of the random effects (EBLUPs) generated by fitting Model 6.3 to the Autism data. Figure 6.9 presents Q–Q plots for the two sets of random effects. Significant outliers at the 0.10 level of significance are identified by CHILDID in this graph (id = 0.10): > qqnorm(model6.3.fit, ~ranef(.) , id = 0.10) Random Coefficient Models for Longitudinal Data: The Autism Example −20 20 40 2 1 Quantiles of standard normal 0 80 124 3 180 46 10649 97 131 193 87 2 299 0 −2 100 49 180 46 77 −20 0 20 −20 40 0 20 40 Residuals FIGURE 6.8: Normal Q–Q Plots of conditional residuals within each level of SICDEGP. 
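A simple numerical complement to these Q–Q plots is to pull the EBLUPs into a data frame and screen them for extreme values. This is only a sketch: the column names are assumed to match those reported by ranef() for this model, and the cutoff of 2.5 standard deviations is arbitrary.

> re <- ranef(model6.3.fit)
> sds <- apply(re, 2, sd)
> # Children with unusually large predicted random effects; the children flagged
> # in the Q-Q plot by the id = 0.10 argument would be expected to appear here
> re[abs(re[, "age.2"]) > 2.5 * sds["age.2"] |
     abs(re[, "I(age.2^2)"]) > 2.5 * sds["I(age.2^2)"], ]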
I(age.2^2) age.2 139 124 2 Quantiles of standard normal 100 77 105 4 210 105 193 110 155 91 80 156 9 1 0 −1 −2 110 155 124 −5 0 5 10 15 −0.5 0.0 0.5 Random effects FIGURE 6.9: Normal Q–Q Plots for the EBLUPs of the random effects. 300 Linear Mixed Models: A Practical Guide Using Statistical Software −5 1 0 5 10 15 2 3 I(age.2^2) 0.5 0.0 −0.5 124 −5 0 5 10 −5 15 0 5 10 15 age.2 FIGURE 6.10: Scatter plots of EBLUPs for age-squared vs. age by SICDEGP. We note that CHILDID = 124 is an outlier in terms of both random effects. The children indicated as outliers in these plots should be investigated in more detail to make sure that there is nothing unusual about their observations. Next, we check the joint distribution of the random linear and quadratic age effects across levels of SICDEGP using the pairs() function: > pairs(model6.3.fit, ~ranef(.) | factor(sicdegp), id = ~childid == 124, layout = c(3, 1), aspect = 2) The form of these plots is not suggestive of a very strong relationship between the random effects for age and age-squared, although R reported a modest negative correlation (r = −0.31) between them in Model 6.3 (see Table 6.8). The distinguishing features of these plots are the outliers, which give the overall shape of the plots a rather unusual appearance. The EBLUPs for CHILDID = 124 are again unusual in Figure 6.10. Investigation of the values for children with unusual EBLUPs would be useful at this point, and might provide insight into the reasons for the outliers; we do not pursue such an investigation here. We also remind readers (from Chapter 2) that selected influence diagnostics can also be computed when using the nlmeU or HLMdiag packages in R; see Section 20.3 of Galecki & Burzykowski (2013) or Loy & Hofmann (2014) for additional details on the computational steps involved. 6.9.3 Observed and Predicted Values Finally, we check for agreement between the conditional predicted values based on the fit of Model 6.3 and the actual observed VSAE scores. Figure 6.11 displays scatter plots of the observed VSAE scores vs. the conditional predicted VSAE scores for each level of SICDEGP, with possible outliers once again identified: Random Coefficient Models for Longitudinal Data: The Autism Example 0 50 100 150 200 2 1 301 3 200 vsae 150 124 100 49 49 46 18097 180 50 80 193 87 106100 131 46 77 0 0 50 100 150 0 200 50 100 150 200 Fitted values FIGURE 6.11: Agreement of observed VSAE scores with conditional predicted VSAE scores for each level of SICDEGP, based on Model 6.3. > plot(model6.3.fit, vsae ~ fitted(.) | factor(sicdegp), id = 0.05, layout = c(3,1) , aspect = 2) We see relatively good agreement between the observed and predicted values within each SICDEGP group, with the exception of some outliers. We refit Model 6.3 after excluding the observations for CHILDID = 124 (in SICDEGP = 1) and CHILDID = 46 (in SICDEGP = 3): > autism.grouped2 <- autism.grouped[(autism.grouped$childid != 124 & autism.grouped$childid != 46),] > model6.3.fit.out <- update(model6.3.fit, data = autism.grouped2) Applying the summary() and intervals() functions to the model6.3.fit.out object indicates that the primary results in Model 6.3 did not change substantially after excluding the outliers. 6.10 Software Note: Computational Problems with the D Matrix The major differences between the software procedures in this analysis were encountered when attempting to estimate the variance-covariance matrix of the random effects (the D matrix) in Model 6.1. 
This model included three random effects for each child, associated with the intercept, the linear effect of age, and the quadratic effect of age. Each software procedure reacted differently to the estimation problems that occurred when fitting Model 6.1. We summarize these differences in Table 6.6. SAS proc mixed produced a note in the log stating that the estimated G matrix (i.e., the block-diagonal matrix with blocks defined 302 Linear Mixed Models: A Practical Guide Using Statistical Software by the 3 × 3 D matrix for a single child) was not positive-definite. The estimated value of 2 σint was reported to be zero in the output, and its standard error was not reported. SPSS MIXED produced a warning in the output stating that model convergence was not achieved, and that the validity of the model fit was uncertain. SPSS also reported the estimated 2 value of σint and its standard error to be zero. The lme() function in R reported problems with convergence of the estimation algorithm, and estimates of the parameters in the model were not available. Similar to the procedures in SAS and SPSS, the lmer() function in R 2 reported an estimate of 0 for σint . The mixed command in Stata reported an error message and did not produce any output. HLM did not report any problems in fitting the model, but required more than 1600 iterations to converge. To investigate the estimation problems encountered in fitting Model 6.1, we used the nobound option in SAS proc mixed. This option allowed us to fit the implied marginal model without requiring that the G matrix be positive-definite. We found that the estimated 2 value for what would have been σint in the unconstrained matrix was actually negative (−10.54). Subsequent models were simplified by omitting the random effect associated with the intercept. SAS proc mixed is currently the only software procedure that provides an option to relax the requirement that the D matrix be positive-definite. 6.10.1 Recommendations We recommend carefully checking the covariance parameter estimates and their standard errors. Models with multiple random effects, like those fitted in this chapter, may need to be simplified or respecified if problems are encountered. For this analysis, we decided to remove the child-specific random effects associated with the intercept from Model 6.1 because of the estimation problems, and because there was little variability in the initial VSAE scores for these autistic children at 2 years of age, as illustrated in our initial data summary (see Figures 6.1 and 6.2). This had implications for the marginal covariance matrix, as illustrated in Subsection 6.7.2. But the remedies for this problem will depend on the subject matter under study and the model specification. We find that this issue arises quite often when analysts attempt to include too many random effects in a given linear mixed model (and accordingly request that the software provide estimates of several variances and covariances). Careful examination of the covariance parameter estimates and possible reduction of the number of random effects included in the model (for example, were we really interested in the variance of the intercepts in the model for the Autism data?) will generally prevent these estimation issues. 
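For readers working in R, the lmer() symptom described above can be checked directly when fitting a model like Model 6.1. This is a sketch rather than the book's own syntax: the fixed-effects specification is assumed here to match that of Model 6.3, the data frame is assumed to be autism.grouped, and isSingular() is available only in recent versions of lme4.

> library(lme4)
> model6.1.lmer <- lmer(vsae ~ age.2 + I(age.2^2) + sicdegp2.f + age.2:sicdegp2.f +
    (1 + age.2 + I(age.2^2) | childid), data = autism.grouped)
> # TRUE when a variance component is estimated on the boundary of the parameter space
> isSingular(model6.1.lmer)
> # Inspect the estimated D matrix; per the text, the intercept variance is the suspect component
> VarCorr(model6.1.lmer)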
6.11 An Alternative Approach: Fitting the Marginal Model with an Unstructured Covariance Matrix Fitting a marginal model with an “unstructured” covariance matrix is a plausible alternative for the Autism data, because a relatively small number of observations are made on each child, and the observations are made at the same ages. We use the gls() function in R to fit a marginal model having the same fixed effects as in Model 6.3, but with no random effects, and an “unstructured” covariance structure for the marginal residuals. To fit a marginal model with an unstructured covariance matrix (i.e., an unstructured Ri matrix) for the residuals using the R software, we need to specify a “General” correlation structure within each level of CHILDID. This correlation structure (specified with Random Coefficient Models for Longitudinal Data: The Autism Example 303 the correlation = corSymm() argument) is characterized by completely general (unconstrained) correlations. In order to have general (unconstrained) variances as well (resulting in a specification consistent with the “unstructured” covariance structure in SAS and SPSS), we also need to make use of the weights = argument. The correlation = corSymm() argument requires the specification of an index variable with consecutive integer values, to identify the ordering of the repeated measures in a longitudinal data set. We first create this variable as follows: > > > > > > index <- age.2 index[age.2 == 0] index[age.2 == 1] index[age.2 == 3] index[age.2 == 7] index[age.2 == 11] <<<<<- 1 2 3 4 5 We then add the index variable to the original autism data frame object: > autism.updated <- subset(data.frame( autism, sicdegp2.f, age.2, index), !is.na(vsae)) We now specify this correlation structure in the gls() function using the correlation=corSymm(form = ~ index | childid) argument in the following syntax, and allow for unequal variances at each level of AGE 2 by using the weights = varIdent(form = ~ 1 | age.2) argument. Refer to Pinheiro & Bates (1996) for more information about this structure; the SAS and SPSS procedures allow users to select this “unstructured” covariance structure for the Ri matrix directly as opposed to specifying correlation and weights arguments separately. > marg.model.fit <- gls( vsae ~ age. 2 + I(age.2^2) + sicdegp2.f + age.2:sicdegp2.f, correlation=corSymm(form = ~ index | childid), weights = varIdent(form = ~ 1 | age.2) , autism.updated) The estimated fixed effects in the marginal model can be obtained by applying the summary() function to the model fit object: > summary(marg.model.fit) ' $ Coefficients (Intercept) age.2 I(age.2^2) sicdegp2.f1 sicdegp2.f2 age.2:sicdegp2.f1 age.2:sicdegp2.f2 Value 12.471584 7.373850 -0.004881 -5.361092 -3.725960 -4.332097 -3.867433 Std.Error 0.5024636 0.6042509 0.0373257 0.6691472 0.6307950 0.7787670 0.7334745 t-value 24.820869 12.203293 -0.130761 -8.011828 -5.906769 -5.562764 -5.272757 p-value 0.000 0.000 0.896 0.000 0.000 0.000 0.000 & % We note that the effect of AGE 2 squared (I(age.2^2)) for the reference SICD group (SICDEGP = 3) is not significant in the marginal model (p = 0.896), whereas it is positive and significant (p = 0.01) in Model 6.3. In general, the estimates of the fixed effects and their standard errors in the marginal model are different from those estimated for Model 6.3, because the estimated Vi matrix for the marginal model differs from the implied Vi matrix for Model 6.3. 
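Because the marginal model and Model 6.3 share the same fixed effects, their implied covariance structures can also be compared informally through REML-based information criteria. This sketch assumes that both models were fit by REML (the default for lme() and gls()) and to exactly the same set of observations; if the two data frames differ, for example in how missing VSAE values were handled, the comparison is not valid.

> anova(model6.3.fit, marg.model.fit)   # REML log-likelihoods, AIC, and BIC
> # or simply
> AIC(model6.3.fit, marg.model.fit)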
The covariance matrix from the marginal model is also part of the output generated by the summary() function: 304 Linear Mixed Models: A Practical Guide Using Statistical Software ' $ Correlation Structure: General Formula: ~index | childid Parameter estimate(s): Correlation: 1 2 3 4 2 0.365 3 0.214 0.630 4 0.263 0.448 0.574 5 0.280 0.543 0.664 0.870 & % The correlations of the marginal residuals at time 1 (corresponding to 2 years of age) with residuals at later times are in general smaller (ranging from about 0.21 to 0.37) than the correlations of the residuals at times 2 through 5 with the residuals for later time points. Recall that the marginal correlations of observations at 2 years of age with other observations implied by Model 6.3 were zero (see Subsection 6.7.2), because we had omitted the random intercept and used AGE – 2 as a predictor (resulting in zeroes for the Zi matrix values corresponding to 2 years of age). We report the estimated marginal variance-covariance matrix Vi for a single child i as follows, using additional R code that can be found on the book’s web page (see Appendix A): ' $ [1,] [2,] [3,] [4,] [5,] 0 1 3 7 11 10.811330 8.858858 8.310693 24.59511 43.05836 8.858858 54.509537 55.123024 94.33787 187.58256 8.310693 55.123024 140.582610 194.00964 368.23192 24.595114 94.337868 194.009635 811.74023 1160.52408 43.058363 187.582559 368.231923 1160.52408 2191.80398 & % Software Note: The getVarCov() function, when applied to the marg.model.fit object obtained by using gls(), returns an incorrect result for the estimated Vi matrix (not shown). This function was primarily designed to work with model fit objects obtained using the lme() function. According to the R documentation, this function should also work for a gls() object. However, in our example, which involves the weights argument, the getVarCov() function does not return a correct marginal variance-covariance matrix. The estimated marginal variances, indicated on the diagonal of the matrix above, increase with age, similar to the marginal variances implied by Model 6.3 (see Subsection 6.7.2). We can also fit the preceding marginal model using the following syntax in SAS: title "Marginal Model w/ Unstructured Covariance Matrix"; proc mixed data = autism2 noclprint covtest; class childid sicdegp age; model vsae = sicdegp age_2 age_2sq age_2*sicdegp / solution; repeated age / subject = childid type = un r rcorr; run; The repeated statement indicates that observations on the same CHILDID are indexed by the original AGE variable (identified in the class statement), and that an unstructured covariance matrix (type = un) is to be used. The r option requests that SAS display the estimated marginal variance-covariance matrix in the output, and the rcorr option requests the corresponding correlation matrix. SPSS users can also fit a similar model using only a REPEATED statement in the MIXED syntax. Random Coefficient Models for Longitudinal Data: The Autism Example 6.11.1 305 Recommendations Fitting the marginal model directly with an “unstructured” residual covariance matrix allows us to get a better sense of the marginal variances and covariances in the observed data than from the marginal model implied by the random coefficient model (Model 6.3). 
When analyzing data sets from longitudinal studies with balanced designs (i.e., where all subjects are measured at the same time points), we recommend fitting the marginal model in this manner, because it allows us to get an idea of whether the random coefficient model being fitted implies a reasonable marginal covariance structure for the data. However, fitting the marginal model directly does not allow us to answer research questions about the betweenchild variability in the VSAE trajectories, because the marginal model does not explicitly include any random child effects. Because these models do not include explicit random effects, they should not be referred to as linear mixed models. Linear mixed models, by definition, include a mix of fixed and random effects. When describing these models in technical reports or academic publications, they should be referred to as general linear models with correlated errors, and the type of covariance structure used for the errors should be clearly defined. This page intentionally left blank 7 Models for Clustered Longitudinal Data: The Dental Veneer Example 7.1 Introduction In this chapter we illustrate fitting linear mixed models (LMMs) to clustered longitudinal data, in which units of analysis are nested within clusters, and repeated measures are collected on the units of analysis over time. Each cluster may have a different number of units of analysis, and the time points at which the dependent variable is measured can differ for each unit of analysis. Such data sets can be considered to have three levels. Level 3 represents the clusters of units, Level 2 the units of analysis, and Level 1 represents the longitudinal (repeated) measures made over time. In Table 7.1 we illustrate examples of clustered longitudinal data in different research settings. Such data structures might arise in an educational setting when student achievement scores are measured over time, with students clustered within a sample of classrooms. A clustered longitudinal data structure might also be encountered in an environmental setting in a study of the weekly oxygen yield for a sample of trees measured over the course of a growing season. In this case, trees are clustered within a sample of plots, and the trees are measured repeatedly over time. In the Dental Veneer data set analyzed in this chapter, the dependent variable, gingival crevicular fluid (GCF), was measured at two post-treatment time points for each tooth, with teeth clustered within patients. Figure 7.1 illustrates an example of the structure of a clustered longitudinal data set, using the first patient in the Dental Veneer data set. Note that patient 1 (who represents a cluster of units) had four teeth (the units of analysis) included in the study, but other patients could have a different number of treated teeth. Measurements were made on each tooth at two follow-up time points (3 months and 6 months). FIGURE 7.1: Structure of the clustered longitudinal data for the first patient in the Dental Veneer data set. 
307 308 Linear Mixed Models: A Practical Guide Using Statistical Software TABLE 7.1: Examples of Clustered Longitudinal Data in Different Research Settings Research Setting Level of Data Environment Education Dentistry Cluster of Units (Level 3) Cluster ID variable (random factor) Plot Classroom Patient Covariates Soil minerals, tree crown density in the plot Teacher’s years of experience, classroom size Gender, age Unit of Analysis ID variable (random factor) Tree Student Tooth Covariates Tree size Gender, age, baseline score Treatment, tooth type Time variable Week Marking period Month Dependent variable Oxygen yield Test score Gingival crevicular fluid (GCF) Timevarying covariates Sunlight exposure, precipitation Attendance Frequency of tooth brushing Unit of Analysis (Level 2) Time (Level 1) LMMs for clustered longitudinal data are “hybrids” of the models that we have used for the examples in previous chapters. Specifically, these models include random effects associated with both the clusters (e.g., plots, classrooms, and patients) and the units of analysis nested within these clusters (e.g., trees, students, and teeth) to take the clustered structure of the data into account, and also allow the residuals associated with the longitudinal measures on the same unit of analysis to be correlated. The variables indicating the units of analysis (Level 2) and the clusters of units (Level 3) are assumed to be random factors, with levels (e.g., students and classrooms) sampled from a larger population. However, convenience samples, which are easy to obtain but do not arise from a probability sample, are commonly used in practice. The time variable itself can be considered to be either a categorical fixed factor or a continuous predictor. If each unit is measured at the same time points, time is crossed with the random factors defined at Level 2 and Level 3. However, we do not necessarily require each unit of analysis to be measured at the same time points. In this chapter, we highlight features in the Stata software package. Models for Clustered Longitudinal Data:The Dental Veneer Example 7.2 7.2.1 309 The Dental Veneer Study Study Description The Dental Veneer data were collected by researchers at the University of Michigan Dental School, in a study investigating the impact of veneer placement on subsequent gingival (gum) health among adult patients (Ocampo, 2005). Ceramic veneers were applied to selected teeth to hide discoloration. The treatment process involved removing some of the surface of each treated tooth, and then attaching the veneer to the tooth with an adhesive. The veneer was placed to match the original contour of the tooth as closely as possible. The investigators were interested in studying whether varying amounts of contour difference (CDA) due to placement of the veneer might affect gingival health in the treated teeth over time. One measure of gingival health was the amount of GCF in pockets of the gum adjacent to the treated teeth. GCF was measured for each tooth at visits 3 months and 6 months posttreatment. A total of 88 teeth in 17 patients were prepared for veneer placement, and a baseline measure of GCF was collected for each tooth. We consider only the 55 treated teeth located in the maxillary arches of 12 patients in this example to avoid duplication of results from the original authors. Each patient could have different numbers of treated teeth, and the particular teeth that were treated could differ by patient. 
Table 7.2 presents a portion of the Dental Veneer data set in the “long” format appropriate for analysis using the mixed command in Stata, and the LMM procedures in SAS, SPSS, and R. The following variables are included in the Dental Veneer data set: Patient (Level 3) Variables • PATIENT = Patient ID variable (Level 3 ID) • AGE = Age of patient when veneer was placed; constant for all observations on the same patient Tooth (Level 2) Variables • TOOTH = Tooth number (Level 2 ID) • BASE GCF = Baseline measure of GCF for the tooth; constant for all observations on the same tooth • CDA = Average contour difference in the tooth after veneer placement; constant for all observations on the same tooth Time-Varying (Level 1) Variables • TIME = Time points of longitudinal measures (3 = 3 Months, 6 = 6 Months) • GCF = Gingival crevicular fluid adjacent to the tooth, collected at each time point (dependent variable) 310 Linear Mixed Models: A Practical Guide Using Statistical Software TABLE 7.2: Sample of the Dental Veneer Data Set Patient (Level 3) Tooth (Level 2) Cluster ID Covariate Unit ID PATIENT AGE TOOTH 1 46 1 46 1 46 1 46 ... 1 46 1 46 3 32 3 32 3 32 3 32 ... Note: “...” indicates portion of 7.2.2 Longitudinal Measures (Level 1) Covariate BASE GCF Time Variable Dependent Variable CDA TIME GCF 6 6 7 7 17 17 22 22 4.67 4.67 4.67 4.67 3 6 3 6 11 68 13 47 11 11 6 6 7 7 17 17 3 3 4 4 5.67 5.67 7.67 7.67 11.00 11.00 3 6 3 6 3 6 11 53 28 23 17 15 the data not displayed. Data Summary The data summary for this example was generated using Stata (Release 13). A link to the syntax and commands that can be used to perform similar analyses in the other software packages is included on the book’s web page (see Appendix A). We begin by importing the tab-delimited raw data file (veneer.dat) into Stata from the C:\temp directory: . insheet using "C:\temp\veneer.dat", tab We create line graphs of the GCF values across time for all teeth within each patient. To create these graphs, we need to restructure the data set to a “wide” format using the reshape wide command: . keep patient tooth age time gcf . reshape wide gcf, i(patient time) j(tooth) Software Note: The Dental Veneer data set is restructured here only for the purpose of generating Figure 7.2. The original data set in the “long” form displayed in Table 7.2 will be used for all mixed model analyses in Stata. The following output is generated as a result of applying the reshape command to the original data set. As indicated in this output, the restructured data set contains just 24 observations (2 per patient, corresponding to the 2 time points) and 9 variables: PATIENT, TIME, AGE, and GCF6 through GCF11. The new GCF variables are indexed by levels of the original TOOTH variable (GCF6 corresponds to tooth 6, etc.). Models for Clustered Longitudinal Data:The Dental Veneer Example ' 311 $ (note: j = 6 7 8 9 10 11) Data long -> wide ----------------------------------------------------------------------------Number of obs. 110 -> 24 Number of variables 5 -> 9 j variable (6 values) tooth -> (dropped) xij variables: gcf -> gcf6 gcf7 ... gcf11 ----------------------------------------------------------------------------& % The first four observations in the restructured data set are as follows: . list in 1/4 ' +------------------------------------------------------------------+ | patient time gcf6 gcf7 gcf8 gcf9 gcf10 gcf11 age | |------------------------------------------------------------------| 1. | 1 3 11 13 14 10 14 11 46 | 2. | 1 6 68 47 58 57 44 53 46 | 3. 
| 3 3 28 17 19 34 54 38 32 | 4. | 3 6 23 15 32 46 39 19 32 | +------------------------------------------------------------------+ & $ % We now label the six new GCF variables to indicate the tooth numbers to which they correspond and label the AGE variable: . . . . . . . label label label label label label label var var var var var var var gcf6 "6" gcf7 "7" gcf8 "8" gcf9 "9" gcf10 "10" gcf11 "11" age "Patient Age" Next, we generate Figure 7.2 using the twoway line plotting command to create a single figure containing multiple line graphs: . twoway line gcf6 gcf7 gcf8 gcf9 gcf10 gcf11 time, /// lcolor(black black black black black black) /// lpattern(solid dash longdash vshortdash shortdash dash_dot) /// ytitle(GCF) by(age) We use the /// symbols in this command because we are splitting a long command into multiple lines. The lcolor option is used to set the color of the six possible lines (corresponding to teeth within a given patient) within a single graph to be black. The pattern of the lines is set with the lpattern option. The plots in this graph are ordered by the age of each patient, displayed at the top of each panel. In Figure 7.2, we observe that the GCF values for all teeth within a given patient tend to follow the same trend over time (the lines for the treated teeth appear to be roughly parallel within each patient). In some patients, the GCF levels tend to increase, whereas in others the GCF levels tend to decrease or remain relatively constant over time. This pattern suggests that an appropriate model for the data might include random patientspecific time slopes. The GCF levels of the teeth also tend to differ by patient, suggesting 312 Linear Mixed Models: A Practical Guide Using Statistical Software 25 28 32 36 41 46 47 50 51 53 62 0 50 100 0 50 GCF 100 0 50 100 21 3 4 5 6 3 4 5 6 3 4 5 6 3 4 5 6 time 6 8 10 7 9 11 Graphs by Patient Age FIGURE 7.2: Raw GCF values for each tooth vs. time, by patient. Panels are ordered by patient age. that a model should also include random patient-specific intercepts. There is also evidence in most patients that the level of GCF tends to differ by tooth, suggesting that we may want to include random tooth-specific intercepts in the model. 7.3 Overview of the Dental Veneer Data Analysis For the analysis of the Dental Veneer data, we use the “top-down” modeling strategy discussed in Subsection 2.7.1 of Chapter 2. In Subsection 7.3.1, we outline the analysis steps and informally introduce related models and hypotheses to be tested. In Subsection 7.3.2 we present more detailed specifications of selected models. The hypotheses tested are detailed in Subsection 7.3.3. To follow the analysis steps outlined in this section, refer to the schematic diagram presented in Figure 7.3. 7.3.1 Analysis Steps Step 1: Fit a model with a “loaded” mean structure (Model 7.1). Fit a three-level model with a “loaded” mean structure, and random effects associated with both the intercept and slope for patients and with individual teeth within patients. Models for Clustered Longitudinal Data:The Dental Veneer Example 313 In Model 7.1, we include fixed effects associated with all covariates under consideration (TIME, BASE CGF, CDA, and AGE) and the two-way interactions between TIME and each of the other covariates. We consider TIME to be a continuous predictor in all models, even though it only has two levels; this allows us to interpret the fixed effect of TIME as the expected change in GCF over a one-month period. 
Based on inspection of the initial graphs in Figure 7.2, we add the following random components to this model: random patient-specific effects associated with both the intercept and slope (i.e., the effect of time) for each patient, and random effects associated with the intercept for each tooth nested within a patient. We assume that the residuals in Model 7.1 are independent and identically distributed, with constant variance across the time points. Step 2: Select a structure for the random effects (Model 7.1 vs. Model 7.1A). Decide whether to keep the random tooth-specific intercepts in the model. In this step we fit Model 7.1A, which excludes the random intercepts associated with teeth nested within patients, and test whether we need to keep these random effects in the model (Hypothesis 7.1). Based on the results of this hypothesis test, we decide to retain the nested random tooth effects in the model (and therefore the random patient effects as well, to preserve the hierarchical structure of the data in the model specification) and keep Model 7.1 as our preferred model at this stage. Step 3: Select a covariance structure for the residuals (Models 7.1, 7.2A, 7.2B, or 7.2C). Fit models with different covariance structures for the residuals associated with observations on the same tooth. In this step, we investigate different covariance structures for the residuals, while maintaining the same fixed and random effects as in Model 7.1. Because there are only two repeated measures on each tooth, we can only consider a limited number of residual covariance structures, and we will note that parameters in some of these covariance structures are aliased (or not identifiable; see Subsection 2.9.3). In short, the tooth-specific random intercepts are inducing marginal correlations between the repeated measures on each tooth, so an additional correlation parameter for the residuals is not needed. Model 7.2A: Unstructured covariance matrix for the residuals. The most flexible residual covariance structure involves three parameters: different residual variances at 3 and 6 months, and the covariance between the residuals at the two visits. This is an appropriate structure to use if GCF at 3 months has a residual variance different from that at 6 months. We include a residual covariance because we expect the two residuals for a given tooth to be positively related. Unfortunately, specifying an unstructured covariance matrix for the residuals together with the variance for the random tooth intercepts leads to an aliasing problem with the covariance parameters. Preferably, aliasing should be detected during model specification. If aliasing is unnoticed during model specification it leads to an estimation problem and usually can be indirectly detected during estimation of model parameters, as will be shown in the software-specific sections. Although in practice we would not consider this model, we present it (and Model 7.2B following) to illustrate what happens in each software procedure when aliasing of covariance parameters occurs. 314 Linear Mixed Models: A Practical Guide Using Statistical Software Model 7.2B: Compound symmetry covariance structure for the residuals. We next fitted a simpler model with equal (homogeneous) residual variances at 3 months and at 6 months, and correlated residuals. This residual covariance structure has only two parameters, but the residual covariance parameter is again aliased with the variance of the random tooth-specific intercepts in this model. 
Model 7.2C: Heterogeneous (unequal) residual variances at each time point, with zero covariance for the residuals. This residual covariance structure also has two parameters, representing the different residual variances at 3 months and at 6 months. However, the covariance of the residuals at the two time points is constrained to be zero. In this model, the residual covariance parameters are no longer aliased with the variance of the random tooth-specific intercepts. In Model 7.1, we assume that the residuals have constant variance at each time point, but this restriction is relaxed in Model 7.2C. We test Hypothesis 7.2 in this step to decide whether to use a heterogeneous residual covariance structure, Model 7.2C, or retain the simpler residual covariance structure of Model 7.1. Because of the nonsignificant result of this hypothesis test, we keep Model 7.1 as our preferred model at this stage of the analysis. Step 4: Reduce the model by removing fixed effects associated with the two-way interactions (Model 7.1 vs. Model 7.3) and check model diagnostics. We now fit Model 7.3, which omits the fixed effects associated with the two-way interactions between TIME and the other fixed covariates, using maximum likelihood (ML) estimation. We test the significance of these fixed effects (Hypothesis 7.3) and decide that Model 7.3 is our preferred model. We decide to retain all the main fixed effects associated with the covariates in our final model, so that the primary research questions can be addressed. We refit the final model using REML estimation to obtain unbiased estimates of the covariance parameters. We informally examine diagnostics for Model 7.3 in Section 7.10, using Stata. Figure 7.3 provides a schematic guide to the model selection process in the analysis of the Dental Veneer data (see Section 3.3.1 for details on how to interpret this figure). 7.3.2 Model Specification The general specification of the models in Subsection 7.3.2.1 corresponds closely to the syntax used to fit the models in SAS, SPSS, Stata, and R. In Subsection 7.3.2.2, we discuss the hierarchical specification of the models, which corresponds closely to the model setup in HLM. Selected models considered in the analysis of the Dental Veneer data are summarized in Figure 7.3. 7.3.2.1 General Model Specification The general form of Models 7.1 through 7.2C for an individual GCF response at visit t (t = 1, 2, corresponding to months 3 and 6) on tooth i nested within patient j (denoted by GCFtij ) is as follows: ⎫ GCFtij = β0 + β1 × TIMEt + β2 × BASE GCFij ⎪ ⎬ + β3 × CDAij + β4 × AGEj + β5 × TIMEt × BASE GCFij fixed ⎪ ⎭ + β6 × TIMEt × CDAij + β7 × TIMEt × AGEj +u0j + u1j × TIMEt + u0i|j + εtij } random (7.1) Models for Clustered Longitudinal Data:The Dental Veneer Example 315 FIGURE 7.3: Guide to model selection and related hypotheses for the analysis of the Dental Veneer data. The parameters β0 through β7 represent the fixed effects associated with the intercept, TIME, the patient-level and tooth-level covariates, and their two-way interactions; u0j and u1j are random patient effects associated with the intercept and time slope, respectively; u0i|j is the random effect associated with a tooth nested within a patient; and εtij represents a residual. 
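Although this chapter highlights Stata, readers following along in R may find it useful to see one way that Model 7.1 could be expressed with the lme() function. This is a sketch only: it assumes lower-case versions of the variable names in Table 7.2 and a data frame named veneer, and the R syntax presented for this chapter in Section 7.4 may differ in detail.

> library(nlme)
> model7.1.fit <- lme(gcf ~ time + base_gcf + cda + age +
    time:base_gcf + time:cda + time:age,
    random = list(patient = ~ time, tooth = ~ 1),
    data = veneer, method = "REML")

The named list in the random argument requests an unstructured D matrix for the patient-level intercept and time slope, plus a random intercept for each tooth nested within a patient, matching the random part of (7.1).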
We write the joint distribution of the two random effects associated with patient j as: u0j uj = ∼ N (0, D (2) ) u1j where the variance-covariance matrix D (2) is defined as: V ar(u0j ) cov(u0j , u1j ) (2) D = cov(u0j , u1j ) V ar(u1j ) For all models considered in this analysis, we specify an unstructured D (2) matrix defined 2 2 by three covariance parameters, σint:patient , σint,time:patient , and σtime: patient , as shown in the following matrix: 316 Linear Mixed Models: A Practical Guide Using Statistical Software D (2) = 2 σint:patient σint,time: patient σint,time: patient 2 σtime: patient The distribution of the random effects associated with tooth i nested within patient j is u0i|j ∼ N (0, D (1) ), 2 where the 1 × 1 matrix D (1) contains the variance, σint:tooth(patient) , of the nested random tooth effects u0i|j . 2 D (1) = Var(u0i|j ) = σint: tooth(patient) The distribution of the residuals, εij , associated with observations on the same tooth is ε1ij εij = ∼ N (0, Rij ) ε2ij where the variance-covariance matrix Rij for the residuals is defined as Var(ε1ij ) cov(ε1ij , ε2ij ) Rij = cov(ε1ij , ε2ij ) Var(ε2ij ) Different structures for Rij will be specified for different models in the analysis. We assume that the random effects associated with patients, u0j and u1j , are independent of the random effects associated with teeth nested within patients, u0i|j , and that all random effects are independent of the residuals. In Model 7.1, the Rij matrix is equal to σ 2 I2 and involves only one parameter, σ 2 , as shown below: 2 σ 0 Rij = = σ 2 I2 0 σ2 In this structure, the residual variance at 3 months is equal to that at 6 months, and there is no covariance between the residuals at 3 and at 6 months (they are independent). In Model 7.2A, we have an unstructured (UN) Rij matrix 2 σt1 σt1,t2 Rij = 2 σt1,t2 σt2 2 2 defined by three covariance parameters: σt1 , σt2 , and σt1,t2 . In this structure, the residual variance is allowed to be different at 3 and at 6 months, and we allow the covariance between the residuals at 3 and at 6 months to be different from zero (i.e., the residuals are not independent). In Model 7.2B, we have a compound symmetric (CS) Rij matrix involving two covariance parameters, σ 2 and σtl,t2 : 2 σ + σt1,t2 σt1,t2 Rij = σt1,t2 σ 2 + σt1,t2 In this structure, the residual variance is equal at 3 and at 6 months, and there is a covariance between the residuals at the two time points. 2 In Model 7.2C, we have a heterogeneous (HET) Rij matrix with two parameters, σt1 2 and σt2 : 2 σt1 0 Rij = 2 0 σt2 Models for Clustered Longitudinal Data:The Dental Veneer Example 317 In this structure, the residual variances are different at 3 and at 6 months, and the covariance is assumed to be zero. Finally, Model 7.3 has the same random effects and residual covariance structure as in Model 7.1, but omits the fixed effects associated with the two-way interactions. 7.3.2.2 Hierarchical Model Specification We now consider an equivalent specification of the model defined in (7.1), corresponding to the hierarchical specification in the HLM software. The correspondence between the notation used in HLM and our notation is defined in Table 7.3. The hierarchical model has three components, reflecting contributions from the three levels of the Dental Veneer data (time, tooth, and patient). 
First, we write the Level 1 component as: Level 1 Model (Time) GCFtij = b0i|j + b1i|j × TIMEt + εtij where ε1ij ε2ij (7.2) ∼ N (0, Rij ) The model in (7.2) implies that at the most basic level of the data set (the GCF measures at each time point), we have a set of simple linear regressions of GCF on TIME. The unobserved regression coefficients, i.e., the tooth-specific intercepts (b0i|j ) and TIME slopes (b1i|j ), depend on other fixed and random effects, as shown in the Level 2 model below: Level 2 Model (Tooth) b0i|j = b0j + β2 × BASE GCFij + β3 × CDAij + u0i|j b1i|j = b1j + β5 × BASE GCFij + β6 × CDAij (7.3) where 2 u0i|j ∼ N (0, σint:tooth(patient) ) The Level 2 model in (7.3) implies that the intercept, b0i|j , for tooth i nested within patient j depends on the intercept specific to the j-th patient, b0j , the tooth-specific covariates (BASE GCF and CDA), and a random effect associated with the tooth, u0i|j . The tooth-specific slope for time, b1i|j , depends on the patient-specific time effect, b1j , and the tooth-specific covariates. Note that we do not include a random effect for the tooth-specific slope for TIME. The Level 3 model for the patient-specific contributions to the intercept and slope is shown below: Level 3 Model (Patient) b0j = β0 + β4 × AGEj + u0j b1j = β1 + β7 × AGEj + u1j where uj = u0j u1j ∼ N (0, D (2) ) (7.4) 318 TABLE 7.3: Summary of Models Considered for the Dental Veneer Data Term/Variable General HLM Notation Notation Intercept TIME BASE GCF CDA AGE TIME × BASE GCF TIME × CDA TIME × AGE β0 β1 β2 β3 β4 β5 β6 β7 γ000 γ100 γ010 γ020 γ001 γ110 γ120 γ101 Random effects Intercept u0j u00k TIME Intercept u1j u0i|j u10k r0jk εtij εijk Variance of intercepts 2 σint:pat τ β [1,1] Variance of slopes Covariance of intercepts, slopes 2 σtime:pat σint,time:pat τ β [2,2] τ β [2,1] Patient (j) Tooth (i) within Patient (j) Residuals Visit (t) Covariance Patient Paramelevel ters (θ D ) for D Matrix 7.1 7.2A 7.2Ba 7.2C 7.3 √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ Linear Mixed Models: A Practical Guide Using Statistical Software Fixed effects Model a General HLM Notation Notation 7.1 7.2Aa 7.2Ba 7.2C 7.3 Structureb D (2) τβ UN UN UN UN UN Variance of intercepts 2 σint:tooth(pat) τπ √ √ √ √ √ Variances at Time 1, Time 2 2 σt1 2 σt2 σ12 σ22 Equal Unequal Equal Covariance of Time 1, Time 2 σt1,t2 Variesc 0 Term/Variable Tooth Level Covariance Time ParameLevel ters (θ R ) for Rij Matrix Model √ √ Unequal Equal 0 Structureb Rij S σ2 I2 UN CS HET In Models 7.2A and 7.2B, the residual covariance parameters are aliased with the variance of the random tooth-level intercepts. b UN = unstructured, CS = compound symmetry, HET = diagonal with heterogeneous variances. c The notation for this covariance parameter varies in HLM, depending on the structure specified. a 0 σ 2 I2 Models for Clustered Longitudinal Data:The Dental Veneer Example TABLE 7.3: (Continued) 319 320 Linear Mixed Models: A Practical Guide Using Statistical Software The Level 3 model in (7.4) implies that the unobserved patient-specific intercept, b0j , depends on the overall fixed intercept, β0 , the single covariate measured at the patient level (AGE), and a random effect, u0j , associated with patient j. The Level 3 model also implies that the patient-specific TIME effect, b1j , depends on the overall fixed TIME effect, β1 , the patient’s age, and the random effect of TIME, u1j , associated with patient j. 
By substituting the values for b0j and b1j from the Level 3 model back into the Level 2 model, and then substituting the values for b0i|j and b1i|j from the Level 2 model into the Level 1 model, we obtain the general LMM specified in (7.1). 7.3.3 Hypothesis Tests Hypothesis tests considered in the analysis of the Dental Veneer data are summarized in Table 7.4. We consider three-level models for the Dental Veneer data, with random effects at Level 3 (the patient level) and at Level 2 (the tooth level). To preserve the hierarchy of the data, we first test the significance of the random effects beginning at Level 2 (the tooth level), while retaining those at Level 3 (the patient level). Hypothesis 7.1. The nested random effects, u0i|j , associated with teeth within the same patient can be omitted from Model 7.1. We decide whether we can remove the random effects associated with teeth indirectly by testing null and alternative hypotheses about the variance of these random effects, as follows: 2 H0 : σint:tooth(patient) =0 2 HA : σint:tooth(patient) >0 We use a REML-based likelihood ratio test for Hypothesis 7.1. The test statistic is calculated by subtracting the –2 REML log-likelihood for Model 7.1 (the reference model) from that of Model 7.1A (the nested model, excluding the nested random tooth effects). To obtain a p-value for this test statistic, we refer it to a mixture of χ2 distributions, with 0 and 1 degrees of freedom, and equal weights of 0.5. Based on the significant result of this test, we retain the nested random tooth effects in the model. Having made this choice, we do not test for the need of the random patient effects, u0j and u1j , to preserve the hierarchical structure of the data. Hypothesis 7.2. The variance of the residuals is constant (homogeneous) across the time points in Model 7.2C. We test Hypothesis 7.2 to decide whether we should include heterogeneous residual variances at the two time points (Model 7.2C) vs. having homogeneous residual variances. We write the null and alternative hypotheses as follows: 2 2 H0 : σt1 = σt2 2 2 HA : σt1 = σt2 The test statistic is calculated by subtracting the –2 REML log-likelihood for Model 7.2C (the reference model, with heterogeneous residual variances) from that of Model 7.1 (the nested model, with homogeneous residual variance). The distribution of the test statistic under the null hypothesis is a χ2 with 1 degree of freedom. Hypothesis 7.3. The fixed effects associated with the two-way interactions between TIME and the patient- and tooth-level covariates can be omitted from Model 7.1. Hypothesis Specification Hypothesis Test Models Compared Label Null (H0 ) Alternative (HA ) Test Nested Model (H0 ) Ref. Model (HA ) Est. Method Test Stat. Dist. under H0 7.1 Drop u0i|j , random tooth-specific intercepts 2 (σint:tooth(pat) = 0) Retain u0i|j 2 (σint:tooth(pat) > 0) LRT Model 7.1A Model 7.1 REML 0.5χ20 +0.5χ21 7.2 Constant residual 2 2 variance: σt1 = σt2 2 2 σt1 = σt2 LRT Model 7.1 Model 7.2C REML χ21 7.3 Drop fixed effects associated with all two-way interactions (β5 = β6 = β7 = 0) β5 = 0, or β6 = 0, or β7 = 0 LRT Model 7.3 Model 7.1 ML χ23 Models for Clustered Longitudinal Data:The Dental Veneer Example TABLE 7.4: Summary of Hypotheses Tested for the Dental Veneer Data 321 322 Linear Mixed Models: A Practical Guide Using Statistical Software In Hypothesis 7.3, we test whether the fixed effects associated with the two-way interactions between TIME and the covariates BASE CGF, CDA, and AGE are all equal to zero. 
The null and alternative hypotheses are specified as follows: H0 : β 5 = β 6 = β 7 = 0 HA : β5 = 0, or β6 = 0, or β7 = 0 We test Hypothesis 7.3 using a likelihood ratio test, based on ML estimation. The test statistic is calculated by subtracting the –2 ML log-likelihood for Model 7.1 (the reference model) from that of Model 7.3 (the nested model, excluding the fixed effects associated with the two-way interactions). The distribution of the test statistic under the null hypothesis is a χ2 with 3 degrees of freedom, corresponding to the 3 fixed effects set to zero under the null hypothesis. For the results of these hypothesis tests, see Section 7.5. 7.4 Analysis Steps in the Software Procedures In this section, we demonstrate the analysis steps described earlier in the chapter using the procedures in SAS, SPSS, R, Stata, and HLM. We compare results for selected models across the procedures in Section 7.6. 7.4.1 SAS Step 1: Fit a model with a “loaded” mean structure (Model 7.1). To fit Model 7.1 using SAS proc mixed, we use a temporary SAS data set named veneer, assumed to have the data structure shown in Table 7.2. Note that we sort the data by levels of PATIENT, TOOTH, and TIME before fitting the model, to make portions of the output easier to read (this sorting is optional and does not affect the analysis): proc sort data = veneer; by patient tooth time; run; title "Model 7.1"; proc mixed data = veneer noclprint covtest; class patient tooth; model gcf = time base_gcf cda age time*base_gcf time*cda time*age / solution outpred = resids; random intercept time / subject = patient type = un solution v = 1 vcorr = 1; random intercept / subject = tooth(patient) solution; run; We include the noclprint option in the proc mixed statement to save space by preventing SAS from displaying all levels of the class variables in the output. The covtest option is only used to display the estimated standard errors of the covariance parameters in the output for comparison with the estimates from the other software procedures. Because no estimation method is specified, the default REML method will be used. Models for Clustered Longitudinal Data:The Dental Veneer Example 323 The class statement identifies the categorical variables used in the random statements. PATIENT denotes clusters of units, and TOOTH denotes the units of analysis. The model statement specifies the terms that have associated fixed effects in the model. We include TIME, BASE GCF, CDA, AGE, and the two-way interactions between TIME and the other covariates. The solution option is specified so that SAS displays the estimated values of the fixed-effect parameters in the output. The outpred = option is used to obtain conditional residuals (conditional on the values of all of the appropriate random effects) for each observation in a new data set called resids. The first random statement (random intercept time / subject = patient) identifies the intercept and the slope for TIME as two random effects associated with each patient. The solution option requests that the EBLUPs for the random effects be displayed in the output. The type = un option specifies that the individual 2 × 2 blocks of the G covariance matrix of random effects for each subject (which we denote as the D (2) matrix) are unstructured. We also request that the variance-covariance matrix and the corresponding correlation matrix for the marginal model implied by Model 7.1 be displayed for the first subject by using the v = 1 and vcorr = 1 options. 
The second random statement (random intercept / subject = tooth(patient)) specifies that we wish to include a random effect associated with the intercept for each tooth nested within a patient. We also use the solution option to request that the EBLUPs for these random effects be displayed in the output. Because we do not include a repeated statement in the syntax for Model 7.1, SAS assumes that the residuals associated with all observations, εtij , are independent and identically distributed, with constant variance σ 2 . Step 2: Select a structure for the random effects (Model 7.1 vs. Model 7.1A). To test Hypothesis 7.1, we fit Model 7.1A, which omits the random effects associated with the intercept for each tooth nested within a patient. To do this, we remove the second random statement from the syntax for Model 7.1. We then subtract the –2 REML loglikelihood for Model 7.1 (the reference model) from the –2 REML log-likelihood for Model 7.1A (the nested model, excluding the nested random tooth effects). The p-value for the resulting test statistic (858.3 − 847.1 = 11.2) can be obtained in the SAS log by using the following syntax: data _null_; p_value = 0.5*(1-probchi(11.2,1)); put p_value; run; We use the probchi() function to obtain the appropriate p-value for the χ21 distribution and weight it by 0.5. Note that the χ20 distribution is not included in the syntax, because it contributes zero to the resulting p-value. We keep Model 7.1 as our preferred model at this stage of the analysis based on the result of this test. Step 3: Select a covariance structure for the residuals (Model 7.1 and Models 7.2A through 7.2C). Next, we investigate different covariance structures for the residuals associated with observations on the same tooth, by adding a repeated statement to the SAS syntax. We first create a new variable in the veneer data set named CATTIME, which has the same values as the original TIME variable and will be used to index the repeated measures: 324 Linear Mixed Models: A Practical Guide Using Statistical Software data veneer; set veneer; cattime = time; run; We fit Model 7.2A, which has an unstructured covariance matrix for the residuals associated with observations on the same tooth: title "Model 7.2A"; proc mixed data = veneer noclprint covtest; class patient tooth cattime; model gcf = time base gcf cda age time*base gcf time*cda time*age / solution outpred = resids; random intercept time / subject = patient type = un solution v = 1 vcorr = 1; random intercept / subject = tooth(patient) solution; repeated cattime / subject = tooth(patient) type = un r rcorr; run; Note that the proc mixed code used to fit this model is similar to the code used for Model 7.1; the only differences are the presence of the repeated statement and the inclusion of CATTIME in the class statement. The repeated statement specifies CATTIME as an index for the time points, allowing SAS to identify the repeated measures on a given tooth correctly and to identify appropriately the row and column elements in the Rij matrix. The subject = option identifies the units of analysis on which repeated observations were measured and, in this case, the subject is each tooth nested within a patient: subject = tooth(patient). We specify the covariance structure (or type) of the Rij matrix as unstructured by using the type = un option. 
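Because each tooth contributes only two repeated measures, the unstructured Rij matrix requested here for Model 7.2A takes the following 2 × 2 form (a sketch using generic labels for the residual variances at the two time points and their covariance):

\[
R_{ij} = \begin{pmatrix} \sigma^2_{t1} & \sigma_{t1,t2} \\ \sigma_{t1,t2} & \sigma^2_{t2} \end{pmatrix}
\]

Three covariance parameters (two variances and one covariance) are therefore estimated for the Rij matrix, in addition to the variance of the nested random tooth effects, which is what gives rise to the aliasing problems described next.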
We also request that the estimated 2 × 2 Rij matrix for the first tooth in the data set be displayed in the output along with the corresponding correlation matrix by specifying the r and rcorr options.

Software Note: proc mixed requires that the variable used to define the index for the time points in the repeated statement be included in the class statement, so that proc mixed considers the variable as categorical. As long as there are no missing values on any of the repeated measures for any subjects and the data have been sorted (e.g., by PATIENT, TOOTH, and TIME), the use of such an index variable is not necessary. Index variables are also not necessary if a covariance structure that does not require ordering of the repeated measures, such as compound symmetry, is specified for the Rij matrix.

The following note is produced in the SAS log after fitting Model 7.2A using SAS Release 9.3:

NOTE: Convergence criteria met but final Hessian is not positive-definite.

The Hessian matrix is used to compute the standard errors of the estimated covariance parameters. Because we specified the covtest option in the syntax, we can inspect the estimated standard errors of the covariance parameters. As shown in the following SAS output, the estimated standard error of covariance parameter UN(2,2) for TOOTH(PATIENT) is zero. This is the estimated standard error of the estimated residual variance at time 2.

Covariance Parameter Estimates

Cov Parm   Subject          Estimate   Standard Error   Z Value   Pr Z
UN(1,1)    PATIENT          546.61     279.34            1.96     0.0252
UN(2,1)    PATIENT          -148.64    74.3843          -2.00     0.0457
UN(2,2)    PATIENT          44.6420    21.0988           2.12     0.0172
Intercept  TOOTH(PATIENT)   7.7452     18.4261           0.42     0.3371
UN(1,1)    TOOTH(PATIENT)   101.55     26.4551           3.84     <.0001
UN(2,1)    TOOTH(PATIENT)   39.1711    15.2980           2.56     0.0105
UN(2,2)    TOOTH(PATIENT)   76.1128    0                 .        .

The nonpositive-definite Hessian matrix encountered in fitting Model 7.2A is a consequence of the aliasing of the residual covariance parameters and the variance of the random tooth effects.

We fit Model 7.2B by modifying the repeated statement to specify that residuals associated with observations on the same tooth have a simpler compound symmetry covariance structure:

repeated cattime / subject = tooth(patient) type = cs rcorr r;

The only difference in the SAS syntax here is the specification of type = cs in the repeated statement, which requests a compound symmetry structure for the Rij matrix. When attempting to fit Model 7.2B in SAS Version 9.3, we see the following warning in the SAS log:

NOTE: Stopped because of too many likelihood evaluations.

The REML estimation procedure does not converge to a valid solution, and this is another consequence of the aliasing of the residual covariance and the variance of the random tooth effects. We do not see any estimates printed in the SAS output as a result.

We now fit Model 7.2C, which has a diagonal Rij matrix that allows the variance of the residuals at 3 and at 6 months to differ, and has zero covariance between the two time points. We use the following repeated statement for this model:

repeated cattime / subject = tooth(patient) type = un(1) rcorr r;

The option type = un(1) specifies that the residuals in the Rij matrix are uncorrelated and that the residual variances differ for observations at different time points. This model now has no accompanying error messages, and the covariance parameter estimates and their standard errors seem appropriate.
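For comparison with the unstructured matrix shown earlier, the un(1) structure used in Model 7.2C constrains the off-diagonal element of Rij to zero, leaving only the two heterogeneous residual variances (again a sketch with generic labels):

\[
R_{ij} = \begin{pmatrix} \sigma^2_{t1} & 0 \\ 0 & \sigma^2_{t2} \end{pmatrix}
\]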
There is no apparent problem with aliasing of covariance parameters in this model.

Covariance Parameter Estimates

Cov Parm   Subject          Estimate   Standard Error   Z Value   Pr Z
UN(1,1)    PATIENT          546.60     279.33            1.96     0.0252
UN(2,1)    PATIENT          -148.64    74.3839          -2.00     0.0457
UN(2,2)    PATIENT          44.6417    21.0987           2.12     0.0172
Intercept  TOOTH(PATIENT)   46.9155    16.5274           2.84     0.0023
UN(1,1)    TOOTH(PATIENT)   62.3774    18.8107           3.32     0.0005
UN(2,1)    TOOTH(PATIENT)   0          .                 .        .
UN(2,2)    TOOTH(PATIENT)   36.9482    15.3006           2.41     0.0079

Note in the output above that the UN(2,1) parameter equals zero, because it is constrained to be zero by the specification of the type = un(1) structure for the Rij matrix.

Model 7.2C is the only possible model other than Model 7.1 to consider at this stage of the analysis, because the other models have problems with aliasing of the covariance parameters. We use a likelihood ratio test for Hypothesis 7.2 to decide if we wish to keep the heterogeneous residual variances, as in Model 7.2C, or if we should have constant residual variance as in Model 7.1. We calculate the test statistic by subtracting the –2 REML log-likelihood for Model 7.2C (the reference model with heterogeneous residual variances) from that of Model 7.1 (the nested model). The test statistic is equal to 0.9. The p-value for this test is calculated by referring the test statistic to a χ² distribution with 1 degree of freedom. We do not use a mixture of χ² distributions for this test, because the null hypothesis does not place a covariance parameter on the boundary of its parameter space. SAS code for computing the p-value for this test and displaying it in the log is as follows:

data _null_;
p_value = 1 - probchi(0.9, 1);
put p_value;
run;

Because the test is not significant (p = 0.34), we keep Model 7.1, with homogeneous residual variance, as our preferred model at this stage of the analysis.

Step 4: Reduce the model by removing nonsignificant fixed effects (Model 7.1 vs. Model 7.3).

We test Hypothesis 7.3 to decide whether we want to keep the fixed effects of the two-way interactions between TIME and the other covariates in Model 7.1. We first refit Model 7.1, using ML estimation, by including the method = ml option in the proc mixed statement, as shown in the following code:

title "Model 7.1: ML Estimation";
proc mixed data = veneer noclprint covtest method = ml;
class patient tooth;
model gcf = time base_gcf cda age time*base_gcf time*cda time*age
/ solution outpred = resids;
random intercept time / subject = patient type = un solution v = 1 vcorr = 1;
random intercept / subject = tooth(patient) solution;
run;
The p-value for this likelihood ratio test is calculated and displayed in the log using the following SAS syntax: data _null_; p_value = 1 - probchi(1.84, 3); put p_value; run; Because this test statistic is not significant, we conclude that we can omit the fixed effects associated with the two-way interactions from the model, and select Model 7.3 as our final model. We now refit Model 7.3 using the default REML estimation method, to obtain unbiased estimates of the covariance parameters: title "Model 7.3: REML Estimation"; proc mixed data = veneer noclprint covtest; class patient tooth; model gcf = time base_gcf cda age / solution outpred = resids; random intercept time / solution subject = patient type = un v = 1 vcorr = 1; random intercept / solution subject = tooth(patient); run; 7.4.2 SPSS Step 1: Fit a model with a “loaded” mean structure (Model 7.1). We assume that an SPSS data set having the format displayed in Table 7.2 is currently open, and begin by specifying the SPSS syntax to set up Model 7.1: * Model 7.1 . MIXED gcf WITH time base gcf cda age /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = time base gcf cda age time*base gcf time*cda time*age | SSTYPE(3) /METHOD = REML /PRINT = G SOLUTION TESTCOV /SAVE = PRED RESID /RANDOM INTERCEPT time | SUBJECT(Patient) COVTYPE(UN) /RANDOM INTERCEPT | SUBJECT(tooth*Patient) . 328 Linear Mixed Models: A Practical Guide Using Statistical Software This syntax is very similar to that used to fit the three-level models discussed in Chapter 4. The first RANDOM subcommand sets up the two random effects for each patient: INTERCEPT and TIME. The subject is specified as PATIENT, and the covariance structure for the patient-specific random effects is unstructured, as indicated by the COVTYPE(UN) option. The second RANDOM subcommand sets up the random effects associated with the INTERCEPT for each tooth. The subject identified in this subcommand is (TOOTH*PATIENT), which actually indicates tooth nested within patient, although the syntax appears to be specifying TOOTH crossed with PATIENT. We would run this syntax to fit Model 7.1. Step 2: Select a structure for the random effects (Model 7.1 vs. Model 7.1A). To test Hypothesis 7.1, we fit a nested model, Model 7.1A, which excludes the random effects associated with the teeth nested within patients, by omitting the second RANDOM subcommand entirely from the syntax used for Model 7.1: * Model 7.1A . MIXED gcf WITH time base gcf cda age /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0,ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = time base gcf cda age time*base gcf time*cda time*age | SSTYPE(3) /METHOD = REML /PRINT = G SOLUTION TESTCOV /SAVE = PRED RESID /RANDOM INTERCEPT time | SUBJECT(Patient) COVTYPE(UN) . The likelihood ratio test statistic is calculated by subtracting the –2 REML log-likelihood for Model 7.1 (reported in the SPSS output) from that of Model 7.1A. The p-value is calculated by referring the test statistic to a mixture of χ2 distributions with degrees of freedom of 0 and 1, and equal weights of 0.5 (see Section 7.5). Step 3: Select a covariance structure for the residuals (Model 7.1, and Models 7.2A through 7.2C). Once the structure for the random effects has been set up, we consider different covariance structures for the residuals. 
But before fitting these models, we sort the data set in ascending order by PATIENT, TOOTH, and TIME to ensure that all desired covariance structures will be displayed in a way that is easy to read in the output (this sorting is recommended but not essential for the analysis): SORT CASES BY patient (A) tooth (A) time (A) . We first fit Model 7.2A with an unstructured covariance structure for the residuals associated with observations over time on the same tooth: * Model 7.2A . MIXED gcf WITH time base gcf cda age /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) Models for Clustered Longitudinal Data:The Dental Veneer Example 329 SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = time base gcf cda age time*base gcf time*cda time*age | SSTYPE(3) /METHOD = REML /PRINT = G R SOLUTION TESTCOV /SAVE = PRED RESID /RANDOM INTERCEPT time | SUBJECT(Patient) COVTYPE(UN) /RANDOM INTERCEPT | SUBJECT(tooth*Patient) /REPEATED time | SUBJECT(tooth*Patient) COVTYPE(UN) . This syntax is identical to that used to fit Model 7.1, with the exception of the PRINT and REPEATED subcommands. In the PRINT subcommand, we have requested that the estimated 2×2 Rij variance-covariance matrix be displayed in the SPSS output by using the R keyword (the corresponding correlation matrix is not available). The REPEATED subcommand is used to identify the units of analysis (or subjects) that have measurements made on them over time, the time index (TIME), and the structure of the covariance matrix for the residuals. In this case, we again specify the subject as (TOOTH*PATIENT), which denotes teeth nested within patients, and identify the covariance structure as UN (i.e., unstructured). After fitting Model 7.2A, the following warning message is displayed in the SPSS output: Warnings Iteration was terminated but convergence has not been achieved. The MIXED procedure continues despite this warning. Subsequent results produced are based on the last iteration. Validity of the model fit is uncertain. The MIXED command fails to converge to a solution because the residual covariance parameters are aliased with the variance of the random tooth effects in Model 7.2A. Results from the model fit should not be interpreted if this warning message appears. Next, we use SPSS syntax to fit Model 7.2B, which has a compound symmetry residual covariance structure. The REPEATED subcommand in the syntax used to fit Model 7.2A is modified to include the COVTYPE(CS) option: /REPEATED time | SUBJECT(tooth*patient) COVTYPE(CS) . Model 7.2B also has a problem with aliasing of the covariance parameters, and the same warning message about the model failing to converge is displayed in the output. The results of this model fit should also not be interpreted. We now fit Model 7.2C using the COVTYPE(DIAG) option, which specifies that the two residuals associated with observations on the same tooth are independent and have different variances: /REPEATED time | SUBJECT(tooth*patient) COVTYPE(DIAG) . In the diagonal covariance structure defined for the Rij matrix in this model, the covariance between observations on the same tooth is set to zero, and the residual variances along the diagonal of the matrix are heterogeneous. We test Hypothesis 7.2 by carrying out a likelihood ratio test to decide if we wish to have heterogeneous residual variances in the model. 
The test statistic is calculated by subtracting the –2 REML log-likelihood for Model 7.2C (the reference model with heterogeneous residual variances) from that of Model 7.1 (the nested model). The appropriate p-value for the test statistic is based on a χ2 distribution with 1 degree of freedom. Because this test is not significant, we keep Model 7.1 (with homogeneous residual variance) as our preferred model at this stage of the analysis. 330 Linear Mixed Models: A Practical Guide Using Statistical Software Step 4: Reduce the model by removing nonsignificant fixed effects (Model 7.1 vs. Model 7.3). Finally, we test Hypothesis 7.3 to see if we can remove the fixed effects associated with all the two-way interactions from the model. To do this, we first refit Model 7.1 using ML estimation by including the /METHOD = ML subcommand: * Model 7.1 (ML) . MIXED gcf WITH time base gcf cda age /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = time base gcf cda age time*base gcf time*cda time*age | SSTYPE(3) /METHOD = ML /PRINT = G SOLUTION TESTCOV /SAVE = PRED RESID /RANDOM INTERCEPT time | SUBJECT(Patient) COVTYPE(UN) /RANDOM INTERCEPT | SUBJECT(tooth*Patient) . Next, we fit Model 7.3 by removing all two-way interactions listed in the FIXED subcommand: /FIXED = time base gcf cda age | SSTYPE(3) We can now carry out a likelihood ratio test for Hypothesis 7.3. To do this, we subtract the –2 ML log-likelihood for Model 7.1 (the reference model) from that of Model 7.3 (the nested model). Because this test is not significant (see Subsection 7.5.3), we choose Model 7.3 as our preferred model at this stage of the analysis. Finally, we refit Model 7.3 using the default REML estimation method, to obtain unbiased estimates of the covariance parameters: * Model 7.3 . MIXED gcf WITH time base gcf cda age /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = time base gcf cda age | SSTYPE(3) /METHOD = REML /PRINT = G SOLUTION TESTCOV /SAVE = PRED RESID /RANDOM INTERCEPT time | SUBJECT(Patient) COVTYPE(UN) /RANDOM INTERCEPT | SUBJECT(tooth*Patient) . 7.4.3 R We begin the analysis of the Dental Veneer data using R by reading the tab-delimited raw data file, which has the structure described in Table 7.2 with variable names in the first row, into a data frame object: > veneer <- read.delim("c:\\temp\\veneer.dat", h = T) Models for Clustered Longitudinal Data:The Dental Veneer Example 7.4.3.1 331 Analysis Using the lme() Function We first load the nlme package, so that the lme() function can be used in the analysis: > library(nlme) Step 1: Fit a model with a “loaded” mean structure (Model 7.1). We now fit Model 7.1 using the lme() function: > model7.1.fit <- lme(gcf ~ time + base gcf + cda + age + time:base gcf + time:cda + time:age, random = list(patient = ~time, tooth = ~1), data = veneer, method = "REML") The syntax used for the lme() function is discussed in detail below: • model7.1.fit is the name of the object that contains the results from the fit of Model 7.1. • The first argument of the function is a formula, which defines the continuous response variable (GCF), and the covariates and interaction terms that have fixed effects in the model. • The second argument of the function, random = list(patient = ~time, tooth = ~1), indicates the random effects to be included in the model. 
Note that a “list” has been declared to identify the specific structure of the random effects, and each random factor (i.e., PATIENT and TOOTH) in the list needs to have at least one associated random effect. This syntax implies that levels of the TOOTH variable are nested within levels of the PATIENT variable, because the PATIENT variable is the first argument of the list function. We include a patient-specific TIME slope (listed after the ~ symbol), and a random intercept associated with patients is included by default. Next, we explicitly specify that a random effect associated with the intercept (1) should be included for each level of TOOTH. By default, the lme() function chooses an unstructured 2 × 2 D (2) matrix for the random patient-specific effects. • The third argument of the function, data = veneer, indicates the name of the data frame object to be used. • The final argument, method = "REML", specifies that REML estimation, which is the default method for the lme() function, should be used. The estimates from the model fit can be obtained by using the summary() function: > summary(model7.1.fit) Confidence intervals for the covariance parameters in Model 7.1 can be obtained using the intervals() function: > intervals(model7.1.fit) The EBLUPs for each of the random effects in the model associated with the patients and the teeth within patients are obtained by using the random.effects() function: > random.effects(model7.1.fit) Unfortunately, the getVarCov() function cannot be used to obtain the estimated marginal variance-covariance matrices for given individuals in the data set, because this function has not yet been implemented in the nlme package for analyses with multiple levels of nested random effects. 332 Linear Mixed Models: A Practical Guide Using Statistical Software Step 2: Select a structure for the random effects (Model 7.1 vs. Model 7.1A). We carry out a likelihood ratio test of Hypothesis 7.1 by fitting Model 7.1 and Model 7.1A using REML estimation. Model 7.1 was fitted in Step 1 of the analysis. Model 7.1A is specified by omitting the random tooth effects from Model 7.1: > model7.1A.fit <- lme(gcf ~ time + base gcf + cda + age + time:base gcf + time:cda + time:age, random = list(patient = ~ time), data = veneer, method = "REML") We then subtract the –2 REML log-likelihood for Model 7.1 from that of the nested model, Model 7.1A. The p-value for the resulting test statistic can be obtained by referring it to a mixture of χ2 distributions with degrees of freedom of 0 and 1, and equal weights of 0.5. Users of R can use the anova() function to calculate the likelihood ratio test statistic for Hypothesis 7.1, but the reported p-value should be divided by 2, because of the distribution that the statistic follows under the null hypothesis. > anova(model7.1.fit, model7.1A.fit) We have strong evidence in favor of retaining the nested random tooth effects in the model. Step 3: Select a covariance structure for the residuals (Model 7.1, and Models 7.2A through 7.2C). Next, we fit models with less restrictive covariance structures for the residuals associated with observations on the same tooth. We first attempt to fit a model with an unstructured residual covariance matrix (Model 7.2A). 
We have included additional arguments in the lme() function, as shown in the following syntax: > model7.2A.fit <- lme(gcf ~ time + base gcf + cda + age + time:base gcf + time:cda + time:age, random = list(patient = ~ time, tooth = ~1), corr = corCompSymm(0.5, form = ~1 | patient/tooth), weights = varIdent(form = ~1 | time), data = veneer, method = "REML") We specify the unstructured residual covariance structure in two parts: the correlation of the residuals and their variances. First, we specify that the correlation structure is compound symmetric, by using the corr=corCompSymm() argument. An arbitrary starting value of 0.5 is used to estimate the correlation; the patient/tooth argument is used to identify the units of analysis to which the correlation structure applies (i.e., teeth nested within patients). Next, we use the weights = varIdent(form = ~1 | time) argument, to identify a covariance structure for the residuals that allows the residuals at the two time points to have different variances. We obtain results from this model fit by using the summary() function: > summary(model7.2A.fit) R does not produce a warning message when fitting this model. We attempt to use the intervals() function to obtain 95% confidence intervals for the covariance parameters: > intervals(model7.2A.fit) Models for Clustered Longitudinal Data:The Dental Veneer Example 333 Unlike the other software procedures, the lme() function in R computes approximate confidence intervals for each of the parameters being estimated (including the covariance parameters). However, we see that the estimated standard deviation of the residuals at Time 3 is 8.39 × 1.000 = 8.39, with a 95% confidence interval of (0.0008, 90591.87), while the estimated standard deviation of the residuals at Time 6 is 8.39 × 0.799 = 6.70, with a 95% confidence interval of (0.004, 143.73) (see Section 3.4.3 for more details about how these estimates are reported by the lme() function). Furthermore, the 95% confidence interval for the correlation of the two random effects at the patient level is (−1, 1). These results suggest that the estimates are extremely unstable, despite the apparently valid solution. We therefore consider simpler versions of this model. Next, we specify Model 7.2B with a more parsimonious compound symmetry covariance structure for the residuals associated with observations on the same tooth: > model7.2B.fit <- lme(gcf ~ time + base gcf + cda + age + time:base gcf + time:cda + time:age, random = list(patient = ~ time, tooth = ~1), corr = corCompSymm(0.5, form = ~1 | patient/tooth), data = veneer, method = "REML") This syntax is changed from that used to fit Model 7.2A by omitting the weights() argument, so that the residual variances at the two time points are constrained to be equal. We obtain results for this model fit by using the summary() function and attempt to obtain confidence intervals for the covariance parameters by using the intervals() function: > summary(model7.2B.fit) > intervals(model7.2B.fit) We once again see extremely wide intervals suggesting that the estimates of the covariance parameters are very unstable, despite the fact that the solution appears to be valid. This is due to the fact that the covariance of the residuals associated with the same tooth is aliased with the variance of the random tooth effects. 
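To see why only two time points lead to this aliasing, consider the implied within-tooth variances and covariances, conditional on the patient-level random effects. This is a sketch of the reasoning, using the notation from Hypotheses 7.1 and 7.2, with σ² denoting the residual variance and ρ the compound symmetry correlation in Model 7.2B:

\[
\mathrm{Var}(\mathrm{GCF}_{tij}) = \sigma^2_{\mathrm{int:tooth(patient)}} + \sigma^2, \qquad
\mathrm{Cov}(\mathrm{GCF}_{1ij}, \mathrm{GCF}_{2ij}) = \sigma^2_{\mathrm{int:tooth(patient)}} + \rho\,\sigma^2
\]

With only these two quantities identified by two repeated measures per tooth, the three parameters σ²int:tooth(patient), σ², and ρ cannot all be estimated separately; the same logic applies to the unstructured residual covariance in Model 7.2A.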
We now fit Model 7.2C, which allows the residuals to have different variances at the two time points but assumes that the residuals at the two time points are uncorrelated (the heterogeneous, or diagonal, structure): > model7.2C.fit <- lme(gcf ~ time + base gcf + cda + age + time:base gcf + time:cda + time:age, random = list(patient = ~ time, tooth = ~1), weights = varIdent(form = ~1 | time), data = veneer, method = "REML") In this syntax, we omit the corr = corCompSymm() option and include the weights option, which specifies that the residual variances at each time point differ (form = ~1 | time). We obtain results of the model fit by using the summary() function, and we no longer see extremely wide confidence intervals when using the intervals() function to obtain approximate 95% confidence intervals for the covariance parameters: > summary(model7.2C.fit) > intervals(model7.2C.fit) We test Hypothesis 7.2 by subtracting the –2 REML log-likelihood for Model 7.2C (the reference model, with heterogeneous residual variances) from that of Model 7.1 (the nested model). This likelihood ratio test can be easily implemented using the anova() function within R: 334 Linear Mixed Models: A Practical Guide Using Statistical Software > anova(model7.2C.fit, model7.1.fit) Because the test is not significant (p = 0.33), we keep Model 7.1 as our preferred model at this stage of the analysis. Step 4: Reduce the model by removing nonsignificant fixed effects (Model 7.1 vs. Model 7.3). We test whether we can omit the fixed effects associated with the two-way interactions between TIME and the other covariates in the model (Hypothesis 7.3) using a likelihood ratio test. First, we refit Model 7.1 using maximum likelihood estimation by including the method = "ML" option in the following syntax: > model7.1.ml.fit <- lme(gcf ~ time + base gcf + cda + age + time:base gcf + time:cda + time:age, random = list(patient = ~ time, tooth = ~1), data = veneer, method = "ML") Next, we fit Model 7.3, without the fixed effects of any of the two-way interactions, also using ML estimation: > model7.3.ml.fit <- lme(gcf ~ time + base gcf + cda + age, random = list(patient = ~ time, tooth = ~1), data = veneer, method = "ML") After fitting these two models using ML estimation, we use the anova() function to perform a likelihood ratio test of Hypothesis 7.3: > anova(model7.1.ml.fit, model7.3.ml.fit) The likelihood ratio test is nonsignificant (p = 0.61), so we keep Model 7.3 as our final model. We now refit Model 7.3 using REML estimation to obtain unbiased estimates of the covariance parameters. 
> model7.3.fit <- lme(gcf ~ time + base gcf + cda + age, random = list(patient = ~ time, tooth = ~1), data = veneer, method = "REML") > summary(model7.3.fit) 7.4.3.2 Analysis Using the lmer() Function We first load the lme4 package, so that the lmer() function can be used in the analysis: > library(lme4) Next, because the values of the TOOTH variable are not unique to a given patient and we need to specify that teeth are nested within patients, we create a new version of the TOOTH variable, TOOTH2, that does have unique values for each patient: > veneer$tooth2 <- as.numeric(paste(factor(veneer$patient), factor(veneer$tooth),sep="")) Models for Clustered Longitudinal Data:The Dental Veneer Example 335 Because of the way that models are specified when using the lmer() function, a failure to do this recoding would result in the random effects of TOOTH being crossed with the random patient effects (i.e., TOOTH = 6 for the first patient and TOOTH = 6 for the second patient would be interpreted as the same tooth for model-fitting purposes). This would result in unnecessary estimation complications, and incorrectly capture the nested structure of these data. We now proceed with model fitting using this recoded version of the TOOTH variable. Step 1: Fit a model with a “loaded” mean structure (Model 7.1). We first fit Model 7.1 using the lmer() function: > model7.1.fit.lmer <- lmer(gcf ~ time + base gcf + cda + age + time*base gcf + time*cda + time*age + (time | patient) + (1 | tooth2), data = veneer, REML = T) The syntax used for the lmer() function is discussed in detail below: • model7.1.fit.lmer is the name of the object that contains the results from the fit of Model 7.1. • The first argument of the function is a formula, which first defines the continuous response variable (GCF), and the covariates and interaction terms that have fixed effects in the model. Note that asterisks are used to indicate the interaction terms. • The formula also defines the random effects that are included in this model. The first set of random effects is specified with (time | patient), which indicates that the effect of time on GCF is allowed to randomly vary across levels of patient, in addition to the intercept (where random intercepts are once again included by default when the effect of a covariate is allowed to randomly vary). The covariance structure for these two random patient effects will be unstructured. The next random effect is specified with (1 | tooth2), which indicates that the intercept (represented by the constant value of 1) is also allowed to randomly vary across teeth within a patient (given that the recoded TOOTH2 identifier is being used). • The second argument of the function, data = veneer, indicates the name of the data frame object to be used. • The final argument, REML = T, specifies that REML estimation should be used. The estimates from the model fit can be obtained by using the summary() function: > summary(model7.1.fit.lmer) The intervals() function that was available for objects created using the lme() function is not available for objects created using the lmer() function, so we do not generate approximate 95% confidence intervals for the parameters in Model 7.1 here. 
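Although the intervals() function cannot be applied to objects produced by lmer(), more recent versions of the lme4 package provide a confint() method for fitted merMod objects that can be used to obtain approximate confidence intervals. The following is a hedged sketch; the available methods and their exact behavior depend on the version of lme4 that is installed:

> # Wald-type intervals (returned for the fixed effects; covariance parameters are not covered)
> confint(model7.1.fit.lmer, method = "Wald")
> # Profile-likelihood intervals for all parameters (slower, and may fail for
> # weakly identified covariance parameters)
> confint(model7.1.fit.lmer, method = "profile")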
The EBLUPs for each of the random effects in the model associated with the patients and the teeth within patients are obtained by using the ranef() function: > ranef(model7.1.fit.lmer)$patient > ranef(model7.1.fit.lmer)$tooth2 336 Linear Mixed Models: A Practical Guide Using Statistical Software Step 2: Select a structure for the random effects (Model 7.1 vs. Model 7.1A). We carry out a likelihood ratio test of Hypothesis 7.1 by fitting Model 7.1 and Model 7.1A using REML estimation. Model 7.1 was fitted in Step 1 of the analysis. Model 7.1A is specified by omitting the random tooth effects from Model 7.1: > model7.1A.fit.lmer <- lmer(gcf ~ time + base gcf + cda + age + time*base gcf + time*cda + time*age + (time | patient), data = veneer, REML = T) We then subtract the –2 REML log-likelihood for Model 7.1 from that of the nested model, Model 7.1A. The p-value for the resulting test statistic can be obtained by referring it to a mixture of χ2 distributions with degrees of freedom of 0 and 1, and equal weights of 0.5. Users of R can use the anova() function to calculate the likelihood ratio test statistic for Hypothesis 7.1, but the reported p-value should be divided by 2, because of the distribution that the statistic follows under the null hypothesis. > anova(model7.1.fit.lmer, model7.1A.fit.lmer) We have strong evidence in favor of retaining the nested random tooth effects in the model. Step 3: Select a covariance structure for the residuals (Model 7.1, and Models 7.2A through 7.2C). The current implementation of the lmer() function does not allow users to fit models with specified covariance structures for the conditional residuals, and assumes that residuals have constant variance and zero covariance (conditional on the random effects included in a given model). Given the problems with aliasing that we encountered when attempting to fit Models 7.2A through 7.2C in the other software procedures, this is not a serious limitation for this case study; however, some models may be improved by allowing for heterogeneous residual variances across groups, and this could be a limitation in other studies (e.g., Chapter 3). We therefore proceed with testing the fixed effects of the interactions in the model that retains all of the random effects associated with the patients and the teeth within patients. Step 4: Reduce the model by removing nonsignificant fixed effects (Model 7.1 vs. Model 7.3). We test whether we can omit the fixed effects associated with the two-way interactions between TIME and the other covariates in the model (Hypothesis 7.3) using a likelihood ratio test. First, we refit Model 7.1 using maximum likelihood estimation by including the REML = F option in the following syntax: > model7.1.ml.fit.lmer <- lmer(gcf ~ time + base_gcf + cda + age + time*base_gcf + time*cda + time*age + (time | patient) + (1 | tooth2), data = veneer, REML = F) Next, we fit Model 7.3, without the fixed effects of any of the two-way interactions, also using ML estimation: Models for Clustered Longitudinal Data:The Dental Veneer Example 337 > model7.3.ml.fit.lmer <- lmer(gcf ~ time + base_gcf + cda + age + (time | patient) + (1 | tooth2), data = veneer, REML = F) After fitting these two models using ML estimation, we use the anova() function to perform a likelihood ratio test of Hypothesis 7.3: > anova(model7.1.ml.fit.lmer, model7.3.ml.fit.lmer) The likelihood ratio test is nonsignificant (p = 0.61), so we keep Model 7.3 as our final model. 
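The same test statistic can also be reproduced directly from the maximized ML log-likelihoods of the two fitted model objects; this is a sketch assuming the objects created by the preceding syntax are still available in the R session:

> # Likelihood ratio test statistic for Hypothesis 7.3, computed by hand
> lrt.stat <- as.numeric(2 * (logLik(model7.1.ml.fit.lmer) - logLik(model7.3.ml.fit.lmer)))
> # p-value based on a chi-square distribution with 3 degrees of freedom,
> # corresponding to the 3 fixed effects omitted in Model 7.3
> pchisq(lrt.stat, df = 3, lower.tail = FALSE)

The resulting p-value should agree with the value reported by the anova() function.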
We now refit Model 7.3 using REML estimation to obtain unbiased estimates of the covariance parameters. > model7.3.fit.lmer <- lmer(gcf ~ time + base_gcf + cda + age + (time | patient) + (1 | tooth2), data = veneer, REML = T) > summary(model7.3.fit.lmer) As mentioned in earlier chapters, we recommend use of the lmerTest package in R for users interested in testing hypotheses about individual fixed-effect parameters estimated using the lmer() function. Likelihood ratio tests can be used, as demonstrated here, but these tests rely on asymptotic assumptions that may not apply for smaller data sets like the Dental Veneer data. 7.4.4 Stata Before we begin the analysis using Stata, we illustrate how to import the raw tab-delimited Dental Veneer data from the web site for the book, using web-aware Stata (where there is an active connection to the Internet): . insheet using http://www-personal.umich.edu/~bwest/veneer.dat Next, we generate variables representing the two-way interactions between TIME and the other continuous covariates, BASE GCF, CDA, and AGE: . gen time base gcf = time * base gcf . gen time cda = time * cda . gen time age = time * age Step 1: Fit a model with a “loaded” mean structure (Model 7.1). Now, we fit Model 7.1 using the mixed command: . * Model 7.1. . mixed gcf time base_gcf cda age time_base_gcf time_cda time_age || patient: time, cov(unstruct) || tooth: , variance reml The first variable listed is the dependent variable, GCF. Next, we list the covariates (including the two-way interactions) with associated fixed effects in the model. The random effects are specified after the fixed part of the model. If a multilevel data set is organized by a series of nested groups, such as patients and teeth nested within patients, then the random-effects structure of the mixed model is specified by a series of equations, 338 Linear Mixed Models: A Practical Guide Using Statistical Software each separated by ||. The nesting structure reads from left to right, with the first cluster identifier (PATIENT in this case) indicating the highest level of the data set. For Model 7.1, we specify the random factor identifying clusters at Level 3 of the data set (PATIENT) first. We indicate that the effect of TIME is allowed to vary randomly by PATIENT. A random patient-specific intercept is included by default and is not listed. We also specify that the covariance structure for the random effects at the patient level (the D (2) , matrix following our notation) is unstructured, with the option cov(unstruct). Because TOOTH follows the second clustering indicator ||, Stata assumes that levels of TOOTH are nested within levels of PATIENT. We do not list any variables after TOOTH, so Stata assumes that the only random effect for each tooth is associated with the intercept. A covariance structure for the single random effect associated with each tooth is not required, because only a single variance will be estimated. Finally, the variance option requests that the estimated variances of the random patient and tooth effects, rather than their estimated standard deviations, be displayed in the output (along with the estimated variance of the residuals), and the reml option requests REML estimation. After running the mixed command, Stata displays a summary of the clustering structure of the data set implied by this model specification: # ----------------------------------------------------------| No. 
of Observations per Group Group Variable | Groups Minimum Average Maximum ----------------+-----------------------------------------patient | 12 2 9.2 12 tooth | 55 2 2.0 2 ----------------------------------------------------------" ! This summary is useful to determine whether the clustering structure has been identified correctly to Stata. In this case, Stata notes that there are 12 patients and 55 teeth nested within the patients. There are from 2 to 12 observations per patient, and 2 observations per tooth. After the command has finished running, the parameter estimates appear in the output. We can obtain information criteria associated with the fit of the model by using the estat ic command, and then save these criteria in a model fit object (model71) for later analyses: . estat ic . est store model71 Step 2: Select a structure for the random effects (Model 7.1 vs. Model 7.1A). In the output associated with the fit of Model 7.1, Stata automatically reports an omnibus likelihood ratio test for all random effects at once vs. no random effects. The test statistic is calculated by subtracting the −2 REML log-likelihood for Model 7.1 from that of a simple linear regression model without any random effects. Stata reports the following note along with the test: Note: LR test is conservative and provided only for reference. The test is conservative because appropriate theory for the distribution of this test statistic for multiple random effects has not yet been developed (users can click on the LR test is conservative statement in the Stata Results window for an explanation of this issue). We recommend testing the variance components associated with the random Models for Clustered Longitudinal Data:The Dental Veneer Example 339 effects one by one (e.g., Hypothesis 7.1), using likelihood ratio tests based on REML estimation. While this is generally appropriate for larger samples, we now illustrate this process using the Dental Veneer data. To test Hypothesis 7.1 using an LRT, we fit Model 7.1A. We specify this model by removing the portion of the random effects specification from Model 7.1 involving teeth nested within patients (i.e., || tooth: ,): . * Model 7.1A. . mixed gcf time base_gcf cda age time_base_gcf time_cda time_age || patient: time, cov(unstruct) variance reml To obtain a test statistic for Hypothesis 7.1, the –2 REML log-likelihood for Model 7.1 (the reference model) is subtracted from that of Model 7.1A (the nested model); both values are calculated by multiplying the reported log-restricted likelihood in the output by –2. The resulting test statistic (11.2) is referred to a mixture of χ2 distributions with degrees of freedom of 0 and 1, and equal weights of 0.5. We calculate the appropriate p-value for this test statistic as follows: . display 0.5*chiprob(1,11.2) .00040899 The test is significant (p = 0.0004), so we retain the nested random tooth effects in the model. We retain the random patient effects without testing them to reflect the hierarchical structure of the data in the model specification. Step 3: Select a covariance structure for the residuals (Model 7.1, and Models 7.2A through 7.2C). We now consider alternative covariance structures for the residuals, given that we have repeated measures on the teeth nested within each patient. We begin with Model 7.2A, and specify an unstructured covariance structure for the residuals associated with each tooth within a patient: . 
mixed gcf time base_gcf cda age time_base_gcf time_cda time_age || patient: time, cov(unstruct) || tooth: , residuals(un,t(time)) variance reml We note the inclusion of the option residuals(un,t(time)) after the random tooth effects have been specified. This option indicates that at the lowest level of the data set (the repeated measures on each tooth), the residuals have an unstructured covariance structure (indicated by un), and the residuals should be ordered within unique teeth by values on the variable TIME (indicated by t(time)). When fitting this model, Stata does not generate any warnings or error messages, but we see red flags in the output similar to those noted when performing this analysis using the lme() function in R. The estimated standard errors for the estimated variance of the random tooth effects and the three residual covariance parameters (the residual variances at the two time points and the covariance of the two residuals) are all extremely large and nearly equal (about 2416), suggesting an identifiability problem with these four covariance parameters. The corresponding approximate confidence intervals for these covariance parameters are defined by entirely unrealistic limits that are either extremely large or extremely small, so we would view these results with some suspicion and consider alternative covariance structures. We now consider Model 7.2B, with a compound symmetry covariance structure for the residuals associated with each tooth within a patient: 340 Linear Mixed Models: A Practical Guide Using Statistical Software . mixed gcf time base_gcf cda age time_base_gcf time_cda time_age || patient: time, cov(unstruct) || tooth: , residuals(exchangeable) variance reml The only difference between this code and the code for Model 7.2A is the specification of an exchangeable, or compound symmetry, covariance structure for the residuals associated with each tooth (where no time ordering is needed given this covariance structure). When fitting this model, Stata does not converge to a solution after several hundred iterations, and the estimation algorithm needs to be stopped manually. This is due to the fact that the covariance of the residuals and the variance of the random tooth effects in this structure are perfectly aliased with each other, and Stata is unable to generate parameter estimates. We therefore consider Model 7.2C, which removes the covariance of the residuals and allows the variances of the residuals at each time point to be unique: . mixed gcf time base_gcf cda age time_base_gcf time_cda time_age || patient: time, cov(unstruct) || tooth: , residuals(ind, by(time)) variance reml We now specify that the residuals are independent (using the option ind), meaning that the residuals have a diagonal covariance structure, and allow the residuals at each time point to have unique variances (using the by(time) option). We do not encounter any problems or red flags when fitting this model, and we save the information criteria in a second model fit object, enabling a likelihood ratio test of Hypothesis 7.2 (that the residual variances at each time point are equal): . est store model72C We now perform a formal test of Hypothesis 7.2 using the lrtest command: . lrtest model72C model71 The resulting test statistic is not significant (p = 0.33), suggesting that we would fail to reject the null hypothesis that the residual variance is constant at the two time points. We therefore proceed with Model 7.1. 
Step 4: Reduce the model by removing nonsignificant fixed effects (Model 7.1 vs. Model 7.3). We now test Hypothesis 7.3 using a likelihood ratio test based on ML estimation to decide whether we want to retain the fixed effects associated with the interactions between TIME and the other covariates in the model. To do this, we first refit Model 7.1 using ML estimation for all parameters in the model (note that the reml option has been removed): . * Model 7.1. . mixed gcf time base_gcf cda age time_base_gcf time_cda time_age || patient: time, cov(unstruct) || tooth: , variance We view the model fit criteria by using the estat ic command and store the model estimates and fit criteria in a new object named model7_l_ml: . estat ic . est store model7_l_ml Next, we fit a nested model (Model 7.3), again using ML estimation, by excluding the two-way interaction terms from the model specification: Models for Clustered Longitudinal Data:The Dental Veneer Example 341 . * Model 7.3. . mixed gcf time base_gcf cda age || patient: time, cov(unstruct) || tooth: , variance We display the model information criteria for this nested model, and store the model fit criteria and related estimates in another new object named model7_3_ml: . estat ic . est store model7_3_ml The appropriate likelihood ratio test for Hypothesis 7.3 can now be carried out by applying the lrtest command to the two objects containing the model fit information: . lrtest model7_l_ml model7_3_ml The results of this test are displayed in the Stata output: Likelihood-ratio test (Assumption: model7_3_ml nested in model7_1_ml) LR chi2(3) = Prob > chi2 = 1.84 0.6060 Because the test is not significant (p = 0.61), we omit the two-way interactions and choose Model 7.3 as our preferred model. Additional tests could be performed for the fixed effects associated with the four covariates in the model, but we stop at this point (so that we can interpret the main effects of the covariates) and refit our final model (Model 7.3) using REML estimation: . * Model 7.3 (REML). . mixed gcf time base_gcf cda age || patient: time, cov(unstruct) || tooth: , variance reml We carry out diagnostics for Model 7.3 using informal graphical procedures in Stata in Section 7.10. 7.4.5 HLM We use the HMLM2 (Hierarchical Multivariate Linear Model 2) procedure to fit the models for the Dental Veneer data set, because this procedure allows for specification of alternative residual covariance structures (unlike the HLM3 procedure). We note that only ML estimation is available in the HMLM2 procedure. 7.4.5.1 Data Set Preparation To fit the models outlined in Section 7.3 using the HLM software, we need to prepare three separate data sets: 1. The Level 1 (longitudinal measures) data set: This data set has two observations (rows) per tooth and contains variables measured at each time point (such as GCF and the TIME variable). It also contains the mandatory TOOTH and PATIENT variables. In addition, the data set needs to include two indicator variables, one for each time point. We create two indicator variables: TIME3 has a value of 1 for all observations at 3 months, and 0 otherwise, whereas TIME6 equals 1 for all observations at 6 months, and 0 otherwise. These indicator variables must be created prior to importing the data set into HLM. The data must be sorted by PATIENT, TOOTH, and TIME. 342 Linear Mixed Models: A Practical Guide Using Statistical Software 2. 
The Level 2 (tooth-level) data set: This data set has one observation (row) per tooth and contains variables measured once for each tooth (e.g., TOOTH, CDA, and BASE GCF). This data set must also include the PATIENT variable. The data must be sorted by PATIENT and TOOTH. 3. The Level 3 (patient-level) data set: This data set has one observation (row) per patient and contains variables measured once for each patient (e.g., PATIENT and AGE). The data must be sorted by PATIENT. The Level 1, Level 2, and Level 3 data sets can easily be derived from a single data set having the “long” structure shown in Table 7.2. For this example, we assume that all three data sets are stored in SPSS for Windows format. 7.4.5.2 Preparing the Multivariate Data Matrix (MDM) File In the main HLM window, click File, Make new MDM file, and then Stat package input. In the dialog box that opens, select HMLM2 to fit a Hierarchical (teeth nested within patients), Multivariate (repeated measures on the teeth) Linear Model, and click OK. In the Make MDM window, choose the Input File Type as SPSS/Windows. Locate the Level 1 Specification, Browse to the location of the Level 1 data set defined earlier, and open the file. Now, click on the Choose Variables button, and select the following variables: PATIENT (check “L3id,” because this variable identifies Level 3 units), TOOTH (check “L2id,” because this variable identifies the Level 2 units), TIME (check “MDM” to include this variable in the MDM file), the response variable GCF (check “MDM”), and finally, TIME3 and TIME6 (check “ind” for both, because they are indicators for the repeated measures). Click OK when finished selecting these six variables. Next, locate the Level 2 Specification area, Browse to the Level 2 data set defined earlier, and open it. In the Choose Variables dialog box, select the following variables: PATIENT (check “L3id”), TOOTH (check “L2id”), CDA (check “MDM” to include this tooth-level variable in the MDM file), and BASE GCF (check “MDM”). Click OK when finished selecting these four variables. Now, in the Level 3 Specification area, Browse to the Level 3 data set defined earlier, and open it. Select the PATIENT variable (check “L3id”) and the AGE variable (check “MDM”). Click on OK to continue. Next, select “longitudinal” as the structure of the data. As the MDM template window indicates, this selection will only affect the notation used when the models are displayed in the HLM model-building window. Once all three data sets have been identified and the variables of interest have been selected, type a name for the MDM file (with an .mdm extension), and go to the MDM template file portion of the window. Click on Save mdmt file to save this setup as an MDM template file for later use (you will be prompted to supply a file name with an .mdmt suffix). Finally, click on the Make MDM button to create the MDM file using the three input files. You should briefly see a screen displaying descriptive statistics and identifying the number of records processed in each of the three input files. After this screen disappears, you can click on the Check Stats button to view descriptive statistics for the selected MDM variables at each level of the data. Click on the Done button to proceed to the model specification window. In the following model-building steps, we use notation from the HLM software. Table 7.3 shows the correspondence between the HLM notation and that used in (7.1) through (7.4). Step 1: Fit a model with a “loaded” mean structure (Model 7.1). 
We begin by specifying the Level 1 model, i.e., the model for the longitudinal measures collected on the teeth. The variables in the Level 1 data set are displayed in a list at the Models for Clustered Longitudinal Data:The Dental Veneer Example 343 left-hand side of the model specification window. Click on the outcome variable (GCF), and identify it as the Outcome variable. Go to the Basic Settings menu, and click on Skip Unrestricted (the “unrestricted” model in HMLM2 refers to a model with no random effects and an unstructured covariance matrix for the residuals, which will be considered in the next step), and click on Homogeneous (to specify that the residual variance will be constant and that the covariance of the residuals will be zero in this initial model). Choose a title for this analysis (such as “Veneer Data: Model 7.1”), and choose a location and name for the output (.html) file that will contain the results of the model fit. Click on OK to return to the model-building window. Click on File and Preferences, and then select Use level subscripts, to display subscripts in the model-building window. Three models will now be displayed. The initial Level 1 model, as displayed in the HLM model specification window, is as follows: Model 7.1: Level 1 Model (Initial) GCF*tij = π0ij + εtij In this simplest specification of the Level 1 model, the GCFtij for an individual measurement on a tooth depends on the tooth-specific intercept, denoted by π0ij , and a residual for the individual measure, denoted by εtij . We now add the TIME variable from the Level 1 data set to this model, by clicking on the TIME variable and then clicking on “Add variable uncentered.” The Level 1 model now has the following form: Model 7.1: Level 1 Model (Final) GCF*tij = π0ij + π1ij (TIMEtij ) + εtij The Level 2 model describes the equations for the tooth-specific intercept, π0ij , and the tooth-specific time effect, π1ij . The simplest Level 2 model is given by the following equations: Model 7.1: Level 2 Model (Initial) π0ij = β00j + r0ij π1ij = β10j The tooth-specific intercept, π0ij , depends on the patient-specific intercept, β00j , and a random effect associated with the tooth, r0ij . The tooth-specific time effect, π1ij , does not vary from tooth to tooth within the same patient and is simply equal to the patient-specific time effect, β10j . If we had more than two time points, this random effect could be included by clicking on the shaded r1ij term in the model for π1ij . We now include the tooth-level covariates by clicking on the Level 2 button, and then selecting the Level 2 equations for π0ij and π1ij , in turn. We add the uncentered versions of CDA and BASE GCF to both equations to get the completed version of the Level 2 model: Model 7.1: Level 2 Model (Final) π0ij = β00j + β01j (BASE GCFij ) + β02j (CDAij ) + r0ij π1ij = β10j + β11j (BASE GCFij ) + β12j (CDAij ) After defining the Level 1 and Level 2 models, HLM displays the combined Level 1 and Level 2 model in the model specification window. It also shows how the marginal variancecovariance matrix will be calculated based on the random tooth effects and the residuals 344 Linear Mixed Models: A Practical Guide Using Statistical Software currently specified in the Level 1 and Level 2 models. By choosing the Homogeneous option in the Basic Settings menu earlier, we specified that the residuals are assumed to be independent with constant variance. 
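To see what that combined model looks like, we can substitute the Level 2 equations into the Level 1 model ourselves; this display is our own write-out in the HLM notation used above, not output copied from the HLM window:

\[
\mathrm{GCF}_{tij} = \beta_{00j} + \beta_{01j}\,\mathrm{BASE\ GCF}_{ij} + \beta_{02j}\,\mathrm{CDA}_{ij}
+ \big[\beta_{10j} + \beta_{11j}\,\mathrm{BASE\ GCF}_{ij} + \beta_{12j}\,\mathrm{CDA}_{ij}\big]\,\mathrm{TIME}_{tij}
+ r_{0ij} + \varepsilon_{tij}
\]

Substituting the Level 3 equations for β00j and β10j, presented next, then produces the full LMM defined in (7.1).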
The Level 3 portion of the model specification window shows the equations for the patient-specific intercept, β00j , the patient-specific time slope, β10j , and the patient-specific effects of BASE GCF and CDA, which were defined in the Level 2 model. The simplest Level 3 equations for the patient-specific intercepts and slopes include only the overall fixed effects, γ000 and γ100 , and the patient-specific random effects for the intercept, u00j , and slope, u10j , respectively (one needs to click on the shaded u10j term to include it in the model). We add the patient-level covariate AGE to these equations: Model 7.1: Level 3 Model (Final) β00j = γ000 + γ001 (AGEj ) + u00j β10j = γ100 + γ101 (AGEj ) + u10j The patient-specific intercept, β00j , depends on the overall fixed intercept, γ000 , the patient-level covariate AGE, and a random effect for the intercept associated with the patient, u00j . The patient-specific time slope, β10j , depends on the fixed overall time slope, γ100 , the patient-level covariate AGE, and a random effect for TIME associated with the patient,u10j . The expressions for the Level 3, Level 2, and Level 1 models defined earlier can be combined to obtain the LMM defined in (7.1). The correspondence between the HLM notation and the notation we use for (7.1) can be found in Table 7.3. To fit Model 7.1, click on Run Analysis, and select Save and Run to save the .hlm command file. HLM will prompt you to supply a name and location for this file. After the estimation of the model has finished, click on File and select View Output to see the resulting parameter estimates and fit statistics. Note that HLM automatically displays the fixed-effect parameter estimates with model-based standard errors (“Final estimation of fixed effects”). The estimates of the covariance parameters associated with the random effects at each level of the model are also displayed. Step 2: Select a structure for the random effects (Model 7.1 vs. Model 7.1A). At this stage of the analysis, we wish to test Hypothesis 7.1 using a likelihood ratio test. However, HLM cannot fit models in which all random effects associated with units at a given level of a clustered data set have been removed. Because Model 7.1A has no random effects at the tooth level, we cannot consider a likelihood ratio test of Hypothesis 7.1 in HLM, and we retain all random effects in Model 7.1, as we did when using the other software procedures. Step 3: Select a covariance structure for the residuals (Model 7.1, and Models 7.2A through 7.2C). We are unable to specify Model 7.2A, having random effects at the patient and tooth levels and an unstructured covariance structure for the residuals, using the HMLM2 procedure. However, we can fit Model 7.2B, which has random effects at the patient- and tooth-level and a compound symmetry covariance structure for the residuals. To do this, we use a firstorder autoregressive, or AR(1), covariance structure for the Level 1 (or residual) variance. In this case, the AR(1) covariance structure is equivalent to the compound symmetry structure, because there are only two time points for each tooth in the Dental Veneer data set. Open the .hlm file saved in the process of fitting Model 7.1, and click Basic Settings. 
Choose the 1st order autoregressive covariance structure option, make sure that the Models for Clustered Longitudinal Data:The Dental Veneer Example 345 “unrestricted” model is still being skipped, enter a new title for the analysis and a different name for the output (.html) file to save, and click on OK to continue. We recommend saving the .hlm file under a different name as well when making these changes. Next, click on Run Analysis to fit Model 7.2B. In the process of fitting the model, HLM displays the following message: Invalid info, score, or likelihood This warning message arises because the residual covariance parameters are aliased with the variance of the nested random tooth effects in this model. The output for Model 7.2B has two parts. The first part is essentially a repeat of the output for Model 7.1, which had a homogeneous residual variance structure (under the header OUTPUT FOR RANDOM EFFECTS MODEL WITH HOMOGENEOUS LEVEL-1 VARIANCE). The second part (under the header OUTPUT FOR RANDOM EFFECTS MODEL FIRST-ORDER AUTOREGRESSIVE MODEL FOR LEVEL-1 VARIANCE) does not include any estimates of the fixed-effect parameters, due to the warning message indicated above. Because of the problems encountered in fitting Model 7.2B, we do not consider these results. Next, we fit Model 7.2C, which has a heterogeneous residual variance structure. In this model, the residuals at time 1 and time 2 are allowed to have different residual variances, but they are assumed to be uncorrelated. Click on the Basic Settings menu in the model-building window, and then click on the Heterogeneous option for the residual variance (make sure that “Skip unrestricted” is still checked). Enter a different title for this analysis (e.g., “Veneer Data: Model 7.2C”) and a new name for the output file, and then click on OK to proceed with the analysis. Next, save the .hlm file under a different name, and click on Run Analysis to fit this model and investigate the output. No warning messages are generated when Model 7.2C is fitted. However, because HLM only fits models of this type using ML estimation, we cannot carry out the REML-based likelihood ratio test of Hypothesis 7.2. The HMLM2 procedure by default performs an ML-based likelihood ratio test, calculating the difference in the deviance (or –2 ML loglikelihood) statistics from Model 7.1 and Model 7.2C, and displays the result of this test at the bottom of the output for the heterogeneous variances model (Model 7.2C). This nonsignificant likelihood ratio test (p = 0.31) suggests that the simpler nested model (Model 7.1) is preferable at this stage of the analysis. Step 4: Reduce the model by removing nonsignificant fixed effects (Model 7.1 vs. Model 7.3). We now fit Model 7.3, which omits the fixed effects associated with the two-way interactions between TIME and the other covariates from Model 7.1. In HLM, this is accomplished by removing the effects of the covariates in question from the Level 2 and Level 3 equations for the effects of TIME. 
First, the fixed effects associated with the tooth-level covariates, BASE GCF and CDA, are removed from the Level 2 equation for the tooth-specific effect of TIME, π1ij , as follows: Model 7.1: Level 2 Equation for the Effect of Time π1ij = β10j + β11j (BASE GCFij ) + β12j (CDAij ) Model 7.3: Level 2 Equation for the Effect of Time with Covariates Removed π1ij = β10j 346 Linear Mixed Models: A Practical Guide Using Statistical Software We also remove the patient-level covariate, AGE, from the Level 3 model for the patientspecific effect of TIME, β10j : Model 7.1: Level 3 Equation for the Effect of Time β10j = γ100 + γ101 (AGEj ) + u10j Model 7.3: Level 3 Equation for the Effect of Time with Covariates Removed β10j = γ100 + u10j To accomplish this, open the .hlm file defining Model 7.1. In the HLM model specification window, click on the Level 2 equation for the effect of TIME, click on the BASE GCF covariate in the list of covariates at the left of the window, and click on Delete variable from model. Repeat this process for the CDA variable in the Level 2 model. Then, click on the Level 3 equation for the effect of TIME, and delete the AGE variable. After making these changes, click on Basic Settings to change the title for this analysis and the name of the text output file, and click OK. To set up a likelihood ratio test of Hypothesis 7.3, click Other Settings and Hypothesis Testing. Enter the deviance reported for Model 7.1 (843.65045) and the number of parameters in Model 7.1 (13), and click OK. Save the .hlm file associated with the Model 7.3 specification under a different name, and then click Run Analysis. The results of the likelihood ratio test for the fixedeffect parameters that have been removed from this model (Hypothesis 7.3) can be viewed at the bottom of the output for Model 7.3. 7.5 Results of Hypothesis Tests The results of the hypothesis tests reported in this section were based on the analysis of the Dental Veneer data using Stata, and are summarized in Table 7.5. 7.5.1 Likelihood Ratio Tests for Random Effects Hypothesis 7.1. The nested random effects, u0i|j , associated with teeth within the same patient can be omitted from Model 7.1. The likelihood ratio test statistic for Hypothesis 7.1 is calculated by subtracting the value of the –2 REML log-likelihood associated with Model 7.1 (the reference model including the random tooth-specific intercepts) from that of Model 7.1A (the nested model excluding the random tooth effects). Because the null hypothesis value of the variance of the random tooth2 specific intercepts is at the boundary of the parameter space (H0 : σint : tooth(patient) = 0), the null distribution of the test statistic is a mixture of χ20 and χ21 distributions, each with equal weight 0.5 (Verbeke & Molenberghs, 2000). To evaluate the significance of the test statistic, we calculate the p-value as follows: p-value = 0.5 × P(χ20 > 11.2) + 0.5 × P(χ21 > 11.2) < 0.001 We reject the null hypothesis and retain the random effects associated with teeth nested within patients in Model 7.1 and all subsequent models. Models for Clustered Longitudinal Data:The Dental Veneer Example 347 TABLE 7.5: Summary of Hypothesis Test Results for the Dental Veneer Analysis Hypothesis Label Test Estimation Method Models Compared (Nested vs. Reference) Test Statistic Values (Calculation) p-Value 7.1 LRT REML 7.1A vs. 7.1 χ2 (0 : 1) = 11.2 (858.3 – 847.1) < .001 7.2 LRT REML 7.1 vs. 
7.2C χ2(1) = 0.9 (847.1 – 846.2) 0.34
7.3 LRT ML 7.3 vs. 7.1 χ2(3) = 1.8 (845.5 – 843.7) 0.61
Note: See Table 7.4 for null and alternative hypotheses, and distributions of test statistics under H0.
The presence of the random tooth-specific intercepts implies that different teeth within the same patient tend to have consistently different GCF values over time, which is in keeping with what we observed in the initial data summary. To preserve the hierarchical nature of the model, we do not consider fitting a model without the random patient-specific effects, u0j and u1j.

7.5.2 Likelihood Ratio Tests for Residual Variance
Hypothesis 7.2. The variance of the residuals is constant (homogeneous) across the time points in Model 7.2C. To test Hypothesis 7.2, we use a REML-based likelihood ratio test. The test statistic is calculated by subtracting the –2 REML log-likelihood value for Model 7.2C, the reference model with heterogeneous residual variances, from that for Model 7.1, the nested model. Because Model 7.2C has one additional variance parameter compared to Model 7.1, the asymptotic null distribution of the test statistic is a χ2(1) distribution. We do not reject the null hypothesis in this case (p = 0.34) and decide to keep a homogeneous residual variance structure in Model 7.1 and all of the subsequent models.

7.5.3 Likelihood Ratio Tests for Fixed Effects
Hypothesis 7.3. The fixed effects associated with the two-way interactions between TIME and the patient- and tooth-level covariates can be omitted from Model 7.1. To test Hypothesis 7.3, we use an ML-based likelihood ratio test. We calculate the test statistic by subtracting the –2 ML log-likelihood for Model 7.1 from that for Model 7.3. The asymptotic null distribution of the test statistic is a χ2 distribution with 3 degrees of freedom, corresponding to the 3 fixed-effect parameters that are omitted in the nested model (Model 7.3) compared to the reference model (Model 7.1). There is not enough evidence to reject the null hypothesis for this test (p = 0.61), so we remove the two-way interactions involving TIME from the model. We do not attempt to reduce the model further, because the research is focused on the effects of the covariates on GCF. Model 7.3 is the final model that we consider for the analysis of the Dental Veneer data.

7.6 Comparing Results across the Software Procedures
7.6.1 Comparing Model 7.1 Results
Table 7.6 shows a comparison of selected results obtained using the six software procedures to fit the initial three-level model, Model 7.1, to the Dental Veneer data. This model is “loaded” with fixed effects, has two patient-specific random effects (associated with the intercept and with the effect of time), has a random effect associated with each tooth nested within a patient, and has residuals that are independent and identically distributed. Model 7.1 was fitted using REML estimation in SAS, SPSS, R, and Stata, and was fitted using ML estimation in HLM (the current version of the HMLM2 procedure does not allow models to be fitted using REML estimation). Table 7.6 demonstrates that the procedures in SAS, SPSS, R, and Stata agree in terms of the estimated fixed-effect parameters and their standard errors for Model 7.1. The procedures in all of these software packages use REML estimation by default. REML estimation is not available in HMLM2, so HMLM2 uses ML estimation instead.
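For reference, the three-level structure of Model 7.1 that these procedures share can be specified quite compactly in R. The sketch below uses the lme() function from the nlme package and assumes that the Dental Veneer data are stored in a data frame named veneer with lower-case variable names (gcf, time, base_gcf, cda, age, patient, tooth); the names in a particular copy of the data may differ.

library(nlme)
# Model 7.1: loaded mean structure with TIME interactions; a random intercept and
# TIME slope for patients, and a random intercept for teeth nested within patients.
# REML is the default estimation method for lme().
model7.1.fit <- lme(gcf ~ time + base_gcf + cda + age +
                      time:base_gcf + time:cda + time:age,
                    random = list(patient = ~ time, tooth = ~ 1),
                    data = veneer)
summary(model7.1.fit)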
Because HMLM2 uses ML estimation, the fixed-effect estimates it reports differ somewhat, as expected, from those produced by the other software procedures. Most notably, the estimated standard errors reported by HMLM2 are smaller in almost all cases than those reported by the other software procedures. We expect this because of the bias in the estimated covariance parameters when ML estimation is used instead of REML (see Subsection 2.4.1). The estimated covariance parameters generated by HMLM2 differ more markedly from those in the other five software procedures. Although the results generated by the procedures in SAS, SPSS, R, and Stata are the same, the covariance parameters estimated by HMLM2 tend to be smaller and to have smaller standard errors than those reported by the other software procedures. Again, this is anticipated, in view of the bias in the ML estimates of the covariance parameters. The difference is most apparent in the variance of the random patient-specific intercepts, σ2int:patient, which is estimated to be 555.39 with a standard error of 279.75 by the mixed procedure in Stata, and is estimated to be 447.13 with a standard error of 212.85 by HMLM2.
There are also differences in the information criteria reported across the software procedures. The programs that use REML estimation agree in terms of the –2 REML log-likelihoods, but disagree in terms of the other information criteria, because of the different calculation formulas that are used (see Section 3.6 for a discussion of these differences). The –2 log-likelihood reported by HMLM2 (referred to as the deviance) is not comparable with those from the other software procedures, because it is calculated using ML estimation.

7.6.2 Comparing Results for Models 7.2A, 7.2B, and 7.2C
Table 7.7 presents a comparison of selected results across the procedures in SAS, SPSS, R, Stata, and HLM for Models 7.2A, 7.2B, and 7.2C. Recall that each of these models has a different residual covariance structure, and that there were problems with aliasing of the covariance parameters in Models 7.2A and 7.2B. We do not display results for the lmer() procedure in R, given that one cannot currently fit models with conditional errors that are correlated and/or have nonconstant variance when using this procedure. In Table 7.7 we present the information criteria calculated by the procedures in SAS, SPSS, R, and Stata for Model 7.2A, Model 7.2B, and Model 7.2C. Because the covariance parameters in Model 7.2A and Model 7.2B are aliased, we do not compare their results with those for Model 7.2C, but present brief descriptions of how the problem might be detected in the software procedures. We note that the –2 REML log-likelihoods are virtually the same for a given model across the procedures. The other information criteria (AIC and BIC) differ because of different calculation formulas. We report the model information criteria and covariance parameter estimates for Model 7.2C, which has a heterogeneous residual variance structure and is the only model in Table 7.7 that does not have an aliasing problem. The estimated covariance parameters and their respective standard errors are comparable across the procedures in SAS, SPSS, R, and Stata, each of which uses REML estimation.
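In R, the heterogeneous residual variances of Model 7.2C can be requested through the weights argument of lme(); a sketch, continuing with the assumptions and object names used above (lmer() does not currently offer such residual variance structures, as noted earlier):

# Model 7.2C: same random effects as Model 7.1, but a separate residual
# variance at each measurement time (residuals remain uncorrelated).
model7.2c.fit <- update(model7.1.fit,
                        weights = varIdent(form = ~ 1 | time))
intervals(model7.2c.fit)   # approximate 95% CIs for the covariance parameters

The intervals() call is the same device mentioned in the notes to Table 7.7 for inspecting the stability of the covariance parameter estimates.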
The covariance parameter estimates reported by the HMLM2 procedure, which are calculated using ML estimation, are in general smaller than those reported by the other procedures, and their estimated standard errors are smaller as well. We expect this because of the bias in the ML estimation of the covariance parameters. We do not present the estimates of the fixed-effect parameters for Model 7.2C in Table 7.7. 7.6.3 Comparing Model 7.3 Results Table 7.8 shows results from fitting the final model, Model 7.3, using REML estimation in SAS, SPSS, R, and Stata, and using ML estimation in HLM. This model has the same random effects and residual covariance structure as in Model 7.1, but omits the fixed effects associated with the two-way interactions between TIME and the other covariates from the model. The fixed-effect parameter estimates and their estimated standard errors are nearly identical across the five procedures (proc mixed in SAS, MIXED in SPSS, lme() in R, lmer() in R, and mixed in Stata) that use REML estimation. Results from the HMLM2 procedure again differ because HMLM2 uses ML estimation. In general, the estimated standard errors of the fixed-effect parameter estimates are smaller in HMLM2 than in the other software procedures. As noted in the comparison of results for Model 7.1, the –2 REML log-likelihood values agree very well across the procedures. The AIC and BIC differ because of different computational formulas. The information criteria are not computed by the HMLM2 procedure. The estimated covariance parameters and their estimated standard errors are also very similar across the procedures in SAS, SPSS, R, and Stata. Again, we note that these estimated parameters and their standard errors are consistently smaller in HMLM2, which uses ML estimation. 350 TABLE 7.6: Comparison of Results for Model 7.1 SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HMLM2 Estimation Method REML REML REML REML REML ML Fixed-Effect Parameter Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) β0 (Intercept) β1 (Time) β2 (Baseline GCF) β3 (CDA) β4 (Age) β5 (Time × Base GCF) β6 (Time × CDA) β7 (Time × Age) Covariance Parameter 2 σint:patient σint,time:patient 2 σtime:patient 2 σint:tooth(patient) σ2 69.92(28.40) −6.02(7.45) −0.32(0.29) −0.88(1.08) −0.97(0.61) 0.07(0.06) 0.13(0.22) 0.11(0.17) Estimate (SE) a 69.92(28.40) −6.02(7.45) −0.32(0.29) −0.88(1.08) −0.97(0.61) 0.07(0.06) 0.13(0.22) 0.11(0.17) Estimate (SE) a 69.92(28.40) −6.02(7.45) −0.32(0.29) −0.88(1.08) −0.97(0.61) 0.07(0.06) 0.13(0.22) 0.11(0.17) Estimate (n.c.) b,c 555.39 (279.75) 555.39 (279.75) 555.39 −149.76(74.55)e −149.76(74.55)e −0.95(corr.) 44.72(21.15)f 44.72(21.15)f 44.72c 46.96(16.67) 46.96(16.67) 46.96c 49.69(10.92) 49.69(10.92) 49.69 69.92(28.40) −6.02(7.45) −0.32(0.29) −0.88(1.08) −0.97(0.61) 0.07(0.06) 0.13(0.22) 0.11(0.17) Estimate (n.c.) c 555.39 −0.95(corr.) 44.72c 46.96c 49.69 69.92(28.40) −6.02(7.45) −0.32(0.29) −0.88(1.08) −0.97(0.61) 0.07(0.06) 0.13(0.22) 0.11(0.17) Estimate (SE) 70.47(26.11) −6.11(6.83) −0.32(0.28) −0.88(1.05) −0.98(0.55) 0.07(0.06) 0.13(0.21) 0.11(0.15) Estimate (SE) 555.39(279.75) 447.13d (212.85) −149.77(74.55) −122.23(57.01) 44.72(21.15) 36.71(16.21) 46.96(16.67) 45.14(15.66) 49.69(10.92) 47.49(10.23) Model Information Criteria –2 RE/ML log-likelihood 847.1 847.1 847.1 847.1 AIC 857.1 857.1 873.1 873.1 BIC 859.5 870.2 907.2 908.2 Note: (n.c.) = not computed Note: 110 Longitudinal Measures at Level 1; 55 Teeth at Level 2; 12 Patients at Level 3. 
a Reported as UN(1,1) in SAS and SPSS. 847.1 873.1 908.2 843.7g n.c. n.c. Linear Mixed Models: A Practical Guide Using Statistical Software SAS: proc mixed The nlme version of the lme() function reports the estimated standard deviations of the random effects and residuals by default; these estimates have been squared in Tables 7.6, 7.7, and 7.8. The intervals() function can be applied to obtain CIs for the parameters. c Standard errors are not reported. d HLM reports the four covariance parameters associated with the random effects in the 2 × 2 Tau(beta) matrix and the scalar Tau(pi), respectively. e Reported as UN(2,1) in SAS and SPSS. f Reported as UN(2,2) in SAS and SPSS. g The –2 ML log-likelihood associated with the model fit is referred to in the HLM output as the model deviance. Models for Clustered Longitudinal Data:The Dental Veneer Example b 351 SPSS: MIXED R: lme() function Stata: mixed HMLM2 REML REML REML REML ML Warning: Hessian not positive-definite Warning: validity of the fit uncertain Estimation Method Model 7.2A (Unstructured) Software Notes –2 REML log-likelihood AIC (smaller the better) BIC (smaller the better) 846.2 860.2 863.6 846.7 860.7 879.0 Wide intervals Extremely large SEs for cov. parms. for cov. parms. 846.2 876.2 915.5 Cannot be fitted 846.2 876.2 916.7 Model 7.2B (Comp. Symm.) Software Notes Stopped: Too many likelihood evaluations –2 REML/ML log-likelihood AIC BIC N/A N/A N/A Warning: validity of the fit uncertain Wide intervals for cov. parms. 847.1 859.1 874.9 847.1 875.1 911.9 846.2 858.2 873.9 846.2 874.2 910.9 Failure to converge Invalid likelihood N/A N/A N/A N/A N/A N/A Model 7.2C (Heterogeneous) –2 REML/ML log-likelihood AIC BIC 846.2 858.2 861.1 846.2 874.2 912.0 842.6 n.c. n.c. Covariance Parameters (Model 7.2C) 2 σint:patient σint,time:patient Estimate (SE) 546.60(279.33) −148.64(74.38) Estimate (SE) 546.61(279.34) −148.64(74.38) Estimate (SE) 546.61a,b −0.95(corr.) Estimate (SE) 546.61(279.34) −148.64(74.38) Estimate (SE) 438.18(212.68) −121.14(56.90) Linear Mixed Models: A Practical Guide Using Statistical Software SAS: proc mixed 352 TABLE 7.7: Comparison of Results for Models 7.2A, 7.2B (Both with Aliased Covariance Parameters), and 7.2C Estimation Method SAS: proc mixed SPSS: MIXED R: lme() function Stata: mixed HMLM2 REML REML REML REML ML 2 σtime:patient 44.64(21.10) 44.64(21.10) 44.64b 44.64(21.10) 2 b σint:tooth(patient) 46.92(16.53) 46.92(16.53) 46.92 46.92(16.53) 2 σt1 62.38(18.81) 62.38(18.81) 62.38b 62.38(18.81) 2 b,c σt2 36.95(15.30) 36.95(15.30) 36.95 36.95(15.30) Note: SE = Standard error, (n.c.) = not computed Note: 110 Longitudinal Measures at Level 1; 55 Teeth at Level 2; 12 Patients at Level 3. a Users of R can employ the function intervals(model7.2c.fit) to obtain approximate 95% confidence intervals for the covariance parameters. b Standard errors are not reported. c See Subsection 3.4.3 for a discussion of the lme() function output from models with heterogeneous residual variance. 
36.65(16.16) 45.12(15.54) 59.93(17.70) 35.06(14.32) Models for Clustered Longitudinal Data:The Dental Veneer Example TABLE 7.7: (Continued) 353 354 TABLE 7.8: Comparison of Results for Model 7.3 SPSS: MIXED R: lme() function R: lmer() function Stata: mixed HMLM2 Estimation Method REML REML REML REML REML ML Fixed-Effect Parameter Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) Estimate (SE) β0 (Intercept) β1 (Time) β2 (Baseline GCF) β3 (CDA) β4 (Age) Covariance Parameter 2 σint:patient σint,time:patient 2 σtime:patient 2 σint:tooth(patient) σ2 45.74(12.55) 0.30(1.94) −0.02(0.14) −0.33(0.53) −0.58(0.21) Estimate (SE) 45.74(12.55) 0.30(1.94) −0.02(0.14) −0.33(0.53) −0.58(0.21) Estimate (SE) 45.74(12.55) 0.30(1.94) −0.02(0.14) −0.33(0.53) −0.58(0.21) Estimate (n.c.) 524.95(252.99) 524.98(253.02) −140.42(66.57) −140.42(66.58) 41.89(18.80) 41.89(18.80) 47.45(16.63) 47.46(16.63) 48.87(10.51) 48.87(10.51) 524.99 −0.95(corr.) 41.89 47.46 48.87 45.74(12.55) 0.30(1.94) −0.02(0.14) −0.33(0.53) −0.58(0.21) Estimate (n.c.) 524.98 −0.95(corr.) 41.89 47.46 48.87 45.74(12.55) 0.30(1.94) −0.02(0.14) −0.33(0.53) −0.58(0.21) Estimate (SE) 46.02(11.70) 0.29(1.86) −0.02(0.14) −0.31(0.51) −0.58(0.19) Estimate (SE) 524.99(253.02) 467.74(221.98) −140.42(66.58) −127.80(59.45) 41.89(18.80) 38.23(16.86) 47.46(16.63) 44.57(15.73) 48.87(10.51) 48.85(10.52) Model Information Criteria –2 log-likelihood 841.9 841.9 841.9 841.9 AIC 851.9 851.9 861.9 861.9 BIC 854.3 865.1 888.4 888.9 Note: SE = Standard error, (n.c.) = not computed Note: 110 Longitudinal Measures at Level 1; 55 Teeth at Level 2; 12 Patients at Level 3. 841.9 861.9 888.9 845.5 n.c. n.c. Linear Mixed Models: A Practical Guide Using Statistical Software SAS: proc mixed Models for Clustered Longitudinal Data:The Dental Veneer Example 7.7 355 Interpreting Parameter Estimates in the Final Model Results in this section were obtained by fitting Model 7.3 to the Dental Veneer data using the mixed command in Stata. 7.7.1 Fixed-Effect Parameter Estimates The Stata output for the fixed-effect parameter estimates and their estimated standard errors is shown below. ' $ Log restricted-likelihood = -420.92761 Wald chi2(4) = 7.48 Prob > chi2 = 0.1128 gcf Coef. Std. Err. z P > |z| [95% Conf. Interval] -------------------------------------------------------------------------time 0.3009815 1.9368630 0.16 0.877 -3.4952000 4.0971630 base_gcf -0.0183127 0.1433094 -0.13 0.898 -0.2991940 0.2625685 cda -0.3293040 0.5292525 -0.62 0.534 -1.3666190 0.7080128 age -0.5773932 0.2139656 -2.70 0.007 -0.9967582 -0.1580283 _cons 45.7386200 12.5549700 3.64 0.000 21.1313300 70.3459100 & % The first part of the output above shows the value of the REML log-likelihood for Model 7.3. Note that in Table 7.8 we report the –2 REML log-likelihood value for this model (841.9). The Wald chi-square statistic and corresponding p-value reported at the top of the output represent an omnibus test of all fixed effects (with the exception of the intercept). The null distribution of the test statistic is a χ2 with 4 degrees of freedom, corresponding to the 4 fixed effects in the model. This test statistic is not significant (p = 0.11), suggesting that these covariates do not explain a significant amount of variation in the GCF measures. The mixed command reports z-tests for the fixed-effect parameters, which are asymptotic (i.e., they assume large sample sizes). 
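Each z statistic is simply the estimated coefficient divided by its estimated standard error, with the p-value taken from the standard normal distribution. For example, the statistic for AGE in the output above can be verified with a two-line check in R (plain arithmetic only, using the values reported by Stata):

z <- -0.5773932 / 0.2139656                 # = -2.70, as reported for age
2 * pnorm(abs(z), lower.tail = FALSE)       # two-sided p-value, approximately 0.007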
The z-tests suggest that the only fixed-effect parameter significantly different from zero is the one associated with AGE (p = 0.007). There appears to be a negative effect of AGE on GCF, after controlling for the effects of time, baseline GCF, and CDA: patients who are one year older are predicted to have an average GCF value that is 0.58 units lower than otherwise similar patients. There is no significant fixed effect of TIME on GCF overall. This result is not surprising, given the initial data summary in Figure 7.2, in which we saw that the GCF for some patients went up over time, whereas that for other patients decreased over time. The effect of contour difference (CDA) is also not significant, indicating that a greater discrepancy in tooth contour after veneer placement is not necessarily associated with a higher mean value of GCF. Earlier, when we tested the two-way interactions between TIME and the other covariates (Hypothesis 7.3), we found that none of the fixed effects associated with the two-way interactions were significant (p = 0.61; see Table 7.5). As a result, all two-way interactions between TIME and the other covariates were dropped from the model. The fact that there were no significant interactions between TIME and the other covariates suggests that the effect of TIME on GCF does not tend to differ for different values of AGE, baseline GCF, or contour difference.

7.7.2 Covariance Parameter Estimates
The Stata output below displays the estimates of the covariance parameters associated with the random effects in Model 7.3, reported by the mixed command.

Random-effects Parameters        Estimate     Std. Err.    [95% Conf. Interval]
--------------------------------------------------------------------------------
patient: Unstructured
  var(time)                      41.88772     18.79997      17.38009    100.95350
  var(_cons)                    524.98510    253.02050     204.12870   1350.17500
  cov(time, _cons)             -140.42290     66.57623    -270.90990     -9.935907
tooth: Identity
  var(_cons)                     47.45738     16.63034      23.87920     94.31650
var(Residual)                    48.86704     10.50523      32.06479     74.47382
--------------------------------------------------------------------------------
LR test vs. linear regression: chi2(4) = 91.12    Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.

Stata reports the covariance parameter estimates, their standard errors, and approximate 95% confidence intervals. The output table divides the parameter estimates into three groups. The top group corresponds to the patient level (Level 3) of the model, where the Unstructured covariance structure produces three covariance parameter estimates. These include the variance of the random patient-specific time effects, var(time), the variance of the random patient-specific intercepts, var(_cons), and the covariance between these two random effects, cov(time, _cons). We note that the estimated covariance between the two patient-specific random effects is negative, which suggests that patients with higher patient-specific intercepts (i.e., higher overall levels of GCF) tend to have smaller (more negative) patient-specific time slopes, and vice versa. Because there is only a single random effect at the tooth level of the model (Level 2), the variance-covariance matrix for the nested random tooth effects (which only has one element) has an Identity covariance structure. The single random tooth effect is associated with the intercept, and the estimated covariance parameter at the tooth level represents the estimated variance of these nested random tooth effects.
At the lowest level of the data (Level 1), there is a single covariance parameter associated with the variance of the residuals, labeled var(Residual). Stata also displays approximate 95% confidence intervals for the covariance parameters based on their standard errors, which can be used to get an impression of whether the true covariance parameters in the population of patients and teeth are equal to zero. We note that none of the reported confidence intervals cover zero. However, Stata does not automatically generate formal tests for any of these covariance parameters. Readers should note that interpreting these 95% confidence intervals for covariance parameters can be problematic, especially when estimates of variances are small. See Bottai & Orsini (2004) for more details. Finally, we note an omnibus likelihood ratio test for all covariance parameters. Stata generates a test statistic by calculating the difference in the –2 REML log-likelihood of Model 7.3 (the reference model) and that of a linear regression model with the same fixed effects but without any random effects (the nested model, which has four fewer covariance parameters). The result of this conservative test suggests that some of the covariance parameters are significantly different from zero, which is in concordance with the approximate 95% confidence intervals for the parameters. Stata allows users to click on the note about the likelihood ratio test in the output, for additional information about the reason why this Models for Clustered Longitudinal Data:The Dental Veneer Example 357 test should be considered conservative (see Subsection 5.4.4 for more details on this type of test). Based on these results and the formal test of Hypothesis 7.1, we have evidence of between-patient variance and between-tooth variance within the same patient that is not being explained by the fixed effects of the covariates included in Model 7.3. 7.8 The Implied Marginal Variance-Covariance Matrix for the Final Model In this section, we present the estimated Vj matrix for the marginal model implied by Model 7.3 for the first patient in the Dental Veneer data set. We use SAS proc mixed to generate this output, because the post-estimation commands associated with the current implementation of the mixed command in Stata 13 do not allow one to display blocks of the estimated Vj matrix in the output. Recall that prior to fitting the models in SAS, we first sorted the data by PATIENT, TOOTH, and TIME, as shown in the following syntax. This was done to facilitate reading the output for the marginal variance-covariance and correlation matrices. proc sort data = veneer; by patient tooth time; run; The estimated Vj matrix for patient 1 shown in the SAS output that follows can be generated by using the v = 1 option in either of the random statements in the proc mixed syntax for Model 7.3 (see Subsection 7.4.1). The following output displays an 8 × 8 matrix, because there are eight observations for the first patient, corresponding to measurements at 3 months and at 6 months for each of the patient’s four treated teeth. If another patient had three teeth, we would have had a 6 × 6 marginal covariance matrix. The estimated marginal variances and covariances for each tooth are represented by 2×2 blocks, along the diagonal of the Vj matrix. Note that the 2 × 2 tooth-specific covariance matrix has the same values across all teeth. The estimated marginal variance for a given observation on a tooth at 3 months is 155.74, whereas that at 6 months is 444.19. 
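These implied marginal variances and covariances can be reproduced by hand from the covariance parameter estimates for Model 7.3 in Table 7.8. A short sketch in R (plain arithmetic only; the small discrepancies from the SAS output below are due to rounding of the estimates):

# Estimated covariance parameters for Model 7.3 (Table 7.8)
d.patient <- matrix(c(524.99, -140.42,
                      -140.42,  41.89), nrow = 2)  # patient-level random intercept/slope
v.tooth <- 47.46                                   # variance of nested random tooth intercepts
v.resid <- 48.87                                   # residual variance
z <- cbind(1, c(3, 6))                             # intercept and TIME for the two visits
# 2 x 2 block for two observations on the SAME tooth (diagonal blocks of Vj):
z %*% d.patient %*% t(z) + v.tooth + diag(v.resid, 2)
# 2 x 2 block for observations on two DIFFERENT teeth of the same patient:
z %*% d.patient %*% t(z)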
The estimated marginal covariance between observations on the same tooth at 3 and at 6 months is 62.60. ' Estimated V Matrix for PATIENT 1 $ Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 ---------------------------------------------------------------------------------1 155.7400 62.6020 59.4182 15.1476 59.4182 15.1476 59.4182 15.1476 2 62.6020 444.1900 15.1476 347.8600 15.1476 347.8600 15.1476 347.8600 3 59.4182 15.1476 155.7400 62.6020 59.4182 15.1476 59.4182 15.1476 4 15.1476 347.8600 62.6020 444.1900 15.1476 347.8600 15.1476 347.8600 5 59.4182 15.1476 59.4182 15.1476 155.7400 62.6020 59.4182 15.1476 6 15.1476 347.8600 15.1476 347.8600 62.6020 444.1900 15.1476 347.8600 7 59.4182 15.1476 59.4182 15.1476 59.4182 15.1476 155.7400 62.6020 8 15.1476 347.8600 15.1476 347.8600 15.1476 347.8600 62.6020 444.1900 & % 358 Linear Mixed Models: A Practical Guide Using Statistical Software The following SAS output shows the estimated marginal correlation matrix for patient 1 generated by proc mixed, obtained by using the vcorr = 1 option in either of the random statements in the syntax for Model 7.3. The 2 × 2 submatrices along the diagonal represent the marginal correlations between the two measurements on any given tooth at 3 and at 6 months. ' $ Estimated V Correlation Matrix for PATIENT 1 Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 --------------------------------------------------------------------------1 1.00000 0.23800 0.38150 0.05759 0.38150 0.05759 0.38150 0.05759 2 0.23800 1.00000 0.05759 0.78310 0.05759 0.78310 0.05759 0.78310 3 0.38150 0.05759 1.00000 0.23800 0.38150 0.05759 0.38150 0.05759 4 0.05759 0.78310 0.23800 1.00000 0.05759 0.78310 0.05759 0.78310 5 0.38150 0.05759 0.38150 0.05759 1.00000 0.23800 0.38150 0.05759 6 0.05759 0.78310 0.05759 0.78310 0.23800 1.00000 0.05759 0.78310 7 0.38150 0.05759 0.38150 0.05759 0.38150 0.05759 1.00000 0.23800 8 0.05759 0.78310 0.05759 0.78310 0.05759 0.78310 0.23800 1.00000 & % We note in this matrix that the estimated covariance parameters for Model 7.3 imply that observations on the same tooth are estimated to have a rather small marginal correlation of approximately 0.24. If we re-sort the data by PATIENT, TIME, and TOOTH and refit Model 7.3 using SAS proc mixed, the rows and columns in the correlation matrix are reordered correspondingly, and we can more readily view the blocks of marginal correlations among observations on all teeth at each time point. proc sort data = veneer; by patient time tooth; run; We run identical syntax to fit Model 7.3 and display the resulting marginal correlation matrix for patient 1: ' $ Estimated V Correlation Matrix for PATIENT 1 Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 --------------------------------------------------------------------------1 1.00000 0.38150 0.38150 0.38150 0.23800 0.05759 0.05759 0.05759 2 0.38150 1.00000 0.38150 0.38150 0.05759 0.23800 0.05759 0.05759 3 0.38150 0.38150 1.00000 0.38150 0.05759 0.05759 0.23800 0.05759 4 0.38150 0.38150 0.38150 1.00000 0.05759 0.05759 0.05759 0.23800 5 0.23800 0.05759 0.05759 0.05759 1.00000 0.78310 0.78310 0.78310 6 0.05759 0.23800 0.05759 0.05759 0.78310 1.00000 0.78310 0.78310 7 0.05759 0.05759 0.23800 0.05759 0.78310 0.78310 1.00000 0.78310 8 &0.05759 0.05759 0.05759 0.23800 0.78310 0.78310 0.78310 1.00000 % In this output, we focus on the 4 × 4 blocks that represent the correlations among the four teeth for patient 1 at time 1 and at time 2. 
It is readily apparent that the marginal correlation at time 1 is estimated to have a constant value of 0.38, whereas the marginal correlation among observations on the four teeth at time 2 is estimated to have a higher value of 0.78. As noted earlier, the estimated marginal correlation between the observations on tooth 1 at time 1 and time 2 (displayed in this output in row 1, column 5) is 0.24. We also note Models for Clustered Longitudinal Data:The Dental Veneer Example 359 that the estimated marginal correlation of observations on tooth 1 at time 1 and the other three teeth at time 2 (displayed in this output in row 1, columns 6 through 8) is rather low, not surprisingly, and is estimated to be 0.06. 7.9 Diagnostics for the Final Model In this section, we check the assumptions for the REML-based fit of Model 7.3, using informal graphical procedures available in Stata. Similar plots can be generated using the other four software packages by saving the conditional residuals, conditional predicted values, and EBLUPs of the random effects based on the fit of Model 7.3. We include syntax for performing these diagnostics in the other software packages on the book’s web page (see Appendix A). 7.9.1 Residual Diagnostics We first assess the assumption of constant variance for the residuals in Model 7.3. Figure 7.4 presents a plot of the standardized conditional residuals vs. the conditional predicted values (based on the fit of Model 7.3) to assess whether the variance of the residuals is constant. The final command used to fit Model 7.3 in Stata is repeated from Subsection 7.4.4 as follows: . * Model 7.3 (REML). . mixed gcf time base_gcf cda age || patient: time, cov(unstruct) || tooth: , variance reml After fitting Model 7.3, we save the standardized residuals in a new variable named ST RESID, by using the predict post-estimation command in conjunction with the rstandard option (this option requests that standardized residuals be saved in the data set): . predict st resid, rstandard We also save the conditional predicted GCF values (including the EBLUPs) in a new variable named PREDVALS, by using the fitted option: . predict predvals, fitted We then use the two new variables to generate the fitted-residual scatter plot in Figure 7.4, with a reference line at zero on the y-axis: . twoway (scatter st resid predvals), yline(0) The plot in Figure 7.4 suggests nonconstant variance in the residuals as a function of the predicted values, and that a variance-stabilizing transformation of the GCF response variable (such as the square-root transformation) may be needed, provided that no important fixed effects of relevant covariates have been omitted from the model. Analysts interested in generating marginal predicted values for various subgroups defined by the covariates and factors included in a given model, in addition to plots of those marginal predicted values, can make use of the post-estimation commands margins and marginsplot (where the latter command needs to directly follow the former command) to visualize these Linear Mixed Models: A Practical Guide Using Statistical Software −2 −1 Standardized residuals 0 1 2 3 360 0 20 40 Fitted values: xb + Zu 60 80 FIGURE 7.4: Residual vs. fitted plot based on the fit of Model 7.3. predicted values. For example, to plot marginal predicted values of GCF at TIME = 0, TIME = 3, and TIME = 6 in addition to 95% confidence intervals for the predicted values, the following two commands can be used (plot not shown): . margins, at(time=(0,3,6)) . 
marginsplot These plots can be useful for illustrating predicted values of the dependent variable based on a given model, and can be especially useful for interpreting more complex interactions. Stata users can submit the command help margins for more details. The assumption of normality for the conditional residuals can be checked by using the qnorm command to generate a normal Q–Q plot: . qnorm st resid The resulting plot in Figure 7.5 suggests that the distribution of the conditional residuals deviates from a normal distribution. This further suggests that a transformation of the response variable may be warranted, provided that no fixed effects of important covariates or interaction terms have been omitted from the final model. In this example, a square-root transformation of the response variable (GCF) prior to model fitting was found to improve the appearance of both of these diagnostic plots, suggesting that such a transformation would be recommended before making any final inferences about the parameters in this model. 7.9.2 Diagnostics for the Random Effects We now check the distributions of the predicted values (EBLUPs) for the three random effects in Model 7.3. After refitting Model 7.3, we save the EBLUPs of the two random 361 −2 −1 Standardized residuals 0 1 2 3 Models for Clustered Longitudinal Data:The Dental Veneer Example −2 −1 0 Inverse Normal 1 2 FIGURE 7.5: Normal Q–Q plot of the standardized residuals based on the fit of Model 7.3. patient effects and the nested random tooth effects in three new variables, again using the predict command: . predict pat_eblups*, reffects level(patient) . predict tooth_eblups, reffects level(tooth) The first command saves the predicted random effects (EBLUPs) for each level of PATIENT in new variables named PAT EBLUPS1 (for the random TIME effects) and PAT EBLUPS2 (for the random effects associated with the intercept) in the data set. The asterisk (*) requests that a single new variable be created for each random effect associated with the levels of PATIENT. In this case, two new variables are created, because there are two random effects in Model 7.3 associated with each patient. The second command saves the EBLUPs of the random effects associated with the intercept for each tooth in a new variable named TOOTH EBLUPS. After saving these three new variables, we generate a new data set containing a single case per patient, and including the individual patient EBLUPs (the original data set should be saved before creating this collapsed data set): . save "C:\temp\veneer.dta", replace . collapse pat eblups1 pat eblups2, by(patient) We then generate normal Q–Q plots for each set of patient-specific EBLUPs and check for outliers: . qnorm pat_eblups1, ytitle(EBLUPs of Random Patient TIME Effects) . graph save "C:\temp\figure76_part1.gph" Linear Mixed Models: A Practical Guide Using Statistical Software −10 −40 EBLUPs of Random Patient Intercepts −20 0 20 EBLUPs of Random Patient TIME Effects −5 0 5 10 15 40 362 −10 −5 0 Inverse Normal 5 10 −40 −20 0 20 Inverse Normal 40 FIGURE 7.6: Normal Q–Q plots for the EBLUPs of the random patient effects. . qnorm pat_eblups2, ytitle(EBLUPs of Random Patient Intercepts) . graph save "C:\temp\figure76_part2.gph" . graph combine "C:\temp\figure76_part1.gph" "C:\temp\figure76_part2.gph" Note that we make use of the graph save and graph combine commands to save the individual plots and then combine them into a single figure. 
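Comparable residual and random-effect diagnostics can be produced in R from an lme() fit of Model 7.3; a sketch, assuming a fitted object named model7.3.fit (a hypothetical name) based on the specification outlined earlier:

# Standardized (Pearson) residuals vs. fitted values, and a normal Q-Q plot
plot(model7.3.fit, resid(., type = "p") ~ fitted(.), abline = 0)
qqnorm(model7.3.fit, ~ resid(., type = "p"))
# EBLUPs of the random effects at each level of the model
re <- ranef(model7.3.fit)                      # list: one data frame per grouping level
qqnorm(re$patient[, "time"], main = "Random patient TIME effects")
qqnorm(re$patient[, "(Intercept)"], main = "Random patient intercepts")
qqnorm(re$tooth[, "(Intercept)"], main = "Random tooth effects")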
Figure 7.6 suggests that there are two positive outliers in terms of the random TIME effects (EBLUPs greater than 5) associated with the patients (left panel). We can investigate selected variables for these patients after opening the original data set (including the saved EBLUPs) once again: . use "C:\temp\veneer.dta", clear . list patient tooth gcf time age cda base gcf if pat eblups1 > 5 Models for Clustered Longitudinal Data:The Dental Veneer Example ' +-------------------------------------------------+ patient tooth gcf time age cda base_gcf --------------------------------------------------1. 1 6 11 3 46 4.666667 17 2. 1 6 68 6 46 4.666667 17 3. 1 7 13 3 46 4.666667 22 4. 1 7 47 6 46 4.666667 22 5. 1 8 14 3 46 5.000000 18 --------------------------------------------------6. 1 8 58 6 46 5.000000 18 7. 1 9 10 3 46 3.333333 12 8. 1 9 57 6 46 3.333333 12 9. 1 10 14 3 46 8.666667 10 10. 1 10 44 6 46 8.666667 10 --------------------------------------------------11. 1 11 11 3 46 5.666667 17 12. 1 11 53 6 46 5.666667 17 85. 10 6 29 3 36 2.333333 27 86. 10 6 87 6 36 2.333333 27 87. 10 7 28 3 36 8.666667 10 --------------------------------------------------88. 10 7 61 6 36 8.666667 10 89. 10 8 17 3 36 6.333333 25 90. 10 8 47 6 36 6.333333 25 91. 10 9 11 3 36 6.666667 7 92. 10 9 42 6 36 6.666667 7 --------------------------------------------------93. 10 10 48 3 36 6.333333 15 94. 10 10 65 6 36 6.333333 15 95. 10 11 55 3 36 4.666667 19 96. 10 11 70 6 36 4.666667 19 +-------------------------------------------------+ & 363 $ % As expected, these two patients (PATIENT = 1 and 10) consistently have large increases in GCF as a function of TIME for all of their teeth, and their data should be checked for validity. The tooth-specific random effects can be assessed in a similar manner by first creating a tooth-specific data set containing only the PATIENT, TOOTH, and TOOTH EBLUP variables (after generating predicted values of the random effects in the original data set, as shown above), and then generating a normal Q–Q plot: . collapse tooth eblups, by(patient tooth) . qnorm tooth eblups, ytitle(EBLUPs of Random Tooth Effects) The resulting plot (not displayed) does not provide any evidence of extremely unusual random tooth effects. 7.10 7.10.1 Software Notes and Recommendations ML vs. REML Estimation In this chapter, we introduce for the first time the HMLM2 (hierarchical multivariate linear models) procedure, which was designed for analyses of clustered longitudinal data sets in 364 Linear Mixed Models: A Practical Guide Using Statistical Software HLM. Unlike the LMM procedures in SAS, SPSS, R, and Stata, this procedure only uses ML estimation. The procedures in SAS, SPSS, R, and Stata provide users with a choice of either REML or ML estimation when fitting these models. This difference could have important consequences when developing a model. We recommend using likelihood ratio tests based on REML estimation when testing hypotheses, such as Hypotheses 7.1 and 7.2, involving covariance parameters. This is not possible if the models are fitted using ML estimation. A second important consequence of using ML estimation is that the covariance parameter estimates are known to be biased. This can result in smaller estimated standard errors for the estimates of the fixed effects in the model and also has implications for the fixedeffect parameters that are estimated. Some of these differences were apparent in Tables 7.6 through 7.8. 
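In the procedures that offer both methods, switching between REML and ML is a single option; with the lme() function, for example, it is the method argument (a sketch, reusing the hypothetical model7.3.fit object and the variable-name assumptions stated earlier):

# REML (the lme() default) vs. ML fits of the final model, Model 7.3
model7.3.ml <- update(model7.3.fit, method = "ML")
# The ML covariance parameter estimates tend to be smaller than the REML estimates,
# mirroring the differences between HMLM2 and the other procedures in Tables 7.6-7.8.
VarCorr(model7.3.fit)
VarCorr(model7.3.ml)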
7.10.2 The Ability to Remove Random Effects from a Model The HMLM2 procedure requires that at least one random effect be specified in the model at Level 3 and at Level 2 of the data. The procedures in SAS, SPSS, R, and Stata all allow more flexibility in specifying and testing which levels of the data (e.g., patient or teeth nested within patients) should have random effects included in the model. Although the HMLM2 procedure is more restrictive than some of the other software procedures in this sense, it also ensures that the hierarchy of the data is maintained in the analysis. Users of SAS, SPSS, R, and Stata must think carefully about how the hierarchy of the data is specified in the model, and then correctly specify the appropriate random effects in the syntax. HMLM2 forces the hierarchical structure of these data sets to be taken into consideration. 7.10.3 Considering Alternative Residual Covariance Structures With the exception of the lmer() function in R, all of the other LMM procedures considered in this chapter allow users to fit models with nonidentity residual covariance (Rij ) matrices. The unstructured residual covariance matrix (Model 7.2A) is not available in HMLM2 when random effects are also considered simultaneously, and we had to use an alternative setup of the model to allow us to fit the compound symmetry structure (Model 7.2B) using HMLM2 (see Subsection 7.4.5). In this analysis, we found that the identity residual covariance structure was the better and more parsimonious choice for our models, but this would not necessarily be the case in analyses of other data sets. Heterogeneity of variances and correlation of residuals is a common feature in longitudinal data sets, and the ability to accommodate a wide range of residual covariance structures is very important. We recommend that analysts of clustered longitudinal data sets with more than two longitudinal observations consider alternative covariance structures for the residuals, and attempt to identify the structures that provide the best fit to a given set of data. The information criteria (AIC, BIC, etc.) can be useful in this regard. In the Dental Veneer example, there were only a small number of residual covariance structures that could be considered for the 2 × 2 Rij matrix, because it contained only three parameters, at the most, and aliasing with other covariance parameters was involved. In other data sets with more longitudinal observations, a wider variety of residual covariance structures could (and should) be considered. The procedures in SAS, SPSS, Stata, HLM, and the lme() function in R offer flexibility in this regard, with proc mixed in SAS having the largest list of available residual covariance structures. Models for Clustered Longitudinal Data:The Dental Veneer Example 7.10.4 365 Aliasing of Covariance Parameters We had difficulties when fitting Models 7.2A and 7.2B because of aliasing (nonidentifiability) of the covariance parameters. The problems with these models arose because we were specifying random effects at two levels of the data (patient and teeth within patients), as well as an additional residual covariance at the tooth level. If we had more than two observations per tooth, this would have been a problem for Model 7.2B only. The symptoms of aliasing of covariance parameters manifest themselves in different fashions in the different software programs. 
For Model 7.2A, SAS complained in a NOTE in the log that the estimated Hessian matrix (which is used to compute the standard errors of the estimated covariance parameters) was not positive-definite. Users of SAS need to be aware of these types of messages in the log file. SAS also reported a value of zero for the UN(2,2) covariance parameter (i.e., the residual variance at time 2) and did not report a standard error for this parameter estimate in the output. For Model 7.2B, SAS did not provide estimates due to too many likelihood evaluations. SPSS produced a warning message in the output window about lack of convergence for both Models 7.2A and 7.2B. In this case, results from SPSS should not be interpreted, because the estimation algorithm has not converged to a valid solution for the parameter estimates. After fitting Models 7.2A and 7.2B with the lme() function in R, attempts to use the intervals() function to obtain confidence intervals for the estimated covariance parameters resulted in extremely wide and unrealistic intervals. Simply fitting these two models in R did not indicate any problems with the model specification, but the intervals for these parameters provide an indication of instability in the estimates of the standard errors for these covariance parameter estimates. Similar problems were apparent when fitting Model 7.2A in Stata, and Model 7.2B could not be fitted in Stata due to a lack of convergence (similar to the procedures in SAS and SPSS). We were not able to fit Model 7.2A using HMLM2, because the unstructured residual covariance matrix is not available as an option in a model that also includes random effects. In addition, HMLM2 reported a generic message for Model 7.2B that stated “Invalid info, score, or likelihood” and did not report parameter estimates for this model. In general, users of these software procedures need to be very cautious about interpreting the output for covariance parameters. We recommend always examining the estimated covariance parameters and their standard errors to see if they are reasonable. The procedures in SAS, SPSS and Stata make this relatively easy to do. In R, the intervals() function is helpful (when using the lme() function). HMLM2 is fairly direct and obvious about problems that occur, but it is not very helpful in diagnosing this particular problem. Readers should be aware of potential problems when fitting models to clustered longitudinal data, pay attention to warnings and notes produced by the software, and check model specification carefully. We considered three possible structures for the residual covariance matrix in this example to illustrate potential problems with aliasing. We advise exercising caution when fitting these models so as not to overspecify the covariance structure. 7.10.5 Displaying the Marginal Covariance and Correlation Matrices The ability to examine implied marginal covariance matrices and their associated correlation matrices can be very helpful in understanding an LMM that has been fitted (see Section 7.8). SAS makes it easy to do this for any subject desired, by using the v = and vcorr = options in the random statement. In fact, proc mixed in SAS is currently the only procedure that allows users to examine the marginal covariance matrix implied by a LMM fitted to a clustered longitudinal data set with three levels. 366 Linear Mixed Models: A Practical Guide Using Statistical Software 7.10.6 Miscellaneous Software Notes 1. 
SPSS: The syntax to set up the subject in the RANDOM subcommand for TOOTH nested within PATIENT is (TOOTH*PATIENT), which appears to be specifying TOOTH crossed with PATIENT, but is actually the syntax used for nesting. Alternatively, one could use a RANDOM subcommand of the form /RANDOM tooth(patient), without any SUBJECT variable(s), to include nested random tooth effects in the model; however, this would not allow one to specify multiple random effects at the tooth level.
2. HMLM2: This procedure requires that the Level 1 data set include an indicator variable for each time point. For instance, in the Dental Veneer example, the Level 1 data set needs to include two indicator variables: one for observations at 3 months, and a second for observations at 6 months. These indicator variables are not necessary when using the procedures in SAS, SPSS, R, and Stata.

7.11 Other Analytic Approaches
7.11.1 Modeling the Covariance Structure
In Section 7.8 we examined the marginal covariance of observations on patient 1 implied by the random effects specified for Model 7.3. As discussed in Chapter 2, we can model the marginal covariance structure directly by allowing the residuals for observations on the same tooth to be correlated. For the Dental Veneer data, we can model the tooth-level marginal covariance structure implied by Model 7.3 by removing the random tooth-level effects from the model and specifying a compound symmetry covariance structure for the residuals, as shown in the following SAS syntax for Model 7.3A:

title "Alternative Model 7.3A";
proc mixed data = veneer noclprint covtest;
class patient tooth cattime;
model gcf = time base_gcf cda age / solution outpred = resids;
random intercept time / subject = patient type = un solution v = 1 vcorr = 1;
repeated cattime / subject = tooth(patient) type=cs;
run;

We can view the estimated covariance parameters for Model 7.3A in the following output:

Covariance Parameter Estimates (Model 7.3A)
Cov Parm    Subject            Estimate    Standard Error    Z Value    Pr Z
UN(1,1)     PATIENT            524.9700        253.0100        2.07     0.0190
UN(2,1)     PATIENT           -140.4200         66.5737       -2.11     0.0349
UN(2,2)     PATIENT             41.8869         18.7993        2.23     0.0129
CS          TOOTH (PATIENT)     47.4573         16.6304        2.85     0.0043
Residual                        48.8675         10.5053        4.65     <0.0001

The analogous syntax and output for Model 7.3 are shown below for comparison. Note that the output for the models is nearly identical, except for the labels assigned to the covariance parameters in the output. The −2 REML log-likelihood values are the same for the two models, as are the AIC and BIC.

title "Model 7.3";
proc mixed data = data.veneer noclprint covtest;
class patient tooth cattime;
model gcf = time base_gcf cda age / solution outpred = resids;
random intercept time / subject = patient type = un v = 1 vcorr = 1;
random intercept / subject = tooth(patient) solution;
run;

Covariance Parameter Estimates (Model 7.3)
Cov Parm    Subject            Estimate    Standard Error    Z Value    Pr Z
UN(1,1)     PATIENT            524.9500        252.9900        2.07     0.0190
UN(2,1)     PATIENT           -140.4200         66.5725       -2.11     0.0349
UN(2,2)     PATIENT             41.8874         18.7998        2.23     0.0129
Intercept   TOOTH (PATIENT)     47.4544         16.6298        2.85     0.0022
Residual                        48.8703         10.5059        4.65     <0.0001

It is important to note that the model setup used for Model 7.3 only allows for positive marginal correlations among observations on the same tooth over time, because the implied marginal correlations are a result of the variance of the random intercepts associated with each tooth.
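In terms of the estimated quantities, the within-tooth covariance implied by each model (same patient, same tooth, measurements at 3 and 6 months) is

Model 7.3:  (patient-level part) + σ2int:tooth(patient) = 15.15 + 47.46 ≈ 62.60
Model 7.3A: (patient-level part) + (CS covariance parameter),

and because a variance cannot be negative, Model 7.3 can only add a nonnegative quantity to the patient-level covariance, whereas the CS parameter in Model 7.3A is free to be negative.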
The specification of Model 7.3A allows for negative correlations among observations on the same tooth. 7.11.2 The Step-Up vs. Step-Down Approach to Model Building The step-up approach to model building commonly used in the HLM literature (Raudenbush & Bryk, 2002) begins with an “unconditional” model, containing only the intercept and random effects. The reduction in the estimated variance components at each level of the data is then monitored as fixed effects are added to the model. The mean structure is considered complete when adding fixed-effect terms provides no further reduction in the variance components. This step-up approach to model building (see Chapter 4, or Subsection 2.7.2) could also be considered for the Dental Veneer data. The step-down (or top-down) approach involves starting the analysis with a “loaded” mean structure and then working on the covariance structure. One advantage of this approach is that the covariances can then be truly thought of as measuring “variance” and not simply variation due to fixed effects that have been omitted from the model. An advantage of using the step-up approach is that the effect of each covariate on reducing the model “variance” can be viewed for each level of the data. If we had used the step-up approach and adopted a strategy of only including significant main effects in the model, our final model for the Dental Veneer data might have been different from Model 7.3. 7.11.3 Alternative Uses of Baseline Values for the Dependent Variable The baseline (first) value of the dependent variable in a series of longitudinal measures may be modeled as simply one of the repeated outcome measures, or it can be considered as a baseline covariate, as we have done in the Dental Veneer example. 368 Linear Mixed Models: A Practical Guide Using Statistical Software There are strong theoretical reasons for treating the baseline value as another measure of the outcome. If the subsequent measures represent values on the dependent variable, measured with error, then it is difficult to argue that the first of the series is “fixed,” as required for covariates. In this sense it is more natural to consider the entire sequence, including the baseline values, as having a multivariate normal distribution. However, when using this approach, if a treatment is administered after the baseline measurement, the treatment effect must be modeled as a treatment by time interaction if treatment groups are similar at baseline. A changing treatment effect over time may lead to a complex interaction between treatment and a function of time. Those who consider the baseline value as a covariate argue that the baseline value is inherently different from other values in the series. The baseline value is often taken prior to a treatment or intervention, as in the Dental Veneer data. There is a history of including baseline values as covariates, particularly in clinical trials. The inclusion of baseline covariates in a model may substantially reduce the residual variance (because of strong correlations with the subsequent values), thus increasing the power of tests for other covariates. The inclusion of baseline covariates also allows an appropriate adjustment for baseline imbalance between groups. Finally, the values in the subsequent series of response measurements may be a function of the initial value. This can happen in instances when there is large room for improvement when the baseline level is poor, but little room for improvement when the baseline level is already good. 
This situation is easily modeled with an interaction between time and the baseline covariate, but more difficult to handle in the model considering the baseline value as one of the outcome measures. In summary, we find both model frameworks to be useful in different settings. The longitudinal model, which includes baseline values as measures on the dependent variable, is more elegant; the model considering the first outcome measurement as a baseline covariate is often more practical. 8 Models for Data with Crossed Random Factors: The SAT Score Example 8.1 Introduction This chapter introduces the analysis of data sets with crossed random factors, where there are multiple random factors with levels that are crossed with each other, rather than having an explicit nesting structure. For example, in Chapter 4, we analyzed a data set where students were nested in classrooms, and classrooms were nested within schools. The classroom and school ID variables were both random factors, where the levels of these variables were randomly selected from larger populations of classrooms and schools. Further, the levels of the classroom factor were nested within levels of the school factor; a given classroom could not exist in multiple schools. In this chapter, we consider an example data set where there are repeated measurements of math scores on an SAT test (Student Aptitude Test) collected on randomly sampled students within a given school, and those students have multiple teachers from the school over time. As a result, both students and teachers have multiple measures on the dependent variable associated with them, but the levels of these two random factors (student ID and teacher ID) are crossed with each other. Linear mixed models with crossed random effects enable the potential correlations of the repeated observations associated with each level of these crossed random factors to be modeled simultaneously. For example, we might expect between-student variance in math performance over time; at the same time, we might expect that some teachers are better math instructors, resulting in between-teacher variance in the math scores. These models enable simultaneous estimation of the components of variance associated with the levels of the crossed random factors, and assessment of which random factor tends to contribute the most to variability in measures on the dependent variable. As discussed in Chapter 2, models with crossed random effects tend to be more difficult to estimate. Estimation is facilitated by the use of sparse matrices in model specification and maximum likelihood estimation, but different software procedures will tend to use different algorithms when fitting these types of models. We do not highlight one package in particular in this chapter, and discuss some of the notable differences between the software procedures in Section 8.10. 8.2 8.2.1 The SAT Score Study Study Description The data used in this example have been borrowed from the example data sets provided by the developers of the HLM software. Specifically, we analyze a subset of the data from an 369 370 Linear Mixed Models: A Practical Guide Using Statistical Software TABLE 8.1: Sample of the SAT Score Data Set in the “Long” Format STUDID TCHRID MATH YEAR 13099 14433 631 −1 13100 14433 596 −1 13100 14484 575 0 13100 14755 591 1 13101 14433 615 −1 13101 14494 590 0 13102 14433 621 −1 13102 14494 624 0 13102 14545 611 1 ... Note: “...” indicates portion of the data not displayed. 
educational study, focusing on one of the original 67 schools in the study, and the teachers and students within that school. In this study, students were given a math test in grades 3, 4, and 5 that was similar to the SAT math test used to evaluate college applicants. While different treatments were applied in some years for randomly sampled students, we do not consider treatment effects in the analysis in this chapter. We focus mainly on the components of variance in the math scores associated with students and teachers, in addition to change over time in the scores. The subset of the original data set that we work with in this chapter features 234 repeated measures collected from 122 students who have been instructed by a total of 12 teachers. Before we carry out an analysis of the SAT score data using the procedures in SAS, SPSS, R, Stata, or HLM, we need to make sure that the data set has been restructured into the “long” format (similar to the other case studies presented in the book). A portion of the SAT score data in the “long” format is shown in Table 8.1. The portion of the SAT score data set presented in Table 8.1 demonstrates some key features of both this specific data set and data sets with crossed random factors more generally. First, note that the YEAR variable represents a centered version of the grade in which the student was measured (centered at Grade 4). Second, note that there are repeated measures on both students and teachers; students are measured repeatedly over time (with not all students measured in each of the three years), and different students might have had the same teacher in a given year. For example, teacher ID 14433 instructed all four of the students in Table 8.1 in grade 3. Students are therefore not nested within teachers. This crossed structure, illustrated in the matrix below, introduces multiple levels of potentially correlated observations in the data set, and linear mixed models including random effects for both students and teachers enable decomposition of the components of variance due to each of these crossed random factors. Student ID 13099 13100 13101 13102 ... Teacher ID 14433 X X X X 14484 14494 14545 X 14755 X X X X ... Models for Data with Crossed Random Factors:The SAT Score Example 371 We note below how this crossed structure, where an “X” indicates an observation on the dependent variable measuring math achievement on the SAT for a given student–teacher pairing, results in a matrix with several empty cells, where a particular student was not instructed by a particular teacher. This is why methods using sparse matrices for estimation are the most efficient when fitting these types of models. In the analysis in this chapter, we consider models including a fixed effect of YEAR (enabling assessment of change over time in the mean math score) and random effects for the levels of TCHRID (teachers) and STUDID (students), to see whether the variation in math scores is being driven by students or teachers. To summarize, the following variables are included in the SAT score data set: • STUDID = Unique Student ID • TCHRID = Unique Teacher ID • MATH = Score on SAT Math Test • YEAR = Year of Measurement (−1 = Grade 3, 0 = Grade 4, 1 = Grade 5) Sorting the data set by STUDID or TCHRID is not required for using the software procedures that enable fitting models with crossed random effects. 8.2.2 Data Summary In this section, we consider some exploratory graphical analyses of the SAT score data, using the R software. 
These plots can easily be generated using the other four software packages as well. We first consider the distribution of the SAT scores as a function of the year (or grade) in which the data were collected. The following R syntax can be used to read in the data (with variable names in the first row) from the C:\temp directory, and then generate the side-by-side box plots in Figure 8.1.

> sat <- read.csv("C:\\temp\\school_data_final.csv", h=T)
> attach(sat)
> plot(math ~ factor(year), xlab = "Year (Centered at Grade 4)", ylab = "SAT Math Score")

Figure 8.1 shows evidence of a fairly clear linear increase in performance on the math portion of the SAT among these students as a function of year of measurement. These plots result in an expectation that the fixed effect of YEAR (representing a linear rate of change in the SAT math scores) will likely be positive and significant in our analysis. We also see evidence of fairly constant variance in the scores as a function of YEAR. We now consider variability in the math scores among the teachers and students. First, we examine side-by-side box plots for the teachers, using the following R syntax to generate Figure 8.2.

> plot(math ~ factor(tchrid), xlab = "Teacher ID", ylab = "SAT Math Score")

FIGURE 8.1: Box plots of SAT scores by year of measurement.

FIGURE 8.2: Box plots of SAT scores for each of the 13 teachers in the SAT score data set.

FIGURE 8.3: Box plots of SAT scores for each of the 122 students in the SAT score data set.

Figure 8.2 provides evidence of substantial variability in performance on the math portion of the SAT among these 13 teachers, suggesting that a model for these data should include random teacher effects. Finally, we consider variability among the students, using the following R syntax to generate Figure 8.3.

> plot(math ~ factor(studid), xlab = "Student ID", ylab = "SAT Math Score")

We also note a fair amount of variability among the students in terms of the SAT scores in Figure 8.3, and that several students have only been measured once. These results suggest that random effects associated with the students should also be included in a model for these data (meaning that the random effects of students would be crossed with the random effects of teachers, given the structure of these data). We now consider the model that we will fit to the SAT score data.

8.3 Overview of the SAT Score Data Analysis

In this chapter, we do not consider explicit model-building steps for the SAT score data. Instead, we simply fit a single model in each of the different software procedures that includes all of the fixed effects and crossed random effects that we are interested in evaluating. We then compare the methods and syntax used to fit the model across the software procedures, in addition to the resulting estimates produced by the procedures.

8.3.1 Model Specification

8.3.1.1 General Model Specification

We specify Model 8.1 in this subsection. The general specification of Model 8.1 corresponds closely to the syntax used to fit this model when using the procedures in SAS, SPSS, Stata, and R.
The value of MATHtij in a given year indexed by t (t = 1, 2, 3) for the i-th student (i = 1, 2, ..., 122) being instructed by the j-th teacher (j = 1, 2, ..., 13) can be written as follows: MATHtij = β0 + β1 × YEARtij +ui + vj + εtij fixed } random (8.1) The fixed-effect parameters are represented by β0 and β1 . The fixed intercept β0 represents the expected value of MATHtij when YEARtij in equal to zero (or Grade 4, given the centering). The parameter β1 represents the fixed effect of the YEAR variable, which can be interpreted as the linear rate of change in the expected math score associated with a one-year increase. The ui term represents the random effect associated with student i, and vj represents the random effect associated with teacher j. We assume that the random effects arise from two independent normal distributions: ui ∼ N (0, σi2 ), vj ∼ N (0, σj2 ) We have a total of 135 random effects in this model, corresponding to the 122 students and the 13 teachers. The overall resulting D matrix corresponding to this type of model with crossed random effects is a diagonal matrix with 122 + 13 = 135 columns and 135 rows, corresponding to the 135 random effects of the students and teachers. The first 122 × 122 block-diagonal portion of the matrix will have the variance of the random student effects, σi2 , on the diagonal, and zeroes off the diagonal. The remaining 13 × 13 block-diagonal portion of this matrix will have the variance of the random teacher effects, σj2 , on the diagonal, and zeroes off the diagonal. The residuals associated with the math score observations are assumed to be independent of the two random effects, and follow a normal distribution: εtij ∼ N (0, σ2 ) We next consider the hierarchical specification of a model with crossed random effects. 8.3.1.2 Hierarchical Model Specification We now present an equivalent hierarchical specification of Model 8.1, using the same notation as in Subsection 8.3.1.1. The hierarchical model has two components, reflecting contributions from the two levels of the data: the repeated measures at Level 1, and the crossing of students and teachers at Level 2 (i.e., the repeated measures of the dependent variable are associated with both students and teachers simultaneously). We write the Level 1 component as Level 1 Model (Repeated Measures) MATHtij = b0ij + b1ij × YEARtij + εtij (8.2) where the residuals (εtij ) have the distribution defined in the general specification of Model 8.1 in Subsection 8.3.1.1, with constant variance. Models for Data with Crossed Random Factors:The SAT Score Example 375 In the Level 1 model, we assume that MATHtij , the math SAT score for an individual combination of student i and teacher j at time t, follows a linear model, defined by the intercept specific to the student–teacher combination, b0ij , and the effect of YEAR specific to the student–teacher combination, b1ij . The Level 2 model then describes variation between the various student–teacher combinations in terms of the random intercepts and time effects, using the crossed random effects: Level 2 Model (Student–Teacher Combination) b0ij = β0 + ui + vj b1ij = β1 (8.3) where ui ∼ N (0, σi2 ), vj ∼ N (0, σj2 ) In this Level 2 model, the intercept b0ij for student i and teacher j depends on the overall fixed intercept, β0 , the random effect associated with student i, ui , and the random effect associated with teacher j, vj . 
We note that crossed random effects associated with the students and teachers are not included in the Level 2 equation for the effect of YEAR in this simple example (although they could be more generally, if one wished to test hypotheses about variance among students or teachers in terms of the YEAR effects). As a result, the YEAR effect specific to a student–teacher combination is simply defined by the overall fixed effect of YEAR, β1 . By substituting the expressions for b0ij and b1ij from the Level 2 model into the Level 1 model, we obtain the general linear mixed model (LMM) with crossed random effects that was specified in (8.1). 8.3.2 Hypothesis Tests We test a simple set of three hypotheses in this case study, related to the two crossed random effects and the overall fixed effect of YEAR on the SAT math scores for these students. Hypothesis 8.1. The random effects associated with students (ui ) can be omitted from Model 8.1. Model 8.1 has a single random effect, ui , associated with the intercept for each student. To test Hypothesis 8.1, we fit a model (Model 8.2) excluding the random student effects (i.e., a two-level model with repeated measures nested within teachers), and use a REMLbased likelihood ratio test. The test statistic is calculated by subtracting the –2 REML log-likelihood value for Model 8.1 (the reference model) from that for Model 8.2 (the nested model). The asymptotic null distribution of the test statistic is a mixture of χ20 and χ21 distributions, with equal weights of 0.5 (see Subsection 2.6.2.2). We once again remind readers that likelihood ratio tests, such as the one used for Hypothesis 8.1, rely on asymptotic (large-sample) theory, so we would not usually carry out this type of test for such a small data set. Rather, in practice, the random effects would probably be retained without testing, so that the appropriate marginal variance-covariance structure would be obtained for the data set. We present the calculation of this likelihood ratio test for the random effects (and those that follow in this chapter) strictly for illustrative purposes. Hypothesis 8.2. The random effects associated with teachers (vj ) can be omitted from Model 8.1. 376 Linear Mixed Models: A Practical Guide Using Statistical Software We test Hypothesis 8.2 using a similar REML-based likelihood ratio test. The test statistic is calculated by subtracting the –2 REML log-likelihood value for Model 8.1 from that for a new nested model, Model 8.3, excluding the random teacher effects (i.e., a two-level model with repeated measures nested within students). The asymptotic distribution of the test statistic under the null hypothesis is once again a mixture of χ20 and χ21 distributions, with equal weights of 0.5. Hypothesis 8.3. The fixed effects associated with the YEAR variable can be omitted from Model 8.1. The null and alternative hypotheses are H0 : β1 = 0 HA : β1 = 0 We test Hypothesis 8.3 using the standard test statistics for single fixed-effect parameters that are computed automatically by the various software procedures. We talk about differences in the test statistics reported and the test results in the next section (Section 8.4). For more detail on the results of these hypothesis tests, see Section 8.5. 8.4 Analysis Steps in the Software Procedures The modeling results for all software procedures are presented and compared in Section 8.6. 
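Because the likelihood ratio tests for Hypotheses 8.1 and 8.2 use the same 0.5/0.5 mixture p-value calculation in each package, it can be convenient to wrap that calculation in a small function. The following R lines are a sketch of our own (the function name lrt.mixture.pvalue is arbitrary); only the –2 REML log-likelihood values of the nested and reference models are needed:

> # Sketch: p-value for a REML-based likelihood ratio test of a single
> # variance component, using the 0.5*chi-square(0) + 0.5*chi-square(1)
> # mixture described in Subsection 2.6.2.2.
> lrt.mixture.pvalue <- function(m2loglik.nested, m2loglik.reference) {
+    lrt.stat <- m2loglik.nested - m2loglik.reference
+    0.5 * (1 - pchisq(lrt.stat, 0)) + 0.5 * (1 - pchisq(lrt.stat, 1))
+ }

The –2 REML log-likelihood values themselves are obtained from each package as described in the sections that follow.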
8.4.1 SAS We first import the comma-separated data file (school_data_final.csv, assumed to be located in the C:\temp directory) into SAS, and create a temporary SAS data set named satmath. PROC IMPORT OUT = WORK.satmath DATAFILE="C:\temp\school_data_final.csv" DBMS=CSV REPLACE; GETNAMES=YES; DATAROW=2; RUN; We now proceed with fitting Model 8.1 and testing the hypotheses outlined in Section 8.3.2. The SAS syntax to fit Model 8.1 using proc mixed is as follows: title "Model 8.1"; proc mixed data = satmath covtest; class studid tchrid; model math = year / solution; random int / subject = studid; random int / subject = tchrid; run; We have specified the covtest option in the proc mixed statement to obtain the standard errors of the estimated variance components in the output for comparison with the other software procedures. Recall that this option also causes SAS to display a Wald test Models for Data with Crossed Random Factors:The SAT Score Example 377 for the variance of the random effects associated with the teachers and students, which we do not recommend for use in testing whether to include random effects in a model (see Subsection 2.6.3.2). The class statement identifies the categorical variables that are required to specify the model. We include the two crossed random factors, STUDID and TCHRID, in the class statement; as shown in earlier chapters, random factors (crossed or nested) need to be specified here. The dependent variable, MATH, and the predictor variable, YEAR, are both treated as continuous variables in this model. The model statement sets up the fixed-effects portion of Model 8.1. We specify that the dependent variable, MATH, is a linear function of a fixed intercept (included by default) and the fixed effect of the YEAR variable. The solution option requests that the estimates of the two fixed-effect parameters be displayed in the output, along with their standard errors and a t-test for each parameter. The two random statements set up the crossed random effects structure for this model. In this case, STUDID is identified as the subject in the first random statement, indicating that it is a random factor. By specifying random int, we include a random effect associated with the intercept for each unique student. The second random statement with a different subject variable specified (TCHRID) also includes a random effect associated with the intercept for each teacher, and indicates that the levels of these two subject variables may potentially cross with each other. We note the difference in these two random statements from those used in Chapter 4, for example; there is no nesting relationship indicated for the random factors. We now test Hypothesis 8.1 by removing the random student effects from Model 8.1, and performing a likelihood ratio test. We first refit Model 8.1 without the first random statement: title "Hypothesis 8.1"; proc mixed data = satmath covtest; class studid tchrid; model math = year / solution; random int / subject = tchrid; run; The –2 REML log-likelihood value for this reduced two-level model is 2170.3, and the corresponding value for Model 8.1 was 2123.6. 
We compute the p-value for the likelihood ratio test using the following syntax: title "p-value for Hypothesis 8.1"; data _null_; lrtstat = 2170.3 - 2123.6; df = 1; pvalue = 0.5 * (1 - probchi(lrtstat, df)); format pvalue 10.8; put lrtstat = df = pvalue = ; run; We have very strong evidence (p < 0.001) against the null hypothesis in this case, and would choose to retain the random student effects in the model; there is clear evidence of substantial between-student variance in performance on the math test, as was apparent in the initial data summary. We test Hypothesis 8.2 using a similar approach. We first fit a reduced model without the random teacher effects, and then compute the likelihood ratio test statistic and p-value: 378 Linear Mixed Models: A Practical Guide Using Statistical Software title "Hypothesis 8.2"; proc mixed data = satmath covtest; class studid tchrid; model math = year / solution; random int / subject = studid; run; title "p-value for Hypothesis 8.2"; data _null_; lrtstat = 2203.1 - 2123.6; df = 1; pvalue = 0.5 * (1 - probchi(lrtstat, df)); format pvalue 10.8; put lrtstat = df = pvalue = ; run; We have even stronger evidence against the null hypothesis in this case, and would also choose to retain the random teacher effects in this model. Collectively, we can conclude that there is substantial variance among both students and teachers in performance on the math test. The solution option can be added to both random statements for Model 8.1 to examine predicted values of the random effects (EBLUPs) for individual students or teachers: title "Model 8.1, EBLUPs"; proc mixed data = satmath covtest; class studid tchrid; model math = year / solution; random int / subject = studid solution; random int / subject = tchrid solution; run; Finally, we test Hypothesis 8.3 with regard to the fixed effect of YEAR by examining the t-test for the fixed year effect included in the SAS output for Model 8.1. The resulting p-value (p = 0.0032) suggests that the fixed effect of YEAR is significantly different from zero, and the positive estimated coefficient suggests that the average performance of the students is increasing significantly over time. At this point, the model diagnostics examined in other chapters could also be examined for Model 8.1. 8.4.2 SPSS We assume that the .csv data set used to carry out the data summary in Subsection 8.2.2 has been imported into SPSS. We begin the SPSS analysis by setting up the syntax to fit Model 8.1 using the MIXED command: * Model 8.1 . MIXED math WITH year /CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED=year | SSTYPE(3) /METHOD=REML /PRINT=SOLUTION TESTCOV Models for Data with Crossed Random Factors:The SAT Score Example 379 /RANDOM=INTERCEPT | SUBJECT(studid) COVTYPE(VC) /RANDOM=INTERCEPT | SUBJECT(tchrid) COVTYPE(VC). In this syntax, MATH is listed as the dependent variable. The /FIXED subcommand then lists the variable that has an associated fixed effect in Model 8.1 (YEAR). Note that YEAR is being treated as a continuous covariate, given that it is specified following the WITH keyword in the MIXED command. A fixed intercept term is included in the model by default. The /METHOD subcommand then specifies the REML estimation method, which is the default. The /PRINT subcommand requests that the SOLUTION for the estimated parameters in the model be displayed in the output. 
Furthermore, we request simple Wald tests of the two variance components with the TESTCOV option, to get an initial sense of the importance of the between-student variance and the between-teacher variance. We then include two separate /RANDOM subcommands, indicating that crossed random effects associated with each level of STUDID and TCHRID should be included in Model 8.1. The covariance structure for the random effects is specified as variance components (the default), using the COVTYPE(VC) syntax. We note that this specification of the two /RANDOM subcommands does not identify any type of nesting relationship between the two random effects. After fitting Model 8.1 and generating the parameter estimates and model information criteria in the SPSS output viewer, we now test Hypothesis 8.1 by removing the random student effects from Model 8.1, and performing a likelihood ratio test. We first refit Model 8.1 without the first /RANDOM subcommand: * Hypothesis 8.1 . MIXED math WITH year /CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED=year | SSTYPE(3) /METHOD=REML /PRINT=SOLUTION TESTCOV /RANDOM=INTERCEPT | SUBJECT(tchrid) COVTYPE(VC). The –2 REML log-likelihood value for this reduced two-level model is 2170.3, and the corresponding value for Model 8.1 was 2123.6 (a difference of 46.7). We compute the p-value for the likelihood ratio test using the following syntax: * Hypothesis 8.1 . COMPUTE hyp81pvalue = 0.5 * (1 - CDF.CHISQ(46.7,1)). EXECUTE. This syntax will compute a new variable in the SPSS data set, containing the p-value associated with this test as a constant value for all cases in the data set. We have very strong evidence (p < 0.001) against the null hypothesis in this case, and would choose to retain the random student effects in the model; there is clear evidence of substantial between-student variance in performance on the math test, as was apparent in the initial data summary. We test Hypothesis 8.2 using a similar approach. We first fit a reduced model without the random teacher effects, and then compute the likelihood ratio test statistic and p-value: * Hypothesis 8.2 . MIXED math WITH year 380 Linear Mixed Models: A Practical Guide Using Statistical Software /CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED=year | SSTYPE(3) /METHOD=REML /PRINT=SOLUTION TESTCOV /RANDOM=INTERCEPT | SUBJECT(studid) COVTYPE(VC). COMPUTE hyp82pvalue = 0.5 * (1 - CDF.CHISQ(79.5,1)). EXECUTE. We have even stronger evidence against the null hypothesis in this case, and would also choose to retain the random teacher effects in this model. Collectively, we can conclude that there is substantial variance among both students and teachers in performance on the math test. Unfortunately, there are no easy ways to extract the predicted random effects (EBLUPs) associated with teachers and students when using the MIXED command to fit models with crossed random effects in the current version of SPSS. Finally, we test Hypothesis 8.3 with regard to the fixed effect of YEAR by examining the t-test for the fixed year effect included in the SPSS output for Model 8.1. 
The resulting p-value (p = 0.016, based on a Satterthwaite approximation of the denominator degrees of freedom) suggests that the fixed effect of YEAR is significantly different from zero, and the positive estimated coefficient suggests that the average performance of the students is increasing significantly over time. At this point, the model diagnostics examined in other chapters could (and should) also be examined for Model 8.1. 8.4.3 R We assume that the sat data frame object created in the initial data summary (Subsection 8.2.2) has been attached to R’s working memory. Importantly, the lme() function in the R package nlme cannot fit models with crossed random effects, so we only fit Model 8.1 and test the three hypotheses using the lmer() function in the R package lme4 in this section. Assuming that it has already been installed from a Comprehensive R Archive Network (CRAN) mirror, the lme4 package first needs to be loaded, so that the lmer() function can be used in the analysis: > library(lme4) We then fit Model 8.1 to the SAT score data using the lmer() function: > # Model 8.1. > model8.1.fit <- lmer(math ~ year + (1|studid) + (1|tchrid), REML = T) We describe each part of this specification of the lmer() function: • model8.1.fit is the name of the object that contains the results of the fitted linear mixed model with crossed random effects. • The first argument of the function, math ~ year + (1|studid) + (1|tchrid), is the model formula, which defines the response variable (math), and the terms with associated fixed effects in the model (year). • The (1|studid) and (1|tchrid) terms in the model formula indicate that a random effect associated with the intercept should be included for each level of the categorical Models for Data with Crossed Random Factors:The SAT Score Example 381 random factor studid, and each level of the random factor tchrid. The lmer() function will automatically recognize whether the levels of these random factors are crossed or nested when estimating the variances of the random effects. • The final argument of the function, REML = T, tells R that REML estimation should be used for the desired covariance parameters in the model. This is the default estimation method for the lmer() function. After the function is executed, estimates from the model fit can be obtained using the summary() function: > summary(model8.1.fit) Additional results of interest for this LMM fit can be obtained by using other functions in conjunction with the model8.1.fit object. For example, predicted values (EBLUPs) of the random effects for each student and each teacher can be displayed using the ranef() function: > ranef(model8.1.fit) Software Note: As noted in earlier chapters, the lmer() function only produces tstatistics for the fixed effects, with no corresponding p-values. This is primarily due to the lack of agreement in the literature over appropriate degrees of freedom for these test statistics. In general, we recommend use of the lmerTest package in R for users interested in testing hypotheses about parameters estimated using the lmer() function, and we illustrate the use of this package later in this analysis. We now test Hypothesis 8.1 by fitting a reduced form of Model 8.1 without the random student effects: > # Hypothesis 8.1. 
> model8.1.fit.nostud <- lmer(math ~ year + (1|tchrid), REML = T)
> summary(model8.1.fit.nostud)

The likelihood ratio test statistic is calculated by subtracting the –2 REML log-likelihood value for Model 8.1 (the reference model) from that for this reduced model (the value of the test statistic, based on the output provided by the summary() function, is 2170 − 2124 = 46). The test statistic has a null distribution that is a mixture of χ²₀ and χ²₁ distributions with equal weights of 0.5, so the anova() function cannot be used for the p-value. Instead, we calculate a p-value for the test statistic as follows:

> 0.5*(1 - pchisq(46,0)) + 0.5*(1 - pchisq(46,1))

See Subsection 8.5.1 for details. The test statistic is significant (p < 0.001), so we decide to reject the null hypothesis and retain the random student effects in the model. We use a similar approach for testing Hypothesis 8.2, extracting the –2 REML log-likelihood (or REMLdev) from the summary() function output for this reduced model excluding the random teacher effects:

> # Hypothesis 8.2.
> model8.1.fit.notchr <- lmer(math ~ year + (1|studid), REML = T)
> summary(model8.1.fit.notchr)

The value of this test statistic, based on the output provided by the summary() function, is 2203 − 2124 = 79. We compute the asymptotic p-value for this test statistic:

> 0.5*(1 - pchisq(79,0)) + 0.5*(1 - pchisq(79,1))

The resulting p-value (p < 0.0001) provides strong evidence for retaining the random teacher effects in the model. There is clear evidence of substantial variation among both students and teachers. Finally, we load the lmerTest package, which enables approximate t-tests of fixed-effect parameters in models fitted using the lmer() function.

> library(lmerTest)

We once again fit Model 8.1:

> # Model 8.1.
> model8.1.fit <- lmer(math ~ year + (1|studid) + (1|tchrid), REML = T)

We then apply the summary() function to the new model fit object:

> summary(model8.1.fit)

In the resulting output, the p-value computed for the approximate t-statistic (p = 0.016, based on a Satterthwaite approximation of the denominator degrees of freedom) suggests that the fixed effect of YEAR is significantly different from zero, and the positive estimated coefficient suggests that the average performance of the students is increasing significantly over time. At this point, the model diagnostics examined in other chapters could (and should) also be examined for Model 8.1.

8.4.4 Stata

We begin the analysis by importing the comma-separated values file containing the SAT score data (school_data_final.csv) into Stata:

. insheet using "C:\temp\school_data_final.csv", comma

We now proceed with fitting Model 8.1. We use the following mixed command to fit Model 8.1 to the SAT score data:

. * Model 8.1.
. mixed math year || _all: R.studid || _all: R.tchrid, variance reml
. estat ic

In this mixed command, MATH is listed as the dependent variable, followed by the predictor variable that has an associated fixed effect in Model 8.1 (YEAR). Note that YEAR is being treated as a continuous covariate, given that it is not preceded by i. (Stata's factor notation). A fixed intercept term is included in the model by default. We next indicate the crossed random effects using two successive || symbols in the command. The first random factor, STUDID, follows the first || symbol, and is specified in a unique fashion.
The _all: notation indicates that the entire data set is treated as one “cluster,” and R.studid indicates that STUDID is a categorical random factor, where the effects of the levels of that factor randomly vary for the one “cluster.” We then indicate the second random factor crossed with STUDID following a second || symbol, using the Models for Data with Crossed Random Factors:The SAT Score Example 383 same general concept with the categorical TCHRID factor. We note that this command does not specify an explicit nesting structure, given that the entire data set is treated as one “subject.” Finally, after a comma, we specify the variance option, which requests estimates of the variances of the random effects in the output (rather than estimates of standard deviations), and the reml option, requesting REML estimation of the parameters in this model (rather than the default ML estimation). The subsequent estat ic command requests that Stata compute selected information criteria (e.g., AIC, BIC) for this model and display them in the output. After fitting Model 8.1 and generating the parameter estimates and model information criteria in the Stata results window, we now test Hypothesis 8.1 by removing the random student effects from Model 8.1, and performing a likelihood ratio test. We first refit Model 8.1 without the random student effects: . * Model 8.1, no random student effects. . mixed math year || _all: R.tchrid, variance reml . estat ic This reduced model could also be fitted using this command, shown in earlier chapters for simple two-level models: . * Model 8.1, no random student effects. . mixed math year || tchrid:, variance reml . estat ic The –2 REML log-likelihood value for this reduced two-level model (found by multiplying the REML log-likelihood displayed by the estat ic command by –2) is 2170.3, and the corresponding value for Model 8.1 was 2123.6 (a difference of 46.7). We compute the p-value for the likelihood ratio test using the following syntax: . * Hypothesis 8.1 . . di 0.5*chi2tail(1,46.7) Based on the resulting p-value, we have very strong evidence (p < 0.001) against the null hypothesis in this case, and would choose to retain the random student effects in the model; there is clear evidence of substantial between-student variance in performance on the math test, as was apparent in the initial data summary. We test Hypothesis 8.2 using a similar approach. We first fit a reduced model without the random teacher effects, and then compute the likelihood ratio test statistic and p-value: . * Model 8.1, no random teacher effects. . mixed math year || _all: R.studid, variance reml . estat ic . * Hypothesis 8.2 . . di 0.5*chi2tail(1,79.5) We have even stronger evidence against the null hypothesis in this case, and would also choose to retain the random teacher effects in this model. Collectively, we can conclude that there is substantial variance among both students and teachers in performance on the math test. Once a model with crossed random effects has been fitted using the mixed command, the predict post-estimation command can be used to generate new variables in the Stata data 384 Linear Mixed Models: A Practical Guide Using Statistical Software set containing predicted values of each random effect. For example, we save the EBLUPs of the random school and teacher effects, respectively, in two new variables (b1 and b2) using the following predict command (after fitting Model 8.1): . * Model 8.1. . mixed math year || _all: R.studid || _all: R.tchrid, variance reml . 
predict b*, reffects Finally, we test Hypothesis 8.3 with regard to the fixed effect of YEAR by examining the z-statistic for the fixed year effect included in the Stata output for Model 8.1. The resulting p-value (p = 0.003) suggests that the fixed effect of YEAR is significantly different from zero, and the positive estimated coefficient suggests that the average performance of the students is increasing significantly over time. At this point, the model diagnostics examined in other chapters could (and should) also be examined for Model 8.1. 8.4.5 8.4.5.1 HLM Data Set Preparation When using the HLM software to fit linear mixed models with crossed random effects to data sets with two crossed random factors like the SAT score data, three separate data sets need to be prepared: 1. The Level-1 data set: Each row in this data set corresponds to an observation on a unit of analysis (including a unique measurement on the dependent variable) for a given combination of the two crossed random factors. In the context of the present SAT score study, each row of the Level-1 data set represents a measurement on a student–teacher combination in a given year. This data set is similar in structure to the data set displayed in Table 8.1. The Level-1 data set for this example includes STUDID, TCHRID, the dependent variable (MATH), and the predictor variable of interest (YEAR). 2. The Row-Level data set: When thinking about the crossed random factors in a data set like the SAT score data, one can think of a matrix like that shown in Section 8.2.1, where the unique levels of one random factor define the rows, and the unique levels of the second random factor define the columns. This distinction between the row factor and the column factor is essentially arbitrary in HLM, but the row-level data set contains one row per unique level of one of the two random factors. In this example, we consider STUDID as the random factor defining the “rows” of this cross-tabulation, and this data set therefore has one row per student. We include the unique student ID (STUDID), in addition to a variable recording the number of measures collected on each student (MEASURES), and we sort the data set is ascending order by STUDID. Although we don’t actually analyze the MEASURES variable in this example, HLM requires at least one nonID variable to be included in this data set. In general, additional student-level covariates of interest in the analysis would be included in the row-level data set. 3. The Column-Level data set: This data set has one row per unique level of the second random factor. In this example, this data set has one row per unique level of TCHRID, or per teacher. We include the TCHRID variable in this data set, along with a variable recording the number of measures collected on each teacher (MEASURES T), and sort the data set in ascending order by TCHRID. Similar to the row-level data set, the column-level data set needs to have at least Models for Data with Crossed Random Factors:The SAT Score Example 385 one non-ID variable for the teachers, and additional teacher-level covariates could also be included in this data set. After these three data sets have been created, we can proceed to create the multivariate data matrix (MDM), and fit Model 8.1. 8.4.5.2 Preparing the MDM File In the main HLM menu, click File, Make new MDM file and then Stat package input. In the window that opens, select HCM2 to fit a hierarchical linear model with crossed random effects associated with two (2) random factors, and click OK. 
Select the Input File Type as SPSS/Windows. To prepare the MDM file for Model 8.1, locate the Level-1 Specification area, and Browse to the location of the Level-1 data set. Click Open after selecting the Level 1 SPSS file, click the Choose Variables button, and select the following variables: STUDID (click “rowid” for the STUDID variable, because this random factor identifies the “rows” in the cross-classification), TCHRID (click “colid” for the TCHRID variable, because this random factor identifies the “columns” in the cross-classification), and both MATH and YEAR (click “in MDM” for each of these variables). Click OK when finished. Next, locate the Row-Level Specification area, and Browse to the location of the row-level (student-level) SPSS data set. Click Open after selecting the file, and click the Choose Variables button to include STUDID (click “rowid”) and the variable indicating the number of measures on each student, MEASURES (click “in MDM”). Again, although we don’t analyze the MEASURES variable in this example, HLM requires that at least one numeric variable be included in the MDM file from both the row and column data sets. Click OK when finished. Next, locate the Column-Level Specification area, and Browse to the location of the column-level (teacher-level) SPSS data set. Click Open after selecting the file, and click the Choose Variables button to include TCHRID (click “colid”) and the variable indicating the number of measures on each teacher, MEASURES T (click “in MDM”). Again, although we don’t analyze the MEASURES T variable in this example, HLM requires that at least one numeric variable be included in the MDM file from both the row and column data sets. Click OK when finished. After making these choices, select No for Missing Data? in the Level-1 data set, because we do not have any missing data in this analysis. In the upper-right corner of the MDM window, enter a name with a .mdm extension for the MDM file (e.g., SAT.mdm). Save the .mdmt template file under a new name (click Save mdmt file), and click Make MDM. After HLM has processed the MDM file, click the Check Stats button to display descriptive statistics for the variables in the Level-1, row, and column data files (this is not optional). Click Done to begin building Model 8.1. 8.4.5.3 Model Fitting In the model building window, identify MATH as the Outcome variable. To add more informative subscripts to the models (if they do not already appear), click File and Preferences, and then choose Use level subscripts. To complete the specification of Model 8.1, we first add the fixed effect of the uncentered YEAR variable to the model. We choose uncentered because this variable has already been centered at the data management stage. The Level 1 model is displayed in HLM as follows (where i is an index for the individual observations, j is an index for students, and k is an index for teachers): 386 Linear Mixed Models: A Practical Guide Using Statistical Software Model 8.1: Level 1 Model MATHijk = π0jk + π1jk (YEARijk ) + eijk The Level 2 equation for the random intercept specific to a given observation (π0jk ) includes a constant fixed effect, θ0 , a random effect associated with the student (b00j ), which allows the intercept to vary randomly across students, and a random effect associated with the teacher (c00k ), which allows the intercept to vary randomly across teachers. 
The Level 2 equation for the random coefficient for YEAR specific to a student–teacher combination, π1jk , is simply defined by a constant fixed effect (θ1 ), although we could allow the effect of year to randomly vary across both students and teachers: Model 8.1: Level 2 Model π0jk = θ0 + b00j + c00k π1jk = θ1 We display the overall LMM by clicking the Mixed button in the model building window: Model 8.1: Overall Mixed Model MATHijk = θ0 + θ1 ∗ YEARijk + b00j + c00k + eijk This model is the same as the general specification of Model 8.1 introduced in Subsection 8.3.1.1, although the notation is somewhat different. After specifying Model 8.1, click Basic Settings to enter a title for this analysis (such as “SAT Score Data: Model 8.1”) and a name for the output (.html) file that HLM generates when fitting this model. Note that the default outcome variable distribution is Normal (Continuous). Click OK to return to the model-building window. We note that there is not an option to choose either REML or ML estimation under Other Settings and Estimation Settings; HLM only provides ML estimation as an option for models with crossed random effects. Click File and Save As to save this model specification in a new .hlm file. Finally, click Run Analysis to fit the model. After the estimation of the parameters in Model 8.1 has finished (one may need to enter “y” for estimation to proceed past the predetermined maximum number of iterations), click File and View Output to see the resulting estimates if they do not automatically appear in a new browser window. In the other software procedures, we tested Hypotheses 8.1 and 8.2 by fitting nested models that excluded either the random student effects or the random teacher effects, and then performed likelihood ratio tests. HLM automatically produces tests of the null hypothesis that the variance of the random effects associated with a given random factor is zero, and in the case of a model with crossed random effects, two of these hypothesis tests are displayed in the output [see Raudenbush & Bryk (2002) for details]. These two tests strongly suggest that the null hypothesis of zero variance in the random effects should be rejected for both students and teachers, indicating that we should retain both crossed random effects in the model. Hypothesis 8.3 can be tested by examining the p-value for the fixed effect of YEAR (θ1 in the HLM notation). We have strong evidence against the null hypothesis (p = 0.001), and conclude that there is a significant positive increase in the expected math score as a function of YEAR. Models for Data with Crossed Random Factors:The SAT Score Example 387 At this point, fixed effects of additional covariates could be added to the model in an effort to explain variance in the two sets of random effects, and residual diagnostics could be examined. As illustrated in earlier chapters, HLM enables users to save external residual files in the format of a statistical package of their choosing (e.g., SPSS, Stata) in the Basic Settings window. 8.5 Results of Hypothesis Tests The test results reported in this section were calculated based on the analysis in R. 8.5.1 Likelihood Ratio Tests for Random Effects Hypothesis 8.1. The random effects (ui ) associated with the intercept for each student can be omitted from Model 8.1. The likelihood ratio test statistic for Hypothesis 8.1 is calculated by subtracting the –2 REML log-likelihood for Model 8.1 from that for a reduced version of Model 8.1 excluding the random student effects. 
This difference is calculated as 2170 − 2124 = 46. Because the null hypothesis value for the variance of the random student effects is on the boundary of the parameter space (i.e., zero), the asymptotic null distribution of this test statistic is a mixture of χ²₀ and χ²₁ distributions, each with equal weights of 0.5 (Verbeke & Molenberghs, 2000). To evaluate the significance of the test, we calculate the p-value as follows:

p-value = 0.5 × P(χ²₀ > 46) + 0.5 × P(χ²₁ > 46) < 0.001

We reject the null hypothesis and retain the random effects associated with students in Model 8.1. We perform a similar test of Hypothesis 8.2 for the random teacher effects, and arrive at a similar conclusion. We noted earlier that the HLM software provides slightly different chi-square tests of these hypotheses, each of which led to the same conclusions; interested readers can consult Raudenbush & Bryk (2002) for more details.

8.5.2 Testing the Fixed Year Effect

Hypothesis 8.3. The fixed effect of YEAR can be omitted from Model 8.1.

In this chapter, we tested this hypothesis using t- or z-statistics produced in the output by the various software procedures, which are computed using the ratio of the estimated fixed effect to its estimated standard error. Regardless of the slight variation in these statistics (and their degrees of freedom, if applicable) across the procedures in SAS, SPSS, R, Stata, and HLM, we arrived at the same conclusion, rejecting the null hypothesis and concluding that the fixed effect of YEAR is positive and significant.

8.6 Comparing Results across the Software Procedures

Table 8.2 shows that the results for Model 8.1 generally agree across the software procedures, in terms of the fixed-effect parameter estimates and their estimated standard errors.

TABLE 8.2: Comparison of Results for Model 8.1

                             SAS:             SPSS:            R:               Stata:           HLM:
                             proc mixed       MIXED            lmer() function  mixed            HCM2
Estimation Method            REML             REML             REML             REML             ML

Fixed-Effect Parameter       Estimate (SE)    Estimate (SE)    Estimate (SE)    Estimate (SE)    Estimate (SE)
β0 (Intercept)               597.38 (8.38)    597.38 (8.38)    597.38 (8.38)    597.38 (8.38)    597.71 (7.53)
β1 (YEAR)                    29.05 (9.63)     29.05 (9.63)     29.05 (9.63)     29.05 (9.63)     28.56 (8.64)

Covariance Parameter         Estimate (SE)    Estimate (SE)    Estimate (n.c.)  Estimate (SE)    Estimate (n.c.)
σi2 (Students)               338.41 (67.84)   338.41 (67.84)   338.41           338.41 (67.84)   340.79
σj2 (Teachers)               762.90 (396.09)  762.94 (396.12)  762.94           762.94 (396.12)  604.90
σ2 (Residuals)               238.30           238.30           238.30           238.30           237.91

Model Information Criteria
–2 REML log-likelihood       2123.6           2123.6           2124.0           2123.6           2135.9 (–2 ML)
AIC                          2129.6           2129.6           2134.0           2133.6           n.c.
BIC                          2123.6           2140.0           2151.0           2150.9           n.c.

Note: (n.c.) = not computed.
Note: 234 Measures at Level 1; 122 students and 12 teachers.

The five software procedures also generally agree in terms of the values of the estimated variance components for the two sets of crossed random effects, σi2 and σj2, and their standard errors, when reported. We note that the value of the –2 REML log-likelihood is the same across the four software procedures using REML estimation (with the difference in R just due to rounding). The Hierarchical Cross-Classified Model (HCM2) procedure in HLM uses ML estimation only, leading to the different estimates in Table 8.2 and the different value of the –2 ML log-likelihood.
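For readers who want to verify the entries in the R column of Table 8.2, the corresponding quantities can be pulled directly from the fitted model object. The following lines are a minimal sketch of our own, assuming the model8.1.fit object created with the lmer() function in Subsection 8.4.3:

> # Sketch: extracting the quantities reported in Table 8.2 from the
> # lmer() fit of Model 8.1 (object model8.1.fit from Subsection 8.4.3).
> -2 * logLik(model8.1.fit)   # -2 REML log-likelihood (approximately 2124)
> AIC(model8.1.fit)
> BIC(model8.1.fit)
> VarCorr(model8.1.fit)       # estimated random-effect and residual variance components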
We also once again note that there is some disagreement in the values of the information criteria (AIC and BIC), because of different calculation formulas that are used in the different procedures. We also note that the lmer() function in R and the HLM procedure do not compute standard errors for the estimated variance components. The various methods discussed in this chapter for testing hypotheses about the variance components can be used to make inferences about the variance components in these cases. 8.7 Interpreting Parameter Estimates in the Final Model The results that we present in this section were obtained by fitting Model 8.1 to the SAT score data, using REML estimation in Stata. 8.7.1 Fixed-Effect Parameter Estimates The fixed-effect parameter estimates, standard errors, significance tests, and 95% confidence intervals obtained by fitting Model 8.1 to the SAT score data in Stata are reported in the following output: ' $ Wald chi2(1) = 9.11 Log restricted-likelihood = -1061.8139 Prob > chi2 = 0.0025 ---------------------------------------------------------------------gcf Coef. Std. Err. z P > |z| [95 pct Conf. Interval] ---------------------------------------------------------------------year 29.04963 9.626692 3.02 0.003 10.18166 47.9176 _cons 597.3811 8.378181 71.30 0.000 580.96020 613.8021 & % Based on these estimates, we would conclude that the expected value of the SAT math score at Grade 4 for the individuals in this school is 597.38, with a 95% confidence interval for this mean of (580.96, 613.80). We would also conclude that the fixed effect of YEAR on the SAT score is significant (p = 0.003), with a one-year (or one-grade) increase resulting in an expected change of 29.05 in the SAT score. We can use the margins and marginsplot post-estimation commands in Stata to visualize this relationship. Note that these commands need to be submitted immediately after fitting Model 8.1. . . . . * Model 8.1, with plots of marginal predicted values. mixed math year || _all: R.studid || _all: R.tchrid, variance reml margins, at(year = (-1,0,1)) marginsplot We note that we are using the margins command to plot expected values of the dependent variable (SAT math score) at values −1, 0, and 1 on the predictor variable YEAR. 390 Linear Mixed Models: A Practical Guide Using Statistical Software 550 Linear Prediction, Fixed Portion 600 650 Adjusted Predictions with 95% CIs −1 0 year 1 FIGURE 8.4: Predicted SAT math score values by YEAR based on Model 8.1. We then immediately follow the margins command, which displays the marginal predicted values in the output, with the marginsplot command to plot the predicted values (in addition to 95% confidence intervals for the predicted values). The resulting plot is shown in Figure 8.4. 8.7.2 Covariance Parameter Estimates The estimated covariance parameters obtained by fitting Model 8.1 to the SAT score data using the mixed command in Stata with REML estimation are reported in the following output: ' $ Random-effects Parameters Estimate Std. Err. [95 pct Conf. Interval] -------------------------------------------------------------------------_all: Identity var(R.studid) 338.4091 67.8428 228.453 501.2879 _all: Identity var(R.tchrid) 762.9415 396.1218 var(Residual) 238.2957 32.8589 LR test vs. linear regression: chi2(4) = 162.50 275.7685 2110.7560 181.8624 Prob > chi2 = 0.0000 Note: LR test is conservative and provided only for reference. 
We first see that the variance of the random student effects, var(R.studid), is estimated to be 338.4091. The 95% confidence interval suggests that this parameter is in fact greater than zero; we illustrated the use of a likelihood ratio test for a more formal (asymptotic) test of this hypothesis earlier in the chapter. We next see that the estimated variance of the random teacher effects, var(R.tchrid), is 762.9415. The 95% confidence interval for this variance component also suggests that the variance is greater than zero. Finally, we see the estimated residual variance (238.2957). Based on these results, it is fairly clear that between-student and between-teacher variance represent substantial portions of the overall variance in the SAT math scores.

8.8 The Implied Marginal Variance-Covariance Matrix for the Final Model

In this section, we capitalize on the ability of proc mixed in SAS to compute the implied marginal variance-covariance matrix for observations in the SAT score data set based on Model 8.1, and we examine the unique structure of this matrix. The matrix of implied marginal variances and covariances for the SAT score observations (and the corresponding correlation matrix) can be obtained in SAS by including the v and vcorr options in either of the two random statements for Model 8.1:

random int / subject = studid v vcorr;

In models with crossed random effects, the default in SAS is no longer to display the implied marginal variance-covariance matrix for the first "subject" in the data set (as in Chapter 5), given that there are now multiple "subject" variables crossed with each other. When fitting these models, proc mixed will display this matrix for all observations in the data set, which can be prohibitive when working with larger data sets (in this case, we would see a 234 × 234 matrix). In this illustration, examination of this matrix is instructive for understanding how the estimated variance components in a model with crossed random effects determine the marginal variances and covariances of the observations. In the table below, we consider the 6 × 6 submatrix in the upper-left corner of the full matrix. This submatrix represents the first six observations in the SAT score data set, shown earlier in Table 8.1, where we have three unique students (with 1, 3, and 2 observations, respectively) and four unique teachers. We first see that the estimated marginal variance of a given observation (on the diagonal of the submatrix below) is simply the sum of the three estimated variance components (338.41 + 762.90 + 238.30 = 1339.61). The implied marginal covariance of observations on the same teacher (e.g., observations 1 and 2) is defined by the estimated variance of the random teacher effects (762.90). The estimated covariance of observations on the same student (e.g., observations 2 and 3) is defined by the estimated variance of the random student effects (338.41). The matrix then has empty cells where the implied covariances are zero, because the corresponding pairs of observations were collected on different students and different teachers (e.g., observations 1 and 4).
Row      Col1      Col2      Col3      Col4      Col5      Col6
------------------------------------------------------------------
  1   1339.61    762.90                          762.90
  2    762.90   1339.61    338.41    338.41      762.90
  3              338.41   1339.61    338.41
  4              338.41    338.41   1339.61
  5    762.90    762.90                         1339.61    338.41
  6                                               338.41  1339.61

The "blank" cells in this matrix indicate the need for the estimation algorithms for sparse matrices, discussed in Chapter 2, when fitting models with crossed random effects. For example, in order to compute estimates of the standard errors of the estimated fixed effects in this model, we need to invert the "sparse" implied marginal variance-covariance matrix shown above.

The implied marginal correlation matrix is determined by dividing all cells in the implied marginal variance-covariance matrix by the estimated total variance on the diagonal. The implied intraclass correlation of observations on the same student is simply the proportion of the total variance due to students, while the implied intraclass correlation of observations on the same teacher is simply the proportion of the total variance due to teachers. The implied marginal correlation matrix for the SAT score data set is shown in the output below.

Row     Col1     Col2     Col3     Col4     Col5     Col6
-----------------------------------------------------------
  1   1.0000   0.5695                      0.5695
  2   0.5695   1.0000   0.2526   0.2526    0.5695
  3            0.2526   1.0000   0.2526
  4            0.2526   0.2526   1.0000
  5   0.5695   0.5695                      1.0000   0.2526
  6                                        0.2526   1.0000

We see that there is a stronger correlation of observations on the same teacher than of observations on the same student, suggesting that the teacher tends to be a stronger determinant of performance on the test.

8.9 Recommended Diagnostics for the Final Model

Using the procedures outlined in earlier chapters for the various software packages, we recommend performing the following set of diagnostic analyses when fitting models with crossed random effects (a short R sketch illustrating these steps follows the list):

• Generate predicted values (EBLUPs) of the two (or more) sets of crossed random effects, and examine the distributions of the random effects using normal Q–Q plots. For example, when adding the solution option to each of the two random statements used to fit Model 8.1 in SAS proc mixed, we identify several students and teachers with extreme values, and the data for these "subjects" could be examined in more detail to make sure that there are not any extreme outliers or data entry errors.

• Generate residuals and fitted values for the dependent variable based on the final model, and use normal Q–Q plots and scatter plots to examine the assumptions of normality and constant variance for the residuals. Consider whether important covariates (or functions of covariates) have been omitted if these assumptions appear to be violated, and consider transforming the dependent variable if needed. Alternative generalized linear mixed models for non-normal outcomes (outside the scope of this book) may be needed if these suggested remedies do not fix apparent violations of these assumptions.

• Plot the residuals as a function of each of the continuous predictor variables included in the model, to make sure that there are no systematic patterns in the residuals as a function of the predictors (which would indicate possible model misspecification).

The sensitivity of the model estimates (and the corresponding inferences) to removing extreme subjects or extreme observations should also be examined.
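As a sketch of how these diagnostic steps might be carried out in R with the lmer() function from the lme4 package, assuming a data frame named sat_data containing the variables math, year, studid, and tchrid used in the Stata syntax earlier in this chapter (the data frame name and plot labels are illustrative only, not the book's own code):

library(lme4)

# Fit Model 8.1: crossed random intercepts for students and teachers.
fit <- lmer(math ~ year + (1 | studid) + (1 | tchrid), data = sat_data, REML = TRUE)

# EBLUPs of the two sets of crossed random effects, with normal Q-Q plots.
re <- ranef(fit)
qqnorm(re$studid[, "(Intercept)"], main = "EBLUPs: students")
qqline(re$studid[, "(Intercept)"])
qqnorm(re$tchrid[, "(Intercept)"], main = "EBLUPs: teachers")
qqline(re$tchrid[, "(Intercept)"])

# Conditional residuals and fitted values for the remaining checks.
res <- resid(fit)
qqnorm(res); qqline(res)              # normality of the residuals
plot(fitted(fit), res)                # constant variance against fitted values
plot(sat_data$year, res)              # systematic patterns against the predictor YEAR

Similar quantities (EBLUPs, residuals, and fitted values) are available from the other software procedures discussed in this chapter.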
8.10 Software Notes and Additional Recommendations

We noted some important differences between the five software procedures in this chapter when fitting models with crossed random effects:

1. There is no straightforward way to extract the predicted random effects (EBLUPs) associated with teachers and students when using the MIXED command to fit models with crossed random effects in the current version of SPSS. The other four procedures discussed in this chapter provide options for computing these EBLUPs.

2. To fit models with crossed random effects in R, we need to use the lmer() function in the lme4 package.

3. The lme4 version of the lmer() function in R does not provide p-values for the test statistics computed from fitted models. As a result, we recommend using the lmerTest package in R to test hypotheses about parameters estimated using the lmer() function.

4. When using the HLM package to fit models with crossed random effects, only ML estimation is available. This will lead to slight differences in parameter estimates relative to other procedures that may be using REML by default.

Because fitting models with crossed random effects can be more computationally intensive than fitting models with nested random effects, we recommend that analysts only include random intercepts associated with the various levels of the crossed random factors. This will capture the essential features of the dependence of observations induced by the multiple random factors, and enable estimation of the components of variance due to the random factors. Additional (crossed) random effects (e.g., random coefficients associated with a given predictor variable) can be included in models and tested if there is explicit research interest in between-subject (or between-cluster) variance in these coefficients, but this can lead to computational difficulties, especially when attempting to fit models to larger data sets than the one considered in this chapter.

A Statistical Software Resources

A.1 Descriptions/Availability of Software Packages

A.1.1 SAS

SAS is a comprehensive software package produced by the SAS Institute, Inc., which has its headquarters in Cary, North Carolina. SAS is used for business intelligence, scientific applications, and medical research, and it provides tools for data management, reporting, and analysis. proc mixed is a procedure located within the SAS/STAT software package, a collection of procedures that implement statistical analyses. The current version of the SAS/STAT software package at the time of this publication is SAS Release 9.3, which is available for many different computing platforms, including Windows and UNIX. Additional information on ordering and availability can be obtained by calling 1-800-727-0025 (United States), or visiting the following web site: http://www.sas.com/nextsteps/index.html.

A.1.2 IBM SPSS Statistics

IBM SPSS Statistics (referred to simply as "SPSS" in this book) is a comprehensive statistical software package produced by International Business Machines (IBM) Corporation, headquartered in Armonk, New York. SPSS's statistical software, i.e., the collection of procedures available in the Base version of SPSS and several add-on modules, is used primarily for data mining, data management and database analysis, market and survey research, and research of all types in general.
The Linear Mixed Models (MIXED) procedure in SPSS is part of the Advanced Statistics module that can be used in conjunction with the Base SPSS software. The current version of the SPSS software package at the time of this publication (Version 22) is available for Windows, MacOS, and Linux desktop platforms. Additional information on ordering and availability can be obtained by calling 1-800-543-2185, or visiting the following web site: http://www-01.ibm.com/software/analytics/spss/products/statistics/.

A.1.3 R

R is a free software environment for statistical computing and graphics, which is available for Windows, UNIX, and MacOS platforms. R is an open source software package, meaning that the code written to implement the various functions can be freely examined and modified. The lme() function for fitting linear mixed models can be found in the nlme package, which automatically comes with the R software, and the newer lmer() function for fitting linear mixed models can be found in the lme4 package, which needs to be downloaded by users. The newest version of R at the time of this publication is 3.1.0 (April 2014), and all analyses in this book were performed using at least Version 2.15.1. To download the base R software or any contributed packages (such as the lme4 package) free of charge, readers can visit any of the Comprehensive R Archive Network (CRAN) mirrors listed at the following web site: http://www.r-project.org/. This web site provides a variety of additional information about the R software environment.

A.1.4 Stata

Stata is a statistical software package for research professionals of all disciplines, offering a completely integrated set of commands and procedures for data analysis, data management, and graphics. Stata is produced by StataCorp LP, which is headquartered in College Station, Texas. The mixed procedure for fitting linear mixed models was first available in Stata Release 13, which is currently available for Windows, Macintosh, and UNIX platforms. For more information on sales or availability, call 1-800-782-8272, or visit: http://www.stata.com/order/.

A.1.5 HLM

The HLM software program is produced by Scientific Software International, Inc. (SSI), headquartered in Lincolnwood, Illinois, and is designed primarily for the purpose of fitting hierarchical linear models. HLM is not a general-purpose statistical software package similar to SAS, SPSS, R, or Stata, but it offers several tools for description, graphing, and analysis of hierarchical (clustered and/or longitudinal) data. The current version of HLM (HLM 7) can fit a wide variety of hierarchical linear models, including generalized HLMs for non-normal response variables (not covered in this book). A free student edition of HLM 7 is available at the following web site: http://www.ssicentral.com/hlm/student.html. More information on ordering the full commercial version of HLM 7, which is currently available for Windows, UNIX systems, and Linux servers, can be found at the following web site: http://www.ssicentral.com/ordering/index.html.

A.2 Useful Internet Links

The web site for this book, which contains links to electronic versions of the data sets, output, and syntax discussed in each chapter, in addition to syntax in the various software packages for performing the descriptive analyses and model diagnostics discussed in the example chapters, can be found at the following link: http://www.umich.edu/~bwest/almmussp.html.
A very helpful web site introducing matrix algebra operations that are useful for understanding the calculations presented in Chapter 2 and Appendix B can be found at the following link: http://www.sosmath.com/matrix/matrix.html.

In this book, we have focused on procedures capable of fitting linear mixed models in the HLM software package and four general-purpose statistical software packages. To the best of our knowledge, these five software tools are in widespread use today, but these are by no means the only statistical software tools available for the analysis of linear mixed models. The following web site provides an excellent survey of the procedures available in these and other popular statistical software packages, including MLwiN: http://www.bristol.ac.uk/cmm/learning/mmsoftware/.

B Calculation of the Marginal Variance-Covariance Matrix

In this appendix, we present the detailed calculation of the marginal variance-covariance matrix, $V_i$, implied by Model 5.1 in Chapter 5 (the analysis of the Rat Brain data). This calculation assumes knowledge of simple matrix algebra.

$$
V_i = Z_i D Z_i' + R_i
= \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
\left(\sigma^2_{int}\right)
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}
+
\begin{pmatrix}
\sigma^2 & 0 & 0 & 0 & 0 & 0 \\
0 & \sigma^2 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma^2 & 0 & 0 & 0 \\
0 & 0 & 0 & \sigma^2 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma^2 & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma^2
\end{pmatrix}
$$

Note that the $Z_i$ design matrix has a single column of 1s (for the random intercept for each animal in Model 5.1). Multiplying the $Z_i$ matrix by the $D$ matrix, we have the following:

$$
Z_i D = \begin{pmatrix} \sigma^2_{int} \\ \sigma^2_{int} \\ \sigma^2_{int} \\ \sigma^2_{int} \\ \sigma^2_{int} \\ \sigma^2_{int} \end{pmatrix}
$$

Then, multiplying the above result by the transpose of the $Z_i$ matrix, we have

$$
Z_i D Z_i' =
\begin{pmatrix} \sigma^2_{int} \\ \sigma^2_{int} \\ \sigma^2_{int} \\ \sigma^2_{int} \\ \sigma^2_{int} \\ \sigma^2_{int} \end{pmatrix}
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}
=
\begin{pmatrix}
\sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} \\
\sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} \\
\sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} \\
\sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} \\
\sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} \\
\sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int}
\end{pmatrix}
$$

For the final step, we add the 6 × 6 $R_i$ matrix to the above result to obtain the $V_i$ matrix:

$$
V_i = Z_i D Z_i' + R_i =
\begin{pmatrix}
\sigma^2_{int}+\sigma^2 & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} \\
\sigma^2_{int} & \sigma^2_{int}+\sigma^2 & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} \\
\sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int}+\sigma^2 & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} \\
\sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int}+\sigma^2 & \sigma^2_{int} & \sigma^2_{int} \\
\sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int}+\sigma^2 & \sigma^2_{int} \\
\sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int} & \sigma^2_{int}+\sigma^2
\end{pmatrix}
$$

We see how the small sets of covariance parameters defining the $D$ and $R_i$ matrices ($\sigma^2_{int}$ and $\sigma^2$, respectively) are used to obtain the implied marginal variances (on the diagonal of the $V_i$ matrix) and covariances (off the diagonal) for the six observations on an animal $i$. Note that this marginal $V_i$ matrix implied by Model 5.1 has a compound symmetry covariance structure (see Subsection 2.2.2.2), where the marginal covariances are restricted to be positive due to the constraints on the $D$ matrix in the LMM ($\sigma^2_{int} > 0$). We could fit a marginal model without random animal effects and with a compound symmetry variance-covariance structure for the marginal residuals to allow the possibility of negative marginal covariances.
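To make this calculation concrete, the following minimal R sketch constructs $V_i$ directly from the two covariance parameters; the numerical values assigned to sigma2.int and sigma2 below are arbitrary placeholders for illustration, not estimates from the Rat Brain analysis:

sigma2.int <- 1.10                     # placeholder value for Var(random intercept)
sigma2     <- 0.50                     # placeholder value for the residual variance

Z.i <- matrix(1, nrow = 6, ncol = 1)   # design matrix for the random intercept (six 1s)
D   <- matrix(sigma2.int, 1, 1)        # 1 x 1 covariance matrix of the random effects
R.i <- diag(sigma2, 6)                 # 6 x 6 residual covariance matrix

V.i <- Z.i %*% D %*% t(Z.i) + R.i      # implied marginal variance-covariance matrix
V.i                                    # compound symmetry: sigma2.int + sigma2 on the
                                       # diagonal, sigma2.int off the diagonal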
C Acronyms/Abbreviations

Definitions for selected acronyms and abbreviations used in the book:

AIC = Akaike Information Criterion
ANOVA = Analysis of Variance
AR(1) = First-order Autoregressive (covariance structure)
BIC = Bayes Information Criterion
CS = Compound Symmetry (covariance structure)
DIAG = Diagonal (covariance structure)
det = Determinant
df = Degrees of freedom
(E)BLUE = (Empirical) Best Linear Unbiased Estimator
(E)BLUP = (Empirical) Best Linear Unbiased Predictor (for random effects)
EM = Expectation-Maximization (algorithm)
EMMEANS = Estimated Marginal MEANS (from SPSS)
GLS = Generalized Least Squares
HET = Heterogeneous Variance Structure
HLM = Hierarchical Linear Model
ICC = Intraclass Correlation Coefficient
LL = Log-likelihood
LMM = Linear Mixed Model
LRT = Likelihood Ratio Test
LSMEANS = Least Squares MEANS (from SAS)
MAR = Missing at Random
ML = Maximum Likelihood
MLM = Multilevel Model
N–R = Newton–Raphson (algorithm)
ODS = Output Delivery System (in SAS)
OLS = Ordinary Least Squares
REML = Restricted Maximum Likelihood
UN = Unstructured (covariance structure)
VC = Variance Components (covariance structure)

Bibliography

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In E. Petrov, & F. Csaki (eds.), Second International Symposium on Information Theory and Control, pp. 267–281. Akademiai Kiado.

Allison, P. (2001). Missing Data: Quantitative Applications in the Social Sciences. Newbury Park, CA: Sage Publications.

Anderson, D., Oti, R., Lord, C., & Welch, K. (2009). Patterns of growth in adaptive social abilities among children with autism spectrum disorders. Journal of Abnormal Child Psychology, 37(7), 1019–1034.

Asparouhov, T. (2006). General multi-level modeling with sampling weights. Communications in Statistics, Theory and Methods, 35(3), 439–460.

Asparouhov, T. (2008). Scaling of sampling weights for two-level models in Mplus 4.2. Available at http://www.statmodel.com/download/Scaling3.pdf.

Bottai, M., & Orsini, N. (2004). A new Stata command for estimating confidence intervals for the variance components of random-effects linear models. Presented at the United Kingdom Stata Users' Group Meetings, London, United Kingdom, June 28–29.

Brown, H., & Prescott, R. (2006). Applied Mixed Models in Medicine, Second Edition. New York, NY: John Wiley and Sons.

Carle, A. (2009). Fitting multilevel models in complex survey data with design weights: Recommendations. BMC Medical Research Methodology, 9(49), 1–13.

Carlin, B. P., & Louis, T. A. (2009). Bayesian Methods for Data Analysis, Third Edition. London: Chapman & Hall/CRC Press.

Casella, G., & Berger, R. (2002). Statistical Inference. North Scituate, MA: Duxbury Press.

Claeskens, G. (2013). Lack of fit, graphics, and multilevel model diagnostics. In M. Scott, J. Simonoff, & B. Marx (eds.), The Sage Handbook of Multilevel Modeling, pp. 425–443. London: Sage Publications.

Cooper, D., & Thompson, R. (1977). A note on the estimation of the parameters of the autoregressive-moving average process. Biometrika, 64(3), 625–628.

Crainiceanu, C., & Ruppert, D. (2004). Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society, Series B, 66, 165–185.

Davidian, M., & Giltinan, D. (1995). Nonlinear Models for Repeated Measurement Data. London: Chapman & Hall.

Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38. With discussion.
Diggle, P. J., Heagerty, P. J., Liang, K.-Y., & Zeger, S. L. (2002). Analysis of Longitudinal Data, Second Edition, vol. 25 of Oxford Statistical Science Series. Oxford: Oxford University Press.

Douglas, C., Demarco, G., Baghdoyan, H., & Lydic, R. (2004). Pontine and basal forebrain cholinergic interaction: Implications for sleep and breathing. Respiratory Physiology and Neurobiology, 143(2–3), 251–262.

Enders, C. (2013). Centering predictors and contextual effects. In M. Scott, J. Simonoff, & B. Marx (eds.), The Sage Handbook of Multilevel Modeling, pp. 89–107. London: Sage Publications.

Faraway, J. (2005). Linear Models with R. London: Chapman & Hall/CRC Press.

Fellner, W. (1987). Sparse matrices, and the estimation of variance components by likelihood equations. Communications in Statistics-Simulation, 16(2), 439–463.

Galecki, A. (1994). General class of covariance structures for two or more repeated factors in longitudinal data analysis. Communications in Statistics-Theory and Methods, 23(11), 3105–3119.

Galecki, A., & Burzykowski, T. (2013). Linear Mixed-Effects Models Using R: A Step-by-Step Approach. New York, NY: Springer.

Geisser, S., & Greenhouse, S. (1958). An extension of Box's results on the use of the F distribution in multivariate analysis. The Annals of Mathematical Statistics, 29(3), 885–891.

Gelman, A. (2007). Struggles with survey weighting and regression modeling. Statistical Science, 22(2), 153–164.

Gelman, A., Carlin, J., Stern, H., & Rubin, D. (2004). Bayesian Data Analysis. London: Chapman and Hall/CRC Press.

Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. New York, NY: Cambridge University Press.

Gregoire, T., Brillinger, D., Diggle, P., Russek-Cohen, E., Warren, W., & Wolfinger, R. (1997). Modeling Longitudinal and Spatially Correlated Data: Methods, Applications and Future Directions. New York, NY: Springer-Verlag.

Gurka, M. (2006). Selecting the best linear mixed model under REML. The American Statistician, 60(1), 19–26.

Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association, 72(358), 320–340. With a comment by J. N. K. Rao and a reply by the author.

Heeringa, S., West, B., & Berglund, P. (2010). Applied Survey Data Analysis. New York, NY: Chapman & Hall/CRC Press.

Helms, R. (1992). Intentionally incomplete longitudinal designs: 1. Methodology and comparison of some full span designs. Statistics in Medicine, 11(14–15), 1889–1913.

Huynh, H., & Feldt, L. (1976). Estimation of the Box correction for degrees of freedom from sample data in the randomized block and split plot designs. Journal of Educational Statistics, 1(1), 69–82.

Jackman, S. (2009). Bayesian Analysis for the Social Sciences. New York, NY: Wiley.

Jennrich, R. I., & Schluchter, M. D. (1986). Unbalanced repeated-measures models with structured covariance matrices. Biometrics, 42(4), 805–820.

Kenward, M., & Roger, J. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53(3), 983–997.

Korn, E., & Graubard, B. (1999). Analysis of Health Surveys. New York, NY: Wiley.

Laird, N., Lange, N., & Stram, D. (1987). Maximum likelihood computations with repeated measures: Application of the EM algorithm. Journal of the American Statistical Association, 82(397), 97–105.
Laird, N., & Ware, J. (1982). Random-effects models for longitudinal data. Biometrics, 38(4), 963–974.

Lindstrom, M. J., & Bates, D. M. (1988). Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data. Journal of the American Statistical Association, 83(404), 1014–1022.

Little, R. J. A., & Rubin, D. B. (2002). Statistical Analysis with Missing Data, Second Edition. Wiley Series in Probability and Statistics. Hoboken, NJ: Wiley-Interscience [John Wiley & Sons].

Liu, C., & Rubin, D. (1994). The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika, 81(4), 633–648.

Loy, A., & Hofmann, H. (2014). HLMdiag: A suite of diagnostics for hierarchical linear models in R. Journal of Statistical Software, 56(5), 1–28.

McCulloch, C. E., Searle, S. R., & Neuhaus, J. M. (2008). Generalized, Linear, and Mixed Models, Second Edition. Wiley.

Molenberghs, G., & Verbeke, G. (2005). Models for Discrete Longitudinal Data. Berlin: Springer-Verlag.

Morrell, C. (1998). Likelihood ratio testing of variance components in the linear mixed-effects model using restricted maximum likelihood. Biometrics, 54(4), 1560–1568.

Morrell, C., Pearson, J., & Brant, L. (1997). Linear transformations of linear mixed-effects models. The American Statistician, 51(4), 338–343.

Nelder, J. (1977). A reformulation of linear models. Journal of the Royal Statistical Society, Series A, 140(1), 48–77.

Ocampo, J. (2005). Effect of porcelain laminate contour on gingival inflammation. Master's thesis, University of Michigan School of Dentistry.

Patterson, H. D., & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58, 545–554.

Pfeffermann, D., Skinner, C., Holmes, D., Goldstein, H., & Rasbash, J. (1998). Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society, Series B, 60(1), 23–40.

Pinheiro, J., & Bates, D. (1996). Unconstrained parameterizations for variance-covariance matrices. Statistics and Computing, 6, 289–296.

Pinheiro, J., & Bates, D. (2000). Mixed-Effects Models in S and S-PLUS. Berlin: Springer-Verlag.

Rabe-Hesketh, S., & Skrondal, A. (2006). Multilevel modeling of complex survey data. Journal of the Royal Statistical Society, Series A, 169, 805–827.

Rao, C. (1972). Estimation of variance and covariance components in linear models. Journal of the American Statistical Association, 67(337), 112–115.

Raudenbush, S., & Bryk, A. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods. Newbury Park, CA: Sage Publications.

Raudenbush, S., Bryk, A., & Congdon, R. (2005). HLM 6: Hierarchical Linear and Nonlinear Modeling. Lincolnwood, IL: Scientific Software International.

Robinson, G. (1991). That BLUP is a good thing: The estimation of random effects. Statistical Science, 6(1), 15–32. Discussion: pp. 32–51.

Schabenberger, O. (2004). Mixed model influence diagnostics. Presented at the Twenty-Ninth Annual SAS Users Group International Conference, Montreal, Canada; May 9–12. Paper 189-29.

Searle, S., Casella, G., & McCulloch, C. (1992). Variance Components. New York, NY: John Wiley.

Self, S. G., & Liang, K.-Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association, 82(398), 605–610.

Snijders, T., & Bosker, R. (1999). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. Newbury Park, CA: Sage Publications.
Spybrook, J., Bloom, H., Congdon, R., Hill, C., Martinez, A., & Raudenbush, S. (2011). Optimal design plus empirical evidence: Documentation for the "Optimal Design" software version 3.0. Available from www.wtgrantfoundation.org.

Steele, R. (2013). Model selection for multilevel models. In M. Scott, J. Simonoff, & B. Marx (eds.), The Sage Handbook of Multilevel Modeling, pp. 109–125. London: Sage Publications.

Stram, D., & Lee, J. (1994). Variance components testing in the longitudinal mixed effects model. Biometrics, 50(4), 1171–1177.

Valliant, R., Dever, J., & Kreuter, F. (2013). Practical Tools for Designing and Weighting Survey Samples. New York, NY: Springer.

van Breukelen, G., & Moerbeek, M. (2013). Design considerations in multilevel studies. In M. Scott, J. Simonoff, & B. Marx (eds.), The Sage Handbook of Multilevel Modeling, pp. 183–199. London: Sage Publications.

Veiga, A., Smith, P., & Brown, J. (2014). The use of sample weights in multivariate multilevel models with an application to income data collected using a rotating panel survey. Journal of the Royal Statistical Society, Series C (Applied Statistics), 63(1), 65–84.

Verbeke, G., & Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. New York, NY: Springer-Verlag.

Verbyla, A. (1990). A conditional derivation of residual maximum likelihood. The Australian Journal of Statistics, 32(2), 227–230.

West, B., & Elliott, M. (Forthcoming in 2014). Frequentist and Bayesian approaches for comparing interviewer variance components in two groups of survey interviewers. Survey Methodology.

Winer, B., Brown, D., & Michels, K. (1991). Statistical Principles in Experimental Design. New York, NY: McGraw-Hill.