User Guide
E. E. Holmes, E. J. Ward, and M. D. Scheuerell
Analysis of multivariate time series using the MARSS package
Version 3.10.1, May 30, 2014
Northwest Fisheries Science Center, NOAA, Seattle, WA, USA

Citation: Holmes, E. E., E. J. Ward and M. D. Scheuerell. Analysis of multivariate time-series using the MARSS package. NOAA Fisheries, Northwest Fisheries Science Center, 2725 Montlake Blvd E., Seattle, WA 98112.

Contacts: eli.holmes@noaa.gov, eric.ward@noaa.gov, and mark.scheuerell@noaa.gov

Disclaimer: E. E. Holmes, E. J. Ward, and M. D. Scheuerell are NOAA scientists employed by the U.S. National Marine Fisheries Service. The views and opinions presented here are solely those of the authors and do not necessarily represent those of our employer.

Preface

The initial motivation for our work with MARSS models was a collaboration with Rich Hinrichsen. Rich developed a framework for analysis of multi-site population count data using MARSS models and bootstrap AICb (Hinrichsen and Holmes, 2009). Our work (EEH and EJW) extended Rich's framework, made it more general, and led to the development of a parametric bootstrap AICb for MARSS models, which allows one to do model selection using datasets with missing values (Ward et al., 2010; Holmes and Ward, 2010). Later, we developed additional algorithms for simulation and confidence intervals. Discussions with Mark Scheuerell led to an extensive revision of the EM algorithm and to the development of a general EM algorithm for constrained MARSS models (Holmes, 2012). Discussions with Mark also led to a complete rewrite of the model specification so that the package could be used for MARSS models in general, rather than only the form of MARSS model used in our applications.

Many collaborators have helped test the package; we thank especially Yasmin Lucero, Kevin See, and Brice Semmens. Development of the code into an R package would not have been possible without Kellie Wills, who wrote much of the original package code outside of the algorithm functions. Finally, we thank the participants of our MARSS workshops and courses and the MARSS users who have contacted us regarding issues that were unclear in the manual, errors, or suggestions for new applications. Discussions with these users have helped us improve the manual and go in new directions.

The application chapters were developed originally as part of workshops on analysis of multivariate time-series data given at the Ecological Society of America meetings since 2005 and taught by us along with Yasmin Lucero, Stephanie Hampton, and Brice Semmens. The chapter on extinction estimation and trend estimation was initially developed by Brice Semmens and later extended by us for this user guide. The algorithm behind the TMU figure in Chapter 6 was developed during a collaboration with Steve Ellner (Ellner and Holmes, 2008). Later we further developed the chapters as part of a course we taught on analysis of fisheries and environmental time-series data at the University of Washington, where we are affiliate faculty. You can find online versions of the workshops and the time-series analysis course on EEH's website http://faculty.washington.edu/eeholmes.

The authors are research scientists at the Northwest Fisheries Science Center (NWFSC). This work was conducted as part of our jobs at the NWFSC, a research center for NOAA Fisheries, which is a United States federal government agency.
A CAMEO grant from the National Science Foundation and NOAA Fisheries provided the initial impetus for the development of the package as part of a research project with Stephanie Hampton, Lindsay Scheef, and Steven Katz on analysis of marine plankton time series. During the initial stages of this work, EJW was supported on a post-doctoral fellowship from the National Research Council and MDS was partially supported by a PECASE award from the White House Office of Science and Technology Policy.

You are welcome to use the code and adapt it with full attribution. You should cite Holmes et al. (2012) for the MARSS package. The code may not be used in any commercial applications nor may it be copyrighted. Use of the EM algorithm should cite Holmes (2012). Links to more code and publications on MARSS applications can be found at our academic websites:
http://faculty.washington.edu/eeholmes
http://faculty.washington.edu/scheuerl
https://sites.google.com/site/ericward2

Contents

Part I: The MARSS package
1 Overview
  1.1 What does the MARSS package do?
  1.2 What does MARSS output and how do I get the output?
  1.3 How to get started (quickly)
  1.4 Important notes about the algorithms
  1.5 Troubleshooting
  1.6 Other related packages
2 The main package functions
  2.1 The MARSS() function: inputs
  2.2 The MARSS() function: outputs
  2.3 Core functions for fitting a MARSS model
  2.4 Functions for a fitted marssMLE object
  2.5 Functions for marssMODEL objects
3 Algorithms used in the MARSS package
  3.1 The full time-varying model used in the MARSS EM algorithm
  3.2 Maximum-likelihood parameter estimation
  3.3 Kalman filter and smoother
  3.4 The exact likelihood
  3.5 Parametric and innovations bootstrapping
  3.6 Simulation and forecasting
  3.7 Model selection
Part II: Fitting models with MARSS
4 Fitting models: the MARSS() function
  4.1 u, a and π model structures
  4.2 Q, R, Λ model structures
  4.3 B model structures
  4.4 Z model
  4.5 Default model structures
5 Examples
  5.1 Fixed and estimated elements in parameter matrices
  5.2 Different numbers of state processes
  5.3 Time-varying parameters
  5.4 Including inputs (or covariates)
  5.5 Printing and summarizing models and model fits
  5.6 Confidence intervals on a fitted model
  5.7 Vectors of just the estimated parameters
  5.8 Kalman filter and smoother output
  5.9 Degenerate variance estimates
  5.10 Bootstrap parameter estimates
  5.11 Random initial conditions
  5.12 Data simulation
  5.13 Bootstrap AIC
  5.14 Convergence
Part III: Applications
6 Count-based population viability analysis (PVA) using corrupted data
  6.1 Background
  6.2 Simulated data with process and observation error
  6.3 Maximum-likelihood parameter estimation
  6.4 Probability of hitting a threshold Π(x_d, t_e)
  6.5 Certain and uncertain regions
  6.6 More risk metrics and some real data
  6.7 Confidence intervals
  6.8 Comments
7 Combining multi-site data to estimate regional population trends
  7.1 Harbor seals in the Puget Sound, WA
  7.2 A single well-mixed population with i.i.d. errors
  7.3 Single population model with independent and non-identical errors
  7.4 Two subpopulations, north and south
  7.5 Other population structures
  7.6 Discussion
8 Identifying spatial population structure and covariance
  8.1 Harbor seals on the U.S. west coast
  8.2 Question 1: How many distinct subpopulations?
  8.3 Fit the different models
  8.4 Summarize the data support
  8.5 Question 2: Are the subpopulations independent?
  8.6 Question 3: Is the Hood Canal independent?
  8.7 Discussion
9 Dynamic factor analysis (DFA)
  9.1 Overview
  9.2 The data
  9.3 Setting up the model in MARSS
  9.4 Using model selection to determine the number of trends
  9.5 Using varimax rotation to determine the loadings and trends
  9.6 Examining model fits
  9.7 Adding covariates
  9.8 Questions and further analyses
10 Analyzing noisy animal tracking data
  10.1 A simple random walk model of animal movement
  10.2 Loggerhead sea turtle tracking data
  10.3 Estimate locations from bad tag data
  10.4 Estimate speeds for each turtle
  10.5 Using specialized packages to analyze tag data
11 Detection of outliers and structural breaks
  11.1 River flow in the Nile River
  11.2 Different models for the Nile flow levels
  11.3 Observation and state residuals
12 Incorporating covariates into MARSS models
  12.1 Covariates as inputs
  12.2 Examples using plankton data
  12.3 Observation-error only model
  12.4 Process-error only model
  12.5 Both process- & observation-error model
  12.6 Including seasonal effects in MARSS models
  12.7 Model diagnostics
  12.8 Covariates with missing values or observation error
13 Estimation of species interaction strengths with and without covariates
  13.1 Background
  13.2 Two-species example using wolves and moose
  13.3 Analysis of a four-species plankton community
  13.4 Stability metrics from estimated interaction matrices
  13.5 Further information
14 Combining data from multiple time series
  14.1 Overview
  14.2 Salmon spawner surveys
  14.3 American kestrel abundance indices
15 Univariate dynamic linear models (DLMs)
  15.1 Overview of dynamic linear models
  15.2 Example of a univariate DLM
  15.3 Forecasting with a univariate DLM
16 Multivariate linear regression
  16.1 Univariate linear regression
  16.2 Multivariate response example using longitudinal data
  16.3 Summary
17 Lag-p models with MARSS
  17.1 Background
  17.2 MAR(2) models
  17.3 MAR(p) models
  17.4 MARSS(p): models with observation error
  17.5 Discussion
A Textbooks and articles that use MARSS modeling for population modeling
B Package MARSS: Warnings and errors
References
Index

Part I: The MARSS package

1 Overview

MARSS stands for Multivariate Auto-Regressive(1) State-Space. The MARSS package is an R package for estimating the parameters of linear MARSS models with Gaussian errors. This class of model is extremely important in the study of linear stochastic dynamical systems, and these models are important in many different fields, including economics, engineering, genetics, physics and ecology (Appendix A). The model class has different names in different fields; for example, in some fields these models are termed dynamic linear models (DLMs) or vector autoregressive (VAR) state-space models. The MARSS package allows you to easily fit time-varying constrained and unconstrained MARSS models, with or without covariates, to multivariate time-series data via maximum likelihood, primarily using an EM algorithm. (Fitting via the BFGS algorithm is also provided using R's optim function, but this is not the focus of the package.)

A MARSS model, with Gaussian errors, takes the form:

$$x_t = B_t x_{t-1} + u_t + C_t c_t + w_t, \quad \text{where } w_t \sim \text{MVN}(0, Q_t) \tag{1.1a}$$
$$y_t = Z_t x_t + a_t + D_t d_t + v_t, \quad \text{where } v_t \sim \text{MVN}(0, R_t) \tag{1.1b}$$
$$x_1 \sim \text{MVN}(\pi, \Lambda) \quad \text{or} \quad x_0 \sim \text{MVN}(\pi, \Lambda) \tag{1.1c}$$

The x equation is termed the state process and the y equation is termed the observation process. Data enter the model as the y; that is, the y is treated as the data, although there may be missing data.
The c_t and d_t are inputs (aka exogenous variables, covariates or indicator variables). The bolded terms are matrices with the following definitions:

x is an m × T matrix of states. Each x_t is a realization of the random variable X_t at time t.
w is an m × T matrix of the process errors. The process errors at time t are multivariate normal with mean 0 and covariance matrix Q_t.
y is an n × T matrix of the observations. Some observations may be missing.
v is an n × T matrix of the non-process (observation) errors. The observation errors at time t are multivariate normal with mean 0 and covariance matrix R_t.
B_t and Z_t are parameters and are m × m and n × m matrices.
u_t and a_t are parameters and are m × 1 and n × 1 column vectors.
Q_t and R_t are parameters and are m × m and n × n variance-covariance matrices.
π is either a parameter or a fixed prior. It is an m × 1 matrix.
Λ is either a parameter or a fixed prior. It is an m × m variance-covariance matrix.
C_t and D_t are parameters and are m × p and n × q matrices.
c and d are inputs (no missing values) and are p × T and q × T matrices.

In some fields, the u and a terms are routinely set to 0 or the model is written in such a way that they are incorporated into B or Z. However, in other fields, the u and a terms are the main objects of interest, and the model is written to explicitly show them. We include them throughout our discussion, but they can be set to zero if desired.

AR(p) models can be written in the above form by properly defining the x vector and setting some of the R variances to zero; see Chapter 17. Although the model appears to include only i.i.d. errors (v_t and w_t), in practice AR(p) errors can be included by moving the errors into the state model. Similarly, the model appears to have independent process (w_t) and observation (v_t) errors; however, in practice these can be modeled as identical or correlated by using one of the state processes to model the errors, with the B matrix set appropriately for AR or white noise, although one may have to fix many of the parameters associated with the errors to have an identifiable model. Study the case studies in this user guide and textbooks on MARSS models for examples of how a wide variety of autoregressive models can be written in MARSS form.

1.1 What does the MARSS package do?

Written in an unconstrained form (meaning all the elements in a parameter matrix are allowed to be different), a MARSS model can be written out as follows. Two state processes (x) and three observation processes (y) are used here for example's sake.

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_{t-1} + \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}_t, \quad \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}_t \sim \text{MVN}\left( \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \begin{bmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{bmatrix} \right)$$

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}_t = \begin{bmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \\ z_{31} & z_{32} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}_t, \quad \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}_t \sim \text{MVN}\left( \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}, \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} \right)$$

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_0 \sim \text{MVN}\left( \begin{bmatrix} \pi_1 \\ \pi_2 \end{bmatrix}, \begin{bmatrix} \nu_{11} & \nu_{12} \\ \nu_{21} & \nu_{22} \end{bmatrix} \right) \quad \text{or} \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_1 \sim \text{MVN}\left( \begin{bmatrix} \pi_1 \\ \pi_2 \end{bmatrix}, \begin{bmatrix} \nu_{11} & \nu_{12} \\ \nu_{21} & \nu_{22} \end{bmatrix} \right)$$

However, not all parameter elements can be estimated simultaneously. Constraints are required in order to specify a model with a unique solution. The MARSS package allows you to specify constraints by fixing elements in a parameter matrix or by specifying that some elements are estimated and have a linear relationship to other elements.
Here is an example of a MARSS model with fixed and estimated parameter elements:

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_{t-1} + \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}_t, \quad \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}_t \sim \text{MVN}\left( \begin{bmatrix} 0.1 \\ u \end{bmatrix}, \begin{bmatrix} q_{11} & q_{12} \\ q_{12} & q_{22} \end{bmatrix} \right)$$

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}_t = \begin{bmatrix} d & d \\ c & c \\ 1+2d+3c & 2+3d \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}_t, \quad \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}_t \sim \text{MVN}\left( \begin{bmatrix} a_1 \\ a_2 \\ 0 \end{bmatrix}, \begin{bmatrix} r & 0 & 0 \\ 0 & r & 0 \\ 0 & 0 & r \end{bmatrix} \right)$$

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_0 \sim \text{MVN}\left( \begin{bmatrix} \pi \\ \pi \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right)$$

Notice that some elements are fixed (in this case to 0, but they could be any fixed number), some elements are shared (have the same value), and some elements are linear combinations of estimated values (c, 1+2d+3c and 2+3d are linear combinations of c and d).

The MARSS package fits models via maximum likelihood. It is unusual among packages for fitting MARSS models in that fitting is performed via a constrained EM algorithm (Holmes, 2012) based on a vectorized form of Equation 1.1 (see Chapter 3 for the vectorized form used in the algorithm). Although fitting via the BFGS algorithm is also provided, using method="BFGS" and R's optim function, the examples in this guide use the EM algorithm, primarily because it gives robust estimation for datasets replete with missing values and for high-dimensional models with various constraints. However, there are many models/datasets for which BFGS is faster, and we typically try both for problems. The EM algorithm is also often used to provide initial conditions for the BFGS algorithm (or an MCMC routine) in order to improve the performance of those algorithms. In addition to the main model-fitting function, the MARSS package supplies functions for bootstrap and approximate confidence intervals, parametric and non-parametric bootstrapping, model selection (AIC and bootstrap AIC), simulation, and bootstrap bias correction.

1.2 What does MARSS output and how do I get the output?

MARSS models are used in many different ways, and different users will want different types of output. Some users will want the parameter estimates, while others want the smoothed states, and others want to use MARSS to interpolate missing values and want the expected values of missing data. The best way to find out how to get output is to type ?print.MARSS at the command line after installing MARSS. This help page discusses how to get parameter estimates in different forms, the smoothed and filtered states, all the Kalman filter and smoother output, all the expectations of y (missing data), confidence intervals and bias estimates for the parameters, and standard errors of the states. If you are looking only for Kalman filter and smoother output, see the relevant section in Chapter 3 and ?MARSSkf at the R command line.

1.3 How to get started (quickly)

If you already work with models in the form of Equation 1.1, you can immediately fit your model with the MARSS package. Install the MARSS package and then type library(MARSS) at the command line to load the package. Look at the Quick Start Guide and then skim through the examples chapter. Your data need to be a matrix (not a dataframe) with time going across the columns and any non-data columns (like year) removed. The MARSS functions assume discrete time steps and you will need a column for each time step. Replace any missing time steps with a missing-value holder (e.g., NA). Write your model down on paper and identify which parameters correspond to which parameter matrices in Equation 1.1. Call the MARSS() function (Chapter 4) using your data and use the model argument to specify the structure of each parameter.
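A minimal quick-start sketch follows. The dataset harborSealWA ships with the MARSS package (it is used in Chapter 7); the choice of two series here is only for illustration:

library(MARSS)
# MARSS wants a matrix with time across the columns, so transpose:
# rows = time series, columns = years; NAs mark missing observations
dat <- t(harborSealWA[, c("SJF", "SJI")])
# default model: independent random walks with drift, observed with error
fit <- MARSS(dat)
coef(fit, type = "matrix")  # estimated parameter matrices
fit$states                  # smoothed state estimates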
1.4 Important notes about the algorithms

Specification of a properly constrained model with a unique solution is the responsibility of the user, because MARSS has no way to tell whether you have specified an insufficiently constrained model, i.e., one with an infinite number of solutions. Specifying a properly constrained model with a unique solution is imperative.

How do you know if the model is properly constrained? If you are using a MARSS model form that is widely used, then you can probably assume that it is properly constrained. If you go to papers where someone developed the model or method, the issue of constraints necessary to ensure "identifiability" will likely be addressed if it is an issue. Are you fitting novel MARSS models? Then you will need to do some study on identifiability in this class of models using textbooks (see the textbook list at the end of this user guide). Often textbooks do not address identifiability explicitly; rather, it is addressed implicitly by only showing models constructed in such a way that they are identifiable. In our work, if we suspect identification problems, we will often first do a Bayesian analysis with flat priors and look for oddities in the posteriors, such as ridges, plateaus or bimodality.

All the EM code in the MARSS package is in native R. Thus the model fitting is (relatively) slow. The classic Kalman filter/smoother algorithm, as shown in Shumway and Stoffer (2006, p. 331-335), is based on the original smoother presented in Rauch (1963). This Kalman filter is provided in the function MARSSkfss, but the default Kalman filter and smoother used in the MARSS package is based on the algorithm in Kohn and Ansley (1989) and papers by Koopman et al. This Kalman filter and smoother is provided in the KFAS package (Helske 2012). Table 2 in Koopman (1993) indicates that the classic algorithm is 40-100 times slower than the algorithm given in Kohn and Ansley (1989), Koopman (1993), and Koopman et al. (1999). The MARSS package function MARSSkfas provides a translator between the model objects in MARSS and those in KFAS so that the KFAS functions can be used. MARSSkfas also includes a lag-one covariance smoother algorithm, as this is not output by the KFAS functions, and it provides proper formulation of the priors so that one can use the KFAS functions when the prior on the states is set at t = 0 instead of t = 1 (and no, simply offsetting your data to start at t = 2 and sending that value to tinit = 1 in the KFAS Kalman filter would not be mathematically correct).

EM algorithms will quickly get in the vicinity of the maximum likelihood, but the final approach to the maximum is generally slow relative to quasi-Newton methods. On the flip side, EM algorithms are quite robust to initial-conditions choices and can be extremely fast at getting close to the MLE values for high-dimensional models. The MARSS package also allows one to use the BFGS method to fit MARSS models, so one can use an EM algorithm to "get close" and then the BFGS algorithm to polish off the estimate; a sketch of this pattern is shown below.

Restricted maximum-likelihood (REML) algorithms are also available for AR(1) state-space models, both univariate (Staples et al., 2004) and multivariate (Hinrichsen, 2009). REML can give parameter estimates with lower variance than plain maximum-likelihood algorithms. However, algorithms for REML when there are missing values are not currently available (although that will probably change in the near future).
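Here is a hedged sketch of the EM-then-BFGS pattern mentioned above. The inits usage follows ?MARSS; dat is your data matrix, and the exact accepted formats for inits vary by package version, so check the help page:

fit.em <- MARSS(dat)  # EM: robust to starting values, gets close quickly
fit.bfgs <- MARSS(dat, method = "BFGS", inits = fit.em$par)  # polish the estimate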
Another maximum-likelihood method is data cloning, which adapts MCMC algorithms used in Bayesian analysis for maximum-likelihood estimation (Lele et al., 2007).

Missing values are seamlessly accommodated with the MARSS package. Simply specify missing data with NAs. The likelihood computations are exact and will deal appropriately with missing values. However, no innovations bootstrapping (referring to the non-parametric bootstrap developed by Stoffer and Wall, 1991) can be done if there are missing values. Instead, parametric bootstrapping must be used.

You should be aware that maximum-likelihood estimates of variance in MARSS models are fundamentally biased, regardless of the algorithm used. This bias is more severe when one or the other of R or Q is very small, and the bias does not go to zero as sample size goes to infinity. The bias arises because variance is constrained to be positive. Thus if R or Q is essentially zero, the mean estimate will not be zero and thus the estimate will be biased high, while the corresponding estimate of the other variance will be biased low. You can generate unbiased variance estimates using a bootstrap estimate of the bias; the function MARSSparamCIs() will do this. However, be aware that adding an estimated bias to a parameter estimate will lead to an increase in the variance of your parameter estimate. The amount of variance added will depend on sample size.

You should also be aware that mis-specification of the prior on the initial states (π and Λ) can have catastrophic effects on your parameter estimates if your prior conflicts with the distribution of the initial states implied by the MARSS model. These effects can be very difficult to detect because the model will appear to be well-fitted. Unless you have a good idea of what the parameters should be, you might not realize that your prior conflicts.

The most common problems we have found with priors on x0 are the following. Problem 1) The correlation structure in Λ (whether the prior is diffuse or not) does not match the correlation structure in x0 implied by your model. For example, you specify a diagonal Λ (independent states), but the implied distribution has correlations. Problem 2) The correlation structure in Λ does not match the structure in x0 implied by constraints you placed on π. For example, you specify that all values in π are shared, yet you specify that Λ is diagonal (independent).

Unfortunately, using a diffuse prior does not help with these two problems, because the diffuse prior still has a correlation structure and can still conflict with the implied correlation in x0. One way to get around these problems is to set Λ = 0 (an m × m matrix of zeros) and estimate π ≡ x0 only. Now π is a fixed but unknown (estimated) parameter, not the mean of a distribution. In this case, Λ does not exist in your model and there is no conflict with the model. Unfortunately, estimating π as a parameter is not always robust. If you specify that Λ = 0 and specify that π corresponds to x0, but your model "explodes" when run backwards in time, you cannot estimate π because you cannot get a good estimate of x0. Sometimes this can be avoided by specifying that π corresponds to x1, so that it can be constrained by the data y1.

In summary, if the implied correlation structure of your initial states is independent (diagonal variance-covariance matrix), you should generally be ok with a diagonal and high-variance prior or with treating the initial states as parameters (with Λ = 0); a sketch of this option follows.
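A hedged sketch of the Λ = 0 option, using the model list elements V0 and tinitx discussed in this guide (a 2-state model is assumed, as in the earlier quick-start sketch; check ?MARSS for your version's defaults):

# Λ = 0: V0 is a zero matrix, so π (= x0) is an estimated parameter, prior at t = 0
fit <- MARSS(dat, model = list(V0 = matrix(0, 2, 2), tinitx = 0))
# or place the initial state at t = 1 so π is constrained by the first observation
fit1 <- MARSS(dat, model = list(V0 = matrix(0, 2, 2), tinitx = 1))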
But if your initial states have an implied correlation structure that is not independent, then proceed with caution ("caution" here means that you should assume you have problems and test the fitting with simulated data).

There is a large class of models in the statistical finance literature that have the form

$$x_{t+1} = B x_t + \Gamma \eta_t$$
$$y_t = Z x_t + \eta_t$$

For example, ARMA(p,q) models can be written in this form. The MARSS model framework in this package will not allow you to write models in that form. You can put the η_t into the x_t vector and set R = 0 to make models of this form using the MARSS form, but the EM algorithm in the MARSS package won't let you estimate parameters, because the parameters will drop out of the full likelihood being maximized in the algorithm. You can try using BFGS by passing in the method argument to the MARSS() call.

1.5 Troubleshooting

Numerical errors due to ill-conditioned matrices are not uncommon when fitting MARSS models. The Kalman and EM algorithms need inverses of matrices. If those matrices become ill-conditioned, for example if all elements are close to the same value, then the algorithm becomes unstable. Warning messages will be printed if the algorithms are becoming unstable, and you can set control$trace=1 to see details of where the algorithm is becoming unstable. Whenever possible, you should avoid using shared π values in your model (an example of a π with shared values is π = (a, a, a)ᵀ). The way our algorithm deals with Λ tends to make this case unstable, especially if R is not diagonal. In general, estimation of a non-diagonal R is more difficult, more prone to ill-conditioning, and more data-hungry.

You may also see non-convergence warnings, especially if your MLE model turns out to be degenerate. This means that one of the elements on the diagonal of your Q or R matrix is going to zero (is degenerate). It will take the EM algorithm forever to get to zero. BFGS will have the same problem, although it will often get a bit closer to the degenerate solution. If you are using method="kem", MARSS will warn you if it looks like the solution is degenerate. If you use control=list(allow.degen=TRUE), the EM algorithm will attempt to set the degenerate variances to zero (instead of trying to get to zero using an infinite number of iterations). However, if one of the variances is going to zero, first think about why this is happening. This is typically caused by one of three problems: 1) you made a mistake in inputting your data, e.g., used -99 as the missing value in your data but did not replace these with NAs before passing to MARSS, 2) your data are not sufficient to estimate multiple variances, or 3) your data are inconsistent with the model you are trying to fit.

The algorithms in the MARSS package are designed for cases where the Q and R diagonals are all non-minuscule. For example, the EM update equation for u will grind to a halt (not update u) if Q is tiny (like 1E-7). Conversely, the BFGS equations are likely to miss the maximum likelihood when R is tiny, because then the likelihood surface becomes hyper-sensitive to π. The solution is to use the degenerate likelihood function for the likelihood calculation and the EM update equations. MARSS will implement this automatically when Q or R diagonal elements are set to zero and will try setting Q and R terms to zero automatically if control$allow.degen=TRUE.

One odd case can occur when R goes to zero (a matrix of zeros), but you are estimating π.
If model$tinitx=1, then π must be y1 as R goes to zero, but as R goes to zero, the log-likelihood will go (correctly) to infinity. But if you set R = 0, the log-likelihood will be finite. The reason is that R ≈ 0 and R = 0 specify different likelihoods. In the first, the determinant of R will appear, and this goes to positive infinity as R goes to zero. In the second case, R does not appear in the likelihood and so the determinant of R does not appear. If some elements of the diagonal of R are going to zero, you should be suspicious of the parameter estimates. Sometimes the structure of your data, e.g., one data value followed by a long string of missing values, is causing an odd spike in the likelihood at R ≈ 0. Try manually setting R equal to zero to get the correct log-likelihood. (The likelihood returned when R ≈ 0 is not incorrect; it is just not the likelihood that you probably want. You want the likelihood where the R term is dropped because it is zero.)

1.6 Other related packages

Packages that will do Kalman filtering and smoothing are many, but packages that estimate the parameters in a MARSS model, especially constrained MARSS models, are much less common. The following are those with which we are familiar; however, there are certainly more packages for estimating MARSS models in engineering and economics with which we are unfamiliar. The MARSS package is unusual in that it uses an EM algorithm for maximizing the likelihood, as opposed to a Newton-esque method (e.g., BFGS). The package is also unusual in that it allows you to specify the initial conditions at t = 0 or t = 1 and allows degenerate models (with some of the diagonal elements of R or Q equal to zero). Lastly, model specification in the MARSS package has a one-to-one relationship between the model list in MARSS and the model as you would write it on paper as a matrix equation. This makes the learning curve a bit less steep. However, the MARSS package has not been optimized for speed and will probably be quite slow if you have time-series data with a lot of time points.

DLM: DLM is an R package for fitting MARSS models. Our impression is that it is mainly Bayesian-focused, but it does allow MLE estimation via the optim() function. It has a book, Dynamic Linear Models with R by Petris et al., which has many examples of how to write MARSS models for different applications.

sspir: sspir is an R package for fitting univariate autoregressive state-space models with Gaussian, Poisson and binomial error distributions.

dse: dse (Dynamic Systems Estimation) is an R package for multivariate Gaussian state-space models with a focus on ARMA models.

SsfPack: SsfPack is a package for Ox/Splus that fits constrained multivariate Gaussian state-space models using mainly (it seems) the BFGS algorithm, though newer versions support other types of maximization. SsfPack is very flexible and written in C to be fast. It has been used extensively on statistical finance problems and is optimized for dealing with large (financial) data sets. It is used and documented in Time Series Analysis by State Space Methods by Durbin and Koopman, An Introduction to State Space Time Series Analysis by Commandeur and Koopman, and Statistical Algorithms for Models in State Space Form: SsfPack 3.0 by Koopman, Shephard, and Doornik.

Brodgar: The Brodgar software was developed by Alain Zuur to do (among many other things) dynamic factor analysis, which involves a special type of MARSS model.
The methods and many example analyses are given in Analyzing Ecological Data by Zuur, Ieno and Smith. This is the one package that we are aware of that also uses an EM algorithm for parameter estimation.

eViews: eViews is a commercial economics software package that will estimate at least some types of MARSS models.

KFAS: The KFAS R package provides a fast Kalman filter and smoother. Examples in the package show how to estimate MARSS models using the KFAS functions and R's optim() function. The MARSS package uses the filter and smoother functions from the KFAS package.

S+FinMetrics: S+FinMetrics is an S-plus module for fitting MAR models, which are called vector autoregressive (VAR) models in the economics and finance literature. It has some support for state-space VAR models, though we haven't used it so are not sure which parameters it allows you to estimate. It was developed by Andrew Bruce, Doug Martin, Jiahui Wang, and Eric Zivot, and it has a book associated with it: Modeling Financial Time Series with S-plus by Eric Zivot and Jiahui Wang.

kftrack: The kftrack R package provides a suite of functions specialized for fitting MARSS models to animal tracking data.

2 The main package functions

The MARSS package is object-based. It has two main types of objects: a model object (class marssMODEL) and a maximum-likelihood fitted model object (class marssMLE). A marssMODEL object specifies the structure of the model to be fitted. It is an R code version of the MARSS equation (Equation 1.1). A marssMLE object specifies both the model and the information necessary for fitting (initial conditions, controls, method). If the model has been fitted, the marssMLE object will also have the parameter estimates and (optionally) confidence intervals and bias.

2.1 The MARSS() function: inputs

The function MARSS() is an interface to the core fitting functions in the MARSS package. It allows a user to fit a MARSS model using a list to describe the model structure. It returns marssMODEL and marssMLE objects which the user can later use in other functions, e.g., simulating or computing bootstrap confidence intervals.

MLEobj = MARSS(data, model=list(), ..., fit=TRUE)

This function will fit a MARSS model to the data using a model list, which is a list describing the structure of the model parameter matrices. In the default model, i.e., if you use MARSS(dat) with no model argument, Z and B are the identity matrix, R is a diagonal matrix with one variance, Q is a diagonal matrix with unique variances, u is unique, a is scaling, and C, c, D, and d are all zero. The output is a marssMLE object where the estimated parameter matrices are in MLEobj$par. If fit=FALSE, it returns a minimal marssMLE object that is ready for passing to a fitting function (below) but with no par element.

2.2 The MARSS() function: outputs

The marssMLE object returned by a MARSS() call includes the estimated parameters, states, and expected values of any missing data. Derived statistics, such as confidence intervals and standard errors, can be computed using the functions described below. The print method for marssMLE objects will print or compute all the frequently needed output using the what= argument in the print call. Type ?print.MARSS at the R command line to see the print help file.

estimated parameters: coef(marssMLE). The coef function can output parameters in a variety of formats. See ?coef.marssMLE.
residuals: residuals(marssMLE). See ?residuals.marssMLE for a discussion of standardized residuals in the context of MARSS models.

Kalman filter and smoother output: The smoothed states are in marssMLE$states, but the full Kalman filter and smoother output is available from MARSSkf(marssMLE). See ?MARSSkf for a discussion of the Kalman filter and smoother outputs. If you just want the estimated states conditioned on all the data, use print(marssMLE, what="xtT"). If you want all the Kalman filter and smoother output, use print(marssMLE, what="kfs").

expected value of missing y: MARSShatyt(marssMLE). See ?MARSShatyt for a discussion of the expectations involving y. If you just want the estimated missing y conditioned on all the data, use print(marssMLE, what="ytT"). If you want all the expectations involving y conditioned on the data, use print(marssMLE, what="Ey").

2.3 Core functions for fitting a MARSS model

The following core functions are designed to work with 'unfitted' marssMLE objects, that is, a marssMLE object without the par element. Users do not normally need to call the MARSSkem or MARSSoptim functions, since MARSS() will call those. Below, MLEobj means the argument is a marssMLE object. Note that these functions can be called with a marssMLE object that has a par element, but they will overwrite that element.

MLEobj = MARSSkem(MLEobj): This will fit a MARSS model via the EM algorithm to the data using a properly specified marssMLE object, which has data, the marssMODEL object, and the necessary initial condition and control elements. See the appendix on the object structures in the MARSS package. MARSSkem does no error-checking; see is.marssMLE(). MARSSkem uses MARSSkf, described below.

MLEobj = MARSSoptim(MLEobj): This will fit a MARSS model via the BFGS algorithm provided in optim(). This requires a properly specified marssMLE object, such as would be passed to MARSSkem.

MLEobj = MARSSmcinit(MLEobj): This will perform a Monte Carlo initial conditions search and update the marssMLE object with the best initial conditions from the search.

is.marssMLE(MLEobj): This will check that a marssMLE object is properly specified and ready for fitting. This should be called before MARSSkem or MARSSoptim is called. This function is not typically needed if using MARSS(), since MARSS() builds the model object for the user and does error-checking on model structure.

2.4 Functions for a fitted marssMLE object

The following functions use a marssMLE object that has a populated par element, i.e., a marssMLE object returned from one of the fitting functions (MARSS, MARSSkem, MARSSoptim). Below, modelObj means the argument is a marssMODEL object and MLEobj means the argument is a marssMLE object. Type ?function.name to see information on function usage and examples.

kf = MARSSkf(MLEobj): This will compute the expected values of the hidden states given data via the Kalman filter (to produce estimates conditioned on 1:t-1) and the Kalman smoother (to produce estimates conditioned on 1:T). The function also returns the exact likelihood of the data conditioned on MLEobj$par. A variety of other Kalman filter/smoother information is also output (kf is a list of output); see ?MARSSkf for details.

MLEobj = MARSSaic(MLEobj): This adds model selection criteria, AIC, AICc, and AICb, to a marssMLE object.

boot = MARSSboot(MLEobj): This returns a list containing bootstrapped parameters and data via parametric or innovations bootstrapping.
MLEobj = MARSShessian(MLEobj): This adds a numerically estimated Hessian matrix to a marssMLE object.

MLEobj = MARSSparamCIs(MLEobj): This adds standard errors, confidence intervals, and bootstrap estimated bias for the maximum-likelihood parameters, using bootstrapping or the Hessian, to the passed-in marssMLE object.

sim.data = MARSSsimulate(MLEobj): This returns simulated data from a MARSS model specified via a list of parameter matrices in MLEobj$parList (this is a list with elements Q, R, U, etc.).

paramVec = MARSSvectorizeparam(MLEobj): This returns the estimated (and only the estimated) parameters as a vector. This is useful for storing the results of simulations and for writing functions that fit MARSS models using R's optim function.

new.MLEobj = MARSSvectorizeparam(MLEobj, paramVec): This will return a marssMLE object in which the estimated parameters (which are in MLEobj$par along with the fixed values) are replaced with the values in paramVec.

2.5 Functions for marssMODEL objects

is.marssMODEL(modelObj): This will check that the free and fixed matrices in a marssMODEL object are properly specified. This function is not typically needed if using MARSS(), since MARSS() builds the marssMODEL object for the user and does error-checking on model structure.

summary(modelObj): This will print the model parameter matrices, showing the fixed values (in parentheses) and the location of the estimated elements. The estimated elements are shown as g1, g2, g3, ..., which indicates which elements are shared (i.e., forced to have the same value). For example, an i.i.d. R matrix would appear as a diagonal matrix with just g1 on the diagonal.

3 Algorithms used in the MARSS package

3.1 The full time-varying model used in the MARSS EM algorithm

In mathematical form, the model that is being fit with the package is

$$x_t = (x_{t-1}^\top \otimes I_m)\,\text{vec}(B_t) + (u_t^\top \otimes I_m)\,\text{vec}(U_t) + w_t, \quad w_t \sim \text{MVN}(0, Q_t)$$
$$y_t = (x_t^\top \otimes I_n)\,\text{vec}(Z_t) + (a_t^\top \otimes I_n)\,\text{vec}(A_t) + v_t, \quad v_t \sim \text{MVN}(0, R_t) \tag{3.1}$$
$$x_{t_0} = \pi + F\,l, \quad l \sim \text{MVN}(0, \Lambda)$$

Each model parameter matrix, B_t, U_t, Q_t, Z_t, A_t, and R_t, is written as a time-varying linear model, f_t + D_t m, where f and D are fully known (not estimated and with no missing values) and m is a column vector of the estimated elements of the parameter matrix:

$$\text{vec}(B_t) = f_{t,b} + D_{t,b}\,\beta \qquad \text{vec}(U_t) = f_{t,u} + D_{t,u}\,\upsilon \qquad \text{vec}(Q_t) = f_{t,q} + D_{t,q}\,q$$
$$\text{vec}(Z_t) = f_{t,z} + D_{t,z}\,\zeta \qquad \text{vec}(A_t) = f_{t,a} + D_{t,a}\,\alpha \qquad \text{vec}(R_t) = f_{t,r} + D_{t,r}\,r$$
$$\text{vec}(\Lambda) = f_\lambda + D_\lambda\,\lambda \qquad \text{vec}(\pi) = f_\pi + D_\pi\,p$$

The internal MARSS model specification (element $marss in a fitted marssMLE object output by a MARSS() call) is a list with the f_t ("fixed") and D_t ("free") matrices for each parameter. The output from fitting are the vectors β, υ, etc. The trick is to rewrite the user's linear multivariate problem into the general form (Equation 3.1). MARSS does this using functions that take more familiar arguments as input and then construct the f_t and D_t matrices. Because the f_t and D_t can be whatever the user wants (assuming they are the right shape), this allows users to include covariates, trends (linear, sinusoidal, etc.) or indicator variables in a variety of ways. It also means that terms like 1 + b + 2c can appear in the parameter matrices.

Although the above form looks unusual, it is equivalent to the commonly seen form, but it leads to a log-likelihood function where all terms have the form Mm, where M is a matrix and m is a column vector of only the different estimated values; a small worked example follows.
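For instance (a worked illustration of the fixed-and-free decomposition, not taken from the original text): a 2 × 2 diagonal Q_t with one shared variance q has

$$\text{vec}(Q_t) = \begin{bmatrix} q \\ 0 \\ 0 \\ q \end{bmatrix} = \underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}}_{f_{t,q}} + \underbrace{\begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}}_{D_{t,q}} q$$

so the only estimated element is the single value q.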
This makes it easy to do the partial differentiation with respect to m that is necessary for the EM algorithm and, as a result, easy to impose linear constraints and structure on the elements in a parameter matrix (Holmes, 2012).

3.2 Maximum-likelihood parameter estimation

3.2.1 EM algorithm

The function MARSSkem in the MARSS package provides a maximum-likelihood algorithm for parameter estimation based on an Expectation-Maximization (EM) algorithm (Holmes, 2012). EM algorithms are widely used algorithms that extend maximum-likelihood estimation to cases where there are hidden random variables in a model (Dempster et al., 1977; Harvey, 1989; Harvey and Shephard, 1993; McLachlan and Krishnan, 2008). Expectation-Maximization algorithms for unconstrained MARSS models have been around for many years, and algorithms for certain constrained cases have also been published. What makes the EM algorithm in MARSS different is that it is a general constrained algorithm that allows generic linear constraints among matrix elements (thus allowing fixed, shared and linear combinations of estimated elements).

The EM algorithm finds the maximum-likelihood estimates of the parameters in a MARSS model using an iterative process. Starting with an initial set of parameters, which we will denote $\hat{\Theta}_1$ (you can choose these however you wish; however, choosing something not too far off from the correct values will make the algorithm converge faster), an updated parameter set $\hat{\Theta}_2$ is obtained by finding the $\hat{\Theta}_2$ that maximizes the expected value of the likelihood over the distribution of the states (X) conditioned on $\hat{\Theta}_1$. This distribution of states is computed via the Kalman smoother (Section 3.3). Mathematically, each iteration of an EM algorithm does this maximization:

$$\hat{\Theta}_2 = \arg\max_{\Theta} \; \mathbb{E}_{X|\hat{\Theta}_1}\left[\log L(\Theta \mid Y = y_1^T, X)\right] \tag{3.2}$$

Then, using $\hat{\Theta}_2$, the distribution of X conditioned on $\hat{\Theta}_2$ is computed. That distribution, along with $\hat{\Theta}_2$ in place of $\hat{\Theta}_1$, is used in Equation (3.2) to produce an updated parameter set $\hat{\Theta}_3$. This is repeated until the expected log-likelihood stops increasing (or increases less than some set tolerance level).

Implementing this algorithm is straightforward, hence its popularity.
1. Set an initial set of parameters, $\hat{\Theta}_1$.
2. E step: using the model for the hidden states (X) and $\hat{\Theta}_1$, calculate the expected values of X conditioned on all the data $y_1^T$; this is xtT output by the Kalman smoother (function MARSSkf in MARSS). Also calculate the expected values of any functions of X (or Y, if there are missing Y values) that appear in your expected log-likelihood function.
3. M step: put those $\mathbb{E}(X \mid Y = y_1^T, \hat{\Theta}_1)$ and $\mathbb{E}(g(X) \mid Y = y_1^T, \hat{\Theta}_1)$ into your expected log-likelihood function in place of X (and g(X)) and maximize with respect to Θ. This gives you $\hat{\Theta}_2$.
4. Repeat the E and M steps until the log-likelihood stops increasing.

The EM equations used in the MARSS package (function MARSSkem) are described in Holmes (2012) and are extensions of those in Shumway and Stoffer (1982) and Ghahramani and Hinton (1996). Our EM algorithm is an extended version because our algorithm handles cases where there are constraints within the parameter matrices (shared values, linear combinations, diagonal structure, block-diagonal structure, ...), where there are fixed values within the parameter matrices, or where there may be 0s on the diagonal of Q, R and Λ.

The EM algorithm is a hill-climbing algorithm and, like all hill-climbing algorithms, can get stuck on local maxima.
The MARSS package includes a Monte Carlo initial-conditions searcher (function MARSSmcinit), based on Biernacki et al. (2003), to minimize this problem. EM algorithms are also known to get close to the maximum very quickly but then creep toward the absolute maximum. Once in the vicinity of the maximum, quasi-Newton methods find the absolute maximum much faster, but they can be sensitive to initial conditions, and in practice we have found the EM algorithm to be much faster for large problems.

3.3 Kalman filter and smoother

The Kalman filter (Kalman, 1960) is a recursive algorithm that solves for the expected value of the hidden states (the X) in a MARSS model (Equation 1.1) at time t conditioned on the data up to time t: $\mathbb{E}(X_t \mid y_1^t)$. The Kalman filter gives the optimal (lowest mean square error) estimate of the unobserved x_t based on the observed data up to time t for this class of linear dynamical system. The Kalman smoother (Rauch et al., 1965) solves for the expected value of the hidden state(s) conditioned on all the data: $\mathbb{E}(X_t \mid y_1^T)$. If the errors in the stochastic process are Gaussian, then the estimators from the Kalman filter and smoother are also the maximum-likelihood estimates. However, even if the errors are not Gaussian, the estimators are optimal in the sense that they are the estimators with the least variability possible. This robustness is one reason the Kalman filter is so powerful: it provides well-behaving estimates of the hidden states for all kinds of multivariate autoregressive processes, not just Gaussian processes. The Kalman filter and smoother are widely used in time-series analysis, and there are many textbooks covering them and their applications. In the interest of giving the reader a single point of reference, we use Shumway and Stoffer (2006) as our primary reference.

The MARSSkf function provides the Kalman filter and smoother output using one of two algorithms (specified by fun.kf). The algorithm in MARSSkfss is that shown in Shumway and Stoffer (2006). This algorithm is not computationally efficient; see Koopman et al. (1999, sec. 4.3) for a more efficient Kalman filter implementation. The Koopman et al. implementation is provided in the function MARSSkfas using the KFAS R package. MARSSkfss (and MARSSkfas, with a few exceptions) has the following outputs:

xtt1: The expected value of X_t conditioned on the data up to time t-1.
xtt: The expected value of X_t conditioned on the data up to time t.
xtT: The expected value of X_t conditioned on all the data from time 1 to T. This is the smoothed state estimate.
Vtt1: The variance of X_t conditioned on the data up to time t-1. Denoted $P_t^{t-1}$ in section 6.2 in Shumway and Stoffer (2006).
Vtt: The variance of X_t conditioned on the data up to time t. Denoted $P_t^t$ in section 6.2 in Shumway and Stoffer (2006).
VtT: The variance of X_t conditioned on all the data from time 1 to T.
Vtt1T: The lag-one covariance of X_t and X_{t-1} conditioned on all the data, 1 to T.
Kt: The Kalman gain. This is part of the update equations and relates to the amount xtt1 is updated by the data at time t to produce xtt. Not output by MARSSkfas.
J: This is similar to the Kalman gain but is part of the Kalman smoother. See Equation 6.49 in Shumway and Stoffer (2006). Not output by MARSSkfas.
Innov: The innovations at time t, defined as $\varepsilon_t \equiv y_t - \mathbb{E}(Y_t)$. These are the residuals, the difference between the data and their predicted values. See Equation 6.24 in Shumway and Stoffer (2006). Not output by MARSSkfas.
Sigma: The $\Sigma_t$, the variance-covariance matrices of the innovations at time t. This is used for the calculation of confidence intervals, the standard errors on the state estimates, and the likelihood. See Equation 6.25 in Shumway and Stoffer (2006) for the $\Sigma_t$ calculation. Not output by MARSSkfas.
logLik: The log-likelihood of the data conditioned on the model parameters.

3.4 The exact likelihood

The likelihood of the data given a set of MARSS parameters is part of the output of the MARSSkfss and MARSSkfas functions. The likelihood computation is based on the innovations form of the likelihood (Schweppe, 1965) and uses the output from the Kalman filter:

$$\log L(\Theta \mid \text{data}) = -\frac{N}{2}\log(2\pi) - \frac{1}{2}\left( \sum_{t=1}^{T} \log|\Sigma_t| + \sum_{t=1}^{T} \varepsilon_t^\top \Sigma_t^{-1} \varepsilon_t \right) \tag{3.3}$$

where N is the total number of data points, $\varepsilon_t$ is the innovations at time t, and $|\Sigma_t|$ is the determinant of the innovations variance-covariance matrix at time t. See Equation 6.62 in Shumway and Stoffer (2006).

However, there are a few differences between the log-likelihood output by MARSSkf and MARSSkfas and that described in Shumway and Stoffer (2006). The standard likelihood calculation (Equation 6.62 in Shumway and Stoffer (2006)) is biased when there are missing values in the data, and the missing-data modifications discussed in Section 6.4 in Shumway and Stoffer (2006) do not correct for this bias. Harvey (1989), Section 3.4.7, discusses at length that the standard missing-values correction leads to an inexact likelihood when there are missing values. The bias is minor if there are few missing values, but it becomes severe as the number of missing values increases. Many ecological datasets have high fractions of missing values, and this leads to a very biased likelihood if one uses the inexact formula. Harvey (1989) provides some non-trivial ways to compute the exact likelihood. We use instead the exact-likelihood correction for missing values that is presented in Section 12.3 in Brockwell and Davis (1991). This solution is straightforward to implement. The correction involves the following changes to $\varepsilon_t$ and $\Sigma_t$ in Equation 3.3. Suppose the value $y_{i,t}$ is missing. First, the corresponding i-th value of $\varepsilon_t$ is set to 0. Second, the i-th diagonal value of $\Sigma_t$ is set to 1, and the off-diagonal elements on the i-th column and i-th row are set to 0.

3.5 Parametric and innovations bootstrapping

Bootstrapping can be used to construct frequentist confidence intervals on the parameter estimates (Stoffer and Wall, 1991) and to compute the small-sample AIC corrector for MARSS models (Cavanaugh and Shumway, 1997); the functions MARSSparamCIs and MARSSaic do these computations.

The MARSSboot function provides both parametric and innovations bootstrapping of MARSS models. The innovations bootstrap algorithm of Stoffer and Wall (1991) bootstraps the model residuals (the innovations). This is a semi-parametric bootstrap, since it uses, partially, the maximum-likelihood parameter estimates. This algorithm cannot be used if there are missing values in the data. Also, for short time series it gives biased bootstraps, because one cannot resample the first few innovations. MARSSboot also provides a fully parametric bootstrap. This uses the maximum-likelihood MARSS parameters to simulate data from which bootstrap parameter estimates are obtained. Our research (Holmes and Ward, 2010) indicates that this provides unbiased bootstrap parameter estimates, and it works with datasets with missing values.
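A hedged sketch of calling the parametric bootstrap just described (argument and element names follow ?MARSSboot; fit is a fitted marssMLE object, and the small nboot is only for illustration):

boot <- MARSSboot(fit, nboot = 100, output = "parameters", sim = "parametric")
# boot$boot.params holds one column of bootstrapped parameter estimates per iteration
dim(boot$boot.params)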
3.5 Parametric and innovations bootstrapping

Bootstrapping can be used to construct frequentist confidence intervals on the parameter estimates (Stoffer and Wall, 1991) and to compute the small-sample AIC corrector for MARSS models (Cavanaugh and Shumway, 1997); the functions MARSSparamCIs and MARSSaic do these computations.

The MARSSboot function provides both parametric and innovations bootstrapping of MARSS models. The innovations bootstrap algorithm by Stoffer and Wall (1991) bootstraps the model residuals (the innovations). This is a semi-parametric bootstrap since it uses, partially, the maximum-likelihood parameter estimates. This algorithm cannot be used if there are missing values in the data. Also, for short time series it gives biased bootstraps because one cannot resample the first few innovations. MARSSboot also provides a fully parametric bootstrap. This uses the maximum-likelihood MARSS parameters to simulate data from which bootstrap parameter estimates are obtained. Our research (Holmes and Ward, 2010) indicates that this provides unbiased bootstrap parameter estimates, and it works with datasets with missing values. Lastly, MARSSboot can also output parameters sampled from a numerically estimated Hessian matrix.

3.6 Simulation and forecasting

The MARSSsimulate function simulates from a fitted marssMLE object (e.g. output from a MARSS() call). It uses the mvrnorm (package MASS) or rmvnorm (package mvtnorm) functions to produce draws of the process and observation errors from multivariate normal distributions for each time step.

3.7 Model selection

The package provides the MARSSaic function for computing AIC, AICc and AICb. The latter is a small-sample corrector for autoregressive state-space models. The bias problem with AIC and AICc for short time-series data has been shown in Cavanaugh and Shumway (1997) and Holmes and Ward (2010). AIC and AICc tend to select overly complex MARSS models when the time-series data are short. AICb corrects this bias. The algorithm for a non-parametric AICb is given in Cavanaugh and Shumway (1997). Their algorithm uses the innovations bootstrap (Stoffer and Wall, 1991), which means it cannot be used when there are missing data. We added a parametric AICb (Holmes and Ward, 2010), which uses a parametric bootstrap. This algorithm allows one to compute AICb when there are missing data, and it provides unbiased AIC even for short time series. See Holmes and Ward (2010) for discussion and testing of parametric AICb for MARSS models.

AICb is comprised of the familiar AIC fit term, −2 log L, plus a penalty term that is the mean difference between the log likelihood of the data under the bootstrapped maximum-likelihood parameter estimates and the log likelihood of the data under the original maximum-likelihood parameter estimates:

\[
\text{AICb} = -2\log L(\hat\Theta|y) + 2\left[\frac{1}{N_b}\sum_{i=1}^{N_b} -\log\frac{L(\hat\Theta^*(i)|y)}{L(\hat\Theta|y)}\right] \tag{3.4}
\]

where Θ̂ is the maximum-likelihood parameter set under the original data y, Θ̂*(i) is a maximum-likelihood parameter set estimated from the i-th bootstrapped data set y*(i), and N_b is the number of bootstrap data sets. It is important to notice that the likelihood in the AICb equation is L(Θ̂*|y), not L(Θ̂*|y*). In other words, we are taking the average of the likelihood of the original data given the bootstrapped parameter sets.

Part II: Fitting models with MARSS

4 Fitting models: the MARSS() function

From the user perspective, the main package function is MARSS(). This fits a MARSS model (Equation 1.1) to a matrix of data:

MARSS(data, model=list(), form="marxss")

The model argument is a list with names B, U, C, c, Q, Z, A, D, d, R, x0, V0. Elements can be left off to use default values. The form argument tells MARSS() how to use the model list elements. The default is form="marxss", which is the model in Equation 1.1.

The data must be passed in as an n × T matrix; that is, time goes across the columns. A vector is not a matrix, nor is a dataframe. A data matrix consisting of three time series (n = 3) with six time steps might look like

\[
y = \begin{bmatrix} 1 & 2 & NA & NA & 3.2 & 8\\ 2 & 5 & 3 & NA & 5.1 & 5\\ 1 & NA & 2 & 2.2 & NA & 7 \end{bmatrix}
\]

where NA denotes a missing value.
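In R, such a matrix can be built directly; a quick sketch using the hypothetical values shown above:

y = matrix(c(1, 2, NA, NA, 3.2, 8,
             2, 5, 3, NA, 5.1, 5,
             1, NA, 2, 2.2, NA, 7),
           nrow=3, ncol=6, byrow=TRUE)  # n = 3 time series, T = 6 time steps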
The argument model specifies the structure of the MARSS model. It is a list, where the list element for each model parameter specifies the form of that parameter. The most general way to specify model structure is to use a list matrix. The list matrix allows one to combine fixed and estimated elements in one's parameter specification. It allows a one-to-one correspondence between how you write the parameter matrix on paper and how you specify it in R.

For example, let's say Q and u have the following forms in your model:

\[
Q = \begin{bmatrix} q & 0 & 0\\ 0 & q & 0\\ 0 & 0 & 1 \end{bmatrix}
\quad\text{and}\quad
u = \begin{bmatrix} 0.05\\ u_1\\ u_2 \end{bmatrix}
\]

So Q is a diagonal matrix with the 3rd variance fixed at 1 and the 1st and 2nd estimated and equal. The 1st element of u is fixed, and the 2nd and 3rd are estimated and different. You can specify this using a list matrix:

Q=matrix(list("q",0,0,0,"q",0,0,0,1),3,3)
U=matrix(list(0.05,"u1","u2"),3,1)

If you print out Q and U, you will see they look exactly like Q and u written above. MARSS will keep the fixed values fixed and estimate q, u1, and u2.

List matrices allow the most flexible model structures, but MARSS also has text shortcuts for a number of common model structures. Below, the possible ways to specify each model parameter are shown, using m = 3 (the number of hidden state processes) and n = 3 (the number of observation time series).

4.1 u, a and π model structures

u, a and π are all column matrices, and the options for specifying their structures are the same. a has one special option, "scaling", described below. The allowable structures are shown using u as an example. Note that you should be careful about specifying shared structure in π because you need to make sure the structure in Λ matches. For example, if you require that all the π values are shared (equal), then Λ cannot be a diagonal matrix, since that would be saying that the π values are independent, which they clearly are not if you force them to be equal.

U=matrix(list(),m,1): This is the most general form and allows one to specify fixed and estimated elements in u. Each character string in u is the name of one of the u elements to be estimated. For example, if U=matrix(list(0.01,"u","u"),3,1), then u in the model has the structure (0.01, u, u)^⊤.

U=matrix(c(),m,1), where the values in c() are all character strings: each character string is the name of an element to be estimated. For example, if U=matrix(c("u1","u1","u2"),3,1), then u in the model has the structure (u_1, u_1, u_2)^⊤, with two values being estimated. U=matrix(list("u1","u1","u2"),3,1) has the same effect.

U="unequal" or U="unconstrained": Both of these strings indicate that each element of u is estimated. If m = 3, then u has the form (u_1, u_2, u_3)^⊤.

U="equal": There is only one value in u: (u, u, u)^⊤.

U=matrix(c(),m,1), where the values in c() are all numeric: u is fixed and has no estimated values. If U=matrix(c(0.01,1,-0.5),3,1), then u in the model is (0.01, 1, −0.5)^⊤. U=matrix(list(0.01,1,-0.5),3,1) has the same effect.

U="zero": u is all zero: (0, 0, 0)^⊤.

The a parameter has a special option, "scaling", which is the default behavior. In this case, a is treated like a scaling parameter. If there is only one y row associated with an x row, then the corresponding a element is 0. If there is more than one y row associated with an x row, then the first a element is set to 0 and the others are estimated. For example, say m = 2 and n = 4 and Z looks like the following:

\[
Z = \begin{bmatrix} 1 & 0\\ 1 & 0\\ 1 & 0\\ 0 & 1 \end{bmatrix}
\]

Then the 1st–3rd rows of y are associated with the first row of x, and the 4th row of y is associated with the last row of x. If a is specified as "scaling", a has the structure (0, a_1, a_2, 0)^⊤.
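As an R sketch, the Z above and the a implied by the default A="scaling" would look like this; a1 and a2 are the names of the elements MARSS would estimate:

Z = matrix(c(1,1,1,0,
             0,0,0,1), 4, 2)          # n = 4 observations, m = 2 states
A = matrix(list(0,"a1","a2",0), 4, 1) # what A="scaling" implies for this Z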
4.2 Q, R, Λ model structures

The possible Q, R, and Λ model structures are identical, except that R is n × n while Q and Λ are m × m. All types of structures can be specified using a list matrix, but there are also text shortcuts for specifying common structures. The structures are shown using Q as the example.

Q=matrix(list(),m,m): This is the most general way to specify the parameters and allows there to be fixed and estimated elements. Each character string in the list matrix is the name of one of the Q elements to be estimated, and each numerical value is a fixed value. For example, if Q=matrix(list("s2a",0,0,0,"s2a",0,0,0,"s2b"),3,3), then Q has the structure:

\[
Q = \begin{bmatrix} \sigma_a^2 & 0 & 0\\ 0 & \sigma_a^2 & 0\\ 0 & 0 & \sigma_b^2 \end{bmatrix}
\]

Note that diag(c("s2a","s2a","s2b")) will not have the desired effect of producing a matrix with numeric 0s on the off-diagonals. It will have character 0s, and MARSS will interpret "0" as the name of an element of Q to be estimated. Instead, the following two lines can be used:

Q=matrix(list(0),3,3)
diag(Q)=c("s2a","s2a","s2b")

Q="diagonal and equal": There is only one process variance value in this case: Q = diag(σ², σ², σ²).

Q="diagonal and unequal": There are m process variance values in this case: Q = diag(σ_1², σ_2², σ_3²).

Q="unconstrained": There are values on the diagonal and the off-diagonals of Q, and the variances and covariances are all different:

\[
Q = \begin{bmatrix} \sigma_1^2 & \sigma_{1,2} & \sigma_{1,3}\\ \sigma_{1,2} & \sigma_2^2 & \sigma_{2,3}\\ \sigma_{1,3} & \sigma_{2,3} & \sigma_3^2 \end{bmatrix}
\]

There are m process variances and (m² − m)/2 covariances in this case, so (m² + m)/2 values to be estimated. Note that variance-covariance matrices are never truly unconstrained, since the upper and lower triangles of the matrix must be equal.

Q="equalvarcov": There is one process variance and one covariance:

\[
Q = \begin{bmatrix} \sigma^2 & \beta & \beta\\ \beta & \sigma^2 & \beta\\ \beta & \beta & \sigma^2 \end{bmatrix}
\]

Q=matrix(c(), m, m), where all values in c() are character strings: Each element in Q is estimated, and each character string is the name of a value to be estimated. Note that if m = 1, you still need to wrap the value in matrix() so that its class is matrix.

Q=matrix(c(), m, m), where all values in c() are numeric: Each element in Q is fixed to the values in the matrix.

Q="identity": The Q matrix is the identity matrix, I_3.

Q="zero": The Q matrix is all zeros.

Be careful when setting Λ model structures. Mis-specifying the structure of Λ can have catastrophic, but difficult to discern, effects on your estimates. See the comments on priors in Chapter 1.

4.3 B model structures

Like the variance-covariance matrices (Q, R and Λ), B can be specified with a list matrix to allow you to have both fixed and shared elements in the B matrix. Character matrices and matrices with fixed values operate the same way as for the variance-covariance matrices. In addition, the same text shortcuts are available: "unconstrained", "identity", "diagonal and equal", "diagonal and unequal", "equalvarcov", and "zero". A fixed B can be specified with a numeric matrix, but all eigenvalues must fall within the unit circle, meaning all(abs(eigen(B)$values)<=1) must be true.

4.4 Z model

Like B and the variance-covariance matrices, Z can be specified with a list matrix to allow you to have both fixed and estimated elements in Z. If Z is a square matrix, many of the same text shortcuts are available: "diagonal and equal", "diagonal and unequal", and "equalvarcov". If Z is a design matrix (a matrix with only 0s and 1s and where the row sums are all equal to 1), then a special shortcut is available using factor(), which allows you to specify which y rows are associated with which x rows. See Chapter ?? and the case studies for more examples.
Z=factor(c(1,1,1)): All y time series are observing the same (and only) hidden state trajectory x (n = 3 and m = 1):

\[
Z = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}
\]

Z=factor(c(1,2,3)): Each time series in y corresponds to a different hidden state trajectory. This is the default Z model, and in this case n = m:

\[
Z = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
\]

Z=factor(c(1,1,2)): The first two time series in y correspond to one hidden state trajectory, and the third y time series corresponds to a different hidden state trajectory. Here n = 3 and m = 2:

\[
Z = \begin{bmatrix} 1 & 0\\ 1 & 0\\ 0 & 1 \end{bmatrix}
\]

The Z model can be specified using either numeric or character factor levels. c(1,1,2) is the same as c("north","north","south").

Z="identity": This is the default behavior. This means Z is an n × n identity matrix and m = n. If n = 3, it is the same as Z=factor(c(1,2,3)).

Z=matrix(c(), n, m), where the elements in c() are all strings: Passing in an n × m character matrix means that each character string is a value to be estimated. Be careful that you are specifying an identifiable model when using this option.

Z=matrix(c(), n, m), where the elements in c() are all numeric: Passing in an n × m numeric matrix means that Z is fixed to the values in the matrix. The matrix must be numeric, but it does not need to be a design matrix.

Z=matrix(list(), n, m): Passing in an n × m list matrix allows you to combine fixed and estimated values in the Z matrix. Be careful that you are specifying an identifiable model.

4.5 Default model structures

The defaults for the model arguments in form="marxss" are:

Z="identity": each y in y corresponds to one x in x
B="identity": no interactions among the x's in x
U="unequal": the u's in u are all different
Q="diagonal and unequal": process errors are independent but have different variances
R="diagonal and equal": the observations are i.i.d.
A="scaling": a is a set of scaling factors
C="zero" and D="zero": no inputs
c="zero" and d="zero": no inputs
pi="unequal": all initial states are different
V0="zero": the initial condition on the states (x0 or x1) is fixed but unknown
tinitx=0: the initial state refers to t = 0 instead of t = 1

5 Examples

In this chapter, we work through a series of short examples using the MARSS package functions. This chapter is oriented towards those who are already somewhat familiar with MARSS models and want to get started quickly. We provide little explanatory text. Those unfamiliar with MARSS models might want to start with the applications.

In these examples, we will use the default form="marxss" argument for a MARSS() call. This specifies a MARSS model of the form:

x_t = B_t x_{t−1} + u_t + C_t c_t + w_t, where w_t ∼ MVN(0, Q_t)   (5.1a)
y_t = Z_t x_t + a_t + D_t d_t + v_t, where v_t ∼ MVN(0, R_t)   (5.1b)
x_1 ∼ MVN(π, Λ) or x_0 ∼ MVN(π, Λ)   (5.1c)

The c and d are inputs (not estimated). In the examples here, we leave off c and d, and we address including inputs only briefly at the end of the chapter. See Chapter 12 for extended examples of including covariates as inputs in a MARSS model.

5.1 Fixed and estimated elements in parameter matrices

Suppose one has a MARSS model (Equation 5.1) with the following structure:

\[
\begin{bmatrix} x_{1,t}\\ x_{2,t} \end{bmatrix} =
\begin{bmatrix} b_1 & 0.1\\ b_2 & 2 \end{bmatrix}
\begin{bmatrix} x_{1,t-1}\\ x_{2,t-1} \end{bmatrix} +
\begin{bmatrix} u\\ u \end{bmatrix} +
\begin{bmatrix} w_{1,t}\\ w_{2,t} \end{bmatrix},\quad
w_t \sim \text{MVN}\!\left(\begin{bmatrix} 0\\ 0 \end{bmatrix},
\begin{bmatrix} q_1 & q_3\\ q_3 & q_2 \end{bmatrix}\right)
\]

\[
\begin{bmatrix} y_{1,t}\\ y_{2,t}\\ y_{3,t} \end{bmatrix} =
\begin{bmatrix} z_1 & 0\\ z_2 & z_2\\ 0 & 3 \end{bmatrix}
\begin{bmatrix} x_{1,t}\\ x_{2,t} \end{bmatrix} +
\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix} +
\begin{bmatrix} v_{1,t}\\ v_{2,t}\\ v_{3,t} \end{bmatrix},\quad
v_t \sim \text{MVN}\!\left(\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix},
\begin{bmatrix} r & 0 & 0\\ 0 & r & 0\\ 0 & 0 & 1 \end{bmatrix}\right)
\]

\[
x_0 \sim \text{MVN}\!\left(\begin{bmatrix} \pi_1\\ \pi_2 \end{bmatrix},
\begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}\right)
\]

Notice how this model mixes fixed values, estimated values and shared values.
In MARSS, model structure is specified using a list with the names Z, A, R, B, U, Q, x0 and V0. Each element is a matrix (class matrix) with the same dimensions as the matrix of the same name in the MARSS model. MARSS distinguishes between the estimated and fixed values in a matrix by using a list matrix, in which you can have numeric and character elements. Numeric elements are fixed; character elements are names of things to be estimated. The model above would be specified as:

Z=matrix(list("z1","z2",0,0,"z2",3),3,2)
A=matrix(0,3,1)
R=matrix(list(0),3,3); diag(R)=c("r","r",1)
B=matrix(list("b1","b2",0.1,2),2,2)
U=matrix(c("u","u"),2,1)
Q=matrix(c("q1","q3","q3","q2"),2,2)
x0=matrix(c("pi1","pi2"),2,1)
V0=diag(1,2)
model.gen=list(Z=Z,A=A,R=R,B=B,U=U,Q=Q,x0=x0,V0=V0,tinitx=0)

Notice that there is a one-to-one correspondence between the model list in R and the model on paper. Fitting the model is then just a matter of passing the data and model list to the MARSS function:

kemfit = MARSS(dat, model=model.gen)

If you work often with MARSS models, then you will probably know whether prior sensitivity is a problem for your types of MARSS applications. If so, note that the MARSS package is unusual in that it allows you to set Λ = 0 and treat x0 as an unknown estimated parameter. This eliminates the prior and thus the prior sensitivity problems—at the cost of adding m parameters. Depending on your application, you may need to set the initial conditions at t = 1 instead of the default of t = 0. If you are unsure, look in the index and read all the sections that talk about troubleshooting priors.

5.2 Different numbers of state processes

Here we show a series of short examples using a dataset on Washington harbor seals (?harborSealWA), which has five observation time series. The dataset is a little unusual in that it has four missing years, from years 2 to 5. This causes some interesting issues with prior specification. Before starting the harbor seal examples, we set up the data, making time go across the columns and removing the year column:

dat = t(harborSealWA)
dat = dat[2:nrow(dat),] #remove the year row

5.2.1 One hidden state process for each observation time series

This is the default model for the MARSS() function. In this case, n = m, the observation errors are i.i.d., and the process errors are independent and have different variances. The elements in u are all different (meaning they are not forced to be the same). Mathematically, the MARSS model being fit is:

\[
x_t = x_{t-1} + \begin{bmatrix} u_1\\ u_2\\ u_3\\ u_4\\ u_5 \end{bmatrix} + w_t,\quad w_t \sim \text{MVN}\!\left(0,\ \operatorname{diag}(q_1, q_2, q_3, q_4, q_5)\right)
\]

\[
y_t = x_t + v_t,\quad v_t \sim \text{MVN}\!\left(0,\ r\,I_5\right)
\]

Here Z and B are 5 × 5 identity matrices and a = 0, so they are not shown. This is the default model, so you can fit it by simply passing dat to MARSS():

kemfit = MARSS(dat)

Success! abstol and log-log tests passed at 38 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 38 iterations.
Log-likelihood: 19.13428
AIC: -6.268557   AICc: 3.805517

                     Estimate
R.diag                0.00895
U.X.SJF               0.06839
U.X.SJI               0.07163
U.X.EBays             0.04179
U.X.PSnd              0.05226
U.X.HC               -0.00279
Q.(X.SJF,X.SJF)       0.03205
Q.(X.SJI,X.SJI)       0.01098
Q.(X.EBays,X.EBays)   0.00706
Q.(X.PSnd,X.PSnd)     0.00414
Q.(X.HC,X.HC)         0.05450
x0.X.SJF              5.98647
x0.X.SJI              6.72487
x0.X.EBays            6.66212
x0.X.PSnd             5.83969
x0.X.HC               6.60482

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

The output warns you that the convergence tolerance is high. You can set it lower by passing in control=list(conv.test.slope.tol=0.1). MARSS() is automatically creating parameter names since you did not tell it the names. To see exactly where each parameter element appears in its parameter matrix, type summary(kemfit$model).

Though it is not necessary to specify the model for this example since it is the default, here is how you could do so using matrices:

B=Z=diag(1,5)
U=matrix(c("u1","u2","u3","u4","u5"),5,1)
A=matrix(0,5,1)
x0=matrix(c("pi1","pi2","pi3","pi4","pi5"),5,1)
R=Q=matrix(list(0),5,5)
diag(R)="r"
diag(Q)=c("q1","q2","q3","q4","q5")

Notice that when a matrix has both fixed and estimated elements (like R and Q), a list matrix is used to allow you to specify the fixed elements as numeric and to give the estimated elements character names.

The default MLE method is the EM algorithm (method="kem"). You can also use a quasi-Newton method (BFGS) by setting method="BFGS".

kemfit.bfgs = MARSS(dat, method="BFGS")

Success! Converged in 99 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS
Estimation converged in 99 iterations.
Log-likelihood: 19.13936
AIC: -6.278712   AICc: 3.795362

                     Estimate
R.diag                0.00849
U.X.SJF               0.06838
U.X.SJI               0.07152
U.X.EBays             0.04188
U.X.PSnd              0.05233
U.X.HC               -0.00271
Q.(X.SJF,X.SJF)       0.03368
Q.(X.SJI,X.SJI)       0.01124
Q.(X.EBays,X.EBays)   0.00722
Q.(X.PSnd,X.PSnd)     0.00437
Q.(X.HC,X.HC)         0.05600
x0.X.SJF              5.98437
x0.X.SJI              6.72169
x0.X.EBays            6.65689
x0.X.PSnd             5.83527
x0.X.HC               6.60425

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

Using the default EM convergence criteria, the EM algorithm stops at a log-likelihood a little lower than the BFGS algorithm does, but the EM algorithm was faster—11.6 times faster in this case. If you want to use the EM fit as the initial conditions, pass in the inits argument using the $par element (or coef(fit, form="marss")) of the EM fit.

kemfit.bfgs2 = MARSS(dat, method="BFGS", inits=kemfit$par)

The BFGS algorithm now converges in 107 iterations. Output not shown.

We mentioned that the four missing years early in the time series create an interesting issue with the prior specification. The default behavior of MARSS is to treat the initial state as at t = 0 instead of t = 1. Usually this doesn't make a difference, but for this dataset, if we set the prior at t = 1, the MLE estimate of R becomes 0. If we estimate x1 as a parameter and let R go to 0, the likelihood will go to infinity (slowly but surely). This is neither an error nor a pathology, but it is probably not what you would like to have happen. Note that the "BFGS" algorithm will not find the maximum in this case; it will stop before R gets small and the likelihood gets very large. However, the EM algorithm will climb up the peak. You can try it by running the following code. It will report warnings, which you can read about in Appendix B.
kemfit.strange = MARSS(dat, model=list(tinitx=1))

5.2.2 Five correlated hidden state processes

This is the same model except that the five hidden states have correlated process errors. Mathematically, this is the model:

\[
x_t = x_{t-1} + u + w_t,\quad w_t \sim \text{MVN}\!\left(0,\ 
\begin{bmatrix}
q_1 & c_{1,2} & c_{1,3} & c_{1,4} & c_{1,5}\\
c_{1,2} & q_2 & c_{2,3} & c_{2,4} & c_{2,5}\\
c_{1,3} & c_{2,3} & q_3 & c_{3,4} & c_{3,5}\\
c_{1,4} & c_{2,4} & c_{3,4} & q_4 & c_{4,5}\\
c_{1,5} & c_{2,5} & c_{3,5} & c_{4,5} & q_5
\end{bmatrix}\right)
\]

\[
y_t = x_t + v_t,\quad v_t \sim \text{MVN}\!\left(0,\ r\,I_5\right)
\]

with u = (u_1, ..., u_5)^⊤. B is not shown in the top equation; it is an m × m identity matrix. To fit, use MARSS() with the model argument set. The output is not shown, but it will appear if you type this on the R command line.

kemfit = MARSS(dat, model=list(Q="unconstrained"))

This shows one of the text shortcuts, "unconstrained", which means estimate all elements in the matrix. This shortcut can be used for all parameter matrices.

5.2.3 Five equally correlated hidden state processes

This is the same model except that now there is only one process error variance and one process error covariance. Mathematically, the model is:

\[
x_t = x_{t-1} + u + w_t,\quad w_t \sim \text{MVN}\!\left(0,\ 
\begin{bmatrix}
q & c & c & c & c\\
c & q & c & c & c\\
c & c & q & c & c\\
c & c & c & q & c\\
c & c & c & c & q
\end{bmatrix}\right)
\]

\[
y_t = x_t + v_t,\quad v_t \sim \text{MVN}\!\left(0,\ r\,I_5\right)
\]

Again, B is not shown in the top equation; it is an m × m identity matrix. To fit, use the following code (output not shown):

kemfit = MARSS(dat, model=list(Q="equalvarcov"))

The shortcut "equalvarcov" means one value on the diagonal and one on the off-diagonal. It can be used for all square matrices (B, Q, R, and Λ).

5.2.4 Five hidden state processes with a "north" and a "south" u and Q elements

Here we fit a model with five independent hidden states, where each observation time series is an independent observation of a different hidden trajectory, but hidden trajectories 1–3 share their u and Q elements, while hidden trajectories 4–5 share theirs. This is the model:

\[
x_t = x_{t-1} + \begin{bmatrix} u_n\\ u_n\\ u_n\\ u_s\\ u_s \end{bmatrix} + w_t,\quad
w_t \sim \text{MVN}\!\left(0,\ \operatorname{diag}(q_n, q_n, q_n, q_s, q_s)\right)
\]

\[
y_t = x_t + v_t,\quad v_t \sim \text{MVN}\!\left(0,\ r\,I_5\right)
\]

To fit, we specify the model argument for u and Q using list matrices. List matrices allow us to combine numeric and character values in a matrix. MARSS will interpret the numeric values as fixed and the character values as parameters to be estimated. Parameters with the same name are constrained to be identical.

regions=list("N","N","N","S","S")
U=matrix(regions,5,1)
Q=matrix(list(0),5,5); diag(Q)=regions
kemfit = MARSS(dat, model=list(U=U, Q=Q))

Only u and Q need to be specified, since the other parameters are at their default values.

5.2.5 Fixed observation error variance

Here we fit the same model but with a known observation error variance.
This is the model:

\[
x_t = x_{t-1} + \begin{bmatrix} u_n\\ u_n\\ u_n\\ u_s\\ u_s \end{bmatrix} + w_t,\quad
w_t \sim \text{MVN}\!\left(0,\ \operatorname{diag}(q_n, q_n, q_n, q_s, q_s)\right)
\]

\[
y_t = x_t + v_t,\quad v_t \sim \text{MVN}\!\left(0,\ 0.01\,I_5\right)
\]

where the observation variance is fixed at 0.01 for all five time series. To fit this model, use the following code (output not shown):

regions=list("N","N","N","S","S")
U=matrix(regions,5,1)
Q=matrix(list(0),5,5); diag(Q)=regions
R=diag(0.01,5)
kemfit = MARSS(dat, model=list(U=U, Q=Q, R=R))

5.2.6 One hidden state and five i.i.d. observation time series

Instead of five hidden state trajectories, we specify that there is only one and that all the observations are of that one trajectory. Mathematically, the model is:

x_t = x_{t−1} + u + w_t, w_t ∼ N(0, q)

\[
\begin{bmatrix} y_{1,t}\\ y_{2,t}\\ y_{3,t}\\ y_{4,t}\\ y_{5,t} \end{bmatrix} =
\begin{bmatrix} 1\\ 1\\ 1\\ 1\\ 1 \end{bmatrix} x_t +
\begin{bmatrix} 0\\ a_2\\ a_3\\ a_4\\ a_5 \end{bmatrix} + v_t,\quad
v_t \sim \text{MVN}\!\left(0,\ r\,I_5\right)
\]

Note that the default model for R is "diagonal and equal", so we can leave it off when specifying the model argument. To fit, use this code:

Z=factor(c(1,1,1,1,1))
kemfit = MARSS(dat, model=list(Z=Z))

Success! abstol and log-log tests passed at 28 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 28 iterations.
Log-likelihood: 3.593276
AIC: 8.813447   AICc: 11.13603

         Estimate
A.SJI     0.80153
A.EBays   0.28245
A.PSnd   -0.54802
A.HC     -0.62665
R.diag    0.04523
U.U       0.04759
Q.Q       0.00429
x0.x0     6.39199

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

You can also pass in Z exactly as it is in the equation: Z=matrix(1,5,1). But the factor shorthand is handy if you need to assign different observed time series to different underlying state time series (see the next examples). The default a form is "scaling", which means that the first y row associated with a given x has a = 0 and the rest are estimated.

5.2.7 One hidden state and five independent observation time series with different variances

Mathematically, this model is:

x_t = x_{t−1} + u + w_t, w_t ∼ N(0, q)

\[
\begin{bmatrix} y_{1,t}\\ y_{2,t}\\ y_{3,t}\\ y_{4,t}\\ y_{5,t} \end{bmatrix} =
\begin{bmatrix} 1\\ 1\\ 1\\ 1\\ 1 \end{bmatrix} x_t +
\begin{bmatrix} 0\\ a_2\\ a_3\\ a_4\\ a_5 \end{bmatrix} + v_t,\quad
v_t \sim \text{MVN}\!\left(0,\ \operatorname{diag}(r_1, r_2, r_3, r_4, r_5)\right)
\]

To fit this model:

Z=factor(c(1,1,1,1,1))
R="diagonal and unequal"
kemfit = MARSS(dat, model=list(Z=Z, R=R))

Success! abstol and log-log tests passed at 24 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 24 iterations.
Log-likelihood: 16.66199
AIC: -9.323982   AICc: -3.944671

                 Estimate
A.SJI             0.79555
A.EBays           0.27540
A.PSnd           -0.53694
A.HC             -0.60874
R.(SJF,SJF)       0.03229
R.(SJI,SJI)       0.03528
R.(EBays,EBays)   0.01352
R.(PSnd,PSnd)     0.01082
R.(HC,HC)         0.19609
U.U               0.05270
Q.Q               0.00604
x0.x0             6.26676

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

5.2.8 Two hidden state processes

Here we fit a model with two hidden states (north and south), where observation time series 1–3 are for the north and 4–5 are for the south.
We make the hidden state processes independent (meaning a diagonal Q matrix) but with the same process variance. We make the observation errors i.i.d. (the default) and the u elements equal. Mathematically, this is the model:

\[
\begin{bmatrix} x_{n,t}\\ x_{s,t} \end{bmatrix} =
\begin{bmatrix} x_{n,t-1}\\ x_{s,t-1} \end{bmatrix} +
\begin{bmatrix} u\\ u \end{bmatrix} +
\begin{bmatrix} w_{n,t}\\ w_{s,t} \end{bmatrix},\quad
w_t \sim \text{MVN}\!\left(0,\ \begin{bmatrix} q & 0\\ 0 & q \end{bmatrix}\right)
\]

\[
\begin{bmatrix} y_{1,t}\\ y_{2,t}\\ y_{3,t}\\ y_{4,t}\\ y_{5,t} \end{bmatrix} =
\begin{bmatrix} 1 & 0\\ 1 & 0\\ 1 & 0\\ 0 & 1\\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x_{n,t}\\ x_{s,t} \end{bmatrix} +
\begin{bmatrix} 0\\ a_2\\ a_3\\ 0\\ a_5 \end{bmatrix} + v_t,\quad
v_t \sim \text{MVN}\!\left(0,\ r\,I_5\right)
\]

To fit the model, use the following code (output not shown):

Z=factor(c("N","N","N","S","S"))
Q="diagonal and equal"
U="equal"
kemfit = MARSS(dat, model=list(Z=Z,Q=Q,U=U))

You can also pass in Z exactly as it is in the equation as a numeric matrix, matrix(c(1,1,1,0,0,0,0,0,1,1),5,2); the factor notation is a shortcut for making a design matrix (as Z is in these examples). "equal" is a shortcut meaning all elements in a matrix are constrained to be equal. It can be used for all column matrices (a, u and π). "diagonal and equal" can be used as a shortcut for all square matrices (B, Q, R, and Λ).

5.3 Time-varying parameters

Time-varying parameters are specified by passing in an array of matrices (list, numeric or character), where the 3rd dimension of the array is time and must be the same value as the 2nd (time) dimension of the data matrix. No text shortcuts are allowed for time-varying parameters; you need to use the matrix form. For example, let's say we wanted a different u for the first half versus the second half of the harbor seal time series. We would pass in an array for u as follows:

U1=matrix("t1",5,1); U2=matrix("t2",5,1)
Ut=array(U2,dim=c(dim(U1),dim(dat)[2]))
TT=dim(dat)[2]
Ut[,,1:floor(TT/2)]=U1
kemfit.tv=MARSS(dat,model=list(U=Ut,Q="diagonal and equal"))

You can have some elements in a parameter matrix be time-constant and some be time-varying:

U1=matrix(c(rep("t1",4),"hc"),5,1); U2=matrix(c(rep("t2",4),"hc"),5,1)
Ut=array(U2,dim=c(dim(U1),dim(dat)[2]))
Ut[,,1:floor(TT/2)]=U1
kemfit.tv=MARSS(dat,model=list(U=Ut,Q="diagonal and equal"))

Note that a time-varying model is specified for MARSS the same way you would write the time-varying model on paper in matrix math form.

5.4 Including inputs (or covariates)

In MARSS models with covariates, the covariates are often treated as inputs and appear as either the c or d in Equation 5.1, depending on the application. However, more generally, c and d are simply inputs that are fully known (no missing values). c_t is the p × 1 vector of inputs at time t which affect the states, and d_t is a q × 1 vector of inputs (potentially the same as c_t) which affect the observations. C_t is an m × p matrix of coefficients relating the effects of c_t to the m × 1 state vector x_t, and D_t is an n × q matrix of coefficients relating the effects of d_t to the n × 1 observation vector y_t. The elements of C and D can be estimated, and their form is specified much like the other matrices.

With the MARSS() function, one can fit a model with inputs by simply passing in model$c and/or model$d in the MARSS() call as a p × T or q × T matrix, respectively. The form for C_t and D_t is similarly specified by passing in model$C and/or model$D. If C and D are not time-varying, they are passed in as a matrix. If they are time-varying, they must be passed in as a 3-dimensional array with the 3rd dimension equal to the number of time steps. A sketch follows; see Chapter 12 for extended examples of including covariates as inputs in a MARSS model.
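For concreteness, here is a minimal sketch of fitting a model with one covariate affecting the states. The covariate here is hypothetical (a made-up 1 × T input matrix covar); C is a 5 × 1 matrix of estimated effects, one per state (m = 5, the default model):

TT = dim(dat)[2]
covar = matrix(rnorm(TT), 1, TT)    # hypothetical 1 x T covariate input
C = matrix("temp", 5, 1)            # estimated effect of covar on each state
kemfit.cov = MARSS(dat, model=list(C=C, c=covar))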
5.5 Printing and summarizing models and model fits

The package includes print functions for marssMODEL objects and marssMLE objects (fitted models).

print(kemfit)
print(kemfit$model)

This will print the basic information on model structure and model fit that you have seen in the previous examples. The package also includes a summary function for models.

summary(kemfit$model)

Output is not shown because it is verbose, but it prints each matrix with the fixed elements denoted with their values and the free elements denoted by their names. This is very helpful for confirming exactly what model structure you are fitting to the data.

The print function will also print various other things, like a vector of the estimated parameters, the estimated states, the state standard errors, etc., using the what argument in the print call:

print(kemfit, what="par")

List of the estimated values in each parameter matrix
$Z
[,1]

$A
             [,1]
SJI    0.79786453
EBays  0.27743474
HC    -0.07035021

$R
           [,1]
diag 0.03406192

$B
[,1]

$U
        [,1]
1 0.04317641

$Q
            [,1]
diag 0.007669608

$x0
      [,1]
N 6.172048
S 6.206155

$V0
[,1]

$G
[,1]

$H
[,1]

$C
[,1]

$D
[,1]

$c
[,1]

$d
[,1]

print(kemfit, what="Q")

Parameter matrix Q
            [,1]        [,2]
[1,] 0.007669608 0.000000000
[2,] 0.000000000 0.007669608

Type ?print.MARSS to see a list of the types of output that can be printed with a print call. If you want to use the output from print instead of printing, then assign the print call to a value:

x=print(kemfit, what="states",silent=TRUE)
x

      [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
N 6.215483 6.329702 6.443921 6.558140 6.672359 6.786578
S 6.249445 6.295591 6.341736 6.387881 6.434027 6.480172
      [,7]     [,8]     [,9]    [,10]    [,11]    [,12]
N 6.904124 6.944425 6.976697 7.050053 7.156567 7.198947
S 6.526317 6.572463 6.613358 6.654252 6.695147 6.736042
     [,13]    [,14]    [,15]    [,16]    [,17]    [,18]
N 7.228397 7.293141 7.380439 7.467975 7.488458 7.541996
S 6.776937 6.817832 6.786202 6.764235 6.786233 6.816405
     [,19]    [,20]    [,21]    [,22]
N 7.561182 7.524175 7.475514 7.459263
S 6.846578 6.813743 6.791537 6.819195

5.6 Confidence intervals on a fitted model

The function MARSSparamCIs() is used to compute confidence intervals with a default alpha level of 0.05. The function can compute approximate confidence intervals using a numerically estimated Hessian matrix (method="hessian") or via parametric (method="parametric") or non-parametric (method="innovations") bootstrapping.

5.6.1 Approximate confidence intervals from a Hessian matrix

The default method for MARSSparamCIs is to use a numerically estimated Hessian matrix:

kem.with.hess.CIs = MARSSparamCIs(kemfit)

Use print, or just type the marssMLE object name, to see the confidence intervals:

print(kem.with.hess.CIs)

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 22 iterations.
Log-likelihood: 7.949236
AIC: 0.1015284   AICc: 2.424109

          ML.Est Std.Err   low.CI  up.CI
A.SJI    0.79786  0.0615  0.67729 0.9184
A.EBays  0.27743  0.0625  0.15487 0.4000
A.HC    -0.07035  0.0888 -0.24444 0.1037
R.diag   0.03406  0.0175  0.02256 0.0479
U.1      0.04318  0.0144  0.01500 0.0714
Q.diag   0.00767  0.0235  0.00173 0.0179
x0.N     6.17205  0.1455  5.88696 6.4571
x0.S     6.20615  0.1571  5.89828 6.5140

CIs calculated at alpha = 0.05 via method=hessian

5.6.2 Confidence intervals from a parametric bootstrap

Use method="parametric" to compute confidence intervals and bias via a parametric bootstrap.
kem.w.boot.CIs=MARSSparamCIs(kemfit,method="parametric",nboot=10)
#nboot should be more like 1000, but set low for example's sake
print(kem.w.boot.CIs)

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 22 iterations.
Log-likelihood: 7.949236
AIC: 0.1015284   AICc: 2.424109

          ML.Est Std.Err  low.CI  up.CI  Est.Bias Unbias.Est
A.SJI    0.79786 0.05472  0.7127 0.8721  0.017888    0.81575
A.EBays  0.27743 0.06704  0.1814 0.3635  0.022826    0.30026
A.HC    -0.07035 0.09814 -0.2260 0.0492  0.010760   -0.05959
R.diag   0.03406 0.00833  0.0217 0.0484  0.000993    0.03505
U.1      0.04318 0.01982  0.0303 0.0850 -0.007329    0.03585
Q.diag   0.00767 0.00523  0.0000 0.0135  0.001860    0.00953
x0.N     6.17205 0.27866  5.8029 6.6966 -0.034773    6.13727
x0.S     6.20615 0.49057  5.2721 6.7419  0.188819    6.39497

CIs calculated at alpha = 0.05 via method=parametric
Bias calculated via parametric bootstrapping with 10 bootstraps.

5.7 Vectors of just the estimated parameters

Often it is useful to have a vector of the estimated parameters. For example, if you are writing a call to optim, you will need a vector of just the estimated parameters. You can use the function coef:

parvec=coef(kemfit, type="vector")
parvec

       A.SJI      A.EBays         A.HC       R.diag
 0.797864531  0.277434738 -0.070350207  0.034061922
         U.1       Q.diag         x0.N         x0.S
 0.043176408  0.007669608  6.172047633  6.206154697

5.8 Kalman filter and smoother output

All the standard Kalman filter and smoother output (along with the lag-one covariance smoother output) is available using the MARSSkf function. Read the help file (?MARSSkf) for details and the meanings of the names in the output list.

kf=MARSSkf(kemfit)
names(kf)

 [1] "xtT"        "VtT"        "Vtt1T"      "x0T"
 [5] "V0T"        "x10T"       "V10T"       "x00T"
 [9] "V00T"       "Vtt"        "Vtt1"       "J"
[13] "J0"         "Kt"         "xtt1"       "xtt"
[17] "Innov"      "Sigma"      "kfas.model" "logLik"
[21] "ok"         "errors"

#if you only need the logLik, this is the fast way to get it
MARSSkf(kemfit, only.logLik=TRUE)

5.9 Degenerate variance estimates

If your data are short relative to the number of parameters you are estimating, then you are liable to find that some of the variance elements are degenerate (equal to zero). Try the following:

dat.short = dat[1:4,1:10]
kem.degen = MARSS(dat.short,control=list(allow.degen=FALSE))

Warning! Abstol convergence only. Maxit (=500) reached before log-log convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only, no log-log convergence.
maxit (=500) reached before log-log convergence.
The likelihood and params might not be at the ML values.
Try setting control$maxit higher.
Log-likelihood: 11.67854
AIC: 2.642914   AICc: 63.30958

                     Estimate
R.diag               1.22e-02
U.X.SJF              9.79e-02
U.X.SJI              1.09e-01
U.X.EBays            9.28e-02
U.X.PSnd             1.11e-01
Q.(X.SJF,X.SJF)      1.89e-02
Q.(X.SJI,X.SJI)      1.03e-05
Q.(X.EBays,X.EBays)  8.24e-06
Q.(X.PSnd,X.PSnd)    3.05e-05
x0.X.SJF             5.96e+00
x0.X.SJI             6.73e+00
x0.X.EBays           6.60e+00
x0.X.PSnd            5.71e+00

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

Convergence warnings
Warning: the Q.(X.SJI,X.SJI) parameter value has not converged.
Warning: the Q.(X.EBays,X.EBays) parameter value has not converged.
Warning: the Q.(X.PSnd,X.PSnd) parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
This will print a warning that the maximum number of iterations was reached before convergence of some of the Q parameters. It might be that if you just ran a few more iterations, the variances would converge. So first try setting control$maxit higher.

kem.degen2 = MARSS(dat.short, control=list(maxit=1000,
   allow.degen=FALSE), silent=2)

Output not shown, but if you run the code, you will see that some of the Q terms are still not converging. MARSS can detect if a variance is going to zero, and it will try zero to see if that has a higher likelihood. Try removing the allow.degen=FALSE, which was turning off this feature.

kem.short = MARSS(dat.short)

Warning! Abstol convergence only. Maxit (=500) reached before log-log convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only, no log-log convergence.
maxit (=500) reached before log-log convergence.
The likelihood and params might not be at the ML values.
Try setting control$maxit higher.
Log-likelihood: 11.6907
AIC: 2.6186   AICc: 63.28527

                     Estimate
R.diag               1.22e-02
U.X.SJF              9.79e-02
U.X.SJI              1.09e-01
U.X.EBays            9.24e-02
U.X.PSnd             1.11e-01
Q.(X.SJF,X.SJF)      1.89e-02
Q.(X.SJI,X.SJI)      1.03e-05
Q.(X.EBays,X.EBays)  0.00e+00
Q.(X.PSnd,X.PSnd)    3.04e-05
x0.X.SJF             5.96e+00
x0.X.SJI             6.73e+00
x0.X.EBays           6.60e+00
x0.X.PSnd            5.71e+00

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

Convergence warnings
Warning: the Q.(X.SJI,X.SJI) parameter value has not converged.
Warning: the Q.(X.PSnd,X.PSnd) parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.

So three of the four Q elements are going to zero. This often happens when you do not have enough data to estimate both observation and process variance. Perhaps we are trying to estimate too many variances. We can try using only one variance value in Q and one u value in u:

kem.small=MARSS(dat.short,model=list(Q="diagonal and equal",
   U="equal"))

Success! abstol and log-log tests passed at 164 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 164 iterations.
Log-likelihood: 11.19
AIC: -8.379994   AICc: 0.9533396

           Estimate
R.diag       0.0191
U.1          0.1027
Q.diag       0.0000
x0.X.SJF     6.0609
x0.X.SJI     6.7698
x0.X.EBays   6.5307
x0.X.PSnd    5.7451

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

No, there are simply not enough data to estimate both process and observation variances.

5.10 Bootstrap parameter estimates

You can easily produce bootstrap parameter estimates from a fitted model using MARSSboot():

boot.params = MARSSboot(kemfit,
   nboot=20, output="parameters", sim="parametric")$boot.params

  |2%      |20%      |40%      |60%      |80%      |100%
Progress: ||||||||||||||||||||||||||||||||||||||||||||||||||

Use silent=TRUE to stop the progress bar from printing. The function will also produce parameter sets generated using a Hessian matrix (sim="hessian") or a non-parametric bootstrap (sim="innovations").
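One quick use for the bootstrap sample is an empirical confidence interval for each parameter. A sketch, assuming boot.params from the call above has one row per parameter and one column per bootstrap replicate (check dim(boot.params) to confirm the orientation):

# 95% empirical intervals from the bootstrap parameter sample
apply(boot.params, 1, quantile, probs=c(0.025, 0.975))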
5.11 Random initial conditions

You can use random initial conditions by setting MCInit=TRUE in the control list:

Z.model = factor(c(1,1,2,2,2))
U.model = "equal"
Q.model = "diagonal and unequal"
R.model = "diagonal and equal"
model.list=list(Z=Z.model, R=R.model, U=U.model, Q=Q.model)
#Set the numInits very low so the example runs quickly
cntl.list=list(MCInit=TRUE,numInits=10)
kem.mcinit = MARSS(dat, model=model.list, control=cntl.list)

> Starting Monte Carlo Initializations
  |2%      |20%      |40%      |60%      |80%      |100%
Progress: ||||||||||||||||||||||||||||||||||||||||||||||||||

Success! abstol and log-log tests passed at 26 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Monte Carlo initialization with random starts.
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 26 iterations.
Log-likelihood: 12.02576
AIC: -6.051511   AICc: -3.100691

         Estimate
A.SJI     0.79876
A.PSnd   -0.78580
A.HC     -0.85449
R.diag    0.02893
U.1       0.04191
Q.(1,1)   0.01162
Q.(2,2)   0.00441
x0.1      6.05128
x0.2      6.89080

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

5.12 Data simulation

5.12.1 Simulated data from a fitted MARSS model

Data can be simulated from a marssMLE object using MARSSsimulate().

sim.data=MARSSsimulate(kemfit, nsim=2, tSteps=100)$sim.data

Then you might want to estimate parameters from that simulated data. Above we created two simulated datasets (nsim=2). We will fit to the first one; here the default settings for MARSS() are used.

kem.sim.1 = MARSS(sim.data[,,1])

Then we might like to see the likelihood of the second set of simulated data under the model fit to the first set of data. We do that with the Kalman filter function. This function takes a marssMLE object (as output by, say, the MARSS function), and we have to replace the data in kem.sim.1 with the second set of simulated data.

kem.sim.2 = kem.sim.1
kem.sim.2$model$data = sim.data[,,2]
MARSSkf( kem.sim.2 )$logLik

[1] 20.19664

5.13 Bootstrap AIC

The function MARSSaic() computes a bootstrap AIC for model selection purposes. Use output="AICbp" to produce AICb with a parametric bootstrap. Use output="AICbb" to produce AICb with a non-parametric bootstrap. You will need a large number of bootstraps (nboot). We use only 10 bootstraps to show you how to compute AICb with the MARSS package, but the AICbp estimate will be terrible with this few bootstraps.

kemfit.with.AICb = MARSSaic(kemfit, output = "AICbp",
   Options = list(nboot = 10, silent=TRUE))
#nboot should be more like 1000, but set low here for example's sake
print(kemfit.with.AICb)

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 22 iterations.
Log-likelihood: 7.949236
AIC: 0.1015284   AICc: 2.424109   AICbp(param): 211.2704

         Estimate
A.SJI     0.79786
A.EBays   0.27743
A.HC     -0.07035
R.diag    0.03406
U.1       0.04318
Q.diag    0.00767
x0.N      6.17205
x0.S      6.20615

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

5.14 Convergence

MARSS uses two convergence tests. The first is

log L_{i+1} − log L_i < tol

This is called abstol (meaning absolute tolerance) in the output. The second is called the conv.test.slope. This looks at the slope of the log parameter value (or likelihood) versus log iteration number and asks whether that is close to zero (not changing).
If you are having trouble getting the model to converge, then start by addressing the following:

1) Are you trying to fit a bad model, e.g. a nonstationary model fit to stationary data (or the opposite), a model that specifies independence of errors or states to data that clearly violate that, or a model that implies a particular stationary distribution (particular mean and variance) to data that strongly violate that?
2) Do you have confounded parameters, e.g. two parameters that have the same effect (like effectively two intercepts)?
3) Are you trying to fit a model to 1 data point somewhere, e.g. in a big multivariate dataset with lots of missing values?
4) How many parameters are you trying to estimate per data point?
5) Check your residuals (residuals(kemfit)$model.residuals) for normality.
6) Did you do any data transformations that would cause one of the variances to go to zero? Replacing 0s with a constant will do that. Try replacing them with NAs (missing). Do you have long strings of constant numbers in your data? Binned data often look like that, and that will drive Q to 0.

Part III: Applications

In this part, we walk you through some longer analyses using MARSS models for a variety of different applications. Most of these are analyses of ecological data, but the same models are used in many other fields. These longer examples will take you through both the conceptual steps (with pencil and paper) and an R step, which translates the conceptual model into code.

Set-up

If you haven't already, install the MARSS package. See directions on the CRAN webpage (http://cran.r-project.org/) for instructions on installing packages. You will need write permissions for your R program directories to install packages. See the help pages on CRAN for workarounds if you don't have write permission. Type library(MARSS) at the R command line to load the package after you install it.

Tips

summary(foo$model), where foo is a fitted model object, will print detailed information on the structure of the MARSS model that was fit in the call foo = MARSS(logdata). This allows you to double check the model you fit. print(foo) will print an 'English' version of the model structure along with the parameter estimates.

When you run MARSS(), it will output the number of iterations used. If you reached the maximum, re-run with control=list(maxit=...) set higher than the default.

If you mis-specify the model, MARSS() will post an error that should give you an idea of the problem (make sure silent=FALSE to see full error reports).

Remember, the number of rows in your data is n, time is across the columns, and the length of the vector or factor passed in for model$Z must be m, the number of x hidden state trajectories in your model. The missing value indicator is NA.

Running MARSS(data), with no arguments except your data, will fit a MARSS model with m = n, a diagonal Q matrix with m variances, and i.i.d. observation errors.

Try MARSSinfo() at the command line if you get errors or warnings you don't understand. You might find insight there. Or look at the warnings and errors notes in the appendix of this user guide.

6 Count-based population viability analysis (PVA) using corrupted data

6.1 Background

Estimates of extinction and quasi-extinction risk are important risk metrics used in the management and conservation of endangered and threatened species.
By necessity, these estimates are based on data that contain both variability due to real year-to-year changes in the population growth rate (process errors) and variability in the relationship between the true population size and the actual count (observation errors). Classic approaches to extinction risk assume the data have only process error, i.e. no observation error. In reality, observation error is ubiquitous, both because of sampling variability and also because of year-to-year (and day-to-day) variability in sightability.

In this application, we will fit a univariate (meaning one time series) state-space model to population count data with observation error. We will compute the extinction risk metrics given in Dennis et al. (1991); however, instead of using a process-error only model (as is done in the original paper), we use a model with both process and observation error. The risk metrics and their interpretations are the same as in Dennis et al. (1991). The only real difference is how we compute σ², the process error variance. However, this difference has a large effect on our risk estimates, as you will see.

We use here a density-independent model: a stochastic exponential growth model in log space. This is equivalent to a MARSS model with B = 1. Density-independence is often a reasonable assumption when doing a population viability analysis, because we do such calculations for at-risk populations that are either declining or well below historical levels (and presumably carrying capacity). In an actual population viability analysis, it is necessary to justify this assumption and, if there is reason to doubt it, one tests for density-dependence (Taper and Dennis, 1994) and does sensitivity analyses using state-space models with density-dependence (Dennis et al., 2006).

(Type RShowDoc("Chapter_PVA.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.)

The univariate model is written:

x_t = x_{t−1} + u + w_t, where w_t ∼ N(0, σ²)   (6.1)
y_t = x_t + v_t, where v_t ∼ N(0, η²)   (6.2)

where y_t is the logarithm of the observed population size at time t, x_t is the unobserved state at time t, u is the growth rate, and σ² and η² are the process and observation error variances, respectively. In the R code to follow, σ² is denoted Q and η² is denoted R, because the functions we are using are also for multivariate state-space models, and those models use Q and R for the respective variance-covariance matrices.

6.2 Simulated data with process and observation error

We will start by using simulated data to see the difference between data and estimates from a model with process error only versus a model that also includes observation error. For our simulated data, we used a decline of 5% per year, process variability of 0.02 (typical for small to medium-sized vertebrates), and observation variability of 0.05 (which is a bit on the high end). We'll randomly set 10% of the values as missing.
Here is the code. First, set things up:

sim.u = -0.05        # growth rate
sim.Q = 0.02         # process error variance
sim.R = 0.05         # non-process (observation) error variance
nYr = 50             # number of years of data to generate
fracmissing = 0.1    # fraction of years that are missing
init = 7             # log of initial pop abundance
years = seq(1:nYr)   # sequence 1 to nYr
x = rep(NA,nYr)      # replicate NA nYr times
y = rep(NA,nYr)

Then generate the population sizes using Equation 6.1:

x[1]=init
for(t in 2:nYr){
  x[t] = x[t-1]+ sim.u + rnorm(1,mean=0,sd=sqrt(sim.Q)) }

Lastly, add observation error using Equation 6.2 and then add missing values:

for(t in 1:nYr){
  y[t]= x[t] + rnorm(1,mean=0,sd=sqrt(sim.R)) }
missYears = sample(years[2:(nYr-1)],floor(fracmissing*nYr),
  replace = FALSE)
y[missYears]=NA

Stochastic population trajectories show much variation, so it is best to look at a few simulated data sets at once. In Figure 6.1, nine simulations from the identical parameters are shown.

Fig. 6.1. Plot of nine simulated population time series with process and observation error. Circles are observations and the dashed line is the true population size.

Example 6.1 (The effect of parameter values on parameter estimates)

A good way to get a feel for reasonable σ² values is to generate simulated data and look at the time series. A biologist would have a pretty good idea of what kind of year-to-year population changes are reasonable for their study species. For example, for many large mammalian species, the maximum yearly population increase would be around 50% (the population could go from 1000 to 1500 in one year), but some fish species could easily double or even triple in a really good year. Observed data may bounce around a lot for many different reasons having to do with sightability, sampling error, age-structure, etc., but the underlying population trajectory is constrained by the kinds of year-to-year changes in population size that are biologically possible. σ² describes those true population changes. You should run the example code several times using different parameter values to get a feel for how different the time series can look based on identical parameter values. You can cut and paste from the pdf into the R command line.
Typical vertebrate σ² values are 0.002 to 0.02, and typical η² values are 0.005 to 0.1. A u of −0.01 translates to an average 1% per year decline, and a u of −0.1 translates to an average 10% per year decline (approximately).

Example 6.1 code

par(mfrow=c(3,3))
sim.u = -0.05
sim.Q = 0.02
sim.R = 0.05
nYr= 50
fracmiss = 0.1
init = 7
years = seq(1:nYr)
for(i in 1:9){
  x = rep(NA,nYr) # vector for ts w/o measurement error
  y = rep(NA,nYr) # vector for ts w/ measurement error
  x[1]=init
  for(t in 2:nYr){
    x[t] = x[t-1]+ sim.u + rnorm(1, mean=0, sd=sqrt(sim.Q)) }
  for(t in 1:nYr){
    y[t]= x[t] + rnorm(1,mean=0,sd=sqrt(sim.R)) }
  missYears =
    sample(years[2:(nYr-1)],floor(fracmiss*nYr),replace = FALSE)
  y[missYears]=NA
  plot(years, y, xlab="",ylab="log abundance",lwd=2,bty="l")
  lines(years,x,type="l",lwd=2,lty=2)
  title(paste("simulation ",i) )
}
legend("topright", c("Observed","True"),
  lty = c(-1, 2), pch = c(1, -1))

6.3 Maximum-likelihood parameter estimation

6.3.1 Model with process and observation error

Using the simulated data, we estimate the parameters u, σ², and η², and the hidden population sizes. These are the estimates using a model with process and observation variability. The function call is kem = MARSS(data), where data is a vector of logged (base e) counts with missing values denoted by NA. After this call, the maximum-likelihood parameter estimates are shown with coef(kem). There are numerous other outputs from the MARSS() function. To get a list of the standard model output available, type ?print.MARSS. Note that kem is just a name; the output could have been called foo. Here's code to fit to the simulated time series:

kem = MARSS(y)

Let's look at the parameter estimates for the nine simulated time series in Figure 6.1 to get a feel for the variation. The MARSS() function was used on each time series to produce parameter estimates for each simulation. The estimates are followed by the mean (over the nine simulations) and the true values:

               kem.U        kem.Q       kem.R
sim 1    -0.07340254  0.011951194 0.052419041
sim 2    -0.02955458  0.055749879 0.003257744
sim 3    -0.06468184  0.000000000 0.092393541
sim 4    -0.03546548  0.031934036 0.040441294
sim 5    -0.06600771  0.008450966 0.071950486
sim 6    -0.05154663  0.009137402 0.072497614
sim 7    -0.07953722  0.005988066 0.071740967
sim 8    -0.04622466  0.023932029 0.033372804
sim 9    -0.04827980  0.021325149 0.048361357
mean sim -0.05496672  0.018718747 0.054048316
true     -0.05000000  0.020000000 0.050000000

As expected, the estimated parameters do not exactly match the true parameters, but the average should be fairly close (although nine simulations is a small sample size). Also note that although we do not get u quite right, our estimates are usually negative. Thus our estimates usually indicate declining dynamics. Some of the kem.Q estimates may be 0. This means that the maximum-likelihood estimate is that the data are generated by a process with no environmental variation and only observation error.

The MARSS model fit also gives an estimate of the true population size with observation error removed. This is in kem$states. Figure 6.2 shows the estimated true states of the population over time as a solid line. Note that the solid line is considerably closer to the actual true states (dashed line) than the observations. On the other hand, with certain datasets, the estimates can be quite wrong as well!
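A sketch of how one might overlay the state estimates on a simulated series, producing a panel like those in Figure 6.2; kem, x, y, and years are from the code above:

plot(years, y, ylab="index of log abundance", bty="l")  # observations
lines(years, x, lty=2)                                  # true states
lines(years, kem$states[1,], lwd=2)                     # MARSS state estimates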
[Figure 6.2 about here: the same nine simulation panels as Figure 6.1, with the estimated states overlaid as solid lines.]

Fig. 6.2. The circles are the observed population sizes with error. The dashed lines are the true population sizes. The solid thin lines are the estimates of the true population size from the MARSS model. When the process error variance is 0, these lines are straight.

6.3.2 Model with no observation error

We used the MARSS model to estimate the mean population growth rate u and process variability σ² under the assumption that the count data have observation error. However, the classic approach to this problem, referred to as the "Dennis model" (Dennis et al., 1991), uses a model that assumes the data have no observation error (a MAR model); all the variability in the data is assumed to result from process error. This approach works well if the observation error in the data is low, but not so well if the observation error is high. We will next fit the data using the classic approach so that we can compare and contrast parameter estimates from the different methods.

Using the estimation method in Dennis et al. (1991), our data need to be re-specified as the observed population changes (delta.pop) between censuses along with the time between censuses (tau). We re-specify the data as follows:

den.years = years[!is.na(y)] # the non-missing years
den.y = y[!is.na(y)]         # the non-missing counts
den.n.y = length(den.years)
delta.pop = rep(NA, den.n.y-1) # population transitions
tau = rep(NA, den.n.y-1)       # step sizes
for (i in 2:den.n.y){
  delta.pop[i-1] = den.y[i] - den.y[i-1]
  tau[i-1] = den.years[i] - den.years[i-1]
} # end i loop

Next, we regress the changes in population size between censuses (delta.pop) on the time between censuses (tau) while setting the regression intercept to 0. The slope of the resulting regression line is an estimate of u, while the variance of the residuals around the line is an estimate of σ². The regression is shown in Figure 6.3. Here is the code to do that regression:

den91 <- lm(delta.pop ~ -1 + tau) # note: the "-1" specifies no intercept
den91.u = den91$coefficients
den91.Q = var(resid(den91))
#type ?lm to learn about the linear regression function in R
#form is lm(response ~ predictor1 + predictor2 + ...)
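#An illustrative aside (our addition, not part of the original code): when
#censuses are annual with no missing years, tau is all 1s, so den91.u
#reduces to mean(delta.pop) and den91.Q equals var(delta.pop), the variance
#of the one-step changes in log population size.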
#type summary(den91) to see other info about our regression fit

Here are the parameter values for the data in Figure 6.2 using the process-error only model:

              den91.U    den91.Q
sim 1    -0.057849407 0.12482394
sim 2    -0.003238396 0.07356204
sim 3    -0.076826139 0.14110386
sim 4    -0.023682404 0.11590874
sim 5    -0.041188033 0.18079387
sim 6    -0.039165343 0.17231437
sim 7    -0.108078066 0.12537368
sim 8    -0.054505097 0.09170650
sim 9    -0.057180717 0.11592826
mean sim -0.051301511 0.12683503
true     -0.050000000 0.02000000

Notice that the u estimates are similar to those from the MARSS model, but the σ² estimate (Q) is much larger. That is because this approach treats all the variance as process variance, so any observation variance in the data is lumped into process variance (in fact it appears as an additional variance of twice the observation variance). To see why, note that for annual censuses y_t − y_{t−1} = u + w_t + v_t − v_{t−1}, whose variance is σ² + 2η²; with the simulation values this predicts 0.02 + 2 × 0.05 = 0.12, close to the mean den91.Q above.

[Figure 6.3 about here: population transition size (delta.pop) plotted against time step size (tau), with the fitted zero-intercept regression line.]

Fig. 6.3. The regression of log(N_{t+τ}) − log(N_t) against τ. The slope is the estimate of u and the variance of the residuals is the estimate of σ².

Example 6.2 (The variability in parameter estimates)

In this example, you will look at how variable the parameter estimates are by generating multiple simulated data sets and then estimating parameter values for each. You'll compare the MARSS estimates to the estimates using a process-error only model (i.e., ignoring the observation error). Run the example code a few times to compare the estimates using a state-space model (kem) versus the model with no observation error (den91). You can copy and paste the code from the pdf file into R. Next, change the observation variance in the code, sim.R, in the data generation step in order to get a feel for the estimation performance as observations are further corrupted. What happens as observation error is increased? Next, decrease the number of years of data, nYr, and re-run the parameter estimation. What is the effect of fewer years of data? If you find that the example code takes too long to run, reduce the number of simulations (by reducing nsim in the code).
Example 6.2 code

sim.u = -0.05  # growth rate
sim.Q = 0.02   # process error variance
sim.R = 0.05   # non-process error variance
nYr = 50       # number of years of data to generate
fracmiss = 0.1 # fraction of years that are missing
init = 7       # log of initial pop abundance (~1100 individuals)
nsim = 9
years = seq(1:nYr) # col of years
params = matrix(NA, nrow=(nsim+2), ncol=5,
  dimnames=list(c(paste("sim",1:nsim),"mean sim","true"),
  c("kem.U","den91.U","kem.Q","kem.R","den91.Q")))
x.ts = matrix(NA,nrow=nsim,ncol=nYr) # ts w/o measurement error
y.ts = matrix(NA,nrow=nsim,ncol=nYr) # ts w/ measurement error
for(i in 1:nsim){
  x.ts[i,1] = init
  for(t in 2:nYr){
    x.ts[i,t] = x.ts[i,t-1] + sim.u + rnorm(1,mean=0,sd=sqrt(sim.Q))}
  for(t in 1:nYr){
    y.ts[i,t] = x.ts[i,t] + rnorm(1,mean=0,sd=sqrt(sim.R))}
  missYears = sample(years[2:(nYr-1)], floor(fracmiss*nYr), replace=FALSE)
  y.ts[i,missYears] = NA

  #MARSS estimates
  kem = MARSS(y.ts[i,], silent=TRUE)
  #type="vector" outputs the estimates as a vector instead of a list
  params[i,c(1,3,4)] = coef(kem,type="vector")[c(2,3,1)]

  #Dennis et al. 1991 estimates
  den.years = years[!is.na(y.ts[i,])] # the non-missing years
  den.yts = y.ts[i,!is.na(y.ts[i,])]  # the non-missing counts
  den.n.yts = length(den.years)
  delta.pop = rep(NA, den.n.yts-1) # transitions
  tau = rep(NA, den.n.yts-1)       # time step lengths
  for (t in 2:den.n.yts){
    delta.pop[t-1] = den.yts[t] - den.yts[t-1] # transitions
    tau[t-1] = den.years[t] - den.years[t-1]   # time step length
  } # end t loop
  den91 <- lm(delta.pop ~ -1 + tau) # -1 specifies no intercept
  params[i,c(2,5)] = c(den91$coefficients, var(resid(den91)))
}
params[nsim+1,] = apply(params[1:nsim,],2,mean)
params[nsim+2,] = c(sim.u,sim.u,sim.Q,sim.R,sim.Q)

Here is an example of the output from the Example 6.2 code:

print(params,digits=3)
           kem.U den91.U   kem.Q  kem.R den91.Q
sim 1    -0.0287 -0.0384 0.02118 0.0680  0.1486
sim 2    -0.0635 -0.0669 0.02218 0.0516  0.1304
sim 3    -0.0206 -0.0365 0.03940 0.0514  0.1571
sim 4    -0.0410 -0.0438 0.00000 0.0753  0.1300
sim 5    -0.0457 -0.0278 0.03530 0.0392  0.1216
sim 6    -0.0642 -0.0875 0.01841 0.0292  0.0772
sim 7    -0.0765 -0.0838 0.02101 0.0512  0.1343
sim 8    -0.0371 -0.0300 0.00336 0.0497  0.0829
sim 9    -0.0338 -0.0443 0.00957 0.0750  0.1522
mean sim -0.0457 -0.0510 0.01893 0.0545  0.1260
true     -0.0500 -0.0500 0.02000 0.0500  0.0200

6.4 Probability of hitting a threshold Π(x_d, t_e)

A common extinction risk metric is 'the probability that a population will hit a certain threshold x_d within a certain time frame t_e – if the observed trends continue'. In practice, the threshold used is not Ne = 1, which would be true extinction. Often a 'functional' extinction threshold will be used (Ne >> 1). Other times a threshold representing some fraction of current levels is used. The latter is used because we often have imprecise information about the relationship between the true population size and what we measure in the field; that is, many population counts are index counts. In these cases, one must use 'fractional declines' as the threshold. Also, extinction estimates that use an absolute threshold (like 100 individuals) are quite sensitive to error in the estimate of true population size. Here, we are going to use fractional declines as the threshold, specifically pd = 0.1, which means a 90% decline.

The probability of hitting a threshold, denoted Π(x_d, t_e), is typically presented as a curve showing the probabilities of hitting the threshold (y-axis) over different time horizons (t_e) on the x-axis.
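To make the fractional-decline threshold concrete, here is a small illustrative computation (our addition) of the log-scale distance to the threshold, x_d = −log(pd), for a few decline levels:

pd = c(0.5, 0.1, 0.01) # 50%, 90%, and 99% declines
xd = -log(pd)          # distance to the threshold on the log scale
round(xd, 2)           # 0.69 2.30 4.61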
Extinction probabilities can be computed through Monte Carlo simulations or analytically using Equation 16 in Dennis et al. (1991) (note there is a typo in Equation 16; the last + is supposed to be a −). We will use the latter method:

\[
\Pi(x_d, t_e) = \pi(u)\,\Phi\!\left(\frac{-x_d + |u|\,t_e}{\sqrt{\sigma^2 t_e}}\right) + \exp\!\left(\frac{2 x_d |u|}{\sigma^2}\right)\Phi\!\left(\frac{-x_d - |u|\,t_e}{\sqrt{\sigma^2 t_e}}\right) \tag{6.3}
\]

where x_d is the threshold on the log scale and is defined as x_d = log(N0/Ne). N0 is the current population estimate and Ne is the threshold. If we are using fractional declines, then x_d = log(N0/(pd × N0)) = −log(pd). π(u) is the probability that the threshold is eventually hit (by t_e = ∞): π(u) = 1 if u ≤ 0 and π(u) = exp(−2 u x_d/σ²) if u > 0. Φ() is the cumulative probability distribution of the standard normal (mean = 0, sd = 1).

Here is the R code for that computation (u and Q hold the estimated growth rate and process variance):

pd = 0.1 #means a 90 percent decline
tyrs = 1:100
xd = -log(pd)
Pi = rep(NA,100) #vector to hold the probabilities
p.ever = ifelse(u<=0, 1, exp(-2*u*xd/Q)) #Q=sigma2
for (i in 1:100){
  Pi[i] = p.ever * pnorm((-xd+abs(u)*tyrs[i])/sqrt(Q*tyrs[i])) +
    exp(2*xd*abs(u)/Q) * pnorm((-xd-abs(u)*tyrs[i])/sqrt(Q*tyrs[i]))
}

Figure 6.4 shows the estimated probabilities of hitting the 90% decline for the nine 30-year time series simulated with u = −0.05, σ² = 0.01 and η² = 0.05. The dashed line shows the estimates using the MARSS parameter estimates and the solid line shows the estimates using a process-error only model (the den91 estimates). The circles are the true probabilities. The difference between the estimates and the true probabilities is due to errors in û. Those errors are due largely to process error, not observation error. As we saw earlier, by chance some population trajectories with u < 0 will increase, even over a 50-year period. In this case, û will be positive when in fact u < 0.

Looking at the figure, it is obvious that the probability estimates are highly variable. However, look at the first panel, which shows the average estimate over the nine simulations: on average, the estimates are good. If we had averaged over 1000 simulations instead of nine, you would see that the MARSS line falls on the true line; it is an unbiased predictor. While that may seem a small consolation if estimates for individual simulations are all over the map, it is important for correctly specifying our uncertainty about our estimates. Second, rather than focusing on how the estimates and true lines match up, see if there are any types of forecasts that seem better than others. For example, are 20-year predictions better than 50-year ones, and are 100-year forecasts better or worse? In Example 6.3, you will remake this figure with different u. You'll discover from that exercise that forecasts are more certain for populations that are declining faster.

Example 6.3 (The effect of parameter values on risk estimates)

In this example, you will recreate Figure 6.4 using different parameter values.
[Figure 6.4 about here: nine panels ("average over sims" and "simulation 1" through "simulation 8") of probability of extinction versus time steps into future; legend: True (circles), Dennis (solid), KalmanEM (dashed).]

Fig. 6.4. Plot of the true and estimated probability of declining 90% in different time horizons for nine simulated population time series with observation error. The plot may look like a step-function if the σ² estimate is very small (<1e-4 or so).

This will give you a feel for how variability in the data and population process affects the risk estimates. You'll need to run the Example 6.2 code before running the Example 6.3 code. Begin by changing sim.R and rerunning the Example 6.2 code. Now run the Example 6.3 code and generate parameter estimates. When are the estimates using the process-error only model (den91) worse, and in what way are they worse? You might imagine that you should always use a model that includes observation error, since in practice observations are never perfect.
However, there is a cost to estimating that extra variance parameter, and the cost is a more variable σ² (Q) estimate. Play with shortening the time series and decreasing the sim.R values. Are there situations when the 'cost' of the extra parameter is greater than the 'cost' of ignoring observation error?

Next, change the rate of decline in the simulated data. To do this, rerun the Example 6.2 code using a lower sim.u; then run the Example 6.3 code. Do the estimates seem better or worse for rapidly declining populations? Rerun the Example 6.2 code using fewer years of data (nYr smaller) and increase fracmiss. Run the Example 6.3 code again. The graphs will start to look peculiar. Why do you think that is happening? Hint: look at the estimated parameters.

Last, change the extinction threshold (pd in the Example 6.3 code). How does changing the extinction threshold change the extinction probability curves? Do not remake the data, i.e., don't rerun the Example 6.2 code.

Example 6.3 code

#Needs the Example 6.2 code to be run first
par(mfrow=c(3,3))
pd = 0.1; xd = -log(pd) # decline threshold
te = 100; tyrs = 1:te   # extinction time horizon
for(j in c(10,1:8)){
  real.ex = denn.ex = kal.ex = matrix(nrow=te)

  #MARSS parameter estimates
  u = params[j,1]; Q = params[j,3]
  if(Q==0) Q = 1e-4 #just so the extinction calc doesn't choke
  p.ever = ifelse(u<=0, 1, exp(-2*u*xd/Q))
  for (i in 1:100){
    if(is.finite(exp(2*xd*abs(u)/Q))){
      sec.part = exp(2*xd*abs(u)/Q) * pnorm((-xd-abs(u)*tyrs[i])/sqrt(Q*tyrs[i]))
    }else sec.part = 0
    kal.ex[i] = p.ever*pnorm((-xd+abs(u)*tyrs[i])/sqrt(Q*tyrs[i])) + sec.part
  } # end i loop

  #Dennis et al. 1991 parameter estimates
  u = params[j,2]; Q = params[j,5]
  p.ever = ifelse(u<=0, 1, exp(-2*u*xd/Q))
  for (i in 1:100){
    denn.ex[i] = p.ever*pnorm((-xd+abs(u)*tyrs[i])/sqrt(Q*tyrs[i])) +
      exp(2*xd*abs(u)/Q)*pnorm((-xd-abs(u)*tyrs[i])/sqrt(Q*tyrs[i]))
  } # end i loop

  #True parameter values
  u = sim.u; Q = sim.Q
  p.ever = ifelse(u<=0, 1, exp(-2*u*xd/Q))
  for (i in 1:100){
    real.ex[i] = p.ever*pnorm((-xd+abs(u)*tyrs[i])/sqrt(Q*tyrs[i])) +
      exp(2*xd*abs(u)/Q)*pnorm((-xd-abs(u)*tyrs[i])/sqrt(Q*tyrs[i]))
  } # end i loop

  #plot it
  plot(tyrs, real.ex, xlab="time steps into future",
    ylab="probability of extinction", ylim=c(0,1), bty="l")
  if(j<=8) title(paste("simulation ",j))
  if(j==10) title("average over sims")
  lines(tyrs, denn.ex, type="l", col="red", lwd=2, lty=1)
  lines(tyrs, kal.ex, type="l", col="green", lwd=2, lty=2)
}
legend("bottomright", c("True","Dennis","KalmanEM"), pch=c(1,-1,-1),
  col=c(1,2,3), lty=c(-1,1,2), lwd=c(-1,2,2), bty="n")

6.5 Certain and uncertain regions

From Example 6.3, you have observed one of the problems with estimates of the probability of hitting thresholds. Looking over the nine simulations, your risk estimates will be on the true line sometimes and other times they are way off. So your estimates are variable, and one should not present only the point estimates of the probability of 90% decline. At the minimum, confidence intervals need to be added (next section), but even with confidence intervals, a plot of the probability of hitting declines often does not capture our certainty and uncertainty about extinction risk estimates. From Example 6.3, you might have also noticed that there are some time horizons (10, 20 years) for which the estimates are highly certain (the threshold is never hit), while for other time horizons (30, 50 years) the estimates are all over the map.
Put another way, you may be able to say with high confidence that a 90% decline will not occur between years 1 to 20 and that by year 100 it most surely will have occurred. However, between years 20 and 100, you are very uncertain about the risk. The point is that you can be certain about some forecasts while at the same time being uncertain about other forecasts.

One way to show this is to plot the uncertainty as a function of the forecast, where the forecast is defined in terms of the forecast length (number of years) and forecasted decline (percentage). Uncertainty is defined as how much of the 0–1 range your 95% confidence interval covers. Ellner and Holmes (2008) show such a figure (their Figure 1). Figure 6.5 shows a version of this figure that you can produce with the function CSEGtmufigure(u=val, N=val, s2p=val). For the figure, the values u = −0.05 (a 5% per year decline), N = 50 (50 years between the first and last census), and s2p = 0.02 are used. The process variability for big mammals is typically in the range of 0.002 to 0.02.

Example 6.4 (Uncertain and certain regions)

Use the Example 6.4 code to re-create Figure 6.5 and get a feel for when risk estimates are more certain and when they are less certain. N is the number of years of data, u is the mean population growth rate, and s2p is the process variance.

Example 6.4 code

par(mfrow = c(1, 1))
CSEGtmufigure(N = 50, u = -0.05, s2p = 0.02)

[Figure 6.5 about here: regions of certainty and uncertainty plotted as x_d = log10(N0/Ne) versus projection interval T, with 50%, 90%, and 99% decline lines; title: time steps = 50, mu = −0.05, s2.p = 0.02.]

Fig. 6.5. This figure shows your region of high uncertainty (dark gray). In this region, the minimum 95% confidence intervals (meaning if you had no observation error) span 80% of the 0 to 1 probability. That is, you are uncertain if the probability of a specified decline is close to 0 or close to 1. The white area shows where your upper 95% CI does not exceed P=0.05, so you are quite sure the probability of a specified decline is less than 0.05. The black area shows where your lower 95% CI is above P=0.95, so you are quite sure the probability is greater than P=0.95. The light gray is between these two certain/uncertain extremes.

6.6 More risk metrics and some real data

The previous sections have focused on the probability of hitting thresholds because this is an important and common risk metric used in population viability analysis and it appears in IUCN Red List criteria. However, as you have seen, there is high uncertainty associated with such estimates. Part of the problem is that probability is constrained to be 0 to 1, and it is easy to get estimates with confidence intervals that span 0 to 1. Other metrics of risk, û and the distribution of the time to hit a threshold (Dennis et al., 1991), do not have this problem and may be more informative. Figure 6.6 shows different risk metrics from Dennis et al. (1991) on a single plot. This figure is generated by a call to the function CSEGriskfigure():

dat = read.table(datafile, skip=1)
dat = as.matrix(dat)
CSEGriskfigure(dat)

The datafile is the name of the data file, with years in column 1 and population count (logged) in column 2. CSEGriskfigure() has a number of arguments that can be passed in to change the default behavior.
The argument te is the forecast length (default is 100 years). The argument threshold is the extinction threshold, either as an absolute number, if absolutethresh=TRUE, or as a fraction of current population count, if absolutethresh=FALSE. The default is absolutethresh=FALSE and threshold=0.1. datalogged=TRUE means the data are already logged; this is the default.

Example 6.5 (Risk figures for different species)

Use the Example 6.5 code to re-create Figure 6.6. The package includes other data for you to run: prairiechicken from the endangered Attwater Prairie Chicken, graywhales from Gerber et al. (1999), and grouse from the Sharptailed Grouse (a species of U.S. federal concern) in Washington State. Note that for some of these other datasets, the Hessian matrix cannot be inverted and you will need to use CI.method="parametric". If you have other text files of data, you can run those too. The commented lines show how to read in data from a tab-delimited text file with a header line.

Example 6.5 code

#If you have your data in a tab delimited file with a header,
#this is how you would read it in using file.choose()
#to call up a directory browser.
#However, the package has the datasets for the examples.
#dat=read.table(file.choose(), skip=1)
#dat=as.matrix(dat)
dat = wilddogs
CSEGriskfigure(dat, CI.method="hessian", silent=TRUE)

6.7 Confidence intervals

The figures produced by CSEGriskfigure() have confidence intervals (95% and 75%) on the probabilities in the top right panel. A standard way to produce these intervals is via parametric bootstrapping.

[Figure 6.6 about here: six panels of risk metrics: the population estimates ("u est = −0.054 (95% CIs NULL, NULL), Q est = 0.052"), the probability of hitting the threshold with 95% and 75% CIs, the PDF of the time to threshold given it is reached, the probability of hitting the threshold in 100 time steps versus Ne, sample projections, and the certain/uncertain regions plot (time steps = 22, mu = −0.054, s2.p = 0.052).]

Fig. 6.6. Risk figure using data for the critically endangered African Wild Dog (data from Ginsberg et al. 1995). This population went extinct after 1992.

Here are the steps in a parametric bootstrap:

- You estimate u, σ² and η².
- Then you simulate time series using those estimates and Equations 6.1 and 6.2.
- Then you re-estimate your parameters from the simulated data (using, say, MARSS(simdata)).
- Repeat for 1000s of time series simulated using your estimated parameters. This gives you a large set of bootstrapped parameter estimates.
- For each bootstrapped parameter set, compute a set of extinction estimates (using Equation 6.3 and code from Example 6.3).
- The α% ranges on those bootstrapped extinction estimates give you your α confidence intervals on your probabilities of hitting thresholds.

The MARSS package provides the function MARSSparamCIs() to add bootstrapped confidence intervals to fitted models (type ?MARSSparamCIs to learn about the function). In the function CSEGriskfigure(), you can set CI.method = c("hessian", "parametric", "innovations", "none") to tell it how to compute the confidence intervals.
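As a minimal usage sketch (our addition), assuming kem is a fitted model object returned by MARSS() as in the earlier examples:

kem.CIs = MARSSparamCIs(kem, method="hessian") #fast, approximate CIs
kem.CIs #print the parameter estimates along with their CIs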
The methods 'parametric' and 'innovations' specify parametric and non-parametric bootstrapping, respectively. Producing parameter estimates by bootstrapping is quite slow. Approximate confidence intervals on the parameters can be generated rapidly using the inverse of a numerically estimated Hessian matrix (method 'hessian'). This uses an estimate of the variance-covariance matrix of the parameters (the inverse of the Hessian matrix). Using an estimated Hessian matrix to compute confidence intervals is a handy trick that can be used for all sorts of maximum-likelihood parameter estimates.

6.8 Comments

Data with cycles, from age-structure or predator-prey interactions, are difficult to analyze, and the EM algorithm used in the MARSS package will give poor estimates for this type of data. The slope method (Holmes, 2001) is more robust to those problems. Holmes et al. (2007) used the slope method in a large study of data from endangered and threatened species, and Ellner and Holmes (2008) showed that the slope estimates are close to the theoretical minimum uncertainty. Especially when doing a population viability analysis using a time series with fewer than 25 years of data, the slope method is often less biased and (much) less variable because that method is less data-hungry (Holmes, 2004). However, the slope method is not a true maximum-likelihood method and thus constrains the types of further analyses you can do (such as model selection).

7 Combining multi-site data to estimate regional population trends

7.1 Harbor seals in the Puget Sound, WA

In this application, we will use multivariate state-space models to combine surveys from multiple regions (or sites) into one estimate of the average long-term population growth rate and the year-to-year variability in that growth rate. Note this is not quite the same as estimating the 'trend'; 'trend' often means what population change happened, whereas the long-term population growth rate refers to the underlying population dynamics. We will use as our example a dataset from harbor seals in Puget Sound, Washington, USA.

We have five regions (or sites) where harbor seals were censused from 1978–1999 while hauled out on land¹. During the period of this dataset, harbor seals were recovering steadily after having been reduced to low levels by hunting prior to protection. The methodologies were consistent throughout the 20 years of the data, but we do not know what fraction of the population each region represents, nor do we know the observation-error variance for each region. Given differences between behaviors of animals in different regions and the numbers of haul-outs in each region, the observation errors may be quite different. The regions have had different levels of sampling; the best sampled region has only 4 years missing while the worst has over half the years missing (Figure 7.1).

We will assume that the underlying population process is a stochastic exponential growth process with rates of increase that were not changing through 1978–1999. However, we are not sure if all five regions sample a single "total Puget Sound" population or if there are independent subpopulations.

¹ Type RShowDoc("Chapter_SealTrend.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter. The data are from Jeffries et al. 2003. Trends and status of harbor seals in Washington State: 1978–1999. Journal of Wildlife Management 67(1):208–219.

We will estimate the long-term population growth rate using different assumptions
about the population structures (one big population versus multiple smaller ones) and observation error structures to see how different assumptions change the trend estimates.

[Figure 7.1 about here: "Puget Sound Harbor Seal Surveys", log(counts) versus year (1978–1999) for the five regions, each line labeled by region number.]

Fig. 7.1. Plot of the count data from the five harbor seal regions (Jeffries et al. 2003). The numbers on each line denote the different regions: 1) Strait of Juan de Fuca (SJF), 2) San Juan Islands (SJI), 3) Eastern Bays (EBays), 4) Puget Sound (PSnd), and 5) Hood Canal (HC). Each region is an index of the total harbor seal population, but the bias (the difference between the index and the true population size) for each region is unknown.

The harbor seal data are included in the MARSS package. The data have time running down the rows and years in the first column. We need time across the columns for the MARSS() function, so we will transpose the data:

dat = t(harborSealWA)   #Transpose
years = dat[1,]         #[1,] means row 1
n = nrow(dat)-1
dat = dat[2:nrow(dat),] #no years

If you needed to read data in from a comma-delimited or tab-delimited file, these are the commands to do that:

dat = read.csv("datafile.csv",header=TRUE)
dat = read.table("datafile.csv",header=TRUE)

The years are in column 1 of dat and the logged data are in the rest of the columns. The number of observation time series (n) is the number of rows in dat minus 1 (for the years row). Let's look at the first few years of data:

print(harborSealWA[1:8,], digits=3)
     Year  SJF  SJI EBays PSnd  HC
[1,] 1978 6.03 6.75  6.63 5.82 6.6
[2,] 1979   NA   NA    NA   NA  NA
[3,] 1980   NA   NA    NA   NA  NA
[4,] 1981   NA   NA    NA   NA  NA
[5,] 1982   NA   NA    NA   NA  NA
[6,] 1983 6.78 7.43  7.21   NA  NA
[7,] 1984 6.93 7.74  7.45   NA  NA
[8,] 1985 7.16 7.53  7.26 6.60  NA

The NA's in the data are missing values.

7.1.1 A MARSS model for Puget Sound harbor seals

The first step is to mathematically specify the population structure and how the regions relate to that structure. The general state-space model is

\[
\mathbf{x}_t = \mathbf{B}\mathbf{x}_{t-1} + \mathbf{u} + \mathbf{w}_t, \quad \text{where } \mathbf{w}_t \sim \mathrm{MVN}(\mathbf{0}, \mathbf{Q})
\]
\[
\mathbf{y}_t = \mathbf{Z}\mathbf{x}_t + \mathbf{a} + \mathbf{v}_t, \quad \text{where } \mathbf{v}_t \sim \mathrm{MVN}(\mathbf{0}, \mathbf{R})
\]

where all the bolded symbols are matrices. To specify the structure of the population and observations, we will specify what those matrices look like.

7.2 A single well-mixed population with i.i.d. errors

When we are looking at data over a large geographic region, we might make the assumption that the different census regions are measuring a single population if we think animals are moving sufficiently such that the whole area (multiple regions together) is "well-mixed". We write a model of the total population abundance for this case as:

\[
n_t = \exp(u + w_t)\, n_{t-1}, \tag{7.1}
\]

where n_t is the total count in year t, u is the mean population growth rate, and w_t is the deviation from that average in year t. We then take the log of both sides and write the model in log space:

\[
x_t = x_{t-1} + u + w_t, \quad \text{where } w_t \sim \mathrm{N}(0, q), \tag{7.2}
\]

where x_t = log n_t. When there is one effective population, there is one x, therefore x_t is a 1 × 1 matrix. There is one population growth rate (u) and there is one process variance (q). Thus u and Q are 1 × 1 matrices.
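For a quick first look at the five time series, a minimal plotting sketch (our addition; Figure 7.1 shows a more polished version produced by the chapter's script file):

dat = t(harborSealWA)
matplot(dat[1,], t(dat[2:nrow(dat),]), type="b", pch=as.character(1:5),
  col=1, lty=1, xlab="", ylab="log(counts)")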
7.2.1 The observation process

We assume that all five regional time series are observations of this one population trajectory but that they are scaled up or down relative to that trajectory. In effect, we think that animals are moving around a lot and our regional samples are some fraction of the population. There is year-to-year variation in the fraction in each region, just by chance. Notice that under this analysis, we do not think the regions represent independent subpopulations but rather independent observations of one population. Our model for the data, y_t = Z x_t + a + v_t, is written as:

\[
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} x_t + \begin{bmatrix} 0 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t \tag{7.3}
\]

Each y_i is the time series for a different region. The a's are the bias between the regional sample and the total population; they are scaling (or intercept-like) parameters². We allow that each region could have a unique observation variance and that the observation errors are independent between regions. Lastly, we assume that the observation errors on log(counts) are normal and thus that the errors on (counts) are log-normal³.

For our first analysis, we assume that the observation variance is equal across regions but the errors are independent. This means we estimate one observation variance instead of five. This is a fairly standard assumption for data that come from a uniform survey methodology⁴. We specify independent observation errors with identical variances by specifying that the v's come from a multivariate normal distribution with variance-covariance matrix R (v ∼ MVN(0, R)), where

\[
\mathbf{R} = \begin{bmatrix} r & 0 & 0 & 0 & 0 \\ 0 & r & 0 & 0 & 0 \\ 0 & 0 & r & 0 & 0 \\ 0 & 0 & 0 & r & 0 \\ 0 & 0 & 0 & 0 & r \end{bmatrix} \tag{7.4}
\]

² To get rid of the a's, we scale multiple observation time series against each other; thus one a will be fixed at 0. Estimating the bias between regional indices and the total population is important for getting an estimate of the total population size. The type of time-series analysis that we are doing here (trend analysis) is not useful for estimating a's. Instead, to get a's one would need some type of mark-recapture data. However, for trend estimation, the a's are not important. The regional observation variance captures increased variance due to a regional estimate being a smaller sample of the total population.
³ The assumption of normality is not unreasonable since these regional counts are the sum of counts across multiple haul-outs.
⁴ By the way, this is not a good assumption for these data since the number of haul-outs in each region varies and the regional counts are the sums across all haul-outs in a region. We will change this assumption in the next fit and see that the AIC values decline.

Z specifies which observation time series, y_{i,1:T}, is associated with which population trajectory, x_{j,1:T}. Z is like a look-up table with 1 row for each of the n observation time series and 1 column for each of the m population trajectories. A 1 in row i, column j means that observation time series i is measuring state process j; otherwise the value Z_{ij} = 0. Since we have only 1 population trajectory, all the regions must be measuring that one population trajectory. Thus Z is n × 1:

\[
\mathbf{Z} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \tag{7.5}
\]

7.2.2 Fitting the model

We have specified the mathematical form of our state-space model. The next step is to fit this model with MARSS(). The function call will now look like:

kem1 = MARSS(dat, model=list(Z=Z.model, R=R.model))

The model list argument tells the MARSS() function the model structure, i.e. the form of Z, u, Q, etc.
For our first analysis, we only need to set the model structure for Z and R. Since there is only one population, there is only one u and Q (they are scalars), so they have no 'structure'.

First we specify the Z matrix. We need to tell the MARSS function that Z is a 5 × 1 matrix of 1s (as in Equation 7.3). We can do this two ways. We can pass in Z.model as a matrix of ones, matrix(1,5,1), just like in Equation (7.3), or we can pass in a vector of five factors, factor(c(1,1,1,1,1)). The i-th factor specifies which population trajectory the i-th observation time series belongs to. Since there is only one population trajectory in this first analysis, we will have a vector of five 1's: every observation time series is measuring the first, and only, population trajectory.

Z.model = factor(c(1,1,1,1,1))

Note, the vector (the c() bit) must be wrapped in factor() so that MARSS recognizes what it is. You can use either numeric or character vectors: c(1,1,1,1,1) is the same as c("PS","PS","PS","PS","PS").

Next we specify that the R variance-covariance matrix only has terms on the diagonal (the variances) with the off-diagonal terms (the covariances) equal to zero:

R.model = "diagonal and equal"

The 'and equal' part specifies that the variances on the diagonal are forced to be the same at all regions. If we wanted to allow each region a unique observation variance, we would use "diagonal and unequal".

[Figure 7.2 about here: the five observation time series (numbered points) with the estimated total population (solid line) and its 95% confidence intervals (dashed lines); title: "Observations and total population estimate".]

Fig. 7.2. Plot of the estimate of "log total harbor seals in Puget Sound". The estimate of the total count has been scaled relative to the first time series. The 95% confidence intervals on the population estimates are the dashed lines. These are not the confidence intervals on the observations, and the observations (the numbers) will not fall between the confidence interval lines.

Code 7.2 shows you how to fit the single population model (Equations 7.2 and 7.3) to the harbor seal data.

Code 7.2

#Code to fit the single population model with i.i.d. errors
#Read in data
dat = t(harborSealWA) #Transpose since MARSS needs time ACROSS columns
years = dat[1,]
n = nrow(dat)-1
dat = dat[2:nrow(dat),]
legendnames = (unlist(dimnames(dat)[1]))

#estimate parameters
Z.model = factor(c(1,1,1,1,1))
R.model = "diagonal and equal"
kem1 = MARSS(dat, model=list(Z=Z.model, R=R.model))

#make figure
matplot(years, t(dat), xlab="", ylab="index of log abundance",
  pch=c("1","2","3","4","5"), ylim=c(5,9), bty="L")
lines(years, kem1$states-1.96*kem1$states.se, type="l",
  lwd=1, lty=2, col="red")
lines(years, kem1$states+1.96*kem1$states.se, type="l",
  lwd=1, lty=2, col="red")
lines(years, kem1$states, type="l", lwd=2)
title("Observations and total population estimate", cex.main=.9)

coef(kem1, type="vector") #show the estimated parameter elements as a vector
coef(kem1)  #show estimated elements for each parameter matrix as a list
kem1$logLik #show the log-likelihood
kem1$AIC    #show the AIC

7.2.3 The MARSS() output

The output from MARSS(), here assigned the name kem1, is a list of objects:

names(kem1)

The maximum-likelihood estimates of "total harbor seal population" scaled to the first observation data series (Figure 7.2) are in kem1$states, and kem1$states.se are the standard errors on those estimates.
To get 95% confidence intervals, use kem1$states +/- 1.96*kem1$states.se. Figure 7.2 shows a plot of kem1$states with its 95% confidence intervals over the data. Because kem1$states has been scaled relative to the first time series, it is on top of that time series. One of the biases (the a's) cannot be estimated, and our algorithm arbitrarily chooses a1 = 0, so the population estimate is scaled to the first observation time series.

The estimated parameters are output with the function coef: coef(kem1). To get the estimate just for U, which is the estimated long-term population growth rate, use coef(kem1)$U. Multiply by 100 to get the percent increase per year. The estimated process variance is given by coef(kem1)$Q.

The log-likelihood of the fitted model is in kem1$logLik. We estimated one initial x (t = 1), one process variance, one u, four a's, and one observation variance. So K = 8 parameters. The AIC of this model is −2 × log-likelihood + 2K, which we can show by typing kem1$AIC.

7.3 Single population model with independent and non-identical errors

In our first model we estimated a single observation variance, r, that applied to all observation time series. We might be able to improve the fit (at the cost of more parameters) by assuming that the observation variance is different across regions while the errors are still independent. This means we estimate five observation variances instead of one. In this case, R has the form:

\[
\mathbf{R} = \begin{bmatrix} r_1 & 0 & 0 & 0 & 0 \\ 0 & r_2 & 0 & 0 & 0 \\ 0 & 0 & r_3 & 0 & 0 \\ 0 & 0 & 0 & r_4 & 0 \\ 0 & 0 & 0 & 0 & r_5 \end{bmatrix} \tag{7.6}
\]

To impose this model, we set the R model to

R.model = "diagonal and unequal"

This tells MARSS that all the r's along the diagonal in R are different. To fit this model to the data, call MARSS() as:

Z.model = factor(c(1,1,1,1,1))
R.model = "diagonal and unequal"
kem2 = MARSS(dat, model=list(Z=Z.model, R=R.model))

Here is the estimated R matrix for this second model:

coef(kem2,type="matrix")$R
           [,1]       [,2]       [,3]       [,4]      [,5]
[1,] 0.03229417 0.00000000 0.00000000 0.00000000 0.0000000
[2,] 0.00000000 0.03527748 0.00000000 0.00000000 0.0000000
[3,] 0.00000000 0.00000000 0.01352073 0.00000000 0.0000000
[4,] 0.00000000 0.00000000 0.00000000 0.01082157 0.0000000
[5,] 0.00000000 0.00000000 0.00000000 0.00000000 0.1960897

Notice that the variances along the diagonal are now all different; each observation time series has its own variance. For this model we estimated one initial x, one process variance, one u, four a's, and five observation variances. So K = 12 parameters. The AIC for this new model compared to the old model with one observation variance is:

c(kem1$AIC,kem2$AIC)
[1]  8.813447 -9.323982

A smaller AIC means a better model. The difference between the one-observation-variance model and the unique-observation-variances model is >10, suggesting that the unique observation variances model is better.

One of the key diagnostics when you are comparing fits from multiple models is whether the model is flexible enough to fit the data. This can be checked by looking for temporal trends in the residuals between the estimated population states (e.g. kem2$states) and the data. In Figure 7.3, the residuals for the second analysis are shown. Ideally, these residuals should not have a temporal trend. They should look cloud-like. The fact that the residuals have a strong temporal trend is an indication that our one-population model is too restrictive for the data⁵. Code 7.3 shows you how to fit the second model and make the diagnostics plot.

⁵ When comparing models via AIC, it is important that you only compare models that are flexible enough to fit the data. Fortunately, if you neglect to do this, the inadequate models will usually have very high AICs and fall out of the mix anyhow.
[Figure 7.3 about here: residuals versus index for each of the five regions (SJF, SJI, EBays, PSnd, HC).]

Fig. 7.3. Residuals for the model with a single population. The plots of the residuals should not have trends with time, but they do. This is an indication that the single population model is inconsistent with the data. The code to make this plot is given in the script file for this chapter.

Code 7.3

#Code to fit the single population model with independent and unequal errors
Z.model = factor(c(1,1,1,1,1))
R.model = "diagonal and unequal"
kem2 = MARSS(dat, model=list(Z=Z.model, R=R.model))
coef(kem2)  #the estimated parameter elements
kem2$logLik #log likelihood
c(kem1$AIC,kem2$AIC) #AICs

#plot residuals
plotdat = t(dat)
matrix.of.biases = matrix(coef(kem2, type="matrix")$A,
  nrow=nrow(plotdat), ncol=ncol(plotdat), byrow=T)
xs = matrix(kem2$states,
  nrow=dim(plotdat)[1], ncol=dim(plotdat)[2], byrow=F)
resids = plotdat-matrix.of.biases-xs
par(mfrow=c(2,3))
for(i in 1:n){
  plot(resids[!is.na(resids[,i]),i], ylab="residuals")
  title(legendnames[i])
}
par(mfrow=c(1,1))

7.4 Two subpopulations, north and south

For the third analysis, we will change our assumption about the structure of the population. We will assume that there are two subpopulations, north and south, and that regions 1 and 2 (Strait of Juan de Fuca and San Juan Islands) fall in the north subpopulation and regions 3, 4 and 5 fall in the south subpopulation. For this analysis, we will assume that these two subpopulations share their growth parameter, u, and process variance, q, since they share a similar environment and prey base. However, we postulate that because of fidelity to natal rookeries for breeding, animals do not move much year-to-year between the north and south, and the two subpopulations are independent.

We need to write down the state-space model to reflect this population structure. There are two subpopulations, x_n and x_s, and they have the same growth rate u:

\[
\begin{bmatrix} x_n \\ x_s \end{bmatrix}_t = \begin{bmatrix} x_n \\ x_s \end{bmatrix}_{t-1} + \begin{bmatrix} u \\ u \end{bmatrix} + \begin{bmatrix} w_n \\ w_s \end{bmatrix}_t \tag{7.7}
\]

We specify that they are independent by specifying that their year-to-year population fluctuations (their process errors) come from a multivariate normal with no covariance:

\[
\begin{bmatrix} w_n \\ w_s \end{bmatrix}_t \sim \mathrm{MVN}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} q & 0 \\ 0 & q \end{bmatrix}\right) \tag{7.8}
\]

For the observation process, we use the Z matrix to associate the regions with their respective x_n and x_s values:

\[
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_n \\ x_s \end{bmatrix}_t + \begin{bmatrix} 0 \\ a_2 \\ 0 \\ a_4 \\ a_5 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t \tag{7.9}
\]

7.4.1 Specifying the model elements

We need to change the Z specification to indicate that there are two subpopulations (north and south), and that regions 1 and 2 are in the north subpopulation and regions 3, 4 and 5 are in the south subpopulation. There are a few ways we can specify this Z matrix for MARSS():

Z.model = matrix(c(1,1,0,0,0,0,0,1,1,1),5,2)
Z.model = factor(c(1,1,2,2,2))
Z.model = factor(c("N","N","S","S","S"))

Which you choose is a matter of preference, as they all specify the same form for Z.
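As a small illustrative check (our addition), you can print the matrix form to confirm it encodes the same region-to-subpopulation mapping as the factor() shortcuts:

Z.matrix = matrix(c(1,1,0,0,0,0,0,1,1,1),5,2)
Z.matrix #rows 1-2 have their 1 in column 1 (north); rows 3-5 in column 2 (south)
#factor(c("N","N","S","S","S")) says the same thing: the i-th level names
#the column of Z in which row i has its 1.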
We also want to specify that the u's are the same for each subpopulation and that Q is diagonal with equal q's. To do this, we set

U.model = "equal"
Q.model = "diagonal and equal"

This says that there is one u and one q parameter and both subpopulations share them (if we wanted the u's to be different, we would use U.model="unequal" or leave off the u model since the default behavior is U.model="unequal").

Code 7.4 puts all the pieces together and shows you how to fit the north and south population model and create the residuals plot (Figure 7.4). The residuals look better (more cloud-like), but the Hood Canal residuals are still temporally correlated.

Code 7.4

#fit the north and south population model
Z.model = factor(c(1,1,2,2,2))
U.model = "equal"
Q.model = "diagonal and equal"
R.model = "diagonal and equal"
kem3 = MARSS(dat, model=list(Z=Z.model, R=R.model,
  U=U.model, Q=Q.model))

#plot residuals
plotdat = t(dat)
matrix.of.biases = matrix(coef(kem3,type="matrix")$A,
  nrow=nrow(plotdat), ncol=ncol(plotdat), byrow=T)
par(mfrow=c(2,3))
for(i in 1:n){
  j = c(1,1,2,2,2)
  xs = kem3$states[j[i],]
  resids = plotdat[,i]-matrix.of.biases[,i]-xs
  plot(resids[!is.na(resids)], ylab="residuals")
  title(legendnames[i])
}
par(mfrow=c(1,1))

[Figure 7.4 about here: residuals versus index for each of the five regions under the north/south model.]

Fig. 7.4. The residuals for the analysis with a north and south subpopulation. The plots of the residuals should not have trends with time. Compare with the residuals for the analysis with one subpopulation.

7.5 Other population structures

Now work through a number of different structures and examine how your estimate of the mean population growth rate varies under different assumptions about the structure of the population and the data. You can compare the model fits using AIC (or AICc). For AIC, lower is better and only the relative differences matter. A difference of 10 between two AICs means substantially more support for the model with the lower AIC. A difference of 30 or 40 between two AICs is very large.

7.5.1 Five subpopulations

Analyze the data using a model with five subpopulations, where each of the five census regions is sampling one of the subpopulations. Assume that the subpopulations are independent (diagonal Q); however, let each subpopulation share the same population parameters, u and q. Code 7.5.1 shows how to set the MARSS() arguments for this case. You can use R.model="diagonal and equal" to make all the observation variances equal.

Code 7.5.1

Z.model = factor(c(1,2,3,4,5))
U.model = "equal"
Q.model = "diagonal and equal"
R.model = "diagonal and unequal"
kem = MARSS(dat, model=list(Z=Z.model, U=U.model, Q=Q.model, R=R.model))

7.5.2 Two subpopulations with different population parameters

Analyze the data using a model that assumes that the Strait of Juan de Fuca and San Juan Islands census regions represent a northern Puget Sound subpopulation, while the other three regions represent a southern Puget Sound subpopulation. This time assume that each population trajectory (north and south) has different u and q parameters: u_n, u_s and q_n, q_s.
Also assume that each of the five census regions has a different observation variance. Try to write your own code. If you get stuck (or want to check your work), you can open a script file with sample R code by typing RShowDoc("Chapter_SealTrend.R",package="MARSS") at the R command line. In math form, this model is:

\[
\begin{bmatrix} x_n \\ x_s \end{bmatrix}_t = \begin{bmatrix} x_n \\ x_s \end{bmatrix}_{t-1} + \begin{bmatrix} u_n \\ u_s \end{bmatrix} + \begin{bmatrix} w_n \\ w_s \end{bmatrix}_t, \quad \begin{bmatrix} w_n \\ w_s \end{bmatrix}_t \sim \mathrm{MVN}\left(0, \begin{bmatrix} q_n & 0 \\ 0 & q_s \end{bmatrix}\right) \tag{7.10}
\]

\[
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_n \\ x_s \end{bmatrix}_t + \begin{bmatrix} 0 \\ a_2 \\ 0 \\ a_4 \\ a_5 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t \tag{7.11}
\]

7.5.3 Hood Canal covaries with the other regions

Analyze the data using a model with two subpopulations, with the division being Hood Canal versus everywhere else. In math form, this model is:

\[
\begin{bmatrix} x_p \\ x_h \end{bmatrix}_t = \begin{bmatrix} x_p \\ x_h \end{bmatrix}_{t-1} + \begin{bmatrix} u_p \\ u_h \end{bmatrix} + \begin{bmatrix} w_p \\ w_h \end{bmatrix}_t, \quad \begin{bmatrix} w_p \\ w_h \end{bmatrix}_t \sim \mathrm{MVN}\left(0, \begin{bmatrix} q & c \\ c & q \end{bmatrix}\right) \tag{7.12}
\]

\[
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_p \\ x_h \end{bmatrix}_t + \begin{bmatrix} 0 \\ a_2 \\ a_3 \\ a_4 \\ 0 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t \tag{7.13}
\]

To specify that Q has one value on the diagonal (one variance) and one value on the off-diagonal (covariance), you can specify Q.model two ways:

Q.model = "equalvarcov"
Q.model = matrix(c("q","c","c","q"),2,2)

7.5.4 Three subpopulations with shared parameter values

Analyze the data using a model with three subpopulations as follows: north (regions 1 and 2), south (regions 3 and 4), Hood Canal (region 5). You can specify that some subpopulations share parameters while others do not. First, let's specify that each population is affected by independent environmental variability, but that the variance of that variability is the same for the two interior populations:

Q.model = matrix(list(0),3,3)
diag(Q.model) = c("coastal","interior","interior")
print(Q.model)

Notice that Q is a diagonal matrix (independent year-to-year environmental variability) but the variance of two of the populations is the same. Notice too that the off-diagonal terms are numeric; they do not have quotes. We specified Q using a matrix of class list, so that we could have numeric values (fixed) and character values (estimated parameters).

In a similar way, we specify that the observation errors are independent but that estimates from a plane do not have the same variance as those from a boat:

R.model = matrix(list(0),5,5)
diag(R.model) = c("boat","boat","plane","plane","plane")

For the long-term trends, we specify that x_1 and x_2 share a long-term trend ("puget sound") while x_3 is allowed to have a separate trend ("hood canal"):

U.model = matrix(c("puget sound","puget sound","hood canal"),3,1)

7.6 Discussion

There are a number of corners that we cut in order to show you code that runs quickly:

- We ran the code starting from one initial condition. For a real analysis, you should start from a large number of random initial conditions and use the one that gives the highest likelihood. Since the EM algorithm is a "hill-climbing" algorithm, this ensures that it does not get stuck on a local maximum. MARSS() will do this for you if you pass it the argument control=list(MCInit=TRUE). This will use a Monte Carlo routine to try many different initial conditions. See the help file on MARSS() for more information (by typing ?MARSS at the R prompt).
- We assumed independent observation and process errors. Depending on your system, observation errors may be driven by large-scale environmental factors (temperature, tides, prey locations) that would cause your observation errors to covary across regions. If your observation errors strongly covary between regions and you treat them as independent, this could be bad for your analysis.
Unfortunately, separating covariance across observation versus process errors will require much data (to have any power). In practice, the first step is to think hard about what drives sightability for your species and what the relative levels of process and observation variance are. You may be able to subsample your data in a way that will make the observation errors more independent.
- The MARSS() argument control specifies the options for the EM algorithm. We left the default tolerance for the convergence test. You would want to set this lower for a real analysis, and you will need to raise the maxit argument correspondingly.
- We used the large-sample approximation for AIC instead of a bootstrap AIC that is designed to correct for small sample size in state-space models. The bootstrap metric, AICb, takes a long time to run. Use the call MARSSaic(kem, output=c("AICbp")) to compute AICb. We could have shown AICc, which is the small-sample size corrector for non-state-space models. Type kem$AICc to get that.

Finally, in a real (maximum-likelihood) analysis, one needs to be careful not to dredge the data. The temptation is to look at the data and pick a population structure that will fit that data. This can lead to including models in your analysis that have no biological basis. In practice, we spend a lot of time discussing the population structure with biologists working on the species and review all the biological data that might tell us what reasonable structures are. From that, a set of model structures to use is selected. Other times, a particular model structure needs to be used because the population structure is not in question; rather, it is a matter of using that pre-specified structure and using all the data to get parameter estimates for forecasting.

Some more questions you might ponder:

- Do different assumptions about whether the observation error variances are all identical versus different affect your estimate of the long-term population growth rate (u)? You may want to rerun examples 3-7 with the R.model changed: R.model="diagonal and unequal" means the measurement variances are all different, versus "diagonal and equal" (all the same).
- Do assumptions about the underlying structure of the population affect your estimates of u? Structure here means the number of subpopulations and which areas are in which subpopulation.
- The confidence intervals for the first two analyses are very tight because the estimated process variance, Q, was very small. Why do you think the process variance (q) was forced to be so small? [Hint: We are forcing there to be one and only one true population trajectory and all the observation time series have to fit that one trajectory. Look at the AICs too.]

8 Identifying spatial population structure and covariance

8.1 Harbor seals on the U.S. west coast

In this application, we use time series of harbor seal abundance estimates along the west coast to examine large-scale spatial structure. Harbor seals are distributed along the west coast of the U.S. from California to Washington. The populations in Oregon and Washington have been surveyed for over 25 years (Jeffries et al., 2003) at a number of haul-out sites (Figure 8.1). These populations have been increasing steadily since the 1972 Marine Mammal Protection Act.
8 Identifying spatial population structure and covariance

8.1 Harbor seals on the U.S. west coast

In this application, we use time series of harbor seal abundance estimates along the west coast to examine large-scale spatial structure. Harbor seals are distributed along the west coast of the U.S. from California to Washington. The populations in Oregon and Washington have been surveyed for over 25 years (Jeffries et al., 2003) at a number of haul-out sites (Figure 8.1). These populations have been increasing steadily since the 1972 Marine Mammal Protection Act.

For management purposes, three stocks are recognized: the CA stock; the OR/WA coastal stock, which consists of four regions (Northern/Southern Oregon, Coastal Estuaries, Olympic Peninsula); and the inland WA stock, which consists of the regions in the WA inland waters minus Hood Canal (Figure 8.1). Differences exist in the demographics across regions (e.g., pupping dates); however, mtDNA analyses and tagging studies support the larger stock structure. Harbor seals are known for strong site fidelity but at the same time travel large distances to forage.

Our goal is to address the following questions about spatial structure: 1) Does the population abundance data support the existing management boundaries, or are there alternative groupings that receive more support? 2) Do subpopulations (if they exist) experience independent environmental variability or correlated variability? 3) Does the Hood Canal site represent a distinct subpopulation? To address these questions, we will mathematically formulate different hypotheses about population structure via different MARSS models. We will then compare the data support for the different models using model selection criteria, specifically AICc and AIC weights.

Type RShowDoc("Chapter_SealPopStructure.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.

Fig. 8.1. Map of spatial distribution of harbor seal survey regions in Washington and Oregon: San Juans, Eastern Bays, Juan de Fuca, Puget Sound, Olympic Peninsula, Hood Canal, Coastal Estuaries, Northern Coast, and Southern Coast. In addition to these nine survey regions, we also have data from the Georgia Strait just north of the San Juan Islands, the California coast, and the Channel Islands in Southern California.

8.1.1 MARSS models for a population with spatial structure

The mathematical form of the model we will use is

xt = xt−1 + u + wt where wt ∼ MVN(0, Q)
yt = Zxt + a + vt where vt ∼ MVN(0, R)     (8.1)
x0 ∼ MVN(π, Λ)

B appears in front of x in the full MARSS model but is left off above since here it is the identity matrix (a diagonal matrix with 1s on the diagonal). We will use Z, u, and Q to specify different hypotheses about the population structure. The form of a will be "scaling" in all cases. Aerial survey methodology has been relatively constant across time and space, and we will assume that all the time series from each region have identical and independent observation error variance, which means a diagonal R matrix with one variance term on the diagonal. (The sampling regions have different numbers of sites where animals are counted. But we are working with log counts. We assume that the distribution of percent errors is the same, i.e., the probability of a 10% over-count is the same, and thus that the variances are similar on the log scale.)

Each call to MARSS() will look like

fit = MARSS(sealData, model=list(Z = Z.model, Q = Q.model, ...))

where the ... are components of the model list that are the same across all models. We will specify different Z.model and Q.model in order to model different population spatial structures.

8.2 Question 1, How many distinct subpopulations?
We will start by evaluating the data support for the following hypotheses about the population structure:

H1: 3 subpopulations defined by stock
H2: 2 subpopulations defined by coastal versus WA inland
H3: 2 subpopulations defined by north and south, split in the middle of Oregon
H4: 4 subpopulations defined by N coastal, S coastal, SJF+Georgia Strait, and Puget Sound
H5: All regions are part of the same panmictic population
H6: Each of the 11 regions is a subpopulation

We will analyze each of these under the assumption of independent process errors, with each subpopulation having either a different variance or the same variance.

8.2.1 Specify the Z matrices

The Z matrices specify the relationship between the survey regions and the subpopulations and allow us to specify the spatial population structures in the hypotheses. Each column of Z corresponds to a different subpopulation and associates regions with particular subpopulations. For example, for hypothesis 1, column 1 of the Z matrix is OR/WA coastal, column 2 is inland WA (ps for Puget Sound), and column 3 is CA. Each row of Z has a single 1 in the column of the subpopulation to which that region belongs. The region-to-subpopulation assignments for hypotheses 1, 2, 4, and 5 are as follows:

Region              H1     H2     H4  H5
Coastal Estuaries   wa.or  coast  nc  pan
Olympic Peninsula   wa.or  coast  nc  pan
Str. Juan de Fuca   ps     ps     is  pan
San Juan Islands    ps     ps     is  pan
Eastern Bays        ps     ps     ps  pan
Puget Sound         ps     ps     ps  pan
CA.Mainland         ca     coast  sc  pan
CA.ChannelIslands   ca     coast  sc  pan
OR North Coast      wa.or  coast  nc  pan
OR South Coast      wa.or  coast  sc  pan
Georgia Strait      bc     ps     is  pan

To tell MARSS() the form of Z, we construct the same matrix in R. For example, for hypothesis 1, we can write:

Z.model=matrix(0,11,3)
Z.model[c(1,2,9,10),1]=1 #which elements in col 1 are 1
Z.model[c(3:6,11),2]=1 #which elements in col 2 are 1
Z.model[7:8,3]=1 #which elements in col 3 are 1

MARSS has a shortcut for making this kind of Z matrix using factor(). To make the Z matrix for hypothesis 1, we could also write factor(c(1,1,2,2,2,2,3,3,1,1,2)). Each element corresponds to one of the rows of Z and indicates which column the "1" appears in, i.e., which row of your data belongs to which subpopulation. Instead of numbers, however, we will use text strings to denote the subpopulations. For example, the Z.model specification for hypothesis 1 is

Z1=factor(c("wa.or","wa.or",rep("ps",4),"ca","ca","wa.or","wa.or","bc"))

Notice it is 11 elements in length: one element for each row of data (in this case, survey region).

8.2.2 Specify the u structure

We will assume that each subpopulation can have a unique population growth rate. Mathematically, this means that the u matrix in Equation 8.1 looks like this for hypothesis 1 (3 subpopulations):

\[
\begin{bmatrix}u_1\\ u_2\\ u_3\end{bmatrix}
\]

To specify this, we construct U.model as a character matrix in which shared elements have the same character name. For example,

U.model=matrix(c("u1","u2","u3"),3,1)

for a three-subpopulation model. Alternatively, we can use the shortcut U.model="unequal".
8.2.3 Specify the Q structures

For our first analysis, we fit a model where the subpopulations experience independent process errors. We will use two different types of independent process errors: independent process errors with different variances and independent process errors with identical variance. Independence is specified with a diagonal variance-covariance matrix with 0s on the off-diagonals. Independent process errors with different variances is a diagonal matrix with different values on the diagonal:

\[
\begin{bmatrix}q_1 & 0 & 0\\ 0 & q_2 & 0\\ 0 & 0 & q_3\end{bmatrix}
\]

This matrix has fixed numeric values (the zeros) combined with the symbols q1, q2, and q3, which represent estimated values. We specify this for MARSS() using a list matrix, which combines numeric values (the fixed zeros) with character values (the names of the estimated elements). The following produces this matrix, and printing it shows that it combines numeric values and character strings in quotes:

Q.model=matrix(list(0),3,3)
diag(Q.model)=c("q1","q2","q3")
Q.model

     [,1] [,2] [,3]
[1,] "q1" 0    0
[2,] 0    "q2" 0
[3,] 0    0    "q3"

We can also use the shortcut Q.model="diagonal and unequal". Independent process errors with identical variance is a diagonal matrix with one value on the diagonal:

\[
\begin{bmatrix}q & 0 & 0\\ 0 & q & 0\\ 0 & 0 & q\end{bmatrix}
\]

Q.model=matrix(list(0),3,3)
diag(Q.model)="q"
Q.model

     [,1] [,2] [,3]
[1,] "q"  0    0
[2,] 0    "q"  0
[3,] 0    0    "q"

The shortcut for this form is Q.model="diagonal and equal".

8.3 Fit the different models

The dataset harborSeal contains abundance indices for each of 12 regions over the period 1975-2004 (Figure 8.2). We start by setting up our data matrix. We will leave off Hood Canal (column 8) for now.

years = harborSeal[,1] #first col is years
#leave off Hood Canal data for now
sealData = t(harborSeal[,c(2:7,9:13)])

Fig. 8.2. Plot of the harbor seal data in the harborSeal dataset (CoastalEstuaries, OlympicPeninsula, StraitJuanDeFuca, SanJuanIslands, EasternBays, PugetSound, HoodCanal, CA.Mainland, CA.ChannelIslands, OR.NorthCoast, OR.SouthCoast, and Georgia.Strait). Each time series is an index of the harbor seal abundance in that region.

We will set up our models so we can fit all of them with one loop of code. First the Z models:

#H1 stock
Z1=factor(c("wa.or","wa.or",rep("ps",4),"ca","ca","wa.or","wa.or","bc"))
#H2 coastal+PS
Z2=factor(c(rep("coast",2),rep("ps",4),rep("coast",4),"ps"))
#H3 N and S
Z3=factor(c(rep("N",6),"S","S","N","S","N"))
#H4 North Coast, Inland Strait, Puget Sound, South Coast
Z4=factor(c("nc","nc","is","is","ps","ps","sc","sc","nc","sc","is"))
#H5 panmictic
Z5=factor(rep("pan",11))
#H6 Site
Z6=factor(1:11) #site
Z.models=list(Z1,Z2,Z3,Z4,Z5,Z6)
names(Z.models)=
   c("stock","coast+PS","N-S","NC+Strait+PS+SC","panmictic","site")
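If you want to see the 0/1 indicator Z matrix implied by one of these factors, base R's model.matrix() will build it; this is a quick check for illustration only, since MARSS() does the conversion internally:

f = Z.models[["stock"]]
Z.implied = model.matrix(~ f - 1) #one column per factor level
colnames(Z.implied) = levels(f)
Z.implied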
Next we set up the Q models.

Q.models=c("diagonal and equal", "diagonal and unequal")

The rest of the model matrices have the same form across all models.

U.model="unequal"
R.model="diagonal and equal"
A.model="scaling"
B.model="identity"
x0.model="unequal"
V0.model="zero"
model.constant=list(
   U=U.model, R=R.model, A=A.model,
   x0=x0.model, V0=V0.model, tinitx=0)

We loop through the models, fit, and store the results:

out.tab=NULL
fits=list()
for(i in 1:length(Z.models)){
   for(Q.model in Q.models){
      fit.model = c(list(Z=Z.models[[i]], Q=Q.model), model.constant)
      fit = MARSS(sealData, model=fit.model,
         silent=TRUE, control=list(maxit=1000))
      out=data.frame(H=names(Z.models)[i], Q=Q.model, U=U.model,
         logLik=fit$logLik, AICc=fit$AICc, num.param=fit$num.params,
         m=length(unique(Z.models[[i]])),
         num.iter=fit$numIter, converged=!fit$convergence)
      out.tab=rbind(out.tab,out)
      fits=c(fits,list(fit))
      if(i==5) next #one m for panmictic so only run 1 Q
   }
}

8.4 Summarize the data support

We will use AICc and AIC weights to summarize the data support for the different hypotheses. First we sort the fits based on AICc:

min.AICc=order(out.tab$AICc)
out.tab.1=out.tab[min.AICc,]

Next we add the ∆AICc values by subtracting the lowest AICc:

out.tab.1=cbind(out.tab.1,
   delta.AICc=out.tab.1$AICc-out.tab.1$AICc[1])

Relative likelihood is defined as exp(−∆AICc/2):

out.tab.1=cbind(out.tab.1,
   rel.like=exp(-1*out.tab.1$delta.AICc/2))

The AIC weight for a model is its relative likelihood divided by the sum of all the relative likelihoods:

out.tab.1=cbind(out.tab.1,
   AIC.weight = out.tab.1$rel.like/sum(out.tab.1$rel.like))

Let's look at the model weights (out.tab.1):

H                Q                     delta.AICc  AIC.weight
NC+Strait+PS+SC  diagonal and equal          0.00       0.886
NC+Strait+PS+SC  diagonal and unequal        4.15       0.112
N-S              diagonal and unequal       12.67       0.002
N-S              diagonal and equal         14.78       0.001
coast+PS         diagonal and equal         31.23       0.000
coast+PS         diagonal and unequal       33.36       0.000
stock            diagonal and equal         34.01       0.000
stock            diagonal and unequal       36.84       0.000
panmictic        diagonal and equal         48.28       0.000
panmictic        diagonal and unequal       48.28       0.000
site             diagonal and equal         56.36       0.000
site             diagonal and unequal       57.95       0.000

It appears that a population structure with north and south coastal subpopulations and two inland subpopulations is more supported than any of the other West Coast population structures, under the assumption of independent process errors. The latter means that good and bad years are not correlated across the subpopulations. The stock structure, supported by genetic information, does not appear to correspond to independent subpopulations, and the individual survey regions, which are characterized by differential pupping times, do not appear to correspond to independent subpopulations either. Figure 8.3 shows the four subpopulation trajectories estimated by the best-fit model. The trajectories have been rescaled so that each starts at 0 in 1975 (to facilitate comparison).

Fig. 8.3. Estimated trajectories for the four subpopulations (North Coastal, Inland Straits, Puget Sound, South Coastal) in the best-fit model. The plots have been rescaled so that each is at 0 in 1975.
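A sketch of how one might reproduce Figure 8.3 from the stored fits (which.min() picks out the lowest-AICc model, and the subtraction rescales each trajectory to 0 at 1975):

best.fit = fits[[which.min(out.tab$AICc)]]
matplot(years, t(best.fit$states - best.fit$states[,1]),
   type="l", lwd=2, xlab="", ylab="abundance index")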
8.5 Question 2, Are the subpopulations independent?

The assumption of independent process errors is unrealistic given that ocean conditions are correlated across large spatial scales. We will repeat the analysis allowing correlated process errors using two different Q models. The first correlated Q model is correlated process errors with the same variance and covariance. For a model with three subpopulations, this Q would look like:

\[
\begin{bmatrix}q & c & c\\ c & q & c\\ c & c & q\end{bmatrix}
\]

We can construct this like so:

#identical variances
Q.model=matrix("c",3,3)
diag(Q.model)="q"

or use the shortcut Q.model="equalvarcov". The second type of correlated Q we will use allows each subpopulation to have a different process variance and covariances. For a model with three subpopulations, this is the following variance-covariance matrix:

\[
\begin{bmatrix}q_1 & c_{1,2} & c_{1,3}\\ c_{1,2} & q_2 & c_{2,3}\\ c_{1,3} & c_{2,3} & q_3\end{bmatrix}
\]

Constructing this is tedious in R, but there is a shortcut: Q.model="unconstrained".

We will rerun all the Z matrices with these two extra Q types and add them to our results table:

for(i in 1:length(Z.models)){
   if(i==5) next #don't rerun panmictic
   for(Q.model in c("equalvarcov","unconstrained")){
      fit.model = c(list(Z=Z.models[[i]], Q=Q.model), model.constant)
      fit = MARSS(sealData, model=fit.model,
         silent=TRUE, control=list(maxit=1000))
      out=data.frame(H=names(Z.models)[i], Q=Q.model, U=U.model,
         logLik=fit$logLik, AICc=fit$AICc, num.param=fit$num.params,
         m=length(unique(Z.models[[i]])),
         num.iter=fit$numIter, converged=!fit$convergence)
      out.tab=rbind(out.tab,out)
      fits=c(fits,list(fit))
   }
}

Again we sort the models by AICc and compute model weights:

min.AICc=order(out.tab$AICc)
out.tab.2=out.tab[min.AICc,]
fits=fits[min.AICc]
out.tab.2=cbind(out.tab.2,delta.AICc=out.tab.2$AICc-out.tab.2$AICc[1])
out.tab.2=cbind(out.tab.2,rel.like=exp(-1*out.tab.2$delta.AICc/2))
out.tab.2=cbind(out.tab.2,AIC.weight=out.tab.2$rel.like/sum(out.tab.2$rel.like))

Examination of the expanded results table (out.tab.2) shows there is strong support for correlated process errors; the top 10 models are shown:

H                Q                     delta.AICc  AIC.weight
NC+Strait+PS+SC  equalvarcov                 0.00       0.976
site             equalvarcov                 7.65       0.021
NC+Strait+PS+SC  unconstrained              11.47       0.003
NC+Strait+PS+SC  diagonal and equal         23.39       0.000
NC+Strait+PS+SC  diagonal and unequal       27.53       0.000
N-S              unconstrained              32.61       0.000
N-S              diagonal and unequal       36.06       0.000
N-S              equalvarcov                36.97       0.000
stock            equalvarcov                37.82       0.000
N-S              diagonal and equal         38.16       0.000

The summed model weights for "equalvarcov", "unconstrained", and "diagonal and equal" are

c(
sum(out.tab.2$AIC.weight[out.tab.2$Q=="equalvarcov"]),
sum(out.tab.2$AIC.weight[out.tab.2$Q=="unconstrained"]),
sum(out.tab.2$AIC.weight[out.tab.2$Q=="diagonal and equal"])
)

[1] 0.997 0.003 0.000

8.5.1 Looking at the correlation structure in the Q matrix

The 3rd model in the output table is a model with all elements of the process error variance-covariance matrix estimated. Estimating a variance-covariance matrix with so many extra parameters is not supported relative to a constrained variance-covariance matrix with two parameters (compare the AICc for the 1st and 3rd models), but looking at the full variance-covariance matrix shows some interesting and not surprising patterns. The Q matrix is recovered from the model fit using this command:

Q.unc=coef(fits[[3]],type="matrix")$Q

The diagonal of this matrix shows that each region appears to experience process variability of a similar magnitude:

diag(Q.unc)

[1] 0.009049512 0.007451479 0.004598690 0.005276587
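As an aside, base R has a built-in helper, cov2cor(), that converts a variance-covariance matrix directly to a correlation matrix; the following one-liner is equivalent to the manual rescaling shown next:

Q.corr = cov2cor(Q.unc)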
We can compute the correlation matrix as follows; rownames are added to make the matrix more readable:

h=diag(1/sqrt(diag(Q.unc)))
Q.corr=h%*%Q.unc%*%h
rownames(Q.corr)=unique(Z4)
colnames(Q.corr)=unique(Z4)
Q.corr

          nc        is        ps        sc
nc 1.0000000 0.5970202 0.6421536 0.9163056
is 0.5970202 1.0000000 0.9970869 0.2271385
ps 0.6421536 0.9970869 1.0000000 0.2832502
sc 0.9163056 0.2271385 0.2832502 1.0000000

The correlation matrix indicates that the inland strait ('is') subpopulation experiences process errors (good and bad years) that are almost perfectly correlated with those of the Puget Sound subpopulation, though the two have different population growth rates (Figure 8.3). Similarly, the north and south coastal subpopulations ('nc' and 'sc') experience highly correlated process errors, though again population growth rates are much higher to the north. There is much higher correlation between the process errors of the north coastal subpopulation and the nearby inland straits and Puget Sound subpopulations than between those two inland subpopulations and the much farther south coastal subpopulation. These patterns are not ecologically surprising, but they are not easy to discern by looking at the raw count data.

8.6 Question 3, Is the Hood Canal independent?

In the initial analysis, the data from Hood Canal were removed. Hood Canal has experienced a series of hypoxic events that have led to large perturbations of the harbor seal prey. We will add the Hood Canal data back in and look at whether treating Hood Canal as separate is supported, compared to treating it as part of the Puget Sound subpopulation in the top model.

sealData.hc = rbind(sealData,harborSeal[,8])
rownames(sealData.hc)[12]="Hood.Canal"

Here are the two Z matrices for a 'Hood Canal in the Puget Sound' and a 'Hood Canal separate' model:

ZH1=factor(c("nc","nc","is","is","ps",
   "ps","sc","sc","nc","sc","is","ps"))
ZH2=factor(c("nc","nc","is","is","ps",
   "ps","sc","sc","nc","sc","is","hc"))
Z.models.hc=list(ZH1, ZH2)
names(Z.models.hc)=c("hood.in.ps","hood.separate")

We will test three different Q matrices: a matrix with one variance and one covariance, an unconstrained variance-covariance matrix, and a variance-covariance matrix where the Hood Canal subpopulation has independent process errors.

Q3=matrix(list("offdiag"),5,5)
diag(Q3)="q"
Q3[,5]=0; Q3[5,]=0; Q3[5,5]="q.hc"
Q.models=list("equalvarcov","unconstrained",Q3)
names(Q.models)=c("equalvarcov","unconstrained","hood.independent")

The independent Hood Canal Q allows correlation between the other four subpopulations but none between Hood Canal and those four:

Q.models$hood.independent

     [,1]      [,2]      [,3]      [,4]      [,5]
[1,] "q"       "offdiag" "offdiag" "offdiag" 0
[2,] "offdiag" "q"       "offdiag" "offdiag" 0
[3,] "offdiag" "offdiag" "q"       "offdiag" 0
[4,] "offdiag" "offdiag" "offdiag" "q"       0
[5,] 0         0         0         0         "q.hc"

As before, we loop through the models and create a results table:

out.tab.hc=NULL
fits.hc=list()
for(i in 1:length(Z.models.hc)){
   for(j in 1:length(Q.models)){
      if(i==1 & j==3) next #Q3 is only for Hood Separate model
      Q.model=Q.models[[j]]
      fit.model = c(list(Z=Z.models.hc[[i]], Q=Q.model), model.constant)
      fit = MARSS(sealData.hc, model=fit.model,
         silent=TRUE, control=list(maxit=1000))
      out=data.frame(H=names(Z.models.hc)[i], Q=names(Q.models)[j], U=U.model,
         logLik=fit$logLik, AICc=fit$AICc, num.param=fit$num.params,
         m=length(unique(Z.models.hc[[i]])),
         num.iter=fit$numIter, converged=!fit$convergence)
      out.tab.hc=rbind(out.tab.hc, out)
      fits.hc=c(fits.hc,list(fit))
   }
}
We sort the results by AICc and compute the ∆AICc:

min.AICc=order(out.tab.hc$AICc)
out.tab.hc=out.tab.hc[min.AICc,]
out.tab.hc=cbind(out.tab.hc,
   delta.AICc=out.tab.hc$AICc-out.tab.hc$AICc[1])
out.tab.hc=cbind(out.tab.hc,rel.like=exp(-1*out.tab.hc$delta.AICc/2))
out.tab.hc=cbind(out.tab.hc,AIC.weight=out.tab.hc$rel.like/sum(out.tab.hc$rel.like))

The results table (out.tab.hc) indicates strong support for treating Hood Canal as a separate subpopulation, but not for completely independent process errors:

H              Q                 delta.AICc  AIC.weight
hood.separate  equalvarcov             0.00       0.988
hood.separate  hood.independent        8.74       0.012
hood.in.ps     equalvarcov            23.53       0.000
hood.separate  unconstrained          30.65       0.000
hood.in.ps     unconstrained          36.66       0.000

8.7 Discussion

In this chapter, we used model selection and AICc model weights to explore the temporal correlation structure in the harbor seal abundance data from the U.S. west coast. We used the term 'subpopulation'; however, it should be kept in mind that we are actually looking at the data support for different spatial patterns of temporal correlation in the process errors. Treating regions A and B as a 'subpopulation' in this context means that we are asking if the counts from A and B can be treated as observations of the same underlying stochastic trajectory. Metapopulation structure refers to a case where a larger population is composed of a collection of smaller, temporally independent subpopulations. Metapopulation structure buffers the variability seen in the larger population and has important consequences for the viability of a population. We tested for temporal independence using diagonal versus non-diagonal Q matrices. Although the west coast harbor seal population appears to be divided into 'subpopulations' that experience different population growth rates, there is strong temporal correlation in the year-to-year variability experienced by these subpopulations. This suggests that this harbor seal population does not function as a true metapopulation with independent subpopulations but rather as a collection of subpopulations that are temporally correlated.

9 Dynamic factor analysis (DFA)

9.1 Overview

In this chapter, we use MARSS to do dynamic factor analysis (DFA), which allows us to look for a set of common underlying trends among a relatively large set of time series (Harvey, 1989, sec. 8.5). See also Zuur et al. (2003), which shows a number of examples of DFA applied to fisheries catch data and densities of zoobenthos. We will walk through some examples to show you the math behind DFA, and then in section 9.4, we will show a shortcut for doing a DFA with MARSS using form="dfa".

DFA is conceptually different from what we have been doing in the previous applications. Here we are trying to explain temporal variation in a set of n observed time series using linear combinations of a set of m hidden random walks, where m << n. A DFA model is a type of MARSS model with the following structure:

xt = xt−1 + wt where wt ∼ MVN(0, Q)
yt = Zxt + a + vt where vt ∼ MVN(0, R)     (9.1)
x0 ∼ MVN(π, Λ)

The general idea is that the observations (y) are modeled as a linear combination of hidden trends (x) and factor loadings (Z) plus some offsets (a). The DFA model in Equation 9.1 and the standard MARSS model in Equation 1.1 are equivalent; we have simply set the matrix B equal to an m × m identity matrix (i.e., a diagonal matrix with 1's on the diagonal and 0's elsewhere) and the vector u = 0.
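To build intuition for this structure, here is a minimal sketch that simulates a toy DFA with 4 observed series generated from 2 hidden random walks (all values are illustrative):

set.seed(123)
TT = 60; m = 2; n = 4
w = matrix(rnorm(m*TT), m, TT)
x = t(apply(w, 1, cumsum)) #m x TT hidden random-walk trends
Z = matrix(runif(n*m, -1, 1), n, m) #factor loadings
y = Z %*% x + matrix(rnorm(n*TT, sd=0.5), n, TT) #observed series
matplot(t(y), type="l", ylab="simulated y")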
Type RShowDoc("Chapter_DFA.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.

9.1.1 Writing out a DFA model in MARSS form

Imagine a case where we had a dataset with six observed time series (n = 6) and we want to fit a model with three hidden trends (m = 3). If we write out our DFA model in MARSS matrix form (ignoring the error structures and initial conditions for now), it would look like this:

\[
\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}_t =
\begin{bmatrix}1&0&0\\ 0&1&0\\ 0&0&1\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}_{t-1} +
\begin{bmatrix}0\\ 0\\ 0\end{bmatrix} +
\begin{bmatrix}w_1\\ w_2\\ w_3\end{bmatrix}_t
\]

\[
\begin{bmatrix}y_1\\ y_2\\ y_3\\ y_4\\ y_5\\ y_6\end{bmatrix}_t =
\begin{bmatrix}z_{11}&z_{12}&z_{13}\\ z_{21}&z_{22}&z_{23}\\ z_{31}&z_{32}&z_{33}\\ z_{41}&z_{42}&z_{43}\\ z_{51}&z_{52}&z_{53}\\ z_{61}&z_{62}&z_{63}\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}_t +
\begin{bmatrix}a_1\\ a_2\\ a_3\\ a_4\\ a_5\\ a_6\end{bmatrix} +
\begin{bmatrix}v_1\\ v_2\\ v_3\\ v_4\\ v_5\\ v_6\end{bmatrix}_t \tag{9.2}
\]

The process errors of the hidden trends would be

\[
\begin{bmatrix}w_1\\ w_2\\ w_3\end{bmatrix}_t \sim \text{MVN}\left(\begin{bmatrix}0\\ 0\\ 0\end{bmatrix},
\begin{bmatrix}q_{11}&q_{12}&q_{13}\\ q_{12}&q_{22}&q_{23}\\ q_{13}&q_{23}&q_{33}\end{bmatrix}\right) \tag{9.3}
\]

and the observation errors would be

\[
\begin{bmatrix}v_1\\ v_2\\ v_3\\ v_4\\ v_5\\ v_6\end{bmatrix}_t \sim \text{MVN}\left(\begin{bmatrix}0\\ 0\\ 0\\ 0\\ 0\\ 0\end{bmatrix},
\begin{bmatrix}
r_{11}&r_{12}&r_{13}&r_{14}&r_{15}&r_{16}\\
r_{12}&r_{22}&r_{23}&r_{24}&r_{25}&r_{26}\\
r_{13}&r_{23}&r_{33}&r_{34}&r_{35}&r_{36}\\
r_{14}&r_{24}&r_{34}&r_{44}&r_{45}&r_{46}\\
r_{15}&r_{25}&r_{35}&r_{45}&r_{55}&r_{56}\\
r_{16}&r_{26}&r_{36}&r_{46}&r_{56}&r_{66}
\end{bmatrix}\right) \tag{9.4}
\]

9.1.2 Constraints to ensure identifiability

If Z, a, and Q in Equation 9.1 are not constrained, then the DFA model above is unidentifiable (Harvey, 1989, sec. 4.4). Harvey (1989, sec. 8.5.1) suggests the following parameter constraints to make the model identifiable: in the first m − 1 rows of Z, the z-value in the j-th column and i-th row is set to zero if j > i; a is constrained so that the first m values are set to zero; and Q is set equal to the identity matrix (Im).

Zuur et al. (2003), however, found that with Harvey's second constraint, the EM algorithm is not particularly robust, and it takes a long time to converge. Zuur et al. found that the EM estimates are much better behaved if you instead constrain each of the time series in x to have a mean of zero across t = 1 to T. To do so, they replaced the estimates of the hidden states coming out of the Kalman smoother with the demeaned states for t = 1 to T, where the mean is taken across t. With this approach, you estimate all of the a elements, which represent the average level of yt relative to the demeaned Z xt. We found that demeaning the smoothed states in this way can cause the EM algorithm to have errors (decline in log-likelihood). Instead, we demean our data and fix all elements of a to zero.

Using these constraints, the DFA model in Equation 9.2 becomes

\[
\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}_t =
\begin{bmatrix}1&0&0\\ 0&1&0\\ 0&0&1\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}_{t-1} +
\begin{bmatrix}0\\ 0\\ 0\end{bmatrix} +
\begin{bmatrix}w_1\\ w_2\\ w_3\end{bmatrix}_t
\]

\[
\begin{bmatrix}y_1\\ y_2\\ y_3\\ y_4\\ y_5\\ y_6\end{bmatrix}_t =
\begin{bmatrix}z_{11}&0&0\\ z_{21}&z_{22}&0\\ z_{31}&z_{32}&z_{33}\\ z_{41}&z_{42}&z_{43}\\ z_{51}&z_{52}&z_{53}\\ z_{61}&z_{62}&z_{63}\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}_t +
\begin{bmatrix}0\\ 0\\ 0\\ 0\\ 0\\ 0\end{bmatrix} +
\begin{bmatrix}v_1\\ v_2\\ v_3\\ v_4\\ v_5\\ v_6\end{bmatrix}_t \tag{9.5}
\]

The process errors of the hidden trends in Equation 9.3 would then become

\[
\begin{bmatrix}w_1\\ w_2\\ w_3\end{bmatrix}_t \sim \text{MVN}\left(\begin{bmatrix}0\\ 0\\ 0\end{bmatrix},
\begin{bmatrix}1&0&0\\ 0&1&0\\ 0&0&1\end{bmatrix}\right) \tag{9.6}
\]

but the observation errors in Equation 9.4 would stay the same (Equation 9.7; the full symmetric R matrix with elements r_ij).

To complete our model, we still need the final form for the initial conditions of the state. Following Zuur et al. (2003), we set the initial state vector (x0) to have zero mean and a diagonal variance-covariance matrix with large variances, such that

\[
\mathbf{x}_0 \sim \text{MVN}\left(\begin{bmatrix}0\\ 0\\ 0\end{bmatrix},
\begin{bmatrix}5&0&0\\ 0&5&0\\ 0&0&5\end{bmatrix}\right) \tag{9.8}
\]

9.2 The data

We will analyze some of the Lake Washington plankton data included in the MARSS package.
This dataset includes 33 years of monthly counts for 13 plankton species along with data on water temperature, total phosphorous (TP), and pH. First, we load the data and then extract a subset of columns corresponding to the phytoplankton species only. For the purpose of speeding up model fitting times and to limit our analysis to years with no missing covariate data, we will only examine 10 years of data (1980-1989).

# load the data (there are 3 datasets contained here)
data(lakeWAplankton)
# we want lakeWAplanktonTrans, which has been transformed
# so the 0s are replaced with NAs and the data z-scored
dat = lakeWAplanktonTrans
# use only the 10 years from 1980-1989
plankdat = dat[dat[,"Year"]>=1980 & dat[,"Year"]<1990,]
# create vector of phytoplankton group names
phytoplankton = c("Cryptomonas", "Diatoms", "Greens",
   "Unicells", "Other.algae")
# get only the phytoplankton
dat.spp.1980 = plankdat[,phytoplankton]

Next, we transpose the data and calculate the number of time series and their length.

# transpose data so time goes across columns
dat.spp.1980 = t(dat.spp.1980)
# get number of time series
N.ts = dim(dat.spp.1980)[1]
# get length of time series
TT = dim(dat.spp.1980)[2]

It is normal in this type of analysis to standardize each time series by first subtracting its mean and then dividing by its standard deviation (i.e., create a z-score y*_t with mean 0 and standard deviation 1), such that

\[
\mathbf{y}_t^* = \Sigma^{-1}(\mathbf{y}_t - \bar{\mathbf{y}}),
\]

where Σ is a diagonal matrix with the standard deviations of each time series along the diagonal, and ȳ is a vector of the means. In R, this can be done as follows:

Sigma = sqrt(apply(dat.spp.1980, 1, var, na.rm=TRUE))
y.bar = apply(dat.spp.1980, 1, mean, na.rm=TRUE)
dat.z = (dat.spp.1980 - y.bar) * (1/Sigma)
rownames(dat.z) = rownames(dat.spp.1980)

Figure 9.1 shows time series of Lake Washington phytoplankton data following z-score transformation.

Fig. 9.1. Time series of Lake Washington phytoplankton data (Cryptomonas, Diatoms, Greens, Unicells, Other.algae) following z-score transformation.
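As an aside, the same standardization can be done with base R's scale(), which centers and scales the columns of a matrix (a sketch equivalent to the manual calculation above; note the double transpose because our time series are in rows):

dat.z.alt = t(scale(t(dat.spp.1980)))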
9.3 Setting up the model in MARSS

As we have seen in other cases, setting up the model structure for MARSS requires that the parameter matrices have a one-to-one correspondence to the model as you would write it on paper (i.e., Equations 9.5 through 9.8). If a parameter matrix has a combination of fixed and estimated values, then you specify it using matrix(list(), nrow, ncol). This is a matrix of class list, which allows you to combine numeric and character values in a single matrix. MARSS recognizes the numeric values as fixed values and the character values as estimated values. This is how we set up Z for MARSS, assuming a model with 5 observed time series and 3 hidden trends:

Z.vals = list(
"z11",    0,    0,
"z21","z22",    0,
"z31","z32","z33",
"z41","z42","z43",
"z51","z52","z53")
Z = matrix(Z.vals, nrow=N.ts, ncol=3, byrow=TRUE)

When specifying the list values, spacing and carriage returns were added to help show the correspondence with the constrained Z matrix in Equation 9.5. If you print Z (at the R command line), you will see that it is a matrix with character values (the estimated elements) and numeric values (the fixed 0's):

print(Z)

     [,1]  [,2]  [,3]
[1,] "z11" 0     0
[2,] "z21" "z22" 0
[3,] "z31" "z32" "z33"
[4,] "z41" "z42" "z43"
[5,] "z51" "z52" "z53"

Notice that the 0's do not have quotes around them. If they did, it would mean that "0" is a character value and would be interpreted as the name of a parameter to be estimated rather than a fixed numeric value.

The Q and B matrices are both set equal to the identity matrix using diag():

Q = B = diag(1,3)

For our first analysis, we will assume that each time series of phytoplankton has a different observation variance, but that there is no covariance among time series. Thus, R should be a diagonal matrix:

\[
\begin{bmatrix}
r_{11}&0&0&0&0\\
0&r_{22}&0&0&0\\
0&0&r_{33}&0&0\\
0&0&0&r_{44}&0\\
0&0&0&0&r_{55}
\end{bmatrix}
\]

where each of the r_{i,i} elements is a different parameter to be estimated. We can also specify this R structure using a list matrix as follows:

R.vals = list(
"r11",0,0,0,0,
0,"r22",0,0,0,
0,0,"r33",0,0,
0,0,0,"r44",0,
0,0,0,0,"r55")
R = matrix(R.vals, nrow=N.ts, ncol=N.ts, byrow=TRUE)

You can print R at the R command line to see what it looks like:

print(R)

     [,1]  [,2]  [,3]  [,4]  [,5]
[1,] "r11" 0     0     0     0
[2,] 0     "r22" 0     0     0
[3,] 0     0     "r33" 0     0
[4,] 0     0     0     "r44" 0
[5,] 0     0     0     0     "r55"

This form of variance-covariance matrix is commonly used, and therefore MARSS has a built-in shorthand for this structure. Alternatively, we could simply type:

R = "diagonal and unequal"

As mentioned in earlier chapters, there are other shorthand notations for many of the common parameter structures. Type ?MARSS at the R command line to see a list of the shorthand options for each parameter vector/matrix.

The parameter vectors π (termed x0 in MARSS), a, and u are each set to be a column vector of zeros; x0 and u are m × 1 and a is n × 1 (here, N.ts × 1). Either of the following can be used:

x0 = U = matrix(0, nrow=3, ncol=1)
A = matrix(0, nrow=N.ts, ncol=1)

x0 = U = A = "zero"

The Λ matrix (termed V0 in MARSS) is a diagonal matrix with 5's along the diagonal:

V0 = diag(5,3)

Finally, we make a list of the model parameters to pass to the MARSS() function and set the control list:

dfa.model = list(Z=Z, A="zero", R=R, B=B, U=U,
   Q=Q, x0=x0, V0=V0)
cntl.list = list(maxit=50)

For the examples in this chapter, we have set the maximum iterations to 50 to speed up model fitting.
Note, however, that the parameter estimates will not have converged to their maximum likelihood values, which would likely take 100s, if not 1000+, iterations.

9.3.1 Fitting the model

We can now pass the DFA model list to MARSS() to estimate the Z matrix and underlying hidden states (x). The output is not shown because it is voluminous, but the model fits are plotted in Figure 9.2. The warnings regarding non-convergence are due to setting maxit to 50.

kemz.3 = MARSS(dat.z, model=dfa.model, control=cntl.list)

Warning! Reached maxit before parameters converged. Maxit was 50.
neither abstol nor log-log convergence tests were passed.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: maxit reached at 50 iter before convergence.
 Neither abstol nor log-log convergence test were passed.
 The likelihood and params are not at the ML values.
 Try setting control$maxit higher.
Log-likelihood: -782.202
AIC: 1598.404   AICc: 1599.463

                             Estimate
Z.z11                          0.4163
Z.z21                          0.5364
Z.z31                          0.2780
Z.z41                          0.5179
Z.z51                          0.1611
Z.z22                          0.6757
Z.z32                         -0.2381
Z.z42                         -0.2381
Z.z52                         -0.2230
Z.z33                          0.2305
Z.z43                         -0.1225
Z.z53                          0.3887
R.(Cryptomonas,Cryptomonas)    0.6705
R.(Diatoms,Diatoms)            0.0882
R.(Greens,Greens)              0.7201
R.(Unicells,Unicells)          0.1865
R.(Other.algae,Other.algae)    0.5441

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

Convergence warnings
10 warnings. First 10 shown. Type cat(object$errors) to see the full list.
 Warning: the Z.z51 parameter value has not converged.
 Warning: the Z.z32 parameter value has not converged.
 Warning: the Z.z52 parameter value has not converged.
 Warning: the Z.z33 parameter value has not converged.
 Warning: the Z.z43 parameter value has not converged.
 Warning: the R.(Diatoms,Diatoms) parameter value has not converged.
 Warning: the R.(Greens,Greens) parameter value has not converged.
 Warning: the R.(Other.algae,Other.algae) parameter value has not converged.
 Warning: the logLik parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
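The fits plotted in Figure 9.2 can be computed from the estimated loadings and states (a sketch; since a is fixed at zero in this model, the fitted values are simply Z times the estimated states):

fit.y = coef(kemz.3, type="matrix")$Z %*% kemz.3$states
matplot(t(fit.y), type="l", ylab="fitted abundance index")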
Fig. 9.2. Plots of Lake Washington phytoplankton data with model fits (dark lines) from a model with 3 trends and a diagonal and unequal variance-covariance matrix for the observation errors. This model was run to convergence, so it is different from that shown in the text, which uses maxit=50.

9.4 Using model selection to determine the number of trends

Following Zuur et al. (2003), we use model selection criteria (specifically AICc) to determine the number of underlying trends that have the highest data support. Our first model had three underlying trends (m = 3). Let's compare this to a model with two underlying trends. The forms for the parameter matrix R and the vector a will stay the same, but we need to change the other parameter vectors and matrices because m is different.

After showing you the matrix math behind a DFA model, we will now use the form argument in a MARSS() call to specify that we want to fit a DFA model. This will set up the Z matrix and the other parameters for you. Specify how many trends you want by passing in model=list(m=x). You can also pass in different forms for the R matrix in the usual way. Here is how to fit two trends using form="dfa":

model.list = list(m=2, R="diagonal and unequal")
kemz.2 = MARSS(dat.spp.1980, model=model.list,
   z.score=TRUE, form="dfa", control=cntl.list)

Warning! Reached maxit before parameters converged. Maxit was 50.
neither abstol nor log-log convergence tests were passed.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: maxit reached at 50 iter before convergence.
 Neither abstol nor log-log convergence test were passed.
 The likelihood and params are not at the ML values.
 Try setting control$maxit higher.
Log-likelihood: -789.7433
AIC: 1607.487   AICc: 1608.209

                             Estimate
Z.11                           0.3128
Z.21                           0.1797
Z.31                           0.3061
Z.41                           0.5402
Z.51                           0.0791
Z.22                          -0.1174
Z.32                           0.4024
Z.42                          -0.0552
Z.52                           0.3895
R.(Cryptomonas,Cryptomonas)    0.7500
R.(Diatoms,Diatoms)            0.8565
R.(Greens,Greens)              0.5672
R.(Unicells,Unicells)          0.2292
R.(Other.algae,Other.algae)    0.6738

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

Convergence warnings
 Warning: the Z.31 parameter value has not converged.
 Warning: the Z.51 parameter value has not converged.
 Warning: the Z.32 parameter value has not converged.
 Warning: the Z.42 parameter value has not converged.
 Warning: the R.(Greens,Greens) parameter value has not converged.
 Warning: the logLik parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.

and compare its AICc value to that from the 3-trend model.

print(cbind(model=c("3 trends", "2 trends"),
   AICc=round(c(kemz.3$AICc, kemz.2$AICc))),
   quote=FALSE)

     model    AICc
[1,] 3 trends 1589
[2,] 2 trends 1608

It looks like a model with 3 trends has much more support from the data because its AICc value is more than 10 units less than that for the 2-trend model.

9.4.1 Comparing many model structures

Now let's examine a larger suite of possible models. We will test from one to four underlying trends (m = 1 to 4) and four different structures for the R matrix:

1. same variances & no covariance ("diagonal and equal");
2. different variances & no covariance ("diagonal and unequal");
3. same variances & same covariance ("equalvarcov"); and
4. different variances & covariances ("unconstrained").

The following code builds our model matrices; you could also write out each matrix as we did in the first example, but this allows us to build and run all of the models together. (NOTE: the following piece of code will take a very long time to run!)

# set new control params
cntl.list = list(minit=200, maxit=5000, allow.degen=FALSE)
# set up forms of R matrices
levels.R = c("diagonal and equal",
   "diagonal and unequal",
   "equalvarcov",
   "unconstrained")
model.data = data.frame()
# fit lots of models & store results
# NOTE: this will take a long time to run!
for(R in levels.R) {
   for(m in 1:(N.ts-1)) {
      dfa.model = list(A="zero", R=R, m=m)
      kemz = MARSS(dat.z, model=dfa.model, control=cntl.list,
         form="dfa", z.score=TRUE)
      model.data = rbind(model.data,
         data.frame(R=R, m=m, logLik=kemz$logLik, K=kemz$num.params,
            AICc=kemz$AICc, stringsAsFactors=FALSE))
      assign(paste("kemz", m, R, sep="."), kemz)
   } # end m loop
} # end R loop

Model selection results are shown in Table 9.1. The two best models (∆AICc ≤ 0.1) have 2-3 trends and an unconstrained R matrix. It also appears that, in general, models with an unconstrained R matrix fit the data much better than those models with less complex structures for the observation errors (i.e., models with unconstrained forms for R had nearly all of the AICc weight).

Table 9.1. Model selection results.
R                     m  logLik  delta.AICc  Ak.wt  Ak.wt.cum
unconstrained         3  -762.5         0.0   0.39       0.39
unconstrained         2  -765.9         0.1   0.37       0.76
unconstrained         4  -761.5         2.3   0.12       0.89
unconstrained         1  -772.4         4.4   0.04       0.93
diagonal and unequal  4  -774.2         5.9   0.02       0.95
equalvarcov           2  -782.7         6.1   0.02       0.97
diagonal and unequal  3  -777.1         7.5   0.01       0.98
diagonal and equal    4  -779.3         7.7   0.01       0.99
diagonal and equal    3  -781.8         8.4   0.01       0.99
equalvarcov           4  -779.0         9.1   0.00       1.00
equalvarcov           3  -781.4         9.9   0.00       1.00
diagonal and unequal  2  -786.6        20.2   0.00       1.00
equalvarcov           1  -799.9        32.3   0.00       1.00
diagonal and equal    2  -798.4        35.4   0.00       1.00
diagonal and unequal  1  -798.4        35.4   0.00       1.00
diagonal and equal    1  -813.5        57.4   0.00       1.00

9.5 Using varimax rotation to determine the loadings and trends

As Harvey (1989, p. 450) discusses in section 8.5.1, there are multiple equivalent solutions to the dynamic factor loadings. We arbitrarily constrained Z in such a way as to choose only one of these solutions, but fortunately the different solutions are equivalent, and they can be related to each other by a rotation matrix H. Let H be any m × m non-singular matrix. The following are then equivalent solutions:

yt = Zxt + a + vt
xt = xt−1 + wt     (9.9)

and

yt = ZH−1 xt + a + vt
Hxt = Hxt−1 + Hwt     (9.10)

There are many ways of doing factor rotations, but a common approach is the varimax rotation, which seeks a rotation matrix H that creates the largest difference between loadings. For example, let's say there are three trends in our model. In our estimated Z matrix, let's say row 3 is (0.2, 0.2, 0.2). That would mean that data series 3 is equally described by trends 1, 2, and 3. If instead row 3 were (0.8, 0.1, 0.1), this would make interpretation easier because we could say that data time series 3 is mostly described by trend 1. The varimax rotation finds the H matrix that makes the Z rows more like (0.8, 0.1, 0.1) and less like (0.2, 0.2, 0.2).

The varimax rotation is easy to compute because R has a built-in function for it. To do so, we first get the model fits from the highest-ranked model.

# get the "best" model
best.model = model.tbl[1,]
fitname = paste("kemz",best.model$m,best.model$R,sep=".")
best.fit = get(fitname)

Next, we retrieve the matrix used for varimax rotation.

# get the inverse of the rotation matrix
H.inv = varimax(coef(best.fit, type="matrix")$Z)$rotmat

Finally, we use H−1 to rotate the factor loadings and H to rotate the trends, as in Equation 9.10.

# rotate factor loadings
Z.rot = coef(best.fit, type="matrix")$Z %*% H.inv
# rotate trends
trends.rot = solve(H.inv) %*% best.fit$states

Rotated factor loadings for the best model are shown in Figure 9.3. Oddly, some taxa appear to have no loadings on some trends (e.g., diatoms on trend 1). The reason is that, merely for display purposes, we chose to plot only those loadings that are greater than 0.05, and it turns out that after varimax rotation, several loadings are close to 0.

Recall that we set Var(wt) = Q = Im in order to make our DFA model identifiable. Does the variance in the process errors also change following varimax rotation? Interestingly, no. Because H is a non-singular, orthogonal matrix, Var(Hwt) = H Var(wt) H⊤ = H Im H⊤ = Im.

Fig. 9.3. Plot of the factor loadings (following varimax rotation) from the best model fit to the phytoplankton data.
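The plotting code for Figure 9.3 is not shown in the text; here is a minimal sketch of one way to draw the rotated loadings, using the 0.05 display cutoff mentioned above (Z.rot and phytoplankton are from the code above):

minZ = 0.05 #only plot loadings greater than this
par(mfrow=c(1,ncol(Z.rot)))
for(j in 1:ncol(Z.rot)) {
   keep = abs(Z.rot[,j]) > minZ
   barplot(Z.rot[keep,j], names.arg=phytoplankton[keep],
      horiz=TRUE, las=1, main=paste("Factor loadings on trend", j))
}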
9.6 Examining model fits

Now that we have found a "best" model and done the appropriate factor and trend rotations, we should examine some plots of model fits (see Figure 9.5). First, it looks like the model did an adequate job of capturing some of the high-frequency variation (i.e., seasonality) in the time series. Second, some of the time series had much better overall fits than others (e.g., compare Diatoms versus Cryptomonas). Given the obvious seasonal patterns in the phytoplankton data, it might be worthwhile to first "detrend" the data and then repeat the model-fitting exercise to see (1) how many trends would be favored, and (2) the shape of those trends.

Fig. 9.4. Plot of the unobserved trends (following varimax rotation) from the best model fit to the phytoplankton data.

Fig. 9.5. Plot of the "best" model fits to the phytoplankton data.

9.7 Adding covariates

It is standard to add covariates to the analysis so that one removes known important drivers. The DFA with covariates is written:

xt = xt−1 + wt where wt ∼ MVN(0, Q)
yt = Zxt + a + Ddt + vt where vt ∼ MVN(0, R)     (9.11)
x0 ∼ MVN(π, Λ)

where the q × 1 vector dt contains the covariate(s) at time t, and the n × q matrix D contains the effect(s) of the covariate(s) on the observations. Using form="dfa" and covariates=, we can easily add covariates to our DFA, but this means that the covariates are inputs, not data, and there can be no missing values (see Chapter 6 for how to include covariates with missing values).

The Lake Washington dataset has two environmental covariates that we might expect to have effects on phytoplankton growth, and hence abundance: temperature (Temp) and total phosphorous (TP). We need the covariate inputs to have the same number of time steps as the variate data, and thus we limit the covariate data to the years 1980-1989 also.
temp = t(plankdat[,"Temp",drop=FALSE])
TP = t(plankdat[,"TP",drop=FALSE])

We will now fit 3 different models that each add covariate effects (i.e., Temp, TP, Temp & TP) to our "best" model from Table 9.1, where m = 2 and R is "unconstrained".

model.list=list(m=2, R="unconstrained")
kemz.temp = MARSS(dat.spp.1980, model=model.list, z.score=TRUE,
   form="dfa", control=cntl.list, covariates=temp)
kemz.TP = MARSS(dat.spp.1980, model=model.list, z.score=TRUE,
   form="dfa", control=cntl.list, covariates=TP)
kemz.both = MARSS(dat.spp.1980, model=model.list, z.score=TRUE,
   form="dfa", control=cntl.list, covariates=rbind(temp,TP))

Next we can compare whether the addition of the covariates improves the model fit (effectively less residual error while accounting for the additional parameters). (NOTE: The following results were obtained by letting the EM algorithm run for a very long time, so your results may differ.)

print(cbind(model=c("no covars", "Temp", "TP", "Temp & TP"),
   AICc=round(c(best.fit$AICc, kemz.temp$AICc,
   kemz.TP$AICc, kemz.both$AICc))), quote=FALSE)

     model     AICc
[1,] no covars 1582
[2,] Temp      1518
[3,] TP        1568
[4,] Temp & TP 1522

This suggests that adding temperature or phosphorus to the model, either alone or in combination with one another, improves overall model fit. If we were truly interested in assessing the "best" model structure that includes covariates, however, we should examine all combinations of trends and structures for R. The model fits for the temperature-only model are shown in Figure 9.6, and they appear much better than those of the best model without any covariates.

Fig. 9.6. Plot of the fits from the temperature-only model to the phytoplankton data.
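To see the size of the estimated temperature effects on each taxon, you can pull the covariate-effect matrix out of the fitted object (a sketch; we assume here that the effects are stored under D, as in the general MARSS parameterization with covariates):

D.temp = coef(kemz.temp, type="matrix")$D
rownames(D.temp) = phytoplankton
D.temp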
9.8 Questions and further analyses

We analyzed the phytoplankton data alone. You can try analyzing the zooplankton data (type head(plankdat) to see the names). You can also try analyzing the phytoplankton and zooplankton together. You can also try different assumptions concerning the structure of R; we just tried unconstrained, diagonal and unequal, and diagonal and equal. To see all the R code behind the figures, type RShowDoc("Chapter_DFA.R",package="MARSS"). This opens a file with all the code. Copy and paste the code into a new file, and then you can edit that code. DFA models often take an unusually long time to converge. In a real DFA, you will want to make sure to try different initial starting values (e.g., set MCInit=TRUE), and force the algorithm to run a long time by using minit=x and maxit=(x+c), where x and c are something like 200 and 5000, respectively.

10 Analyzing noisy animal tracking data

10.1 A simple random walk model of animal movement

A simple random walk model of movement with drift (directional movement) but no correlation is

x1,t = x1,t−1 + u1 + w1,t, w1,t ∼ N(0, σ1²)     (10.1)
x2,t = x2,t−1 + u2 + w2,t, w2,t ∼ N(0, σ2²)     (10.2)

where x1,t is the location at time t along one axis (here, longitude) and x2,t is the location along another, generally orthogonal, axis (here, latitude). The parameter u1 is the rate of longitudinal movement and u2 is the rate of latitudinal movement. We add errors to our observations of location:

y1,t = x1,t + v1,t, v1,t ∼ N(0, η1²)     (10.3)
y2,t = x2,t + v2,t, v2,t ∼ N(0, η2²)     (10.4)

This model is comprised of two separate univariate state-space models. Note that y1 depends only on x1 and y2 depends only on x2. There are no actual interactions between these two univariate models. However, we can write the model down in the form of a multivariate model using diagonal variance-covariance matrices and a diagonal design (Z) matrix. Because the variance-covariance matrices and Z are diagonal, the x1:y1 and x2:y2 processes will be independent as intended. Here are Equations 10.1 through 10.4 written as a MARSS model (in matrix form):

\[
\begin{bmatrix}x_{1,t}\\ x_{2,t}\end{bmatrix} =
\begin{bmatrix}x_{1,t-1}\\ x_{2,t-1}\end{bmatrix} +
\begin{bmatrix}u_1\\ u_2\end{bmatrix} +
\begin{bmatrix}w_{1,t}\\ w_{2,t}\end{bmatrix}, \quad
\mathbf{w}_t \sim \text{MVN}\left(0, \begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}\right) \tag{10.5}
\]

\[
\begin{bmatrix}y_{1,t}\\ y_{2,t}\end{bmatrix} =
\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}
\begin{bmatrix}x_{1,t}\\ x_{2,t}\end{bmatrix} +
\begin{bmatrix}v_{1,t}\\ v_{2,t}\end{bmatrix}, \quad
\mathbf{v}_t \sim \text{MVN}\left(0, \begin{bmatrix}\eta_1^2 & 0\\ 0 & \eta_2^2\end{bmatrix}\right) \tag{10.6}
\]

The variance-covariance matrix for wt is a diagonal matrix with unequal variances, σ1² and σ2². The variance-covariance matrix for vt is a diagonal matrix with unequal variances, η1² and η2². We can write this succinctly as

xt = xt−1 + u + wt, wt ∼ MVN(0, Q)     (10.7)
yt = xt + vt, vt ∼ MVN(0, R)     (10.8)

Type RShowDoc("Chapter_AnimalTracking.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.
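To see what data from this model look like, here is a minimal sketch that simulates a drifting random-walk track with observation error (all parameter values are illustrative):

TT = 100
u = c(0.02, 0.01); q = c(0.001, 0.001); r = c(0.01, 0.01)
x = matrix(0, 2, TT)
for(t in 2:TT) x[,t] = x[,t-1] + u + rnorm(2, 0, sqrt(q))
y = x + matrix(rnorm(2*TT, 0, sqrt(r)), 2, TT) #noisy observations
plot(y[1,], y[2,], col="blue"); lines(x[1,], x[2,], lwd=2)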
10.2 Loggerhead sea turtle tracking data

Loggerhead sea turtles (Caretta caretta) are listed as threatened under the United States Endangered Species Act of 1973. Over the last ten years, a number of state and local agencies have been deploying ARGOS tags on loggerhead turtles on the east coast of the United States. We have data on eight individuals over that period. In this chapter, we use some turtle data from the WhaleNet Archive of STOP Data; however, we have corrupted this data severely by adding random errors in order to create a "bad tag" problem. We corrupted the latitude and longitude data by adding errors (Figure 10.1), and it would appear that our sea turtles are becoming land turtles (at least part of the time). We will use a MARSS model to estimate true positions and speeds from the corrupted data.

Our noisy data are in loggerheadNoisy. They consist of daily readings of location (longitude/latitude). If data are missing for a day, then the entries for latitude and longitude for that day should be NA. However, to make the code in this chapter run quickly, we have interpolated all missing values in the original, uncorrupted, dataset (loggerhead). The first six lines of the corrupted data look like so:

loggerheadNoisy[1:6,]

   turtle month day year       lon      lat
1 BigMama     5  28 2001 -81.45989 31.70337
2 BigMama     5  29 2001 -80.88292 32.18865
3 BigMama     5  30 2001 -81.27393 31.67568
4 BigMama     5  31 2001 -81.59317 31.83092
5 BigMama     6   1 2001 -81.35969 32.12685
6 BigMama     6   2 2001 -81.15644 31.89568

The file has data for eight turtles:

turtles=levels(loggerheadNoisy$turtle)
turtles

[1] "BigMama"  "Bruiser"  "Humpty"   "Isabelle" "Johanna"
[6] "MaryLee"  "TBA"      "Yoto"

We will analyze the position data for "Big Mama". We put the data for "Big Mama" into the matrix dat. dat is transposed because we need time across the columns.

turtlename="BigMama"
dat = loggerheadNoisy[which(loggerheadNoisy$turtle==turtlename),5:6]
dat = t(dat) #transpose

Figure 10.1 shows the corrupted location data for Big Mama. The figure was generated with the code below and uses the maps R package. You will need to install this R package in order to run the example code.

#load the map package; you have to install it first
library(maps)
# Read in our noisy data (no missing values)
pdat = loggerheadNoisy #for plotting
turtlename="BigMama"
par(mai = c(0,0,0,0),mfrow=c(1,1))
map('state', region = c('florida', 'georgia', 'south carolina',
   'north carolina', 'virginia', 'delaware','new jersey','maryland'),
   xlim=c(-85,-70))
points(pdat$lon[which(pdat$turtle==turtlename)],
   pdat$lat[which(pdat$turtle==turtlename)],
   col="blue",pch=21, cex=0.7)

Fig. 10.1. Plot of the tag data from the turtle Big Mama. Errors in the location data make it seem that Big Mama has been moving overland.

10.3 Estimate locations from bad tag data

We will begin by specifying the structure of the MARSS model and then use MARSS() to fit that model to the data. There are two state processes (one for latitude and the other for longitude), and there is one observation time series for each state process. As we saw in Equation 10.6, Z is an identity matrix (a diagonal matrix with 1s on the diagonal). We could specify this structure as Z.model="identity" or Z.model=factor(c(1,2)), although technically this is unnecessary, as this is the default form for Z. We will assume that the errors are independent and that there are different drift rates (u), process variances (σ²), and observation variances (η²) for latitude and longitude.

Z.model="identity"
U.model="unequal"
Q.model="diagonal and unequal"
R.model="diagonal and unequal"

Fit the model to the data:

kem = MARSS(dat, model=list(Z = Z.model,
   Q = Q.model, R = R.model, U = U.model))
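Before plotting, it can be helpful to glance at the estimated parameters (a quick sketch; coef() on a fitted MARSS object returns the estimates):

coef(kem, type="vector") #drift (U), process (Q), and observation (R) variance estimates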
Now we can create a plot comparing the estimated and actual locations (Figure 10.2). In Figure 10.2, the real locations (from which loggerheadNoisy was produced by adding noise) are in loggerhead and are plotted with crosses. There are only a few data points for the real data because in the real tag data there are many missing days.

#Code to plot estimated turtle track against observations
#The estimates
pred.lon = kem$states[1,]
pred.lat = kem$states[2,]
par(mai = c(0,0,0,0), mfrow=c(1,1))
library(maps)
pdat = loggerheadNoisy
turtlename = "BigMama"
map('state', region = c('florida', 'georgia', 'south carolina',
  'north carolina', 'virginia', 'delaware', 'new jersey', 'maryland'),
  xlim=c(-85,-70))
points(pdat$lon[which(pdat$turtle==turtlename)],
  pdat$lat[which(pdat$turtle==turtlename)],
  col="blue", pch=21, cex=0.7)
lines(pred.lon, pred.lat, col="red", lwd=2)
goodturtles = loggerhead
gooddat = goodturtles[which(goodturtles$turtle==turtlename),5:6]
points(gooddat[,1], gooddat[,2], col="black", lwd=2, pch=3, cex=1.1)
legend("bottomright", c("bad locations", "estimated true location",
  "good location data"), pch=c(1,-1,3), lty=c(-1,1,-1),
  col=c("blue","red","black"), bty="n")

Fig. 10.2. Plot of the estimated track of the turtle Big Mama versus the good location data (before we corrupted it with noise).

10.4 Estimate speeds for each turtle

Turtle biologists designated one of these loggerheads "Big Mama," presumably for her size and speed. For each of the eight turtles, estimate the average miles traveled per day. To calculate the distance traveled by a turtle each day, you use the estimate (from MARSS()) of the lat/lon location of the turtle at day t and at day t − 1. To calculate the distance traveled in miles from lat/lon start and finish locations, we will use the function GCDF:

GCDF <- function(lon1, lon2, lat1, lat2, degrees=TRUE, units="miles") {
  temp = ifelse(degrees==FALSE,
    acos(sin(lat1)*sin(lat2)+cos(lat1)*cos(lat2)*cos(lon2-lon1)),
    acos(sin(lat1/57.2958)*sin(lat2/57.2958)
      +cos(lat1/57.2958)*cos(lat2/57.2958)*cos(lon2/57.2958-lon1/57.2958)))
  r = 3963.0                     # Earth radius (statute miles), default
  if(units=="nm") r = 3437.74677 # (nautical miles)
  if(units=="km") r = 6378.7     # (kilometers)
  return(r * temp)
}

We can now compute the distance traveled each day by passing in the lat/lon estimates from day i − 1 and day i:

distance[i-1]=GCDF(pred.lon[i-1],pred.lon[i],
  pred.lat[i-1],pred.lat[i])

pred.lon and pred.lat are the predicted longitudes and latitudes from MARSS(): rows one and two in kem$states. To calculate the distances for all days, we put this through a for loop:

distance = array(NA, dim=c(dim(dat)[2]-1,1))
for(i in 2:dim(dat)[2])
  distance[i-1]=GCDF(pred.lon[i-1],pred.lon[i],
    pred.lat[i-1],pred.lat[i])

The command mean(distance) gives us the average distance per day. We can also make a histogram of the distances traveled per day (Figure 10.3):

par(mfrow=c(1,1))
hist(distance) #make a histogram of distance traveled per day

Fig. 10.3. Histogram of the miles traveled per day for Big Mama with estimates that account for measurement error in the data.

We can compare this histogram to what we would get if we had not accounted for measurement error (Figure 10.4):

# Compare to the distance traveled per day if you used the raw data
distance.noerr = array(NA, dim=c(dim(dat)[2]-1,1))
for(i in 2:dim(dat)[2])
  distance.noerr[i-1]=GCDF(dat[1,i-1],dat[1,i],dat[2,i-1],dat[2,i])
hist(distance.noerr) #make a histogram of distance traveled per day

Fig. 10.4. Histogram of the miles traveled per day for Big Mama with estimates that do not account for measurement error in the data.

We can also compare the mean miles per day:

#accounting for observation error
mean(distance)
[1] 15.53858
#assuming the data have no observation error
mean(distance.noerr)
[1] 34.80579

You can repeat the analysis done for "Big Mama" for each of the other turtles and compare the turtle speeds and errors. You will need to replace "Big Mama" in the code with the name of each other turtle; one way is to loop over the turtle names, as sketched below this list:

levels(loggerheadNoisy$turtle)
[1] "BigMama"  "Bruiser"  "Humpty"   "Isabelle" "Johanna"
[6] "MaryLee"  "TBA"      "Yoto"
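This loop is a sketch, not code from the guide; it reuses the Z.model, U.model, Q.model, R.model, and GCDF objects defined above, and speeds is a hypothetical name.

speeds = list()
for(turtlename in levels(loggerheadNoisy$turtle)){
  dat = t(loggerheadNoisy[which(loggerheadNoisy$turtle==turtlename),5:6])
  kem = MARSS(dat, model=list(Z=Z.model, Q=Q.model,
    R=R.model, U=U.model), silent=TRUE)
  pred.lon = kem$states[1,]; pred.lat = kem$states[2,]
  distance = array(NA, dim=c(dim(dat)[2]-1,1))
  for(i in 2:dim(dat)[2])
    distance[i-1] = GCDF(pred.lon[i-1], pred.lon[i],
      pred.lat[i-1], pred.lat[i])
  speeds[[turtlename]] = mean(distance)
}
unlist(speeds) # average miles traveled per day, by turtle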
10.5 Using specialized packages to analyze tag data

If you have real tag data to analyze, you should use a state-space modeling package that is customized for fitting MARSS models to tracking data. The MARSS package does not have all the bells and whistles that you would want for analyzing tracking data, particularly tracking data in the marine environment. Here are a couple of R packages that we have come across for this purpose:

UKFSST http://www.soest.hawaii.edu/tag-data/tracking/ukfsst/
KFTRACK http://www.soest.hawaii.edu/tag-data/tracking/kftrack/

kftrack is a full-featured toolbox for analyzing tag data with extended Kalman filtering. It incorporates a number of extensions that are important for analyzing track data: barriers to movement such as coastlines, and non-Gaussian movement distributions. With kftrack, you can use the real tag data, which have big gaps, i.e., days with no location. MARSS() will struggle with these data because it will estimate states for all the unseen days; kftrack only fits to the seen days. To use kftrack to fit the turtle data, type

library(kftrack) # must be installed from a local zip file
loggerhead = loggerhead
# Run kftrack with the first turtle (BigMama)
turtlename = "BigMama"
dat = loggerhead[which(loggerhead$turtle == turtlename),2:6]
model = kftrack(dat, fix.first=F, fix.last=F, var.struct="uniform")

11 Detection of outliers and structural breaks

Type RShowDoc("Chapter_StructuralBreaks.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.

11.1 River flow in the Nile River

This chapter is based on a short example shown on pages 147-148 in Koopman et al. (1999) using a 100-year record of river flow on the Nile River. The methods are based on Harvey et al. (1998), which is in turn based on techniques in Harvey and Koopman (1992) and Koopman (1993). The Nile dataset is included in R. Figure 11.1 shows the data.

#load the datasets package
library(datasets)
data(Nile) #load the data
plot(Nile, ylab="Flow volume", xlab="Year")

Fig. 11.1. The Nile River flow volume 1871 to 1970 (a dataset included in R).

11.2 Different models for the Nile flow levels

We begin by fitting different flow models to the data and comparing these models with AIC. After that, we will use the model residuals to look for outliers and structural breaks.

11.2.1 Flat level model

We will start by modeling these data as a simple average river flow with variability around this level:

y_t = a + v_t, \quad v_t \sim N(0, r) \qquad (11.1)

where y_t is the river flow volume at year t and a is the constant average flow level (notice it has no t subscript). To fit this model with MARSS, we will explicitly show all the MARSS parameters:
x_t = 1 \times x_{t-1} + 0 + w_t, \quad w_t \sim N(0, 0)
y_t = 0 \times x_t + a + v_t, \quad v_t \sim N(0, r) \qquad (11.2)
x_0 = 0

MARSS includes the state process x_t, but we are setting Z to zero so that it does not appear in our observation model. We need to fix all the state parameters to zero so that the algorithm doesn't "chase its tail" trying to fit x_t to the data. An equivalent way to write this model is to use x_t as the average flow level and make it a constant level by setting q = 0. The average flow then appears as the x_0 parameter. In MARSS form, the model is:

x_t = 1 \times x_{t-1} + 0 + w_t, \quad w_t \sim N(0, 0)
y_t = 1 \times x_t + 0 + v_t, \quad v_t \sim N(0, r) \qquad (11.3)
x_0 = a

We will use this latter format since we will be building on this form. The model is specified as a list as follows, and we denote this model "0":

mod.nile.0 = list(
  Z=matrix(1), A=matrix(0), R=matrix("r"),
  B=matrix(1), U=matrix(0), Q=matrix(0),
  x0=matrix("a") )

We then fit the model with MARSS():

#The data are in ts format, and we need a matrix
dat = t(as.matrix(Nile))
#Now we fit the model
kem.0 = MARSS(dat, model=mod.nile.0)

Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -654.5157
AIC: 1313.031   AICc: 1313.155

     Estimate
R.r     28352
x0.a      919

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

11.2.2 Linear trend in flow model

Figure 11.2 shows the fit for the flat average river flow model. Looking at the data, we might expect that a declining average river flow would be better. In MARSS form, that model would be:

x_t = 1 \times x_{t-1} + u + w_t, \quad w_t \sim N(0, 0)
y_t = 1 \times x_t + 0 + v_t, \quad v_t \sim N(0, r) \qquad (11.4)
x_0 = a

where u is now the average per-year decline in river flow volume. The model is specified as a list as follows, and we denote this model "1":

mod.nile.1 = list(
  Z=matrix(1), A=matrix(0), R=matrix("r"),
  B=matrix(1), U=matrix("u"), Q=matrix(0),
  x0=matrix("a") )

We then fit the model with MARSS():

kem.1 = MARSS(dat, model=mod.nile.1)

Success! abstol and log-log tests passed at 18 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 18 iterations.
Log-likelihood: -642.3159
AIC: 1290.632   AICc: 1290.882

     Estimate
R.r  22213.60
U.u     -2.69
x0.a  1054.94

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

Figure 11.2 shows the fits for the two models with deterministic levels (flat and declining) for mean river flow, along with their AICc values (smaller AICc is better). The AICc for the model with a declining river flow is lower by over 20 (which is a lot).
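As we add models, it is convenient to collect the AICc values for comparison. A minimal sketch using the fitted objects above (AICc is an element of the object returned by MARSS()):

# Compare data support for the two deterministic-level models;
# smaller AICc is better
c(flat = kem.0$AICc, linear.trend = kem.1$AICc)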
11.2.3 Stochastic level model

Looking at the flow levels, we might suspect that a model that allows the average flow to change would fit the data better, and we might suspect that there have been sudden, anomalous changes in the river flow level. We will now model the average river flow at year t as a random walk, specifically an autoregressive process, which means that the average river flow in year t is a function of the average river flow in year t − 1:

x_t = x_{t-1} + w_t, \quad w_t \sim N(0, q)
y_t = x_t + v_t, \quad v_t \sim N(0, r) \qquad (11.5)
x_0 = \pi

As before, y_t is the river flow volume at year t. With all the MARSS parameters shown, the model is:

x_t = 1 \times x_{t-1} + 0 + w_t, \quad w_t \sim N(0, q)
y_t = 1 \times x_t + 0 + v_t, \quad v_t \sim N(0, r) \qquad (11.6)
x_0 = \pi

Thus, Z = 1, a = 0, R = r, B = 1, u = 0, Q = q, and x_0 = \pi. The model is then specified as:

mod.nile.2 = list(
  Z=matrix(1), A=matrix(0), R=matrix("r"),
  B=matrix(1), U=matrix(0), Q=matrix("q"),
  x0=matrix("pi") )

We could also use the text shortcuts to specify the model. Because R and Q are 1 × 1 matrices, "unconstrained", "diagonal and unequal", "diagonal and equal", and "equalvarcov" will all lead to a 1 × 1 matrix with one estimated element. For a and u, the shortcut A=U="zero" could be used. Because x_0 is 1 × 1, it could be specified as "unequal", "equal", or "unconstrained".

We fit the model with the MARSS() function. We use the "BFGS" algorithm to polish off the estimates, since it will get to the maximum faster than the default EM algorithm as long as we start it close to the maximum:

kem.2em = MARSS(dat, model=mod.nile.2, silent=TRUE)
kem.2 = MARSS(dat, model=mod.nile.2,
  inits=kem.2em$par, method="BFGS")

Success! Converged in 12 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS
Estimation converged in 12 iterations.
Log-likelihood: -637.7451
AIC: 1281.49   AICc: 1281.74

      Estimate
R.r      15337
Q.q       1218
x0.pi     1112

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

This is the same model fit in Koopman et al. (1999, p. 148) except that we estimate x_1 as a parameter rather than specifying x_1 via a diffuse prior. As a result, the log-likelihood value and the R and Q estimates are a little different than in Koopman et al. (1999).

Fig. 11.2. The Nile River flow volume with the model-estimated flow levels (solid lines): top panel, model 0 (AICc = 1313); middle panel, model 1 (AICc = 1291); bottom panel, model 2 (AICc = 1282). The bottom model is a stochastic level model, and the 2 standard deviations for the level are also shown. The other two models are deterministic level models, so the state is not stochastic and does not have a standard deviation.
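The code behind Figure 11.2 is in the chapter's R file, but the bottom panel is easy to sketch from the fitted object; states and states.se are standard elements of the object returned by MARSS():

# Data with the estimated stochastic level +/- 2 standard deviations
level = kem.2$states[1,]
level.se = kem.2$states.se[1,]
plot(1871:1970, dat[1,], pch=16, xlab="Year", ylab="Flow volume")
lines(1871:1970, level, col="red", lwd=2)
lines(1871:1970, level + 2*level.se, lty=2, col="red")
lines(1871:1970, level - 2*level.se, lty=2, col="red")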
11.3 Observation and state residuals

Figure 11.2 shows the MARSS fits to the data. From these model fits, auxiliary residuals can be computed which contain information about whether the data and model fits at time t differ more than you would expect given the model and the model fits at time t − 1. In this section, we follow the example shown on pages 147-148 in Koopman et al. (1999) and use these residuals to look for outliers and sudden flow level changes. Using auxiliary residuals this way follows mainly from Harvey and Koopman (1992), but see also Koopman (1993, sec. 3), de Jong and Penzer (1998), and Penzer (2001) for discussions of using auxiliary residuals for detection of outliers and structural breaks.

The MARSS function will output the expected values of x_t conditioned on the maximum-likelihood values of q, r, and x_1 and on the data (y from t = 1 to T). In the time-series literature, these are called the smoothed state estimates, and they are output by the Kalman filter-smoother. We will call these smoothed estimates \tilde{x}_{t|T} (they are xtT in the MARSS output). The time value after the | in the subscript indicates the data on which the estimate was conditioned (in this case, 1 to T). From these, we can compute the model predicted value of y_t, denoted \hat{y}_{t|T}. This is the predicted value of y_t conditioned on \tilde{x}_{t|T}:

\tilde{x}_{t|T} = \operatorname{E}(X_t \mid \hat{\theta}, y_1^T)
\hat{y}_{t|T} = \operatorname{E}(Y_t \mid \hat{\theta}, \tilde{x}_{t|T}) = \tilde{x}_{t|T} + \operatorname{E}(v_t \mid \hat{\theta}, y_1^T) = \tilde{x}_{t|T} \qquad (11.7)

where \hat{\theta} are the maximum-likelihood estimates of the parameters. The \hat{y}_{t|T} equation comes directly from Equation 11.5. This expectation is not conditioned on the data y_1^T directly; it is conditioned on \tilde{x}_{t|T}, which is conditioned on y_1^T.

11.3.1 Using observation residuals to detect outliers

The standardized smoothed observation residuals[1] are the difference between the data at time t and the model fit at time t conditioned on all the data, standardized by the observation variance:

\hat{v}_t = y_t - \hat{y}_{t|T}, \qquad e_t = \frac{\hat{v}_t}{\sqrt{\operatorname{var}(\hat{v}_t)}} \qquad (11.8)

These residuals should have (asymptotically) a t-distribution (Kohn and Ansley, 1989, sec. 3), and by looking at the residuals, we can identify potential outlier data points, or more accurately, data points that do not fit the model (Equation 11.5). The call residuals() will compute these residuals for a marssMLE object (output by a MARSS call). It returns the standardized residuals (also called auxiliary residuals) as an (n + m) × T matrix. The first n rows are the estimated v_t standardized observation residuals, and the next m rows are the estimated w_t standardized state residuals (discussed below).

[1] Also called smoothations in the literature to distinguish them from innovations, which are y_t − E(Y_t | \tilde{x}_{t|t-1}). Notice that for innovations the expectation is conditioned on the data up to time t − 1, while for smoothations we condition on all the data.

resids.0=residuals(kem.0)$std.residuals
resids.1=residuals(kem.1)$std.residuals
resids.2=residuals(kem.2)$std.residuals

Figure 11.3 shows the observation residuals for the three models developed above. We immediately see that model 0 (flat level) and model 1 (linearly declining level) have problems because the residuals are all positive for the first part of the time series and then all negative. The residuals should not be temporally correlated like that. Model 2 with a stochastic level shows well-behaved residuals with low temporal correlation between t and t − 1. Looking at the residuals for model 2, we see that there are a number of years with flow levels that appear to be outliers (beyond the dashed level lines).

Fig. 11.3. The standardized observation residuals from models 0, 1, and 2. These residuals are the standardized \hat{v}_t. The dashed lines are the 95% CIs for a t-distribution.
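A sketch of one panel of Figure 11.3, using large-sample normal limits (±1.96) for the dashed lines:

# Standardized observation residuals for model 2; points beyond
# the dashed lines are candidate outliers
plot(1871:1970, resids.2[1,], type="h",
  xlab="Year", ylab="std. residuals")
abline(h=c(-1.96, 1.96), lty=2)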
11.3.2 Detecting sudden level changes

The standardized smoothed state residuals (f_t below) are the difference between the estimated state at time t and the estimated state at time t − 1 conditioned on all the data, standardized by the standard deviation:

\hat{w}_t = \tilde{x}_{t|T} - \tilde{x}_{t-1|T}, \qquad f_t = \frac{\hat{w}_t}{\sqrt{\operatorname{var}(\hat{w}_t)}} \qquad (11.9)

These state residuals do not show simple changes in the average level; x_t is clearly changing in Figure 11.2, bottom panel. Instead, we are looking for "breaks", or sudden changes, in the level. The bottom panel of Figure 11.4 shows the standardized state residuals (f_t). As we can see by eye, the average flow level in the Nile appears to have suddenly changed around the turn of the century, when the first Aswan dam was built. The top panel shows the standardized observation residuals for comparison.

Fig. 11.4. Top panel ("test for outliers"), the standardized observation residuals. Bottom panel ("test for level changes"), the standardized state residuals. This replicates Figure 12 in Koopman et al. (1999).

12 Incorporating covariates into MARSS models

Type RShowDoc("Chapter_Covariates.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.

12.1 Covariates as inputs

A MARSS model with covariate effects in both the process and observation components is written as:

x_t = B_t x_{t-1} + u_t + C_t c_t + w_t, \quad w_t \sim \operatorname{MVN}(0, Q_t)
y_t = Z_t x_t + a_t + D_t d_t + v_t, \quad v_t \sim \operatorname{MVN}(0, R_t) \qquad (12.1)

where c_t is the p × 1 vector of covariates (e.g., temperature, rainfall) which affect the states, and d_t is a q × 1 vector of covariates (potentially the same as c_t) which affect the observations. C_t is an m × p matrix of coefficients relating the effects of c_t to the m × 1 state vector x_t, and D_t is an n × q matrix of coefficients relating the effects of d_t to the n × 1 observation vector y_t.

With the MARSS() function, one can fit this model by passing in model$c and/or model$d in the MARSS() call as a p × T or q × T matrix, respectively. The form for C_t and D_t is similarly specified by passing in model$C and/or model$D. Because C and D are matrices, they must be passed in as a 3-dimensional array with the 3rd dimension equal to the number of time steps if they are time-varying. If they are time-constant, then they can be specified as 2-dimensional matrices.

12.2 Examples using plankton data

Here we show some examples using the Lake Washington plankton data set and covariates in that dataset. We use the 10 years of data from 1965-1974 (Figure 12.1), a decade with particularly high green and bluegreen algae levels. We use the transformed plankton dataset, which has 0s replaced with NAs. Below, we set up the data and z-score it. The original data were already z-scored, but we changed the mean when we subsampled the years, so we need to z-score again.

fulldat = lakeWAplanktonTrans
years = fulldat[,"Year"]>=1965 & fulldat[,"Year"]<1975
dat = t(fulldat[years,c("Greens", "Bluegreens")])
the.mean = apply(dat,1,mean,na.rm=TRUE)
the.sigma = sqrt(apply(dat,1,var,na.rm=TRUE))
dat = (dat-the.mean)*(1/the.sigma)
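This de-mean-and-rescale idiom recurs several times in this chapter, so it can be handy to wrap it in a helper; zscore below is a hypothetical name for this sketch, not a function used elsewhere in this guide.

# z-score each row of a matrix (time across columns), NA-safe
zscore = function(x){
  the.mean = apply(x, 1, mean, na.rm=TRUE)
  the.sigma = sqrt(apply(x, 1, var, na.rm=TRUE))
  (x - the.mean) * (1/the.sigma)
}
# dat = zscore(dat) reproduces the code above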
Next we set up the covariate data: temperature and total phosphorous. We z-score the covariates to standardize them and remove the mean.

covariates = rbind(
  Temp = fulldat[years,"Temp"],
  TP = fulldat[years,"TP"])
# z.score the covariates
the.mean = apply(covariates,1,mean,na.rm=TRUE)
the.sigma = sqrt(apply(covariates,1,var,na.rm=TRUE))
covariates = (covariates-the.mean)*(1/the.sigma)

12.3 Observation-error only model

We can estimate the effect of the covariates using a process-error only model, an observation-error only model, or a model with both types of error. An observation-error only model is a multivariate regression, and we will start here so you can see the relationship of the MARSS model to more familiar linear regression models.

12.3.1 Multivariate linear regression

In a standard multivariate linear regression, we only have an observation model with independent errors (i.e., the state process does not appear in the model):

y_t = a + D d_t + v_t, \quad v_t \sim \operatorname{MVN}(0, R) \qquad (12.2)

The elements in a are the intercepts and those in D are the slopes (effects). We have dropped the t subscript on a and D because these will be modeled as time-constant. Writing this out for the two plankton taxa and the two covariates, we get:

\begin{bmatrix} y_g \\ y_{bg} \end{bmatrix}_t =
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} +
\begin{bmatrix} \beta_{g,temp} & \beta_{g,tp} \\ \beta_{bg,temp} & \beta_{bg,tp} \end{bmatrix}
\begin{bmatrix} temp \\ tp \end{bmatrix}_t +
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}_t \qquad (12.3)

Fig. 12.1. Time series of Green and Bluegreen algae abundances in Lake Washington along with the temperature and total phosphorous covariates.

Let's fit this model with MARSS. The x part of the model is irrelevant, so we want to fix the parameters in that part of the model. We won't set B = 0 or Z = 0, since that might cause numerical issues for the Kalman filter. Instead, we fix them as identity matrices and fix x0 = 0 so that x_t = 0 for all t.

Q = U = x0 = "zero"; B = Z = "identity"
d = covariates
A = "zero"
D = "unconstrained"
y = dat # to show relationship between dat & the equation
model.list = list(B=B, U=U, Q=Q, Z=Z, A=A, D=D, d=d, x0=x0)
kem = MARSS(y, model=model.list)

Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -276.4287
AIC: 562.8573   AICc: 563.1351

                    Estimate
R.diag                 0.706
D.(Greens,Temp)        0.367
D.(Bluegreens,Temp)    0.392
D.(Greens,TP)          0.058
D.(Bluegreens,TP)      0.535

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

We set A="zero" because the data and covariates have been demeaned. Of course, one can do multiple regression in R using, say, lm(), and that would be much, much faster. The EM algorithm is overkill here, but it is shown so that you see how a standard multivariate linear regression model is written as a MARSS model in matrix form.
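For comparison, here is a sketch of the analogous lm() fits, assuming the demeaned dat and covariates defined above. lm() fits each response separately (rather than with a shared R.diag) and drops the NAs, so the slopes will be close to, but not exactly, the D estimates from the EM fit.

# One regression per taxon; ~ 0 + ... drops the intercept,
# matching A="zero" for the demeaned data
lm(dat["Greens",] ~ 0 + covariates["Temp",] + covariates["TP",])
lm(dat["Bluegreens",] ~ 0 + covariates["Temp",] + covariates["TP",])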
12.3.2 Multivariate linear regression with autocorrelated errors

We can add a twist to the standard multivariate linear regression model: instead of temporally i.i.d. errors in the observation process, we'll assume autoregressive errors. There is still no state process in our model, but we will use the state part of a MARSS model to model our errors. Mathematically, this can be written as

x_t = B x_{t-1} + w_t, \quad w_t \sim \operatorname{MVN}(0, Q)
y_t = D d_t + x_t \qquad (12.4)

Here, the x_t are the errors for the observation model; they are modeled as an autoregressive process via the x equation. We drop the v_t because the x_t in the y equation are now the observation errors. (Strictly, that means R = 0; in the code below we instead estimate a small additional i.i.d. observation variance via R="diagonal and equal".) As usual, we have left the intercepts (a and u) off since the data and covariates are all demeaned. Here's how we fit this model in MARSS:

Q = "unconstrained"
B = "diagonal and unequal"
A = U = x0 = "zero"
R = "diagonal and equal"
d = covariates
D = "unconstrained"
y = dat # to show the relation between dat & the model equations
model.list = list(B=B, U=U, Q=Q, Z=Z, A=A, R=R, D=D, d=d, x0=x0)
control.list = list(maxit=1500)
kem = MARSS(y, model=model.list, control=control.list)

Success! abstol and log-log tests passed at 79 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 79 iterations.
Log-likelihood: -209.3408
AIC: 438.6816   AICc: 439.7243

                              Estimate
R.diag                          0.0428
B.(X.Greens,X.Greens)           0.2479
B.(X.Bluegreens,X.Bluegreens)   0.9136
Q.(1,1)                         0.7639
Q.(2,1)                        -0.0285
Q.(2,2)                         0.1265
D.(Greens,Temp)                 0.3777
D.(Bluegreens,Temp)             0.2621
D.(Greens,TP)                   0.0459
D.(Bluegreens,TP)               0.0675

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

You can try setting B to identity, and MARSS will fit a model with non-mean-reverting autoregressive errors to the data. It is not done here since it turns out that is not a very good model, and it takes a long time to fit. If you try it, you'll see that Q gets small, meaning that the x part is being removed from the model.

12.4 Process-error only model

Now let's model the data as an autoregressive process observed without error, and incorporate the covariates into the process model. Note that this is much different from typical linear regression models. The x part represents our model of the data (in this case, plankton species). How is this different from the autoregressive observation errors? Well, we are modeling our data as autoregressive, so data at t − 1 affect the data at t. Population abundances are inherently autoregressive, so this model is a bit closer to the underlying mechanism generating the data. Here is our new process model for plankton abundance:

x_t = x_{t-1} + C c_t + w_t, \quad w_t \sim \operatorname{MVN}(0, Q) \qquad (12.5)

We can fit this as follows:

R = A = U = "zero"; B = Z = "identity"
Q = "equalvarcov"
C = "unconstrained"
x = dat # to show the relation between dat & the equations
model.list = list(B=B, U=U, Q=Q, Z=Z, A=A, R=R, C=C, c=covariates)
kem = MARSS(x, model=model.list)

Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -285.0732
AIC: 586.1465   AICc: 586.8225

                       Estimate
Q.diag                   0.7269
Q.offdiag               -0.0210
x0.X.Greens             -0.5189
x0.X.Bluegreens         -0.2431
C.(X.Greens,Temp)       -0.0434
C.(X.Bluegreens,Temp)    0.0988
C.(X.Greens,TP)         -0.0589
C.(X.Bluegreens,TP)      0.0104

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

Now it looks like temperature has a negative effect on green algae? Also, our log-likelihood dropped a lot. Well, the data do not look at all like a random walk model (i.e., where B = 1), which we can see from the plot of the data (Figure 12.1).
The data are fluctuating about some mean, so let's switch to a better autoregressive model: a mean-reverting model. To do this, we will allow the diagonal elements of B to be something other than 1.

model.list$B = "diagonal and unequal"
kem = MARSS(dat, model=model.list)

Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -236.6106
AIC: 493.2211   AICc: 494.2638

                              Estimate
B.(X.Greens,X.Greens)           0.1981
B.(X.Bluegreens,X.Bluegreens)   0.7672
Q.diag                          0.4899
Q.offdiag                      -0.0221
x0.X.Greens                    -1.2915
x0.X.Bluegreens                -0.4179
C.(X.Greens,Temp)               0.2844
C.(X.Bluegreens,Temp)           0.1655
C.(X.Greens,TP)                 0.0332
C.(X.Bluegreens,TP)             0.1340

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

Notice that the log-likelihood goes up quite a bit, which means that the mean-reverting model fits the data much better. With this model, we are estimating x_0. If we set model$tinitx=1, we will get an error message that the R diagonals are equal to 0 and we need to fix x0. Because R = 0, if we set the initial states at t = 1, then they are fully determined by the data.

x0 = dat[,1,drop=FALSE]
model.list$tinitx = 1
model.list$x0 = x0
kem = MARSS(dat, model=model.list)

Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -235.4827
AIC: 486.9653   AICc: 487.6414

                              Estimate
B.(X.Greens,X.Greens)           0.1980
B.(X.Bluegreens,X.Bluegreens)   0.7671
Q.diag                          0.4944
Q.offdiag                      -0.0223
C.(X.Greens,Temp)               0.2844
C.(X.Bluegreens,Temp)           0.1655
C.(X.Greens,TP)                 0.0332
C.(X.Bluegreens,TP)             0.1340

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

12.5 Both process- & observation-error model

The MARSS package is really designed for state-space models where you have errors (v and w) in both the process and observation models. For example,

x_t = B x_{t-1} + C c_t + w_t, \quad w_t \sim \operatorname{MVN}(0, Q)
y_t = x_t + v_t, \quad v_t \sim \operatorname{MVN}(0, R) \qquad (12.6)

where x is the true algae abundances and y is the observation of the x's. Let's say we knew that the observation variance on the algae measurements was about 0.16, and we wanted to include that known value in the model. To do that, we can simply add R to the model list from the process-error only model in the last example.

model.list$R = diag(0.16,2)
kem = MARSS(dat, model=model.list)

Success! abstol and log-log tests passed at 26 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 26 iterations.
Log-likelihood: -240.3718
AIC: 496.7436   AICc: 497.4196

                              Estimate
B.(X.Greens,X.Greens)          0.31201
B.(X.Bluegreens,X.Bluegreens)  0.76142
Q.diag                         0.33842
Q.offdiag                     -0.00355
C.(X.Greens,Temp)              0.23569
C.(X.Bluegreens,Temp)          0.16966
C.(X.Greens,TP)                0.02449
C.(X.Bluegreens,TP)            0.14164

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
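As the printout notes, MARSS() does not compute standard errors by default. The MARSSparamCIs() function adds them to a fitted model object; a minimal sketch (the default method computes approximate CIs from the Hessian):

# Add approximate confidence intervals for the estimated parameters
kem.with.CIs = MARSSparamCIs(kem)
kem.with.CIs # the printout now includes CIs alongside the estimates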
Note that our estimates of the effects of temperature and total phosphorous are not that different from what you get from a simple multiple regression (our first example). This might be because the autoregressive component is small, meaning the estimated diagonals on the B matrix are small.

12.6 Including seasonal effects in MARSS models

Time-series data are often collected at intervals with some implicit "seasonality", for example, quarterly earnings for a business, monthly rainfall totals, or hourly air temperatures. In those cases, it is often helpful to extract any recurring seasonal patterns that might otherwise mask some of the other temporal dynamics we are interested in examining.

Here we show a few approaches for including seasonal effects using the Lake Washington plankton data, which were collected monthly. The following examples will use all five phytoplankton species from Lake Washington. First, let's set up the data.

years = fulldat[,"Year"]>=1965 & fulldat[,"Year"]<1975
phytos = c("Diatoms", "Greens", "Bluegreens",
  "Unicells", "Other.algae")
dat = t(fulldat[years,phytos])
# z.score data because we changed the mean when we subsampled
the.mean = apply(dat,1,mean,na.rm=TRUE)
the.sigma = sqrt(apply(dat,1,var,na.rm=TRUE))
dat = (dat-the.mean)*(1/the.sigma)
# number of time periods/samples
TT = dim(dat)[2]

12.6.1 Seasonal effects as fixed factors

One common approach for estimating seasonal effects is to treat each one as a fixed factor, such that the number of parameters equals the number of "seasons" (e.g., 24 hours per day, 4 quarters per year). The plankton data are collected monthly, so we will treat each month as a fixed factor. To fit a model with fixed month effects, we create a 12 × T covariate matrix c with one row for each month (Jan, Feb, ...) and one column for each time point. We put a 1 in the January row for each column corresponding to a January time point, a 1 in the February row for each column corresponding to a February time point, and so on. All other values of c equal 0. The following code will create such a c matrix.

# number of "seasons" (e.g., 12 months per year)
period = 12
# first "season" (e.g., Jan = 1, July = 7)
per.1st = 1
# create factors for seasons
c.in = diag(period)
for(i in 2:(ceiling(TT/period))) {c.in = cbind(c.in,diag(period))}
# trim c.in to correct start & length
c.in = c.in[,(1:TT)+(per.1st-1)]
# better row names
rownames(c.in) = month.abb

Next we need to set up the form of the C matrix, which defines any constraints we want to set on the month effects. C is a 5 × 12 matrix: five taxa and 12 month effects. If we wanted each taxon to have the same month effect, i.e., a common month effect across all taxa, then we would have the same value in each column of C (month.abb is an R constant that gives the month abbreviations in text):

C = matrix(month.abb,5,12,byrow=TRUE)
C

     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]  [,10] [,11] [,12]
[1,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[2,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[3,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[4,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[5,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

Notice that C only has 12 values in it, the 12 common month effects. However, for this example, we will let each taxon have a different month effect, thus allowing different seasonality for each taxon.
For this model, we want each value in C to be unique:

C = "unconstrained"

Now C has 5 × 12 = 60 separate effects. Then we set up the form for the rest of the model parameters. We make the following assumptions:

# Each taxon has unique density-dependence
B = "diagonal and unequal"
# Assume independent process errors
Q = "diagonal and unequal"
# We have demeaned the data & are fitting a mean-reverting model
# by estimating a diagonal B, thus
U = "zero"
# Each obs time series is associated with only one process
Z = "identity"
# The data are demeaned & fluctuate around a mean
A = "zero"
# We assume observation errors are independent, but they have
# similar variance due to similar collection methods
R = "diagonal and equal"
# We are not including covariate effects in the obs equation
D = "zero"
d = "zero"

Now we can set up the model list for MARSS and fit the model (results are not shown since they are verbose, with 60 different month effects).

model.list = list(B=B,U=U,Q=Q,Z=Z,A=A,R=R,C=C,c=c.in,D=D,d=d)
seas.mod.1 = MARSS(dat,model=model.list,control=list(maxit=1500))
# Get the estimated seasonal effects
# rows are taxa, cols are seasonal effects
seas.1 = coef(seas.mod.1,type="matrix")$C
rownames(seas.1) = phytos
colnames(seas.1) = month.abb

The top panel in Figure 12.2 shows the estimated seasonal effects for this model. Note that if we had set U="unequal", we would need to set one of the columns of C to zero because the model would be under-determined (an infinite number of solutions). If we subtracted the mean January abundance off each time series, we could set the January column in C to 0 and get rid of 5 estimated effects.

12.6.2 Seasonal effects as a polynomial

The fixed factor approach required estimating 60 effects. Another approach is to model the month effect as a 3rd-order (or higher) polynomial: a + b × m + c × m² + d × m³, where m is the month number. This approach has less flexibility but requires only 20 estimated parameters (i.e., 4 regression parameters times 5 taxa). To do so, we create a 4 × T covariate matrix c with the rows corresponding to 1, m, m², and m³, and the columns again corresponding to the time points. Here is how to set up this matrix:

# number of "seasons" (e.g., 12 months per year)
period = 12
# first "season" (e.g., Jan = 1, July = 7)
per.1st = 1
# order of polynomial
poly.order = 3
# create polynomials of months
month.cov = matrix(1,1,period)
for(i in 1:poly.order) {month.cov = rbind(month.cov,(1:12)^i)}
# our c matrix is month.cov replicated once for each year
c.m.poly = matrix(month.cov, poly.order+1, TT+period, byrow=FALSE)
# trim c.m.poly to correct start & length
c.m.poly = c.m.poly[,(1:TT)+(per.1st-1)]

# Everything else remains the same as in the previous example
model.list = list(B=B,U=U,Q=Q,Z=Z,A=A,R=R,C=C,c=c.m.poly,D=D,d=d)
seas.mod.2 = MARSS(dat, model=model.list, control=list(maxit=1500))

The effect of month m for taxon i is a_i + b_i × m + c_i × m² + d_i × m³, where a_i, b_i, c_i, and d_i are in the i-th row of C. We can now calculate the matrix of seasonal effects as follows, where each row is a taxon and each column is a month:

C.2 = coef(seas.mod.2,type="matrix")$C
seas.2 = C.2 %*% month.cov
rownames(seas.2) = phytos
colnames(seas.2) = month.abb

The middle panel in Figure 12.2 shows the estimated seasonal effects for this polynomial model.
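Figure 12.2 plots the rows of these seasonal-effect matrices. A sketch of one panel, using the seas.2 matrix computed above:

# Plot the estimated monthly effects, one line per taxon
matplot(t(seas.2), type="l", lty=1:5, col=1:5,
  xaxt="n", xlab="", ylab="month effect")
axis(1, at=1:12, labels=month.abb)
legend("topright", phytos, lty=1:5, col=1:5, bty="n")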
12.6.3 Seasonal effects as a Fourier series

The factor approach required estimating 60 effects, and the 3rd-order polynomial model was an improvement at only 20 parameters. A third option is to use a discrete Fourier series, which is a combination of sine and cosine waves; it would require only 10 parameters. Specifically, the effect of month m on taxon i is a_i × cos(2πm/p) + b_i × sin(2πm/p), where p is the period (e.g., 12 months, 4 quarters), and a_i and b_i are contained in the i-th row of C. We begin by defining the 2 × T seasonal covariate matrix c as a combination of one cosine and one sine wave:

cos.t = cos(2 * pi * seq(TT) / period)
sin.t = sin(2 * pi * seq(TT) / period)
c.Four = rbind(cos.t,sin.t)

Everything else remains the same, and we can fit this model as follows:

model.list = list(B=B,U=U,Q=Q,Z=Z,A=A,R=R,C=C,c=c.Four,D=D,d=d)
seas.mod.3 = MARSS(dat, model=model.list, control=list(maxit=1500))

We make our seasonal effect matrix as follows:

C.3 = coef(seas.mod.3, type="matrix")$C
# The time series of net seasonal effects
seas.3 = C.3 %*% c.Four[,1:period]
rownames(seas.3) = phytos
colnames(seas.3) = month.abb

The bottom panel in Figure 12.2 shows the estimated seasonal effects for this model based on a discrete Fourier series.

Fig. 12.2. Estimated monthly effects for the three approaches to estimating seasonal effects. Top panel: each month modelled as a separate fixed effect for each taxon (60 parameters); middle panel: monthly effects modelled as a 3rd-order polynomial (20 parameters); bottom panel: monthly effects modelled as a discrete Fourier series (10 parameters).

Rather than rely on our eyes to judge model fits, we should formally assess which of the three approaches offers the most parsimonious fit to the data. Here is a table of AICc values for the three models:

data.frame(Model=c("Fixed", "Cubic", "Fourier"),
  AICc=round(c(seas.mod.1$AICc,
    seas.mod.2$AICc,
    seas.mod.3$AICc),1))

    Model   AICc
1   Fixed 1188.4
2   Cubic 1144.9
3 Fourier 1127.4

The model selection results indicate that the model with monthly seasonal effects estimated via the discrete Fourier series is the best of the three models. Its AICc value is much lower than that of either the polynomial or the fixed-effects model.

12.7 Model diagnostics

We will examine some basic model diagnostics for these three approaches by looking at plots of the model residuals and their autocorrelation functions (ACFs) for all five taxa, using the following code:

for(i in 1:3) {
  dev.new()
  # retrieve the fitted model object by its name
  modn = get(paste("seas.mod",i,sep="."))
  for(j in 1:5) {
    plot.ts(residuals(modn)$model.residuals[j,],
      ylab="Residual", main=phytos[j])
    abline(h=0, lty="dashed")
    acf(residuals(modn)$model.residuals[j,])
  }
}

Figures 12.3–12.5 show these diagnostics for the three models.

Fig. 12.3. Model residuals and their ACF for the model with fixed monthly effects.
Fig. 12.4. Model residuals and their ACF for the model with monthly effects modelled as a 3rd-order polynomial.
Fig. 12.5. Model residuals and their ACF for the model with monthly effects estimated using a Fourier series.

The model residuals for all taxa and models appear to show significant negative autocorrelation at lag 1, suggesting that a model with seasonal effects alone is inadequate to capture all of the systematic variation in phytoplankton abundance.
12.8 Covariates with missing values or observation error

The specific formulation of Equation 12.1 creates restrictions on the assumptions regarding the covariate data. You have to assume that your covariate data have no error, which is probably not true. You cannot have missing values in your covariate data, again unlikely. You cannot combine instrument time series; for example, if you have two temperature recorders with different error rates and biases. Also, what if you have one noisy temperature recorder in the first part of your time series and then you switch to a much better recorder in the second half of your time series? All these problems require pre-analysis massaging of the covariate data, leaving out noisy and gappy covariate data, and making what can feel like arbitrary choices about which covariate time series to include.

To circumvent these potential problems and allow more flexibility in how we incorporate covariate data, one can instead treat the covariates as components of an auto-regressive process by including them in both the process and observation models. Beginning with the process equation, we can write

\begin{bmatrix} x^{(v)} \\ x^{(c)} \end{bmatrix}_t =
\begin{bmatrix} B^{(v)} & C \\ 0 & B^{(c)} \end{bmatrix}
\begin{bmatrix} x^{(v)} \\ x^{(c)} \end{bmatrix}_{t-1} +
\begin{bmatrix} u^{(v)} \\ u^{(c)} \end{bmatrix} + w_t, \quad
w_t \sim \operatorname{MVN}\!\left(0, \begin{bmatrix} Q^{(v)} & 0 \\ 0 & Q^{(c)} \end{bmatrix}\right) \qquad (12.7)

The elements with superscript (v) are for the k variate states, and those with superscript (c) are for the q covariate states. The dimension of x^{(c)} is q × 1, and q is not necessarily equal to p, the number of covariate observation time series in your dataset. Imagine, for example, that you have two temperature sensors and you are combining these data. Then you have two covariate observation time series (p = 2) but only one underlying covariate state time series (q = 1). The matrix C is dimension k × q, and B^{(c)} and Q^{(c)} are dimension q × q. The dimension[2] of x^{(v)} is k × 1, and B^{(v)} and Q^{(v)} are dimension k × k.

[2] The dimension of x is always denoted m. If your process model includes only variates, then k = m, but now your process model includes k variates and q covariate states, so m = k + q.

Next, we can write the observation equation in an analogous manner, such that

\begin{bmatrix} y^{(v)} \\ y^{(c)} \end{bmatrix}_t =
\begin{bmatrix} Z^{(v)} & D \\ 0 & Z^{(c)} \end{bmatrix}
\begin{bmatrix} x^{(v)} \\ x^{(c)} \end{bmatrix}_t +
\begin{bmatrix} a^{(v)} \\ a^{(c)} \end{bmatrix} + v_t, \quad
v_t \sim \operatorname{MVN}\!\left(0, \begin{bmatrix} R^{(v)} & 0 \\ 0 & R^{(c)} \end{bmatrix}\right) \qquad (12.8)
The dimension of y^{(c)} is p × 1, where p is the number of covariate observation time series in your dataset. The dimension of y^{(v)} is l × 1, where l is the number of variate observation time series in your dataset. The total dimension of y is l + p. The matrix D is dimension l × q, Z^{(c)} is dimension p × q, and R^{(c)} is dimension p × p. Z^{(v)} is dimension l × k, and R^{(v)} is dimension l × l.

The D matrix would presumably have a number of all-zero rows in it, as would the C matrix. The covariates that affect the states would often be different from the covariates that affect the observations. For example, mean annual temperature would affect population growth rates for many species while having little or no effect on observability, and turbidity might strongly affect observability in many types of, say, aquatic surveys but have little effect on population growth rate.

Our MARSS model with covariates now looks on the surface like a regular MARSS model:

x_t = B x_{t-1} + u + w_t, \quad w_t \sim \operatorname{MVN}(0, Q)
y_t = Z x_t + a + v_t, \quad v_t \sim \operatorname{MVN}(0, R) \qquad (12.9)

with the x_t, y_t, and parameter matrices redefined as in Equations 12.7 and 12.8:

x = \begin{bmatrix} x^{(v)} \\ x^{(c)} \end{bmatrix} \quad
B = \begin{bmatrix} B^{(v)} & C \\ 0 & B^{(c)} \end{bmatrix} \quad
u = \begin{bmatrix} u^{(v)} \\ u^{(c)} \end{bmatrix} \quad
Q = \begin{bmatrix} Q^{(v)} & 0 \\ 0 & Q^{(c)} \end{bmatrix}
\qquad (12.10)
y = \begin{bmatrix} y^{(v)} \\ y^{(c)} \end{bmatrix} \quad
Z = \begin{bmatrix} Z^{(v)} & D \\ 0 & Z^{(c)} \end{bmatrix} \quad
a = \begin{bmatrix} a^{(v)} \\ a^{(c)} \end{bmatrix} \quad
R = \begin{bmatrix} R^{(v)} & 0 \\ 0 & R^{(c)} \end{bmatrix}

Note that Q and R are written as block-diagonal matrices, but you could allow covariances if that made sense. u and a are column vectors here. We can fit the model (Equation 12.9) as usual using the MARSS() function.

The log-likelihood that is returned by MARSS will include the log-likelihood of the covariates under the covariate state model. If you want only the log-likelihood of the non-covariate data, you will need to subtract off the log-likelihood of the covariate model:

x^{(c)}_t = B^{(c)} x^{(c)}_{t-1} + u^{(c)} + w_t, \quad w_t \sim \operatorname{MVN}(0, Q^{(c)})
y^{(c)}_t = Z^{(c)} x^{(c)}_t + a^{(c)} + v_t, \quad v_t \sim \operatorname{MVN}(0, R^{(c)}) \qquad (12.11)

An easy way to get this log-likelihood for the covariate data only is to use the augmented model (Equation 12.9 with terms defined as in Equation 12.10) but pass in missing values for the non-covariate data. The following code shows how to do this:

y.aug = rbind(data,covariates)
fit.aug = MARSS(y.aug, model=model.aug)

fit.aug is the MLE object that can be passed to MARSSkf(). You need to make a version of this MLE object with the non-covariate data filled with NAs so that you can compute the logLik without the covariates. This needs to be done in the marss element since that is what is used by MARSSkf(). Below is code to do this:
fit.cov = fit.aug
fit.cov$marss$data[1:dim(data)[1],] = NA
extra.LL = MARSSkf(fit.cov)$logLik

Note that when you fit the augmented model, the estimates of C and B^{(c)} are affected by the non-covariate data, since the models for the non-covariate and covariate data are estimated simultaneously and are not independent (the covariate states affect the non-covariate states). If you want the covariate model to be unaffected by the non-covariate data, you can fit the covariate model separately and use the estimates for B^{(c)} and Q^{(c)} as fixed values in your augmented model.

13 Estimation of species interaction strengths with and without covariates

Type RShowDoc("Chapter_SpeciesInteractions.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.

13.1 Background

Multivariate autoregressive models (commonly termed MAR models) have been developed as a tool for analyzing community dynamics from time-series data (Ives, 1995; Ives et al., 1999, 2003). These models are based on a process model for log abundances (x) of the form

x_t = B x_{t-1} + u + w_t, \quad w_t \sim \operatorname{MVN}(0, Q) \qquad (13.1)

B is the interaction matrix; self-interaction strengths (density-dependence) are on the diagonal, and inter-specific interaction strengths are on the off-diagonals, such that B_{i,j} is the 'effect' of species j on species i. This model has a stochastic equilibrium: it fluctuates around the mean, (I − B)^{-1} u. The term u determines the mean level, but once the system is at equilibrium, it does not affect the fluctuations relative to the mean. To see this, compare two models with b = 0.5 and u = 1 versus u = 0. The mean for the first is 1/(1 − 0.5) = 2, and for the second is 0. If we start both 1 above their means, the next x is the same distance from the mean: x_2 = 0.5(2 + 1) + 1 = 2.5 and x_2 = 0.5(0 + 1) + 0 = 0.5. So both end up 0.5 above the mean. Thus, once the system is at equilibrium, it is 'scale invariant', where u is the scaling term. The way that Ives et al. (2003) write their process model (their Equation 10) is X_t = A + B X_{t-1} + E_t. The A in Ives's equation is the u appearing in Equation 13.1, and the E_t is our w_t.

Often the models include environmental covariates, but we will leave that off for the moment and address it at the end of the chapter. If we add a measurement process[1], we have a MARSS model:

y_t = Z x_t + a + v_t, \quad v_t \sim \operatorname{MVN}(0, R) \qquad (13.2)

[1] You can fit a MAR model with no observation error by setting R = 0, but a conditional least-squares algorithm is vastly faster than EM or BFGS for the R = 0 case (assuming no missing data).

Typically, we have one time series per species, and thus we assume that m = n and Z is an m × m identity matrix (when m = n, a is set to 0). However, it is certainly possible to have multiple time series per species (for example, data taken at multiple sites). In this chapter, we will estimate the B matrix of species interactions for a simple wolf-moose system and for a four-species freshwater plankton system.

13.2 Two-species example using wolves and moose

The population dynamics of wolves and moose on Isle Royale, Michigan make an interesting case study of two-species predator-prey interactions. These populations have been studied intensively since 1958[2]. Unlike other populations of gray wolves, the Isle Royale population has a diet dominated by one prey item, moose. The only predator of moose on Isle Royale is the gray wolf, as this population is not hunted. We will use the wolf and moose winter census data from Isle Royale to learn how to fit community dynamics models to time-series data. The long-term January (wolf) and February (moose) population estimates are provided at http://www.isleroyalewolf.org.

[2] There are many, many publications from this long-term study site; see http://www.isleroyalewolf.org/wolfhome/tech_pubs.html and the review at http://www.isleroyalewolf.org/data/data/home.html.
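The same counts ship with the MARSS package in the isleRoyal dataset, which also holds the climate covariates used later in this chapter. A quick sketch to peek at it:

library(MARSS)
# Columns 1-3 are Year, Wolf, and Moose counts; the remaining
# columns hold the climate covariates (used in section 13.2.4)
head(isleRoyal[,1:3])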
The mathematical form of the process model for the wolf-moose population dynamics is

\begin{bmatrix} x_{wolf} \\ x_{moose} \end{bmatrix}_t =
\begin{bmatrix} b_{ww} & b_{m \to w} \\ b_{w \to m} & b_{mm} \end{bmatrix}
\begin{bmatrix} x_{wolf} \\ x_{moose} \end{bmatrix}_{t-1} +
\begin{bmatrix} u_{wolf} \\ u_{moose} \end{bmatrix} +
\begin{bmatrix} w_{wolf} \\ w_{moose} \end{bmatrix}_t, \quad
\begin{bmatrix} w_{wolf} \\ w_{moose} \end{bmatrix}_t \sim
\operatorname{MVN}\!\left(0, \begin{bmatrix} q_{wolf} & 0 \\ 0 & q_{moose} \end{bmatrix}\right) \qquad (13.3)

13.2.1 Load in and plot the data

royale.dat = log(t(isleRoyal[,2:3]))
matplot(isleRoyal[,1], log(isleRoyal[,2:3]),
  ylab="log count", xlab="Year", type="l",
  lwd=3, bty="L", col="black")
legend("topright", c("Wolf","Moose"), lty=c(1,2), bty="n")

Fig. 13.1. Plot of the Isle Royale wolf and moose data.

13.2.2 Fit the model to the wolf-moose data

The naive way to fit the model is to use Equations 13.2 and 13.1 "as is":

royale.model.0=list(B="unconstrained",Q="diagonal and unequal",
  R="diagonal and unequal",U="unequal")
kem.0=MARSS(royale.dat,model=royale.model.0)

If you try this, you will notice that it does not converge but instead stops when it reaches maxit and prints a number of warnings about non-convergence. The problem is that when you try to estimate B and u, they are often confounded. This is a well-known problem, and you will need to find a way to fix u at some value. If you are willing to assume that the process is at equilibrium (i.e., not recovering to equilibrium from a big perturbation), then you can simply demean the data and set u to 0. It is also common to standardize the variance by dividing by the square root of the variance of the data. This is called z-scoring the data.

#if missing values are in the data, they should be NAs
z.royale.dat=(royale.dat-apply(royale.dat,1,mean,na.rm=TRUE))/
  sqrt(apply(royale.dat,1,var,na.rm=TRUE))

We also need to change a couple of settings before fitting the model. In the default MARSS model, the initial value of x is treated as being at t = 0. If we are estimating the B matrix, we need to set this to be at t = 1 so that the initial x is constrained by the data[3] at t = 1. The reason is that we need to estimate the initial x. Even if we use a prior on the initial x, we are still estimating its value[4]. A model with a non-diagonal B matrix does not "run backwards" well, and the estimation of the initial x will run in circles. If we constrain it by data (at t = 1), we avoid this problem. So we will set model$tinitx=1. The other setting we want to change is allow.degen. This sets the diagonals of Q or R to zero if they are heading towards zero. When the initial x is at t = 1, this can have non-intuitive (not wrong, but puzzling; see Appendix B) consequences if R is going to zero. So we will set control$allow.degen=FALSE and manually set R to 0 if needed.

[3] If there are many missing values at t = 1, we might still have problems and have to adjust accordingly.
[4] Putting a prior on the initial x's requires specifying their variance-covariance structure, which depends on the unknown B, and specifying a variance-covariance structure that conflicts with B will change your B estimates. So, in general, using a prior on the initial x's when estimating B is a bad idea.

royale.model.1=list(Z="identity", B="unconstrained",
  Q="diagonal and unequal", R="diagonal and unequal",
  U="zero", tinitx=1)
cntl.list=list(allow.degen=FALSE,maxit=200)
kem.1=MARSS(z.royale.dat, model=royale.model.1, control=cntl.list)

Warning! Reached maxit before parameters converged. Maxit was 200.
neither abstol nor log-log convergence tests were passed.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: maxit reached at 200 iter before convergence.
Neither abstol nor log-log convergence test were passed.
The likelihood and params are not at the ML values.
Try setting control$maxit higher.
Log-likelihood: -76.46247
AIC: 172.9249   AICc: 175.2407

                     Estimate
R.(Wolf,Wolf)        0.001421
R.(Moose,Moose)      0.000378
B.(1,1)              0.762723
B.(2,1)             -0.173536
B.(1,2)              0.069223
B.(2,2)              0.833074
Q.(X.Wolf,X.Wolf)    0.456457
Q.(X.Moose,X.Moose)  0.173376
x0.X.Wolf           -0.267175
x0.X.Moose          -1.277329

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

Convergence warnings
Warning: the R.(Wolf,Wolf) parameter value has not converged.
Warning: the R.(Moose,Moose) parameter value has not converged.
Warning: the logLik parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.

It looks like R is going to zero, meaning that the maximum-likelihood model is a process-error only model. That is not too surprising given that the data look more like a random walk than white noise. We will set R manually to zero:

royale.model.2=list(Z="identity", B="unconstrained",
  Q="diagonal and unequal", R="zero", U="zero")
kem.2=MARSS(z.royale.dat, model=royale.model.2)

Success! abstol and log-log tests passed at 16 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 16 iterations.
Log-likelihood: -82.3988
AIC: 180.7976   AICc: 182.2821

                     Estimate
B.(1,1)                0.7618
B.(2,1)               -0.1734
B.(1,2)                0.0692
B.(2,2)                0.8328
Q.(X.Wolf,X.Wolf)      0.4499
Q.(X.Moose,X.Moose)    0.1708
x0.X.Wolf             -0.2086
x0.X.Moose            -1.5769

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
13.2.4 Adding covariates

It is well known that moose numbers are strongly affected by winter and summer climate. The Isle Royale data set provided with MARSS has climate data from climate stations in northeastern Minnesota, near Isle Royale. [Footnote 5: From the Western Regional Climate Center. See the help file for this dataset for references.] The covariate data include January-February, July-September and April-May average temperature and precipitation. Also included are three-year running means of these data, where the number for year x is the average of years x-1, x and x+1. We will include these covariates in the analysis to see how they change our interaction estimates.

We have to adjust our covariates because the census numbers are from winter in year x, and we want the climate data from the previous year to affect this winter's moose count. As usual, we will need to demean our covariate data so that we can set u equal to zero. We will also standardize the variance so that we can more easily compare the effects across different covariates.

The mathematical form of our new process model for the wolf-moose population dynamics is

\begin{bmatrix}x_{wolf}\\ x_{moose}\end{bmatrix}_t =
B \begin{bmatrix}x_{wolf}\\ x_{moose}\end{bmatrix}_{t-1} +
\begin{bmatrix}0 & 0 & 0\\ C_{21} & C_{22} & C_{23}\end{bmatrix}
\begin{bmatrix}\text{win temp}\\ \text{win precip}\\ \text{sum temp}\end{bmatrix}_{t-1} +
\begin{bmatrix}w_{wolf}\\ w_{moose}\end{bmatrix}_t   (13.4)

The C_{21}, C_{22} and C_{23} terms are the effects of winter temperature, winter precipitation and previous summer temperature on winter moose numbers. Since climate is known to mainly affect the moose, we set the climate effects to 0 for the wolves (the top row of C).

First we prepare the covariate data. The winter temperature and precipitation data are in columns 4 and 10, and the summer temperature data are in column 6. We need to use the previous year's climate data with this winter's abundance data, so we offset the climate data.

clim.dat = t(isleRoyal[1:52,c(4,10,6)])
z.score.clim.dat=(clim.dat-apply(clim.dat,1,mean,na.rm=TRUE))/
  sqrt(apply(clim.dat,1,var,na.rm=TRUE))

A plot of the covariate data against each other (Figure 13.2) indicates that there is not much correlation between winter temperature and precipitation, which is good for analysis purposes, but warm winters are somewhat correlated with warm summers. The latter will make it harder to interpret the effect of winter versus summer temperature, although fortunately the correlation is not too strong.

Fig. 13.2. Pairs plot of the covariate data (jan.feb.ave.temp, jan.feb.ave.precip, july.sept.ave.temp) for Isle Royale, with correlations in the lower panel. The R code that produced this plot was
cor.fun=function(x,y) text(0.5,0.5,format(cor(x,y),digits=2),cex=2)
pairs(t(z.score.clim.dat),lower.panel=cor.fun)

Next we prepare the list with the structure of all the model matrices. We give descriptive names to the C elements so we can remember what each element means.

royale.model.3=list(Z="identity", B="unconstrained",
  Q="diagonal and unequal", R="zero", U="zero",
  C=matrix(list(0,"Moose win temp",0,"Moose win precip",
    0,"Moose sum temp"),2,3),
  c=z.score.clim.dat)

Then we fit the model. Because we have to use the climate data from the previous year, we lose a year of our count data.

kem.3=MARSS(z.royale.dat[,2:53], model=royale.model.3)

Success! abstol and log-log tests passed at 17 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 17 iterations.
Log-likelihood: -80.79155
AIC: 183.5831   AICc: 186.4527

                     Estimate
B.(1,1)               0.7667
B.(2,1)              -0.1609
B.(1,2)               0.0790
B.(2,2)               0.8343
Q.(X.Wolf,X.Wolf)     0.4567
Q.(X.Moose,X.Moose)   0.1679
x0.X.Wolf             0.1543
x0.X.Moose           -1.4008
C.Moose win temp      0.0242
C.Moose win precip   -0.0713
C.Moose sum temp     -0.0306

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
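The output reminds us that MARSS() does not compute standard errors by default. The MARSSparamCIs() function named in the output adds approximate confidence intervals to a fitted model object; a minimal example using the fit above:

# add approximate CIs to the fitted model and print them
kem.3.with.CIs = MARSSparamCIs(kem.3)
print(kem.3.with.CIs)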
The results suggest what is already known about this system: cold winters and lots of snow are bad for moose, as are hot summers.

13.2.5 Change the model and data

You can explore the sensitivity of the B estimates when the measurement error is increased by adding white noise to the data:

bad.data=z.royale.dat+matrix(rnorm(2*ncol(z.royale.dat),0,sqrt(.2)),
  2,ncol(z.royale.dat))
kem.bad=MARSS(bad.data, model=royale.model.2)

You can also change the model by changing the constraints on R and Q.
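To see how the added observation error changes the interaction estimates, compare the B matrix estimated from the noisy data to the one estimated earlier (a quick sketch; the exact numbers will vary with the random draw):

# compare B estimated from the noisy data to the earlier estimate
B.bad = coef(kem.bad, type="matrix")$B
print(B.bad, digits=2)
print(wolf.B, digits=2)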
13.3 Analysis of a four-species plankton community

Ives et al. (2003) presented weekly data on the biomass of two species of phytoplankton and two species of zooplankton in two lakes, one with low planktivory and one with high planktivory. They used these data to estimate the interaction terms for the four species. Here we will reanalyze their data and compare our results. Ives et al. (2003) describe the data as follows: "The data consist of weekly samples of zooplankton and phytoplankton, which for the analyses were divided into two zooplankton groups (Daphnia and non-Daphnia) and two phytoplankton groups (large and small phytoplankton). Daphnia are large, effective herbivores, and small phytoplankton are particularly vulnerable to herbivory, so we anticipate strong interactions between Daphnia and small phytoplankton groups."

Figure 13.3 shows the data. What you can see from the figure is that the data were only collected in the summer.

13.3.1 Load in the plankton data

# only use the plankton, daphnia, & non-daphnia
plank.spp = c("Large Phyto","Small Phyto","Daphnia","Non-daphnia")
plank.dat = ivesDataByWeek[,plank.spp]
#The data are not logged
plank.dat = log(plank.dat)
#Transpose to get time going across the columns
plank.dat = t(plank.dat)
#make a demeaned version
d.plank.dat = (plank.dat-apply(plank.dat,1,mean,na.rm=TRUE))

As before, we demean the data so we can set u to 0. We do not standardize by the variance, however, because we are going to fix the R variance later, as Ives et al. did.

Fig. 13.3. Plot of the demeaned plankton data (log biomass by week of study). Zooplankton are the thicker lines; phytoplankton are the thinner lines.

13.3.2 Specify a MARSS model for the plankton data

We will start by fitting a model with the following assumptions:

- All phytoplankton share the same process variance.
- All zooplankton share the same process variance.
- Phytoplankton and zooplankton have different measurement variances.
- Measurement errors are independent.
- Process errors are independent.

Z="identity"
U="zero"
B="unconstrained"
Q=matrix(list(0),4,4); diag(Q)=c("Phyto","Phyto","Zoo","Zoo")
R=matrix(list(0),4,4); diag(R)=c("Phyto","Phyto","Zoo","Zoo")
plank.model.0=list(Z=Z, U=U, Q=Q, R=R, B=B)

Why did we set U="zero"? Equation 13.1 is a stationary model; it fluctuates about a mean. The u in Equation 13.1 is a scaling term that just affects the mean level once the system is at equilibrium. If we assume that the mean of y (the mean of our data) is a good estimate of the mean of the system (the x), then we can set u equal to zero.

13.3.3 Fit the plankton model and look at the estimated B matrix

The call to fit the model is standard, with the addition of setting model$tinitx so that the initial states (x) are at t = 1 instead of t = 0.

plank.model.0$tinitx=1
kem.plank.0 = MARSS(d.plank.dat, model=plank.model.0)

Now we can print the B matrix, with a little cleaning up so it looks prettier.

#Cleaning up the B matrix for printing
B.0 = coef(kem.plank.0, type="matrix")$B[1:4,1:4]
rownames(B.0) = colnames(B.0) = c("LP","SP","D","ND")
print(B.0,digits=2)

      LP   SP       D     ND
LP  0.77 0.29 -0.0182  0.131
SP  0.19 0.51  0.0052 -0.045
D  -0.43 2.29  0.4916  0.389
ND -0.33 1.35 -0.2180  0.831

LP stands for large phytoplankton, SP for small phytoplankton, D for Daphnia, and ND for non-Daphnia. We can compare this to the Ives et al. estimates (their Table 2, bottom right) and see a few differences:

     LP    SP     D    ND
LP 0.48 -0.39    --    --
SP   --  0.25 -0.17 -0.11
D    --    --  0.74  0.00
ND   --  0.10  0.00  0.60

The first thing you will notice is that the Ives et al. matrix is missing values. The matrix they show is the result of a model-selection step to determine which interactions had little data support and thus could be set to zero. Also, they fixed the interactions between Daphnia and non-Daphnia at zero a priori because these groups do not prey on each other. The second thing you will notice is that the estimates are not particularly similar. Next we will try some other ways of fitting the data that are closer to the way Ives et al. fitted the data.

By the way, if you are curious what would happen if we removed all those NAs, you can run the following code.

test.dat = d.plank.dat[, !is.na(d.plank.dat[1, ])]
test = MARSS(test.dat, model = plank.model.0)

Removing all the NAs means that the end of summer 1 is connected to the beginning of summer 2. This adds some steep steps in the Daphnia time series where Daphnia ended one summer high and started the next summer low.

13.3.4 Look at different ways to fit the model

We will try a series of changes to get closer to the way Ives et al. fit the data, and you will see how different assumptions change (or do not change) our species interaction estimates.

First, we change Q to be unconstrained. Making Q diagonal in model 0 meant we were assuming that whatever environmental factor drives variation in phytoplankton numbers is uncorrelated with the environmental factor driving zooplankton variation. That is probably not true, since they are all in the same lake. This case takes a while to run.

plank.model.1=plank.model.0
plank.model.1$Q="unconstrained"
kem.plank.1 = MARSS(d.plank.dat, model=plank.model.1)

Notice that the Q specification changed to "unconstrained"; everything else stays the same as in model 0. The code now runs longer, and the B estimates are not particularly closer to those of Ives et al.:

        LP    SP      D    ND
LP  0.4961 0.061  0.079 0.123
SP -0.1833 0.896  0.067 0.011
D   0.1180 0.350  0.638 0.370
ND  0.0023 0.370 -0.122 0.810
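Whether the data support the unconstrained Q over the diagonal Q can be checked quickly by comparing AICc values for the two fits (a sketch using the fitted objects above):

# compare data support for diagonal versus unconstrained Q
c(model.0 = kem.plank.0$AICc, model.1 = kem.plank.1$AICc)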
Next, we will set some of the interactions to zero, as in Table 2 of Ives et al. (2003). In that table, certain interactions were fixed at 0 (denoted with 0s) and some were made 0 after fitting (the blanks); we will fix all of these to zero. To do this, we need to write out the B matrix as a list matrix so that we can have both estimated values and fixed values (the 0s) in the B specification.

B.2=matrix(list(0),4,4) #set up the list matrix
diag(B.2)=c("B11","B22","B33","B44") #give names to diagonals
#and names to the estimated non-diagonals
B.2[1,2]="B12"; B.2[2,3]="B23"; B.2[2,4]="B24"; B.2[4,2]="B42"
print(B.2)

     [,1]  [,2]  [,3]  [,4]
[1,] "B11" "B12" 0     0
[2,] 0     "B22" "B23" "B24"
[3,] 0     0     "B33" 0
[4,] 0     "B42" 0     "B44"

As you can see, the B matrix now has elements that will be estimated (the names in quotes) and fixed values (the numbers with no quotes). When preparing your list matrix, make sure your fixed values do not have quotes around them. If they do, they are strings (class character), not numbers (class numeric), and MARSS will interpret a string as the name of something to be estimated. If you use the same name for two elements, MARSS will force those elements to be shared (have the same value). This one will take a while to run also.

#model 2
plank.model.2=plank.model.1
plank.model.2$B = B.2
kem.plank.2= MARSS(d.plank.dat, model=plank.model.2)

Now we are getting closer to the Ives et al. estimates:

     LP    SP      D     ND
LP 0.65 -0.33     --     --
SP   --  0.54 0.0016 -0.026
D    --    -- 0.8349     --
ND   --  0.13     --  0.596

Ives et al. did not estimate R. Instead they used a fixed observation variance of 0.04 for phytoplankton and 0.16 for zooplankton. [Footnote 6: You can compare this to the estimated observation variances by looking at coef(kem.plank.2)$R.] We fit the model with their fixed R as follows:

#model 3
plank.model.3=plank.model.2
plank.model.3$R=diag(c(.04,.04,.16,.16))
kem.plank.3= MARSS(d.plank.dat, model=plank.model.3)

As you can see from Table 13.1, we are getting closer to the Ives et al. estimates, but we are still a bit off. Now we need to add the environmental covariates: phosphorous and fish biomass.

13.3.5 Adding covariates

A standard way that you will see covariate data added to a MARSS model is the following:

x_t = B x_{t-1} + u + C c_t + w_t, \text{ where } w_t \sim MVN(0, Q)
y_t = Z x_t + a + D d_t + v_t, \text{ where } v_t \sim MVN(0, R)   (13.5)

Here c_t and d_t are covariate data at time t, like temperature; C is a matrix with the (linear) effects of c_t on x_t, and D is a matrix with the (linear) effects of d_t on y_t.

Ives et al. (2003) only include covariates in their process model, and their process model (their Equation 27) is written X_t = A + B X_{t-1} + C U_t + E_t. Their U_t is our c_t in Equation 13.5, and C is an m × p matrix, where p is the number of covariates in c_t. We set their A (our u) to zero by demeaning the y and implicitly assuming that the mean of the y is a good estimate of the mean of the x's. Thus the model where covariates only affect the underlying process is

x_t = B x_{t-1} + C c_t + w_t, \text{ where } w_t \sim MVN(0, Q)
y_t = x_t + v_t, \text{ where } v_t \sim MVN(0, R)   (13.6)

To fit this model, we first need to prepare the covariate data. We will just use the phosphorous data.

#transpose to make time go across columns
#drop=FALSE so that R doesn't change our matrix to a vector
phos = t(log(ivesDataByWeek[,"Phosph",drop=FALSE]))
d.phos = (phos-apply(phos,1,mean,na.rm=TRUE))

Why log the covariate data? It is what Ives et al. did, so we follow their method. In general, however, you want to think about what relationship you are assuming between the covariates and their effects.
For example, log (or square-root) transformations mean that extreme covariate values have less impact relative to their untransformed values, and that a small absolute change near zero, say from 0.01 to 0.0001 in the untransformed value, can mean a large difference in the effects, since log(0.0001) is much smaller than log(0.01).

Phosphorous is assumed to affect only the phytoplankton, so the other terms in C, corresponding to the zooplankton, are set to 0. The C matrix is defined as follows:

C = \begin{bmatrix} C_{LP,phos} \\ C_{SP,phos} \\ 0 \\ 0 \end{bmatrix}   (13.7)

To add C and c to our latest model, we add them to the model list used in the MARSS call:

plank.model.4=plank.model.3
plank.model.4$C=matrix(list("C11","C21",0,0),4,1)
plank.model.4$c=d.phos

Then we fit the model as usual:

kem.plank.4= MARSS(d.plank.dat, model=plank.model.4)

Success! abstol and log-log tests passed at 55 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 55 iterations.
Log-likelihood: -393.189
AIC: 834.3781   AICc: 837.9284

                   Estimate
B.B11               0.6138
B.B12              -0.4619
B.B22               0.3320
B.B42               0.0479
B.B23              -0.0182
B.B33               0.8889
B.B24              -0.0476
B.B44               0.6643
Q.(1,1)             0.7376
Q.(2,1)             0.2159
Q.(3,1)             0.0796
Q.(4,1)             0.0293
Q.(2,2)             0.2688
Q.(3,2)            -0.1271
Q.(4,2)            -0.0878
Q.(3,3)             0.8654
Q.(4,3)             0.4685
Q.(4,4)             0.3906
x0.X.Large Phyto    0.1615
x0.X.Small Phyto   -0.5273
x0.X.Daphnia       -1.1121
x0.X.Non-daphnia   -1.8082
C.C11               0.1385
C.C21               0.1580

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
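The estimated phosphorous effects can be pulled out of the fitted object directly, in the same way B elements were extracted earlier (a quick sketch):

# the estimated phosphorous effects on large and small phytoplankton
coef(kem.plank.4)$C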
13.3.6 Including a covariate observation model

The difficulty with the standard approach to including covariates (Equation 13.5) is that it limits what kind of covariate data you can use and how you model that covariate data. You have to assume that your covariate data have no error, which is probably not true, and this assumption reduces the reported uncertainty in your covariate effects because the uncertainty in the covariate values is not included. The standard approach also does not allow missing values in your covariate data, which is why we did not include the fish covariate data in the last model. Nor can you combine instrument time series; for example, you may have two temperature recorders with different error rates and biases, or a noisy temperature recorder in the first part of your time series and a much better recorder in the second half. All these problems require pre-analysis massaging of the covariate data, leaving out noisy and gappy covariate data, and making what can feel like arbitrary choices about which covariate time series to include, which is especially worrisome when the covariates are then incorporated in the model as known without error.

Instead, one can include an observation and process model for the covariates, just as for the non-covariate data. The covariates are then included in y_t and are modeled with their own state process(es) in x_t. A MARSS model with a covariate observation and process model is shown below. Elements with superscript (v) are for the variates and those with superscript (c) are for the covariates; the superscripts just help us keep straight which state processes and parameters correspond to the abundances and which correspond to the environmental covariates.

\begin{bmatrix}x^{(v)}\\ x^{(c)}\end{bmatrix}_t =
\begin{bmatrix}B^{(v)} & C\\ 0 & B^{(c)}\end{bmatrix}
\begin{bmatrix}x^{(v)}\\ x^{(c)}\end{bmatrix}_{t-1} +
\begin{bmatrix}u^{(v)}\\ u^{(c)}\end{bmatrix} + w_t, \quad
w_t \sim MVN\left(0, \begin{bmatrix}Q^{(v)} & 0\\ 0 & Q^{(c)}\end{bmatrix}\right)

\begin{bmatrix}y^{(v)}\\ y^{(c)}\end{bmatrix}_t =
\begin{bmatrix}Z^{(v)} & 0\\ 0 & Z^{(c)}\end{bmatrix}
\begin{bmatrix}x^{(v)}\\ x^{(c)}\end{bmatrix}_t +
\begin{bmatrix}a^{(v)}\\ a^{(c)}\end{bmatrix} + v_t, \quad
v_t \sim MVN\left(0, \begin{bmatrix}R^{(v)} & 0\\ 0 & R^{(c)}\end{bmatrix}\right)   (13.8)

Note that when you fit your covariate and non-covariate data jointly as in Equation 13.8, your non-covariate data affect the estimates of the covariate model. When you maximize the likelihood, you do so conditioned on all the data, and the likelihood that is output is the joint likelihood of the non-covariate and covariate data. Depending on your system, you might not want the covariate model affected by the non-covariate data. In that case, you can fit the covariate model separately:

x_t^{(c)} = B^{(c)} x_{t-1}^{(c)} + u^{(c)} + w_t, \quad w_t \sim MVN(0, Q^{(c)})
y_t^{(c)} = Z^{(c)} x_t^{(c)} + a^{(c)} + v_t, \quad v_t \sim MVN(0, R^{(c)})   (13.9)

At this point, you have another choice: do you want the estimated covariate states, the x^{(c)}, to be affected by the non-covariate data? For example, suppose you have temperature data. You can estimate the true temperature from the temperature data alone, or you can decide that the non-covariate data carry information about the true temperature, because the non-covariate states are affected by the true temperature. If you want the covariate states to be affected only by the covariate data, then use Equation 13.5 with c_t set to your estimates of x^{(c)} from Equation 13.9. Or, if you want the non-covariate data to affect the estimates of the covariate states, use Equation 13.8 with the covariate parameters set at the values estimated from Equation 13.9.

13.3.7 The MARSS model with covariates following Ives et al.

Ives et al. used Equation 13.5 for phosphorous and Equation 13.8 for fish biomass. Phosphorous was treated as observed with no error, since it was experimentally manipulated and there were no missing values. Fish biomass was treated as having observation error and was modeled as an autoregressive process with unknown parameters, as in Equation 13.8. Their MARSS model takes the form

x_t = B x_{t-1} + C c_t + w_t, \text{ where } w_t \sim MVN(0, Q)
y_t = x_t + v_t, \text{ where } v_t \sim MVN(0, R)   (13.10)

where x and y are redefined as

x = y = \begin{bmatrix}\text{large phyto}\\ \text{small phyto}\\ \text{Daphnia}\\ \text{non-Daphnia zooplank}\\ \text{fish biomass}\end{bmatrix}   (13.11)

The covariate fish biomass appears in x because it will be modeled, and its interaction terms (Ives et al.'s C terms) appear in B. Phosphorous appears in the c_t terms because it is treated as a known additive term, and its interaction terms appear in C. Recall that we set u to 0 by demeaning the plankton data, so it does not appear above. The Z matrix does not appear in front of the x_t since there is a one-to-one correspondence between the x's and y's, and thus Z is the identity matrix.

The B matrix is

B = \begin{bmatrix}B^{(v)} & C\\ 0 & B^{(c)}\end{bmatrix} =
\begin{bmatrix}
b_{LP} & b_{LP,SP} & 0 & 0 & 0\\
0 & b_{SP} & b_{SP,D} & b_{SP,ND} & 0\\
0 & 0 & b_{D} & 0 & C_{D,fish}\\
0 & b_{ND,SP} & 0 & b_{ND,ND} & C_{ND,fish}\\
0 & 0 & 0 & 0 & b_{fish}
\end{bmatrix}   (13.12)

The B elements have some interactions fixed at 0, as in our last model fit. The C's are the interactions between the fish and the plankton species. We will estimate a B term for fish since Ives et al. did, but this is an odd thing to do for the fish data, since these data were interpolated from two samples per season.

The Q matrix is the same as that in our last model fit, with the addition of an element for the variance of the fish biomass:

Q = \begin{bmatrix}Q^{(v)} & 0\\ 0 & Q^{(c)}\end{bmatrix} =
\begin{bmatrix}
q_{LP} & q_{LP,SP} & q_{LP,D} & q_{LP,ND} & 0\\
q_{LP,SP} & q_{SP} & q_{SP,D} & q_{SP,ND} & 0\\
q_{LP,D} & q_{SP,D} & q_{D} & q_{D,ND} & 0\\
q_{LP,ND} & q_{SP,ND} & q_{D,ND} & q_{ND} & 0\\
0 & 0 & 0 & 0 & q_{fish}
\end{bmatrix}   (13.13)

Again, it is odd to estimate a variance term for data interpolated from two points, but we follow Ives et al. here.
Ives et al. set the observation variance for the logged fish biomass data to 0.36 (page 320 in Ives et al. (2003)). The observation variances for the plankton data were set as in our previous model:

R = \begin{bmatrix}
0.04 & 0 & 0 & 0 & 0\\
0 & 0.04 & 0 & 0 & 0\\
0 & 0 & 0.16 & 0 & 0\\
0 & 0 & 0 & 0.16 & 0\\
0 & 0 & 0 & 0 & 0.36
\end{bmatrix}   (13.14)

13.3.8 Setting the model structure for the model with fish covariate data

First we need to add the logged fish biomass to our data matrix.

#transpose to make time go across columns
#drop=FALSE so that R doesn't change our matrix to a vector
fish = t(log(ivesDataByWeek[,"Fish biomass",drop=FALSE]))
d.fish = (fish-apply(fish,1,mean,na.rm=TRUE))
#plank.dat.w.fish = rbind(plank.dat,fish)
d.plank.dat.w.fish = rbind(d.plank.dat,d.fish)

Next we make the B matrix. Some elements are estimated and others are fixed at 0.

B=matrix(list(0),5,5)
diag(B)=list("B11","B22","B33","B44","Bfish")
B[1,2]="B12"; B[2,3]="B23"; B[2,4]="B24"
B[4,2]="B42"; B[1:4,5]=list(0,0,"C32","C42")
print(B)

     [,1]  [,2]  [,3]  [,4]  [,5]
[1,] "B11" "B12" 0     0     0
[2,] 0     "B22" "B23" "B24" 0
[3,] 0     0     "B33" 0     "C32"
[4,] 0     "B42" 0     "B44" "C42"
[5,] 0     0     0     0     "Bfish"

Now we have a B matrix that looks like that in Equation 13.12. We need to add an extra row to C for the fish biomass row in x:

C=matrix(list("C11","C21",0,0,0),5,1)

Then we set up the R matrix:

R=matrix(list(0),5,5)
diag(R)=list(0.04,0.04,0.16,0.16,0.36)

Last, we need to set up the Q matrix:

Q=matrix(list(0),5,5)
Q[1:4,1:4]=paste(rep(1:4,times=4),rep(1:4,each=4),sep="")
Q[5,5]="fish"
Q[lower.tri(Q)]=t(Q)[lower.tri(Q)]
print(Q)

     [,1] [,2] [,3] [,4] [,5]
[1,] "11" "12" "13" "14" 0
[2,] "12" "22" "23" "24" 0
[3,] "13" "23" "33" "34" 0
[4,] "14" "24" "34" "44" 0
[5,] 0    0    0    0    "fish"

13.3.9 Fit the model with covariates

The model is the same as the previous model with updated process parameters and updated R. We pass in the updated data matrix with the fish biomass added:

plank.model.5=plank.model.4
plank.model.5$B=B
plank.model.5$C=C
plank.model.5$Q=Q
plank.model.5$R=R
kem.plank.5=MARSS(d.plank.dat.w.fish, model=plank.model.5)

This is the new B matrix estimated with the fish covariate included:

     LP     SP      D     ND
LP 0.61 -0.465     --     --
SP   --  0.333 -0.019 -0.048
D    --     --  0.896     --
ND   --  0.044     --  0.675

Now we are getting close to Ives et al.'s estimates; compare model 5 in Table 13.1 to the first column.

NOTE! When you include your covariates in your state model (the x part), the reported log-likelihood is for the variate plus the covariate data. If you want just the log-likelihood for the variates, replace the covariate data with NAs and run the Kalman filter with your estimated model:

tmp=kem.plank.5
tmp$marss$data[5,]=NA
LL.variates=MARSSkf(tmp)$logLik

MARSSkf() is the Kalman filter function, and it needs a fitted model as output by a MARSS() call. We set up a temporary fitted model, tmp, equal to our fitted model and then set the covariate data in it to NAs. Note that we need to do this for the marssMODEL object used by MARSSkf(), which is in $marss. We then pass that temporary fitted model to MARSSkf() to get the log-likelihood of just the variates.
Table 13.1. The parameter estimates under the different plankton models. Models 0 to 3 do not include covariates, so their C elements are blank. Bij is the effect of species j on species i. 1 = large phytoplankton, 2 = small phytoplankton, 3 = Daphnia, 4 = non-Daphnia zooplankton. The Ives et al. (2003) estimates are from their Table 2 for the low-planktivory lake with the observation model.

      Ives et al.  Model 0  Model 1  Model 2  Model 3  Model 4  Model 5
B11      0.48       0.77     0.50     0.65     0.62     0.61     0.61
B22      0.25       0.51     0.90     0.54     0.51     0.33     0.33
B33      0.74       0.49     0.64     0.83     0.89     0.89     0.90
B44      0.60       0.83     0.81     0.60     0.67     0.66     0.67
B12     -0.39       0.29     0.06    -0.33    -0.32    -0.46    -0.46
B23     -0.17       0.01     0.07     0.00    -0.02    -0.02    -0.02
B24     -0.11      -0.04     0.01    -0.03     0.02    -0.05    -0.05
B42      0.10       1.35     0.37     0.13     0.09     0.05     0.04
C11      0.25                                           0.14     0.14
C21      0.25                                           0.16     0.16
C32     -0.14                                                   -0.04
C42     -0.04                                                   -0.01

13.3.10 Discussion

The estimates for model 5 are fairly close to the Ives et al. estimates, but still a bit different. There are two big differences between model 5 and the Ives et al. analysis. First, Ives et al. had data from three lakes, and their estimate of Q used the data from all the lakes. Combining data, whether from different areas or different years, can be done in a MARSS model as follows. Let y_1 be the first data set (say from site 1) and y_2 be the second data set (say from site 2). Then a MARSS model with parameter values shared across the datasets would be

x_t^+ = B^+ x_{t-1}^+ + u^+ + w_t, \text{ where } w_t \sim MVN(0, Q^+)
y_t^+ = Z^+ x_t^+ + a^+ + v_t, \text{ where } v_t \sim MVN(0, R^+)   (13.15)

where the + matrices are stacked matrices from the different sites (1 and 2):

\begin{bmatrix}x_{1,t}\\ x_{2,t}\end{bmatrix} =
\begin{bmatrix}B & 0\\ 0 & B\end{bmatrix}
\begin{bmatrix}x_{1,t-1}\\ x_{2,t-1}\end{bmatrix} +
\begin{bmatrix}u\\ u\end{bmatrix} + w_t, \quad
w_t \sim MVN\left(0, \begin{bmatrix}Q & q\\ q & Q\end{bmatrix}\right)

\begin{bmatrix}y_{1,t}\\ y_{2,t}\end{bmatrix} =
\begin{bmatrix}Z & 0\\ 0 & Z\end{bmatrix}
\begin{bmatrix}x_{1,t}\\ x_{2,t}\end{bmatrix} +
\begin{bmatrix}a\\ a\end{bmatrix} + v_t, \quad
v_t \sim MVN\left(0, \begin{bmatrix}R & 0\\ 0 & R\end{bmatrix}\right)   (13.16)

The q blocks in the process variance allow for the possibility that environmental variability is correlated between the datasets, e.g., if they are replicate plots that are near each other. If you did not want all the parameters shared, you would replace, say, the B's in B^+ with B_1 and B_2.
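This stacked structure can be specified directly in a MARSS() call. Below is a minimal sketch under simplifying assumptions: two hypothetical 2 × 50 data matrices y1 and y2 (simulated here only so the code runs), each series given its own state, B reduced to a shared diagonal for illustration, and the "equalvarcov" option used so that a single shared covariance plays the role of the q blocks in Equation 13.16:

# two hypothetical data sets from sites 1 and 2
y1 = matrix(rnorm(100), 2, 50)
y2 = matrix(rnorm(100), 2, 50)
y.plus = rbind(y1, y2) #stack the data
model.plus = list(
  Z="identity",           #one state per stacked data row
  B="diagonal and equal", #a shared diagonal B (a simplification)
  U="equal",              #u shared across sites
  Q="equalvarcov",        #shared variance plus one shared covariance
  R="diagonal and equal") #shared observation variance
fit.plus = MARSS(y.plus, model=model.plus)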
The second big difference is that Ives et al. did not demean their data; instead they estimated u. We could have done that too, but with all the NAs in the data (during winter), estimating u is not robust and takes a long time. You can try the analysis on data that have not been demeaned by setting U="unequal". The results are not particularly different, but the model takes a long, long time to converge.

You can also try using the actual fish data instead of the interpolated data. Fish biomass was estimated at the end and start of each season, so only the values at the start and finish of strings of fish numbers are real data; the others are interpolated. You can replace those interpolated values with NAs (missing values) and rerun the model with the fish data. The results are not appreciably different, but the effect of fish drops a bit, as you might expect with less fish information. You do not see it here, but your estimated confidence in the fish effects would also drop, since those estimates would be based on less fish data.

13.4 Stability metrics from estimated interaction matrices

The previous sections focused on estimation of the B and C matrices. The estimated B matrix gives a picture of the species interactions, but it can also be used to compute metrics of intrinsic community stability (Ives et al., 2003). Here we illustrate how to compute these metrics; the reader should see Ives et al. (2003) for details on the meaning of each. For the examples here, we will use the estimated B and Q matrices from our model 5:

B = coef(kem.plank.5,type="matrix")$B[1:4,1:4]
Q = coef(kem.plank.5,type="matrix")$Q[1:4,1:4]

13.4.1 Return rate metrics

Return rate metrics measure how rapidly the system returns to the stationary distribution of species abundances after it is perturbed away from that distribution. With a deterministic (Q = 0) MARSS community model, the equilibrium is a point or a stable limit cycle. In a stochastic model (Q ≠ 0), the equilibrium is stochastic and is a stationary distribution. The rate of return to the stochastic equilibrium is the rate at which the distribution converges back to the stationary distribution after a perturbation away from it; the more rapid the convergence, the more stable the system. The rate of return of the mean of the stationary distribution is governed by the dominant eigenvalue of B. In R, we can compute this as

max(eigen(B)$values)
[1] 0.8964988

The rate of return of the variance of the stationary distribution is governed by the dominant eigenvalue of B ⊗ B:

max(eigen(kronecker(B,B))$values)
[1] 0.8037101

13.4.2 Variance metrics

These metrics compare the variance of the stationary distribution of species abundances (with variance due to environmental drivers removed) to the process error variance. The system is considered more stable when the stationary distribution variance is low relative to the process error variance. To compute the variance metrics, we first need the variance-covariance matrix of the stationary distribution, V∞:

m=nrow(B)
vecV = solve(diag(m*m)-kronecker(B,B))%*%as.vector(Q)
V_inf = matrix(vecV,nrow=m,ncol=m)

A measure of the proportion of the "volume" of the stationary distribution due to species interactions is given by the square of the determinant of the B matrix (Eqn. 24 in Ives et al. (2003)):

abs(det(B))^2
[1] 0.01559078

To compare stability across systems of different sizes, you scale by the number of species:

abs(det(B))^(2/nrow(B))
[1] 0.3533596

13.4.3 Reactivity metrics

Reactivity measures how the system responds to a perturbation. A highly reactive system tends to move farther away from a stable equilibrium immediately after a perturbation, even though the system will eventually return to the equilibrium. High reactivity occurs when species interactions greatly amplify the environmental variance, producing a stationary distribution with high variances in the abundances of individual species. Both metrics of reactivity are estimates of the average expected change in distance from the mean of the stationary distribution. The first uses the estimates of Q and V∞:

-sum(diag(Q))/sum(diag(V_inf))
[1] -0.346845

Estimation of Q is prone to high uncertainty. Another metric, which uses only B, is the worst-case reactivity. This is given by

max(eigen(t(B)%*%B)$values)-1
[1] -0.1957795
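Since all of these metrics are simple functions of B and Q, it can be convenient to collect them in one helper. The following is a sketch of such a wrapper (not part of MARSS); it uses the modulus of the eigenvalues, which equals the values shown above when the dominant eigenvalue is real and positive:

# collect the stability metrics of Ives et al. (2003) in one function
stability.metrics = function(B, Q){
  m = nrow(B)
  # variance-covariance matrix of the stationary distribution
  vecV = solve(diag(m*m)-kronecker(B,B)) %*% as.vector(Q)
  V_inf = matrix(vecV, m, m)
  list(
    return.rate.mean = max(Mod(eigen(B)$values)),
    return.rate.var = max(Mod(eigen(kronecker(B,B))$values)),
    prop.volume = abs(det(B))^2,
    prop.volume.scaled = abs(det(B))^(2/m),
    reactivity = -sum(diag(Q))/sum(diag(V_inf)),
    worst.case.reactivity = max(eigen(t(B)%*%B)$values)-1)
}
stability.metrics(B, Q)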
13.5 Further information

MAR models have been used to estimate species interaction strengths, stability metrics, and environmental drivers for a variety of freshwater plankton systems (Ives, 1995; Ives et al., 1999, 2003; Hampton et al., 2006, 2008; Hampton and Schindler, 2006; Klug and Cottingham, 2001). They have been used to gain much insight into the dynamics of ecological communities and how environmental drivers affect those communities. See ? for a review of the literature using MAR models to understand plankton dynamics.

14 Combining data from multiple time series

14.1 Overview

In this section, we consider the case where multiple time series exist and we want to use all the datasets to estimate a common underlying state process or common underlying parameters. In ecological applications, this situation arises when 1) the time series are observations of the same species collected by different surveys (e.g., aerial and land-based surveys of the same species) or 2) the time series are collected in the same survey but represent observations of multiple species (e.g., a fisheries trawl survey that collects multiple species in each trawl).

Why should we consider using other time series? In the first scenario, where methodology differs between time series of the same species, observation error may be survey-specific. These time series may represent observations of multiple populations, or they may represent multiple observations of the same population. In the second scenario, each species should be treated as a separate process (given its own state vector), but because the survey methodology is the same across species, it might be reasonable to assume a shared observation error variance. If the species have a similar response to environmental stochasticity, it might also be possible to assume a shared process variance.

In both of the above examples, MARSS models offer a way to link multiple time series. If parameters are allowed to be shared among the state processes (trend parameters, process variances) or among the observation processes (observation variances), parameter estimates will be more precise than if we treated each time series as independent. By improving the estimates of the variance parameters, we will also be better able to discriminate between process and observation error variances.

[Footnote: Type RShowDoc("Chapter_CombiningTrendData.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.]

In this chapter, we show examples of using MARSS to analyze multiple time series on the same species, collected from either different populations or different survey methods. The multivariate first-order autoregressive state process is written as usual as

x_t = B x_{t-1} + u + w_t, \text{ where } w_t \sim MVN(0, Q)   (14.1)

The true population sizes at time t are represented by the state x_t, whose dimension equals the number of state processes (m). The m × m matrix B allows interaction between processes (density dependence and competition, for instance), u is a vector describing the mean trend, and the correlation of the process deviations is determined by the structure of the matrix Q. The multivariate observation error model is expressed as

y_t = Z x_t + a + v_t, \text{ where } v_t \sim MVN(0, R)   (14.2)

where y_t is a vector of observations at time t, Z is a design matrix of 0s and 1s, a is a vector of bias adjustments, and the correlation structure of the observation errors is specified with the matrix R. Including Z and a is not required for every model, but they are useful when some processes are observed multiple times.

14.2 Salmon spawner surveys

In our first application combining multiple time series, we will analyze a dataset on Chinook salmon (Oncorhynchus tshawytscha). These data come from the Okanogan River in Washington state, a major tributary of the Columbia River (with headwaters in British Columbia).
As an index of the abundance of spawning adults, biologists have conducted redd surveys during summer months (redds are nests, or collections of rocks on stream bottoms, where females deposit eggs). Aerial surveys of redds on the Okanogan have been conducted from 1956 to 2008. Alternative ground surveys were initiated in 1990 and have been conducted annually since.

14.2.1 Read in and plot the raw data

We will be using the logged counts.

head(okanaganRedds)
     Year aerial ground
[1,] 1956     37     NA
[2,] 1957     53     NA
[3,] 1958     94     NA
[4,] 1959     50     NA
[5,] 1960     29     NA
[6,] 1961     NA     NA

logRedds = log(t(okanaganRedds)[2:3,])

Year is in the first column, and the counts (in normal space) are in columns 2:3. Missing observations are represented by NAs.

# Code for plotting raw Okanagan redd counts
plot(okanaganRedds[,1], okanaganRedds[,2],
  xlab = "Year", ylab="Redd counts",main="", col="red", pch=1)
points(okanaganRedds[,1], okanaganRedds[,3], col="blue", pch=2)
legend('topleft', inset=0.1,
  legend=c("Aerial survey","Ground survey"),
  col=c("red","blue"), pch=c(1,2))

Fig. 14.1. Plot of the raw Okanogan redd counts from the aerial and ground surveys. The two time series look to be pretty close to one another in the years where they overlap.

14.2.2 Test hypotheses about whether the data can be combined

Do these surveys represent observations of the same underlying process? We can evaluate the data support for this question by testing a few relatively simple models. Using the logged data, we will start with a simple model that assumes the underlying population process is univariate (there is one underlying population trajectory) and that each survey is an independent observation of this process. Mathematically, the model is

x_t = x_{t-1} + u + w_t, \text{ where } w_t \sim N(0, q)

\begin{bmatrix}y_{aer}\\ y_{gnd}\end{bmatrix}_t =
\begin{bmatrix}1\\ 1\end{bmatrix} x_t +
\begin{bmatrix}0\\ a_2\end{bmatrix} +
\begin{bmatrix}v_{aer}\\ v_{gnd}\end{bmatrix}_t, \text{ where } v_t \sim MVN\left(0, \begin{bmatrix}r & 0\\ 0 & r\end{bmatrix}\right)   (14.3)

The a structure means that the a for one of the y's is fixed at 0 and the other a is estimated relative to it. In MARSS, this is the "scaling" structure for a. We specify this model in MARSS as follows. Since x is univariate, Q and u are just scalars (single numbers) and have no structure (like "diagonal"), so we can leave them off in our specification.

model1=list()
model1$R="diagonal and equal"
model1$Z=matrix(1,2,1) #matrix of 2 rows and 1 column
model1$A="scaling" #the default
# Fit the single state model, where the time series are assumed
# to be from the same population.
kem1 = MARSS(logRedds, model=model1)

We can print the AIC and AICc values for this model by typing kem1$AIC and kem1$AICc.

How would we modify the above model to allow the observation error variances to be unique? We do this in our second model:

model2=model1 #model2 is based on model1
model2$R="diagonal and unequal"
kem2 = MARSS(logRedds, model=model2)
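Because both models are fit to the same data, their AICc values can be compared directly (a quick sketch):

# data support for equal versus unequal observation variances
c(model1 = kem1$AICc, model2 = kem2$AICc)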
It is possible that these surveys are measuring different population processes, so for our third model we will fit a model with two different population processes that share the same process parameters. For simplicity, we will keep the trend and variance parameters the same. Mathematically, the model we are fitting is

\begin{bmatrix}x_1\\ x_2\end{bmatrix}_t =
\begin{bmatrix}x_1\\ x_2\end{bmatrix}_{t-1} +
\begin{bmatrix}u\\ u\end{bmatrix} + w_t, \text{ where } w_t \sim MVN\left(0, \begin{bmatrix}q & 0\\ 0 & q\end{bmatrix}\right)

\begin{bmatrix}y_{aer}\\ y_{gnd}\end{bmatrix}_t =
\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\end{bmatrix}_t +
\begin{bmatrix}0\\ 0\end{bmatrix} +
\begin{bmatrix}v_{aer}\\ v_{gnd}\end{bmatrix}_t, \text{ where } v_t \sim MVN\left(0, \begin{bmatrix}r & 0\\ 0 & r\end{bmatrix}\right)   (14.4)

We specify this in MARSS as

model3=list()
model3$Q="diagonal and equal"
model3$R="diagonal and equal"
model3$U="equal"
model3$Z="identity"
model3$A="zero"
kem3 = MARSS(logRedds, model=model3)

Based on AIC, it looks like the best model is also the simplest one, with one state vector (model 1). This suggests that the two surveys are not only measuring the same thing but also have the same observation error variance. Finally, we will make a plot of the model-predicted states (with +/- 2 s.e.'s) and the log-transformed data (Figure 14.2).

Fig. 14.2. The data support the hypothesis that the two redd-count time series are observations of the same population. The points are the data and the thick black line is the estimated underlying state.

14.3 American kestrel abundance indices

In this example, we evaluate uncertainty in the structure of process variability (environmental stochasticity) using some bird survey data. Breeding Bird Surveys have been conducted in the U.S. and Canada for over 50 years. In this analysis, we focus on three time series of American kestrel (Falco sparverius) abundance from adjacent Canadian provinces along a longitudinal gradient (British Columbia, Alberta, Saskatchewan). Data have been collected annually and corrected for changes in observer coverage and detectability.

14.3.1 Read in and look at the data

Figure 14.3 shows the data. The data are already log transformed.

birddat = t(kestrel[,2:4])
head(kestrel)
     Year British.Columbia Alberta Saskatchewan
[1,] 1969            0.754   0.460        0.000
[2,] 1970            0.673   0.899        0.192
[3,] 1971            0.734   1.133        0.280
[4,] 1972            0.589   0.528        0.386
[5,] 1973            1.405   0.789        0.451
[6,] 1974            0.624   0.528        0.234

We know that the surveys use the same design, so we will force the observation error variance to be shared. Our uncertainty lies in whether these time series are sampling the same population and in how environmental stochasticity varies by subpopulation (if there are subpopulations). Our first model has one population trajectory (meaning there is one big panmictic BC/AB/SK population), and each of the three surveys is an observation of this big population with equal observation variances. Mathematically, the model is

x_t = x_{t-1} + u + w_t, \text{ where } w_t \sim N(0, q)

\begin{bmatrix}y_{BC}\\ y_{AB}\\ y_{SK}\end{bmatrix}_t =
\begin{bmatrix}1\\ 1\\ 1\end{bmatrix} x_t +
\begin{bmatrix}0\\ a_2\\ a_3\end{bmatrix} +
\begin{bmatrix}v_{BC}\\ v_{AB}\\ v_{SK}\end{bmatrix}_t, \text{ where } v_t \sim MVN\left(0, \begin{bmatrix}r & 0 & 0\\ 0 & r & 0\\ 0 & 0 & r\end{bmatrix}\right)   (14.5)

In MARSS, we denote the model:

model.b1=list()
model.b1$R="diagonal and equal"
model.b1$Z=matrix(1,3,1)
kem.b1 = MARSS(birddat, model=model.b1, control=list(minit=100))

As for the redd-count example, we do not need to specify the structure of Q and u since they are scalar and have no structure.

Fig. 14.3. The kestrel data (index of kestrel abundance by year for British Columbia, Alberta, and Saskatchewan).

kem.b1$AICc
[1] 20.9067
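Before moving on, note that the estimated common trajectory and its uncertainty can be pulled from the fitted object; the states and their standard errors are stored in $states and $states.se (a quick sketch):

# the estimated underlying state and an approximate 95% interval
x.hat = kem.b1$states     # 1 x T matrix of state estimates
x.se = kem.b1$states.se   # matching standard errors
upper = x.hat + 2*x.se
lower = x.hat - 2*x.se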
Let's compare this to a model where we assume that there are separate populations for British Columbia, Alberta, and Saskatchewan, but that they have the same process parameters (trend and process variance). Mathematically, this model is

\begin{bmatrix}x_{BC}\\ x_{AB}\\ x_{SK}\end{bmatrix}_t =
\begin{bmatrix}x_{BC}\\ x_{AB}\\ x_{SK}\end{bmatrix}_{t-1} +
\begin{bmatrix}u\\ u\\ u\end{bmatrix} + w_t, \text{ where } w_t \sim MVN\left(0, \begin{bmatrix}q & 0 & 0\\ 0 & q & 0\\ 0 & 0 & q\end{bmatrix}\right)

\begin{bmatrix}y_{BC}\\ y_{AB}\\ y_{SK}\end{bmatrix}_t =
\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}
\begin{bmatrix}x_{BC}\\ x_{AB}\\ x_{SK}\end{bmatrix}_t +
\begin{bmatrix}0\\ 0\\ 0\end{bmatrix} +
\begin{bmatrix}v_{BC}\\ v_{AB}\\ v_{SK}\end{bmatrix}_t, \text{ where } v_t \sim MVN\left(0, \begin{bmatrix}r & 0 & 0\\ 0 & r & 0\\ 0 & 0 & r\end{bmatrix}\right)   (14.6)

This is specified as:

model.b2=list()
model.b2$Q="diagonal and equal"
model.b2$R="diagonal and equal"
model.b2$Z="identity"
model.b2$A="zero"
model.b2$U="equal"
kem.b2 = MARSS(birddat, model=model.b2)

The AICc for this model is

kem.b2$AICc
[1] 22.96714

Because these populations are surveyed over a relatively large geographic area, it is reasonable to expect that environmental variation may differ between populations. For our third model, we will fit a model with separate processes that are allowed to have unequal process parameters.

model.b3=model.b2 #model.b3 is based on model.b2
#all we change is the structure of Q
model.b3$Q="diagonal and unequal"
model.b3$U="unequal"
kem.b3 = MARSS(birddat, model=model.b3)
kem.b3$AICc
[1] 23.75125

Finally, for a fourth model, we will consider lumping Alberta and Saskatchewan, because those two time series suggest similar trends. Mathematically, this model is

\begin{bmatrix}x_{BC}\\ x_{AB-SK}\end{bmatrix}_t =
\begin{bmatrix}x_{BC}\\ x_{AB-SK}\end{bmatrix}_{t-1} +
\begin{bmatrix}u_1\\ u_2\end{bmatrix} + w_t, \text{ where } w_t \sim MVN\left(0, \begin{bmatrix}q_1 & 0\\ 0 & q_2\end{bmatrix}\right)

\begin{bmatrix}y_{BC}\\ y_{AB}\\ y_{SK}\end{bmatrix}_t =
\begin{bmatrix}1 & 0\\ 0 & 1\\ 0 & 1\end{bmatrix}
\begin{bmatrix}x_{BC}\\ x_{AB-SK}\end{bmatrix}_t +
\begin{bmatrix}0\\ 0\\ a_3\end{bmatrix} +
\begin{bmatrix}v_{BC}\\ v_{AB}\\ v_{SK}\end{bmatrix}_t, \text{ where } v_t \sim MVN\left(0, \begin{bmatrix}r & 0 & 0\\ 0 & r & 0\\ 0 & 0 & r\end{bmatrix}\right)   (14.7)

This model is specified as

model.b4=list()
model.b4$Q="diagonal and unequal"
model.b4$R="diagonal and equal"
model.b4$Z=factor(c("BC","AB-SK","AB-SK"))
model.b4$A="scaling"
model.b4$U="unequal"
kem.b4 = MARSS(birddat, model=model.b4)
kem.b4$AICc
[1] 14.76889

This last model is superior to the others, improving the AICc value by about 6 units relative to model 1. Figure 14.4 shows the fits for this model.

Fig. 14.4. Plot of model 4 fits to the kestrel data.

15 Univariate dynamic linear models (DLMs)

15.1 Overview of dynamic linear models

In this chapter, we will use MARSS to analyze dynamic linear models (DLMs), wherein the parameters in a regression model are treated as time-varying. DLMs are used commonly in econometrics, but have received less attention in the ecological literature (cf. Lamon et al., 1998; Scheuerell and Williams, 2005). Our treatment of DLMs is rather cursory; we direct the reader to the excellent textbooks by Pole et al. (1994) and Petris et al. (2009) for more in-depth treatments of DLMs. The former focuses on Bayesian estimation, whereas the latter addresses both likelihood-based and Bayesian estimation methods.

[Footnote: Type RShowDoc("Chapter_UnivariateDLM.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.]

We begin our description of DLMs with a static regression model, wherein the i-th observation is a linear function of an intercept, predictor variable(s), and a random error term. For example, if we had one predictor variable (F), we could write the model as

y_i = \alpha + \beta F_i + v_i,   (15.1)

where \alpha is the intercept, \beta is the regression slope, F_i is the predictor variable matched to the i-th observation (y_i), and v_i \sim N(0, r). It is important to note here that there is no implicit ordering of the index i. That is, we could shuffle any or all of the (y_i, F_i) pairs in our dataset with no effect on our ability to estimate the model parameters.
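Before making the regression parameters dynamic, note that the static model in Equation 15.1 is just an ordinary linear regression, so it can be fit with R's lm() for comparison. A small sketch with simulated data (the variable names and true parameter values here are illustrative only):

# static regression (Equation 15.1) fit by ordinary least squares
set.seed(1)
F.pred = rnorm(30)                           # predictor variable
y.obs = 1 + 0.5*F.pred + rnorm(30, 0, 0.2)   # alpha=1, beta=0.5
coef(lm(y.obs ~ F.pred))                     # estimates of alpha and beta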
We can write the model in Equation 15.1 using vector notation, such that

y_i = \begin{bmatrix}1 & F_i\end{bmatrix}\begin{bmatrix}\alpha\\ \beta\end{bmatrix} + v_i = F_i^\top \theta + v_i,   (15.2)

where F_i^\top = (1, F_i) and \theta = (\alpha, \beta)^\top.

In a DLM, however, the regression parameters are dynamic in that they "evolve" over time. For a single observation at time t, we can write

y_t = F_t^\top \theta_t + v_t,   (15.3)

where F_t is a column vector of regression variables at time t, \theta_t is a column vector of regression parameters at time t, and v_t \sim N(0, r). This formulation presents two features that distinguish it from Equation 15.2. First, the observed data are explicitly time ordered (i.e., y = \{y_1, y_2, y_3, ..., y_T\}), which means we expect them to contain implicit information. Second, the relationship between the observed datum and the predictor variables is unique at every time t (i.e., \theta = \{\theta_1, \theta_2, \theta_3, ..., \theta_T\}).

However, closer examination of Equation 15.3 reveals an apparent complication for parameter estimation. With only one datum at each time step t, we could, at best, estimate only one regression parameter, and even then the 1:1 correspondence between data and parameters would preclude any estimation of parameter uncertainty. To address this shortcoming, we return to the time ordering of the model parameters. Rather than assume the regression parameters are independent from one time step to another, we instead model them as an autoregressive process where

\theta_t = G_t \theta_{t-1} + w_t,   (15.4)

G_t is the parameter "evolution" matrix, and w_t is a vector of process errors, such that w_t \sim MVN(0, Q). The elements of G_t may be known and fixed a priori, or unknown and estimated from the data. Although we allow for G_t to be time-varying, we will typically assume that it is time invariant.

The idea is that the evolution matrix G_t deterministically maps the parameter space from one time step to the next, so the parameters at time t are temporally related to those before and after. However, the process is corrupted by stochastic error, which amounts to a degradation of information over time. If the diagonal elements of Q are relatively large, then the parameters can vary widely from t to t+1. If Q = 0, then \theta_1 = \theta_2 = ... = \theta_T and we are back to the static model in Equation 15.2.

15.2 Example of a univariate DLM

Let's consider an example from the literature. Scheuerell and Williams (2005) used a DLM to examine the relationship between marine survival of Chinook salmon and an index of ocean upwelling strength along the west coast of the USA. Upwelling brings cool, nutrient-rich waters from the deep ocean to shallower coastal areas. Scheuerell and Williams hypothesized that stronger upwelling in April should create better growing conditions for phytoplankton, which would then translate into more zooplankton. In turn, juvenile salmon ("smolts") entering the ocean in May and June should find better foraging opportunities. Thus, for smolts entering the ocean in year t,

survival_t = \alpha_t + \beta_t F_t + v_t \text{ with } v_t \sim N(0, r),   (15.5)

and F_t is the coastal upwelling index (cubic meters of seawater per second per 100 m of coastline) for the month of April in year t. Both the intercept and slope are time varying, so

\alpha_t = \alpha_{t-1} + w_t^{(1)} \text{ with } w_t^{(1)} \sim N(0, q_1); \text{ and}   (15.6)
\beta_t = \beta_{t-1} + w_t^{(2)} \text{ with } w_t^{(2)} \sim N(0, q_2).   (15.7)
If we define \theta_t = (\alpha_t, \beta_t)^\top, G_t = I for all t, w_t = (w_t^{(1)}, w_t^{(2)})^\top, and Q = diag(q_1, q_2), we get Equation 15.4. If we define y_t = survival_t and F_t = (1, F_t)^\top, we can write out the full univariate DLM as a state-space model with the following form:

\theta_t = G_t \theta_{t-1} + w_t \text{ with } w_t \sim MVN(0, Q);
y_t = F_t^\top \theta_t + v_t \text{ with } v_t \sim N(0, r);   (15.8)
\theta_0 \sim MVN(\pi_0, \Lambda_0).

Equation 15.8 is, not surprisingly, equivalent to our standard MARSS model:

x_t = B_t x_{t-1} + u_t + C_t c_t + w_t \text{ with } w_t \sim MVN(0, Q_t);
y_t = Z_t x_t + a_t + D_t d_t + v_t \text{ with } v_t \sim MVN(0, R_t);   (15.9)
x_0 \sim MVN(\pi, \Lambda);

where x_t = \theta_t, B_t = G_t, u_t = C_t = c_t = 0, y_t = y_t (i.e., y_t is 1 × 1), Z_t = F_t^\top, a_t = D_t = d_t = 0, and R_t = r (i.e., R_t is 1 × 1).

15.2.1 Fitting a univariate DLM with MARSS

Now let's go ahead and analyze the DLM specified in Equations 15.5-15.8. We begin by getting the data set, which has three columns: 1) the year the salmon smolts migrated to the ocean (year), 2) logit-transformed survival (logit.s), and 3) the coastal upwelling index for April (CUI.apr). There are 42 years of data (1964-2005). [Footnote: Survival in the original context was defined as the proportion of juveniles that survive to adulthood. Thus, we use the logit function, defined as logit(p) = log_e(p/[1-p]), to map survival from the open interval (0,1) onto the interval (-∞, ∞), which allows us to meet our assumption of normally distributed observation errors.]

# load the data
data(SalmonSurvCUI)
# get time indices
years = SalmonSurvCUI[,1]
# number of years of data
TT = length(years)
# get response data: logit(survival)
dat = matrix(SalmonSurvCUI[,2],nrow=1)

As we have seen in other case studies, standardizing our covariate(s) to have zero mean and unit variance can be helpful in model fitting and interpretation. In this case, it is a good idea because the variance of CUI.apr is orders of magnitude greater than that of survival.

# get regressor variable
CUI = SalmonSurvCUI[,3]
# z-score the CUI
CUI.z = matrix((CUI - mean(CUI))/sqrt(var(CUI)), nrow=1)
# number of regr params (slope + intercept)
m = dim(CUI.z)[1] + 1

Plots of logit-transformed survival and the z-scored April upwelling index are shown in Figure 15.1.

Fig. 15.1. Time series of logit-transformed marine survival estimates for Snake River spring/summer Chinook salmon (top) and z-scores of the coastal upwelling index at 45N 125W (bottom). The x-axis indicates the year that the salmon smolts entered the ocean.

Next, we need to set up the appropriate matrices and vectors for MARSS. Let's begin with those for the process equation, because they are straightforward.

# for process eqn
B = diag(m)                  # 2x2; Identity
U = matrix(0,nrow=m,ncol=1)  # 2x1; both elements = 0
Q = matrix(list(0),m,m)      # 2x2; all 0 for now
diag(Q) = c("q1","q2")       # 2x2; diag = (q1,q2)

Defining the correct form for the observation model is a little trickier, however, because of how we model the effect(s) of the explanatory variables. In a DLM, we need to use Z_t (instead of d_t) as the matrix of known regressors/drivers that affect y_t, and x_t (instead of D_t) as the regression parameters.
Therefore, we need to set Z_t equal to an n × m × T array, where n is the number of response variables (= 1; y_t is univariate), m is the number of regression parameters (= intercept + slope = 2), and T is the length of the time series (= 42).

# for observation eqn
Z = array(NA, c(1,m,TT))  # NxMxT; empty for now
Z[1,1,] = rep(1,TT)       # Nx1; 1's for intercept
Z[1,2,] = CUI.z           # Nx1; regr variable
A = matrix(0)             # 1x1; scalar = 0
R = matrix("r")           # 1x1; scalar = r

Lastly, we need to define our lists of initial starting values and model matrices/vectors.

# only need starting values for regr parameters
inits.list = list(x0=matrix(c(0, 0), nrow=m))
# list of model matrices & vectors
mod.list = list(B=B, U=U, Q=Q, Z=Z, A=A, R=R)

And now we can fit our DLM with MARSS.

# fit univariate DLM
dlm1 = MARSS(dat, inits=inits.list, model=mod.list)

Success! abstol and log-log tests passed at 115 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 115 iterations.
Log-likelihood: -40.03813
AIC: 90.07627   AICc: 91.74293

        Estimate
R.r      0.15708
Q.q1     0.11264
Q.q2     0.00564
x0.X1   -3.34023
x0.X2   -0.05388

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

Notice that the MARSS output does not list any estimates of the regression parameters themselves. Why not? Remember that in a DLM the matrix of states (x) contains the estimates of the regression parameters (\theta). Therefore, we need to look in dlm1$states for the MLEs of the regression parameters and in dlm1$states.se for their standard errors.

Time series of the estimated intercept and slope are shown in Figure 15.2. It appears as though the intercept is much more dynamic than the slope, as indicated by a much larger estimate of process variance for the former (Q.q1). In fact, although the effect of April upwelling appears to be increasing over time, it does not really become important as an explanatory variable until about 1990, when the approximate 95% confidence interval for the slope no longer overlaps zero.

Fig. 15.2. Time series of estimated mean states (thick lines) for the intercept (top) and slope (bottom) parameters from the univariate DLM specified by Equations 15.5-15.8. Thin lines denote the mean ± 2 standard deviations.
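The time series in Figure 15.2 can be reproduced from the fitted object; a sketch using the states and standard errors described above:

# plot the time-varying intercept and slope with +/- 2 SEs
theta.hat = dlm1$states
theta.se = dlm1$states.se
par(mfrow=c(2,1))
for(i in 1:m) {
  band = c(theta.hat[i,]+2*theta.se[i,], theta.hat[i,]-2*theta.se[i,])
  plot(years, theta.hat[i,], type="l", lwd=2, ylim=range(band),
    xlab="Year of ocean entry", ylab=c("intercept","slope")[i])
  lines(years, theta.hat[i,]+2*theta.se[i,], lty=2)
  lines(years, theta.hat[i,]-2*theta.se[i,], lty=2)
}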
15.3 Forecasting with a univariate DLM

Scheuerell and Williams (2005) were interested in how well upwelling could be used to actually forecast expected survival of salmon, so let's look at how well our model does in that context. To do so, we need the predictive distributions for the regression parameters and the observation. Beginning with our definition of the distribution of the parameters at time t = 0, \theta_0 \sim MVN(\pi_0, \Lambda_0) in Equation 15.8, we write

\theta_{t-1} | y_{1:t-1} \sim MVN(\pi_{t-1}, \Lambda_{t-1})   (15.10)

to indicate the distribution of \theta at time t-1 conditioned on the observed data through time t-1 (i.e., y_{1:t-1}). Then, we can write the one-step-ahead predictive distribution for \theta_t given y_{1:t-1} as

\theta_t | y_{1:t-1} \sim MVN(\eta_t, \Phi_t), \text{ where}
\eta_t = G_t \pi_{t-1}, \text{ and}   (15.11)
\Phi_t = G_t \Lambda_{t-1} G_t^\top + Q.

Consequently, the one-step-ahead predictive distribution for the observation at time t given y_{1:t-1} is

y_t | y_{1:t-1} \sim N(\zeta_t, \Psi_t), \text{ where}
\zeta_t = F_t \eta_t, \text{ and}   (15.12)
\Psi_t = F_t \Phi_t F_t^\top + R.

15.3.1 Forecasting a univariate DLM with MARSS

Working from Equation 15.12, we can now use MARSS to compute the expected value of the forecast at time t (E[y_t | y_{1:t-1}] = \zeta_t) and its variance (var[y_t | y_{1:t-1}] = \Psi_t). For the expectation, we need F_t \eta_t. Recall that F_t is our 1 × m matrix of explanatory variables at time t (F_t is called Z_t in MARSS notation). The one-step-ahead forecasts of the parameters at time t (\eta_t) are calculated as part of the Kalman filter algorithm; they are termed \tilde{x}_t^{t-1} in MARSS notation and stored as 'xtt1' in the list produced by the MARSSkfss() function.

# get list of Kalman filter output
kf.out = MARSSkfss(dlm1)
# forecasts of regr parameters; 2xT matrix
eta = kf.out$xtt1
# ts of E(forecasts)
fore.mean = vector()
for(t in 1:TT) {
  fore.mean[t] = Z[,,t] %*% eta[,t,drop=F]
}

For the variance of the forecasts, we need F_t \Phi_t F_t^\top + R. As with the mean, F_t ≡ Z_t. The variances of the one-step-ahead forecasts of the parameters at time t (\Phi_t) are also calculated as part of the Kalman filter algorithm; they are stored as 'Vtt1' in the list produced by the MARSSkfss() function. Lastly, the observation variance R is part of the standard MARSS output.

# variance of regr parameters; 1x2xT array
Phi = kf.out$Vtt1
# obs variance; 1x1 matrix
R.est = coef(dlm1, type="matrix")$R
# ts of Var(forecasts)
fore.var = vector()
for(t in 1:TT) {
  tZ = matrix(Z[,,t],m,1) # transpose of Z
  fore.var[t] = Z[,,t] %*% Phi[,,t] %*% tZ + R.est
}

Plots of the model mean forecasts with their estimated uncertainty are shown in Figure 15.3. Nearly all of the observed values fall within the approximate prediction interval. Notice that we have a forecasted value for the first year of the time series (1964), which may seem at odds with our notion of forecasting at time t based only on data available through time t-1. In this case, however, MARSS is actually estimating the states at t = 0 (\theta_0), which allows us to compute a forecast for the first time point.

Fig. 15.3. Time series of logit-transformed survival data (blue dots) and model mean forecasts (thick line). Thin lines denote the approximate 95% prediction intervals.

Although our model forecasts look reasonable in logit space, it is worthwhile to examine how they look when the survival data and forecasts are back-transformed onto the interval [0,1] (Figure 15.4). In that case, the accuracy does not seem to be affected, but the precision appears much worse, especially during the early and late portions of the time series, when survival is changing rapidly.

Fig. 15.4. Time series of survival data (blue dots) and model mean forecasts (thick line). Thin lines denote the approximate 95% prediction intervals.

15.3.2 DLM forecast diagnostics

As with other time series models, evaluation of a DLM should include some model diagnostics. In a forecasting context, we are often interested in the forecast errors, which are simply the observed data minus the forecasts (e_t = y_t - \zeta_t). In particular, the following assumptions should hold true for e_t:

1. e_t \sim N(0, \sigma^2);
2. cov(e_t, e_{t-k}) = 0.

In the literature on state-space models, the set of e_t are commonly referred to as "innovations". MARSS() calculates the innovations as part of the Kalman filter algorithm; they are stored as 'Innov' in the list produced by the MARSSkfss() function.

# forecast errors
innov = kf.out$Innov
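Because e_t = y_t - \zeta_t, the same innovations can be computed directly from the data and the one-step-ahead forecast means obtained earlier; a quick check:

# innovations = data minus one-step ahead forecast means
fe = as.vector(dat) - fore.mean
max(abs(fe - as.vector(innov))) # ~0, up to numerical precision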
In the literature on state-space models, the set of et are commonly referred to as "innovations". MARSS() calculates the innovations as part of the Kalman filter algorithm; they are stored as 'Innov' in the list produced by the MARSSkfss() function.

# forecast errors
innov = kf.out$Innov

Let's see if our innovations meet the model assumptions. Beginning with (1), we can use a Q-Q plot to see whether the innovations are normally distributed with a mean of zero. We will use the qqnorm() function to plot the quantiles of the innovations on the y-axis versus the theoretical quantiles from a Normal distribution on the x-axis. If the two distributions are similar, the points should fall on the line defined by y = x.

# Q-Q plot of innovations
qqnorm(t(innov), main="", pch=16, col="blue")
# add y=x line for easier interpretation
qqline(t(innov))

Fig. 15.5. Q-Q plot of the forecast errors (innovations) for the DLM specified in Equations 15.5–15.8.

The Q-Q plot (Figure 15.5) indicates that the innovations appear to be more-or-less normally distributed (i.e., most points fall on the line). Furthermore, it looks like the mean of the innovations is about 0, but we should use a more reliable test than simple visual inspection. We can formally test whether the mean of the innovations is significantly different from 0 with a one-sample t-test based on a null hypothesis of E(et) = 0. To do so, we will use the function t.test() and base our inference on a significance level of α = 0.05.

# p-value for t-test of H0: E(innov) = 0
t.test(t(innov), mu=0)$p.value
[1] 0.4840901

The p-value is much greater than 0.05, so we cannot reject the null hypothesis that E(et) = 0.

Moving on to assumption (2), we can use the sample autocorrelation function (ACF) to examine whether the innovations covary with a time-lagged version of themselves. Using the acf() function, we can compute and plot the correlations of et and et−k for various values of k. Assumption (2) will be met if none of the correlation coefficients exceed the 95% confidence intervals defined by ± z0.975/√n.

# plot ACF of innovations
acf(t(innov), lag.max=10)

The ACF plot (Figure 15.6) shows no significant autocorrelation in the innovations at lags 1–10, so it looks like both of our model assumptions have indeed been met.

Fig. 15.6. Autocorrelation plot of the forecast errors (innovations) for the DLM specified in Equations 15.5–15.8. Horizontal blue lines define the upper and lower 95% confidence intervals.

16 Multivariate linear regression

Type RShowDoc("Chapter_MLR.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.

This chapter shows how to write regression models with multivariate responses and multivariate explanatory variables in MARSS form. R has many excellent packages for multiple linear regression. We will be showing how to use the MARSS() function to fit these models, but note that R's standard linear regression functions would be much better choices in most cases. The purpose of this chapter is to show the relationship between multivariate linear regression and the MARSS equation.

In a classic linear regression, the response variable (y) is univariate and there may be one or more explanatory variables
(d1, d2, …) plus an optional intercept (α):

yt = α + ∑k βk dk + et, where et ∼ N(0, σ²)    (16.1)

Here the subscript t is used since we are working with time-series data. Explanatory variables are normally denoted x in linear regression; however, x is not used here since x is already used in MARSS models to denote the hidden process trajectory. Instead, d is used when the explanatory variables appear in the y part of the equation (and c if they appear in the x part). This chapter will start with classical linear regression, where the explanatory variables are treated as inputs that are known without error and where we are trying to explain the variation in y with our explanatory variables. We will extend this to the case of autocorrelated errors.

16.1 Univariate linear regression

A vanilla linear regression where our data are time ordered but we treat them as independent can be written

yt = α + β1 d1,t + β2 d2,t + et,    (16.2)

where the d are our explanatory variables. This model can be written in many different ways as a MARSS equation. Here we use a specific form in which the i.i.d. component of the errors is vt in the y part of the MARSS equation and the autocorrelated errors appear as xt in the y equation. Specifying the MARSS model this way allows us to use the EM algorithm to fit the model, which will prove to be important.

\[
y_t = \alpha + \begin{bmatrix}\beta_1 & \beta_2 & \dots\end{bmatrix}
\begin{bmatrix}d_{1,t}\\ d_{2,t}\\ \vdots\end{bmatrix} + v_t + x_t, \quad v_t \sim N(0, r)
\]
\[
x_t = b\,x_{t-1} + w_t, \quad w_t \sim N(0, q) \tag{16.3}
\]
\[
x_0 = 0
\]

The vt are the i.i.d. errors and the xt are the AR(1) errors.

16.1.1 Univariate response using the Longley dataset: example 1

We will start by using an example from Chapter 6 in Linear Models in R (Faraway, 2004). This example uses the built-in R dataset "longley", which has the number of people employed from 1947 to 1962 and a number of predictors. For this example we will regress the number of people employed against Gross National Product and population size (following Faraway). Mathematically, the model we are fitting is

\[
\text{Employed}_t = \alpha + \begin{bmatrix}\beta_{\text{GNP}} & \beta_{\text{Pop}}\end{bmatrix}
\begin{bmatrix}\text{GNP}_t\\ \text{Pop}_t\end{bmatrix} + v_t, \quad v_t \sim N(0, r) \tag{16.4}
\]

x does not appear in the vanilla linear regression since we do not have autocorrelated errors (yet). We are trying to estimate α (intercept), βGNP and βPop. A full multivariate MARSS model looks like

yt = Z xt + a + D dt + vt, where vt ∼ MVN(0, R)
xt = B xt−1 + u + C ct + wt, where wt ∼ MVN(0, Q)    (16.5)

We need to specify the parameters in Equation 16.5 such that we get Equation 16.4.

First, we load the data and set up y, the response variable Employed, as a matrix with time going across the columns.

data(longley)
Employed = matrix(longley$Employed, nrow=1)

Second, create a list to hold our model specification.

longley.model=list()

Fig. 16.1. Employment time series from the Longley dataset.

Set the u, Q and x0 parameters to 0. We will also set a and C to 0 and B and Z to identity, although this is not necessary since these are the defaults.

longley.model$U=longley.model$Q="zero"
longley.model$C="zero"
longley.model$B=longley.model$Z="identity"
longley.model$x0="zero"
longley.model$tinitx=0

We will estimate R; this is the variance of the i.i.d. errors (residuals).
longley.model$R=matrix("r")

The D matrix has the two β (slope) parameters for GNP and Population, and a has the intercept.¹

longley.model$A=matrix("intercept")
longley.model$D=matrix(c("GNP","Pop"),nrow=1)

¹ A better way to fit the model is to put the intercept into D by adding a row of 1s to d and putting the intercept parameter on the first row of D. This reduces by one the number of matrices being estimated by the EM algorithm. It is not done here just so the equations look more like standard linear regression equations.

Last, we set up our explanatory variables. This is the d matrix, and we need each explanatory variable in a row with time across the columns.

longley.model$d = rbind(longley$GNP, longley$Population)

Now we can fit the model with the MARSS() function:

mod1=MARSS(Employed, model=longley.model)

and look at the estimates:

coef(mod1, type="vector")

method="BFGS" can also be used and gives similar results. We can compare the fit to that from lm() and see that we get the same estimates:

mod1.lm=lm(Employed ~ GNP + Population, data=longley)
coef(mod1.lm)

 (Intercept)          GNP   Population
 88.93879831   0.06317244  -0.40974292

16.1.2 Univariate response using auto-correlated errors: example 1

As Faraway (2004) discusses, the errors in this dataset are temporally correlated. We can model the errors as an AR(1) process to account for this. This changes our model to

\[
\text{Employed}_t = \alpha + \begin{bmatrix}\beta_{\text{GNP}} & \beta_{\text{Pop}}\end{bmatrix}
\begin{bmatrix}\text{GNP}_t\\ \text{Pop}_t\end{bmatrix} + v_t + x_t, \quad v_t \sim N(0, r)
\]
\[
x_t = b\,x_{t-1} + w_t, \quad w_t \sim N(0, q) \tag{16.6}
\]
\[
x_0 = 0
\]

We assume the AR(1) errors have mean 0, so u = 0 in the xt equation. Setting u to anything else would make the mean of our errors equal to u/(1 − b) for −1 < b < 1. This would lead to two mean levels in our model, α and u/(1 − b), and we would not be able to estimate both. Notice that the model is somewhat confounded: if b = 0, then xt is i.i.d. error, the same as vt. In that case, either q or r would be redundant. It is thus possible that either r or q will go to zero.

We then rewrite the model list from our vanilla linear regression to correspond to this MARSS model with AR(1) errors. We estimate b (called φ here) and q.

longley.ar1=longley.model
longley.ar1$B=matrix("phi")
longley.ar1$Q=matrix("q")

Now we could fit the model with the MARSS() function as before:

mod2=MARSS(Employed, model=longley.ar1)

However, this is a difficult model to fit and it takes a long, long time to converge (using method="BFGS" helps a little, but not much). We can improve behavior by using the fit of the model with i.i.d. errors as initial conditions for D and a.

inits=list(A=coef(mod1)$A, D=coef(mod1)$D)
mod2=MARSS(Employed, model=longley.ar1, inits=inits,
   control=list(maxit=1000))

Warning! Abstol convergence only. Maxit (=1000) reached before log-log convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only no log-log convergence.
 maxit (=1000) reached before log-log convergence.
 The likelihood and params might not be at the ML values.
 Try setting control$maxit higher.
Log-likelihood: -10.53742
AIC: 33.07484   AICc: 42.40818

             Estimate
A.intercept  95.36017
R.r           0.00247
B.phi         0.35094
Q.q           0.21602
D.GNP         0.06749
D.Pop        -0.47840

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

Convergence warnings
 Warning: the R.r parameter value has not converged.
 Type MARSSinfo("convergence") for more info on this warning.
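As a quick sanity check on these AR(1) error estimates, we can compute the implied stationary variance of the error process, q/(1 − φ²) (the variance analogue of the stationary mean u/(1 − b) discussed above; see also the discussion of stationary variance in Appendix B). This is a minimal sketch, assuming the mod2 fit above:

# implied stationary variance of the AR(1) error process
phi.hat = coef(mod2)$B[1]    # approx. 0.351
q.hat = coef(mod2)$Q[1]      # approx. 0.216
q.hat/(1 - phi.hat^2)        # approx. 0.25

Note how much larger this is than the estimated i.i.d. variance R.r (0.00247): nearly all of the residual variation is being attributed to the autocorrelated error process, consistent with the confounding between r and q discussed above.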
We can compare the fit using gls() (nlme package) and see that we get the same estimates:

require(nlme)
mod2.gls=gls(Employed ~ GNP + Population,
   correlation=corAR1(), data=longley, method="ML")
mod2.gls.phi = coef(mod2.gls$modelStruct[[1]], unconstrained=FALSE)
c(mod2.gls.phi, coef(mod2.gls), logLik(mod2.gls))

         Phi  (Intercept)          GNP   Population
  0.36511964  96.09369245   0.06822305  -0.48715545 -10.47396091

(The last, unnamed, value is the log-likelihood.) Note we need to set method="ML" to maximize the likelihood, because the default is to maximize the restricted likelihood (method="REML"), and that gives a different answer from the MARSS() function since MARSS() is maximizing the likelihood.

16.1.3 Univariate response using the Longley dataset: example 2

The full Longley dataset is often used to test the performance of numerical methods for fitting linear regression models because it has severe collinearity problems (Figure 16.2). We can compare the EM and BFGS algorithms for the full dataset and see how fitting a MARSS model with the BFGS algorithm leads to estimates far from the maximum-likelihood values for this problem.

Fig. 16.2. Pairs plot showing collinearity in the Longley explanatory variables.

We can fit a regression of Employed against all the Longley explanatory variables using the following code. The mathematical model is the same as in Equation 16.4 except that instead of two explanatory variables we have all six predictors shown in Figure 16.2.

eVar.names = colnames(longley)[-7]
eVar = t(longley[,eVar.names])
longley.model=list()
longley.model$U=longley.model$Q="zero"
longley.model$C="zero"
longley.model$B=longley.model$Z="identity"
longley.model$A=matrix("intercept")
longley.model$R=matrix("r")
longley.model$D=matrix(eVar.names,nrow=1)
longley.model$d = eVar
longley.model$x0="zero"
longley.model$tinitx=0

And then we fit as usual. We will fit with the EM algorithm (the default) and compare to BFGS.

mod3.em=MARSS(Employed, model=longley.model)
mod3.bfgs=MARSS(Employed, model=longley.model, method="BFGS")

Here are the EM estimates with the log-likelihood.
par.names = c("A.intercept", paste("D",eVar.names,sep="."))
c(coef(mod3.em, type="vector")[par.names], logLik=mod3.em$logLik)

   A.intercept  D.GNP.deflator           D.GNP    D.Unemployed
 -3.482259e+03    1.506187e-02   -3.581918e-02   -2.020230e-02
D.Armed.Forces    D.Population          D.Year          logLik
 -1.033227e-02   -5.110412e-02    1.829151e+00    9.066497e-01

Compared to the BFGS estimates:

c(coef(mod3.bfgs, type="vector")[par.names], logLik=mod3.bfgs$logLik)

   A.intercept  D.GNP.deflator           D.GNP    D.Unemployed
 -14.062098829    -0.052705201     0.070642032    -0.004298481
D.Armed.Forces    D.Population          D.Year          logLik
  -0.005744197    -0.412771922     0.055610012    -6.996818721

And compared to the estimates from the lm() function:

mod3.lm=lm(Employed ~ 1 + GNP.deflator + GNP + Unemployed
   + Armed.Forces + Population + Year, data=longley)
c(coef(mod3.lm),logLik=logLik(mod3.lm))

  (Intercept)  GNP.deflator           GNP    Unemployed
-3.482259e+03  1.506187e-02 -3.581918e-02 -2.020230e-02
 Armed.Forces    Population          Year        logLik
-1.033227e-02 -5.110411e-02  1.829151e+00  9.066497e-01

As you can see, the BFGS algorithm struggles with the ridge-like likelihood caused by the collinearity in the explanatory variables.

We can also compare the performance of the model with AR(1) errors. This is Equation 16.6 but with all six explanatory variables. We set up the MARSS model² for a linear regression with correlated errors as before, with the addition of b (called φ) and q.

² Notice that x0 is set at 0. The model has a hard time fitting x0 because the time series is short. Estimating x0, or using a diffuse prior by setting V0 big, leads to poor estimates. Since this is just the error term, we set x0 = 0 since the mean of the errors is assumed to be 0.

longley.correrr.model=longley.model
longley.correrr.model$B=matrix("phi")
longley.correrr.model$Q=matrix("q")

We fit as usual and compare the EM algorithm (the default) to fits using BFGS. We will use the estimates from the model with i.i.d. errors as initial conditions.

inits=list(A=coef(mod3.em)$A, D=coef(mod3.em)$D)
mod4.em=MARSS(Employed, model=longley.correrr.model, inits=inits)
mod4.bfgs=MARSS(Employed, model=longley.correrr.model, inits=inits,
   method="BFGS")

Here are the EM estimates with the log-likelihood. We only show φ (the b term in the AR(1) error equation) and the log-likelihood.

c(coef(mod4.em, type="vector")["B.phi"], logLik=mod4.em$logLik)

     B.phi     logLik
-0.7737392  4.5374543

Compared to the BFGS estimates:

c(coef(mod4.bfgs, type="vector")["B.phi"], logLik=mod4.bfgs$logLik)

    B.phi    logLik
0.8368962 0.9066497

And compared to the estimates from the gls() function:

mod4.gls=gls(Employed ~ 1 + GNP.deflator + GNP + Unemployed
   + Armed.Forces + Population + Year,
   correlation=corAR1(), data=longley, method="ML")
mod4.gls.phi = coef(mod4.gls$modelStruct[[1]], unconstrained=FALSE)
c(mod4.gls.phi, logLik=logLik(mod4.gls))

       Phi     logLik
-0.7288687  4.3865475

Again we see that the BFGS algorithm struggles with the ridge-like likelihood caused by the collinearity in the explanatory variables.

16.2 Multivariate response example using longitudinal data

We will illustrate linear regression with a multivariate response using longitudinal data from a sleep study on 18 subjects from the lme4 R package. These are data on the reaction time of subjects after 0 to 9 days of being restricted to 3 hours of sleep.
We load the data from the lme4 package:

data(sleepstudy, package="lme4")

Fig. 16.3. Plot of the sleep study data (package lme4): average reaction time (ms) versus days of sleep deprivation, one panel per subject.

We set up the data into the form required for the MARSS() function.

# number of subjects
nsub = length(unique(sleepstudy$Subject))
ndays = length(sleepstudy$Days)/nsub
# each subject is a row with day across the columns
dat = matrix(sleepstudy$Reaction, nsub, ndays, byrow=TRUE)
rownames(dat) = paste("sub", unique(sleepstudy$Subject), sep=".")
# the day number 0 to 9 is the explanatory variable
exp.var = matrix(sleepstudy$Days, 1, ndays, byrow=TRUE)

Let's start with a simple regression where each subject has a separate intercept (reaction time at day 0) but the slope (increase in reaction time with each successive day) is the same across the 18 subjects. Mathematically the model is

\[
\begin{bmatrix}\text{resp}_1\\ \text{resp}_2\\ \vdots\\ \text{resp}_{18}\end{bmatrix}_t =
\begin{bmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_{18}\end{bmatrix} +
\begin{bmatrix}\beta\\ \beta\\ \vdots\\ \beta\end{bmatrix}\text{day}_t +
\begin{bmatrix}v_1\\ v_2\\ \vdots\\ v_{18}\end{bmatrix}_t, \qquad
\mathbf{v}_t \sim \text{MVN}\bigl(0,\ \text{diag}(r, r, \dots, r)\bigr) \tag{16.7}
\]

The response time of subject i is a subject-specific intercept (αi) plus an effect of day at time t (so 0, 1, 2, etc.) that does not vary by subject, and error that is i.i.d. across subject and day. We specify and fit this model as follows:

sleep.model=list(A="unequal", B="zero", x0="zero", U="zero",
   D=matrix("b1",nsub,1), d=exp.var, tinitx=0, Q="zero")
sleep.mod1 = MARSS(dat, model=sleep.model)

This is the same as the following with lm():

sleep.lm1 = lm(Reaction ~ -1 + Subject + Days, data=sleepstudy)

Now let's allow the slope (increase in reaction time with each successive day) to differ across subjects. This model is

\[
\begin{bmatrix}\text{resp}_1\\ \vdots\\ \text{resp}_{18}\end{bmatrix}_t =
\begin{bmatrix}\alpha_1\\ \vdots\\ \alpha_{18}\end{bmatrix} +
\begin{bmatrix}\beta_1\\ \vdots\\ \beta_{18}\end{bmatrix}\text{day}_t +
\mathbf{v}_t, \qquad
\mathbf{v}_t \sim \text{MVN}\bigl(0,\ \text{diag}(r, r, \dots, r)\bigr) \tag{16.8}
\]

We specify and fit this model as

sleep.model=list(A="unequal", B="zero", x0="zero", U="zero",
   D="unequal", d=exp.var, tinitx=0, Q="zero")
sleep.mod2 = MARSS(dat, model=sleep.model, silent=TRUE)

This is the same as the following with lm():

sleep.lm2 = lm(Reaction ~ 0 + Subject + Days:Subject, data=sleepstudy)

We can repeat the above but allow the residual variance to differ across subjects by setting R="diagonal and unequal". This model is

\[
\begin{bmatrix}\text{resp}_1\\ \vdots\\ \text{resp}_{18}\end{bmatrix}_t =
\begin{bmatrix}\alpha_1\\ \vdots\\ \alpha_{18}\end{bmatrix} +
\begin{bmatrix}\beta_1\\ \vdots\\ \beta_{18}\end{bmatrix}\text{day}_t +
\mathbf{v}_t, \qquad
\mathbf{v}_t \sim \text{MVN}\bigl(0,\ \text{diag}(r_1, r_2, \dots, r_{18})\bigr) \tag{16.9}
\]

sleep.model=list(A="unequal", B="zero", x0="zero", U="zero",
   D="unequal", d=exp.var, tinitx=0, Q="zero",
   R="diagonal and unequal")
sleep.mod3 = MARSS(dat, model=sleep.model, silent=TRUE)

Or we can allow AR(1) errors across subjects and allow each subject to have its own AR(1) parameters for this error. This model is

\[
\begin{bmatrix}\text{resp}_1\\ \vdots\\ \text{resp}_{18}\end{bmatrix}_t =
\begin{bmatrix}\alpha_1\\ \vdots\\ \alpha_{18}\end{bmatrix} +
\begin{bmatrix}\beta_1\\ \vdots\\ \beta_{18}\end{bmatrix}\text{day}_t +
\mathbf{v}_t + \mathbf{x}_t, \qquad
\mathbf{v}_t \sim \text{MVN}\bigl(0,\ \text{diag}(r_1, \dots, r_{18})\bigr)
\]
\[
\mathbf{x}_t = \text{diag}(b_1, \dots, b_{18})\,\mathbf{x}_{t-1} + \mathbf{w}_t, \qquad
\mathbf{w}_t \sim \text{MVN}\bigl(0,\ \text{diag}(q_1, \dots, q_{18})\bigr) \tag{16.10}
\]

We fit this model as

# use the mod3 estimates as initial conditions
inits=list(A=coef(sleep.mod3)$A, D=coef(sleep.mod3)$D)
sleep.model=list(A="unequal", B="diagonal and unequal", x0="zero",
   U="unequal"=NULL, ...)

sleep.model=list(A="unequal", B="diagonal and unequal", x0="zero",
   U="zero", D="unequal", d=exp.var, tinitx=0,
   Q="diagonal and unequal", R="diagonal and unequal")
sleep.mod4 = MARSS(dat, model=sleep.model, inits=inits, silent=TRUE)

It is not obvious how to specify these last two models using gls(), or if it is possible.

We can also allow each subject to have its own error process but specify that the parameters of these (φ, q and r) are the same across subjects. We do this by using "diagonal and equal". Mathematically this model is

\[
\begin{bmatrix}\text{resp}_1\\ \vdots\\ \text{resp}_{18}\end{bmatrix}_t =
\begin{bmatrix}\alpha_1\\ \vdots\\ \alpha_{18}\end{bmatrix} +
\begin{bmatrix}\beta_1\\ \vdots\\ \beta_{18}\end{bmatrix}\text{day}_t +
\mathbf{v}_t + \mathbf{x}_t, \qquad
\mathbf{v}_t \sim \text{MVN}\bigl(0,\ \text{diag}(r, \dots, r)\bigr)
\]
\[
\mathbf{x}_t = \text{diag}(b, \dots, b)\,\mathbf{x}_{t-1} + \mathbf{w}_t, \qquad
\mathbf{w}_t \sim \text{MVN}\bigl(0,\ \text{diag}(q, \dots, q)\bigr) \tag{16.11}
\]

We specify and fit this model as

# use the mod3 estimates as initial conditions
inits=list(A=coef(sleep.mod3)$A, D=coef(sleep.mod3)$D)
sleep.model=list(A="unequal", B="diagonal and equal", x0="zero",
   U="zero", D="unequal", d=exp.var, tinitx=0,
   Q="diagonal and equal", R="diagonal and equal")
sleep.mod5 = MARSS(dat, model=sleep.model, inits=inits, silent=TRUE)

This is fairly close to this model fit with gls():

sleep.mod5.gls=gls(Reaction ~ 0 + Subject + Days:Subject,
   data=sleepstudy, correlation=corAR1(form=~ 1|Subject), method="ML")

The way the variance-covariance structure is modeled is a little different, but it is the same idea.

Table 16.1. Parameter estimates of different versions of the model where each subject has a separate intercept (response time on normal sleep) and a different slope by day (increase in response time with each day of sleep deprivation). The model types are discussed in the text.
                 lm  mod2 em  mod3 em  mod4 em  mod5 em  mod5 gls
logLik      -818.94  -818.94  -770.19  -754.97  -818.76   -818.55
slope 308     21.76    21.76    21.76    21.77    21.83     21.87
slope 309      2.26     2.26     2.26     1.43     2.24      2.23
slope 310      6.11     6.11     6.11     6.12     6.10      6.08
slope 330      3.01     3.01     3.01     2.93     3.01      3.04
slope 331      5.27     5.27     5.27     3.59     5.36      5.46
slope 332      9.57     9.57     9.57     8.55     9.39      9.21
slope 333      9.14     9.14     9.14     8.85     9.12      9.12
slope 334     12.25    12.25    12.25    11.73    12.24     12.26
slope 335     -2.88    -2.88    -2.88    -3.19    -2.82     -2.77
slope 337     19.03    19.03    19.03    19.09    18.95     18.90
slope 349     13.49    13.49    13.49    12.14    13.47     13.46
slope 350     19.50    19.50    19.50    18.21    19.38     19.28
slope 351      6.43     6.43     6.43     6.15     6.54      6.64
slope 352     13.57    13.57    13.57    19.20    13.71     13.80
slope 369     11.35    11.35    11.35    11.41    11.32     11.31
slope 370     18.06    18.06    18.06    18.31    18.01     17.97
slope 371      9.19     9.19     9.19     9.56     9.23      9.28
slope 372     11.30    11.30    11.30    11.45    11.28     11.26
phi 308                                   0.02     0.12      0.08
phi 309                                   0.63     0.12      0.08
phi 310                                  -0.01     0.12      0.08
phi 330                                   0.32     0.12      0.08
phi 331                                  -1.66     0.12      0.08
phi 332                                   0.26     0.12      0.08
phi 333                                  -1.04     0.12      0.08
phi 334                                   0.51     0.12      0.08
phi 335                                  -0.40     0.12      0.08
phi 337                                  -0.08     0.12      0.08
phi 349                                   0.80     0.12      0.08
phi 350                                   0.32     0.12      0.08
phi 351                                  -0.15     0.12      0.08
phi 352                                   0.80     0.12      0.08
phi 369                                  -0.25     0.12      0.08
phi 370                                  -0.44     0.12      0.08
phi 371                                   0.63     0.12      0.08
phi 372                                  -0.47     0.12      0.08

16.3 Summary

The purpose of this chapter is to illustrate how linear regression models with multivariate explanatory variables or response variables can be written in MARSS form and thus fit with the MARSS() function.³ Obviously R has many, many packages for linear regression and generalized linear regression (non-Gaussian errors). While the MARSS package can fit a variety of linear regression models with Gaussian errors, that is not what it is designed to do. The MARSS package is designed for fitting models that cannot be fit with typical linear regression: multivariate autoregressive state-space models with inputs (explanatory variables).

³ With the caveat that one must always be careful when the likelihood surface has prominent ridges, which will occur with collinear explanatory variables.

17 Lag-p models with MARSS

17.1 Background

Most of the chapters in the User Guide are 'lag-1' in the autoregressive part of the model. This means that xt in the process model only depends on xt−1, and not xt−2 (lag-2) or more generally xt−p (lag-p). A lag-p model can be written in state-space form as a MARSS lag-1 model, aka a MARSS(1) model (see section 11.3.2 in Tsay (2010)). Writing lag-p models in this form allows one to take advantage of the fitting algorithms for MARSS(1) models. There are a number of ways to do the conversion to a MARSS(1) form. We use Hamilton's form (section 1 in Hamilton (1994)) because it can be fit with an EM algorithm, while the other forms (Harvey's and Akaike's) cannot. This chapter shows how to convert and fit the following using the MARSS(1) form:

AR(p)    A univariate autoregressive model where xt is a function of xt−p (and usually the prior lags too). No observation error.
MAR(p)   The same as AR(p) but the x term is multivariate, not univariate.
ARSS(p)  The same as AR(p) but with an observation model and observation error. The observations may be multivariate (y can be multivariate) but the x term is univariate.
MARSS(p) The same as ARSS(p) but the x term is multivariate, not univariate.

Note that only ARSS(p) and MARSS(p) assume observation error in the data.
AR(p) and MAR(p) will be rewritten in the state-space form with a y component to facilitate statistical analysis, but the data themselves are considered error-free.

Type RShowDoc("Chapter_MARp.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.

Note there are many R packages for fitting AR(p) (and ARMA(p,q), for that matter) models. If you are only interested in univariate data with no observation error in the data, then you probably want to look into the arima() function included in base R and into R packages that specialize in fitting ARMA models to univariate data. The forecast package in R is a good place to start, but others can be found on the CRAN task view: Time Series Analysis.

17.2 MAR(2) models

A MAR(2) model is a lag-2 MAR model, aka a multivariate autoregressive process with no observation process (no SS part). A MAR(2) model is written

\[
\mathbf{x}'_t = \mathbf{B}_1\mathbf{x}'_{t-1} + \mathbf{B}_2\mathbf{x}'_{t-2} + \mathbf{u}' + \mathbf{w}'_t, \quad \text{where } \mathbf{w}'_t \sim \text{MVN}(0, \mathbf{Q}') \tag{17.1}
\]

We rewrite this as MARSS(1) by defining \(\mathbf{x}_t = \begin{bmatrix}\mathbf{x}'_t\\ \mathbf{x}'_{t-1}\end{bmatrix}\):

\[
\begin{bmatrix}\mathbf{x}'_t\\ \mathbf{x}'_{t-1}\end{bmatrix} =
\begin{bmatrix}\mathbf{B}_1 & \mathbf{B}_2\\ \mathbf{I}_m & 0\end{bmatrix}
\begin{bmatrix}\mathbf{x}'_{t-1}\\ \mathbf{x}'_{t-2}\end{bmatrix} +
\begin{bmatrix}\mathbf{u}'\\ 0\end{bmatrix} +
\begin{bmatrix}\mathbf{w}'_t\\ 0\end{bmatrix}, \quad
\begin{bmatrix}\mathbf{w}'_t\\ 0\end{bmatrix} \sim \text{MVN}\left(0, \begin{bmatrix}\mathbf{Q}' & 0\\ 0 & 0\end{bmatrix}\right), \quad
\begin{bmatrix}\mathbf{x}'_0\\ \mathbf{x}'_{-1}\end{bmatrix} \sim \text{MVN}(\boldsymbol{\pi}, \boldsymbol{\Lambda}) \tag{17.2}
\]

Our observations are of x′t only, so our observation model is

\[
\mathbf{y}_t = \begin{bmatrix}\mathbf{I} & 0\end{bmatrix}\begin{bmatrix}\mathbf{x}'_t\\ \mathbf{x}'_{t-1}\end{bmatrix} \tag{17.3}
\]

17.2.1 Example of AR(2): univariate data

Here is an example of fitting a univariate AR(2) model written in MARSS(1) form. First, let's generate some simulated data from this AR(2) process:

xt = −1.5 xt−1 − 0.75 xt−2 + wt, where wt ∼ N(0, 1)    (17.4)

TT=50
true.2=c(r=0,b1=-1.5,b2=-0.75,q=1)
temp=arima.sim(n=TT,list(ar=true.2[2:3]),sd=sqrt(true.2[4]))
sim.ar2=matrix(temp,nrow=1)

Next, we set up the model list for an AR(2) model written in MARSS(1) form (refer to Equations 17.2 and 17.3):

Z=matrix(c(1,0),1,2)
B=matrix(list("b1",1,"b2",0),2,2)
U=matrix(0,2,1)
Q=matrix(list("q",0,0,0),2,2)
A=matrix(0,1,1)
R=matrix(0,1,1)
pi=matrix(sim.ar2[2:1],2,1)
V=matrix(0,2,2)
model.list.2=list(Z=Z,B=B,U=U,Q=Q,A=A,R=R,x0=pi,V0=V,tinitx=1)

Notice that we do not need to estimate π. We will fit our model to the data (y) starting at t = 2. Because R = 0 (see Equation 17.3), this means x1 = [y2, y1]⊤. If we define π ≡ x1 by setting tinitx=1, then π is known: it is simply the first two data points. We can then estimate the b1 and b2 parameters for the AR(2) process.

ar2=MARSS(sim.ar2[2:TT],model=model.list.2)

Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -63.02523
AIC: 132.0505   AICc: 132.5838

      Estimate
B.b1    -1.582
B.b2    -0.777
Q.q      0.809

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

Comparison to the true values shows the estimates are close:

print(cbind(true=true.2[2:4],estimates=coef(ar2,type="vector")))

    true  estimates
b1 -1.50 -1.5816137
b2 -0.75 -0.7767462
q   1.00  0.8091055

Missing values in the data are fine.
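Before demonstrating that, note that the output above says standard errors are not computed by default. A minimal sketch of attaching approximate confidence intervals to the AR(2) estimates with the MARSSparamCIs() function mentioned in that output (by default these are Hessian-based, assuming the Hessian is well behaved for this model):

# approximate CIs for the AR(2) parameter estimates
ar2.CIs = MARSSparamCIs(ar2)
print(ar2.CIs)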
Let's make half the data missing, being careful that the first data point does not get categorized as missing:

gappy.data=sim.ar2[2:TT]
gappy.data[floor(runif(TT/2,2,TT))]=NA
ar2.gappy=MARSS(gappy.data,model=model.list.2)

And the estimates are still close:

print(cbind(true=true.2[2:4],
   estimates.no.miss=coef(ar2,type="vector"),
   estimates.w.miss=coef(ar2.gappy,type="vector")))

    true estimates.no.miss estimates.w.miss
b1 -1.50        -1.5816137       -1.6553403
b2 -0.75        -0.7767462       -0.8578665
q   1.00         0.8091055        0.6492250

By the way, there are much easier and faster functions in R for fitting univariate AR models (no observation error). For example, here is how you would fit the AR(2) model using the base arima() function:

arima(gappy.data,order=c(2,0,0),include.mean=FALSE)

Call:
arima(x = gappy.data, order = c(2, 0, 0), include.mean = FALSE)

Coefficients:
          ar1      ar2
      -1.6651  -0.8601
s.e.   0.0746   0.0743

sigma^2 estimated as 0.6539:  log likelihood = -48.05,  aic = 102.1

The advantage of using the MARSS package really only comes in when you are fitting to multivariate data or data with observation error.

17.2.2 Example of MAR(2): multivariate data

Here we show an example of fitting a MAR(2) model. Let's make some simulated data of two realizations of the same AR(2) process:

TT=50
true.2=c(r=0,b1=-1.5,b2=-0.75,q=1)
temp1=arima.sim(n=TT,list(ar=true.2[c("b1","b2")]),sd=sqrt(true.2["q"]))
temp2=arima.sim(n=TT,list(ar=true.2[c("b1","b2")]),sd=sqrt(true.2["q"]))
sim.mar2=rbind(temp1,temp2)

Although these are independent time series, we want to fit with a MAR(2) model to allow us to use both datasets together to estimate the AR(2) parameters. We need to set up the model list for the multivariate model (Equations 17.2 and 17.3):

Z=matrix(c(1,0,0,1,0,0,0,0),2,4)
B1=matrix(list(0),2,2); diag(B1)="b1"
B2=matrix(list(0),2,2); diag(B2)="b2"
B=matrix(list(0),4,4)
B[1:2,1:2]=B1; B[1:2,3:4]=B2; B[3:4,1:2]=diag(1,2)
U=matrix(0,4,1)
Q=matrix(list(0),4,4)
Q[1,1]="q"; Q[2,2]="q"
A=matrix(0,2,1)
R=matrix(0,2,2)
pi=matrix(c(sim.mar2[,2],sim.mar2[,1]),4,1)
V=matrix(0,4,4)
model.list.2m=list(Z=Z,B=B,U=U,Q=Q,A=A,R=R,x0=pi,V0=V,tinitx=1)

Notice the form of the Z matrix:

     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    0

It is a 2 × 2 identity matrix followed by a 2 × 2 all-zero matrix. The B matrix is composed of B1 and B2, which are diagonal matrices with b1 and b2, respectively, on the diagonal:

     [,1] [,2] [,3] [,4]
[1,] "b1" 0    "b2" 0
[2,] 0    "b1" 0    "b2"
[3,] 1    0    0    0
[4,] 0    1    0    0

We fit the model as usual:

mar2=MARSS(sim.mar2[,2:TT],model=model.list.2m)

Then we can compare how using two time series improves the fit versus using only one alone:

model.list.2$x0=matrix(sim.mar2[1,2:1],2,1)
mar2a=MARSS(sim.mar2[1,2:TT],model=model.list.2)
model.list.2$x0=matrix(sim.mar2[2,2:1],2,1)
mar2b=MARSS(sim.mar2[2,2:TT],model=model.list.2)

    true   est.mar2  est.mar2a  est.mar2b
b1 -1.50 -1.4546302 -1.3192188 -1.5560202
b2 -0.75 -0.8176845 -0.7514445 -0.8766648
q   1.00  0.7736720  0.7922118  0.6803098

17.3 MAR(p) models

A MAR(p) model is similar to a MAR(2) except it has lags up to time p:

\[
\mathbf{x}'_t = \mathbf{B}_1\mathbf{x}'_{t-1} + \mathbf{B}_2\mathbf{x}'_{t-2} + \dots + \mathbf{B}_p\mathbf{x}'_{t-p} + \mathbf{u}' + \mathbf{w}'_t, \quad \text{where } \mathbf{w}'_t \sim \text{MVN}(0, \mathbf{Q}')
\]

where, in the MARSS(1) form,

\[
\mathbf{x}_t = \begin{bmatrix}\mathbf{x}'_t\\ \mathbf{x}'_{t-1}\\ \vdots\\ \mathbf{x}'_{t-p+1}\end{bmatrix}, \quad
\mathbf{B} = \begin{bmatrix}\mathbf{B}_1 & \mathbf{B}_2 & \cdots & \mathbf{B}_{p-1} & \mathbf{B}_p\\ \mathbf{I}_m & 0 & \cdots & 0 & 0\\ 0 & \mathbf{I}_m & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & \mathbf{I}_m & 0\end{bmatrix}, \quad
\mathbf{u} = \begin{bmatrix}\mathbf{u}'\\ 0\\ \vdots\\ 0\end{bmatrix}, \quad
\mathbf{Q} = \begin{bmatrix}\mathbf{Q}' & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & 0\end{bmatrix} \tag{17.5}
\]

Here's an example of fitting a univariate AR(3) in MARSS(1) form. We need more data to estimate an AR(3), so use 100 time steps.
TT=100
true.3=c(r=0,b1=-1.5,b2=-0.75,b3=.05,q=1)
temp3=arima.sim(n=TT,list(ar=true.3[c("b1","b2","b3")]),
   sd=sqrt(true.3["q"]))
sim.ar3=matrix(temp3,nrow=1)

We set up the model list for the AR(3) in MARSS(1) form as follows:

Z=matrix(c(1,0,0),1,3)
B=matrix(list("b1",1,0,"b2",0,1,"b3",0,0),3,3)
U=matrix(0,3,1)
Q=matrix(list(0),3,3); Q[1,1]="q"
A=matrix(0,1,1)
R=matrix(0,1,1)
pi=matrix(sim.ar3[3:1],3,1)
V=matrix(0,3,3)
model.list.3=list(Z=Z,B=B,U=U,Q=Q,A=A,R=R,x0=pi,V0=V,tinitx=1)

and fit as normal:

ar3=MARSS(sim.ar3[3:TT],model=model.list.3)

The estimates are:

print(cbind(true=true.3[c("b1","b2","b3","q")],
   estimates.no.miss=coef(ar3,type="vector")))

    true estimates.no.miss
b1 -1.50        -1.5130316
b2 -0.75        -0.6755283
b3  0.05         0.1368458
q   1.00         1.1267684

17.4 MARSS(p): models with observation error

We can easily fit MAR(p) processes observed with error using MARSS(p) models, but the difficulty is specifying the initial state condition. π ≡ x1 and thus involves x1, x0, . . . . However, we do not know the variance-covariance structure for these consecutive x. Specifying Λ = 0 and estimating π often causes the EM algorithm to run into numerical problems. But if we have an abundance of data, fixing π might not overly affect the B and Q estimates. Here is an example where we set π to the mean of the data and set Λ to zero.

Why not set Λ equal to a diagonal matrix with large values on the diagonal to approximate a vague prior? The temporally consecutive initial states are definitely not independent. A diagonal matrix would imply independence, which would conflict with the process model and mean our model is fundamentally inconsistent with the data (and that usually has bad consequences for estimation).

Create some simulated data:

TT=1000 #set long
true.2ss=c(r=.5,b1=-1.5,b2=-0.75,q=.1)
temp=arima.sim(n=TT,list(ar=true.2ss[c("b1","b2")]),
   sd=sqrt(true.2ss["q"]))
sim.ar=matrix(temp,nrow=1)
noise=rnorm(TT-1,0,sqrt(true.2ss["r"]))
noisy.data=sim.ar[2:TT]+noise

Set up the model list for the model in MARSS(1) form:

Z=matrix(c(1,0),1,2)
B=matrix(list("b1",1,"b2",0),2,2)
U=matrix(0,2,1)
Q=matrix(list("q",0,0,0),2,2)
A=matrix(0,1,1)
R=matrix("r")
V=matrix(0,2,2)
pi=matrix(mean(noisy.data),2,1)
model.list.2ss=list(Z=Z,B=B,U=U,Q=Q,A=A,R=R,x0=pi,V0=V,tinitx=0)

Fit as usual:

ar2ss=MARSS(noisy.data,model=model.list.2ss)

Success! abstol and log-log tests passed at 101 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 101 iterations.
Log-likelihood: -1368.796
AIC: 2745.592   AICc: 2745.632

      Estimate
R.r      0.477
B.b1    -1.414
B.b2    -0.685
Q.q      0.140

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
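A quick way to see how much of the variation the model attributes to observation error is to plot the smoothed estimate of the AR(2) process over the noisy data. This is a minimal sketch, assuming the ar2ss fit above; recall that the states matrix is 2 × T and its first row is the AR(2) process itself:

# smoothed estimate of the AR(2) process over the noisy observations
plot(noisy.data, type="p", pch=16, col="grey")
lines(ar2ss$states[1,], lwd=2)

The smoothed state should track the data but be visibly less jagged, since roughly half of the total variance (r = 0.477 versus q = 0.140 above) is attributed to observation error.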
We can compare the results to modeling the data as if there is no observation error, and we see that this assumption leads to poor B estimates:

model.list.2ss.bad=model.list.2ss
#set R to zero in this model
model.list.2ss.bad$R=matrix(0)

Fit using the model with R set to 0:

ar2ss2=MARSS(noisy.data,model=model.list.2ss.bad)

Compare results:

print(cbind(true=true.2ss,
   model.no.error=c(NA,coef(ar2ss2,type="vector")),
   model.w.error=coef(ar2ss,type="vector")))

    true model.no.error model.w.error
r   0.50             NA     0.4772368
b1 -1.50    -0.52826082    -1.4136279
b2 -0.75     0.03372857    -0.6853180
q   0.10     0.95834464     0.1404334

The middle column shows the estimates assuming that the data have no observation error, and the right column shows our estimates with the observation error estimated. Clearly, assuming no observation error when it is present has negative consequences for the B and Q estimates.

By the way, there is a straightforward way to deal with the measurement error if you are working with univariate ARMA models and you are only interested in the AR parameters (the b's). Inclusion of measurement error leads to additional MA components up to lag p (Staudenmayer and Buonaccorsi, 2005). This means that if you are fitting an AR(p) model with measurement error, you can fit an ARMA(p,p) and the measurement error will be absorbed in the p MA components. For the example above, we could estimate the AR parameters for our AR(2) data with measurement error by fitting an ARMA(p,p) model. Here's how we could do that using R's arima() function:

arima(noisy.data,order=c(2,0,2),include.mean=FALSE)

Call:
arima(x = noisy.data, order = c(2, 0, 2), include.mean = FALSE)

Coefficients:
          ar1      ar2     ma1     ma2
      -1.4448  -0.6961  0.9504  0.3428
s.e.   0.0593   0.0427  0.0686  0.0482

sigma^2 estimated as 0.9069:  log likelihood = -1368.99,  aic = 2747.99

Accounting for the measurement error definitely improves the estimates for the AR component.

17.5 Discussion

Although both MARSS(1) and ARMA(p,p) approaches can be used to deal with AR(p) processes (univariate data) observed with error, our simulations suggest that the MARSS(1) approach is less biased and more precise (Figure 17.1) and that the EM algorithm works better for this problem. The performance of different approaches depends greatly on the underlying model. We chose AR parameters where both ARMA(p,p) and MARSS(1) approaches work. If we used, for example, b1 = 0.8 and b2 = −0.2, the ARMA(2,2) gives b1 estimates close to 0 (i.e., wrong), while the MARSS(1) EM approach gives estimates close to the truth (though rather variable). One would also want to check REML approaches for fitting the ARMA(p,p) models, since REML has been found to be less biased than ML estimation for this class (Cheang and Reinsel, 2000; Ives et al., 2010). Ives et al. (2010) provide R code for REML estimation of ARMA(p,q) models in their appendix.

For multivariate data observed with error, especially multivariate data without a one-to-one relationship to the underlying autoregressive process, an explicit MARSS model will need to be used rather than an ARMA(p,p) model. The time steps required for good parameter estimates are likely to be large; in our simulations, we used 100 for an AR(3) and 1000 for an ARSS(2). Thorough simulation testing should be conducted to determine if the data available are sufficient to allow estimation of the B terms at multiple lags.
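As a rough illustration of what such simulation testing might look like, here is a minimal sketch of a replicate loop for the ARSS(2) case, reusing model.list.2ss from above. The number of replicates and the parameter values are illustrative assumptions, not the settings used for Figure 17.1, and the loop is slow since each replicate is a full EM fit:

# sketch: can we recover b1 from ARSS(2) data of this length?
nsim = 20; TT = 1000
b1.hat = rep(NA, nsim)
for(i in 1:nsim){
  x = arima.sim(n=TT, list(ar=c(-1.5,-0.75)), sd=sqrt(0.1))
  y = as.vector(x)[2:TT] + rnorm(TT-1, 0, sqrt(0.5))
  mod = model.list.2ss
  mod$x0 = matrix(mean(y), 2, 1)  # fix initial state at this replicate's data mean
  fit = MARSS(y, model=mod, silent=TRUE)
  b1.hat[i] = coef(fit)$B[1]      # first estimated B element is b1
}
summary(b1.hat)  # compare the spread of estimates to the true b1 = -1.5

A wide spread, or a mean far from −1.5, would suggest that time series of this length are not sufficient for estimating the lag-2 B terms.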
Fig. 17.1. Comparison of the AR parameter estimates (b1, left panel; b2, right panel) using different approaches to model ARSS(2) data (univariate AR(2) data observed with error). Results are from 200 simulations of AR(2) data with 100 time steps. Results are shown for the b1 and b2 parameters of the AR process fit with 1) an AR(2) model with no correction for measurement error, 2) a MARSS(1) model fit via EM optimization, 3) a MARSS(1) model fit via BFGS optimization, 4) an ARMA(2,2) model fit with the R arima function, and 5) an AR(2) model fit using 2nd differencing with the R arima function. The "x" shows the mean of the simulations and the bar in the boxplot is the median. The true values are shown with the dashed horizontal line. The σ² for the AR process was 0.1 and the σ² for the measurement error was 0.5. The b1 parameter was −1.5 and b2 was −0.75.

A Textbooks and articles that use MARSS modeling for population modeling

Textbooks Describing the Estimation of Process and Non-process Variance

There are many textbooks on Kalman filtering and estimation of state-space models. The following are a sample of books on state-space modeling that we have found especially helpful.

Shumway, R. H., and D. S. Stoffer. 2006. Time series analysis and its applications. Springer-Verlag.
Harvey, A. C. 1989. Forecasting, structural time series models and the Kalman filter. Cambridge University Press.
Durbin, J., and S. J. Koopman. 2001. Time series analysis by state space methods. Oxford University Press.
Kim, C. J. and Nelson, C. R. 1999. State space models with regime switching. MIT Press.
King, R., G. Olivier, B. Morgan, and S. Brooks. 2009. Bayesian analysis for population ecology. CRC Press.
Giovanni, P., S. Petrone, and P. Campagnoli. 2009. Dynamic linear models in R. Springer-Verlag.
Pole, A., M. West, and J. Harrison. 1994. Applied Bayesian forecasting and time series analysis. Chapman and Hall.
Bolker, B. 2008. Ecological models and data in R. Princeton University Press.
West, M. and Harrison, J. 1997. Bayesian forecasting and dynamic models. Springer-Verlag.
Tsay, R. S. 2010. Analysis of financial time series. Wiley.

Maximum-likelihood papers

This is just a sample of the papers from the population modeling literature.

de Valpine, P. 2002. Review of methods for fitting time-series models with process and observation error and likelihood calculations for nonlinear, non-Gaussian state-space models. Bulletin of Marine Science 70:455-471.
de Valpine, P. and A. Hastings. 2002. Fitting population models incorporating process noise and observation error. Ecological Monographs 72:57-76.
de Valpine, P. 2003. Better inferences from population-dynamics experiments using Monte Carlo state-space likelihood methods. Ecology 84:3064-3077.
de Valpine, P. and R. Hilborn. 2005. State-space likelihoods for nonlinear fisheries time series. Canadian Journal of Fisheries and Aquatic Sciences 62:1937-1952.
Dennis, B., J.M. Ponciano, S.R. Lele, M.L. Taper, and D.F. Staples. 2006. Estimating density dependence, process noise, and observation error. Ecological Monographs 76:323-341.
Ellner, S.P. and E.E. Holmes. 2008. Resolving the debate on when extinction risk is predictable. Ecology Letters 11:E1-E5.
Erzini, K. 2005.
Trends in NE Atlantic landings (southern Portugal): identifying the relative importance of fisheries and environmental variables. Fisheries Oceanography 14:195-209.
Erzini, K., Inejih, C. A. O., and K. A. Stobberup. 2005. An application of two techniques for the analysis of short, multivariate non-stationary time-series of Mauritanian trawl survey data. ICES Journal of Marine Science 62:353-359.
Hinrichsen, R.A. and E.E. Holmes. 2009. Using multivariate state-space models to study spatial structure and dynamics. In Spatial Ecology (editors Robert Stephen Cantrell, Chris Cosner, Shigui Ruan). CRC/Chapman Hall.
Hinrichsen, R.A. 2009. Population viability analysis for several populations using multivariate state-space models. Ecological Modelling 220:1197-1202.
Holmes, E.E. 2001. Estimating risks in declining populations with poor data. Proceedings of the National Academy of Sciences of the United States of America 98:5072-5077.
Holmes, E.E. and W.F. Fagan. 2002. Validating population viability analysis for corrupted data sets. Ecology 83:2379-2386.
Holmes, E.E. 2004. Beyond theory to application and evaluation: diffusion approximations for population viability analysis. Ecological Applications 14:1272-1293.
Holmes, E.E., W.F. Fagan, J.J. Rango, A. Folarin, S.J.A., J.E. Lippe, and N.E. McIntyre. 2005. Cross validation of quasi-extinction risks from real time series: an examination of diffusion approximation methods. U.S. Department of Commerce, NOAA Tech. Memo. NMFS-NWFSC-67, Washington, DC.
Holmes, E.E., J.L. Sabo, S.V. Viscido, and W.F. Fagan. 2007. A statistical approach to quasi-extinction forecasting. Ecology Letters 10:1182-1198.
Kalman, R.E. 1960. A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82:35-45.
Lele, S.R. 2006. Sampling variability and estimates of density dependence: a composite likelihood approach. Ecology 87:189-202.
Lele, S.R., B. Dennis, and F. Lutscher. 2007. Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecology Letters 10:551-563.
Lindley, S.T. 2003. Estimation of population growth and extinction parameters from noisy data. Ecological Applications 13:806-813.
Ponciano, J.M., M.L. Taper, B. Dennis, and S.R. Lele. 2009. Hierarchical models in ecology: confidence intervals, hypothesis testing, and model selection using data cloning. Ecology 90:356-362.
Staples, D.F., M.L. Taper, and B. Dennis. 2004. Estimating population trend and process variation for PVA in the presence of sampling error. Ecology 85:923-929.
Zuur, A. F., and G. J. Pierce. 2004. Common trends in Northeast Atlantic squid time series. Journal of Sea Research 52:57-72.
Zuur, A. F., I. D. Tuck, and N. Bailey. 2003. Dynamic factor analysis to estimate common trends in fisheries time series. Canadian Journal of Fisheries and Aquatic Sciences 60:542-552.
Zuur, A. F., R. J. Fryer, I. T. Jolliffe, R. Dekker, and J. J. Beukema. 2003. Estimating common trends in multivariate time series using dynamic factor analysis. Environmetrics 14:665-685.

Bayesian papers

This is a sample of the papers from the population modeling and animal tracking literature.

Buckland, S.T., K.B. Newman, L. Thomas, and N.B. Koestersa. 2004. State-space models for the dynamics of wild animal populations. Ecological Modelling 171:157-175.
Calder, C., M. Lavine, P. Müller, and J.S. Clark. 2003.
Incorporating multiple sources of stochasticity into dynamic population models. Ecology 84:1395-1402.
Chaloupka, M. and G. Balazs. 2007. Using Bayesian state-space modelling to assess the recovery and harvest potential of the Hawaiian green sea turtle stock. Ecological Modelling 205:93-109.
Clark, J.S. and O.N. Bjørnstad. 2004. Population time series: process variability, observation errors, missing values, lags, and hidden states. Ecology 85:3140-3150.
Jonsen, I.D., R.A. Myers, and J.M. Flemming. 2003. Meta-analysis of animal movement using state space models. Ecology 84:3055-3063.
Jonsen, I.D., J.M. Flemming, and R.A. Myers. 2005. Robust state-space modeling of animal movement data. Ecology 86:2874-2880.
Meyer, R. and R.B. Millar. 1999. BUGS in Bayesian stock assessments. Can. J. Fish. Aquat. Sci. 56:1078-1087.
Meyer, R. and R.B. Millar. 1999. Bayesian stock assessment using a state-space implementation of the delay difference model. Can. J. Fish. Aquat. Sci. 56:37-52.
Meyer, R. and R.B. Millar. 2000. Bayesian state-space modeling of age-structured data: fitting a model is just the beginning. Can. J. Fish. Aquat. Sci. 57:43-50.
Newman, K.B., S.T. Buckland, S.T. Lindley, L. Thomas, and C. Fernández. 2006. Hidden process models for animal population dynamics. Ecological Applications 16:74-86.
Newman, K.B., C. Fernández, L. Thomas, and S.T. Buckland. 2009. Monte Carlo inference for state-space models of wild animal populations. Biometrics 65:572-583.
Rivot, E., E. Prévost, E. Parent, and J.L. Baglinière. 2004. A Bayesian state-space modelling framework for fitting a salmon stage-structured population dynamic model to multiple time series of field data. Ecological Modelling 179:463-485.
Schnute, J.T. 1994. A general framework for developing sequential fisheries models. Canadian Journal of Fisheries and Aquatic Sciences 51:1676-1688.
Swain, D.P., I.D. Jonsen, J.E. Simon, and R.A. Myers. 2009. Assessing threats to species at risk using stage-structured state-space models: mortality trends in skate populations. Ecological Applications 19:1347-1364.
Thogmartin, W.E., J.R. Sauer, and M.G. Knutson. 2004. A hierarchical spatial model of avian abundance with application to cerulean warblers. Ecological Applications 14:1766-1779.
Trenkel, V.M., D.A. Elston, and S.T. Buckland. 2000. Fitting population dynamics models to count and cull data using sequential importance sampling. J. Am. Stat. Assoc. 95:363-374.
Viljugrein, H., N.C. Stenseth, G.W. Smith, and G.H. Steinbakk. 2005. Density dependence in North American ducks. Ecology 86:245-254.
Ward, E.J., R. Hilborn, R.G. Towell, and L. Gerber. 2007. A state-space mixture approach for estimating catastrophic events in time series data. Can. J. Fish. Aquat. Sci. 64:899-910.
Wikle, C.K., L.M. Berliner, and N. Cressie. 1998. Hierarchical Bayesian space-time models. Journal of Environmental and Ecological Statistics 5:117-154.
Wikle, C.K. 2003. Hierarchical Bayesian models for predicting the spread of ecological processes. Ecology 84:1382-1394.

B Package MARSS: Warnings and errors

The following are brief descriptions of the warning and error messages you may see and what they mean (or might mean).

B update is outside the unit circle

If you are estimating B, then if the absolute values of all the eigenvalues of B are less than 1, the system is stationary (meaning the X's have some multivariate distribution that does not change over time).
In this case, we say that B is within the unit circle. A pure univariate random walk, for example, has B = 1 and is not stationary; the distribution of X for the pure random walk has a variance that increases with time. If, on the other hand, |B| < 1, you have an Ornstein-Uhlenbeck process, which is stationary with a stationary variance of Q/(1 − B²) (note B is a scalar here because in this example X is univariate). If any of the eigenvalues (real part) are greater than 1, then the system will "explode": it rapidly diverges.

In the EM algorithm, there is nothing to force B to be on or within the unit circle (real part of the eigenvalues less than or equal to 1). It is possible that at one of the EM iterations the B update will fall outside the unit circle. The problem is that if you get too far outside the unit circle, the algorithm becomes numerically unstable, since small errors are magnified by the "explosive" B term. If you see the 'B outside the unit circle' warning, it is fine as long as it is temporary and the log-likelihood does not start decreasing (you will see a separate warning if that happens).

If you do see B outside the unit circle and the log-likelihood decreases, then it probably means that you have poorly specified the model somehow. An easy way to do this is to poorly specify the initial conditions, π and Λ. If, say, you try to specify a vague prior on x0 (or x1) with π equal to zero and Λ equal to a diagonal matrix with a large variance on the diagonal, you will likely run into trouble if B has off-diagonal terms. The reason is that by specifying that Λ is diagonal, you specified that the individual X's in X0 are independent, yet if B has off-diagonal terms, the stationary distribution of X1 is NOT independent. If you force the diagonal terms on Λ to be big enough, you can force the maximum-likelihood estimate of B to be outside the unit circle, since this is the only way to account for X0 being independent while X1 is highly correlated. The problem is that you will not know the stationary distribution of the X's (from which X0 was presumably drawn) without knowing the parameters you are trying to estimate.

One approach is to estimate both π and Λ by setting x0="unconstrained" and V0="unconstrained" in the model specification. Estimating both π and Λ cannot be done robustly for all MARSS models, and in general, one probably wants to specify the model in such a way as to fix one or both of these. Another, more robust, approach is to treat x1 as fixed but unknown (instead of x0). You do this by setting model$tinitx=1, so that π refers to t = 1, not t = 0. Then estimate π and fix Λ = 0. This eliminates Λ from the model and often eliminates the problems with prior specification, at the expense of m more parameters. Note, when you set Λ = 0, Λ is truly eliminated from the model by MARSS; the likelihood function is different, so do not expect Λ = 0 and Λ ≈ 0 to have the same likelihood under all conditions.

Warning! Reached maxit before parameters converged

The maximum number of EM iterations is set by control$maxit. If you get this warning, it means that one of the parameters or the log-likelihood had not yet reached the convergence stopping criteria before maxit was reached. There are many situations where you might want to set control$maxit lower than the value needed to reach convergence.
For example, if you are using the EM algorithm to produce initial values for a different algorithm (like a Bayesian MCMC algorithm or a Newton method), then you can set maxit low, say 20 or 50.

Stopped at iter=xx in MARSSkem() because numerical errors were generated in MARSSkf

This means the Kalman filter/smoother algorithm became unstable and most likely one of the variances became ill-conditioned. When that happens, the inverses of those matrices are poor, and you will start to get negative values on the diagonals of your variance-covariance matrices. Once that happens, the inverse of that variance-covariance matrix produces an error. If you get this error, turn on tracing with control$trace=1. This will store the error messages so you can see what is going on. It may be that you have specified the model in such a way that some of the variances are being forced very close to 0, which makes the variance-covariance matrix ill-conditioned. The output from the MARSS call will be the parameter values just before the error occurred.

Warning: the xyz parameter value has not converged

The algorithm checks whether the log-likelihood and each individual parameter have converged. If a parameter has not converged, you can try upping control$maxit and see if it converges. If you set maxit high but the parameter is still not converging, then it suggests that one of the variance parameters is so small that the EM update steps for that parameter are tiny. For example, as Q goes to zero, the update steps for u go to zero. As Λ goes to zero, the update steps for π go to zero. The first thing to do is to reflect on whether you are inadvertently specifying the model in such a way that one of the variances is forced to zero. For example, if the total variance in X is 0.1 and you fix R = 0.2, then Q must go to zero. The second thing to do is to try using a Newton algorithm, using your last EM values as the initial conditions for the Newton algorithm. The initial values are set using the inits argument for the MARSS() function.

MARSSkem: The soln became unstable and logLik DROPPED

This is a more serious error, as in the EM algorithm the log-likelihood should never drop. The first thing to do is check whether you have inadvertently specified a bizarre model or data. Plot the data you are trying to fit. Often, this error arises when a user has inadvertently scrambled their data order during a demeaning or variance-standardization step. Second, check the model you are trying to fit. Use test=MARSS(data, model=xyz, fit=FALSE) and then summary(test$model). This shows you what MARSS() thinks your model is. You may be trying to fit an illogical model.

If those checks look good, then pass control$trace=1 into the MARSS() call. This will report a fuller set of warnings. Look for the error "B is outside the unit circle". If it appears, you are probably specifying a strange B matrix. Are you forcing the B matrix to be outside the unit circle (eigenvalues > 1)? If so, you need to rethink your B matrix constraints. If you do not see that error, look at test$iter.record$logLik. If the log-likelihood is steadily dropping (at each iteration) or drops by large amounts (much larger than the machine precision), that is bad and means that the EM algorithm did not work. If, however, the log-likelihood is just fluctuating by small amounts about some steady value, that is ok: it means that the values converged but the parameters are such that there are slight numerical fluctuations.
MARSSkem: The soln became unstable and logLik DROPPED

This is a more serious error: in the EM algorithm, the log-likelihood should never drop. The first thing to do is to check whether you have inadvertently specified a bizarre model or data. Plot the data you are trying to fit. Often this error arises when a user has inadvertently scrambled their data order during a demeaning or variance-standardization step. Second, check the model you are trying to fit. Use test=MARSS(data, model=xyz, fit=FALSE) and then summary(test$model). This shows you what MARSS() thinks your model is; you may be trying to fit an illogical model. If those checks look good, pass control$trace=1 into the MARSS() call. This will report a fuller set of warnings. Look for the error "B is outside the unit circle". If it appears, you are probably specifying a strange B matrix. Are you forcing the B matrix to be outside the unit circle (eigenvalues with modulus greater than 1)? If so, you need to rethink your B matrix constraints. If you do not see that error, look at test$iter.record$logLik. If the log-likelihood is steadily dropping at each iteration, or drops by large amounts (much larger than the machine precision), that is bad and means that the EM algorithm did not work. If, however, the log-likelihood is just fluctuating by small amounts about some steady value, that is okay: the values converged, but the parameters are such that there are slight numerical fluctuations. Try passing control$safe=TRUE in the MARSS() call. This can sometimes help, as it inserts a call to the Kalman filter after each individual parameter update.

Stopped at iter=xx in MARSSkem: solution became unstable. R (or Q) update is not positive definite

First check whether you have specified an illegally constrained variance-covariance matrix. For example, if the variances (diagonal) are constrained to be equal, you cannot specify the covariances (off-diagonals) as unequal; and if you specify that some of the covariances are equal, you cannot specify the variances as all unequal. These are illegal constraints on a variance-covariance matrix from a statistical perspective (nothing to do with MARSS specifically). The error could also be due to numerical instability as B leaves the unit circle, or because one of the variance matrices becomes ill-conditioned. Try turning on tracing with control$trace=1 and safe mode with control$safe=TRUE. This will print the error warnings at each parameter update step. Then consider whether you have inadvertently specified the model in such a way as to force this behavior in the B parameter.

You might also get this error if you inadvertently specified an improper structure for R or Q. For example, if you used R=diag(c(1,1,"r")) with the intent of specifying a diagonal matrix with fixed variance 1 at R[1,1] and R[2,2] and an estimated R[3,3], you would actually have specified a character matrix with "0" on the off-diagonals and c("1","1","r") on the diagonal. MARSS() interprets all elements in quotes as names of parameters to be estimated, so it will estimate one off-diagonal covariance and two diagonal variances. That happens to put illegal constraints on the estimation of a variance-covariance matrix, again having nothing to do with MARSS() per se but with estimation of variance-covariance matrices in general.
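To mix fixed and estimated elements, the MARSS conventions call for a list matrix, in which numeric entries are fixed values and character entries name parameters to be estimated. A minimal sketch of the R matrix intended above (dat is a placeholder for a 3 x T data matrix):

  # 3 x 3 diagonal R: fixed 1's at R[1,1] and R[2,2], one estimated variance "r" at R[3,3]
  R <- matrix(list(1, 0, 0, 0, 1, 0, 0, 0, "r"), 3, 3)
  fit <- MARSS(dat, model = list(R = R))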
iter=xx MARSSkf: logLik computation is becoming unstable. Condition num. of Sigma[t=1] = Inf and of R = Inf.

This generally means that V0 is very small (say, 0) and R is also very small and close to zero.

Warning: setting diagonal to 0 blocked at iter=X. logLik was lower in attempt to set 0 diagonals on X

This is a warning, not an error. What is happening is that one of the variances (in Q or R) is getting small, and the EM algorithm is attempting to set the value to 0 (because control$allow.degen=TRUE). But when it tried, the new likelihood with the variance equal to 0 was lower, so the variance was not set to 0. A model with a minuscule variance and a model with the same variance equal to 0 are not the same model: in the first, a stochastic process with small variance exists, while in the second, the analogous process is deterministic. In the first case, you can get a situation where a likelihood term of the form L(x | mean=mu, sigma=0) appears, and that term is infinite when x=mu. So in the model with minuscule variance, you will get very large likelihood values as the variance gets smaller and smaller. In the analogous model with that variance set to 0, that likelihood term does not appear, so the likelihood does not go to infinity. This is neither an error nor pathological behavior; the models are fundamentally different. Nonetheless, it poses a dilemma when you want to choose the best model based on maximum likelihood, since the model with minuscule variance will have infinite likelihood but the same behavior as the one with variance 0.

In our experience, this dilemma arises when one has a lot of missing data near the beginning of the time series, and it is affected by how you specify the prior on the initial state. Try setting the prior at t = 0 versus t = 1. Try using a diffuse prior. You absolutely want to compare estimates from the BFGS and EM algorithms in this case, because the two algorithms differ in their ability to find the maximum in this strange situation. Neither is uniformly better or worse; which one does better seems to depend on which variance (Q or R) is going to zero.

Warning: kf returned error at iter=X in attempt to set 0 diagonals for X

This is a warning that the EM algorithm tried to set one of the diagonal elements of X to 0, because allow.degen is TRUE and that element is going to zero, but when this was tried, the Kalman filter returned an error. Typically this happens when both an R element and a Q element are trying to go to 0. If the maximum-likelihood estimate is that both R and Q are zero, it probably means that your MARSS model is not a very good description of the data.

Warning: At iter=X attempt to set 0 diagonals for R blocked for elements where corresponding rows of A or Z are not fixed.

You have control$allow.degen=TRUE, and one of the R diagonal elements is getting very small. MARSS attempts to set these R elements to 0, but if row i of R is 0, then the corresponding row i of a and Z must be fixed. This restriction applies to the EM algorithm. The BFGS algorithm might work, or it might produce garbage without telling you. Be suspicious whenever the EM and BFGS behaviors differ; that is a good sign that something is wrong with how your model describes the data. It is not a problem with the algorithms per se; rather, for certain pathological models, the algorithms behave differently from each other.

Stopped at iter=X in MARSSkem. XYZ is not invertible.

There are a series of checks in MARSS that test whether matrix inversions are possible before doing the inversion. These errors crop up most often when Q or R is getting very small. At some point, they can get so small that the inversions become unstable. If this error is given, the output will be the last parameter estimates before the error. Try setting control$allow.degen=FALSE. Sometimes the error occurs when a diagonal element of Q or R is being set to 0. You will also have to set control$maxit to something smaller, because otherwise the EM algorithm will not stop: the problematic diagonal element will walk slowly and inexorably toward 0.
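A minimal sketch of that remedy (dat is a placeholder for your data matrix, and 100 is an arbitrary iteration cap):

  # block the automatic setting of small variances to 0 and cap the iterations,
  # since the problematic diagonal element would otherwise creep toward 0 indefinitely
  fit <- MARSS(dat, control = list(allow.degen = FALSE, maxit = 100))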