User Guide

E. E. Holmes, E. J. Ward, and M. D. Scheuerell

Analysis of multivariate time-series using the MARSS package
version 3.10.1
May 30, 2014

Northwest Fisheries Science Center, NOAA
Seattle, WA, USA

Holmes, E. E., E. J. Ward and M. D. Scheuerell. Analysis of multivariate
time-series using the MARSS package. NOAA Fisheries, Northwest Fisheries
Science Center, 2725 Montlake Blvd E., Seattle, WA 98112. Contacts
eli.holmes@noaa.gov, eric.ward@noaa.gov, and mark.scheuerell@noaa.gov
Disclaimer: E. E. Holmes, E. J. Ward, and M. D. Scheuerell are NOAA
scientists employed by the U.S. National Marine Fisheries Service. The views
and opinions presented here are solely those of the authors and do not
necessarily represent those of our employer.


Preface
The initial motivation for our work with MARSS models was a collaboration
with Rich Hinrichsen. Rich developed a framework for analysis of multi-site
population count data using MARSS models and bootstrap AICb (Hinrichsen and Holmes, 2009). Our work (EEH and EJW) extended Rich’s framework, made it more general, and led to the development of a parametric bootstrap AICb for MARSS models, which allows one to do model-selection using
datasets with missing values (Ward et al., 2010; Holmes and Ward, 2010).
Later, we developed additional algorithms for simulation and confidence intervals. Discussions with Mark Scheuerell led to an extensive revision of the
EM algorithm and to the development of a general EM algorithm for constrained MARSS models (Holmes, 2012). Discussions with Mark also led to a
complete rewrite of the model specification so that the package could be used
for MARSS models in general—rather than simply the form of MARSS model
used in our applications. Many collaborators have helped test the package; we
thank especially Yasmin Lucero, Kevin See, and Brice Semmens. Development
of the code into an R package would not have been possible without Kellie
Wills, who wrote much of the original package code outside of the algorithm
functions. Finally, we thank the participants of our MARSS workshops and
courses and the MARSS users who have contacted us regarding issues that
were unclear in the manual, errors, or suggestions regarding new applications.
Discussions with these users have helped us improve the manual and go in
new directions.
The application chapters were developed originally as part of workshops
on analysis of multivariate time-series data given at the Ecological Society of
America meetings since 2005 and taught by us along with Yasmin Lucero,
Stephanie Hampton, and Brice Semmens. The chapter on extinction estimation and trend estimation was initially developed by Brice Semmens and later
extended by us for this user guide. The algorithm behind the TMU figure in
Chapter 6 was developed during a collaboration with Steve Ellner (Ellner and
Holmes, 2008). Later we further developed the chapters as part of a course
we taught on analysis of fisheries and environmental time-series data at the
University of Washington, where we are affiliate faculty. You can find online
versions of the workshops and the time-series analysis course on EEH’s website
http://faculty.washington.edu/eeholmes.
The authors are research scientists at the Northwest Fisheries Science Center (NWFSC). This work was conducted as part of our jobs at the NWFSC,
a research center for NOAA Fisheries which is a United States federal government agency. A CAMEO grant from the National Science Foundation and
NOAA Fisheries provided the initial impetus for the development of the package as part of a research project with Stephanie Hampton, Lindsay Scheef,
and Steven Katz on analysis of marine plankton time series. During the initial
stages of this work, EJW was supported on a post-doctoral fellowship from the


National Research Council and MDS was partially supported by a PECASE
award from the White House Office of Science and Technology Policy.
You are welcome to use the code and adapt it with full attribution. You
should use citation Holmes et al. (2012) for the MARSS package. It may not be
used in any commercial applications nor may it be copyrighted. Use of the EM
algorithm should cite Holmes (2012). Links to more code and publications on
MARSS applications can be found by following the links at our academic
websites:
- http://faculty.washington.edu/eeholmes
- http://faculty.washington.edu/scheuerl
- https://sites.google.com/site/ericward2

Contents

Part I: The MARSS package

1 Overview
   1.1 What does the MARSS package do?
   1.2 What does MARSS output and how do I get the output?
   1.3 How to get started (quickly)
   1.4 Important notes about the algorithms
   1.5 Troubleshooting
   1.6 Other related packages

2 The main package functions
   2.1 The MARSS() function: inputs
   2.2 The MARSS() function: outputs
   2.3 Core functions for fitting a MARSS model
   2.4 Functions for a fitted marssMLE object
   2.5 Functions for marssMODEL objects

3 Algorithms used in the MARSS package
   3.1 The full time-varying model used in the MARSS EM algorithm
   3.2 Maximum-likelihood parameter estimation
   3.3 Kalman filter and smoother
   3.4 The exact likelihood
   3.5 Parametric and innovations bootstrapping
   3.6 Simulation and forecasting
   3.7 Model selection

Part II: Fitting models with MARSS

4 Fitting models: the MARSS() function
   4.1 u, a and π model structures
   4.2 Q, R, Λ model structures
   4.3 B model structures
   4.4 Z model
   4.5 Default model structures

5 Examples
   5.1 Fixed and estimated elements in parameter matrices
   5.2 Different numbers of state processes
   5.3 Time-varying parameters
   5.4 Including inputs (or covariates)
   5.5 Printing and summarizing models and model fits
   5.6 Confidence intervals on a fitted model
   5.7 Vectors of just the estimated parameters
   5.8 Kalman filter and smoother output
   5.9 Degenerate variance estimates
   5.10 Bootstrap parameter estimates
   5.11 Random initial conditions
   5.12 Data simulation
   5.13 Bootstrap AIC
   5.14 Convergence

Part III: Applications

6 Count-based population viability analysis (PVA) using corrupted data
   6.1 Background
   6.2 Simulated data with process and observation error
   6.3 Maximum-likelihood parameter estimation
   6.4 Probability of hitting a threshold Π(x_d, t_e)
   6.5 Certain and uncertain regions
   6.6 More risk metrics and some real data
   6.7 Confidence intervals
   6.8 Comments

7 Combining multi-site data to estimate regional population trends
   7.1 Harbor seals in the Puget Sound, WA
   7.2 A single well-mixed population with i.i.d. errors
   7.3 Single population model with independent and non-identical errors
   7.4 Two subpopulations, north and south
   7.5 Other population structures
   7.6 Discussion

8 Identifying spatial population structure and covariance
   8.1 Harbor seals on the U.S. west coast
   8.2 Question 1, How many distinct subpopulations?
   8.3 Fit the different models
   8.4 Summarize the data support
   8.5 Question 2, Are the subpopulations independent?
   8.6 Question 3, Is the Hood Canal independent?
   8.7 Discussion

9 Dynamic factor analysis (DFA)
   9.1 Overview
   9.2 The data
   9.3 Setting up the model in MARSS
   9.4 Using model selection to determine the number of trends
   9.5 Using varimax rotation to determine the loadings and trends
   9.6 Examining model fits
   9.7 Adding covariates
   9.8 Questions and further analyses

10 Analyzing noisy animal tracking data
   10.1 A simple random walk model of animal movement
   10.2 Loggerhead sea turtle tracking data
   10.3 Estimate locations from bad tag data
   10.4 Estimate speeds for each turtle
   10.5 Using specialized packages to analyze tag data

11 Detection of outliers and structural breaks
   11.1 River flow in the Nile River
   11.2 Different models for the Nile flow levels
   11.3 Observation and state residuals

12 Incorporating covariates into MARSS models
   12.1 Covariates as inputs
   12.2 Examples using plankton data
   12.3 Observation-error only model
   12.4 Process-error only model
   12.5 Both process- & observation-error model
   12.6 Including seasonal effects in MARSS models
   12.7 Model diagnostics
   12.8 Covariates with missing values or observation error

13 Estimation of species interaction strengths with and without covariates
   13.1 Background
   13.2 Two-species example using wolves and moose
   13.3 Analysis of a four-species plankton community
   13.4 Stability metrics from estimated interaction matrices
   13.5 Further information

14 Combining data from multiple time series
   14.1 Overview
   14.2 Salmon spawner surveys
   14.3 American kestrel abundance indices

15 Univariate dynamic linear models (DLMs)
   15.1 Overview of dynamic linear models
   15.2 Example of a univariate DLM
   15.3 Forecasting with a univariate DLM

16 Multivariate linear regression
   16.1 Univariate linear regression
   16.2 Multivariate response example using longitudinal data
   16.3 Summary

17 Lag-p models with MARSS
   17.1 Background
   17.2 MAR(2) models
   17.3 MAR(p) models
   17.4 MARSS(p): models with observation error
   17.5 Discussion

A Textbooks and articles that use MARSS modeling for population modeling

B Package MARSS: Warnings and errors

References

Index

Part I: The MARSS package

1 Overview

MARSS stands for Multivariate Auto-Regressive(1) State-Space. The MARSS
package is an R package for estimating the parameters of linear MARSS models with Gaussian errors. This class of model is extremely important in the
study of linear stochastic dynamical systems, and these models are important
in many different fields, including economics, engineering, genetics, physics
and ecology (Appendix A). The model class has different names in different fields, for example in some fields they are termed dynamic linear models (DLMs) or vector autoregressive (VAR) state-space models. The MARSS
package allows you to easily fit time-varying constrained and unconstrained
MARSS models, with or without covariates, to multivariate time-series data
via maximum likelihood, primarily using an EM algorithm. (Fitting via the
BFGS algorithm is also provided, using R's optim function, but this is not the
focus of the package.)
A MARSS model, with Gaussian errors, takes the form:

$$
\begin{aligned}
\mathbf{x}_t &= \mathbf{B}_t \mathbf{x}_{t-1} + \mathbf{u}_t + \mathbf{C}_t \mathbf{c}_t + \mathbf{w}_t, \quad \text{where } \mathbf{w}_t \sim \text{MVN}(0, \mathbf{Q}_t) && (1.1\text{a})\\
\mathbf{y}_t &= \mathbf{Z}_t \mathbf{x}_t + \mathbf{a}_t + \mathbf{D}_t \mathbf{d}_t + \mathbf{v}_t, \quad \text{where } \mathbf{v}_t \sim \text{MVN}(0, \mathbf{R}_t) && (1.1\text{b})\\
\mathbf{x}_1 &\sim \text{MVN}(\boldsymbol{\pi}, \boldsymbol{\Lambda}) \;\text{ or }\; \mathbf{x}_0 \sim \text{MVN}(\boldsymbol{\pi}, \boldsymbol{\Lambda}) && (1.1\text{c})
\end{aligned}
$$

The x equation is termed the state process and the y equation is termed the
observation process. Data enter the model as the y; that is, the y are treated
as the data, although there may be missing data. The c_t and d_t are inputs
(aka exogenous variables, covariates, or indicator variables).
The bolded terms are matrices with the following definitions:

x is an m × T matrix of states. Each x_t is a realization of the random variable X_t at time t.
w is an m × T matrix of the process errors. The process errors at time t are multivariate normal with mean 0 and covariance matrix Q_t.
y is an n × T matrix of the observations. Some observations may be missing.
v is an n × T matrix of the non-process (observation) errors. The observation errors at time t are multivariate normal with mean 0 and covariance matrix R_t.
B_t and Z_t are parameters and are m × m and n × m matrices.
u_t and a_t are parameters and are m × 1 and n × 1 column vectors.
Q_t and R_t are parameters and are m × m and n × n variance-covariance matrices.
π is either a parameter or a fixed prior. It is an m × 1 matrix.
Λ is either a parameter or a fixed prior. It is an m × m variance-covariance matrix.
C_t and D_t are parameters and are m × p and n × q matrices.
c and d are inputs (no missing values) and are p × T and q × T matrices.
In some fields, the u and a terms are routinely set to 0 or the model is
written in such a way that they are incorporated into B or Z. However, in other
fields, the u and a terms are the main objects of interest, and the model is
written to explicitly show them. We include them throughout our discussion,
but they can be set to zero if desired.
AR(p) models can be written in the above form by properly defining the x
vector and setting some of the R variances to zero; see Chapter 17. Although
the model appears to include only i.i.d. errors (v_t and w_t), in practice AR(p)
errors can be included by moving the errors into the state model. Similarly, the
model appears to have independent process (w_t) and observation (v_t) errors;
however, in practice these can be modeled as identical or correlated by using
one of the state processes to model the errors, with the B matrix set
appropriately for AR or white noise, although one may have to fix many of the
parameters associated with the errors to have an identifiable model. Study the
case studies in this User Guide and textbooks on MARSS models for examples
of how a wide variety of autoregressive models can be written in MARSS form.

1.1 What does the MARSS package do?
Written in an unconstrained form (meaning all the elements in a parameter
matrix are allowed to be different), a MARSS model can be written out as
follows. Two state processes (x) and three observation processes (y) are used
here for example's sake.

$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_{t-1} + \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}_t, \quad \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}_t \sim \text{MVN}\!\left( \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \begin{bmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{bmatrix} \right)
$$

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}_t = \begin{bmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \\ z_{31} & z_{32} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}_t, \quad \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}_t \sim \text{MVN}\!\left( \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}, \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} \right)
$$

$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_0 \sim \text{MVN}\!\left( \begin{bmatrix} \pi_1 \\ \pi_2 \end{bmatrix}, \begin{bmatrix} \nu_{11} & \nu_{12} \\ \nu_{21} & \nu_{22} \end{bmatrix} \right) \quad \text{or} \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_1 \sim \text{MVN}\!\left( \begin{bmatrix} \pi_1 \\ \pi_2 \end{bmatrix}, \begin{bmatrix} \nu_{11} & \nu_{12} \\ \nu_{21} & \nu_{22} \end{bmatrix} \right)
$$

However, not all parameter elements can be estimated simultaneously. Constraints are required in order to specify a model with a unique solution. The
MARSS package allows you to specify constraints by fixing elements in a parameter matrix or specifying that some elements are estimated—and have a
linear relationship to other elements. Here is an example of a MARSS model
with fixed and estimated parameter elements:

$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_{t-1} + \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}_t, \quad \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}_t \sim \text{MVN}\!\left( \begin{bmatrix} 0.1 \\ u \end{bmatrix}, \begin{bmatrix} q_{11} & q_{12} \\ q_{12} & q_{22} \end{bmatrix} \right)
$$

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}_t = \begin{bmatrix} d & d \\ c & c \\ 1+2d+3c & 2+3d \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}_t, \quad \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}_t \sim \text{MVN}\!\left( \begin{bmatrix} a_1 \\ a_2 \\ 0 \end{bmatrix}, \begin{bmatrix} r & 0 & 0 \\ 0 & r & 0 \\ 0 & 0 & r \end{bmatrix} \right)
$$

$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_0 \sim \text{MVN}\!\left( \begin{bmatrix} \pi \\ \pi \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right)
$$
Notice that some elements are fixed (in this case to 0, but they could be any
fixed number), some elements are shared (have the same value), and some
elements are linear combinations of estimated values (c, 1 + 2d + 3c, and
2 + 3d are linear combinations of c and d).
The MARSS package fits models via maximum likelihood. The MARSS
package is unusual among packages for fitting MARSS models in that fitting
is performed via a constrained EM algorithm (Holmes, 2012) based on a vectorized form of Equation 1.1 (See Chapter 3 for the vectorized form used in the
algorithm). Although fitting via the BFGS algorithm is also provided using
method="BFGS" and R's optim function, the examples in this guide use the
EM algorithm primarily because it gives robust estimation for datasets replete
with missing values and for high-dimensional models with various constraints.
However, there are many models/datasets for which BFGS is faster, and we typically try both. The EM algorithm is also often used to provide
initial conditions for the BFGS algorithm (or an MCMC routine) in order to
improve the performance of those algorithms. In addition to the main model


fitting function, the MARSS package supplies functions for bootstrap and
approximate confidence intervals, parametric and non-parametric bootstrapping, model selection (AIC and bootstrap AIC), simulation, and bootstrap
bias correction.

1.2 What does MARSS output and how do I get the
output?
MARSS models are used in many different ways, and different users will want
different types of output. Some users will want the parameter estimates, while
others want the smoothed states, and still others want to use MARSS to
interpolate missing values and want the expected values of the missing data.
The best way to
find out how to get output is to type ?print.MARSS at the command line after
installing MARSS. This help page discusses how to get parameter estimates
in different forms, the smoothed and filtered states, all the Kalman filter and
smoother output, all the expectations of y (missing data), confidence intervals
and bias estimates for the parameters, and standard errors of the states. If
you are looking only for Kalman filter and smoother output, see the relevant
section in Chapter 3 and ?MARSSkf at the R command line.
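As a minimal sketch (the harborSealWA dataset ships with MARSS; the object names and column choices here are ours):

```r
library(MARSS)
# data must be a matrix with time across the columns; NA marks missing values
dat <- t(harborSealWA[, c("SJF", "SJI")])  # 2 x T matrix of log counts
fit <- MARSS(dat)                 # fit a default MARSS model

print(fit, what = "xtT")          # smoothed states, conditioned on all the data
kf <- MARSSkf(fit)                # full Kalman filter and smoother output
```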

1.3 How to get started (quickly)
If you already work with models in the form of Equation 1.1, you can immediately fit your model with the MARSS package. Install the MARSS package
and then type library(MARSS) at the command line to load the package. Look
at the Quick Start Guide and then skim through Chapter ??. Your data need
to be a matrix (not dataframe) with time going across the columns and any
non-data columns (like year) removed. The MARSS functions assume discrete
time steps and you will need a column for each time step. Replace any missing
time steps with a missing value holder (e.g. NA). Write your model down on
paper and identify which parameters correspond to which parameter matrices
in Equation 1.1. Call the MARSS() function (Chapter 4) using your data and
using the model argument to specify the structure of each parameter.
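For example, a sketch of such a call using model-specification strings that MARSS() accepts (the data and structure choices below are illustrative):

```r
library(MARSS)
dat <- t(harborSealWA[, c("SJF", "SJI", "EBays")])  # 3 x T matrix, NAs allowed
mod <- list(Z = "identity",              # one state per observed time series
            R = "diagonal and equal",    # one shared observation variance
            Q = "diagonal and unequal",  # separate process variances
            U = "unequal")               # separate drift terms
fit <- MARSS(dat, model = mod)
```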

1.4 Important notes about the algorithms
Specification of a properly constrained model with a unique solution is the
responsibility of the user, because MARSS has no way to tell if you have
specified an insufficiently constrained model, one with an infinite number of
solutions.
Specifying a properly constrained model with a unique solution is imperative. How do you know if the model is properly constrained? If you are using

a MARSS model form that is widely used, then you can probably assume that
it is properly constrained. If you go to papers where someone developed the
model or method, the issue of constraints necessary to ensure “identifiability”
will likely be addressed if it is an issue. Are you fitting novel MARSS models? Then you will need to do some study on identifiability in this class of
models using textbooks (see the textbook list at the end of this User Guide). Often textbooks do not address identifiability explicitly. Rather it is addressed
implicitly by only showing a model constructed in such a way that it is identifiable. In our work, if we suspect identification problems, we will often first
do a Bayesian analysis with flat priors and look for oddities in the posteriors,
such as ridges, plateaus or bimodality.
All the EM code in the MARSS package is in native R, so model fitting is
relatively slow. The classic Kalman filter/smoother algorithm, as
shown in Shumway and Stoffer (2006, p. 331-335), is based on the original
smoother presented in Rauch (1963). This Kalman filter is provided in function
MARSSkfss, but the default Kalman filter and smoother used in the MARSS
package is based on the algorithm in Kohn and Ansley (1989) and papers by
Koopman et al. This Kalman filter and smoother is provided in the KFAS
package (Helske 2012). Table 2 in Koopman (1993) indicates that the classic algorithm is 40-100 times slower than the algorithm given in Kohn and
Ansley (1989), Koopman (1993), and Koopman et al. (1999). The MARSS
package function MARSSkfas provides a translator between the model objects
in MARSS and those in KFAS so that the KFAS functions can be used.
MARSSkfas also includes a lag-one covariance smoother algorithm as this is
not output by the KFAS functions, and it provides proper formulation of the
priors so that one can use the KFAS functions when the prior on the states is
set at t = 0 instead of t = 1 (and no, simply off-setting your data to start at
t=2 and sending that value to tinit = 1 in the KFAS Kalman filter would not
be mathematically correct).
EM algorithms will quickly get in the vicinity of the maximum likelihood,
but the final approach to the maximum is generally slow relative to quasiNewton methods. On the flip side, EM algorithms are quite robust to initial
conditions choices and can be extremely fast at getting close to the MLE values for high-dimensional models. The MARSS package also allows one to use
the BFGS method to fit MARSS models, thus one can use an EM algorithm to
“get close” and then the BFGS algorithm to polish off the estimate. Restricted
maximum-likelihood algorithms are also available for AR(1) state-space models, both univariate (Staples et al., 2004) and multivariate (Hinrichsen, 2009).
REML can give parameter estimates with lower variance than plain maximumlikelihood algorithms. However, the algorithms for REML when there are missing values are not currently available (although that will probably change in
the near future). Another maximum-likelihood method is data-cloning which
adapts MCMC algorithms used in Bayesian analysis for maximum-likelihood
estimation (Lele et al., 2007).
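A sketch of the EM-then-BFGS strategy mentioned above (passing the $par list from a previous fit via the inits argument is how we seed the second run; object names are ours):

```r
fit.em   <- MARSS(dat, method = "kem")                       # EM: robust start
fit.bfgs <- MARSS(dat, method = "BFGS", inits = fit.em$par)  # BFGS: polish
```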


Missing values are seamlessly accommodated with the MARSS package.
Simply specify missing data with NAs. The likelihood computations are exact
and will deal appropriately with missing values. However, no innovations
bootstrapping (the non-parametric bootstrap developed by Stoffer and Wall,
1991) can be done if there are missing values; instead, parametric bootstrapping
must be used.
You should be aware that maximum-likelihood estimates of variance in
MARSS models are fundamentally biased, regardless of the algorithm used.
This bias is more severe when one or the other of R or Q is very small, and
the bias does not go to zero as sample size goes to infinity. The bias arises
because variance is constrained to be positive. Thus if R or Q is essentially
zero, the mean estimate will not be zero and thus the estimate will be biased
high, while the estimate of the other variance will be biased low.
You can generate unbiased variance estimates using a bootstrap estimate of
the bias. The function MARSSparamCIs() will do this. However, be aware that
adding an estimated bias to a parameter estimate will lead to an increase in
the variance of your parameter estimate. The amount of variance added will
depend on sample size.
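A sketch of requesting that bootstrap bias estimate (argument and element names per ?MARSSparamCIs; nboot is illustrative):

```r
# parametric-bootstrap confidence intervals; a bias estimate is computed too
fit <- MARSSparamCIs(fit, method = "parametric", nboot = 1000)
fit$par.bias   # bootstrap-estimated bias for the estimated parameters
```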
You should also be aware that mis-specification of the prior on the initial
states (π and Λ) can have catastrophic effects on your parameter estimates
if your prior conflicts with the distribution of the initial states implied by
the MARSS model. These effects can be very difficult to detect because the
model will appear to be well-fitted. Unless you have a good idea of what the
parameters should be, you might not realize that your prior conflicts.
The most common problems we have found with priors on x0 are the
following. Problem 1) The correlation structure in Λ (whether the prior is
diffuse or not) does not match the correlation structure in x0 implied by your
model. For example, you specify a diagonal Λ (independent states), but the
implied distribution has correlations. Problem 2) The correlation structure in
Λ does not match the structure in x0 implied by constraints you placed on
π. For example, you specify that all values in π are shared, yet you specify
that Λ is diagonal (independent). Unfortunately, using a diffuse prior does not
help with these two problems because the diffuse prior still has a correlation
structure and can still conflict with the implied correlation in x0 . One way to
get around these problems is to set Λ=0 (an m×m matrix of zeros) and estimate
π ≡ x0 only. Now π is a fixed but unknown (estimated) parameter, not the
mean of a distribution. In this case, Λ does not exist in your model and there
is no conflict with the model. Unfortunately estimating π as a parameter is not
always robust. If you specify that Λ=0 and specify that π corresponds to x0 ,
but your model “explodes” when run backwards in time, you cannot estimate
π because you cannot get a good estimate of x0 . Sometimes this can be avoided
by specifying that π corresponds to x1 so that it can be constrained by the
data y1 . In summary, if the implied correlation structure of your initial states
is independent (diagonal variance-covariance matrix), you should generally be

ok with a diagonal and high variance prior or with treating the initial states as
parameters (with Λ = 0). But if your initial states have an implied correlation
structure that is not independent, then proceed with caution (which means
that you should assume you have problems and test the fitting with simulated
data).
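As a sketch: in the MARSS() model list, Λ is specified as V0 and π as x0, and tinitx sets whether the initial state is at t = 0 or t = 1, so treating the initial state as an estimated parameter at t = 1 (per the advice above) might look like this (the m = 2 dimension is illustrative):

```r
# no prior on the initial state: V0 (i.e. Lambda) = 0, pi estimated at t = 1
fit <- MARSS(dat, model = list(V0 = matrix(0, 2, 2), tinitx = 1))
```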
There is a large class of models in the statistical finance literature that
have the form
$$
\mathbf{x}_{t+1} = \mathbf{B}\mathbf{x}_t + \boldsymbol{\Gamma}\boldsymbol{\eta}_t, \qquad \mathbf{y}_t = \mathbf{Z}\mathbf{x}_t + \boldsymbol{\eta}_t
$$
For example, ARMA(p,q) models can be written in this form. The MARSS
model framework in this package will not allow you to write models in that
form. You can put the ηt into the xt vector and set R = 0 to make models
of this form using the MARSS form, but the EM algorithm in the MARSS
package won’t let you estimate parameters because the parameters will drop
out of the full likelihood being maximized in the algorithm. You can try using
BFGS by passing in the method argument to the MARSS() call.

1.5 Troubleshooting
Numerical errors due to ill-conditioned matrices are not uncommon when
fitting MARSS models. The Kalman and EM algorithms need inverses of
matrices. If those matrices become ill-conditioned, for example all elements
are close to the same value, then the algorithm becomes unstable. Warning
messages will be printed if the algorithms are becoming unstable, and you
can set control$trace=1 to see details of where the algorithm is becoming
unstable. Whenever possible, you should avoid using shared π values in your
model (an example of a π with shared values is π = (a, a, a)⊤). The way our
algorithm deals with Λ tends to make this case unstable, especially if R is not
diagonal. In general, estimation of a non-diagonal R is more difficult, more
prone to ill-conditioning, and more data-hungry.
You may also see non-convergence warnings, especially if your MLE model
turns out to be degenerate. This means that one of the elements on the diagonal of your Q or R matrix is going to zero (is degenerate). It will take
the EM algorithm forever to get to zero. BFGS will have the same problem,
although it will often get a bit closer to the degenerate solution. If you are
using method="kem", MARSS will warn you if it looks like the solution is degenerate. If you use control=list(allow.degen=TRUE), the EM algorithm
will attempt to set the degenerate variances to zero (instead of trying to get to
zero using an infinite number of iterations). However, if one of the variances is
going to zero, first think about why this is happening. This is typically caused
by one of three problems: 1) you made a mistake in inputting your data, e.g.
used -99 as the missing value in your data but did not replace these with NAs
before passing to MARSS, 2) your data are not sufficient to estimate multiple
variances, or 3) your data are inconsistent with the model you are trying to fit.
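A sketch of turning on these diagnostics and behaviors via the control list (element names per the text above):

```r
fit <- MARSS(dat, control = list(trace = 1,            # report fitting details
                                 maxit = 1000,         # allow more EM iterations
                                 allow.degen = TRUE))  # set degenerate variances to 0
```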
The algorithms in the MARSS package are designed for cases where the Q
and R diagonals are all non-minuscule. For example, the EM update equation
for u will grind to a halt (not update u) if Q is tiny (like 1E-7). Conversely,
the BFGS equations are likely to miss the maximum-likelihood when R is tiny
because then the likelihood surface becomes hyper-sensitive to π. The solution
is to use the degenerate likelihood function for the likelihood calculation and
the EM update equations. MARSS will implement this automatically when
Q or R diagonal elements are set to zero and will try setting Q and R terms
to zero automatically if control$allow.degen=TRUE. One odd case can occur
when R goes to zero (a matrix of zeros) but you are estimating π. If
model$tinitx=1, then π must be y_1 as R goes to zero, and as R goes to zero,
the log-likelihood will go (correctly) to infinity. But if you set R = 0, the
log-likelihood will be finite. The reason is that R ≈ 0 and R = 0 specify
different likelihoods. In the first, the determinant of R appears, and this goes
to positive infinity as R goes to zero. In the second case, R does not appear in
the likelihood and so neither does its determinant. If some elements of the
diagonal of R are going to zero, you should be suspect of the parameter
estimates. Sometimes the structure of your data, e.g. one data value followed
by a long string of missing values, is causing an odd spike in the likelihood at
R ≈ 0. Try manually setting R equal to zero to get the correct log-likelihood.
(The likelihood returned when R ≈ 0 is not incorrect; it is just not the
likelihood that you probably want. You want the likelihood where the R term
is dropped because it is zero.)

1.6 Other related packages
Packages that will do Kalman filtering and smoothing are many, but packages that estimate the parameters in a MARSS model, especially constrained
MARSS models, are much less common. The following are those with which we
are familiar; however, there are certainly more packages for estimating MARSS
models in engineering and economics with which we are unfamiliar. The MARSS
package is unusual in that it uses an EM algorithm for maximizing the likelihood as opposed to a Newton-esque method (e.g. BFGS). The package is also
unusual in that it allows you to specify the initial conditions at t = 0 or t = 1
and allows degenerate models (with some of the diagonal elements of R or Q
equal to zero). Lastly, model specification in the MARSS package has a one-to-one
relationship between the model list in MARSS and the model as you would
write it on paper as a matrix equation. This makes the learning curve a bit
less steep. However, the MARSS package has not been optimized for speed
and probably will be really slow if you have time-series data with a lot of time
points.

DLM DLM is an R package for fitting MARSS models. Our impression is
that it is mainly Bayesian focused but it does allow MLE estimation via
the optim() function. It has a book, Dynamic Linear Models with R by
Petris et al., which has many examples of how to write MARSS models
for different applications.
sspir sspir is an R package for fitting ARSS (univariate) models with Gaussian,
Poisson and binomial error distributions.
dse dse (Dynamic Systems Estimation) is an R package for multivariate Gaussian state-space models with a focus on ARMA models.
SsfPack SsfPack is a package for Ox/Splus that fits constrained multivariate
Gaussian state-space models using mainly (it seems) the BFGS algorithm
but the newer versions support other types of maximization. SsfPack is
very flexible and written in C to be fast. It has been used extensively
on statistical finance problems and is optimized for dealing with large
(financial) data sets. It is used and documented in Time Series Analysis
by State Space Methods by Durbin and Koopman, An Introduction to
State Space Time Series Analysis by Commandeur and Koopman, and
Statistical Algorithms for Models in State Space Form: SsfPack 3.0, by
Koopman, Shephard, and Doornik.
Brodgar The Brodgar software was developed by Alain Zuur to do (among
many other things) dynamic factor analysis, which involves a special type
of MARSS model. The methods and many example analyses are given
in Analyzing Ecological Data by Zuur, Ieno and Smith. This is the one
package that we are aware of that also uses an EM algorithm for parameter
estimation.
eViews eViews is a commercial economics software that will estimate at least
some types of MARSS models.
KFAS The KFAS R package provides a fast Kalman filter and smoother.
Examples in the package show how to estimate MARSS models using the
KFAS functions and R's optim() function. The MARSS package uses the
filter and smoother functions from the KFAS package.
S+FinMetrics S+FinMetrics is an S-plus module for fitting MAR models,
which are called vector autoregressive (VAR) models in the economics
and finance literature. It has some support for state-space VAR models,
though we haven’t used it so are not sure which parameters it allows you to
estimate. It was developed by Andrew Bruce, Doug Martin, Jiahui Wang,
and Eric Zivot, and it has a book associated with it: Modeling Financial
Time Series with S-plus by Eric Zivot and Jiahui Wang.
kftrack The kftrack R package provides a suite of functions specialized for
fitting MARSS models to animal tracking data.

2 The main package functions

The MARSS package is object-based. It has two main types of objects: a model
object (class marssMODEL) and a maximum-likelihood fitted model object
(class marssMLE). A marssMODEL object specifies the structure of the model
to be fitted. It is an R code version of the MARSS equation (Equation 1.1).
A marssMLE object specifies both the model and the information necessary
for fitting (initial conditions, controls, method). If the model has been fitted,
the marssMLE object will also have the parameter estimates and (optionally)
confidence intervals and bias.

2.1 The MARSS() function: inputs
The function MARSS() is an interface to the core fitting functions in the
MARSS package. It allows a user to fit a MARSS model using a list to describe the model structure. It returns marssMODEL and marssMLE objects
which the user can later use in other functions, e.g. simulating or computing
bootstrap confidence intervals.
MLEobj=MARSS(data, model=list(), ..., fit=TRUE) This function will fit
a MARSS model to the data using a model list which is a list describing
the structure of the model parameter matrices. In the default model, i.e.
if you use MARSS(dat) with no model argument, Z and B are the identity
matrix, R is a diagonal matrix with one variance, Q is a diagonal matrix with unique variances, u is unique, a is scaling, and C, c, D, and d
are all zero. The output is a marssMLE object where the estimated parameter matrices are in MLEobj$par. If fit=FALSE, it returns a minimal
marssMLE object that is ready for passing to a fitting function (below)
but with no par element.
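For example, a sketch of both forms of the call (dat is assumed to be an n × T matrix; the structure strings are illustrative):

```r
fit   <- MARSS(dat)                    # fit the default model
fit2  <- MARSS(dat, model = list(R = "diagonal and unequal", U = "equal"))
setup <- MARSS(dat, fit = FALSE)       # build the objects without fitting
```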


2.2 The MARSS() function: outputs
The marssMLE object returned by a MARSS() call includes the estimated
parameters, states, and expected values of any missing data. Derived statistics,
such as confidence intervals and standard errors, can be computed using the
functions described below. The print method for marssMLE objects will print
or compute all the frequently needed output using the what= argument in the
print call. Type ?print.MARSS at the R command line to see the print help
file.
estimated parameters coef(marssMLE) The coef function can output parameters in a variety of formats. See ?coef.marssMLE.
residuals residuals(marssMLE) See ?residuals.marssMLE for a discussion
of standardized residuals in the context of MARSS models.
Kalman filter and smoother output The smoothed states are in marssMLE$states
but the full Kalman filter and smoother output is available from MARSSkf(marssMLE).
See ?MARSSkf for a discussion of the Kalman filter and smoother outputs.
If you just want the estimated states conditioned on all the data, use
print(marssMLE, what="xtT"). If you want all the Kalman filter and
smoother output, use print(marssMLE, what="kfs").
expected value of missing y MARSShatyt(marssMLE) See ?MARSShatyt for a
discussion of the expectations involving y. If you just want the estimated
missing y conditioned on all the data, use print(marssMLE, what="ytT").
If you want all the expectations involving y conditioned on the data, use
print(marssMLE, what="Ey").
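Continuing with a fitted object fit (the object name is ours), a sketch of the calls above:

```r
coef(fit, type = "matrix")   # parameter estimates in matrix form
residuals(fit)               # standardized residuals
yt <- MARSShatyt(fit)        # expectations involving y (missing data)
print(fit, what = "Ey")      # all expectations of y conditioned on the data
```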

2.3 Core functions for fitting a MARSS model
The following core functions are designed to work with ‘unfitted’ marssMLE
objects, that is, a marssMLE object without the par element. Users do not normally need to call the MARSSkem or MARSSoptim functions since MARSS() will
call those. Below, MLEobj means the argument is a marssMLE object. Note,
these functions can be called with a marssMLE object with a par element,
but these functions will overwrite that element.
MLEobj=MARSSkem(MLEobj) This will fit a MARSS model via the EM algorithm to the data using a properly specified marssMLE object, which
has data, the marssMODEL object and the necessary initial condition
and control elements. See the appendix on the object structures in the
MARSS package. MARSSkem does no error-checking. See is.marssMLE().
MARSSkem uses MARSSkf described below.
MLEobj=MARSSoptim(MLEobj) This will fit a MARSS model via the BFGS algorithm provided in optim(). This requires a properly specified marssMLE
object, such as would be passed to MARSSkem.


MLEobj=MARSSmcinit(MLEobj) This will perform a Monte Carlo initial conditions search and update the marssMLE object with the best initial conditions from the search.
is.marssMLE(MLEobj) This will check that a marssMLE object is properly
specified and ready for fitting. This should be called before MARSSkem
or MARSSoptim is called. This function is not typically needed if using
MARSS() since MARSS() builds the model object for the user and does
error-checking on model structure.

2.4 Functions for a fitted marssMLE object
The following functions use a marssMLE object that has a populated par
element, i.e. a marssMLE object returned from one of the fitting functions
(MARSS, MARSSkem, MARSSoptim). Below modelObj means the argument is a
marssMODEL object and MLEobj means the argument is a marssMLE object.
Type ?function.name to see information on function usage and examples.
kf=MARSSkf(MLEobj) This will compute the expected values of the hidden
states given data via the Kalman filter (to produce estimates conditioned
on 1 : t − 1) and the Kalman smoother (to produce estimates conditioned
on 1 : T ). The function also returns the exact likelihood of the data conditioned on MLEobj$par. A variety of other Kalman filter/smoother information is also output (kf is a list of output); see ?MARSSkf for details.
MLEobj=MARSSaic(MLEobj) This adds model selection criteria, AIC, AICc,
and AICb, to a marssMLE object.
boot=MARSSboot(MLEobj) This returns a list containing bootstrapped parameters and data via parametric or innovations bootstrapping.
MLEobj=MARSShessian(MLEobj) This adds a numerically estimated Hessian
matrix to a marssMLE object.
MLEobj=MARSSparamCIs(MLEobj) This adds standard errors, confidence intervals, and bootstrap estimated bias for the maximum-likelihood parameters using bootstrapping or the Hessian to the passed-in marssMLE object.
sim.data=MARSSsimulate(MLEobj) This returns simulated data from a MARSS
model specified via a list of parameter matrices in MLEobj$parList (this
is a list with elements Q, R, U, etc).
paramVec=MARSSvectorizeparam(MLEobj) This returns the estimated (and
only the estimated) parameters as a vector. This is useful for storing the
results of simulations and for writing functions that fit MARSS models
using R’s optim function.
new.MLEobj=MARSSvectorizeparam(MLEobj, paramVec) This will return a
marssMLE object in which the estimated parameters (which are in MLEobj$par
along with the fixed values) are replaced with the values in paramVec.
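A sketch chaining several of these onto one fitted object (argument names per the respective help files; the AICbp option is the parametric-bootstrap AICb):

```r
fit  <- MARSS(dat)
fit  <- MARSSaic(fit, output = c("AIC", "AICc", "AICbp"))  # selection criteria
fit  <- MARSShessian(fit)          # add a numerically estimated Hessian
fit  <- MARSSparamCIs(fit)         # add standard errors and CIs
pvec <- MARSSvectorizeparam(fit)   # the estimated parameters as a vector
```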


2.5 Functions for marssMODEL objects
is.marssMODEL(modelObj) This will check that the free and fixed matrices
in a marssMODEL object are properly specified. This function is not
typically needed if using MARSS() since MARSS() builds the marssMODEL
object for the user and does error-checking on model structure.
summary(modelObj) This will print the model parameter matrices showing
the fixed values (in parentheses) and the location of the estimated elements. The estimated elements are shown as g1, g2, g3, ... which indicates
which elements are shared (i.e., forced to have the same value). For example, an i.i.d. R matrix would appear as a diagonal matrix with just g1
on the diagonal.
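For example, a sketch using the marssMODEL object held inside a fitted marssMLE object (the $model element name follows the package's object structure):

```r
is.marssMODEL(fit$model)   # check the model object is properly specified
summary(fit$model)         # fixed values in parentheses; g1, g2, ... mark estimates
```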

3 Algorithms used in the MARSS package

3.1 The full time-varying model used in the MARSS EM algorithm
In mathematical form, the model that is being fit with the package is

$$
\begin{aligned}
\mathbf{x}_t &= (\mathbf{x}_{t-1}^\top \otimes \mathbf{I}_m)\,\text{vec}(\mathbf{B}_t) + (\mathbf{u}_t^\top \otimes \mathbf{I}_m)\,\text{vec}(\mathbf{U}_t) + \mathbf{w}_t, && \mathbf{W}_t \sim \text{MVN}(0, \mathbf{Q}_t)\\
\mathbf{y}_t &= (\mathbf{x}_t^\top \otimes \mathbf{I}_n)\,\text{vec}(\mathbf{Z}_t) + (\mathbf{a}_t^\top \otimes \mathbf{I}_n)\,\text{vec}(\mathbf{A}_t) + \mathbf{v}_t, && \mathbf{V}_t \sim \text{MVN}(0, \mathbf{R}_t) && (3.1)\\
\mathbf{x}_{t_0} &= \boldsymbol{\pi} + \mathbf{F}\mathbf{l}, && \mathbf{L} \sim \text{MVN}(0, \boldsymbol{\Lambda})
\end{aligned}
$$
Each model parameter matrix, B_t, U_t, Q_t, Z_t, A_t, and R_t, is written as
a time-varying linear model, f_t + D_t m, where f and D are fully known (not
estimated and with no missing values) and m is a column vector of the
estimated elements of the parameter matrix:
$$
\begin{aligned}
\text{vec}(\mathbf{B}_t) &= \mathbf{f}_{t,b} + \mathbf{D}_{t,b}\boldsymbol{\beta} & \text{vec}(\mathbf{U}_t) &= \mathbf{f}_{t,u} + \mathbf{D}_{t,u}\boldsymbol{\upsilon} & \text{vec}(\mathbf{Q}_t) &= \mathbf{f}_{t,q} + \mathbf{D}_{t,q}\mathbf{q}\\
\text{vec}(\mathbf{Z}_t) &= \mathbf{f}_{t,z} + \mathbf{D}_{t,z}\boldsymbol{\zeta} & \text{vec}(\mathbf{A}_t) &= \mathbf{f}_{t,a} + \mathbf{D}_{t,a}\boldsymbol{\alpha} & \text{vec}(\mathbf{R}_t) &= \mathbf{f}_{t,r} + \mathbf{D}_{t,r}\mathbf{r}\\
\text{vec}(\boldsymbol{\pi}) &= \mathbf{f}_{\pi} + \mathbf{D}_{\pi}\mathbf{p} & \text{vec}(\boldsymbol{\Lambda}) &= \mathbf{f}_{\lambda} + \mathbf{D}_{\lambda}\boldsymbol{\lambda}
\end{aligned}
$$

The internal MARSS model specification (element $marss in a fitted
marssMLE object output by a MARSS() call) is a list with the ft (“fixed”)
and D_t ("free") matrices for each parameter. The outputs from fitting are the
vectors β, υ, etc. The trick is to rewrite the user's linear multivariate problem
into the general form (Equation 3.1). MARSS does this using functions that
take more familiar arguments as input and then constructs the ft and Dt matrices. Because the ft and Dt can be whatever the user wants (assuming they
are the right shape), this allows users to include covariates, trends (linear,
sinusoidal, etc) or indicator variables in a variety of ways. It also means that
terms like 1 + b + 2c can appear in the parameter matrices.
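A sketch of inspecting those matrices for a fitted object (the element names follow the text above; exact internals may differ by package version):

```r
fit <- MARSS(dat)
names(fit$marss)     # internal model specification, incl. fixed and free
fit$marss$fixed$Q    # the f ("fixed") matrix for Q
fit$marss$free$Q     # the D ("free") matrix for Q
```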
Although the above form looks unusual, it is equivalent to the commonly
seen form but leads to a log-likelihood function where all terms have form Mm,


where M is a matrix and m is a column vector of only the different estimated
values. This makes it easy to do the partial differentiation with respect to
m necessary for the EM algorithm and as a result, easy to impose linear
constraints and structure on the elements in a parameter matrix (Holmes,
2012).

3.2 Maximum-likelihood parameter estimation
3.2.1 EM algorithm
Function MARSSkem in the MARSS package provides a maximum-likelihood
algorithm for parameter estimation based on an Expectation-Maximization
(EM) algorithm (Holmes, 2012). EM algorithms are widely used algorithms
that extend maximum-likelihood estimation to cases where there are hidden
random variables in a model (Dempster et al., 1977; Harvey, 1989; Harvey and
Shephard, 1993; McLachlan and Krishnan, 2008). Expectation-Maximization
algorithms for unconstrained MARSS models have been around for many years
and algorithms for certain constrained cases have also been published. What
makes the EM algorithm in MARSS different is that it is a general constrained
algorithm that allows generic linear constraints among matrix elements (thus
allows fixed, shared and linear combinations of estimated elements).
The EM algorithm finds the maximum-likelihood estimates of the parameters
in a MARSS model using an iterative process. Starting with an initial set of
parameters, which we will denote Θ̂₁ (you can choose these however you wish,
though choosing something not too far off from the correct values will make
the algorithm go faster), an updated parameter set Θ̂₂ is obtained by finding
the Θ̂₂ that maximizes the expected value of the likelihood over the
distribution of the states (X) conditioned on Θ̂₁. This distribution of states is
computed via the Kalman smoother (Section 3.3). Mathematically, each
iteration of an EM algorithm does this maximization:

$$
\hat{\Theta}_2 = \arg\max_{\Theta}\; \mathbb{E}_{X|\hat{\Theta}_1}\!\left[\log L(\Theta \mid Y = y_1^T, X)\right] \tag{3.2}
$$

Then using Θ̂₂, the distribution of X conditioned on Θ̂₂ is computed. Then that
distribution along with Θ̂₂ in place of Θ̂₁ is used in Equation (3.2) to produce an
updated parameter set Θ̂₃. This is repeated until the expected log-likelihood
stops increasing (or increases less than some set tolerance level).
Implementing this algorithm is straightforward, hence its popularity.
1. Set an initial set of parameters, Θ̂₁.
2. E step: using the model for the hidden states (X) and Θ̂₁, calculate the
expected values of X conditioned on all the data y_1^T; this is xtT output
by the Kalman smoother (function MARSSkf in MARSS). Also calculate the
expected values of any functions of X (or Y, if there are missing Y values)
that appear in your expected log-likelihood function.
3. M step: put those E(X | Y = y_1^T, Θ̂₁) and E(g(X) | Y = y_1^T, Θ̂₁) into
your expected log-likelihood function in place of X (and g(X)) and maximize
with respect to Θ. This gives you Θ̂₂.
4. Repeat the E and M steps until the log-likelihood stops increasing.
The EM equations used in the MARSS package (function MARSSkem) are
described in Holmes (2012) and are extensions of those in Shumway and Stoffer
(1982) and Ghahramani and Hinton (1996). Our EM algorithm is an extended
version because our algorithm is for cases where there are constraints within
the parameter matrices (shared values, linear combinations, diagonal structure, block-diagonal structure, ...), where there are fixed values within the
parameter matrices, or where there may be 0s on the diagonal of Q, R and Λ.
The EM algorithm is a hill-climbing algorithm and like all hill-climbing
algorithms can get stuck on local maxima. The MARSS package includes a
Monte-Carlo initial conditions searcher (function MARSSmcinit) based on Biernacki et al. (2003) to minimize this problem. EM algorithms are also known
to get close to the maximum very quickly but then creep toward the absolute
maximum. Once in the vicinity of the maximum, quasi-Newton methods find
the absolute maximum much faster, but they can be sensitive to initial conditions and in practice, we have found the EM algorithm to be much faster
for large problems.
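A sketch of using the Monte Carlo initial conditions search before an EM fit (workflow per the function descriptions in Chapter 2):

```r
obj <- MARSS(dat, fit = FALSE)  # properly specified, unfitted marssMLE object
obj <- MARSSmcinit(obj)         # Monte Carlo search for good initial conditions
fit <- MARSSkem(obj)            # EM fit starting from those inits
```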

3.3 Kalman filter and smoother
The Kalman filter (Kalman, 1960) is a recursive algorithm that solves for the
expected value of the hidden states (the X) in a MARSS model (Equation 1.1)
at time t conditioned on the data up to time t: E(X_t | y_1^t). The Kalman
filter gives the optimal (lowest mean square error) estimate of the unobserved
x_t based on the observed data up to time t for this class of linear dynamical
system. The Kalman smoother (Rauch et al., 1965) solves for the expected
value of the hidden state(s) conditioned on all the data: E(X_t | y_1^T). If the
errors in the stochastic process are Gaussian, then the estimators from the
Kalman filter and smoother are also the maximum-likelihood estimates.
However, even if the errors are not Gaussian, the estimators are optimal in the sense that they are estimators with the least variability possible.
This robustness is one reason the Kalman filter is so powerful—it provides
well-behaving estimates of the hidden states for all kinds of multivariate autoregressive processes, not just Gaussian processes. The Kalman filter and
smoother are widely used in time-series analysis, and there are many textbooks covering it and its applications. In the interest of giving the reader a
single point of reference, we use Shumway and Stoffer (2006) as our primary
reference.


The MARSSkf function provides the Kalman filter and smoother output using one of two algorithms (specified by fun.kf). The algorithm in MARSSkfss
is that shown in Shumway and Stoffer (2006). This algorithm is not computationally efficient; see Koopman et al. (1999, sec. 4.3) for a more efficient
Kalman filter implementation. The Koopman et al. implementation is provided in the functions MARSSkfas using the KFAS R package. MARSSkfss
(and MARSSkfas with a few exceptions) has the following outputs:
xtt1 The expected value of X_t conditioned on the data up to time t − 1.
xtt The expected value of X_t conditioned on the data up to time t.
xtT The expected value of X_t conditioned on all the data from time 1 to T.
This is the smoothed state estimate.
Vtt1 The variance of X_t conditioned on the data up to time t − 1. Denoted
P_t^{t−1} in section 6.2 in Shumway and Stoffer (2006).
Vtt The variance of X_t conditioned on the data up to time t. Denoted P_t^t
in section 6.2 in Shumway and Stoffer (2006).
VtT The variance of X_t conditioned on all the data from time 1 to T.
Vtt1T The lag-one covariance of X_t and X_{t−1} conditioned on all the data,
1 to T.
Kt The Kalman gain. This is part of the update equations and relates to the
amount xtt1 is updated by the data at time t to produce xtt. Not output
by MARSSkfas.
J This is similar to the Kalman gain but is part of the Kalman smoother. See
Equation 6.49 in Shumway and Stoffer (2006). Not output by MARSSkfas.
Innov The innovations at time t, defined as ε_t ≡ y_t − E(Y_t). These are the
residuals, the difference between the data and their predicted values. See
Equation 6.24 in Shumway and Stoffer (2006). Not output by MARSSkfas.
Sigma The Σ_t, the variance-covariance matrices for the innovations at time t.
This is used for the calculation of confidence intervals, the s.e. on the state
estimates, and the likelihood. See Equation 6.25 in Shumway and Stoffer
(2006) for the Σ_t calculation. Not output by MARSSkfas.
logLik The log-likelihood of the data conditioned on the model parameters.
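A sketch of pulling a few of these outputs (MARSSkfss returns the full set, including the items not output by MARSSkfas):

```r
kf <- MARSSkf(fit)     # default filter/smoother (MARSSkfas where possible)
kf$xtT                 # smoothed states
kf$logLik              # exact log-likelihood
ss <- MARSSkfss(fit)   # classic algorithm; also returns Innov, Sigma, Kt, J
ss$Innov               # innovations
```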

3.4 The exact likelihood
The likelihood of the data given a set of MARSS parameters is part of the
output of the MARSSkfss and MARSSkfas functions. The likelihood computation is based on the innovations form of the likelihood (Schweppe, 1965) and
uses the output from the Kalman filter:
\[
\log L(\Theta|\text{data}) = -\frac{N}{2}\log(2\pi) - \frac{1}{2}\left(\sum_{t=1}^{T}\log|\Sigma_t| + \sum_{t=1}^{T}\varepsilon_t^\top \Sigma_t^{-1}\varepsilon_t\right) \tag{3.3}
\]


where N is the total number of data points, ε_t is the innovations at time t and |Σ_t| is the determinant of the innovations variance-covariance matrix at time t. See Equation 6.62 in Shumway and Stoffer (2006). However, there are a few differences between the log-likelihood output by MARSSkf and MARSSkfas and that described in Shumway and Stoffer (2006).
The standard likelihood calculation (Equation 6.62 in Shumway and Stoffer
(2006)) is biased when there are missing values in the data, and the missing
data modifications discussed in Section 6.4 in Shumway and Stoffer (2006) do
not correct for this bias. Harvey (1989), Section 3.4.7, discusses at length that
the standard missing values correction leads to an inexact likelihood when
there are missing values. The bias is minor if there are few missing values, but
it becomes severe as the number of missing values increases. Many ecological
datasets have high fractions of missing values and this leads to a very biased
likelihood if one uses the inexact formula. Harvey (1989) provides some nontrivial ways to compute the exact likelihood.
We use instead the exact likelihood correction for missing values that is
presented in Section 12.3 in Brockwell and Davis (1991). This solution is
straightforward to implement. The correction involves the following changes to ε_t and Σ_t in Equation 3.3. Suppose the value y_{i,t} is missing. First, the corresponding i-th value of ε_t is set to 0. Second, the i-th diagonal value of Σ_t is set to 1, and the off-diagonal elements on the i-th row and i-th column are set to 0.
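As an illustration only (a sketch of the correction, not the package's internal code; eps, Sigma, and miss are hypothetical names), the modification for one time step could be written in R as:

# eps:   innovations vector at time t from the Kalman filter
# Sigma: innovations variance-covariance matrix at time t
# miss:  logical vector flagging the missing elements of y at time t
correct.for.missing = function(eps, Sigma, miss) {
  eps[miss] = 0           # set innovations for missing y values to 0
  Sigma[miss, ] = 0       # zero out the i-th rows ...
  Sigma[, miss] = 0       # ... and the i-th columns
  diag(Sigma)[miss] = 1   # set the i-th diagonal elements to 1
  list(eps = eps, Sigma = Sigma)
}

With these substitutions, Equation 3.3 can be evaluated as written and gives the exact likelihood.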

3.5 Parametric and innovations bootstrapping
Bootstrapping can be used to construct frequentist confidence intervals on
the parameter estimates (Stoffer and Wall, 1991) and to compute the smallsample AIC corrector for MARSS models (Cavanaugh and Shumway, 1997);
the functions MARSSparamCIs and MARSSaic do these computations.
The MARSSboot function provides both parametric and innovations bootstrapping of MARSS models. The innovations bootstrap algorithm of Stoffer and Wall (1991) bootstraps the model residuals (the innovations). This is a semi-parametric bootstrap since it uses, in part, the maximum-likelihood parameter estimates. This algorithm cannot be used if there are missing values in the data. Also, for short time series it gives biased bootstraps because one cannot resample the first few innovations.
MARSSboot also provides a fully parametric bootstrap. This uses the
maximum-likelihood MARSS parameters to simulate data from which bootstrap parameter estimates are obtained. Our research (Holmes and Ward,
2010) indicates that this provides unbiased bootstrap parameter estimates,
and it works with datasets with missing values. Lastly, MARSSboot can also
output parameters sampled from a numerically estimated Hessian matrix.
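For example (hypothetical calls; fit is a marssMLE object returned by MARSS()):

boot.par = MARSSboot(fit, nboot=1000, sim="parametric")    # parametric bootstrap
boot.inn = MARSSboot(fit, nboot=1000, sim="innovations")   # fails if the data have missing values
boot.hes = MARSSboot(fit, nboot=1000, sim="hessian")       # draws from a numerically estimated Hessian

Each call returns, among other things, the bootstrapped parameter sets in $boot.params.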


3.6 Simulation and forecasting
The MARSSsimulate function simulates from a fitted marssMLE object (e.g., the output from a MARSS() call). It uses the mvrnorm (package MASS) or rmvnorm (package mvtnorm) functions to produce draws of the process and observation errors from multivariate normal distributions at each time step.
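A minimal usage sketch (fit is a fitted marssMLE object; the call mirrors the one used in Chapter 5):

sim = MARSSsimulate(fit, tSteps=100, nsim=5)
dim(sim$sim.data)   # an n x tSteps x nsim array of simulated data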

3.7 Model selection
The package provides the MARSSaic function for computing AIC, AICc and
AICb. The latter is a small-sample corrector for autoregressive state-space
models. The bias problem with AIC and AICc for short time-series data
has been shown in Cavanaugh and Shumway (1997) and Holmes and Ward
(2010). AIC and AICc tend to select overly complex MARSS models when
the time-series data are short. AICb corrects this bias. The algorithm for a
non-parametric AICb is given in Cavanaugh and Shumway (1997). Their algorithm uses the innovations bootstrap (Stoffer and Wall, 1991), which means
it cannot be used when there are missing data. We added a parametric AICb
(Holmes and Ward, 2010), which uses a parametric bootstrap. This algorithm
allows one to compute AICb when there are missing data and it provides
unbiased AIC even for short time series. See Holmes and Ward (2010) for
discussion and testing of parametric AICb for MARSS models.
AICb consists of the familiar AIC fit term, −2 log L, plus a penalty term that is twice the mean difference between the log-likelihood of the data under the bootstrapped maximum-likelihood parameter estimates and the log-likelihood of the data under the original maximum-likelihood parameter estimates:

\[
\text{AICb} = -2\log L(\hat{\Theta}|y) + 2\left(\frac{1}{N_b}\sum_{i=1}^{N_b} -\log\frac{L(\hat{\Theta}^*(i)|y)}{L(\hat{\Theta}|y)}\right) \tag{3.4}
\]

where Θ̂ is the maximum-likelihood parameter set under the original data y,
Θ̂∗ (i) is a maximum-likelihood parameter set estimated from the i-th bootstrapped data set y∗ (i), and Nb is the number of bootstrap data sets. It is
important to notice that the likelihood in the AICb equation is L(Θ̂∗ |y) not
L(Θ̂∗ |y∗ ). In other words, we are taking the average of the likelihood of the
original data given the bootstrapped parameter sets.
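A hedged usage sketch, mirroring the MARSSaic() call used later in Chapter 5 (fit is a fitted marssMLE object; in practice nboot should be on the order of 1000):

fit.aicb = MARSSaic(fit, output="AICbp", Options=list(nboot=1000))
print(fit.aicb)   # the AICbp value is printed along with AIC and AICc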

Part II

Fitting models with MARSS

4
Fitting models: the MARSS() function

From the user perspective, the main package function is MARSS(). This fits a
MARSS model (Equation 1.1) to a matrix of data:
MARSS(data, model=list(), form="marxss")
The model argument is a list with names B, U, C, c, Q, Z, A, D, d, R, x0,
V0. Elements can be left off to use default values. The form argument tells
MARSS() how to use the model list elements. The default is form="marxss"
which is the model in Equation 1.1.
The data must be passed in as an n × T matrix; that is, time goes across columns. A vector is not a matrix, nor is a dataframe. A data matrix consisting of three time series (n = 3) with six time steps might look like

\[
y = \begin{bmatrix} 1 & 2 & NA & NA & 3.2 & 8 \\ 2 & 5 & 3 & NA & 5.1 & 5 \\ 1 & NA & 2 & 2.2 & NA & 7 \end{bmatrix}
\]

where NA denotes a missing value.
The argument model specifies the structure of the MARSS model. It is a list in which the list element for each model parameter specifies the form of that parameter.
The most general way to specify model structure is to use a list matrix. The list matrix allows one to combine fixed and estimated elements in one's parameter specification. It allows a one-to-one correspondence between how you write the parameter matrix on paper and how you specify it in R. For example, let's say Q and u have the following forms in your model:

\[
Q = \begin{bmatrix} q & 0 & 0 \\ 0 & q & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad u = \begin{bmatrix} 0.05 \\ u_1 \\ u_2 \end{bmatrix}
\]
So Q is a diagonal matrix with the 3rd variance fixed at 1 and the 1st and
2nd estimated and equal. The 1st element of u is fixed, and the 2nd and 3rd
are estimated and different. You can specify this using a list matrix:


Q=matrix(list("q",0,0,0,"q",0,0,0,1),3,3)
U=matrix(list(0.05,"u1","u2"),3,1)
If you print out Q and U, you will see they look exactly like Q and u written
above. MARSS will keep the fixed values fixed and estimate q, u1, and u2.
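For instance, printing Q at the R command line would show something like the following (R's display of a list matrix, with character names quoted and fixed values unquoted):

Q
     [,1] [,2] [,3]
[1,] "q"  0    0
[2,] 0    "q"  0
[3,] 0    0    1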
List matrices allow the most flexible model structures, but MARSS also has
text shortcuts for a number of common model structures. Below, the possible
ways to specify each model parameter are shown, using m = 3 (the number of
hidden state processes) and n = 3 (number of observation time series).

4.1 u, a and π model structures
u, a and π are all column matrices, and the options for specifying their structures are the same. a has one special option, "scaling", described below. The allowable structures are shown using u as an example. Note that you should be careful about specifying shared structure in π because you need to make sure the structure in Λ matches. For example, if you require that all the π values are shared (equal), then Λ cannot be a diagonal matrix since that would be saying that the π values are independent, which they are clearly not if you force them to be equal.
U=matrix(list(),m,1): This is the most general form and allows one to specify fixed and estimated elements in u. Each character string in u is the name of one of the u elements to be estimated. For example, if U=matrix(list(0.01,"u","u"),3,1), then u in the model has the following structure:

\[
\begin{bmatrix} 0.01 \\ u \\ u \end{bmatrix}
\]
U=matrix(c(),m,1), where the values in c() are all character strings: each character string is the name of an element to be estimated. For example, if U=matrix(c("u1","u1","u2"),3,1), then u in the model has the following structure:

\[
\begin{bmatrix} u_1 \\ u_1 \\ u_2 \end{bmatrix}
\]

with two values being estimated. U=matrix(list("u1","u1","u2"),3,1) has the same effect.
U="unequal" or U="unconstrained": Both of these stings indicate that
each element of u is estimated. If m = 3, then u would have the form:
 
u1
 u2 
u3


U="equal": There is only one value in u:
 
u
u
u
U=matrix(c(),m,1), where the values in c() are all numerical values: u is fixed and has no estimated values. If U=matrix(c(0.01,1,-0.5),3,1), then u in the model is:

\[
\begin{bmatrix} 0.01 \\ 1 \\ -0.5 \end{bmatrix}
\]

U=matrix(list(0.01,1,-0.5),3,1) would have the same effect.
U="zero": u is all zero:

 
0
0
0

The a parameter has a special option, "scaling", which is the default behavior. In this case, a is treated like a scaling parameter. If there is only one y row associated with an x row, then the corresponding a element is 0. If more than one y row is associated with an x row, then the first of those a elements is set to 0 and the others are estimated. For example, say m = 2 and n = 4 and Z looks like the following:

\[
Z = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

Then the 1st through 3rd rows of y are associated with the first row of x, and the 4th row of y is associated with the last row of x. If a is specified as "scaling", a has the following structure:

\[
\begin{bmatrix} 0 \\ a_1 \\ a_2 \\ 0 \end{bmatrix}
\]

4.2 Q, R, Λ model structures
The possible Q, R, and Λ model structures are identical, except that R is n × n
while Q and Λ are m × m. All types of structures can be specified using a list
matrix, but there are also text shortcuts for specifying common structures.
The structures are shown using Q as the example.


Q=matrix(list(),m,m): This is the most general way to specify the parameters and allows there to be fixed and estimated elements. Each character string in the list matrix is the name of one of the Q elements to be estimated, and each numerical value is a fixed value. For example, if Q=matrix(list("s2a",0,0,0,"s2a",0,0,0,"s2b"),3,3), then Q has the following structure:

\[
\begin{bmatrix} \sigma^2_a & 0 & 0 \\ 0 & \sigma^2_a & 0 \\ 0 & 0 & \sigma^2_b \end{bmatrix}
\]
Note that diag(c("s2a","s2a","s2b")) will not have the desired effect
of producing a matrix with numeric 0s on the off-diagonals. It will have
character 0s and MARSS will interpret “0” as the name of an element of
Q to be estimated. Instead, the following two lines can be used:
Q=matrix(list(0),3,3)
diag(Q)=c("s2a","s2a","s2b")
Q="diagonal and equal": There is only one process variance value in this
case:
 2

σ 0 0
 0 σ2 0 
0 0 σ2
Q="diagonal and unequal": There are m process variance values in this
case:
 2

σ1 0 0
 0 σ2 0 
2
0 0 σ23
Q="unconstrained": There are values on the diagonal and the off-diagonals
of Q and the variances and covariances are all different:
 2

σ1 σ1,2 σ1,3
 σ1,2 σ2 σ2,3 
2
σ1,3 σ2,3 σ23
There are m process variances and (m2 − m)/2 covariances in this case, so
(m2 + m)/2 values to be estimated. Note that variance-covariance matrices
are never truly unconstrained since the upper and lower triangles of the
matrix must be equal.
Q="equalvarcov": There is one process variance and one covariance:
 2

σ β β
 β σ2 β 
β β σ2


Q=matrix(c(), m, m), where all values in c() are character strings: Each
element in Q is estimated and each character string is the name of a value
to be estimated. Note if m = 1, you still need to wrap its value in matrix()
so that its class is matrix.
Q=matrix(c(), m, m), where all values in c() are numeric values: Each
element in Q is fixed to the values in the matrix.
Q="identity": The Q matrix is the identity matrix:


100
0 1 0
001
Q="zero": The Q matrix is all zeros:

00
0 0
00


0
0
0
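As promised above, here is a sketch of the "equalvarcov" structure built by hand as a list matrix (the names "s2" and "v" are arbitrary placeholders):

Q = matrix(list("v"), 3, 3)   # fill every element with the shared covariance name
diag(Q) = rep("s2", 3)        # put the shared variance name on the diagonal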

Be careful when setting Λ model structures. Mis-specifying the structure
of Λ can have catastrophic, but difficult to discern, effects on your estimates.
See the comments on priors in Chapter 1.

4.3 B model structures
Like the variance-covariance matrices (Q, R and Λ), B can be specified with
a list matrix to allow you to have both fixed and shared elements in the B
matrix. Character matrices and matrices with fixed values operate the same
way as for the variance-covariance matrices. In addition, the same text shortcuts are available: “unconstrained”, “identity”, “diagonal and equal”, “diagonal
and unequal”, “equalvarcov”, and “zero”. A fixed B can be specified with a
numeric matrix, but all eigenvalues must fall within the unit circle; meaning
all(abs(eigen(B)$values)<=1) must be true.
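For example, a hypothetical fixed B could be checked before fitting:

B = matrix(c(0.8, 0.1, 0.05, 0.7), 2, 2)   # hypothetical fixed B matrix
all(abs(eigen(B)$values) <= 1)             # TRUE, so this B is allowed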

4.4 Z model
Like B and the variance-covariance matrices, Z can be specified with a list matrix to allow you to have both fixed and estimated elements in Z. If Z is a square matrix, many of the same text shortcuts are available: "diagonal and equal", "diagonal and unequal", and "equalvarcov". If Z is a design matrix (a matrix with only 0s and 1s and where the row sums are all equal to 1), then a special shortcut is available using factor(), which allows you to specify which y rows are associated with which x rows. See Chapter ?? and the case studies for more examples.


Z=factor(c(1,1,1)): All y time series are observing the same (and only) hidden state trajectory x (n = 3 and m = 1):

\[
Z = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
\]
Z=factor(c(1,2,3)): Each time series in y corresponds to a different hidden state trajectory. This is the default Z model, and in this case n = m:

\[
Z = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Z=factor(c(1,1,2)): The first two time series in y correspond to one hidden state trajectory and the third y time series corresponds to a different hidden state trajectory. Here n = 3 and m = 2:

\[
Z = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

The Z model can be specified using either numeric or character factor levels; c(1,1,2) is the same as c("north","north","south").
Z="identity": This is the default behavior. This means Z is a n×n identity
matrix and m = n. If n = 3, it is the same as Z=factor(c(1,2,3)).
Z=matrix(c(), n, m), where the elements in c() are all strings: Passing
in a n × m character matrix, means that each character string is a value
to be estimated. Be careful that you are specifying an identifiable model
when using this option.
Z=matrix(c(), n, m), where the elements in c() are all numeric: Passing
in a n × m numeric matrix means that Z is fixed to the values in the matrix.
The matrix must be numeric but it does not need to be a design matrix.
Z=matrix(list(), n, m): Passing in a n × m list matrix allows you to
combine fixed and estimated values in the Z matrix. Be careful that you
are specifying an identifiable model.

4.5 Default model structures
The defaults for the model arguments in form="marxss" are:

Z="identity" each y in y corresponds to one x in x
B="identity" no interactions among the x's in x
U="unequal" the u's in u are all different
Q="diagonal and unequal" process errors are independent but have different variances
R="diagonal and equal" the observations are i.i.d.
A="scaling" a is a set of scaling factors
C="zero", c="zero", D="zero" and d="zero" no inputs
pi="unequal" all initial states are different
V0="zero" the initial condition on the states (x0 or x1) is fixed but unknown
tinitx=0 the initial state refers to t = 0 instead of t = 1

A call that spells out all of these defaults is sketched below.
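For example, the following sketch makes the marxss defaults explicit and is equivalent to MARSS(dat); note that π is specified via the x0 element of the model list:

kemfit = MARSS(dat, model=list(Z="identity", B="identity", U="unequal",
   Q="diagonal and unequal", R="diagonal and equal", A="scaling",
   x0="unequal", V0="zero", tinitx=0))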

5
Examples

In this chapter, we work through a series of short examples using the MARSS
package functions. This chapter is oriented towards those who are already
somewhat familiar with MARSS models and want to get started quickly. We
provide little explanatory text. Those unfamiliar with MARSS models might
want to start with the applications.
In these examples, we will use the default form="marxss" argument for a
MARSS() call. This specifies a MARSS model of the form:
\[
x_t = B_t x_{t-1} + u_t + C_t c_t + w_t, \text{ where } w_t \sim \text{MVN}(0, Q_t) \tag{5.1a}
\]
\[
y_t = Z_t x_t + a_t + D_t d_t + v_t, \text{ where } v_t \sim \text{MVN}(0, R_t) \tag{5.1b}
\]
\[
x_1 \sim \text{MVN}(\pi, \Lambda) \text{ or } x_0 \sim \text{MVN}(\pi, \Lambda) \tag{5.1c}
\]

The c and d are inputs (not estimated). In the examples here, we leave off c
and d, and we address including inputs only briefly at the end of the chapter.
See Chapter 12 for extended examples of including covariates as inputs in a
MARSS model.

5.1 Fixed and estimated elements in parameter matrices
Suppose one has a MARSS model (Equation 5.1) with the following structure:

\[
\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} =
\begin{bmatrix} b_1 & 0.1 \\ b_2 & 2 \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} +
\begin{bmatrix} u \\ u \end{bmatrix} +
\begin{bmatrix} w_{1,t} \\ w_{2,t} \end{bmatrix}, \quad
w_t \sim \text{MVN}\!\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} q_1 & q_3 \\ q_3 & q_2 \end{bmatrix}\right)
\]

\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ y_{3,t} \end{bmatrix} =
\begin{bmatrix} z_1 & 0 \\ z_2 & z_2 \\ 0 & 3 \end{bmatrix}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} +
\begin{bmatrix} v_{1,t} \\ v_{2,t} \\ v_{3,t} \end{bmatrix}, \quad
v_t \sim \text{MVN}\!\left(\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} r & 0 & 0 \\ 0 & r & 0 \\ 0 & 0 & 1 \end{bmatrix}\right)
\]

\[
x_0 \sim \text{MVN}\!\left(\begin{bmatrix} \pi_1 \\ \pi_2 \end{bmatrix},
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right)
\]


Notice how this model mixes fixed values, estimated values, and shared values. In MARSS, model structure is specified using a list with the names Z, A, R, B, U, Q, x0 and V0. Each element is a matrix (class matrix) with the same dimensions as the matrix of the same name in the MARSS model. MARSS distinguishes between the estimated and fixed values in a matrix by using a list matrix, in which you can have both numeric and character elements. Numeric elements are fixed; character elements are names of things to be estimated.
The model above would be specified as:
Z=matrix(list("z1","z2",0,0,"z2",3),3,2)
A=matrix(0,3,1)
R=matrix(list(0),3,3); diag(R)=c("r","r",1)
B=matrix(list("b1",0.1,"b2",2),2,2)
U=matrix(c("u","u"),2,1)
Q=matrix(c("q1","q3","q3","q2"),2,2)
x0=matrix(c("pi1","pi2"),2,1)
V0=diag(1,2)
model.gen=list(Z=Z,A=A,R=R,B=B,U=U,Q=Q,x0=x0,V0=V0,tinitx=0)
Notice that there is a one-to-one correspondence between the model list in R
and the model on paper. Fitting the model is then just a matter of passing
the data and model list to the MARSS function:
kemfit = MARSS(dat, model=model.gen)
If you work often with MARSS models then you will probably know
whether prior sensitivity is a problem for your types of MARSS applications. If
so, note that the MARSS package is unusual in that it allows you to set Λ = 0
and treat x0 as an unknown estimated parameter. This eliminates the prior
and thus the prior sensitivity problems—at the cost of adding m parameters.
Depending on your application, you may need to set the initial conditions at
t = 1 instead of the default of t = 0. If you are unsure, look in the index and
read all the sections that talk about troubleshooting priors.

5.2 Different numbers of state processes
Here we show a series of short examples using a dataset on Washington harbor
seals (?harborSealWA), which has five observation time series. The dataset is
a little unusual in that it has four missing years from years 2 to 5. This causes
some interesting issues with prior specification. Before starting the harbor
seal examples, we set up the data, making time go across the columns and
removing the year column:
dat = t(harborSealWA)
dat = dat[2:nrow(dat),] #remove the year row


5.2.1 One hidden state process for each observation time series
This is the default model for the MARSS() function. In this case, n = m, the
observation errors are i.i.d. and the process errors are independent and have
different variances. The elements in u are all different (meaning, they are not
forced to be the same). Mathematically, the MARSS model being fit is:

\[
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \\ x_{5,t} \end{bmatrix} =
\begin{bmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&1 \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \\ x_{5,t-1} \end{bmatrix} +
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix} +
\begin{bmatrix} w_{1,t} \\ w_{2,t} \\ w_{3,t} \\ w_{4,t} \\ w_{5,t} \end{bmatrix}, \quad
w_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} q_1&0&0&0&0 \\ 0&q_2&0&0&0 \\ 0&0&q_3&0&0 \\ 0&0&0&q_4&0 \\ 0&0&0&0&q_5 \end{bmatrix}\right)
\]

\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ y_{3,t} \\ y_{4,t} \\ y_{5,t} \end{bmatrix} =
\begin{bmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&1 \end{bmatrix}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \\ x_{5,t} \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} +
\begin{bmatrix} v_{1,t} \\ v_{2,t} \\ v_{3,t} \\ v_{4,t} \\ v_{5,t} \end{bmatrix}, \quad
v_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} r&0&0&0&0 \\ 0&r&0&0&0 \\ 0&0&r&0&0 \\ 0&0&0&r&0 \\ 0&0&0&0&r \end{bmatrix}\right)
\]

This is the default model, so you can fit it by simply passing dat to MARSS().
kemfit = MARSS(dat)
Success! abstol and log-log tests passed at 38 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 38 iterations.
Log-likelihood: 19.13428
AIC: -6.268557
AICc: 3.805517
                    Estimate
R.diag               0.00895
U.X.SJF              0.06839
U.X.SJI              0.07163
U.X.EBays            0.04179
U.X.PSnd             0.05226
U.X.HC              -0.00279
Q.(X.SJF,X.SJF)      0.03205
Q.(X.SJI,X.SJI)      0.01098
Q.(X.EBays,X.EBays)  0.00706
Q.(X.PSnd,X.PSnd)    0.00414
Q.(X.HC,X.HC)        0.05450
x0.X.SJF             5.98647
x0.X.SJI             6.72487
x0.X.EBays           6.66212
x0.X.PSnd            5.83969
x0.X.HC              6.60482

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
The output warns you that the convergence tolerance is high. You can set it
lower by passing in control=list(conv.test.slope.tol=0.1). MARSS() is
automatically creating parameter names since you did not tell it the names.
To see exactly where each parameter element appears in its parameter matrix,
type summary(kemfit$model).
Though it is not necessary to specify the model for this example since it
is the default, here is how you could do so using matrices:
B=Z=diag(1,5)
U=matrix(c("u1","u2","u3","u4","u5"),5,1)
x0=A=matrix(0,5,1)
R=Q=matrix(list(0),5,5)
diag(R)="r"
diag(Q)=c("q1","q2","q3","q4","q5")
Notice that when a matrix has both fixed and estimated elements (like R and
Q), a list matrix is used to allow you to specify the fixed elements as numeric
and to give the estimated elements character names.
The default MLE method is the EM algorithm (method="kem"). You can
also use a quasi-Newton method (BFGS) by setting method="BFGS".
kemfit.bfgs = MARSS(dat, method="BFGS")
Success! Converged in 99 iterations.
Function MARSSkfas used for likelihood calculation.
MARSS fit is
Estimation method: BFGS
Estimation converged in 99 iterations.
Log-likelihood: 19.13936
AIC: -6.278712
AICc: 3.795362

                    Estimate
R.diag               0.00849
U.X.SJF              0.06838
U.X.SJI              0.07152
U.X.EBays            0.04188
U.X.PSnd             0.05233
U.X.HC              -0.00271
Q.(X.SJF,X.SJF)      0.03368
Q.(X.SJI,X.SJI)      0.01124
Q.(X.EBays,X.EBays)  0.00722
Q.(X.PSnd,X.PSnd)    0.00437
Q.(X.HC,X.HC)        0.05600
x0.X.SJF             5.98437
x0.X.SJI             6.72169
x0.X.EBays           6.65689
x0.X.PSnd            5.83527
x0.X.HC              6.60425
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Using the default EM convergence criteria, the EM algorithm stops at a log-likelihood a little lower than the one the BFGS algorithm reaches, but the EM algorithm was faster (11.6 times faster in this case). If you want to use the EM fit as the initial conditions, pass in the inits argument using the $par element (or coef(fit, form="marss")) of the EM fit.
kemfit.bfgs2 = MARSS(dat, method="BFGS", inits=kemfit$par)
The BFGS algorithm now converges in 107 iterations. Output not shown.
We mentioned that the missing years from year 2 to 4 create an interesting issue with the prior specification. The default behavior of MARSS is to treat the initial state as at t = 0 instead of t = 1. Usually this doesn't make a difference, but for this dataset, if we set the prior at t = 1, the MLE estimate of R becomes 0. If we estimate x1 as a parameter and let R go to 0, the likelihood will go to infinity (slowly but surely). This is neither an error nor a pathology, but it is probably not what you would like to have happen. Note that the BFGS algorithm will not find the maximum in this case; it will stop before R gets small and the likelihood gets very large. However, the EM algorithm will climb up the peak. You can try it by running the following code. It will report warnings, which you can read about in Appendix B.
kemfit.strange = MARSS(dat, model=list(tinitx=1))
5.2.2 Five correlated hidden state processes
This is the same model except that the five hidden states have correlated
process errors. Mathematically, this is the model:

\[
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \\ x_{5,t} \end{bmatrix} =
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \\ x_{5,t-1} \end{bmatrix} +
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix} +
\begin{bmatrix} w_{1,t} \\ w_{2,t} \\ w_{3,t} \\ w_{4,t} \\ w_{5,t} \end{bmatrix}, \quad
w_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} q_1 & c_{1,2} & c_{1,3} & c_{1,4} & c_{1,5} \\
c_{1,2} & q_2 & c_{2,3} & c_{2,4} & c_{2,5} \\
c_{1,3} & c_{2,3} & q_3 & c_{3,4} & c_{3,5} \\
c_{1,4} & c_{2,4} & c_{3,4} & q_4 & c_{4,5} \\
c_{1,5} & c_{2,5} & c_{3,5} & c_{4,5} & q_5 \end{bmatrix}\right)
\]

\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ y_{3,t} \\ y_{4,t} \\ y_{5,t} \end{bmatrix} =
\begin{bmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&1 \end{bmatrix}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \\ x_{5,t} \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} +
\begin{bmatrix} v_{1,t} \\ v_{2,t} \\ v_{3,t} \\ v_{4,t} \\ v_{5,t} \end{bmatrix}, \quad
v_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} r&0&0&0&0 \\ 0&r&0&0&0 \\ 0&0&r&0&0 \\ 0&0&0&r&0 \\ 0&0&0&0&r \end{bmatrix}\right)
\]

B is not shown in the top equation; it is an m × m identity matrix. To fit, use MARSS() with the model argument set. The output is not shown here, but it will appear if you type this on the R command line.
kemfit = MARSS(dat, model=list(Q="unconstrained"))
This shows one of the text shortcuts, "unconstrained", which means that all elements in the matrix are estimated. This shortcut can be used for all parameter matrices.
5.2.3 Five equally correlated hidden state processes
This is the same model except that now there is only one process error variance and one process error covariance. Mathematically, the model is:

\[
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \\ x_{5,t} \end{bmatrix} =
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \\ x_{5,t-1} \end{bmatrix} +
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix} +
\begin{bmatrix} w_{1,t} \\ w_{2,t} \\ w_{3,t} \\ w_{4,t} \\ w_{5,t} \end{bmatrix}, \quad
w_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} q&c&c&c&c \\ c&q&c&c&c \\ c&c&q&c&c \\ c&c&c&q&c \\ c&c&c&c&q \end{bmatrix}\right)
\]

\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ y_{3,t} \\ y_{4,t} \\ y_{5,t} \end{bmatrix} =
\begin{bmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&1 \end{bmatrix}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \\ x_{5,t} \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} +
\begin{bmatrix} v_{1,t} \\ v_{2,t} \\ v_{3,t} \\ v_{4,t} \\ v_{5,t} \end{bmatrix}, \quad
v_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} r&0&0&0&0 \\ 0&r&0&0&0 \\ 0&0&r&0&0 \\ 0&0&0&r&0 \\ 0&0&0&0&r \end{bmatrix}\right)
\]

Again, B is not shown in the top equation; it is an m × m identity matrix. To fit, use the following code (output not shown):
The shortcut "equalvarcov" means one value on the diagonal and one on the off-diagonals. It can be used for all square matrices (B, Q, R, and Λ).


5.2.4 Five hidden state processes with "north" and "south" u and Q elements

Here we fit a model with five independent hidden states where each observation time series is an independent observation of a different hidden trajectory, but hidden trajectories 1-3 share their u and Q elements while hidden trajectories 4-5 share theirs. This is the model:

\[
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \\ x_{5,t} \end{bmatrix} =
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \\ x_{5,t-1} \end{bmatrix} +
\begin{bmatrix} u_n \\ u_n \\ u_n \\ u_s \\ u_s \end{bmatrix} +
\begin{bmatrix} w_{1,t} \\ w_{2,t} \\ w_{3,t} \\ w_{4,t} \\ w_{5,t} \end{bmatrix}, \quad
w_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} q_n&0&0&0&0 \\ 0&q_n&0&0&0 \\ 0&0&q_n&0&0 \\ 0&0&0&q_s&0 \\ 0&0&0&0&q_s \end{bmatrix}\right)
\]

\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ y_{3,t} \\ y_{4,t} \\ y_{5,t} \end{bmatrix} =
\begin{bmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&1 \end{bmatrix}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \\ x_{5,t} \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} +
\begin{bmatrix} v_{1,t} \\ v_{2,t} \\ v_{3,t} \\ v_{4,t} \\ v_{5,t} \end{bmatrix}, \quad
v_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} r&0&0&0&0 \\ 0&r&0&0&0 \\ 0&0&r&0&0 \\ 0&0&0&r&0 \\ 0&0&0&0&r \end{bmatrix}\right)
\]

To fit this model, we specify the model argument for u and Q using list matrices. List matrices allow us to combine numeric and character values in a matrix. MARSS will interpret the numeric values as fixed and the character values as parameters to be estimated. Parameters with the same name are constrained to be identical.
regions=list("N","N","N","S","S")
U=matrix(regions,5,1)
Q=matrix(list(0),5,5); diag(Q)=regions
kemfit = MARSS(dat, model=list(U=U, Q=Q))
Only u and Q need to be specified since the other parameters are at their
default values.
5.2.5 Fixed observation error variance
Here we fit the same model but with a known observation error variance. This is the model:

\[
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \\ x_{5,t} \end{bmatrix} =
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \\ x_{5,t-1} \end{bmatrix} +
\begin{bmatrix} u_n \\ u_n \\ u_n \\ u_s \\ u_s \end{bmatrix} +
\begin{bmatrix} w_{1,t} \\ w_{2,t} \\ w_{3,t} \\ w_{4,t} \\ w_{5,t} \end{bmatrix}, \quad
w_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} q_n&0&0&0&0 \\ 0&q_n&0&0&0 \\ 0&0&q_n&0&0 \\ 0&0&0&q_s&0 \\ 0&0&0&0&q_s \end{bmatrix}\right)
\]

\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ y_{3,t} \\ y_{4,t} \\ y_{5,t} \end{bmatrix} =
\begin{bmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&1 \end{bmatrix}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \\ x_{5,t} \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} +
\begin{bmatrix} v_{1,t} \\ v_{2,t} \\ v_{3,t} \\ v_{4,t} \\ v_{5,t} \end{bmatrix}, \quad
v_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} 0.01&0&0&0&0 \\ 0&0.01&0&0&0 \\ 0&0&0.01&0&0 \\ 0&0&0&0.01&0 \\ 0&0&0&0&0.01 \end{bmatrix}\right)
\]

To fit this model, use the following code (output not shown):
regions=list("N","N","N","S","S")
U=matrix(regions,5,1)
Q=matrix(list(0),5,5); diag(Q)=regions
R=diag(0.01,5)
kemfit = MARSS(dat, model=list(U=U, Q=Q, R=R))
5.2.6 One hidden state and five i.i.d. observation time series
Instead of five hidden state trajectories, we specify that there is only one and
all the observations are of that one trajectory. Mathematically, the model is:
\[
x_t = x_{t-1} + u + w_t, \quad w_t \sim \text{N}(0, q)
\]
\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ y_{3,t} \\ y_{4,t} \\ y_{5,t} \end{bmatrix} =
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} x_t +
\begin{bmatrix} 0 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} +
\begin{bmatrix} v_{1,t} \\ v_{2,t} \\ v_{3,t} \\ v_{4,t} \\ v_{5,t} \end{bmatrix}, \quad
v_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} r&0&0&0&0 \\ 0&r&0&0&0 \\ 0&0&r&0&0 \\ 0&0&0&r&0 \\ 0&0&0&0&r \end{bmatrix}\right)
\]

Note that the default model for R is "diagonal and equal", so we can leave it off when specifying the model argument. To fit, use this code:


Z=factor(c(1,1,1,1,1))
kemfit = MARSS(dat, model=list(Z=Z))
Success! abstol and log-log tests passed at 28 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 28 iterations.
Log-likelihood: 3.593276
AIC: 8.813447
AICc: 11.13603
        Estimate
A.SJI    0.80153
A.EBays  0.28245
A.PSnd  -0.54802
A.HC    -0.62665
R.diag   0.04523
U.U      0.04759
Q.Q      0.00429
x0.x0    6.39199
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
You can also pass in Z exactly as it is in the equation: Z=matrix(1,5,1). The factor shorthand is handy, however, if you need to assign different observed time series to different underlying state time series (see the next examples). The default a form is "scaling", which means that the first y row associated with a given x has a = 0 and the rest are estimated.
5.2.7 One hidden state and five independent observation time
series with different variances
Mathematically, this model is:
\[
x_t = x_{t-1} + u + w_t, \quad w_t \sim \text{N}(0, q)
\]
\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ y_{3,t} \\ y_{4,t} \\ y_{5,t} \end{bmatrix} =
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} x_t +
\begin{bmatrix} 0 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} +
\begin{bmatrix} v_{1,t} \\ v_{2,t} \\ v_{3,t} \\ v_{4,t} \\ v_{5,t} \end{bmatrix}, \quad
v_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} r_1&0&0&0&0 \\ 0&r_2&0&0&0 \\ 0&0&r_3&0&0 \\ 0&0&0&r_4&0 \\ 0&0&0&0&r_5 \end{bmatrix}\right)
\]

To fit this model:
Z=factor(c(1,1,1,1,1))
R="diagonal and unequal"
kemfit = MARSS(dat, model=list(Z=Z, R=R))
Success! abstol and log-log tests passed at 24 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 24 iterations.
Log-likelihood: 16.66199
AIC: -9.323982
AICc: -3.944671
                Estimate
A.SJI            0.79555
A.EBays          0.27540
A.PSnd          -0.53694
A.HC            -0.60874
R.(SJF,SJF)      0.03229
R.(SJI,SJI)      0.03528
R.(EBays,EBays)  0.01352
R.(PSnd,PSnd)    0.01082
R.(HC,HC)        0.19609
U.U              0.05270
Q.Q              0.00604
x0.x0            6.26676
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
5.2.8 Two hidden state processes
Here we fit a model with two hidden states (north and south) where observation time series 1-3 are for the north and 4-5 are for the south. We make the
hidden state processes independent (meaning a diagonal Q matrix) but with
the same process variance. We make the observation errors i.i.d. (the default)
and the u elements equal. Mathematically, this is the model:

\[
\begin{bmatrix} x_{n,t} \\ x_{s,t} \end{bmatrix} =
\begin{bmatrix} x_{n,t-1} \\ x_{s,t-1} \end{bmatrix} +
\begin{bmatrix} u \\ u \end{bmatrix} +
\begin{bmatrix} w_{n,t} \\ w_{s,t} \end{bmatrix}, \quad
w_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} q & 0 \\ 0 & q \end{bmatrix}\right)
\]

\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ y_{3,t} \\ y_{4,t} \\ y_{5,t} \end{bmatrix} =
\begin{bmatrix} 1&0 \\ 1&0 \\ 1&0 \\ 0&1 \\ 0&1 \end{bmatrix}
\begin{bmatrix} x_{n,t} \\ x_{s,t} \end{bmatrix} +
\begin{bmatrix} 0 \\ a_2 \\ a_3 \\ 0 \\ a_5 \end{bmatrix} +
\begin{bmatrix} v_{1,t} \\ v_{2,t} \\ v_{3,t} \\ v_{4,t} \\ v_{5,t} \end{bmatrix}, \quad
v_t \sim \text{MVN}\!\left(0,
\begin{bmatrix} r&0&0&0&0 \\ 0&r&0&0&0 \\ 0&0&r&0&0 \\ 0&0&0&r&0 \\ 0&0&0&0&r \end{bmatrix}\right)
\]

To fit the model, use the following code (output not shown):

Z=factor(c("N","N","N","S","S"))
Q="diagonal and equal"
U="equal"
kemfit = MARSS(dat, model=list(Z=Z,Q=Q,U=U))

You can also pass in Z exactly as it is in the equation as a numeric matrix (matrix(c(1,1,1,0,0,0,0,0,1,1),5,2)); the factor notation is a shortcut for making a design matrix (as Z is in these examples). "equal" is a shortcut meaning all elements in a matrix are constrained to be equal. It can be used for all column matrices (a, u and π). "diagonal and equal" can be used as a shortcut for all square matrices (B, Q, R, and Λ).

5.3 Time-varying parameters
Time-varying parameters are specified by passing in an array of matrices (list,
numeric or character) where the 3rd dimension of the array is time and must
be the same value as the 2nd (time) dimension of the data matrix. No text
shortcuts are allowed for time-varying parameters; you need to use the matrix
form.
For example, let’s say we wanted a different u for the first half versus
second half of the harbor seal time series. We would pass in an array for u as
follows:
U1=matrix("t1",5,1); U2=matrix("t2",5,1)
Ut=array(U2,dim=c(dim(U1),dim(dat)[2]))
TT=dim(dat)[2]
Ut[,,1:floor(TT/2)]=U1
kemfit.tv=MARSS(dat,model=list(U=Ut,Q="diagonal and equal"))
You can have some elements in a parameter matrix be time-constant and some
be time-varying:
U1=matrix(c(rep("t1",4),"hc"),5,1); U2=matrix(c(rep("t2",4),"hc"),5,1)
Ut=array(U2,dim=c(dim(U1),dim(dat)[2]))


Ut[,,1:floor(TT/2)]=U1
kemfit.tv=MARSS(dat,model=list(U=Ut,Q="diagonal and equal"))
Note that the time-varying model is specified for MARSS the same way you would write the time-varying model on paper in matrix math form.

5.4 Including inputs (or covariates)
In MARSS models with covariates, the covariates are often treated as inputs
and appear as either the c or d in Equation 5.1, depending on the application.
However, more generally, c and d are simply inputs that are fully-known (no
missing values). ct is the p × 1 vector of inputs at time t which affect the states
and dt is a q × 1 vector of inputs (potentially the same as ct ), which affect the
observations.
Ct is an m × p matrix of coefficients relating the effects of ct to the m × 1
state vector xt , and Dt is an n × q matrix of coefficients relating the effects of dt
to the n × 1 observation vector yt . The elements of C and D can be estimated,
and their form is specified much like the other matrices.
With the MARSS() function, one can fit a model with inputs by simply passing in model$c and/or model$d in the MARSS() call as a p × T or q × T matrix, respectively. The form for Ct and Dt is similarly specified by passing in model$C and/or model$D. If C and D are not time-varying, they are passed in as a matrix. If they are time-varying, they must be passed in as a 3-dimensional array with the 3rd dimension equal to the number of time steps.
See Chapter 12 for extended examples of including covariates as inputs in
a MARSS model.

5.5 Printing and summarizing models and model fits
The package includes print functions for marssMODEL objects and marssMLE
objects (fitted models).
print(kemfit)
print(kemfit$model)
This will print the basic information on model structure and model fit that
you have seen in the previous examples.
The package also includes a summary function for models.
summary(kemfit$model)
Output is not shown because it is verbose, but it prints each matrix with the
fixed elements denoted with their values and the free elements denoted by
their names. This is very helpful for confirming exactly what model structure
you are fitting to the data.


The print function will also print various other things like a vector of the
estimated parameters, the estimated states, the state standard errors, etc.,
using the what argument in the print call:
print(kemfit, what="par")
List of the estimated values in each parameter matrix
$Z
     [,1]

$A
            [,1]
SJI   0.79786453
EBays 0.27743474
HC   -0.07035021

$R
           [,1]
diag 0.03406192

$B
     [,1]

$U
        [,1]
1 0.04317641

$Q
            [,1]
diag 0.007669608

$x0
      [,1]
N 6.172048
S 6.206155

$V0
     [,1]

$G
     [,1]

$H
     [,1]

$C
     [,1]

$D
     [,1]

$c
     [,1]

$d
     [,1]
print(kemfit, what="Q")
Parameter matrix Q
            [,1]        [,2]
[1,] 0.007669608 0.000000000
[2,] 0.000000000 0.007669608
Type ?print.MARSS to see a list of the types of output that can be printed
with a print call. If you want to use the output from print instead of printing,
then assign the print call to a value:
x=print(kemfit, what="states",silent=TRUE)
x
      [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
N 6.215483 6.329702 6.443921 6.558140 6.672359 6.786578
S 6.249445 6.295591 6.341736 6.387881 6.434027 6.480172
      [,7]     [,8]     [,9]    [,10]    [,11]    [,12]
N 6.904124 6.944425 6.976697 7.050053 7.156567 7.198947
S 6.526317 6.572463 6.613358 6.654252 6.695147 6.736042
     [,13]    [,14]    [,15]    [,16]    [,17]    [,18]
N 7.228397 7.293141 7.380439 7.467975 7.488458 7.541996
S 6.776937 6.817832 6.786202 6.764235 6.786233 6.816405
     [,19]    [,20]    [,21]    [,22]
N 7.561182 7.524175 7.475514 7.459263
S 6.846578 6.813743 6.791537 6.819195

5.6 Confidence intervals on a fitted model
The function MARSSparamCIs() is used to compute confidence intervals with a
default alpha level of 0.05. The function can compute approximate confidence
intervals using a numerically estimated Hessian matrix (method="hessian")
or via parametric (method="parametric") or non-parametric (method="innovations")
bootstrapping.


5.6.1 Approximate confidence intervals from a Hessian matrix
The default method for MARSSparamCIs is to use a numerically estimated
Hessian matrix:
kem.with.hess.CIs = MARSSparamCIs(kemfit)
Use print or just type the marssMLE object name to see the confidence
intervals:
print(kem.with.hess.CIs)
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 22 iterations.
Log-likelihood: 7.949236
AIC: 0.1015284
AICc: 2.424109
         ML.Est Std.Err   low.CI  up.CI
A.SJI   0.79786  0.0615  0.67729 0.9184
A.EBays 0.27743  0.0625  0.15487 0.4000
A.HC   -0.07035  0.0888 -0.24444 0.1037
R.diag  0.03406  0.0175  0.02256 0.0479
U.1     0.04318  0.0144  0.01500 0.0714
Q.diag  0.00767  0.0235  0.00173 0.0179
x0.N    6.17205  0.1455  5.88696 6.4571
x0.S    6.20615  0.1571  5.89828 6.5140
CIs calculated at alpha = 0.05 via method=hessian
5.6.2 Confidence intervals from a parametric bootstrap
Use method="parametric" to use a parametric bootstrap to compute confidence intervals and bias using a parametric bootstrap.
kem.w.boot.CIs=MARSSparamCIs(kemfit,method="parametric",nboot=10)
#nboot should be more like 1000, but set low for example's sake
print(kem.w.boot.CIs)
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 22 iterations.
Log-likelihood: 7.949236
AIC: 0.1015284
AICc: 2.424109
         ML.Est Std.Err  low.CI  up.CI  Est.Bias
A.SJI   0.79786 0.05472  0.7127 0.8721  0.017888
A.EBays 0.27743 0.06704  0.1814 0.3635  0.022826
A.HC   -0.07035 0.09814 -0.2260 0.0492  0.010760
R.diag  0.03406 0.00833  0.0217 0.0484  0.000993
U.1     0.04318 0.01982  0.0303 0.0850 -0.007329
Q.diag  0.00767 0.00523  0.0000 0.0135  0.001860
x0.N    6.17205 0.27866  5.8029 6.6966 -0.034773
x0.S    6.20615 0.49057  5.2721 6.7419  0.188819
        Unbias.Est
A.SJI      0.81575
A.EBays    0.30026
A.HC      -0.05959
R.diag     0.03505
U.1        0.03585
Q.diag     0.00953
x0.N       6.13727
x0.S       6.39497
CIs calculated at alpha = 0.05 via method=parametric
Bias calculated via parametric bootstrapping with 10 bootstraps.

5.7 Vectors of just the estimated parameters
Often it is useful to have a vector of the estimated parameters. For example,
if you are writing a call to optim, you will need a vector of just the estimated
parameters. You can use the function coef:
parvec=coef(kemfit, type="vector")
parvec
       A.SJI      A.EBays         A.HC       R.diag
 0.797864531  0.277434738 -0.070350207  0.034061922
         U.1       Q.diag         x0.N         x0.S
 0.043176408  0.007669608  6.172047633  6.206154697

5.8 Kalman filter and smoother output
All the standard Kalman filter and smoother output (along with the lag-one
covariance smoother output) is available using the MARSSkf function. Read
the help file (?MARSSkf) for details and meanings of the names in the output
list.
kf=MARSSkf(kemfit)
names(kf)

 [1] "xtT"        "VtT"        "Vtt1T"      "x0T"
 [5] "V0T"        "x10T"       "V10T"       "x00T"
 [9] "V00T"       "Vtt"        "Vtt1"       "J"
[13] "J0"         "Kt"         "xtt1"       "xtt"
[17] "Innov"      "Sigma"      "kfas.model" "logLik"
[21] "ok"         "errors"

#if you only need the logLik, this is the fast way to get it
MARSSkf(kemfit, only.logLik=TRUE)

5.9 Degenerate variance estimates
If your data are short relative to the number of parameters you are estimating,
then you are liable to find that some of the variance elements are degenerate
(equal to zero). Try the following:
dat.short = dat[1:4,1:10]
kem.degen = MARSS(dat.short,control=list(allow.degen=FALSE))
Warning! Abstol convergence only. Maxit (=500) reached before log-log convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only no log-log convergence.
maxit (=500) reached before log-log convergence.
The likelihood and params might not be at the ML values.
Try setting control$maxit higher.
Log-likelihood: 11.67854
AIC: 2.642914
AICc: 63.30958

                    Estimate
R.diag              1.22e-02
U.X.SJF             9.79e-02
U.X.SJI             1.09e-01
U.X.EBays           9.28e-02
U.X.PSnd            1.11e-01
Q.(X.SJF,X.SJF)     1.89e-02
Q.(X.SJI,X.SJI)     1.03e-05
Q.(X.EBays,X.EBays) 8.24e-06
Q.(X.PSnd,X.PSnd)   3.05e-05
x0.X.SJF            5.96e+00
x0.X.SJI            6.73e+00
x0.X.EBays          6.60e+00
x0.X.PSnd           5.71e+00

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Convergence warnings
Warning: the Q.(X.SJI,X.SJI) parameter value has not converged.
Warning: the Q.(X.EBays,X.EBays) parameter value has not converged.
Warning: the Q.(X.PSnd,X.PSnd) parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
This will print a warning that the maximum number of iterations was reached before some of the Q parameters converged. It might be that if you just ran a few more iterations, the variances would converge. So first try setting control$maxit higher.
kem.degen2 = MARSS(dat.short, control=list(maxit=1000,
allow.degen=FALSE), silent=2)
Output not shown, but if you run the code, you will see that some of the Q terms are still not converging. MARSS can detect when a variance is heading to zero, and it will try zero to see if that has a higher likelihood. Try removing allow.degen=FALSE, which was turning off this feature.
kem.short = MARSS(dat.short)
Warning! Abstol convergence only. Maxit (=500) reached before log-log convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only no log-log convergence.
maxit (=500) reached before log-log convergence.
The likelihood and params might not be at the ML values.
Try setting control$maxit higher.
Log-likelihood: 11.6907
AIC: 2.6186
AICc: 63.28527

                    Estimate
R.diag              1.22e-02
U.X.SJF             9.79e-02
U.X.SJI             1.09e-01
U.X.EBays           9.24e-02
U.X.PSnd            1.11e-01
Q.(X.SJF,X.SJF)     1.89e-02
Q.(X.SJI,X.SJI)     1.03e-05
Q.(X.EBays,X.EBays) 0.00e+00
Q.(X.PSnd,X.PSnd)   3.04e-05
x0.X.SJF            5.96e+00
x0.X.SJI            6.73e+00
x0.X.EBays          6.60e+00
x0.X.PSnd           5.71e+00

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Convergence warnings
Warning: the Q.(X.SJI,X.SJI) parameter value has not converged.
Warning: the Q.(X.PSnd,X.PSnd) parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
So three of the four Q elements are going to zero. This often happens when you
do not have enough data to estimate both observation and process variance.
Perhaps we are trying to estimate too many variances. We can try using
only one variance value in Q and one u value in u:
kem.small=MARSS(dat.short,model=list(Q="diagonal and equal",
U="equal"))
Success! abstol and log-log tests passed at 164 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 164 iterations.
Log-likelihood: 11.19
AIC: -8.379994
AICc: 0.9533396

           Estimate
R.diag       0.0191
U.1          0.1027
Q.diag       0.0000
x0.X.SJF     6.0609
x0.X.SJI     6.7698
x0.X.EBays   6.5307
x0.X.PSnd    5.7451

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
No, there are simply not enough data to estimate both process and observation
variances.


5.10 Bootstrap parameter estimates
You can easily produce bootstrap parameter estimates from a fitted model
using MARSSboot():
boot.params = MARSSboot(kemfit,
nboot=20, output="parameters", sim="parametric")$boot.params
  |2%      |20%      |40%      |60%      |80%      |100%
Progress: ||||||||||||||||||||||||||||||||||||||||||||||||||
Use silent=TRUE to stop the progress bar from printing. The function will also
produce parameter sets generated using a Hessian matrix (sim="hessian")
or a non-parametric bootstrap (sim="innovations").

5.11 Random initial conditions
You can use random initial conditions by passing in MCInit=TRUE:
Z.model = factor(c(1,1,2,2,2))
U.model = "equal"
Q.model = "diagonal and unequal"
R.model = "diagonal and equal"
model.list=list(Z=Z.model, R=R.model, U=U.model, Q=Q.model)
#Set the numInits very low so the example runs quickly
cntl.list=list(MCInit=TRUE,numInits=10)
kem.mcinit = MARSS(dat, model=model.list, control=cntl.list)
> Starting Monte Carlo Initializations
  |2%      |20%      |40%      |60%      |80%      |100%
Progress: ||||||||||||||||||||||||||||||||||||||||||||||||||
Success! abstol and log-log tests passed at 26 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Monte Carlo initialization with random starts.
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 26 iterations.
Log-likelihood: 12.02576
AIC: -6.051511
AICc: -3.100691

        Estimate
A.SJI    0.79876
A.PSnd  -0.78580
A.HC    -0.85449
R.diag   0.02893
U.1      0.04191
Q.(1,1)  0.01162
Q.(2,2)  0.00441
x0.1     6.05128
x0.2     6.89080
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

5.12 Data simulation
5.12.1 Simulated data from a fitted MARSS model
Data can be simulated from a marssMLE object using MARSSsimulate().
sim.data=MARSSsimulate(kemfit, nsim=2, tSteps=100)$sim.data
Then you might want to estimate parameters from that simulated data. Above
we created two simulated datasets (nsim=2). We will fit to the first one. Here
the default settings for MARSS() are used.
kem.sim.1 = MARSS(sim.data[,,1])
Then we might like to see the likelihood of the second set of simulated data under the model fit to the first set of data. We do that with the Kalman filter function. This function takes a marssMLE object (as output by, say, the MARSS() function), and we have to replace the data in kem.sim.1 with the second set of simulated data.
kem.sim.2 = kem.sim.1
kem.sim.2$model$data = sim.data[,,2]
MARSSkf( kem.sim.2 )$logLik
[1] 20.19664

5.13 Bootstrap AIC
The function MARSSaic() computes a bootstrap AIC for model selection purposes. Use output="AICbp" to produce a parametric bootstrap AIC; use output="AICbb" to produce a non-parametric bootstrap AIC. You will need a large number of bootstraps (nboot). We use only 10 bootstraps to show you how to compute AICb with the MARSS package, but the AICbp estimate will be terrible with this few bootstraps.


kemfit.with.AICb = MARSSaic(kemfit, output = "AICbp",
Options = list(nboot = 10, silent=TRUE))
#nboot should be more like 1000, but set low here for example sake
print(kemfit.with.AICb)
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 22 iterations.
Log-likelihood: 7.949236
AIC: 0.1015284
AICc: 2.424109
AICbp(param): 211.2704
        Estimate
A.SJI    0.79786
A.EBays  0.27743
A.HC    -0.07035
R.diag   0.03406
U.1      0.04318
Q.diag   0.00767
x0.N     6.17205
x0.S     6.20615
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

5.14 Convergence
MARSS uses two convergence tests. The first is

\[
\log L_{i+1} - \log L_i < \text{tol}
\]

This is called abstol (meaning absolute tolerance) in the output. The second is called the conv.test.slope test. It looks at the slope of the log parameter value (or log-likelihood) versus the log iteration number and asks whether that slope is close to zero (not changing). Both tolerances can be adjusted via the control argument, as sketched below.
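For example, both tolerances can be tightened from their defaults (the values below are hypothetical):

kemfit.tight = MARSS(dat, control=list(abstol=1e-4, conv.test.slope.tol=0.1))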
If you are having trouble getting the model to converge, then start by addressing the following:

1) Are you trying to fit a bad model, e.g., a non-stationary model fit to stationary data (or the opposite), a model that specifies independence of errors or states for data that clearly violate that, or a model that implies a particular stationary distribution (a particular mean and variance) for data that strongly violate that?
2) Do you have confounded parameters, e.g., two parameters that have the same effect (like effectively two intercepts)?
3) Are you trying to fit a model to 1 data point somewhere, e.g., in a big multivariate dataset with lots of missing values?
4) How many parameters are you trying to estimate per data point?
5) Check your residuals (residuals(kemfit)$model.residuals) for normality.
6) Did you do any data transformations that would cause one of the variances to go to zero? Replacing 0s with a constant will do that; try replacing them with NAs (missing). Do you have long strings of constant numbers in your data? Binned data often look like that, and that will drive Q to 0.

Part III

Applications


In this part, we walk you through some longer analyses using MARSS models for a variety of different applications. Most of these are analyses of ecological data, but the same models are used in many other fields. These longer examples will take you through both the conceptual steps (with pencil and paper) and an R step, which translates the conceptual model into code.

Set-up

- If you haven't already, install the MARSS package. See directions on the CRAN webpage (http://cran.r-project.org/) for instructions on installing packages. You will need write permissions for your R program directories to install packages. See the help pages on CRAN for workarounds if you don't have write permission.
- Type library(MARSS) at the R command line to load the package after you install it.

Tips

- summary(foo$model), where foo is a fitted model object, will print detailed information on the structure of the MARSS model that was fit in the call foo = MARSS(logdata). This allows you to double-check the model you fit. print(foo) will print an 'English' version of the model structure along with the parameter estimates.
- When you run MARSS(), it will output the number of iterations used. If you reached the maximum, re-run with control=list(maxit=...) set higher than the default.
- If you mis-specify the model, MARSS() will post an error that should give you an idea of the problem (make sure silent=FALSE to see full error reports). Remember, the number of rows in your data is n, time is across the columns, and the length of the vector or factor passed in for model$Z must be n, while the number of factor levels is m, the number of x hidden state trajectories in your model.
- The missing value indicator is NA.
- Running MARSS(data), with no arguments except your data, will fit a MARSS model with m = n, a diagonal Q matrix with m variances, and i.i.d. observation errors.
- Try MARSSinfo() at the command line if you get errors or warnings you don't understand. You might find insight there. Or look at the warnings and errors notes in the appendix of this user guide.

6
Count-based population viability analysis
(PVA) using corrupted data

6.1 Background
Estimates of extinction and quasi-extinction risk are important risk metrics used in the management and conservation of endangered and threatened species. By necessity, these estimates are based on data that contain both variability due to real year-to-year changes in the population growth rate (process errors) and variability in the relationship between the true population size and the actual count (observation errors). Classic approaches to extinction risk assume the data have only process error, i.e., no observation error. In reality, observation error is ubiquitous, both because of sampling variability and also because of year-to-year (and day-to-day) variability in sightability.
In this application, we will fit a univariate (meaning one time series) state-space model to population count data with observation error. We will compute the extinction risk metrics given in Dennis et al. (1991); however, instead of using a process-error-only model (as is done in the original paper), we use a model with both process and observation error. The risk metrics and their interpretations are the same as in Dennis et al. (1991). The only real difference is how we compute σ², the process error variance. However, this difference has a large effect on our risk estimates, as you will see.
We use here a density-independent model, a stochastic exponential growth model in log space. This is equivalent to a MARSS model with B = 1. Density-independence is often a reasonable assumption when doing a population viability analysis because we do such calculations for at-risk populations that are either declining or well below historical levels (and presumably carrying capacity). In an actual population viability analysis, it is necessary to justify this assumption, and if there is reason to doubt it, one tests for density-dependence (Taper and Dennis, 1994) and does sensitivity analyses using state-space models with density-dependence (Dennis et al., 2006).

Type RShowDoc("Chapter_PVA.R", package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.
The univariate model is written:

\[
x_t = x_{t-1} + u + w_t, \text{ where } w_t \sim \text{N}(0, \sigma^2) \tag{6.1}
\]
\[
y_t = x_t + v_t, \text{ where } v_t \sim \text{N}(0, \eta^2) \tag{6.2}
\]

where y_t is the logarithm of the observed population size at time t, x_t is the unobserved state at time t, u is the growth rate, and σ² and η² are the process and observation error variances, respectively. In the R code to follow, σ² is denoted Q and η² is denoted R because the functions we are using are also for multivariate state-space models, and those models use Q and R for the respective variance-covariance matrices. The MARSS() specification of this univariate model is sketched below.
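As a sketch (the names "u", "q", "r", and "pi" are arbitrary placeholders), this univariate model would be specified for MARSS() with 1 × 1 matrices:

mod.pva = list(B=matrix(1), U=matrix("u"), Q=matrix("q"),
   Z=matrix(1), A=matrix(0), R=matrix("r"),
   x0=matrix("pi"), tinitx=0)
# fit with, e.g., kem.pva = MARSS(rbind(y), model=mod.pva)
# rbind(y) turns the vector y into the required 1 x T matrix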

6.2 Simulated data with process and observation error
We will start by using simulated data to see the difference between data and estimates from a model with process error only versus a model that also includes observation error. For our simulated data, we used a decline of 5% per year, process variability of 0.02 (typical for small to medium-sized vertebrates), and an observation variability of 0.05 (which is a bit on the high end). We'll randomly set 10% of the values as missing. Here is the code:
First, set things up:

sim.u = -0.05        # growth rate
sim.Q = 0.02         # process error variance
sim.R = 0.05         # non-process error variance
nYr = 50             # number of years of data to generate
fracmissing = 0.1    # fraction of years that are missing
init = 7             # log of initial pop abundance
years = seq(1:nYr)   # sequence 1 to nYr
x = rep(NA,nYr)      # replicate NA nYr times
y = rep(NA,nYr)

Then generate the population sizes using Equation 6.1:
x[1] = init
for(t in 2:nYr){
  x[t] = x[t-1] + sim.u + rnorm(1, mean=0, sd=sqrt(sim.Q))
}
Lastly, add observation error using Equation 6.2 and then add missing values:
for(t in 1:nYr){
  y[t] = x[t] + rnorm(1, mean=0, sd=sqrt(sim.R))
}

missYears = sample(years[2:(nYr-1)], floor(fracmissing*nYr),
  replace = FALSE)
y[missYears] = NA
Stochastic population trajectories show much variation, so it is best to look at a few simulated datasets at once. In Figure 6.1, nine simulations from identical parameters are shown.

[Figure 6.1 appears here: nine panels titled "simulation 1" through "simulation 9", each plotting an index of log abundance against year (0-50).]

Fig. 6.1. Plot of nine simulated population time series with process and observation error. Circles are the observations and the dashed line is the true population size.

Example 6.1 (The effect of parameter values on parameter estimates)
A good way to get a feel for reasonable σ2 values is to generate simulated data
and look at the time series. A biologist would have a pretty good idea of what
kind of year-to-year population changes are reasonable for their study species.
For example, for many large mammalian species, the maximum yearly population increase would be around 50% (the population could go from 1000 to 1500 in one year), but some fish species could easily double or even triple in
a really good year. Observed data may bounce around a lot for many different
reasons having to do with sightability, sampling error, age-structure, etc., but
the underlying population trajectory is constrained by the kinds of year-to-year
changes in population size that are biologically possible. σ2 describes those true
population changes.
You should run the example code several times using the same parameter values to get a feel for how different the time series can look despite identical parameters. You can cut and paste the code from the pdf into the R command line. Typical vertebrate σ2 values are 0.002 to 0.02, and typical η2 values are 0.005 to 0.1. A u of -0.01 translates to an average 1% per year decline, and a u of -0.1 translates to an average 10% per year decline (approximately).
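As a quick check of those conversions (the exact proportional change per year is exp(u) − 1):

100 * (exp(-0.01) - 1)  # about -1% per year
100 * (exp(-0.1) - 1)   # about -9.5% per year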

Example 6.1 code
par(mfrow=c(3,3))
sim.u = -0.05
sim.Q = 0.02
sim.R = 0.05
nYr = 50
fracmiss = 0.1
init = 7
years = seq(1:nYr)
for(i in 1:9){
  x = rep(NA,nYr) # vector for ts w/o measurement error
  y = rep(NA,nYr) # vector for ts w/ measurement error
  x[1] = init
  for(t in 2:nYr){
    x[t] = x[t-1] + sim.u + rnorm(1, mean=0, sd=sqrt(sim.Q)) }
  for(t in 1:nYr){
    y[t] = x[t] + rnorm(1, mean=0, sd=sqrt(sim.R)) }
  missYears =
    sample(years[2:(nYr-1)], floor(fracmiss*nYr), replace = FALSE)
  y[missYears] = NA
  plot(years, y,
    xlab="", ylab="log abundance", lwd=2, bty="l")
  lines(years, x, type="l", lwd=2, lty=2)
  title(paste("simulation ",i))
}
legend("topright", c("Observed","True"),
  lty = c(-1, 2), pch = c(1, -1))

6.3 Maximum-likelihood parameter estimation
6.3.1 Model with process and observation error
Using the simulated data, we estimate the parameters, u, σ2 , and η2 , and the
hidden population sizes. These are the estimates using a model with process
and observation variability. The function call is kem = MARSS(data), where
data is a vector of logged (base e) counts with missing values denoted by NA.
After this call, the maximum-likelihood parameter estimates are shown with
coef(kem). There are numerous other outputs from the MARSS() function. To get a list of the standard model output available, type ?print.MARSS. Note that kem is just a name; the output could have been called foo. Here is the code to fit the simulated time series:
kem = MARSS(y)
Let’s look at the parameter estimates for the nine simulated time series
in Figure 6.1 to get a feel for the variation. The MARSS() function was used on each time series to produce parameter estimates for each simulation. The estimates are followed by the mean (over the nine simulations) and the true values:
          kem.U        kem.Q        kem.R
sim 1     -0.07340254  0.011951194  0.052419041
sim 2     -0.02955458  0.055749879  0.003257744
sim 3     -0.06468184  0.000000000  0.092393541
sim 4     -0.03546548  0.031934036  0.040441294
sim 5     -0.06600771  0.008450966  0.071950486
sim 6     -0.05154663  0.009137402  0.072497614
sim 7     -0.07953722  0.005988066  0.071740967
sim 8     -0.04622466  0.023932029  0.033372804
sim 9     -0.04827980  0.021325149  0.048361357
mean sim  -0.05496672  0.018718747  0.054048316
true      -0.05000000  0.020000000  0.050000000

As expected, the estimated parameters do not exactly match the true parameters, but the average should be fairly close (although nine simulations is a small sample size). Also note that although we do not get u quite right, our estimates are usually negative; thus our estimates usually indicate declining dynamics. Some of the kem.Q estimates may be 0. This means that the maximum-likelihood estimate is that the data are generated by a process with no environmental variation and only observation error.
The MARSS model fit also gives an estimate of the true population size
with observation error removed. This is in kem$states. Figure 6.2 shows the
estimated true states of the population over time as a solid line. Note that the
solid line is considerably closer to the actual true states (dashed line) than
the observations. On the other hand, with certain datasets, the estimates can be quite wrong as well!

[Figure 6.2 appears here: the same nine simulation panels as in Figure 6.1, with the MARSS-estimated true population states overlaid as solid lines.]

Fig. 6.2. The circles are the observed population sizes with error. The dashed lines
are the true population sizes. The solid thin lines are the estimates of the true
population size from the MARSS model. When the process error variance is 0, these
lines are straight.

6.3.2 Model with no observation error
We used the MARSS model to estimate the mean population growth rate u and the process variability σ2 under the assumption that the count data have observation error. However, the classic approach to this problem, referred to as the "Dennis model" (Dennis et al., 1991), uses a model that assumes the data have no observation error (a MAR model); all the variability in the data is assumed to
result from process error. This approach works well if the observation error in
the data is low, but not so well if the observation error is high. We will next
fit the data using the classic approach so that we can compare and contrast
parameter estimates from the different methods.
Using the estimation method in Dennis et al. (1991), our data need to be
re-specified as the observed population changes (delta.pop) between censuses
along with the time between censuses (tau). We re-specify the data as follows:
den.years = years[!is.na(y)] # the non missing years
den.y = y[!is.na(y)] # the non missing counts


den.n.y = length(den.years)
delta.pop = rep(NA, den.n.y-1) # population transitions
tau = rep(NA, den.n.y-1)       # step sizes
for (i in 2:den.n.y){
  delta.pop[i-1] = den.y[i] - den.y[i-1]
  tau[i-1] = den.years[i] - den.years[i-1]
} # end i loop
Next, we regress the changes in population size between censuses (delta.pop)
on the time between censuses (tau) while setting the regression intercept to 0.
The slope of the resulting regression line is an estimate of u, while the variance
of the residuals around the line is an estimate of σ2 . The regression is shown
in Figure 6.3. Here is the code to do that regression:
den91 <- lm(delta.pop ~ -1 + tau)
# note: the "-1" specifies no intercept
den91.u = den91$coefficients
den91.Q = var(resid(den91))
#type ?lm to learn about the linear regression function in R
#form is lm(dependent.var ~ response.var1 + response.var2 + ...)
#type summary(den91) to see other info about our regression fit
Here are the parameter values for the data in Figure 6.2 using the process-error only model:
          den91.U       den91.Q
sim 1     -0.057849407  0.12482394
sim 2     -0.003238396  0.07356204
sim 3     -0.076826139  0.14110386
sim 4     -0.023682404  0.11590874
sim 5     -0.041188033  0.18079387
sim 6     -0.039165343  0.17231437
sim 7     -0.108078066  0.12537368
sim 8     -0.054505097  0.09170650
sim 9     -0.057180717  0.11592826
mean sim  -0.051301511  0.12683503
true      -0.050000000  0.02000000

Notice that the u estimates are similar to those from the MARSS model, but the σ2 estimate (Q) is much larger. That is because this approach treats all the variance as process variance, so any observation variance in the data is lumped into process variance. In fact, it appears as an additional variance of twice the observation variance: here sim.Q + 2 × sim.R = 0.02 + 0.10 = 0.12, close to the mean den91.Q above.
Example 6.2 (The variability in parameter estimates)

[Figure 6.3 appears here: a scatter plot of population transition size against time step size (tau), with the fitted regression line through the origin.]

Fig. 6.3. The regression of log(Nt+τ) − log(Nt) against τ. The slope is the estimate of u and the variance of the residuals is the estimate of σ2.

In this example, you will look at how variable the parameter estimates are by generating multiple simulated data sets and then estimating parameter values for each. You'll compare the MARSS estimates to the estimates using a process-error only model (i.e. ignoring the observation error).
Run the example code a few times to compare the estimates using a state-space model (kem) versus the model with no observation error (den91). You can copy and paste the code from the pdf file into R. Next, change the observation variance in the code, sim.R, in the data generation step in order to get a feel for the estimation performance as observations are further corrupted. What happens as observation error is increased? Next, decrease the number of years of data, nYr, and re-run the parameter estimation. What is the effect of fewer years of data? If you find that the example code takes too long to run, reduce the number of simulations (by reducing nsim in the code).


Example 6.2 code
sim.u = -0.05   # growth rate
sim.Q = 0.02    # process error variance
sim.R = 0.05    # non-process error variance
nYr = 50        # number of years of data to generate
fracmiss = 0.1  # fraction of years that are missing
init = 7        # log of initial pop abundance (~1100 individuals)
nsim = 9
years = seq(1:nYr)  # col of years
params = matrix(NA, nrow=(nsim+2), ncol=5,
  dimnames=list(c(paste("sim",1:nsim),"mean sim","true"),
  c("kem.U","den91.U","kem.Q","kem.R","den91.Q")))
x.ts = matrix(NA, nrow=nsim, ncol=nYr) # ts w/o measurement error
y.ts = matrix(NA, nrow=nsim, ncol=nYr) # ts w/ measurement error
for(i in 1:nsim){
  x.ts[i,1] = init
  for(t in 2:nYr){
    x.ts[i,t] = x.ts[i,t-1] + sim.u + rnorm(1, mean=0, sd=sqrt(sim.Q)) }
  for(t in 1:nYr){
    y.ts[i,t] = x.ts[i,t] + rnorm(1, mean=0, sd=sqrt(sim.R)) }
  missYears = sample(years[2:(nYr-1)], floor(fracmiss*nYr),
    replace = FALSE)
  y.ts[i,missYears] = NA
  #MARSS estimates
  kem = MARSS(y.ts[i,], silent=TRUE)
  #type="vector" outputs the estimates as a vector instead of a list
  params[i,c(1,3,4)] = coef(kem, type="vector")[c(2,3,1)]
  #Dennis et al. (1991) estimates
  den.years = years[!is.na(y.ts[i,])]  # the non-missing years
  den.yts = y.ts[i,!is.na(y.ts[i,])]   # the non-missing counts
  den.n.yts = length(den.years)
  delta.pop = rep(NA, den.n.yts-1)  # transitions
  tau = rep(NA, den.n.yts-1)        # time step lengths
  for (t in 2:den.n.yts){
    delta.pop[t-1] = den.yts[t] - den.yts[t-1]  # transition
    tau[t-1] = den.years[t] - den.years[t-1]    # time step length
  } # end t loop
  den91 <- lm(delta.pop ~ -1 + tau)  # -1 specifies no intercept
  params[i,c(2,5)] = c(den91$coefficients, var(resid(den91)))
}
params[nsim+1,] = apply(params[1:nsim,], 2, mean)
params[nsim+2,] = c(sim.u, sim.u, sim.Q, sim.R, sim.Q)


Here is an example of the output from the Example 6.2 code:
print(params,digits=3)

          kem.U    den91.U  kem.Q    kem.R   den91.Q
sim 1     -0.0287  -0.0384  0.02118  0.0680  0.1486
sim 2     -0.0635  -0.0669  0.02218  0.0516  0.1304
sim 3     -0.0206  -0.0365  0.03940  0.0514  0.1571
sim 4     -0.0410  -0.0438  0.00000  0.0753  0.1300
sim 5     -0.0457  -0.0278  0.03530  0.0392  0.1216
sim 6     -0.0642  -0.0875  0.01841  0.0292  0.0772
sim 7     -0.0765  -0.0838  0.02101  0.0512  0.1343
sim 8     -0.0371  -0.0300  0.00336  0.0497  0.0829
sim 9     -0.0338  -0.0443  0.00957  0.0750  0.1522
mean sim  -0.0457  -0.0510  0.01893  0.0545  0.1260
true      -0.0500  -0.0500  0.02000  0.0500  0.0200

6.4 Probability of hitting a threshold Π(xd ,te )
A common extinction risk metric is ‘the probability that a population will hit
a certain threshold xd within a certain time frame te – if the observed trends
continue’. In practice, the threshold used is not Ne = 1, which would be true
extinction. Often a ‘functional’ extinction threshold will be used (Ne >> 1).
Other times a threshold representing some fraction of current levels is used.
The latter is used because we often have imprecise information about the
relationship between the true population size and what we measure in the
field; that is, many population counts are index counts. In these cases, one
must use ‘fractional declines’ as the threshold. Also, extinction estimates that
use an absolute threshold (like 100 individuals) are quite sensitive to error
in the estimate of true population size. Here, we are going to use fractional
declines as the threshold, specifically pd = 0.1 which means a 90% decline.
The probability of hitting a threshold, denoted Π(xd, te), is typically presented as a curve showing the probabilities of hitting the threshold (y-axis) over different time horizons (te) on the x-axis. Extinction probabilities can be computed through Monte Carlo simulations or analytically using Equation 16 in Dennis et al. (1991) (note there is a typo in Equation 16; the last + is supposed to be a −). We will use the latter method:

\Pi(x_d, t_e) = \pi(u)\,\Phi\!\left(\frac{-x_d + |u|t_e}{\sqrt{\sigma^2 t_e}}\right) + \exp\!\left(\frac{2 x_d |u|}{\sigma^2}\right)\Phi\!\left(\frac{-x_d - |u|t_e}{\sqrt{\sigma^2 t_e}}\right) \qquad (6.3)


where xd is the threshold and is defined as xd = log(N0/Ne); N0 is the current population estimate and Ne is the threshold. If we are using fractional declines, then xd = log(N0/(pd × N0)) = −log(pd). π(u) is the probability that the threshold is eventually hit (by te = ∞): π(u) = 1 if u ≤ 0 and π(u) = exp(−2u xd/σ2) if u > 0. Φ() is the cumulative probability distribution of the standard normal (mean = 0, sd = 1).
Here is the R code for that computation (u and Q are the estimates from your fitted model):
pd = 0.1       # means a 90 percent decline
tyrs = 1:100
xd = -log(pd)
Pi = rep(NA, length(tyrs))  # vector to hold the probabilities
p.ever = ifelse(u<=0, 1, exp(-2*u*xd/Q))  # Q = sigma2
for (i in 1:length(tyrs)){
  Pi[i] = p.ever * pnorm((-xd+abs(u)*tyrs[i])/sqrt(Q*tyrs[i])) +
    exp(2*xd*abs(u)/Q) * pnorm((-xd-abs(u)*tyrs[i])/sqrt(Q*tyrs[i]))
}
Figure 6.4 shows the estimated probabilities of hitting the 90% decline for the nine 30-year time series simulated with u = −0.05, σ2 = 0.01 and η2 = 0.05. The dashed line shows the estimates using the MARSS parameter estimates, and the solid line shows the estimates using a process-error only model (the den91 estimates). The circles are the true probabilities. The difference between the estimates and the true probabilities is due to errors in û. Those errors are due largely to process error, not observation error. As we saw earlier, by chance some population trajectories with u < 0 will increase, even over a 50-year period. In this case, û will be positive when in fact u < 0.
Looking at the figure, it is obvious that the probability estimates are highly variable. However, look at the first panel. This is the average estimate (over nine simulations). Note that on average (over nine simulations), the estimates are good. If we had averaged over 1000 simulations instead of nine, you would see that the MARSS line falls on the true line; it is an unbiased predictor. While that may seem small consolation if estimates for individual simulations are all over the map, it is important for correctly specifying our uncertainty about our estimates. Second, rather than focusing on how the estimates and true lines match up, see if there are any types of forecasts that seem better than others. For example, are 20-year predictions better than 50-year predictions, and are 100-year forecasts better or worse? In Example 6.3, you will remake this figure with different u. You'll discover from that exercise that forecasts are more certain for populations that are declining faster.
Example 6.3 (The effect of parameter values on risk estimates)

In this example, you will recreate Figure 6.4 using different parameter values.
This will give you a feel for how variability in the data and population process affect the risk estimates. You'll need to run the Example 6.2 code before running the Example 6.3 code.

[Figure 6.4 appears here: nine panels ("average over sims" and "simulation 1" through "simulation 8") plotting probability of extinction against time steps into the future, with legend entries True, Dennis, and KalmanEM.]

Fig. 6.4. Plot of the true and estimated probability of declining 90% in different time horizons for nine simulated population time series with observation error. The plot may look like a step-function if the σ2 estimate is very small (<1e-4 or so).
Begin by changing sim.R and rerunning the Example 6.2 code. Now run the
Example 6.3 code and generate parameter estimates. When are the estimates
using the process-error only model (den91) worse and in what way are they
worse? You might imagine that you should always use a model that includes
observation error, since in practice observations are never perfect. However,
there is a cost to estimating that extra variance parameter and the cost is
a more variable σ2 (Q) estimate. Play with shortening the time series and
decreasing the sim.R values. Are there situations when the ‘cost’ of the extra
parameter is greater than the ‘cost’ of ignoring observation error?
Next, change the rate of decline in the simulated data. To do this, rerun the Example 6.2 code using a lower sim.u; then run the Example 6.3 code. Do the estimates seem better or worse for rapidly declining populations? Rerun the Example 6.2 code using fewer years of data (smaller nYr) and increase


fracmiss. Run the Example 6.3 code again. The graphs will start to look peculiar. Why do you think that is happening? Hint: look at the estimated parameters. Last, change the extinction threshold (pd in the Example 6.3 code). How does changing the extinction threshold change the extinction probability curves? Do not remake the data, i.e. don't rerun the Example 6.2 code.


Example 6.3 code
#Needs the Example 6.2 code to be run first
par(mfrow=c(3,3))
pd = 0.1; xd = -log(pd)  # decline threshold
te = 100; tyrs = 1:te    # extinction time horizon
for(j in c(10,1:8)){
  real.ex = denn.ex = kal.ex = matrix(nrow=te)
  #MARSS parameter estimates
  u = params[j,1]
  Q = params[j,3]
  if(Q==0) Q = 1e-4 #just so the extinction calc doesn't choke
  p.ever = ifelse(u<=0, 1, exp(-2*u*xd/Q))
  for (i in 1:100){
    if(is.finite(exp(2*xd*abs(u)/Q))){
      sec.part = exp(2*xd*abs(u)/Q)*pnorm((-xd-abs(u)*tyrs[i])/sqrt(Q*tyrs[i]))
    }else sec.part = 0
    kal.ex[i] = p.ever*pnorm((-xd+abs(u)*tyrs[i])/sqrt(Q*tyrs[i])) + sec.part
  } # end i loop
  #Dennis et al. (1991) parameter estimates
  u = params[j,2]
  Q = params[j,5]
  p.ever = ifelse(u<=0, 1, exp(-2*u*xd/Q))
  for (i in 1:100){
    denn.ex[i] = p.ever*pnorm((-xd+abs(u)*tyrs[i])/sqrt(Q*tyrs[i])) +
      exp(2*xd*abs(u)/Q)*pnorm((-xd-abs(u)*tyrs[i])/sqrt(Q*tyrs[i]))
  } # end i loop
  #True parameter values
  u = sim.u
  Q = sim.Q
  p.ever = ifelse(u<=0, 1, exp(-2*u*xd/Q))
  for (i in 1:100){
    real.ex[i] = p.ever*pnorm((-xd+abs(u)*tyrs[i])/sqrt(Q*tyrs[i])) +
      exp(2*xd*abs(u)/Q)*pnorm((-xd-abs(u)*tyrs[i])/sqrt(Q*tyrs[i]))
  } # end i loop
  #plot it
  plot(tyrs, real.ex, xlab="time steps into future",
    ylab="probability of extinction", ylim=c(0,1), bty="l")
  if(j<=8) title(paste("simulation ",j))
  if(j==10) title("average over sims")
  lines(tyrs, denn.ex, type="l", col="red", lwd=2, lty=1)
  lines(tyrs, kal.ex, type="l", col="green", lwd=2, lty=2)
}
legend("bottomright", c("True","Dennis","KalmanEM"), pch=c(1,-1,-1),
  col=c(1,2,3), lty=c(-1,1,2), lwd=c(-1,2,2), bty="n")


6.5 Certain and uncertain regions
From Example 6.3, you have observed one of the problems with estimates of
the probability of hitting thresholds. Looking over the nine simulations, your
risk estimates will be on the true line sometimes and other times they are
way off. So your estimates are variable, and one should not present only the point estimates of the probability of a 90% decline. At a minimum, confidence intervals need to be added (next section), but even with confidence intervals, estimates of the probability of hitting declines often do not capture our certainty and uncertainty about extinction risk.
From Example 6.3, you might have also noticed that there are some time horizons (10, 20 years) for which the estimates are highly certain (the threshold is never hit), while for other time horizons (30, 50 years) the estimates are all over the map. Put another way, you may be able to say with high confidence that a 90% decline will not occur between years 1 to 20 and that by year 100 it most surely will have occurred. However, between years 20 and 100, you are very uncertain about the risk. The point is that you can be certain about some forecasts while at the same time being uncertain about other forecasts.
One way to show this is to plot the uncertainty as a function of the forecast, where the forecast is defined in terms of the forecast length (number of years) and the forecasted decline (percentage). Uncertainty is defined as how much of the 0-1 range your 95% confidence interval covers. Ellner and Holmes (2008) show such a figure (their Figure 1). Figure 6.5 shows a version of this figure that you can produce with the function CSEGtmufigure(u=val, N=val, s2p=val). For the figure, the values u = −0.05 (a 5% per year decline), N = 50 (50 years between the first and last census), and s2p = 0.02 are used. The process variability for big mammals is typically in the range of 0.002 to 0.02.
Example 6.4 (Uncertain and certain regions)

Use the Example 6.4 code to re-create Figure 6.5 and get a feel for when risk estimates are more certain and when they are less certain. N is the number of years of data, u is the mean population growth rate, and s2p is the process variance.

Example 6.4 code
par(mfrow = c(1, 1))
CSEGtmufigure(N = 50, u = -0.05, s2p = 0.02)

[Figure 6.5 appears here: the certain/uncertain region plot for time steps = 50, mu = −0.05, s2.p = 0.02, with xe = log10(N0/Ne) on the y-axis, the projection interval T on the x-axis, decline contours at 50%, 90%, and 99%, and regions labeled high certainty P<0.05, high certainty P>0.95, uncertain, and highly uncertain.]

Fig. 6.5. This figure shows your region of high uncertainty (dark gray). In this region, the minimum 95% confidence intervals (meaning if you had no observation error) span 80% of the 0 to 1 probability range. That is, you are uncertain whether the probability of a specified decline is close to 0 or close to 1. The white area shows where your upper 95% CI does not exceed P=0.05, so you are quite sure the probability of a specified decline is less than 0.05. The black area shows where your lower 95% confidence interval is above P=0.95, so you are quite sure the probability is greater than 0.95. The light gray is between these two certain/uncertain extremes.

6.6 More risk metrics and some real data
The previous sections have focused on the probability of hitting thresholds
because this is an important and common risk metric used in population
viability analysis and it appears in IUCN Red List criteria. However, as you
have seen, there is high uncertainty associated with such estimates. Part of
the problem is that probability is constrained to be 0 to 1, and it is easy to get
estimates with confidence intervals that span 0 to 1. Other metrics of risk, û
and the distribution of the time to hit a threshold (Dennis et al., 1991), do not
have this problem and may be more informative. Figure 6.6 shows different
risk metrics from Dennis et al. (1991) on a single plot. This figure is generated
by a call to the function CSEGriskfigure():


dat=read.table(datafile, skip=1)
dat=as.matrix(dat)
CSEGriskfigure(dat)
The datafile is the name of the data file, with years in column 1 and the population counts (logged) in column 2. CSEGriskfigure() has a number of arguments that can be passed in to change the default behavior. The argument te is the forecast length (default is 100 years); threshold is the extinction threshold, either as an absolute number if absolutethresh=TRUE or as a fraction of the current population count if absolutethresh=FALSE. The default is absolutethresh=FALSE and threshold=0.1. datalogged=TRUE means the data are already logged; this is the default.
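For instance, a hedged sketch of changing those defaults (the argument values below are arbitrary choices for illustration):

# 50-year forecast with a threshold of 20% of the current count
CSEGriskfigure(dat, te=50, threshold=0.2, absolutethresh=FALSE,
  datalogged=TRUE)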
Example 6.5 (Risk figures for different species)

Use the Example 6.5 code to re-create Figure 6.6. The package includes other data for you to run: prairiechicken from the endangered Attwater Prairie Chicken, graywhales from Gerber et al. (1999), and grouse from the Sharp-tailed Grouse (a species of U.S. federal concern) in Washington State. Note that for some of these other datasets, the Hessian matrix cannot be inverted and you will need to use CI.method="parametric". If you have other text files of data, you can run those too. The commented lines show how to read in data from a tab-delimited text file with a header line.

Example 6.5 code
#If you have your data in a tab delimited file with a header
#This is how you would read it in using file.choose()
#to call up a directory browser.
#However, the package has the datasets for the examples
#dat=read.table(file.choose(), skip=1)
#dat=as.matrix(dat)
dat = wilddogs
CSEGriskfigure(dat, CI.method="hessian", silent=TRUE)

[Figure 6.6 appears here: the six-panel risk figure produced by CSEGriskfigure() for the wild dog data: the population estimates (u est = −0.054, Q est = 0.052), the probability of hitting the threshold with 95% and 75% CIs, the PDF of the time to threshold given it is reached, the probability of hitting the threshold in 100 time steps versus Ne, sample projections, and the certain/uncertain region plot (time steps = 22, mu = −0.054, s2.p = 0.052).]

Fig. 6.6. Risk figure using data for the critically endangered African Wild Dog (data from Ginsberg et al. 1995). This population went extinct after 1992.

6.7 Confidence intervals

The figures produced by CSEGriskfigure() have confidence intervals (95% and 75%) on the probabilities in the top right panel. A standard way to

produce these intervals is via parametric bootstrapping. Here are the steps in
a parametric bootstrap:
• You estimate u, σ2 and η2.
• Then you simulate time series using those estimates and Equations 6.1 and 6.2.
• Then you re-estimate your parameters from the simulated data (using, say, MARSS(simdata)).
• Repeat for 1000s of time series simulated using your estimated parameters. This gives you a large set of bootstrapped parameter estimates.
• For each bootstrapped parameter set, compute a set of extinction estimates (using Equation 6.3 and the code from Example 6.3).
• The α% ranges on those bootstrapped extinction estimates give you your α confidence intervals on your probabilities of hitting thresholds.
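A minimal sketch of those steps, assuming a univariate model has been fit as in Section 6.3 and assigned to kem (the names u.hat, Q.hat, R.hat, nboot, and boot.params are illustrative, not part of the package):

u.hat = coef(kem)$U[1]  # estimates from the fitted model
Q.hat = coef(kem)$Q[1]
R.hat = coef(kem)$R[1]
init = 7; nYr = 50; nboot = 1000
boot.params = matrix(NA, nboot, 3, dimnames=list(NULL, c("U","Q","R")))
for(b in 1:nboot){
  # simulate with Equations 6.1 and 6.2 using the estimated parameters
  x = cumsum(c(init, u.hat + rnorm(nYr-1, 0, sqrt(Q.hat))))
  y = x + rnorm(nYr, 0, sqrt(R.hat))
  fit = MARSS(y, silent=TRUE)
  boot.params[b,] = coef(fit, type="vector")[c(2,3,1)]  # U, Q, R
}
# the alpha% ranges give the alpha confidence intervals
apply(boot.params, 2, quantile, probs=c(0.025, 0.975))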

The MARSS package provides the function MARSSparamCIs() to add bootstrapped confidence intervals to fitted models (type ?MARSSparamCIs to learn
about the function).
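A minimal usage sketch (kem1 stands for any fitted model object; the default method is the Hessian approximation discussed below):

kem1 = MARSSparamCIs(kem1)  # add confidence intervals to the fit
print(kem1)                 # the estimates now print with CIs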
In the function CSEGriskfigure(), you can set CI.method = c("hessian", "parametric", "innovations", "none") to tell it how to compute the confidence intervals. The methods 'parametric' and 'innovations' specify parametric and non-parametric bootstrapping, respectively. Producing parameter estimates by bootstrapping is quite slow. Approximate confidence intervals on the parameters can be generated rapidly using the inverse of a numerically estimated Hessian matrix (method 'hessian'). This uses an estimate of the variance-covariance matrix of the parameters (the inverse of the Hessian matrix). Using an estimated Hessian matrix to compute confidence intervals is a handy trick that can be used for all sorts of maximum-likelihood parameter estimates.

6.8 Comments
Data with cycles, from age structure or predator-prey interactions, are difficult to analyze, and the EM algorithm used in the MARSS package will give poor estimates for this type of data. The slope method (Holmes, 2001) is more robust to those problems. Holmes et al. (2007) used the slope method in a large study of data from endangered and threatened species, and Ellner and Holmes (2008) showed that the slope estimates are close to the theoretical minimum uncertainty. In particular, when doing a population viability analysis using a time series with fewer than 25 years of data, the slope method is often less biased and (much) less variable because that method is less data-hungry (Holmes, 2004). However, the slope method is not a true maximum-likelihood method and thus constrains the types of further analyses you can do (such as model selection).

7
Combining multi-site data to estimate regional population trends

7.1 Harbor seals in the Puget Sound, WA.
In this application, we will use multivariate state-space models to combine surveys from multiple regions (or sites) into one estimate of the average long-term population growth rate and the year-to-year variability in that growth rate. Note this is not quite the same as estimating the 'trend'; 'trend' often means what population change happened, whereas the long-term population growth rate refers to the underlying population dynamics. We will use as our example a dataset from harbor seals in Puget Sound, Washington, USA.
We have five regions (or sites) where harbor seals were censused from 1978-1999 while hauled out on land¹. During the period of this dataset, harbor seals were recovering steadily after having been reduced to low levels by hunting prior to protection. The methodologies were consistent throughout the 20 years of the data, but we do not know what fraction of the population each region represents, nor do we know the observation-error variance for each region. Given differences between the behaviors of animals in different regions and the numbers of haul-outs in each region, the observation errors may be quite different. The regions have had different levels of sampling; the best sampled region has only 4 years missing while the worst has over half the years missing (Figure 7.1).
We will assume that the underlying population process is a stochastic exponential growth process with rates of increase that were not changing through 1978-1999. However, we are not sure if all five regions sample a single "total Puget Sound" population or if there are independent subpopulations. We will estimate the long-term population growth rate using different assumptions about the population structure (one big population versus multiple smaller ones) and different observation error structures, to see how the assumptions change the trend estimates.

¹Type RShowDoc("Chapter_SealTrend.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter. The data are from Jeffries et al. 2003. Trends and status of harbor seals in Washington State: 1978-1999. Journal of Wildlife Management 67(1):208–219.

[Figure 7.1 appears here: "Puget Sound Harbor Seal Surveys", log(counts) plotted against year (1980-1995), with each region's series drawn using the numerals 1-5.]

Fig. 7.1. Plot of the count data from the five harbor seal regions (Jeffries et al. 2003). The numbers on each line denote the different regions: 1) Strait of Juan de Fuca (SJF), 2) San Juan Islands (SJI), 3) Eastern Bays (EBays), 4) Puget Sound (PSnd), and 5) Hood Canal (HC). Each region is an index of the total harbor seal population, but the bias (the difference between the index and the true population size) for each region is unknown.

The harbor seal data are included in the MARSS package. The data have
time running down the rows and years in the first column. We need time
across the columns for the MARSS() function, so we will transpose the data:
dat=t(harborSealWA) #Transpose
years = dat[1,] #[1,] means row 1
n = nrow(dat)-1
dat = dat[2:nrow(dat),] #no years
If you needed to read data in from a comma-delimited or tab-delimited file,
these are the commands to do that:


dat = read.csv("datafile.csv",header=TRUE)
dat = read.table("datafile.csv",header=TRUE)
The years are in row 1 of dat and the logged counts are in the rest of the rows. The number of observation time series (n) is the number of rows in dat minus 1 (for the years row). Let's look at the first few years of data:
print(harborSealWA[1:8,], digits=3)
     Year  SJF  SJI EBays PSnd  HC
[1,] 1978 6.03 6.75  6.63 5.82 6.6
[2,] 1979   NA   NA    NA   NA  NA
[3,] 1980   NA   NA    NA   NA  NA
[4,] 1981   NA   NA    NA   NA  NA
[5,] 1982   NA   NA    NA   NA  NA
[6,] 1983 6.78 7.43  7.21   NA  NA
[7,] 1984 6.93 7.74  7.45   NA  NA
[8,] 1985 7.16 7.53  7.26 6.60  NA

The NA’s in the data are missing values.
7.1.1 A MARSS model for Puget Sound harbor seals
The first step is to mathematically specify the population structure and how
the regions relate to that structure. The general state-space model is
xt = Bxt−1 + u + wt , where wt ∼ MVN(0, Q)
yt = Zxt + a + vt , where vt ∼ MVN(0, R)
where all the bolded symbols are matrices. To specify the structure of the
population and observations, we will specify what those matrices look like.

7.2 A single well-mixed population with i.i.d. errors
When we are looking at data over a large geographic region, we might make the
assumption that the different census regions are measuring a single population
if we think animals are moving sufficiently such that the whole area (multiple
regions together) is “well-mixed”. We write a model of the total population
abundance for this case as:
nt = exp(u + wt) nt−1    (7.1)

where nt is the total count in year t, u is the mean population growth rate,
and wt is the deviation from that average in year t. We then take the log of
both sides and write the model in log space:
xt = xt−1 + u + wt , where wt ∼ N(0, q)    (7.2)
where xt = log nt. When there is one effective population, there is one x, so xt is a 1 × 1 matrix. There is one population growth rate (u) and one process variance (q), so u and Q are also 1 × 1 matrices.
7.2.1 The observation process
We assume that all five regional time series are observations of this one population trajectory, but that they are scaled up or down relative to it. In effect, we think that animals are moving around a lot and our regional samples are each some fraction of the population, with year-to-year variation in that fraction just by chance. Notice that under this analysis, we do not think the regions represent independent subpopulations but rather independent observations of one population. Our model for the data, yt = Zxt + a + vt, is written as:

\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} x_t + \begin{bmatrix} 0 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t \qquad (7.3)
Each yi is the time series for a different region. The a's are the bias between the regional sample and the total population; they are scaling (or intercept-like) parameters². We allow that each region could have a unique observation variance and that the observation errors are independent between regions. Lastly, we assume that the observation errors on log(counts) are normal and thus the errors on (counts) are log-normal³.
For our first analysis, we assume that the observation variance is equal across regions but the errors are independent. This means we estimate one observation variance instead of five. This is a fairly standard assumption for data that come from a uniform survey methodology⁴. We specify independent observation errors with identical variances by specifying that the v's come from a multivariate normal distribution with variance-covariance matrix R (v ∼ MVN(0, R)), where

R = \begin{bmatrix} r & 0 & 0 & 0 & 0 \\ 0 & r & 0 & 0 & 0 \\ 0 & 0 & r & 0 & 0 \\ 0 & 0 & 0 & r & 0 \\ 0 & 0 & 0 & 0 & r \end{bmatrix} \qquad (7.4)

²To get rid of the a's, we scale multiple observation time series against each other; thus one a will be fixed at 0. Estimating the bias between regional indices and the total population is important for getting an estimate of the total population size, but the type of time-series analysis that we are doing here (trend analysis) is not useful for estimating a's. Instead, to get a's one would need some type of mark-recapture data. For trend estimation, however, the a's are not important. The regional observation variance captures the increased variance due to a regional estimate being a smaller sample of the total population.
³The assumption of normality is not unreasonable since these regional counts are the sums of counts across multiple haul-outs.
⁴By the way, this is not a good assumption for these data since the number of haul-outs in each region varies and the regional counts are the sums across all haul-outs in a region. We will change this assumption in the next fit and see that the AIC values decline.
Z specifies which observation time series, y_{i,1:T}, is associated with which population trajectory, x_{j,1:T}. Z is like a look-up table with one row for each of the n observation time series and one column for each of the m population trajectories. A 1 in row i, column j means that observation time series i is measuring state process j; otherwise Z_{ij} = 0. Since we have only one population trajectory, all the regions must be measuring that one population trajectory. Thus Z is n × 1:

Z = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \qquad (7.5)
7.2.2 Fitting the model
We have specified the mathematical form of our state-space model. The next
step is to fit this model with MARSS(). The function call will now look like:
kem1 = MARSS(dat, model=list(Z=Z.model, R=R.model) )
The model list argument tells the MARSS() function the model structure, i.e.
the form of Z, u, Q, etc. For our first analysis, we only need to set the model
structure for Z and R. Since there is only one population, there is only one u
and Q (they are scalars), so they have no ’structure’.
First we specify the Z matrix. We need to tell the MARSS function that Z is
a 5×1 matrix of 1s (as in Equation 7.3). We can do this two ways. We can pass
in Z.model as a matrix of ones, matrix(1,5,1), just like in Equation (7.3)
or we can pass in a vector of five factors, factor(c(1,1,1,1,1)). The i-th
factor specifies which population trajectory the i-th observation time series
belongs to. Since there is only one population trajectory in this first analysis,
we will have a vector of five 1’s: every observation time series is measuring the
first, and only, population trajectory.
Z.model = factor(c(1,1,1,1,1))
Note, the vector (the c() bit) must be wrapped in factor() so that MARSS
recognizes what it is. You can use either numeric or character vectors:
c(1,1,1,1,1) is the same as c("PS","PS","PS","PS","PS").


Next we specify that the R variance-covariance matrix only has terms on the diagonal (the variances), with the off-diagonal terms (the covariances) equal to zero:
R.model = "diagonal and equal"
The 'and equal' part specifies that the variances on the diagonal are all the same, matching our assumption of one shared observation variance. If we wanted to allow each region its own observation variance, we would use "diagonal and unequal".

[Figure 7.2 appears here: "Observations and total population estimate"; the five regional series (plotted as the numerals 1-5) with the estimated total population line and dashed 95% confidence interval lines.]

Fig. 7.2. Plot of the estimate of “log total harbor seals in Puget Sound”. The
estimate of the total count has been scaled relative to the first time series. The 95%
confidence intervals on the population estimates are the dashed lines. These are not
the confidence intervals on the observations, and the observations (the numbers) will
not fall between the confidence interval lines.

Code 7.2 shows you how to fit the single population model (Equations 7.2
and 7.3) to the harbor seal data.


Code 7.2
#Code to fit the single population model with i.i.d. errors
#Read in data
dat = t(harborSealWA) #Transpose since MARSS needs time ACROSS columns
years = dat[1,]
n = nrow(dat)-1
dat = dat[2:nrow(dat),]
legendnames = (unlist(dimnames(dat)[1]))
#estimate parameters
Z.model = factor(c(1,1,1,1,1))
R.model = "diagonal and equal"
kem1 = MARSS(dat, model=list(Z=Z.model, R=R.model))
#make figure
matplot(years, t(dat), xlab="", ylab="index of log abundance",
  pch=c("1","2","3","4","5"), ylim=c(5,9), bty="L")
lines(years, kem1$states-1.96*kem1$states.se, type="l",
  lwd=1, lty=2, col="red")
lines(years, kem1$states+1.96*kem1$states.se, type="l",
  lwd=1, lty=2, col="red")
lines(years, kem1$states, type="l", lwd=2)
title("Observations and total population estimate", cex.main=.9)
coef(kem1, type="vector") #the estimated parameter elements as a vector
coef(kem1)    #estimated elements for each parameter matrix as a list
kem1$logLik   #show the log-likelihood
kem1$AIC      #show the AIC
7.2.3 The MARSS() output
The output from MARSS(), here assigned the name kem1, is a list of objects:
names(kem1)
The maximum-likelihood estimates of the "total harbor seal population," scaled to the first observation data series (Figure 7.2), are in kem1$states, and kem1$states.se are the standard errors on those estimates. To get 95% confidence intervals, use kem1$states +/- 1.96*kem1$states.se. Figure 7.2 shows a plot of kem1$states with its 95% confidence intervals over the data. Because kem1$states has been scaled relative to the first time series, it lies on top of that time series. One of the biases, the a's, cannot be estimated; our algorithm arbitrarily chooses a1 = 0, so the population estimate is scaled to the first observation time series.


The estimated parameters are output with the function coef: coef(kem1). To get the estimate just for U, which is the estimated long-term population growth rate, use coef(kem1)$U. Multiply by 100 to get the approximate percent increase per year. The estimated process variance is given by coef(kem1)$Q.
The log-likelihood of the fitted model is in kem1$logLik. We estimated one initial x (at t = 1), one process variance, one u, four a's, and one observation variance, so K = 8 parameters. The AIC of this model is −2 × log-likelihood + 2K, which we can show by typing kem1$AIC.
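As a quick arithmetic check (a sketch, using the K = 8 count above):

K = 8
-2 * kem1$logLik + 2 * K  # should match kem1$AIC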

7.3 Single population model with independent and non-identical errors
In our first model, we estimated a single observation variance that applied to all the observation time series. We might be able to improve the fit (at the cost of more parameters) by assuming that the observation variance is different across regions while the errors are still independent. This means we estimate five observation variances instead of one. In this case, R has the form:

R = \begin{bmatrix} r_1 & 0 & 0 & 0 & 0 \\ 0 & r_2 & 0 & 0 & 0 \\ 0 & 0 & r_3 & 0 & 0 \\ 0 & 0 & 0 & r_4 & 0 \\ 0 & 0 & 0 & 0 & r_5 \end{bmatrix} \qquad (7.6)

To impose this model, we set the R model to
R.model = "diagonal and unequal"
This tells MARSS that all the r's along the diagonal in R are different. To fit this model to the data, call MARSS() as:
Z.model = factor(c(1,1,1,1,1))
R.model = "diagonal and unequal"
kem2 = MARSS(dat, model=list(Z=Z.model, R=R.model))
Here is the estimated R matrix for this model:
coef(kem2, type="matrix")$R

           [,1]       [,2]       [,3]       [,4]      [,5]
[1,] 0.03229417 0.00000000 0.00000000 0.00000000 0.0000000
[2,] 0.00000000 0.03527748 0.00000000 0.00000000 0.0000000
[3,] 0.00000000 0.00000000 0.01352073 0.00000000 0.0000000
[4,] 0.00000000 0.00000000 0.00000000 0.01082157 0.0000000
[5,] 0.00000000 0.00000000 0.00000000 0.00000000 0.1960897

Notice that the estimated variances along the diagonal now differ across regions.


For this model we estimated one initial x, one process variance, one u, four a's, and five observation variances, so K = 12 parameters. The AIC for this new model compared to the old model with one observation variance is:
c(kem1$AIC, kem2$AIC)
[1]  8.813447 -9.323982
A smaller AIC means a better model. The AIC difference between the one-observation-variance model and the unique-observation-variances model is >10, suggesting that the unique observation variances model is better.
One of the key diagnostics when you are comparing fits from multiple models is whether the model is flexible enough to fit the data. This can be checked by looking for temporal trends in the residuals between the estimated population states (e.g. kem2$states) and the data. In Figure 7.3, the residuals for the second analysis are shown. Ideally, these residuals should not have a temporal trend; they should look cloud-like. The fact that the residuals have a strong temporal trend is an indication that our one-population model is too restrictive for the data⁵. Code 7.3 shows you how to fit the second model and make the diagnostics plot.

⁵When comparing models via AIC, it is important that you only compare models that are flexible enough to fit the data. Fortunately, if you neglect to do this, the inadequate models will usually have very high AICs and fall out of the mix anyhow.

[Figure 7.3 appears here: five residual panels titled SJF, SJI, EBays, PSnd, and HC, each plotting residuals against index.]

Fig. 7.3. Residuals for the model with a single population. The plots of the residuals
should not have trends with time, but they do. This is an indication that the single
population model is inconsistent with the data. The code to make this plot is given
in the script file for this chapter.

Code 7.3
#Code to fit the single population model with independent and unequal errors
Z.model = factor(c(1,1,1,1,1))
R.model = "diagonal and unequal"
kem2 = MARSS(dat, model=list(Z=Z.model, R=R.model))
coef(kem2)  #the estimated parameter elements
kem2$logLik #log likelihood
c(kem1$AIC, kem2$AIC) #AICs
#plot residuals
plotdat = t(dat)
matrix.of.biases = matrix(coef(kem2, type="matrix")$A,
  nrow=nrow(plotdat), ncol=ncol(plotdat), byrow=T)
xs = matrix(kem2$states,
  nrow=dim(plotdat)[1], ncol=dim(plotdat)[2], byrow=F)
resids = plotdat - matrix.of.biases - xs
par(mfrow=c(2,3))
for(i in 1:n){
  plot(resids[!is.na(resids[,i]),i], ylab="residuals")
  title(legendnames[i])
}
par(mfrow=c(1,1))


7.4 Two subpopulations, north and south
For the third analysis, we will change our assumption about the structure
of the population. We will assume that there are two subpopulations, north
and south, and that regions 1 and 2 (Strait of Juan de Fuca and San Juan
Islands) fall in the north subpopulation and regions 3, 4 and 5 fall in the south
subpopulation. For this analysis, we will assume that these two subpopulations
share their growth parameter, u, and process variance, q, since they share
a similar environment and prey base. However we postulate that because of
fidelity to natal rookeries for breeding, animals do not move much year-to-year
between the north and south and the two subpopulations are independent.
We need to write down the state-space model to reflect this population structure. There are two subpopulations, x_n and x_s, and they have the same growth rate u:

\begin{bmatrix} x_n \\ x_s \end{bmatrix}_t = \begin{bmatrix} x_n \\ x_s \end{bmatrix}_{t-1} + \begin{bmatrix} u \\ u \end{bmatrix} + \begin{bmatrix} w_n \\ w_s \end{bmatrix}_t \qquad (7.7)

We specify that they are independent by specifying that their year-to-year population fluctuations (their process errors) come from a multivariate normal with no covariance:

\begin{bmatrix} w_n \\ w_s \end{bmatrix}_t \sim \mathrm{MVN}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} q & 0 \\ 0 & q \end{bmatrix} \right) \qquad (7.8)
For the observation process, we use the Z matrix to associate the regions with their respective x_n and x_s values:

\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_n \\ x_s \end{bmatrix}_t + \begin{bmatrix} 0 \\ a_2 \\ 0 \\ a_4 \\ a_5 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t \qquad (7.9)
7.4.1 Specifying the model elements
We need to change the Z specification to indicate that there are two subpopulations (north and south), and that regions 1 and 2 are in the north subpopulation and regions 3, 4 and 5 are in the south subpopulation. There are a few ways we can specify this Z matrix for MARSS():
Z.model = matrix(c(1,1,0,0,0,0,0,1,1,1),5,2)
Z.model = factor(c(1,1,2,2,2))
Z.model = factor(c("N","N","S","S","S"))
Which you choose is a matter of preference as they all specify the same form
for Z.
We also want to specify that the u’s are the same for each subpopulation
and that Q is diagonal with equal q’s. To do this, we set


U.model = "equal"
Q.model = "diagonal and equal"
This says that there is one u and one q parameter and both subpopulations
share it (if we wanted the u’s to be different, we would use U.model="unequal"
or leave off the u model since the default behavior is U.model="unequal").
Code 7.4 puts all the pieces together and shows you how to fit the north
and south population model and create the residuals plot (Figure 7.4). The
residuals look better (more cloud-like) but the Hood Canal residuals are still
temporally correlated.

Code 7.4
#fit the north and south population model
Z.model = factor(c(1,1,2,2,2))
U.model = "equal"
Q.model = "diagonal and equal"
R.model = "diagonal and equal"
kem3 = MARSS(dat, model=list(Z=Z.model,
  R=R.model, U=U.model, Q=Q.model))
#plot residuals
plotdat = t(dat)
matrix.of.biases = matrix(coef(kem3, type="matrix")$A,
  nrow=nrow(plotdat), ncol=ncol(plotdat), byrow=T)
par(mfrow=c(2,3))
for(i in 1:n){
  j = c(1,1,2,2,2)
  xs = kem3$states[j[i],]
  resids = plotdat[,i] - matrix.of.biases[,i] - xs
  plot(resids[!is.na(resids)], ylab="residuals")
  title(legendnames[i])
}
par(mfrow=c(1,1))

7.5 Other population structures
Now work through a number of different structures and examine how your
estimation of the mean population growth rate varies under different assumptions about the structure of the population and the data. You can compare the
model fits using AIC (or AICc). For AIC, lower is better and only the relative
differences matter. A difference of 10 between two AICs means substantially
more support for the model with lower AIC. A difference of 30 or 40 between
two AICs is very large.

[Figure 7.4 appears here: five residual panels titled SJF, SJI, EBays, PSnd, and HC for the north/south model, each plotting residuals against index.]

Fig. 7.4. The residuals for the analysis with a north and south subpopulation. The
plots of the residuals should not have trends with time. Compare with the residuals
for the analysis with one subpopulation.

7.5.1 Five subpopulations
Analyze the data using a model with five subpopulations, where each of the five census regions is sampling one of the subpopulations. Assume that the subpopulations are independent (diagonal Q); however, let each subpopulation share the same population parameters, u and q. Code 7.5.1 shows how to set the MARSS() arguments for this case. You can use R.model="diagonal and equal" to make all the observation variances equal.

Code 7.5.1

Z.model = factor(c(1,2,3,4,5))
U.model = "equal"
Q.model = "diagonal and equal"
R.model = "diagonal and unequal"
kem = MARSS(dat, model=list(Z=Z.model,
   U=U.model, Q=Q.model, R=R.model))
7.5.2 Two subpopulations with different population parameters

Analyze the data using a model that assumes that the Strait of Juan de Fuca and San Juan Islands census regions represent a northern Puget Sound subpopulation, while the other three regions represent a southern Puget Sound subpopulation. This time assume that each population trajectory (north and south) has different u and q parameters: $u_n, u_s$ and $q_n, q_s$. Also assume that each of the five census regions has a different observation variance. Try to write your own code. If you get stuck (or want to check your work), you can open a script file with sample R code by typing RShowDoc("Chapter_SealTrend.R",package="MARSS") at the R command line.
In math form, this model is:

$$
\begin{bmatrix} x_n \\ x_s \end{bmatrix}_t =
\begin{bmatrix} x_n \\ x_s \end{bmatrix}_{t-1} +
\begin{bmatrix} u_n \\ u_s \end{bmatrix} +
\begin{bmatrix} w_n \\ w_s \end{bmatrix}_t, \quad
\begin{bmatrix} w_n \\ w_s \end{bmatrix}_t \sim \text{MVN}\left(0, \begin{bmatrix} q_n & 0 \\ 0 & q_s \end{bmatrix}\right)
\tag{7.10}
$$

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t =
\begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x_n \\ x_s \end{bmatrix}_t +
\begin{bmatrix} 0 \\ a_2 \\ 0 \\ a_4 \\ a_5 \end{bmatrix} +
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t
\tag{7.11}
$$
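If you want to check your approach before opening the script file, here is a minimal sketch of one way to specify this model; it matches Equations 7.10 and 7.11 and assumes dat is the data matrix used throughout this chapter:

#north/south subpopulations with unique u and q,
#and a different observation variance for each region
Z.model = factor(c("N","N","S","S","S"))
U.model = "unequal"
Q.model = "diagonal and unequal"
R.model = "diagonal and unequal"
kem = MARSS(dat, model=list(Z=Z.model, U=U.model,
   Q=Q.model, R=R.model))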
7.5.3 Hood Canal covaries with the other regions

Analyze the data using a model with two subpopulations, with the division being Hood Canal versus everywhere else. In math form, this model is:

$$
\begin{bmatrix} x_p \\ x_h \end{bmatrix}_t =
\begin{bmatrix} x_p \\ x_h \end{bmatrix}_{t-1} +
\begin{bmatrix} u_p \\ u_h \end{bmatrix} +
\begin{bmatrix} w_p \\ w_h \end{bmatrix}_t, \quad
\begin{bmatrix} w_p \\ w_h \end{bmatrix}_t \sim \text{MVN}\left(0, \begin{bmatrix} q & c \\ c & q \end{bmatrix}\right)
\tag{7.12}
$$

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t =
\begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x_p \\ x_h \end{bmatrix}_t +
\begin{bmatrix} 0 \\ a_2 \\ a_3 \\ a_4 \\ 0 \end{bmatrix} +
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t
\tag{7.13}
$$


To specify that Q has one value on the diagonal (one variance) and one value on the off-diagonal (covariance), you can specify Q.model two ways:

Q.model = "equalvarcov"
Q.model = matrix(c("q","c","c","q"),2,2)
7.5.4 Three subpopulations with shared parameter values

Analyze the data using a model with three subpopulations as follows: north (regions 1 and 2), south (regions 3 and 4), and Hood Canal (region 5). You can specify that some subpopulations share parameters while others do not. First, let's specify that each population is affected by independent environmental variability, but that the variance of that variability is the same for the two interior populations:

Q.model = matrix(list(0),3,3)
diag(Q.model) = c("coastal","interior","interior")
print(Q.model)
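Printing Q.model should show something like the following: a list matrix that mixes fixed numeric zeros with the character names of the estimated variances.

     [,1]      [,2]       [,3]      
[1,] "coastal" 0          0         
[2,] 0         "interior" 0         
[3,] 0         0          "interior"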
Notice that Q is a diagonal matrix (independent year-to-year environmental
variability) but the variance of two of the populations is the same. Notice too
that the off-diagonal terms are numeric; they do not have quotes. We specified
Q using a matrix of class list, so that we could have numeric values (fixed)
and character values (estimated parameters).
In a similar way, we specify that the observation errors are independent
but that estimates from a plane do not have the same variance as those from
a boat:
R.model=matrix(list(0),5,5)
diag(R.model)=c("boat","boat","plane","plane","plane")
For the long-term trends, we specify that x1 and x2 share a long-term trend
(“puget sound”) while x3 is allowed to have a separate trend (“hood canal”).
U.model=matrix(c("puget sound","puget sound","hood canal"),3,1)
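Putting the pieces together, a sketch of the full call could look like the following. Note that this assumes the first subpopulation (regions 1 and 2) is the "coastal" one; the order of the entries in Q.model and U.model must match the order of the subpopulations implied by Z.model.

Z.model = factor(c(1,1,2,2,3)) #north, south, Hood Canal
kem = MARSS(dat, model=list(Z=Z.model, U=U.model,
   Q=Q.model, R=R.model))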

7.6 Discussion

There are a number of corners that we cut in order to show you code that runs quickly:

- We ran the code starting from one initial condition. For a real analysis, you should start from a large number of random initial conditions and use the one that gives the highest likelihood. Since the EM algorithm is a "hill-climbing" algorithm, this ensures that it does not get stuck on a local maximum. MARSS() will do this for you if you pass it the argument control=list(MCInit=TRUE), which uses a Monte Carlo routine to try many different initial conditions (see the sketch after this list). See the help file on MARSS() for more information (by typing ?MARSS at the R prompt).

- We assumed independent observation and process errors. Depending on your system, observation errors may be driven by large-scale environmental factors (temperature, tides, prey locations) that would cause your observation errors to covary across regions. If your observation errors strongly covary between regions and you treat them as independent, this could be bad for your analysis. Unfortunately, separating covariance in the observation errors versus the process errors requires a lot of data (to have any power). In practice, the first step is to think hard about what drives sightability for your species and what the relative levels of process and observation variance are. You may be able to subsample your data in a way that makes the observation errors more independent.
- The MARSS() argument control specifies the options for the EM algorithm. We left the default tolerance for the convergence test. You would want to set this lower for a real analysis, and you will need to increase the maxit argument correspondingly.
- We used the large-sample approximation for AIC instead of a bootstrap AIC that is designed to correct for small sample size in state-space models. The bootstrap metric, AICb, takes a long time to run. Use the call MARSSaic(kem, output=c("AICbp")) to compute AICb. We could also have shown AICc, which is the small-sample size corrector for non-state-space models. Type kem$AICc to get that.
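As a sketch of the Monte Carlo initialization mentioned in the first bullet (numInits, the number of random starts, is one of the control options; see ?MARSS for the full list):

#refit the north/south model with a Monte Carlo search
#over initial conditions
kem.mc = MARSS(dat, model=list(Z=Z.model, U=U.model,
   Q=Q.model, R=R.model),
   control=list(MCInit=TRUE, numInits=100))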

Finally, in a real (maximum-likelihood) analysis, one needs to be careful not to dredge the data. The temptation is to look at the data and pick a population structure that will fit that data. This can lead to including models in your analysis that have no biological basis. In practice, we spend a lot of time discussing the population structure with biologists working on the species and review all the biological data that might tell us what are reasonable structures. From that, a set of model structures to use is selected. Other times, a particular model structure needs to be used because the population structure is not in question; it is then a matter of using that pre-specified structure and all the data to get parameter estimates for forecasting.

Some more questions you might ponder

Do different assumptions about whether the observation error variances are all identical versus different affect your estimate of the long-term population growth rate (u)? You may want to rerun examples 3-7 with R.model changed: R.model="diagonal and unequal" means the measurement variances are all different, versus "diagonal and equal" (all the same).
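For example, a sketch of rerunning the north/south model from Code 7.4 with unequal measurement variances and comparing the AICc values:

kem3u = MARSS(dat, model=list(Z=factor(c(1,1,2,2,2)), U="equal",
   Q="diagonal and equal", R="diagonal and unequal"))
c(equal.R = kem3$AICc, unequal.R = kem3u$AICc)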
Do assumptions about the underlying structure of the population affect
your estimates of u? Structure here means number of subpopulations and
which areas are in which subpopulation.
The confidence intervals for the first two analyses are very tight because the estimated process variance, Q, was very small. Why do you think the process variance (q) was forced to be so small? [Hint: We are forcing there to be one and only one true population trajectory and all the observation time series have to fit that one time series. Look at the AICs too.]

8 Identifying spatial population structure and covariance

8.1 Harbor seals on the U.S. west coast

In this application, we use time series of harbor seal abundance estimates along the west coast to examine large-scale spatial structure. Harbor seals are distributed along the west coast of the U.S. from California to Washington. The populations in Oregon and Washington have been surveyed for over 25 years (Jeffries et al., 2003) at a number of haul-out sites (Figure 8.1). These populations have been increasing steadily since the 1972 Marine Mammal Protection Act.

For management purposes, three stocks are recognized: the CA stock; the OR/WA coastal stock, which consists of four regions (Northern/Southern Oregon, Coastal Estuaries, Olympic Peninsula); and the inland WA stock, which consists of the regions in the WA inland waters minus Hood Canal (Figure 8.1). Differences exist in the demographics across regions (e.g., pupping dates); however, mtDNA analyses and tagging studies support the larger stock structure. Harbor seals are known for strong site fidelity but at the same time travel large distances to forage.

Our goal is to address the following questions about spatial structure: 1) Does the population abundance data support the existing management boundaries, or are there alternative groupings that receive more support? 2) Do subpopulations (if they exist) experience independent environmental variability or correlated variability? 3) Does the Hood Canal site represent a distinct subpopulation? To address these questions, we will mathematically formulate different hypotheses about population structure via different MARSS models. We will then compare the data support for the different models using model selection criteria, specifically AICc and AIC weights.

Type RShowDoc("Chapter_SealPopStructure.R",package="MARSS") at the R
command line to open a file with all the code for the examples in this chapter.



[Figure 8.1 appears here: a map of Washington and Oregon labeling the survey regions San Juans, Eastern Bays, Juan de Fuca, Puget Sound, Olympic Peninsula, Hood Canal, Coastal Estuaries, Northern Coast, and Southern Coast.]

Fig. 8.1. Map of spatial distribution of harbor seal survey regions in Washington and Oregon. In addition to these nine survey regions, we also have data from the Georgia Strait just north of the San Juan Islands, the California coast, and the Channel Islands in Southern California.

8.1.1 MARSS models for a population with spatial structure

The mathematical form of the model we will use is

$$
\begin{gathered}
\mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{u} + \mathbf{w}_t, \quad \mathbf{w}_t \sim \text{MVN}(0, \mathbf{Q})\\
\mathbf{y}_t = \mathbf{Z}\mathbf{x}_t + \mathbf{a} + \mathbf{v}_t, \quad \mathbf{v}_t \sim \text{MVN}(0, \mathbf{R})\\
\mathbf{x}_0 \sim \text{MVN}(\pi, \Lambda)
\end{gathered}
\tag{8.1}
$$

B is in front of x but is left off above since it is the identity matrix (a diagonal matrix with 1s on the diagonal). We will use Z, u, and Q to specify different hypotheses about the population structure. The form of a will be "scaling" in all cases. Aerial survey methodology has been relatively constant across time and space, and we will assume that all the time series from each region have identical and independent observation error variance, which means a diagonal R matrix with one variance term on the diagonal. (The sampling regions have different numbers of sites where animals are counted, but we are working with log counts. We assume that the distribution of percent errors is the same across regions, i.e., the probability of a 10% over-count is the same, and thus that the variances are similar on the log scale.)
Each call to MARSS() will look like
fit = MARSS(sealData, model=list(
Z = Z.model, Q = Q.model, ...))
where the ... are components of the model list that are the same across
all models. We will specify different Z.model and Q.model in order to model
different population spatial structures.

8.2 Question 1, How many distinct subpopulations?
We will start by evaluating the data support for the following hypotheses
about the population structure:
H1: 3 subpopulations defined by stock
H2: 2 subpopulations defined by coastal versus WA inland
H3: 2 subpopulations defined by a north and south split in the middle of Oregon
H4: 4 subpopulations defined by N coastal, S coastal, SJF+Georgia Strait, and Puget Sound
H5: All regions are part of the same panmictic population
H6: Each of the 11 regions is a subpopulation
We will analyze each of these under the assumption of independent process
errors with each subpopulation having different variances or the same variance.
8.2.1 Specify the Z matrices
The Z matrices specify the relationship between the survey regions and the
subpopulations and allow us to specify the spatial population structures in the
hypotheses. Each column of Z corresponds to a different subpopulation and associates regions with particular subpopulations. For example, for hypothesis 1, column 1 of the Z matrix is OR/WA coastal, column 2 is inland WA (ps for Puget Sound), and column 3 is CA. The Z matrices for hypotheses 1, 2, 4, and 5 take the following form:

                    H1            H2         H4            H5
                    wa.or ps ca   coast ps   nc is ps sc   pan
Coastal Estuaries     1   0  0      1    0    1  0  0  0    1
Olympic Peninsula     1   0  0      1    0    1  0  0  0    1
Str. Juan de Fuca     0   1  0      0    1    0  1  0  0    1
San Juan Islands      0   1  0      0    1    0  1  0  0    1
Eastern Bays          0   1  0      0    1    0  0  1  0    1
Puget Sound           0   1  0      0    1    0  0  1  0    1
CA.Mainland           0   0  1      1    0    0  0  0  1    1
CA.ChannelIslands     0   0  1      1    0    0  0  0  1    1
OR North Coast        1   0  0      1    0    1  0  0  0    1
OR South Coast        1   0  0      1    0    0  0  0  1    1
Georgia Strait        0   1  0      0    1    0  1  0  0    1

To tell MARSS() the form of Z, we construct the same matrix in R. For example, for hypothesis 1, we can write:

Z.model=matrix(0,11,3)
Z.model[c(1,2,9,10),1]=1 #which elements in col 1 are 1
Z.model[c(3:6,11),2]=1 #which elements in col 2 are 1
Z.model[7:8,3]=1 #which elements in col 3 are 1
MARSS has a shortcut for making this kind of Z matrix using factor(). To make the Z matrix for hypothesis 1, we could also write factor(c(1,1,2,2,2,2,3,3,1,1,2)). Each element corresponds to one of the rows of Z and indicates which column the "1" appears in, i.e., which row of your data belongs to which subpopulation. Instead of numbers, however, we will use text strings to denote the subpopulations. For example, the Z.model specification for hypothesis 1 is

Z1=factor(c("wa.or","wa.or",rep("ps",4),"ca","ca","wa.or","wa.or","bc"))

Notice it is 11 elements in length: one element for each row of data (in this case, survey region).
8.2.2 Specify the u structure

We will assume that subpopulations can have a unique population growth rate. Mathematically, this means that the u matrix in Equation 8.1 looks like this for hypothesis 1 (3 subpopulations):

$$
\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}
$$
To specify this, we construct U.model as a character matrix where shared
elements have the same character name. For example,
U.model=matrix(c("u1","u2","u3"),3,1)
for a three subpopulation model. Alternatively, we can use the shortcut
U.model="unequal".


8.2.3 Specify the Q structures
For our first analysis, we fit a model where the subpopulations experience
independent process errors. We will use two different types of independent
process errors: independent process errors with different variances and independent process errors with identical variance. Independence is specified with
a diagonal variance-covariance matrix with 0s on the off-diagonals.
Independent process errors with different variances is a diagonal matrix with different values on the diagonal:

$$
\begin{bmatrix} q_1 & 0 & 0 \\ 0 & q_2 & 0 \\ 0 & 0 & q_3 \end{bmatrix}
$$
This matrix has fixed numeric values (the zeros) combined with the symbols q_1, q_2, and q_3, which represent estimated values. We specify this for MARSS() using a list matrix, which combines numeric values (the fixed zeros) with character values (names of the estimated elements). The following produces this matrix, and printing it shows that it combines numeric values and character strings in quotes.
Q.model=matrix(list(0),3,3)
diag(Q.model)=c("q1","q2","q3")
Q.model
     [,1] [,2] [,3]
[1,] "q1" 0    0   
[2,] 0    "q2" 0   
[3,] 0    0    "q3"
We can also use the shortcut Q.model="diagonal and unequal".
Independent process errors with identical variance is a diagonal matrix with one value on the diagonal:

$$
\begin{bmatrix} q & 0 & 0 \\ 0 & q & 0 \\ 0 & 0 & q \end{bmatrix}
$$
Q.model=matrix(list(0),3,3)
diag(Q.model)="q"
Q.model
     [,1] [,2] [,3]
[1,] "q"  0    0   
[2,] 0    "q"  0   
[3,] 0    0    "q" 

The shortcut for this form is Q.model="diagonal and equal".


8.3 Fit the different models
The dataset harborSeal is a 29-year dataset of abundance indices for each of 12 regions between 1975 and 2004 (Figure 8.2). We start by setting up our data matrix. We will leave off Hood Canal (column 8) for now.
years = harborSeal[,1] #first col is years
#leave off Hood Canal data for now
sealData = t(harborSeal[,c(2:7,9:13)])

[Figure 8.2 appears here: one abundance-index time series panel (1975-2005) for each of the regions CoastalEstuaries, OlympicPeninsula, StraitJuanDeFuca, SanJuanIslands, EasternBays, PugetSound, HoodCanal, CA.Mainland, CA.ChannelIslands, OR.NorthCoast, OR.SouthCoast, and Georgia.Strait.]

Fig. 8.2. Plot of the harbor seal sites in the harborSeal dataset. Each region is an index of the harbor seal abundance in that region.

We will set up our models so we can fit all of them with one loop of code.
First the Z models.
#H1 stock
Z1=factor(c("wa.or","wa.or",rep("ps",4),"ca","ca","wa.or","wa.or","bc"))
#H2 coastal+PS
Z2=factor(c(rep("coast",2),rep("ps",4),rep("coast",4),"ps"))
#H3 N and S


Z3=factor(c(rep("N",6),"S","S","N","S","N"))
#H4 North Coast, Inland Strait, Puget Sound, South Coast
Z4=factor(c("nc","nc","is","is","ps","ps","sc","sc","nc","sc","is"))
#H5 panmictic
Z5=factor(rep("pan",11))
#H6 Site
Z6=factor(1:11) #site
Z.models=list(Z1,Z2,Z3,Z4,Z5,Z6)
names(Z.models)=
c("stock","coast+PS","N-S","NC+Strait+PS+SC","panmictic","site")
Next we set up the Q models.
Q.models=c("diagonal and equal", "diagonal and unequal")
The rest of the model matrices have the same form across all models.
U.model="unequal"
R.model="diagonal and equal"
A.model="scaling"
B.model="identity"
x0.model="unequal"
V0.model="zero"
model.constant=list(
U=U.model, R=R.model, A=A.model,
x0=x0.model, V0=V0.model, tinitx=0)
We loop through the models, fit, and store the results:

out.tab=NULL
fits=list()
for(i in 1:length(Z.models)){
  for(Q.model in Q.models){
    fit.model = c(list(Z=Z.models[[i]], Q=Q.model), model.constant)
    fit = MARSS(sealData, model=fit.model,
                silent=TRUE, control=list(maxit=1000))
    out=data.frame(H=names(Z.models)[i], Q=Q.model, U=U.model,
                   logLik=fit$logLik, AICc=fit$AICc, num.param=fit$num.params,
                   m=length(unique(Z.models[[i]])),
                   num.iter=fit$numIter, converged=!fit$convergence)
    out.tab=rbind(out.tab,out)
    fits=c(fits,list(fit))
    if(i==5) next #m=1 for panmictic, so the two Q forms fit identically
  }
}


8.4 Summarize the data support
We will use AICc and AIC weights to summarize the data support for the
different hypotheses. First we will sort the fits based on AICc:
min.AICc=order(out.tab$AICc)
out.tab.1=out.tab[min.AICc,]
Next we add the ∆AICc values by subtracting the lowest AICc:
out.tab.1=cbind(out.tab.1,
delta.AICc=out.tab.1$AICc-out.tab.1$AICc[1])
Relative likelihood is defined as exp(−∆AICc/2).
out.tab.1=cbind(out.tab.1,
rel.like=exp(-1*out.tab.1$delta.AICc/2))
The AIC weight for a model is its relative likelihood divided by the sum of all
the relative likelihoods.
out.tab.1=cbind(out.tab.1,
AIC.weight = out.tab.1$rel.like/sum(out.tab.1$rel.like))
Let’s look at the model weights (out.tab.1):
              H                    Q delta.AICc AIC.weight
NC+Strait+PS+SC   diagonal and equal       0.00      0.886
NC+Strait+PS+SC diagonal and unequal       4.15      0.112
            N-S diagonal and unequal      12.67      0.002
            N-S   diagonal and equal      14.78      0.001
       coast+PS   diagonal and equal      31.23      0.000
       coast+PS diagonal and unequal      33.36      0.000
          stock   diagonal and equal      34.01      0.000
          stock diagonal and unequal      36.84      0.000
      panmictic   diagonal and equal      48.28      0.000
      panmictic diagonal and unequal      48.28      0.000
           site   diagonal and equal      56.36      0.000
           site diagonal and unequal      57.95      0.000

It appears that a population structure with north and south coastal subpopulations and two inland subpopulations is more supported than any of the other West Coast population structures, under the assumption of independent process errors. The latter means that good and bad years are not correlated across the subpopulations. The stock structure, supported by genetic information, does not appear to correspond to independent subpopulations, and the individual survey regions, which are characterized by differential pupping times, do not appear to correspond to independent subpopulations either. Figure 8.3 shows the four subpopulation trajectories estimated by the best-fit model. The trajectories have been rescaled so that each starts at 0 in 1975 (to facilitate comparison).

[Figure 8.3 appears here: the estimated trajectories (1975-2005) for the North Coastal, Inland Straits, Puget Sound, and South Coastal subpopulations, rescaled to 0 in 1975.]

Fig. 8.3. Estimated trajectories for the four subpopulations in the best-fit model. The plots have been rescaled so that each is at 0 in 1975.

8.5 Question 2, Are the subpopulations independent?
The assumption of independent process errors is unrealistic given that ocean conditions are correlated across large spatial scales. We will repeat the analysis allowing correlated process errors using two different Q models. The first correlated Q model has correlated process errors with the same variance and covariance. For a model with three subpopulations, this Q would look like:

$$
\begin{bmatrix} q & c & c \\ c & q & c \\ c & c & q \end{bmatrix}
$$
We can construct this like so
#identical variances
Q.model=matrix("c",3,3)
diag(Q.model)="q"
or use the short-cut Q.model="equalvarcov". The second type of correlated Q we will use allows each subpopulation to have a different process variance and covariances. For a model with three subpopulations, this is the following variance-covariance matrix:

$$
\begin{bmatrix} q_1 & c_{1,2} & c_{1,3} \\ c_{1,2} & q_2 & c_{2,3} \\ c_{1,3} & c_{2,3} & q_3 \end{bmatrix}
$$
Constructing this is tedious in R, but there is a short-cut: Q.model="unconstrained".
We will re-run all the Z matrices with these two extra Q types and add
them to our results table.
for(i in 1:length(Z.models)){
  if(i==5) next #don't rerun panmictic
  for(Q.model in c("equalvarcov","unconstrained")){
    fit.model = c(list(Z=Z.models[[i]], Q=Q.model), model.constant)
    fit = MARSS(sealData, model=fit.model,
                silent=TRUE, control=list(maxit=1000))
    out=data.frame(H=names(Z.models)[i], Q=Q.model, U=U.model,
                   logLik=fit$logLik, AICc=fit$AICc, num.param=fit$num.params,
                   m=length(unique(Z.models[[i]])),
                   num.iter=fit$numIter, converged=!fit$convergence)
    out.tab=rbind(out.tab,out)
    fits=c(fits,list(fit))
  }
}
Again we sort the models by AICc and compute model weights.
min.AICc=order(out.tab$AICc)
out.tab.2=out.tab[min.AICc,]
fits=fits[min.AICc]
out.tab.2=cbind(out.tab.2,delta.AICc=out.tab.2$AICc-out.tab.2$AICc[1])
out.tab.2=cbind(out.tab.2,rel.like=exp(-1*out.tab.2$delta.AICc/2))
out.tab.2=cbind(out.tab.2,AIC.weight=out.tab.2$rel.like/sum(out.tab.2$rel.like))
Examination of the expanded results table (out.tab.2) shows there is
strong support for correlated process errors; top 10 models shown:
              H                    Q delta.AICc AIC.weight
NC+Strait+PS+SC          equalvarcov       0.00      0.976
           site          equalvarcov       7.65      0.021
NC+Strait+PS+SC        unconstrained      11.47      0.003
NC+Strait+PS+SC   diagonal and equal      23.39      0.000
NC+Strait+PS+SC diagonal and unequal      27.53      0.000
            N-S        unconstrained      32.61      0.000
            N-S diagonal and unequal      36.06      0.000
            N-S          equalvarcov      36.97      0.000
          stock          equalvarcov      37.82      0.000
            N-S   diagonal and equal      38.16      0.000


The summed model weights for "equalvarcov", "unconstrained", and "diagonal and equal" are
c(
sum(out.tab.2$AIC.weight[out.tab.2$Q=="equalvarcov"]),
sum(out.tab.2$AIC.weight[out.tab.2$Q=="unconstrained"]),
sum(out.tab.2$AIC.weight[out.tab.2$Q=="diagonal and equal"])
)
[1] 0.997 0.003 0.000
8.5.1 Looking at the correlation structure in the Q matrix
The 3rd model in the output table is a model with all elements of the process error variance-covariance matrix estimated. Estimating a variance-covariance matrix with so many extra parameters is not supported relative to the constrained variance-covariance matrix with two parameters (compare the AICc for the 1st and 3rd models), but looking at the full variance-covariance matrix shows some interesting, and not surprising, patterns.
The Q matrix is recovered from the model fit using this command
Q.unc=coef(fits[[3]],type="matrix")$Q
The diagonal of this matrix shows that each region appears to experience
process variability of a similar magnitude:
diag(Q.unc)
[1] 0.009049512 0.007451479 0.004598690 0.005276587
We can compute the correlation matrix as follows. Rownames are added to
make the matrix more readable.
h=diag(1/sqrt(diag(Q.unc)))
Q.corr=h%*%Q.unc%*%h
rownames(Q.corr)=unique(Z4)
colnames(Q.corr)=unique(Z4)
Q.corr
          nc        is        ps        sc
nc 1.0000000 0.5970202 0.6421536 0.9163056
is 0.5970202 1.0000000 0.9970869 0.2271385
ps 0.6421536 0.9970869 1.0000000 0.2832502
sc 0.9163056 0.2271385 0.2832502 1.0000000

The correlation matrix indicates that the inland strait ('is') subpopulation experiences process errors (good and bad years) that are almost perfectly correlated with those of the Puget Sound subpopulation, though the two have different population growth rates (Figure 8.3). Similarly, the north and south coastal subpopulations ('nc' and 'sc') experience highly correlated process errors, though again population growth rates are much higher in the north. There is much higher correlation between the process errors of the north coastal subpopulation and the nearby inland straits and Puget Sound subpopulations than between the two inland subpopulations and the much farther south coastal subpopulation. These patterns are not ecologically surprising but are not easy to discern looking at the raw count data.

8.6 Question 3, Is the Hood Canal independent?
In the initial analysis, the data from Hood Canal were removed. Hood Canal
has experienced a series of hypoxic events which has led to large perturbations
to the harbor seal prey. We will add the Hood Canal data back in and look at
whether treating Hood Canal as separate is supported compared to treating
it as part of the Puget Sound subpopulation in the top model.
sealData.hc = rbind(sealData,harborSeal[,8])
rownames(sealData.hc)[12]="Hood.Canal"
Here are the two Z matrices for a ‘Hood Canal in the Puget Sound’ and ‘Hood
Canal separate’ model:
ZH1=factor(c("nc","nc","is","is","ps",
"ps","sc","sc","nc","sc","is","ps"))
ZH2=factor(c("nc","nc","is","is","ps",
"ps","sc","sc","nc","sc","is","hc"))
Z.models.hc=list(ZH1, ZH2)
names(Z.models.hc)=c("hood.in.ps","hood.separate")
We will test three different Q matrices: a matrix with one variance and one covariance, an unconstrained variance-covariance matrix, and a variance-covariance matrix where the Hood Canal subpopulation has independent process errors.

Q3=matrix(list("offdiag"),5,5)
diag(Q3)="q"
Q3[,5]=0; Q3[5,]=0; Q3[5,5]="q.hc"
Q.models=list("equalvarcov","unconstrained",Q3)
names(Q.models)=c("equalvarcov","unconstrained","hood.independent")

The independent Hood Canal Q allows correlation among the other four subpopulations but none between Hood Canal and those four:
Q.models$hood.independent

     [,1]      [,2]      [,3]      [,4]      [,5]  
[1,] "q"       "offdiag" "offdiag" "offdiag" 0     
[2,] "offdiag" "q"       "offdiag" "offdiag" 0     
[3,] "offdiag" "offdiag" "q"       "offdiag" 0     
[4,] "offdiag" "offdiag" "offdiag" "q"       0     
[5,] 0         0         0         0         "q.hc"

As before, we loop through the models and create a results table:

out.tab.hc=NULL
fits.hc=list()
for(i in 1:length(Z.models.hc)){
  for(j in 1:length(Q.models)){
    if(i==1 & j==3) next #Q3 is only for the Hood-separate model
    Q.model=Q.models[[j]]
    fit.model = c(list(Z=Z.models.hc[[i]], Q=Q.model), model.constant)
    fit = MARSS(sealData.hc, model=fit.model,
                silent=TRUE, control=list(maxit=1000))
    out=data.frame(H=names(Z.models.hc)[i], Q=names(Q.models)[j], U=U.model,
                   logLik=fit$logLik, AICc=fit$AICc, num.param=fit$num.params,
                   m=length(unique(Z.models.hc[[i]])),
                   num.iter=fit$numIter, converged=!fit$convergence)
    out.tab.hc=rbind(out.tab.hc, out)
    fits.hc=c(fits.hc,list(fit))
  }
}
We sort the results by AICc and compute the ∆AICc.
min.AICc=order(out.tab.hc$AICc)
out.tab.hc=out.tab.hc[min.AICc,]
out.tab.hc=cbind(out.tab.hc, delta.AICc=out.tab.hc$AICc-out.tab.hc$AICc[1])
out.tab.hc=cbind(out.tab.hc,rel.like=exp(-1*out.tab.hc$delta.AICc/2))
out.tab.hc=cbind(out.tab.hc,AIC.weight=out.tab.hc$rel.like/sum(out.tab.hc$rel.like))
The results table (out.tab.hc) indicates strong support for treating Hood Canal as a separate subpopulation but no support for completely independent process errors.

            H                Q delta.AICc AIC.weight
hood.separate      equalvarcov       0.00      0.988
hood.separate hood.independent       8.74      0.012
   hood.in.ps      equalvarcov      23.53      0.000
hood.separate    unconstrained      30.65      0.000
   hood.in.ps    unconstrained      36.66      0.000

8.7 Discussion

In this chapter, we used model selection and AICc model weights to explore the temporal correlation structure in the harbor seal abundance data from the U.S. west coast. We used the term 'subpopulation'; however, it should be kept in mind that we are actually looking at the data support for different spatial patterns of temporal correlation in the process errors. Treating regions A and B as a 'subpopulation' in this context means that we are asking if the counts from A and B can be treated as observations of the same underlying stochastic trajectory.

Metapopulation structure refers to a case where a larger population is composed of a collection of smaller, temporally independent subpopulations. Metapopulation structure buffers the variability seen in the larger population and has important consequences for the viability of a population. We tested for temporal independence using diagonal versus non-diagonal Q matrices. Although the west coast harbor seal population appears to be divided into 'subpopulations' that experience different population growth rates, there is strong temporal correlation in the year-to-year variability experienced by these subpopulations. This suggests that this harbor seal population does not function as a true metapopulation with independent subpopulations but rather as a collection of subpopulations that are temporally correlated.

9 Dynamic factor analysis (DFA)

9.1 Overview

In this chapter, we use MARSS to do dynamic factor analysis (DFA), which allows us to look for a set of common underlying trends among a relatively large set of time series (Harvey, 1989, sec. 8.5). See also Zuur et al. (2003), which shows a number of examples of DFA applied to fisheries catch data and densities of zoobenthos. We will walk through some examples to show you the math behind DFA, and then in section 9.4, we will show a short-cut for doing a DFA with MARSS using form="dfa".

DFA is conceptually different from what we have been doing in the previous applications. Here we are trying to explain temporal variation in a set of n observed time series using linear combinations of a set of m hidden random walks, where m << n. A DFA model is a type of MARSS model with the following structure:

$$
\begin{gathered}
\mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t, \quad \mathbf{w}_t \sim \text{MVN}(0, \mathbf{Q})\\
\mathbf{y}_t = \mathbf{Z}\mathbf{x}_t + \mathbf{a} + \mathbf{v}_t, \quad \mathbf{v}_t \sim \text{MVN}(0, \mathbf{R})\\
\mathbf{x}_0 \sim \text{MVN}(\pi, \Lambda)
\end{gathered}
\tag{9.1}
$$

The general idea is that the observations (y) are modeled as a linear combination of hidden trends (x) and factor loadings (Z) plus some offsets (a). The DFA model in Equation 9.1 and the standard MARSS model in Equation 1.1 are equivalent; we have simply set the matrix B equal to an m × m identity matrix (i.e., a diagonal matrix with 1's on the diagonal and 0's elsewhere) and the vector u = 0.

Type RShowDoc("Chapter_DFA.R",package="MARSS") at the R command line to
open a file with all the code for the examples in this chapter.


9.1.1 Writing out a DFA model in MARSS form

Imagine a case where we had a data set with six observed time series (n = 6) and we want to fit a model with three hidden trends (m = 3). If we write out our DFA model in MARSS matrix form (ignoring the error structures and initial conditions for now), it would look like this:

$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}_t =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}_{t-1} +
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} +
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}_t
$$

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix}_t =
\begin{bmatrix}
z_{11} & z_{12} & z_{13} \\
z_{21} & z_{22} & z_{23} \\
z_{31} & z_{32} & z_{33} \\
z_{41} & z_{42} & z_{43} \\
z_{51} & z_{52} & z_{53} \\
z_{61} & z_{62} & z_{63}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}_t +
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \end{bmatrix} +
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \\ v_6 \end{bmatrix}_t
\tag{9.2}
$$

The process errors of the hidden trends would be

$$
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}_t \sim
\text{MVN}\left(\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{12} & q_{22} & q_{23} \\ q_{13} & q_{23} & q_{33} \end{bmatrix}\right),
\tag{9.3}
$$

and the observation errors would be

$$
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \\ v_6 \end{bmatrix}_t \sim
\text{MVN}\left(\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & r_{14} & r_{15} & r_{16} \\
r_{12} & r_{22} & r_{23} & r_{24} & r_{25} & r_{26} \\
r_{13} & r_{23} & r_{33} & r_{34} & r_{35} & r_{36} \\
r_{14} & r_{24} & r_{34} & r_{44} & r_{45} & r_{46} \\
r_{15} & r_{25} & r_{35} & r_{45} & r_{55} & r_{56} \\
r_{16} & r_{26} & r_{36} & r_{46} & r_{56} & r_{66}
\end{bmatrix}\right).
\tag{9.4}
$$

9.1.2 Constraints to ensure identifiability

If Z, a, and Q in Equation 9.1 are not constrained, then the DFA model above is unidentifiable (Harvey, 1989, sec. 4.4). Harvey (1989, sec. 8.5.1) suggests the following parameter constraints to make the model identifiable:

- in the first m − 1 rows of Z, the z-value in the j-th column and i-th row is set to zero if j > i;
- a is constrained so that the first m values are set to zero; and
- Q is set equal to the identity matrix (I_m).


Zuur et al. (2003), however, found that with Harvey's second constraint, the EM algorithm is not particularly robust and takes a long time to converge. Zuur et al. found that the EM estimates are much better behaved if you instead constrain each of the time series in x to have a mean of zero across t = 1 to T. To do so, they replaced the estimates of the hidden states, x_t^T, coming out of the Kalman smoother with x_t^T − x̄ for t = 1 to T, where x̄ is the mean of x_t across t. With this approach, you estimate all of the a elements, which represent the average level of y_t relative to Z(x_t − x̄). We found that demeaning the x_t^T in this way can cause the EM algorithm to have errors (decline in log-likelihood). Instead, we demean our data and fix all elements of a to zero.
Using these constraints, the DFA model in Equation 9.2 becomes

$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}_t =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}_{t-1} +
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} +
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}_t
$$

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix}_t =
\begin{bmatrix}
z_{11} & 0 & 0 \\
z_{21} & z_{22} & 0 \\
z_{31} & z_{32} & z_{33} \\
z_{41} & z_{42} & z_{43} \\
z_{51} & z_{52} & z_{53} \\
z_{61} & z_{62} & z_{63}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}_t +
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} +
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \\ v_6 \end{bmatrix}_t
\tag{9.5}
$$

The process errors of the hidden trends in Equation 9.3 would then become

$$
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}_t \sim
\text{MVN}\left(\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\right),
\tag{9.6}
$$

but the observation errors in Equation 9.4 would stay the same, such that

$$
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \\ v_6 \end{bmatrix}_t \sim
\text{MVN}\left(\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & r_{14} & r_{15} & r_{16} \\
r_{12} & r_{22} & r_{23} & r_{24} & r_{25} & r_{26} \\
r_{13} & r_{23} & r_{33} & r_{34} & r_{35} & r_{36} \\
r_{14} & r_{24} & r_{34} & r_{44} & r_{45} & r_{46} \\
r_{15} & r_{25} & r_{35} & r_{45} & r_{55} & r_{56} \\
r_{16} & r_{26} & r_{36} & r_{46} & r_{56} & r_{66}
\end{bmatrix}\right).
\tag{9.7}
$$

To complete our model, we still need the final form for the initial conditions of the state. Following Zuur et al. (2003), we set the initial state vector (x0) to have zero mean and a diagonal variance-covariance matrix with large variances, such that

$$
\mathbf{x}_0 \sim \text{MVN}\left(\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{bmatrix}\right).
\tag{9.8}
$$

9.2 The data
We will analyze some of the Lake Washington plankton data included in the
MARSS package. This dataset includes 33 years of monthly counts for 13
plankton species along with data on water temperature, total phosphorous
(TP), and pH. First, we load the data and then extract a subset of columns
corresponding to the phytoplankton species only. For the purpose of speeding
up model fitting times and to limit our analysis to years with no missing
covariate data, we will only examine 10 years of data (1980-1989).
# load the data (there are 3 datasets contained here)
data(lakeWAplankton)
# we want lakeWAplanktonTrans, which has been transformed
# so the 0s are replaced with NAs and the data z-scored
dat = lakeWAplanktonTrans
# use only the 10 years from 1980-1989
plankdat = dat[dat[,"Year"]>=1980 & dat[,"Year"]<1990,]
# create vector of phytoplankton group names
phytoplankton = c("Cryptomonas", "Diatoms", "Greens",
"Unicells", "Other.algae")
# get only the phytoplankton
dat.spp.1980 = plankdat[,phytoplankton]
Next, we transpose the data and calculate the number of time series and their
length.
# transpose data so time goes across columns
dat.spp.1980 = t(dat.spp.1980)
# get number of time series
N.ts = dim(dat.spp.1980)[1]
# get length of time series
TT = dim(dat.spp.1980)[2]
It is normal in this type of analysis to standardize each time series by first subtracting its mean and then dividing by its standard deviation (i.e., create a z-score $\mathbf{y}^*_t$ with mean = 0 and SD = 1), such that

$$\mathbf{y}^*_t = \Sigma^{-1}(\mathbf{y}_t - \bar{\mathbf{y}}),$$

where Σ is a diagonal matrix with the standard deviations of each time series along the diagonal, and ȳ is a vector of the means. In R, this can be done as follows:

Sigma = sqrt(apply(dat.spp.1980, 1, var, na.rm=TRUE))
y.bar = apply(dat.spp.1980, 1, mean, na.rm=TRUE)
dat.z = (dat.spp.1980 - y.bar) * (1/Sigma)
rownames(dat.z) = rownames(dat.spp.1980)
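Equivalently, base R's scale() computes the same z-scores. It centers and scales the columns of a matrix, so we transpose the data in and out (a sketch):

#z-score with scale(); each series must be a column
dat.z2 = t(scale(t(dat.spp.1980)))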
Figure 9.1 shows time series of Lake Washington phytoplankton data following
z -score transformation.

[Figure 9.1 appears here: one panel of z-scored abundance indices (1980-1990) for each of Cryptomonas, Diatoms, Greens, Unicells, and Other.algae.]

Fig. 9.1. Time series of Lake Washington phytoplankton data following z-score transformation.

9.3 Setting up the model in MARSS
As we have seen in other cases, setting up the model structure for MARSS
requires that the parameter matrices have a one-to-one correspondence to the
model as you would write it on paper (i.e., Equations 9.5 through 9.8). If a
parameter matrix has a combination of fixed and estimated values, then you
specify that using matrix(list(), nrow, ncol). This is a matrix of class list
and allows you to combine numeric and character values in a single matrix.
MARSS recognizes the numeric values as fixed values and the character values
as estimated values.
This is how we set up Z for MARSS, assuming a model with 5 observed
time series and 3 hidden trends:
Z.vals = list(
"z11", 0 , 0 ,
"z21","z22", 0 ,
"z31","z32","z33",
"z41","z42","z43",

118

9 Dynamic factor analysis

"z51","z52","z53")
Z = matrix(Z.vals, nrow=N.ts, ncol=3, byrow=TRUE)
When specifying the list values, spacing and carriage returns were added to
help show the correspondence with the Z matrix in Equation 9.3. If you print
Z (at the R command line), you will see that it is a matrix with character
values (the estimated elements) and numeric values (the fixed 0’s).
print(Z)
     [,1]  [,2]  [,3] 
[1,] "z11" 0     0    
[2,] "z21" "z22" 0    
[3,] "z31" "z32" "z33"
[4,] "z41" "z42" "z43"
[5,] "z51" "z52" "z53"

Notice that the 0’s do not have quotes around them. If they did, it would
mean the "0" is a character value and would be interpreted as the name of a
parameter to be estimated rather than a fixed numeric value.
The Q and B matrices are both set equal to the identity matrix using
diag().
Q = B = diag(1,3)
For our first analysis, we will assume that each time series of phytoplankton
has a different observation variance, but that there is no covariance among
time series. Thus, R should be a diagonal matrix that looks like:

$$
\begin{bmatrix}
r_{11} & 0 & 0 & 0 & 0 \\
0 & r_{22} & 0 & 0 & 0 \\
0 & 0 & r_{33} & 0 & 0 \\
0 & 0 & 0 & r_{44} & 0 \\
0 & 0 & 0 & 0 & r_{55}
\end{bmatrix},
$$
and each of the ri,i elements is a different parameter to be estimated. We can
also specify this R structure using a list matrix as follows:
R.vals = list(
"r11",0,0,0,0,
0,"r22",0,0,0,
0,0,"r33",0,0,
0,0,0,"r44",0,
0,0,0,0,"r55")
R = matrix(R.vals, nrow=N.ts, ncol=N.ts, byrow=TRUE)


You can print R at the R command line to see what it looks like:
print(R)
     [,1]  [,2]  [,3]  [,4]  [,5] 
[1,] "r11" 0     0     0     0    
[2,] 0     "r22" 0     0     0    
[3,] 0     0     "r33" 0     0    
[4,] 0     0     0     "r44" 0    
[5,] 0     0     0     0     "r55"

This form of variance-covariance matrix is commonly used, and therefore
MARSS has a built-in shorthand for this structure. Alternatively, we could
simply type:
R = "diagonal and unequal"
As mentioned in earlier chapters, there are other shorthand notations for many
of the common parameter structures. Type ?MARSS at the R command line
to see a list of the shorthand options for each parameter vector/matrix.
The parameter vectors π (termed x0 in MARSS), a and u are each set to
be a column vector of zeros. Either of the following can be used:
x0 = U = matrix(0, nrow=3, ncol=1)
A = matrix(0, nrow=N.ts, ncol=1) #one value per observed time series (N.ts = 5)
x0 = U = A = "zero"
The Λ matrix (termed V0 in MARSS) is a diagonal matrix with 5’s along
the diagonal:
V0 = diag(5,3)
Finally, we make a list of the model parameters to pass to the MARSS()
function and set the control list:
dfa.model = list(Z=Z, A="zero", R=R, B=B, U=U, Q=Q, x0=x0, V0=V0)
cntl.list = list(maxit=50)
For the examples in this chapter, we have set the maximum iterations to 50
to speed up model fitting. Note, however, that the parameter estimates will
not have converged to their maximum likelihood values, which would likely
take 100s, if not 1000+, iterations.
9.3.1 Fitting the model
We can now pass the DFA model list to MARSS() to estimate the Z matrix
and underlying hidden states (x). The output is not shown because it is voluminous, but the model fits are plotted in Figure 9.2. The warnings regarding
non-convergence are due to setting maxit to 50.
kemz.3 = MARSS(dat.z, model=dfa.model, control=cntl.list)


Warning! Reached maxit before parameters converged. Maxit was 50.
neither abstol nor log-log convergence tests were passed.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: maxit reached at 50 iter before convergence.
Neither abstol nor log-log convergence test were passed.
The likelihood and params are not at the ML values.
Try setting control$maxit higher.
Log-likelihood: -782.202
AIC: 1598.404
AICc: 1599.463

                            Estimate
Z.z11                         0.4163
Z.z21                         0.5364
Z.z31                         0.2780
Z.z41                         0.5179
Z.z51                         0.1611
Z.z22                         0.6757
Z.z32                        -0.2381
Z.z42                        -0.2381
Z.z52                        -0.2230
Z.z33                         0.2305
Z.z43                        -0.1225
Z.z53                         0.3887
R.(Cryptomonas,Cryptomonas)   0.6705
R.(Diatoms,Diatoms)           0.0882
R.(Greens,Greens)             0.7201
R.(Unicells,Unicells)         0.1865
R.(Other.algae,Other.algae)   0.5441
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Convergence warnings
10 warnings. First 10 shown. Type cat(object$errors) to see the full list.
Warning: the Z.z51 parameter value has not converged.
Warning: the Z.z32 parameter value has not converged.
Warning: the Z.z52 parameter value has not converged.
Warning: the Z.z33 parameter value has not converged.
Warning: the Z.z43 parameter value has not converged.
Warning: the R.(Diatoms,Diatoms) parameter value has not converged.
Warning: the R.(Greens,Greens) parameter value has not converged.
Warning: the R.(Other.algae,Other.algae) parameter value has not converged.


Warning: the logLik parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.

[Figure 9.2 appears here: panels of the z-scored phytoplankton time series (Cryptomonas, Diatoms, Greens, Unicells, Other.algae) with model fits (dark lines) overlaid.]

Fig. 9.2. Plots of Lake Washington phytoplankton data with model fits (dark lines) from a model with 3 trends and a diagonal and unequal variance-covariance matrix for the observation errors. This model was run to convergence, so it is different from that shown in the text, which uses maxit=50.

9.4 Using model selection to determine the number of trends
Following Zuur et al. (2003), we use model selection criteria (specifically AICc)
to determine the number of underlying trends that have the highest data
support. Our first model had three underlying trends (m = 3). Let’s compare
this to a model with two underlying trends. The forms for parameter matrix
R and vector a will stay the same but we need to change the other parameter
vectors and matrices because m is different.


After showing you the matrix math behind a DFA model, we will now
use the form argument for a MARSS call to specify that we want to fit a
DFA model. This will set up the Z matrix and the other parameters for you.
Specify how many trends you want by passing in model=list(m=x). You can
also pass in different forms for the R matrix in the usual way.
Here is how to fit two trends using form="dfa":
model.list = list(m=2, R="diagonal and unequal")
kemz.2 = MARSS(dat.spp.1980, model=model.list,
z.score=TRUE, form="dfa", control=cntl.list)
Warning! Reached maxit before parameters converged. Maxit was 50.
neither abstol nor log-log convergence tests were passed.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: maxit reached at 50 iter before convergence.
Neither abstol nor log-log convergence test were passed.
The likelihood and params are not at the ML values.
Try setting control$maxit higher.
Log-likelihood: -789.7433
AIC: 1607.487
AICc: 1608.209

                            Estimate
Z.11                          0.3128
Z.21                          0.1797
Z.31                          0.3061
Z.41                          0.5402
Z.51                          0.0791
Z.22                         -0.1174
Z.32                          0.4024
Z.42                         -0.0552
Z.52                          0.3895
R.(Cryptomonas,Cryptomonas)   0.7500
R.(Diatoms,Diatoms)           0.8565
R.(Greens,Greens)             0.5672
R.(Unicells,Unicells)         0.2292
R.(Other.algae,Other.algae)   0.6738

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Convergence warnings
Warning: the Z.31 parameter value has not converged.
Warning: the Z.51 parameter value has not converged.


Warning: the Z.32 parameter value has not converged.
Warning: the Z.42 parameter value has not converged.
Warning: the R.(Greens,Greens) parameter value has not converged.
Warning: the logLik parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
and compare its AICc value to that from the 3-trend model.
print(cbind(model=c("3 trends", "2 trends"),
AICc=round(c(kemz.3$AICc, kemz.2$AICc))),
quote=FALSE)
     model    AICc
[1,] 3 trends 1589
[2,] 2 trends 1608
It looks like a model with 3 trends has much more support from the data
because its AICc value is more than 10 units less than that for the 2-trend
model.
9.4.1 Comparing many model structures
Now let’s examine a larger suite of possible models. We will test from one to
four underlying trends (m = 1 to 4) and four different structures for the R
matrix:
1. same variances & no covariance ("diagonal and equal");
2. different variances & no covariance ("diagonal and unequal");
3. same variances & same covariance ("equalvarcov"); and
4. different variances & covariances ("unconstrained").

The following code builds our model matrices; you could also write out each
matrix as we did in the first example, but this allows us to build and run all
of the models together. (NOTE : the following piece of code will take a very
long time to run!)
# set new control params
cntl.list = list(minit=200, maxit=5000, allow.degen=FALSE)
# set up forms of R matrices
levels.R = c("diagonal and equal",
             "diagonal and unequal",
             "equalvarcov",
             "unconstrained")
model.data = data.frame()
# fit lots of models & store results
# NOTE: this will take a long time to run!
for(R in levels.R) {
  for(m in 1:(N.ts-1)) {
    dfa.model = list(A="zero", R=R, m=m)
    kemz = MARSS(dat.z, model=dfa.model, control=cntl.list,
                 form="dfa", z.score=TRUE)
    model.data = rbind(model.data,
                       data.frame(R=R,
                                  m=m,
                                  logLik=kemz$logLik,
                                  K=kemz$num.params,
                                  AICc=kemz$AICc,
                                  stringsAsFactors=FALSE))
    assign(paste("kemz", m, R, sep="."), kemz)
  } # end m loop
} # end R loop
Model selection results are shown in Table 9.1. The model with the lowest AICc has 3 trends and an unconstrained R matrix, with the 2-trend unconstrained model essentially tied (∆AICc = 0.1). It also appears that, in general, models with an unconstrained R matrix fit the data much better than those models with less complex structures for the observation errors (i.e., models with unconstrained forms for R had nearly all of the AICc weight).
Table 9.1. Model selection results.

R                      m  logLik  delta.AICc  Ak.wt  Ak.wt.cum
unconstrained          3  -762.5         0.0   0.39       0.39
unconstrained          2  -765.9         0.1   0.37       0.76
unconstrained          4  -761.5         2.3   0.12       0.89
unconstrained          1  -772.4         4.4   0.04       0.93
diagonal and unequal   4  -774.2         5.9   0.02       0.95
equalvarcov            2  -782.7         6.1   0.02       0.97
diagonal and unequal   3  -777.1         7.5   0.01       0.98
diagonal and equal     4  -779.3         7.7   0.01       0.99
diagonal and equal     3  -781.8         8.4   0.01       0.99
equalvarcov            4  -779.0         9.1   0.00       1.00
equalvarcov            3  -781.4         9.9   0.00       1.00
diagonal and unequal   2  -786.6        20.2   0.00       1.00
equalvarcov            1  -799.9        32.3   0.00       1.00
diagonal and equal     2  -798.4        35.4   0.00       1.00
diagonal and unequal   1  -798.4        35.4   0.00       1.00
diagonal and equal     1  -813.5        57.4   0.00       1.00

9.5 Using varimax rotation to determine the loadings and trends

As Harvey (1989, p. 450) discusses in section 8.5.1, there are multiple equivalent solutions to the dynamic factor loadings. We arbitrarily constrained Z in such a way to choose only one of these solutions, but fortunately the different solutions are equivalent, and they can be related to each other by a rotation matrix H. Let H be any m × m non-singular matrix. The following are then equivalent solutions:

$$
\begin{gathered}
\mathbf{y}_t = \mathbf{Z}\mathbf{x}_t + \mathbf{a} + \mathbf{v}_t\\
\mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t
\end{gathered}
\tag{9.9}
$$

and

$$
\begin{gathered}
\mathbf{y}_t = \mathbf{Z}\mathbf{H}^{-1}\,\mathbf{H}\mathbf{x}_t + \mathbf{a} + \mathbf{v}_t\\
\mathbf{H}\mathbf{x}_t = \mathbf{H}\mathbf{x}_{t-1} + \mathbf{H}\mathbf{w}_t
\end{gathered}
\tag{9.10}
$$

There are many ways of doing factor rotations, but a common approach is
the varimax rotation which seeks a rotation matrix H that creates the largest
difference between loadings. For example, let’s say there are three trends in our
model. In our estimated Z matrix, let’s say row 3 is (0.2, 0.2, 0.2). That would
mean that data series 3 is equally described by trends 1, 2, and 3. If instead row
3 was (0.8, 0.1, 0.1), this would make interpretation easier because we could
say that data time series 3 was mostly described by trend 1. The varimax
rotation finds the H matrix that makes the Z rows more like (0.8, 0.1, 0.1) and
less like (0.2, 0.2, 0.2).
The varimax rotation is easy to compute because R has a built-in function for this. To do so, we first sort the model results by AICc and get the model fit from the highest-ranked model.

# sort the results from section 9.4.1 by AICc, then get the "best" model
model.tbl = model.data[order(model.data$AICc),]
best.model = model.tbl[1,]
fitname = paste("kemz", best.model$m, best.model$R, sep=".")
best.fit = get(fitname)
Next, we retrieve the matrix used for varimax rotation.
# get the inverse of the rotation matrix
H.inv = varimax(coef(best.fit, type="matrix")$Z)$rotmat
Finally, we use H−1 to rotate the factor loadings and H to rotate the trends
as in Equation 9.10.
# rotate factor loadings
Z.rot = coef(best.fit, type="matrix")$Z %*% H.inv
# rotate trends
trends.rot = solve(H.inv) %*% best.fit$states
Rotated factor loadings for the best model are shown in Figure 9.3. Oddly,
some taxa appear to have no loadings on some trends (e.g., diatoms on trend
1). The reason is that, merely for display purposes, we chose to plot only
those loadings that are greater than 0.05, and it turns out that after varimax
rotation, several loadings are close to 0.
Recall that we set Var(w_t) = Q = I_m in order to make our DFA model identifiable. Does the variance of the process errors also change following varimax rotation? Interestingly, no. Because H is a non-singular, orthogonal matrix,

$$\text{Var}(\mathbf{H}\mathbf{w}_t) = \mathbf{H}\,\text{Var}(\mathbf{w}_t)\,\mathbf{H}^\top = \mathbf{H}\mathbf{I}_m\mathbf{H}^\top = \mathbf{I}_m.$$

[Figure 9.3 appears here: bar plots of the factor loadings of each taxon (Cryptomonas, Diatoms, Greens, Unicells, Other.algae) on trends 1, 2, and 3.]

Fig. 9.3. Plot of the factor loadings (following varimax rotation) from the best
model fit to the phytoplankton data.

9.6 Examining model fits
Now that we have found a “best” model and done the appropriate factor and
trends rotations, we should examine some plots of model fits (see Figure 9.5).
First, it looks like the model did an adequate job of capturing some of the
high frequency variation (i.e., seasonality) in the time series. Second, some
of the time series had much better overall fits than others (e.g., compare
Diatoms versus Cryptomonas). Given the obvious seasonal patterns in the
phytoplankton data, it might be worthwhile to first “detrend” the data and
then repeat the model fitting exercise to see (1) how many trends would be
favored, and (2) the shape of those trends.
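As a sketch of one simple way to do that, we can subtract the monthly means from each z-scored series, removing the average seasonal cycle; this assumes the plankdat and dat.z objects from section 9.2, with the month of each observation in plankdat's "Month" column:

#remove the mean seasonal cycle from each series
months = plankdat[,"Month"]
dat.detrend = dat.z
for(i in 1:N.ts){
  mon.mean = tapply(dat.z[i,], months, mean, na.rm=TRUE)
  dat.detrend[i,] = dat.z[i,] - mon.mean[as.character(months)]
}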

[Figure 9.4 appears here: the three unobserved trends (following varimax rotation), plotted 1980-1989.]
Fig. 9.4. Plot of the unobserved trends (following varimax rotation) from the best
model fit to the phytoplankton data.
[Figure 9.5 appears here: the phytoplankton time series with the best model's fits overlaid.]

Fig. 9.5. Plot of the "best" model fits to the phytoplankton data.

128

9 Dynamic factor analysis

9.7 Adding covariates
It is standard to add covariates to the analysis so that one removes known
important drivers. The DFA with covariates is written:
\[
\begin{gathered}
\mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \text{ where } \mathbf{w}_t \sim \text{MVN}(0,\mathbf{Q}) \\
\mathbf{y}_t = \mathbf{Z}\mathbf{x}_t + \mathbf{a} + \mathbf{D}\mathbf{d}_t + \mathbf{v}_t \text{ where } \mathbf{v}_t \sim \text{MVN}(0,\mathbf{R}) \\
\mathbf{x}_0 \sim \text{MVN}(\boldsymbol{\pi},\boldsymbol{\Lambda})
\end{gathered}
\tag{9.11}
\]
where the q × 1 vector d_t contains the covariate(s) at time t, and the n × q
matrix D contains the effect(s) of the covariate(s) on the observations. Using
form="dfa" and the covariates= argument, we can easily add covariates to
our DFA. Note, however, that the covariates are treated as inputs, not data,
so they cannot contain missing values (see Chapter 6 for how to include
covariates with missing values).
The Lake Washington dataset has two environmental covariates that we
might expect to affect phytoplankton growth, and hence abundance:
temperature (Temp) and total phosphorus (TP). The covariate inputs must
have the same number of time steps as the variate data, so we limit the
covariate data to the years 1980-1994 as well.
temp = t(plankdat[,"Temp",drop=FALSE])
TP = t(plankdat[,"TP",drop=FALSE])
We will now fit 3 different models that each add covariate effects (i.e.,
Temp, TP, Temp & TP) to our “best” model from Table 9.1 where m = 2 and
R is “unconstrained”.
model.list=list(m=2, R="unconstrained")
kemz.temp = MARSS(dat.spp.1980, model=model.list, z.score=TRUE,
form="dfa", control=cntl.list, covariates=temp)
kemz.TP = MARSS(dat.spp.1980, model=model.list, z.score=TRUE,
form="dfa", control=cntl.list, covariates=TP)
kemz.both = MARSS(dat.spp.1980, model=model.list, z.score=TRUE,
form="dfa", control=cntl.list, covariates=rbind(temp,TP))
Next we can compare whether the addition of the covariates improves the
model fit (effectively less residual error while accounting for the additional
parameters). (NOTE : The following results were obtained by letting the EM
algorithm run for a very long time, so your results may differ.)
print(cbind(model=c("no covars", "Temp", "TP", "Temp & TP"),
AICc=round(c(best.fit$AICc, kemz.temp$AICc, kemz.TP$AICc,
kemz.both$AICc))), quote=FALSE)
     model     AICc
[1,] no covars 1582
[2,] Temp      1518
[3,] TP        1568
[4,] Temp & TP 1522
This suggests that adding temperature or phosphorus to the model, either
alone or in combination, improves overall model fit. If we were truly
interested in assessing the "best" model structure that includes covariates,
however, we should examine all combinations of trends and structures for R.
The model fits for the temperature-only model are shown in Figure 9.6 and
they appear much better than those from the best model without any covariates.

[Figure 9.6: observed abundance indices (points) and fits from the temperature-only model for Cryptomonas, Unicells, Diatoms, Other.algae, and Greens, 1980-1990]
Fig. 9.6. Plot of the fits from the temperature-only model to the phytoplankton
data.

9.8 Questions and further analyses
We analyzed the phytoplankton data alone. You can try analyzing the zooplankton data (type head(plankdat) to see the names). You can also try
analyzing the phytoplankton and zooplankton together. You can also try different assumptions concerning the structure of R; we just tried unconstrained,

130

9 Dynamic factor analysis

diagonal and unequal, and diagonal and equal. To see all the R code behind the
figures, type RShowDoc("Chapter_DFA.R",package="MARSS"). This opens a
file with all the code. Copy and paste the code into a new file, and then you
can edit that code. DFA models often take an unusually long time to converge.
In a real DFA, you will want to make sure to try different initial starting values (e.g., set MCInit = TRUE), and force the algorithm to run a long time by
using minit=x and maxit=(x+c), where x and c are something like 200 and
5000, respectively.
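Putting that advice together, a control list might look like the following (a sketch; the specific values are illustrative, not recommendations):

# Monte Carlo initial conditions plus a long EM run
cntl.long = list(MCInit=TRUE, minit=200, maxit=5200)
kemz.long = MARSS(dat.spp.1980, model=model.list, z.score=TRUE,
                  form="dfa", control=cntl.long)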

10
Analyzing noisy animal tracking data

10.1 A simple random walk model of animal movement
A simple random walk model of movement with drift (directional movement)
but no correlation is
\[
\begin{aligned}
x_{1,t} &= x_{1,t-1} + u_1 + w_{1,t}, \quad w_{1,t} \sim \text{N}(0,\sigma_1^2) \\
x_{2,t} &= x_{2,t-1} + u_2 + w_{2,t}, \quad w_{2,t} \sim \text{N}(0,\sigma_2^2)
\end{aligned}
\tag{10.1, 10.2}
\]
where x_{1,t} is the location at time t along one axis (here, longitude) and x_{2,t} is
the location along another, generally orthogonal, axis (here, latitude). The
parameter u_1 is the rate of longitudinal movement and u_2 is the rate of
latitudinal movement.
We add errors to our observations of location:
\[
\begin{aligned}
y_{1,t} &= x_{1,t} + v_{1,t}, \quad v_{1,t} \sim \text{N}(0,\eta_1^2) \\
y_{2,t} &= x_{2,t} + v_{2,t}, \quad v_{2,t} \sim \text{N}(0,\eta_2^2).
\end{aligned}
\tag{10.3, 10.4}
\]

This model comprises two separate univariate state-space models.
Note that y1 depends only on x1 and y2 depends only on x2 . There are no
actual interactions between these two univariate models. However, we can
write the model down in the form of a multivariate model using diagonal
variance-covariance matrices and a diagonal design (Z) matrix. Because the
variance-covariance matrices and Z are diagonal, the x1 :y1 and x2 :y2 processes
will be independent as intended. Here are Equations 10.2 and 10.4 written as
a MARSS model (in matrix form):

Type RShowDoc("Chapter_AnimalTracking.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.

\[
\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} =
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} +
\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} +
\begin{bmatrix} w_{1,t} \\ w_{2,t} \end{bmatrix}, \quad
\mathbf{w}_t \sim \text{MVN}\!\left(0, \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}\right)
\tag{10.5}
\]
\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} +
\begin{bmatrix} v_{1,t} \\ v_{2,t} \end{bmatrix}, \quad
\mathbf{v}_t \sim \text{MVN}\!\left(0, \begin{bmatrix} \eta_1^2 & 0 \\ 0 & \eta_2^2 \end{bmatrix}\right)
\tag{10.6}
\]

The variance-covariance matrix for w_t is a diagonal matrix with unequal
variances, σ₁² and σ₂². The variance-covariance matrix for v_t is a diagonal
matrix with unequal variances, η₁² and η₂². We can write this succinctly as
\[
\begin{gathered}
\mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{u} + \mathbf{w}_t, \quad \mathbf{w}_t \sim \text{MVN}(0,\mathbf{Q}) \\
\mathbf{y}_t = \mathbf{x}_t + \mathbf{v}_t, \quad \mathbf{v}_t \sim \text{MVN}(0,\mathbf{R}).
\end{gathered}
\tag{10.7, 10.8}
\]
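To build intuition for Equations 10.7 and 10.8, here is a short sketch that simulates such a track; all parameter values below are made up for illustration:

# simulate a 2-D random walk with drift plus observation error
set.seed(1)
TT = 100
u = c(0.10, 0.05)                                 # drift in lon and lat
w = matrix(rnorm(2*TT, mean=u, sd=0.15), 2, TT)   # drift + process error
x = t(apply(w, 1, cumsum))                        # true locations (2 x TT)
y = x + matrix(rnorm(2*TT, sd=0.30), 2, TT)       # observed locations
matplot(t(y), type="l", ylab="position")          # plot both axes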

10.2 Loggerhead sea turtle tracking data
Loggerhead sea turtles (Caretta caretta) are listed as threatened under the
United States Endangered Species Act of 1973. Over the last ten years, a
number of state and local agencies have been deploying ARGOS tags on
loggerhead turtles on the east coast of the United States. We have data on
eight individuals over that period. In this chapter, we use some turtle data
from the WhaleNet Archive of STOP Data; however, we have corrupted these
data severely by adding random errors in order to create a "bad tag" problem.
We corrupted the latitude and longitude data with errors (Figure 10.1), and
it would appear that our sea turtles are becoming land turtles (at least part
of the time). We will use a MARSS model to estimate true positions and
speeds from the corrupted data.
Our noisy data are in loggerheadNoisy. They consist of daily readings of
location (longitude/latitude). If data are missing for a day, then the entries
for latitude and longitude for that day should be NA. However, to make the
code in this chapter run quickly, we have interpolated all missing values in
the original, uncorrupted, dataset (loggerhead). The first six lines of the
corrupted data look like so
loggerheadNoisy[1:6,]
   turtle month day year       lon      lat
1 BigMama     5  28 2001 -81.45989 31.70337
2 BigMama     5  29 2001 -80.88292 32.18865
3 BigMama     5  30 2001 -81.27393 31.67568
4 BigMama     5  31 2001 -81.59317 31.83092
5 BigMama     6   1 2001 -81.35969 32.12685
6 BigMama     6   2 2001 -81.15644 31.89568

The file has data for eight turtles:


turtles=levels(loggerheadNoisy$turtle)
turtles
[1] "BigMama"  "Bruiser"  "Humpty"   "Isabelle" "Johanna"
[6] "MaryLee"  "TBA"      "Yoto"

We will analyze the position data for “Big Mama”. We put the data for “Big
Mama” into matrix dat. dat is transposed because we need time across the
columns.
turtlename="BigMama"
dat = loggerheadNoisy[which(loggerheadNoisy$turtle==turtlename),5:6]
dat = t(dat) #transpose
Figure 10.1 shows the corrupted location data for Big Mama. The figure
was generated with the code below and uses the maps R package. You will
need to install this R package in order to run the example code.
#load the map package; you have to install it first
library(maps)
# Read in our noisy data (no missing values)
pdat = loggerheadNoisy #for plotting
turtlename="BigMama"
par(mai = c(0,0,0,0),mfrow=c(1,1))
map('state', region = c('florida', 'georgia', 'south carolina', 'north carolina',
'virginia', 'delaware','new jersey','maryland'),xlim=c(-85,-70))
points(pdat$lon[which(pdat$turtle==turtlename)], pdat$lat[which(pdat$turtle==turtlename)],
col="blue",pch=21, cex=0.7)

10.3 Estimate locations from bad tag data
We will begin by specifying the structure of the MARSS model and then use
MARSS() to fit that model to the data. There are two state processes (one for
latitude and the other for longitude), and there is one observation time series
for each state process. As we saw in Equation 10.6, Z is an identity matrix
(a diagonal matrix with 1s on the diagonal). We could specify this structure
as Z.model="identity" or Z.model=factor(c(1,2)), although technically
this is unnecessary since the identity matrix is the default form for Z.
We will assume that the errors are independent and that there are different
drift rates (u), process variances (σ2 ) and observation variances for latitude
and longitude (η2 ).
Z.model="identity"
U.model="unequal"
Q.model="diagonal and unequal"
R.model="diagonal and unequal"


[Figure 10.1: map of the US southeast coast with the noisy tag locations (points), some of them on land]
Fig. 10.1. Plot of the tag data from the turtle Big Mama. Errors in the location
data make it seem that Big Mama has been moving overland.

Fit the model to the data:
kem = MARSS(dat, model=list(Z = Z.model,
Q = Q.model, R = R.model, U = U.model))
Now we can create a plot comparing the estimated and actual locations
(Figure 10.2). The real locations (from which loggerheadNoisy was produced
by adding noise) are in loggerhead and are plotted with crosses. There are
only a few data points for the real data because the real tag data have many
missing days.
#Code to plot estimated turtle track against observations
#The estimates
pred.lon = kem$states[1,]
pred.lat = kem$states[2,]
par(mai = c(0,0,0,0),mfrow=c(1,1))
library(maps)
pdat=loggerheadNoisy
turtlename="BigMama"
map('state', region = c('florida', 'georgia', 'south carolina',
'north carolina', 'virginia', 'delaware','new jersey','maryland'),
xlim=c(-85,-70))
points(pdat$lon[which(pdat$turtle==turtlename)],
pdat$lat[which(pdat$turtle==turtlename)],
col="blue",pch=21, cex=0.7)
lines(pred.lon, pred.lat,col="red", lwd=2)
goodturtles = loggerhead
gooddat = goodturtles[which(goodturtles$turtle==turtlename),5:6]
points(gooddat[,1], gooddat[,2],col="black", lwd=2, pch=3,cex=1.1)
legend("bottomright",c("bad locations", "estimated true location",
"good location data"),pch=c(1,-1,3),lty=c(-1,1,-1),
col=c("blue","red","black"), bty="n")

[Figure 10.2: map with bad locations (circles), estimated true track (line), and good location data (crosses)]
Fig. 10.2. Plot of the estimated track of the turtle Big Mama versus the good
location data (before we corrupted it with noise).


10.4 Estimate speeds for each turtle
Turtle biologists designated one of these loggerheads "Big Mama," presumably
for her size and speed. For each of the eight turtles, estimate the average miles
traveled per day. To calculate the distance traveled by a turtle each day, you
use the estimates (from MARSS()) of the lat/lon location of the turtle at day t
and at day t − 1. To calculate the distance traveled in miles from lat/lon start
and finish locations, we will use the function GCDF:
GCDF <- function(lon1, lon2, lat1, lat2, degrees=TRUE, units="miles") {
# convert from degrees to radians if needed (57.2958 = 180/pi)
if(degrees) {
lon1=lon1/57.2958; lon2=lon2/57.2958
lat1=lat1/57.2958; lat2=lat2/57.2958
}
# great-circle angle between the two points
temp = acos(sin(lat1)*sin(lat2)+cos(lat1)*cos(lat2)*cos(lon2-lon1))
r=3963.0 # Earth radius in statute miles, default
if(units=="nm") r=3437.74677 # nautical miles
if(units=="km") r=6378.7 # kilometers
return(r * temp)
}
We can now compute the distance traveled each day by passing in lat/lon
estimates from day i − 1 and day i:
distance[i-1]=GCDF(pred.lon[i-1],pred.lon[i],
pred.lat[i-1],pred.lat[i])
pred.lon and pred.lat are the predicted longitudes and latitudes from
MARSS(): rows one and two in kem$states. To calculate the distances for
all days, we put this through a for loop:
distance = array(NA, dim=c(dim(dat)[2]-1,1))
for(i in 2:dim(dat)[2])
distance[i-1]=GCDF(pred.lon[i-1],pred.lon[i],
pred.lat[i-1],pred.lat[i])
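With GCDF written as above, the arithmetic is vectorized, so the loop can be replaced by a single call (a sketch that is equivalent to the loop above):

n = dim(dat)[2]
distance = GCDF(pred.lon[-n], pred.lon[-1], pred.lat[-n], pred.lat[-1])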
The command mean(distance) gives us the average distance per day. We
can also make a histogram of the distances traveled per day (Figure 10.3).
We can compare the histogram of daily distances to what we would get
if we had not accounted for measurement error (Figure 10.4). We can also
compare the mean miles per day:
#accounting for observation error
mean(distance)
[1] 15.53858
#assuming the data have no observation error
mean(distance.noerr)
[1] 34.80579


par(mfrow=c(1,1))
hist(distance) #make a histogram of distance traveled per day

[Figure 10.3: histogram of daily distances, roughly 0-50 miles]
Fig. 10.3. Histogram of the miles traveled per day for Big Mama with estimates
that account for measurement error in the data.

You can repeat the analysis done for "Big Mama" for each of the other
turtles and compare the turtle speeds and errors. You will need to replace
"BigMama" in the code with the name of the other turtle:
levels(loggerheadNoisy$turtle)
[1] "BigMama"  "Bruiser"  "Humpty"   "Isabelle" "Johanna"
[6] "MaryLee"  "TBA"      "Yoto"

10.5 Using specialized packages to analyze tag data
If you have real tag data to analyze, you should use a state-space modeling
package that is customized for fitting MARSS models to tracking data. The
MARSS package does not have all the bells and whistles that you would
want for analyzing tracking data, particularly tracking data in the marine

138

10 Analyzing animal tracking data

# Compare to the distance traveled per day if you used the raw data
distance.noerr = array(NA, dim=c(dim(dat)[2]-1,1))
for(i in 2:dim(dat)[2])
distance.noerr[i-1]=GCDF(dat[1,i-1],dat[1,i],dat[2,i-1],dat[2,i])
hist(distance.noerr) #make a histogram of distance traveled per day

[Figure 10.4: histogram of daily distances from the raw data, roughly 0-100 miles]
Fig. 10.4. Histogram of the miles traveled per day for Big Mama with estimates
that do not account for measurement error in the data.

environment. Here are a couple of R packages that we have come across for
this purpose:
UKFSST http://www.soest.hawaii.edu/tag-data/tracking/ukfsst/
KFTRACK http://www.soest.hawaii.edu/tag-data/tracking/kftrack/
kftrack is a full-featured toolbox for analyzing tag data with extended
Kalman filtering. It incorporates a number of extensions that are important
for analyzing track data: barriers to movement such as coastlines, and
non-Gaussian movement distributions. With kftrack, you can use the real tag
data, which have big gaps, i.e., days with no location. MARSS() will struggle
with these data because it will estimate states for all the unseen days;
kftrack only fits to the seen days.
To use kftrack to fit the turtle data, type
library(kftrack) # must be installed from a local zip file
loggerhead = loggerhead # copy the dataset into the workspace
# Run kftrack with the first turtle (BigMama)
turtlename = "BigMama"
dat = loggerhead[which(loggerhead$turtle == turtlename),2:6]
model = kftrack(dat, fix.first=F, fix.last=F,
var.struct="uniform")

11
Detection of outliers and structural breaks

11.1 River flow in the Nile River
This chapter is based on a short example shown on pages 147-148 in Koopman
et al. (1999) using a 100-year record of river flow on the Nile River. The
methods are based on Harvey et al. (1998), which is in turn based on techniques
in Harvey and Koopman (1992) and Koopman (1993). The Nile dataset is
included in R. Figure 11.1 shows the data.

11.2 Different models for the Nile flow levels
We begin by fitting different flow models to the data and compare these models
with AIC. After that, we will use the model residuals to look for outliers and
structural breaks.
11.2.1 Flat level model
We will start by modeling these data as a simple average river flow with
variability around this level.
\[
y_t = a + v_t \text{ where } v_t \sim \text{N}(0, r)
\tag{11.1}
\]
where y_t is the river flow volume at year t and a is the constant average flow
level (notice it has no t subscript).
To fit this model with MARSS, we will explicitly show all the MARSS
parameters.
Type RShowDoc("Chapter_StructuralBreaks.R",package="MARSS") at the R
command line to open a file with all the code for the examples in this chapter.


#load the datasets package
library(datasets)
#load the data
data(Nile)
plot(Nile,ylab="Flow volume",xlab="Year")

[Figure 11.1: time series of Nile flow volume, about 600-1400, 1871-1970]
Fig. 11.1. The Nile River flow volume 1871 to 1970 (dataset included in R).

\[
\begin{gathered}
x_t = 1 \times x_{t-1} + 0 + w_t \text{ where } w_t \sim \text{N}(0, 0) \\
y_t = 0 \times x_t + a + v_t \text{ where } v_t \sim \text{N}(0, r) \\
x_0 = 0
\end{gathered}
\tag{11.2}
\]
MARSS includes the state process x_t, but we are setting Z to zero so that it
does not appear in our observation model. We need to fix all the state
parameters to zero so that the algorithm doesn't "chase its tail" trying to fit
x_t to the data.
An equivalent way to write this model is to use xt as the average flow level
and make it be a constant level by setting q = 0. The average flow appears as
the x0 parameter. In MARSS form, the model is:
\[
\begin{gathered}
x_t = 1 \times x_{t-1} + 0 + w_t \text{ where } w_t \sim \text{N}(0, 0) \\
y_t = 1 \times x_t + 0 + v_t \text{ where } v_t \sim \text{N}(0, r) \\
x_0 = a
\end{gathered}
\tag{11.3}
\]


We will use this latter format since we will be building on this form. The
model is specified as a list as follows and we denote this model “0”:
mod.nile.0 = list(
Z=matrix(1), A=matrix(0), R=matrix("r"),
B=matrix(1), U=matrix(0), Q=matrix(0),
x0=matrix("a") )
We then fit the model with MARSS():
#The data is in a ts format, and we need a matrix
dat = t(as.matrix(Nile))
#Now we fit the model
kem.0 = MARSS(dat, model=mod.nile.0)
Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -654.5157
AIC: 1313.031
AICc: 1313.155

     Estimate
R.r     28352
x0.a      919

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
11.2.2 Linear trend in flow model
Figure 11.2 shows the fit for the flat average river flow model. Looking at the
data, we might expect that a declining average river flow would be better. In
MARSS form, that model would be:
\[
\begin{gathered}
x_t = 1 \times x_{t-1} + u + w_t \text{ where } w_t \sim \text{N}(0, 0) \\
y_t = 1 \times x_t + 0 + v_t \text{ where } v_t \sim \text{N}(0, r) \\
x_0 = a
\end{gathered}
\tag{11.4}
\]
where u is now the average per-year decline in river flow volume. The model
is specified as a list as follows and we denote this model “1”:


mod.nile.1 = list(
Z=matrix(1), A=matrix(0), R=matrix("r"),
B=matrix(1), U=matrix("u"), Q=matrix(0),
x0=matrix("a") )
We then fit the model with MARSS():
kem.1 = MARSS(dat, model=mod.nile.1)
Success! abstol and log-log tests passed at 18 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 18 iterations.
Log-likelihood: -642.3159
AIC: 1290.632
AICc: 1290.882
     Estimate
R.r  22213.60
U.u     -2.69
x0.a  1054.94
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Figure 11.2 shows the fits for the two deterministic level models (flat and
declining mean river flow) along with their AICc values (smaller AICc is
better). The AICc for the model with a declining river flow is lower by over
20 (which is a lot).
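We can pull those AICc values directly from the fitted objects (a quick sketch):

# compare the flat and linear-trend models
c(model.0 = kem.0$AICc, model.1 = kem.1$AICc)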
11.2.3 Stochastic level model
Looking at the flow levels, we might suspect that a model that allows the
average flow to change would fit the data better, and we might suspect that
there have been sudden, anomalous changes in the river flow level. We will
now model the average river flow at year t as a random walk, specifically an
autoregressive process, which means that average river flow in year t is a
function of average river flow in year t − 1.
\[
\begin{gathered}
x_t = x_{t-1} + w_t \text{ where } w_t \sim \text{N}(0, q) \\
y_t = x_t + v_t \text{ where } v_t \sim \text{N}(0, r) \\
x_0 = \pi
\end{gathered}
\tag{11.5}
\]
As before, yt is the river flow volume at year t. With all the MARSS parameters
shown, the model is:


\[
\begin{gathered}
x_t = 1 \times x_{t-1} + 0 + w_t \text{ where } w_t \sim \text{N}(0, q) \\
y_t = 1 \times x_t + 0 + v_t \text{ where } v_t \sim \text{N}(0, r) \\
x_0 = \pi
\end{gathered}
\tag{11.6}
\]
Thus, Z = 1, a = 0, R = r, B = 1, u = 0, Q = q, and x0 = π. The model is then
specified as:
mod.nile.2 = list(
Z=matrix(1), A=matrix(0), R=matrix("r"),
B=matrix(1), U=matrix(0), Q=matrix("q"),
x0=matrix("pi") )
We could also use the text shortcuts to specify the model. Because R and
Q are 1 × 1 matrices, "unconstrained", "diagonal and unequal", "diagonal and
equal" and "equalvarcov" will all lead to a 1 × 1 matrix with one estimated
element. For a and u, the following shortcut could be used:
A=U="zero"
Because x0 is 1 × 1, it could be specified as "unequal", "equal" or
"unconstrained".
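For example, model "2" could equivalently be specified with shortcuts like this (a sketch):

mod.nile.2.alt = list(
Z="identity", A="zero", R="diagonal and equal",
B="identity", U="zero", Q="diagonal and equal",
x0="unequal" )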
We fit the model with the MARSS() function. We are using the “BFGS”
algorithm to polish off the estimates, since it will get the maximum faster
than the default EM algorithm as long as we start it close to the maximum.
kem.2em = MARSS(dat, model=mod.nile.2, silent=TRUE)
kem.2 = MARSS(dat, model=mod.nile.2,
inits=kem.2em$par, method="BFGS")
Success! Converged in 12 iterations.
Function MARSSkfas used for likelihood calculation.
MARSS fit is
Estimation method: BFGS
Estimation converged in 12 iterations.
Log-likelihood: -637.7451
AIC: 1281.49
AICc: 1281.74

      Estimate
R.r      15337
Q.q       1218
x0.pi     1112

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
This is the same model fit in Koopman et al. (1999, p. 148) except that we
estimate x_1 as a parameter rather than specifying x_1 via a diffuse prior. As
a result, the log-likelihood value and R and Q are a little different than in
Koopman et al. (1999).

[Figure 11.2: three panels of Nile flow volume, 1870-1970, with fitted levels; model 0, AICc = 1313; model 1, AICc = 1291; model 2, AICc = 1282]
Fig. 11.2. The Nile River flow volume with the model estimated flow rates (solid
lines). The bottom model is a stochastic level model, and the 2 standard deviations
for the level are also shown. The other two models are deterministic level models,
so the state is not stochastic and does not have a standard deviation.

11.3 Observation and state residuals
Figure 11.2 shows the MARSS fits to the data. From these model fits,
auxiliary residuals can be computed which contain information about whether
the data and model fits at time t differ more than you would expect given the
model and the model fits at time t − 1. In this section, we follow the example
shown on pages 147-148 in Koopman et al. (1999) and use these residuals to
look for outliers and sudden flow level changes. Using auxiliary residuals this
way follows mainly from Harvey and Koopman (1992), but see also Koopman
(1993, sec. 3), de Jong and Penzer (1998) and Penzer (2001) for discussions
of using auxiliary residuals for detection of outliers and structural breaks.
The MARSS function will output the expected values of x_t conditioned on
the maximum-likelihood values of q, r, and x_1 and on the data (y from t = 1
to T). In the time-series literature, these are called the smoothed state
estimates and they are output by the Kalman filter-smoother. We will call
these smoothed estimates x̃_{t|T} (they are xtT in the MARSS output). The
time value after the | in the subscript indicates the data on which the estimate
was conditioned (in this case, 1 to T). From these, we can compute the model
predicted value of y_t, denoted ŷ_{t|T}. This is the predicted value of y_t
conditioned on x̃_{t|T}.
\[
\begin{aligned}
\tilde{x}_{t|T} &= \operatorname{E}(X_t \mid \hat{\theta}, y_1^T) \\
\hat{y}_{t|T} &= \operatorname{E}(Y_t \mid \hat{\theta}, \tilde{x}_{t|T}) = \tilde{x}_{t|T} + \operatorname{E}(w_t \mid \hat{\theta}, y_1^T) = \tilde{x}_{t|T}
\end{aligned}
\tag{11.7}
\]
where θ̂ are the maximum-likelihood estimates of the parameters. The ŷ_{t|T}
equation comes directly from Equation 11.5. This expectation is not
conditioned on the data y_1^T directly; it is conditioned on x̃_{t|T}, which is
conditioned on y_1^T.
11.3.1 Using observation residuals to detect outliers
The standardized smoothed observation residuals¹ are the difference between
the data at time t and the model fit at time t conditioned on all the data,
standardized by the observation variance:
\[
\begin{aligned}
\hat{v}_t &= y_t - \hat{y}_{t|T} \\
e_t &= \frac{\hat{v}_t}{\sqrt{\operatorname{var}(\hat{v}_t)}}
\end{aligned}
\tag{11.8}
\]
These residuals should have (asymptotically) a t-distribution (Kohn and
Ansley, 1989, sec. 3), and by looking at the residuals, we can identify potential
outlier data points, or more accurately, we can identify data points that do
not fit the model (Equation 11.5). The call residuals() will compute these
residuals for a marssMLE object (output by a MARSS call). It returns the
standardized residuals (also called auxiliary residuals) as an (n + m) × T
matrix. The first n rows are the estimated v_t standardized observation
residuals and the next m rows are the estimated w_t standardized state
residuals (discussed below).
¹ These are also called smoothations in the literature to distinguish them
from innovations, which are y_t − E(Y_t | x̃_{t|t−1}). Notice that for innovations
the expectation is conditioned on the data up to time t − 1, while for
smoothations we condition on all the data.
resids.0=residuals(kem.0)$std.residuals
resids.1=residuals(kem.1)$std.residuals
resids.2=residuals(kem.2)$std.residuals
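As a sketch of how a panel in Figure 11.3 can be drawn (the ±1.97 limits below are an approximate large-sample 95% interval, an assumption on our part):

# plot the standardized observation residuals for model 2 (row 1)
plot(1871:1970, resids.2[1,], type="h",
xlab="", ylab="std. residuals", ylim=c(-4,4))
abline(h=c(0,-1.97,1.97), lty=c(1,2,2))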
Figure 11.3 shows the observation residuals for the three models developed
above. We immediately see that model 0 (flat level) and model 1 (linearly
declining level) have problems because the residuals are all positive for the
first part of the time series and then all negative. The residuals should not be
temporally correlated like that. Model 2 with a stochastic level shows
well-behaved residuals with low temporal correlation between t and t − 1.
Looking at the residuals for model 2, we see that there are a number of years
with flow levels that appear to be outliers (are beyond the dashed level lines).

[Figure 11.3: standardized residuals, 1870-1970, for model 0 (flat level), model 1 (linearly declining level), and model 2 (stochastic level)]
Fig. 11.3. The standardized observation residuals from models 0, 1, and 2. These
residuals are the standardized v̂_t. The dashed lines are the 95% CIs for a
t-distribution.


11.3.2 Detecting sudden level changes
The standardized smoothed state residuals (f_t below) are the difference
between the estimated state at time t and the estimated state at time t − 1,
conditioned on all the data and standardized by the standard deviation:
\[
\begin{aligned}
\hat{w}_t &= \tilde{x}_{t|T} - \tilde{x}_{t-1|T} \\
f_t &= \frac{\hat{w}_t}{\sqrt{\operatorname{var}(\hat{w}_t)}}
\end{aligned}
\tag{11.9}
\]
These state residuals do not show simple changes in the average level; x_t is
clearly changing in Figure 11.2, bottom panel. Instead, we are looking for
"breaks" or sudden changes in the level. The bottom panel of Figure 11.4
shows the standardized state residuals (f_t). This shows, as we can see by eye,
that the average flow level in the Nile appears to have suddenly changed
around the turn of the century, when the first Aswan dam was built. The top
panel shows the standardized observation residuals for comparison.
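The state residuals are in the second row of the residuals matrix, so the bottom panel of Figure 11.4 can be sketched the same way:

# plot the standardized state residuals for model 2 (row n+1 = 2)
plot(1871:1970, resids.2[2,], type="h",
xlab="", ylab="standardized residuals", ylim=c(-4,4))
abline(h=c(0,-1.97,1.97), lty=c(1,2,2))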

[Figure 11.4: two panels of standardized residuals, 1870-1970; top panel titled "test for outliers", bottom panel titled "test for level changes"]
Fig. 11.4. Top panel, the standardized observation residuals. Bottom panel, the
standardized state residuals. This replicates Figure 12 in Koopman et al. (1999).

12
Incorporating covariates into MARSS models

12.1 Covariates as inputs
A MARSS model with covariate effects in both the process and observation
components is written as:
\[
\begin{gathered}
\mathbf{x}_t = \mathbf{B}_t\mathbf{x}_{t-1} + \mathbf{u}_t + \mathbf{C}_t\mathbf{c}_t + \mathbf{w}_t, \text{ where } \mathbf{w}_t \sim \text{MVN}(0,\mathbf{Q}_t) \\
\mathbf{y}_t = \mathbf{Z}_t\mathbf{x}_t + \mathbf{a}_t + \mathbf{D}_t\mathbf{d}_t + \mathbf{v}_t, \text{ where } \mathbf{v}_t \sim \text{MVN}(0,\mathbf{R}_t)
\end{gathered}
\tag{12.1}
\]

where ct is the p × 1 vector of covariates (e.g., temperature, rainfall) which
affect the states and dt is a q × 1 vector of covariates (potentially the same as
ct ), which affect the observations. Ct is an m × p matrix of coefficients relating
the effects of ct to the m × 1 state vector xt , and Dt is an n × q matrix of
coefficients relating the effects of dt to the n × 1 observation vector yt .
With the MARSS() function, one can fit this model by passing in model$c
and/or model$d in the MARSS() call as a p × T or q × T matrix, respectively.
The form for C_t and D_t is similarly specified by passing in model$C and/or
model$D. Because C and D are matrices, they must be passed in as a
3-dimensional array with the 3rd dimension equal to the number of time steps
if they are time-varying. If they are time-constant, then they can be specified
as 2-dimensional matrices.
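As a minimal sketch (covars here is a hypothetical p × T matrix of covariates affecting the process):

# covariates enter the process equation via C and c
model.list = list(C="unconstrained", c=covars)
fit = MARSS(dat, model=model.list)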

12.2 Examples using plankton data
Here we show some examples using the Lake Washington plankton data set
and covariates in that dataset. We use the 10 years of data from 1965-1974
(Figure 12.1), a decade with particularly high green and bluegreen algae levels.
Type RShowDoc("Chapter_Covariates.R",package="MARSS") at the R command
line to open a file with all the code for the examples in this chapter.
We use the transformed plankton dataset which has 0s replaced with NAs.
Below, we set up the data and z-score the data. The original data were already
z-scored, but we changed the mean when we subsampled the years so need to
z-score again.
fulldat = lakeWAplanktonTrans
years = fulldat[,"Year"]>=1965 & fulldat[,"Year"]<1975
dat = t(fulldat[years,c("Greens", "Bluegreens")])
the.mean = apply(dat,1,mean,na.rm=TRUE)
the.sigma = sqrt(apply(dat,1,var,na.rm=TRUE))
dat = (dat-the.mean)*(1/the.sigma)
Next we set up the covariate data, temperature and total phosphorous. We
z-score the covariates to standardize and remove the mean.
covariates = rbind(
Temp = fulldat[years,"Temp"],
TP = fulldat[years,"TP"])
# z.score the covariates
the.mean = apply(covariates,1,mean,na.rm=TRUE)
the.sigma = sqrt(apply(covariates,1,var,na.rm=TRUE))
covariates = (covariates-the.mean)*(1/the.sigma)

12.3 Observation-error only model
We can estimate the effect of the covariates using a process-error only model,
an observation-error only model, or a model with both types of error. An
observation-error only model is a multivariate regression, and we will start
here so you see the relationship of MARSS model to more familiar linear
regression models.
12.3.1 Multivariate linear regression
In a standard multivariate linear regression, we only have an observation
model with independent errors (i.e., the state process does not appear in
the model):
\[
\mathbf{y}_t = \mathbf{a} + \mathbf{D}\mathbf{d}_t + \mathbf{v}_t, \text{ where } \mathbf{v}_t \sim \text{MVN}(0,\mathbf{R})
\tag{12.2}
\]
The elements in a are the intercepts and those in D are the slopes (effects).
We have dropped the t subscript on a and D because these will be modeled as
time-constant. Writing this out for the two plankton and the two covariates
we get:
\[
\begin{bmatrix} y_g \\ y_{bg} \end{bmatrix}_t =
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} +
\begin{bmatrix} \beta_{g,temp} & \beta_{g,tp} \\ \beta_{bg,temp} & \beta_{bg,tp} \end{bmatrix}
\begin{bmatrix} temp \\ tp \end{bmatrix}_{t-1} +
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}_t
\tag{12.3}
\]

[Figure 12.1: time series of Greens, Bluegreens, Temp, and TP, 1966-1974]
Fig. 12.1. Time series of Green and Bluegreen algae abundances in Lake
Washington along with the temperature and total phosphorus covariates.

Let’s fit this model with MARSS. The x part of the model is irrelevant so
we want to fix the parameters in that part of the model. We won’t set B = 0
or Z = 0 since that might cause numerical issues for the Kalman filter. Instead
we fix them as identity matrices and fix x0 = 0 so that xt = 0 for all t.
Q = U = x0 = "zero"; B = Z = "identity"
d = covariates
A = "zero"
D = "unconstrained"
y = dat # to show relationship between dat & the equation
model.list = list(B=B,U=U,Q=Q,Z=Z,A=A,D=D,d=d,x0=x0)
kem = MARSS(y, model=model.list)
Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem

154

12 Covariates

Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -276.4287
AIC: 562.8573
AICc: 563.1351

                    Estimate
R.diag                 0.706
D.(Greens,Temp)        0.367
D.(Bluegreens,Temp)    0.392
D.(Greens,TP)          0.058
D.(Bluegreens,TP)      0.535

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
We set A="zero" because the data and covariates have been demeaned. Of
course, one can do multiple regression in R using, say, lm(), and that would
be much, much faster. The EM algorithm is overkill here, but it is shown so
that you see how a standard multivariate linear regression model is written as
a MARSS model in matrix form.
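For comparison, the same regression for one taxon could be run with lm() (a sketch; we drop the intercept because the data are demeaned, and the coefficients should be close to the D estimates above):

lm(dat["Greens",] ~ covariates["Temp",] + covariates["TP",] - 1)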
12.3.2 Multivariate linear regression with autocorrelated errors
We can add a twist to the standard multivariate linear regression model, and
instead of having temporally i.i.d. errors in the observation process, we’ll assume autoregressive errors. There is still no state process in our model, but we
will use the state part of a MARSS model to model our errors. Mathematically,
this can be written as
\[
\begin{gathered}
\mathbf{x}_t = \mathbf{B}\mathbf{x}_{t-1} + \mathbf{w}_t, \text{ where } \mathbf{w}_t \sim \text{MVN}(0,\mathbf{Q}) \\
\mathbf{y}_t = \mathbf{D}_t\mathbf{d}_t + \mathbf{x}_t
\end{gathered}
\tag{12.4}
\]

Here, the xt are the errors for the observation model; they are modeled as an
autoregressive process via the x equation. We drop the vt (set R = 0) because
the xt in the y equation are now the observation errors. As usual, we have left
the intercepts (a and u) off since the data and covariates are all demeaned.
Here’s how we fit this model in MARSS:
Q = "unconstrained"
B = "diagonal and unequal"
A = U = x0 = "zero"
R = "diagonal and equal"
d = covariates
D = "unconstrained"
y = dat # to show the relation between dat & the model equations
model.list = list(B=B,U=U,Q=Q,Z=Z,A=A,R=R,D=D,d=d,x0=x0)

12.4 Process-error only model

155

control.list = list(maxit=1500)
kem = MARSS(y, model=model.list, control=control.list)
Success! abstol and log-log tests passed at 79 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 79 iterations.
Log-likelihood: -209.3408
AIC: 438.6816
AICc: 439.7243

                              Estimate
R.diag                          0.0428
B.(X.Greens,X.Greens)           0.2479
B.(X.Bluegreens,X.Bluegreens)   0.9136
Q.(1,1)                         0.7639
Q.(2,1)                        -0.0285
Q.(2,2)                         0.1265
D.(Greens,Temp)                 0.3777
D.(Bluegreens,Temp)             0.2621
D.(Greens,TP)                   0.0459
D.(Bluegreens,TP)               0.0675

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
You can try setting B to identity, and MARSS will fit a model with
non-mean-reverting autoregressive errors to the data. We do not do that here
since it turns out that it is not a very good model and it takes a long time
to fit. If you try it, you'll see that Q gets small, meaning that the x part is
being removed from the model.

12.4 Process-error only model
Now let’s model the data as an autoregressive process observed without error, and incorporate the covariates into the process model. Note that this is
much different from typical linear regression models. The x part represents
our model of the data (in this case plankton species). How is this different
from the autoregressive observation errors? Well, we are modeling our data
as autoregressive so data at t − 1 affects the data at t. Population abundances
are inherently autoregressive so this model is a bit closer to the underlying

156

12 Covariates

mechanism generating the data. Here is our new process model for plankton
abundance.
\[
\mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{C}\mathbf{c}_t + \mathbf{w}_t, \text{ where } \mathbf{w}_t \sim \text{MVN}(0,\mathbf{Q})
\tag{12.5}
\]

We can fit this as follows:
R = A = U = "zero"; B = Z = "identity"
Q = "equalvarcov"
C = "unconstrained"
x = dat # to show the relation between dat & the equations
model.list = list(B=B,U=U,Q=Q,Z=Z,A=A,R=R,C=C,c=covariates)
kem = MARSS(x, model=model.list)
Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -285.0732
AIC: 586.1465
AICc: 586.8225

                      Estimate
Q.diag                  0.7269
Q.offdiag              -0.0210
x0.X.Greens            -0.5189
x0.X.Bluegreens        -0.2431
C.(X.Greens,Temp)      -0.0434
C.(X.Bluegreens,Temp)   0.0988
C.(X.Greens,TP)        -0.0589
C.(X.Bluegreens,TP)     0.0104

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Now, it looks like temperature has a strong negative effect on algae? Also,
our log-likelihood dropped a lot. Well, the data do not look at all like a
random walk model (i.e., where B = 1), which we can see from the plot of the
data (Figure 12.1). The data are fluctuating about some mean, so let's switch
to a better autoregressive model: a mean-reverting model. To do this, we will
allow the diagonal elements of B to be something other than 1.
model.list$B = "diagonal and unequal"
kem = MARSS(dat, model=model.list)


Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -236.6106
AIC: 493.2211
AICc: 494.2638

                              Estimate
B.(X.Greens,X.Greens)           0.1981
B.(X.Bluegreens,X.Bluegreens)   0.7672
Q.diag                          0.4899
Q.offdiag                      -0.0221
x0.X.Greens                    -1.2915
x0.X.Bluegreens                -0.4179
C.(X.Greens,Temp)               0.2844
C.(X.Bluegreens,Temp)           0.1655
C.(X.Greens,TP)                 0.0332
C.(X.Bluegreens,TP)             0.1340

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Notice that the log-likelihood goes up quite a bit, which means that the
mean-reverting model fits the data much better.
With this model, we are estimating x0. If we set model$tinitx=1, we
will get an error message that the R diagonals are equal to 0 and we need to
fix x0. Because R = 0, if we set the initial states at t = 1, then they are fully
determined by the data.
x0 = dat[,1,drop=FALSE]
model.list$tinitx = 1
model.list$x0 = x0
kem = MARSS(dat, model=model.list)
Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.

158

12 Covariates

Log-likelihood: -235.4827
AIC: 486.9653
AICc: 487.6414

                              Estimate
B.(X.Greens,X.Greens)           0.1980
B.(X.Bluegreens,X.Bluegreens)   0.7671
Q.diag                          0.4944
Q.offdiag                      -0.0223
C.(X.Greens,Temp)               0.2844
C.(X.Bluegreens,Temp)           0.1655
C.(X.Greens,TP)                 0.0332
C.(X.Bluegreens,TP)             0.1340

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.

12.5 Both process- & observation-error model
The MARSS package is really designed for state-space models where you have
errors (v and w) in both the process and observation models. For example,
\[
\begin{gathered}
\mathbf{x}_t = \mathbf{B}\mathbf{x}_{t-1} + \mathbf{C}_t\mathbf{c}_t + \mathbf{w}_t, \text{ where } \mathbf{w}_t \sim \text{MVN}(0,\mathbf{Q}) \\
\mathbf{y}_t = \mathbf{x}_t + \mathbf{v}_t, \text{ where } \mathbf{v}_t \sim \text{MVN}(0,\mathbf{R}),
\end{gathered}
\tag{12.6}
\]
where x is the true algae abundances and y is the observations of the x's.
Let’s say we knew that the observation variance on the algae measurements
was about 0.16 and we wanted to include that known value in the model. To
do that, we can simply add R to the model list from the process-error only
model in the last example.
model.list$R = diag(0.16,2)
kem = MARSS(dat, model=model.list)
Success! abstol and log-log tests passed at 26 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 26 iterations.
Log-likelihood: -240.3718
AIC: 496.7436
AICc: 497.4196
                              Estimate
B.(X.Greens,X.Greens)          0.31201
B.(X.Bluegreens,X.Bluegreens)  0.76142
Q.diag                         0.33842
Q.offdiag                     -0.00355
C.(X.Greens,Temp)              0.23569
C.(X.Bluegreens,Temp)          0.16966
C.(X.Greens,TP)                0.02449
C.(X.Bluegreens,TP)            0.14164
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Note, our estimates of the effects of temperature and total phosphorus are
not that different from what you get from a simple multiple regression (our
first example). This might be because the autoregressive component is small,
meaning the estimated diagonals on the B matrix are small.

12.6 Including seasonal effects in MARSS models
Time-series data are often collected at intervals with some implicit “seasonality.” For example, quarterly earnings for a business, monthly rainfall totals, or
hourly air temperatures. In those cases, it is often helpful to extract any recurring seasonal patterns that might otherwise mask some of the other temporal
dynamics we are interested in examining.
Here we show a few approaches for including seasonal effects using the
Lake Washington plankton data, which were collected monthly. The following
examples will use all five phytoplankton species from Lake Washington. First,
let’s set up the data.
years = fulldat[,"Year"]>=1965 & fulldat[,"Year"]<1975
phytos = c("Diatoms", "Greens", "Bluegreens",
"Unicells", "Other.algae")
dat = t(fulldat[years,phytos])
# z.score data because we changed the mean when we subsampled
the.mean = apply(dat,1,mean,na.rm=TRUE)
the.sigma = sqrt(apply(dat,1,var,na.rm=TRUE))
dat = (dat-the.mean)*(1/the.sigma)
# number of time periods/samples
TT = dim(dat)[2]
12.6.1 Seasonal effects as fixed factors
One common approach for estimating seasonal effects is to treat each one
as a fixed factor, such that the number of parameters equals the number of
“seasons” (e.g., 24 hours per day, 4 quarters per year). The plankton data are

160

12 Covariates

collected monthly, so we will treat each month as a fixed factor. To fit a model
with fixed month effects, we create a 12 × T covariate matrix c with one row
for each month (Jan, Feb, ...) and one column for each time point. We put a
1 in the January row for each column corresponding to a January time point,
a 1 in the February row for each column corresponding to a February time
point, and so on. All other values of c equal 0. The following code will create
such a c matrix.
# number of "seasons" (e.g., 12 months per year)
period = 12
# first "season" (e.g., Jan = 1, July = 7)
per.1st = 1
# create factors for seasons
c.in = diag(period)
for(i in 2:(ceiling(TT/period))) {c.in = cbind(c.in,diag(period))}
# trim c.in to correct start & length
c.in = c.in[,(1:TT)+(per.1st-1)]
# better row names
rownames(c.in) = month.abb
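A quick sanity check on the resulting matrix (a sketch):

dim(c.in)               # should be 12 x TT
all(colSums(c.in) == 1) # each time step gets exactly one month indicator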
Next we need to set up the form of the C matrix, which defines any constraints
we want to set on the month effects. C is a 5 × 12 matrix: five taxa and 12
month effects. If we wanted each taxon to have the same month effect, i.e.,
a common month effect across all taxa, then we would have the same value
in each C column¹:
C = matrix(month.abb,5,12,byrow=TRUE)
C
     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]
[1,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep"
[2,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep"
[3,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep"
[4,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep"
[5,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep"
     [,10] [,11] [,12]
[1,] "Oct" "Nov" "Dec"
[2,] "Oct" "Nov" "Dec"
[3,] "Oct" "Nov" "Dec"
[4,] "Oct" "Nov" "Dec"
[5,] "Oct" "Nov" "Dec"

Notice that C only has 12 values in it, the 12 common month effects. However,
for this example, we will let each taxon have a different month effect, thus
allowing different seasonality for each taxon. For this model, we want each
value in C to be unique:
C = "unconstrained"
Now C has 5 × 12 = 60 separate effects.
¹ month.abb is an R constant that gives month abbreviations in text.
Then we set up the form for the rest of the model parameters. We make
the following assumptions:
# Each taxon has unique density-dependence
B = "diagonal and unequal"
# Assume independent process errors
Q = "diagonal and unequal"
# We have demeaned the data & are fitting a mean-reverting model
# by estimating a diagonal B, thus
U = "zero"
# Each obs time series is associated with only one process
Z = "identity"
# The data are demeaned & fluctuate around a mean
A = "zero"
# We assume observation errors are independent, but they
# have similar variance due to similar collection methods
R = "diagonal and equal"
# We are not including covariate effects in the obs equation
D = "zero"
d = "zero"

Now we can set up the model list for MARSS and fit the model (results
are not shown since they are verbose with 60 different month effects).
model.list = list(B=B,U=U,Q=Q,Z=Z,A=A,R=R,C=C,c=c.in,D=D,d=d)
seas.mod.1 = MARSS(dat,model=model.list,control=list(maxit=1500))
# Get the estimated seasonal effects
# rows are taxa, cols are seasonal effects
seas.1 = coef(seas.mod.1,type="matrix")$C
rownames(seas.1) = phytos
colnames(seas.1) = month.abb
The top panel in Figure 12.2 shows the estimated seasonal effects for this
model. Note that if we had set U="unequal", we would need to set one of the
columns of C to zero because the model would be under-determined (infinite
number of solutions). If we subtracted the mean January abundance off each
time series, we could set the January column in C to 0 and get rid of 5
estimated effects.
12.6.2 Seasonal effects as a polynomial
The fixed factor approach required estimating 60 effects. Another approach
is to model the month effect as a 3rd-order (or higher) polynomial:
a + b × m + c × m² + d × m³, where m is the month number. This approach
has less flexibility but requires only 20 estimated parameters (i.e., 4 regression
parameters times 5 taxa). To do so, we create a 4 × T covariate matrix c with
the rows corresponding to 1, m, m², and m³, and the columns again
corresponding to the time points. Here is how to set up this matrix:
# number of "seasons" (e.g., 12 months per year)
period = 12
# first "season" (e.g., Jan = 1, July = 7)
per.1st = 1
# order of polynomial
poly.order = 3
# create polynomials of months
month.cov = matrix(1,1,period)
for(i in 1:poly.order) {month.cov = rbind(month.cov,(1:12)^i)}
# our c matrix is month.cov replicated once for each year
c.m.poly = matrix(month.cov, poly.order+1, TT+period, byrow=FALSE)
# trim c.m.poly to correct start & length
c.m.poly = c.m.poly[,(1:TT)+(per.1st-1)]
# Everything else remains the same as in the previous example
model.list = list(B=B,U=U,Q=Q,Z=Z,A=A,R=R,C=C,c=c.m.poly,D=D,d=d)
seas.mod.2 = MARSS(dat, model=model.list, control=list(maxit=1500))
The effect of month m for taxon i is a_i + b_i × m + c_i × m² + d_i × m³, where
a_i, b_i, c_i and d_i are in the i-th row of C. We can now calculate the matrix
of seasonal effects as follows, where each row is a taxon and each column is a
month:
C.2 = coef(seas.mod.2,type="matrix")$C
seas.2 = C.2 %*% month.cov
rownames(seas.2) = phytos
colnames(seas.2) = month.abb
The middle panel in Figure 12.2 shows the estimated seasonal effects for this
polynomial model.
12.6.3 Seasonal effects as a Fourier series
The factor approach required estimating 60 effects, and the 3rd-order
polynomial model was an improvement at only 20 parameters. A third option
is to use a discrete Fourier series, which is a combination of sine and cosine
waves; it would require only 10 parameters. Specifically, the effect of month m
on taxon i is a_i × cos(2πm/p) + b_i × sin(2πm/p), where p is the period (e.g.,
12 months, 4 quarters), and a_i and b_i are contained in the i-th row of C.
We begin by defining the 2 × T seasonal covariate matrix c as a combination
of 1 cosine and 1 sine wave:
cos.t = cos(2 * pi * seq(TT) / period)
sin.t = sin(2 * pi * seq(TT) / period)
c.Four = rbind(cos.t,sin.t)


Everything else remains the same and we can fit this model as follows:
model.list = list(B=B,U=U,Q=Q,Z=Z,A=A,R=R,C=C,c=c.Four,D=D,d=d)
seas.mod.3 = MARSS(dat, model=model.list, control=list(maxit=1500))
We make our seasonal effect matrix as follows:
C.3 = coef(seas.mod.3, type="matrix")$C
# The time series of net seasonal effects
seas.3 = C.3 %*% c.Four[,1:period]
rownames(seas.3) = phytos
colnames(seas.3) = month.abb

The bottom panel in Figure 12.2 shows the estimated seasonal effects for this
seasonal-effects model based on a discrete Fourier series.

[Figure 12.2: estimated monthly effects, Jan-Dec, for Diatoms, Greens, Bluegreens, Unicells, and Other.algae; panels titled "Fixed monthly", "Cubic", and "Fourier"]
Fig. 12.2. Estimated monthly effects for the three approaches to estimating seasonal
effects. Top panel: each month modelled as a separate fixed effect for each taxon (60
parameters); Middle panel: monthly effects modelled as a 3rd-order polynomial (20
parameters); Bottom panel: monthly effects modelled as a discrete Fourier series (10
parameters).


Rather than rely on our eyes to judge model fits, we should formally assess
which of the 3 approaches offers the most parsimonious fit to the data. Here
is a table of AICc values for the 3 models:
data.frame(Model=c("Fixed", "Cubic", "Fourier"),
AICc=round(c(seas.mod.1$AICc,
seas.mod.2$AICc,
seas.mod.3$AICc),1))
    Model   AICc
1   Fixed 1188.4
2   Cubic 1144.9
3 Fourier 1127.4
The model selection results indicate that the model with monthly seasonal
effects estimated via the discrete Fourier series is the best of the 3 models.
Its AICc value is much lower than that of either the polynomial or
fixed-effects models.

12.7 Model diagnostics
We will examine some basic model diagnostics for these three approaches by
looking at plots of the model residuals and their autocorrelation functions
(ACFs) for all five taxa using the following code:
for(i in 1:3) {
dev.new()
# retrieve the i-th fitted model object by its name
modn = get(paste("seas.mod",i,sep="."))
resids = residuals(modn)$model.residuals
for(j in 1:5) {
plot.ts(resids[j,], ylab="Residual", main=phytos[j])
abline(h=0, lty="dashed")
acf(resids[j,])
}
}
Figures 12.3–12.5 shows these diagnostics for the three models. The model
residuals for all taxa and models appear to show significant negative autocorrelation at lag=1, suggesting that a model with seasonal effects is inadequate
to capture all of the systematic variation in phytoplankton abundance.

[Figure 12.3: for each of the five taxa, a time series of model residuals, 1965-1973, and its ACF out to lag 12]
Fig. 12.3. Model residuals and their ACF for the model with fixed monthly effects.

12.8 Covariates with missing values or observation error
The specific formulation of Equation 12.1 creates restrictions on the
assumptions regarding the covariate data. You have to assume that your
covariate data has no error, which is probably not true. You cannot have
missing values
in your covariate data, again unlikely. You cannot combine instrument time
series; for example, if you have two temperature recorders with different
error rates and biases, you cannot combine them. Also, what if you have one
noisy temperature recorder in the first part of your time series and then you
switch to a much better recorder in the second half of your time series? All
these problems require pre-analysis massaging of the covariate data, leaving
out noisy and gappy covariate data, and making what can feel like arbitrary
choices about which covariate time series to include.
To circumvent these potential problems and allow more flexibility in how
we incorporate covariate data, one can instead treat the covariates as
components of an auto-regressive process by including them in both the
process and observation models. Beginning with the process equation, we
can write
\[
\begin{bmatrix} \mathbf{x}^{(v)} \\ \mathbf{x}^{(c)} \end{bmatrix}_t =
\begin{bmatrix} \mathbf{B}^{(v)} & \mathbf{C} \\ 0 & \mathbf{B}^{(c)} \end{bmatrix}
\begin{bmatrix} \mathbf{x}^{(v)} \\ \mathbf{x}^{(c)} \end{bmatrix}_{t-1} +
\begin{bmatrix} \mathbf{u}^{(v)} \\ \mathbf{u}^{(c)} \end{bmatrix} + \mathbf{w}_t, \quad
\mathbf{w}_t \sim \text{MVN}\!\left(0, \begin{bmatrix} \mathbf{Q}^{(v)} & 0 \\ 0 & \mathbf{Q}^{(c)} \end{bmatrix}\right)
\tag{12.7}
\]

[Figure 12.4: for each of the five taxa, a time series of model residuals, 1965-1973, and its ACF out to lag 12]
Fig. 12.4. Model residuals and their ACF for the model with monthly effects
modelled as a 3rd-order polynomial.

The elements with superscript (v) are for the k variate states and those with
superscript (c) are for the q covariate states. The dimension of x(c) is q × 1 and
q is not necessarily equal to p, the number of covariate observation time series
in your dataset. Imagine, for example, that you have two temperature sensors
and you are combining these data. Then you have two covariate observation
time series (p = 2) but only one underlying covariate state time series (q = 1).
The matrix C is dimension k × q, and B(c) and Q(c) are dimension q × q. The
dimension2 of x(v) is k × 1, and B(v) and Q(v) are dimension k × k.
Next, we can write the observation equation in an analogous manner, such
that
\[
\begin{bmatrix} y^{(v)} \\ y^{(c)} \end{bmatrix}_t =
\begin{bmatrix} Z^{(v)} & D \\ 0 & Z^{(c)} \end{bmatrix}
\begin{bmatrix} x^{(v)} \\ x^{(c)} \end{bmatrix}_t +
\begin{bmatrix} a^{(v)} \\ a^{(c)} \end{bmatrix} + v_t ,\quad
v_t \sim \mathrm{MVN}\left(0,\; \begin{bmatrix} R^{(v)} & 0 \\ 0 & R^{(c)} \end{bmatrix}\right)
\tag{12.8}
\]

2 The dimension of x is always denoted m. If your process model includes only variates, then k = m, but now your process model includes k variates and q covariate states, so m = k + q.

Fig. 12.5. Model residuals and their ACF for the model with monthly effects estimated using a Fourier transform.

The dimension of y(c) is p × 1, where p is the number of covariate observation
time series in your dataset. The dimension of y(v) is l ×1, where l is the number
of variate observation time series in your dataset. The total dimension of y
is l + p. The matrix D is dimension l × q, Z(c) is dimension p × q, and R(c)
are dimension p × p. The dimension of Z(v) is dimension l × k, and R(v) are
dimension l × l.
The D matrix would presumably have a number of all-zero rows in it,
as would the C matrix. The covariates that affect the states would often be
different than the covariates that affect the observations. For example, mean
annual temperature would affect population growth rates for many species
while having little or no effect on observability, whereas turbidity might strongly
affect observability in many types of aquatic surveys, say, but have little effect
on population growth rate.
Our MARSS model with covariates now looks on the surface like a regular
MARSS model:
xt = Bxt−1 + u + wt , where wt ∼ MVN(0, Q)
yt = Zxt + a + vt , where vt ∼ MVN(0, R)     (12.9)

with the xt , yt and parameter matrices redefined as in Equations 12.7 and
12.8:
\[
x = \begin{bmatrix} x^{(v)} \\ x^{(c)} \end{bmatrix} \quad
B = \begin{bmatrix} B^{(v)} & C \\ 0 & B^{(c)} \end{bmatrix} \quad
u = \begin{bmatrix} u^{(v)} \\ u^{(c)} \end{bmatrix} \quad
Q = \begin{bmatrix} Q^{(v)} & 0 \\ 0 & Q^{(c)} \end{bmatrix}
\]
\[
y = \begin{bmatrix} y^{(v)} \\ y^{(c)} \end{bmatrix} \quad
Z = \begin{bmatrix} Z^{(v)} & D \\ 0 & Z^{(c)} \end{bmatrix} \quad
a = \begin{bmatrix} a^{(v)} \\ a^{(c)} \end{bmatrix} \quad
R = \begin{bmatrix} R^{(v)} & 0 \\ 0 & R^{(c)} \end{bmatrix}
\tag{12.10}
\]
Note Q and R are written as block diagonal matrices, but you could allow
covariances if that made sense. u and a are column vectors here. We can fit
the model (Equation 12.9) as usual using the MARSS() function.
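For concreteness, here is a minimal sketch of what the augmented model list can look like in the simplest case of one variate (k = 1), one covariate state (q = 1), and one covariate observation series (p = 1), so that m = 2 and y has two rows. The element names ("b", "C", "q.c", etc.) are our own illustrative placeholders, not values from a particular example, and D is assumed to be 0 (the covariate affects the process only):
#B = [B^(v) C; 0 B^(c)]; 0 entries are fixed, quoted names are estimated
B.aug = matrix(list("b","C", 0,"b.c"), 2, 2, byrow=TRUE)
Q.aug = matrix(list("q",0, 0,"q.c"), 2, 2, byrow=TRUE)  #block-diagonal Q
R.aug = matrix(list("r",0, 0,"r.c"), 2, 2, byrow=TRUE)  #block-diagonal R
model.aug = list(B=B.aug, Q=Q.aug, Z=diag(2), R=R.aug, U="zero", A="zero")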
The log-likelihood that is returned by MARSS will include the log-likelihood of the covariates under the covariate state model. If you want only
the log-likelihood of the non-covariate data, you will need to subtract off
the log-likelihood of the covariate model:
\[
x^{(c)}_t = B^{(c)} x^{(c)}_{t-1} + u^{(c)} + w_t ,\quad w_t \sim \mathrm{MVN}(0, Q^{(c)})
\]
\[
y^{(c)}_t = Z^{(c)} x^{(c)}_t + a^{(c)} + v_t ,\quad v_t \sim \mathrm{MVN}(0, R^{(c)})
\tag{12.11}
\]

An easy way to get this log-likelihood for the covariate data only is to use the
augmented model (Equation 12.9 with terms defined as in Equation 12.10) but
pass in missing values for the non-covariate data. The following code shows
how to do this.
y.aug = rbind(data,covariates)
fit.aug = MARSS(y.aug, model=model.aug)
fit.aug is the MLE object that can be passed to MARSSkf(). You need to
make a version of this MLE object with the non-covariate data filled with NAs
so that you can compute the logLik without the covariates. This needs to be
done in the marss element since that is what is used by MARSSkf(). Below
is code to do this.
fit.cov = fit.aug; fit.cov$marss$data[1:dim(data)[1],] = NA
extra.LL = MARSSkf(fit.cov)$logLik
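The log-likelihood of the variate data alone is then the difference (our addition):
LL.data = fit.aug$logLik - extra.LL  #logLik of the non-covariate data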
Note that when you fit the augmented model, the estimates of C and
B(c) are affected by the non-covariate data, since the models for the non-covariate and covariate data are estimated simultaneously and are not independent (the covariate states affect the non-covariate states). If you
want the covariate model to be unaffected by the non-covariate data, you can
fit the covariate model separately and use the estimates for B(c) and Q(c) as
fixed values in your augmented model.
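A hedged sketch of that two-step approach follows; model.cov stands for whatever model list describes your covariate-only model and is our placeholder, not a defined object:
#Step 1: fit the covariate model alone
fit.cov.only = MARSS(covariates, model=model.cov)
#Step 2: extract the estimates to insert as fixed numeric blocks in the
#augmented B and Q
B.c = coef(fit.cov.only, type="matrix")$B
Q.c = coef(fit.cov.only, type="matrix")$Q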

13
Estimation of species interaction strengths
with and without covariates

13.1 Background
Multivariate autoregressive models (commonly termed MAR models) have
been developed as a tool for analyzing community dynamics from time series
data (Ives, 1995; Ives et al., 1999, 2003). These models are based on a process
model for log abundances (x) of the form
xt = Bxt−1 + u + wt where wt ∼ MVN(0, Q)     (13.1)

B is the interaction matrix; self-interaction strengths (density dependence)
are on the diagonal and inter-specific interaction strengths are on the off-diagonals, such that Bi,j is the ‘effect’ of species j on species i. This model has
a stochastic equilibrium—it fluctuates around a mean, (I − B)−1 u.
The term u determines the mean level but once the system is at equilibrium, it does not affect the fluctuations relative to the mean. To see this,
compare two models with b = 0.5 and u = 1 versus u = 0. The mean for the
first is 1/(1 − 0.5) = 2 and for the second is 0. If we start both 1 above the
mean, the next x is the same distance from the mean: x2 = 0.5(2 + 1) + 1 = 2.5
and x2 = 0.5(0 + 1) + 0 = 0.5. So both end up at 0.5 above the mean. So once
the system is at equilibrium, it is ‘scale invariant’, where u is the scaling term.
The way that Ives et al. (2003) write their process model (their Equation 10)
is Xt = A + BXt−1 + Et . The A in Ives’s equation is the u appearing in Equation
13.1 and the Et is our wt .
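You can verify the arithmetic in the example above with a trivial check (our addition):
#b = 0.5 with u = 1 versus u = 0; start each process 1 above its mean
b = 0.5
mean1 = 1/(1 - b); mean0 = 0      #stationary means: 2 and 0
x2.1 = b*(mean1 + 1) + 1          #next x when u = 1
x2.0 = b*(mean0 + 1) + 0          #next x when u = 0
c(x2.1 - mean1, x2.0 - mean0)     #both are 0.5 above their means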
Often the models include environmental covariates, but we will leave that
off for the moment and address that at the end of the chapter. (Type
RShowDoc("Chapter_SpeciesInteractions.R",package="MARSS") at the R command
line to open a file with all the code for the examples in this chapter.) If we
add a measurement process1, we have a MARSS model:

yt = Zxt + a + vt where vt ∼ MVN(0, R)     (13.2)

Typically, we have one time series per species and thus we assume that m = n
and Z is an m × m identity matrix (when m = n, a is set to 0). However, it is
certainly possible to have multiple time series per species (for example data
taken at multiple sites).
In this chapter, we will estimate the B matrix of species interactions for a
simple wolf-moose system and for a four-species freshwater plankton system.

13.2 Two-species example using wolves and moose
Population dynamics of wolves and moose on Isle Royale, Michigan make an
interesting case study of two-species predator-prey interactions. These populations have been studied intensively since 1958.2 Unlike other populations
of gray wolves, the Isle Royale population has a diet dominated by one prey
item, moose. The only predator of moose on Isle Royale is the gray wolf, as
this population is not hunted.
We will use the wolf and moose winter census data from Isle Royale to
learn how to fit community dynamics models to time-series data. The long-term January (wolf) and February (moose) population estimates are provided
at http://www.isleroyalewolf.org.
The mathematical form of the process model for the wolf-moose population
dynamics is

\[
\begin{bmatrix} x_{wolf} \\ x_{moose} \end{bmatrix}_t =
\begin{bmatrix} b_{ww} & b_{m \to w} \\ b_{w \to m} & b_{mm} \end{bmatrix}
\begin{bmatrix} x_{wolf} \\ x_{moose} \end{bmatrix}_{t-1} +
\begin{bmatrix} u_{wolf} \\ u_{moose} \end{bmatrix} +
\begin{bmatrix} w_{wolf} \\ w_{moose} \end{bmatrix}_t ,\quad
\begin{bmatrix} w_{wolf} \\ w_{moose} \end{bmatrix}_t \sim
\mathrm{MVN}\left(0,\; \begin{bmatrix} q_{wolf} & 0 \\ 0 & q_{moose} \end{bmatrix}\right)
\tag{13.3}
\]
13.2.1 Load in and plot the data
royale.dat = log(t(isleRoyal[,2:3]))

1 You can fit a MAR model with no observation error by setting R = 0, but a conditional least-squares algorithm is vastly faster than EM or BFGS for the R = 0 case (assuming no missing data).
2 There are many, many publications from this long-term study site; see http://www.isleroyalewolf.org/wolfhome/tech_pubs.html and the review here http://www.isleroyalewolf.org/data/data/home.html.


matplot(isleRoyal[,1],log(isleRoyal[,2:3]),
ylab="log count",xlab="Year",type="l",
lwd=3,bty="L",col="black")
legend("topright",c("Wolf","Moose"), lty=c(1,2), bty="n")


Fig. 13.1. Plot of the Isle Royale wolf and moose data.

13.2.2 Fit the model to the wolf-moose data
The naive way to fit the model is to use Equations 13.2 and 13.1 “as is”:
royale.model.0=list(B="unconstrained",Q="diagonal and unequal",
R="diagonal and unequal",U="unequal")
kem.0=MARSS(royale.dat,model=royale.model.0)
If you try this, you will notice that it does not converge but stops when it
reaches maxit and prints a number of warnings about non-convergence. The
problem is that when you try to estimate B and u, they are often confounded.
This is a well-known problem, and you will need to find a way to fix u at some
value. If you are willing to assume that the process is at equilibrium (i.e.
not recovering to equilibrium from a big perturbation), then you can simply
demean the data and set u to 0. It is also common to standardize the variance


by dividing by the square root of the variance of the data. This is called
z-scoring the data.
#if missing values are in the data, they should be NAs
z.royale.dat=(royale.dat-apply(royale.dat,1,mean,na.rm=TRUE))/
sqrt(apply(royale.dat,1,var,na.rm=TRUE))
We also need to change a couple settings before fitting the model. In the
default MARSS model, the initial value of x is treated as being at t = 0. If we
are estimating the B matrix, we need to set this to be at t = 1 so that the initial
x is constrained by the data3 at t = 1. The reason is that we need to estimate
the initial x. Even if we use a prior on the initial x, we are still estimating
its value4. A model with a non-diagonal B matrix does not “run backwards”
well, and the estimation of the initial x will run in circles. If we constrain the initial x with
data (at t = 1), we avoid this problem. So we will set model$tinitx=1.
The other setting we want to change is allow.degen. This sets the diagonals of Q or R to zero if they are heading towards zero. When the initial x is
at t = 1, this can have non-intuitive (not wrong but puzzling; see Appendix B)
consequences if R is going to zero. So, we will set control$allow.degen=FALSE
and manually set R to 0 if needed.
royale.model.1=list(Z="identity", B="unconstrained",
Q="diagonal and unequal", R="diagonal and unequal",
U="zero", tinitx=1)
cntl.list=list(allow.degen=FALSE,maxit=200)
kem.1=MARSS(z.royale.dat, model=royale.model.1, control=cntl.list)
Warning! Reached maxit before parameters converged. Maxit was 200.
neither abstol nor log-log convergence tests were passed.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: maxit reached at 200 iter before convergence.
Neither abstol nor log-log convergence test were passed.
The likelihood and params are not at the ML values.
Try setting control$maxit higher.
Log-likelihood: -76.46247
AIC: 172.9249
AICc: 175.2407

3 If there are many missing values at t = 1, we might still have problems and have to adjust accordingly.
4 Putting a prior on the initial x's requires specifying their variance-covariance structure, which depends on the unknown B; specifying a variance-covariance structure that conflicts with B will change your B estimates. So, in general, using a prior on the initial x's when estimating B is a bad idea.


                     Estimate
R.(Wolf,Wolf)        0.001421
R.(Moose,Moose)      0.000378
B.(1,1)              0.762723
B.(2,1)             -0.173536
B.(1,2)              0.069223
B.(2,2)              0.833074
Q.(X.Wolf,X.Wolf)    0.456457
Q.(X.Moose,X.Moose)  0.173376
x0.X.Wolf           -0.267175
x0.X.Moose          -1.277329
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Convergence warnings
Warning: the R.(Wolf,Wolf) parameter value has not converged.
Warning: the R.(Moose,Moose) parameter value has not converged.
Warning: the logLik parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
It looks like R is going to zero, meaning that the maximum-likelihood model
is a process error only model. That is not too surprising given that the data
look more like a random walk than white noise. We will set R manually to
zero:
royale.model.2=list(Z="identity", B="unconstrained",
Q="diagonal and unequal", R="zero", U="zero")
kem.2=MARSS(z.royale.dat, model=royale.model.2)
Success! abstol and log-log tests passed at 16 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 16 iterations.
Log-likelihood: -82.3988
AIC: 180.7976
AICc: 182.2821

                    Estimate
B.(1,1)               0.7618
B.(2,1)              -0.1734
B.(1,2)               0.0692
B.(2,2)               0.8328
Q.(X.Wolf,X.Wolf)     0.4499
Q.(X.Moose,X.Moose)   0.1708
x0.X.Wolf            -0.2086
x0.X.Moose           -1.5769

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
13.2.3 Look at the estimated interactions
The estimated B elements are coef(kem.2)$B.
wolf.B=coef(kem.2,type="matrix")$B
rownames(wolf.B)=colnames(wolf.B)=rownames(royale.dat)
print(wolf.B, digits=2)
       Wolf Moose
Wolf   0.76 0.069
Moose -0.17 0.833
The coef() function returns the estimated parameters of the fitted object;
here we want to see the estimates in matrix form, so we use type="matrix".
Element (i, j) of B is the effect of species j on species
i, so B2,1 is the effect of wolves on moose and B1,2 is the effect of moose on
wolves. The B matrix suggests that wolves have a negative effect on moose
and that moose have a positive effect on wolves—as one would expect. The
diagonals are interpreted differently than the off-diagonals: the effect of species i on
itself is (bi,i − 1), so subtract 1 from the diagonal elements. If species i were
density-independent, Bi,i would equal 1; smaller Bi,i means more density dependence.
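Following that logic, the per-capita effect of each species on itself can be computed directly (a one-line check of ours):
diag(wolf.B) - 1  #effect of each species on itself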
13.2.4 Adding covariates
It is well-known that moose numbers are strongly affected by winter and summer climate. The Isle Royale data set provided with MARSS has climate
data from climate stations in Northeastern Minnesota, near Isle Royale5 . The
covariate data include January-February, July-September and April-May average temperature and precipitation. Also included are three-year running
means of these data, where the number for year x is the average of years x-1,
x and x+1. We will include these covariates in the analysis to see how they
change our interaction estimates. We have to adjust our covariates because
the census numbers are from winter in year x and we want the climate data
from the previous year to affect this winter’s moose count. As usual, we will
need to demean our covariate data so that we can set u equal to zero. We will
standardize the variance also so that we can more easily compare the effects
across different covariates.

5 From the Western Regional Climate Center. See the help file for this dataset for references.
The mathematical form of our new process model for the wolf-moose population dynamics is

\[
\begin{bmatrix} x_{wolf} \\ x_{moose} \end{bmatrix}_t =
B \begin{bmatrix} x_{wolf} \\ x_{moose} \end{bmatrix}_{t-1} +
\begin{bmatrix} 0 & 0 & 0 \\ C_{21} & C_{22} & C_{23} \end{bmatrix}
\begin{bmatrix} \text{win temp} \\ \text{win precip} \\ \text{sum temp} \end{bmatrix}_{t-1} +
\begin{bmatrix} w_{wolf} \\ w_{moose} \end{bmatrix}_t
\tag{13.4}
\]
The C21 , C22 and C23 terms are the effects of winter temperature, winter precipitation, and previous summer temperature on
winter moose numbers. Since climate is known to mainly affect the moose, we
set the climate effects to 0 for wolves (top row of C).
First we prepare the covariate data. The winter temperature and precipitation data are in columns 4 and 10, while the summer temperature data are in
column 6. We need to use the previous year's climate data with this winter's
abundance data, so we will offset the climate data.
clim.dat= t(isleRoyal[1:52,c(4,10,6)])
z.score.clim.dat=(clim.dat-apply(clim.dat,1,mean,na.rm=TRUE))/
sqrt(apply(clim.dat,1,var,na.rm=TRUE))
A plot of the covariate data against each other (Figure 13.2) indicates that there is not
much correlation between winter temperature and precipitation,
which is good for analysis purposes, but warm winters are somewhat correlated
with warm summers. The latter will make it harder to interpret the effect of
winter versus summer temperature, although fortunately the correlation is not too strong.
Next we prepare the list with the structure of all the model matrices. We
give descriptive names to the C elements so we can remember what each C
element means.
royale.model.3=list(Z="identity", B="unconstrained",
Q="diagonal and unequal", R="zero", U="zero",
C=matrix(list(0,"Moose win temp",0,"Moose win precip",
0,"Moose sum temp"),2,3),
c=z.score.clim.dat)
Then we fit the model. Because we have to use the climate data from the
previous year, we lose a year of our count data.
kem.3=MARSS(z.royale.dat[,2:53], model=royale.model.3)
Success! abstol and log-log tests passed at 17 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.


MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 17 iterations.
Log-likelihood: -80.79155
AIC: 183.5831
AICc: 186.4527

                    Estimate
B.(1,1)               0.7667
B.(2,1)              -0.1609
B.(1,2)               0.0790
B.(2,2)               0.8343
Q.(X.Wolf,X.Wolf)     0.4567
Q.(X.Moose,X.Moose)   0.1679
x0.X.Wolf             0.1543
x0.X.Moose           -1.4008
C.Moose win temp      0.0242
C.Moose win precip   -0.0713
C.Moose sum temp     -0.0306

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
The results suggest what is already known about this system: cold winters
and lots of snow are bad for moose, as are hot summers.
13.2.5 Change the model and data
You can explore the sensitivity of the B estimates when the measurement error
is increased by adding white noise to the data:
#add extra observation noise (variance 0.2) to the z-scored data
bad.data = z.royale.dat + matrix(rnorm(length(z.royale.dat),0,sqrt(.2)),
   nrow(z.royale.dat), ncol(z.royale.dat))
#refit using one of the model lists defined above, e.g. royale.model.2
kem.bad = MARSS(bad.data, model=royale.model.2)
You can change the model by changing the constraints on R and Q.

13.3 Analysis of a four-species plankton community
Ives et al. (2003) presented weekly data on the biomass of two species of
phytoplankton and two species of zooplankton in two lakes, one with low
planktivory and one with high planktivory. They used these data to estimate
the interaction terms for the four species. Here we will reanalyze their data
and compare our results.
Fig. 13.2. Pairs plot of the covariate data for Isle Royale with correlations in the lower panel. The R code that produced this plot was cor.fun=function(x,y) text(0.5,0.5,format(cor(x,y),digits=2),cex=2); pairs(t(z.score.clim.dat), lower.panel=cor.fun).

Ives et al. (2003) explain the data as: “The data consist of weekly samples
of zooplankton and phytoplankton, which for the analyses were divided into

two zooplankton groups (Daphnia and non-Daphnia) and two phytoplankton
groups (large and small phytoplankton). Daphnia are large, effective herbivores, and small phytoplankton are particularly vulnerable to herbivory, so
we anticipate strong interactions between Daphnia and small phytoplankton
groups.” Figure 13.3 shows the data. What you can see from the figure is that
the data are only collected in the summer.
13.3.1 Load in the plankton data
# use the two phytoplankton groups and the two zooplankton groups
plank.spp = c("Large Phyto","Small Phyto","Daphnia","Non-daphnia")
plank.dat = ivesDataByWeek[,plank.spp]
#The data are not logged
plank.dat = log(plank.dat)
#Transpose to get time going across the columns
plank.dat = t(plank.dat)


#make a demeaned version
d.plank.dat = (plank.dat-apply(plank.dat,1,mean,na.rm=TRUE))

As before, we will demean the data so we can set u to 0. We do not standardize
by the variance, however, because we are going to fix the R variance later as
Ives et al. did.

Fig. 13.3. Plot of the de-meaned plankton data. Zooplankton are the thicker lines.
Phytoplankton are the thinner lines.

13.3.2 Specify a MARSS model for the plankton data
We will start by fitting a model with the following assumptions:
- All phytoplankton share the same process variance.
- All zooplankton share the same process variance.
- Phytoplankton and zooplankton have different measurement variances.
- Measurement errors are independent.
- Process errors are independent.
Z="identity"
U="zero"
B="unconstrained"
Q=matrix(list(0),4,4); diag(Q)=c("Phyto","Phyto","Zoo","Zoo")
R=matrix(list(0),4,4); diag(R)=c("Phyto","Phyto","Zoo","Zoo")
plank.model.0=list(Z=Z, U=U, Q=Q, R=R, B=B)

Why did we set U="zero"? Equation 13.1 is a stationary model; it fluctuates
about a mean. The u in Equation 13.1 is a scaling term that just affects the


mean level—once the system is at equilibrium. If we assume that the mean of
y (the mean of our data) is a good estimate of the mean of the system (the
x), then we can set u equal to zero.
13.3.3 Fit the plankton model and look at the estimated B matrix
The call to fit the model is standard with the addition of setting model$tinitx
so that the initial states (x) are set at t = 1 instead of t = 0.
plank.model.0$tinitx=1
kem.plank.0 = MARSS(d.plank.dat, model=plank.model.0 )
Now we can print the B matrix, with a little cleaning up so it looks prettier.
#Cleaning up the B matrix for printing
B.0 = coef(kem.plank.0, type="matrix")$B[1:4,1:4]
rownames(B.0) = colnames(B.0) = c("LP","SP","D","ND")
print(B.0,digits=2)
      LP   SP       D      ND
LP  0.77 0.29 -0.0182  0.131
SP  0.19 0.51  0.0052 -0.045
D  -0.43 2.29  0.4916  0.389
ND -0.33 1.35 -0.2180  0.831
LP stands for large phytoplankton, SP for small phytoplankton, D for Daphnia
and ND for non-Daphnia.
We can compare this to the Ives et al. estimates (in their Table 2, bottom
right) and see a few differences:
     LP    SP     D    ND
LP 0.48 -0.39    --    --
SP   --  0.25 -0.17 -0.11
D    --    --  0.74  0.00
ND   --  0.10  0.00  0.60
The first thing you will notice is that the Ives et al. matrix is missing values. The
matrix they show is after a model selection step to determine which interactions had little data support and thus could be set to zero. Also, they fixed
a priori the interactions between Daphnia and non-Daphnia at zero because
they do not prey on each other. The second thing you will notice is that the
estimates are not particularly similar. Next we will try some other ways of
fitting the data that are closer to the way that Ives et al. fitted the data.
By the way, if you are curious what would happen if we removed all those
NAs, you can run the following code.
test.dat = d.plank.dat[, !is.na(d.plank.dat[1, ])]
test = MARSS(test.dat, model = plank.model.0)


Removing all the NAs would mean that the end of summer 1 is connected to
the beginning of summer 2. This adds some steep steps in the Daphnia time
series where Daphnia ended the summer high and started the next summer
low.
13.3.4 Look at different ways to fit the model
We will try a series of changes to get closer to the way Ives et al. fit the data,
and you will see how different assumptions change (or do not change) our
species interaction estimates.
First, we change Q to be unconstrained. Making Q diagonal in model 0
meant that we were assuming that whatever environmental factor is driving
variation in phytoplankton numbers is uncorrelated with the environmental
factor driving zooplankton variation. That is probably not true since they are
all in the same lake. This case takes a while to run.
plank.model.1=plank.model.0
plank.model.1$Q="unconstrained"
kem.plank.1 = MARSS(d.plank.dat, model=plank.model.1)
Notice that the Q specification changed to “unconstrained”. Everything else
stays the same as in model 0. The code now runs longer, and the B estimates
are not particularly closer to Ives et al.:
        LP    SP      D    ND
LP  0.4961 0.061  0.079 0.123
SP -0.1833 0.896  0.067 0.011
D   0.1180 0.350  0.638 0.370
ND  0.0023 0.370 -0.122 0.810

Next, we will set some of the interactions to zero as in Table 2 in Ives et
al. (2003). In that table, certain interactions were fixed at 0 (denoted with
0s), and some were made 0 after fitting (the blanks). We will fix all to zero.
To do this, we need to write out the B matrix as a list matrix so that we can
have estimated and fixed values (the 0s) in the B specification.
B.2=matrix(list(0),4,4) #set up the list matrix
diag(B.2)=c("B11","B22","B33","B44") #give names to diagonals
#and names to the estimated non-diagonals
B.2[1,2]="B12"; B.2[2,3]="B23"; B.2[2,4]="B24"; B.2[4,2]="B42"
print(B.2)
     [,1]  [,2]  [,3]  [,4] 
[1,] "B11" "B12" 0     0    
[2,] 0     "B22" "B23" "B24"
[3,] 0     0     "B33" 0    
[4,] 0     "B42" 0     "B44"


As you can see, the B matrix now has elements that will be estimated
(the names in quotes) and fixed values (the numbers with no quotes). When
preparing your list matrix, make sure your fixed values do not have quotes
around them. If they do, they are strings (class character), not numbers (class
numeric), and MARSS will interpret a string as the name of something to be
estimated. If you use the same name for two elements, MARSS will force
those elements to be shared (have the same value). This model will also take a while
to run.
#model 2
plank.model.2=plank.model.1
plank.model.2$B = B.2
kem.plank.2= MARSS(d.plank.dat, model=plank.model.2)
Now we are getting closer to the Ives et al. estimates:
     LP    SP      D     ND
LP 0.65 -0.33     --     --
SP   --  0.54 0.0016 -0.026
D    --    -- 0.8349     --
ND   --  0.13     --  0.596
Ives et al. did not estimate R. Instead they used a fixed observation variance of 0.04 for phytoplankton and 0.16 for zooplankton6 . We fit the model
with their fixed R as follows:
#model 3
plank.model.3=plank.model.2
plank.model.3$R=diag(c(.04,.04,.16,.16))
kem.plank.3= MARSS(d.plank.dat, model=plank.model.3)
As you can see from Table 13.1, we are getting closer to Ives et al. estimates,
but we are still a bit off. Now we need to add the environmental covariates:
phosphorous and fish biomass.
13.3.5 Adding covariates
A standard way that you will see covariate data added to a MARSS model is
the following:
xt = Bxt−1 + u + Cct + wt , where wt ∼ MVN(0, Q)
yt = Zxt + a + Ddt + vt , where vt ∼ MVN(0, R)     (13.5)

ct and dt are covariate data, like temperature at time t. C is a matrix
with the (linear) effects of ct on xt , and D is a matrix with the (linear) effects
of dt on yt .
6 You can compare this to the estimated observation variances by looking at coef(kem.plank.2)$R.


Ives et al. (2003) only include covariates in their process model, and their
process model (their Equation 27) is written Xt = A + BXt−1 + CUt + Et . In
our Equation 13.5, Ut = ct , and C is an m × p matrix, where p is the number of
covariates in ct . We will set their A (our u) to zero by demeaning the y and
implicitly assuming that the mean of the y is a good estimate of the mean of
the x’s. Thus the model where covariates only affect the underlying process is
xt = Bxt−1 + Cct + wt , where wt ∼ MVN(0, Q)
yt = xt + vt , where vt ∼ MVN(0, R)     (13.6)

To fit this model, we first need to prepare the covariate data. We will just
use the phosphorous data.
#transpose to make time go across columns
#drop=FALSE so that R doesn't change our matrix to a vector
phos = t(log(ivesDataByWeek[,"Phosph",drop=FALSE]))
d.phos = (phos-apply(phos,1,mean,na.rm=TRUE))
Why log the covariate data? It is what Ives et al. did, so we follow their
method. However, in general, you want to think about what relationship you
want to assume between the covariates and their effects. For example, log (or
square-root) transformations mean that extremes have less impact relative to
their untransformed value, and that a small absolute change in the untransformed value, say from 0.01
to 0.0001, can mean a large difference in the effects
since log(0.0001) < log(0.01).
Phosphorous is assumed to only affect phytoplankton, so the other terms in
C, corresponding to the zooplankton, are set to 0. The C matrix is defined as
follows:

\[
C = \begin{bmatrix} C_{LP,phos} \\ C_{SP,phos} \\ 0 \\ 0 \end{bmatrix}
\tag{13.7}
\]
To add C and c to our latest model, we add C and c to the model list used
in the MARSS call:
plank.model.4=plank.model.3
plank.model.4$C=matrix(list("C11","C21",0,0),4,1)
plank.model.4$c=d.phos
Then we fit the model as usual:
kem.plank.4= MARSS(d.plank.dat, model=plank.model.4)
Success! abstol and log-log tests passed at 55 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is


Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 55 iterations.
Log-likelihood: -393.189
AIC: 834.3781
AICc: 837.9284

                  Estimate
B.B11               0.6138
B.B12              -0.4619
B.B22               0.3320
B.B42               0.0479
B.B23              -0.0182
B.B33               0.8889
B.B24              -0.0476
B.B44               0.6643
Q.(1,1)             0.7376
Q.(2,1)             0.2159
Q.(3,1)             0.0796
Q.(4,1)             0.0293
Q.(2,2)             0.2688
Q.(3,2)            -0.1271
Q.(4,2)            -0.0878
Q.(3,3)             0.8654
Q.(4,3)             0.4685
Q.(4,4)             0.3906
x0.X.Large Phyto    0.1615
x0.X.Small Phyto   -0.5273
x0.X.Daphnia       -1.1121
x0.X.Non-daphnia   -1.8082
C.C11               0.1385
C.C21               0.1580

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
13.3.6 Including a covariate observation model
The difficulty with the standard approach to including covariates (Equation
13.5) is that it limits what kind of covariate data you can use and how you
model that covariate data. You have to assume that your covariate data has no
error, which is probably not true. Assuming that your covariate has no error
reduces the reported uncertainty in your covariate effect because you did not
include uncertainty in those values. The standard approach also does not allow
missing values in your covariate data, which is why we did not include the fish
covariate data in the last model. Also, you cannot combine instrument time
series; for example, if you have two temperature recorders with different error
rates and biases. And what if you have one noisy temperature recorder in the
first part of your time series and then you switch to a much better recorder
in the second half of your time series? All these problems require pre-analysis
massaging of the covariate data, leaving out noisy and gappy covariate data,
and making what can feel like arbitrary choices about which covariate time
series to include—which is especially worrisome when the covariates are then
incorporated in the model as known without error.
Instead, one can include an observation and process model for the covariates
just like for the non-covariate data. Now the covariates are included in yt
and are modeled with their own state process(es) in xt. A MARSS model
with a covariate observation and process model is shown below. The elements
with superscript (v) are for the variates and those with superscript (c) are
for the covariates. The superscripts just help us keep straight which of the
state processes and parameters correspond to the abundances and which correspond
to the environmental covariates.
\[
\begin{bmatrix} x^{(v)} \\ x^{(c)} \end{bmatrix}_t =
\begin{bmatrix} B^{(v)} & C \\ 0 & B^{(c)} \end{bmatrix}
\begin{bmatrix} x^{(v)} \\ x^{(c)} \end{bmatrix}_{t-1} +
\begin{bmatrix} u^{(v)} \\ u^{(c)} \end{bmatrix} + w_t ,\quad
w_t \sim \mathrm{MVN}\left(0,\; \begin{bmatrix} Q^{(v)} & 0 \\ 0 & Q^{(c)} \end{bmatrix}\right)
\]
\[
\begin{bmatrix} y^{(v)} \\ y^{(c)} \end{bmatrix}_t =
\begin{bmatrix} Z^{(v)} & 0 \\ 0 & Z^{(c)} \end{bmatrix}
\begin{bmatrix} x^{(v)} \\ x^{(c)} \end{bmatrix}_t +
\begin{bmatrix} a^{(v)} \\ a^{(c)} \end{bmatrix} + v_t ,\quad
v_t \sim \mathrm{MVN}\left(0,\; \begin{bmatrix} R^{(v)} & 0 \\ 0 & R^{(c)} \end{bmatrix}\right)
\tag{13.8}
\]
Note that when you fit your covariate and non-covariate data jointly as in
Equation 13.8, your non-covariate data affect the estimates of the covariate
models. When you maximize the likelihood, you do so conditioned on all the
data. The likelihood that is output is the likelihood of the non-covariate and
covariate data. Depending on your system, you might not want the covariate
model affected by the non-covariate data. In this case, you can fit the covariate
model separately:
\[
x^{(c)}_t = B^{(c)} x^{(c)}_{t-1} + u^{(c)} + w_t ,\quad w_t \sim \mathrm{MVN}(0, Q^{(c)})
\]
\[
y^{(c)}_t = Z^{(c)} x^{(c)}_t + a^{(c)} + v_t ,\quad v_t \sim \mathrm{MVN}(0, R^{(c)})
\tag{13.9}
\]

At this point, you have another choice. Do you want the estimated covariate states, the x(c), to be affected by the non-covariate data? For example, say you
have temperature data. You can estimate the true temperature only from the temperature data, or you can decide that the non-covariate
data also carry information about the true temperature, because the non-covariate
states are affected by the true temperature. If you want the covariate states to
be affected only by the covariate data, then use Equation 13.5 with ct set to
your estimates of x(c) from Equation 13.9. If instead you want the non-covariate
data to affect the estimates of the covariate states, use Equation 13.8 with the
parameters estimated from Equation 13.9.


13.3.7 The MARSS model with covariates following Ives et al.
Ives et al. used Equation 13.5 for phosphorous and Equation 13.8 for fish
biomass. Phosphorous was treated as observed with no error since it was
experimentally manipulated and there were no missing values. Fish biomass
was treated as having observation error and was modeled as an autoregressive
process with unknown parameters, as in Equation 13.8.
Their MARSS model takes the form:
xt = Bxt−1 + Cct + wt , where wt ∼ MVN(0, Q)
yt = xt + vt , where vt ∼ MVN(0, R)     (13.10)

where x and y are redefined to contain five rows:

\[
\begin{bmatrix} \text{large phyto} \\ \text{small phyto} \\ \text{Daphnia} \\ \text{non-Daphnia zooplankton} \\ \text{fish biomass} \end{bmatrix}
\tag{13.11}
\]

The covariate fish biomass appears in x because it will be modeled, and its
interaction terms (Ives et al.'s C terms) appear in B. Phosphorous appears in
the ct terms because it is treated as a known additive term, and its interaction
terms appear in C. Recall that we set u to 0 by demeaning the plankton data,
so it does not appear above. The Z matrix does not appear in front of xt
since there is a one-to-one correspondence between the x's and y's, and thus Z is the
identity matrix.
The B matrix is

\[
B = \begin{bmatrix} B^{(v)} & C \\ 0 & B^{(c)} \end{bmatrix} =
\begin{bmatrix}
b_{LP} & b_{LP,SP} & 0 & 0 & 0 \\
0 & b_{SP} & b_{SP,D} & b_{SP,ND} & 0 \\
0 & 0 & b_{D} & 0 & C_{D,fish} \\
0 & b_{ND,SP} & 0 & b_{ND} & C_{ND,fish} \\
0 & 0 & 0 & 0 & b_{fish}
\end{bmatrix}
\tag{13.12}
\]
The B elements have some interactions fixed at 0, as in our last model fit. The
C terms are the effects of fish on the plankton species. We will estimate a
B term for fish since Ives et al. did, but this is an odd thing to do for the fish
data since these data were interpolated from two samples per season.
The Q matrix is the same as that in our last model fit, with the addition
of an element for the variance of the fish biomass:

\[
Q = \begin{bmatrix} Q^{(v)} & 0 \\ 0 & Q^{(c)} \end{bmatrix} =
\begin{bmatrix}
q_{LP} & q_{LP,SP} & q_{LP,D} & q_{LP,ND} & 0 \\
q_{LP,SP} & q_{SP} & q_{SP,D} & q_{SP,ND} & 0 \\
q_{LP,D} & q_{SP,D} & q_{D} & q_{D,ND} & 0 \\
q_{LP,ND} & q_{SP,ND} & q_{D,ND} & q_{ND} & 0 \\
0 & 0 & 0 & 0 & q_{fish}
\end{bmatrix}
\tag{13.13}
\]


Again, it is odd to estimate a variance term for data interpolated from two
points, but we follow Ives et al. here.
Ives et al. set the observation variance for the logged fish biomass data
to 0.36 (page 320 in Ives et al. (2003)). The observation variances for the
plankton data were set as in our previous model.


\[
R = \begin{bmatrix}
0.04 & 0 & 0 & 0 & 0 \\
0 & 0.04 & 0 & 0 & 0 \\
0 & 0 & 0.16 & 0 & 0 \\
0 & 0 & 0 & 0.16 & 0 \\
0 & 0 & 0 & 0 & 0.36
\end{bmatrix}
\tag{13.14}
\]
13.3.8 Setting the model structure for the model with fish
covariate data
First we need to add the logged fish biomass to our data matrix.
#transpose to make time go across columns
#drop=FALSE so that R doesn't change our matrix to a vector
fish = t(log(ivesDataByWeek[,"Fish biomass",drop=FALSE]))
d.fish = (fish-apply(fish,1,mean,na.rm=TRUE))
#plank.dat.w.fish = rbind(plank.dat,fish)
d.plank.dat.w.fish = rbind(d.plank.dat,d.fish)
Next make the B matrix. Some elements are estimated and others are fixed
at 0.
B=matrix(list(0),5,5)
diag(B)=list("B11","B22","B33","B44","Bfish")
B[1,2]="B12";B[2,3]="B23"; B[2,4]="B24"
B[4,2]="B42";
B[1:4,5]=list(0,0,"C32","C42")
print(B)
     [,1]  [,2]  [,3]  [,4]  [,5]   
[1,] "B11" "B12" 0     0     0      
[2,] 0     "B22" "B23" "B24" 0      
[3,] 0     0     "B33" 0     "C32"  
[4,] 0     "B42" 0     "B44" "C42"  
[5,] 0     0     0     0     "Bfish"

Now we have a B matrix that looks like that in Equation 13.12.
We need to add an extra row to C for the fish biomass row in x:
C=matrix(list("C11","C21",0,0,0),5,1)
Then we set up the R matrix.


R=matrix(list(0),5,5)
diag(R)=list(0.04,0.04,0.16,0.16,0.36)
Last, we need to set up the Q matrix:
Q=matrix(list(0),5,5);
Q[1:4,1:4]=paste(rep(1:4,times=4),rep(1:4,each=4),sep="")
Q[5,5]="fish"
Q[lower.tri(Q)]=t(Q)[lower.tri(Q)]
print(Q)
     [,1] [,2] [,3] [,4] [,5]  
[1,] "11" "12" "13" "14" 0     
[2,] "12" "22" "23" "24" 0     
[3,] "13" "23" "33" "34" 0     
[4,] "14" "24" "34" "44" 0     
[5,] 0    0    0    0    "fish"

13.3.9 Fit the model with covariates
The model is the same as the previous model with updated process parameters
and updated R. We will pass in the updated data matrix with the fish biomass
added:
plank.model.5=plank.model.4
plank.model.5$B=B
plank.model.5$C=C
plank.model.5$Q=Q
plank.model.5$R=R
kem.plank.5=MARSS(d.plank.dat.w.fish, model=plank.model.5)
This is the new B matrix using covariates.
     LP     SP      D     ND
LP 0.61 -0.465     --     --
SP   --  0.333 -0.019 -0.048
D    --     --  0.896     --
ND   --  0.044     --  0.675
Now we are getting close to Ives et al.'s estimates. Compare model
5 in Table 13.1 to the first column.
NOTE! When you include your covariates in your state model (the x part),
the reported log-likelihood is for the variate plus the covariate data. If you
want just the log-likelihood for the variates, then you replace the covariate
data with NAs and run the Kalman filter with your estimated model:
tmp=kem.plank.5
tmp$marss$data[5,]=NA
LL.variates=MARSSkf(tmp)$logLik


Table 13.1. The parameter estimates under the different plankton models. Models
0 to 3 do not include covariates, so the C elements are blank. Bij is the effect of
species j on species i. 1=large phytoplankton, 2=small phytoplankton, 3=Daphnia,
4=non-Daphnia zooplankton. The Ives et al. (2003) estimates are from their Table
2 for the low planktivory lake with the observation model.

    Ives et al. Model 0 Model 1 Model 2 Model 3 Model 4 Model 5
B11        0.48    0.77    0.50    0.65    0.62    0.61    0.61
B22        0.25    0.51    0.90    0.54    0.51    0.33    0.33
B33        0.74    0.49    0.64    0.83    0.89    0.89    0.90
B44        0.60    0.83    0.81    0.60    0.67    0.66    0.67
B12       -0.39    0.29    0.06   -0.33   -0.32   -0.46   -0.46
B23       -0.17    0.01    0.07    0.00   -0.02   -0.02   -0.02
B24       -0.11   -0.04    0.01   -0.03    0.02   -0.05   -0.05
B42        0.10    1.35    0.37    0.13    0.09    0.05    0.04
C11        0.25                                    0.14    0.14
C21        0.25                                    0.16    0.16
C32       -0.14                                           -0.04
C42       -0.04                                           -0.01

MARSSkf is the Kalman filter function and it needs a fitted model as output
by a MARSS call. We set up a temporary fitted model, tmp, equal to our fitted
model and then set the covariate data in that to NAs. Note we need to do this
for the marssMODEL object used by MARSSkf, which will be in $marss. We
then pass that temporary fitted model to MARSSkf to get the log-likelihood of
just the variates.
13.3.10 Discussion
The estimates for model 5 are fairly close to the Ives et al. estimates, but still
a bit different. There are two big differences between our analysis and the Ives et
al. analysis. First, Ives et al. had data from three lakes, and their estimate of Q used
the data from all the lakes.
Combining data, whether it be from different areas or years, can be done
in a MARSS model as follows. Let y1 be the first data set (say from site 1)
and y2 be the second data set (say from site 2). Then a MARSS model with
shared parameter values across datasets would be

x+t = B+ x+t−1 + u+ + wt , where wt ∼ MVN(0, Q+)
y+t = Z+ x+t + a+ + vt , where vt ∼ MVN(0, R+)     (13.15)

where the + matrices are stacked matrices from the different sites (1 and 2):

\[
\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} =
\begin{bmatrix} B & 0 \\ 0 & B \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} +
\begin{bmatrix} u \\ u \end{bmatrix} + w_t ,\quad
w_t \sim \mathrm{MVN}\left(0,\; \begin{bmatrix} Q & q \\ q & Q \end{bmatrix}\right)
\]
\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} =
\begin{bmatrix} Z & 0 \\ 0 & Z \end{bmatrix}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} +
\begin{bmatrix} a \\ a \end{bmatrix} + v_t ,\quad
v_t \sim \mathrm{MVN}\left(0,\; \begin{bmatrix} R & 0 \\ 0 & R \end{bmatrix}\right)
\tag{13.16}
\]
The q in the process variance allows that the environmental variability might
be correlated between datasets, e.g., if they are replicate plots that are
nearby. If you did not want all the parameters shared, then you would replace
the B in B+ with B1 and B2 , say.
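For concreteness, here is a hedged sketch of how such a stacked two-site model could be specified as a MARSS model list, with one state and one observation series per site and all parameters shared; the names are ours:
#shared b on the diagonal of B+; shared Q on the diagonal of Q+ with a
#shared cross-site covariance q
B.plus = matrix(list("b",0, 0,"b"), 2, 2, byrow=TRUE)
Q.plus = matrix(list("Q","q", "q","Q"), 2, 2, byrow=TRUE)
model.plus = list(B=B.plus, Q=Q.plus, Z="identity",
   R="diagonal and equal", U="equal", A="zero")
#fit with the stacked data: MARSS(rbind(y1, y2), model=model.plus)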
The second big difference is that Ives et al. did not demean their data, but
estimated u. We could have done that too, but with all the NAs in the data
(during winter), estimating u is not robust and takes a long time. You can try
the analysis on the data that has not been demeaned and set U="unequal".
The results are not particularly different, but it takes a long, long,...long time
to converge.
You can also try using the actual fish data instead of the interpolated data.
Fish biomass was estimated at the end and start of the season, so only the
values at the start and finish of strings of fish numbers are the real data. The
others are interpolated. You can fill in those interpolated values with NAs
(missing values) and rerun model 5. The results are not appreciably different,
but the effect of fish drops a bit as you might expect when you have less fish
information. You don’t see it here, but your estimated confidence in the fish
effects would also drop since this estimate is based on less fish data.

13.4 Stability metrics from estimated interaction
matrices
The previous sections focused on estimation of the B and C matrices. The
estimated B matrix gives a picture of the species interactions, but it also can
be used to compute metrics of the intrinsic community stability (Ives et al.,
2003). Here we illustrate how to compute these metrics; the reader should see
Ives et al. (2003) for details on the meaning of each.
For the examples here, we will use the estimated B and Q matrices from
our model 5:
B = coef(kem.plank.5,type="matrix")$B[1:4,1:4]
Q = coef(kem.plank.5,type="matrix")$Q[1:4,1:4]
13.4.1 Return rate metrics
Return rate metrics measure how rapidly the system returns to the stationary
distribution of species abundances after it is perturbed away from the stationary distribution. With a deterministic (Q = 0) MARSS community model, the
equilibrium is a point or stable limit cycle. In a stochastic model (Q ≠ 0), the
equilibrium is stochastic and is a stationary distribution. The rate of return to
the stochastic equilibrium is the rate at which the distribution converges to
the stationary distribution after a perturbation. The more rapid the convergence, the more stable the system.
The rate of return of the mean of the stationary distribution is governed
by the dominant eigenvalue of B. In R, we can compute this as:
max(eigen(B)$values)
[1] 0.8964988
The rate of return of the variance of the stationary distribution is governed
by the dominant eigenvalue of B ⊗ B:
max(eigen(kronecker(B,B))$values)
[1] 0.8037101
13.4.2 Variance metrics
These metrics measure the variance of the stationary distribution of species
abundances (with variance due to environmental drivers removed) relative to
the process error variance. The system is considered more stable when the
stationary distribution variance is low relative to the process error variance.
To compute variance metrics, we need to first compute the variance-covariance matrix for the stationary distribution, V∞ :
m=nrow(B)
vecV = solve(diag(m*m)-kronecker(B,B))%*%as.vector(Q)
V_inf = matrix(vecV,nrow=m,ncol=m)
A measure of the proportion of the “volume” of the stationary distribution
due to species interactions is given by the square of the determinant of the B
matrix (Eqn. 24 in Ives et al. (2003)):
abs(det(B))^2
[1] 0.01559078
To compare stability across systems of different sizes, you scale by the number
of species:
abs(det(B))^(2/nrow(B))
[1] 0.3533596


13.4.3 Reactivity metrics
Reactivity metrics measure how the system responds to a perturbation. A highly reactive system tends to move farther away from a stable equilibrium immediately after a perturbation, even though the system will eventually return
to the equilibrium. High reactivity occurs when species interactions greatly
amplify the environmental variance to produce a stationary distribution with
high variances in the abundance of individual species.
Both metrics of reactivity are estimates of the average expected change in
distance from the mean of the stationary distribution. The first uses estimates
of Q and V∞ :
-sum(diag(Q))/sum(diag(V_inf))
[1] -0.346845
Estimation of Q is prone to high uncertainty. Another metric that uses only
B is the worst-case reactivity. This is given by
max(eigen(t(B)%*%B)$values)-1
[1] -0.1957795
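For convenience, the metrics computed in this section can be collected into one summary; this wrapper is our own, not from Ives et al., and Re() simply guards against eigenvalues that are returned as complex numbers:
stability = c(
  return.rate.mean = max(Re(eigen(B)$values)),
  return.rate.var  = max(Re(eigen(kronecker(B,B))$values)),
  volume           = abs(det(B))^2,
  volume.scaled    = abs(det(B))^(2/nrow(B)),
  reactivity       = -sum(diag(Q))/sum(diag(V_inf)),
  reactivity.worst = max(eigen(t(B)%*%B)$values) - 1
)
round(stability, 3)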

13.5 Further information
MAR models have been used to estimate species interaction
strengths, stability metrics, and environmental drivers for a variety of freshwater plankton systems (Ives, 1995; Ives et al., 1999, 2003; Hampton et al.,
2006, 2008; Hampton and Schindler, 2006; Klug and Cottingham, 2001). They
have been used to gain much insight into the dynamics of ecological communities and how environmental drivers affect the system. See ? for a review of
the literature using MAR models to understand plankton dynamics.

14
Combining data from multiple time series

14.1 Overview
In this section, we consider the case where multiple time series exist and we
want to use all the datasets to estimate a common underlying state-process
or common underlying parameters. In ecological applications, this situation
arises when 1) there are multiple time series of observations of the same species
(e.g., aerial and land-based surveys of the same species), or 2) the time series
were collected in the same survey but represent observations of multiple species
(e.g., a fisheries trawl survey that collects multiple species in each trawl).
Why should we consider using
other time series? In the first scenario, where methodology differs between
time series of the same species, observation error may be survey-specific. These
time series may represent observations of multiple populations, or these may
represent multiple observations of the same population. In the second scenario,
each species should be treated as a separate process (given its own state
vector), but because the survey methodology is the same across species, it
might be reasonable to assume a shared observation error variance. If these
species have a similar response to environmental stochasticity, it might be
possible to also assume a shared process variance.
In both of the above examples, MARSS models offer a way of linking
multiple time series. If parameters are allowed to be shared among the state
processes (trend parameters, process variances) or observation processes (observation variances), parameter estimates will be more precise than if we
treated each time series as independent. By improving estimates of variance

Type RShowDoc("Chapter_CombiningTrendData.R",package="MARSS") at the R
command line to open a file with all the code for the examples in this chapter.


parameters, we will also be better able to discriminate between process and
observation error variances.
In this chapter, we will show examples of using MARSS to analyze multiple
time series on the same species but either different populations or different
survey methods. The multivariate first-order autoregressive state process is
written as usual as:
xt = Bxt−1 + u + wt where wt ∼ MVN(0, Q)     (14.1)

The true population sizes at time t are represented by the state xt , whose
dimensions are equal to the number of state processes (m). The m × m matrix
B allows interaction between processes (density dependence and competition,
for instance), u is a vector describing the mean trend, and the correlation of
the process deviations is determined by the structure of the matrix Q.
The multivariate observation error model is expressed as,
yt = Zxt + a + vt where vt ∼ MVN(0, R)     (14.2)

where yt is a vector of observations at time t, Z is a design matrix of 0s and 1s,
a is a vector of bias adjustments, and the correlation structure of observation
matrices is specified with the matrix R. Including Z and a will not be required
for every model, but is useful when some processes are observed multiple times.

14.2 Salmon spawner surveys
In our first application combining multiple time series, we will analyze a
dataset on Chinook salmon (Oncorhynchus tshawytscha). This data comes
from the Okanogan River in Washington state, a major tributary of the
Columbia River (headwaters in British Columbia). As an index of the abundance of spawning adults, biologists have conducted redd surveys during summer months (redds are nests or collection of rocks on stream bottoms where
females deposit eggs). Aerial surveys of redds on the Okanogan have been
conducted 1956-2008. Alternative ground surveys of were initiated in 1990,
and have been conducted annually.
14.2.1 Read in and plot the raw data
We will be using the logged counts.
head(okanaganRedds)
     Year aerial ground
[1,] 1956     37     NA
[2,] 1957     53     NA
[3,] 1958     94     NA
[4,] 1959     50     NA
[5,] 1960     29     NA
[6,] 1961     NA     NA

logRedds = log(t(okanaganRedds)[2:3,])
Year is in the first column, and the counts (in normal space) are in columns
2:3. Missing observations are represented by NAs.


# Code for plotting raw Okanagan redd counts
plot(okanaganRedds[,1], okanaganRedds[,2],
xlab = "Year", ylab="Redd counts",main="", col="red", pch=1)
points(okanaganRedds[,1], okanaganRedds[,3], col="blue", pch=2)
legend('topleft', inset=0.1, legend=c("Aerial survey","Ground survey"),
col=c("red","blue"), pch=c(1,2))


Fig. 14.1. The two time series look to be pretty close to one another in the years
where there is overlap.


14.2.2 Test hypotheses about whether the data can be combined
Do these surveys represent observations of the same underlying process? We
can evaluate data support for this question by testing a few relatively simple models. Using the logged data, we will start with a simple model that
assumes the underlying population process is univariate (there is one underlying population trajectory) and each survey is an independent observation of
this population process. Mathematically, the model is:

\[
x_t = x_{t-1} + u + w_t ,\quad w_t \sim \mathrm{N}(0, q)
\]
\[
\begin{bmatrix} y_{aer} \\ y_{gnd} \end{bmatrix}_t =
\begin{bmatrix} 1 \\ 1 \end{bmatrix} x_t +
\begin{bmatrix} 0 \\ a_2 \end{bmatrix} +
\begin{bmatrix} v_{aer} \\ v_{gnd} \end{bmatrix}_t ,\quad
v_t \sim \mathrm{MVN}\left(0,\; \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix}\right)
\tag{14.3}
\]

The a structure means that the a for one of the y’s is fixed at 0 and the other a
is estimated relative to that fixed a. In MARSS, this is the “scaling” structure
for a. We specify this model in MARSS as follows. Since x is univariate, Q
and u are just scalars (single numbers) and have no structure (like ‘diagonal’).
Thus we can leave them off in our specification.
model1=list()
model1$R="diagonal and equal"
model1$Z=matrix(1,2,1) #matrix of 2 rows and 1 column
model1$A="scaling" #the default
# Fit the single state model, where the time series are assumed
# to be from the same population.
kem1 = MARSS(logRedds, model=model1)
We can print the AIC and AICc values for this model by typing kem1$AIC
and kem1$AICc.
How would we modify the above model to allow the observation error variances to be unique? We can do this in our second model:
model2=model1 #model2 is based on model1
model2$R="diagonal and unequal"
kem2 = MARSS(logRedds, model=model2)
It is possible that these surveys are measuring different population processes, so for our third model, we will fit a model with two different population
processes that share the same process parameters. For simplicity, we will keep the
trend and variance parameters the same. Mathematically, the model we are
fitting is:

\[
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t =
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_{t-1} +
\begin{bmatrix} u \\ u \end{bmatrix} + w_t ,\quad
w_t \sim \mathrm{MVN}\left(0,\; \begin{bmatrix} q & 0 \\ 0 & q \end{bmatrix}\right)
\]
\[
\begin{bmatrix} y_{aer} \\ y_{gnd} \end{bmatrix}_t =
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t +
\begin{bmatrix} 0 \\ 0 \end{bmatrix} +
\begin{bmatrix} v_{aer} \\ v_{gnd} \end{bmatrix}_t ,\quad
v_t \sim \mathrm{MVN}\left(0,\; \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix}\right)
\tag{14.4}
\]
We specify this in MARSS as


model3=list()
model3$Q="diagonal and equal"
model3$R="diagonal and equal"
model3$U="equal"
model3$Z="identity"
model3$A="zero"
kem3 = MARSS(logRedds, model=model3)
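To compare the three hypotheses side by side, the AICc values can be tabulated just as we did for the seasonal models in Chapter 12 (our addition):
data.frame(Model=c("one state, equal R", "one state, unequal R",
   "two states"),
   AICc=round(c(kem1$AICc, kem2$AICc, kem3$AICc), 1))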


Based on AIC, it looks like the best model is also the simplest one, with
one state vector (model 1). This suggests that the two different surveys are not
only measuring the same thing, but also have the same observation error variance.
Finally, we will make a plot of the model-predicted states (with +/- 2 s.e.)
and the log-transformed data (Figure 14.2).


Fig. 14.2. The data support the hypothesis that the two redd-count time series are
observations of the same population. The points are the data and the thick black
line is the estimated underlying state.


14.3 American kestrel abundance indices
In this example, we evaluate uncertainty in the structure of process variability (environmental stochasticity) using some bird survey data. Breeding Bird
Surveys have been conducted in the U.S. and Canada for 50+ years. In this
analysis, we focus on 3 time series of American kestrel (Falco sparverius)
abundance from adjacent Canadian provinces along a longitudinal gradient
(British Columbia, Alberta, Saskatchewan). Data have been collected annually, and corrected for changes in observer coverage and detectability.
14.3.1 Read in and look at the data
Figure 14.3 shows the data. The data are already log transformed.
birddat = t(kestrel[,2:4])
head(kestrel)
     Year British.Columbia Alberta Saskatchewan
[1,] 1969            0.754   0.460        0.000
[2,] 1970            0.673   0.899        0.192
[3,] 1971            0.734   1.133        0.280
[4,] 1972            0.589   0.528        0.386
[5,] 1973            1.405   0.789        0.451
[6,] 1974            0.624   0.528        0.234

We know that the surveys use the same design, so we will force observation error to be shared. Our uncertainty lies in whether these time series are
sampling the same population, and how environmental stochasticity varies by
subpopulation (if there are subpopulations). Our first model has one population trajectory (meaning there is one big panmictic BC/AB/SK population)
and each of these three surveys is an observation of this big population with
equal observation variances. Mathematically, the model is:
\[
x_t = x_{t-1} + u + w_t ,\quad w_t \sim \mathrm{N}(0, q)
\]
\[
\begin{bmatrix} y_{BC} \\ y_{AB} \\ y_{SK} \end{bmatrix}_t =
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} x_t +
\begin{bmatrix} 0 \\ a_2 \\ a_3 \end{bmatrix} +
\begin{bmatrix} v_{BC} \\ v_{AB} \\ v_{SK} \end{bmatrix}_t ,\quad
v_t \sim \mathrm{MVN}\left(0,\; \begin{bmatrix} r & 0 & 0 \\ 0 & r & 0 \\ 0 & 0 & r \end{bmatrix}\right)
\tag{14.5}
\]

In MARSS, we denote the model:
model.b1=list()
model.b1$R="diagonal and equal"
model.b1$Z=matrix(1,3,1)
kem.b1 = MARSS(birddat, model=model.b1, control=list(minit=100) )
As for the redd count example, we do not need to specify the structure of Q
and u since they are scalar and have no structure.


Fig. 14.3. The kestrel data.

kem.b1$AICc
[1] 20.9067
Let’s compare this to a model where we assume that there is a separate
population for British Columbia, Alberta, and Saskatchewan but they have
the same process parameters (trend and process variance). Mathematically,
this model is:
 
 
 


$$
\begin{bmatrix}x_{BC}\\ x_{AB}\\ x_{SK}\end{bmatrix}_t =
\begin{bmatrix}x_{BC}\\ x_{AB}\\ x_{SK}\end{bmatrix}_{t-1} +
\begin{bmatrix}u\\ u\\ u\end{bmatrix} + \mathbf{w}_t,
\text{ where } \mathbf{w}_t \sim \text{MVN}\left(0,\begin{bmatrix}q&0&0\\ 0&q&0\\ 0&0&q\end{bmatrix}\right)
$$

$$
\begin{bmatrix}y_{BC}\\ y_{AB}\\ y_{SK}\end{bmatrix}_t =
\begin{bmatrix}1&0&0\\ 0&1&0\\ 0&0&1\end{bmatrix}
\begin{bmatrix}x_{BC}\\ x_{AB}\\ x_{SK}\end{bmatrix}_t +
\begin{bmatrix}0\\ 0\\ 0\end{bmatrix} +
\begin{bmatrix}v_{BC}\\ v_{AB}\\ v_{SK}\end{bmatrix}_t,
\text{ where } \mathbf{v}_t \sim \text{MVN}\left(0,\begin{bmatrix}r&0&0\\ 0&r&0\\ 0&0&r\end{bmatrix}\right)
\qquad (14.6)
$$
This is specified as:
model.b2=list()
model.b2$Q="diagonal and equal"


model.b2$R="diagonal and equal"
model.b2$Z="identity"
model.b2$A="zero"
model.b2$U="equal"
kem.b2 = MARSS(birddat, model=model.b2)
The AICc for this model is
kem.b2$AICc
[1] 22.96714
Because these populations are surveyed over a relatively large geographic
area, it is reasonable to expect that environmental variation may differ between populations. For our third model, we will fit a model with separate
processes that are allowed to have unequal process parameters.
model.b3=model.b2 #model.b3 is based on model.b2
#we change the structure of Q and U
model.b3$Q="diagonal and unequal"
model.b3$U="unequal"
kem.b3 = MARSS(birddat, model=model.b3)
kem.b3$AICc
[1] 23.75125
Finally, for a fourth model, we will consider lumping Alberta and Saskatchewan, because those time series suggest similar trends. Mathematically, this model is:




 
  
$$
\begin{bmatrix}x_{BC}\\ x_{AB\text{-}SK}\end{bmatrix}_t =
\begin{bmatrix}x_{BC}\\ x_{AB\text{-}SK}\end{bmatrix}_{t-1} +
\begin{bmatrix}u\\ u\end{bmatrix} + \mathbf{w}_t,
\text{ where } \mathbf{w}_t \sim \text{MVN}\left(0,\begin{bmatrix}q&0\\ 0&q\end{bmatrix}\right)
$$

$$
\begin{bmatrix}y_{BC}\\ y_{AB}\\ y_{SK}\end{bmatrix}_t =
\begin{bmatrix}1&0\\ 0&1\\ 0&1\end{bmatrix}
\begin{bmatrix}x_{BC}\\ x_{AB\text{-}SK}\end{bmatrix}_t +
\begin{bmatrix}0\\ 0\\ a_3\end{bmatrix} +
\begin{bmatrix}v_{BC}\\ v_{AB}\\ v_{SK}\end{bmatrix}_t,
\text{ where } \mathbf{v}_t \sim \text{MVN}\left(0,\begin{bmatrix}r&0&0\\ 0&r&0\\ 0&0&r\end{bmatrix}\right)
\qquad (14.7)
$$

This model is specified as
model.b4=list()
model.b4$Q="diagonal and unequal"
model.b4$R="diagonal and equal"
model.b4$Z=factor(c("BC","AB-SK","AB-SK"))
model.b4$A="scaling"
model.b4$U="unequal"
kem.b4 = MARSS(birddat, model=model.b4)
kem.b4$AICc
[1] 14.76889
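To compare the four models directly, we can collect the AICc values from the fitted objects:
c(b1=kem.b1$AICc, b2=kem.b2$AICc, b3=kem.b3$AICc, b4=kem.b4$AICc)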


This last model was superior to the others, improving the AICc value by about 6 units relative to model 1. Figure 14.4 shows the fits for this model.


Fig. 14.4. Model 4 fits to the kestrel data.

15
Univariate dynamic linear models (DLMs)

15.1 Overview of dynamic linear models
In this chapter, we will use MARSS to analyze dynamic linear models (DLMs),
wherein the parameters in a regression model are treated as time-varying.
DLMs are used commonly in econometrics, but have received less attention in the ecological literature (cf. Lamon et al., 1998; Scheuerell and Williams, 2005). Our treatment of DLMs is rather cursory—we direct the reader to excellent textbooks by Pole et al. (1994) and Petris et al. (2009) for more in-depth
treatments of DLMs. The former focuses on Bayesian estimation whereas the
latter addresses both likelihood-based and Bayesian estimation methods.
We begin our description of DLMs with a static regression model, wherein
the ith observation is a linear function of an intercept, predictor variable(s),
and a random error term. For example, if we had one predictor variable (F),
we could write the model as
$$y_i = \alpha + \beta F_i + v_i, \qquad (15.1)$$

where the α is the intercept, β is the regression slope, Fi is the predictor
variable matched to the ith observation (yi ), and vi ∼ N(0, r). It is important
to note here that there is no implicit ordering of the index i. That is, we could
shuffle any/all of the (yi , Fi ) pairs in our dataset with no effect on our ability
to estimate the model parameters. We can write the model in Equation 15.1
using vector notation, such that
 

$$y_i = \begin{bmatrix}1 & F_i\end{bmatrix}\begin{bmatrix}\alpha\\ \beta\end{bmatrix} + v_i = F_i^\top \theta + v_i, \qquad (15.2)$$

Type RShowDoc("Chapter_UnivariateDLM.R",package="MARSS") at the R command line to open a file with all the code for the examples in this chapter.


where $F_i^\top = (1, F_i)$ and $\theta = (\alpha, \beta)^\top$.
In a DLM, however, the regression parameters are dynamic in that they
“evolve” over time. For a single observation at time t, we can write

$$y_t = F_t^\top \theta_t + v_t, \qquad (15.3)$$

where Ft is a column vector of regression variables at time t, θt is a column vector of regression parameters at time t, and vt ∼ N(0, r). This formulation presents two features that distinguish it from Equation 15.2. First, the observed data are explicitly time ordered (i.e., y = {y1, y2, y3, ..., yT}), which means the temporal ordering itself carries information. Second, the relationship between the observed datum and the predictor variables is unique at every time t (i.e., θ = {θ1, θ2, θ3, ..., θT}).
However, closer examination of Equation 15.3 reveals an apparent complication for parameter estimation. With only one datum at each time step t, we
could, at best, estimate only one regression parameter, and even then, the 1:1
correspondence between data and parameters would preclude any estimation
of parameter uncertainty. To address this shortcoming, we return to the time
ordering of model parameters. Rather than assume the regression parameters
are independent from one time step to another, we instead model them as an
autoregressive process where
$$\theta_t = G_t \theta_{t-1} + w_t, \qquad (15.4)$$

where

Gt is the parameter “evolution” matrix, and wt is a vector of process errors,
such that wt ∼ MVN(0, Q). The elements of Gt may be known and fixed a
priori, or unknown and estimated from the data. Although we allow for Gt to
be time-varying, we will typically assume that it is time invariant.
The idea is that the evolution matrix Gt deterministically maps the parameter space from one time step to the next, so the parameters at time t are
temporally related to those before and after. However, the process is corrupted
by stochastic error, which amounts to a degradation of information over time.
If the diagonal elements of Q are relatively large, then the parameters can vary widely from t to t + 1. If Q = 0, then θ1 = θ2 = · · · = θT and we are back to the static model in Equation 15.1.
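To build intuition for Equation 15.4, here is a minimal simulation of the random-walk case (Gt = I) that we use later in this chapter; all parameter values are illustrative:
set.seed(123)
TT = 50
Q = diag(c(0.02, 0.01)) #illustrative process variances
theta = matrix(NA, 2, TT)
theta[,1] = c(0, 0.5) #illustrative starting parameters
for(t in 2:TT) {
  w = t(chol(Q)) %*% rnorm(2) #w_t ~ MVN(0,Q)
  theta[,t] = theta[,t-1] + w #theta_t = G theta_{t-1} + w_t with G = I
}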

15.2 Example of a univariate DLM
Let’s consider an example from the literature. Scheuerell and Williams (2005)
used a DLM to examine the relationship between marine survival of Chinook
salmon and an index of ocean upwelling strength along the west coast of
the USA. Upwelling brings cool, nutrient-rich waters from the deep ocean
to shallower coastal areas. Scheuerell & Williams hypothesized that stronger
upwelling in April should create better growing conditions for phytoplankton,
which would then translate into more zooplankton. In turn, juvenile salmon

15.2 Example of a univariate DLM

205

(“smolts”) entering the ocean in May and June should find better foraging
opportunities. Thus, for smolts entering the ocean in year t,
$$\text{survival}_t = \alpha_t + \beta_t F_t + v_t \text{ with } v_t \sim \text{N}(0, r), \qquad (15.5)$$

and Ft is the coastal upwelling index (cubic meters of seawater per second per
100 m of coastline) for the month of April in year t.
Both the intercept and slope are time varying, so

$$\alpha_t = \alpha_{t-1} + w_t^{(1)} \text{ with } w_t^{(1)} \sim \text{N}(0, q_1); \text{ and} \qquad (15.6)$$
$$\beta_t = \beta_{t-1} + w_t^{(2)} \text{ with } w_t^{(2)} \sim \text{N}(0, q_2). \qquad (15.7)$$

If we define $\theta_t = (\alpha_t, \beta_t)^\top$, $G_t = I$ for all $t$, $w_t = (w_t^{(1)}, w_t^{(2)})^\top$, and $Q = \text{diag}(q_1, q_2)$, we get Equation 15.4. If we define $y_t = \text{survival}_t$ and $F_t = (1, F_t)^\top$, we can write out the full univariate DLM as a state-space model with the following form:
$$\theta_t = G_t \theta_{t-1} + w_t \text{ with } w_t \sim \text{MVN}(0, Q);$$
$$y_t = F_t^\top \theta_t + v_t \text{ with } v_t \sim \text{N}(0, r); \qquad (15.8)$$
$$\theta_0 \sim \text{MVN}(\pi_0, \Lambda_0).$$
Equation 15.8 is, not surprisingly, equivalent to our standard MARSS model:
$$x_t = B_t x_{t-1} + u_t + C_t c_t + w_t \text{ with } w_t \sim \text{MVN}(0, Q_t);$$
$$y_t = Z_t x_t + a_t + D_t d_t + v_t \text{ with } v_t \sim \text{MVN}(0, R_t); \qquad (15.9)$$
$$x_0 \sim \text{MVN}(\pi, \Lambda);$$

where $x_t = \theta_t$, $B_t = G_t$, $u_t = C_t = c_t = 0$, $y_t = y_t$ (i.e., $y_t$ is $1 \times 1$), $Z_t = F_t^\top$, $a_t = D_t = d_t = 0$, and $R_t = r$ (i.e., $R_t$ is $1 \times 1$).
15.2.1 Fitting a univariate DLM with MARSS
Now let’s go ahead and analyze the DLM specified in Equations 15.5–15.8. We
begin by getting the data set, which has 3 columns for 1) the year the salmon
smolts migrated to the ocean (year), 2) logit-transformed survival1 (logit.s),
and 3) the coastal upwelling index for April (CUI.apr). There are 42 years of
data (1964–2005).
# load the data
data(SalmonSurvCUI)
# get time indices
1

Survival in the original context was defined as the proportion of juveniles that
survive to adulthood. Thus, we use the logit function, defined as logit(p) =
loge (p/[1 − p]), to map survival from the open interval (0,1) onto the interval
(−∞, ∞), which allows us to meet our assumption of normally distributed observation errors.

206

15 Univariate dynamic linear models

years = SalmonSurvCUI[,1]
# number of years of data
TT = length(years)
# get response data: logit(survival)
dat = matrix(SalmonSurvCUI[,2],nrow=1)
As we have seen in other case studies, standardizing our covariate(s) to
have zero-mean and unit-variance can be helpful in model fitting and interpretation. In this case, it’s a good idea because the variance of CUI.apr is orders
of magnitude greater than survival.
# get regressor variable
CUI = SalmonSurvCUI[,3]
# z-score the CUI
CUI.z = matrix((CUI - mean(CUI))/sqrt(var(CUI)), nrow=1)
# number of regr params (slope + intercept)
m = dim(CUI.z)[1] + 1


Plots of logit-transformed survival and the z-scored April upwelling index are
shown in Figure 15.1.


Fig. 15.1. Time series of logit-transformed marine survival estimates for Snake
River spring/summer Chinook salmon (top) and z-scores of the coastal upwelling index at 45N 125W (bottom). The x-axis indicates the year that the salmon smolts
entered the ocean.

Next, we need to set up the appropriate matrices and vectors for MARSS.
Let’s begin with those for the process equation because they are straightforward.

# for process eqn
B = diag(m)                    # 2x2; Identity
U = matrix(0,nrow=m,ncol=1)    # 2x1; both elements = 0
Q = matrix(list(0),m,m)        # 2x2; all 0 for now
diag(Q) = c("q1","q2")         # 2x2; diag = (q1,q2)

Defining the correct form for the observation model is a little trickier,
however, because of how we model the effect(s) of explanatory variables. In
a DLM, we need to use Zt (instead of dt ) as the matrix of known regressors/drivers that affect yt , and xt (instead of Dt ) as the regression parameters.
Therefore, we need to set Zt equal to an n x m x T array, where n is the number of response variables (= 1; yt is univariate), m is the number of regression
parameters (= intercept + slope = 2), and T is the length of the time series
(= 42).
# for observation eqn
Z = array(NA, c(1,m,TT))   # NxMxT; empty for now
Z[1,1,] = rep(1,TT)        # Nx1; 1's for intercept
Z[1,2,] = CUI.z            # Nx1; regr variable
A = matrix(0)              # 1x1; scalar = 0
R = matrix("r")            # 1x1; scalar = r

Lastly, we need to define our lists of initial starting values and model
matrices/vectors.
# only need starting values for regr parameters
inits.list = list(x0=matrix(c(0, 0), nrow=m))
# list of model matrices & vectors
mod.list = list(B=B, U=U, Q=Q, Z=Z, A=A, R=R)
And now we can fit our DLM with MARSS.
# fit univariate DLM
dlm1 = MARSS(dat, inits=inits.list, model=mod.list)
Success! abstol and log-log tests passed at 115 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 115 iterations.
Log-likelihood: -40.03813
AIC: 90.07627
AICc: 91.74293

      Estimate
R.r    0.15708
Q.q1   0.11264
Q.q2   0.00564
x0.X1 -3.34023
x0.X2 -0.05388
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.


Notice that the MARSS output does not list any estimates of the regression
parameters themselves. Why not? Remember that in a DLM the matrix of
states (x) contains the estimates of the regression parameters (θ). Therefore,
we need to look in dlm1$states for the MLEs of the regression parameters,
and in dlm1$states.se for their standard errors.
Time series of the estimated intercept and slope are shown in Figure 15.2.
It appears as though the intercept is much more dynamic than the slope, as
indicated by a much larger estimate of process variance for the former (Q.q1).
In fact, although the effect of April upwelling appears to be increasing over
time, it doesn’t really become important as an explanatory variable until
about 1990 when the approximate 95% confidence interval for the slope no
longer overlaps zero.
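A rough sketch of how Figure 15.2 can be drawn from the fitted object (the axis labels are simplified relative to the figure):
par(mfrow=c(2,1))
for(i in 1:m) {
  mn = dlm1$states[i,]
  se = dlm1$states.se[i,]
  plot(years, mn, type="l", lwd=2, xlab="Year of ocean entry",
    ylab=c("intercept","slope")[i], ylim=c(min(mn-2*se), max(mn+2*se)))
  lines(years, mn + 2*se) #mean +/- 2 standard deviations
  lines(years, mn - 2*se)
}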


Fig. 15.2. Time series of estimated mean states (thick lines) for the intercept (top)
and slope (bottom) parameters from the univariate DLM specified by Equations
15.5–15.8. Thin lines denote the mean ± 2 standard deviations.


15.3 Forecasting with a univariate DLM
Scheuerell and Williams (2005) were interested in how well upwelling could be
used to actually forecast expected survival of salmon, so let’s look at how well
our model does in that context. To do so, we need the predictive distributions
for the regression parameters and observation.
Beginning with our definition for the distribution of the parameters at
time t = 0, θ0 ∼ MVN(π0 , Λ0 ) in Equation 15.8, we write
$$\theta_{t-1}|y_{1:t-1} \sim \text{MVN}(\pi_{t-1}, \Lambda_{t-1}) \qquad (15.10)$$

to indicate the distribution of θ at time t − 1 conditioned on the observed
data through time t − 1 (i.e., y1:t−1 ). Then, we can write the one-step ahead
predictive distribution for θt given y1:t−1 as
$$\theta_t|y_{1:t-1} \sim \text{MVN}(\eta_t, \Phi_t), \text{ where}$$
$$\eta_t = G_t \pi_{t-1}, \text{ and} \qquad (15.11)$$
$$\Phi_t = G_t \Lambda_{t-1} G_t^\top + Q.$$
Consequently, the one-step ahead predictive distribution for the observation
at time t given y1:t−1 is
$$y_t|y_{1:t-1} \sim \text{N}(\zeta_t, \Psi_t), \text{ where}$$
$$\zeta_t = F_t \eta_t, \text{ and} \qquad (15.12)$$
$$\Psi_t = F_t \Phi_t F_t^\top + R.$$
15.3.1 Forecasting a univariate DLM with MARSS
Working from Equation 15.12, we can now use MARSS to compute the expected value of the forecast at time t ( E[yt |y1:t−1 ] = ζt ), and its variance
( var[yt |y1:t−1 ] = Ψt ). For the expectation, we need Ft ηt . Recall that Ft is our
1×m matrix of explanatory variables at time t (Ft is called Zt in MARSS notation). The one-step ahead forecasts of the parameters at time t (ηt) are calculated as part of the Kalman filter algorithm—they are termed $\tilde{x}_t^{t-1}$ in MARSS notation and stored as 'xtt1' in the list produced by the MARSSkfss() function.
# get list of Kalman filter output
kf.out = MARSSkfss(dlm1)
# forecasts of regr parameters; 2xT matrix
eta = kf.out$xtt1
# ts of E(forecasts)
fore.mean = vector()
for(t in 1:TT) {
fore.mean[t] = Z[,,t] %*% eta[,t,drop=F]
}


For the variance of the forecasts, we need Ft Φt Ft> + R. As with the mean,
Ft ≡ Zt . The variances of the one-step ahead forecasts of the parameters at
time t (Φt ) are also calculated as part of the Kalman filter algorithm—they
are stored as 'Vtt1' in the list produced by the MARSSkfss() function. Lastly,
the observation variance R is part of the standard MARSS output.
# variance of regr parameters; 1x2xT array
Phi = kf.out$Vtt1
# obs variance; 1x1 matrix
R.est = coef(dlm1, type="matrix")$R
# ts of Var(forecasts)
fore.var = vector()
for(t in 1:TT) {
tZ = matrix(Z[,,t],m,1) # transpose of Z
fore.var[t] = Z[,,t] %*% Phi[,,t] %*% tZ + R.est
}


Plots of the model mean forecasts with their estimated uncertainty are
shown in Figure 15.3. Nearly all of the observed values fell within the approximate prediction interval. Notice that we have a forecasted value for the first
year of the time series (1964), which may seem at odds with our notion of
forecasting at time t based on data available only through time t − 1. In this
case, however, MARSS is actually estimating the states at t = 0 (θ0 ), which
allows us to compute a forecast for the first time point.
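A minimal sketch of Figure 15.3, using fore.mean and fore.var computed above (the 1.96 multiplier gives the approximate 95% prediction interval):
plot(years, dat[1,], pch=16, col="blue",
  xlab="Year of ocean entry", ylab="Logit(s)")
lines(years, fore.mean, lwd=2)
lines(years, fore.mean + 1.96*sqrt(fore.var), lty=2)
lines(years, fore.mean - 1.96*sqrt(fore.var), lty=2)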


Fig. 15.3. Time series of logit-transformed survival data (blue dots) and model
mean forecasts (thick line). Thin lines denote the approximate 95% prediction intervals.

Although our model forecasts look reasonable in logit-space, it is worthwhile to examine how well they look when the survival data and forecasts
are back-transformed onto the interval [0,1] (Figure 15.4). In that case, the
accuracy does not seem to be affected, but the precision appears much worse,


especially during the early and late portions of the time series when survival
is changing rapidly.
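Here is a hedged sketch of the back-transformation used for Figure 15.4; inv.logit is a helper function defined here, not a MARSS function:
inv.logit = function(x) { 1/(1 + exp(-x)) } #inverse of the logit function
plot(years, inv.logit(dat[1,]), pch=16, col="blue",
  xlab="Year of ocean entry", ylab="Survival")
lines(years, inv.logit(fore.mean), lwd=2)
lines(years, inv.logit(fore.mean + 1.96*sqrt(fore.var)), lty=2)
lines(years, inv.logit(fore.mean - 1.96*sqrt(fore.var)), lty=2)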


Fig. 15.4. Time series of survival data (blue dots) and model mean forecasts (thick
line). Thin lines denote the approximate 95% prediction intervals.

15.3.2 DLM forecast diagnostics
As with other time series models, evaluation of a DLM should include some
model diagnostics. In a forecasting context, we are often interested in the
forecast errors, which are simply the observed data minus the forecasts (et =
yt − ζt ). In particular, the following assumptions should hold true for et :
1. et ∼ N(0, σ²);
2. cov(et , et−k ) = 0.
In the literature on state-space models, the set of et are commonly referred to as “innovations”. MARSS() calculates the innovations as part of the
Kalman filter algorithm—they are stored as 'Innov' in the list produced by
the MARSSkfss() function.
# forecast errors
innov = kf.out$Innov
Let’s see if our innovations meet the model assumptions. Beginning with
(1), we can use a Q-Q plot to see whether the innovations are normally distributed with a mean of zero. We’ll use the qqnorm() function to plot the
quantiles of the innovations on the y-axis versus the theoretical quantiles from
a Normal distribution on the x-axis. If the 2 distributions are similar, the
points should fall on the line defined by y = x.
# Q-Q plot of innovations
qqnorm(t(innov), main="", pch=16, col="blue")


# add y=x line for easier interpretation
qqline(t(innov))


Fig. 15.5. Q-Q plot of the forecast errors (innovations) for the DLM specified in
Equations 15.5–15.8.

The Q-Q plot (Figure 15.5) indicates that the innovations appear to be
more-or-less normally distributed (i.e., most points fall on the line). Furthermore, it looks like the mean of the innovations is about 0, but we should
use a more reliable test than simple visual inspection. We can formally test
whether the mean of the innovations is significantly different from 0 by using
a one-sample t-test based on a null hypothesis of E(et) = 0. To do so, we will use the function t.test() and base our inference on a significance level of
α = 0.05.
# p-value for t-test of H0: E(innov) = 0
t.test(t(innov), mu=0)$p.value
[1] 0.4840901
The p-value is much greater than 0.05, so we cannot reject the null hypothesis that E(et) = 0.
Moving on to assumption (2), we can use the sample autocorrelation function (ACF) to examine whether the innovations covary with a time-lagged
version of themselves. Using the acf() function, we can compute and plot
the correlations of et and et−k for various values of k. Assumption (2) will be
met if none of the correlation coefficients exceed the 95% confidence intervals defined by $\pm z_{0.975}/\sqrt{n}$.
# plot ACF of innovations
acf(t(innov), lag.max=10)
The ACF plot (Figure 15.6) shows no significant autocorrelation in the innovations at lags 1–10, so it looks like both of our model assumptions have
indeed been met.


Fig. 15.6. Autocorrelation plot of the forecast errors (innovations) for the DLM
specified in Equations 15.5–15.8. Horizontal blue lines define the upper and lower
95% confidence intervals.


Type RShowDoc("Chapter_MLR.R",package="MARSS") at the R command line to
open a file with all the code for the examples in this chapter.

16
Multivariate linear regression

This chapter shows how to write regression models with multivariate responses
and multivariate explanatory variables in MARSS form. R has many excellent packages for multiple linear regression. We will be showing how to use the
MARSS() function to fit these models, but note that R’s standard linear regression functions would be much better choices in most cases. The purpose of
this chapter is to show the relationship between multivariate linear regression
and the MARSS equation.
In a classic linear regression, the response variable (y) is univariate and there may be one or more explanatory variables (d1, d2, ...) plus an optional intercept (α):
$$y_t = \alpha + \sum_k \beta_k d_k + e_t, \text{ where } e_t \sim \text{N}(0, \sigma^2) \qquad (16.1)$$

Here the subscript t is used since we are working with time-series data. Explanatory variables are normally denoted x in linear regression; however, x is not used here since x is already used in MARSS models to denote the hidden process trajectory. Instead d is used when the explanatory variables appear in the y part of the equation (and c if they appear in the x part).
This chapter will start with classical linear regression where the explanatory variables are treated as inputs that are known without error and where
we are trying to explain the variation in y with our explanatory variables. We
will extend this to the case of autocorrelated errors.

16.1 Univariate linear regression
A vanilla linear regression where our data are time ordered but we treat them
as independent can be written
$$y_t = \alpha + \beta_1 d_{1,t} + \beta_2 d_{2,t} + e_t, \qquad (16.2)$$


where the d are our explanatory variables. This model can be written in many different ways as a MARSS equation. Here we use a specific form where the i.i.d. component of the errors is vt in the y part of the MARSS equation and the autocorrelated errors appear as xt in the y equation. Specifying the MARSS model this way allows us to use the EM algorithm to fit the model, which will prove to be important.
 
$$
y_t = \alpha + \begin{bmatrix}\beta_1 & \beta_2 & \dots\end{bmatrix}
\begin{bmatrix}d_{1,t}\\ d_{2,t}\\ \vdots\end{bmatrix} + v_t + x_t, \quad v_t \sim \text{N}(0, r)
$$
$$x_t = b x_{t-1} + w_t, \quad w_t \sim \text{N}(0, q) \qquad (16.3)$$
$$x_0 = 0$$
The vt are the i.i.d. errors and the xt are the AR(1) errors.
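To make the error structure concrete, here is a minimal simulation of Equation 16.3 with one explanatory variable; all parameter values are illustrative:
set.seed(1)
TT = 100
d1 = rnorm(TT) #explanatory variable
x = as.numeric(arima.sim(n=TT, list(ar=0.7), sd=sqrt(0.1))) #AR(1) errors
v = rnorm(TT, 0, sqrt(0.05)) #i.i.d. errors
y = 2 + 0.5*d1 + v + x #alpha + beta1*d1,t + v_t + x_t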
16.1.1 Univariate response using the Longley dataset: example 1
We will start by using an example from Chapter 6 in Linear Models in R
(Faraway, 2004). This example uses the built-in R dataset “longley” which has
the number of people employed from 1947 to 1962 and a number of predictors.
For this example we will regress the number of people employed against gross national product (GNP) and population size (following Faraway).
Mathematically, the model we are fitting is



$$
\text{Employed}_t = \alpha + \begin{bmatrix}\beta_{GNP} & \beta_{Pop}\end{bmatrix}
\begin{bmatrix}\text{GNP}_t\\ \text{Pop}_t\end{bmatrix} + v_t, \quad v_t \sim \text{N}(0, r)
\qquad (16.4)
$$
x does not appear in the vanilla linear regression since we do not have autocorrelated errors (yet). We are trying to estimate α (intercept), βGNP and
βPop .
A full multivariate MARSS model looks like
$$y_t = Z x_t + a + D d_t + v_t, \text{ where } v_t \sim \text{MVN}(0, R)$$
$$x_t = B x_{t-1} + u + C c_t + w_t, \text{ where } w_t \sim \text{MVN}(0, Q) \qquad (16.5)$$

We need to specify the parameters in Equation 16.5 such that we get Equation 16.4.
First, we load the data and set up y, the response variable Employed, as
a matrix with time going across the columns.
data(longley)
Employed = matrix(longley$Employed, nrow=1)
Second, create a list to hold our model specification.
longley.model=list()


Fig. 16.1. Employment time series from the Longley dataset.

Set the u, Q and x0 parameters to 0. We will also set a and C to 0 and B and
Z to identity although this is not necessary since these are the defaults.
longley.model$U=longley.model$Q="zero"
longley.model$C="zero"
longley.model$B=longley.model$Z="identity"
longley.model$x0="zero"
longley.model$tinitx=0
We will estimate R; this is the variance of the i.i.d. errors (residuals).
longley.model$R=matrix("r")
The D matrix has the two β (slope) parameters for GNP and Population, and a has the intercept. (A better way to fit the model is to put the intercept into D by adding a row of 1s to d and putting the intercept parameter on the first row of D. This reduces by one the number of matrices being estimated by the EM algorithm. It is not done here just so the equations look more like standard linear regression equations.)
longley.model$A=matrix("intercept")
longley.model$D=matrix(c("GNP","Pop"),nrow=1)


Last, we set up our explanatory variables. This is the d matrix and we need
each explanatory variable in a row and time across the columns.
longley.model$d = rbind(longley$GNP, longley$Population)
Now we can fit the model with the MARSS() function:
mod1=MARSS(Employed, model=longley.model)
and look at the estimates.
coef(mod1, type="vector")
method="BFGS" can also be used and gives similar results.
We can compare the fit to that from lm() and see that we get the same
estimates:
mod1.lm=lm(Employed ~ GNP + Population, data=longley)
coef(mod1.lm)
(Intercept)         GNP  Population
88.93879831  0.06317244 -0.40974292

16.1.2 Univariate response using auto-correlated errors: example 1
As Faraway (2004) discusses, the errors in this dataset are temporally correlated. We can
model the errors as an AR(1) process to account for this. This changes our
model to



$$
\text{Employed}_t = \alpha + \begin{bmatrix}\beta_{GNP} & \beta_{Pop}\end{bmatrix}
\begin{bmatrix}\text{GNP}_t\\ \text{Pop}_t\end{bmatrix} + v_t + x_t, \quad v_t \sim \text{N}(0, r)
$$
$$x_t = b x_{t-1} + w_t, \quad w_t \sim \text{N}(0, q) \qquad (16.6)$$
$$x_0 = 0$$
We assume the AR(1) errors have mean 0, so u = 0 in the xt equation. Setting u to anything else would make the mean of our errors equal to u/(1 − b) for −1 < b < 1. This would lead to two mean levels in our model, α and u/(1 − b), and we would not be able to estimate both. Notice that the model is somewhat confounded since if b = 0 then xt is i.i.d. error just like vt. In that case, either q or r would be redundant, and it is thus possible that either r or q will go to zero.
Then we rewrite the model list from our vanilla linear regression to correspond to this MARSS model with AR(1) errors. We estimate b (called φ here)
and q.
longley.ar1=longley.model
longley.ar1$B=matrix("phi")
longley.ar1$Q=matrix("q")
Now we could fit the model with the MARSS() function as before
mod2=MARSS(Employed, model=longley.ar1)

16.1 Univariate linear regression

219

however, this is a difficult model to fit and takes a long, long time to converge
(using method="BFGS" helps a little but not much). We can improve behavior
by using the fit of the model with i.i.d. errors as initial conditions for D and
a.
inits=list(A=coef(mod1)$A, D=coef(mod1)$D)
mod2=MARSS(Employed, model=longley.ar1, inits=inits, control=list(maxit=1000))
Warning! Abstol convergence only. Maxit (=1000) reached before log-log convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only no log-log convergence.
maxit (=1000) reached before log-log convergence.
The likelihood and params might not be at the ML values.
Try setting control$maxit higher.
Log-likelihood: -10.53742
AIC: 33.07484
AICc: 42.40818
            Estimate
A.intercept 95.36017
R.r          0.00247
B.phi        0.35094
Q.q          0.21602
D.GNP        0.06749
D.Pop       -0.47840
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Convergence warnings
Warning: the R.r parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
We can compare the fit using gls() (nlme package) and see that we get the
same estimates:
require(nlme)
mod2.gls=gls(Employed ~ GNP + Population,
correlation=corAR1(), data=longley, method="ML")
mod2.gls.phi = coef(mod2.gls$modelStruct[[1]], unconstrained=FALSE)
c(mod2.gls.phi, coef(mod2.gls), logLik(mod2.gls))
        Phi (Intercept)         GNP  Population             
 0.36511964 96.09369245  0.06822305 -0.48715545 -10.47396091


Note we need to set method="ML" to maximize the likelihood because the default for gls() is restricted maximum likelihood (method="REML"), which gives a different answer from the MARSS() function since MARSS() is maximizing the likelihood.
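As a quick check of this point (a sketch; the call is the same as above but without method="ML"), the default REML fit returns different estimates:
mod2.gls.reml = gls(Employed ~ GNP + Population,
  correlation=corAR1(), data=longley)
coef(mod2.gls.reml)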
16.1.3 Univariate response using the Longley dataset: example 2
The full Longley dataset is often used to test the performance of numerical
methods for fitting linear regression models because it has severe collinearity
problems (Figure 16.2). We can compare the EM and BFGS algorithms for
the full dataset and see how fitting a MARSS model with the BFGS algorithm
leads to estimates far from the maximum-likelihood values for this problem.


Fig. 16.2. Pairs plot showing collinearity in the Longley explanatory variables.

We can fit a regression of Employed against all the Longley explanatory variables using the following code. The mathematical model is the same as in Equation 16.4 except that instead of two explanatory variables we have all six shown in Figure 16.2.


eVar.names = colnames(longley)[-7]
eVar = t(longley[,eVar.names])
longley.model=list()
longley.model$U=longley.model$Q="zero"
longley.model$C="zero"
longley.model$B=longley.model$Z="identity"
longley.model$A=matrix("intercept")
longley.model$R=matrix("r")
longley.model$D=matrix(eVar.names,nrow=1)
longley.model$d = eVar
longley.model$x0="zero"
longley.model$tinitx=0
And then we fit as usual. We will fit with the EM-algorithm (the default)
and compare to BFGS.
mod3.em=MARSS(Employed, model=longley.model)
mod3.bfgs=MARSS(Employed, model=longley.model, method="BFGS")
Here are the EM estimates with the log-likelihood.
par.names = c("A.intercept", paste("D",eVar.names,sep="."))
c(coef(mod3.em, type="vector")[par.names], logLik=mod3.em$logLik)
   A.intercept D.GNP.deflator          D.GNP   D.Unemployed
 -3.482259e+03   1.506187e-02  -3.581918e-02  -2.020230e-02
D.Armed.Forces   D.Population         D.Year         logLik
 -1.033227e-02  -5.110412e-02   1.829151e+00   9.066497e-01

Compared to the BFGS estimates:
c(coef(mod3.bfgs, type="vector")[par.names], logLik=mod3.bfgs$logLik)
   A.intercept D.GNP.deflator          D.GNP   D.Unemployed
 -14.062098829   -0.052705201    0.070642032   -0.004298481
D.Armed.Forces   D.Population         D.Year         logLik
  -0.005744197   -0.412771922    0.055610012   -6.996818721

And compared to the estimates from the lm() function:
mod3.lm=lm(Employed ~ 1 + GNP.deflator + GNP + Unemployed
+ Armed.Forces + Population + Year, data=longley)
c(coef(mod3.lm),logLik=logLik(mod3.lm))
  (Intercept)  GNP.deflator           GNP    Unemployed
-3.482259e+03  1.506187e-02 -3.581918e-02 -2.020230e-02
 Armed.Forces    Population          Year        logLik
-1.033227e-02 -5.110411e-02  1.829151e+00  9.066497e-01


As you can see the BFGS algorithm struggles with the ridge-like likelihood
caused by the collinearity in the explanatory variables.
We can also compare the performance of the model with AR(1) errors.
This is Equation 16.6 but with all six explanatory variables. We set up the MARSS model for a linear regression with correlated errors as before, with the addition of b (called φ) and q. (Notice that x0 is set at 0. The model has a hard time fitting x0 because the time series is short. Estimating x0, or using a diffuse prior by setting V0 big, leads to poor estimates. Since this is just the error term, we set x0 = 0 since the mean of the errors is assumed to be 0.)
longley.correrr.model=longley.model
longley.correrr.model$B=matrix("phi")
longley.correrr.model$Q=matrix("q")
We fit as usual and compare the EM algorithm (the default) to fits using BFGS. We will use the estimates from the model with i.i.d. errors as initial conditions.
inits=list(A=coef(mod3.em)$A, D=coef(mod3.em)$D)
mod4.em=MARSS(Employed, model=longley.correrr.model, inits=inits)
mod4.bfgs=MARSS(Employed, model=longley.correrr.model, inits=inits, method="BFGS")
Here are the EM estimates with the log-likelihood. We only show φ (the b
term in the AR(1) error equation) and the log-likelihood.
c(coef(mod4.em, type="vector")["B.phi"], logLik=mod4.em$logLik)
     B.phi     logLik
-0.7737392  4.5374543

Compared to the BFGS estimates:
c(coef(mod4.bfgs, type="vector")["B.phi"], logLik=mod4.bfgs$logLik)
    B.phi    logLik
0.8368962 0.9066497
And compared to the estimates from the gls() function:
mod4.gls=gls(Employed ~ 1 + GNP.deflator + GNP + Unemployed
+ Armed.Forces + Population + Year,
correlation=corAR1(), data=longley, method="ML")
mod4.gls.phi = coef(mod4.gls$modelStruct[[1]], unconstrained=FALSE)
c(mod4.gls.phi, logLik=logLik(mod4.gls))
       Phi     logLik
-0.7288687  4.3865475

Again we see that the BFGS algorithm struggles with the ridge-like likelihood
caused by the collinearity in the explanatory variables.


16.2 Multivariate response example using longitudinal data
We will illustrate linear regression with a multivariate response using longitudinal data from a sleep study on 18 subjects from the lme4 R package. These
are data on reaction time of subjects after 0 to 9 days of being restricted to 3
hours of sleep.
We load the data from the lme4 package:
data(sleepstudy,package="lme4")


Fig. 16.3. Plot of the sleep study data (package lme4).

We set up the data into the form required for the MARSS() function.
#number of subjects
nsub = length(unique(sleepstudy$Subject))
ndays=length(sleepstudy$Days)/nsub
#each subject is a row with day across the columns
dat = matrix(sleepstudy$Reaction, nsub, ndays,byrow=TRUE)
rownames(dat)=paste("sub",unique(sleepstudy$Subject),sep=".")


#the day number 0 to 9 is the explanatory variable
exp.var=matrix(sleepstudy$Days, 1, ndays,byrow=TRUE)
Let’s start with a simple regression where each subject has a separate
intercept (reaction time at day 0) but the slope (increase in reaction time
with each successive day) is the same across the 18 subjects. Mathematically
the model is
 
   


$$
\begin{bmatrix}resp_1\\ resp_2\\ \vdots\\ resp_{18}\end{bmatrix}_t =
\begin{bmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_{18}\end{bmatrix} +
\begin{bmatrix}\beta\\ \beta\\ \vdots\\ \beta\end{bmatrix} day_t +
\begin{bmatrix}v_1\\ v_2\\ \vdots\\ v_{18}\end{bmatrix}_t, \quad
\begin{bmatrix}v_1\\ v_2\\ \vdots\\ v_{18}\end{bmatrix}_t \sim
\text{N}\left(0, \begin{bmatrix}r&0&\dots&0\\ 0&r&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&0&r\end{bmatrix}\right)
\qquad (16.7)
$$
The response time of subject i is a subject-specific intercept (αi) plus an effect of day at time t (0, 1, 2, etc.) that does not vary by subject, plus error that is i.i.d. across subjects and days.
We specify and fit this model as follows
sleep.model=list(
A="unequal",B="zero", x0="zero", U="zero",
D=matrix("b1",nsub,1), d=exp.var, tinitx=0, Q="zero")
sleep.mod1 = MARSS(dat, model=sleep.model)
This is the same as the following with lm():
sleep.lm1 = lm(Reaction ~ -1 + Subject + Days, data=sleepstudy)
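As a rough check (a sketch; we assume the parameter orderings of the two functions line up, with the 18 intercepts followed by the common slope):
cbind(MARSS=coef(sleep.mod1, type="vector"), lm=coef(sleep.lm1))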
Now let’s allow each subject to have different slopes (increase in reaction
time with each successive day) across subjects. This model is


   
 
$$
\begin{bmatrix}resp_1\\ resp_2\\ \vdots\\ resp_{18}\end{bmatrix}_t =
\begin{bmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_{18}\end{bmatrix} +
\begin{bmatrix}\beta_1\\ \beta_2\\ \vdots\\ \beta_{18}\end{bmatrix} day_t +
\begin{bmatrix}v_1\\ v_2\\ \vdots\\ v_{18}\end{bmatrix}_t, \quad
\begin{bmatrix}v_1\\ v_2\\ \vdots\\ v_{18}\end{bmatrix}_t \sim
\text{N}\left(0, \begin{bmatrix}r&0&\dots&0\\ 0&r&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&0&r\end{bmatrix}\right)
\qquad (16.8)
$$
We specify and fit this model as
sleep.model=list(
A="unequal",B="zero", x0="zero", U="zero",
D="unequal", d=exp.var, tinitx=0, Q="zero")
sleep.mod2 = MARSS(dat, model=sleep.model, silent=TRUE)


This is the same as the following with lm():
sleep.lm2 = lm(Reaction ~ 0 + Subject + Days:Subject, data = sleepstudy)
We can repeat the above but allow the residual variance to differ across
subjects by setting R="diagonal and unequal". This model is


   
 
$$
\begin{bmatrix}resp_1\\ resp_2\\ \vdots\\ resp_{18}\end{bmatrix}_t =
\begin{bmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_{18}\end{bmatrix} +
\begin{bmatrix}\beta_1\\ \beta_2\\ \vdots\\ \beta_{18}\end{bmatrix} day_t +
\begin{bmatrix}v_1\\ v_2\\ \vdots\\ v_{18}\end{bmatrix}_t, \quad
\begin{bmatrix}v_1\\ v_2\\ \vdots\\ v_{18}\end{bmatrix}_t \sim
\text{N}\left(0, \begin{bmatrix}r_1&0&\dots&0\\ 0&r_2&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&0&r_{18}\end{bmatrix}\right)
\qquad (16.9)
$$
sleep.model=list(
A="unequal",B="zero", x0="zero", U="zero",
D="unequal", d=exp.var, tinitx=0, Q="zero",
R="diagonal and unequal")
sleep.mod3 = MARSS(dat, model=sleep.model, silent=TRUE)
Or we can allow AR(1) errors across subjects and allow each subject to
have its own AR(1) parameters for this error. This model is


   
   
$$
\begin{bmatrix}resp_1\\ resp_2\\ \vdots\\ resp_{18}\end{bmatrix}_t =
\begin{bmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_{18}\end{bmatrix} +
\begin{bmatrix}\beta_1\\ \beta_2\\ \vdots\\ \beta_{18}\end{bmatrix} day_t +
\begin{bmatrix}v_1\\ v_2\\ \vdots\\ v_{18}\end{bmatrix}_t +
\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_{18}\end{bmatrix}_t, \quad
\begin{bmatrix}v_1\\ v_2\\ \vdots\\ v_{18}\end{bmatrix}_t \sim
\text{N}\left(0, \begin{bmatrix}r_1&0&\dots&0\\ 0&r_2&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&0&r_{18}\end{bmatrix}\right)
$$
$$
\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_{18}\end{bmatrix}_t =
\begin{bmatrix}b_1&0&\dots&0\\ 0&b_2&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&0&b_{18}\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_{18}\end{bmatrix}_{t-1} +
\begin{bmatrix}w_1\\ w_2\\ \vdots\\ w_{18}\end{bmatrix}_t, \quad
\begin{bmatrix}w_1\\ w_2\\ \vdots\\ w_{18}\end{bmatrix}_t \sim
\text{N}\left(0, \begin{bmatrix}q_1&0&\dots&0\\ 0&q_2&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&0&q_{18}\end{bmatrix}\right)
\qquad (16.10)
$$
We fit this model as
inits=list(A=coef(sleep.mod3)$A,D=coef(sleep.mod3)$D)
#use the estimates from model 3 as initial conditions


sleep.model=list(
A="unequal",B="diagonal and unequal", x0="zero", U="zero",
D="unequal", d=exp.var, tinitx=0, Q="diagonal and unequal",
R="diagonal and unequal")
sleep.mod4 = MARSS(dat, model=sleep.model, inits=inits, silent=TRUE)
It is not obvious how to specify these last two models using gls(), or if it is possible. We can also allow each subject to have its own error process but specify that the parameters of these (φ, q and r) are the same across subjects. We do this by using "diagonal and equal". Mathematically this model is
   
   


$$
\begin{bmatrix}resp_1\\ resp_2\\ \vdots\\ resp_{18}\end{bmatrix}_t =
\begin{bmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_{18}\end{bmatrix} +
\begin{bmatrix}\beta_1\\ \beta_2\\ \vdots\\ \beta_{18}\end{bmatrix} day_t +
\begin{bmatrix}v_1\\ v_2\\ \vdots\\ v_{18}\end{bmatrix}_t +
\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_{18}\end{bmatrix}_t, \quad
\begin{bmatrix}v_1\\ v_2\\ \vdots\\ v_{18}\end{bmatrix}_t \sim
\text{N}\left(0, \begin{bmatrix}r&0&\dots&0\\ 0&r&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&0&r\end{bmatrix}\right)
$$
$$
\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_{18}\end{bmatrix}_t =
\begin{bmatrix}b&0&\dots&0\\ 0&b&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&0&b\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_{18}\end{bmatrix}_{t-1} +
\begin{bmatrix}w_1\\ w_2\\ \vdots\\ w_{18}\end{bmatrix}_t, \quad
\begin{bmatrix}w_1\\ w_2\\ \vdots\\ w_{18}\end{bmatrix}_t \sim
\text{N}\left(0, \begin{bmatrix}q&0&\dots&0\\ 0&q&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&0&q\end{bmatrix}\right)
\qquad (16.11)
$$
We specify and fit this model as
inits=list(A=coef(sleep.mod3)$A,D=coef(sleep.mod3)$D)
#use the estimates from model 3 as initial conditions
sleep.model=list(
A="unequal",B="diagonal and equal", x0="zero", U="zero",
D="unequal", d=exp.var, tinitx=0, Q="diagonal and equal",
R="diagonal and equal")
sleep.mod5 = MARSS(dat, model=sleep.model, inits=inits, silent=TRUE)
This is fairly close to the model fit with gls().
sleep.mod5.gls=gls(Reaction ~ 0 + Subject + Days:Subject, data=sleepstudy,
correlation=corAR1(form=~ 1|Subject), method="ML")
The way the variance-covariance structure is modeled is a little different but
it’s the same idea.


Table 16.1. Parameter estimates of different versions of the model where each
subject has a separate intercept (response time on normal sleep) and different slope
by day (increase in response time with each day of sleep deprivation). The model
types are discussed in the text.
                 lm  mod2 em  mod3 em  mod4 em  mod5 em  mod5 gls
logLik      -818.94  -818.94  -770.19  -754.97  -818.76   -818.55
slope 308     21.76    21.76    21.76    21.77    21.83     21.87
slope 309      2.26     2.26     2.26     1.43     2.24      2.23
slope 310      6.11     6.11     6.11     6.12     6.10      6.08
slope 330      3.01     3.01     3.01     2.93     3.01      3.04
slope 331      5.27     5.27     5.27     3.59     5.36      5.46
slope 332      9.57     9.57     9.57     8.55     9.39      9.21
slope 333      9.14     9.14     9.14     8.85     9.12      9.12
slope 334     12.25    12.25    12.25    11.73    12.24     12.26
slope 335     -2.88    -2.88    -2.88    -3.19    -2.82     -2.77
slope 337     19.03    19.03    19.03    19.09    18.95     18.90
slope 349     13.49    13.49    13.49    12.14    13.47     13.46
slope 350     19.50    19.50    19.50    18.21    19.38     19.28
slope 351      6.43     6.43     6.43     6.15     6.54      6.64
slope 352     13.57    13.57    13.57    19.20    13.71     13.80
slope 369     11.35    11.35    11.35    11.41    11.32     11.31
slope 370     18.06    18.06    18.06    18.31    18.01     17.97
slope 371      9.19     9.19     9.19     9.56     9.23      9.28
slope 372     11.30    11.30    11.30    11.45    11.28     11.26
phi 308                                   0.02     0.12      0.08
phi 309                                   0.63     0.12      0.08
phi 310                                  -0.01     0.12      0.08
phi 330                                   0.32     0.12      0.08
phi 331                                  -1.66     0.12      0.08
phi 332                                   0.26     0.12      0.08
phi 333                                  -1.04     0.12      0.08
phi 334                                   0.51     0.12      0.08
phi 335                                  -0.40     0.12      0.08
phi 337                                  -0.08     0.12      0.08
phi 349                                   0.80     0.12      0.08
phi 350                                   0.32     0.12      0.08
phi 351                                  -0.15     0.12      0.08
phi 352                                   0.80     0.12      0.08
phi 369                                  -0.25     0.12      0.08
phi 370                                  -0.44     0.12      0.08
phi 371                                   0.63     0.12      0.08
phi 372                                  -0.47     0.12      0.08


16.3 Summary
The purpose of this chapter is to illustrate how linear regression models with multivariate explanatory or response variables can be written in MARSS form, and thus can be fit with the MARSS() function (with the caveat that one must always be careful when the likelihood surface has prominent ridges, which will occur with collinear explanatory variables). Obviously R has many, many packages for linear regression and generalized linear regression (non-Gaussian errors). While the MARSS package can fit a variety of linear regression models with Gaussian errors, that is not what it is designed to do. The MARSS package is designed for fitting models that cannot be fit with typical linear regression: multivariate autoregressive state-space models with inputs (explanatory variables).

17
Lag-p models with MARSS

17.1 Background
Most of the chapters in the User Guide are ‘lag-1’ in the autoregressive part
of the model. This means that xt in the process model only depends on xt−1
and not xt−2 (lag-2) or more generally xt−p (lag-p). A lag-p model can be
written in state-space form as a MARSS lag-1 model, aka a MARSS(1) model
(see section 11.3.2 in Tsay (2010)). Writing lag-p models in this form allows
one to take advantage of the fitting algorithms for MARSS(1) models. There
are a number of ways to do the conversion to a MARSS(1) form. We use
Hamilton’s form (section 1 in Hamilton (1994)) because it can be fit with an
EM algorithm while the other forms (Harvey’s and Akaike’s) cannot.
This chapter shows how to convert and fit the following using the MARSS(1)
form:
AR(p) A univariate autoregressive model where xt is a function of xt−p (and
the prior lags usually too). No observation error.
MAR(p) The same as AR(p) but the x term is multivariate not univariate.
ARSS(p) The same as AR(p) but with an observation model and observation error. The observations may be multivariate (y can be multivariate) but the x term is univariate.
MARSS(p) The same as ARSS(p) but the x term is multivariate not univariate.
Note that only ARSS(p) and MARSS(p) assume observation error in the data.
AR(p) and MAR(p) will be rewritten in the state-space form with a y component to facilitate statistical analysis but the data themselves are considered
error free.
Type RShowDoc("Chapter_MARp.R",package="MARSS") at the R command line to
open a file with all the code for the examples in this chapter.


Note there are many R packages for fitting AR(p) (and ARMA(p,q) for
that matter) models. If you are only interested in univariate data with no
observation error in the data then you probably want to look into the arima()
function included in base R and into R packages that specialize in fitting
ARMA models to univariate data. The forecast package in R is a good place
to start but others can be found on the CRAN task view: Time Series Analysis.

17.2 MAR(2) models
A MAR(2) model is a lag-2 MAR model, aka a multivariate autoregressive
process with no observation process (no SS part). A MAR(2) model is written
$$x'_t = B_1 x'_{t-1} + B_2 x'_{t-2} + u + w_t, \text{ where } w_t \sim \text{MVN}(0, Q) \qquad (17.1)$$

We rewrite this as MARSS(1) by defining $x_t = \begin{bmatrix}x'_t\\ x'_{t-1}\end{bmatrix}$:

$$
\begin{bmatrix}x'_t\\ x'_{t-1}\end{bmatrix} =
\begin{bmatrix}B_1 & B_2\\ I_m & 0\end{bmatrix}
\begin{bmatrix}x'_{t-1}\\ x'_{t-2}\end{bmatrix} +
\begin{bmatrix}u\\ 0\end{bmatrix} +
\begin{bmatrix}w_t\\ 0\end{bmatrix}, \quad
\begin{bmatrix}w_t\\ 0\end{bmatrix} \sim \text{MVN}\left(0, \begin{bmatrix}Q&0\\ 0&0\end{bmatrix}\right), \quad
\begin{bmatrix}x'_0\\ x'_{-1}\end{bmatrix} \sim \text{MVN}(\pi, \Lambda)
\qquad (17.2)
$$

Our observations are of $x'_t$ only, so our observation model is

$$y_t = \begin{bmatrix}I & 0\end{bmatrix}\begin{bmatrix}x'_t\\ x'_{t-1}\end{bmatrix} \qquad (17.3)$$

17.2.1 Example of AR(2): univariate data
Here is an example of fitting a univariate AR(2) model written in MARSS(1)
form. First, let’s generate some simulated AR(2) data from this AR(2) process:
$$x_t = -1.5 x_{t-1} - 0.75 x_{t-2} + w_t, \text{ where } w_t \sim \text{N}(0, 1) \qquad (17.4)$$

TT=50
true.2=c(r=0,b1=-1.5,b2=-0.75,q=1)
temp=arima.sim(n=TT,list(ar=true.2[2:3]),sd=sqrt(true.2[4]))
sim.ar2=matrix(temp,nrow=1)
Next, we set up the model list for an AR(2) model written in MARSS(1)
form (refer to Equation 17.2 and 17.3):
Z=matrix(c(1,0),1,2)
B=matrix(list("b1",1,"b2",0),2,2)
U=matrix(0,2,1)
Q=matrix(list("q",0,0,0),2,2)

17.2 MAR(2) models

231

A=matrix(0,1,1)
R=matrix(0,1,1)
pi=matrix(sim.ar2[2:1],2,1)
V=matrix(0,2,2)
model.list.2=list(Z=Z,B=B,U=U,Q=Q,A=A,R=R,x0=pi,V0=V,tinitx=1)
Notice that we do not need to estimate π. We will fit our model to the data (y) starting at t = 2. Because R = 0 (see Equation 17.3), this means $x_1 = \begin{bmatrix}y_2\\ y_1\end{bmatrix}$. If we define π ≡ x1 by setting tinitx=1, then π is known and is simply the first two data points.
Then we can estimate the b1 and b2 parameters for the AR(2) process.
ar2=MARSS(sim.ar2[2:TT],model=model.list.2)
Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -63.02523
AIC: 132.0505
AICc: 132.5838

B.b1
B.b2
Q.q

Estimate
-1.582
-0.777
0.809

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Comparison to the true values shows the estimates are close:
print(cbind(true=true.2[2:4],estimates=coef(ar2,type="vector")))
    true  estimates
b1 -1.50 -1.5816137
b2 -0.75 -0.7767462
q   1.00  0.8091055
Missing values in the data are fine. Let's make half the data missing, being careful that the first data point does not get categorized as missing:
gappy.data=sim.ar2[2:TT]
gappy.data[floor(runif(TT/2,2,TT))]=NA
ar2.gappy=MARSS(gappy.data,model=model.list.2)


And the estimates are still close:
print(cbind(true=true.2[2:4],
estimates.no.miss=coef(ar2,type="vector"),
estimates.w.miss=coef(ar2.gappy,type="vector")))
    true  estimates.no.miss  estimates.w.miss
b1 -1.50         -1.5816137        -1.6553403
b2 -0.75         -0.7767462        -0.8578665
q   1.00          0.8091055         0.6492250
By the way, there are much easier and faster functions in R for fitting
univariate AR models (no observation error). For example, here is how you
would fit the AR(2) model using the base arima() function:
arima(gappy.data,order=c(2,0,0),include.mean=FALSE)
Call:
arima(x = gappy.data, order = c(2, 0, 0), include.mean = FALSE)
Coefficients:
          ar1      ar2
      -1.6651  -0.8601
s.e.   0.0746   0.0743

sigma^2 estimated as 0.6539:  log likelihood = -48.05,  aic = 102.1

The advantage of using the MARSS package really only comes in when you
are fitting to multivariate data or data with observation error.
17.2.2 Example of MAR(2): multivariate data
Here we show an example of fitting a MAR(2) model. Let’s make some simulated data of two realizations of the same AR(2) process:
TT=50
true.2=c(r=0,b1=-1.5,b2=-0.75,q=1)
temp1=arima.sim(n=TT,list(ar=true.2[c("b1","b2")]),sd=sqrt(true.2["q"]))
temp2=arima.sim(n=TT,list(ar=true.2[c("b1","b2")]),sd=sqrt(true.2["q"]))
sim.mar2=rbind(temp1,temp2)
Although these are independent time series, we want to fit with a MAR(2)
model to allow us to use both datasets together to estimate the AR(2) parameters. We need to set up the model list for the multivariate model (Equation
17.2 and 17.3):


Z=matrix(c(1,0,0,1,0,0,0,0),2,4)
B1=matrix(list(0),2,2); diag(B1)="b1"
B2=matrix(list(0),2,2); diag(B2)="b2"
B=matrix(list(0),4,4)
B[1:2,1:2]=B1; B[1:2,3:4]=B2; B[3:4,1:2]=diag(1,2)
U=matrix(0,4,1)
Q=matrix(list(0),4,4)
Q[1,1]="q"; Q[2,2]="q"
A=matrix(0,2,1)
R=matrix(0,2,2)
pi=matrix(c(sim.mar2[,2],sim.mar2[,1]),4,1)
V=matrix(0,4,4)
model.list.2m=list(Z=Z,B=B,U=U,Q=Q,A=A,R=R,x0=pi,V0=V,tinitx=1)
Notice the form of the Z matrix:
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    0

It is a 2 × 2 identity matrix followed by a 2 × 2 all-zero matrix. The B matrix is
composed of B1 and B2 which are diagonal matrices with b1 and b2 respectively
on the diagonal.
     [,1] [,2] [,3] [,4]
[1,] "b1" 0    "b2" 0
[2,] 0    "b1" 0    "b2"
[3,] 1    0    0    0
[4,] 0    1    0    0

We fit the model as usual:
mar2=MARSS(sim.mar2[,2:TT],model=model.list.2m)
Then we can compare how using two time series improves the fit versus using
only one alone:
model.list.2$x0=matrix(sim.mar2[1,2:1],2,1)
mar2a=MARSS(sim.mar2[1,2:TT],model=model.list.2)
model.list.2$x0=matrix(sim.mar2[2,2:1],2,1)
mar2b=MARSS(sim.mar2[2,2:TT],model=model.list.2)
    true   est.mar2  est.mar2a  est.mar2b
b1 -1.50 -1.4546302 -1.3192188 -1.5560202
b2 -0.75 -0.8176845 -0.7514445 -0.8766648
q   1.00  0.7736720  0.7922118  0.6803098


17.3 MAR(p) models
A MAR(p) model is similar to a MAR(2) except it has lags up to time p:
$$x'_t = B_1 x'_{t-1} + B_2 x'_{t-2} + \dots + B_p x'_{t-p} + u' + w'_t, \text{ where } w'_t \sim \text{MVN}(0, Q')$$

where

$$
x_t = \begin{bmatrix}x'_t\\ x'_{t-1}\\ \vdots\\ x'_{t-p}\end{bmatrix},\quad
B = \begin{bmatrix}B_1 & B_2 & \dots & B_p\\ I_m & 0 & \dots & 0\\ 0 & I_m & \dots & 0\\ 0 & 0 & \ddots & 0\end{bmatrix},\quad
u = \begin{bmatrix}u'\\ 0\\ \vdots\\ 0\end{bmatrix},\quad
Q = \begin{bmatrix}Q' & 0 & \dots & 0\\ 0 & 0 & \dots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \dots & 0\end{bmatrix}
\qquad (17.5)
$$

Here’s an example of fitting a univariate AR(3) in MARSS(1) form. We
need more data to estimate an AR(3), so use 100 time steps.
TT=100
true.3=c(r=0,b1=-1.5,b2=-0.75,b3=.05,q=1)
temp3=arima.sim(n=TT,list(ar=true.3[c("b1","b2","b3")]),sd=sqrt(true.3["q"]))
sim.ar3=matrix(temp3,nrow=1)
We set up the model list for the AR(3) in MARSS(1) form as follows:
Z=matrix(c(1,0,0),1,3)
B=matrix(list("b1",1,0,"b2",0,1,"b3",0,0),3,3)
U=matrix(0,3,1)
Q=matrix(list(0),3,3); Q[1,1]="q"
A=matrix(0,1,1)
R=matrix(0,1,1)
pi=matrix(sim.ar3[3:1],3,1)
V=matrix(0,3,3)
model.list.3=list(Z=Z,B=B,U=U,Q=Q,A=A,R=R,x0=pi,V0=V,tinitx=1)
and fit as normal:
ar3=MARSS(sim.ar3[3:TT],model=model.list.3)
The estimates are:
print(cbind(true=true.3[c("b1","b2","b3","q")],
estimates.no.miss=coef(ar3,type="vector")))
    true  estimates.no.miss
b1 -1.50         -1.5130316
b2 -0.75         -0.6755283
b3  0.05          0.1368458
q   1.00          1.1267684


17.4 MARSS(p): models with observation error
We can easily fit MAR(p) processes observed with error using MARSS(p)
models, but the difficulty is specifying the initial state condition. π ≡ x1 and thus involves x1, x0, .... However, we do not know the variance-covariance structure for these consecutive x. Specifying Λ = 0 and estimating π often
causes the EM algorithm to run into numerical problems. But if we have an
abundance of data, fixing π might not overly affect the B and Q estimates.
Here is an example where we set π to the mean of the data and set Λ to
zero. Why not set Λ equal to a diagonal matrix with large values on the diagonal to approximate a vague prior? The temporally consecutive initial states
are definitely not independent. A diagonal matrix would imply independence
which will conflict with the process model and means our model would be fundamentally inconsistent with the data (and that usually has bad consequences
for estimation).
Create some simulated data:
TT=1000 #set long
true.2ss=c(r=.5,b1=-1.5,b2=-0.75,q=.1)
temp=arima.sim(n=TT,list(ar=true.2ss[c("b1","b2")]),sd=sqrt(true.2ss["q"]))
sim.ar=matrix(temp,nrow=1)
noise=rnorm(TT-1,0,sqrt(true.2ss["r"]))
noisy.data=sim.ar[2:TT]+noise
Set up the model list for the model in MARSS(1) form:
Z=matrix(c(1,0),1,2)
B=matrix(list("b1",1,"b2",0),2,2)
U=matrix(0,2,1)
Q=matrix(list("q",0,0,0),2,2)
A=matrix(0,1,1)
R=matrix("r")
V=matrix(0,2,2)
pi=matrix(mean(noisy.data),2,1)
model.list.2ss=list(Z=Z,B=B,U=U,Q=Q,A=A,R=R,x0=pi,V0=V,tinitx=0)
Fit as usual:
ar2ss=MARSS(noisy.data,model=model.list.2ss)
Success! abstol and log-log tests passed at 101 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 101 iterations.


Log-likelihood: -1368.796
AIC: 2745.592
AICc: 2745.632

     Estimate
R.r     0.477
B.b1   -1.414
B.b2   -0.685
Q.q     0.140

Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
We can compare the results to modeling the data as if there is no observation error, and we see that that assumption leads to poor B estimates:
model.list.2ss.bad=model.list.2ss
#set R to zero in this model
model.list.2ss.bad$R=matrix(0)
Fit using the model with R set to 0:
ar2ss2=MARSS(noisy.data,model=model.list.2ss.bad)
Compare results
print(cbind(true=true.2ss,
model.no.error=c(NA,coef(ar2ss2,type="vector")),
model.w.error=coef(ar2ss,type="vector")))
    true  model.no.error  model.w.error
r   0.50              NA      0.4772368
b1 -1.50     -0.52826082     -1.4136279
b2 -0.75      0.03372857     -0.6853180
q   0.10      0.95834464      0.1404334
The middle column gives the estimates assuming that the data have no observation error and the right column gives our estimates with the observation
error estimated. Clearly, assuming no observation error when it is present has
negative consequences for the B and Q estimates.
By the way, there is a straightforward way to deal with the measurement error if you are working with univariate ARMA models and you are only interested in the AR parameters (the b's). Inclusion of measurement error leads to additional MA components up to lag p (Staudenmayer and Buonaccorsi, 2005). This means that if you are fitting an AR(p) model with measurement error, you can fit an ARMA(p,p) and the measurement error will be absorbed in the p MA components. For the example above, we could estimate the AR parameters for our AR(2) data with measurement error by fitting an ARMA(p,p) model. Here's how we could do that using R's arima() function:

arima(noisy.data,order=c(2,0,2),include.mean=FALSE)

Call:
arima(x = noisy.data, order = c(2, 0, 2), include.mean = FALSE)

Coefficients:
          ar1      ar2     ma1     ma2
      -1.4448  -0.6961  0.9504  0.3428
s.e.   0.0593   0.0427  0.0686  0.0482

sigma^2 estimated as 0.9069:  log likelihood = -1368.99,  aic = 2747.99

Accounting for the measurement error definitely improves the estimates for
the AR component.

17.5 Discussion
Although both MARSS(1) and ARMA(p,p) approaches can be used to deal
with AR(p) processes (univariate data) observed with error, our simulations
suggest that the MARSS(1) approach is less biased and more precise (Figure
17.1) and that the EM algorithm works better for this problem. The
performance of the different approaches depends greatly on the underlying
model. We chose AR parameters for which both the ARMA(p,p) and MARSS(1)
approaches work. If we had used, for example, b1 = 0.8 and b2 = −0.2, the
ARMA(2,2) would give b1 estimates close to 0 (i.e., wrong) while the MARSS(1)
EM approach gives estimates close to the truth (though rather variable). One
would also want to check REML approaches for fitting the ARMA(p,p) models,
since REML has been found to be less biased than ML estimation for this class
of models (Cheang and Reinsel, 2000; Ives et al., 2010). Ives et al. (2010)
provide R code for REML estimation of ARMA(p,q) models in their appendix.
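As a rough sketch of how one could check a particular parameter set, here is
the b1 = 0.8, b2 = −0.2 case, reusing the model.list.2ss from above (the
object names are illustrative and the output is not shown):
TT=1000
b.weak=c(b1=0.8,b2=-0.2)
temp=arima.sim(n=TT,list(ar=b.weak),sd=sqrt(.1))
noisy=as.vector(temp[2:TT])+rnorm(TT-1,0,sqrt(.5))
#AR estimates from an ARMA(2,2) fit; expect b1 badly biased toward 0
arima(noisy,order=c(2,0,2),include.mean=FALSE)$coef[c("ar1","ar2")]
#AR estimates from the MARSS(1) EM fit; reset x0 to the mean of these data
mod.weak=model.list.2ss
mod.weak$x0=matrix(mean(noisy),2,1)
coef(MARSS(matrix(noisy,nrow=1),model=mod.weak),type="vector")[c("B.b1","B.b2")]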
For multivariate data observed with error, especially multivariate data
without a one-to-one relationship to the underlying autoregressive process, an
explicit MARSS model will need to be used rather than an ARMA(p,p) model.
The number of time steps required for good parameter estimates is likely to be
large; in our simulations, we used 100 for an AR(3) and 1000 for an ARSS(2).
Thorough simulation testing should be conducted to determine if the data
available are sufficient to allow estimation of the B terms at multiple lags.
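A minimal sketch of such a simulation test for our ARSS(2) example, assuming
ar2ss is the fitted model from above (argument names for MARSSsimulate() as in
its help page; the loop itself is illustrative, not output shown in this guide):
sim=MARSSsimulate(ar2ss,tSteps=1000,nsim=10)
B.ests=matrix(NA,10,2)
for(i in 1:10){
#for a real test, reset x0 in model.list.2ss to the mean of each simulated dataset
refit=MARSS(matrix(sim$sim.data[,,i],nrow=1),model=model.list.2ss,silent=TRUE)
B.ests[i,]=coef(refit,type="vector")[c("B.b1","B.b2")]
}
apply(B.ests,2,mean) #compare to the true b1 and b2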

[Figure 17.1 appears here: two boxplot panels showing the b1 and b2 estimates
from the AR(2), ARMA(2,2), MARSS BFGS, and MARSS EM fits; the y-axis is
"estimates of the ar coefficients".]
Fig. 17.1. Comparison of the AR parameter estimates using different approaches
to model ARSS(2) data (univariate AR(2) data observed with error). Results are
from 200 simulations of AR(2) data with 100 time steps. Results are shown for
the b1 and b2 parameters of the AR process fit with 1) an AR(2) model with no
correction for measurement error, 2) a MARSS(1) model fit via EM optimization,
3) a MARSS(1) model fit via BFGS optimization, 4) an ARMA(2,2) model fit with
the R arima function, and 5) an AR(2) model fit with 2nd differencing using the
R arima function. The "x" shows the mean of the simulations and the bar in the
boxplot is the median. The true values are shown with the dashed horizontal line.
The σ² for the AR process was 0.1 and the σ² for the measurement error was 0.5.
The true b1 was -1.5 and b2 was -0.75.

A Textbooks and articles that use MARSS modeling for population modeling

Textbooks Describing the Estimation of Process and
Non-process Variance
There are many textbooks on Kalman filtering and estimation of state-space
models. The following are a sample of books on state-space modeling that we
have found especially helpful.
Shumway, R. H., and D. S. Stoffer. 2006. Time series analysis and its
applications. Springer-Verlag.
Harvey, A. C. 1989. Forecasting, structural time series models and the
Kalman filter. Cambridge University Press.
Durbin, J., and S. J. Koopman. 2001. Time series analysis by state space
methods. Oxford University Press.
Kim, C. J. and Nelson, C. R. 1999. State space models with regime switching. MIT Press.
King, R., G. Olivier, B. Morgan, and S. Brooks. 2009. Bayesian analysis
for population ecology. CRC Press.
Petris, G., S. Petrone, and P. Campagnoli. 2009. Dynamic linear models
in R. Springer-Verlag.
Pole, A., M. West, and J. Harrison. 1994. Applied Bayesian forecasting
and time series analysis. Chapman and Hall.
Bolker, B. 2008. Ecological models and data in R. Princeton University
Press.
West, M. and Harrison, J. 1997. Bayesian forecasting and dynamic models.
Springer-Verlag.
Tsay, R. S. 2010. Analysis of financial time series. Wiley.

Maximum-likelihood papers
This is just a sample of the papers from the population modeling literature.


de Valpine, P. 2002. Review of methods for fitting time-series models with
process and observation error and likelihood calculations for nonlinear, non-Gaussian state-space models. Bulletin of Marine Science 70:455-471.
de Valpine, P. and A. Hastings. 2002. Fitting population models incorporating process noise and observation error. Ecological Monographs 72:57-76.
de Valpine, P. 2003. Better inferences from population-dynamics experiments using Monte Carlo state-space likelihood methods. Ecology 84:3064-3077.
de Valpine, P. and R. Hilborn. 2005. State-space likelihoods for nonlinear fisheries time series. Canadian Journal of Fisheries and Aquatic Sciences
62:1937-1952.
Dennis, B., J.M. Ponciano, S.R. Lele, M.L. Taper, and D.F. Staples. 2006.
Estimating density dependence, process noise, and observation error. Ecological Monographs 76:323-341.
Ellner, S.P. and E.E. Holmes. 2008. Resolving the debate on when extinction risk is predictable. Ecology Letters 11:E1-E5.
Erzini, K. 2005. Trends in NE Atlantic landings (southern Portugal): identifying the relative importance of fisheries and environmental variables. Fisheries Oceanography 14:195-209.
Erzini, K., Inejih, C. A. O., and K. A. Stobberup. 2005. An application of two techniques for the analysis of short, multivariate non-stationary
time-series of Mauritanian trawl survey data. ICES Journal of Marine Science
62:353-359.
Hinrichsen, R.A. and E.E. Holmes. 2009. Using multivariate state-space
models to study spatial structure and dynamics. In Spatial Ecology (editors
Robert Stephen Cantrell, Chris Cosner, Shigui Ruan). CRC/Chapman Hall.
Hinrichsen, R.A. 2009. Population viability analysis for several populations
using multivariate state-space models. Ecological Modelling 220:1197-1202.
Holmes, E.E. 2001. Estimating risks in declining populations with poor
data. Proceedings of the National Academy of Sciences of the United States
of America 98:5072-5077.
Holmes, E.E. and W.F. Fagan. 2002. Validating population viability analysis for corrupted data sets. Ecology 83:2379-2386.
Holmes, E.E. 2004. Beyond theory to application and evaluation: diffusion approximations for population viability analysis. Ecological Applications
14:1272-1293.
Holmes, E.E., W.F. Fagan, J.J. Rango, A. Folarin, S.J.A., J.E. Lippe, and
N.E. McIntyre. 2005. Cross validation of quasi-extinction risks from real time
series: An examination of diffusion approximation methods. U.S. Department
of Commerce, NOAA Tech. Memo. NMFS-NWFSC-67, Washington, DC.
Holmes, E.E., J.L. Sabo, S.V. Viscido, and W.F. Fagan. 2007. A statistical
approach to quasi-extinction forecasting. Ecology Letters 10:1182-1198.
Kalman, R.E. 1960. A new approach to linear filtering and prediction
problems. Journal of Basic Engineering 82:35-45.


Lele, S.R. 2006. Sampling variability and estimates of density dependence:
a composite likelihood approach. Ecology 87:189-202.
Lele, S.R., B. Dennis, and F. Lutscher. 2007. Data cloning: easy maximum
likelihood estimation for complex ecological models using Bayesian Markov
chain Monte Carlo methods. Ecology Letters 10:551-563.
Lindley, S.T. 2003. Estimation of population growth and extinction parameters from noisy data. Ecological Applications 13:806-813.
Ponciano, J.M., M.L. Taper, B. Dennis, S.R. Lele. 2009. Hierarchical models in ecology: confidence intervals, hypothesis testing, and model selection
using data cloning. Ecology 90:356-362.
Staples, D.F., M.L. Taper, and B. Dennis. 2004. Estimating population
trend and process variation for PVA in the presence of sampling error. Ecology
85:923-929.
Zuur, A. F., and G. J. Pierce. 2004. Common trends in Northeast Atlantic
squid time series. Journal of Sea Research 52:57-72.
Zuur, A. F., I. D. Tuck, and N. Bailey. 2003. Dynamic factor analysis to
estimate common trends in fisheries time series. Canadian Journal of Fisheries
and Aquatic Sciences 60:542-552.
Zuur, A. F., R. J. Fryer, I. T. Jolliffe, R. Dekker, and J. J. Beukema. 2003.
Estimating common trends in multivariate time series using dynamic factor
analysis. Environmetrics 14:665-685.

Bayesian papers
This is a sample of the papers from the population modeling and animal
tracking literature.
Buckland, S.T., K.B. Newman, L. Thomas and N.B. Koestersa. 2004.
State-space models for the dynamics of wild animal populations. Ecological
Modelling 171:157-175.
Calder, C., M. Lavine, P. Müller, and J.S. Clark. 2003. Incorporating multiple
sources of stochasticity into dynamic population models. Ecology 84:1395-1402.
Chaloupka, M. and G. Balazs. 2007. Using Bayesian state-space modelling
to assess the recovery and harvest potential of the Hawaiian green sea turtle
stock. Ecological Modelling 205:93-109.
Clark, J.S. and O.N. Bjørnstad. 2004. Population time series: process variability, observation errors, missing values, lags, and hidden states. Ecology
85:3140-3150.
Jonsen, I.D., R.A. Myers, and J.M. Flemming. 2003. Meta-analysis of animal movement using state space models. Ecology 84:3055-3063.
Jonsen, I.D, J.M. Flemming, and R.A. Myers. 2005. Robust state-space
modeling of animal movement data. Ecology 86:2874-2880.


Meyer, R. and R.B. Millar. 1999. BUGS in Bayesian stock assessments.
Can. J. Fish. Aquat. Sci. 56:1078-1087.
Meyer, R. and R.B. Millar. 1999. Bayesian stock assessment using a state-space implementation of the delay difference model. Can. J. Fish. Aquat. Sci.
56:37-52.
Meyer, R. and R.B. Millar. 2000. Bayesian state-space modeling of age-structured data: fitting a model is just the beginning. Can. J. Fish. Aquat.
Sci. 57:43-50.
Newman, K.B., S.T. Buckland, S.T. Lindley, L. Thomas, and C. Fernández. 2006. Hidden process models for animal population dynamics. Ecological
Applications 16:74-86.
Newman, K.B., C. Fernández, L. Thomas, and S.T. Buckland. 2009. Monte
Carlo inference for state-space models of wild animal populations. Biometrics
65:572-583
Rivot, E., E. Prévost, E. Parent, and J.L. Baglinière. 2004. A Bayesian
state-space modelling framework for fitting a salmon stage-structured population dynamic model to multiple time series of field data. Ecological Modeling
179:463-485.
Schnute, J.T. 1994. A general framework for developing sequential fisheries
models. Canadian J. Fisheries and Aquatic Sciences 51:1676-1688.
Swain, D.P., I.D. Jonsen, J.E. Simon, and R.A. Myers. 2009. Assessing
threats to species at risk using stage-structured state-space models: mortality
trends in skate populations. Ecological Applications 19:1347-1364.
Thogmartin, W.E., J.R. Sauer, and M.G. Knutson. 2004. A hierarchical
spatial model of avian abundance with application to cerulean warblers. Ecological Applications 14:1766-1779.
Trenkel, V.M., D.A. Elston, and S.T. Buckland. 2000. Fitting population
dynamics models to count and cull data using sequential importance sampling.
J. Am. Stat. Assoc. 95:363-374.
Viljugrein, H., N.C. Stenseth, G.W. Smith, and G.H. Steinbakk. 2005.
Density dependence in North American ducks. Ecology 86:245-254.
Ward, E.J., R. Hilborn, R.G. Towell, and L. Gerber. 2007. A state-space
mixture approach for estimating catastrophic events in time series data. Can.
J. Fish. Aquat. Sci. 64:899-910.
Wikle, C.K., L.M. Berliner, and N. Cressie. 1998. Hierarchical Bayesian
space-time models. Journal of Environmental and Ecological Statistics 5:117-154.
Wikle, C.K. 2003. Hierarchical Bayesian models for predicting the spread
of ecological processes. Ecology 84:1382-1394.

B Package MARSS: Warnings and errors

The following are brief descriptions of the warning and error messages you may
see and what they mean (or might mean).

B update is outside the unit circle
If you are estimating B and the absolute values of all the eigenvalues of B are
less than 1, then the system is stationary (meaning the X's have some multivariate
distribution that does not change over time). In this case, we say that B is
within the unit circle. A pure univariate random walk, for example, has
B = 1 and is not stationary; the distribution of X for the pure random walk
has a variance that increases with time. If, on the other hand, |B| < 1, you have
an Ornstein-Uhlenbeck process, which is stationary with a stationary variance
of Q/(1 − B²) (note B is a scalar here because in this example X is univariate).
If any of the eigenvalues have real part greater than 1, then the system will
"explode": it rapidly diverges.
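You can check the estimated B directly. A minimal sketch, assuming fit is a
fitted marssMLE object:
B.est=coef(fit,type="matrix")$B
max(Mod(eigen(B.est)$values)) #within the unit circle if this is less than 1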
In the EM algorithm, there is nothing to force B to be on or within the
unit circle (real part of the eigenvalues less than or equal to 1). It is possible
that at one of the EM iterations the B update will fall outside the unit circle. The
problem is that if you get too far outside the unit circle, the algorithm becomes
numerically unstable, since small errors are magnified by the "explosive" B
term. If you see the 'B outside the unit circle' warning, it is fine as long as it
is temporary and the log-likelihood does not start decreasing (you will see a
separate warning if that happens).
If you do see B outside the unit circle and the log-likelihood decreases,
then it probably means that you have somehow misspecified the model.
An easy way to do that is to poorly specify the initial conditions, π and Λ.
If, say, you try to specify a vague prior on x0 (or x1 ) with π equal to zero
and Λ equal to a diagonal matrix with a large variance on the diagonal, you
will likely run into trouble if B has off-diagonal terms. The reason is that by
specifying Λ as diagonal, you have specified that the individual X's in X0 are


independent, yet if B has off-diagonal terms, the stationary distribution of X1
is NOT independent. If you force the diagonal terms of Λ to be big enough,
you can force the maximum-likelihood estimate of B to be outside the unit
circle, since this is the only way to account for an independent X0 and a highly
correlated X1 .
The problem is that you will not know the stationary distribution of the X's
(from which X0 was presumably drawn) without knowing the parameters you
are trying to estimate. One approach is to estimate both π and Λ by setting
x0="unconstrained" and V0="unconstrained" in the model specification.
Estimating both π and Λ cannot be done robustly for all MARSS models, and
in general one probably wants to specify the model in such a way as to fix
one or both of these. Another, more robust, approach is to treat x1 as fixed
but unknown (instead of x0 ). You do this by setting model$tinitx=1, so that
π refers to t = 1 rather than t = 0. Then estimate π and fix Λ = 0. This
eliminates Λ from the model and often eliminates problems with prior
specification, at the expense of m more parameters. Note that when you set
Λ = 0, Λ is truly eliminated from the model by MARSS; the likelihood function
is different, so do not expect Λ = 0 and Λ ≈ 0 to have the same likelihood
under all conditions.
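For an m = 2 model, the relevant pieces of the model list would look something
like the following sketch (the other elements are whatever your model requires):
model$tinitx=1         #pi now refers to x at t=1, not t=0
model$x0="unequal"     #estimate all m elements of pi
model$V0=matrix(0,2,2) #fix Lambda=0 so it drops out of the model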

Warning! Reached maxit before parameters converged
The maximum number of EM iterations is set by control$maxit. If you get
this warning, it means that one of the parameters or log-likelihood had not
yet reached the convergence stopping criteria before maxit was reached. There
are many situations where you might want to set control$maxit lower than
the value needed to reach convergence. For example, if you are using the EM
algorithm to produce initial values for a different algorithm (like a Bayesian
MCMC algorithm or a Newton method) then you can set maxit low, say 20
or 50.
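For example, with dat and mod standing in for your data and model list:
#a short EM run to generate rough starting values for another algorithm
fit.em=MARSS(dat,model=mod,control=list(maxit=50),silent=TRUE)
#fit.em$par then holds the rough parameter estimates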

Stopped at iter=xx in MARSSkem() because numerical
errors were generated in MARSSkf
This means the Kalman filter/smoother algorithm became unstable, most
likely because one of the variance matrices became ill-conditioned. When that
happens, the inverses of those matrices are poor, and you will start to get
negative values on the diagonals of your variance-covariance matrices. Once
that happens, the inverse of that variance-covariance matrix produces an error.
If you get this error, turn on tracing with control$trace=1. This will store
the error messages so you can see what is going on. It may be that you have
specified the model in such a way that some of the variances are being forced
very close to 0, which makes the variance-covariance matrix ill-conditioned. The
output from the MARSS call will be the parameter values just before the error
occurred.


Warning: the xyz parameter value has not converged
The algorithm checks whether the log-likelihood and each individual parameter
have converged. If a parameter has not converged, you can try upping
control$maxit and see if it converges. If you set maxit high but the parameter
is still not converging, then this suggests that one of the variance parameters
is so small that the EM update steps for that parameter are tiny. For example,
as Q goes to zero, the update steps for u go to zero; as Λ goes to zero, the
update steps for π go to zero. The first thing to do is to reflect on whether you
are inadvertently specifying the model in such a way that one of the variances
is forced to zero. For example, if the total variance in X is 0.1 and you fix
R = 0.2, then Q must go to zero. The second thing to do is to try a
Newton algorithm, using your last EM values as the initial conditions for the
Newton algorithm. The initial values are set using the inits argument to
the MARSS() function.
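A sketch of this two-step approach (dat and mod are placeholders for your
data and model list):
fit.em=MARSS(dat,model=mod) #EM fit that stalled short of convergence
#restart from the EM estimates with a quasi-Newton method
fit.bfgs=MARSS(dat,model=mod,method="BFGS",inits=fit.em$par)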

MARSSkem: The soln became unstable and logLik
DROPPED
This is a more serious error: in the EM algorithm, the log-likelihood should
never drop. The first thing to do is to check whether you have inadvertently
specified a bizarre model or data. Plot the data you are trying to fit. Often, this
error arises when a user has inadvertently scrambled their data order during
a demeaning or variance-standardization step. Second, check the model you
are trying to fit. Use test=MARSS(data, model=xyz, fit=FALSE) and then
summary(test$model). This shows you what MARSS() thinks your model is.
You may be trying to fit an illogical model.
If those checks look good, then pass control$trace=1 into the MARSS()
call. This will report a fuller set of warnings. Look for the error "B is outside
the unit circle". If it appears, you are probably specifying a strange B matrix.
Are you forcing the B matrix to be outside the unit circle (eigenvalues >
1)? If so, you need to rethink your B matrix constraints. If you do not see
that error, look at the $iter.record$logLik element of your fitted object. If the
log-likelihood is steadily dropping (at each iteration) or drops by amounts much
larger than the machine precision, that is bad and means that the EM algorithm
did not work. If, however, the log-likelihood is just fluctuating by small amounts
about some steady value, that is ok; it means that the values converged but the
parameters are such that there are slight numerical fluctuations. Try passing
control$safe=TRUE in the MARSS() call. This can sometimes help, as it inserts
a call to the Kalman filter after each individual parameter update.
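A sketch of these checks (again with placeholder dat and mod):
test=MARSS(dat,model=mod,fit=FALSE)
summary(test$model) #is this the model you intended?
fit=MARSS(dat,model=mod,control=list(trace=1))
LL=fit$iter.record$logLik
min(diff(LL)) #large negative values are true logLik drops; tiny ones are numerical noise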


Stopped at iter=xx in MARSSkem: solution became
unstable. R (or Q) update is not positive definite
First check whether you have specified an illegally constrained variance-covariance
matrix. For example, if the variances (diagonal) are constrained to be equal,
you cannot specify the covariances (off-diagonals) as unequal. Or if you specify
that some of the covariances are equal, you cannot specify the variances as all
unequal. These are illegal constraints on a variance-covariance matrix from a
statistical perspective (nothing to do with MARSS specifically).
This could also be due to numerical instability as B leaves the unit circle
or one of the variance matrices becomes ill-conditioned. Try turning on tracing
with control$trace=1 and turning on safe with control$safe=TRUE. This will
print out the error warnings at each parameter update step. Then consider
whether you have inadvertently specified the model in such a way as to force
this behavior in the B parameter.
You might also get this error if you inadvertently specified an improper
structure for R or Q. For example, if you used R=diag(c(1,1,"r")) with
the intent of specifying a diagonal matrix with fixed variance 1 at R[1, 1] and
R[2, 2] and an estimated R[3, 3], you would have actually specified a character
matrix with "0" on the off-diagonals and c("1","1","r") on the diagonal.
MARSS() interprets all elements in quotes as names of parameters to be
estimated, so it will estimate one off-diagonal covariance and two diagonal
variances. That puts illegal constraints on the estimation of a variance-covariance
matrix; again, this has nothing to do with MARSS() per se but with estimation
of variance-covariance matrices in general.
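The intended matrix should instead be specified as a list matrix so that the
fixed values stay numeric and only the estimated element is a character string:
#diagonal R with fixed 1s at R[1,1] and R[2,2] and an estimated R[3,3]
R=matrix(list(1,0,0, 0,1,0, 0,0,"r"),3,3)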

iter=xx MARSSkf: logLik computation is becoming
unstable. Condition num. of Sigma[t=1] = Inf and of R
= Inf.
This generally means that V0 is very small (say, 0) and that R is also very
close to zero.

Warning: setting diagonal to 0 blocked at iter=X.
logLik was lower in attempt to set 0 diagonals on X
This is a warning, not an error. What is happening is that one of the variances
(in Q or R) is getting small and the EM algorithm is attempting to set the
value to 0 (because control$allow.degen=TRUE). But when it tried to do this,
the new likelihood with the variance equal to 0 was lower, so the variance
was not set to 0.
A model with a minuscule variance and a model with the same variance
equal to 0 are not the same model. In the first, a stochastic process with
small variance exists; in the second, the analogous process is deterministic.
And in the first case, you can get a situation where the likelihood term
L(x|mean=mu,sigma=0) appears. That term will be infinite when x=mu. So
in the model with minuscule variance, you will get very large likelihood values
as the variance term gets smaller and smaller. In the analogous model with
that variance set to 0, that likelihood term does not appear, so the likelihood
does not go to infinity.
This is neither an error nor pathological behavior; the models are
fundamentally different. Nonetheless, it poses a dilemma when you want to choose
the best model based on maximum likelihood: the model with minuscule
variance will have infinite likelihood but the same behavior as the one with
variance 0. In our experience, this dilemma arises when one has a lot of missing
data near the beginning of the time series, and it is affected by how you
specify the prior on the initial state. Try setting the prior at t = 0 versus t = 1.
Try using a diffuse prior. You absolutely want to compare estimates from the
BFGS and EM algorithms in this case, because the algorithms differ
in their ability to find the maximum in this strange case. Neither is uniformly
better or worse; it seems to depend on which variance (Q or R) is going to
zero.
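A sketch of that comparison, using placeholder dat and mod:
fit.em=MARSS(dat,model=mod)
fit.bfgs=MARSS(dat,model=mod,method="BFGS")
c(EM=fit.em$logLik,BFGS=fit.bfgs$logLik) #compare the maximized log-likelihoods
cbind(EM=coef(fit.em,type="vector"),BFGS=coef(fit.bfgs,type="vector"))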

Warning: kf returned error at iter=X in attempt to set 0
diagonals for X
This is a warning that the EM algorithm tried to set one of the diagonal elements
of X to 0 because allow.degen is TRUE and that element is going to zero.
However, when this was tried, the Kalman filter returned an error. Typically,
this happens when both R and Q elements are trying to be set to 0. If
the maximum-likelihood estimate is that both R and Q are zero, it probably
means that your MARSS model is not a very good description of the data.

Warning: At iter=X attempt to set 0 diagonals for R
blocked for elements where corresponding rows of A or
Z are not fixed.
You have control$allow.degen=TRUE and one of the R diagonal elements is
getting very small. MARSS attempts to set these R elements to 0, but if row
i of R is 0, then the corresponding row i of a and Z must be fixed. This
restriction is specific to the EM algorithm. The BFGS algorithm might work in
this situation, or it might spit out garbage without telling you. Always be
suspicious when the EM and BFGS behavior differs; that is a good sign that
something is wrong with how your model describes the data. It is not a problem
with the algorithms per se; rather, for certain pathological models, the
algorithms behave differently from each other.


Stopped at iter=X in MARSSkem. XYZ is not
invertible.
There are a series of checks in MARSS that test whether matrix inversions are
possible before doing the inversion. These errors crop up most often when Q or
R are getting very small. At some point, they can get so small that inversions
become unstable. If this error is given, then the output will be the last
parameter estimates before the error. Try setting control$allow.degen=FALSE.
Sometimes the error occurs when a diagonal element of Q or R is being set to
0. You will also have to set control$maxit to something smaller, because
otherwise the EM algorithm will not stop: the problematic diagonal element will
walk slowly and inexorably toward 0.
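For example (a sketch, with dat and mod as placeholders):
#block the attempts to set small variances exactly to 0 and cap the iterations
fit=MARSS(dat,model=mod,control=list(allow.degen=FALSE,maxit=500))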

References

Biernacki, C., Celeux, G., and Govaert, G. 2003. Choosing starting
values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics and Data Analysis
41:561–575.
Brockwell, P. J. and Davis, R. A. 1991. Time series: theory and methods.
Springer-Verlag, New York, NY.
Cavanaugh, J. and Shumway, R. 1997. A bootstrap variant of AIC for
state-space model selection. Statistica Sinica 7:473–496.
Cheang, W. K. and Reinsel, G. C. 2000. Bias reduction of autoregressive estimates in time series regression model through restricted maximum
likelihood. Journal of the American Statistical Association 95:1173–1184.
de Jong, P. and Penzer, J. 1998. Diagnosing shocks in time series. Journal
of the American Statistical Association 93:796–806.
Dempster, A., Laird, N., and Rubin, D. 1977. Likelihood from incomplete
data via the EM algorithm. Journal of the Royal Statistical Society, Series
B 39:1–38.
Dennis, B., Munholland, P. L., and Scott, J. M. 1991. Estimation
of growth and extinction parameters for endangered species. Ecological
Monographs 61:115–143.
Dennis, B., Ponciano, J. M., Lele, S. R., Taper, M. L., and Staples,
D. F. 2006. Estimating density dependence, process noise, and observation
error. Ecological Monographs 76:323–341.
Ellner, S. P. and Holmes, E. E. 2008. Resolving the debate on when
extinction risk is predictable. Ecology Letters 11:E1–E5.
Faraway, J. 2004. Linear models with R. CRC Press.
Gerber, L. R., Master, D. P. D., and Kareiva, P. M. 1999. Grey
whales and the value of monitoring data in implementing the u.s. endangered species act. Conservation Biology 13:1215–1219.
Ghahramani, Z. and Hinton, G. E. 1996. Parameter estimation for
linear dynamical systems. Technical Report CRG-TR-96-2, University of
Toronto, Dept. of Computer Science.


Hamilton, J. D. 1994. State-space models, volume IV, chapter 50. Elsevier
Science.
Hampton, S. E., Izmest'eva, L. R., Moore, M. V., Katz, S. L., Dennis, B., and Silow, E. A. 2008. Sixty years of environmental change in
the world’s largest freshwater lake – Lake Baikal, Siberia. Global Change
Biology 14:1947–1958.
Hampton, S. E., Scheuerell, M. D., and Schindler, D. E. 2006. Coalescence in the Lake Washington story: Interaction strengths in a planktonic
food web. Limnology and Oceanography 51:2042–2051.
Hampton, S. E. and Schindler, D. E. 2006. Empirical evaluation of
observation scale effects in community time series. Oikos 113:424–439.
Harvey, A., Koopman, S. J., and Penzer, J. 1998. Messy time series: a
unified approach. Advances in Econometrics 13:103–143.
Harvey, A. C. 1989. Forecasting, structural time series models and the
Kalman filter. Cambridge University Press, Cambridge, UK.
Harvey, A. C. and Koopman, S. J. 1992. Diagnostic checking of unobserved components time series models. Journal of Business and Economic
Statistics 10:377–389.
Harvey, A. C. and Shephard, N. 1993. Structural time series models.
Elsevier Science Publishers B V, Amsterdam.
Hinrichsen, R. 2009. Population viability analysis for several populations
using multivariate state-space models. Ecological Modelling 220:1197–1202.
Hinrichsen, R. and Holmes, E. E. 2009. Using multivariate state-space
models to study spatial structure and dynamics. CRC/Chapman Hall.
Holmes, E. E. 2001. Estimating risks in declining populations with poor
data. Proceedings of the National Academy of Sciences of the United States
of America 98:5072–5077.
Holmes, E. E. 2004. Beyond theory to application and evaluation: diffusion
approximations for population viability analysis. Ecological Applications
14:1272–1293.
Holmes, E. E. 2012. Derivation of the EM algorithm for constrained and
unconstrained MARSS models. Technical report, Northwest Fisheries Science
Center, Mathematical Biology Program.
Holmes, E. E., Sabo, J. L., Viscido, S. V., and Fagan, W. F. 2007.
A statistical approach to quasi-extinction forecasting. Ecology Letters
10:1182–1198.
Holmes, E. E., Ward, E. J., and Wills, K. 2012. MARSS: Multivariate
autoregressive state-space models for analyzing time-series data. The R
Journal 4:11–19.
Holmes, E. E. and Ward, E. J. 2010. Analyzing noisy, gappy, and multivariate population abundance data: modeling, estimation, and model selection in a maximum-likelihood framework. Technical report, Northwest
Fisheries Science Center, Mathematical Biology Program.
Ives, A. R. 1995. Measuring resilience in stochastic systems. Ecological
Monographs 65:217–233.


Ives, A. R., Abbott, K. C., and Ziebarth, N. L. 2010. Analysis of
ecological time series with ARMA(p,q) models. Ecology 91:858–871.
Ives, A. R., Carpenter, S. R., and Dennis, B. 1999. Community interaction webs and zooplankton responses to planktivory manipulations. Ecology
80:1405–1421.
Ives, A. R., Dennis, B., Cottingham, K. L., and Carpenter, S. R.
2003. Estimating community stability and ecological interactions from timeseries data. Ecological Monographs 73:301–330.
Jeffries, S., Huber, H., Calambokidis, J., and Laake, J. 2003. Trends
and status of harbor seals in washington state 1978-1999. Journal of Wildlife
Management 67:208–219.
Kalman, R. E. 1960. A new approach to linear filtering and prediction
problems. Journal of Basic Engineering 82:35–45.
Klug, J. L. and Cottingham, K. L. 2001. Interactions among environmental drivers: Community responses to changing nutrients and dissolved
organic carbon. Ecology 82:3390–3403.
Kohn, R. and Ansley, C. F. 1989. A fast algorithm for signal extraction,
influence and cross-validation in state-space models. Biometrika 76:65–79.
Koopman, S. J. 1993. Disturbance smoother for state space models.
Biometrika 80:117–126.
Koopman, S. J., Shephard, N., and Doornik, J. A. 1999. Statistical
algorithms for models in state space using ssfpack 2.2. Econometrics Journal
2:113–166.
Lamon, E. I., Carpenter, S., and Stow, C. 1998. Forecasting PCB concentrations in Lake Michigan salmonids: a dynamic linear model approach.
Ecological Applications 8:659–668.
Lele, S. R., Dennis, B., and Lutscher, F. 2007. Data cloning: easy maximum likelihood estimation for complex ecological models using bayesian
markov chain monte carlo methods. Ecology Letters 10:551–563.
McLachlan, G. J. and Krishnan, T. 2008. The EM algorithm and extensions. John Wiley and Sons, Inc., Hoboken, NJ, 2nd edition.
Penzer, J. 2001. Critical values for time series diagnostics. Technical report,
Department of Statistics, London School of Economics.
Petris, G., Petrone, S., and Campagnoli, P. 2009. Dynamic Linear
Models with R. Use R! Springer.
Pole, A., West, M., and Harrison, J. 1994. Applied Bayesian forecasting
and time series analysis. Chapman and Hall, New York.
Rauch, H. E. 1963. Solutions to the linear smoothing problem. IEEE Transactions on Automatic Control 8:371–372.
Rauch, H. E., Tung, F., and Striebel, C. T. 1965. Maximum likelihood
estimation of linear dynamical systems. Journal of AIAA 3:1445–1450.
Scheuerell, M. D. and Williams, J. G. 2005. Forecasting climate-induced changes in the survival of Snake River spring/summer Chinook salmon
(Oncorhynchus tshawytscha). Fisheries Oceanography 14:448–457.


Schweppe, F. C. 1965. Evaluation of likelihood functions for Gaussian signals. IEEE Transactions on Information Theory IT-r:294–305.
Shumway, R. and Stoffer, D. 2006. Time series analysis and its applications. Springer-Science+Business Media, LLC, New York, New York, 2nd
edition.
Shumway, R. H. and Stoffer, D. S. 1982. An approach to time series
smoothing and forecasting using the EM algorithm. Journal of Time Series
Analysis 3:253–264.
Staples, D. F., Taper, M. L., and Dennis, B. 2004. Estimating population trend and process variation for PVA in the presence of sampling error.
Ecology 85:923–929.
Staudenmayer, J. and Buonaccorsi, J. R. 2005. Measurement error in
linear autoregressive models. Journal of the American Statistical Association 10:841–852.
Stoffer, D. S. and Wall, K. D. 1991. Bootstrapping state-space models:
Gaussian maximum likelihood estimation and the Kalman filter. Journal
of the American Statistical Association 86:1024–1033.
Taper, M. L. and Dennis, B. 1994. Density dependence in time series observations of natural populations: estimation and testing. Ecological Monographs 64:205–224.
Tsay, R. S. 2010. Analysis of financial time series. Wiley Series in Probability
and Statistics. Wiley.
Ward, E. J., Chirakkal, H., González-Suárez, M., Aurioles-Gamboa, D., Holmes, E. E., and Gerber, L. 2010. Inferring spatial
structure from time-series data: using multivariate state-space models to
detect metapopulation structure of California sea lions in the Gulf of California, Mexico. Journal of Applied Ecology 1:47–56.
Zuur, A. F., Fryer, R. J., Jolliffe, I. T., Dekker, R., and Beukema,
J. J. 2003. Estimating common trends in multivariate time series using
dynamic factor analysis. Environmetrics 14:665–685.

Index

animal tracking, 131
kftrack, 138
bootstrap
innovations, 15, 21, 22
MARSSboot function, 15
parametric, 15, 21, 22
confidence intervals, 77
Hessian approximation, 15, 79
MARSSparamCIs function, 15
non-parametric bootstrap, 15
parametric bootstrap, 15, 77
covariates, 151, 181, 185
density-independent, 61
diagnostics, 89
error
observation, 62
process, 61, 62
errors
degenerate, 9
ill-conditioned, 9
estimation, 65
BFGS, 36
Dennis method, 66
EM, 14, 18, 65
Kalman filter, 15, 19
Kalman smoother, 15, 19
maximum-likelihood, 65, 66
Newton methods, 19
quasi-Newton, 14, 36
REML, 7

extinction, 61
diffusion approximation, 70
uncertainty, 75
functions
coef, 48
is.marssMLE, 15
is.marssMODEL, 16
MARSS, 13, 35, 38, 40, 42
MARSSaic, 15, 21, 22, 53
MARSSboot, 15, 21, 52
MARSShessian, 15
MARSSkem, 14, 18, 19
MARSSkf, 15, 20, 21, 48
MARSSkfas, 20, 21
MARSSkfss, 20
MARSSmcinit, 15, 19
MARSSoptim, 14
MARSSparamCIs, 8, 15, 21, 46
MARSSsimulate, 15, 22, 53
MARSSvectorizeparam, 15
optim, 14
summary, 16, 44
initial conditions
setting for BFGS, 37
Kalman filter and smoother, 48
lag-1 covariance smoother, 48
likelihood, 15, 20, 53
and missing values, 21
innovations algorithm, 20
MARSSkf function, 53


missing value modifications, 21
multimodal, 19
troubleshooting, 9, 19
MAR(p), 230
MARSS model, 3, 6, 131
DFA example, 113
DLM example, 203
multivariate example, 81, 99, 131
print, 44
summary, 44
univariate example, 62
missing values, 8
and AICb, 22
and parametric bootstrap, 21
likelihood correction, 21
model selection, 22, 99
AIC, 22, 88, 89, 92, 96
AICc, 22, 96
bootstrap AIC, 22, 96
bootstrap AIC, AICbb, 22, 53
bootstrap AIC, AICbp, V, 22, 53, 96
MARSSaic function, 15, 53
model specification
in MARSS, 25
objects
marssMLE, 13

marssMODEL, 13, 16
outliers, 141
prior, 4, 29, 34
diffuse, 145
troubleshooting, 8, 37, 172, 244, 246
simulation, 22, 53, 62
MARSSsimulate function, 15, 53
standard errors, 15
structural breaks, 141
troubleshooting, 9, 243
B outside unit circle, 243
degenerate, 9
degenerate variances, 172
ill-conditioning, 9
Kalman filter errors, 247
local maxima, 19
matrix not invertible, 248
matrix not positive definite, 246
non-convergence, 9, 244, 245
numerical instability, 9, 244, 245
sensitivity to x0 prior, 34, 37, 172
setting diagonal to 0 blocked, 246,
247
sigma condition number, 246


