Model Averaging and Model Selection after Multiple
Imputation using the R-package MAMI
Version: August 17, 2018
Author: Michael Schomaker1, with support from Christian Heumann
NEW: speed up estimation using parallelization, see Section 6.1.
Contents
1 Background
  1.1 Installation
  1.2 Citation
  1.3 Motivation
    1.3.1 Model Averaging
    1.3.2 Multiple Imputation
    1.3.3 Model Averaging (or Model Selection) after Multiple Imputation
2 Model Choice
  2.1 Imputation Model
  2.2 Analysis Model
3 Choice of Model Averaging (or Selection) Method
  3.1 Model Averaging
    3.1.1 Criterion Based Model Averaging
    3.1.2 Mallow's Model Averaging
    3.1.3 Lasso Averaging
    3.1.4 Jackknife Model Averaging
  3.2 Model Selection
    3.2.1 Criterion Based Model Selection
    3.2.2 LASSO and Shrinkage Based Model Selection
4 Inference
5 Analysis and Interpretation
  5.1 Example
  5.2 Suggested Reporting in Publications
6 Miscellaneous
  6.1 Computation Time
  6.2 Limitations of the Approach
  6.3 Optimal Model Averaging for Prediction: Super Learning
  6.4 Miscellaneous
Index
1 University of Cape Town, Centre for Infectious Disease Epidemiology and Research, Observatory 7925, Cape Town, South Africa; michael.schomaker@uct.ac.za
1 Background
The R-package MAMI offers an implementation of the methodology proposed in
Schomaker, M. and C. Heumann (2014). Model selection and model averaging after multiple imputation. Computational Statistics and Data Analysis 71, 758–770.
It is essentially a summary of functions used in the paper, with some useful extensions.
The main function of the package (mami()) performs model selection/averaging on multiply
imputed datasets and combines the resulting estimates. The package also provides access to
less frequently used model averaging techniques and offers integrated bootstrap estimation.
The package is useful if
one wants to perform model selection or model averaging on multiply imputed data and
the analysis model of interest is either the linear model, the logistic model, the Poisson
model, or the Cox proportional hazards model, possibly with a random intercept.
one wants to obtain bootstrap confidence intervals for model selection or model averag-
ing estimators (with or without missing data/imputation) – to address model selection
uncertainty and to discover relationships of small effect size, see Table 1 in Schomaker
and Heumann (2014).
one wants to compare different model selection and averaging techniques, easily with the
same syntax.
The package is of limited use under the following circumstances:
- if one is interested in model selection or averaging for models other than those listed above, for example parametric survival models, additive models, time-series analysis, and many others.
- if one decides for a specific model selection or averaging technique not provided by the package, see Section 3 for more details.
- if the model selection/averaging problem is computationally too intensive, see Section 6.1 for more details.
1.1 Installation
The package can be downloaded at R-Forge:
http://mami.r-forge.r-project.org/
https://r-forge.r-project.org/R/?group_id=2152
Or, simply type:
install.packages("MAMI", repos=c("http://R-Forge.R-project.org",
"http://cran.at.r-project.org"), dependencies=TRUE)
The latter option is recommended as mami() depends on a couple of other packages.
1.2 Citation
If you use MAMI, please cite it. For suggested citations use
citation("MAMI")
print(citation("MAMI"), bibtex=T)
It is always recommended to cite the methodological reference:
Schomaker, M. and C. Heumann (2014). Model selection and model averaging after multiple imputation. Computational Statistics and Data Analysis 71, 758–770.
Additionally, the package and this manual can be cited as well:
Schomaker, M. (2017). MAMI: Model averaging (and model selection) after multiple imputation. R package version 0.9.12; http://mami.r-forge.r-project.org/
Schomaker, M. and C. Heumann (2017). Model averaging and model selection after multiple imputation using the R-package MAMI; http://mami.r-forge.r-project.org/
1.3 Motivation
1.3.1 Model Averaging
The motivation for variable selection in regression models is based on the rationale that associ-
ational relationships between variables are best understood by reducing the model’s dimension.
The problem with this approach is that (i) regression parameters after model selection are often biased and (ii) the respective standard errors are too small because they do not reflect the uncertainty related to the model selection process (Leeb and Pötscher, 2005; Burnham and Anderson, 2002). It has been proposed (Chatfield, 1995; Draper, 1995; Hoeting et al., 1999) that the drawbacks of model selection can be overcome by model averaging. With model averaging, one calculates a weighted average $\hat{\bar{\beta}} = \sum_{\kappa} w_\kappa \hat{\beta}_\kappa$ from the $k$ parameter estimates $\hat{\beta}_\kappa$ ($\kappa = 1, \ldots, k$) of a set of candidate (regression) models $\mathcal{M} = \{M_1, \ldots, M_k\}$, where the weights are calculated such that 'better' models receive a higher weight. A popular weight choice is based on the exponential AIC,

$$w_\kappa^{\mathrm{AIC}} = \frac{\exp(-\frac{1}{2}\mathrm{AIC}_\kappa)}{\sum_{\kappa=1}^{k} \exp(-\frac{1}{2}\mathrm{AIC}_\kappa)}, \qquad (1)$$

where $\mathrm{AIC}_\kappa$ is the AIC value related to model $M_\kappa \in \mathcal{M}$ (Buckland et al., 1997) and $\sum_\kappa w_\kappa^{\mathrm{AIC}} = 1$. It has been suggested$^2$ to estimate the variance of the scalar $\hat{\bar{\beta}}_j \in \hat{\bar{\beta}}$ via

$$\widehat{\mathrm{Var}}(\hat{\bar{\beta}}_j) = \left\{ \sum_{\kappa=1}^{k} w_\kappa \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_{j,\kappa} \mid M_\kappa) + (\hat{\beta}_{j,\kappa} - \hat{\bar{\beta}}_j)^2} \right\}^2, \qquad (2)$$

where $\hat{\beta}_{j,\kappa}$ is the $j$th regression coefficient of the $\kappa$th candidate model. This approach tackles problem (ii), the incorporation of model selection uncertainty into the standard errors of the regression parameters; but it does not necessarily tackle problem (i), as the regression parameters may still be biased. There are multiple suggestions on how the weights can be calculated, and those implemented in mami() are explained in Section 3. Note that model selection can be viewed as a special case of model averaging, where the "best" model receives weight 1 (and all others a weight of 0). All implemented model selection options are also listed in Section 3.
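To illustrate weight choice (1), the following base-R sketch (not part of MAMI; toy data and variable names are invented for illustration) computes AIC-based weights over all candidate models of a small linear regression and a model-averaged coefficient:

```r
# Illustrative sketch of the AIC weights in (1), using toy data.
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + rnorm(n)                 # x2 is pure noise
models <- list(lm(y ~ 1), lm(y ~ x1), lm(y ~ x2), lm(y ~ x1 + x2))
aic <- sapply(models, AIC)
w <- exp(-0.5 * (aic - min(aic)))             # subtracting min(AIC) avoids underflow
w <- w / sum(w)                               # weights sum to 1, as in (1)
# model-averaged coefficient for x1 (0 in models that exclude x1)
b_x1 <- sapply(models, function(m)
  ifelse("x1" %in% names(coef(m)), coef(m)[["x1"]], 0))
sum(w * b_x1)
```

The weighted sum in the last line is the model averaging estimator $\hat{\bar{\beta}}$ for the coefficient of x1.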
1.3.2 Multiple Imputation
Multiple imputation (MI) is a popular method to address missing data. Based on assumptions about the data distribution (and the mechanism which gives rise to the missing data), missing values can be imputed by means of draws from the posterior predictive distribution of the unobserved data given the observed data. This procedure is repeated to create $M$ imputed data sets; the (regression) analysis is then conducted on each of these data sets, and the $M$ results ($M$ point and $M$ variance estimates) are combined by a set of simple rules:

$$\hat{\beta}_{\mathrm{MI}} = \frac{1}{M} \sum_{m=1}^{M} \hat{\beta}^{(m)} \qquad (3)$$

and

$$\widehat{\mathrm{Cov}}(\hat{\beta}_{\mathrm{MI}}) = \hat{W} + \frac{M+1}{M}\,\hat{B} = \frac{1}{M}\sum_{m=1}^{M} \widehat{\mathrm{Cov}}(\hat{\beta}^{(m)}) + \frac{M+1}{M(M-1)} \sum_{m=1}^{M} (\hat{\beta}^{(m)} - \hat{\beta}_{\mathrm{MI}})(\hat{\beta}^{(m)} - \hat{\beta}_{\mathrm{MI}})', \qquad (4)$$

where $\hat{\beta}^{(m)}$ refers to the estimate of $\beta$ in the $m$th imputed set of data $D^{(m)}$, $m = 1, \ldots, M$, $\hat{W} = M^{-1}\sum_m \widehat{\mathrm{Cov}}(\hat{\beta}^{(m)})$ is the average within-imputation covariance, and $\hat{B} = (M-1)^{-1}\sum_m (\hat{\beta}^{(m)} - \hat{\beta}_{\mathrm{MI}})(\hat{\beta}^{(m)} - \hat{\beta}_{\mathrm{MI}})'$ the between-imputation covariance. Confidence intervals are constructed on a $t_R$-distribution with approximately $R = (M-1)[1 + M\hat{W}/\{(M+1)\hat{B}\}]^2$ degrees of freedom (Rubin and Schenker, 1986), though there are alternative approximations, especially for small samples (Lipsitz et al., 2002). More details on imputation can be found in Rubin (1996) and White et al. (2011), among others.

$^2$ While formula (2) from Buckland et al. (1997) is the most popular choice to calculate standard errors in model averaging, it has also been criticized, as the coverage probability of interval estimates based on (2) can be biased (Hjort and Claeskens, 2003).
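The pooling rules (3)–(4) for a single coefficient can be sketched in a few lines of base R (not part of MAMI; the toy point estimates and standard errors below are invented):

```r
# Sketch of Rubin's rules (3)-(4) for one scalar coefficient, given
# M point estimates `est` and their standard errors `se`.
pool_rubin <- function(est, se) {
  M    <- length(est)
  qbar <- mean(est)                              # (3): pooled point estimate
  W    <- mean(se^2)                             # average within-imputation variance
  B    <- var(est)                               # between-imputation variance
  Tvar <- W + (M + 1) / M * B                    # (4): total variance
  R    <- (M - 1) * (1 + M * W / ((M + 1) * B))^2  # approximate degrees of freedom
  c(estimate = qbar, se = sqrt(Tvar), df = R)
}
# toy input: M = 5 estimates with identical standard errors
pool_rubin(est = c(0.42, 0.45, 0.40, 0.44, 0.43), se = rep(0.10, 5))
```

Note that the pooled standard error exceeds the average within-imputation standard error, reflecting the between-imputation uncertainty.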
1.3.3 Model Averaging (or Model Selection) after Multiple Imputation
How can model averaging and model selection be applied to multiply imputed data? The detailed motivation can be found in Schomaker and Heumann (2014). The basic result for model averaging is

$$\hat{\bar{\beta}}_{\mathrm{MI}} = \frac{1}{M} \sum_{m=1}^{M} \hat{\bar{\beta}}^{(m)} \quad \text{with} \quad \hat{\bar{\beta}}^{(m)} = \sum_{\kappa=1}^{k} w_\kappa^{(m)} \hat{\beta}_\kappa^{(m)} \qquad (5)$$

and applies to any weight choice. If the variance of the model averaging estimator is estimated via (2), the overall variance of the estimator after multiple imputation relates to

$$\widehat{\mathrm{Var}}(\hat{\bar{\beta}}_{j,\mathrm{MI}}) = \frac{1}{M} \sum_{m=1}^{M} \left\{ \sum_{\kappa=1}^{k} w_\kappa \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_{j,\kappa}^{(m)}) + (\hat{\beta}_{j,\kappa}^{(m)} - \hat{\bar{\beta}}_j^{(m)})^2} \right\}^2 + \frac{M+1}{M(M-1)} \sum_{m=1}^{M} (\hat{\bar{\beta}}_j^{(m)} - \hat{\bar{\beta}}_{j,\mathrm{MI}})^2. \qquad (6)$$

Confidence intervals could then again be estimated based on a $t_R$ distribution (as explained above) or, alternatively, via bootstrapping – see Section 4, and Table 1 in Schomaker and Heumann (2014), for more details.
Model selection after imputation works essentially the same, except that parameters associated with variables which have not been selected are assumed to be 0. With this assumption, a variable will be formally selected if it is selected in at least one imputed set of data, but its overall impact will depend on how often it is chosen. Here, confidence intervals will almost always be too narrow if (6) is applied (because of model selection uncertainty), and bootstrapping is recommended. As a consequence, for model selection (and model averaging), effects of variables which are not supported throughout imputed data sets (and candidate models) will simply be less pronounced.
- MAMI estimates the point estimates (5), together with confidence intervals that are either based on (6), a Bayesian variation thereof (Section 3.1.1), or bootstrapping (the preferred option, Section 4).
- In addition, a variable importance measure (averaged over the M imputed data sets) is calculated: this measure, see also Burnham and Anderson (2002), simply sums up the weights $w_\kappa$ of those candidate models $M_\kappa$ that contain the relevant variable, and lies between 0 (unimportant) and 1 (very important). It is similar to the Bayesian posterior effect probability.
- Results could be interpreted and reported as suggested in Section 5.
MAMI’s main function is mami(). It is recommended to get familiar with the function’s
syntax by typing ?mami and running the examples at the bottom of the help page. Briefly, the
syntax is
mami(X, missing.data=c("imputed","none","CC"),
model=c("gaussian","binomial","poisson","cox"), outcome=1, id=NULL,
method=c("MA.criterion","MS.criterion","LASSO","LAE","MMA"),
criterion=c("AIC","BIC","BIC+","CV","GCV"), kfold=5, cvr=F,
inference=c("standard","+boot"), B=20, X.org=NULL, CI=0.95,
var.remove=NULL, aweights=NULL, add.factor=NULL, add.interaction=NULL,
add.transformation=NULL, add.stratum=NULL, ncores=1,
candidate.models=c("all","restricted","very restricted"), screen=0,
report.exp=FALSE, print.time=FALSE, print.warnings=TRUE, ...)
?mami
example(mami)
Using the above framework requires decisions with respect to the following:
- A decision about the imputation procedure, see Section 2.1. This decision is made before applying mami().
- A decision about the (regression) analysis model, see Section 2.2. This is specified by means of the options model and id, as well as sometimes var.remove, outcome, aweights, add.factor, add.interaction, add.transformation, add.stratum, and screen.
- A decision about the model selection or model averaging approach, i.e. the determination of the weights and the set of candidate models, see Section 3. This is specified by means of the options method and criterion, as well as sometimes kfold and candidate.models.
- A decision about confidence interval calculation, see Section 4. This is specified by means of the option inference, as well as sometimes B, X.org, and CI.
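A minimal workflow might look as follows. This is a hypothetical sketch (it assumes the MAMI and Amelia packages are installed and that mydata is a data frame with missing values whose first column is the outcome); only options documented above are used:

```r
# Hypothetical workflow sketch: impute, then select/average, then combine.
library(Amelia)
library(MAMI)
imp <- amelia(mydata, m = 5)                   # step 1: multiple imputation
fit <- mami(imp, model = "gaussian", outcome = 1,
            method = "MA.criterion", criterion = "AIC")
print(fit)                                     # combined estimates and intervals
```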
2 Model Choice
2.1 Imputation Model
MAMI does not impute. The user would typically provide the data X in one of the following formats:
- An object of class amelia, generated by imputation with Amelia II [author's recommendation]. See Honaker and King (2010) and Honaker et al. (2011) for more details.
- An object of class mids, generated by imputation with mice. See, for example, van Buuren and Groothuis-Oudshoorn (2011) for more details.
- A list of data sets imputed with any other package or software. With this choice, bootstrap confidence intervals, as explained in Section 4, cannot be calculated. If the user has used software other than R to generate the imputations, one option would be to i) save the imputed data sets as .csv files, ii) import them into R with read.csv(), and then iii) create a list of these data sets which can then be used by mami().
- A single data frame. This can be of interest when one is interested in a comparison with results from a complete case analysis (missing.data="CC"), or if one has no missing data (missing.data="none") but is interested in post model selection/averaging intervals using bootstrapping.
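Steps i)–iii) above can be sketched in base R. For illustration, two toy "imputed" data sets are first written to temporary csv files, standing in for files produced by other imputation software:

```r
# Sketch of importing externally imputed data sets as a list for mami().
d1 <- data.frame(y = rnorm(10), x = rnorm(10))   # toy stand-ins for
d2 <- data.frame(y = rnorm(10), x = rnorm(10))   # externally imputed data
files <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
write.csv(d1, files[1], row.names = FALSE)
write.csv(d2, files[2], row.names = FALSE)
# ii) import with read.csv(), iii) collect in a list (usable as X in mami())
imputed_list <- lapply(files, read.csv)
length(imputed_list)
```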
2.2 Analysis Model
The analysis model can (currently) be any of the following: the linear regression model, the logistic regression model, the Poisson regression model, or the Cox proportional hazards regression model. Thus, β refers to the parameter vector of a generalized linear model or survival model. It is specified with the option model and can be either "gaussian", "binomial", "poisson", or "cox".
If the data are not cross-sectional, but longitudinal, it is possible to specify this with a variable which indicates the cross-sectional unit, i.e. via the option id. Then, a (generalized) linear mixed model with random intercept is fit [or a Cox model with frailty]. When dealing with longitudinal data, the following needs to be considered:
- The imputation model needs to account for the longitudinal structure of the data. For longitudinal data, only semi-satisfactory imputation procedures are available at the moment. The best may be to use the "time series – cross section" facilities in Amelia II, specified with the ts, cs and intercs options; see Honaker et al. (2011, Section 4.5) for more details. This approach often works well, though it does not explicitly take into account the time-ordering in the data. Alternatively, if there are not many time points, reshaping the data with reshape(), from long to wide format, could be an option in some applications.
- The bootstrap needs to be facilitated on the id-level. mami() does this.
The full analysis model (before model selection or averaging) is the one specified via model and id. It includes all variables in the data set X, and uses the first column as the outcome variable, unless any of the below options are specified.
- var.remove – either a vector of character strings or integers, specifying the variables or columns which are part of the data but should not be considered in the model selection/averaging procedure.
- screen – number of variables which should be removed in an initial screening step with LASSO.
- outcome – a character vector or integer specifying the variable or column which should be treated as the outcome variable. For survival models, two variables (time to event, event) need to be specified.
- add.transformation – a vector of character strings, specifying transformations of variables which should be added to the analysis models.
- add.interaction – a list of either character strings or integers, specifying the variables which should be added as interactions in the analysis model.
- add.factor – a list of either character strings or integers indicating the (categorical) variables that should be treated as factors. Rarely needed, as mami() already searches for existing factor variables.
- aweights – a weight vector to be used in the analysis model.
- add.stratum – a character vector or integer specifying the variable used as a stratum in Cox regression analyses.
Example 2 in ?mami shows how these options could be used.
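As a hypothetical illustration of these options (the variable names age, sex, region, event, and patient.id are invented; imp stands for an imputed-data object from Section 2.1, and MAMI is assumed to be installed):

```r
# Hypothetical sketch: customizing the analysis model via add.* options.
library(MAMI)
fit <- mami(imp, model = "binomial", outcome = "event",
            add.factor = list("region"),              # treat region as a factor
            add.interaction = list(c("age", "sex")),  # add an age:sex interaction
            add.transformation = c("age^2"),          # add a squared term
            var.remove = "patient.id")                # keep the id out of the model
```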
Future releases of MAMI are likely to contain the option to specify quasi-likelihood models. Other analysis models, like GEEs or parametric survival models, may be offered too; however, mixed models with random slopes or additive models will not be, because the best use of model averaging in this context is not yet clear.
3 Choice of Model Averaging (or Selection) Method
The model averaging (selection) method is chosen by the combination of the method and
criterion options.
3.1 Model Averaging
3.1.1 Criterion Based Model Averaging
Criterion based model averaging means essentially using the weights (1), with any information
type criterion. This can be utilized by picking:
method="MA.criterion" and either
criterion="AIC" or criterion="BIC" or criterion="BIC+".
Model averaging is utilized with the package MuMIn (Barton, 2017) for criterion="AIC"/"BIC" and with BMA (Raftery et al., 2017) for criterion="BIC+". MuMIn evaluates (by default) all possible candidate models $\mathcal{M}$ (i.e. $2^p$ for $p$ variables), whereas BMA uses a subset of models based on a leaps and bounds algorithm in conjunction with "Occam's razor"; see Hoeting et al. (1999) for more details.
Using the above methodology has several implications:
- With many variables, the computation time can be (too) large if "all" candidate models are evaluated. A solution is to restrict the number of candidate models and/or to parallelize; Section 6.1 gives an overview of how this can be facilitated. Briefly: i) restrict the candidate models by accessing dredge from MuMIn; ii) use criterion="BIC+" [the number of candidate models is printed]; iii) use criterion="BIC+" and access bic.surv or bic.glm from BMA to adjust Occam's window; iv) use the option candidate.models; v) screen variables with the option screen; or vi) use parallel computing via the option ncores.
- If there is no proper likelihood function, it can be argued that using an information type criterion, and therefore criterion based model averaging, does not make sense. In particular, survival models, such as Cox's proportional hazards model, use a partial likelihood, due to the censoring inherent in survival data. Nevertheless, a pragmatic solution is to simply adopt the partial likelihood; if BIC is used, one has to decide what $n$ means, i.e. the number of subjects or the number of uncensored subjects; see Hoeting et al. (1999) and Volinsky et al. (1997) for more details. Similarly, the definition of information criteria in mixed models is not entirely clear, and we use the implementation from package lme4.
- From the Bayesian perspective, the quality of a regression model $M_\kappa \in \mathcal{M}$ may be judged upon its estimated posterior probability that this model is correct, that is

$$\Pr(M_\kappa \mid y) \propto \Pr(M_\kappa) \int \Pr(y \mid M_\kappa, \beta_\kappa) \cdot \Pr(\beta_\kappa \mid M_\kappa) \, d\beta_\kappa,$$

where $\Pr(M_\kappa)$ is the prior probability for the model $M_\kappa$ to be correct, $\Pr(y \mid M_\kappa, \beta_\kappa) = L(\beta_\kappa)$ represents the likelihood, and $\Pr(\beta_\kappa \mid M_\kappa)$ reflects the prior of $\beta_\kappa$ for model $M_\kappa$. Since, for a large sample size, $\Pr(M_\kappa \mid y)$ can be approximated via the Bayes criterion of Schwarz (BCS, BIC), it is often suggested that the weight (1) is used for the construction of the Bayesian model averaging estimator, but with BIC instead of AIC. The BCS corresponds to $-2L(\hat{\beta}) + \ln n \cdot p$, where $p$ corresponds to the number of parameters. Note the following:
  - The BMA implementation does not allow the specification of $\Pr(M_\kappa)$ and assumes equal prior probabilities for each model to be correct.
  - Broadly, option BIC+ uses variance estimation based on variance decomposition (Draper, 1995), i.e. the law of total variance:

$$\widehat{\mathrm{Var}}(\hat{\bar{\beta}}_j) = \hat{E}_{\mathcal{M}}\{\widehat{\mathrm{Var}}(\hat{\beta}_{j,\kappa} \mid y, M_\kappa)\} + \widehat{\mathrm{Var}}_{\mathcal{M}}\{\hat{E}(\hat{\beta}_{j,\kappa} \mid y, M_\kappa)\}, \qquad (7)$$

    which is similar, but not identical, to (2). This means standard errors and confidence intervals are not identical when comparing options BIC and BIC+ – even if the candidate models are the same (which practically never happens). However, in most situations the confidence intervals will be very close.
  - Using BIC+ and passing on the option OR=Inf (to bic.glm or bic.surv) means considering all candidate models. This can be of interest when considering a full Bayesian approach.
  - For BIC+, the variable importance measure displayed by mami essentially refers to the posterior effect probability (averaged over the imputed data sets) calculated by BMA; see Hoeting et al. (1999), among others, for more details.
- The number of candidate models is $2^p - 1$ if the Cox model is the analysis model of interest. This is because the "intercept only" model is not evaluated, as there is no natural intercept (estimation of the baseline hazard is treated as a nuisance parameter).
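Putting the computation-time advice above together, a hypothetical call might combine several of these options (assumes MAMI is installed and imp is an imputed-data object; all options appear in the mami() signature given in Section 1.3.3):

```r
# Hypothetical sketch: criterion based model averaging with options that
# reduce computation time (see also Section 6.1).
library(MAMI)
fit <- mami(imp, model = "gaussian", method = "MA.criterion",
            criterion = "BIC+",               # leaps-and-bounds subset via BMA
            candidate.models = "restricted",  # restrict the candidate set
            screen = 2,                       # LASSO screening: drop 2 variables
            ncores = 4)                       # parallelize across 4 cores
```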
3.1.2 Mallow’s Model Averaging
Mallow’s model averaging (MMA) refers to the approach described by Hansen (2007). This
estimator is an example of optimal model averaging, which may be of particular interest from
a predictive point of view, rather than explanatory point of view. It is implemented in the
function mma() and can be used within mami() when the option method="MMA" is chosen. It
can only be used for the linear model.
Hansen considers a situation of $k$ nested linear regression models for $k$ variables. Let $\hat{\bar{\beta}} = \sum_{\kappa=1}^{k} w_\kappa \hat{\beta}_\kappa$ be a model averaging estimator with $\hat{\mu}_w = X_k \hat{\bar{\beta}}$. Based on similar thoughts as in the construction of Mallow's $C_p$, Hansen suggests to minimize the mean squared (prediction) error by minimizing the following criterion:

$$\tilde{C}_p = (y - X_k \hat{\bar{\beta}})'(y - X_k \hat{\bar{\beta}}) + 2\sigma^2 K_w, \qquad (8)$$

where $K_w = \mathrm{tr}(P_w)$, $P_w = \sum_{\kappa=1}^{k} w_\kappa P_\kappa$, $P_\kappa = X_\kappa (X_\kappa' X_\kappa)^{-1} X_\kappa'$, and $\sigma^2$ is the variance, which needs to be estimated from the full model. Consequently, the weights are chosen such that $\tilde{C}_p$ is minimized,

$$w_\kappa^{\mathrm{MMA}} = \arg\min_{w_\kappa \in \mathcal{H}} \tilde{C}_p, \qquad (9)$$

with $\mathcal{H} = \{(w_1, \ldots, w_k) \in [0,1]^k : \sum_{\kappa=1}^{k} w_\kappa = 1\}$. Since the first part of (8) is quadratic in $w_\kappa$ and the second one linear, one can obtain the model averaging estimator by means of quadratic programming (i.e. the package quadprog). Note the following:
- The MMA estimator is based on the weights (9) rather than (1).
- The standard error is calculated according to (2); however, it has limited meaning, since Mallow's model averaging is not ideal to address explanatory questions. If it is of interest, one could calculate bootstrap standard errors. This can be facilitated with the variance and bsa options in mma:
data(Prostate)
mma(Prostate, formula=lpsa~., outcome="lpsa", variance="boot", bsa=200)
Similarly, the variance and bsa arguments can be given to mami().
- A different implementation of Mallow's Model Averaging can be found on Bruce Hansen's website: http://www.ssc.wisc.edu/~bhansen/progs/ecnmt_07.html
- It has been shown that MMA can also be used with a candidate set of unordered regressors. Hansen's implementation allows for this. However, MMA is much more unstable under such a setup. Even in the simpler setup, matrices which are not positive definite lead to failure of the optimization problem. mma prevents this by looking for a "close" positive definite matrix that can be used, based on make.positive.definite from package corpcor.
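The mechanics of (8)–(9) can be illustrated with a base-R sketch for $k = 2$ nested models, minimizing over the weight on the smaller model by a simple grid search (mma() itself uses quadratic programming instead; toy data are invented):

```r
# Illustrative sketch of Mallow's criterion (8) for two nested linear models.
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.2 * x2 + rnorm(n)
m1 <- lm(y ~ x1); m2 <- lm(y ~ x1 + x2)        # nested candidates
P  <- function(m) { X <- model.matrix(m); X %*% solve(crossprod(X)) %*% t(X) }
P1 <- P(m1); P2 <- P(m2)                        # hat matrices P_kappa
sigma2 <- summary(m2)$sigma^2                   # sigma^2 from the full model
Cp <- function(w) {                             # (8): residual SS + penalty
  mu <- (w * P1 + (1 - w) * P2) %*% y           # averaged fit with weight w on m1
  sum((y - mu)^2) + 2 * sigma2 * (w * sum(diag(P1)) + (1 - w) * sum(diag(P2)))
}
grid  <- seq(0, 1, by = 0.01)
w_opt <- grid[which.min(sapply(grid, Cp))]      # (9) by grid search
w_opt
```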
3.1.3 Lasso Averaging
Shrinkage estimation, for example via the LASSO (Tibshirani, 1996), can be used for model selection. This requires the choice of a tuning parameter, which comes with tuning parameter selection uncertainty. LASSO averaging estimation (LAE), or, more generally, shrinkage averaging estimation (Schomaker, 2012), is a way to combine shrinkage estimators with different tuning parameters. It is implemented in MAMI in the lae function and can be used in mami() by calling the option method="LAE".
Consider the LASSO estimator for a simple linear model:

$$\hat{\beta}_{\mathrm{LE}}(\lambda) = \arg\min \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|. \qquad (10)$$

The complexity parameter $\lambda \geq 0$ tunes the amount of shrinkage and is typically estimated via the generalized cross validation criterion (GCV) or any other cross validation criterion (CV). The larger the value of $\lambda$, the greater the amount of shrinkage, since the estimated coefficients are shrunk towards zero.
Consider a sequence of candidate tuning parameters $\lambda = \{\lambda_1, \ldots, \lambda_L\}$. If each estimator $\hat{\beta}_{\mathrm{LE}}(\lambda_i)$ obtains a specific weight $w_{\lambda_i}$, then a LASSO averaging estimator takes the form

$$\hat{\bar{\beta}}_{\mathrm{LAE}} = \sum_{i=1}^{L} w_{\lambda_i} \hat{\beta}_{\mathrm{LE}}(\lambda_i) = w_\lambda \hat{B}_{\mathrm{LE}}, \qquad (11)$$

where $\lambda_i \in [0, c]$, $c > 0$ is a suitable constant, $\hat{B}_{\mathrm{LE}} = (\hat{\beta}_{\mathrm{LE}}(\lambda_1), \ldots, \hat{\beta}_{\mathrm{LE}}(\lambda_L))'$ is the $L \times p$ matrix of the LASSO estimators, $w_\lambda = (w_{\lambda_1}, \ldots, w_{\lambda_L})$ is a $1 \times L$ weight vector, $w_\lambda \in \mathcal{W}$, and $\mathcal{W} = \{w_\lambda \in [0,1]^L : 1'w_\lambda = 1\}$.
One could choose the weights

$$\hat{w}_\lambda^{\mathrm{OCV}} = \arg\min_{w_\lambda \in \mathcal{W}} \mathrm{OCV}_k \qquad (12)$$

with

$$\mathrm{OCV}_k = \frac{1}{n}\, \tilde{\epsilon}(w)'\tilde{\epsilon}(w) = \frac{1}{n}\, w_\lambda E_k' E_k w_\lambda' \geq 0, \qquad (13)$$

referring to an optimal cross validation (OCV) based criterion, where $E_k = (\tilde{\epsilon}_k(\lambda_1), \ldots, \tilde{\epsilon}_k(\lambda_L))$ is the $n \times L$ matrix of the cross-validation residuals for the $L$ competing tuning parameters (given a specific loss function). More details can be found in Schomaker (2012). Using LAE with method="LAE" has the following implications:
- LAE can be used for the linear model, the logistic model and the Poisson model – but not for longitudinal or survival data.
- The model averaging estimator is based on the weights (12) rather than (1).
- The variance of the LAE estimator is calculated according to (2), but the candidate models refer to the different choices of the tuning parameter. The variance of each single LASSO estimator $\hat{\beta}_{\mathrm{LE}}(\lambda_i)$ is based on bootstrapping. The default is 100 bootstrap samples, as confidence intervals for the LASSO are not of interest for every problem. The option B.var (in lae, or called from mami) can be used to adjust the number of bootstrap samples. Note that the LASSO estimator is a shrinkage estimator which trades decreased variance against increased bias. Therefore, simple bootstrap based confidence intervals will not achieve nominal coverage (a warning is printed), and using them for model averaging (another shrinkage estimator) will not change this.
- The LASSO implementation cv.glmnet from package glmnet is used. Therefore, any arguments can be passed to this function: for example alpha (to call the Elastic Net or Ridge estimator [see Example 4, ?mami]) or the λ sequence (nlambda).
- The k of k-fold cross-validation (for the definition of OCV_k) can be specified with the option kfold [both in lae and mami]. The default split of the data for cross validation is based on the sequence 1, 2, ..., k, 1, 2, .... The option cvr=T randomly shuffles the split. The default cvr=F is meant to make results reproducible and comparable when using both method="LAE" and method="LASSO".
- Note that the deviance is used as the loss function to calculate the cross validation residuals for logistic and Poisson regression models. However, the weights for Lasso averaging have to be determined based on a squared loss function.
Example 3 under ?mami as well as the examples under ?lae give examples of the use of LAE.
3.1.4 Jackknife Model Averaging
Jackknife Model Averaging (JMA), as suggested in Hansen and Racine (2012), has been im-
plemented in the function jma. It will soon be available as an option in function mami too.
3.2 Model Selection
Model selection means assigning a weight of 1 to the model which is optimal with respect to
a specific criterion. In many situations it is likely that there is model selection uncertainty in
the sense that different samples would lead to different model choices: sometimes a variable is
included, and sometimes not. Sometimes, this can make the distribution of parameter estimates
bimodal (Schomaker and Heumann,2014) – and bootstrap based confidence intervals (Section
4) together with graphical summaries (Section 5) can therefore be a good choice when applying
model selection.
3.2.1 Criterion Based Model Selection
MAMI offers model selection based on the following options:
- method="MS.criterion" [or sometimes method="MA.criterion"] and criterion="AIC", criterion="BIC", criterion="GCV", or criterion="CV".
- criterion="AIC" chooses the model with the smallest AIC based on a stepwise search with stepAIC from MASS. Note that the number of candidate models is therefore not $2^p$ as with method="MA.criterion" [where model selection results are displayed in addition to model averaging results]. This means that model selection results can potentially differ between the two approaches. The latter is preferred if computationally feasible. For longitudinal data, AIC based model selection can only be utilized with method="MA.criterion" and not with the quicker method="MS.criterion".
- criterion="BIC" chooses the model with the smallest BIC based on a stepwise search with stepAIC from MASS. The same considerations as above apply here as well, i.e. with respect to longitudinal data and potentially differing results to those displayed after model averaging.
- For both AIC and BIC, the considerations from Section 3.1.1 apply, for example the critique of these criteria in survival analysis and mixed models.
- criterion="CV" together with kfold uses the k-fold cross validation error for model selection based on a squared loss function. This approach is implemented using dredge in MuMIn, and therefore all $2^p$ models are evaluated. This may be time-consuming, and criterion="GCV" (see below) is an alternative. The default split of the data for cross validation is based on the sequence 1, 2, ..., k, 1, 2, .... The option cvr=T randomly shuffles the split. Cross validation cannot be used for survival models. It can be used for longitudinal data, i.e. in the context of mixed models, but one needs to be aware that the cross validation predictions use the average intercept within each subject.
- criterion="GCV" uses generalized cross validation, i.e. an approximation to leave-one-out cross-validation, for model selection and is much quicker than criterion="CV". It cannot be used for survival data, but can be used for longitudinal data. Again, the implementation utilizes dredge from MuMIn and thus all $2^p$ candidate models are evaluated.
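The stepwise search underlying criterion="AIC" can be illustrated directly with stepAIC on toy data (a sketch, not a call into MAMI; MASS ships with R):

```r
# Sketch of AIC-based stepwise selection as used by method="MS.criterion".
library(MASS)
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- 0.8 * d$x1 + rnorm(50)                  # x2 is noise
full <- lm(y ~ x1 + x2, data = d)              # full analysis model
sel  <- stepAIC(full, trace = FALSE)           # stepwise search, smallest AIC
names(coef(sel))                               # variables retained
```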
3.2.2 LASSO and Shrinkage Based Model Selection
Model selection can be done with the LASSO estimator as introduced in (10). This can be achieved with the following option:
- method="LASSO" together with kfold.
As opposed to LASSO averaging, LASSO based model selection can be used not only for the linear, logistic and Poisson model, but also for the Cox proportional hazards model. The implementation is based on cv.glmnet from package glmnet.
Essentially the same considerations as in Section 3.1.3 apply. Briefly:
- The variance of each single LASSO estimator is based on bootstrapping, with a default of max(B, 100) bootstrap samples. This may be lowered in future releases, as the standard error may not necessarily be of interest.
- One might argue that confidence intervals for the LASSO estimator are not meaningful. Certainly, they are not corrected for the bias introduced by shrinkage and won't achieve nominal coverage.
- The data are split into a sequence of 1, 2, ..., k, 1, 2, ..., and the tuning parameter which minimizes the k-fold cross validation error (chosen via kfold) is used. The option cvr=T randomly shuffles the split.
- The loss function to calculate the cross validation error is the squared error loss for linear models, the deviance for logistic and Poisson models, and the partial likelihood for the Cox model.
- To use general Elastic Net type model selection, pass on alpha to cv.glmnet, e.g. alpha=0.5.
4 Inference
In general, confidence intervals based on (6) are calculated (or on the very similar (7)
if criterion="BIC+"). Schomaker and Heumann (2014, Table 3) show that these intervals can
work quite well for criterion based model averaging estimators; however, they will
underestimate the variance for model selection estimators. Bootstrap confidence intervals
can help to improve the coverage of model selection estimators. Moreover, distributions
post model selection and post model averaging are often non-symmetric (Hjort and Claeskens,
2003; Schomaker and Heumann, 2014, Fig. 2-4), which is another motivation for
bootstrapping. When applying bootstrapping one of course has to impute the data in each
bootstrap sample. Bootstrap model selection/averaging confidence intervals, after
imputation, can thus be generated as follows:
1) Create B bootstrap samples of the original data (including missing observations)
2) Generate M imputed sets of data for each bootstrap sample
3) In each bootstrap sample calculate a model averaging (selection) estimator of
   the regression parameters using equation (5) – based on the application of a
   particular model averaging/selection scheme on the multiply imputed data
4) Construct 1 − α confidence intervals based on the α/2 and 1 − α/2 percentiles
   of the empirical distribution of the B point estimates produced in step 3
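The four steps can be sketched in a few lines. This is an illustrative Python sketch, not the package's internal R code; `impute` and `ma_estimator` are placeholders for the user's imputation procedure and pooled model-averaging/selection estimator.

```python
import numpy as np

def boot_mi_ci(data, impute, ma_estimator, B=200, M=5, alpha=0.05, seed=1):
    """Percentile bootstrap CI after multiple imputation (steps 1-4 above).
    `impute(d)` returns one imputed copy of d; `ma_estimator(ds)` returns a
    pooled model-averaging/selection estimate from a list of imputed sets."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = []
    for _ in range(B):                          # step 1: bootstrap raw data
        boot = data[rng.integers(0, n, n)]      # (missing values included)
        imputed = [impute(boot) for _ in range(M)]    # step 2: M imputations
        estimates.append(ma_estimator(imputed))       # step 3: pooled estimate
    lo, hi = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi                               # step 4: percentile interval
```

In mami itself the loop over bootstrap samples is triggered by inference="+boot", with the imputation and estimation steps supplied by the chosen imputation tool and averaging scheme.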
If the option inference="+boot" is chosen, confidence intervals according to the above
algorithm are generated in addition to the standard confidence intervals from (6). If
computationally feasible, it is recommended to implement this approach and plot the results
(see also Section 5.1).
- The option inference="+boot" needs to be complemented with the original unimputed
  data (option X.org) [because of step 2] and the number of bootstrap samples to be
  drawn (option B).
- The point estimates reported by mami are still the point estimates according to (5),
  and not the arithmetic mean of the bootstrap samples as suggested by Table 1 in
  Schomaker and Heumann (2014), though the latter are reported by print.mami as well.
- By default a 95% confidence interval is reported, but this can be changed with the
  option CI.
- Confidence intervals are based on a t_R distribution as explained in Section 1.3.2.
  If M = 1, for example under no missing data or a complete case analysis, R is assumed
  to be infinity under model averaging and thus a standard normal distribution is used.
  For model selection, an appropriate t-distribution is used, if meaningful for the
  respective model class.
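The pooling behind these standard intervals follows Rubin's rules. A minimal sketch of how the pooled estimate, the total variance and the degrees of freedom R arise is given below (illustrative Python with a made-up function name; mami's exact small-sample degrees-of-freedom adjustment may differ).

```python
import math

def rubin_pool(estimates, variances):
    """Pool M point estimates and their variances with Rubin's rules and
    return (pooled estimate, total variance, degrees of freedom R).
    A (1 - alpha) CI is then qbar +/- t_{R, 1-alpha/2} * sqrt(total);
    with M = 1 there is no between-imputation variance, R is infinite,
    and the standard normal quantile applies, as described above."""
    M = len(estimates)
    qbar = sum(estimates) / M                    # pooled point estimate
    w = sum(variances) / M                       # within-imputation variance
    if M == 1:
        return qbar, w, math.inf
    b = sum((e - qbar) ** 2 for e in estimates) / (M - 1)   # between variance
    total = w + (1 + 1 / M) * b                  # total variance
    if b == 0:
        return qbar, total, math.inf
    df = (M - 1) * (1 + w / ((1 + 1 / M) * b)) ** 2   # Rubin-type df
    return qbar, total, df
```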
More about combining bootstrapping and multiple imputation can be found in Schomaker
and Heumann (2018).
5 Analysis and Interpretation
5.1 Example
For a reasonably realistic example one could look at the hypothetical HIV data set provided by
MAMI. It contains typical variables of HIV treatment research such as follow-up time (futime),
event of death (dead), baseline variables such as CD4 count (cd4), WHO stage (stage), weight
(weight), and many others. The data looks as follows:
patient hospital futime dead sex age cd4 cd4slope6 weight period haem stage tb cm
1 3 58 0 0 39 128 1.8887 65 2004 13.2184 2 0 0
2 3 1717 1 1 38 77 -6.5082 NA 2001 10.7261 3 0 NA
3 3 1941 0 1 34 124 12.0466 62 2001 NA 3 NA 0
4 3 512 0 1 22 147 8.9213 63 2004 NA 2 0 0
5 3 766 0 1 28 187 4.6059 NA 2004 12.7076 3 0 NA
6 3 2242 0 1 36 10 8.8271 NA 2001 9.4803 3 NA NA
Since data on CD4 count, haemoglobin, WHO stage, tuberculosis and cryptococcal menin-
gitis (all measured at baseline) are partially missing, one could impute the data under a
missing at random assumption. This could be done with Amelia II in R; see Honaker et al.
(2011) for details.
library(MAMI)
library(Amelia)
data(HIV)
HIV.imp <- amelia(HIV, m=5, idvars="patient", logs=c("futime","cd4"),
    noms=c("hospital","sex","dead","tb","cm"), ords=c("period","stage"),
    bounds=matrix(c(3,7,9,11,0,0,0,0,3000,5000,200,150),ncol=3,nrow=4))
To select a model with AIC on the multiply imputed data, we could simply pass it to
mami. Picking the options model="cox" and outcome=c("futime","dead") makes clear
that we are interested in risk factors for the hazard of death, modelled by a Cox proportional
hazards model. With method="MS.criterion" and criterion="AIC" we make clear that
we are interested in model selection with AIC. To ensure that the categorical variables are
treated as such we use add.factor. Now, we may already know that the influence of CD4
count on mortality is typically quadratic, and that haemoglobin levels mean different things
for men and women, and we therefore add the respective transformation and interaction with
add.transformation and add.interaction. Two variables (patient identifier and average
CD4 slope) are not of interest and can be removed. We want the results to be reported as
hazard ratios (rather than coefficients), and thus use report.exp=T. The code looks as follows:
# Model selection after imputation
mami(HIV.imp, model="cox", outcome=c("futime","dead"), method="MS.criterion",
criterion="AIC", add.factor=c("period","hospital","stage"),
add.transformation=c("cd4^2"), add.interaction=list(c("haem","sex")),
report.exp=TRUE, var.remove=c("patient","cd4slope6"))
To get a sense of how results would vary if we selected the model with LASSO rather than
AIC, we simply replace method="MS.criterion" with method="LASSO".
# Model selection after imputation with LASSO
mami(HIV.imp, model="cox", outcome=c("futime","dead"), method="LASSO",
add.factor=c("period","hospital","stage"),
add.transformation=c("cd4^2"), add.interaction=list(c("haem","sex")),
report.exp=TRUE, var.remove=c("patient","cd4slope6"))
Both methods suggest picking all variables, but at the same time the confidence intervals
for some variables are very wide. Suppose we are interested in model selection uncertainty;
we therefore use model averaging, but with the more parsimonious BIC instead. We may pick
BIC+ to obtain a reduced set of candidate models and therefore quicker results.
# Model averaging after imputation
mami(HIV.imp, model="cox", outcome=c("futime","dead"), method="MA.criterion",
criterion="BIC+", add.factor=c("period","hospital","stage"),
add.transformation=c("cd4^2"), add.interaction=list(c("haem","sex")),
report.exp=TRUE, var.remove=c("patient","cd4slope6"))
The output looks as follows:
Estimates for model averaging:
Estimate Lower CI Upper CI
factor.hospital..4 -- -- --
factor.hospital..5 -- -- --
sex 0.914877 0.75895 1.102839
age 1.024 0.977467 1.072749
cd4 0.998027 0.994104 1.001965
wt 0.974976 0.927272 1.025134
factor.period..2004 0.999213 0.993614 1.004844
factor.period..2007 0.999764 0.998083 1.001448
haem 0.890771 0.707211 1.121974
factor.stage..3 1.425054 0.651179 3.118619
factor.stage..4 2.40972 0.420057 13.823716
tb 1.445979 0.600062 3.484396
cm 1.011204 0.957709 1.067687
I.cd4.2. 1 0.999999 1.000002
sex.haem 0.988829 0.964419 1.013856
Estimates for model selection:
Estimate Lower CI Upper CI
factor.hospital..4 -- -- --
factor.hospital..5 -- -- --
sex 0.94812 0.613547 1.46514
age 1.023815 0.977524 1.072298
cd4 0.99802 0.994124 1.001931
wt 0.97507 0.927527 1.025051
factor.period..2004 -- -- --
factor.period..2007 -- -- --
haem 0.892626 0.710438 1.121534
factor.stage..3 1.414049 0.641343 3.117733
factor.stage..4 2.385434 0.423874 13.424494
tb 1.44305 0.516333 4.033044
cm -- -- --
I.cd4.2. -- -- --
sex.haem 0.984011 0.925989 1.045668
Posterior effect probabilities:
age factor.stage. wt haem cd4 tb sex.haem
1.00 1.00 1.00 1.00 0.90 0.82 0.42
sex I.cd4.2. cm factor.period. factor.hospital.
0.35 0.14 0.03 0.00 0.00
One can see from the results of both model selection and averaging that the variable
hospital is not picked and has a variable importance of 0. Some of the variables which had
wide confidence intervals with AIC are now no longer selected by BIC (e.g. cryptococcal
meningitis and CD4^2). The variable importance measures (i.e. posterior effect probabilities)
suggest that there is some uncertainty around variables such as the added interaction, or sex.
So, what to do? We could repeat the analysis with bootstrap confidence intervals to get a
better sense of the model selection and model averaging results. We therefore use the options
inference="+boot", B=200, X.org=HIV. We know that it makes biological sense that co-
infections are relevant for predicting mortality, but the associations in the data are maybe
not strong enough for them to be picked immediately. Thus, we could opt to do model
averaging with AIC rather than BIC, because the latter is more parsimonious. Knowing that
mami utilizes dredge from MuMIn, we can pass on the option subset to specify that candidate
models should contain squared CD4 count only if linear CD4 count is contained, and that the
interaction of sex and haemoglobin should only be contained in models where the main
effects are contained too. This can be achieved with dependency chains (dc); see ?dredge for
help. The syntax then looks as follows:
# Model averaging after imputation + bootstrapping
m1 <- mami(HIV.imp,
model="cox", outcome=c("futime","dead"),
method="MA.criterion", criterion="AIC",
add.factor=c("period","hospital","stage"),
add.transformation=c("cd4^2"), add.interaction=list(c("haem","sex")),
report.exp=TRUE, var.remove=c("patient","cd4slope6"),
inference="+boot", B=200, X.org=HIV,
subset = dc("cd4","I(cd4^2)") && dc("sex","sex:haem")
&& dc("haem","sex:haem"),
print.time=TRUE)
summary(m1)
plot(m1)
> summary(m1)
...
Estimates for model selection (based on 200 bootstrap samples):
Estimate LCI UCI Boot LCI Boot UCI VI
age 1.023907 0.977561 1.072449 1.011644 1.034206 1
cd4 0.997417 0.991887 1.002979 0.994914 0.999347 0.99
cm 1.078447 0.581898 1.998714 0.796985 1.881647 0.36
factor(hospital)4 -- -- -- 0.678480 1.000000 0.23
factor(hospital)5 -- -- -- 0.783300 1.143629 --
factor(period)2004 0.760642 0.444489 1.301663 0.586118 1.000000 0.7
factor(period)2007 0.908518 0.740873 1.114098 0.545100 1.424140 --
factor(stage)3 1.382479 0.647652 2.951041 0.961486 1.961763 1
factor(stage)4 2.286076 0.440729 11.857959 1.520232 3.193898 --
haem 0.879414 0.680912 1.135784 0.830303 0.953194 1
haem:sex -- -- -- 0.898716 1.016400 0.31
I(cd4^2) 1.000002 0.999994 1.000009 1.000000 1.000005 0.58
sex 0.763531 0.448949 1.298543 0.591503 2.268933 0.94
tb 1.514913 0.636557 3.605274 1.044047 1.903004 0.94
wt 0.975424 0.92847 1.024753 0.965503 0.985913 1
The results are now much more nuanced: estimates of coefficients of variables we were
unsure about have non-normal distributions. For example, the bootstrap distribution of the
hazard ratio of cryptococcal meningitis is bimodal, see also Figures 1a and 1b [which are
provided by plot(m1)]. Comprehensive model selection and averaging with AIC suggests a
moderate effect of variables such as the transformation, but no major role of the interaction.
This would potentially have remained undiscovered without bootstrapping. Figure 1 shows
that, from an explorative perspective, post-model averaging/selection plots can help us get a
better sense of which variables are potentially relevant and which are not.
Given the variety of options for how to perform and report model selection and averaging,
we give some guidance in the next section on how to report results.
5.2 Suggested Reporting in Publications
By default, print.mami lists a wide variety of results, and all of them could be reported.
However, a more concise approach may be better understood.
Since model averaging is, as opposed to model selection, not always well-known, a good
option could be to report i) the results of the full model (without model selection, as
estimates are consistent), ii) the results after the preferred model selection procedure has
been applied – together with bootstrap confidence intervals (crucial, so that the intervals
reflect model selection uncertainty), and iii) the variable importance measure related to
the weights of model averaging (to get a sense of model selection uncertainty). A template
is given in Table 1 (use summary()).
For example, the results reported in Porter et al. (2015) – see Figure 2 – are very similar to
the suggested template, except that confidence intervals are not based on bootstrapping and
univariate results are reported as well.
Karamchand et al. (2016) do not report results from model averaging, e.g. via variable
importance measures, but are otherwise similar in their approach to reporting results of
model selection after imputation; see Figure 3.
Another good example is the table from Visser et al. (2012) [Figure 4], where Bayesian
posterior effect probabilities (based on Bayesian Model Averaging after imputation) are given,
but no results of model selection.
[Figure 1 consists of four density plots of bootstrap distributions: (a) cryptococcal
meningitis, cm (after model selection), (b) cryptococcal meningitis, cm (after model
averaging), (c) squared CD4, I(cd4^2) (after model selection), (d) sex:haemoglobin
interaction, haem:sex (after model selection).]
Figure 1: Bootstrap distribution, after model selection and model averaging
Table 1: A template to report results from mami (use summary())

              Full Model            Model Selection
Variable      β      95% CI         β      95% CI       VI
Variable 1
Variable 2
Variable 3
Variable 4
Variable 5
...

Footnotes on screening, model assumptions etc.
Figure 2: Table 3 from Porter et al. (2015)
Figure 3: Table 3 from Karamchand et al. (2016)
Figure 4: Table 2 from Visser et al. (2012)
6 Miscellaneous
6.1 Computation Time
In many settings the computation time can be long, particularly when performing model
averaging. When applying bootstrapping to obtain confidence intervals, estimation can take
even longer. To decrease computation time several options are possible. Each of these options
either decreases the number of candidate models and/or parallelizes the computations.
(i) If model averaging or model selection is based on a criterion, and mami accesses dredge
from package MuMIn as described in Section 3.1.1 and Section 3.2.1, then restrictions on
the number of candidate models can be passed on to dredge. Type ?dredge to learn
more, and see also Example 4 in ?mami and the last example in Section 5.1.
(ii) For criterion based model averaging (or selection with cross validation) as described in
(i), the option candidate.models can be used as well. This is a simple wrapper to
reduce the maximum number of variables in the candidate models to half of all variables
(candidate.models="restricted") or a fourth of all variables (candidate.models =
"very restricted"). This should be used only if one is sure that it is appropriate.
(iii) As indicated above, the option criterion="BIC+" utilizes Bayesian Model Averaging
by calling the package BMA. This implementation does not evaluate all candidate mod-
els, but uses a branch-and-bound algorithm to reduce the number of candidate models.
The number of candidate models (in each imputed dataset) is printed by mami. Thus,
criterion="BIC+" is a viable alternative to criterion="BIC" if the number of candidate
models is large.
(iv) Similarly, when using criterion="BIC+" and therefore accessing bic.surv or bic.glm
from BMA, one can adjust Occam's window (Hoeting et al., 1999) using the option OR:
based on the ratio between the posterior model probabilities of the "best" model and
the candidate model under consideration, models with ratios greater than C are rejected.
Since within BMA's branch-and-bound algorithm rejection of a model also means that
nested submodels can be rejected, this leads to reduced complexity. The default of C is
20, and a reduction down to 5 can be acceptable in complex settings. See Hoeting et al.
(1999, Section 3) for more details, and Example 4 from ?mami for a practical example.
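Ignoring the additional pruning of nested submodels, the core of Occam's window can be sketched as follows (illustrative Python; the function name is made up, and models are represented simply as name/posterior-probability pairs):

```python
def occams_window(models, C=20):
    """Keep only models whose posterior model probability is within a
    factor C of the best model's, i.e. reject models for which
    best_probability / probability > C (the OR option described above)."""
    best = max(p for _, p in models)
    return [name for name, p in models if best / p <= C]
```

Lowering C shrinks the window and hence the set of candidate models that survive.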
(v) For a very large number of variables a pre-screening step can be an option. This implies
that the model selection (averaging) estimator is then conditional on this pre-screening
step. An efficient way to screen variables is to use the LASSO estimator as introduced in
(10). This is implemented in the package via the option screen. One has to specify the
number of variables that should be excluded before model selection/averaging after im-
putation is utilized. Then, based on the LASSO path, an appropriate number of variables
are excluded and mami reports which ones.
- Screening is done on the first imputed data set.
- mami deals automatically with the consequences related to suggested interactions,
  transformations etc.
- Screening works for all model classes, but only for cross-sectional data.
- Screening might not work in conjunction with complex sub-options, for example
  when accessing subset in dredge.
- Screening is recommended if the number of variables is very large. For example,
  when dealing with 100 variables, and having the knowledge that only a small subset
  is likely relevant, one could screen away 60 variables and then use criterion="BIC+"
  afterwards.
- If screening is utilized, it is suggested to report the excluded variables in a footnote.
(vi) The best option to save time is parallelization using the option ncores. One simply has
to supply the number of available cores (threads) and mami automatically parallelizes parts
of the code to speed up calculations.
- Note that this (experimental) option is still in its infancy. It has been tested heavily,
  but mostly on Windows machines. Please report any bugs.
- There will be no speed-up for very simple problems, or the calculation may even
  take longer. However, you can expect a considerable speed-up if
  i) you have multiply imputed data sets (i.e. M >> 1) and/or
  ii) you are using option inference="+boot" for bootstrap confidence intervals
  (except in conjunction with mice) or
  iii) M = 1 and model averaging is criterion based ("MA.criterion") and very
  complex (many models and/or complex models).
  In the example from Section 5.1, utilizing 7 cores on a Windows machine improved
  the computation time by a factor of 4.2.
- If a Cox model is fit it may be necessary to load library(survival) first; if boot-
  strapping is used it may be necessary to load library(boot) first; and if M = 1 (com-
  plete cases, no missing data, one imputation) and method="MA.criterion" or
  method="MS.criterion" with criterion "CV" or "GCV", library(snow) needs to be
  loaded first. The same applies in the context of mixed models. mami will notify
  the user if a needed package has not been loaded.
- Currently the cluster for parallelization is set up (and stopped) automatically, mostly
  for convenience. In future releases there will be more flexible options for advanced
  users and different computing environments.
- set.seed() will not work with parallelization. Thus, bootstrap confidence intervals
  are not fully reproducible. There are ways around this, and guidance will be given
  in the future.
- Using parallelization means that fewer warning messages are printed by mami.
6.2 Limitations of the Approach
- Model selection and model averaging estimators are generally biased. Improving the
  coverage probability with the approach implemented in mami does not alter this conclu-
  sion. If the number of variables is reasonable, fitting the full model in addition to the
  selected/averaged model is recommended.
- The implementation in mami is meant to be useful, stable and flexible. However, for
  extremely complex situations, or more complex models, mami does not work. Here, a
  manual implementation is needed.
- Re-sampling has limitations and is not always valid, and this applies to model averaging
  too. Good references are Leeb and Pötscher (2005, 2006, 2008) and Pötscher (2006).
- There are simple pragmatic alternatives to our suggested approach: for example,
  selecting only those variables which are selected in each imputed data set; or stacking
  the imputed data, see Wood et al. (2008).
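The first of these pragmatic alternatives amounts to a simple set intersection over the per-imputation selections. As a sketch (illustrative Python; the function name is made up):

```python
def select_in_all(selected_per_imputation):
    """Pragmatic alternative sketched above: keep only those variables
    that the selection procedure picked in every single imputed data set."""
    sets = [set(s) for s in selected_per_imputation]
    return sorted(set.intersection(*sets))
```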
6.3 Optimal Model Averaging for Prediction: Super Learning
The motivation for optimal model averaging, i.e. Mallow’s Model Averaging, Jackknife Model
Averaging, and Shrinkage Averaging Estimation, is essentially prediction. For this reason
predict methods are available: predict.mma, predict.jma, predict.lae. For example:
data(Prostate)
m1 <- mma(Prostate, lpsa~., ycol='lpsa')
predict(m1)
predict(m1, newdata=Prostate[1,])
Depending on the specific problem, optimal model averaging may or may not be a good
prediction algorithm. To choose and combine the best prediction methods, super learning can
be used. Super learning means considering a set of prediction algorithms, for example
regression models, shrinkage estimators or regression trees. Instead of choosing the algorithm
with the smallest cross-validation error, super learning chooses a weighted combination of
different algorithms: the weighted combination which minimizes the cross-validation error. It
can be shown that this weighted combination will perform at least as well as the best
algorithm, if not better (Van der Laan et al., 2008). One may interpret this procedure as
model averaging in a broader sense. The interested reader is referred to Van der Laan and
Petersen (2007) and Van der Laan and Rose (2011), and the references therein, for more details.
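For two candidate algorithms, the idea behind the weight choice can be sketched with a grid search over convex weights (illustrative Python; the function name is made up, and package SuperLearner solves the general optimization problem over arbitrarily many algorithms):

```python
import numpy as np

def super_learner_weights(cv_preds, y, grid=101):
    """Choose convex weights for two candidate algorithms by minimizing
    the cross-validated squared error of the weighted prediction.
    `cv_preds` holds the out-of-fold predictions of each algorithm."""
    p1, p2 = cv_preds
    best_w, best_err = 0.0, np.inf
    for w in np.linspace(0, 1, grid):
        err = np.mean((y - (w * p1 + (1 - w) * p2)) ** 2)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, 1 - best_w
```

The weights play the same role as the model-averaging weights discussed earlier, only that the "models" may now be arbitrary prediction algorithms.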
Briefly, MAMI contains several wrappers that can be used for super learning. They are listed
and explained by typing:
listSLWrappers()
Note that the package SuperLearner (Polley et al., 2017) is required. A simple example
from ?listSLWrappers would be:
library(SuperLearner) # needs to be installed
SL.library <- c(’SL.glm’,’SL.stepAIC’, ’SL.mean’, ’SL.mma.int’, ’SL.jma’)
SL.library2 <- c(’SL.glm’,’SL.stepAIC’, ’SL.mean’, ’SL.lae2’)
data(Prostate)
P1 <- SuperLearner(Y=Prostate[,9], X=Prostate[,-9], SL.library =
SL.library, verbose=T)
P2 <- SuperLearner(Y=Prostate[,5], X=Prostate[,-5], family=’binomial’,
SL.library = SL.library2, verbose=T)
P2$coef
P2$SL.predict
In the above example both a continuous outcome (P1) and a binary outcome (P2) are pre-
dicted – using generalized linear models, the arithmetic mean, models selected by AIC, and
LASSO averaging including squared variables ('SL.lae2'), among others. To see the weight
each algorithm contributes to the prediction, type P2$coef. The prediction itself is
P2$SL.predict.
Super learning is often used for estimation of causal estimands with targeted maximum
likelihood estimation. The implemented wrappers can be used, and have been tested, with
package tmle (Gruber and van der Laan,2012) and ltmle (Lendle et al.,2017).
6.4 Miscellaneous
Other options in mami:

report.exp: it is strongly recommended to set report.exp=T when fitting a logistic model,
a Poisson model or a Cox model to obtain the odds ratio, incidence rate ratio and hazard
ratio, respectively. It can also be of interest when the outcome of a linear model is
log-transformed and thus a log-linear model is interpreted.

print.warnings: if set to TRUE, mami prints not only warnings but also plenty of informa-
tion on the progress of the fitting procedure and on implied assumptions. The default is
TRUE, to be as transparent as possible, but it may be set to FALSE in simulations.

print.time: primarily intended to forecast the computation time when inference="+boot".
It simply multiplies the time of the model selection/averaging procedure on the original data
by the intended number of bootstrap runs.
Index
BMA, 7
  bic.glm
    OR, 19
MAMI
  jma, 11
  lae, 9
    B.var, 10
  mami, 5
    B, 12, 13
    CI, 13
    X.org, 13
    add.factor, 7
    add.interaction, 7
    add.stratum, 7
    add.transformation, 7
    aweights, 7
    candidate.models, 19
    criterion, 7
    cvr, 10, 12
    id, 6
    inference, 12
    kfold, 10, 12
    method, 7–9, 12
    model, 6
    outcome, 7
    print.time, 22
    print.warnings, 22
    report.exp, 22
    screen, 6, 20
    var.remove, 6
  mma, 8
    bsa, 9
    variance, 9
  plot.mami, 16
  predict, 21
  print.mami, 13, 16
  summary.mami, 15, 16
MASS
  stepAIC, 11
MuMIn, 7
  dredge, 11, 19
    dc, 15
    subset, 15
corpcor, 9
glmnet
  cv.glmnet, 10
    alpha, 10, 12
    nlambda, 10
lme4, 8
quadprog, 9

analysis model, 6
bootstrapping, 10, 12
citation, 2
cross validation, 9
imputation model, 5
installation, 2
interpretation, 13
LASSO, 9
model averaging, 7
  criterion based, 7
  LASSO, 9
  MMA, 8
model selection, 11
  criterion based, 11
  LASSO, 12
multiple imputation, 3
ncores, 20
parallelization, 20
super learner, 21
TMLE, 22
References
Barton, K. (2017). MuMIn: Multi-Model Inference. R package version 1.16.6/r405.
Buckland, S. T., K. P. Burnham, and N. H. Augustin (1997). Model selection: an integral part of inference.
Biometrics 53, 603–618.
Burnham, K. and D. Anderson (2002). Model selection and multimodel inference. A practical information-
theoretic approach. Springer, New York.
Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical
Society A 158, 419–466.
Draper, D. (1995). Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society
B 57, 45–97.
Gruber, S. and M. J. van der Laan (2012). tmle: An R package for targeted maximum likelihood estimation.
Journal of Statistical Software 51(13), 1–35.
Hansen, B. E. (2007). Least squares model averaging. Econometrica 75, 1175–1189.
Hansen, B. E. and J. Racine (2012). Jackknife model averaging. Journal of Econometrics, 167, 38–46.
Hjort, N. L. and G. Claeskens (2003). Frequentist model average estimators. Journal of the American Statistical
Association 98, 879–945.
Hoeting, J. A., D. Madigan, A. E. Raftery, and C. T. Volinsky (1999). Bayesian model averaging: a tutorial.
Statistical Science 14, 382–417.
Honaker, J. and G. King (2010). What to do about missing values in time series cross-section data. American
Journal of Political Science 54, 561–581.
Honaker, J., G. King, and M. Blackwell (2011). Amelia II: A program for missing data. Journal of Statistical
Software 45 (7), 1–47.
Karamchand, S., R. Leisegang, M. Schomaker, G. Maartens, L. Walters, M. Hislop, J. A. Dave, N. S. Levitt,
and K. Cohen (2016). Risk factors for incident diabetes in a cohort taking first-line nonnucleoside reverse
transcriptase inhibitor-based antiretroviral therapy. Medicine (Baltimore) 95 (9), e2844.
Leeb, H. and B. M. Pötscher (2005). Model selection and inference: facts and fiction. Econometric Theory 21,
21–59.
Leeb, H. and B. M. Pötscher (2006). Can one estimate the conditional distribution of post-model-selection
estimators? Annals of Statistics 34, 2554–2591.
Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection
estimators? Econometric Theory 24, 338–376.
Lendle, S. D., J. Schwab, M. L. Petersen, and M. J. van der Laan (2017). ltmle: An R package implementing
targeted minimum loss-based estimation for longitudinal data. Journal of Statistical Software 81 (1), 1–21.
Lipsitz, S., M. Parzen, and L. Zhao (2002). A degrees-of-freedom approximation in multiple imputation. Journal
of Statistical Computation and Simulation 72, 309–318.
Polley, E., E. LeDell, C. Kennedy, and M. van der Laan (2017). SuperLearner: Super Learner Prediction. R
package version 2.0-22.
Porter, M., M. A. Davies, M. K. Mapani, H. Rabie, S. Phiri, J. Nuttall, L. Fairlie, K. G. Technau, K. Stinson,
R. Wood, M. Wellington, A. D. Haas, J. Giddy, F. Tanser, and B. Eley (2015). Outcomes of infants
starting antiretroviral therapy in Southern Africa, 2004–2012. Journal of Acquired Immune Deficiency Syn-
dromes 69 (5), 593–601.
Pötscher, B. (2006). The distribution of model averaging estimators and an impossibility result regarding its
estimation. In H. Ho, C. Ing, and T. Lai (Eds.), IMS Lecture Notes: Time series and related topics, Volume 52,
pp. 113–129.
Raftery, A., J. Hoeting, C. Volinsky, I. Painter, and K. Y. Yeung (2017). BMA: Bayesian Model Averaging. R
package version 3.18.7.
Rubin, D. and N. Schenker (1986). Multiple imputation for interval estimation from simple random samples
with ignorable nonresponse. Journal of the American Statistical Association 81, 366–374.
Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Associa-
tion 91 (434), 473–489.
Schomaker, M. (2012). Shrinkage averaging estimation. Statistical Papers 53, 1015–1034.
Schomaker, M. and C. Heumann (2014). Model selection and model averaging after multiple imputation.
Computational Statistics and Data Analysis 71, 758–770.
Schomaker, M. and C. Heumann (2018). Bootstrap inference when using multiple imputation. Statistics in
Medicine 37 (14), 2252–2266.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society
B 58, 267–288.
van Buuren, S. and K. Groothuis-Oudshoorn (2011). mice: Multivariate imputation by chained equations in R.
Journal of Statistical Software 45 (3), 1–67.
Van der Laan, M. and M. Petersen (2007). Statistical learning of origin-specific statistically optimal individu-
alized treatment rules. International Journal of Biostatistics 3, Article 3.
Van der Laan, M., E. Polley, and A. Hubbard (2008). Super learner. Statistical Applications in Genetics and
Molecular Biology 6, Article 25.
Van der Laan, M. and S. Rose (2011). Targeted Learning. Springer.
Visser, M., M. Stead, G. Walzl, R. Warren, M. Schomaker, H. Grewal, E. Swart, and G. Maartens (2012).
Baseline predictors of sputum conversion in pulmonary tuberculosis: importance of cavities, smoking, time
to detection and W-Beijing genotype. PLoS ONE 7, e29588.
Volinsky, C. T., D. Madigan, A. E. Raftery, and R. A. Kronmal (1997). Bayesian model averaging in proportional
hazard models. assessing the risk of a stroke. Journal of the Royal Statistical Society Series C-Applied
Statistics 46 (4), 433–448.
White, I., P. Royston, and A. Wood (2011). Multiple imputation using chained equations. Statistics in
Medicine 30, 377–399.
Wood, A., I. White, and P. Royston (2008). How should variable selection be performed with multiply imputed
data? Statistics in Medicine 27, 3227–3246.
