Causal Learning Manual

Package ‘causalLearning’
September 7, 2018
Title Methods for heterogeneous treatment effect estimation
Version 1.0.0
Description The main functions are cv.causalBoosting and bagged.causalMARS,
which build upon the simpler causalBoosting and causalMARS functions. All of
these functions have their own predict methods.
Depends R (>= 3.3.0)
Imports ranger
License GPL-2
LazyData true
RoxygenNote 6.0.1
NeedsCompilation yes
Author Scott Powers [aut, cre],
Junyang Qian [aut],
Trevor Hastie [aut],
Robert Tibshirani [aut]
Maintainer Scott Powers 

R topics documented:
bagged.causalMARS
causalBoosting
causalMARS
cv.causalBoosting
pollinated.ranger
predict.bagged.causalMARS
predict.causalBoosting
predict.causalMARS
predict.causalTree
predict.cv.causalBoosting
predict.pollinated.ranger
predict.PTOforest
PTOforest
stratify

bagged.causalMARS

Fit a bag of causal MARS models

Description
Fit a bag of causal MARS models
Usage
bagged.causalMARS(x, tx, y, nbag = 20, maxterms = 11, nquant = 5,
degree = ncol(x), eps = 1, backstep = FALSE, propensity = FALSE,
stratum = rep(1, nrow(x)), minnum = 5, verbose = FALSE)
Arguments
x

matrix of covariates

tx

vector of treatment indicators (0 or 1)

y

vector of response values

nbag

number of models to bag

maxterms

maximum number of terms to include in the regression basis (e.g. maxterms = 11
means intercept + 5 pairs added)

nquant

number of quantiles used in splitting

degree

max number of different predictors that can interact in model

eps

shrinkage factor for new term added

backstep

logical: should out-of-bag samples be used to prune each model? otherwise full
regression basis is used for each model

propensity

logical: should propensity score stratification be used?

stratum

optional vector giving propensity score stratum for each observation (only used
if propensity = TRUE)

minnum

minimum number of observations in each arm of each propensity score stratum
needed to estimate regression coefficients for basis (only used if propensity = TRUE)

verbose

logical: should progress be printed to console?

Value
an object of class bagged.causalMARS, which is itself a list of causalMARS objects
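The backstep argument refers to out-of-bag samples. The following is a minimal base-R sketch of what one bootstrap draw and its out-of-bag rows look like; it is an illustration only, and the package's internal resampling may differ in detail. The names boot_idx and oob_idx are hypothetical.

```r
set.seed(1)
n = 100                                          # number of training observations
boot_idx = sample(n, size = n, replace = TRUE)   # rows drawn (with replacement) for one bag
oob_idx = setdiff(seq_len(n), boot_idx)          # rows never drawn: the out-of-bag sample
length(oob_idx) / n                              # fraction of rows left out of this bag
```

With backstep = TRUE, rows like oob_idx are the ones available for pruning a given model, since that model never saw them during fitting.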
Examples
# Randomized experiment example
n = 100 # number of training-set patients to simulate
p = 10 # number of features for each training-set patient
# Simulate data
x = matrix(rnorm(n * p), nrow = n, ncol = p) # simulate covariate matrix
tx_effect = x[, 1] + (x[, 2] > 0) # simple heterogeneous treatment effect
tx = rbinom(n, size = 1, p = 0.5) # random treatment assignment

y = rowMeans(x) + tx * tx_effect + rnorm(n, sd = 0.001) # simulate response
# Estimate bagged causal MARS model
fit_bcm = causalLearning::bagged.causalMARS(x, tx, y, nbag = 10)
pred_bcm = predict(fit_bcm, newx = x)
# Visualize results
plot(tx_effect, pred_bcm, main = 'Bagged causal MARS',
xlab = 'True treatment effect', ylab = 'Estimated treatment effect')
abline(0, 1, lty = 2)

causalBoosting

Fit a causal boosting model

Description
Fit a causal boosting model
Usage
causalBoosting(x, tx, y, num.trees = 500, maxleaves = 4, eps = 0.01,
splitSpread = 0.1, x.est = NULL, tx.est = NULL, y.est = NULL,
propensity = FALSE, stratum = NULL, stratum.est = NULL,
isConstVar = TRUE)
Arguments
x

matrix of covariates

tx

vector of treatment indicators (0 or 1)

y

vector of response values

num.trees

number of shallow causal trees to build

maxleaves

maximum number of leaves per causal tree

eps

learning rate

splitSpread

how far apart should the candidate splits be for the causal trees? (e.g. splitSpread = 0.1
means we consider 10 quantile cutpoints as candidates for making a split)

x.est

optional matrix of estimation-set covariates used for honest re-estimation (ignored if tx.est = NULL or y.est = NULL)

tx.est

optional vector of estimation-set treatment indicators (ignored if x.est = NULL
or y.est = NULL)

y.est

optional vector of estimation-set response values (ignored if x.est = NULL or
tx.est = NULL)

propensity

logical: should propensity score stratification be used?

stratum

optional vector giving propensity score stratum for each observation (only used
if propensity = TRUE)

stratum.est

optional vector giving propensity score stratum for each estimation-set observation (ignored if x.est = NULL or tx.est = NULL or y.est = NULL)

isConstVar

logical: for the causal tree splitting criterion (T-statistic), should it be assumed
that the noise variance is the same in treatment and control arms?
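The x.est, tx.est and y.est arguments expect a separate estimation set for honest re-estimation. One way to prepare such a set is sketched below in base R. The 50/50 split and the variable names (train_rows, est_rows, x_train, and so on) are assumptions made for illustration, not something the package prescribes.

```r
set.seed(1)
n = 200 # number of patients to simulate
p = 10 # number of features for each patient
x = matrix(rnorm(n * p), nrow = n, ncol = p) # simulate covariate matrix
tx = rbinom(n, size = 1, prob = 0.5) # random treatment assignment
y = rowMeans(x) + tx * x[, 1] + rnorm(n) # simulate response

# Hypothetical 50/50 split into a training set and an estimation set
est_rows = sample(n, size = n / 2)
train_rows = setdiff(seq_len(n), est_rows)

x_train = x[train_rows, , drop = FALSE]
tx_train = tx[train_rows]
y_train = y[train_rows]

x_est = x[est_rows, , drop = FALSE]
tx_est = tx[est_rows]
y_est = y[est_rows]
```

The training pieces would then be supplied as x, tx, y and the estimation pieces as x.est, tx.est, y.est. All three estimation-set arguments must be given together, since each is ignored when another is NULL.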

Details
This function exists primarily to be called by cv.causalBoosting because the num.trees parameter
generally needs to be tuned via cross-validation.
Value
an object of class causalBoosting with attributes:
• CBM: a list storing the intercept, the causal trees and eps
• tauhat: matrix of treatment effects for each patient for each step
• G1: estimated-treatment conditional mean for each patient
• G0: estimated-control conditional mean for each patient
• err.y: training error at each step, in predicting response
• num.trees: number of trees specified by function call
Examples
# Randomized experiment example
n = 100 # number of training-set patients to simulate
p = 10 # number of features for each training-set patient
# Simulate data
x = matrix(rnorm(n * p), nrow = n, ncol = p) # simulate covariate matrix
tx_effect = x[, 1] + (x[, 2] > 0) # simple heterogeneous treatment effect
tx = rbinom(n, size = 1, p = 0.5) # random treatment assignment
y = rowMeans(x) + tx * tx_effect + rnorm(n, sd = 0.001) # simulate response
# Estimate causal boosting model
fit_cb = causalBoosting(x, tx, y, num.trees = 500)
pred_cb = predict(fit_cb, newx = x, num.trees = 500)
# Visualize results
plot(tx_effect, pred_cb, main = 'Causal boosting',
xlab = 'True treatment effect', ylab = 'Estimated treatment effect')
abline(0, 1, lty = 2)

causalMARS

Fit a causal MARS model

Description
Fit a causal MARS model
Usage
causalMARS(x, tx, y, maxterms = 11, nquant = 5, degree = ncol(x),
eps = 1, backstep = FALSE, x.val = NULL, tx.val = NULL,
y.val = NULL, propensity = FALSE, stratum = rep(1, nrow(x)),
stratum.val = NULL, minnum = 5)

Arguments
x

matrix of covariates

tx

vector of treatment indicators (0 or 1)

y

vector of response values

maxterms

maximum number of terms to include in the regression basis (e.g. maxterms = 11
means intercept + 5 pairs added)

nquant

number of quantiles used in splitting

degree

max number of different predictors that can interact in model

eps

shrinkage factor for new term added

backstep

logical: after building out regression basis, should backward stepwise selection
be used to create a sequence of models, with the criterion evaluated on a validation set to choose among the sequence?

x.val

optional matrix of validation-set covariates (only used if backstep = TRUE)

tx.val

optional vector of validation-set treatment indicators (only used if backstep = TRUE)

y.val

optional vector of validation-set response values (only used if backstep = TRUE)

propensity

logical: should propensity score stratification be used?

stratum

optional vector giving propensity score stratum for each observation (only used
if propensity = TRUE)

stratum.val

optional vector giving propensity score stratum for each validation-set observation (only used if propensity = backstep = TRUE)

minnum

minimum number of observations in each arm of each propensity score stratum
needed to estimate regression coefficients for basis (only used if propensity = TRUE)

Details
Parallel-arms MARS with backward stepwise selection, covering both the randomized case and the
propensity-stratum case.
Data structures: model terms (nodes) are numbered 1, 2, ..., with 1 representing the intercept.
Forward stepwise: modmatrix contains the basis functions as the model is built up; two columns are
added at each step. It does not include a column of ones. For tidiness, two terms are always added,
even when the term added is linear (in which case the reflected version is just zero).
Backward stepwise: khat is the sequence of terms deleted at each step, based on deltahat, the
relative change in RSS. rsstesthat is the RSS over the test (validation) set achieved by each
reduced model in the sequence; it is used later for selecting a member of the sequence. active2
contains the indices of columns with nonzero norm.
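The pair of columns added at each forward step can be pictured as a hinge function and its reflection, as in standard MARS. The base-R sketch below is an illustration under stated assumptions: the placement of candidate cutpoints at evenly spaced interior quantiles is a guess, and hinge_pair, cuts and cutpoint are hypothetical names, not objects from the package.

```r
set.seed(1)
x1 = rnorm(50) # one covariate column
nquant = 5 # number of candidate quantiles

# Assumption: candidate cutpoints sit at evenly spaced interior quantiles
cuts = quantile(x1, probs = seq_len(nquant) / (nquant + 1), names = FALSE)
cutpoint = cuts[3]

# A hinge term and its reflection: the two columns added in one forward step
hinge_pair = cbind(pos = pmax(x1 - cutpoint, 0),
                   neg = pmax(cutpoint - x1, 0))

# maxterms counts the intercept plus two terms per step
maxterms = 11
n_pairs = (maxterms - 1) / 2 # 5 pairs, matching the maxterms = 11 example
```

For every observation at most one column of the pair is nonzero, and the difference of the two columns recovers x1 - cutpoint.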
Value
an object of class causalMARS with attributes:
• parent: indices of nodes that are parents at each stage
• childvar: index of predictor chosen at each forward step
• childquant: quantile of cutoff chosen at each forward step
• quant: quantiles of the columns of x
• active: indices of columns with nonzero norm
• allvars: list of variables appearing in each term
• khat: the sequence of terms deleted at each step
• deltahat: relative change in rss

• rsstesthat: validation-set rss achieved by each model in sequence
• setesthat: standard error for rsstesthat
• tim1: time elapsed during forward stepwise phase
• tim2: total time elapsed
• x
• tx
• y
• maxterms
• eps
• backstep
• propensity
• x.val
• tx.val
• y.val
• stratum
• stratum.val
• minnum

Examples
# Randomized experiment example
n = 100 # number of training-set patients to simulate
p = 10 # number of features for each training-set patient
# Simulate data
x = matrix(rnorm(n * p), nrow = n, ncol = p) # simulate covariate matrix
tx_effect = x[, 1] + (x[, 2] > 0) # simple heterogeneous treatment effect
tx = rbinom(n, size = 1, p = 0.5) # random treatment assignment
y = rowMeans(x) + tx * tx_effect + rnorm(n, sd = 0.001) # simulate response
# Estimate causal MARS model
fit_cm = causalLearning::causalMARS(x, tx, y)
pred_cm = predict(fit_cm, newx = x)
# Visualize results
plot(tx_effect, pred_cm, main = 'Causal MARS',
xlab = 'True treatment effect', ylab = 'Estimated treatment effect')
abline(0, 1, lty = 2)

cv.causalBoosting

Fit a causal boosting model with cross validation

Description
Fit a causal boosting model with cross validation
Usage
cv.causalBoosting(x, tx, y, num.trees = 500, maxleaves = 4, eps = 0.01,
splitSpread = 0.1, type.measure = c("effect", "response"), nfolds = 5,
foldid = NULL, propensity = FALSE, stratum = NULL, isConstVar = TRUE)
Arguments
x

matrix of covariates

tx

vector of treatment indicators (0 or 1)

y

vector of response values

num.trees

number of shallow causal trees to build

maxleaves

maximum number of leaves per causal tree

eps

learning rate

splitSpread

how far apart should the candidate splits be for the causal trees? (e.g. splitSpread = 0.1
means we consider 10 quantile cutpoints as candidates for making a split)

type.measure

loss to use for cross validation: ’response’ returns mean-square error for predicting response in each arm. ’effect’ returns MSE for treatment effect using honest
over-fit estimation.

nfolds

number of cross validation folds

foldid

vector of fold membership

propensity

logical: should propensity score stratification be used?

stratum

optional vector giving propensity score stratum for each observation (only used
if propensity = TRUE)

isConstVar

logical: for the causal tree splitting criterion (T-statistic), should it be assumed
that the noise variance is the same in treatment and control arms?
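The foldid argument takes a vector of fold memberships. One way to build such a vector is sketched below, under the assumption that folds are labelled 1 to nfolds. Balancing the folds within each treatment arm is a choice made here for illustration; the manual does not say how the package assigns folds when foldid = NULL.

```r
set.seed(1)
n = 100 # number of training-set patients
nfolds = 5 # number of cross validation folds
tx = rbinom(n, size = 1, prob = 0.5) # random treatment assignment

# Assign folds separately within each arm so every fold contains both arms
foldid = integer(n)
for (arm in c(0, 1)) {
  rows = which(tx == arm)
  foldid[rows] = sample(rep(seq_len(nfolds), length.out = length(rows)))
}
table(foldid, tx) # fold sizes by treatment arm
```

The resulting vector could be passed as foldid, which also makes the fold assignment reproducible across calls.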

Value
an object of class cv.causalBoosting which is an object of class causalBoosting with these
additional attributes:
• num.trees.min: number of trees with lowest CV error
• cvm: vector of mean CV error for each number of trees
• cvsd: vector of standard errors for mean CV errors
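How cvm and cvsd relate to the chosen number of trees can be illustrated with made-up numbers. The sketch assumes that position i of cvm corresponds to a model with i trees; the values below are invented for illustration and are not output from the package.

```r
# Made-up CV results for models with 1 to 6 trees (assumption: index = number of trees)
cvm = c(1.90, 1.42, 1.15, 1.08, 1.11, 1.20) # mean CV error
cvsd = c(0.20, 0.15, 0.12, 0.10, 0.11, 0.13) # standard error of each mean

best = which.min(cvm) # number of trees with the lowest mean CV error
band = c(lower = cvm[best] - cvsd[best],
         upper = cvm[best] + cvsd[best]) # one-standard-error band around the minimum
```

Plotting cvm with error bars of width cvsd against the number of trees is a quick way to check whether the minimum is sharp or sits on a flat stretch.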

Examples
# Randomized experiment example
n = 100 # number of training-set patients to simulate
p = 10 # number of features for each training-set patient
# Simulate data
x = matrix(rnorm(n * p), nrow = n, ncol = p) # simulate covariate matrix
tx_effect = x[, 1] + (x[, 2] > 0) # simple heterogeneous treatment effect
tx = rbinom(n, size = 1, p = 0.5) # random treatment assignment
y = rowMeans(x) + tx * tx_effect + rnorm(n, sd = 0.001) # simulate response
# Estimate causal boosting model with cross-validation
fit_cv = causalLearning::cv.causalBoosting(x, tx, y)
fit_cv$num.trees.min.effect # number of trees chosen by cross-validation
pred_cv = predict(fit_cv, newx = x)
# Visualize results
plot(tx_effect, pred_cv, main = 'Causal boosting w/ CV',
xlab = 'True treatment effect', ylab = 'Estimated treatment effect')
abline(0, 1, lty = 2)

pollinated.ranger

Pollinate a fitted ranger random forest model

Description
Pollinate a fitted ranger random forest model
Usage
pollinated.ranger(object, x, y)
Arguments
object

a fitted ranger object

x

matrix of covariates

y

vector of response values

Value
an object of class pollinated.ranger which is a ranger object that has been pollinated with the
data in (x, y)

predict.bagged.causalMARS
Make predictions from a bag of fitted causal MARS models

Description
Make predictions from a bag of fitted causal MARS models
Usage
## S3 method for class 'bagged.causalMARS'
predict(object, newx, type = c("average", "all"),
...)
Arguments
object

a fitted bagged.causalMARS object

newx

matrix of new covariates for which estimated treatment effects are desired

type

type of prediction required: ’average’ returns a vector of the averages of the
bootstrap estimates. ’all’ returns a matrix of all of the bootstrap estimates.

...

ignored

Value
a vector of estimated personalized treatment effects corresponding to the rows of newx (or, if type = 'all', a matrix of all of the bootstrap estimates)
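The relationship between the two type options can be sketched with a stand-in matrix of bootstrap estimates. The layout (one row per observation in newx, one column per bagged model) is an assumption, and all_est holds simulated numbers, not package output.

```r
set.seed(1)
n_new = 4 # rows of newx
nbag = 3 # number of bagged models

# Stand-in for type = 'all'; assumed layout: rows = observations, columns = models
all_est = matrix(rnorm(n_new * nbag), nrow = n_new, ncol = nbag)

avg_est = rowMeans(all_est) # what type = 'average' summarizes
spread = apply(all_est, 1, sd) # disagreement across the bag, per observation
```

Requesting all of the estimates is useful when a measure of variability across the bag, such as spread above, is wanted alongside the average.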

predict.causalBoosting
Make predictions from a fitted causal boosting model

Description
Make predictions from a fitted causal boosting model
Usage
## S3 method for class 'causalBoosting'
predict(object, newx, newtx = NULL,
type = c("treatment.effect", "conditional.mean", "response"),
num.trees = 1:object$num.trees, honest = FALSE, naVal = 0, ...)

Arguments
object

a fitted causalBoosting object

newx

matrix of new covariates for which estimated treatment effects are desired

newtx

optional vector of new treatment assignments (only used if type = 'response')

type

type of prediction required: ’treatment.effect’ returns estimated treatment effect.
’conditional.mean’ returns two predictions, one for each arm. ’response’ returns
prediction for arm corresponding to newtx.

num.trees

number(s) of shallow causal trees to use for prediction

honest

logical: should honest re-estimates of leaf means be used for prediction? This
requires that x.est, tx.est, y.est were specified when the causal boosting
model was fit

naVal

value with which to replace NA predictions

...

ignored

Value
a vector or matrix of predictions corresponding to the rows of newx
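The three prediction types can be related to one another with a small worked example. It assumes that the treatment effect is the treated-arm conditional mean minus the control-arm conditional mean; the numbers and the column names control and treated are invented for illustration and do not describe the package's actual output layout.

```r
# Made-up conditional means for three new patients
g0 = c(1.0, 2.0, 0.5) # control-arm conditional mean
g1 = c(1.5, 1.0, 2.5) # treatment-arm conditional mean
newtx = c(1, 0, 1) # treatment assignments for the new patients

treatment_effect = g1 - g0 # analogue of type = 'treatment.effect'
conditional_mean = cbind(control = g0, treated = g1) # analogue of type = 'conditional.mean'
response = ifelse(newtx == 1, g1, g0) # analogue of type = 'response'
```

Here treatment_effect is 0.5, -1.0, 2.0 and response is 1.5, 2.0, 2.5, which shows why newtx is only needed for the response type.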

predict.causalMARS

Make predictions from a fitted causal MARS model

Description
Make predictions from a fitted causal MARS model
Usage
## S3 method for class 'causalMARS'
predict(object, newx, active, ...)
Arguments
object

a fitted causalMARS object

newx

matrix of new covariates for which estimated treatment effects are desired

active

indices of columns with nonzero norm (defaults to model selected via backward
stepwise phase, or the full model if backstep = FALSE)

...

ignored

Value
a vector of estimated personalized treatment effects corresponding to the rows of newx

predict.causalTree

Make predictions from a fitted causal tree model

Description
Make predictions from a fitted causal tree model
Usage
## S3 method for class 'causalTree'
predict(object, newx, newtx = NULL,
type = c("treatment.effect", "conditional.mean", "response"),
honest = FALSE, naVal = 0, ...)
Arguments
object

a fitted causalTree object

newx

matrix of new covariates for which estimated treatment effects are desired

newtx

optional vector of new treatment assignments (only used if type = 'response')

type

type of prediction required: ’treatment.effect’ returns estimated treatment effect.
’conditional.mean’ returns two predictions, one for each arm. ’response’ returns
prediction for arm corresponding to newtx.

honest

logical: should honest re-estimates of leaf means be used for prediction? This
requires that x.est, tx.est, y.est were specified when the model was fit

naVal

value with which to replace NA predictions

...

ignored

Value
a vector or matrix of predictions corresponding to the rows of newx

predict.cv.causalBoosting
Make predictions from a fitted cross-validated causal boosting model

Description
Make predictions from a fitted cross-validated causal boosting model
Usage
## S3 method for class 'cv.causalBoosting'
predict(object, newx, newtx = NULL,
type = c("treatment.effect", "conditional.mean", "response"),
num.trees = object$num.trees.min.effect, naVal = 0, ...)

Arguments
object

a fitted cv.causalBoosting object

newx

matrix of new covariates for which estimated treatment effects are desired

newtx

optional vector of new treatment assignments (only used if type = 'response')

type

type of prediction required: ’treatment.effect’ returns estimated treatment effect.
’conditional.mean’ returns two predictions, one for each arm. ’response’ returns
prediction for arm corresponding to newtx.

num.trees

number of shallow causal trees to use for prediction

naVal

value with which to replace NA predictions

...

ignored

Value
a vector or matrix of predictions corresponding to the rows of newx

predict.pollinated.ranger
Make predictions from a pollinated ranger random forest model

Description
Make predictions from a pollinated ranger random forest model
Usage
## S3 method for class 'pollinated.ranger'
predict(object, newx, predict.all = FALSE,
na.treatment = c("omit", "replace", "NA"), ...)
Arguments
object

a fitted pollinated.ranger object

newx

matrix of new covariates for which predictions are desired

predict.all

logical: should predictions from all trees be returned? Otherwise the average
across trees is returned

na.treatment

how to treat NA predictions from individual trees: ’omit’ only uses trees for
which the prediction is not NA. ’replace’ replaces NA predictions with the overall mean response. ’NA’ returns NA if any tree prediction is NA.

...

additional arguments passed on to predict.ranger

Value
a vector of predicted treatment effects corresponding to the rows of newx
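The three na.treatment options can be illustrated on a small made-up matrix of per-tree predictions. This is a base-R sketch of the behavior described above, not the package's code; tree_preds, overall_mean and the pred_* names are hypothetical.

```r
# Made-up per-tree predictions: rows = observations, columns = trees
tree_preds = matrix(c(1, NA, 3,
                      2,  2, 2), nrow = 2, byrow = TRUE)
overall_mean = 5 # stand-in for the overall mean response

# 'omit': average only the trees whose prediction is not NA
pred_omit = rowMeans(tree_preds, na.rm = TRUE)

# 'replace': substitute the overall mean response for NA, then average
filled = tree_preds
filled[is.na(filled)] = overall_mean
pred_replace = rowMeans(filled)

# 'NA': the result is NA if any tree prediction is NA
pred_na = rowMeans(tree_preds)
```

For the first observation the three options give 2, 3 and NA respectively; for the second, which has no missing tree predictions, all three give 2.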

predict.PTOforest

Make predictions from a fitted PTO forest model

Description
Make predictions from a fitted PTO forest model
Usage
## S3 method for class 'PTOforest'
predict(object, newx, ...)
Arguments
object

a fitted PTOforest object

newx

matrix of new covariates for which estimated treatment effects are desired

...

ignored

Value
a vector of predictions corresponding to the rows of newx

PTOforest

Fit a pollinated transformed outcome (PTO) forest model

Description
Fit a pollinated transformed outcome (PTO) forest model
Usage
PTOforest(x, tx, y, pscore = rep(0.5, nrow(x)), num.trees = 500,
mtry = ncol(x), min.node.size = max(25, nrow(x)/40), postprocess = TRUE,
verbose = FALSE)
Arguments
x

matrix of covariates

tx

vector of treatment indicators (0 or 1)

y

vector of response values

pscore

vector of propensity scores

num.trees

number of trees for transformed outcome forest

mtry

number of variables to possibly split at in each node

min.node.size

minimum node size for transformed outcome forest

postprocess

logical: should optional post-processing random forest be fit at end?

verbose

logical: should progress be printed to console?

Value
an object of class PTOforest with attributes:
• x: matrix of covariates supplied by function call
• pscore: vector of propensity scores supplied by function call
• postprocess: logical supplied by function call
• TOfit: fitted random forest on transformed outcomes
• PTOfit1: TOfit pollinated with treatment-arm outcomes
• PTOfit0: TOfit pollinated with control-arm outcomes
• postfit: post-processing random forest summarizing results
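The manual does not spell out how the transformed outcome is computed. A common definition, assumed here, is z = y * (tx - pscore) / (pscore * (1 - pscore)), whose conditional expectation given the covariates equals the treatment effect when pscore is the true propensity score. The sketch below checks this numerically on simulated data; it does not call the package, and whether PTOforest uses exactly this formula is an assumption.

```r
set.seed(1)
n = 10000 # large sample so the averages are stable
p = 10 # number of features
x = matrix(rnorm(n * p), nrow = n, ncol = p) # simulate covariate matrix
tx_effect = 1 + x[, 1] # heterogeneous treatment effect
pscore = rep(0.5, n) # known propensity scores (randomized experiment)
tx = rbinom(n, size = 1, prob = pscore) # random treatment assignment
y = rowMeans(x) + tx * tx_effect + rnorm(n, sd = 0.1) # simulate response

# Assumed transformed outcome: unbiased for the treatment effect given x
z = y * (tx - pscore) / (pscore * (1 - pscore))

c(mean_transformed = mean(z), mean_true_effect = mean(tx_effect))
```

The two averages should be close, while individual values of z are very noisy. That noise is why a forest is fit to the transformed outcome instead of using z directly as an effect estimate.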
Examples
# Randomized experiment example
n = 100 # number of training-set patients to simulate
p = 10 # number of features for each training-set patient
# Simulate data
x = matrix(rnorm(n * p), nrow = n, ncol = p) # simulate covariate matrix
tx_effect = x[, 1] + (x[, 2] > 0) # simple heterogeneous treatment effect
tx = rbinom(n, size = 1, p = 0.5) # random treatment assignment
y = rowMeans(x) + tx * tx_effect + rnorm(n, sd = 0.001) # simulate response
# Estimate PTO forest model
fit_pto = PTOforest(x, tx, y)
pred_pto = predict(fit_pto, newx = x)
# Visualize results
plot(tx_effect, pred_pto, main = 'PTO forest',
xlab = 'True treatment effect', ylab = 'Estimated treatment effect')
abline(0, 1, lty = 2)

stratify

Get propensity strata from propensity scores

Description
Get propensity strata from propensity scores
Usage
stratify(pscore, tx, min.per.arm = 30)
Arguments
pscore

vector of propensity scores

tx

vector of treatment indicators

min.per.arm

minimum number of observations for each arm within each stratum

Value
a vector of integers with length equal to the length of pscore, reporting the propensity stratum
corresponding to each propensity score
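In an observational study the propensity scores passed to stratify must first be estimated. The sketch below fits a logistic regression with base R's glm and then forms five equal-frequency strata with cut. The equal-frequency rule is a stand-in chosen for illustration: it is not the algorithm used by stratify and, unlike stratify, it does not enforce a minimum number of observations per arm.

```r
set.seed(1)
n = 500 # number of patients to simulate
x = matrix(rnorm(n * 3), nrow = n, ncol = 3) # simulate covariate matrix
tx = rbinom(n, size = 1, prob = plogis(0.8 * x[, 1])) # treatment depends on x[, 1]

# Estimate propensity scores by logistic regression
ps_fit = glm(tx ~ x, family = binomial())
pscore = fitted(ps_fit)

# Hypothetical stand-in for stratification: five equal-frequency strata
breaks = quantile(pscore, probs = seq(0, 1, by = 0.2))
stratum = cut(pscore, breaks = breaks, include.lowest = TRUE, labels = FALSE)
table(stratum, tx) # check how many patients of each arm fall in each stratum
```

With the package installed, pscore and tx from this sketch could instead be passed to stratify(pscore, tx, min.per.arm = 30), and the result used as the stratum argument of the fitting functions together with propensity = TRUE.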

