Longitudinal causal analysis without censoring
Xiao Yan, Kuan Liu
Source:vignettes/bayesmsm-nocensoring.Rmd
bayesmsm-nocensoring.Rmd
Introduction
-
The
bayesmsm
package enables easy implementation of the Bayesian marginal structural models (BMSMs) for longitudinal data. The methodology of BMSMs can be divided into 2 estimation steps:- Step 1. Bayesian treatment effect weight estimation
- Step 2. Bayesian non-parametric bootstrap to maximize the utility function with respect to the causal effect
For Step 1, we estimate treatment weights using posterior samples of the and via fitting a series of logistic regressions in a Bayesian framework. The package incorporates both Inverse Probability of Treatment Weighting (IPTW) and Inverse Probability of Censoring Weighting (IPCW) to handle longitudinal data without and with right-censoring. For Step 2, is estimated via non-parametric Bayesian bootstrap with sampling weights.
-
The main functions in this package include:
-
bayesweight
: Calculates Bayesian weights for subject-specific treatment effects. -
bayesweight_cen
: Calculates Bayesian weights for subject-specific treatment effects with right-censored data. -
bayesmsm
: Estimates marginal structural models using the calculated Bayesian weights. -
plot_ATE
: Plots the estimated Average Treatment Effect (ATE). -
plot_APO
: Plots the estimated Average Potential Outcome (APO). -
plot_est_box
: Plots the distribution of estimated treatment effects. -
summary_bayesmsm
: Summarizes the model results frombayesmsm
.
-
-
Installation
- To install the bayesmsm package, you can use the
devtools
package to install it directly from GitHub:
- To install the bayesmsm package, you can use the
devtools::install_github("Kuan-Liu-Lab/bayesmsm")
library(bayesmsm)
Simulated observational data with a time-dependent treatment
- The simulated dataset (continuous outcome)
- 1000 patients and 3 visits (2 of which patients were assigned a treatment)
- y, an end-of-study continuous outcome
- z, a binary treatment
- w1 and w2 are two baseline covariates (one continuous and one binary, mimicking age and sex)
- L1 and L2 are two time-dependent covariates (also one continuous and one binary)
- no missing data
- The simulated DAG
library(DiagrammeR)
grViz("
digraph causal {
# Nodes
node [shape=plaintext]
W [label = 'w1, w2']
L1 [label = 'L11, L21']
Z1 [label = 'Z1']
L2 [label = 'L12, L22']
Z2 [label = 'Z2']
Y [label = 'Y']
# Edges
edge [color=black, arrowhead=vee]
rankdir = LR
W->L1
W->Z1
W->L2
W->Z2
W->Y
L1->Z1
L1->L2
L1->Z2
L1->Y
Z1->L2
Z1->Z2
Z1->Y
L2->Z2
L2->Y
Z2->Y
# Graph
graph [overlap=true, fontsize=14]
}")
library(DT)
options(scipen = 999)
testdata <- read.csv(system.file("extdata", "continuous_outcome_data.csv", package = "bayesmsm"))
# look at the data;
datatable(testdata,
rownames = FALSE,
options = list(dom = 't')) %>%
formatRound(columns=c('w2', 'L2_1', 'L2_2', 'y'), digits=2)
- Frequency Counts by Treatment Combinations
# frequency counts by treatment combinations;
table(testdata$a_1, testdata$a_2)
#>
#> 0 1
#> 0 520 201
#> 1 111 168
- Suppose the causal parameter of interest is the average treatment effect between always treated and never treated,
Bayesian treatment effect weight estimation using
bayesweight
- The following code calls the function
bayesweight
to run JAGS and calculate the weights.- Non-parallel computing requires that
n.chains = 1
. Parallel MCMC requires at least 2 chains because computing is running on 1 core per chain, and we recommend using at most 2 chains less than the number of available cores on your computer. - Running this function automatically saves a JAGS model file in the working directory, which the user can check to review the model specifications.
- Non-parallel computing requires that
- Parameters Description:
-
trtmodel.list
: A list of formulas corresponding to each time point with the time-specific treatment variable on the left hand side and pre-treatment covariates to be balanced on the right hand side. Interactions and functions of covariates are allowed. -
data
: The dataset containing all the variables specified in trtmodel.list. -
n.iter
: Total number of iterations for each chain (including burn-in). -
n.burnin
: Number of iterations to discard at the beginning of the simulation (burn-in). -
n.thin
: Thinning rate for the MCMC sampler. -
n.chains
: Number of MCMC chains to run. For non-parallel execution, this should be set to 1. For parallel execution, it requires at least 2 chains. -
seed
: Seed to ensure reproducibility. -
parallel
: Logical flag indicating whether to run the MCMC chains in parallel. Default is TRUE.
-
weights <- bayesweight(trtmodel.list = list(a_1 ~ w1 + w2 + L1_1 + L2_1,
a_2 ~ w1 + w2 + L1_1 + L2_1 + L1_2 + L2_2 + a_1),
data = testdata,
n.iter = 2000,
n.burnin = 1000,
n.thin = 5,
n.chains = 2,
seed = 890123,
parallel = TRUE)
str(weights)
#> num [1:1000] 1.234 1.134 1.021 0.82 0.795 ...
- It returns a list containing:
-
weights
: The calculated weights for subject-specific treatment effects.
-
Bayesian non-parametric bootstrap to maximize the utility function
with respect to the causal effect using bayesmsm
The function bayesmsm
estimates causal effect of
time-varying treatments. It uses subject-specific treatment assignmennt
weights weights calculated using bayesweight
, and
performs Bayesian non-parametric bootstrap to estimate the causal
parameters.
- Parameters Description:
-
ymodel
: A formula representing the outcome model, which can include interactions and functions of covariates. -
nvisit
: Specifies the number of visits or time points considered in the model. -
reference
: The baseline or reference intervention across all visits, typically represented by a vector of zeros indicating no treatment (default is a vector of all zeros). -
comparator
: The comparison intervention across all visits, typically represented by a vector of ones indicating full treatment (default is a vector of all ones). -
family
: Specifies the outcome distribution family; use “gaussian” for continuous outcomes or “binomial” for binary outcomes (default is “gaussian”). -
data
: The dataset containing all variables required for the model. -
wmean
: A vector of treatment assignment weights. Default is a vector of ones, implying equal weighting. -
nboot
: The number of bootstrap iterations to perform for estimating the uncertainty around the causal estimates. -
optim_method
: The optimization method used to find the best parameters in the model (default is ‘BFGS’). -
seed
: A seed value to ensure reproducibility of results. -
parallel
: A logical flag indicating whether to perform computations in parallel (default is TRUE). -
ncore
: The number of cores to use for parallel computation (default is 4).
-
model <- bayesmsm(ymodel = y ~ a_1+a_2,
nvisit = 2,
reference = c(rep(0,2)),
comparator = c(rep(1,2)),
family = "gaussian",
data = testdata,
wmean = weights,
nboot = 1000,
optim_method = "BFGS",
parallel = TRUE,
seed = 890123,
ncore = 2)
str(model)
#> List of 6
#> $ RD_mean : num -3.16
#> $ RD_sd : num 0.0948
#> $ RD_quantile: Named num [1:2] -3.34 -2.98
#> ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
#> $ bootdata :'data.frame': 1000 obs. of 3 variables:
#> ..$ effect_reference : num [1:1000] 2.36 2.29 2.26 2.34 2.31 ...
#> ..$ effect_comparator: num [1:1000] -0.946 -0.825 -0.754 -0.796 -0.764 ...
#> ..$ RD : num [1:1000] -3.31 -3.11 -3.02 -3.13 -3.08 ...
#> $ reference : num [1:2] 0 0
#> $ comparator : num [1:2] 1 1
- It returns a model object which contains:
-
mean
,sd
,quantile
: the mean, standard deviation and 95% credible interval of the estimated causal effect (ATE). From the above results, the mean of ATE is approximately -3.161, which indicates that the expected outcome for always treated patients is, on average, 3.161 units less than that for never treated patients. -
bootdata
: a data frame containing the bootstrap samples for the reference effect, comparator effect, and average treatment effect (ATE). -
reference
,comparator
: the reference level and comparator level the user chooses to compare. Here the reference level is never treated (0,0), and the comparator level is always treated (1,1).
-
Visualization functions: plot_ATE
,
plot_APO
, plot_est_box
The bayesmsm
package also provides several other
functions for visualizing the above results: plot_ATE
,
plot_APO
, and plot_est_box
. These functions
help the user better interpret the estimated causal effects.
- Plotting the Average Treatment Effect (ATE)
- The
plot_ATE
function generates a plot of the estimated ATE with its 95% credible interval.
- The
plot_ATE(model)
- Plotting the Average Potential Outcome (APO)
- Similarly, the
plot_APO
function plots the estimated APO for both the reference and comparator level effects.
- Similarly, the
plot_APO(model, "effect_reference")
plot_APO(model, "effect_comparator")
- Plotting the Distribution of Estimated Treatment Effects
- The
plot_est_box
function generates an error bar plot of the estimated treatment effects (APO and ATE) from the bootstrap samples.
- The
plot_est_box(model)
Reference
- Liu, K. (2021). Bayesian causal inference with longitudinal data. Tspace.library.utoronto.ca. https://tspace.library.utoronto.ca/handle/1807/109330
- Saarela, O., Stephens, D. A., Moodie, E. E. M., & Klein, M. B. (2015). On Bayesian estimation of marginal structural models. Biometrics, 71(2), 279–288. https://doi.org/10.1111/biom.12269
- Robins, J. M., Hernán, M. A., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5), 550–560. https://doi.org/10.1097/00001648-200009000-00011
- Liu, K., Saarela, O., Feldman, B. M., & Pullenayegum, E. (2020). Estimation of causal effects with repeatedly measured outcomes in a Bayesian framework. Statistical Methods in Medical Research, 29(9), 2507–2519. https://doi.org/10.1177/0962280219900362