Longitudinal causal analysis with informative right-censoring
Xiao Yan, Kuan Liu
Source:vignettes/bayesmsm-censoring.Rmd
bayesmsm-censoring.Rmd
Introduction
-
The
bayesmsm
package enables easy implementation of the Bayesian marginal structural models (BMSMs) for longitudinal data. The methodology of BMSMs can be divided into 2 estimation steps:- Step 1. Bayesian treatment effect weight estimation
- Step 2. Bayesian non-parametric bootstrap to maximize the utility function with respect to the causal effect
For Step 1, we estimate treatment weights using posterior samples of the and via fitting a series of logistic regressions in a Bayesian framework. The package incorporates both Inverse Probability of Treatment Weighting (IPTW) and Inverse Probability of Censoring Weighting (IPCW) to handle longitudinal data without and with right-censoring. For Step 2, is estimated via non-parametric Bayesian bootstrap with sampling weights.
-
The main functions in this package include:
-
bayesweight
: Calculates Bayesian weights for subject-specific treatment effects. -
bayesweight_cen
: Calculates Bayesian weights for subject-specific treatment effects with right-censored data. -
bayesmsm
: Estimates marginal structural models using the calculated Bayesian weights. -
plot_ATE
: Plots the estimated Average Treatment Effect (ATE). -
plot_APO
: Plots the estimated Average Potential Outcome (APO). -
plot_est_box
: Plots the distribution of estimated treatment effects. -
summary_bayesmsm
: Summarizes the model results frombayesmsm
.
-
-
Installation
- To install the bayesmsm package, you can use the
devtools
package to install it directly from GitHub:
- To install the bayesmsm package, you can use the
devtools::install_github("Kuan-Liu-Lab/bayesmsm")
library(bayesmsm)
Simulated observational data with right-censoring
We illustrate the implementation of the bayesmsm
package
using a simulated dataset. The simulated dataset contains
right-censoring with a binary end-of-study outcome. This example will
provide a comprehensive understanding of how to apply the package to
real-world data.
Dataset Introduction
In this simulation study, we use a simulated longitudinal dataset to mimic complex real-world clinical data with right-censoring. This dataset consists of 500 patients observed over 3 visits. The binary outcome variable represents the end-of-study status of the patients. The dataset includes baseline covariates and , with being binary and continuous. Time-dependent covariates and are observed at the second visit, and and at the third visit. The treatment variables are represented as , , and for the three visits. Right-censoring indicators are represented as , , and . For example, for observations with , all records at or after visit 1 were censored.
Variable | Description |
---|---|
L11 | Baseline covariate (binary) |
L21 | Baseline covariate (continuous) |
A1, A2, A3 | Treatment assignments (binary) |
C1, C2, C3 | Right-censoring indicators |
Y | End-of-study outcome (binary) |
# simulating causal data with censoring;
simdat_cen <- read.csv(system.file("extdata", "sim_causal.csv", package = "bayesmsm"))
# look at the data;
head(simdat_cen)
#> L11 L21 A1 L12 L22 A2 L13 L23 A3 C1 C2 C3 Y
#> 1 0 -0.37560287 NA NA NA NA NA NA NA 1 NA NA NA
#> 2 1 -0.56187636 NA NA NA NA NA NA NA 1 NA NA NA
#> 3 0 -0.34391723 1 1 -0.8053290 1 1 -2.5508298 1 0 0 0 0
#> 4 1 0.09049665 1 1 -1.7754302 1 1 -2.2849997 0 0 0 0 0
#> 5 1 1.59850877 1 1 0.5604468 1 0 2.1787137 0 0 0 0 1
#> 6 0 -0.08856511 0 0 1.4798603 0 1 0.7025151 0 0 0 0 1
Bayesian treatment effect weight estimation using
bayesweight_cen
Next, we use the bayesweight_cen
function to estimate
the weights with censoring. We specify the treatment and censoring
models for each time point, including the relevant covariates.
- Parameters Description:
-
trtmodel.list
: A list of formulas corresponding to each time point with the time-specific treatment variable on the left hand side and pre-treatment covariates to be balanced on the right hand side. Interactions and functions of covariates are allowed. -
cenmodel.list
: A list of formulas of the censoring model with the censoring indicators on the left hand side and the covariates prior to the censoring indicators on the right hand side. -
data
: The dataset containing all the variables specified in trtmodel.list. -
n.iter
: Total number of iterations for each chain (including burn-in). -
n.burnin
: Number of iterations to discard at the beginning of the simulation (burn-in). -
n.thin
: Thinning rate for the MCMC sampler. -
n.chains
: Number of MCMC chains to run. For non-parallel execution, this should be set to 1. For parallel execution, it requires at least 2 chains. -
seed
: Seed to ensure reproducibility. -
parallel
: Logical flag indicating whether to run the MCMC chains in parallel. Default is TRUE.
-
weights <- bayesweight_cen(trtmodel.list = list(A1 ~ L11 + L21,
A2 ~ L11 + L21 + L12 + L22 + A1,
A3 ~ L11 + L21 + L12 + L22 + A1 + L13 + L23 + A2),
cenmodel.list = list(C1 ~ L11 + L21,
C2 ~ L11 + L21 + A1,
C3 ~ L11 + L21 + A1 + L12 + L22 + A2),
data = simdat_cen,
n.iter = 250,
n.burnin = 150,
n.thin = 5,
n.chains = 1,
seed = 890123,
parallel = FALSE)
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 4836
#> Unobserved stochastic nodes: 44
#> Total graph size: 16501
#>
#> Initializing model
summary(weights)
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> 0.2191 1.2419 2.1838 3.9613 5.1154 53.1423 136
Similarly, the function will automatically run MCMC with JAGS based on the specified treatment and censoring model inputs, saving a JAGS model file in the working directory for model customization. The function returns a list containing the updated weights for subject-specific treatment and censoring effects.
Bayesian non-parametric bootstrap to maximize the utility function
with respect to the causal effect using bayesmsm
Using the weights estimated by bayesweight_cen
, we now
fit the Bayesian Marginal Structural Model and estimate the marginal
treatment effects using the bayesmsm
function as before. We
specify the outcome model and other relevant parameters.
# Remove all NAs (censored observations) from the original dataset and weights
simdat_cen <- na.omit(simdat_cen)
weights <- na.omit(weights)
model <- bayesmsm(ymodel = Y ~ A1+A2+A3,
nvisit = 3,
reference = c(rep(0, 3)),
comparator = c(rep(1, 3)),
family = "binomial",
data = simdat_cen,
wmean = weights,
nboot = 364,
optim_method = "BFGS",
parallel = FALSE,
seed = 890123,
ncore = 1)
str(model)
#> List of 12
#> $ RD_mean : num 0.308
#> $ RR_mean : num 1.84
#> $ OR_mean : num 4.48
#> $ RD_sd : num 0.124
#> $ RR_sd : num 0.507
#> $ OR_sd : num 2.87
#> $ RD_quantile: Named num [1:2] 0.0792 0.5402
#> ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
#> $ RR_quantile: Named num [1:2] 1.14 3.13
#> ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
#> $ OR_quantile: Named num [1:2] 1.39 11.29
#> ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
#> $ bootdata :'data.frame': 364 obs. of 5 variables:
#> ..$ effect_reference : num [1:364] -0.7733 -0.9519 -0.1119 -0.6218 0.0165 ...
#> ..$ effect_comparator: num [1:364] 1.374 1.12 0.853 1.195 0.504 ...
#> ..$ RD : num [1:364] 0.482 0.476 0.229 0.418 0.119 ...
#> ..$ RR : num [1:364] 2.53 2.71 1.49 2.2 1.24 ...
#> ..$ OR : num [1:364] 8.56 7.94 2.62 6.15 1.63 ...
#> $ reference : num [1:3] 0 0 0
#> $ comparator : num [1:3] 1 1 1
The bayesmsm
function returns a model object containing
the following: the mean, standard deviation, and 95% credible interval
of the Risk Difference (RD), Risk Ratio (RR), and Odds Ratio (OR). It
also includes a data frame containing the bootstrap samples for the
reference effect, comparator effect, RD, RR, and OR, as well as the
reference and comparator levels chosen by the user.
- We can extract and visualize the results as follows:
# Extract results
head(model$bootdata)
#> effect_reference effect_comparator RD RR OR
#> 1 -0.77330885 1.3741127 0.4822800 2.527344 8.562751
#> 2 -0.95194861 1.1201767 0.4755284 2.707505 7.941684
#> 3 -0.11185065 0.8530526 0.2291406 1.485399 2.624534
#> 4 -0.62184259 1.1949757 0.4182673 2.197230 6.152253
#> 5 0.01652478 0.5042640 0.1193298 1.236704 1.628630
#> 6 -0.27175158 0.8775624 0.2738397 1.633189 3.156027
Visualization functions: plot_ATE
,
plot_APO
, plot_est_box
Similarly, we can use the built-in functions as well as
summary_bayesmsm
to visualize and summarize the
results.
- Plotting the Average Treatment Effect (ATE)
- The
plot_ATE
function generates a plot of the estimated ATE with its 95% credible interval.
- The
plot_ATE(model)
- Plotting the Average Potential Outcome (APO)
- The
plot_APO
function plots the estimated APO for both the reference and comparator level effects.
- The
plot_APO(model, effect_type = "effect_comparator")
plot_APO(model, effect_type = "effect_reference")
- Plotting the Distribution of Estimated Treatment Effects
- The
plot_est_box
function generates an error bar plot of the estimated treatment effects (APO and ATE) from the bootstrap samples.
- The
plot_est_box(model)
- Summary function to generate result table from
bayesmsm
- The
summary_bayesmsm
function automatically generates a summary table of the model output from the functionbayesmsm
.
- The
summary_bayesmsm(model)
#> mean sd 2.5% 97.5%
#> Reference -0.3648858 0.3505655 -1.11200604 0.2898218
#> Comparator 0.9691181 0.3111957 0.38740379 1.5981192
#> RD 0.3083083 0.1238811 0.07919336 0.5402016
#> RR 1.8412739 0.5071578 1.14213799 3.1259907
#> OR 4.4819705 2.8740524 1.39094085 11.2924450
Reference
- Liu, K. (2021). Bayesian causal inference with longitudinal data. Tspace.library.utoronto.ca. https://tspace.library.utoronto.ca/handle/1807/109330
- Saarela, O., Stephens, D. A., Moodie, E. E. M., & Klein, M. B. (2015). On Bayesian estimation of marginal structural models. Biometrics, 71(2), 279–288. https://doi.org/10.1111/biom.12269
- Robins, J. M., Hernán, M. A., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5), 550–560. https://doi.org/10.1097/00001648-200009000-00011
- Liu, K., Saarela, O., Feldman, B. M., & Pullenayegum, E. (2020). Estimation of causal effects with repeatedly measured outcomes in a Bayesian framework. Statistical Methods in Medical Research, 29(9), 2507–2519. https://doi.org/10.1177/0962280219900362