Skip to contents

Introduction

  • The bayesmsm package enables easy implementation of the Bayesian marginal structural models (BMSMs) for longitudinal data. The methodology of BMSMs can be divided into 2 estimation steps:

    • Step 1. Bayesian treatment effect weight estimation
    • Step 2. Bayesian non-parametric bootstrap to maximize the utility function with respect to the causal effect
  • For Step 1, we estimate treatment weights wijw_{ij} using posterior samples of the α\alpha and β\beta via fitting a series of logistic regressions in a Bayesian framework. The package incorporates both Inverse Probability of Treatment Weighting (IPTW) and Inverse Probability of Censoring Weighting (IPCW) to handle longitudinal data without and with right-censoring. For Step 2, Pn(vij)P_n(v_{ij}) is estimated via non-parametric Bayesian bootstrap with Dir(1,...,1)Dir(1,...,1) sampling weights.

  • The main functions in this package include:

    • bayesweight: Calculates Bayesian weights for subject-specific treatment effects.
    • bayesweight_cen: Calculates Bayesian weights for subject-specific treatment effects with right-censored data.
    • bayesmsm: Estimates marginal structural models using the calculated Bayesian weights.
    • plot_ATE: Plots the estimated Average Treatment Effect (ATE).
    • plot_APO: Plots the estimated Average Potential Outcome (APO).
    • plot_est_box: Plots the distribution of estimated treatment effects.
    • summary_bayesmsm: Summarizes the model results from bayesmsm.
  • Installation

    • To install the bayesmsm package, you can use the devtools package to install it directly from GitHub:
devtools::install_github("Kuan-Liu-Lab/bayesmsm")
library(bayesmsm)

Simulated observational data with right-censoring

We illustrate the implementation of the bayesmsm package using a simulated dataset. The simulated dataset contains right-censoring with a binary end-of-study outcome. This example will provide a comprehensive understanding of how to apply the package to real-world data.

Dataset Introduction

In this simulation study, we use a simulated longitudinal dataset to mimic complex real-world clinical data with right-censoring. This dataset consists of 500 patients observed over 3 visits. The binary outcome variable represents the end-of-study status of the patients. The dataset includes baseline covariates L11L11 and L21L21, with L11L11 being binary and L21L21 continuous. Time-dependent covariates L12L12 and L22L22 are observed at the second visit, and L13L13 and L23L23 at the third visit. The treatment variables are represented as A1A1, A2A2, and A3A3 for the three visits. Right-censoring indicators are represented as C1C1, C2C2, and C3C3. For example, for observations with C1=1C1 = 1, all records at or after visit 1 were censored.

Description of Simulated Right-Censored Dataset
Variable Description
L11 Baseline covariate (binary)
L21 Baseline covariate (continuous)
A1, A2, A3 Treatment assignments (binary)
C1, C2, C3 Right-censoring indicators
Y End-of-study outcome (binary)
# simulating causal data with censoring;
simdat_cen <- read.csv(system.file("extdata", "sim_causal.csv", package = "bayesmsm"))

# look at the data;
head(simdat_cen)
#>   L11         L21 A1 L12        L22 A2 L13        L23 A3 C1 C2 C3  Y
#> 1   0 -0.37560287 NA  NA         NA NA  NA         NA NA  1 NA NA NA
#> 2   1 -0.56187636 NA  NA         NA NA  NA         NA NA  1 NA NA NA
#> 3   0 -0.34391723  1   1 -0.8053290  1   1 -2.5508298  1  0  0  0  0
#> 4   1  0.09049665  1   1 -1.7754302  1   1 -2.2849997  0  0  0  0  0
#> 5   1  1.59850877  1   1  0.5604468  1   0  2.1787137  0  0  0  0  1
#> 6   0 -0.08856511  0   0  1.4798603  0   1  0.7025151  0  0  0  0  1

Bayesian treatment effect weight estimation using bayesweight_cen

Next, we use the bayesweight_cen function to estimate the weights with censoring. We specify the treatment and censoring models for each time point, including the relevant covariates.

  • Parameters Description:
    • trtmodel.list: A list of formulas corresponding to each time point with the time-specific treatment variable on the left hand side and pre-treatment covariates to be balanced on the right hand side. Interactions and functions of covariates are allowed.
    • cenmodel.list: A list of formulas of the censoring model with the censoring indicators on the left hand side and the covariates prior to the censoring indicators on the right hand side.
    • data: The dataset containing all the variables specified in trtmodel.list.
    • n.iter: Total number of iterations for each chain (including burn-in).
    • n.burnin: Number of iterations to discard at the beginning of the simulation (burn-in).
    • n.thin: Thinning rate for the MCMC sampler.
    • n.chains: Number of MCMC chains to run. For non-parallel execution, this should be set to 1. For parallel execution, it requires at least 2 chains.
    • seed: Seed to ensure reproducibility.
    • parallel: Logical flag indicating whether to run the MCMC chains in parallel. Default is TRUE.
weights <- bayesweight_cen(trtmodel.list = list(A1 ~ L11 + L21,
                                                A2 ~ L11 + L21 + L12 + L22 + A1,
                                                A3 ~ L11 + L21 + L12 + L22 + A1 + L13 + L23 + A2),
                           cenmodel.list = list(C1 ~ L11 + L21,
                                                C2 ~ L11 + L21 + A1,
                                                C3 ~ L11 + L21 + A1 + L12 + L22 + A2),
                           data = simdat_cen,
                           n.iter = 250,
                           n.burnin = 150,
                           n.thin = 5,
                           n.chains = 1,
                           seed = 890123,
                           parallel = FALSE)
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 4836
#>    Unobserved stochastic nodes: 44
#>    Total graph size: 16501
#> 
#> Initializing model
summary(weights)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#>  0.2191  1.2419  2.1838  3.9613  5.1154 53.1423     136

Similarly, the function will automatically run MCMC with JAGS based on the specified treatment and censoring model inputs, saving a JAGS model file in the working directory for model customization. The function returns a list containing the updated weights for subject-specific treatment and censoring effects.

Bayesian non-parametric bootstrap to maximize the utility function with respect to the causal effect using bayesmsm

Using the weights estimated by bayesweight_cen, we now fit the Bayesian Marginal Structural Model and estimate the marginal treatment effects using the bayesmsm function as before. We specify the outcome model and other relevant parameters.

# Remove all NAs (censored observations) from the original dataset and weights
simdat_cen <- na.omit(simdat_cen)
weights <- na.omit(weights)

model <- bayesmsm(ymodel = Y ~ A1+A2+A3,
                  nvisit = 3,
                  reference = c(rep(0, 3)),
                  comparator = c(rep(1, 3)),
                  family = "binomial",
                  data = simdat_cen,
                  wmean = weights,
                  nboot = 364,
                  optim_method = "BFGS",
                  parallel = FALSE,
                  seed = 890123,
                  ncore = 1)
str(model)
#> List of 12
#>  $ RD_mean    : num 0.308
#>  $ RR_mean    : num 1.84
#>  $ OR_mean    : num 4.48
#>  $ RD_sd      : num 0.124
#>  $ RR_sd      : num 0.507
#>  $ OR_sd      : num 2.87
#>  $ RD_quantile: Named num [1:2] 0.0792 0.5402
#>   ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
#>  $ RR_quantile: Named num [1:2] 1.14 3.13
#>   ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
#>  $ OR_quantile: Named num [1:2] 1.39 11.29
#>   ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
#>  $ bootdata   :'data.frame': 364 obs. of  5 variables:
#>   ..$ effect_reference : num [1:364] -0.7733 -0.9519 -0.1119 -0.6218 0.0165 ...
#>   ..$ effect_comparator: num [1:364] 1.374 1.12 0.853 1.195 0.504 ...
#>   ..$ RD               : num [1:364] 0.482 0.476 0.229 0.418 0.119 ...
#>   ..$ RR               : num [1:364] 2.53 2.71 1.49 2.2 1.24 ...
#>   ..$ OR               : num [1:364] 8.56 7.94 2.62 6.15 1.63 ...
#>  $ reference  : num [1:3] 0 0 0
#>  $ comparator : num [1:3] 1 1 1

The bayesmsm function returns a model object containing the following: the mean, standard deviation, and 95% credible interval of the Risk Difference (RD), Risk Ratio (RR), and Odds Ratio (OR). It also includes a data frame containing the bootstrap samples for the reference effect, comparator effect, RD, RR, and OR, as well as the reference and comparator levels chosen by the user.

  • We can extract and visualize the results as follows:
# Extract results
head(model$bootdata)
#>   effect_reference effect_comparator        RD       RR       OR
#> 1      -0.77330885         1.3741127 0.4822800 2.527344 8.562751
#> 2      -0.95194861         1.1201767 0.4755284 2.707505 7.941684
#> 3      -0.11185065         0.8530526 0.2291406 1.485399 2.624534
#> 4      -0.62184259         1.1949757 0.4182673 2.197230 6.152253
#> 5       0.01652478         0.5042640 0.1193298 1.236704 1.628630
#> 6      -0.27175158         0.8775624 0.2738397 1.633189 3.156027

Visualization functions: plot_ATE, plot_APO, plot_est_box

Similarly, we can use the built-in functions as well as summary_bayesmsm to visualize and summarize the results.

  • Plotting the Average Treatment Effect (ATE)
    • The plot_ATE function generates a plot of the estimated ATE with its 95% credible interval.
plot_ATE(model)

  • Plotting the Average Potential Outcome (APO)
    • The plot_APO function plots the estimated APO for both the reference and comparator level effects.
plot_APO(model, effect_type = "effect_comparator")

plot_APO(model, effect_type = "effect_reference")

  • Plotting the Distribution of Estimated Treatment Effects
    • The plot_est_box function generates an error bar plot of the estimated treatment effects (APO and ATE) from the bootstrap samples.

  • Summary function to generate result table from bayesmsm
    • The summary_bayesmsm function automatically generates a summary table of the model output from the function bayesmsm.
summary_bayesmsm(model)
#>                  mean        sd        2.5%      97.5%
#> Reference  -0.3648858 0.3505655 -1.11200604  0.2898218
#> Comparator  0.9691181 0.3111957  0.38740379  1.5981192
#> RD          0.3083083 0.1238811  0.07919336  0.5402016
#> RR          1.8412739 0.5071578  1.14213799  3.1259907
#> OR          4.4819705 2.8740524  1.39094085 11.2924450

Reference