Skip to contents

Introduction

  • The bayesmsm package enables easy implementation of the Bayesian marginal structural models (BMSMs) for longitudinal data. The methodology of BMSMs can be divided into 2 estimation steps:

    • Step 1. Bayesian treatment effect weight estimation
    • Step 2. Bayesian non-parametric bootstrap to maximize the utility function with respect to the causal effect
  • For Step 1, we estimate treatment weights wijw_{ij} using posterior samples of the α\alpha and β\beta via fitting a series of logistic regressions in a Bayesian framework. The package incorporates both Inverse Probability of Treatment Weighting (IPTW) and Inverse Probability of Censoring Weighting (IPCW) to handle longitudinal data without and with right-censoring. For Step 2, Pn(vij)P_n(v_{ij}) is estimated via non-parametric Bayesian bootstrap with Dir(1,...,1)Dir(1,...,1) sampling weights.

  • The main functions in this package include:

    • bayesweight: Calculates Bayesian weights for subject-specific treatment effects.
    • bayesweight_cen: Calculates Bayesian weights for subject-specific treatment effects with right-censored data.
    • bayesmsm: Estimates marginal structural models using the calculated Bayesian weights.
    • plot_ATE: Plots the estimated Average Treatment Effect (ATE).
    • plot_APO: Plots the estimated Average Potential Outcome (APO).
    • plot_est_box: Plots the distribution of estimated treatment effects.
    • summary_bayesmsm: Summarizes the model results from bayesmsm.
  • Installation

    • To install the bayesmsm package, you can use the devtools package to install it directly from GitHub:
devtools::install_github("Kuan-Liu-Lab/bayesmsm")
library(bayesmsm)

Simulated observational data with a time-dependent treatment

  • The simulated dataset (continuous outcome)
    • 1000 patients and 3 visits (2 of which patients were assigned a treatment)
    • y, an end-of-study continuous outcome
    • z, a binary treatment
    • w1 and w2 are two baseline covariates (one continuous and one binary, mimicking age and sex)
    • L1 and L2 are two time-dependent covariates (also one continuous and one binary)
    • no missing data
  • The simulated DAG
library(DiagrammeR)
grViz("
    digraph causal {
    # Nodes
    node [shape=plaintext]
    W [label = 'w1, w2']
    L1 [label = 'L11, L21']
    Z1 [label = 'Z1']
    L2 [label = 'L12, L22']
    Z2 [label = 'Z2']
    Y [label = 'Y']
    
    # Edges
    edge [color=black, arrowhead=vee]
    rankdir = LR
    W->L1
    W->Z1
    W->L2
    W->Z2
    W->Y
    L1->Z1
    L1->L2
    L1->Z2
    L1->Y
    Z1->L2
    Z1->Z2
    Z1->Y
    L2->Z2
    L2->Y
    Z2->Y
    
    # Graph
    graph [overlap=true, fontsize=14]
    }")
library(DT)
options(scipen = 999)

testdata <- read.csv(system.file("extdata", "continuous_outcome_data.csv", package = "bayesmsm"))

# look at the data;
datatable(testdata,
          rownames = FALSE,
          options = list(dom = 't')) %>%
  formatRound(columns=c('w2', 'L2_1', 'L2_2', 'y'), digits=2)
  • Frequency Counts by Treatment Combinations
# frequency counts by treatment combinations;
table(testdata$a_1, testdata$a_2)
#>    
#>       0   1
#>   0 520 201
#>   1 111 168
  • Suppose the causal parameter of interest is the average treatment effect between always treated and never treated,

ATE=E(YZ1=1,Z2=1)E(YZ1=0,Z2=0) ATE = E(Y \mid Z_1 = 1, Z_2 = 1) - E(Y \mid Z_1 = 0, Z_2 = 0)

Bayesian treatment effect weight estimation using bayesweight

  • The following code calls the function bayesweight to run JAGS and calculate the weights.
    • Non-parallel computing requires that n.chains = 1. Parallel MCMC requires at least 2 chains because computing is running on 1 core per chain, and we recommend using at most 2 chains less than the number of available cores on your computer.
    • Running this function automatically saves a JAGS model file in the working directory, which the user can check to review the model specifications.
  • Parameters Description:
    • trtmodel.list: A list of formulas corresponding to each time point with the time-specific treatment variable on the left hand side and pre-treatment covariates to be balanced on the right hand side. Interactions and functions of covariates are allowed.
    • data: The dataset containing all the variables specified in trtmodel.list.
    • n.iter: Total number of iterations for each chain (including burn-in).
    • n.burnin: Number of iterations to discard at the beginning of the simulation (burn-in).
    • n.thin: Thinning rate for the MCMC sampler.
    • n.chains: Number of MCMC chains to run. For non-parallel execution, this should be set to 1. For parallel execution, it requires at least 2 chains.
    • seed: Seed to ensure reproducibility.
    • parallel: Logical flag indicating whether to run the MCMC chains in parallel. Default is TRUE.
weights <- bayesweight(trtmodel.list = list(a_1 ~ w1 + w2 + L1_1 + L2_1,
                                             a_2 ~ w1 + w2 + L1_1 + L2_1 + L1_2 + L2_2 + a_1),
                        data = testdata,
                        n.iter = 2000,
                        n.burnin = 1000,
                        n.thin = 5,
                        n.chains = 2,
                        seed = 890123,
                        parallel = TRUE)
str(weights)
#>  num [1:1000] 1.234 1.134 1.021 0.82 0.795 ...
  • It returns a list containing:
    • weights: The calculated weights for subject-specific treatment effects.

Bayesian non-parametric bootstrap to maximize the utility function with respect to the causal effect using bayesmsm

The function bayesmsm estimates causal effect of time-varying treatments. It uses subject-specific treatment assignmennt weights weights calculated using bayesweight, and performs Bayesian non-parametric bootstrap to estimate the causal parameters.

  • Parameters Description:
    • ymodel: A formula representing the outcome model, which can include interactions and functions of covariates.
    • nvisit: Specifies the number of visits or time points considered in the model.
    • reference: The baseline or reference intervention across all visits, typically represented by a vector of zeros indicating no treatment (default is a vector of all zeros).
    • comparator: The comparison intervention across all visits, typically represented by a vector of ones indicating full treatment (default is a vector of all ones).
    • family: Specifies the outcome distribution family; use “gaussian” for continuous outcomes or “binomial” for binary outcomes (default is “gaussian”).
    • data: The dataset containing all variables required for the model.
    • wmean: A vector of treatment assignment weights. Default is a vector of ones, implying equal weighting.
    • nboot: The number of bootstrap iterations to perform for estimating the uncertainty around the causal estimates.
    • optim_method: The optimization method used to find the best parameters in the model (default is ‘BFGS’).
    • seed: A seed value to ensure reproducibility of results.
    • parallel: A logical flag indicating whether to perform computations in parallel (default is TRUE).
    • ncore: The number of cores to use for parallel computation (default is 4).
model <- bayesmsm(ymodel = y ~ a_1+a_2,
                           nvisit = 2,
                           reference = c(rep(0,2)),
                           comparator = c(rep(1,2)),
                           family = "gaussian",
                           data = testdata,
                           wmean = weights,
                           nboot = 1000,
                           optim_method = "BFGS",
                           parallel = TRUE,
                           seed = 890123,
                           ncore = 2)
str(model)
#> List of 6
#>  $ RD_mean    : num -3.16
#>  $ RD_sd      : num 0.0948
#>  $ RD_quantile: Named num [1:2] -3.34 -2.98
#>   ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
#>  $ bootdata   :'data.frame': 1000 obs. of  3 variables:
#>   ..$ effect_reference : num [1:1000] 2.36 2.29 2.26 2.34 2.31 ...
#>   ..$ effect_comparator: num [1:1000] -0.946 -0.825 -0.754 -0.796 -0.764 ...
#>   ..$ RD               : num [1:1000] -3.31 -3.11 -3.02 -3.13 -3.08 ...
#>  $ reference  : num [1:2] 0 0
#>  $ comparator : num [1:2] 1 1
  • It returns a model object which contains:
    • mean, sd, quantile: the mean, standard deviation and 95% credible interval of the estimated causal effect (ATE). From the above results, the mean of ATE is approximately -3.161, which indicates that the expected outcome for always treated patients is, on average, 3.161 units less than that for never treated patients.
    • bootdata: a data frame containing the bootstrap samples for the reference effect, comparator effect, and average treatment effect (ATE).
    • reference, comparator: the reference level and comparator level the user chooses to compare. Here the reference level is never treated (0,0), and the comparator level is always treated (1,1).

Visualization functions: plot_ATE, plot_APO, plot_est_box

The bayesmsm package also provides several other functions for visualizing the above results: plot_ATE, plot_APO, and plot_est_box. These functions help the user better interpret the estimated causal effects.

  • Plotting the Average Treatment Effect (ATE)
    • The plot_ATE function generates a plot of the estimated ATE with its 95% credible interval.
plot_ATE(model)

  • Plotting the Average Potential Outcome (APO)
    • Similarly, the plot_APO function plots the estimated APO for both the reference and comparator level effects.
plot_APO(model, "effect_reference")

plot_APO(model, "effect_comparator")

  • Plotting the Distribution of Estimated Treatment Effects
    • The plot_est_box function generates an error bar plot of the estimated treatment effects (APO and ATE) from the bootstrap samples.

Reference