bayesmsm
1. Introduction
The bayesmsm
package enables easy implementation of
the Bayesian marginal structural models (BMSMs) for longitudinal data.
The methodology of BMSMs can be divided into 2 estimation steps:
For Step 1, we estimate treatment weights \(w_{ij}\) using posterior samples of the \(\alpha\) and \(\beta\) via fitting a series of logistic regressions in a Bayesian framework. The package incorporates both Inverse Probability of Treatment Weighting (IPTW) and Inverse Probability of Censoring Weighting (IPCW) to handle longitudinal data without and with right-censoring. For Step 2, \(P_n(v_{ij})\) is estimated via non-parametric Bayesian bootstrap with \(Dir(1,\ldots,1)\) sampling weights.
The main functions in this package include:
bayesweight
: Calculates Bayesian weights for
subject-specific treatment effects.bayesmsm
: Estimates marginal structural models using
the calculated Bayesian weights.plot_ATE
: Plots the estimated Average Treatment Effect
(ATE).plot_APO
: Plots the estimated Average Potential Outcome
(APO).plot_est_box
: Plots the distribution of estimated
treatment effects.Installation
devtools
package to install it directly from GitHub:2. Simulated observational data with a time-dependent treatment
library(DiagrammeR)
grViz("
digraph causal {
# Nodes
node [shape=plaintext]
W [label = 'w1, w2']
L1 [label = 'L11, L21']
Z1 [label = 'Z1']
L2 [label = 'L12, L22']
Z2 [label = 'Z2']
Y [label = 'Y']
# Edges
edge [color=black, arrowhead=vee]
rankdir = LR
W->L1
W->Z1
W->L2
W->Z2
W->Y
L1->Z1
L1->L2
L1->Z2
L1->Y
Z1->L2
Z1->Z2
Z1->Y
L2->Z2
L2->Y
Z2->Y
# Graph
graph [overlap=true, fontsize=14]
}")
library(DT)
options(scipen = 999)
testdata <- read.csv(system.file("extdata", "continuous_outcome_data.csv", package = "bayesmsm"))
# look at the data;
datatable(testdata,
rownames = FALSE,
options = list(dom = 't')) %>%
formatRound(columns=c('w2', 'L2_1', 'L2_2', 'y'), digits=2)
# frequency counts by treatment combinations;
table(testdata$a_1, testdata$a_2)
#>
#> 0 1
#> 0 520 201
#> 1 111 168
\[ ATE = E(Y \mid Z_1 = 1, Z_2 = 1) - E(Y \mid Z_1 = 0, Z_2 = 0) \]
3. Bayesian treatment effect weight estimation using
bayesweight
bayesweight
to
run JAGS and calculate the weights.
n.chains = 1
.
Parallel MCMC requires at least 2 chains because computing is running on
1 core per chain, and we recommend using at most 2 chains less than the
number of available cores on your computer.trtmodel.list
: A list of formulas corresponding to each
time point with the time-specific treatment variable on the left hand
side and pre-treatment covariates to be balanced on the right hand side.
Interactions and functions of covariates are allowed.data
: The dataset containing all the variables
specified in trtmodel.list.n.iter
: Total number of iterations for each chain
(including burn-in).n.burnin
: Number of iterations to discard at the
beginning of the simulation (burn-in).n.thin
: Thinning rate for the MCMC sampler.n.chains
: Number of MCMC chains to run. For
non-parallel execution, this should be set to 1. For parallel execution,
it requires at least 2 chains.seed
: Seed to ensure reproducibility.parallel
: Logical flag indicating whether to run the
MCMC chains in parallel. Default is TRUE.weights <- bayesweight(trtmodel.list = list(a_1 ~ w1 + w2 + L1_1 + L2_1,
a_2 ~ w1 + w2 + L1_1 + L2_1 + L1_2 + L2_2 + a_1),
data = testdata,
n.iter = 10000,
n.burnin = 5000,
n.thin = 5,
n.chains = 4,
seed = 890123,
parallel = TRUE)
str(weights)
#> num [1:1000] 1.233 1.138 1.016 0.83 0.794 ...
weights
: The calculated weights for subject-specific
treatment effects.4. Bayesian non-parametric bootstrap to maximize the utility
function with respect to the causal effect using
bayesmsm
The function bayesmsm
estimates causal effect of
time-varying treatments. It uses subject-specific treatment assignmennt
weights weights calculated using bayesweight
, and
performs Bayesian non-parametric bootstrap to estimate the causal
parameters.
ymodel
: A formula representing the outcome model, which
can include interactions and functions of covariates.nvisit
: Specifies the number of visits or time points
considered in the model.reference
: The baseline or reference intervention
across all visits, typically represented by a vector of zeros indicating
no treatment (default is a vector of all zeros).comparator
: The comparison intervention across all
visits, typically represented by a vector of ones indicating full
treatment (default is a vector of all ones).family
: Specifies the outcome distribution family; use
“gaussian” for continuous outcomes or “binomial” for binary outcomes
(default is “gaussian”).data
: The dataset containing all variables required for
the model.wmean
: A vector of treatment assignment weights.
Default is a vector of ones, implying equal weighting.nboot
: The number of bootstrap iterations to perform
for estimating the uncertainty around the causal estimates.optim_method
: The optimization method used to find the
best parameters in the model (default is ‘BFGS’).seed
: A seed value to ensure reproducibility of
results.parallel
: A logical flag indicating whether to perform
computations in parallel (default is TRUE).ncore
: The number of cores to use for parallel
computation (default is 4).model <- bayesmsm(ymodel = y ~ a_1+a_2,
nvisit = 2,
reference = c(rep(0,2)),
comparator = c(rep(1,2)),
family = "gaussian",
data = testdata,
wmean = rep(1, 1000),
nboot = 1000,
optim_method = "BFGS",
parallel = TRUE,
seed = 890123,
ncore = 4)
str(model)
#> List of 6
#> $ RD_mean : num -3.16
#> $ RD_sd : num 0.0947
#> $ RD_quantile: Named num [1:2] -3.34 -2.98
#> ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
#> $ bootdata :'data.frame': 1000 obs. of 3 variables:
#> ..$ effect_reference : num [1:1000] 2.36 2.29 2.24 2.33 2.3 ...
#> ..$ effect_comparator: num [1:1000] -0.951 -0.825 -0.77 -0.811 -0.75 ...
#> ..$ RD : num [1:1000] -3.31 -3.11 -3.01 -3.14 -3.05 ...
#> $ reference : num [1:2] 0 0
#> $ comparator : num [1:2] 1 1
mean
, sd
, quantile
: the mean,
standard deviation and 95% credible interval of the estimated causal
effect (ATE). From the above results, the mean of ATE is approximately
-3.161, which indicates that the expected outcome for always treated
patients is, on average, 3.161 units less than that for never treated
patients.bootdata
: a data frame containing the bootstrap samples
for the reference effect, comparator effect, and average treatment
effect (ATE).reference
, comparator
: the reference level
and comparator level the user chooses to compare. Here the reference
level is never treated (0,0), and the comparator level is always treated
(1,1).5. Visualization functions: plot_ATE
,
plot_APO
, plot_est_box
The bayesmsm
package also provides several other
functions for visualizing the above results: plot_ATE
,
plot_APO
, and plot_est_box
. These functions
help the user better interpret the estimated causal effects.
plot_ATE
function generates a plot of the estimated
ATE with its 95% credible interval.plot_APO
function plots the estimated
APO for both the reference and comparator level effects.plot_est_box
function generates an error bar plot
of the estimated treatment effects (APO and ATE) from the bootstrap
samples.Reference