Joint calibrated estimation of inverse probability of treatment and censoring weights for marginal structural models

Abstract Marginal structural models (MSMs) with inverse probability weighted estimators (IPWEs) are widely used to estimate causal effects of treatment sequences on longitudinal outcomes in the presence of time‐varying confounding and dependent censoring. However, IPWEs for MSMs can be inefficient and unstable if weights are estimated by maximum likelihood. To improve the performance of IPWEs, covariate balancing weight (CBW) methods have been proposed and recently extended to MSMs. However, existing CBW methods for MSMs are inflexible for practical use because they often do not handle dependent censoring, nonbinary treatments, and longitudinal outcomes (instead of eventual outcomes at a study end). In this paper, we propose a joint calibration approach to CBW estimation for MSMs that can accommodate (1) both time‐varying confounding and dependent censoring, (2) binary and nonbinary treatments, (3) eventual outcomes and longitudinal outcomes. We develop novel calibration restrictions by jointly eliminating covariate associations with both treatment assignment and censoring processes after weighting the observed data sample (i.e., to optimize covariate balance in finite samples). Two different methods are proposed to implement the calibration. Simulations show that IPWEs with calibrated weights perform better than IPWEs with weights from maximum likelihood and the “Covariate Balancing Propensity Score” method. We apply our method to a natural history study of HIV for estimating the effects of highly active antiretroviral therapy on CD4 cell counts over time.


Marginal Structural Models and Covariate Balancing Weights
probability of censoring weighting (IPCW) that addresses the additional selection bias from censoring (Hernán et al., 2001).
To implement IPTW and IPCW for MSMs, time-varying weights are commonly estimated by fitting parametric models for treatment assignment and censoring processes and then plugging in parameter estimates from maximum likelihood estimation (MLE). However, the MLE approach to weight estimation can result in inefficient and unstable inverse probability weighted estimators (IPWEs), especially when the treatment assignment and/or censoring model is misspecified (Kang and Schafer, 2007;Cole and Hernán, 2008;Lefebvre et al., 2008;Howe et al., 2011). Because final weights for fitting MSMs are a product of the time-varying weights for IPTW and IPCW, the efficiency and stability issues of IPWEs can be exacerbated when both time-varying confounding and dependent censoring are present.
Motivated by improving IPWEs primarily for binary point treatments, covariate balancing weight (CBW) methods, which directly optimize covariate balance for weight estimation, have been proposed (Graham et al., 2012;Hainmueller, 2012;Imai and Ratkovic, 2014;Zubizarreta, 2015;Chan et al., 2016;Yiu and Su, 2018;Fong et al., 2018). In empirical studies, CBW methods have been shown to dramatically improve the performance of IPWEs by reducing their mean squared errors (MSEs) under both correct and incorrect model specification. Recent theoretical investigations by Tan (2020) also reveal that, unlike the MLE approach, CBW methods can bound the MSEs of IPWEs even under model misspecification.
Recently, CBW methods have been extended to improve the IPWEs in MSMs. Imai and Ratkovic (2015) first extended the "Covariate Balancing Propensity Score" (CBPS) method to MSMs with binary treatments. However, because the number of moment conditions for weight estimation increases exponentially with the number of follow-up visits, CBPS could only practically accommodate a small number of visits and covariates, and is computationally intensive. Yiu and Su (2018) demonstrated that their CBW framework for a general point treatment can be extended to estimate the short-term direct effect of a timevarying treatment on a longitudinal outcome. Nonetheless, it is often of greater interest to estimate the total effect of a treatment sequence in MSMs (i.e., both the direct treatment effect and the indirect treatment effect through time-varying confounders on the longitudinal outcome). Focusing on MSMs with continuous treatments, Zhou and Wodtke (2020) proposed a different approach called "residual balancing", where conditional means of time-varying confounders are modeled, and weights for IPTW are estimated by balancing the residuals of the time-varying confounder models across future treatments as well as the his-tory of treatments and confounders. In practice, it is undesirable to model a whole set of time-varying confounders and to specify the functional form of future treatments for balancing. A notable limitation of the CBW methods in Imai and Ratkovic (2015), Yiu and Su (2018), and Zhou and Wodtke (2020) is that they do not address the common problem of dependent censoring in MSMs. Kallus and Santacatterina (2019) developed "Kernel Optimal Weighting" to handle both time-varying confounding and dependent censoring in MSMs. However, their approach is restricted to binary treatments and requires modeling conditional means of potential outcomes given observed histories of treatments and confounders. With the exception of Yiu and Su (2018), the aforementioned methods focused on MSMs for an eventual outcome at a study end, instead of MSMs for a longitudinal outcome over time, which are often of interest in practice. In summary, more research is required to improve the flexibility and practicality of CBW methods for their widespread use in longitudinal settings.

Joint Calibration Approach to Weight Estimation
To enhance the flexibility of CBW methods for practical use in MSMs, we propose a new calibration approach to CBW estimation that can accommodate (1) both timevarying confounding and dependent censoring, (2) binary and nonbinary treatment sequences, and (3) eventual and longitudinal outcomes. Specifically, by building on the "Covariate Association Eliminating Weights" framework by Yiu and Su (2018) for point treatments, we propose novel moment conditions (i.e., calibration restrictions) for weight estimation that jointly remove covariate associations over time with both treatment assignment and censoring processes after weighting the observed data sample (i.e., to optimize covariate balance for both treatment assignment and censoring processes in finite samples).
The joint calibration of CBWs based on our proposed moment conditions can be implemented by three types of methods, as pointed out by a referee. These include: Type (1), calibrating an initial set of estimated weights (e.g., from MLE) with an exponential tilting term containing a parameter vector with dimension equal to the number of moment conditions (see Han (2016) for an example in handling dependent censoring); Type (2), estimating modelbased parameters of inverse probability weights by solving estimating equations based on the proposed moment conditions (e.g., Imai and Ratkovic, 2015); Type (3), estimating weights nonparametrically by solving a constrained optimization problem that incorporates the proposed moment conditions in the constraints (e.g., Zubizarreta, 2015;Chan et al., 2016;Yiu and Su, 2018;Kallus and Santacatterina, 2019;Zhou and Wodtke, 2020). We discuss the pros and cons of all three methods in the MSMs settings and develop both Types (1) and (2) methods for implementing the joint calibration in Section 4.3.
Besides flexibility, an important feature of our approach is its computational efficiency. This is because, (I) unlike the methods in Zhou and Wodtke (2020) and Kallus and Santacatterina (2019), it does not require models for the conditional means of time-varying confounders or potential outcomes, (II) unlike the CBPS method in Imai and Ratkovic (2015), it allows for parsimony in deriving moment conditions when there exist many time-varying confounders and visits, therefore the number of proposed moment conditions does not have to increase exponentially with the number of visits. For example, the number of moment conditions can increase linearly even if separate treatment assignment models are specified at each visit. Together the flexibility and computational efficiency of our approach can encourage more widespread and practical use of MSMs in complex longitudinal settings. Further details of the related literature and our contributions are provided in Web Appendix A.
As we focus on the main idea of the proposed calibration approach to CBWs, its implementation, and empirical evaluation in this paper, we leave the theoretical investigation of the robustness and efficiency of the proposed IPWEs for future work. In Section 7, we briefly discuss the robustness and efficiency issues regarding the proposed IPWEs in light of the recent theoretical development in the CBW literature (Wang and Zubizarreta, 2020).

Motivating Example
This research is motivated by data from the HIV Epidemiology Research Study (HERS), a natural history study of 1310 women with, or at high risk of, HIV infection at four sites (Baltimore, Detroit, New York, Providence) from 1993 to 2000 (Ko et al., 2003). During the study, 12 visits were scheduled, where a variety of clinical, behavioral, and sociological outcomes as well as self-reported information on antiretroviral therapies (ARTs) were recorded approximately every 6 months. Our objective is to quantify the effect of highly active antiretroviral therapy (HAART), which contains three or more ART regimens, on the CD4 cell counts over time in the HERS cohort. Because the HERS was an observational study, where therapies were not randomly assigned and varying over time, this leads to the potential for timevarying confounding between treatment and outcome. Moreover, estimation of the treatment effect is further complicated by dependent censoring due to dropout: more than half of the 871 HIV-infected women at enrollment did not complete the study. We provide further details about these problems in the HERS cohort in Web Appendix G.
In the previous analysis by Ko et al. (2003), weights for IPTW and IPCW were estimated by MLE to fit several MSMs and address the time-varying confounding and dependent censoring problems in the HERS data. However, Ko et al. (2003) considered the time-varying treatments to be binary (i.e., with and without HAART). Because patients on ART other than HAART (i.e., less than 3 ARTs) were combined with patients not receiving any treatment, the therapeutic effect of HAART relative to no treatment was likely to be underestimated. In this paper, we consider the time-varying treatment as ordinal with 3 levels-"no treatment," "ART other than HAART," and "HAART," which therefore allows more precise quantification of the effect of HAART. However, the probability of each level of the ordinal treatment can depend on many baseline and time-varying covariates and their interactions (as reflected in the treatment guidelines when the HERS was conducted), which not only makes model misspecification likely but also makes it more challenging to balance the covariates across treatment levels. These issues thus motivated us to develop a new calibration approach to CBW estimation in MSMs.

NOTATION, SETTING AND ASSUMPTIONS
We consider a study in which independent patients are enrolled at baseline (denoted by visit 0) and then followed up over time at visits = 1, … , . For the th patient, baseline covariates (e.g., demographics) are recorded. At each follow-up visit , we assume that this patient's treatment assignment , time-varying covariate vector , and longitudinal outcome are measurable, and are recorded only if the patient makes the visit. We further assume that the variables follow the temporal ordering where , , and are only affected by { , , , −1 , }, { , , −1 , , −1 , }, and { , −1 , , −1 , , −1 , }, respectively, for = 1, … , .
Here an overbar is used to represent the history of a process, for example, = { 1 , … , }. For ease of exposition in what follows, we absorb , −1 , , −1 and into the covariate history , −1 ( = 1, … , ), unless stated otherwise. Let be the potential outcome that would have arisen at visit had the th patient been assigned the potential treatment sequence from the first visit after baseline up to visit . We assume that the causal effect of on can be encoded in an MSM of the form E( ) = ( , ) = {ℎ( ), }, where ℎ(⋅) is a function satisfying ℎ( = ) = 0, is the vector of zeros, = is the potential treatment sequence where no treatment is administered at every visit up to visit , and (⋅) is a function that relates the mean of the potential outcome to ℎ( ) through a finite-dimensional parameter vector . Note that baseline covariates can also be included in the MSM. To identify and estimate from the observed data, we make the stable unit treatment value (SUTVA) assumption, that is, the distribution of potential outcomes for one patient is assumed to be independent of potential treatment sequence of another patient, and the potential outcomes are well defined. Additionally, we make the sequential ignorability of treatment assignment assumption, that is, pr( | , , −1 ) = pr( | , −1 ) for = 1, … , , and the positivity assumption, that is, pr( ∈  | , −1 ) > 0 for all , −1 and for any set  with positive measure. Note that can have arbitrary distributions (e.g., ordinal and continuous).
In the presence of dependent censoring, the objective is to estimate the causal effect of the treatment sequence in MSMs without censoring, therefore further assumptions about the censoring process need to be made. Let be the indicator of whether the th patient remains in the study up to visit . We assume that 0 = 1 (i.e., baseline visit assessments are complete for all patients) and , −1 = 0 ⇒ = 0 (monotone missingness due to dropout). Our interest is to estimate the parameters of the MSM for E( , = ), where is the potential sequence of the indicator of the th patient being in the study by visit and is the vector of ones. To achieve this, we make an assumption that the censoring is sequentially ignorable, that is, pr( | , , −1 , , −1 = 1) = pr( | , −1 , , −1 = 1) for = 1, … , , where , −1 denotes the observable history of the th patient up to visit − 1 that can include , −1 and any other relevant covariate information. In addition, we assume that pr( | , −1 , , −1 = 1) > 0 for all , −1 , which is similar to the positivity assumption made for the treatment process.
Throughout the paper, we make the above assumptions; otherwise our method may result in severely biased estimates for parameters in the MSM, possibly even compared to an analysis without addressing time-varying confounding and dependent censoring.

INVERSE PROBABILITY OF TREATMENT AND CENSORING WEIGHTING FOR MSMs
To identify and consistently estimate using observed data under the assumptions described in Section 2, the following inverse probability of treatment and censoring weighted (IPTCW) estimating equations , −1 = 1) are the inverse probability of censoring weights, ( , ) = { ( , )∕ } −1 , and = var( ) (e.g., see Robins (1999b); Hernán et al. (2001); Ko et al. (2003) for proof). Other commonly used versions of (1) include replacing with the stabilized weights for censoring, = ∏ =1 pr( = 1 | , −1 , , −1 = 1), and incorporating baseline covariates in the numerator of (and ) when they are included in the MSM. For simplicity, we do not consider these alternatives here, but our proposed method described in Section 4 easily extends to these scenarios (e.g., see Web Appendix C for more details).
The purpose behind weighting the uncensored observation for the th patient at visit by is to create a representative sample of the target population (in the absence of censoring) at visit . This is achieved because the ∑ =1 ( − 1) copies of the uncensored observations at visit are representative of the censored observations up to and including visit in terms of , −1 , and the remaining ∑ =1 copies of the uncensored observations represent themselves (in total there are ∑ =1 copies). That is, if * denotes the pseudo-population after weighting the uncensored observations at visit by − 1, it can be shown that pr * ( = 1 | , −1 , 0 = 1) = 1∕2 (see Web Appendix C for proof). Subsequently, the purpose of weighting the uncensored observations further by is to create a pseudo-population where is conditionally independent of , −1 given , −1 , and the causal effect of on E( ) is the same as in the original population. Under the sequential ignorability, positivity, and SUTVA assumptions described in Section 2, the treatment process up to visit after weighting by will then be causally exogenous (Robins, 1999b), that is, pr * ( | , , −1 ) = pr * ( | , −1 ) = pr( | , −1 ), where * denotes the pseudo-population after weighting by . Then standard regression methods can be used to consistently estimate in the MSM if the weights in (1) are known.

JOINT CALIBRATED WEIGHT ESTIMATION FOR MSMs
In this section, we describe our joint calibration approach to CBW estimation in MSMs. It is important to note that the goal for calibration is to improve IPWEs of MSM parameters, and not to improve estimation of the treatment and censoring processes.
Specifically, we propose to calibrate (ˆ,ˆ) (ˆ) by jointly imposing calibration restrictions (i.e., moment conditions) implying that, after weighting the observed data sample, at each study visit (I) treatment assignments are unassociated with the history of time-varying covariates, and (II) we have a representative sample of the target population in the absence of censoring. For the simpler setting of IPTW, (ˆ,ˆ) can be calibrated by only imposing the restrictions for the treatment assignment process.
Let ⋆ ( ) be the calibrated weights with parameter vector . Note that we use ⋆ in superscript to highlight that the weights are calibrated.
After obtaining an estimate of ,ˆ, we propose to replace by ⋆ (ˆ) in (1) to estimate .
In the next sections, we derive calibration restrictions for the treatment assignment and censoring processes before developing the implementation procedures for the joint calibration.

Calibration Restrictions for Treatment Assignment
We derive calibration restrictions for treatment assignment by building on the framework proposed in Yiu and Su (2018) for point treatments. Let pr( | , −1 ; w ) be a parametric model for the treatment assignment. Here we use the subscript "w" in w to highlight that this is the parametric model used to derive restrictions. Following Yiu and Su (2018), we use the partition w = { wb , wd }, where wd are the unique parameters that characterize the dependence of on , −1 excluding the treatment history , −1 (i.e., regression coefficients of baseline and timevarying covariates and their interactions with , −1 ), and wb include the intercept terms and parameters that characterize the dependence on treatment history (i.e., regression coefficients of , −1 ). Here the subscripts "d" and "b" stand for dependence and baseline, respectively. Without loss of generality, let pr( | , −1 ; wb = , wd = ) = pr( | , −1 ; ), that is, setting { wb = , wd = } results in a treatment process model that only depends on treatment history and is parameterized by . Now suppose that is fixed and we have known weights ⋆ ( ), it is possible to examine whether wd = in the weighted sample with ⋆ ( ), by finding the value of w , that is,ˆw, that maximizes or equivalently solves the score equations (3) The terms in the curly brackets in (2) make it explicit that ⋆ ( ) is used to weight the likelihood of the observed treatment sequence for the th patient up to visit . We propose to derive calibration restrictions by inverting (3) and finding the value of implying that {ˆw b = ,ˆw d = } are the values that maximize (2). That is, we solve for Satisfaction of the restrictions in (4) means that, after weighting by ⋆ (ˆ), the treatment assignments up to visit are unassociated with the covariate histories conditional on the treatment histories in the observed data sample (i.e.,ˆw d = ). Note that the structure of the covariate associations is characterized by the specified parametric treatment process model. More discussion about this general framework for weight estimation for point treatments can be found in Yiu and Su (2018). In addition, Web Appendix B provides details of deriving calibration restrictions for the eventual outcome setting.

Application to ordinal treatments
We consider the following model for the ordinal treatment variable in the HERS data, where 0 is the indicator of whether at least one ART was administered, 1 is the indicator of whether HAART was administered,˜0 , −1 and˜1 , −1 are functionals of , −1 (e.g., transformations and interactions) including 1, and 0 and 1 are corresponding regression coefficients. Applying (4), restrictions based on (5) can be derived as whereˆ0 andˆ1 are the predicted probabilities of receiving treatment at visit from fitting the model (5) but with treatment history as the only covariates. The restrictions in (6) are in spirit similar to the covariate balancing restrictions/conditions for binary point treatments (Imai and Ratkovic, 2014;Yiu and Su, 2018), but they are aggregated over time. Examining these restrictions carefully, we can see that they aim to remove the associations of the covariates,˜0 , −1 and˜1 , −1 , with the residuals of the treatment variables (after fitting (5) with treatment history as the only covariates) over time.

Application to continuous treatments
We assume that a time-varying continuous treatment at visit for the th patient follows a heteroscedastic normal linear model ∼ {˜⊤ , −1 , exp(˜⊤ , −1 )}, wherẽ , −1 and˜, −1 include 1 and functionals of , −1 (e.g., interactions), and and are corresponding regression coefficients. After applying (4) to this model for treatment assignment, we obtain the following restrictions whereˆandˆ2 are the estimated mean and variance of the continuous treatments from fitting the same normal linear model but with treatment history as the only covariates. It is easy to see that these restrictions are designed to remove the associations between the covariates˜, −1 , , −1 and the standardized residuals of a treatment model that depends only on treatment history.

4.1.3
Additional restrictions for IPTW when implementing with the Type (1) method When the Type (1) method is used to implement IPTW only (e.g., when censoring is assumed to depend on treatment history only), we propose to estimate calibrated weights ⋆ ( ) for IPTW (instead of ⋆ ( ) for IPTCW) by jointly imposing (4) and additional restrictions for = 1, … , , in the same spirit as in Cao et al. (2009). That is, we also constrain the average of the calibrated weights to equal 1 at each visit. The purpose of these restrictions is to avoid the trivial solution of zeros for the weights in (4), and to improve the stability of the IPWE by prohibiting extremely large weights. As we shall see in Section 4.2, the calibration restrictions for IPCW already impose constraints on the sample size after weighting. Therefore, restrictions in (7) are redundant when calibrating weights for IPTCW.

Calibration Restrictions for Censoring
To derive calibration restrictions for IPCW, we utilize the proposition that at visit ( = 1, … , ), IPCW creates a representative sample of the target population (in the absence of censoring) at visit after weighting the uncensored observations by . This proposition can be proved by induction (see Web Appendix C). An important step in the proof is to validate the inductive steps up to visit , that is, weighting the uncensored observations at visit = 1, … , by 1∕ − 1, where = pr( = 1 | , −1 , , −1 = 1), creates a representative sample of the censored observations at visit , assuming that the proposition holds at visit − 1 and the uncensored observations at visit − 1 have been weighted by , −1 . Therefore we derive calibration restrictions for censoring by inverting weighted score equations of a parametric model evaluated at the point implying no evidence against the inductive steps.
Specifically, suppose that is fixed and the calibrated weights ⋆ 1 ( ), … , ⋆ ( ) are known. We can assess the validity of the proposition at visit by specifying a parametric model ( w ) = pr( = 1 | , −1 , , −1 = 1; w ) ( = 1, … , ) and estimating its parameter w by maximizing where 1∕ ⋆ ( ) = ⋆ ( )∕ ⋆ , −1 ( ) ( = 1, … , ), ⋆ 0 ( ) = 1, and by convention 0 0 = 1. The terms in (8) are used to assess the validity of the inductive steps at visits = 1, … , in the observed data sample, given that weighting by ⋆ , −1 has been applied at visit − 1. In particular, if ( w = ) = 1∕2 ∀ , deviations fromˆw = in the observed data sample would provide evidence against the inductive step at one or more visits up to and including visit , and thus evidence against the proposition at visit . Similarly, we can simultaneously assess the validity of the proposition at all visits by maximizing with the score equations The terms in square brackets in (9) are weighted by − + 1 because they are required for assessing whether the proposition holds at visits , … , . We derive restrictions by finding such thatˆw = are the values that solve (10). That is, we solve for such that In this paper, we assume a logistic model logit{ ( w )} =˜⊤ , −1 w , where˜, −1 is a vector of functionals of , −1 including 1. Then the restrictions based on (11) are The term (12) can be interpreted as the covariate balance summary of˜, −1 between the weighted uncensored observations at visit and the weighted uncensored observations at visit − 1.

Equation in (12) is equivalent to
(see details in Web Appendix C). Since˜, −1 ( = 1, … , ) includes 1, (13) implies , which means that the total number of "observations" after weighting is equal to , the total number of observations of the target population without censoring. If˜, −1 includes baseline covariates , (13) implies , that is, the weighted average of over all visits is equal to the sample average of . If˜, −1 includes an indicator for visit, ( = ) ( = 1, … , ), and an interaction between this visit indicator and , ( = ) , then (13) implies That is, at each visit the sample size after weighting is and the weighted average of is equal to the sample average of . Note that interactions between visits and timevarying covariates can also be included in˜, −1 , therefore time-varying covariates at different visits can be balanced separately.
In this section, we have derived restrictions for calibrating unstabilized weights for censoring; restrictions for stabilized weights for censoring can be found in Web Appendix C.

Pros and cons of implementation methods
We discuss the pros and cons of the three types of implementation methods for calibration introduced in Section 1.2. The advantage of Type (1) methods is that they almost always result in a unique solution and they are computationally efficient. This is explained in Section 4.3.3 by showing that Type (1) methods are equivalent to solving a convex minimization problem. The disadvantage of Type (1) methods is that they can inherit the poor performance of the initial weights, for example, when the initial weights are generated by a severely misspecified model. Type (2) methods do not have this problem because no initial weights are required. However, they are not guaranteed to produce weights that satisfy the calibration restrictions exactly, not to mention a unique set of weights if such a set exists. This problem is especially prominent for long treatment sequences and complex models involving nonbinary treatments. For example, in our first simulation study in Section 5, the Type (2) method in Section 4.3.4 worked well when only IPTW was required, but failed to converge (i.e., did not find weights that satisfied our proposed moment conditions) for the majority of simulated data sets when IPTCW was applied. Furthermore, in our HERS data example, the same Type (2) method produced several sets of weights that differed by more than a constant of proportionality, that is, there existed multiple solutions.
Type (3) methods look promising because they seem not to suffer from the disadvantages of Type (1) and (2) methods. Nevertheless, we discourage using Type (3) methods to implement our calibration approach because (a) they may not result in a consistent estimator of treatment effects in MSMs even when the models for deriving calibration restrictions are correctly specified, and (b) they can be computationally intensive. Issue (a) arises because in longitudinal settings, unlike Type (1) and (2) methods, Type (3) methods do not impose enough structure on the weights to ensure that they converge to the true weights for IPTCW. One way to address issue (a) is to impose more calibration restrictions by either specifying and/or modeling conditional means of the time-varying confounders given observed histories (Zhou and Wodtke, 2020), or conditional means of the potential outcomes given observed histories (Kallus and Santacatterina, 2019). However, in practice, it would be cumbersome and undesirable to conduct additional complex modeling, apart from specifying the treatment and censoring models for IPTCW in MSMs.
Here,ˆ,ˆ, andˆdenote parameter estimates by MLE without weighting. The implementation of joint calibration requires solving a system of linear equations in terms of ⋆ ( ) since the restrictions (4), (7), and (13) are linear in the calibrated weights. Let be the known × matrix and be the known × 1 vector, where is the numbers of restrictions. For example, for IPTCW, would be the combined size of w and w . Both and are determined by the calibration restrictions (4), (7), and (13). For obtaining the calibrated weights, we need to solve which can be performed in R (R Development Core Team, 2014) by using the package nleqslv (Hasselman, 2018), once the form of the calibrated weights ⋆ ( ) has been specified. The forms of the calibrated weights in the Type (1) and (2) methods differ and are now specified in the following sections.

Calibrated weights in the Type (1) method
We consider calibrated weights of the form ⋆ ( ) = (ˆ,ˆ,ˆ)• exp( ) for the Type (1) method, where exp(⋅) is performed element-wise, • denotes element-wise product, and is an × 1 vector of parameters. Although other forms of calibration are possible (e.g., see Han (2016)), this particular choice is appealing because solving (14) is equivalent to minimizing the convex function for , where is an × 1 vector of ones. The convexity of (15) ensures that the solution to (14) is unique and can be found efficiently. For the HERS analysis in Section 6, it took approximately two seconds to obtain the calibrated weights by imposing 84 restrictions for 2581 observations on a Linux machine with 2.40GHz CPU (four processors) and 128 GB memory.

Calibrated weights in the Type (2) method
For the Type (2) method, we consider calibrated weights of the form ⋆ ( ) = (ˆ, , ). That is, the calibrated weights take the form of the standard IPTCW weights. However, the parameters characterizing these weights, = { , }, are estimated by solving (14) once has been estimated by MLE. Recall that if only IPTW is to be applied using the Type (2) method, (14) will only include restrictions (4) (not additional restrictions (7)) in order to ensure that the number of parameters to be estimated is equal to the number of moment conditions. Unlike the Type (1) method, this Type (2) method resulted in multiple solutions in the HERS data example, and failed to converge when IPTCW was applied in both the HERS data example and simulation studies.

Other practical guidelines
It can be shown that the true inverse probability weights must satisfy the proposed moment conditions asymptotically, that is, the calibration/exponential tilting function in the Type (1) method should converge to 0 if the initial weights are estimated from a correctly specified model (see Web Appendix F for proof). Thus the proposed moment conditions can also be used for model checking. For example, in the HERS data analysis, we calculated the variance of the estimated calibration function since, if the treatment and censoring models for the initial weights are correctly specified, this variance should be close to zero (see Section 7.4 in Web Appendix G). Alternatively, one could detect whether a particular set of weights approximate the true IPTCW weights by assessing how close these weights are to satisfying the proposed moment conditions. We distinguish between covariate histories that are predictive of E( , = ), denoted as , −1 , and those that are predictive of , denoted as , −1 . Some elements of , −1 and , −1 overlap, which leads to confounding bias. We recommend prioritizing˜, −1 , that is, functionals of , −1 , for inclusion in the models of the treatment and censoring processes for deriving restrictions at visit (Zhao and Percival, 2017). In Web Appendix D, we provide further discussion on model choices for deriving calibration restrictions.

SIMULATION
We conduct two simulation studies to assess the finite sample performance of IPWEs for MSMs based on our calibration approach. The set-up of the first simulation study is motivated by the HERS data, where ordinal timevarying treatments are observed over long follow-up periods and dependent censoring is present. Because the CBPS approach of Imai and Ratkovic (2015) can only handle binary treatments with a small number of visits, we are only able to compare the performance of our calibration approach with the MLE approach in the first simulation study. To include the CBPS approach for comparison, we design a second simulation study for binary treatments at five follow-up visits and with no censoring. The computing time for the CBPS approach implemented by the CBPS package in R is 800 or more times than those for the calibration and MLE approaches. Full details of the simulation studies can be found in Web Appendix E. The R code for the simulation study is also available in the Supporting Information.
Overall, the simulation results confirm that both IPWEs for MSMs with weights from MLE and our calibration approach (implemented by both Type (1) and Type (2) methods) have negligible bias when the treatment and censoring models are correctly specified. IPWEs from the CBPS approach have small amounts of biases when the treatment model is correctly specified. However, with model misspecification for weight estimation, IPWEs from all approaches may have large biases that do not disappear with increasing sample sizes. Notably, the IPWEs with calibrated weights are considerably less variable and have much smaller MSEs than their MLE counterparts, especially when the treatment assignment and censoring models are misspecified. In particular, when model misspecification is induced by functional form misspecification of the covariates, the IPWEs with weights from MLE have large variances and MSEs that even increase with sample size. In contrast, the IPWEs with our calibrated weights are more stable and have smaller MSEs that decrease with sample size. The IPWEs with calibrated weights also have smaller median absolute errors (MAEs) and MSEs than their CBPS counterparts. The MAE, which is more robust to extreme values than the MSE, indicates that, after throwing away the worst half of the simulations results, our calibration approach still performs better than the CBPS approach.

APPLICATION
In this section, we apply the proposed method to the HERS data. Since HAART was not available at enrollment in the HERS cohort, we follow Ko et al. (2003) and treat visit 7, when HAART was more widely used in the HERS, as the "baseline" and estimate the causal effects of HAART over the 2-year period between visit 8 and visit 12. Besides attrition, there were secondary sources of missing data that resulted in intermittent missing data (before being lost to follow-up), missing data at enrollment for CD4 counts, and left-censored HIV viral load at the lower detection limit. We deal with these by following the approaches in Ko et al. (2003); see Web Appendix G for details. In total, there are 610 patients at visit 7 who had at least one CD4 count measured between visit 8 and 12 and sufficient information for covariates to estimate the weights for IPTW and IPTCW. The total number of CD4 count observations for analysis is 2581.

Model Parameterizations and Estimation
As discussed in Section 1.3, in order to provide a more precise estimate of the causal effect of HAART relative to no treatment, we consider the time-varying antiretroviral treatment assignment as an ordinal variable, which is represented by the indicator of whether at least one ART was administered, 0 (with a potential value 0 ), and the indicator of whether HAART was administered, 1 (with a potential value 1 ), for = 8, … , 12. Let = 0 if 7 < 200, = 1 if 200 ≤ 7 ≤ 500, and = 2 if 7 > 500, where 7 is the CD4 count at visit 7. We specify the following MSM for the potential CD4 count outcome , } for = 8, … , 12, where 0 are visit-specific intercept terms, are baseline covariates evaluated at visit 7 (see the full list in Web Appendix G), and are their corresponding regression coefficients. This MSM encodes the cumulative effect of HAART and 1-2 ARTs relative to no treatment stratified by the CD4 count at visit 7. If 1 and 2 are constrained to be constant across ( = 0, 1, 2 for the strata), then an overall cumulative treatment effect can be obtained. In addition, we assume a different MSM for evaluating short-term treatment effects in Web Appendix G.
The parameters in the MSMs were estimated by applying IPTW and IPTCW, with weights estimated by MLE and our calibration approach with the Type (1) method. As mentioned previously, we consider the Type (2) method as an unreliable option for weight estimation in the HERS data because it produced multiple solutions for IPTW and did not converge for IPTCW. For IPTW, we assume the model in (5) for the MLE approach and for deriving restrictions for calibration.
In Web Appendix G, we provide the full list of covariates in (5).
For IPTCW, a logistic model with the same covariates as those in the treatment assignment model was used for estimating the inverse probability of censoring weights by MLE and for deriving calibration restrictions (12). To prevent extreme weights as in Cao et al. (2009), the weights in IPTW by the MLE approach were scaled to sum to the number of observations in the HERS data (i.e., 2581); and the weights from MLE for IPTCW were scaled to sum to 5 times the sample size at visit 7 (i.e., the number of outcome measurements that would have been observed had nobody been censored from visit 7 onward). Finally, we estimated standard errors with 2500 nonparametric bootstrap samples by treating patients as resampling units.

Results
In Web Appendix G, we provide details of the estimated weights and discuss the extent to which they suggest that confounding bias from observed covariates is present and the positivity assumption is satisfied. Table 1 presents the estimates and standard errors of the parameters in the specified MSMs with no weighting, IPTW, and IPTCW. The results of the naïve analysis with no weighting applied, as shown in the first two rows of Table 1, strongly suggest that, compared with no treatment, HAART was effective at increasing the CD4 counts over time for those with CD4 ≤ 500 at visit 7, and 1-2 ARTs were effective for those with 200 ≤ CD4 ≤ 500 at visit 7. However, point estimates for the group with CD4 > 500 at visit 7 showed detrimental effects of both HAART and 1-2 ARTs.
Applying IPTW with weights from MLE provides an upward adjustment of the treatment effects, as seen in the third and fourth rows of Table 1. The largest adjustments for 1-2 ARTs and HAART are in the CD4 < 200 and CD4 > 500 strata, respectively. Overall, this results in a fairly substantial upward adjustment for the treatment effects in the MSM with no stratification. However, applying IPTW with weights from MLE also increased the standard errors of the estimated treatment effects.
The fifth and sixth rows in Table 1 present the results from applying IPTW with calibrated weights from the Type (1) method. It appears that HAART had an even greater effect on increasing CD4 counts for those with CD4 ≤ 500 at visit 7 and overall without stratification, compared with the results based on weights from MLE. There were also substantial increases in the estimated effects of 1-2 ARTs for those with ≥ 200 and overall. As anticipated, the estimated standard errors with the calibrated weights are much smaller even compared to the naïve analysis with no weighting applied.
Further adjustment for selection bias due to dependent censoring appears to have largely minor effects, as seen in the last four rows of Table 1. The most notable modifications occur in the CD4 > 500 strata. However, there is substantial uncertainty associated with these estimated TA B L E 1 Parameter estimates and bootstrap standard errors of the MSMs by applying no weighting, IPTW, and IPTCW with weights from maximum likelihood (MLE) and from the calibration approach (CMLE) to the HERS data treatment effects, therefore the evidence is insufficient to draw a conclusion. As expected, our estimated treatment effects for HAART are generally much larger (more than 1 standard error) than those reported in Ko et al. (2003), since we have separated the group with 1-2 ARTs from the group with no treatment. The slightly larger effect of HAART in the CD4 > 500 strata from Ko et al. (2003) is again associated with substantial uncertainty.
In conclusion, the results in Table 1 indicate that there were clinically substantial and statistically significant therapeutic effects of cumulative exposure to HAART for those patients with initial CD4 count ≤ 500, which is consistent with the findings in Ko et al. (2003) and the recommended treatment guideline during the study period of the HERS.

CONCLUSION AND DISCUSSION
In this paper, we have proposed a new CBW approach to MSMs that can accommodate both time-varying confounding and dependent censoring, binary and nonbinary timevarying treatments as well as eventual and longitudinal outcomes. Simulations showed that IPWEs for MSMs with weights from our calibration approach had smaller variances and MSEs than IPWEs with weights from the MLE and CBPS approaches, under correct and incorrect model specification. The flexibility and computational efficiency of our calibration approach makes it well equipped to deal with common scenarios in fitting MSMs using observational cohort data from clinical studies such as the HERS. This will hopefully promote more widespread use of MSMs for various types of treatments/exposure and outcomes in practice. We emphasize that choosing the correct set of covariates and functionals for balancing remains important for the performance of IPWEs with the proposed approach. This is related to the challenging problem of "covariate selection" in causal inference literature (Shortreed and Ertefaie, 2017). Specifically, imposing exact covariate balance, as done in both the proposed approach and the CBPS, will limit the number of covariates and functionals included for balancing, which can reduce the robustness and efficiency of IPWEs if observed confounders and important predictors of the outcome are omitted (Wang and Zubizarreta, 2020). One possible solution is to allow approximate covariate balance such that more calibration restrictions can be included, as advocated in Wang and Zubizarreta (2020). Second, it would be useful to replace initial weights from MLE with initial weights estimated by data-adaptive methods. This can provide some protection from severe model misspecification, and therefore reduce the possibility of large bias for IPWEs with calibrated weights. Third, as pointed out by the associate editor, it would be useful to construct double robust (DR) estimators based on the proposed calibration approach. Focusing on binary treatments, a natural way to construct a DR estimator with our CBWs is to either incorporate them into the augmented inverse probability weighted estimator, or into the targeted maximum likelihood approach as a clever covariate. However, it is not clear if such estimators will perform well even when the treatment assignment and outcome regression models are both mildly misspecified (Kang and Schafer, 2007). Therefore it would be desirable to extend our proposed approach by developing new DR estimators that can perform well when either, but not necessarily both, of the working models for nuisance parameters are mildly misspecified.
Similar to other CBW approaches, it warrants future research to develop sensitivity analysis strategies for the proposed approach to assess the impact of violations to the "no unmeasured confounders" assumption. Ko et al. (2003) implemented the sensitivity analysis approach suggested in Robins (1999a) by introducing a sensitivity parameter defined as the difference between the means of the potential outcomes given observed treatment/covariate histories. This approach is relatively straightforward for binary treatments and continuous outcomes, but less straightforward for other treatment and outcome combinations.

A C K N O W L E D G M E N T S
The authors would like to thank two referees, the associate editor, and Dr Shaun Seaman for helpful comments and suggestions. Data from the HERS were collected under grant U64-CCU10675 from the U.S. Centers for Disease Control and Prevention. This work is supported by the U.K. Medical Research Council [Unit programme number MC_UU_00002/8; grant number MR/M025152/1].

D ATA AVA I L A B I L I T Y S TAT E M E N T
Permission to access the HERS data can be obtained through the U.S. Centers for Disease Control and Prevention. Process to access the database and contact person can be found at https://aspe.hhs.gov/report/inventoryfederally-sponsored-hiv-and-hiv-relevant-databases/data base-hiv-epidemiologic-research-study-hers.