Accounting for time‐dependent treatment use when developing a prognostic model from observational data: A review of methods

Failure to account for time‐dependent treatment use when developing a prognostic model can result in biased future predictions. We reviewed currently available methods to account for treatment use when developing a prognostic model. First, we defined the estimands targeted by each method and examined their mechanisms of action with directed acyclic graphs (DAGs). Next, methods were implemented in data from 1,906 patients; 325 received selective β‐blockers (SBBs) during follow‐up. We demonstrated seven Cox regression modeling strategies: (a) ignoring SBB treatment; (b) excluding SBB users or (c) censoring them when treated; (d) inverse probability of treatment weighting after censoring (IPCW), including SBB treatment as (e) a binary or (f) a time‐dependent covariate; and (g) marginal structural modeling (MSM). Using DAGs, we demonstrated IPCW and MSM have the best properties and target a similar estimand. In the case study, compared to (a), approaches (b) and (e) provided predictions that were 1% and 2% higher on average. Performance (c‐statistic, Brier score, calibration slope) varied minimally between approaches. Our review of methods confirmed that ignoring treatment is theoretically inferior, but differences between the prediction models obtained using different methods can be modest in practice. Future simulation studies and applications are needed to assess the value of applying IPCW or MSM to adjust for treatments in different treatment and disease settings.


BACKGROUND
Prognostic models provide information that can be used to guide physicians and patients when making medical decisions. For example, if a patient has a high predicted probability of a poor health outcome, the physician may recommend preventative medication or referral for specialist treatment. For a prognostic model to be useful for guiding individual decisions regarding interventions, predictions provided by the model should ideally reflect what a person's risk of developing a certain outcome would be if there were to be no intervention (Liew, Doust, & Glasziou, 2011). This has been termed the "untreated risk" of an outcome (Groenwold et al., 2016;Pajouheshnia, Damen, Groenwold, Moons, & Peelen, 2017) and can be formally defined as the probability Pr(Y T = 0 | P), where Y T = 0 is the outcome if individuals were to remain untreated and P is a vector of predictors (Sperrin et al., 2018).
Data used to develop prognostic models often come from observational studies or longitudinal medical databases (e.g., electronic health record data) in which the study sample is composed of both treated and untreated individuals. In particular, individuals who were not receiving treatment when entering the cohort may start treatment during follow-up-an issue termed treatment "drop-in" (Liew et al., 2011;Sperrin et al., 2018). Unfortunately, conventional approaches to develop a prognostic model using data from a partly treated study population will result in a model that does not provide predictions of untreated risk, instead predicting Pr(Y | P). Assuming the treatment is effective in reducing the risk of developing the outcome of interest, the observed risk of individuals in the development sample will, on average, be lower than the risk had they remained untreated. Therefore, risk predictions for future patients provided by the model will underestimate the true untreated risk of the outcome (Groenwold et al., 2016). This could have a number of negative consequences, including possible under-treatment in future patients. Given that a contemporary yet untreated population in many cases no longer exists, several researchers have suggested that the effects of treatment should be accounted for in the analysis when developing a prognostic model (Cheong-See et al., 2016;Groenwold et al., 2016;Liew et al., 2011;Liew, Doust, & Glasziou, 2012;Moons et al., 2015;Sperrin et al., 2018). However, as of yet, there is no clear consensus over whether one analytical approach to deal with the effect of treatment is to be preferred over another.
Simulation studies have been conducted to investigate the impact of methods to account for the effects of treatments on the accuracy of prognostic models. Groenwold et al. (2016) showed that, for a point (time-invariant) treatment, including treatment as a covariate in a prediction model resulted in better predictive performance than ignoring treatment or excluding treated individuals from the development set. This was extended by Sperrin et al. (2018), who proposed the use of marginal structural models (MSMs) to model untreated risk. Through simulations, MSMs were shown to provide unbiased estimates of Pr(Y T = 0 | P), the target estimand of a prognostic model.
In this article, we review currently available methods that might be considered when trying to develop a prognostic model in a partly treated patient cohort, with specific focus on time-varying treatment use after the study baseline. We first outline the theoretical properties of different methods, using directed acyclic graphs (DAGs) to explain their underlying assumptions. Following this, we illustrate how the methods can be applied in practice in a case study and examine differences between the resulting models.

METHODS TO ACCOUNT FOR TREATMENT USE
We consider six approaches that try to remove the effect of treatment drop-in in the development data. The first three approaches remove treatment from the data set in different ways. The next three approaches explicitly model treatment effects. All approaches are based on developing a Cox model with the same predictors as a standard model that ignores treatment use. In this section, we describe the effect of each approach on the associations between predictors in our model (P), treatment use (T), and mortality (Y), using causal diagrams. Figure 1a represents these relationships when treatment is ignored. For each of the approaches (or estimators), we consider whether and how they may provide unbiased estimates of our estimand of interest, namely, Pr(Y T = 0 | P).

Excluding or censoring treatment users from the cohort
A seemingly straightforward way to remove the effect of treatment drop-in is to restrict the analysis to patients who do not receive the treatment during follow-up or to include follow-up of those patients until the moment they start treatment and censor them from that moment onward.
As illustrated in Figure 1b, restricting the analysis to individuals who did not receive a prescription (T = 0) will remove the effect of treatments on mortality from the cohort. However, as previously mentioned, in observational data, treatment is typically associated with predictors in the model (P). Restricting the analysis to the T = 0 subset has two problems: (a) the distribution of predictors and outcomes in the subset may not represent the target population for the model and (b) in the presence of an unobserved confounder (U) of the treatment-outcome association, T now serves as a collider, which is represented by the two incoming arrows. Conditioning on T will alter the relation between P and U, which may lead to biased estimation of the prognostic model (the association between P and Y; see Figure 1c; Hernán, Hernández-Díaz, & Robins, 2004). Alternatively, censoring individuals at the moment of treatment drop-in will retain a representative risk distribution in the data at baseline and retain more information on follow-up. However, as with restriction, if censoring is "informative," that is, associated with P and U, selection bias might affect the estimated model parameters, instead estimating Pr(Y | P, The mechanisms underlying approaches to remove the effect of treatment drop-in on a prognostic model. For a model to provide predictions of the risk of mortality if an individual were not to receive treatment, the model parameters should be estimated such that the associations between P and Y are not influenced by the effect of T on Y. (a) The arrow from predictor to treatment (drop-in) represents a causal association. An indirect pathway from P to T to Y is present, as well as the direct path from P to Y. (b) Conditioning on treatment (represented by a black box) by restriction removes the path between T and Y. (c) In the presence of U, T becomes a collider, and restriction on T creates a backdoor path from P to Y, via T and U, inducing selection bias. (d) Censoring first conditions on T and removes the association between T and Y. The causal path from P to T is removed by inverse probability of treatment weighting. (e) Conditioning on T blocks the indirect path from P to Y via T. (f) In the presence of U, T becomes a collider, and restriction on T creates a backdoor path from P to Y, via T and U, inducing selection bias. For a marginal structural model, this may also create a backdoor path from P to U to Y (not shown). P = predictor(s); Y = outcome; T = treatment; U = unmeasured variable(s)

Reweighting using inverse probability of (censoring) treatment (IPCW) weights
To solve the issue of informative censoring, weights can be applied to patients so that the association between treatment and predictors is removed (Robins & Finkelstein, 2000). Given that treatment drop-in does not occur at a fixed point in time, time-dependent weights need to be derived by, for example, dividing patient follow-up into multiple periods and fitting logistic regression models in each of the periods, regressing the censoring events on predictors of the probability of being censored (i.e., treated), in order to derive time-varying censoring weights (Austin & Stuart, 2015;Cole & Hernán, 2008;Latimer et al., 2017). Stabilized weights for patient i at time j take the form where T(k) is treatment at time point k, P are predictors measured at baseline, and KM represents the Kaplan-Meier survival estimates produced by a null Cox model. In the presence of time-dependent predictor variables, this may yield biased estimates of Pr(Y T = 0 | P). If information on time-dependent predictors P is available, this can be substituted for P, and baseline predictor values P can be used in the numerator to stabilize the weights. The prognostic model is then developed, after censoring, using a weighted Cox model. As Figure 1d indicates, if the underlying assumptions of the weighting procedure are met, the association between P and T should be removed, on top of the removal of the association between T and Y by censoring, allowing for correct estimation of the associations between P and Y in a treatment-naïve pseudopopulation. This might not be the case, for example, if there is a very strong indication for treatment resulting in structural non-positivity (Cole & Hernán, 2008).

Explicitly modeling treatment: time-dependent Cox model
Previous findings have suggested that the explicit modeling of treatment can correct for the effect of a point treatment on predictor-outcome associations (Groenwold et al., 2016). By conditioning the prognostic model on treatment use, the pathway from P to Y via T is blocked (see Figure 1e). A crude approach to model treatment drop-in is to include an "ever/never" indicator variable in the prognostic model. To make prognostic predictions for future patients using this model, we would set their value for the drop-in indicator to zero. However, drop-in occurs by definition after baseline, and conditioning on future treatment (and, thus, future survival) introduces immortal time bias into the estimated model coefficient for treatment drop-in (Lévesque, Hanley, Kezouh, & Suissa, 2010). As T is associated with P, this bias will affect the coefficients of the prognostic model and, hence, affect the predicted probabilities. To address this bias, treatment can be included as a time-dependent covariate in the prognostic model (Fisher & Lin, 1999). If information on treatment status throughout follow-up is available, a time-dependent Cox model can be constructed using start-and stop-time notation (Therneau & Grambsch, 2013). Unlike all the previous approaches, this approach directly models all changes in treatment, such as discontinuation of treatment over time, and treated patients will again contribute untreated person-time after discontinuation. Therefore, this approach makes maximal use of the available data. However, by explicitly modeling treatment, the targeted estimand becomes Pr(Y | P, T = 0), which, for Cox regression, may deviate from Pr(Y T = 0 | P) due to non-collapsibility of the hazard ratio (Greenland, 1996). In addition, the prognosis of patients post-treatment might not represent the untreated prognosis one intends to capture in a prognostic model, and if treatment during follow-up is dependent on prior values of predictors (as shown in Figure 1f), in the presence of residual confounding of the association between T and Y, conditioning on treatment opens a backdoor path from P to Y (via U) and may result in model coefficients that are still biased (Hernán et al., 2004).

Explicitly modeling treatment: MSM
This approach formally derives a prognostic model within a counterfactual framework. As described elsewhere (Sperrin et al., 2018), to estimate the counterfactual, untreated risk Pr(Y T = 0 | P), we can first estimate stabilized inverse probability of treatment (IPTW) weights. As with the derivation of IPCW weights (as described in Section 2.2), we can calculate time-varying IPTW weights. As treatment users are not censored in this approach, weights are calculated for the full period of follow-up for both treated and untreated patients. A weighted prognostic model is derived, which includes the relevant predictor variables, as well as a term to model time-dependent treatment and terms for any interactions between treatment and other predictors in the model that may vary over time. As with explicitly modeling treatment, this approach aims to remove the association between treatment and predictors in the model and will incorporate patient follow-up after treatment discontinuation. Importantly, as with the time-dependent Cox method, this approach also remains sensitive to unmeasured confounding of the treatment-outcome association, as shown in Figure 1f, and requires correct specification of any treatment-predictor interactions. Censoring followed by IPCW does not require any modeling of the treatment-outcome relation, as treated follow-up time is completely removed from the data set.

APPLIED EXAMPLE
We demonstrate the methods described in Section 2 in a case study in which we developed a (hypothetical) prognostic model to predict 5-year mortality risk for patients with chronic obstructive pulmonary disease (COPD) if they were not to use cardio-selective -blocking agents (SBBs). Due to a perceived association between SBB use and COPD exacerbations (Lipworth, Wedzicha, Devereux, Vestbo, & Dransfield, 2016), estimates of patient prognosis if SBBs were to be withheld could be used by physicians to weigh the potential benefits against the harms of prescribing SBBs for individual patients. However, in practice, some COPD patients do use SBBs, and these agents are known to improve survival (Du, Sun, Ding, Lu, & Chen, 2014;Etminan, Jafari, Carleton, & FitzGerald, 2012). All other ("background") treatments were considered to be a part of routine care and should not bias the estimation of untreated risk . A summary of the case study is presented in Supplemental Figure 1.

Case study data
A cohort was defined using electronic health record data from the Utrecht General Practitioners Network database, with patient entry and follow-up from July 1, 1996, to December 31, 2005 (Rutten et al., 2005;Rutten, Zuithoff, Hak, Grobbee, & Hoes, 2010). Prevalent and incident COPD cases entered the cohort on July 1, 1996, and on their date of diagnosis, respectively. In line with previous definitions in this database of an adult, new-user cohort, patients younger than 45 years or using β-blocker medication (ATC code C07) within 14 days prior to cohort entry were excluded.
In total, 1,906 patients were included; median follow-up was 6.7 years, and 559 (29.3%) patients died during follow-up, of which 372 died within 5 years.
Patients prescribed SBBs during follow-up according to their individual prescription records (Rutten et al., 2010) were classified as SBB "users" (yes/no). Prescription duration (days) was calculated as the total number of tablets prescribed divided by the daily tablet quantity prescribed. When multiple prescriptions overlapped in a treatment episode, the total number of overlapping days was added to the end date of the final prescription, to better capture the possibility that patients would still have medication available. In total, 325 (17.1%) patients began using SBBs after entering the cohort, and the median time on treatment was 10.3 months. Prescriptions varied considerably from patient to patient (see Supplemental Figure 2). In total, 68 SBB users (20.1%) died during follow-up, compared to 491 nonusers (31.1%), and SBB users more often had a history of cardiovascular comorbidities than non-SBB users (see Table 1). For all other treatments, only data on use at baseline were recorded. For the sake of illustration, missing data on the number of tablets prescribed (1.2%) and smoking status (33.3%) were singly imputed using

Derivation of a hypothetical prognostic model
First, a prognostic model that completely ignored SBB drop-in was developed using Cox regression using the full follow-up time of all patients. Candidate predictors were selected based on the literature (Esteban et al., 2008;Singanayagam, Schembri, & Chalmers, 2013), the availability of measurements, and backward selection using likelihood ratio testing. The prediction model consisted of the estimated 5-year baseline hazard and 17 predictors: age, sex, smoking status, history of comorbidities (diabetes, hypertension, angina pectoris, myocardial infarction, atrial fibrillation, heart failure, stroke, and peripheral arterial disease), and baseline drug use (drugs for obstructive airway diseases, corticosteroids for systemic use, lipid modifying agents, agents acting on the renin-angiotensin system, diuretics, and antithrombotic agents). The proportional hazards assumption was assessed by visual inspection of Schoenfeld residuals, and no major violations were identified. As treatment use is typically ignored in most prognostic model development studies , the Cox model ignoring treatment-now referred to as "Model 1"-served as a reference model.

Application of methods
We applied each of the approaches to account for treatment drop-in described in Section 2: excluding treated patients (Model 2), censoring treated patients at the moment of treatment drop-in (Model 3), IPCW (Model 4), modeling drop-in as a binary covariate (Model 5), modeling treatment as a time-varying covariate (Model 6), and fitting an MSM with time-varying IPTW weights (Model 7). First, prognostic models were developed following each approach, and the resulting model coefficients and estimated 5-year baseline hazards were estimated. To derive the IPCW and IPTW weights, we divided the data into seven time periods of approximately 1 year and derived time-dependent stabilized weights for each time period using logistic regression (Austin & Stuart, 2015). While shorter periods of 3 months, for example, would have more accurately modeled treatment switches, the number of treated patients per 3-month period did not permit stable estimation of propensity scores. Next, to illustrate differences between the derived models, we used each of the models to calculate 5-year predicted mortality risks for patients in the development data, and we compared these predicted risks to the predicted risks produced by Model 1. Lastly, we assessed the apparent predictive performance (Harrell's c-index, Brier score, calibration slopes) in (a) the full development data set, with patients censored at the moment of SBB treatment, and (b) SBB nonusers only, to evaluate performance in untreated patients. All analyses were performed using the R statistical program (Version 3.2.2). In addition, comparisons were repeated using a "highly treated" subset of the data (50% of the patients began using SBBs), to see whether the results of the modeling approaches differed more with more treatment use in the development sample.

Case study findings
The regression coefficients differed between all seven models (see Table 2). The largest changes in regression coefficients compared to Model 1-developed by the conventional approach of ignoring treatment-were in Models 2 (SBB users excluded), 3 (SBB users censored), 4 (IPCW),  and 7 (MSM). Model 6 (time-dependent SBB) hardly deviated from Model 1. The regression coefficient for (binary) treatment use was negative (−0.56), representing a highly protective effect, but likely due to immortal time bias as described in Section 2.3. The coefficient for SBB use was positive (suggesting a harmful effect) when modeled directly as a time-dependent covariate (0.39) or in an MSM (0.06), which could be the result of residual confounding or misclassification of SBB exposure. Risk predictions made by all models in the development data ranged across the whole spectrum of probabilities (0-1). As shown in Figure 2, Models 2 and 5 produced slightly higher risk predictions than Model 1 (treatment ignored). Models 3, 4, and 6 provided slightly lower risk predictions than Model 1. The means of the predicted risks in the full development cohort were 21.3% (Model 1), 22.7% (Model 2), 20.8% (Model 3), 20.8% (Model 4), 22.5% (Model 5), 21.1% (Model 6), and 21.6% (Model 7). When evaluated in full data, after censoring SBB users, the apparent predictive performance was consistent across the models: c-statistic = 0.79, Brier score = 0.09, and calibration slope ranged from 0.99 (Model 4) to 1.02 (Models 1, 5, and 6). Similar results were seen when the models were evaluated in the data with SBB users excluded.
Repeating the model development and evaluation in a highly treated subset of the development data intensified the differences between the approaches. As Figure 3 shows, the trends of higher predictions as compared to Model 1 (mean predicted risk: 18.6%) after excluding treatment (23.8%) and modeling binary treatment (23.5%) as well as lower predictions after censoring with (15.7%) or without (15.9%) IPCW were more prominent.

FIGURE 2
Five-year mortality risk predictions provided by the seven models for patients in the development data set. Predictions using Models 2-7 are plotted against predictions provided by Model 1 (treatment ignored). Blue points represent predictions for selective β-blocker (SBB) nonusers in the cohort; red points represent predictions for SBB users

FIGURE 3
Five-year mortality risk predictions provided by the seven models for patients in a "highly treated" subset of the development data (50% treatment drop-in). Predictions using Models 2-7 are plotted against predictions provided by Model 1 (treatment ignored). Blue points represent predictions for selective β-blocker (SBB) nonusers in the cohort; red points represent predictions for SBB users

DISCUSSION
This study reviews seven approaches to account for treatment use when developing a prognostic model using observational data. Previous work has focused on methods to account for point treatments, that is, treatment use that does not change over time (Groenwold et al., 2016;Pajouheshnia, Peelen, Moons, Reitsma, & Groenwold, 2017). The applied case study that we present shows the use of treatments can vary greatly over time and across individuals (see Supplemental Figure 2). The seven models, derived by ignoring treatment or following one of six approaches, varied in their coefficients and estimated slightly different risks for individuals in the development data set. We expected the use of effective treatments in the development set to result in a model that underestimates untreated risks and, therefore, that approaches to correct for this would result in models that yielded higher predicted risks on average. However, in our case study, the impact was minimal.
There are a number of explanations for the limited and, where present, unexpected differences between approaches that we observed in the case study. First, it may be that the effect of treatment in the development cohort was too small to impact the performance of our prognostic model. The abundant literature supporting the protective effect of SBBs on mortality for patients with COPD (suggesting a risk ratio of approximately 0.7) suggests there was indeed a true effect of treatment in the patients in our data set who did receive treatment. However, it may be that the 17.1% drop-in rate was insufficient to affect predictions. Additionally, most patients were prescribed SBBs for less than 1 year, which may have limited the impact on their overall risk profile during the study. With the exception of modeling SBB use with an MSM or as a time-dependent covariate, the approaches did not account for the possibility that individuals may have stopped treatment. We also assumed that treatment prescriptions directly represent treatment use, which may have overestimated the total amount of actual treatment use. Indeed, when the analysis set was sampled such that the SBB drop-in rate was 50%, the effects of removing or modeling treatment on risk predictions provided by the models became more apparent. The difference between drug prescription or dispensing and actual use is an inherent challenge when using routinely collected data to model treatment use. In addition, the limited effects of the more advanced IPCW and time-dependent covariate approaches may also be due to residual confounding. While this study has demonstrated that it is feasible to account for the time-dependent use of a pharmacological treatment in practice using prescription information, due to a lack of available information on disease severity and comorbidity during follow-up, we were not able to incorporate this important factor in our analyses. Unobserved confounding of predictor-treatment associations can bias IPCW and MSM estimates, as discussed elsewhere (Cole & Hernán, 2008), and, as indicated in Figure 1, may result in a failure to remove the association between predictors and treatment. In such a case, these methods would not correctly estimate Pr(Y T = 0 | P). This highlights the challenge of implementing more complex approaches when using observational data with limited available data.
Our structured approach to considering the potential bias caused by treatments falls in line with the ideas recently described by Sperrin et al. (2018), who recommended using MSMs to address treatment use for prognostic model development. In our review of methods, we include an alternative estimator, namely, censoring followed by IPCW. Although all of the approaches considered in this paper attempt to remove the influence of treatment on predictor-outcome associations in a prognostic model, subtle differences between them may affect their suitability. We have demonstrated that these differences can be largely explained in terms of the target estimands of the different approaches and the potential mechanisms by which they can introduce selection bias. As previously discussed (Robins, Hernan, & Brumback, 2000;Sperrin et al., 2018), an advantage of the MSM approach is that, unlike simply modeling treatment as a time-dependent covariate, it can account for the association between prior treatment use and time-dependent confounders. Furthermore, as with the time-dependent Cox approach, this method utilizes all available data by also modeling periods after treatment discontinuation. However, correct specification of the MSM to capture differences between the pre-and post-treatment prognoses of patients may be challenging, and censoring followed by IPCW could be considered a safer, more conservative approach.
While this study provides insight into how various methods to account for treatment drop-in work and highlights the assumptions they require, we do not provide empirical evidence to support one method over another. Ideally, to compare different modeling approaches empirically, the performance of models derived following each approach would need to be compared in a treatment-naïve validation data set. Such data were not available for our case study, due to the widespread use of SBBs, and thus, we could only evaluate model performance in an untreated subset of the development cohort. Future studies using real data may need to consider employing advanced methods to account for treatment use in a validation data set . Alternatively, future research could evaluate model performance in a data set with simulated untreated outcomes. In addition, in this study, we selected the parameters in our prognostic model based on clinical knowledge and a backward selection procedure. It is not yet established whether shrinkage methods for model selection could influence the effect of treatment on a prediction model, or vice versa, thus requiring further investigation. However, our applied case study does demonstrate that the choice of approach affects the risk predictions made by a developed prognostic model. Whether these approaches are necessary for a given study will depend on factors such as the strength of the treatment effect, the duration of treatment, the number of individuals treated, and the strength of any associations between predictors in the model and treatment use; all of which can be investigated in additional case studies and simulation studies.
In conclusion, the available methods that could be used to account for treatment drop-in when developing a prognostic model vary in terms of how they account for causal pathways in the predictor-treatment-outcome association and in terms of their target estimands. Censoring followed by IPCW or MSMs should be considered in future model development studies.