Dynamic marginal structural modeling to evaluate the comparative effectiveness of more or less aggressive treatment intensification strategies in adults with type 2 diabetes

Romain Neugebauer, Kaiser Permanente Northern California, Division of Research, 2000 Broadway, Oakland, CA 94612, USA. E-mail: Romain.S.Neugebauer@kp.org

Chronic disease care typically involves treatment decisions that are frequently adjusted to the patient's evolving clinical course (e.g., hemoglobin A1c monitoring and treatment intensification in diabetes patients). Thus, in comparative effectiveness and safety research (CER), it often is less clinically relevant to contrast the health effects of static treatment decisions than to compare the effectiveness of competing medical guidelines, that is, adaptive treatment strategies that map the patient's unfolding clinical course to subsequent treatment decisions. With longitudinal observational studies, treatment decisions at any point in time may be influenced by clinical factors that also are risk factors for the outcome of interest. Such time-dependent confounders cannot be properly handled with standard statistical approaches, because such confounders may be influenced by previous treatment decisions and may thus lie on causal pathways between the very outcomes and early treatment decisions whose effects are under study. Under explicit assumptions, we motivate the application of inverse probability weighting estimation to fit dynamic marginal structural models (MSMs) in observational studies to address pragmatic CER questions and properly adjust for time-dependent confounding and informative loss to follow-up.

Methods

We review the principles behind this modeling approach and describe its application in an observational study of type 2 diabetes patients to investigate the comparative effectiveness of four adaptive treatment intensification strategies for glucose control on subsequent development or progression of urinary albumin excretion.

Results

Results indicate a protective effect of more aggressive treatment intensification strategies in patients already on two or more oral agents or basal insulin. These conclusions are concordant with recent randomized trials.

The progressive nature of type 2 diabetes mellitus (T2DM) results in frequent revisiting of treatment decisions for many patients as glycemic control deteriorates. Widely accepted stepwise guidelines start treatment with metformin, then add a secretagogue if control is not reached or deteriorates. Insulin or (less frequently) a third oral agent is the next step. Current recommendations specify target hemoglobin A1c of < 7% for most patients.[1, 2] However, evidence supporting the effectiveness of this blanket recommendation is inconsistent across several outcomes,[3-10] especially when intensive anti-diabetic therapy is required.

The effects of intensive treatment remain uncertain, and the optimal target levels of A1c for balancing benefits and risks of therapy are not clearly defined.

Because no additional major trials addressing these questions are underway, better answers are at best 5–10 years away. Thus, analytic methods that aim to mimic randomized experiments using observational data and explicit assumptions provide an alternative to address comparative effectiveness and safety research (CER) questions. Using the electronic health records (EHRs) from patients of seven sites of the HMO Research Network (HMORN),[11] we assembled a large retrospective cohort study of adults with T2DM to evaluate several clinical outcomes. We addressed these questions using inverse probability weighting (IPW) to fit marginal structural models (MSMs) and properly account for the time-dependent confounding and informative selection bias that often arise in observational cohort analyses. Here, our principal goal is to illustrate this analytic approach, but we also present results for one clinical outcome, the development/progression of albuminuria, for which prior trials have suggested the benefit of tight control.

AN OBSERVATIONAL, MULTI-CENTER RETROSPECTIVE COHORT STUDY

We contrast two observational approaches to examining the question of optimal glycemic control strategy in T2DM. In the first, associations of patients' A1c levels with subsequent clinical outcomes are directly analyzed. Many previous epidemiologic studies have made such a comparison, but these associations in clinical data tell us nothing about the effect of intensifying therapy at any given A1c level.

The approach we chose starts by identifying the key pragmatic clinical question and then restricting the study cohort to patients for whom the question is relevant, as in a clinical trial. Here, the relevant clinical question involves only patients who are “failing” current therapy with multiple (≥ 2) oral agents and/or basal insulin. We define “failing” as having an A1c that rises above 7% after being below 7% on the current treatment regimen. Cohort entry occurs on the date of the first elevated A1c. We excluded patients whose first elevated A1c is > 8.5%, because there is then little question that intensifying anti-diabetic therapy is needed (unless patients are not adhering to current treatment). We also excluded patients whose life expectancy is limited by selected co-morbid conditions. By restricting analyses to patients in whom the clinical question is truly relevant, we aim to mimic inferences from a randomized experiment that compares several plausible treatment intensification (TI) guidelines, each based on a different threshold level of A1c.
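The entry rule above can be expressed as a short function. The sketch below is illustrative only: the input format (chronologically ordered pairs of time and A1c value recorded while on the qualifying regimen), the function name `cohort_entry`, and the handling of values exactly at 7% or 8.5% (we follow the strict inequalities 7% < A1c < 8.5% of Appendix A) are our assumptions, and the other eligibility criteria are not checked.

```python
def cohort_entry(a1c_series, low=7.0, high=8.5):
    """Return the cohort entry time for one patient, or None if not eligible.

    a1c_series: chronologically ordered (time, A1c %) pairs recorded while the
    patient is on the qualifying regimen (hypothetical input format).
    Entry is the first A1c above `low` that follows an A1c below `low`;
    the patient is excluded if that first elevated value is not below `high`.
    """
    seen_controlled = False
    for when, value in a1c_series:
        if value < low:
            seen_controlled = True          # regimen was controlling glycemia
        elif value > low and seen_controlled:
            # first elevated value after control: enter, or exclude if too high
            return when if value < high else None
    return None                             # never "failing" on this regimen
```

For example, the series (1, 6.8), (2, 7.0), (3, 7.6) yields entry at time 3, whereas (1, 6.9), (2, 8.9) yields exclusion.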

We searched the entire adult membership of seven participating HMORN health plans for enrollees meeting the eligibility criteria described in Appendix A. We enrolled each one at the earliest date between 1 January 2001 and 30 June 2009 on which all criteria were met.

These criteria identified 58 671 patients. All patients were followed up from study entry until the earliest of 31 December 2009 (end of the study), plan disenrollment, or death.

MOTIVATION FOR MARGINAL STRUCTURAL MODEL APPROACHES IN COMPARATIVE EFFECTIVENESS AND SAFETY RESEARCH STUDIES

Comparative effectiveness and safety research problems typically involve comparing survival outcomes between exposure groups that are not defined by a therapy administered at one point in time but, instead, by a sequence of therapies experienced by patients over time and characterized by initiations, discontinuations, or changes in therapies. In observational studies, the effect of these exposure regimens may be confounded by risk factors for the outcome that also affect the selection of early and subsequent therapies. For example, Figure 1 represents plausible causal relationships between variables collected in our study using a directed acyclic graph (DAG). Standard methods to account for confounding involve modeling approaches that compare the distributions of outcomes conditional on both the levels of confounders and exposure variables. However, it is well established[12, 13] that such methods do not properly handle time-dependent confounders affected by early therapy exposure such as covariate L(1) (Figure 1) because of the following: (i) inclusion of such confounders in standard models may lead to bias because they block some of the causal pathways under investigation; and (ii) exclusion of such confounders from standard models may lead to bias because of the lack of confounding adjustment.
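A small simulation can make points (i) and (ii) concrete. The sketch below uses a hypothetical data-generating process (not a depiction of Figure 1) with an unmeasured risk factor U: A(0) is randomized, L(1) is affected by both A(0) and U, A(1) depends on L(1), and the outcome depends only on U, so neither treatment has any effect. Regression adjusted for L(1) then shows a spurious effect of A(0) (point (i)), regression omitting L(1) shows a spurious effect of A(1) (point (ii)), and an IPW fit of the marginal model with stabilized weights P(A(1) | A(0)) / P(A(1) | A(0), L(1)) recovers approximately null effects. Linear outcome models, the function names, and all numeric settings are arbitrary choices for illustration.

```python
import numpy as np

def simulate(n=200_000, seed=2024):
    """Hypothetical longitudinal data with a time-dependent confounder L(1)."""
    rng = np.random.default_rng(seed)
    a0 = rng.binomial(1, 0.5, n)                           # A(0): randomized
    u = rng.normal(size=n)                                 # U: unmeasured risk factor
    l1 = (a0 + u + rng.normal(size=n) > 0.5).astype(int)   # L(1): affected by A(0) and U
    a1 = rng.binomial(1, 0.2 + 0.6 * l1)                   # A(1): depends on L(1)
    y = u + rng.normal(size=n)                             # outcome: no effect of A(0) or A(1)
    return a0, l1, a1, y

def wls(y, X, w=None):
    """(Weighted) least-squares coefficients."""
    w = np.ones(len(y)) if w is None else w
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

def cell_mean(values, *strata):
    """Empirical mean of `values` within each cell defined by binary strata."""
    key = np.zeros(len(values), dtype=int)
    for s in strata:
        key = 2 * key + s
    out = np.empty(len(values))
    for k in np.unique(key):
        out[key == k] = values[key == k].mean()
    return out

def run():
    a0, l1, a1, y = simulate()
    one = np.ones(len(y))
    adjusted = wls(y, np.column_stack([one, a0, a1, l1]))  # conditions on L(1)
    unadjusted = wls(y, np.column_stack([one, a0, a1]))    # ignores L(1)
    p_den = cell_mean(a1, a0, l1)                           # P(A(1)=1 | A(0), L(1))
    p_num = cell_mean(a1, a0)                               # P(A(1)=1 | A(0))
    sw = np.where(a1 == 1, p_num / p_den, (1 - p_num) / (1 - p_den))
    ipw = wls(y, np.column_stack([one, a0, a1]), sw)        # marginal model, weighted
    return {"adjusted": adjusted, "unadjusted": unadjusted, "ipw": ipw}

if __name__ == "__main__":
    for name, coef in run().items():
        print(name, "A(0):", round(coef[1], 3), "A(1):", round(coef[2], 3))
```

The true coefficients of A(0) and A(1) are both 0 in this setup; only the weighted fit of the marginal model is expected to be close to 0 for both.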

To alleviate the problem described above, exposure groups in CER studies may be defined based on early therapy exposure only, such as A(0) (i.e., disregarding possible changes in therapies during follow-up: A(1)). Confounding adjustment by baseline covariates (L(0)) using standard modeling approaches or causal methods for point treatment problems (e.g., propensity score[14-17]) can then provide unbiased estimates of the effect of the early therapy exposure (A(0)). Such results are especially useful in studies (e.g., randomized trials) in which changes in therapies are uncommon during follow-up, that is, in which an intention-to-treat (ITT) interpretation of the results is informative. However, even in such studies, inference from standard methods or causal methods for point treatment problems may be biased when loss to follow-up (a.k.a. right-censoring) is differential between exposure groups and is affected by time-dependent risk factors for the outcome that also are affected by early therapy exposure[18-20]. Thus, if changes in therapies are expected to be common or if informative right-censoring is a concern, effect estimates from statistical methods for point treatment problems in studies where exposure groups are defined only by early therapy may be misleading[20]: these estimates may reflect, in part or in full, subsequent differential therapy exposures, or may be biased because of informative right-censoring. In such studies, statistical methods for point treatment problems are inadequate; causal methods for longitudinal data problems[21, 22] that can properly control for differential right-censoring and therapy changes over time are viable alternatives. Here, we report an application of one of these methods: IPW estimation to fit a dynamic MSM.
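The censoring problem can be illustrated in the same way. In the hypothetical simulation below, A(0) is randomized and has no effect on the outcome, but loss to follow-up depends on a post-baseline covariate L(1) that is affected by A(0) and shares an unmeasured cause U with the outcome. A complete-case comparison of uncensored patients is then biased, whereas inverse probability of censoring weighting (IPCW), with weights 1 / P(uncensored | A(0), L(1)) estimated here by cell frequencies, approximately recovers the null effect. All parameters and function names are assumptions chosen for illustration.

```python
import numpy as np

def simulate(n=200_000, seed=7):
    """Hypothetical data with informative right-censoring driven by L(1)."""
    rng = np.random.default_rng(seed)
    a0 = rng.binomial(1, 0.5, n)                            # A(0): randomized
    u = rng.normal(size=n)                                  # U: unmeasured
    l1 = (a0 + u + rng.normal(size=n) > 0.5).astype(int)    # L(1): affected by A(0) and U
    # loss to follow-up is much more likely when L(1) = 1
    uncensored = rng.binomial(1, np.where(l1 == 1, 0.2, 0.95)).astype(bool)
    y = u + rng.normal(size=n)                              # outcome: no effect of A(0)
    return a0, l1, uncensored, y

def mean_diff(y, a0, w):
    """Weighted mean outcome under A(0) = 1 minus under A(0) = 0."""
    m1 = np.sum(w * y * (a0 == 1)) / np.sum(w * (a0 == 1))
    m0 = np.sum(w * y * (a0 == 0)) / np.sum(w * (a0 == 0))
    return m1 - m0

def run():
    a0, l1, unc, y = simulate()
    # complete-case analysis: uncensored patients only, unweighted
    cc = mean_diff(y[unc], a0[unc], np.ones(unc.sum()))
    # IPCW: estimate P(uncensored | A(0), L(1)) within each cell
    p_unc = np.empty(len(y))
    for a in (0, 1):
        for l in (0, 1):
            cell = (a0 == a) & (l1 == l)
            p_unc[cell] = unc[cell].mean()
    w = 1.0 / p_unc[unc]
    ipcw = mean_diff(y[unc], a0[unc], w)
    return {"complete_case": cc, "ipcw": ipcw}
```

The true effect of A(0) is 0 here; the complete-case contrast is shifted because L(1) = 1 patients, who have higher U, are preferentially lost, and more so in one arm than the other.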

A DYNAMIC MARGINAL STRUCTURAL MODEL APPROACH

Data structure

The observed data on any given patient in this study (the diabetes mellitus (DM) study) consist of the measurements collected on exposure, outcome, and confounding variables over time until the end of the patient's follow-up. The time when the patient's follow-up ends is denoted by T̃ and is defined as the earliest of the time to failure (i.e., albuminuria development/progression), denoted by T, and the time to a right-censoring event (i.e., end of study, death, or disenrollment from the health plan), denoted by C. The indicator that the follow-up time T̃ is equal to the failure time T is denoted by Δ = I(T̃ = T). At each time point t = 0, …, T̃, the patient's exposure to an intensified DM treatment is represented by the binary variable A_{1}(t), and the patient's right-censoring status is denoted by the indicator variable A_{2}(t) = I(C ≤ t). The combination A(t) = (A_{1}(t), A_{2}(t)) is referred to as the action at time t. At each time point t = 0, …, T̃, covariates (e.g., A1c measurements) are denoted by the multi-dimensional variable L(t) and defined as measurements that occur before A(t) or are otherwise assumed not to be affected by the actions at time t or thereafter, (A(t), A(t + 1), …). For each time point t = 0, …, T̃ + 1, the outcome (i.e., the indicator of past failure) is denoted by Y(t) = I(T ≤ t − 1) and is an element of the covariates at time t, L(t). By definition, the outcome is thus 0 for t = 0, …, T̃; at t = T̃ + 1, it is missing if Δ = 0 and equal to 1 if Δ = 1. To simplify notation, we use overbars to denote covariate and exposure histories; for example, a patient's exposure history through time t is denoted by Ā(t) = (A(0), …, A(t)). Following the MSM framework[23], we approached the observed data from the DM study as realizations of n independent and identically distributed copies of O = (T̃, Δ, L̄(T̃), Ā(T̃), ΔY(T̃ + 1)), where n represents the sample size.
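As a minimal sketch of this data structure, the hypothetical function below builds the person-period records for one patient from a right-censoring time C and a failure time T, both in analytic time units. Covariates L(t) and treatment A_{1}(t) are omitted for brevity, and resolving ties between T and C in favor of failure is our assumption, not part of the study definitions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class PersonPeriod:
    t: int                 # time index (one analytic interval)
    a2: int                # A_2(t) = I(C <= t): right-censoring indicator
    y_next: Optional[int]  # Y(t + 1) = I(T <= t); None when unobserved

def person_period(C: int, T: float = math.inf) -> Tuple[int, int, List[PersonPeriod]]:
    """Return (T_tilde, Delta, rows) for one patient.

    C: right-censoring time; T: failure time (math.inf if failure never occurs).
    Ties T == C are treated as observed failures (assumption).
    """
    t_tilde = int(min(T, C))        # end of follow-up
    delta = int(t_tilde == T)       # Delta = I(T_tilde = T)
    rows = []
    for t in range(t_tilde + 1):
        if t < t_tilde:
            y_next = 0                     # Y(t + 1) = 0 before the end of follow-up
        else:
            y_next = 1 if delta else None  # observed failure, or missing if censored
        rows.append(PersonPeriod(t, int(C <= t), y_next))
    return t_tilde, delta, rows
```

For a patient failing at T = 3 before censoring at C = 10, the outcomes Y(1), …, Y(4) are 0, 0, 0, 1; for a patient censored at C = 2 without failure, Y(3) is missing.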

Effect definition and inverse probability weighting estimation

Informally, a causal effect may be defined by analogy with an ideal randomized experiment, that is, a trial with perfect compliance and no right-censoring. In a longitudinal study with a time-to-event outcome, a causal effect may be defined through the description of a trial in which the treatments experienced by each patient over time in each arm are specified a priori. For example, in arm 1, a patient's DM treatment is intensified at study entry and remains intensified thereafter; in arm 2, a patient's DM treatment is intensified after 6 months and remains intensified thereafter; and in arm 3, TI is never initiated. In the previous example, the levels of aggressiveness of the TI strategies from arm 1 to 3 are decreasing, and contrasts between the distributions of outcomes in each arm thus provide measures of the impact of intensive versus less intensive glycemic control strategies. The treatment interventions in the ideal trial above are referred to as static because they result from a decision at study entry that is not modified based on the patient's baseline condition or in response to the patient's changing condition over time.

Contrasts between the survival curves of each arm of such a trial define a class of causal effects represented by MSM. In the MSM framework, causal effects are formally defined as contrasts between distributions of potential outcomes in a conceptual experiment (Appendix B). MSM are models for the distribution of such potential outcomes[23] and may be fitted using IPW estimation[23-25] under explicit assumptions (no unmeasured confounders, positivity, and consistent estimation of the denominator of the IPW weights) as described in Appendix C.

Although causal effects defined by static treatment interventions are routinely studied, clinical practice often involves treatment decisions that are continuously adjusted to the patient's evolving clinical course (e.g., response to previously experienced treatments) and are not set a priori at baseline. Thus, it often may be less clinically relevant, or even less pragmatic, to compare the health effect of static treatment interventions than to compare the effectiveness of competing medical guidelines, that is, adaptive treatment strategies that map the patient's unfolding clinical course to subsequent treatment interventions. Following such treatment strategies leads to treatment interventions over time, which are referred to as dynamic, because the treatment is adjusted based on the patient's changing circumstances.

Informally, alternate causal effects thus may be defined by analogy with an ideal experiment in which patients are randomized to adaptive treatment strategies that are informed by clinical practice. Contrasts of the survival curves between each arm of such a trial define a class of causal effects represented by dynamic MSM.[26-28] In the DM study, relevant treatment strategies are policies for TI initiation based on the patient's latest A1c level of the type: “patient should initiate TI at the first time her A1c level reaches or drifts above θ% and should remain on the intensified therapy thereafter.” Four A1c thresholds θ are of interest, and each defines a different TI strategy denoted by d_{θ} with θ ∈ Θ = {7, 7.5, 8, 8.5}, where d_{θ} represents a mapping from past observed covariates L̄(t) to a sequence of treatment interventions up to time t. The intensity of glycemic control according to strategy d_{θ} decreases as the A1c threshold θ triggering TI increases. The definition of the effects of such TI strategies can be formalized as contrasts between distributions of potential outcomes (Appendix D).
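The rule d_{θ}, and the notion of a patient ‘following’ it, can be sketched as below. The function name and inputs are hypothetical; we assume one latest (carried-forward) A1c value and one binary TI indicator A_{1}(t) per time point, with the A1c at time t available before the treatment decision at time t, as implied by the time ordering of L(t) and A(t).

```python
from typing import List, Sequence

def follows_strategy(a1c: Sequence[float], treated: Sequence[int], theta: float) -> List[bool]:
    """For each time t, flag whether treatment through t is concordant with d_theta.

    d_theta: initiate TI at the first time the latest A1c is >= theta,
    then remain on TI at every later time point.
    """
    flags = []
    triggered = False    # has the A1c threshold been reached at or before t?
    concordant = True    # has every decision so far matched d_theta?
    for level, ti in zip(a1c, treated):
        triggered = triggered or level >= theta
        concordant = concordant and (ti == int(triggered))
        flags.append(concordant)
    return flags
```

For example, a patient with latest A1c values 7.4, 7.8, 8.1, and 8.3 who initiated TI in the third interval follows d_{8} throughout, d_{8.5} only in the first two intervals, d_{7.5} only in the first, and d_{7} never (all patients enter with A1c above 7%, so d_{7} requires immediate TI).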

Inverse probability weighting estimation has been extended to fit dynamic MSM.[27, 28] In the case of logistic dynamic MSM for the discrete-time hazards, IPW estimation may be implemented through a standard weighted logistic regression, in which a patient contributes several observations to each risk set if her treatment history is concordant with treatment decisions based on several different strategies d_{θ}. For instance, a patient whose treatment was not intensified and whose A1c level never reached or drifted above 8% between study entry and time t is ‘following’ strategies d_{8} and d_{8.5} up to and at time t. The outcome at time t + 1 of a patient at risk of failure is replicated for each strategy d_{θ} followed by that patient up to and at time t. Each of these outcome replicates is associated with one of the strategies followed and is weighted with a stabilized weight, defined as a ratio of two products over time,

where (i) the numerator is the product over time of a patient's probabilities of remaining uncensored and experiencing a treatment concordant with treatment strategy d_{θ} at each time point, given that she did not experience failure previously and that her treatment history also was concordant with treatment decisions made according to d_{θ}; and (ii) the denominator is the product over time of a patient's probabilities of remaining uncensored and experiencing the treatment she experienced at each time point, given her history of treatments and covariates. Valid inference based on IPW estimation to fit a dynamic MSM relies on the same three aforementioned assumptions, with the exception that the positivity assumption may be weakened[29, 30] through the choice of dynamic interventions that are realistic, for example, treatment strategies aligned with clinical practice. Indeed, the positivity assumption for dynamic MSM requires that any patient may follow any of the treatment strategies of interest at any point in time, regardless of her covariate history.
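The replication step can be sketched as follows. The hypothetical function `expand_replicates` takes, for one patient, the sequence of observed outcomes Y(t + 1) (None once the outcome is unobserved because of right-censoring) and, for each threshold θ, flags indicating whether the patient's history through t is concordant with d_{θ}. Each output row carries θ so that the MSM can include it as a covariate; the stabilized weights described above would be attached to these rows in a separate step. The example flags are made up for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

def expand_replicates(
    patient_id: int,
    y_next: Sequence[Optional[int]],
    follows: Dict[float, Sequence[bool]],
) -> List[Tuple[int, int, float, int]]:
    """Return (patient_id, t, theta, Y(t + 1)) rows, one per strategy followed at t."""
    rows = []
    for t, y in enumerate(y_next):
        if y is None:
            continue  # outcome unobserved (right-censored): no risk-set contribution
        for theta in sorted(follows):
            if follows[theta][t]:
                # the same outcome is replicated once per strategy followed up to t
                rows.append((patient_id, t, theta, y))
    return rows

# Hypothetical patient failing in the fourth interval.
follows = {
    7: [False, False, False, False],
    7.5: [True, False, False, False],
    8: [True, True, True, True],
    8.5: [True, True, False, False],
}
rows = expand_replicates(1, [0, 0, 0, 1], follows)
```

Here the patient contributes three rows to the first risk set (d_{7.5}, d_{8}, d_{8.5}), two to the second, and one to each of the last two, seven rows in total.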

Given that we expected near violation of the positivity assumption for static TI interventions, and given the clinical relevance of the dynamic treatment interventions described above, we adopted a dynamic MSM approach as described below to compare the effectiveness and safety of TI strategies that mimic current clinical practice.

Application to the diabetes mellitus study

For patients with normoalbuminuria at study entry, that is, a microalbumin-to-creatinine ratio (ACR) < 30, we defined failure as an ACR measurement indicating either microalbuminuria (ACR 30–300) or macroalbuminuria (ACR > 300). For patients with microalbuminuria at study entry, we defined failure as an ACR measurement indicating macroalbuminuria. We excluded patients whose baseline ACR measurement was missing (5884) or indicated macroalbuminuria (1608), which yielded a sample size of n = 51 179.

Given that we did not expect patients' glycemia to be monitored by an A1c test more than once every 90 days, the analytic unit of time we chose for analyses is a 90-day interval. Appendix E describes how EHR data collected on a daily scale were mapped into the data structure O described in subsection 1. With these restructured data, we determined which patients were exposed to TI in accordance with each of the four treatment strategies of interest d_{θ} at each follow-up time (Figure 2). Of 12 085 observed failures, about 72% occurred in patients who were following at least one of these strategies at the time of failure.
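A minimal sketch of the mapping from daily records to 90-day intervals is shown below. Keeping the latest measurement within each interval is our assumption for illustration, and the function names are hypothetical; the rules actually used are those described in Appendix E. Intervals are indexed from the cohort entry date, with interval k covering days 90k to 90k + 89.

```python
from datetime import date
from typing import Dict, Iterable, Tuple

def interval_index(event_date: date, entry_date: date, width: int = 90) -> int:
    """Index k of the analytic interval (days width*k .. width*k + width - 1 after entry)."""
    days = (event_date - entry_date).days
    if days < 0:
        raise ValueError("event precedes cohort entry")
    return days // width

def last_value_per_interval(
    measurements: Iterable[Tuple[date, float]], entry_date: date, width: int = 90
) -> Dict[int, float]:
    """Keep the latest measurement recorded in each interval (illustrative rule)."""
    out: Dict[int, float] = {}
    for when, value in sorted(measurements):                  # chronological order
        out[interval_index(when, entry_date, width)] = value  # later values overwrite
    return out
```

With an entry date of 1 January 2005, an A1c recorded on 1 May 2005 (day 120) falls in interval 1 and one recorded on 31 December 2005 (day 364) in interval 4; intervals without any measurement are simply absent from the result.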

We computed stabilized IPW weights for each replicate of the outcomes in all risk sets as described in Appendix F based on the restructured data. The 99% and 99.9% quantiles of the distribution of these weights were 8.06 and 16.01, respectively. We truncated the weights at 20.[31, 32]
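The weight construction can be sketched as follows, assuming that fitted models (hypothetical here; the study models are described in Appendix F) have produced, at each time point, the probability of TI and of remaining uncensored under both the numerator and the denominator models. For a replicate associated with d_{θ}, the observed treatment is concordant with d_{θ}, so both products can be evaluated at the observed actions. The function names are ours, and the cap of 20 mirrors the truncation described above.

```python
import numpy as np

def prob_of_observed_action(p_ti, treated, p_uncensored):
    """P(A(t) = observed action): P(observed TI status) x P(remaining uncensored)."""
    p_ti = np.asarray(p_ti, float)
    treated = np.asarray(treated)
    return np.asarray(p_uncensored, float) * np.where(treated == 1, p_ti, 1.0 - p_ti)

def stabilized_weights(p_num, p_den):
    """Weight at each time t: product over j <= t of numerator / denominator probabilities."""
    p_num = np.asarray(p_num, float)
    p_den = np.asarray(p_den, float)
    if np.any(p_den <= 0):
        raise ValueError("denominator probabilities must be positive (positivity)")
    return np.cumprod(p_num / p_den)

def weight_quantiles(weights, qs=(0.99, 0.999)):
    """Upper quantiles used to inspect the weight distribution."""
    return np.quantile(np.asarray(weights, float), qs)

def truncate(weights, cap=20.0):
    """Truncate (cap) weights at a fixed value."""
    return np.minimum(np.asarray(weights, float), cap)

# Hypothetical fitted probabilities for one replicate over three intervals.
treated = [0, 0, 1]
p_den = prob_of_observed_action([0.2, 0.3, 0.6], treated, [0.99, 0.98, 0.97])
p_num = prob_of_observed_action([0.25, 0.25, 0.5], treated, [0.99, 0.98, 0.97])
w = truncate(stabilized_weights(p_num, p_den))
```

In this toy example the censoring probabilities are identical in both models and cancel, so the weights reflect only the TI models.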

We chose a flexible parameterization of the dynamic, logistic MSM for the discrete-time hazards P(Y_{d}(t + 1) = 1|Y_{d}(t) = 0) to avoid reliance on possibly incorrect modeling assumptions, including the proportionality assumption (i.e., constant hazard ratio or HR over time):

where I denotes indicator variables and Y_{d}(t) denotes the potential outcome at time t corresponding to treatment strategy d_{θ} (Appendix D). This model assumes a piecewise constant HR (each time “piece” increasing in length over time) and does not impose artificial constraints (e.g., linearity) on the change in failure risk as a function of the A1c threshold θ. Once this dynamic MSM was fitted with IPW estimation, we derived point estimates of the potential survival curves P(T_{d_{θ}} > t) for the four A1c thresholds of interest (equality (4) in Appendix D). These survival curves (Figure 3) were compared with “crude” estimates of the survival curves under each strategy, shown in the same figure and estimated using a standard logistic model for the discrete-time hazards with the same parameterization as the MSM but fitted without weights.
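The two computational steps in this paragraph, fitting a weighted logistic model for the discrete-time hazards and converting hazards into survival, can be sketched as follows. This is an illustration under stated assumptions, not the study's model: the design matrix is a hypothetical intercept plus a single strategy indicator rather than the piecewise parameterization above, and the weights are random stand-ins. The fit uses Newton-Raphson (iteratively reweighted least squares) on the weighted Bernoulli log-likelihood, and survival uses P(T > t) = ∏_{j=0}^{t} (1 − h(j)), where h(j) = P(Y(j + 1) = 1 | Y(j) = 0).

```python
import numpy as np

def fit_weighted_logistic(X, y, w, max_iter=50, tol=1e-10):
    """Maximize the weighted Bernoulli log-likelihood by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - p))                          # weighted score
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])      # weighted information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def survival_from_hazards(hazards):
    """Element t is P(T > t) = prod_{j=0}^{t} (1 - h(j))."""
    return np.cumprod(1.0 - np.asarray(hazards, float))

def demo(n=100_000, seed=3):
    """Hypothetical replicate rows for two strategies with known constant hazards."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(1, 0.5, n)              # 1 if the replicate belongs to strategy B
    y = rng.binomial(1, 0.05 + 0.03 * g)     # true hazards: 0.05 (A) and 0.08 (B)
    w = rng.uniform(0.5, 2.0, n)             # stand-in weights, independent of y here
    X = np.column_stack([np.ones(n), g])
    beta = fit_weighted_logistic(X, y, w)
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * np.array([0.0, 1.0]))))
```

With 90-day intervals, a four-year cumulative risk under a strategy is approximately one minus the last element of `survival_from_hazards` applied to the first 16 interval hazards; risk differences and risk ratios between strategies then follow directly.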

Using the delta method[33] and the influence curve of the IPW estimator of the coefficients β from the logistic, dynamic MSM, we derived confidence intervals and p-values analytically for the cumulative risk differences and risk ratios at 4 years. We also derived inferences based on 1000 bootstrap samples. Estimates of the contrasts of the survival curves at a four-year cross section are presented in Tables 1 and 2.
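Once a standard error has been obtained (for example, from the influence curve through the delta method), the analytic CI and p-value follow from a normal approximation; the bootstrap instead resamples the data. The sketch below shows both under stated assumptions. `wald_inference` reproduces, for instance, the influence-curve CI of the risk difference between d_{8.5} and d_{7} in Table 1 from its point estimate and SE. `patient_bootstrap` is a generic, hypothetical helper in which, in an actual analysis, `statistic` would rerun the whole weighting and MSM-fitting pipeline on the resampled data; resampling whole patients rather than individual rows is our assumption, motivated by the correlation between replicates from the same patient.

```python
import math
import numpy as np

def wald_inference(estimate, se, null=0.0, z=1.959964):
    """Normal-approximation 95% CI and two-sided p-value against `null`."""
    lo, hi = estimate - z * se, estimate + z * se
    p = math.erfc(abs(estimate - null) / se / math.sqrt(2.0))
    return (lo, hi), p

def patient_bootstrap(patients, statistic, n_boot=1000, seed=0):
    """Bootstrap SE and percentile 95% CI, resampling whole patients with replacement.

    patients: one entry per patient (e.g., all of that patient's replicate rows).
    statistic: function mapping a list of patient entries to a float.
    """
    rng = np.random.default_rng(seed)
    n = len(patients)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = statistic([patients[i] for i in idx])
    lo, hi = np.quantile(draws, [0.025, 0.975])
    return draws.std(ddof=1), (lo, hi)
```

For a risk ratio, the null value is 1 rather than 0, for example `wald_inference(1.192, 8.16e-02, null=1.0)`.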

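Table 1. Stabilized, truncated IPW estimates of the (cumulative) risk differences for albuminuria development/progression over 4 years

Risk difference (row minus column)

| Row | Column | Inference | Risk difference (95%CI) | SE | p |
|---|---|---|---|---|---|
| A1c target = 8.5% (d_{8.5}) | A1c target = 7% (d_{7}) | Influence curve | 4.61e-02 (1.32e-02; 7.9e-02) | 1.68e-02 | 1e-02 |
| A1c target = 8.5% (d_{8.5}) | A1c target = 7% (d_{7}) | Bootstrap | 4.61e-02 (1.01e-02; 7.59e-02) | 1.64e-02 | 9e-03 |
| A1c target = 8.5% (d_{8.5}) | A1c target = 7.5% (d_{7.5}) | Influence curve | 2.27e-02 (2e-04; 4.51e-02) | 1.15e-02 | 5e-02 |
| A1c target = 8.5% (d_{8.5}) | A1c target = 7.5% (d_{7.5}) | Bootstrap | 2.27e-02 (1.5e-03; 4.32e-02) | 1.08e-02 | 3.2e-02 |
| A1c target = 8.5% (d_{8.5}) | A1c target = 8% (d_{8}) | Influence curve | 1.62e-02 (1.3e-03; 3.11e-02) | 7.6e-03 | 3e-02 |
| A1c target = 8.5% (d_{8.5}) | A1c target = 8% (d_{8}) | Bootstrap | 1.62e-02 (6e-04; 3.01e-02) | 7.6e-03 | 3.2e-02 |
| A1c target = 8% (d_{8}) | A1c target = 7% (d_{7}) | Influence curve | 2.99e-02 (−2.6e-03; 6.24e-02) | 1.66e-02 | 7e-02 |
| A1c target = 8% (d_{8}) | A1c target = 7% (d_{7}) | Bootstrap | 2.99e-02 (−4.3e-03; 6e-02) | 1.65e-02 | 6.5e-02 |
| A1c target = 8% (d_{8}) | A1c target = 7.5% (d_{7.5}) | Influence curve | 6.5e-03 (−1.33e-02; 2.62e-02) | 1.01e-02 | 0.52 |
| A1c target = 8% (d_{8}) | A1c target = 7.5% (d_{7.5}) | Bootstrap | 6.5e-03 (−1.22e-02; 2.45e-02) | 9.5e-03 | 0.483 |
| A1c target = 7.5% (d_{7.5}) | A1c target = 7% (d_{7}) | Influence curve | 2.34e-02 (−7.7e-03; 5.46e-02) | 1.59e-02 | 0.14 |
| A1c target = 7.5% (d_{7.5}) | A1c target = 7% (d_{7}) | Bootstrap | 2.34e-02 (−9.2e-03; 5.17e-02) | 1.56e-02 | 0.147 |

The risk contrasts were derived from a logistic, dynamic MSM for the discrete-time hazards. Inference was derived based on both the influence curve of the IPW estimator and 1000 bootstrap samples (Inference column). ‘SE’ and ‘p’ stand for standard error and p-value, respectively. The 95%CIs are provided in parentheses next to the point estimates. The SE estimates based on the bootstrap and influence curve approaches are similar and lead to the same conclusions.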

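Table 2. Stabilized, truncated IPW estimates of the (cumulative) risk ratios for albuminuria progression/development over 4 years

Risk ratio (row over column)

| Row | Column | Inference | Risk ratio (95%CI) | SE | p |
|---|---|---|---|---|---|
| A1c target = 8.5% (d_{8.5}) | A1c target = 7% (d_{7}) | Influence curve | 1.192 (1.0321; 1.3518) | 8.16e-02 | 2e-02 |
| A1c target = 8.5% (d_{8.5}) | A1c target = 7% (d_{7}) | Bootstrap | 1.192 (1.037; 1.354) | 7.93e-02 | 1.7e-02 |
| A1c target = 8.5% (d_{8.5}) | A1c target = 7.5% (d_{7.5}) | Influence curve | 1.086 (0.9951; 1.1768) | 4.63e-02 | 6e-02 |
| A1c target = 8.5% (d_{8.5}) | A1c target = 7.5% (d_{7.5}) | Bootstrap | 1.086 (1.0052; 1.1736) | 4.37e-02 | 4.3e-02 |
| A1c target = 8.5% (d_{8.5}) | A1c target = 8% (d_{8}) | Influence curve | 1.06 (1.0029; 1.1171) | 2.92e-02 | 4e-02 |
| A1c target = 8.5% (d_{8.5}) | A1c target = 8% (d_{8}) | Bootstrap | 1.06 (1.002; 1.1161) | 2.92e-02 | 3.8e-02 |
| A1c target = 8% (d_{8}) | A1c target = 7% (d_{7}) | Influence curve | 1.1245 (0.9747; 1.2743) | 7.64e-02 | 1e-01 |
| A1c target = 8% (d_{8}) | A1c target = 7% (d_{7}) | Bootstrap | 1.1245 (0.9841; 1.2746) | 7.56e-02 | 9e-02 |
| A1c target = 8% (d_{8}) | A1c target = 7.5% (d_{7.5}) | Influence curve | 1.0245 (0.9481; 1.1009) | 3.9e-02 | 0.53 |
| A1c target = 8% (d_{8}) | A1c target = 7.5% (d_{7.5}) | Bootstrap | 1.0245 (0.9562; 1.0999) | 3.66e-02 | 0.492 |
| A1c target = 7.5% (d_{7.5}) | A1c target = 7% (d_{7}) | Influence curve | 1.0976 (0.958; 1.2372) | 7.12e-02 | 0.17 |
| A1c target = 7.5% (d_{7.5}) | A1c target = 7% (d_{7}) | Bootstrap | 1.0976 (0.9655; 1.2379) | 6.96e-02 | 0.166 |

The risk contrasts were derived from a logistic, dynamic MSM for the discrete-time hazards. Inference was derived based on both the influence curve of the IPW estimator and 1000 bootstrap samples (Inference column). ‘SE’ and ‘p’ stand for standard error and p-value, respectively. The 95%CIs are provided in parentheses next to the point estimates. The SE estimates based on the bootstrap and influence curve approaches are similar and lead to the same conclusions.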

DISCUSSION

In this paper, we motivated and illustrated the application of dynamic MSM in observational CER and demonstrated the feasibility of this approach in a study based on large healthcare databases. We found that when patients already on two or more oral glucose-lowering medications or on insulin were promptly intensified (e.g., when A1c levels rose to levels greater than 7%), onset or progression of albuminuria was reduced compared with patients treated only when they reached higher A1c thresholds (e.g., 7.5%, 8.0%, and 8.5%). Note that, although the advantage of TI initiation rules d_{7}, d_{7.5}, and d_{8} over rule d_{8.5} is significant at the 0.05 level, the precision of the other effect estimates (risk differences or ratios) does not allow ruling out the null hypotheses at the 0.05 level (Tables 1 and 2). Note also that, although the point estimates of a naive analysis (plot on the left in Figure 3) suggest a beneficial effect of early TI initiation based on rule d_{7} compared to delayed TI initiation based on all other rules, such beneficial effects of prompt TI initiation are not as clear when comparing rules d_{7.5}, d_{8}, and d_{8.5} only (the three curves are nearly indistinguishable at 4 years). However, the MSM point estimates (plot on the right in Figure 3) clearly indicate an early separation and consistent ordering of the four survival curves, suggesting an increasing beneficial effect on onset or progression of albuminuria as the A1c threshold triggering treatment intensification decreases. Sensitivity analyses (Appendix G) did not alter the substantive finding of a beneficial effect of prompt TI on onset or progression of albuminuria. 
Findings from the MSM analyses are quite consistent with the recent ACCORD and ADVANCE randomized clinical trials, which found that maintaining A1c values < 7% (compared with > 7%) reduced onset and progression of albuminuria in patients who were clinically similar to those included in this study.[34, 35] In the ADVANCE trial, the more intensive therapy arm aimed to reach an A1c level < 6.5% and achieved a mean A1c level of 6.5%, compared with a mean level of 7.3% in the control arm. This difference was associated with a significant decrease in risk for development or progression of albumin excretion (HR: 0.79, 0.66–0.93). In the ACCORD trial, the more intensive arm aimed for an A1c of < 6% and achieved a mean A1c of 6.4% (versus 7.5% in controls). Onset and progression of microalbuminuria were reduced with intensive therapy.

The validity of the effect estimates relies on correct specification of the parametric models used to estimate the denominator of the IPW weights. Separate models were fitted for predicting TI initiation and TI continuation to allow covariates to affect the probability of TI exposure differently depending on whether TI had been previously experienced. For a patient to follow strategy d_{7}, she had to experience TI in the first quarter of follow-up, because patients entered the study with an A1c level ≥ 7% by design. To better capture confounding by baseline covariates, a separate model for TI initiation in the first period was thus fitted, whereas a pooled model was fitted to predict TI initiation at all subsequent quarters. The same concern over the specification of a single model for predicting right-censoring motivated the use of different models to separately predict right-censoring due to death and due to disenrollment from the health plan. The distribution of the inverse probability weights did not raise concerns about finite-sample bias due to near violations of the positivity assumption, and results from untruncated IPW estimates were virtually identical. The no unmeasured confounders (NUC) assumption cannot be tested with data alone; our analysis thus relies on the assumption that all risk factors for the outcome that also affect censoring and the decisions to initiate and to continue a TI regimen are included in the observed covariate process. MSM results may be biased if the assumptions above do not hold.

In this analysis, we also restructured the data to approximate an MSM framework in which monitoring is assumed to be non-random. Our approach thus does not reflect that albuminuria development/progression was only partially observed, depending on random clinical monitoring: for a patient experiencing albuminuria progression, the failure time is only known to lie between the times of two actual clinic visits at which ACR was measured, and the exact 90-day interval in which failure occurred may thus be unknown if ACR was not measured in each 90-day interval of follow-up. Application of an MSM approach for interval-censored data structures[21, 36] will be of interest in future work to address this limitation.

Concerns may be raised over the validity of the NUC assumption in studies based on healthcare databases and, thus, over the applicability of MSM approaches to address CER questions in such settings. Indeed, patient monitoring is not controlled by investigators in such studies, which can result in missing or incomplete information about risk factors for the outcome between clinic visits. Given that the decision to initiate treatment in practice is typically based on medical events known to physicians during clinic visits, the NUC assumption should approximately hold for investigating ITT interventions even when changes in patients' A1c are not well captured by the healthcare system, as long as the information used by physicians to make a decision to initiate treatment is captured by the healthcare database. The investigation of such effects defined by ITT interventions in longitudinal studies might be favored because variables affecting both treatment discontinuation and the outcome may be “less well measured in most observational studies”,[37] although such an approach suffers from the same limitation described for point treatment studies in section 3 when frequent therapy changes are a concern. In a sensitivity analysis (Appendix G), we evaluated legacy effects defined by TI initiation strategies that are hybrids between ITT and non-ITT strategies. Another concern may be raised over infrequent monitoring of both albuminuria and A1c, which has a direct impact on the classification of both the patient's exposure and outcome. To mimic inference from a randomized experiment in which patients' A1c and albuminuria are monitored at least once every 6 months and once every year, respectively, we implemented a sensitivity analysis (Appendix G) in which a patient's follow-up data were artificially right-censored for insufficient monitoring of A1c or albuminuria, and IPW estimation was then extended to incorporate these two new sources of possible informative censoring.
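As a hypothetical operationalization of this artificial censoring on the 90-day time scale, the sketch below censors a patient's follow-up at the first interval that completes two consecutive intervals (about 6 months) without an A1c measurement or four consecutive intervals (about 1 year) without an ACR measurement. The gap definitions and the function name are assumptions; the censoring indicators produced this way would then need their own censoring models in the IPW weights, as described in this paragraph.

```python
from typing import Optional, Sequence

def monitoring_censor_time(
    a1c_measured: Sequence[int],
    acr_measured: Sequence[int],
    max_a1c_gap: int = 2,   # consecutive 90-day intervals without A1c (~6 months)
    max_acr_gap: int = 4,   # consecutive 90-day intervals without ACR (~1 year)
) -> Optional[int]:
    """Return the interval at which follow-up is artificially censored, or None."""
    gap_a1c = gap_acr = 0
    for t, (m_a1c, m_acr) in enumerate(zip(a1c_measured, acr_measured)):
        gap_a1c = 0 if m_a1c else gap_a1c + 1   # run of intervals without an A1c test
        gap_acr = 0 if m_acr else gap_acr + 1   # run of intervals without an ACR test
        if gap_a1c >= max_a1c_gap or gap_acr >= max_acr_gap:
            return t
    return None
```

For example, a patient with A1c measured in intervals 0, 2, and 5 only is censored at interval 4, the second consecutive interval without an A1c test.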

It is not feasible to conduct randomized trials to resolve many important CER questions. Here, we demonstrate that MSM analyses of observational data that attempt to mimic such randomized trials can provide a viable analytic alternative to inform clinical problems for which standard statistical tools are known to be inadequate.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

KEY POINT

Marginal structural modeling with IPW estimation is a viable alternative to inadequate standard modeling approaches in many CER problems where time-dependent confounding and informative loss to follow-up are expected. Dynamic MSM are particularly relevant to address many pragmatic questions in observational CER.

ACKNOWLEDGEMENTS

The authors thank Mark J. van der for helpful discussions, in particular regarding the derivation of analytic variance estimates. This project was funded under Contract No. HHSA290-2005-0033I from the Agency for Healthcare Research and Quality, US Department of Health and Human Services, as part of the Developing Evidence to Inform Decisions about Effectiveness (DEcIDE) program. The authors of this report are responsible for its content. Statements in this report should not be construed as endorsement by the Agency for Healthcare Research and Quality or the US Department of Health and Human Services.

APPENDIX A

STUDY ELIGIBILITY CRITERIA

The cohort of adults with T2DM was assembled using the EHR from patients of the following seven sites of the HMORN: Group Health, HealthPartners, Kaiser Permanente (KP) Colorado (KPCO), KP Hawaii (KPHI), KP Northwest (KPNW), KP Northern California (KPNC), and KP Southern California (KPSC). The Institutional Review Board (IRB) at KPNC approved this study. The IRBs at the other participating institutions ceded IRB review authority to the KPNC IRB.

We searched the entire adult membership of these participating HMORN health plans for enrollees meeting the eligibility criteria below and enrolled each at the earliest date between 1 January 2001 and 30 June 2009 on which all criteria were met: (i) age ≥ 18 years; (ii) currently taking two or more classes of oral medications and/or basal (long-acting) insulin; (iii) at least one A1c value < 7% while on the above regimen; (iv) at least one subsequent elevated A1c (7% < A1c < 8.5%), with cohort entry on the date of the first such elevated A1c; (v) ≥ 2 years of continuous health plan enrollment before cohort entry; (vi) pharmacy benefit throughout follow-up; (vii) no inpatient or outpatient diagnosis, in the 2 years before cohort entry, of active cancer other than non-melanoma skin cancer, end-stage renal disease or chronic kidney disease (ICD-9 code or two estimated glomerular filtration rates (eGFR) < 30 mL/min/1.73 m²), hepatic failure, or dementia, and no hospitalization for congestive heart failure in that period; (viii) no short-acting insulin dispensed in the year before cohort entry; and (ix) no evidence of pregnancy in the 15 months before cohort entry. These criteria identified 58 671 patients. All patients were followed from study entry until the earliest of 31 December 2009 (end of the study), plan disenrollment, or death.
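Criteria (iii) and (iv), which also fix the cohort entry date, can be sketched as a single pass over a patient's A1c history. This is a minimal illustration, not the study's code: the helper name `cohort_entry_date` and its input (a date-sorted list of A1c results already restricted to the qualifying regimen of criterion (ii)) are assumptions, and the remaining criteria (age, enrollment, exclusions, calendar window) are not checked here.

```python
from datetime import date
from typing import Optional, Sequence, Tuple


def cohort_entry_date(a1c_on_regimen: Sequence[Tuple[date, float]]) -> Optional[date]:
    """Return the date of the first elevated A1c (7% < A1c < 8.5%) that follows
    at least one controlled A1c (< 7%), or None if no such value exists.

    Hypothetical helper: `a1c_on_regimen` is assumed to contain only A1c results
    (in %) measured while the patient was on a qualifying regimen, sorted by date.
    """
    seen_controlled = False
    for day, value in a1c_on_regimen:
        if value < 7.0:
            seen_controlled = True                # criterion (iii) satisfied
        elif seen_controlled and 7.0 < value < 8.5:
            return day                            # criterion (iv): cohort entry date
    return None
```

For example, a patient with A1c 6.8% in January 2003 and 7.6% in July 2003 would enter the cohort on the July date.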

APPENDIX B

FORMAL EFFECT DEFINITION BASED ON A CONCEPTUAL EXPERIMENT

Formally, in the MSM framework, causal effects are defined as contrasts between the distributions of potential outcomes in a conceptual experiment. At each time point $t = 0, \ldots, K$ of this fictitious experiment, where $K$ denotes a fixed and arbitrary follow-up time, the patient's covariates are observed under any given action intervention $\bar{a}(K) = (\bar{a}_1(K), \bar{a}_2(K))$ such that $\bar{a}_2(K) = 0$ (i.e., right-censoring never occurs). The patient's potential covariates at time $t$ are denoted by $L_{\bar{a}(K)}(t)$. The time ordering of actions and covariates described in subsection 1 of section 4 implies the equality $\bar{L}_{\bar{a}(K)}(t) = \bar{L}_{\bar{a}(t-1)}(t)$, because actions after $t - 1$ cannot affect covariate levels at time $t$. As in the observed data $O$, the potential covariates at $t$, $L_{\bar{a}(t-1)}(t)$, include the potential outcome denoted by $Y_{\bar{a}(t-1)}(t)$. Under any action intervention $\bar{a}(K)$ in this conceptual experiment, failure may occur during follow-up. The potential failure time is denoted by $T_{\bar{a}_1(K)}$ and is defined by the time when the potential outcome jumps to 1: $\bar{Y}_{\bar{a}(K)}(T_{\bar{a}_1(K)}) = 0$, $Y_{\bar{a}(K)}(T_{\bar{a}_1(K)} + 1) = 1, \ldots, Y_{\bar{a}(K)}(K+1) = 1$. By convention, if failure does not occur during follow-up under action intervention $\bar{a}(K)$, that is, $\bar{Y}_{\bar{a}(K)}(K+1) = 0$, the potential failure time $T_{\bar{a}_1(K)}$ is defined as $\infty$; otherwise, the remainder of the potential covariate process after failure time $T_{\bar{a}_1(K)}$ is defined by the degenerate variables:

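$$L_{\bar{a}(t-1)}(t) = Y_{\bar{a}(K)}\big(T_{\bar{a}_1(K)} + 1\big), \quad \text{where } P\big(L_{\bar{a}(t-1)}(t) = 1\big) = 1, \quad \text{for } t = T_{\bar{a}_1(K)} + 2, \ldots, K + 1.$$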
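The failure-time convention above (the last time at which the outcome is still 0, or infinity if it never jumps) can be made concrete with a small helper. This is an illustrative sketch only; the function name and the list representation of a potential outcome path $Y(0), \ldots, Y(K+1)$ are assumptions.

```python
import math
from typing import Sequence


def potential_failure_time(y: Sequence[int]) -> float:
    """Failure time T from a discrete outcome path y[0], ..., y[K + 1].

    Follows the convention in the text: Y(T) = 0 and Y(T + 1) = 1, with the
    outcome staying at 1 afterwards; T is infinity if the outcome never jumps.
    """
    if not y or y[0] != 0:
        raise ValueError("the outcome path must start at 0 (patient at risk)")
    for t in range(1, len(y)):
        if y[t] == 1:
            # After failure the outcome is degenerate: it must remain 1.
            if any(v != 1 for v in y[t:]):
                raise ValueError("outcome must remain 1 after failure")
            return t - 1
    return math.inf
```

For instance, the path (0, 0, 1, 1) gives T = 1, and a path that stays at 0 gives T = ∞.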

The collection of all potential covariates collected over time during this ideal experiment is referred to as the full data and is denoted by $X = \big(\bar{L}_{\bar{a}(K)}(K+1)\big)_{\bar{a}(K) \in \mathcal{A}(K)}$, where $\mathcal{A}(K)$ represents the collection of treatment and right-censoring interventions of interest: $\mathcal{A}(K) \equiv \{\bar{a}(K) = (\bar{a}_1(K), \bar{a}_2(K)) : \bar{a}_2(K) = 0\}$. The causal effects that were informally defined above by analogy with a randomized trial can now be formally defined as functions of the distribution of these full data, for example, differences or ratios of cumulative risks of the event up to time $t = 0, \ldots, K$:

for $t = 0, \ldots, K$ and any two different action interventions $\bar{a}(K) \in \mathcal{A}(K)$ and $\bar{a}'(K) \in \mathcal{A}(K)$. As implied by the two equalities above, these risk differences and ratios are all functions of the potential survival curves $P(T_{\bar{a}_1(t)} > t)$. A model that represents each potential survival curve is an MSM[23], for example, a logistic model for the corresponding discrete-time hazards $P\big(Y_{\bar{a}(t)}(t+1) = 1 \mid Y_{\bar{a}(t-1)}(t) = 0\big)$:

An MSM may be fitted through IPW estimation[23, 24], which is implemented in practice through a standard weighted regression. In the case of a logistic MSM for the discrete-time hazards, the outcomes of patients at risk of failure are regressed, with weights, on the corresponding treatment histories. A patient's outcome at time t + 1 is weighted with a so-called stabilized weight, defined as

where the denominator of the weight (referred to as the action mechanism) permits adjustment for both confounding and selection bias. The denominator of the weights may be decomposed into two components referred to as the treatment and right-censoring mechanisms:

The numerator of the weights was proposed to improve the precision of the MSM fit[23, 25]. In an observational study, the numerator and denominator of the weights are unknown and must be estimated. Inference may be based on the bootstrap or, alternatively, on a conservative estimate of the variance of the IPW estimator derived analytically from its influence curve[21, 38]. Valid inference based on the IPW estimator of an MSM relies on the following three assumptions.
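The two fitting steps described above, forming cumulative stabilized weights and running a weighted logistic regression, can be sketched as follows. The displays defining the weight are not reproduced in this text, so the sketch assumes the usual form: a cumulative product over intervals of the ratio between a numerator probability of the observed action (conditioning on less history) and the estimated action mechanism. The helper names are hypothetical, and Newton-Raphson is used here only as one standard way to maximize the weighted log-likelihood; any GLM routine that accepts prior weights would serve. The model-based standard errors of such a fit are not valid for IPW; as noted above, inference would rely on the bootstrap or the influence curve.

```python
from itertools import accumulate
import operator

import numpy as np


def stabilized_weights(num_probs, den_probs):
    """sw(t) = prod_{j<=t} num(j) / den(j) for t = 0, 1, ...

    num_probs[j]: numerator probability of the observed action at j.
    den_probs[j]: estimated probability of the observed action at j given the
        full covariate and action history (the action mechanism).
    """
    if len(num_probs) != len(den_probs):
        raise ValueError("probability sequences must have equal length")
    if any(d <= 0 for d in den_probs):
        raise ValueError("denominator probabilities must be positive")
    return list(accumulate((n / d for n, d in zip(num_probs, den_probs)), operator.mul))


def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Weighted logistic regression fitted by Newton-Raphson.

    X: (n, p) MSM design matrix (include a column of ones for the intercept),
    y: (n,) outcomes Y(t + 1) of person-time records at risk at t,
    w: (n,) stabilized weights sw(t). Returns the coefficient vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - p))                        # weighted score
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])    # weighted information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

A quick sanity check: with an intercept-only design, the fitted hazard equals the weighted proportion of events, so outcomes (1, 0, 0, 1) with weights (3, 1, 1, 1) yield a fitted probability of 4/6.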

The sequential randomization assumption (SRA) or sequential deconfounding (SD). The SRA and SD formalize the assumption of no unmeasured confounders in longitudinal studies. The validity of either of these two assumptions ensures that sufficient information is collected for the causal effect of interest to be potentially identifiable from the observed data through the G-computation formula[39]. The SRA is defined by the conditional independencies

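$$Y_{\bar{a}(K)}(K+1) \perp\!\!\!\perp A(t) \mid \bar{L}(t), \bar{A}(t-1) \quad \text{for all } \bar{a}(K) \in \mathcal{A}(K) \text{ and } t = 0, \ldots, K. \qquad (1)$$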

Neither the SRA nor SD is testable with the data alone but may be motivated based on a causal directed acyclic graph (DAG) such as the one in Figure 1. Based on such a DAG, satisfaction of independencies that involve counterfactual variables like the SRA may be evaluated with the twin network method[40]. Alternatively, SD may be evaluated with a DAG based on the simpler sequential back-door criterion[40].

The experimental treatment assignment (ETA) assumption (a.k.a. positivity assumption). It ensures that the conditional distributions of the observed data in the G-computation formula[39] are well defined and, thus, ensures the non-parametric identifiability of the causal effect of interest. In other words, satisfaction of the ETA assumption permits reliable estimation of the causal effect based on actual information in the data instead of extrapolation from limited information based on parametric modeling assumptions. Formally[13], this assumption ensures that any patient may experience any of the treatment interventions of interest at any point in time, regardless of their covariate history:

$$P\big(A(t) = a(t) \mid \bar{L}(t), \bar{A}(t-1) = \bar{a}(t-1)\big) > 0 \quad \text{for } \bar{a}(K) \in \mathcal{A}(K) \text{ and } t = 0, \ldots, K. \qquad (2)$$

In the DM study, this assumption was expected to be practically violated, which could result in both biased and highly variable IPW estimates[41]. Indeed, patients whose A1c levels drifted well above the recommended 7% during follow-up were expected to initiate an intensified treatment rapidly, because TI is clearly indicated for A1c > 8.5%. For such patients, the probability of remaining untreated was thus expected to be near 0, which would result in a practical violation of the ETA assumption above.
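A simple diagnostic for the practical violation just described is to look at how often the estimated probability of remaining untreated falls near 0 and how large the implied weights become. The sketch below is not a procedure from the study: the helper name and the cutoff of 0.01 are arbitrary illustrative choices.

```python
from typing import Sequence, Tuple


def positivity_summary(p_untreated: Sequence[float], cutoff: float = 0.01) -> Tuple[float, float]:
    """Summarize near-violations of positivity for the 'remain untreated' action.

    p_untreated: estimated probabilities of not initiating TI, one per
        person-time record among patients who are still untreated.
    Returns the fraction of records below `cutoff` and the largest implied
    (unstabilized) weight 1 / p among all records.
    """
    if not p_untreated:
        raise ValueError("no records to summarize")
    flagged = sum(1 for p in p_untreated if p < cutoff)
    return flagged / len(p_untreated), max(1.0 / p for p in p_untreated)
```

A record with an estimated probability of 0.005 alone contributes an implied weight of 200, which illustrates how near-violations inflate the variance of IPW estimates.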

Consistent estimation of the action mechanism. In practice, each component of the action mechanism is often estimated with parametric models fitted by maximum likelihood. This assumption then amounts to the correct specification of these parametric models.

APPENDIX D

DYNAMIC TREATMENT INTERVENTIONS AND EFFECT DEFINITION

Formally, causal effects of dynamic treatment interventions are also defined based on the full data: the treatment strategies (a.k.a. individualized action rules) of interest are denoted by $d_\theta$ for $\theta \in \Theta$ and are each defined as a vector function $d_\theta = (d_\theta(0), \ldots, d_\theta(K))$, where each function $d_\theta(t)$, for $t = 0, \ldots, K$, is a decision rule for determining the action (treatment and right-censoring) to be experienced by a patient at time $t$. A decision rule $d_\theta(t)$ maps the action and covariate history measured up to a given time $t$ to an action regimen (i.e., an intervention) at time $t$: $d_\theta(t) : (\bar{L}(t), \bar{A}(t-1)) \mapsto (a_1(t), a_2(t))$. In the DM study, the decision rules of interest are defined such that $d_\theta(t)(\bar{L}(t), \bar{A}(t-1))$ is:

(a_{1}(t), a_{2}(t)) = (0, 0) (i.e., no use of an intensified treatment and no right-censoring) if and only if the patient was not previously treated with an intensified therapy (i.e., Ā(t − 1) = 0) and the A1c level at time t (an element of L(t)) was lower than the threshold θ.

(a_{1}(t), a_{2}(t)) = (1, 0) (i.e., use of an intensified treatment and no right-censoring) otherwise.
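The two-case rule above translates directly into a small function. This is an illustrative sketch with a hypothetical name and signature; because right-censoring is never allowed under the rule, the condition "not previously treated" is checked on the intensified-treatment history $A_1(0), \ldots, A_1(t-1)$ only.

```python
from typing import Sequence, Tuple


def d_theta(theta: float, a1c_t: float, prior_ti: Sequence[int]) -> Tuple[int, int]:
    """Decision rule d_theta(t) of the DM study, returning (a1(t), a2(t)).

    theta: A1c threshold (in %) indexing the rule.
    a1c_t: A1c level at time t (an element of L(t)), in %.
    prior_ti: observed intensified-treatment history A1(0), ..., A1(t - 1).
    """
    if not any(prior_ti) and a1c_t < theta:
        return (0, 0)  # no TI so far and A1c below threshold: no intensification
    return (1, 0)      # otherwise: intensified treatment; never right-censored
```

Note that an A1c exactly equal to θ triggers intensification, because the rule withholds TI only when A1c is strictly lower than θ.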

The potential covariate process that could be observed for a patient in an ideal experiment in which interventions on the action process according to a decision rule $d_\theta$ are carried out through time $K$ is denoted by $\bar{L}_{d_\theta}(K+1)$. This process corresponds to one of the potential covariate processes in the full data: $\bar{L}_{d_\theta}(K+1) = \bar{L}_{\bar{a}(K)}(K+1)$ such that the action regimen $\bar{a}(K) = (\bar{a}_1(K), \bar{a}_2(K))$ corresponds to interventions according to the individualized action rule $d_\theta$, that is, $a(0) = d_\theta(0)(L(0))$, $a(1) = d_\theta(1)(\bar{L}(1), a(0))$, $\ldots$, $a(K) = d_\theta(K)(\bar{L}(K), \bar{a}(K-1))$ (in particular, $a_2(t) = 0$ for all $t = 0, \ldots, K$). Note that such dynamic treatment interventions through time $t = 0, \ldots, K$ according to the adaptive treatment strategy $d_\theta$, that is, $\bar{a}(t)$ above, are functions of the covariate process $\bar{L}(t)$ only and are thus denoted by $d_\theta(\bar{L}(t))$ from now on. As with static interventions, failure may occur during follow-up under any such dynamic treatment intervention $d_\theta(\bar{L}(K))$. Such potential failure times are denoted by $T_{d_\theta}$. Causal effects of adaptive strategies on failure time can then be formally defined as functions of the potential survival curves $P(T_{d_\theta} > t)$, for example, risk differences
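The recursive construction $a(0) = d_\theta(0)(L(0))$, $a(1) = d_\theta(1)(\bar{L}(1), a(0)), \ldots$ can be sketched for the treatment component along a given A1c path. The second helper, which counts how many leading intervals of a patient's observed TI history agree with the rule, is a hypothetical addition illustrating how rule-concordant person-time might be identified in the observed data; neither function is the study's code.

```python
from typing import List, Sequence


def rule_regimen(theta: float, a1c_path: Sequence[float]) -> List[int]:
    """Treatment regimen a1(0), ..., a1(K) prescribed by d_theta along an A1c
    path, built recursively: each decision uses the A1c at t and the regimen
    already assigned at 0, ..., t - 1 (a2(t) = 0 throughout)."""
    regimen: List[int] = []
    for a1c in a1c_path:
        already_intensified = any(regimen)
        regimen.append(0 if (not already_intensified and a1c < theta) else 1)
    return regimen


def concordant_intervals(theta: float, a1c_path: Sequence[float], observed_a1: Sequence[int]) -> int:
    """Number of leading intervals over which the observed TI history matches
    the regimen prescribed by d_theta (hypothetical helper)."""
    n = 0
    for prescribed, observed in zip(rule_regimen(theta, a1c_path), observed_a1):
        if prescribed != observed:
            break
        n += 1
    return n
```

For the A1c path (7.2, 7.4, 7.8, 7.0) and θ = 7.5, the prescribed regimen is (0, 0, 1, 1); a patient who initiated TI at the third interval but stopped at the fourth follows the rule for three intervals.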

for $t = 0, \ldots, K$ and any two different individualized action rules $d_\theta$ and $d_{\theta'}$. A model that represents each potential survival curve $P(T_{d_\theta} > t)$ is a dynamic MSM[26-28], for example, a logistic model for the corresponding discrete-time hazards $P\big(Y_{d_\theta}(t+1) = 1 \mid Y_{d_\theta}(t) = 0\big)$:

$$P(T_{d_\theta} > t) = \prod_{j=0}^{t} \Big(1 - P\big(Y_{d_\theta}(j+1) = 1 \mid Y_{d_\theta}(j) = 0\big)\Big). \qquad (4)$$
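Equation (4) turns the discrete-time hazards of a fitted dynamic MSM into a survival curve, from which cumulative risks $1 - P(T_{d_\theta} > t)$ and their differences between two rules follow. The sketch below takes the hazards as given inputs (for example, predictions from a logistic dynamic MSM); the helper names and the numbers in the example are hypothetical.

```python
from itertools import accumulate
import operator
from typing import List, Sequence


def survival_curve(hazards: Sequence[float]) -> List[float]:
    """P(T > t) for t = 0, ..., K from hazards h(j) = P(Y(j+1) = 1 | Y(j) = 0),
    as in equation (4)."""
    return list(accumulate((1.0 - h for h in hazards), operator.mul))


def risk_difference(hazards_a: Sequence[float], hazards_b: Sequence[float]) -> List[float]:
    """Cumulative risk difference [1 - P(T_a > t)] - [1 - P(T_b > t)] for each t."""
    s_a, s_b = survival_curve(hazards_a), survival_curve(hazards_b)
    return [(1.0 - sa) - (1.0 - sb) for sa, sb in zip(s_a, s_b)]
```

With hazards (0.1, 0.2) under one rule and (0, 0) under another, the survival curve under the first rule is (0.9, 0.72) and the cumulative risk differences are (0.1, 0.28).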

APPENDIX E

DATA STRUCTURING IN THE DM STUDY

Patients' EHR data were mapped into the observed data structure O from subsection 1 in section 4 as follows:

Each patient's follow-up time, measured in days since study entry, was discretized into consecutive 90-day intervals denoted by $t = 0, \ldots, \tilde{T}$. In particular, $t = 0$ represents the first $\tau = 90$ days of a patient's follow-up, $[0, \tau[$, and $\tilde{T}$ is the follow-up time expressed in units of 90 days since study entry. Note also that, unlike each $t = 0, \ldots, \tilde{T} - 1$, which represents a full 90-day monitoring interval $[t\tau, (t+1)\tau[$, the last interval $t = \tilde{T}$ typically represents a shorter monitoring period: the actual follow-up time expressed in days typically falls within the 90-day interval $[\tilde{T}\tau, (\tilde{T}+1)\tau[$.
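The discretization amounts to integer division of follow-up days by τ = 90. A minimal sketch with hypothetical helper names, where day 0 is the day of study entry:

```python
TAU = 90  # interval length in days, as in the text


def interval_index(day: int, tau: int = TAU) -> int:
    """Interval t containing follow-up day `day`, i.e., day in [t*tau, (t+1)*tau[."""
    if day < 0:
        raise ValueError("day must be on or after study entry")
    return day // tau


def last_interval(followup_days: int, tau: int = TAU) -> int:
    """T-tilde: follow-up time in units of tau days, so that the actual
    follow-up time falls in [T-tilde*tau, (T-tilde+1)*tau[."""
    return interval_index(followup_days, tau)
```

A patient followed for 200 days, for example, has $\tilde{T} = 2$, and the last interval covers only days 180 to 199 of the nominal 90-day window [180, 270[.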

Treatment exposure, $A_1(t)$, was defined as a binary variable representing exposure to an intensified DM treatment during interval $t$. A patient's value of $A_1(t)$ was set to 0 for all intervals $t$ except for the interval (if any) in which the patient initiated a new DM treatment not used at study entry and for subsequent intervals during which one or more DM treatments not used at study entry were estimated to be cumulatively used on at least 50% of the days. The TI definition thus allowed for drug switching and was designed to capture prolonged interruptions in intensified therapy; that is, relatively short interruptions in intensified treatment were ignored.
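The exposure coding just described can be sketched per patient. The inputs are assumptions for illustration: the index of the initiation interval, and a per-interval fraction of days covered by any DM drug not used at study entry (in practice this would be derived from pharmacy dispensing data, which is not shown).

```python
from typing import List, Optional, Sequence


def code_ti_exposure(
    n_intervals: int,
    initiation_interval: Optional[int],
    new_drug_coverage: Sequence[float],
) -> List[int]:
    """A1(0), ..., A1(n_intervals - 1) following the TI definition in the text.

    initiation_interval: interval in which a DM drug not used at study entry
        was first started, or None if no such drug was ever started.
    new_drug_coverage[t]: fraction of days in interval t covered by one or more
        such drugs (hypothetical input).
    """
    a1 = [0] * n_intervals
    if initiation_interval is None:
        return a1
    a1[initiation_interval] = 1  # the initiation interval always counts as exposed
    for t in range(initiation_interval + 1, n_intervals):
        a1[t] = 1 if new_drug_coverage[t] >= 0.5 else 0  # >= 50% of days covered
    return a1
```

Because coverage is pooled over any new drug, switching from one new agent to another within an interval does not break exposure, while coverage below 50% of an interval's days codes a prolonged interruption.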

The following time-independent potential confounders were considered in this analysis: age at study entry, sex, median household income in the patient's census block, number (1, 2, 3, ≥ 4) and type of DM drugs used at study entry (alpha-glucosidase inhibitor, DPP4 inhibitor, exenatide, insulin combo, long-acting insulin, meglitinide, metformin, pramlintide, short-acting insulin, sulfonylurea, thiazolidinedione), prospective risk scores based on diagnoses and prescriptions[42], race (Asian, Black, Hispanic, Islander, Native, White), and HMO site (HealthPartners, Group Health, KPCO, KPHI, KPNC, KPNW, KPSC). In addition, the following time-varying potential confounders were considered: history of arrhythmia, history of coronary heart disease, history of chronic heart failure, history of cerebrovascular disease, history of diabetic macular edema, history of peripheral artery disease, A1c, LDL and HDL cholesterol, triglycerides, systolic and diastolic blood pressure, body mass index, category of estimated glomerular filtration rate (≥ 90, 60–89, 45–59, 30–44, 15–29, < 15 mL/min/1.73 m²), stage of retinopathy (background, mild, moderate to severe, proliferative), and albuminuria level (< 30, 30–300, > 300). To respect the time ordering between covariates L(t) and action A(t), daily measurements of the covariate attributes above were mapped to an observation of L(t) as follows:

For all t representing intervals [tτ, (t + 1)τ[ that do not include the day when TI is deemed to be initiated (i.e., when the value for A_{1}(t) is 1 for the first time), the value for each covariate attribute of L(t) is set to the last measurement of that attribute (if any) 1) at t − 1 (i.e., the last measurement obtained during interval [(t − 1)τ, tτ[) if t > 0 and 2) within two years preceding study entry if t = 0.

For t representing the interval [tτ, (t + 1)τ[ when TI is initiated, the value for each covariate attribute of L(t) is set to the last measurement of that attribute (if any) 1) at t − 1 or t but always prior to the actual day when the new intensified DM treatment was initiated if t > 0 and 2) within the year preceding study entry or during the first follow-up interval, but always prior to the actual day when TI was initiated if t = 0.
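The two mapping rules can be sketched for a single covariate attribute. This is an illustration under stated assumptions, not the study's code: days are counted relative to study entry (negative values before entry), years are taken as 365 days, and the study-entry day itself is treated as part of the baseline window (an assumption, so that the entry A1c enters $L(0)$). A missing result is returned as None and left to the imputation step described next.

```python
from typing import Optional, Sequence, Tuple

TAU = 90                 # interval length in days
LOOKBACK_T0 = 2 * 365    # t = 0 without TI in interval 0: two years before entry
LOOKBACK_T0_TI = 365     # t = 0 with TI in interval 0: one year before entry


def covariate_value(
    measurements: Sequence[Tuple[int, float]],
    t: int,
    ti_day: Optional[int],
    tau: int = TAU,
) -> Optional[float]:
    """Value of one covariate attribute in L(t).

    measurements: (day, value) pairs, days relative to study entry.
    ti_day: follow-up day on which TI was initiated, or None if never.
    Returns None if no measurement falls in the relevant window.
    """
    ti_in_interval = ti_day is not None and t * tau <= ti_day < (t + 1) * tau
    if ti_in_interval:
        # Interval t - 1 or t (or the year before entry when t = 0),
        # but strictly before the day TI was initiated.
        lo = (t - 1) * tau if t > 0 else -LOOKBACK_T0_TI
        hi = ti_day
    elif t > 0:
        lo, hi = (t - 1) * tau, t * tau   # last measurement during interval t - 1
    else:
        lo, hi = -LOOKBACK_T0, 1          # two years before entry, up to day 0 (assumed)
    window = [(d, v) for d, v in measurements if lo <= d < hi]
    return max(window, key=lambda dv: dv[0])[1] if window else None
```

With A1c values of 6.8% (day -400), 7.6% (day 0), 7.9% (day 100), and 8.1% (day 200), the value in $L(2)$ is 7.9% if TI is not initiated in interval 2, but 8.1% if TI is initiated on day 210, because the window then extends into interval 2 up to the initiation day.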

After the EHR data were mapped into the observed data structure O of the MSM framework for right-censored longitudinal data[23] as just described, missing values of the baseline covariates L(0) were imputed with the mode for categorical covariates and the mean for continuous covariates, except for race, body mass index, and systolic and diastolic blood pressure. Race was imputed with the mode within each HMO site. Unlike other baseline measurements, measurements of body mass index and the two blood pressures were missing for more than 30% of patients and were thus imputed with multivariate logistic models. Body mass index was imputed based on a logistic model fitted to complete observations and parameterized with main terms for all covariate attributes except systolic and diastolic blood pressure and with a term for the patient's year of enrollment into the study. Systolic and diastolic blood pressure were imputed based on separate logistic models with the same parameterization as that used to impute body mass index. A missing measurement of any attribute of the time-varying covariate L(t) for t > 0, except for the outcome Y(t), was imputed with the last non-missing measurement of that attribute before time t or, if all measurements of that attribute before time t were missing, with its previously imputed baseline value.
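The simpler imputation steps above (mode or mean for baseline covariates, last observation carried forward with a baseline fallback for time-varying covariates) can be sketched as follows. The helper names are hypothetical; the site-specific race imputation and the logistic-model imputation of body mass index and blood pressure are not shown, and the outcome Y(t) would not be passed through `impute_time_varying`.

```python
from statistics import mean, mode
from typing import Any, List, Optional, Sequence


def impute_baseline(values: Sequence[Optional[Any]], categorical: bool) -> List[Any]:
    """Fill missing baseline values (None) across patients with the mode
    (categorical covariates) or the mean (continuous covariates)."""
    observed = [v for v in values if v is not None]
    fill = mode(observed) if categorical else mean(observed)
    return [fill if v is None else v for v in values]


def impute_time_varying(observed: Sequence[Optional[float]], imputed_baseline: float) -> List[float]:
    """One patient's attribute over t = 0, ..., K: a missing value takes the last
    non-missing measurement before t, or the imputed baseline value if there is none."""
    out: List[float] = []
    carry = imputed_baseline
    for v in observed:
        if v is not None:
            carry = v
        out.append(carry)
    return out
```

For a patient whose A1c path is (missing, 7.2, missing, missing, 8.0) and whose imputed baseline is 7.5, the imputed path is (7.5, 7.2, 7.2, 7.2, 8.0).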

APPENDIX F

ESTIMATION OF THE IPW WEIGHTS IN THE DM STUDY

The treatment mechanism component of the IPW weights, that is, $\prod_{j=0}^{t} P(A_1(j) \mid A_2(j) = 0, \bar{L}(j), \bar{A}(j-1))$, was estimated based on three separate logistic models for (i) TI initiation in the first follow-up period, $P(A_1(0) = 1 \mid A_2(0) = 0, L(0))$; (ii) TI initiation in subsequent periods, $P(A_1(t) = 1 \mid A_2(t) = 0, \bar{L}(t), \bar{A}(t-1) = 0)$ for $t > 0$; and (iii) TI continuation, $P(A_1(t) = 1 \mid \bar{A}_2(t) = 0, \bar{L}(t), \bar{A}_1(t-2), A_1(t-1) = 1)$. The three logistic models were parameterized with separate terms for (i) time-independent covariates (elements of $L(0)$); (ii) the last measurements of time-varying covariates (i.e., elements of $L(t)$); and (iii) the recent change in A1c (i.e., an element of $L(t) - L(t-1)$). In addition, a term for time $t$ was included in the last two models because they are pooled over time $t$. No interaction terms between covariates were considered. The change in A1c at $t = 0$ (i.e., an element of $L(0)$) was defined as the difference between the baseline A1c measurement in $L(0)$ and the previous A1c measurement used to establish study eligibility (below 7% by design).
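For a given patient, the treatment mechanism is a product over intervals of the fitted probability of the observed $A_1(j)$, where each interval draws on whichever of the three models applies to that history. The sketch below assumes the fitted probabilities $P(A_1(j) = 1 \mid \ldots)$ are already available per interval in a dictionary keyed by hypothetical model names; it raises an error for histories with TI stopped before $t - 1$, which none of the three models covers (such person-time does not follow any of the non-ITT rules).

```python
from typing import Dict, Sequence


def which_model(t: int, prior_a1: Sequence[int]) -> str:
    """Which fitted logistic model supplies P(A1(t) = 1 | ...) for this history."""
    if t == 0:
        return "initiation_first"   # model (i)
    if not any(prior_a1):
        return "initiation_later"   # model (ii): no TI before t
    if prior_a1[-1] == 1:
        return "continuation"       # model (iii): on TI at t - 1
    raise ValueError("history not covered by the three treatment models")


def treatment_mechanism(observed_a1: Sequence[int], predicted: Sequence[Dict[str, float]]) -> float:
    """prod_j P(A1(j) = observed value | A2(j) = 0, history).

    predicted[j] maps model names to fitted probabilities P(A1(j) = 1 | ...)
    for the patient's record at j (hypothetical input format).
    """
    prob = 1.0
    for t, a in enumerate(observed_a1):
        p1 = predicted[t][which_model(t, observed_a1[:t])]
        prob *= p1 if a == 1 else 1.0 - p1
    return prob
```

For a patient who stays off TI in interval 0 (fitted initiation probability 0.2), initiates in interval 1 (0.3), and continues in interval 2 (0.9), the product is 0.8 × 0.3 × 0.9 = 0.216.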

A patient may be right-censored because of the administrative end of the study, disenrollment from the health plan, or death. The right-censoring indicator $A_2(t)$ thus may be recoded with three separate indicator variables, each representing right-censoring for one of these three reasons, denoted by $A_2^{\mathrm{end}}(t)$, $A_2^{\mathrm{dis}}(t)$, and $A_2^{\mathrm{death}}(t)$, respectively. The right-censoring mechanism component of the IPW weights thus can be decomposed into three parts:

$$\prod_{j=0}^{t} P\big(A_2^{\mathrm{end}}(j) = 0 \mid \bar{L}(j), \bar{A}(j-1)\big) \prod_{j=0}^{t} P\big(A_2^{\mathrm{dis}}(j) = 0 \mid A_2^{\mathrm{end}}(j) = 0, \bar{L}(j), \bar{A}(j-1)\big) \prod_{j=0}^{t} P\big(A_2^{\mathrm{death}}(j) = 0 \mid A_2^{\mathrm{end}}(j) = 0, A_2^{\mathrm{dis}}(j) = 0, \bar{L}(j), \bar{A}(j-1)\big).$$

We assumed that right-censoring because of the administrative end of the study was uninformative, that is, $P(A_2^{\mathrm{end}}(t) = 0 \mid \bar{L}(t), \bar{A}(t-1))$ is a function of $t$ only; this component thus may be ignored in the definition of the IPW weights, because the numerator of the weights for IPW estimation of a dynamic MSM for discrete-time hazards pooled over time may be any function of both $t$ and the rules $d_\theta$[28, 37]. The remaining components of the right-censoring mechanism were estimated based on two separate logistic models for (i) right-censoring because of disenrollment from the health plan, $P(A_2^{\mathrm{dis}}(t) = 0 \mid A_2^{\mathrm{end}}(t) = 0, \bar{L}(t), \bar{A}(t-1))$; and (ii) right-censoring because of death, $P(A_2^{\mathrm{death}}(t) = 0 \mid A_2^{\mathrm{end}}(t) = 0, A_2^{\mathrm{dis}}(t) = 0, \bar{L}(t), \bar{A}(t-1))$. The two logistic models were parameterized like the logistic models used to estimate the treatment mechanism, except that a term for the last TI exposure, $A_1(t)$, was included in the models.
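Given fitted probabilities from the disenrollment and death models, the cumulative right-censoring mechanism used in the weights is a running product of the two factors per interval, with the administrative end-of-study factor omitted as justified above. A minimal sketch with a hypothetical helper name and hypothetical inputs:

```python
from itertools import accumulate
import operator
from typing import List, Sequence


def remaining_uncensored(p_no_disenroll: Sequence[float], p_no_death: Sequence[float]) -> List[float]:
    """Cumulative right-censoring mechanism for t = 0, 1, ...:
    prod_{j<=t} P(A2dis(j) = 0 | ...) * P(A2death(j) = 0 | A2dis(j) = 0, ...).

    The administrative end-of-study factor is omitted because it is assumed to
    depend on t only and can be absorbed into the numerator of the weights.
    """
    if len(p_no_disenroll) != len(p_no_death):
        raise ValueError("probability sequences must have equal length")
    return list(accumulate((a * b for a, b in zip(p_no_disenroll, p_no_death)), operator.mul))
```

With per-interval probabilities (0.9, 0.8) of remaining enrolled and (0.5, 0.5) of surviving, the cumulative probabilities of remaining uncensored are (0.45, 0.18).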

The denominators of the IPW weights were computed from the predicted values of the five fitted logistic models for the action mechanism described above. The numerators of the IPW weights were estimated non-parametrically using the proportions of patients following each rule d_{θ} at any given time t.
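One plausible reading of "the proportions of patients following each rule $d_\theta$ at any given time $t$" is the fraction of all cohort patients whose observed data remain uncensored and concordant with $d_\theta$ through interval $t$. The sketch below adopts that reading (an assumption) and divides it by a patient's cumulative action-mechanism probability, that is, the product of the treatment and right-censoring components evaluated at the observed history, to form the weight at each $t$. Helper names and inputs are hypothetical.

```python
from typing import List, Sequence


def rule_following_proportions(concordant_lengths: Sequence[int], n_intervals: int) -> List[float]:
    """Numerator for rule d_theta at t = 0, ..., n_intervals - 1.

    concordant_lengths[i]: number of leading intervals over which patient i's
        observed data are uncensored and concordant with d_theta.
    """
    n = len(concordant_lengths)
    return [sum(1 for length in concordant_lengths if length > t) / n for t in range(n_intervals)]


def dynamic_msm_weights(numerators: Sequence[float], denominators: Sequence[float]) -> List[float]:
    """Per-interval weights numerator(t) / denominator(t) for one patient, where
    denominator(t) is the cumulative action-mechanism probability through t."""
    return [num / den for num, den in zip(numerators, denominators)]
```

With four patients concordant with a rule for 3, 1, 0, and 2 intervals, the numerators for t = 0, 1, 2 are 0.75, 0.5, and 0.25.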

APPENDIX G

SENSITIVITY ANALYSES

As in previous applications of this MSM approach in HIV research[37, 43], and in an effort to gain precision in effect estimates, we implemented a sensitivity analysis to investigate the effect of 16 non-ITT rules d_{θ} indexed by A1c thresholds θ ranging from 7% to 8.5% in increments of 0.1%. The same IPW approach was used to fit a more parametric dynamic MSM, so that a gain in precision in the effect estimates described above could be achieved by smoothing the effect of the A1c threshold θ on the potential survival curve, using pooled data from patients following more than four TI initiation rules. The original analysis based on four TI initiation rules led to the inclusion of about 66.9% of the available person-time observations for fitting the dynamic MSM, whereas the extended analysis with 16 TI initiation rules led to the inclusion of about 68.1% of the person-time observations. The relatively few observations added in the secondary analysis originate from patients' follow-up data that were not concordant with the initial four rules of interest but were nevertheless concordant with one or more related rules indexed by different A1c thresholds. Results led to similar subject-matter conclusions, although the point estimates were overall smaller in magnitude and the expected gain in precision was not realized. The lack of improvement in precision may be explained by the relatively few additional person-time observations added to the analysis, but it also may result from a parameterization choice for the dynamic MSM that relied mostly on smoothing assumptions for θ (i.e., smoothing over time was limited).

To weaken the requirements of the no unmeasured confounders (NUC) assumption, the investigation of effects defined by intention-to-treat dynamic interventions has been proposed and implemented in HIV research, since variables affecting both treatment discontinuation and the outcome may be “less well-measured in most observational studies”[37]. In this study, such ITT decision rules would not require patients to remain on an intensified treatment once it was initiated. This ITT approach may thus suffer from the same limitation described in section 3 when frequent therapy changes (e.g., discontinuation of an intensified treatment) are a concern.

To study so-called legacy effects of more or less aggressive TI strategies, a sensitivity analysis was implemented based on TI decision rules that are hybrids between the non-ITT rules considered in the DM study and the aforementioned ITT rules. These rules require that patients who initiate an intensified treatment remain on that treatment for at least one year after the period of TI initiation. After this mandatory one-year exposure to an intensified therapy, patients may go off the intensified treatment under such hybrid rules. This led to a minor increase in the proportion of available person-time used to fit the dynamic MSM (67.9%). Compared with the primary analysis, the NUC assumption is weakened in this analysis because we need not assume that the covariate process $\bar{L}(t)$ contains measurements of all risk factors for the outcomes that also affect late non-adherence to the intensified treatment, that is, adherence after one year on the intensified treatment. Results from this secondary analysis led to similar subject-matter conclusions, although both the point estimates and the standard errors were smaller in magnitude. The advantage of the hybrid rules indexed by 7%, 7.5%, and 8% over that indexed by 8.5% remained significant at the 0.05 level.
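Whether a patient's observed TI history is concordant with such a hybrid rule can be checked interval by interval. In this sketch, one year after the initiation period is approximated by the four 90-day intervals that follow it (an assumption), before initiation the rule behaves like $d_\theta$, and after the mandatory period any TI value is allowed. The helper name is hypothetical.

```python
from typing import Optional, Sequence


def follows_hybrid_rule(
    theta: float,
    a1c_path: Sequence[float],
    observed_a1: Sequence[int],
    mandatory_intervals: int = 4,  # ~1 year of 90-day intervals (assumption)
) -> bool:
    """True if the observed TI history is concordant with the hybrid rule."""
    started_at: Optional[int] = None
    for t, (a1c, a) in enumerate(zip(a1c_path, observed_a1)):
        if started_at is None:
            required = 0 if a1c < theta else 1   # same initiation logic as d_theta
            if a != required:
                return False
            if a == 1:
                started_at = t
        elif t - started_at <= mandatory_intervals:
            if a != 1:                           # must stay on TI during the mandatory year
                return False
        # after the mandatory period, any TI value is concordant
    return True
```

For example, with θ = 7.5 and initiation triggered at interval 1, stopping TI at interval 6 is concordant with the hybrid rule, whereas stopping at interval 3 is not.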

Note that in the analyses described above, the effect estimates were based on data from patients with possibly infrequent laboratory monitoring. Monitoring of albuminuria has a direct impact on the outcome definition, i.e., disease progression is more likely to be identified later for patients who are monitored less frequently. Monitoring of A1c has a direct impact on exposure classification, and patients whose A1c is monitored less frequently may be more likely to initiate TI at a higher A1c threshold. In an effort to control for infrequent monitoring of patients' A1c and albuminuria, we implemented a sensitivity analysis in which patients' data were artificially right-censored after 6 months without a new A1c measurement or 1 year without a new albuminuria measurement. To account for possibly informative right-censoring through IPW estimation, two additional logistic models were used to separately predict right-censoring because of poor monitoring of albuminuria and of A1c. This secondary analysis aims to mimic inference from a randomized experiment in which patients' A1c is monitored at least once every 6 months and their albuminuria at least once every year. Findings are thus not generalizable to healthcare systems with less frequent monitoring of A1c and albuminuria[28]. Results led to similar subject-matter conclusions, although the standard errors were overall larger; the point estimates of the contrasts between decision rule d_{8.5} and all other rules were larger in magnitude, whereas the other point estimates were smaller.
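The artificial censoring for insufficient monitoring can be sketched as follows. Several details are assumptions not stated in the text: 6 months is taken as 182 days and 1 year as 365 days, days are counted from study entry, the gap is measured from the last measurement on or before entry (or from entry itself if there is none), and censoring occurs on the day the allowed gap expires without a new measurement.

```python
from typing import Optional, Sequence

A1C_MAX_GAP = 182   # "6 months" in days (assumption)
ALB_MAX_GAP = 365   # "1 year" in days (assumption)


def gap_censoring_day(days: Sequence[int], max_gap: int, end: int) -> Optional[int]:
    """Day on which follow-up is censored because `max_gap` days elapsed without
    a new measurement, or None if this does not happen before `end`."""
    last = max((d for d in days if d <= 0), default=0)
    for d in sorted(d for d in days if d > 0):
        if d > last + max_gap:
            break                     # the allowed gap expired before this measurement
        last = d
    censor_day = last + max_gap
    return censor_day if censor_day < end else None


def monitoring_censoring_day(a1c_days: Sequence[int], alb_days: Sequence[int], end: int) -> Optional[int]:
    """Earliest artificial censoring day for poor monitoring of A1c or albuminuria."""
    candidates = [
        c
        for c in (gap_censoring_day(a1c_days, A1C_MAX_GAP, end),
                  gap_censoring_day(alb_days, ALB_MAX_GAP, end))
        if c is not None
    ]
    return min(candidates) if candidates else None
```

For a patient with A1c measured on days 0, 100, 250, and 500 and albuminuria on days -200 and 300, A1c monitoring alone would censor follow-up on day 432, but albuminuria monitoring censors it earlier, on day 165.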

As just noted, results from this study, which are based on data with relatively close monitoring of patients, may not be generalizable to other settings. Concerns may be raised over the validity of the NUC assumption in such settings and thus over the applicability of MSM approaches to CER questions. Given that the decision to initiate treatment in practice is typically based on medical events (e.g., A1c measurements) known to physicians during clinic visits (whatever the frequency of such visits), the NUC assumption should approximately hold for ITT interventions even when changes in patients' A1c are not well captured by the healthcare system, as long as the healthcare database captures the information that physicians use to decide to initiate treatment. Similarly, for non-ITT rules, the extent to which the healthcare database also captures information on risk factors for the outcomes that affect the patient's decision to adhere to the intensified treatment determines whether the NUC assumption can be relied on. For both ITT and non-ITT rules, the ability to properly adjust for selection bias through IPW estimation also relies on the availability of measurements of risk factors for the outcomes that also affect right-censoring events.