G‐estimation of structural nested mean models for interval‐censored data using pseudo‐observations

Two large-scale randomized clinical trials compared fenofibrate and placebo in diabetic patients with pre-existing retinopathy (FIELD study) or risk factors (ACCORD trial) on an intention-to-treat basis and reported a significant reduction in the progression of diabetic retinopathy in the fenofibrate arms. However, their analyses involved complications due to intercurrent events, that is, treatment-switching and interval-censoring. This article addresses these problems in the estimation of causal effects of long-term use of fibrates in a cohort study that followed patients with type 2 diabetes for 8 years. We propose structural nested mean models (SNMMs) of time-varying treatment effects and pseudo-observation estimators for interval-censored data. The first estimator for SNMMs uses a nonparametric maximum likelihood estimator (MLE) as a pseudo-observation, while the second estimator is based on the MLE under a parametric piecewise exponential distribution. Numerical studies with real and simulated datasets show that the pseudo-observation estimators of causal effects using the nonparametric Wellner-Zhan estimator perform well even under dependent interval-censoring. Application to the diabetes study revealed that the use of fibrates in the first 4 years reduced the risk of diabetic retinopathy but did not support efficacy beyond 4 years.

The follow-up period was 4 years in the ACCORD trial and 5 years in the FIELD study. To date, the efficacy of fibrates beyond 4 years remains unclear. This article addresses the estimation of causal effects of long-term use of fibrates in a cohort study that followed patients with type 2 diabetes for 8 years. 3 This analysis involves complications encountered in the previous clinical trials, that is, treatment-switching and interval-censoring.
Our approach to these problems builds on and extends our previous work on structural nested models and g-estimation for right-censored data. 4 The scientific question of interest, whether fibrates beyond 4 years are efficacious or not, may be addressed using the hypothetical strategy together with structural nested models and g-estimation. [5][6][7][8][9][10][11][12][13] There are two classes of structural nested models: models for the mean of the outcome and models for the entire distribution of the outcome. 12 In the context of survival analysis, the former includes structural nested cumulative failure time models, 11,14 structural cumulative survival models, 13,15 and structural nested mean models (SNMMs), 4 while a typical case in the latter class uses structural nested accelerated failure time models. 5 A major complication with estimation of structural nested models is the handling of censoring. Previous studies dealt with right-censoring using artificial censoring and inverse probability of censoring weighting, 11,12,14 but a simulation study reported that artificial censoring can increase bias. 16 Other approaches to handling right-censoring include augmented g-estimation, 17 pseudo-observations, 4 and instrumental variable estimation. 15 These estimators avoid artificial censoring but are not designed to handle interval-censoring.
Andersen et al originally proposed a pseudo-observation approach to directly fit generalized linear models to right-censored data. 18 Their approach also performs well in SNMMs. 4 Functional analysis by Overgaard et al showed that pseudo-observation estimators are asymptotically normal and consistent under regularity conditions, implying that this approach is also useful for handling interval-censored data if one chooses appropriate pseudo-observations. 19 In previous work on interval-censoring, Kim and Kim calculated pseudo-observations using a pseudolikelihood estimator of the cumulative incidence function, 20 while Sabathé et al used a semi-parametric estimator of the transition intensities of illness-death models. 20 In this article, we propose SNMMs of time-varying treatment effects and pseudo-observation estimators for interval-censored data. Martinussen et al and Ying and Tchetgen Tchetgen constructed unbiased estimating equations for structural cumulative survival models based on instrumental variable assumptions. 13,15 Unlike their approach, our method does not require the exclusion restriction assumption. We then apply the proposed methods to time-varying exposures in a cohort study of type 2 diabetes. In Section 2, we introduce SNMMs for time-varying treatments and state the assumptions for causal inference. Section 3 provides two pseudo-observation estimators for SNMMs. The first uses a nonparametric maximum likelihood estimator (MLE) as a pseudo-observation, 21 while the second is based on the MLE under a parametric piecewise exponential distribution. Section 4 evaluates the finite-sample performance of the estimators in simulated datasets. In Section 5, we apply the proposed methods to the diabetes study 3 and Section 6 briefly discusses some problems related to the methods.

Data
Consider a cohort study with a time-to-event outcome T_i, a time-varying exposure A_im, and time-varying covariates X_im for the ith participant. T_i is known only to lie within an interval instead of being observed exactly. We use (L_i, R_i] to denote the interval containing T_i, allowing right-censored observations (L_i, ∞] and left-censored observations (0, R_i]. Data on the exposure and covariates are collected at fixed time points t_1, t_2, …, t_M in addition to the baseline measurement at t_0. We presume the variables are ordered X_0, A_0, X_1, A_1, …, X_M, A_M such that covariates at t_m precede the exposure at t_m. Let overbars represent a history of exposures, that is, Ā_m = {A_0, …, A_m} denotes the exposure sequence through t_m. A_im is not restricted to a binary variable, but our methods assume that A_im ≥ 0 with A_im = 0 serving as the reference level. 4 We write {Ā_m, 0} for a sequence that represents the actual exposure history through t_m and no exposure from t_{m+1} to the end of the study t_M. Patients never exposed have, for example, Ā_M = {0}.
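To make this data structure concrete, the sketch below shows one way such interval-censored longitudinal records might be held in arrays, with R = inf encoding right-censoring and L = 0 encoding left-censoring. All variable names and values here are illustrative, not taken from the study.

```python
import numpy as np

# Hypothetical records for n participants over M follow-up visits plus baseline.
# The event time T_i is only known to lie in (L_i, R_i].
rng = np.random.default_rng(0)
n, M = 5, 3

L = np.array([2.0, 1.0, 0.0, 3.0, 2.5])
R = np.array([3.0, np.inf, 1.0, 4.0, np.inf])   # (L, inf] = right-censored

A = rng.integers(0, 2, size=(n, M + 1))  # exposures A_0, ..., A_M (binary here)
X = rng.integers(0, 2, size=(n, M + 1))  # one binary covariate per visit

right_censored = np.isinf(R)
left_censored = (L == 0) & np.isfinite(R)
assert np.all(L < R)                     # every interval must be non-degenerate
```

Each row of A and X then gives the histories Ā_m and X̄_m by slicing the first m + 1 columns.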

Linear and log-linear structural nested mean models
SNMMs specify, at each time t_m, the effects of a sequence of exposures {a} = {a_0, …, a_M} compared with no exposure at that time, conditional on the exposure and covariate history up to that time. Let T_i^{a} be the potential outcome of the ith participant that would be seen were the participant to have received a hypothetical sequence of exposures {a}. T_i^{a} is defined for all participants but observed only for participants whose exposure level is {a}. Tanaka et al considered SNMMs of the distribution function of potential time-to-event outcomes at the end of study, which may be expressed without conditioning on exposure and covariates as F^{a}(t_M) = Pr(T^{a} ≤ t_M). 4 For ease of notation, we introduce a possibly unobservable binary variable Y^{a} = I(T^{a} ≤ t_M). A general expression of the SNMMs 12 for the end-of-study binary outcome is given as

E(Y^{ā_m,0} | X̄_m, Ā_m = ā_m) − E(Y^{ā_{m−1},0} | X̄_m, Ā_m = ā_m) = γ_m(X̄_m, ā_m; ψ)   (1)

and

log E(Y^{ā_m,0} | X̄_m, Ā_m = ā_m) − log E(Y^{ā_{m−1},0} | X̄_m, Ā_m = ā_m) = γ_m(X̄_m, ā_m; ψ)   (2)

for m = 0, …, M, respectively, where the blip function γ_m satisfies γ_m(X̄_m, ā_m; ψ) = 0 whenever a_m = 0. This article focuses on the identity-link and log-link functions. The proposed method, g-estimation using pseudo-observations, works particularly well in linear and log-linear SNMMs, although it does not guarantee probabilities within the unit interval. SNMMs with other link functions, such as the logistic link for causal odds ratios or the complementary log-log link for causal hazard ratios, require further model specifications, and congeniality problems in the parameterizations may arise. An extension of the pseudo-observation estimators to nonlinear link functions has been reported previously. 4

2.3 Assumptions for structural nested mean models

The notation for potential outcomes implicitly assumes the stable unit treatment value assumption (SUTVA). 22 In addition to (i) SUTVA and (ii) correct specification of the SNMM, we make the following standard assumptions: (iii) counterfactual consistency, that is, the potential outcome under the observed exposure history equals the observed outcome with probability one; (iv) a sequential version of the no unmeasured confounders assumption; and (v) non-informative interval-censoring, that is, L and R are independent of T. Although we do not consider treatment variations, one may assume treatment-variation irrelevance 23 rather than (iii) counterfactual consistency when one distinguishes the range of treatment variations from the range over which treatment comparisons are made. Previous studies showed how assumption (v) can be relaxed to covariate-dependent censoring. 24-26

Property of pseudo-observations
Andersen et al derived the pseudo-observation method, a convenient technique for handling censoring, which uses an unbiased estimator of a parameter from the entire dataset and its leave-one-out version. 18 Let F̂(t) be an (approximately) unbiased estimator of the distribution function F(t) from the entire dataset and F̂^{-i}(t) the corresponding estimator obtained by eliminating the ith individual. The pseudo-observation for the ith participant is defined as

θ̂_i(t) = n F̂(t) − (n − 1) F̂^{-i}(t).

This quantity is a transformation of possibly censored data in which each observation is represented by the contribution of a given individual to the estimate from the entire dataset. Andersen et al proposed estimating equations for regression parameters of the expectation of an outcome Y_i conditional on covariates X_i based on the (approximate) unbiasedness property of pseudo-observations 18 :

E{θ̂_i(t) | X_i} ≈ E(Y_i | X_i).   (3)

Examples of F̂(t) for right-censored data are the Kaplan-Meier estimator and the Aalen-Johansen estimator. Since the Aalen-Johansen estimator is only approximately unbiased, the unbiasedness property (3) does not hold exactly, but it achieves a sufficient degree of accuracy in large samples. 4,19 For interval-censored data, we consider two estimators of the distribution function F(t) and examine their finite-sample performance in the simulations in Section 4.
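The jackknife construction above can be sketched for right-censored data as follows. This is a minimal illustration with hypothetical helper names; the Kaplan-Meier estimator is coded directly and ignores tie-breaking conventions between events and censorings at identical times.

```python
import numpy as np

def km_cdf(time, event, t):
    """Kaplan-Meier estimate of F(t) = 1 - S(t) from right-censored data."""
    order = np.argsort(time)
    time, event = time[order], event[order]
    s, at_risk = 1.0, len(time)
    for tt, ee in zip(time, event):
        if tt > t:
            break
        if ee:                      # event: multiply in the survival factor
            s *= 1.0 - 1.0 / at_risk
        at_risk -= 1                # events and censorings both leave the risk set
    return 1.0 - s

def pseudo_observations(time, event, t):
    """theta_i = n * F_hat(t) - (n - 1) * F_hat^{-i}(t) for each i."""
    n = len(time)
    full = km_cdf(time, event, t)
    keep = np.ones(n, dtype=bool)
    out = np.empty(n)
    for i in range(n):
        keep[i] = False             # leave participant i out
        out[i] = n * full - (n - 1) * km_cdf(time[keep], event[keep], t)
        keep[i] = True
    return out

# With no censoring, the pseudo-observation reduces to the plain
# indicator I(T_i <= t), consistent with the unbiasedness property (3).
time = np.array([1.0, 2.0, 3.0, 4.0])
event = np.ones(4, dtype=bool)
po = pseudo_observations(time, event, t=2.5)
```

Averaging the pseudo-observations recovers the full-sample estimate F̂(t), which is the property the estimating equations exploit.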

Nonparametric method
Kim and Kim assessed the performance of a pseudo-observation method with a nonparametric MLE for interval-censored competing risks data through simulations, demonstrating that their method works well with sample sizes of n = 100, 200, or 300. 27 In our setting, the most popular choice in practice is the Wellner-Zhan estimator, which maximizes the likelihood using a hybrid of the expectation-maximization algorithm and the iterative convex minorant algorithm. 21 According to Turnbull, the nonparametric likelihood of F(t) reduces to

L(p_1, …, p_J) = ∏_{i=1}^{n} Σ_{j=1}^{J} I{I_j ⊆ (L_i, R_i]} p_j,

where I{·} is an indicator function, I_j (j = 1, …, J) are the ordered, nonoverlapping intervals called Turnbull intervals, and p_j = Pr(T ∈ I_j). 28 Wellner and Zhan showed that the values at convergence from the hybrid algorithm, p̂_j, and equivalently F̂(t) obtained from these estimates, always constitute the nonparametric MLE if it exists and is unique. 21
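A simplified version of the self-consistency (EM) step underlying this likelihood can be sketched as follows. This is not the Wellner-Zhan hybrid algorithm, only its EM component, and for brevity the support is fixed to a user-supplied grid of disjoint intervals rather than derived as Turnbull intervals from the data; all names are illustrative.

```python
import numpy as np

def turnbull_em(L, R, grid, tol=1e-8, max_iter=10000):
    """Self-consistency (EM) estimates of the masses p_j on the disjoint
    intervals (grid[j], grid[j+1]]; assumes every observed interval
    (L_i, R_i] contains at least one grid interval."""
    L, R = np.asarray(L, float), np.asarray(R, float)
    lo, hi = np.asarray(grid[:-1], float), np.asarray(grid[1:], float)
    # alpha[i, j] = 1 if the j-th grid interval is contained in (L_i, R_i]
    alpha = (lo[None, :] >= L[:, None]) & (hi[None, :] <= R[:, None])
    p = np.full(len(lo), 1.0 / len(lo))
    for _ in range(max_iter):
        num = alpha * p                        # E-step: split each observation
        share = num / num.sum(axis=1, keepdims=True)
        new_p = share.mean(axis=0)             # M-step: average the splits
        if np.max(np.abs(new_p - p)) < tol:
            return new_p
        p = new_p
    return p

# Half the observations are known to fall in (0, 1], half in (1, 2]
p = turnbull_em(L=[0, 0, 1, 1], R=[1, 1, 2, 2], grid=[0.0, 1.0, 2.0])
```

The estimated distribution function is then F̂(t) = Σ_{j: hi_j ≤ t} p̂_j, and the leave-one-out construction of the previous section applies unchanged.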

Parametric method
Another approach that can be useful is to specify a parametric piecewise exponential distribution. In this approach, the hazard function of the marginal distribution of the time-to-event T is assumed to be constant within user-specified intervals, λ(t) = λ_j for t ∈ T_j, where T_j is the jth interval of time. It is straightforward to calculate F̂(t) = 1 − exp{−Λ̂(t)} from the MLEs λ̂_j, where Λ̂(t) is the estimated cumulative hazard. From a practical viewpoint, piecewise exponential modeling is appealing because the model becomes more nonparametric in nature as the number of intervals increases, while still providing the theoretical advantages of MLE.
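This parametric route can be sketched as follows; the function names are ours, and optimizing the hazards on the log scale with SciPy is an implementation choice rather than part of the method.

```python
import numpy as np
from scipy.optimize import minimize

def piecewise_exp_mle(L, R, cuts):
    """MLE of piecewise-constant hazards lambda_j from interval-censored
    data (L_i, R_i]; `cuts` are interior cut points, the hazard being
    constant on [0, cuts[0]), ..., [cuts[-1], inf)."""
    L, R = np.asarray(L, float), np.asarray(R, float)
    edges = np.concatenate([[0.0], np.asarray(cuts, float)])
    widths = np.diff(np.append(edges, np.inf))

    def cum_hazard(t, lam):
        # time spent in each piece, capped at the piece width
        exposure = np.minimum(np.clip(t[:, None] - edges[None, :], 0.0, None),
                              widths[None, :])
        return exposure @ lam

    def neg_loglik(log_lam):
        lam = np.exp(log_lam)
        # each observation contributes S(L_i) - S(R_i)
        prob = np.exp(-cum_hazard(L, lam)) - np.exp(-cum_hazard(R, lam))
        return -np.sum(np.log(np.clip(prob, 1e-300, None)))

    fit = minimize(neg_loglik, x0=np.zeros(len(widths)), method="BFGS")
    return np.exp(fit.x)

# Data from a unit-rate exponential, observed only on a grid of width 0.2
rng = np.random.default_rng(1)
T = rng.exponential(1.0, size=1500)
L = np.floor(T / 0.2) * 0.2
R = L + 0.2
lam_hat = piecewise_exp_mle(L, R, cuts=[1.0])
```

With two pieces split at t = 1, both estimated hazards should be close to the true value of 1, and F̂(t) follows as 1 − exp{−Λ̂(t)}.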

Censoring
Assumption (v) means that both the nonparametric and parametric estimators used for calculating pseudo-observations require a non-informative censoring mechanism. It is known that the nonparametric MLE is usually consistent but that its rate of convergence depends on the censoring mechanism even under non-informative censoring. With case 2 interval-censored data, Yu et al showed that the nonparametric MLE is asymptotically normal and √n-consistent if the censoring vector takes on finitely many values, 29 but otherwise the rate of convergence slows to (n log n)^{1/3}. 30 In the latter case, the nonparametric MLE does not satisfy the conditions of Proposition 3.1 of Overgaard et al. 19 These results suggest that, in practice, researchers should design the outcome assessment schedule at appropriate intervals when using the nonparametric method, so that the estimator F̂(t) achieves √n-consistency. If censoring depends on covariates, the nonparametric and parametric methods for estimating the distribution function F(t) have a large-sample bias. It is possible to weaken assumption (v) to covariate-dependent censoring, that is, (L, R] independent of T conditional on covariates X_0, if one calculates pseudo-observations after stratifying participants according to the baseline covariates X_0, 25 using model-based methods, 25 or using inverse probability weighting. 26

G-estimation based on pseudo-observations
Define H_im(ψ) = θ̂_i − Σ_{k=m}^{M} γ_k(X̄_ik, Ā_ik; ψ) for linear SNMMs (1) and H_im(ψ) = θ̂_i exp{−Σ_{k=m}^{M} γ_k(X̄_ik, Ā_ik; ψ)} for log-linear SNMMs (2), where θ̂_i is the pseudo-observation for the ith participant estimated by either the parametric or the nonparametric method. Then, together with the linear or log-linear structure and the sequential no unmeasured confounders assumption, the expectation of H_im(ψ) at the true parameter ψ* equals the expectation of the potential outcome had the exposure been removed from time t_m onward, that is, E{H_im(ψ*) | X̄_im, Ā_im} = E{Y^{ā_{m−1},0} | X̄_im, Ā_im}. If the sequential version of the no unmeasured confounders assumption holds, the mean independence of A_im and H_im(ψ*) conditional on X̄_im suggests a moment estimating equation of the form

Σ_i Σ_m {A_im − E(A_im | X̄_im, Ā_{i,m−1})} [H_im(ψ) − E{H_im(ψ) | X̄_im, Ā_{i,m−1}}] = 0,

where the conditional expectations would be estimated by fitting generalized linear models. The pseudo-observations are dependent, as overlapping data are used in their computation. However, according to theoretical results, it can be shown that the solutions to the estimating equations, ψ̂, are consistent and asymptotically normal. 19 Conservative confidence intervals for ψ̂ may be obtained using the sandwich variance formula n^{-1} M̂^{-1} Ŝ (M̂^{-1})^T, where M̂ is the estimated derivative matrix of the estimating function and Ŝ is the empirical variance of its summands. As suggested by simulations, the sandwich confidence intervals perform well in small- or moderate-sized datasets. 4,31
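In the simplest case of a single time point and a linear SNMM with blip ψa, the estimating equation reduces to Σ_i {A_i − Ê(A_i | X_i)}{θ̂_i − ψA_i} = 0, which has a closed-form solution. The sketch below illustrates this with an uncensored continuous outcome standing in for the pseudo-observations θ̂_i; names and data-generating values are illustrative, and the outcome-model centering term E{H(ψ) | X} drops out here because the stratum-mean propensity makes the residuals sum to zero within strata.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
X = rng.integers(0, 2, size=n)                  # binary confounder
A = rng.binomial(1, 0.3 + 0.4 * X)              # exposure depends on X
psi_true = 0.2
Y = 0.3 + 0.1 * X + psi_true * A + rng.normal(0.0, 0.1, size=n)

# E(A | X) estimated by stratum means (a saturated propensity model)
EA = np.where(X == 1, A[X == 1].mean(), A[X == 0].mean())
resid = A - EA

# Solve sum_i resid_i * (Y_i - psi * A_i) = 0 for psi in closed form
psi_hat = np.sum(resid * Y) / np.sum(resid * A)
```

With time-varying exposures the equation is stacked over m and solved numerically, but the same residual-orthogonality logic applies at each time point.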

Model specification
The linear SNMMs (1) and log-linear SNMMs (2) represent structural relationships between potential outcomes and exposure history, and misspecification of the model structure can lead to bias. To assess the goodness-of-fit of the model and select the correct model structure, a goodness-of-fit test 32 and a quasi-likelihood-based information criterion 33 have been proposed. In addition to the structural model, it is necessary to specify the conditional expectations E(A_im | X̄_im, Ā_{i,m−1}) and E{H_im(ψ) | X̄_im, Ā_{i,m−1}} to implement the proposed methods. An attractive property of SNMMs is double robustness, that is, g-estimation yields a consistent estimator of causal effects when either of the conditional expectations is correctly specified. In the presence of interval-censoring, E{H_im(ψ) | X̄_im, Ā_{i,m−1}} may be estimated by using parametric models for the interval-censored outcome or by fitting generalized linear models 18 to the pseudo-observations θ̂_i under the reference level of exposure.

SIMULATION
We performed Monte Carlo simulations to examine the finite-sample performance of the nonparametric and parametric pseudo-observation methods. The aims of the simulations were threefold: (i) to assess the bias and simulated SE (SSE) of the proposed estimators and the coverage probability (CP) of their Wald-type confidence intervals, (ii) to assess the degree of bias due to dependent interval-censoring, and (iii) to assess the doubly robust property of the estimators when either E(A|X) or E{H(ψ)|X} is misspecified.

Scenarios
In the simulation study, we randomly generated 2000 datasets from a hypothetical cohort of n = 200 and n = 1000. The datasets included the time interval (L, R] containing the true time-to-event, a binary exposure A, and a set of covariates X_1, …, X_5. The performance of the estimators was assessed through scenarios with different patterns of censoring, but all scenarios considered only a baseline exposure and baseline covariates. Each dataset was generated in four steps. First, X_1, …, X_5 were generated from independent Bernoulli distributions with success probability 0.5. Second, A given X_1, X_2, and X_4 was generated from a logistic regression model with conditional probability expit(0.5X_1 + 0.5X_2 + 0.7X_4), where expit is the inverse logit function. Third, the true time-to-event T given A and X_1, X_3, and X_4 was generated from an exponential regression model. Two sets of covariates, {X_1, …, X_4} and {X_1, X_2, X_3, X_5}, were used as confounders, and use of the latter set leads to model misspecification. Finally, the time interval (L, R] was determined by T (see below).
In scenarios for risk ratio estimation, the hazard function of the exponential regression model was specified as −log{1 − 0.2 exp(ψA + 0.5X_1 + 0.5X_3 − 0.7X_4)}, and T was truncated at −log(0.0001) ≈ 9.2 if these steps generated a survival probability lower than 0.0001. The target parameter exp(ψ) corresponds to the risk ratio at t = 1 and was set at exp(ψ) = 1 or exp(ψ) = 0.5. In scenarios for risk difference estimation, the hazard function was −log(0.8 − ψA − 0.05X_1 + 0.05X_3 + 0.2X_4), and T was truncated in the same way. The target parameter ψ corresponds to the risk difference at t = 1 and was set at ψ = 0 or ψ = 0.2. We assumed that the time interval (L, R] lies on a grid of time intervals. The three scenarios for the widths of the time intervals are as follows: (i) independent interval-censoring (R − L = 0.1 regardless of A), (ii) dependent interval-censoring (R − L = 0.1 for A = 0 and R − L = 0.5 for A = 1), and (iii) dependent interval-censoring (R − L = 0.5 for A = 0 and R − L = 0.1 for A = 1). The latter two scenarios would yield bias if we applied standard survival analysis for right-censoring, since the schedules of event assessment differ between the exposed (A = 1) and unexposed (A = 0).
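The data-generating steps for the risk-ratio scenarios can be sketched as follows; this is a condensed version under scenario (ii), dependent interval-censoring with the wider assessment grid in the exposed arm, and the function names are ours.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_risk_ratio(n, psi=np.log(0.5), seed=0):
    rng = np.random.default_rng(seed)
    X = rng.binomial(1, 0.5, size=(n, 5))              # X1..X5 ~ Bernoulli(0.5)
    A = rng.binomial(1, expit(0.5 * X[:, 0] + 0.5 * X[:, 1] + 0.7 * X[:, 3]))
    # constant hazard chosen so that exp(psi) is the covariate-conditional
    # risk ratio at t = 1
    lam = -np.log1p(-0.2 * np.exp(psi * A + 0.5 * X[:, 0]
                                  + 0.5 * X[:, 2] - 0.7 * X[:, 3]))
    T = np.minimum(rng.exponential(1.0 / lam), -np.log(1e-4))  # truncate at 9.2
    width = np.where(A == 1, 0.5, 0.1)                 # scenario (ii) grid widths
    L = np.floor(T / width) * width
    return L, L + width, A, X

L, R, A, X = simulate_risk_ratio(1000)
```

Since Pr(T ≤ 1 | A, X) = 0.2 exp(ψA + 0.5X_1 + 0.5X_3 − 0.7X_4), the ratio of these probabilities at A = 1 versus A = 0 is exp(ψ), matching the target parameter.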

Comparators
We compared the performance of g-estimators for the target parameter using parametric and nonparametric pseudo-observations. Parametric pseudo-observations were calculated from the MLE under a piecewise exponential distribution, while nonparametric pseudo-observations were based on the nonparametric MLE of the survival function. 21 The conditional expectations for g-estimation, Ê(A | X_1, X_2, X_3, X_4) and Ê{H(ψ) | X_1, X_2, X_3, X_4}, were estimated by generalized linear models. Note that, among the five covariates, only X_1 and X_4 are confounders, and X_5 is used instead of X_4 to assess the double robustness under incorrectly specified models. To assess the doubly robust property of the estimator, the proposed estimator under correct specification was compared with three estimators under (partially) incorrect outcome and exposure models: (i) the pseudo-observation estimator using the misspecified exposure model Ê(A | X_1, X_2, X_3, X_5) with a correctly specified outcome model, (ii) the estimator using the misspecified outcome model Ê{H(ψ) | X_1, X_2, X_3, X_5} with a correctly specified exposure model, and (iii) the estimator with both models misspecified.

Results
In the risk ratio estimation under the alternative hypothesis (Table 1), bias in the estimators with parametric pseudo-observations ranged from 0.004 to 0.033 when at least one of the exposure and outcome models was correctly specified, that is, the percent bias was between 0.8% and 6.6% as the true value was 0.5, while the estimates were negatively biased when both models were misspecified. It is notable that the estimators with parametric pseudo-observations were biased under dependent interval-censoring (the ratio of the lengths of intervals between outcome assessments in the two arms was 5:1). On the other hand, bias due to dependent interval-censoring was not observed with the nonparametric method. However, when we used nonparametric pseudo-observations, the simulated SEs were slightly larger than those with parametric pseudo-observations when the sample size was n = 200. Shown in Table 2 are results of the risk ratio estimation under the null hypothesis. When at least one of the exposure and outcome models was correctly specified, bias in the small sample of n = 200 was around 0.03, regardless of independent or dependent interval-censoring, and almost negligible when the sample size was n = 1000. In terms of simulated SE, the parametric method was better than the nonparametric method when n = 200, but the latter estimators outperformed when n = 1000. Tables 3 and 4 present results of the risk difference estimation under the alternative and null hypotheses, respectively. As shown in Tables 1 and 2, the pseudo-observation estimators were doubly robust.
Overall, the nonparametric method yielded almost unbiased estimates of causal effects, and the coverage probabilities of Wald 95% confidence intervals were around 95% even under dependent interval-censoring. This method therefore outperformed the parametric method in terms of robustness, although it was slightly inefficient in some cases.

Methods
We applied the proposed methods to estimate the causal effects of fibrates beyond 4 years in a cohort study of type 2 diabetes. 3 The eligibility criteria for the diabetes study were a previous diagnosis of type 2 diabetes, an age of 40 to 70 years, and an HbA1c of 6.5% or higher. Patients who met these criteria were recruited in 59 university and general hospitals that specialize in diabetes care from January 1995 through March 1996. Patients who had diabetic retinopathy at baseline were excluded from the current analysis. One of the primary outcomes of the study is the incidence of diabetic retinopathy, determined annually based on clinical assessments by local ophthalmologists at each study site and a survey on ocular surgery. Clinical assessment was in accordance with the international diabetic retinopathy and diabetic macular edema disease scales. Agreement in grading between local ophthalmologists and central retinal specialists was also evaluated using fundus images. In other words, the schedule of the outcome assessment was annual, but the intervals between patients' visits actually varied, yielding the potential for differential interval-censoring. The outcome assessment could be terminated when cataract surgery or death occurred, and these events were handled as right-censoring as specified by the study protocol. 3 Based on the follow-up periods of the ACCORD trial 1 and the FIELD study, 2 we considered two estimands: the effect of initiating fibrates at baseline and the effects of initiating or continuing fibrates at 4 years. Therefore, only confounders at baseline and at 4 years were used, although data collection was scheduled annually.

TABLE 3 Finite-sample performance of the pseudo-observation estimators for a risk difference under an alternative hypothesis (risk difference = 0.2) with correct and incorrect specifications of exposure or outcome models, out of 2000 simulation runs
We also estimated the effects of fibrates at 3 years and the effects of fibrates at 5 years as sensitivity analyses. Supplementary Table A describes the distributions of major potential confounders: HbA1c, LDL-cholesterol, triglycerides, and the use of statins. Not only baseline treatment but also the use of fibrates at 4 years was strongly correlated with triglyceride values, although the majority of patients did not use fibrates through the end of the study. We applied four SNMMs according to the estimands of interest. The linear and log-linear SNMMs that we used are expressed in terms of the incidence of retinopathy at 8 years, Y^{ā_m,0}, as

E(Y^{ā_m,0} | X̄_m, Ā_m = ā_m) − E(Y^{ā_{m−1},0} | X̄_m, Ā_m = ā_m) = γ_m(ā_m; ψ)

and

log E(Y^{ā_m,0} | X̄_m, Ā_m = ā_m) − log E(Y^{ā_{m−1},0} | X̄_m, Ā_m = ā_m) = γ_m(ā_m; ψ),

where m = 0, 1 denotes the time points at baseline and 4 years, respectively; A_m = 1 if a patient used fibrates at t_m, and A_m = 0 if a patient did not use fibrates at t_m or was censored before t_m. In addition to the link functions, two blip functions are selected to specify the contrast of treatment regimes. The first blip function corresponds to causal effects of fibrates at baseline and at 4 years under the assumption that the effect of initiating fibrates at 4 years and that of continuing fibrates at 4 years are equal:

γ_0(ā_0; ψ) = ψ_1 a_0, γ_1(ā_1; ψ) = ψ_2 a_1.

Although the assumption behind this blip function is strong, the 2-parameter SNMMs may be practically appealing because only 63 patients were treated with fibrates at 4 years in this cohort. Second, we estimated three causal parameters using a blip function that includes an interaction term:

γ_0(ā_0; ψ) = ψ_1 a_0, γ_1(ā_1; ψ) = ψ_2 a_1 + ψ_3 a_0 a_1.

To adjust for confounding by indication, we modeled the conditional expectations E(A_m | ·) and E{H_m(ψ) | ·} using baseline covariates X_0 (age, sex, diabetes duration, use of oral hypoglycemic agents, use of insulin, use of blood pressure lowering agents, use of statins, BMI, HbA1c, systolic blood pressure, LDL-cholesterol, HDL-cholesterol, and triglycerides) and covariates at 4 years X_1 (BMI, HbA1c, systolic blood pressure, LDL-cholesterol, HDL-cholesterol, and triglycerides).

TABLE 5 Summary of the lengths of intervals between outcome assessments and the incidence of diabetic retinopathy in the cohort study of patients with type 2 diabetes

Results
During the median follow-up period of 7.0 years, diabetic retinopathy occurred in 310 of the 1152 patients in the cohort (Table 5). The mean (SD) length of the intervals between outcome assessments, that is, R − L, among the 310 patients was 1.3 (0.8) years, although the outcome assessment schedule specified in the study protocol was annual. Intervals were longer on average among those treated with fibrates than among those not treated with fibrates (Table 5). Figure 1 shows survival curves for the incidence of diabetic retinopathy according to the use of fibrates at baseline, estimated by the Kaplan-Meier and Wellner-Zhan methods. Table 6 shows estimates from the 2-parameter and 3-parameter SNMMs with identity and log link functions. Overall, the results with parametric and nonparametric pseudo-observations were similar. The causal risk differences and risk ratios estimated using the 2-parameter SNMMs suggested that fibrates at baseline reduce the risk of diabetic retinopathy at 8 years by 6%, while there was little effect of fibrates at 4 years. The results of sensitivity analyses with the time points of fibrate use at baseline and 3 years, or at baseline and 5 years, are presented in Supplementary Table B. The estimated effects of fibrates at 3 years were slightly larger than those at 4 years but, overall, there was no evidence to suggest that fibrates prevent diabetic retinopathy after 3 years.
In the analysis of the 3-parameter SNMMs, we estimated two contrasts of causal parameters. The first contrast corresponds to the causal effect of continuing treatment with fibrates at 4 years, that is, a comparison of the following regimes: {a} = {1, 1} vs {a} = {1, 0}.
The second contrast corresponds to a comparison of the following regimes: {a} = {0, 1} vs {a} = {0, 0}. As shown in Table 6, the causal risk difference for the first contrast was 16%, indicating that continuation of fibrates at 4 years rather increased the risk of diabetic retinopathy. It should be noted, however, that only 45 patients contributed to these estimates.

DISCUSSION
Interval-censoring is an unsolved problem common in randomized clinical trials and observational studies. Interval-censoring data other than those for diabetic retinopathy include progression-free survival in cancer studies and incidence of vertebral fracture in osteoporosis studies. Most clinical trials in cancer are open-label and therefore assessments of progression by investigators can differ between treatment arms, raising the suspicion of assessment bias by regulatory agencies. 34 In osteoporosis studies, although the schedule of assessments of vertebral fracture is specified in the study protocols, x-ray data are often missing 35,36 and outcome data suffer from dependent interval-censoring. Unfortunately, data from these disease fields are almost always analyzed using statistical methods under the assumption of non-informative right censoring, ignoring potential bias. This study extended the pseudo-observation approach to time-varying treatment effects 4 and applied it to the interval-censored data from the cohort study of type 2 diabetes. Note that the asymptotic theory for pseudo-observation methods 19 has mainly focused on competing risks data, and the behavior of pseudo-observations under interval-censoring has not been well-studied. The asymptotic properties of pseudo-observations methods were derived when the estimator for calculating pseudo-observations is √ n-consistent, 19 and this condition does not hold when the censoring mechanism