Correcting for bias due to mismeasured exposure history in longitudinal studies with continuous outcomes

Epidemiologists are often interested in estimating the effect of functions of time-varying exposure histories in relation to continuous outcomes, for example, cognitive function. However, the individual exposure measurements that constitute the history upon which an exposure history function is constructed are usually mismeasured. To obtain unbiased estimates of the effects of mismeasured functions in longitudinal studies, we developed a method incorporating main and validation studies. We conducted simulation studies under several realistic assumptions to assess its performance compared with standard analysis, and found that the proposed method performs well in terms of finite-sample bias reduction and nominal confidence interval coverage. We applied it to a study of long-term exposure to PM2.5 in relation to cognitive decline in the Nurses’ Health Study. Previously, it was found that the 2-year decline in the standard measure of cognition was 0.018 (95% CI, −0.034 to −0.001) units worse per 10 μg/m³ increase in PM2.5 exposure. After correction, the estimated impact of PM2.5 on cognitive decline increased to 0.027 (95% CI, −0.059 to 0.005) units lower per 10 μg/m³ increase. To put this into perspective, effects of this magnitude are about 2/3 of those found in our data associated with each additional year of aging: 0.044 (95% CI, −0.047 to −0.040) units per 1 year older after applying our correction method.

Supporting Information for Correcting for bias due to mismeasured exposure history in longitudinal studies with continuous outcomes by Jiachen Cai, Ning Zhang, Xin Zhou, Donna Spiegelman, and Molin Wang

Web Appendix A Mathematical Derivations

A.1 Fully Independent Validation Data
As defined in the manuscript, […], where $Y_i = (Y_{i1}, \ldots, Y_{im_i})^T$ is the vector of outcomes for the $i$th subject, $\Sigma_i$ is the working variance-covariance matrix, $\mathbf{X}_i = (\mathbf{X}_i(t_{i1}), \ldots, \mathbf{X}_i(t_{im_i}))^T$, $\mathbf{X}_i(t_{ij}) = (1, X_i(t_{ij}), t_{ij}, X_i(t_{ij})t_{ij}, W_i^T(t_{ij}))^T$, and […]. Note that $N = n_1 + n_2$ for the main and internal validation study design, and $N = n_1$ for the main and external validation study design. Assume that for the $i$th person in the validation study, $d_i(t_i) = (1, C_i(t_i), t_i, C_i(t_i)t_i, W_i(t_i))^T$, $i = n_1 + 1, \ldots, n_1 + n_2$. We define […]. For the estimate $\hat\theta = (\hat\alpha^T, \hat\beta^T)^T$, we have […]. Then, […]. The variance of $\hat\theta$ can be derived in sandwich form from $\mathrm{Var}(\psi(\theta))$ and $\partial\psi/\partial\theta$, and it can be estimated by plugging in the estimated values $\hat\theta$.
Based on Equations (5) and (6) in the main paper, we have […]. With an internal validation study, replace $n_1$ in $\partial\psi_\beta/\partial\beta$ and $\partial\psi_\beta/\partial\alpha_l$ with $n_1 + n_2$.
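As a rough numerical illustration of a sandwich variance for estimating equations, the sketch below computes $A^{-1} B A^{-T}$, taking $A$ as the summed Jacobian of the estimating function and $B$ as the summed outer products of the per-subject contributions. Both choices are assumptions, since the paper's own $A(\theta)$ and $B(\theta)$ are not reproduced here. An ordinary least-squares estimating function stands in for $\psi$, and the function name `sandwich_variance` is hypothetical.

```python
import numpy as np

def sandwich_variance(psi_i, dpsi_i):
    """Sandwich variance A^{-1} B A^{-T} for an estimating-equation estimator.

    psi_i  : (n, p) per-subject estimating-function contributions at theta_hat
    dpsi_i : (n, p, p) per-subject Jacobians d psi_i / d theta^T at theta_hat
    Assumes independent subjects (an assumption of this sketch).
    """
    A = dpsi_i.sum(axis=0)        # "bread": summed Jacobian
    B = psi_i.T @ psi_i           # "meat": sum of outer products psi_i psi_i^T
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv.T

# Stand-in example: psi_i(theta) = x_i * (y_i - x_i^T theta), i.e. least squares.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves sum_i psi_i(theta) = 0
resid = y - X @ theta_hat
psi_i = X * resid[:, None]                      # (n, 2)
dpsi_i = -np.einsum("ij,ik->ijk", X, X)         # (n, 2, 2), each equal to -x_i x_i^T

V = sandwich_variance(psi_i, dpsi_i)
se = np.sqrt(np.diag(V))                        # sandwich standard errors
```

For a stacked system such as $\psi = (\psi_\alpha^T, \psi_\beta^T)^T$, the same function would apply once the per-subject contributions and Jacobians of both blocks are concatenated, which is how uncertainty in $\hat\alpha$ would propagate into the variance of $\hat\beta$.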

A.2 For Validation Studies With Repeated Measures
Assume that for the $i$th participant, […], where $l_i$ is the number of repeated measurements in the validation study and $B_i$ is the working correlation matrix. The changes to $A(\theta)$ and $B(\theta)$ defined in Appendix A.1 are […]

A.3 Consistency of the Estimators
We prove the consistency of the estimators by showing that the mean of each estimating function, $\psi_\alpha$ and $\psi_\beta$, is zero. Then, under standard regularity conditions, GEE theory establishes consistency.
For $\psi_\alpha$: since […]. For $\psi_\beta$: […] $= 0$, based on equation (3) in the main paper.

A.4 Equivalence of the Two-Step Approach to a Simultaneous Solution to Joint Estimating Equations
It is noteworthy that solving the system of estimating equations $\psi = 0$ (Section 2.3, Estimation) to obtain $\hat\alpha$ and $\hat\beta$ is equivalent to performing the following two steps. First, we estimate $\alpha$ in the validation study (VS). Then, following equation (5) in the main paper, the expectation of the true exposure history function, […], can be estimated as […]. Next, with outcome variable $Y_i(t_{ij})$ and explanatory variables […], GEE can be applied to obtain estimates for $\beta$.
To see the equivalence, note first that the two-step estimates satisfy $\psi_\alpha(\hat\alpha) = 0$ and $\psi_\beta(\hat\alpha, \hat\beta) = 0$ by construction, so they solve the joint equations. Secondly, assume $\alpha^*, \beta^*$ are the solutions to the joint equations. Then $\alpha^*$ is the estimate from the first step, because it is the solution to the estimating equation $\psi_\alpha(\alpha) = 0$. Since $\psi_\beta(\alpha^*, \beta^*) = 0$, $\beta^*$ is the estimate from the second step. Therefore, $\alpha^*, \beta^*$ are the estimates from the two-step approach.
Thus, the two-step approach is equivalent to a simultaneous solution to joint estimating equations.
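The two-step procedure can be sketched in a deliberately simplified setting: one observation per subject, a single surrogate exposure, and no covariates, so that GEE with an independence working correlation reduces to ordinary least squares. Everything here is illustrative and assumed rather than taken from the paper, including the helper `ols`, the sample sizes, the data-generating values, and the use of an external validation sample.

```python
import numpy as np

def ols(design, response):
    """Least-squares coefficients (stand-in for GEE with independence working correlation)."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

rng = np.random.default_rng(1)
n_main, n_val = 5000, 1000            # hypothetical sample sizes
alpha_true = np.array([1.2, 0.7])     # illustrative measurement error model: E(c | C) = a0 + a1 * C
beta_true = np.array([0.5, 2.0])      # illustrative outcome model: E(Y | c) = b0 + b1 * c

def draw(n):
    C = rng.normal(size=n)                                        # surrogate exposure
    c = alpha_true[0] + alpha_true[1] * C + rng.normal(size=n)    # true exposure
    Y = beta_true[0] + beta_true[1] * c + rng.normal(size=n)      # continuous outcome
    return C, c, Y

C_main, _, Y_main = draw(n_main)      # main study: true exposure unobserved
C_val, c_val, _ = draw(n_val)         # validation study: both c and C observed

# Step 1: estimate alpha in the validation study.
alpha_hat = ols(np.column_stack([np.ones(n_val), C_val]), c_val)

# Step 2: plug the predicted exposure into the outcome model in the main study.
x_hat = alpha_hat[0] + alpha_hat[1] * C_main
beta_hat = ols(np.column_stack([np.ones(n_main), x_hat]), Y_main)

# Uncorrected analysis for comparison: regress the outcome on the surrogate itself.
beta_naive = ols(np.column_stack([np.ones(n_main), C_main]), Y_main)
```

In this toy setting the corrected slope equals the uncorrected slope divided by the estimated calibration slope, `beta_naive[1] / alpha_hat[1]`. The standard errors from step 2 alone would ignore the uncertainty in `alpha_hat`, which is why a joint (stacked) variance is needed.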

A.5 Considerations of the Time Scale for Analysis
In this section, we show that the time scale used in the measurement error model does not have to be the same as that used in the main model for valid application of the methods described in this paper. For subject $i$ at the $j$th occasion, let $t_{ij}$ and $t'_{ij}$ denote time since baseline cognitive assessment and age, respectively. When time since baseline is used as the time scale in the outcome model and we want to use age as the time scale in the measurement error model, baseline age, $t'_{i1}$, should be considered in the outcome model as a potential confounder. Let $W_i(t_{ij})_{-t'_{i1}}$ denote all potential confounders excluding $t'_{i1}$. We can rewrite the potential confounders as […]. It follows that formula (3) in the main paper can be rewritten as […]. Following formula (4) in the main paper, […]. Note that $t_{i1}$ is not included as it has a fixed value 0.
Since age at the $k$th occasion, $t'_{ik}$, can be written as the sum of age at baseline and time from baseline to the $k$th occasion (i.e., $t'_{ik} = t'_{i1} + t_{ik}$), there is, given the baseline age $t'_{i1}$, a one-to-one transformation between $t_{ik}$ and $t'_{ik}$. It follows that, given the baseline age, the surrogate exposure, $C_i(t_{ik})$, for the $i$th subject at the $k$th occasion, which uses time since baseline, $t_{ik}$, as the time index, can also be written as $C_i(t'_{ik})$, which uses age as the time index, $k = 1, \ldots, j$. Similarly, $c_i(t_{ik})$ and $W_i(t_{ik})$ can also be written as $c_i(t'_{ik})$ and $W_i(t'_{ik})$. Furthermore, assuming the following localized error assumption with age as the time scale, […], formula (3) can be written as […], where the time scale for the outcome model is time since baseline, while it is age in the measurement error model. This shows that we can use the two different time scales in the outcome and measurement error models under the above localized error assumption, which states that conditional on all the variables at the current age, the true exposure is independent of these variables in the past.
It is easy to show that the above derivation still holds if the interactions between $t_{ij}$ and some or all elements of $W_i(t_{ij})$ are also included in the outcome model. In addition, it holds if we replace age by calendar time, in which case $t'_{i1}$ is the calendar time at baseline and $W_i(t_{ij})$ may include age at $t_{ij}$.

A.6 Relationship between Classical-Type Measurement Error Model and Our Assumed Measurement Error Model
The classical-type error model can be viewed as a special case of our assumed model (see Chapter 2 of Carroll et al., 2006, for more discussion). For example, the classical additive measurement error model $C_{ij} = c_i + e_{ij}$, where $C_{ij}$ is the surrogate exposure for the $j$th occasion of the $i$th individual, $c_i$ is the true exposure of the $i$th individual, and $e_{ij}$ is the corresponding error, implies that […]
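One standard result of this kind is given here as an illustration under added normality assumptions, and it may differ from the authors' own expression. Suppose the true exposure is $c_i \sim N(\mu_c, \sigma_c^2)$ and the error $e_{ij} \sim N(0, \sigma_e^2)$ is independent of $c_i$ (the symbols $\mu_c$, $\sigma_c^2$, $\sigma_e^2$ and $\lambda$ are introduced only for this sketch). Then the classical additive model $C_{ij} = c_i + e_{ij}$ gives a conditional mean of the true exposure that is linear in the surrogate:

$$
E(c_i \mid C_{ij}) = (1 - \lambda)\,\mu_c + \lambda\, C_{ij},
\qquad
\lambda = \frac{\sigma_c^2}{\sigma_c^2 + \sigma_e^2}.
$$

This has the form of a linear model for the true exposure given the surrogate, with intercept $(1-\lambda)\mu_c$ and slope $\lambda \in (0, 1)$.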

Web Appendix B Simulation Study

B.1 Data Generation Process
For each individual $i$, we generated the first observed time point $t_{i1}$ from a uniform distribution with minimum value 0 and maximum value 10, to create a 10-year variation in study entry time. We then set $t_{ij} = t_{i1} + (j - 1)$, $j = 2, 3, 4, 5$, so that each individual was followed for 5 years and information was gathered once every year.
When there was no additional covariate in the outcome model, we generated the surrogate exposure […], where $\Sigma_C$ followed a first-order autoregressive (AR(1)) correlation structure with variance $\sigma_C^2 = 1$ and correlation between adjacent elements $\rho_C = 0.6$. When a covariate was to be considered, we generated the surrogate exposure $C$ and $W$ together, i.e., […], where $\Sigma_C$ followed an AR(1) structure with variance $\sigma_C^2 = 1$ and correlation parameter $\rho_C = 0.6$, $\Sigma_W$ followed an AR(1) structure with variance $\sigma_W^2 = 1$ and correlation parameter $\rho_W = 0.2$, and $\Sigma_{C,W} = \Sigma_{W,C}$ followed a correlation structure with diagonal element $\rho_{C,W} = 0.4$, representing an intermediate level of correlation, when $(C, W)$ were assumed correlated, and $\rho_{C,W} = 0$ when $(C, W)$ were assumed uncorrelated.
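The observation times and the no-covariate surrogate exposure might be generated along the following lines. This is a sketch only: the multivariate normal distribution and the zero mean vector are assumptions, since the text here gives only the covariance structure, and the function names are hypothetical.

```python
import numpy as np

def ar1_cov(m, sigma2, rho):
    """m x m AR(1) covariance matrix: sigma2 * rho^|j - k|."""
    idx = np.arange(m)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def simulate_times_and_surrogate(n, m=5, rng=None):
    """Simulate observation times and surrogate exposures for n subjects.

    Times:     t_i1 ~ Uniform(0, 10), then t_ij = t_i1 + (j - 1) for j = 2..m.
    Surrogate: C_i drawn with AR(1) covariance (variance 1, rho 0.6);
               normality and a zero mean are ASSUMED for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    t1 = rng.uniform(0.0, 10.0, size=n)
    t = t1[:, None] + np.arange(m)[None, :]          # (n, m) yearly visits
    Sigma_C = ar1_cov(m, sigma2=1.0, rho=0.6)
    C = rng.multivariate_normal(np.zeros(m), Sigma_C, size=n)
    return t, C

t, C = simulate_times_and_surrogate(n=1000, rng=np.random.default_rng(2024))
```

With this construction the correlation between visits one year apart is 0.6 and decays geometrically, for example $0.6^4 \approx 0.13$ between the first and fifth visits.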
Next, to generate the true exposure $c$ from the surrogate exposure $C$, we needed to specify the parameters in the measurement error model (MEM). When included in the model, $\alpha_0 = 1.2$, $\alpha_1 = 0.7$, $\alpha_2 = 0.6$, $\alpha_3 = 0.5$, $\alpha_4 = 0.4$. These choices were motivated by the correlation between $c$ and $C$ (0.61) in the illustrative example, resulting in $\mathrm{Corr}(c, C)$ ranging from 0.57 to 0.85, depending upon the exact model used to generate the data.
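To see how a slope parameter translates into a correlation between $c$ and $C$, consider the simplest possible MEM, $c = \alpha_0 + \alpha_1 C + e$ with $e$ independent of $C$. The sketch below computes the implied correlation. The unit error variance is an assumption made purely for illustration (the error variance is not stated in this section), and the function name is hypothetical.

```python
import math

def implied_corr(alpha1, var_C, var_e):
    """Corr(c, C) when c = alpha0 + alpha1 * C + e, with e independent of C.

    Cov(c, C) = alpha1 * var_C and Var(c) = alpha1^2 * var_C + var_e.
    """
    return alpha1 * var_C / math.sqrt(var_C * (alpha1 ** 2 * var_C + var_e))

# alpha1 = 0.7 and Var(C) = 1 as in the text; var_e = 1 is an ASSUMED value.
rho = implied_corr(alpha1=0.7, var_C=1.0, var_e=1.0)
print(round(rho, 2))  # 0.57
```

Under these assumed values the simplest model gives a correlation of about 0.57. Models with additional terms, such as time, interaction, or covariate effects, would generally imply different correlations.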

B.2 Additional Results
Below is a summary of all simulation tables for this paper. Scenarios under True MEM are as defined in Section 3 in the main manuscript. We only considered AR(1) as the working correlation in scenarios where $W$ was included, because when we applied our method to scenarios where $W$ was not present, we found that the bias and coverage probability were invariant to the choice of the correlation structure. Note that Tables Main 1-Main 4 can be found in the main paper, and Tables Supplement 1-12 can be found below, with page numbers provided in this table.

Figure 1: Measurement error corrected and uncorrected rates of global cognitive score by time since baseline. The longest follow-up time in the main study was 10 years. PM2.5 mean was the mean (14 μg/m³) level in the NHS Cognitive Cohort, and baseline age was set as the mean age (74.2 years) at baseline in the NHS Cognitive Cohort.

Table 1: Bias (%), empirical standard error (ESE), estimated sandwich standard error (SE), and coverage probability (CP) of the 95% confidence interval for the estimate of the interaction effect $\beta_3$ based on 1000 simulations, under the IVS Scenario NI-NW in the true MEM. Working correlation specified as AR(1) in the GEE analysis for the outcome model.

Table 2: Bias (%), empirical standard error (ESE), estimated sandwich standard error (SE), and coverage probability (CP) of the 95% confidence interval for the estimate of the interaction effect $\beta_3$ based on 1000 simulations, under the IVS Scenarios NI-WP and NI-WC in the true MEM. Working correlation specified as AR(1) in the GEE analysis for the outcome model.

Table 3: Bias (%), empirical standard error (ESE), estimated sandwich standard error (SE), and coverage probability (CP) of the 95% confidence interval for the estimate of the interaction effect $\beta_3$ based on 1000 simulations, under the IVS Scenario IP-NW in the true MEM. Working correlation specified as AR(1) in the GEE analysis for the outcome model.

Table 5: Bias (%), empirical standard error (ESE), estimated sandwich standard error (SE), and coverage probability (CP) of the 95% confidence interval for the estimate of the interaction effect $\beta_3$ based on 1000 simulations, under the EVS Scenario NI-NW in the true MEM. Working correlation specified as Unstructured in the GEE analysis for the outcome model.

Table 6: Bias (%), empirical standard error (ESE), estimated sandwich standard error (SE), and coverage probability (CP) of the 95% confidence interval for the estimate of the interaction effect $\beta_3$ based on 1000 simulations, under the EVS Scenario IP-NW in the true MEM. Working correlation specified as Unstructured in the GEE analysis for the outcome model.

Table 7: Bias (%), empirical standard error (ESE), estimated sandwich standard error (SE), and coverage probability (CP) of the 95% confidence interval for the estimate of the interaction effect $\beta_3$ based on 1000 simulations, under the EVS Scenario NI-NW in the true MEM. Working correlation specified as Independence in the GEE analysis for the outcome model.

Table 8: Bias (%), empirical standard error (ESE), estimated sandwich standard error (SE), and coverage probability (CP) of the 95% confidence interval for the estimate of the interaction effect $\beta_3$ based on 1000 simulations, under the EVS Scenario IP-NW in the true MEM. Working correlation specified as Independence in the GEE analysis for the outcome model.

Table 9: Bias (%), empirical standard error (ESE), estimated sandwich standard error (SE), and coverage probability (CP) of the 95% confidence interval for the estimate of the interaction effect $\beta_3$ based on 1000 simulations, under the IVS Scenario NI-NW in the true MEM. Working correlation specified as Unstructured in the GEE analysis for the outcome model.

Table 10: Bias (%), empirical standard error (ESE), estimated sandwich standard error (SE), and coverage probability (CP) of the 95% confidence interval for the estimate of the interaction effect $\beta_3$ based on 1000 simulations, under the IVS Scenario IP-NW in the true MEM. Working correlation specified as Unstructured in the GEE analysis for the outcome model.

Table 11: Bias (%), empirical standard error (ESE), estimated sandwich standard error (SE), and coverage probability (CP) of the 95% confidence interval for the estimate of the interaction effect $\beta_3$ based on 1000 simulations, under the IVS Scenario NI-NW in the true MEM. Working correlation specified as Independence in the GEE analysis for the outcome model.
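The four summary measures reported in these tables could be computed from simulation replicates roughly as follows. The definitions used are common conventions and are assumptions here, since the tables' exact formulas are not given: Bias (%) as relative bias of the mean estimate, ESE as the standard deviation of the estimates, SE as the mean of the estimated standard errors, and CP from Wald-type 95% intervals. The function name is hypothetical.

```python
import numpy as np

def summarize_simulation(estimates, std_errors, true_value, z=1.959963984540054):
    """Summarize simulation replicates for one parameter (e.g. beta_3).

    estimates  : point estimates, one per simulated data set
    std_errors : estimated (e.g. sandwich) standard errors, one per data set
    true_value : the value used to generate the data (must be nonzero)
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    bias_pct = 100.0 * (estimates.mean() - true_value) / true_value  # relative bias (assumed definition)
    ese = estimates.std(ddof=1)                                      # empirical standard error
    mean_se = std_errors.mean()                                      # average estimated standard error
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    cp = np.mean((lower <= true_value) & (true_value <= upper))      # coverage of Wald 95% CIs
    return {"bias_pct": bias_pct, "ESE": ese, "SE": mean_se, "CP": cp}

# Tiny usage example with made-up replicates (not results from the paper).
summary = summarize_simulation([0.9, 1.1, 1.0, 1.0], [0.1, 0.1, 0.1, 0.1], true_value=1.0)
```

When the variance estimator works well, ESE and SE should be close to each other and CP should be near 0.95.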