A toolkit for measurement error correction, with a focus on nutritional epidemiology

Exposure measurement error is a problem in many epidemiological studies, including those using biomarkers and measures of dietary intake. Measurement error typically results in biased estimates of exposure-disease associations, with the severity and nature of the bias depending on the form of the error. To correct for the effects of measurement error, information beyond the main study data is required. Ideally, this is a validation sample in which the true exposure is observed. In many situations, however, it is not feasible to observe the true exposure, but one or more repeated exposure measurements may be available, for example, blood pressure or dietary intake recorded at two time points. The aim of this paper is to provide a toolkit for measurement error correction using repeated measurements. We bring together methods covering classical measurement error and several departures from classical error: systematic, heteroscedastic and differential error. The correction methods considered are regression calibration, which is already widely used in the classical error setting, and moment reconstruction and multiple imputation, which are newer approaches with the ability to handle differential error. We emphasize practical application of the methods in nutritional epidemiology and other fields. We primarily consider continuous exposures in the exposure-outcome model, but we also outline methods for use when continuous exposures are categorized. The methods are illustrated using data from a study of the association between fibre intake and colorectal cancer, where fibre intake is measured using a diet diary and repeated measures are available for a subset. © 2014 The Authors.

• The matching variables are age group and sex, and we denote the vector of matching variables by z.m.
Additional adjustment variables are exact age, height, weight, smoking status, social class, physical activity level and education level; all of these except exact age, height and weight are categorical. The vector of adjustment variables is denoted z. In this example it is sufficient for z.m to contain only sex, because exact age appears as an adjustment variable.
• Matched sets are identified by the variable match.
Naive analysis (Table 1)
The naive analysis is performed as follows:

```r
library(survival)  # provides clogit()
naive.analysis <- clogit(y ~ w1.log + z + strata(match), data = mydata)
```

Regression calibration (RC) (Table 1)
RC for classical measurement error was outlined in Section 4.1. To perform RC we first fit the RC model, which is a regression of the second exposure measurement on the first and on all adjustment variables, including the matching variables:

```r
rc.model <- lm(w2.log ~ w1.log + z + z.m, data = mydata)
rc.fitted <- predict.lm(rc.model, mydata)
```

The expectations E(X|W_1, Z) are given by rc.fitted, and the corrected odds ratio estimates are found by using this as the main exposure in the analysis model:

```r
rc.analysis <- clogit(y ~ rc.fitted + z + strata(match), data = mydata)
```

The variance of the corrected log odds ratio estimate (the coefficient of rc.fitted in the above model) is underestimated by this model, because it does not take into account the uncertainty in the parameters estimated in the regression calibration model. The corrected variance estimate (equation (12)) can be obtained as follows:

```r
# Naive log odds ratio and its variance (w1.log is the first term of the naive model)
var.betastar <- naive.analysis$var[1, 1]
betastar <- coef(naive.analysis)[1]
# Calibration slope: coefficient of w1.log in the RC model, and its variance
lambda <- coef(rc.model)[2]
var.lambda <- vcov(rc.model)[2, 2]
# Delta-method variance of betastar / lambda
var.corrected <- var.betastar / lambda^2 + (betastar / lambda^2)^2 * var.lambda
```

Regression calibration can be performed in Stata using the rcal command in the merror package (http://www.stata.com/merror/), which accommodates repeated measures and gives bootstrap estimates of the variance of the corrected estimate. It does not incorporate sensitivity analyses allowing for systematic error.
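As a self-contained check of the RC steps above, the sketch below simulates a hypothetical matched case-control data set and runs the naive model, the RC model and the delta-method variance. The design (300 sets of one case and four controls, classical error on the log scale, a repeat measurement for roughly 30% of individuals, no adjustment variables z or z.m) and the names n.sets, m, beta and x are illustrative assumptions, not the fibre study. Because rc.fitted is a linear function of w1.log and its intercept is absorbed by the matched-set strata, the coefficient of rc.fitted equals betastar/lambda and its model-based variance equals var.betastar/lambda^2; the delta-method formula adds the term for var.lambda.

```r
library(survival)  # clogit()

## Hypothetical simulated matched case-control data (not the fibre study data)
set.seed(2014)
n.sets <- 300; m <- 5; beta <- 0.5          # 1 case + 4 controls per set; true log OR
match <- rep(seq_len(n.sets), each = m)
x <- rnorm(n.sets * m)                      # true exposure, never observed
w1.log <- x + rnorm(n.sets * m, sd = 0.8)   # first measurement, classical error
w2.log <- x + rnorm(n.sets * m, sd = 0.8)   # repeat measurement
w2.log[runif(n.sets * m) > 0.3] <- NA       # repeat available for ~30% only
# Within each set, the case is drawn with probability proportional to exp(beta * x)
y <- unlist(lapply(split(x, match), function(xs) {
  as.integer(seq_along(xs) == sample(length(xs), 1, prob = exp(beta * xs)))
}), use.names = FALSE)
mydata <- data.frame(y, match, w1.log, w2.log)

## Naive analysis
naive.analysis <- clogit(y ~ w1.log + strata(match), data = mydata)

## RC model (fitted on the subset with a repeat) and predicted E(X | W1) for everyone
rc.model <- lm(w2.log ~ w1.log, data = mydata)
mydata$rc.fitted <- predict(rc.model, newdata = mydata)
rc.analysis <- clogit(y ~ rc.fitted + strata(match), data = mydata)

## Delta-method variance of betastar / lambda
var.betastar <- naive.analysis$var[1, 1]
betastar <- coef(naive.analysis)[[1]]
lambda <- coef(rc.model)[[2]]
var.lambda <- vcov(rc.model)[2, 2]
var.corrected <- var.betastar / lambda^2 + (betastar / lambda^2)^2 * var.lambda

c(naive = betastar, rc = coef(rc.analysis)[[1]], se.corrected = sqrt(var.corrected))
```

The naive estimate is attenuated towards zero by roughly the factor lambda, and the corrected standard error is somewhat larger than the one reported directly by rc.analysis.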
Moment reconstruction (MR) (Table 1)
Using the above, the moment reconstructed values are given by

Finally, the analysis model is fitted using the moment reconstructed values as the main exposure:

```r
mr.analysis <- clogit(y ~ x.mr + z + strata(match), data = mydata)
```

The standard error and 95% confidence interval that do not allow for uncertainty in the measurement error estimation ((a) in Table 1) arise directly from the analysis model just fitted. To account for uncertainty in the estimation of the measurement error parameters, we used bootstrapping ((b) in Table 1). Because this is a matched case-control study, we sampled matched sets rather than individuals (there are 318 matched sets in total). We drew 1000 bootstrap samples, performed MR within each sample, and fitted the analysis model using the moment reconstructed values. The bootstrap standard error is the standard deviation of the 1000 bootstrap estimates of the log odds ratio β (est.boot). We used the code given below:

The pooled estimate of the log odds ratio for the main exposure is given by mi.pool$qbar, and its variance by mi.pool$t. Note that pool.scalar is part of the mice package.
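The pooled quantities mentioned above follow Rubin's rules. The sketch below computes them by hand from M completed-data estimates Q (for example, log odds ratios from the analysis model fitted to each imputed data set) and their variances U. The helper name rubin.pool and the input numbers are hypothetical; the component names qbar, ubar, b and t are chosen to mirror those of pool.scalar, which also returns further quantities (such as degrees of freedom) that are omitted here.

```r
# Combine M completed-data estimates with Rubin's rules (hypothetical helper)
rubin.pool <- function(Q, U) {
  M <- length(Q)
  qbar <- mean(Q)                    # pooled point estimate
  ubar <- mean(U)                    # average within-imputation variance
  b <- var(Q)                        # between-imputation variance
  total <- ubar + (1 + 1 / M) * b    # total variance of qbar
  list(qbar = qbar, ubar = ubar, b = b, t = total)
}

# Usage with made-up numbers (not results from the fibre study)
mi.pool <- rubin.pool(Q = c(0.10, 0.20, 0.30), U = c(0.01, 0.01, 0.01))
mi.pool$qbar        # pooled log odds ratio
sqrt(mi.pool$t)     # its standard error
```

The between-imputation term b is what the within-imputation variance alone misses; with only three imputations it is also inflated by the factor 1 + 1/M.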
We used the same bootstrapping procedure as outlined for MR in the section above; the details are not repeated here.
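A minimal sketch of the matched-set bootstrap described for MR, assuming a data frame with a matched-set identifier column match. Whole sets are resampled with replacement, and each drawn copy receives a new set ID so that a set drawn twice enters clogit as two separate strata. Any estimator can be plugged in; as a stand-in for MR, the sketch re-runs a simple RC estimator in every bootstrap sample. The simulated data, the helper names boot.matched.sets and rc.estimate, and B = 200 (rather than the 1000 used in the paper) are illustrative assumptions.

```r
library(survival)  # clogit()

## Hypothetical simulated matched case-control data (not the fibre study data)
set.seed(318)
n.sets <- 300; m <- 5; beta <- 0.5
match <- rep(seq_len(n.sets), each = m)
x <- rnorm(n.sets * m)
w1.log <- x + rnorm(n.sets * m, sd = 0.8)
w2.log <- x + rnorm(n.sets * m, sd = 0.8)
w2.log[runif(n.sets * m) > 0.3] <- NA
y <- unlist(lapply(split(x, match), function(xs) {
  as.integer(seq_along(xs) == sample(length(xs), 1, prob = exp(beta * xs)))
}), use.names = FALSE)
mydata <- data.frame(y, match, w1.log, w2.log)

## Resample whole matched sets with replacement and relabel duplicated sets
boot.matched.sets <- function(data, estimator, B = 1000, set.var = "match") {
  rows <- split(seq_len(nrow(data)), data[[set.var]])   # row indices of each set
  est.boot <- numeric(B)
  for (b in seq_len(B)) {
    drawn <- sample(length(rows), length(rows), replace = TRUE)
    boot.data <- data[unlist(rows[drawn], use.names = FALSE), , drop = FALSE]
    boot.data[[set.var]] <- rep(seq_along(drawn), lengths(rows[drawn]))
    est.boot[b] <- estimator(boot.data)
  }
  est.boot
}

## Stand-in estimator: refit the RC model and the analysis model in each sample
rc.estimate <- function(d) {
  rc.model <- lm(w2.log ~ w1.log, data = d)
  d$rc.fitted <- predict(rc.model, newdata = d)
  unname(coef(clogit(y ~ rc.fitted + strata(match), data = d))[1])
}

est.boot <- boot.matched.sets(mydata, rc.estimate, B = 200)
se.boot <- sd(est.boot)   # bootstrap standard error of the corrected log odds ratio
se.boot
```

Refitting the measurement error model inside each bootstrap sample is what propagates its estimation uncertainty into se.boot; resampling sets rather than rows preserves the matched design.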
In the situation in which the true exposure X is observed in a validation sample, MI can be performed in R using the mice package [1], for example, and in Stata using the ice package [2].

Methods for categorized exposures (Figure 2)
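The categorized analyses in this section start by splitting the log-scale measurement into quintiles with cut() and quantile(). The sketch below checks that step on simulated values, a hypothetical stand-in for mydata$w1.log. Because cut() uses right-closed intervals, include.lowest = TRUE is needed for the smallest observation to be placed in the first quintile; without it, that observation is set to NA.

```r
set.seed(7)
w1.log <- rnorm(1000)                            # simulated log-scale measurements
breaks <- quantile(w1.log, probs = seq(0, 1, 0.2))

# Quintile codes 1 (lowest) to 5 (highest)
w1.q <- cut(w1.log, breaks = breaks, labels = FALSE, include.lowest = TRUE)
table(w1.q)                                      # 200 observations in each quintile

# Without include.lowest the minimum falls outside (breaks[1], breaks[2]]
w1.q.bad <- cut(w1.log, breaks = breaks, labels = FALSE)
sum(is.na(w1.q.bad))                             # 1
```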
For the categorized exposure analyses described in Section 6, assuming classical error, we divided the main exposure into quintiles. The naive categorized exposure analysis, including the results plot (see Figure 2), can be performed as follows:

```r
# Quintiles of the first log-scale measurement (1 = lowest, 5 = highest)
w1.q <- cut(mydata$w1.log,
            breaks = quantile(mydata$w1.log, probs = seq(0, 1, 0.2)),
            labels = FALSE, include.lowest = TRUE)
naive.q <- clogit(y ~ as.factor(w1.q) + z + strata(match), data = mydata)
```

Allowing heteroscedastic error (Table 3)
Finally, we applied the methods for heteroscedastic error correction described in Section 4.4 to the example data (Section 7.3). Here it is assumed that fibre intake on the original scale is the exposure of interest.
If we ignore the evidence for heteroscedastic error on the original scale, RC can be performed as follows, regressing the second measurement on the first as in the classical-error case:

```r
rc.model <- lm(w2 ~ w1 + z + z.m, data = mydata)
rc.fitted <- predict.lm(rc.model, mydata)
rc.analysis <- clogit(y ~ rc.fitted + z + strata(match), data = mydata)
```

The standard error for β allowing for the uncertainty in the measurement error estimation was obtained using the approximation illustrated previously; the details are not given here.
The alternative method, in which we assume constant error variance on the log-transformed scale, was performed as follows:

```r
mr.model <- lm(w1.log ~ z + z.m, data = mydata)
# Individuals with a repeated measurement
mydata2 <- subset(mydata, !is.na(w2.log))
mr.
```

Bootstrapping of this full procedure was used to obtain corrected standard errors. The bootstrapping procedure was shown above, and we do not give the details again here.
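One informal way to look for the heteroscedasticity discussed in this section, sketched here as an assumption rather than as the diagnostic used in the paper, is to correlate the absolute within-person difference |W1 - W2| with the within-person mean, on the original and on the log scale. The data below are simulated with error of constant variance on the log scale (hence multiplicative on the original scale); all parameter values and the helper name het.cor are hypothetical.

```r
set.seed(11)
n <- 2000
x.log <- rnorm(n, mean = 3, sd = 0.4)     # true log-scale intake (hypothetical)
w1.log <- x.log + rnorm(n, sd = 0.3)      # error variance constant on the log scale
w2.log <- x.log + rnorm(n, sd = 0.3)
w1 <- exp(w1.log); w2 <- exp(w2.log)      # the same measurements on the original scale

# Spread of the replicate difference versus the replicate mean
het.cor <- function(a, b) cor(abs(a - b), (a + b) / 2)
corr.orig <- het.cor(w1, w2)              # clearly positive: error grows with intake
corr.log <- het.cor(w1.log, w2.log)       # near zero: roughly constant error variance
c(original = corr.orig, log = corr.log)
```

On real data, a clearly positive correlation on the original scale that largely disappears after log transformation is the kind of pattern that motivates modelling the error on the log scale.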