SIMEX for correction of dietary exposure effects with Box‐Cox transformed data

Modelling dietary data, and especially 24-hr dietary recall (24HDR) data, is a challenge. Ignoring the inherent measurement error (ME) leads to biased effect estimates when the association between an exposure and an outcome is investigated. We propose an adapted simulation extrapolation (SIMEX) algorithm for modelling dietary exposures. For this purpose, we exploit the ME model of the NCI method, which assumes normally distributed errors of the reported intake on the Box-Cox transformed scale and unbiased recalls on the original scale. According to the SIMEX algorithm, remeasurements of the observed data with additional ME are generated in order to estimate the association between the level of ME and the resulting effect estimate. Subsequently, this association is extrapolated to the case of zero ME to obtain the corrected estimate. We show that the proposed method fulfils the key property of the SIMEX approach, that is, that the MSE of the generated data will converge to zero if the ME variance converges to zero. Furthermore, the method is applied to real 24HDR data of the I.Family study to correct the effects of salt and alcohol intake on blood pressure. In a simulation study, the method is compared with the NCI method, resulting in effect estimates with either smaller MSE or smaller bias in certain situations. In addition, we found our method to be more informative and easier to implement. Therefore, we conclude that the proposed method is useful to promote the dissemination of ME correction methods in nutritional epidemiology.


INTRODUCTION
Measurement error (ME) can lead to seriously wrong conclusions about associations between exposures and health outcomes (Carroll, Ruppert, Stefanski, & Crainiceanu, 2006). Correction methods can reduce the negative consequences of ME. These methods play a big role in modelling exposures based on data from dietary assessment tools such as the 24-hr dietary recall (24HDR) (Souverein et al., 2011). The 24HDR is used to repeatedly record the participant's dietary intake of entire days based on self-reports delivered on the following day. Since the self-reports are both error-prone and assessed on a daily basis, the data are strongly affected by intra-individual variation and do not reflect the usual intake that nutritional epidemiologists are primarily interested in (Boeing & Margetts, 2014). Furthermore, the intake distributions are usually positively skewed and sometimes zero-inflated.
The most commonly used method for modelling exposures based on 24HDR data that accounts for their special characteristics is the NCI method (Tooze et al., 2006; Kipnis et al., 2009). This method follows the regression calibration approach, which consists of two parts. First, following a ME model for 24HDR data, the estimated conditional expectation of the usual intake given the observed 24HDR and other covariates is calculated. The ME model accounts for the skewness of the intake distributions by using the Box-Cox transformation. Second, in a health model, the estimated usual intake is used instead of the unknown true usual intake. The assumed ME model of the NCI method describes the association between the true and measured intake, and the health model describes the association between usual intake and health outcome. Therefore, the NCI method provides nearly unbiased estimates for the association between dietary intake and health outcome (Kipnis et al., 2009) and has been used in various studies (Börnhorst et al., 2014; Liese et al., 2015; Hebestreit et al., 2017; Intemann et al., 2018).
However, a recent study by Shaw et al. (2018) about the usage of ME correction methods shows that studies with inadequate correction for ME remain common in nutritional epidemiology. Following the Measurement Error and Misclassification Topic Group of the STRATOS Initiative, it is important to raise awareness for ME problems and to further promote correction methods and their use (Freedman & Kipnis, 2018).
The simulation extrapolation (SIMEX) method (Cook & Stefanski, 1994; Stefanski & Cook, 1995), a promising, clear, and easy-to-implement correction method, has not been used for modelling 24HDR data, although it is, aside from regression calibration, one of the most prominent approaches. Basically, this method can be used in any situation where the underlying error model can be simulated by Monte Carlo methods. The idea of SIMEX is to add well-defined error terms to the observed variable, determine its association with a health outcome, and extrapolate the resulting effect estimates back to zero ME. For this purpose, remeasurements of the original data with varying levels of additional error are generated. Based on the generated data, effect estimates are calculated and the functional association between these estimates and the level of ME is estimated. For some common error models, SIMEX is implemented in statistical software packages, for example, in the R package simex (Lederer & Küchenhoff, 2006). The aim of this paper is therefore to adapt the SIMEX algorithm for a model of 24HDR data, to provide a theoretical justification for SIMEX in this situation, and to investigate the algorithm in an application as well as in a simulation study. The paper is structured as follows. Section 2 describes the error and health model used throughout. In Sections 3, 4, and 5, the SIMEX algorithm is introduced in the context of 24HDR data, its properties are discussed, and its extension for episodically consumed foods is given. The proofs of the properties can be found in Appendices A.1 and A.2. The focus of these proofs is on the mean squared error (MSE) of the generated SIMEX data in the case that the ME converges to zero. In Section 6, the SIMEX algorithm is applied to real 24HDR data of the I.Family study (Ahrens et al., 2017) to investigate the association between salt and alcohol intake and blood pressure. These applications serve as a template for the simulation study in Section 7, where the SIMEX algorithm is compared with the NCI method. Finally, the results are discussed.

MEASUREMENT ERROR AND HEALTH MODEL
The ME and health models for 24HDR data used in this paper were developed by Dodd et al. (2006), Tooze et al. (2006), and Kipnis et al. (2009). The reported intake of individual $i$ on day $j$ is denoted as $R_{ij}$ and the true individual usual intake as $T_i$, $i = 1, \dots, n$, $j = 1, \dots, k_i$. We assume that $R_{ij}$ is unbiased for $T_i$ on the original scale, following the convention in nutritional epidemiology. Furthermore, we assume that (i) there exists a Box-Cox transformation $g(x) = (x^{\lambda} - 1)/\lambda$ if $\lambda > 0$ and $g(x) = \log(x)$ if $\lambda = 0$ such that

$$g(R_{ij}) = \mathrm{E}(g(R_{ij})) + \varepsilon_{ij} \qquad (1)$$

with independent random variables $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_{\varepsilon}^2)$, and (ii) the regression model $\mathrm{E}(g(R_{ij})) = \beta_0 + \beta_X^{\top} X_i + u_i$ holds, where $u_i \sim \mathcal{N}(0, \sigma_u^2)$ are independent random variables, $\beta_0$ and $\beta_X$ are parameters, and $X_i$ are vectors of error-free covariates. Combining both gives the non-linear mixed effects error model

$$g(R_{ij}) = \beta_0 + \beta_X^{\top} X_i + u_i + \varepsilon_{ij}. \qquad (2)$$

The random variables $u_i$ and $\varepsilon_{ij}$ are assumed to be mutually independent. The former reflects the inter-individual and the latter the intra-individual variation. It is important to note that $g(R_{ij})$ is biased for $g(T_i)$ since it is assumed that $R_{ij}$ is unbiased for $T_i$ on the original scale. For modelling the association between the usual intake and a health outcome $Y_i$, the so-called health model is defined as the following regression model

$$Y_i = \alpha_0 + \alpha_T T_i + \alpha_{X'}^{\top} X_i' + e_i \qquad (3)$$

with the parameters $\alpha_0$, $\alpha_T$, and $\alpha_{X'}$ and the error-free covariates $X_i'$, which are all included in $X_i$. If the error model is ignored and the individual mean $\bar{R}_{i\bullet}$ is used instead of $T_i$ in the naïve health model

$$Y_i = \alpha_0^{*} + \alpha_T^{*} \bar{R}_{i\bullet} + \alpha_{X'}^{*\top} X_i' + e_i^{*}, \qquad (4)$$

the least-squares estimator of $\alpha_T^{*}$ (the naïve estimator) will be biased for $\alpha_T$. To reduce this bias, an adapted SIMEX algorithm is proposed in the next section.
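To make the error model concrete, the following is a minimal Python sketch of the Box-Cox transformation, its inverse, and a draw of recalls from the non-linear mixed effects error model (2). The function names, argument names, and all numerical inputs are illustrative assumptions and are not taken from the paper or its supplementary code.

```python
import numpy as np

def boxcox(x, lam):
    """Box-Cox transformation g(x); reduces to log(x) for lam == 0."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x**lam - 1.0) / lam

def boxcox_inv(y, lam):
    """Inverse transformation g^{-1}(y); NaN where lam * y + 1 < 0."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.exp(y)
    base = lam * y + 1.0
    # abs() only avoids warnings; invalid entries are masked as NaN
    return np.where(base >= 0, np.abs(base) ** (1.0 / lam), np.nan)

def simulate_recalls(rng, x, beta0, beta_x, sigma_u, sigma_eps, lam, k):
    """Draw k recalls per individual from error model (2), on the original scale."""
    n = x.shape[0]
    u = rng.normal(0.0, sigma_u, size=n)            # inter-individual variation
    eps = rng.normal(0.0, sigma_eps, size=(n, k))   # intra-individual variation
    transformed = (beta0 + x @ beta_x + u)[:, None] + eps
    return boxcox_inv(transformed, lam)
```

The sketch draws the same number of recalls `k` for every individual for simplicity; a varying number of recalls per person would need a ragged layout or a long-format table.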

ADAPTED SIMEX ALGORITHM
The classical SIMEX algorithm is based on the assumption of normally distributed error on the original scale. Therefore, an adaptation is necessary to account for the transformation in error model (2). The proposed adapted algorithm consists of the following steps:
(i) If $\lambda$ and $\sigma_{\varepsilon}^2$ are unknown, the parameters of the non-linear mixed model (2) are estimated using a maximum likelihood (ML)-like and a restricted ML approach (for details see Appendix A.3).
(ii) Following Carroll et al. (2006), in the simulation step the equation

$$R_{ij}^{(b)}(\zeta) = g^{-1}\big(g(R_{ij}) + U_{ij}^{(b)}(\zeta)\big) \qquad (5)$$

is used to generate remeasurements of $R_{ij}$ in the $b$-th data set, $b = 1, \dots, B$. The function $g^{-1}(x) = (\lambda x + 1)^{1/\lambda}$ if $\lambda > 0$ and $g^{-1}(x) = \exp(x)$ if $\lambda = 0$ is the inverse function of $g$, the pseudo-random variable $U_{ij}^{(b)}(\zeta)$ follows a normal distribution $\mathcal{N}(\mu(\zeta), \zeta\sigma_{\varepsilon}^2)$, and $\zeta \in \mathcal{Z}$. For the choice of values for $B$ and $\mathcal{Z}$ see, for example, the study in Section 4. For the calculation of the corrective expected value $\mu(\zeta)$, we refer to Section 4. It guarantees for each individual that $R_{ij}^{(b)}(\zeta)$ is (approximately) unbiased for $T_i$ on the original scale. Using the Box-Cox transformation with $\lambda > 0$, Equation (5) leads to

$$R_{ij}^{(b)}(\zeta) = \big(R_{ij}^{\lambda} + \lambda U_{ij}^{(b)}(\zeta)\big)^{1/\lambda}. \qquad (6)$$

(iii) The naïve health model (4), where $\bar{R}_{i\bullet}$ is replaced by $\bar{R}_{i\bullet}^{(b)}(\zeta)$, is fitted to the $\#\mathcal{Z} \times B$ generated data sets. The corresponding effect estimates are denoted as $\hat{\alpha}_T^{(b)}(\zeta)$.
(iv) This and the following steps are conducted according to the classical SIMEX algorithm. First, for each $\zeta \in \mathcal{Z}$, the arithmetic mean of the estimates $\hat{\alpha}_T^{(b)}(\zeta)$ of all $B$ generated data sets is calculated. It is denoted by $\hat{\alpha}_T(\zeta)$.
(v) Then, the $\hat{\alpha}_T(\zeta)$ are plotted against $\zeta \in \mathcal{Z}$. The estimate for $\alpha_T^{*}$ based on the naïve health model, $\hat{\alpha}_T^{*}$, is denoted as $\hat{\alpha}_T(0)$ and is plotted against $\zeta = 0$. Using a regression model, the association between $\zeta \in \mathcal{Z} \cup \{0\}$ as independent variable and the corresponding $\hat{\alpha}_T(\zeta)$ as dependent variable is estimated. For this purpose, the polynomial function $\mathcal{G}_P(\zeta) = \gamma_1 + \gamma_2 \zeta + \dots + \gamma_{p+1} \zeta^{p}$ or the rational linear function $\mathcal{G}_{RL}(\zeta) = \gamma_1 + \gamma_2/(\gamma_3 + \zeta)$ is used. For $p = 2$ the algorithm is called SIMEX-Q, for $p = 3$ SIMEX-C, for $p = 4$ SIMEX-Q4, and for $p = 5$ SIMEX-Q5. If $\mathcal{G}_{RL}$ is used, the algorithm is called SIMEX-RL, which has the following advantageous theoretical property under the classical error model: in a multiple linear regression model, the effect estimator for the error-prone variable depends on $\zeta$ and has the form of $\mathcal{G}_{RL}(\zeta)$ (for details see Carroll et al., 2006). However, the rational linear function has two drawbacks: (a) the success of the model fit depends on the choice of start parameters for $\gamma_1$, $\gamma_2$, and $\gamma_3$, and (b) $\gamma_3 \in [0, 1]$ leads to a singularity for $\zeta \in [-1, 0]$ (Carroll et al., 2006).
(vi) The extrapolation step is the last step. The argument $\zeta = -1$ is inserted in the estimated function $\hat{\mathcal{G}}(\zeta)$. The result $\hat{\mathcal{G}}(-1) = \hat{\alpha}_T^{\mathrm{SIMEX}}$ is denoted as the SIMEX estimate. The justification for the extrapolation is given in the following section. If corrected effect estimates $\hat{\alpha}_0^{\mathrm{SIMEX}}$ and $\hat{\alpha}_{X'}^{\mathrm{SIMEX}}$ for the intercept and the remaining covariates are required, steps (iii)-(vi) are repeated for the parameters $\alpha_0$ and $\alpha_{X'}$.

FIGURE 1 The typical SIMEX plot shows the naïve effect estimate (at $\zeta = 0$), the mean effect estimates for $\zeta \in \mathcal{Z}$, and the SIMEX-Q corrected estimate $\hat{\alpha}_T^{\mathrm{SIMEX\text{-}Q}}$ (at $\zeta = -1$), which is based on the plotted quadratic extrapolation function
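Steps (v) and (vi) can be sketched in a few lines of Python. The sketch below fits a polynomial extrapolation function by least squares and evaluates it at the SIMEX argument −1. The function name and the toy input values are assumptions made purely for illustration; the toy curve is an artificial attenuation pattern, not output of the adapted algorithm.

```python
import numpy as np

def simex_extrapolate(zetas, estimates, degree=2):
    """Fit a polynomial in zeta to the mean estimates and evaluate it at zeta = -1."""
    coefs = np.polyfit(np.asarray(zetas, dtype=float),
                       np.asarray(estimates, dtype=float), deg=degree)
    return float(np.polyval(coefs, -1.0))

# Toy illustration with made-up numbers: an effect of 1.0 attenuated as 2 / (3 + zeta),
# so the error-free value at zeta = -1 is exactly 1.0 and the naive value at zeta = 0 is 2/3.
zetas = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0])
alpha_hat = 2.0 / (3.0 + zetas)

simex_q = simex_extrapolate(zetas, alpha_hat, degree=2)    # quadratic (SIMEX-Q)
simex_q4 = simex_extrapolate(zetas, alpha_hat, degree=4)   # quartic (SIMEX-Q4)
```

In this noise-free toy case both extrapolations land between the naïve value and the error-free value, and the quartic one comes closer. With noisy estimates, higher polynomial degrees also extrapolate more of the noise.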
Steps (v) and (vi) are illustrated in Figure 1 for SIMEX-Q using the example introduced in Section 6. Furthermore, it is to be noted that in step (ii), if $R_{ij}^{\lambda} + \lambda U_{ij}^{(b)}(\zeta) < 0$ for realisations of $R_{ij}$ and $U_{ij}^{(b)}(\zeta)$, the realisation of $R_{ij}^{(b)}(\zeta)$ will not exist for all $\lambda$. Since $R_{ij}$ is assumed to be greater than zero, this can only occur for very small values of $U_{ij}^{(b)}(\zeta)$, which may in particular occur when the variance of $U_{ij}^{(b)}(\zeta)$ is large. In this case, a positive constant $c$ must be added to $R_{ij}$ and the algorithm must be restarted with $R_{ij} + c$ instead of $R_{ij}$. Since this procedure is always feasible, we assume without loss of generality that all realisations of $R_{ij}^{(b)}(\zeta)$ exist.
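The simulation step, including the existence check just described, might be sketched as follows. This is a hedged illustration assuming a Box-Cox parameter greater than zero and additional error with variance equal to zeta times the error variance; the function names are hypothetical, and the corrective expected value is passed in as the argument `mu` rather than computed, because its formula is not reproduced in this section.

```python
import numpy as np

def generate_remeasurements(rng, recalls, lam, sigma2_eps, zeta, mu=0.0):
    """Generate one SIMEX data set from positive recalls (lam > 0 assumed).

    Returns the generated values and a boolean mask of entries for which
    recalls**lam + lam * U < 0, i.e. for which no back-transformed value exists.
    """
    recalls = np.asarray(recalls, dtype=float)
    u = rng.normal(mu, np.sqrt(zeta * sigma2_eps), size=recalls.shape)
    base = recalls**lam + lam * u
    invalid = base < 0
    generated = np.where(invalid, np.nan, np.abs(base) ** (1.0 / lam))
    return generated, invalid

def replace_invalid_with_minimum(generated, invalid):
    """Pragmatic fix: set non-existing values to the minimum of the generated data set."""
    out = generated.copy()
    if invalid.any() and not invalid.all():
        out[invalid] = np.nanmin(generated)
    return out
```

Returning the mask separately leaves the choice of remedy to the caller: shift all recalls by a positive constant and regenerate, or, if only very few values are affected, replace them by the minimum of the generated data set.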

Another measurement error correction: The NCI method
Here, we briefly describe the NCI method (Tooze et al., 2006; Kipnis et al., 2009) used in this paper for comparison with the SIMEX approach. The method follows a regression calibration approach under the measurement error model presented in Section 2. As in the proposed SIMEX algorithm, the non-linear measurement error model (2) is fitted first. Then, based on the estimated parameters $\hat{\lambda}$, $\hat{\sigma}_{\varepsilon}^2$, $\hat{\sigma}_u^2$, $\hat{\beta}_0$, and $\hat{\beta}_X$, the usual intake is estimated by its conditional expectation given the observed recalls and covariates (Equation (7)), which can be approximated by a Taylor series expansion (Equation (8)) that involves the inverse Box-Cox transformation $g^{-1}$ as introduced above. This approach can be extended to take into account the consumption probabilities $P(R_{ij} > 0)$ if episodically consumed food data with excess zeros are investigated. Adaptive Gaussian quadrature is used to estimate (8) (for details see Kipnis et al., 2009). Once the usual intake is estimated, it is plugged into the health model (3) in place of the true usual intake to derive the corrected estimate for $\alpha_T$.
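The role of the Taylor series expansion can be illustrated with a small Python sketch. It approximates the expectation of a back-transformed value when normally distributed within-person error is added on the transformed scale, using a second-order expansion around a given person-specific value `v`. This is only an assumption-laden illustration of the idea: the exact formulas of the NCI method are not reproduced here, and the integration over the person-specific random effect by quadrature is omitted. All names are hypothetical.

```python
import numpy as np

def boxcox_inv(y, lam):
    """Inverse Box-Cox transformation (lam * y + 1 > 0 assumed for lam > 0)."""
    y = np.asarray(y, dtype=float)
    return np.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)

def backtransform_taylor(v, sigma2_eps, lam):
    """Second-order Taylor approximation of E[g^{-1}(v + eps)], eps ~ N(0, sigma2_eps)."""
    v = np.asarray(v, dtype=float)
    if lam == 0:
        second_derivative = np.exp(v)
    else:
        second_derivative = (1.0 - lam) * (lam * v + 1.0) ** (1.0 / lam - 2.0)
    return boxcox_inv(v, lam) + 0.5 * sigma2_eps * second_derivative

# For lam = 0.5 the inverse transformation is a quadratic, so the expansion is exact:
# (0.5 * 2 + 1)**2 + 0.25 * 0.4 = 4.1
approx = backtransform_taylor(2.0, 0.4, 0.5)
```

The correction term shows why simply back-transforming the mean on the transformed scale is biased on the original scale whenever the Box-Cox parameter differs from one.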

PROPERTIES OF THE GENERATED DATA $R_{ij}^{(b)}(\zeta)$
When investigating the properties of the SIMEX algorithm, only the ME terms of all random variables are of interest. Therefore, $T_i$ and $\mathrm{E}(g(R_{ij}))$ are assumed to be given in what follows. The focus is on the MSE of $R_{ij}^{(b)}(\zeta)$ for each individual $i$ for $\zeta \to -1$. The question is whether in this case the key property

$$\lim_{\zeta \to -1} \mathrm{MSE}\big(R_{ij}^{(b)}(\zeta)\big) = 0 \qquad (9)$$

holds. If so, the SIMEX algorithm with the extrapolation to $\zeta = -1$ is justified. In this hypothetical case, the generated data would be error-free for $\zeta = -1$ and the parameter estimate for these data would correspond to the parameter estimate of the unknown true data. It is important to note that, according to the SIMEX algorithm, $R_{ij}^{(b)}(\zeta)$ itself is not extrapolated but the corresponding parameter estimate of the health model is.

EXTENSION OF THE ERROR AND HEALTH MODEL
The health model is not restricted to multiple linear regression models. Depending on the outcome variable or the study design, logistic regression models or mixed models can be used as the health model. The error model can be extended in two different ways. The first extension is useful for episodically consumed dietary components, such as alcohol or fish, which show a high proportion of zeros in daily consumption. According to the error model described in Kipnis et al. (2009), the true individual usual intake $T_i$ is given by the product of the consumption probability and the expected intake on a consumption day:

$$T_i = P(T_{ij} > 0)\, \mathrm{E}(T_{ij} \mid T_{ij} > 0), \qquad (12)$$

where $T_{ij}$ denotes the true individual daily intake. The case $R_{ij} = 0$ is allowed and occurs if and only if $T_{ij}$ is also zero. This implies that $R_{ij}$ is error-free for $T_{ij}$ if $R_{ij} = 0$, as $R_{ij} = T_{ij}$ when $R_{ij} = 0$, and further $P(R_{ij} > 0) = P(T_{ij} > 0)$. To take this into account, the first and second steps of the SIMEX algorithm have to be adapted as follows.
(i) Since the error model (2) is only true for $R_{ij} > 0$, the required parameters are estimated using only data with $R_{ij} > 0$ in the first step. This model is called the amount model.
(ii) In the second step, $R_{ij}^{(b)}(\zeta)$ is set to zero if $R_{ij} = 0$ and is generated according to Equations (5) and (6) if $R_{ij} > 0$.
From steps (i) and (ii), it follows that $R_{ij}^{(b)}(\zeta) = R_{ij}$ if $R_{ij} = 0$ and also $P(R_{ij} > 0) = P(R_{ij}^{(b)}(\zeta) > 0)$. Given $T_{ij}$, the key property (9) holds for each individual $i$ on each day $j$; this is obvious for $R_{ij} = 0$ since $T_{ij} = R_{ij}^{(b)}(\zeta) = 0$, and is shown in Appendix A.1 for $R_{ij} > 0$. Subsequently, in step three the full data set with $R_{ij} > 0$ and $R_{ij} = 0$ is again used to calculate the individual means $\bar{R}_{i\bullet}^{(b)}(\zeta)$. The remaining steps of the algorithm are then carried out without further modifications. It is worth noting that in a recently proposed error model for 24HDR data, the consumption probability is modelled as in the NCI method and only the amount model is modelled differently (Agogo, 2017). For an extensive discussion of alternatives for modelling excess zeros in 24HDR data, see Kipnis et al. (2009).
If the constant term $c$ is used to avoid the transformation error mentioned in Section 3, there are no more zeros in the data. Nevertheless, $R_{ij} = c$ is then treated as $R_{ij} = 0$ and the data set must be split into two subsets accordingly. When $\bar{R}_{i\bullet}^{(b)}(\zeta)$ is calculated, $c$ can again be subtracted, although this is not strictly necessary since the estimation of $\alpha_T$ is not affected. The second extension can be used if the error variance $\sigma_{\varepsilon}^2$ is assumed to differ between groups. For example, this could be the case if different age groups are included in the study population. Then each of the $G$ groups is assumed to have its own error variance: $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_{\varepsilon,g}^2)$, $g = 1, \dots, G$. This can be easily incorporated in the first and second steps of the SIMEX algorithm.
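A minimal Python sketch of the adapted simulation step for episodically consumed components is given below. It assumes a Box-Cox parameter greater than zero, keeps zero recalls at exactly zero, and remeasures positive recalls only. The function names are hypothetical, the corrective expected value is passed in as `mu`, and the sketch raises an error instead of applying one of the remedies for non-existing values.

```python
import numpy as np

def generate_episodic(rng, recalls, lam, sigma2_eps, zeta, mu=0.0):
    """Remeasure positive recalls only; zero recalls stay exactly zero."""
    recalls = np.asarray(recalls, dtype=float)
    out = np.zeros_like(recalls)
    positive = recalls > 0
    u = rng.normal(mu, np.sqrt(zeta * sigma2_eps), size=int(positive.sum()))
    base = recalls[positive] ** lam + lam * u
    if (base < 0).any():
        raise ValueError("back-transformation does not exist; shift recalls by a constant")
    out[positive] = base ** (1.0 / lam)
    return out

def individual_means(ids, values):
    """Mean per individual; ids must be integers 0..n-1, each occurring at least once."""
    ids = np.asarray(ids)
    return np.bincount(ids, weights=values) / np.bincount(ids)
```

A group-specific error variance could be handled in the same way by passing an array of variances aligned with `recalls` instead of the scalar `sigma2_eps`, since NumPy broadcasts the scale argument elementwise.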

APPLICATION: ASSOCIATION OF BLOOD PRESSURE WITH SALT AND ALCOHOL INTAKE
For illustrative purposes, the adapted SIMEX algorithm was applied to real data from the I.Family study, which is a multi-centre study described in detail in Ahrens et al. (2017). It is a follow-up of the IDEFICS study aiming to investigate the causes of diet- and lifestyle-related diseases in children from eight European countries (Italy, Estonia, Cyprus, Belgium, Sweden, Germany, Hungary, and Spain) (Ahrens et al., 2011). As part of the I.Family study, a web-based 24HDR was used to assess the diet of children and their parents (Hebestreit, Wolters, Jilani, Eiben, & Pala, 2018). In addition, systolic blood pressure (SBP) and the body mass index (BMI) were assessed. All institutional and governmental regulations concerning the ethical use of human volunteers were followed. Each survey centre obtained ethical approvals from the local responsible authorities in accordance with the ethical standards of the 1964 Declaration of Helsinki and its later amendments. In this study, data from 878 male adults were used to investigate the association between salt intake and blood pressure and between alcohol intake and blood pressure. The analyses were based on 1856 recalls in which the intake of at least 500 kcal of energy was reported. The number of recalls per participant varied (38% recalled one day, 16% two days, 42% three days, and 4% four or more days). Even though no one reported zero salt consumption, an intake of less than 1 g of salt was assumed to be implausibly low. Therefore, these recalls were excluded from the salt intake analysis, which slightly reduced the number of included individuals and recalls to 873 and 1834, respectively. The adapted SIMEX algorithm was applied to the salt and alcohol intake data. The covariates age, BMI, and country were used in both the ME and the health models. Since alcohol is not consumed daily (the percentage of zeros was 56%), the extended approach of Section 5 was applied to the alcohol intake data.
For each value of $\zeta \in \mathcal{Z}$, the number of generated data sets was set to $B = 200$, with $\mathcal{Z} = \{1/4, 2/4, 3/4, 1, 5/4, 6/4, 7/4, 2\}$. In the extrapolation step, the quadratic and quartic functions were used to calculate different SIMEX effect estimates. To illustrate the influence of the corrective expected value, the SIMEX data for salt intake were also generated with $\mu(\zeta) = 0$ for $\zeta = 2$. Furthermore, the naïve and the NCI effect estimates, using the estimated usual intakes and assuming the same health and error models, were calculated for comparison with the SIMEX effect estimates.
The variability of the estimates of $\alpha_T$ was assessed via the bootstrap (Efron & Tibshirani, 1993). We drew 500 bootstrap samples with replacement, each containing the same number of individuals as the original salt and alcohol intake data sets. Then, we applied the same methods as for the original data sets to the bootstrap samples and estimated the sample standard deviations of the 500 resulting effect estimates.
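The bootstrap procedure can be sketched generically in Python. Individuals, not single recalls, are resampled, so that all recalls of a selected person stay together. The function name, the callback interface, and the toy data below are assumptions for illustration; in practice the callback would rerun the complete correction method on the resampled individuals.

```python
import numpy as np

def bootstrap_se(rng, ids, estimator, n_boot=500):
    """Bootstrap standard error of estimator(sampled_ids), resampling individuals."""
    ids = np.asarray(ids)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(ids, size=ids.size, replace=True)
        estimates[b] = estimator(sample)
    return float(np.std(estimates, ddof=1))

# Toy illustration with made-up data: slope of an outcome on a mean intake.
rng = np.random.default_rng(1)
x = rng.normal(10.0, 2.0, size=200)
y = 120.0 + 0.5 * x + rng.normal(0.0, 5.0, size=200)
slope = lambda idx: np.polyfit(x[idx], y[idx], 1)[0]
se_slope = bootstrap_se(rng, np.arange(200), slope, n_boot=500)
```

Because each bootstrap replicate of a SIMEX analysis itself fits the health model many times, the total computational cost grows multiplicatively with the number of bootstrap samples.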
The estimated parameters of the error models can be found in Appendix A.3. Figure 3 shows that the inclusion of the corrective expected value resulted in a reduced bias of the generated data with regard to the true observations. The bias of the generated data using the corrective expected value was much smaller than that of the generated data without the corrective expected value.
All the different effect estimates suggest positive associations of SBP with salt and alcohol intake (Table 1). The lowest effect estimates resulted from the naïve method. Ignoring the ME led to an estimated expected increase of 0.24 mmHg in SBP per 1 gram salt intake, whereas the corrected effect estimates were 1.7-2.7 times higher than the naïve estimate. Similar results were observed for alcohol intake. Ignoring the ME led to an estimated increase of 0.339 mmHg in SBP per 10 grams alcohol intake, whereas the corrected effect estimates were 1.4-2.8 times higher than the naïve estimate. In the salt and alcohol intake analyses, the estimated standard errors were lowest for the naïve estimates and highest for the NCI estimates. The SIMEX-Q4 standard errors were comparable with those of the NCI method and 1.4-1.5 times higher than those of SIMEX-Q. That means SIMEX-Q4 and the NCI method led to a similar amount of uncertainty. Although statistically significant associations were found in other applications using the NCI method, this was not the case for the correction methods in the applications investigated in this paper if 95% confidence intervals based on the estimated standard errors were used. Nevertheless, the correction methods led to more relevant associations from a public health perspective compared to the naïve approach, and the uncertainty can be reduced by increasing the sample size in future studies.

FIGURE 3 Comparison of $R_{ij}$ and average $R_{ij}^{(b)}(\zeta)$ for $\zeta = 2$ according to the adapted SIMEX algorithm based on $B = 200$ generated SIMEX data sets (A) without bias correction using $\mu(\zeta) = 0$ and (B) with bias correction using $\mu(\zeta)$ according to Equation (11). The regression lines are $y = 0.426 + 1.096x$ and $y = 0.036 + 1.034x$

TABLE 1 Effect estimates calculated with different methods (naïve, NCI method, and SIMEX with quadratic and quartic extrapolation function) and corresponding bootstrap standard errors (SE) for the association between salt intake (in grams) and systolic blood pressure (in mmHg) and between alcohol intake (in 10 grams) and systolic blood pressure (in mmHg) in male adults adjusted for age, body mass index, and country

SIMULATION STUDY
The set-up of the simulation study was based on the studies conducted by Kipnis et al. (2009) and Agogo (2017), while the data came from the I.Family study and the models from the applications above. In total, 500 data sets were simulated for each scenario and each sample size. The sample sizes were $n$ = 1,000; 500; 300; 200; and 100 individuals. As in the above applications, the number of recalls was assumed to vary per individual to roughly represent the distribution found in the intake data (35% with one recall, 20% with two recalls, 40% with three recalls, and 5% with four recalls), that is, the data sets consist of 2,150; 1,075; 645; 430; and 215 observations. For each individual, the combined covariate information for age, BMI, and country was sampled with replacement from the original data set of the application studies. For each data set and each individual, 1,000 recalls were simulated on the transformed scale based on the error models described in Appendix A.3. These values were back-transformed and the individual average was calculated, which was used as true usual intake $T_i$. The same procedure (without averaging) was conducted to simulate one, two, three, or four recalls for each individual. The health outcome SBP was generated from the health models of the application studies based on the simulated true intake (for details see Appendix A.3). The true coefficients of the salt and alcohol intake were chosen to be 0.5 and 0.9 (cf. Table 1), which correspond to an expected increase of 0.5 mmHg in SBP per 1 gram salt intake and of 0.9 mmHg in SBP per 10 grams alcohol intake, respectively.
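The stated numbers of observations follow directly from the recall distribution, and the construction of the true usual intake can be sketched in the same spirit. In the Python sketch below, the function name and its arguments are hypothetical; `linear_predictor` stands for the person-specific mean on the transformed scale, and the clipping at zero is a guard added for this sketch rather than part of the described procedure.

```python
import numpy as np

# Percentage of individuals with 1, 2, 3, and 4 recalls
recall_percent = {1: 35, 2: 20, 3: 40, 4: 5}
sample_sizes = [1000, 500, 300, 200, 100]

# Recalls per 100 individuals: 35*1 + 20*2 + 40*3 + 5*4 = 215
recalls_per_100 = sum(k * pct for k, pct in recall_percent.items())
observations = [n * recalls_per_100 // 100 for n in sample_sizes]

def true_usual_intake(rng, linear_predictor, sigma_eps, lam, n_days=1000):
    """Average of n_days back-transformed simulated recalls per individual."""
    lp = np.asarray(linear_predictor, dtype=float)
    transformed = lp[:, None] + rng.normal(0.0, sigma_eps, size=(lp.size, n_days))
    if lam == 0:
        daily = np.exp(transformed)
    else:
        # Clipping at zero is an assumption of this sketch to keep the power defined
        daily = np.clip(lam * transformed + 1.0, 0.0, None) ** (1.0 / lam)
    return daily.mean(axis=1)
```

Averaging many simulated days mirrors the definition of usual intake as the long-run average of daily intake on the original scale.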
The same correction methods as for the real data were applied to these simulated data sets. Furthermore, the estimates based on the true usual intake were calculated. As mentioned in Section 3, a problem occurs if $R_{ij}^{\lambda} + \lambda U_{ij}^{(b)}(\zeta) < 0$. This problem could not be completely avoided in the simulation study, since a total of 2 × 500 × (2,150 + 1,075 + 645 + 430 + 215) × 8 × 200 = 7.2 × 10^9 observations were generated in the SIMEX algorithm. The proportion of negative values of $R_{ij}^{\lambda} + \lambda U_{ij}^{(b)}(\zeta)$ in the generated observations ranged from 1.0 × 10^−5 % to 5.1 × 10^−3 % in the salt intake scenario and from 1.3% to 1.4% in the alcohol intake scenario. The problem was solved by using the minimum of the corresponding generated data set instead. In the alcohol intake scenario with 200 and 100 individuals, the NCI method failed for 4 and 8 data sets, respectively, due to convergence errors when fitting the non-linear mixed effects error model or failure of the adaptive Gaussian quadrature optimisation. These data sets were excluded from the assessment of the NCI method. Subsequently, the performance of the different effect estimates was measured in terms of the mean empirical bias, the empirical standard error (SE), and the empirical MSE. The results are summarised in Table 2 for $n$ = 1,000 and in Table 3 for $n$ = 1,000; 500; 300; 200; and 100. For $n$ = 1,000, the NCI method was nearly unbiased (0.014 and 0.037), while SIMEX-Q4 and SIMEX-Q underestimated the effects (−0.078 and −0.186 for salt intake; −0.241 and −0.369 for alcohol intake). The empirical bias of the naïve approach was most serious (−0.307 and −0.548). With respect to the empirical MSE, SIMEX-Q outperformed the other correction methods, while the NCI method and SIMEX-Q4 performed equally well. For $n$ = 500, the NCI method was still superior regarding empirical bias, but if the sample size was further reduced this superiority vanished, while that of SIMEX-Q regarding empirical MSE remained. The performance of SIMEX-Q and SIMEX-Q4 regarding empirical bias seems to be independent of the sample size. SIMEX-Q4 was even less biased than, or as biased as, the NCI method for $n$ = 300, 200, and 100 in the salt intake scenario and for $n$ = 100 in the alcohol intake scenario. The fact that the NCI method led to an overestimation of the effect in the salt intake scenario for $n$ = 100 can be partly explained by seven outliers. In these cases, a variance component of the ME model (2) was severely underestimated. All methods demonstrated decreasing empirical MSE with increasing sample size.

TABLE 3 Comparison of effect estimates resulting from the NCI method and SIMEX with quadratic and quartic extrapolation functions for two different simulation scenarios (daily and episodically consumed dietary component) for different sample sizes ($n$ = 1,000; 500; 300; 200; 100) regarding empirical (emp.) mean, bias, standard error, and mean squared error based on 500 simulation data sets
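The performance measures and the stated total number of generated observations can be checked with a short Python sketch. The function name and the returned dictionary keys are illustrative assumptions; the empirical SE is computed here as the sample standard deviation of the estimates, and the empirical MSE as the mean squared deviation from the true value.

```python
import numpy as np

def performance(estimates, true_value):
    """Empirical mean, bias, standard error, and MSE of simulated effect estimates."""
    est = np.asarray(estimates, dtype=float)
    return {
        "mean": float(est.mean()),
        "bias": float(est.mean() - true_value),
        "se": float(est.std(ddof=1)),
        "mse": float(np.mean((est - true_value) ** 2)),
    }

# Scenarios x data sets x observations over all sample sizes x zeta values x B
total_generated = 2 * 500 * (2150 + 1075 + 645 + 430 + 215) * 8 * 200
```

The product evaluates to 7,224,000,000, in line with the rounded figure of 7.2 × 10^9 given in the text. Since the MSE combines squared bias and variance, a method with a noticeable bias can still achieve the smaller MSE if its estimates vary less.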

DISCUSSION
In this paper, we proposed an alternative, easy-to-use method for ME correction of dietary exposure derived from a 24HDR. The method is based on the SIMEX approach and the assumptions of the error model described in Kipnis et al. (2009). We gave the justification for this method by proving the key property of the adapted SIMEX algorithm. It was shown that the MSE of the generated data converges to zero if the total ME variance converges to zero (i.e., $\zeta$ converges to −1). Furthermore, we introduced a so-called corrective expected value, which ensures that the generated data are approximately unbiased on the original scale. The underlying ME model assumes that the 24HDR is unbiased on the original scale for the true usual intake, which is not always true. Nevertheless, this is a common working assumption in this field. Formally, the unbiasedness can be justified by the definition of the true usual intake as the average of daily assessed 24HDRs over a long period, since this is the best possible measure for dietary components in practice if a gold standard is not available or known (Carroll et al., 2006). Dodd et al. (2006) discussed extensively whether the unbiasedness should be assumed on the transformed or on the original scale. They are in favour of the latter. Among others, arguments for this choice are (i) that the estimated group mean usual intake and the overall average of the 24HDRs coincide and (ii) that the assumption is independent of the estimated Box-Cox parameter $\lambda$, that is, it does not change by analysis group or over time. Nevertheless, if a gold standard for one specific component is available, it can be used to check the robustness of the assumption and of the proposed SIMEX algorithm and to derive new error models which may better reflect reality.
One advantage of the adapted SIMEX algorithm is the easy implementation, which is mainly based on generating the data $R_{ij}^{(b)}(\zeta)$, $b = 1, \dots, B$, $\zeta \in \mathcal{Z}$, and fitting the health model to these data repeatedly. We applied the proposed method successfully to real data of daily and episodically consumed food. Furthermore, in a simulation study, we compared the proposed method with the NCI method for both scenarios and for varying sample sizes. In all scenarios, SIMEX-Q had a lower empirical MSE than the NCI method. Although reducing bias is usually considered more important, methods aiming at a partial correction to reduce the MSE are also justified (Carroll et al., 2006). We therefore recommend applying SIMEX-Q in situations where a small MSE is considered more important than a small bias. Otherwise, if the bias is the decisive criterion, the NCI method should be used if the sample size is sufficiently large ($n \geq 500$ or $n \geq 200$ in the investigated scenarios). If this is not the case, SIMEX-Q4 seems to be a more attractive choice, since for smaller sample sizes the method appears less biased than the NCI method. One reason for the relatively good performance of SIMEX-Q4 in these situations is the number of parameters on which SIMEX depends. In the case of daily consumed food, the same ME model must be estimated for the NCI method and the adapted SIMEX algorithm, but subsequently SIMEX only uses two parameters ($\lambda$ and $\sigma_{\varepsilon}^2$), whereas the NCI method needs all model parameters, that is, 13 in the salt intake scenario, to estimate the usual intake. This makes the SIMEX approach more stable if one of the parameters is heavily under- or overestimated, which happened 7 out of 500 times in the salt intake scenario with $n = 100$ individuals. This unfavourable property of the NCI method is particularly relevant if the sample size is small.
The number of required parameters is also crucial when using SIMEX for sensitivity analyses. This technique, which was proposed by He, Yi, and Xiong (2007), has been recommended for situations where the ME model cannot be estimated but external information about the measurement error is available or can reasonably be assumed. This could, for example, be the case if repeated measurements are missing and values for $\lambda$ and $\sigma_{\varepsilon}^2$ can be found in the literature. Then these parameter values can be used in the adapted SIMEX algorithm to obtain corrected effect estimates.
The proposed SIMEX modification has two drawbacks. The first is well-known from the classical SIMEX approach and was already noted by Cook and Stefanski (1994) when it was first introduced. The extrapolation is the Achilles heel of SIMEX, making it an approximative procedure (Carroll et al., 2006). That is why additional extrapolation functions were considered in the simulation study: the cubic (C) and quintic polynomial (Q5), the rational linear (RL), and the spline function (Table SM1). The performance of SIMEX-C regarding empirical bias and MSE was between that of the quadratic and the quartic function, that is, inferior to SIMEX-Q4 regarding empirical bias and inferior to SIMEX-Q regarding empirical MSE. The empirical bias of SIMEX-Q5 was of similar size as the empirical bias of SIMEX-Q4, but the empirical MSE of SIMEX-Q5 was always several times higher than that of SIMEX-Q. Besides the practical problems of SIMEX-RL (see Section 3), we also sometimes observed an erratic behaviour of the rational linear extrapolation, which was already mentioned in other studies (Küchenhoff & Carroll, 1997; Carroll et al., 2006) and which resulted in an unacceptably high empirical bias and MSE in the simulation study. In contrast, the spline extrapolation resulted in conservative estimates comparable to those of SIMEX-Q. From a theoretical point of view, it might also be interesting to find the exact form of the extrapolation function for the adapted SIMEX algorithm, that is, the form of the effect estimator of the health model taking into account the complex error model for 24HDRs. This will be addressed in future research. However, the practical benefit might be small considering how badly the rational extrapolation works in cases where it is actually the true form (Carroll et al., 2006).
The second drawback is due to the Box-Cox transformation in the error model, in which a strictly positive variable is modelled by a normal distribution. As a consequence, the back-transformed remeasurement cannot be calculated for some of the generated observations. In practice, this issue can be solved either by adding a positive constant to the reported intakes and applying the SIMEX algorithm to the shifted values or, if only very few observations are affected, by setting these observations to the minimum of the observations in the specific generated data set. A theoretical solution could be a modified error model, for example, one based on the truncated normal distribution. However, this makes the model more complicated and the practical benefit is low. For example, in our application on salt intake the difference between the truncated and the untruncated distribution is negligible even in the simulation step in which the SIMEX parameter equals 2: the probability that the transformed salt intake falls below −1 divided by the Box-Cox parameter is $10^{-11}$ if its distribution is conservatively approximated by a normal distribution with mean 17 and a variance equal to one estimated variance component plus twice a second one (as mentioned in Section 3, this negligible difference can always be achieved by adding a positive constant). Another approach was proposed by Agogo (2017), who used the generalized gamma distribution in the ME model, which also increases the model complexity.
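The domain problem and the pragmatic fix of setting affected values to the minimum can be sketched as follows. The example assumes the standard Box-Cox form (x^lam − 1)/lam, with the logarithm for lam = 0, and a simplified remeasurement step that only adds normal noise with variance zeta * sigma2 on the transformed scale. It does not reproduce the remeasurement formula of the adapted algorithm, and all names and input values are hypothetical.

```python
import math
import random

def box_cox(x, lam):
    """Standard Box-Cox transform of a positive value x."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def inv_box_cox(y, lam):
    """Inverse Box-Cox transform; None where the inverse is undefined."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    return base ** (1.0 / lam) if base > 0 else None

def remeasure(intakes, lam, sigma2, zeta, rng):
    """Add N(0, zeta * sigma2) noise on the transformed scale, transform back.

    Values whose inverse transform is undefined are set to the minimum of
    the defined values in this generated data set.
    """
    sd = math.sqrt(zeta * sigma2)
    raw = [inv_box_cox(box_cox(x, lam) + rng.gauss(0.0, sd), lam)
           for x in intakes]
    valid = [v for v in raw if v is not None]
    if not valid:
        raise ValueError("no generated value lies in the Box-Cox range")
    floor = min(valid)
    return [floor if v is None else v for v in raw]

def prob_outside_range(mean, var, lam):
    """P(Y < -1/lam) for Y ~ N(mean, var), assuming lam > 0."""
    z = (-1.0 / lam - mean) / math.sqrt(var)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Deliberately extreme, made-up setting in which many values are affected.
rng = random.Random(0)
generated = remeasure([0.01] * 200, lam=0.5, sigma2=1.0, zeta=2.0, rng=rng)
```

A function such as `prob_outside_range` can be used beforehand to judge whether the minimum rule is sufficient or whether adding a positive constant to the intakes is the safer choice.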
In conclusion, the proposed method is theoretically justified and led in practice to a reasonable correction. In a simulation study, it yielded estimates with either a smaller empirical MSE or, for small samples, a smaller empirical bias than the NCI method. Furthermore, the adapted SIMEX approach can be useful in sensitivity analyses and for graphically illustrating the measurement error correction.

SUPPORTING INFORMATION
Additional supporting information including source code to reproduce the results may be found online in the Supporting Information section at the end of the article.
If → −1, the first factor will converge to zero and the second to a constant and thus lim →−1 Var( ( ) ( )) = 0. Now, we consider the case > 0. Then the density function of −1 ( ) is .
We consider the second moment of ( ) ( ) using the substitution = −1 ( ), the inequality ( + 1) ≤ exp( ) and again the variance of the logarithmic normal distribution: It follows the existence of the second moment and the existence of the expected value and variance of ( ) ( ). (2006) found an alternative representation for the moments of −1 ( ) using the Taylor expansion for ( −1 ) at :
Therefore, Equation (10) is used to calculate ( ). In the following, we show desirable properties for the three special cases, → −1, = 1, and = 1∕2 using this equation.

A.3 Estimated measurement error and health models
To estimate the non-linear mixed model (2), the R functions lmer and powerTransform of the R packages lme4 (Bates, Mächler, Bolker, & Walker, 2015) and car (Fox & Weisberg, 2011) were used in the SIMEX algorithm. In the salt intake analysis, the estimated parameter vector of the covariates includes the parameter estimates of age, BMI and the countries Estonia, Cyprus, Belgium, Sweden, Germany, Hungary, and Spain (Italy serves as the reference category). To estimate the error model for the NCI method, the SAS macros mixtran and indivint were used (Kipnis et al., 2009). The estimated health model additionally contains the variance of its error term, and its covariate parameter vector has the same structure as that of the error model. Analogously, for the alcohol intake analysis, the parameters in the amount model, that is, the model for positive reported intakes, were estimated using the SAS macro mixtran. Again, the estimates obtained with the R functions for the SIMEX algorithm differed only slightly (e.g., an estimate of 0.313).
For the NCI method, the probability of (reported) alcohol intake needed to be estimated using a mixed effects logistic regression model with an intercept, a covariate parameter vector and a normally distributed random effect with mean zero (covariates as before). Furthermore, the correlation between this random effect and the random effect of model (2) needed to be estimated to allow the probability and the amount of intake to be correlated (for details see Kipnis et al., 2009). The estimated parameters were: