A comparison of methods for estimating the temporal change in a continuous variable: Example of HbA1c in patients with diabetes

Abstract

Purpose: To compare a more complex technique, functional principal component analysis (FPCA), with simpler methods of estimating values of sparse and irregularly spaced continuous variables at given time points in longitudinal data, using a diabetic patient cohort from UK primary care.

Methods: The setting for this study is the Clinical Practice Research Datalink (CPRD), a UK general practice research database. For 16,034 diabetic patients identified in CPRD with at least 2 measures in a 30-month period, HbA1c was estimated after temporarily omitting (i) the final and (ii) the middle known values, using linear interpolation, simple linear regression, the arithmetic mean, random effects, and FPCA. The performance of each method was assessed using mean prediction error. The influence on predictive accuracy of (1) more homogeneous populations and (2) the number and range of known HbA1c values was explored.

Results: When estimating the last observation, the predictive accuracy of FPCA was highest, with over half of predicted values within 0.4 units, equivalent to laboratory measurement error. Predictive accuracy improved when estimating the middle observation, with almost 60% of predicted values within 0.4 units for FPCA. These results were marginally better than those achieved by simpler approaches, such as last-occurrence-carried-forward linear interpolation. This pattern persisted in more homogeneous populations, as well as when variability in HbA1c measures and frequency of data points were considered.

Conclusions: When estimating change from baseline to prespecified time points in electronic medical records data, there is a marginal benefit to using the more complex modelling approach of FPCA over more traditional methods.


| INTRODUCTION
Opportunities for research using routinely collected data will increase significantly over coming years with the expansion of electronic health records (EHR) and investment in e-infrastructure for research, distributed data networks, and patient-centred research. [1][2][3] Analysis of data collected primarily for healthcare delivery rather than research generates methodological challenges. Progress is happening in many areas, for example, studying the same question across different geographical settings 4 with different healthcare systems, and adjusting for confounders defined and measured differently in different settings. 3 One such challenge is estimating values of a sparse and irregularly spaced continuous variable at given time points; simple approaches include selecting the closest temporal measurement as a surrogate, 5 linear interpolation or 'joining-the-dots' assuming linear change between each sequential measurement, 6 averaging measures over yearly intervals, 7 or estimating simple linear regression (SLR) lines using 2 or more measurements. More complex techniques are also available, for instance, random effects (RE) modelling, which allows for population and individual variation. Certain methods use multiple imputation for longitudinal data 8 but require data measured at regular time points, and so are not applicable in this context. A nonparametric statistical technique, functional principal component analysis (FPCA), exists to model longitudinal data, 9 although it is not widely used in epidemiology. This approach views longitudinal data records as functions: each curve is a single observation, but the data are assumed to be sampled from a process that is continuous over time. Statistical emphasis is shifted onto the observed data functions rather than the individual observations. 10 The technique's aim is to develop a continuous-in-time estimate (or 'trajectory') of a continuous variable, based on the individual's own data points as well as patterns of change within the whole population.
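The simpler approaches listed above can be sketched in a few lines. The following is a minimal illustration on hypothetical HbA1c data, not the study's implementation; the function names, measurement times, and values are invented for this example.

```python
import numpy as np

def locf(times, values, t):
    """Last occurrence carried forward: the latest measurement at or before t."""
    idx = np.searchsorted(times, t, side="right") - 1
    return values[idx]

def linear_interpolation(times, values, t):
    """'Joining-the-dots': assume linear change between sequential measurements."""
    return np.interp(t, times, values)

def simple_linear_regression(times, values, t):
    """Fit one least-squares line through all of the individual's measurements."""
    slope, intercept = np.polyfit(times, values, 1)
    return intercept + slope * t

def arithmetic_mean(times, values, t):
    """Ignore timing entirely and use the individual's average value."""
    return np.mean(values)

# Irregularly spaced HbA1c (%) measurements for one hypothetical patient
times = np.array([0.0, 4.0, 11.0, 19.0, 30.0])  # months
hba1c = np.array([8.2, 7.9, 7.4, 7.6, 7.1])
t = 15.0  # time point of interest (months)
```

Each function uses only that individual's own data, which is the key contrast with the model-based RE and FPCA approaches described below.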
It is unclear whether more complex techniques perform better when dealing with sparse and irregularly spaced data. Hypothetically, simpler models may work well when variability of measurements is limited, but more complex models may work better in certain situations. The study's aim was to compare FPCA methodology to other simpler methods in a cohort of diabetic patients from primary care. Specific objectives included (1) removing known HbA1c observations and calculating prediction error by comparing estimated to known values at specific time points in the whole cohort as well as in treatment, gender, and age strata and (2) exploring whether methods perform differently in certain circumstances, such as when there are changes in number of medications, length of time between consecutive measurements, and data sparseness levels within stable and unstable disease groupings.

| Study population
The setting was the Clinical Practice Research Datalink (CPRD), a UK database of anonymised primary care EHRs covering an active population of over 8 million people. 11,12 Adult patients with type 2 diabetes, defined by Read codes (code list available from the authors) or a prescription for oral hypoglycaemic (OHG) medication between 1987 and July 2011, were identified from CPRD.
Practices were excluded if their last collection date preceded the study end date or the practice did not meet minimum data quality standards, as assessed by CPRD, throughout the study period.

| KEY POINTS

• More complex techniques exist, such as FPCA, to model sparse longitudinal data.

• In patients with diabetes, this study demonstrates that in the setting of sparse and irregularly spaced data, using the more complex method, FPCA, has a marginal benefit.

We temporarily excluded the final HbA1c data point for a random 25% of patients (Figure 2a). The temporarily excluded data point was later reinserted and used to calculate (1) the prediction error, defined as the absolute difference between the actual HbA1c measurement and its estimated value (d), and (2) the squared prediction error. This allowed estimation of prediction error at times when outcomes for some patients may not have been measured and could not contribute to the estimation. The procedure of removing a random 25% of final data points was repeated 6 times to reflect the variability that would occur depending on which data points were sampled. Results across the 6 data sets were pooled, with mean and SD values calculated for each measure of predictive accuracy. Coefficients of variation (CV), defined as the ratio of SD to the mean, were also generated as a measure of precision across the 6 replicated data sets.
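The accuracy measures can be made concrete with a short sketch. The values below are hypothetical, and the pooling step simply illustrates the mean, SD, and CV calculations across the 6 replicated data sets.

```python
import numpy as np

def prediction_errors(actual, estimated):
    """Absolute prediction error (d) and squared prediction error
    for one replicated data set."""
    d = np.abs(np.asarray(actual) - np.asarray(estimated))
    return d, d ** 2

def pool_replicates(per_replicate_means):
    """Pool a measure of predictive accuracy across replicated data sets:
    mean, SD, and coefficient of variation (CV = SD / mean)."""
    m = np.mean(per_replicate_means)
    sd = np.std(per_replicate_means, ddof=1)
    return m, sd, sd / m

# Hypothetical mean absolute prediction errors from 6 replicated data sets
mape_by_replicate = [0.42, 0.39, 0.44, 0.41, 0.40, 0.43]
pooled_mean, pooled_sd, cv = pool_replicates(mape_by_replicate)
```

A small CV across the replicates indicates that the accuracy estimate is stable regardless of which data points happened to be sampled.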
Linear interpolation methods were used in the primary analysis when estimating the final observation for each individual, as described by Genolini,14 and summarised in Table 1.
As an alternative to the linear interpolation methods, the arithmetic mean (AM) method was used, which involved simply calculating the average of each individual's observed HbA1c values. One other approach to estimating the temporarily excluded values was RE modelling with random intercepts and a constant slope. Once again, individuals were not assumed to be measured at the same number of time points, or at the same time points.
This model estimates the individual's values across time on the basis of whatever data that individual has, enhanced by the time trend estimated for the sample as a whole, while also accounting for the covariates age and gender. 15 A final approach was to use FPCA methodology to develop patient-specific estimated trajectories using data from the whole population, which then allows estimation of a continuous variable, such as HbA1c, for each individual, not only at the last data point but at any time point of interest throughout the study period (Figure 3g). It is the only method tested that allows for the possibility that HbA1c changes nonlinearly with time, or that patterns of change differ between individuals.
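The random-intercepts idea can be sketched with a simple two-stage empirical-Bayes approximation: fit the population line, then shrink each individual's average residual toward zero according to how much data that individual has. This is not the paper's exact estimation procedure (which also adjusted for age and gender), and the variance components here are assumed values for illustration.

```python
import numpy as np

def fit_random_intercepts(ids, times, values, var_b=0.5, var_e=0.4):
    """Random intercepts with a constant population slope, approximated in
    two stages: (1) pooled least squares for the population line;
    (2) each individual's intercept deviation is the mean of their
    residuals, shrunk toward 0 by a factor that grows with the number of
    measurements that individual contributes. var_b and var_e are assumed
    between-subject and residual variances (hypothetical values)."""
    slope, intercept = np.polyfit(times, values, 1)
    resid = values - (intercept + slope * times)
    offsets = {}
    for pid in np.unique(ids):
        r = resid[ids == pid]
        shrink = len(r) * var_b / (len(r) * var_b + var_e)  # more data, less shrinkage
        offsets[pid] = shrink * r.mean()
    return slope, intercept, offsets

def predict_re(pid, t, slope, intercept, offsets):
    """Individual estimate: population line plus the subject's shrunken offset."""
    return intercept + slope * t + offsets.get(pid, 0.0)

# Two hypothetical patients, each measured at months 0 and 10
ids = np.array([1, 1, 2, 2])
times = np.array([0.0, 10.0, 0.0, 10.0])
hba1c = np.array([8.0, 7.0, 9.0, 8.0])
slope, intercept, offsets = fit_random_intercepts(ids, times, hba1c)
```

The shrinkage factor is what lets the model "borrow strength" from the population: a patient with few measurements is pulled toward the population line, while a densely measured patient keeps an estimate closer to their own data.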
All interpolation methods, including AM and SLR, estimated temporarily excluded values using just that individual's set of data, whereas with the model-based approaches of RE and FPCA, it was necessary to use data available on all subjects in the study cohort when making estimations at specific time points for particular individuals.
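To make the FPCA idea concrete, here is a minimal, self-contained sketch on simulated, regularly spaced data (all values hypothetical): each subject's curve is decomposed as the population mean trajectory plus subject-specific scores on a small number of eigenfunctions. The sparse, irregularly spaced version used in this study builds on the same decomposition but additionally smooths the pooled covariance and estimates scores by conditional expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate HbA1c-like curves observed on a common dense grid
grid = np.linspace(0, 30, 31)                       # months
n = 200
mean_curve = 8.0 - 0.02 * grid                      # population mean trajectory
scores_true = rng.normal(size=(n, 2)) * np.array([0.8, 0.3])
phi_true = np.vstack([np.ones_like(grid),           # overall level
                      (grid - 15) / 15])            # linear tilt
X = mean_curve + scores_true @ phi_true + rng.normal(scale=0.05, size=(n, 31))

# FPCA on a dense grid: eigendecomposition of the sample covariance
mu = X.mean(axis=0)
C = np.cov(X - mu, rowvar=False)
eigval, eigvec = np.linalg.eigh(C)
order = np.argsort(eigval)[::-1]                    # sort components by variance
phi = eigvec[:, order[:2]].T                        # top 2 eigenfunctions

# Individual trajectory: mean curve + projection onto the eigenfunctions,
# giving a continuous-in-time estimate at any grid point of interest
scores = (X - mu) @ phi.T
fitted = mu + scores @ phi
```

Because the eigenfunctions are estimated from the whole population, each individual's fitted trajectory combines their own measurements with the dominant patterns of change across all subjects, which is precisely what allows estimation at time points where an individual has no measurement.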
A secondary analysis sought to examine whether prediction errors improved for any estimation method by removing the middle data point for the same 6 sets of 1-in-4 randomly selected patients (Figure 2b).
This analysis allows us to use an additional linear interpolation method, next occurrence carried backward (NOCB) (Table 1 and Figure 3h).
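NOCB mirrors LOCF in reverse, carrying the next available measurement backward to the time of interest. A minimal sketch on hypothetical data:

```python
import numpy as np

def nocb(times, values, t):
    """Next occurrence carried backward: the earliest measurement at or after t."""
    idx = np.searchsorted(times, t, side="left")
    return values[idx]

# The same hypothetical patient as before
times = np.array([0.0, 4.0, 11.0, 19.0, 30.0])  # months
hba1c = np.array([8.2, 7.9, 7.4, 7.6, 7.1])
```

NOCB is only usable when a later measurement exists, which is why it applies to the middle-observation analysis but not when estimating the final observation.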
Global and local interpolation methods are slightly modified in this analysis, as shown in Table 1. To appreciate the importance of the difference between estimated and true HbA1c, we calculated the proportion of individual absolute prediction errors that were (i) below measurement error and (ii) below a clinically meaningful difference. HbA1c measurement error is considered to be around 0.4%, assuming an average HbA1c value of 8%. 16 We defined the clinically meaningful difference as the change in HbA1c associated with a 10% increased risk of any endpoint related to diabetes, which equates to a change in HbA1c of 0.5%. 17 The optimal estimation method was indicated by the lowest values of MAPE and MSAPE and the highest proportions of absolute prediction errors (i) within measurement error and (ii) within a clinically meaningful difference.
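These two thresholds translate directly into simple proportions; the sketch below uses hypothetical absolute prediction errors.

```python
import numpy as np

MEASUREMENT_ERROR = 0.4    # HbA1c laboratory measurement error (%)
CLINICAL_DIFFERENCE = 0.5  # change linked to a 10% increased risk of a diabetes endpoint (%)

def proportion_within(abs_errors, threshold):
    """Proportion of absolute prediction errors below a given threshold."""
    return np.mean(np.asarray(abs_errors) < threshold)

# Hypothetical absolute prediction errors for ten estimates
d = [0.1, 0.35, 0.45, 0.2, 0.6, 0.39, 0.41, 0.05, 0.55, 0.3]
```

Reporting both proportions distinguishes errors that are indistinguishable from laboratory noise from those small enough to be clinically unimportant.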

| Factors influencing prediction error
We anticipated that prediction error would be affected by many factors, such as medication, gender, age, switching medication, and the distance between consecutive measurements. These sensitivity analyses were done for a single replicated data set only. Within each stratum, we created a homogeneous population, and as such we expected the modelling methods to perform better than the approaches that do not take population behaviour into account.
The exception to this is likely to be stratum G, where we restricted the population to patients with highly variable HbA1c; as such, the expectation was that all approaches might underperform.
The analysis was conducted using Stata V.12.1 (http://www.stata.com) and R V.3.1.3 (http://www.R-project.org). Low CV values indicate that, despite using just 6 replicated data sets, any differences in performance are real and not merely due to random variation.

| RESULTS
The best performance was achieved by FPCA, for which a mean of 54% of subjects had prediction errors less than measurement error, compared with a mean of 29% for bisector linear interpolation, the worst-performing method. FPCA was only marginally better than the last-occurrence-carried-forward (LOCF), RE, and AM approaches (Table 2).
Limiting prediction error assessment to strata A-F, based on a single data set with 1-in-4 final data points removed, produced the results shown in Figures S1a-S1d. The overall pattern of performance did not change from that seen in the whole new-user cohort: the LOCF, AM, RE, and FPCA approaches were optimal, followed by SLR, with the remaining linear interpolation methods performing worst. Similar results for stratum G (Figures S2a-S2d) found that the pattern of prediction errors within subgroups remained the same, with LOCF, AM, RE, and FPCA generating more accurate predictions. See Table S1 for a summarised version of these results. In the secondary analysis, estimating the middle observation, the best performance was achieved once again by FPCA, closely followed by SLR, AM, and RE, and low CVs indicate that these differences in performance are not due to random variation.
In general, a similar pattern of prediction error was found across all approaches in this secondary analysis when limiting prediction error assessment to strata A-G, based on a single data set with 1-in-4 middle data points removed (Figures S3a-S3d and S4a-S4d).

| DISCUSSION
This study compared methods for estimating values of a continuous variable, HbA1c, at given time points using known values of this sparse and irregularly spaced variable within UK primary care records of patients with diabetes. Few studies investigate the effectiveness of these methods, yet researchers apply them without considering their performance.
In Table 2, when estimating the last observation in the primary analysis, LOCF and FPCA proved to be the optimal approaches, with FPCA performing marginally better in some assessments, whereas the remaining linear interpolation methods were equally poor. As the populations were made more homogeneous, for example by restricting to females or to single continuous drug use, the more complex modelling of the RE and FPCA approaches did not outperform the simpler LOCF method, although FPCA achieved slightly better results overall. For example, under FPCA, 59% of female subjects achieved prediction errors below measurement error, compared with 54% under LOCF, whereas the worst-performing method, bisector linear interpolation, achieved only 32%. We expected an optimal performance from FPCA because of its flexibility to deal with longitudinally nonlinear changes in HbA1c, yet the advantage in using this approach