The use of cross-sectional data to estimate the impact of health events on individual health-related quality of life (QoL) is widespread as such data are easier and less expensive to collect than longitudinal data. Consequently, the existing QoL evidence in areas such as diabetes relies largely on cross-sectional data (Clarke et al., 2002, Bagust and Beale, 2005, Redekop et al., 2002, O'Reilly et al., 2011, Lundkvist et al., 2005, Lung et al., 2011). In these studies, any differences in health-related QoL (HRQoL) between health states with and without events (e.g. stroke) are typically interpreted as changes in an individual's HRQoL resulting from the experience of that event. However, if patients who experience events are systematically different from those who do not, then it is possible that their QoL will differ before as well as after an event occurs. If this is the case, a cross-sectional approach will attribute differences in QoL between patients with and without events to the impact of the events, when in fact they are partly (or wholly) because of underlying and potentially unobserved heterogeneity across these two groups of patients.
In this paper, we use a rich longitudinal dataset to estimate QoL decrements associated with six diabetes-related complications. Our analytical strategy addresses the following questions. First, we illustrate how the use of cross-sectional data to estimate the impact of health events on QoL may result in biased estimates, for example because of unobserved patient heterogeneity. We do so by comparing estimates of the QoL impact of diabetes-related complications obtained from cross-sectional analysis of pooled data with panel data estimates obtained from exploiting the variation across time and within patients. We show that longitudinal data could help mitigate the bias arising from patient-specific time-invariant characteristics. We also estimate the magnitude of potential bias in the estimation of QoL decrements when patient heterogeneity is ignored. To control for potential selection into the sample, which could influence the estimated impact of complications on HRQoL, we also test whether sample selection in the form of non-response to the QoL questionnaire matters. Finally, we explore whether the introduction of additional patient-specific characteristics could mitigate the bias in a pooled ordinary least squares (OLS) context.
The paper is organised as follows. Section two describes the dataset used in the paper. Section three outlines the estimation strategy. The empirical results are presented in section four and discussed in section five.
2 DATA: THE UNITED KINGDOM PROSPECTIVE DIABETES STUDY
The United Kingdom Prospective Diabetes Study (UKPDS) was a large, long-term, randomised trial of 5102 patients with diabetes, which compared the effects of different glycaemic interventions on diabetes-related complications and death. Descriptions of the study and clinical results have been previously published (UKPDS Group, 1991, UKPDS Group, 1998a, UKPDS Group, 1998b). After completion of the intervention phase of the study in 1997, all surviving patients entered into a post-trial monitoring study for a further 10 years (Holman et al., 2008a, Holman et al., 2008b), during which information on clinical events continued to be collected.
Health-related QoL was assessed using the EQ-5D questionnaire (EuroQol Group, 1990), with utility values taken from the UK preference set (Dolan et al., 1996). Questionnaires were first administered to all participants in 1996/1997, and estimates of the QoL impact of diabetes complications on utility from that cross-sectional study, plus a comprehensive description of the questionnaire protocol and mode of administration, have been previously published (Clarke et al., 2002). The EQ-5D was then administered annually between 2002 and 2007, with a final questionnaire administered in October 2007 to all surviving participants, giving a maximum of seven responses per patient.
A total of 3380 patients completed at least one EQ-5D questionnaire, with a mean of 3.4 questionnaires completed per patient for a total of 11,614 questionnaires available for analysis. Response rates conditional on the patients being alive varied from 90% to 74%. Sample sizes by wave, mean age, mean EQ-5D tariff value, and the proportion of patients reporting full health are reported in Table 1, along with the proportion of patients reporting ‘no’ (level 1), ‘some’ (level 2) or ‘severe’ (level 3) problems on each dimension of EQ-5D and at each follow-up point.
Table 1. Descriptive statistics for the health-related quality of life measures in the UKPDS, by questionnaire wave
Columns: questionnaire wave (number of patients with complete questionnaires). Rows: percentage of patients at each response level (%).
Note: Questionnaires were administered from 1 September to 30 September, in rounds 1 to 6, corresponding to the anniversary of each patient’s therapy decision. The last questionnaire was administered in October 2007 to all surviving participants.
The mean EQ-5D tariff in the sample equalled 0.69 (SD 0.3) across all follow-up points. As expected, the proportion of patients reporting no problems, and the mean QoL tariff, declined over time, from 0.76 (SD 0.27) to 0.65 (SD 0.31). Patients who had experienced at least one diabetes-related complication had tariffs on average between 0.08 and 0.11 lower than patients with no complications. The average age of the patients was 62.3 years at questionnaire 1 and 71.3 at questionnaire 7. As an external comparison, the average tariff in a general English population sample at ages similar to that of the sample average (65–74 years old) has been reported as 0.76 for women and 0.80 for men (Department of Health, 1998).
Here, we examine the QoL impact of six diabetes-related complications: myocardial infarction (MI), ischaemic heart disease (IHD), stroke, heart failure, amputation, and blindness in one eye. We include only non-fatal occurrences of these events. Detailed information on the classification of complications in the UKPDS and the ICD codes used to define them has been reported previously (UKPDS Group, 1991). The total number of non-fatal diabetes-related complications recorded, and the number of patients experiencing these complications, are reported in Table 2.
Table 2. Number of non-fatal diabetes-related complications and participants with one or more events among the UKPDS participants providing quality of life data (n = 3380)
Columns: number of events; number of patients with event(s); maximum number of events per patient.
Rows include: ischaemic heart disease; blindness in one eye.
Approximately 33% of patients had no complications prior to or during the period over which EQ-5D questionnaires were administered (1105 of 3380 participants), 38% had one or more micro-vascular complications (blindness in one eye or amputation), 30% had a non-fatal macro-vascular complication (MI, IHD, stroke or heart failure), and 13% had both types of complications.
3 ESTIMATION STRATEGY
The estimation strategy (Figure 1) addresses two main concerns. First, we study whether attrition, a common problem in panel data, is a potential source of bias, and if so, we aim to characterize this process in the estimation. We follow Wooldridge (1995) to test the importance of attrition. Second, we investigate whether unobserved time-invariant patient characteristics might give rise to omitted variable bias. To assess the degree of heterogeneity, we decompose the variation in the QoL tariff into variation across individuals, or ‘between’ variation, and variation across the questionnaires of a given person, or ‘within’ variation. If a high degree of heterogeneity is present, that is, if between variation is significantly greater than within variation, and this heterogeneity is correlated with the likelihood of having an event, then the results of a cross-sectional analysis might be biased, attributing to events QoL effects that, in fact, are related to patient characteristics. In a linear model, it is only when this bias is present that estimates from a fixed effects (FE) estimator will be significantly different from the estimates of an OLS estimator (Wooldridge, 2010), and a variant of the Hausman test (Hausman, 1978) is used to test this hypothesis.
3.1 Sample selection model: the importance of non-response
We begin our estimation strategy with a panel FE model with non-response, following Wooldridge (1995), as this demands the least number of restrictions and is the most robust under the weakest assumptions. This model allows for both the selection process and possible bias because of patient-specific unobserved characteristics. In a longitudinal setting, the Wooldridge procedure requires separate Probit regressions in the selection equations for each period or questionnaire wave. This overcomes the problem that FE cannot be conditioned out of the likelihood. Thus, if a patient reports all 7 questionnaires, for that individual there will be 7 regressors for each of the explanatory variables.
If non-response is induced by unobserved time-invariant patient characteristics, then participation bias could be resolved by eliminating the time-invariant error term within a FE model, even if these potentially unobserved characteristics are correlated with the explanatory variables of interest. We then compare the results of the FE model with non-response with those obtained using the Heckman selection model. This is still a two-step approach for correcting for non-randomly selected samples, but unlike Wooldridge (1995), no provision is made for possible confounding from time-invariant heterogeneity, as the outcome equation is a pooled OLS. The question of interest here is whether the reporting cohort is in any way different from the surviving cohort which does not participate.
We define the relationship between patient covariates, x, and EQ-5D tariff, y, as
$$y = x'\beta + \epsilon \qquad (1)$$
where ϵ encompasses all unobserved determinants of tariff levels. The standard assumption in the estimation strategy is that, in the population as a whole, there is no correlation between the onset of events, x, and the unobservables. The parameter β is interpretable as the causal parameter of interest unless either of two conditions presents itself (Heckman, 1979):
Condition 1. The explanatory variable(s) of interest, x, are correlated with the probability of the patient being observed in the sample; or
Condition 2. The unobserved determinants of the sample selection process are correlated with the unobserved determinants of the outcome of interest, ϵ.
These conditions can be tested within a sample selection framework. Consider the following latent-variable model for sample selection:
$$s^* = x'\alpha + w'\gamma + u \qquad (2)$$
Without loss of generality, in a longitudinal setting (Wooldridge approach), one should read Equation 2 as containing subscripts i,t corresponding to i = 1,…,N individuals and t = 1,…,T questionnaire rounds. In a cross-sectional setting (Heckman approach), each patient observation is instead assumed to be an independent draw, so the relevant subscript is the patient questionnaires j = 1,…,N × T. In this instance, s*, which indicates the patient's willingness to return a filled questionnaire, is a latent variable which is not observed and can only be inferred by observing whether the patient has returned a questionnaire:
$$s = \begin{cases} 1 & \text{if } s^* > 0 \\ 0 & \text{otherwise} \end{cases}$$
x′α denotes a matrix of covariates and their associated coefficients. This matrix contains the same covariates that are used in the second stage outcome equation. w′γ in Equation 2 is a vector of exclusion restrictions. These restrictions have to influence selection, but they must not have any direct impact on the outcome of interest: utility. The importance of the exclusion restrictions is twofold: first, the hypothesis of sample selection becomes testable, and second, they make the simultaneous equation system identifiable as it no longer relies purely on functional form (Wooldridge, 2010). The exclusion restrictions act as instrumental variables (IV) for the inverse Mills ratios (IMR).
In this framework, we can test whether the explanatory variable(s) of interest, x, are correlated with the probability of the patient being observed in the sample (i.e. Condition 1 implies that α ≠ 0) and whether the unobserved determinants of the sample selection process, u, are correlated with the unobserved determinants of the outcome of interest, ϵ (Corr(ϵ, u) ≠ 0, i.e. Condition 2). Bias arises when the expectation of u depends on x in the sampled population. This depends on the second condition above, namely, that u is correlated with ϵ. Under this assumption, the expectation E(ϵ | u) ≠ 0, and this is testable in the Heckman framework. So, if the expected value of u in the selected population is a function of events, x, then the expected value of ϵ will be a function of x in the population we observe as well. As Heckman observed, this can be thought of as a classic case of omitted variable bias. In the absence of any corrective steps, the correlation between x and ϵ in the sample will bias estimates of the coefficient of interest, β. If α is negative and ϵ and u are positively correlated, then this bias will be upward (i.e. towards zero if the true β is negative or away from zero if the true β is positive).
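The direction of this bias can be illustrated with a small simulation. The sketch below is illustrative only: the parameter values (true β = −0.10, α = −1, error correlation 0.8, outcome error SD 0.2) and the function name are assumptions chosen for the example, not estimates from the UKPDS, and the selection equation omits the exclusion restriction w for brevity.

```python
import math
import random


def simulate_selection_bias(n=20000, beta=-0.10, alpha=-1.0, corr=0.8,
                            sigma=0.2, seed=0):
    """Compare the event coefficient in the full and the responding sample.

    All parameter values are hypothetical. With a single binary regressor
    the OLS slope equals the difference in mean outcomes between x = 1
    and x = 0, so no matrix algebra is needed.
    """
    rng = random.Random(seed)
    full = {0: [], 1: []}
    selected = {0: [], 1: []}
    for _ in range(n):
        x = 1 if rng.random() < 0.3 else 0      # event indicator
        u = rng.gauss(0.0, 1.0)                 # selection unobservable
        noise = rng.gauss(0.0, 1.0)
        # The outcome error is correlated with u (Condition 2).
        eps = sigma * (corr * u + math.sqrt(1 - corr ** 2) * noise)
        y = beta * x + eps                      # outcome equation
        s_star = alpha * x + u                  # latent willingness to respond
        full[x].append(y)
        if s_star > 0:                          # questionnaire returned
            selected[x].append(y)

    def slope(groups):
        return (sum(groups[1]) / len(groups[1])
                - sum(groups[0]) / len(groups[0]))

    return slope(full), slope(selected)


beta_full, beta_selected = simulate_selection_bias()
print(f"full sample: {beta_full:.3f}, respondents only: {beta_selected:.3f}")
```

With α negative and a positive error correlation, the respondents-only coefficient should lie above the full-sample coefficient, which is the upward bias described in the text.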
The selection equation in the panel FE model with attrition is
$$s^*_{it} = x_{it}'\alpha + w_{it}'\gamma + \xi_i + a_{it}, \qquad s_{it} = 1[s^*_{it} > 0] \qquad (3)$$
The error term is composed of two terms, ξi + ait, where ξi is a time-invariant unobserved effect. wit in Equation (3) is the exclusion restriction. Herein, we use the interactions between locations and the time trend, exploiting the variation in response through time within location.
The outcome equation in the panel FE model with attrition is
$$y_{it} = x_{it}'\beta + \rho_t \hat{\lambda}_{it} + \mu_i + \epsilon_{it} \qquad (4)$$
where the $\hat{\lambda}_{it}$ are the estimated inverse Mills ratios, one for each questionnaire wave t. Heteroskedasticity-robust standard errors are computed for each ρ. We test the null hypothesis H0 : ρ = 0. If the null cannot be rejected, then non-response is not a source of bias. If, on the contrary, we reject the null hypothesis, then for each t = 1, 2, …, 7, Pr(sit = 1|xit) = Φ(X'α) is estimated using a standard Probit model, and the inverse Mills ratios are computed and used in the outcome equation.
It is important to note that while s and y are part of the data-generating process, the inverse Mills ratio is not. Because the inverse Mills ratios are estimated, we need standard errors that reflect this uncertainty, and here, we report bootstrapped standard errors. The above equation should read simply as yit = xit′β. All other variables are treated as controls and are of interest only insofar as they affect the estimates of the betas.
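As a minimal sketch of the correction term, the inverse Mills ratio for a respondent is the standard normal density divided by the standard normal distribution function, both evaluated at the fitted Probit index. The index values below are made up for illustration; in an application they would be the fitted values from the wave-specific Probit regressions. The function names are assumptions.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def inverse_mills_ratio(index):
    """lambda(z) = phi(z) / Phi(z) for a fitted Probit index z."""
    return normal_pdf(index) / normal_cdf(index)


# Hypothetical fitted Probit indices for three patient-wave observations.
for z in (-1.0, 0.0, 1.5):
    print(f"index {z:+.1f}: P(respond) = {normal_cdf(z):.3f}, "
          f"IMR = {inverse_mills_ratio(z):.3f}")
```

The ratio falls as the predicted response probability rises, so observations that were very likely to respond receive a correction close to zero.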
3.2 Fixed effects model: the importance of patient heterogeneity
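If the inverse Mills ratio is not statistically significant, the panel FE model with selection can be reduced to a one-stage panel FE model, specified as
$$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\epsilon_{it} - \bar{\epsilon}_i) \qquad (5)$$
where the barred variables correspond to the individual means: $\bar{y}_i = T_i^{-1}\sum_{t} y_{it}$, and similarly for $\bar{x}_i$ and $\bar{\epsilon}_i$, with $T_i$ the number of questionnaires returned by patient i. Because the fixed effect or time-invariant error, μi, is eliminated through this transformation, consistent estimation is possible, even with endogenous regressors, provided that these are correlated only with the time-invariant component of the error term.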
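The within transformation can be sketched on simulated data. Everything in the example below is hypothetical: the data-generating values (true effect −0.05, a patient effect with SD 0.25, higher event risk for patients with a negative patient effect) and the function names are assumptions made for illustration, and the bounded nature of the EQ-5D tariff is ignored.

```python
import random


def simulate_panel(n_patients=2000, n_waves=7, beta=-0.05, seed=1):
    """Simulated panel in which frailer patients are more likely to have an event.

    Returns a list of (patient_id, x, y) rows. All values are hypothetical.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_patients):
        mu = rng.gauss(0.0, 0.25)              # time-invariant patient effect
        p_event = 0.6 if mu < 0 else 0.2       # event risk depends on mu
        onset = rng.randint(2, n_waves) if rng.random() < p_event else None
        for t in range(1, n_waves + 1):
            x = 1 if onset is not None and t >= onset else 0
            y = 0.75 + beta * x + mu + rng.gauss(0.0, 0.15)
            rows.append((i, x, y))
    return rows


def ols_slope(xs, ys):
    """Slope from a simple regression of y on x (with intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_transform(rows):
    """Subtract each patient's own mean from x and y."""
    sums = {}
    for pid, x, y in rows:
        sx, sy, n = sums.get(pid, (0.0, 0.0, 0))
        sums[pid] = (sx + x, sy + y, n + 1)
    xs, ys = [], []
    for pid, x, y in rows:
        sx, sy, n = sums[pid]
        xs.append(x - sx / n)
        ys.append(y - sy / n)
    return xs, ys


rows = simulate_panel()
pooled = ols_slope([r[1] for r in rows], [r[2] for r in rows])
fe = ols_slope(*within_transform(rows))
print(f"pooled OLS: {pooled:.3f}, fixed effects: {fe:.3f} (true value -0.050)")
```

Because the patient effect is correlated with the event in this simulation, pooled OLS overstates the decrement, whereas the demeaned regression should recover a value close to the true effect.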
To address the question whether patient heterogeneity might lead to biased estimates, the results of the FE model are compared with those from the OLS model, the most common form of multivariate analysis used in studies estimating utilities related to health states (Lung et al., 2011). Unlike the FE model, the OLS model assumes homogeneity between patients with respect to the relationships investigated. Two pooled OLS models are considered: first, we present a model without any patient-specific controls; second, we include as patient-specific covariates all time-invariant variables available in this study: gender, race, social class at recruitment, and trial centre. A variation of the Hausman test robust to heteroskedasticity (Schaffer and Stillman, 2010) is used to test whether the FE model is better than the respective OLS model.
The standard errors of the coefficients in the regression models were estimated through a bootstrap approach; 1000 clustered bootstrapped samples were used, where the cluster is the patient and all of his/her observations. Web Appendix 1 describes the covariates used in the different regression models.
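The clustered bootstrap described above can be sketched as follows. This is a simplified illustration rather than the code used in the study: the toy tariffs, the patient labels and the function name are assumptions, and the statistic bootstrapped here is a simple mean rather than a regression coefficient.

```python
import random
import statistics


def cluster_bootstrap_se(panel, estimator, n_boot=1000, seed=0):
    """Bootstrap standard error, resampling whole patients with replacement.

    panel: dict mapping patient id -> list of that patient's observations.
    estimator: function taking a flat list of observations, returning a float.
    """
    rng = random.Random(seed)
    patient_ids = list(panel)
    estimates = []
    for _ in range(n_boot):
        # Draw patients, not rows, so each patient's observations stay together.
        drawn = rng.choices(patient_ids, k=len(patient_ids))
        sample = [obs for pid in drawn for obs in panel[pid]]
        estimates.append(estimator(sample))
    return statistics.stdev(estimates)


# Hypothetical EQ-5D tariffs for four patients with unequal numbers of waves.
toy_panel = {
    "A": [0.85, 0.80, 0.76],
    "B": [0.62, 0.59],
    "C": [1.00, 1.00, 0.88, 0.85],
    "D": [0.52],
}
se_mean = cluster_bootstrap_se(toy_panel, statistics.mean, n_boot=1000)
print(f"cluster-bootstrap SE of the mean tariff: {se_mean:.3f}")
```

Resampling at the patient level preserves the within-patient correlation of repeated questionnaires, which resampling individual rows would break.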
It is important to point out that current age in the FE model captures the trend, whereas age in the OLS model captures both the trend and the time-specific effects that affect all patients over time, such as changes in the stock of hospital beds or in national treatment guidelines. Because the interpretations of current age in a FE and a pooled OLS model are different, the model specification has to be such that direct comparisons are meaningful and not driven by the different interpretation of coefficients across models. To solve the comparability problem and to obtain a more flexible representation of time that does not rely on the assumption of linearity in age, we use the age of the patient at the first questionnaire together with questionnaire dummies.
3.3 Impact of complications on quality of life over time, and interactions between complications
Previous analyses have attempted to differentiate between the short-term and longer-term impacts of complications on QoL, the hypothesis being that a serious complication may be associated with a severe initial decrement in QoL that is partly or wholly attenuated as time elapses. We test for this effect by assessing significance levels on coefficients representing proximity of the complication to the quality-of-life measure in the preferred estimation method. Previous studies have also attempted to address whether multiple complications are additive or multiplicative in effect. We investigate this by testing for first-order interactions between the complications in this model.
4 EMPIRICAL RESULTS
4.1 Within and between variation
Decomposition of the variation in EQ-5D tariff in the estimation sample indicates between-patient variation (SD = 0.27) that is nearly twice the magnitude of within-patient variation (SD = 0.16), indicating a high degree of heterogeneity in the data. Thus, if two patients were drawn randomly from the sample, the difference between their QoL levels is likely to be nearly twice as large as the difference for the same patient over two randomly selected years. The presence of this degree of patient heterogeneity indicates that a FE model might be appropriate.
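A decomposition of this kind can be sketched as below. The divisor conventions are an assumption (the between SD is the sample SD of patient means; the within SD divides the squared deviations from patient means by the number of observations minus one), and the toy tariffs are invented for illustration.

```python
import math
import statistics


def between_within_sd(panel):
    """Between- and within-patient standard deviations of an outcome.

    panel: dict mapping patient id -> list of that patient's tariffs.
    Between SD: sample SD of the patient means.
    Within SD: SD of deviations from each patient's own mean,
    using (number of observations - 1) as the divisor.
    """
    patient_means = {pid: statistics.mean(ys) for pid, ys in panel.items()}
    between = statistics.stdev(list(patient_means.values()))
    deviations = [y - patient_means[pid]
                  for pid, ys in panel.items() for y in ys]
    within = math.sqrt(sum(d * d for d in deviations) / (len(deviations) - 1))
    return between, within


# Hypothetical tariffs: two patients, two questionnaires each.
toy = {"A": [0.8, 0.6], "B": [0.4, 0.2]}
between_sd, within_sd = between_within_sd(toy)
print(f"between SD = {between_sd:.3f}, within SD = {within_sd:.3f}")
```

In this toy panel the patients differ more from each other than each patient differs from his or her own average, which is the pattern reported for the estimation sample.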
4.2 Sample selection models
Following the estimation strategy set out in Figure 1, the results of the outcome equations of the Heckman and the Wooldridge sample selection models are reported in Table 3 (results of the selection parts are reported in Web Appendices 2 and 3). Selection variables appear to affect the likelihood of being censored out of the sample, but as the estimated ρ is not significantly different from zero, we cannot reject the hypothesis of no correlation between the error terms from the participation decision and the tariff equations (p = 0.277); λ provides the estimated coefficient on the inverse Mills ratio. When λ is different from zero, sample selection bias is present. From the estimates reported in Table 3, both ρ and λ indicate that the selection and outcomes equations are independent and that the OLS coefficients are not statistically different from the Heckman-corrected coefficients. The inverse Mills ratio in the FE Wooldridge model is also not statistically significant.
Table 3. Results of outcomes equations in Heckman and Wooldridge selection models
4.3 Comparison of fixed effects and pooled ordinary least squares estimators
The importance of accounting for between-patient heterogeneity is addressed through a comparison of the FE regression models and the pooled linear regression models (Table 4). As the number of events within each category of diabetes-related complications is not always sufficient in our dataset to illustrate statistically significant differences in individual coefficients, Table 4 shows the results with the presence or absence of any complication captured by a single variable and the results for the full set of complications. These comparisons show that the FE model is preferred in all comparisons. The hypothesis of a single intercept across patients is rejected (all p < 0.001), indicating that there is a high degree of heterogeneity and that the pooled OLS could produce inconsistent estimates. Therefore, in this situation, the less restrictive FE models represent the preferred statistical model for estimating the impact of diabetes-related complications on QoL. When all complications are grouped together (Table 4), the FE coefficient is significantly smaller in magnitude than the pooled OLS coefficient (−0.054 versus −0.097; difference 0.042, 95% CI 0.006, 0.078). When analysed separately (Table 4), the coefficients on specific diabetes complications are not different between FE and OLS at conventional levels of significance, but the FE estimates are consistently smaller (i.e. the effects are closer to zero) than their OLS counterparts, suggesting that the use of OLS could potentially result in over-estimates of the impact of complications on QoL.
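As a rough check on the grouped comparison, the reported confidence interval can be converted back into an approximate standard error and z statistic. The sketch assumes a symmetric interval based on a normal approximation, which may not be how the reported interval was constructed; the function name is an assumption.

```python
import math


def z_from_ci(estimate, lower, upper, z_crit=1.96):
    """Back out the standard error and z statistic from a symmetric 95% CI."""
    se = (upper - lower) / (2.0 * z_crit)
    z = estimate / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_two_sided


# Values as reported in the text for the grouped-complication coefficient.
se, z, p = z_from_ci(estimate=0.042, lower=0.006, upper=0.078)
print(f"SE = {se:.4f}, z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions the implied p-value falls below 0.05, in line with the interval excluding zero.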
Table 4. Comparison of fixed effects and pooled ordinary least squares estimators (grouping and overall complications)
Columns include: pooled OLS with additional covariates.
Notes: Significance is flagged at p < 0.01 and p < 0.05. Social class was not available for all participants.
We also tested the FE model against a more fully adjusted OLS model (column 6, Table 4), using all other time-invariant covariates available to us to control for heterogeneity. The results of this indicate that the differences between FE and OLS models are slightly reduced but remain highly significant (Sargan-Hansen, p < 0.001).
4.4 Duration and interaction of quality of life impact of complications
We also tested whether the effects of complications in the acute phase (within one year of an event) and longer term (more than a year after an event occurring) differ. There was evidence that this was the case for MI but not for other complications. Consequently, the QoL impact reported for MI is separated in Table 4 into short-term and long-term decrements, whereas the impacts of all other complications can be interpreted as permanent decrements following the respective complications.
We also tested for evidence of interactions between the effects of the six diabetes-related complications, which might result in the combined effect of more than one complication having a greater or lesser impact on QoL than the separate effects. However, we found no evidence of significant interactions.
The FE results reported in Table 4 can therefore be considered as additive or independent effects on QoL and are our preferred estimates. The results show that amputation (−0.172 SE[0.045]), stroke (−0.165 SE[0.035]) and heart failure (−0.101 SE[0.032]) have the largest impact on QoL, followed by the short-term impact of an MI (−0.065 SE[0.030]). We do not find any statistically significant effect of blindness in one eye (0.033 SE[0.027]) or of ischaemic heart disease (−0.028 SE[0.022]) on quality of life.
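The statements about statistical significance can be checked approximately from the coefficients and standard errors quoted above. The sketch uses a normal approximation to the ratio of coefficient to standard error, which is an assumption; the reported standard errors are bootstrapped, so the p-values in the paper's tables may differ slightly.

```python
import math

# Fixed effects coefficients and standard errors as reported in the text.
fe_estimates = {
    "amputation": (-0.172, 0.045),
    "stroke": (-0.165, 0.035),
    "heart failure": (-0.101, 0.032),
    "MI (short term)": (-0.065, 0.030),
    "blindness in one eye": (0.033, 0.027),
    "ischaemic heart disease": (-0.028, 0.022),
}


def two_sided_p(coef, se):
    """Two-sided p-value from a normal approximation to coef / se."""
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2.0))


for name, (coef, se) in fe_estimates.items():
    p = two_sided_p(coef, se)
    flag = "significant at 5%" if p < 0.05 else "not significant at 5%"
    print(f"{name}: z = {coef / se:.2f}, p = {p:.3f} ({flag})")
```

Under this approximation, amputation, stroke, heart failure and the short-term effect of MI are significant at the 5% level, whereas blindness in one eye and ischaemic heart disease are not.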
5 DISCUSSION
Reliable estimates of the impact of health events on QoL are essential when estimating the cost-effectiveness of health care interventions to reduce these events. Here, an empirical analysis is used to compare estimates of the QoL impact of diabetes-related complications derived from cross-sectional data with estimates from longitudinal data. Our hypothesis was that the cross-sectional model might be substantially flawed because patients who have complications may be systematically different from those who do not, in their observed and unobserved characteristics. To address this problem, we analysed a longitudinal dataset using a FE model that accounted for the effects of both non-response to EQ-5D questionnaires and heterogeneity between patients with and without non-fatal diabetes-related complications.
Our results indicate that the FE approach is likely to be more robust than a pooled OLS approach when estimating the ‘real’ effects of diabetes-related complications on QoL. We find the FE model to be consistently preferred when compared with the pooled OLS model. The estimated magnitude of QoL effects in our dataset was consistently less in the FE model: for example, pooled OLS indicated that the effect of ischaemic heart disease was to lower QoL by 0.067 (0.073 with additional time-invariant controls), whereas the FE model indicated a statistically non-significant reduction of 0.028. This suggests that correlations between the occurrence of diabetic complications and unmeasured or unused determinants of QoL systematically bias the coefficients in the OLS regressions downwards, and thus, the impact of complications on QoL is over-predicted. One possible explanation for this finding is that patients who at some point have a complication are already on a lower utility path compared with patients who never incur events, other things equal. The addition of further covariates in a pooled OLS model suggests that this problem might be reduced by the collection of additional variables that capture patient heterogeneity and may influence QoL. However, quite apart from the potential costs and difficulties of obtaining such data, it is important to note that, even with their addition to the analysis, the estimates of the OLS model remain statistically different from the estimates of the FE model. Moreover, the OLS model still relies heavily on strong assumptions, for example, that the response of QoL to these additional covariates is linear and that there is no heterogeneity in the impact of the additional covariates. In contrast, the FE model by construction adjusts for time-invariant patient-specific characteristics and avoids the need to collect extensive patient-specific information.
How important are the potential inconsistencies in the estimates of QoL effects derived with the OLS model? The literature on minimum clinically important differences in QoL in patients with chronic disease indicates that the minimum is around half of a standard deviation (Norman et al., 2003) or approximately 0.015 in our sample. Similarly, Walters and Brazier (2005) have suggested that the minimum important difference for the EQ-5D is between 0.011 and 0.014. In our analysis, the differences in estimates of QoL decrements between minimally adjusted pooled OLS and FE were all in excess of such minimum important differences: 0.067 for heart failure, 0.039 for IHD, 0.027 for history of MI, 0.020 for MI in the year preceding a questionnaire, 0.014 for stroke and 0.083 for blindness in one eye.
Further empirical work using a range of longitudinal datasets in different disease areas is clearly needed to assess the generalizability of our results, but our analyses using this dataset suggest that repeated measures data analysed using the FE approach are likely to give a more reliable estimate of the effects of health events on QoL than cross-sectional data.
Fixed effects models do have the drawback of increasing the noise-to-signal ratio, especially when within-patient variation is low. Had data permitted, it would have been interesting to present a richer FE model alongside the adjusted OLS model, including a full set of behavioural variables; the danger, however, with variables that present a high degree of persistence, such as smoking, alcohol consumption and fitness levels, is that their effects might not have been easily identifiable because of the low degree of within-patient variation.
For those who only have access to cross-sectional data, can estimates of the marginal differences attributable to complications be adjusted to remove bias arising from differences in characteristics between individuals? This is an important question for further study, particularly as the vast majority of studies to date have involved cross-sectional rather than longitudinal data. Fruitful avenues for investigation would be the inclusion of additional control variables in OLS, the use of instrumental variables, and the use of methods such as propensity score matching.
Our substantive results are broadly in line with previous estimates of the effect of diabetes-related complications on QoL, in particular in the finding that amputation and stroke have the largest effect. However, our finding that blindness in one eye did not have a significant impact on QoL does differ from previous studies (Glasziou et al., 2007, Huang et al., 2007, Sharma et al., 2003, Brown et al., 1999, Clarke et al., 2006), which have all been based on cross-sectional data. This difference may be driven by the estimation method: the difference between FE and pooled OLS was larger for blindness in one eye than for any other complication. There may also be differences between studies in the definition of blindness: in our data, blindness was defined as LogMAR > 1 in one eye, which may have very little impact on QoL if vision in the remaining eye is maintained. Disease progression in the worse eye may be the most clinically relevant measure, but visual acuity in the better eye may have more bearing on QoL.
Finally, the current work employed linear regression methods to examine the impact of non-response and patient heterogeneity on estimates of QoL change between health states: linear models have well-developed estimators for dealing with these issues, particularly in a panel data context. However, it should be acknowledged that QoL data have properties which may violate the assumptions underlying linear regression methods: for example, they are bounded above and below and, in the case of the EQ-5D, have a conspicuous ‘ceiling effect’ manifested in a significant proportion of the sample indicating no problems in any of the five domains. Examining the impact of these data characteristics within a panel estimation framework would be an important contribution for future work, but we note here that estimators that exploit within-patient variation become increasingly intractable in the context of non-linearity. In the current study, the primary concern has been not the distributional form, which mainly affects efficiency, but rather consistent estimation of causal effects in light of the potential bias arising from non-response and patient heterogeneity.
In this paper, we report QoL decrements associated with six complications of type 2 diabetes. We draw on the clinical history of patients in our sample from the time of diagnosis of diabetes onwards and QoL data reported over multiple time points. These longitudinal data allow us to test whether sample selection (non-response) matters and to control for the effects of time-invariant omitted variable bias. We find that reliable estimates of QoL decrements depend not only on adequate sample sizes but also on availability of longitudinal data on complications and QoL measures over time.
In the current analysis, and possibly more generally, failing to account for patient heterogeneity and relying on cross-sectional estimation methods is likely to overestimate the impact of complications on QoL. The reason for this is that cross-sectional analysis attributes all differences in QoL between those who do or do not experience events to the events, without taking into account that the two groups could differ in other aspects that are not adjusted for in the estimation model. In contrast, FE estimation only takes into account differences in QoL in individuals who experience events before and after the occurrence of the events of interest and, therefore, mitigates the potential for heterogeneity bias because of time-invariant patient characteristics.
We thank Ruth Coleman and Ian Kennedy from the Diabetes Trial Unit in Oxford for assistance in preparing and interpreting data from the UKPDS data set and Rury Holman for valuable comments. We are grateful to participants at EASD 2010 and Willard Manning at iHEA 2011 for helpful comments on earlier versions of this work. All patients participating in the original study gave consent for information including QoL questionnaires to be collected. The UKPDS main study and post-trial monitoring study were undertaken in full compliance with all consent and ethical requirements. No additional consent or ethical approval was required for this study. The work was supported by a UK Medical Research Council project grant on Disease Modelling (Grant ID: 87386). Alastair Gray is an NIHR Senior Investigator.