Telephone: 615-936-2528; FAX: 615-936-0435
The validity of EQ-5D US preference weights in liver transplant candidates and recipients
Article first published online: 24 DEC 2008
Copyright © 2008 American Association for the Study of Liver Diseases
Volume 15, Issue 1, pages 88–95, January 2009
How to Cite
Russell, R. T., Feurer, I. D., Wisawatapnimit, P. and Pinson, C. W. (2009), The validity of EQ-5D US preference weights in liver transplant candidates and recipients. Liver Transpl, 15: 88–95. doi: 10.1002/lt.21648
- Issue published online: 24 DEC 2008
- Manuscript Accepted: 4 AUG 2008
- Manuscript Received: 14 MAY 2008
- National Institute of Diabetes and Digestive and Kidney Diseases. Grant Number: 1 F32 DK077482-02
- Novartis Pharmaceutical Corp.
Health utility instruments assess patients' valuation of specific health states, and these valuations can be converted to quality-adjusted life years for cost-utility analysis. Data from the EQ-5D, a generic health-related quality of life questionnaire from the EuroQol Group, can be reported as 5 health status scores or as a single health preference weight (HPW). US population–based HPWs were published by Shaw and colleagues in 2005 (Med Care 2005;43:203-220). Our aim was to test the validity of the US EQ-5D HPWs and health status scores in liver transplant patients. EQ-5D scores were converted to HPWs with Shaw et al.'s model. Data were stratified by measurement period: pretransplant period, early posttransplant period (≤12 months), intermediate posttransplant period (13-36 months), and late posttransplant period (>36 months). EQ-5D scores were compared with specific Short Form 36 Health Survey, Center for Epidemiologic Studies Depression Scale, and Beck Anxiety Inventory scores that were identified a priori on the basis of construct similarity. Criterion-related and construct validity were tested with nonparametric methods. Two hundred eighty-five adults participated (113 in the pretransplant period, 60 in the early posttransplant period, 47 in the intermediate posttransplant period, and 65 in the late posttransplant period), and posttransplant follow-up averaged 36 ± 36 months. Eighty-one percent of the hypothesized relationships between the EQ-5D and gold-standard scales were strong (|r| ≥ 0.5, P < 0.001), and the remainder were moderate (0.3 ≤ |r| < 0.5, P < 0.001). Differences in EQ-5D HPWs between the pretransplant period and the intermediate and late posttransplant periods were statistically significant. In conclusion, the EQ-5D dimensions and the health utility index generated from Shaw's US population–based preference weights demonstrated criterion-related and construct validity in liver transplant patients, and the EQ-5D is a valid instrument for cost-utility analysis in this setting. Liver Transpl 15:88–95, 2009. © 2008 AASLD.
Although outcomes after liver transplantation will always center on patient survival, there is increasing interest in and reliance on other metrics that emphasize subjective, patient-reported outcomes. Health-related quality of life (HRQOL) is one such outcome measure; it supplements standard clinical measures by quantifying the benefits and health improvements a patient gains after a costly procedure such as liver transplantation.
Health status and health utility instruments represent 2 conceptual characterizations of HRQOL. Health status instruments, which are usually multiple-item questionnaires, represent dimensions of HRQOL via quantitative rating scales. These characterizations allow us to compare and contrast the HRQOL of patient groups with a specific disease or following an intervention. There are many health status instruments, one of the most frequently used being the Short Form 36 Health Survey (SF-36).1
Health utility instruments differ from health status instruments in that the quantitative description of health status can be expressed as a societally derived health preference weight (HPW) for each possible health state. These HPWs are then combined with the reported time spent in each health state to calculate quality-adjusted life years (QALYs) or QALYs gained.2 By combining HRQOL and time in a single metric, the QALY enables cost comparisons of health care programs and interventions both across treatments for different illnesses and among treatments for the same illness. Several health utility instruments are available, including the Quality of Well-Being Scale,3 the Health Utilities Index,4 and the EQ-5D from the EuroQol Group.5 Comparisons of the approaches to health preference measurement reflected in these instruments have demonstrated that their differing algorithms can lead to different conclusions.6–8
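The QALY arithmetic described above can be sketched in a few lines. This is a minimal illustration, assuming undiscounted QALYs computed as the sum of HPW × years over successive health states (real cost-utility analyses usually also discount future years). The helper names `qalys` and `qalys_gained` and all example durations are hypothetical.

```python
from typing import Iterable, Tuple

# Each period is (health preference weight, years spent in that health state).
Period = Tuple[float, float]


def qalys(periods: Iterable[Period]) -> float:
    """Undiscounted QALYs: sum of HPW x time over the periods (hypothetical helper)."""
    total = 0.0
    for hpw, years in periods:
        if years < 0:
            raise ValueError("time spent in a health state cannot be negative")
        total += hpw * years  # negative HPWs (states worse than death) are allowed
    return total


def qalys_gained(with_intervention: Iterable[Period],
                 comparator: Iterable[Period]) -> float:
    """QALYs gained: QALYs under one scenario minus QALYs under the comparator."""
    return qalys(with_intervention) - qalys(comparator)


# Illustrative only: 2 years at HPW 0.75, then 3 years at HPW 0.83.
print(round(qalys([(0.75, 2.0), (0.83, 3.0)]), 2))  # 3.99
# Versus a comparator of 5 years at HPW 0.75 (3.75 QALYs).
print(round(qalys_gained([(0.75, 2.0), (0.83, 3.0)], [(0.75, 5.0)]), 2))  # 0.24
```

Because the QALY is just weight × time, any bias in the preference weights carries directly into the QALY totals, which is why population-appropriate HPWs matter.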
Health status measures provide a quantitative assessment of HRQOL that can be compared among patient groups or at different time points. However, health status scores cannot be used for cost-utility analysis without associated HPWs. Health utility instruments, which associate an HPW with each health state, permit HRQOL outcome data to be converted into QALYs for use in cost-utility analysis. Until recently, research groups in the United States had to rely on health utility indices whose valuation systems (ie, preference weights) were not representative of the US population: preference weights for the Quality of Well-Being Scale were generated from a regional (San Diego, CA) sample, and Health Utilities Index preference weights were based on a sample from Ontario, Canada. We added the EQ-5D to our HRQOL survey battery shortly after its US population–based preference weights had been developed, and also because of the instrument's brevity.
The EQ-5D health utility instrument characterizes HRQOL in each of 5 dimensions. Together, the 5 dimension scores define 243 possible health states, each identified by a health state descriptor, and each descriptor can then be converted to an HPW on the basis of the general public's valuation of that health state. Prior to 2005, HPWs for this instrument were based on data from various European populations whose valuations have been shown to differ from those of the US population.9 Thus, Shaw and colleagues,2 in a project funded by the Agency for Healthcare Research and Quality, established a set of US population–based preferences using the time trade-off method. Using a multistage probability sample of the general adult US population, they generated a preference-weighting scoring system for the EQ-5D health states. Validation is important because, although the EQ-5D has been used and validated in European populations with specific medical diagnoses and after surgical procedures, neither the core instrument nor the recently published US population–based HPW valuations had been prospectively validated in a US clinical population.
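The time trade-off method mentioned above can be illustrated with its simplest case. In this sketch, which is a simplification and not Shaw et al.'s elicitation protocol or regression model, a respondent who is indifferent between t years in a health state and x years in full health implies a value of x / t for that state. The function name `tto_value` and the 10-year default horizon are assumptions, and states judged worse than death, which require a separate valuation procedure, are not handled.

```python
def tto_value(years_full_health: float, years_in_state: float = 10.0) -> float:
    """Time trade-off value for a health state judged better than death.

    A respondent indifferent between `years_in_state` years in the state and
    `years_full_health` years in full health implies a value of x / t.
    (Hypothetical helper; the 10-year default horizon is an assumption.)
    """
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0 <= years_full_health <= years_in_state:
        raise ValueError("expected 0 <= years_full_health <= years_in_state")
    return years_full_health / years_in_state


# Hypothetical respondent: 7.5 years in full health ~ 10 years in the state.
print(tto_value(7.5))  # 0.75
```

A national tariff such as Shaw et al.'s is then obtained by modeling many such individual valuations across a sample of health states, which this sketch does not attempt.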
Validity is the degree to which an instrument measures what it purports to measure.10 The psychometric literature describes specific methods for establishing each of the different types of validity, which include face validity, content validity, criterion-related validity, and construct validity.11–13 Different types of validity, their meanings and implications, and methods for establishing each are outlined in Table 1. The aim of this study was to examine the criterion-related and construct validity of the EQ-5D and its US HPWs in liver transplant candidates and recipients. In particular, evidence supporting the validity of the EQ-5D US HPWs would enable their future use in cost-utility analyses for liver transplant candidates and recipients.
|Type of Validity||Interpretation||Methodology|
|Face validity||The extent to which questions or items use appropriate terminology to address the content area(s) and will be understood by likely survey respondents||Nonstatistical; expert review|
|Content validity||Establishes the breadth of construct representation of the survey||Nonstatistical; expert review|
|Criterion-related validity||Investigates relationships between a target survey and other gold-standard instruments measuring similar constructs||Measures of association (correlation coefficients, effect sizes)|
|Construct validity||A. Examines the effect of a relevant event (eg, transplantation) on survey scores: does the target survey behave as other validated measures do?||A. Analysis of variance methods, nonparametric comparisons for continuous data, nonparametric tests for cross-classified categorical data|
|Construct validity (continued)||B. Assesses the responsiveness of a survey to detect change over time (if longitudinal data are available)||B. Within-subject tests for longitudinal data, summary measures of effect size, change scores, Guyatt statistics|
PATIENTS AND METHODS
Patient Sample and Data Collection
The analyses in this study are based on a cross-sectional sample of liver transplant candidates and recipients who completed the EQ-5D survey over a 15-month period (September 2006 through November 2007). In light of Shaw et al.'s 2005 report,2 the EQ-5D was added to an existing battery of HRQOL surveys in September 2006. The institutional review board–approved protocol involved the administration of a battery of surveys and the collection of demographic and clinical data from transplant center and medical center databases and records, and its approval covered analyses of the surveys' psychometric characteristics. Demographic and clinical data included age, sex, race, primary indication for liver transplantation, educational attainment, and physiologic Model for End-Stage Liver Disease (MELD) score.
The HRQOL assessment battery, which has been previously described14 and takes about 30 minutes to complete, included the EQ-5D, SF-36, Center for Epidemiologic Studies Depression Scale (CES-D), and Beck Anxiety Inventory (BAI). Data were collected at specific time points: at initial evaluation, every 6 months while a patient was on the waiting list, and 1 month, 3 months, 6 months, and yearly posttransplant. A rolling enrollment system allowed a patient to participate at any time point in his or her pretransplant or posttransplant course, regardless of whether he or she had participated previously.
In order to be included in this study, each patient was required to have a complete EQ-5D survey, which allowed the conversion of the 5-digit health descriptor into an HPW. Because this was a validation study, in order to maximize the likelihood that data represented an initial encounter with the instrument, if a participant completed surveys on more than one occasion, only data from the first EQ-5D survey were used. This approach avoided any potential recall effect in relation to the instrument being studied and ensured statistical independence between observations.
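The first-survey rule above amounts to keeping each participant's earliest record. The sketch below assumes a hypothetical record layout (`Survey` with a patient identifier, completion date, and 5-digit descriptor); the study's actual database structure is not described here.

```python
from datetime import date
from typing import Dict, List, NamedTuple


class Survey(NamedTuple):
    """Hypothetical survey record; field names are assumptions."""
    patient_id: str
    completed_on: date
    descriptor: str  # 5-digit EQ-5D health state, e.g. "21121"


def first_surveys(surveys: List[Survey]) -> List[Survey]:
    """Keep only each participant's earliest EQ-5D survey (repeats are dropped)."""
    earliest: Dict[str, Survey] = {}
    for s in surveys:
        current = earliest.get(s.patient_id)
        if current is None or s.completed_on < current.completed_on:
            earliest[s.patient_id] = s
    # Return in chronological order for readability.
    return sorted(earliest.values(), key=lambda s: (s.completed_on, s.patient_id))


surveys = [
    Survey("A", date(2006, 10, 2), "21121"),
    Survey("B", date(2006, 11, 15), "11111"),
    Survey("A", date(2007, 4, 20), "11111"),  # repeat observation: excluded
]
print([s.patient_id for s in first_surveys(surveys)])  # ['A', 'B']
```

Keeping one observation per person is what makes the observations statistically independent for the rank-based tests that follow.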
This study focused on the EQ-5D and how its determinations of HRQOL in liver transplant patients compared to previously validated measures in this population. The EQ-5D descriptive system consists of 5 dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has a 3-level response format reflecting “no health problems (1),” “moderate health problems (2),” and “extreme health problems (3).” Thus, each of the 243 (3⁵) unique health states has an associated 5-digit descriptor ranging from 11111 for perfect health to 33333 for the worst possible state.15 This health descriptor is then converted, with Shaw's model, into a US population–based HPW that can range from −0.109 (a state worse than death) to 1.0 (perfect health).2 The EQ-5D also includes a visual analogue scale (VAS) by which respondents can rate their overall health on a continuous scale representing the worst state of health (0) to perfect health (100).
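The 3⁵ structure of the descriptive system can be made concrete with a short sketch that enumerates every 5-digit descriptor and maps a descriptor back to its dimension levels. The helper names are hypothetical, and the conversion of a descriptor to an HPW is deliberately omitted because Shaw et al.'s model coefficients are not reproduced in this article.

```python
from itertools import product
from typing import Dict, List

# Dimension order follows the 5-digit descriptor.
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")


def all_health_states() -> List[str]:
    """All 3**5 = 243 EQ-5D descriptors, in order from 11111 to 33333."""
    return ["".join(levels) for levels in product("123", repeat=len(DIMENSIONS))]


def parse_descriptor(descriptor: str) -> Dict[str, int]:
    """Map a descriptor such as '21131' to the level (1-3) reported per dimension."""
    if len(descriptor) != len(DIMENSIONS) or any(c not in "123" for c in descriptor):
        raise ValueError(f"invalid EQ-5D descriptor: {descriptor!r}")
    return {dim: int(c) for dim, c in zip(DIMENSIONS, descriptor)}


states = all_health_states()
print(len(states), states[0], states[-1])  # 243 11111 33333
print(parse_descriptor("21131"))
# {'mobility': 2, 'self-care': 1, 'usual activities': 1, 'pain/discomfort': 3, 'anxiety/depression': 1}
```

The same validation check is a natural place to flag incomplete surveys, since a descriptor with a missing dimension cannot be converted to an HPW.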
The SF-36 was used to assess generic physical and mental HRQOL. This questionnaire measures 8 areas of functioning and well-being (role physical, bodily pain, physical functioning, general health, vitality, social functioning, role emotional, and mental health). The physical and mental component summary scales can then be computed as weighted composites of the 8 subscales, with possible scores ranging from 0 to 100 and higher scores indicating a better health state.16 These scales and component summaries served as gold-standard criterion measures for EQ-5D dimension and HPW scores.
The CES-D and BAI provided additional mental health–related criterion measures for validating the EQ-5D. The CES-D scale is a 20-item self-report instrument designed to identify symptoms of depression in both clinical and general populations.17 The BAI was employed to measure the severity of self-reported symptoms of anxiety. This instrument consists of 21 items, each describing a common symptom of anxiety.18 Both instruments employ a 1-week recall period, and higher scores reflect greater symptom severity.
The 5-digit EQ-5D health descriptors were computed and converted to HPWs with Shaw et al.'s model.2 Data were then stratified by measurement period: pretransplant period, early posttransplant period (≤12 months), intermediate posttransplant period (13-36 months), and late posttransplant period (>36 months). Criterion-related validity was tested by examining the strength of associations between the EQ-5D scores (HPWs and scores for each of the 5 dimensions) and selected, relevant scores from the SF-36, CES-D, and BAI. A priori hypotheses specified which associations between the EQ-5D and the criterion measures would be expected to exhibit strong correlation coefficients or effect sizes (|r| ≥ 0.5; Table 2). These hypotheses were generated after a careful review of the manuals for our gold-standard instruments to compare their construct similarity to the 5 EQ-5D dimensions; domains from the gold standards that measured constructs similar to EQ-5D dimensions were hypothesized to have strong correlations. Validity coefficients were computed as nonparametric Spearman rank correlation coefficients and were adjusted for attenuation. This adjustment accounts for the previously established reliability of the target (EQ-5D) and criterion tests.19
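The validity coefficients described above combine two steps: a Spearman rank correlation and a correction for attenuation. The sketch below assumes the classical correction r / √(r_xx · r_yy), where r_xx and r_yy are the reliability coefficients of the two instruments, applied directly to the rank correlation. The paired scores and reliability values are hypothetical, not the study's, and the exact reliability estimates the authors used are not stated here.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j are 0-based
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


def correct_for_attenuation(r: float, rel_x: float, rel_y: float) -> float:
    """Classical correction r / sqrt(rel_x * rel_y); may exceed 1 in magnitude."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliability coefficients must lie in (0, 1]")
    return r / sqrt(rel_x * rel_y)


# Hypothetical paired scores and reliabilities, not study values.
rho = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(rho)  # 0.8
print(round(correct_for_attenuation(rho, 0.80, 0.90), 3))  # 0.943
```

Because the correction divides by a number below 1, adjusted coefficients are always at least as large in magnitude as the raw ones, so the reliabilities fed into it deserve as much scrutiny as the correlations themselves.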
Because even small correlations reach statistical significance in samples larger than 200, our primary evidence of criterion-related validity was the magnitude of the correlation coefficient rather than statistical significance alone, since a significant correlation may still reflect an effect too small to support criterion-related validity. In agreement with Cohen's interpretation of effect sizes for measures of this type20 and with several EQ-5D validation studies in different patient populations,21–23 we considered absolute Spearman correlations of ≥0.5 to represent large (strong) effects and those of 0.3 to <0.5 to represent moderate effects. Although statistical power was not the primary consideration for testing criterion-related validity, statistical power was prospectively established for the construct validation phase. On the basis of preliminary data generated during the first 6 months of EQ-5D data collection, 43 patients in each of the 4 measurement periods would provide adequate power (80% at a 2-sided alpha level of 0.05) to detect a moderate difference in mean HPW scores (0.08, given a within-group standard deviation of 0.16). This difference represents an effect size of 0.5 standard deviations, a generally accepted threshold for clinically relevant effects in the HRQOL literature.24
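The decision rules in the paragraph above can be written down directly. This sketch encodes only the thresholds stated in the text (|r| ≥ 0.5 strong, 0.3 ≤ |r| < 0.5 moderate) plus the standardized difference 0.08 / 0.16; the label for values below 0.3 and both function names are assumptions, and no power calculation is reproduced here.

```python
def classify_correlation(r: float) -> str:
    """Label |r| with the thresholds stated in the text (hypothetical helper)."""
    magnitude = abs(r)
    if magnitude >= 0.5:
        return "strong"
    if magnitude >= 0.3:
        return "moderate"
    return "below moderate"  # not classified in the text; label is an assumption


def standardized_difference(mean_diff: float, sd: float) -> float:
    """Effect size in SD units: difference in means / within-group SD."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return mean_diff / sd


print(standardized_difference(0.08, 0.16))  # 0.5
print(classify_correlation(-0.75), classify_correlation(0.31))  # strong moderate
```

Using the absolute value matters because several hypothesized associations (for example, with the CES-D and BAI, where higher scores mean worse symptoms) are expected to be negative.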
To evaluate the cross-sectional construct validity of the EQ-5D, we assessed the effect of liver transplantation on the EQ-5D health utility index, VAS, and the 5 health dimensions by measurement period (pretransplant period versus early, intermediate, and late posttransplant periods). The pretransplant EQ-5D HPWs and VAS scores were compared to scores for each posttransplant period with the Mann-Whitney U test, with P values ≤ 0.05 considered significant. The chi-square test of proportions was used to test the effect of the measurement period on the proportion of patients with no problems versus any problems on each EQ-5D dimension. Data are summarized as the mean ± standard deviation, median and interquartile range, or percentages.
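A pretransplant versus posttransplant comparison with the Mann-Whitney U test can be sketched as follows. This version uses the normal approximation with the tie-corrected variance and no continuity correction; statistical packages may instead use exact P values for small samples, so results can differ slightly. The function names and the HPW values are hypothetical, not study data.

```python
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def _ranks_with_ties(values: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Average ranks (1-based) and the size of each group of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes: List[int] = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """U statistic for x and a two-sided P value from the normal approximation.

    Uses the tie-corrected variance and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks, ties = _ranks_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:  # every observation tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    return u1, erfc(abs(z) / sqrt(2))  # two-sided standard normal tail


# Hypothetical HPWs, not study data.
pre = [0.55, 0.69, 0.78, 0.78, 0.84]
post = [0.78, 0.82, 0.84, 0.92, 1.0]
u, p = mann_whitney_u(pre, post)
print(u, round(p, 3))
```

A rank-based test suits these data because, as the Results describe, HPWs are negatively skewed and bounded at 1.0.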
RESULTS
Two hundred eighty-five patients completed EQ-5D surveys between September 2006 and November 2007 (113 in the pretransplant period, 60 in the early posttransplant period, 47 in the intermediate posttransplant period, and 65 in the late posttransplant period). Two subjects were excluded for not completing the EQ-5D survey. Seventy-six percent of the sample completed all criterion survey scales, and 94% completed all but one. Data represented respondents' only (218 subjects) or initial (67 subjects) EQ-5D survey response during this data collection period. Eighty-seven repeat EQ-5D observations, including those from the 14 individuals with both pretransplant and posttransplant data, were not analyzed. The sample was predominantly male and Caucasian, and the most prevalent indication for liver transplantation was noncholestatic (hepatitis B, hepatitis C, or alcoholic) cirrhosis (Table 3). Follow-up for the posttransplant patients averaged 36 ± 36 months (range: 0.8-133 months); mean follow-up was 4 ± 4 months for the early posttransplant group, 21 ± 7 months for the intermediate posttransplant group, and 76 ± 26 months for the late posttransplant group. Age and the proportion of males did not differ among the 4 measurement period–specific groups.
|Variable||Mean ± SD or %|
|Age at transplant||53.3 ± 10.4|
|Physiologic MELD||21.3 ± 7.9|
|Education: grade school (0-8 years)||12%|
|Education: high school (9-12 years)||49%|
|Education: college or beyond||35%|
|Posttransplant follow-up (months)||36 ± 36|
|Follow-up, early posttransplant period (≤12 months)||4 ± 4|
|Follow-up, intermediate posttransplant period (13-36 months)||21 ± 7|
|Follow-up, late posttransplant period (>36 months)||76 ± 26|
All the a priori hypothesized associations between scales or dimensions measuring similar constructs were confirmed by strong or moderate attenuation-adjusted Spearman rank correlation coefficients that ranged in absolute value from 0.31 to 0.75 (all P < 0.001; Table 2). Eighty-one percent (17 of 21) of these associations were strong, with coefficients ranging in absolute value from 0.50 to 0.75 (all P < 0.001). The strongest associations were between the EQ-5D HPW and the 2 component summaries of the SF-36. The remaining hypothesized associations were moderate and ranged in absolute value from 0.31 to 0.47 (all P < 0.001). Four strong associations (noted by Σ in Table 2) were not anticipated: the usual activities dimension of the EQ-5D correlated strongly with the physical functioning, bodily pain, and general health scales of the SF-36, and the anxiety/depression dimension of the EQ-5D was strongly associated with the SF-36 social functioning scale.
Table 4 displays the average EQ-5D HPWs by measurement period. The pretransplant group had the lowest HPW scores (0.75 ± 0.19), whereas the intermediate posttransplant group had the highest (0.83 ± 0.11). Differences in HPW between the pretransplant period and the intermediate and late posttransplant periods were statistically significant (P = 0.01 and P = 0.006, respectively) and represented clinically relevant, moderate effects consistent with documented effects of liver transplantation on other HRQOL measures. Figure 1 displays the distribution of HPW scores by period. Overall, EQ-5D HPWs were negatively skewed, with most values toward the upper end of the scale. The greatest variability (range: 0.05-1.0), the lowest mean and median values, and the greatest number of outlier observations occurred in the pretransplant period. HPWs in the early and intermediate posttransplant periods had tighter distributions and fewer outliers. In the late posttransplant period, the 75th percentile value was 1.0, indicating that at least 25% of recipients in this period reported perfect health. However, the distribution of scores in the late posttransplant period widened relative to the preceding periods, with more observations in the lower range of possible HPW scores.
|Period (n)||Mean HPW||SD||95% CI||Median (IQR)|
|Pretransplant (113)||0.746||0.187||0.712–0.780||0.778 (0.689, 0.843)|
|Early posttransplant (60)||0.765||0.145||0.725–0.804||0.779 (0.708, 0.843)|
|Intermediate posttransplant (47)*||0.832||0.112||0.797–0.866||0.816 (0.778, 0.918)|
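|Late posttransplant (65)*||0.817||0.164||0.781–0.858||0.827 (0.778, 1.0)|
*Significantly different from the pretransplant period (P = 0.01 for the intermediate and P = 0.006 for the late posttransplant period).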
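The columns of Table 4 (mean, SD, 95% CI, median, IQR) can be computed from raw HPWs with Python's standard `statistics` module. This is a sketch under stated assumptions: the 95% CI uses a normal critical value of 1.96 by default (a t critical value can be passed instead), and quartiles use the module's default "exclusive" method. The software and quantile definition used by the authors are not stated, so small discrepancies from the published values are possible. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, median, quantiles, stdev
from typing import Dict, Sequence, Tuple


def ci_from_summary(m: float, sd: float, n: int, crit: float = 1.96) -> Tuple[float, float]:
    """95% CI for a mean from summary statistics (normal critical value by default)."""
    half_width = crit * sd / sqrt(n)
    return m - half_width, m + half_width


def summarize(hpws: Sequence[float], crit: float = 1.96) -> Dict[str, object]:
    """Mean, SD, 95% CI, median, and IQR (Q1, Q3) for one measurement period."""
    if len(hpws) < 2:
        raise ValueError("need at least 2 observations")
    m, sd = mean(hpws), stdev(hpws)
    q1, _, q3 = quantiles(hpws, n=4)  # default method='exclusive'
    return {
        "n": len(hpws),
        "mean": m,
        "sd": sd,
        "ci95": ci_from_summary(m, sd, len(hpws), crit),
        "median": median(hpws),
        "iqr": (q1, q3),
    }


# Pretransplant row of Table 4: mean 0.746, SD 0.187, n = 113.
low, high = ci_from_summary(0.746, 0.187, 113)
print(f"{low:.3f}-{high:.3f}")  # 0.712-0.780
```

With the pretransplant summary values, the normal approximation gives an interval matching the reported 0.712–0.780; for the smaller posttransplant groups a t critical value would widen the interval slightly.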
Similar effects of the measurement period were evident in the EQ-5D VAS data (Table 5). VAS scores in the pretransplant period were significantly lower than those in each of the posttransplant periods (all P ≤ 0.001).
|Period (n)||Mean VAS||SD||95% CI||Median (IQR)|
|Pretransplant (85)||64.9||24.7||60.0–69.9||70 (50, 90)|
|Early posttransplant (49)*||76.3||14.1||72.5–80.1||80 (70, 85)|
|Intermediate posttransplant (38)*||80.8||17.7||75.3–86.3||84.5 (74, 95)|
|Late posttransplant (58)*||79.9||18.6||75.1–84.9||82.5 (70, 95)|
*Significantly different from the pretransplant period (P ≤ 0.001).
The percentage of patients reporting problems in several EQ-5D dimensions varied across measurement periods. When responses were dichotomized as “no problems” and “any problems,”25 the EQ-5D dimensions of mobility, self-care, and usual activities showed statistically significant differences across measurement periods in the proportion of patients reporting problems, with reported problems declining from the pretransplant period to the late posttransplant period (Table 6). The pain/discomfort and anxiety/depression dimensions, which had lower event frequencies at all time points, showed a pattern of fewer reported problems after transplantation, but these effects were small and not statistically significant.
|EQ-5D Dimension||Pretransplant Period||Early Posttransplant Period||Intermediate Posttransplant Period||Late Posttransplant Period||Chi-Square P|
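The dichotomized analysis described above can be sketched as a Pearson chi-square test on a 4 × 2 table (measurement period × no problems/any problems), which has 3 degrees of freedom. The closed-form upper tail used here is exact for 3 df only. The function names and the level-1/2/3 responses are hypothetical; with counts this small, several expected cells fall below 5, so in practice an exact test would be preferable for data like these.

```python
from math import erfc, exp, pi, sqrt
from typing import List, Sequence, Tuple


def dichotomize(levels: Sequence[int]) -> Tuple[int, int]:
    """Counts of 'no problems' (level 1) and 'any problems' (level 2 or 3)."""
    no_problems = sum(1 for level in levels if level == 1)
    return no_problems, len(levels) - no_problems


def chi_square_k_by_2(table: List[Tuple[int, int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a k x 2 table."""
    row_totals = [a + b for a, b in table]
    col_totals = (sum(a for a, _ in table), sum(b for _, b in table))
    grand = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand  # zero margins are not handled
            stat += (observed - expected) ** 2 / expected
    return stat, len(table) - 1


def chi2_sf_3df(x: float) -> float:
    """Upper-tail P value of a chi-square distribution with 3 df (closed form)."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


# Hypothetical responses (levels 1-3) for one dimension in each of 4 periods.
periods = {
    "pretransplant": [1, 2, 2, 3, 1, 2, 2, 1],
    "early": [1, 1, 2, 2, 1, 2, 1, 1],
    "intermediate": [1, 1, 1, 2, 1, 1, 2, 1],
    "late": [1, 1, 1, 1, 2, 1, 1, 1],
}
table = [dichotomize(levels) for levels in periods.values()]
stat, df = chi_square_k_by_2(table)
print(table, round(stat, 3), df, round(chi2_sf_3df(stat), 3))
```

Collapsing levels 2 and 3 trades detail for adequate cell counts, which is the usual motivation for the "no problems" versus "any problems" split.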
DISCUSSION
This is the first study to examine health utility scores of liver transplant candidates and recipients using the US population–based HPWs for the EQ-5D. Previous studies using health status instruments have shown significant differences or improvement in HRQOL in these patients from the pretransplant period to the posttransplant period.26–28 However, valid health utility measures, not traditional health status scores, are necessary to convert HRQOL outcomes to QALYs for the assessment of cost-utility. Prior to 2005, EQ-5D HPWs were based on European data that were likely not representative of the US population. Shaw and colleagues2 then established a set of US-based societal preference weights for EQ-5D scores, and Johnson and colleagues29 subsequently reported meaningful differences between the US and UK EQ-5D valuations, demonstrating the importance of using appropriate population-based standards.
There is, however, concern that generic health utility questionnaires such as the EQ-5D may fail to capture relevant differences or changes in health status within specific patient groups. For example, European patients' EQ-5D self-evaluations and HPWs had weak discriminative ability across differing severities of osteoarthritis.30 For this reason, the reliability and validity of the core instrument and its European population–based HPW valuations have been established in the general public and in many clinical samples.25, 31–34 This study aimed to validate the EQ-5D with the recently generated US HPWs to enable future cost-utility analysis in liver transplantation.
Our data support the EQ-5D as a valid instrument for assessing generic HRQOL and HPWs in liver transplant candidates and recipients. We demonstrated criterion-related validity for the EQ-5D through a theoretically anticipated pattern of moderate to strong correlations with relevant constructs represented by generic and specific HRQOL instruments. Although 4 strong associations were not anticipated, in retrospect the construct similarity between these EQ-5D dimensions and the corresponding criterion scales is reasonable.
Construct validity was demonstrated, in that the EQ-5D detected differences in HRQOL of greater than one-half of a standard deviation from the pretransplant period to the posttransplant period that are consistent with those reported with other HRQOL instruments. Whiting et al.35 reported improvements in most of the SF-36 domains, except bodily pain, in a sample of 84 patients before and after liver transplantation. Bravata and coauthors,28 in a meta-analysis of 7 liver transplant studies, indicated good overall performance on SF-36 mental subscales, with little change due to transplantation, and significant improvement after transplantation on several physical subscales. Finally, we have recently reported longitudinal analyses in a prospective cohort of liver transplant patients, demonstrating a similar magnitude of change (more than half of a standard deviation) after liver transplantation in HRQOL as measured by the SF-36, BAI, and CES-D.26 Consistent with the health status measures used in earlier studies, the EQ-5D detected significant differences in overall HRQOL and in individual domains from the pretransplant period to the posttransplant period, but only subtle differences were appreciated in the EQ-5D domains characterizing mental HRQOL.
A promising feature of the EQ-5D demonstrated in this study is the breadth of health states reported by liver transplant candidates and recipients. The overall range of HPWs, from 0.049 to 1.0, suggests that the EQ-5D can discriminate among health states within our liver transplant sample. We anticipate that our findings can be generalized to other US liver transplant samples with similar demographics. Although we do not anticipate that race would affect the validity of EQ-5D HPWs, our sample was not sufficiently heterogeneous to test this potential effect.
This study has several limitations. We did not determine the longitudinal responsiveness of the EQ-5D because we have not yet accumulated enough longitudinal data to support that analysis with adequate statistical power. Despite this limitation, we tested the construct validity of this instrument using a cross-sectional design and demonstrated its ability to detect differences in liver transplant candidates and recipients that are comparable to the effects detected by other HRQOL instruments with longitudinal data.26 Given the overall congruence between reports using cross-sectional and longitudinal designs, we anticipate that longitudinal responsiveness will be demonstrated in the future and do not think it is necessary to delay the application of the US preference weights in cost-utility analyses. We also recognize the potential for bias in our study. Selection bias is possible in any study of patients undergoing a surgical procedure with associated morbidity and mortality, because perioperative morbidity or death may preclude affected patients from participating. Furthermore, responder bias is always a consideration in survey research. Our data collection method, which times assessments to coincide with on-site clinical visits, has yielded an overall participation rate close to 80%, which we believe adequately represents liver transplant candidates and recipients at our institution.
In conclusion, this is the first study to assess criterion-related and construct validity of the EQ-5D and its US population–based HPWs in the liver transplant setting. We conclude that the EQ-5D is valid for evaluating generic HRQOL in this population and that EQ-5D scores can be converted with Shaw et al.'s model2 into valid HPWs suitable for future cost-utility analyses in these patients.
- 1. SF-36 Health Survey: Manual & Interpretation Guide. Lincoln, RI: Quality Metric; 2000.
- 3. The quality of well being scale: rationale for a single quality of life index. In: Walker S, Rosser RM, eds. Quality of Life: Assessment and Application. London, United Kingdom: MTP Press; 1998: 51–77.
- 10. Scale Development: Theory and Applications. Thousand Oaks, CA: Sage Publications; 2003.
- 14. Incorporating quality of life and patient satisfaction measures into a transplant outcomes assessment program: technical and practical considerations. Prog Transplant 2007; 17: 121–128.
- 15. Guidelines for analyzing and reporting EQ-5D outcomes. In: Brooks R, Rabin R, de Charro F, eds. The Measurement and Valuation of Health Status Using EQ-5D: The European Perspective. Dordrecht, The Netherlands: Kluwer Academic Publishers; 2003.
- 16. SF-36 Physical and Mental Health Summary Scales: A Users' Manual. Boston, MA: New England Medical Center; 1994.
- 18. Beck Anxiety Inventory Manual. San Antonio, TX: Psychological Corp.; 1993.
- 19. Introduction to Measurement Theory. Monterey, CA: Brooks/Cole Publishing Company; 1979.
- 20. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.; 1988.
- 35. Clinical determinants of health-related quality of life in recipients of solid organ transplants. J Surg Outcomes 1999; 2: 21–26.