Although outcomes after liver transplantation will always be centered around patient survival, there is increasing interest in and reliance on other metrics that emphasize subjective patient-reported outcomes. Health-related quality of life (HRQOL) represents one such outcome measure, which provides an important supplement to standard clinical measures to quantify the benefits and/or health improvements gained by a patient after a costly procedure such as liver transplantation.
Health status and health utility instruments represent 2 conceptual characterizations of HRQOL. Health status instruments, which are usually multiple-item questionnaires, represent dimensions of HRQOL via quantitative rating scales. These characterizations allow us to compare and contrast the HRQOL of patient groups with a specific disease or following an intervention. There are many health status instruments, one of the most frequently used being the Short Form 36 Health Survey (SF-36).1
Health utility instruments differ from health status instruments in that the quantitative description of health status can be expressed as a societally derived health preference weight (HPW) for each possible health state. These HPWs are then combined with the reported time spent in each health state to calculate the outcome known as quality-adjusted life years (QALYs) or QALYs gained.2 The QALY metric combines HRQOL and time into a single measure, which allows the costs of health care programs and interventions to be compared both between treatments for different illnesses and among treatments for the same illness. There are several health utility instruments available, including the Quality of Well-Being Scale,3 the Health Utility Index,4 and the EQ-5D from the EuroQoL Group.5 Comparisons of the approaches to health preference measurement reflected in these instruments have shown that their differing scoring algorithms can lead to different conclusions.6–8
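The QALY calculation described above amounts to a weighted sum of time: each period spent in a health state contributes its HPW multiplied by its duration. The sketch below is a minimal illustration under stated assumptions: durations are in years, future years are not discounted, and the weights and durations are hypothetical rather than drawn from this study.

```python
# Minimal QALY sketch: sum of (health preference weight x years in state).
# Assumptions: no discounting; all weights and durations are hypothetical.

def qalys(segments):
    """Return total QALYs for a list of (hpw, years) segments."""
    total = 0.0
    for hpw, years in segments:
        if years < 0:
            raise ValueError("time spent in a health state cannot be negative")
        total += hpw * years
    return total


def qalys_gained(with_intervention, without_intervention):
    """QALYs gained = QALYs with the intervention minus QALYs without it."""
    return qalys(with_intervention) - qalys(without_intervention)


# Hypothetical course: 1 year at HPW 0.6, then 4 years at HPW 0.9
with_tx = [(0.6, 1.0), (0.9, 4.0)]
# Hypothetical comparator: 5 years at HPW 0.6
without_tx = [(0.6, 5.0)]

print(round(qalys(with_tx), 2))                    # 4.2
print(round(qalys_gained(with_tx, without_tx), 2))  # 1.2
```

In a cost-utility analysis, the incremental cost of the intervention would then be divided by the QALYs gained; that step is outside the scope of this sketch.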
Health status measures provide a quantitative assessment of HRQOL that can be compared among patient groups or at different time points. However, health status scores cannot be used for cost-utility analysis without associated HPWs. Health utility instruments, which associate an HPW with each health state, permit HRQOL outcome data to be converted into QALYs for use in cost-utility analysis. Until recently, research groups within the United States had to rely on health utility indices having valuation systems (ie, preference weights) that were not representative of the US population. Preference weights for the Quality of Well-Being Scale were generated from a regional (San Diego, CA) sample, and Health Utility Index preference weights were based on a sample from Ontario, Canada. We added the EQ-5D to our HRQOL survey battery shortly after its US population–based preference weights had been developed and in recognition of the instrument's brevity.
The EQ-5D health utility instrument characterizes HRQOL in each of 5 dimensions. Together, the responses to these dimensions define 243 possible health state descriptors, and each descriptor can then be converted to an HPW on the basis of the general public's valuation of that health state. Prior to 2005, HPWs for this instrument were based on data from various European populations whose preferences have been shown to differ from those of the US population.9 To address this, Shaw and colleagues,2 in a project funded by the Agency for Healthcare Research and Quality, established a set of US population–based preferences using the time trade-off method. Using a multistage probability sample of the general adult US population, they generated a preference-weighting scoring system for the EQ-5D health states. Validation in a US clinical population is therefore important: the EQ-5D has been used and validated in European populations with specific medical diagnoses and after surgical procedures, but the core instrument and the recently published US population–based HPW valuations have never been prospectively validated in a US clinical population.
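The time trade-off method mentioned above values a health state by asking how many years in full health a respondent considers equivalent to a longer period in that state. For states judged better than death, the standard valuation is the ratio of the two durations. The sketch below shows only that basic ratio; the 10-year horizon in the example is an assumption, states worse than death require a different elicitation that is not shown, and the regression model Shaw and colleagues fitted to such valuations is not reproduced here.

```python
# Basic time trade-off (TTO) valuation for a state judged better than death.
# Assumption: the respondent is indifferent between `years_in_state` years
# in the health state and `years_full_health` years in full health.

def tto_utility(years_full_health, years_in_state):
    """Return u = years_full_health / years_in_state (between 0 and 1)."""
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0 <= years_full_health <= years_in_state:
        raise ValueError("expected 0 <= years_full_health <= years_in_state")
    return years_full_health / years_in_state


# Hypothetical respondent: 10 years in the state is judged equivalent
# to 7.5 years in full health.
print(tto_utility(7.5, 10))  # 0.75
```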
Validity is the degree to which an instrument measures what it purports to measure.10 The psychometric literature describes specific methods for establishing each of the different types of validity, which include face validity, content validity, criterion-related validity, and construct validity.11–13 Different types of validity, their meanings and implications, and methods for establishing each are outlined in Table 1. The aim of this study was to examine the criterion-related and construct validity of the EQ-5D and its US HPWs in liver transplant candidates and recipients. In particular, evidence supporting the validity of the EQ-5D US HPWs would support their future use in cost-utility analyses for liver transplant candidates and recipients.
Table 1. Types of Survey Validity, Their Interpretation, and Relevant Methodology
Type of Validity | Interpretation | Methodology
Face validity | The extent to which questions or items use appropriate terminology to address content area(s) and will be understood by likely survey respondents | Nonstatistical; expert review
Content validity | Establishes the breadth of construct representation of the survey | Nonstatistical; expert review
Criterion-related validity | Investigates relationships between a target survey and other gold-standard instruments measuring similar constructs | Measures of association (correlation coefficients, effect sizes)
Construct validity | A. Examines the effect of a relevant event (eg, transplantation) on survey scores. Does the target survey behave as other validated measures do? | A. Analysis of variance methods, nonparametric comparisons for continuous data, nonparametric tests for cross-classified categorical data
Construct validity (continued) | B. Assesses the responsiveness of a survey to detect change over time (if longitudinal data are available) | B. Within-subject tests for longitudinal data, summary measures of effect size, change scores, Guyatt statistics
Abbreviations: BAI, Beck Anxiety Inventory; BP, bodily pain; CES-D, Center for Epidemiologic Studies Depression Scale; CI, confidence interval; GH, general health; HPW, health preference weight; HRQOL, health-related quality of life; IQR, interquartile range; MCS, mental component summary; MELD, Model for End-Stage Liver Disease; MH, mental health; NASH, nonalcoholic steatohepatitis; PCS, physical component summary; PF, physical function; QALY, quality-adjusted life year; RE, role emotional; RP, role physical; SD, standard deviation; SF, social function; SF-36, Short Form 36 Health Survey; VAS, visual analogue scale; VT, vitality.
PATIENTS AND METHODS
Patient Sample and Data Collection
The analyses in this study are based on a cross-sectional sample of liver transplant candidates and recipients who completed the EQ-5D survey over a 15-month period (September 2006 through November 2007). Following Shaw et al.'s 2005 report,2 the EQ-5D was added to an existing battery of HRQOL surveys in September 2006. This institutional review board–approved protocol involved the administration of a battery of surveys, the collection of demographic and clinical data from transplant center and medical center databases and records, and analyses of the survey's psychometric characteristics. Demographic and clinical data included age, sex, race, primary indication for liver transplantation, educational attainment, and physiologic Model for End-Stage Liver Disease (MELD) score.
The HRQOL assessment battery, which has been previously described14 and takes about 30 minutes to complete, included the EQ-5D, SF-36, Center for Epidemiologic Studies Depression Scale (CES-D), and Beck Anxiety Inventory (BAI). Data collection occurred at specific time points: at initial evaluation, every 6 months while a patient was on the waiting list, and 1 month, 3 months, 6 months, and yearly post-transplant. A rolling enrollment system allowed for a patient to participate at any time point in his or her pretransplant or posttransplant course, regardless of whether he or she had participated previously.
In order to be included in this study, each patient was required to have a complete EQ-5D survey, which allowed the conversion of the 5-digit health descriptor into an HPW. Because this was a validation study, in order to maximize the likelihood that data represented an initial encounter with the instrument, if a participant completed surveys on more than one occasion, only data from the first EQ-5D survey were used. This approach avoided any potential recall effect in relation to the instrument being studied and ensured statistical independence between observations.
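The first-survey rule above can be expressed as a simple filter over survey records. The sketch below is illustrative only: the record layout (keys `patient_id`, `date`, and `descriptor`) is hypothetical, and it assumes that incomplete surveys are discarded before each patient's earliest remaining survey is selected, which is one reasonable reading of the inclusion criteria rather than a confirmed detail of the authors' procedure.

```python
from datetime import date

# Keep one observation per patient: the earliest complete EQ-5D survey.
# Hypothetical record layout: {"patient_id", "date", "descriptor"},
# where descriptor is None when the EQ-5D survey was incomplete.

def first_complete_surveys(surveys):
    first = {}
    for s in surveys:
        if s["descriptor"] is None:  # incomplete survey: no HPW can be computed
            continue
        pid = s["patient_id"]
        if pid not in first or s["date"] < first[pid]["date"]:
            first[pid] = s
    return list(first.values())


# Hypothetical records: patient A surveyed twice, patient B incomplete.
records = [
    {"patient_id": "A", "date": date(2007, 3, 1), "descriptor": "11121"},
    {"patient_id": "A", "date": date(2006, 10, 2), "descriptor": "21221"},
    {"patient_id": "B", "date": date(2006, 9, 15), "descriptor": None},
]
kept = first_complete_surveys(records)
print([(r["patient_id"], r["date"].isoformat()) for r in kept])  # [('A', '2006-10-02')]
```

Keeping only one observation per patient is what makes the later correlation and between-group tests operate on independent observations.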
This study focused on the EQ-5D and how its determinations of HRQOL in liver transplant patients compared to previously validated measures in this population. The EQ-5D descriptive system consists of 5 dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has a 3-level response format reflecting “no health problems (1),” “moderate health problems (2),” and “extreme health problems (3).” Thus, each of the 243 (3^5) unique health states has an associated 5-digit descriptor ranging from 11111 for perfect health to 33333 for the worst possible state.15 This health descriptor is then converted, with Shaw's model, into a US population–based HPW that can range from −0.109 (negative values denote states valued as worse than death) to 1.0 (perfect health).2 The EQ-5D also includes a visual analogue scale (VAS) on which respondents rate their overall health on a continuous scale from the worst state of health (0) to perfect health (100).
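The 5-digit descriptor is simply the ordered list of the 5 dimension levels, so the 243 states can be enumerated directly. The sketch below parses a descriptor into its dimensions and enumerates every state from 11111 to 33333. It deliberately stops short of the HPW conversion: Shaw's scoring model and its coefficients are not given in this text, so no weights are reproduced here.

```python
from itertools import product

# EQ-5D dimensions in descriptor order (digit 1 = mobility, ..., digit 5 = anxiety/depression).
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")


def parse_descriptor(descriptor):
    """Map a 5-digit EQ-5D descriptor such as '11223' to dimension levels 1-3."""
    if len(descriptor) != 5 or any(c not in "123" for c in descriptor):
        raise ValueError(f"invalid EQ-5D descriptor: {descriptor!r}")
    return dict(zip(DIMENSIONS, (int(c) for c in descriptor)))


def all_health_states():
    """Enumerate all 3**5 descriptors, from '11111' to '33333'."""
    return ["".join(map(str, levels)) for levels in product((1, 2, 3), repeat=5)]


states = all_health_states()
print(len(states), states[0], states[-1])            # 243 11111 33333
print(parse_descriptor("11223")["pain/discomfort"])  # 2
```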
The SF-36 was used to assess generic physical and mental HRQOL. This questionnaire measures 8 areas of functioning and well-being (role physical, bodily pain, physical functioning, general health, vitality, social functioning, role emotional, and mental health). The physical and mental component summary scales can then be computed as weighted composites of the 8 subscales, with possible scores ranging from 0 to 100 and higher scores indicating a better health state.16 These scales and components provided gold-standard criterion measures for EQ-5D dimension and HPW scores.
The CES-D and BAI provided additional mental health–related criterion measures for validating the EQ-5D. The CES-D scale is a 20-item self-report instrument designed to identify symptoms of depression in both clinical and general populations.17 The BAI was employed to measure the severity of self-reported symptoms of anxiety. This instrument consists of 21 items, each describing a common symptom of anxiety.18 Both instruments employ a 1-week recall period, and higher scores reflect greater symptom severity.
The 5-digit EQ-5D health descriptors were computed and converted to HPWs with Shaw et al.'s model.2 Data were then stratified by measurement period: pretransplant period, early posttransplant period (≤12 months), intermediate posttransplant period (13-36 months), and late posttransplant period (>36 months). Criterion-related validity was tested by examining the strength of associations between the EQ-5D scores (HPWs and scores for each of the 5 dimensions) and selected, relevant scores from the SF-36, CES-D, and BAI. A priori hypotheses were generated as to which associations between the EQ-5D and the criterion measures would be expected to exhibit strong correlation coefficients or effect sizes (|r| ≥ 0.5; Table 2). These hypotheses were based on a careful review of the manuals for our gold-standard instruments, comparing the constructs they measure with the 5 EQ-5D dimensions; domains from the gold standards that measured constructs similar to EQ-5D dimensions were hypothesized to have strong correlations. Validity coefficients were computed as nonparametric Spearman rank correlation coefficients and were adjusted for attenuation. This adjustment accounts for the previously established reliability of both the target test (EQ-5D) and the criterion tests.19
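The analysis steps above (assigning each survey to a measurement period, computing a Spearman rank correlation, and correcting it for attenuation) can be sketched as follows. This is an illustration, not the authors' code: it assumes the classical correction for attenuation, r / sqrt(rel_x × rel_y), where rel_x and rel_y are the reliabilities of the two instruments, and all scores and reliabilities in the example are hypothetical. Note that the corrected value can exceed 1 when reliabilities are low relative to the observed correlation.

```python
import math

def measurement_period(months_post_tx):
    """Assign a measurement period; None means the survey was pretransplant."""
    if months_post_tx is None:
        return "pretransplant"
    if months_post_tx <= 12:
        return "early"
    if months_post_tx <= 36:
        return "intermediate"
    return "late"


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def disattenuate(r_xy, rel_x, rel_y):
    """Correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical HPWs and SF-36 PCS scores for 5 respondents
hpw = [0.62, 0.75, 0.81, 0.83, 1.0]
pcs = [31, 38, 45, 44, 52]
print(round(spearman(hpw, pcs), 3))              # 0.9
# Hypothetical observed rho and reliabilities
print(round(disattenuate(0.45, 0.81, 0.64), 3))  # 0.625
```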
Table 2. Hypothesized and Observed Correlations Between Health-Related Quality of Life Measures
NOTE: Boxes with X or x represent our a priori hypotheses of likely strong associations. X represents scales/dimensions showing strong correlations (all |r| ≥ 0.5, P < 0.001), and x represents moderate correlations (all 0.3 ≤ |r| < 0.5, P < 0.001). Σ represents strong associations that were not anticipated but that, in retrospect, are contextually consistent.
Abbreviations: BAI, Beck Anxiety Inventory; BP, bodily pain; CES-D, Center for Epidemiologic Studies Depression Scale; GH, general health; HPW, health preference weight; MCS, mental component summary; MH, mental health; PCS, physical component summary; PF, physical function; RE, role emotional; RP, role physical; SF, social function; VAS, visual analogue scale; VT, vitality.
Because even small correlations reach statistical significance when sample sizes exceed 200, our primary evidence of criterion-related validity was the magnitude of the correlation coefficient rather than statistical significance alone, since a significant but small correlation would be insufficient evidence of criterion-related validity. In agreement with Cohen's interpretation of effect sizes for measures of this type20 and with several EQ-5D validation studies in different patient populations,21–23 we considered Spearman correlations with an absolute value of ≥0.5 to represent large effects and those of 0.3 to <0.5 to represent moderate effects. Although statistical power was not the primary consideration for testing criterion-related validity, statistical power was prospectively established for the construct validation phase. On the basis of preliminary data generated during the first 6 months of EQ-5D data collection, 43 patients in each of the 4 measurement periods would provide adequate power (80% at the 0.05 2-sided alpha level) to detect a moderate difference in means (0.08, given a within-group standard deviation of 0.16) in HPW scores. This difference in means represents an effect size of 0.5 standard deviations, which is a generally accepted threshold for clinically relevant effects in the HRQOL literature.24
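The two effect-size conventions in the paragraph above can be made explicit with a little arithmetic: a standardized mean difference is the raw difference divided by the within-group standard deviation (0.08 / 0.16 = 0.5), and correlations are graded by their absolute value against the 0.3 and 0.5 thresholds. The helper names below are hypothetical; the sketch only encodes the thresholds stated in the text and does not reproduce the authors' power calculation.

```python
def standardized_difference(mean_diff, sd):
    """Effect size in standard-deviation units: difference / within-group SD."""
    return mean_diff / sd


def classify_correlation(r):
    """Grade |r| using the thresholds in the text: >= 0.5 large, 0.3 to < 0.5 moderate."""
    a = abs(r)
    if a >= 0.5:
        return "large"
    if a >= 0.3:
        return "moderate"
    return "below moderate"


print(standardized_difference(0.08, 0.16))  # 0.5
print(classify_correlation(-0.75))          # large
print(classify_correlation(0.31))           # moderate
```

Classifying by absolute value matters because several hypothesized associations are negative, for example between EQ-5D scores and symptom scales on which higher scores indicate worse health.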
To evaluate the cross-sectional construct validity of the EQ-5D, we assessed the effect of liver transplantation on the EQ-5D health utility index, VAS, and the 5 health dimensions by measurement period (pretransplant period versus early, intermediate, and late posttransplant periods). The pretransplant EQ-5D HPWs and VAS scores were compared to scores for each posttransplant period with the Mann-Whitney U test, with P values ≤ 0.05 considered significant. The chi-square test of proportions was used to test the effect of the measurement period on the proportion of patients with no problems versus any problems on each EQ-5D dimension. Data are summarized as the mean ± standard deviation, median and interquartile range, or percentages.
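The Mann-Whitney U comparison described above can be sketched with the standard normal approximation. Because HPWs cluster at the top of the scale (many respondents report 1.0), the sketch applies the usual tie correction to the variance of U; it omits a continuity correction, and statistical packages may instead report exact P values for small samples. The data in the example are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided P) via the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank for the tie group
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P = 2 * (1 - Phi(|z|))
    return u1, p


# Hypothetical pretransplant vs posttransplant HPWs
pre = [0.55, 0.62, 0.70, 0.75, 0.81, 1.0]
post = [0.78, 0.83, 0.83, 0.86, 1.0, 1.0]
u, p = mann_whitney_u(pre, post)
print(u, round(p, 3))
```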
RESULTS
Two hundred eighty-five patients completed EQ-5D surveys between September 2006 and November 2007 (113 in the pretransplant period, 60 in the early posttransplant period, 47 in the intermediate posttransplant period, and 65 in the late posttransplant period). Two additional subjects were excluded because their EQ-5D surveys were incomplete. Seventy-six percent of the sample completed all criterion survey scales, and 94% completed all but one. Data represented respondents' only (218 subjects) or initial (67 subjects) EQ-5D survey response during this data collection period. Eighty-seven repeat EQ-5D observations, including those from the 14 individuals having both pretransplant and posttransplant data, were not analyzed. This was predominantly a male, Caucasian sample, and the most prevalent indication for liver transplantation was noncholestatic (hepatitis B, hepatitis C, or alcoholic) cirrhosis (Table 3). The follow-up time for the posttransplant patients averaged 36 ± 36 months (range: 0.8-133 months), and the mean follow-up for each posttransplant group was as follows: 4 ± 4 months for the early posttransplant group, 21 ± 7 months for the intermediate posttransplant group, and 76 ± 26 months for the late posttransplant group. There was no difference in age or the proportion of males among the 4 measurement period–specific groups.
Table 3. Demographic and Clinical Data
Characteristic | Mean ± SD or %
Age at transplant (years) | 53.3 ± 10.4
21.3 ± 7.9
Educational attainment
Grade school (0-8 years)
High school (9-12 years)
College or beyond
Posttransplant follow-up (months) | 36 ± 36
Early posttransplant period (≤12 months) | 4 ± 4
Intermediate posttransplant period (13-36 months) | 21 ± 7
Late posttransplant period (>36 months) | 76 ± 26
Abbreviations: MELD, Model for End-Stage Liver Disease; NASH, nonalcoholic steatohepatitis; SD, standard deviation
All the a priori hypothesized associations between scales or dimensions measuring similar constructs were confirmed: attenuation-adjusted Spearman rank correlation coefficients were strong or moderate, ranging in absolute value from 0.31 to 0.75 (all P < 0.001; Table 2). Eighty-one percent (17 of 21) of these associations were strong, with correlation coefficients that ranged in absolute value from 0.50 to 0.75 (all P < 0.001). The strongest associations were between the EQ-5D HPW and the 2 component summaries of the SF-36. The remaining hypothesized associations were moderate and ranged in absolute value from 0.31 to 0.47 (all P < 0.001). Four strong associations (noted by Σ in Table 2) were not anticipated: the usual activities dimension of the EQ-5D correlated strongly with the physical function, bodily pain, and general health scales of the SF-36, and the anxiety/depression dimension of the EQ-5D was strongly associated with the SF-36 social functioning scale.
Table 4 displays the average EQ-5D HPWs by measurement period. The pretransplant group had the lowest HPW scores (0.75 ± 0.19), whereas the intermediate posttransplant group had the highest (0.83 ± 0.11). Differences in HPW between the pretransplant period and the intermediate and late posttransplant periods represented statistically significant, clinically relevant, moderate effects (P = 0.01 and P = 0.006, respectively) that are consistent with documented effects of liver transplantation on other HRQOL measures. Figure 1 displays the distribution of HPW scores by period. Overall, EQ-5D HPWs were negatively skewed, with most values toward the upper end of the scale. The greatest variability (range: 0.05-1.0), lowest mean and median values, and greatest number of outlier observations occurred in the pretransplant period. HPWs in the early and intermediate posttransplant periods demonstrated tighter score distributions and fewer outliers. In the late posttransplant period, the 75th percentile value was 1.0, indicating that at least 25% of recipients in this period reported perfect health. However, the distribution of scores in the late posttransplant period widened in relation to the preceding periods, with a larger number of observations in the lower range of possible HPW scores.
Table 4. HPW by Period
Abbreviations: CI, confidence interval; HPW, health preference weight; IQR, interquartile range (representing the 25th and 75th percentile ranks); SD, standard deviation.
Similar effects of the measurement period were observed in the EQ-5D VAS scores (Table 5). VAS scores in the pretransplant period were significantly lower than those in each of the posttransplant periods (all P ≤ 0.001).
Table 5. EQ-5D VAS Score by Period
Abbreviations: CI, confidence interval; IQR, interquartile range (representing the 25th and 75th percentile ranks); SD, standard deviation; VAS, visual analogue scale.
The percentage of patients reporting problems in several EQ-5D dimensions varied across measurement periods. When responses were dichotomized as “no problems” and “any problems,”25 the EQ-5D dimensions of mobility, self-care, and usual activities showed statistically significant differences across measurement periods in the proportion of patients reporting problems, with reported problems declining from the pretransplant period to the late posttransplant period (Table 6). The pain/discomfort and anxiety/depression dimensions, which had lower event frequencies at all time points, showed a pattern of fewer reported problems after transplantation, but these effects were small and not statistically significant.
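The dichotomized comparison above can be sketched as a chi-square test on a 2 × k table (no problems versus any problems, by measurement period). The sketch assumes the Pearson chi-square statistic without continuity correction; with 4 measurement periods the table has 3 degrees of freedom, for which the upper-tail probability has the closed form erfc(sqrt(x/2)) + sqrt(2x/π) · exp(−x/2). The function names and counts are hypothetical, and a column with no patients would make the expected counts zero and is not handled.

```python
import math

def dichotomize(level):
    """EQ-5D level 1 -> 'no problems'; level 2 or 3 -> 'any problems'."""
    return "no problems" if level == 1 else "any problems"


def tabulate(levels_by_period):
    """Count no-problem and any-problem responses for each period."""
    no_counts, any_counts = [], []
    for levels in levels_by_period:
        flags = [dichotomize(level) for level in levels]
        no_counts.append(flags.count("no problems"))
        any_counts.append(flags.count("any problems"))
    return no_counts, any_counts


def chi_square_2xk(no_counts, any_counts):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    col_totals = [a + b for a, b in zip(no_counts, any_counts)]
    row_totals = [sum(no_counts), sum(any_counts)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip((no_counts, any_counts), row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(no_counts) - 1


def chi2_sf_df3(x):
    """Upper-tail probability of the chi-square distribution with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical counts (no problems, any problems) for 4 periods
stat, df = chi_square_2xk([40, 35, 30, 45], [73, 25, 17, 20])
print(round(stat, 2), df, round(chi2_sf_df3(stat), 4))
```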
Table 6. Percentage of Patients Reporting No Problems Versus Any Problems by the Measurement Period
Early Posttransplant Period | Intermediate Posttransplant Period | Late Posttransplant Period
DISCUSSION
This is the first study to examine health utility scores of liver transplant candidates and recipients using the US population–based HPWs for the EQ-5D. Previous studies using health status instruments have shown significant differences or improvement in HRQOL in these patients from the pretransplant period to the posttransplant period.26–28 However, valid health utility measures, not traditional health status scores, are necessary to convert HRQOL outcomes to QALYs for the assessment of cost-utility. Prior to 2005, EQ-5D HPWs were based on European data that were likely not representative of the US population. In 2005, Shaw and colleagues2 established a set of US-based societal preference weights for EQ-5D scores. Johnson and colleagues29 subsequently reported meaningful differences between the US and UK EQ-5D valuations, demonstrating the importance of using appropriate population-based standards.
However, there is concern that generic health utility questionnaires such as the EQ-5D may fail to capture relevant differences or changes in health status within specific patient groups. For example, in European patients with osteoarthritis, EQ-5D self-evaluations and HPWs discriminated poorly across differing disease severities.30 For this reason, the reliability and validity of the core instrument and its European population–based HPW valuations have been established in the general public and in many clinical samples.25, 31–34 This study aimed to validate the EQ-5D with the recently generated US HPWs to enable future cost-utility analyses in liver transplantation.
Our data support the EQ-5D as a valid instrument for assessing generic HRQOL and HPWs in liver transplant candidates and recipients. We demonstrated criterion-related validity for the EQ-5D through a theoretically anticipated pattern of moderate to strong correlations with relevant constructs represented by generic and specific HRQOL instruments. Although 4 strong associations were not anticipated, in retrospect, the construct similarity between the EQ-5D and criterion scales is reasonable.
Construct validity was demonstrated in that the EQ-5D detected differences in HRQOL of greater than one-half of a standard deviation from the pretransplant period to the posttransplant period, consistent with those reported with other HRQOL instruments. Whiting et al.35 reported improvements in most of the SF-36 domains, except bodily pain, in a sample of 84 patients before and after liver transplantation. Bravata and coauthors,28 in a meta-analysis of 7 liver transplant studies, reported good overall performance on SF-36 mental subscales, with little change due to transplantation, and significant improvement after transplantation on several physical subscales. Finally, we have recently reported longitudinal analyses in a prospective cohort of liver transplant patients, demonstrating a similar magnitude of change (more than half of a standard deviation) after liver transplantation in HRQOL as measured by the SF-36, BAI, and CES-D.26 Consistent with the health status measures used in earlier studies, the EQ-5D detected significant differences in overall HRQOL and in individual dimensions from the pretransplant period to the posttransplant period, but only subtle differences were observed in the EQ-5D dimensions characterizing mental HRQOL.
A promising feature of the EQ-5D demonstrated in this study is the breadth of health states reported by liver transplant candidates and recipients. The overall range of HPWs, from 0.049 to 1.0, suggests that the EQ-5D can discriminate among a broad range of health states within our liver transplant sample. We anticipate that our findings can be generalized to other US liver transplant samples with similar demographics. Although we do not anticipate that race would affect the validity of EQ-5D HPWs, our sample was not sufficiently heterogeneous to test this potential effect.
This study has several limitations. We did not determine the longitudinal responsiveness of the EQ-5D because we have not yet accumulated enough longitudinal data to support that type of analysis with adequate statistical power. Despite this limitation, we tested the construct validity of this instrument using a cross-sectional design and demonstrated its ability to detect differences in liver transplant candidates and recipients that are comparable to the effects detected by other HRQOL instruments with longitudinal data.26 Given the overall congruence between reports that used either cross-sectional or longitudinal designs, we anticipate that longitudinal responsiveness will be demonstrated in the future and do not think it is necessary to delay the application of the US preference weights for cost-utility analyses. We also recognize the potential for certain biases in our study. Selection bias is possible in any study of patients undergoing a surgical procedure with associated morbidity and mortality, because perioperative morbidity and/or mortality may preclude affected patients from participating. Furthermore, responder bias is always a consideration in survey research. Our data collection method, which timed assessments to coincide with on-site clinical visits, has yielded an overall participation rate close to 80%, which we believe adequately represents liver transplant candidates and recipients at our institution.
In conclusion, this is the first study to assess criterion-related and construct validity of the EQ-5D and its US population-based HPWs in the liver transplant setting. We conclude that the EQ-5D is valid for evaluating generic HRQOL in this population and that EQ-5D scores can be converted with Shaw et al.'s model2 into valid HPWs that have future application in cost-utility analyses for these patients.