These senior authors contributed equally to this study.
Validation and Comparison of EuroQol and Short Form 6D in Chronic Prostatitis Patients
Article first published online: 15 APR 2010
© 2010, International Society for Pharmacoeconomics and Outcomes Research (ISPOR)
Value in Health
Volume 13, Issue 5, pages 649–656, August 2010
How to Cite
Zhao, F.-l., Yue, M., Yang, H., Wang, T., Wu, J.-h. and Li, S.-C. (2010), Validation and Comparison of EuroQol and Short Form 6D in Chronic Prostatitis Patients. Value in Health, 13: 649–656. doi: 10.1111/j.1524-4733.2010.00728.x
- Issue published online: 5 AUG 2010
- chronic prostatitis;
- validation study
Objective: Generic, preference-based health-related quality of life (HRQoL) instruments are increasingly used in the health-care decision-making process. However, to our knowledge, no such HRQoL instrument has been validated or used in chronic prostatitis. We therefore aimed to assess and compare the psychometric properties of EuroQol (EQ-5D) and Short Form 6D (SF-6D) among chronic prostatitis patients in China.
Methods: Consenting patients were interviewed using EQ-5D and SF-6D. Convergent and discriminative construct validities were examined with five and two a priori hypotheses, respectively. Sensitivity was compared using receiver operating characteristic (ROC) curves and relative efficiency (RE) statistics. Agreement between instruments was assessed with intraclass correlation coefficients and a Bland–Altman plot, while factors affecting utility difference were explored with multiple linear regression models.
Results: In 268 subjects, mean (SD) EQ-5D and SF-6D utility scores were comparable at 0.73 (0.15) and 0.75 (0.10), respectively. Five of the seven hypotheses for construct validity were fulfilled in both instruments. The areas under the ROC curve of both instruments exceeded 0.5 (P < 0.001). SF-6D had 9.7–19.9% higher efficiency than EQ-5D at detecting differences in chronic prostatitis symptom severity. Despite no significant difference in utility scores between the two instruments, lack of agreement was observed, with low intraclass correlation coefficients (0.218–0.630) and in the Bland–Altman plot analysis. Chronic prostatitis symptom severity significantly (P < 0.05) influenced differences in utility scores between EQ-5D and SF-6D.
Conclusions: Both EQ-5D and SF-6D are demonstrated to be valid and sensitive HRQoL measures in Chinese chronic prostatitis patients, with SF-6D showing better HRQoL dimension coverage, greater sensitivity, lower ceiling effect, and more rational distribution. Further research is needed to determine longitudinal response and reliability.
Health-related quality of life (HRQoL) has been recognized increasingly as an important outcome of health care to be incorporated into the decision-making process of clinicians and policymakers, especially in the management of patients with chronic diseases or disorders. Nevertheless, to adopt a more holistic patient management approach by including HRQoL as a clinical outcome, the major challenge is to find valid and reliable HRQoL measures.
Generally speaking, HRQoL can be evaluated using either condition-specific or generic instruments. Although the advantage of condition-specific measures is their capacity to detect small but clinically important changes in a disease, these instruments are not suitable for comparisons across different disease states. In contrast, generic HRQoL measures include a broader range of health dimensions and enable broader comparisons to be made independent of disease groups, treatments, or health programs. Hence, it is recommended that a generic instrument be used alongside a disease-specific instrument in evaluating HRQoL in clinical settings. Furthermore, the use of generic preference-based measures instead of a profile-based measurement system can provide utility scores for calculating quality-adjusted life-years, a widely used clinical effectiveness indicator. Indeed, quality-adjusted life-years have been formally included in methodological guidelines for health technology assessment in many countries [4–6].
Chronic prostatitis (CP) is a common condition affecting 2–10% of men around the world [7,8] and 4.5% in China, and impairs the quality of life of its sufferers [10,11]. Since CP is primarily a disease of uncertain etiology with no current “gold standard” for its treatment, the primary goal of CP management is to achieve optimal symptom control and ultimately to improve the patients' HRQoL. As such, generic profile measures such as the Sickness Impact Profile and Short Form 36, and the CP-specific National Institutes of Health Chronic Prostatitis Symptom Index (NIH-CPSI), are commonly used for CP patients. However, gaps are noted in the management of CP patients compared with other quality-of-life diseases, such as rheumatoid arthritis. First, the quantity of publications on HRQoL of CP patients is limited. Second, at present, to the best of our knowledge, no preference-based HRQoL instrument has been validated or used in CP. The objective of this study, therefore, was to evaluate the validity and sensitivity of EuroQol (EQ-5D) and Short Form 6D (SF-6D), two preference-based HRQoL instruments increasingly used in clinical settings, in CP patients. Furthermore, considering the debate on the different performance of EQ-5D and SF-6D and the dilemma of choosing among instruments, a secondary objective was to evaluate and compare the difference between these two instruments to provide some baseline information for their use in CP patients.
Study Design and Patient Recruitment
To increase the power and representativeness, this cross-sectional study was conducted in two centers, namely, the 306th Hospital of PLA in Beijing (northern China) and the First People's Hospital of Yunnan Province in Kunming (southern China), two tertiary referral hospitals in China. With informed consent, a consecutive sample of outpatients with CP was recruited in this Institutional Review Board-approved study from December 2008 to March 2009. Patients were eligible if they were aged between 20 and 59 years and diagnosed with CP by their attending physicians based on clinical symptoms, microscopic examination of expressed prostatic secretion and urine, and transrectal ultrasound features. Each patient was interviewed by a trained interviewer using a standardized questionnaire containing the EQ-5D/visual analog scale (VAS) and SF-6D. Other information solicited from the participants included their sociodemographic data and medical conditions. The symptom severity of the patients was measured using the NIH-CPSI. The interviewer, procedure, and questionnaire used were identical between the two cities.
EQ-5D/VAS: The EQ-5D is a generic, preference-based HRQoL instrument with five dimensions, including mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has three response levels (no problem, some problems, and severe problems). The EQ-5D descriptive system can theoretically generate 243 health states, each of which is assigned a utility score ranging from −0.59 to 1.00. The utility scoring algorithm adopted in this study was developed using time trade-off (TTO)-based preference scores from a UK general population sample. EQ-VAS is a 20-cm vertical visual analog scale ranging from 100 (best imaginable health state) to 0 (worst imaginable health state) to represent overall health. Respondents classify and rate their health status on the day of the survey. The simplified Chinese version of EQ-5D/VAS used in this study is an official version authorized by the EuroQol Group.
The SF-6D was developed from Short Form 36 by Brazier et al. with six dimensions comprising physical functioning, role limitations, social functioning, pain, mental functioning and vitality. Each dimension has 4 to 6 levels and thus 18,000 possible health states are defined. The SF-6D utility scoring algorithm used in this study was derived from a representative sample of the UK general population with the standard gamble (SG) method, with scores ranging from 0.29 to 1.00. The recall period is 4 weeks. Our study adopted the Hong Kong Chinese version of SF-6D translated and validated in the general population by Lam et al. in Hong Kong. The traditional Chinese characters used in the Hong Kong Chinese SF-6D were converted into the equivalent simplified Chinese characters used in Mainland China. During our study, subjects did not report any concerns regarding the phrasing of the Hong Kong Chinese SF-6D.
The NIH-CPSI, a nine-item index, is a commonly used instrument for assessing symptoms and their impact on daily life in men with CP/CPPS [18,19]. NIH-CPSI scores range from 0 to 43, comprising three subscores: pain (21 points), urinary symptoms (10 points), and quality of life (12 points). NIH-CPSI has been accepted by the International Prostatitis Collaborative Network as the standard, valid instrument for evaluating men with chronic prostatitis symptoms [18,20]. The Chinese NIH-CPSI has been validated and used widely in scientific research and clinical observation [7,21].
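The health-state counts quoted for the two descriptive systems follow from multiplying the number of levels across dimensions. The sketch below reproduces that arithmetic. The EQ-5D level counts come from the text above; the per-dimension SF-6D level counts are an assumption taken from the commonly described SF-36-derived SF-6D classification (the text only states 4 to 6 levels per dimension), and the function name is illustrative.

```python
from math import prod

# EQ-5D: five dimensions with three levels each (as stated in the text).
EQ5D_LEVELS = {
    "mobility": 3,
    "self-care": 3,
    "usual activities": 3,
    "pain/discomfort": 3,
    "anxiety/depression": 3,
}

# SF-6D: per-dimension level counts are an ASSUMPTION (not given in the text).
SF6D_LEVELS = {
    "physical functioning": 6,
    "role limitations": 4,
    "social functioning": 5,
    "pain": 6,
    "mental functioning": 5,
    "vitality": 5,
}


def count_health_states(levels):
    """Number of distinct health states = product of levels per dimension."""
    return prod(levels.values())


eq5d_states = count_health_states(EQ5D_LEVELS)
sf6d_states = count_health_states(SF6D_LEVELS)
print(eq5d_states, sf6d_states)
```

Under these assumptions the products match the 243 and 18,000 states cited above.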
Descriptive statistics. All analyses were based on subjects who fully completed the questionnaire. Descriptive statistics were computed to characterize the sample and the distribution of EQ-5D/VAS and SF-6D. Continuous variables are presented as mean, standard deviation (SD), median, interquartile range (IQR), and range, while categorical variables are shown as the number and proportion of the sample within each group. As we recruited the study subjects from two centers in Beijing and Kunming, we compared the sample composition between the two cities on socioeconomic and clinical characteristics with Mann–Whitney U or chi-square tests. In the further analyses, we combined the data from the two cities to increase the power.
Construct validation. Convergent validity of the EQ-5D and SF-6D was assessed by examining their association with NIH-CPSI and EQ-VAS at domain and scale level. Based on the literature and clinical experience, five a priori hypotheses were generated where moderate-to-strong correlation coefficients (ρ) were expected, namely: 1) EQ-5D and SF-6D utility scores with total NIH-CPSI scores; 2) EQ-5D and SF-6D utility scores with NIH-CPSI quality of life domain; 3) EQ-5D and SF-6D utility scores with EQ-VAS; 4) EQ-5D and SF-6D pain/discomfort with NIH-CPSI pain; and 5) EQ-5D usual activity and SF-6D role limitation with NIH-CPSI pain and urinary scores. Validity coefficients were computed as Spearman's rank correlation coefficients (ρ), with ρ > 0.5 considered a strong correlation, 0.35 to 0.5 a moderate correlation, and 0.2 to 0.34 a weak correlation.
As a further test of validity, we used the “known-group” method to examine the discriminative validity of EQ-5D and SF-6D based on their ability to discriminate between patients with different levels of CP severity and between self-reported health status groups. Subjects with more severe symptoms and poorer health status were hypothesized to have lower utility scores on these two instruments. Other variables used in assessing validity included socioeconomic status, duration of CP, and presence of other medical conditions. Nonparametric Mann–Whitney U tests were performed to identify statistically significant effects of the dichotomous variables on utility scores, and Kruskal–Wallis H tests were used for polytomous variables. The levels of CP severity were defined as mild, moderate, and severe if NIH-CPSI scores ranged from 0 to 14, 15 to 30, and 31 to 43, respectively [23,9]. EQ-VAS was adopted as an indicator of self-reported health status, and we classified the EQ-VAS scores into four groups, namely <65 (bad), 65 to 79 (fair), 80 to 89 (good), and 90 to 100 (excellent).
Sensitivity of EQ-5D and SF-6D. The efficiency of the EQ-5D and SF-6D in detecting clinically relevant differences among CP patients was compared using the relative efficiency (RE) statistic and receiver operating characteristic (ROC) curves. The area under the ROC curve (AUC) was computed to compare the discriminative properties of these two instruments. The measure that generates the largest AUC is regarded as the most sensitive: an instrument with ideal discriminative ability has an AUC of 1.0, and an AUC ≤0.5 means no discriminative power. RE is based on the ratio of squared t statistics between two instruments, where EQ-5D was defined as the denominator [25,26]. A coefficient greater than 1 suggests that SF-6D is more sensitive than EQ-5D at detecting clinically relevant differences with the given sample size, while a coefficient less than 1 means it is less sensitive. For the purpose of AUC and RE calculation, NIH-CPSI scores were dichotomized into two categories: ≤14 and ≥15, indicating mild and moderate to severe CP severity, respectively. Because the choice of cutoff point is unavoidably arbitrary and may affect the computation results, the RE and AUC were also computed using the median NIH-CPSI score as an alternative cutoff point.
Level of agreement between EQ-5D and SF-6D. We compared the mean (SD) and median (IQR) utility scores between these two instruments across the sample, as well as for subgroups categorized by socioeconomic and clinical characteristics. Paired comparisons were made with Wilcoxon's signed rank test, and the association between the instruments was assessed with Spearman's rank correlation. Small subgroups (n < 10) were combined with the adjacent group. Given the limitations of simple correlation and significance tests, the degree of agreement between utility scores of EQ-5D and SF-6D was assessed by the intraclass correlation coefficient (ICC) and a Bland–Altman plot. The ICC was computed with a two-way random effects model based on absolute agreement, and a coefficient above 0.7 suggests strong agreement. In the Bland–Altman plot, the average of the two measurements was plotted on the x-axis and the difference between the two measurements on the y-axis, where SF-6D was the subtrahend. The deviation of the difference from 0 (which would imply total agreement) indicates the degree of agreement for each subject on the plot.
Factors affecting utility difference between EQ-5D and SF-6D. We explored whether patients' socioeconomic and clinical characteristics were related to the utility difference between EQ-5D and SF-6D. Thus, we ran a multiple linear regression (MLR) where the utility difference was the dependent variable. Individual characteristics including age, place, ethnicity, education level, marital and working status, months with CP, presence of chronic or acute medical conditions, NIH-CPSI scores and EQ-VAS for global well-being were treated as independent variables. All analyses were based on subjects who fully completed the questionnaire. Statistical analyses were performed using SPSS version 16.0 (SPSS Inc., Chicago, IL, USA).
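The validity coefficients described above can be sketched in a few lines. The following is a minimal, dependency-free illustration, not the SPSS procedure used in the study: Spearman's ρ is computed as the Pearson correlation of tie-averaged ranks, and the strength bands follow the thresholds in the text. Two points are assumptions made here: the bands are applied to the absolute value of ρ (utility scores fall as symptom scores rise), and values below 0.2 are labeled "negligible", a label the text does not define. Function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def correlation_strength(rho):
    """Map |rho| to the bands used in the text (below 0.2 is an assumption)."""
    magnitude = abs(rho)
    if magnitude > 0.5:
        return "strong"
    if magnitude >= 0.35:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "negligible"
```

For example, `correlation_strength(spearman_rho(utilities, cpsi_totals))` would classify the association for two equal-length lists of paired scores (both variable names hypothetical).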
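The known-group categories described above amount to two simple cut-point mappings. The sketch below encodes the cut points exactly as stated in this paragraph; the function names and the range checks are illustrative additions, not part of the study protocol.

```python
def cpsi_severity(total_score):
    """Classify an NIH-CPSI total score (0-43) using the cut points in the text."""
    if not 0 <= total_score <= 43:
        raise ValueError("NIH-CPSI total score must lie between 0 and 43")
    if total_score <= 14:
        return "mild"
    if total_score <= 30:
        return "moderate"
    return "severe"


def eq_vas_group(vas_score):
    """Classify an EQ-VAS score (0-100) into the four self-rated health groups."""
    if not 0 <= vas_score <= 100:
        raise ValueError("EQ-VAS score must lie between 0 and 100")
    if vas_score < 65:
        return "bad"
    if vas_score < 80:
        return "fair"
    if vas_score < 90:
        return "good"
    return "excellent"
```

Grouping patients with these two functions yields the categories that the Mann–Whitney U and Kruskal–Wallis H tests then compare.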
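Both sensitivity statistics are straightforward to compute. The sketch below, with illustrative function names, implements RE as the ratio of squared t statistics with EQ-5D in the denominator, as described above. For the AUC it uses the standard pairwise (Mann–Whitney) formulation, which is an assumption about the computation rather than a description of the SPSS routine used in the study: the AUC equals the probability that a randomly chosen mild patient has a higher utility score than a randomly chosen moderate-to-severe patient, with ties counted as one half.

```python
def relative_efficiency(t_sf6d, t_eq5d):
    """Ratio of squared t statistics; values above 1 favor SF-6D."""
    return (t_sf6d / t_eq5d) ** 2


def auc(scores_mild, scores_worse):
    """Pairwise AUC: P(mild score > worse score), ties counted as 0.5."""
    wins = 0.0
    for a in scores_mild:
        for b in scores_worse:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_mild) * len(scores_worse))


# t statistics reported in Table 5 for the two NIH-CPSI cutoffs.
re_cutoff_14 = relative_efficiency(6.561, 6.265)
re_cutoff_median = relative_efficiency(7.779, 7.104)
print(round(re_cutoff_14, 3), round(re_cutoff_median, 3))  # 1.097 1.199
```

Applied to the t statistics in Table 5, the function reproduces the reported RE values of 1.097 and 1.199.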
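The two agreement statistics can be sketched as follows. The Bland–Altman limits are the mean difference ± 1.96 SD of the paired differences (EQ-5D minus SF-6D, so SF-6D is the subtrahend). For the ICC, the text specifies a two-way random effects model with absolute agreement; the sketch assumes the single-measure form, which the text does not state, and uses illustrative function names.

```python
from statistics import mean, stdev


def bland_altman_limits(eq5d, sf6d):
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    diffs = [a - b for a, b in zip(eq5d, sf6d)]  # SF-6D is the subtrahend
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - half_width, bias + half_width


def icc_absolute_agreement(eq5d, sf6d):
    """Single-measure ICC, two-way random effects, absolute agreement."""
    rows = list(zip(eq5d, sf6d))
    n, k = len(rows), 2  # n subjects, k = 2 instruments
    grand = mean([v for row in rows for v in row])
    row_means = [mean(row) for row in rows]
    col_means = [mean(eq5d), mean(sf6d)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because the absolute-agreement form penalizes systematic offsets between instruments, two sets of scores that are perfectly correlated but shifted by a constant still yield an ICC below 1.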
Characteristics of Patients
Of 275 patients who participated, 5 patients from Beijing and 2 patients from Kunming failed to complete the questionnaire for personal reasons and were thus excluded from the analyses. Of the remaining 268 patients, with a median age of 32 years (range 20–59), 173 were interviewed in Beijing and 95 in Kunming.
The patients from Beijing included fewer ethnic minorities and had a higher education level and smaller household size (P < 0.05), which was representative of these two cities (Table 1). No significant difference was detected in other demographic and clinical characteristics of patients between the two cities, and the demographic and disease profile of the analyzed sample closely resembled the results of the epidemiological survey of CP patients in China by Liang et al. (Table 1).
|Characteristic||Total (n = 268)||Beijing (n = 173)||Kunming (n = 95)||P-value|
|Age (years)|
|Median (IQR)||32.0 (10.0)||31.0 (10.0)||33.0 (9.0)|
|Mean (SD)||33.2 (7.99)||33.1 (8.24)||33.3 (7.57)|
|Range||20.0 to 59.0||21.0 to 59.0||20.0 to 52.0|
|Ethnic minority||26 (9.7)||7 (4.0)||19 (20.0)||<0.001|
|Years of education||0.011|
|≤6||18 (6.7)||10 (5.8)||8 (8.4)|
|7–12||118 (44.0)||66 (38.2)||52 (54.7)|
|≥12||132 (49.3)||97 (56.1)||35 (36.8)|
|Married||164 (61.2)||101 (58.4)||63 (66.3)||0.202|
|Working||218 (81.3)||146 (84.4)||72 (75.8)||0.084|
|Household size|
|Median (IQR)||3.0 (1.0)||3.0 (2.0)||4.0 (1.0)||0.001|
|Presence of chronic medical condition*||59 (22.0)||40 (23.1)||19 (20.0)||0.555|
|Presence of acute medical condition†||135 (50.4)||87 (50.3)||48 (50.5)||0.970|
|Months with CP||0.035|
|≤6||106 (39.5)||75 (43.4)||31 (32.6)|
|7–12||55 (20.5)||33 (19.1)||22 (23.2)|
|13–18||39 (14.6)||27 (15.6)||12 (12.6)|
|19–24||26 (9.7)||19 (11.0)||7 (7.4)|
|≥24||42 (15.7)||19 (11.0)||23 (24.2)|
|NIH-CPSI score|
|Median (IQR)||18.0 (7.8)||19.0 (8.0)||17.0 (6.0)|
|Mean (SD)||18.5 (5.75)||18.8 (5.67)||17.9 (5.89)|
|Range||6.0 to 34.0||6.0 to 34.0||6.0 to 34.0|
|EQ-VAS score|
|Median (IQR)||70.0 (20.0)||70.0 (20.0)||70.0 (20.0)|
|Mean (SD)||69.15 (14.20)||68.98 (13.28)||69.45 (15.8)|
|Range||30.0 to 100.0||30.0 to 100.0||40.0 to 100.0|
|EQ-5D utility score|
|Median (IQR)||0.73 (0.07)||0.73 (0.07)||0.73 (0.07)|
|Mean (SD)||0.73 (0.15)||0.74 (0.14)||0.72 (0.16)|
|Range||0.19 to 1||0.19 to 1||0.29 to 1|
|SF-6D utility score|
|Median (IQR)||0.76 (0.14)||0.76 (0.12)||0.79 (0.16)|
|Mean (SD)||0.75 (0.10)||0.75 (0.09)||0.75 (0.11)|
|Range||0.44 to 0.95||0.53 to 0.95||0.44 to 0.93|
Descriptive Statistics of EQ-5D and SF-6D
Across the 268 patients, the mean (SD) utility score for the EQ-5D was 0.73 (0.15) and the median (IQR) was 0.73 (0.07), while the mean (SD) utility score for the SF-6D was 0.75 (0.10) and the median (IQR) was 0.76 (0.14). Utility scores did not differ significantly between the two cities. The range of EQ-5D utility scores was 0.19 to 1, wider than that of SF-6D, which ranged from 0.44 to 0.95 (Table 1).
Kolmogorov–Smirnov normality test results showed that the distribution of SF-6D utility scores was normal (P = 0.362), while that of EQ-5D was bimodal (P < 0.001). Table 2 presents the distribution of EQ-5D and SF-6D results within each domain. A strong ceiling effect was observed in all EQ-5D domains except pain/discomfort, with the highest percentages at mobility (97.0%), self-care (100%), and usual activities (93.7%). Similarly, a ceiling effect occurred in the physical functioning (51.9%), role limitation (38.4%), and social functioning (55.6%) domains of SF-6D. Although no floor effect was observed, a noticeable percentage (7.1%) of patients scored at the floor level of the anxiety/depression domain of EQ-5D.
Convergent validity was demonstrated by the moderate to strong correlation coefficients (range: 0.422–0.548, P < 0.001) for three of five a priori hypotheses in both EQ-5D and SF-6D (Table 3). Correlations between the utility scores of these two instruments and EQ-VAS were weak, and NIH-CPSI pain and urinary scores correlated only weakly with EQ-5D usual activity and SF-6D role limitation.
Table 4 presents the results of the univariate analyses of EQ-5D and SF-6D utility scores among multiple subgroups. The hypotheses for known-group discriminative validity, that both EQ-5D and SF-6D utility scores would decrease monotonically with increasing NIH-CPSI and decreasing EQ-VAS score levels, were fulfilled. Moreover, both measures discriminated by the presence of other chronic diseases, but not by that of acute diseases. No significant difference in utility scores was observed among the socioeconomic status variables in univariate analyses for either instrument.
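Ceiling and floor effects of the kind reported here are simply the percentages of respondents at the best and worst response level of a domain. The sketch below shows that calculation for one domain. The coding of levels (1 as best, 3 as worst, as in the three-level EQ-5D) is an assumption that must be adapted for SF-6D domains with more levels, and the sample responses are hypothetical, not study data.

```python
def ceiling_and_floor(responses, best_level=1, worst_level=3):
    """Return (ceiling %, floor %) for one domain's response levels."""
    n = len(responses)
    if n == 0:
        raise ValueError("at least one response is required")
    ceiling_pct = 100.0 * sum(r == best_level for r in responses) / n
    floor_pct = 100.0 * sum(r == worst_level for r in responses) / n
    return ceiling_pct, floor_pct


# Hypothetical responses for a single domain (NOT data from this study).
example_responses = [1, 1, 1, 2, 3]
example_ceiling, example_floor = ceiling_and_floor(example_responses)
print(example_ceiling, example_floor)  # 60.0 20.0
```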
|Variable||n (%)||EQ-5D||SF-6D|
|Mean (SD)||P-value||Mean (SD)||P-value|
|Age (years)|
|20–29||106 (39.6)||0.73 (0.16)||0.74 (0.09)|
|30–39||106 (39.6)||0.73 (0.15)||0.74 (0.10)|
|40–49||47 (17.5)||0.75 (0.12)||0.76 (0.10)|
|50–59||9 (3.4)||0.78 (0.09)||0.81 (0.04)|
|Beijing||173 (64.6)||0.74 (0.14)||0.75 (0.09)|
|Kunming||95 (35.4)||0.72 (0.16)||0.75 (0.11)|
|Ethnic minority||26 (9.7)||0.73 (0.19)||0.76 (0.12)|
|Nonethnic minority||242 (90.3)||0.73 (0.15)||0.75 (0.09)|
|Years of education||0.489||0.309|
|≤6||18 (6.7)||0.74 (0.13)||0.72 (0.10)|
|7–12||118 (44.0)||0.71 (0.17)||0.75 (0.11)|
|≥12||132 (49.3)||0.75 (0.13)||0.75 (0.08)|
|Married||164 (61.2)||0.73 (0.14)||0.75 (0.10)|
|Nonmarried||104 (38.8)||0.74 (0.16)||0.74 (0.09)|
|Working||218 (81.3)||0.74 (0.15)||0.75 (0.09)|
|Not working||50 (18.7)||0.70 (0.17)||0.72 (0.10)|
|Months with CP||0.626||0.178|
|≤6||106 (39.5)||0.73 (0.16)||0.75 (0.10)|
|7–12||55 (20.5)||0.74 (0.12)||0.76 (0.08)|
|13–18||39 (14.6)||0.73 (0.15)||0.74 (0.09)|
|19–24||26 (9.7)||0.76 (0.12)||0.78 (0.10)|
|≥24||42 (15.7)||0.73 (0.17)||0.72 (0.11)|
|NIH-CPSI score|
|0–14||67 (25.0)||0.83 (0.08)||0.81 (0.08)|
|15–29||194 (72.4)||0.71 (0.15)||0.73 (0.09)|
|30–43||7 (2.6)||0.59 (0.24)||0.60 (0.10)|
|EQ-VAS score|
|<65||98 (36.6)||0.68 (0.18)||0.73 (0.07)|
|65–79||88 (32.8)||0.74 (0.13)||0.73 (0.07)|
|80–89||46 (17.2)||0.79 (0.08)||0.80 (0.07)|
|90–100||36 (13.4)||0.79 (0.10)||0.80 (0.11)|
|Presence of chronic medical condition||0.001||0.002|
|Yes||59 (22)||0.71 (0.15)||0.73 (0.00)|
|No||209 (78)||0.74 (0.15)||0.73 (0.07)|
|Presence of acute medical condition||0.455||0.451|
|Yes||135 (50.4)||0.74 (0.14)||0.74 (0.14)|
|No||133 (49.6)||0.73 (0.16)||0.75 (0.10)|
Sensitivity of EQ-5D and SF-6D
The RE statistic showed that SF-6D was 9.7% more efficient than EQ-5D at detecting the difference between patients with mild and moderate to severe CP symptoms (Table 5). In the sensitivity analysis using the median NIH-CPSI score as the cutoff, the recalculated RE statistic increased to 1.199, suggesting that SF-6D is 19.9% more efficient than EQ-5D.
|Measure||NIH-CPSI||n||Mean (SD)||t Test||RE||ROC curve|
|t statistic||P-value||AUC||95% CI|
|EQ-5D||≤14||67||0.83 (0.08)||6.265||<0.001||1.000||0.820*||(0.765, 0.876)|
|SF-6D||≤14||67||0.81 (0.08)||6.561||<0.001||1.097†||0.757*||(0.691, 0.822)|
|EQ-5D||≤18||138||0.79 (0.10)||7.104||<0.001||1.000||0.758*||(0.701, 0.805)|
|SF-6D||≤18||138||0.79 (0.08)||7.779||<0.001||1.199†||0.752*||(0.695, 0.810)|
Furthermore, the AUCs of both instruments were above 0.5 with statistical significance, suggesting that both are able to detect the difference between patients with mild and moderate to severe CP symptoms (Table 5). In the sensitivity analysis, the AUC of EQ-5D decreased from 0.820 to 0.758 when the cutoff point was changed to the median NIH-CPSI score, while the AUC of SF-6D decreased by only 0.005.
Level of Agreement between EQ-5D and SF-6D
As shown in Table 6, EQ-5D utility scores were generally lower than SF-6D scores, but the differences were not statistically significant. Spearman's correlation coefficient between the two instruments was moderate (0.495) for all patients, but the ICC (0.444) indicated poor agreement between them. Moreover, the correlation and ICC varied widely (ρ: 0.288–0.825, ICC: 0.218–0.630) across socioeconomic and clinical subgroups.
|Variable||n||Mean (SD)||Median (IQR)||P-value†||ICC||Spearman|
|EQ-5D||SF-6D||EQ-5D||SF-6D|
|All patients||268||0.73 (0.15)||0.75 (0.10)||0.73 (0.07)||0.76 (0.14)||0.627||0.444||0.495**|
|Age (years)|
|20–29||106||0.73 (0.16)||0.74 (0.09)||0.73 (0.07)||0.74 (0.13)||0.876||0.425||0.475**|
|30–39||106||0.73 (0.15)||0.74 (0.10)||0.76 (0.07)||0.73 (0.14)||0.893||0.460||0.518**|
|40–59||56||0.75 (0.11)||0.76 (0.10)||0.73 (0.07)||0.76 (0.15)||0.473||0.446||0.450**|
|Beijing||173||0.74 (0.14)||0.75 (0.09)||0.73 (0.07)||0.76 (0.12)||0.455||0.398||0.424**|
|Kunming||95||0.72 (0.16)||0.75 (0.11)||0.73 (0.07)||0.79 (0.16)||0.104||0.505||0.602**|
|Ethnic minority||26||0.73 (0.19)||0.76 (0.12)||0.80 (0.08)||0.78 (0.20)||0.388||0.630||0.461**|
|Nonethnic minority||242||0.73 (0.15)||0.75 (0.09)||0.73 (0.07)||0.76 (0.14)||0.822||0.412||0.578**|
|Years of education|
|≤6||18||0.74 (0.13)||0.72 (0.10)||0.73 (0.07)||0.73 (0.10)||0.157||0.406||0.825**|
|7–12||118||0.71 (0.17)||0.75 (0.11)||0.73 (0.07)||0.77 (0.15)||0.028||0.508||0.631**|
|≥12||132||0.75 (0.13)||0.75 (0.08)||0.73 (0.07)||0.75 (0.13)||0.357||0.361||0.342**|
|Married||164||0.73 (0.14)||0.75 (0.10)||0.73 (0.07)||0.76 (0.14)||0.475||0.218||0.535**|
|Nonmarried||104||0.74 (0.16)||0.74 (0.09)||0.73 (0.07)||0.76 (0.14)||0.404||0.500||0.439**|
|Working||218||0.74 (0.15)||0.75 (0.09)||0.73 (0.07)||0.76 (0.14)||0.874||0.409||0.467**|
|Not working||50||0.70 (0.17)||0.72 (0.10)||0.73 (0.07)||0.73 (0.13)||0.496||0.539||0.622**|
|Months with CP|
|≤6||106||0.73 (0.16)||0.75 (0.10)||0.73 (0.07)||0.76 (0.13)||0.236||0.436||0.353**|
|7–12||55||0.74 (0.12)||0.76 (0.08)||0.73 (0.07)||0.76 (0.10)||0.497||0.385||0.474**|
|13–18||39||0.73 (0.15)||0.74 (0.09)||0.73 (0.07)||0.73 (0.13)||0.494||0.418||0.530*|
|19–24||26||0.76 (0.12)||0.78 (0.10)||0.80 (0.07)||0.76 (0.13)||0.402||0.541||0.512*|
|≥24||42||0.73 (0.17)||0.72 (0.11)||0.73 (0.07)||0.73 (0.20)||0.268||0.481||0.673**|
|NIH-CPSI score|
|0–14||67||0.83 (0.08)||0.81 (0.08)||0.80 (0.08)||0.82 (0.10)||0.368||0.285||0.382**|
|15–43||201||0.70 (0.15)||0.73 (0.09)||0.73 (0.02)||0.73 (0.12)||0.309||0.379||0.38**|
|EQ-VAS score|
|<65||98||0.68 (0.18)||0.73 (0.07)||0.72 (0.09)||0.70 (0.12)||0.373||0.379||0.488**|
|65–79||88||0.74 (0.13)||0.73 (0.07)||0.74 (0.09)||0.76 (0.13)||0.369||0.441||0.506**|
|80–89||46||0.79 (0.08)||0.80 (0.07)||0.78 (0.08)||0.77 (0.11)||0.764||0.406||0.639**|
|90–100||36||0.79 (0.10)||0.80 (0.11)||0.81 (0.09)||0.82 (0.11)||0.296||0.273||0.288|
|Presence of chronic medical condition|
|Yes||59||0.71 (0.15)||0.73 (0.00)||0.71 (0.09)||0.71 (0.14)||0.76||0.296||0.326**|
|No||209||0.74 (0.15)||0.73 (0.07)||0.76 (0.10)||0.76 (0.13)||0.479||0.470||0.503**|
|Presence of acute medical condition|
|Yes||135||0.74 (0.14)||0.74 (0.14)||0.73 (0.07)||0.76 (0.14)||0.597||0.361||0.486**|
|No||133||0.73 (0.16)||0.75 (0.10)||0.73 (0.07)||0.76 (0.12)||0.211||0.517||0.478**|
Bland–Altman analysis indicated that the 95% limits of agreement between EQ-5D and SF-6D ranged from −0.279 to 0.253, and over 95% of points lay within the limits (Fig. 1). The two instruments did not yield consistently similar measurements, because the range of disagreement includes the reported clinically important differences of up to 0.074 for EQ-5D and 0.033 for SF-6D [29,30]. Moreover, a systematic variation in the difference between EQ-5D and SF-6D utility scores was observed, with SF-6D higher at lower mean utility scores and lower at higher mean utility scores.
Factors Affecting Utility Difference between EQ-5D and SF-6D
As shown in Table 7, in the multiple linear regression model with the difference between EQ-5D and SF-6D as the dependent variable, CP symptom severity measured with NIH-CPSI scores attained statistical significance; however, the magnitude of the influence was very small (coefficient −0.004, P = 0.014). A similar result was observed for the 7–12 years education level variable (−0.036, P = 0.041). Other socioeconomic and clinical variables were not significantly associated with the utility difference.
|Independent variables||Utility difference|
|Coefficient (95% CI)||P-value|
|Age (Years)||0 (−0.002, 0.002)||0.871|
|Beijing†||0.017 (−0.019, 0.053)||0.348|
|Ethnic minority||0.018 (−0.039, 0.075)||0.540|
|Household size||−0.003 (−0.018, 0.011)||0.646|
|Years of education|
|≤6||0.029 (−0.042, 0.101)||0.421|
|7–12||−0.036 (−0.070, −0.001)||0.041|
|Married||−0.017 (−0.052, 0.019)||0.359|
|Working||0.004 (−0.039, 0.047)||0.861|
|Months with CP||0.009 (−0.002, 0.020)||0.101|
|NIH-CPSI||−0.004 (−0.007, 0)||0.014|
|EQ-VAS||0.001 (0, 0.002)||0.297|
|Presence of chronic medical condition||0.009 (−0.031, 0.049)||0.663|
|Presence of acute medical condition||0.018 (−0.014, 0.050)||0.273|
An advantage of measuring HRQoL with preference-based measures is that the utility scores elicited have an intuitive interpretation and theoretical application in decision-making. Therefore, using an appropriate and valid instrument to elicit the health utility score becomes a key determinant in ensuring the quality of decision made by the clinicians and policymakers in both patient management and health technology assessment. In this study, we provided the evidence of validity and sensitivity of EQ-5D and SF-6D in Chinese patients with chronic prostatitis, and demonstrated it is feasible and acceptable to elicit the utility score with these two instruments. Moreover, these two instruments show similar, but not identical performance, especially at individual-level. This head-to-head comparison shed some light on the choice of preference-based HRQoL instruments for CP patients. To our knowledge, this is the first study evaluating the validity and performance of preference-based HRQoL measures in CP patients.
In our study, we choose Beijing, the capital of China in the north, and Kunming, a middle size city in the south, to increase the power, representativeness and the generalizability of the results. Even though there were more subjects recruited from Beijing, most of the patients' characteristics were representative of these two cities that have different economic and demographic background. Furthermore, it was shown that the location does not affect the validity of the results, and thus better generalizability of the results can be implied.
The convergent validity for EQ-5D and SF-6D was demonstrated through their moderate-to-strong correlations with NIH-CPSI, a validated instrument for CP, and “known-group” validation further support the discriminative validation of EQ-5D and SF-6D. The correlations between EQ-5D and SF-6D with NIH-CPSI pain score are obviously higher than the correlations of them with urinary score. This is in consistent with the finding from Wenninger et al. that the pain scale was the only physical symptom that significantly contributed toward the sickness impact, but not urinary symptoms . Noticeably, the strong ceiling effect of EQ-5D at mobility, self-care, and usual activities domains might attenuate the correlation coefficients. The similar ceiling effects also occur in SF-6D physical functioning, role limitation, and social functioning domains, but compared with EQ-5D, they were not so severe. A possible explanation is that the subjects enrolled in this study were outpatients, and most of them were experiencing mild to moderate symptom, which may increase the ceiling effects. In addition, the ceiling effects may arise when test problems are not the main aspects impaired in the measured condition.
Mean SF-6D utility scores exceeded mean EQ-5D scores by 0.02 with no statistically significance. The magnitude of difference is smaller than the differences reported in other disease groups or general population [24,29,32,33]. This high degree of similarity in utility scores might further support their convergent validity in CP patients. However, although the EQ-5D and SF-6D group scores were similar, the ICC analyses and Bland–Altman plot revealed the inconsistence of these two measures at individual level. From the Bland–Altman plot, we can tell that the differences between EQ-5D and SF-6D were split into two groups by mean utility score around 0.6. It is probably because of the specific UK scoring algorithm of EQ-5D, in which if any dimension is at level 3, a N3 term will be included. The existence of N3 term can also lead to the bimodal distribution of EQ-5D utility scores, whereas distribution of SF-6D was normal [29,33]. Nevertheless, N3 term is not the only reason for the individual-level discrepancy between EQ-5D and SF-6D, because only 21 patients (7.8%) reported extreme level in EQ-5D. Multiple linear regression analysis showed that severity of symptom and education level was the possible predictors for the utility differences. However, the very small magnitudes of the influence suggested that larger studies are needed to confirm and further clarify our findings. Consistent with the findings by Wee et al. that other social economic factors had no significant association with utility differences , this suggested that the paired application of instruments is feasible.
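The effect of the N3 term discussed above can be illustrated with a short sketch of an additive EQ-5D scoring function. The decrements below are the values commonly published for the UK TTO tariff and are included here as an assumption for illustration only; they should be checked against the official value set before any real use, and the function name is hypothetical. The worst state under these values scores −0.594, in line with the −0.59 lower bound cited in the Methods.

```python
# ASSUMED coefficients: values commonly published for the UK TTO tariff.
CONSTANT = 0.081  # subtracted once if any dimension is above level 1
N3 = 0.269        # subtracted once if any dimension is at level 3
DECREMENTS = (    # (level 1, level 2, level 3) decrement per dimension
    (0.0, 0.069, 0.314),  # mobility
    (0.0, 0.104, 0.214),  # self-care
    (0.0, 0.036, 0.094),  # usual activities
    (0.0, 0.123, 0.386),  # pain/discomfort
    (0.0, 0.071, 0.236),  # anxiety/depression
)


def eq5d_uk_utility(state):
    """Score a state such as '11122' (one digit, 1-3, per dimension)."""
    if len(state) != 5 or any(ch not in "123" for ch in state):
        raise ValueError("state must be five digits, each 1, 2 or 3")
    levels = [int(ch) for ch in state]
    if all(level == 1 for level in levels):
        return 1.0  # full health: no constant, no decrements
    utility = 1.0 - CONSTANT
    for dimension, level in zip(DECREMENTS, levels):
        utility -= dimension[level - 1]
    if 3 in levels:
        utility -= N3  # extra penalty for any extreme response
    return utility


moderate_pain = eq5d_uk_utility("11121")
extreme_pain = eq5d_uk_utility("11131")
print(round(moderate_pain, 3), round(extreme_pain, 3))  # 0.796 0.264
```

Under these assumed values, moving a single dimension from level 2 to level 3 drops the score by more than 0.5, which is the kind of discontinuity that can separate EQ-5D scores into two clusters.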
Even though both EQ-5D and SF-6D were demonstrated to be valid and sensitive in CP patients, some comments need to be made about the recommendation for them. First, although both measures can distinguish patients with different severity of symptom and self-reported health status, RE and ROC analysis showed that SF-6D is more efficient to detect clinically relevant difference of CP patients. Second, even these two instruments are all designed to measure the generic HRQoL and produce preference index, their health descriptive systems, methods of eliciting preferences, and scoring functions are different. SF-6D includes broader aspects of HRQoL, such as role and social functioning, and has more response level for each domain. This can make the description of health status more comprehensive, and patients would be more likely to find the best description for their status. In fact, the call for more response options to EQ-5D has been addressed before , and five-level version of EQ-5D is under development . Third, EQ-5D is known to have strong ceiling effects , and this may limit its ability to discriminate between patients with mild to moderate symptom. Finally, the distribution of EQ-5D scores was bimodal, whereas that of the SF-6D was normal, leading to individual-level discrepancy of their utility scores to some extent and raising concerns regarding the scoring algorithm of the EQ-5D, which needs further evaluation. Eventually, with better HRQoL dimension coverage, greater sensitivity, lower ceiling effect, and more rational distribution, SF-6D is shown to be the more appropriate choice in CP patients compared with EQ-5D in our current study.
Naturally, the results of this study need to be interpreted in the light of several possible limitations. First, we did not examine the longitudinal responsiveness and reliability of EQ-5D and SF-6D, which are also important psychometric characteristics of any HRQoL instrument. Although sensitive measures are usually considered to be reliable, a longitudinal study is necessary to validate them in CP patients, as this is a chronic disease. Second, the relatively small number of CP patients with severe symptoms (NIH-CPSI scores 31–43, 2.6%) might have exaggerated the high ceiling effect observed. Third, under the clinical diagnostic criteria practiced in China, we could not separate NIH type II and NIH type III CP in our study subjects, which could potentially introduce systematic bias resulting from possible differences in patients' experience. However, evidence has shown that inclusion of NIH II CP would not affect the quality of life analysis, suggesting that the symptom impact on HRQoL may be equivalent for these two types of CP. Nevertheless, researchers and clinicians are encouraged to consider the effect of CP classification on the utility index when adopting EQ-5D and SF-6D as outcome measures. Further research with a larger sample size and stricter diagnostic criteria is needed to establish a benchmark utility score for CP patients and to determine other psychometric properties, such as longitudinal responsiveness and reliability.
In conclusion, EQ-5D and SF-6D were demonstrated to be valid and sensitive preference-based HRQoL measures in Chinese CP patients, with SF-6D showing better HRQoL dimension coverage, greater sensitivity, lower ceiling effect, and more rational distribution. Further research is needed to determine other psychometric properties, such as longitudinal responsiveness and reliability.
Source of financial support: No funding was received for the conduct of the present study.
- 4. National Institute for Clinical Excellence. Guide to the methods of health technology appraisal. London, June 2008.
- 5. Australia Pharmaceutical Benefits Advisory Committee (PBAC). Guidelines for preparing submissions to the PBAC. Canberra, November 2006.
- 6. Ontario Ministry of Health and Long-term Care. Ontario guidelines for economic analysis of pharmaceutical products. Ontario, August 1994.
- 22. How to develop and validate a new health-related quality of life instrument. In: Spilker B, ed. Quality of Life and Pharmacoeconomics in Clinical Trials (2nd ed.). Philadelphia, PA: Lippincott-Raven Publishers, 1996.
- 26. Quality of Life: the Assessment, Analysis and Interpretation of Patient-Reported Outcomes (2nd ed.). Chichester, West Sussex, UK: John Wiley & Sons, Ltd., 2007.
- 28. National Bureau of Statistics in China. China Statistical Yearbook 2008. Available from: http://www.stats.gov.cn/tjsj/ndsj/2008/indexeh.htm [Accessed May 26, 2009].
- 34. EuroQol Group. Available from: http://www.euroqol.org/eq-5d/eq-5d-versions.html [Accessed June 6, 2009].