Feng Xie, Programs for Assessment of Technology in Health, Department of Clinical Epidemiology and Biostatistics, McMaster University, 25 Main Street West, Suite 2000, Hamilton, ON, Canada L8P 1H1. E-mail: email@example.com
Objective: To map the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) onto the EuroQol 5 Dimension (EQ-5D) utility index in patients with knee osteoarthritis (OA).
Methods: A consecutive sample of patients (n = 258) diagnosed with knee OA completed both the WOMAC and the EQ-5D. Regression models with either the ordinary least squares (OLS) or the censored least absolute deviations (CLAD) estimator were used to establish the mapping function. The WOMAC was represented as explanatory variables in four ways: 1) total score; 2) domain scores (i.e., pain, stiffness, and physical function); 3) domain scores plus pair-wise interaction terms to account for possible nonlinearities; and 4) individual item scores. Goodness-of-fit criteria included the mean absolute error (the primary criterion) and the root mean squared error, and were obtained using an iterative random sampling procedure. Prediction precision was evaluated at the individual patient level and at the group level.
Results: The model using the OLS estimator and the WOMAC domain scores as explanatory variables had the best fit and was chosen as the preferred mapping model. The prediction error at the individual level exceeded the maximal tolerance value (i.e., the minimally important difference of the EQ-5D) in about 16% of the patients. At the group level, the width of the 95% confidence interval of prediction errors varied from 0.0176 at a sample size of 400 to 0.0359 at a sample size of 100.
Conclusions: EQ-5D scores can be predicted using WOMAC domain scores with an acceptable precision at both individual and group levels in patients with mild to moderate knee OA.
Osteoarthritis (OA) is the most common form of arthritis in the world and affects the knees, hips, hands, and spine. Knee and hip OA in particular are chronic conditions associated with pain and reduced physical function, which negatively affect the physical and psychosocial well-being of patients [2–4]. Different instruments have therefore been developed to evaluate the impact of OA on patients' functioning and quality of life. The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) is one of the most widely used disease-specific instruments for knee and hip OA. Its reliability and validity have been demonstrated in many countries.
OA also imposes a significant economic burden on patients, health-care providers, and society as a whole [8–14]. To make efficient use of scarce health-care resources, cost-effectiveness evidence has become more important than ever to decision-makers at various levels [15–17]. The quality-adjusted life-year (QALY) is recommended as an outcome measure in economic evaluations [18–20] because it incorporates both quality and quantity of life, and allows a broader comparison across treatment strategies, patient populations, and clinical settings. A utility (as a measure of quality of life) for a health state and the duration of time spent in that state (as a measure of quantity of life) are the key parameters for calculating QALYs. Utilities can be defined as a person's preference or desire for health states and are scaled from 0 (dead) to 1 (full health), with negative values representing states worse than death. Measuring utilities often requires generic preference-based health-related quality of life (HRQoL) instruments, such as the EQ-5D and the Health Utilities Index (HUI). In clinical studies, OA-specific instruments (e.g., the WOMAC) are often preferred to generic ones, particularly because of their higher sensitivity in detecting a minimally important difference (MID) in OA [7,21]. Nevertheless, the WOMAC (like most other OA-specific instruments) is a profile-based measure and thus cannot generate utilities. Because most patients with OA are elderly and need to complete a number of administrative forms and disease-specific questionnaires during their visits, adding a generic instrument would impose an extra burden (both time and cognitive) on patients and consequently decrease the quality of the overall response.
Although researchers are struggling to gather utilities that can be used directly in economic evaluations, a substantial amount of literature reporting WOMAC scores in different settings and patient populations is sleeping on the shelves of libraries.
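The role of utilities in a QALY calculation can be illustrated with a minimal sketch. It assumes the simplest undiscounted form, in which each health state contributes its utility multiplied by the years spent in it; the function name and the patient profile are hypothetical and are not taken from the study.

```python
def qalys(health_states):
    """Sum utility x duration (in years) over a sequence of health states.

    Simplifying assumption: no discounting of future years.
    """
    return sum(utility * years for utility, years in health_states)


# Hypothetical patient: 2 years at utility 0.62, then 3 years at utility 0.75.
profile = [(0.62, 2.0), (0.75, 3.0)]
print(round(qalys(profile), 2))  # 3.49
```

Any error in the utility values carries directly into the QALY estimate, which is why the precision of a mapped utility matters for economic evaluations.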
One potential solution to this problem is to map disease-specific measures onto generic preference-based indexes so that utilities can be estimated from the disease-specific measures. Published studies have linked the EQ-5D with cancer-specific measures [22–25], with the Parkinson Disease Questionnaire, and with inflammatory bowel disease-specific instruments. Barton et al. mapped the WOMAC onto EQ-5D scores in patients with self-reported knee pain. Grootendorst et al. developed a model to link WOMAC scores with HUI Mark 3 (HUI3) in patients with knee OA. The aim of this study was to map the WOMAC onto the EQ-5D in patients with knee OA, which has not been done yet. The EQ-5D was chosen because it is a widely used generic preference-based instrument and has been shown to be a reliable and valid measure in this patient population [30–35]. More importantly, these two instruments cover some common health domains (i.e., pain and physical function) that are important to patients with knee OA.
A consecutive sample of patients (n = 258) recruited from the Department of Orthopaedic Surgery at the Singapore General Hospital between August and December 2005 completed the questionnaires. All patients were diagnosed with knee OA by their attending physicians based on clinical and radiographic features. Each subject was interviewed by a trained interviewer using the WOMAC and the EQ-5D. The Institutional Review Board of the hospital approved this study.
The WOMAC, a 24-item disease-specific functioning measurement, consists of three domains, namely, pain (5 items), stiffness (2 items), and physical function (17 items). Each of these 24 items is graded either on a five-point Likert scale or on a 100-mm visual analog scale [6,36]. In this study, we used the Likert scale WOMAC (version LK 3.0). Items are scored from 0 to 4 (i.e., no, mild, moderate, severe, and extreme problems). Domain scores are calculated by summing constituent item scores (i.e., pain score ranges from 0 to 20, stiffness from 0 to 8, and physical function from 0 to 68). Total score is calculated by summing the three domain scores (range 0–96), with higher scores reflecting worse pain, stiffness, and physical function.
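The scoring rules just described can be sketched in a few lines. This is an illustration only: it assumes the 24 responses arrive in the order pain, stiffness, physical function, and the function name and the example respondent are hypothetical.

```python
# Item counts per WOMAC domain, as described in the text (5 + 2 + 17 = 24 items).
DOMAIN_ITEMS = {"pain": 5, "stiffness": 2, "function": 17}


def score_womac(items):
    """Return domain scores and the total score for 24 Likert item responses.

    Assumption: `items` is ordered pain, stiffness, physical function, and
    every response is coded 0 (no problems) to 4 (extreme problems).
    """
    if len(items) != sum(DOMAIN_ITEMS.values()):
        raise ValueError("expected 24 item responses")
    if any(not isinstance(x, int) or x < 0 or x > 4 for x in items):
        raise ValueError("each response must be an integer from 0 to 4")

    scores = {}
    start = 0
    for domain, n_items in DOMAIN_ITEMS.items():
        # A domain score is the sum of its constituent item scores.
        scores[domain] = sum(items[start:start + n_items])
        start += n_items
    # The total score is the sum of the three domain scores (range 0-96).
    scores["total"] = sum(scores[d] for d in DOMAIN_ITEMS)
    return scores


# Hypothetical respondent: moderate pain, mild stiffness, mild functional problems.
example = [2] * 5 + [1] * 2 + [1] * 17
print(score_womac(example))
```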
The EQ-5D measures HRQoL using a self-classifier. The self-classifier consists of a five-item descriptive system and assesses health status in the domains of mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each item has three response levels (i.e., no problems, some problems, and extreme problems). Its psychometric properties have been established in patients with OA [30–35].
First, regression analyses were conducted using methods suitable for health utility data, which are often not normally distributed [30,35] and have a ceiling value of 1.0. Regression models fitted using ordinary least squares (OLS) are consistent regardless of the distribution of the outcome measure and have been used in previous studies [23,28,29]. Nevertheless, some researchers prefer the censored least absolute deviations (CLAD) estimator to the OLS estimator on the argument that a CLAD model accounts for ceiling values [22,26,38]. The present study used both the OLS and CLAD estimators.
Second, in the regression analyses, several alternative representations of the WOMAC were considered as explanatory variables: 1) WOMAC total score; 2) WOMAC domain scores (i.e., pain, stiffness, and physical function); 3) WOMAC domain scores plus pair-wise interaction terms (i.e., pain × stiffness, pain × function, stiffness × function, pain × pain, stiffness × stiffness, and function × function) to account for possible nonlinearities; and 4) WOMAC individual item scores, with stepwise model selection in the OLS model but not in the CLAD model. The reasons for including or excluding demographics in regression models varied across the published studies. To maintain consistency with other published studies mapping disease-specific instruments to the EQ-5D [22,23,26], demographics were not included in the analysis. Nevertheless, the impact of including demographics on predicting utilities is an important area to be explored in future studies. The outcome variable was the EQ-5D score calculated using the Japanese scoring algorithm.
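As an illustration of the third representation, the sketch below builds the nine regressors (three domain scores plus the six pair-wise products, including the squared terms) for one patient. The function and feature names are hypothetical; the study itself does not report how the terms were coded.

```python
from itertools import combinations_with_replacement


def interaction_features(pain, stiffness, function):
    """Domain scores plus all pair-wise products, including squared terms."""
    domains = {"pain": pain, "stiffness": stiffness, "function": function}
    features = dict(domains)
    # combinations_with_replacement yields the six pairs listed in the text:
    # pain x pain, pain x stiffness, pain x function, stiffness x stiffness,
    # stiffness x function, and function x function.
    for a, b in combinations_with_replacement(domains, 2):
        features[f"{a}_x_{b}"] = domains[a] * domains[b]
    return features


# Hypothetical patient with pain = 10, stiffness = 4, physical function = 20.
for name, value in interaction_features(10, 4, 20).items():
    print(name, value)
```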
A number of criteria were used to examine the goodness of fit of each model. Mean absolute error (MAE) is the average of the absolute differences between observed and predicted values. In the present study, MAE was chosen as the primary criterion for goodness of fit because it is an easily and directly interpretable measure. We also reported the root mean squared error (RMSE), the positive square root of the average squared prediction error. In contrast to MAE, RMSE attaches greater weight to larger errors. To account for variability in these goodness-of-fit diagnostics, an iterative random sampling procedure proposed by Grootendorst et al. was used. Specifically, the whole sample was randomly split into two groups, one for estimation and the other for validation. The estimation sample was used to fit each candidate model, and the validation sample was used to obtain the MAE and RMSE. This process was repeated 500 times, each with a new random split, generating 500 MAEs and RMSEs. Mean MAE, mean RMSE, and the corresponding 95% confidence intervals (CIs) were calculated. The lower the mean MAE and RMSE, the better the goodness of fit of a model. The preferred model was the one with the best goodness of fit. We presented one random split as an illustration.
Finally, the coefficients of the preferred model were determined using the whole study sample. The precision of this preferred model was examined at two levels. At the individual level, the prediction error was computed as the difference between observed and predicted EQ-5D scores for each of the 258 patients. At the group level, the prediction error was estimated by applying a nonparametric bootstrap with replacement. Specifically, groups of various sizes (n = 50, 100, 200, and 400) were randomly sampled. For example, a patient was randomly chosen from the original data set and his/her predicted EQ-5D score and prediction error were recorded. This patient was then placed back into the data set (hence the term “with replacement”). This process was repeated until the target group size (i.e., n = 50, 100, 200, or 400) was reached. For each group, the mean predicted EQ-5D score and the mean prediction error were calculated, which formed one bootstrap replicate. By repeating this process 5000 times, we generated a distribution of the group mean predicted EQ-5D scores and the corresponding group mean prediction errors for each group size. The 2.5th and 97.5th percentiles of the distribution were then used to estimate the 95% CI for the prediction error.
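The iterative random sampling procedure can be sketched as follows. Several simplifications are assumptions made for illustration and are not reported in the study: the split is into equal halves, the candidate model is a one-variable OLS regression on the WOMAC total score, and the data are simulated rather than the study sample.

```python
import math
import random
import statistics


def fit_simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mae(observed, predicted):
    """Mean absolute error."""
    return statistics.fmean([abs(o - p) for o, p in zip(observed, predicted)])


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(statistics.fmean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


def repeated_split_validation(x, y, n_repeats=500, seed=0):
    """Fit on a random estimation half, score on the validation half, repeat."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    half = len(indices) // 2  # assumed 50/50 split
    maes, rmses = [], []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        est, val = indices[:half], indices[half:]
        a, b = fit_simple_ols([x[i] for i in est], [y[i] for i in est])
        observed = [y[i] for i in val]
        predicted = [a + b * x[i] for i in val]
        maes.append(mae(observed, predicted))
        rmses.append(rmse(observed, predicted))
    return statistics.fmean(maes), statistics.fmean(rmses)


# Simulated data standing in for 258 patients (not the study data).
rng = random.Random(42)
womac_total = [rng.randint(0, 96) for _ in range(258)]
eq5d = [min(1.0, 0.85 - 0.004 * t + rng.gauss(0, 0.1)) for t in womac_total]

mean_mae, mean_rmse = repeated_split_validation(womac_total, eq5d)
print(round(mean_mae, 3), round(mean_rmse, 3))
```

Because squaring weights large errors more heavily, the RMSE of any split is at least as large as its MAE, which is why the two criteria are reported together.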
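The group-level bootstrap can be sketched as below. The prediction errors are simulated, and the percentile helper uses linear interpolation; both are assumptions for illustration, as the study does not report these implementation details.

```python
import random
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of an already sorted list."""
    pos = (len(sorted_values) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (pos - lower)


def bootstrap_group_error(errors, group_size, n_replicates=5000, seed=0):
    """95% percentile interval of the group mean prediction error.

    Each replicate draws `group_size` patients with replacement, so the group
    can be larger than the original sample (e.g., 400 drawn from 258).
    """
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(errors, k=group_size))
        for _ in range(n_replicates)
    )
    return percentile(means, 2.5), percentile(means, 97.5)


# Simulated individual prediction errors for 258 patients (not the study data).
rng = random.Random(7)
errors = [rng.gauss(0.0, 0.1) for _ in range(258)]

for n in (50, 100, 200, 400):
    low, high = bootstrap_group_error(errors, n)
    print(n, round(low, 4), round(high, 4))
```

The interval narrows as the group size grows because the mean of more draws varies less from replicate to replicate.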
All statistical tests were two sided and conducted at a significance level of 5%. Data were analyzed using R version 2.4.1 (the R Development Core Team).
As shown in Table 1, the mean age of the whole sample was 66.5 years with 83% being female. The mean EQ-5D score was 0.62, whereas the mean WOMAC pain, stiffness, and physical function scores were 6.64, 3.12, and 26.24, respectively, for the cohort.
Table 1. Demographic characteristics and quality of life scores for the study sample
Total sample (n = 258)
BMI, body mass index; EQ-5D, EuroQol 5 Dimension; SD, standard deviation; OA, osteoarthritis; WOMAC, Western Ontario and McMaster Universities Osteoarthritis Index.
Age, mean (SD)
Female, n (%)
Ethnicity, n (%)
Formal education, n (%)
Married, n (%)
Retirees/homemaker, n (%)
BMI, mean (SD)
Years with OA, mean (SD)
EQ-5D scores, mean (SD)
WOMAC scores, mean (SD)
Table 2 compares the goodness of fit of the regression models with different representations of the WOMAC. With the OLS estimator, the model using the three WOMAC domain scores as the explanatory variables had the lowest MAE (i.e., 0.074) and RMSE (i.e., 0.095) values. This result was consistent when the CLAD estimator was used. Across both estimators, the model with the WOMAC domain scores as the explanatory variables and the OLS estimator had the lowest MAE and RMSE values and was therefore the preferred model. As an illustration, Table 3 compares the observed EQ-5D scores with those predicted by the OLS model in one of the randomly generated validation samples. The EQ-5D scores predicted by the CLAD model are also presented for comparison. The mean observed EQ-5D score was 0.624, compared with 0.604 predicted by the OLS model and 0.600 by the CLAD model. The standard deviation (SD) of the observed scores was higher than the SD of the scores predicted by either model. Again, the OLS model generated a lower MAE than the CLAD model in the validation sample.
Table 2. Model selection diagnostics estimated using the iterative random sampling procedure*
Estimator, WOMAC representation: MAE; RMSE
OLS, total score: 0.0750 (0.0671, 0.0822); 0.0967 (0.0839, 0.1075)
OLS, domain scores: 0.0736 (0.0654, 0.0809); 0.0947 (0.0825, 0.1052)
OLS, domain scores plus interaction terms: 0.0780 (0.0702, 0.0870); 0.1013 (0.0879, 0.1161)
OLS, individual item scores (stepwise): 0.0773 (0.0682, 0.0870); 0.0997 (0.0873, 0.1127)
CLAD, total score: 0.0756 (0.0678, 0.0833); 0.0975 (0.0847, 0.1090)
CLAD, domain scores: 0.0745 (0.0665, 0.0823); 0.0956 (0.0841, 0.1072)
CLAD, domain scores plus interaction terms: 0.0794 (0.0703, 0.0896); 0.1040 (0.0890, 0.1244)
CLAD, individual item scores: not estimated†
*Mean estimates with 95% confidence intervals displayed in parentheses.
†CLAD model cannot perform stepwise selection.
EQ-5D, EuroQol 5 Dimension; WOMAC, Western Ontario and McMaster Universities Osteoarthritis Index; OLS, ordinary least squares; MAE, mean absolute error; RMSE, root mean squared error; CLAD, censored least absolute deviations.
Table 3. Performance of regression models using a validation sample
EQ-5D, EuroQol 5 Dimension; CLAD, censored least absolute deviations; OLS, ordinary least squares; SD, standard deviation; MAE, mean absolute error.
Specification of the Preferred Model
Based on the performances of each candidate model, the model using the OLS estimator and the WOMAC domain scores as explanatory variables was chosen as the preferred mapping model. The coefficients were therefore obtained by applying this model to the whole sample.
The adjusted R² for this preferred model was 0.449. For example, if a patient has WOMAC pain, stiffness, and function scores of 10, 4, and 20, respectively, this model will estimate an EQ-5D score of 0.7512 for that patient.
Prediction Precision of the Preferred Model at the Individual Level
Table 4 presents the number and percentage of patients for whom the absolute difference between the predicted and the observed EQ-5D scores fell into various ranges. The MID of the EQ-5D has been reported as 0.07 across different patient populations and as 0.12 in patients with knee OA. Approximately 57% and 84% of the absolute differences between the observed and predicted EQ-5D scores in the OLS model were less than 0.07 and 0.12, respectively. The corresponding percentages for the CLAD model were 56% and 83% (Table 4).
Table 4. Prediction precision of the preferred model at the individual level
CLAD (comparison) n (%)
OLS (preferred model) n (%)
CLAD, censored least absolute deviations; OLS, ordinary least squares.
|Δ| ≤ 0.01
0.01 < |Δ| ≤ 0.03
0.03 < |Δ| ≤ 0.05
0.05 < |Δ| ≤ 0.07
0.07 < |Δ| ≤ 0.10
0.10 < |Δ| ≤ 0.12
|Δ| > 0.12
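The tabulation behind Table 4 can be sketched as follows, using the same error ranges and the two MID thresholds (0.07 and 0.12). The observed and predicted scores are made-up values for illustration, and the function name is hypothetical.

```python
import bisect

# Upper edges of the absolute-error ranges used in Table 4; the last range is open-ended.
BIN_EDGES = [0.01, 0.03, 0.05, 0.07, 0.10, 0.12]


def tabulate_abs_errors(observed, predicted):
    """Count patients per absolute-error range (upper edge inclusive)."""
    counts = [0] * (len(BIN_EDGES) + 1)
    for o, p in zip(observed, predicted):
        delta = abs(o - p)
        # bisect_left places delta in the first range whose upper edge is >= delta.
        counts[bisect.bisect_left(BIN_EDGES, delta)] += 1
    return counts


# Made-up scores for four patients (not the study data).
observed = [0.60, 0.70, 0.80, 0.50]
predicted = [0.605, 0.66, 0.65, 0.58]

counts = tabulate_abs_errors(observed, predicted)
n = len(observed)
share_within_007 = sum(counts[:4]) / n  # absolute error no larger than 0.07
share_within_012 = sum(counts[:6]) / n  # absolute error no larger than 0.12
print(counts, share_within_007, share_within_012)
```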
Prediction Precision of the Preferred Model at the Group Level
The OLS model produced acceptable prediction precision at the individual level. At the group level, most of the predictions were between 0.55 and 0.65 (Fig. 1). The proportion of predictions that fell in this range increased with group size (i.e., n = 50, 100, 200, and 400), whereas the prediction errors shrank with group size (Fig. 1). The 95% CI of the prediction errors ranged from −0.025 to 0.026 at a group size of 50 and from −0.013 to 0.013 at a group size of 400 (Table 5). No prediction error at any group size exceeded the MID of the EQ-5D in knee OA patients. As shown in Table 5, the prediction precision can be estimated from the group size and the mean predicted EQ-5D score. To illustrate, if the mean predicted EQ-5D score was 0.62 in a group of 200 patients, the 95% CI of the group-level prediction would be from 0.606 (i.e., 0.62 − 0.0140) to 0.627 (i.e., 0.62 + 0.0070).
Table 5. Comparison of group mean predicted versus group mean prediction error of the preferred model
Clinical evidence forms the foundation of economic evaluation in health-care programs. Preference-based HRQoL instruments are not being used as often as disease-specific instruments in clinical studies. This could be a potential barrier to conducting an economic evaluation that allows broader comparison across different diseases for decision-makers. In an attempt to reduce this barrier by making use of existing clinical evidence obtained from disease-specific instruments, we compared the performance of different regression models mapping WOMAC scores onto EQ-5D utilities. The model using the three WOMAC domain scores as the explanatory variables and the OLS as the estimator is the preferred model with the best goodness of fit (i.e., the lowest prediction error) compared with the alternative models.
Several issues on generalizability are worth noting for the present model-based study. First, the mapping model is recommended for use in patients with mild to moderate knee OA because this is the population used to build the model. When the midpoints of the WOMAC domain score ranges were used as arbitrary thresholds of severity (i.e., 10 for pain, 4 for stiffness, and 34 for physical function), the mean MAE was higher in the subgroup of severe patients than in patients with mild or moderate disease. Thus, generalizing the model to severe patients may be questionable. Second, although this mapping model can be used for both individual and group level predictions, application at the group level would be preferred because effectiveness in economic evaluations is usually compared at the group level. Notably, at the group level, the width of the 95% CI of prediction errors varied from 0.0176 at a sample size of 400 to 0.0359 at a sample size of 100. Given the range of sample sizes often seen in clinical studies in OA, the prediction precision is deemed to be good. Nevertheless, the caveat is that the prediction error could increase when the mapping model is used in an economic evaluation in which QALYs are estimated over a relatively long period of time. Therefore, it is highly recommended that the 95% CI of the prediction be used in sensitivity analyses to examine the potential impact on decision-making. Lastly, the present study was based on a sample of Asian patients, and caution should be exercised when applying this mapping model to estimate utilities in other patient populations. Nevertheless, the mapping model is of value in predicting utilities for patients with mild to moderate knee OA when such data are not available.
To date, only two published studies have mapped the WOMAC onto preference-based HRQoL instruments [28,29]. Compared with our model, the preferred model in the study by Barton et al. had higher MAE and RMSE values (i.e., 0.129 and 0.180, respectively, vs. 0.074 and 0.095 in our study) and a lower adjusted R² (0.313 vs. 0.449 in our study). Notably, the two studies differed in the patient population (i.e., patients with confirmed knee OA vs. patients with knee pain); the scoring algorithms (UK vs. Japan); the WOMAC representations; and the model estimators. Our model also has better goodness of fit in terms of the MAE and RMSE values than the model mapping the WOMAC onto the HUI3.
First, it is always good practice to have an independent data set with both WOMAC and EQ-5D scores available to assess the external validity of the mapping model. This important property was not evaluated in the present study because of the lack of such data. Nevertheless, it is still worth reporting the mapping function from the present study while keeping this limitation in mind. Second, the mapping function was developed from cross-sectional data. It is not clear if and how the mapping function varies over time. Nevertheless, the main application of the function is to generate utilities for economic evaluations, which typically compare different treatments over time. Thus, the longitudinal validity of the function needs to be assessed before it can be applied in economic evaluations. Last but not least, uncertainty is an important issue to be addressed in economic evaluations. Notably, about 50% of the variance around the estimates cannot be explained by the function. It is likely that use of the mapping function increases the uncertainty around utility estimates. Therefore, the mapping function is by no means a replacement for the EQ-5D in economic evaluations. Instead, this function is intended to serve as a remedy when the EQ-5D score (or utility) is important but not available. The uncertainty introduced by the mapping function itself should also be adequately addressed.
In conclusion, EQ-5D scores can be predicted using WOMAC domain scores with an acceptable precision at individual and group levels in patients with mild to moderate knee OA. Factors including patient population, sample size, and tolerance level of the prediction error must be carefully considered and balanced when applying this mapping function.
Source of financial support: No funding was received.