Using the health assessment questionnaire to estimate preference-based single indices in patients with rheumatoid arthritis

Abstract

Objective

To estimate the relationship between preference-based measures, EuroQol (EQ-5D) and SF-6D, and the Health Assessment Questionnaire (HAQ) disability index (DI) in patients with rheumatoid arthritis (RA), and to characterize components that are predictors of health utility.

Methods

Patients with RA participating in 2 studies in the UK (n = 151) and Canada (n = 319) completed the HAQ, EQ-5D, and Short Form 36 (SF-36). The SF-36, a generic measure of quality of life, was converted into the preference-based SF-6D. From these results we developed models of the relationship between the HAQ and SF-6D and EQ-5D using various regression analyses.

Results

The optimal model developed for the EQ-5D entered levels for each item as independent variables (model 5). A root mean square error (RMSE) of 0.18 suggested relatively good predictive ability. For the SF-6D, RMSEs were lower (0.09), suggesting better predictions than for the EQ-5D, but models with more explanatory variables did not improve results (model 2 or 4 optimal). The models were able to predict actual SF-6D and EQ-5D across the range of the HAQ DI.

Conclusion

Our approach enables calculation of quality-adjusted life years from existing trials in which only the HAQ was measured. Not all aspects of the HAQ may be reflected in the preference-based measures, and this method is inferior to direct measurement of health state utility in clinical trials. Given this limitation, our approach provides an alternative for researchers who need health-state utility values but did not include a preference-based measure in their clinical study because of resource constraints or a desire to limit patient burden.

INTRODUCTION

Given the scarcity of health care resources, public and private agencies have become interested in both the effectiveness and cost-effectiveness of health care interventions (1). The preferred approach toward measuring benefits in cost-effectiveness analyses is to value health status in a single unit of measurement known as utilities, which are used to derive quality-adjusted life years (QALYs). Instead of receiving full credit for each year of life, QALYs weight the impact of morbidity. For example, patients with severe disability (Health Assessment Questionnaire [HAQ] score >2) may receive credit for living 5 months of good health for each year they are alive (2). QALYs in cost-effectiveness analyses (known as cost-utility analyses [CUAs]) are particularly informative for health policy decisions because they allow direct comparison of the efficiency of health care resource expenditure across a wide variety of conditions and treatments (3). Utilities are obtained by asking patients to make judgments or reveal preferences about changes in particular health states or outcomes. Preference-based instruments are formal methods for quantifying these judgments. These instruments fall into 2 groups: direct measures such as a standard gamble (SG) or time trade-off (TTO) questionnaire, or indirect measures where a generic instrument (such as the EuroQol [EQ-5D] or Health Utilities Index) has previously been populated with preference values from general population samples (1). Utilities obtained by indirect methods are recommended by the US Panel on Cost-Effectiveness in Health and Medicine and the Outcome Measures in Rheumatology Clinical Trials (OMERACT) Consensus-Based Reference Case for Economic Evaluation in Rheumatoid Arthritis (3, 4).

Many clinical studies do not use a preference-based measure due to lack of resources or time, or because the commonly used generic preference-based measures are regarded as unsuitable for the condition (5). In a majority of rheumatoid arthritis (RA) clinical trials, the HAQ is the primary and often sole measure of quality of life (6). Although the HAQ was primarily designed to measure only aspects of physical function and pain, it has been shown to be highly correlated with many generic and disease-specific measures of health-related quality of life (7). Consequently, linear transformations between the HAQ and utility have been used in CUAs (8, 9). While other disease-specific measures such as the Rheumatoid Arthritis Quality of Life questionnaire have been developed, only more recent clinical trials have used a preference-based measure (10).

As a result, the findings of many clinical studies cannot be used to populate CUAs. Because new programs and treatments in RA are competing alongside other disease areas for funding, it is important for the rheumatology community to be able to demonstrate the value of their interventions to policy makers. Estimating a relationship between the HAQ and a preference-based measure would make it possible to estimate QALY scores from existing clinical data where the HAQ has been measured but preference-based instruments have not (5, 11). Moreover, in trials where one such preference-based instrument has been measured, it could also be possible to estimate another. Such analyses have previously been attempted for outcomes in asthma and obesity (11, 12). In the present study, we used data from the UK and Canada to map 2 preference-based instruments, the EQ-5D and the SF-6D, from the HAQ. We then demonstrate how the results can be used in practice.

MATERIALS AND METHODS

Instruments.

Health Assessment Questionnaire.

The HAQ is a self-completed questionnaire, developed as a comprehensive measure of outcome in patients with a wide variety of rheumatic diseases, including RA, osteoarthritis, juvenile RA, lupus, scleroderma, ankylosing spondylitis, fibromyalgia, and psoriatic arthritis. Although the complete form of the HAQ includes an assessment of mortality, disability, pain and symptom levels, drug side effects, and resource utilization, most studies in practice only use the physical disability scale. This scale assesses upper and lower limb function in relation to the degree of difficulty encountered in performing daily living tasks, which include walking, dressing, bathing, and shopping. The HAQ contains 20 items distributed across 8 components. The scores range from 0 (without any difficulty) to 3 (unable to do). The highest score on any item within 1 component represents the dimension score. The respondent also indicates whether he or she uses aids or devices (14 items) or help from other individuals (8 items), totaling 42 individual items. The scores for each dimension are corrected for the use of aids or devices, summated, and transformed to give an overall disability index (DI) score between 0 and 3. A score of 0 represents no disability and 3 represents very severe, high-dependency disability (6).
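
The scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring program: the function name and the input layout are hypothetical, and the correction for aids, devices, or help (raising a component score below 2 to 2) is the conventional HAQ rule, which is assumed here because the text does not spell it out.

```python
def haq_disability_index(item_scores, aid_or_help_components=()):
    """Return (DI, component scores) for one respondent.

    item_scores: dict mapping each of the 8 components to a list of item
        scores, each from 0 (without any difficulty) to 3 (unable to do).
    aid_or_help_components: components for which an aid, device, or help
        from another person was reported.
    """
    component_scores = {}
    for component, scores in item_scores.items():
        # The highest item score within a component is the component score.
        score = max(scores)
        # ASSUMPTION (conventional HAQ rule, not stated in the text):
        # reported aids/devices/help raise a score of 0 or 1 to 2.
        if component in aid_or_help_components and score < 2:
            score = 2
        component_scores[component] = score
    # Summing and rescaling to 0-3 is equivalent to averaging the components.
    di = sum(component_scores.values()) / len(component_scores)
    return di, component_scores


# Made-up respondent, for illustration only.
example_items = {
    "dressing and grooming": [0, 1],
    "arising": [1, 1],
    "eating": [0, 0, 0],
    "walking": [2, 1],
    "hygiene": [0, 0, 0],
    "reach": [1, 0],
    "grip": [0, 0, 0],
    "activities": [3, 2, 1],
}
example_di, example_components = haq_disability_index(example_items, {"eating"})
```

In this made-up case the eating component is lifted from 0 to 2 because an aid was reported, so the 8 component scores sum to 10 and the DI is 1.25.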

EQ-5D.

The EQ-5D has 5 dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has 1 item, and each item has 3 levels with 1 denoting no problems and 3 denoting extreme problems (13). The number of theoretically possible health states is 3^5 = 243. The EQ-5D can be reported in terms of a 5-digit profile indicating the level on each dimension, or in terms of a preference-based single index number. The latter is obtained by applying algorithms that link the 5-digit health state description with average valuations obtained from members of the public using the TTO method or a visual analog scale. In this study, EQ-5D indices were obtained using the so-called Measurement and Valuation of Health (MVH A1) value set, derived from a population survey in the UK using 10-year TTOs (14).

SF-6D.

The SF-6D was derived from the Short Form 36 (SF-36) (15). The SF-36 is a generic measure of health that generates scores across 8 dimensions of health (16). It has become one of the most widely used generic measures of health throughout the world, but it was not originally designed for use in economic evaluation. A research team at the University of Sheffield in collaboration with Dr. John Ware estimated a preference-based single index measure of health from the SF-36 (15). The index is estimated via a health state classification called the SF-6D derived from the SF-36 and is composed of 6 multilevel dimensions of health. It was constructed from a sample of 11 items selected from the SF-36 to minimize the loss of descriptive information and defines 18,000 health states. A selection of 249 states defined by the SF-6D has been valued by a representative sample of the UK general population (n = 611) using the SG valuation technique. Like the EQ-5D, regression models were estimated to predict single index scores for all health states defined by the SF-6D. The resultant algorithm can be used to convert SF-36 data at the individual level to a preference-based index.
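
As a quick arithmetic check on the sizes of the two descriptive systems, the number of health states is the product of the number of levels on each dimension. The sketch below assumes the SF-6D level counts implied by the domain ranges reported in Table 1 (e.g., physical functioning 1–6, role limitation 1–4); the dictionary and function names are illustrative only.

```python
from math import prod

# EQ-5D: 5 dimensions, 3 levels each (as described in the text).
EQ5D_LEVELS = {
    "mobility": 3,
    "self-care": 3,
    "usual activities": 3,
    "pain/discomfort": 3,
    "anxiety/depression": 3,
}

# SF-6D: level counts ASSUMED from the domain ranges shown in Table 1.
SF6D_LEVELS = {
    "physical functioning": 6,
    "role limitation": 4,
    "social functioning": 5,
    "pain": 6,
    "mental health": 5,
    "energy and vitality": 5,
}


def count_health_states(levels_per_dimension):
    """Number of distinct profiles = product of the levels per dimension."""
    return prod(levels_per_dimension.values())


eq5d_states = count_health_states(EQ5D_LEVELS)   # 3^5
sf6d_states = count_health_states(SF6D_LEVELS)   # 6 x 4 x 5 x 6 x 5 x 5
```

Under these assumptions the counts agree with the figures quoted in the text: 243 states for the EQ-5D and 18,000 for the SF-6D.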

Study populations.

Participants were recruited from 2 locations. In Vancouver, Canada, 319 patients with a clinical diagnosis of RA from 8 private rheumatology offices were followed up quarterly over 3 assessment periods between October 2001 and September 2002. At Maidstone Hospital, UK, 151 patients with a clinical diagnosis of RA who were under routine treatment in the department of rheumatology were assessed in 2001. All patients self-administered the HAQ, the SF-36, and the EQ-5D, in no particular order, at each clinic visit. We recruited 2 samples in order to generate an algorithm more generalizable to external populations.

Statistical analysis.

For the primary analysis, the relationships between scores on the EQ-5D, SF-6D, and HAQ DI were examined by fitting linear regression models estimated with generalized estimating equations, using a first-order autoregressive (AR[1]) working correlation structure. We evaluated 5 different regression models. Model 1 regressed the EQ-5D and SF-6D on the HAQ DI alone. Model 2 used all 8 domain scores, treating each as a continuous variable. Model 3 incorporated all 42 items of the HAQ (the 20 items that make up the domain scores along with the 22 questions surrounding aids or devices or help from other individuals), treating each as a continuous variable. Model 4 was the same as model 2 but treated each domain as a categorical variable with 4 levels, whereas model 5 was the same as model 3 but treated each item as a categorical variable. Each successive model required fewer assumptions about items and intervals between response choices carrying equal weight, but also increased the chances of incorporating arbitrary associations.
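
The difference between the continuous and categorical specifications can be illustrated by how one respondent's scores are turned into regression columns. The sketch below is illustrative only: the function names are hypothetical, the baseline for the dummy coding is assumed to be the "without any difficulty" response (Table 2 reports terms for levels 1 to 3 only), and the estimation step itself (generalized estimating equations) is not shown.

```python
HAQ_LEVELS = (0, 1, 2, 3)  # 0 = without any difficulty ... 3 = unable to do
N_DOMAINS, N_ITEMS, N_AID_HELP = 8, 20, 22  # counts given in the text


def continuous_coding(scores):
    """Models 2 and 3: each score enters as a single numeric column."""
    return [float(s) for s in scores]


def dummy_coding(scores, baseline=0):
    """Models 4 and 5: one 0/1 column per non-baseline level of each score.

    ASSUMPTION: the lowest level is the reference category.
    """
    row = []
    for s in scores:
        row.extend(int(s == level) for level in HAQ_LEVELS if level != baseline)
    return row


def column_counts():
    """Candidate predictors per model, excluding the intercept."""
    return {
        "model 1": 1,                             # HAQ DI only
        "model 2": N_DOMAINS,                     # 8 continuous domain scores
        "model 3": N_ITEMS + N_AID_HELP,          # 42 continuous items
        "model 4": 3 * N_DOMAINS,                 # 24 domain dummies
        "model 5": 3 * N_ITEMS + 1 * N_AID_HELP,  # 60 item + 22 yes/no dummies
    }


# Made-up domain scores for one respondent, for illustration only.
domains = [0, 1, 2, 3, 0, 0, 0, 0]
model2_row = continuous_coding(domains)   # 8 columns
model4_row = dummy_coding(domains)        # 24 columns
```

The continuous coding forces each one-level step to have the same effect, whereas the dummy coding lets every level have its own coefficient at the cost of many more parameters, which is the trade-off the paragraph above describes.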

The significance or sign of the beta coefficients was not of primary interest in this exercise given that we were interested in predictive ability rather than explanatory power of the variables. Because most data sets will collect all items of the HAQ, all coefficients were included in the final versions of models 1, 2, and 3. This was not practical for models 4 and 5 because of their large number of dummy variables. Instead, models 4 and 5 were developed using a backwards stepwise selection procedure, systematically removing the least significant variable until only significant variables remained (P < 0.05).

Model performance was judged by the difference between observed and predicted outcomes, reported as the root mean square error (RMSE) (11). Alternative measures of predictive accuracy exist (e.g., mean absolute error or intraclass correlation), but the RMSE favors prediction models that do not produce particularly large errors. It was therefore considered the most indicative measure, given that the objective of the analyses was to predict the mean EQ-5D and SF-6D scores for a cohort based on the individual HAQ DI scores, and not to predict individual scores or look for explanatory relationships (12). The goodness of fit for each model was also reported in terms of the marginal R², which accounts for the multiple observations from individuals (17).
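
The reason the RMSE penalizes large errors more heavily than the mean absolute error can be seen in a small numerical sketch. The data below are invented purely for illustration and are not taken from the study.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error: squaring gives large errors extra weight."""
    errors = [o - p for o, p in zip(observed, predicted)]
    return sqrt(sum(e * e for e in errors) / len(errors))


def mean_absolute_error(observed, predicted):
    """Mean absolute error: every unit of error counts equally."""
    errors = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(errors) / len(errors)


# Hypothetical utilities: both prediction sets have the same total absolute
# error, but set B concentrates it in a single large miss.
observed = [0.50, 0.60, 0.70, 0.80]
predicted_a = [0.60, 0.70, 0.80, 0.90]   # four errors of 0.1
predicted_b = [0.50, 0.60, 0.70, 0.40]   # one error of 0.4

mae_a = mean_absolute_error(observed, predicted_a)
mae_b = mean_absolute_error(observed, predicted_b)
rmse_a = rmse(observed, predicted_a)
rmse_b = rmse(observed, predicted_b)
```

Both sets have a mean absolute error of about 0.1, but the RMSE is about 0.1 for set A and about 0.2 for set B, so a model that occasionally misses badly is ranked lower under the RMSE.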

Residual plots were examined for nonlinear patterns and nonconstant error variance. Three-fold cross validation was then used to evaluate models. Data were randomly split into 3 subsets stratified by country. Of the 3 subsets, 2 subsets were used as training data and the remaining subset was retained as the validation data for testing the model. The process was then repeated 3 times so that each of the 3 subsets was used as the validation data exactly once. The root of the summation of mean squared errors over the 3 validations was then compared among models. The predictive performance in the UK and Canadian samples was also assessed. The generalizability of the final models to alternative patient populations was also examined by including the covariates age, sex, RA duration, tender joint count, and swollen joint count into the multivariate regression to determine whether they were important additional predictors.
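
The cross-validation procedure can be sketched as follows. This is a simplified stand-in rather than the analysis actually run: an ordinary least-squares line on the HAQ DI replaces the generalized estimating equation models, the record layout and function names are hypothetical, and the cross-validated RMSE is computed by pooling squared errors over the 3 validation subsets, which is one reading of "the root of the summation of mean squared errors."

```python
import random
from math import sqrt


def stratified_folds(records, k=3, seed=0):
    """Randomly split records into k subsets, balancing country across them."""
    rng = random.Random(seed)
    by_country = {}
    for r in records:
        by_country.setdefault(r["country"], []).append(r)
    folds = [[] for _ in range(k)]
    for group in by_country.values():
        rng.shuffle(group)
        for i, r in enumerate(group):
            folds[i % k].append(r)  # deal each country's records round-robin
    return folds


def fit_line(records):
    """Least-squares fit of utility on HAQ DI (stand-in for the GEE models)."""
    n = len(records)
    mean_x = sum(r["haq"] for r in records) / n
    mean_y = sum(r["utility"] for r in records) / n
    sxx = sum((r["haq"] - mean_x) ** 2 for r in records)
    sxy = sum((r["haq"] - mean_x) * (r["utility"] - mean_y) for r in records)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validated_rmse(records, k=3, seed=0):
    """Each subset serves as validation data exactly once."""
    folds = stratified_folds(records, k, seed)
    squared_error, n = 0.0, 0
    for i, validation in enumerate(folds):
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        intercept, slope = fit_line(training)
        for r in validation:
            squared_error += (r["utility"] - (intercept + slope * r["haq"])) ** 2
            n += 1
    # ASSUMPTION: pooled RMSE over all held-out observations.
    return sqrt(squared_error / n)
```

Comparing this quantity across candidate models, rather than the RMSE on the development data, guards against favoring models that merely fit arbitrary associations in the sample.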

RESULTS

Patient demographics.

At baseline, patients in the Canadian cohort were slightly older (61 years versus 56 years; P < 0.001) and a greater percentage of patients were women (78% versus 67%; P < 0.01) (Table 1). The mean HAQ score in the UK patients was substantially higher (1.41 versus 1.11; P < 0.01). This was reflected both in the EQ-5D scores, where UK patients had a statistically significantly different mean score of 0.51 versus 0.63 in the Canadian patients, and in the SF-6D, where UK patients had a mean score of 0.62 versus 0.68 in the Canadian sample. These scores compare with age- and sex-adjusted general population values of 0.79 and 0.77 for the EQ-5D (UK and Canadian, respectively) and 0.78 and 0.77 for the SF-6D (UK and Canadian, respectively) (18). When the complete HAQ DI and either the SF-6D or EQ-5D were available, they were included in the analysis; otherwise the record was excluded. In total, 131 records were included from the UK cohort, and 308, 258, and 226 records were included from the Canadian cohort at baseline, 3 months, and 6 months, respectively.

Table 1. Summary statistics of baseline characteristics in the 2 cohorts*

| Characteristic | UK (n = 131) | Canada (n = 308) | Total (n = 439) | P† |
| --- | --- | --- | --- | --- |
| Female sex, % | 67 | 78 | 76 | 0.01 |
| Age, years | 55.98 ± 13.68 (17–82) | 61.35 ± 13.71 (19–90) | 60.76 ± 13.61 (17–90) | < 0.01 |
| RA duration, years | ‡ | 13.98 ± 11.64 (0–57) | | |
| Tender joint count | ‡ | 15.01 ± 12.08 (0–52) | | |
| Swollen joint count | ‡ | 9.13 ± 9.66 (0–43) | | |
| HAQ disability | | | | |
| Number | 131 | 308 | 439 | |
| Index | 1.41 ± 0.80 (0–3) | 1.11 ± 0.77 (0–3) | 1.15 ± 0.78 (0–3) | < 0.01 |
| Domains, modal level (% of total) (range) | | | | |
| Dressing and grooming | 2 (35) (0–3) | 0 (46) (0–3) | 0 (39) (0–3) | < 0.01 |
| Rising | 1 (41) (0–3) | 0 (54) (0–3) | 0 (44) (0–3) | < 0.01 |
| Eating | 1 (35) (0–3) | 0 (40) (0–3) | 0 (35) (0–3) | < 0.01 |
| Walking | 2 (41) (0–3) | 0 (45) (0–3) | 0 (41) (0–3) | 0.02 |
| Hygiene | 2 (43) (0–3) | 3 (30) (0–3) | 0 (31) (0–3) | < 0.01 |
| Reach | 3 (30) (0–3) | 0 (31) (0–3) | 2 (30) (0–3) | < 0.01 |
| Grip | 2 (57) (0–3) | 2 (61) (0–3) | 2 (59) (0–3) | 0.01 |
| Activities | 2 (35) (0–3) | 2 (28) (0–3) | 2 (30) (0–3) | 0.47 |
| SF-6D | | | | |
| Number | 129 | 302 | 431 | |
| Index | 0.62 ± 0.11 (0.27–0.92) | 0.68 ± 0.13 (0.26–1) | 0.68 ± 0.13 (0.26–1) | < 0.01 |
| Domains, modal level (% of total) (range) | | | | |
| Physical functioning | 4 (31) (1–6) | 5 (30) (1–6) | 5 (28) (1–6) | < 0.01 |
| Role limitation | 4 (46) (1–4) | 2 (63) (1–4) | 2 (54) (1–4) | < 0.01 |
| Social functioning | 3 (36) (1–5) | 3 (43) (1–5) | 3 (40) (1–5) | < 0.01 |
| Pain | 5 (33) (1–6) | 4 (27) (1–6) | 4 (27) (1–6) | < 0.01 |
| Mental health | 3 (36) (1–5) | 2 (41) (1–5) | 2 (38) (1–5) | 0.01 |
| Energy and vitality | 5 (34) (1–5) | 3 (35) (1–5) | 3 (33) (1–5) | < 0.01 |
| EQ-5D | | | | |
| Number | 131 | 308 | 439 | |
| Index | 0.51 ± 0.31 (−0.35–1) | 0.63 ± 0.25 (−0.48–1) | 0.62 ± 0.27 (−0.48–1) | < 0.01 |
| Domains, modal level (% of total) (range) | | | | |
| Mobility | 2 (78) (1–2) | 2 (62) (1–3) | 2 (66) (1–3) | < 0.01 |
| Self-care | 1 (52) (1–3) | 1 (71) (1–3) | 1 (65) (1–3) | < 0.01 |
| Usual activities | 2 (71) (1–3) | 2 (66) (1–3) | 2 (63) (1–3) | < 0.01 |
| Pain | 2 (77) (1–3) | 2 (79) (1–3) | 2 (79) (1–3) | 0.01 |
| Anxiety | 1 (52) (1–3) | 1 (64) (1–3) | 1 (60) (1–3) | < 0.01 |

* Values are the mean ± SD (range) unless otherwise indicated. RA = rheumatoid arthritis; HAQ = Health Assessment Questionnaire; EQ-5D = EuroQol.
† Ordinal data compared using independent sample t-tests, categorical data compared using chi-square test.
‡ Missing data.

Prediction models.

Each of the candidate models was evaluated and those with the smallest RMSE in the cross-validation analysis were chosen as the optimal prediction models. The coefficients from the combined data source for the optimal models are shown in Table 2. Regardless of which model was used, elements of arising, eating, walking, hygiene, and grip were consistent statistically significant predictors of both health utility measures. All coefficients were negative except for hygiene. Examination of the residual plots (Figure 1) suggested relatively linear models with constant error variance.

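Table 2. Optimal regression equations for the SF-6D (models 2 and 4) and EQ-5D (model 5)*

| Domain/item | B | SE | P | RMSE (development) | RMSE (cross validation) | RMSE (Canada) | RMSE (UK) | Marginal R² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Model 2, SF-6D | | | | | | | | |
| Dressing and grooming | −0.01 | 0.00 | 0.09 | 0.089 | 0.085 | 0.082 | 0.099 | 0.50 |
| Arising | −0.03 | 0.00 | < 0.01 | | | | | |
| Eating | −0.02 | 0.00 | < 0.01 | | | | | |
| Walking | −0.01 | 0.00 | < 0.01 | | | | | |
| Hygiene | 0.01 | 0.00 | 0.07 | | | | | |
| Reach | −0.01 | 0.00 | 0.01 | | | | | |
| Grip | −0.01 | 0.00 | < 0.01 | | | | | |
| Activities | −0.02 | 0.00 | < 0.01 | | | | | |
| Constant | 0.79 | 0.01 | < 0.01 | | | | | |
| Model 4, SF-6D | | | | | | | | |
| Arising = 1 | −0.03 | 0.01 | < 0.01 | 0.089 | 0.084 | 0.081 | 0.099 | 0.51 |
| Arising = 2 | −0.05 | 0.01 | < 0.01 | | | | | |
| Arising = 3 | −0.11 | 0.04 | < 0.01 | | | | | |
| Eating = 1 | −0.02 | 0.01 | < 0.01 | | | | | |
| Eating = 2 | −0.04 | 0.01 | < 0.01 | | | | | |
| Eating = 3 | −0.06 | 0.02 | < 0.01 | | | | | |
| Walking = 2 | −0.02 | 0.01 | < 0.01 | | | | | |
| Walking = 3 | −0.07 | 0.02 | < 0.01 | | | | | |
| Hygiene = 1 | −0.02 | 0.01 | < 0.01 | | | | | |
| Reach = 1 | −0.02 | 0.01 | 0.01 | | | | | |
| Reach = 2 | −0.02 | 0.01 | 0.02 | | | | | |
| Reach = 3 | −0.04 | 0.01 | < 0.01 | | | | | |
| Grip = 2 | −0.02 | 0.01 | < 0.01 | | | | | |
| Constant | 0.78 | 0.01 | < 0.01 | | | | | |
| Model 5, EQ-5D | | | | | | | | |
| Dressing and grooming | | | | | | | | |
| H1 = 2 | −0.15 | 0.04 | < 0.01 | 0.183 | 0.178 | 0.161 | 0.241 | 0.57 |
| Arising | | | | | | | | |
| H4 = 1 | −0.08 | 0.02 | < 0.01 | | | | | |
| H4 = 2 | −0.12 | 0.05 | 0.02 | | | | | |
| H4 = 3 | −0.59 | 0.08 | < 0.01 | | | | | |
| Eating | | | | | | | | |
| H6 = 2 | −0.15 | 0.05 | 0.01 | | | | | |
| H7 = 1 | −0.04 | 0.02 | 0.02 | | | | | |
| H7 = 2 | −0.08 | 0.03 | 0.01 | | | | | |
| Walking | | | | | | | | |
| H8 = 2 | −0.10 | 0.04 | 0.03 | | | | | |
| H9 = 3 | 0.12 | 0.05 | 0.02 | | | | | |
| Aids or devices | | | | | | | | |
| H13 = 2 | −0.14 | 0.04 | < 0.01 | | | | | |
| H16 = 1 | 0.07 | 0.03 | 0.01 | | | | | |
| Hygiene | | | | | | | | |
| H23 = 1 | −0.05 | 0.02 | < 0.01 | | | | | |
| H24 = 1 | −0.05 | 0.02 | 0.01 | | | | | |
| H24 = 2 | −0.11 | 0.04 | < 0.01 | | | | | |
| Reach | | | | | | | | |
| H26 = 2 | −0.14 | 0.04 | < 0.01 | | | | | |
| H26 = 3 | −0.13 | 0.06 | 0.03 | | | | | |
| Grip | | | | | | | | |
| H27 = 2 | −0.08 | 0.04 | 0.04 | | | | | |
| H27 = 3 | −0.20 | 0.07 | < 0.01 | | | | | |
| H28 = 3 | | | | | | | | |
| Activities | | | | | | | | |
| H30 = 1 | −0.05 | 0.02 | < 0.01 | | | | | |
| H31 = 1 | −0.07 | 0.02 | < 0.01 | | | | | |
| H31 = 2 | −0.08 | 0.04 | 0.03 | | | | | |
| H32 = 3 | −0.09 | 0.03 | < 0.01 | | | | | |
| Constant | 0.80 | 0.01 | < 0.01 | | | | | |

* RMSE = root mean square error; H1 = dress yourself, including tying shoelaces and doing buttons; H4 = get in and out of bed; H6 = lift a full cup or glass to your mouth; H7 = open a new milk carton; H8 = walk outdoors on flat ground; H9 = climb up 5 steps; H13 = wheelchair; H16 = chair; H23 = take a tub bath; H24 = get on and off the toilet; H26 = bend down to pick up clothing from the floor; H27 = open car doors; H28 = open jars that have been previously opened; H30 = run errands and shop; H31 = get in and out of a car; H32 = do chores such as vacuuming or yardwork.
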
Figure 1.

Predicted versus actual EuroQol (EQ-5D; model 5) and SF-6D (model 4) scores.

Models 2 and 4 performed equally well and were the best models for predicting the SF-6D, with RMSEs equal to 0.09. Model 2 regressed the SF-6D indices onto the 8 HAQ DI dimension scores, with each dimension treated as a continuous variable. This assumes that the 42 items of the HAQ DI carry equal weight within a given domain and the intervals between response choices for each item are equal. Model 4 was less restrictive by entering each level of the domain as a dummy variable with level 1 as the baseline (i.e., 3 × 8 dummy codes representing the 4 possible responses for each dimension), allowing each dimension to have ordinal properties. In the final model 4 for the SF-6D, 13 of the 24 dummy variables were retained (Table 2). Both models had marginal R² values ≥0.5.
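
To make the two SF-6D equations concrete, the sketch below applies the coefficients reported in Table 2 to a set of HAQ domain scores. It is an illustration, not the authors' downloadable tool: the function and dictionary names are hypothetical, the coefficients are the values as printed (rounded to 2 decimal places), and the output is meant for estimating cohort means rather than an individual's utility.

```python
# Model 2 (Table 2): each domain score (0-3) enters as a continuous variable.
MODEL2_CONSTANT = 0.79
MODEL2_SLOPES = {
    "dressing and grooming": -0.01,
    "arising": -0.03,
    "eating": -0.02,
    "walking": -0.01,
    "hygiene": 0.01,
    "reach": -0.01,
    "grip": -0.01,
    "activities": -0.02,
}

# Model 4 (Table 2): only the domain levels retained after stepwise selection.
MODEL4_CONSTANT = 0.78
MODEL4_LEVELS = {
    "arising": {1: -0.03, 2: -0.05, 3: -0.11},
    "eating": {1: -0.02, 2: -0.04, 3: -0.06},
    "walking": {2: -0.02, 3: -0.07},
    "hygiene": {1: -0.02},
    "reach": {1: -0.02, 2: -0.02, 3: -0.04},
    "grip": {2: -0.02},
}


def predict_sf6d_model2(domain_scores):
    """Constant plus slope times score for each of the 8 domains."""
    return MODEL2_CONSTANT + sum(
        MODEL2_SLOPES[d] * score for d, score in domain_scores.items()
    )


def predict_sf6d_model4(domain_scores):
    """Constant plus the dummy coefficient for each domain's observed level.

    Levels without a retained coefficient contribute nothing.
    """
    return MODEL4_CONSTANT + sum(
        MODEL4_LEVELS.get(d, {}).get(score, 0.0)
        for d, score in domain_scores.items()
    )


no_disability = {d: 0 for d in MODEL2_SLOPES}
most_severe = {d: 3 for d in MODEL2_SLOPES}
```

With these rounded coefficients, a profile with no difficulty on any domain maps to the constants (0.79 and 0.78), and a profile at level 3 on every domain maps to approximately 0.49 under model 2 and 0.50 under model 4.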

The model with the most covariates (model 5) was considered the optimal model for the EQ-5D, with an RMSE equal to 0.18. In model 5, the EQ-5D indices were regressed on the individual levels of the HAQ DI item scores, where each level was entered as a dummy variable with level 1 as the baseline (i.e., 3 × 20 dummy codes representing the 4 possible responses for each item of the 8 domains, and 1 × 22 dummy codes representing the dichotomous parameters). This model made the least stringent assumptions and did not assume that the response choices have ordinal properties (Table 2). Again, the marginal R² value of the model was >0.5.
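
The item-level EQ-5D equation can be applied in the same way. The sketch below uses the model 5 coefficients as printed in Table 2; the names are hypothetical, item codes follow the table footnote, and the H28 = 3 term is omitted because no coefficient is shown for it in the table. As with the SF-6D models, the result is intended for averaging over a cohort.

```python
MODEL5_CONSTANT = 0.80

# (item code, response level) -> coefficient, as printed in Table 2.
MODEL5_COEFFICIENTS = {
    ("H1", 2): -0.15,
    ("H4", 1): -0.08, ("H4", 2): -0.12, ("H4", 3): -0.59,
    ("H6", 2): -0.15,
    ("H7", 1): -0.04, ("H7", 2): -0.08,
    ("H8", 2): -0.10,
    ("H9", 3): 0.12,
    ("H13", 2): -0.14,
    ("H16", 1): 0.07,
    ("H23", 1): -0.05,
    ("H24", 1): -0.05, ("H24", 2): -0.11,
    ("H26", 2): -0.14, ("H26", 3): -0.13,
    ("H27", 2): -0.08, ("H27", 3): -0.20,
    # ("H28", 3) appears in Table 2 without a coefficient, so it is left out.
    ("H30", 1): -0.05,
    ("H31", 1): -0.07, ("H31", 2): -0.08,
    ("H32", 3): -0.09,
}


def predict_eq5d_model5(item_responses):
    """item_responses: dict mapping an item code (e.g., "H4") to its level.

    Item/level pairs that were not retained in the model add nothing.
    """
    return MODEL5_CONSTANT + sum(
        MODEL5_COEFFICIENTS.get((item, level), 0.0)
        for item, level in item_responses.items()
    )


def mean_predicted_utility(cohort):
    """Average the individual predictions to estimate the cohort mean."""
    return sum(predict_eq5d_model5(p) for p in cohort) / len(cohort)


# Two made-up respondents, for illustration only.
example_cohort = [{"H4": 1, "H30": 1}, {"H1": 2, "H4": 3}]
```

In this made-up cohort the individual predictions are about 0.67 and 0.06, giving a cohort mean of about 0.365; the large H4 = 3 coefficient shows how a single item can dominate an individual prediction, which is one reason the equations are better suited to group means.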

While the RMSE can be used to choose which of the candidate models performs best, no definition exists of what level of RMSE should be considered acceptable for fitting purposes. Figure 2 demonstrates that across the range of the HAQ DI, the optimal model predictions for both the EQ-5D and SF-6D were close to the observed values. Only in the first group (HAQ 0–0.5) was the prediction significantly different from the actual utility (P < 0.01). Even in the higher HAQ groups where there were fewer patients, the predictions appeared to be robust.

Figure 2.

Predicted and actual EuroQol (EQ-5D; model 5) and SF-6D (model 4) scores and confidence intervals across Health Assessment Questionnaire (HAQ) groups for all observations.

Generalizability.

We attempted to assess the generalizability of the prediction models by including other characteristics of the study populations. We found that the Canadian population had a small but significantly higher estimated utility score compared with the UK cohort, above what was explained by the HAQ DI (B = 0.06 for the EQ-5D and B = 0.04 for the SF-6D, P < 0.05). However, because the estimated effect of HAQ elements was not changed when country was added to the models as a covariate, an estimated utility gain using these algorithms would not be affected by which country the patients were from. Of the other clinical variables examined in the Canadian baseline data, none were statistically significant for the EQ-5D, whereas only the number of tender joints was found to be a significant predictor for the SF-6D (B = −0.0016, P < 0.05). However, the inclusion of clinical variables did not improve the predictive performance of the final models for either the EQ-5D or the SF-6D.

Application.

A simple example of how to use the algorithms is given in Figure 3 (a downloadable Excel sheet is available at http://www.pharmacoeconomics.ubc.ca/). For each patient in each strategy, the HAQ DI should be translated to either the EQ-5D or SF-6D at pre- and postintervention. The average utility can then be developed for each time interval. A way of calculating a QALY would be to then calculate an appropriate area under the curve (e.g., [pre-utility + post-utility] / 2 × elapsed time [in years]) and multiply by the annual survival percentage. This describes a simple method for developing just one component in a cost-effectiveness model. The incorporation of costs, extrapolation of costs and benefits, and a comparison between at least 2 strategies are just a few additional requirements before an incremental cost-effectiveness ratio can be derived (1).
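
The area-under-the-curve calculation described above can be written out as a short sketch. The function names and the example numbers are hypothetical; the formula is the one given in the text, with the survival percentage expressed as a proportion.

```python
def mean_utility(utilities):
    """Average predicted utility for a cohort at one time point."""
    return sum(utilities) / len(utilities)


def qalys_for_interval(pre_utility, post_utility, elapsed_years, survival=1.0):
    """(pre-utility + post-utility) / 2 x elapsed time x survival.

    survival: proportion of the cohort surviving the interval
        (1.0 means no deaths); an assumption supplied by the analyst.
    """
    return (pre_utility + post_utility) / 2 * elapsed_years * survival


# Made-up predicted utilities for a small cohort, for illustration only.
pre = mean_utility([0.45, 0.50, 0.55])    # preintervention
post = mean_utility([0.55, 0.60, 0.65])   # postintervention, 6 months later
example_qalys = qalys_for_interval(pre, post, elapsed_years=0.5, survival=0.98)
```

With mean utilities of 0.50 and 0.60 over half a year and 98% survival, the interval contributes about 0.27 QALYs per patient; repeating the calculation for a comparator strategy gives the incremental QALYs needed for a cost-effectiveness ratio.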

Figure 3.

Example of calculation required for estimating a preference-based index. HAQ = Health Assessment Questionnaire.

An example of how the models would predict utility gains is given in Figure 4. We divided the patients in the Canadian sample into responders and nonresponders based on whether they had an improvement in HAQ DI of at least the minimum important difference, defined as 0.25 (19). The difference between estimated and observed mean utility gains was small (0.13 versus 0.10 for the EQ-5D and 0.05 versus 0.06 for the SF-6D, for estimated and observed gains, respectively).
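
The responder comparison can be sketched as follows. The record layout, field names, and data are hypothetical, and treating an improvement of exactly 0.25 as a response is an assumption; a lower HAQ DI indicates less disability, so improvement is the baseline score minus the 6-month score.

```python
MINIMUM_IMPORTANT_DIFFERENCE = 0.25  # HAQ DI units, as cited in the text


def mean_gain_by_response(patients, prefix):
    """Mean 0-to-6-month utility gain for responders and nonresponders.

    prefix: "observed" or "predicted", selecting which utility fields to use.
    """
    gains = {"responder": [], "nonresponder": []}
    for p in patients:
        haq_improvement = p["haq_0m"] - p["haq_6m"]
        # ASSUMPTION: an improvement equal to the threshold counts as response.
        group = (
            "responder"
            if haq_improvement >= MINIMUM_IMPORTANT_DIFFERENCE
            else "nonresponder"
        )
        gains[group].append(p[prefix + "_6m"] - p[prefix + "_0m"])
    return {g: sum(v) / len(v) for g, v in gains.items() if v}


# Made-up patients, for illustration only.
example_patients = [
    {"haq_0m": 1.5, "haq_6m": 1.0, "observed_0m": 0.50, "observed_6m": 0.62,
     "predicted_0m": 0.52, "predicted_6m": 0.63},
    {"haq_0m": 1.0, "haq_6m": 1.0, "observed_0m": 0.60, "observed_6m": 0.60,
     "predicted_0m": 0.62, "predicted_6m": 0.62},
    {"haq_0m": 2.0, "haq_6m": 1.75, "observed_0m": 0.30, "observed_6m": 0.40,
     "predicted_0m": 0.35, "predicted_6m": 0.40},
]
observed_gain = mean_gain_by_response(example_patients, "observed")
predicted_gain = mean_gain_by_response(example_patients, "predicted")
```

Comparing the predicted and observed mean gains within each group is the check summarized in Figure 4.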

Figure 4.

Example of the algorithm's performance for predicting change in utility for patients in the Canadian cohort achieving a minimally important difference in Health Assessment Questionnaire (HAQ) score (or not) from 0 to 6 months (model 5 for EuroQol [EQ-5D] and model 4 for SF-6D).

DISCUSSION

We anticipated that models with more available predictors would account for a higher proportion of the variance and would therefore perform better as measured by the RMSE. Although this hypothesis was accurate for the EQ-5D, where model 5 proved to be the best performing, it was not the case for the SF-6D. The results from the cross validation are arguably the most important because they indicate how generalizable the models will be to external populations. From this we found model 5 to be the most appropriate model for estimating the EQ-5D, whereas model 2 or model 4 was the most appropriate for the SF-6D. Models for the SF-6D always outperformed models for the EQ-5D (i.e., had lower RMSEs), owing to the smaller scale range of the SF-6D. Although the benefits of using the later models versus the simple estimate in model 1 would seem small in terms of the improvement in RMSEs, overall these models will provide more accurate estimates, partly due to their ability to account for the small nonlinearity seen in the relationship between the HAQ and utility, particularly at severe states of disability (Figure 2).

There are a number of important issues that need further consideration, the first being whether these results would be generalizable to external populations. To address this issue, we developed the models using data from 2 different sources, one from the UK and the other from Canada. Patients in the Canadian data set were older but had less severe RA. Patient heterogeneity within the cohorts is important because it means the models can be used for estimation across a wider range of patients. The country effect that was discovered appears not to be due to age because the Canadian population was older, but could have been due to characteristics not measured in our cohorts. The models were tested on both the UK and Canadian samples. The RMSEs were always higher for the UK population because the models were developed based on more observations from the Canadian sample. We also found that including some additional clinical variables did not improve the predictions, further suggesting that the algorithms should be as applicable in patients with only a few joints involved as in persons with multiple joint involvement. Although the populations in our sample have similar disease characteristics to patients in many recently published studies (20, 21), external validation would add assurance to the results, particularly in patients with milder or more severe disease.

Second, it has been argued that the HAQ DI does not adequately measure aspects of quality of life that are captured by the preference-based instruments, such as mental health and pain (22). We did not have sufficient data to examine the additive influence of other components in the HAQ, such as the pain score. Nevertheless, the models demonstrate that the HAQ DI does explain much of the preference-based measures we have studied, with relatively small RMSEs. Perhaps aspects of quality of life such as pain are highly correlated with the HAQ domains and are therefore indirectly covered. Such complex interactions might be the reason for the positive association between worsening hygiene and improvement in health utility. The purpose of this study was not to explain why there is a relationship between the 2 measures, but rather to explore whether there is a translation between the 2 measures. Importantly, the method described in this report is not designed to predict the utility of an individual, and would not do so accurately; rather, it would only predict the average utility of a cohort. In this respect the models seem to perform well (Figures 2 and 4).

Conversely, it is plausible that aspects of RA captured by the HAQ DI might not be covered in the preference-based measures. Concerns about the EQ-5D and SF-6D in patients with RA have previously been demonstrated (23). The purpose of this report is not to make claims on the superiority or defects of different preference-based measures, but to give researchers a method of estimating what are now frequently used instruments.

Last, this exercise provides a method that will always be suboptimal in comparison with a trial that uses a preference-based questionnaire directly. Given the objectives of the study, there are other approaches that could be used to derive a single index from the HAQ DI. A survey of the general population could be used to value a sample of states defined by the HAQ DI using a preference-elicitation technique such as SG or TTO. However, this would not only generate an enormous number of health states but, more importantly, each state would contain 42 pieces of information, which most respondents would find impossible to process. Instead, the most important items of the HAQ DI could be selected, similar to how the SF-6D uses only 11 questions from the SF-36. Another approach is to administer the HAQ alongside a preference-elicitation technique such as TTO or SG. Regression techniques could then estimate preference weights for each of the items of the HAQ DI using the SG or TTO response as the dependent variable. However, results from such a study would not meet the reference case for either the National Institute for Health and Clinical Excellence or the US Panel on Cost-Effectiveness in Health and Medicine, both of which prefer social preferences elicited using a choice-based method (3, 24). This exercise could act as a precursor to such studies, but given limited resources, we have undertaken a more pragmatic approach.

Much of this report has concentrated on studies in which no preference-based measure has been administered. Given that the SF-6D does not perform well in patients with severe RA due to a floor effect, there is a potential use when only 1 preference-based questionnaire is administered (21, 25). This is the case in the British Society for Rheumatology Biologics Registry, which measured only the SF-36 (26). The algorithms in this study allowed an estimate of EQ-5D utility to also be calculated (27).

The approach examined in this article is intended to empirically map the relationship between a non–preference-based health-related quality of life instrument and a preference-based measure. This approach has the advantage of being able to utilize existing valuation data and offers a shortcut for researchers who need health-state utility values, but have not used a preference-based measure in their clinical study because of resource constraints or a desire to limit the patient burden. This could be used to estimate the improvement in utility in important trials such as the Anti–Tumor Necrosis Factor Trial in Rheumatoid Arthritis with Concomitant Therapy (ATTRACT) trial of infliximab or the Trial of Etanercept and Methotrexate with Radiographic Patient Outcomes (TEMPO) where no preference-weighted instrument was used (18, 19). The results presented here suggest that such a model can be useful in predicting preference-based values and that the developed models have reasonable predictive ability.

AUTHOR CONTRIBUTIONS

Mr. Bansback had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study design. Bansback, Marra, Tsuchiya, Anis, Brazier.

Acquisition of data. Bansback, Marra, Anis, Hammond.

Analysis and interpretation of data. Bansback, Marra, Tsuchiya, Anis, Guh, Brazier.

Manuscript preparation. Bansback, Marra, Tsuchiya, Anis, Guh, Hammond, Brazier.

Statistical analysis. Bansback, Guh.
