The primary outcomes of most clinical studies in rheumatology are chosen because they are the most relevant and meaningful to the clinical community. However, they are of limited value to policymakers making resource-allocation decisions. Clinical outcomes may be important results in a rheumatology trial, but their use in economic evaluation is confined to cost-effectiveness analysis. In that form of analysis, outcomes are measured in units relevant to the condition under investigation. Because these units typically differ from study to study, comparisons between cost-effectiveness studies are limited, and comparison across therapeutic areas becomes almost impossible. Cost-utility analysis (CUA), a specific form of cost-effectiveness analysis, is therefore important. It measures outcomes in a standard metric that combines information on the quantity and quality of life: the quality-adjusted life year (QALY) (1, 2). The QALY requires data that express health-related quality of life (HRQOL) as a single value, known as a health state utility value (HSUV). This value is scored on a scale that assigns 1 to a state equivalent to full health and 0 to a state equivalent to death (3).
Most recent studies in rheumatology have used some form of HRQOL questionnaire, such as the Short Form 36, the Health Assessment Questionnaire (HAQ), or the Western Ontario and McMaster Universities Osteoarthritis Index. These questionnaires typically measure several aspects of quality of life and summarize the responses as a profile. However, none of these instruments alone can be used to obtain an HSUV, so they are not amenable to economic evaluation. Obtaining an HSUV requires the incorporation of a preference weight (3). The resulting values can be used to compare the general population's preferences for different disease states, both within and across diseases. When these values are linked to the effect of an intervention, policymakers tasked with improving the outcomes of the whole population can allocate resources accordingly. It is in this context that economic evaluation becomes important: it identifies what gains in HSUVs (and life years) new interventions can achieve, and at what additional cost.
HSUVs have been described, analyzed, and reported in the rheumatology literature for more than 10 years (4). With the rising cost of health care (5), the use of economic evaluations has escalated, hence the growing interest in the methods and results of HSUV measures. The motivation for using HSUVs is clear. However, the issues surrounding their development and use, which have predominantly been researched by economists, are less well understood. Policies informed by these methods can affect the treatments available to physicians and, ultimately, patients' well-being. We therefore reviewed the rheumatologic literature with the objective of identifying and addressing key issues and concepts in the valuation of HSUVs, and then reported their application in rheumatology. We make recommendations for researchers wishing to obtain values in the future and highlight issues requiring further research.
Materials and Methods
The review was based on a systematic search of Medline, EconLit, and the National Health Service Economic Evaluation Database for the years 1980–2005. The search identified articles reporting HSUVs in major rheumatic diseases (rheumatoid arthritis [RA], psoriatic arthritis [PsA], ankylosing spondylitis [AS], systemic lupus erythematosus, osteoarthritis [OA], and osteoporosis [OP]). We used the search criteria for HSUVs described by Brazier et al (6), which identify methods (e.g., multiattribute theory and time-tradeoff [TTO]), instruments (e.g., EuroQol [EQ-5D] and Health Utilities Index [HUI]), and their applications (e.g., QALYs). The search strategy is shown in Figure 1.
We included studies that reported specific HSUVs for rheumatic diseases. We excluded studies that reported HSUVs solely to describe the study population, as well as reviews and CUAs that did not present primary data. Further studies were identified by hand searching, citation searching, and reference list checking, and from studies known to the researchers involved in the present study. Two databanks of HSUVs and economic evaluations were also searched to complement and validate the search strategy (7, 8). For each selected study, the type and value of each measure were extracted, along with other details from the publication. The methods used in the articles identified at this stage were examined for inconsistencies that suggested a need to clarify key issues and concepts. These issues and concepts are addressed in the first section of the review.
To describe the impact of each disease across instruments and conditions as accurately as possible, we constructed criteria for assessing the quality of the HSUVs from each study. Because no commonly accepted criteria currently exist, we used the study conducted for the Institute of Medicine as a model (9). To meet the quality criteria, studies had to:
1) be reported in primary articles;
2) present values for which the source of preferences was the general population (as recommended by a number of authoritative experts and panels [1, 10–12]);
3) present values separately for a population with a clear diagnosis of a rheumatologic condition;
4) include a sufficient sample size (n ≥ 60); and
5) adequately describe important determinants of disease that are known to be closely related to HRQOL.
The broad search strategy found a total of 1,352 articles. We initially screened the abstracts to identify those potentially relevant to this review. A number of the articles concerned clinical utility (i.e., clinical usefulness) rather than health utility. Once these were removed, the number of articles was reduced to 333. These articles were retrieved and reviewed to identify those that contribute to the issues in the HSUV literature. Of these, 126 articles were judged to meet the final inclusion criteria as reporting usable utility values. The number of such studies has increased exponentially over the past 5–6 years. Among these articles, a wide variety of instruments was used, and a number of issues and inconsistencies were found. The inconsistencies centered on three areas: the perspective used to value health states, how those states were described and interpreted, and the techniques used to elicit values or preferences for health states (14). We begin by examining the issues identified in the literature under 4 broad themes.
Issues in the development of HSUVs.
What is being valued?
A health outcome is a path of health states evolving over time, often over an individual's lifetime. Developing a numerical score for the health outcome involves developing a score for each health state and combining that score with duration to determine the number of QALYs created. Because each health state comprises many different domains, developing a score for each state requires a process that combines the effect of each domain into a single metric (1).
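The calculation described above can be sketched in Python. The sketch multiplies each health state's HSUV by its duration and sums over the path. The `HealthState` class and `qalys` function are hypothetical names introduced for illustration, and the example path is invented. The sketch applies no discounting; this is an assumption, since economic evaluations commonly discount future QALYs.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """One segment of a health outcome path (hypothetical helper)."""
    hsuv: float   # health state utility value: 1 = full health, 0 = dead
    years: float  # time spent in this state


def qalys(path: list[HealthState]) -> float:
    """Undiscounted QALYs: sum of HSUV x duration over every state in the path."""
    return sum(state.hsuv * state.years for state in path)


# Invented example: 2 years at 0.6, then 3 years at 0.8 after an intervention
path = [HealthState(hsuv=0.6, years=2.0), HealthState(hsuv=0.8, years=3.0)]
print(f"{qalys(path):.2f} QALYs")  # 1.2 + 2.4 = 3.60 QALYs
```

Both the quality weight and the time dimension enter the result. Halving either the HSUVs or the durations halves the QALYs.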
Non–preference-based methods assign scores to the individual components of a health state, using a Likert scale or a binary response. They then sum the component scores into a single score. This can be rescaled to a 0–1 scale (e.g., the HAQ score could be divided by 3). However, this assumes that the relative weights of the different domains are known. Simple summation implicitly weights them equally; for example, an improvement from pain to no pain receives the same weight as an improvement from immobile to mobile. It also leaves a policymaker unaware of which aspects of health matter most to the population. The alternative is a preference-based methodology, in which subjects are asked to make judgments about the value of particular health states. These judgments are then used to produce a score.
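The following sketch makes the equal-weighting assumption concrete. It uses hypothetical domains and a hypothetical function name, not any published instrument. It sums item responses and rescales them to 0–1 as in the HAQ example, where higher scores indicate worse health. Two quite different profiles receive identical scores: one with severe pain only, the other with severe immobility only. Distinguishing such profiles is exactly the information a preference weight is meant to supply.

```python
def rescaled_sum_score(responses: dict[str, int], max_level: int) -> float:
    """Equal-weight sum of item responses, rescaled to 0-1 (higher = worse)."""
    return sum(responses.values()) / (max_level * len(responses))


# Hypothetical 3-domain questionnaire; each item scored 0 (no problem) to 3 (worst)
severe_pain_only = {"pain": 3, "mobility": 0, "self_care": 0}
severe_immobility_only = {"pain": 0, "mobility": 3, "self_care": 0}

score_pain = rescaled_sum_score(severe_pain_only, max_level=3)
score_mobility = rescaled_sum_score(severe_immobility_only, max_level=3)
# Both profiles get the same score (1/3): the sum cannot say which state
# the population would regard as worse.
print(score_pain, score_mobility)
```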
The incorporation of preferences is what distinguishes most common HRQOL instruments from a health utility measure. Judgments to inform preferences require individuals to consider a transition from their current (or a hypothetical) health state to an alternative (usually preferable) health state that involves sacrifices of something they value. The objective is to identify the point at which individuals are indifferent between the 2 alternative states, which is used to calculate a valuation of the health state in question. The greater the sacrifice or risk accepted to move to the alternative health state, the lower the valuation of the current health state (3).
Who should do the valuations?
Having patients value their own health avoids the need to describe hypothetical health states and ensures a good understanding of the impact of each state on a person's life. However, it has been argued that, for the purpose of informing resource allocation, the values of society at large should be used. Decisions about allocation should be made by individuals who do not stand to gain or lose from them. Because HSUVs are the basis for allocation, they should therefore be assessed from a wider perspective than that of the patient (1). Valuations of patient health states also need to be compared with optimal health (equal to 1). Patients are by definition best able to describe and understand their own HRQOL, but their perception and expectation of optimal health may be modified by their experience of disease. Debate is ongoing over whether values from patients, professionals, or members of the general population (i.e., society) are the most legitimate. Nevertheless, the general consensus is that societal values are the most appropriate for policymakers making decisions that concern a broad spectrum of the population (1, 10, 12).
The debate matters because our review identified a number of studies that found large differences in values depending on who valued the health state (15, 16). Some studies compared valuations of purely hypothetical health states (i.e., states not actually experienced by any of the groups), either between patients and the general population (17, 18) or between the general population and health professionals (15). These comparisons show clear differences between groups, but the direction of the differences is inconsistent. However, patients actually experiencing a condition have been reported to value poor health states more highly than members of the general population asked to imagine the same states (16).
Values from patients experiencing a condition can be expected to differ from the general population's valuation of the same state imagined. The unknown is often more frightening than reality (1). Patients experiencing a health state may develop coping mechanisms and modify their behavior, which may minimize the impact of that state on their quality of life. In addition, a change in expectations can lower the reference point against which patients compare their condition, relative to what they might previously have considered. This shift inflates the valuation of their current health state (19).
How are health states described?
When members of society are asked to value hypothetical health states, the descriptions must ensure that they understand the impact of each state. Participants may be asked to value specially constructed vignettes that describe each health state. Alternatively, they may value generic health state descriptions that are not specific to any condition. In the generic approach, a questionnaire with a set number of questions and response levels can then be administered to the patient population of interest. Each combination of responses (i.e., each health state) has an a priori HSUV derived from the general population survey. The review found that 6 such preference-based instruments have been used in rheumatology patients. The EQ-5D (20) was the most popular, followed by the HUI-2/3 (21), the Short Form 6D (SF-6D) (22), the Quality of Well-Being scale (QWB) (23), the 15D (24), and the Rosser index (25) (Figure 2). Each instrument covers different domains, with a different number of levels (Table 1).
Table 1. Selected characteristics of generic preference measures*
(Table body only partially available. Columns include the method of elicitation of health states. SF-6D row: domains are physical functioning, role limitation, social functioning, bodily pain, mental health, and vitality; values range from 0.3 to 1.)
* HUI = Health Utilities Index; v2 = HUI2; v3 = HUI3; SG = standard gamble; VAS = visual analog scale; EQ-5D = EuroQol; TTO = time-tradeoff; QWB = Quality of Well-Being scale; RS = rating scale; SF-6D = Short Form 6D.
In practice, general population valuation surveys have mostly been used to populate generic health status measures by providing a set of utility weights. However, similar surveys can be conducted for disease-specific instruments or outcomes. Examples include the valuation of different types of treatment response, based on the American College of Rheumatology criteria in patients with RA (26); a reduced version of the Cedars-Sinai HRQOL instrument in patients with RA (27); and 3 health states related to OP (16).
How should health states be valued?
Common techniques for measuring preferences directly include the standard gamble (SG), the TTO, and rating scales (RS)/visual analog scales (VAS) (Figure 3). The SG involves trading alternative health states against each other with a risk of immediate death, whereas the TTO trades duration of life against quality of life.

In the SG, the respondent is offered 2 alternatives. The first has the certain outcome of disease state A for life (no gamble). The second is an intervention with 2 possible outcomes (the gamble): either the respondent returns to full health for life, with probability p, or dies immediately, with probability 1 − p. The probability p is varied until the respondent is indifferent between the 2 alternatives. The preference value for state A is then equal to p.

In the TTO, the respondent is also offered 2 alternatives. The first has the certain outcome of disease state A for a specified time t (which can be the rest of the respondent's life). In the second, the respondent returns to full health for a shorter time x (x ≤ t). The time x in full health is reduced until the respondent is indifferent between the 2 alternatives. The preference value for state A is then x/t, the time in full health divided by the time in state A.

For the VAS, respondents are asked to rate their health by marking a scale (Figure 3) on which 100 is the best state they can imagine and 0 is the worst. The utility value is simply the marked point on the scale divided by 100.
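The three scoring rules above, and the search for a TTO indifference point, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a validated elicitation protocol. The function names are hypothetical, and the respondent is simulated by a function that answers each TTO choice. Bisection is only one possible way of varying the time in full health; interview protocols use other search sequences.

```python
from typing import Callable


def sg_value(p_indifference: float) -> float:
    """Standard gamble: HSUV of state A = probability of full health at indifference."""
    if not 0.0 <= p_indifference <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return p_indifference


def tto_value(x_full_health: float, t_in_state: float) -> float:
    """Time tradeoff: HSUV of state A = x / t at indifference."""
    if not 0.0 <= x_full_health <= t_in_state:
        raise ValueError("need 0 <= x <= t")
    return x_full_health / t_in_state


def vas_value(mark: float) -> float:
    """Visual analog scale: point marked on the 0-100 scale divided by 100."""
    return mark / 100.0


def find_tto_indifference(prefers_full_health: Callable[[float], bool],
                          t: float, tol: float = 0.05) -> float:
    """Bisection search for x, the years in full health judged equal to t years in state A.

    prefers_full_health(x) returns True if the respondent prefers x years in
    full health to t years in state A (a hypothetical respondent interface).
    """
    lo, hi = 0.0, t
    while hi - lo > tol:
        x = (lo + hi) / 2
        if prefers_full_health(x):
            hi = x  # full health still preferred: offer less time
        else:
            lo = x  # state A preferred: offer more time
    return (lo + hi) / 2


# Simulated respondent whose true value for state A (unknown to the interviewer) is 0.7
t_years = 10.0


def respondent(x: float) -> bool:
    return x > 0.7 * t_years


x_star = find_tto_indifference(respondent, t_years)
print(round(tto_value(x_star, t_years), 2), sg_value(0.85), vas_value(60))
```

Bisection halves the interval of candidate times after each answer, so a small number of questions pins the indifference point within the chosen tolerance.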
Experience and clear presentation have allowed high response rates and good internal validity for both the SG and the TTO. Even so, these techniques can be burdensome and difficult for some individuals to understand (28). This is not the case with the VAS, for which the respondent simply places a mark on a 0–100 scale to rate the health state (1). Probably for this reason, our review found that most studies that measured HSUVs directly from patients used some form of RS.
However, the validity of RS as a measure of the strength of preference has been questioned (29). The SG and TTO are not without criticism, but they are generally preferred. These theoretical concerns would matter less if the methods were valid and produced comparable results. The articles included in the review found that this was not the case. HSUVs derived from RS were consistently lower than those from the TTO or SG (15, 18, 30–35). Values derived from the SG were higher than those from the TTO, possibly because the SG incorporates risk into the valuation (15, 36).
The methods used to attach HSUVs to the profiles from generic health status questionnaires vary greatly (37). The EQ-5D and Rosser index predominantly used the TTO to value health states. The HUI and SF-6D used the SG (the HUI used power transformations to map some RS valuations to the SG), whereas the 15D and QWB relied on an RS. It is impractical to value every possible combination of health states: the number ranges from 243 for the EQ-5D to >20,000 for the HUI and 15D. Some form of regression is therefore used to estimate a scoring algorithm that can assign a value to every possible health state.
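The scale of the problem, and how a scoring algorithm addresses it, can be sketched for an instrument with 5 dimensions and 3 levels per dimension. This gives 3^5 = 243 states, matching the EQ-5D count above. The decrement values below are invented for illustration and are not the published EQ-5D tariff. In a real valuation study, they would be estimated by regressing the directly valued states on indicators for each dimension level. The additive form, with a single constant for any departure from full health, is likewise an assumed model specification.

```python
from itertools import product
from math import prod

DIMENSIONS = ["mobility", "self_care", "usual_activities", "pain", "anxiety"]
LEVELS = 3  # 1 = no problems, 2 = some problems, 3 = extreme problems

# HYPOTHETICAL decrements for levels 2 and 3 of each dimension (not a published
# tariff); a real study would estimate them by regression on directly valued states.
DECREMENTS = {
    "mobility": {2: 0.05, 3: 0.20},
    "self_care": {2: 0.05, 3: 0.15},
    "usual_activities": {2: 0.03, 3: 0.08},
    "pain": {2: 0.08, 3: 0.25},
    "anxiety": {2: 0.05, 3: 0.15},
}
ANY_PROBLEM_CONSTANT = 0.05  # hypothetical intercept for any state other than full health


def score(state: tuple[int, ...]) -> float:
    """Additive scoring algorithm: 1 minus the decrements for each reported problem."""
    if all(level == 1 for level in state):
        return 1.0
    value = 1.0 - ANY_PROBLEM_CONSTANT
    for dimension, level in zip(DIMENSIONS, state):
        value -= DECREMENTS[dimension].get(level, 0.0)  # level 1 carries no decrement
    return value


n_states = prod([LEVELS] * len(DIMENSIONS))  # 3**5 = 243
value_set = {state: score(state)
             for state in product(range(1, LEVELS + 1), repeat=len(DIMENSIONS))}
print(n_states, len(value_set), value_set[(1, 1, 1, 1, 1)], round(value_set[(3, 3, 3, 3, 3)], 2))
```

Only a subset of the 243 states would be valued directly in a survey. Fitting the decrements lets the remaining states be scored without further interviews.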
Utility values in the literature.
Of the 126 articles initially reviewed, only 27 studies met the quality assessment criteria. These studies comprised patients with RA, AS, PsA, OP, and OA (Figure 4). For ease of comparison, we present the results alongside age- and sex-adjusted population norms (Figure 5) (38, 39).
We found 13 studies in RA that reported values meeting the quality assessment criteria (17, 23, 40–50). Many studies were excluded because they measured HSUVs only directly, from patients. Mean values tended to be in the range of 0.5–0.75 for patients recruited from routine clinical practice (23, 40, 41, 44, 48). HSUVs clearly differ depending on patient characteristics, because mean values from the same instrument varied widely: EQ-5D means ranged from 0.1 to 0.3 units below the population norm. Important determinants of utility values among patients with RA appeared to be (in order of magnitude):
- functional class (46);
- self-reported severity (41);
- disability (43);
- income (45);
- treatment (49, 50); and
- education (45).
Patients in functional class IV differed from those in functional class I by 0.9 units on the EQ-5D (46). We report univariate associations, but several determinants besides functional class are likely to act jointly within this patient group.
The agreement between HSUVs from different instruments varied according to the severity of the health state in question. The HUI had the largest range of values between the least and the most severe levels of self-reported RA severity (0.52), roughly 1.6 times that of the SF-6D (0.33) (41).
Four studies of patients with AS met the quality criteria (51–54). All used only the EQ-5D. Again, mean values fell below the population norm by an amount similar to that in RA (0.1 to 0.3 units) (51–53). The Bath Ankylosing Spondylitis Functional Index (BASFI) and the Bath Ankylosing Spondylitis Disease Activity Index both appeared to be important determinants of utility in patients with AS (54). The univariate association appeared larger for the BASFI, but an interaction between the 2 indices is likely to be important (54).
Only 1 study of PsA was included (48). The mean EQ-5D value of 0.59 again fell within the same range as in the RA and AS studies (0.22 points below general population values). This was not conclusively shown, but the relative impact of psoriasis and of joint disease appears to be an important determinant of HSUVs in patients with PsA (48).
Five studies reported values in patient groups with OA (55–59). The largest deficit in utility values was reported for patients awaiting hip arthroplasty. In these patients, utility values increased by between 0.1 (SF-6D) and 0.4 (EQ-5D) 6 months after the operation. With the EQ-5D, postsurgery values were equivalent to those in the general population (an improvement of 0.4 units), whereas the change was smaller with both the HUI and the SF-6D. The choice of instrument would therefore have an important influence on the result of any CUA of hip arthroplasty (55, 57).
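A back-of-the-envelope sketch shows why the instrument matters for a CUA. It takes the utility gains after hip arthroplasty reported above (0.4 on the EQ-5D and 0.1 on the SF-6D) and converts them to an incremental cost per QALY gained. The incremental cost and the 10-year duration of benefit are hypothetical. The gain is assumed to stay constant, and no discounting is applied.

```python
def qaly_gain(utility_gain: float, years: float) -> float:
    """Undiscounted QALY gain from a constant utility improvement lasting `years`."""
    return utility_gain * years


def cost_per_qaly(incremental_cost: float, incremental_qalys: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per QALY gained."""
    return incremental_cost / incremental_qalys


# Utility gains at 6 months reported in the text; cost and duration are HYPOTHETICAL.
utility_gains = {"EQ-5D": 0.4, "SF-6D": 0.1}
INCREMENTAL_COST = 10_000.0  # hypothetical cost of surgery vs no surgery
YEARS_OF_BENEFIT = 10.0      # hypothetical: gain assumed constant for 10 years

ratios = {name: cost_per_qaly(INCREMENTAL_COST, qaly_gain(gain, YEARS_OF_BENEFIT))
          for name, gain in utility_gains.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:,.0f} per QALY gained")
```

With these assumptions, the EQ-5D gives 2,500 per QALY gained and the SF-6D gives 10,000. A constant gain 4 times as large always produces a cost per QALY one-quarter as large, whatever cost and duration are assumed. The two instruments could therefore place the same procedure on opposite sides of a willingness-to-pay threshold.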
Studies of OP generally reported the change in HSUVs after fracture and were therefore not included in Figure 5. Five studies met the quality assessment criteria (60–64). The most comprehensive study provided values for fractures of the hip, vertebrae, rib, pelvis, and wrist (in descending order of impact), with changes ranging from −0.09 to 0 on the HUI (61). Another study found that an increase in the number of vertebral fractures from 1 to ≥4 was associated with a reduction in EQ-5D values of 0.09 (63).
A clear rationale exists for the use of HSUVs to aid important policy decisions regarding the reimbursement of health technologies. This comprehensive search and review of HSUV articles in rheumatology documents the dramatic increase in the use of HSUVs in rheumatic diseases over recent years. Our review reveals some complex issues regarding the development of instruments to measure HSUVs, and demonstrates that different instruments consistently produce dissimilar results in multiple rheumatologic diseases.
It is predominantly the need for societal values of health utility that has led to the development of preference-based instruments. Patients with rheumatic diseases adapt to their health conditions. Their own valuations of their health states would therefore be higher than societal values and would yield smaller QALY gains, even for the most efficacious treatments. Our review demonstrates that societal HSUVs are increasingly measured in the rheumatology literature. However, only 5 articles meeting our criteria reported the change in HSUVs due to interventions (49, 50, 55–57). Without such evidence, it is difficult to demonstrate the cost utility of an intervention. Another important finding was that, for all diseases, HSUVs differed depending on which preference-based instrument was used.
The cost of using societal values is that a number of new problems are introduced. Respondents in the valuation survey have to imagine being in hypothetical health states. This limits the number and scale of the domains that can be included, and it requires respondents to understand the implications of each state accurately (37). The value of each health state must then be elicited, and the alternative methods for doing so give varying results (14). Next, the health states that were not directly valued must be modeled, for which alternative statistical methods again exist (65). Finally, the type of person recruited to the valuation survey may affect the results (66). As a consequence, the instruments suffer from three problems:
- they are sometimes insensitive (for example, the HUI was found to be insensitive in conditions that affect the lower limbs or the hands and fingers);
- they are subject to floor and ceiling effects (68); and
- as our review of values shows, they produce different values for similar health states.
These differences mean that conclusions about cost-effectiveness can be greatly affected by the choice of instrument (15, 69).
The limitations of using HSUVs for reimbursement decisions do not end with how the values are obtained. The QALY methodology and the use of CUA in reimbursement decisions are the subject of continuing debate and controversy (70, 71). Critics of the QALY methodology have questioned its assumption that the value of health is linear per unit of time (72). This assumption may lead to preference reversals, in which an individual's preference between alternatives is overruled (72). QALYs have also been criticized for assuming, for purposes of aggregation, that a year in full health is of equal value to everybody (70, 72). However, societal valuations of medical interventions vary according to the severity of a condition and the sequence and duration of health states (70). The aggregation of individual QALYs into an overall QALY total has been criticized as an oversimplification that may overlook important information about the nature of an intervention's effectiveness (73). Criticisms of CUA have focused on simplifying assumptions such as perfect divisibility and constant returns to scale. They have also focused on comparison against arbitrary thresholds that ignore opportunity cost (71).
Given these limitations in obtaining both HSUVs and QALYs, alternatives to CUA have been sought. For example, healthy-years equivalents and cost-effectiveness analysis using clinical outcomes have been proposed as alternatives to QALYs and CUA, respectively. However, no alternative has so far been found to be as practically informative as QALYs and CUA. Most policymakers who consider economic value in their allocation decisions have therefore accepted the limitations of HSUVs from a purely pragmatic decision-making perspective. Consequently, reimbursement of technologies for rheumatologic conditions will depend on two things: the measurement of HSUVs in clinical studies, and further research to overcome the limitations that are pertinent to health states experienced by patients with rheumatic diseases.
Although no instrument is conclusively superior, some general rules can guide the choice.
- First, we recommend a preference-based instrument, to obtain societal values of utility benefit and to limit patient burden.
- Second, the instrument whose domains are most relevant to the patients being studied should be chosen.
- Last, the country in which the research will primarily be disseminated should be considered, because some countries prefer certain instruments. For example, the EQ-5D is preferred in the UK and the HUI in Canada, because the valuation surveys for each instrument were conducted in those countries.
Given the current uncertainties, using multiple instruments may also be prudent. Including the EQ-5D is especially advisable, because it is quick to administer and popular, and its results can be compared with those of other studies.
We propose both further analysis of current instruments and the development of new instruments to improve future policy decisions regarding rheumatologic technologies. With the data currently available, a thorough examination of the existing preference-based instruments should be conducted. This examination should compare the reliability, content and construct validity, and responsiveness of each instrument, to assess whether one has more scientific evidence to support its use. The development of a new instrument should be considered only if it would substantially reduce the problems with preference-based instruments. Otherwise, it will only add to the confusion over which instrument and values to use. Proponents of condition-specific measures argue that using domains that focus on the aspects pertinent to the disease will increase sensitivity and construct validity (14). For example, none of the current instruments includes a domain specifically for fatigue, which has been found to be important in certain rheumatic conditions (74). A new health state classification could be based on an existing, widely used, disease-specific questionnaire. This would avoid burdening the patient with an additional questionnaire and would increase the number of clinical studies from which HSUVs could be derived. However, concerns about the comparability of HSUVs from condition-specific measures across diseases will have to be overcome.
Rheumatologic conditions must compete alongside other areas of medicine for health care resources. CUAs, which use HSUVs, are increasingly applied by policymakers across the world to allocate health care resources. Although the use of HSUVs in rheumatology is rapidly increasing, numerous gaps in data remain. Alarmingly, relatively few trials of interventions for rheumatic diseases were included in our review, indicating that HSUVs are not being collected or reported in these studies. It is for these interventions that CUA will be required. Consequently, the rheumatology community will be able to provide effective evidence for policymakers only by further understanding the use, limitations, and importance of HSUVs from a resource allocation perspective.
Dr. Anis had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study design. Bansback, Brazier, Marra, Symmons, Anis.
Acquisition of data. Bansback, Harrison.
Analysis and interpretation of data. Bansback, Harrison, Brazier, Marra, Anis.