QALYs: The Basics
This article is corrected by:
- Errata: Erratum Volume 13, Issue 8, 1065, Article first published online: 3 April 2009
Milton C. Weinstein, Department of Health Policy and Management, Harvard School of Public Health, 718 Huntington Avenue, Boston, MA 02115, USA. E-mail: firstname.lastname@example.org
The aim of this article is to review the concept of the quality-adjusted life-year (QALY), a widely used measure of health improvement that guides health-care resource allocation decisions. The QALY was originally developed as a measure of health effectiveness for cost-effectiveness analysis, a method intended to aid decision-makers charged with allocating scarce resources across competing health-care programs [1–3]. We refer to this original concept of the QALY, as defined in the early literature, as the “conventional” QALY, recognizing that alternative conceptual models have been proposed, including but not limited to so-called “equity-weighted” QALYs. The US Panel on Cost-Effectiveness in Health and Medicine and the National Institute for Health and Clinical Excellence (NICE) in Britain have both endorsed the conventional QALY for their “reference case,” i.e., a standardized methodological approach to promote comparability in cost-effectiveness analyses of different health-care interventions.
In using QALYs, we assume that a major objective of decision-makers is to maximize health or health improvement across the population subject to resource constraints. The use of QALYs further assumes that health or health improvement can be measured or valued based on amounts of time spent in various health states. The conventional QALY is therefore a valuation of health benefit. We note, however, that decision-makers may also have other objectives such as equity, fairness, and political goals, all of which currently must be handled outside the conventional QALY. Another article in this Special Issue addresses some of these variations on the conventional QALY. The QALY was not initially developed to aid individual patient decision-making, although its use has sometimes been extended into clinical decision analyses for this purpose.
The core concept of the conventional QALY is grounded in decision science and expected utility theory. The basic construct is that individuals move through health states over time and that each health state has a value attached to it. Health, which is what we are seeking to maximize, is defined as value-weighted time (life-years weighted by their quality) accumulated over the relevant time horizon to yield QALYs. Health states must be valued on a scale on which being dead has a value of 0, because the absence of life is considered to be worth 0 QALYs. By convention, the upper end of the scale is defined as perfect health, with a value of 1. To permit aggregation of QALY changes, the value scale should have interval scale properties such that, for example, a gain from 0.2 to 0.4 is as valuable as a gain from 0.6 to 0.8. States worse than dead can exist; they have a negative value and subtract from the number of QALYs. These conditions, along with an assumption of risk neutrality over life-years, are sufficient to ensure that the QALY is a useful representation of health state preferences.
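As a minimal sketch of the construct described above, the following Python function sums value-weighted time over a path of health states. The function name, the input layout, and the example path are illustrative assumptions and do not come from the article.

```python
from typing import List, Tuple


def qalys(path: List[Tuple[float, float]]) -> float:
    """Sum value-weighted time over a path of (health_state_value, years) pairs.

    Values lie on a scale where dead = 0 and perfect health = 1;
    states worse than dead carry negative values and subtract QALYs.
    """
    total = 0.0
    for value, years in path:
        if value > 1.0:
            raise ValueError("health state values cannot exceed 1 (perfect health)")
        if years < 0:
            raise ValueError("durations must be non-negative")
        total += value * years
    return total


# Hypothetical path: 10 years at 0.8, 5 years at 0.5,
# then 1 year in a state worse than dead (value -0.1).
example_path = [(0.8, 10.0), (0.5, 5.0), (-0.1, 1.0)]
example_total = qalys(example_path)  # 8.0 + 2.5 - 0.1
```

The sketch applies no discounting and treats each state's value as independent of its duration and of neighboring states, which is the additivity assumption of the conventional QALY.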
What Is Value?
In the conventional concept of QALYs, a health state that is more desirable is more valuable. Value is equated with preference or desirability. A critical question is: desirable to whom?
One possibility is to define the desirability of a health state based on how an individual would value being in that health state herself or himself. This measurement of individual preferences is commonly accomplished by preference surveys in which standard gambles, time trade-offs, or visual analog scales are used to assess preferences for specified health states. A key issue, to which we will return, is whether the relevant values for resource allocation decision-making are those of people who are currently experiencing the health state of interest, or those of people on whose behalf the decisions are being made and who may or may not be in the health state at the time they assess its value. An alternative to assessing preferences for health states directly is to assess preferences for a small set of health domains, or attributes, and then to construct a multiattribute utility as a summary measure that reflects preferences both within and across health domains.
As noted in passing previously, the desirability of a health state can also be conceptualized in terms of people's preferences about the health of the community (i.e., mainly the health of others, but possibly including themselves as a member) and not about their own health. Valuation of community health directly, instead of aggregating up from preferences about individual health, permits the decision-maker to incorporate objectives other than health maximization. We return to this issue later, and it is also discussed in the next article in this Special Issue. Preferences about health states in the community are sometimes measured using person trade-offs, although standard gambles, time trade-offs, and rating scales could be applied to this task as well as to individual health state preferences.
An important aspect of conventional QALYs, regardless of the preference measure or the perspective used in assessing health state values, is that the approach values health states and not changes in health states. Another article in this Special Issue (Nord et al., this issue) considers alternatives to the conventional QALY that value changes in health rather than absolute levels of health.
How Are QALYs Used?
Who are the decision-makers and what are they asking? Some of the broad categories of the uses of QALYs in health decisions are represented by the columns in Table 1, and we discuss these first. Next, we turn to the various attributes of the procedures used to define and construct the QALY measure, represented by the rows of Table 1. First, what is being valued? Second, whom should we ask, and whose preferences should matter: people currently experiencing the health state, or others? Third, what should be asked—what valuation technique should be used, and how should it be administered? Fourth, how are the health outcomes that are the object of the preference assessments defined? Are they health states (the conventional approach), paths of health states over time, or changes in health states, either at a point in time or over time? We take up these aspects of QALYs sequentially. The last two rows of Table 1, to which we will return in discussing each aspect of QALYs, summarize some of the additional assumptions needed to invoke the QALY as a reasonable measure of health or health improvement, and some additional considerations that are left out of the QALY.
Table 1. Matrix characterizing uses and definitions of QALYs
|Question:||Personal clinical or health insurance decisions||Societal or programmatic audit||Societal resource allocation (individual perspective)||Societal resource allocation (community perspective)|
|Value concept: whose health outcome and whose preferences||Individual's health: ex ante desirability as seen by the individual||Individuals' health: experienced utility, then aggregated||Individuals' health: ex ante desirability as seen by each individual, then aggregated||Community health: the health of others,* as seen by each individual, then aggregated|
|Whom to ask:||The individual, informed by patients/disabled people†||Those affected by the activity; e.g., patients/disabled people, those “prevented” from a disease, etc.||Representative sample of population||Representative sample of population|
|Valuation technique:||SG,‡ TTO,§ RS||SG,‡ TTO,§ RS, or MAU instrument||SG,‡ TTO,§ RS, or MAU instrument||PTO, or transformation of MAU values|
|Health outcomes:||Complete health profiles over time,∥ or health states and durations||Complete health profiles over time,∥ or health states and durations||Health states and durations (conventional QALY application)||Health states and durations|
|Additional assumptions needed:||For profiles: none, if SG used. For states and durations: risk neutrality on longevity, additivity across time||For profiles: aggregation across individuals. For states and durations: risk neutrality on longevity, additivity across time, aggregation across individuals||Risk neutrality on longevity, additivity across time, aggregation across individuals||Risk neutrality on longevity, additivity across time, aggregation across individuals|
|Additional considerations needed:|| ||Equity of actual outcomes||Equity of potential outcomes||Equity/fairness built in to some extent?|
What Is the Question?
We have arbitrarily divided the universe of questions for which QALYs, broadly defined, may provide part of the answer into three categories, as reflected in the three major columns in Table 1. The first category (in the right-hand column) comprises societal resource allocation questions—that is, priority setting across proposed programs and program changes. As stated above, this has been the primary focus of the conventional QALY. The second category (left-hand column) encompasses personal decisions that individuals make to affect their own health, including clinical decisions and decisions about the choice of health insurance coverage. A third category of QALY use, which we call societal audit or programmatic audit (middle column), is to evaluate ongoing activities or programs in terms of the health of a population. For this purpose, one needs a description of the health of the population, either at a point in time or changes over time.
The last two rows of Table 1, to which we will return in discussing each aspect of QALYs, summarize some of the additional assumptions that support the QALY as a reasonable measure of health or health improvement, and some additional considerations that are left out of the QALY.
What Is Being Valued?
For personal clinical or insurance decisions, the decision-maker is an individual (or household unit) who is concerned about the desirability of each of the possible health states. We often approach these decisions analytically by using decision trees or various kinds of optimization models in which the decision-maker seeks to optimize expected utility. Because the perspective is that of the individual decision-maker, the relevant utilities are those of the individual, as viewed at the time of the decision. Hence, if the possible outcomes of the decision include health states that the individual has never experienced, the relevant preferences are those of the individual ex ante the decision and ex ante any experience in the health state. Of course, a prudent decision-maker would seek to become well informed about those possible health states, including possibly asking people who have experienced them to convey how they feel about them.
For purposes of programmatic or societal audit, we are usually interested in valuing the current health of the affected population members from their own perspective. A variant approach might be to take into account not only the desirability of the current health states but also the desirability of the future health prospects of the members of the population, including life expectancy and anticipated health prognoses. A rationale for the latter approach to societal audit might be that if the population has a high prevalence of risk factors for future mortality and morbidity, their health would be considered less desirable than a population with similar current health but a better prognosis.
Whereas individual clinical or insurance decisions are governed by individual ex ante preferences, and programmatic or societal audits are accomplished by measuring individual ex post preferences, societal resource allocation decisions can be guided by measures of value either from the perspective of individuals (ex ante or ex post) or from the perspective of the community. For individuals, this amounts to measuring the desirability of health states to individuals, as in the case of individual clinical decisions, and then aggregating across individuals. This individual-based approach to measuring value is consistent with the principle of consumer sovereignty, the keystone of welfare economics, and this has been the approach most commonly applied in constructing conventional QALYs. Individual health preferences are measured through techniques such as the standard gamble, time trade-off, and visual analog scale. The preferences in conventional QALYs are usually ex ante, although they could also be based on the values expressed by individuals who find themselves in the health states ex post.
For societal resource allocation decisions, the value of a health outcome can also be elicited in terms of how individuals feel about the health of others or of the community as a whole, and then aggregated. The desirability of health states or health state changes to others can incorporate concerns for fairness as seen by each individual. For example, an individual might attach higher value to health changes for people who start out in poor health compared to people who start out in good health.
Whom Do We Ask?
From a societal point of view, we can either ask for people's preferences or values about their own health or about the health of others. To determine how people value their own possible health states, one would survey a representative sample of the affected population (which includes patients or disabled people as they occur in the population) and elicit preferences about a range of health states. People who have not experienced particular health states should ideally be informed by knowledge of what patients and disabled people tell them about what it is like to be in those states. Although one might draw a contrast between this ex ante approach and an ex post approach that limits the relevant preferences to people who are currently experiencing the health states of interest, there is not a complete dichotomy. Individuals do have opportunities to become informed about the adaptations that people go through when they experience the various possible health states that the individuals are being asked to value.
A different answer to the question of whom to ask for preferences to inform societal resource allocation decisions is to elicit preferences regarding the health of the community at large, rather than individual preferences for people's own health. When valuing the health of others, we would still tend to go to a representative sample of the population, which includes patients and disabled people as they occur, and whose members are also informed by what they know about the value that patients or disabled people attach to their own health.
For programmatic or societal audit, we would go to the members of the population of interest, and for personal clinical decisions, we would go to the individual patient as the decision-maker. In the context of individual decisions, it is clearly the individual's own preferences that we are interested in, with the proviso that these preferences are informed by knowledge of people who are in the health states that the individual has not yet experienced but may experience.
What Do We Ask?
The standard gamble is the method of choice for personal clinical or insurance decisions, based on the principles of expected utility maximization from decision theory. Nevertheless, the time trade-off and possibly rating scales are also used to value health states because of concern that the standard gamble may be subject to cognitive biases in elicitation.
For the societal individualistic approach, preferences for QALYs are obtained by using the standard gamble, time trade-off, or rating scale, or by using multi-attribute instruments that apply these methods to obtain preferences within and across selected domains of health. For valuing the health of communities or others, the person trade-off offers an additional option. The person trade-off could also be used at the individualistic level, but it tends to be more naturally used in evaluating the health of communities.
How Do We Ask These Value Questions?
The standard gamble is based on well-defined, widely accepted axioms of consistency of preferences under uncertainty such as transitivity, independence, and continuity. The standard gamble stands alone among these measures in having been shown to have interval scale properties with respect to preferences, such that a change from 0.2 to 0.4 is equally valued as a change from 0.6 to 0.8. The time trade-off tends to be approximately equivalent to the standard gamble, as demonstrated by several empirical studies. It has a unique conceptual relationship to QALYs because it is explicitly a trade-off of time in an impaired health state against healthy time, that is, quality-adjusted time. The time trade-off is theoretically equivalent to the standard gamble under the conditions in which QALYs are appropriate as a utility, which include risk neutrality with respect to longevity. Rating scales (including visual analog scales) are generally considered theoretically inferior to standard gambles or time trade-offs because of the scaling biases they entail and because they involve a rating task rather than a choice task. Nevertheless, they are not subject to the cognitive biases in elicitation that may be induced by the use of probabilities in the standard gamble elicitation task, nor are they affected by temporal discounting as in the time trade-off elicitation task.
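The article does not spell out how a response is converted into a health state value, so the sketch below uses the usual textbook forms for states better than dead, stated here as assumptions. In the standard gamble, the value is the indifference probability of perfect health in a gamble between perfect health and death. In the time trade-off, it is the ratio of healthy years to impaired years judged equivalent. Function and parameter names are hypothetical.

```python
def standard_gamble_utility(p_indifference: float) -> float:
    """Value of a health state from a standard gamble response.

    Assumes the respondent is indifferent between the state for certain and a
    gamble giving perfect health (value 1) with probability p_indifference
    and death (value 0) otherwise, so the expected utility is p_indifference.
    """
    if not 0.0 <= p_indifference <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return p_indifference


def time_tradeoff_utility(healthy_years: float, impaired_years: float) -> float:
    """Value of a health state from a time trade-off response.

    Assumes the respondent is indifferent between impaired_years in the state
    and a shorter span of healthy_years in perfect health.
    """
    if impaired_years <= 0:
        raise ValueError("impaired_years must be positive")
    if not 0.0 <= healthy_years <= impaired_years:
        raise ValueError("healthy_years must lie between 0 and impaired_years")
    return healthy_years / impaired_years


# Hypothetical responses for the same health state.
sg_value = standard_gamble_utility(0.85)
tto_value = time_tradeoff_utility(healthy_years=7.0, impaired_years=10.0)
```

The two example values differ on purpose: in practice the methods need not agree, and they coincide in theory only under the conditions noted above, including risk neutrality with respect to longevity.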
Examples of multiattribute utility instruments include the EuroQOL 5-item scale (EQ-5D), the 7-item Health Utilities Index 2 scale, the 8-item Health Utilities Index 3 scale, the 6-item SF-6D scale based on the SF-36, the 4-item Quality of Well-Being scale, the 15-item 15D scale, and the 5-item Assessment of Quality of Life scale. In this approach, the health states that are valued comprise a matrix of combinations of health domains, or attributes, which are associated with the particular instrument. For example, the 243 health states in the EQ-5D are defined by selecting one of the three levels of health within each of the five health domains. Patients classify themselves into one of the cells in these matrices. Each cell comes with a score that has been previously obtained by a survey of members of the community. This is the method of choice in most contemporary cost-effectiveness studies. It was the approach recommended by the Panel on Cost-Effectiveness in Health and Medicine, and it is preferred by NICE. The problem is that different instruments give different results, partly due to the preference elicitation methods, partly due to choice of health attributes, and partly due to the manner in which interactions across the individual health attributes are modeled. Moreover, the value scores may be population-specific (by country or by sociodemographic characteristics of the respondents), although less so empirically than one might suspect.
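To make the size of such a classification matrix concrete, the sketch below enumerates every combination of three levels across five domains, which gives the 243 states mentioned above. The domain labels follow the commonly cited EQ-5D dimensions and the five-digit profile code is a common shorthand; neither appears in this article, and no scoring tariff is included because the scores depend on the community survey used.

```python
from itertools import product
from typing import List, Tuple

# Domain labels are an assumption based on the commonly cited EQ-5D dimensions.
DOMAINS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = (1, 2, 3)  # 1 = no problems, 3 = most severe problems


def all_states() -> List[Tuple[int, ...]]:
    """Enumerate every combination of one level per domain."""
    return list(product(LEVELS, repeat=len(DOMAINS)))


def state_code(state: Tuple[int, ...]) -> str:
    """Write a state as a digit string, one digit per domain."""
    return "".join(str(level) for level in state)


states = all_states()
state_count = len(states)  # 3 ** 5
```

A patient's self-classification picks one of these cells, and a previously surveyed score for that cell then supplies the quality weight.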
The person trade-off is one example of another approach, which we might label as population equivalence methods. Here, individuals act as surrogate decision-makers and make hypothetical choices about competing programs, as if they were benevolent dictators. For example, the person trade-off might ask respondents to choose between helping X patients with condition A to improve to A′, versus helping Y patients with condition B improve to B′. The paradigm for a person trade-off involves asking representative members of the community to compare the desirability of giving numbers of patients different health improvements. In one variation of this approach, one of X or Y is fixed and the other is adjusted until the respondent is indifferent between the two choices. At this point, the ratio of X to Y reflects the relative value of the improvement from B to B′ compared to the improvement from A to A′. Note that the respondent to the person trade-off is making judgments about the value of health outcomes for others. Another feature is that the person trade-off allows the respondent to incorporate some aspects of equity or fairness, in that the value of a health improvement can depend on the baseline level of health (A or B in the example); specifically, it is possible to attach a greater value to health improvement for people who start off in a worse health state than others.
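The indifference point described above can be turned into a relative value with one line of arithmetic. The sketch assumes that the respondent treats total value as the number of patients times the value per improvement, so that X times the value of A to A′ equals Y times the value of B to B′. The function name and the example numbers are hypothetical.

```python
def pto_relative_value(x_patients: float, y_patients: float) -> float:
    """Relative value of improvement B->B' versus A->A' at indifference.

    Assumes X * v(A->A') == Y * v(B->B'), hence v(B->B') / v(A->A') == X / Y.
    """
    if x_patients <= 0 or y_patients <= 0:
        raise ValueError("patient counts must be positive")
    return x_patients / y_patients


# Hypothetical response: helping 100 patients improve from A to A' is judged
# equally desirable to helping 25 patients improve from B to B'.
relative_value = pto_relative_value(x_patients=100, y_patients=25)
```

In this example each B to B′ improvement is worth four times each A to A′ improvement, which could reflect either a larger health gain or a worse starting state for the B patients.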
How Are the Health Outcomes Defined?
In the conventional QALY approach, health outcomes are defined in terms of health states, and each state is valued at a particular point in time. Health state utilities are then summed over time to yield the number of QALYs, with discounting applied if the QALYs are being used in a cost-effectiveness analysis. By making the assumption that the value of being in a health state depends neither on the length of time spent in the health state, nor on the sequence of health states preceding or following it, the time dimension is taken out of the utility assessment process. A critical assumption in this regard is that QALY values, once obtained, are additive over time, possibly weighted by time preference if discounting is applied.
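The summation with optional discounting can be sketched as follows. The annual time step, the convention that the first year is undiscounted, and the 3% rate in the example are illustrative assumptions rather than choices made in the article.

```python
from typing import Sequence


def discounted_qalys(annual_values: Sequence[float], rate: float = 0.0) -> float:
    """Sum yearly health state values, weighting year t by 1 / (1 + rate) ** t.

    annual_values[t] is the value of the health state occupied in year t,
    with t starting at 0, so the first year carries full weight.
    """
    if rate < 0:
        raise ValueError("discount rate must be non-negative")
    return sum(value / (1.0 + rate) ** t for t, value in enumerate(annual_values))


# Hypothetical profile: three years at 0.8 followed by two years at 0.6.
profile = [0.8, 0.8, 0.8, 0.6, 0.6]
undiscounted = discounted_qalys(profile)           # plain sum of the values
discounted = discounted_qalys(profile, rate=0.03)  # illustrative 3% rate
```

Because each year's value enters the sum independently, reordering the same states changes the total only through the discount weights, which is the additivity assumption at work.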
A more general approach, which relaxes the assumption that the value of a health state depends neither on the states that precede or follow it nor on the length of time spent in it (i.e., intertemporal utility independence), would be to assign value to sequences of health states over time, sometimes called health profiles. Valuation of health profiles is more general than valuing health states and then summing up, and is therefore theoretically superior because it does not involve the assumption of additivity of values over time. The practical problem with valuing health profiles is that, because outcomes are specified as complete lifetime paths of health states, there are potentially a very large number of them. In the context of a Markov model, a patient-level simulation, or even a clinical trial with multiple follow-up times, the task of valuing all possible paths or profiles is virtually intractable. If there are N possible health states and T time periods, the health profile approach requires N^T (N raised to the power T) valuations.
A third approach, covered in more detail by Nord et al. in this Special Issue, values neither absolute health states nor health profiles but focuses on changes in health states. Instead of valuing health state X and a preferred health state Y separately, value is attached directly to the improvement from X to Y. The person trade-off, in particular, invites the respondent to focus on both the origin and destination health states and, if desired, to attach greater value to health improvements from less desirable origin states. One essential difference between this approach and traditional QALYs is that when coupled with the person trade-off, this method for eliciting preferences for changes in health focuses on the health of a community, of which the respondent may (or may not) perceive himself or herself to be a member.
A similar limitation also applies to this approach as for the health profile approach: the number of utility elicitations required can be very large. If there are N possible health states, the approach that values changes requires N × (N − 1) valuations—not as large as the number of possible health profiles but still larger than the number of possible health states. For example, if a range of health states as rich as is provided by the EQ-5D (243 health states) is desired, there would be 243 × 242 = 58,806 possible changes in health states to consider. Another problem with this approach is that the pathway for change from one health state to another can affect the valuation of that change: the value of improving from X to Y and then from Y to Z could be different from the value of a direct improvement from X to Z. In fact, if the valuations of changes are always independent of the pathway, then the approach is equivalent to valuing the health states themselves.
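The elicitation burden of the three approaches can be compared with a short calculation. The sketch counts N valuations for health states, N × (N − 1) for ordered changes between distinct states, and N to the power T for complete profiles over T periods. The function names and the choice of T = 5 in the example are assumptions made for illustration.

```python
def state_valuations(n_states: int) -> int:
    """Conventional approach: one valuation per health state."""
    return n_states


def change_valuations(n_states: int) -> int:
    """Change-based approach: one valuation per ordered pair of distinct states."""
    return n_states * (n_states - 1)


def profile_valuations(n_states: int, n_periods: int) -> int:
    """Profile approach: one valuation per sequence of states over all periods."""
    return n_states ** n_periods


# The EQ-5D example from the text, with a hypothetical horizon of 5 periods.
n = 243
counts = {
    "states": state_valuations(n),
    "changes": change_valuations(n),
    "profiles": profile_valuations(n, 5),
}
```

Even a short horizon makes the profile count far larger than the count of changes, which in turn far exceeds the number of states.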
QALYs from First Principles
To conclude this article, and to serve as a springboard for the other articles in this Special Issue, we identify nine assumptions that underlie the conventional QALY approach as used in societal resource allocation decisions (Table 2).
Table 2. Underlying assumptions of the conventional QALY approach
|1. A resource-allocation decision must be made.|
|2. The outcomes of the alternatives can be specified in terms of health states, changes, and durations.|
|3. Resources are limited, and each alternative has resource implications (costs).|
|4. A major objective of the decision-maker is to maximize health of the population, subject to resource constraints.|
|5. Health is defined as value-weighted time (QALYs) over the relevant time horizon.|
|6. Value is measured in terms of preference (desirability).|
|7. Each individual is risk neutral with respect to longevity and has utility that is additive across time.|
|8. Value scores (preferences) measured across individuals can be aggregated and used for the group.|
|9. QALYs can be aggregated across individuals; i.e., a QALY is a QALY regardless of who gains/loses it.|
First, a resource allocation decision has to be made.
Second, the health-related consequences of the alternatives can be specified in terms of health states, changes in health states, and durations of health states over time. All nonhealth consequences are either measured as economic costs and included in the numerator of the cost-effectiveness ratio, or are omitted from quantitative consideration as part of the cost-effectiveness analysis.
Third, resources are limited, and each alternative has an impact on the resources available, i.e., an opportunity cost.
Fourth, a major objective of the decision-maker is to maximize the health of the population subject to resource constraints.
Fifth, health is defined as value-weighted time, over the relevant time horizon.
Sixth, value is measured in terms of preference or desirability.
These six premises do not restrict the method used to specify or value the health outcomes used in QALYs. Next, we turn to some more restrictive (and controversial) assumptions that relate specifically to the conventional QALY concept.
Seventh, each individual is risk neutral with respect to longevity and has utility that is additive over time. Risk neutrality is needed to justify the calculation of quality-adjusted life expectancy, that is, the average value of the possible numbers of QALYs, each weighted by its probability of occurring. Additivity over time, discussed previously, is the assumption that allows us to focus on valuing health states at points in time, without regard to their duration or sequence. These are very strong assumptions about preference that undoubtedly simplify reality, but they are necessary in order for QALYs to represent an individual's utility function for health over time. To say that the empirical evidence is mixed as to whether those assumptions provide a serviceable approximation to reality is probably generous for QALYs. For the most part, the evidence is that most people are probably risk averse with respect to their own longevity (although as societal agents, they may be less so), and there is substantial evidence that additivity over time may or may not hold. For example, there is evidence that people can live with a health problem for a short time, but that its perceived impact on health is more severe the longer they have it. This phenomenon has been called “maximum endurable time”. On the other hand, there is also evidence that people can adapt to adverse health conditions. Either of these behaviors would violate the assumption of additivity of utility or disutility over time, albeit in opposite ways.
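Quality-adjusted life expectancy, as described above, is a probability-weighted average of possible QALY totals. The sketch below computes it for a discrete set of outcomes; the function name, input layout, and example gamble are hypothetical.

```python
from typing import List, Tuple


def quality_adjusted_life_expectancy(outcomes: List[Tuple[float, float]]) -> float:
    """Average the possible QALY totals, each weighted by its probability.

    outcomes is a list of (probability, qalys) pairs whose probabilities sum to 1.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if any(p < 0 for p, _ in outcomes):
        raise ValueError("probabilities must be non-negative")
    return sum(p * q for p, q in outcomes)


# Hypothetical gamble: equal chances of 10 QALYs or 20 QALYs.
gamble = [(0.5, 10.0), (0.5, 20.0)]
qale = quality_adjusted_life_expectancy(gamble)
```

Under risk neutrality an individual is indifferent between this gamble and 15 QALYs for certain. A risk-averse individual would prefer the certain 15 QALYs, which is why the expectation can misstate preferences when the assumption fails.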
Eighth, the value scores or preferences measured across individuals can be aggregated and used for the group. Finally, the QALYs calculated using the aggregated preference weights can themselves be aggregated across individuals.
Other Issues Surrounding QALYs: Discounting and Equity Weighting
QALYs as the valued outcome for purposes of societal resource allocation decision-making in health, as in other areas of policy, should reflect positive time preference or discounting. Discounting of both costs and QALYs, and at the same rate, is now the conventional assumption and is recommended by both the US Panel and by NICE. Despite the convergence of views in these two countries, there remain some controversies around discounting. In some countries, such as The Netherlands, the prevailing view is that costs should be discounted at a higher rate than QALYs. There is also the position that QALYs should not be discounted at all if risk neutrality holds. The argument for discounting QALYs is driven largely by the opportunity cost argument, as espoused by the US Panel, and yet a key assumption is that people as individuals are risk neutral and that QALYs are valued equally over time. This continues to be a dilemma that remains to be fully resolved.
Issues of equity and fairness are not incorporated quantitatively into the conventional QALY approach, beyond the basic assumption that each QALY across individuals gets equal weight. This does not mean that these issues are not important, but it does mean that they should be weighed by the decision-maker as additional considerations alongside aggregate QALY gains. Aggregate health gains, measured by conventional QALYs, are one of many inputs to the processes of individual clinical decision-making, societal or programmatic audit, or resource allocation. The other considerations, including equity and fairness, need to be considered separately in the conventional QALY approach. Examples of these aspects include age of the target population, the baseline health status of the target population, and, perhaps, the principle that there is more value in raising the minimum health in the population than in increasing average health by further improving the health of more healthy people. Political considerations may also compete with aggregate health improvement for a decision-maker's priority at the population level. There may be a felt need to give more priority to orphan diseases, to curing identifiable patients compared to preventing statistical disease, or to caring for the very youngest and oldest segments of the population because they are regarded as most vulnerable.
QALYs have made an important contribution to decision-making within the health field. Within the qualifications noted above, the conventional QALY remains a powerful conceptual tool that we believe can lead to improved decision-making. QALYs help to make choices, but there are many other dimensions to decision-making within the health-care arena that are not covered by the conventional QALY; indeed, conventional QALYs are not intended to incorporate all concerns of decision-makers. Given the assumptions and the difficulties posed in measurement, it is important to maintain caution in the use of the QALY. It is nonetheless our view that the conventional QALY retains an important role in health-care decision-making.
We gratefully acknowledge the contributions to the content of this article gained through discussions with Norman Daniels, Mark Kamlet, and Erik Nord as members of the working group on theoretical foundations of QALYs.
Source of financial support: Funding for the ISPOR “Building a Pragmatic Road: Moving the QALY Forward” Consensus Development Workshop was made possible in part by grant 1R13 HS016841-01 from the Agency for Healthcare Research and Quality. The views expressed in written conference materials or publications and by speakers and moderators do not necessarily reflect the official policies of the Department of Health and Human Services; nor does mention of trade names, commercial practices, or organizations imply endorsement by the US government. Funding for this Value in Health Special Issue, “Moving the QALY Forward: Building a Pragmatic Road” was made possible in part by Contract No. HHSN261200800148P from the National Cancer Institute.
Milton Weinstein, George Torrance, and Alistair McGuire have no conflicts to declare.