The Value of Correcting Values: Influence and Importance of Correcting TTO Scores for Time Preference

Authors


Arthur E. Attema, iBMG/iMTA, Erasmus University, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands. E-mail: attema@bmg.eur.nl, brouwer@bmg.eur.nl

ABSTRACT

Objectives:  Quality-adjusted life year (QALY) values measured by means of the time tradeoff (TTO) method are often used in economic evaluations. However, these values are mostly not corrected for time preference, i.e., a lower valuation being attached to later life years than to earlier life years, and therefore may underestimate the true QALY weights. Moreover, the magnitude of the underestimation depends on the severity of the health state and the horizon chosen in the TTO method. Hence, we cannot just add a constant component to all existing QALY tariffs. In this study, we estimated the value of correcting TTO scores. We showed the possible consequences for health policymaking when we correct TTO scores for time preference, thereby taking into account severity and horizon.

Methods:  We employed the results obtained using a nonparametric time preference elicitation method. We made use of experimental time preference data, in order to better represent individuals' time preferences.

Results:  Our results demonstrate that correcting for time preference does not result in one clear influence on the QALY gains from health changes. When considering these changes in the context of cost–utility ratios, the proportional change in the QALY gain is crucial.

Conclusions:  Correcting TTO scores has a moderate, yet nonnegligible influence on outcomes. We conclude that correcting TTO scores for time preference is feasible and influential, so that there can be a substantial value of correcting values for time preference.

Introduction

Many economic evaluations in health care use quality-adjusted life year (QALY) weights that have been elicited by the time tradeoff (TTO) method to measure effects [1–3]. The TTO method determines the utility (on a scale from 0, i.e., death, to 1, i.e., full health) associated with being in some imperfect health state by eliciting the indifference between two distinct health streams: one living in a better condition yet for a shorter period of time and the other living in a worse condition for a longer period of time. Thus, the individual needs to trade off life years against health improvements. A typical TTO exercise will ask a respondent to assume living 10 more years in a particular health state X, after which death follows. The alternative is to be completely healthy again, but to live less than 10 years. The individual's task is then to indicate how many life years she would be prepared to give up to regain full health. If the person, for instance, indicates that she considers 7 years in full health to be equivalent to living 10 years in state X (with the utility of full health set equal to 1), the QALY weight of X is then normally simply calculated as 7/10 = 0.7. A well-known problem with directly applying these crude results obtained through a typical TTO is that this procedure assumes that respondents attach equal weight to all future life years. This is highly unlikely, because people normally exhibit (strong) time preference, giving more weight to current events and less to future ones [4,5]. Time preference distorts the values obtained through a TTO [6–10], yet this distortion is commonly left uncorrected, even in national tariffs. This raises the issue of the size of the resulting bias and the effect this has on health state valuations and economic evaluations.

In a simulation, Dolan and Jones-Lee [10] showed the potential consequences of time preference for health state utilities derived by the TTO method. In particular, they pointed out that the adjustment depends on the score of the health state, with utilities around 0.5 requiring the greatest adjustment, whereas very poor (close to 0) and very good (close to 1) health states need least adjustment. These simulations, however, assumed the constant discounting model, and did not empirically estimate discount rates. Because it is questionable whether the constant discounting model is descriptively valid [11–13] and because the exact discount rates used in simulations were not derived from empirical studies, the question of what exactly is the impact of time preference on TTO scores remains unanswered. Moreover, the implications of (not) correcting for time preference for the practice of economic evaluations and health policy have largely remained unaddressed.

This article aimed to fill this gap by investigating the consequences for policy recommendations of correcting for time preference, while using experimentally elicited time preference estimates and not assuming any particular parametric time preference model. The correction factors (CFs) for several different health states are calculated, as well as for several different gauge durations which may be used as stimulus in TTO exercises. The remainder of this article is structured as follows. First, we explain the QALY model and the TTO method in the Methods Section. There, we also show how TTO scores can be corrected for time preference. The results of our simulation exercises using the empirically obtained time preference data are presented in the Results Section, where we also highlight the practical consequences of (not) correcting for time preference. Finally, the Conclusions Section discusses the results and provides the main messages of our study.

Methods

The QALY model is a common way to describe preferences over health streams. Let h = (hj, . . . , hT) denote a health profile where ht denotes the health state in period t = j, . . . , T, where T is the decision-maker's final period of life. A constant health profile h = (hj = α, . . . , hT = α) is indicated as health profile α with duration nα. Further, v(ht) is a value function that represents the individual's preferences over health quality and δ(t) denotes the corresponding weight attached to the value in this period. It can then be shown that, under some reasonable assumptions, h is weakly preferred to h′ if and only if $\sum_{t=j}^{T} \delta(t)\,v(h_t) \geq \sum_{t=j}^{T} \delta(t)\,v(h'_t)$ [14]. We call $V(h) = \sum_{t=j}^{T} \delta(t)\,v(h_t)$ the general QALY model and assume that health profiles are evaluated by this function. We term the function $W(T) = \sum_{t=1}^{T} \delta(t)$ the utility of life duration for the period between t = 1 and t = T. A concave utility function for life duration is considered equivalent to (positive) time preference.

In the TTO method, the value of a health state β can be elicited by asking the subject to give some period nFH in full health (FH), followed by death (D), which makes her indifferent to a stated period in health state β (nβ), also followed by death. That is, the indifference relation $(h_1 = FH, \ldots, h_{n_{FH}} = FH, h_{n_{FH}+1} = D, \ldots, h_{n_\beta} = D) \sim (h_1 = \beta, \ldots, h_{n_\beta} = \beta)$ is obtained. Under the general QALY model, this indifference can be represented by the following equation:

$$W(n_{FH})\,v(FH) + \left[W(n_\beta) - W(n_{FH})\right] v(D) = W(n_\beta)\,v(\beta) \qquad (1)$$

If the value function over health is normalized so that v(FH) = 1 and v(D) = 0, this simplifies to:

$$W(n_{FH}) = W(n_\beta)\,v(\beta) \qquad (2)$$

That is, the value of health state β is given by:

$$v(\beta) = \frac{W(n_{FH})}{W(n_\beta)} \qquad (3)$$

Users of the TTO method, however, often assume linear utility of life duration, so that W(T) = T or δ(t) = 1 for each t. The model then simplifies to the linear QALY model, where equal weight is attached to all health state values regardless of their timing. We get the following simple expression for v(β), which we denote a TTO score:

$$v(\beta) = \frac{n_{FH}}{n_\beta} \qquad (4)$$

In case of time preference, W(t) is increasing at a decreasing rate with t, causing v(β) to be higher in Equation 3 than in Equation 4. In order to get an estimate that incorporates the individual's time preference, there are essentially two possibilities. First, we can elicit the function W(t) for each respondent within the TTO exercise itself and use this to compute Equation 3 with the values of the TTO method (i.e., nβ and nFH). Attema and Brouwer [5] discuss in detail how this can be done using a risk-free elicitation method. However, once the time preference of a relevant population has been determined (e.g., in a previous study), an alternative may be to use a set of CFs which are representative for the considered population. These CFs can subsequently be used to correct the raw TTO scores (Equation 4) upwards. That is, having some estimate of W(nFH) and W(nβ) in Equation 3, we can compute a (health state-specific) CF, CF = W(nFH)/W(nβ) − nFH/nβ, that links Equation 3 to Equation 4. This allows one to perform an ordinary TTO exercise and infer the corrected TTO score by computing $v(\beta) = n_{FH}/n_\beta + CF$. In this article, we first briefly show how the function W(t) can reliably be elicited and then demonstrate how such an elicitation can subsequently be applied to compute CFs.
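The claim that the correction is upward can be made explicit with a short derivation. This is a sketch under two assumptions that go slightly beyond the text: duration is treated as continuous, and W is concave with the normalization W(0) = 0 (the normalization is the one used later in the elicitation procedure).

```latex
% Assumptions: W concave on [0, n_\beta], W(0) = 0, 0 \le n_{FH} \le n_\beta, W(n_\beta) > 0.
% Write \lambda = n_{FH}/n_\beta \in [0, 1], so that n_{FH} = \lambda n_\beta + (1-\lambda)\cdot 0.
\begin{align*}
W(n_{FH}) &= W\bigl(\lambda n_\beta + (1-\lambda)\cdot 0\bigr) \\
          &\ge \lambda\, W(n_\beta) + (1-\lambda)\, W(0) && \text{(concavity)} \\
          &= \frac{n_{FH}}{n_\beta}\, W(n_\beta) && \text{(since } W(0) = 0\text{)} \\[4pt]
\Longrightarrow\quad
CF &= \frac{W(n_{FH})}{W(n_\beta)} - \frac{n_{FH}}{n_\beta} \;\ge\; 0 .
\end{align*}
% Equality holds at \lambda = 0 and \lambda = 1, i.e., for raw TTO scores of 0 and 1,
% and everywhere if W is linear (no time preference).
```

Under these assumptions the CF is never negative, and it vanishes at the two extremes of the scale, which is why no single constant can be added to all raw scores.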

In order to estimate W(t), we use the direct method of Attema et al. [15]. This method elicits the utility of life duration function W(t) in a risk-free manner. The advantages of this method are that it is not distorted by probability weighting, that it avoids the inclusion of the problematic outcome of immediate death, that it does not make parametric assumptions, and that it appears to be more feasible for respondents than alternative methods [15]. The correction procedure by means of the obtained utility estimate is straightforward and is explained by Attema and Brouwer [5]. We briefly describe the main steps here. Suppose a subject has declared to be indifferent between 7 years in full health and 10 years in an impaired health state. As indicated in the introduction, normally, a raw TTO score of 7/10 = 0.7 is then inferred. When invoking the correction procedure, we first have to compute the utilities of 7 and 10 years. To this end, we need an elicitation of W(t) for a horizon encompassing at least the considered TTO horizon of 10 years. Suppose we take a horizon of T = 50 years. Then, after setting W(50) = 1 and W(0) = 0 without loss of generality, we can measure the number of years t such that W(t) = 0.5, 0.25, 0.75, 0.125, etc. (For instance, W(t) = 0.5 implies eliciting that particular point t where the weight attached to the years up to t equals the weight attached to the years after t. So if, in this case, t = 10, this means that the individual attaches as much weight to the first 10 years as to the remaining 40.) Now, if we have, for example, elicited W(5) = 0.125 and W(12) = 0.25 in the subject, we are able to estimate W(7) and W(10) by means of linear interpolation. That is, W(7) = 0.125 + (7 − 5)/(12 − 5)*(0.25 − 0.125) = 0.161 and W(10) = 0.125 + (10 − 5)/(12 − 5)*(0.25 − 0.125) = 0.214, so that the corrected TTO score is equal to W(7)/W(10) = 0.161/0.214 = 0.752, leading to a CF = 0.752 − 0.7 = 0.052 for this health state. If this CF is assumed to be applicable in another population, it can subsequently be applied directly to the crude TTO scores.
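The interpolation step in this worked example can be sketched in a few lines of Python. The function name `interpolate_w` and the data layout are illustrative choices, not part of the original method description. Note that the sketch carries full precision through the division, which gives a ratio of exactly 0.750; the 0.752 quoted in the text results from dividing the rounded intermediates 0.161 and 0.214.

```python
def interpolate_w(t, points):
    """Linearly interpolate W(t) between elicited (years, utility) points.

    `points` is a list of (t, W(t)) pairs; this helper is a hypothetical
    illustration of the interpolation described in the text.
    """
    pts = sorted(points)
    for (t0, w0), (t1, w1) in zip(pts, pts[1:]):
        if t0 <= t <= t1:
            return w0 + (t - t0) / (t1 - t0) * (w1 - w0)
    raise ValueError("t lies outside the elicited range")


# Elicited points from the worked example: W(5) = 0.125 and W(12) = 0.25.
elicited = [(5, 0.125), (12, 0.25)]

w7 = interpolate_w(7, elicited)    # utility of 7 years
w10 = interpolate_w(10, elicited)  # utility of 10 years

raw_score = 7 / 10                 # uncorrected TTO score (Equation 4)
corrected_score = w7 / w10         # corrected TTO score (Equation 3)
cf = corrected_score - raw_score   # correction factor

print(f"W(7) = {w7:.3f}, W(10) = {w10:.3f}")
print(f"corrected = {corrected_score:.3f}, CF = {cf:.3f}")
```

The difference between 0.750 and 0.752 is small, but it suggests carrying unrounded W values through the ratio when computing CFs in practice.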

Given the feasibility of correcting for time preference, especially in the context of calculating national tariffs, it seems that the value of correcting values outweighs its cost. In smaller studies, rather than repeatedly determining one's own CFs, using predefined CFs may be a feasible alternative. In any case, not correcting TTO scores for time preference seems a less attractive option.

In what follows, we will make use of the data elicited in an experiment by Attema and Brouwer [16]. The information about the utility of life duration was used there to correct TTO scores for a particular health state for time preference. This article uses their median time preference estimates for the correction of several health states. Table 1 shows these estimates: the median values of t elicited for the fixed values of W(t), expressed relative to the maximum duration (T = 50). Note that a limitation of the study of Attema and Brouwer [16] is that they only had data on a sample of university students. Therefore, the goal of this article was not to provide a full set of representative CFs, but to demonstrate the impact of correcting TTO scores for time preference. In order to estimate CFs that can be broadly applied (e.g., to correct national QALY tariffs), an elicitation of these factors among a representative sample of a nation is called for.

Table 1.  Median elicited relative durations (t) for given utility of life duration [W(t)]

W(t)    0.125    0.25    0.5     0.75    0.875
t       0.08     0.17    0.38    0.61    0.77

We simulate the influence of correcting for time preference on QALY weights by considering nine different health states, varying from very poor to very mild. These severities are captured by considering nine different uncorrected TTO scores: 0.1, 0.2, . . . , 0.9. In addition, we investigate three different gauge durations used in the measurement process, i.e., 10, 20 and 40 years, to demonstrate the CFs for TTO exercises using a time horizon exceeding 10 years.
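The simulation can be sketched as follows. This is a reconstruction under stated assumptions rather than the authors' own code: it assumes linear interpolation between the Table 1 medians, adds the normalization points (0, 0) and (1, 1), and converts gauge durations to relative durations by dividing by T = 50. Under these assumptions the sketch appears to reproduce the corrected scores reported in Table 2 to three decimals.

```python
# Table 1 medians as (relative duration t/T, W) pairs. The endpoints
# (0, 0) and (1, 1) are the normalization and are added as an assumption.
ANCHORS = [(0.0, 0.0), (0.08, 0.125), (0.17, 0.25), (0.38, 0.5),
           (0.61, 0.75), (0.77, 0.875), (1.0, 1.0)]
MAX_HORIZON = 50  # years; the T used in the elicitation


def w(rel_t):
    """Piecewise-linear utility of life duration at relative duration rel_t."""
    for (t0, w0), (t1, w1) in zip(ANCHORS, ANCHORS[1:]):
        if t0 <= rel_t <= t1:
            return w0 + (rel_t - t0) / (t1 - t0) * (w1 - w0)
    raise ValueError("relative duration must lie in [0, 1]")


def corrected_score(raw_score, gauge_years):
    """Corrected TTO score W(n_FH) / W(n_beta) for a raw score and gauge duration."""
    n_beta = gauge_years / MAX_HORIZON   # gauge duration, relative to T
    n_fh = raw_score * n_beta            # years in full health, relative to T
    return w(n_fh) / w(n_beta)


def correction_factor(raw_score, gauge_years):
    """CF = corrected score minus raw score."""
    return corrected_score(raw_score, gauge_years) - raw_score


if __name__ == "__main__":
    for gauge in (10, 20, 40):
        for k in range(1, 10):
            raw = k / 10
            cf = correction_factor(raw, gauge)
            print(f"{gauge:>2} y  raw {raw:.1f}  corrected {raw + cf:.3f}  "
                  f"CF {cf:.3f}  relative {100 * cf / raw:.0f}%")
```

Swapping `ANCHORS` for medians elicited in a representative sample would yield a different CF set without changing the rest of the computation.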

Results

Table 2 and Figure 1 show the resulting CFs. They vary between 0.01 for the most severe and mildest health states when using the 10-year horizon, and 0.09 for the middle health state (i.e., with a raw TTO score of 0.5) when using the 40-year horizon. Our results deviate somewhat from those of Dolan and Jones-Lee [10] in that our CFs for the smaller horizons, i.e., 10 or 20 years, tend to be the highest around 0.4 instead of 0.5. This is caused by a slope of the discount function that deviates from the exponential function, which emphasizes the importance of using empirically derived estimates for time preference.

Table 2.  Median correction factors

Horizon (years)   Raw TTO score   Median corrected TTO score   Difference (CF)   Relative difference (%)
10                0.1             0.109                        0.009              9
10                0.2             0.219                        0.019              9
10                0.3             0.328                        0.028              9
10                0.4             0.438                        0.038              9
10                0.5             0.535                        0.035              7
10                0.6             0.632                        0.032              5
10                0.7             0.729                        0.029              4
10                0.8             0.826                        0.026              3
10                0.9             0.917                        0.017              2
20                0.1             0.120                        0.020             20
20                0.2             0.240                        0.040             20
20                0.3             0.346                        0.046             15
20                0.4             0.453                        0.053             13
20                0.5             0.548                        0.048             10
20                0.6             0.639                        0.039              6
20                0.7             0.730                        0.030              4
20                0.8             0.821                        0.021              3
20                0.9             0.913                        0.013              1
40                0.1             0.140                        0.040             40
40                0.2             0.265                        0.065             32
40                0.3             0.374                        0.074             25
40                0.4             0.481                        0.081             20
40                0.5             0.585                        0.085             17
40                0.6             0.683                        0.083             14
40                0.7             0.780                        0.080             11
40                0.8             0.868                        0.068              8
40                0.9             0.938                        0.038              4

TTO, time tradeoff; CF, correction factor.

Figure 1.  Correction factors for 10-, 20-, and 40-year time tradeoffs.

These results indicate the general tendency of a correction for time preference. First of all, the correction has no effect for TTO scores of 0 and 1. For all other health states, the correction increases the utility score, albeit at different rates. In terms of health changes, which, in the context of economic evaluation, are the outcomes of interest, the direction of the influence of the correction for time preference is less clear. Assume a change from inferior health state A to superior health state B. Using uncorrected TTO scores, the change from A to B would be calculated in QALYs as $TTO_B - TTO_A$, whereas after correction, this becomes $(TTO_B + CF_B) - (TTO_A + CF_A)$ or $(TTO_B - TTO_A) + (CF_B - CF_A)$. How the difference score, i.e., the QALY gain, changes after correction therefore crucially depends on the size of the CF for health state B relative to that for health state A. It is clear that the CFs for death and full health are, by definition, equal to zero. Therefore, correcting for time preference has no implications for treatments that save people from dying and restore full health. All other CFs for health states in between death and full health are positive. This implies that if health state A is death (and therefore CFA = 0) and health state B is not full health, so that CFB > 0, the QALY gain increases with CFB. As can be easily seen in Figure 1, the correction is most favorable for treatments that rescue people from death to a health state with a raw value of around 0.4 (obtained through a 10-year TTO). Conversely, if health state B is full health and health state A is not ‘death’, the correction for time preference results in a reduced QALY gain (i.e., with CFA > 0 and CFB = 0). Again, the largest impact on the gain is when the uncorrected TTO score of health state A is around 0.4 in a 10-year TTO.

Therefore, the fact that all QALY weights but the extremes increase when correcting for time preference does not necessarily imply that all health-care interventions become more cost-effective because of corrections. In fact, this holds as a rule only for lifesaving programs that do not return someone to full health, because a positive CF is added to the gain. Conversely, the QALY gain of treatments bringing someone from an impaired health state with a positive QALY weight to full health will unequivocally be reduced, because a positive CF is subtracted from the gain.

For all other treatments, both the QALY weight of the starting health state and the QALY weight of the end health state will increase, so that a priori no prediction on the sign of the change in effects can be made. That is, CFB – CFA may be either positive or negative. Looking at Figure 1, taking the 10-year TTO as an example, one can easily derive for which treatments the difference between CFB and CFA will be positive (i.e., when the point on the correction curve for the initial health state is lower than for the end state). For treatments of health states in the neighborhood of the maximal CF (i.e., the top of the graphs in Fig. 1), the graph is relatively flat, so for treatments that bring a patient from a state in that region to a better state close to the original state (say, from 0.4 to 0.5), the correction does not make much of a difference. Similarly, the graphs clearly show other treatments that are not influenced by correction for time preference, for example treatments improving the health state from 0.3 to 0.7 for a 10-year gauge duration, and from 0.2 to 0.6 for a 20-year gauge duration.
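The sign of the net correction can be checked directly against the tabulated CFs. The sketch below uses the 10-year CFs from Table 2, with zero CFs for death (0) and full health (1); the function and variable names are illustrative only.

```python
# 10-year correction factors from Table 2, keyed by raw TTO score.
# The entries for 0.0 (death) and 1.0 (full health) are zero by definition.
CF_10_YEAR = {0.0: 0.0, 0.1: 0.009, 0.2: 0.019, 0.3: 0.028, 0.4: 0.038,
              0.5: 0.035, 0.6: 0.032, 0.7: 0.029, 0.8: 0.026, 0.9: 0.017,
              1.0: 0.0}


def net_cf(raw_a, raw_b, cf_table=CF_10_YEAR):
    """Net correction CF_B - CF_A for a move from state A to state B."""
    return cf_table[raw_b] - cf_table[raw_a]


def corrected_gain(raw_a, raw_b, cf_table=CF_10_YEAR):
    """Corrected QALY gain: (raw_B - raw_A) + (CF_B - CF_A)."""
    return (raw_b - raw_a) + net_cf(raw_a, raw_b, cf_table)


if __name__ == "__main__":
    for a, b in [(0.0, 0.4), (0.4, 1.0), (0.4, 0.5), (0.3, 0.7)]:
        print(f"{a:.1f} -> {b:.1f}: net CF {net_cf(a, b):+.3f}, "
              f"corrected gain {corrected_gain(a, b):.3f}")
```

With these values, saving someone from death to a state valued at 0.4 gains 0.038, restoring full health from 0.4 loses 0.038, and the move from 0.3 to 0.7 is almost unaffected (net CF of 0.001).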

It is important to realize that the implications of these CFs also depend on the policy context. Because all CFs are positive, correcting for time preference obviously results in higher valuations of all health states except the extremes. While this is important in its own right, the consequences hereof for policymaking are perhaps even more informative. One first consequence is that the value of a typical life (i.e., the summation of QALYs over lifetime) will certainly increase, justifying more resources to be spent on health. Below, we highlight the importance of correcting for time preference using three distinct yet general examples in the context of economic evaluations: 1) demonstrating the impact of correcting values on establishing the QALY gain of some treatment; 2) demonstrating the impact of correcting when different time horizons were used to obtain the QALY weights for the relevant health states; and 3) demonstrating the monetary value of the correction when a health gain is judged against a fixed threshold.

Example 1—Comparing Treatments before and after Correcting for Time Preference

Normally, in an economic evaluation, a difference score between two health states (e.g., before and after treatment or old vs. new treatment) is calculated. When taking into account time preference, first, the differential impact for different health states is of interest. Suppose, for example, that we can choose from three available treatments, all costing the same, i.e., €1000. The effects are measured by uncorrected 10-year horizon TTO scores. According to these numbers, one treatment causes an improvement in health from 0 to 0.1 [cost–utility ratio (CUR) = €10,000/QALY], another from 0.4 to 0.5 (CUR = €10,000/QALY), and a third from 0.9 to 1 (CUR = €10,000/QALY). That is, ignoring equity and other concerns, all treatments at first glance have the same effects for the same costs. Therefore, they would receive equal priority. If we now consider the corrected scores, we get a different picture. The first treatment increases health from 0 to 0.109, i.e., a positive CF of 0.009 is added to the end state, resulting in a CUR of €9174/QALY. The second increases health by 0.097, from 0.438 to 0.535, so that the CFs result in a net reduction of the QALY gain by 0.003, leading to a corrected CUR of €10,309/QALY. Finally, the third treatment increases health from 0.917 to 1. That is, because of the CF of 0.017 for the initial health state, the utility gain drops to 0.083, leading to a corrected CUR of €12,048/QALY. The first treatment has now become the most cost-effective treatment, whereas the third treatment has become the least cost-effective.
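The arithmetic of this example can be retraced with a short sketch. The treatment labels and the helper `cost_utility_ratio` are illustrative names; the scores are the raw and corrected 10-year values from Table 2, and each gain is assumed to last 1 year so that the score difference equals the QALY gain.

```python
COST = 1000.0  # euros per treatment, as in the example

# (raw start, raw end, corrected start, corrected end), 10-year TTO scores.
TREATMENTS = {
    "first":  (0.0, 0.1, 0.0,   0.109),
    "second": (0.4, 0.5, 0.438, 0.535),
    "third":  (0.9, 1.0, 0.917, 1.0),
}


def cost_utility_ratio(cost, start, end):
    """Cost per QALY gained for a move from `start` to `end` lasting 1 year."""
    return cost / (end - start)


raw_curs = {name: cost_utility_ratio(COST, s, e)
            for name, (s, e, _, _) in TREATMENTS.items()}
corrected_curs = {name: cost_utility_ratio(COST, cs, ce)
                  for name, (_, _, cs, ce) in TREATMENTS.items()}

for name in TREATMENTS:
    print(f"{name}: raw CUR {raw_curs[name]:,.0f}, "
          f"corrected CUR {corrected_curs[name]:,.0f} euros/QALY")
```

Sorting `corrected_curs` by value gives the priority ordering described in the text, with the first treatment ranked most cost-effective and the third least.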

Example 2—CURs

Suppose that we have to invest in one of two treatments. Both treatments save people's lives and return them to some chronic health state for 1 year, after which they die. One treatment leaves patients in a very poor state, the QALY weight of which has been elicited by means of a 20-year TTO, resulting in a raw TTO score of 0.1. The treatment costs €5000. The alternative treatment returns the patient for 1 year to a medium health state that is valued at 0.5 with the aid of a 10-year TTO. This treatment has a cost of €25,000. Both treatments therefore have a CUR of €50,000 per QALY when not correcting for time preference. (Note that the choice of a specific time horizon in a TTO matters: an uncorrected TTO score of 0.5 obtained through a 10-year TTO implies a different utility value than one obtained through a 20-year TTO.) If we abstract from issues such as budget constraints and equity concerns, both treatments receive equal priority. If we correct for time preference, however, this equivalence no longer holds. The corrected QALY weight for the severe health state increases to 0.12, whereas the corrected QALY weight for the medium health state is equal to 0.535. The CURs now decrease to €41,667 and €46,729, respectively. Therefore, the treatment of the poor health state clearly has become more cost-effective.

This example highlights that, although the correction for the medium health state is larger in absolute terms (i.e., 0.035 vs. 0.02), it does not necessarily mean that the related CUR will have the strongest improvement as well. Instead, the proportional CFs then become important. As shown in the last column of Table 2, these are higher for uncorrected TTO scores of 0.1 than for uncorrected TTO scores of 0.5. Actually, they decrease monotonically with uncorrected TTO scores. While it may be counterintuitive that absolute differences are less important, this may be easily explained with the above examples and a fixed budget of €50,000. With this budget, ten people can be treated with the first treatment (restoring health to 0.1) while two can be treated with the second treatment restoring health to 0.5. Without correction for time preference, opting for either choice results in 1 QALY gained (10 times 0.1 or 2 times 0.5). After correction, however, the choice is between gaining 1.2 QALYs or 1.07 QALYs (10 times 0.12 or 2 times 0.535).
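The fixed-budget comparison can be sketched as follows. The dictionary layout and the function name `qalys_from_budget` are illustrative; the costs, weights and the €50,000 budget are those of the example, and each treated patient is assumed to live 1 year in the resulting state.

```python
BUDGET = 50_000  # euros

# Cost per patient with raw and corrected QALY weights (from Table 2).
OPTIONS = {
    "poor state (20-year TTO)":   {"cost": 5_000,  "raw": 0.1, "corrected": 0.120},
    "medium state (10-year TTO)": {"cost": 25_000, "raw": 0.5, "corrected": 0.535},
}


def qalys_from_budget(budget, cost, weight, years=1):
    """Total QALYs when the whole budget is spent on one treatment."""
    patients = budget // cost  # whole patients only
    return patients * weight * years


for name, option in OPTIONS.items():
    raw_total = qalys_from_budget(BUDGET, option["cost"], option["raw"])
    corrected_total = qalys_from_budget(BUDGET, option["cost"], option["corrected"])
    corrected_cur = option["cost"] / option["corrected"]
    print(f"{name}: {raw_total:.2f} QALYs raw, {corrected_total:.2f} corrected, "
          f"corrected CUR {corrected_cur:,.0f} euros/QALY")
```

Because the number of patients treated scales inversely with cost, the total gain depends on the relative CF (20% versus 7%) rather than on the absolute one.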

Example 3—Monetary Impact

It may also be illustrative to demonstrate the monetary value of correcting for time preference in the valuation of effects. Suppose the threshold value for a gained QALY is set at €50,000. Then, if a patient gets a treatment that saves her from death and brings her, for 1 year, to a health state that was valued at 0.4 with a 10-year TTO, this life extension may cost no more than 0.4375*€50,000 = €21,875 in order to be deemed cost-effective when using corrected TTO scores, whereas it may cost no more than 0.4*€50,000 = €20,000 when not correcting for time preference. That is, there is an increase in produced value of €1875, meaning that the treatment is allowed to cost €1875 more when judged against a fixed threshold of €50,000. Correction for time preference may therefore have an impact on the final decision made when judging some intervention against a fixed threshold, and may prevent cost-effective programs from being wrongfully rejected as well as cost-ineffective programs from being wrongfully accepted.

Conclusions

This article has shown the influence and importance of correcting TTO scores for time preference. Using empirically derived time preference rates and simulations, it was demonstrated that the value of correcting values can be substantial, also when considering the use of health state valuations in practice. Our results demonstrate that correcting for time preference does not result in one clear influence on the QALY gains from health improvements. While the QALY values of all health states between death and full health will increase because of correction for time preference, it depends on the CF relevant for the initial health state relative to that of the end state whether correction will result in a larger or smaller QALY gain when moving from health state A to B. Moreover, when considering these changes in the context of CURs, the proportional change in the QALY gain is crucial. It is therefore important to keep in mind the context when considering the policy implications of correcting for time preference. That is, when only focusing on the health improvement of several treatments that save lives, correction for time preference is most favorable for treatments that result in health states with the highest CFs. Life saving programs that result in poor health states have lower absolute CFs, but tend to be more favorable when considering CURs, because their relative CFs are higher.

For treatments that have a positive net CF (i.e., CFB − CFA > 0), correcting for time preference reduces the CURs, so that they may be deemed cost-effective sooner. In other words, the value of treatments increases because the treatment produces more QALYs than captured by the raw TTO score. However, because the net CF can also be negative (i.e., CFB − CFA < 0), correcting for time preference does not result in improved CURs as a rule. For example, for treatments restoring full health from some inferior health state, the net CF will always be negative and therefore the CUR of such programs will worsen because of correction. That is, the health improvement now has less value. Similarly, when looking at differences between health states, the CFs can also cancel each other out, resulting in the difference between corrected scores being almost identical to the difference in uncorrected scores.

So, what do these results imply? First of all, it matters in terms of QALY values, QALY gains and CURs whether one corrects for time preference or not. Given the clear influence time preference has on health state valuations in a TTO, and the available methods to correct for time preference (e.g., [15]), there seems little justification for not correcting for time preference. One may argue that other biases in TTO exercises push the value upward, whereas time preference pushes it downward [4], so that these biases work in opposite directions and uncorrected scores may be preferable to corrected ones. However, there is no reason to assume that the biases cancel out; indeed, given the diversity in the influence of time preference on the outcomes, it seems rather heroic to assume such a balance in biases. It seems more appropriate to avoid or correct biases than to assume that not correcting is better than correcting for them.

Second, the observation of considerably lower CFs (see Table 2) for shorter gauge durations might suggest an inclination to favor the use of shorter gauge durations over longer ones in TTO valuations. There exists an important caveat to this conclusion, though. A number of empirical studies have found that short gauge durations have some major difficulties in measuring health state utilities. In particular, because of loss aversion, individuals are not willing to sacrifice life duration to improve quality of life for short remaining lifetimes, whereas lifetime and quality of life are much better substitutes for longer gauge durations [17,18]. This finding makes clear the tradeoff between biases present for short gauge durations (such as loss aversion) and biases present for long gauge durations (such as time preference), and suggests that employing longer gauge durations while correcting for time preference should certainly not be ruled out in advance.

Third, it seems feasible to derive sets of CFs that can be used to correct existing TTO values (such as national tariffs), which would facilitate correcting for time preference in practice. We stress that the CFs we used may be seen as a first indication of reasonable CFs, but that they were not obtained in a representative sample of the population and were derived with one particular method of eliciting time preference, and therefore should be used with caution. A full set of CFs may also include CFs for health states worse than dead, which are not included or discussed in this article. The application of such factors will, however, be similar for such states.

We do stress two limitations of using average CFs and, in particular, the ones presented here:

  1. Using aggregate “community-based” CFs to correct TTO scores for time preference may overlook important individual heterogeneity. Individuals’ TTO responses are affected by their own time preferences, which normally vary substantially between individuals, with some people even expressing negative time preference [19]. Variability in TTO scores could therefore stem from either variability in time preference or variability in intrinsic health-state utility. While on average, neglecting heterogeneity in time preference may not lead to large systematic biases in the estimates of the corrected utility, this is worth investigating further. In principle, given the common use of tariffs for health states, it appears that using mean CFs to correct mean utilities across a population, while losing some heterogeneity at the individual level, may be acceptable as long as there are no systematic differences across different subgroups.
  2. The method used here to derive time preference may not be without problems. For example, the direct method may be susceptible to loss aversion, i.e., subjects may be overly reluctant to accept a decrease in health status, which would lead this method to underestimate time preference. In addition, the method elicits time weights under certainty, whereas many health-related decisions involve uncertainty. However, we used this method here to illustrate the more general point made in this article. Other time preference elicitation methods, such as the certainty equivalence method, could have been used as well. The qualitative effect of using other means to derive CFs will most likely be the same, although the quantitative impact may differ to some extent. Currently, it seems unclear which method would be preferred for this purpose. Alternative methods to the one used here also suffer from difficulties (see, e.g., [20–23]). It seems therefore that more research is required to determine how to best correct TTO scores for time preference.

Still, in spite of these limitations, we would conclude that correcting TTO scores for time preference is feasible and influential, so that there can be a substantial value of correcting values for time preference.

Source of financial support: This study was made possible through a grant from The Netherlands Organization for Health Research and Development (ZonMW), project number 80-82500-98-8215.