The value of health—Empirical issues when estimating the monetary value of a quality‐adjusted life year based on well‐being data

Abstract Decisions on interventions or policy alternatives affecting health can be informed by economic evaluations, like cost‐benefit or cost‐utility analyses. In this context, there is a need for valid estimates of the monetary equivalent value of health (gains), which are often expressed in € per quality‐adjusted life years (QALYs). Obtaining such estimates remains methodologically challenging, with a recent addition to the health economists' toolbox, which is based on well‐being data: The well‐being valuation approach. Using general population panel data from Germany, we put this approach to the test by investigating several empirical and conceptual challenges, such as the appropriate functional specification of income utility, the choice of health utility tariffs, or the health state dependence of consumption utility. Depending on specification, the bulk of estimated € per QALY values ranged from €20,000–60,000, with certain specifications leading to more considerable deviations, underlining persistent practical challenges when applying the well‐being valuation methodology to health and QALYs. Based on our findings, we formulate recommendations for future research and applications.


| Conceptual framework
We generally followed the framework proposed by Huang et al. (2018) for obtaining v_Q. In a simplified model, the subjective well-being (SWB) of individual i at time t, as a proxy for individual utility, is assumed to be a function of that individual's income and health, where W_i is a vector of the individual's well-being at all observed time points (w_it), Y_i is the vector of corresponding incomes (y_it), and H_i a vector of health states (h_it). The total well-being experienced by individual i over a time interval of length T can then be described by a simple cumulative sum of individual well-being states across time.
HIMMLER ET AL.
Within this framework, consider an individual experiencing a change to their health vector ΔH_i within the time window T. For the individual to remain on the same level of SWB, W_i, an offsetting income change ΔY_i is required. The proposed approach estimates the population average ΔY necessary to offset an imposed hypothetical health state change ΔH over T equivalent to one QALY. Therefore, ΔY is the compensating income variation (CIV) for one QALY, or CIV QALY for short. Following Huang et al. (2018), an ordinary least squares (OLS) fixed-effects regression was estimated to calculate the impact of health and income on SWB within a time window T of 2 years (t_0 and t_−1). Modelling SWB as linear despite the ordinal nature of life satisfaction is a widely used approach; see, for example, Ferrer-i-Carbonell and van Praag (2002). The underlying empirical model regresses SWB on current and lagged health and income, a set of controls, and fixed effects.
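The relations described in this section can be sketched in LaTeX as follows. This is a reconstruction from the surrounding definitions, not the typeset equations of the original: the function f, the constant α, and the coefficient vector γ are assumed notation, and the lag structure simply mirrors the two-year window described above.

```latex
\begin{align}
  % SWB of individual i at time t as a function of income and health (f is assumed)
  w_{it} &= f(y_{it}, h_{it}) \\
  % Total well-being over a window of length T
  W_i &= \sum_{t=1}^{T} w_{it} \\
  % Income change that offsets a health change, leaving total SWB unchanged
  W_i(Y_i, H_i) &= W_i(Y_i + \Delta Y_i,\; H_i + \Delta H_i) \\
  % Fixed-effects model with current and lagged health and income
  W_{irt} &= \alpha + \beta_0 H_{irt} + \beta_1 H_{ir,t-1}
             + \delta_0 Y_{irt} + \delta_1 Y_{ir,t-1}
             + X_{irt}'\gamma + \lambda_i + \mu_r + \epsilon_t + u_{irt}
\end{align}
```

The last line uses the coefficient names (β_0, β_1, δ_0, δ_1) and fixed effects (λ_i, μ_r, ϵ_t) that the text introduces for the baseline specification.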

| Baseline specification
In the empirical model, W_irt refers to the SWB of individual i living in region r at time t, measured using life satisfaction data. The individual's health status H_irt is captured by health utility values based on the short form six dimensions (SF-6D) instrument and its UK utility tariff (Brazier & Roberts, 2004). Household income is denoted by Y_irt. Lagged values of health and income were included so that the analysis is not limited to short-term one-year changes and to partly account for reverse causality. We control for a vector X_irt of other potential time-varying confounders. To account for time-invariant unobservables, we incorporated individual (λ_i), state (μ_r), and time (ϵ_t) fixed-effects; u_irt denotes the error term. Heteroscedasticity-robust standard errors were used in all estimations. In a second step, we obtained CIV QALY values by dividing the health status coefficients (β_0 and β_1) by the income coefficients (δ_0 and δ_1). The corresponding values represent the marginal rate of substitution between income and health with respect to well-being, based on the overall population average. CIV QALY thereby is the empirical conceptualisation of v_Q using the well-being valuation approach. Income outliers (as will be defined in Section 3.4.1) were dropped from the baseline analysis.
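The second step described above can be illustrated with a minimal Python sketch. It assumes that the health and income coefficients enter the ratio as sums of their current and lagged terms; the function name `civ_ratio`, the `income_unit` parameter, and all coefficient values are hypothetical and chosen only for illustration.

```python
def civ_ratio(beta0, beta1, delta0, delta1, income_unit=1.0):
    """Ratio of summed health coefficients to summed income coefficients.

    income_unit is a hypothetical rescaling into euros; the scaling used in
    the study is not stated in this text.
    """
    health_effect = beta0 + beta1    # current + lagged health coefficient
    income_effect = delta0 + delta1  # current + lagged income coefficient
    if income_effect <= 0:
        raise ValueError("summed income coefficient must be positive")
    return health_effect / income_effect * income_unit


# Hypothetical coefficients, not estimates from the study.
ols_like = civ_ratio(2.0, 1.0, 0.03, 0.02, income_unit=1000)
# Doubling the income coefficients halves the ratio.
iv_like = civ_ratio(2.0, 1.0, 0.06, 0.04, income_unit=1000)
```

Because the income coefficients sit in the denominator, any specification that raises them lowers the resulting monetary value, all else equal.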

| Instrumental variable specification
A well-documented problem of the well-being valuation approach is the endogeneity of the income coefficient estimate. This was frequently addressed using an instrumental variable (IV) (see e.g., Brown, 2015; Howley, 2017; McNamee & Mendolia, 2018). Huang et al. (2018) instrumented income with the occurrence of financial-worsening-events such as personal bankruptcy or large financial losses.
Lacking such information, we followed Luechinger (2009), who used predicted labour-market earnings based on industry-occupation cells as income instrument. The rationale is that shifts in predicted income correspond to industry and/or occupation wide trends, which correlate with the development of negotiated wages or collective wage agreements, but do not reflect individual-level effort or circumstances. Further, it is assumed that the income variance across industries and occupations captures information on the unobserved costs of income generation such as stress and/or associated health risks, and that unobserved selection effects of certain types of individuals into industries and occupations are captured in the time-invariant fixed-effects. One advantage of this instrument is that the captured income shifts have a rather permanent nature, whereas financial-worsening-events or lottery wins can be highly transitory shocks. In addition, permanent income shifts have been found to be of higher relevance for individuals' wellbeing (Bayer & Juessen, 2015; Cai & Park, 2016).
The identifying assumption is, therefore, that income variation across industries and occupations over time is uncorrelated with individual-level characteristics and especially life satisfaction, besides the effect of income changes themselves. To implement the IV approach we followed a two-stage least squares estimation procedure. In a first step we regressed the individual's labour market earnings L_irt on the individual's industry-occupation cell (I_irt and O_irt), work tenure (T_irt), and work-hours (R_irt), together with a set of industry- and year-fixed-effects. From this regression we obtained fitted values, constituting the predicted labour earnings conditional on these characteristics.
The obtained predicted labour earnings were summed on the household level and weighted by household composition to obtain the predicted household labour income L̂^HH_irt, the instrument used in the first-stage regression, from which we obtained the fitted values for individual income, Ŷ_irt. In the second stage we substituted income Y_irt by Ŷ_irt. The resulting coefficients for health (β^I_0 and β^I_1) and income (δ^I_0 and δ^I_1) were then included in Equation (6).
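The logic of the two-stage procedure can be illustrated with a deliberately small Python sketch. It is not the specification estimated here: it uses a single instrument, no controls, and no fixed effects, and the toy data (`z`, `u`, `income`, `swb`) are hypothetical. The unobserved factor `u` stands in for costs of income generation that raise income but lower well-being.

```python
def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def ols(x, y):
    """Return (intercept, slope) of a least-squares line of y on x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope


# Hypothetical toy data: z is the instrument; u is unobserved, uncorrelated
# with z, raises income and lowers well-being.
z = [0, 1, 2, 3, 4, 5]
u = [1, -2, 1, 1, -2, 1]
income = [2 + zi + ui for zi, ui in zip(z, u)]
swb = [1 + 0.5 * xi - 0.8 * ui for xi, ui in zip(income, u)]

# Naive OLS of well-being on income is biased by u.
_, ols_slope = ols(income, swb)

# First stage: predict income from the instrument.
a0, a1 = ols(z, income)
income_hat = [a0 + a1 * zi for zi in z]

# Second stage: regress well-being on predicted income.
_, iv_slope = ols(income_hat, swb)
```

In this toy example the second-stage slope recovers the income effect of 0.5 built into the data, while the naive OLS slope is smaller because `u` is left uncontrolled.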

| Treatment of outliers
Due to a right-skewed and long-tailed income distribution, with self-reported income often misreported or even exaggerated (Hariri & Lassen, 2017), income outliers may have a large effect on CIV QALY estimates when using linear models (Rousseeuw & Leroy, 1987). To identify outliers, which remains challenging for fixed-effects models (Verardi & Croux, 2009), we reformulated our base case model as a pooled OLS model and calculated DFbeta, a measure quantifying the impact that dropping an observation has on the coefficient estimate. All observations with a DFbeta larger than 1, the recommended threshold (Bollen & Jackman, 1985), were dropped from the baseline analysis. In a robustness check we repeated the calculations including these outliers.
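The outlier screen described above can be sketched in Python for a regression with a single regressor. The sketch assumes the standardized variant (often written DFBETAS), computed by refitting the model without each observation in turn, and compares its absolute value with the threshold of 1. The data are hypothetical; the last observation mimics a very high reported income paired with unremarkable life satisfaction.

```python
from math import sqrt


def fit_line(x, y):
    """Return slope, intercept, sum of squares of x, and residual sum of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope, intercept, sxx, rss


def dfbetas_slope(x, y):
    """Standardized change in the slope when each observation is dropped."""
    slope, _, sxx, _ = fit_line(x, y)
    result = []
    for i in range(len(x)):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        slope_i, _, _, rss_i = fit_line(xs, ys)
        s_i = sqrt(rss_i / (len(xs) - 2))  # residual std. error without obs. i
        result.append((slope - slope_i) / (s_i / sqrt(sxx)))
    return result


# Hypothetical data: income in thousands, life satisfaction on a 0-10 scale.
income = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
swb = [5.7, 5.8, 6.7, 6.8, 7.7, 7.8, 8.7, 8.8, 9.7, 9.8, 5.0]

flagged = [i for i, d in enumerate(dfbetas_slope(income, swb)) if abs(d) > 1]
```

Only the extreme observation is flagged here; the remaining points move the slope by far less than one standard error.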

| Income specification
To accommodate the diminishing marginal return of income we log-transformed income (Layard et al., 2008). CIV QALY was then estimated based on a slightly modified equation as used by Ólafsdóttir et al. (2020) and van den Berg and Ferrer-i-Carbonell (2007). This entailed dropping the lagged income and health coefficients as used in our base model (Equation 6).
In the log-income specification CIV QALY was calculated as the percentage share of annual income (median annual income ȳ). By construction, CIV QALY values would be confined to be no greater than this income level, which may be acceptable when valuing small gains or changes but not a full QALY. Therefore, we added the parameter Δ to the equation and set it to 10. Instead of calculating the monetary equivalent of a one QALY change, we calculated the equivalent of a 0.1 QALY change and multiplied it by 10.
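The role of Δ can be illustrated with a short Python sketch. The functional form is an assumption inferred from the description above (a share of median income that cannot exceed that income), namely Δ · ȳ · (1 − exp(−β / (Δ · δ))); the equation itself is not reproduced in this text. All input values are hypothetical.

```python
from math import exp


def civ_log_income(beta_health, delta_log_income, median_income, delta=10):
    """Monetary value of one QALY in a log-income model (assumed form).

    Values a health change of 1/delta QALY as a share of median income and
    scales the result back up by delta.
    """
    step = 1.0 / delta  # size of the health change valued in one step
    share = 1.0 - exp(-beta_health * step / delta_log_income)
    return delta * share * median_income


# Hypothetical inputs, not estimates from the study.
beta, coef, y_median = 3.0, 0.25, 20000.0

capped = civ_log_income(beta, coef, y_median, delta=1)   # bounded by y_median
scaled = civ_log_income(beta, coef, y_median, delta=10)  # bounded by 10 * y_median
```

With Δ = 1 the value can never exceed the median income, however large the health coefficient; with Δ = 10 the bound moves to ten times that income.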
To account for the non-linearity of income without imposing a logarithmic functional form, which may not adequately capture the relationship especially on the lower end of the income distribution, we furthermore tested a piecewise linear specification similar to Ólafsdóttir et al. (2020). To obtain the appropriate number of income splines and cut-off values, we iteratively combined income-deciles. The equality of coefficient estimates of adjacent splines was tested and non-significantly different splines were gradually combined until coefficients were significantly different and model fit did not improve. CIV QALY values were then calculated for each income spline and also aggregated by weighting according to the number of individuals in the respective splines. Estimating a piecewise IV specification was not feasible, as one distinct income instrument would have been required for each of the splines.
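How income enters a piecewise linear specification can be sketched as follows. The helper `spline_terms` and the cut-off values are hypothetical; the sketch only shows how one income value is split into segment amounts, each of which would receive its own coefficient.

```python
def spline_terms(income, knots):
    """Split an income value into the amounts falling within each segment."""
    edges = [0.0] + sorted(knots) + [float("inf")]
    return [max(0.0, min(income, hi) - lo) for lo, hi in zip(edges, edges[1:])]


# Hypothetical cut-offs (monthly income in euros).
knots = [1000, 2000, 3000]
terms = spline_terms(2500, knots)
```

Here an income of 2500 contributes 1000 to each of the first two segments, 500 to the third, and nothing to the fourth, so the segment amounts always add up to the original income.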

| Choice of utility tariff
Lacking a German specific SF-6D utility tariff we relied on the UK time-trade-off based value set (Brazier & Roberts, 2004) to construct health utilities. In an alternative specification we explored the importance of tariff choice by instead applying a recently developed value set from the Netherlands which was estimated using a discrete choice experiment (Jonker et al., 2018).

| Health state dependence of the utility of consumption
Another empirical issue of concern relates to the interaction between health and income and experienced (consumption) utility. This so-called health state dependence implies that the marginal utility gain from a given income change is directly dependent on the underlying health status (Finkelstein et al., 2013). So far, there is only inconclusive evidence on the magnitude and the direction of this effect: Finkelstein et al. (2013) found a negative health state dependence, a higher marginal utility of income in good compared to bad health, based on US data. However, replicating their approach using European data, Kools and Knoef (2019) found evidence for positive health state dependence, potentially due to differing provision of public goods in European healthcare systems.
As illustrated by both Finkelstein et al. (2013) and Kools and Knoef (2019), health state dependence has important implications for (health) economic issues such as the optimal design of insurance contracts or individual-level decisions on life-cycle savings. In the context of estimating CIV QALY , which requires a simultaneous measurement of the wellbeing impacts of both health and income separately, a thorough investigation of the life-cycle development of health states and the associated changes in consumption utility seems warranted.
To explore the potential impact of health state dependence on CIV QALY estimates, we reduced our sample to those individuals that transitioned between health states. Finkelstein et al. (2013) used the onset of chronic diseases for this purpose. While this represents a convenient definition for an elderly population, we took a different approach, allowing us to observe the transition of individuals from good to bad health also for healthier groups. First, we reduced the sample to individuals whose mental or physical short form health questionnaire (SF-12) component scores changed by at least 10, or one standard deviation, throughout their respective observation period. This was done to ensure that individuals in this group have experienced a consequential change in their mental and/or physical health. Good health states were defined as periods in which either of the two scores was above their respective individual-level mean; bad health states if they were below. Secondly, we conditioned on the consecutive observation of differing health states, requiring at least two consecutive periods to be observed in either state. This allowed us to estimate CIV QALY for good and bad health separately while also ensuring that individuals transition into longer-term health states (see Supplementary Appendix A4 for details). Importantly, the sample included individuals transitioning from good to bad health and vice versa, although the former is most frequent.

| DATA
We used data from the annual SOEP panel survey, providing a representative sample of the adult (aged 16+) German population (Goebel et al., 2019). Ethical approval with respect to the surveying process generating the underlying data was obtained by the SOEP researchers directly. SF-6D health utilities were constructed from SF-12 data, which have been included in the survey biennially since 2002. To facilitate the specified two-year time-frame T used for the CIV QALY calculations, and to prevent dropping observations from every second year, we linearly imputed SF-6D values for intermediate years. However, this was only done if individuals were observed for three consecutive years with two completed SF-12 surveys.
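The imputation rule can be sketched in Python as follows. The function `impute_intermediate` and its inputs are hypothetical; the sketch fills an intermediate year with the midpoint of its two neighbours, and only when the individual was observed in that year and both neighbouring SF-6D values exist.

```python
def impute_intermediate(sf6d_by_year, observed_years):
    """Linearly impute SF-6D values for years between two biennial surveys."""
    filled = dict(sf6d_by_year)
    for year in sorted(sf6d_by_year):
        middle, following = year + 1, year + 2
        if (following in sf6d_by_year
                and middle in observed_years
                and middle not in sf6d_by_year):
            filled[middle] = (sf6d_by_year[year] + sf6d_by_year[following]) / 2
    return filled


# Hypothetical individual: SF-12 completed in 2002, 2004 and 2006,
# but not interviewed at all in 2005.
values = {2002: 0.80, 2004: 0.70, 2006: 0.60}
observed = {2002, 2003, 2004, 2006}
filled = impute_intermediate(values, observed)
```

In this example only 2003 is filled in; 2005 stays missing because the individual was not observed in that year.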
Life satisfaction was measured on an 11-point scale ranging from 0 ("completely dissatisfied") to 10 ("completely satisfied"). Information on individuals' income was based on self-reported monthly net household income. To account for differences in household composition, we calculated equivalized household income, following the definition by Hagenaars et al. (1994). Income data was converted to 2018 prices using the official consumer price indices (Federal Statistical Office, 2020).
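Equivalization can be illustrated with a small Python sketch. It assumes that the cited definition corresponds to the modified OECD scale (weight 1.0 for the first adult, 0.5 for each further household member aged 14 or older, 0.3 for each child under 14); the function name and the example household are hypothetical.

```python
def equivalized_income(household_income, n_adults, n_children):
    """Household income divided by a modified-OECD-style equivalence scale.

    n_adults counts members aged 14 or older, n_children those under 14
    (assumed age cut-off).
    """
    if n_adults < 1:
        raise ValueError("a household needs at least one adult")
    scale = 1.0 + 0.5 * (n_adults - 1) + 0.3 * n_children
    return household_income / scale


# Hypothetical household: two adults, two children, 4200 euros per month.
family = equivalized_income(4200, n_adults=2, n_children=2)
single = equivalized_income(1500, n_adults=1, n_children=0)
```

The family's scale is 2.1, so its equivalized income is about 2000 euros, whereas a single-person household keeps its income unchanged.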
To construct our instrument we extracted information on net labour income and individuals' industry and occupation. We dropped households with individuals where information on labour income but not on industry/occupation was available. Predicted labour income was assumed to be zero for all individuals with no labour income information, or who stated that they were not employed. We furthermore extracted information on a similar set of variables as used by Huang et al. (2018) to control for confounding factors. These included age, disability, marital status, employment status, educational attainment and leisure time. Table 1 shows summary statistics of the analysis data, consisting of 29,735 individuals providing 186,906 individual-year observations. Supplementary Table A1.1 provides an overview of the conditioning applied to the SOEP data, while Supplementary Table A1.2 shows that the sub-sample of employed individuals who were dropped because of missing industry/occupation information is comparable to the remaining sample of employed individuals. As the exclusion of individuals without at least two consecutive SF-6D values was the only major selection criterion, the sample remained largely representative for the overall German population.

| RESULTS

| Baseline results
The baseline OLS and IV results are shown in Table 2, distinguishing between results using the full dataset with imputed SF-6D values and the dataset without imputation. To construct our instrumental variables, we predicted labour incomes based on industry/occupation for 125,229 observations. Supplementary Appendix A3 provides details on this prediction and the associated errors, which were small for the largest part of the income distribution. The instruments were significant in the first stage regression (Supplementary Table A3.1) and passed the Cragg-Donald weak identification test (F-value: 1864 and 192). This indicates a high relevance of the instrument, a common finding for this type of instrument (Bayer & Juessen, 2015; Luechinger, 2009). The Hausman test for endogeneity of the instrumented variables was significant, signalling that income should not be treated as exogenous.
Abbreviations: BIC, Bayesian information criteria; CIV-QALY, compensating income variation of one QALY; IV, instrumental variable; OLS, ordinary least squares.
Equivalized monthly household income, health status (SF-6D utility), and their lagged values were positive and significant predictors of life satisfaction in the OLS specification. This was also the case when instrumenting for income, except that the lagged income coefficient was insignificant. We observed a two-fold increase in the income coefficients in the IV model (0.048 vs. 0.098), a similar magnitude to what has been observed in previous studies using the SOEP (Bayer & Juessen, 2015; Pischke, 2011). Interestingly, the difference is minimal compared to what was observed by Huang et al. (2018), who reported an IV coefficient which was about 130 times larger than the OLS coefficient (0.080 and 0.0006). Applying the estimated income and SF-6D coefficients to Equation (6) resulted in a CIV QALY value of €58,533 in the OLS model and €22,717 when instrumenting for income. This value represents the average amount of additional income necessary to maintain the same level of life satisfaction if a hypothetical health change of one QALY is imposed. Without SF-6D imputation, reducing our sample to 85,433 observations across 21,718 individuals, the OLS results increased by a factor of 1.38 to €80,522 while the IV-based value increased by a factor of 1.24 to €28,130. These differences were driven by larger SF-6D and income coefficients compared to the baseline calculations, possibly resulting from increased within-person variance as the distance between observations is two years instead of one. For the remainder of the results presented, we will be using the full dataset with imputed SF-6D values to make use of the largest amount of information available. Table 3 columns 2-3 contain estimates for East and West Germany separately, motivated by the persisting differences in life satisfaction and income levels (Frijters et al., 2004; Vatter, 2020). OLS-based CIV QALY estimates were €75,748 in the West and €28,548 in the East. The IV-based estimate was also higher in the West compared to the East (€20,750 and €12,982), although the relative West-East difference was smaller (a factor of 1.60 compared with 2.65 for OLS). OLS-based values exceeded IV-based values by a factor of 3.64 in the West and 2.20 in the East. In both models, the West-East difference was mainly driven by considerably larger income coefficients in the East, likely due to the prevailing income differences between West and East; observed average monthly equivalized income was €2140 in the West and only €1652 in the East.

| Specifications related to income
Re-estimating our baseline models including four individual-year observations which were flagged as outliers led to a considerably lower income coefficient in the OLS model. These observations stemmed from individuals from the same household, who reported a drop in monthly income from €142,534 to €14,051 within 2 consecutive years, while reporting constant life satisfaction.
In the models using log-transformed income (Table 4 columns 5-6), the income coefficient was 0.24, larger than reported before by Pischke (2011) (0.125 to 0.182), who also used the SOEP. The corresponding IV coefficient, with a value of 0.63, was on the higher end of previous IV estimates based on the industry-wage structure and the SOEP: Luechinger (2009) reported an estimate of 0.55, while Pischke (2011) reported values ranging from 0.489 to 0.617. Previous estimates based on instruments using lagged or future income shocks were also similar, with Bayer and Juessen (2015) providing a range of 0.45 to 0.50 for permanent income shifts. The log-transformation resulted in considerably larger CIV QALY values compared to the baseline. The OLS values increased by a factor of 2.63 to €153,877 while the IV values increased by a factor of 3.59 to €81,649. The piecewise linear specification was estimated with ultimately four income splines. The cut-off points were at the 20th percentile (€1200), the 40th percentile (€1546), and the 80th percentile (€2635). Figure 1 plots the overall distribution of life satisfaction across income, and the linear fit of life satisfaction across splines, indicating a non-linear, diminishing pattern. The spline-specific CIV QALY values were €7347, €11,686, €29,548 and €409,810. The population aggregated CIV QALY was €97,486. This estimate was driven by the large CIV QALY value in the fourth income spline, where the income coefficient was insignificant. Using the three significant splines led to a CIV QALY value of €19,515.
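The aggregation across splines can be approximated with a few lines of Python. The spline-specific values are those reported above; the weights are an assumption, namely the percentile widths implied by the cut-offs (20, 20, 40 and 20 percent), whereas the reported aggregate weights by the actual number of individuals in each spline.

```python
def weighted_civ(values, weights):
    """Weighted average of spline-specific values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


spline_civ = [7347, 11686, 29548, 409810]  # spline-specific values from the text
shares = [20, 20, 40, 20]                  # ASSUMPTION: percentile widths as weights

all_four = weighted_civ(spline_civ, shares)
first_three = weighted_civ(spline_civ[:3], shares[:3])
```

Under this assumption both aggregates land close to, but not exactly on, the reported €97,486 and €19,515, which illustrates how strongly the fourth spline drives the population-level value.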

| Specifications and issues related to health
Choice of SF-6D value set.
Applying the Dutch SF-6D value set shifted the distribution of health utilities (Figure 2), with the mean utility decreasing from 0.725 to 0.554. These differences likely reflect methodological differences rather than actual variation in health state preferences between the UK and the Netherlands as UK and Dutch tariffs for the EQ-5D have been shown to be similar (Norman et al., 2009).
The estimated CIV QALY values using the Dutch SF-6D tariff were markedly smaller (Table 5). The OLS estimates decreased from €58,533 to €32,534, while the IV estimates decreased from €22,717 to €13,054. This shift was caused by the smaller SF-6D coefficients (3.12 to 1.78), resulting from the wider spread of the Dutch tariff, which ranges from −0.44 to 1, allowing for negative health state utility, instead of 0.345 to 1 as in the UK value set. The same actual change in health corresponds to a larger change in SF-6D utility in the Dutch tariff, which reduces the impact of a (hypothetical) one unit change in SF-6D on life satisfaction.
Health state dependence of the utility of consumption
FIGURE 1 Relationship between life satisfaction and income across income splines. Life satisfaction values are depicted as small grey dots. Black dash-dotted vertical lines represent the income splines used in the piece-wise linear regression. Black horizontal lines plot the linear fit within these splines.
We explored the potential impact of health state dependence on CIV QALY estimates by restricting our sample to individuals experiencing a substantial health change, and splitting their respective observation periods into good and bad health states (see Section 3.4.4). The resulting sample was considerably smaller, including only 5112 individuals yielding 48,861 observations. Nevertheless, the summary statistics suggest that the sample is still comparable to the full population sample (see Supplementary Table A4.1). Table 6 depicts the corresponding estimation results. Compared to the baseline estimates using the full sample, CIV QALY values based on the combined good and bad health state samples were lower in the OLS model (€39,482) and similar in the IV specification (€20,377). For "good health states", the corresponding CIV QALY estimates were lower with €33,336 and €16,532. For "bad health states", the OLS-based CIV QALY estimate was €38,374 and the IV-based estimate €11,779. It is important to note that the drop in the IV-based results for the bad health state primarily resulted from a larger income coefficient estimate, even though the SF-6D coefficients increased considerably. These results indicate that there is a positive health state dependence of income, in line with the results for Germany by Kools and Knoef (2019). Unfortunately, we were not able to follow Kools and Knoef (2019) and Finkelstein et al. (2013) in focusing on non-working individuals to ensure stable income across health states, which would rule out that the increased income coefficients are driven by individuals losing their income, and hence having a larger marginal utility of additional earnings. For our analysis, such a restriction was not feasible, as within-person income variation is necessary to estimate the income coefficients. However, the general empirical pattern remains the same when excluding individuals with large negative income differences between health states (see Supplementary Table A4.2). This also holds when only considering the working population (Supplementary Table A4.3) and those experiencing sudden and severe health changes (Supplementary Table A4.4).
FIGURE 2 SF12 index values using UK and Dutch tariffs. The black dash-dotted line indicates the Dutch tariff mean. The grey dash-dotted line indicates the UK tariff mean. The distributions and means reflect SF-6D values based on self-reported SF12 questionnaires only.

| Robustness checks
Lastly, we tested the robustness of our baseline results to some general concerns regarding our estimation strategy (Table 7). In a first robustness check, we limited our sample to individuals who were in paid employment and provided industry-occupation information, the same sample which was used to obtain estimates for predicted labour income for the IV regression. The resulting OLS-based CIV QALY was slightly lower than the baseline at €52,829, while the IV-based value was slightly higher than the baseline at €26,097. These differences were driven by the smaller SF-6D coefficients in both OLS and IV models, likely resulting from the working population being healthier than individuals without labour income (the unemployed and retired). The sum of both income coefficients was smaller in the corresponding IV-calculations compared to baseline, increasing the CIV QALY. Next, we followed Luechinger (2009) by excluding households with self-employed main income earners, as the income measurement error was likely to be amplified among these individuals. Self-employed individuals are often reluctant to disclose their income, while also experiencing unstable income streams; hence, even if not reluctant to report, they might simply misreport accidentally. The resulting CIV QALY estimates and income and SF-6D coefficients were similar to the baseline estimates (€55,359 and €20,352). Another concern relating to the instrument is that observed income changes may also relate to individual effort, which likely impacts income differently across industries and occupations. Unfortunately, effort cannot be observed. To nevertheless explore this, we used information on reported bonuses, gratifications, or profit sharing to identify the group of individuals for whom this might be a relevant concern, as for them effort would have the highest impact on income and life satisfaction. To test the robustness of our results to this potential bias, we estimated our baseline models excluding such observations. The results in Table 7 columns 7-8 suggest that this bias is relatively limited.
To investigate the potential impact of dropping employed individuals without industry/occupation information (as required for constructing the IV), we included those observations in a further robustness check (Table 7 column 9). The corresponding OLS estimates for income coefficients and CIV QALY (€62,266) are comparable to our baseline estimates. However, by construction, we cannot confirm this for the IV estimates.

| DISCUSSION
Applying the well-being valuation approach to longitudinal health and income data from Germany, we estimated the monetary equivalent value of one year in full health, v_Q (equivalent to one QALY). Beyond demonstrating the feasibility of this approach in a new country context, we explored additional empirical and methodological challenges with implications for the practical usefulness of well-being valuation based v_Q estimates (denoted as CIV QALY).
The range of CIV QALY estimates obtained in our study fits into the ballpark of more reasonable stated preference estimates (Ryen & Svensson, 2015). Furthermore, it is important to note that all IV CIV QALY estimates, except the log-income specification, fell within the range of v_Q estimates for Germany of €4988 to €43,115 reported by Ahlert et al. (2016), who provided the only v_Q estimates until now. A first approximation of an opportunity cost based QALY threshold value, or k_Q, for Germany was reported by Woods et al. (2016). Using empirical estimates of health care opportunity costs for Germany, and the relationship between GDP per capita and the value of a statistical life, they calculated a k_Q range of €19,276 to €24,374 (in 2018 euros). A recent related study by Ochalek and Lomas (2020) reported estimates of cost per DALY averted (essentially the counterpart of a QALY gain) for Germany of €47,116 to €74,650 (in 2018 euros).

| Limitations and strengths of the analysis
IV-based estimates rely on restrictive assumptions relating to their unbiasedness and informational value. A valid concern is that occupational choice may be related to other unobserved confounders, such as personality traits or income preferences (Pischke & Schwandt, 2012). The use of individual fixed-effects should somewhat alleviate such concerns due to the rather stable nature of personality traits (Borghans, Duckworth, Heckman, & ter Weel, 2008), but they cannot provide complete assurance. A further assumption is that being employed in a certain industry/occupation does not have a significant, direct effect on life satisfaction, as such an effect would violate the exclusion restriction. Supplementary Figures A3.6 and A3.7 show that, controlling for income and other confounders, this effect is not zero, but modest and mostly insignificant. One additional drawback that is rarely explicitly discussed, but of great importance in the well-being valuation context, is that IV estimates only yield a local average treatment effect (Angrist et al., 1996). Using predicted labour income as an instrument at least calls into question the generalisability of our IV estimates to the full population, which also includes non-working individuals. Further, as we are not able to address all sources of measurement error with respect to income, the remaining downward bias in the income coefficients would imply an upward bias in the estimated CIV QALY values.
In addition, income variation in industry-occupation cells predominantly consists of positive, upward shifts in wages (and differences therein). This is conceptually different to financial worsening events, as used by Huang et al. (2018), as these capture income losses. Given income loss aversion (Boyce et al., 2013), our IV based CIV QALY estimates likely represent a lower-bound. The potential endogeneity of health (status) in life satisfaction regressions due to reverse causality (see e.g., Veenhoven, 2008, or Sabatini, 2014), which is rarely addressed in the related literature, is a further limitation. This endogeneity could be addressed by appropriate instruments or by identifying health shocks which are plausibly exogenous, such as heart attacks or strokes. However, besides practical issues like data availability, it is questionable how generalisable such localized causal effects would be for the overall impact of the multi-dimensional construct of health on life satisfaction. Heterogeneity may exist both concerning the type of health shocks and relating to their timing within the (life-cycle) health distribution. Whether our estimates of the impact of health are biased upwards or downwards can therefore not be easily ascertained. In the one previous article in the related literature that addressed endogeneity directly, Brown (2015) found that the health coefficient was slightly overestimated when not instrumented. Assuming this also holds in our context, this would imply that there is an upward bias in our CIV QALY values resulting from the endogeneity of health.
A more practical limitation relating to measuring health was that we had to impute SF-6D utilities for every second year to make full use of the SOEP's rich annual data. This required us to condition the sample on individuals who had at least three consecutive observations, which may have resulted in underestimating the impact of deteriorating health, since individuals are more likely to discontinue their participation in a longitudinal survey following a negative health shock.
A final limitation lies in the potential presence of double-counting, as SWB enters the model twice: as an implicit consideration in the SF-6D health state valuation tasks (on which the scoring of our health measure is based), and as a proxy for experienced utility (Equation 2). To what extent this is problematic is difficult to assess. To avoid this double-counting, one could use an unweighted sum score of the SF-6D levels. However, this raises the question of the appropriate anchoring. Using such a sum score, rescaled to a 0 to 1 range (expanding the number of levels of the first two SF-6D dimensions to five to not impose any weighting), led to lower CIV QALY estimates in the unimputed dataset (Supplementary Table A2.2, columns 4-5). However, when imposing the same anchor and therefore range as in the original SF-6D tariff (0.345 to 1), the OLS and IV results (€88,867 and €30,567) were much closer to the unimputed baseline estimates (€80,671 and €27,777).
It seems that the larger differences were caused not by the differential weighting between the dimensions, but by the different anchors, that is, the lowest utility. Another alternative approach entailed eliciting CIV values for different dimensions directly by regressing on all levels of the SF-6D, which did not impose any weighting. Adding up the resulting CIV values of the lowest level of all six dimensions yielded a cumulative value of moving from the best possible to the worst possible health state of €79,013 and €27,489, which again resembled the unimputed baseline estimates (Supplementary Table A2.2). While these sensitivity checks somewhat alleviate the concerns about double-counting, the latter revealed that 46% of the CIV QALY value stemmed from the impact of mental health on life satisfaction. It is likely that the mental health dimension also plays a dominant role in our baseline calculations. Whether this in itself is problematic lies outside the scope of this paper, as it relates to a more general issue of the well-being valuation approach: is life satisfaction the best (available) proxy for experienced utility?
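The anchoring issue raised for the unweighted sum score can be made concrete with a minimal sketch. It assumes six dimensions with five levels each (level 1 being the best) and a simple linear rescaling; the function name, the level coding, and the example health state are hypothetical and are not taken from the SOEP data or from our estimation code.

```python
def rescale_sum_score(levels, n_levels=5, lower=0.0, upper=1.0):
    """Map an unweighted sum of dimension levels (1 = best) onto [lower, upper].

    Assumption: every dimension has the same number of levels, so no
    dimension receives more weight than another.
    """
    if any(not 1 <= lv <= n_levels for lv in levels):
        raise ValueError("each level must lie between 1 and n_levels")
    best = len(levels)               # all dimensions at the best level
    worst = n_levels * len(levels)   # all dimensions at the worst level
    # Share of the distance from the worst state to full health (0 to 1).
    share = (worst - sum(levels)) / (worst - best)
    return lower + share * (upper - lower)


# Hypothetical health state across six dimensions.
state = [2, 1, 3, 2, 4, 3]

u_unit = rescale_sum_score(state)               # anchored on 0 to 1
u_sf6d = rescale_sum_score(state, lower=0.345)  # anchored on 0.345 to 1
print(u_unit, u_sf6d)
```

Both scores order health states identically; only the anchor of the worst state differs, so the same move between two states covers fewer utility units on the narrower 0.345 to 1 scale.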

| Implications of findings
There are several practical implications of our study for future applications of the well-being valuation approach in general, and its use for estimating v Q in particular. First, judging from the impact outliers have in the OLS specification (Table 4), subsequent applications of the approach using linear models should report on the occurrence and treatment of outliers. Second, given that the functional form of income had a large impact on our estimates, its final specification has to be well argued, and reporting results for alternative functional forms seems warranted. The piecewise linear specification seems to be a promising alternative, given that it is more flexible and gives all income groups a proportional weight. This approach, however, comes at the price of increasing the number of variables that need to be instrumented.
Third, the choice of utility tariffs for the health instrument matters greatly. The range of the scoring algorithm in particular has a large impact (Supplementary Table A2.2), as an imposed one-unit change in health utility implies a different change in health depending on whether the range goes from 0.345 to 1 or from −0.44 to 1. How to overcome this issue while facilitating cross-country comparisons, and how this relates to the underlying QALY concept, should be discussed further in future applications. Lacking country-specific tariffs, it may be convenient to opt for a tariff whose origin can be placed in cultural and socio-economic proximity to the country to be investigated. However, the impact of methodological peculiarities in how these tariffs were generated is relevant. It would also have been interesting to compute CIV QALY estimates based on the more widely used EQ-5D health utilities and to compare the implications of differences in the scope and range of the health instrument used for CIV QALY values. Unfortunately, the EQ-5D is rarely included in longitudinal surveys. Lastly, the differing values obtained when considering East and West Germany separately, or specific time periods (Table 3), also highlight the potential importance of the specific country context for CIV QALY calculations.
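To illustrate why the tariff range matters, the following minimal sketch expresses a one-unit change in health utility as a multiple of each tariff's full range, using the two ranges mentioned above. The function name is hypothetical, and the calculation is plain arithmetic rather than part of our estimation procedure.

```python
def unit_change_in_full_ranges(lowest, highest=1.0):
    """Express a one-unit change in health utility as a multiple of the
    tariff's full range (worst health state to full health)."""
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    return 1.0 / (highest - lowest)


narrow = unit_change_in_full_ranges(0.345)  # tariff ranging from 0.345 to 1
wide = unit_change_in_full_ranges(-0.44)    # tariff ranging from -0.44 to 1

print(f"0.345 to 1: one unit = {narrow:.2f} full ranges")
print(f"-0.44 to 1: one unit = {wide:.2f} full ranges")
print(f"ratio: {narrow / wide:.2f}")
```

Under the narrower tariff, one unit corresponds to roughly one and a half times the move from the worst state to full health, whereas under the wider tariff it covers only about 70% of that move. An imposed one-unit change therefore represents more than twice as large a change in health in the former case.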
One of the major conceptual issues discussed in our analysis, with direct relevance for the practical value of any empirically estimated CIV of health, is the health state dependence of utility. We attempted to provide indicative evidence on how health state dependence might affect estimated CIV QALY values. However, it remains unclear whether empirical approaches based on self-reported (panel) data can produce reliable estimates if health state dependence is prevalent and survey participation and attrition are (partially) driven by health changes over time. We found considerable differences in the estimated CIV QALY values when comparing periods of good and bad health within individuals (Table 6). As the underlying point estimates exhibited substantial uncertainty, these findings should be interpreted with caution and merely as indicative evidence for the role of health state dependence in this context. The impact of this subsample of individuals on the population-wide CIV QALY value is likely small, as attrition is high once individuals experience bad health states or long-term or very severe health shocks. Hence, a pragmatist might argue that this issue is of theoretical interest only. We would argue, however, that this is an inherent limitation of self-reported observational data and its ex-post perspective in this context. Stated preference methods would allow for an explicit ex-ante consideration of this issue through tailored sampling strategies and survey design.
An additional conceptual concern related to health state dependence is the question of adaptation to bad health over time (Huang et al., 2018). Adaptation implies the gradual return of SWB to pre-health-shock levels despite continued (or deteriorating) bad health (Loewenstein & Ubel, 2008). This phenomenon has been documented before using the SOEP data (Oswald & Powdthavee, 2008) and would generally decrease estimated CIV QALY values, as the marginal utility of health would decrease with time spent in bad health. To what extent this represents an estimation error, however, is debatable and depends on what is perceived to be the "true" impact of ill-health on well-being over time, and whether adaptation, if present, should be corrected for. The recent findings by Etilé et al. (2020), who documented a heterogeneous distribution of adaptive potential across subgroups, underline the relevance of this concern also from a normative perspective.
The previous remarks highlight avenues for future research, like investigating the causal effect of health on life satisfaction, for example using instrumental variable regressions. In addition, the approach would crucially benefit from further research into the impact of income on life satisfaction, for example using (natural) experiments. The regular inclusion of variables that represent valid instruments for income in different population panel surveys could also be beneficial for further exploring the reliability and validity of these instruments and the approach as a whole, as it would allow cross-national replications of results. Meanwhile, future applications may draw upon recent advances regarding the generalisability of IV-based estimates (see e.g., Mogstad et al., 2018) to explore how these concerns can be addressed within the framework of available instruments. Further, linking survey data on individual-level SWB measures with detailed administrative records on income, health, and care consumption would also be a fruitful direction for further inquiry, resolving some of the enumerated concerns. With respect to the question of health state dependence, for example, it would be possible to determine the extent to which survey data have an inherent blind spot due to the attrition of individuals following severe health shocks. In addition, such data could also be used to explore a wider range of specification choices within the general empirical strategy used, for example with respect to the choice of control variables. Here, we deliberately followed Huang et al. (2018), as the set of basic control variables they propose is available in most national panel surveys, which facilitates replications across country contexts.
However, there is ample room for extending the analysis by considering a wider set of control variables and their impact on CIV QALY estimates, or even by choosing a different approach altogether, such as shrinkage estimators (e.g., LASSO) or matching, to address endogeneity concerns around the impact of health and/or income on life satisfaction.
A final issue concerns the practical application of our v Q estimates. If certain (health) policies/interventions in Germany were to be evaluated using a v Q value from our study, which range from around €20,000 (IV) to €60,000 (OLS), we have to highlight the following: Our study cannot provide a definite answer regarding which estimate is the most accurate one to use in different contexts. This relates to the uncertainty surrounding these estimates and the underlying assumptions, but also to normative or distributional questions, which need to be addressed in the future (Cookson et al., 2020). While our piecewise regression results somewhat reflect such concerns by constructing v Q estimates using a weighted mean of the different parts of the income distribution, this is only a first, very simplistic approach. When used in a normative context, like decisions on reimbursement of technologies, explicit policy (debate and) support is required. Applied studies could use the range we provided to highlight the impact of varying v Q estimates on their results and recommendations, keeping in mind that our v Q estimates might not be directly applicable to specific sub-populations. In any case, the selection of one specific value over another in any practical application should be transparently discussed with respect to the applied selection criteria.

| CONCLUSIONS
We demonstrated that the well-being valuation approach can be another useful instrument in the (health) economist's toolbox for obtaining monetary equivalent valuations of health (v Q ). Some inherent empirical and conceptual challenges of applying this approach in this context can be addressed, especially when using large-scale longitudinal data. However, other issues, like the health state dependence of the utility of consumption, will remain a threat to the validity of estimates, warranting additional research. Concurrently, alternative approaches to estimating v Q , like stated preference studies or methods aiming at eliciting the value of a statistical life, as recently applied by Herrera-Araujo et al. (2020), provide important complementary insights, despite their conceptual differences. Given their respective strengths and limitations, methodological diversity is desirable in the ongoing endeavour of measuring the monetary equivalent value of health.
The type of v Q estimates provided in our analysis reflects average marginal health valuations (with the caveat of being entirely based on marginal changes in health-related quality of life), representative on a national level. As such, these can be applied in economic evaluations informing decision making on a societal level for publicly funded policies or interventions. Such v Q estimates are predominantly used to inform the cost-effectiveness threshold in the context of cost-utility analysis within health care, which aids in informing decisions on reimbursement of certain health interventions. However, estimates of the monetary value of health can also be useful in broader contexts, like cost-benefit analyses or similar approaches (Cookson et al., 2020), especially when benefits and costs of policies/interventions constitute a mix of health and non-health outcomes occurring across different sectors. Advancing methodologies aiming to estimate v Q and providing insights into their validity can assist in informing some of the uncomfortable trade-offs that societies generally face in priority-setting, both within health care and beyond (Chilton et al., 2020).