Risk equalization in competitive health insurance markets: Identifying healthy individuals on the basis of multiple‐year low spending

Objective To study the extent to which risk equalization (RE) in competitive health insurance markets can be improved by including an indicator for being healthy. Study Setting/Data Sources This study is conducted in the context of the Dutch individual health insurance market. Administrative data on spending and risk characteristics (2011‐2014) for the entire population (N = 16.6 m) as well as health survey data from a large sample (N = 387 k) are used. Study Design The indicator for being healthy is low spending in three consecutive prior years. “Low spending” is defined in three ways: belonging to the bottom 60%, 70%, or 80% of the annual spending distribution. Versions of the Dutch RE model 2017 with and without the indicator are compared on individual‐level payment fit and, using the survey data, group‐level payment fit. Principal Findings All three alternative models outperform the Dutch RE model 2017. However, significant unpriced risk heterogeneity remains. Compared with the 60% threshold, the 80% threshold comes with a larger improvement in fit but identifies a less selective group. Conclusions The performance of the RE model can be improved by adding an indicator for being healthy based on multiple‐year low spending. However, risk‐selection potential remains, warranting high priority to further improvement of RE.


EIJKENAAR Et Al.
Over the past decades, RE models in developed countries have evolved from simple demographic models to sophisticated morbidity-based models, often containing hundreds of risk classes. [8][9][10][11] However, studies have consistently shown that even state-of-the-art RE models considerably under- or overpay specific subgroups in the respective populations, leaving significant selection potential (R. C. Van Kleef, F. Eijkenaar, & R. C. J. A. van Vliet, under review). 10,[12][13][14][15][16][17] Therefore, stakeholders in these markets continue to seek to improve RE, with a strong focus on identifying individuals in poor health through the development of new or enhanced morbidity indicators based on prior diagnoses or utilization. [17][18][19] Importantly, however, there is another side of the issue; risk selection in competitive health insurance markets may also be driven by overcompensated groups of individuals in good health, indications of which have been found in several countries, including the United States, 13,15,20,21 Switzerland, 22 and the Netherlands. [23][24][25][26][27]
The Dutch RE model 2017, like most such models, includes numerous variables meant to identify sicker, higher-cost individuals. Specifically, the model contains five morbidity-based risk adjusters, that is, pharmacy-based cost groups (PCGs), diagnosis-based cost groups (DCGs), durable medical equipment groups (DMEGs), physiotherapy diagnosis groups (PDGs), and multiple-year high-spending (MYHS) groups. 11 About 27% of the population is flagged by one or more of these variables. In this paper, the focus is on the complementary group of individuals not flagged by a morbidity variable (73% of the population) and who are thus implicitly designated as healthy. However, this group is likely to be heterogeneous in terms of health and spending, implying the existence of unpriced risk heterogeneity (and thus selection potential) within this group. The reason is 2-fold.
First, not all chronic conditions involving predictable, above-average spending are captured by the morbidity variables. Second, the morbidity variables may not flag individuals in moderate health (eg, those who are just developing a chronic illness). For example, individuals are only classified in a PCG if they meet a threshold of 180 defined daily doses of the relevant drugs per year. Indeed, data confirm that many individuals with chronic illnesses are missed: About 50% of the Dutch population is considered to be chronically ill according to International Classification of Primary Care codes, with "chronic illness" defined as an illness without any prospect of full recovery. 28 The result is that within the group without a morbidity flag, individuals in moderate or poor health are undercompensated while those in good health are overcompensated.
Using administrative data on medical spending and risk characteristics over a 4-year period for the entire Dutch population (N = 16.6 m) as well as data from a health survey conducted among a large sample (N = 387 k) of that population, this paper investigates to what extent the Dutch RE model can be improved by explicitly identifying individuals likely to be healthy given their low prior spending levels. Specifically, our goal is 2-fold: (a) identifying healthy individuals on the basis of multiple-year low spending and (b) examining the impact of adding an indicator for "being healthy" into alternative versions of the RE model on payment fit (ie, the extent to which insurers' revenues from RE match the insurance claims), both at the individual level and at the level of specific subgroups derived from the health survey. In addition, models are evaluated on their potential impact on cost-containment incentives, which is relevant here since adding the indicator for "being healthy" creates a link between (prior) spending and (future) RE payment.
The paper proceeds as follows. After a brief description of the Dutch health insurance system and RE model, the data and methodology are explained. Next, the main results are presented, followed by a discussion of the conclusions and the policy implications.

| THE DUTCH HEALTH INSURANCE SYSTEM AND RISK EQUALIZATION MODEL
The Dutch health insurance system is based on Enthoven's model of regulated competition, combining competition with regulation to promote efficiency and protect public objectives such as accessibility and affordability. 7,29,30  PDGs, the Dutch model does not include information on primary care diagnoses/utilization because the required information is not available for the whole population. This is also an important reason for why we use low spending as an indicator for "being healthy." In 2017, the RE model also contained two risk adjusters based on spending on home care and on geriatric rehabilitation care in the prior year. However, both adjusters are excluded here as the latter has recently been removed and the former will probably be replaced in the RE model of 2019.
The introduction of spending-based risk adjusters in the Dutch RE model is primarily a result of the importance that relevant stakeholders attach to mitigating risk-selection potential. There is a strong preference for realizing this goal via ex ante compensation using medically/clinically informed adjusters based on diagnoses and/or utilization linked to chronic illness. But as long as selection potential remains and the data required for developing such adjusters are not available, spending-based adjusters have been used in the Dutch model since they can be effective in reducing unpriced risk heterogeneity.
However, given the direct link between spending and RE payment, these adjusters reduce incentives for cost containment, implying a trade-off. In the opinion of the Dutch government (and other stakeholders), the reduction in selection potential outweighs the reduction in incentives for cost containment. The government has stated, however, that spending-based risk adjusters are a second-best solution and will be replaced as soon as better alternatives become available. 31

| Administrative data and health survey data
Two datasets are available for this study. First, we use administrative data on medical spending and risk characteristics for the entire Dutch population (N = 16.6 million) for a 4-year period (2011-2014).
These data were those actually used for calculating the coefficients of the RE models for the years 2014-2017, respectively. We use these data to identify individuals likely to be healthy (in absolute sense) based on low prior spending, replicate the RE model 2017, and compare the individual-level fit of alternative versions of that model.
We also use these data to simulate the impact of including an indicator for "being healthy" on insurers' cost-containment incentives.
In addition, models are compared on group-level fit. This is a common approach to quantifying unpriced risk heterogeneity in health insurance markets. 32 This approach, however, requires health information not included in the RE model. Therefore, we use a second dataset based on a health survey conducted among a large sample (N = 387 195) of the population in 2012. These data contain rich information on self-reported general health and chronic conditions, 33 which can be used to define subgroups with an over- or underrepresentation of people in poor health. In turn, for each of these subgroups, the mean actual spending can be compared with the mean spending predicted by alternative RE models, providing an indication of each model's group-level fit.
The survey sample is not representative of the entire population in three ways. First, individuals living in an institution for long-term care are not included. Second, the sample only includes individuals of 19 years or over (on September 1, 2012). Consequently, results on group-level fit are conditional on the remaining (adult) sample. Third, the remaining sample was not drawn randomly. To correct for nonrandom sampling regarding several factors (eg, age, gender, ethnicity, and income), we reweighted the sample using weights supplied by Statistics Netherlands.

| Identifying healthy individuals on the basis of multiple-year low spending
Our first objective was to identify individuals likely to be healthy in an absolute sense based on multiple-year low spending (MYLS). To our knowledge, low-spending indicators have not been used previously in an RE model, so our choice of thresholds (ie, the place in the spending distribution) and of the number of years necessary to be designated as healthy was based on our own judgment rather than previous empirical research. We chose a period of three consecutive prior years as a relatively high bar for consistency of low spending. Low spending in one or two prior years could easily be a result of (more or less) random spread of insurance claims across calendar years or temporary upswings in health. In addition, 3 years corresponds to the definition of the current MYHS risk adjuster, 34 contributing to within-model consistency. Nevertheless, we also investigated the potential added value of using two- instead of three-year low spending.
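As a rough illustration of this definition, the sketch below flags individuals whose spending lies in the bottom part of the annual spending distribution in every one of the prior years supplied. The function names, the data layout, the toy figures, and the strict "below the threshold" rule for ties are assumptions made for illustration only; this is not the procedure that was applied to the administrative data.

```python
from typing import Dict, Iterable

# Hypothetical layout: spending[year][person_id] = annual spending in euros.
Spending = Dict[int, Dict[str, float]]


def annual_threshold(values: Iterable[float], bottom_pct: int) -> float:
    """Spending level separating the bottom `bottom_pct` percent from the rest."""
    ordered = sorted(values)
    # Integer arithmetic avoids floating-point surprises at the cut-off.
    cut = min(bottom_pct * len(ordered) // 100, len(ordered) - 1)
    return ordered[cut]


def myls_flags(spending: Spending, bottom_pct: int) -> Dict[str, bool]:
    """True for people who stay below the annual threshold in every year."""
    thresholds = {
        year: annual_threshold(by_person.values(), bottom_pct)
        for year, by_person in spending.items()
    }
    # Only people observed in all years can be classified.
    people = set.intersection(*(set(by_person) for by_person in spending.values()))
    return {
        person: all(spending[year][person] < thresholds[year] for year in spending)
        for person in people
    }


# Toy data for five fictitious individuals (not taken from the study).
toy = {
    2011: {"a": 0, "b": 100, "c": 250, "d": 900, "e": 5000},
    2012: {"a": 50, "b": 80, "c": 3000, "d": 200, "e": 4000},
    2013: {"a": 0, "b": 120, "c": 100, "d": 150, "e": 6000},
}
flags_60 = myls_flags(toy, 60)
flags_80 = myls_flags(toy, 80)
```

In real data many individuals have identical (often zero) spending, so the treatment of ties at the threshold would need an explicit rule; the strict inequality used here is only one possible choice.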
Regarding the place in the spending distribution, we were guided by the finding that an estimated 50% of the Dutch population is not chronically ill (Volksgezondheidenzorg.info 2018a). We first de-

| Payment fit and cost-containment incentives
Separately for each of the three spending thresholds, we constructed an indicator for "being healthy" and incorporated it into the RE model 2017. The resulting models are compared on payment fit at the individual level and subgroup level. Individual-level fit is assessed using the R-squared, Cumming's prediction measure, 35 and the mean absolute prediction error.
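The three individual-level fit measures can be sketched as follows, using the definitions that are common in the RE literature: R-squared compares squared prediction errors with squared deviations from the mean, Cumming's prediction measure does the same with absolute values, and the mean absolute prediction error is the average absolute error in euros. The function names and the toy numbers are assumptions for illustration.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """1 minus the ratio of squared errors to squared deviations from the mean."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1 - sse / sst


def cpm(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Cumming's prediction measure: the absolute-value analogue of R-squared."""
    mean = sum(actual) / len(actual)
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    abs_dev = sum(abs(a - mean) for a in actual)
    return 1 - abs_err / abs_dev


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute prediction error, in the same unit as spending (euros)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy figures for four fictitious individuals (not results from the study).
actual = [0, 1000, 2000, 9000]
predicted = [1000, 1000, 3000, 7000]
fit = {
    "r2": r_squared(actual, predicted),
    "cpm": cpm(actual, predicted),
    "mape": mape(actual, predicted),
}
```

Because CPM and MAPE use absolute rather than squared errors, they are less dominated by a few very high-cost individuals than the R-squared.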
To assess models' group-level fit, we first merged the actual spending and the predicted spending (based on each of the four models, which were all estimated on the administrative data containing all 16.6 million individuals) in 2014 with the health survey data using an anonymized individual-level identification key. Next, using the information in the survey data, we defined 28 subgroups that are overrepresented by individuals in either poor or in good health, and calculated the mean per person under/overcompensation for each of these subgroups by subtracting the mean actual spending from the mean predicted spending, separately for each model. Assessing payment fit in this way is considered an adequate method for quantifying unpriced risk heterogeneity in health insurance markets, but is often not feasible in practice due to a lack of "external" health information that is not included in the RE model. 32 We circumvent this problem by merging the administrative data with rich health information from a survey conducted among a large sample.
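A minimal sketch of this group-level calculation is given below: per subgroup, the (weighted) mean actual spending is subtracted from the (weighted) mean predicted spending, so that a positive value indicates overcompensation and a negative value undercompensation. The record layout, the use of survey weights inside the means, the summary function, and the toy figures are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def compensation_by_group(records: List[dict]) -> Dict[str, dict]:
    """Mean predicted minus mean actual spending per subgroup (weighted)."""
    totals = defaultdict(lambda: {"weight": 0.0, "predicted": 0.0, "actual": 0.0})
    for r in records:
        t = totals[r["group"]]
        t["weight"] += r["weight"]
        t["predicted"] += r["weight"] * r["predicted"]
        t["actual"] += r["weight"] * r["actual"]
    return {
        group: {
            "size": t["weight"],
            # Positive: overcompensation; negative: undercompensation.
            "compensation": (t["predicted"] - t["actual"]) / t["weight"],
        }
        for group, t in totals.items()
    }


def weighted_mean_absolute(results: Dict[str, dict]) -> float:
    """Size-weighted mean of absolute under/overcompensations across groups."""
    total_size = sum(r["size"] for r in results.values())
    return sum(r["size"] * abs(r["compensation"]) for r in results.values()) / total_size


# Toy records for two mutually exclusive subgroups (not data from the study).
records = [
    {"group": "poor health", "actual": 6000, "predicted": 5000, "weight": 1},
    {"group": "poor health", "actual": 4000, "predicted": 3800, "weight": 1},
    {"group": "good health", "actual": 600, "predicted": 900, "weight": 2},
    {"group": "good health", "actual": 1500, "predicted": 1800, "weight": 1},
]
results = compensation_by_group(records)
summary = weighted_mean_absolute(results)
```

The size-weighted summary only makes sense for mutually exclusive subgroups; overlapping subgroups would each need their own pass over the records.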
Almost 98% of the survey respondents matched successfully with the administrative data of 2014. Main reasons for an unsuccessful match are death and migration in 2012 or 2013. Table 1 presents information on actual and predicted spending for adults in the administrative data and survey respondents matching with these data.

Based on these results, the sample seems slightly healthier than the total adult population. Previous papers using the same data have presented more detailed comparisons of both groups and similarly concluded that the sample is slightly healthier (R. C. Van Kleef, F. Eijkenaar, & R. C. J. A. van Vliet, under review). 19,36 In this study, this results in a small overcompensation on the sample of 46 euro.
We did not correct for this overcompensation because (a) we do not know how it is distributed over specific groups, and (b) our goal is to assess the relative performance of alternative RE models rather than these models' absolute performance. Nonetheless, we assessed the impact of recalibrating the survey data, such that for each model, the mean predicted spending equaled the mean actual spending. This did not alter our conclusions since the relative differences among models (which was our main focus) did not change.
Since adding the indicator for "being healthy" creates a link between (prior) spending and (future) RE payments, we also evaluate models on the potential impact on insurers' cost-containment incentives by (a) qualitatively assessing the possibilities for strategic behavior (ie, stimulating and/or not preventing individuals from exceeding the spending threshold) and (b) simulating the effect on RE payments of a small or medium-sized insurer letting its total insurance claims increase generically by 1% in the prior year (2013). In spirit, the latter relates to the "power measure" developed by Geruso and McGuire, 37 with the main difference that we examine the marginal change in RE payments due to a marginal change in claims (instead of utilization). This measure describes how regulators compensate spending at the margin, or how RE impacts insurers' marginal incentive to contain costs. In the Netherlands as well as in many other countries, this is relevant as insurers are in the position to influence consumers' and providers' utilization decisions. 38 In general, competing insurers may seek to encourage utilization that increases the marginal benefit resulting from higher RE payments more than the marginal cost resulting from higher utilization. 37

| Identifying healthy individuals on the basis of multiple-year low spending
Table 2 shows descriptive statistics for the three groups identified based on three-year spending below 60%, 70%, or 80%.
Unsurprisingly, the mean spending threshold (in euros), the size of the group identified, and the mean spending increase with higher thresholds. The opposite holds for the mean overcompensation, which reduces from 231 euro for the 60% threshold to 185 euro for the 80% threshold. Thus, in terms of mean overcompensation and spending, the 60% threshold yields the most selective group.
However, the total overcompensation (ie, taking the size of the group into account) is considerably higher for the higher thresholds and highest based on the 80% threshold.

TABLE 1 Mean (predicted) spending and overcompensation for adult individuals in the administrative data (2014) and for survey respondents (2012) who successfully match with the administrative data
An additional analysis (data not shown) revealed that increasing the threshold further (eg, to 81%) would not yield an even higher total overcompensation: While the 1% group with three-year spending below 80% but not below 79% is still overcompensated, the 1% group with three-year spending below 81% but not below 80% is undercompensated.
We also examined the modality "low spending in two out of three prior years" and found that the mean overcompensation in 2014 for the resulting group (comprising 50% of the whole population) is almost 10% lower than that for the group identified based on three-year spending below 70%. Since this modality thus results in a less selective group and involves much lower spending thresholds (around 400 euro, which will probably be considered problematic in the light of insurers' cost-containment incentives), we did not investigate this modality further.

| Payment fit
The first three rows of Table 3 show that alternative models 2-4 clearly outperform model 1 on individual-level fit. Though statistically significant, the difference in fit among models 2-4 is small: compared to model 1, the improvement in Cumming's prediction measure (+0.6 to +0.8 percentage point) is relatively large, while the R-squared improves only marginally.

Model 4 = model 1 + a risk class for 3-y spending <80%
an indicator based on MYLS can improve compensation for both individuals in good health and individuals in moderate or poor health.
As a result of adding a MYLS-based indicator, the overcompensation on the groups designated as healthy (see Table 2) naturally reduces to zero. Table 4

| Cost-containment incentives
A potential drawback of an indicator based on MYLS is that it could mitigate insurers' incentives for cost containment. If an individual exceeds the relevant spending threshold at least once in the three prior years, based on the coefficients shown in Table 3, this implies an extra RE payment for his/her insurer in the current year of around 560 euro (relative to the situation in which the individual would stay below the threshold in the entire three-year period). This could stimulate insurers to (a) behave strategically (ie, not preventing individuals from slightly exceeding the relevant spending threshold) and/or (b) refrain from enacting specific cost-containment strategies.
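The incentive described above can be made concrete with a deliberately simplified sketch of the 1% simulation mentioned in the methods: each enrollee's prior-year (2013) claims are raised by 1%, enrollees who thereby cross the threshold lose the MYLS flag, and the resulting extra RE payment is set against the extra claims. The function name, the fixed threshold (assuming a small insurer does not move the national spending distribution), the use of the approximate 560 euro figure as a flat amount, and the toy portfolio are all assumptions; this is not the simulation procedure used in the study.

```python
from typing import Dict, Sequence


def marginal_payment_response(
    spending_2013: Sequence[float],
    low_in_2011_and_2012: Sequence[bool],
    threshold_2013: float,
    flag_value: float = 560.0,  # approximate extra RE payment quoted in the text
    increase: float = 0.01,     # generic 1% rise in prior-year claims
) -> Dict[str, float]:
    """Extra RE payment per extra euro of claims for one insurer's portfolio."""
    extra_claims = 0.0
    extra_payment = 0.0
    for spend, low_before in zip(spending_2013, low_in_2011_and_2012):
        new_spend = spend * (1 + increase)
        extra_claims += new_spend - spend
        # Only enrollees who were low in both earlier years and cross the
        # (assumed fixed) threshold because of the increase lose the flag.
        if low_before and spend < threshold_2013 <= new_spend:
            extra_payment += flag_value
    return {
        "extra_claims": extra_claims,
        "extra_payment": extra_payment,
        "payment_per_euro": extra_payment / extra_claims,
    }


# Toy portfolio of four fictitious enrollees (not data from the study).
response = marginal_payment_response(
    spending_2013=[100, 995, 2000, 998],
    low_in_2011_and_2012=[True, True, True, False],
    threshold_2013=1000,
)
```

The toy portfolio is constructed so that one enrollee sits just below the threshold; in a realistic portfolio the share of enrollees that close to the threshold, and hence the response per extra euro of claims, could be very different.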
However, it is unlikely that insurers will actually act on these incentives in practice as the potential benefits are either highly uncertain or unlikely to be worth the additional costs.
Regarding the former, using a percentage instead of an absolute threshold makes the potential benefits of strategic behavior uncertain as they depend on the actions of other insurers. In addition, regarding individuals who already remained below the threshold for 2 years, close to the end of the third year, insurers would have to determine whether these individuals will stay under the threshold again and if so, to take action. But at that moment, claims for ongoing treatments and for treatments that have yet to start are not avail-

TABLE 4 (Continued)
Note that the weighted mean of the under/overcompensations of mutually exclusive groups does not equal 0 due to the fact that the overall mean (predicted) spending in the survey data differs slightly from the overall mean (predicted) spending in the administrative data.
c Spending refers to the total curative somatic spending in relation to the basic health insurance package of 2017 (cost/price level of 2014). The mean spending in the sample as a whole equals 2561 euro (see Table 1).
d Calculated as the overall mean of the absolute values of the under/overcompensations, weighted by the size of the subgroups.
*Statistically significantly different from 0 based on a two-sided t test (P < 0.05).
**Statistically significantly different from 0 based on a two-sided t test (P < 0.01).
The second conclusion is that although differences are small, the improvement in fit increases with the share of individuals designated as healthy. As compared to the 60% and 70% thresholds, the 80% threshold discriminates more between the designated group and the complementary group. Also, the total overcompensation (ie, taking account of the size of the group) is highest under the 80% threshold.
In terms of the mean overcompensation, however, the 80% threshold yields a less selective group as compared to the groups identified based on the two lower thresholds.
We did not differentiate the spending threshold(s) for potentially relevant enrollee characteristics (such as yes/no morbidity classification). The reason is that our goal was to identify individuals who are healthy in absolute sense (and not in relative sense, eg, individuals with relatively low-spending levels within the group with a morbidity classification), which was informed by the fact that selection actions by Dutch insurers over the past decade have mainly been targeted at those types of individuals. Nonetheless, we acknowledge that a differentiated threshold might be able to further reduce unpriced risk heterogeneity within the group of chronically ill individuals, which will reduce incentives to attract the relatively healthy individuals within this group (and to deter the relatively unhealthy). We believe this is an interesting topic for follow-up research.
The third conclusion is that insurers' incentives for cost contain-

TABLE 5 Heterogeneity of three groups of individuals designated as "healthy" on the basis of multiple-year low spending (2011-2013) using three different spending thresholds

Survey respondents, as a percentage of all respondents with 3-y spending:   <60%   <70%   <80%
(Very) good self-reported health a:                                         92%    90%    87%
No self-reported condition a:                                               60%    56%    51%
(Very) good self-reported health and no self-reported condition a:          51%    47%    42%

All respondents with (very) good self-reported health a
All respondents with no self-reported condition a

The final conclusion is that regardless of the model or threshold Eijkenaar, To be submitted for publication), 3,39 sophisticated forms of ex post risk-sharing, 37,40 and relaxing premium regulation.
In conclusion, the performance of the Dutch RE model can be improved by adding an indicator for "being healthy" based on multiple-year low spending. Irrespective of which spending threshold is ultimately used, however, risk-selection potential remains. Given that risk selection is highly undesirable, further improvement of RE merits high priority.

CONFLICT OF INTERESTS
All authors declare that there are no conflicts of interest.