Comparing different methods of indexing commercial health care prices

Abstract Objective To compare different methods of indexing health care service prices for the commercially insured population across geographic markets. Data Sources Health Care Cost Institute commercial claims data from 2012 to 2016. Study Design We compare price indices computed using methods with differing levels of computational intensity: weighted‐average versus regression‐based methods. We separately compute indices of the prices paid for set of common inpatient and set of common outpatient services in different markets across the United States using each type of method. We subsequently examined the variation of and correlations between the resulting index values. Data Collection/Extraction Methods We computed health care service price indices separately using samples of inpatient and outpatient facility claims from 2012 to 2016 across 112 Core‐Based Statistical Areas. Within each category of services, claims were limited to members under the age of 65 with employer‐sponsored insurance. Both samples were limited to a common set of services that made up nearly 80 percent of the service use in the full sample every year. Principal Findings We found that the methods studied produced highly correlated price indices (r > .94) with similar distributions across years for both inpatient and outpatient services. Conclusions Our findings suggest that weighted‐average methods, which are much less computationally intensive, will generate results similar to regression‐based methods.

markets. Different sources of price variation-such as provider or insurer market power, cost of living or other such factors-imply vastly different policy remedies. To better understand price variation and its potential causes, it is crucial to first determine how to measure prices in different geographic areas.
We assess how using different methods of benchmarking the transaction prices for health care services across geographic areas affects price measurement. Traditionally, researchers have used less computationally intensive average-based measures that summarize the prices of a defined set of services. 4 Related literature indexes prices by computing an average-based measure of the prices for all services used to treat a given disease. 5 These approaches, though, require the researcher to specify a set services which can be compared across areas and over time, as well as how to weight each service.
Alternatively, researchers can use more computationally intensive regression-based approaches to measure medical prices across areas or providers. 1 Regression-based approaches have the advantages of not requiring the specification of a comparison set of services or service weights, more flexibly accounting for missing services, and better accounting for population heterogeneity between areas. These advantages, however, come at the cost of increased computational burden. When dealing with a dataset with hundreds of millions of claim observations for tens of thousands of distinct services, for example, the computational burden is not trivial.
We compare weighted-average-based methods with a regression-based method of indexing commercial health care prices. We then empirically test whether these different methods produce substantially different price indices to inform the trade-off of index performance and computational burden.

| DATA
We leverage data from the Health Care Cost Institute (HCCI) on the allowed amounts paid for 250 million commercial claims spanning 112 Core-Based Statistical Areas (CBSAs) across 43 states over 2012 to 2016. To construct a service-level sample, we aggregate data from all claim lines associated with an individual on common dates for the same service. For inpatient admissions, we define a service by the Diagnostic Related Group (DRG) code. For outpatient services, we define the service by the combination of a Current Procedural Terminology (CPT) code and CPT code modifier. We define the total spending for a service as the sum of the allowed amounts for all associated claim lines. The allowed amount represents the actual amounts paid to a provider for a claim-including all insurer and individual out-of-pocket payments.
We limit our sample to individuals under the age of 65 with employer-sponsored insurance. We exclude claims with extreme lengths of stay or costs. We use CBSAs as the geographic level to report our analysis. We also limit our analysis to CBSAs where the HCCI database has both sufficient data coverage and density of health care providers. For a complete description of our sample restrictions, see the Appendix S1.
In this study, we limit our analysis to inpatient admissions and outpatient facility services. For each category of services, we examine a subset of common services (see Appendix S1). Within both service categories, these common services account for more than 60 percent of all spending and almost 80 percent of service use in our sample in every year (see Appendix S1). Using each index method described below, we compute price index measures for each CBSA in our sample in each year for inpatient and outpatient services, separately. We define our base year as the first year in our sample (2012).

| Defining terminology
We calculate total spending on and use of each service. We define the average price for a "CBSA-year" observation as the total spending on a service divided by the total number of times that service was performed in that CBSA in that year. We similarly calculate the average price of each service at the national-year level as the sum of its total spending nationally divided by the total number of services performed nationally.

| Weighted-average approach I: Geometric average index
This approach computes a price index as the weighted geometric average of the ratio of the average price for each service in a CBSAyear observation relative to the average price nationally in our base year: We weight each service s within our sample services S using the share of national total spending on our sample services accounted for by each service in our base year. Here, Price tgs ∕Price Ts is the ratio of the average price of each service s in each CBSA g in each year t, and the average price of each service s nationally in our base year T. If a CBSA does not have any claims for a service, we impute the national average price as the average price in that CBSA; we impute a price ratio of one.

| Weighted-average approach II: Arithmetic average index
An alternative weighted-average approach is an arithmetic average index. 4 Using the same service weights as above, we compute a price index as the weighted arithmetic average of the ratio between the average price in each CBSA in each year and nationally in a base year (all terms are defined as before):

| Regression-based price index
Another type of approach to computing a price index is to estimate a regression-based average price measure. Following previous literature, we compute such an estimate using the following specification for each claim c for service s provided in CBSA g, in year t using the following estimation equation 1 : Here, X cgst is a vector of gender and age band indicator variables, s are service fixed effects, g are CBSA fixed effects, t are year fixed effects, and csgt is an i.i.d. normally distributed error term (see Appendix S1).
Using these estimated coefficients, we predict the average price per service in each CBSA in each year holding constant the mix of patient demographic characteristics (age bands and gender) and service types. Here, X is a vector of sample means for each demographic indicator, and s is a vector of sample means for each service indicator: We also predict the average price per service nationally in each year holding constant the average mix of demographics, service types, and patients from each study CBSA. Here, all terms are defined as before and g is a vector of sample means for each CBSA indicator: We then calculate a price index for each CBSA-year observation as the ratio between the predicted price in each CBSA in each year and predicted price nationally in our base year:

| Comparison metrics
To compare the variation in the price indices we calculate using each method above, we examine the distributions of each index. We then compute pairwise Pearson correlation coefficients between the indices.

| Price index variation
Each approach produces wide variation in price index values across CBSAs. However, in our sample, the choice of approach had little effect on the shape of the resulting distribution of those values (Figure 1).
For example, using the geometric average approach, in 2016, the CBSA with inpatient prices equal to the 75th percentile was 31 percent higher than the CBSA with inpatient prices equal to the 25th percentile. By comparison, the differences between the 75th and 25th percentile inpatient prices were 30 percent and 28 percent for the arithmetic average and regression-based approaches, respectively.

| Correlation of index measures using different methodologies
In addition to having similar distributions, we also find that the price indices calculated with different methods share a strong positive correlation. In other words, CBSAs with relatively high price levels using a particular approach have similarly high relative prices when using the other approaches. Pairwise scatter plots are presented in Figure 2 alongside corresponding Pearsonʼs correlation coefficients for each set of index values across CBSAs in 2016 for each category of services.
The pairwise correlations among the indices we compute are all above 0.96 for inpatient admissions and above 0.94 for outpatient services.
These correlations are stable across all sample years ( Table 1). As expected, the geometric and arithmetic average indices were the most closely related as they use the same service weights. Our results are robust to alternatively estimating our regression-based price index using the natural logarithm of price as our dependent variable to allow for a skewed price distribution (see Appendix S1).

| D ISCUSS I ON
There are both advantages and disadvantages when computing price indices using each of the three methods discussed in this paper.

| Weighted-average approach I: Geometric average index
This approach has two primary advantages-it is computationally straightforward to implement and it allows for the price index to have An additional limitation when comparing the same market basket across areas and over time is that it does not flexibly allow for substitution-a common concern surrounding price indices. It is possible that individuals substitute away from high-priced services. In this case, our market basket may overestimate price levels, price growth, or both. A related concern is that this approach compares the same services over time, so included services must be present in each period. If services that either enter or exit are disproportionately likely to have low prices, this will overestimate price growth over time. 6 Another limitation of this approach is that it does not adjust for differences in populations across CBSAs. Heterogeneity in underlying population health could then be driving some of the heterogeneity in our price index values. For example, it is possible areas with sicker populations on average receive more expensive versions of the same service resulting in this index overstating such a CBSAʼs price level. This may be of particular concern when defining services using DRG codes, as we do with inpatient services. 7

| Weighted-average approach II: Arithmetic average index
The main advantage of this approach, relative to the geometric average, is ease of interpretation. Here, the price index can be interpreted as the average price of a market basket where a certain number of each type of service (according to their weight) are included in the basket. Additionally, this approach is also not computationally intensive to implement.
The arithmetic average index shares the same limitations caused by selecting sample services and imposing service weights. In particular, it has the same potential limitation due to missing observations for sample services in some CBSA-year combinations; necessitating either a narrower market basket or imputed data. One additional limitation relative to the geometric average index is that the arithmetic average index does not have the convenient multiplicative properties.

| Regression-based price index
The primary advantage of the regression-based approach is that it more flexibly accounts for missing data. In this way, the regression-

| Assessing index methodology trade-offs
The strong correlations between the regression-based measure and both the weighted-average indices potentially alleviate concerns about some of the shortcomings of the latter measures. This finding provides evidence adjusting for demographic factors potentially correlated with underlying population health (ie, age and gender) does not qualitatively change the price indices we compute.
Our findings also provide evidence that price indices for medical services are not sensitive to reasonable choices of market basketsthat is, both the set of services studied and the weights of services imposed. We observed strong correlations between indices which flexibly allow for substitution away from high-priced services (regression-based) and indices that do not (weighted-average-based),  Each approach studied produces largely similar measures of relative commercial health care prices across geographies and over time.
These findings echo previous work using a different data source and sample period, as well as surveying a new set of methods. 4 In practice, we argue that these findings allow researchers to use less computational weighted-average approaches knowing they will produce similar indices to regression-based approaches. As big data become a mainstay of health services research, our findings provide empirical support for measuring commercial prices using methods with low computational intensity. Further, within weighted-average approaches, the similarity in price indexes produced allows researchers to leverage the measure best suited for their purposes.