Local House Price Dynamics: New Indices and Stylized Facts

We introduce the first publicly available data set of constant-quality house price indices for counties, ZIP codes and census tracts in the United States, at an annual frequency, over a 40-year period. Between 1990 and 2015, house price gradients within large cities steepen, documenting a reversal of decades of increasing relative desirability of suburban locations. Real house prices are more likely to be nonstationary near the centers of large cities. Within-city differences in house price appreciation at the ZIP code level are, on average, about half of between-city differences, though this ratio varies depending on the time period and city size.


Introduction
Much of the 20th century in the United States can be characterized as a period of suburbanization. Rapid improvements in both automobile technology and road capacity decreased transportation costs, increasing the desirability of areas far from city centers, which were often plagued by pollution, crime and other hazards (Mieszkowski and Mills 1993, Glaeser 2011). Consequently, as relative demand for center-city housing fell so, too, did prices. This trend persisted into the 1980s, but recent evidence suggests a reversal: after decades of hollowing out, center-cities are becoming increasingly popular. Median home values in center-cities have been increasing, according to Glaeser, Gottlieb and Toibo (2012), Edlund, Machado and Sviatchi (2015) and others. Popular media is awash in articles on "millennials" (New York Times 2014) and the rise of center-cities as destinations for the young and eager "creative class" (Florida 2004, Couture and Handbury 2016). This is coincident with Moretti's (2012) "Great Divergence" of jobs, income and increased inequality in U.S. cities (see also Diamond 2016).
While this body of research has produced robust evidence of center-city revitalization, what is lacking is direct evidence of shifts in house price gradients within cities. The house price gradient within a city captures the trade-off between demand for housing at different locations, and is therefore a fundamental economic indicator of relative demand. Changes to the house price gradient also underpin broad changes to mortgage collateral risk, and the well-being of borrowers, investors and the national housing finance system.
In this article, we provide the first evidence documenting the steepening of house price gradients over a broad cross-section of cities and, simultaneously, over long time periods. Until now, within-city house price measurement has been a persistent blind spot for those interested in urban research, policy and finance. This has limited our ability to perform panel analyses, quasi-natural experiments and to fully understand the nature of house price risk in mortgage collateral within cities.
Previously, researchers have relied on two main types of data sources to investigate within-city price movements. Both have serious issues that limit their usefulness in a variety of applications. One category consists of value measures such as the American Community Survey, the Decennial Census or Zillow's home value index, which confound price and quantity changes unless strong assumptions are valid. 1 Another category consists of proprietary price index data such as Black Knight, CoreLogic or Case-Shiller, which are produced using limited transactions data prior to the late 1980s or early 1990s, forcing reliance on geographic pooling of transactions, smoothing of series over space or time or limiting coverage. 2
To address this gap in the availability of constant-quality house price measures, we construct a comprehensive set of annual house price indices (HPIs) over four decades for cities, counties, three-digit ZIP codes (ZIP3s), five-digit ZIP codes (ZIP5s) and census tracts, using a repeat-sales methodology. 3 The trade-off of this lower level of geographic aggregation is a higher level of time aggregation. The result is a panel of annual indices for every area for which data are available, each estimated using the same method and source data. 4 Our final database includes HPIs for 914 CBSAs (381 MSAs and 533 MicroSAs), 2,716 counties, 879 three-digit ZIP codes, 18,053 five-digit ZIP codes and 54,901 census tracts. 5
Three main stylized facts emerge with these new HPIs. First, the new HPIs provide the first conclusive documentation of steepening house price gradients within an average large American city. Gradient rotations are measured using models relating appreciation to the distance to the central business district (CBD). The study of house prices in terms of gradients is based on insights from the standard urban model (SUM) of Alonso (1964), Mills (1967) and Muth (1969). This model contains predictions about the movement of house prices in terms of gradients as a function of proximity to the CBD. Based on this theory, broad shifts in house prices will occur either at the city level by way of a level shift in the price of all housing units, or within cities based on rotations (steepening or flattening) of the price gradient. 6 We conduct further tests to control for city polycentricity, and find estimates to be robust with respect to a variety of covariates, including city size, commuting time and modal choice, income amount and source, household characteristics and labor force status. We also examine the relation between the elasticity of housing supply and gradient rotations, finding little evidence that supply factors have affected price gradients. Gradient rotations in large cities also suggest the possibility of nonstationarity of house prices in some locations. According to augmented Dickey-Fuller tests, within-city nonstationarity occurs in large cities at higher rates near the CBD compared to the suburbs. This suggests that real house price appreciation in these areas may be sustainable, as opposed to mean-reverting. In small cities and in the suburbs of large cities, the appreciation gradient is flat, indicating long-run mean reversion.
1 Value changes can be decomposed into price and quantity changes (V = P × Q, so %ΔV ≈ %ΔP + %ΔQ for small changes). A value index therefore only acts as a price index in areas where the quantity of housing services is unchanging (ΔQ = 0), which is problematic in locations with new construction, demolition or substantial renovations. Employing a hold-out sample, we find that Zillow's value index at a ZIP code level underperforms the ZIP code price indices produced in this article by about 10% in terms of root mean squared error.
2 For instance, the Case-Shiller ZIP code house price data used by Mian and Sufi (2009) and Guerrieri, Hartley and Hurst (2013) ("GHH") are proprietary and begin in the late 1980s, with coverage including 1,498 ZIP codes beginning in 1990 (see column 3 of table 1 in GHH). A major issue with proprietary data is highlighted in footnote 5 of GHH: "Unfortunately, we only have the data through 2008 and, as a result, we cannot systematically explore within-city house price patterns during the recent bust. We have been unsuccessful in our attempts to secure the post 2008 from [the company overseeing the Case-Shiller index] Fiserv."
3 We use the term "cities" to refer to Core-Based Statistical Areas (CBSAs) that may be further defined into Metropolitan Statistical Areas (MSAs) and Micropolitan Statistical Areas (MicroSAs). It should be emphasized that no data from other time periods or areas are incorporated into a particular area's index calculation. This is opposed to the common practice of augmenting index values constructed with sparse observation counts by temporal or spatial information, or with alternative methods. Rather, we utilize only the repeat-sales method to produce a broad and consistent set of indices with an information set that is strictly limited to the time period and location of measure.
4 A potential concern with such a comprehensive set of indices is that some periods, especially early in the sample and in sparsely populated areas, may have low observation counts which could lead to higher variance in estimates. Clapp, Giaccotto and Tirtiroglu (1991) show that, while short-term estimates of house prices may contain noise, these differences do not compound and instead offset after several years.
Second, we find that house price volatility is higher near the CBDs of large cities. In housing market models, higher volatility is generally associated with a lower elasticity of supply. Under this supply elasticity hypothesis, changes to demand (both positive and negative) will be capitalized into house prices to a greater degree in more topographically restricted, denser or more highly regulated cities (see Glaeser and Gyourko 2005, Glaeser, Gyourko and Saiz 2008, Saks 2008 and Saiz 2010 for some seminal works in this literature). Our results thus suggest demand increases for center-city housing have outstripped supply increases in every five-year period over the last 30 years.
Finally, we find that the within-city variation in house price appreciation is about half of the between-city variation. The ratio between the two is higher in large cities, suggesting greater submarket heterogeneity in large cities versus small cities. While within-city variation is relatively stable over time, between-city variation changes dramatically. For instance, during the run-up to the Great Recession in the 2000 through 2005 period, between-city differentials doubled compared to the previous five-year period, then returned to the prior rate in the 2005 through 2010 period.
Several results and predictions related to the Great Divergence are supported by our findings. As the price gradient rotates in large cities, wealth is redistributed from owners of land in the suburbs to owners in the center-city, potentially increasing inequality. In addition, in the SUM, households and firms who consume less housing find the center-city relatively more attractive, thus explaining the rise of millennials and the service industry in the center-city. Therefore, the house price results in this article act as strong corroborating evidence for the findings of Couture and Handbury (2016) and Diamond (2016), who find that young, high human capital workers are attracted to center-cities. While these demographic groups may be attracted to amenities, it is the high house prices and their relatively low housing demand that may allow these groups to outbid others for scarce center-city space.
The remainder of this article is structured as follows. The next section outlines how our HPIs are constructed and compares them to existing indices. The third section gives a basic overview of Muth's Equation in the SUM, which frames the following discussion of stylized facts from the indices. The final section concludes with a summary, implications and potential applications of these new house price indices.

Repeat-Sales Index Construction
It can be challenging to estimate a house price index. Data are often held by private listing services or public entities that limit access due to proprietary protections and privacy concerns. Even with accessible data, an accurate market price for a set of housing units can be difficult to compute because several years may pass between individual sales and characteristics may vary across units. Due to these facts, a large volume of transactions is necessary to construct a constant-quality index, often requiring aggregation over time or space. Table 1 shows the major U.S. HPIs that are available online at no pecuniary cost, along with the time horizon, frequency and level of geographic aggregation. Geographic aggregation at the city level is currently the industry standard, allowing for high-frequency (monthly or quarterly) house price measurement, but at the cost of smoothing over variation within the city. This leaves a serious gap in our knowledge of house price dynamics within submarkets of cities. Therefore, instead of aggregating over space in order to capture enough housing transactions to estimate an index, we aggregate over time by reducing the frequency to an annual panel. As the table shows, we produce long panels of annual HPIs at various levels of geography. While this may introduce temporal aggregation bias, especially concerning volatility (Calhoun, Chinloy and Megbolugbe 1995), it allows us to document house price movements across smaller areas in ways that have not been explored before and are otherwise impossible to analyze when using higher levels of geography, like city or state HPIs.
Table 1 Publicly available house price indices in the United States. Note: "CBSA" stands for Core Based Statistical Area, which includes both MSAs and MicroSAs. "ZIP3" refers to an area defined by the first three numbers in a ZIP code while "ZIP5" are smaller areas within each ZIP3 area, denoted by a five-digit identifier.
Using a rich, proprietary data set of mortgage transactions going back to the 1970s, we are able to construct indices down to the five-digit ZIP code and census tract-level across the nation using a repeat-sales methodology. This technique is attractive because of its limited data requirements for each observed transaction (i.e., only a property identifier, sales price and date are needed), though it results in discarding transactions when multiple sales for the same unit are not observed in the data set. 7 The repeat-sales methodology is explained below.
Suppose the (natural log) value $y$ of house $i$ at time $t$ can be written as follows:
$$y_{it} = x_{it}\beta + D_t\delta + \epsilon_{it} \tag{1}$$
where $x$ is a vector of attributes, $\beta$ is a vector of relative implicit prices, $\delta$ is a vector of price levels over time, $D$ is a vector of traditional dummy variables set equal to one in period $t$ and 0 otherwise and $\epsilon$ is an error term. 8 The empirical estimation of $\beta$ requires information about the house's structural characteristics, location and surrounding neighborhood attributes. Fortunately, the conventional weighted repeat-sales (WRS) methodology uses a differencing technique that eliminates the need to estimate $\beta$ by "pairing" the same house across periods $t$ and $\tau$, or
$$y_{it} - y_{i\tau} = (x_{it} - x_{i\tau})\beta + (D_t - D_\tau)\delta + \epsilon_{it} - \epsilon_{i\tau} \tag{2}$$
7 Repeat-sales HPIs have a long tradition, dating back to Bailey, Muth and Nourse (1963). The index approach gained wide notoriety in the late 1980s with Case and Shiller's (1987, 1989) seminal work, and has since been the subject of intense research, including Calhoun (1996), which describes the Federal Housing Finance Agency (FHFA)-specific methodology. Recently, newer methods have been developed to exploit broader information sets (e.g., public and private appraisals, real estate listings, official recordings and geographic maps). Unfortunately, even though data digitization has provided more information about property- and neighborhood-level characteristics, it seldom captures sales information prior to the mid-1990s. While newer techniques may have empirical advantages over the repeat-sales methodology, they tend to be data-intensive and are constrained to shorter time periods as well as limited geographic coverage.
8 We implement a geometric index rather than an arithmetic index due to the latter's greater sensitivity to outliers. This follows the FHFA methodology.
Under the assumption that the house's characteristics, both observable and unobservable, are constant across both transaction periods, $x_{it} - x_{i\tau} = 0$ and the model becomes
$$y_{it\tau} = D_{t\tau}\delta + \epsilon_{it\tau} \tag{3}$$
where $y_{it\tau} \equiv y_{it} - y_{i\tau}$, $D_{t\tau} \equiv D_t - D_\tau$ and $\epsilon_{it\tau} \equiv \epsilon_{it} - \epsilon_{i\tau}$. This approach sometimes raises important concerns about assumptions underlying the repeat-sales technique. Generally, these issues relate to depreciation, renovations and other factors which may influence value differentials across sales that are unrelated to movements in the price of housing services. 9 Despite these potential problems, the constant-quality assumption is necessary for the index to be estimated.
A statistical issue with Equation (3) is that the error variance may be predictable based on the time between transactions, or "holding period," $t - \tau$. This knowledge gives justification for the use of a Feasible Generalized Least Squares (FGLS) procedure, where the squared estimated first-stage residuals are in turn modeled as a function of the time between transactions and the time squared: 10
$$\hat{\epsilon}^2_{it\tau} = \alpha_1 + \alpha_2 (t - \tau) + \alpha_3 (t - \tau)^2 + u_{it\tau} \tag{4}$$
where $u_{it\tau}$ is an error term. The fitted values from this auxiliary equation, $\tilde{\epsilon}^2_{it\tau} = \hat{\alpha}_1 + \hat{\alpha}_2 (t - \tau) + \hat{\alpha}_3 (t - \tau)^2$, are applied as weights in a second-stage regression written as
$$\frac{y_{it\tau}}{\tilde{\epsilon}_{it\tau}} = \frac{D_{t\tau}}{\tilde{\epsilon}_{it\tau}}\delta + \frac{\epsilon_{it\tau}}{\tilde{\epsilon}_{it\tau}} \tag{5}$$
The HPI is then computed by exponentiating the estimated $\hat{\delta}_t$ as $I_t = \exp(\hat{\delta}_t)$, which we normalize so that the base year is 1990, or $I_{1990} = 100$. 11
In this article, housing unit transaction data are from the FHFA's "all-transactions" sample, which includes conventional mortgages of single-family purchases and refinances that are acquired or guaranteed by Fannie Mae or Freddie Mac. 12 This data set contains over 97 million transactions with nearly 54 million transaction pairs in the United States from 1975 to 2015.
9 There are some well-known issues with the assumption that characteristics do not change. For instance, a house can depreciate over time by as much as 2% per year (Harding, Rosenthal and Sirmans 2007). In addition, renovations may improve the quality or size of a housing unit, with recent evidence suggesting the renovation or "flip" bias can range from 1.5% to as much as 20% (McMillen and Thorsnes, Depken, Hollans and Swidler 2009, Billings 2015). In addition to changes in structure attributes, the conditions of the sale may also affect the price. For example, house prices can be depressed when homeowners terminate mortgage payments and are either forcefully evicted through the foreclosure process or voluntarily dispose of their houses in a lender-permitted "distressed" sale (Leventis 2009). Foreclosures have been shown to depress the value of surrounding houses (Immergluck and Smith 2006, Harding, Rosenblatt and Yao 2009) and a growing concentration of them can lead to price discounts in excess of 25% (Campbell, Giglio and Pathak 2011, Depken, Hollans and Swidler 2015). The large discount is partially attributable to the long foreclosure process after delinquency as well as possible deterioration of property condition. Since there is a shorter timeline between when a borrower stops payments and a distressed sale happens, the time-varying discounts are smaller, with a range between 5% and 20% (Depken, Hollans and Swidler 2015, Doerner and Leventis 2015). This negative impact eventually vanishes as the house sells again and better information is available about its physical condition.
10 Note that the FHFA methodology omits the constant term in Equation (4) due to the possibility of negative predicted variance. We include it for greater functional flexibility. A referee asked whether Equations (4) and (5) are even necessary steps, in other words, whether our later results are sensitive to the weighting. We tested with the weights as defined, the inverse of those weights and without weights. The general trends in the figures and tables are qualitatively the same. Still, we proceed with the WRS methodology for comparability with other FHFA indices, all of which use that approach.
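The three estimation steps described above (first-stage regression on period dummies, auxiliary regression of squared residuals on the holding period, weighted second stage) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' or the FHFA's implementation: the function names, the input layout (tuples of sale year and price) and the small variance floor used to guard against non-positive fitted variances are all hypothetical choices made here.

```python
import math

def solve_wls(X, y, w):
    """Weighted least squares via the normal equations (X'WX) b = X'Wy."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'WX | X'Wy].
    A = [[sum(w[r] * X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(w[r] * X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def repeat_sales_index(pairs, base_year, var_floor=1e-8):
    """pairs: (year_first, price_first, year_second, price_second) tuples.

    Returns {year: index level}, normalized to 100 in base_year.
    var_floor is an assumption of this sketch, not part of the source method.
    """
    years = sorted({yr for t0, _, t1, _ in pairs for yr in (t0, t1)})
    col = {yr: j for j, yr in enumerate(years[1:])}  # first year omitted (delta = 0)
    X, y, hold = [], [], []
    for t0, p0, t1, p1 in pairs:
        row = [0.0] * len(col)
        if t1 in col:
            row[col[t1]] += 1.0   # +1 in the period of the second sale
        if t0 in col:
            row[col[t0]] -= 1.0   # -1 in the period of the first sale
        X.append(row)
        y.append(math.log(p1) - math.log(p0))
        hold.append(float(t1 - t0))
    ones = [1.0] * len(y)
    # Stage 1: unweighted regression of log price differences on differenced dummies.
    d1 = solve_wls(X, y, ones)
    resid2 = [(yi - sum(x * d for x, d in zip(row, d1))) ** 2
              for row, yi in zip(X, y)]
    # Auxiliary regression: squared residuals on constant, holding period, its square.
    Z = [[1.0, h, h * h] for h in hold]
    a = solve_wls(Z, resid2, ones)
    fitted = [max(var_floor, sum(z * c for z, c in zip(row, a))) for row in Z]
    # Stage 2: weighted regression with inverse fitted variances as weights.
    d2 = solve_wls(X, y, [1.0 / f for f in fitted])
    delta = {years[0]: 0.0}
    delta.update({yr: d2[j] for yr, j in col.items()})
    return {yr: 100.0 * math.exp(delta[yr] - delta[base_year]) for yr in years}
```

Calling `repeat_sales_index(pairs, base_year=1990)` returns a dictionary of index levels by year. The sketch requires that the transaction pairs link every year to the others and that at least three distinct holding periods are present, otherwise the normal equations are singular.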
We construct annual HPIs for each area with at least 100 repeat sales (RS) in the data, subject to two filters, the purpose of which is to eliminate transaction pairs from the sample that are likely to violate the constant-quality assumption. 13 The first removes any pair of transactions with an annual average appreciation rate greater than 30% in absolute value. This filter is common in repeat-sales index construction and serves to remove homes from the sample that have undergone substantial quality changes. The second filter removes any pair of transactions where the two sales are within the same 12-month period. This filter is meant to remove "flips," or homes that are purchased, rehabilitated, then sold in a short length of time. 14 For each area with over 100 RS, the index begins once 25 half-pairs (HP) are observed in a single year in the admissible RS sample. 15
Table 2 shows our local HPIs have longer horizons and greater granularity than is currently available from any other publicly available data source. About half of the MicroSA, county, five-digit ZIP code and tract-level indices begin in 1990 or earlier. The vast majority of MSA and three-digit ZIP code indices begin in 1980 or earlier. Geographic coverage at the turn of each decade is given in Figure 1, showing the increasingly broad cross-section of data available over time. As an illustration of several of the indices constructed, Figure 2 shows selected real HPIs [net of the all-goods Consumer Price Index (CPI)] for various ZIP codes and counties in the Washington DC-MD-VA-WV MSA. There is wide variation in index values across different measures of geography, especially in later periods. Noise is often high in early periods due to low observation counts, but there is enough signal to extract some useful information using econometric methods.
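The two sample filters can be expressed as a simple predicate on each transaction pair. The sketch below makes two assumptions that the text does not settle: annual average appreciation is measured as the log price change per year of holding period, and a pair counts as falling "within the same 12-month period" when the holding period is shorter than one year. The function and constant names are hypothetical.

```python
import math

MAX_ABS_ANNUAL_APPRECIATION = 0.30  # 30% per year, the threshold given in the text
MIN_HOLDING_YEARS = 1.0             # assumption: shorter holding periods are "flips"

def keep_pair(date_first, price_first, date_second, price_second):
    """Return True if a transaction pair survives both filters.

    Dates are fractional years (e.g. 2003.5). Appreciation is measured as the
    log price change per year, which is an assumption of this sketch.
    """
    holding = date_second - date_first
    if holding < MIN_HOLDING_YEARS:
        return False  # second filter: two sales within 12 months
    annual = (math.log(price_second) - math.log(price_first)) / holding
    # First filter: drop pairs with extreme average appreciation in either direction.
    return abs(annual) <= MAX_ABS_ANNUAL_APPRECIATION
```

For example, `keep_pair(2000.0, 100000.0, 2005.0, 150000.0)` keeps a pair that appreciates about 8% per year in logs, while a pair resold after six months is dropped regardless of its price change.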
To compare similar submarkets across cities, we define several regions based on distance to the CBD for five-digit ZIP codes. 16 These include: the center-city, which consists of ZIP codes within 5 miles of the CBD; the middle city, which consists of ZIP codes between 5 and 15 miles; the suburbs, which consist of ZIP codes between 15 and 25 miles; and the exurbs, which consist of ZIP codes beyond 25 miles from the CBD. ZIP codes in CBSAs are more likely to have high transaction counts, and thus begin in earlier periods. 17 Within CBSAs, the closer a ZIP code is to the CBD, the earlier the start date. These submarket classifications are important because they enable measurement and tests of submarket variation in house price dynamics, including gradients of the rate of appreciation, appreciation volatility and stationarity.
Note: An index is calculated for any area with more than 100 repeat sales over the sample, with an index start year once a threshold of 25 half-pairs (HP) is met. We define the term "half-pairs" as a transaction occurring in either t or τ. For instance, if a home sells in 1978 and 1990, the transaction pair will contribute one HP to the count for 1978 and one HP for 1990. If the home sells a third time in 1998, the home is involved in two transaction pairs, contributing one HP in each of 1978 and 1998, and two HPs in 1990. The amount of unique information in the HP count is thus between HP and HP/2.
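The half-pair count, the index start rule and the distance-based submarket classes lend themselves to a short sketch. The function names are hypothetical, and two details are assumptions made here: the repeat-sales threshold is read as "more than 100" pairs, and the distance bands are treated as closed on the left (a ZIP code exactly 5 miles out is placed in the middle city).

```python
from collections import Counter

def half_pair_counts(pairs):
    """Count half-pairs per year; each pair adds one to both of its sale years."""
    counts = Counter()
    for year_first, year_second in pairs:
        counts[year_first] += 1
        counts[year_second] += 1
    return counts

def index_start_year(pairs, min_pairs=100, min_half_pairs=25):
    """First year meeting the half-pair threshold, or None if no index is built."""
    if len(pairs) <= min_pairs:  # assumption: strictly more than 100 pairs required
        return None
    counts = half_pair_counts(pairs)
    eligible = [yr for yr in sorted(counts) if counts[yr] >= min_half_pairs]
    return eligible[0] if eligible else None

def submarket(miles_to_cbd):
    """Classify a ZIP code by distance to the CBD (band edges are an assumption)."""
    if miles_to_cbd < 5:
        return "center-city"
    if miles_to_cbd < 15:
        return "middle city"
    if miles_to_cbd < 25:
        return "suburbs"
    return "exurbs"
```

Applied to the example of a home that sells in 1978, 1990 and 1998, `half_pair_counts([(1978, 1990), (1990, 1998)])` yields one half-pair in 1978, two in 1990 and one in 1998.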

Muth's Equation and the House Price Gradient
We introduce the SUM as it relates to transportation costs in order to frame the discussion of facts revealed by our HPIs. The standard rendition of the SUM assumes a monocentric city with exogenous employment in the CBD (Alonso 1964, Mills 1967, Muth 1969). Households commute to this center at a cost $t$ per unit of distance $k$, creating a downward-sloped bid-rent curve.
$$\frac{\partial P(k)}{\partial k} = -\frac{t}{H(k;\alpha)} \tag{6}$$
Equation (6) is termed "Muth's Equation," and shows that house prices $P(k)$ fall with distance to the CBD at a rate of $-t/H(k;\alpha)$, where $H$ is housing consumption at location $k$ under a vector of exogenous demand shifters $\alpha$.
This equation shows that the house price gradient is driven by transportation costs and other exogenous factors within α that would influence the desirability of housing as it relates to distance to the CBD. These exogenous factors may consist of variables such as income (combined with an income elasticity of demand for housing that is less than 1), exogenous increases or decreases in center-city crime or exogenous demographic shifts related to household size and formation. 18 Empirically, however, the absolute distance to the CBD may be an imperfect measure. For instance, nonradial transportation networks and city polycentricity may result in nonmonotonic transportation costs as a function of absolute distance (McMillen and Smith 2003). Commuting costs are also likely correlated with other variables that are measurable, such as structure density and the housing consumption gradient (Brueckner 1987). Because these gradients are each governed by the same relationships in the SUM, these variables may provide additional information.
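The logic behind Muth's Equation can be sketched as a spatial indifference condition. The symbol $P(k)$ for the unit price of housing at distance $k$ is notation assumed here for illustration; $t$, $H$ and $\alpha$ follow the text. A household that moves marginally farther from the CBD pays $t$ more in commuting costs, so its housing bill must fall by exactly that amount for it to remain indifferent between locations:

```latex
\begin{align*}
H(k;\alpha)\,\frac{\partial P(k)}{\partial k} + t &= 0 \\
\frac{\partial P(k)}{\partial k} &= -\frac{t}{H(k;\alpha)}
\end{align*}
```

Under this sketch, the gradient steepens when commuting costs $t$ rise or when housing consumption $H$ falls, so households that consume less housing face a smaller penalty from high central prices per mile of commute saved.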
Beyond the canonical rendition of the SUM, a large body of literature has found other gradients are logically consistent with the model. For example, the modal choice of commuting (Voith 1991) is related to k, with more households choosing to commute via automobile the further from the CBD because car ownership is land-intensive. Other variables have known gradients that can be derived from the consumer's utility function, including incomes, marital status, the number of children and employment and labor force status. For nonworkers, income is endowed and commuting transportation costs are eliminated, causing such households to reside further from the CBD where transportation costs for would-be commuters are higher (Blackley and Follain 1987). Similarly, households with children demand more housing, suggesting optimal residence in the suburbs and in low housing-cost cities (Black et al. 2002). Finally, an income elasticity of demand for housing less than one suggests higher income households will live in the suburbs, where housing expenditure shares are lower. 19 These alternative correlates of commuting costs can be used to further evaluate changes to the desirability of proximity to the center-city. We also wish to be clear from the outset that we do not claim causality in the estimates, only correlations and stylized facts.

Appreciation Within Cities
The ZIP code HPIs suggest price appreciation tends to be highest near the CBD between 1990 and 2015 over a balanced panel of ZIP codes. 20 Figure 3 shows average appreciation rates for nine large cities over a 25-year period. Dark blue cells indicate higher appreciation rates and are generally clustered in center-city ZIP codes while lighter cells indicate lower appreciation rates and are more common on the peripheries. This pattern exists in a variety of other cities, as well. 21 Figure 4 measures this appreciation gradient across all of the cities in our sample from 1990 through 2015. It illustrates a negative correlation between ZIP code-level house price appreciation and distance to the CBD. In CBSAs with over 500,000 housing units, house price appreciation declines markedly with distance (most apparent in the first 15 miles from the CBD); in smaller cities the appreciation gradient is flat. For large cities, average real appreciation is about 2% per year in areas near the CBD compared to an appreciation rate of about 1% in areas that are 10 miles from the CBD. 22 At 25 miles from the CBD, average real appreciation in both large and small cities is about 0.3% per year. In general, ZIP code-level house price measures suggest that house price gradients have steepened in large cities over the last several decades.
To explore the timing of changes in house price gradients, we separate the 40-year period into eight distinct five-year windows, shown in Figure 5. We examine a 40-year period in this case because we no longer require a balanced panel of indices within the entire sample period, allowing us to extend the timeframe of analysis. Five-year growth rates are computed for all ZIP codes where sufficient data are available, with sample counts listed in subfigure headers.
20 We discuss ZIP5 indices for expository purposes. The sample period of 1990 to 2015 is chosen for two reasons. By 1990, a large number of ZIP codes have reliable HPI estimates. In addition, in later sections, we examine effects of various initial ZIP code characteristics on appreciation. Because a decennial census occurred in 1990, this year becomes a natural choice for the start of the analysis.
21 The effect is not perfectly symmetric in all directions as one might expect with natural, man-made and regulatory barriers (e.g., rivers, highways and exclusionary zoning) but the general concept tends to hold well across different-sized cities throughout the United States.
22 City size is measured in 1990, so categories are based on the number of housing units at this date. However, between 1990 and 2015, cities may grow at different rates. As a robustness exercise, we reproduced the same figure, splitting the sample into growing and declining cities. Appreciation has similar growth rates near the CBD, but the mean growth rate is higher in growing cities in the suburbs. Thus, the appreciation gradient is slightly flatter in growing cities.
Notes: Two ZIP Codes have annual average appreciation rates greater than 6% or lower than −3%. These are omitted from the scatterplot but are reflected in the curves.
Price gradients are relatively stable in the 1975 to 1980 period for both large and small cities, as signified by a lack of trend in the respective appreciation gradients. By 1985, a rotation in the large city gradient is clear, and this downward-sloping appreciation gradient appears in every panel through the end of the sample in 2015. In contrast, small city price gradients appear relatively stable, with no clear patterns. Cumulatively, these panels suggest house price differentials between the center-cities and suburbs in large cities began expanding sometime in the mid-1980s, and have widened in every five-year period since.
Stationarity tests lend support to the notion of accelerating real center-city house prices. Augmented Dickey-Fuller (ADF 1981) regressions are estimated for each ZIP code's full time series in order to test for the presence of a unit root. 23 Figure 6 visually depicts how stationarity changes across a city. The null hypothesis of a unit root is rejected more often (1.5 to 2.5 times) in smaller cities than in large cities. In addition, unit roots are increasingly rejected the further a ZIP code is located from the CBD in both small and large cities. For both small and large cities, real house prices are more likely to be stationary near the edge of the city and less likely to be stationary the closer the ZIP code is to the CBD.
23 For our main results, a deterministic constant term is modeled but not a trend, because while areas may have different real levels of house prices due to some economic advantage, it is difficult to justify a model with real house prices trending upward ad infinitum. When a trend is included, the stationarity gradient flattens, indicating some center-city areas appear to be stationary around a deterministic trend.
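The mechanics of a unit root test with a constant but no trend can be illustrated with the simplest case. The sketch below is a plain Dickey-Fuller regression, that is, the zero-lag special case of the ADF regression, and is not the authors' specification: it omits lagged differences, and the function names are hypothetical. The critical value of about −2.86 is the standard large-sample 5% value for the constant-only case; the appropriate value is somewhat more negative for short annual series such as those used in the text.

```python
import math
import random

def dickey_fuller_tstat(series):
    """t-statistic on rho in the regression: diff(y)_t = a + rho * y_{t-1} + e_t."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, diffs))
    rho = sxy / sxx
    intercept = mean_y - rho * mean_x
    ssr = sum((y - intercept - rho * x) ** 2 for x, y in zip(lagged, diffs))
    se_rho = math.sqrt(ssr / (n - 2) / sxx)
    return rho / se_rho

CRITICAL_5PCT = -2.86  # approximate large-sample value, constant-only case

def rejects_unit_root(series, critical=CRITICAL_5PCT):
    """True when the test statistic is below the critical value (series looks stationary)."""
    return dickey_fuller_tstat(series) < critical

# Usage on simulated data: a strongly mean-reverting AR(1) series.
random.seed(0)
stationary = [0.0]
for _ in range(200):
    stationary.append(0.2 * stationary[-1] + random.gauss(0.0, 1.0))
```

The statistic must be compared with Dickey-Fuller critical values rather than the usual t distribution, because its distribution is nonstandard under the unit root null.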
We return to the 25-year sample period to further explore appreciation rate differentials. We begin by stochastically specifying house price appreciation in ZIP code z in city c as a function of the distance to the CBD, k, where ZIP code-level house price appreciation is approximated by the log difference, Δp. Distance to the CBD is not a perfect measure of commuting costs in a real-world city, so we also include a vector of covariates X, which may include the modal choice of commuting, income, household demographics and other factors that may be correlated with area desirability. It should be stressed that these covariates are not interpreted as causal determinants of house price appreciation, but rather, as additional controls allowing us to identify the effect of geographic proximity. In particular, these covariates capture explanatory power related to violations in the city monocentricity assumption.
The following specification includes CBSA fixed effects, allowing us to isolate within-city variation in house price appreciation,
$$\Delta p_{zc} = \alpha_c + \beta \ln k_{zc} + X_{zc}\gamma + \epsilon_{zc} \tag{7}$$
where $\alpha_c$ is a CBSA fixed effect. Equation (7) is estimated for a variety of potential initial covariates, calculated using data from the 1990 Decennial Census. 24 Table 3 shows how different initial covariates are related to cumulative real house price appreciation between 1990 and 2015. This cross-section consists of the 3,129 ZIP codes within 30 miles of the CBD ZIP code that lie in one of 38 large (over 500,000 housing units in 1990) CBSAs. Covariates are measured at the ZIP code-level and are loosely grouped into five main categories: transportation, structure, labor force, earnings and housing demand. Each covariate relates to an SUM concept with a known gradient slope as a function of commuting costs. 25 In general, results indicate that commuting costs and other correlates of proximity to center-city locations drive house price changes in large cities over the sample period. The distance to the CBD has a significant, negative relationship with appreciation, with a doubling of CBD distance decreasing real house price appreciation by about 14%. Across models, this point estimate is remarkably stable, indicating a low degree of omitted variable bias.
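A fixed-effects regression of this kind can be sketched with the within transformation: demean appreciation and distance by city, then fit a single slope on the demeaned data. The sketch below is illustrative only. It assumes distance enters in logs, omits the covariate vector X, and uses hypothetical function names and made-up input data rather than anything from the authors' sample.

```python
import math
from collections import defaultdict

def within_city_slope(rows):
    """OLS slope of appreciation on log distance after demeaning by city.

    rows: iterable of (city_id, miles_to_cbd, log_appreciation).
    Demeaning within each city is equivalent to including city fixed effects.
    """
    by_city = defaultdict(list)
    for city, miles, appreciation in rows:
        by_city[city].append((math.log(miles), appreciation))
    sxx = sxy = 0.0
    for obs in by_city.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        sxx += sum((x - mean_x) ** 2 for x, _ in obs)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in obs)
    return sxy / sxx

def doubling_effect(slope):
    """Change in log appreciation when distance doubles, given a log-distance slope."""
    return slope * math.log(2.0)

# Hypothetical data: two cities with different intercepts and a common slope of -0.2.
rows = [(city, k, level - 0.2 * math.log(k))
        for city, level in (("A", 0.6), ("B", 0.2))
        for k in (1.0, 2.0, 4.0, 8.0, 16.0)]
slope = within_city_slope(rows)
```

Because the city intercepts are removed by demeaning, the slope is identified only from variation across ZIP codes within the same city, which is the sense in which the specification isolates within-city variation.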
The addition of transportation variables, as shown in Column (2), has limited impact on the explanatory power of the model, suggesting travel time and commuting method may be largely collinear with distance to the CBD, though neither is individually statistically significant. Column (3) considers structural attributes, with denser structures and smaller units suggesting closer proximity to the CBD and an association with positive appreciation. In Column (4), the coefficient on the labor force fraction is positive; the SUM predicts this fraction to be directly associated with proximity to the CBD because of the need for workers to commute. In Column (5), higher income is associated with higher appreciation. Column (6) presents two variables that are associated with housing demand: children and being married. Households with children have a greater demand for space, a resulting suburban location, and reduced house price appreciation. The effect is muted for families with a married head of household and children. A model that includes all these covariates suggests transportation, structure, labor force, and housing demand attributes are each driving some appreciation within cities.
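As a rough illustration of how CBSA fixed effects isolate within-city variation, the sketch below demeans appreciation and log distance within each CBSA and then fits a one-regressor OLS slope. The log-distance functional form, the function name `within_cbsa_slope`, and every number in the synthetic data (including the coefficient `true_beta`) are assumptions made for illustration; they are not the authors' Equation (7) or estimates from Table 3.

```python
import numpy as np


def within_cbsa_slope(appreciation, log_distance, cbsa_ids):
    """Slope of appreciation on log CBD distance using only within-CBSA variation.

    Subtracting each CBSA's mean from both variables is equivalent to
    including a CBSA fixed effect in a one-regressor OLS model.
    """
    y = np.array(appreciation, dtype=float)  # np.array copies, inputs stay intact
    x = np.array(log_distance, dtype=float)
    ids = np.asarray(cbsa_ids)
    for c in np.unique(ids):
        in_city = ids == c
        y[in_city] -= y[in_city].mean()
        x[in_city] -= x[in_city].mean()
    return float(x @ y / (x @ x))


# Synthetic data only: none of these numbers come from the paper's tables.
rng = np.random.default_rng(0)
n_cbsa, zips_per_cbsa = 38, 80
cbsa_ids = np.repeat(np.arange(n_cbsa), zips_per_cbsa)
log_distance = np.log(rng.uniform(1.0, 30.0, size=cbsa_ids.size))
city_effect = rng.normal(0.0, 0.3, size=n_cbsa)[cbsa_ids]
true_beta = -0.2  # hypothetical coefficient on log distance
noise = rng.normal(0.0, 0.05, size=cbsa_ids.size)
appreciation = city_effect + true_beta * log_distance + noise

beta_hat = within_cbsa_slope(appreciation, log_distance, cbsa_ids)
# With a log-distance regressor, doubling distance shifts appreciation by beta * ln(2).
doubling_effect = beta_hat * np.log(2.0)
print(f"beta_hat={beta_hat:.3f}, effect of doubling distance={doubling_effect:.3f}")
```

Under this assumed form, the effect of doubling CBD distance is the coefficient times ln 2, which is one way to read a statement such as "a doubling of CBD distance decreases appreciation by about 14%".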
The distance result is echoed in sign but is much smaller in small cities, defined as those with under 500,000 housing units in 1990. After applying the filters necessary to estimate the models with our desired sample, we are left with 260 small CBSAs, as shown in Table 4. Despite the sevenfold increase in the number of cities, we have about 25% fewer ZIP codes than in the large city sample, with 2,346 available. The point estimate of the CBD distance measure is −0.085, but falls to −0.024 in Column (7) as more variables are added. This change in the parameter estimate across models indicates that the bivariate relationship likely suffers from omitted variable bias, and that CBD proximity is not the main driver of appreciation rates in small cities. Rather, other covariates appear to be more important drivers of changes in house prices.
All considered, appreciation rates are negatively related to CBD distance in the small city sample. We attribute the difference in this result from that observed in Figure 4, which shows no gradient rotation, to sample selection. By demeaning the series by the CBSA rate and focusing on ZIP codes within 30 miles of the CBD, we are effectively restricting our sample to those cities with more than two ZIP codes within 30 miles spanning the 1990 to 2015 sample period. This limits the sample to ZIP codes in cities that have sufficient density to warrant multiple ZIP codes. As shown in Tables 3 and 4, density has a large effect on appreciation rates. Therefore, our estimates are likely to suffer from sample selection bias, and our regression results may not be generalizable to all small cities.
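The sample selection mechanism can be made concrete with a small sketch of the four filters as worded in the table notes (single CBSA, housing units changing by no more than 30%, within 30 miles of the CBD and, conditional on these, more than one ZIP code index in the CBSA). The record layout and field names below are hypothetical, chosen only to show how a CBSA left with a single qualifying ZIP code drops out entirely.

```python
from collections import Counter


def apply_sample_filters(zip_records, max_unit_change=0.30, max_miles=30.0):
    """Return the ZIP records that pass the four sample filters, applied in order."""
    passed = [
        r for r in zip_records
        if r["n_cbsas"] == 1  # index lies in a single CBSA
        and abs(r["units_end"] / r["units_start"] - 1.0) <= max_unit_change
        and r["miles_to_cbd"] <= max_miles
    ]
    # The fourth filter is conditional on the first three: keep a CBSA only
    # if more than one of its ZIP codes is still in the sample.
    per_cbsa = Counter(r["cbsa"] for r in passed)
    return [r for r in passed if per_cbsa[r["cbsa"]] > 1]


# Hypothetical records: field names and values are illustrative only.
records = [
    {"zip": "A1", "cbsa": "A", "n_cbsas": 1, "units_start": 1000, "units_end": 1100, "miles_to_cbd": 3.0},
    {"zip": "A2", "cbsa": "A", "n_cbsas": 1, "units_start": 2000, "units_end": 2400, "miles_to_cbd": 12.0},
    {"zip": "A3", "cbsa": "A", "n_cbsas": 1, "units_start": 500, "units_end": 900, "miles_to_cbd": 20.0},
    {"zip": "B1", "cbsa": "B", "n_cbsas": 1, "units_start": 800, "units_end": 820, "miles_to_cbd": 2.0},
    {"zip": "B2", "cbsa": "B", "n_cbsas": 1, "units_start": 700, "units_end": 710, "miles_to_cbd": 45.0},
]
kept = [r["zip"] for r in apply_sample_filters(records)]
print(kept)
```

In this toy input, city B loses one ZIP code to the distance filter and therefore loses its remaining ZIP code to the fourth filter, which is the kind of exclusion of low-density cities discussed above.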

Volatility Within-Cities
Appreciation volatility shows similar spatial characteristics to appreciation rates. Figure 7 shows the median absolute deviation of annual house price changes for each large city ZIP code as it relates to distance from the CBD. 26 Unconditionally, house price volatility in large cities decreases slightly with distance from the CBD. As illustrated, volatility in small cities appears to dip and then increase with distance from the CBD, though this may be confounded by increasing estimation error. Table 5 presents estimates of volatility as a function of the same covariates in Equation (7). These models relate the median absolute deviation in year-on-year log differences in house prices to SUM variables. We add an additional variable, the log of the number of transactions in the ZIP code, to account for estimation error-related volatility.
The volatility results echo the appreciation results, increasing with proximity to the CBD, controlling for the number of transactions in the source data. In general, the signs of parameter estimates are qualitatively similar to the appreciation results.
26 Median absolute deviation is defined as the median deviation from the median appreciation rate over the maximum sample period for a particular ZIP code, or $MAD_{c,z} = \operatorname{med}_{c,z}\left(\left|Y_{c,z,t} - \operatorname{med}_{c,z}(Y_{c,z})\right|\right)$. This measure is preferred to variance or standard deviation because it is not influenced by extreme values, which can occur in house price series in periods when transaction data are sparse.
Table 4 Appreciation regressions, small cities (<500k units). LHS Variable: HPI (log)
Note: The sample is subject to four filters: the indices must exist in a single CBSA, the number of housing units must not have changed by more than 30% over the sample period, the ZIP code must be within 30 miles of the CBD and conditional on these three, there must be more than one ZIP code index in the CBSA.
Note (Table 5): *** p < 0.01, ** p < 0.05, * p < 0.1. The left-hand side variable is the ZIP code-specific median absolute deviation of the annual appreciation rate between 1990 and 2015, demeaned using the CBSA index. Standard errors are adjusted based on clustering at the CBSA level. The sample includes all ZIP codes for which indices exist over the sample period, subject to three filters: the indices must exist in a single CBSA, the number of housing units must not have changed by more than 30% over the sample period and the ZIP code must be within 30 miles of the CBD.

Between versus Within-City Differences in Growth Rates
Based on the appreciation rate and volatility results, it is clear there is substantial variation in growth rates both within and across cities. In this section, we present facts regarding between versus within-city appreciation differentials.
To explore appreciation rate differentials, we consider annualized five-year growth rates in CBSA-level real house price indices. The cross-sectional standard deviation of these growth rates in each five-year period is presented in the "Between-City" rows in Table 6. The standard deviation of growth rates ranges from 5.1% in the 1975 to 1980 period to 2.0% in the 1995 to 2000 period, with an average of 3.5%. Between-city variation in appreciation rates is greater for every period in large cities except for 1990 to 1995, with an average of 4.0% for large cities versus 3.4% for small cities.
To construct within-city statistics, we begin by calculating five-year growth rates in real house prices for each ZIP code, and demean them by the CBSA growth rate. The standard deviation of the cross-sectional sample of demeaned ZIP code growth rates is presented in the "Within-City" rows for each five-year period. On average, within-city variation is about one-third of the between-city variation for small cities (1.3% vs. 3.4%) and one-half for large cities (2.1% vs. 4.0%). In large cities, within-city variation in growth rates also changes in a roughly monotonic fashion with proximity to the CBD. Center-city areas have the greatest differences within cities, and suburban areas, the smallest. The result that statistics for suburban areas of large cities closely resemble those in small cities echoes Figures 4 and 5, where we observe growth rates to be similar in the two groups.
Table 6 Between versus within-city variation in house price growth rates. Periods: 1975-1980, 1980-1985, 1985-1990, 1990-1995, 1995-2000, 2000-2005, 2005-2010
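The between and within-city statistics described above can be sketched as follows. This is a minimal illustration that assumes annualized growth rates are already computed and that the sample standard deviation (`ddof=1`) is used; the text does not specify either detail, and the function name and toy numbers are hypothetical.

```python
import numpy as np


def between_within_sd(zip_growth, zip_cbsa, cbsa_growth):
    """Cross-sectional standard deviations of growth rates for one period.

    zip_growth:  annualized growth rate for each ZIP code
    zip_cbsa:    CBSA label for each ZIP code (same order as zip_growth)
    cbsa_growth: dict mapping CBSA label to the CBSA-level growth rate
    """
    zip_growth = np.asarray(zip_growth, dtype=float)
    city_rates = np.array([cbsa_growth[c] for c in zip_cbsa], dtype=float)
    demeaned = zip_growth - city_rates  # demean by the CBSA growth rate
    between = float(np.std(list(cbsa_growth.values()), ddof=1))
    within = float(np.std(demeaned, ddof=1))
    return between, within


# Toy inputs, not taken from Table 6.
cbsa_growth = {"A": 0.02, "B": 0.06}
zip_cbsa = ["A", "A", "B", "B"]
zip_growth = [0.01, 0.03, 0.05, 0.07]

between, within = between_within_sd(zip_growth, zip_cbsa, cbsa_growth)
print(f"between={between:.4f}, within={within:.4f}, ratio={within / between:.2f}")
```

The ratio of the two numbers is the quantity summarized in the text as "about one-third" for small cities and "one-half" for large cities.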
We also calculate differences for the largest cities, defined as the top 10 cities in terms of the number of housing units in 1990. Dallas, TX, consistently has the lowest within-city variation in house price appreciation among these cities, at 1.8% on average. New York, San Francisco, Los Angeles and Miami are roughly tied for the highest, at about 2.5%. The average of the within-city variation of the largest cities is greater than the within-city variation in even the "large city" category, indicating that the larger a city is, the greater the potential heterogeneity of housing markets and therefore appreciation rates within the city.
Overall, these results highlight the need for within-city house price indices, especially in large cities. While in some periods, within-city variation is quite low and perhaps benign in terms of aggregation bias, there are many periods where the annualized within-city standard deviation of demeaned growth rates is upward of 4%. This indicates that over the five-year period, the average variation in house prices within a city is roughly 20%. For economists, practitioners and policymakers, this degree of variation is often of critical importance, for example when targeting at-risk borrowers, determining the collateral risk of mortgage products or simply predicting what a home should be worth a number of years after a sale.
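The step from an annualized figure of about 4% to roughly 20% over five years appears to rest on simple scaling of the annual rate. The line below sketches that arithmetic; the compounded figure is shown only for comparison and is not reported in the text.

```latex
5 \times 4\% = 20\%, \qquad (1 + 0.04)^{5} - 1 \approx 21.7\%
```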

Gradient Rotations and the Elasticity of Housing Supply
In prior sections, we have presented a single city-level channel by which house price gradients have changed: city size (small vs. large). Here, we consider gradient rotations that are correlated with the elasticity of housing supply.
Some discussion of urban theory is necessary, as supply factors alone cannot cause a gradient rotation in the SUM. Muth's equation predicts only transportation and housing demand factors can influence the house price gradient. This is because housing at any location in the city is perfectly substitutable in the model. In the long run, households are only willing to pay what their iso-utility conditions will permit, and this is not a function of housing supply. Variables that measure the elasticity of housing supply are therefore predicted to have no correlation with gradients through supply channels. Table 7 considers models where ZIP code-level house prices (demeaned by CBSA) in large cities are modeled as a function of the distance to the CBD, and this distance interacted with a supply elasticity determinant in the literature. Measures considered include the fraction of homes in a city in 1990 that are priced below the replacement cost of structure (Glaeser and Gyourko, 2005), the Wharton Land Use Regulation Index (WRLURI) introduced by Gyourko, Saiz and Summers (2008), the fraction of land near the center of the city that is topographically unavailable for development (Saiz 2010) and finally, a proxy for the availability of developable sites, a dummy variable for a city with greater than 2 million housing units in 1990.
Note (Table 7): *** p < 0.01, ** p < 0.05, * p < 0.1. The left-hand side variable is the log difference in real house prices between 1990 and 2015, demeaned using the CBSA index. Standard errors are adjusted based on clustering at the CBSA level. The sample includes all ZIP codes in large cities (>500k housing units in 1990) for which indices exist over the sample period, subject to four filters: the indices must exist in a single CBSA, the number of housing units must not have changed by more than 30% over the sample period, the ZIP code must be within 30 miles of the CBD and conditional on these three, there must be more than one ZIP code index in the CBSA. Decline is Glaeser and Gyourko's (2005) index of urban decline (higher values represent more decline). Regulation is the Wharton Land Use Regulation Index (WRLURI) described by Gyourko, Saiz and Summers (2008) (higher values represent greater regulation). Topo. Interruption is the fraction of land within an urban area that is topographically unable to be developed according to Saiz (2010). >2m Units takes the value of one if the ZIP code resides in a city with greater than 2 million housing units in 1990, and zero otherwise.
Individually and jointly, urban decline and regulation appear to have no effect on price gradient rotation, as predicted by the SUM. But topographic interruptions and the large city dummy are correlated with flattening gradients. Because large cities and those facing interrupted development tend to have steep initial gradients, our interpretation of these site availability coefficients is that gradients in smaller and less topographically constrained cities are catching up to the steepness of those in other cities. Davidoff (2016) has a potential explanation for the statistical significance of the coefficients, despite the SUM's prediction of no effect. In this research, he documents that low supply elasticity is often correlated with demand-side amenities. So, gradient rotations may in fact be due to changing household composition in these cities as a result of demand rather than supply factors. We encourage future research on gradient rotations to explore such avenues.
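One way to read the interaction coefficients is sketched below: appreciation is regressed on log distance and on log distance multiplied by a city-level supply measure, with all variables demeaned within CBSA. The functional form, the helper names, the binary supply measure and the coefficient values are assumptions for illustration and should not be taken as the Table 7 specification or its estimates.

```python
import numpy as np


def demean_within(values, groups):
    """Subtract the group mean from each value (within transformation)."""
    out = np.array(values, dtype=float)
    for g in np.unique(groups):
        members = groups == g
        out[members] -= out[members].mean()
    return out


def gradient_interaction(appreciation, log_distance, supply_measure, cbsa_ids):
    """Return (beta, delta) from dp = a_c + beta*ln(k) + delta*ln(k)*S_c + e."""
    ids = np.asarray(cbsa_ids)
    log_k = np.asarray(log_distance, dtype=float)
    s = np.asarray(supply_measure, dtype=float)
    y = demean_within(appreciation, ids)
    X = np.column_stack([demean_within(log_k, ids), demean_within(log_k * s, ids)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[0]), float(coef[1])


# Synthetic data only: the dummy and both coefficients are hypothetical.
rng = np.random.default_rng(1)
n_cbsa, zips_per_cbsa = 40, 60
cbsa_ids = np.repeat(np.arange(n_cbsa), zips_per_cbsa)
supply_dummy = (np.arange(n_cbsa) % 2)[cbsa_ids].astype(float)  # city-level measure
log_k = np.log(rng.uniform(1.0, 30.0, size=cbsa_ids.size))
true_beta, true_delta = -0.25, 0.10
dp = (rng.normal(0.0, 0.3, size=n_cbsa)[cbsa_ids]
      + true_beta * log_k
      + true_delta * log_k * supply_dummy
      + rng.normal(0.0, 0.05, size=cbsa_ids.size))

beta_hat, delta_hat = gradient_interaction(dp, log_k, supply_dummy, cbsa_ids)
print(f"slope when S=0: {beta_hat:.3f}, slope when S=1: {beta_hat + delta_hat:.3f}")
```

In this setup a positive interaction coefficient alongside a negative distance coefficient means appreciation falls off less steeply with distance in cities where the measure is high, which corresponds to the relative flattening described above.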

Discussion and Conclusions
This article introduces the first panel of annual, local HPIs from 1975 through 2015, including ZIP codes and census tracts. Prior to the introduction of this data set, the lowest level of aggregation of publicly available, constant-quality indices was at the city level. This highly disaggregated panel creates a new opportunity to investigate interesting research avenues that have remained closed due to lack of data availability.
We produce stylized facts related to house price appreciation gradients over a broad cross-section of cities over a long period of time. Overall, estimates suggest proximity to the center-city is a major factor explaining house price movements in the United States over the sample period, with house price gradients steepening in large cities between 1990 and 2015, and some evidence of this trend beginning in 1985.
The SUM of Alonso (1964), Mills (1967) and Muth (1969) highlights the trade-off between housing location and housing consumption, and how competition for scarce land leads to house price gradients within cities. Based on this model, there are many possible explanations for steepening gradients, including increases in traffic congestion, more extensive center-city amenities, lower center-city crime or changing preferences, to name a few (see Glaeser 2011, or Edlund, Machado and Sviatchi 2015 for discussion). Within the SUM, migration of young, high-income households, who have low relative preference for housing, to center-cities is expected to occur endogenously with the rotation in the price gradient, along with gentrification, displacement of low-income renters and finally, substantial wealth increases and reduced mortgage default probabilities for initial homeowners. Previous findings of increases in center-city concentrations of households with relatively low housing demand are therefore corroborated by the price gradient results, including Black et al. (2002), Diamond (2016) and Couture and Handbury (2016).
The final contribution of the house price panel presented in this article is the establishment of facts related to the stationarity of house prices in different parts of the city. The data show ZIP code-level house price series to be nonstationary at higher rates near the CBD and, holding distance to the CBD constant, in large cities (see also Bogin, Doerner and Larson 2019). This is potentially consistent with the notion that the elasticity of housing supply is higher in suburban areas. In an area with a highly elastic housing supply, a permanent housing demand shock is first capitalized into prices, but over time as quantities adjust, prices return to preshock levels (see Glaeser et al. 2014). In contrast, near the CBD, where buildable sites are less available and regulation is presumably more onerous, a permanent demand shock can outpace supply responses, leading to price increases. While in the long run, the SUM implies gradients cannot be affected by supply concerns due to the within-city iso-utility condition, there may be reasons for short-run supply-induced price differentials.
Combined, these findings demonstrate a small sample of the many potential applications of a panel of highly disaggregated house price indices. When a local house price measure is necessary, current practice is to use in its place either a geographically aggregated index, a value measure that confounds house prices and quantities or a proprietary index that may lack coverage and is unavailable to many potential users. While value proxies may be appropriate in certain narrow circumstances, such as when the characteristics of housing units in an area are identical and unchanging, bias introduced by these measures may have major consequences. For instance, using a home value index in place of a price index in a fast-growing area may introduce an upward bias in perceptions of appreciation, and using a city-level index in place of a local index when estimating current loan-to-value ratios for mortgages may introduce substantial error. Many applications and research opportunities already exist for an accurate, long-horizon panel of geographically disaggregated house price data, and we hope these indices will unlock promising insights.