A High-Resolution Statistical Model of Residential Energy End Use Characteristics for the United States


Address correspondence to:
Zeke Hausfather
Efficiency 2.0
128 East 7th St.
New York, NY 10009


The absence of detailed information on residential energy end use characteristics for the United States has in the past presented an impediment to the effective development and targeting of residential energy efficiency programs. This article presents a framework for modeling space heating, cooling, water heating, and appliance energy end uses, fuels used, and carbon emissions at a zip code–level resolution for the entire United States. It combines a regression-based statistical model derived from Residential Energy Consumption Survey data with U.S. census 2000 five-digit zip code level information, climate division–level temperature data, and other sources. The results show large variations in energy use characteristics both between and within different regions of the country, with particularly notable differences in the magnitude of and distribution by fuel of residential energy use in urban and rural areas. The results are validated against residential energy sales data and have useful implications for both residential energy efficiency planning and further study of variations in use patterns.


To understand and model opportunities for households to reduce energy use and carbon emissions, it is useful to have reasonably accurate estimates of how households use energy and how they can best reduce their use. While this is often best accomplished by in-home audits and bottom-up engineering models of how a specific home functions, those approaches are often prohibitively expensive to scale and require large amounts of household-specific data (Swan and Ugursal 2009). This article develops a high-resolution zip code–level estimate of the average residential energy use by fuel and end use category for the entire United States via an econometric model. In the absence of detailed home information, a spatially explicit model of residential energy use can serve as a first-pass estimate of home energy end use characteristics and as a tool for targeting energy efficiency recommendations. It can also help identify spatial patterns in specific residential energy end uses and fuel consumption, and can be used to address questions of the effects of urban form and other factors on energy use.

Energy consumption in the residential sector represents approximately 22% of total U.S. energy consumption and 21% of total carbon emissions (EIA 2009a, 2009b). There is a paucity of detailed information about residential energy use vis-à-vis commercial, industrial, and transportation energy use due to the wide variation in house characteristics, temperatures, and behavioral patterns in the nearly 128 million housing units in the United States (Swan and Ugursal 2009). Additionally, detailed information on the breakdown of residential energy use into different end use categories—space heating, water heating, cooling, and appliances—is limited by the prohibitive cost of submetering specific devices (Swan and Ugursal 2009).

The various modeling techniques for residential energy use can be classified into two approaches: top-down and bottom-up. The terminology refers to the hierarchical position of data inputs with respect to the housing sector. Top-down models attempt to attribute aggregate energy consumption data to different characteristics of the housing sector and economy, with the primary purpose of identifying long-term trends in energy consumption. Variables that are commonly used by top-down models include macroeconomic indicators (employment rates, price indices), climatic conditions, and housing construction and demolition rates (e.g., Hirst 1978).

Strengths of the top-down approach include its reliance on aggregate data that are widely available, its ability to detect trends over time when historical data are used, and its suitability for comparisons across countries. Because of this macroscopic approach, however, information on local and regional variation in residential energy use is almost impossible to obtain from top-down models. Top-down models also have no inherent capability to directly model changes that are not reflected in economic or demographic variables, such as improvements in technology or behavioral changes (Swan and Ugursal 2009).

Bottom-up models use input data more granular than the housing sector as a whole. The input data can take the form of individual end uses or a sample of houses, and the resulting models are then extrapolated to represent the geographic area of interest. Two classes of models can be identified within bottom-up models: engineering models and statistical models (Swan and Ugursal 2009). Engineering models estimate the energy consumption of various end uses by taking into account the energy ratings and usage of equipment. They provide detailed profiles of individual houses, or of a housing stock with relatively uniform archetypes, but their intensive data requirements and their limited ability to capture variation due to demographic and socioeconomic behavioral characteristics somewhat restrict their usefulness in large-scale, high-resolution analysis (Larsen and Nesbakken 2004). For example, Heiple and Sailor (2008) developed an engineering model using Residential Energy Consumption Survey (RECS)-based census region factors combined with county tax assessment parcel data and weather records for specific urban areas.
The Lawrence Berkeley National Laboratory's Home Energy Saver (HES) model has long been a standard for bottom-up estimates of home energy use, though it requires a fairly extensive set of home-specific inputs to produce detailed results. In the absence of specific inputs, the HES defaults to broad census-division average home characteristics obtained via the RECS (Mills et al. 2008).

Statistical models are regression models that establish a relationship between household energy consumption and various end uses while controlling for exogenous variables such as climatic conditions and household occupancy. Statistical models require careful attention with respect to model selection and validation. Douthitt (1989) constructed a regression model of space-heating fuel use by regressing heating fuel consumption on fuel price, price of alternative fuel, total fuel consumption, and various household characteristics that explained the majority of variation in space-heating requirements. Similarly, Ewing and Rong (2008) studied the relationship between urban form and residential energy use. Using Ewing and colleagues’ (2003) county sprawl index as a measure of urban form, they found that compact development is associated with lower residential energy consumption as compared with sprawling counties. Brown and Logan (2008) estimated the residential energy and carbon footprints of the 100 largest metropolitan areas in the United States. Using proprietary utility sales data that allowed for analysis at the zip code level, they found that residential energy consumption per capita is significantly lower in metropolitan areas than for the entire United States.

The Vulcan project (Gurney et al. 2009) took a novel approach using data and models developed for conventional air pollutants to produce a high-resolution estimate of carbon emission sources, though their model shows carbon production as opposed to consumption. This distinction is important for electricity at a residential level, because the actual carbon emissions occur at power plants rather than via in-home fuel consumption. Vulcan has been adapted to model energy use by urban form (Parshall et al. 2010), though it does not differentiate by building sector (e.g., commercial, industrial, residential) and cannot easily model electricity consumption at the point of use.

The detailed data inputs of bottom-up modeling allow for estimation of the energy consumption of different end uses and of the effect of technology change. However, detailed data, especially data broken down by energy end use, are difficult and expensive to acquire on a large scale. The only nationwide data source that provides energy end use breakdowns for a representative sample of houses in the United States is the Department of Energy's RECS, conducted roughly every four years. Unfortunately, the RECS is not particularly useful for analyzing spatial energy use patterns: it excludes all spatial identifiers other than census regions and divisions, and it deliberately introduces small errors into certain variables to prevent geographic identification of respondents by temperature or fuel price. The RECS can be used, however, to analyze how home characteristics, appliance ownership, demographics, and other factors influence the distribution of residential energy use into different end use categories (EIA 2008).

Many of the factors that the RECS identifies as significant in determining energy end uses can be found in the U.S. 2000 census five-digit zip code level data (U.S. Bureau of the Census 2000). By applying the relationships derived via the RECS to the high-resolution data from the U.S. census, climate division temperature data from the National Climatic Data Center (NCDC 2009), and emission factors and generation efficiencies from the U.S. Environmental Protection Agency's (EPA) eGRID (EPA 2009), we can construct a bottom-up statistical model of residential energy end use characteristics at a zip code resolution for the United States.

Model Methodology

Specifying Regression Models for Each Energy End Use

Our regression analysis covers four major residential energy end use categories: space heating, water heating, cooling, and appliances. We constructed a statistical regression model for each category with the microdata files from the 2005 RECS released by the EIA (2008). The RECS collected data from 4,382 households randomly sampled through a multistage, area probability design to represent 111.1 million U.S. households, the U.S. Census Bureau's statistical estimate of all occupied housing units in 2005 (EIA 2008). Each household's RECS sampling weight was used as the weighting factor in the weighted regression analysis.

The ordinary least squares (OLS) method was used with predictor variables including energy price, household characteristics, housing unit characteristics, regional fixed effects, and heating/cooling degree-days (see figure 1 for a sample of specific variables used). The dependent variables of the four regressions were the natural logs of per-household final energy use for heating, water heating, appliances, and cooling. Appliance energy here includes energy consumption from all types of household lighting and appliances, such as those used for cooking, refrigeration, audio/video, computers, dishwashers and clothes washers, dryers, pool heating and pumps, and so on.

Figure 1.

Sample of major variables and residential energy use for different end uses. Note: t-values are in parentheses; *p < 0.05, **p < 0.01, ***p < 0.001; Elec = electricity; deg. = degree; Div = division; Adj. R² = adjusted R², which is a measure of the goodness of fit of the line; Number of Sig. Var. = number of significant variables; Coeff = coefficient. This is a limited sample of variables with particularly large t-values. Full regression models are available in the supporting information on the Journal's Web site. Not all variables are used in each regression, and blank spaces in columns indicate cases where the variable in question is not significant for the specific regression.

The model can be formulated as

$$\ln E_j = \beta_{j,0} + \sum_i \beta_{j,i}\, X_i^{\mathrm{RECS}} + \varepsilon_j \qquad (1)$$

where j indicates the four categories of heating, water heating, cooling, or appliance; $E_j$ is the total annual energy consumption for each end use; and $X_i^{\mathrm{RECS}}$ denotes predictor variable $X_i$ with values from the RECS dataset. The RECS superscript distinguishes the RECS data used to build the model from the census data used for prediction. The dependent variables $E_j$ aggregate annual energy use by end use, which the EIA estimated from total fuel use per household. Each $E_j$ is the sum of its fuel components:

$$E_j = E_j^{\mathrm{NG}} + E_j^{\mathrm{EL}} + E_j^{\mathrm{FO}} + E_j^{\mathrm{LP}}$$

where NG means natural gas, EL electricity, FO fuel oil, and LP propane. The regression results for selected major variables are shown in figure 1, and the comprehensive results are included in the supporting information available on the Journal's Web site. The space-heating, cooling, water-heating, and appliance regression models use 18, 15, 16, and 12 significant variables, respectively, including regional fixed effects (via census division) and squared terms. We converted some explanatory variables into log form to obtain better information criterion values.

Utilizing Census Data to Achieve High Geographical Resolution

Because the goal of this research is to estimate per-household energy use at as fine a geographic resolution as possible, the resolution of the RECS dataset (the U.S. census division level) was not adequate. Instead, we used the U.S. 2000 census dataset (U.S. Bureau of the Census 2000), which contains five-digit zip code–level information for the independent variables used in our regression models, including race of householder, mean household income, mean age of householder, mean number of people per household, mean number of rooms, number of housing units by type, number of housing units built in different year ranges, number of housing units using each home-heating fuel type, number of housing units in urban and rural areas, and number of housing units where the head of household works at home. Variables expressed as counts of housing units are normalized to the share of households in each zip code with the associated characteristic, which makes them comparable with the binary variables in the RECS dataset. For temperature data, we use climate division daily average temperatures from the NCDC (2009), averaging heating and cooling degree-days relative to 65°F over the preceding five years to better reflect average annual climatic conditions. We used geographic information systems (GIS) to map zip codes to climate divisions using shape files provided by the NCDC; where a zip code overlaps more than one climate division, we assigned it to the division containing the majority of its land area. Fuel prices were taken from state-level average residential prices for 2005 provided by the EIA (2009d) and were fixed at 2005 values to correspond with the year of the RECS survey and to avoid implicitly assuming any short-term price elasticity.
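The census and climate preprocessing steps above can be sketched in a few small functions. This is a minimal illustration, assuming daily mean temperatures in °F are already available for each climate division and that the overlap area between a zip code and each candidate climate division has already been computed in a GIS; all function names are hypothetical and are not taken from the authors' code.

```python
from statistics import mean

BASE_TEMP_F = 65.0  # degree-day base temperature used in the text


def degree_days(daily_mean_temps_f, base=BASE_TEMP_F):
    """Return (heating degree-days, cooling degree-days) for one year of daily means."""
    hdd = sum(max(base - t, 0.0) for t in daily_mean_temps_f)
    cdd = sum(max(t - base, 0.0) for t in daily_mean_temps_f)
    return hdd, cdd


def multi_year_average(years_of_daily_temps):
    """Average annual HDD and CDD over several years (e.g., the five preceding years)."""
    totals = [degree_days(year) for year in years_of_daily_temps]
    return mean(h for h, _ in totals), mean(c for _, c in totals)


def counts_to_shares(unit_counts):
    """Turn housing-unit counts by category into household shares that sum to 1."""
    total = sum(unit_counts.values())
    if total == 0:
        return {category: 0.0 for category in unit_counts}
    return {category: n / total for category, n in unit_counts.items()}


def majority_region(overlap_area_by_region):
    """Assign a zip code to the region (e.g., climate division) holding most of its land area."""
    return max(overlap_area_by_region, key=overlap_area_by_region.get)


# Hypothetical zip code: heating-fuel counts and overlap areas with two divisions.
shares = counts_to_shares({"natural_gas": 500, "fuel_oil": 400, "electricity": 100})
division = majority_region({"division_A": 3.2, "division_B": 7.5})
```

The shares produced this way play the role of the RECS binary variables when the fitted models are evaluated at the zip code level.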

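Once we have created regression models for the four end uses as above, we plug the census and weather data $X_i^{\mathrm{census}}$ into equation (1) to predict zip code–level energy estimates $\hat{E}_j^{\mathrm{census}}$:

$$\hat{E}_j^{\mathrm{census}} = \exp\Big(\hat{\beta}_{j,0} + \sum_i \hat{\beta}_{j,i}\, X_i^{\mathrm{census}}\Big)$$
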
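The fit-then-predict workflow can be sketched as a weighted least squares fit on the log of end use energy, followed by exponentiation of the linear predictor at the zip code level. This sketch assumes NumPy and uses synthetic stand-ins for the RECS and census inputs; it applies the sampling weights by rescaling each row by the square root of its weight, which is one standard way to compute weighted least squares. It is not the authors' code, the function names are hypothetical, and no retransformation-bias correction is applied to the exponentiated predictions.

```python
import numpy as np


def fit_weighted_loglinear(X, energy, weights):
    """Weighted least squares fit of ln(energy) on predictors X plus an intercept.

    X: (n, k) predictor matrix; energy: (n,) positive end use energy;
    weights: (n,) positive sampling weights. Returns coefficients, intercept first.
    """
    X = np.asarray(X, dtype=float)
    y = np.log(np.asarray(energy, dtype=float))
    root_w = np.sqrt(np.asarray(weights, dtype=float))
    design = np.column_stack([np.ones(len(y)), X])
    # Multiplying each row by sqrt(weight) turns WLS into an ordinary least squares problem.
    beta, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return beta


def predict_energy(beta, X_zip):
    """Plug zip code-level predictors into the fitted model and exponentiate."""
    X_zip = np.asarray(X_zip, dtype=float)
    design = np.column_stack([np.ones(len(X_zip)), X_zip])
    return np.exp(design @ beta)


rng = np.random.default_rng(0)
X_recs = rng.normal(size=(200, 2))          # stand-in for RECS predictors
true_beta = np.array([10.0, 0.5, -0.2])     # hypothetical coefficients
energy = np.exp(true_beta[0] + X_recs @ true_beta[1:])
weights = rng.uniform(0.5, 2.0, size=200)   # stand-in for RECS sampling weights

beta_hat = fit_weighted_loglinear(X_recs, energy, weights)
X_census = np.array([[0.0, 0.0], [1.0, -1.0]])  # stand-in for census predictors
zip_estimates = predict_energy(beta_hat, X_census)
```

Because the synthetic data contain no noise, the fit recovers the hypothetical coefficients exactly; with real survey data the residual term in equation (1) would be nonzero.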
Modeling Energy Use by Fuel Type

From the estimated energy use per end use, we can estimate residential energy consumption by fuel type for both prediction and validation purposes. To do so, we first disaggregate each end use estimate into its component fuel types. First, we assume that all cooling energy comes from electricity, so all of $\hat{E}_{\mathrm{cool}}$ is added to electricity use. Second, water-heating and space-heating energy, $\hat{E}_{\mathrm{wh}}$ and $\hat{E}_{\mathrm{heat}}$, are divided among the four fuel types according to the regression coefficients and the percentage of households using each fuel in the zip code area. Because our model is log-linear, a dichotomous variable with coefficient β multiplies the dependent variable by e^β, that is, changes it by 100·(e^β − 1)%, which is approximately 100·β% when β is close to zero (since e^β ≈ 1 + β for small β). For example, according to figure 1, the fuel oil furnace variable has the coefficient 0.352; because e^0.352 ≈ 1.42, households using fuel oil heating equipment use about 42% more heating energy than otherwise identical households. From this relationship, we can disaggregate each end use energy for a representative household into energy use by fuel type. For a particular zip code area j, heating energy from gas for the representative household is:

$$\hat{E}^{\mathrm{NG}}_{\mathrm{heat},j} = \hat{E}_{\mathrm{heat},j}\, \frac{r_{j,\mathrm{NG}}\, e^{\beta_{\mathrm{NG}}}}{\sum_i r_{j,i}\, e^{\beta_i}}$$

Here $r_{j,i}$ is the percentage (expressed as a fraction) of households in zip code j that use fuel i as their main heating fuel. The same approach applies to all other space-heating fuel types. Disaggregating water-heating energy by fuel type requires additional steps, because the census data we use do not record water-heating fuel types by zip code. For this case, we built a multinomial logistic regression model with water-heating fuel type from the RECS dataset as the dependent variable and, as independent variables, variables for which both the RECS and the census have data. Once this model is fitted, we plug zip code–level census values of the independent variables into it to estimate the zip code–level shares of the different water-heating fuel types. With these estimated shares, we apply the same approach used to disaggregate space-heating energy to obtain estimates of water-heating energy use by fuel type for each zip code area.
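The two fuel-splitting steps described above can be sketched together: a multinomial logit turns zip code–level predictors into water-heating fuel shares, and an end use total is then split in proportion to each fuel's household share multiplied by e^β. The share-weighted split is our reading of the disaggregation equation rather than a confirmed reproduction of the authors' implementation, and all coefficients and function names below are hypothetical.

```python
import math


def multinomial_logit_shares(x, coefs):
    """Predicted fuel shares from a multinomial logit.

    coefs maps each fuel to (intercept, slopes); the reference fuel has zeros.
    """
    scores = {
        fuel: b0 + sum(b * xi for b, xi in zip(slopes, x))
        for fuel, (b0, slopes) in coefs.items()
    }
    top = max(scores.values())  # subtract the max score for numerical stability
    expd = {fuel: math.exp(s - top) for fuel, s in scores.items()}
    total = sum(expd.values())
    return {fuel: v / total for fuel, v in expd.items()}


def split_by_fuel(end_use_energy, shares, fuel_betas):
    """Split one end use's energy across fuels.

    Each fuel's weight is its household share times exp(beta), where beta is the
    log-linear coefficient of that fuel's dummy (0 for the reference fuel).
    """
    weights = {f: s * math.exp(fuel_betas.get(f, 0.0)) for f, s in shares.items()}
    total_weight = sum(weights.values())
    return {f: end_use_energy * w / total_weight for f, w in weights.items()}


# Hypothetical space-heating example with natural gas as the reference fuel.
heating_betas = {"NG": 0.0, "FO": 0.352, "EL": -1.188}
heating_shares = {"NG": 0.5, "FO": 0.4, "EL": 0.1}
heating_by_fuel = split_by_fuel(60_000.0, heating_shares, heating_betas)

# Hypothetical water-heating fuel model with two zip code-level predictors.
wh_coefs = {"NG": (0.0, [0.0, 0.0]), "EL": (0.3, [-0.8, 0.2]), "LP": (-2.0, [0.5, 0.1])}
wh_shares = multinomial_logit_shares([0.6, 1.2], wh_coefs)
```

Under this weighting, the implied per-household use of fuel oil households relative to gas households is exactly e^0.352, matching the coefficient interpretation in the text, and the fuel parts always sum back to the end use total.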

Third, because appliance energy consumption is less homogeneous than heating energy consumption, it cannot be divided as simply as with the method above. Lighting and refrigerators, for example, use only electricity, while stoves, ovens, pools, spas, dryers, and grills may use either gas or electricity. Because some households (54% according to the RECS data) use only electricity for appliances while others use multiple energy sources, we cannot treat all households in the same way when modeling appliance fuel use. Instead, we first built a regression model using only households with multiple fuel sources for appliances to estimate the ratio $\hat{r}_e$ of electricity to total appliance energy. Second, we estimated the probability $p_j$ that the representative household in zip code j uses only electricity for appliance energy. For this, we ran a logistic regression whose dependent variable is whether a household uses 100% electricity for appliances. Independent variables in this regression include ownership and fuel types of appliances such as stoves, ovens, spas, and clothes dryers. With this probability $p_j$, we can calculate the expected ratio $E[r_e]$ of electricity use for appliances in the region:

$$E[r_e] = p_j \cdot 1 + (1 - p_j)\,\hat{r}_e$$

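The expected electricity ratio for appliances combines the two fitted pieces as shown below. This sketch assumes the all-electric probability comes from a fitted logistic regression and that the electricity ratio for multifuel households has already been estimated; the coefficients, predictor choices, and function names are hypothetical.

```python
import math


def logistic_probability(x, intercept, slopes):
    """Probability from a fitted logistic regression: 1 / (1 + exp(-(b0 + b.x)))."""
    z = intercept + sum(b * xi for b, xi in zip(slopes, x))
    return 1.0 / (1.0 + math.exp(-z))


def expected_electric_ratio(p_all_electric, r_e_multifuel):
    """E[r_e] = p_j * 1 + (1 - p_j) * r_e for a representative household."""
    if not (0.0 <= p_all_electric <= 1.0 and 0.0 <= r_e_multifuel <= 1.0):
        raise ValueError("probability and ratio must lie in [0, 1]")
    return p_all_electric + (1.0 - p_all_electric) * r_e_multifuel


def split_appliance_energy(appliance_energy, p_all_electric, r_e_multifuel):
    """Divide appliance energy into electricity and other fuels using E[r_e]."""
    ratio = expected_electric_ratio(p_all_electric, r_e_multifuel)
    return {"electricity": appliance_energy * ratio,
            "other_fuels": appliance_energy * (1.0 - ratio)}


# Hypothetical zip code: predictors could be shares of gas stoves and gas dryers.
p_j = logistic_probability([0.3, 0.2], intercept=1.0, slopes=[-2.5, -1.5])
appliance_split = split_appliance_energy(30_000.0, p_j, r_e_multifuel=0.7)
```

With p_j = 0.54 and a multifuel ratio of 0.7, for instance, the expected electricity share of appliance energy is 0.54 + 0.46 × 0.7 = 0.862.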
The fuel-type disaggregation methods for water-heating and appliance energy will certainly introduce additional uncertainty into the consumption estimates for each energy source. However, this method still provides unbiased estimates of the mean because of the properties of the OLS method. To quantify the uncertainty of a random variable that is the product of other random variables, we need the correlation coefficients between the original variables (e.g., between water-heating energy use and type of water-heating fuel) for each zip code area (Simon 2006). In practice, the census dataset does not provide this information, so we cannot quantify these uncertainties.
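To illustrate why the correlation matters, the sketch below assumes, purely for illustration, that an end use total X and a fuel share Y are bivariate normal; the text does not specify any distribution. Under that assumption Var(XY) has a closed form that contains the correlation ρ, and a Monte Carlo simulation reproduces it. The means, standard deviations, and function names are hypothetical.

```python
import math
import random


def product_variance_bivariate_normal(mx, my, sx, sy, rho):
    """Exact Var(XY) for bivariate normal (X, Y) with correlation rho."""
    return (mx**2 * sy**2 + my**2 * sx**2
            + 2.0 * mx * my * rho * sx * sy
            + sx**2 * sy**2 * (1.0 + rho**2))


def simulated_product_variance(mx, my, sx, sy, rho, n=100_000, seed=1):
    """Monte Carlo estimate of Var(XY) using correlated normal draws."""
    rng = random.Random(seed)
    products = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
        products.append(x * y)
    m = sum(products) / n
    return sum((p - m) ** 2 for p in products) / (n - 1)


# Hypothetical end use energy (mean 10, sd 2) and fuel share (mean 0.5, sd 0.1).
params = (10.0, 0.5, 2.0, 0.1)
exact = {rho: product_variance_bivariate_normal(*params, rho) for rho in (-0.5, 0.0, 0.5)}
```

With these hypothetical parameters the variance of the product nearly triples between ρ = −0.5 and ρ = 0.5, which is why the uncertainty cannot be stated without zip code–level correlations.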

Model Specification and Uncertainty

In terms of model specification, we chose log-linear models over linear ones for three main reasons:

  • 1) Log models gave us higher adjusted R² values for given combinations of predictor variables.
  • 2) We compared the average widths of prediction intervals at the 95% confidence level, and log-linear models consistently have narrower intervals for the given combinations of predictor variables (table 1).
  • 3) Normal probability plots of prediction residuals are closer to linear for log-linear models (figure 2).
Table 1.  Comparison of Linear and Log-Linear Models

Statistic                     Model    Values by end use model
Adj. R²                       Linear   0.594     0.295     0.490     0.409
Prediction interval (kBtu)    Linear   4,981.1   3,766.7   1,355.8   2,168.7

Note: kBtu = kilo British thermal units; Adj. R² = adjusted R², which is a measure of the goodness of fit of the line.

Figure 2.

Standardized normal probability plots comparing log-linear and linear models. These graphs show that for the same combination of independent variables, log-linear models are the better specification compared with linear models.

Prediction intervals here are calculated at the mean points for the explanatory variables for each zip code area. To do this, we executed mean-centered regressions for the four models and observed the 95% confidence intervals around the intercepts.
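The mean-centering step can be illustrated with a single predictor: after subtracting the mean of x, the OLS intercept equals the fitted value at the mean, so its confidence interval is the interval for the mean response at that point. This is a minimal sketch with one unweighted predictor and a normal critical value of 1.96, a large-sample assumption; the authors' models are weighted and multivariate, and the function name is hypothetical.

```python
import math
from statistics import mean


def centered_intercept_ci(x, y, z=1.96):
    """Fit y = a + b * (x - mean(x)) by OLS and return (a, lower, upper).

    After centering, the intercept is the fitted value at the mean of x, and its
    standard error reduces to s / sqrt(n).
    """
    n = len(x)
    xbar = mean(x)
    xc = [xi - xbar for xi in x]
    sxx = sum(v * v for v in xc)
    b = sum(v * yi for v, yi in zip(xc, y)) / sxx
    a = mean(y)  # with a centered regressor, the OLS intercept equals the mean of y
    resid = [yi - (a + b * v) for yi, v in zip(y, xc)]
    s2 = sum(r * r for r in resid) / (n - 2)
    se_a = math.sqrt(s2 / n)
    return a, a - z * se_a, a + z * se_a


a, lower, upper = centered_intercept_ci([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```

The same idea carries over to several predictors: centering every regressor at its mean makes the intercept's standard error the uncertainty of the prediction at the mean point.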

Detecting Multicollinearity

One of the major problems facing statistical models of residential energy use characteristics is multicollinearity, which often results in poor predictions of certain end uses (Swan and Ugursal 2009). Multicollinearity commonly arises with variables that tend to be correlated, such as household income and the square footage of the house. However, except for the squared terms (e.g., hd65sq) that we intentionally added to capture changes in a factor's marginal impact, the correlation between any two variables does not exceed 0.5. Moreover, variance inflation factors for the four regressions range from 1.35 to 3.95 (table 2). Given that the rule-of-thumb thresholds for detecting collinearity are 5 or 10 (Hair 1995; Menard 2002), our models show no noticeable multicollinearity. The F values of the models¹ indicate that the sample size of 4,382 is large enough for our models to be statistically significant.

Table 2.  Variance Inflation Factor (VIF) and F Values for Each Regression Model
VIF    3.1    3.95    1.35    1.72
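Per-predictor variance inflation factors can be computed from the predictor correlation matrix, as sketched below with NumPy and synthetic data. The text does not say how the single value per model in table 2 was summarized from per-predictor VIFs, so this sketch stops at per-predictor values; the function names are hypothetical.

```python
import numpy as np


def vif_from_corr(corr):
    """VIFs as the diagonal of the inverse predictor correlation matrix.

    This equals 1 / (1 - R_k^2), where R_k^2 comes from regressing predictor k
    on all the other predictors.
    """
    return np.diag(np.linalg.inv(np.asarray(corr, dtype=float)))


def vif_from_data(X):
    """VIFs for the columns of an (n samples x k predictors) matrix."""
    return vif_from_corr(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))


rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=5000), rng.normal(size=5000)
independent = np.column_stack([x1, x2])
collinear = np.column_stack([x1, x2, x1 + x2 + 0.05 * rng.normal(size=5000)])
```

Two predictors with a pairwise correlation of 0.5, the ceiling reported in the text, give a VIF of 1/(1 − 0.25) ≈ 1.33 each, well below the 5 or 10 thresholds.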

Dealing with Endogeneity

Three of the four model equations include the estimated 2005 residential electricity price as a variable, based on EIA statewide average residential electricity price data (EIA 2009d). Because electricity price and energy consumption may be determined simultaneously, the price variable could be endogenous. To test for endogeneity, we ran Hausman-Wu tests comparing instrumental variable estimates with OLS estimates; the tests did not reject the hypothesis that electricity price is exogenous at the 5% significance level (table 3).

Table 3.  Hausman-Wu Test Results for Each Regression Model
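The text compares instrumental variable and OLS estimates directly. A common regression-based way to run a Durbin-Wu-Hausman test for a single suspect regressor is the augmented (control-function) regression sketched below, which is a related formulation rather than the authors' exact computation. The paper does not name its instrument, so the instrument z here is hypothetical, the sketch uses NumPy and simulated data, and the standard errors assume homoskedastic errors.

```python
import numpy as np


def _ols_with_se(design, y):
    """OLS coefficients, residuals, and conventional (homoskedastic) standard errors."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    n, k = design.shape
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return beta, resid, se


def dwh_t_stat(y, x, z, controls=None):
    """Regression-based Durbin-Wu-Hausman test for one suspect regressor x.

    Stage 1: regress x on the instrument z (plus controls) and keep the residual.
    Stage 2: add that residual to the outcome regression; a large |t| on it is
    evidence that x is endogenous.
    """
    n = len(y)
    base = np.ones((n, 1)) if controls is None else np.column_stack([np.ones(n), controls])
    _, v_hat, _ = _ols_with_se(np.column_stack([base, z]), x)
    beta, _, se = _ols_with_se(np.column_stack([base, x, v_hat]), y)
    return beta[-1] / se[-1]


rng = np.random.default_rng(42)
n = 2000
z = rng.normal(size=n)   # hypothetical instrument
u = rng.normal(size=n)   # unobserved shock shared by x and y
e = rng.normal(size=n)
x_endog = z + u
y_endog = 1.0 + 2.0 * x_endog + 3.0 * u + e
x_exog = z + rng.normal(size=n)
y_exog = 1.0 + 2.0 * x_exog + e

t_endog = dwh_t_stat(y_endog, x_endog, z)
t_exog = dwh_t_stat(y_exog, x_exog, z)
```

In the simulated endogenous case the t statistic on the first-stage residual is very large, while in the exogenous case it behaves like an ordinary t statistic near zero, which is the pattern a failure to reject exogeneity corresponds to.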

Potential Errors in the RECS Dataset

As shown in figure 1, the coefficient for electric space heating is large and negative. All else being equal, it implies that households with electricity as their primary space-heating fuel use only 30.5% (= e^−1.188) of the heating energy used by households with natural gas space heating. This ratio is strongly at odds with other published data, which indicate efficiency differences more in the 60% to 70% range (Energy Efficiency and Renewable Energy [EERE] 2010).
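The conversion from log-linear dummy coefficients to energy ratios used in this discussion is simple arithmetic, sketched below with the two coefficients quoted in the text. The function names are hypothetical.

```python
import math


def energy_ratio(beta):
    """Energy use with the dummy switched on, relative to the reference category."""
    return math.exp(beta)


def exact_dummy_effect(beta):
    """Relative change implied by a 0/1 dummy in a log-linear model: exp(beta) - 1."""
    return math.exp(beta) - 1.0


for name, beta in [("fuel oil furnace", 0.352), ("electric space heating", -1.188)]:
    print(f"{name}: beta = {beta:+.3f}, ratio = {energy_ratio(beta):.3f}, "
          f"exact change = {exact_dummy_effect(beta):+.1%}, 100*beta = {beta:+.1%}")
```

For β = 0.352 the exact change is about 42%, versus 35.2% from the first-order approximation; for β = −1.188 the approximation would imply an impossible decrease of more than 100%, so the exponential form is the one to use.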

This problem also appears in the published RECS data tables. For example, the RECS reports that in the New England Census Division, an average household with fuel oil as the primary heating fuel uses 102,500 kBtu for heating, while an average household with electric primary heating uses a mere 8,300 kBtu, despite large sample sizes for both fuel types; a difference this large is very unlikely in reality. We contacted the EIA office that administers the RECS, which confirmed that it would look into the issue. The discrepancy may be attributable to incorrect sampling or to an erroneous method used by the EIA to disaggregate electric bills into end use categories. Because we rely on the relationships in the RECS to determine end uses, we have no simple way to circumvent this problem. In addition, the errors that the RECS deliberately introduces into degree-day and price data to prevent identification of individual respondents add further uncertainty to model projections (EIA 1996).

Results and Validation

The model provides residential energy estimates by end use and fuel type for all zip codes in the United States included in the 2000 census. We can also calculate final energy use, secondary energy use, primary energy use, and carbon emissions, both direct and indirect, for each zip code using the associated North American Electric Reliability Corporation (NERC) subregion emission factors and thermal efficiencies (for primary energy) from the EPA's eGRID (EPA 2009). The EPA provided the specific mapping of zip codes to NERC subregions; where a zip code overlaps more than one subregion, we assigned it to the subregion containing most of its land area. State-level transmission loss data for electricity are taken from the EIA (2009c). Indirect emissions associated with the fuel cycle, plant construction, and plant decommissioning of natural gas, nuclear, oil, coal, solar, wind, biomass, geothermal, and hydro power are based on the dissertation by Meier (2002) and are differentiated for cases in which fuels are combusted within the residence and for electricity generation elsewhere. These indirect emissions increase the emission intensity of coal, oil, and natural gas by 2%, 5%, and 18%, respectively (the work by Meier [2002] did not assess the indirect fuel cycle emissions of propane, so for the purposes of the model we assume they are similar in magnitude to those of natural gas).
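A per-household emissions calculation along these lines can be sketched as follows. The combustion factors are approximate EPA fuel-combustion values included for illustration and should be checked against the sources the model actually uses; the grid emission rate and transmission loss fraction are hypothetical inputs standing in for the eGRID subregion rate and EIA state loss data; the uplifts are the fuel-cycle percentages quoted above; and the sketch omits indirect emissions on the electricity side. The function and constant names are hypothetical.

```python
# Approximate CO2 combustion factors, kg per MMBtu (illustrative EPA values).
COMBUSTION_KG_PER_MMBTU = {"natural_gas": 53.06, "fuel_oil": 73.96, "propane": 62.87}
# Fuel-cycle uplifts quoted in the text; propane assumed equal to natural gas.
FUEL_CYCLE_UPLIFT = {"natural_gas": 0.18, "fuel_oil": 0.05, "propane": 0.18}


def household_co2_kg(fuel_kbtu, electricity_kwh, grid_kg_per_mwh, loss_fraction):
    """Annual CO2 (kg) for a representative household in one zip code.

    fuel_kbtu: on-site fuel use by fuel in kBtu; electricity_kwh: delivered electricity;
    grid_kg_per_mwh: subregion emission rate at the generator; loss_fraction: share of
    generated electricity lost in transmission and distribution.
    """
    if not 0.0 <= loss_fraction < 1.0:
        raise ValueError("loss_fraction must be in [0, 1)")
    on_site = sum(
        (kbtu / 1000.0) * COMBUSTION_KG_PER_MMBTU[fuel] * (1.0 + FUEL_CYCLE_UPLIFT[fuel])
        for fuel, kbtu in fuel_kbtu.items()
    )
    # Delivered electricity is grossed up to the amount that had to be generated.
    generated_mwh = (electricity_kwh / 1000.0) / (1.0 - loss_fraction)
    return on_site + generated_mwh * grid_kg_per_mwh


example = household_co2_kg({"natural_gas": 60_000.0}, 10_000.0,
                           grid_kg_per_mwh=600.0, loss_fraction=0.065)
```

Grossing up delivered electricity by the loss fraction reflects that emissions occur at the plant for every kilowatt-hour generated, not only for those reaching the home.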

Model Validation

It is difficult to validate the accuracy of model outputs given the relative paucity of high-resolution data on residential energy use characteristics. However, the U.S. Department of Energy publishes reliable data on the state level for residential electricity, natural gas, and fuel oil use, which we can compare to our model outputs (EIA 2009d, 2009e, 2009f). Figure 3 shows how, for state-level 2008 residential fuel use, our model mirrors the variation in per-household residential energy use characteristics between states, with estimated use being within 10% of actual use in most cases, though there are a few notable outliers.
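The state-level comparison can be sketched as follows: per-household use is state residential sales divided by occupied housing units, and each state's relative error is checked against a 10% band. The numbers below are hypothetical and the function names are illustrative.

```python
def per_household(state_sales, occupied_units):
    """Average per-household use: state residential sales / occupied housing units."""
    return {s: state_sales[s] / occupied_units[s] for s in state_sales if s in occupied_units}


def relative_errors(modeled, actual):
    """(modeled - actual) / actual for each state present in both mappings."""
    return {s: (modeled[s] - actual[s]) / actual[s] for s in modeled if s in actual}


def share_within(errors, tolerance=0.10):
    """Fraction of states whose absolute relative error is at most the tolerance."""
    if not errors:
        return 0.0
    return sum(abs(e) <= tolerance for e in errors.values()) / len(errors)


# Hypothetical annual residential electricity sales (kWh) and occupied units.
actual = per_household({"A": 1.0e9, "B": 2.0e9, "C": 3.0e9},
                       {"A": 100_000, "B": 200_000, "C": 300_000})
modeled = {"A": 11_000.0, "B": 9_500.0, "C": 13_000.0}
errors = relative_errors(modeled, actual)
```

Here two of the three hypothetical states fall within 10% of actual use, and state C stands out as an outlier in the way the following paragraphs discuss.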

Figure 3.

Modeled and actual residential electricity, natural gas, fuel oil, and total energy use by state. Residential use data from the Energy Information Administration are divided by the number of occupied housing units in each state in 2008 to obtain the average per-household residential energy use characteristics by state. The dotted line represents the expected value if the estimated energy use and the actual energy use were equal.

Estimated and actual use vary considerably in a few cases. We overestimate electricity use in the District of Columbia, though this may be due in part to the small number of households in the district relative to the states. We also overestimate per-household electricity use in Utah by 15%, which may be due in part to the limitations of degree-days based on daily average temperatures as a proxy for cooling requirements in places with high diurnal temperature variability (Baumert and Selman 2003). Many of the states in New England also use somewhat less electricity (5%–15%) than our model predicts.

For natural gas, both Alaska and Hawaii are major outliers: the model underestimates Alaska's average per-household natural gas use by 50% and overestimates Hawaii's by a factor of almost 13. In Hawaii's case, this reflects a characteristic unique to the state, namely that almost no natural gas is available there. While we model nearly no natural gas use for space heating and water heating in Hawaii, we overestimate the amount used for appliances. Natural gas estimates are also somewhat high for Maine.

The model is quite good at estimating which states do and do not use fuel oil, but it tends to systematically overestimate fuel oil use in states where per-household usage is negligible. In most cases, however, the overestimate is still a negligible portion of total per-household energy use. The model notably underestimates fuel oil use in both Alaska and the District of Columbia.

For total energy use, the two major outliers are, unsurprisingly, Alaska and Hawaii. In the past, the RECS dataset did not sample houses in either state (EIA 1996), and while that has changed in recent years, characteristics of the survey coverage of those states may be at least partially responsible for the anomalous model results.

Changes in residential energy characteristics between 2000 and 2008 are major potential drivers of differences between estimated and observed energy use on a state level. The model uses 2000 U.S. census data for most of the zip code–specific energy use characteristics (the major exceptions being fuel prices, for which we use 2005 numbers to be in line with the RECS and avoid assuming any price elasticity of demand, and weather data, which we obtain for 2008), and many characteristics of the housing stock may have changed in the interim. However, validating the model against 2000 energy sales data would pose its own complications, given that the RECS survey underlying the modeled relationships between energy use characteristics and end use magnitudes was conducted in 2005. The release of the 2009 RECS study (in 2012) and the 2010 U.S. census will provide a good opportunity to update and further validate the model using reasonably concurrent data sources.

Additional uncertainties arise because the model is tuned to provide the best estimates of end use energy based on the values in the RECS survey, but those values are themselves estimates based on household bills and home characteristics. There is little truly empirical data on residential energy by end use outside of small-scale, spatially limited studies, due in part to the prohibitively high cost of submetering (Swan and Ugursal 2009). As discussed previously, we have reason to believe that the RECS dataset significantly underestimates electricity use where electricity is the primary space-heating and water-heating fuel. Because RECS end use distributions are constrained by the totals on household bills, any errors in the distribution may assign too much electricity use to appliances relative to other end uses and too little to space and water heating. This could lead our model to underestimate space-heating and water-heating energy demand, especially in areas with more extreme temperatures.

Finally, zip code–level U.S. census data have non-negligible sampling error for very small geographic areas, which may make calculations for certain zip codes less accurate, though the net effect on state-level aggregations should be minimal (U.S. Bureau of the Census 2007). The U.S. census does not provide any zip code–level data on water-heating fuel type or on the gas versus electric composition of home appliances, and the secondary regressions used to derive these two factors (shown in the online supporting information) add uncertainty, though they are necessary to calculate a distribution of end uses by fuel type.

We have also compared our model outputs for specific zip codes with 2008 electricity bills from 3,662,839 individual households in 523 zip codes in Illinois, obtained from Commonwealth Edison (ComEd) and aggregated to the zip code level. For model validation, this set was pared down to 413 zip codes by eliminating 92 with exceptionally small populations (fewer than 150 households) and an additional 18 that were missing one or more necessary pieces of U.S. census data (likely because the zip code changed or was created after the 2000 census). We also identified one obviously anomalous zip code, which we excluded after confirming with ComEd that it contained a large individually metered RV (recreational vehicle, or camper) park with an order of magnitude more units than the zip code's actual housing stock. Finally, we identified a shortcoming of our model arising from the income categorization in the RECS: the highest income category specified in the RECS is $120,000+. As a result, our appliance model does not correctly account for diminishing returns to appliance use as income increases, and it tends to produce unrealistically high appliance energy use values for the rare zip codes where median household income exceeds $200,000. To correct for this, we removed the five zip codes that exceed this income threshold, leaving 407.

Figure 4 shows the modeled and actual annual electricity use for 2008 for the 407 zip codes in the ComEd service territory. Our model accounts for more than 80% of the variability in electricity use between these zip codes (R² = 0.814), though the modeled values are consistently high. This may be due in part to changes in housing stock and characteristics since the 2000 census. It is also in line with our statewide Illinois estimate of 2008 residential electricity use, which is about 12% higher than actual residential sales data from the EIA. As we continue to work with utility programs, we will have further opportunities to compare our model to actual billing data, particularly after we update the model with upcoming 2010 U.S. census microdata.
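A high R² and a consistent upward bias can coexist. The sketch below assumes that R² is the squared Pearson correlation between modeled and actual values (equivalently, the fit of a linear regression of actual on modeled use); the text does not state how R² was computed. Under that assumption, a uniform multiplicative overestimate leaves R² unchanged while showing up in the aggregate bias. Function names are illustrative.

```python
def r_squared(modeled, actual):
    """Squared Pearson correlation between modeled and actual values.

    Assumption: R^2 is treated as the fit of a linear regression of actual on
    modeled use, so a constant multiplicative bias does not lower it.
    """
    n = len(modeled)
    mx = sum(modeled) / n
    my = sum(actual) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(modeled, actual))
    sxx = sum((x - mx) ** 2 for x in modeled)
    syy = sum((y - my) ** 2 for y in actual)
    return sxy ** 2 / (sxx * syy)


def aggregate_bias(modeled, actual):
    """Fractional overestimate of total modeled use relative to total actual use."""
    return sum(modeled) / sum(actual) - 1.0
```

For example, scaling every actual value up by 12% yields an R² of 1 and an aggregate bias of 0.12.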

Figure 4.

Expected and actual 2008 electricity use by zip code for the ComEd service territory. Actual values are based on zip code–level average daily energy use data obtained from ComEd for four groups (single houses without electric heating, single houses with electric heating, multiunit buildings without electric heating, and multiunit buildings with electric heating), weighted by the population of each group to create a composite average for each zip code.
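The composite described in the caption is a weighted mean of the four group averages. A minimal sketch, assuming household counts serve as the group weights and that the composite daily value is annualized over the 366 days of 2008 (a leap year); the group labels and function names are hypothetical.

```python
# Four building/heating groups reported by the utility (labels hypothetical).
GROUPS = ("single_no_eheat", "single_eheat", "multi_no_eheat", "multi_eheat")


def composite_daily_kwh(daily_kwh, counts):
    """Weight each group's average daily kWh by its household count in the zip code."""
    total = sum(counts[g] for g in GROUPS)
    return sum(daily_kwh[g] * counts[g] for g in GROUPS) / total


def annual_kwh(daily_kwh, counts, days=366):
    """Annualize the composite; 2008 was a leap year, hence 366 days by default."""
    return composite_daily_kwh(daily_kwh, counts) * days
```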

Model Outputs

We can use GIS software to map the end use and fuel-specific outputs of our model for each individual zip code, as shown in figures 5, 6, and 7. Note that these maps reflect the mean house in each zip code and are not weighted by the zip code's population. The mean house has a probabilistic assignment of factors such as appliance ownership and space- and water-heating fuel types, so it does not represent any specific physical house but rather the average of all houses in the zip code. This accounts for the fact that, in a hypothetical zip code, 50% of houses may use natural gas for space heating, 40% fuel oil, and 10% electricity.
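The probabilistic "mean house" can be illustrated as an expected value over fuel shares. This sketch uses the 50/40/10 split from the text, but the per-fuel heating energy values in the usage example are hypothetical, and the function is an illustration of the idea rather than the model's implementation.

```python
def mean_house_heating(fuel_shares, kbtu_if_fuel):
    """Probability-weighted heating energy, by fuel, for a zip code's mean house.

    fuel_shares: {fuel: share of houses heating with that fuel}, summing to 1.
    kbtu_if_fuel: {fuel: heating kBtu for a house using that fuel} (hypothetical).
    """
    if abs(sum(fuel_shares.values()) - 1.0) > 1e-9:
        raise ValueError("fuel shares must sum to 1")
    return {fuel: share * kbtu_if_fuel[fuel] for fuel, share in fuel_shares.items()}


shares = {"natural_gas": 0.5, "fuel_oil": 0.4, "electricity": 0.1}
per_fuel = mean_house_heating(
    shares, {"natural_gas": 60_000, "fuel_oil": 70_000, "electricity": 30_000}
)
total = sum(per_fuel.values())  # about 61,000 kBtu for these illustrative inputs
```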

Figure 5.

Spatial maps of average per-household residential energy by end use. The maps display model outputs in kBtus for each zip code for appliance energy in the upper left, heating energy in the upper right, cooling energy in the lower left, and water heating energy in the lower right. The legend represents the number of kBtus used by the average house over the course of 2008 in zip codes illustrated by the corresponding density of shading. See the online supporting information for high-resolution color versions of these maps.

Figure 6.

Spatial maps of average per-household residential energy by fuel type. The maps display model outputs for each zip code for fuel oil in the upper left, natural gas in the upper right, electricity in the lower left, and propane in the lower right. The legend represents the number of kBtus used by the average house over the course of 2008 in zip codes illustrated by the corresponding density of shading. See the online supporting information for high-resolution color versions of these maps.

Displaying average residential per-household energy by end use presents an interesting problem because end use efficiencies differ by fuel (National Academy of Science–National Research Council 2008). Specifically, electric devices tend to be significantly more efficient at converting secondary energy into useful energy than devices using primary energy fuels like natural gas (EIA 2008). A map of point-of-use energy that did not account for the greater usefulness of each unit of electric energy would therefore largely reflect differences in fuel type rather than differences in demand for hot water, cooling, heating, or appliance services. A map of energy use that takes into account full fuel cycle factors for electricity would have the opposite problem: differences in generation efficiencies between grid regions would swamp differences in actual end use magnitudes, at least for appliances and water heating (cooling and heating remain dominated by temperature differences under both point-of-use and full fuel cycle metrics). As a compromise, and to best reflect differences in end use demand across the entire country, we calculate a point-of-use primary energy value using a constant generation efficiency for electricity, similar to the methods that Brown and colleagues (2008) and others have used to compare total end use across different regions of the country. The constant generation efficiency (approximately 41.9%) was chosen to reflect the mean thermal generation efficiency of all kilowatt-hours used by residences in our model, based on data from eGRID (EPA 2009). To reflect differences in use rather than exogenous factors, we do not include spatially explicit transmission losses in the energy end use maps, though these full fuel cycle factors and transmission losses are included in the estimated residential carbon emissions map.
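The constant-efficiency conversion amounts to dividing the site energy content of each kilowatt-hour (3.412 kBtu) by the 41.9% generation efficiency, which multiplies site electricity by about 2.39. A minimal sketch; the function name is illustrative, and, as in the end use maps, no transmission losses are added. Fuels burned on site, such as natural gas, would be left at their site values.

```python
KBTU_PER_KWH = 3.412           # site energy content of 1 kWh
GENERATION_EFFICIENCY = 0.419  # constant thermal generation efficiency from the text


def electricity_primary_kbtu(kwh, efficiency=GENERATION_EFFICIENCY):
    """Convert site electricity to primary energy at a constant generation efficiency.

    No spatially explicit transmission or distribution losses are added.
    """
    return kwh * KBTU_PER_KWH / efficiency


# 10,000 kWh is 34,120 kBtu at the meter and roughly 81,432 kBtu of primary energy.
primary = electricity_primary_kbtu(10_000)
```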

The first set of maps, in figure 5, shows residential energy by end use in kBtus for all 31,467 zip codes for which all needed data are available, covering roughly 99.5% of the U.S. population. For uninhabited areas (e.g., geographical features such as mountain ranges) and areas where data are missing, we in-filled values using GIS, based on the average of surrounding zip codes. Lakes and rivers were left uncolored. As noted above, we use primary energy with a constant generation efficiency for electricity to allow a more meaningful comparison between electricity and other fuels.
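The in-filling step can be approximated as neighbor averaging. This sketch assumes a precomputed adjacency list of bordering zip codes standing in for the GIS operation; the data structures and function name are hypothetical. A single pass fills only areas that border at least one zip code with data.

```python
def infill_missing(values, neighbors):
    """Fill zip codes lacking data with the mean of adjacent zips that have data.

    values: {zip: float or None}; neighbors: {zip: [adjacent zips]}.
    Assumption: one pass over an adjacency list stands in for the GIS step,
    so only original (not newly filled) neighbor values are averaged.
    """
    filled = dict(values)
    for z, v in values.items():
        if v is None:
            known = [values[n] for n in neighbors.get(z, []) if values.get(n) is not None]
            if known:
                filled[z] = sum(known) / len(known)
    return filled
```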

The residential appliance energy use map displays an interesting “halo” effect around many urban areas: the urban area itself tends to have lower appliance energy use than either the surrounding suburbs or rural areas. Surrounding the urban area is a bright red ring of suburbs with relatively high appliance energy use, fading into more rural areas with lower appliance energy use. This pattern reflects the strong role played by income, house size, and house type in determining appliance energy use. For example, in the 1,713 zip codes with a population density of over 5,000 people per square mile, appliance electricity use is 28% lower than the U.S. average. Similarly, the 6,320 zip codes with a population density of over 1,000 people per square mile use 12% less appliance energy than average. Water-heating energy use shows a similar, though somewhat less pronounced, influence of urban form.
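Comparisons like the density figures above reduce to the percent difference between a subset mean and an overall mean. The sketch below assumes unweighted means across zip codes, consistent with maps of the mean house that are not population weighted; the text does not say which averaging was used for these statistics, and the data in the usage example are made up.

```python
def pct_diff_above_density(zips, threshold):
    """Percent difference in mean appliance energy for dense zips vs. all zips.

    zips: list of (population_density, appliance_kbtu) tuples, with density in
    people per square mile. Assumption: unweighted means across zip codes.
    """
    overall = sum(e for _, e in zips) / len(zips)
    dense = [e for d, e in zips if d > threshold]
    return 100.0 * (sum(dense) / len(dense) / overall - 1.0)


sample = [(6000, 14.0), (1500, 18.0), (200, 22.0), (50, 26.0)]  # made-up data
above_5000 = pct_diff_above_density(sample, 5000)  # about -30%
above_1000 = pct_diff_above_density(sample, 1000)  # about -20%
```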

Residential heating energy use and cooling energy use follow an expected pattern dictated by temperature, with warmer areas requiring less heating and more cooling energy than colder areas. However, heating equipment, fuel, house size, house age, thermostat settings, and other factors also play a large role and are more clearly visible in higher-resolution regional views that avoid being swamped by temperature-induced variations.

The particularly high heating energy use in the Northeast is largely an artifact of the lower efficiency of fuel oil heating relative to other heating types rather than of any systemic difference in heating demand; when efficiencies are standardized, heating requirements are similar to those of the Midwest. The relatively low heating energy use in Montana and northern Minnesota may be due to the inclusion of the squared degree day term in the regression function, which can produce somewhat unrealistic results in areas of extreme cold.
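One way a squared degree day term can understate heating in very cold places is shown below. The sketch assumes a simplified specification in which heating energy depends on heating degree days (HDD) through a linear and a squared term, with other regressors held fixed. The coefficients and their signs are assumptions for illustration and are not the article's estimates. If \(\beta_1 > 0\) and \(\beta_2 < 0\), predicted heating energy peaks at \(\mathrm{HDD}^*\) and then declines as HDD increases further:

```latex
E_{\mathrm{heat}} = \beta_0 + \beta_1\,\mathrm{HDD} + \beta_2\,\mathrm{HDD}^2 + \cdots
\qquad
\frac{\partial E_{\mathrm{heat}}}{\partial\,\mathrm{HDD}} = \beta_1 + 2\beta_2\,\mathrm{HDD} = 0
\;\Longrightarrow\;
\mathrm{HDD}^* = -\frac{\beta_1}{2\beta_2}
```

Under these assumptions, zip codes with degree days beyond \(\mathrm{HDD}^*\) would receive lower predicted heating energy than somewhat milder locations.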

The maps of residential energy use by fuel type in figure 6 reflect the division of space heating, water heating, and appliances into specific fuel uses and the aggregation of the resulting values by fuel type. Note that propane and fuel oil are modeled here as pertaining only to space heating and water heating, because data on propane usage rates for appliances were not available in RECS and appliance fuel oil use is virtually nonexistent. Similarly, all cooling energy use is assigned to electricity. The model can produce maps of fuel use by specific end use category, though they are not included here.
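The aggregation from end use and fuel to fuel totals can be sketched as a grouped sum that also enforces the modeling restrictions just described: cooling is electric only, and propane and fuel oil apply only to space and water heating. The key names and function name are hypothetical.

```python
from collections import defaultdict


def totals_by_fuel(end_use_fuel_kbtu):
    """Sum {(end_use, fuel): kBtu} estimates into per-fuel totals.

    Raises ValueError for combinations the model does not represent.
    """
    totals = defaultdict(float)
    for (end_use, fuel), kbtu in end_use_fuel_kbtu.items():
        if end_use == "cooling" and fuel != "electricity":
            raise ValueError("cooling is modeled as electric only")
        if end_use == "appliances" and fuel in ("propane", "fuel_oil"):
            raise ValueError("propane and fuel oil are limited to space and water heating")
        totals[fuel] += kbtu
    return dict(totals)
```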

Residential fuel oil use is largely limited to the northeastern United States, with some minor use in North Carolina and Virginia, as well as in scattered parts of the Midwest and Northwest. Cities in those regions tend to use less fuel oil than more rural areas. Residential electricity use draws on all four end use categories and reflects a number of different factors, cooling energy use in particular. Electricity use is particularly high in the southeastern United States and relatively low in California and the Northeast.

Cities and suburban areas tend to have fairly high residential natural gas use relative to more rural areas. However, there are also a number of noticeable areas of natural gas use in lower population areas corresponding to the location of natural gas pipelines (e.g., in upstate New York). Even in states with relatively low natural gas usage, such as Georgia, there is still a noticeable increase in use surrounding urban areas (e.g., Atlanta).

Propane use for heating and water heating is almost an inverted image of the natural gas use map, with urban and suburban areas using virtually no propane and with high use in many rural areas, particularly in the Midwest. Other fuel use (e.g., wood, solar, and coal) is not shown here, but tends to be clustered in the Appalachian region and in the Pacific Northwest.

The final two maps, in figure 7, show average household carbon emissions and total primary energy use for each zip code. The carbon emissions map follows NERC subregion boundaries, though there is still strong differentiation within subregions, particularly between urban, suburban, and rural areas, that is difficult to see on the aggregate national map. Average household carbon emissions vary by a factor of about 2.6 across zip codes, from 7,012 kilograms (kg) in the lowest 5% of zip codes to 18,471 kg in the highest 5%. Rural areas in the Midwest tend to have the highest average household carbon emissions, due in large part to the high share of their electricity that comes from coal-fired generation. California has the lowest average household carbon emissions, followed by the Northwest and Northeast (though some wealthy communities in Connecticut and on Long Island have sizable average household carbon emissions).
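Household emissions combine electricity, scaled by a regional grid emission rate and grossed up for delivery losses, with direct combustion of on-site fuels. This is a minimal sketch under stated assumptions. The combustion factors are illustrative values close to EPA's published CO2 factors, not necessarily those used in the model. The grid rate and the 6.5% loss fraction are hypothetical inputs. Upstream fuel cycle emissions beyond line losses are omitted.

```python
# Direct combustion CO2 factors in kg per kBtu (illustrative; close to EPA's
# published values, not necessarily the factors used in the model).
FUEL_KG_PER_KBTU = {"natural_gas": 0.05306, "fuel_oil": 0.07396, "propane": 0.06287}


def household_co2_kg(kwh, fuel_kbtu, grid_kg_per_kwh, td_loss=0.065):
    """Annual household CO2 (kg): grid electricity plus on-site fuel combustion.

    grid_kg_per_kwh: subregion emission rate per kWh generated (hypothetical input).
    td_loss: assumed fraction of generated electricity lost before the meter.
    """
    generated_kwh = kwh / (1.0 - td_loss)
    electricity_kg = generated_kwh * grid_kg_per_kwh
    fuel_kg = sum(FUEL_KG_PER_KBTU[f] * q for f, q in fuel_kbtu.items())
    return electricity_kg + fuel_kg
```

For scale, the 5th and 95th percentile values reported above differ by a ratio of 18,471 / 7,012, or about 2.6.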

Figure 7.

Spatial maps of average per-household carbon emissions and total energy use. The maps display model outputs for each zip code for total primary energy use (in kBtus) on the left and carbon emissions (in kg) on the right. For the total energy map, the legend represents the number of kBtus used by the average house over the course of 2008 in zip codes illustrated by the corresponding density of shading. For the carbon emissions map, the legend represents the number of kilograms (kg) of carbon dioxide that the average house was responsible for emitting over the course of 2008 in zip codes illustrated by the corresponding density of shading (1 kilogram ≈ 2.2 pounds). See the online supporting information for high-resolution color versions of these maps.

Total primary energy use tends to be dominated by heating (the single largest energy end use for the residential sector as a whole), but it still reveals some interesting patterns. California stands out as having particularly low total energy use, while rural areas in the Midwest and suburban areas in the Northeast show particularly high total energy use. Households in the 100 largest cities in the United States use an average of 86,700 kBtus, quite similar to the 88,400 kBtus found by Brown and Logan (2008) and lower than the national household average of 101,800 kBtus.


This model presents a novel approach to modeling residential energy by both end use and fuel type for the entire United States at a resolution higher than that of any previous model. It provides an in-depth look at how residences in different parts of the country use energy, and at the variation in home energy use characteristics both within and across regions. There are numerous possible research applications for these data, such as examining rural/urban divides in residential energy use in different parts of the country and patterns in residential fuel use. Additionally, the model can be combined with fuel-specific price data to estimate end use costs for residences. The model can also be used to examine the impact of various end use reduction actions on different zip codes or regions based on the average characteristics of homes therein. For example, a priori estimates of average annual heating and cooling use by zip code could help assess potential HVAC programs and home retrofits, and help target efficiency programs to the regions where they would be most effective.
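Combining the model with fuel prices, as suggested above, amounts to multiplying each end use's energy by the price of its fuel and summing within end uses. A minimal sketch with hypothetical prices and names. One assumption to note: electricity should enter as site (metered) energy, not the constant-efficiency primary values used for the maps, because tariffs are per delivered kilowatt-hour.

```python
KBTU_PER_KWH = 3.412


def per_kbtu(price_per_kwh):
    """Convert an electricity price per kWh into a price per site kBtu."""
    return price_per_kwh / KBTU_PER_KWH


def end_use_costs(kbtu_by_end_use_fuel, price_per_kbtu):
    """Annual cost per end use from {(end_use, fuel): site kBtu} and {fuel: $/kBtu}."""
    costs = {}
    for (end_use, fuel), kbtu in kbtu_by_end_use_fuel.items():
        costs[end_use] = costs.get(end_use, 0.0) + kbtu * price_per_kbtu[fuel]
    return costs


# Hypothetical prices: natural gas at $0.012/kBtu, electricity at $0.12/kWh.
prices = {"natural_gas": 0.012, "electricity": per_kbtu(0.12)}
costs = end_use_costs(
    {("heating", "natural_gas"): 50_000, ("cooling", "electricity"): 6_824}, prices
)
```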

We currently use a version of the model in the Personal Energy Advisor online home energy audit tool to estimate home energy use characteristics when neither an in-home audit nor complete user inputs for all home characteristics are available. It is supplemented by home-specific data on year built, square footage, number of rooms, and home type from county tax assessment offices, matched to users' home addresses. A variant of the model is also used to disaggregate individual monthly electricity and natural gas bills into component end uses. It can combine top-down estimates from available fuel bills with bottom-up model estimates for fuels without available bills (e.g., when electric bills are available but natural gas and fuel oil bills are not). When combined with specific housing unit data from county tax assessment offices and billing data from utility companies, the model opens up the possibility of sub-zip code–level analysis.
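One simple way to blend top-down bills with bottom-up estimates is to rescale the model's end use estimates for each billed fuel so that they sum to the observed total, and to keep model values for unbilled fuels. This is an assumed proportional-scaling approach for illustration, not a description of the authors' variant. It works on annual totals for simplicity, and all names are hypothetical.

```python
def disaggregate(model_kbtu, billed_totals):
    """Blend billed fuel totals with bottom-up model estimates for one home.

    model_kbtu: {(end_use, fuel): kBtu} from the model.
    billed_totals: {fuel: kBtu} observed on bills; fuels absent here keep model values.
    Assumption: a billed fuel is split by rescaling the model's end use shares.
    """
    model_by_fuel = {}
    for (_, fuel), kbtu in model_kbtu.items():
        model_by_fuel[fuel] = model_by_fuel.get(fuel, 0.0) + kbtu
    result = {}
    for (end_use, fuel), kbtu in model_kbtu.items():
        if fuel in billed_totals and model_by_fuel[fuel] > 0:
            result[(end_use, fuel)] = kbtu * billed_totals[fuel] / model_by_fuel[fuel]
        else:
            result[(end_use, fuel)] = kbtu
    return result
```

For example, if only an electric bill is available, electric end uses are scaled to match it while natural gas end uses keep their model estimates.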

The model developed in this article could be further tested by examining energy billing data from a representative sample of households in different zip codes, and by comparing the mean energy use values with those predicted. Additionally, the model's annual results could be expanded to estimate monthly energy use characteristics, particularly for heating and cooling, based on temperature data. The major limitation to statistical modeling of residential energy use at a subannual timescale is the absence of a national dataset analogous to RECS that tracks monthly energy end uses. Finally, the model can be used to more deeply explore relationships between urban form, population density, fuel availability, energy use, and carbon emissions in different regions of the country.
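As a first approximation to monthly estimates, an annual heating estimate could be prorated across months by each month's share of heating degree days. This is a simple assumed proration for illustration, not a fitted monthly model. It ignores non-temperature seasonality, which is exactly the kind of behavior a monthly end use dataset would be needed to capture. The function name and the degree-day values in the test are hypothetical.

```python
def monthly_heating(annual_kbtu, monthly_hdd):
    """Split an annual heating estimate across months in proportion to heating degree days.

    Assumption: a simple degree-day proration, not a fitted monthly model.
    """
    total = sum(monthly_hdd)
    if total == 0:
        return [0.0] * len(monthly_hdd)
    return [annual_kbtu * h / total for h in monthly_hdd]
```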


This work was made possible by the help and support of our employer, Efficiency 2.0. We also want to thank Chip Berry, who manages the RECS at EIA, for assistance with data access and interpretation; Mark Armstrong at the National Climatic Data Center for help with degree day data; Arnulf Grübler for indirectly inspiring the project; and three anonymous reviewers for a wealth of constructive comments.


1. F is the ratio of the model mean square to the error mean square, which indicates whether the model can make statistically significant predictions.
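For an ordinary least squares regression with an intercept, \(p\) predictors, and \(n\) observations, the ratio in note 1 takes the standard form below. Here SS denotes sums of squares and \(\hat{y}_i\) the fitted values. The symbols are introduced for illustration and do not come from the article.

```latex
F = \frac{\mathrm{MS}_{\mathrm{model}}}{\mathrm{MS}_{\mathrm{error}}}
  = \frac{\mathrm{SS}_{\mathrm{model}} / p}{\mathrm{SS}_{\mathrm{error}} / (n - p - 1)},
\qquad
\mathrm{SS}_{\mathrm{model}} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,
\quad
\mathrm{SS}_{\mathrm{error}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```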

About the Authors

Zeke Hausfather is the Executive Vice President for Energy Science and Jihoon Min and Qi Feng Lin are Senior Demand-Side Analysts at Efficiency 2.0 in New York City, USA.