Paper No. JAWRA-11-0134-P of the Journal of the American Water Resources Association (JAWRA). This article is a U.S. Government work and is in the public domain in the USA. Discussions are open until six months from print publication.
Sanford, Ward E. and David L. Selnick, 2012. Estimation of Evapotranspiration Across the Conterminous United States Using a Regression with Climate and Land-Cover Data. Journal of the American Water Resources Association (JAWRA) 1-14. DOI: 10.1111/jawr.12010
Abstract: Evapotranspiration (ET) is an important quantity for water resource managers to know because it often represents the largest sink for precipitation (P) arriving at the land surface. In order to estimate actual ET across the conterminous United States (U.S.) in this study, a water-balance method was combined with a climate and land-cover regression equation. Precipitation and streamflow records were compiled for 838 watersheds for 1971-2000 across the U.S. to obtain long-term estimates of actual ET. A regression equation was developed that related the ratio ET/P to climate and land-cover variables within those watersheds. Precipitation and temperatures were used from the PRISM climate dataset, and land-cover data were used from the USGS National Land Cover Dataset. Results indicate that ET can be predicted relatively well at a watershed or county scale with readily available climate variables alone, and that land-cover data can also improve those predictions. Using the climate and land-cover data at an 800-m scale and then averaging to the county scale, maps were produced showing estimates of ET and ET/P for the entire conterminous U.S. Using the regression equation, such maps could also be made for more detailed state coverages, or for other areas of the world where climate and land-cover data are plentiful.
Evapotranspiration (ET) is a major component of the hydrologic cycle, and as such, its quantity is of major concern to water resource planners around the world. The long-term average quantity of water available for human and ecological consumption in any region is roughly the difference between the mean annual precipitation and mean annual ET (Postel et al., 1996) with the latter frequently a majority fraction of the former. Thus, quantifying ET is critical to quantifying surface runoff to reservoirs or recharge to aquifers (Healy and Scanlon, 2010). Quantifying ET is also critical for studies of ecosystem water balances (Sun et al., 2011a) and regional carbon balances (Sun et al., 2011b).
In spite of the critical nature of this hydrologic component, its measurement on regional to continental scales has been problematic. Measurement of ET, although possible directly at the land surface (e.g., Stannard, 1988), is usually made either indirectly by quantifying the energy balance at a local land surface (Ward and Trimble, 2004) or by eddy covariance at some distance above the land surface (Baldocchi et al., 2001; Mu et al., 2007). The energy-balance approach requires hourly meteorological measurements for each small-scale plot of land because ET can vary in time and space on the order of hours and meters. Thus, this approach has traditionally been data- and labor-intensive and not suitable for scaling up to regional estimates of ET. The eddy covariance approach can be used at a single site to get averages over months or years, but still has the disadvantage of measuring ET only within a limited spatial extent. Another indirect approach is to first estimate the potential ET, which is the ET that would occur if ample water were available. Potential ET has been traditionally estimated by the use of equations that contain terms for local meteorological data such as relative humidity and wind speed. This type of data is more available than energy measurements for many regions, so this approach has been employed in regional studies (Ward and Trimble, 2004). A relation between actual and potential ET, however, is still required, which is usually also a function of meteorological variables. The term “actual ET” is often used to emphasize that what is being considered is ET that has actually occurred, and not the potential ET. In this article, we consider the terms ET and actual ET to be the same, and will use only the term ET for the remainder of the article except in the final figure, where the modifier “actual” is used only for emphasis.
A third approach for measuring ET indirectly is the water-balance approach, usually conducted for a watershed where the other components (precipitation, change in storage, and stream discharge) are measured, and the remainder is attributed to ET (Ward and Trimble, 2004; Healy and Scanlon, 2010). This method is advantageous when the goal is to obtain a long-term average because when the period of record examined is long enough, the change in storage term can be neglected. The method has also been used in modeling of ET on monthly or yearly scales (Sun et al., 2011b) and compared with eddy covariance data, but this approach has three limitations: the shorter time step amplifies the importance of the change in storage that has high uncertainty, the eddy covariance instruments each sample a relatively small area, and eddy covariance data have only been available for about the last decade or so (Baldocchi et al., 2001). Remotely sensed soil-moisture data have been used as a proxy for change in storage, but remote sensors measure shallow soil moisture only and not the change in storage in groundwater, which can be substantial relative to streamflow on a monthly or yearly time scale. The water-balance approach has been used across the regional and continental scale along with other watershed factors to estimate relative contributions to ET (Zhang et al., 2004; Cheng et al., 2011; Wang and Hejazi, 2011). Recently, the use of remotely sensed data (radiation and land cover) has been combined with theoretical forest canopy ET estimates to estimate continental scale ET across Canada (Liu et al., 2003); however, for this method, the radiation measurements have been limited in time and therefore the ET estimates have been made only for a single year.
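The reason a long record lets the storage term be dropped can be written out as a short worked equation. The symbols Q (streamflow expressed as a depth over the watershed), ΔS (net change in stored water), and T (length of record) are notation introduced here for illustration only and are not taken from the article.

```latex
\begin{align}
\mathrm{ET} &= P - Q - \frac{\Delta S}{T} \\
\mathrm{ET} &\approx P - Q \qquad \text{when } \left|\frac{\Delta S}{T}\right| \ll Q
\end{align}
```

Here P, Q, and ET are mean annual rates over the record. Because the net change in storage is limited by what the watershed can hold while T grows with the record length, the last term shrinks roughly as 1/T, which is why a 30-year average is less sensitive to it than a monthly or yearly balance.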
We propose here to combine watershed water-balance data with now publicly available meteorological data across the United States (U.S.) to create a regression equation that can be used to estimate long-term ET at any location within the conterminous U.S. (CUS). Such a regression equation was developed recently for the state of Virginia (Sanford et al., 2011). The difference between this study and other previous studies is that we rely on long-term (1971-2000) average streamflow conditions in order to minimize the relative size of the change in storage term, and thereby are able to use data from several hundred watersheds across the CUS as a proxy for ET observations. The purpose of the current study is to demonstrate the development and calibration of a regression equation that would apply to the entire CUS, and to make a map of estimated long-term mean annual ET for the CUS and an equivalent map of the ratio of ET to precipitation. The regression equation is first developed using only climatic variables, but a second improved equation is also developed that includes land-cover variables. The map of the long-term mean annual ET should prove to be of great value to water managers planning for long-term sustainable regional use of the resource, and the equation should be useful for examining the variability of ET at more local to state scales, or in other areas of the world where such climate and/or land-cover data are available.
The approach taken in this study is to obtain estimates of ET from watersheds across the U.S. and relate these estimates to climate and land-cover data such that a regression equation can be developed and applied to all counties of the CUS. A mean annual streamflow for the period 1971-2000 was obtained for at least one watershed in every state in the CUS from the U.S. Geological Survey National Water Information System (NWIS) database (http://waterdata.usgs.gov/nwis/rt). Selection criteria included a complete monthly flow record from 1971 to 2000, a watershed area between 100 and 1,000 km2, and lack of any known water-impoundment features (e.g., reservoirs) or water exports or imports from the watershed. An upper limit on the basin size was used to limit the total number of watersheds to a manageable number, and to exclude larger basins that tend to have more variable climatic conditions within them. Based on these criteria, 838 watersheds were selected (Figure 1). ET was calculated for each watershed by subtracting the mean streamflow rate (divided by the watershed area) from the mean precipitation rate for the period 1971-2000. The mean precipitation data for 1971-2000 were obtained from the PRISM climate dataset (Daly et al., 2008), http://www.prism.oregonstate.edu. The period 1971-2000 was selected because climate data from PRISM have already been compiled for this base time period. Data for the 838 watersheds are provided in the Supporting Information.
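The per-watershed calculation described above (mean precipitation minus mean streamflow divided by watershed area) can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function names, the choice of input units (m3/s for streamflow, km2 for area, cm/yr for precipitation), and the example watershed are all assumptions made here.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed mean year length


def runoff_depth_cm_per_yr(q_m3_s, area_km2):
    """Convert a mean streamflow (m3/s) to a depth over the watershed (cm/yr)."""
    volume_m3_per_yr = q_m3_s * SECONDS_PER_YEAR
    depth_m_per_yr = volume_m3_per_yr / (area_km2 * 1.0e6)  # km2 -> m2
    return depth_m_per_yr * 100.0  # m -> cm


def water_balance_et(p_cm_per_yr, q_m3_s, area_km2):
    """Return (ET in cm/yr, ET/P), neglecting any long-term change in storage."""
    et = p_cm_per_yr - runoff_depth_cm_per_yr(q_m3_s, area_km2)
    return et, et / p_cm_per_yr


# Hypothetical watershed: 500 km2, 5 m3/s mean flow, 100 cm/yr precipitation.
et_cm, et_over_p = water_balance_et(100.0, 5.0, 500.0)
```

For this made-up watershed the streamflow corresponds to roughly 31.6 cm/yr of runoff, so about 68 cm/yr, or 68% of precipitation, is attributed to ET.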
An assumption behind this water-balance approach is that the change in the storage of water in the subsurface over this 30-year period is small relative to the amount of water that has exited by streamflow during the same period. Sources of error in the estimates of ET, in addition to this change in storage, could be error in the precipitation estimate, flow-rate estimate, or area of the watershed estimate. The latter could be related to either an inaccurate measurement of the surface-water divides or lack of coincidence of the surface and groundwater divides. Based on our results we believe that all of these sources of errors are relatively small compared with the total fluxes involved. Changing climatic conditions have been occurring to some extent over the 30-year period. The averaging approach used here does not describe that variability in time, but simply calculates average values for the 30 years.
The observed ET rates from the 838 watersheds were compared with climate data for these watersheds. The climate data from PRISM (http://www.prism.oregonstate.edu) were based on an 800-m grid resolution, but averages were calculated using the GIS software (ArcGIS, ESRI, Redlands, CA) for the watershed and the county areas (100 to 1,000 or more km2) using the 800-m precipitation and the 800-m minimum and maximum daily temperatures. The county-averaged precipitation is shown in Figure 2. From the latter two datasets, the mean annual temperature (Figure 3) and mean diurnal temperature range (Figure 4) were calculated for both the watersheds and counties. The data were averaged by county in order to illustrate the variability across the CUS.
An initial regression equation was developed that related only the three climate variables of mean annual temperature, mean diurnal temperature range, and mean annual precipitation to the ratio of ET over the precipitation. This ratio was used such that the value of ET/P would vary between 0 and 1, a fairly common approach in ET studies (Brutsaert, 1982, pp. 241-243). The form of the equation (Table 1) was chosen such that the ratio would approach 1 (for Λ = 1) for low values of precipitation (Π term) or high values of temperature (τ term), and approach 0 for high values of precipitation or low values of temperature. In fact some of the data do exceed 0.9 and fall below 0.10 (as demonstrated below). The mean diurnal temperature range term (Δ) was included in a manner that lower values would lower the ET estimate. The Δ term accommodates the effects of higher humidity near the coastline, and also correlates with solar radiation (Allen, 1997) and an accompanying effect on ET. The Greek letters were chosen to reflect their internal variables (τ for temperature, Π for precipitation, Λ for land cover, and Δ for the mean diurnal temperature range). The climate-only form of the regression equation has six parameters, including temperature offsets (To and a) and a temperature difference offset (b), a precipitation multiplication factor (Po), and temperature and precipitation exponential factors (m and n). The regression equation was evaluated for each of the 838 watersheds, and the values of the parameters were adjusted using a nonlinear Gauss-Newton iteration approach until the sum-of-squared errors were minimized. The values of the final parameters are listed in Table 1. The value for parameter “a” was the least sensitive, and was ultimately specified at a value of 10,000.
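The fitting step (adjusting parameters by Gauss-Newton iteration until the sum of squared errors is minimized) can be illustrated on a toy problem. The sketch below does not use the Table 1 equation: the two-parameter model `1 / (1 + (P/p0)**n)` is a hypothetical stand-in chosen only because it tends to 1 at low precipitation and to 0 at high precipitation, and the data are synthetic. A simple step-halving safeguard is added, which is an assumption of this sketch rather than something the article describes.

```python
import math


def model(p, p0, n):
    """Hypothetical stand-in ratio: near 1 for low P, near 0 for high P."""
    return 1.0 / (1.0 + (p / p0) ** n)


def sse(ps, ys, p0, n):
    """Sum of squared errors between observations and the model."""
    return sum((y - model(p, p0, n)) ** 2 for p, y in zip(ps, ys))


def gauss_newton(ps, ys, p0, n, max_iter=50, tol=1e-12):
    """Fit (p0, n) by Gauss-Newton with step halving."""
    for _ in range(max_iter):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for p, y in zip(ps, ys):
            u = (p / p0) ** n
            f = 1.0 / (1.0 + u)
            r = y - f                              # residual
            j1 = n * u * f * f / p0                # d(model)/d(p0)
            j2 = -u * f * f * math.log(p / p0)     # d(model)/d(n)
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        # Solve the 2x2 normal equations (J^T J) d = J^T r.
        d1 = (a22 * g1 - a12 * g2) / det
        d2 = (a11 * g2 - a12 * g1) / det
        current = sse(ps, ys, p0, n)
        step = 1.0
        while step > 1e-8:
            new_p0, new_n = p0 + step * d1, n + step * d2
            if new_p0 > 0.0 and sse(ps, ys, new_p0, new_n) <= current:
                break
            step *= 0.5
        else:
            break  # no step reduced the error; treat as converged
        p0, n = new_p0, new_n
        if abs(step * d1) + abs(step * d2) < tol:
            break
    return p0, n


# Synthetic "observations" generated from known parameters (p0=100, n=2).
precip = [20, 40, 60, 80, 100, 120, 150, 200, 250]
ratios = [model(p, 100.0, 2.0) for p in precip]
p0_fit, n_fit = gauss_newton(precip, ratios, 80.0, 1.5)
```

Because the synthetic data contain no noise, the iteration should recover the generating parameters from the deliberately offset starting guess; with real watershed data the minimum sum of squared errors would be nonzero.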
Table 1. Regression Equation, Variables, Parameters, and Their Values Used to Estimate the Ratio ET/P for the Conterminous U.S.
Tm, mean annual daily temperature (°C); Tx, mean annual maximum daily temperature (°C); Tn, mean annual minimum daily temperature (°C); P, mean annual precipitation (cm)
Λ, land-cover term, where Li is the fraction of land-cover type i within the area of calculation; subscripts: d, developed; f, forest; s, shrubland; g, grassland; a, agriculture; m, marsh
Parameter value for climate-only regression
Parameter value for climate- and land-cover-based regression
Results for the climate-based regression show a strong correlation between ET/P and the climate factors (Figure 5). Plotting the observed actual ET/P vs. the estimated ET/P yielded an R2 value of 0.8674 for the best-fit parameters. A best-fit line through the data had the expression MEV = 0.793 OBV + 0.121, where MEV is the model estimated value and OBV is the observed value. The root mean square error (RMSE) of the model data was 0.067, and the coefficient of efficiency (CE) (Nash and Sutcliffe, 1970) was 0.860. The values of estimated ET/P range from <10% to over 95%. The high values of R2 and CE demonstrate that climate factors can explain most of the variation in long-term average ET across the CUS. The value of the best-fit slope on the linear equation, 0.793, is far enough from the ideal value of 1.000 to suggest that there are still other factors unaccounted for in this regression.
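The fit statistics quoted above (R2, the best-fit line of estimated on observed values, RMSE, and the Nash-Sutcliffe coefficient of efficiency) can be computed as in the following sketch. The function name and the four-point dataset are invented for illustration; R2 is taken here as the squared Pearson correlation, which is an assumption about how the article's value was computed.

```python
import math


def fit_statistics(obs, est):
    """Return slope, intercept, R2, RMSE, and Nash-Sutcliffe CE for est vs. obs."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_e = sum(est) / n
    ss_oo = sum((o - mean_o) ** 2 for o in obs)
    ss_ee = sum((e - mean_e) ** 2 for e in est)
    ss_oe = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    sq_err = sum((o - e) ** 2 for o, e in zip(obs, est))
    slope = ss_oe / ss_oo  # least-squares line: est = slope * obs + intercept
    return {
        "slope": slope,
        "intercept": mean_e - slope * mean_o,
        "r2": ss_oe ** 2 / (ss_oo * ss_ee),
        "rmse": math.sqrt(sq_err / n),
        "ce": 1.0 - sq_err / ss_oo,
    }


# Made-up ET/P values: the estimates are pulled toward the middle of the range.
observed = [0.2, 0.4, 0.6, 0.8]
estimated = [0.25, 0.4, 0.6, 0.75]
stats = fit_statistics(observed, estimated)
```

In this toy case R2 and CE are both very high while the slope is 0.85, which mirrors the point made in the text: a strong correlation can coexist with a best-fit slope that departs from the ideal value of 1.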
Climate and Land-Cover-Based Regression
Although the climate factors explained much of the variation in the observed ET, vegetation cover is also known to influence ET. Thus, a land-cover variable was added to the regression equation to see if the fit to the observed ET could be improved. Land-cover data from the USGS 2001 land-cover dataset (Homer et al., 2004) were used, and the percentages of land cover in each watershed and county were calculated using the GIS software. The land-cover categories used included developed, forest, shrubland, grassland, agriculture, marsh, and other (Table 1). The most geographically extensive land covers include agriculture (Figure 6), forest (Figure 7), grassland (Figure 8), and shrubland (Figure 9). Six parameters (c through h) were added to the regression equation (Table 1) to account for each category except “other.” Each of these six parameters is a constant that is multiplied by the fraction of that land-cover type within the area of calculation. The six products are summed to create a multiplier to the climate-only regression equation.
Because a land-cover dataset from only 2001 was used in conjunction with average ET estimates over the period 1971-2000, changes in land cover as a whole were assumed to be relatively small over time. Although this certainly has not been the case in many locations, especially in developed areas, the total percent of developed land is small relative to the others, and the changes between forest and agriculture, for example, although definitely occurring, are typically small relative to the total areas covered (Stehman et al., 2003). Other studies have shown that making distinctions within these classes can also affect ET. Examples of this are crop type (Bausch, 1995; Hunsaker et al., 2003) and the type and age of forests (Murakami et al., 2000; Cornish and Vertessy, 2001; Lu et al., 2003). The relatively small (yet substantial) improvement in the regression gained by adding land cover to the climate regression suggested that the division of land-cover types or ages into additional parameters was unwarranted at this stage and thus deemed beyond the scope of this first study. The lack of additional spatial and temporal variation in land cover is potentially another source of the regression error, and such variation would be a good candidate to include in future work.
The climate-and-land-cover regression equation was applied to each of the 838 watersheds and the parameters were varied until the sum of squared errors was minimized. The resulting factors for the land-cover parameters were consistent with the relative effect each category was expected to have on ET (Table 1). The marsh and agriculture categories had the greatest effect on increasing ET, with values of 0.400 and 0.382, respectively. Conversely, the developed and shrubland categories had the least effect, with values of 0.173 and 0.094, respectively. A plot of the observed vs. estimated ET/P values using the climate and land-cover equation revealed a slight increase in the correlation (Figure 10) with an R2 value of 0.882. A best-fit line through the data had the expression MEV = 0.877 OBV + 0.0753. The RMSE of the model data was 0.0617, and the CE was 0.882. The slope and intercept of the best-fit line are closer to 1 and 0, respectively, than those of the climate-only model. The RMSE is slightly less (improved) for the climate-and-land-cover model than for the climate-only model. Likewise, the CE value is slightly higher (improved) for the climate-and-land-cover model. Although the R2 value of 0.882 is not much greater than the value of 0.867 for the climate-only regression, the improvement does represent about 13% of the error from the climate-only regression. These results indicate that the climate variables are the most influential in determining ET, with the land cover adding a small but finite additional effect.
In order to test the validity of this regression equation, data from an independent set of watersheds were compiled for a validation test. In this case, the same set of criteria was used as in the first dataset, except that the watershed areas were slightly larger (1,000 to 2,500 km2). Again at least one watershed was selected from each state (except Delaware in this set), and a total of 342 watersheds were selected (Figure 11). Data for the 342 watersheds are provided in the Supporting Information. The climate-and-land-cover regression equation from the first dataset was applied to the second dataset to obtain estimates of ET/P. The resulting R2 fit was not only as good as the first dataset but actually slightly surpassed it with an R2 of 0.903 (Figure 12). Likewise the best-fit line was closer to the 1:1 value with a slope and intercept of 0.897 and 0.0368, respectively. The RMSE for the model data was 0.0660 and the CE was 0.871; these values were similar to those of the smaller watershed data. The range of values in ET/P was again broad, and in this case, ranged from about 10% to nearly 100%. The results indicate the regression model is robust for application to other watersheds or areas within the 1971-2000 time frame for a similar range of physiographic and climatic conditions.
Results of Applying the Regression Equation
We believe much of the remaining error in the ET in all of the watersheds can be attributed to unaccounted-for changes in storage over the 30-year period or other errors associated with the water-balance estimates for the actual ET, rather than an inability of the regression to estimate the ET. The large size of the dataset contributes to the robustness of the equation applied to the 1971-2000 time frame under a similar range of physiographic and climatic conditions. Thus, using the regression to estimate ET in other locations (such as counties) for 1971-2000 should produce estimates of ET/P whose error is, on average, no greater than the RMSE of 0.066 (6.6% of precipitation) obtained for the larger watershed dataset.
The regression equation is useful because it can be applied to any area where similar climate and/or land-cover data are available. Because such data are now available for the entire CUS, a map can be compiled of the estimates of ET/P or of actual ET for the entire region. The climate data from the PRISM climate dataset were available at the 800-m resolution, and the land-cover data were available at the 30-m resolution. The land-cover data were first compiled into the 800-m grid, and then all of the data from the 800-m grids were used to calculate ET/P and actual ET at the 800-m grid resolution. In order to improve the visual nature of the results for the entire CUS, the 800-m values were averaged at the county level. These county values are shown in Figure 13. This is the first known detailed map of estimated actual ET for the entire CUS. The map shows that the Pacific Northwest has many regions with an ET/P ratio of <20% because of very high rainfall and low-to-moderate temperatures. Other high-elevation regions in the Cascade, Sierra, and Northern Rocky Mountains have an ET/P ratio between 30 and 50%. Likewise, virtually all of New England, the highest elevations in the Appalachian Mountains, and the central Gulf Coast have an ET/P ratio of between 30 and 50% because of moderate temperatures and/or high rainfall. The majority of the region with a temperate climate has an ET/P ratio of between 50 and 70%. ET in counties in the arid southwestern CUS usually exceeded 80% of precipitation. Here, the averaging by county hides the fact that most of the intermontane basins in the southwestern CUS have ET values that exceed 95% of precipitation, whereas the accompanying mountain ranges have ET values that are below 80% of precipitation.
An unusual feature of the ET/P regression map is that certain areas have a ratio >1. Unlike in the climate-only regression, the second regression has a land-cover multiplication term that can cause the ratio to exceed the value of 1. The map reveals such areas in the High Plains and Central Valley of California. The significance of these examples where ET/P is >1 is that these are currently agriculturally dominated areas whose natural climate alone cannot support the current level of agriculture. Both regions use large quantities of imported water to sustain the agriculture, either by pumping (mining) water from deep aquifers, or by diverting surface water from nearby mountain reservoirs. Virtually none of the watersheds used to develop the regression were located in these irrigated areas (as this violated the no-import criterion), but most of the counties with an ET/P ratio that exceeds 1 are located in these irrigated regions.
The values of ET/P at the 800-m grid resolution were multiplied by the values of precipitation at the 800-m grid resolution to obtain values of estimated ET at that resolution. These values were averaged at the county level and displayed in terms of estimated mean annual ET in centimeters (Figure 14). The modifier “actual” to ET in this figure is added for emphasis only. The map shows that the highest mean annual ET values (near 100 cm) in the country occur along the Gulf Coast and in Florida where there is a combination of ample rainfall and warm temperatures. The lowest ET values (<10 cm) occur in the desert Southwest where rainfall is also about 10 cm/yr. The estimates of ET are consistent with average values measured at networks of stations in Florida (German, 1996), Ohio (Noormets et al., 2008), and Nevada (Nichols, 2000), and with other ET model results that have been calibrated with eddy covariance data across the CUS (Sun et al., 2011b).
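The order of operations described here (multiply ET/P by precipitation cell by cell, then average by county) can be sketched in a few lines. The data layout, function name, and cell values below are hypothetical; the article's own processing was done in GIS software, not with this code.

```python
from collections import defaultdict


def county_means(cells):
    """Average per-cell ET (cm/yr) and ET/P by county.

    cells: iterable of (county_id, et_over_p, precip_cm) tuples, one per grid cell.
    Returns {county_id: (mean_et_cm, mean_et_over_p)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for county, ratio, precip_cm in cells:
        acc = sums[county]
        acc[0] += ratio * precip_cm  # ET is computed at the cell level first
        acc[1] += ratio
        acc[2] += 1
    return {c: (et / n, r / n) for c, (et, r, n) in sums.items()}


# Hypothetical cells: county "A" mixes a wet mountain cell with a dry basin cell;
# county "B" is a single irrigated cell where the estimated ratio exceeds 1.
grid_cells = [
    ("A", 0.5, 100.0),
    ("A", 0.9, 20.0),
    ("B", 1.1, 40.0),
]
means = county_means(grid_cells)
```

In county "A" the mean of the per-cell ET values is 34 cm/yr, whereas multiplying the county-mean ratio (0.7) by the county-mean precipitation (60 cm/yr) would give 42 cm/yr. This is why the multiplication is done at the grid-cell scale before averaging.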
Summary and Conclusions
A method was used to estimate ET that combined a water-balance approach with a regression equation based on climate and land-cover factors. The method focused on using long-term (30-year) streamflow records as observations of P-ET, and thus minimized the relative size of the neglected change in the groundwater-storage term. The method differs from other methods currently being used to estimate ET (on this spatial scale but shorter time scales) by not relying on recent eddy covariance data as observations (with their local footprints), and by not using monthly or yearly streamflow estimates where unknown changes in groundwater storage can be relatively large in comparison. The long-term discharges from 838 watersheds across the entire conterminous U.S. were compared with long-term precipitation in those watersheds to compile a proxy dataset of observed ET. Climate and land-cover data at these same watersheds were then used as variables in a regression equation to create a best fit to the observed data. The result was a regression equation that can predict ET at any given site based solely on climate, or climate and land-cover, variables with an R2 value of 0.87 or greater. By then applying this regression equation to climate and land-cover values for each county across the entire conterminous U.S., maps were created for ET and ET/P for the country. The ET/P map illustrates that, in certain regions, such as the High Plains and the Central Valley of California, ET exceeds precipitation because of the import of water other than that available from local precipitation. These maps should be useful for regional water managers, and the method useful for application in more detail at the state level or in other regions of the world where climate and land-cover data are plentiful.