Local and global factors controlling water-energy balances within the Budyko framework
Key Laboratory for Agro-Ecological Processes in Subtropical Region, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha, China
Huanjiang Observation and Research Station for Karst Ecosystem, Chinese Academy of Sciences, Huanjiang, China
Corresponding author: X. Xu, Key Laboratory for Agro-Ecological Processes in Subtropical Region, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha, Hunan 410125, China. (firstname.lastname@example.org)
 Quantifying the partitioning of precipitation into evapotranspiration (ET) and runoff is key to assessing water availability globally. Here we develop a universal model to predict water-energy partitioning (the ϖ parameter of Fu's equation, one form of the Budyko framework) that spans small- to large-scale basins globally. A neural network (NN) model was developed using a data set of 224 small U.S. basins (100–10,000 km2) and 32 large, global basins (~230,000–600,000 km2), independently and combined, based on both local (slope, normalized difference vegetation index) and global (geolocation) factors. The Budyko framework with NN-estimated ϖ reproduced observed mean annual ET well for the combined 256 basins. The predicted mean annual ET for ~36,600 global basins is in good agreement (R2 = 0.72) with an independent global satellite-based ET product, indirectly validating the NN model. The NN model enhances the capability of the Budyko framework for assessing water availability at global scales using readily available data.
 Partitioning of precipitation at the land surface into evapotranspiration (ET) and runoff (Q) reflects the hydrologic response to land use and climate forcing, impacting water availability globally. While the development and application of global and regional land surface models have been increasing rapidly within the past couple of decades [Pitman, 2003; Rodell et al., 2004; Xia et al., 2012], there is increased interest in simpler, robust approaches to evaluating water-energy balances, such as the Budyko framework [Williams et al., 2012; Zhang et al., 2012]. The Budyko framework is a simple but effective tool for assessing linkages and feedbacks between climate forcing and land surface characteristics on water and energy cycles at basin scales, and it has been applied globally [Milly and Shmakin, 2002; Zhang et al., 2012]. Budyko  originally assumed that the curve without any parameter was appropriate for large basins and long-term averages. However, deviations of measured data from this relationship (curve) were observed, and considerable work has been done to explain these deviations, attributing them to variability and seasonality in climate, to soil characteristics, to vegetation type, and to the scales of analyses [Donohue et al., 2007]. To account for these factors, several analytical equations with additional parameters have been proposed for the Budyko curve, among which two one-parameter equations (see equation (1) [Fu, 1981; Zhang et al., 2004] and equation (2) [Choudhury, 1999]) are most widely used
where P and ETP are precipitation and potential ET, respectively. Yang et al.  showed that the two equations are equivalent with ϖ = n + 0.72. The parameters ϖ and n control the water-energy partitioning, determine the shape of the Budyko curve, and reflect the influences of basin characteristics. ET/P is termed the evaporation index and ETP/P is termed the dryness index. Application of the Budyko model requires steady state conditions, which are generally achieved by using data at time scales significantly longer than 1 year [Gentine et al., 2012; Roderick and Farquhar, 2011].
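For reference, the forms of equations (1) and (2) commonly given in the cited literature can be sketched as follows. The typography (writing ETP as ET_P) is an assumption made here; the symbols follow the definitions above.

```latex
\begin{align}
\frac{ET}{P} &= 1 + \frac{ET_P}{P}
  - \left[ 1 + \left( \frac{ET_P}{P} \right)^{\varpi} \right]^{1/\varpi} \tag{1} \\
\frac{ET}{P} &= \left[ 1 + \left( \frac{P}{ET_P} \right)^{n} \right]^{-1/n} \tag{2}
\end{align}
```

Both forms respect the water limit (ET/P ≤ 1) and the energy limit (ET ≤ ETP), and larger values of ϖ or n move the curve toward those limits.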
 A value of ϖ = 2.6 (i.e., n = 1.9 in equation (2)) [see Choudhury, 1999; Donohue et al., 2012] has been assumed as a default value for the Budyko curve (equation (1)) when applied to different basins. Using historical observations at annual time scale (P, ETP, ET = P − Q, where Q is catchment scale runoff), the parameter ϖ can be determined for individual basins, and it deviates from the value of 2.6 [Li et al., 2013; Yang et al., 2007; Yang et al., 2009]. However, ϖ cannot be calibrated in this way for ungauged basins, highlighting the need to estimate ϖ independently using other data. Previous studies by Zhang et al.  showed that ϖ is higher for forested catchments (2.8) relative to grassland catchments (2.4), reflecting potentially higher ET in forested catchments. The importance of vegetation was emphasized by Donohue et al.  who pointed out that the Budyko model should explicitly include vegetation information before it can be extended to small basins. Yang et al.  found that vegetation played opposite roles (i.e., increased ET in one and decreased ET in another) for two groups of basins in different climatic zones. Li et al.  developed a model using NDVI (normalized difference vegetation index) that applied to 26 large basins (≥ 300,000 km2); however, the model did not adequately fit smaller basins (≤50,000 km2). Other explanatory variables including climate, soil, and topographic factors were also used for estimating ϖ in different studies [Donohue et al., 2012; Milly, 1994; Shao et al., 2012; Yang et al., 2007, 2009]. These studies vary in terms of basin areas, number of basins considered, and explanatory variables used. However, none of these models apply to basin areas ranging from small (< 1000 km2) to large (> 10,000 km2) scales.
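As a minimal sketch of how the choice of ϖ changes the partitioning, the snippet below evaluates the commonly published form of Fu's equation, ET/P = 1 + ETP/P − [1 + (ETP/P)^ϖ]^(1/ϖ), which is assumed here to be equation (1). The function name `fu_evaporation_index` and the sample dryness indices are illustrative; the ϖ values (2.6 default, 2.8 forest, 2.4 grassland) are those quoted above.

```python
def fu_evaporation_index(dryness, w=2.6):
    """Evaporation index ET/P from Fu's equation for a dryness index ETP/P.

    Assumes the commonly published form of Fu's equation; `w` is the
    water-energy partitioning parameter (w >= 1).
    """
    if dryness <= 0 or w < 1:
        raise ValueError("dryness must be > 0 and w must be >= 1")
    return 1.0 + dryness - (1.0 + dryness ** w) ** (1.0 / w)


# Compare the forest and grassland values of w at a few dryness indices.
for dryness in (0.5, 1.0, 2.0):
    forest = fu_evaporation_index(dryness, w=2.8)
    grass = fu_evaporation_index(dryness, w=2.4)
    print(f"ETP/P={dryness:.1f}  forest ET/P={forest:.3f}  grass ET/P={grass:.3f}")
```

At any given dryness index the larger ϖ yields the larger evaporation index, which is consistent with the higher ET reported for forested catchments.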
 The objective of this study was to develop models for assessing water-energy partitioning across a range of basin scales, from small to large, using readily available data, and to evaluate these models independently using other data sets. Unique aspects of this study include development of models to estimate the ϖ parameter in Fu's equation using 224 small U.S. basins (100–10,000 km2), 32 large, global basins (~230,000 to 600,000 km2), and a combined data set including both.
2 Materials and Methods
2.1 Data Sources
 Different groups of basins were used to train the models for estimating ϖ.
There are ~ 400 basins in MOPEX (International Model Parameter Estimation Experiment) data set across diverse climate, vegetation, and soil conditions in the U.S. [Duan et al., 2006]. A total of 224 basins were selected for this study based on data availability (Figure 1a) to represent small basins spanning 1948–2003 with drainage areas ranging from ~100 to 10,000 km2 (median 2300 km2). Daily data (P, ETP, and Q) were aggregated into annual values. Note that ETP data in MOPEX are based on climatology which is appropriate for our study of the long-term mean state [Gentine et al., 2012; Wang and Hejazi, 2011]. MOPEX also provides basin boundaries and NDVI that were used in this study. Topographic variables (elevation, slope gradient, slope aspect, and compound topographic index (CTI)) at 1 km resolution were extracted from HYDRO1K data sets (http://eros.usgs.gov/#/Find_Data/Products_and_Data_Available/gtopo30/hydro).
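The aggregation step can be sketched as below, assuming daily records of (date, P, ETP, Q) in depth units and calendar-year totals. The source does not say whether calendar or water years were used, and the function name `annual_totals` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date


def annual_totals(daily_records):
    """Sum daily (date, P, ETP, Q) records into calendar-year totals.

    Calendar years are an assumption of this sketch. Annual ET is taken
    as P - Q, i.e. storage change is neglected at the annual scale.
    """
    totals = defaultdict(lambda: {"P": 0.0, "ETP": 0.0, "Q": 0.0})
    for day, p, etp, q in daily_records:
        year = totals[day.year]
        year["P"] += p
        year["ETP"] += etp
        year["Q"] += q
    for year in totals.values():
        year["ET"] = year["P"] - year["Q"]
    return dict(totals)


# Hypothetical daily values (mm/day) straddling a year boundary.
records = [
    (date(2000, 12, 31), 5.0, 1.0, 2.0),
    (date(2001, 1, 1), 3.0, 1.5, 1.0),
    (date(2001, 1, 2), 0.0, 1.5, 0.5),
]
annual = annual_totals(records)
print(annual[2001])
```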
A total of 32 large basins ranging from ~230,000 to 600,000 km2 area (median ~100,000 km2) were selected from Pan et al.  (Figure 1b). Basin averaged monthly data (P, Q) were aggregated into annual data (1984–2006). Potential ET was obtained from CRU TS 3.20 (1901–2011; 0.5° resolution, Climatic Research Unit (CRU) TS (time series) version 3.20 gridded data (http://www.cru.uea.ac.uk/cru/data/hrg/)). Topographic variables (elevation, slope gradient, slope aspect, and CTI) were extracted from HYDRO1K data sets. NDVI (1981–2006; spatial resolution 0.073°) was acquired from the GIMMS (Global Inventory Modeling and Mapping Studies) data set (http://glcf.umd.edu/data/gimms/).
2.Combined Small and Large Basins
A third data set (256 basins) was developed by combining the small (224 MOPEX) and large (32) basins to train a global model for estimating ϖ. The long-term mean annual evaporation ratio (ET/P) versus dryness index (ETP/P) for the 256 basins is shown in Figure 1c. Note that the third group of basins is only a combination of the first and the second groups rather than an independent data set.
 An independent global data set of ~36,600 basins from HYDRO1K was used to apply the proposed models. The basins were selected according to availability of data on elevation (ele), slope gradient (slp), slope aspect (asp), and compound topographic index (CTI). The climate data (precipitation and potential ET) from CRU TS 3.20 (Climatic Research Unit (CRU) time series (TS) version 3.20 gridded data (http://www.cru.uea.ac.uk/cru/data/hrg/)) and NDVI data from GIMMS were used.
 The Budyko framework (Fu's equation, equation (1)) with ϖ estimated from the proposed methods in this study was further evaluated by comparing estimated ET for ~36,600 global basins using an independent, remote sensing-based ET product of Zhang et al. . The global ET gridded product (resolution 0.073°; time period 1983–2006) was developed based on a satellite remote sensing-based evapotranspiration algorithm that was validated using eddy-covariance tower flux data sets. The algorithm quantifies canopy transpiration and soil evaporation using a modified Penman-Monteith approach with biome-specific canopy conductance determined from NDVI. The algorithms were applied globally using advanced very high resolution radiometer GIMMS NDVI, National Centers for Environmental Prediction/National Center for Atmospheric Research Reanalysis daily surface meteorology, and NASA/Global Energy and Water Cycle Experiment Surface Radiation Budget Release-3.0 solar radiation inputs.
 The parameter ϖ of equation (1) was optimized (calibrated) with annual values of P, ETP, and ET (P − Q) over a period of at least 23 years for each of the 256 basins using a least squares technique, and is referred to as “optimized ϖ.” This study assumes that the water storage change approaches zero at annual time scale and hence ET can be considered as the difference between P and Q [Donohue et al., 2010]. This study is an extension of the Budyko framework as equation (1) was optimized using annual data.
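A minimal sketch of this calibration is given below. It assumes the commonly published form of Fu's equation for equation (1) and replaces the unspecified least squares routine with a coarse-to-fine grid search over ϖ. The names `fu_et`, `sse`, and `calibrate_w`, the search bounds, and the synthetic data are all illustrative.

```python
def fu_et(p, etp, w):
    """Annual ET from Fu's equation (commonly published form, assumed here)."""
    phi = etp / p  # dryness index
    return p * (1.0 + phi - (1.0 + phi ** w) ** (1.0 / w))


def sse(w, p, etp, et_obs):
    """Sum of squared errors between Fu-estimated and observed annual ET."""
    return sum((fu_et(pi, ei, w) - oi) ** 2 for pi, ei, oi in zip(p, etp, et_obs))


def calibrate_w(p, etp, q, w_min=1.01, w_max=10.0, rounds=4, n=100):
    """Least squares estimate of w by repeatedly refining a grid search.

    Observed ET is taken as P - Q. The bounds and grid settings are
    assumptions of this sketch, not values from the source.
    """
    et_obs = [pi - qi for pi, qi in zip(p, q)]
    lo, hi = w_min, w_max
    best = lo
    for _ in range(rounds):
        step = (hi - lo) / n
        grid = [lo + i * step for i in range(n + 1)]
        best = min(grid, key=lambda w: sse(w, p, etp, et_obs))
        # Narrow the search window around the current best value.
        lo, hi = max(w_min, best - step), min(w_max, best + step)
    return best


# Synthetic annual data (mm) generated with w = 2.6, then recovered.
p = [800.0, 1000.0, 1200.0, 900.0]
etp = [1000.0, 950.0, 1100.0, 1050.0]
q = [pi - fu_et(pi, ei, 2.6) for pi, ei in zip(p, etp)]
w_hat = calibrate_w(p, etp, q)
print(round(w_hat, 3))
```

A grid search is used because it needs no external optimizer; with real data any one-dimensional least squares routine would serve the same purpose.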
 To develop estimation models of ϖ for different groups of basins (224 MOPEX small basins, 32 large basins, and their combination), a stepwise multiple linear regression (MLR) technique was used to fit the optimized ϖ values against a group of independent variables. The following variables were considered: elevation (m), basin center latitude (absolute value of −90° (S) to 90° (N)), basin center longitude (−180° (W) to 180° (E)), drainage area (km2), slope gradient (degree), slope aspect (cosine), compound topographic index (CTI, often called the wetness index), and NDVI (long-term mean value). Note that all of the variables, except the basin center coordinates, represent the spatial mean for each basin. The model for the combined 224 MOPEX and 32 large basins is hereafter called the global MLR model.
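The idea of selecting predictors one at a time can be sketched as follows. This is a simplified forward selection driven by the gain in R2, not the stepwise procedure used in the study, whose entry and removal criteria are not given in the text. The helper names, the `min_gain` threshold, and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_r2(cols, y):
    """Ordinary least squares with intercept; returns (coefficients, R2)."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def forward_select(predictors, y, min_gain=0.01):
    """Add the predictor with the largest R2 gain until the gain is < min_gain."""
    chosen, best_r2 = [], 0.0
    remaining = list(predictors)
    while remaining:
        trials = {
            name: fit_r2([predictors[n] for n in chosen + [name]], y)[1]
            for name in remaining
        }
        name = max(trials, key=trials.get)
        if trials[name] - best_r2 < min_gain:
            break
        chosen.append(name)
        remaining.remove(name)
        best_r2 = trials[name]
    return chosen, best_r2


# Toy data: the target depends on latitude and NDVI only; "junk" is unrelated.
predictors = {
    "lat": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "ndvi": [0.8, 0.3, 0.6, 0.2, 0.7, 0.4],
    "junk": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
w_opt = [3.0 - 0.02 * la + 1.5 * nd for la, nd in zip(predictors["lat"], predictors["ndvi"])]
chosen, r2 = forward_select(predictors, w_opt)
print(chosen, round(r2, 3))
```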
 To improve the performance of the global MLR model, we applied a neural network (NN) tool to build a new model, called the NN model. Selected variables from the global MLR model were used as inputs for the NN model; 70%, 15%, and 15% of the 256 basins were used to train, validate, and test the neural network, respectively. A two-layer feed-forward network with two sigmoid hidden neurons and one linear output neuron was used, and the network was trained with the Levenberg-Marquardt backpropagation algorithm. The R2 (root-mean-square error (RMSE)) values for training, validation, and testing are 0.83 (0.13), 0.83 (0.22), and 0.84 (0.16), respectively. The final net structure file (net) of the trained NN model is provided in the supporting information.
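The network structure and data split described above can be sketched as follows. The weights shown are hypothetical placeholders, not the trained values from the supporting information. The hidden activation is assumed to be tanh because the text only says "sigmoid", and inputs are assumed to be pre-scaled to roughly [−1, 1]. Training with Levenberg-Marquardt is not reproduced here.

```python
import math
import random


def split_indices(n, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly split n basin indices into train/validation/test subsets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def nn_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: sigmoid (tanh, assumed) hidden layer, linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


train, val, test = split_indices(256)

# Hypothetical weights for five scaled inputs (lat, long, elev, slp, NDVI).
w_hidden = [[0.4, -0.1, -0.2, 0.3, 0.5], [-0.6, 0.2, 0.1, -0.3, 0.2]]
b_hidden = [0.1, -0.2]
w_out = [0.9, -0.7]
b_out = 2.5
x_scaled = [0.2, -0.5, -0.8, 0.1, 0.6]
print(nn_forward(x_scaled, w_hidden, b_hidden, w_out, b_out))
```

With only two hidden neurons the network has few free parameters, which limits overfitting on a 256-basin data set.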
 Performance of the proposed models for estimating ϖ was evaluated by comparing ET estimated using equation (1) against observed ET (i.e., P − Q). We also applied Fu's equation (equation (1)) with the NN model-estimated ϖ to the ~36,600 basins globally to estimate ET. CRU climate data (P and ETP) from 1983 through 2006 were used, and the estimated ET was compared with the ET product of Zhang et al.  for the same period to evaluate the performance of equation (1) in estimating ET and thereby the performance of our proposed model in estimating ϖ.
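The two comparison statistics used throughout the results can be computed as sketched below. R2 is taken here as the squared Pearson correlation between observed and estimated values, which is an assumption because the text does not define it. The function names and sample values are illustrative.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between observed and estimated values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Squared Pearson correlation (one common definition of R2, assumed here)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


# Hypothetical mean annual ET values (mm) for three basins.
et_obs = [400.0, 600.0, 800.0]
et_est = [410.0, 590.0, 820.0]
print(rmse(et_obs, et_est), r_squared(et_obs, et_est))
```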
3 Results and Discussion
3.1 Characteristics of the ϖ Parameter
 Values of ϖ (Figure 2a) have similar ranges for small (MOPEX) basins (1.0–4.9) and large basins (1.3–4.6), but the median is higher for small basins (2.6) than for large basins (1.8). The standard deviation of ϖ is also higher for large basins (0.72) than for small basins (0.65). A possible explanation is that the MOPEX small basins all lie within the U.S., whereas the large basins are distributed worldwide and cover a wider range of geographic locations, which may increase the spatial variation (spread) of ϖ. Combining small (224) and large (32) basins results in a median ϖ value of 2.5. The range in ϖ values from this study is similar to that developed from analysis of ~ 470 basins globally by Zhang et al.  (1.7–5.0) with optimal values of ϖ higher for forested basins (2.8) relative to grassland basins (2.4). The range in ϖ values for this study also compares favorably with that from 108 nonhumid basins in China (1.3–4.6, median 2.9) [Yang et al., 2007] and from 97 basins within the Murray Darling Basin in Australia (1.8–3.8) [Donohue et al., 2011]. Using 26 out of the 32 large basins used in this study, Li et al.  found that ϖ ranges from 1.3 to 3.9 with a median value of 1.7 (extracted from Figures 3 and 4 in Li et al. ), which is similar to our estimates (1.3 to 4.6 with a median value of 1.8).
3.2 Models for Estimation of ϖ
 A total of five out of eight explanatory variables were selected in the final model for the 224 MOPEX small basins
where lat is absolute latitude of basin center, A is drainage area, and elev is elevation. This model explains 63% of observed variance (the optimized ϖ) with RMSE of 0.40 (Figure 2b) and with lat explaining 43%, CTI 14%, NDVI 3%, A 2%, and elev 1% of variance.
 The final model for the 32 large basins is as follows:
 This model explains 86% of observed variance with RMSE of 0.28 (Figure 2c) and with lat explaining 72%, NDVI 11%, and CTI 3% of variance.
 Combining the MOPEX and global data sets resulted in the following model:
where slp is slope gradient. This global MLR model explains 53% of observed variance with RMSE of 0.47 (Figure 2d) and with slp explaining 27%, lat 17%, NDVI 5%, long 2%, and elev 2% of the variance.
 Although the global MLR model (equation (5)) explains about half of the variance of the optimized ϖ, its predictive capacity for ϖ is weak, particularly at the high end (Figure 2d). Empirical models for estimating ϖ seem to be more easily developed for large basins than for small basins (Figures 2b, 2c, and 2d) [Li et al., 2013]. This may be because large basins integrate heterogeneity in climate, soil, vegetation, and geology, whereas small basins are internally more homogeneous and more distinct from one another, which may be more difficult to capture. The NN (neural network) model performed better than the MLR model (Figure 2e versus Figure 2d), with higher explained variance (69% relative to 53%) and lower RMSE (0.38 relative to 0.47). Fu's equation with optimized ϖ (Figure 2f) accurately estimates long-term mean ET when compared with observed ET (i.e., P − Q) for the 256 basins (224 MOPEX plus 32 large basins), which demonstrates that our optimization using annual data for individual basins is appropriate and that the optimized ϖ can be used in Fu's equation to estimate long-term mean ET for individual basins. Fu's equation with ϖ determined from the NN model predicts ET better than Fu's equation with ϖ from the MLR model (R2: 0.87 versus 0.84; RMSE: 66 versus 73 mm, see Figure 2h versus Figure 2g).
 Geographical location (latitude) and NDVI are common factors among the three models (equations (3), (4), and (5)). The three models suggest that both global factors (latitude, longitude, and elevation) and local factors (vegetation, slope gradient, CTI, and drainage area) control the water-energy balance within the Budyko framework. The physical basis of the Budyko framework is that ET is limited either by available water or by atmospheric evaporative demand (available energy). All the explanatory variables we used were chosen to help better constrain this supply/demand relationship. For example, slope regulates the lateral water redistribution and thus the water available for ET; decreasing latitude increases incoming solar radiation, while longitude determines the distance to the sea and thereby rainfall sources; elevation is an integrated indicator of environmental conditions, but it may mainly reflect the temperature gradient because of its inverse relationship with ϖ (equation (5)); vegetation (NDVI) directly influences rainfall redistribution, e.g., vegetation coverage controls water interception, and roots control infiltration, uptake, recharge, and runoff. All of the selected factors in our model (equation (5)) are commonly used in studies of climate, ecology, and hydrology. Although specific factors explicitly reflecting the effects of climate seasonality, which were found to be important in many studies [Potter et al., 2005; Williams et al., 2012], were not considered in this study, the geolocation information may implicitly carry some seasonality information. Therefore, the proposed models for estimating ϖ in this study are overall theoretically reasonable and most likely reflect underlying ecohydrologic mechanisms. With estimated ϖ, Fu's equation reproduced mean annual ET for the 256 basins (Figure 2e), which validated our NN model in estimating ϖ with readily available inputs.
 Different models of estimating ϖ have been developed in previous studies for specific conditions. For example, Yang et al.  proposed a model (R2 = 0.49) with soil hydraulic properties and slope as inputs, and then further added vegetation information [Yang et al., 2009]; Donohue et al.  derived a model with mean storm depth within a day, plant available soil water holding capacity, and effective rooting depth as inputs; Li et al.  proposed a model (R2 = 0.63, RMSE = 0.32) with vegetation fraction as the only input; Shao et al.  identified climatic factor, relief ratio, and vegetation as important factors.
 Compared with previous models, our model (NN model) applies to both small and large basins. The NN model (Figure 2e) performed better (R2 = 0.69 and RMSE = 0.38) than previous models. Inputs for our model in estimating ϖ include only basin characteristics (e.g., elevation, slope gradient, latitude, longitude, and NDVI) that are readily available, and do not include any climatic factors. We argue that it is better if ϖ is estimated independently of climatic factors because climatic factors (P and ETP) are already incorporated as inputs to Fu's equation (Budyko framework), and including them in the estimation of ϖ results in cross-correlation issues (equation (1)). Notably, all of the models (from both our and previous studies) include vegetation information (e.g., NDVI and its derived variables), which emphasizes the importance of vegetation in regulating ecohydrologic processes [Donohue et al., 2007, 2010; Gentine et al., 2012]. However, we also noticed that the variance explained by NDVI seems low (11% for the 32 large basins, equation (4)) and is much lower than the result (equation (10)) from Li et al. . This is because of the cross correlations of NDVI with other factors; the variance explained by NDVI increases to >70% for the 32 large basins if NDVI is the only predictor considered.
3.3 Estimation of Terrestrial ET at Global Scale
 The NN model was applied to ~36,600 basins globally (Figure 3a). Results show that ϖ decreases from the equatorial zone to high latitudes corresponding to the pattern of solar radiation. The NN estimated ϖ (Figure 2a) ranges from 1.0 to 5.0 (mean 2.2; median 1.9) for the ~36,600 basins.
 With the estimated ϖ value from the NN model for each basin, Fu's equation (equation (1)) was used to estimate ET for the ~36,600 basins. Mean annual ET (1983–2006) (Figure 3b) has a spatial pattern similar to that of the ϖ values (Figure 3a). The mean annual ET estimated with Fu's equation has a high correlation (R2 = 0.72) with independently estimated ET from Zhang et al. , although there are many points where ET values from this study are much lower than those of Zhang et al.  (Figure 3c). This difference may result from ET estimates in this study being constrained by ETP (energy availability) and P (water availability, the water limit line in Figure 3d), which reflects the assumptions of the Budyko framework (water and energy limit lines in Figure 1c). The ET/P ratio from Zhang et al.  seems physically unreasonable because many points exceed the water limit line (ET/P = 1, Figure 3e). This raises the following question: what is the source of the extra water needed to support ET in excess of P over the long term? The excess water may be derived from groundwater (irrigation), but the ratio should not be so large (10 or 100 times) unless a large volume of water is diverted from outside of the basin. Therefore, our results (ET) seem more physically reasonable from the perspective of the Budyko constraints (limits of water availability). The high performance of Fu's equation (equation (1)) with NN-estimated ϖ in estimating mean annual ET provides confidence in the NN model.
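The limit check discussed above can be sketched as below: a basin exceeds the water limit when long-term ET > P, and the energy limit when ET > ETP. The function names and the sample basins are hypothetical and are not drawn from either data set.

```python
def budyko_limit_flags(p, etp, et):
    """Flag whether a long-term mean ET value violates the Budyko limits."""
    return {
        "exceeds_water_limit": et > p,     # ET/P > 1
        "exceeds_energy_limit": et > etp,  # ET > potential ET
    }


def fraction_over_water_limit(basins):
    """Share of (P, ETP, ET) records whose ET exceeds precipitation."""
    flagged = sum(1 for p, etp, et in basins if et > p)
    return flagged / len(basins)


# Hypothetical long-term means (mm/yr): (P, ETP, ET) for three basins.
basins = [
    (1200.0, 900.0, 700.0),   # humid basin, within both limits
    (300.0, 1500.0, 280.0),   # arid basin, within both limits
    (150.0, 1800.0, 450.0),   # ET three times P: needs an external water source
]
for p, etp, et in basins:
    print(budyko_limit_flags(p, etp, et))
print(fraction_over_water_limit(basins))
```

Applied to a gridded ET product, such a check identifies where long-term ET would require water beyond local precipitation.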
4 Summary and Implications of This Study
 Development of a universal equation of ϖ that applies to diverse basins globally provides a powerful tool to evaluate water-energy partitioning using the robust Budyko framework (Fu's equation) with readily available data. The detailed development of this formulation using small and large basins, and its testing using ~36,600 basins globally against an independent ET product, provides confidence in the neural network predictive equations. This tool can be used to estimate the water-energy balance of gauged basins for comparison with monitoring data, but more importantly it can also be applied to prediction in ungauged basins, which is a critical issue in hydrology [Hrachowitz et al., 2013]. There is increasing interest in, and modification of, land surface models (LSMs) (e.g., Global Land Data Assimilation System, NOAH, MOSAIC, Variable Infiltration Capacity (VIC), and Community Land Model) to assess climate variability and land use change impacts on water resources. These LSMs can be evaluated against the Budyko framework developed in this study to determine how physically reasonable the model output is in terms of water and energy limitations in different regions. While the current model was developed using long-term average state variables (multiyear mean NDVI) with results reflecting long-term equilibrium conditions of water-energy partitioning in basins, future efforts should evaluate the feasibility of developing more dynamic models to accurately capture ecohydrologic processes in response to shorter-term forcing in basins, as was done by Zhang et al. . This study should facilitate wider applications of the Budyko framework in assessing water-energy partitioning at basin scales globally.
 We acknowledge “100 talents program” (Y323025111 and Y251101111) and Western Development Project (KZCX2-XB3-10) of the Chinese Academy of Sciences and the Key Project of the National Twelfth Five-Year Research Program of China (2010BAE00739). We thank the Associate Editor and the two anonymous reviewers for their constructive comments, which greatly improved our manuscript.
 The Editor thanks two anonymous reviewers for their assistance in evaluating this paper.