Corresponding author: D. P. Lettenmaier, Wilson Ceramics Laboratory 202D, Department of Civil and Environmental Engineering, University of Washington, Box 352700, Seattle, WA 98195, USA. (firstname.lastname@example.org)
 We describe a regional parameter estimation scheme for the unified land model developed using a set of 220 river basins (10²–10⁴ km²) distributed across the conterminous United States. We evaluate predictive relationships between geographically varying catchment features and the model's soil parameters using principal components analysis. In addition to commonly used catchment descriptors (meteorological, geomorphic, and land-cover characteristics), we used satellite remote-sensing products and the United States Geological Survey Geospatial Attributes of Gages for Evaluating Streamflow (GAGES-II) database. In a series of regionalization experiments, we contrast the more conventional procedure of using locally optimized parameters as predictands with an approach that searches for zonally representative parameter values, using limited additional simulations. Parameters were evaluated through hydrologic model simulations in which daily flows were compared with observations over a 20-year period. We show that the penalty in streamflow prediction skill for using zonal parameters at a given basin (i.e., locally) is comparatively smaller than the penalty for using local parameters zonally. Regionalizations using zonal parameters and local catchment descriptors had the best model performance for both training and validation periods. Finally, we investigate the potential for transferring parameters globally by repeating the regionalization using only catchment attributes derived from globally available data and show that for the United States, model performance is only slightly worse than when U.S.-specific data are used.
 The goal of parameter regionalization is to derive predictive relationships between observable catchment descriptors and model parameters, and in so doing provide a basis for (a) avoiding time-consuming local parameter estimation when a model is applied to a new domain and/or (b) transferring parameters to ungauged basins. The most common approaches for transferring parameter information in past work have been multiple regression and interpolation [Magette et al., 1976; Karlinger et al., 1988; Post and Jakeman, 1996; Abdulla and Lettenmaier, 1997; Kull and Feldman, 1998; Post et al., 1998; Post and Jakeman, 1999; Siebert, 1999; Fernandez et al., 2000; Merz and Bloschl, 2004; Gan and Burges, 2006; Heuvelmans et al., 2006; Wagener and Wheater, 2006; Boughton and Chiew, 2007; Goswani et al., 2007; Yadav et al., 2007; Viviroli et al., 2009], where the explanatory variables are catchment characteristics. He et al. categorized these types of parameter transfer into either distance-based or regression-based regionalization. Other common techniques include hydrologic classification and clustering methods [Vandewiele and Elias, 1995; Gupta et al., 1999; Nijssen et al., 2001; Koren et al., 2003; Merz and Bloschl, 2004; Pokhrel et al., 2008; Zhang et al., 2008], which essentially assign a priori model parameters to catchments either via grouping into hydrologically homogeneous regions, by relating parameters to land cover and climatic data, or by a combination of the two.
Singh and Frevert  distinguished regionalization procedures, for which site-by-site model calibration is followed by regionalization, from those that perform calibration and regionalization in a single combined step. Samaniego et al. [2010b] referred to the former as postregionalization and the latter as simultaneous regionalization. The latter has the benefit of incorporating additional geographical information (from nearby basins) into the parameter estimation procedure. A few recent studies have implemented simultaneous regionalization strategies, with varied results. Fernandez et al.  attempted to optimize regional relationships and regional model parameters simultaneously for 33 catchments in the southeastern United States and concluded that improvements in regional relationships do not necessarily lead to improved model performance at ungauged sites. Kim and Kaluarachchi  compared a simultaneous calibration procedure with five postregionalization techniques and found modest improvements from the simultaneous calibration. Hundecha and Bárdossy , Götzinger and Bárdossy , and Pokhrel et al.  each had some success in simultaneous regionalization over different study domains; however, with the exception of Hundecha and Bárdossy , all found that parameter transferability was limited by the use of discrete soil texture classes. Pokhrel et al.  utilized the soil texture class relationships of Koren et al.  to constrain parameters during calibration. Samaniego et al. [2010a, 2010b] successfully applied a simultaneous regionalization procedure that considered the subgrid variability of catchment descriptors. This consideration has the advantage of enabling the regionalization to incorporate information from finer resolution data to estimate effective parameter values that are representative of the dominant hydrological processes at a coarser grid and across scales.
 Collectively, the range of explanatory variables that have been used in the aforementioned studies includes meteorological data (mean, variability and frequency information for precipitation, temperature, potential evapotranspiration (ET), solar exposure, and surface wind), and land surface attributes (catchment geometry, aspect, and geomorphic data; geologic information, soil fractional composition, texture classes, hydraulic properties, and depth; vegetation coverage, land-use information, and glaciation). Data sources have almost entirely been in situ; to date, only limited use has been made of remote-sensing data as explanatory variables. In some cases, streamflow information has been used in regionalization (e.g., baseflow index, peak discharge and recurrence, runoff ratio, specific runoff); however, this results in obvious limitations for parameter transfer to ungauged basins.
 Our major objective in this study is to extend the previous model development and calibration efforts for the unified land model [ULM—Livneh et al., 2011; Livneh and Lettenmaier, 2012] to improve model applicability to new domains. Specifically, the goal is to relate calibrated model parameters to catchment descriptors (i.e., regionalization). The model parameters that serve as the basis for our regionalization relationships were derived through single and multi-criteria site-by-site calibrations described by Livneh and Lettenmaier . These calibrations involved minimizing errors between entire time series of predicted streamflow, Q, and observations, as well as in some cases, remote-sensing estimates of ET. A secondary objective of our work is to evaluate alternate means for sampling calibrated parameters and catchment attributes that improve spatial representativeness and enhance regional model performance.
 Because there are many potential explanatory variables for regional parameter estimation and many of these attributes are correlated by construct, we used a principal components analysis (PCA) approach to maximize their explanatory skill and minimize potential redundancy. This approach technically falls into the category of “postregionalization”; however, in two experiments (described in section 2.5), we essentially retrofit the data into a pseudosimultaneous regionalization by conducting a limited number of additional simulations—termed here as “zonalization.” This procedure reevaluates the appropriateness of model parameter sets in the context of their performance over neighboring catchments (i.e., in spatial “zones”). Similarities exist between zonalization and other simultaneous regionalization approaches, but a key exception is that the regionalization takes place a posteriori with respect to model calibration.
 We further test this approach in the context of global parameter transfer, by restricting the regionalization procedure to only those catchment attributes with global coverage. This provides a broader context for hydrologic modeling in data-poor regions where stream gauges are not present, or for cases where site-specific calibration is not computationally viable.
 We describe in this section the ULM model and its parameters, along with the geospatial and hydrometeorological data needed to set up and force the model. We also summarize the structure of the regionalization experiments, and the form of PCA and bias-correction techniques used.
2.1. Study Area and Data Sources
 The study domain consists of 220 river basins with drainage areas in the range 10² to 10⁴ km² distributed across the conterminous United States (CONUS; Figure 1). The 220 basins were selected to provide a broad cross-section of hydroclimatic regimes (see Livneh and Lettenmaier for details). The catchments are a subset of the model parameter estimation experiment (MOPEX) data set [Schaake et al., 2006], which has been screened to assure an adequate density of precipitation gauges and minimal effects of upstream anthropogenic activities such as irrigation diversion and reservoir operations. Streamflow data were obtained directly from U.S. Geological Survey (USGS) archives.
2.2. Model Forcing Data
 The meteorological forcing data used in this study were derived by B. Livneh et al. (Extension and spatial refinement of a long-term hydrologically based dataset of land surface fluxes and states for the conterminous United States, submitted to Journal of Climate, 2012) and are available at a 1/16° resolution over the CONUS domain for the period 1915–2010. The gridded precipitation and daily minimum and maximum temperatures are based on a total of approximately 20,000 NOAA Cooperative Observer (Coop) stations shown in Figure 1. Wind data were linearly interpolated from a coarser (1.9° grid) National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP–NCAR) reanalysis grid [Kalnay et al., 1996], which was also used to produce a daily wind climatology for years prior to 1948.
2.3. Land Surface Model
 ULM is a merger of the land surface components from the Noah land surface model [LSM; Ek et al., 2003] (e.g., vegetation, ET computation, snow model, and algorithms for computing frozen soil, surface heat, and radiative fluxes) and subsurface elements (soil moisture and runoff generation algorithms, as well as infiltration) from the National Weather Service (NWS) Sacramento Model [Burnash et al., 1973]. The snow model is described by Livneh et al. It is essentially the standard Noah snow model augmented to include time-varying albedo, partial snow cover, and retention of liquid water within the snowpack. Livneh et al. evaluated ULM performance with respect to observed river discharge, flux tower measurements of surface heat fluxes, and soil moisture. Table 1 summarizes plausible physical ranges of the model soil parameters that constrained the parameter estimation here. The soil moisture column essentially consists of five conceptual storage components that represent “free” and “tension” water reservoirs in an upper and a lower zone. Free water is a representation of the quantity of water in excess of the soil's field capacity, for which gravity governs the moisture movement. The tension water zones represent the quantity of water between the soil's field capacity and wilting point that is bound more closely to the soil and hence must be satisfied before any moisture can be extracted from the free water zones. The reader is referred to the studies by Livneh et al. and Koren et al. for a more complete description of model physics.
Table 1. List of ULM Soil Parameters from NWS Sacramento Model and Their Plausible Ranges
Upper zone tension water maximum storage
Upper zone free water maximum storage
Upper zone free water lateral depletion rate
Maximum percolation rate
Exponent of the percolation curve equation
Lower zone tension water maximum storage
Lower zone free water supplemental maximum storage
Lower zone free water primary maximum storage
Depletion rate of the lower zone supplemental free water storage
Depletion rate of the lower zone primary free water storage
Percolation fraction going directly from upper zone to lower zone free water storages
Impervious fraction of the ground surface
Maximum fraction of additional impervious area caused by saturation
2.4. Catchment Attribute Data
 The catchment attribute data sets are summarized in Table 2 and serve as candidate predictor variables for the regionalization experiments. The first set of attributes corresponds to the soil texture class relationships used by Koren et al. to estimate a set of a priori model parameters for the NWS Sacramento model (Sac). These soil parameters are specific to the Sac model, which is the basis for the soil hydrology in ULM. The soil texture relationships used by Koren et al. are readily estimated from STATSGO data [Schwarz and Alexander, 1995] and heretofore represent the default parameters for ULM.
Table 2. Catchment Attributes Used as Candidate Explanatory Variables in Parameter Regionalization
Monthly gravity field solutions are computed at the University of Texas at Austin Center for Space Research, the GeoForschungsZentrum Potsdam, and the Jet Propulsion Laboratory, which use different processing strategies and hence yield slightly different results.
Soil Texture Attributes
Tension and free water storages, hydraulic conductivities, impervious areas, percolation constant, recession slope.
 The second set of catchment attributes includes the remaining land surface characteristics that can be derived from inputs required by ULM. These include geomorphic variables (derived from the input DEM), vegetation information such as percent forest cover and seasonal greenness, as well as information pertaining to soil temperature climatology and seasonal land surface albedo. The third set of attributes is derived from two remote-sensing moisture flux products that are described in detail by Livneh and Lettenmaier. These include the MODIS-based ET data product of Tang et al., as well as terrestrial water storage change (TWSC) based on Gravity Recovery and Climate Experiment (GRACE) data. Since the GRACE data are at coarser resolution (∼1°) than the model spatial resolution (1/16°), we compute a spatially weighted average TWSC value for each basin. The final set of attributes is taken from the GAGES-II data set, which includes information about basin morphology, climate, topography, soils, anthropogenic disturbance factors, and land use. Only floating-point basin attribute data in GAGES-II were considered. Also, we did not consider streamflow-based attributes, since these are not available for ungauged basins.
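The spatially weighted basin average of a coarse-grid field such as TWSC can be illustrated with a minimal sketch. It assumes the weights are the areas of the basin that fall inside each coarse cell; the text does not state the weighting actually used, and the function name and numbers below are hypothetical.

```python
def basin_weighted_mean(cell_values, overlap_areas):
    """Area-weighted mean of coarse-grid values over one basin.

    cell_values   -- value (e.g., TWSC) of each coarse cell touching the basin
    overlap_areas -- area of the basin lying inside each of those cells
    Both inputs are hypothetical; the weighting scheme is an assumption.
    """
    if len(cell_values) != len(overlap_areas):
        raise ValueError("one overlap area is needed per cell value")
    total_area = sum(overlap_areas)
    if total_area <= 0:
        raise ValueError("basin does not overlap any cell")
    weighted_sum = sum(v * a for v, a in zip(cell_values, overlap_areas))
    return weighted_sum / total_area


# Illustrative numbers only: a basin straddling three 1-degree cells.
twsc_mm = [12.0, 8.0, 20.0]
area_km2 = [600.0, 300.0, 100.0]
basin_twsc = basin_weighted_mean(twsc_mm, area_km2)
```

Cells that cover more of the basin contribute proportionally more to the basin value, which is the intent of a spatially weighted average when the data grid is coarser than the basin.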
2.5. Regionalization Procedure
 The regionalization approach seeks to leverage the site-by-site calibrations performed by Livneh and Lettenmaier. In their work, each basin underwent a multiobjective calibration involving approximately 2000 simulations per basin, from which a set of Pareto-optimal parameter sets, θP, was identified, which minimize streamflow prediction errors. Given the large inherent data-storage requirements, only summary data from these calibrations were retained for each basin; these consist of the roughly 2000 parameter sets and their associated performance statistics. The major performance statistics used were the components of the Nash-Sutcliffe efficiency (NSE) [Nash and Sutcliffe, 1970] pertaining to differences in simulated and observed daily streamflows [Gupta et al., 2009]. These include the Pearson correlation coefficient, R, the difference in mean flows, α, and the difference in standard deviations, or range shift, β. An important point here is that when calibrating the three components of the NSE in a multiobjective context, the resulting Pareto set of solutions has statistics that are nondominated with respect to each other (i.e., there are tradeoffs between R, α, and β). However, Livneh and Lettenmaier observed that the effective NSE values for these parameter sets are not necessarily equal, and some sets may have slightly higher or lower NSE than others.
 The pairs of parameter sets and performance statistics from Livneh and Lettenmaier were reanalyzed in two ways that address the findings of previous regionalizations [Fernandez et al., 2000] detailed in section 1. The assertion of that work was that a strong predictive relationship between catchment descriptors (predictors) and calibrated model parameters (predictands) will not necessarily lead to improved model performance using regionalized parameters. Therefore, we designed four experiments to test the impact of predictive relationship strength on regionalized model performance by selecting predictands and predictors in several ways that alter the predictive relationships. The predictive relationship between each predictor/predictand set was derived using PCA (described in section 2.6), and the resulting ULM streamflow performance using regionalized model parameters was compared for each experiment. The first two experiments (Table 3) contrast parameter selection (predictands), based on either local or zonal representativeness, illustrated in Figure 2. Local parameters were selected based on the streamflow NSE for the basin of interest. Alternatively, zonal parameters were selected based on the average NSE that resulted from iteratively simulating each Pareto-optimal parameter set, θP, from the basin of interest, at its neighboring basins, which we defined as being all basins for which the gauge was within 5° latitude–longitude of the target basin gauge. The first experiment used locally optimized parameters, θP-LOCAL, and thus followed a classic postregionalization approach, where a predictive relationship was derived between local catchment descriptors and locally optimized model parameters. In the second experiment, we zonalized the model parameters prior to regionalization (i.e., θP-ZONAL), which is anticipated to produce more spatially representative model parameters and test the assertion above.
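As a minimal sketch of these statistics, the code below computes the NSE and three components in the form given by Gupta et al. [2009]: the correlation, the ratio of simulated to observed standard deviation, and the mean difference normalized by the observed standard deviation. The variable names are descriptive rather than the symbols used in the text, because the mapping of symbols to components is not asserted here; the flow values are made up.

```python
import math


def nse(sim, obs):
    """Nash-Sutcliffe efficiency of simulated versus observed flows."""
    mean_obs = sum(obs) / len(obs)
    sq_err = sum((s - o) ** 2 for s, o in zip(sim, obs))
    sq_dev = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sq_err / sq_dev


def nse_components(sim, obs):
    """Correlation, std-dev ratio, and normalized mean difference."""
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    # Population (divide-by-n) moments so the identity below is exact.
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    corr = cov / (sd_s * sd_o)
    sd_ratio = sd_s / sd_o
    mean_diff = (mean_s - mean_o) / sd_o
    return corr, sd_ratio, mean_diff


# Made-up daily flows, for illustration only.
obs_q = [1.0, 3.0, 2.0, 6.0, 4.0]
sim_q = [1.5, 2.5, 2.5, 5.0, 4.5]
corr, sd_ratio, mean_diff = nse_components(sim_q, obs_q)
# The three components recombine into the NSE.
rebuilt = 2.0 * sd_ratio * corr - sd_ratio ** 2 - mean_diff ** 2
```

Because the three components recombine into a single NSE value, parameter sets that trade one component against another can still differ in their effective NSE, which is why the Pareto sets are ranked by NSE in the experiments.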
Table 3. Regionalization Experiments Considering Either Local or Zonal Predictands (ULM Parameters, θP) and Predictors (Catchment Attributes, η)
Experiment   ULM Parameters (Predictands)   Catchment Attributes (Predictors)
1            θP-LOCAL                       ηLOCAL
2            θP-ZONAL                       ηLOCAL
3            θP-LOCAL                       ηZONAL
4            θP-ZONAL                       ηZONAL
 The main points of the zonalization procedure (Figure 2) for a given basin are as follows:
 1. Begin with 10 Pareto-optimal sets per basin θP;
 2. Compute the NSE for each parameter set at the local basin. Select the set with the highest NSE to be the local optimum θP-LOCAL;
 3. Run each parameter set, θP, at all neighboring basins within a 5° zone;
 4. Average the resulting NSE values for each parameter set across neighboring basins and select the parameter set with the highest average NSE as the zonal optimum θP-ZONAL.
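The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `simulate_nse` is a hypothetical stand-in for a ULM run scored by NSE, neighbors are found with a simple box test on latitude and longitude differences, and the toy parameter sets are single numbers.

```python
def neighbors_within(basin, gauges, radius_deg=5.0):
    """Basins whose gauge is within radius_deg in latitude and longitude."""
    lat0, lon0 = gauges[basin]
    return [b for b, (lat, lon) in gauges.items()
            if b != basin
            and abs(lat - lat0) <= radius_deg
            and abs(lon - lon0) <= radius_deg]


def zonalize(basin, pareto_sets, gauges, simulate_nse, radius_deg=5.0):
    """Return (local_optimum, zonal_optimum) for one basin.

    simulate_nse(params, basin) is a hypothetical callable that runs the
    model with `params` at `basin` and returns the streamflow NSE.
    """
    # Step 2: best-scoring set at the basin itself.
    local_best = max(pareto_sets, key=lambda p: simulate_nse(p, basin))
    # Step 3: run every set at every neighboring basin in the zone.
    zone = neighbors_within(basin, gauges, radius_deg)
    if not zone:
        # Assumption: with no neighbors, fall back to the local choice.
        return local_best, local_best

    def zone_mean(p):
        return sum(simulate_nse(p, b) for b in zone) / len(zone)

    # Step 4: set with the highest average NSE over the neighbors.
    zonal_best = max(pareto_sets, key=zone_mean)
    return local_best, zonal_best


# Toy setup: gauge coordinates and a made-up skill surface per basin.
gauges = {"A": (45.0, -120.0), "B": (46.0, -118.0),
          "C": (47.0, -121.0), "D": (30.0, -90.0)}
best_value = {"A": 1.0, "B": 3.0, "C": 3.0, "D": 9.0}


def toy_nse(p, basin):
    return 1.0 - (p - best_value[basin]) ** 2


local_a, zonal_a = zonalize("A", [1.0, 2.0, 3.0], gauges, toy_nse)
```

In the toy case, the set that scores best at basin A differs from the set that scores best on average at its neighbors B and C, which is the situation the zonal selection is designed to expose.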
 The above procedure was repeated for every basin in the domain. The third and fourth experiments (Table 3) were nearly analogous to the first two experiments, except that a simplified zoning procedure was extended to the catchment descriptors, η, where each basin's catchment descriptors (ηLOCAL) were replaced by their spatial average values within a 5° radius (ηZONAL). The intent of these final two experiments was to assess the value of minimizing the impact of outliers on the regionalization equations, achieved through simple spatial averaging, ηZONAL.
 The intention of selecting a 5° zoning radius was to consider a radius sufficiently large, so as to contain enough basins within each zoning window (on average 22 basins) to test the applicability of one set of parameters from a given basin on its neighboring basins. The size of the zoning radius was constrained by the limitation of keeping the computational effort tractable; i.e., the number of additional simulations needed to complete a 5° zonalization experiment was roughly 48,400 (i.e., 220 basins × 10 parameter sets per basin × 22 average neighbors). One limitation of using a strictly length-based zoning criterion is that important geographical boundaries may be crossed within the zoning radius (e.g., the continental divide), which has the potential to confound regional parameter estimation in those areas.
2.6. Principal Components Analysis
 PCA was used to develop predictive relationships between catchment descriptors and model parameters for each experiment listed in Table 3. Garen  developed a PCA regression procedure to improve streamflow volume forecasting that included a systematic search for optimal or near-optimal combinations of variables. The procedure has since been used in numerous studies for applications related to hydrologic data reconstruction, reservoir modeling, and water supply forecasting [Hidalgo et al., 2000; Rosenberg et al., 2011]. Hidalgo et al.  found this method to produce more parsimonious results than other PCA regression-based models. To date, this method has not been used specifically for LSM parameter regionalization, although the method is suitable for incorporating a large number of predictor variables (i.e., candidate catchment attributes).
 Therefore, we applied the Garen  approach as implemented by Rosenberg et al.  with the exception that we considered all descriptors with nonzero correlations with the model parameters (as opposed to considering only positively correlated descriptors as is typically done). The procedure includes an iterative series of t tests between catchment attributes and model parameters, ultimately preserving only those principal components (i.e., combinations of descriptors) that are statistically significant (α = 0.1). A jack-knifing approach was used to predict parameter values for each basin based on its catchment descriptors, such that the predictive equation is derived exclusively from information obtained from the other basins. This procedure was chosen to emulate the prediction of model parameters in ungauged basins.
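The jack-knifing step can be illustrated with a minimal sketch in which each basin's parameter is predicted from an equation fitted to all other basins. To keep the example short, a single-descriptor least-squares line stands in for the PCA regression; that substitution, the function names, and the values are assumptions made for illustration only.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jackknife_predict(descriptor, parameter):
    """Predict each basin's parameter from a fit that excludes that basin."""
    predictions = []
    for i in range(len(descriptor)):
        # Leave basin i out of the training data.
        x_train = descriptor[:i] + descriptor[i + 1:]
        y_train = parameter[:i] + parameter[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        predictions.append(slope * descriptor[i] + intercept)
    return predictions


# Made-up values: one descriptor and one soil parameter at five basins.
eta = [0.1, 0.4, 0.5, 0.8, 1.0]
theta = [12.0, 18.0, 20.0, 26.0, 30.0]
theta_hat = jackknife_predict(eta, theta)
```

Because the target basin never contributes to its own predictive equation, the prediction error measured this way approximates the error expected at an ungauged basin.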
3. Results and Discussion
 We evaluated the variability and spatial coherence of the regionalization input data, followed by an examination of the tradeoffs in model performance that result from using locally versus zonally optimized inputs (θ and η). We then compared ULM performance from all regionalization experiments to understand which strategy produces the best results. Finally, the regionalization experiments were repeated using only globally available catchment attributes to assess the applicability of the outlined method to data-poor regions.
3.1. Spatial Coherence
 Figure 3 shows the inputs to the four regionalization experiments. The western United States is enlarged to illustrate the spatial coherence of zonal model parameters. Livneh et al.  found ULM to be highly sensitive to the parameter shown in Figure 3 (UZTWM). This suggests that model parameters chosen on the basis of zonal performance indeed have detectable spatial coherence and may be linked to processes with length scales larger than those of an individual basin. This is an encouraging preliminary result, given the ultimate goal of finding a set of regional equations that describe model parameter variations across large spatial scales. Consequently, zonalization provides a key spatial linkage among neighboring catchments prior to regionalization. This spatial linkage was different from the Livneh and Lettenmaier  parameters, which were derived from isolated catchment-by-catchment calibrations that include a degree of localized randomness in the search procedure.
 One way to quantify the coherence of a phenomenon in space (e.g., parameter variability) is by constructing an experimental variogram, γ*(ℌk). This provides a measure of the average dissimilarity of a vector class, ℌk, as a function of distance, h, by computing the dissimilarity between all pairs of samples within a region, i.e.,
$$\gamma^{*}(\mathfrak{H}_k) = \frac{1}{2 n_c} \sum_{\alpha=1}^{n_c} \left[ z(x_\alpha + h) - z(x_\alpha) \right]^2, \quad h \in \mathfrak{H}_k,$$
where z(xα) represents a parameter value in space for all samples, nc. The experimental variogram is typically computed using vectors, h, of a length less than half the diameter of the region [Wackernagel, 2003]. The experimental variograms for the locally optimized and zonally optimized parameters are shown in Figure 4. For ease of viewing dissimilarities, variograms were plotted via binning at 1° intervals, and dissimilarities were normalized by the square of the mean of each parameter value to facilitate interparameter comparison. Further, Euclidean distances were computed to account for the differences in degrees latitude and longitude; i.e., longitude was multiplied by the cosine of the latitude to obtain comparable distances. We acknowledge that the shape of a variogram may change depending on regularization of the input data [Skøien et al., 2003; Blöschl and Sivapalan, 1995] and clarify that rather than using a block variogram, point parameter data from each basin were used here.
 Two important variogram features are apparent. The first is the dissimilarity value in the bin closest to the origin, which denotes how abruptly the values of the variable change at a very small scale. This is called the nugget effect, which draws its name from the field of gold mineral exploration. The nugget effect shown here is related to parameter variability within the nearest 1° interval for a given basin. The second important feature of the variogram, the sill, is the dissimilarity level beyond which increases in variability become negligible (i.e., the upper limit of the variogram). The distance from the origin at which the sill is reached is called the range, and it essentially represents the length scale of spatial coherence for a given parameter. It should be noted that this variogram analysis was used solely to illustrate the apparent differences in spatial coherence between the zonal and local parameter variability.
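The binned and normalized experimental variogram described above can be sketched as follows, using the standard definition of half the mean squared difference between paired values. The function name, the use of the pair's mean latitude for the cosine scaling, and the sample points are assumptions; the text does not specify these details.

```python
import math


def experimental_variogram(points, bin_width_deg=1.0, max_lag_deg=None):
    """Binned experimental variogram, normalized by the squared mean.

    points -- list of (lat_deg, lon_deg, value), one entry per basin.
    Returns {bin_index: dissimilarity}, where bin k covers lags in
    [k * bin_width_deg, (k + 1) * bin_width_deg).
    """
    mean_val = sum(v for _, _, v in points) / len(points)
    sums, counts = {}, {}
    for i in range(len(points)):
        lat_i, lon_i, z_i = points[i]
        for j in range(i + 1, len(points)):
            lat_j, lon_j, z_j = points[j]
            # Scale the longitude difference by cos(latitude) so both axes
            # are comparable (using the pair's mean latitude is an assumption).
            mid_lat = math.radians(0.5 * (lat_i + lat_j))
            dlon = (lon_i - lon_j) * math.cos(mid_lat)
            lag = math.hypot(lat_i - lat_j, dlon)
            if max_lag_deg is not None and lag > max_lag_deg:
                continue
            k = int(lag // bin_width_deg)
            sums[k] = sums.get(k, 0.0) + (z_i - z_j) ** 2
            counts[k] = counts.get(k, 0) + 1
    # Half the mean squared difference per bin, divided by the squared mean.
    return {k: sums[k] / (2.0 * counts[k]) / mean_val ** 2 for k in sums}


# Made-up parameter values at three gauges on the equator.
sample = [(0.0, 0.0, 1.0), (0.0, 0.5, 3.0), (0.0, 2.6, 2.0)]
gamma = experimental_variogram(sample)
```

Plotting the returned values against bin index gives a curve in which the first bin indicates the nugget effect and any plateau at larger lags indicates the sill.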
 Figure 4 shows only those parameters with a range that could be detected graphically. In all cases, the zonally optimized parameters have ranges that are greater than those of the locally optimized parameters and in some cases no sill was detected for locally optimized parameters, whereas one was for zonal optimizations. This is a key aspect of zonalization; namely that without additional calibration, it is possible to resample existing model parameter sets based on their performance at neighboring basins to obtain a more spatially continuous parameter surface. This will undoubtedly translate into more realistic model states (e.g., soil moisture) at the boundaries between adjacent basins, while conceding a minimal amount of model streamflow prediction skill from within the Pareto front.
 Although it should intuitively follow that all zonally optimized parameters should have greater range than local parameters by construct, roughly half of the parameters did not have detectable ranges for either case. This suggests that model performance is either insensitive to their values or that parameter values may be correlated with fields that are not spatially coherent. The cases where no sill could be detected for locally optimized parameters (while one was in fact apparent for zonal optimization) were for instances of small spatial coherence, where the range was entirely contained within the 5° zoning window (for parameters Lower zone tension water maximum storage (LZTWM), Lower zone free water supplemental maximum storage (LZFSM), and Exponent of the percolation curve equation (REXP)). This suggests that the size of the selected zoning window (5°) could impact the extent of the range. The presence of the nugget effect for all parameters indicates considerable variability even at very small scales. The nugget effect is frequently smaller in zonal than local parameters, but not always, which is likely caused by the combination of parameter uncertainty and model streamflow prediction errors (i.e., imperfect optimizations). Important geographical boundaries may have been crossed within the zoning radius (e.g., the continental divide), which may have further contributed to a larger nugget effect for zonal parameters. Overall, the parameters with detectable ranges in Figure 4, especially Maximum fraction of additional impervious area caused by saturation (ADIMP) and Upper zone tension water maximum storage (UZTWM), are consistent with those identified by Livneh et al. to which ULM was most sensitive.
3.2. Simulated Streamflow Performance
 The tradeoffs in model performance associated with using zonal as contrasted with locally optimized parameters are shown in Figure 5. For approximately 20% of the basins, the best zonal performance resulted from the best locally optimized parameters (i.e., optimal zonal and local parameters were the same). The 20% overlap is likely a function of the constraints placed on the original model calibration [Livneh and Lettenmaier, 2012], in which the initial search of parameter space (i.e., burn-in) was limited to 2000 iterations for each basin. For the remaining ∼80% of basins, Figure 5 shows that the penalty for using zonal parameters for a given basin (i.e., locally) is comparatively smaller than the penalty for using local parameters zonally. Kumar and Samaniego found an analogous result for a hydrologic model calibrated over a number of basins in Germany, with the aim of not over-fitting parameters to an individual basin. We infer from these results that model performance using zonalized parameters has a comparatively smaller penalty than local optimization, while lessening the degree of over-fitting parameters to any single basin.
 Figure 6 compares skill scores from the four regionalization experiments to local optimization. Differences in skill vary between regionalization experiments; however, skill scores tend to be lowest relative to local calibration for basins where local optimization was itself the least skillful (i.e., on the rightmost tail of the plots). An interesting result is that for several cases, the regionalized model outperformed the local optimization, most frequently for the case of zonalized parameters with local catchment descriptors. This finding is partially attributable to the calibration limitations highlighted above. The complete results from the calibration period (1991–2010) and a validation period of equal length (1971–1990) are tabulated in Table 4. For both time periods, regionalization using zonally rather than locally optimized parameters scored slightly higher, while the opposite was true for catchment descriptors—experiments using local (zonal) catchment descriptors had slightly higher (lower) skill. This finding supports a hypothesis that model parameters may in fact be zonally representative, given their function of integrating model physics with hydrometeorological processes. These processes frequently have length scales larger than an individual catchment (e.g., seasonal frontal precipitation over the central and eastern United States). Alternatively, catchment descriptors generally represent spatially fixed local phenomena that are not necessarily well expressed through areal averaging. However, it is worth noting that the spatially averaged zonal catchment descriptors were not a direct analogue to the zonal parameters, which were selected following a nonlinear transformation through ULM and a subsequent streamflow comparison.
Table 4. Model Performance (in Terms of NSE) for Calibrations and Regionalization Experiments
 The validation period is a useful test of the robustness of model parameters and is important for model studies dealing with hydroclimatic sensitivities. Both the locally and zonally optimized skill scores are smaller during the validation period than in the original period used for parameter estimation, but the drop for zonally optimized parameters is more marked. Somewhat surprisingly, model skill from the regionalization experiments was slightly higher in the validation period than in the calibration period from which their predictive relationships were derived. Despite the differences in model skill between the two time periods, a paired t test was performed, which showed that none of the differences listed in Table 4 was statistically significant.
 Notwithstanding these subtle differences, it remains that at the preregionalization stage, the zonal parameters produce slightly poorer results than local parameters, while the opposite is true after regionalization for many cases. This result highlights both the limitations and potential utility of this study. The limitations are due in large part to the parameters that form the basis of regionalization [i.e., from Livneh and Lettenmaier, 2012], which were derived through a search procedure that was constrained (in number of iterations) and stochastic (via a Monte Carlo burn-in approach). This explains the few cases where regional parameters outperformed local calibrations: in those cases, the calibration search procedure did not identify the global optimum. Further, the stochasticity of parameter selection and complex higher-order parameter interaction contributed to (i) a lack of spatial coherence for insensitive soil parameters, as illustrated by the variogram constructions, and (ii) potential variability in the fraction of local optima that were also zonal optima, which was approximately 20% of all basins in this case. In light of these limitations, the utility and robustness of the regionalization procedure were shown via the capability to assign parameters to a given basin that perform comparably to calibrated values without additional site-specific information or calibration, relying instead solely on attributes and parameters from surrounding basins.
3.3. Principal Components and Catchment Descriptors
 The final equations used to predict model parameters were made up of combinations of statistically significant principal components (PCs), which themselves were composed of catchment descriptors. These are essentially regression equations that relate the catchment descriptors to a given model parameter, where the coefficients were obtained from optimized combinations of significant PCs (section 2.6). Table 5 summarizes the number of catchment descriptors, PCs, and the PCA parameter prediction errors, expressed as a normalized standard error:
$$SE_{N} = \frac{s}{\sqrt{n}\,\bar{\theta}} \qquad (2)$$
where s is the standard deviation of parameter estimation errors, n is the number of basins, and $\bar{\theta}$ is the mean value of the parameter being estimated. The number of PCs needed to estimate a given parameter varied from 1 to 26, with an average of approximately 10, while the number of catchment descriptors used in estimation ranged from 17 to 47 with an average of 27. These counts did not appear to affect the overall quality of parameter estimates. The θP-ZONAL-ηLOCAL regionalization had the smallest parameter prediction error and the best regionalized model performance. For the other experiments, regionalized model performance likewise tracked the parameter prediction error. Therefore, although it was called into question by Fernandez et al. [2000], the conclusion that higher quality regionalization relationships ultimately produce better model simulations appears to hold in this case.
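 As a worked illustration of the error metric, the sketch below assumes that the normalized standard error is the standard error of the estimation errors, s divided by the square root of n, further divided by the mean of the parameter being estimated. That functional form is inferred from the variable definitions above and should be treated as an assumption. The function name and all values are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def normalized_standard_error(predicted, target):
    """Assumed form: s / (sqrt(n) * mean(target)), with s the std. dev. of the errors."""
    if len(predicted) != len(target) or len(target) < 2:
        raise ValueError("need matched samples from at least two basins")
    errors = [p - t for p, t in zip(predicted, target)]
    n = len(errors)                # number of basins
    s = stdev(errors)              # sample standard deviation of estimation errors
    mean_target = fmean(target)    # mean of the parameter being estimated
    if mean_target == 0:
        raise ValueError("mean parameter value is zero; cannot normalize")
    return s / (sqrt(n) * mean_target)


# Hypothetical predictand values and PCA-based estimates for four basins
target = [100.0, 120.0, 80.0, 100.0]
predicted = [110.0, 115.0, 85.0, 90.0]
se_norm = normalized_standard_error(predicted, target)
```

 Dividing by the parameter mean makes the errors comparable across parameters with very different magnitudes.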
Table 5. Counts of Principal Components and Catchment Descriptors Needed For the Predictive Equations for Each Model Parameter Listed Belowa
Normalized Standard Error
Normalized standard error (equation (2)) was computed from the differences between the predictand (either θP-LOCAL or θP-ZONAL) and the PCA-generated estimate for each regionalization experiment. In the column headings, the first letter refers to the parameter optimization, θP, and the second to the catchment attribute, η (Z = “zonal,” L = “local”).
 Table 6 shows the overall frequency and use of descriptor variables in terms of their respective classes. Descriptors from the GAGES-II database far outnumber all others and were also used most frequently for parameter predictions. The next most frequently used attributes were the GRACE-based TWSC, which were used nearly twice as often as the other remote-sensing data set (ET), which was least frequently used overall.
Table 6. Summary of Catchment Descriptors Used in Regionalization, Including the Average Number of Parameters Each Descriptor Was Used to Predict (Frequency), and the Total Number of Descriptors Used From Each Class
Mean Frequency of Descriptor Class in Predictions
Fraction of Descriptors Used (From Total)
 Table 7 lists the most explanatory catchment descriptors for each parameter in the regionalization process, computed as the product of the descriptor coefficient (in its respective PCs) and the mean descriptor value across all catchments. Every category of catchment attributes (summarized in Table 1) was represented within the most explanatory set of descriptors shown in Table 7. Somewhat surprisingly, soil-based attributes were not ubiquitous among the most explanatory descriptors and were not among the three most predictive descriptors for the lower zone soil moisture storage parameters (LZFSM and LZFPM). The lack of a direct connection between soil-based attributes and model soil parameters may be partially attributed to the stochasticity described in section 3.2, which calls into question the extent to which parameters can be reliably derived from catchment attributes. This is particularly relevant considering the large number of candidate descriptors with the potential for nonphysical connections to model parameters. Alternatively, the non-soil-based descriptors may represent physical phenomena that vary at temporal frequencies consistent with the processes that the respective model soil parameters describe. For example, in energy-limited basins, springtime radiation, temperature, and greenness may provide a surrogate for deep soil moisture and resistance toward ET, which collectively possess comparable seasonal frequencies.
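 The ranking behind Table 7 can be sketched as follows, assuming each descriptor has a single effective regression coefficient for a given parameter and that descriptors are ordered by the magnitude of the coefficient-times-mean product. Ranking by absolute value is an assumption, and the descriptor names, coefficients, and means are hypothetical.

```python
def top_descriptors(coefficients, mean_values, k=3):
    """Rank descriptors by |coefficient * mean descriptor value| and return the top k."""
    strength = {
        name: coefficients[name] * mean_values[name] for name in coefficients
    }
    # Assumption: explanatory strength is judged by magnitude, regardless of sign
    ranked = sorted(strength, key=lambda name: abs(strength[name]), reverse=True)
    return [(name, strength[name]) for name in ranked[:k]]


# Hypothetical effective coefficients for one model parameter
coefficients = {
    "clay_content_pct": 0.8,
    "relative_humidity_pct": -0.05,
    "wetness_index": 1.5,
    "autumn_et_mm": 0.01,
}
# Hypothetical mean descriptor values across all catchments
mean_values = {
    "clay_content_pct": 20.0,
    "relative_humidity_pct": 60.0,
    "wetness_index": 8.0,
    "autumn_et_mm": 150.0,
}

top_three = top_descriptors(coefficients, mean_values)
top_names = [name for name, _ in top_three]
```

 Weighting each coefficient by the mean descriptor value puts descriptors with different units on a comparable footing, since the product has the units of the predicted parameter.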
Table 7. Three Most Explanatory Catchment Descriptors for Each Predicted Model Parameter, Based on the Product of the Regression Coefficients and the Mean Descriptor Value (All Descriptors Below Were Part of Statistically Significant PCs in the Prediction Equations)a
For each parameter, the descriptors are listed in order of their explanatory strength. Results are shown only for the case of zonal parameters with local descriptors. Descriptions of model parameters are provided in Table 1.
Maximum intermonthly daily minimum temperature
Maximum intermonthly daily maximum temperature
Average value of clay content (%)
Watershed average relative humidity (%)
A priori value of maximum percolation rate coefficient
Mean evapotranspiration in autumn (September, October, November)
Diameter of a circle with basin area divided by the maximum transect.
Percent of stream lengths in the watershed which are third-order streams (Strahler order)
 A notably intuitive relationship resulted between a related descriptor and parameter, namely the topographical wetness index and the parameter ADIMP, which represents the maximum fraction of additional impervious area caused by saturation. Finally, several a priori ULM parameters (from the Sacramento Model parameter estimation of Koren et al. [2003]) were found to be explanatory predictors. Somewhat unexpectedly, the explanatory a priori predictors were not of the same parameter that they predicted. This can be partially reconciled by the fact that the a priori parameters themselves were derived from the same soils database (STATSGO) and frequently represent interrelated processes. An example is the a priori value of PFREE (representing the percolation fraction going directly from upper zone to lower zone free water storages), which was an explanatory descriptor for REXP (the exponent of the percolation curve equation); both are computed from the estimated soil wilting point in the original derivation of Koren et al. [2003].
3.4. Global Parameter Estimation Experiment
 To emulate parameter estimation in data-poor regions, we repeated the most effective regionalization from section 3.2 (θP-ZONAL-ηLOCAL) as well as the classic postregionalization (θP-LOCAL-ηLOCAL) using only globally available catchment attributes. The two experiments chosen represent the recommended and conventional regionalizations, respectively. They exclude the GAGES-II attributes (which comprised a large number of explanatory catchment descriptors in each original experiment, but are available only for the CONUS). The total number of candidate catchment attributes was reduced to 65 for these tests. Figure 7 compares ULM performance in the same manner as Figure 6, with the addition of the global coverage experiments. Both of the global experiments perform remarkably closely to the experiments in section 3.2, with only modest reductions in both the mean and standard deviation of the skill scores (NSE listed on Figure 7).
 In the absence of GAGES-II catchment attributes, the PCA procedure identified alternative explanatory catchment attributes in place of those used in the original experiments (i.e., described in Table 7). Some of the most apparent replacements (not tabulated) for ULM-sensitive parameters (as identified by the experimental variograms) were as follows: topographic wetness index was replaced by soil porosity for ADIMP; average clay content was replaced by mean monthly precipitation for UZTWM; and relative humidity was replaced by mean monthly temperature for LZFSM. Meanwhile, the most explanatory descriptors for PFREE remained unchanged. Altogether, in the absence of GAGES-II data, the PCA procedure was robust in preserving much of the explanatory skill to estimate ULM parameters.
 In each regionalization experiment, a small number of predictive relationships were found between catchment attributes and ULM parameters that were difficult to reconcile. These either lacked a physical basis or would lead to dangerously circular arguments in their link to the model parameter. Examples of the former in Table 7 are the relationship between the basin compactness ratio and LZTWM, and between the basin elongation ratio and the maximum percolation rate (ZPERC). These relationships likely result from the effect of basin shape on the steepness or character of the hill slope, which could have a secondary impact on model parameters. Examples of the latter are the relationship between the a priori value of ZPERC and UZFWM, and between the a priori value of UZTWM and PFREE. ZPERC controls the percolation rate to the lower zone, so it is conceivable that a larger storage reservoir (UZFWM) could serve to compensate for this. PFREE is entirely in the lower zone, controlling how much percolation goes to free water recharge, so again it is possible that UZTWM (which controls percolation rate) could have a bearing on this, but both of these relationships are somewhat circular. For the above cases, it is equally likely that these relationships are essentially mathematical artifacts of the regionalization. Siebert [1999] noted several regionalization relationships lacking a physical basis in an 18-catchment experiment, where lake area was found to be an explanatory variable despite the fact that the model structure did not explicitly consider lakes. The results shown here are therefore a proof of concept for the potential utility of this methodology; we recommend that catchment attributes with the potential for implausible relationships with model parameters be screened out when implementing this method in practice.
4. Summary and Conclusions
 We have described a methodology for regionalizing ULM parameters based on a set of parameter optimizations and ancillary (predictor) variables for 220 catchments located across the CONUS. Both local and zonal parameterization approaches were developed and evaluated in a series of regionalization experiments that ultimately compared the impact of initial parameter and catchment attribute selection on regionalized model performance. A PCA approach constructed predictive relationships between locally or zonally optimized model parameters and PCs consisting of catchment attribute variables that include soil texture, geomorphic, meteorological, and land surface features, as well as two remote-sensing data sources and the GAGES-II data set. Our main conclusions are as follows:
 1. The penalty in streamflow prediction skill for using zonal parameters at an individual basin (i.e., locally) is comparatively smaller than the penalty for using local parameters zonally. This suggests that it is possible to avoid overfitting model parameters to individual basins, while preserving a significant amount of predictive skill.
 2. The experiments showed that the quality of regionalized model performance is directly related to the strength of parameter predictive relationships. Therefore, to be most effective, future regionalization efforts should seek ways to pair model parameters (predictands) and catchment attributes (predictors) prior to regionalization (as done here through zonalization) to further improve regionalization quality.
 3. Over the 20-year training and validation periods, the most skillful model performance resulted from regionalizations that used zonal model parameters and local catchment descriptors as inputs for the PCA procedure. Both the calibrated and regionalized parameter sets were temporally robust, since differences in performance between the training and validation periods were not statistically significant.
 4. The approach worked surprisingly well when only globally available data sources were used. Although many of the best predictors in the original experiment were from the GAGES-II data set (which is only available for CONUS), other predictors displaced those based on GAGES-II, with little loss in accuracy.
 5. Despite the utility and broader applicability of the regionalization approach presented here, we acknowledge limitations in both the calibrated model parameters and descriptor data. First, we recognize a degree of stochasticity and imperfect performance associated with the calibrated parameters that were used as predictands. The second confounding issue is the tractability of potential higher order interactions among the numerous candidate descriptors as well as jointly with model parameters. Altogether, these factors may quantitatively impact regional parameter estimation performance; however, we argue that the qualitative utility of this approach remains viable.
 This work was supported by NOAA Grant NA070AR4310210 to the University of Washington. This work was facilitated through the use of advanced computational, storage, and networking infrastructure provided by the Hyak supercomputer system, supported in part by the University of Washington eScience Institute. We appreciate the advice of Eric Rosenberg in implementing the PCA method, and of Bart Nijssen for discussions related to the experimental design.