Uncertainties in hydrologic parameters could have significant impacts on the simulated water and energy fluxes and land surface states, which will in turn affect atmospheric processes and the carbon cycle. Quantifying such uncertainties is an important step toward better understanding and quantification of uncertainty of integrated earth system models. In this paper, we introduce an uncertainty quantification (UQ) framework to analyze sensitivity of simulated surface fluxes to selected hydrologic parameters in the Community Land Model (CLM4) through forward modeling. Thirteen flux tower footprints spanning a wide range of climate and site conditions were selected to perform sensitivity analyses by perturbing the parameters identified. In the UQ framework, prior information about the parameters was used to quantify the input uncertainty using the Minimum-Relative-Entropy approach. The quasi-Monte Carlo approach was applied to generate samples of parameters on the basis of the prior pdfs. Simulations corresponding to sampled parameter sets were used to generate response curves and response surfaces and statistical tests were used to rank the significance of the parameters for output responses including latent (LH) and sensible heat (SH) fluxes. Overall, the CLM4 simulated LH and SH show the largest sensitivity to subsurface runoff generation parameters. However, study sites with deep root vegetation are also affected by surface runoff parameters, while sites with shallow root zones are also sensitive to the vadose zone soil water parameters. Generally, sites with finer soil texture and shallower rooting systems tend to have larger sensitivity of outputs to the parameters. Our results suggest the necessity of and possible ways for parameter inversion/calibration using available measurements of latent/sensible heat fluxes to obtain the optimal parameter set for CLM4. 
This study also provided guidance on reduction of parameter set dimensionality and parameter calibration framework design for CLM4 and other land surface models under different hydrologic and climatic regimes.
 It is generally believed that land surface models (LSMs) (more than 30 in use today) [Pitman, 2003] should be based on the laws of physics, with parameters that are physically meaningful, measurable, and transferable to locations sharing the same physical properties, so that a priori assignment of parameter values based on site conditions is justified and desirable, as assumed in the Project for Inter-comparison of Land Surface Parameterization Schemes (PILPS) [Bastidas et al., 2006; Henderson-Sellers et al., 1995, 1996]. However, recent studies suggest that many LSM parameters are uncertain, and default assignment of parameter values may be inappropriate [e.g., Bastidas et al., 2006; Rosero et al., 2010]. From the limited number of studies performed to date, it is not clear what parameters are more uncertain and what the potential is for using observations to constrain or calibrate the uncertain parameters to better capture uncertainty in the resulting land surface states. This study aims at quantifying the uncertainties related to a subset of parameters in a community LSM.
 The Community Land Model (CLM4) is the land component within the Community Earth System Model (CESM), formerly known as the Community Climate System Model (CCSM) [Collins et al., 2006; Gent et al., 2010; Lawrence et al., 2011]. CLM4 has also been tested as the land surface component in an initial effort to develop a regional earth system model based on the Weather Research and Forecasting (WRF) model [Leung et al., 2006]. CLM4 can be used as a traditional land surface model to simulate water and energy fluxes and state variables, driven by observed phenology from satellite (hereafter denoted as CLM4-SP). It has also been extended to include a coupled carbon and nitrogen cycle biogeochemistry functionality (hereafter denoted as CLM4-CN) [Oleson et al., 2010; Thornton and Zimmermann, 2007; Thornton et al., 2007]. When activated, CLM4-CN replaces the diagnostic treatment of vegetation structure (e.g., leaf area index, canopy height) in CLM4 with prognostic variables.
 Uncertainties associated with LSMs come from many different sources. One apparent source is structural uncertainty since a model is always a simplified description of the actual processes or phenomena, with certain assumptions that are only valid under specific conditions. For example, CLM4 adopts simplified representation of the hydrologic processes with the one-dimensional Richards' equation for vadose zone flow processes in combination with TOPMODEL-based runoff generation parameterizations. Parameterizations of other bio-geophysical and biogeochemical processes also contribute to modeling uncertainty.
 In addition to structural uncertainty, LSMs are also subject to uncertainty related to input parameter values. This is particularly significant for land surface models because a large number of input parameters, such as those associated with land cover, land use, and soil properties, are not directly measurable at the scales of their applications. For example, CLM4 has hundreds of input parameters related to equations that describe biophysical, hydrological, and biogeochemical processes of different complexity. A common practice in land surface modeling is thus to define a set of default parameter values that are globally applicable or depend upon environmental conditions. This can simplify global application significantly. While some parameters are explicitly linked to the land cover/land use, soil texture and soil color classes, or Digital Elevation Model (DEM) data, others are assigned based on theoretical or empirical considerations, or loosely calibrated based on limited applications.
 As LSMs are important components of earth system models, quantifying their uncertainty can improve our understanding and characterization of uncertainty in earth system models and their predictions. However, due to the high dimensionality and computational demand, efficient and reliable sensitivity analyses as well as domain knowledge must be utilized to reduce the dimensionality of model parameters. For example, one can select specific climate or hydrologic regimes and focus on a subset of processes such as the hydrological cycle or carbon cycle individually to study model sensitivity to a subset of parameters that are relevant to the processes of interest. Sensitivity analyses can then be used to evaluate the relative significance of the limited number of parameters and be extended to evaluate relationships between inputs and output responses.
 Sensitivity analysis (SA) has been widely employed in LSM applications to evaluate the responses of LSMs to their input parameters and assess their uncertainties. The most commonly used SA technique in LSMs is still the one-factor-at-a-time approach [e.g., Chen and Dudhia, 2001; Gao et al., 1996; Li et al., 2011; Pitman, 1994]. The one-factor-at-a-time approach has been criticized for failing to consider parameter interactions and to explore the full parameter space for highly nonlinear models such as LSMs [Bastidas et al., 2006; Rosero et al., 2010; Saltelli, 1999].
 A number of attempts have been made to systematically assess and quantify sensitivity and uncertainties associated with model structure and parameters in LSMs in the past two decades. The methods employed include factorial analysis, which perturbs parameters simultaneously to different levels using a simple sampling strategy [e.g., Henderson-Sellers, 1993; Liang and Guo, 2003; Oleson et al., 2008a]; the Fourier amplitude sensitivity test, which determines the fractional contributions of individual factors to the variance of the outputs [e.g., Collins and Avissar, 1994]; and regionalized sensitivity analysis, which samples the entire parameter space and assesses how probability distributions of parameters change by subjectively defining behavioral/non-behavioral simulations [e.g., Bastidas et al., 2006; Demaria et al., 2007; Gulden et al., 2008]. Regionalized sensitivity analysis, as one category of global sensitivity analysis methods, is subject to type II errors (i.e., non-identification of influential parameters) and does not quantify to what extent the variance of outputs is affected by each parameter [Saltelli et al., 2008]. It is typically used in calibrations with a subjectively defined merit matrix [Bastidas et al., 1999] and applied by contrasting empirical observations with the results from a model. The “subjectivity” is associated with the definition of “good” or “bad” simulations based on a measure of the performance of the model, in other words, the quality of the simulations. More recently, a global variance-based SA approach using the Monte Carlo sampling of Sobol was employed to quantify the total and first-order sensitivity indices in the hydrologically enhanced versions of the Noah land surface scheme [Rosero et al., 2010]. The sampling strategy is more robust and efficient than factorial analysis and bypasses the design of the calculation matrix involved in factorial analysis.
 Besides efficient and reliable parameter sampling or perturbation methods, the success of a sensitivity analysis study also relies on how the input parameter uncertainties are quantified. Prior information about input parameters can be obtained from literature and general or site-specific data. There are various ways to deal with the hard (direct) or soft (indirect) information. Some approaches assume arbitrary values within the physical bounds as initial guesses for inversion or as a basis for perturbation (e.g., ±10%) for sensitivity analyses. More complicated approaches assume some kind of prior distribution, such as uniform and/or Gaussian distributions based on experience, which could be rather subjective. In fact, the prior uncertainty can be well quantified using prior probability density distributions strictly derived using the entropy principle [Kesavan and Kapur, 1989; Tarantola, 2005; Ulrych and Sacchi, 2006]. The related, but more general, entropic principle is that of minimum relative entropy (MRE). MRE has been used successfully in earth sciences and its application has been growing [Hou and Rubin, 2005; Hou et al., 2006; Woodbury and Ulrych, 1993; Woodbury and Rubin, 2000].
 To date, there is no global SA study on CLM4 to our knowledge. In this study, we focus on evaluating uncertainties associated with hydrologic parameters in CLM4-SP by developing an uncertainty quantification (UQ) framework for LSMs that integrates the minimum relative entropy concept, an exploratory sampling approach (quasi-Monte Carlo), and generalized linear model analysis. The framework was applied to CLM4-SP to study uncertainty in the simulated sensible heat (SH) and latent heat (LH) fluxes associated with the hydrological parameters. Our motivation for focusing first on the hydrologic parameters is that realistically simulating surface energy fluxes is critical for LSMs coupled within climate or earth system models. CLM4 has sophisticated biophysical parameterizations representing detailed energy transfer in the biosphere-atmosphere interface. Hydrological processes such as surface and subsurface runoff are generally considered not as critical in their control on the surface energy fluxes, so their influence on the surface energy budget is less well studied. However, as we will demonstrate in this paper, by influencing soil moisture, uncertainty in input parameters related to hydrological processes can affect how surface energy is partitioned between SH and LH, which has important implications for land-atmosphere interactions, as well as water, energy, and carbon cycle dynamics in coupled earth system models.
2. Methodology and Site Information
2.1. Hydrologic Parameters Evaluated
 Parameters related to hydrologic processes in CLM4 can be classified into three groups: (1) hydrologic, (2) vegetation, and (3) snow and snowmelt parameters. In this paper, we focused on the parameters that have direct influences on hydrologic processes, including soil hydrology and runoff generation processes. We identified 10 hydrologic parameters that are likely to have dominant impacts on the simulation of surface and subsurface runoff, latent and sensible heat fluxes, and soil moisture as suggested in existing literature. Uncertainty ranges and prior information of the parameters were determined based on literature [Niu et al., 2005, 2007; Oleson et al., 2008b, 2010] and discussions with the CLM4 developers (G. Niu, S. Swenson, and D. Lawrence, personal communication, 2011). The selected parameters are fmax, Cs, fover, fdrai, qdrai,max (denoted as Qdm hereinafter), Sy, b, Ψs, Ks, and θs. Explanations of the 10 parameters and their prior information are shown in Table 1.
Table 1. Selected Hydrologic Parameters in CLM4 and Their Prior Information
Parameter | Description | Prior information (for example, mean, standard deviation (STD), and upper and lower bounds)
fmax | Max fractional saturated area, from DEM | Mean value taken from the default CLM4 input data set; STD = 0.160; upper and lower bounds (0.01–0.907) determined from the default global data set for CLM4
Cs | Shape parameter of the topographic index distribution | Mean = 0.5 for flux towers; no STD information; upper and lower bounds: 0.01–0.9
fover | Decay factor (m−1) that represents the distribution of surface runoff with depth | Hard coded to be 0.5 in CLM4. Mean = 0.5; upper and lower bounds: 0.1–5
fdrai | Decay factor (m−1) that represents the distribution of subsurface runoff with depth | Mean = 2.5; upper and lower bounds: 0.1–5
Qdm | Max subsurface drainage (kg m−2 s−1) | Hard coded to be 5.5 × 10−3 kg m−2 s−1 but typically should vary between 1 × 10−6 and 1 × 10−2 in hydrologic applications. Tuning range is 1 × 10−6 to 1 × 10−1 as suggested by NCAR
Sy | Average specific yield | Hard coded to be 0.2. Based on the dominant soil type of the site. Converted to coarser soil texture classes using the USGS soil texture triangle. Mean = 0.02 for clay, 0.07 for sandy clay, 0.18 for silt, 0.27 for coarse sand; bounds are ±50% of the mean for the given soil texture.
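Table 2. The 13 Flux Towers Selected for Numerical Experiments
[Table body not recoverable from the source; the surviving fragments are the column heading "Plant Functional Type" and the soil texture entries "sandy clay loam" and "silty clay loam".]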
 For each site, meteorological forcing, site information such as soil texture, vegetation cover, and satellite-derived phenology, as well as validation data sets, such as water and energy fluxes, are provided by the North American Carbon Program (NACP) site synthesis team. CLM4 was spun up by cycling the provided forcing at least five times until all the state variables reached equilibrium. For example, the forcing data for U.S.-Wlr span a four-year period from 2001 to 2004. CLM4 was spun up by cycling the four-year data five times, equivalent to 20 years, which is the minimum number of spin-up years over the thirteen sites. Although data availability varies across sites, the sites with the shortest data coverage, U.S.-Wlr and U.S.-Shd, include at least four years. A detailed description of the NACP site synthesis data set can be found in Schwalm et al.
 We note that a number of parameters (i.e., fmax, Cs, and qdrai,max) in Table 1 are related to topography and could, in theory, be estimated from DEMs. In CLM4, a global map of fmax at 0.5° resolution derived from the TOPO1K compound topographic indices (CTI) [Verdin and Greenlee, 1996] is provided as a default input data set. The default fmax at the grid cell collocated with each flux tower is extracted from the default data set and given in Table 1, and is used as the fmax value in the default simulations. However, fmax estimated from coarse-resolution DEMs is intrinsically problematic, so high-resolution DEMs are highly recommended for the purpose of estimating fmax. Interested readers are referred to discussions in Li et al. and Niu et al.
 However, for most flux tower sites, information on topographic indices and streamflow in collocated watersheds is not readily available and can be obtained from other sources only with significant effort, which prevented us from better estimating the topography-related parameters and from validating streamflow simulations. We will address this issue in a follow-on study that uses watersheds from the Model Parameter Estimation Experiment (MOPEX) assembled by the National Weather Service [Duan et al., 2006], to be reported in a separate paper.
2.3. Uncertainty Quantification Framework
2.3.1. MRE Prior Pdfs
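 To construct the initial probability density functions (pdfs), we use the minimum-relative-entropy (MRE) algorithm [Hou and Rubin, 2005; Woodbury, 2004]. With the entropy concept, the MRE solution is unique and maximally uncommitted with respect to unknown information, given information such as bounds and moments. For example, given a mean (μ), standard deviation (σ), and upper (U) and lower (L) bounds, the MRE minimization shows that the minimally prejudiced pdf is a truncated Gaussian:
$$f(x) = \frac{\sqrt{\gamma/\pi}\,\exp\left[-\gamma\left(x + \frac{\beta}{2\gamma}\right)^{2}\right]}{\Phi\left[\sqrt{2\gamma}\left(U + \frac{\beta}{2\gamma}\right)\right] - \Phi\left[\sqrt{2\gamma}\left(L + \frac{\beta}{2\gamma}\right)\right]}, \quad L \le x \le U \qquad (1)$$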
where f(x) represents the probability density at x and Φ represents the standard Gaussian cumulative distribution function (CDF). The coefficients β and γ are determined by the constraints μ, σ, L, and U. The pdf in equation (1) has the advantage of being general and can be reduced to normal, exponential, and uniform distributions depending on the mean and variance. For a complete derivation, see Hou and Rubin [2005]. Lacking a standard deviation, MRE selects a truncated exponential distribution:
$$f(x) = \frac{\beta\,e^{-\beta x}}{e^{-\beta L} - e^{-\beta U}}, \quad L \le x \le U \qquad (2)$$
Similarly, the coefficient β is determined by the constraints μ, L and U.
 Note that the MRE principle, and that of maximum entropy, is based upon the stipulation that the constraints imposed are complete and accurate. In practice, if measurements were carried out, the observed frequencies may not correspond to the pdfs that have been chosen on the basis of MRE, which derives the pdf based strictly on the information available.
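The truncated exponential case can be illustrated numerically. The sketch below assumes the density is proportional to exp(−βx) on [L, U], for which integrating gives the mean L + 1/β − (U − L)/(exp(β(U − L)) − 1), and solves for β by bisection. The bisection solver and the function names (`truncexp_mean`, `solve_beta`, `truncexp_pdf`) are illustrative assumptions; the source does not state how the coefficient is computed. The example uses the fover prior from Table 1 (mean 0.5, bounds 0.1–5).

```python
import math


def truncexp_mean(beta, lower, upper):
    """Mean of the pdf proportional to exp(-beta * x) on [lower, upper]."""
    width = upper - lower
    if abs(beta * width) < 1e-6:  # beta -> 0 recovers the uniform pdf
        return lower + 0.5 * width
    return lower + 1.0 / beta - width / math.expm1(beta * width)


def solve_beta(mean, lower, upper, iters=200):
    """Bisection for beta; valid for means not within 1% of either bound."""
    width = upper - lower
    lo, hi = -100.0 / width, 100.0 / width
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # The mean decreases monotonically as beta increases.
        if truncexp_mean(mid, lower, upper) > mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def truncexp_pdf(x, beta, lower, upper):
    """Truncated exponential density, written relative to the lower bound."""
    width = upper - lower
    if x < lower or x > upper:
        return 0.0
    if abs(beta * width) < 1e-6:
        return 1.0 / width
    return beta * math.exp(-beta * (x - lower)) / (-math.expm1(-beta * width))


# fover prior from Table 1: mean 0.5, bounds 0.1 to 5, no STD information.
beta_fover = solve_beta(0.5, 0.1, 5.0)
print("beta for fover prior:", beta_fover)
```

Writing the density relative to the lower bound is algebraically the same as the form with exp(−βL) − exp(−βU) in the denominator, but it avoids overflow when β is large.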
2.3.2. Quasi-Monte Carlo Sampling
 Given that each parameter is represented by a pdf, sample values can be generated from these distributions for numerical evaluation. The sampling technique is an important choice in our UQ framework, as the success of a numerical approach hinges upon evaluating all possibilities defined by the parameter space. Given a large number of dimensions, systematic sampling techniques such as Simpson's rule are not sufficient [Tarantola, 2005]. Better alternatives are the Latin Hypercube Sampling (LHS) and quasi-Monte Carlo (QMC) techniques. LHS is based on stratification and achieves excellent uniformity in each one-dimensional projection [McKay et al., 1979]. QMC, which incorporates deterministic sequences to guarantee good dispersion between sample points [Caflisch, 1998], can also achieve good uniformity in higher-dimensional projections, at least in the early dimensions (e.g., the first 10). This allows QMC to achieve better performance than both MC and LHS in general [Wang and Sloan, 2008].
 We adopt QMC as the sampling technique in this study. Figure 2 shows the comparison between the generated MC, LHS, and QMC samples with the same sample size in a hypothetical two dimensional parameter space. In QMC sampling we are able to get a series of samples with controlled deterministic inputs instead of random ones, and thus alleviate the clumping issue of the MC methods. The well-dispersed QMC samples enable more efficient exploration of multidimensional parameter spaces.
 QMC requires a choice regarding input of a low-discrepancy sequence (the discrepancy of a sequence is low if the proportion of points in the sequence falling into an arbitrary set is close to the proportion measure of the set). We thereby select the Sobol sequences [Sobol, 1967], which are widely acknowledged to perform well for problems with greater than six dimensions, and avoid degradation effects observed in many other low-discrepancy sequences [Atanassov et al., 2010; Sobol and Shukhman, 2007; Wang and Sloan, 2008].
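As a rough illustration of the sampling choices discussed above, the sketch below draws 128 Sobol points in 10 dimensions and compares the discrepancy of MC, LHS, and Sobol samples in a hypothetical two-dimensional space. It assumes SciPy's `scipy.stats.qmc` module; the unscrambled Sobol generator, the centered discrepancy measure (SciPy's default), and the random seed are choices made here for illustration, not details taken from the source.

```python
import numpy as np
from scipy.stats import qmc

n_dim = 10  # number of hydrologic parameters listed in Section 2.1
m = 7       # 2**7 = 128 samples
n = 2 ** m

# Sobol points in the 10-dimensional unit hypercube (probability space).
sobol_10d = qmc.Sobol(d=n_dim, scramble=False).random_base2(m=m)

# Two-dimensional comparison with the same sample size.
mc_2d = np.random.default_rng(0).random((n, 2))
lhs_2d = qmc.LatinHypercube(d=2).random(n)
sobol_2d = qmc.Sobol(d=2, scramble=False).random_base2(m=m)

# Lower discrepancy means the points fill the unit square more evenly.
disc = {
    "MC": qmc.discrepancy(mc_2d),
    "LHS": qmc.discrepancy(lhs_2d),
    "Sobol": qmc.discrepancy(sobol_2d),
}
for name, value in disc.items():
    print(f"{name:6s} discrepancy = {value:.3e}")
```

Drawing the sample with `random_base2` keeps the sample size a power of 2, which preserves the balance properties of the Sobol sequence.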
 In practice, the number N of quasi-Monte Carlo samples needs to be determined for a sampling-involved problem, such as numerical integration, exploratory sensitivity analysis, and parameter estimation (inversion). The number normally is a power of 2, and is usually chosen as a trade-off between computational time and numerical error. Because CLM4 spin-ups and simulations are computationally demanding, it was not practical to run thousands of simulations in this study. There was also a concern about the reliability of the developed relationships between the output responses (e.g., latent heat flux, sensible heat flux, and total runoff) and the independent variables (see Table 1). Therefore, tests were performed for different study sites (e.g., U.S.-ARM and U.S.-SO2, where output sensitivities to input parameters are relatively strong) to determine the number of QMC samples (up to 512 samples) needed for reliable outputs; i.e., the significance levels of the parameters were relatively stable. Based on the tests, we have confirmed that output statistics and sensitivity based on 128 samples are comparable to those based on 512 samples.
2.3.3. Significance Tests
 The purposes of the statistical analyses are to test the statistical significance of the parameters and to rank them with regard to each output response. We use generalized linear model (GLM) analyses and an AIC (Akaike's information criterion, see Akaike [1974])-based backward removal approach to identify the significant parameters for each month (averaged over multiple years) for each output variable at each field site. The GLM allows for transformations and combinations of multiple dependent variables, which provides more flexibility than regression approaches that are inherently univariate and deal with a single dependent variable [McCullagh and Nelder, 1989; Venables and Ripley, 2002]. Another advantage of the GLM is its ability to analyze effects of repeated measure factors or categorical parameters, which makes it suitable for evaluating effects of continuous parameters as well as categorical factors such as soil texture and climate conditions. The GLM can also provide a solution for the normal equations when the variables are not linearly independent and can deal with the interaction effects of independent variables. There are other approaches for studying input-output relationships, such as surrogate models including polynomial response surfaces, Kriging, support vector machines, and artificial neural networks. In practice, the nature of the true function is not known, so it is hard to determine which surrogate model will be the most accurate. Compared to these approaches, the GLM has the advantage of performing strict statistical tests of significance of inputs with well-defined statistical theories, while producing reduced-order models with nonlinear and/or interaction effects.
 A GLM is fitted with the following starting model:
$$Y_i = \sum_{j} \theta_j\,p_{i,j} + \varepsilon_i \qquad (3)$$
where $p_{i,j}$ represents the ith realization of the jth parameter, which can be an original or transformed first-order, two-way interaction, or higher-order term, $\theta_j$ is the fitted coefficient for the jth parameter, $\varepsilon_i$ is the model-fitting residual, and $Y_i$ represents the ith realization of the response variable, for example, CLM4-simulated latent heat flux at a particular time and location. Among the parameters in Table 1, we use a logarithmic transformation for the parameters Ks, Ψs, and Qdm, because they vary by several orders of magnitude and typically follow lognormal distributions.
 This model assumes the response variable (e.g., simulated latent heat flux for a given month) is a linear combination of these aforementioned parameters/factors, and the model-fitting residuals ε follow independent normal distributions with mean 0 and variance σ². The above model can be fitted using a GLM approach in the Gaussian family.
 The statistical significance of the input parameters can be evaluated through null hypothesis tests. Let X denote the design matrix (the matrix of independent variables, e.g., the input parameters as shown in Table 1), Y denote the “response variable” (e.g., latent and sensible heat fluxes) matrix, and $\hat{\theta} = (X^{T}X)^{-1}X^{T}Y$ denote the estimates of the GLM-fitted coefficients. Given the residual sum of squares $\mathrm{RSS} = (Y - X\hat{\theta})^{T}(Y - X\hat{\theta})$ and the sample variance $S^{2} = \mathrm{RSS}/(n - p)$, the corresponding variance–covariance matrix for the parameter estimation is given by $\mathrm{Cov}(\hat{\theta}) = S^{2}(X^{T}X)^{-1}$. Note that n is the number of output responses, and p is the number of parameters. The corresponding standard error for the parameter estimation is given by $\mathrm{s.e.}(\hat{\theta}_i) = S\sqrt{[(X^{T}X)^{-1}]_{ii}}$. The t-statistic value for testing the null hypothesis that $\theta_i = 0$ is given by $t = \hat{\theta}_i/\mathrm{s.e.}(\hat{\theta}_i)$. The P-value for the test is $2[1 - t_{n-p}(|t|)]$, where $t_{n-p}$ denotes the cumulative distribution function of Student's t distribution with n − p degrees of freedom. If the P-value is larger than the significance level of the test (e.g., 0.05 or 0.1), one cannot reject the null hypothesis that $\theta_i = 0$, which means the corresponding basis function is insignificant [McCullagh and Nelder, 1989; Venables and Ripley, 2002].
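The test statistics described above can be sketched for a Gaussian-family model fitted by ordinary least squares. The code assumes NumPy and SciPy; the function name `gaussian_glm_t_tests`, the generic predictors `p1` to `p3`, and the synthetic response are hypothetical and are not CLM4 output.

```python
import numpy as np
from scipy import stats


def gaussian_glm_t_tests(X, y):
    """Least-squares fit with standard errors, t statistics, and P-values."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    theta_hat = xtx_inv @ X.T @ y                 # fitted coefficients
    residuals = y - X @ theta_hat
    rss = float(residuals @ residuals)            # residual sum of squares
    s2 = rss / (n - p)                            # sample variance
    std_err = np.sqrt(s2 * np.diag(xtx_inv))      # from the covariance matrix
    t_stat = theta_hat / std_err
    # Two-sided P-value: 2 * [1 - CDF_t(|t|)] with n - p degrees of freedom.
    p_value = 2.0 * stats.t.sf(np.abs(t_stat), df=n - p)
    return theta_hat, std_err, t_stat, p_value


# Synthetic example: 128 samples, three generic predictors (hypothetical).
rng = np.random.default_rng(0)
n = 128
p1 = rng.uniform(0.1, 5.0, n)
p2 = rng.uniform(-6.0, -1.0, n)   # stands in for a log-transformed parameter
p3 = rng.uniform(0.0, 1.0, n)     # has no effect on the response below
X = np.column_stack([np.ones(n), p1, p2, p3])
y = 30.0 + 3.0 * p1 - 2.0 * p2 + rng.normal(0.0, 1.0, n)

theta_hat, std_err, t_stat, p_value = gaussian_glm_t_tests(X, y)
for name, th, pv in zip(["intercept", "p1", "p2", "p3"], theta_hat, p_value):
    print(f"{name:9s} theta = {th:8.3f}  P = {pv:.3g}")
```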
 In order to compare all the fitted models, the Akaike's information criterion (AIC) can be used as a selection criterion. AIC [Akaike, 1974] is a measure of the goodness of fit of an estimated statistical model. It is grounded in the concept of information entropy, and is a measure of the information lost when a given model is used to describe reality. It describes the tradeoff between bias and variance in model construction, or between accuracy and complexity; therefore the AIC values provide a means for model selection. For each starting model, a stepwise backward removal approach is used to obtain a finalized model with minimum AIC, which yields the best compromise of the response misfit and the number of input parameters to be included.
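A minimal sketch of AIC-based backward removal follows, assuming Gaussian residuals so that AIC can be written, up to an additive constant, as n ln(RSS/n) + 2k. The helper names (`gaussian_aic`, `backward_aic`), the rule of always retaining the intercept, and the synthetic data are assumptions made for illustration rather than the procedure used in the study.

```python
import numpy as np


def gaussian_aic(X, y):
    """AIC of a least-squares fit, up to an additive constant."""
    n = len(y)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ theta) ** 2))
    k = X.shape[1] + 1  # coefficients plus the residual variance
    return n * np.log(rss / n) + 2 * k


def backward_aic(X, y, names):
    """Drop one term at a time while doing so lowers the AIC."""
    keep = list(range(X.shape[1]))
    best = gaussian_aic(X[:, keep], y)
    while True:
        trials = [
            (gaussian_aic(X[:, [c for c in keep if c != d]], y), d)
            for d in keep
            if names[d] != "intercept"  # the intercept is always retained
        ]
        if not trials:
            break
        aic, drop = min(trials)
        if aic >= best:
            break
        best = aic
        keep.remove(drop)
    return [names[c] for c in keep], best


# Synthetic example (hypothetical): p4 has no effect on the response.
rng = np.random.default_rng(1)
n = 128
P = rng.uniform(0.0, 1.0, (n, 4))
names = ["intercept", "p1", "p2", "p3", "p4"]
X = np.column_stack([np.ones(n), P])
y = 30.0 + 10.0 * P[:, 0] - 8.0 * P[:, 1] + 4.0 * P[:, 2] + rng.normal(0.0, 1.0, n)

start_aic = gaussian_aic(X, y)
selected, final_aic = backward_aic(X, y, names)
print("selected terms:", selected)
```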
 The GLM analyses can also be applied to evaluate the effects of categorical factors such as soil texture, plant functional type, and climate condition. For example, we can use a measure of uncertainty/variability (e.g., variance, standard deviation) of outputs, which results from the input uncertainty/variability, and treat the output variability as the response variable and the categorical factors as explanatory variables. The GLM analyses with this model setup yield quantitative measures of how these factors control the parameter sensitivities.
 Residual analyses are usually performed to check whether a fitted GLM is reasonable. A good fit should have residuals uncorrelated with the inputs, which can be evaluated using scatterplots between the residuals and inputs. The assumption of normal distribution of the residuals also needs to be evaluated. We adopt the Q-Q plot (“Q” stands for “quantile”), a graphical approach comparing two probability distributions by plotting their quantiles against each other.
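The two residual checks can be sketched with the Python standard library only. The code pairs sorted residuals with standard normal quantiles (the points of a Q-Q plot) and summarizes their straightness with a correlation coefficient; that summary statistic, the plotting positions (i + 0.5)/n, and the synthetic residuals are assumptions added here, since the source describes only the graphical check.

```python
import math
import random
from statistics import NormalDist, mean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def qq_points(residuals):
    """Theoretical normal quantiles paired with the sorted residuals."""
    n = len(residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, sorted(residuals)


rng = random.Random(0)
inputs = [rng.random() for _ in range(128)]                      # one input
normal_resid = [rng.gauss(0.0, 1.0) for _ in range(128)]         # well behaved
skewed_resid = [rng.expovariate(1.0) - 1.0 for _ in range(128)]  # skewed

# Residuals of a good fit should be uncorrelated with the inputs.
corr_with_input = pearson(inputs, normal_resid)

# A Q-Q plot close to a straight line gives a correlation near 1.
ppcc_normal = pearson(*qq_points(normal_resid))
ppcc_skewed = pearson(*qq_points(skewed_resid))
print(corr_with_input, ppcc_normal, ppcc_skewed)
```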
3.1. Output Statistics and Parameter Sensitivity at Selected Sites
 As explained in the methodology section, the number of quasi-Monte Carlo samples is determined according to computational demand, as well as stability of the developed relationships between the responses and independent variables. For the CLM4 outputs (e.g., latent heat fluxes (LH), sensible heat fluxes (SH), and total runoff), we found that 128 samples were adequate to give reliable outputs and consistent results across the 13 flux tower sites. As shown in Figure 3, the 10-dimensional probability space is well covered and effectively sampled by the quasi-Monte Carlo samples. The sample parameter sets are generated by projecting these samples back to the parameter space following the MRE prior distributions of the 10 parameters [Hou and Rubin, 2005; Hou et al., 2006] shown in Figure 4, which take various forms, such as exponential, Gaussian, and truncated Gaussian distributions, given the prior information in Table 1. Figure 5 shows the realizations of the generated samples in physical space as percentiles of the quasi-Monte Carlo samples projected onto the MRE prior pdfs.
 When generating samples from non-uniform distributions, we projected uniformly sampled probabilities to the pdf to get realizations. The histogram of the realizations with equal weights mimics the pdf to be sampled from. We assign equal weight to each sample set of the input parameters, as well as the output responses (LH, SH, runoff) for each parameter set among the ensemble outputs. The direct CLM4 simulated outputs are time series of LH/SH/runoff at high temporal frequency (e.g., 30 min). In order to reduce the number of data points and focus on long-term statistics, we use mean monthly averages of the outputs across the simulation periods. Monthly averaging removes short-term variability but allows possible effects of the input parameters on the output variables at the monthly to seasonal time scale to emerge.
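The projection from uniformly sampled probabilities to a prior pdf can be sketched with an inverse-CDF transform. The example below uses only the standard library and makes several assumptions that should be read as illustrative: a one-dimensional base-2 van der Corput sequence stands in for the Sobol sequence, the truncated Gaussian is parameterized by the location and scale of its parent normal (which only approximates the MRE mean and STD constraints when truncation is mild), and the mean of 0.4 for fmax is a hypothetical site value. The STD and bounds are those listed for fmax in Table 1.

```python
from statistics import NormalDist


def van_der_corput(i, base=2):
    """i-th point of the van der Corput low-discrepancy sequence."""
    q, denom = 0.0, 1.0
    while i > 0:
        i, r = divmod(i, base)
        denom *= base
        q += r / denom
    return q


def truncnorm_ppf(u, loc, scale, lower, upper):
    """Map a probability u in (0, 1) to a truncated Gaussian quantile."""
    parent = NormalDist(loc, scale)
    p_lo, p_hi = parent.cdf(lower), parent.cdf(upper)
    return parent.inv_cdf(p_lo + u * (p_hi - p_lo))


# 128 well-dispersed probabilities in (0, 1).
u = [van_der_corput(i) for i in range(1, 129)]

# fmax prior: hypothetical mean 0.4, STD 0.160, bounds 0.01 to 0.907.
samples = [truncnorm_ppf(ui, 0.4, 0.160, 0.01, 0.907) for ui in u]

print("min, max:", min(samples), max(samples))
print("sample mean:", sum(samples) / len(samples))
```

Because every realization is obtained from an equally spaced probability, each sample carries equal weight and the histogram of the realizations follows the target pdf.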
 We summarized the output statistics using boxplots as shown in Figure 6. The boxplots indicate: (1) whether an input parameter (or independent variable) has positive/negative relationship with the output response variable (or dependent variable) and whether the relationship is nonlinear, (2) how the input uncertainty propagates through the forward model (i.e., CLM4).
Figure 6 shows the output statistics of LH in four months with respect to three important parameters fdrai, Qdm, and Sy, at the U.S.-ARM site, which is covered by cropland with a soil texture of clay. Figure 6 shows that in general, LH increases with fdrai when it is around 2 [m−1] or smaller.
 Note that 1/fdrai represents the effective storage capacity of the subsurface aquifer used in the subsurface runoff generation. A smaller fdrai value corresponds to a larger storage capacity, more subsurface runoff generation given the same groundwater table depth/soil moisture status, a quicker depletion of deep layer soil moisture, and therefore a larger percolation from the surface layers to recharge the deep layers [Beven, 1997; Iorgulescu and Musy, 1997; Kirkby, 1997; Li et al., 2011] and less available water in shallow soil layers. Therefore, for shallow-rooted sites covered by grasslands or croplands such as U.S.-ARM, a smaller value of fdrai would result in less evapotranspiration from shallow soil layers in the long run. However, when fdrai is high, especially when it is larger than 2, less water will leave the soil column as subsurface runoff, so that fdrai is no longer a limiting factor for soil moisture in the shallow soil layers and, hence, for evapotranspiration. Under such circumstances, LH has a weak but still positive correlation with fdrai when it is larger than 2 [m−1].
 Another observation from Figure 6 is the output uncertainty. The uncertainty of LH also behaves differently when fdrai falls in different ranges. When fdrai is around 2 or smaller, the predicted interval (range) of LH increases with fdrai; however, when fdrai is larger than 2, the range of LH is not affected by fdrai anymore. Note that in the boxplot, each box of LH corresponds to the same level of variation (e.g., order of magnitude) of the input parameter; therefore, the figure clearly shows how input uncertainty propagates through CLM4 to output uncertainty. Quantitatively, we can say that latent heat flux ranges between 20 and 40 W m−2 if fdrai is around or larger than 2. But when fdrai is smaller than 1, the uncertainty bound becomes 0–20 W m−2.
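The way a boxplot summarizes uncertainty propagation can be sketched by binning an ensemble on one input and computing the quartiles of the output within each bin. Everything in the example is synthetic and hypothetical: the saturating response function, the noise level, and the five equal-width bins are chosen only to show the mechanics and do not reproduce CLM4 results.

```python
import math
import random
from statistics import quantiles


def van_der_corput(i, base=2):
    """i-th point of the van der Corput low-discrepancy sequence."""
    q, denom = 0.0, 1.0
    while i > 0:
        i, r = divmod(i, base)
        denom *= base
        q += r / denom
    return q


def binned_quartiles(x, y, edges):
    """Quartiles of y for the samples whose x falls in each [lo, hi) bin."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        vals = [yi for xi, yi in zip(x, y) if lo <= xi < hi]
        q1, med, q3 = quantiles(vals, n=4)
        rows.append((lo, hi, q1, med, q3))
    return rows


# Synthetic ensemble: 128 input values spread over [0.1, 5).
rng = random.Random(0)
x = [0.1 + 4.9 * van_der_corput(i) for i in range(1, 129)]
# Hypothetical response that saturates for large x, plus uniform noise.
y = [30.0 * (1.0 - math.exp(-1.5 * xi)) + rng.uniform(-3.0, 3.0) for xi in x]

edges = [0.1 + 4.9 * k / 5 for k in range(6)]
box_stats = binned_quartiles(x, y, edges)
for lo, hi, q1, med, q3 in box_stats:
    print(f"[{lo:4.2f}, {hi:4.2f})  Q1 = {q1:5.1f}  median = {med:5.1f}  Q3 = {q3:5.1f}")
```

The interquartile range in each row plays the role of one box, so comparing rows shows how the output spread changes across the input range.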
Qdm is the maximum subsurface runoff rate. Again, higher Qdm means stronger infiltration, and water moves easily to the deeper layers. By transferring water away from the surface and the rooting zones, higher Qdm reduces evaporation, transpiration, and latent heat flux. It is not surprising that Qdm has a negative effect on LH, but it is interesting to see that LH is subject to larger uncertainty when Qdm falls in the middle range. A possible reason is that when Qdm is very low, the soil column could easily become saturated since the only way for water to leave the soil column is through surface runoff. Therefore, more water is available and evapotranspiration is no longer limited by water availability but rather by the energy input, so uncertainty in the latent heat flux would be low. Similarly, when Qdm is very high, soil water is depleted quickly in the form of subsurface runoff, leaving limited water available for evapotranspiration. With LH close to 0, its uncertainty range would be minimal.
Sy is specific yield, also known as the drainable porosity. It is a ratio, less than or equal to the effective porosity, indicating the volumetric fraction of the bulk aquifer volume that a given aquifer will yield when all the water is allowed to drain out of it under gravity. Higher Sy normally corresponds to more available water in the shallow subsurface and therefore higher latent heat fluxes. The effect of Sy on LH is consistent across different seasons: Sy has a positive effect on LH and the uncertainty ranges of LH are not affected by the value of Sy. For example, the April monthly average of LH has a range of 40 W m−2 when Sy varies by about 0.005, regardless of whether Sy is very high or low, although when Sy increases by 0.005, the median April LH increases by about 4 W m−2 in general. Detailed analyses on such relationships are discussed later in this paper.
 Similar analyses were also performed for the sensible heat flux (SH). LH and SH share similar temporal patterns, with high values during summer associated with the high incoming solar radiation. As expected, however, the effects of the three parameters fdrai, Qdm, and Sy on SH are the inverse of their effects on LH, because the sum of LH and SH is constrained by the energy budget calculations. For example, the temperature of a moist surface changes little when evaporation occurs, but once the surface has dried out, sensible heat flux can take over as surface temperature increases rapidly. Therefore, when Qdm is high, water drains out quickly, which results in increases in SH. Similarly, a very low fdrai reduces evapotranspiration, so the skin temperature can rise quickly and result in higher SH to the atmosphere.
 In contrast to Figure 6, which demonstrates the marginal effects of input parameters, Figure 7 shows the overall uncertainty ranges of LH and SH for the ensemble simulations. The calculated LH/SH using the default parameter values (the values originally hard-coded in CLM4) are shown as red circles, while observations of the fluxes are denoted by the green symbols. As can be seen in the figure, LH is in general overestimated in the default simulations, although observations lie within the prediction intervals/bounds of LH. This indicates that we have assigned reasonable physical bounds for the input parameters as the parameterizations intended and our sampling approach provided reasonable parameter values within the parameter space, such that the observations fall within the output possibilities. This result implies that the input parameters can be calibrated for better fit of the LH simulations with available observations. The predictive bounds for the summer LH and SH are much wider than for the cold seasons, which means the summer latent and sensible heat fluxes are more sensitive to the selected parameters. Moreover, deviations between the observations and the default simulations are larger during the warm seasons. Both findings tell us that data collected in the warmer seasons are more informative for parameter optimization and misfits during such time periods are expected to be improved the most.
 GLM analysis was performed and the final model was obtained through the AIC-based stepwise backward removal approach. The estimates using the July monthly average of latent heat flux as the response variable are displayed in Table 3. The estimates and standard errors in the table are for the coefficients θj in equation (3). The t-values and P-values were obtained as explained in the previous section. When a P-value is larger than the significance level (e.g., 0.05), one can say the corresponding variable (input parameter) is relatively insignificant. In Table 3, the level of significance is represented by the number of asterisks. This way, we have identified the significant parameters for the July LH at the U.S.-ARM site: among the 10 input parameters, fdrai, Qdm, and b are the most significant; Cs, Sy, and Ψs also passed the significance test, but the other four parameters fover, fmax, Ks, and θs are considered insignificant.
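 As an illustration of the AIC-based backward removal described above, the following Python sketch repeatedly drops the predictor whose removal lowers the AIC the most and stops when no removal helps. It is not the code used in this study: it assumes a Gaussian GLM with identity link (i.e., ordinary least squares), uses synthetic data in place of CLM4 output, and all function names (solve_linear, fit_aic, backward_stepwise) are hypothetical.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_aic(X, y, cols):
    """Fit intercept + selected columns by least squares; return the Gaussian AIC."""
    n = len(y)
    D = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(cols) + 1
    XtX = [[sum(D[i][a] * D[i][c] for i in range(n)) for c in range(p)]
           for a in range(p)]
    Xty = [sum(D[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve_linear(XtX, Xty)
    rss = sum((y[i] - sum(D[i][a] * beta[a] for a in range(p))) ** 2
              for i in range(n))
    # AIC up to an additive constant; the extra 1 counts the residual variance.
    return n * math.log(rss / n) + 2 * (p + 1)


def backward_stepwise(X, y):
    """Drop one predictor at a time while doing so lowers the AIC."""
    kept = list(range(len(X[0])))
    best = fit_aic(X, y, kept)
    while kept:
        trials = [(fit_aic(X, y, [c for c in kept if c != d]), d) for d in kept]
        aic, drop = min(trials)
        if aic >= best:
            break
        best, kept = aic, [c for c in kept if c != drop]
    return kept, best


# Synthetic stand-in for the ensemble: 4 scaled parameters, 200 samples.
# Only columns 0 and 1 influence the response; columns 2 and 3 are inert.
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(200)]
y = [20.0 + 10.0 * r[0] - 6.0 * r[1] + rng.gauss(0.0, 0.5) for r in X]

full_aic = fit_aic(X, y, [0, 1, 2, 3])
kept, final_aic = backward_stepwise(X, y)
print("retained columns:", kept)
print("AIC full vs. final:", round(full_aic, 1), round(final_aic, 1))
```

 In this sketch the two influential columns are always retained because removing either one inflates the residual sum of squares far more than the AIC penalty saves; whether an inert column is dropped depends on the noise realization.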
Table 3. Estimates and Standard Errors of Input Parameters of the Finalized GLM Model
Asterisks represent significance levels in the significance tests in section 2.3.3 and correspond to P-values: three asterisks represent [0, 0.001), two asterisks represent [0.001, 0.01), and one asterisk represents [0.01, 0.1).
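 The asterisk coding in the note above can be written as a small lookup. The sketch below is only a restatement of those three intervals in Python; the function name significance_stars is hypothetical.

```python
def significance_stars(p_value):
    """Map a P-value to the asterisk code given in the Table 3 note."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P-value must lie in [0, 1]")
    if p_value < 0.001:      # [0, 0.001)
        return "***"
    if p_value < 0.01:       # [0.001, 0.01)
        return "**"
    if p_value < 0.1:        # [0.01, 0.1)
        return "*"
    return ""                # not flagged


for p in (0.0005, 0.005, 0.05, 0.5):
    print(p, repr(significance_stars(p)))
```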
 The surface and subsurface runoff generation parameterizations in CLM4 are TOPMODEL-based. Although the parameter ranges and their default values listed in Table 1 were used in CLM4 for climate studies, Li et al. noted that those ranges/default values might be too wide/biased with respect to the physical meaning of the parameters. For example, Qdm, the maximum subsurface runoff, has a typical range of 0–100 mm d−1 in hydrologic applications [Huang and Liang, 2006; Huang et al., 2003], but ranges from 0 to 8640 mm d−1 are used in climate studies, as shown in Table 1. Such high values of drainage rarely occur under natural conditions. Even the default value of 5.5 × 10−3 kg m−2 s−1 (about 475 mm d−1) appears too high for typical hydrologic studies. However, the globally uniform default value and parameter ranges were chosen to yield reasonable global water and energy budgets in relatively coarse resolution global climate models (D. Lawrence and S. Swenson, personal communication, 2011) given the functional form of the TOPMODEL-based subsurface flow parameterization, as well as the overall structure of hydrologic parameterizations in CLM4. Interested readers are referred to detailed discussions in Li et al.
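 The comparison above mixes two unit systems (kg m−2 s−1 and mm d−1). The following Python sketch makes the conversion explicit, assuming a constant water density of 1000 kg m−3 so that 1 kg m−2 of water corresponds to a depth of 1 mm; the function names are hypothetical.

```python
SECONDS_PER_DAY = 86400.0
WATER_DENSITY = 1000.0  # kg m-3, assumed constant


def flux_to_mm_per_day(q):
    """Convert a water mass flux (kg m-2 s-1) to a depth rate (mm d-1)."""
    metres_per_second = q / WATER_DENSITY
    return metres_per_second * 1000.0 * SECONDS_PER_DAY


def mm_per_day_to_flux(depth_rate):
    """Convert a depth rate (mm d-1) to a water mass flux (kg m-2 s-1)."""
    return depth_rate / 1000.0 / SECONDS_PER_DAY * WATER_DENSITY


print(round(flux_to_mm_per_day(5.5e-3), 1))  # default Qdm, roughly 475 mm d-1
print(round(flux_to_mm_per_day(0.1), 1))     # roughly 8640 mm d-1
print(mm_per_day_to_flux(100.0))             # roughly 1.16e-3 kg m-2 s-1
```

 Under this assumption, the 8640 mm d−1 upper bound corresponds to 0.1 kg m−2 s−1, whereas the 100 mm d−1 upper bound typical of hydrologic applications corresponds to only about 1.2 × 10−3 kg m−2 s−1, which is below the default value.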
 To further evaluate the impact of parameter range on the conclusions from this study, we have conducted reduced-range numerical experiments on Qdm by repeating the analyses for a subset of simulations in which Qdm varies with an upper bound of 0.01 kg m−2 s−1 instead of 0.1 kg m−2 s−1 (results not shown). We found that changing the range of Qdm could change the output response variability and sensitivity pattern, but it does not fundamentally change our conclusions about which model parameters lead to larger model sensitivity. Although this finding strengthens the robustness of our conclusion, the parameter ranges should still be chosen carefully, because arbitrarily choosing a narrower range (the extreme case is a fixed value) can render the effects of a significant parameter negligible, making the sensitivity analyses inconclusive.
 Response surfaces of the output variables (e.g., LH, SH) with respect to the input parameters can help visualize the combined effects of the parameters, as shown in Figure 8. As we discussed earlier, although the time series of latent heat and sensible heat fluxes share similar seasonal patterns, their sensitivities to the input parameters usually exhibit inverse relationships. The response surface plots can be used for predicting the fluxes. From the figure, one can see that at the U.S.-Ne3 site, the June monthly average of latent heat flux is expected to be around 90 W m−2 if fdrai, Qdm, and Sy take the values of 4 m−1, 10−4 kg m−2 s−1, and 0.1, respectively.
3.2. Summary of Findings Over All Sites
 For each of the 13 study sites, we calculated 12 monthly averages of LH and SH throughout the corresponding simulation period, and overall there are 13 × 12 × 2 = 312 response variables. We conducted statistical tests and GLM analyses for each of the 312 output response variables, and recorded the number of occurrences of each input parameter passing the significance test. We summarized the results for different sites and different seasons, as shown in Figures 9, 10, and 11.
Figure 9 summarizes the significance of the input parameters for all 12 months across the 13 study sites. The LH score is defined as the percentage of cases, over all months and all sites, in which an input parameter passes the significance test. Three parameters, fdrai, Qdm, and Sy, are the most significant; b, fmax, and θs are of secondary significance; and the other parameters have weak impacts on the LH responses.
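 The score defined above is a simple tally over response variables. A minimal Python sketch is given below; the function name and the test outcomes are hypothetical, except that the July U.S.-ARM entry follows the significant parameters reported in section 3.1 (with psi_s standing for Ψs).

```python
from collections import Counter


def significance_scores(passed_by_response, parameters):
    """Percentage of response variables for which each parameter passes the test."""
    counts = Counter()
    for passed in passed_by_response.values():
        counts.update(set(passed))  # count each parameter once per response
    n = len(passed_by_response)
    return {p: 100.0 * counts[p] / n for p in parameters}


# Keys are (site, month); values list the parameters that passed the test.
# Only the first entry reflects a result stated in the text; the rest are made up.
passed = {
    ("U.S.-ARM", 7): ["fdrai", "Qdm", "b", "Cs", "Sy", "psi_s"],
    ("U.S.-ARM", 1): ["fdrai", "Qdm"],
    ("U.S.-Ne3", 6): ["fdrai", "Qdm", "Sy"],
    ("U.S.-Ha1", 7): ["fdrai", "Cs"],
}
parameters = ["fdrai", "Qdm", "Sy", "b", "Cs", "psi_s", "fmax"]

scores = significance_scores(passed, parameters)
for name in parameters:
    print(name, scores[name])
```

 In the full analysis the same tally would run over the 13 × 12 monthly LH responses (and likewise for SH) rather than over four illustrative entries.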
 The outputs show the largest sensitivity to the subsurface runoff generation parameters, which dominate the warm season processes, while soil texture related parameters are of secondary significance and surface runoff parameters are in general insignificant. These significance patterns could result from interactions between the biogeochemical and physical processes, or from an oversimplified groundwater module or boundary conditions. The patterns look reasonable given the mechanisms explained above for fdrai, Qdm, and Sy.
 The most significant parameters that we identified through the test differ from site to site, and can also vary from season to season for a single study site. From Figure 10, we can see that most sites have the same three parameters identified above as the most significant, but with several exceptions. The U.S.-Ha1 and U.S.-ARM sites have Cs as an important parameter, while at U.S.-ARM and U.S.-Dk2/Dk3 the Clapp and Hornberger exponent b is important. The site-specific patterns should be a result of the combined controls by soil texture, plant functional types (PFTs), and climate conditions. For example, at study sites with broadleaf and needleleaf PFTs, surface runoff generation parameters such as fmax, Cs, and fover are identified to be important, as they control the partitioning of precipitation between surface runoff and infiltration. For PFTs of croplands, grasslands, and closed shrublands, where the rooting zones are shallow, the vadose zone parameters (topsoil soil texture related) become more important than the surface runoff parameters. Regarding the impacts of soil properties, we observed that at field sites with finer soil texture, the air-entry pressure Ψs is more significant than at sites with coarser soil.
 The seasonal variations in the parameter sensitivity patterns are generally smaller than the site-wise changes (Figure 11). One clear observation is that during the summer, more parameters play important roles in determining the latent and sensible heat fluxes. In particular, fmax, fover, Ks, and θs all become more important because of the higher precipitation and temperature during the warm seasons at most sites.
3.3. Soil Texture and PFT Control on Parameter Sensitivity of Simulated Surface Fluxes
 Quantitative measures of the impacts of soil texture and PFT on parameter sensitivity are shown in Figure 12. The LH variability is calculated as the standard deviation of latent heat fluxes for each month. It represents the output uncertainty as input parameter uncertainty propagates through CLM4. Larger LH variability means stronger parameter sensitivity. Figure 12 shows that finer soil texture yields stronger sensitivity of LH responses to the input parameters. The sensitivity decreases as soil texture becomes coarser from clay to clay loam and from silty clay loam to sandy loam. A field site with soil texture of sandy loam facilitates the distribution and discharge of precipitation and tends to have more uniformly distributed water in the rooting zones. This reduces variability of the LH and SH responses.
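 The variability measure used here can be computed directly from the ensemble output. The Python sketch below assumes the sample standard deviation (n − 1 denominator) taken across ensemble members for each month; the text does not state which estimator was used, and the function name and flux values are hypothetical.

```python
import statistics


def monthly_lh_variability(ensemble):
    """Standard deviation of monthly-mean LH across ensemble members.

    ensemble[k][m] is the monthly-mean LH (W m-2) of member k in month m.
    Returns one value per month.
    """
    n_months = len(ensemble[0])
    return [
        statistics.stdev([member[m] for member in ensemble])
        for m in range(n_months)
    ]


# Three hypothetical ensemble members, two months (e.g., a cold and a warm month).
ensemble = [
    [10.0, 80.0],
    [12.0, 100.0],
    [14.0, 120.0],
]
variability = monthly_lh_variability(ensemble)
print(variability)  # [2.0, 20.0]
```

 In this toy ensemble the warm month has ten times the spread of the cold month, mirroring the wider summer prediction bounds discussed in section 3.1.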
 The effects of PFTs reflect the indirect impacts of the climate conditions. For example, closed shrublands and grasslands occur in relatively dry conditions, while broadleaf and needleleaf sites are usually wet. Also, evergreen needleleaf is more adapted to colder climates than deciduous broadleaf. Figure 12 shows that sites with shallow rooting zones (dry situations) tend to have larger parameter sensitivity, while those with significant rooting depths are stable and relatively insensitive to input parameter variations. This is reasonable because in dry conditions evapotranspiration is more sensitive to water availability, which depends strongly on the model parameters, whereas evapotranspiration from plants with deep roots is insensitive to the decay parameters such as fover and fdrai and to the vadose zone soil parameters. Deciduous broadleaf sites have parameter sensitivity similar to that of evergreen needleleaf sites during winter. During summer, however, the former group yields much stronger parameter sensitivity; variability of the inputs is translated into larger variability of the outputs given the climate forcing, because deciduous plants are more actively involved in the evapotranspiration processes during summer.
 Besides the main effects of the input parameters, we performed GLM two-way interaction analyses to evaluate the existence and significance of parameter interaction effects. In general, there do not appear to be consistent interaction patterns across the field sites and seasons. We compared the performance of the finalized GLMs with and without interaction terms using Q-Q plots, as shown in Figure 13. The residuals with only the main effects included are in fact normally distributed. Although the fit with the interaction effect yields a slightly lower AIC value, there are some outliers in the residuals that violate the normality assumption.
4. Summary and Future Work
 This paper describes a sensitivity analysis framework to analyze the input parametric sensitivity and quantify uncertainty in LSMs. By applying the framework to CLM4-SP, our goal is to demonstrate the use of the framework to quantify the sensitivity of simulated surface fluxes to hydrologic parameters in CLM4, which is an important step toward quantifying uncertainty of simulating the hydrologic cycle by CLM4 and calibrating model parameters to improve model skill.
 Our UQ framework features an entropy approach to quantify uncertainty in the input parameters. With this approach, the uncertainty associated with the calculated responses is the most representative of our knowledge of the system. Our approach incorporates an efficient sampling method to explore the parameter space so that the output statistics cover most, if not all, of the realistic possibilities. The approach uses multivariate generalized linear model analyses and statistical significance tests to rank the significance of input parameters and to develop relationships between inputs and outputs using response surface plots and the finalized linear models. It is worth mentioning that the uncertainty quantified in this study is input parameter uncertainty and the corresponding output variations through CLM4 forward modeling, but not the uncertainty associated with parameter predictions using observations through inverse modeling.
 The outputs investigated show the largest sensitivity to subsurface runoff generation parameters, which dominate the warm season processes. Soil texture related parameters are of secondary significance, while surface runoff parameters are in general insignificant, partly due to the biased mode and distribution used in the prior knowledge of the subsurface runoff generation parameters employed as default values in climate studies. The sensitivity of simulated water and energy fluxes to hydrologic parameters varies under different site and climate conditions, as well as over different seasons. Study sites with PFTs of deep root systems are more affected by surface runoff parameters, while sites with shallow root zones are more sensitive to vadose zone soil water parameters. Some parameters, such as air-entry pressure, become particularly important for finer soil with low conductivity. Both surface runoff and soil water parameters are generally more important during the warm seasons with high precipitation and temperature.
 The impacts of soil texture, PFTs, and climate conditions on output uncertainty are also evaluated through categorical factor analyses. Sites with finer soil texture and shallower rooting systems tend to have stronger parametric sensitivity. That is, variation/uncertainty in the input parameters is translated into larger variability of the output responses such as latent and sensible heat fluxes.
 We stress that the purpose of this study is to assess how model simulated surface fluxes are sensitive to model parameters, rather than to assess how surface fluxes are sensitive to surface and subsurface hydrologic processes in the real world. The former sensitivity is a property of the model alone (e.g., how physical processes are parameterized, how numerical solutions are solved) so model output variability is used to assess model sensitivity. The latter sensitivity is a property of the physical system, so accounting for model biases is important when using models to estimate sensitivity in the real world.
 Nevertheless, by including comparisons with observations (e.g., Figure 7), this study also provides guidance on reduction of parameter set dimensionality and demonstrates the necessity and possibility of parameter inversion/calibration for CLM4 when observations of latent/sensible heat fluxes are available. For example, one can choose to use latent heat flux observations from the warm season for parameter estimation and focus only on the significant parameters. When only a subset of parameters is inverted, the inversion problem becomes less ill-posed and therefore might yield more reliable and accurate solutions.
 Because the ultimate purpose of our project is to quantify uncertainties in climate models for climate predictions and assessment, the parameter ranges and their default values in this study were selected based on those typically used for climate modeling. However, as we pointed out in section 3.1, more realistic parameter ranges exist and should be seriously considered for future CLM4 applications and sensitivity studies.
 We acknowledge that by focusing on mean monthly climatology, we chose to evaluate the sensitivity of first-order dynamics in the climate system in this study, but ignored important shorter time-scale processes such as the diurnal cycles of the fluxes and the daily scale variations of soil moisture in response to rainfall pulses, which could have significant implications for boundary layer evolution and cloud formation, as well as for hydrologic processes such as flooding events. We also acknowledge that snow and snowmelt are important processes in the hydrologic cycle and that parameters associated with such processes should be taken into account for a complete understanding of the uncertainties associated with hydrologic parameters in CLM4. We will explore such topics in future studies.
 To more fully evaluate the sensitivity of CLM4 to input parameters, we will expand the study to investigate nonlinear effects of the input parameters as well as the effect of parameter interactions on the output responses. The UQ framework will be applied to more flux tower sites to span a wider range of climate and site conditions. Watersheds from the Model Parameter Estimation Experiment (MOPEX) [Duan et al., 2006] will also be included to provide measurements of runoff to further assess the potential for model calibration using runoff data. These studies can provide further guidance on which parameters to focus on for either uncertainty quantification or parameter estimation. A companion study to compare model sensitivity to the selected hydrologic parameters evaluated in this study with (using CLM4-CN) and without (using CLM4-SP as in this study) carbon/nitrogen cycles will be reported in a separate paper, evaluating the importance of long-term feedbacks between vegetation growth and hydrology.
 This work is supported by the DOE project “Climate Science for a Sustainable Energy Future” and the PNNL Integrated Regional Earth System Modeling (iRESM) Initiative. The authors would like to thank the NACP site synthesis team for providing the flux tower data sets, and D. Lawrence, S. Swenson, and G.-Y. Niu for their comments and suggestions. Data collection at the U.S.-ARM was supported by the Office of Biological and Environmental Research of the U.S. Department of Energy under contract DE-AC02-05CH11231 as part of the Atmospheric Radiation Measurement Program. Data from U.S.-NR1 were provided by P. Blanken and S. Burns and are supported by a grant from the National Science Foundation (NSF) Long-Term Research in Environmental Biology (LTREB). PNNL is operated for the U.S. DOE by Battelle Memorial Institute under contract DE-AC06-76RLO1830.