Optimization and uncertainty estimates of WMO regression models for the systematic bias adjustment of NLDAS precipitation in the United States

Authors

  • Youlong Xia

    1. Geophysical Fluid Dynamics Laboratory, NOAA, and Atmospheric and Oceanic Science Program, Princeton University, Princeton, New Jersey, USA
    2. Now at Environmental Modeling Center, National Centers for Environmental Prediction, NOAA, Camp Springs, Maryland, USA.

Abstract

[1] World Meteorological Organization (WMO) regression models for precipitation gauge bias developed by Goodison et al. (1998) were optimized using the very fast simulated annealing algorithm. The regression model uncertainties were estimated by use of a Bayesian stochastic inversion (BSI) algorithm. Legates and Willmott's (1990) precipitation correction factor database (applicable to average monthly conditions) was used to constrain model parameters. Daily wind speed, air temperature, and precipitation from the North American Land Data Assimilation System (NLDAS) were used as input for the WMO regression models in the United States. The results show that the optimal regression model is reasonably bounded by the WMO Alter-shielded and unshielded models for both rain and snow. The optimized regression model, aside from reproducing the Legates-Willmott average monthly adjustment factors reasonably well, also describes daily and interannual variation of precipitation correction factors. The relations among model parameters and model uncertainties, including regression parameter uncertainty and input data uncertainty, are examined. The results show strong relations between regression model uncertainties and uncertain NLDAS wind speed. Uncertainty of NLDAS data has little effect on optimization of the WMO regression model. However, it has significant effects on the uncertainty of the regression model parameters and the precipitation correction factors.

1. Introduction

[2] Precipitation is one of the most influential atmospheric variables for simulations of land surface water balance, because it sets the scale of all other water fluxes. The partitioning of precipitation into evapotranspiration, runoff and soil moisture storage is a complicated nonlinear process that depends upon soil moisture and snow accumulation, and spatial and temporal distributions of precipitation and snowmelt processes. Recently, Lohmann et al. [2004] simulated three years (1997–1999) of streamflow and water balance in the United States with four land surface models driven by precipitation from the North American Land Data Assimilation System (NLDAS) [Mitchell et al., 2004]. They used measured streamflow data from 1154 U.S. Geological Survey (USGS) gauges to evaluate the ability of the four land surface models to capture temporal and spatial variations of streamflow. All four models were found to underestimate streamflow in areas with significant snowfall, such as the northern Rocky Mountains [Lohmann et al., 2004]. The main reason for this underestimation was undercatch of snowfall in the area [Sheffield et al., 2003; Pan et al., 2003]. When snowfall was increased by a constant factor of 2.17, most of the errors caused by snowfall undercatch were significantly reduced. Clearly, large systematic bias in gauge measurements of snowfall greatly hampers simulation of streamflow and water balance and evaluation of land surface models.

[3] The World Meteorological Organization (WMO) Solid Precipitation Measurement Intercomparison [Goodison et al., 1998] has evaluated the relative biases of standard precipitation gauges using an extremely rigorous method [Yang et al., 1998a]. The WMO organizing committee for the measurement intercomparison designed the octagonal vertical double fence, surrounding a shielded Tretyakov gauge, to measure “true” precipitation data in a range of climatic conditions. These “true” precipitation data were compared with data measured by standard gauges. Two types of gauges, Alter-shielded and unshielded U.S. 8" nonrecording gauges, were compared and evaluated in the United States. On the basis of this comparison, Goodison et al. [1998] and Yang et al. [1998a, 1998b] developed regression models for different types of precipitation (i.e., snow, mixed precipitation, rainfall) and gauges (i.e., Alter shielded and unshielded).

[4] The U.S. precipitation gauge network contains a variety of shielded and unshielded gauges (i.e., weighing, tipping bucket, Fischer and Porter) installed at heights ranging from 0.9 to 7.6 m [Yang et al., 1998b]. Adam and Lettenmaier [2003], lacking detailed information on gauge type, shielding details, and height, used the regression models derived for unshielded gauges at a height of 1.1 m to adjust the U.S. precipitation for all gauges. Undoubtedly, this led to overall improvement in precipitation estimates. However, their adjustments probably overestimate precipitation for gauges with Alter shields and underestimate precipitation for other gauges [Adam and Lettenmaier, 2003], leading to area mean biases of unknown sign and magnitude. To overcome this defect, in this study the equations used for bias adjustments are not assumed a priori. Only their functional form is assumed, and the parameters are determined so as to minimize departures from the average monthly mean adjustments determined by Legates and Willmott [1990]. The minimization is accomplished by use of the very fast simulated annealing (VFSA) optimization algorithm, which has been widely used to search for optimal parameters in solid geophysics [Sen and Stoffa, 1996], simple paleoclimate modeling [Jackson et al., 2004], and land surface modeling [Jackson et al., 2003; Xia et al., 2004a, 2004b].

[5] The Legates and Willmott [1990] correction factor database is selected because it is used by the Global Precipitation Climate Centre (GPCC, http://www.dwd.de/en/FundE/Klima/KLIS/int/GPCC/) to scale global gridded precipitation [Huffman et al., 1997] and it is applicable to a wide range of gauge types. The database is based on monthly rather than daily meteorological data. However, the results obtained by Ungersboeck et al. [2000] using the methods of Rubel and Hantel [1999] for 2 years (1996–1997) of daily meteorological data showed that monthly results were not significantly different from Legates and Willmott's [1990] results in the United States.

[6] In order to derive optimal regression models, the following are used as input data: NLDAS surface precipitation, 2 m air temperature, daily wind speed at 10 m, together with Legates and Willmott's [1990] correction factors. The VFSA algorithm was then used to optimize WMO regression model parameters. Through this process an optimal regression model was derived together with other optimal parameters such as gauge height, and the derived optimal regression model was used to adjust the NLDAS 0.125° gridded precipitation product. This adjustment reflects the known effects of measurement biases because the root mean square error between the derived and Legates and Willmott's [1990] correction factors is minimal in the United States. Clearly, this bias adjustment method still includes many uncertainties, such as those in the wind speed, temperature, and precipitation of the NLDAS database, together with WMO regression model uncertainty. In addition, temporal and spatial scale issues result in additional uncertainties (these issues are discussed in sections 4.5 and 5.4, respectively).

[7] The Bayesian stochastic inversion (BSI) algorithm has been used to estimate the uncertainty of paleoclimate models [Jackson et al., 2004] and land surface models [Xia et al., 2004a, 2004b]. In this study, BSI was also used to estimate WMO regression model uncertainty, gauge height uncertainty, and NLDAS input data uncertainty. Finally, correction factor uncertainty and adjusted precipitation uncertainty were estimated.

[8] This paper is organized as follows: Section 2 gives a brief description of the VFSA and BSI methodology, the regression model with uncertainty parameters, and NLDAS data, and section 3 describes the experiment design. Section 4 derives optimal regression models and analyzes optimal adjustment of NLDAS. Section 5 estimates the uncertainties coming from different uncertainty sources. Conclusions are given in section 6.

2. Data, Model, and Methods

2.1. Data

[9] Two databases were used in this study. They were the NLDAS database and Legates and Willmott's [1990] correction factors (hereafter called LW). The NLDAS database was used as input data for the WMO regression model, and the LW database was used as a reference database. The monthly LW global database has a 1° resolution, with 12 monthly correction factors for each grid box.

[10] Seven years of NLDAS data (1997–2003), covering a part of North America from 65°W to 120°W, and from 25°N to 52°N were obtained from the NLDAS project [Mitchell et al., 2004]. These data include hourly downward solar radiation, downward longwave radiation, surface pressure, wind speed at 10 m, surface air temperature at 2 m, specific humidity at 10 m, and precipitation, at a 0.125° resolution. Detailed description of the data set is given by Cosgrove et al. [2003]. Daily temperature at 2 m, daily wind speed at 10 m and daily precipitation calculated from this database were used in this study.

2.2. WMO Regression Model

2.2.1. Regression Equations

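[11] Following Goodison et al. [1998] and Yang et al. [1998a, 1998b], the WMO regression equations for snow, mixed precipitation and rain are given as follows:

Snow

\[ CF_s = \frac{100}{\exp\left[4.61 - a\,V(h)^{b}\right]} \]

Mixed precipitation

\[ CF_m = \frac{100}{101.0 - c\,V(h)} \]

Rain

\[ CF_r = \frac{100}{\exp\left[4.61 - d\,V(h)^{e}\right]} \]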

where a, b, c, d, e are adjustable parameters for Alter-shielded and unshielded gauge types (see Table 1), V(h) is the wind speed at gauge height h; CFs, CFm, and CFr are daily correction factors for snow, mixed precipitation and rain, respectively.
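
As an illustration, the three correction factors can be evaluated with a few lines of Python. This is a minimal sketch that assumes the functional forms listed in the footnotes of Table 2; the function and argument names are hypothetical and are not taken from the original study.

```python
import math


def cf_snow(wind, a, b):
    """Daily snow correction factor: 100 / exp(4.61 - a * V(h)**b)."""
    return 100.0 / math.exp(4.61 - a * wind ** b)


def cf_mixed(wind, c):
    """Daily mixed-precipitation correction factor: 100 / (101.0 - c * V(h))."""
    return 100.0 / (101.0 - c * wind)


def cf_rain(wind, d, e):
    """Daily rain correction factor: 100 / exp(4.61 - d * V(h)**e)."""
    return 100.0 / math.exp(4.61 - d * wind ** e)


# Example: Alter-shielded (AM) default parameters from Table 2,
# evaluated for a gauge-height wind speed of 3 m/s.
wind_speed = 3.0
print(cf_snow(wind_speed, a=0.036, b=1.750))
print(cf_mixed(wind_speed, c=5.62))
print(cf_rain(wind_speed, d=0.041, e=0.690))
```

All three factors increase with wind speed, so a given gauge total is scaled up more on windy days; the snow factor grows fastest.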

Table 1. Descriptions and Ranges of Eight Parameters for the WMO Regression Model and Input Data
Parameter | Description | Minimum Value | Maximum Value
a | regression model parameter 1 for snow | 0.035 | 0.16
b | regression model parameter 2 for snow | 1.25 | 1.80
c | regression model parameter for mixed | 5.00 | 9.00
d | regression model parameter 1 for rain | 0.04 | 0.06
e | regression model parameter 2 for rain | 0.40 | 0.70
f1 | wind scaling factor | 0.70 | 1.30
f2 | gauge height (m) | 0.90 | 2.00
f3 | temperature correction factor (°C) | −2.00 | 2.00

[12] The wind speed at gauge height h was calculated from 10 m NLDAS grid wind speed using similarity theory (logarithmic profile):

equation image

where V(H) is the NLDAS daily wind speed at 10 m, and Z0 is the roughness parameter (m). According to Sevruk [1982] and Golubev et al. [1992], Z0 = 0.01 m for a winter snow surface and Z0 = 0.03 m for a short grass site in the summer are appropriate average roughness parameters for most sites. In this study, a roughness length of 0.01 m and 0.03 m was used for the colder (November to April) and warmer (May to October) halves of the year, respectively. The f1 and f2 are adjustable parameters for correction of NLDAS daily wind speed and gauge height. The ranges of these two parameters are listed in Table 1.
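
The wind reduction can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the standard neutral logarithmic profile, treats the wind correction as a multiplicative scaling of the 10 m wind, and treats the gauge height as a free argument. How the adjustable parameters enter the original equation is not recoverable from the text here, so the names `wind_scale` and `gauge_height` are hypothetical.

```python
import math


def roughness_length(month):
    """Seasonal roughness length (m): 0.01 for November-April, 0.03 for May-October."""
    return 0.01 if month in (11, 12, 1, 2, 3, 4) else 0.03


def wind_at_gauge_height(v_ref, gauge_height, z0, wind_scale=1.0, ref_height=10.0):
    """Reduce a reference-height wind speed to gauge height with a log profile.

    v_ref        wind speed at ref_height (m/s), e.g. the NLDAS 10 m wind
    gauge_height gauge orifice height (m), assumed adjustable
    z0           roughness length (m)
    wind_scale   assumed multiplicative correction of the input wind speed
    """
    return wind_scale * v_ref * math.log(gauge_height / z0) / math.log(ref_height / z0)


# Example: a 6 m/s wind at 10 m in January, reduced to a 1 m gauge height.
print(wind_at_gauge_height(6.0, 1.0, roughness_length(1)))
```

With a roughness length of 0.01 m, the wind at a 1 m gauge height is two thirds of the 10 m wind, which shows how strongly the assumed gauge height affects the correction factors.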

[13] Following Forland et al. [1996] and Rubel and Hantel [1999], the precipitation is identified as rain, mixed precipitation and snow as follows: Rain

equation image

Mixed

equation image

Snow

equation image

where T is the NLDAS daily air temperature at 2 m, and f3 is an adjustable parameter ranging from −2°C to 2°C according to Pan et al. [2003].

2.2.2. Calculation of Monthly Mean Correction Factors

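[14] Following Yang et al. [1998b], the bias-adjusted precipitation at each day and grid cell can be expressed as

Rain

\[ P_{ar} = CF_r\,(P_{gr} + \Delta P_{wr} + \Delta P_{er}) \qquad (6a) \]

Mixed

\[ P_{am} = CF_m\,(P_{gm} + \Delta P_{wm} + \Delta P_{em}) \qquad (6b) \]

Snow

\[ P_{as} = CF_s\,(P_{gs} + \Delta P_{ws} + \Delta P_{es}) \qquad (6c) \]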

where Pa is the adjusted daily precipitation, Pg is the NLDAS daily precipitation, ΔPw is wetting loss, ΔPe is evaporation loss, and CF is the daily correction factor calculated using the WMO regression equations described above. The subscripts r, m, and s denote rain, mixed precipitation and snow, respectively. Wetting loss is 0.15 mm/day for rain [Adam and Lettenmaier, 2003], and 0.075 mm/day for mixed precipitation and snow. Evaporation loss is 0.05 mm/day for all types of precipitation [see Forland et al., 1996]. For the period 1997 through 2003, precipitation was adjusted using equations (6a)–(6c) according to the type of precipitation for each day and each grid cell. The adjusted and unadjusted precipitation values were accumulated to get 7-year mean monthly totals, and the mean monthly correction factor (CCFijn) for each month and each grid cell was calculated as

\[ CCF_{ijn} = \frac{\bar{P}_{a,ijn}}{\bar{P}_{g,ijn}} \qquad (7) \]

where i and j are indices of the grid point in longitude and latitude direction, respectively, n is the month, \(\bar{P}_{g,ijn}\) is the mean monthly NLDAS precipitation, and \(\bar{P}_{a,ijn}\) is the mean monthly bias adjusted precipitation. Equation (8) (described in section 2.3) was used to calculate the error function.
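
The daily adjustment and the monthly aggregation for one grid cell can be sketched as below. This is a hedged illustration, not the original implementation: it assumes that the correction factor multiplies the sum of the gauge total and the wetting and evaporation losses, as in Yang et al. [1998b], and that losses are applied only on days with nonzero precipitation. The function names and the input layout are hypothetical.

```python
# Loss terms quoted in the text (mm/day).
WETTING_LOSS = {"rain": 0.15, "mixed": 0.075, "snow": 0.075}
EVAPORATION_LOSS = 0.05


def adjust_daily(p_gauge, cf, ptype):
    """Bias-adjusted daily precipitation (mm) for one grid cell.

    Assumption: dry days (p_gauge <= 0) stay dry, so no loss is added.
    """
    if p_gauge <= 0.0:
        return 0.0
    return cf * (p_gauge + WETTING_LOSS[ptype] + EVAPORATION_LOSS)


def monthly_correction_factor(days):
    """Ratio of accumulated adjusted to accumulated unadjusted precipitation.

    days: iterable of (p_gauge, cf, ptype) tuples covering every day of one
    calendar month over all years of the record.
    """
    days = list(days)
    total_gauge = sum(p for p, _, _ in days)
    total_adjusted = sum(adjust_daily(p, cf, ptype) for p, cf, ptype in days)
    if total_gauge <= 0.0:
        return float("nan")  # correction factor undefined for a dry month
    return total_adjusted / total_gauge


# Example with three synthetic days.
sample = [(10.0, 1.2, "rain"), (0.0, 1.5, "snow"), (5.0, 2.0, "snow")]
print(monthly_correction_factor(sample))
```

Because the ratio is formed from accumulated totals rather than averaged daily factors, wet days dominate the monthly correction factor.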

2.3. Error Function

[15] The error function (E) is defined as a root mean square error between reference data (LW) and the data calculated using WMO regression equations. It represents the mismatch between reference and calculated data. It is defined as

equation image

where N = 12 is the number of months, i, j are grid point indices in longitude and latitude directions, RCFijn and CCFijn are reference correction factors and calculated correction factors at each grid point and month, respectively. Here E is a function of adjustable parameters a, b, c, d, and e for the first experiment, and parameters a, b, d, f1, f2, f3 for the second experiment (see section 3).
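
A root mean square mismatch of this kind can be computed as in the following sketch. The normalization (dividing by the number of grid-month pairs) and the dictionary layout keyed by (i, j, n) are assumptions made for illustration; a different constant normalization would not change the location of the minimum.

```python
import math


def error_function(reference, calculated):
    """Root mean square difference between reference and calculated factors.

    Both arguments map (i, j, n) -> correction factor, where i and j are grid
    indices and n is the month. Only keys present in both inputs are compared.
    """
    keys = reference.keys() & calculated.keys()
    if not keys:
        raise ValueError("no common grid-month entries to compare")
    squared = sum((reference[k] - calculated[k]) ** 2 for k in keys)
    return math.sqrt(squared / len(keys))


# Example with two months at a single grid cell.
rcf = {(0, 0, 1): 1.0, (0, 0, 2): 2.0}
ccf = {(0, 0, 1): 1.0, (0, 0, 2): 4.0}
print(error_function(rcf, ccf))
```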

2.4. Very Fast Simulated Annealing

[16] The VFSA is an optimization algorithm. One may use the temperature construct within the Metropolis algorithm [Metropolis et al., 1953] to locate the global minimum of the error function by very slowly lowering the temperature parameter within

\[ P = \exp\left(-\frac{\Delta E}{T}\right) \]

where P is the probability of acceptance of a new parameter set with positive change of error function values, ΔE is the change of error function between new and previous parameter sets, and T is a control parameter analogous to temperature. If the change is negative, this new parameter set is accepted. If the change is positive, and if and only if P is less than a randomly generated number between 0 and 1, the new parameter set is rejected. This iterative selection process is analogous to the annealing process within a physical system where the lowest energy state between atoms or molecules is reached by the gradual cooling of the substance within a heat bath. Because of this physical analogy, the algorithm is called simulated annealing. In order to enhance the ability of simulated annealing to converge to the global minimum of the error function, Ingber [1989] introduced a new procedure for selecting parameter sets according to a temperature-dependent Cauchy distribution. This modified simulated annealing algorithm is called very fast simulated annealing. Ingber's algorithm can be described as follows.

[17] Let us assume that a model parameter mi at kth iteration (annealing step k) is represented by mi(k) such that

\[ m_i^{\min} \le m_i^{(k)} \le m_i^{\max} \]

where \(m_i^{\min}\) and \(m_i^{\max}\) are the minimum and maximum values of the model parameter \(m_i\). This model parameter value is perturbed at iteration (k + 1) using

\[ m_i^{(k+1)} = m_i^{(k)} + y_i\,(m_i^{\max} - m_i^{\min}), \qquad m_i^{\min} \le m_i^{(k+1)} \le m_i^{\max}, \qquad y_i \in [-1, 1]. \]

Here \(y_i\) is generated from the distribution

\[ g_T(y) = \prod_{i=1}^{NM} \frac{1}{2\,(|y_i| + T_i)\,\ln(1 + 1/T_i)} \equiv \prod_{i=1}^{NM} g_{T_i}(y_i) \]

and has a cumulative probability

\[ G_{T_i}(y_i) = \frac{1}{2} + \frac{\operatorname{sgn}(y_i)}{2}\,\frac{\ln(1 + |y_i|/T_i)}{\ln(1 + 1/T_i)}. \]

Ingber [1989] showed that for such a distribution the global minimum could be statistically obtained by using the following cooling schedule

\[ T_i(k) = T_{0i}\,\exp\left(-c_i\,k^{1/NM}\right) \]

where \(T_{0i}\) is the initial temperature for model parameter i and \(c_i\) is a parameter used to control the temperature. NM is the number of model parameters. The acceptance rule of the very fast simulated annealing algorithm is the same as that used in the Metropolis rule. However, very fast simulated annealing is more efficient than simulated annealing.
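
The procedure described in this section can be sketched in Python as follows. This is a simplified illustration rather than the code used in the study: it assumes a single temperature shared by all parameters and by the acceptance rule, draws each perturbation by inverting the cumulative probability of Ingber's distribution, and redraws any perturbation that leaves the feasible range. The function names, default settings, and the tiny temperature floor (a numerical guard) are hypothetical.

```python
import math
import random


def vfsa_step(temperature, rng):
    """Draw y in [-1, 1] from Ingber's temperature-dependent distribution."""
    u = rng.random()
    sign = 1.0 if u >= 0.5 else -1.0
    return sign * temperature * ((1.0 + 1.0 / temperature) ** abs(2.0 * u - 1.0) - 1.0)


def vfsa(error, lower, upper, n_iter=2000, t0=1.0, c=1.0, seed=0):
    """Minimize error(params) within [lower, upper]; returns (best, best_error)."""
    rng = random.Random(seed)
    nm = len(lower)  # number of model parameters
    current = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
    e_current = error(current)
    best, e_best = list(current), e_current
    for k in range(1, n_iter + 1):
        # Cooling schedule T(k) = T0 * exp(-c * k**(1/NM)), floored to avoid 1/0.
        temperature = max(t0 * math.exp(-c * k ** (1.0 / nm)), 1e-300)
        trial = []
        for value, lo, hi in zip(current, lower, upper):
            while True:  # redraw until the perturbed value stays in range
                candidate = value + vfsa_step(temperature, rng) * (hi - lo)
                if lo <= candidate <= hi:
                    break
            trial.append(candidate)
        e_trial = error(trial)
        delta = e_trial - e_current
        # Metropolis rule: always accept improvements, otherwise accept
        # with probability exp(-delta / T).
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


# Example: a two-parameter quadratic error surface with its minimum at (0.3, -0.5).
best, e_best = vfsa(lambda p: (p[0] - 0.3) ** 2 + (p[1] + 0.5) ** 2,
                    lower=[-1.0, -1.0], upper=[1.0, 1.0])
print(best, e_best)
```

The heavy tails of the sampling distribution allow occasional long jumps even at low temperature, which is what lets the fast exponential cooling schedule be used.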

2.5. Bayesian Stochastic Inversion

[18] The Bayesian stochastic inversion (BSI) algorithm [Sen and Stoffa, 1996] is based on the Bayes theorem and, usually, a stochastic method to select sets of parameter values from a distribution of realistic choices for model parameters. Within the Bayesian nomenclature, the relative probability for each combination of parameter values is expressed as a “posterior” probability density function (PPD) assumed to be Gaussian, which is given mathematically as

\[ \sigma(\mathbf{m} \mid \mathbf{d}_{obs}) = \frac{\exp[-E(\mathbf{m})]\,p(\mathbf{m})}{\int \exp[-E(\mathbf{m})]\,p(\mathbf{m})\,d\mathbf{m}} \]

where the domain of integration spans the entire model parameter space m, σ(m|dobs) is the PPD, vector dobs is the observational data, E(m) is the error function, exp[−E(m)] is the likelihood function, and p(m) is the “prior” probability density function for m. Because only the range for each model parameter in m is known, a uniform distribution within the range is used as the “prior” probability density function. This selection is the least biased as a uniform distribution indicates maximum uncertainty range.

[19] Because the PPD is multidimensional, it is difficult to visualize. Therefore a one-dimensional projection of the PPD (i.e., the marginal PPD) for a particular parameter, the posterior mean parameter set, and the posterior parameter covariance matrix or correlation matrix are often used. The marginal PPD of a particular parameter mi is given by

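\[ \sigma(m_i \mid \mathbf{d}_{obs}) = \int dm_1 \cdots \int dm_{i-1} \int dm_{i+1} \cdots \int dm_M \; \sigma(\mathbf{m} \mid \mathbf{d}_{obs}) \]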

where M is total number of model parameters. The posterior mean parameter set is given by

\[ \langle \mathbf{m} \rangle = \int \mathbf{m}\,\sigma(\mathbf{m} \mid \mathbf{d}_{obs})\,d\mathbf{m} \]

The posterior parameter covariance matrix is given by

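\[ \mathbf{C}_M = \int (\mathbf{m} - \langle \mathbf{m} \rangle)(\mathbf{m} - \langle \mathbf{m} \rangle)^{T}\,\sigma(\mathbf{m} \mid \mathbf{d}_{obs})\,d\mathbf{m} \]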

The VFSA, described in section 2.4, was used to stochastically select parameter sets. The VFSA is a form of importance sampling that reduces the computational burden of modeling the effect of every possible combination of model parameters. The VFSA algorithm samples more frequently those regions of the PPD that are more probable [Sen and Stoffa, 1996].
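
For a finite collection of parameter sets, the posterior mean and covariance can be approximated by weighted sums. The sketch below is a deliberately simplified stand-in: it assumes the parameter sets were drawn uniformly from the prior range (plain Monte Carlo), so each set is weighted by exp[−E(m)] only. It does not reproduce the VFSA-based importance sampling used in this study, and the function name is hypothetical.

```python
import math


def posterior_summaries(samples, errors):
    """Approximate the posterior mean and covariance from sampled parameter sets.

    samples: list of parameter vectors, assumed drawn uniformly from the prior
    errors:  list of error function values E(m), one per sample
    """
    e_min = min(errors)
    # Subtracting e_min rescales all weights equally and avoids underflow.
    weights = [math.exp(-(e - e_min)) for e in errors]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(samples[0])
    mean = [sum(w * s[i] for w, s in zip(weights, samples)) for i in range(dim)]
    cov = [[sum(w * (s[i] - mean[i]) * (s[j] - mean[j])
                for w, s in zip(weights, samples))
            for j in range(dim)]
           for i in range(dim)]
    return mean, cov


# Example: two one-parameter samples, the second with a larger error.
mean, cov = posterior_summaries([[0.0], [1.0]], [0.0, math.log(3.0)])
print(mean, cov)
```

The off-diagonal entries of the covariance matrix indicate which parameters trade off against each other in fitting the reference data.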

3. Experiment Design

[20] Table 1 lists eight adjustable parameters and their assumed feasible ranges. Parameters a and b, which affect the snow correction rate, are regression model parameters for snow. Parameter c is a regression model parameter for mixed precipitation, and it affects the mixed precipitation correction rate. Parameters d and e are regression model parameters for rain and they affect the rain correction rate. Parameters f1 and f2 are factors for gauge height and wind correction, respectively. Both f1 and f2 affect wind speed at gauge height. Parameter f3 is the air temperature correction factor. It affects the partitioning of precipitation into rain, mixed precipitation and snow. Overall, these uncertainty parameters cover almost all uncertainties of the WMO regression model and input data. The ranges of parameters in the WMO regression equations were taken from Yang et al. [1998b]. A gauge height range from 0.9 to 2.0 m was taken because Yang et al. [1998b] showed that most gauges are in this range. The range of the air temperature correction factor was taken from Pan et al. [2003]. There is no information on the range of the wind speed scaling factor, so an acceptable range was assumed to be from 0.7 to 1.3, which is somewhat arbitrary.

[21] Two experiments were designed in this study. In the first experiment (Exp1) only the WMO regression model uncertainties were considered and input data uncertainties were ignored. Therefore only the first five parameters shown in Table 1 were optimized. In the second experiment (Exp2), all uncertainties discussed above were considered, leaving eight parameters to optimize. In order to reduce the computational burden in the second experiment, a traditional perturbation method (one factor at a time) was used initially, following Xia et al. [2004b], to make an error profile analysis, to select sensitive parameters, and to remove insensitive parameters. This error profile is the ratio of the difference between calculated and minimum error values to the minimum error value. The error is calculated using equation (8) as a function of variations in a given parameter while holding the values of all other parameters constant. A sensitivity analysis of the eight parameters is shown in Figure 1. Comparison of sensitivity tests shows that the calculated error function is most sensitive to a, b, f1, and f2, is also sensitive to d and f3, and is less sensitive to c and e. From this analysis it is known that the model parameters related to snow correction (e.g., a, b, f3) are sensitive, and the model parameters related to rain and mixed precipitation are less sensitive except for d. The parameters related to wind speed correction (e.g., f1, f2) are sensitive because they affect both snow and rain correction factors. After removing the two less sensitive parameters (c and e), six parameters remain to be optimized. For each experiment 40,000 parameter sets were run, and the parameter set with the minimum error function value was selected as the optimal parameter set.
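
The one-factor-at-a-time error profile can be sketched as follows. This is an illustrative outline only: the error function, baseline values, and trial values are placeholders, and the minimum error is taken over the values tested for the single parameter being varied, which is an assumption about how the normalization was done.

```python
def error_profile(error, baseline, index, trial_values):
    """Relative error profile for one parameter, all others held at baseline.

    Returns (E - E_min) / E_min for each trial value, where E_min is the
    smallest error found along this profile (assumed to be positive).
    """
    errors = []
    for value in trial_values:
        params = list(baseline)
        params[index] = value  # vary only the selected parameter
        errors.append(error(params))
    e_min = min(errors)
    return [(e - e_min) / e_min for e in errors]


# Example with a toy error surface that depends only on the first parameter.
def toy_error(params):
    return 1.0 + (params[0] - 2.0) ** 2


print(error_profile(toy_error, baseline=[0.0, 5.0], index=0, trial_values=[1.0, 2.0, 3.0]))
print(error_profile(toy_error, baseline=[0.0, 5.0], index=1, trial_values=[4.0, 5.0, 6.0]))
```

A flat profile, as for the second parameter in the toy example, marks a parameter that can be fixed at its default value to reduce the dimension of the search.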

Figure 1.

Sensitivity analysis of five model parameters and three input data parameters. Y axis values were computed as the ratio of the difference between calculated error values and the minimum error value to the minimum error value. The minimum error value is the minimum of all the calculated errors.

[22] Performance of the regression model was assessed using the root mean square error between reference and calculated data. In addition, comparisons of reference and calculated data, and of optimal and WMO regression models [Yang et al., 1998b], were used to evaluate the performance of the regression model. Adjusted and unadjusted precipitation in the United States were also compared, and daily and annual variations of correction factors were analyzed. Finally, regression model parameter and input data uncertainties were analyzed, together with the effect of input data uncertainty on regression model parameters, and uncertainties of adjusted precipitation. A detailed schematic diagram for this study is shown in Figure 2. Gray lines and boxes represent the optimization process, and black lines and boxes represent the uncertainty estimation process.

Figure 2.

A schematic diagram for optimization and uncertainty estimation processes in this study. Gray lines and boxes represent optimization process, and black lines and boxes represent uncertainty estimation process.

4. Optimization of WMO Regression Models

4.1. Comparison of Optimal Regression Model with the Results of Yang et al. [1998b]

[23] In order to present the results more concisely, the WMO regression model derived using Alter-shielded gauge data will be referred to as the AM model, and the WMO regression model using unshielded gauge data will be referred to as the UM model. Table 2 shows the parameters of the AM model, UM model, and optimal regression models obtained in Exp1 and Exp2. The different models have different correction factors (rates). Figure 3 shows a comparison between optimal regression models and the WMO regression models listed in Table 2. The results demonstrate that if certain input data uncertainties (gauge height, daily wind speed and temperature) are ignored, the optimal regression model is bounded by the AM model and the UM model for snow [Yang et al., 1998a, 1998b]. This means that neither model is appropriate for adjusting snow systematic bias because the AM model underestimates and the UM model overestimates the correction factor for snow. For rain, the optimal regression model is close to the AM model. Therefore the AM model is appropriate for the systematic bias adjustment of U.S. rainfall. When input data uncertainties are included, the optimal regression model for snow is still bounded by the AM model and UM model, although it is closer to the UM model because of the effect of input data error on optimal regression. For rain, the optimal regression model has a smaller correction factor than the AM model because of the effect of uncertain input data on the optimal regression model. This means that neither the AM nor the UM model is appropriate for both rain and snow bias adjustment when optimal gauge height, NLDAS wind speed, and air temperature are used. Because of input data uncertainties, the optimal regression model is outside the boundary of the AM and UM models for rain.

Figure 3.

Daily correction factors versus wind speed for (a) snow and (b) rain for the AM model, UM model, and optimal regression models. The AM and UM regression models were taken from Yang et al. [1998a]. AM and UM are represented by dotted lines, Exp1 by a solid line, and Exp2 by a dashed line; the dotted line for AM and the solid line for Exp1 overlap.

Table 2. Default Parameter Set and Optimal Parameter Set for WMO Regression Models and Input Data^a
Parameter | AM Default | UM Default | Experiment 1 Optimal | Experiment 2 Optimal
Snow^b
a | 0.036 | 0.157 | 0.117 | 0.149
b | 1.750 | 1.280 | 1.260 | 1.250
Mixed Precipitation^c
c | 5.62 | 8.34 | 8.38 | 8.34
Rain^d
d | 0.041 | 0.062 | 0.040 | 0.041
e | 0.690 | 0.580 | 0.70 | 0.580
Input Data
f1 | 1.1 | 1.1 | 1.1 | 0.96
f2 | 1.0 | 1.0 | 1.0 | 0.85
f3 | 0.0 | 0.0 | 0.0 | −0.83

^a Symbols can be found in section 2.2.
^b CFs = 100.0/[exp(4.61 − aV(h)^b)].
^c CFm = 100.0/[101.0 − cV(h)].
^d CFr = 100.0/[exp(4.61 − dV(h)^e)].
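
The bounding behavior for snow can be checked numerically from the values in Table 2. The sketch below assumes the snow and rain functional forms given in the table footnotes and evaluates them over an illustrative range of gauge-height wind speeds (1 to 8 m/s); the variable names and the chosen range are hypothetical.

```python
import math

# Snow parameters (a, b) and rain parameters (d, e) copied from Table 2.
SNOW = {"AM": (0.036, 1.750), "UM": (0.157, 1.280),
        "Exp1": (0.117, 1.260), "Exp2": (0.149, 1.250)}
RAIN = {"AM": (0.041, 0.690), "UM": (0.062, 0.580),
        "Exp1": (0.040, 0.70), "Exp2": (0.041, 0.580)}


def correction_factor(wind, coeff, power):
    """Exponential form shared by snow and rain: 100 / exp(4.61 - coeff * V**power)."""
    return 100.0 / math.exp(4.61 - coeff * wind ** power)


for wind in range(1, 9):
    snow = {name: correction_factor(wind, *p) for name, p in SNOW.items()}
    rain = {name: correction_factor(wind, *p) for name, p in RAIN.items()}
    print(wind,
          " ".join(f"{name}={value:.2f}" for name, value in snow.items()),
          "| rain",
          " ".join(f"{name}={value:.3f}" for name, value in rain.items()))
```

Over this wind range the Exp1 and Exp2 snow factors fall between the AM and UM curves, with Exp2 closer to UM, and the Exp1 rain factor is nearly indistinguishable from the AM rain factor.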

[24] It is clear that neither the AM model nor the UM model is appropriate for both rain and snow systematic bias adjustment in the United States because the U.S. gauge network includes Alter-shielded and unshielded gauges. Therefore a reasonable systematic bias adjustment is difficult for the whole United States using one type of WMO regression model except if site-specific information, e.g., gauge type, shielding, gauge height, wind sensor height, and degree of exposure, is obtained for each gauge. However, collection of such information would entail a large investment of effort and is currently unavailable in any central data archive [Adam and Lettenmaier, 2003]. Therefore a compromise method as described in this study may be useful for systematic bias adjustment in the United States. These results show that optimal regression models seem to be reasonable when compared to WMO regression models because they are bounded by the WMO AM and UM models. Optimal regression models are therefore used to calculate the optimal mean monthly correction factor in the United States.

4.2. Comparisons of Mean Monthly Correction Factors in the United States

[25] Figure 4 shows a comparison of the optimal mean monthly correction factor for Exp1 (dashed line) and Exp2 (dotted line) to LW results (solid line) in the United States and four subregions. The four subregions are the northwest region (98°W–125°W, 40°N–53°N), northeast region (59°W–98°W, 40°N–53°N), southwest region (98°W–125°W, 25°N–40°N), and southeast region (59°W–98°W, 25°N–40°N), according to the definition of Lohmann et al. [2004]. The results show that optimal mean monthly correction factors are consistent for Exp1 and Exp2, and they are similar to LW results in the United States and four subregions. They have significant seasonal variation, that is, the correction factor is large in winter and small in summer, particularly in the northwest and northeast regions because of snowfall. In January and February, the optimal regression model underestimates mean monthly correction factors in the northeast region (Figure 5).

Figure 4.

Comparison of mean monthly LW correction factor (solid line), optimal correction factor for Exp1 (dashed line), and optimal correction factor for Exp2 (dotted line) in (a) northwest United States, (b) northeast United States, (c) southwest United States, (d) southeast United States, and (e) the United States as a whole. Dashed line and dotted line almost overlap.

Figure 5.

Horizontal distribution of (a, d) mean monthly LW correction factor, (b, e) optimal correction factor for Exp1, and (c, f) optimal correction factor for Exp2. LW, Exp1, and Exp2 are represented from top to bottom, and January and July results are represented from left to right.

[26] It should be noted that uncertain input data indeed influence optimal regression models for both rain and snow as shown in section 4.1. However, the effect is not significant for mean monthly correction factors when the results of Exp1 and Exp2 are compared in Figure 4. The reason is that optimal input data were used here. Figure 6 shows the daily correction factor for rain and snow when the optimal input data shown in Table 2 were used. The results demonstrate that the optimal correction factor with uncertain input data (Exp2) is similar to that with accurate input data (Exp1) for snow, although the effect of uncertain input data on rain still exists. Therefore uncertain input data have little effect on optimization of the WMO regression model. This result is consistent with that of Xia et al. [2004c], where uncertain forcing data also showed little effect on optimization of a land surface model.

Figure 6.

Optimal daily correction factors versus wind speed for (a) snow and (b) rain for the Exp1 (solid line) and Exp2 (dashed line) when optimal input data were used for the Exp2.

4.3. Comparisons of Adjusted and Unadjusted Precipitation in the United States

[27] Figure 7 compares mean monthly adjusted and unadjusted precipitation (1997–2003) in the United States. A solid line represents NLDAS precipitation, a dashed line represents adjusted NLDAS precipitation for Exp1, and a dotted line represents adjusted NLDAS precipitation for Exp2. The results show that NLDAS precipitation is increased by 10–15 mm for the United States with this adjustment. This increase is largest in the northeast region and smallest in the southwest region. Figures 8 and 9 show the horizontal distribution of adjusted and unadjusted mean monthly precipitation and their differences for January and July, respectively. The results show that mean January precipitation is increased by 20 to 50 mm in the northwest and northeast of the United States (Figures 8d and 8e). The main increase is located in the northern Cascade Range, the northern Rocky Mountains, and the whole northeast area. This is in good agreement with the results from Lohmann et al. [2004], who showed that all four land surface models underestimate streamflow when compared to observed streamflow. Comparison of the two optimal adjustments (Figure 8f) shows a 3–5 mm difference. This means that uncertain input data have a small effect on systematic bias adjustment when compared to the optimal adjustment itself.

Figure 7.

Mean monthly NLDAS precipitation (solid line), mean monthly Exp1 adjusted NLDAS precipitation (dashed line), and mean monthly Exp2 adjusted NLDAS precipitation (dotted line) in (a) northwest United States, (b) northeast United States, (c) southwest United States, (d) southeast United States, and (e) the United States as a whole. Dashed line and dotted line almost overlap.

Figure 8.

Mean January (a) NLDAS precipitation, (b) Exp1 adjusted NLDAS precipitation, (c) Exp2 adjusted NLDAS precipitation, (d) difference between Figures 8b and 8a, (e) difference between Figures 8c and 8a, and (f) difference between Figures 8b and 8c.

Figure 9.

Same as Figure 8 but for July results.

[28] For July precipitation adjustments, mean precipitation is increased by 5 to 15 mm (Figures 9d and 9e). The main increase is located in the east of the United States. In the west of the United States, the increase is less than 5 mm. Again, uncertain input data have a small effect on the optimal adjustment of precipitation systematic bias.

4.4. Daily and Annual Variations of Precipitation Correction Factors in the United States

[29] Compared to mean monthly LW correction factors, the advantage of the optimal regression model is that it is able to describe the daily and interannual variation of precipitation correction factors. Figure 10 shows daily variations of correction factors averaged over the United States for February, April, June, August, October, and December for the years 1997, 2000, and 2003. The results show significant daily and interannual variation of precipitation correction factors for all seasons except summer. In summer, variation of daily correction factors with wind speed is small for rain (Figure 3). Therefore, if daily and interannual variations of the precipitation correction factor are ignored, the study of snow processes, such as comparison of snow water equivalent and snow cover fraction, may produce misleading results.

Figure 10.

Daily correction factor for the year 1997 (solid line), the year 2000 (dashed line), and the year 2003 (dotted line) in (a) February, (b) April, (c) June, (d) August, (e) October, and (f) December in the United States.

4.5. Discussion

[30] The optimal model presented here is not consistent with the UM model used by Adam and Lettenmaier [2003]. However, it is not possible to judge which regression model is more appropriate for the systematic adjustment of precipitation data in the United States because both methods have their benefits and drawbacks. The benefit of Adam and Lettenmaier's [2003] work is that they applied the UM model directly at gauge sites, as done by Yang et al. [1998b]. The drawback is that they subjectively selected one of two types of regression models because of the lack of information about gauge types and the large nonhomogeneity of the U.S. gauge network. As indicated by Adam and Lettenmaier [2003], their adjustment included many uncertainties, such as gauge representation uncertainty (e.g., sporadic shielding, different gauge heights), regression model application uncertainty, interpolation errors, and gauge measurement network uncertainty. In contrast, the results presented here account for most of these uncertainties. An optimization algorithm was used to select one optimal set of model and input data parameters by minimizing the RMSE between the LW correction factors and the correction factors calculated using the uncertain regression model and input data. Clearly, these results depend on two assumptions: (1) LW data covering the years 1920 to 1980 can be compared with the data presented here, which cover the years 1997 to 2003, and (2) the AM and UM models derived from gauge sites are appropriate for a grid box with an area of about 144 km².
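The error function described in paragraph [30] can be illustrated with a minimal Python sketch. It computes only the RMSE between reference (LW) correction factors and the factors produced by two candidate models; it does not implement the regression models or the VFSA search. The function name and all numerical values are hypothetical and are not taken from this study.

```python
import math

def rmse(reference, calculated):
    """Root-mean-square error between two equal-length sequences."""
    if len(reference) != len(calculated) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(r - c) ** 2 for r, c in zip(reference, calculated)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical monthly correction factors (illustrative values only).
lw_factors = [1.30, 1.25, 1.15, 1.08]
candidate_a = [1.28, 1.27, 1.13, 1.10]
candidate_b = [1.10, 1.10, 1.10, 1.10]

# A minimizer would prefer the candidate with the smaller error value.
best = min((candidate_a, candidate_b), key=lambda c: rmse(lw_factors, c))
```

In this toy case `best` is `candidate_a`, because its factors deviate from the reference by only 0.02 in every month.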

[31] These two assumptions can be justified in a number of ways. First, because the LW correction factor is still being used to correct systematic bias in recent GPCC precipitation data, and because this correction factor data set is a climate-averaged database, it appears to be the most appropriate reference database available at the current time despite the global warming effect. As the method of Rubel and Hantel [1999] is used to adjust GPCC precipitation data, it is reasonable to expect that a new reference database will be available for future studies. Second, like many other works [Sen et al., 2001; Pan et al., 2003], this work has an issue of scale. Since the WMO regression models were derived from gauge sites rather than from grid box data, how representative these models are of the grid average is somewhat questionable, especially in relation to daily wind speed and air temperature. However, this was addressed by using an uncertain regression model with a mathematical form similar to that of Yang et al. [1998b] rather than the exact WMO regression model. The LW correction factors and the correction factors calculated from uncertain regression models are used to calculate the error function, and the VFSA selects the optimal regression models that minimize the error function in the United States. Therefore the optimal regression models can be considered representative for grid points. Furthermore, uncertainties of daily wind speed and daily air temperature were included in this analysis, and thus the VFSA algorithm can select appropriate wind speed and temperature values at a grid box to fit the selected regression model for that grid box when LW correction factors are used to constrain the calculated error function values.

[32] It should be noted that the systematic bias adjustment of precipitation considered here mainly includes wind-induced undercatch, wetting loss, and evaporation loss. It does not include the topographic effect on precipitation. However, the systematic bias of precipitation caused by topography is also important, as indicated by Milly and Dunne [2002]. Therefore the adjustment of bias due to topography needs to be approached in the future using an expert system such as that described by Daly et al. [1994].

5. Uncertainty Estimates of WMO Regression Models

5.1. Uncertainty Analysis of Model and Input Data Parameters

[33] The marginal posterior probability density (PPD) function can be used to estimate uncertainties of model parameters. Figure 11 shows the marginal PPDs of five regression model parameters for Exp1. The circles in Figure 11 show the optimal parameters identified using the VFSA algorithm. The line between the two stars shows the uncertainty range at the 95% confidence level. The circles often line up with the peaks in the marginal PPD, although this is not always the case since there is no requirement that optimal parameters are also the most probable. The probability assigned to a given parameter value through the PPD involves a combined measure of the likelihood function and the frequency at which parameter values within a given neighborhood are selected. Comparison of the PPDs for the five parameters shows that a, b, and d have smaller uncertainty than c and e because their PPDs show more peaks. This means that the marginal probabilities for a, b, and d are strongly constrained. Parameters c and e show the largest uncertainty because their marginal PPDs have a nearly uniform distribution. This large uncertainty arises because the calculated error function is not sensitive to c and e (Figure 1). In general, sensitive parameters have small uncertainties and insensitive parameters have large uncertainties. However, the error function is also not sensitive to d, yet d shows smaller uncertainty than c and e. One possible explanation is that d has a correlation of −0.19 with a and a correlation of 0.20 with b (Table 3a), the two most sensitive parameters, and this association increases the relative influence of d. This issue has been discussed by Jackson et al. [2003].
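The way a marginal PPD and its 95% range combine likelihood and sampling frequency, as described in paragraph [33], can be sketched in Python as a weighted histogram with a central interval. This is only an illustration under stated assumptions: the samples are assumed to come from a stochastic search, the weights are assumed to be proportional to the likelihood of each sample, and the function names are hypothetical. It is not the BSI code used in this study.

```python
def marginal_ppd(samples, weights, lo, hi, nbins):
    """Weighted histogram on [lo, hi], normalized so that it integrates to 1."""
    width = (hi - lo) / nbins
    mass = [0.0] * nbins
    for x, w in zip(samples, weights):
        if lo <= x <= hi:
            idx = min(int((x - lo) / width), nbins - 1)
            mass[idx] += w  # frequency of selection and likelihood combine here
    total = sum(mass)
    if total == 0.0:
        raise ValueError("no weighted samples fall inside [lo, hi]")
    return [m / (total * width) for m in mass]

def central_interval(samples, weights, level=0.95):
    """Smallest and largest sample values bounding the central `level` of weight."""
    pairs = sorted(zip(samples, weights))
    total = sum(w for _, w in pairs)
    tail = (1.0 - level) / 2.0
    cumulative = 0.0
    lower, upper = pairs[0][0], pairs[-1][0]
    lower_found = False
    for x, w in pairs:
        cumulative += w / total
        if not lower_found and cumulative >= tail:
            lower, lower_found = x, True
        if cumulative >= 1.0 - tail:
            upper = x
            break
    return lower, upper
```

A sharply peaked histogram with a narrow interval corresponds to a well-constrained parameter, whereas a nearly flat histogram corresponds to a poorly constrained one.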

Figure 11.

Marginal posterior probability density function (PPD) for regression parameters (a) a for snow (b) b for snow, (c) c for mixed precipitation, (d) d for rain, and (e) e for rain when the Exp1 was conducted. Circles are optimal parameters, and lines between two stars represent an uncertainty range at the 95% confidence level.

Table 3a. Correlation Matrix of Five Regression Model Parameters for Experiment 1^a

Parameter      a       b       c       d       e
a            1.0     0.66    0.03   −0.19    0.07
b            0.66    1.0    −0.07    0.20   −0.06
c            0.03   −0.07    1.0    −0.22    0.07
d           −0.19    0.20   −0.22    1.0    −0.22
e            0.07   −0.06    0.07   −0.22    1.0

^a Bold values indicate significant correlation between two parameters.

[34] Figure 12 shows the marginal PPDs of three regression model parameters and three input data parameters for Exp2. The results show strong constraints for a, b, d, f2, and f3 and a weak constraint for f1. A comparison of Figures 12 and 11 shows that a has a wider uncertainty range at the 95% confidence level and fewer peaks for Exp2 than for Exp1. This means that a has larger uncertainty for Exp2 than for Exp1. This larger uncertainty is due to nonlinearity between uncertain wind speed (f2) and model parameter a. The nonlinearity can be represented by a correlation of −0.62 (Table 3b). Besides the strong correlation between uncertain wind speed f2 and model parameter a, there is also a correlation of −0.46 between model parameter a and model parameter b, and a correlation of 0.30 between uncertain wind speed f2 and uncertain temperature f3. These correlations show that there are interactions between model parameters, between input data sets, and between model parameters and input data sets.
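A correlation matrix of the kind reported in Tables 3a and 3b can be computed from sampled parameter sets. The following Python sketch uses the ordinary Pearson correlation coefficient, which is an assumption, since the text does not state the estimator used. The function names and the layout of the input (one list of sampled values per parameter) are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Map each pair of parameter names to the correlation of their samples."""
    names = list(columns)
    return {p: {q: pearson(columns[p], columns[q]) for q in names} for p in names}

# Hypothetical sampled values for two parameters (illustrative only).
sampled = {"a": [0.9, 1.1, 1.3, 1.6], "f2": [1.4, 1.2, 1.1, 0.8]}
matrix = correlation_matrix(sampled)
```

The result is symmetric with ones on the diagonal, which matches the layout of the tables.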

Figure 12.

Marginal posterior probability density function (PPD) for regression parameters (a) a for snow (b) b for snow, (c) d for rain, (d) f1 for gauge height, (e) f2 for wind correction factor, and (f) f3 for temperature correction factor when Exp2 was conducted. Circles are optimal parameters, and lines between two stars represent uncertainty range at the 95% confidence level.

Table 3b. Correlation Matrix of Three Regression and Three Input Parameters for Experiment 2^a

Parameter      a       b       d       f1      f2      f3
a            1.00    0.43   −0.01   −0.12    0.62   −0.14
b            0.43    1.00    0.20    0.03   −0.07    0.03
d           −0.01    0.20    1.00   −0.18   −0.08   −0.06
f1          −0.12    0.03   −0.18    1.00   −0.09    0.08
f2           0.62   −0.07   −0.08   −0.09    1.00    0.30
f3          −0.14    0.03   −0.06    0.08    0.30    1.00

^a Bold and italic values indicate significant correlation between two parameters.

5.2. Uncertainty Analysis of Daily Correction Factors

[35] Figure 13 shows the probability distribution of daily correction factors for snow when wind speeds of 1, 2, 3, 4, 5, and 6 m/s are used. In Figure 13 the line between the two plusses represents the range of daily correction factors calculated using the AM and UM models, the circle represents the result of the optimal regression model, and the square represents the result of the mean model shown in equation (14). The results show that the range of the daily correction factor calculated using the AM and UM models covers over 95% of the uncertainty range for all examined wind speeds. The mean model and the optimal model generate similar daily correction factors for all wind speeds except 1 and 6 m/s, where the two models show some differences. When the uncertain input data were included, the uncertainty range of daily correction factors increased for the examined wind speeds (Figure 14) when compared to the results in Figure 13. The range of the daily correction factor calculated using the AM and UM models covers less than 95% of the uncertainty range for large wind speeds. This result is reasonable because uncertain input data contribute additional uncertainties. Similar conclusions can be drawn for the mixed precipitation and rain corrections. This means that uncertain input data indeed have a significant effect on the uncertainty estimates of daily correction factors. They affect not only daily correction factors but also mean monthly correction factors. A cumulative distribution function (CDF) of RMSE for Exp1 and Exp2 shows that Exp2 has a larger RMSE than Exp1, which demonstrates the effect of uncertain input data. The RMSE involves a snow regression model, a mixed precipitation regression model, and a rain regression model, so it reflects the combined effect of snow, mixed precipitation, and rain. If a given percentage of the best parameter sets is used (say, 10%) to estimate uncertainties of error functions, as done by Franks and Beven [1997], uncertain input data will lead to larger uncertainty estimates of adjusted precipitation.
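The two operations mentioned at the end of paragraph [35], building a CDF of RMSE values and retaining a given percentage of the best parameter sets, can be sketched in Python as follows. The function names are hypothetical, and the empirical CDF used here (cumulative probability i/n for the i-th smallest value) is an assumed convention rather than the one used in this study.

```python
def empirical_cdf(values):
    """Return the sorted values and their cumulative probabilities i/n."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered, [(i + 1) / n for i in range(n)]

def best_fraction(parameter_sets, errors, fraction=0.10):
    """Keep the `fraction` of parameter sets with the smallest error values."""
    if len(parameter_sets) != len(errors) or not errors:
        raise ValueError("inputs must be non-empty and of equal length")
    keep = max(1, int(len(errors) * fraction))
    ranked = sorted(range(len(errors)), key=lambda i: errors[i])
    return [parameter_sets[i] for i in ranked[:keep]]
```

Plotting the output of `empirical_cdf` for two experiments on the same axes shows which experiment tends to have the larger errors: its curve lies further to the right.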

Figure 13.

Probability distribution of daily correction factors for Exp1 snow when wind speeds of (a) 1, (b) 2, (c) 3, (d) 4, (e) 5, and (f) 6 m/s were used. A total of 40,000 parameter sets were used here; the circle represents the result calculated using the optimal model, the square represents the result calculated using the mean model, and the line represents the range of the results calculated using the AM and UM models.

Figure 14.

Same as Figure 13 but for Exp2.

5.3. Uncertainty Estimates of Adjusted Precipitation

[36] Figure 15 shows uncertainty estimates of adjusted mean monthly precipitation in the United States and four subregions when only regression model uncertainties were included. The solid line represents optimally adjusted NLDAS precipitation, the dashed line represents the precipitation adjusted using the mean model, and the dashed-dotted lines represent uncertainty estimates of adjusted precipitation at the 95% confidence level. The results show that the optimal and mean models give similar adjusted precipitation. This is consistent with the analysis described in section 5.2, where the optimal model and the mean model give similar daily correction factors. The uncertainty range of adjusted precipitation has large seasonal and spatial variations. It is large in winter and small in summer, and it is large in the north of the United States and small in the south. This is because the uncertainty of the regression models is large for snow and small for rain. An uncertainty of 10–15 mm is found in winter, and an uncertainty of less than 5 mm is found in summer. This estimate does not include input data uncertainties. If input data uncertainty were included, the uncertainty estimate of adjusted precipitation would increase somewhat. However, a large increase in the uncertainty estimate is not likely even if uncertain input data were involved.
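An uncertainty range such as the one in Figure 15 can be approximated from an ensemble of correction factors. The Python sketch below assumes that adjusted precipitation is the product of unadjusted precipitation and a correction factor, and that the 95% range is taken between the 2.5th and 97.5th percentiles of the ensemble. Both choices, along with the function names, are assumptions made for illustration.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (position - low)

def adjusted_band(precip_mm, factor_samples, level=95.0):
    """Lower and upper bounds of adjusted precipitation at the given level."""
    adjusted = [precip_mm * f for f in factor_samples]  # assumed multiplicative form
    tail = (100.0 - level) / 2.0
    return percentile(adjusted, tail), percentile(adjusted, 100.0 - tail)
```

A wide spread of sampled factors, as expected for snow, gives a wide band, while tightly clustered factors, as expected for rain, give a narrow one.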

Figure 15.

Uncertainty estimates of mean monthly NLDAS precipitation adjusted for Exp1 in (a) northwest United States, (b) northeast United States, (c) southwest United States, (d) southeast United States, and (e) the United States as a whole. The solid line represents optimally adjusted NLDAS precipitation, the dashed line represents the NLDAS precipitation adjusted using the mean model, and the dashed-dotted lines represent the uncertainty range at the 95% confidence level. The dashed and solid lines almost overlap.

5.4. Discussion

[37] Major uncertainties of adjusted NLDAS precipitation come from uncertain regression models, uncertain NLDAS data (i.e., wind speed, air temperature, precipitation), uncertain gauge height, and LW data uncertainty. NLDAS precipitation uncertainty and LW data uncertainty were not discussed in this study. This does not necessarily mean that they have little effect on uncertainty estimates of adjusted NLDAS precipitation, because NLDAS precipitation contains model precipitation from Eta model outputs. However, as indicated by Cosgrove et al. [2003], CPC (Climate Prediction Center) daily gauge analyses serve as the backbone of the NLDAS hourly precipitation forcing. Less than 10% of NLDAS precipitation is replaced by Eta model precipitation because of missing CPC precipitation. Therefore the uncertainty of the NLDAS precipitation is not expected to be large.

[38] Imprecise model parameters generate uncertain models, the combined effect of uncertain models and input data generates inaccurate correction factors, and finally inaccurate correction factors generate unreliable adjusted precipitation. It should be noted that the interactions among the model parameters, among the input data, and between the model parameters and the input data make the uncertainty estimation process more complicated. Many optimization algorithms (e.g., the variational method) cannot be used because there are nonlinear relationships among model parameters and input data. As indicated by Sen and Stoffa [1996], the VFSA and BSI algorithms are appropriate for this study.

6. Conclusions

[39] This study includes two parts. First, the VFSA was used to optimize WMO regression model parameters and input data to derive an optimal WMO regression model, and the derived optimal regression model and input data were then used to adjust NLDAS precipitation. Second, the BSI was used to estimate the uncertainties of adjusted NLDAS precipitation.

[40] The results show that the optimal models are reasonable because they are bounded by the AM and UM models. The calculated mean monthly correction factors are consistent with LW data for the United States and four subregions. Comparison of Exp1 and Exp2 shows that uncertain input data have some effect on the selection of optimal models. However, they have little effect on optimally adjusted precipitation in the United States.

[41] The AM and UM models can estimate the uncertainty of adjusted precipitation for the United States well only when regression model uncertainty alone is considered. However, they cannot estimate the uncertainty of adjusted precipitation well when inaccurate input data are involved, because this increases uncertainty in the precipitation correction factor and subsequently in the adjusted precipitation. In addition, there is significant interdependence among the model parameters, among the input data, and between the model parameters and the input data.

Acknowledgments

[42] Y.X. wishes to thank Peter Mayes at New Jersey Department of Environmental Protection, Stephen Dery at GFDL, and P.C.D. Milly at USGS, whose editing greatly improved the readability and quality of this paper. This project was supported by NOAA grant NA17RJ2612. The technical support from K. A. Dunne, Fanrong Zeng, and Sergey Malyshev at GFDL is appreciated. Y.X. also thanks C. J. Jackson at Institute for Geophysics at University of Texas at Austin for providing the BSI and VFSA code.
