Corresponding author: J. Yan, Department of Statistics, University of Connecticut, Storrs, CT 06269, USA. (firstname.lastname@example.org)
Uncertainty information about river level forecasts is as important to forecast users as the forecasts themselves. This paper presents a flexible statistical approach that processes deterministic forecasts into probabilistic forecasts. The model is a smoothly changing conditional distribution of river stage given the point forecast and other available information, such as the lagged river level at the time of forecasting. The parametric distribution is a four-parameter skew t distribution, with each parameter modeled as a smooth function of the point forecast and the river level observed 1 day earlier. The model was applied to 9 years of daily warm-season 6 h lead and 24 h lead forecasts and their matching observations at the Plymouth station on the Pemigewasset River in New Hampshire. For each point forecast, the conditional distribution and the resulting prediction intervals provide uncertainty information that is potentially very important to forecast users in decision making and to algorithm developers in improving forecast quality.
A deterministic river level forecast is subject to errors from various sources, including (1) uncertainty in the amount, location, and timing of rainfall input, (2) uncertainty in the soil moisture state used for model initialization, and (3) the inability of the model (model structure and parameter estimates) to accurately reproduce the physical processes. Forecast uncertainty refers to all systematic and random differences between the deterministic forecast and the corresponding actual value. Information on forecast uncertainty adds value to decision making and helps users feel more confident about their decisions [e.g., Frick and Hegg, 2011]. Ignoring forecast uncertainty could mislead decision making; in fact, when such a forecast turns out to be wrong, the consequences could be worse than if no forecast had been available [Glassheim, 1997; Montanari and Grossi, 2008]. Besides supporting better decision making, information on forecast uncertainty is necessary to guide the development and improvement of a forecasting process. The results of Krzysztofowicz and Maranzano and of Welles et al. indicate that, despite several enhancements made to the forecast process over the past 10 and 20 years, there has been little improvement in hydrologic forecast accuracy. Welles et al. attributed this to the use of expert opinion, rather than objective uncertainty measures, to guide the development of the hydrologic forecast process.
Since 2004 the National Weather Service (NWS) has offered an Advanced Hydrologic Prediction Service (AHPS) [National Oceanic and Atmospheric Administration, 2002]. One of the services the AHPS delivers is quantifying forecast uncertainty and conveying this information in products that specify the probability of reaching various water levels. For river forecasts, the uncertainty information is issued as exceedance probabilities over a certain period of time, obtained from conditional simulation based on current conditions. Useful as the exceedance probabilities are, some users may prefer ensemble river forecasts [e.g., Addor et al., 2011; Frick and Hegg, 2011; Voisin et al., 2011]. Constructing ensemble river forecasts can be challenging because it requires ensemble forecast inputs of precipitation and temperature. A simple statistical alternative is to generate probabilistic river level forecasts from deterministic river level forecasts. A similar idea has been applied to generate ensemble precipitation and temperature forecasts from single-valued forecasts [e.g., Schaake et al., 2007; Wu et al., 2011]. We aim to provide a conditional modeling approach that processes deterministic river level forecasts into probabilistic river level forecasts given all information available at the time of forecasting.
The rest of the paper is organized as follows. Section 2 briefly reviews techniques currently used to assess forecast uncertainty. Section 3 describes the data from Plymouth, New Hampshire. The statistical methodology is developed in section 4. Application to the Plymouth data, including model calibration and model validation, is presented in section 5. Section 6 concludes with a discussion.
Assessment of forecast uncertainty has been an active research subject in recent years [e.g., Beven, 2006; Montanari and Grossi, 2008; Coccia and Todini, 2011]. Various methods exist for estimating forecast uncertainty; among them, generalized likelihood uncertainty estimation (GLUE) and Bayesian methods are used extensively. GLUE, developed by Beven and Binley, accepts the possible equifinality of models; i.e., different modeling options (different hydrological models and different parameter values) may mimic an observed natural process equally well. It works with multiple modeling options via Monte Carlo sampling and applies likelihood measures to estimate the predictive uncertainty. Limitations of GLUE include high computational demand (especially for distributed models) and potentially incorrect uncertainty estimates [e.g., Thiemann et al., 2001; Montanari, 2005; Mantovan and Todini, 2006; Blasone et al., 2008]. Detailed information about this method is available in Beven et al. and Stedinger et al.
In a Bayesian framework, forecast uncertainty is naturally assessed by the predictive distribution of observations given forecasts, based on a fully specified model of matching historical observations and forecasts. This approach was first introduced, as the hydrological uncertainty processor, by Krzysztofowicz and by Krzysztofowicz and Kelly, and has had a positive impact. Results from multiple models can be accommodated through Bayesian model averaging [Raftery, 1993; Raftery et al., 2005; Vrugt and Robinson, 2007] or the model conditional processor [Todini, 2008; Coccia and Todini, 2011]. As with all Bayesian methods, full specification of the joint distributions and of the prior distributions of the parameters is needed. Most of these models assume a meta-Gaussian distribution, which means that, although the marginal distributions can be arbitrary, the dependence structure must be a Gaussian copula [e.g., Genest et al., 2007; Genest and Favre, 2007].
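To make the meta-Gaussian idea concrete, the sketch below transforms historical forecasts and observations to standard normal scores through their empirical distribution functions, estimates the Gaussian-copula correlation $\rho$, and reads off conditional quantiles of the observation given a new forecast from $Z_o \mid Z_f = z \sim N(\rho z, 1-\rho^2)$. It is a minimal illustration under our own assumptions (empirical marginals, Weibull plotting positions, synthetic data, hypothetical function names), not the processor of any of the cited authors.

```python
import numpy as np
from scipy.stats import norm, rankdata


def normal_scores(x):
    """Normal quantile transform using Weibull plotting positions r / (n + 1)."""
    x = np.asarray(x, dtype=float)
    return norm.ppf(rankdata(x) / (len(x) + 1.0))


def meta_gaussian_quantiles(obs, fcst, f_new, probs):
    """Conditional quantiles of the observation given a new forecast f_new.

    Marginals are empirical; dependence is a Gaussian copula whose
    correlation is estimated from the normal scores of the historical pairs.
    """
    obs, fcst = np.asarray(obs, dtype=float), np.asarray(fcst, dtype=float)
    n = len(fcst)
    rho = np.corrcoef(normal_scores(obs), normal_scores(fcst))[0, 1]
    # Empirical CDF of past forecasts at f_new, kept away from 0 and 1.
    u_new = np.searchsorted(np.sort(fcst), f_new, side="right") / (n + 1.0)
    z_new = norm.ppf(np.clip(u_new, 1.0 / (n + 1.0), n / (n + 1.0)))
    # Conditional normal-score quantiles: Z_o | Z_f = z_new ~ N(rho z_new, 1 - rho^2).
    z_q = rho * z_new + np.sqrt(1.0 - rho**2) * norm.ppf(probs)
    # Back-transform through the empirical quantile function of the observations.
    return np.quantile(obs, norm.cdf(z_q))


rng = np.random.default_rng(1)
fcst = rng.gamma(2.0, 1.5, size=2000)                  # synthetic forecasts
obs = fcst * np.exp(rng.normal(0.0, 0.2, size=2000))   # synthetic observations
low = meta_gaussian_quantiles(obs, fcst, 1.0, [0.05, 0.5, 0.95])
high = meta_gaussian_quantiles(obs, fcst, 6.0, [0.05, 0.5, 0.95])
```

The conditional median moves with the forecast while the marginal shapes are left unrestricted; the only structural assumption is the Gaussian copula, which is exactly what the methods reviewed next have to justify.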
The frequentist approach to uncertainty assessment has gained attention recently. The general idea is to estimate the conditional distribution of observations given their forecasts and other information available at the time of forecasting. Schaake et al. developed a method that generates ensemble precipitation forecasts from single-valued quantitative precipitation forecasts (QPF), based on the lead-time-specific probability distribution of observed precipitation conditional on the single-valued QPF. The conditional distribution is derived from a joint distribution of observations and forecasts, specified as a mixture of a meta-Gaussian distribution and point masses at zero to allow for zero precipitation. The method has been used experimentally at a number of River Forecast Centers for several years, with recent improvements [Wu et al., 2011]. Montanari and Grossi modeled the forecast error in a regression setting with a meta-Gaussian distribution that incorporates explanatory variables such as the point forecast, the past forecast error, and the past rainfall. The adequacy of the Gaussian copula in these methods can be checked with goodness-of-fit tests available in software packages [Kojadinovic and Yan, 2010], which, however, appear not to have been used in this context. Other advanced statistical approaches include support vector regression [Chen and Yu, 2007] and state-space models [Smith et al., 2012].
Instead of specifying a joint distribution and deriving the conditional distribution from it, a more direct way is to model the conditional distribution of observations given forecasts and other useful information. Yan and Gebremichael used parametric gamma distributions to characterize the conditional distribution of actual rainfall given satellite rainfall estimates, where the gamma parameters were assumed to be smooth functions of satellite rainfall. Gebremichael et al. proposed a nonparametric model for the conditional distribution of actual rainfall given a satellite rainfall estimate, in which the conditional distribution was a mixture of a positive continuous distribution and a point mass at zero. Compared with the methods reviewed above [e.g., Beven, 2006; Montanari and Grossi, 2008; Coccia and Todini, 2011], these methods do not rely on the adequacy of a Gaussian copula, and the parameters of the conditional distribution are flexible, smooth functions of explanatory variables including point forecasts.
In this study, we develop a semiparametric conditional model that processes deterministic point forecasts into probabilistic forecasts. Building on recent advances in the general method of generalized additive models for location, scale, and shape [Rigby and Stasinopoulos, 2005], we model the parameters of the conditional distribution of observed river levels as smooth functions of the point forecast and the lagged river level. Our working assumption is that the conditional distribution remains stable during the time period of interest, which may be realistic given the lack of major progress in forecast skill over the past 10 and 20 years [e.g., Welles et al., 2007]. This assumption allows us to take advantage of long-term records of streamflow observations and forecasts for model development and use. The methodology is illustrated with daily data at Plymouth, New Hampshire, with forecasts issued by the Northeast River Forecast Center (NERFC).
3. The Data
The Pemigewasset River Watershed covers approximately 1023 square miles in west-central New Hampshire, USA; see Figure 1. The Pemigewasset River flows south from the northern watershed border, through Plymouth near the center, to Franklin at the southern border. The watershed is mountainous and drains the highest elevations in New England. The mean annual precipitation is 45 inches, and the time to peak ranges from 6 to 12 h.
The Plymouth station on the Pemigewasset River is located at 43°45′33″N, 71°41′10″W. The observed river stage data are the operational stage data collected by the U.S. Geological Survey (USGS) and archived by the NERFC. The river stage forecasts are generated by the NERFC with the National Weather Service River Forecast System, which uses the Sacramento Soil Moisture Accounting Model [Burnash et al., 1973; Burnash, 1995]. Model inputs include observed precipitation and temperature and forecasted precipitation and temperature up to 24–48 h ahead. The model runs at a 6 h time step.
The NERFC issues point forecasts of river stage at the station every day around noon with 6, 12, 18, 24, 30, and 36 h lead times. The data span 9 years, from December 2001 to October 2010. Because warm seasons (April to September) are usually more important for flood monitoring, and a different model may be necessary for cool seasons, we use warm-season data to illustrate the proposed methodology. Both 6 h lead and 24 h lead forecasts are considered. For each lead time, we have 1582 pairs of forecasts and observations. Figure 2 shows daily time series of the observed river levels matching the forecasts, and of the forecast errors, in the warm seasons of 2002–2010 for both the 6 h and the 24 h lead times. Clearly, the 24 h lead forecasts are much less accurate, and hence much more uncertain, than the 6 h lead forecasts.
Given the point forecast issued by the NERFC and other available information, such as the river level observed on the previous day, we aim to provide a conditional distribution of the actual river level, which gives uncertainty measures for each point forecast. With this objective in mind, Figure 3 shows scatterplots of the observed river level against the forecast level and against the lag 1 (1 day earlier) observation. The high correlation between observed river levels and the matching point forecasts is the basis for the conditional model of the actual river level given the point forecast. As expected, the agreement is much better for the 6 h lead forecast than for the 24 h lead forecast. The plot also shows high correlation between observed river levels and their lag 1 values, raising the question of whether the lag 1 observations contain information for predicting river level beyond the point forecast. As it turned out, models that include the lag 1 river level in the conditioning set performed much better, in terms of the model comparison criterion, than models without it.
Our goal is to characterize forecast uncertainty by a conditional probability distribution of river level for each given point forecast and other information available at the time of forecast. As suggested by the exploratory analysis in Figure 3, we use the lag 1 observation in addition to the point forecast. Let $f_t$ be the forecasted river level on day $t$ and let $o_t$ be the matching observed river level, $t = 1, \ldots, n$. Our target is $G(o_t \mid f_t, o_{t-1})$, the conditional distribution of $o_t$ given $f_t$ and $o_{t-1}$. If $o_{t-1}$ is not included in the model, the conditional distribution would turn a single-valued forecast into an ensemble forecast. In that case, our model would be conceptually similar to those in Schaake et al. and Wu et al., but with a more flexible framework through semiparametric functional forms. In our analysis, this simpler model was clearly outperformed in model comparison by the model with $o_{t-1}$ in the conditioning set.
Because our sample size is moderate, we adopt a semiparametric model: a parametric family for the conditional distribution $G$ whose parameters vary smoothly with the covariates. After trying several parametric distributions for $G$ (e.g., the skew normal, skew t, generalized gamma, and Box-Cox t distributions), we focused on the skew t distribution because it accommodates heavy tails and skewness and provided a better fit than its competitors. The probability density function of a skew t distribution is

$$g(y \mid \mu, \sigma, \nu, \tau) = \frac{2}{\sigma}\, f\!\left(\frac{y-\mu}{\sigma}; \tau\right) F\!\left(\nu\,\frac{y-\mu}{\sigma}; \tau\right), \qquad (1)$$

where $\mu$ is a location parameter, $\sigma > 0$ is a scale parameter, $\nu$ is a skewness parameter, and $f$ and $F$ are the density function and distribution function, respectively, of a standard t distribution with $\tau > 0$ degrees of freedom. Because of its heavier tails, the skew t distribution accommodates extreme values better than models based on the normal distribution. Note, however, that an extreme value here is relative to the center of the conditional distribution, rather than the observed river level itself exceeding some threshold. The model provides a conditional probability distribution that characterizes the forecast uncertainty for all levels of point forecast in the observed range, whether or not they exceed a threshold.
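As a numerical check on (1), the sketch below codes the density with SciPy's Student t functions and verifies that it integrates to one, that a positive $\nu$ pushes the mean to the right of $\mu$, and that $\nu = 0$ recovers the symmetric t density. The function name and parameter values are illustrative assumptions; this is not the gamlss implementation.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import t as student_t


def skew_t_pdf(y, mu, sigma, nu, tau):
    """Density (1): (2 / sigma) * f(z; tau) * F(nu * z; tau), z = (y - mu) / sigma,
    where f and F are the standard t density and CDF with tau degrees of freedom."""
    z = (y - mu) / sigma
    return 2.0 / sigma * student_t.pdf(z, tau) * student_t.cdf(nu * z, tau)


params = (0.0, 1.0, 0.5, 5.0)  # illustrative mu, sigma, nu, tau
total, _ = quad(skew_t_pdf, -np.inf, np.inf, args=params)
mean, _ = quad(lambda y: y * skew_t_pdf(y, *params), -np.inf, np.inf)
# With nu = 0, F(0) = 1/2 cancels the factor 2 and (1) is the ordinary t density.
symmetric_gap = skew_t_pdf(1.3, 0.0, 1.0, 0.0, 5.0) - student_t.pdf(1.3, 5.0)
```

The factor 2 compensates for the weighting $F(\nu z)$, which averages to 1/2 over a symmetric $f$; this is why (1) is a proper density for every $\nu$.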
We assume that all four parameters on day $t$ are smooth functions of $f_t$ and $o_{t-1}$. Specifically, we impose a generalized additive model on each of the four parameters $\mu_t$, $\sigma_t$, $\nu_t$, and $\tau_t$:

$$h_\mu(\mu_t) = s_{\mu,f}(f_t) + s_{\mu,o}(o_{t-1}), \qquad (2)$$
$$h_\sigma(\sigma_t) = s_{\sigma,f}(f_t) + s_{\sigma,o}(o_{t-1}), \qquad (3)$$
$$h_\nu(\nu_t) = s_{\nu,f}(f_t) + s_{\nu,o}(o_{t-1}), \qquad (4)$$
$$h_\tau(\tau_t) = s_{\tau,f}(f_t) + s_{\tau,o}(o_{t-1}), \qquad (5)$$

where $s_{a,b}$ is a smooth function modeling quantity $a$ with input $b$, for $a \in \{\mu, \sigma, \nu, \tau\}$ and $b \in \{f, o\}$, and $h_a$ is a monotone link function for quantity $a$. In our application, we used the identity link for $\mu$ and $\nu$, and the logarithmic link for $\sigma$ and $\tau$ to ensure positivity of $\sigma$ and $\tau$. The classic generalized linear model connects the mean of the variable of interest to a linear predictor through a link function. The proposed model is an application of the general methodology of generalized additive models for location, scale, and shape, which has been applied in many fields [Rigby and Stasinopoulos, 2005]. It generalizes the generalized linear model in two respects: (1) all four parameters of the skew t distribution (not only the mean) are connected to predictors via link functions, and (2) the predictors are smooth (not necessarily linear) functions of the covariates.
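The sketch below shows how additive predictors of the form (2)–(5) map to the four skew t parameters through the links used here (identity for $\mu$ and $\nu$, log for $\sigma$ and $\tau$), with one smooth term built from a truncated-power cubic spline basis. The basis, knot location, and coefficient values are hypothetical choices for illustration; gamlss's `cs()` term is a penalized cubic smoothing spline rather than this fixed basis.

```python
import numpy as np


def cubic_spline_basis(x, knots):
    """Truncated-power cubic basis: x, x^2, x^3 and (x - k)_+^3 for each knot."""
    x = np.asarray(x, dtype=float)
    cols = [x, x**2, x**3] + [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def inverse_links(eta_mu, eta_sigma, eta_nu, eta_tau):
    """Invert the links: identity for mu and nu, log for sigma and tau."""
    return eta_mu, np.exp(eta_sigma), eta_nu, np.exp(eta_tau)


f = np.array([1.0, 3.0, 8.0, 12.0])      # point forecasts f_t (feet)
o_lag = np.array([1.2, 2.5, 4.0, 6.0])   # lag 1 observations o_{t-1} (feet)

# Hypothetical predictors: linear terms for mu; a spline in f_t plus a linear
# term in o_{t-1} for log sigma; constants for nu and log tau.
spline_coef = np.array([0.2, 0.0, 0.0, 0.001])
eta_mu = 0.1 + 0.8 * f + 0.2 * o_lag
eta_sigma = -1.5 + cubic_spline_basis(f, knots=[5.0]) @ spline_coef - 0.05 * o_lag
eta_nu = np.full_like(f, 0.1)
eta_tau = np.full_like(f, np.log(1.5))

mu, sigma, nu, tau = inverse_links(eta_mu, eta_sigma, eta_nu, eta_tau)
```

Because the log link is applied to $\sigma$ and $\tau$, any real-valued predictor yields valid (positive) parameters, so the smooth functions can be estimated without constraints.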
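The parameters in the model are estimated by maximizing the log likelihood. Let $\theta$ be the vector of all the parameters of a given model. The log likelihood is

$$\ell(\theta) = \sum_{t=1}^{n} \log g(o_t \mid \mu_t, \sigma_t, \nu_t, \tau_t), \qquad (6)$$

where $g$ is the density in (1), and $\mu_t$, $\sigma_t$, $\nu_t$, and $\tau_t$ are defined as in (2)–(5). The maximum likelihood estimator is

$$\hat\theta = \arg\max_{\theta}\, \ell(\theta). \qquad (7)$$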
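The sketch below maximizes (6) directly with `scipy.optimize.minimize` for a deliberately simplified model: linear terms in $f_t$ and $o_{t-1}$ for $\mu$, a linear term in $f_t$ for $\log\sigma$, and constant $\nu$ and $\log\tau$. The data are simulated, and the sampler uses the standard skew-symmetric construction (keep a t draw $z$ with probability $F(\nu z)$, otherwise flip its sign). This illustrates maximum likelihood under our assumptions; it is not the fitting algorithm of gamlss, which also handles penalized spline terms.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import t as student_t


def skew_t_logpdf(y, mu, sigma, nu, tau):
    """Log of density (1)."""
    z = (y - mu) / sigma
    return (np.log(2.0) - np.log(sigma)
            + student_t.logpdf(z, tau) + student_t.logcdf(nu * z, tau))


def sample_skew_t(mu, sigma, nu, tau, rng):
    """Draw from (1): keep a t draw z with probability F(nu z), else use -z."""
    z = student_t.rvs(tau, size=np.shape(mu), random_state=rng)
    keep = rng.uniform(size=np.shape(mu)) < student_t.cdf(nu * z, tau)
    return mu + sigma * np.where(keep, z, -z)


def neg_loglik(theta, o, f, o_lag):
    """Negative of (6) for a hypothetical linear-only version of (2)-(5)."""
    b0, b1, b2, c0, c1, nu, log_tau = theta
    mu = b0 + b1 * f + b2 * o_lag      # identity link
    sigma = np.exp(c0 + c1 * f)        # log link
    tau = np.exp(log_tau)              # log link, constant over t
    return -np.sum(skew_t_logpdf(o, mu, sigma, nu, tau))


rng = np.random.default_rng(42)
n = 3000
o_lag = rng.gamma(2.0, 1.5, size=n)
f = o_lag + rng.normal(0.0, 0.5, size=n)
truth = np.array([0.1, 0.8, 0.2, -1.0, 0.1, 0.5, np.log(4.0)])
o = sample_skew_t(truth[0] + truth[1] * f + truth[2] * o_lag,
                  np.exp(truth[3] + truth[4] * f), truth[5], np.exp(truth[6]), rng)

start = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, np.log(10.0)])
fit = minimize(neg_loglik, start, args=(o, f, o_lag), method="BFGS")
theta_hat = fit.x
```

Optimizing $\log\sigma$ and $\log\tau$ rather than $\sigma$ and $\tau$ mirrors the log links and keeps the search unconstrained.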
The smooth functions $s_{a,b}$ can take many functional forms, which involve unknown parameters to be estimated from the data. For each covariate, in addition to the usual linear form, we allow more flexible forms through cubic spline functions. Model fitting can be conveniently carried out with the R package gamlss [Stasinopoulos and Rigby, 2007], which implements generalized additive models for location, scale, and shape for a wide range of parametric distributions. This makes the method easily accessible to practitioners.
Different combinations of functional forms for the four parameters lead to a large number of models, and we choose the one with the lowest Schwarz Bayesian criterion (SBC). The SBC of a model is defined as

$$\mathrm{SBC} = -2\,\ell(\hat\theta) + \mathrm{df}\,\log n, \qquad (8)$$

where df is the degrees of freedom of the model and $n$ is the sample size. The second term penalizes model complexity. Each linear term contributes 1 df, and each cubic spline term contributes 4 df. Compared with the Akaike information criterion (AIC), whose penalty has the form $2\,\mathrm{df}$, the SBC penalizes model complexity more severely when $\log n > 2$, i.e., when $n \ge 8$. A justification for the use of SBC in the specific context of comparing forecasting rules was recently provided by Gneiting and Raftery.
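The arithmetic of (8) is easy to script. The sketch below computes SBC and AIC, counts model degrees of freedom from the per-term values quoted above (1 per linear term, 4 per cubic spline term), and finds the smallest $n$ at which the SBC penalty exceeds the AIC penalty. Adding one df per parameter for an intercept is our assumption, not something stated in the text, and the function names are hypothetical.

```python
import math

# Per-term df from the text; "const" adds nothing beyond the intercept.
TERM_DF = {"const": 0, "linear": 1, "cs": 4}


def model_df(forms):
    """forms maps each parameter to (form for f_t, form for o_{t-1}).
    Assumes one intercept df per parameter (our assumption)."""
    return sum(1 + TERM_DF[a] + TERM_DF[b] for a, b in forms.values())


def sbc(loglik, df, n):
    """Schwarz Bayesian criterion (8): -2 loglik + df log n."""
    return -2.0 * loglik + df * math.log(n)


def aic(loglik, df):
    """Akaike information criterion: -2 loglik + 2 df."""
    return -2.0 * loglik + 2.0 * df


full = {p: ("cs", "cs") for p in ("mu", "sigma", "nu", "tau")}
smallest_n = next(n for n in range(2, 100) if math.log(n) > 2.0)
```

Under the intercept assumption, the full model with cubic splines in both covariates for all four parameters has $4 \times (1 + 4 + 4) = 36$ df.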
Characterization of forecast uncertainty is done with the best model. Prediction intervals can be constructed from the conditional model, and their accuracy can be assessed by comparing their empirical coverage probabilities with their nominal levels. To check the adequacy of the skew t specification, Rigby and Stasinopoulos suggested using the normalized quantile residuals of Dunn and Smyth. The normalized quantile residuals are defined as

$$\hat r_t = \Phi^{-1}\!\left\{G(o_t \mid \hat\mu_t, \hat\sigma_t, \hat\nu_t, \hat\tau_t)\right\}, \qquad (9)$$

where $\Phi^{-1}$ is the quantile function of the standard normal distribution, and $G$ is the cumulative distribution function corresponding to the density $g$ in (1). If $g$ is correctly specified, $G(o_t \mid \mu_t, \sigma_t, \nu_t, \tau_t)$ evaluated at the true parameters is uniformly distributed on (0, 1), so the true residuals $r_t$ have a standard normal distribution. Therefore, checking the normality of the residuals $\hat r_t$, for instance with a normal Q-Q plot, provides a diagnosis of the adequacy of the distributional specification $g$.
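The sketch below computes the normalized quantile residuals (9) for the skew t in (1), obtaining $G$ by numerical integration because SciPy has no built-in CDF for this parameterization. On data simulated from the model itself, the residuals should look standard normal. The simulation settings and helper names are our assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, t as student_t


def skew_t_cdf(y, mu, sigma, nu, tau):
    """G(y) for density (1), integrating the standardized density numerically."""
    def integrand(z):
        return 2.0 * student_t.pdf(z, tau) * student_t.cdf(nu * z, tau)
    val, _ = quad(integrand, -np.inf, (y - mu) / sigma)
    return float(np.clip(val, 0.0, 1.0))


def quantile_residuals(o, mu, sigma, nu, tau):
    """Normalized quantile residuals (9): Phi^{-1}(G(o_t | fitted parameters))."""
    u = np.array([skew_t_cdf(*row) for row in zip(o, mu, sigma, nu, tau)])
    # Keep u strictly inside (0, 1) so that Phi^{-1} stays finite.
    return norm.ppf(np.clip(u, 1e-12, 1.0 - 1e-12))


def sample_skew_t(mu, sigma, nu, tau, rng):
    """Draw from (1): keep a t draw z with probability F(nu z), else use -z."""
    z = student_t.rvs(tau, size=np.shape(mu), random_state=rng)
    keep = rng.uniform(size=np.shape(mu)) < student_t.cdf(nu * z, tau)
    return mu + sigma * np.where(keep, z, -z)


rng = np.random.default_rng(7)
n = 300
mu = rng.uniform(1.0, 5.0, size=n)
sigma = 0.2 + 0.1 * mu
nu, tau = np.full(n, 0.8), np.full(n, 5.0)
o = sample_skew_t(mu, sigma, nu, tau, rng)
r = quantile_residuals(o, mu, sigma, nu, tau)
```

A normal Q-Q plot of `r` (e.g., with `scipy.stats.probplot`) is then the visual check described above.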
We partition the data into calibration data and validation data. The calibration data are used to fit models and select the best one; the validation data are used to assess out-of-sample performance. We used 7 years of data (2002–2008) for calibration and 2 years (2009–2010) for validation.
5.1. Model Calibration
For each covariate, we considered three basic functional forms of $s$ (constant, linear, and cubic spline) in the generalized additive model for the parameters of the skew t distribution. For each parameter, there are thus nine possible models based on the two covariates ($f_t$ and $o_{t-1}$) and the three functional forms. Table 1 summarizes the model formulas for all nine models. Combining them over the four parameters leads to an overly large number of different models ($9^4 = 6561$). Instead of an exhaustive search for the best model, we used a backward deletion procedure with SBC, starting from a full model. First, a full model is fitted in which both covariates enter all four parameters in the most flexible form, the cubic spline (model 1 in Table 1). Then, we walk through the four parameters in the order $\tau$, $\nu$, $\sigma$, and $\mu$, checking for each parameter whether the model can be reduced to a simpler one. This order reflects the increasing importance of the four parameters in our model fitting experience; we prefer to settle on simpler forms for $\tau$ and $\nu$ before choosing the forms for $\sigma$ and $\mu$. Table 2 summarizes the functional forms of the best fitting models selected for the 6 h lead and 24 h lead forecasts. As expected, the models conditional on 24 h lead forecasts are much more complicated than those conditional on 6 h lead forecasts, because much more variation needs to be accounted for.
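The selection loop can be sketched generically. Below, `fit_sbc` is a hypothetical callback that would fit the skew t model with the given functional forms and return its SBC; to keep the example self-contained we substitute a toy scoring function. The loop starts from the full cubic-spline model and, visiting $\tau$, $\nu$, $\sigma$, $\mu$ in turn, keeps whichever of the nine candidate forms gives the lowest SBC for that parameter. This is our reading of the procedure described above, not the authors' code.

```python
from itertools import product

FORMS = ("const", "linear", "cs")
# Nine candidate forms per parameter: (form for f_t, form for o_{t-1}).
CANDIDATES = list(product(FORMS, FORMS))


def select_forms(fit_sbc, order=("tau", "nu", "sigma", "mu")):
    """One backward pass: start from the full model, then reduce each
    parameter in turn to whichever candidate form lowers SBC the most."""
    model = {p: ("cs", "cs") for p in order}
    best = fit_sbc(model)
    for p in order:
        for cand in CANDIDATES:
            trial = {**model, p: cand}
            score = fit_sbc(trial)
            if score < best:
                model, best = trial, score
    return model, best


# Toy stand-in for fit_sbc (hypothetical): df cost per term plus a large
# penalty whenever a form is simpler than a made-up "needed" form.
TERM_DF = {"const": 0, "linear": 1, "cs": 4}
RANK = {"const": 0, "linear": 1, "cs": 2}
NEEDED = {"mu": ("linear", "cs"), "sigma": ("cs", "cs"),
          "nu": ("const", "const"), "tau": ("const", "const")}


def toy_sbc(model):
    score = 0.0
    for p, (a, b) in model.items():
        score += TERM_DF[a] + TERM_DF[b]
        if RANK[a] < RANK[NEEDED[p][0]] or RANK[b] < RANK[NEEDED[p][1]]:
            score += 100.0
    return score


chosen, chosen_sbc = select_forms(toy_sbc)
```

The pass costs $1 + 4 \times 9 = 37$ fits instead of the 6561 needed for an exhaustive search; the price is that the result can depend on the visiting order.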
Table 1. Summaries of Possible Models With Covariates Forecast Value ($f_t$) and Lag 1 Observed Value ($o_{t-1}$)
cs($f_t$) + cs($o_{t-1}$)
$f_t$ + cs($o_{t-1}$)
Table 2. Summaries of Functional Forms of the Best Fitting Model Selected by SBC With the Calibration Data (2002–2008)
6 h Lead Model
24 h Lead Model
$f_t$ + cs($o_{t-1}$)
cs($f_t$) + cs($o_{t-1}$)
cs($f_t$) + cs($o_{t-1}$)
cs($f_t$) + cs($o_{t-1}$)
$f_t$ + cs($o_{t-1}$)
Figure 4 shows the estimated surfaces of $\mu$ and $\sigma$ of the skew t distribution as functions of $f_t$ and $o_{t-1}$ for the 6 h lead forecast. The location parameter $\mu$ has a surface close to a plane, with a larger $f_t$ or $o_{t-1}$ giving a larger location; $o_{t-1}$ still carries additional information after accounting for $f_t$, although for each $f_t$, $\mu$ is not very sensitive to $o_{t-1}$. The scale parameter $\sigma$ increases as $f_t$ increases, and the rate is much higher when $f_t$ is greater than $o_{t-1}$. The most extreme values of $\sigma$ occur when $f_t$ is high (greater than 10 feet) but $o_{t-1}$ is low (less than 5 feet). The skewness parameter $\nu$ is estimated as 0.106 with standard error 0.02, suggesting right skewness. The degrees of freedom $\tau$ is estimated as 1.143 with standard error 0.07, suggesting heavy tails: because $1 < \tau < 2$, the means of the conditional distributions barely exist and their variances do not exist.
Figure 5 shows the estimated surfaces of all four parameters of the skew t distribution as functions of $f_t$ and $o_{t-1}$ for the 24 h lead forecast. The surface of $\mu$ here is very different from that for the 6 h lead forecast: it still increases nearly linearly as $f_t$ increases, but it also increases nearly linearly, and at a higher rate, as $o_{t-1}$ increases. This means that the 24 h lead point forecast can be improved by incorporating the lag 1 observed river level. The surface of $\sigma$ is also very different from that for the 6 h lead forecast: it increases with $f_t$ for all levels of $o_{t-1}$, not just for low values. The skewness parameter $\nu$ increases as $f_t$ increases but decreases as $o_{t-1}$ increases; higher skewness occurs where $f_t$ is higher and $o_{t-1}$ is lower. Most of the skewness surface is above zero, but negative skewness does occur where $f_t$ is lower and $o_{t-1}$ is higher. The surface of $\tau$ lies in a narrow range between 1 and 3.5, indicating heavier tails than the normal distribution. It increases as $o_{t-1}$ increases; for $o_{t-1}$ below 5 feet it is smaller than 1.5, implying that the variance does not exist.
The adequacy of the skew t distribution is checked visually with Q-Q plots of the normalized quantile residuals, in which sample quantiles are plotted against theoretical quantiles of the standard normal distribution. If the points lie approximately on a straight line, the skew t distribution provides an adequate fit to the data. Figure 6 shows the Q-Q plots of the normalized quantile residuals from the two best fitting models. Except for a possible single outlier in the lower tail for the 24 h lead model, the overall fit appears to be quite good.
5.2. Model Validation
The predictive performance of the best fitting model is checked against the out-of-sample validation data. Each observed river level $o_t$ in the validation data is compared with its conditional distribution given the matching forecast level $f_t$ and lag 1 observation $o_{t-1}$. We study the empirical coverage percentage of the 90% prediction intervals bounded by the 5th and 95th percentiles of the conditional skew t distribution.
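A 90% interval of this kind can be computed by inverting the conditional CDF at 0.05 and 0.95. The sketch below does so with Brent's method on a numerically integrated CDF of (1). The parameter values are illustrative, not fitted values from the Plymouth data, and the fixed root bracket is an assumption that suits moderate tails (very small $\tau$ may need a wider bracket).

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import t as student_t


def std_skew_t_cdf(z, nu, tau):
    """CDF of the standardized density 2 f(z; tau) F(nu z; tau)."""
    def integrand(x):
        return 2.0 * student_t.pdf(x, tau) * student_t.cdf(nu * x, tau)
    val, _ = quad(integrand, -np.inf, z)
    return val


def skew_t_ppf(p, mu, sigma, nu, tau, bracket=(-200.0, 200.0)):
    """Quantile of (1) by root finding on the standardized CDF."""
    z = brentq(lambda x: std_skew_t_cdf(x, nu, tau) - p, *bracket)
    return mu + sigma * z


def prediction_interval(mu, sigma, nu, tau, level=0.90):
    """Central interval bounded by the (1 - level)/2 and (1 + level)/2 quantiles."""
    alpha = (1.0 - level) / 2.0
    return (skew_t_ppf(alpha, mu, sigma, nu, tau),
            skew_t_ppf(1.0 - alpha, mu, sigma, nu, tau))


lower, upper = prediction_interval(mu=3.0, sigma=0.4, nu=0.5, tau=4.0)
```

With $\nu > 0$ the interval is asymmetric about $\mu$, extending farther above than below, which is how the skewness estimates translate into forecast uncertainty.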
The conditional distribution of $o_t$ shifted to the left by $f_t$ is the conditional distribution of the forecast error $e_t = o_t - f_t$. Hence, prediction intervals for actual river levels shifted to the left by $f_t$ are prediction intervals for the forecast errors, which are easier to inspect graphically. Figures 7 and 8 show the observed forecast errors (solid line) over 2009 and 2010, overlaid with the 90% pointwise conditional prediction intervals (gray band) constructed from the conditional model fitted to the 2002–2008 data. The prediction intervals have dynamically changing centers and widths, depending on the values of $f_t$ and $o_{t-1}$. When the realized forecast error is large, the prediction interval from the conditional model is adaptively wider, capturing the observed errors at a rate close to the nominal level of 90%. We also tried using the 2002–2006 and 2002–2007 data as calibration data, and the overall empirical coverage was also good.
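The shift from observation-scale intervals to forecast-error intervals, and the empirical coverage computation, amount to a few array operations. The toy numbers below are made up for illustration and are not Plymouth data; the helper names are hypothetical.

```python
import numpy as np


def to_error_scale(lower, upper, f):
    """Shift observation-scale bounds left by f_t: bounds for e_t = o_t - f_t."""
    f = np.asarray(f, dtype=float)
    return np.asarray(lower, dtype=float) - f, np.asarray(upper, dtype=float) - f


def empirical_coverage(values, lower, upper):
    """Fraction of values falling inside their own [lower, upper] interval."""
    values = np.asarray(values, dtype=float)
    return float(np.mean((values >= lower) & (values <= upper)))


o = np.array([2.0, 3.1, 5.5, 4.0])      # observed levels (toy values)
f = np.array([2.2, 3.0, 4.0, 4.1])      # point forecasts (toy values)
lower = np.array([1.5, 2.5, 4.2, 3.5])  # 5th percentiles (toy values)
upper = np.array([2.8, 3.6, 5.0, 4.8])  # 95th percentiles (toy values)

cov_obs = empirical_coverage(o, lower, upper)
e_lower, e_upper = to_error_scale(lower, upper, f)
cov_err = empirical_coverage(o - f, e_lower, e_upper)
```

Both coverages equal 0.75 here: shifting every quantity by the same $f_t$ leaves the inside/outside classification unchanged, which is why the error-scale plots carry the same coverage information.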
Since uncertainty characterization of forecasts is most important during floods, it is natural to check the performance of the prediction intervals when the river stage is high. The 90th percentile of the observed river levels is about 5 feet. Table 3 summarizes the empirical coverage percentages of the 90% conditional prediction intervals separately for observed river levels over 5 feet and under 5 feet. At first glance, some of the empirical coverage percentages for river levels over 5 feet look alarmingly far from the nominal level (e.g., 73.7% for 6 h lead in 2009 and 100% for 24 h lead in 2009). However, there are only 15 and 8 observations over 5 feet in 2009 and 2010, respectively. Given such small counts, these empirical coverage percentages are not as abnormal as they look. A more comprehensive validation would be desirable as more data become available.
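A rough binomial calculation supports this point. If each of $n$ observations were covered independently with probability 0.90 (independence is our simplifying assumption; day-to-day dependence in river levels would make coverage even more variable), the standard error of the empirical coverage would be $\sqrt{0.9 \times 0.1 / n}$, about 7.7 percentage points for $n = 15$ and 10.6 for $n = 8$, and all 15 observations would be covered with probability $0.9^{15} \approx 0.21$.

```python
from math import sqrt

from scipy.stats import binom

nominal = 0.90
# Standard error of an empirical coverage proportion under independent Bernoulli trials.
se = {n: sqrt(nominal * (1.0 - nominal) / n) for n in (15, 8)}
# Probability that all 15 high-stage observations fall inside their intervals.
p_all_15 = binom.pmf(15, 15, nominal)
```

Under these assumptions, 100% coverage of 15 observations is an unremarkable outcome that would happen in about one year in five.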
Table 3. Empirical Coverage Percentage of the 90% Prediction Intervals Using the Best Model Obtained From the Calibration Data for Observed River Levels Over 5 Feet and Under 5 Feet
Forecast Lead Time | 2009, Over 5 Feet | 2009, Under 5 Feet | 2010, Over 5 Feet | 2010, Under 5 Feet
We have proposed an uncertainty characterization for river level forecasting based on a conditional probabilistic model for river level given a point forecast and other information (e.g., the lag 1 observed river level) available at the time of the forecast. Although the mean of the conditional distribution could be viewed as a new point forecast based on all information at the time of forecasting, our contribution is not to replace the NERFC point forecast but to provide a probability distribution describing its uncertainty. Our model is an application of the general methodology of generalized additive models for location, scale, and shape [Rigby and Stasinopoulos, 2005], giving conditional distributions that change smoothly as the point forecast and lag 1 river level change. The publicly available R package gamlss [Stasinopoulos and Rigby, 2007] not only makes the method easily accessible to practitioners who routinely need forecast uncertainty information, but also makes it openly available for public scrutiny.
The conditional distribution is a skew t distribution whose four parameters (location, scale, skewness, and degrees of freedom) are all modeled as smooth functions of the point forecast and the lag 1 observed river level, both available when the point forecast is issued. The model is intuitive and can be easily applied to other sites or more general forecasting scenarios. The functional forms of the parameters are selected through SBC. The best model is then used to give conditional predictive distributions of river level for validation data not used in fitting. Judging from the agreement between the empirical coverage percentages of the 90% prediction intervals and their nominal level, the predictive distribution performs reasonably well.
The conditional model given the point forecast and other available information can be viewed as a processor that turns a deterministic point forecast into a probabilistic forecast. The whole distribution, or some of its summary statistics, can be issued along with the point forecast river level. For instance, if a 90% prediction interval is issued with a point forecast, users of the forecast will have a clearer understanding of the forecast uncertainty. This can be especially useful when it is infeasible or too expensive to obtain ensemble forecasts from ensemble hydrologic models driven by ensemble precipitation forecasts. Therefore, the method may help the NWS as a cost-effective way to generate probabilistic forecasts from deterministic forecasts until more comprehensive and accurate approaches become available.
A caveat is that the conditional probabilistic model is only applicable within the range of the data used in fitting. Extrapolating far beyond the data range to extremely high forecast values can be dangerous and misleading. For example, the Plymouth data contain no forecast value greater than 17 feet, and hence no information about the variation associated with such a point forecast. When such a point forecast is issued for the first time, one can only hope that the conditional distribution with extrapolated parameters reflects the unforeseen reality. This limitation is common to all forecasting procedures based on models fitted to historical data. We also point out that the model parameters estimated in this study are specific to the site considered (i.e., the Plymouth station on the Pemigewasset River) and to the forecast lead time. Future research can investigate how the parameter estimates vary with forecast lead time and watershed characteristics.
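In operational use, a simple guard can flag point forecasts that fall outside the range of forecasts used for calibration, so that the corresponding intervals are treated with caution. The sketch below is a hypothetical helper of our own; the optional margin is an assumption, and the calibration values are made up.

```python
import numpy as np


def outside_calibration_range(f_new, f_calib, margin=0.0):
    """True where a new point forecast lies outside [min, max] of the
    calibration forecasts, widened by an optional margin (same units)."""
    lo, hi = np.min(f_calib), np.max(f_calib)
    f_new = np.asarray(f_new, dtype=float)
    return (f_new < lo - margin) | (f_new > hi + margin)


f_calib = np.array([1.2, 3.4, 8.9, 16.8])  # toy calibration forecasts (feet)
flags = outside_calibration_range([0.5, 10.0, 17.5], f_calib)
```

A flag does not make the extrapolated interval wrong; it only marks forecasts for which the fitted smooth functions have no data support.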