Statistical applications of physically based hydrologic models to seasonal streamflow forecasts



[1] Despite advances in physically based hydrologic models and prediction systems, long-standing statistical methods remain a fundamental component in most operational forecasts of seasonal streamflow. We develop a hybrid framework that employs gridded observed precipitation and model-simulated snow water equivalent (SWE) data as predictors in regression equations adapted from an operational forecasting environment. We test the modified approach using the semidistributed variable infiltration capacity hydrologic model in a case study of California's Sacramento River, San Joaquin River, and Tulare Lake hydrologic regions. The approach employs a principal components regression methodology, adapted from the Natural Resources Conservation Service, which leverages the ability of the distributed model to provide an added dimension to SWE predictors in a statistical framework. Hybrid forecasts based on data simulated at grid points acting as surrogates for ground-based observing stations are found to perform comparably to those based on their observed counterparts. When a larger selection of grid points is considered as potential predictors, hybrid forecasts achieve superior skill, with the largest benefits in watersheds that are poorly represented in terms of ground-based observations. Forecasts are also found to offer overall improvement over those officially issued by California's Department of Water Resources, although their specific performance in dry years is less consistent. The study demonstrates the utility of physically based models within an operational statistical framework, as well as the ability of the approach to identify locations with strong predictive skill for potential ground station implementation.

1. Introduction

[2] The scarcity of water has defined much of the history in the western United States and continues to be one of its most complex and pressing public issues today. Decisions related to water usage have significant economic consequences, often with far-reaching implications that affect the welfare of the general public [e.g., Glantz, 1982]. It is therefore critical to ensure that this public resource is used most efficiently, which inherently involves accurate forecasting of its future availability.

[3] The western United States is also distinctive in that over half of its annual streamflow is derived from snow, which acts as a natural reservoir at higher elevations until it runs off in the spring. Since the 1930s, this relationship has been exploited by the Natural Resources Conservation Service (NRCS) [Helms et al., 2008], which now works with National Weather Service (NWS) River Forecast Centers (RFCs) to issue seasonal water supply forecasts throughout the western United States [Franz et al., 2003]. California, which began water supply forecasting in 1929, remains the lone state to conduct its own snow surveys and issue independent forecasts under the direction of its Department of Water Resources (DWR) [Hart and Gehrke, 1990].

[4] Notwithstanding more recent modifications to the mechanics of the approach [Garen, 1992; Pagano et al., 2009], most early forecasting techniques remain largely in active use today. The basic framework is similar for NRCS and DWR, with each relying on multiple regression techniques to relate a collection of predictors (snow water equivalent (SWE), accumulated precipitation, antecedent runoff, and in some cases, seasonal climate indices such as those based on the El Niño/Southern Oscillation) to a predictand (seasonal streamflow volume). From January to June, NRCS produces monthly forecasts for various target periods at 732 locations throughout the West, with unofficial “guidance” outlooks released on a daily basis for a subset of locations [Pagano et al., 2009]. DWR's Bulletin 120 contains monthly forecasts of April–July streamflow for roughly 40 locations from February to May, with weekly updates issued from February to mid-June or as conditions warrant (DWR Division of Flood Management, personal communication, 2008). Predictor data are acquired from various ground-based sources, including storage precipitation gauges and U.S. Geological Survey (USGS) streamflow data that have been adjusted (“naturalized”) to account for upstream anthropogenic effects such as reservoirs or diversions. For SWE data, NRCS employs both manual snow course and automated snow sensor observations from its snow telemetry (SNOTEL) network. DWR depends primarily on snow course data from California's Cooperative Snow Surveys Program for its first-of-the-month forecasts, using snow sensors to help estimate missing data or correct erroneous data and relying on them exclusively for its weekly updates.

[5] As a complement to statistical methodologies, ensemble streamflow prediction (ESP) employs hydrologic and river routing models to produce forecasts of watershed runoff and streamflow [Day, 1985; Twedt et al., 1977]. In recent years, ESP has become a more central component of NWS water supply forecasting activities in the western United States, as implemented in the NWS River Forecast System (RFS) [Anderson, 1973; McEnery et al., 2005], and has also been applied on a limited basis by NRCS. However, with a few exceptions (e.g., Kim et al. [2001], who evaluated the performance of ESP in Korea), studies that use “hindcasts” or “reforecasts” have not demonstrated that ESP can generate significantly more accurate volumetric forecasts than those from existing statistical systems [Pagano et al., 2009]. Yet efforts to incorporate ESP into operational decision support systems have advanced despite these and other hurdles [e.g., Georgakakos et al., 2005].

[6] Prediction approaches that incorporate physically based hydrologic models contain strengths not present in purely statistical systems. These include the ability to characterize current hydrologic variables in much greater spatial detail than can be provided by point observations alone [Li et al., 2009; Wood and Lettenmaier, 2006]. Another is the leveraging of physical algorithms to simulate hydrometeorological situations not found in the historic training period, which is also possible via stochastic weather generators [see, e.g., Wilks, 1992]. Distributed, physically based estimates are useful not only for dynamical simulations, but can expand the predictor set for statistical forecasting applications as well. A hybrid approach that combines the initial conditions provided by a physically based hydrologic model with the regression-based methods used operationally has the potential to improve seasonal forecast skill.

[7] In this paper, we explore the utility of a hybrid prediction approach in a case study involving DWR's seasonal forecasting system. The study was motivated by an overarching interest in the practical integration of model-based hydrologic simulation and prediction methods within water resources decision support settings. The approach was motivated by discussions with DWR personnel, who indicated that adaptations specifically tailored to their established statistical methodology were more likely to be implemented than a larger technological change toward purely model-based forecasting. We therefore give particular attention to comparisons of a hybrid approach with DWR's operational water supply forecasts.

2. Study Area

[8] California's high demand for water is fulfilled by a complex water supply system, including most notably the State Water Project (SWP), operated by DWR, and the Central Valley Project (CVP), operated by the U.S. Bureau of Reclamation. Together, they deliver roughly one-third of the 34 billion cubic m (bcm) (28 million acre ft (maf)) of water consumed annually statewide, with local projects, groundwater, and Colorado River imports providing the rest [California Department of Water Resources (CDWR), 2009]. Initial SWP water allocations are generally issued in late November/early December, although these are based mainly on current reservoir conditions and conservative hydrologic projections (i.e., climatology). Most key decisions regarding water supply usage (e.g., crop selection, groundwater needs) are reserved for January or February when the first snow surveys are conducted, and final allocations are typically issued in May (DWR Division of Flood Management, personal communication, 2011).

[9] Precipitation in California varies greatly from >3550 mm (140 in) in the northwestern part of the state to <100 mm (4 in) in the southeastern part [CDWR, 2003]. With a climate dominated by the Pacific storm track, 75% of this precipitation falls between November and March, with the majority occurring from December through February [Carle, 2009]. Orographic effects generated by California's massive granite backbone, the Sierra Nevada, cause much of this precipitation to fall as snow on its western slopes. The resulting runoff forms the Central Valley drainage, which acts as a funnel for the state's two longest rivers, the Sacramento and San Joaquin, as they make their way to San Francisco Bay and the Pacific Ocean.

[10] Contained within the Central Valley drainage are three distinct hydrologic regions, which together account for about half of the state's average annual streamflow of 88 bcm (71 maf). The regions are further subdivided into 14 major watersheds whose seasonal streamflows are forecast by Bulletin 120 at the primary locations in Figure 1; typical response times for these basins range from about 6 h (Tule) to 8 days (Upper Sacramento) [U.S. Army Corps of Engineers (USACE), 2001]. Summary statistics for each of the hydrologic regions and watersheds are provided in Table 1. The Sacramento region is the wettest of the three, providing the bulk of the SWP and CVP exports to the agricultural areas and population centers in the drier south. The San Joaquin region is characterized by watersheds with higher elevations that generally reach the peaks of their hydrographs later in the spring. The Tulare Lake region is naturally an endorheic basin, separated from the San Joaquin by a low, broad ridge that is overtopped by the Kings River in the wettest of years [CDWR, 2009; Carle, 2009]. Note that the Cosumnes and Mokelumne, while grouped in the San Joaquin region by Bulletin 120, are hydrologically distinct and more or less independent of the San Joaquin River system.

Figure 1.

The 14 watersheds of the Sacramento (blue), San Joaquin (green), and Tulare Lake (red) hydrologic regions, forming the study area for the paper.

Table 1. Average Annual Statistics for the Three Hydrologic Regions (bold) and the 14 Watersheds in the Studya

| | Drainage Area (km2 (mi2)) | Annual Precip (mm (in)) | Annual Runoff (mcm (taf)) | April–July Runoff (mcm (taf)) | Annual Runoff Ratio |
|---|---|---|---|---|---|
| Sacramento region | 70,600 (27,200) | 930 (36.7) | 27,600 (22,400) | N/A | 0.42 |
| 1 Upper Sacramento | 26,400 (10,200) | 880 (34.7) | 11,000 (8910) | 3080 (2500) | 0.47 |
| 2 Feather | 9310 (3590) | 1030 (40.7) | 5700 (4620) | 2200 (1780) | 0.59 |
| 3 Yuba | 3400 (1310) | 1600 (62.8) | 2930 (2380) | 1240 (1010) | 0.54 |
| 4 American | 4830 (1860) | 1270 (50.1) | 3350 (2720) | 1530 (1240) | 0.55 |
| San Joaquin region | 39,400 (15,200) | 670 (26.3) | 9700 (7900) | N/A | 0.37 |
| 5 Cosumnes | 1650 (640) | 1000 (39.3) | 480 (390) | 160 (130) | 0.29 |
| 6 Mokelumne | 2040 (790) | 1260 (49.5) | 930 (750) | 570 (460) | 0.36 |
| 7 Stanislaus | 2590 (1000) | 1110 (43.8) | 1440 (1170) | 870 (710) | 0.50 |
| 8 Tuolumne | 4180 (1610) | 1060 (41.8) | 2410 (1950) | 1500 (1220) | 0.54 |
| 9 Merced | 2840 (1100) | 970 (38.2) | 1240 (1010) | 780 (630) | 0.45 |
| 10 San Joaquin | 4410 (1700) | 970 (38.2) | 2260 (1830) | 1550 (1260) | 0.53 |
| Tulare Lake region | 44,100 (17,000) | 390 (15.2) | 4100 (3300) | N/A | 0.24 |
| 11 Kings | 4370 (1690) | 950 (37.5) | 2120 (1720) | 1510 (1220) | 0.51 |
| 12 Kaweah | 2130 (820) | 910 (35.7) | 560 (450) | 350 (280) | 0.29 |
| 13 Tule | 1060 (410) | 700 (27.4) | 180 (150) | 80 (60) | 0.24 |
| 14 Kern | 5370 (2070) | 560 (21.9) | 900 (730) | 570 (460) | 0.30 |

a For hydrologic regions, area and precipitation data are from CDWR [2009], and runoff data are from Dziegielewski et al. [1993] and CSWRB [1951]. For watersheds, drainage areas were calculated from HUC250k shapefiles, precipitation was calculated by areally averaging VIC forcing data over water years 1956–2005, and runoff was calculated from CDEC data of unimpaired streamflows at the points indicated in Figure 1 (also over water years 1956–2005). The annual runoff ratio is defined as the ratio of annual runoff to annual precipitation. mcm = million cubic m; taf = thousand acre-feet.

3. Methods

[11] A summary of the forecasts compared in this study is presented in Table 2. Both principal components regression (PCR) and Z-score regression methodologies were adapted from the NRCS as detailed below. In contrast to the rest of the West, NRCS does not issue forecasts for watersheds in the Central Valley drainage, although it serves other parts of California such as the Klamath River and Lake Tahoe basins. The NWS, on the other hand, is represented in the region by the California Nevada RFC (CNRFC), which issues its own water supply forecasts using a combination of ESP and its own version of NRCS' PCR models (NWS California Nevada River Forecast Center, personal communication, 2011).

Table 2. Summary of the Forecasts Compared in This Studya

| SWE Source | P Source | RO Source | Regression Methodology | Calibration Period (years) |
|---|---|---|---|---|
| Observed courses | Observed gauges | Observed | Multiple linear (DWR) | 50 |
| Observed courses | Observed gauges | Observed | PCR, Z-score | 50, 25 |
| Observed courses and sensors | Observed gauges | Observed | PCR, Z-score | 25 |
| Surrogate courses | Surrogate gauges | Observed | PCR, Z-score | 50, 25 |
| Surrogate courses and sensors | Surrogate gauges | Observed | PCR, Z-score | 50, 25 |
| Simulated all | Gridded all | Observed | PCR, Z-score | 50, 25 |

a Note that “surrogate” SWE predictors are a subset of “simulated” SWE predictors, and “surrogate” P predictors are a subset of “gridded” P predictors.

3.1. Statistical Approach

[12] The statistical forecasting models of both DWR and NRCS can be represented as

Q = f(SWE, P, RO)

where the target period streamflow (Q) is a function of three general categories of predictor variables: snow water equivalent (SWE), accumulated precipitation (P), and antecedent runoff (RO). DWR relies on standard multiple regression to develop its forecast equations, which employ various types of these predictor variables as listed in Table 3. The two that are common to all watersheds, SWE and accumulated precipitation, are weighted indices of observations at multiple locations (typically 10–20 for SWE and ∼10 for precipitation) in and around the watershed boundaries. For the six watersheds with more drastic ranges of topography (Feather, American, San Joaquin, Kings, Kaweah, and Kern), SWE is further divided into high- and low-elevation indices. Runoff, which is measured at the same stations for which the forecasts are issued, is more subjectively used depending upon the specific characteristics of each watershed; forecasts in the Upper Sacramento, for example, employ total runoff over the previous two water years in consideration of the greater water retention properties of the volcanic soils in that region (DWR Division of Flood Management, personal communication, 2008). The form of the calibrated equation varies by watershed, but generally consists of a polynomial model with predictors that have been transformed (typically via a power function) to account for nonlinearities.
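The general form Q = f(SWE, P, RO) can be illustrated with a minimal sketch of a DWR-style equation: composite predictor indices are power-transformed to account for nonlinearity and fit by ordinary least squares. This is an illustrative simplification (the function name, the single shared exponent, and the synthetic data are hypothetical; operational equations vary by watershed and may include polynomial terms).

```python
import numpy as np

def fit_water_supply_model(swe, precip, runoff, q, power=0.5):
    """Fit a simplified DWR-style linear model on power-transformed
    predictor indices. Returns the coefficients and a predict function."""
    def design(s, p, r):
        s, p, r = (np.asarray(a, dtype=float) for a in (s, p, r))
        return np.column_stack([
            np.power(s, power),   # transformed SWE index
            np.power(p, power),   # transformed precipitation index
            np.power(r, power),   # transformed antecedent runoff
            np.ones_like(s),      # intercept
        ])
    coef, *_ = np.linalg.lstsq(design(swe, precip, runoff), q, rcond=None)
    return coef, lambda s, p, r: design(s, p, r) @ coef
```

Usage would look like `coef, predict = fit_water_supply_model(swe, precip, runoff, q)` followed by `predict(swe_new, p_new, ro_new)` for a new year's indices.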

Table 3. Predictor Index Variables Used in DWR's Forecast Equations for Each of the 14 Watersheds in the Studya

| Watershed | SWE | SWEhi | SWElo | POctMar | PAprJun | ROOctMar | ROOctFeb | ROpreAprJul | ROpre2wy |
|---|---|---|---|---|---|---|---|---|---|
| 1 | x | | | x | x | | x | | x |
| 2 | | x | x | x | x | x | | | |
| 3 | x | | | x | x | | | x | |
| 4 | | x | x | x | x | | | x | |
| 5 | x | | | x | x | | | | |
| 6 | x | | | x | x | | | | |
| 7 | x | | | x | x | x | | | |
| 8 | x | | | x | x | | | | |
| 9 | x | | | x | x | | | x | |
| 10 | | x | x | x | x | x | | x | |
| 11 | | x | x | x | x | | | x | |
| 12 | | x | x | x | x | x | | | |
| 13 | x | | | x | x | x | | x | |
| 14 | | x | x | x | x | x | | x | |

a SWE, snow water equivalent; SWEhi, high-elevation SWE; SWElo, low-elevation SWE; POctMar, Oct–Mar precipitation for the current water year; PAprJun, Apr–Jun precipitation for the current water year; ROOctMar, Oct–Mar runoff for the current water year; ROOctFeb, Oct–Feb runoff for the current water year; ROpreAprJul, Apr–Jul runoff for the previous water year; and ROpre2wy, total runoff for the previous two water years.

[13] Table 3 shows that several of DWR's predictors describe conditions that may be unknown at the time of a forecast. A forecaster working on 1 February does not know the current water year's October–March precipitation, for example. To account for this discrepancy, DWR relies on what are termed “future variables,” which (as the name suggests) extrapolate current conditions of predictor variables to future conditions using their long-term medians. Thus, using the same example, the observed precipitation from October to January is added to the long-term median precipitation for February and March to derive the total October–March precipitation used in the 1 February forecast. Similarly, SWE predictors always describe conditions on 1 April (when peak SWE is typically considered to occur), and forecasts always reflect the April–July target period, even when they are made later in the water year and a portion of the target period's streamflow has already been observed. This practice allows the use of a single equation for all dates forecasts are issued, achieving greater month-to-month consistency in predictor variables and a larger sample size for equation calibration. Note that DWR's final forecasts are the result of balancing predictions at several points in each watershed, regional comparisons of trends and relationships, and “forecaster feel” or experience (DWR Division of Flood Management, personal communication, 2011).
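The “future variable” construction described above amounts to adding long-term monthly medians for the months not yet observed, as in this short sketch (the function name and all numeric values are hypothetical):

```python
def future_variable(observed_to_date, monthly_medians, remaining_months):
    """DWR-style "future variable": extrapolate an accumulated predictor
    by adding long-term monthly medians for the months not yet observed."""
    return observed_to_date + sum(monthly_medians[m] for m in remaining_months)

# Example for a 1 February forecast of Oct-Mar precipitation
# (values hypothetical, in mm):
medians = {"Feb": 120.0, "Mar": 95.0}
oct_jan_observed = 430.0
oct_mar_estimate = future_variable(oct_jan_observed, medians, ["Feb", "Mar"])
# oct_mar_estimate is 430 + 120 + 95 = 645.0
```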

[14] In contrast to DWR, NRCS uses a regression approach based on principal components that dates to the early 1990s. A well-known complication in multivariate regression is collinearity among the predictor variables. DWR's practice of constructing composite indices from data of like data type (e.g., a single SWE value that is a weighted index of multiple SWE observations) partially circumvents this problem, but because the weights are subjectively determined outside of the regression, they are not statistically optimal [Garen, 1992]. PCR is a method of restructuring the predictor variables into uncorrelated principal components, which become the regressors.

[15] The NRCS approach considers only data known at the time of forecast as candidate predictors, which leads to the use of separate equations with varying predictors for each forecast date. Regression coefficients are determined by arranging principal components in order of decreasing eigenvalue (explained variance), developing an equation that sequentially retains only those principal components deemed significant via a t-test, and inverting the transformation so that coefficients are expressed in terms of the original predictor variables. The routine employs an iterative search algorithm that optimizes variable combinations by developing all possible equations resulting from an increasing number of predictors. With each additional variable, the standard error resulting from a jackknife procedure is used to order the equations, and the top 30 equations are identified. When the top 30 equations no longer change from one round to the next, a final equation is selected by striking a compromise between jackknife standard error and month-to-month variable consistency. NRCS' complete regression methodology is fully detailed by Garen [1992] and Garen and Pagano [2007].
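The core of the PCR procedure (standardize, rotate to components ordered by decreasing eigenvalue, sequentially retain components passing a t-test, and express coefficients in terms of the original predictors) can be sketched as follows. This is a bare-bones illustration under stated assumptions, not NRCS software: the function name and the fixed critical value `t_crit` are hypothetical, and the iterative variable-search and jackknife steps of Garen [1992] are omitted.

```python
import numpy as np

def pcr_fit(X, y, t_crit=2.0):
    """Principal components regression sketch: fit standardized y on
    leading PCs of standardized X, stopping at the first component
    whose t-statistic falls below t_crit. Returns coefficients
    expressed on the standardized original predictors."""
    n, p = X.shape
    Xs = (X - X.mean(0)) / X.std(0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    eigval, eigvec = np.linalg.eigh(np.corrcoef(Xs, rowvar=False))
    eigvec = eigvec[:, np.argsort(eigval)[::-1]]  # decreasing eigenvalue
    Z = Xs @ eigvec                               # principal components
    keep = []
    for j in range(p):
        cols = keep + [j]
        beta, *_ = np.linalg.lstsq(Z[:, cols], ys, rcond=None)
        resid = ys - Z[:, cols] @ beta
        sigma2 = resid @ resid / (n - len(cols) - 1)
        se_j = np.sqrt(sigma2 / (Z[:, j] @ Z[:, j]))  # PCs are orthogonal
        if abs(beta[-1]) / se_j < t_crit:
            break                  # stop at first nonsignificant component
        keep = cols
    if not keep:
        return np.zeros(p)
    beta_pc, *_ = np.linalg.lstsq(Z[:, keep], ys, rcond=None)
    return eigvec[:, keep] @ beta_pc  # back-transform to predictor space
```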

[16] Z-score regression is a heuristic statistical method that takes advantage of predictor collinearity to relax the requirement for serial completeness [Pagano et al., 2009; Garen and Pagano, 2007]. Individual predictors are first pooled into groups of like data type, and each observation is converted into a Z-score, or standardized anomaly. For each data type, the coefficient of determination (R2) between standardized predictors and standardized predictand are used to generate an index wherein each element is a weighted average of all available Z-scores for that year. The computed index for each data type is then itself standardized, and the process is repeated to create a single composite index reflective of all the predictor data available each year. This composite index is then regressed against the standardized streamflow volumes using a least squares method. While PCR remains the official forecasting methodology for NRCS, Z-score regression is currently used to provide daily objective guidance to users for a subset of forecast locations [Pagano et al., 2009].

3.2. Physically Based Simulation

[17] Regardless of the regression technique, all of the aforementioned methods employ observed data as predictors. In this study, we contribute simulated SWE predictors using the snow model [Andreadis et al., 2009] contained in the variable infiltration capacity (VIC) macroscale hydrologic model [Liang et al., 1994]. VIC is a semidistributed grid-based model that is typical of land surface models (LSMs) used in most numerical weather prediction and climate models [Wood and Lettenmaier, 2006]. Like other LSMs, VIC solves the water and energy balance at each time step, but is distinguished by its parameterization of subgrid variability in soil moisture, topography, and vegetation. VIC has been successfully applied in a number of research studies involving major river basins worldwide [Nijssen et al., 1997], and was demonstrated to reproduce SWE, soil moisture, and runoff data that compared favorably with observed data for watersheds of varying size across the conterminous United States [Maurer et al., 2002].

[18] Snow accumulation and ablation processes within VIC are simulated using a two-layer energy and mass balance approach. A thin surface layer is used to solve energy exchange with the atmosphere, while the lower or pack layer is used as storage to simulate deeper snowpacks [Andreadis et al., 2009; Cherkauer and Lettenmaier, 2003]. The model contains an explicit overstory canopy interception scheme that accounts for sublimation, meltwater drip, and mass release of intercepted snow.

[19] For this study, VIC implementations described by VanRheenen et al. [2004] and Tang and Lettenmaier [2010] were adapted to a spatial resolution of 1/16° (∼5–7 km at this latitude). Each grid cell was further subdivided into as many as five elevation bands, depending on elevation range. VIC as a whole was implemented at a 24-h time step, with the embedded snow model using a 1-h time step. The model was forced by daily precipitation and maximum/minimum temperature data from National Oceanic and Atmospheric Administration (NOAA) Cooperative Observer stations, and daily wind data from the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) Reanalysis Project, following the methods of Maurer et al. [2002].

3.3. Hybrid Model

[20] Two objectives guided the development of the hybrid models: first, to produce methods relevant to DWR's current operational forecasting setting, and second, to allow objective comparisons between the respective skill of simulated and observed predictors. Our regression approach is therefore a compromise between the aforementioned statistical techniques. We employed the same predictor variables in each watershed as those listed in Table 3, and we calibrated our equations over the same 50 year period (water years 1956–2005) used by DWR. Instead of standard multiple regression, however, we adopted the PCR and Z-score methodologies used by NRCS. PCR forecasts were generated using the equation with the median jackknifed standard error of the best 30 models resulting from the search algorithm. This choice “handicapped” our results to match more closely those that might be expected operationally, which typically reflect models with slightly less than optimal skill due to the additional selection criterion of maintaining consistent predictors from month-to-month (see section 3.1). In actuality, the difference in skill between the median equation and the best equation was negligible.

[21] Unlike DWR's methodology, predictor variables were used only as they were known at the time of the forecast; SWE data were not extrapolated to 1 April, and data representing accumulated quantities were used only once entirely in the past. A forecast issued on 1 February, for example, included just October–January precipitation in its value for the October–March precipitation predictor and did not use the April–June precipitation predictor at all. We also imposed the criteria that predictors have a correlation of at least 0.3 with the predictand (as is the practice at NRCS) and at least 10 nonzero values over the 50 year period. In addition, rather than using a constant target period regardless of the forecast date, we predicted shrinking target periods reflecting only the months remaining in a season. A forecast on 1 April was thus for the entire April–July season, but a forecast on 1 May was made just for May–July. This choice has two benefits. First, in contrast to DWR's practice of using 1 April SWE as a predictor even in forecasts issued later in the season, our usage of current conditions would otherwise result in constant-target-period forecasts that employ late-season SWE to predict streamflow that has already occurred. Second, by focusing our attention on just the water remaining in the snowpack, we are better able to test the late-season performance of a hybrid approach, which can integrate SWE simulated at the highest elevations at a time when snow courses or sensors at lower elevations may already be snow free.
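The two screening criteria above (correlation of at least 0.3 with the predictand and at least 10 nonzero values over the record) are simple to express in code. A minimal sketch, with a hypothetical function name; whether the correlation test is one- or two-sided is an assumption here (absolute value is used):

```python
import numpy as np

def screen_predictors(X, y, min_corr=0.3, min_nonzero=10):
    """Return column indices of candidate predictors that pass both
    screening criteria: |r| >= min_corr with the predictand and at
    least min_nonzero nonzero values over the calibration record."""
    keep = []
    for j in range(X.shape[1]):
        x = X[:, j]
        if np.count_nonzero(x) < min_nonzero:
            continue                      # too sparse (e.g., low snow course)
        if abs(np.corrcoef(x, y)[0, 1]) >= min_corr:
            keep.append(j)
    return keep
```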

[22] The hybrid model predictors included VIC-simulated SWE data and gridded precipitation forcing data as described in section 3.2. Runoff predictors were obtained from observed records of unimpaired streamflow, archived at the California Data Exchange Center (CDEC), for the locations shown in Figure 1. As at DWR, all SWE data were valid on the first of the month. The domain from which the simulated data were selected depended on the locations of DWR's snow and precipitation observing stations, many of which are located outside catchment boundaries (DWR Division of Flood Management, personal communication, 2008). Watershed boundaries were expanded (“offset”) by either 1/4°, 1/2°, or 3/4° in latitude or longitude in order to encompass all of the observing stations, and simulated data were compiled for all grid points included in the expanded domains. Since observations external to catchment boundaries are often considered proxies for unmonitored points within, some may question the need for external data when all internal data are simulated. Yet the weather patterns resulting in precipitation and snow yield covariability at these locations as well, and thus the offsets offer an opportunity to assess the relationship of predictive skill to distance from watershed boundaries. Figure 2 shows watershed areas, expansion offsets, and expanded areas (Figure 2, top), along with the numbers of simulated SWE and precipitation predictors (Figure 2, middle) for each watershed. The greater number of SWE than precipitation predictors results from the multiple elevation bands present within each grid cell, as discussed in section 3.2.

Figure 2.

(top) Watershed and expanded predictor areas; (middle) numbers of simulated predictors; and (bottom) numbers of observed predictors.

[23] It should be noted that our hybrid approach does not account for structural uncertainty in forecasts, a topic that has received considerable attention in the recent hydrologic literature [e.g., Devineni et al., 2008; Sharma and Chowdhury, 2011]. Structural uncertainty, which concerns forecast errors that can be attributed to deficiencies in model formulations, is particularly relevant when models are based on a large number of candidate predictors. A simple strategy for addressing this issue could involve a static combination of forecast models. In the present context, such an approach might derive a weighted average of forecasts resulting from each of the 30 models selected by the PCR search algorithm, with weights determined on the basis of model fitness. While static combination has been shown to improve the stability of model predictions, however, unwanted side-effects such as biases in the estimation of high and low flows are likely to result [Sharma and Chowdhury, 2011]. Other strategies such as dynamic model combination could better address the issue of structural uncertainty but would increase system complexity and lie outside the scope of the study to investigate.
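The static combination strategy described above, which the paper discusses but does not implement, could be sketched as a weighted average with weights inversely proportional to a fitness measure such as jackknife standard error. The function name and the particular weighting rule are hypothetical:

```python
import numpy as np

def combine_forecasts(forecasts, errors):
    """Static combination sketch: weighted average of member forecasts,
    weighting each model inversely by its standard error (one plausible
    fitness measure). forecasts has shape (n_models, n_years)."""
    w = 1.0 / np.asarray(errors, dtype=float)
    w /= w.sum()                                  # normalize weights
    return np.asarray(forecasts, dtype=float).T @ w
```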

3.4. Control Models

[24] Our “control” forecast models were developed using the same approach as the hybrid models (i.e., PCR and Z-score), but incorporated observed data obtained from CDEC for stations used by DWR in their monthly forecasts. In contrast to forecasts based on simulated data, however, forecasts based on observed data are less straightforward. Many observed records are serially incomplete, an issue that can be circumvented using an approach like Z-score regression, but which complicates methods like PCR. Moreover, DWR's official SWE predictors comprise manual snow course observations but not automated snow sensor observations, which occasionally suffer inaccuracies due to problems like “ice bridging” (DWR Division of Flood Management, personal communication, 2008). Thus, official observations are generally taken only as needed in the months of February to May, precluding the development of control models at other times of the year.

[25] To address this issue, we incorporated snow sensor data by selecting a collection of snow sensors that mimicked the official snow courses. Most snow sensor records, however, date only as far back as the late 1970s or early 1980s, whereas DWR's official equations are calibrated over a 50 year period from water years 1956 to 2005. To support a consistent comparison between prediction approaches, we devised two types of control models: one that included just snow course data and was calibrated over the full 50 year period used by DWR, and one that included both snow course and snow sensor data and was calibrated over a 25 year period from water years 1981–2005. The numbers of snow courses, snow sensors, and rain (precipitation) gauges used for each watershed are indicated in Figure 2 (bottom). We estimated missing SWE and precipitation observations (as required by PCR) by regressing stations with missing data against those of like data type within the same watershed, which is the practice preferred by NRCS (NRCS National Water and Climate Center, personal communication, 2010). Stations that had <80% of a complete record were excluded from the analysis.
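The missing-data estimation step described above can be illustrated for the simplest case of a single complete reference station of the same data type. This sketch uses a hypothetical function name and ordinary least squares; NRCS practice may pool several nearby stations rather than one:

```python
import numpy as np

def infill_missing(target, reference):
    """Estimate missing (NaN) values in one station's record by
    regressing it on a serially complete station of like data type
    within the same watershed, then predicting the gaps."""
    target = np.asarray(target, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mask = ~np.isnan(target)
    slope, intercept = np.polyfit(reference[mask], target[mask], 1)
    filled = target.copy()
    filled[~mask] = slope * reference[~mask] + intercept
    return filled
```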

3.5. Surrogate Observational Data

[26] Despite the efforts detailed above, SWE records for many watersheds remained incomplete in June and July. We therefore added an analysis based on “surrogate” observational data, which consisted of either gridded precipitation data or simulated SWE data at grid points adjacent to each observing station. Surrogate SWE data were selected from the model's elevation band most closely corresponding to the elevation of the observing station.

[27] The surrogate approach allows us to quantify the benefit of additional predictor candidates in a statistical model, which is a central question of this study. Comparisons between the predictive skill of simulated and observed data are complicated by differences between the two data sets. The surrogate approach circumvents the uncertainty associated with these differences since the surrogate data are selected from, and thus a subset of, the simulated data. The surrogate data also provide a baseline from which to compare the skill of actual observations, or to estimate their potential skill in months with insufficient data. These comparisons are discussed in section 4, which also includes comparisons with forecasts retrospectively generated using DWR's current regression equations, circa 2006 (DWR Division of Flood Management, personal communication, 2008).

4. Results

4.1. Forecast Skill Analysis

[28] We first present our results from the surrogate analysis to establish a baseline of performance. Jackknifed standard error comparisons between forecasts based on the full suite of simulated data and those based on just the selected surrogate data are presented in Figure 3 for both Z-score and PCR methodologies. All results are based on forecasts generated using linear forms of regression models. Similar experiments based on nonlinear equations did not yield more normally distributed residuals or lower standard errors than their linear counterparts.
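The jackknife standard error reported in Figure 3 can be illustrated with a leave-one-out refit of a linear regression model. The predictor and streamflow series below are synthetic stand-ins for a basin SWE index and April–July flows, not data from the study:

```python
import numpy as np

def jackknife_se(X, y):
    """Leave-one-out (jackknife) standard error: each year is predicted
    by a linear model fit to all other years.  Returned as a percentage
    of mean observed streamflow, as plotted in Figure 3."""
    X = np.column_stack([np.ones(len(y)), X])  # add intercept column
    resid = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid[i] = y[i] - X[i] @ beta
    return 100.0 * np.sqrt(np.mean(resid ** 2)) / y.mean()

# Synthetic 50 year record: flow roughly linear in the SWE index
rng = np.random.default_rng(0)
swe = rng.uniform(100, 600, 50)
flow = 2.0 * swe + rng.normal(0, 40, 50)
print(round(jackknife_se(swe, flow), 1))  # percent of mean flow
```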

Figure 3.

Jackknife standard error as a percentage of mean streamflow for a shrinking target period over water years 1956–2005.

[29] Several patterns emerge from the results of Figure 3. For each watershed, PCR forecasts are comparable to or better than those produced using a Z-score approach. Increasing the number of candidate predictors consistently improves forecast skill under the PCR approach, with models developed from all grid points in a domain (hybrid forecasts) achieving the lowest standard errors. For almost all watersheds, including the Yuba and Merced in particular, the largest improvements occur late in the snowmelt season, in June and July, supporting the notion that a hybrid model can exploit the ability to simulate SWE at high elevations. The greatest overall improvements occur in the Cosumnes and Tule, which are less snowmelt dominant and have lower elevations and streamflows than the other watersheds in the study. These results appear to reflect mostly the coverage offered by existing observation networks, which are less dense there than in other catchments, and the particular performance of the hybrid forecasts in wet years (section 4.2), which tend to dominate standard error for all years because of the positive skew of the streamflow distributions. Note, however, that even with these improvements, forecast error remains higher in the Cosumnes and Tule than in other watersheds because of the higher coefficients of variation (CV) of their April–July streamflows (0.81 and 0.97, respectively).

[30] Results using a Z-score approach tell a different story. In most watersheds, forecasts based on the full set of simulated data are comparable to or, at best, marginally better than those based on just the selected surrogate data; in the Kings River watershed, they are actually worse. Interestingly, as in the PCR approach, the largest improvements occur in the Cosumnes and Tule River basins. The poor performance of the Z-score approach stems from the lack of a search routine to first screen out predictors with negligible predictive skill, which dilutes the skill of the group as a whole. As a test, we performed an additional analysis in which only those stations selected for the PCR model were used as Z-score predictors, and found the results comparable to those obtained by PCR alone. Because this screening routine is already embedded in the PCR approach, however, the Z-score approach was abandoned for the remainder of the analysis.
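For reference, one common form of the Z-score composite can be sketched as follows. This is a generic standardized-average index under our assumptions, not DWR's or NRCS's exact formulation, and the example records are made up:

```python
import numpy as np

def zscore_index(stations, weights=None):
    """Composite Z-score index: standardize each station's record over
    the calibration period, then average (optionally with weights).
    The index is subsequently regressed against streamflow.  Without a
    screening step, low-skill stations dilute the composite."""
    stations = np.asarray(stations, dtype=float)
    z = (stations - stations.mean(axis=0)) / stations.std(axis=0)
    if weights is None:
        weights = np.full(z.shape[1], 1.0 / z.shape[1])
    return z @ weights

# Two hypothetical stations (rows are years); both follow the same
# pattern, so the composite mirrors their shared standardized anomaly
idx = zscore_index(np.array([[10., 200.], [20., 400.], [30., 600.]]))
print(idx)
```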

[31] Figure 4 presents data reflecting the 10th and 90th percentiles of forecast residuals as a percentage of mean streamflow. These “funnel plots,” so named because of their shape, compare DWR's forecasts with PCR forecasts based on three types of predictor data: the full set of simulated data for each domain, selected surrogate data, and observed data that have been filled using a Z-score regression methodology (see section 3.4). Because all four forecast types are calibrated over the 50 year period from 1956 to 2005, forecasts that employ observed or surrogate snow sensor data were excluded. Residuals from DWR's forecasts are shown beginning in February, which is the first month they are available each water year, and residuals from PCR forecasts based on observed data are shown only for February through May, since those were the only months that permitted the development of complete SWE records. Funnel plots are typically employed by DWR as a measure of model skill (DWR Division of Flood Management, personal communication, 2008), and as such, reflect the entire April–July season rather than a shrinking target period as in Figure 3. PCR forecasts issued after 1 April were therefore adjusted by adding streamflow observed since 1 April to account for the full period. This unequal advantage should be noted when comparing them to DWR's official forecasts in these plots.

Figure 4.

Tenth and 90th percentiles of forecast residuals as a percentage of mean streamflow for a constant April–July target period over water years 1956–2005.

[32] A striking result from Figure 4 is the close correspondence between forecast residuals based on observed data (green lines) and those based on simulated surrogate data at selected grid points (red lines). Despite the differences between the two data sets, the surrogate approach produces forecasts that are remarkably similar to those based on their observed counterparts. This result suggests that forecast skill for months without sufficient observed data can be reasonably reproduced using forecasts based on surrogate (estimated) predictor data. As a check on the interannual variability of modeled SWE, we calculated time series of composite indices that were weighted averages of the surrogate SWE data, and then compared these indices to those based on the observed SWE data used by DWR. Correlations between the two were high, ranging from 0.84 to 0.97 for 1 April SWE, depending on the watershed. We also checked whether a relationship existed between the difference in the modeled and observed indices and DWR's historical forecast errors, but found only weak correlations at best.

[33] As in Figure 3, the funnel plots indicate that most forecasts based on the full set of simulated data offer at least some improvement over PCR forecasts based on either observed or surrogate data. When averaged over both 10th and 90th percentiles, for example, improvements in 1 April forecasts range from about 1%–2% (Yuba and American) to 12%–13% (Kaweah and Tule). In terms of streamflow volume alone, the greatest improvement in 1 April forecasts occurs in the Upper Sacramento, where a difference of ∼11% equates to a reduction in forecast error of about 340 million cubic meters (mcm) (280 thousand acre-feet [taf]). The apparent late-season superiority of PCR forecasts over DWR's official forecasts should be tempered by the incongruity between the two noted above. However, the earlier season PCR residuals are relatively unbiased (i.e., well centered around the zero percentile), in contrast to the official forecast residuals, which appear shifted in the positive direction. This asymmetry is most likely due to the nonlinearity of DWR's equations and will be addressed in greater detail in the next section.

[34] Efforts to incorporate snow sensor data as predictors offered little additional information about the late-season performance of observation-based PCR forecasts. Despite the use of a smaller 25 year calibration period, the sparseness of these data still left most watersheds with incomplete SWE records in the months of June and July. For exceptions such as the American River watershed, results corroborated the above findings that skill metrics for surrogate-based forecasts were reasonable indicators of skill for forecasts based on their observed counterparts (Figure 5).

Figure 5.

Jackknife standard error for PCR forecasts with a shrinking target period and a 25 year calibration period (1981–2005) in the American. Incorporating snow sensor data as predictors allowed observation-based PCR forecasts to be generated in the additional months of December, January, and June.

4.2. Analysis by Water Year Type

[35] In addition to evaluating forecast performance in all years, we assessed performance in wet, normal, and dry year categories (defined by terciles). These groupings were analyzed via another commonly used skill metric, the Nash-Sutcliffe coefficient of efficiency (NS) [Nash and Sutcliffe, 1970]. An NS score of 1 is perfect, 0 indicates skill equal to that of climatology, and values less than 0 denote negative skill.
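The NS coefficient is one minus the ratio of squared forecast error to the squared departure of observations from their mean, so a forecast equal to climatology scores exactly zero. A brief sketch, with made-up values:

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches climatology
    (the observed mean), and negative values are worse than climatology."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

obs = np.array([4.0, 7.0, 2.0, 9.0, 3.0])
print(nash_sutcliffe(obs, obs))                     # 1.0 (perfect)
print(nash_sutcliffe(obs, np.full(5, obs.mean())))  # 0.0 (climatology)
```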

[36] NS scores for each of the 14 watersheds are given in Figure 6 for forecasts issued on 1 April, which is considered the start of the melt season and thus a benchmark for comparison, and in Figure 7 for forecasts issued on 1 May, when final allocations for SWP contractors are issued. Figure 6 (top) shows scores over all water year types and resembles the results in Figures 3 and 4. PCR forecasts based on the full set of simulated data result in the best scores, those based on either observed or surrogate data result in slightly lower scores, while DWR's official forecasts generally do not perform as well. The second plot (wet years only) is similar to the top plot in its skill rankings, although the scores are lower for each watershed. NS scores for normal and dry water years, however, are less coherent. For normal years, DWR's forecasts outperform PCR forecasts in the Sacramento watersheds, while PCR forecasts outperform DWR's forecasts in many of the other watersheds; for dry years, the reverse is more often true. In Figure 7, which shows results for forecasts issued on 1 May, scores are consistently higher. Forecasts in the top two plots of Figure 7 show similar patterns to those in Figure 6, with considerable wet-year improvements in Figures 6 and 7 for the Cosumnes and Tule (watersheds 5 and 13). Conspicuously, however, DWR's forecasts score highest for normal and dry years in Figure 7. Note that 1 May PCR forecasts are for streamflow from May to July while DWR's forecasts are for streamflow from April to July, although the effect of this disparity on NS scores is most likely small.

Figure 6.

1 April Nash-Sutcliffe efficiency scores for years of (from top to bottom) all, wet, normal, and dry water types.

Figure 7.

1 May Nash-Sutcliffe efficiency scores for years of (from top to bottom) all, wet, normal, and dry water types. Note that PCR forecasts are for a shrinking target period while DWR forecasts are for a constant April–July target period.

[37] The variation in performance of the methods in Figures 6 and 7 is largely explained by the variations in mean forecast bias shown in Figures 8 and 9. Note that the biases in these plots reflect forecasts minus observations, as opposed to the residuals presented in Figure 4, which reflect observations minus forecasts in accordance with DWR's methodology. The top three plots in Figure 8, which are all based on PCR forecasts, show similar patterns. Biases calculated over all water years are consistently zero, as expected for PCR models. Biases calculated for wet years are minimal, while those for normal and dry years are slightly larger. In contrast, DWR's forecasts exhibit a markedly negative bias for each water year type, indicating a tendency to underpredict streamflow. This explains some of the differences in NS scores; where DWR's forecasts are more biased than PCR forecasts (all and wet years), their scores are lower, but for the several watersheds in which their normal and dry year forecasts are less biased, their scores are higher. An example of the latter condition occurs for normal year forecasts in the Sacramento watersheds, due in part to residual patterns that are less linear than in other parts of the domain. Interestingly, DWR's 1 May forecasts (Figure 9, bottom) are still negatively biased, but much less so than their 1 April forecasts. PCR forecasts on 1 May are generally also less biased, but for many watersheds DWR's dry year biases are now smaller, thus explaining their better performance in that category.

Figure 8.

1 April forecast bias as a percentage of mean target period streamflow for (from top to bottom) PCR forecasts using observed data, surrogate grid points, all grid points, and DWR's official forecasts.

Figure 9.

1 May forecast bias as a percentage of mean target period streamflow for (from top to bottom) PCR forecasts using observed data, surrogate grid points, all grid points, and DWR's official forecasts. Note that PCR forecasts are for a shrinking target period while DWR forecasts are for a constant April–July target period.

[38] Despite the superior performance of the hybrid forecasts overall, their limitation in the dry year category warrants comment, especially given California's sensitivity to water scarcity. To its credit, the hybrid model's dry year forecasts perform well in the Sacramento region, which is the source of most of the state's water supply. However, these forecasts are generally less skillful in other watersheds, and thus the model may benefit from a calibration strategy that better recognizes different hydroclimate regimes.

4.3. Geospatial Analysis of Predictors

[39] Applying a search algorithm in combination with PCR also represents a systematic method of determining optimal variable combinations for predictive purposes. In the context of a gridded set of candidate predictor data, the approach offers a means for analyzing predictor locations.

[40] To illustrate the potential utility of the method in this role, we assess predictor locations in the Feather and San Joaquin watersheds (Figures 10 and 11, respectively). At the top left of Figures 10 and 11 is a topographic map of the watershed's predictor domain, including offsets described earlier. Figures 10 (middle) and 10 (bottom) and 11 (middle) and 11 (bottom) depict the 1 April SWE and October–March precipitation predictors that were chosen by the hybrid model for the 1 April forecasts. The black circles in Figures 10 (middle) and 11 (middle) represent all of the predictors that appear at least once in any of the top 30 equations, with the size of each circle proportional to the frequency with which the predictor appears in the equations. The red circles in Figures 10 (bottom) and 11 (bottom) represent what we term the “mean water contribution” of each predictor in the final selected equation (i.e., the equation having the median jackknifed standard error of the best 30 models, as described in section 3.3). Each predictor's mean water contribution is the product of its regression coefficient in the selected equation and its mean value over the 50 year period, thus representing the average influence of the predictor on forecasted streamflows.
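As a worked example of the mean water contribution, the rounded Table 4 entries for the Feather's X1 predictor recover the tabulated value to within rounding:

```python
# Mean water contribution = regression coefficient x 50 year mean.
# Values are the rounded Table 4 entries for predictor X1 (Feather);
# the small mismatch with the tabulated 40.01 mcm reflects rounding
# of the coefficient and mean.
coef = 1.42        # regression coefficient in the selected equation
mean_swe = 28.22   # 50 year mean 1 April SWE at the grid cell (mm)
contribution = coef * mean_swe
print(round(contribution, 2))  # 40.07, i.e., ~40 mcm
```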

Figure 10.

(left) 1 April SWE and (right) October–March precipitation predictor locations for 1 April hybrid forecasts in the Feather. See the text for more details.

Figure 11.

(left) 1 April SWE and (right) October–March precipitation predictor locations for 1 April hybrid forecasts in the San Joaquin. See the text for more details.

[41] Underlying the selected predictors are maps of climatology for each predictor type. Grid cells shown in color were used as predictor candidates (i.e., those having a correlation of at least 0.3 with the predictand and at least 10 nonzero values over the 50 year period, as described in section 3.3). Some initial patterns are discernible. For both SWE and precipitation, the most frequently selected predictors tend to occur in clusters, indicating that those locations contain important information for streamflow prediction. The relationship of these clusters to climatology is not obvious; some occur in regions of high average values (such as the SWE predictors in the south-central part of the Feather) and others in regions of lower average values (such as the SWE predictors in the southeastern corner of the Feather). In terms of mean water contribution, predictors in regions of higher climatology generally have more influence, but those in regions of lower climatology are significant as well. Similar patterns are evident for precipitation predictors and for the San Joaquin watershed.
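The candidate-screening rule recalled above (correlation of at least 0.3 with the predictand and at least 10 nonzero values over the record) can be sketched as follows; the grid, flow series, and function name are ours, for illustration only:

```python
import numpy as np

def screen_candidates(grid, flow, min_corr=0.3, min_nonzero=10):
    """Return indices of grid cells admitted as predictor candidates:
    at least `min_corr` correlation with the predictand and at least
    `min_nonzero` nonzero values over the record (cf. section 3.3)."""
    keep = []
    for j in range(grid.shape[1]):
        x = grid[:, j]
        if np.count_nonzero(x) < min_nonzero:
            continue                      # too few snow-bearing years
        if np.std(x) == 0:
            continue                      # constant record, no signal
        if np.corrcoef(x, flow)[0, 1] >= min_corr:
            keep.append(j)
    return keep

# Synthetic 50 year record with three candidate grid cells
flow = np.arange(1.0, 51.0)
grid = np.column_stack([
    2.0 * flow,                        # strongly correlated -> kept
    np.r_[np.ones(5), np.zeros(45)],   # only 5 nonzero years -> dropped
    -flow,                             # negatively correlated -> dropped
])
print(screen_candidates(grid, flow))  # [0]
```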

[42] Additional statistical data for the predictors used in the selected equation are given in Tables 4 and 5. The correlations and CVs presented are revealing. There is generally a direct relationship between climatological average and correlation: predictors with higher means generally have higher correlations as well. On the other hand, climatological average is generally inversely related to CV, with the predictors having the lowest means possessing the highest CVs. The ninth and tenth columns of Tables 4 and 5 show the eigenvector loadings for each predictor in the two principal components deemed significant (the 1 April equations for both the Feather and San Joaquin happened to retain two principal components each). For both watersheds, the first principal component represents the general spatial distribution of water availability in the basin, i.e., with higher weightings in areas with higher climatological averages. The second principal component reflects variability from that pattern, according higher weights to predictors with lower climatological averages and higher CVs. Note that the regression coefficients (described in section 3.1) have been inverted from the principal component transformation to allow expression in terms of the original predictor variables. Thus, with the exception of the y-intercept (which is not shown), the seventh columns of Tables 4 and 5 completely describe the final equation for the 1 April forecast.
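The inversion of regression coefficients from principal component space back to the original predictor variables can be sketched as follows. This is a generic PCR implementation under our assumptions (standardized predictors, components from an SVD), not the NRCS software, and the data are synthetic:

```python
import numpy as np

def pcr_coefficients(X, y, n_pc):
    """Fit a principal components regression and invert the coefficients
    back to the original predictor variables, as done for the seventh
    columns of Tables 4 and 5."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd                        # standardized predictors
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    T = Z @ Vt[:n_pc].T                      # scores on retained components
    A = np.column_stack([np.ones(len(y)), T])
    g, *_ = np.linalg.lstsq(A, y, rcond=None)
    beta_z = Vt[:n_pc].T @ g[1:]             # back to standardized variables
    beta = beta_z / sd                       # back to original units
    intercept = g[0] - mu @ beta
    return intercept, beta

# Synthetic predictors with an exactly linear response
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3)) * [10, 5, 2] + [100, 50, 20]
y = X @ np.array([0.5, 1.0, -0.3]) + 4.0
c, b = pcr_coefficients(X, y, n_pc=3)
print(np.allclose(X @ b + c, y))  # True when all components are retained
```

With fewer retained components, the back-transformed coefficients describe only the variability captured by those components, as in the two-component equations of Tables 4 and 5.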

Table 4. Predictor Statistics for the Selected 1 April Forecast Equation in the Feather^a

ID    Type          Freq.  Corr.  CV    Mean (mm)  Reg. Coef.  Mean Water Contr. (mcm)  PC1    PC2
X1    Apr 1 SWE      1     0.42   2.65    28.22      1.42        40.01                  0.22   −0.04
X2    Apr 1 SWE      7     0.81   0.80   428.40      0.59       250.83                  0.39    0.14
X3    Apr 1 SWE     29     0.82   0.96   313.50      0.69       217.03                  0.39    0.21
X4    Apr 1 SWE     24     0.43   2.80     1.38     22.77        31.35                  0.24   −0.46
X5    Apr 1 SWE     29     0.36   3.98     0.83     19.96        16.59                  0.21   −0.55
X6    Apr 1 SWE     13     0.36   3.99     0.57     29.14        16.49                  0.21   −0.55
X7    Apr 1 SWE     30     0.71   1.45   112.47      1.03       116.15                  0.34    0.03
X8    Oct–Mar prec  30     0.83   0.37   808.40      0.68       552.87                  0.39    0.14
X9    Oct–Mar prec  15     0.83   0.39   668.87      0.77       517.84                  0.38    0.19
X10   Oct–Mar prec  29     0.67   0.59   368.16      0.79       292.32                  0.31    0.26
Explained variance (%)                                                                  49     24

^a Freq., frequency of predictor occurrence in the top 30 equations; Corr., correlation between predictor and predictand; CV, coefficient of variation; Mean, mean climatological value in millimeters; Reg. Coef., regression coefficient in the selected equation; and Mean Water Contr., product of the mean and the regression coefficient, which is equal to the mean “contribution” of the predictor to the streamflow forecast in million cubic meters. The elements of PC1 and PC2 represent the loadings in the eigenvectors for each of the predictors, and explained variance was determined from the respective eigenvalues.
Table 5. Predictor Statistics for the Selected 1 April Forecast Equation in the San Joaquin^a

ID    Type          Freq.  Corr.  CV    Mean (mm)  Reg. Coef.  Mean Water Contr. (mcm)  PC1    PC2
X1    Apr 1 SWE     30     0.41   3.51     2.86     12.14        34.73                  0.10   −0.76
X2    Apr 1 SWE     30     0.96   0.57   681.50      0.25       167.85                  0.31   −0.10
X3    Apr 1 SWE     21     0.92   0.65   634.46      0.21       130.89                  0.30   −0.04
X4    Apr 1 SWE      4     0.88   1.21   134.67      0.36        48.24                  0.30    0.17
X5    Apr 1 SWE     19     0.86   0.86   417.06      0.16        66.94                  0.29    0.16
X6    Apr 1 SWE     30     0.88   0.66   465.78      0.28       129.91                  0.28   −0.09
X7    Apr 1 SWE     30     0.84   1.33    98.02      0.36        35.46                  0.29    0.23
X8    Apr 1 SWE     30     0.66   1.64    59.02      0.11         6.49                  0.24    0.41
X9    Oct–Mar prec  26     0.93   0.40   397.33      0.55       217.34                  0.30   −0.05
X10   Oct–Mar prec  30     0.92   0.40   612.12      0.40       247.36                  0.29   −0.17
X11   Oct–Mar prec  30     0.93   0.39   960.14      0.26       251.46                  0.30   −0.16
X12   Oct–Mar prec  19     0.91   0.40   937.98      0.26       242.94                  0.29   −0.17
X13   Oct–Mar prec  30     0.76   0.63   111.19      0.65        72.04                  0.26    0.18
Explained variance (%)                                                                  76      8

^a Abbreviations are as defined for Table 4.

[43] The results imply that most of the primary skill is derived from predictors with higher climatological averages, but important information is also contained on the “fringes” of these primary areas. This finding is particularly relevant for SWE, which is “transient” at the lower elevation approaches to the high SWE areas. Most ground-based observations of SWE are located in areas of relatively high and persistent (nontransient) snow accumulation, but these may covary more strongly with each other than with less-measured transient areas (e.g., those that appear in the second principal components of this analysis). In addition, many of the best predictors are located outside watershed boundaries, a result noted for other watersheds as well. It will not surprise statistical forecasters that a location need not be “in-basin” to contribute to streamflow predictability. Although this analysis did not formally separate predictor selection, calibration, and validation, and thus some potential for predictor selection bias may exist, jackknifing reduces the risk of this bias by separating validation from predictor selection and calibration.

[44] Also shown on the maps in Figures 10 and 11 are the locations of existing ground-based observing stations. A comparison of these locations with those of the predictors selected by the hybrid model is instructive. While some predictors, such as the SWE predictors in the south-central part of the Feather, are located in areas already well served by observing stations, many, such as the precipitation predictors in the northeast corner of the Feather, are not. This suggests that distributed model simulations coupled with statistical analysis may provide a useful tool for improving or expanding existing networks, an area we leave for future research.

5. Conclusions

[45] By combining physically based predictor variables with statistically based prediction methods, we demonstrated a hybrid approach that leverages the strengths of both in a real-time, operational forecasting framework. Hybrid forecasts attain skill comparable to forecasts based on observed data when a selected number of predictor variables are employed, and superior skill when the full set of simulated data is considered. Although this study focuses only on SWE in order to conform with operational practice, various studies have shown a contribution of soil moisture to streamflow predictability as well [Koster et al., 2010; Maurer et al., 2004; Wood and Lettenmaier, 2008]; thus soil moisture, along with other simulated fields such as runoff, is worth examining as a potential input to this framework in future studies.

[46] By simulating SWE at the highest elevations, a hybrid approach also allows the generation of late-season forecasts when most observing stations are snow-free. This feature of the model holds particular value for the catchments of the San Joaquin and Tulare Lake regions, which contain peaks as high as 4400 m (14,500 ft) and typically experience longer snow persistence. For the San Joaquin, Kings, and Kern specifically, roughly 10% of each watershed lies above the highest snow observations at 3400 m (11,000 ft), indicating sizable areas that are ungauged once the snowline has reached this elevation. Benefits may be most notable for watersheds with relatively small reservoirs that must balance late-season water supply with flood control considerations. The classic example of this scenario is the San Joaquin watershed's Millerton Reservoir, whose capacity of 640 mcm (520 taf) must contend with an average April–July runoff of 1550 mcm (1260 taf) (Table 1). This low storage-to-runoff ratio precludes carryover storage, amplifying the shortfall risks associated with late-season overforecasts that are common in dry years (DWR Division of Flood Management, personal communication, 2011). In wet years, it is not unusual for June–July runoff alone to reach 1300 mcm (1060 taf), for which even a respectable 5% underforecast equates to 65 mcm (53 taf) of unanticipated runoff, or roughly 10% of the reservoir's capacity, at a time when reservoir levels are likely to be high. Similar issues occur in the Merced, Kings, Kaweah, Tule, and Kern watersheds, all of which struggle with low storage-to-runoff ratios and limited downstream capacities to manage snowmelt flooding; the most recent such event occurred in water year 2006 [see, e.g., Martin, 2006].

[47] Beyond its forecasting ability, a hybrid model holds potential as a tool for rationalizing predictor locations. While somewhat unique in the context of water supply forecasting, our geospatial analysis is similar to those that have long been common in the atmospheric sciences [see, e.g., Wallace and Gutzler, 1981]. One could conceive of additional experiments designed specifically to determine the next best location for an observing station within a ground-based network. Thus, whether or not hybrid approaches find use in current operational forecasting systems, statistical analyses of distributed data sets can help us to assess and improve the infrastructure that makes them possible.

[48] Results of the study have been well received by the Hydrology Branch of DWR's Division of Flood Management. Given the well-established use of its current system and the economic and computational expense of a physically based model, however, it is unclear whether DWR would switch to a hybrid approach in the near term (although it could potentially use the real-time simulations currently run by CNRFC, which could also be leveraged in case of ground equipment failure). Perhaps most valued, therefore, is the geospatial capacity of the approach. As DWR moves forward in preserving and expanding its data collection network, it has increasingly been asked to justify the cost and environmental impacts of its observing stations (e.g., disturbing a pristine wilderness area with a data collection tower). A hybrid model provides the agency with a tool to rationalize its geographic choices, not to mention the trickle-down effect of these locations on data quality and forecast improvements (DWR Division of Flood Management, personal communication, 2011).

[49] Opportunities for a hybrid approach exist in other parts of the western United States as well, particularly in snowmelt-dominant watersheds with relatively sparse observation networks. Possible candidates include those with a considerable percentage of their domain in the National Wilderness Preservation System, for which observation equipment may be restricted [Landres et al., 2003]; examples include the Wind River (Wyoming), Flathead River (Montana), and Gunnison River (Colorado) basins (NRCS National Water and Climate Center, personal communication, 2011). Benefits can also be realized in watersheds with a more transient snowpack, as demonstrated for the Cosumnes and Tule in the present study.

[50] Raw operational forecasts are subject to a great deal of scrutiny and adjustment before they are issued, and so the actual impact of a hybrid system is difficult to predict. Yet the advantages of the approach should not be overlooked. Physically based models forced by mostly low elevation temperature and precipitation data simulate SWE with biases, to be sure, but accurately enough to add value to statistical forecasts. This paper presents one means of exploiting that information resource within an operational framework.


[51] This work was funded by NASA grant NNSO6AA78G, USGS grant 06HQGR0190, and the Joint Institute for the Study of the Atmosphere and Ocean (JISAO) under NOAA Cooperative Agreement NA17RJ1232. The authors thank the staff of DWR and NRCS for their generous assistance, with special thanks to David Rizzardo (DWR), Adam Schneider (DWR), and David Garen (NRCS). We also thank Thomas Pagano, Katrina Grantz, and one anonymous reviewer for comments that substantially improved the manuscript. Technical support from the members of the Land Surface Hydrology Research Group at the University of Washington is also gratefully acknowledged.