Incorporating probabilistic seasonal climate forecasts into river management using a risk-based framework



[1] Despite the influence of hydroclimate on river ecosystems, most efforts to date have focused on using climate information to predict streamflow for water supply. However, as water demands intensify and river systems are increasingly stressed, research is needed to explicitly integrate climate into streamflow forecasts that are relevant to river ecosystem management. To this end, we present a five-step risk-based framework: (1) define risk tolerance, (2) develop a streamflow forecast model, (3) generate climate forecast ensembles, (4) estimate streamflow ensembles and associated risk, and (5) manage for climate risk. The framework is successfully demonstrated for an unregulated watershed in southwest Montana, where the combination of recent drought and water withdrawals has made it challenging to maintain flows needed for healthy fisheries. We put forth a generalized linear modeling (GLM) approach to develop a suite of tools that skillfully model decision-relevant low-flow characteristics in terms of climate predictors. Probabilistic precipitation forecasts are used in conjunction with the GLMs, resulting in season-ahead prediction ensembles that provide the full risk profile. These tools are embedded in an end-to-end risk management framework that directly supports proactive fish conservation efforts. Results show that the use of forecasts can be beneficial to planning, especially in wet years, but historical precipitation forecasts are quite conservative (i.e., not very “sharp”). Synthetic forecasts show that a modest “sharpening” can strongly affect risk and improve skill. We emphasize that use in management depends on defining relevant environmental flows and risk tolerance, which requires local stakeholder involvement.

1. Introduction

[2] To sustain healthy river systems, it is generally accepted that a balance must be struck between human demands for water and the amount required for river ecosystems. However, achieving this balance becomes increasingly challenging as water demands intensify and climate becomes more uncertain. As such, additional tools are needed to ensure effective river management across competing uses and in light of climate variability.

[3] To date, most efforts to incorporate climate information into water management have focused on water supply as related to reservoir operations. This focus is understandable given the importance of water supply reliability, and streamflow forecasting has accordingly been the subject of extensive research [e.g., Garen, 1992; Grantz et al., 2005; Devineni et al., 2008; Bracken et al., 2010; see Wood and Lettenmaier, 2006, and references therein], including studies evaluating the benefits of forecasts in water supply management [e.g., Grantz et al., 2007; Golembesky et al., 2009]. However, information provided for supply management is often too coarse (e.g., total seasonal volume) and does not describe the multiple flow regime attributes, such as floods and low flows, that are important in regulating ecological processes [Poff et al., 1997]. This disconnect is problematic, given that river systems are strongly influenced by hydroclimatic variability and are increasingly vulnerable to climate change [Poff et al., 2002]. A better understanding of the influence of climate on ecosystem flow requirements has been recommended for further study [Petts, 2009], and initial efforts using ocean-atmosphere oscillations to determine ecologically sustainable irrigation withdrawals have shown promise [Mondal et al., 2011].

[4] How much streamflow river ecosystems require, however, has been the subject of considerable debate. It is widely recognized that, to sustain ecological integrity in altered rivers, flows need to be managed to mimic the natural flow regime of the unaltered system [Poff et al., 1997]. Yet scientists face the difficulty of clearly translating these hydrologic principles into actionable management guidelines [Poff et al., 2003]. This has encouraged scientists to define environmental flows (e-flows), specified either as acceptable flow ranges or as flow limits that should not be exceeded [Richter et al., 2003]. Methods to determine e-flows have advanced to provide comprehensive flow assessments [see Petts, 2009, and references therein], though standards are still influenced by expert judgment [Petts, 2009] and the practice of using minimum flows for a single species persists [Arthington et al., 2006]. Certain e-flows may be prioritized for different systems; for instance, regulated systems may require managing for multiple aspects of flow variability (e.g., low flows, high flow pulses, and floods), whereas unregulated systems with water withdrawals may only be concerned with low flows in critical periods [Mathews and Richter, 2007].

[5] Regardless of how the e-flow value is determined, it is clear that work is needed to explore how to integrate climate forecasts into streamflow predictions that are relevant to river ecosystem management. To this end, the goal of this work is to present a risk-based framework that can be used with probabilistic climate forecasts for prospective ecosystem management. The framework is presented in five steps: (1) define the situational risk tolerance, (2) develop a forecast model to relate climate information to streamflow characteristics relevant to management, (3) generate seasonal climate forecast ensembles, (4) estimate streamflow ensembles and associated risk, and (5) manage for climate risk through actions aimed at offsetting risk to acceptable levels. Specifically, we put forth a generalized linear modeling (GLM) approach to develop a suite of tools that estimate a variety of flow characteristics. Probabilistic seasonal climate forecasts are incorporated into the GLMs, resulting in predictions that provide the full risk profile. Importantly, these tools are embedded in an end-to-end risk management framework that directly supports proactive planning and decision making. In this paper, we first present the general framework (section 'Risk-Based Framework'), which could be tailored to a specific watershed and management context. Next, we demonstrate the framework for the Big Hole watershed in southwest Montana, which is unregulated (i.e., there is no large amount of storage) but from which water is withdrawn for agricultural purposes. For this system, we provide background (section 'Case Study Overview') and results (section 'Case Study Results'); specifically, we use observed snowpack with precipitation forecasts to predict summer low-flow characteristics (including the number of days below a threshold and threshold exceedance likelihoods) that are relevant to ongoing fish conservation efforts.

2. Risk-Based Framework

2.1. Step 1: Define Risk Tolerance

[6] Risk management is a viable technique to guide actions in light of climate risk [Jones and Preston, 2011; Yohe and Leichenko, 2010]. However, to adopt a risk-based approach, several factors need to be defined a priori.

[7] A first step for ecological management is to define the e-flows, or acceptable/critical thresholds (such as minimum flows); such thresholds are also a key component of any risk-based approach [Jones, 2001]. Although defining e-flows is challenging, guidance exists to promote consensus-driven processes that focus on ongoing collaboration among scientists, managers, and other stakeholders [Richter et al., 2006] and that foster sustainable management practices [Poff et al., 2003; Richter et al., 2003].

[8] Because a probabilistic approach is being adopted for flow forecasts, an acceptable risk must be determined in addition to the e-flows. This is defined as the probability of exceeding the e-flow threshold (or of falling outside the acceptable range). If the predicted risk is higher than the acceptable risk, then action is warranted to mitigate that risk; otherwise, no action needs to be taken. Clearly, this requires an additional step for stakeholder input and iteration, but it also provides a systematic way to guide decision making under climate uncertainty. In this paper, acceptable thresholds and risks are explored for the Montana watershed in section 'Defining Risk Tolerance'.

2.2. Step 2: Develop Forecast Model

2.2.1. Model Development

[9] Generalized linear modeling (GLM) is the statistical framework used to develop a suite of tools for predicting responses (e.g., e-flows) that may have different attributes (continuous, discrete, categorical, etc.). In a GLM, the response variable, Y, is assumed to follow a distribution from the exponential family, and a link function relates the expected value of Y to a linear combination of the predictors [McCullagh and Nelder, 1989]:

$$ G\big(E(Y)\big) = \beta^{T} X + e \qquad (1) $$

where G(·) is the link function, E(Y) is the expected value of the response variable, $\beta^{T}$ is the transposed vector of fitted model parameters, X is the predictor matrix, and e is the error term. GLM can be used for a variety of response variables, and identifying the appropriate link function depends on the assumed distribution of Y. For instance, with count data, such as the number of days flow is below a given threshold, the Poisson distribution with the logarithm link function is the appropriate choice. For some management contexts, a categorical prediction, such as threshold exceedance likelihood, is useful. In this case, the response variable is binary (0 or 1), and the binomial distribution with the logit link function is appropriate. If more than two categories are relevant for decision making, such as the likelihood that a flow will be within a given range, the multinomial logit, an extension of logistic regression, is used. Here, we explore these three models: the Poisson, logistic, and multinomial. The reader is referred to McCullagh and Nelder [1989] for details on distributions, link functions, and parameter estimation.
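As a minimal sketch of how a GLM with a canonical link is fitted, the Python below implements iteratively reweighted least squares (IRLS), the standard estimation method for GLMs [McCullagh and Nelder, 1989] and the general approach behind R's glm. It is an illustrative stand-in rather than the authors' code: it covers only the Poisson (log link) and binomial (logit link) cases, and the helper names (`solve`, `fit_glm`, `FAMILIES`) and starting values are assumptions made for illustration.

```python
import math

def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# family -> (link G, inverse link, variance function, starting mean from y)
FAMILIES = {
    "poisson": (math.log, math.exp, lambda mu: mu, lambda y: y + 0.1),
    "binomial": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
        lambda mu: mu * (1.0 - mu),
        lambda y: (y + 0.5) / 2.0,
    ),
}

def fit_glm(X, y, family, max_iter=100, tol=1e-10):
    """Fit G(E(Y)) = X beta by IRLS for a canonical link.

    X: list of rows, each beginning with 1.0 for the intercept.
    y: counts (poisson) or 0/1 outcomes or proportions (binomial).
    """
    link, inv_link, variance, start = FAMILIES[family]
    p = len(X[0])
    mu = [start(yi) for yi in y]
    eta = [link(m) for m in mu]
    beta = None
    for _ in range(max_iter):
        w = [variance(m) for m in mu]  # working weights; equal Var(mu) for canonical links
        z = [e + (yi - m) / wi for e, yi, m, wi in zip(eta, y, mu, w)]  # working response
        xtwx = [[sum(wi * row[j] * row[k] for row, wi in zip(X, w)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(wi * row[j] * zi for row, wi, zi in zip(X, w, z)) for j in range(p)]
        new_beta = solve(xtwx, xtwz)  # weighted least-squares update
        eta = [sum(b * xj for b, xj in zip(new_beta, row)) for row in X]
        mu = [inv_link(e) for e in eta]
        if beta is not None and max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta
```

For a D60-style count model, each row of X would be `[1.0, swe, pcp]`; a binary response (for example, 1 if D60 ≥ 60) would use the binomial family. The multinomial model needs a different likelihood and is not covered by this sketch.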

2.2.2. Model Evaluation

[10] The performance of the categorical forecast using logistic regression is evaluated using the Brier Skill Score (BSS) [Wilks, 1995]:

$$ BSS = 1 - \frac{BS_{Forecast}}{BS_{Clim}} \qquad (2) $$

where $BS_{Forecast}$ is the Brier Score (BS) for the forecast, defined as:

$$ BS = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - o_i\right)^{2} \qquad (3) $$

where $p_i$ is the forecast probability for year i, $o_i$ is the observed outcome ($o_i = 1$ if the observed flow exceeds the threshold, 0 otherwise), and N is the sample size. $BS_{Clim}$ is the BS for climatology, which is also calculated from equation (3) but uses the climatological probability for every year, i.e., $p_i = 0.50$ when there are two categories (such as above or below a threshold). BSS values range from negative infinity to 1. Compared to climatology, BSS < 0 indicates that the forecast has less skill, BSS = 0 indicates equal skill, and BSS > 0 indicates more skill, with BSS = 1 being a “perfect” forecast.
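Equations (2) and (3) translate directly into code. The sketch below assumes forecasts are given as probabilities of the event and outcomes as 0/1 values; the function names are illustrative.

```python
def brier_score(probs, outcomes):
    """Equation (3): mean squared difference between forecast probability and 0/1 outcome."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes, clim_prob=0.5):
    """Equation (2): skill relative to a constant climatological probability."""
    bs_forecast = brier_score(probs, outcomes)
    bs_clim = brier_score([clim_prob] * len(outcomes), outcomes)
    return 1.0 - bs_forecast / bs_clim
```

For example, forecasts of 0.9, 0.2, 0.7, and 0.1 for outcomes 1, 0, 1, and 0 give BS = 0.0375 against a climatological BS of 0.25, so BSS = 0.85.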

[11] To evaluate the multinomial logit forecast performance, the ranked probability skill score (RPSS), an extension of the BSS to multiple categories, is used [Wilks, 1995]:

$$ RPSS = 1 - \frac{RPS_{Forecast}}{RPS_{Clim}} \qquad (4) $$

[12] where $RPS_{Forecast}$ is defined as:

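$$ RPS = \frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{k}\left(\sum_{j=1}^{m} p_{i,j} - \sum_{j=1}^{m} o_{i,j}\right)^{2} \qquad (5) $$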

where, for a categorical forecast in a given year i, $\mathbf{p}_i = (p_{i,1}, p_{i,2}, \ldots, p_{i,k})$ is the vector of forecast category probabilities, $o_{i,j}$ is 1 if category j was observed in year i and 0 otherwise, k is the number of categories (here, three), and N is the number of years. For two categories, this collapses to the BS. $RPS_{Clim}$ is also calculated from equation (5), but uses the climatological probabilities for every year, i.e., 0.33 when there are three categories. The RPSS is interpreted in the same way as the BSS. The skill of the models can be evaluated using all of the fitted data, as well as in cross-validation mode (i.e., systematically leaving out and then predicting each observation).
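A sketch of equations (4) and (5), assuming each forecast is a list of category probabilities ordered from lowest to highest category and each observation is the index of the observed category. It uses the cumulative form of the RPS; any constant normalization factor would cancel in the RPSS ratio. Function names are illustrative.

```python
from itertools import accumulate

def rps(forecasts, observed):
    """Equation (5): mean ranked probability score.

    forecasts: per-year lists of category probabilities, lowest category first.
    observed: per-year index of the observed category (0 = lowest).
    """
    total = 0.0
    for probs, obs in zip(forecasts, observed):
        outcome = [1.0 if j == obs else 0.0 for j in range(len(probs))]  # one-hot
        total += sum((cp - co) ** 2 for cp, co in zip(accumulate(probs), accumulate(outcome)))
    return total / len(forecasts)

def rpss(forecasts, observed):
    """Equation (4): skill relative to equal climatological probabilities (1/k each)."""
    k = len(forecasts[0])
    climatology = [[1.0 / k] * k for _ in forecasts]
    return 1.0 - rps(forecasts, observed) / rps(climatology, observed)
```

With two categories, `rps` reproduces the Brier score of the first-category probability, matching the collapse to the BS noted in the text.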

2.3. Step 3: Generate Seasonal Climate Forecast Ensembles

[13] Seasonal temperature and precipitation forecasts are made available by several agencies around the world, including the International Research Institute for Climate and Society (IRI). The IRI seasonal climate forecasts are provided globally up to 6 months in advance in 3-month shifting windows. The forecasts are probabilistic and indicate locations where there is an increased likelihood of above- or below-normal conditions, or equal chances where the models do not indicate enough skill to change the outlook. Figure 1 shows a sample precipitation forecast for North America, issued in April 2011, for the May-June-July 2011 season. Forecasts are given in an A/N/B format, where A indicates the likelihood of above-normal precipitation, N the likelihood of near-normal precipitation, and B the likelihood of below-normal precipitation; the category boundaries are based on climatological terciles. Thus, an “equal chances” forecast is represented as A/N/B = 33/33/33, meaning there is an equal probability (33%) for precipitation to be above normal, near normal, or below normal. White coloring indicates places with equal chances, but other areas show increased likelihoods; for example, the south-central US shows an increased likelihood of below-normal precipitation, and the northeast coast shows an increased likelihood of above-normal precipitation.

Figure 1.

IRI precipitation forecast for May-June-July 2011.

[14] To use these forecasts, we adopt a resampling method put forth in Towler et al. [2010]. The method creates ensemble forecasts by sampling with replacement from the historical climate data (here, precipitation). In short, the historical precipitation values are sorted in ascending order, and the bottom third is designated as the below-normal pool, the middle third as the near-normal pool, and the top third as the above-normal pool. The A/N/B values from the forecast are used as weights in resampling; that is, they determine how many samples are taken from each third. For instance, an A/N/B = 33/33/33 forecast would prompt equal sampling from each pool. For A/N/B = 50/30/20, 50% of the ensemble would be drawn from the above-normal pool, 30% from the near-normal pool, and 20% from the below-normal pool. This generates an ensemble that reflects the seasonal forecast.
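The resampling scheme can be sketched as follows. This is a reading of the description above, not Towler et al.'s [2010] code: how leftover values are assigned when the record length is not divisible by three, and how fractional member counts are rounded, are assumptions, and the function names are hypothetical.

```python
import random

def tercile_pools(history):
    """Sort historical values and split them into below-, near-, and above-normal pools."""
    s = sorted(history)
    n = len(s)
    return s[: n // 3], s[n // 3 : 2 * n // 3], s[2 * n // 3 :]

def member_counts(weights, n_members):
    """Turn forecast weights (below, near, above) into integer member counts.

    Assumption: rounding leftovers are absorbed by the largest-weight category.
    """
    total = sum(weights)
    counts = [round(n_members * w / total) for w in weights]
    counts[weights.index(max(weights))] += n_members - sum(counts)
    return counts

def resample_ensemble(history, above, near, below, n_members=999, seed=None):
    """Weighted resampling (with replacement) that reflects an A/N/B forecast."""
    rng = random.Random(seed)
    ensemble = []
    pools = tercile_pools(history)
    for pool, count in zip(pools, member_counts([below, near, above], n_members)):
        ensemble.extend(rng.choices(pool, k=count))  # sample within the tercile pool
    rng.shuffle(ensemble)  # member order carries no meaning
    return ensemble
```

With 48 years of record and `above=50, near=30, below=20`, a 999-member ensemble contains 499, 300, and 200 draws from the above-, near-, and below-normal pools, respectively.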

2.4. Step 4: Estimate Risk

[15] The climate forecast ensembles (Step 3) can be used in conjunction with the appropriate GLM (Step 2) to estimate ensembles of the response variable of interest (e.g., streamflow characteristics). These ensembles can be examined to quantify exceedance probabilities, or the risk profile, associated with a given forecast.
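Step 4 reduces to pushing each ensemble member through the fitted model and counting exceedances. A minimal sketch, in which `model` is any callable standing in for a fitted GLM (a hypothetical placeholder, not one of the paper's fitted models):

```python
from bisect import bisect_right

def response_ensemble(climate_ensemble, model):
    """Pass each climate ensemble member through a fitted model from Step 2."""
    return [model(x) for x in climate_ensemble]

def exceedance_probability(values, acceptable):
    """Estimated risk: fraction of ensemble members above the acceptable value."""
    return sum(v > acceptable for v in values) / len(values)

def exceedance_curve(values):
    """Risk profile: (value, P[X > value]) for each distinct ensemble value."""
    s = sorted(values)
    n = len(s)
    return [(v, (n - bisect_right(s, v)) / n) for v in sorted(set(s))]
```

The exceedance curve is simply 1 minus the empirical CDF, so either form can be used to read off the risk associated with a given acceptable value.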

2.5. Step 5: Manage for Climate Risk

[16] Once the forecast has been issued, the first question is whether or not to take anticipatory action. This requires revisiting Step 1 to see whether the forecast risk exceeds the predefined acceptable risk. Perhaps a more difficult question is what actions can and should be taken. This requires stakeholder participation and is context specific, though general guidance exists [see Richter et al., 2003]. Further, different management strategies can be experimentally tested and evaluated in an adaptive management context [Poff et al., 2003; Richter et al., 2003].

3. Case Study Overview

3.1. Data

[17] The case study used to demonstrate the framework is the Big Hole River, located in southwestern Montana (Figure 2). Several data sets are used in this effort:

Figure 2.

Map of the Big Hole River Watershed in southwestern Montana.

[18] 1. Daily streamflow records from 1988 to 2011 for U.S. Geological Survey (USGS) station 06024450 at Wisdom (Big Hole River below Big Lake Creek at Wisdom, Montana), available from the USGS National Water Information System.

[19] 2. Snow water equivalent (SWE) measurements for January to June from 1964 to 2011 for the Bloody Dick Snotel station (site number 355), Montana, available from the U.S. Department of Agriculture (USDA) Natural Resources Conservation Service (NRCS) National Water and Climate Center.

[20] 3. Monthly average precipitation and temperature data for Montana's southwestern region (Division 2) from 1964 to 2011, obtained from the U.S. climate division data set available through the National Oceanic and Atmospheric Administration's (NOAA) National Climatic Data Center (NCDC).

3.2. Background

[21] The focus area for this study, the Big Hole River watershed, is a largely pristine watershed in the upper Missouri River basin that drains approximately 7250 km² (2800 mi²); the river itself runs 247 km (153 mi). The River's surface water hydrograph peaks in June and tapers off to base flow conditions in summer, and it is influenced by three main climate drivers (Figure 3). Snowpack accumulates over the winter and typically reaches a maximum in April. Subsequently, the snowpack begins to melt and precipitation ramps up, with the wettest months occurring in May and June. This is followed by the hottest temperatures in July and August. Southwestern Montana also experiences periodic drought episodes, including a recent drought that began in the 2000s and persisted until around 2008.

Figure 3.

Monthly climate for the case study area, including snow water equivalent on the 1st of the month at the Bloody Dick Snotel (SWE), average monthly precipitation (Precip) and temperature (Temp) for Montana Division 2, and average monthly flow at the USGS Wisdom gage (Flow); data have been normalized (average monthly values divided by the maximum monthly average).

[22] The River is unregulated (i.e., there is no large amount of storage), but water is extracted to support an agricultural economy. Land use around the Big Hole is primarily cattle and hay operations, and water is diverted from the Big Hole River and its tributaries starting in late April or May to irrigate grass hay and pasture [Abdo and Roberts, 2008]. The demands on the system intensify during drought conditions. These drought-period pressures are compounded by irrigation needs that are already very high because of several factors: overallocated water rights, inefficient flood irrigation systems, and an increase in pasture grazing that requires a longer irrigation season [U.S. Fish and Wildlife Service (FWS), 2006].

[23] In addition, the Big Hole River is important for recreation and tourism. It is a nationally recognized trout stream, making it a popular fly fishing destination. The River also provides habitat for the last remaining population of fluvial (river-dwelling) Arctic grayling (Thymallus arcticus) in the lower 48 states. The combination of recent drought periods and water withdrawals has exerted increased pressure on the river, making it challenging to maintain flows needed for healthy fisheries [Abdo and Roberts, 2008]. In 1994, the US Fish and Wildlife Service added the grayling to the list of candidate species for consideration as threatened or endangered under the Endangered Species Act (ESA). The grayling was subsequently removed from the candidate list in 2007 and then re-added in 2010 [US Department of Interior (USDI), 2010].

[24] These competing water demands, compounded by recent drought periods, have raised concerns about the sustainability of the Big Hole River system. To address these concerns, a local watershed group, the Big Hole Watershed Committee (BHWC), and the Montana Department of Fish, Wildlife, and Parks (FWP) came together and developed a Drought Plan for the Big Hole in 1997 [Big Hole Watershed Committee (BHWC), 1997]. The Drought Plan's stated purpose is “to mitigate the effects of low stream flows and lethal water temperatures for fisheries (particularly fluvial Arctic grayling) through a voluntary effort among agriculture, municipalities, business, conservation groups, anglers, and affected government agencies” [BHWC, 1997]. Central to this effort, the Drought Plan identifies biologically based, decision-relevant thresholds for flows (based on Department of Fish, Wildlife, and Parks (FWP) [1989]) and temperatures (based on Lohr et al. [1996]) for several gaged locations along the river. The Wisdom area (Figure 2) has been identified as a key location for historic and present populations of grayling [FWS, 2006]; hence, this study focuses on the thresholds at the USGS gage at Wisdom, Montana. In the Drought Plan, several flow triggers at Wisdom have been identified that correspond to specific actions. For instance, when flows at the gage drop below 60 cubic feet per second (cfs), officials increase awareness in the community by presenting data and formulating options. When flows drop below 40 cfs, voluntary conservation efforts are implemented. At 20 cfs, the River is closed to fishing by Montana FWP.

[25] Our analysis concentrates on the 60 cfs threshold, which is the most conservative threshold for summer, defined in the Drought Plan as 1 July to 30 September. However, we note that the approach is flexible and could be applied to other identified trigger thresholds in the Drought Plan. Historically, the number of summer days that river flow was below the 60 cfs threshold shows considerable variability, though no discernible trend (Figure 4). Henceforth, the number of days below the threshold is referred to as “D60.” For example, in 1997, D60 = 0, i.e., no days were below the threshold, but in 1988, all 92 summer days were below the threshold. From the observed record, the average D60 was 57 days. Although other flow attributes are of interest (e.g., spring high flows), the main focus of the Drought Plan is on number of low flow days, given the importance in maintaining adequate habitat for fisheries (particularly grayling).
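The D60 statistic itself is straightforward to compute from the daily record. The sketch below assumes daily flows are held in a dict keyed by `datetime.date`, and counts a day as below the threshold when flow is strictly less than 60 cfs; both are assumptions about data handling that the text does not specify, and the function names are illustrative.

```python
from datetime import date, timedelta

def summer_days(year):
    """Dates from 1 July to 30 September inclusive (92 days), the Drought Plan summer."""
    start, end = date(year, 7, 1), date(year, 9, 30)
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def days_below(daily_flow_cfs, year, threshold_cfs=60.0):
    """Number of summer days with flow below the threshold (D60 for 60 cfs).

    daily_flow_cfs maps datetime.date -> mean daily flow (cfs); a missing day
    raises KeyError instead of being silently skipped.
    """
    return sum(daily_flow_cfs[d] < threshold_cfs for d in summer_days(year))
```

Passing `threshold_cfs=40.0` or `20.0` gives the analogous counts for the other Drought Plan triggers.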

Figure 4.

At the USGS Wisdom gage, the number of summer days streamflow was below the 60 cubic feet per second (cfs) trigger (D60).

[26] As part of the Drought Plan, Montana FWP, Montana Department of Natural Resources and Conservation (DNRC), and the United States Natural Resources Conservation Service (NRCS) are tasked with providing accurate and timely information regarding stream conditions and snowpack levels throughout the year. This information is germane, but it does not directly predict the risk of violating the Drought Plan triggers. As such, the aim of this approach is to complement these efforts by using seasonal climate forecasts to predict D60 characteristics to inform their early-season planning efforts.

4. Case Study Results

4.1. Defining Risk Tolerance

[27] The Drought Plan is used to identify flow triggers, or e-flows, that have been agreed upon by local stakeholders and are relevant to the grayling; 60 cfs is used in this paper. As mentioned, a priority of the Plan is the number of days the river goes below the threshold (D60). Lower D60 values are preferred, but determining an exact acceptable D60 value is difficult (personal communication, Emma Cayer, 19 October 2011). Instead, in this paper we looked at several acceptable D60 values that were of interest: 23, 46, 60, and 75 days.

[28] Here, acceptable risk refers to the probability of exceeding the acceptable D60 value. If the calculated risk is higher than the acceptable risk, then action is warranted to mitigate that risk; otherwise, no action needs to be taken. For demonstration purposes, we designed the following ruleset with four risk classifications and associated probabilities:

[29] No Risk: <33%

[30] Risk Averse: 33–50%

[31] Risk Neutral: 51–66%

[32] Risky: >66%

[33] For ease of classification, we have designated each of these categories with distinct cutoff points; however, we point out that the increasing risk levels exist on a continuum. For instance, a No Risk manager would be comfortable with forecasted exceedance probabilities between 0 and 33%, but above 33% would initiate action. Similarly, a manager who is Risk Averse would be comfortable with exceedance probabilities from 0 to 50%, and a Risk Neutral manager is willing to accept exceedance probabilities from 0 to 66%, and so forth. Big Hole managers plan to present the 1 May forecast to inform the community about whether or not action will be needed in the coming irrigation season, such as contributing water to the River (see section 'Managing for Climate Risk'). As such, to avoid “crying wolf,” and given that there is still time to take action later in the season if needed, the “Risk Averse” category is appropriate for their context. However, all of the risk levels will be discussed in the results.
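The ruleset can be encoded as a lookup that also reports which manager types would act. The stated ranges leave small gaps (for example, between 50% and 51%); the boundary handling below (strictly less than 0.33, then up to and including 0.50 and 0.66) is an assumption, and the names are illustrative.

```python
# Risk categories ordered from least to most tolerant of exceedance.
CATEGORIES = ["No Risk", "Risk Averse", "Risk Neutral", "Risky"]

def classify_risk(p):
    """Map a forecast exceedance probability (0 to 1) to a risk category."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p < 0.33:
        return "No Risk"
    if p <= 0.50:
        return "Risk Averse"
    if p <= 0.66:
        return "Risk Neutral"
    return "Risky"

def managers_taking_action(p):
    """Managers less tolerant than the forecast's category would act."""
    return CATEGORIES[: CATEGORIES.index(classify_risk(p))]
```

With these boundaries, a 48% risk is classed Risk Averse and only the No Risk manager acts, while a 64% risk is classed Risk Neutral and both the No Risk and Risk Averse managers act.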

4.2. Developing the D60 Forecast Model

4.2.1. Model Development

[34] To directly model D60, we used the Poisson distribution. To find the best set of predictors, we explored correlations between D60 and the three important climate drivers (SWE, precipitation, and temperature; Figure 3). For SWE, we examined the traditional 1 April SWE (SWEApr) and 1 May SWE (SWEMay); for precipitation and temperature, we examined monthly data averaged over windows of one to three consecutive months (results not shown). The strongest correlation, ρ, was found between average May to July precipitation (PCPMJJ) and D60 (ρ = −0.77). D60 was more weakly correlated with SWEMay (ρ = −0.48) and SWEApr (ρ = −0.34). Average June to August temperature (TMPJJA) had the highest correlation for temperature (ρ = −0.71). We conducted preliminary analyses with these top-correlating variables in two-variable predictor combinations and used the Akaike Information Criterion (AIC) [Akaike, 1974] to find the best predictor set (results not shown). We found that SWEMay and PCPMJJ, henceforth referred to as SWE and PCP, produced the best model.
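The screening and selection step can be sketched with two small helpers: a correlation ranking and the AIC of a Poisson fit (AIC = 2k - 2 log L). The text reports ρ without naming the correlation type; Pearson correlation is assumed here, and the helper names are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_predictors(candidates, response):
    """Rank candidate predictor series (name -> values) by |correlation| with the response."""
    scores = {name: pearson(values, response) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

def poisson_aic(y, mu, n_params):
    """AIC = 2k - 2 log L, using the Poisson log-likelihood of counts y given fitted means mu."""
    loglik = sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))
    return 2 * n_params - 2 * loglik
```

In the case study, `candidates` would hold series such as SWEApr, SWEMay, PCPMJJ, and TMPJJA, and the AIC would be compared across the two-variable Poisson fits; lower AIC indicates the preferred model.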

[35] Using the glm command in the free statistical software R, we developed the Poisson model:

$$ \log(D60) = \beta_0 + \beta_{SWE}\,SWE + \beta_{PCP}\,PCP \qquad (6) $$

[36] This resulted in an R² value of 0.75 and a cross-validated R² value of 0.68 (Table 1); diagnostics showed that the underlying linear model assumptions were reasonable (figures not shown). This explanatory model uses observed SWE and May–July precipitation forecasts (both obtained on 1 May). We note that a more “traditional” model that does not include prospective forecasts, and includes only observed 1 May SWE, has much lower skill (R² = 0.21).

Table 1. Best-Fit Generalized Linear Model Coefficients and Fit Statistics(a)

| | Poisson | Logistic | Multinomial (Proportional Odds) |
|---|---|---|---|
| Response | log(D60) | logit(P[D60 ≥ 60]) | logit(P[D60 ≥ 46]), logit(P[D60 ≥ 75]) |
| R² (xval) | 0.75 (0.68) | | |
| BSS (xval) | | 0.70 (0.55) | |
| RPSS (xval) | | | 0.42 (0.23) |

(a) D60 is the number of days below the flow threshold; SWEMay is the May snow water equivalent; PCPmjj is average monthly precipitation for May–July; xval = cross-validated; BSS = Brier Skill Score; RPSS = ranked probability skill score.

[37] The D60 from equation (6) is the mean (i.e., E(Y)) of the assumed Poisson distribution and is thus constrained to be greater than or equal to 0. We manually constrain the upper limit to 92 days (the total number of days in summer). The coefficients of the model can be interpreted as the expected difference in D60 (on the logarithmic scale) for each additional unit of predictor (Gelman and Hill [2009], p. 111). Thus, for 1 May SWE, the expected reduction in D60 is $1 - e^{\beta_{SWE}}$, or 5.4% for every inch of SWE added. For PCP, the expected reduction is $1 - e^{\beta_{PCP}}$, or 63% for every inch of average precipitation added over the season. This relationship is further illustrated in a contour plot of D60 against SWE and PCP (Figure 5). As expected, when both predictors are below (above) average, D60 is correspondingly high (low). Notably, even when 1 May SWE is above average, D60 is typically higher if PCP is below average. This makes sense in light of the watershed climate, where the wettest months are May and June (Figure 3), and it shows how important it is to have a 1 May forecast that includes a precipitation outlook for prospective planning. Big Hole managers relying only on the traditional 1 May snowpack for decision making may underestimate the risk of a high number of low-flow days in the upcoming season.
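Because the Poisson model uses a log link, a coefficient β corresponds to a multiplicative change of e^β in expected D60 per unit of predictor. The helpers below convert in both directions; applied to the reported reductions they give back-calculated coefficients only, not the fitted values in Table 1. Function names are illustrative.

```python
import math

def percent_change_per_unit(beta):
    """Percent change in a log-link mean for a one-unit increase in the predictor."""
    return 100.0 * (math.exp(beta) - 1.0)

def coefficient_from_percent_change(pct):
    """Log-scale coefficient implied by a stated percent change per unit."""
    return math.log(1.0 + pct / 100.0)
```

For example, a 5.4% reduction per inch corresponds to β = ln(0.946) ≈ -0.0555, and a 63% reduction to β = ln(0.37) ≈ -0.994. Effects of the two predictors multiply, so their log-scale contributions add.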

Figure 5.

Contour plot of number of days below threshold (D60) in terms of average May to July precipitation (PCP) and 1 May snow water equivalent (SWE).

[38] The Poisson model offers a method to directly estimate the D60 value. Further, when used with the precipitation forecast ensembles (see upcoming section 'Generating Precipitation Forecast Ensembles'), it produces D60 ensembles, which can be used to examine different exceedance probabilities. As such, it is a flexible approach for exploring risk management strategies, especially when both the acceptable D60 and acceptable risk values are still being investigated. However, the logistic and multinomial models can be used to directly model exceedance probabilities. They provide a coarser prediction, but are appealing in contexts that use coarse predictor information, as is the case with seasonal precipitation forecasts. Further, the categorical approaches are useful in cases where the acceptable D60 value is rigidly set. Although the Poisson model is appropriate for this case study, we also show the logistic and multinomial models for completeness and to demonstrate the flexibility of the GLM approach (Table 1). We used glm in R to fit the logistic equation, selecting 60 days as the acceptable D60 (Table 1), and we used the vglm command in R's VGAM (Vector Generalized Linear and Additive Models) package to fit a multinomial logit proportional odds model with three categories: (i) below 46 days, (ii) between 46 and 75 days, and (iii) above 75 days (Table 1). For the latter model, diagnostics showed that the underlying parallel-lines (proportional odds) assumption was not violated (results not shown).
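For the proportional odds model, the two cumulative logits in Table 1 share slopes and differ only in their intercepts, so the three category probabilities follow by differencing. A sketch with hypothetical argument names; the coefficients would come from the fitted model and are not reproduced here.

```python
import math

def logistic(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))

def proportional_odds_probs(intercept_46, intercept_75, slopes, predictors):
    """Category probabilities from two cumulative logits with shared (parallel) slopes.

    logit P(D60 >= 46) = intercept_46 + sum(slopes * predictors)
    logit P(D60 >= 75) = intercept_75 + sum(slopes * predictors)
    """
    if intercept_75 > intercept_46:
        raise ValueError("P(D60 >= 75) cannot exceed P(D60 >= 46)")
    lin = sum(b * x for b, x in zip(slopes, predictors))
    p_ge_46 = logistic(intercept_46 + lin)
    p_ge_75 = logistic(intercept_75 + lin)
    return {"below 46": 1.0 - p_ge_46, "46 to 75": p_ge_46 - p_ge_75, "75 and above": p_ge_75}
```

The intercept check enforces the ordering that keeps the middle-category probability non-negative for any predictor values.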

[39] The categorical models can be evaluated using the skill scores described in section 'Model Evaluation'. For the logistic model, the BSS was 0.70 and the cross-validated BSS was 0.55, indicating that the model exhibited substantial skill over simply assuming the climatological probabilities (i.e., a 50% chance of being in each category). For the multinomial model, the RPSS was 0.42 and the cross-validated RPSS was 0.23, also showing skill over assuming the climatological probabilities (i.e., a 33% chance of being in each category).

4.3. Generating Precipitation Forecast Ensembles

[40] Several different A/N/B forecasts were considered. First, 10 years (2002–2011) of historical May to July precipitation forecasts issued by the IRI for southwestern Montana were used. However, most of these are very conservative—either equal chances (6 out of 10 seasons) or barely deviating from equal chances. That is, for the three seasons that were forecasted to be dry, the forecast was 25/35/40, and the one wet season was 40/35/25. As such, we also compared the equal chances scenario (A/N/B =33/33/33) with synthetic scenarios to evaluate forecast sensitivity; we used A/N/B = 50/30/20 as the “wet” scenario and A/N/B = 20/30/50 as the “dry” scenario (see section 'Step 3: Generate Seasonal Climate Forecast Ensembles' for scenario development). For each forecast scenario (real or synthetic), an ensemble was created with 999 members.

[41] Although the GLMs were fitted with data from 1988 to 2011, PCP data from 1964 to 2011 were used in the resampling. Including precipitation observations from the last 48 years, rather than just 24 years, ensured a more varied precipitation ensemble. However, average PCP decreased by 4.8% between 1964–1987 and 1988–2011. Further, in the upcoming sensitivity analysis, we set 1 May SWE at its median value (11 in.), as calculated from the 1964 to 2011 period; however, we note that average 1 May snowpack decreased by 14.5% from 1964–1987 to 1988–2011. This indicates that the precipitation forecast ensembles and the median SWE value used in the sensitivity analysis will result in an underestimation of the risk. That is, the mean of a given forecast precipitation ensemble will be higher than it would have been if only the last 24 years had been used, leading to lower estimates of D60. Similarly, the median SWE value would have been lower if only the last 24 years had been used, so using the longer record also leads to lower estimates of D60.

4.4. Estimating D60

4.4.1. Risk Sensitivity to Synthetic Forecasts

[42] Fixing 1 May SWE at the median (11 in.), we used the synthetic seasonal precipitation forecasts (i.e., wet, equal chances, and dry) to examine the sensitivity of the D60 ensembles. For the Poisson model, smoothed empirical cumulative distribution functions (CDFs) for each precipitation case show that, as expected, a wetter (drier) precipitation forecast shifts the curve up (down) relative to the equal chances forecast (Figure 6; note that the probability of exceedance is 1 minus the CDF value). The curves can be read in one of two ways: starting either with an acceptable risk or with an acceptable D60. In the former case, the dashed horizontal grey line in Figure 6 marks an exceedance probability of 0.50; this corresponds to assuming an acceptable risk of 50% and then checking whether the associated number of days is acceptable. However, acceptable risk may change over time, so it may be more intuitive to start by selecting an acceptable D60. Setting the acceptable D60 at 46 days (vertical dashed line), the associated risk can be estimated: it is 48% for equal chances, 64% for the dry forecast, and 33% for the wet forecast. Results for additional acceptable D60 values are shown in Figure 7, coded to correspond to the defined risk ruleset. For instance, at 46 days the equal chances and wet forecasts fall into the Risk Averse category, indicating that only a No Risk manager would take action, and that Risk Averse, Risk Neutral, and Risky managers would not. At 46 days, however, the dry forecast falls into the Risk Neutral category, indicating that both No Risk and Risk Averse managers would take precautionary action. At 23 days, regardless of the precipitation scenario, only a Risky manager would not take action.
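The risk read from Figure 6 is an exceedance probability, P(D60 > acceptable D60), taken over the forecast ensemble. A minimal sketch, assuming a log-link Poisson GLM with SWE and PCP as predictors and averaging the Poisson exceedance probability over ensemble members. The coefficients below are hypothetical placeholders, not the fitted values, and the category cutoffs (0.25, 0.50, 0.75) are an illustrative assumption chosen to agree with the percentages reported above; the actual ruleset is defined in Step 1.

```python
import math

import numpy as np


def poisson_sf(k, lam):
    """P(X > k) for X ~ Poisson(lam), summing the pmf in log space."""
    cdf = sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1)) for j in range(k + 1))
    return max(0.0, 1.0 - cdf)


def d60_exceedance(pcp_ensemble, swe, d60_acceptable, beta=(4.6, -0.02, -0.05)):
    """Risk = P(D60 > d60_acceptable), averaged over a PCP forecast ensemble.

    Assumes log(lambda) = b0 + b1*SWE + b2*PCP for each member.
    The default coefficients are HYPOTHETICAL placeholders, not fitted values.
    """
    b0, b1, b2 = beta
    lams = np.exp(b0 + b1 * swe + b2 * np.asarray(pcp_ensemble, dtype=float))
    return float(np.mean([poisson_sf(d60_acceptable, lam) for lam in lams]))


def risk_category(p_exceed, cutoffs=(0.25, 0.50, 0.75)):
    """Map an exceedance probability to a ruleset category (HYPOTHETICAL cutoffs)."""
    labels = ("No Risk", "Risk Averse", "Risk Neutral", "Risky")
    return labels[sum(p_exceed >= c for c in cutoffs)]
```

Averaging the full Poisson exceedance over members keeps both the precipitation uncertainty and the count-model noise; using only each member's predicted mean would be a simpler alternative.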

Figure 6.

Empirical cumulative distribution function (CDF) showing the nonexceedance probability for synthetic precipitation forecasts (dry, EC = equal chances, wet) under median 1 May snow water equivalent. The dashed horizontal line marks 50% exceedance; the dashed vertical line marks 46 days.

Figure 7.

Exceedance and categorical likelihoods for D60 for synthetic precipitation (PCP) forecasts under median SWE. D60 is number of days below the flow threshold.

[43] For comparison, the same sensitivity analysis was carried out for the logistic regression, with an acceptable D60 of 60 days. The risk decreases modestly across the precipitation forecasts, from 51% (dry) to 36% (equal chances) to 23% (wet), and moves incrementally down the risk ruleset classifications, from Risk Neutral to Risk Averse to No Risk (Figure 7). The logistic results are similar to, albeit slightly higher than, the Poisson results for the D60 > 60 days case (Figure 7). As such, the logistic risk categories are more conservative than the Poisson categories in the dry and equal chances cases.
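For the logistic model, the risk is the modeled probability itself. One natural way to combine it with a precipitation ensemble, sketched here under that assumption, is to apply the inverse logit to each member and average. The predictor set (SWE and PCP) follows the text, but the coefficient values are hypothetical placeholders.

```python
import numpy as np


def logistic_exceedance(pcp_ensemble, swe, beta=(6.0, -0.3, -0.5)):
    """P(D60 > 60 days) from a logistic GLM, averaged over a PCP ensemble.

    Assumes logit(p) = b0 + b1*SWE + b2*PCP; coefficients are HYPOTHETICAL.
    """
    b0, b1, b2 = beta
    eta = b0 + b1 * swe + b2 * np.asarray(pcp_ensemble, dtype=float)
    p_member = 1.0 / (1.0 + np.exp(-eta))  # inverse logit for each member
    return float(p_member.mean())
```

With negative SWE and PCP coefficients, a wetter ensemble lowers the probability, which is the pattern reported above (51% dry, 36% equal chances, 23% wet).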

[44] Similarly, the sensitivity analysis was carried out for the multinomial logit with category boundaries at 46 and 75 days. For the D60 > 75 case, the multinomial and Poisson results were very similar and fell in the same No Risk category across the precipitation outlooks (Figure 7). Note that only the >75 category of the multinomial results can be coded according to the risk ruleset, because the ruleset is defined in terms of exceedance probabilities. The multinomial results can also be compared across the precipitation cases; e.g., the wet scenario gives likelihoods of 61% for the <46 category, 27% for 46–75, and 11% for >75.
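A minimal sketch of the multinomial case, assuming a baseline-category logit with “<46” as the reference category and averaging the per-member category probabilities over the ensemble. The coefficient values are hypothetical placeholders, not the fitted model.

```python
import numpy as np


def multinomial_probs(pcp_ensemble, swe, betas=((1.0, -0.05, -0.3), (2.0, -0.1, -0.6))):
    """Ensemble-averaged probabilities for D60 < 46, 46-75, and > 75 days.

    Baseline-category logit with '<46' as the reference: for the other two
    categories, log(p_k / p_0) = b0 + b1*SWE + b2*PCP. Coefficients are HYPOTHETICAL.
    """
    pcp = np.asarray(pcp_ensemble, dtype=float)
    # Linear predictors: zeros for the reference category, one row per other category
    eta = np.vstack([np.zeros_like(pcp)] + [b0 + b1 * swe + b2 * pcp for b0, b1, b2 in betas])
    eta -= eta.max(axis=0)  # subtract column max for a numerically stable softmax
    p = np.exp(eta) / np.exp(eta).sum(axis=0)
    return p.mean(axis=1)  # average each category over ensemble members
```

The three returned values sum to 1, so reporting, e.g., 61/27/11 for a scenario only requires rounding this vector.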

4.4.2. Historical Forecast Evaluation

[45] It was difficult to quantify the historical D60 forecast skill meaningfully, for several reasons. First, the D60 forecasts can only be as good as the underlying IRI precipitation forecasts, which deviate little from equal chances and are not uniformly skillful from year to year. Further, the historical forecasts cover large geographic areas and have existed for only a short time (since 2002), making it difficult to estimate traditional skill measures with any confidence. Nevertheless, to gain some insight into forecast skill, we performed a simple evaluation with the Poisson model. Using the observed 1 May SWE and the historical seasonal precipitation outlooks from 2002 to 2011, we validated each historical forecast (Figure 8). In general, we consider a forecast ensemble “good” if the observation falls between its 25th and 75th percentiles (i.e., within the box of the box plot). From 2002 to 2011, this was achieved in 6 of the 10 years (60%; Figure 8). We also checked whether the observation was closer to the forecast ensemble median or to the observed average D60 (57 days, dashed horizontal line): the forecast median was as good as or better than the average in 5 of 10 years (50%), and notably it did better in the recent wet years (2008–2011). Although an accurate forecast is important in all years, it is useful to know that the wet-year predictions appear more reliable. Because of recent drought periods, management of the Big Hole has moved toward a more conservative (i.e., drought-oriented) stance. As such, a wet forecast that results in lower D60 predictions deserves particular attention, as it suggests that only minimal management actions to reduce diversions may be needed.
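The two checks described above can be sketched as follows. The function name is hypothetical, and `np.percentile` with its default linear interpolation is an assumption; box-plot quartiles drawn by plotting software may use a slightly different convention.

```python
import numpy as np


def evaluate_hindcasts(ensembles, observed, clim_mean=57.0):
    """Two simple checks on yearly D60 hindcast ensembles.

    Returns (hits, wins): the number of years whose observation lies within the
    ensemble's 25th-75th percentiles, and the number of years in which the
    ensemble median is at least as close to the observation as the observed
    average D60 (57 days in the text).
    """
    hits = wins = 0
    for ens, obs in zip(ensembles, observed):
        q25, q50, q75 = np.percentile(ens, [25, 50, 75])
        hits += int(q25 <= obs <= q75)
        wins += int(abs(q50 - obs) <= abs(clim_mean - obs))
    return hits, wins
```

Applied to the 2002–2011 hindcasts, the text reports 6 hits and 5 wins out of 10 years.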

Figure 8.

Validation of observations (gray diamonds) and hindcast ensembles (box plots) using historical IRI precipitation forecasts for D60 (number of days below the threshold). Gray dashed line is observed average D60 (57 days).

[46] Given the constraints in evaluating the D60 forecasts, we also investigated how “sharp” the forecasts would need to be to improve the predictions. For 2002–2011, we created two “perfect” synthetic forecasts, Sharp 50 and Sharp 90. For Sharp 50, if the observed PCP was in the dry tercile, the forecast was 25/25/50, whereas if it was in the normal tercile, the forecast was 25/50/25. Similarly, for Sharp 90, if the observed PCP was in the wet tercile, the forecast was 90/5/5, and so on. Relative to the fitted Poisson model, the root-mean-square error (RMSE) and mean error (ME) of the forecast ensemble median improve as the forecast sharpens (Table 2). As expected, there is little improvement from equal chances to the IRI forecasts (RMSE = 13 versus 12), but the more targeted Sharp 50 forecast shows considerable improvement (RMSE = 7). However, given the coarse nature of tercile-based forecasts, the improvements diminish: Sharp 90 barely reduces the error relative to Sharp 50 (RMSE = 6). This indicates that even a modest sharpening of the forecast could yield substantial gains in skill, and it underscores the importance of having this type of framework developed and ready as forecasts continue to improve.
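The synthetic “perfect” forecasts and the error measures can be sketched as below. The tercile indexing (0 = above, 1 = near, 2 = below, matching the A/N/B order) and the sign convention ME = forecast minus observed are assumptions; the function names are hypothetical.

```python
import numpy as np


def sharp_forecast(observed_tercile, p_correct):
    """A/N/B probabilities for a 'perfect' synthetic forecast.

    observed_tercile: 0 = above (wet), 1 = near normal, 2 = below (dry).
    p_correct: 0.5 for Sharp 50 or 0.9 for Sharp 90; the remainder is split
    equally between the other two terciles.
    """
    probs = np.full(3, (1.0 - p_correct) / 2.0)
    probs[observed_tercile] = p_correct
    return probs


def rmse_and_me(forecast_median, observed):
    """Root-mean-square error and mean error (forecast minus observed)."""
    err = np.asarray(forecast_median, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean(err ** 2))), float(np.mean(err))
```

For a dry year, `sharp_forecast(2, 0.5)` gives 25/25/50 and `sharp_forecast(2, 0.9)` gives 5/5/90, as described above.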

Table 2. Difference From Fitted Poisson Model in Root Mean Square Error (RMSE) and Mean Error (ME) for Different Precipitation Forecast Ensemble Medians^a

Forecast          RMSE (days)   ME (days)
Equal chances     13            13
IRI forecast      12            11
Sharp 50          7             9
Sharp 90          6             7

^a Equal chances uses a 33/33/33 forecast every year; IRI forecasts use the actual forecasts from 2002 to 2011; Sharp 50 (90) creates forecasts where, for each year, 50% (90%) of the resample is from the correct tercile and 25% (5%) is from each of the other two terciles.

4.5. Managing for Climate Risk

[47] These D60 outlooks offer a 2-month lead time to help managers plan for the upcoming summer season. The next question is: what measures are feasible if the D60 outlook warrants action? For the Big Hole River, efforts led by Montana FWP and the US Fish and Wildlife Service, in cooperation with Montana DNRC and USDA NRCS, have led to the permitting of Candidate Conservation Agreements with Assurances (CCAAs) [FWS, 2006]. These agreements offer a type of “conservation insurance,” whereby landowners and CCAA officials develop site-specific plans aimed at enhancing grayling populations. By entering into an agreement, landowners receive assurances that, if the grayling is listed as threatened or endangered under the ESA, they will not face further regulation beyond the actions outlined in their site-specific plans. The CCAAs outline several valid conservation measures, one of which is improving streamflows [FWS, 2006]. Agreements that include in-stream flow contributions are of particular importance, as they give managers additional flexibility to deal with year-to-year climate variability. Hence, when the 1 May forecast is issued, it would be useful for managers to know how much in-stream flow contribution is needed to offset the climate risk and maintain an acceptable D60. To begin to answer this question, we used linear regression (i.e., a special case of the GLM in which Y is assumed to be normally distributed and the link is the identity) to relate D60 to the average daily streamflow, Q, for the summer season:

$$\mathrm{D60} = \beta_0 + \beta_1 Q + \varepsilon \tag{7}$$

[48] This regression has an R² of 0.80. The slope (approximately −0.3 days per cfs) is interpreted as $\Delta \mathrm{D60} / \Delta Q$, the change in D60 per 1 cfs change in average summer streamflow. This provides a rough estimate suggesting that for every 1 cfs contributed over the summer season, the predicted D60 is reduced by about one-third of a day; equivalently, every 3 cfs contributed reduces the predicted D60 by about 1 day.
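The slope can be inverted to give a rough in-stream contribution target. A minimal sketch, assuming the linear relationship holds over the required change in Q; the function name and the example values (a predicted D60 of 52 days against an acceptable D60 of 46 days) are hypothetical.

```python
def required_contribution_cfs(d60_predicted, d60_acceptable, slope=-0.3):
    """Average summer in-stream contribution (cfs) needed to lower predicted D60.

    Uses the fitted slope of D60 on Q (about -0.3 days per cfs in the text) and
    assumes the linear relationship holds over the required change in Q.
    """
    if d60_predicted <= d60_acceptable:
        return 0.0  # outlook already acceptable; no contribution needed
    return (d60_acceptable - d60_predicted) / slope


# Hypothetical outlook: median predicted D60 of 52 days, acceptable D60 of 46 days
needed = required_contribution_cfs(52, 46)  # about 20 cfs averaged over the summer
```

In practice the predicted D60 could be taken from a chosen quantile of the forecast ensemble rather than the median, depending on the risk tolerance.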

5. Discussion and Conclusions

[49] The purpose of this paper is to develop a procedure that explicitly integrates probabilistic seasonal climate forecasts into an end-to-end risk-management approach for conservation planning. As in other resource management contexts, the Big Hole managers are continually engaged in an adaptive process: monitoring, and reflecting after each season on how things went in light of their actions and how those actions can be refined for the next season. The steps in this framework therefore offer additional tools to complement their ongoing, iterative management strategies.

[50] This paper puts forth GLM as a viable approach for relating climate to streamflow characteristics. These empirical relationships provide a simple, transparent, and effective modeling approach and have been used successfully in other applications [e.g., Weirich et al., 2011; Weirich, 2012]. However, the framework is flexible, and other statistical techniques, such as conditional resampling [e.g., Souza Filho and Lall, 2003], or physically based forecast models could be embedded. Physically based streamflow simulation models that can incorporate climate information and evaluate different management strategies [e.g., Thompson et al., 2012; Wilby et al., 2011; Mondal et al., 2011] can be especially useful for studying more complex systems, such as regulated rivers where dam releases need to be explicitly modeled, or systems with multiple e-flow attributes and/or locations to simulate. However, hydrologic models can be data intensive, require significant resources, and may not be viable for some applications, especially in small, resource-limited watersheds. Regardless of the modeling approach, one key requirement is the availability of gaged streamflow data (for direct empirical modeling or for calibration) at locations relevant to decision making. Here, the location of the Wisdom gage was a deliberate choice by Montana FWP, and this paper demonstrates how this type of strategic monitoring can support decision making.

[51] Although the purpose of this paper was not to identify e-flows, they are a critical component in organizing the analysis, and it bears repeating that defining e-flows and using them for management is challenging. Although managing for the full natural flow regime may be ideal, in practice certain e-flows may be prioritized for managing particular systems. Even in this relatively simple case study, where the 60 cfs flow trigger from the Drought Plan had already been agreed upon, it is still difficult to “finalize” an acceptable D60 and acceptable risk. As part of the adaptive management process, or due to other factors, what is acceptable may change over time [Yohe and Leichenko, 2010]. We therefore underscore the importance of a general tool that can use probabilistic climate forecasts to investigate acceptable levels, whether they are set, changed, or still being explored, so as to provide a systematic means of guiding proactive decision making.

[52] This research focuses on seasonal climate forecasts, which, despite significant advances [e.g., Goddard et al., 2003; Livezey and Timofeyeva, 2008], have been underutilized by the management community for a variety of reasons [Dilling and Lemos, 2011]. One issue illustrated in this study is that seasonal forecasts are often close to equal chances (i.e., they are not very “sharp”), and we show how modest increases in sharpness could improve skill. It may therefore seem tempting to defer using probabilistic forecasts until they become sharper and instead issue a more traditional, discrete forecast, e.g., one that uses observed SWE as the only predictor. However, in this case the SWE-only model was much less skillful than the SWE and PCP model (R² = 0.21 versus 0.75). A skillful model that accurately describes the processes governing low flows is important and could be used for additional purposes, such as historical reconstructions or simulations. Further, in forecast mode, there is the potential to exploit years with skillful forecasts, as the historical evaluation showed for wet years. Similarly, it is important to have a skillful model ready as forecasts improve, as illustrated by the Sharp 50 forecast. In addition to the readily available IRI forecasts, improved forecasts can be developed by tailoring predictors specifically to streamflow in the region of interest, an approach that has shown promising results [Grantz et al., 2005; Regonda et al., 2006; Bracken et al., 2010]. Our risk-based approach could also use other probabilistic climate information, such as output from other seasonal forecast systems (e.g., NCEP's Climate Forecast System) or future climate change projections. In short, as climate predictions advance, they could be readily exploited by the framework developed in this paper.

[53] In summary, this study develops an end-to-end approach that translates probabilistic climate information into decision-relevant streamflow attributes. The risk-based framework is adaptable and could be applied to other watersheds and management contexts where there is a direct association between climate and the e-flow target of interest; here we show a successful demonstration for an unregulated watershed with water withdrawals in Montana. As managers are faced with competing water demands in light of climate variability, this approach provides a systematic means of bridging climate information with decision support.


[54] This research was supported by the Postdocs Applying Climate Expertise (PACE) fellowship program funded by the NOAA Climate Program Office and the U.S. Geological Survey, and administered by the University Corporation for Atmospheric Research Visiting Scientist Programs. The authors wish to thank the Big Hole Watershed Committee, especially Jen Titus and Kevin Brown, as well as Emma Cayer from Montana Fish, Wildlife, and Parks. The first author acknowledges the Regional Climate Section at the National Center for Atmospheric Research (NCAR); NCAR is sponsored by the National Science Foundation. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the US Government.