An approach for probabilistic forecasting of seasonal turbidity threshold exceedance

Authors

  • Erin Towler,

    1. Department of Civil, Environmental and Architectural Engineering, University of Colorado at Boulder, Boulder, Colorado, USA
    2. National Center for Atmospheric Research, Boulder, Colorado, USA
  • Balaji Rajagopalan,

    1. Department of Civil, Environmental and Architectural Engineering, University of Colorado at Boulder, Boulder, Colorado, USA
    2. Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, Colorado, USA
  • R. Scott Summers,

    1. Department of Civil, Environmental and Architectural Engineering, University of Colorado at Boulder, Boulder, Colorado, USA
  • David Yates

    1. National Center for Atmospheric Research, Boulder, Colorado, USA

Abstract

[1] Though climate forecasts offer substantial promise for improving water resource oversight, additional tools are needed to translate these forecasts into water-quality-based products that can be useful to water utility managers. To this end, a generalized approach is developed that uses seasonal forecasts to predict the likelihood of exceeding a prescribed water quality limit. Because many water quality standards are based on thresholds, this study utilizes a logistic regression technique, which employs nonparametric or “local” estimation that can capture nonlinear features in the data. The approach is applied to a drinking water source in the Pacific Northwest United States that has experienced elevated turbidity values that are correlated with streamflow. The main steps of the approach are to (1) obtain a seasonal probabilistic precipitation forecast, (2) generate streamflow scenarios conditional on the precipitation forecast, (3) use a local logistic regression to compute the turbidity threshold exceedance probabilities, and (4) quantify the likelihood of turbidity exceedance corresponding to the seasonal climate forecast. Results demonstrate that forecasts offer a slight improvement over climatology, but that representative forecasts are conservative and result in only a small shift in total exceedance likelihood. Synthetic forecasts are included to show the sensitivity of the total exceedance likelihood. The technique is general and could be applied to other water quality variables that depend on climate or hydroclimate.

1. Introduction

[2] Recent advances in climatology and computational capability are making probabilistic forecasts of seasonal climate over the United States and across the globe increasingly prevalent and skillful [Barnston et al., 1994; Goddard et al., 2003; Livezey and Timofeyeva, 2008]. Nevertheless, despite significant advances and potential benefits, water managers have been slow to incorporate forecasts into their water quality assessments. Beyond perceptions that the forecasts are unreliable, several underlying institutional factors contribute to their underutilization, including system complexity and organizational conservatism [Rayner et al., 2005]. In addition to these barriers, evidence suggests that practical uses for these seasonal forecasts need to be identified [Rayner et al., 2005; Pagano et al., 2001] and tailored to a particular situational vulnerability or local management practice [Pagano et al., 2002]. As such, efforts to integrate forecasts into secondary products that are relevant to the major concerns of water managers have been explored [Carbone and Dow, 2005], and this paper aims to complement and extend those efforts.

[3] Historic attempts to use climate forecasts have focused on water quantity, with little extension to forecasting water quality, which is understandable, since reliability has always been a central tenet of water management. Seasonal streamflow forecasting is a topic of widespread interest and an area of active research [e.g., see Wood and Lettenmaier, 2006, and references therein]. As such, there have been several efforts to develop seasonal streamflow forecasts using large-scale seasonal climate information in combination with physical hydrologic models [Wood and Lettenmaier, 2006; Hamlet and Lettenmaier, 1999; Hay et al., 2002; Wood et al., 2002, 2005], as well as through statistical techniques that incorporate predictors from the ocean-atmosphere and land system [Grantz et al., 2005; Regonda et al., 2006; Opitz-Stapleton et al., 2007].

[4] For drinking water utilities, the extension of forecasting efforts to source water quality is valuable, since there are implications for treatment and management decisions. This is playing an increasingly important role as supplies become strained and regulations become more stringent. Therefore, understanding the variability of the influent water quantity and quality is important for efficient management of the system. Many drinking water treatment systems draw from surface water sources, including flowing streams and reservoirs. For these sources, year-to-year variations in flow are linked to the regional climate, mainly the precipitation (rainfall and/or snow) in the basin. There is increasing evidence that water quality pollutant concentrations are associated with streamflow [Manczak and Florczyk, 1971; Johnson, 1979; Stow and Borsuk, 2003], and this provides a good opportunity to extend seasonal climate forecasts to relevant water quality applications.

[5] Environmental and public health protection is often established through the use of water quality thresholds that have been set by regulatory agencies or identified as limits to particular treatment options; hence, accurate assessment of threshold exceedances is an important management tool. In the context of total maximum daily loads (TMDLs) set for discharges to streams and rivers, water quality violations have been assessed using probabilistic approaches [Borsuk et al., 2002], as well as through Bayesian techniques that combine model simulations with monitoring data [Qian and Reckhow, 2007]. However, these methods do not provide a way to utilize seasonal climate forecast information for helping to manage the potential for water quality threshold exceedances.

[6] This work is motivated by the need for a simple and flexible approach that can translate probabilistic seasonal climate forecasts into predictions of seasonal water quality threshold exceedance. We develop a local logistic regression model for seasonal water quality threshold exceedance based on seasonal streamflow, which in turn is modeled using seasonal climate forecasts. The logistic regression models the threshold exceedance probability directly, and the "local" aspect of the regression provides the ability to capture any arbitrary (i.e., linear or nonlinear) relationship present in the data. This approach is unique and makes two significant contributions: first, it introduces a data-driven functional approach for water quality exceedance modeling, and second, it seamlessly integrates climate information, providing exceedance probabilities that are consistent with a seasonal climate forecast. We demonstrate this by applying it to modeling threshold exceedance of turbidity using streamflow and seasonal climate forecasts for a drinking water source in the Pacific Northwest United States. The description of the study region, data sets used, and details of the proposed approach are presented in the following sections. This approach will allow managers to take advantage of improving skill in climate forecasts, thus enabling more efficient water quality management. Furthermore, as water managers are called to adapt to a nonstationary climate [Milly et al., 2008], such a tool can be used for long-term planning.

2. Case Study Description

[7] The case study that will be used to demonstrate this approach is the Bull Run River, which is the primary source of water for the Portland Water Bureau (PWB) in Oregon. PWB provides water to more than 20% of all Oregonians, including the city of Portland. The Bull Run Watershed is protected and is a source with very high water quality, enabling PWB to meet federal drinking water standards without the filtration treatment process. However, historic flooding and subsequent high-turbidity events have underscored PWB's vulnerability as an unfiltered source [Portland Water Bureau, 2007]. For utilities that do not filter, one of the criteria of the Surface Water Treatment Rule (SWTR) requires that the turbidity level prior to disinfection not exceed 5 nephelometric turbidity units (NTU) [United States Environmental Protection Agency, 1989]. For PWB, if conditions arise that could cause an exceedance of 5 NTU, they follow procedures and make decisions based on monitored turbidity levels, weather patterns, antecedent conditions, and other case-specific information to ensure compliance. When necessary, PWB switches to its low-turbidity supplemental groundwater source. This groundwater source ensures that PWB is able to remain in compliance but is more expensive because of pumping costs. As such, the availability of skillful seasonal forecasts of turbidity exceedance could provide additional information for management and planning purposes.

2.1. Data

[8] The following data sets for the period of 1970–2007 were used in the analysis.

[9] 1. Monthly precipitation data for the Oregon Northern Cascades region (Division 4) were obtained from the U.S. climate division data set from the NOAA-CIRES Climate Diagnostics Center (CDC) Web site (http://www.cdc.noaa.gov).

[10] 2. Daily streamflow data for the main stem to the drinking water source were obtained from U.S. Geological Survey (USGS) Bull Run Station gage 14138850.

[11] 3. Daily turbidity monitoring data from the treatment plant headworks were obtained from PWB. The headworks are located below the two storage reservoirs on the Bull Run River and are the location from which the municipal drinking water supply is provided to the conduits that carry water into the Portland metropolitan area.

[12] Where applicable in this analysis, monthly averages and monthly maximums were calculated from the daily data.
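For readers who wish to reproduce this preprocessing, the aggregation could proceed along the following lines. This is a minimal sketch in Python using pandas; the file name and column names are hypothetical placeholders, not the actual PWB or USGS records.

```python
import pandas as pd

# Hypothetical daily records: a date-indexed table with daily streamflow (cfs)
# and daily turbidity (NTU). File and column names are illustrative only.
daily = pd.read_csv("bull_run_daily.csv", parse_dates=["date"], index_col="date")

# Monthly average streamflow (seasonal forecasts predict changes in average behavior).
monthly_avg_flow = daily["streamflow_cfs"].resample("MS").mean()

# Monthly maximum turbidity (exceedances are judged against peak values).
monthly_max_turb = daily["turbidity_ntu"].resample("MS").max()

# Pool the winter months (November-February) across years for the analysis.
winter_months = [11, 12, 1, 2]
winter_flow = monthly_avg_flow[monthly_avg_flow.index.month.isin(winter_months)]
winter_turb = monthly_max_turb[monthly_max_turb.index.month.isin(winter_months)]
```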

2.2. Diagnostics

[13] The climate of the Pacific Northwest includes a distinct wet winter season. The climate diagnostics in this section are shown as box plots, in which the box represents the 25th and 75th percentiles, the whiskers show the 5th and 95th percentiles, points are values outside this range, and the horizontal line represents the median. For the case study region, box plots of the average monthly precipitation (Figure 1) and average monthly streamflows (Figure 2) show that the winter months (November–February) generally have higher average values (as indicated by the box plot median) and greater variability (as indicated by the box plot range) than the other times of the year. Monthly maximum turbidity values monitored at the headworks of the plant exhibit a similar seasonal pattern (Figure 3), with some rare threshold (5 NTU) exceedances during the winter months. Thus, focus is placed on the winter season, and henceforth, the winter months are pooled together for analysis in subsequent figures and calculations. We note that precipitation and streamflow are examined in terms of monthly averages, since the seasonal climate forecasts that are readily available predict changes in average behavior. For turbidity, maximum values are examined, since those values indicate threshold exceedances.

Figure 1.

Average monthly precipitation in the study region for January (J) through December (D).

Figure 2.

Average monthly streamflow at the main stem gage for January (J) through December (D).

Figure 3.

Maximum monthly turbidity at the utility headworks for January (J) through December (D). Dotted horizontal line is the regulatory threshold, 5 NTU, and triangles (and associated printed values) represent outliers outside the y axis range.

[14] A scatterplot of precipitation versus streamflow for the winter months (Figure 4) shows a strong positive linear relationship (ρ = 0.79), indicative of a rainfall-runoff mechanism for the streamflow, which provides 90–95% of the Bull Run River's water [Portland Water Bureau, 2007]. The relationship between the winter month streamflow and corresponding turbidity values shows a positive linear association (ρ = 0.26), but the scatterplot (Figure 5) reveals a distinct nonlinearity, as modeled by a local smoother [Loader, 1999]. Here, it can be seen that for streamflow below 700 cfs the turbidity is low, typically less than 1 NTU, and the distribution is fairly tight and constant, while above 700 cfs the turbidity response shows much more spread. However, all but one of the threshold exceedances occur above 700 cfs. Although the variability of the turbidity response for higher streamflows would make estimation of actual values difficult, the ability to forecast the probability of being above or below the threshold shows potential. The relationship between the winter month precipitation and corresponding turbidity values shows a similar nonlinearity (figure not included), although the linear association is weaker (ρ = 0.17). For this case study, it was beneficial to keep the analysis in terms of streamflow and turbidity, since those are the variables that PWB monitors and that have the greatest influence on its operations. Nevertheless, these diagnostics indicate the link between precipitation and turbidity via streamflow, thus enhancing the prospects of forecasting turbidity from a seasonal precipitation forecast.

Figure 4.

Average monthly precipitation versus average monthly streamflow for the winter months (ρ = 0.79). Solid line is local smoother.

Figure 5.

Average monthly streamflow versus maximum monthly turbidity for the winter months. Dotted horizontal line is the regulatory threshold, 5 NTU, and triangles (and associated printed values) represent outliers outside the y axis range. Solid line is local smoother.

3. Seasonal Turbidity Forecast

[15] We explore the potential for seasonal turbidity forecasts based on precipitation forecasts by proposing an approach that has four main steps (Figure 6). These steps are detailed below.

Figure 6.

Flowchart outlining the approach.

[16] Step 1: Obtain seasonal precipitation forecast

[17] Seasonal climate forecasts are now routinely provided by several organizations around the world, including the International Research Institute for Climate Prediction (IRI; http://iri.columbia.edu/climate/forecast/net_asmt/). The IRI seasonal forecasts of temperature and precipitation are provided globally with a lead time of up to 6 months in 3 month moving windows. The precipitation forecasts are provided in an A/N/B format, where A indicates the likelihood of above-normal precipitation, N the likelihood of near-normal precipitation, and B the likelihood of below-normal precipitation; the above- and below-normal categories are based on terciles. Thus, a "climatological forecast" would be represented as A/N/B = 33:33:33, where there is an equal probability (33%) for precipitation to be above normal, near normal, or below normal. December through February IRI precipitation forecasts for 2008 and 2005 show this A/N/B forecast nationwide (Figure 7). As can be seen, the probabilistic forecasts span a large spatial area. For the Pacific Northwest, the 2008 forecast leaned "wet" (Figure 7a), with a 40% likelihood of above-normal precipitation, a 35% likelihood of near-normal precipitation, and a 25% likelihood of below-normal precipitation (A/N/B = 40:35:25). In contrast, the 2005 forecast for the Pacific Northwest leaned "dry," with A/N/B = 25:35:40 (Figure 7b).

Figure 7.

IRI precipitation forecast for December-January-February for (a) 2008 and (b) 2005.

[18] Typical IRI seasonal forecasts are fairly conservative in that they do not deviate much from the climatological forecast in most years. As such, in this analysis, we included two scenarios, "wet" and "dry," that are representative of historic IRI forecasts for the Pacific Northwest (Table 1). However, as forecasts improve, their "sharpness," or confidence in a specific category, should improve as well. Therefore, we considered two synthetic scenarios, "very wet" and "very dry" (Table 1), which illustrate the envelope of turbidity exceedance likelihoods.

Table 1. Seasonal Forecast Scenarios Defined by Probabilistic Forecasts of the Format A/N/B^a

Scenario      A      N      B
Very wet^b    0.90   0.05   0.05
Wet           0.40   0.35   0.25
Dry           0.25   0.35   0.40
Very dry^b    0.05   0.05   0.90

^a A indicates above-normal precipitation, N indicates near-normal precipitation, and B indicates below-normal precipitation.
^b Scenarios that are outside the bounds of historic IRI forecasts.

[19] It should also be noted that IRI regularly factors the activity of the El Niño Southern Oscillation (ENSO) into its forecasts [Goddard et al., 2003]. The case study's location in the Pacific Northwest provides an opportunity to examine an area in which the influences of the ENSO signal on winter precipitation and runoff have been well documented [Redmond and Koch, 1991; Dracup and Kahya, 1994; Cayan et al., 1999], therefore increasing the potential utility of the seasonal forecasts.

[20] Step 2: Generate conditional streamflow scenarios

[21] Since a strong relationship between precipitation and streamflow has been established in this basin (Figure 4), the precipitation forecasts were considered to be a reasonable proxy for streamflow forecasts. As such, the probabilistic forecasts (i.e., A/N/B from Table 1) were used as weights in resampling the historic streamflows to generate an ensemble for each flow scenario. For this, we used a bootstrapping technique with replacement [Efron and Tibshirani, 1993] that has been employed successfully for daily weather generation [Yates et al., 2003; Clark et al., 2004; Apipattanavis et al., 2007]. Specifically, the resampling is carried out by sorting the historic winter month average streamflows in ascending order and designating the bottom third as the below-normal or "B" pool, the middle third as the near-normal or "N" pool, and the top third as the above-normal or "A" pool. The corresponding forecast probabilities, A/N/B (Table 1), are then used as weights for resampling historical streamflows from these categories: a category is selected at random using the weights, and a historical streamflow (i.e., month) is then resampled at random within the selected category, thus generating an ensemble that reflects the seasonal forecast. Each ensemble is the same length as the historic streamflow sample size (N = 148). We note that if rainfall and runoff were less closely related, then a model would need to be used to translate the precipitation forecasts into streamflow forecasts. As mentioned in section 1, there are increasing efforts to develop seasonal streamflow forecasts using large-scale seasonal climate information.
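A minimal sketch of this conditional resampling in Python (numpy) is given below, assuming the historical winter month average streamflows are available as a one-dimensional array; the file name and function name are illustrative, and the tercile pools are formed with a simple equal split.

```python
import numpy as np

rng = np.random.default_rng(42)

# Forecast scenarios in A/N/B format (Table 1).
scenarios = {
    "very_wet": (0.90, 0.05, 0.05),
    "wet":      (0.40, 0.35, 0.25),
    "dry":      (0.25, 0.35, 0.40),
    "very_dry": (0.05, 0.05, 0.90),
}

def conditional_ensemble(flows, anb, size=None):
    """Resample historic flows with replacement, weighting the tercile pools
    by the A/N/B forecast probabilities (Step 2)."""
    flows = np.sort(np.asarray(flows, dtype=float))   # ascending order
    n = len(flows)
    size = n if size is None else size                # same length as the record
    below, near, above = np.array_split(flows, 3)     # B, N, A pools
    pools = [above, near, below]                      # ordered to match (A, N, B)
    ensemble = np.empty(size)
    for i in range(size):
        k = rng.choice(3, p=anb)                      # pick a category by its weight
        ensemble[i] = rng.choice(pools[k])            # resample a month within it
    return ensemble

# Example: an ensemble of winter monthly flows under the "wet" forecast.
# "winter_monthly_flows.txt" stands in for the 148 observed winter months.
historic_flows = np.loadtxt("winter_monthly_flows.txt")
wet_ensemble = conditional_ensemble(historic_flows, scenarios["wet"])
```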

[22] Step 3: Determine threshold exceedance probability

[23] Historical monthly streamflows and the corresponding turbidity are used to estimate the conditional threshold exceedance probability. For this purpose, a local logistic regression technique is used. As such, the dependent variable (i.e., turbidity) takes on a categorical value of “1” if the value is greater than the prescribed threshold and “0” if the value is less than the threshold. The statistical prediction model can be expressed generally as

$P(T_e \mid S) = f(S) + e$

where P(T_e | S) is the probability of a turbidity exceedance T_e conditioned on streamflow S, which is fit to its predictor using a function f, and e is the associated estimation error. We note that the error term is assumed to be normally distributed with a mean of 0, although the variance is not constant [Helsel and Hirsch, 1995]. Traditionally, to obtain the predicted value of the response in logistic regression, the following equation can be used [see Helsel and Hirsch, 1995, Chapter 15]:

$P(T_e \mid S) = \dfrac{e^{\beta_0 + \beta_1 S}}{1 + e^{\beta_0 + \beta_1 S}}$

where the β coefficients are estimated from the data by maximizing the likelihood function.
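As a concrete illustration of this global form, the expression below evaluates the exceedance probability for a given monthly average streamflow; the default coefficients are the global fit reported later in Table 2, and treating them as directly applicable here is an assumption made for illustration only.

```python
import numpy as np

def global_logistic_prob(flow_cfs, b0=-6.57, b1=0.00479):
    """Global logistic estimate of P(Te | S) for average monthly streamflow
    in cfs; default coefficients are the fitted values reported in Table 2."""
    z = b0 + b1 * np.asarray(flow_cfs, dtype=float)
    return np.exp(z) / (1.0 + np.exp(z))

# Example: exceedance probability at 1000 cfs (about 0.14 with these values).
print(global_logistic_prob(1000.0))
```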

[24] However, this traditional approach of fitting a single (i.e., global) model to the entire range of the data has several drawbacks: (1) a normal distribution of the data and errors is assumed, (2) higher-order logistic regression fits (e.g., quadratic or cubic) require large amounts of data for fitting, (3) the models are not portable across data sets, (4) model parameters are greatly influenced by outliers [Rajagopalan et al., 2005], and (5) local nonlinear features cannot be adequately captured by a global model. To alleviate some of these drawbacks, we use the "local" (i.e., nonparametric) version of the logistic regression [Loader, 1999], which is implemented using the Locfit package (http://cran.r-project.org/web/packages/locfit/locfit.pdf) in the public domain statistical software R (http://www.r-project.org/).

[25] In the local logistic regression, appropriate logistic models are developed "locally" at each desired point. Here, a function is estimated at a point based on the other data points in its neighborhood. These so-called "nearest neighbors," k (= αN), are identified, where α is the proportion of the total number of data points N. The value of α ranges between 0 and 1; when α = 1, all of the data points are included, as is the case in traditional global estimation of the logistic regression coefficients. The function is then approximated by fitting a polynomial of order p to the neighborhood and evaluating it at the point of interest. In this application, we allowed α to range between 0.4 and 1 and considered both first- and second-order polynomials (p = 1, 2). The best combination of α and p is found in terms of an objective statistic, in this case through minimization of the generalized cross-validation (GCV) statistic [Loader, 1999]. The GCV function is defined as

$\mathrm{GCV} = \dfrac{N \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{(N - m)^2}$

where $y_i - \hat{y}_i$ is the residual (error) between the observed and predicted values, N is the number of data points, and m is the degrees of freedom of the fitted polynomial [Loader, 1999, p. 31]. The GCV has been found to be a good estimate of the predictive risk of a model, unlike other functions that are purely goodness-of-fit measures [Craven and Wahba, 1979]. We note that local logistic regression is more computationally intensive than the global approach, but this is hardly an issue given recent advances in computing power.
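The sketch below illustrates the idea of local estimation in Python, using a nearest-neighbor subset and an unweighted Newton-Raphson logistic fit at each evaluation point. It is a simplified stand-in for the kernel-weighted locfit estimator used in the study, the example data are fabricated, and the selection of α and p by GCV minimization is not shown.

```python
import numpy as np

def fit_logistic(X, y, iters=25, ridge=1e-6):
    """Newton-Raphson (IRLS) for logistic regression; the small ridge term
    keeps the solve stable when a neighborhood is nearly separable."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        w = p * (1.0 - p)
        H = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta

def local_logistic(x_obs, y_obs, x_eval, alpha=0.45, degree=1):
    """At each evaluation point, fit a logistic polynomial of order `degree`
    to the alpha*N nearest neighbors and return the fitted probability there."""
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    k = max(int(np.ceil(alpha * len(x_obs))), degree + 2)
    probs = np.empty(len(x_eval))
    for j, x0 in enumerate(np.asarray(x_eval, dtype=float)):
        idx = np.argsort(np.abs(x_obs - x0))[:k]          # k nearest neighbors
        dx = x_obs[idx] - x0                              # center on x0
        X = np.vander(dx, degree + 1, increasing=True)    # [1, dx, dx^2, ...]
        beta = fit_logistic(X, y_obs[idx])
        probs[j] = 1.0 / (1.0 + np.exp(-beta[0]))         # fitted value at dx = 0
    return probs

# Example with fabricated exceedance indicators versus monthly flow (cfs).
flow = np.array([300, 450, 480, 600, 700, 800, 900, 950, 1000, 1100, 1200, 1400.])
exceed = np.array([0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1.])
p_hat = local_logistic(flow, exceed, np.linspace(300, 1400, 12))
```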

[26] To provide an estimate of the amount of variability explained by the logistic model, a likelihood R2 can be calculated as

$R^2 = 1 - \dfrac{L}{L_0}$

where L is the log likelihood of the fit logistic model and L0 is the log likelihood of the intercept-only model. L is calculated as [see Helsel and Hirsch, 1995, Chapter 15]

$L = \sum_{i=1}^{N} \left[\, y_i \ln(\hat{p}_i) + (1 - y_i) \ln(1 - \hat{p}_i) \,\right]$

where $y_i$ are the binary observations and $\hat{p}_i$ are the predicted probabilities for observations i = 1, …, N. L is a negative number, and the coefficients are estimated so as to maximize it (i.e., bring it closer to 0).
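These two quantities translate directly into code; a minimal sketch consistent with the equations above (not taken from the authors' implementation) is:

```python
import numpy as np

def log_likelihood(y, p_hat, eps=1e-12):
    """Binomial log likelihood L of binary observations y under predicted
    exceedance probabilities p_hat."""
    p = np.clip(np.asarray(p_hat, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def likelihood_r2(y, p_hat):
    """Likelihood R^2 = 1 - L/L0, with L0 from the intercept-only model
    (a constant probability equal to the observed exceedance rate)."""
    L = log_likelihood(y, p_hat)
    L0 = log_likelihood(y, np.full(len(y), np.mean(y)))
    return 1.0 - L / L0
```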

[27] In this application, the diagnostics indicated that the water quality variable of interest T could be sufficiently described by a single predictor variable S. However, when using this approach more generally (i.e., at another location and/or for another water quality variable), there may be multiple predictors that need to be considered (e.g., streamflow and water temperature). We note that, methodologically, it would be straightforward to fit the logistic function to a suite of best-fitting predictor variables, which could be selected through an objective criterion, such as the aforementioned GCV. We emphasize the importance of data diagnostics in predictor selection and in determining the appropriateness of using this approach.

[28] Step 4: Quantify exceedance likelihood

[29] The key quantity of interest is the total likelihood of water quality threshold exceedance for a given seasonal forecast. As such, the theorem of total probability [Ang and Tang, 2007] can be applied in its continuous form to obtain

$P(T_e) = \int P(T_e \mid S_f)\, P(S_f)\, dS_f$

where P(T_e) is the likelihood of a turbidity exceedance given the seasonal forecast, P(T_e | S_f) is the conditional threshold exceedance probability found in Step 3, and P(S_f) is the streamflow density under the forecast. The empirical probability density function (PDF) of the streamflow ensemble generated in Step 2 is used to quantify P(S_f).
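Because the ensemble members are equally likely draws from the forecast-conditioned distribution, the integral can be approximated by averaging the conditional exceedance curve over the ensemble. The sketch below assumes such an ensemble and a fitted conditional probability function are available; the demonstration curve and data are illustrative only.

```python
import numpy as np

def total_exceedance_probability(flow_ensemble, conditional_prob):
    """Approximate P(Te) = integral of P(Te | Sf) P(Sf) dSf by averaging the
    conditional exceedance probability over the streamflow ensemble, whose
    empirical distribution stands in for P(Sf)."""
    flows = np.asarray(flow_ensemble, dtype=float)
    return float(np.mean(conditional_prob(flows)))

# Illustration with a generic logistic curve in place of the Step 3 fit and a
# synthetic ensemble in place of a Step 2 ensemble.
demo_curve = lambda s: 1.0 / (1.0 + np.exp(-(-6.57 + 0.00479 * s)))
demo_ensemble = np.random.default_rng(1).normal(900.0, 250.0, 1000).clip(min=50.0)
print(total_exceedance_probability(demo_ensemble, demo_curve))
```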

4. Results

4.1. Turbidity Forecast Quantification

[30] Following the main steps of the approach, the likelihood estimates are computed for the four forecast scenarios (Table 1) along with the historical record, which provides a climatological comparison.

[31] For each of the forecast scenarios, the empirical distributions, both the PDFs (Figure 8a) and the cumulative distribution functions (CDFs; Figure 8b), were generated for the streamflow ensembles. For the dry, historic, and wet forecast scenarios, the PDFs are quite similar, generally wide, and relatively flat. Comparatively, the PDFs of the very dry and very wet scenarios are sharper and are shifted toward the lower and higher flows, respectively. To get a general sense of how these PDFs relate to observed data, average winter season streamflow values for the driest season (1977), the median season (1992), and the wettest season (1974) are overlaid; these correspond well to the PDFs for the very dry, historic, and very wet scenarios, respectively. More clearly than the PDFs, the CDFs (Figure 8b) show that, for wet (dry) scenarios, the curve shifts below (above) the climatological curve, indicating that there is a greater (lesser) likelihood of exceeding a given streamflow s, consistent with the forecast scenarios. From the PDFs and CDFs, it is evident that the wet and dry scenarios, which are representative of historic IRI forecasts for the case study area, are not very different from climatology. As previously stated, this was part of the motivation to include the very wet and very dry scenarios, which provide better insight into the likelihoods under sharper forecasts. It should be noted that in our resampling scheme, the paired historic turbidity values could have been resampled along with the flows, from which empirical PDFs and CDFs of the turbidity could be constructed and directly interpreted. However, this is only applicable when the empirical streamflow distributions are obtained through resampling. In keeping with a general approach, we note that streamflow forecasts can also be obtained through physical watershed models or statistical techniques, for which the paired turbidity values would not be available.

Figure 8.

The empirical (a) PDF and (b) CDF distributions for the average monthly streamflow for each forecast scenario. The stars below the PDF indicate average winter season streamflow values for the driest season (1977), the median season (1992), and the wettest season (1974).

[32] The fitted functions from the local and global logistic regressions (Figure 9) show how each model fits the observed values. The global intercept-only function is shown for reference and clearly does not fit the observed data well. The local logistic model performed better than its global counterpart in terms of L, R2, and GCV (Table 2). As such, the P(T_e | S_f) function from the local logistic regression (black line, Figure 9) is used in the subsequent analyses. For the local logistic, the best fit was obtained with a neighborhood size of α = 0.45 and p = 1, indicating that roughly half of the data points were used to fit the local logistic regression at each estimation point. The function shows that the probability estimates increase rapidly around 900 cfs and then increase mildly with higher streamflows. An example of this function's ability to capture local features is the small "bump" just below 500 cfs, reflecting the historical observed exceedance at this streamflow. The local logistic also closely follows the "Observed % Above Threshold" points (i.e., solid gray dots). These points were calculated by binning the observed discrete data every 150 cfs and then, for each bin, dividing the number of values above the threshold by the total number observed.
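For reference, the binned empirical exceedance fractions can be computed as in the sketch below (Python, numpy); the 150 cfs bin width follows the description above, and the function name is illustrative.

```python
import numpy as np

def binned_exceedance_fraction(flow, turbidity, threshold=5.0, bin_width=150.0):
    """Fraction of months above the turbidity threshold within each flow bin,
    mirroring the 'Observed % Above Threshold' points in Figure 9."""
    flow = np.asarray(flow, dtype=float)
    exceed = np.asarray(turbidity, dtype=float) > threshold
    edges = np.arange(0.0, flow.max() + bin_width, bin_width)
    centers, fractions = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (flow >= lo) & (flow < hi)
        if in_bin.any():
            centers.append(0.5 * (lo + hi))
            fractions.append(float(exceed[in_bin].mean()))
    return np.array(centers), np.array(fractions)
```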

Figure 9.

Monthly observed discrete turbidity values are regressed against average monthly streamflow values using local and global logistic regression.

Table 2. Coefficients and Goodness-of-Fit Statistics for Logistic Models

Logistic Model    B0      B1        L       R2      GCV
Local^a           -       -         −24.0   0.344   0.346
Global            −6.57   0.00479   −28.1   0.232   0.391
Global^b          −2.62   0         −36.6   -       -

^a Coefficients are estimated locally, with α = 0.45 and p = 1.
^b Intercept-only model.

[33] Next, the convolution of P(S_f) and P(T_e | S_f) (Figure 10) shows how different flow ranges contribute to the overall probability. Again, a small "bump" is seen just below 500 cfs because of the aforementioned historical exceedance at this streamflow. For all of the scenarios except very dry, the biggest contribution comes from a monthly average streamflow of about 1000 cfs. Above 1000 cfs, the function starts to decrease, which may initially seem counterintuitive. However, keeping in mind that the function convolves both the likelihood of a turbidity event given a certain flow and the likelihood of that flow, it follows that even though the likelihood of a turbidity exceedance is higher for higher streamflows, the rarity of those high flows decreases their contribution to the overall likelihood. As would be expected, Figure 10 also shows that the probability curve shifts up (down) as the forecast gets wetter (drier). To quantify this, the area under each of the forecast curves is computed to obtain the total likelihood (Table 3). The resulting probability of a turbidity exceedance is 6.7% and 4.1% for the wet and dry forecast scenarios, respectively, compared with 5.8% for the climatological forecast. The very wet and very dry forecasts show a more pronounced shift from climatology (Table 3). We note that the shift in the likelihood is dependent on the seasonal climate forecast.

Figure 10.

The convolution of P(T_e | S_f) and P(S_f) for each forecast scenario.

Table 3. Total Likelihood of an Exceedance for Each Hydrologic Scenario for the Current SWTR Standard

Scenario     P(T_e)
Very wet     14%
Wet          6.7%
Historic     5.8%
Dry          4.1%
Very dry     1.5%

[34] This approach also offers flexibility in assessing the impacts of changing the threshold. For instance, utilities may choose to operate within a given safety factor of the prescribed limit, or regulations may become more stringent. Lowering the regulatory threshold from 5 to 1 NTU increases the likelihood of exceedance from the range of 1.5–14% to a range of 30–64% (Figure 11). This type of information could be useful in evaluating planning alternatives under potential regulatory scenarios.
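A threshold sweep of this kind only requires re-labeling the historical months against each candidate threshold and repeating Steps 3 and 4. The sketch below does this with a simple global logistic fit for brevity (the study itself uses the local fit), and the input arrays are placeholders for the observed winter months and a Step 2 ensemble.

```python
import numpy as np

def exceedance_likelihood(flow_obs, turb_obs, flow_ensemble, threshold):
    """P(Te) for a given turbidity threshold: re-label exceedances, fit a
    global logistic in flow via Newton-Raphson, and average the fitted curve
    over the forecast-conditioned flow ensemble."""
    x = np.asarray(flow_obs, dtype=float)
    y = (np.asarray(turb_obs, dtype=float) > threshold).astype(float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(50):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        w = p * (1.0 - p)
        H = X.T @ (X * w[:, None]) + 1e-8 * np.eye(2)
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    s = np.asarray(flow_ensemble, dtype=float)
    p_cond = 1.0 / (1.0 + np.exp(-np.clip(beta[0] + beta[1] * s, -30, 30)))
    return float(p_cond.mean())

# Example sweep from the current 5 NTU standard down to 1 NTU:
# for t in (5, 4, 3, 2, 1):
#     print(t, exceedance_likelihood(flow_obs, turb_obs, ensemble, t))
```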

Figure 11.

P(Te) for varying turbidity thresholds.

4.2. Turbidity Forecast Evaluation

[35] It was difficult to conduct a meaningful quantitative evaluation of the probabilistic turbidity forecasts using traditional skill measures, for a number of reasons. First, water quality forecasts can only be as good as the climate forecasts upon which they are based. Although seasonal climate forecasts are improving with enhanced understanding of the climate system and better climate models, these forecasts have been in existence for only a short period, span large geographic areas, and have modest skill compared with climatology. These aspects of the seasonal forecast, combined with a very low climatological exceedance probability, made a traditional skill evaluation difficult.

[36] Nevertheless, we employed a simple evaluation method to obtain insight into the forecast skill. Here, the seasonal precipitation forecasts from IRI for the 40 available months in the period 1997–2007 were examined to see whether these forecasts offered an advantage in predicting turbidity exceedance. For example, if there was an observed exceedance during a "wet" forecasted month, the forecast was considered beneficial (i.e., the forecast provided an advantage over climatology), whereas an observed exceedance during a "dry" forecasted month would be detrimental. Eighteen months had a forecast that "tied" with climatology (i.e., the forecast was A/N/B = 33:33:33). Using these criteria and threshold values ranging from 5 NTU to 1 NTU, we evaluated the remaining 22 months. Again, we point out that the forecasts during this time were very conservative: in 20 of the 22 months the forecasts matched the wet or dry scenarios as defined in this paper, and in the remaining 2 months they were only slightly sharper (A/N/B = 25:30:45). Nonetheless, for every threshold considered, there were more months for which the forecasts offered an advantage over climatology (Table 4). For the 5 NTU case, the IRI forecast provided an advantage in 13 of the 22 months (59%).
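The bookkeeping behind Table 4 can be sketched as follows; treating non-exceedance months with dry forecasts as "advantageous" is our reading of the criterion, since the text spells out only the exceedance cases, and the variable names are illustrative.

```python
import numpy as np

def forecast_advantage_counts(forecast_dir, exceeded):
    """Count months where the forecast direction agreed with the outcome.
    forecast_dir: +1 (wet-leaning), -1 (dry-leaning), or 0 (tied with
    climatology) for each month; exceeded: boolean exceedance indicator.
    A wet forecast with an exceedance, or a dry forecast without one, is
    counted as advantageous; tied months are dropped."""
    f = np.asarray(forecast_dir)
    e = np.asarray(exceeded, dtype=bool)
    usable = f != 0
    yes = ((f == 1) & e) | ((f == -1) & ~e)
    no = ((f == 1) & ~e) | ((f == -1) & e)
    return int((yes & usable).sum()), int((no & usable).sum())
```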

Table 4. Number of Months for Which the Historic IRI Forecasts Provided an Advantage Over Climatology

Threshold (NTU)    Advantageous to Use Forecast? (months)
                   Yes    No
5                  13     9
4                  13     9
3                  15     7
2                  16     6
1                  14     8

[37] The need for skillful forecasts can be further seen by examining the time series of average winter season streamflows (i.e., the average of each water year's four winter months) along with the maximum seasonal turbidity (Figure 12). For the two water years with average flows exceeding the 95th percentile (i.e., 1974 and 1996), both seasons experienced a turbidity exceedance above 5 NTU. For the seasonal flows exceeding the 66th percentile, 5 of the 13 seasons (38%) experienced turbidity exceedances above 5 NTU, and 7 of the 13 seasons (54%) experienced an exceedance above 4 NTU. This demonstrates that the use of climate forecasts, especially in wet years, can be of substantial value when translated into water quality forecasts. In light of these results, we note the importance of having these types of methods developed and ready to use as seasonal climate forecasts become more skillful.

Figure 12.

Time series of winter season average streamflows (vertical lines), with the corresponding maximum winter season turbidity (T) range indicated by the legend key. Historic streamflow percentiles (horizontal lines) are overlaid.

5. Summary and Discussion

[38] We developed a local logistic regression-based approach to estimate threshold exceedances of water quality variables conditioned on a seasonal climate forecast. In this approach, seasonal streamflow ensembles were generated using the seasonal precipitation forecast, and the conditional turbidity threshold exceedance probability was modeled using a local logistic regression. Consequently, for a given seasonal streamflow ensemble, the total likelihood of threshold exceedance was computed. We believe this effort to be distinctive in its contribution, both through its introduction of a robust, functional approach to modeling the likelihood of water quality threshold exceedance and through its ability to readily incorporate probabilistic climate information.

[39] The approach was demonstrated in the context of forecasting turbidity for a drinking water utility in the Pacific Northwest, where occasional high winter streamflows cause elevated turbidity levels, requiring the utility to switch to a more expensive backup groundwater source. The method forecasts the likelihood of a regulatory threshold exceedance and is offered for planning purposes such as resource allocation for operations. The approach was also used to evaluate the impacts of threshold changes, which could be useful on longer planning time scales. In all cases, this approach is meant to provide a complementary tool to the practices and procedures that utility managers already employ.

[40] The methodology was applied to four seasonal precipitation forecast scenarios and compared to a more traditional approach that relies only on the historic record (i.e., climatology). Two of the scenarios considered are typical seasonal forecasts, which are fairly conservative in that they do not deviate sharply from the climatological forecast. Consequently, the shifts in exceedance probabilities for those scenarios are subtle. Using a simple evaluation of forecast performance, we found that incorporating the seasonal climate forecasts did provide useful skill in the prediction of threshold exceedance. While the evaluation results are not exceptional, we note that our water quality forecasts can only be as good as the forecasts upon which they are based. In addition to being conservative (i.e., similar to climatology), the underlying forecasts have modest skill. As for the two synthetic scenarios, the calculated exceedance probabilities diverged noticeably from climatology. These cases are informative in that they represent the situations that hold the most potential for disruption. In addition, because a general consequence of a warmer climate is an intensification of the hydrologic cycle [Intergovernmental Panel on Climate Change (IPCC), 2007], which is likely to affect streamflow magnitudes and consequently turbidity, the tools presented here can be extended to provide estimates of threshold exceedances under a changing climate. Our efforts at this have borne out encouraging results (E. Towler et al., Modeling hydrologic and water quality extremes in a changing climate, submitted to Water Resources Research, 2009).

[41] By providing a generalized approach, the tool is flexible in a variety of ways. The method is portable and could be applied to other measures of water quality of concern where the diagnostics show a promising relationship with climate or hydroclimate. Where appropriate, additional independent variables can be easily incorporated as predictors in the local logistic regression approach. Furthermore, a local polynomial regression-based approach can be used to generate ensembles of water quality variables if these are desired instead of threshold exceedances [e.g., Grantz et al., 2005; Regonda et al., 2006]. Statistical modeling approaches such as the one presented here can serve as an attractive tool for water quality modeling, although we note that alternative methods, such as mechanistic models, can also be explored.

[42] To a large extent, the proposed approach is able to characterize the various sources of uncertainty, chiefly streamflow variability, parameter uncertainty, and model uncertainty. By far, the largest source of uncertainty is from the variability in streamflow, which is captured by generating ensembles of flow based on the seasonal forecast (i.e., Step 2). Model parameter and functional estimation uncertainties can be readily obtained from the standard errors provided by the model [Helsel and Hirsch, 1995]. The model uncertainty is a much more complex problem, as the predictor variables in the model are dependent on the understanding of the system and availability of data. Multimodel ensembles can be employed to capture some of the model structural uncertainty [e.g., Regonda et al., 2006].

[43] As regulations become more stringent and water quality concerns become more prevalent, utility managers will need additional tools to facilitate efficient planning and management. As forecasts continue to improve, the ways in which they can contribute to water management should be exploited; the proposed approach provides an important advance in this endeavor.

Acknowledgments

[44] The authors would like to acknowledge Water Research Foundation project 3132, “Incorporating climate change information in water utility planning: A collaborative, decision analytic approach,” the National Water Research Institute (NWRI) through a NWRI fellowship to the first author, and the U.S. EPA through a STAR fellowship to the first author for partial financial support on this research effort. This publication was developed under a STAR Research Assistance Agreement F08C20433 awarded by the U.S. Environmental Protection Agency. It has not been formally reviewed by the EPA. The views expressed in this document are solely those of the authors, and the EPA does not endorse any products or commercial services mentioned in this publication. The second author is thankful to NCAR for providing a visitor fellowship during the course of this study. NCAR is sponsored by the National Science Foundation. In addition, they thank the staff of the Portland Water Bureau for providing data and useful discussions.
