Significance of trends toward earlier snowmelt runoff, Columbia and Missouri Basin headwaters, western United States



[1] We assess changes in runoff timing over the last 55 years at 21 gages unaffected by human influences, in the headwaters of the Columbia-Missouri Rivers. Linear regression models and tests for significance that control for “false discoveries” of many tests, combined with a conceptual runoff response model, were used to examine the detailed structure of spring runoff timing. We conclude that only about one third of the gages exhibit significant trends with time but over half of the gages tested show significant relationships with discharge. Therefore, runoff timing is more significantly correlated with annual discharge than with time. This result differs from previous studies of runoff in the western USA that equate linear time trends to a response to global warming. Our results imply that predicting future snowmelt runoff in the northern Rockies will require linking climate mechanisms controlling precipitation, rather than projecting response to simple linear increases in temperature.

1. Introduction

[2] Climate warming is of special concern in regions dominated by snowmelt runoff like the northern U.S. Rocky Mountains, where the past ∼50 years have shown warming of ∼1°C and future warming can be anticipated (J. Hansen et al., GISS surface temperature analysis global temperature trends: 2005 summation, 2005). River systems in the water-restricted western United States receive about 60% of annual discharge directly from melting snow [Serreze et al., 2001]. Warming temperatures could logically lead to a greater fraction of precipitation falling as rain and less seasonal water storage in snow, and could shift the spring snowmelt pulse forward in time due to earlier melting of the mountain snowpack. Analyses of regional hydrologic data of the late 20th century have suggested that, in fact, the western North American snowpack has decreased and snowmelt runoff is coming earlier. Mote [2003, 2006] found negative linear trends in April 1 snow water equivalent (SWE) (decreasing 10 to 60%) over the last ∼40–50 years at many snow course sites in the western United States. Stewart et al. [2005] found negative linear trends in several measures of spring runoff timing for the period 1948–2003. For example, spring runoff in streams located in the headwaters of the Columbia and Missouri Rivers shifted earlier by about 6–19 days for the 55 year period of record. Stewart et al. [2005] and McCabe and Clark [2005] attributed their calculated trends toward earlier runoff mostly to temperature increase rather than precipitation trends. The recent report from Working Group I of the Intergovernmental Panel on Climate Change (IPCC) summarized these findings by stating that in western North America earlier stream flows imply peak snow water accumulation has shifted forward by about two weeks since 1950 [Lemke et al., 2007]. Such changes in snowpack and snowmelt runoff could have major implications for water resource management and sustainability of river ecosystems. For example, predictions of major changes in snowpack timing in the Sierra Nevada have led the California Department of Water Resources to propose two new large dams, costing over $4 billion [Boxall, 2007].

[3] Unambiguously detecting change in runoff timing is difficult because time series are short and inter-annual variability is high. Hydrologic trends are rarely obviously linear, and large year-to-year variability and decadal periodicity commonly obscure them, so that small trends in climate must be detected in the presence of large amounts of natural noise [Hulme et al., 1999; Wilby, 2006]. Here we investigate the timing of spring runoff over the last 55 years in the headwaters of the Columbia and Missouri Rivers. Our work differs from previous research in two important ways. First, we use strict criteria to remove records with any evidence of anthropogenic land and water use effects. Second, we assess the correlation between runoff timing and total annual runoff by using statistical methods that account for errors due to multiple tests. Our analysis suggests that snowmelt runoff timing has changed but that it is better explained by changes in discharge than by time alone.

2. Data Selection and Analyses

[4] We used selected discharge data collected at locations within the Columbia and Missouri Rivers headwaters (U.S. Northern Rocky Mountains) from the U.S. Geological Survey Hydro-Climatic Data Network (HCDN). Only those stations with continuous records over water years 1951–2005 were used because these gave the longest continuous record for the largest number of gages. We further restricted stations to those with no identifiable impacts from water storage and irrigation withdrawal. Instead of relying only on the criteria developed for the HCDN gages, we used databases (imported to a GIS) from the USGS, U.S. Army Corps of Engineers, U.S. Department of Agriculture, and state water and agriculture agencies to identify water storage facilities and irrigation withdrawals above each gage [Cannon and Johnson, 2004; Ruddy and Hitt, 1990; J. Watermolen, 1:2,000,000-scale hydrologic units of the United States, version 4.2, 2006; U.S. Geological Survey, National water information system, 2006]. We checked these results by examining the basin above each gage using satellite images and photography (products with <10 m resolution) overlain on digital elevation data. This approach found small dams, irrigation diversions, mines and developments not referenced in the databases. We did not try to establish historical changes in land use, but most acceptable gages showed little or no obvious disturbance in the images (from 1990s to 2005). These stringent selection criteria eliminated many of the HCDN gages in the headwaters. Only 21 were considered suitable for rigorous statistical analysis (Figure 1). All are located in higher elevation headwaters with hydrographs dominated by snowmelt runoff.

Figure 1.

Headwaters of the Columbia and Missouri rivers. Red triangles, dams; green dots, all gages; black circles, gages used in this study.

[5] For each of the 21 stations and each water year from 1951–2005, we calculated the day within the water year (day 1 = Oct 1) at which each percentile of the annual flow occurred (Figure 2). For example, the day of the 50th quantile of flow (50th DQF or day of median flow) is the day that the first 50% of the year's total flow passed a station. Our 50th DQF is similar to the “date of center of mass of flow” (i.e., centroid) [Hodgkins and Dudley, 2006; McCabe and Clark, 2005; Stewart et al., 2005], but is less sensitive to outliers in the flow (auxiliary material). We also did not use the “day of the start of runoff” because this is undefined in typical, complex runoff distributions responding to highly variable spring weather. We present results from three percentiles (25th, 50th and 75th) which represent early, mid and late season flows.
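
The DQF definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name `day_of_quantile_flow` and the ten-day sample record are hypothetical, and the sketch assumes a list of daily flows for one water year, ordered from Oct 1.

```python
from itertools import accumulate


def day_of_quantile_flow(daily_flow, fraction):
    """Return the 1-based water-year day on which cumulative flow
    first reaches `fraction` of the annual total (e.g. 0.50 for the 50th DQF)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(daily_flow)
    if total <= 0:
        raise ValueError("annual flow must be positive")
    target = fraction * total
    for day, cumulative in enumerate(accumulate(daily_flow), start=1):
        if cumulative >= target:
            return day
    return len(daily_flow)  # guard against floating-point round-off


# Hypothetical 10-day "year", for illustration only (not gage data).
flows = [1, 1, 1, 1, 6, 6, 1, 1, 1, 1]
dqf25 = day_of_quantile_flow(flows, 0.25)
dqf50 = day_of_quantile_flow(flows, 0.50)
dqf75 = day_of_quantile_flow(flows, 0.75)
```

Because the measure depends only on when the running total crosses a fixed fraction, a single extreme daily value moves it by at most a few days, which is consistent with the lower outlier sensitivity noted above.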

Figure 2.

Cumulative runoff and DQF time series for gage 7 (Table 1). (a) Cumulative flow vs. water year day; time series grade from pure green (1951) to pure red (2005) in 55 increments. (b) Variability over the study interval of the 10th, 25th, 50th, 75th, and 90th DQFs (from left to right); gray bar is the range of the 50th DQF for the record.

[6] We considered two simple linear regression models, $\mathrm{DQF}_t = \beta_0 + \beta_1\,\mathrm{Year} + \varepsilon_t$ (Time model) and $\mathrm{DQF}_t = \beta_0 + \beta_1 Q_t + \varepsilon_t$ (Discharge model), where $Q_t$ is the annual discharge anomaly, standardized to have mean 0 and standard deviation 1 so that gages with wide ranges in discharge can be compared. For each model, a t-test is used to test whether the true slope, $\beta_1$, differs from 0. These models assume that the errors, $\varepsilon_t$, are independent, normally distributed and have the same variance. Nonparametric techniques could also have been used, but are required over standard linear regression models only when working with time series where serial correlation in the residuals is observed but not modeled, or where there are other concerns about the linear regression assumptions, such as extreme outliers. The only potential outliers observed were small, in the middle of the series and were not high leverage points (i.e., “endpoints”), so would have little effect on the results. There is also little autocorrelation in the residuals and distinctly non-normal residuals were not observed.
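
As a rough sketch of the two regression models, the following Python computes the least-squares slope, its standard error, and the t statistic for a single predictor, plus the standardization applied to discharge. The function names and the five-year sample series are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption; the text does not say which form was used.

```python
import math
from statistics import mean, stdev


def standardize(values):
    """Convert values to anomalies with mean 0 and standard deviation 1."""
    centre = mean(values)
    spread = stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(v - centre) / spread for v in values]


def slope_t_statistic(x, y):
    """Fit y = b0 + b1*x by ordinary least squares.

    Returns (b1, standard error of b1, t statistic for H0: b1 = 0).
    The t statistic has n - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    # Residuals in centered form, which avoids a large intercept term.
    rss = sum((yi - y_bar - b1 * (xi - x_bar)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b1, se, b1 / se


# Hypothetical five-year series, for illustration only (not gage data).
years = [1951, 1952, 1953, 1954, 1955]
dqf50 = [200, 198, 199, 196, 197]
slope, std_err, t_stat = slope_t_statistic(years, dqf50)
```

For the Time model the predictor is the year; for the Discharge model it would be `standardize(annual_discharge)`. Converting the t statistic to a p-value requires the Student t distribution with n − 2 degrees of freedom, which is left to a statistics library here.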

[7] Although the most standard limit for considering a statistical test to be significant is 5% (p < 0.05), we used a 10% significance level (p < 0.10) for comparison to previous studies [Stewart et al., 2004, 2005] that used this less stringent limit. The significance level, $\alpha$, is the probability of falsely rejecting a true null hypothesis on a single test, which here would correspond to detecting a linear time trend when none exists. When doing multiple tests, the probability of incorrectly detecting a significant result for at least one test increases with the number of tests conducted. To account for this we used the method of Benjamini and Hochberg [1995], which controls the “false discovery rate” on average but retains more power to detect significant results. We ordered the $m$ p-values, $p_1 \le p_2 \le \dots \le p_m$, and found the largest integer $J$ such that $p_J \le J\alpha/m$. Any hypothesis whose p-value is less than or equal to $J\alpha/m$ was rejected, so the adjusted significance level for the $m$ tests was $\alpha^* = J\alpha/m$. Confidence intervals were similarly adjusted by modifying the confidence level used for the interval based on the adjusted significance level; confidence level = $1 - \alpha^*$ [Benjamini and Yekutieli, 2005]. By using this adjustment technique, confidence is higher that significant results are real. Failing to correct for multiple tests guarantees some unidentifiable spurious results amongst many tests. Previous studies [e.g., Stewart et al., 2004, 2005], using uncorrected p-values, have not addressed these issues and have a higher risk of finding significant results that are not real. In our study, we performed $m = 126$ tests at $\alpha = 0.10$, which led to $\alpha^*$ of 0.05. In the previous similar studies [Stewart et al., 2004, 2005], $m$ was in the thousands, which would lead to an even smaller adjusted significance level, and so those studies overestimate significant results.
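
The step-up procedure described in this paragraph can be sketched as follows. This is an illustrative implementation of the Benjamini and Hochberg rule as stated in the text, not the authors' code; the function name and the ten sample p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.10):
    """Apply the Benjamini-Hochberg step-up rule.

    Returns (J, alpha_star, rejected), where J is the largest rank with
    p_(J) <= J * alpha / m, alpha_star = J * alpha / m, and `rejected`
    flags each input p-value (in its original order).
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    j = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank * alpha / m:
            j = rank  # keep the largest rank that passes
    alpha_star = j * alpha / m
    rejected = [j > 0 and p <= alpha_star for p in p_values]
    return j, alpha_star, rejected


# Hypothetical p-values, for illustration only.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]
J, alpha_star, rejected = benjamini_hochberg(p_vals, alpha=0.10)
```

In this example the third and fourth p-values exceed their own thresholds (0.03 and 0.04) but are still rejected because the fifth passes, which is what distinguishes the step-up rule from a test-by-test cutoff. Applied to the figures quoted above, $m = 126$, $\alpha = 0.10$ and $\alpha^* = 0.05$ correspond to $J = 63$.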

3. Results

[8] All analyzed gages from the headwaters of the Columbia and Missouri Rivers showed large temporal variability in runoff timing over the 55 years of record (Table 1). For example, in the Flathead River at West Glacier, Montana (gage 7, Table 1, and Figure 2), the day that the first 25% of flow occurred (25th DQF) ranged over 123 days in the 55 year study interval. The 50th DQF ranged over 39 days and the 75th DQF over 27 days. Similarly, flows for any particular date at gage 7 ranged over a large percentage of the total annual flow. April 1 (day 183 on Figure 2) flows were as little as ∼7% of the total flow for the year or as much as ∼37% of the total flow. For all the gages, the timing of the 25th DQF ranges from 66 to 165 days, the 50th DQF from 29 to 93 days and the 75th DQF from 26 to 59 days (Table 1).
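
The water-year day numbering used here (day 1 = Oct 1, so April 1 is day 183 in a non-leap water year) can be checked with a short sketch. The helper name `water_year_day` is hypothetical.

```python
from datetime import date


def water_year_day(d):
    """Return the 1-based day within the water year, where day 1 is Oct 1."""
    start_year = d.year if d.month >= 10 else d.year - 1
    return (d - date(start_year, 10, 1)).days + 1


april_first = water_year_day(date(2005, 4, 1))  # water year 2005, no leap day
```

In a water year that contains Feb 29, April 1 falls one day later in this numbering.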

Table 1. Range in Timing of Flow Over Last 55 Years^a

                    Observed Range (days)      Shift Due to Trend (days)
ID    USGS #        25th    50th    75th       25th        50th        75th
1     13185000      106     41      37         −23 ± 17    −11 ± 8     −8 ± 7
3     12321500      138     45      40         −18 ± 17    −13 ± 8     −10 ± 7
4     13313000      86      42      36         NS          −21 ± 16    −11 ± 9
5     13240000      110     40      39         NS          −9 ± 8      −7 ± 7
6     13337000      127     48      39         NS          −9 ± 8      −9 ± 8
7     12358500      123     39      27         NS          −9 ± 9      NS
8     12332000      90      43      28         −21 ± 21    NS          NS
12    12355500      78      33      28         −19 ± 14    NS          NS
13    12335500      139     93      59         −30 ± 27    NS          NS
15    13235000      121     43      32         NS          −10 ± 9     NS
18    05017500      123     29      36         −18 ± 17    NS          NS
20    13083000      113     72      30         NS          NS          −7 ± 6
21    06191500      85      40      31         NS          −10 ± 7     −8 ± 6

^a Observed range, the range in days over which the DQF ranged for 55 years of records; shift due to trend, the days of change for the 55 years of record determined from the slope of the linear model with Time, ± number of days within the 90% CI; NS, not significant. 25th, 50th and 75th are respective DQFs.

[9] Trends in the 25th, 50th and 75th DQF linear models for Time and Discharge at all 21 gages are summarized in Figure 3. For the 25th DQF, 6 of 21 gages were significant for the Time model, whereas 11 of 21 gages were significant for the Discharge model. For the 50th DQF, the Time model was significant at slightly more gages (8 of 21), whereas 16 of 21 gages were significant for the Discharge model. For the 75th DQF, 7 of 21 gages were significant for Time, while 15 of 21 were significant for Discharge.

Figure 3.

Regression results for two models (Time and Discharge) for three DQFs for each of the 21 gages (Table 1). The abscissa is the value of the trend. Time, days/year; discharge, change in days/standard deviation of total discharge. The error bars are 90% CI. Red squares, significant results; blue circles, insignificant results.

[10] The Time model regressions all showed negative slopes, but only about one third of these were significant. The slopes at these significant gages produce changes that are small when compared to the range of each DQF at each gage (Table 1 and Figure 3), and the confidence intervals correspond to a wide range of potential results, including no or almost no change over time. Discharge regressions show many more significant outcomes, accounting for about half of the gages for the 25th DQF (11 of 21) and roughly three quarters of the gages for the 50th and 75th DQFs (16 and 15 of 21, respectively). All but one of these significant relationships are positive, showing several days of change per standard deviation of flow (Figure 3). These relationships suggest that discharge is a stronger controlling variable than time.

4. Discussion and Conclusion

[11] Considering only the results from the Time model regressions, our results using corrections for multiple statistical tests appear to support the basic assumption from previous work that snowmelt runoff is occurring earlier now than ∼55 years ago: all gages show a negative trend in measures of runoff timing. However, changes are all relatively small within the context of the variability, resulting in a majority of insignificant results. This suggests that many gages in a region may contain similar information that is not significantly linear, but is recording a small linear (or “pseudo-linear”) trend within highly noisy data. This also leads to questions about the practical significance (potential use for projections) of linear time trends that are small compared to the observed range. For example, the 50th DQF slope of gage 7 gives a change of −9 ± 9 days per 55 years, compared to an observed range of 39 days (Table 1). While this is a detection of change, and may be relevant to some ecological processes [Brown et al., 2007; Durance and Ormerod, 2007], it is also important to note that the magnitude of change at all gages has so far been small relative to variability.

[12] Our results suggest that in the headwaters of the Columbia and Missouri rivers, far more gages show significant runoff trends related to discharge than to time. Similarly, an analysis of discharge timing into the Hudson Bay [Dery et al., 2005] found that peak discharge associated with snowmelt advanced by 8 days from 1964 to 2000 in response to decreasing runoff. A potential mechanism for discharge control on runoff timing is revealed by a simple conceptual model (Figure 4). Two annual hydrographs are depicted, one for a high snowmelt runoff year and one for a low snowmelt runoff year. The base flow during the two years is unchanged, but the spring snowmelt pulse of the high year has twice the volume of the low year. Spring runoff initiates on the same day during the two years, but the pulse duration of the low year is one third shorter. The conceptual model therefore assumes that less snow throughout the basin shortens the duration of the snowmelt pulse, but that high elevation snow cannot run off until sufficient seasonal warming has occurred. We set the high-year pulse duration to 135 days, and calculated the resulting time shift of percentiles between the high and low years. Although actual basin processes are oversimplified, this demonstration shows a forward shift of 19 days (14%) of the 50th DQF during the low year due to runoff volume alone. This relationship between discharge and timing will hold for other measures of runoff timing, including the “first day of snowmelt” and “centroid”, because the algorithms used to select those measures utilize the mean flow [Cayan et al., 2001]. Hence, changes in runoff alone will affect any analyses of runoff timing, with higher flows producing “later” runoff and lower flows producing “earlier” runoff, exactly the relationship we see in our analyses of Columbia-Missouri headwater gages.
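
The mechanism in this conceptual model can be illustrated with a small numerical sketch. Only three features are taken from the text: equal base flow in both years, a high-year pulse with twice the volume and a 135-day duration, and a low-year pulse one third shorter (90 days) starting on the same day. Everything else is an assumption made for illustration: the triangular pulse shape, the base flow level, the start day (200), the pulse volumes, and all function names. The shifts it produces therefore need not match the 19 days reported above.

```python
from itertools import accumulate

DAYS_IN_YEAR = 365


def day_of_quantile_flow(daily_flow, fraction):
    """1-based day on which cumulative flow first reaches `fraction` of the total."""
    target = fraction * sum(daily_flow)
    for day, cumulative in enumerate(accumulate(daily_flow), start=1):
        if cumulative >= target:
            return day
    return len(daily_flow)


def hydrograph(pulse_volume, pulse_days, start_day=200, base_flow=1.0):
    """Constant base flow plus a symmetric triangular snowmelt pulse.

    The triangular shape, start_day and base_flow are assumptions.
    """
    half = pulse_days / 2.0
    weights = []
    for day in range(1, DAYS_IN_YEAR + 1):
        offset = day - start_day
        if 0 <= offset <= pulse_days:
            weights.append(1.0 - abs(offset - half) / half)
        else:
            weights.append(0.0)
    scale = pulse_volume / sum(weights)  # normalize pulse to the requested volume
    return [base_flow + scale * w for w in weights]


# High year: twice the pulse volume, 135-day pulse. Low year: 90-day pulse.
high_year = hydrograph(pulse_volume=730.0, pulse_days=135)
low_year = hydrograph(pulse_volume=365.0, pulse_days=90)

# Positive values mean the low-flow year reaches the percentile earlier.
shift_days = {
    q: day_of_quantile_flow(high_year, q / 100) - day_of_quantile_flow(low_year, q / 100)
    for q in (25, 50, 75)
}
```

Under these assumptions the low-flow year reaches each of the three percentiles earlier than the high-flow year even though melt begins on the same day, which is the qualitative behavior the conceptual model is meant to show.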

Figure 4.

Conceptual snowmelt model. Green line, “low flow”; blue line, “high flow”; circles, 25th DQF; triangles, 50th DQF; squares, 75th DQF.

[13] Recent work has suggested that ongoing climate warming may soon impact the frequency, severity, and duration of coupled ocean-atmosphere phenomena [Hansen et al., 2006]. Because precipitation and streamflows in the northwestern United States have been closely tied to conditions in the Pacific Ocean [Beebee and Manga, 2004; Cayan et al., 1999; Gobena and Gan, 2006], a consequence of future warming could be nonmonotonic time-shifts of snowmelt runoff associated with major changes in discharge, instead of a simple, monotonic response to global warming.