On the stationarity of annual flood peaks in the continental United States during the 20th century



[1] Annual peak discharge records from 50 stations in the continental United States with at least 100 years of record are used to investigate stationarity of flood peaks during the 20th century. We examine temporal trends in flood peaks and abrupt changes in the mean and/or variance of flood peak distributions. Change point analysis for detecting abrupt changes in flood distributions is performed using the nonparametric Pettitt test. Two nonparametric (Mann-Kendall and Spearman) tests and one parametric (Pearson) test are used to detect the presence of temporal trends. Generalized additive models for location, scale, and shape (GAMLSS) are also used to parametrically model the annual peak data, exploiting their flexibility to account for abrupt changes and temporal trends in the parameters of the distribution functions. Additionally, the presence of long-term persistence is investigated through estimation of the Hurst exponent, and an alternative interpretation of the results in terms of long-term persistence is provided. Many of the drainage basins represented in this study have been affected by regulation through systems of reservoirs, and all of the drainage basins have experienced significant land use changes during the 20th century. Despite the profound changes that have occurred to drainage basins throughout the continental United States and the recognition that elements of the hydrologic cycle are being altered by human-induced climate change, it is easier to proclaim the demise of stationarity of flood peaks than to prove it through analyses of annual flood peak data.

1. Introduction

[2] Numerous studies have investigated the presence of temporal trends in streamflow records in the continental United States. U.S. Geological Survey (USGS) [2005, p. 2] reported that “streamflow has been increasing in the United States since at least 1940.” Most of these increases were found to occur for low to moderate streamflows [e.g., Lettenmaier et al., 1994; Gebert and Krug, 1996; Lins and Slack, 1999; Douglas et al., 2000; McCabe and Wolock, 2002; Garbrecht et al., 2004; Small et al., 2006; Kalra et al., 2008]. For annual maximum daily discharge and flood peaks, it is difficult to draw general conclusions. Changnon and Kunkel [1995] found an upward trend in floods in the upper Midwest, while Gebert and Krug [1996] and Juckem et al. [2008] found a decrease in annual flood peaks for stream gauge stations in the Driftless Area of Wisconsin (southwestern part of the state). Groisman et al. [2001a] and Groisman et al. [2001b] found an increasing trend for high discharge, particularly in the eastern part of the continental United States. Other studies, however, did not find significant trends in annual maximum discharge [e.g., Lins and Slack, 1999; Douglas et al., 2000; McCabe and Wolock, 2002; Small et al., 2006].

[3] The ambiguous nature of temporal trends in flood records and, more generally, questions about assumptions of stationarity, are tied to physical processes associated with flood production, sample properties of the flood records and statistical procedures that are used to infer distributional properties of flood series. Sample properties can impose major limitations on statistical procedures that attempt to infer temporal changes in flood peak distributions. In the majority of studies in the literature, inferences are based on records that are much shorter than 100 years. A central element of this study is the examination of temporal changes in flood peak distributions based on some of the longest records in the United States.

[4] In our study we use annual maximum instantaneous peak discharge data from 50 U.S. Geological Survey (USGS) stations in the continental United States with a record of at least 100 years [e.g., Blanchard, 2007; U.S. Geological Survey, A new evaluation of the USGS streamgaging network: A report to Congress, 1998, available at http://water.usgs.gov/streamgaging/report.pdf]. Drainage areas for the stations included in this study range from 420 km2 (Weber River near Oakley, Utah) to 1,805,222 km2 (Mississippi River at St. Louis, Missouri). Many of the drainage basins included in this study, like many of the drainage basins in the United States as a whole, are affected by regulation from systems of reservoirs. The downstream impact of dams [e.g., Williams and Wolman, 1984; Graf, 1999, 2006] is a central element of flood peak distributions for many stream gauging stations in the United States and around the world. Similarly, the 20th century was a time of profound land use changes involving agricultural practice, urbanization and forest management; changing land use has significant impacts on the hydrologic cycle [e.g., Leopold, 1968; Graf, 1977; Sauer et al., 1983; Potter, 1991; Smith et al., 2002; Zhang and Schilling, 2006]. In addition to changing composition of the land surface, human-induced climate change can potentially have significant impacts on flood peak distributions [e.g., Groisman et al., 2001a; Garbrecht and Piechota, 2006].

[5] The aim of this study is the investigation of the validity of the stationarity assumption. In this work, we define as stationary a hydrologic time series that “is free of trends, shifts, or periodicity (cyclicity)” [Salas, 1993, p. 19.5]. For an extensive discussion of the notions of stationarity and nonstationarity in hydrology, consult Matalas [1997] and Koutsoyiannis [2006]. The most common ways to check the validity of this assumption are to test for the presence of slowly varying changes (trend analysis) or of change points (i.e., the occurrence of abrupt changes in the mean and/or the variance of the distribution of the variable of interest). The main difference between the two analyses is that when a trend is detected, it is likely to continue in the future. On the other hand, the presence of a change point highlights the shift from one regime to another, and the status is likely to remain the same until a new regime shift occurs.

[6] Studies investigating the validity of the stationarity assumption in the continental United States typically focus on either change point analysis [e.g., Perreault et al., 1999; Rasmussen, 2001] or trend analysis [e.g., Lettenmaier et al., 1994; Lins and Slack, 1999]. When change point and trend analyses are both performed [e.g., McCabe and Wolock, 2002; Kalra et al., 2008; Miller and Piechota, 2008], the change point analysis follows the trend analysis, and not vice versa: the information from the change point analysis is not used to divide the time series into two subseries (before and after the change point) on which the trend analysis is then performed separately.

[7] This approach may lead to misleading results. Consider for instance the example in Figure 1, where we generated two samples of size 50; the first (last) 50 values were drawn from a normal distribution with mean equal to 10 (12) and variance equal to 2. Performing trend analysis without considering the presence of the change point would result in the detection of a statistically significant increasing trend (solid grey line). However, after accounting for the change point, the two subseries do not present statistically significant slopes. In this study, trend analysis follows change point analysis. In most previous studies, these analyses are conducted by using one or two tests to identify the presence of temporal trends or change points. Employing the results of different tests provides a more reliable and robust indication of the presence of changes over time.

Figure 1.

Example showing the impact of a change point on the trend analysis. A simulated series of 100 independent normal variates is shown. The first (last) 50 realizations were generated from a normal distribution with mean equal to 10 (12) and variance equal to 2. The solid grey line represents the result from trend analysis neglecting the presence of the change point. The black lines highlight the lack of a trend when the change point is accounted for.
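The Figure 1 experiment can be reproduced with a short simulation. This is a hypothetical sketch, not the authors' code: the seed, the random generator, and the use of ordinary least squares as the trend estimator are our own choices.

```python
import numpy as np
from scipy import stats

# Hypothetical reconstruction of the Figure 1 experiment: 100 independent
# normal variates, mean 10 for the first 50 values and 12 for the last 50,
# variance 2 throughout.
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(10, np.sqrt(2), 50),
                    rng.normal(12, np.sqrt(2), 50)])
t = np.arange(100)

# Trend analysis on the full series: the change point masquerades as a trend.
full = stats.linregress(t, x)

# Trend analysis on each subseries after accounting for the change point;
# the individual slopes are typically not significant.
before = stats.linregress(t[:50], x[:50])
after = stats.linregress(t[50:], x[50:])

print(f"full series:  slope = {full.slope:+.4f}, p = {full.pvalue:.2e}")
print(f"before shift: slope = {before.slope:+.4f}, p = {before.pvalue:.3f}")
print(f"after shift:  slope = {after.slope:+.4f}, p = {after.pvalue:.3f}")
```

The full-series regression mistakes the step change for a significant upward trend, which is exactly the artifact that motivates performing change point analysis before trend analysis.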

[8] Change point analysis is generally performed only to detect changes in the mean, even though change points in the variance may have a significant impact on the extremes [e.g., Mearns et al., 1984; Katz and Brown, 1992; Meehl et al., 2000]. In this study we perform change point analysis for both mean and variance. As described in section 3, five tests for change point detection were examined and the nonparametric Pettitt test [Pettitt, 1979] was selected for analysis of the annual flood peak records.

[9] Additionally, we parametrically model the streamflow data for all the stations independently of the validity of the stationarity assumption (see Khaliq et al. [2006] for a discussion about the subject) by means of a generalized additive model for location, scale, and shape (GAMLSS) [Rigby and Stasinopoulos, 2005]: GAMLSS models are flexible enough to accommodate the presence of abrupt changes in the mean and/or variance as well as temporal trends in the variable of interest. The use of these models provides additional evidence concerning the presence (or absence) of abrupt and/or slowly varying changes [e.g., Zhang et al., 2004].

[10] Finally, an element that is often overlooked in analyses of stationarity of flood records is the presence of long-term persistence in the flood series [e.g., Hurst, 1951; Potter, 1976]. As recently discussed by Koutsoyiannis [2006], the concepts of stationarity, persistence and scaling should be analyzed jointly [see also Potter, 1976]. In particular, some of the patterns observed in hydrologic series could be better explained by accounting for long-term persistence [e.g., Koutsoyiannis, 2002, 2003, 2006; Koutsoyiannis and Montanari, 2007]. Therefore, in this study we also investigate whether the behavior observed in the annual maximum peak discharge could be better explained in terms of long-term persistence.

[11] The paper is organized as follows. In section 2 we describe the data. In section 3 we discuss long-term persistence and GAMLSS models, together with a brief description of change point and trend tests used in this study. Major results of our analyses are presented in section 4. A summary and conclusions are presented in section 5.

2. Data

[12] The results of this study are based on data from 50 USGS stream gauging stations (Table 1) in the continental United States (Figure 2), each with at least 100 years of measurements of annual maximum instantaneous peak discharge. As shown in Figure 2, there is a concentration of catchments in the eastern United States. Fewer catchments are located in the Midwest, Rocky Mountains, and West Coast. The drainage areas of the selected basins range from 420 km2 to 1,805,222 km2, and these records can provide a valuable indication of the presence or absence of statistically significant trends across the continental United States.

Figure 2.

Map showing the location of the USGS stream gauge stations with a record of at least 100 years included in this study. The southeast (northwest) corner has coordinates 24.20°N–66.85°W (49.55°N–124.95°W).

Table 1. Summary of the Characteristics of the Basins Considered in This Study
USGS ID    State  River Name              Drainage Area (km2)  Number of Observations
02489000   MS     Pearl near Columbia     14,815               102
02492600   LA     Pearl at Pearl River    21,999               102
03051000   WV     Tygart Valley           1,052                100
03270500   OH     Great Miami             6,503                115
03373500   IN     East Fork White         12,761               104
05054000   ND     Red at Fargo            17,612               106
05079000   MN     Red Lake                13,649               103
06714000   CO     South Platte            10,000               110
06724000   CO     St. Vrain               559                  104
07096000   CO     Arkansas at Canon City  8,073                116
07144300   KS     Arkansas at Wichita     104,869              110
14048000   OR     John Day                19,632               101

[13] Important features of the data are illustrated in time series plots (Figure 3) for the Connecticut River at Hartford (USGS ID 01190070; 27,161 km2), the Potomac River at Point of Rocks (USGS ID 01638500; 24,996 km2), the Great Miami River at Dayton (USGS ID 03270500; 6,503 km2), the South Platte River at Denver (USGS ID 06714000; 10,000 km2), and the Columbia River at The Dalles (USGS ID 14105700; 613,827 km2). Annual flood peaks at these sites are typical of the 50 stations in reflecting a strong influence of reservoir control, extreme flood peaks and a diversity of storm climatologies that control the flood peak distribution.

Figure 3.

Annual maximum peak unit discharge for (a) the Connecticut River at Hartford, (b) the Potomac River at Point of Rocks, (c) the Great Miami River at Dayton, (d) the South Platte River at Denver, and (e) the Columbia River at The Dalles.

[14] The Connecticut River station at Hartford (Figure 3a) has the longest continuous record of the 50 stations, with observations extending back to 1838. The upper tail of flood peaks is dominated by two events. The March 1936 flood, which had a unit discharge of 0.33 m3 s−1 km−2, was a rain-on-snow event that produced the flood of record for many of the major Atlantic and Ohio River drainages [e.g., Miller, 1990] and stimulated one of the major periods of restructuring river basins in the eastern United States, through the Flood Control Act of 1936 [e.g., Billington and Jackson, 2006]. The second largest flood peak (0.26 m3 s−1 km−2) was from the hurricane of September 1938 (the “Long Island Express”) [e.g., Emanuel, 2005] which devastated Long Island and much of New England. The 1938 flood peak in the Connecticut River illustrates the central role of tropical cyclones as flood agents over the entire eastern United States, from Florida through New England. Miller [1990] presents a “scaling” paradigm for flood peaks in the eastern United States, in which flood peaks at the largest drainage areas (the major East Coast rivers) are dominated by winter/spring extratropical systems, like the March 1936 event, with landfalling tropical cyclones, like the September 1938 storm, dominant at somewhat smaller scales. The largest flood peaks at basin scales smaller than 259 km2, which are not represented in the stream gauging record of stations with more than 100 years of observations, are produced by organized thunderstorm systems. The Connecticut River is typical of most river basins of its size in the eastern United States in containing a large number of major dams and reservoirs. Regulation of the river has extended over the entire 170 year record, with regulation increasing following the March 1936 flood.

[15] The Potomac River is unusual for a river of its size in the United States because of the relatively minor degree of regulation by reservoirs. The Jennings Randolph Dam, which is located in the West Virginia headwaters of the basin, controls less than 2% of the catchment. Completion of the Jennings Randolph Dam in 1981 marked the end of construction of major federal dams in the United States. The middle 50 years of the 20th century, from 1925 to 1975, comprise the dam-building era of the United States, stimulated by Depression-era public works projects, increasing demand for hydropower, and the societal impacts of major floods like the March 1936 flood [e.g., Billington and Jackson, 2006]. As in the Connecticut River, the March 1936 flood provides the flood of record for the Potomac River (Figure 3b), with a significantly larger unit discharge peak (0.54 m3 s−1 km−2) from a comparable drainage area. Also like the Connecticut River, near-record flood peaks are associated with tropical cyclones. The second largest peak in the Potomac River was from the unnamed tropical storm of October 1942 and the third largest peak was from Hurricane Agnes in June 1972. The Potomac River basin is typical of watersheds throughout the eastern United States in having experienced near-total deforestation in the late 19th and early 20th century [Bonan, 1999; Findell et al., 2007; Steyaert and Knox, 2008]. Agricultural practice has evolved in many parts of the United States from the highly erosive farming of the early 20th century to modern farming practices that followed the initial introduction of major soil conservation programs in the 1930s. Despite minor regulation by reservoirs and changing land use practices, the Potomac River at Point of Rocks provides one of the most “natural” annual peak records for a river of its size in the United States.

[16] The striking feature of the annual peak record of the Great Miami River is the flood of March 1913 (Figure 3c), which resulted in more than 360 fatalities [e.g., Henry, 1913; Bock, 1918]. The 1.09 m3 s−1 km−2 unit discharge peak dwarfs the March 1898 peak of 0.40 m3 s−1 km−2, which ranks second in the 115 year record. Like the March 1936 flood, the 1913 Great Miami River flood had a transformative impact on river regulation, both in the Great Miami River and throughout the United States. Design of the flood control program for the “Miami Conservancy” [e.g., Morgan, 1916; Bock, 1918] centered on a system of headwater flood control reservoirs and was intended to prevent catastrophic flooding from events like the March 1913 storm. The Miami Conservancy “design” was translated to the Tennessee Valley Authority (TVA) by Arthur Morgan (the architect of the Miami Conservancy and first Chairman of the TVA) and broadly adopted by the U.S. Army Corps of Engineers (COE) for multipurpose river basin development throughout the eastern United States [e.g., Billington and Jackson, 2006].

[17] An important feature of the March 1913 flood peak in the Miami River, like many flood peaks in the annual peak record, is that it was computed using “indirect” discharge estimation methods [e.g., Benson, 1962]. The error structure of annual flood peaks depends on a wide variety of factors that can be broadly viewed in terms of the two basic methods of estimation, computation from stage-discharge rating curves and computation by “indirect methods” using surveyed high-water marks and hydraulic models [e.g., Potter and Walker, 1981]. Equipment, procedures and personnel used for implementing these methods are variables that affect the annual peak record and their changing properties over time [e.g., Benson, 1962; House and Pearthree, 1995; Blanchard, 2007].

[18] The South Platte River basin above Denver extends from the peaks of the Front Range of the Rocky Mountains at elevations exceeding 4,270 m to the outlet of the catchment at 1,550 m. The resulting flood regime (Figure 3d) is diverse, including a mix of snowmelt floods during May and June, which are concentrated in the lower portion of the flood peak distribution, and late summer peaks from orographic thunderstorm systems linked to the North American Monsoon (55 of the 110 annual peaks occurred during July and August; see Higgins et al. [1997] for discussion of warm season rainfall in the United States and the North American Monsoon). The flood of record for the South Platte River (Figure 3d) occurred on 17 June 1965 and its magnitude (0.11 m3 s−1 km−2) was nearly twice that of the second largest peak for the station. Nonetheless, the flood peak in South Platte River was modest in comparison to flood peaks in the adjacent Jimmy Camp Creek (unit discharge peak of 25.08 m3 s−1 km−2 at 140 km2 drainage area) and Bijou Creek (unit discharge peak of 3.70 m3 s−1 km−2 at 3569 km2 drainage area), for 16–17 June 1965, which are among the largest unit discharge peaks in the United States [Costa, 1987a, 1987b; O'Connor and Costa, 2004b].

[19] The 16–17 June 1965 storms were orographic thunderstorm systems [Schwarz, 1967; Javier et al., 2007] which produced catastrophic rainfall over the Plains region east of the Front Range, including a portion of the South Platte River above Denver. Rainfall and flooding from the June 1965 storms were exceedingly rare, but not unprecedented for the region [see Follansbee and Jones, 1922; Follansbee and Sawyer, 1948]. The “climatology” of flood peaks in the South Platte River basin represents a mixture of flood-generating mechanisms linked to snowmelt, heavy rainfall during the North American monsoon, and rare orographic thunderstorm events that dominate the upper tail of the flood peak distribution.

[20] The gauging record for the Columbia River at The Dalles represents a large river basin for which the annual peak record is driven by snowmelt floods (Figure 3e). It is also a basin in which regulation by hydropower dams has become a dominant feature of the flood hydrology of the region. The unit discharge of 0.06 m3 s−1 km−2 for the June 1894 flood (Figure 3e) is one of the largest in the world for basins with a drainage area greater than 518,000 km2 [e.g., O'Connor and Costa, 2004a]. Snowmelt floods in the Columbia River basin are typically complicated meteorological events that depend on a sequence of extratropical systems that both create the snowpack and contribute to flooding through rapid melt and rain-on-snow processes. The system of hydropower dams in the Columbia River basin has evolved during the 20th century, along with the operating rules for the dams, which adapt to the growing demand for power.

[21] In the following sections, we examine trends and abrupt changes in flood records during the 20th century for the 50 stations listed in Table 1. The features of annual peak records for the five stations illustrated in Figure 3 provide context for interpretation of the results.

3. Methodology

[22] In this section we present an overview of the different tests and estimators used to evaluate the presence of change points, temporal trends [see, e.g., Helsel and Hirsch, 1993; Kundzewicz and Robson, 2000, 2004], and long-term persistence. We also briefly introduce the GAMLSS models.

3.1. Change Point Analysis

[23] We considered a number of different tests for change point analysis and chose the Pettitt test (see additional discussion below) for the analyses presented in section 4. The Pettitt test [Pettitt, 1979] is a nonparametric (rank-based) test that allows detection of changes in the mean (median) when the time of the change is unknown. This test is based on a version of the Mann-Whitney statistic for testing whether two samples X1, …, Xm and Xm+1, …, Xn come from the same population. The p value of the test statistic can be computed using the approximate limiting distribution derived by Pettitt [1979], valid for continuous variables.

[24] The five methods that were considered for change point analysis included two nonparametric tests (the Pettitt and CUSUM tests) [e.g., Pettitt, 1979; Buishand, 1982], one semiparametric test (Guan) [Guan, 2004], and two parametric tests (the Rodionov and Bayesian change point tests) [Rodionov, 2004; Erdman and Emerson, 2007]. To select one change point test, we used 47 additional USGS stations with at least 100 years of observations and with a USGS qualifying code 5 (“Discharge affected to unknown degree by Regulation or Diversion”) or 6 (“Discharge affected by Regulation or Diversion”). We assumed that the year in which the USGS reported changes in discharge due to human intervention marked the abrupt change. Where applicable, we used a 5% significance level. We first discarded the Rodionov test since it returned multiple change points for a given time series. Even though it is possible that multiple change points are present in a series, segmenting the record into multiple subseries would limit our capability to perform meaningful trend analysis. Among the remaining four tests, the results based on the Pettitt test were in the closest agreement with the times of change reported by the USGS. Compared to the other tests, additional advantages of the Pettitt test are that (1) it works on the ranks, making it less sensitive to outliers and skewed distributions (as in this case), and (2) the significance of the test is obtained from the approximate limiting distribution derived by Pettitt [1979] (unlike, e.g., the CUSUM test, for which bootstrap analysis is required (W. A. Taylor, Change-point analysis: A powerful new tool for detecting changes, 2000, available at http://www.variation.com/cpa/tech/changepoint.html)).

[25] In general, most change point tests are designed to detect changes in the mean, and fewer methods can identify shifts in the variance [e.g., Perreault et al., 2000; Rodionov, 2005]. As mentioned above, however, changes in the variability of the series can have a strong impact, especially on extreme values. In this study, changes in variance are tested by applying the Pettitt test to the squared residuals (in agreement with the suggestions by Pegram [2000]).
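The Pettitt statistic and its approximate p value can be written in a few lines. The following is an illustrative sketch of the test as defined above, not the implementation used in the study; the synthetic series and the use of deviations from the global mean as "residuals" for the variance version are our own assumptions.

```python
import numpy as np

def pettitt_test(x):
    """Pettitt (1979) rank-based test for a single change point: returns the
    0-based index closing the first segment, the statistic K = max|U_t|, and
    the approximate p value 2*exp(-6 K^2 / (n^3 + n^2)) for continuous data."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sgn = np.sign(x[:, None] - x[None, :])          # pairwise sign(x_i - x_j)
    # U_t = sum_{i <= t} sum_{j > t} sign(x_i - x_j)
    u = np.array([sgn[: t + 1, t + 1 :].sum() for t in range(n - 1)])
    k = np.abs(u).max()
    tau = int(np.abs(u).argmax())                   # estimated change point
    p = min(1.0, 2.0 * np.exp(-6.0 * k**2 / (n**3 + n**2)))
    return tau, k, p

# Synthetic example (our assumption): a shift in the mean after 60 values.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(10, 1, 60), rng.normal(13, 1, 60)])
tau, k, p = pettitt_test(x)
print(f"change point index = {tau}, K = {k:.0f}, p = {p:.2e}")

# Shift in the variance: apply the same test to the squared residuals,
# following the Pegram (2000) suggestion described in the text.
tau_v, k_v, p_v = pettitt_test((x - x.mean()) ** 2)
```

Because the statistic works on pairwise sign comparisons (i.e., ranks), outliers and the skewness typical of flood peak records have limited influence on the result.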

3.2. Temporal Trend Analysis

[26] Detection of temporal trends in annual maximum peak discharge is based on one parametric (Pearson's r), and two nonparametric (Mann-Kendall and Spearman's rho) tests [e.g., Helsel and Hirsch, 1993; McCuen, 2002]. The use of different tests provides a clearer indication of the presence or absence of a trend [e.g., Zhang et al., 2004].

[27] Pearson's test is a parametric test based on the significance of Pearson's product moment correlation coefficient. Given a sample of size n from a random variable X and a covariate T, Pearson's r is computed by dividing the covariance Cov(X, T) by the product of the standard deviations of X and T. In order to obtain unbiased results [e.g., Kowalski, 1972], this test assumes that X is Gaussian distributed. In this study, we perform the normal quantile transformation to transform our skewed data into Gaussian-distributed deviates [e.g., Kundzewicz and Robson, 2004] (notice that this transformation may distort trends in the data).
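As a hedged sketch of this procedure, the normal quantile transformation can be implemented through the ranks, after which Pearson's test is applied to the transformed values. The i/(n + 1) plotting position and the synthetic skewed series are our own assumptions.

```python
import numpy as np
from scipy import stats

def normal_quantile_transform(x):
    """Map a sample to Gaussian deviates through its ranks; the i/(n + 1)
    plotting position is an assumption of this sketch."""
    ranks = stats.rankdata(x)
    return stats.norm.ppf(ranks / (len(x) + 1))

# Synthetic skewed "annual peak" series with an upward trend (illustrative).
rng = np.random.default_rng(1)
t = np.arange(120)
x = rng.gamma(shape=2.0, scale=50.0, size=120) + 1.5 * t

z = normal_quantile_transform(x)       # approximately Gaussian deviates
r, p = stats.pearsonr(z, t)
print(f"Pearson r on transformed data: {r:.3f} (p = {p:.2e})")
```

Since the transformation only preserves the ordering of the data, monotonic trends survive it, but, as noted in the text, the shape of a trend may be distorted.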

[28] The Mann-Kendall test [Mann, 1945; Kendall, 1975] is the most widely used statistical test to detect the presence of temporal trends in discharge data. It measures the association between X and T as the proportion of concordant pairs minus the proportion of discordant pairs in the samples. Two bivariate observations, (Xi, Ti) and (Xj, Tj), are called concordant whenever the product (Xi − Xj)(Ti − Tj) is positive, and discordant when the same product is negative. Kendall's statistic S is the difference between the numbers of concordant and discordant pairs. If X and T are independent and randomly ordered, S has a mean of zero and a variance that is a function of the sample size.

[29] One of the problems associated with the Mann-Kendall test is that the results can be affected by the presence of serial and cross correlations [e.g., Hirsch et al., 1982; Kulkarni and von Storch, 1995; Hamed and Rao, 1998; Yue et al., 2003]. Even though the use of annual maxima should avoid the issue of serial correlation, we compute the autocorrelation function and check whether the lag 1 value is significantly different from 0. If the lag 1 autocorrelation is not significant, we apply the Mann-Kendall test to the series. If a change point is detected, the autocorrelation function is computed separately on the series before and after it.
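A minimal sketch of the Mann-Kendall statistic and the lag 1 autocorrelation screening described above; the ±z(1 − α/2)/√n significance bound for the lag 1 autocorrelation and the synthetic series are our own assumptions, not the study's exact recipe.

```python
import numpy as np
from scipy import stats

def mann_kendall(x):
    """Mann-Kendall trend test against time (no tie correction; the series
    is assumed continuous). Returns S, the normal score z, and the p value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # S = (number of concordant pairs) - (number of discordant pairs)
    s = np.sign(x[None, :] - x[:, None])[np.triu_indices(n, k=1)].sum()
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - np.sign(s)) / np.sqrt(var_s)           # continuity correction
    return s, z, 2.0 * stats.norm.sf(abs(z))

def lag1_autocorr_significant(x, alpha=0.05):
    """Large-sample check of the lag 1 autocorrelation against the usual
    +/- z(1 - alpha/2)/sqrt(n) bounds (an assumption of this sketch)."""
    xd = np.asarray(x, dtype=float) - np.mean(x)
    r1 = (xd[:-1] * xd[1:]).sum() / (xd**2).sum()
    bound = stats.norm.ppf(1 - alpha / 2) / np.sqrt(len(xd))
    return abs(r1) > bound, r1

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 110) + 0.03 * np.arange(110)   # synthetic trending series

sig, r1 = lag1_autocorr_significant(x)
s, z, p = mann_kendall(x)   # in the study, applied only when r1 is not significant
print(f"lag 1 r1 = {r1:+.2f} (significant: {sig}); S = {s:.0f}, z = {z:+.2f}, p = {p:.1e}")
```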

[30] Like the Mann-Kendall test, the Spearman test [e.g., Helsel and Hirsch, 1993] is nonparametric. Given bivariate observations (Xi, Ti), i = 1, 2, …, n, the original values are replaced with their ranks in the sample, (rank(Xi), rank(Ti)). Spearman's estimator corresponds to Pearson's correlation coefficient computed on the ranks. Serial independence in time is again required. As discussed by Yue et al. [2002], the power (defined as the probability of correctly rejecting the null hypothesis when it is false) of the Mann-Kendall and Spearman tests in detecting trends is very similar.
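A short illustration of the equivalence noted above, i.e., that Spearman's estimator is Pearson's correlation coefficient computed on the ranks (the synthetic data are our own assumption):

```python
import numpy as np
from scipy import stats

# Synthetic skewed series with an upward trend (illustrative assumption).
rng = np.random.default_rng(3)
t = np.arange(100)
x = rng.gamma(shape=2.0, scale=50.0, size=100) + 1.5 * t

# Spearman's rho is Pearson's correlation coefficient computed on the ranks.
rho, p = stats.spearmanr(x, t)
r_on_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(t))
print(f"rho = {rho:.4f}, Pearson on ranks = {r_on_ranks:.4f}, p = {p:.1e}")
```

For continuous data without ties, the two computations coincide to floating point precision.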

3.3. Generalized Additive Models for Location, Scale, and Shape (GAMLSS)

[31] Given the apparent nonstationary behavior of hydrologic processes over multidecadal time scales [e.g., Milly et al., 2008], new models able to dynamically capture the evolution of the probability density function in time should be implemented [e.g., Cox et al., 2002]. In this study we assume a parametric distribution for the response variable X and model the parameters of the distribution as functions of an explanatory variable (time t). The generalized additive models for location, scale, and shape (GAMLSS) proposed by Rigby and Stasinopoulos [2005] provide a flexible choice compared, for instance, with classical generalized additive models (GAM) [e.g., Hastie and Tibshirani, 1990], generalized linear models (GLM) [e.g., Nelder and Wedderburn, 1972; McCullagh and Nelder, 1989], generalized linear mixed models (GLMM) [e.g., McCulloch, 1997, 2003], and generalized additive mixed models (GAMM) [e.g., Fahrmeir and Lang, 2001].

[32] In GAMLSS models the assumption that X follows a distribution from the exponential family (e.g., Gaussian distribution) is relaxed, allowing for a general distribution function (e.g., highly skewed and/or kurtotic continuous and discrete distributions). The systematic part makes it possible to model not only the location parameter (related to the mean), but also the scale and shape parameters (related to dispersion, skewness, and kurtosis) of the distribution of X as linear and/or nonlinear, parametric and/or additive nonparametric functions of explanatory variables and/or random effects [Rigby and Stasinopoulos, 2005; Stasinopoulos and Rigby, 2007]. A GAMLSS model assumes that independent random variables Xi, for i = 1, …, n, have distribution function FX(xi; θi), with θi = (θi1, …, θip) a vector of p distribution parameters accounting for location, scale, and shape. Usually p is less than or equal to four, since one-, two-, three-, and four-parameter families guarantee enough flexibility for most applications. The distribution parameters are related to the design matrix of explanatory variables, ti, by monotonic link functions gk(·), for k = 1, …, p. For a more comprehensive discussion of theory, model fitting, and selection, the reader is referred to Rigby and Stasinopoulos [2005, and references therein] and Stasinopoulos and Rigby [2007, and references therein].

[33] In this study we consider four different two-parameter distributions widely used in modeling of streamflow data, Gumbel, gamma, lognormal, and Weibull (Table 2) [e.g., Stedinger et al., 1993; El Adlouni et al., 2008], where one of the parameters, θ1, can be related to the mean and one, θ2, to the variance. The evaluation of temporal trends is performed by considering models in which the parameters θ1 and θ2 of the distributions are a linear function of time t (the only explanatory variable considered here) through proper link functions (see Hall and Tajvidi [2000] and Ramesh and Davison [2002] for a discussion about the presence of trends different from the linear one):

g1(θ1i) = β10 + β11 ti
g2(θ2i) = β20 + β21 ti

for i = 1,…, n, where β1 and β2 denote the vectors of coefficients of the linear models. Note that even though in this study we focus on the dependence of θ1 and θ2 on time, other covariates could be used as well.

Table 2. Summary of the Four Two-Parameter Distributions Considered in This Study to Model the Annual Maximum Peak Discharge
Gumbel:
  fY(y | θ1, θ2) = (1/θ2) exp{−(y − θ1)/θ2 − exp[−(y − θ1)/θ2]},  −∞ < y < ∞, −∞ < θ1 < ∞, θ2 > 0
  E[Y] = θ1 + γθ2 ≅ θ1 + 0.57722 θ2;  Var[Y] = π^2 θ2^2/6 ≅ 1.64493 θ2^2

Weibull:
  fY(y | θ1, θ2) = (θ2/θ1)(y/θ1)^(θ2 − 1) exp[−(y/θ1)^θ2],  y > 0, θ1 > 0, θ2 > 0
  E[Y] = θ1 Γ(1/θ2 + 1);  Var[Y] = θ1^2 {Γ(2/θ2 + 1) − [Γ(1/θ2 + 1)]^2}

Gamma:
  fY(y | θ1, θ2) = y^(1/θ2^2 − 1) exp[−y/(θ1 θ2^2)] / [(θ1 θ2^2)^(1/θ2^2) Γ(1/θ2^2)],  y > 0, θ1 > 0, θ2 > 0
  E[Y] = θ1;  Var[Y] = θ2^2 θ1^2

Lognormal:
  fY(y | θ1, θ2) = 1/(y θ2 √(2π)) exp{−[log(y) − θ1]^2/(2 θ2^2)},  y > 0, θ1 > 0, θ2 > 0
  E[Y] = ω^(1/2) exp(θ1);  Var[Y] = ω(ω − 1) exp(2θ1), where ω = exp(θ2^2)

[34] Few studies have investigated the dependence of the moments of the distribution (via its parameters) on time [e.g., Strupczewski et al., 2001]. A novel feature of this study is that temporal trends and change points are both included while investigating the nonstationarity assumption in flood frequency modeling. The selection of a particular distribution is based on visual assessment of diagnostic plots as well as maximum likelihood values. Moreover, the selection of a model accounting for change points in mean and/or variance and monotonic temporal trends in the model parameters over a simpler model with time-independent parameters is performed by means of the Akaike information criterion (AIC) [Akaike, 1974], in which the maximum likelihood value is penalized by the number of independently adjusted parameters. The model with the minimum AIC value is selected. In this way, we can compare models with different probability distributions, trends in the parameters, and change points in mean and/or variance, providing additional evidence of the presence (or absence) of abrupt and/or slowly varying changes [e.g., Zhang et al., 2004]. However, since the AIC value does not provide information about the quality of the fit [e.g., Hipel, 1981], we assess the performance of the selected model by means of visual inspection of diagnostic plots of the residuals, such as residuals versus response, qq plots, or the more effective worm plots [e.g., van Buuren and Fredriks, 2001; Stasinopoulos and Rigby, 2007]. Worm plots are detrended forms of qq plots, which display the difference between the empirical and theoretical quantiles (vertical axis) against the theoretical quantiles (horizontal axis). The points appear in a worm shape, where the extreme points are more variable than those in the center.
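The AIC comparison can be illustrated with a minimal sketch: a stationary Gumbel model against one with a linear trend in the location parameter, each fit by maximum likelihood. The Gumbel choice, the optimizer, and the synthetic data are our own assumptions; the study compares a wider set of distributions and trend/change point structures.

```python
import numpy as np
from scipy.optimize import minimize

def gumbel_nll(params, y, t, trend):
    """Negative Gumbel log-likelihood; if trend, theta1 = b0 + b1*t."""
    if trend:
        b0, b1, log_scale = params
        loc = b0 + b1 * t
    else:
        b0, log_scale = params
        loc = b0
    scale = np.exp(log_scale)
    z = (y - loc) / scale
    return np.sum(np.log(scale) + z + np.exp(-z))

def aic(nll_value, n_params):
    # AIC = 2k - 2 log L = 2k + 2 * (negative log-likelihood)
    return 2 * n_params + 2 * nll_value

rng = np.random.default_rng(5)
n = 110
t = np.arange(n, dtype=float)
y = rng.gumbel(loc=100 + 0.6 * t, scale=20.0, size=n)  # trending synthetic data

fit0 = minimize(gumbel_nll, x0=[y.mean(), np.log(y.std())],
                args=(y, t, False), method="Nelder-Mead")
fit1 = minimize(gumbel_nll, x0=[y.mean(), 0.0, np.log(y.std())],
                args=(y, t, True), method="Nelder-Mead")

aic0, aic1 = aic(fit0.fun, 2), aic(fit1.fun, 3)
print(f"stationary AIC = {aic0:.1f}; trend AIC = {aic1:.1f}")
# The model with the minimum AIC value would be selected.
```

The penalty of two AIC units per extra parameter means the nonstationary model is preferred only when the likelihood gain from the trend term is large enough, which is the safeguard against overfitting described in the text.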

[35] In this study, all the calculations were performed in R [R Development Core Team, 2008] using the freely available GAMLSS package (D. M. Stasinopoulos et al., GAMLSS: Generalized additive models for location scale and shape, R package version 1.6-0, 2007, available at http://cran.r-project.org/web/packages/gamlss/gamlss.pdf).

3.4. Long-Term Persistence

[36] As discussed by Cohn and Lins [2005], long-term persistence can result in detection of a statistically significant trend even though no trend is present [see also Koutsoyiannis, 2006]. In this study we test for the presence of long-term persistence by estimating the Hurst exponent H [Hurst, 1951]. For long-term persistent phenomena, the correlation Corr(·,·) decays slowly, following a power law:

Corr(Xt, Xt+k) ≈ C k^(2H − 2) for large k

where Xt is the process of interest, k denotes the lag, and C is a constant.

[37] The Hurst exponent H varies between 0 and 1. A value of 0.5 indicates a lack of long-term persistence, while values larger than 0.5 indicate its presence (the dependence strengthening as H increases).

[38] There are different methods to estimate H, including the aggregated variance method, the differenced variance method, Higuchi's method, the R/S method, the periodogram and modified periodogram methods, and the Whittle and local Whittle methods (consult Montanari et al. [1999] for a review). One of the most commonly used approaches is the aggregated variance technique, according to which, in the presence of long-term persistence:

Var[X̄n] ≈ c n^(2H − 2)

where n is the sample size, c is a positive constant, and X̄n is the sample mean.
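A minimal implementation of the aggregated variance estimator follows; the block sizes and regression details below are one reasonable choice among several, not necessarily the exact settings used in this study:

```python
import math

def aggregated_variance_hurst(x, min_blocks=10):
    """Estimate the Hurst exponent H with the aggregated variance method:
    the variance of block means of size m scales like m**(2H - 2), so H is
    recovered from the slope of log(variance) against log(m)."""
    n = len(x)
    log_m, log_v = [], []
    m = 2
    while n // m >= min_blocks:
        k = n // m
        means = [sum(x[i * m:(i + 1) * m]) / m for i in range(k)]
        mu = sum(means) / k
        var = sum((v - mu) ** 2 for v in means) / (k - 1)
        log_m.append(math.log(m))
        log_v.append(math.log(var))
        m = max(m + 1, int(m * 1.5))  # roughly geometric ladder of block sizes
    # least-squares slope of log(var) on log(m); H = 1 + slope / 2
    mbar = sum(log_m) / len(log_m)
    vbar = sum(log_v) / len(log_v)
    slope = (sum((a - mbar) * (b - vbar) for a, b in zip(log_m, log_v))
             / sum((a - mbar) ** 2 for a in log_m))
    return 1.0 + slope / 2.0
```

For white noise the block-mean variance decays like 1/m (slope −1), so the estimate falls near H = 0.5.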

[39] The presence of nonstationarities (e.g., nonstationarity in the mean) may significantly affect the estimation of H [e.g., Rust et al., 2008]. In particular, in the presence of trends or abrupt changes in the mean, the estimator based on the aggregated variance method is affected by large errors [e.g., Teverovsky and Taqqu, 1997; Montanari et al., 1999]. As noted by Montanari et al. [1999, p. 220], “differencing the variance is a useful technique when jumps in the mean or slowly decaying trends are suspected.” For this reason, in addition to the aggregated variance estimator we also include an estimator based on differencing the variance for those cases in which we suspect the presence of a deterministic change point or linear trend (e.g., due to stream gauge relocation, construction of upstream dams).
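The effect described above can be demonstrated on synthetic data: the sketch below (hypothetical data, with the aggregated variance logic inlined so the example is self-contained) superimposes an abrupt mean shift on white noise. The aggregated variance estimate is inflated well above 0.5, while an estimator based on differences of successive aggregated variances, in which the constant offset contributed by the shift cancels, stays closer to the white-noise value:

```python
import math

def _block_variances(x, max_m):
    """Sample variance of block means for a geometric ladder of block sizes."""
    out = []
    m = 2
    while m <= max_m:
        k = len(x) // m
        means = [sum(x[i * m:(i + 1) * m]) / m for i in range(k)]
        mu = sum(means) / k
        out.append((m, sum((v - mu) ** 2 for v in means) / (k - 1)))
        m = max(m + 1, int(m * 1.5))
    return out

def _slope(pairs):
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    xb, yb = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((a - xb) * (b - yb) for a, b in zip(xs, ys))
            / sum((a - xb) ** 2 for a in xs))

def hurst_aggregated(x, max_m=256):
    pts = [(math.log(m), math.log(v)) for m, v in _block_variances(x, max_m)]
    return 1.0 + _slope(pts) / 2.0

def hurst_differenced(x, max_m=32):
    """Differenced variance estimator: a constant offset in the block-mean
    variances (e.g., from a jump in the mean) cancels in the differences,
    which still scale like m**(2H - 2)."""
    bv = _block_variances(x, max_m)
    pts = []
    for (m1, v1), (_, v2) in zip(bv, bv[1:]):
        d = v1 - v2  # variances decay with m, so differences should be > 0
        if d > 0:
            pts.append((math.log(m1), math.log(d)))
    return 1.0 + _slope(pts) / 2.0
```

The differenced estimator is restricted to small block sizes here because the differences become noise dominated for large m.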

4. Results

[40] In this section we present results concerning the evaluation of the hypothesis of stationarity in terms of change points, temporal trends (including GAMLSS modeling), and long-term persistence.

4.1. Change Point Analysis

[41] Change point analysis is used to investigate the presence of abrupt changes in the mean and variance of annual peak records during the 20th century. Using the Pettitt test, we found (Figure 4) that 18 of the 50 basins exhibited a significant abrupt change in the mean. The times of abrupt changes in the mean clustered around the 1920s and between 1950 and 1970. For 6 of the 50 basins we also detected a change point in variance. In the Mississippi River Basin, change points in the mean cluster in the 1940s (Figure 4), in relation to profound land use and land cover changes [e.g., Zhang and Schilling, 2006; Schilling et al., 2008].

Figure 4.

Maps with the location of the USGS stream gauge stations with a change point (a) in mean and (b) in variance. The results are based on the Pettitt test for a significance level of 5%.

[42] The fact that only 21 records show a change in the mean and/or variance allows us to look at each station in more detail. We found that for two of the stream gauges in Georgia, the detected change point is close to the year in which the gauge height began to be computed at a different site and/or with respect to a different datum (Broad River in 1917 and Oconee River in 1949). For the stations on the Great Miami River, Hiwassee River, Red River at Fargo, and Columbia River, the change point is close to the year in which the USGS applied qualifying code 6 (“Discharge affected by Regulation or Diversion”; Great Miami River in 1922, Hiwassee River in 1942, Red River at Fargo in 1942, and Columbia River in 1938). These results highlight the fact that abrupt changes in mean and/or variance can be due to changes in gauging practice, changes in land use and land cover, and regulation by reservoirs, as well as climatic factors. Careful inspection of the records is a key element in assessing nonstationarities in annual peak records [e.g., Potter, 1979].
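The Pettitt statistic used throughout this section is simple to compute. A self-contained sketch follows, using the common large-sample approximation for the p value and omitting the ties handling that a production implementation would want:

```python
import math

def pettitt_test(x):
    """Pettitt change point test: U_t = sum_{i<=t} sum_{j>t} sign(x_i - x_j).
    Returns (tau, K, p), where tau is the most likely change point (the series
    splits into x[:tau] and x[tau:]), K = max |U_t|, and p is the approximate
    two-sided significance level 2 * exp(-6 K^2 / (n^3 + n^2))."""
    n = len(x)
    sgn = lambda d: (d > 0) - (d < 0)
    u, best_u, tau = 0, 0, 1
    for t in range(1, n):  # candidate split after observation t (1-based)
        # incremental update: U_t = U_{t-1} + sum_{j != t} sign(x_t - x_j)
        u += sum(sgn(x[t - 1] - x[j]) for j in range(n) if j != t - 1)
        if abs(u) > best_u:
            best_u, tau = abs(u), t
    p = min(1.0, 2.0 * math.exp(-6.0 * best_u ** 2 / (n ** 3 + n ** 2)))
    return tau, best_u, p
```

When a significant tau is found, the record can be split into the subseries before and after the change point, as is done for the trend analysis in section 4.2.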

4.2. Temporal Trend Analysis

[43] Change points present one possible mode of nonstationarity; long-term trends constitute a second major cause of nonstationarity in annual flood peak time series. Analyses of long-term trends are performed using the Mann-Kendall, Spearman, and Pearson two-sided tests. When no change point is detected, the temporal trend analysis is performed on the entire time series. For those records in which a statistically significant change point is detected, the trend analysis is performed on the two subseries obtained by splitting the record at the change point (before and after it). Before performing the trend analysis, we examined the autocorrelation function (ACF) of the annual peak time series. Overall, the ACF was not significantly different from 0 for any of the series, suggesting that year-to-year correlation is not an important element of trend detection for annual flood peaks.
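As one example of the three tests, a minimal Mann-Kendall implementation with the usual normal approximation is sketched below (no ties correction; the Spearman and Pearson analogues follow the same pattern):

```python
import math
import statistics

def mann_kendall_test(x):
    """Two-sided Mann-Kendall trend test. Returns (S, p); S > 0 suggests an
    upward trend. Uses Var(S) = n(n-1)(2n+5)/18, exact only when there are
    no ties (a ties correction would shrink the variance slightly)."""
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n) for j in range(i + 1, n))
    if s == 0:
        return 0, 1.0
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - (1 if s > 0 else -1)) / math.sqrt(var_s)  # continuity correction
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))
    return s, p
```

A strictly increasing series yields the maximum S = n(n − 1)/2 and a vanishing p value, while a trendless series yields a p value well above common significance levels.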

[44] For basins with no change point in mean (Table 3), only two stations (Broad River and Yampa River) present a significant temporal trend at the 5% significance level for all three tests. For two more rivers (Connecticut and Kansas), trends significant at least at the 10% level are identified for all three tests.

Table 3. Summary of the Results of the Temporal Trend Analysis for Those Series in Which No Statistically Significant Change Point Was Detecteda
  • a

    The numerical values correspond to the p values for the Mann-Kendall (MK), Spearman (S), and Pearson (P) two-tailed tests. The column labeled “Direction” indicates whether a positive (+) or negative (−) trend was detected on the basis of an MK test. One asterisk means test significant at the 5% significance level; two asterisks means test significant at the 10% significance level.


[45] For the 18 stations with change points in mean, two stations (Minnesota River and South Platte River) exhibited significant temporal trends at the 5% significance level for the sample period prior to the change point (Table 4). For an additional three stations, all three tests showed statistically significant trends at the 10% level. Finally, Table 5 presents the results for the subseries after the change point: the Columbia River and Umpqua River show statistically significant trends at the 5% level for all three tests.

Table 4. Same as Table 3 but for the Subseries Before Change Point
Table 5. Same as Table 3 but for the Subseries After Change Point

[46] We investigated the effects of change points on long-term trend analysis by repeating the trend analysis on those series in which a change point was detected, this time neglecting it (Table 6). It is clear from these results that neglecting the presence of a change point can yield a significant trend even though, once the change point is accounted for, no statistically significant trend is detected.

Table 6. Results of the Trend Analysis Neglecting the Presence of a Change Pointa
  • a

    One asterisk means test significant at the 5% significance level; two asterisks means test significant at the 10% significance level.


4.3. GAMLSS Modeling

[47] The previous results show that both gradually varying long-term trends and abrupt changes can cause nonstationarity in annual flood peak records. In this section, we examine GAMLSS models as a framework for parametric modeling of nonstationary annual peak records. The suitability of the fitted parametric models is evaluated with respect to likelihood-based optimality criteria (including AIC) and by diagnostic graphical tools. In particular, we checked the normality of the residuals through worm plots [e.g., van Buuren and Fredriks, 2001]. When no change point is detected, we compared the maximum likelihood and AIC scores for different two-parameter distributions (gamma, Weibull, Gumbel, and lognormal) for four different models: (1) stationary model (no trends); (2) nonstationary model in θ1 (θ1 modeled as a linear function of time through a proper link function, which depends on the distribution family); (3) nonstationary model in θ2 (θ2 modeled as a linear function of time through a proper link function); and (4) nonstationary model in θ1 and θ2 (θ1 and θ2 modeled as linear functions of time through proper link functions).
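The flavor of this comparison can be sketched without the GAMLSS machinery for the lognormal case, where the maximum likelihood estimates are available in closed form (models 1 and 2, with an identity link for θ1; the data below are synthetic, not one of the study's records):

```python
import math

def lognormal_loglik(y, theta1_per_obs, theta2):
    """Log likelihood of lognormal observations with (possibly time-varying)
    location theta1 and common scale theta2."""
    return sum(-math.log(v) - math.log(theta2) - 0.5 * math.log(2 * math.pi)
               - (math.log(v) - t1) ** 2 / (2 * theta2 ** 2)
               for v, t1 in zip(y, theta1_per_obs))

def fit_stationary(y):
    """Stationary lognormal (theta1, theta2 constant); returns AIC, k = 2."""
    n = len(y)
    logs = [math.log(v) for v in y]
    t1 = sum(logs) / n
    t2 = math.sqrt(sum((l - t1) ** 2 for l in logs) / n)  # MLE divides by n
    return 2 * 2 - 2 * lognormal_loglik(y, [t1] * n, t2)

def fit_linear_trend(y, t):
    """Lognormal with theta1 = a + b*t (identity link); returns AIC, k = 3."""
    n = len(y)
    logs = [math.log(v) for v in y]
    tbar = sum(t) / n
    lbar = sum(logs) / n
    b = (sum((u - tbar) * (l - lbar) for u, l in zip(t, logs))
         / sum((u - tbar) ** 2 for u in t))
    a = lbar - b * tbar
    fitted = [a + b * u for u in t]
    t2 = math.sqrt(sum((l - f) ** 2 for l, f in zip(logs, fitted)) / n)
    return 2 * 3 - 2 * lognormal_loglik(y, fitted, t2)
```

For a record with a genuine trend in the log-mean, the extra parameter buys enough likelihood that the nonstationary model attains the lower AIC.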

[48] We have summarized these results in Table 7. The gamma (with logarithmic link functions for both location and scale parameters) and lognormal (with identity and logarithmic link functions for location and scale parameters, respectively) distributions are the most frequently selected parametric models. Moreover, our modeling results support the general lack of linear trends in the aforementioned records.

Table 7. Summary of the Results for the GAMLSS Model in the Absence of a Change Pointa
USGS ID | CDF | Stationary | Nonstationary in θ1 | Nonstationary in θ2 | Nonstationary in θ1 and θ2
  • a

The second column shows the selected probability distribution (GA, gamma; LOGNO, lognormal; GU, Gumbel; WEI, Weibull). Selected models are indicated with a Y in the subsequent columns.


[49] For those records in which significant change points in mean and/or variance were detected, we considered the most complex model accounting for abrupt changes and trends in θ1, and all possible simpler submodels down to the model with constant parameters. All models were compared by AIC scores and by computing pairwise deviances [e.g., McCullagh and Nelder, 1989] to test the significance of the improvement obtained by switching from simpler to more complex models. We have summarized our results in Table 8. Overall, we notice that the change points detected with the Pettitt test are also found significant using GAMLSS. Moreover, the gamma and lognormal distributions provide the best fit in almost all of the cases.

Table 8. Similar to Table 7 but for GAMLSS in the Presence of a Change Pointa
USGS ID | CDF | Change Point in Mean | Trend Before CP | Trend After CP | Change Point in Variance
  • a

    The presence or absence of a change point or trend (in the location parameter θ1 before and after the change point) based on GAMLSS is identified with Y and N, respectively. CP means change point.

02191300 | LOGNO |   |   |   | Y
03270500 | LOGNO | Y |   |   | Y
06191500 | GA    |   |   |   | Y
06714000 | LOGNO | Y |   |   | N
07010000 | GA    |   |   |   | N

[50] Figure 5 illustrates the results of our modeling effort for four time series, covering a change point in mean, a change point in variance, change points in both mean and variance, and a change point in mean with monotonic trends before and after it (Figure 6 shows the goodness of fit by means of worm plots of the residuals; for a good fit the points should lie close to the black line and within the grey curves delimiting the 95% confidence intervals). For the Arkansas River at Canon City (USGS ID 07096000), the presence of a change point in mean significantly affects the results. A change point in variance, though generally neglected, can also have a large impact, as shown for the Broad River (USGS ID 02191300): even though the median does not seem significantly affected, the higher and lower percentiles are. The next example shows a model in the presence of change points in both mean and variance (Great Miami River, USGS ID 03270500); here too, the model is able to capture the decreased scatter. Finally, we show the results for a case with a change point in mean and monotonic trends before and after it (Columbia River, USGS ID 14321000). The model is capable of capturing the large scatter in the time series.

Figure 5.

Fitting of the annual maximum discharge for four stations using the GAMLSS model. Five percentiles are represented (5th, 25th, 50th, 75th, and 95th).

Figure 6.

Worm plots for four basins to assess the fit of the GAMLSS model to the data. For a good fit, the data points should lie close to the black solid line and within the two grey lines.

4.4. Long-Term Persistence

[51] An alternative interpretation of the properties of the annual maximum instantaneous peak discharge time series is provided by long-term persistence. In this section we present the results of our analyses based on the aggregated variance estimator, summarized in Table 9.

Table 9. Values of the Hurst Exponent H Estimated Using the Aggregated Variance Method and the Corresponding p Value From Testing the Hypothesis of H = 0.5
USGS ID | Hurst Exponent | p Value

[52] For almost half of the basins, the value of the Hurst exponent is less than or equal to 0.50, suggesting a lack of long-term persistence. For the other time series, H is larger than 0.50, suggesting that the observed behavior could be explained in terms of long-term persistence. However, since estimates of the Hurst exponent are affected by large uncertainties due to the limited sample size [e.g., Koutsoyiannis and Montanari, 2007], we tested whether the values of H computed from these time series are statistically different from 0.5. The null hypothesis H0 is that H = 0.5, while the alternative hypothesis H1 is that the Hurst exponent differs from 0.5. Since resampling destroys the memory of a series, we built the distribution of H under the null hypothesis (no memory) by means of a bootstrap approach [Efron and Tibshirani, 1997]: we resampled (with replacement) each series B = 3,000 times and computed H for each resampled series. The empirical distribution of the B bootstrap values of H was then used to define the p value of the Hurst exponent computed from the observed series. The results are summarized in Table 9. For many of these time series, even though the value of the Hurst exponent is larger than 0.5, there is not enough evidence to reject the null hypothesis. Therefore, even though our time series are long compared to common series of annual maxima, they are not long enough to yield conclusive results concerning the presence (or absence) of long-term persistence.
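The bootstrap procedure can be sketched as follows. The example is self-contained, with a crude aggregated variance estimator inlined, far fewer resamples (B = 200) than the B = 3,000 used in the study, and a white-noise series with a superimposed deterministic trend standing in for an observed record:

```python
import math
import random

def hurst_aggvar(x):
    """Crude aggregated variance estimate of H (see section 3.4)."""
    n, m, pts = len(x), 2, []
    while n // m >= 10:
        k = n // m
        means = [sum(x[i * m:(i + 1) * m]) / m for i in range(k)]
        mu = sum(means) / k
        var = sum((v - mu) ** 2 for v in means) / (k - 1)
        pts.append((math.log(m), math.log(var)))
        m = max(m + 1, int(m * 1.5))
    xb = sum(a for a, _ in pts) / len(pts)
    yb = sum(b for _, b in pts) / len(pts)
    slope = (sum((a - xb) * (b - yb) for a, b in pts)
             / sum((a - xb) ** 2 for a, _ in pts))
    return 1.0 + slope / 2.0

def bootstrap_p_value(x, b=200, seed=5):
    """Upper-tail p value for H0: no long-term persistence. Resampling with
    replacement destroys any memory, so the bootstrap distribution of H
    describes the no-memory null; the p value is the fraction of resampled
    estimates at least as large as the observed one."""
    rng = random.Random(seed)
    h_obs = hurst_aggvar(x)
    n = len(x)
    count = sum(1 for _ in range(b)
                if hurst_aggvar([x[rng.randrange(n)] for _ in range(n)]) >= h_obs)
    return (count + 1) / (b + 1)
```

A deterministic trend inflates the observed H far above the bootstrap null distribution, which is centered near 0.5, yielding a small p value even though no genuine long-term persistence is present.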

[53] Two of the five basins illustrated in Figure 3 have estimated Hurst exponents less than, but close to, 0.5: the estimates for both the Connecticut River and the Potomac River are 0.49. For the other three basins, the estimated Hurst exponents are greater than 0.5, with the South Platte River at 0.58, the Great Miami River at 0.68, and the Columbia River at 0.85 (only in the last case is there statistical evidence for H different from 0.5). The range in Hurst exponents for these five basins reflects the range in river regulation, with the Potomac River the least regulated by reservoirs and the Columbia River the most regulated. For these basins, we further investigated long-term persistence by applying the differenced variance estimator, which is less sensitive to the presence of deterministic trends and shifts in the mean. The differenced variance estimates for the South Platte River, Great Miami River, and Columbia River are 0.50, 0.40, and 0.51, respectively. The sensitivity of the aggregated variance estimator to deterministic change points in mean and/or linear trends is especially apparent for the Columbia River, for which a decreasing trend in peak discharge over the 20th century is clearly observable (Figure 3e), resulting in an aggregated variance estimate of 0.85; using the differenced variance estimator, the value of H decreases to 0.51.

[54] The vast majority of the series showing estimates of the Hurst exponent with p values larger than 0.85 are the same series for which a significant change point and/or trend were detected. Since long-memory behavior can be explained as a composition of multiple time scale random fluctuations [Koutsoyiannis, 2002], resulting in apparent abrupt changes and trends on an underlying long-term stationary process, it is expected that classical tests are able to detect such shift and trends. On the other hand, deterministic changes and trends due, for instance, to human intervention (e.g., gauge relocation, construction of upstream dams, land use and land cover changes) can lead to significant H values, which are not due to long-term persistence and random fluctuations.

[55] Overall, our results show that abrupt changes do occur, while monotonic trends are less frequent and are often an artifact of undetected change points. However, it is very difficult to draw conclusions about the forcing mechanism generating such changes without additional information. In particular, when one deals with short series for which human impact is possible (such as some of our annual flood peak records), statistical analysis does not provide a tool to answer the question of whether a change will continue in the future.

5. Conclusions

[56] In this study we have investigated the validity of the stationarity assumption in annual maximum instantaneous peak discharge records from 50 basins in the continental United States with more than a century of stream gauging observations. Our findings can be summarized as follows:

[57] 1. The Pettitt test was used to investigate the presence of change points in both the mean and variance of annual flood peaks. We found a statistically significant change point in almost half of the analyzed series. In a few cases, further investigation uncovered links between the abrupt change and stream gauging practice at the station. We also want to underline the importance of investigating change points not only in mean but also in variance, since the latter indicate a change in the scatter of the series. McCabe and Wolock [2002] found a shift in the mean of the discharge around 1970 and related the increases in annual streamflow statistics to an increase in precipitation around the same time in the eastern United States [e.g., Karl and Knight, 1998]. Further studies should be carried out to investigate the relation between precipitation and streamflow [e.g., Groisman et al., 2001a].

[58] 2. On the basis of the trend analysis, we conclude that, overall, no monotonic temporal trends were detected in the annual maximum instantaneous peak discharge for the analyzed series.

[59] 3. Neglecting the presence of a statistically significant change point could significantly affect the conclusions of the trend analysis.

[60] 4. In several cases we found that the gamma and lognormal distributions could be used to model the peak discharges. Nevertheless, more studies are necessary to investigate the presence or absence of a heavy tail in the annual maximum peak discharge distribution, with all the consequences this has for the design of engineering structures [e.g., Stedinger and Griffis, 2008; El Adlouni et al., 2008].

[61] 5. The results of this study highlight the flexibility of the GAMLSS model to account for change points in mean and/or variance and temporal trends. We think that GAMLSS can be a useful tool in modeling hydrometeorological variables without resorting to the stationarity assumption.

[62] 6. Our results point out that series showing statistically significant changes also show a significant Hurst exponent, and vice versa. Thus, long-term persistence could be considered the generating mechanism of the changes. However, for series for which metadata are available, the changes are more likely explained by human intervention than by random fluctuations underlying long-term persistence. On the other hand, for series where additional information is not available, we were not able to assess whether the observed variations in annual maximum instantaneous peak discharge were due to natural climate variability or anthropogenic climate change. On the basis of these results, we pointed out how human modifications affecting runoff generation (e.g., changes in land use and land cover), fluvial transport (e.g., construction of dams and pools), and measurement consistency (e.g., stream gauge relocation) can significantly affect flood peak time series. Future studies should aim at quantifying the relative impact of these factors (e.g., using rainfall-runoff models).

[63] 7. Even though the time series considered in this study are long compared to common annual maxima series, they may not be long enough to provide conclusive results regarding the presence of long-term persistence. Moreover, this study highlights the sensitivity of the estimation of the Hurst exponent to human intervention. Careful selection of the proper estimator is a key element for meaningful statements about the presence (or absence) of long-term persistence in hydrometeorological variables, as is the extension of the analysis to a broader sample of basins from diverse climatic and physiographic settings.


Acknowledgments

[64] This research was funded in part by the Willis Research Network, the National Science Foundation (grant EAR-0847347), and NASA. The authors would like to thank D. M. Stasinopoulos, R. A. Rigby, and C. Akantziliotou for making the GAMLSS package (see http://cran.r-project.org/web/packages/gamlss/gamlss.pdf) freely available in R [R Development Core Team, 2008]. Comments and suggestions by the Associate Editor D. Koutsoyiannis, H. Lins, K. W. Potter, and an anonymous reviewer are gratefully acknowledged.