A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

Abstract

[1] The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as “less-than” values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.

1. Introduction

[2] An important concern in the development of national flood frequency guidelines, such as Bulletin 17B [Interagency Committee on Water Data (IACWD), 1982] or a new Bulletin 17C [Stedinger and Griffis, 2008; England and Cohn, 2008], is that the procedure be robust. That is, the recommended procedure should be reasonably efficient when the assumed characteristics of the flood distribution are true, while not doing poorly when those assumptions are violated. A critical issue is whether the low (or zero) flows in an annual flood series are relevant in estimating the probabilities of the largest events.

[3] Annual-peak-flow series, particularly those in the Western United States, often contain so-called “low outliers.” In the context of Bulletin 17B, low outliers are small values which depart from the trend of the rest of the data [IACWD, 1982] and often reflect a situation wherein the smaller flood flows are unusually small given what one would expect based on the larger flood flows. For example, Figure 1 depicts the logarithms of 1 day rainfall flood annual peak flows for the Sacramento River at Shasta Dam, as computed by the Army Corps of Engineers, 1932–2008 (77 years). At this site, the three smallest observations appear visually to be unusually small. The figure includes a lognormal distribution fit to the top 74 observations. The standard Grubbs-Beck test [Grubbs and Beck, 1972] (see equation (2), 10% significance level) generates a threshold of 7373 cubic feet per second (cfs), and thus correctly identifies the smallest observation as a low outlier, but not the second and third smallest observations.

Figure 1.

Observed annual 1 day peak flows on the Sacramento River at Shasta Dam from 1932 to 2008 (n = 77 total observations). Line represents graphical fit to largest n = 74 observations.

[4] Low outliers and potentially influential low flows (PILFs) in annual-peak-flow series often reflect physical processes that are not relevant to the processes associated with large floods. Consequently, the magnitudes of small annual peaks typically do not reveal much about the upper right-hand tail of the frequency distribution, and thus should not have a highly influential role when estimating the risk of large floods. Klemes [1986] correctly observes:

[5] “It is by no means hydrologically obvious why the regime of the highest floods should be affected by the regime of flows in years when no floods occur, why the probability of a severe storm hitting this basin should depend on the accumulation of snow in the few driest winters, why the return period of a given heavy rain should be by an order of magnitude different depending, say, on slight temperature fluctuations during the melting seasons of a couple of years.”

[6] The distribution of the proposed test statistic is derived specifically for the purpose Klemes suggests: to identify small “nuisance” values. Paradoxically, moments-based statistical procedures, when applied to the logarithms of flood flows to estimate flood risk, can assign high leverage to the smallest peak flows. For this reason, procedures are needed to identify potentially influential small values so that their influence on flood-quantile and flood-risk estimates can be limited.

[7] This paper presents a generalization of the Grubbs-Beck statistic [Grubbs and Beck, 1972] that can provide a standard to identify multiple potentially influential small flows. The present work is motivated by ongoing efforts [England and Cohn, 2008; Stedinger and Griffis, 2008] to explore potential improvements to Bulletin 17B [IACWD, 1982] with moments-based, censored-data alternatives [Cohn et al., 1997; Griffis et al., 2004]. The proposed statistic is constructed following the reasoning in Rosner [1975], who developed a two-sided R-Statistic “many outlier” test (RST) based on the following argument:

[8] “[t]he idea is to compute a measure of location and spread (a and b) from the points that cannot be outliers under either the null or alternative hypotheses, i.e. the points that remain after deleting 100 p% of the sample from each end.”

[9] In Rosner's implementation, p is some fraction of the total number of observations, n. We consider a one-sided test statistic based on these concepts to detect PILFs in the left-hand tail.

2. Literature Review

[10] A wide range of test procedures for identifying low outliers have been studied in the statistical literature [Thompson, 1935; Grubbs and Beck, 1972; Barnett and Lewis, 1994], including methods for dealing with the case of multiple low outliers considered here [Tietjen and Moore, 1972; Rosner, 1975; Prescott, 1975; Gentleman and Wilk, 1975; Prescott, 1978; Rosner, 1983; Marasinghe, 1985; Rousseeuw and van Zomeren, 1990; Hadi and Simonoff, 1993; Spencer and McCuen, 1996; Rousseeuw and Leroy, 2003; Verma and Quiroz-Ruiz, 2006].

[11] Thompson [1935] provided an early criterion for the rejection of an outlier based on the ratio of an observation's deviation from the sample mean to the sample standard deviation. Another early test for extreme values was provided by Dixon [1950, 1951], who, for high outliers, proposed the test statistic

$r = \dfrac{X_{(n)} - X_{(n-1)}}{X_{(n)} - X_{(1)}}$    (1)

where $X_{(1)}$, $X_{(n)}$, and $X_{(n-1)}$ denote the smallest, largest, and second largest observations, respectively. Barnett and Lewis [1994] also discuss a Dixon-type test for the second most extreme observation in either tail.

[12] The Grubbs-Beck test [Grubbs, 1969; Grubbs and Beck, 1972], which is recommended by Bulletin 17B [IACWD, 1982], defines a low outlier threshold as:

$X_{crit} = \bar{X} - K_n S$    (2)

where $K_n$ is a one-sided, 10% significance-level critical value for an independent sample of n normal variates, and $\bar{X}$ and $S$ denote the sample mean and standard deviation of the entire data set. Any observation less than $X_{crit}$ is declared a “low outlier” [IACWD, 1982]. Under Bulletin 17B, low outliers are omitted from the sample and the frequency curve is adjusted using a conditional probability adjustment [IACWD, 1982]. $K_n$ values are tabulated in section A4 of IACWD [1982] based on Table A1 in Grubbs and Beck [1972].

[13] Stedinger et al. [1993, p. 18.45] provide an accurate approximation for $K_n$:

$K_n \approx -0.9043 + 3.345\,\sqrt{\log_{10}(n)} - 0.4046\,\log_{10}(n)$    (3)
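
For reference, a minimal computational sketch of the threshold in equation (2) follows, using the approximation (3) for $K_n$ in place of the tabulated values; the function name and that substitution are assumptions of the sketch, not part of the guidelines.

```python
import numpy as np

def grubbs_beck_threshold(logq):
    """Sketch of the Bulletin 17B single low-outlier threshold (equation (2)),
    with K_n taken from the Stedinger et al. [1993] approximation in
    equation (3) rather than the published tables."""
    logq = np.asarray(logq, dtype=float)
    n = logq.size
    kn = -0.9043 + 3.345 * np.sqrt(np.log10(n)) - 0.4046 * np.log10(n)
    xbar, s = logq.mean(), logq.std(ddof=1)   # mean and standard deviation of the full record
    return xbar - kn * s                      # log-flows below this value are flagged as low outliers
```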

[14] Rosner [1975, 1983] develops a sequential two-sided outlier test, based on a generalization of the Grubbs [1969] statistic, the extreme studentized deviate (ESD) statistic, which is defined as

$T_1 = \max_{1 \le i \le n} \dfrac{\left| X_i - \bar{X} \right|}{S}$    (4)

where $\bar{X}$ and $S$ are the sample mean and standard deviation of the entire sample. The observation corresponding to $T_1$ is removed and $T_2$ is computed from the remaining sample, i.e., the sample mean and standard deviation are computed from the remaining n − 1 observations. This process is repeated until $T_k$ is computed, for some prespecified k.

[15] The RST procedure of Rosner [1975] is very similar to the ESD procedure, except that it is much less computationally intensive. As described above, an ESD test for k suspected outliers requires the computation of k trimmed sample means and standard deviations to construct k test statistics. The RST procedure requires the trimmed sample moments to be computed only once (after removing the k suspected outliers from the sample); each of the k R-statistics is then computed from that single trimmed mean and standard deviation.

[16] The Rosner [1975, 1983] tests detect outlier observations that are either unusually large or small. The ESD statistic is the Grubbs-Beck test statistic generalized to consider both large and small outliers, and to test more than the smallest observation in a sample. The one-sided multiple Grubbs-Beck test statistic proposed in this paper differs from the Rosner [1975, 1983] ESD statistic in that it only considers low outliers and does not include the suspected outlier in its computation of the trimmed mean and standard deviation.

[17] In addition to equation (4), several other test statistics have been employed in sequential multiple outlier tests [Barnett and Lewis, 1994, p. 235]. Published critical values achieve an overall type I error that matches the specified α for a given $k_{max}$, even though they employ multiple comparisons to decide on the number of outliers k, for $k \le k_{max}$. This works well if the number of potential outliers is known.

[18] Bayesian outlier tests have also been proposed [see Ondo et al., 2001; Bayarri and Morales, 2003]. A primary feature of such tests is that they incorporate an explicit model for the alternative hypothesis, based on a contaminating distribution that is responsible for the outlier(s). Barnett and Lewis [1994] provides a good summary of the Bayesian approach to outliers and contrasts that approach with the frequentist approach. The analysis proposed here employs a traditional frequentist approach which does not require a specific alternative distribution for outliers.

2.1. Masking, Swamping, and Block Tests

[19] The Grubbs-Beck test recommended in Bulletin 17B is, by construction, a 10% test for a single outlier in a flood sample [IACWD, 1982]; 10% is the probability of rejecting the hypothesis of normality when samples are drawn from a normal distribution. A reasonable concern is that a flood record could contain more than one low outlier and the additional outliers can cause the Grubbs-Beck test statistic to fail to recognize the smallest observation as an outlier (by inflating the sample mean and variance). This effect is known as masking [Tietjen and Moore, 1972]. Siyi [1987] demonstrated the masking problem using the Bulletin 17B Grubbs-Beck test with several flood records in China. Basically, the problem is that if a sample has several severe outliers, then the smallest observation does not look that unusual in the context of the other unusually small observations.

[20] Some multiple outlier tests consist of successive application of a single outlier test. These are generally divided into step-forward and step-backward tests [Spencer and McCuen, 1996]. Step-backward tests first test if the most extreme (smallest) observation (k = 1) is an outlier. If the smallest (k = 1) proves significant, the next smallest (k = 2) is tested. This continues until no additional outliers are identified. Such step-backward tests are particularly susceptible to masking [Barnett and Lewis, 1994, pp. 109–115]. Suppose the three smallest flood flows in a sample are much smaller than the remainder of the sample, and a successive step-backward outlier detection test that uses the Grubbs-Beck test statistic is used. Having three small outliers can cause the test to conclude at the first step (k = 1) that the smallest observation is not an outlier, so the procedure stops.

[21] The Rosner [1983] outlier test avoids masking by using a step-forward test for at most k outliers. This means that the Rosner [1983] procedure first tests the kth most extreme observation; if the result is significant, all more extreme observations are also considered outliers. If not, the (k − 1)th most extreme observation is tested, and so forth, working outward to the most extreme observation.
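
To make that logic concrete, here is a minimal sketch of a step-forward sweep; it assumes a list of p values (one per candidate count k) is already available and is not intended to reproduce the specific critical values of Rosner [1983].

```python
from typing import Sequence

def step_forward_outlier_count(p_values: Sequence[float], alpha: float) -> int:
    """Step-forward sweep: start at the largest candidate count k_max and work
    outward toward the most extreme observation.

    p_values[k - 1] is assumed to hold the p value for the k most extreme
    (smallest) observations, k = 1, ..., k_max."""
    for k in range(len(p_values), 0, -1):   # k_max, k_max - 1, ..., 1
        if p_values[k - 1] < alpha:
            return k                        # k outliers: this one and all more extreme values
    return 0                                # no outliers identified
```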

[22] If the number of potential outliers is thought to be $k_0$, then a block test simultaneously tests for $k_0$ outliers by computing a single statistic, perhaps based on the ratio of mean deviation from partial sample averages [see Tietjen and Moore, 1972]. Such block tests for $k_0$ low outliers are particularly susceptible to swamping [Barnett and Lewis, 1994, pp. 109–115]. Consider use of a block outlier test for $k_0 = 2$ low outliers applied to a sample with only one unusually small low outlier. The smallest observation can cause the second smallest to erroneously be identified as one of two outliers. The test statistic proposed here avoids the problem of swamping by not including the suspected outlier and all more extreme observations in the computation of the test statistic.

3. Concerns With the Grubbs-Beck Test

[23] Spencer and McCuen [1996] raise three concerns with the Bulletin 17B low outlier-identification procedure:

[24] 1. it provides critical values for only a single significance level, $\alpha = 10\%$ (because of the limited tables);

[25] 2. it assumes zero skew (i.e., a normal distribution as the null hypothesis); and

[26] 3. it does not address multiple low outliers.

[27] To resolve these concerns, Spencer and McCuen [1996] used Monte Carlo simulation to identify critical deviate values for $X_{(1)}$, $X_{(2)}$, and $X_{(3)}$, the three smallest observations in a sample, for significance levels down to 1%, log-space skews up to 1.0, and a range of sample sizes. The test requires specification of the log-space skew that defines the log-Pearson type 3 (LP3) population that represents the null hypothesis. They recommend using a weighted average of the sample skew and a regional skew; if no regional skew is available, the sample skew is recommended. It is not clear whether the estimated sample skew or an estimated weighted skew is appropriate for an outlier test which assumes the population skew is known independently of the data set.

[28] An additional issue is that the step-forward multiple outlier test recommended by Spencer and McCuen [1996] uses the trimmed sample moments, including a trimmed sample skew coefficient. The trimmed skew coefficient is a highly biased estimator of the true skew of a sample, and its use in a successive step-forward test is highly suspect because it is not consistent with how the critical deviates were developed.

[29] Hosking and Wallis [1997, p. 152] make two criticisms of Bulletin 17B's approach to low outlier detection. First, “the [Grubbs-Beck] outlier adjustment is a complication that would not have been necessary had the logarithmic transformation not been used,” and, second, that “the criterion used …is arbitrary, being based on an outlier test from samples from the normal distribution at a subjectively chosen significance level of 10%.” Hosking and Wallis [1997] note that “no justification” for the choice of procedure was given beyond “saying that it was based on ‘comparing results' of several procedures.”

[30] Answers to these concerns can be found in Thomas [1985], who describes the process used to select Bulletin 17B's low outlier detection procedure from among 50 alternatives. The performance of the 50 procedures was first compared subjectively, and 10 were judged to agree adequately with subjective visual identification of low outliers. Based on a Monte Carlo analysis of those 10 promising candidates, six procedures were selected, and a further comparison with subjective visual identification of outliers in additional flood records resulted in the selection of the current methodology.

4. Statistical Development

[31] A procedure is provided to compute critical values (p values) for the k smallest flows (PILFs) in flood frequency applications. We first present the context and notation that is compatible with current US National flood frequency guidelines [IACWD, 1982], and outline the generalization of the test statistic employed therein. A p-value equation is then derived using a semianalytical approach with numerical integration, which enables the estimation of a critical value for any combination of k and record length n.

[32] Let $\{Q_1, \ldots, Q_n\}$ be a series of annual maximum flows. The values are assumed to be independent. Some annual peaks may not be floods. In arid and semiarid regions, such as the Southwestern United States, many watersheds have no discharge at all in some years, so zero flows appear in peak flow records. In the United States, one typically works with the logarithms of the annual peaks, denoted $\{X_1, \ldots, X_n\}$, and zero flows are always treated as low outliers that are afforded special treatment.

[33] We consider the sorted data set, $\{X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}\}$, where $X_{(1)}$ denotes the smallest observation (the smallest order statistic) in a sample of size n. The problem is how to determine whether the kth smallest observation in the data set, $X_{(k)}$ (and consequently the smaller observations $X_{(1)}, \ldots, X_{(k-1)}$), is unusually small under the null hypothesis (H0) that $X_1, \ldots, X_n$ are drawn from a population of independent and identically distributed normal variates.

[34] The Grubbs-Beck test [Grubbs and Beck, 1972, hereinafter GBT] is designed to determine if the single smallest observation in a sample is a low outlier. If more than one, say k, observations are below the Grubbs-Beck threshold, all would be considered low outliers. Alternatively, the GBT has been implemented iteratively to identify multiple low outliers. As each outlier is detected, it is removed from the data set and the test is repeated on the remaining sample. The statistical properties of this iterative procedure have not been thoroughly investigated, though the discussion below shows this approach is misguided.

[35] Here we explore a generalization of the GBT, called the “multiple Grubbs-Beck test” (MGBT). The purpose of the new test is to identify multiple potentially influential low flows—observations in the left-hand tail that are not consistent with the rest of the data and the hypothesis of normality. In particular, we consider whether $X_{(1)}, \ldots, X_{(k)}$ are consistent with a normal distribution and the other observations in the sample by examining the statistic

$\omega_k = \dfrac{X_{(k)} - \hat{\mu}_{(k)}}{\hat{\sigma}_{(k)}}$    (5)

where $X_{(k)}$ denotes the kth smallest observation in the sample, and

$\hat{\mu}_{(k)} = \dfrac{1}{n-k} \displaystyle\sum_{j=k+1}^{n} X_{(j)}$    (6)
$\hat{\sigma}^2_{(k)} = \dfrac{1}{n-k-1} \displaystyle\sum_{j=k+1}^{n} \left( X_{(j)} - \hat{\mu}_{(k)} \right)^2$    (7)

[36] The partial mean ($\hat{\mu}_{(k)}$) and partial variance ($\hat{\sigma}^2_{(k)}$) are computed using only the observations larger than $X_{(k)}$ to avoid swamping.
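
A minimal computational sketch of equations (5)-(7) follows; the function name and the assumption that the logarithms of the flows are supplied directly are choices of the sketch.

```python
import numpy as np

def mgbt_statistic(logq, k):
    """Sketch of the multiple Grubbs-Beck statistic of equations (5)-(7): the
    k-th smallest log-flow is standardized by the partial mean and standard
    deviation of the (n - k) observations above it, so the suspected PILFs
    themselves do not influence the statistic."""
    x = np.sort(np.asarray(logq, dtype=float))
    above = x[k:]                       # the n - k observations larger than X_(k)
    mu_k = above.mean()                 # partial mean, equation (6)
    sigma_k = above.std(ddof=1)         # partial standard deviation, equation (7)
    return (x[k - 1] - mu_k) / sigma_k  # omega_k, equation (5)
```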

4.1. Deriving the p Value

[37] This section presents a derivation of the probability density function of $\omega_k$. That distribution can be used to determine quantitatively if the kth smallest observation in a sample of size n is, in fact, unusually small. A low outlier test can subsequently be based upon the p value for $\omega_k$, which is the probability, given H0, of obtaining a value of $\omega_k$ as small as or smaller than that observed with the sample. If that probability is less than, say, 0.10, the k smallest observations could be declared to be low outliers based on the selected significance level.

[38] The p value of interest is given by

$p_k(\eta) = P\left[\,\omega_k \le \eta \mid H_0\,\right]$    (8)

where η denotes the value of the statistic observed in the sample.

[39] Substituting the definition of $\omega_k$ from equation (5) and rearranging the terms yields

$p_k(\eta) = P\!\left[ \dfrac{X_{(k)} - \hat{\mu}_{(k)}}{\hat{\sigma}_{(k)}} \le \eta \right] = P\!\left[ \dfrac{Z_{(k)} - \hat{m}_{(k)}}{\hat{s}_{(k)}} \le \eta \right]$    (9)

where $Z_{(k)}$ is the kth-order statistic in a random standard normal sample of size n, and $\hat{m}_{(k)}$ and $\hat{s}_{(k)}$ are the partial mean and standard deviation of the standard normal sample. Thus the distribution of the final ratio in equation (9) does not depend on the unknown moments, $\mu$ and $\sigma$, of the flood distribution. Hence, without loss of generality, we can limit this investigation to the case where $X_1, \ldots, X_n$ are drawn from a standard normal distribution.
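
A quick numerical illustration of that invariance, using hypothetical location and scale values, is given below.

```python
import numpy as np

def omega_k(x, k):
    """Statistic of equation (5), as in the sketch above."""
    x = np.sort(np.asarray(x, dtype=float))
    return (x[k - 1] - x[k:].mean()) / x[k:].std(ddof=1)

rng = np.random.default_rng(1)
z = rng.standard_normal(50)    # a standard normal sample
x = 3.2 + 0.45 * z             # hypothetical log-space mean (3.2) and standard deviation (0.45)

# omega_k is unchanged by a location-scale shift of the data, so its null
# distribution can be derived once for standard normal samples.
print(np.isclose(omega_k(x, 3), omega_k(z, 3)))   # True
```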

[40] In order to estimate $p_k(\eta)$, we now turn our attention to the three random variables on the right-hand side of equation (9) and present an analytical approach to estimate their distribution.

4.2. Computing the p-Value

[41] The probability $P\left[\omega_k \le \eta\right]$ in equation (9) can be computed from a multiple integral that reflects the distributions of the three random quantities in equation (9): $Z_{(k)}$, $\hat{m}_{(k)}$, and $\hat{s}^2_{(k)}$. We define:

$p_k(\eta) = \displaystyle\int_{-\infty}^{\infty} \int_{0}^{\infty} \int_{z - \eta s}^{\infty} f_{\hat{m}}\!\left(m \mid z, s^2\right)\, f_{\hat{s}^2}\!\left(s^2 \mid z\right)\, f_{Z_{(k)}}(z)\; dm\; ds^2\; dz$    (10)

where

[42] η is the observed value of $\omega_k$,

[43] z is a variable of integration corresponding to the kth smallest observation in the sample,

[44] $f_{\hat{m}}\!\left(m \mid z, s^2\right)$ is the pdf of $\hat{m}_{(k)}$ conditioned on z and $s^2$,

[45] $f_{\hat{s}^2}\!\left(s^2 \mid z\right)$ is the pdf of $\hat{s}^2_{(k)}$ conditioned on z, and

[46] $f_{Z_{(k)}}(z)$ is the pdf of $Z_{(k)}$, the kth order statistic in a standard normal sample of size n.

[47] Evaluating equation (10) presents challenges. First, the integral is semi-infinite and three-dimensional. Second, aside from the trivial case where no low outlier is present, the estimators $\hat{m}_{(k)}$ and $\hat{s}^2_{(k)}$ are not independent. We solve this by using a nearly closed-form analytical approach with direct numerical integration, presented in the next section. The result is a p value for any combination of k and n, which eliminates the need for tables and interpolation schemes for critical values. Tables generated by standard Monte Carlo studies, including the critical deviates in IACWD [1982] and Rosner [1983] (among many others), are limited to discrete (and finite) combinations of n and significance level.

4.3. Direct Numerical Integration

[48] An approximate solution to equation (10) is generated with the following steps and approximations. First, the mean is reparametrized to eliminate the correlation between $\hat{m}_{(k)}$ and $\hat{s}^2_{(k)}$, and moments for the partial mean and standard deviation are derived. Second, ratios of standardized random variables are approximated by noncentral Student t variates. The result is a single univariate integral for $p_k(\eta)$.

[49] Following Cohn [2005, 1988], it is convenient to reparametrize the distribution of $\hat{m}_{(k)}$ conditioned on z in terms of:

display math(11)

where λ is the regression coefficient

display math(12)

[50] Cohn [2005] then modeled the joint distribution of the reparametrized mean in equation (11) and $\hat{s}^2_{(k)}$, respectively, as a normal variate and an independent Gamma variate with parameters that can be derived analytically. Using that approach, we obtain:

display math(13)

where

[51] $\mu_{\hat{m}}$ is the mean of $\hat{m}_{(k)}$,

[52] $\sigma^2_{\hat{m}}$ is the variance of $\hat{m}_{(k)}$,

[53] $\mu_{\hat{s}^2}$ is the mean of $\hat{s}^2_{(k)}$,

[54] $\sigma^2_{\hat{s}^2}$ is the variance of $\hat{s}^2_{(k)}$,

[55] U is the numerator in equation (13), and

[56] L is the denominator in equation (13).

[57] The required moments ($\mu_{\hat{m}}$, $\sigma^2_{\hat{m}}$, $\mu_{\hat{s}^2}$, and $\sigma^2_{\hat{s}^2}$) are derived in Appendix A.

[58] The random variable U is approximately a normal variate with known mean and unit variance, and L can be approximated by the square root of an independent $\chi^2_{\nu}$ variate divided by its degrees of freedom. Thus their ratio, U/L, is approximately a noncentral Student t variate, a well-known distribution whose statistical properties are easily evaluated numerically [Johnson and Kotz, 1970]. The needed p values can then be readily computed as:

display math(14)

where ν denotes the degrees of freedom of the $\chi^2$ distribution (see equation (A12) in Appendix A).

[59] Returning to equation (9), $p_k(\eta)$ can now be evaluated numerically as a single univariate integral:

$p_k(\eta) = \displaystyle\int_{-\infty}^{\infty} P\!\left[\,\omega_k \le \eta \mid Z_{(k)} = z\,\right] f_{k:n}(z)\, dz$    (15)

where $f_{k:n}(z)$ is the pdf of the kth-order statistic in a standard normal sample of size n, given by [David, 1981]:

$f_{k:n}(z) = \dfrac{n!}{(k-1)!\,(n-k)!}\, \Phi(z)^{k-1}\, \left[1 - \Phi(z)\right]^{n-k} \phi(z)$    (16)

[60] Thus equations (14)-(16) should provide an accurate approximation of the p values for $\omega_k$, thereby avoiding the triple integral in equation (10) or the need for tables.
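
A sketch of the outer quadrature in equation (15) is given below. The order-statistic density of equation (16) is written out explicitly, while the conditional probability $P[\omega_k \le \eta \mid Z_{(k)} = z]$ is left as a user-supplied callable, since its noncentral Student t form in equation (14) depends on the Appendix A moments and is not reproduced here.

```python
import numpy as np
from math import lgamma
from scipy import stats, integrate

def order_stat_pdf(z, k, n):
    """Pdf of the k-th order statistic in a standard normal sample of size n
    (equation (16))."""
    logc = lgamma(n + 1) - lgamma(k) - lgamma(n - k + 1)
    return (np.exp(logc) * stats.norm.cdf(z) ** (k - 1)
            * stats.norm.sf(z) ** (n - k) * stats.norm.pdf(z))

def mgbt_p_value(eta, k, n, cond_cdf):
    """Outer integral of equation (15).

    cond_cdf(eta, z) must return P[omega_k <= eta | Z_(k) = z]; in the paper
    this is the noncentral Student t approximation of equation (14)."""
    integrand = lambda z: cond_cdf(eta, z) * order_stat_pdf(z, k, n)
    p, _ = integrate.quad(integrand, -np.inf, np.inf)
    return p
```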

5. Validation of p-Value Approximation Accuracy

[61] Where only one observation is suspected of being a low outlier (k = 1), critical values computed with the MGBT were compared with published values in Grubbs and Beck [1972] to assess the precision of the MGBT approximations for $K_n$ (using the relationships in section A8). The largest error, expressed as a percentage difference in $K_n$, occurred with n = 10 and was about 0.5% of $K_n$. The errors decreased monotonically with n; for larger samples the maximum error was less than 0.2%, and for the largest samples considered it was less than 0.1%. This comparison indicates the derived formula accurately reproduces the existing published values for this special case.

[62] Monte Carlo experiments were used to evaluate the accuracy of $p_k$ estimated with equations (14)-(16). The experiments consist of essentially three steps (a computational sketch follows the list):

[63] 1. generate a sample of n standard normal iid variates, $\{Z_1, \ldots, Z_n\}$;

[64] 2. compute the approximate p value $\hat{p}_k$ for selected values of k using equations (14)-(16); and

[65] 3. determine whether the computed value of $\hat{p}_k$ for each sample is less than each of several nominal significance levels (ranging up to 50%).
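
A compact sketch of the experiment follows; p_value(sample, k) stands in for the approximation of equations (14)-(16) and is an assumed interface, not a function defined in this paper.

```python
import numpy as np

def mc_rejection_rate(n, k, alpha, p_value, n_reps=100_000, seed=2013):
    """Monte Carlo check of section 5: under H0 (iid standard normal data) the
    rejection rate of the MGBT should be close to the nominal level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        z = rng.standard_normal(n)       # step 1: simulate an H0 sample
        if p_value(z, k) < alpha:        # steps 2-3: approximate p value vs. alpha
            rejections += 1
    return rejections / n_reps           # compare with the nominal alpha
```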

[66] Figure 2 shows the rejection rate for the MGBT based on equation (15) for a range of sample sizes, at selected order statistics and significance levels. The results in Figure 2 are based on 100,000 replicate samples and are accurate to at least three significant digits. The rejection rates are generally quite close to the nominal significance levels for all α considered. As the sample size increases, the approximation in equation (15) becomes more accurate. The precision of the approximation, even in samples of size 10, is remarkable.

Figure 2.

Observed rejection rate for approximate Multiple Grubbs-Beck Test for sample sizes inline image, number of suspected low outliers inline image and nominal significance levels inline image (shown as red line), based on 100,000 simulated normal samples for each sample size. The number of outliers k was rounded down to integers for samples resulting in noninteger values.

6. Examples

[67] Two flood frequency examples illustrate the p value computations and the identification of PILFs. Annual maximum 1 day rainfall flood flows from the Sacramento River at Shasta Dam, California are revisited. Annual peak-flow data from Orestimba Creek, California demonstrate the challenge of flood records with many PILFs.

[68] For the 1932–2008 Sacramento River record, Figure 3 depicts p values for the low outlier criterion with successive numbers k of candidate observations suspected of being low outliers. The horizontal line on the graph represents the 10% significance level; if the p value for the kth smallest observation is below the line, that observation and those smaller could be called low outliers. For the Sacramento River, this is the case for the smallest three observations—that is, $k \le 3$. This is entirely consistent with the visual conclusion that there are three low outliers in the flood series (Figure 1). Unfortunately, the standard GBT identifies only a single low outlier for this data set. This is not surprising because the test is designed to find a single low outlier and does not consider the actual distribution of the second or third smallest observations, as does the new MGBT.

Figure 3.

Computed p values that correspond to the suspected number of low outliers for the Sacramento River flood series from 1932 to 2008 (n = 77). For an α equal to 0.10 (line), three low outliers are identified.

[69] Figure 4 displays the annual peak flow series from 1932 to 1973 for Orestimba Creek, California (U.S. Geological Survey 11274500) [IACWD, 1982, see 12–32]. The data set is discussed in Bulletin 17B and presents a case with an unusually large number of low outliers. The series includes 6 zero flows. Zero flows show conclusively that the LP3 distribution cannot describe the full range of observed flood flows because zero flows lie outside the support of the fitted LP3 distribution. The standard GBT identifies only one nonzero low outlier. It is safe to conclude that the data set contains at least seven low outliers including zeros, but should we conclude it contains more than seven?

Figure 4.

Annual-peak flow time series for Orestimba Creek, California (11274500), with data from 1932 to 1973. Zero flows are identified by triangles indicating they are less than 10 cfs.

[70] Figure 5 shows four fitted frequency curves where low outlier thresholds have been set at flow thresholds of T = 1, T = 10, T = 100, and T = 1000 cfs, respectively. The zeros and other values below the low outlier thresholds have been recoded as censored observations of “less than T” cfs, respectively. In the case of a low outlier threshold of T = 1000 cfs, the fit between the model and the remaining observed flood series appears to be reasonably good; the fit is not very good for the cases employing smaller flow thresholds. However, employing a low outlier threshold of T = 1000 cfs implies censoring 19 of 42 observations. Is 19 a reasonable number of small flood values to treat as PILFs?

Figure 5.

LP3 distribution fit using the Expected Moments Algorithm (EMA) to annual peak flow data from Orestimba Creek, California (11274500), where all values less than T cfs have been recoded as “less-than T cfs,” for T equal to 1, 10, 100, and 1000, respectively.

[71] Figure 6 presents p values corresponding to the MGBT for $k = 7, \ldots, 21$ as a proposed guide for identifying PILFs. Because the six smallest observations (the zero flows) are low outliers, we consider the 7th through 21st smallest values as possible low outliers in the sample. The horizontal line in Figure 6 represents the 10% significance level; if the p value for the kth smallest observation is below the line, that observation and all smaller values can be considered to be low outliers. For Orestimba Creek, this is the case for $k = 20$. Figure 7 shows that, employing a low outlier threshold of 1200 cfs (so that the 20 smallest observations, including the zero flows, are recoded as censored), the LP3 distribution fits the censored data set reasonably well with a skew near zero.
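
The screening illustrated in Figures 3 and 6 can be organized as in the following sketch, in which zero flows are counted as automatic low outliers and p_value(k, n, logq) again stands in for equations (14)-(16); the simple dictionary of p values mirrors the figures and is not the operational identification algorithm of Lamontagne et al. [2013].

```python
import numpy as np

def pilf_p_values(q, p_value, k_max):
    """p values for successive candidate counts of low outliers (cf. Figure 6).

    q       : annual peak flows, zeros allowed;
    p_value : callable p_value(k, n, logq), where k is the rank in the full
              record of size n and logq holds the logs of the nonzero flows.
    Zero flows are automatic low outliers, so testing starts at rank m + 1,
    where m is the number of zeros."""
    q = np.sort(np.asarray(q, dtype=float))
    n = q.size
    m = int((q == 0.0).sum())          # zero flows: automatic PILFs
    logq = np.log10(q[m:])             # logarithms of the nonzero flows
    return {k: p_value(k, n, logq) for k in range(m + 1, k_max + 1)}
```

Candidate counts whose p values fall below the chosen significance level (0.10 in the figures) are the ones flagged as PILFs.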

Figure 6.

Computed p values that correspond to the suspected number of low outliers for the Orestimba Creek flood series from 1932 to 1973 so that n = 42. Note that 6 zero flows have been omitted because zero flows are automatically classified as low outliers. For an α equal to 0.10 (red line), 20 low outliers are identified.

Figure 7.

LP3 distribution fit using the Expected Moments Algorithm (EMA) to annual peak flow data from Orestimba Creek, California (11274500), where 20 values less than 1200 cfs have been recoded as “less than 1200 cfs.”

[72] These two examples also illustrate the challenge of developing an objective algorithm for identifying the number of observations in a record to be labeled as outliers or as PILFs. For example, with the Shasta record, if one intended to determine the number of PILFs by testing successively k = 1, k = 2, k = 3 (etc.) at the 5% level, one would test the first observation (k = 1), find a p value > 5%, and thus would stop even though larger values of k are significant at the 0.1% level (Figure 3). So, when records have several PILFs, the smallest may not be statistically significant because of the masking effect of the other unusual values.

[73] The Orestimba record further illustrates the challenge of deciding how many observations in a record should be labeled as PILFs: the record has 6 zero flows plus a single value of 10 cfs, which would be treated as the seventh smallest observation. However, even on the log-scale (Figure 4), five values in the range of 100–200 cfs are likely to be appropriate values to treat as PILFs. On the other hand, larger observations such as those above the lower quartile have relatively little leverage on estimated design flood flows, and thus we are very reluctant to classify them as PILFs. Moreover, when a decision results from several separate (but not independent) hypothesis tests, the overall type I error is not immediately clear, though it will be at least as large as the largest type I error for the individual tests. Rosner [1983] illustrates the concern with the overall type I error for a test that involves multiple individual tests on different order statistics.

[74] The p values generated by the relationships derived here can serve as the basis of an outlier identification algorithm. They are the basis of several algorithms considered in Lamontagne et al. [2013], which use a different type I error for large-order and small-order statistics in step-forward (median toward smallest observation) and step-backward (smallest observation toward the median) sweeps.

7. Discussion and Interpretation

[75] Two independent approaches have been used to evaluate the integral in equation (10):

[76] 1. An almost analytical result based on the approximate joint distribution of $\hat{m}_{(k)}$ and $\hat{s}^2_{(k)}$, resulting in a univariate numerical integral; and

[77] 2. Monte Carlo simulation experiments.

[78] The semianalytical equation was found to provide reasonably accurate p values for the relevant cases when $n \ge 10$.

[79] Spencer and McCuen [1996] raise three concerns with the Bulletin 17B outlier detection procedure. One concern is that Bulletin 17B only provides critical values for a test with $\alpha = 10\%$. While it is unclear why this is problematic for a uniform procedure for flood frequency analysis, the p-value approximation derived here can be used to test for outliers at any desired significance level. A second concern is that the Bulletin 17B outlier test is for only a single outlier. The p-value approximation derived here can be used to test if any order statistic is unusually small, and can be iteratively applied in a many outlier test as described by Lamontagne et al. [2013].

[80] Their final concern is that the test assumes the sample is drawn from a normal distribution (equivalent to an LP3 with zero skew) as its null hypothesis. The p-value approximation derived here also assumes the sample is normally distributed, so it is important to consider this concern. An objective here is to reduce the sensitivity of moment estimators to outlying observations in the left-hand tail. Thus the issue is whether our fitted distribution describing the frequency of large floods is unduly influenced by the smallest observations in a sample. A skewness coefficient less than zero indicates that the smaller observations in a sample are more important than the larger observations, which is a situation we want to avoid. Thus a test employing a threshold based on a skew of zero should yield reasonable results. If the true skew happens to be positive, few outliers will be identified, and that causes no problem because the retained small values will have relatively little influence on sample moments. If the true skew is negative, then many of the smallest flows will be identified as PILFs, which also works well because we do not want the magnitude of unusually small floods to have too large an effect on the frequency distribution derived for large events.

[81] There is an additional concern with the logic of employing the standard GBT when more than a single observation is zero or suspected of being a low outlier. If a sample has m zeros, then the kth smallest nonzero observation is an outlier if it is unusually small when considered to be the (k + m)th smallest observation in a sample of size n. This is a more stringent test than the common misguided practice of treating the smallest nonzero retained observation as the smallest observation in an independent sample of size n − m. Similarly, if the GBT identifies one outlier in a sample of size n, then treating the second smallest observation as if it were the smallest observation in an independent sample of size n − 1 does not yield a statistical test with the anticipated type I error. The second smallest observation is indeed the second smallest observation in a sample of size n; the MGBT proposed here reflects that reality.

[82] There might be several reasons to be concerned about removing or censoring the smallest observations in a sample. First, efficient censored-data estimators are more challenging conceptually and numerically than the relatively straightforward closed-form estimators that employ complete data sets [Griffis et al., 2004]. The availability of good statistical software solves that problem.

[83] The second concern is that censoring might result in a loss of information. However, the impact of censoring the smallest observations—describing them only as “less-than” values—does not result in much loss of effective sample size if one is concerned about estimating the frequency of large floods. From a statistical as well as hydrological perspective, the smaller observations contain relatively little information pertaining to the magnitude of large quantiles $Q_T$ for long return periods T. Kroll and Stedinger [1996] demonstrate that when fitting the two-parameter lognormal distribution with efficient algorithms, censoring up to the 60th percentile of the sample does not noticeably diminish the precision of the 10 year flood estimator. Figure 4 of Cohn et al. [1997] and Figures 2 and 3 of Griffis [2008] show that when fitting the three-parameter LP3 distribution, censoring even at the 90th percentile level has very little impact on the precision of 100 year flood estimators.

[84] Part of the confusion is related to how we visualize flood data. Traditional probability plots display the noncensored observations versus their plotting positions and give little hint of the additional information provided by the knowledge that k censored observations were less than the smallest retained observation. Furthermore, one fails to realize that the smallest retained observation has the precision of the (k + 1)th smallest observation in a sample of size n, and not the smallest in a sample of size n − k. Thus, while it may not be consistent with our intuition, sampling experiments confirm that if samples are analyzed efficiently, the precision of t-year flood estimators for large t obtained by fitting traditional three-parameter distributions is not substantially affected by censoring a major fraction of the smallest observations in a sample.

[85] More to the point, Lamontagne et al. [2013] show that use of a forward-backward MGBT (instead of GBT or no PILF identification) to fit the LP3 distribution with EMA increased the precision of design flood quantile estimators. This perhaps surprising result occurs because the log-moments employed by EMA are not the most statistically efficient estimators for the LP3 distribution, particularly when the log-space skew is far from zero.

8. Conclusions

[86] This paper introduces a nearly closed-form approximation of the p value of a generalization of the Grubbs-Beck (1972) test statistic that can be used to objectively identify multiple potentially influential low flows in a flood series. It can be used to construct a one-sided variation of the Rosner [1983] test. Once such low outliers have been identified, they can be treated as “less-than” values, thereby increasing robustness of the frequency analysis without substantially reducing the precision with which moments-based fitting methods such as EMA [Cohn et al., 1997; Griffis et al., 2004] can estimate the flood quantiles of interest.

Appendix A: Derivation of the Parameters

[87] Key relations in equations (13) and (14) make use of several means and variances for critical statistics. Formulas for the computation of those values are given here, along with brief derivations.

A1. Moments of the Truncated Normal Distribution

[88] First, we compute the moments of a truncated Normal distribution. Let

$\phi(t) = \dfrac{1}{\sqrt{2\pi}}\, \exp\!\left(-\dfrac{t^2}{2}\right)$    (A1)
$\Phi(z) = \displaystyle\int_{-\infty}^{z} \phi(t)\, dt$    (A2)

[89] Then $E\!\left[Z^k \mid Z > z\right]$ is given by:

$E\!\left[Z^k \mid Z > z\right] = \dfrac{1}{1 - \Phi(z)} \displaystyle\int_{z}^{\infty} t^k\, \phi(t)\, dt$    (A3)

[90] Noting that

$\dfrac{d\phi(t)}{dt} = -t\,\phi(t)$    (A4)

one can then integrate by parts, taking care to note the limits on the integral, to yield:

$\displaystyle\int_{z}^{\infty} t^k\, \phi(t)\, dt = z^{k-1}\,\phi(z) + (k-1) \displaystyle\int_{z}^{\infty} t^{k-2}\, \phi(t)\, dt$    (A5)

[91] This leads immediately to the recursion given in the final line of Table A1. This result is significant because it provides the expected value (as a function of the number of low outliers or PILFs) that is needed for estimating moments.

Table A1. Conditional Moments $E\!\left[Z^k \mid Z > z\right]$ of the Standard Normal Distribution, Where $h(z) = \phi(z)/\left[1 - \Phi(z)\right]$

k    $E\!\left[Z^k \mid Z > z\right]$
0    1
1    $h(z)$
2    $1 + z\,h(z)$
3    $\left(z^2 + 2\right) h(z)$
4    $3 + \left(z^3 + 3z\right) h(z)$
k    $z^{k-1}\, h(z) + (k-1)\, E\!\left[Z^{k-2} \mid Z > z\right]$
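
As an illustrative check of the k = 1 and k = 2 rows (added here for convenience, not part of the original derivation), the moments implied by Table A1 can be compared with scipy's truncated normal distribution:

```python
import numpy as np
from scipy import stats

z = 0.7                                    # an arbitrary truncation point
h = stats.norm.pdf(z) / stats.norm.sf(z)   # h(z) = phi(z) / [1 - Phi(z)]

m1 = h                                     # E[Z   | Z > z], Table A1, k = 1
m2 = 1.0 + z * h                           # E[Z^2 | Z > z], Table A1, k = 2

tn = stats.truncnorm(a=z, b=np.inf)        # standard normal truncated to (z, inf)
print(np.isclose(m1, tn.mean()), np.isclose(m2, tn.moment(2)))   # True True
```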

A2. The Asymptotic Distribution of $\hat{m}_{(k)}$ Given z

[92] This section presents the derivation of the mean $\mu_{\hat{m}}$ and the variance of the mean $\sigma^2_{\hat{m}}$ needed in equation (13). We restate equation (6), noting the use of the variate Z instead of X:

$\hat{m}_{(k)} = \dfrac{1}{n-k} \displaystyle\sum_{j=k+1}^{n} Z_{(j)}$    (A6)

[93] The moments of $\hat{m}_{(k)}$ are directly computed from the results in equation (A5) above.

$\mu_{\hat{m}} = E\!\left[\hat{m}_{(k)} \mid Z_{(k)} = z\right] = h(z)$    (A7)

where $h(z) = \phi(z)/\left[1 - \Phi(z)\right]$, as in Table A1. One can now estimate the variance of this mean, using the standard equation for a variance and the key results from Table A1.

$\sigma^2_{\hat{m}} = \mathrm{Var}\!\left[\hat{m}_{(k)} \mid Z_{(k)} = z\right] = \dfrac{1 + z\,h(z) - h(z)^2}{n-k}$    (A8)

A3. The Approximate Distribution of $\hat{s}^2_{(k)}$ Given z

[94] This section presents the derivation of the variance $\mu_{\hat{s}^2}$ and the variance of the variance $\sigma^2_{\hat{s}^2}$ needed in equation (13). We restate equation (7), noting the use of the variate Z instead of X:

$\hat{s}^2_{(k)} = \dfrac{1}{n-k-1} \displaystyle\sum_{j=k+1}^{n} \left( Z_{(j)} - \hat{m}_{(k)} \right)^2$    (A9)

[95] As shown by Cohn [1988], $\hat{s}^2_{(k)}$ is asymptotically a gamma variate with pdf:

display math(A10)

[96] The parameters α, ν, and β are determined from various moments:

display math(A11)
display math(A12)
display math(A13)

where results from Table A1 are used to obtain

display math(A14)

[97] Using the equation for the variance [Stuart and Ord, 1987, 10.13]

display math

where

display math(A16)

A4. Covariance of $\hat{m}_{(k)}$ and $\hat{s}^2_{(k)}$

[98] The covariance between $\hat{m}_{(k)}$ and $\hat{s}^2_{(k)}$ is given by:

$\mathrm{Cov}\!\left[\hat{m}_{(k)}, \hat{s}^2_{(k)} \mid z\right] = \dfrac{\mu_3(z)}{n-k}$    (A17)

where $\mu_3(z)$ is the third central moment of the standard normal distribution truncated below at z, obtainable from Table A1.

A5. Approximating the Joint Distributions of inline image and inline image

[99] If $\hat{s}^2_{(k)}$ is described as a Gamma variate with parameters $(\alpha, \nu, \beta)$, the moments of $\hat{s}_{(k)}$ can be expressed as the fractional moments of $\hat{s}^2_{(k)}$ [Johnson et al., 1994, equation (18.13), p. 421]:

display math(A18)

and specifically

display math(A19)

[100] Combining equations (A14) and (A19), and then recalling the definition of variance,

display math(A20)

A6. Covariance of $\hat{m}_{(k)}$ and $\hat{s}_{(k)}$

[101] Applying a first-order approximation for the moments of a function of a random variable [Stuart and Ord, 1987] yields

display math(A21)

A7. Moments of inline image

[102] Based on the means, variances and covariances previously obtained, we now estimate the moments of inline image for equation (13).

display math(A22)

and

display math(A23)

[103] Those are the required results.

A8. Relationship of the GBT Critical $K_n$ and the Critical MGBT Statistic

[104] For the case of a single outlier (k = 1) and a desired significance level α, the critical Grubbs and Beck [1972] statistic $K_n$ and the critical value of the MGBT statistic $\omega_1$ can be compared by noting

display math(A24)

or conversely

display math(A25)

[105] By setting $\alpha = 10\%$ and using the relationships in equations (A24) and (A25), the critical MGBT statistic derived from the approximations in this work can be compared to the GBT critical statistic published in section A4 of Bulletin 17B.
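
As a numerical illustration of such a conversion, the following sketch checks an algebraic relation between the two statistics that follows directly from their definitions (equations (2) and (5) with k = 1); it is offered as a consistency check and may not match the exact parametrization printed as equations (A24) and (A25).

```python
import numpy as np

def omega_1(x):
    """MGBT statistic of equation (5) for k = 1."""
    x = np.sort(np.asarray(x, dtype=float))
    above = x[1:]
    return (x[0] - above.mean()) / above.std(ddof=1)

def gb_deviate(x):
    """Grubbs-Beck deviate (x_bar - x_(1)) / s implied by equation (2)."""
    x = np.asarray(x, dtype=float)
    return (x.mean() - x.min()) / x.std(ddof=1)

def omega_to_k(w, n):
    """Convert omega_1 to the equivalent Grubbs-Beck deviate for a sample of
    size n (an assumed form derived from the definitions; see lead-in)."""
    return -w * (n - 1) / n * np.sqrt((n - 1) / ((n - 2) + (n - 1) * w ** 2 / n))

x = np.random.default_rng(7).standard_normal(30)
print(np.isclose(gb_deviate(x), omega_to_k(omega_1(x), x.size)))   # True
```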

Acknowledgments

[106] The authors would like to thank the members of the Hydrologic Frequency Analysis Workgroup, and Will Thomas Jr., Beth Faber, and Martin Becker in particular, for providing thoughtful analyses and rich discussion that led to this research.
