Keywords:

  • sub-annual maxima;
  • n-day maxima;
  • Method of Independent Storms;
  • Peak-over-threshold;
  • Poisson process;
  • Weibull distribution;
  • Fisher Tippett Type 1 distribution;
  • mixed climates

Abstract

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. Estimating the mean reduced variate from the order statistics
  5. 3. Implementation of MIS and LM&S methods
  6. 4. Conclusions
  7. Acknowledgements
  8. REFERENCES

This paper consolidates recent advances in methodologies in extreme-value analysis of wind speeds by using sub-annual maxima in conjunction with exact and penultimate extreme-value models. By avoiding asymptotic models and the associated issues of asymptotic convergence, the consolidated methodology is able to extend analysis further into the lower tail, greatly increasing the statistical confidence. The standard error in design predictions of dynamic pressure is reduced to less than a third of the corresponding standard error from annual maxima. The methodology is demonstrated by re-analysing wind speed data from previously published studies at sites in simple and in various mixed-mechanism climates. Copyright © 2013 Royal Meteorological Society


1. Introduction

The purpose of this paper is to consolidate recent advances in the analysis of independent sub-annual maximum wind speeds in simple and in mixed climates. The paper consolidates the joint model for extremes in mixed climates of Gomes and Vickery (1978) with the Method of Independent Storms (MIS) for continuous data (Cook, 1982), and its developments, IMIS and XIMIS (Harris, 1999, 2009), and with the method of n-day maxima for daily maxima and peak-over-threshold (POT) data (Simiu and Heckert, 1996) and its development, LM&S (Lombardo et al., 2009). The consolidated methodology is demonstrated by re-analysing wind speed data from previously published studies.

1.1. Methods of obtaining independent sub-annual maxima

The Method of Independent Storms (MIS) has been available since 1982 (Cook, 1982). The original extraction methodology has not changed significantly, but the method of fitting the data to a statistical model has improved in increments. IMIS (Harris, 1999) fitted the sub-annual maxima to the asymptotic Fisher Tippett type 1 (FT1) distribution, limiting the bottom end of the fitting range to the reduced variate of the lowest annual mean in the sample, typically to y ≈ − 1.2. The reason for this limit was that the bottom tail of wind speed data is limited at V = 0, whereas the bottom tail of the asymptotic FT1 model has no lower limit, and this region of disparity should be excluded from any fit. Thus, the original advantage in using sub-annual maxima was confined to the additional data points from the kth-highest maxima, k = 2, 3, etc., that lie between the highest and lowest annual maxima in the observations, an increase in data of about a factor of 3. It was also noted (Cook, 1982) that the annual rate of independent events was insufficient to achieve convergence to the FT1 asymptote unless the upper tail of the parent CDF was already close to being exponential, leading to the recommendation of dynamic pressure as the variable instead of wind speed in the UK.

This field lay fallow for some years, with observed curvature in the classical FT1 ‘Gumbel’ plot often being attributed to Type 2 or Type 3 behaviour instead of to non-convergence. The introduction of the Generalised Pareto Distribution (GPD) to characterize peak-over-threshold (POT) data, which requires complete convergence for its validity, led to vigorous debate (e.g. Galambos and Macri, 1999; Holmes, 2002; Simiu and Lechner, 2002), and new developments of extreme-value (EV) theory showed the GPD to be inappropriate for wind data (Harris, 2005). Instead, it was argued (Cook and Harris, 2004, 2008) that asymptotic models should be replaced by exact or penultimate models that avoid the issue of convergence. Accordingly, as XIMIS, Harris (2009) extended the IMIS methodology to accommodate penultimate statistical models.

Implementation of MIS (IMIS or XIMIS) requires continuous data in order to identify individual storm systems, so that the maximum wind speeds extracted from each storm are independent and the multiplication law of probability applies to these events. As the storm maxima are the outcome of discrete independent trials, the resulting distribution of annual maxima is modelled exactly by the Binomial distribution. When the annual rate of storms is large and/or the probability of exceedance is small, i.e. in the upper tail where P → 1, the simpler Poisson process model can be used (see Appendix A). A key indicator for the applicability of the Poisson process model is that the time interval between such events should be exponentially distributed (Palutikof et al., 1999), referred to here as the Poisson recurrence model. Figure 1(a) shows the distribution of time between storms extracted by MIS from a 30 year record of hourly mean wind speeds at Boscombe Down, UK, plotted on axes that linearize the exponential distribution. The 5–95% confidence limits shown here, and throughout this paper, were obtained by ‘bootstrapping’ the fitted parameters, using the methodology described in Cook (2004). As the observations fit reasonably well within the confidence limits, it is reasonable to assume that the Poisson recurrence model applies to MIS data.

Figure 1. Distribution of time between events for MIS and LM&S methods

Often only daily maxima or POT data are available, for which several methods have been proposed. Building on the example of Jensen and Franck (1970), Simiu and Heckert (1996) introduced the concept of ‘n-day maxima’: maximum values from successive periods, each of n days' duration, with a minimum separation of n/2 days between events imposed to eliminate correlation. Data obtained by this method, although independent, will always fail the key indicator for Poisson recurrence because each period produces an event, so the separation times must all fall between the fixed limits of n/2 and 2n. An improved method (LM&S), recently proposed by Lombardo et al. (2009) for discontinuous POT data, extracts all maxima that are separated by a specified minimum time interval, but does not set an upper limit to the time between events. Figure 1(b) and (c) show the distributions of time between events for a 2 and 16 day minimum separation, respectively. As these indicate that the time interval between LM&S events is not exponentially distributed, the applicability of the Poisson process as the model for LM&S data relies on the rule-of-thumb limits given in Appendix A and empirical verification.
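
The idea of a minimum separation with no upper limit can be illustrated in code. The following Python fragment is a minimal sketch only: it assumes a greedy, largest-first selection, which is one common declustering scheme and is not taken from Lombardo et al. (2009). The function name select_separated_maxima and the sample data are hypothetical.

```python
def select_separated_maxima(days, values, min_sep_days):
    """Greedy selection (an assumed scheme): accept the largest remaining value
    if it lies at least min_sep_days from every value already accepted."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    kept = []
    for i in order:
        if all(abs(days[i] - days[j]) >= min_sep_days for j in kept):
            kept.append(i)
    kept.sort(key=lambda i: days[i])          # restore time order
    return [(days[i], values[i]) for i in kept]


# Hypothetical daily maxima: day index and value in arbitrary units.
days = [0, 1, 2, 5, 6, 10]
values = [10, 30, 20, 25, 15, 5]
events = select_separated_maxima(days, values, min_sep_days=2)
gaps = [b[0] - a[0] for a, b in zip(events, events[1:])]
print(events)   # selected (day, value) pairs
print(gaps)     # separations: bounded below by min_sep_days, unbounded above
```

Because the separations have a lower bound but no upper bound, their distribution need not be exponential, which is why the Poisson recurrence check can fail for data of this kind.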

1.2. Extreme-value models

1.2.1. Exact distribution of extremes

The Binomial gives the exact distribution, Φ, of the maximum, x̂, of r values drawn from a parent distribution, P, of independent events, x, as:

  • $\Phi(\hat{x}) = [P(\hat{x})]^{r}$ (1)

from which it is clear that the form of the extreme distribution depends on the form of the parent for all finite values of r. When assessing annual maxima, r is the annual rate of all independent events and represents the upper limit to the rate of events that can be extracted from the wind record by MIS, LM&S or similar methods.
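
A short numerical illustration of this dependence on the parent may help. The sketch below raises a Weibull-form parent to the power r; the scale, shape and rate values are illustrative assumptions and are not fitted to any data in this paper.

```python
import math

def weibull_cdf(v, c, w):
    """Parent CDF of Weibull form, P(V) = 1 - exp(-(V/C)^w)."""
    return 1.0 - math.exp(-((v / c) ** w))

def exact_max_cdf(p, r):
    """CDF of the maximum of r independent draws from a parent with CDF value p."""
    return p ** r

# Illustrative (not fitted) values: scale C = 10, shape w = 2, r = 150 events per year.
C, W, R = 10.0, 2.0, 150
for v in (20.0, 25.0, 30.0):
    p = weibull_cdf(v, C, W)
    print(f"V = {v:4.1f}  P = {p:.6f}  Phi = {exact_max_cdf(p, R):.4f}")
```

Changing w or r changes the shape of Φ directly, without any appeal to an asymptotic form.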

1.2.2. Parents of the exponential type

The parent distribution, P(x), of any variate, x, can always be expressed as:

  • $P(x) = 1 - \exp[-h(x)]$ (2)

Fisher and Tippett showed that the form of (1) converges towards one of three possible types as x → ∞.

  1. When h(x) → ∞ more rapidly than ln(x), Φ(x̂) converges towards Type 1. In this case, P(x) is called a parent of the exponential type. Here, h(x) is a slowly increasing function of x which penultimately behaves like x^w as x → ∞ and ultimately behaves like x. For example, consider h(x) = x^2, in which case w = 2: with very large values of x, say x = 100, 101, 102…, h(x) = 10 000, 10 201, 10 404… ≈ 10 000 + 200(x − 100), i.e. h(x) behaves like a linear function of x.

    The Type 1 distribution is given penultimately (Cook and Harris, 2004, 2008) by:

    • $\Phi(\hat{x}) = \exp\{-\exp[-(h(\hat{x}) - \ln n)]\}$ (3)

    and asymptotically as n → ∞ to:

    • $\Phi(\hat{x}) = \exp\{-\exp[-(\hat{x} - U)/C]\}$ (4)

    The FT1 distribution is unlimited in the upper tail and has an exponential asymptote. The standard reduced variate, 〈y〉 for the expectation 〈h(x)〉, the ensemble mean from an infinite number of random trials, is from Equation (3):

    • $\langle y \rangle = \langle -\ln(-\ln \Phi) \rangle = \langle h(x) \rangle - \ln(n)$ (5)

    Hence, when observations of h(x) are plotted as abscissa against 〈− ln(−ln(Φ))〉 as ordinate, a straight line is expected, with an intercept of ln(n). The conventional ‘Gumbel plot’, in which observations of x̂ are plotted as abscissa against 〈− ln(−ln(Φ))〉 as ordinate, will be a straight line when w = 1, a concave-upwards curve when w < 1, a concave-downwards curve when w > 1, and the intercept is U^w.

  2. When h(x) → ∞ less rapidly than ln(x), Φ(x̂) converges towards Type 2. This is also unlimited in the upper tail, increasing faster than Type 1. Type 2 distributions appear on Gumbel plots as concave-upwards curves, hence mimic Type 1 with w < 1.

  3. When h(x) → L, a finite upper limit, Φ(x̂) converges towards Type 3. This appears on a Gumbel plot as a concave-downwards curve, straightening as it approaches the limit, L, but in the region of the mode (where most of the observations lie) mimics Type 1 with w > 1.

    When observations on a Gumbel plot form a curve, it is not possible to distinguish between Type 1 with w ≠ 1 and the corresponding Type 2 or Type 3 behaviour without a priori knowledge of the form of h(x). In the case of wind speeds, observations in single-mechanism climates are invariably very well represented by the Weibull distribution:

    • $P(V) = 1 - \exp[-(V/C)^{w}]$ (6)

    and in mixed climates by the disjoint sum of two or more Weibull distributions. This is a distribution of the exponential type and leads, via the penultimate FT1 distribution, to the Type 1 distribution as the asymptote for extreme wind speeds.

1.2.3. Penultimate distributions of extremes

Recently, attention has been focussed on avoiding the issues of asymptotic convergence in the upper tail by discarding asymptotic models in favour of exact, or penultimate, models. Note that here, and later in this paper, the term ‘exact’ refers specifically to the relationship between the extreme and the corresponding parent. In all real-world data there is always uncertainty associated with the finite sample size, so this does not imply that ‘exact’ models provide ‘exact’ values.

Cook and Harris (2004, 2008) proposed Equation (3) as the penultimate model for wind speeds because parent wind speed data conform closely to the Weibull distribution, where h(x) = (V/C)^w. This model is exact for Weibull parents and is the penultimate distribution for all parents of the exponential type.

In developing XIMIS by extending MIS/IMIS to accommodate penultimate models, Harris (2009) re-examined the role of the Poisson process model:

  • $\Phi = \exp[-r(1 - P)]$ (7)

This is the form of the extremes of independent events that follow the Poisson recurrence model at an average rate, r, for any form of the parent, P. For the derivation of Equation (7) see Cook et al. (2003). More generally, Equation (7) provides a very good approximation to Equation (1) within the rule-of-thumb limits given in Appendix A. Although Equation (7) is independent of any model, it uniquely links the extreme to the parent, so selecting a model for one also selects the model for the other. For parents of the exponential type, P = 1 − exp(−h(x)), Equation (7) becomes (3), the penultimate FT1 model of Cook and Harris (2004, 2008). The standard reduced variate evaluates from (7) as:

  • $y = -\ln(-\ln \Phi) = -\ln(1 - P) - \ln(r)$ (8)

Where the events do not follow a Poisson process, Harris (2009) demonstrated that significant differences between (8) and the series expansion of the exact expression (1) are confined to the lower tail, i.e. as P → 0, and are equivalent in size to the error in the Cauchy approximation in the derivation of (3) (see Cook and Harris, 2004 and Appendix A, here).

Figure 2 compares the Poisson reduced variate 〈y〉 from Equation (8) with the asymptotic FT1 for various values of rate, r. The upper tails are convergent, but the lower tails are different: the Poisson process model is limited at y = − ln(r) whereas the asymptotic FT1 is unlimited in the lower tail (effectively − ln(r) → − ∞). As MIS data extend well into the lower tail, this limit has important implications for the fitting range.

Figure 2. Poisson reduced variate (ordinate) compared with the asymptotic FT1 reduced variate (abscissa) for various rates, r
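
The behaviour described for the two ordinates can be reproduced numerically. The sketch below is an illustration under two stated assumptions: the Poisson ordinate is taken as −ln(1 − P) − ln(r), and the FT1 ordinate as −ln(−ln Φ) with Φ = P^r. Whether the published figure was constructed in exactly this way is not confirmed here, and the function names are hypothetical.

```python
import math

def y_ft1(p, r):
    """FT1 ordinate -ln(-ln(Phi)) with Phi = P**r, i.e. -ln(-ln P) - ln r."""
    return -math.log(-math.log(p)) - math.log(r)

def y_poisson(p, r):
    """Poisson ordinate -ln(1 - P) - ln r; equals -ln r at P = 0."""
    return -math.log(1.0 - p) - math.log(r)

for r in (1, 10, 100):
    for p in (0.01, 0.5, 0.99, 0.9999):
        print(f"r = {r:3d}  P = {p:6.4f}  "
              f"y_FT1 = {y_ft1(p, r):7.3f}  y_Poisson = {y_poisson(p, r):7.3f}")
```

For P close to 1 the two ordinates agree closely, while for small P the Poisson ordinate approaches its floor of −ln(r) and the FT1 ordinate continues to fall.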

2. Estimating the mean reduced variate from the order statistics

2.1. The order statistics

The order statistics of a sample are obtained by ranking the values in ascending or descending order of value. Conventional EVA uses the rank from the bottom in ascending order, usually denoted by m for the mth smallest value. However, some of the relevant expressions are simpler when the data are ranked in descending order of value, with rank denoted by ν for the νth largest value, or when probability P is replaced by its complement, Q = 1 − P (see Harris, 1999, 2009).

The expectation, 〈ym〉, of the mth out of N ranked values of any function, y(P), is the best unbiased estimator of ym and is evaluated from the Binomial by:

  • $\langle y_{m} \rangle = \dfrac{N!}{(m-1)!\,(N-m)!} \int_{0}^{1} y(P)\, P^{m-1} (1-P)^{N-m}\, dP$ (9)

It is important here to note that Equation (9) is universally applicable to all functions of P, including probability itself, i.e. including y(P) = P. This case evaluates to the Weibull (1939) estimator:

  • $\langle P_{m} \rangle = \dfrac{m}{N+1}$ (10)

Although Gumbel (1958) advocated the use of this estimator, Gumbel noted that it produces a mean bias in estimates of the variate of all non-linear distributions. As all probability distributions in nature tend to be non-linear and follow an S-shaped curve, so all estimates for the variate using Equation (10) will be biased. A thorough discussion of this issue is given by Cook (2011, 2012).

In most analysis and design applications, the aim is not to obtain the best unbiased estimate of probability for a given observational value, but is to obtain the best unbiased estimate of the variate for a datum (design) probability: in this case, the best estimate of the reduced variate, y. Whether or not (10) can be evaluated directly for y depends on the form of y(P).

2.2. The Fisher Tippett type 1 reduced variate

The expectation of the FT1 mean reduced variate, 〈ym〉, for any rank is given by inserting the FT1 function for y = − ln(−ln(Φ)), from (5), into (9). A closed form solution does not exist, and evaluation of (9) requires numerical integration (Harris, 1999) or a Monte-Carlo (bootstrapping) approach (Cook, 2004), with the latter also able to evaluate confidence limits.
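
As an illustration of the numerical route, the sketch below evaluates the order-statistic expectation of y = −ln(−ln P) for the mth smallest of N values with a plain midpoint rule. This is a generic quadrature chosen for simplicity and is not necessarily the scheme used by Harris (1999); the function name and step count are assumptions.

```python
import math

def mean_ft1_reduced_variate(m, n, steps=100_000):
    """Midpoint-rule estimate of <y_m> for y = -ln(-ln P), m-th smallest of n.

    The weight is the Beta density N!/((m-1)!(N-m)!) * P^(m-1) * (1-P)^(N-m),
    evaluated in log form to avoid overflow for large n.
    """
    log_coeff = math.lgamma(n + 1) - math.lgamma(m) - math.lgamma(n - m + 1)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        log_weight = log_coeff + (m - 1) * math.log(p) + (n - m) * math.log1p(-p)
        total += -math.log(-math.log(p)) * math.exp(log_weight)
    return total * h

# Mean reduced variates for the smallest, a middle and the largest of 10 values.
for rank in (1, 5, 10):
    print(rank, round(mean_ft1_reduced_variate(rank, 10), 4))
```

Two known results give a check on the quadrature: for a single value the expectation is Euler's constant γ, and for the largest of N values it is γ + ln(N).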

2.3. The Poisson reduced variate

The expression for the mean reduced variate, 〈ym〉, for any rank is given by inserting the Poisson model, y = − ln(1 − Φ) − ln(r), from (8), into (9). A closed form solution does exist, as given by Harris (2009) for the XIMIS method:

  • $\langle y_{m} \rangle = \psi(N+1) - \psi(N-m+1) - \ln(r)$ (11)

where ψ is the digamma function. Evaluation of ψ(x) is usually done by exploiting its recurrence relationship, ψ(x + 1) = ψ(x) + 1/x, starting with the value ψ(1) = − γ = − 0.577215665. Evaluation of (11) is the difference of two summations: one of length N and one of length N − m. Hence (11) simplifies to the single summation of length m:

  • $\langle y_{m} \rangle = \sum_{j=N-m+1}^{N} \dfrac{1}{j} - \ln(r)$ (12)

an expression also given by Gumbel (1958, p. 117) for the case of r = 1. Implementation of (12) is computationally efficient when 〈y〉 is required for all ranks, since 〈y1〉 = 1/N − ln(r) for the first rank, then each successive rank is given from the previous by one division and one add operation. Provided this is done in at least 64-bit double precision arithmetic, values can be obtained for very large N without accumulating significant rounding errors.
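
The recurrence described here can be sketched as follows. The function name is hypothetical; the only facts used are those stated in the text: the first rank is 1/N − ln(r), and each later rank adds one reciprocal to the previous value.

```python
import math

def poisson_mean_reduced_variates(n, r):
    """Mean Poisson reduced variate <y_m> for m = 1..n (ascending rank)."""
    offset = math.log(r)
    running = 0.0
    out = []
    for m in range(1, n + 1):
        running += 1.0 / (n - m + 1)   # one division and one addition per rank
        out.append(running - offset)
    return out

# Small illustrative case: N = 4 events at rate r = 1.
print([round(y, 4) for y in poisson_mean_reduced_variates(4, 1.0)])
```

Python floats are double precision, so the running sum stays accurate for values of N far larger than any realistic record.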

The issue with POT data is that the threshold excludes the lower tail, so the observations are left-censored. The population of events above the threshold is not the population required to evaluate 〈ym〉Poisson in (11) or (12). If the length of the record in years is denoted by R, the unknown population of events becomes N = R × r. Since ψ(x + 1) → ln(x) as x → ∞, then from (11) (Harris, 2008, equation (4.22)):

  • $\langle y_{\nu} \rangle \approx \ln(R) - \psi(\nu)$ (13)

where ν is the rank in descending order (νth largest value). Equation (13) is a more rigorous confirmation of the insensitivity of MIS to the annual rate than that given by Cook and Harris (2004, appendix B). Equation (13) can be used for POT data, where N is unknown in value but is large. The digamma function, ψ(ν), is evaluated first for the largest value, ψ(1) = − γ, then for the 2nd largest, etc., using the recurrence relation sequentially until the smallest data value is reached.
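
The sequential evaluation described above can be sketched in a few lines. The sketch assumes the large-N form 〈yν〉 ≈ ln(R) − ψ(ν), which follows from N = R × r and ψ(N + 1) → ln(N); the function name and the 20 year record length in the example are illustrative assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329

def pot_mean_reduced_variates(count, record_years):
    """Approximate <y_nu> for nu = 1..count (descending rank), N large but unknown."""
    psi = -EULER_GAMMA                    # psi(1) for the largest value
    out = []
    for nu in range(1, count + 1):
        out.append(math.log(record_years) - psi)
        psi += 1.0 / nu                   # psi(nu + 1) = psi(nu) + 1/nu
    return out

# Mean reduced variates of the five largest values in a hypothetical 20 year record.
print([round(y, 3) for y in pot_mean_reduced_variates(5, 20.0)])
```

The annual rate r does not appear anywhere in the calculation, which is the insensitivity to rate noted in the text.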

3. Implementation of MIS and LM&S methods

3.1. Maximum annual rate of independent maxima and the ‘relevant parent’

It is convenient at this point to distinguish between the annual rate of independent maxima extracted from the data record, re, and the maximum annual rate of all independent maxima inherent in the record, ri. The ratio re/ri can be interpreted as a measure of the efficiency of the extraction methodology in maximizing the population of values for analysis.

For the temperate climate of the UK, specifically for the case of Boscombe Down which is examined later, Harris (2008) found the correlation time scale to be T = 22.15 h, or approximately 1 day. Given that the interest is in maxima of independent events, the shortest time between such maxima is t = 44.3 h, since at least one event minimum must exist between any two consecutive maxima. Hence the maximum annual rate of independent maxima for the temperate UK climate is ri = 198.6 ≈ 200.

A method that could extract all these independent values from each year of a record would provide what Harris (2008) calls the ‘relevant parent’. It follows that if the MIS or LM&S method extracts independent events at an annual rate re then these events are directly related to the relevant parent by Equation (1), where Φ represents the extracted maxima, P represents the parent, and r = ri/re. Hence, providing that ri is known, or can be independently estimated, it is possible to estimate the relevant parent by inverting (1).
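
The inversion can be written down directly: with Φ = P^r and r = ri/re, the relevant parent is P = Φ^(re/ri). The sketch below is a minimal illustration; the function name is hypothetical, the rates are those quoted in this paper for Boscombe Down, and the probabilities are arbitrary.

```python
def relevant_parent(phi, r_e, r_i):
    """Invert Phi = P**r with r = r_i / r_e, giving P = Phi**(r_e / r_i)."""
    return phi ** (r_e / r_i)

R_I, R_E = 200.0, 147.0                 # rates quoted for Boscombe Down MIS data
for phi in (0.10, 0.50, 0.90, 0.99):    # illustrative probabilities of extracted maxima
    p = relevant_parent(phi, R_E, R_I)
    print(f"Phi = {phi:.2f}  ->  P = {p:.4f}   check P**r = {p ** (R_I / R_E):.2f}")
```

Because re/ri is less than 1, the recovered parent probability is always larger than the probability of the extracted maxima at the same value of the variate.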

An early attempt to formulate a model for this parent was made by Brooks et al. (1946, 1950). They suggested that the wind vector be resolved into orthogonal x and y components relative to a set of arbitrarily oriented Cartesian axes, each component Normally distributed and mutually independent. As noted by Davenport (1968), this results in a parent distribution which is Rayleigh in form, i.e. Weibull with w = 2. As noted in Section '1.2. Extreme-value models', above, all parents of the exponential type behave penultimately like x^w, hence they exhibit Weibull-distribution equivalence in the upper tail. Wind speeds are observed to be very well represented by the Weibull distribution, Equation (6), or the disjoint sum of several, but the shape parameter, w, can take a wide range of values depending on the wind mechanism. This is the basis of the penultimate extreme model proposed by Cook and Harris (2004, 2008).

3.2. Boscombe Down, UK

3.2.1. MIS analysis

Boscombe Down has been used as an exemplar of the UK wind climate in two recent studies:

  • Cook and Harris (2004) fitted their penultimate model for y > − 1.3, assuming a single climate mechanism. The observations lay well within 5–95% confidence limits, but some lay outside the more onerous 37–63% confidence limits, and,

  • Cook and Harris (2008) used the Jenkinson-Lamb index to separate the hourly mean observations into cyclonic and anti-cyclonic sets for separate analysis. The distribution of the hourly parent, irrespective of direction, was a very good fit to the disjoint two-mechanism Weibull model and the extremes to a joint two-mechanism penultimate FT1 model.

When the standard MIS method is run on hourly mean wind speed data from Boscombe Down, the annual rate of events recovered is re = 147, representing an extraction efficiency of around 75%. As the method extracts maxima, the data are therefore expected to have diverged slightly from the ‘relevant parent’, in accordance with Equation (1), and towards the FT1 asymptote. However, from reference to Figure 2, observations plotted using the Poisson ordinate are expected to lie close to the FT1 model down to y ≈ − 4.

Figure 3 shows the dynamic pressure for all ranks plotted against the mean reduced variate for the FT1 ordinate (small solid circles) and for the Poisson ordinate (large open circles). The bold curve is the fit to the penultimate FT1 model, assuming a single mechanism climate, and the chained curves are the 5–95% confidence limits on the data. The penultimate FT1 model was fitted for 〈y〉Poisson > − 4, with 1665 observations contributing to the fit. All model fitting in this paper was made using the multi-parameter non-linear optimizer ‘Solver’ in MS Excel spreadsheets by optimizing the parameters of the model to achieve the least mean square error between 〈y〉Poisson and the model y. The resulting shape factor wq = 0.989 for dynamic pressure gives almost a straight line, and corresponds to wV = 1.98 for wind speed, which is typical of wV ≈ 2 for the UK climate. As expected, the data remain close to the fitted curve down to 〈y〉Poisson = − 4, much further into the tail than the y = − 1.8 limit suggested by Harris (2009), but are limited at y = − ln(147) = − 5.0. On the other hand, the data plotted using the 〈y〉FT1 ordinate have diverged significantly above the penultimate FT1 model by 〈y〉FT1 = − 3 as the lower tail asymptotically approaches the q = 0 axis. The fitted model line intersects the q = 0 axis at y = − 5.41, giving another estimate for the rate of independent maxima, ri = 224. Comparisons of estimates for the annual rate of independent events, ri, should be made in terms of − ln(ri) as this represents a linear shift of the variate, the so-called ‘Poisson shift’. On this basis, the correlation time scale estimated by Harris (2008) gives − ln(ri) = − 5.3, while Figure 3 gives − ln(ri) = − 5.4, matching to within 2%.

Figure 3. MIS dynamic pressures for Boscombe Down, UK, using FT1 and Poisson ordinates

Figure 4 shows the corresponding relevant parent recovered using r = 200/147 in Equation (1) which, plotted using the Poisson ordinate, gives an excellent match to the penultimate FT1 model for the full range of the observations. The bold model curve and chained confidence limits are from the MIS fit in Figure 3. The extended range includes sufficient observations to attempt a direct fit for the six parameters of the joint two-mechanism penultimate FT1 model, shown by the dashed curve, which is comparable to the fit from the cyclonic/anti-cyclonic separated data in Cook and Harris (2008). The two-mechanism mixed climate model is explored later in Section '3.3. Mixed climates'.

Figure 4. Poisson parent of MIS dynamic pressures at Boscombe Down, UK

3.2.2. LM&S analysis

Daily maxima were abstracted from the hourly dataset, then independent maxima selected by the LM&S method for separations of n = 2, 4, 8 and 16 days. The shortest separation interval, n = 2 days, gives re = 103, representing an extraction efficiency of about 50%. The longest interval, n = 16 days, gives re = 12, an extraction efficiency of only about 6%. Figure 5 shows the dynamic pressure for the median rank of each integer wind speed plotted against the mean reduced variate for the FT1 distribution (small solid symbols) and for the Poisson process model (large open symbols). The fit to the penultimate FT1 model from the MIS analysis of Figure 3 is included for comparison as the bold curve, and the chained curves are the 5–95% confidence limits for this. The trend for the FT1 ordinate is for the observations in the lower tail to converge gradually towards the asymptotic FT1 model, from above, as the interval increases, although the observations for n = 16 days lie below the model. As the interval, n, increases, the rate of events extracted, re, decreases rapidly, so that the observations plotted using the Poisson ordinate follow the trend in Figure 2. The observations on the Poisson ordinate begin to move away from the MIS model curve before they approach y = − ln(re), and the smaller the value of re the earlier this deviation begins. The deviation becomes significant at a position approximately ln(ri/re) above y = − ln(re), i.e. at y = − 2ln(re) + ln(ri), and this is proposed as a reasonable rule-of-thumb for setting the lower fitting limit for LM&S data.
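
The rule-of-thumb is simple to evaluate. The sketch below uses the rates quoted in the text (ri ≈ 200, with re = 103 and re = 12 for the 2 day and 16 day separations); the function name is hypothetical.

```python
import math

def lower_fit_limit(r_e, r_i):
    """Rule-of-thumb lower fitting limit: y = -2 ln(r_e) + ln(r_i)."""
    return -2.0 * math.log(r_e) + math.log(r_i)

R_I = 200.0                                   # maximum annual rate quoted for the UK
for n_days, r_e in ((2, 103.0), (16, 12.0)):  # extracted rates quoted in the text
    print(f"n = {n_days:2d} days  efficiency = {r_e / R_I:.0%}  "
          f"Poisson limit = {-math.log(r_e):6.2f}  "
          f"fit limit = {lower_fit_limit(r_e, R_I):6.2f}")
```

The fitting limit sits ln(ri/re) above the Poisson limit, so it rises quickly as the extraction rate falls: with these rates it is close to −4 for the 2 day separation but slightly above zero for the 16 day separation.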

Figure 5. LM&S dynamic pressures for Boscombe Down, UK, using FT1 and Poisson ordinates

Figure 6 shows the relevant parents recovered using r = ri/re in Equation (1), for comparison with Figure 4. The parents lie close to the fitted MIS model curve, with n = 2 giving the best match, and with a trend for increasing slope (increasing dispersion, C) as n increases. This action removes the deviation in the lower tail and substantially increases the fitting range to y = − 4 for all values of n used here. Figures 5 and 6 empirically demonstrate the validity of the Poisson process model for LM&S data in terms of the wind speed values, despite the poor match for recurrence interval in Figure 1.

Figure 6. Poisson parents of LM&S dynamic pressures at Boscombe Down, UK

3.2.3. Discussion of Boscombe Down results

Table 1 lists the penultimate FT1 parameters when fitted for y > − 2ln(re) + ln(ri) using the Poisson ordinate. The two right-hand columns give the predicted 50 year return period dynamic pressure and the percentage difference from the datum MIS value. The LM&S predictions become increasingly less accurate as n increases, underestimating by 17% for n = 16 days. It is apparent that any additional confidence that the events are uncorrelated is negated by the increase in sampling variance as the fitted population falls with increasing n. The aim should always be to use the shortest interval that achieves independence.

Table 1. Penultimate FT1 parameters fitted to observations of dynamic pressure at Boscombe Down, UK
Standard MIS and LM&S. Lower limit: y = − ln(re) + 1.5

Method          N fitted   w       U (Pa)   C (Pa)   q50 (Pa)   Δ(q50) (%)
MIS             1665       0.989   198.6    36.0     343.9         0.0
LM&S, n = 2      787       0.956   198.6    29.4     330.6       − 3.9
LM&S, n = 4      443       0.991   198.0    40.4     370.8         4.6
LM&S, n = 8      201       0.873   194.5    18.3     272.4      − 10.3
LM&S, n = 16      96       0.761   193.0     8.1     224.5      − 16.8
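
The MIS row of Table 1 can be cross-checked against two values quoted in the text: the q = 0 intercept of about −5.41 and the 50 year dynamic pressure. The sketch below assumes that the penultimate FT1 model takes the form y = (q/C)^w − (U/C)^w and that the 50 year value corresponds to Φ = 1 − 1/50. Both are assumptions made for this illustration; they are not stated explicitly in the surviving text, and the function names are hypothetical.

```python
import math

def reduced_variate(q, w, u, c):
    """Assumed penultimate FT1 form: y = (q/C)^w - (U/C)^w."""
    return (q / c) ** w - (u / c) ** w

def design_value(y, w, u, c):
    """Invert the assumed form for the variate at reduced variate y."""
    return c * (y + (u / c) ** w) ** (1.0 / w)

W, U, C = 0.989, 198.6, 36.0                       # MIS row of Table 1 (U, C in Pa)
y50 = -math.log(-math.log(1.0 - 1.0 / 50.0))       # assumes Phi = 1 - 1/T, T = 50 years
intercept = reduced_variate(0.0, W, U, C)          # model value on the q = 0 axis
q50 = design_value(y50, W, U, C)                   # 50 year dynamic pressure, Pa
print(f"intercept = {intercept:.2f}, implied r_i = {math.exp(-intercept):.0f}")
print(f"q50 = {q50:.1f} Pa")
```

Under these assumptions the MIS parameters give an intercept close to −5.41 and a 50 year value within about 0.1 Pa of the tabulated 343.9 Pa, which supports the assumed form for this row.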

Table 2 gives the corresponding penultimate FT1 parameters for the ‘relevant parents’ fitted over the full available range, i.e. for y > − 5. Each case includes two to three times more events in the fit than above and this gives a large improvement in the statistical accuracy of the parameters. The LM&S predictions gradually overestimate as n increases, but by only 1.7% for n = 16 days. There is a small, but consistent trend to greater slope (greater dispersion, C) in the LM&S curves with increasing n which is attributable to the corresponding increase in sampling variance. The ranking process subsumes the sampling variance into the variance of the observations but the mean is unchanged, so the dispersion increases and the mode decreases in value. This trend is consistent in Table 2 and is the reason that the observations for n = 16 days lie below the model on the FT1 ordinate in Figure 5.

Table 2. Penultimate FT1 parameters fitted to relevant parent of dynamic pressure at Boscombe Down, UK
Relevant parent, MIS and LM&S. Lower limit: y = − 5

Method          N fitted   w       U (Pa)   C (Pa)   q50 (Pa)   Δ(q50) (%)
MIS             4476       0.989   198.6    36.0     343.9      0.0
LM&S, n = 2     3157       0.980   200.2    34.9     344.9      0.3
LM&S, n = 4     1750       0.969   198.0    34.9     347.8      1.1
LM&S, n = 8      796       0.975   195.7    36.3     348.3      1.3
LM&S, n = 16     375       1.002   193.5    40.2     349.6      1.7

Table 3 gives the penultimate FT1 parameters for the joint two-mechanism fit to the MIS ‘relevant parent’ shown in Figure 4. The top half of the table gives the parameters for dynamic pressure, while the bottom half gives the values converted to wind speed in knots for comparison with the fit in Cook and Harris (2008). The results are quite closely comparable, given that the observations in Cook and Harris (2008) were separated before analysis and each set fitted for three unknown parameters, while the observations in this paper were fitted together, for six unknown parameters. The difference between the single and joint models in Figure 4 is small in comparison with the range of the confidence limits. Until recently, the UK strong wind climate has been regarded as ‘simple’, i.e. dominated by Atlantic depressions, but Cook and Harris (2008) showed that analysing separated cyclonic and anti-cyclonic components accounted for the observed deviations of the hourly parent and annual extreme distributions from the single-climate models. It remains debatable whether this represents a true mixed climate, or is simply two aspects of a single climate, and further research is being undertaken to determine this issue.

Table 3. Two-mechanism disjoint penultimate FT1 parameters fitted to relevant parent of dynamic pressure at Boscombe Down, UK
Relevant parent, two-mechanism disjoint extreme, MIS

Dynamic pressure   wq      Uq (Pa)   Cq (Pa)   q50 (Pa)   Δ(q50)
Mechanism 1        0.940   187.8     32.1      350.4      1.9%
Mechanism 2        0.916   147.8     29.5

Wind speed         wV      UV (kn)   CV (kn)   V50 (kn)   Δ(V50)
Mechanism 1        1.88    34.0      14.1      46.5       0.9%
Mechanism 2        1.83    30.2      13.5

3.2.4. Confidence of model predictions

The confidence limits for the observational data were computed by bootstrapping 10 000 trials from the fitted distribution, the 5% confidence limit being given by the 500th smallest value and the 95% confidence limit by the 500th largest value for each rank (Cook, 2004). A similar procedure was used to construct confidence limits for the fitted model, i.e. for design predictions. The penultimate FT1 model was fitted to each trial, then distributions of the predicted variate compiled for various datum values of reduced variate, y = 0 (0.1) 6.
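
The rank-by-rank percentile step can be sketched as a parametric bootstrap. The sketch below is a simplified illustration, not a reproduction of the procedure in Cook (2004): it works in units of the Poisson reduced variate rather than the fitted variate, uses far fewer trials than the 10 000 quoted, and all names and default values are assumptions.

```python
import math
import random

def bootstrap_rank_limits(n, r, trials=400, tail=0.05, seed=1):
    """5% and 95% limits of the Poisson reduced variate for each ascending rank."""
    rng = random.Random(seed)
    per_rank = [[] for _ in range(n)]
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))   # ranked probabilities P
        for m, p in enumerate(sample):
            per_rank[m].append(-math.log(1.0 - p) - math.log(r))
    k = int(round(tail * trials))          # 500 when trials = 10 000 and tail = 0.05
    limits = []
    for values in per_rank:
        values.sort()
        limits.append((values[k - 1], values[trials - k]))  # k-th smallest, k-th largest
    return limits

limits = bootstrap_rank_limits(n=20, r=1.0)
for rank in (1, 10, 20):
    lo, hi = limits[rank - 1]
    print(f"rank {rank:2d}: 5% = {lo:6.3f}, 95% = {hi:6.3f}")
```

Mapping these limits through a fitted model would give limits on the variate itself, since the model is a monotonic function of the reduced variate.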

Figure 7(a) compares the 5 and 95% confidence limits of the MIS model fit for each of the analysis methods used above, each using the same fitting range, y > − 4. Note that the confidence limits are generally curved, but these in Figure 7 are nearly linear because wq ≈ 1 for this station. Unlike the data confidence limits, which apply only to each ranked observation, the confidence limits for the model fit can be extended indefinitely by enumerating the fitted model beyond the range of the data to indicate the confidence of extrapolated predictions. Note also that the confidence limits of the fit always lie inside the corresponding confidence limits of the data values because the fit tends to average out the individual sampling errors.

Figure 7. Five percent and 95% confidence limits of model predictions for Boscombe Down, UK

All the sub-annual extreme methodologies compared here are significantly more accurate than using just the annual maxima. Although the differences between them are small, MIS gives the best accuracy and LM&S with a 2 day minimum separation gives the next best. Figure 7(b) compares the 5 and 95% confidence limits of the MIS model fit for various fitting ranges and shows that the accuracy of predicted design values improves significantly as the population of values used in the fit increases. The benefit of extending the fitting range into the lower tail on the accuracy of design predictions is substantial, in this case reducing the standard error of the design dynamic pressure to less than a third of the error when using just the annual maxima.

3.3. Mixed climates

3.3.1. Joint distribution of extremes

First proposed by Gomes and Vickery (1978), the joint model for annual maxima in an n-mechanism mixed climate is:

  • $\Phi = \prod_{i=1}^{n} \Phi_i$ (14)

For two mechanisms, this is just Φ = Φ1 × Φ2. The joint model assumes each mechanism contributes a value to each year of the record which competes with the other mechanisms to be the annual maximum. When the second of two mechanisms is rare, so that it does not contribute to some years, the joint-disjoint model applies: Φ = (1 − f)Φ1 + fΦ1Φ2, where f is the relative annual frequency of the rare events. Cook et al. (2003) show that the joint-disjoint model reverts to the joint model (14) for Poisson distributed events: i.e. to equation image, where equation image is the equivalent distribution of annual maxima. This equivalent distribution, equation image, is the distribution of event maxima, Φ2, with a Poisson shift equal to ln(f). A physical interpretation is that, in each year that the rare mechanism does not contribute an event, a notional event occurs which is so insignificant that it cannot be detected. When the mechanism is very rare, re ≪ 1, the Poisson shift may give the equivalent annual mode, U, a negative value. Accordingly, the Gomes and Vickery (1978) model is universally adopted for extreme wind speeds in mixed climates.
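
A short numerical sketch may help to relate the joint model, Φ = Φ1 × Φ2, to the joint-disjoint expression Φ = (1 − f)Φ1 + fΦ1Φ2. It assumes, for illustration only, that each component follows the asymptotic FT1 form and that the equivalent annual distribution of the rare mechanism is Φ2 raised to the power f, which for FT1 is the same distribution with its mode shifted by the dispersion times ln(f). The function names and parameter values are hypothetical.

```python
import math


def ft1_cdf(x, mode, dispersion):
    """FT1 (Gumbel) cumulative distribution, used here as an illustrative component."""
    return math.exp(-math.exp(-(x - mode) / dispersion))


def joint(cdfs):
    """Joint model: product of the component distributions of annual maxima."""
    return math.prod(cdfs)


def joint_disjoint(phi1, phi2, f):
    """Joint-disjoint model for a rare second mechanism of relative annual frequency f."""
    return (1.0 - f) * phi1 + f * phi1 * phi2


def poisson_shifted_mode(mode, dispersion, f):
    """Mode of the equivalent annual FT1 distribution, assumed here to be Phi2**f.

    In reduced-variate terms the shift is ln(f), which is negative for f < 1.
    """
    return mode + dispersion * math.log(f)
```

With a mode of 5, a dispersion of 3 and f = 0.01, `poisson_shifted_mode` returns a negative value, which illustrates how a very rare mechanism can acquire a negative equivalent annual mode. In the upper tail the joint-disjoint value and the joint value built from the shifted distribution agree closely.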

3.3.2. Newark, NJ, USA

Newark is one of the three stations serving the New York area used by Lombardo et al. (2009) to demonstrate the LM&S methodology and the wind data are available for download from http://www.nist.gov/wind. At Newark, the wind climate is mostly driven by large scale weather systems about 4 days apart, with occasional thunderstorms which dominate the extremes. The data are for a period of 20 years and comprise gust speeds exceeding 25 kn that have been rounded to integer knot values, i.e. they are POT data for a threshold of 25.5 kn. This gives an opportunity to demonstrate analyses using Equation (13), since the population and rate of all independent events are unknown, and also an opportunity to compare analyses of the separated data with the joint fit to the full data set.

Lombardo et al. (2009) used an indicator of thunderstorm activity to separate the POT data into ‘thunderstorm’ (T) and ‘non-thunderstorm’ (NT) events for separate analysis. Wind speed in knots was used as the variate in their analysis, and is adopted here to permit direct comparison with their figures. Figure 8 shows the results of three analyses: (a) for the separated thunderstorm events; (b) for the separated non-thunderstorm events; and (c) for the mixed set of events, with each component fitted to the penultimate FT1 model and with the corresponding 5–95% confidence limits for the data. Note that (a) and (b) are fits to three unknowns, whereas (c) is a fit to six unknowns.

  1. The penultimate FT1 model is an excellent fit to the thunderstorm T events and all data values lie centrally within the confidence limits. The confidence limits are wide because there are only 105 thunderstorm events in the record. The thunderstorm component obtained from the joint fit to the mixed set (c), shown by the dot-dot-dash curve, is not a good fit to the events.

  2. The penultimate FT1 model is a very good fit to the non-thunderstorm NT events, lying within the confidence limits, but a systematic ripple in the data suggests that these events may be a mixture of contributions from two or more mechanisms. The non-thunderstorm component from the joint fit to the mixed set (c), the dot-dot-dash curve, is not a good fit to the events.

  3. The joint penultimate FT1 model is a very good fit to the mixed events, lying within the confidence limits. The individual T and NT components from the joint six-parameter fit are shown as dashed curves for comparison with (a) and (b). The joint model, Equation (14), synthesized from the separate analyses in (a) and (b), shown by the dot-dot-dash curve, is almost coincident with the joint fit to the mixed events.

Figure 8. LM&S wind speeds for Newark, USA, fitted to penultimate FT1 model

In all three cases, the lower end of the fit is limited at 26 kn because the threshold censors the lower tail.

These three analyses illustrate the principal benefit of separating the components of mixed data before analysis. Although the joint model is virtually identical in both cases, the individual components obtained from the joint fit to the mixed data are a poor match to the corresponding separated data. This is because the joint fit to the mixed data optimizes the six parameters together, so is unable to detect or eliminate compensating errors between the three parameters defining each component.

3.3.3. A consistent methodology for suspected mixed climates

In the example analyses above, Newark is clearly a mixed climate case, with two, perhaps three, separate mechanisms. For Boscombe Down the case for a mixed climate is debatable: are ‘cyclonic’ and ‘anti-cyclonic’ really two separate mechanisms, or are they just two anti-symmetric aspects of the same mechanism?

The data confidence limits provide a pragmatic method to indicate when it is appropriate to admit another mechanism to the analysis. The following procedure is proposed.

  1. Fit the full data set to a single penultimate FT1 model.

  2. Apply the confidence limits evaluated from the fitted parameters (e.g. as in Cook, 2004):

    1. if the data lie within the confidence limits there is no statistical justification for admitting a second mechanism—any variations within the limits may be due to chance;

    2. if the data systematically breach the confidence limits admit another mechanism and re-fit the data to the joint model. ‘Systematically’ in this context means a consistent trend that involves a number of adjacent data values, excluding individual outliers.

  3. Repeat step 2 until all data lie systematically within the confidence limits.
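
The test for a 'systematic' breach in step 2 can be sketched as a search for runs of adjacent ranked values lying outside the limits on the same side. This is only one possible reading of 'a number of adjacent data values': the run length `min_run` and the function names are assumptions of this sketch, not part of the proposed procedure.

```python
def longest_breach_run(values, lower, upper):
    """Length of the longest run of adjacent values outside the limits on one side."""
    longest = current = 0
    previous_side = 0
    for value, lo, hi in zip(values, lower, upper):
        # -1: below the lower limit, +1: above the upper limit, 0: inside
        side = -1 if value < lo else (1 if value > hi else 0)
        if side == 0:
            current = 0
        elif side == previous_side:
            current += 1
        else:
            current = 1
        previous_side = side
        longest = max(longest, current)
    return longest


def admit_another_mechanism(values, lower, upper, min_run=3):
    """True when the breach is systematic; min_run is an assumed threshold."""
    return longest_breach_run(values, lower, upper) >= min_run
```

Isolated outliers give runs of length one and so do not trigger another mechanism, whereas a ripple that stays outside the limits over several adjacent ranks does.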

The choice of level for the confidence limits is open to debate. Conventionally, 5–95% limits are used to indicate that the current model fails, whereas 37–63% limits are used to indicate that the current model does not fail. In the first case another mechanism should be admitted, and in the second case no further mechanisms should be admitted. This leaves a ‘grey area’ between, in which it is not necessary to add another mechanism, but it may not be inappropriate to try. This is a classic ‘sum of exponentials’ problem which, as noted by Lanczos (1956), is notoriously poorly conditioned. Each additional mechanism adds extra degrees of freedom which always reduce the residual errors in the fit, but Lanczos shows that adding too many produces unrealistic individual components. In this case ‘too many’ means more than the number of significant physical mechanisms. When the range of the data is confined to the transition region between two component mechanisms it is also possible that the observations may remain inside the confidence limits for a single mechanism with wV < 1, mimicking a Type 2 distribution. In this event it is necessary to appeal to the observed physical mechanisms contributing to strong winds, especially for POT data where the lower tail is censored and the confidence limits are wide.

When the original data have been rounded to integer values, both MIS and LM&S methods will produce many tied values. In the lower half of the distribution, the spread of these ties will be wider than the confidence limits. Lombardo et al. (2009) address this issue by distributing the tied values evenly through the rounding interval. Here the problem is addressed by using only the median rank of each set of ties.
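
Reducing each set of ties to its median rank can be sketched as below. The sketch assumes ranks are counted from 1 in ascending order and that the median of an even number of tied ranks is the midpoint of the two central ranks; the function name is hypothetical.

```python
from itertools import groupby


def median_ranks_of_ties(rounded_values):
    """Return (value, median_rank, n_ties) for each distinct rounded value."""
    ordered = sorted(rounded_values)
    result = []
    rank = 0  # last rank already assigned
    for value, group in groupby(ordered):
        n_ties = len(list(group))
        first, last = rank + 1, rank + n_ties
        result.append((value, (first + last) / 2, n_ties))
        rank = last
    return result
```

The number of ties is returned with each median rank so that it remains available for any later weighting of the fit.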

The issue of the appropriate weighting for each rank in computing the residual error is also open to debate. Harris (2009) advocated weighting each rank by the inverse of the sampling variance, i.e. by 1/σ²(y), so that each value has the same statistical accuracy. Where the wind speeds are rounded to integer values, as here, there is an additional component from the discretisation variance, σ²(V), which will dominate the uncertainty in the lower tail, in which case the appropriate weight is 1/(σ²(y) + [σ(V) ∂y/∂V]²). The complication in admitting the discretisation variance is that the ∂y/∂V term depends on the fit, so becomes an element in the fitting process. When using only the median ranks for each wind speed, each value should also be weighted by the number of ties to restore the weights of the missing values.
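
The weight described above, the inverse of the sum of the sampling variance and the discretisation variance mapped through ∂y/∂V, multiplied by the number of ties, can be written as a small function. The value used for the rounding standard deviation assumes the rounding error is uniform over one unit of wind speed, giving a variance of 1/12; that assumption and the names are illustrative only.

```python
# Assumed standard deviation of rounding to the nearest integer:
# a uniform error over one unit has variance 1/12.
ROUNDING_SIGMA = (1.0 / 12.0) ** 0.5


def rank_weight(var_y, sigma_v, dy_dv, n_ties=1):
    """Weight for one median rank: n_ties / (var_y + (sigma_v * dy_dv)**2).

    var_y   sampling variance of the reduced variate at this rank
    sigma_v standard deviation of the discretisation error in wind speed
    dy_dv   local slope of the fitted model, which changes as the fit changes
    """
    total_variance = var_y + (sigma_v * dy_dv) ** 2
    return n_ties / total_variance
```

Because `dy_dv` comes from the fit itself, weights of this kind would have to be recomputed as the fit is refined, for example by iterating until the parameters settle.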

Experience shows that weighting by the inverse of the total variance is appropriate when fitting to a single-mechanism model, because it favours the region around the mode and reduces the influence of the less reliable upper tail. However, experience with fitting two, or more, mechanisms shows that this weighting scheme may fail to recognize a second mechanism affecting the upper tail. It may instead assign the second mechanism to a random deviation in the lower tail where there are more observations. Each integer value of wind speed has an equal influence on the fit when each median rank is assigned an equal weight, in which case the second mechanism is always assigned to the most significant deviation within the wind speed range. This is the weighting scheme adopted for Changi, below.

3.3.4. Changi, Singapore

Changi, which lies about 1°N of the Equator, is one of the three stations analysed by Choi and Tanurdjaja (2002) in their study of the mixed wind climate of Singapore. They report seasonal variations in the wind climate between strongly diurnal Monsoon winds, short duration tropical thunderstorms and dawn squalls (‘Sumatras’). Continuous data were separated into long term (Monsoon) and short term (thunderstorm/squall) events. Independence was ensured through the Simiu and Heckert (1996) n-day extreme method, using n = 4–8 days for the long-term data and n = 1 day for the short term data. They performed their analyses using IMIS (Harris, 1999), which implies fitting the asymptotic FT1 distribution to the dynamic pressure, hence wV = 2, so that the fitted model plotted on standard Gumbel axes is a concave-downwards curve. Choi and Tanurdjaja's figures 18–21 exhibit the same characteristic deviation in the lower tail that appears in Figures 3 and 5, here, for the FT1 ordinate.
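
As a rough illustration of an n-day extreme scheme, the sketch below takes the largest daily maximum in each successive block of n days. This is a simplified reading made for illustration and may differ in detail from the procedure of Simiu and Heckert (1996); the function name is hypothetical and an incomplete final block is simply discarded.

```python
def n_day_maxima(daily_maxima, n):
    """Largest value in each successive, non-overlapping block of n daily maxima."""
    if n < 1:
        raise ValueError("n must be a positive number of days")
    return [
        max(daily_maxima[start:start + n])
        for start in range(0, len(daily_maxima) - n + 1, n)
    ]
```

Longer blocks give fewer values but make it less likely that two successive maxima belong to the same weather system, which is the trade-off behind using a larger n for the long-term data than for the short-term data.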

As Choi and Tanurdjaja's source data were not readily available for this study, a 25 year record of the 10 min mean wind speed at Changi, recorded at 3 h intervals, was purchased from the US National Climatic Data Centre. Independent storm maxima were extracted using the MIS procedure (Cook, 1982), giving an annual rate of re = 142. These were not separated by mechanism and no attempt was made to recover the ‘relevant parent’.

Figure 9(a) shows the single mechanism penultimate FT1 fit to the storm maximum wind speeds, plotted using the Poisson ordinate for the median rank at each integer knot wind speed. There is a clear systematic ripple that crosses outside the 5 and 95% confidence limits, indicating that a single model is not appropriate. Figure 9(b) shows the corresponding joint two-mechanism penultimate FT1 fit, and also the model curves for the two individual components. As the joint model fit lies within the 37 and 63% confidence limits, the conditions of the methodology proposed in Section 3.3.3 are satisfied.

Figure 9. MIS 10 min mean wind speeds for Changi, Singapore

The joint fit is closely comparable with Choi and Tanurdjaja's joint distribution synthesized from their analyses of separated data, except that their distributions are forced to wV = 2. This analysis indicates that the second mechanism has a very high shape parameter, wV = 5.05, so has a very short upper tail. It has no significant impact on wind speeds above the annual mode, but accounting for it improves the fit to the first, dominant mechanism, wV = 1.66. This result is entirely consistent with Choi and Tanurdjaja's conclusions.

4. Conclusions

This paper demonstrates an analysis methodology for sub-annual extreme wind speeds that consolidates recent advances in the analysis of independent sub-annual maximum wind speeds in simple and mixed climates.

MIS remains an effective procedure for extracting independent sub-annual maxima from continuous wind records. The LM&S method is effective with discontinuous or POT wind data. When the expected annual rate of independent events is independently known, the ‘relevant parent’ of the sub-annual maxima can be recovered.

The Poisson process model is independent of any extreme value model, exact, penultimate or ultimate, that may be adopted for fitting. Sub-annual extremes of wind speeds represented by the Poisson process model exhibit the expected asymptotic behaviour of parents of the exponential type in both the upper and lower tails. This permits fitting further into the lower tail than in earlier methodologies, with many more observations contributing to the fit, so giving a substantial improvement in statistical confidence.

The Poisson process model links the extremes explicitly to their parent distribution: the selection of one defines the other. For parents of the exponential type, the extremes of x behave penultimately as x^w and imply that the parent is Weibull distributed in the upper tail and that the extremes follow the penultimate model of Cook and Harris (2004, 2008). A major advantage of this model is that the choice of wind speed, V, or dynamic pressure, q, as the variate for analysis makes absolutely no difference to the result, since: wq = wV/2, equation image and equation image. On the other hand, the asymptotic model, which is linear in terms of the variate, gives a different result for V and for q (Cook, 1982). Empirical arguments over which asymptotic model is ‘best’ as in An and Pandey (2007), or whether the distribution is better represented as Type 2 or Type 3 as in Lechner et al. (1993), become redundant when the penultimate model is used.
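
The equivalence of V and q as variates can be checked numerically. With q = ½ρV² and wq = wV/2, the quantity q raised to the power wq is a constant multiple of V raised to the power wV, so a model that is linear in one is linear in the other. The air density value below is an assumed typical figure and the function names are hypothetical.

```python
RHO = 1.225  # assumed air density in kg/m^3


def dynamic_pressure(v, rho=RHO):
    """Dynamic pressure q = 0.5 * rho * V**2."""
    return 0.5 * rho * v ** 2


def power_ratio(v, w_v, rho=RHO):
    """Ratio q**w_q / V**w_V with w_q = w_V / 2; it should not depend on v."""
    w_q = w_v / 2.0
    return dynamic_pressure(v, rho) ** w_q / v ** w_v
```

For any wind speed the ratio equals (0.5ρ) raised to the power wV/2, so changing the variate only rescales the axis and leaves the fitted reduced variate unchanged.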

The consolidated methodology complies with the expectation of extreme-value theory and is empirically validated by the analyses of example sites in differing wind climates around the world presented in this paper.

Acknowledgements

Useful discussions with R. I. Harris in the UK and the assistance of J. A. Main at NIST, USA in providing the thunderstorm/non-thunderstorm separated data for Newark are gratefully acknowledged.

Appendix A. Relating the Binomial and Poisson process models for the annual maxima from independent storms

The exact distribution of annual maxima from independent storm maxima, derived through the multiplication law of probability, is given by Equation (1). Substituting Q = 1 − P(x) and expanding as a Binomial series gives:

  • $\Phi = (1 - Q)^r = 1 - rQ + \frac{r(r-1)}{2!}Q^2 - \frac{r(r-1)(r-2)}{3!}Q^3 + \dots$

Similarly, the series expansion for the Poisson process model given by Equation (7) becomes:

  • $\Phi = \exp(-rQ) = 1 - rQ + \frac{r^2}{2!}Q^2 - \frac{r^3}{3!}Q^3 + \dots$

The first two terms in each expansion are identical. Third and successive terms differ only in the treatment of the annual rate, r. When rQ < 1, i.e. in the upper tail of Φ, the two series converge rapidly. When equation image and the two series are almost identical, term for term. Accordingly, the Poisson process is a good model for the Binomial, and r ≥ 10 and Q ≤ 0.05 may be used as rule-of-thumb limits.
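
The comparison can be reproduced numerically. The sketch assumes that Equation (1) takes the form Φ = P^r = (1 − Q)^r and that the Poisson process model of Equation (7) takes the form Φ = exp(−rQ); neither equation is restated here, so both forms and all function names are assumptions of this illustration.

```python
import math


def binomial_annual_cdf(q, r):
    """Annual-maximum CDF from r independent events, assumed form (1 - Q)**r."""
    return (1.0 - q) ** r


def poisson_annual_cdf(q, r):
    """Annual-maximum CDF from the Poisson process model, assumed form exp(-r*Q)."""
    return math.exp(-r * q)


def series_terms(q, r, n_terms=4):
    """Leading terms of the Binomial and Poisson series, paired term for term."""
    binomial, poisson = [], []
    coeff = 1.0  # generalised binomial coefficient r(r-1)...(r-k+1)/k!
    for k in range(n_terms):
        binomial.append(coeff * (-q) ** k)
        poisson.append((-r * q) ** k / math.factorial(k))
        coeff *= (r - k) / (k + 1)
    return binomial, poisson
```

At the rule-of-thumb limits r = 10 and Q = 0.05 the two distributions differ by less than 0.01, and the difference shrinks as r increases or Q decreases.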

REFERENCES

  • An Y, Pandey MD. 2007. The r-largest order statistics model for extreme wind speed estimation. J. Wind Eng. Ind. Aerod. 95: 165–182.
  • Brooks CEP, Durst CS, Carruthers N. 1946. Upper winds over the World, Part I, The frequency distribution of winds at a point in the free air. Q. J. R. Meteorol. Soc. 72: 55–73.
  • Brooks CEP, Durst CS, Carruthers N, Dewar D, Sawyer JS. 1950. Upper winds over the World. Geophysical Memoirs. HMSO: London; pp 150.
  • Choi ECC, Tanurdjaja A. 2002. Extreme wind studies in Singapore. An area with mixed weather system. J. Wind Eng. Ind. Aerod. 90: 1611–1630.
  • Cook NJ. 1982. Towards better estimation of extreme winds. J. Wind Eng. Ind. Aerod. 9: 295–323.
  • Cook NJ. 2004. Confidence limits for extreme wind speeds in mixed climates. J. Wind Eng. Ind. Aerod. 92: 41–51.
  • Cook NJ. 2011. Comments on “Plotting positions in extreme value analysis” (The role of sampling error in extreme value analysis). J. Appl. Meteorol. Climatol. 50: 255–266.
  • Cook NJ. 2012. Rebuttal of “Problems in the extreme value analysis”. Struct. Saf. 34: 418–423.
  • Cook NJ, Harris RI. 2004. Exact and general FT1 penultimate distributions of wind speeds drawn from tail–equivalent Weibull parents. Struct. Saf. 26: 391–420.
  • Cook NJ, Harris RI. 2008. Postscript to “Exact and general FT1 penultimate distributions of wind speeds drawn from tail–equivalent Weibull parents”. Struct. Saf. 30: 1–10.
  • Cook NJ, Harris RI, Whiting RJ. 2003. Extreme wind speeds in mixed climates revisited. J. Wind Eng. Ind. Aerod. 91: 403–422.
  • Davenport AG. 1968. The dependence of wind loads on meteorological parameters, Paper 2. Proceedings, 2nd International Conference on Wind Effects, Ottawa 1967, 19–82. University of Toronto Press: Toronto; 739 pp.
  • Galambos J, Macri N. 1999. Classical extreme value model and prediction of extreme winds. ASCE J. Struct. Eng. 125: 792–794.
  • Gomes L, Vickery BJ. 1978. Extreme wind speeds in mixed climates. J. Ind. Aerodyn. 2: 331–344.
  • Gumbel EJ. 1958. Statistics of Extremes. Columbia University Press: New York, NY; 371 pp.
  • Harris RI. 1999. Improvements to the method of independent storms. J. Wind Eng. Ind. Aerod. 80: 1–30.
  • Harris RI. 2005. Generalised Pareto methods for wind extremes—Useful tool or mathematical mirage? J. Wind Eng. Ind. Aerod. 93: 341–360.
  • Harris RI. 2008. The macro–meteorological spectrum—a preliminary study. J. Wind Eng. Ind. Aerod. 96: 2294–2307.
  • Harris RI. 2009. XIMIS—a penultimate extreme value method suitable for all types of wind climate. J. Wind Eng. Ind. Aerod. 97: 271–286.
  • Holmes JD. 2002. Discussion of “Classical extreme value model and prediction of extreme winds” by J. Galambos & N. Macri. ASCE J. Struct. Eng. 128: 273.
  • Jensen M, Franck N. 1970. The Climate of Strong Winds in Denmark. Danish Technical Press: Copenhagen.
  • Lanczos C. 1956. Applied Analysis. Prentice Hall: Upper Saddle River, NJ; 539 pp (reprinted 1988, Dover Publications, New York, NY).
  • Lechner JA, Simiu E, Heckert JA. 1993. Assessment of “peak over threshold” methods for estimating extreme value distribution tails. Struct. Saf. 12: 305–314.
  • Lombardo FT, Main JA, Simiu E. 2009. Automated extraction and classification of thunderstorm and non–thunderstorm wind data for extreme–value analysis. J. Wind Eng. Ind. Aerod. 97: 120–131.
  • Palutikof JP, Brabson BB, Lister DH, Adcock ST. 1999. A review of methods to calculate extreme wind speeds. Meteorol. Appl. 6: 119–132.
  • Simiu E, Heckert NA. 1996. Extreme wind distribution tails: a “peaks over threshold” approach. ASCE J. Struct. Eng. 122: 539–547.
  • Simiu E, Lechner JA. 2002. Discussion of “Classical extreme value model and prediction of extreme winds” by J. Galambos & N. Macri. ASCE J. Struct. Eng. 128: 271–272.
  • Weibull W. 1939. A statistical theory of the strength of materials. Proc. R. Swed. Acad. Eng. Sci. 151: 5–45.