Can we still predict the future from the past? Implementing non‐stationary flood frequency analysis in the UK

The Environment Agency in England is investing £2.5 billion with the aim of reducing flood risk to at least 300,000 homes by 2020/21. Several of the schemes being considered are on rivers that have experienced an upsurge of flooding over recent years. Decisions on whether to invest and how high to build are usually made on the basis of stationary methods of flood frequency analysis that assume the probability of flood flows is unchanging over time. Following successive severe floods in Cumbria, trend tests and non‐stationary flood frequency analysis techniques have been applied. These allow parameters of the frequency distribution to change over time or with some other covariate. The resulting estimates of flow, for the present day, were up to 55% higher than the stationary estimates at river gauges in north‐west England. The results have been incorporated into the scheme appraisal process. A national analysis indicates that there is evidence of upward trends in peak flows at nearly a quarter of river flow gauges across Great Britain. Many rivers show an abrupt increase in flood flows in the late 1990s. Trends tend to occur in upland areas but they are also seen on some rivers across south‐east England.


| INTRODUCTION
Recent years have seen a run of damaging floods in parts of the United Kingdom. North-west England has been particularly badly hit, with severe floods in 2005 (Carlisle), 2009 (much of Cumbria), and 2015 (much of Cumbria and Lancashire). Flood walls built after the first two events were overtopped in 2015 (Spencer et al., 2018). People who are affected by such events can understandably be sceptical that two or even three extreme floods have occurred purely by chance so close together. This raises the question of whether something has changed so that the probability of such events is higher than it was previously. A related question, more difficult to answer, is how the probability might change in the future.
Answers to such questions are needed to help plan investment in flood alleviation. The Environment Agency in England is part-way through a 6-year investment programme, spending £2.5 billion with the aim of reducing the risk of flooding from rivers, the sea, groundwater, and surface water for at least 300,000 homes by 2020/21. Decisions about investment in flood protection are underpinned by flood frequency analysis, and so it is important that this foundation is as secure as possible, within the constraints of the uncertainties that accompany frequency analysis of extreme values.
UK practice in flood frequency analysis is to use methods of the Flood Estimation Handbook (FEH) (Institute of Hydrology, 1999) and updates (Kjeldsen, Jones, & Bayliss, 2008; Kjeldsen, Stewart, Packman, Folwell, & Bayliss, 2005). The techniques in these publications assume that in a data series each value, for example, each annual maximum flow, is independent and has the same probability distribution as all the other values. If the flood frequency behaviour of a catchment is not constant over time, that is, is non-stationary, the peak flows are not identically distributed and so this assumption is violated. Milly et al. (2008) state that "In view of the magnitude and ubiquity of the hydroclimatic change apparently now under way, however, we assert that stationarity is dead and should no longer serve as a central, default assumption in water-resource risk assessment and planning. Finding a suitable successor is crucial for human adaptation to changing climate." Similar arguments could apply on some catchments subject to urban development.
Methods of flood frequency analysis are available that can account for non-stationarity. One approach is to attempt to remove the trend from the data before the analysis. This can be advantageous in contexts where the increase in the rate and size of extreme events can be explained by a trend in the mean (e.g., in sea level analysis). A more convenient approach for flood frequency analysis of annual maximum data is to specify a distribution in which one or more parameters are allowed to vary with time, or with another covariate, because inference about the nature of the trend is included in the model fitting rather than carried out as a separate step. A multi-stage procedure can lead to increased variance, because inferences must be made about both the mean and the extremal behaviour of the process, and the combined uncertainty can become difficult to quantify.
There is a lively debate in the research literature about the merits and drawbacks of non-stationary analysis, with several papers asserting that stationarity is alive and well (Serinaldi & Kilsby, 2015) or even immortal (Montanari & Koutsoyiannis, 2014). We do not propose to cover all the arguments here but outline some of the concerns more relevant to practitioners. Serinaldi and Kilsby (2015) argue that when the model structure and physical dynamics are uncertain, stationary models should be retained as they are simpler, more theoretically coherent and more reliable for practical applications. Similarly, Montanari and Koutsoyiannis (2014) and Serinaldi, Kilsby, and Lombardo (2018) argue that a non-stationary model can only be justified where one has deterministic information on the process of change, for example about an urban area increasing.
Along the same lines, Rehan and Hall (2016) explore the implications of non-stationary analysis for decision-making in flood risk management and conclude that there are reasons for at least considering stationary models alongside non-stationary analysis, as the latter can lead to a higher variance in estimates of optimal flood protection.
A response to some of these objections is given by Milly et al. (2015).
These are important arguments, even though some appear to be concerned mainly with the semantics of what is meant by stationarity, and it is sometimes difficult to discern their relevance to flood managers dealing with communities that have seen multiple severe floods. A reason for practitioners to be cautious is the increase in uncertainty that results from the introduction of extra parameters to be fitted. In a nutshell, there can be more scope for non-stationary analysis to give an inaccurate answer even if statistical measures judge it to be the best fit to the data. On the other hand, it is difficult to justify making decisions about flood management on the basis of an assumption of unchanging probability that appears incorrect in some locations and indeed is inconsistent with what is known about the effects of climate change on the hydrological cycle.
In this paper we first examine the scale of the problem by presenting results of trend tests across Great Britain and discussing potential reasons for the trends. We then give an introduction to techniques for non-stationary flood frequency analysis and present findings from a case study in north-west England. We conclude with a discussion of the implications and offer some suggestions for next steps.

| SCREENING FOR TREND ACROSS GREAT BRITAIN
This research was originally motivated by a perceived increase in flood flows in Cumbria, and this work is described in the case study for north-west England later in this paper.
To examine whether the issue occurs elsewhere, trend tests were applied to a national dataset of peak flows produced by the National River Flow Archive (NRFA, 2018). Previous trend tests for flood magnitudes or durations across Great Britain are described by Hannaford and Marsh (2007), Prosdocimi et al. (2014), and Hannaford (2015). The datasets used in these studies predated the exceptional floods of February 2014, which were thought to be the most severe in 200 years on some chalk catchments in the south of England, and December 2015, which broke records across the north of England. In general these previous studies found upward trends in northern England and Scotland.
In conventional trend testing the null hypothesis H0 is that the data are identically distributed, and therefore represent a stationary process. The tests output a p-value, or probability, and if this is less than a chosen significance level then H0 is rejected. The conventional approach is then to (provisionally) accept a single alternative hypothesis H1, that is, that a statistically significant trend exists. This is not always the correct conclusion to draw: the discrepancy of the observations from H0 may actually result from factors not included in the formulation of H0 and different from H1, such as dependence between the observations (Serinaldi et al., 2018).
An alternative approach, advocated by Prosdocimi et al. (2014), is to alter the null hypothesis to one where change is assumed to be happening, putting the burden of proof on the data to show otherwise since an increase in flood magnitude and frequency is expected in a warming world. This would focus attention on the risk of society being under-prepared for change because the trend test p-value would represent the probability of rejecting a genuine trend. Prosdocimi et al. (2014) tested two null hypotheses, one being that peak flows are increasing at a rate that would see an increase of more than 20% by the year 2085 (assuming observed change continues at the same rate) and the other hypothesis being that any increase would be less than 20% by 2085. They found that, for over 80% of gauging stations in the UK, neither null hypothesis could be rejected. In other words, on the basis of trends observed up to 2009, it could not be determined whether or not a 20% uplift is adequate to account for the expected change in flows by 2085. The authors state that indeterminate results like this draw attention to the difficulty of making inferences from relatively short records with high variability. Strikingly, they found that sample sizes of hundreds of years would be needed before their null hypotheses could be confirmed or negated with confidence.
The dataset tested for the present study was released in 2018 and includes annual maximum flow data for England, Wales and 28 gauges in Scotland up to water year 2015-2016. Records for other gauges in Scotland extend to 2005. Data from Northern Ireland are included in this dataset but have not been tested in this study. All stations with at least 40 years of record were tested, irrespective of their quality classification, since any errors in rating equations, as long as they are consistent over time, are not expected to influence the findings of the non-parametric tests. The dataset also provides peaks-over-threshold data, which offer a potentially richer source of information but suffer from data gaps and short records, and tend to be ill-defined for groundwater-dominated catchments.
Four statistical tests for trend were applied to the annual maximum flow series. Three are non-parametric tests, that is, they make no assumption about the distribution followed by the data: the Mann-Kendall and Cox-Stuart tests for trend and the Pettitt change point test which detects step-changes. The fourth is a parametric test in which a non-stationary frequency distribution is fitted to the data, the method for which is described later.
The Mann-Kendall test assesses whether there is a monotonic upward or downward trend in a variable over time. The test is not dependent on the magnitude of the data but is based on the proportion of increases and decreases between pairs of values.
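The Mann-Kendall statistic can be computed directly from the signs of all pairwise differences. As an illustration, here is a minimal Python sketch using the normal approximation (no tie correction; the study's own implementation is not specified):

```python
import math

def mann_kendall(x):
    """Two-sided Mann-Kendall trend test (normal approximation, no tie correction)."""
    n = len(x)
    # S counts increasing minus decreasing pairs, regardless of magnitude
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S under H0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)      # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided p-value
    return s, z, p
```

A strictly increasing series gives the maximum possible S of n(n−1)/2 and a very small p-value, while a constant series gives S = 0 and p = 1.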
The Cox-Stuart test is a simpler test that divides a time series into two equal halves and tests whether the second half is in general higher or lower than the first.
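The Cox-Stuart test reduces to a sign test on pairs formed from the two halves of the record. A sketch in Python (two-sided binomial p-value; illustrative only):

```python
from math import comb

def cox_stuart(x):
    """Two-sided Cox-Stuart test: compares the second half of a series with the first."""
    if len(x) % 2:                       # drop the middle value for odd-length series
        x = x[:len(x) // 2] + x[len(x) // 2 + 1:]
    m = len(x) // 2
    signs = [(b > a) - (b < a) for a, b in zip(x[:m], x[m:])]
    plus, minus = signs.count(1), signs.count(-1)
    n = plus + minus                     # tied pairs are discarded
    k = min(plus, minus)
    # exact two-sided binomial p-value under H0: P(increase) = 0.5
    p = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n)
    return plus, minus, p
```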
Pettitt's test is designed to detect a sudden change in the average of a time series. It outputs the time of the shift as well as the significance level. This test was included with the original intention of helping to identify any trends associated with sudden changes in peak flow, whether genuine (such as construction of a reservoir) or spurious (such as a change in channel hydraulics that was not properly accounted for in the rating equation).
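Pettitt's test can be sketched in a few lines: it scans each candidate split point, compares the values before and after it by signs of differences, and converts the maximum absolute statistic to an approximate p-value (again, this is an illustration rather than the study's implementation):

```python
import math

def pettitt(x):
    """Pettitt change-point test: returns the split index, statistic K and approx p-value."""
    n = len(x)
    u = []
    for t in range(1, n):
        # U_t compares the first t values against the remaining n - t values
        u_t = sum((xi > xj) - (xi < xj) for xi in x[:t] for xj in x[t:])
        u.append(u_t)
    k = max(abs(v) for v in u)
    t_change = u.index(max(u, key=abs)) + 1          # length of the first segment
    p = min(1.0, 2.0 * math.exp(-6.0 * k * k / (n ** 3 + n ** 2)))
    return t_change, k, p
```

A series with an abrupt upward step midway through yields a change point at the step and a very small p-value.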
In each case the null hypothesis H0 is that the data are identically distributed, and therefore represent a stationary process. As noted above, the null hypothesis should only provisionally be rejected in the light of the high variability and relatively short record lengths being analysed.
Stations exhibiting the most significant or unusual trends, and those with significant trends in areas of the country where little trend was observed at surrounding stations, were investigated. This led to the removal of 15 stations where trends were thought to be spurious, due to changes in the hydraulic control or rating equation during the period of record, leaving 509 stations for analysis with a mean record length of 52 years. Table 1 summarises the results. At most stations (67%-83% depending on the test) no trend was detected at the 5% significance level (i.e., p ≥ .05), although some of these are likely to be false negatives. At the remaining stations, nearly all the trends detected at the 5% significance level were increasing.
At most stations where the Mann-Kendall or Cox-Stuart tests indicated a possible trend, the Pettitt test also detected a step change. The implication is that there has been an abrupt shift, generally upwards, in flood magnitudes. This conclusion may not be valid in all cases given the difficulties involved in untangling gradual trends and step changes if both are present (Rougé, Ge, & Cai, 2013). Figure 1 plots the year in which the Pettitt test detects the greatest likelihood of the change, for stations with a Pettitt p-value less than .05. Inevitably step changes are unlikely to be detected if they occur near the beginning or end of records, so we should expect more stations to show changes in the 1980s and 1990s. Nevertheless, there is a striking concentration of stations with evidence of a shift in the 1990s, particularly the water years 1996-1997 and 1997-1998. This is consistent with a perception that the floods of Easter 1998 in the English Midlands and Wales represented the start of a flood-rich period that has continued to date.
At 50 stations all three non-parametric tests indicated a possible trend, and at 44 all four tests did so. Figure 2 shows the spatial pattern of results from the Mann-Kendall test. Stations with signs of trend are scattered around Great Britain but there is a concentration in upland areas with higher rainfall: Cumbria (where 60% of stations have p < .05), the Pennines, parts of south-west England, Wales and Scotland. The mean annual rainfall over catchments with p < .05 is 1,200 mm, compared with 960 mm for the other catchments. This is further evidence of a previously reported tendency for increasing rainfall totals and flood magnitudes in western parts of the UK (Hannaford, 2015).
A faint cluster of stations showing signs of trend appears in south-east England. Most stations here that indicate a more statistically significant trend are on catchments with at least part-Chalk geology. Floods on such baseflow-dominated catchments are expected to be sensitive to any increase in long-term rainfall over the autumn and winter period, but further investigation would be needed to confirm whether this is a reason for the trends.
It is easier to detect an apparent trend than to diagnose its cause (Merz, Vorogushyn, Uhlemann-Elmer, Delgado, & Hundecha, 2012). It is important to consider what is driving the hypothesised trends because the answer could help assess whether and how the trends might continue and therefore how to allow for them in the design of flood alleviation schemes.
[Figure 1: Year in which the Pettitt test detects a shift, for the stations where it detects a change at the 5% significance level.]

[Figure 2: Results from the Mann-Kendall test for stations with at least 40 years of record.]

Possible causes for trends in flood magnitude or frequency are catalogued by Merz et al. (2012) and elaborated by Hall et al. (2014). Urbanisation is unlikely to be the cause of the strongly significant trends in many of the catchments in north-west England, which remain largely rural. Changes in agricultural land management are unlikely to produce trends as large as those observed. Some of the rivers that exhibit the strongest trends are relatively natural watercourses that have not been channelised or embanked. Over short time scales it can be difficult to distinguish between decadal-scale variability (for example, cyclical changes or clustering) and progressive climate change. Historical records often show an alternation between flood-rich and flood-poor periods (Hall et al., 2014). Macdonald and Sangster (2017) found evidence of both regional and national flood-rich periods in Great Britain since 1750, including the years 2000 to present. Such periods can persist for a long time (termed the Joseph Effect by Mandelbrot & Wallis, 1968), so even 40 years may not be long enough to reliably detect progressive change. Many UK river gauges were installed in the 1960s and experienced relatively flood-poor conditions for the first few decades of their operation (Lane, 2009), so it is likely that at least part of the observed trends is driven by natural fluctuation. Kundzewicz and Robson (2004) recommend 50 years to detect a climate-driven change in flow data. If the minimum record period is increased to 60 years, the proportion of stations with H0 (i.e., stationarity) rejected at the 5% significance level by the Mann-Kendall test increases to 19 out of 59 (32%).
So if there is a cyclical effect driving the trend at these stations it would seem to be operating over a long time period spanning more than half a century.
Evidence that natural variation is not the whole picture is provided by recent attribution studies which report that climate change exacerbated the 2013-2014 floods in the south of England (Schaller et al., 2016) and made the December 2015 storm rainfall in the north of England about 60% more likely (Otto et al., 2017).
The observation that nearby catchments can show very different amounts of trend may be partly explained by differing periods of record but it also indicates the limitations of the uniform regional increases imposed by current climate change allowances (Environment Agency, 2018). It could be that some catchments are more sensitive than others to changes in rainfall characteristics or other aspects of the climate. This is the premise of the research from which climate change guidance was developed (Reynard, Crooks, Kay, & Prudhomme, 2009) but during this development the concept of grouping impacts by catchment type was dropped in favour of simpler regional allowances (Kay, Crooks, Davies, Prudhomme, & Reynard, 2011).

| Methods
Approaches to non-stationary flood frequency analysis can be distinguished by (a) what type of flood data is analysed; (b) which distribution is fitted; (c) which parameters of the distribution are varied; (d) the covariates included in the model; (e) the fitting method; (f) the form of the relationship between the parameters and the covariates; (g) how goodness of fit is judged; (h) whether the analysis is applied at individual stations or regionally.
This paper describes the results of fitting a non-stationary generalised extreme value (GEV) distribution to individual series of annual maximum flows, Z, allowing either or both the location and scale parameters to vary as a linear function of time or other covariates. This allows the mean and/or the variance of the distribution function describing the annual maximum flows to change during the period of record. The distribution function is given by

$$F(z) = \exp\left\{ -\left[ 1 + \xi \left( \frac{z - \mu}{\sigma} \right) \right]_{+}^{-1/\xi} \right\}$$

where μ is the location parameter, σ is the scale parameter, ξ is the shape parameter, $y_{+} = \max\{y, 0\}$, σ > 0 and μ, ξ ∈ ℝ. The location and scale parameters can vary linearly with the covariate x:

$$\mu(x) = \mu_0 + \mu_1 x, \qquad \sigma(x) = \sigma_0 + \sigma_1 x.$$

A log-link function, $\ln \sigma(x) = \sigma_0 + \sigma_1 x$, can also be used to ensure that the scale parameter remains positive for all values of x.
The GEV distribution was fitted using maximum likelihood estimation (MLE). The shape parameter was assumed to be constant because there is too much error in its estimation to permit inclusion of a covariate (O'Brien & Burn, 2014). For the case study in north-west England, the choice of parameter(s) to vary for each station was made by comparing the AIC (Akaike Information Criterion) for each fitted distribution and examining the standard error of the parameters that represent the covariate effect, along with visual assessment of plots that summarise the goodness of fit. Since this manual assessment was impractical at a national scale, when fitting models at all locations across Great Britain the best fitting model was chosen automatically using likelihood ratios. Comparison of the two approaches for assessing goodness of fit indicates that likelihood ratios tend to prefer the selection of simpler models, that is, those that are stationary or have only one non-stationary parameter.
The fitting was carried out using the R open source programming language, via the package extRemes (Gilleland & Katz, 2016).
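The study used the extRemes package in R; as a language-agnostic illustration of the same idea, the following Python sketch fits stationary and non-stationary GEV models by maximum likelihood to synthetic data (scipy rather than extRemes, with made-up parameter values) and compares them by AIC and a likelihood ratio. Note that scipy parameterises the GEV shape as c = −ξ.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

# Synthetic annual maxima with a linear upward trend in the GEV location parameter
rng = np.random.default_rng(1)
years = np.arange(50.0)
amax = genextreme.rvs(c=-0.1, loc=100 + 3.0 * years, scale=30, random_state=rng)

def nll(theta, t, z, trend):
    """Negative log-likelihood; location optionally linear in the covariate t."""
    if trend:
        mu0, mu1, log_sigma, c = theta
        mu = mu0 + mu1 * t
    else:
        mu0, log_sigma, c = theta
        mu = mu0
    return -genextreme.logpdf(z, c=c, loc=mu, scale=np.exp(log_sigma)).sum()

opts = {"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8}
fit0 = minimize(nll, [amax.mean(), np.log(amax.std()), 0.0],
                args=(years, amax, False), method="Nelder-Mead", options=opts)
fit1 = minimize(nll, [amax.mean(), 0.0, np.log(amax.std()), 0.0],
                args=(years, amax, True), method="Nelder-Mead", options=opts)

aic0 = 2 * 3 + 2 * fit0.fun        # stationary model, 3 parameters
aic1 = 2 * 4 + 2 * fit1.fun        # non-stationary location, 4 parameters
lr = 2 * (fit0.fun - fit1.fun)     # ~ chi-squared(1) under the stationary null
```

With a genuine trend in the data, the non-stationary model attains the lower AIC and a likelihood-ratio statistic well above the 5% critical value of 3.84 for one extra parameter, mirroring the model-selection logic described above.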
The main analyses related changes in flood frequency to time, that is, the year in which a flood happened was adopted as a covariate. Although this might seem an obvious choice, the time variable is a proxy for some changing physical phenomenon that is causing any change in flood frequency. Using a physically-based covariate offers two potential advantages:

• The prospect of a better fit;

• The ability to estimate future changes in flood frequency, where there is knowledge of how the covariate might change in future, along with knowledge of a causal relationship between the covariate and the variable.

Although we clearly know what the date will be 50 years hence, we cannot justify extrapolating time-based trends into the future and so results have been estimated for present-day conditions only.
Some published studies report significant improvements in model performance when rainfall covariates are used in preference, or in addition, to a time-based covariate, which is expected given the connection within the hydrological cycle.
A hurdle associated with some covariates is that they introduce an extra stochastic element and hence a need for additional frequency estimation. For example, if the flood frequency is related to the annual rainfall, it is necessary to estimate a frequency distribution for the annual rainfall, and to consider whether that should be a stationary distribution or not. We consider this further in the discussion section.

| Background
Following the floods of December 2015, the Environment Agency commissioned a hydrological review (summarised by Spencer et al., 2018). One of the outcomes was a decision to carry out trend tests and to trial non-stationary flood frequency analysis at a set of 33 gauging stations in north-west England. Most stations were selected for their location on rivers that have recently experienced notable floods, and for their long records of good-quality flood peak data. Most are in Cumbria, with some in Lancashire and Greater Manchester. The north-west England analyses can be regarded both as a pilot for the national analyses described below and as a more detailed study of an area where non-stationarity is more apparent than in most other areas of England and Wales.

Figure 3 shows an example of the results, for the River Lune at Killington, Lancashire, and the fitted parameters are given in Table 2. The annual maximum flows are plotted, along with the results from two fitted GEV distributions, one stationary and one non-stationary, with just the location parameter varying with time. In the first panel the three pairs of lines correspond to design flow estimates for three annual exceedance probabilities (AEPs). The second panel shows the 90% confidence interval associated with the estimates for a 1% AEP.

| Results with time as covariate
The non-stationary results in Figure 3 show an increase between the first and last years of record which varies with flood probability, from a 58% increase for common floods (50% AEP) to a 23% increase for rare floods (1% AEP). More relevant to the business of flood risk management is the finding that for the most recent year available (water year 2015-2016) the non-stationary flow estimate is higher than the conventional stationary estimate: from 23% higher at the 50% AEP to 6% higher at the 1% AEP. At the 24 stations across north-west England where the non-stationary fit was preferred, it led to increased flow estimates, up to 55% higher in one case. On average, the larger increases were seen in high-probability floods: twenty-one of the stations had smaller proportional increases for the 1% AEP than for the 50% AEP. Implications for flood schemes are discussed later.
In the second panel of Figure 3 there is little difference in the width of the confidence intervals associated with the stationary and non-stationary analyses at Killington. The non-stationary result tends to show wider confidence intervals when the scale parameter is allowed to vary as well as, or instead of, the location parameter.
All the non-stationary flood frequency analyses were carried out twice: once ending in water year 2014/2015 and once including 2015/2016 (i.e., both without and with the December 2015 floods). The addition of the December 2015 floods can make a significant difference. In the case of Killington, the non-stationary estimates were preferred whether or not 2015/2016 was included. For the 1% AEP estimate, compared with the stationary estimate ending in 2014/2015, the stationary estimate including 2015/2016 was 21% higher, and the non-stationary estimates without and with 2015/2016 were 11% and 28% higher respectively. Thus, in this case, the addition of the extra event had more effect on the 1% AEP estimate than the decision whether or not to apply non-stationary analysis.

| Alternative covariates
At all stations the primary covariate was time. In addition, for four important stations in Cumbria a more in-depth analysis was carried out, using a selection of meteorological covariates: annual rainfall, autumn and winter rainfall, and winter rainfall. Covariates were calculated from catchment average rainfall series, using the CEH-GEAR dataset which provides daily rainfall on a 1 km grid across the UK from 1890 (Tanguy, Dixon, Prosdocimi, Morris, & Keller, 2016). For each catchment a monthly average rainfall series was created and each of the above covariates was calculated for the water year in which each annual maximum flood occurred. Each covariate was centred and scaled, subtracting the mean and dividing by the standard deviation. This transformation is reported to help reduce the standard error of the parameter estimates (Yan et al., 2017).

One motivation for choosing seasonal rainfall covariates might be that projected changes are available for a range of greenhouse gas emission scenarios and future time periods. There might be an expectation that these covariates would help understand changes in flood frequency. This would rely on an assumption that the response of a catchment to climate change can be captured by the empirical non-stationary analysis using a single covariate; this is highly questionable given that climate change is likely to affect flood flows in multiple ways, including changes to potential evapotranspiration and in short-term rainfall intensity as well as long-term seasonal rainfalls.

At all four gauges where rainfall covariates were included as an alternative to time, the best-fitting model (measured by AIC) was the one using autumn and winter rainfall as a covariate. Figure 4 shows an example of the results, for the River Greta, a tributary of the Derwent near Keswick in Cumbria, and the parameters are given in Table 3. The non-stationary distribution has only the location parameter varying.
Unlike in Figure 3, the flow estimates do not increase steadily over time, instead jumping from 1 year to the next depending on the recorded autumn and winter rainfall (panel b). For the 50% and 5% AEP estimates there is a tendency for increases in recent years. Panel a shows that over the range of seasonal rainfalls observed to date, the 50% AEP flow varies from about 80 to 130 m³/s.
The difference between the stationary and non-stationary models is more pronounced at higher return periods, where uncertainty is greater for both. This is due to the difference in the shape parameters fitted in the stationary and non-stationary cases.

[Figure 6: Ratio of non-stationary to stationary estimate of the 1% AEP flood for the year 2015-2016, at stations with a minimum of 40 years of record.]
The confidence limits for the results were narrower than when time was the covariate and the range of flow estimates during the period of record appeared more realistic in some cases. However, interpretation of the results in terms of a flood frequency curve, even for present-day conditions, is not straightforward. As discussed below, to estimate the flow for a particular probability it is necessary to estimate the autumn and winter rainfall that is expected to occur in the same year as that flow. This complicates practical application of the method and so the results using rainfall covariates were not pursued further.

| NATIONAL-SCALE RESULTS
To provide an indication of the implications of implementing non-stationary analysis at a national scale, the approach outlined above was repeated at all 509 stations in Great Britain with a record length of at least 40 years and with acceptable data for trend testing. The results are indicative, since it was not feasible to review each station individually. At two thirds of stations the stationary distribution gave the preferred fit. At nearly all of the remaining stations, the non-stationary analysis increased the estimate of the present-day flow for the 50% AEP ( Figure 5). Increases of 11-30% are widespread, with some higher increases at isolated stations in the east and south of England and north-east Scotland. The results for lower-probability floods are more mixed (Figure 6), with a general tendency towards more similarity with the stationary estimate but some significant increases in the south of England and some decreases in East Anglia (influenced by large floods in 1947 and 1968) and South Wales.
One result worth highlighting is that at 68 out of the 166 stations where a non-stationary distribution was the preferred fit, the Mann-Kendall test did not detect a trend at a 5% significance level. This highlights a difference between the ways that parametric and non-parametric trend detection tests work: it is worth using both types to gain a fuller picture.

| Choice of covariates
On the face of it, there are some good reasons for preferring physically-based covariates to help explain variations in flood frequency. Most importantly, covariate effects can help to explain the drivers of flood frequency and produce a better-fitting model.
However, while adopting seasonal rainfall (for example) as a covariate may give a good fit, it shifts the problem to one of estimating a frequency distribution for the covariate. It is as if the uncertainty has been chased out of one corner of the hydrological cycle only for it to pop up in another. Serinaldi et al. (2018) refer to this type of approach as a doubly stochastic stationary model.
For example, each year's flow estimate in Figure 4 is conditional on a covariate that is, unrealistically, fixed over time: the flow estimates for a particular year are what to expect if that year's autumn and winter rainfall accumulation were observed every year. Because this is highly unlikely to be the case in reality, a flow estimate is required that accounts for the distribution of the rainfall covariate. This expected flow is known as the marginal return level (Eastoe & Tawn, 2009). The return period of the marginal return level is the reciprocal of the mean of the conditional annual exceedance probability over the distribution of the covariate. This gives one return level that accounts for the non-stationarity of the flows but is not conditional on a particular value of the covariate.
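To make the marginal return level concrete, the following Python sketch uses entirely hypothetical parameter values (not the fitted Greta model): it averages the conditional GEV exceedance probability over draws from an assumed standard-normal distribution for the standardised rainfall covariate, then solves for the flow whose marginal AEP is 1%.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import genextreme

rng = np.random.default_rng(7)
# Hypothetical fitted model: location linear in a standardised rainfall covariate.
# scipy's shape convention is c = -xi.
mu0, mu1, sigma, c = 100.0, 15.0, 30.0, -0.1
x_draws = rng.normal(0.0, 1.0, 100_000)    # assumed covariate distribution

def marginal_aep(z):
    """Mean of the conditional exceedance probability over the covariate draws."""
    return genextreme.sf(z, c=c, loc=mu0 + mu1 * x_draws, scale=sigma).mean()

# Marginal 1% AEP return level: solve E_X[P(Z > z | X)] = 0.01
z100 = brentq(lambda z: marginal_aep(z) - 0.01, 100.0, 1000.0)
# Compare with the return level conditional on the covariate mean (x = 0)
z100_cond = genextreme.isf(0.01, c=c, loc=mu0, scale=sigma)
```

Because the covariate spreads the location parameter across years, the marginal 1% AEP flow exceeds the estimate conditioned on the mean covariate value, illustrating why ignoring covariate variability understates the design flow.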
The marginal return level can be useful when the distribution of the covariate is bounded or well-approximated, for example, the North Atlantic Oscillation index, but can encounter difficulties when this is not the case, and especially given the uncertainty associated with annual maximum series.
Further complications arise when there is a trend in the covariate itself, which may also require some degree of extrapolation. Given these issues, further investigation is needed before physically meaningful covariates can be applied in flood management practice.

| Other areas needing further investigation
Further practical application of non-stationary flood frequency estimation in the UK should explore alternative distributions to the GEV: in particular the Generalised Logistic, widely used in the UK because it is recommended by Robson and Reed (1999), and a version of the four-parameter kappa distribution, which Kjeldsen, Ahn, and Prosdocimi (2017) state renders three-parameter distributions "obsolete" on most UK catchments.
There are a variety of methods for assessing goodness of fit, and they can give quite different results, particularly with respect to whether non-stationary fitting of the scale parameter is worthwhile. This choice can have a major impact on the flow estimates and so would be worth further exploration.
The non-stationary models specified here comprise parameters that are linear functions of the covariate of interest. This assumption of linearity, while allowing for easily interpretable results and straightforward model fitting, may not always be suitable. Additive models, in which parameters are restricted only to be smooth functions of the covariate (Chavez-Demoulin & Davison, 2005; Jonathan, Ewans, & Randell, 2014), offer greater flexibility, at the cost of reduced extrapolation capability. This may reveal more complex non-stationarity in the case studies presented and would be worth further exploration.
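As a concrete illustration of the linear-parameter models discussed here, the sketch below fits a GEV by maximum likelihood with and without a linear trend in the location parameter, and compares the two fits by AIC. The data are synthetic and the details (trend in location only, Nelder-Mead optimisation, AIC as the selection criterion) are illustrative assumptions, not the exact procedure used in the study:

```python
import numpy as np
from scipy.stats import genextreme
from scipy.optimize import minimize

def neg_log_lik(params, amax, t, nonstationary):
    """Negative log-likelihood for a GEV whose location is (optionally)
    a linear function of time t. Scale is optimised on a log scale."""
    if nonstationary:
        loc0, loc1, log_scale, c = params
        loc = loc0 + loc1 * t
    else:
        loc0, log_scale, c = params
        loc = loc0
    nll = -np.sum(genextreme.logpdf(amax, c=c, loc=loc,
                                    scale=np.exp(log_scale)))
    return nll if np.isfinite(nll) else 1e10  # guard against invalid support

def fit_gev(amax, t, nonstationary=False):
    # start from a Gumbel-like fit (shape 0) so the initial point is valid
    x0 = ([np.mean(amax), 0.0] if nonstationary else [np.mean(amax)]) \
        + [np.log(np.std(amax)), 0.0]
    res = minimize(neg_log_lik, x0, args=(amax, t, nonstationary),
                   method="Nelder-Mead", options={"maxiter": 5000})
    return res.x, 2.0 * res.fun + 2.0 * len(x0)   # fitted params, AIC

# Synthetic annual maxima with a strong upward trend in location
rng = np.random.default_rng(7)
t = np.arange(60)
amax = genextreme.rvs(c=-0.1, loc=100.0 + 0.8 * t, scale=25.0,
                      size=60, random_state=rng)
_, aic_stat = fit_gev(amax, t, nonstationary=False)
_, aic_nonstat = fit_gev(amax, t, nonstationary=True)
```

With a trend this strong the non-stationary model should be preferred (lower AIC); swapping AIC for another criterion, or adding a trend in the scale parameter, can change the verdict, which is the sensitivity noted above.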
| Uncertainty and compatibility with regional (pooled) analysis
It is important that the results of any flood frequency estimate are considered alongside their uncertainty, and that this is fed through into decision-making about the design of flood alleviation schemes. Non-stationary analysis has the benefit of reducing a potential source of bias. On the other hand, it can lead to higher variance, through the introduction of more parameters and because the analysis has been carried out on individual stations. Most flood studies in the UK, using FEH methods, rely on pooled analysis in which data from a group of hydrologically similar catchments are analysed together.
One way to reduce the additional uncertainty introduced by non-stationary analysis is to merge information from multiple locations. O'Brien and Burn (2014) present a method for regional non-stationary analysis in which pooling groups are created based on the form of trend found in the data at each station. The results, from Canada, showed less uncertainty than an equivalent regional stationary analysis.
Such an approach could be considered for the UK. One hurdle to overcome is that the FEH method of pooled analysis (Kjeldsen et al., 2008) does not use MLE for fitting frequency distributions, instead being based on an L-moment framework. Non-stationary frequency analysis using L-moments is not straightforward, unless the records are sub-divided into shorter periods. Jones (2013) illustrates the difficulties of extending L-moment procedures to estimate trends in the parameters of distributions. There is also the question of how to obtain results at ungauged sites.
In the meantime, practitioners need a way of reconciling the results of at-site non-stationary analysis (fitted using MLE) with the FEH pooled flood frequency curve, which is typically the approach on which flood management decisions are based. A quick fix could be achieved by transforming the parameters of the non-stationary frequency curve for a chosen year (standardised by the median annual maximum flood) into their equivalent L-moment ratios. These could then be incorporated in a pooled analysis according to the FEH methods, most likely assuming stationarity at the other stations in the group (since they typically carry much less weight in the analysis), although in principle other stations could undergo the same non-stationary treatment as the subject site. Initial tests indicate that this approach gives credible results, but in the long run a more coherent approach would be desirable.
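The first step of such a quick fix, converting fitted GEV parameters into L-moment ratios, follows from the standard closed-form L-moments of the GEV. A minimal sketch, using the hydrological sign convention for the shape parameter and illustrative parameter values for a curve standardised by its median:

```python
import math

def gev_lmoment_ratios(loc, scale, shape):
    """First L-moment, L-CV and L-skewness of a GEV(loc, scale, shape),
    hydrological convention (shape k > 0: bounded upper tail).
    Valid for k > -1 and k != 0."""
    k = shape
    g = math.gamma(1.0 + k)
    lam1 = loc + scale * (1.0 - g) / k                  # mean
    lam2 = scale * (1.0 - 2.0 ** (-k)) * g / k          # L-scale
    tau3 = 2.0 * (1.0 - 3.0 ** (-k)) / (1.0 - 2.0 ** (-k)) - 3.0
    return lam1, lam2 / lam1, tau3                      # mean, L-CV, L-skew

# Illustrative: a frequency curve standardised so the median flow is 1
lam1, lcv, t3 = gev_lmoment_ratios(loc=1.0, scale=0.25, shape=-0.1)
```

The resulting L-CV and L-skewness could then enter an FEH-style pooled analysis in place of the sample ratios for the subject site; the parameter values above are purely hypothetical.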

| Role of historical flood data
Where non-stationary analysis is preferred, a possible implication is that, without making an allowance for trend, past information is no longer as valuable for estimating current or future flood risk. This can conflict with the often-cited value of including historical flood data that predate gauged flow records. Without such longer-term information, estimates of flood risk made from relatively short data series can be highly uncertain.
There is a balance to be struck between the value of historical flood information (reducing the uncertainty) and the potential for it to introduce a bias. Each flood study needs to consider the role of historical data individually.

| Communicating the results
There are difficulties in understanding and communicating familiar concepts such as return period or AEP when flood risk is changing (Serinaldi, 2015). Alternative concepts have been formulated for non-stationary analysis, an example being the design life level which quantifies the risk of a flood exceeding a threshold value during a given period such as the design life of a structure (Rootzén & Katz, 2013).
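The design life level idea can be made concrete in a few lines: the probability of at least one exceedance over a design life is one minus the product of the annual non-exceedance probabilities, which accommodates time-varying AEPs directly. A minimal sketch with illustrative numbers:

```python
import numpy as np

def design_life_risk(annual_probs):
    """Probability of at least one exceedance over a design life,
    given a sequence of (possibly time-varying) annual exceedance
    probabilities."""
    p = np.asarray(annual_probs, dtype=float)
    return 1.0 - np.prod(1.0 - p)

# Stationary 1% AEP event over a 100-year design life
r_stationary = design_life_risk(np.full(100, 0.01))   # ~0.634
# Non-stationary: AEP rising linearly from 1% to 2% over the same period
r_rising = design_life_risk(np.linspace(0.01, 0.02, 100))
```

Even under stationarity the lifetime risk of a "1 in 100 year" flood is about 63%, which is itself a useful communication point; a rising AEP pushes it higher still.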

| Implications for flood management
The results of non-stationary analysis have potential implications for the design and/or economic justification of measures to reduce flood risk. Increases in estimated flood flows may lead to a need to protect against larger events than previously thought. The costs of a scheme tend to be driven by the rarer floods, such as the 1% AEP, since flood walls, storage reservoirs and other interventions are often designed to protect against such events.
The estimated benefits of a scheme in terms of protection of properties tend to be dominated by damages avoided during frequent floods (Environment Agency, 2010).
Current Cumbrian scheme appraisals are piloting the application of non-stationary analysis where there is evidence for it. Preliminary recommendations have been based on the non-stationary analyses, trend tests, and split-sample tests; the latter were carried out on the 50% AEP and provide additional numerical support for increases in flows.

| Future flood risk
The method of non-stationary analysis that has been applied has no predictive power for future conditions. It is all too easy to be proved wrong by future historians or hydrologists when we imagine we can predict the future by extrapolating recent trends. Data series that appear non-stationary now may not do so when another 10 years of flow data are added, and vice versa for apparently stationary series.
Nonetheless, it is becoming increasingly difficult to justify allowing for climate change as if it were purely a future phenomenon, assuming that all data recorded up to the present day can represent an unchanging baseline period. Instead we recommend that, where there is evidence of trend, non-stationary analysis is considered as a way of estimating the present-day flood flows, and a suitable allowance is made for expected future climate change.

| CONCLUSIONS
It is difficult to convince people in Cumbria who have been flooded out of their home for the second or even third time in recent years that they have experienced nothing more than a run of bad luck. Communities who have experienced multiple floods are understandably sceptical about decisions made about investment in flood protection on the basis of an assumption that there has been no change in the probability of flooding. Their suspicions are in line with projections of the impacts of climate change. Although there may be a cyclical element to the recent upsurge in floods across nearly a quarter of river flow gauges in Great Britain, it would be unwise, in the face of a warming climate, to gamble on the likelihood of the trends reversing in the near future.
Non-stationary methods of flood frequency analysis are widespread in research settings. Although they may not yet have reached maturity, they are capable of practical application and can give answers that are more believable and more readily justified to stakeholders. Results can potentially be different enough to justify investment in a flood scheme where present industry-standard methods of flood estimation may lead to a decision not to invest.
It is recommended that, where trend is apparent, non-stationary analysis is adopted alongside conventional methods and the uncertainty of the results is incorporated in the process of deciding a preferred approach.