Issues in generating stochastic observables for hydrological models

This paper provides a historical review and critique of stochastic generating models for hydrological observables, from early generation of monthly discharge series, through flood frequency estimation by continuous simulation, to current weather generators. There are a number of issues that arise in such models, from uncertainties in the observational data on which such models must be based, to the potential persistence effects in hydroclimatic systems, the proper representation of tail behaviour in the underlying distributions, and the interpretation of future scenarios.


| STOCHASTIC OBSERVABLES AS INPUTS TO MODELS
The management of water resources, floods and droughts in hydrology is an exercise in predicting the future, at least in some probabilistic sense. The future is, however, not observable and therefore necessarily unknown. It needs to be predicted, at least in some probabilistic or plausible projection sense, and for that we normally use some form of hydrological model. In doing it is usual to drive, calibrate and evaluate the hindcasts of that model as a hypothesis about how the catchment is functioning making use of some historical observables. The nature of that modelling process has been discussed extensively elsewhere. Here, I wish to concentrate on the nature of the variables used to drive the future discharge projections of the model, in particular catchment discharge, precipitation and evapotranspiration estimates.
The generation of forcing variables will necessarily be directly related to the observables used in calibration and validation. The generation process will, however, be stochastic in both the nature of the observables themselves, and in the uncertain future values. It is worth emphasizing that the observables and their historical time series of values are themselves uncertain, particularly when estimated over a catchment area. We should, therefore, consider the historical data as virtual variables (e.g., Beven et al., 2012;Coxon et al., 2015;Kiang et al., 2018;McMillan et al., 2018;Westerberg et al., 2011). This creates additional constraints on the modelling process and suggests that hydrology should be treated as one of the inexact sciences (Beven, 2019). Extending the model projections into the future, we cannot necessarily expect the statistics of those stochastic variables to be similar to the past, given the potential for future changes in catchments and climate and persistence in the hydroclimatic system. Such changes will then be necessarily subject to epistemic uncertainties, including those associated with any projections of future conditions. It has therefore been a fairly natural process in hydrological analyses, particularly since the start of the digital computing age, to generate time series of potential future inputs to inform management decisions (although, until recently, it has been much less common to consider the uncertainties in the historical observations). In some cases, this has meant a process of modifying historical records to be compatible with expectations of future conditions. In other cases, stochastic models of inputs have been used to drive hydrological models, in some cases using analytical mathematics but more often using Monte Carlo methods. Reviews of analytical derived distribution methods have been provided by, for example, Eagleson (1972); Klemeš (1978) and Gupta and Waymire (1983). To derive analytical solutions generally requires some strong simplifying assumptions and constraints. In what follows we will concentrate on Monte Carlo methods which are more general and more flexible.
Monte Carlo sampling methods on digital computers date back to the work of Stanislaw Ulam, Nicholas Metropolis, and John von Neumann at Los Alamos in the 1940s where the ENIAC 1 computer was assigned to solve the problem of neutron diffusion using random experiments. Since the project was secret, a code name was required for the work and Nicholas Metropolis suggested the name Monte Carlo (Metropolis & Ulam, 1949). He later contributed to the Metropolis-Hastings Monte Carlo Markov Chain algorithm, 2 widely used since in Bayesian model conditioning including applications in hydrology (Beven, 2009). John von Neumann also invented a method of generating pseudo-random numbers for these experiments using the middle-square method. This was known to be subject to failures, but it was fast and compact (a critical issue given the capabilities of the ENIAC computer) and von Neumann argued that when it failed it did so obviously.
In the 1950s and 1960s Monte Carlo methods developed rapidly in a variety of fields, including hydrology, as digital computers started to be made more widely accessible. The area of hydrological work we are concerned with in this paper is the generation of input sequences for hydrological models to inform management decisions. This includes the generation of discharge or water yield series for water resource and reservoir management, and precipitation and evapotranspiration inputs for hydrological simulation models. Monte Carlo methods have also been used for other purposes, such as the description of dispersion in water bodies and the generation of spatial fields of parameters for hillslope and groundwater models but these will not be considered further here.

| GENERATION OF STOCHASTIC SERIES OF DISCHARGES
The first paper to address the issue of generating series of flows seems to have been that of Brittan (1961) in an application to the Colorado River, but this was preceded by a number of studies that had analysed historical flow records from a probabilistic perspective (e.g., Leopold, 1959;Yevjevich, 1961). It was also followed by the much better-known paper of Thomas and Fiering (1962) that has the title 'Mathematical synthesis of stream flow sequences for the analysis of river basins by simulation'. 3 In that paper Thomas and Fiering provide a method of sampling from a specified distribution of monthly stream flows including the effects of correlation between months. It is formulated as the recursion equation: where q i + 1, j and q i, j represent the synthetic monthly stream discharge during year j for month i À 1 and month i respectively; q 0 iþ1 and q 0 i represent the average monthly flow of the historic streamflow record for month i À I and month i, respectively; b i + 1 is the regression coefficient for estimating flow during month i À 1 from the flow during month i; the value s i + 1 is the SD of the historic streamflow record for month i + 1; r i + 1 is the correlation between flow for month i + I and month i; and t i is a random deviate sampled from a normal distribution with zero mean and unit variance (see also Fiering, 1967). This process therefore has 12 q i parameters, 12 b i parameters, 12 r i parameters and 12 s i parameters all of which can be derived from historical flow records under an assumption of stationarity. Thomas and Fiering generated 510 years of record as a input to a water resource management simulation. Such synthetic records could therefore allow for more extreme conditions than might be available in the historical record but Yagil (1963) gave a proof that this type of equation would preserve the mean, standard deviation and correlation of the original historical record. Any process representation in such a formulation is contained only in the coefficients b i and r i . Preservation of the first two moments, of course, is not necessarily an indication that the nature of the extremes of interest in water resources management might be well simulated.
There were a number of extensions to the Thomas and Fiering model proposed. Harms and Campbell (1967) for example suggested sampling a log-normal distribution rather than normal (which also avoids the potential for negative flows in the synthetic series). Based on Yevjevich (1964) they also introduced a correlation from year to year to allow for storage carry-over and historical sequences of wetter and drier years. Colston and Wiggert (1970) developed confidence limits for dependable reservoir flows using stochastic monthly simulations, though Klemeš and Bulu (1979) later suggested that such estimates should be considered to have limited confidence. Salas and Smith (1981) show how the Thomas and Fiering model is equivalent to an autoregressive moving-average (ARMA) model of flow and groundwater storage, with structures depending on the structure of the rainfall inputs in applying the water balance. Srikanthan and McMahon (1980) generated monthly flows for ephemeral streams in Australia, comparing six different model structures. An early review of such methods is provided by Yevjevich (1987).
Later work as digital computers became more widely available also led to developments in generating discharge hydrographs (rather than only monthly and annual time series. One of the earliest models of this type was that of Weiss (1977) who used a shot noise concept of impulses occurring as a Poisson process in time. Weiss uses impulses of exponential form, with different time characteristics for surface and subsurface contributions to the hydrograph, but other assumptions are also possible (e.g., Claps et al., 2005). In that such models normally assume components equivalent to linear stores with different time constants, it is worth noting that there is significant overlap with the data-based mechanistic modelling (DBM) methods of Young (2013, and references therein) as fitted to historical data. and driven by stochastic rainfall series.
Such DBM models could also be driven by stochastic assumptions about rainstorms to produce discharge series.

| DISCHARGE AS A STOCHASTIC SERIES WITH PERSISTENCE
This concept of stochastic hydrological forcing also led to a body of literature on the hydrological phenomenon as fractal variables, primarily because the hydrologist Jim Wallis 4 was working at the IBM Thomas J. Watson Research Center at Yorktown Heights, NY, at the same time as Benoit Mandelbrot, the mathematician who coined the name fractal (see, e.g., Mandelbrot & Wallis, 1968;Mandelbrot, 1971). Persistence effects in hydrological series had been known since the analyses of River Nile data and other geophysical series by Harold Edwin Hurst 5 (e.g., Hurst, 1957) but fractal scaling behaviour provided a novel way of looking at persistence. Others, however, objected to the fractal description at the time suggesting there were other mechanisms to explain persistence in time series (e.g., Klemeš, 1974;Scheidegger, 1970;O'Connell, 1971). More recently Koutsoyiannis et al. (2018) have argued that what constitutes a fractal is only very vaguely defined but essentially represents some local behaviour as scale tends to zero, whereas the type of persistence exhibited by long time series of discharge is a more global behaviour that is better described as a form of Hurst-Kolmogorov dynamics (see also Koutsoyiannis, 2006Koutsoyiannis, , 2011. Hydrological applications of Hurst-Kolmogorov dynamics are discussed in detail in Koutsoyiannis (2021a). It is an important issue in terms of generating stochastic variables because it can be demonstrated that some rather simple longer term stationary models may exhibit apparently short-term nonstationary characteristics including apparent changes in mean or trends. The controversy in part arises because of the difficulties of identification of such models given only relatively short hydrological records. This remains an issue in determining the nature of change (e.g., Koutsoyiannis, 2013;Serinaldi et al., 2018). Dimitriadis and Koutsoyiannis (2018) have recently proposed a general model for stochastic generation of variables that, for certain structures, can reproduce aspects of intermittency characteristic of some long term series. Koutsoyiannis (2020) has extended this work to time irreversible processes such as series of event hydrographs.

| GENERATING STOCHASTIC INPUTS TO HYDROLOGICAL MODELS
These early stochastic models of discharge were primarily intended to replace observed values as inputs to water resource management models. Later developments also started to use stochastic models of the inputs to rainfall-runoff models to replace observed values. There were a number of reasons for doing so. The first was to provide an alternative way of estimating flood frequencies in situations where long discharge records were not available to allow the robust fitting of a flood frequency distribution. It was often the case that longer rainfall time series were available, so that fitting a stochastic model of rainfalls would allow the generation of long rainfall sequences as an input to a rainfall-runoff model calibrated to whatever discharge data were available. This approach had already used based directly on observed rainfall series, for example, using the Hydrocomp version of the Stanford Watershed Model in Fleming and Franz (1971).
The origins of generating stochastic inputs for flood frequency estimation lay in the seminal paper of Pete Eagleson (1972). 6 His motivation for developing a method of flood frequency estimation from rainfall distributions was 'to seek an increased understanding of the dynamics of flood frequency through a theoretical development that relates peak streamflow statistics to the statistics of climatic and watershed parameters by using the kinematic equations of runoff as they are derived for homogeneous catchments and storms' (p878).
This involves three components: a model of point rainfalls and the intensity and duration of rainfall excess; a runoff routing model to derive peak discharge from the rainfall excess variables and catchment characteristics; and a transformational component to derive 'the distribution F'(Q p ) of peak streamflow from the distributions of the rainfall excess variables by using the analytic relationship provided by the runoff model' (p.879).
Using simple mathematical descriptions of distributions and kinematic theory for the routing, Eagleson showed that the derived distribution of flood peaks could be obtained analytically. This provided an engineering solution to frequency estimation from rainfall characteristics at a time when the use of digital computers was still not widespread. It was also the start of an extremely fruitful and influential scientific program based on the derived distribution approach, including the sequence of 7 papers on Climate, Soil and Vegetation in Water Resources Research by Eagleson (1978).
The assumption was that such an approach would provide more robust estimates of flood frequencies than the uncertain estimates associated with fitting a distribution to a small number of historical flood peaks. It was also argued that the approach could be applied to ungauged catchments if the parameters of the rainfall-runoff model could be estimated independently based on soil and topographic information. The devil is, of course, in the details of the assumptions necessary to obtain a tractable analytical solution. In Eagleson (1972) it is assumed that both mean rainstorm intensities and durations can be represented as independent one parameter exponential distributions (since a gamma conditional distribution was more difficult to handle) with an areal reduction factor to allow for increasing catchment area. Rainfall excess is then calculated by subtracting a constant loss rate from the mean rainfall intensity (the Φindex method) leaving a constant excess rate over the duration of the storm. A table of typical loss rate parameters is given but no account is taken of how the loss rate might vary with antecedent wetness prior to an event (Eagleson again notes that would lead to very complicated mathematics). This is then routed over representative slope segments for the catchment. Storm runoff here is very much overland flow, but following the work of Betson (1964), Eagleson allows that this runoff may occur only as a partial area response in the catchment, by letting the rainfall excess to occur on a rectangular area centred on the stream channel will hillslopes of fixed slope and noting that this might result in a partial equilibrium hydrograph depending on the intensity and duration of the excess rainfalls. The routing is completed by routing down the channel itself to derive a flood peak magnitude for an event.
Conditional integration over the distribution of storms that have rainfall excess rates greater than zero results in the required derived distribution of flood peaks.
This paper inspired others that provided alternative models for both the runoff generati0n and the routing components. As computer power increased, some of the simplifying assumptions about runoff generation and routing required by Eagleson's analytical analysis could be relaxed, including the use of the geomorphic unit hydrograph for routing (e.g., Diaz-Granados et al., 1984;Hebson & Wood, 1982) and more realistic treatments of the effect of antecedent wetness conditions and runoff generation (e.g., De Michele & Salvadori, 2002; Sivapalan et al., 1990). This type of analysis is, however, fundamentally depending on the representation of the driving input variables.
More realistic rainstorm generators have been proposed including the random rectangular pulse Bartlett-Lewis and Neyman-Scott models (e.g., Rodriguez-Iturbe et al., 1987;Entekhabi et al., 1989;Cowpertwait & O'Connell, 1992;Onof and Wheater, 1993). Storms are assumed to be generated as a Poisson process but they differ in reproduce the first few moments of the rainfall distributions. Estimation of the parameters of these types of models can be an issue (e.g., Favre et al., 2004;Velghe et al., 1994). Reviews of such models are provided by Waymire and Gupta (1981) and Bordoy and Burlando (2014).
A further development was to add a distribution of arrival times for events, and an additional evapotranspiration component to then allow the generation of time sequences of inputs to a hydrological model which produces continuous simulations of discharges, including the flood peaks of interest. These models were effectively the first 'weather generators'. Such an approach has less often been used to generate the statistics of droughts given the limitations of many rainfall-runoff models in predicting low flows as connectivities in flow pathways start to break down. One of the first studies to use continuous simulation to estimate flood frequency made use of historical rainfall records and evapotranspiration estimates (Fleming & Franz, 1971). This limits the length of simulations can be made, whereas an approach based on distribution functions can generate records of any required length. A rule of thumb in flood frequency estimation that arose out of some of the early stochastic sampling experiments carried out by Jim Wallis and others (e.g., Wallis & Wood, 1985) was that robust estimates of frequency required records of ten times the return period to be estimated (i.e., 1000 years for events with an annual exceedance probability [AEP] of 0.01 that are commonly used in flood defence designs).
For dam safety studies, where event magnitudes with an AEP of 0.0001 might be of interest, this suggests realizations of 100 000 years might be required.  (1969) for example, it was demonstrated that the coefficient of variability for a collection of 12 raingauges in a single enclosure at an exposed site in New Zealand decreased with mean storm rainfall but could be 10%

| ISSUES IN GENERATING INPUTS FOR RAINFALL-RUNOFF MODELS
for a storm of 10 mm. Averaging to monthly values decreased this to 1%-2%. Such uncertainty is often ignored in rainfall generation.
There is then the further issue of relating point rainfall estimates to the catchment area values required as inputs to hydrological models. Many studies have simply assumed that point rainfall distributions can be used at the catchment scale or have simply defined an areal reduction factor (as used in Eagleson, 1972, above). Others, that point values can be associated with subcatchments, but with a simple cross-correlation structure between subcatchments used in generating catchment scale storms (e.g., Blazkova & Beven, 2009). There have also been developments in space-time stochastic rainfall models to address this problem that aim to preserve the point statistics (see next section); the issue of what is a good estimate of catchment area rainfall, however, remains.
To define a stochastic point rainfall generator, long term rainfall records are generally required, especially in hydrological regimes with strong seasonal or weather type variability. Generating rainfall alone may also not be sufficient where snowfalls can be important, which will require a model that includes both rainfall and temperatures (see next section). Early derived distribution methods tended to fit distributions of storm duration, mean intensity and arrival time of the next storm to the sample values derived from the available rainfall records, checking for correlation between the values. This then requires the definition of what constitutes a storm in the available data (e.g., Beven, 1986a;Restrepo-Posada & Eagleson, 1982) by what minimum period of zero or negligible rainfalls should be required between storms. Such issues arise both for daily and sub-daily rainfall values and will inevitably result in some degree of approximation.
Later studies have also involved the generation of storm profile data. One approach has built on the analyses of the multifractal nature of observed rainfalls in time (e.g., Gupta & Waymire, 1993), by disaggregating the event total and duration into a sequence of fractal increments for a specified time step (e.g., Puente and Obreg on, 1996;Obregon et al., 2002;Maskey et al., 2019). Wavelet analysis has also been proposed as a more robust way of analysing multifractal behaviour and has led to some important insights about the small-scale variability within and between pulses of rainfalls within an event (e.g., Venugopal et al., 2006). This type of scaling behaviour might be expected if the high frequency variations in rainfall rates are linked to atmospheric turbulence (though see the discussion in Veneziano et al., 2006, suggesting that it may not be that simple). A simpler approach, based directly on the observational series, is to sample directly from cumulative profiles taken from the observed storms. This approach was taken by Cameron et al. (1999); Cameron et al. (2000) who classified the sampled profiles, normalized by event volume and duration, into a number of event classes. Given a sampled profile for a generated event, the intensities can be calculated for any required time step.
Another issue, noted earlier, is that some applications require the generation of long time series, for instance in frequency analysis for the design of flood defences or dam safety studies. This raises two problems: one is the issue of how pseudo-random number generators produce extreme values on the tails of the specified distributions (some generators are much better than others in that respect, see Bowman, 1995;Press et al., 2007;Bertoni et al., 2010) 7 ; the second is the infinite tails of those distributions which can allow the generation of some very extreme values. In terms of extremes, some consideration needs to be given to reproducing higher moments of the observed data (e.g., Burton et al., 2008;Koutsoyiannis, 2004Koutsoyiannis, , 2021a but for rainfalls, those extreme values might, by random sampling during a long enough run of random numbers, exceed what might be atmospherically possible in a hydroclimatic region. What is atmospherically possible was embodied in the idea of a maximum possible precipitation, that later evolved into a probable maximum precipitation (PMP) (Benson, 1973;Hershfield, 1961). It is, however, difficult to define and its evaluation is subject to multiple assumptions and different sources of epistemic uncertainty, especially in what to assume about the potential advection of moisture and vapour (see, for example, Koutsoyiannis, 1999;Papalexiou & Koutsoyiannis, 2006;Micovic et al., 2015).
Different applications of rainfall generators have dealt with these problems in different ways. Some have simply ignored the problem, particularly when the interest is in defining frequencies that are not too far out on the tails. When the interest is in more extreme values two approaches have been taken. One is to resample if an event exceeds some upper threshold. For rainfalls this has often been some regional definition of PMP (e.g., Blazkova & Beven, 2009). Another approach has been to use a form of sampling distribution that does not have an infinite tail, but rather is asymptotic to some upper limit. Cameron et al. (1999) for example, introduced an upper bound for the distribution of storm intensities based on the UK national envelop of rainfall extremes for different durations. Cameron et al. (2000) compared three different stochastic rainfall models for a number of sites: a version of the original Eagleson (1972) model that had also been used in the continuous flood frequency modelling of Beven (1986aBeven ( , 1986bBeven ( , 1987; a data-based model for the Plynlimon rainfalls that had been introduced by Cameron et al. (1999); and the gamma distribution of the Bartlett-Lewis model developed by Onof and Wheater (1994). This may, however, be a misguided approach to the problem. Koutsoyiannis (1999) revisits the paper of Hershfield (1961) that made use of data from multiple stations in advocating the probable maximum precipitation concept. He shows that the joint data set used by Hershfield is consistent with a Generalized Extreme Value Type 2 (EV2) distribution out to the very long return periods allowed by concatenating the data from different sites. He also later shows (Koutsoyiannis, 2004(Koutsoyiannis, , 2021a) that the shape parameter of the EV2 distribution is close to a value of 0.15 for a very wide range of stations across Europe and North America. An EV2 distribution does not, of course, have an upper limit, it will always predict more extreme values than the EV1 (Gumbel) distribution for a given return period (though both will be subject to significant uncertainties at high return periods depending on the length of the data set available for analysis. The statistics therefore do not justify imposition of an upper limit. In a stochastic generator, however, it remains the case that values of event rainfalls much larger than the envelope curves of extremes for a given duration (often assembled on a national basis) might be randomly generated. These will have low probability, consistent with the distribution from which they are generated, but will clearly generate extremely high flows if used as the input to a hydrological model.
What we do not know is whether those values might be beyond the range of atmospheric possibility at these very low probabilities.
The implication is that, even if there is no justification for applying any upper limit, sufficiently long realizations should be used, relative to the return period of interest for an application, to allow any 'outlier' values generated, of much lower probability, not to have an effect on the return period of interest; hence the rule of thumb noted earlier that the realizations should be at least ten times longer than the return period of interest. Multiple realizations will also allow the sampling variability at different return periods to be evaluated (see, for example, Blazkova & Beven, 2009, for a dam safety example).

| Space-time rainfall models
The point rainfall generators of the last section have often been used as if they represented catchment averaged inputs, in the same way that data from single raingauges is also often used as if it represented BEVEN the catchment average inputs when there is no other information about the spatial pattern. Data from multiple raingauges can, of course, be averaged or interpolated over the catchment area in a variety of ways to estimate an average input value but there may be epistemic uncertainties associated with such estimates, especially where there is a large elevation range in a catchment or where the precipitation also involves snowfall. Attempts to estimate such uncertainties have been rather rare in the literature but can be derived for methods such as general linear models (e.g., Chandler et al., 2006) or kriging and co-kriging (e.g., Clark & Slater, 2006).
In stochastic generation of precipitation values, going from single sites to a full space-time field is not simple because of the timevarying correlation structures in space and time associated with rainfall fields and the possibility of only partial coverage by some events.
Both synoptic and convective rainfall events are structured in cells and bands, with analyses suggesting that this might involve a multifractal scaling (e.g., Schertzer and Lovejoy, 1987;Hubert et al., 1993). An early model of space-time rainfall was suggested by LeCam (1961) within a framework of evolving rainfall cells or 'showers', clustered in space and time, with later developments by Waymire et al. (1984) and Cox and Isham (1988).
More recent extensions of point rainfall models to space-time event generation, include the RainSim code of Burton et al. (2008) which is based on a Neyman-Scott clustering process (Cowpertwait et al., 2002;Leonard et al., 2008). Wheater et al. (2000)  There are too many uncertainties in the energy budget method, and too much approximation in the degree-day method. That is why, in real-time snowmelt modelling, data assimilation of snow-covered areas is often used to improve the forecasts during the melt season.
Because of the importance of snow in many regions, however, there have been many attempts to include snow in weather generators, particularly in assessing the impacts of future climate scenarios.
In some cases this also involves consideration of the accumulation and melt of snow on glaciers. These approaches are based either on a full energy budget approach or the degree-day method. In both cases, temperatures and precipitation values must be generated (with more meteorological variables for the energy budget method), and a temperature threshold is used to decide whether the precipitation falls as snow or rain as a parameter of the model. Temperatures can, of course, be variable in space, particularly in mountainous terrain so that there is an issue of generating spatial patterns, even if that is approximated to an elevation lapse rate relative to temperature at some point. It also raises the issue of covariation of all the required variables.

| Issues in converting stochastic precipitation inputs into flows
Peak flows will depend on velocities and rates of flood runoff generation; low flows will depend on storage-discharge characteristics of the subsurface including, at very low flows, the potential for the breakdown of connectivity in some flow pathways. Both are far more difficult problems than how to route that runoff once it is in a river, in part because of the observations of rainfall and runoff under extreme conditions are both limited and uncertain. This will have an important effect on any attempt to validate rainfall-runoff models (Beven, 1986b(Beven, , 2019. In continuous simulation studies, rainfalls are not the only input required to drive a hydrological model (and if snow is an important input, then other variables such as temperature and radiation might already be required to estimate melt rates). To allow the simulation of antecedent conditions prior to each event account must also be taken of actual evapotranspiration losses from a catchment. There may also be other losses, such as deep groundwater seepage, but these are generally ignored. Potential evapotranspiration rates can be simulated as part of a weather generator where it is expected that temperatures and humidities might be changing. However, some studies suggest that evapotranspiration rates in a hydroclimatic zone are rather conservative, so that if significant change is not an issue, rather simple models of potential evapotranspiration might be used to give good simulations of soil moisture deficit, even under extreme dry conditions (e.g., Calder et al., 1983).
For extreme wet conditions, a major issue in generating flows from any stochastic model of the inputs is just how good are the process representations of hydrological models, particularly under extreme conditions? In principle there is an opportunity to use a rainfall-runoff model to reflect dynamic changes in the dominant runoff processes with event magnitude and antecedent conditions. In practice this has proven to be difficult to confirm; though it has to be said that the hydrological modelling community has not really tried that hard to invalidate its model formulations (Beven & Lane, 2019). It has instead relied on calibration techniques to obtain a parameter set giving acceptable results within a model structure (where the definition of acceptable has been rather variable). There has been little attempt to really test the underlying process concepts of model structures, and very few studies where a model formulation has been rejected (but see, for example, Page et al., 2007;Hollaway et al., 2018).
In the case of flood hydrology, there is a long tradition of applying baseflow and stormflow concepts with the continued use of hydrograph separation techniques (e.g., recently Ghotbi et al., 2020). But there is no common definition of what might constitute baseflow (e.g., Beven, 1991) such that different definitions will provide a differ- show that stormflow and baseflow (however defined) are not equivalent to surface and subsurface runoff (see, for example, Kirchner, 2003;Beven, 2012Beven, , 2020aBeven, , 2021a. The relationship is much more complex. The identification of model parameters based on historical data also remains an issue, especially when the emphasis is on the higher magnitude flows for flood frequency estimation. Lamb and Kay (2004) Indeed it has been suggested that machine learning method might be more successful in converting rainfalls into streamflows than models based on process representations (e.g., Kratzert et al., 2019;Nearing et al., 2021). Such data-based methods are heavily dependent on the range and quality of the training data set available, though they will be able to compensate for inconsistencies in a data set if those inconsistencies show some form of consistent structure (see the discussion in Beven, 2020b). There may also be problems in predicting rare events or extremes using machine learning methods, since by definition such events will be rare in the training data and extrapolation beyond the range of the training data may not be well controlled.
There will certainly more attempts to use such techniques with additional constraints (e.g., an event water balance constraint) in future but such studies are still at an early stage.
There is an extensive literature on the uncertainties in hydrological observations and in predicting streamflows from rainfalls and other input data (see the other papers in this volume). This is compounded in the current context by the additional uncertainties associated with any stochastic model of those input data that then cascades through whatever approach is being used to simulate the discharges. The epistemic cascading nature of such uncertainties suggests that all the resulting discharges will be uncertain, and that any estimation of that uncertainty will be subject to the specific assumptions made in an analysis (e.g., Beven & Lamb, 2014).

| Issues in generating future sequences: Stochastic weather generators and stationarity
Stochastic models are designed to reproduce the statistics of the observational series against which they are calibrated. But this will then depend on the sample support for that calibration and whether the process can be considered as stationary for periods of similar length. In one sense this does not matter in that all the series generated by a stochastic model are necessarily conditional; they will behave strictly according to the assumptions on which they are based (if programmed correctly, and approximately so in the case of, for example, fractional Gaussian generators of finite memory length). This does not mean that they will adequately span the possibilities of future behaviour if either the calibration period is atypical, or if the system shows persistent behaviour that cannot be adequately characterized in the data available, or if the system is changing over time. In all of these cases the generated series might underestimate the potential future variability (see, for example, Thompson & Smith, 2019).
Sometimes it is difficult to determine which of these possibilities might hold. This is illustrated, for example, in the flood frequency estimation by continuous simulation study of Blazkova and Beven (2009).
The aim of that study was to get a better estimation of very rare events for assessing dam safety in the Skalka catchment in the Czech Republic. A rainfall-runoff model was calibrated for the catchment using 60 years of discharge records using the GLUE methodology. Perhaps more importantly, very few have done so while allowing for the potential for persistent stochastic behaviour (though see Tyralis & Koutsoyiannis, 2017). Most authors of such studies will make the argument that the outcomes are only potential scenarios, and that the projections of climate models are the best indication of future boundary conditions that are available. However, the outcomes from this type of study have rarely been tested, for example, by evaluating such an approach as predicted before the last decade against the actual observations over the last decade. This is not surprising since climate models change over time, so it would not be surprising if a past generation of model did not do well in predicting change in the last decade. The current generation of climate model might do better in predicting change in the next decade but will then also be superseded by the time we have the observations for that decade. In the past, however, they have certainly not done well in predicting hydrologically relevant variables such as rainfalls and near-surface humidities and wind speeds (see, for example, Koutsoyiannis, 2021b); indeed it has been suggested that at the decadal scale simple data-based models can be more successful even for predicting changes in global temperature (Fildes & Kourentzes, 2011;Suckling and Smith, 2013;Young, 2018;Young et al., 2021), but even for local predictions for land surface fluxes (e.g., Nearing et al., 2016).
This problem is generally overcome by only looking at the percentage changes predicted in future decades, and applying those changes to an observed record, or the parameters of a weather generation model. This might also involve downscaling algorithms that allow for the difference in scale from the climate model grid scale to the scale of an application such as a particular catchment.
The downscaling algorithms also have to be calibrated and this allows corrections for bias and other characteristics to be made between the climate model and observational statistics. These corrections are then assumed to hold into the future. This is a convenient working procedure, but since changes are being predicted there is no reason why that assumption should hold (see, for example, Koutsoyiannis et al., 2008;Beven, 2011). Indeed, some tests of weather generators suggest that they have not done. This is now cited in the text that well in reproducing the statistics of historical weather (e.g. Semenova et al., 1998), especially for the extremes (e.g. Verhoest et al., 2010).
More recent weather generators produce outputs of highresolution spatial fields of precipitation and meteorological variables (e.g., AWE-GEN of Peleg et al., 2017;Peleg et al., 2019 including time irreversible processes such as event discharges (Koutsoyiannis, 2020).

| CONCLUSIONS
There are enough issues associated with the stochastic generation of observables as inputs to hydrological models that they should be both used with care and the outputs, particularly the extremes, be assessed for realism. This is, of course, easier said than done, since even in the historical record the sample of available extremes might be rather limited. Evaluating future projections is impossible, but that also suggests that such projections should not be used without some effort being made to assess what the potential range of uncertainty associated with the projections might be and how that might propagate through the cascade of models involved in any impact study (e.g., Beven & Lamb, 2014).
Some of the most important issues that remain are: 1. The characteristics of the observations and their uncertainties that underlie stochastic models of inputs. It is still difficult to have an idea of catchment average inputs, because of uncertainties in observations (raingauge, radar or satellite) and interpolations. That makes it difficult to properly calibrate and test any stochastic models of inputs that might be used to produce long term sequences for flood frequency estimation or assessing impacts of change.
2. The limitations of the observational data are also an issue when it comes to representing the extremes (both wet and dry) based on the tail assumptions of the distributions that underlie these stochastic models. This includes the potential for persistence that has been found in time series of hydrological observables. Since we are not going to get this 'right' (it represents an epistemic uncertainty about the nature of the extremes) there can only be a choice of assumptions, but we should make an assessment of the sensitivity of decisions to uncertainty in the assumptions.
3. In terms of assessing impacts of climate change projections it is still difficult to really know how rainfalls will change into the future, or whether there has been any real change in the recent past relative to persistence in natural variability. Downscaling and bias corrections based on historical data, with the same limitations as the previous point that have been used to compensate for limitations of climate models are a neat trick to compensate for model deficiencies but will not necessarily hold into the future. Atmospheric modelling of particular storms with convection resolving schemes is improving but is not yet at a stage when it could be used for generating long term sequences of storms because of the dependence on boundary conditions from larger scale (and often coarser resolution) models and the greater sensitivity to land surface boundary conditions, with limitations of representation of surface and subsurface water and energy fluxes.
4. There is still much to understand about the apparent scaling behaviour of rainfall and discharge data, implying some form of long-term persistence in the hydroclimatic system, at least over a certain range of scales, with consequences for the assessment of frequencies. Some studies have made use of this in generating storm profiles and sequences in the short term, but there remains an issue about implications about longer term frequency characteristics, the use of persistence in weather generators, and whether there might be physical constraints on higher magnitude events.
5. We cannot avoid the general problem that the hydrological models used to convert stochastic sequences of inputs into simulated discharges have not been properly tested for high magnitude events, whether those be models based on process representations or derived by data-based methods or machine learning. More work is required on model validation (or invalidation, see Beven & Lane, 2019;Beven, 2018Beven, , 2021b. This also suggests that any such study should include an appropriate assessment of uncertainty in the outcomes, cascading from the model of the inputs through the discharge simulation model, with an expectation that the uncertainty be significant.

ACKNOWLEDGEMENTS
Work on this paper has been supported by the NERC Q-NFM project led by Dr. Nick Chappell (grant no. NE/R004722/1). The paper has greatly benefitted from an excellent review and the recent work of Demetris Koutsoyiannis for which I am most grateful.

DATA AVAILABILITY STATEMENT
This paper does not include any original data.

Keith Beven
https://orcid.org/0000-0001-7465-3934 ENDNOTES 1 See https://en.wikipedia.org/wiki/ENIAC. ENIAC was first commissioned in 1945. By the end of its operation in 1956, ENIAC contained 18 000 vacuum tubes; 7200 crystal diodes; 1500 relays; 70 000 resistors; 10 000 capacitors; and approximately 5 000 000 hand-soldered joints. It was roughly 2.4 m Â 0.9 m Â 30 m in size, occupied 167 m 2 (1800 sq ft) and consumed 150 kW of electricity. Programming involved by a combination of plugboard wiring and three portable function tables (containing 1200 ten-way switches each), so that setting up a new job could take weeks. Several of the early master programmers were women who did not receive proper credit for their work.
neering at Harvard University and a member of the interdisciplinary Harvard Water Resources Program that was founded in 1955 and chaired by Arthur Maass from the Department of Public Administration (Maass, 1962;Reuss, 2003). Myron B. Fiering (1934-1992 was a PhD student of Thomas and, after a short spell as an assistant professor at the University of California, returned to spend the rest of his career at Harvard. He was eventually appointed as a Professor of Applied Mathematics.