Despite the very wide application of hydraulic models for real-time flood forecasting and their potential to represent river dynamics (Bates et al., 2006), an examination of the detailed results reveals that the predictions often do not match inundation observations very well in some parts of the modelled domain (Aronica et al., 1998; Pappenberger et al., 2005; Pappenberger et al., 2007). This is partly a result of numerical approximations within such models but, more critically, of inadequate data on the geometry and connectivity of the channel and floodplain, difficulty in estimating effective roughness coefficients, and uncertainty in the initial and boundary conditions (including poorly known upstream and lateral inflows at flood levels). The last of these issues is exacerbated if, as in the tidal situation, the boundary condition is itself the forecast from another model.
These issues, often combined with a lack of adequate calibration and validation data, mean that a single hydraulic model may not be the most suitable tool for making accurate flood forecasts everywhere on the floodplain. In addition, assimilating real-time data into such models can be an important factor in improving the accuracy of forecasts in the next event, but presents a number of challenges because of their numerous distributed state variables that are not normally observed (Madsen and Skotner, 2005; Weerts and El Serafy, 2006). Where observations are available, they will normally be of two types: point water level measurements using either a level recorder or surveyed maximum levels after an event, or remote sensing of flood outlines. The value of single-point measurements, corresponding to only one of the model state variables, is limited since conditioning of the remaining states is dependent upon the state covariance structure of the model, which will change in space and time (Ricci et al., 2011). Remote-sensing data, such as synthetic aperture radar (SAR) imagery, can also be assimilated, either in the form of an outline of the flood extent or by computing estimated water depths using a digital terrain model. In both cases, the method of deriving the assimilated values can introduce uncertainty (Di Baldassarre et al., 2009), the magnitude of which requires careful characterisation since it can lead to biased predictions (Durand et al., 2008; Giustarini et al., 2011; Neal et al., 2009). This article outlines an alternative methodology for providing predictions of the water level at specific points at risk of flooding which may be used for providing flood warnings.
The methodology presented is based upon the data-based mechanistic (DBM) flood forecasting systems which have been shown to be effective elsewhere, for example on the River Nith in Dumfries (Lees et al., 1994), the River Severn (Romanowicz et al., 2006; Romanowicz et al., 2008) and the River Hodder (Young, 2003). Parsimonious time series models are used to describe firstly the tidal signal then, using this and an upstream water level as inputs, the water level at a number of gauged sites on the flood plain. These gauged sites include those used in the issuing of flood warnings.
The following sections of this article present, in order: further details about the River Dee, the model conceptualisation, outlines of the tidal and river water level models, the data assimilation scheme used for real-time forecasting, a summary of the forecast results for the study site, and some concluding remarks.
2. Study site
The source of the River Dee is in the Snowdonia Mountains, west of Bala in North Wales (approximately 700 m above Ordnance Datum) from which it flows down through a deep gorge near Llangollen before arriving at the Cheshire plain. The River Dee then traverses the plain to enter (at Chester) the estuary which links the river to the Irish Sea. The total catchment area is 1816 km² to the Chester Weir at the head of the estuary.
The upper catchment, above the Cheshire plain, is rural. Rainfalls are approximately 2000 mm per annum in the Snowdonia Mountains. In this section the River Dee is steep and flows over impermeable rock, resulting in a rapid response to rainfall events. This section of the River Dee has a long history of flow regulation, the most recent scheme being presented in Mayall (1997).
The work presented in this article focuses on the lower part of the River Dee which crosses the Cheshire plain. Here rainfalls are lower (approximately 750 mm per annum) and the gradient of the river low. The area has a long history of flooding (Baines, 1959). Figure 1 shows a schematic of this section of the River Dee, including the gauging stations on the main channel maintained by the Environment Agency.
All the Environment Agency gauging stations record water level. Discharge is calculated at Manley Hall from levels recorded at a rated weir. Acoustic Doppler technology is used to measure discharge at Iron Bridge and Chester Weir. At these three sites, the high flows present during floods are not accurately recorded, either because a significant fraction of the cross-section is beyond the range of the Acoustic Doppler equipment or because of the ‘drowning out’ of the weir structure. As such observational difficulties are present in many river systems, this article focuses on predicting the water level at each site. Currently flood warnings for this region are issued based upon the water levels at Farndon along with observations of discharges and reservoir levels in the upper catchment.
Data for all the observation sites were available from the start of January 2000 until the end of December 2007 at a 15 min time step. The first two years of data were taken as a calibration period, with the remainder being used for validation.
3. Model conceptualisation
All the locations marked in Figure 1, except Manley Hall, are at times affected by tidal fluctuations in the estuary. During periods of low water level in the river, the tidal influence can produce a significant reverse flow, where the peak in the upstream flow precedes the subsequent increase in water level (Figure 2(a)), consistent with hydrodynamical behaviour. Chester Weir provides a threshold effect (Figure 2(a)). During periods of moderate and high flow in the River Dee the tidal signal may not produce such pronounced reverse flow but acts as a throttle on the outflow of the river (Figure 2(b)).
The flood forecasting model is formulated to predict the water level at three of the gauging locations on the River Dee: Farndon, Iron Bridge and Chester Weir. Past DBM modelling studies (e.g. Romanowicz et al., 2006) have cascaded a series of models to produce forecasts at multiple sites. In this case, such an approach is not appropriate because the observed water level series respond to both the river and the tidal signal. While these signals can be decomposed for identification and estimation purposes (section 5) this is not practical for on-line forecasting. Instead, the models for the individual sites are coupled so the boundary conditions are given as the observed water level data at Manley Hall and modelled estuary water level. Modelled rather than observed estuary water levels are utilised since the time delay in the response to tidal influence (up to 1.25 h depending upon the location) does not offer the minimum of 2 h lead time required by the Environment Agency.
This combined model is then placed within a data assimilation framework (section 6) allowing observations at all three sites to be assimilated and probabilistic forecasts issued.
4. Estuary water level model
Tidal predictions for the River Dee at Saltney Footbridge (Figure 3) were provided using a harmonic analysis (Pugh, 1996) of two years (20 April 2001 to 21 February 2003) of water level data at 15 min intervals. The data were made available by Airbus Industries which monitors that location as part of their operational requirements.
The tides, typical for an Irish Sea location, are semi-diurnal (i.e. there are two high and low waters per day) but with a marked diurnal inequality (high waters within the same day are often different). The curves are typical of tidal rivers, with a marked asymmetry between ebb and flood, in contrast to the more sinusoidal curves typical of coastal ports. The high water part is dominated by the tide but, towards the end of the ebb and at low water, the river flow dominates. Just after low water, the onset of the flood at this location is often marked by the arrival of a tidal bore (a hydraulic jump) followed by a very rapid rise in water level.
In the context of this study, high water predictions are of the greatest interest because the river level model is only influenced by tides which exceed the level of Chester Weir. Figure 3 compares a week of observed data with the tidal predictions derived from 114 harmonic constituents; the high water portions of the tidal predictions are seen to be in good agreement.
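The least-squares fitting that underlies a harmonic analysis can be illustrated with a minimal sketch. It is not the analysis used in this study: it assumes only two constituents (M2 and S2, with their standard periods) rather than 114, uses synthetic noise-free data, and all function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def design_row(t_hours, periods):
    """Regressors at one time: a mean level plus a cosine/sine pair per constituent."""
    row = [1.0]
    for p in periods:
        w = 2.0 * math.pi / p
        row += [math.cos(w * t_hours), math.sin(w * t_hours)]
    return row

def fit_harmonics(times, levels, periods):
    """Least-squares estimate of the harmonic coefficients via the normal equations."""
    rows = [design_row(t, periods) for t in times]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, levels)) for i in range(k)]
    return solve(ata, aty)

def predict(t_hours, coeffs, periods):
    """Evaluate the fitted harmonic series at a given time."""
    return sum(c * x for c, x in zip(coeffs, design_row(t_hours, periods)))

# Assumed illustration: M2 and S2 periods in hours, 30 days of 15 min samples.
PERIODS = [12.4206, 12.0]
times = [0.25 * k for k in range(4 * 24 * 30)]
true_coeffs = [4.0, 2.5, 0.5, 0.8, -0.3]  # invented values, not site data
levels = [predict(t, true_coeffs, PERIODS) for t in times]
coeffs = fit_harmonics(times, levels, PERIODS)
```

Once the coefficients are estimated, `predict` can be evaluated at any future time, which is what allows the tidal boundary condition to be supplied at the required forecast lead time.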
5. River level models
The development of the real-time river level forecasting model consists of two steps: off-line identification and estimation of the model from historic data, followed by on-line forecasting utilising data assimilation. The first of these steps is described in this section.
To allow an initial stage of model identification, the observed water level at the ith site yi = (yi,1,…,yi,T) is decomposed into river wi = (wi,1,…,wi,T) and tidal vi = (vi,1,…,vi,T) components such that
The term is used to represent a low flow component that cannot be attributed to the short-term response to inputs from upstream and tidal inputs such as ground water flow. This approximation has proved suitable for flood forecasting in a number of studies (e.g. Young and Beven, 1994; Lees et al., 1994; Romanowicz et al., 2006, 2008) but can be revised if strong seasonality or other patterns are present in the base flow. The separation of the river and tidal components is based on the premise that the tidal component will be harmonic in nature. Dynamic harmonic regression analysis (Young and Pedregal, 1999) is used to identify the harmonic component of which is taken to be vi, leaving .
The sites are numbered so that i = 0 is Manley Hall and i = 4 is the (modelled) estuary water level. There is no tidal response at Manley Hall, so the tidal component there is zero and, by convention, y4 = v4 is the modelled estuary water level. At each of the three forecast sites (i = 1,2,3), the models for wi and vi are identified using as inputs wi−1 and vi+1 respectively.
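For reference, the decomposition and site conventions described above can be written out as follows. This is a sketch under assumed notation: the symbol $c_i$ for the low flow term, and its treatment as a constant at each site, are assumptions introduced here for illustration rather than the notation of the original equation.

```latex
\begin{align}
  y_{i,t} &= w_{i,t} + v_{i,t} + c_i, \qquad t = 1,\dots,T, \quad i = 1,2,3 \\
  v_{0,t} &= 0 \quad \text{(no tidal response at Manley Hall)} \\
  y_{4,t} &= v_{4,t} \quad \text{(modelled estuary water level)}
\end{align}
```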
This identification process is outlined in the following subsection. The coupling of the models and the joint parameter estimation is then presented.
5.1. Identification of the model structure
The models of the river and tidal response at each forecast site are constructed within the DBM framework (Young, 1998; Young et al., 2004; Young and Garnier, 2006) ensuring that the time series forecasting models developed are both parsimonious representations of the system and interpretable in a physically meaningful fashion. The model identification process is outlined for the river response at a single site. A similar process is undertaken to identify a model for the tidal response.
Initial analysis of the river response proceeds by fitting the linear continuous time transfer function
where is a stochastic noise and sr = ∂r/∂tr. The robust instrumental variable techniques for fitting such models presented in Young and Garnier (2006) and Garnier et al. (2007) are utilised. A number of model structures, determined by the triad of positive integers (n,m,d), can be trialled from which an optimum is chosen using the robust model selection criteria presented in Young (2011) which trade off model fit and parsimony.
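For readers unfamiliar with the notation, a generic continuous-time transfer function of the kind described can be written as below. The coefficient names $b_j$, the noise symbol $\xi_i(t)$ and the convention that the numerator has $m+1$ coefficients are assumptions made for illustration; the denominator parameters $(a_1,\dots,a_n)$ follow the naming used later in the text, and $d$ denotes the pure time delay.

```latex
w_i(t) = \frac{b_0 s^{m} + b_1 s^{m-1} + \dots + b_m}
              {s^{n} + a_1 s^{n-1} + \dots + a_n}\; w_{i-1}(t - d) + \xi_i(t)
```

Under this convention a first-order model has $n = 1$, so the response is governed by a single $a$ parameter.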
If analysis of the residuals of the fitted transfer function confirms the potential for state dependency in the response to the upstream input, this can be explored using state-dependent parameter (SDP) estimation (Young et al., 2001) to consider the discrete time model
with ηt a stochastic disturbance. The notation and indicates that the value of α and β at site i for the river response is believed to vary with the known values of or respectively.
The SDP estimation utilises a semi-parametric description of the change of parameter value with state; typically this takes the form of a generalised random walk (Jakeman and Young, 1984; Young et al., 1989). For forecasting purposes, it is useful to propose a parametric representation of the resulting state dependencies. State dependency in β is commonly represented by transforming the input series. For example, in rainfall–runoff modelling, a lagged value of water level can often be used as a state which represents the wetness of the catchment and hence the magnitude of the response to a unit rainfall input (Young and Beven, 1994). Representation of state dependency in α is more complex since this implies that the parameters (a1,…,an) in the linear transfer function are state dependent. Though techniques for computing parameter estimates in such situations exist (e.g. Laurain et al., 2010), the physical interpretation of such state dependency is often unclear once the transfer function becomes greater than first order (i.e. a single a parameter).
Table 1 outlines the order of the transfer functions identified at the three forecast sites and where the SDP algorithm suggests state dependency exists. In all cases, the identified transfer function is first order and the response to the input variable (upstream river response or downstream tidal response) is nonlinear. At Farndon and Iron Bridge, the nonlinearity in the tidal response reflects the suppressing of this signal when the water level induced by the river response increases (Figure 4). At Chester Weir, the nonlinearity in the tidal response is heavily influenced by the differencing effect of the weir, which results in low estuary water levels having no impact on the river upstream (Figure 4).
Table 1. Input delays for the identified first-order transfer functions along with the delay in minutes (square brackets) and the states on which model nonlinearities are defined (Eq. (3)). No recorded state indicates that there is no identified nonlinearity.
The nonlinearities detected in the river response at Farndon and Iron Bridge relate to the river going ‘out of bank’ during an event in the calibration period. During this out-of-bank period, the persistence of the water levels at Farndon increases but the effect of the upstream input at this site is dampened. In contrast to this, small changes in the Farndon water level during this period produce much greater changes in the water level at Iron Bridge than during periods when both are ‘within bank’.
Where state dependency is believed to exist, it is proposed that it is described by a scaled sigmoid function which gives a value of
when evaluated at state st with parameters (ϕ1,…,ϕ4). In all cases except the tidal input to Chester Weir, the explanatory state is wi, which is not known when producing real-time forecasts. The following section outlines how this issue can be overcome by connecting the identified models to form a single simulation model whose boundary conditions are known and whose parameters can be estimated.
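A minimal sketch of how a scaled sigmoid can drive a first-order model with state-dependent parameters is given below. The particular four-parameter form (offset, range, slope, midpoint), the parameter values and the function names are all assumptions for illustration; they are not the fitted functions of this study.

```python
import math

def scaled_sigmoid(s, phi):
    """Hypothetical scaled sigmoid: phi = (offset, range, slope, midpoint)."""
    phi1, phi2, phi3, phi4 = phi
    return phi1 + phi2 / (1.0 + math.exp(-phi3 * (s - phi4)))

def simulate_first_order(u, delay, alpha_phi, beta_phi, w0=0.0):
    """Simulate w[t] = alpha(s) * w[t-1] + beta(s) * u[t-delay].

    The state s is the previous *simulated* level, so no observed output
    is needed to evaluate the nonlinearities.
    """
    w = [w0]
    for t in range(1, len(u)):
        state = w[-1]
        alpha = scaled_sigmoid(state, alpha_phi)
        beta = scaled_sigmoid(state, beta_phi)
        u_delayed = u[t - delay] if t - delay >= 0 else u[0]
        w.append(alpha * state + beta * u_delayed)
    return w

# Invented parameters chosen so that alpha + beta = 1 (unit steady-state gain),
# with alpha between 0.80 and 0.95 and beta between 0.05 and 0.20.
ALPHA_PHI = (0.80, 0.15, 2.0, 1.5)
BETA_PHI = (0.20, -0.15, 2.0, 1.5)
u = [1.0] * 400  # unit step input
w = simulate_first_order(u, delay=2, alpha_phi=ALPHA_PHI, beta_phi=BETA_PHI)
```

Because the explanatory state is taken from the simulation itself, the same recursion can be run without any observed output at the site.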
5.2. The coupled model
The basis of forming the simulation model is to replace the unknowns wi and vi with their predicted values and . For example, let be the value of the parametrised form of the in Eq. (3), evaluated at time t. If the parametrised form is not state dependent and is time invariant, the subscript indicating time is dropped. Since the identified transfer functions are first order, these parametrised nonlinearities are sufficient to specify the coupled model given in Eqs (5)–(8).
Using the state vector
the coupled models outlined above can be expressed in state space form as
The inputs to this model correspond to the boundary conditions, i.e. the water level at Manley Hall and the modelled estuary water level. The system matrices are given by
The parameters of the state-dependent roots and poles require estimation. This is achieved by numerical minimisation of a sum-of-squared-errors criterion. Taking the observation matrix
the errors in the model fit at time t are given by
and the optimisation criteria are defined as . To maintain a mechanistic interpretation of the system, constraints are placed upon the optimisation. The poles are constrained to lie between zero and one, thus maintaining a stable system with a positive temporal dependence in the output variables. The roots are estimated to ensure they are always positive, thus increasing input corresponds to increasing water levels at a site. The resulting parameter estimates are presented in Table 2, along with a summary of the performance.
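The idea of minimising a sum of squared errors subject to these constraints can be sketched as follows. The numerical optimiser used in the study is not specified here, so an exhaustive grid search is swapped in purely to keep the example self-contained. The single linear first-order model, the upper bound of one on the gain, the synthetic data and the function names are all assumptions.

```python
def simulate(u, pole, gain):
    """First-order model w[t] = pole * w[t-1] + gain * u[t-1], starting at zero."""
    w = [0.0]
    for t in range(1, len(u)):
        w.append(pole * w[-1] + gain * u[t - 1])
    return w

def sse(obs, sim):
    """Sum of squared errors between observed and simulated series."""
    return sum((o - s) ** 2 for o, s in zip(obs, sim))

def grid_search(u, obs, steps=99):
    """Search pole in (0, 1) and gain in (0, 1) on a regular grid.

    The grid excludes the end points, so the pole stays strictly inside
    the stable, positive range and the gain stays strictly positive.
    """
    best = None
    for i in range(1, steps + 1):
        pole = i / (steps + 1)
        for j in range(1, steps + 1):
            gain = j / (steps + 1)
            score = sse(obs, simulate(u, pole, gain))
            if best is None or score < best[0]:
                best = (score, pole, gain)
    return best

# Synthetic test: a pulse input routed through a known model.
u = [1.0 if 5 <= t < 25 else 0.0 for t in range(60)]
obs = simulate(u, 0.9, 0.1)
score, pole, gain = grid_search(u, obs)
```

In practice a gradient-based or simplex optimiser with bound constraints would replace the grid, but the constraints play the same role: they rule out unstable or physically implausible parameter sets.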
Table 2. Optimised parameter values of the coupled model for their appropriate response. Single parameters indicate a constant value while vectors represent (ϕ1,…,ϕ4) in Eq. (4). Performance of the fitted model is shown in terms of the Nash–Sutcliffe efficiency and the bias.
(0.88, 0.09, 0.92, 0.00)
(0.01, 0.12, –5.25, 4.42)
(0.14, 0.02, 3.67, 15.28)
(0.02, 0.02, 3.85, 2.35)
(0.02, 0.28, –10.00, 0.75)
(0.22, 1.11, –7.38, 0.00)
(0.01, 0.10, 3.63, 5.06)
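The two performance measures quoted in Table 2 can be computed as in the sketch below. The Nash–Sutcliffe efficiency follows its standard definition; the sign convention for the bias (simulated minus observed) is an assumption, and the numbers in the example are invented.

```python
def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squares about the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def bias(obs, sim):
    """Mean error, assumed here to be simulated minus observed."""
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)

obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.5, 2.5, 3.5, 4.5]
nse = nash_sutcliffe(obs, sim)
mean_error = bias(obs, sim)
```

An efficiency of one indicates a perfect fit, while a value of zero indicates that the model performs no better than the mean of the observations.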
6. Real-time forecasting with data assimilation
Past DBM modelling studies have utilised a number of data assimilation techniques. For example, Romanowicz et al. (2006) cast a time-invariant linear transfer function in a mechanistically interpretable state space form, then performed data assimilation using the Kalman filter (Kalman, 1960). An alternative strategy (Lees et al., 1994) is to use the output of the DBM model as a deterministic prediction which is corrected using a stochastic gain to produce probabilistic forecasts. Data can then be assimilated to condition the evolution of the gain. These two approaches can also be coupled (Young, 2002). Further extensions, such as state-dependent formulations for the variance terms in the Kalman filter (KF), can also be incorporated (e.g. Smith et al., 2008).
This study follows the former approach of embedding the forecasting model in a filtering algorithm. However, in this case the state-space model is nonlinear making the filtering problem challenging. A number of algorithms are available which provide approximate solutions to the nonlinear filtering problem. Particle filters (e.g. Doucet et al., 2001; Moradkhani et al., 2005a; Weerts and El Serafy, 2006), which approximate the desired distributions through Monte-Carlo sampling, can be considered the most flexible, although the computational burden can be large (Smith et al., 2008) and implementation difficult when the failure of the model to capture the system dynamics dominates the observation noise (Liu and Chen, 1998).
Less computationally demanding and more commonly applied methodologies for solving the filtering problem are nonlinear extensions to the KF which attempt to construct a second-order approximation of the required distributions. In the current context, the main difference among these nonlinear extensions is the method of approximating the transformation of the mean and variance of the state vector through the nonlinear model. The Ensemble Kalman Filter (EnKF; Evensen, 2003; Moradkhani et al., 2005b; Reichle et al., 2008) and Unscented Kalman Filter (UKF; Julier and Uhlmann, 1997) perform this mapping using (often structured) samples. In the case of the EnKF, the resulting transformation is efficient only when the state variable is approximately multivariate Gaussian. For the UKF there is the presumption that the state distribution is unimodal, symmetric and unbounded.
The Extended Kalman Filter (EKF; Rajaram and Georgakakos, 1989; DaRos and Borga, 1997) used here linearises the transform of the state vector through the nonlinear model based on local derivatives. In many hydrological applications, the EKF has fallen out of favour (DaRos and Borga, 1997) due to difficulties in obtaining the required derivatives and concerns over the validity of the linearisation. For the model considered here, the derivatives are smooth and can be obtained analytically while the observation time step, when compared to the responses witnessed, appears frequent enough for the linearisation to provide an appropriate approximation.
6.1. The Extended Kalman Filter
To apply the EKF, consider the state vector xt to be unknown and to follow a distribution that can be summarised by its first two moments: the mean and the variance Pt. The propagation of the distribution of the state vector forward in time is based on a Taylor series expansion of the nonlinear model (Rajaram and Georgakakos, 1989). Evaluation of this requires knowledge of the Jacobian
For the model given in Eq. (10), the formulae for columns of can be computed analytically. Significant simplification results when Ft is dependent only upon (w1,t,w2,t,w3,t) and Gt only upon w1,t.
Using the subscript t + 1|t to indicate the prediction at time t + 1 given all the information up to time t, the a priori prediction equations for the distribution of the state after propagation through the nonlinear model are:
where and are the values taken by Ft and Gt evaluated at . The matrix Q represents the covariance of a zero-mean symmetric additive noise term included in the state evolution to represent system noise which arises through inadequacies in the model's characterisation of the hydrological response compounded by errors in the forcing data.
The f-step-ahead forecasts can be computed as follows. First, repeated applications of Eqs. (16) and (17) give the predicted state mean and its covariance Pt+f|t. From these, the expected value of the prediction is obtained by applying the observation matrix H to the predicted state mean, and the prediction variance is Rt+f|t + HPt+f|tHᵀ, where Rt+f|t is the time-varying or state-dependent covariance matrix of the additive zero-mean unbounded observation noise.
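In generic notation, the prediction and forecast steps described above take the standard EKF form sketched below. This assumes that the state space model can be written as $x_{t+1} = F_t x_t + G_t u_t$ plus additive noise, with $u_t$ the vector of boundary conditions; the symbols $\hat{x}$, $u_t$ and $J_t$ (the Jacobian evaluated at the current state estimate) are assumptions introduced for illustration.

```latex
\begin{align}
  \hat{x}_{t+1|t} &= \hat{F}_t\, \hat{x}_{t|t} + \hat{G}_t\, u_t \\
  P_{t+1|t} &= J_t\, P_{t|t}\, J_t^{\mathsf{T}} + Q \\
  \hat{y}_{t+f|t} &= H\, \hat{x}_{t+f|t} \\
  \operatorname{Var}\!\left(y_{t+f} \mid \text{data to } t\right) &= R_{t+f|t} + H\, P_{t+f|t}\, H^{\mathsf{T}}
\end{align}
```

Applying the first two lines $f$ times, with the known or modelled boundary conditions supplied at each step, gives the quantities needed in the last two lines.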
Following the observation of water levels across the three sites at time t + 1, the mean and covariance of the a posteriori distribution of the model states can be computed using:
The unknown covariance matrices Q and Rt+f|t require selection. In this study they are chosen to be diagonal, implying that the noise terms associated with the modelled response at each site are independent. Correlations between the elements in the state vector are introduced through the nonlinear model. In keeping with past DBM studies, the diagonal elements of Rt+f|t were taken to be linearly dependent upon the expected value of the forecast at that site raised to a power. The diagonal elements of Q and parameters describing the state dependency of Rt+f|t were optimised to maximise the likelihood of the calibration data, given the forecasts at the maximum available lead time (2 h). The Prediction Error Decomposition methodology of Schweppe (1965) was used, which assumes independent Gaussian forecast distributions.
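The predict and update cycle of an EKF can be illustrated with a scalar sketch. This is not the coupled three-site filter of this study: it assumes a single state with a hypothetical state-dependent pole, a directly observed state (observation matrix equal to one), constant noise variances and noise-free synthetic observations.

```python
import math

def alpha(w):
    """Hypothetical state-dependent pole, bounded between 0.80 and 0.95."""
    return 0.80 + 0.15 / (1.0 + math.exp(-(w - 2.0)))

def alpha_prime(w):
    """Analytical derivative of alpha with respect to the state."""
    sig = 1.0 / (1.0 + math.exp(-(w - 2.0)))
    return 0.15 * sig * (1.0 - sig)

def ekf_predict(mean, var, u, gain, q):
    """Propagate mean through the nonlinear model and variance through its Jacobian."""
    new_mean = alpha(mean) * mean + gain * u
    jac = alpha_prime(mean) * mean + alpha(mean)  # d/dw of alpha(w) * w
    return new_mean, jac * var * jac + q

def ekf_update(mean, var, obs, r):
    """Condition the predicted state on an observation of the state itself."""
    k = var / (var + r)  # Kalman gain with observation matrix equal to one
    return mean + k * (obs - mean), (1.0 - k) * var

U, GAIN, Q, R = 0.3, 0.1, 0.01, 0.01  # invented input, gain and noise variances
truth, mean, var = 1.0, 0.0, 1.0      # filter deliberately starts from a wrong mean
for _ in range(50):
    truth = alpha(truth) * truth + GAIN * U         # synthetic "true" level
    mean, var = ekf_predict(mean, var, U, GAIN, Q)  # a priori step
    mean, var = ekf_update(mean, var, truth, R)     # a posteriori step
```

The analytical derivative in `alpha_prime` plays the role of the Jacobian: because the assumed nonlinearity is a smooth sigmoid, it is available in closed form, which is the property that makes the EKF practical in this setting.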
7. Forecast results
Table 3 quantifies the forecast performance of the model at the three sites. The 95% prediction confidence intervals given by the model (based on the Gaussian assumption used in calibration) appear consistent with the observed performance during both calibration and validation at Iron Bridge and Farndon, irrespective of the discharge threshold analysed. At Chester Weir, the validation performance deteriorates at higher thresholds, suggesting there are limitations of the representation of the tidal input and its interaction with the river response.
Table 3. Summary of forecast performance at a 2 h lead time. Fractions of the number of observations within two prediction confidence intervals and bias during the calibration (upright) and validation (italic) periods for each location at different quantile thresholds of the observed discharge.
95% prediction CI
50% prediction CI
The 50% prediction confidence intervals for Farndon and Iron Bridge include more of the data than would be expected in both calibration and validation. This suggests that the Gaussian assumption may not be valid which, given the approximations in the filtering scheme, suggests the prediction intervals are only indicative.
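The coverage statistics discussed above can be computed as in the following sketch, which counts the fraction of observations falling inside a central Gaussian prediction interval. The function name and the example values are hypothetical.

```python
from statistics import NormalDist

def coverage(obs, means, variances, level):
    """Fraction of observations inside the central Gaussian interval of a given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # e.g. about 1.96 for level = 0.95
    inside = 0
    for y, m, v in zip(obs, means, variances):
        half_width = z * v ** 0.5
        if abs(y - m) <= half_width:
            inside += 1
    return inside / len(obs)

# Invented example: four observations against standard normal forecasts.
obs = [0.0, 0.5, 1.0, 2.5]
means = [0.0, 0.0, 0.0, 0.0]
variances = [1.0, 1.0, 1.0, 1.0]
cov95 = coverage(obs, means, variances, 0.95)
cov50 = coverage(obs, means, variances, 0.50)
```

If the forecast distributions were well calibrated, the two fractions would be close to 0.95 and 0.50 respectively over a long record; a 50% interval that captures markedly more than half of the data indicates forecast distributions that are too wide near the centre.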
The assessment of the bias of the forecasts indicates that the model has a low forecast bias except for high water levels at Chester Weir.
Figure 5 shows forecast results for a flood event in the calibration period at the maximum lead time possible (2 h). As would be expected from the quantitative forecast summaries, the predictions at Iron Bridge appear to capture the dynamics of the river and tidal response. The forecasts for Farndon highlight the difficulty in capturing the high water levels at this site. While the state dependencies introduced in the model appear to capture the magnitude of the flood peak successfully, the plot indicates changes in the attenuation of the upstream input signal are not represented so well. The effect of this on the ability to issue flood warnings based on the expected value of the predictions is shown in Table 4.
Table 4. Summary of the 2 h lead time forecasts against flood alert levels for the Lower Dee Valley, expressed as percentage of the number of observed time steps above the alert level.
True positives (%)
False positives (%)
True positives are those time steps where both the observed value and the model prediction are above the alert threshold. False positives are those time steps where the model prediction is above the alert level but the observation is below.
Severe flood warning
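Following the definitions given with Table 4, the true and false positive percentages can be computed as sketched below. The function name, the use of a strict inequality at the alert level and the example values are assumptions.

```python
def alert_rates(obs, pred, threshold):
    """True and false positive percentages relative to observed exceedances.

    Both percentages are normalised by the number of time steps at which
    the observation exceeds the alert level, so the false positive
    percentage can exceed 100.
    """
    n_obs_above = sum(1 for o in obs if o > threshold)
    if n_obs_above == 0:
        raise ValueError("no observed time steps above the alert level")
    true_pos = sum(1 for o, p in zip(obs, pred) if o > threshold and p > threshold)
    false_pos = sum(1 for o, p in zip(obs, pred) if o <= threshold and p > threshold)
    return 100.0 * true_pos / n_obs_above, 100.0 * false_pos / n_obs_above

# Invented water levels and alert level, for illustration only.
obs = [1.0, 3.0, 4.0, 2.0, 5.0]
pred = [1.0, 3.5, 2.0, 3.2, 6.0]
tp_pct, fp_pct = alert_rates(obs, pred, threshold=3.0)
```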
The qualitative analysis of the model performance is further reinforced by Figure 6, which shows forecasts at the same lead time for an event in the validation period. Limitations in the representation of high water levels at Farndon can again be seen. The failure to represent a tidal response at Chester Weir on 8 January shows the limitations of the representation of the tidal input at this site. The use of data assimilation to correct the predictions of the estuary water level model may be useful in addressing this.
8. Conclusions
This article has developed a DBM model for forecasting water level on the tidal section of the River Dee on the Cheshire Plain. As with past DBM modelling studies, the models are identified objectively from the observed data. The nature of the identified models indicates that they require coupling and the resulting state space form is more complex than that used in past hydrological DBM studies. In contrast to past studies, lagged values of the observed output are not used to transform the boundary conditions of the model. This results in a model that can be run in simulation, for example to explore climate change scenarios, as well as for real-time forecasting.
For real-time forecasting, the nonlinear nature of the state space form requires the use of a more complex data assimilation algorithm. It is demonstrated that, for this model, the Extended Kalman Filter can provide a suitable solution. Of course, observed values of water levels are required for data assimilation in real-time forecasting. The forecasts generated are shown to be accurate at Farndon and Iron Bridge although the timing of high water levels at Farndon can be poorly characterised. Analysis of the forecasts at Chester Weir, where the tidal signal is a significant cause of high water levels, indicates that improved modelling of the tidal input and its interactions with the river are required for successful forecasting.
The model provides a comparatively short lead time for each of the forecast sites. At 2 h, this meets only the minimum lead time undertaking of the Environment Agency and necessarily limits the utility of the model for flood warning purposes. Ongoing work to increase the lead time is focused on developing a forecasting model for the Manley Hall water level.
Acknowledgements
The authors wish to acknowledge the support of the Environment Agency (Wales) in providing data and details of their flood warning procedures. Airbus Industries are thanked for providing and allowing use of their data for construction of the tidal forcing model. Discussions with Peter Young and Wlodek Tych have been helpful in the formulation of the models used. This work was carried out as part of the NERC FREE program which funded the first author.