## 1. Introduction

[2] In this paper, flood forecasting is understood in a specific sense: the derivation of online forecasts of the flood level at certain strategic locations along the river, updated in real time over a specified time horizon, based on information about rainfall and the behavior of the flood wave upstream. Depending on the length of the river reach and the slope of the river bed, a realistic forecast lead time obtained in this manner may range from hours to days. The upstream information can include observations of river levels and/or rainfall measurements. Where meteorological ensemble forecasts are available, they can be used to extend the forecast lead times further, as in the approaches presented by *Krzysztofowicz* [1999, 2002a, 2002b] and *Pappenberger et al.* [2005]; such information is not used in this paper.

[3] The flow forecasting procedures described in this paper are incorporated within a two-step data assimilation (DA) procedure based on data-based mechanistic (DBM) models [e.g., *Young*, 2002a, 2002b], formulated in a stochastic state space setting for recursive state estimation and forecasting. In the first step, available observations of rainfall and river levels at different locations along the river are sequentially assimilated into the forecasting algorithm, based on the statistically identified and estimated DBM dynamic models, to derive multistep-ahead forecasts. In the second step, these forecasts are updated in real time as new data become available, using a Kalman filter based approach [*Kalman*, 1960]. This incorporation of new observations into the dynamic model is performed online at every time step of the forecasting procedure.
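The forecast-then-update cycle described above can be illustrated with a minimal scalar Kalman filter. This is a sketch under stated assumptions, not the paper's catchment model: the first-order model form, the class name `ScalarKF`, and the noise variances `q` and `r` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScalarKF:
    """Hypothetical first-order model x_k = a*x_{k-1} + b*u_{k-1} + w_k,
    observed as y_k = x_k + e_k (illustrative only)."""

    a: float  # state transition coefficient (assumed)
    b: float  # input gain (assumed)
    q: float  # process noise variance (assumed known)
    r: float  # observation noise variance (assumed known)
    x: float = 0.0  # current state estimate
    p: float = 1.0  # variance of the state estimate

    def predict(self, u: float) -> None:
        """Propagate the state and its variance one step ahead."""
        self.x = self.a * self.x + self.b * u
        self.p = self.a * self.a * self.p + self.q

    def update(self, y: float) -> None:
        """Correct the one-step prediction with a new observation y."""
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (y - self.x)
        self.p *= 1.0 - k

    def forecast(self, inputs: list[float]) -> list[tuple[float, float]]:
        """Multistep-ahead (mean, variance) pairs; the filter itself is not modified."""
        x, p, out = self.x, self.p, []
        for u in inputs:
            x = self.a * x + self.b * u
            p = self.a * self.a * p + self.q
            out.append((x, p))
        return out
```

In an online setting, `predict` and `update` would be called once per time step as each observation arrives, while `forecast` supplies the multistep-ahead predictions and their variances (from which approximate confidence bounds follow) from the current estimate.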

[4] DA techniques have found wide application in meteorology and oceanography; an extensive review of sequential DA techniques, together with examples of their application in oceanography, is presented by *Bertino et al.* [2003]. The problems described there involve the integration of multidimensional, spatiotemporal observations into fully distributed numerical ocean models. They therefore differ from flood forecasting systems, which have much smaller spatial dimensionality (these are usually interconnected, quasi-distributed systems of lumped rainfall-flow and flow routing models) but require longer forecast lead times. Nevertheless, certain aspects of the problem are shared, and this has led to recent applications of the ensemble Kalman filter (ENKF) in a flow forecasting context.

[5] The ENKF was developed by *Evensen* [1994] as an alternative to the extended Kalman filter (EKF). It adapts the Kalman filter (KF) to nonlinear systems by using Monte Carlo sampling in the prediction (or propagation) step and linear updating in the correction (or analysis) step. It has been applied to rainfall-flow modeling by *Vrugt et al.* [2005] and *Moradkhani et al.* [2005b]. In these works, the authors use sequential, ENKF-based estimation of both the hydrological model parameters and the associated state variables for a single rainfall-flow model (in contrast to the present paper, which considers a much more complicated, quasi-distributed catchment model involving a number of interconnected rainfall-level and routing models). *Moradkhani et al.* [2005a] also apply a particle filter (PF) algorithm to sequential hydrologic data assimilation. Yet another approach to data assimilation is presented by *Madsen et al.* [2003] and *Madsen and Skotner* [2005], who update the modeling error of the distributed Mike-11 flow forecasts at the observation sites using a constant-in-time proportional gain that depends on the river “chainage” (distance along the river). Essentially, this gain adjusts for hydrological model bias.
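As a rough illustration of the ENKF cycle described above (Monte Carlo propagation followed by a linear analysis step), the sketch below handles a scalar, directly observed state using perturbed observations. The function `enkf_step`, the scalar state, and the noise settings are illustrative assumptions; the cited applications use multidimensional states and, in some cases, joint parameter estimation.

```python
import random
from statistics import fmean, variance


def enkf_step(ensemble, model, y, q, r, rng):
    """One ENKF cycle for a scalar state observed directly (illustrative sketch).

    ensemble: list of state samples; model: possibly nonlinear propagation function;
    y: new observation; q, r: process and observation noise variances (assumed known).
    """
    # Prediction: push every member through the model and add process noise
    forecast = [model(x) + rng.gauss(0.0, q ** 0.5) for x in ensemble]
    # The sample variance of the ensemble stands in for the propagated covariance
    p_f = variance(forecast)
    gain = p_f / (p_f + r)
    # Analysis: linear update of each member against a perturbed observation
    return [x + gain * (y + rng.gauss(0.0, r ** 0.5) - x) for x in forecast]


if __name__ == "__main__":
    rng = random.Random(0)
    members = [rng.gauss(1.0, 0.5) for _ in range(200)]
    # Hypothetical mildly nonlinear storage model
    members = enkf_step(members, lambda x: 0.9 * x + 0.05 * x * abs(x), 1.2, 0.01, 0.05, rng)
    print(f"analysis mean {fmean(members):.3f}, spread {variance(members) ** 0.5:.3f}")
```

Each cycle requires one model run per ensemble member, which is the source of the computational cost discussed next.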

[6] One problem with both PF and ENKF is that they are computationally intensive approaches to DA and forecasting, requiring many Monte Carlo realizations at each propagation step. Moreover, due to the high complexity of these approaches, there may be questions about the identifiability of parameters involved in the different aspects of the applied routines (e.g., the estimation of the variance hyperparameters associated with stochastic inputs during the state and parameter estimation processes). In order to reduce the computational burden of ENKF and other Kalman filter based schemes, regularization may be introduced, as discussed by *Sørensen et al.* [2004]. However, as we shall see in this paper, the relatively simple nature of the rainfall-flow and flow routing processes means that there are simpler and computationally much less intensive alternatives to DA that are able to provide comparable forecasting performance.

[7] A number of simplified, conventional approaches to flow routing have been applied to flow forecasting. These include, for example, the Muskingum model with multiple inputs [*Khan*, 1993], multiple regression (MR) models [*Holder*, 1985], and autoregressive (AR) models [*Thirumalaiah and Deo*, 2000]. The Muskingum model is deterministic and does not provide the uncertainty bounds required for the forecasts. Moreover, all of these conventional models have a completely linear structure and so do not perform as well as nonlinear alternatives in a rainfall-flow context. In contrast, a considerable amount of research on nonlinear methods in flood forecasting has been published recently. For example, *Porporato and Ridolfi* [2001] apply a nonlinear prediction approach to multivariate flow routing and compare it favorably with ARMAX model forecasts. However, this approach does not have a recursive form and requires online automatic optimization when used for online forecasting. Another nonlinear approach is the use of neural networks (NN) for flood forecasting [e.g., *Thirumalaiah and Deo*, 2000; *Park et al.*, 2005]. As discussed in these papers, NN models can yield better forecasts than conventional linear models and, if designed appropriately, they also allow online data assimilation. However, NN based models normally have an overly complex nonlinear structure and so can provide overparameterized representations of the fairly simple nonlinearity that characterizes the rainfall-flow process (see section 2.1, equation (1a)). They are also the epitome of the “black box” model and provide very little information on the underlying physical nature of the rainfall-flow process: information that can provide confidence in the model and allow better implementation of the forecasting algorithm within a recursive estimation context.
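For comparison with the conventional routing methods mentioned above, the classical single-input Muskingum scheme can be sketched as follows. It uses the standard routing coefficients derived from the storage relation S = K[xI + (1 - x)O]; the function name and the default initial outflow are assumptions, and the multi-input variant of *Khan* [1993] is not reproduced. The output is a single deterministic hydrograph with no uncertainty bounds, which is the limitation noted in the text.

```python
def muskingum_route(inflow, k, x, dt, o0=None):
    """Route an inflow hydrograph through one reach with the Muskingum method.

    k: storage constant (same time units as dt); x: weighting factor (0 <= x <= 0.5).
    o0: initial outflow; defaults to the first inflow value (an assumption).
    Note: c0 becomes negative when dt < 2*k*x; a common guideline is
    2*k*x <= dt <= 2*k*(1 - x).
    """
    denom = k - k * x + 0.5 * dt
    c0 = (-k * x + 0.5 * dt) / denom
    c1 = (k * x + 0.5 * dt) / denom
    c2 = (k - k * x - 0.5 * dt) / denom  # c0 + c1 + c2 == 1
    out = [inflow[0] if o0 is None else o0]
    # O_{t+1} = c0 * I_{t+1} + c1 * I_t + c2 * O_t
    for i_prev, i_next in zip(inflow[:-1], inflow[1:]):
        out.append(c0 * i_next + c1 * i_prev + c2 * out[-1])
    return out
```

Routing a flood pulse through this scheme attenuates and delays the peak, but every run with the same inputs returns the same single trajectory.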

[8] In this paper, we consider a newer approach to simplified modeling that uses statistical estimation to identify the special, serially connected nature of the rainfall input nonlinearity in the rainfall-water level process, and then exploits this to develop a computationally efficient forecasting algorithm. The approach is based on statistically estimated, stochastic-dynamic DBM models of the rainfall-level and level routing components of the system, which are integrated into an adaptive version of the standard recursive Kalman filter (KF) state estimation algorithm that generates both the state variable forecasts and their 95% confidence bounds. Here, the state variables are defined as the “fast” and “slow” flow levels (see section 2.2), which together characterize the main inferred variables in the identified DBM rainfall-level models. The serial input nonlinearity in the rainfall-flow model is obtained via a nonlinear transformation of rainfall, the nature of which is estimated directly from the data using a method of state-dependent parameter (SDP) estimation for nonlinear stochastic systems [e.g., *Young*, 2000, 2001]. In contrast to the predictions of a fully distributed parameter model, the water level forecasts here are made only at the measurement locations, in accordance with the goal of the flood forecasting system under consideration. The resulting water level forecasts could, of course, be used as input to a fully distributed flood forecasting model [*Romanowicz and Beven*, 1998; *Romanowicz et al.*, 2004a]; or, if the necessary elevation data are available, they could be used to derive risk maps at the gauging sites along the river.
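The serial structure described above (a nonlinear transformation of rainfall feeding linear “fast” and “slow” pathways) can be sketched as below. The power-law gain on the measured level, its parameters `c` and `beta`, and the first-order pathway coefficients are hypothetical: in the paper, the shape of the nonlinearity is estimated nonparametrically from the data by SDP estimation rather than fixed in advance.

```python
def effective_rainfall(rain, level, c=1.0, beta=0.5):
    """Hypothetical input nonlinearity: u_k = c * level_k**beta * rain_k.

    Assumed here: the measured level acts as a surrogate for catchment wetness,
    so the same rainfall gives more effective input when the river is already high.
    """
    return [c * max(h, 0.0) ** beta * r for r, h in zip(rain, level)]


def fast_slow_response(u, a_fast, b_fast, a_slow, b_slow, delay=0):
    """Parallel first-order pathways driven by effective rainfall u.

    Each pathway follows x_k = a * x_{k-1} + b * u_{k-delay}; the modeled level
    is the sum of the fast and slow components.
    """
    xf = xs = 0.0
    fast, slow = [], []
    for k in range(len(u)):
        uk = u[k - delay] if k >= delay else 0.0
        xf = a_fast * xf + b_fast * uk
        xs = a_slow * xs + b_slow * uk
        fast.append(xf)
        slow.append(xs)
    total = [f + s for f, s in zip(fast, slow)]
    return fast, slow, total
```

Treating the two pathways separately is what allows the fast and slow components to serve as state variables in a Kalman filter; in DBM modeling such a parallel form is commonly obtained by partial-fraction expansion of an identified second-order transfer function.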

[9] In regard to model and forecast updating, *Refsgaard* [1997] reviews different updating techniques used in real time flood forecasting systems and compares two updating procedures applied to a conceptual hydrological model of a catchment, including rainfall-flow and flow routing models. In terms of his classification, the methods developed in the present paper use both recursive parameter and state updating. However, unlike approaches such as the EKF algorithm used by Refsgaard, where updating is carried out within a single nonlinear state space setting, with the parameters treated as adjoined state variables, our state and parameter estimation procedures are carried out separately but concurrently, using coordinated recursive estimation algorithms, as in *Young* [2002a, 2002b]. This avoids the well-known deficiencies of the EKF (such as problems with covariance estimation and convergence for multidimensional, nonlinear models) and yields statistically more efficient estimates of the model parameters. Building on previous research [*Romanowicz et al.*, 2004b], the present study also models and forecasts water level (stage) rather than flow. This avoids the errors introduced by converting levels to flows and directly yields the water level forecasts that are normally required for flood forecasting and warning.
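The separate-but-concurrent idea can be sketched with a scalar recursive estimator for a single time-varying gain, modeled as a random walk, that runs alongside the state filter rather than being appended to its state vector. The class `AdaptiveGain`, the multiplicative gain structure, and the noise variance ratio `nvr` are assumptions for illustration, not the specific adaptive parameterization used in the paper.

```python
class AdaptiveGain:
    """Recursive estimate of a slowly varying gain g in y_k = g_k * m_k + e_k.

    m_k is the state filter's model prediction and y_k the new observation.
    The gain follows a random walk; p and nvr are normalised by the
    observation noise variance, as in standard recursive least squares.
    """

    def __init__(self, g0: float = 1.0, p0: float = 1.0, nvr: float = 1e-3) -> None:
        self.g = g0     # current gain estimate
        self.p = p0     # normalised variance of the gain estimate
        self.nvr = nvr  # noise variance ratio of the random walk (hypothetical)

    def update(self, y: float, m: float) -> float:
        # Prediction: a random walk leaves g unchanged but inflates its variance
        self.p += self.nvr
        # Correction: scalar Kalman (RLS) gain for the regressor m
        k = self.p * m / (1.0 + m * m * self.p)
        self.g += k * (y - self.g * m)
        self.p -= k * m * self.p
        return self.g
```

At each step the state filter would supply its prediction `m`, after which this estimator revises the gain using the new observation. In this sketch neither recursion needs the other's covariance, which keeps both estimation problems small.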

[10] The methodology used in this study is presented in the next section. It exploits the top-down DBM approach to the stochastic-dynamic modeling of environmental processes, concentrating on the identification and estimation of those physically interpretable, “dominant modes” of dynamic behavior that are most important for flood prediction [e.g., *Young*, 2002a, 2002b]. In particular, the hydrological processes active in the catchment are modeled using the SDP method to estimate the location and nature of significant nonlinearities in the system (here, the effective rainfall input nonlinearity), together with a stochastic transfer function (STF) method that characterizes both the linear effective rainfall–water level behavior and the water level routing processes. The complete model consists of these linear and nonlinear stochastic-dynamic elements, connected in a manner that represents the physical structure of the Severn River catchment.
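A single water level routing element of the kind described above can be sketched as a first-order discrete-time transfer function with a pure time delay, y_k = b0 z^(-delay) / (1 + a1 z^(-1)) u_k. The coefficients `a1`, `b0`, and `delay` below are hypothetical; in the DBM approach the model order, delay, and parameters would be identified and estimated statistically from the data. The series are also assumed here to be measured as deviations from a base level.

```python
def stf_route(upstream, a1, b0, delay):
    """First-order transfer function with pure delay (illustrative sketch).

    Implements y_k = -a1 * y_{k-1} + b0 * u_{k-delay}, starting from rest.
    The steady-state gain is b0 / (1 + a1).
    """
    y, prev = [], 0.0
    for k in range(len(upstream)):
        u = upstream[k - delay] if k >= delay else 0.0
        prev = -a1 * prev + b0 * u
        y.append(prev)
    return y
```

Because the delay is explicit, the downstream forecast `delay` steps ahead depends only on upstream levels already observed, which is what gives routing-based forecasts their lead time.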

[11] The adaptive forecasting system uses a state space form of the complete catchment model, including allowance for heteroscedastic errors, as the basis for data assimilation and forecasting with a standard Kalman filter forecasting engine. Both the predicted model states (water levels, including the inferred fast and slow components: see section 2.2) and the adaptive parameters, which allow for unpredictable changes in catchment behavior and for state-dependent heteroscedasticity (changing variance) of the model residuals, are updated recursively in response to input data received in real time from remote sensors in the catchment. This methodology may also be used when backwater effects (e.g., from tides) are not negligible. Such effects can be treated as a second (downstream) input to the STF model, so that a multi-input, single-output (MISO) model can be incorporated within the same methodological framework. This extension was not necessary in the present case study, but it is currently under investigation in other applications.
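One simple way to let a filter respond to state-dependent heteroscedasticity is to make the observation noise variance a function of the predicted level. The power-law form `sigma2 * |x|**gamma` and its parameter values below are hypothetical; in the adaptive system described above, the variance model would itself be updated recursively from the residuals.

```python
def hetero_update(x_pred, p_pred, y, sigma2=0.01, gamma=2.0):
    """Scalar Kalman update with a level-dependent observation variance.

    Hypothetical variance model r_k = sigma2 * |x_pred|**gamma: at higher levels
    the residuals are assumed noisier, so the filter trusts the observation less.
    Returns the updated state estimate and its variance.
    """
    r = sigma2 * abs(x_pred) ** gamma + 1e-12  # small floor avoids a zero variance
    k = p_pred / (p_pred + r)  # Kalman gain shrinks as r grows
    return x_pred + k * (y - x_pred), (1.0 - k) * p_pred
```

With the same innovation, a prediction at a high level is corrected less than one at a low level, and its confidence bounds stay wider.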