Historic Flood Reconstruction With the Use of an Artificial Neural Network

The uncertainty in flood frequency relations can be decreased by adding reconstructed historic flood events to the data set of measured annual maximum discharges. This study shows that an artificial neural network trained with a 1‐D/2‐D coupled hydraulic model is capable of reconstructing river floods with multiple dike breaches and inundations of the hinterland with high accuracy. The benefit of an artificial neural network is that it reduces computational times. With this network, the maximum discharge of the 1809 flood event of the Rhine River and its 95% confidence interval was reconstructed. The study shows that the trained artificial neural network is capable of reproducing the behavior of the hydraulic model correctly. The maximum discharge during the flood event was predicted with high accuracy even though the underlying input data are, due to the fact that the event occurred more than 200 years ago, uncertain. The confidence interval of the prediction was reduced by 43% compared to earlier predictions that did not use hydraulic models.


Introduction
River floods affect more people worldwide than any other natural hazard as they cause large economic damage and human casualties (Blöschl et al., 2017). To protect the hinterland from being inundated, flood defences are commonly designed such that they can withstand a flood event with a specific probability of occurrence. Most often, flood frequency analyses are used to establish such a discharge frequency relation. This relation is computed using measured discharges. The data set of measured discharges is generally limited because of the short time period that measurements have been performed. For example, in the Netherlands discharges have been measured since 1901, while the largest flood safety level along the river branches has a return period of 100,000 years (Van, 2016). As a result of the limited data set of measured discharges and because extreme events from preinstrumental records are not included, the discharge frequency relation has a large uncertainty interval. This specifically accounts for discharges corresponding to rare events where extrapolation is required. The uncertainty interval of the relation can be reduced by extending the data set of measured discharges with historic flood events (Bomers et al., 2019a).
Many studies have attempted to reconstruct historic flood events based on various sources (e.g., flood marks, flood deposits, and written records) to extend the data set of measured discharges. For instance, Herget and Meurs (2010) reconstructed maximum discharges of historic events near the city of Cologne. Historical information about occurred water levels was translated into discharges with the use of the empirical Manning's equation. Toonen et al. (2015) used grain-size measurements of flood deposits to reconstruct historic flood magnitudes of the last 450 years of the Lower Rhine. Recently, many papers study how information of historic flood events can be included in a flood frequency analysis (e.g., Frances et al., 1994;MacDonald et al., 2014;O'Connell et al., 2002;Reis and Stedinger, 2005;Sartor et al., 2010;Toonen et al., 2015). These studies showed that extending the data set with historic events significantly reduces uncertainty intervals of flood frequency relations. However, in most studies the uncertainty of the reconstructed historical discharges themselves were quite large. Reducing the uncertainty of these reconstructions has the potential to further decrease the uncertainty interval of flood frequency relations such that design discharges with large return periods can be predicted more accurately. We are specifically interested in reducing the uncertainty in the upper bound of the 95% confidence interval since this bound highly influences the design discharges corresponding to rare events as a result of extrapolation of data set of measured discharges. A way to decrease this uncertainty is to use hydraulic models.

10.1029/2019WR025656
Hydraulic models with a discharge wave as upstream boundary condition are commonly calibrated on measured water levels with the main channel friction as calibration parameter (Bomers et al., 2019b;Caviedes-Voullième et al., 2012;Domhof et al., 2018). Generally, for many historic flood events, one or multiple maximum water levels at several locations are known. However, corresponding discharges are often unknown. Therefore, it is not possible to calibrate the hydraulic model. This can be explained as follows: There is an infinite number of possible combinations of main channel friction and discharges that will result in correct prediction of water levels. A larger discharge can be compensated by a lower main channel friction and vice versa, resulting in the same simulated water levels. Therefore, to determine the potential maximum discharge of a historic flood event with the use of a hydraulic model, many model runs (in the order of 1,000 to reach convergence in model output) must be performed to capture all possible main channel friction-discharge combinations. The use of a physical hydraulic model is unfeasible for this application. Therefore, the objective of this paper is to study whether a response surface surrogate (i.e., a data-driven model without any physical processes of the original system) can be used to predict maximum discharges of historic flood events of which only water levels are known at a few locations. Response surface surrogate models are considered since computational time for this category of models is most often in the order of seconds because the model consists of relatively simple mathematical functions without any physical interpretations.
In this study, the 1809 flood event of the Rhine River is used as a case study. This flood event resulted in multiple dike breaches and consequently to inundations of the hinterland. Simple linear regression models are not capable of reproducing the nonlinear behavior caused by dike breaches (Toonen, 2015). Therefore, we set up an artificial neural network (ANN) as response surface surrogate model. ANNs are probably the most successful type of surrogate model with a flexible mathematical structure that is capable of identifying complex nonlinear behavior between input and output (Dibike and Solomatine, 2001). Many hydrological studies have shown the applicability of ANNs for flood forecasting purposes (e.g., Campolo et al., 2003;Elsafi, 2014;Kerh and Lee, 2006;Laio et al., 2003;Lekkas et al., 2004). These studies have set up an ANN to predict discharges and/or water levels at a specific site based on information of upstream gauge stations. It was found that the developed ANNs are well capable of describing flood propagation, making them a suitable forecasting tool.
Recently, ANNs are set up more frequently for flood routing modeling. Kia et al. (2012) created flood maps of the Johor River basin, Malaysia. A hydrological model was used as a high-fidelity model having, among others, rainfall as input data. Peters et al. (2006) and Shrestha et al. (2005) set up an ANN to predict flood wave propagation in a river basin. A 1-D hydraulic model was used to create the training data including flood events larger than measured so far. They found that the use of an ANN reduces computational times significantly compared to the hydraulic models, making them applicable for real-time control.
Most of the presented studies used measurements or hydrological modeling to create the training data, whereas the studies that made use of hydraulic modeling only focused on in-channel flood conditions. However, the large historic flood events typically resulted in dike breaches and inundations of the hinterland. Therefore, we are interested to what extent an ANN is capable of reproducing the physical behavior of a flood event with multiple dike breaches. During the 1809 flood event, several ice jams were present in the studied area. These ice jams may influence the physical behavior of the discharge propagation. However, we assume that the effects of these ice jams on maximum discharges were negligible, which is discussed in detail in section 6.2.
The outline of the paper is as follows. First, the 1809 flood event and the steps taken to reconstruct the 1809 geometry are presented in section 2. Then, the high-fidelity model is given (section 3) as well as the setup of the ANN (section 4). The ANN is validated and used to reconstruct the maximum 1809 discharge in section 5. The paper ends with a discussion and main conclusions in section 6 and section 7, respectively.

Characteristics of the 1809 Flood Event
The 1809 flood event resulted in high water levels and inundations of the embanked areas in the Rhine delta. In total, 100,000 people were affected by the flood and around 275 people died (Driessen, 1994). From half December 1808 till 10 January 1809, the temperatures were far below 0 in Germany and the Netherlands  (Lintsen, 1993). Due to the extremely low temperatures, the Dutch Rhine River branches were frozen. Around 10 January, the temperature started to rise, and as a result, ice sheets started to move chaotically. This caused the formation of a large ice jam downstream of Arnhem ( Figure 2). Hence, the Nederrijn River was blocked, and a larger discharge started to flow toward the IJssel River, resulting in multiple dike breaches. The second flood event followed around 25 January. This resulted in even more dike breaches and inundations. We only focus on the first flood event which occurred around 10 January since this event corresponds with the highest water levels measured during this flood. Hence, this moment most probably also corresponds with the largest maximum discharge at Lobith (Toonen, 2015) which we mainly aim to predict. This discharge can then be used to extend the data set of annual maximum discharges.
The study area stretches from Emmerich to the Dutch cities of Zutphen, Rhenen, and Druten ( Figure 2). In this area, six dike breaches occurred in the period from 12 to 15 January, resulting in inundations of the hinterland (Figure 2). During the event, daily water level measurements were performed at four locations in the studied area, namely at Pannerden, Nijmegen, Arnhem, and Doesburg ( Figure 1). Only the water level measurements closest to Lobith, which are at Pannerden and Nijmegen, are used to determine the maximum discharge at Lobith. This is because the water levels at Arnhem and most probably also at Doesburg were affected by ice sheets in the Nederrijn River. Consequently, the hydraulic model is not able to correctly simulate the water levels at these two locations.

Reconstruction of the 1809 Geometry
The 1809 topography, river position and bathymetry, dimensions of river embankments, and land use are reconstructed from different sources. To reconstruct the topography, a collection of elevation measurements obtained between 1950 and 1965 is used (Atlasproducties, 1987). Because this period predates major anthropogenic changes such as land leveling for agricultural practices, these data can be used to approximate the early nineteenth century topography (Alkema and Middelkoop, 2005). The point data are converted into a digital elevation model (DEM) by simple point to raster conversion in ArcGIS (ESRI, 2016), with a relatively large output cell size of 500 × 500 m. This rather large cell size was chosen in order to avoid artifacts from interpolation and to smooth out possible inaccuracies in the historical measurement data.
For the reconstruction of the river position and bathymetry, the first edition of the Algemene Rivierkaart (literally translated as General River map, and also known as "Goudriaankaarten" after the maker) map series is used, which completely covers the large rivers in the Netherlands at a scale of 1:10,000. The maps of the Rhine River branches Waal, Nederrijn and IJssel were produced in the periods 1830-1832, 1833-1839, and 1840-1844, respectively (Van den Brink, 2002, which is before the onset of major river normalization in the midnineteenth century. The historical river planform geometry is obtained by first georeferencing the individual map sheets and subsequently tracing the river banks in ArcGIS ( Van der Meulen et al., 2018). In the period when the maps were created, water depth profiles were measured across the river at intervals of 1 km. These data accompanied the maps in separate registers (Boode, 1979) and were considered lost (Wierda and Zweerus, 1994), until recently (Van der Meulen et al., 2018). We copied the historical data, consisting of measured depths and distances between depths and assigned geographical locations to the measurement points by linking the data to their respective cross sections, which are indicated on the maps. The cross sections are interpolated using the cubic Hermite interpolation method, resulting in a DEM of the main channel of the various river branches. Since the cross sections had an interval of 1 km, the accuracy near the bifurcation points of the Rhine River branches was not sufficient to correctly simulate the discharge partitioning along the Dutch Rhine River branches. Therefore, the DEM at the locations of the bifurcation points is replaced by the 1926 DEM which was available from previous work (Bomers et al., 2019c).
To reconstruct the river embankments in 1809, the first edition of the Algemene Rivierkaart is used, complemented by the first edition of the Dutch Waterstaatskaart (literally translated as Water Management map). The Algemene Rivierkaart is used to reconstruct the locations of the river dikes as vector lines in ArcGIS. The first edition of the Waterstaatskaart was created in the late midnineteenth century and covers the entire Netherlands at a scale of 1:50,000 (Blauw, 2005;Heere and Storms, 2002). The map sheets that cover our study area were produced between 1871 and 1879. These maps locally provide information on dike heights, which were implemented in ArcGIS, after which we linearly interpolated between the heights along the dike lines.
In order to supply the model with historically accurate friction values, reconstructions of land use based on early (circa 1900) topographical maps of the Netherlands (Knol et al., 2004) are used. The largest changes to land use in the study area occurred in early historic times, when the entire area was taken into agricultural use, and in the twentieth century, when cities and road networks expanded at unprecedented scale. Therefore, the land use situation around 1900 can be applied to the situation in 1809. The land use classes provided in the maps were translated into Manning's roughness coefficients with the tables of Chow (1959). This translation is provided in Table 1.

Setup of the 1-D/2-D Coupled Model
A one dimensional-two dimensional (1-D/2-D) coupled model is used as a high-fidelity model, because such a simplified model has as advantage that computational time is decreased significantly compared to a fully 2-D model, while not many physical processes of the original system are lost (Bomers et al., 2019c). The 1-D/2-D coupled model is run to create the training data to set up the ANN. The geometry is schematized by 1-D nodes as much as possible to keep computational time minimal. Only at locations where a 2-D component (Rijnstrangen area, Figure 2) is required to correctly model the physical processes of the flood event is a 2-D grid used. This has as advantage that computational time can be relatively low compared to a fully 2-D model, while model accuracy remains sufficient. HEC-RAS (v. 5.0.3), developed by the Hydrologic Engineering Centre of the U.S. Army Corps of Engineers, is used for the 1-D/2-D flood modeling.
The main channel and its floodplains are captured by 1-D profiles since 1-D profiles are capable of accurately predicting flood wave propagation in case of in-channel flows and under normal flow conditions (Tayefi et al., 2007). These 1-D profiles are coupled with 1-D storage areas (Figure 2), representing the embanked areas (i.e., the areas that are not part of the river system and are protected by dikes against inundations). If the simulated water levels in the 1-D profiles exceed the dike crest levels, water starts to flow into the storage areas corresponding with inundations of the hinterland. Bomers et al. (2019d) showed that for the discharge range considered in this study, no significant overland flows are present. The water that leaves the river system is not capable of flowing back into the river at a downstream location. Hence, only a reduction of the maximum discharge in downstream direction as a result of overflow and/or dike breaches is found. This justifies the use of the 1-D storage areas and neglecting the overland flow patterns.
Only the Rijnstrangen area ( Figure 2) is discretized with a 2-D grid since it is important that the flow through the Rijnstrangen area is correctly simulated. This is because it influences the discharge at the Pannerdensch Canal and hence the simulated water levels at Pannerden. These water levels are used to determine the maximum discharge at Lobith during the flood event. The Rijnstrangen area is connected with the Lower Rhine by an inlet. An inlet refers to a dike section with a lower crest level compared to its surrounding dike sections. As a result of the lower crest level, water starts to flow into the Rijnstrangen area as soon as the water level exceeds the crest level of the inlet. For this specific case, the crest level of the inlet is equal to 10.78 m +NAP (Ploeger, 1992).
A discharge wave at Emmerich is used as upstream boundary condition. Normal depths are used as downstream boundary conditions at the Waal, Nederrijn, and IJssel Rivers ( Figure 2). Normal depths can be computed with the use of the Manning's equation which can be written as (Brunner, 2016): in which V represents the cross sectional averaged flow velocity (m/s), R the hydraulic radius (m) depending on the water depth, n the Manning's roughness coefficient (s∕m 1/3 ), and S f the slope of the energy grade line (-). Generally, the energy slope can be approximated by the slope of the main channel (Brunner, 2016). Because the flow velocity is computed at each time step during the simulation, the Manning's equation with an energy slope equal to the bed slope as input data produce a water level considered to be the normal depth in both the main channel and floodplains of the various river branches.
The reconstructed land use classifications (Table 1) are implemented in the model as Manning's roughness values with the use of the tables of Chow (1959). Only the main channel friction, which is commonly used as calibration parameter, is considered as a random input parameter to be used in a Monte Carlo framework (section 3.2).
During the 1809 flood event, six dike breaches occurred in the studied area (  Note. The locations of the dike breaches are provided in Figure 2 in which the most upstream dike breaches along the Lower Rhine and Waal River correspond with Number 1 and the downstream dike breaches with Number 2. is assumed that all dike breaches evolve to the level of the surrounding natural terrain within 3 hr (Dawson and Wilby, 2001). The settings of the dike breaches are provided in Table 2.
Because of the highly detailed available information about the 1809 geometry, uncertainties in the bathymetry reconstruction are neglected in the analysis. Maps providing the exact location of the river course resulted in no plan metric uncertainties. Only bed levels of the main channel and floodplains were uncertain caused by the interpolation of the measured cross sections (section 2.2). However, Bomers et al. (2019c) found that uncertainties in the floodplain bed levels and main channel bed levels only have a small effect on the maximum discharges throughout the model domain. This justifies to neglect these uncertainties.

Design of Experiment
To set up a surrogate model, a design of experiment must be defined that determines the procedure of generating the training data with a high-fidelity model. Designs of experiments employ different space-filling strategies such that the behavior of the underlying system over limited ranges of the input parameters is captured (Razavi et al., 2012a). In this study, six input parameters of the high-fidelity model are considered to be random since these were uncertain during the 1809 flood event. The uncertain input parameters are the upstream discharge wave and the main channel friction of each Rhine River branch (five in total, Figure 3b).
Emmerich, Germany, which is located upstream of Lobith is used as upstream boundary (Figure 3a). The range of the maximum discharge wave at Emmerich is based on the work of Toonen (2015), who has reconstructed the 1809 maximum discharge based on average water level measurements of surrounding sites. Toonen (2015) determined the correlation between water level measurements at Lobith and surrounding sites based on measured time series. This regression curve was then used to predict the water level at Lobith based on the measured water levels at Emmerich, Pannerden, and Nijmegen. Subsequently, the predicted water level at Lobith was translated into a maximum discharge during the 1809 flood event using a Q-h relation. The confidence interval of the maximum discharge was based on the variation in predicted water levels at Lobith. Toonen (2015) found a maximum discharge range at Lobith of between ∼10,400 and 13,450 m 3 /s. We use a range of 10,000-14,000 m 3 /s to ensure that the range at Emmerich is wide enough and captures the potential 1809 maximum value at Lobith. This can be evaluated during the first runs with the 1-D/2-D coupled model by assessing whether the measured maximum water levels are in the range of the simulated values. Besides, not only the peak value of the discharge wave is unknown but also its shape. Therefore, three different shapes are considered in the analysis. However, since the maximum discharge at Lobith is mostly influenced by the peak value at Emmerich, and not by the shape of the discharge wave, the discharge wave shape is not considered as an input parameter to set up the ANN (Figure 3a).
Since the exact discharge partitioning over the various Rhine River branches is unknown during the 1809 flood event, each river branch has its own main channel friction in the model (Figure 3b). In this manner, the discharge partitioning can vary in the hydraulic simulations to create the training data since a river branch with a low friction value will receive more discharge than a river with a high friction value. as function of varying discharges is still largely unknown. Therefore, this parameter is commonly used to calibrate hydraulic models. During calibration, the main channel friction is adapted such that simulated water levels are close to measurements (Domhof et al., 2018). As a result, the calibrated main channel friction values capture the following features: the physical friction of the main channel caused by, for example, dune growth (Paarlberg and Schielen, 2012), channel irregularity, and vegetation (Herget and Meurs, 2010); a model-generated friction caused by, for example, simplifications in the model setup (Warmink et al., 2013) and discretization of the model domain (Bomers et al., 2019b); and an artificial friction to compensate for errors in the remaining input parameter (Warmink et al., 2013). The main channel friction is thus treated as a garbage bin to capture both the physical phenomena and model errors (Domhof et al., 2018). As a result, the calibrated friction value does not describe the physical friction as encountered in the field anymore (Bomers et al., 2019b). Therefore, it is impossible to find a friction range that covers the total range of possible values for a calibrated model. Including an infinitely wide range in the analysis results in an infinitely wide range of the upstream maximum discharge. This is because a low discharge with a high main channel friction results in a similar simulated water level as a high discharge with a low main channel friction. Therefore, a 1-D/2-D coupled model calibrated with the 1995 flood event, available from previous work (Bomers et al., 2019d), is used to determine the friction range used. The range is set equal to the minimum and maximum Manning's friction values found after calibration. This range equals 0.010-0.145 s/m 1/3 and is used randomly for each river branch to generate the training data.
Only the above mentioned six input parameters (upstream discharge wave and main channel friction of the various Rhine River branches) are considered to be random since only these parameters were highly uncertain. Latin hypercube sampling is used as sampling method to create the set of model runs used to generate the training data. This sampling strategy has as advantage that it can easily consider a large number of input parameters without the need for extra simulations (Razavi et al., 2012a). The six random input parameters are divided into eight levels in which each level has an equal probability of occurrence of 12.5%, following the method of Bomers et al. (2019c). For each run, the values in each level can randomly be sampled, constraining that if a level is already sampled it cannot be sampled again (Saltelli et al., 2008). In total, 160 simulations with the high-fidelity model are performed such that the continuous distributions of the input BOMERS ET AL. 9679 parameters are sufficiently captured (section 5.2). The maximum discharge at Lobith and the maximum simulated water levels at Pannerden and Nijmegen are considered to be the output data. These water levels, Although many research has been done on the main channel friction due to river dune dynamics during flood events (e.g., Naqshband et al., 2014;Paarlberg et al., 2010;Warmink, 2014), the value of this parameter the maximum discharge at Lobith, and the main channel friction of each river branch are used to train the ANN (Figure 3a).

Surrogate Model
Many different response surface surrogate models exist (e.g., ANN, Support Vector Regression Machine, Regression models). In this study, an ANN is set up. ANNs are the most commonly used response surface surrogate models in environmental and water resources optimization problems (Razavi et al., 2012b). This is because they provide an appealing solution to the problem in complex systems since they can, theoretically, handle incomplete and noisy data (Dawson and Wilby, 2001). In addition, many studies have shown the applicability of ANNs for river flow and flood forecasting for in-channel flow conditions (e.g., Campolo et al., 1999;Dibike and Solomatine, 2001;Kia et al., 2012;Matta et al., 2018;Peters et al., 2006;Shrestha et al, 2005).
An ANN can be seen as a black-box model that is capable of identifying complex nonlinear relationships between input and output data sets (Dibike and Solomatine, 2001). ANNs can be described as a network of interconnected neurons/nodes. Each neuron has several inputs, coming from other neurons or from outside the network (i.e., the input parameters), and a number of outputs, which in turn represents input data of another neuron or results in the final output of the ANN (Dawson and Wilby, 2001) (Figure 4). Each neuron is connected to other neurons by direct communication links. The output of a neuron is based on the weighted sum of all its inputs according to an activation function (e.g., linear function, threshold function, Gaussian function, sigmoid function). The type of activation function used depends on the type of network and training algorithm employed (Dawson and Wilby, 2001).
The ANN is developed with the use of the MATLAB Neural Network Toolbox (Beale and Demuth, 2004), since this toolbox can fit multidimensional problems well, given consistent input data and enough neurons in the hidden layers. We set up a feed-forward network in which the connections between neurons flow in one direction: from an input layer through one or more hidden layers, to an output layer (Dawson and Wilby, 2001;Kia et al., 2012). The neurons in the hidden layer compute an output based on the weighted sum of all its inputs according to an activation function. The sigmoid activation function is often used in literature since it is relatively easy to compute and capable of introducing nonlinear behavior to the network (Dawson and Wilby, 2001;Hagan and Menhaj, 1994;Peters et al., 2006).
To set up an ANN, the number of hidden layers and neurons in each hidden layer of the ANN must be specified by the modeler. One hidden layer is used since this type of ANN is suitable for our purpose (Hornik et al., 1989;Leshno et al., 1993;Peters et al., 2006). The choice of the number of neurons is case specific since it depends on the complexity of the original system (Xiang et al., 2005) as well as on the training data availability (Razavi et al., 2012b). For example, if there are too few neurons in the hidden layers, the network may not be possible to describe the underlying function of the original system because it has not enough parameters to map all points in the training data. On the other hand, if there are too many neurons present in the hidden layer, the network may overfit the training data (Dawson and Wilby, 2001). Commonly, a trial and error procedure is used to determine the appropriate number of neurons in the hidden layer (Dawson and Wilby, 2001;Campolo et al., 1999;Coulibaly et al., 2000). In this study, the following procedure is applied. We started with an ANN with 20 neurons, which we simplified to an ANN with 10 neurons without loss of information. Subsequently, we decreased the number of neurons by steps of one. We found that an ANN with one hidden layer and two neurons (Figure 4) was sufficient to accurately predict model output based on the training data for this specific flood event. The validation results are presented in section 5.
We recall that the ANN is trained with the training data generated with the 1-D/2-D coupled model. A general problem with training ANNs is overfitting (Kia et al., 2012;Shrestha et al., 2005) which leads to the issue that the ANN fits the noise existing in the training data rather than the underlying function (Razavi et al., 2012b). In environmental and water resources modeling literature, two well-known approaches exist to avoid ANN overfitting: early stopping and Bayesian regularization (Dawson and Wilby, 2001;Razavi et al., 2012b). Both approaches are efficiently implemented in the MATLAB neural network toolbox (Beale and Demuth, 2004). The two approaches were tested, and it was found that the Bayesian regularization resulted a slightly more accurate ANN setup (i.e., a slightly higher Nash-Sutcliffe coefficient) compared to the early stopping approach. The regularization approach stops training according to the adaptive weight minimization (MacKay, 1992). The remaining samples, that is, the samples not used to train the ANN, are used as test data. This data set represents an independent measure of network performance after training.
To train the ANN, the main channel friction of the various river branches (five in total) and the measured maximum water levels at Pannerden and Nijmegen are used as input data, resulting in seven input nodes (Figures 3a and 4). The output that must be correctly predicted by the ANN represents the maximum discharge at Lobith, corresponding to one output layer (Figures 3a and 4). Since training the ANN multiple times will result in different ANN structures as a result of different starting conditions, 10 training trials are performed. Maximum discharges at Lobith predicted by the ANN are compared with the results of the high-fidelity model. We found that all ANNs were able to reproduce the behavior of the high-fidelity model with high accuracy. The root-mean-square error had a range of between 70 and 98 m 3 /s. This shows that the ANN structure as developed in this study is able to accurately reproduce flood wave propagation. The ANN with the lowest root-mean-square error is used to reconstruct the 1809 maximum discharge, following the method of Kerh and Lee (2006) and Razavi et al. (2012b).

Reconstruction Results
The 1809 reconstruction shows important differences compared to the present-day situation ( Figure 5). First, the topography has been restored to a situation preceding large-scale anthropogenic modifications to the terrain during the second half of the twentieth century. The reconstructed situation has no higher grounds caused by the highways and only has few pits resulting from clay and sand mining, whereas these features define the topography in the present situation ( Figure 5). Second, the river position and bathymetry have been restored to their seminatural state before river normalization, which in the Netherlands commenced around 1850. The historic river shows wider bends than the present river and has multiple within-channel bars ( Figure 5). Third, the embankments along the rivers have been restored to their nineteenth century dimensions. These are rather similar compared to the present-day embankments but locally less wide and up Finally, the friction of the land cover has been restored to a pre-twentieth century situation. Many locations in the reconstructed study area have friction values similar to present day. This is because these have been in use as grasslands since medieval times. Locally, friction values are significantly lower in the reconstruction, because these have been taken in use for residential purposes in the past century.

Validation of the ANN
We recall that the ANN was trained with varying main channel friction (representing the calibrated values of the hydraulic model) along the various river branches and maximum simulated water levels at Pannerden and Nijmegen as input data (Figure 3a). The maximum discharge at Lobith was used as output. The results during the training and testing procedures are provided in Figure 6. The accuracy of the model is evaluated by computing the Nash-Sutcliffe model efficiency coefficient (NSE) (Nash and Sutcliffe, 1970): where Q m is the modeled discharge by the ANN (m 3 /s), Q o the target discharge modeled by the hydraulic model (m 3 /s), and Q o the average of the target discharge (m 3 /s). A value of NSE equal to 1 corresponds with a perfect fit between the output of the ANN and the output of the hydraulic model, whereas a value of 0 indicates that the ANN output is as accurate as the mean discharge predicted by the hydraulic model. The results show that the ANN is well capable of predicting maximum discharges as a function of varying main channel friction and maximum water levels because of the high NSE values for both the training and test data sets ( Figure 6). Based on this finding, we have enough confidence in the accuracy of the ANN. Hence, the ANN can be used to reconstruct the maximum discharge at Lobith during the 1809 flood event.

Reconstruction of the 1809 Maximum Discharge
The objective of the ANN was to determine the maximum discharge at Lobith during the 1809 flood event. Two maximum measured water levels at surrounding sites (Pannerden and Nijmegen) are available (Figure 1). The corresponding discharge is uncertain as a result of uncertain main channel friction of the various river branches. The input main channel friction values represent the calibrated frictions and are variable since these are highly uncertain due to hydraulic model calibration (section 3.2). Therefore, a Monte Carlo analysis is performed with the ANN in which the two water levels are set to their maximum measured values and in which the five main channel friction values can vary within the continuous ranges of between 0.010 and 0.145 s/m 1/3 (Figure 3a). This range is equal to the range used to generate the training data. Hence, the ANN is not used outside its training values and therefore the ANN provides reliable results. In total, 10,000 runs are performed in less than a second. This number of runs was sufficient to obtain a proper approximation of the probability distribution function of the maximum discharge at Lobith. This  function is presented in Figure 7. The 95% confidence bound of the 1809 maximum discharge is equal to ∼10,920-12,050 m 3 /s, with an expected discharge of 11,270 m 3 /s. Toonen (2015) found an expected 1809 maximum discharge of approximately 11,820 m 3 /s, with a 95% confidence interval of between ∼10,400 and 13,450 m 3 /s. We can thus state that the proposed methodology reduces the 95% confidence interval with 63% compared to the method of Toonen (2015). Specifically, the upper 95% bound is reduced which is highly beneficial to reduce uncertainty in flood frequency relations.

Discussion
The proposed methodology can be applied to any kind of historic flood event caused by high rainfall intensities if sufficient information about the bathymetry and the event itself is available. The important input data are the course of the main channel, the locations of the dike breaches, and some records of maximum water levels. In this section, the following is discussed: the training data, the effects of ice jams on model results, the uncertainty in measured water levels, the applicability of an ANN, and the efforts of the proposed methodology in terms of time investment.

Training Data
The maximum upstream discharge functioned as an input parameter of the 1-D/2-D coupled model to create the training data ( Figure 3a). We found that the 95% confidence interval of the 1809 flood event with a range of between 10,920 and 12,050 m 3 /s falls within the range considered to create the training data. Therefore, we conclude that the chosen upstream discharge range, that is, 10,000-14,000 m 3 /s, was sufficiently large.
Furthermore, the main channel friction of each river branch was used as an independent input parameter to set up the ANN. Even though the Nederrijn and IJssel Rivers are located more than 15 km downstream of Lobith, we found that using their main channel friction values as input parameter of the ANN resulted in a more accurate ANN setup than if these two parameters were not considered in the setup. This indicates that the main channel friction values of these two downstream river branches still have an effect on the maximum discharge at Lobith caused by backwater curves. Therefore, we conclude that none of the input parameters are redundant and that all input parameters contribute to the accuracy of the ANN.
The range of the main channel friction was based on a calibrated model available from previous work (Bomers et al., 2019c). This range influences the confidence interval of the predicted 1809 discharge since a larger range will result in a larger confidence interval. To study the sensitivity of the 95% confidence interval to the input range of the main channel friction values, we have performed some additional runs with the hydraulic model in which a maximum main channel friction of 0.29 is used corresponding with a value which is a factor two larger than the considered maximum main channel friction (0.145 s/m 1/3 ) to create the training data. We found that increasing the main channel friction from 0.145 to 0.29 s/m 1/3 results in an increase of the maximum water level by 45 cm at Pannerden having a fixed upstream discharge. Therefore, we state that the simulated water levels are highly dependent on the main channel friction. Hence, the reconstructed 1809 flood events depends on the considered range. However, we also found that increasing the main channel friction to a value of 0.29 s/m 1/3 results in simulated water levels that are higher than measured, even for a maximum upstream discharge of only 10,000 m 3 /s. On the other hand, for a main channel friction of 0.145 s/m 1/3 , simulated water levels are below measurements for an upstream discharge of 10,000 m 3 /s and higher than measurements for an upstream discharge of 14,000 m 3 /s. Since the training data produces maximum water levels at Nijmegen and Pannerden that are in the range of measurements, we conclude that the main channel friction range is selected properly. However, if no earlier study is available providing insights in the range of potential maximum upstream discharges the problem of selecting an appropriate main channel friction range becomes more severe.

The Effects of Ice Jams on Model Results
This study shows that an ANN can be used to predict maximum discharges of historic flood events if measured water levels are available. However, it must be noted that the developed ANN is only as good as its high-fidelity hydraulic model. Hydraulic models are typically used to simulate flood wave propagation under normal flow conditions. Flood events caused by, for example, ice jams are difficult to simulate because of the complex physical processes. Furthermore, for historic flood events the exact locations, sizes, and behavior of the ice jams are generally unknown. Therefore, we recommend to only use the proposed methodology for flood events caused by high rainfall intensities or snow melt. During the 1809 flood event, an ice jam 10.1029/2019WR025656 was present along the Nederrijn River. As a consequence, the Nederrijn River was more or less blocked and more water started to flow toward the IJssel River. As a result, the high-fidelity model did not predict water levels at Arnhem (located along the Nederrijn River) and Doesburg (located along the IJssel River) in the correct range. Although we have no strong indication from literature, it might be that also the water levels at Pannerden and Nijmegen were affected to a certain extent by ice jams. If this is the case, the water levels could have been lower under normal flow conditions. Subsequently, the reconstructed maximum discharge at Lobith achieved in this study might have been overestimated. Therefore, we highlight that the 1809 flood event had a discharge at maximum equal to the reconstructed maximum discharge. For flood safety assessments, this is the most important information since it is of high importance that the uncertainty in the upper bound of the 95% confidence interval is reduced.
Not only downstream of Lobith were ice jams present but also upstream near Emmerich, Germany (Terfehr, 2008). However, according to literature, this ice jam fell apart quite fast. Flow pulses generated by ice jam melting is not considered in this study because we do not have evidence that such flow pulses were present during the flood event. Consequently, it might be that the discharge wave shape used as boundary condition is not correct since this has a typical shape of a normal flood event generated by a precipitation event.
However, the work of Bomers et al. (2019d) shows that the discharge wave shape only has a little effect on downstream maximum discharges. Hence, the reconstructed discharge at Lobith represents a reliable maximum number.

Uncertainty in the Measured Water Levels
In section 5.3, it was assumed that the measured water levels were not subject to any uncertainties. However, this is generally not the case during flood events. To study the influence of potential uncertainty in measured water levels, a variation of 50 cm is included in the analysis (i.e., the actual water levels at Nijmegen and Pannerden can be 25 cm lower or higher than measured). It is assumed that the uncertainty follows a normal distribution.
Including the uncertainty of measured water levels in the analysis results in a predicted maximum discharge of 11.345 m 3 /s and a 95% confidence interval of between ∼10.500 and 12.190 m 3 /s. The uncertainty in measurements results thus in an increase in the confidence interval by 560 m 3 /s. Even though the interval increases, it is still much lower compared to the findings of Toonen et al. (2015) who also did not include the effect of uncertain measurements in the analysis. This shows the robustness of the presented methodology and the applicability of the use of hydraulic models for accurate historic flood predictions.

Applicability of an ANN
In section 5.2, we found that the ANN with one hidden layer and two neurons is capable of reproducing the physical behavior of the hydraulic model with high accuracy. The simplicity of the ANN may indicate that the set up of an ANN is redundant and that a simpler method might have worked as well. To test this, three other types of surrogate models are set up, namely, a linear regression model, a Gaussian process regression model, and a Quadratic Support Vector Machine. For more information about the set up and applicability of these types of models, we refer to Chau et al. (2005), Han et al. (2007), Liong and Sivapragasam (2002), Raghavendra and Deka (2014), Rezaeianzadeh et al. (2014), Yu et al. (2006) and Wasimi and Kitanidis (1983).
All types of surrogate models are capable of reproducing the output of the hydraulic model with high accuracy (Table 3). However, the ANN performs best. Contrarily, the linear regression model is less capable of predicting the target behavior since the physical processes during the 1809 flood do not have a fully linear nature. This is mainly caused by the flow through the Ouderijnstrangen area and the various dike breaches along the Rhine River branches. Furthermore, the ANN is capable of predicting the maximum discharge at Lobith with the smallest 95% confidence interval (Table 3). This interval is at least 700 m 3 /s smaller compared to the computed intervals of the other surrogate models. Specifically, the upper bound is lower which is highly beneficial for flood frequency analyses.

Geometry Reconstruction and Hydraulic Modeling Efforts
The use of a 2-D hydraulic model to create the training data is quite time consuming. Therefore, a 1-D/2-D coupled model was used in which a single simulation was typically in the order of 30 min. However, collecting the data required to perform the geometry reconstruction and performing the reconstruction itself is quite time consuming especially for older events for which the available information becomes even scarcer. Sheffer et al. (2003) found for the Ardèche River, France, that flood events are generally clustered in time. Note. NSE represents the Nash-Sutcliffe model efficiency coefficient and RMSE the root-mean-square error, Q av is the maximum average discharge predicted, and 2.5% and 97.% represent the lower and upper bounds of the 95% confidence interval, respectively.
We advise to reconstruct such clustered flood events (e.g., the Rhine River floods in 1496 and 1497) since then the bathymetry reconstruction only has to be performed once while it can be used to reconstruct multiple historic floods. Furthermore, we advise to use the proposed method solely for flood events capturing highly complex physical processes caused by, for example, dike breaches. For these situations, the use of simpler methods such as regression functions is less suitable.

Conclusions
An ANN was set up to reconstruct the maximum discharge of the 1809 flood event. Training data were created with a 1-D/2-D coupled model that is capable of correctly simulating the complex physical processes during a flood event. It was found that an ANN with one hidden layer and two neurons is capable of reproducing the input-output relations of the 1-D/2-D coupled model with high accuracy. The ANN was capable of predicting a flood event with multiple dike breaches resulting in inundations of the hinterland with high accuracy. Therefore, this surrogate model was used to perform a Monte Carlo analysis with varying input data to find the 1809 maximum discharge and its 95% confidence interval.
The predicted 1809 flood event had an expected maximum discharge of 11,270 m 3 /s with a 95% confidence interval of between ∼10,920 and 12,050 m 3 /s. This study showed that the range of this confidence interval has significantly been reduced by applying the proposed method compared to methods that did not use hydraulic models to reconstruct historic floods. Therefore, it can be concluded that the use of hydraulic models, and a trained ANN based on such a hydraulic model, can reduce the uncertainty of historic flood reconstructions since these models are capable of accurately simulating the complex physical processes of flood events. If these reconstructions are used to extend the data set of measured discharges, also the flood frequency analyses can be performed with less uncertainty (Bomers et al., 2019a).