Bathymetry Inversion using a Deep-Learning-Based Surrogate for Shallow Water Equations Solvers

River bathymetry is critical for many aspects of water resources management. We propose and demonstrate a bathymetry inversion method using a deep-learning-based surrogate for shallow water equations solvers. The surrogate uses a convolutional autoencoder with a shared-encoder, separate-decoder architecture: it encodes the input bathymetry and decodes to separate outputs for the flow-field variables. A gradient-based optimizer then performs bathymetry inversion with the trained surrogate. Two physically-based constraints, on bed elevation value and on bed slope, must be added to the inversion loss as regularizations to obtain usable inversion results. A heuristic approach based on the "L-curve" criterion is proposed to determine the regularization parameters. Both the surrogate model and the inversion algorithm show good performance. We found that the bathymetry inversion process has two distinctive stages, resembling a sculptural process of initial broad-brush carving followed by final detailing. The inversion loss due to flow prediction error reaches its minimum in the first stage and remains almost constant afterward. The bed elevation value and slope regularizations play the dominant role in the second stage in selecting the most probable solution. We also found that the surrogate architecture (whether with both velocity and water surface elevation as outputs, or velocity only) does not have a significant impact on the inversion result.


Introduction
River bathymetry is critical for many aspects of water resources management, e.g., infrastructure planning, flood prevention, navigation and transportation through waterways, and river restoration, among many others. The diverse and often inspirational water features seen in rivers, from breath-taking rapids and waterfalls to lowland meanders, are mainly controlled by the underlying bathymetry (Robert, 2003; Garcia, 2008; Julian, 2018). Many measurement techniques, ranging from as simple as a rod to acoustic sounding, and more advanced technologies such as close-range photogrammetry and remote sensing, have been used to obtain bathymetric information at a point or over an area (Lane & Chandler, 2003; Chen et al., 2019; Collins et al., 2020). However, despite these advancements, there are still technological, economic, and logistical barriers to using bathymetry measurement and sensing to obtain high-resolution data over wide spatial and temporal ranges (Fonstad & Marcus, 2005; Casas et al., 2006; Lee et al., 2018).
manuscript submitted to Water Resources Research

Alternatively, indirect methods, such as bathymetry inversion from surface velocity, are attractive and have gained substantial interest (Fonstad & Marcus, 2005; Andreadis et al., 2007; Wilson & Özkan-Haller, 2012; Lee et al., 2018; Almeida et al., 2018; Ghorbanidehno et al., 2021). This work presents one such method, which uses a fast deep-learning-based surrogate model, in place of often slow physics-based models (PBMs), to accelerate the inversion process. More importantly, we employ a highly efficient inversion procedure based on gradient descent, automatic differentiation, and backpropagation. To enable backpropagation, we exploit the fact that the neural network surrogate is already differentiable, in a similar spirit to the recently proposed differentiable parameter learning (dPL) framework (Tsai et al., 2021). The gradient obtained in such an automatic and fast fashion is then ingested by an inversion optimizer.
Bathymetry inversion has been reported in several previous studies with very encouraging results. Like most inversion problems, bathymetry inversion is ill-posed, which is characterized by the questions of existence, non-uniqueness, and stability in response to perturbations in measurement data. Previous studies have explored different approaches to obtain usable solutions and deal with this ill-posed nature. Broadly speaking, previous works adopted either the deterministic approach with least squares (Almeida et al., 2018) or the stochastic approach, for example Bayesian estimation (Lee et al., 2018). Both approaches have their strengths and weaknesses. Briefly, the stochastic approach produces a probability distribution of solutions rather than the single optimal solution of the deterministic approach, and it does not need the evaluation of gradients as the deterministic approach does (Ghorbanidehno et al., 2020). However, it does require reliable prior knowledge or belief about the bathymetry.
Otherwise, it is prone to producing biased models. In general, the stochastic approach is computationally more expensive than the deterministic approach, especially for inversion in high-dimensional parameter spaces, which is the case for bathymetry inversion (Aster et al., 2013). Methodologically, these approaches have been used to solve inversion problems in many closely related fields such as hydrogeology, seismology, and fluid mechanics (Laloy et al., 2017; Zhu et al., 2021; Zhou & Tartakovsky, 2021). This work adopts the deterministic approach to invert bathymetry from flow information, which is increasingly available using acoustic, radar, and image-based technologies at high resolution and low cost. For example, surface velocity can be obtained using Lagrangian drifters, large-scale particle image velocimetry (PIV), and synthetic aperture radar (SAR) (Bradley et al., 2002; Muste et al., 2008; Lewis & Rhoads, 2015; Steissberg et al., 2005; Biondi et al., 2020).
The feasibility of bathymetry inference from surface flow information is based on the assumption that there is a strong causal relation between the two. This assumption is generally valid, especially in shallow flows where the fluid motion is dominantly horizontal and the vertical velocity is relatively small. River flow dynamics under this condition can be described reasonably well with the two-dimensional (2D) shallow water equations (SWEs), a special case of hyperbolic partial differential equations (PDEs) which approximates the full three-dimensional motion of fluid (Liu et al., 2008). In this work, a 2D SWEs numerical solver is used to generate training data and perform inversion. Two-dimensional SWEs and their variants are the backbone of many models for hydrological and hydraulic predictions. These models typically use traditional methods such as the finite volume method (FVM) to discretize the governing equations on a mesh.
Deep learning (DL), especially deep neural networks (DNNs), is one of the most popular ways to build surrogate models due to its capability to capture high-dimensional nonlinearity (LeCun et al., 2015; Shen, 2018). Its popularity is also partially due to its ability to perform automatic differentiation: the gradient of an objective function defined on a neural network, with respect to the input, is readily available and has accuracy up to machine precision. Alternative ways to obtain the gradient require numerical approximation, symbolic differentiation, or analytical derivation. For example, Almeida et al. (2018) used the variational inversion method, where the gradient is calculated with both forward and adjoint solutions. Although efficient, the adjoint-based approach involves substantial mathematical derivation. It is also intrusive to a PBM's source code because the adjoint problem requires solving a new set of equations. Thus, the use of automatic differentiation to obtain gradients is appealing and has been swiftly adopted by many researchers for inversion problems (Ren et al., 2020; Xu & Darve, 2020; Zhu et al., 2021).
Recently, Tsai et al. (2021) proposed a novel differentiable parameter learning framework to integrate big-data DL and differentiable PBMs for parameter estimation. Here, differentiable PBMs refer to models implemented directly in machine learning platforms such as TensorFlow or PyTorch, or to their differentiable surrogates.
In this work, the USBR SRH-2D solver was selected as the PBM and its surrogate was built as an autoencoder using a convolutional neural network (CNN). Specifically, a shared-encoder, separate-decoder architecture is used, where the input image of bathymetry is encoded and then decoded to three outputs, namely, the two flow velocity components and the water surface elevation (WSE). A similar structure has been used in Guo et al. (2016).

This work is different from previous studies in the following aspects: (1) Our inversion is built upon a CNN-based surrogate and the differentiable parameter learning idea. Previous bathymetry inversion works, which showed very encouraging results, used different approaches such as variational inversion in Almeida et al. (2018), the principal component geostatistical approach (PCGA) in Lee et al. (2018), and a fully connected DNN in Ghorbanidehno et al. (2021). In fact, the approach in Ghorbanidehno et al. (2021) directly maps surface velocity to bathymetry, i.e., there is no need for inversion. They also subdivided the domain into small segments, and bathymetry inference was performed for each segment sequentially. In our work, the bathymetry inversion for the whole domain is performed in a "one-shot" fashion. The PCGA approach in Lee et al. (2018) requires the transformation of the irregular domain into a rectangular box. Although our example is also on a simple rectangular domain, the input and output images used by our surrogate model can embed a river of any shape. (2) We use the gradient generated from automatic differentiation. As will be shown, we examined in detail the inversion process from the initial guess to the final converged bathymetry, which yielded valuable insights on two distinctive stages of the inversion process. (3) To overcome the ill-posedness of the inversion problem, we found two necessary physical constraints, on bed elevation value and slope, which have to be embedded in the inversion loss function as two separate regularizations. A heuristic approach is proposed to determine the regularization hyperparameters. (4) We also investigated the effects of the autoencoder architecture of the surrogate model on bathymetry inversion.
The rest of the paper is organized as follows. The methodology, including the surrogate model architecture, inversion algorithm, and training data generation, is introduced first. Then the results on the performance of both the surrogate model and the inversion algorithm are presented and discussed. The paper concludes with a summary of findings and future work.

Methodology
This section describes the deep-learning-based surrogate model and the inversion algorithm, followed by an introduction to the PBM solver and the training data generation process.

Deep-learning-based surrogate model architecture
A convolutional autoencoder is used to construct the surrogate model. It consists of two parts: an encoder and a decoder. Figure 1 shows the architecture of the CNN-based surrogate model developed in this work. A similar structure has been used in Guo et al. (2016) for steady-state, laminar Navier-Stokes equations, and in Forghani et al. (2021) and Song et al. (2021) for the 2D SWEs. The main differences between our surrogate model and that in Forghani et al. (2021) (named AE in their work) are in the detailed architecture. In addition, we used multiple output branches for the different flow-field variables (velocity components and water surface elevation) to investigate the effects of the surrogate architecture on inversion.
The encoder takes the bathymetry image, z_b, as the input and gradually extracts bathymetric features through several stacked convolution layers and a fully connected layer at the end. The output values of the fully connected layer form the encoded feature vector h, or simply the code, in a latent space that compactly represents the input data. The encoder can be denoted as

h = ConvNN(z_b),

where ConvNN(·) is the convolutional encoding operation which maps the z_b image to the h vector.
The decoder is the reverse operation of the encoder. In the original design of an autoencoder, the output of the decoder should approximately reproduce the original input. However, in the framework of a surrogate model for SWEs solvers, the output of the decoder is the flow field corresponding to the input bathymetry. Indeed, the decoder has three outputs, i.e., u, v, and WSE, where u and v are the velocity components. A CNN model constructed in this manner has a shared-encoder, separate-decoder architecture (Guo et al., 2016). The decoder can be denoted as

(u, v, WSE) = DeconvNN(h),

where DeconvNN(·) is the transpose convolution operation (often incorrectly called deconvolution, hence the name in the notation), which reconstructs the flow field from the encoded feature vector h.
To train the surrogate model, a proper loss function needs to be defined. The goal is to train the surrogate model to make accurate predictions of the flow field. Here, the loss function is constructed to minimize the mean squared error (MSE) of all components of the predicted flow field, i.e., û, v̂, and ŴSE. The hat denotes a predicted variable; it will also be used to denote inverted variables in this work. The loss function based on the error in flow prediction can then be written as

L_prediction = (1 / (N_batch M_sample)) Σ_{n=1}^{N_batch} Σ_{m=1}^{M_sample} ( ||û_{n,m} − u_{n,m}||² + ||v̂_{n,m} − v_{n,m}||² + ||ŴSE_{n,m} − WSE_{n,m}||² ),

where u, v, and WSE are the flow data (ground truth) from SRH-2D simulations, N_batch is the number of batches, M_sample is the sample size in each batch, and n and m are the indices for batch and sample, respectively. A stochastic gradient descent optimizer, Adam, was used to update the weights and biases of the neural network (Kingma & Ba, 2014).
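As a minimal sketch, the per-sample form of this loss can be computed as follows (the function and array names are illustrative, not from the paper's code; training additionally averages over batches and samples):

```python
import numpy as np

def prediction_loss(preds, truths):
    """Flow prediction loss for one sample: mean squared error of each
    predicted field (u, v, WSE) against the SRH-2D ground truth,
    summed over the three fields."""
    return float(sum(np.mean((p - t) ** 2) for p, t in zip(preds, truths)))
```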
To alleviate the vanishing gradient problem, the Rectified Linear Unit (ReLU) activation function was used (Sekar et al., 2019). For the final output layer of the decoder, a linear activation function is used to accommodate negative flow-variable values.
The training of the surrogate model also involved hyperparameter tuning. The important hyperparameters for the encoder and decoder include the number of convolution/transpose convolution layers and, within each layer, the number of feature maps, the filter size (width and height), strides, and padding options. The length of the encoded feature vector is also a key parameter, which determines the accuracy of the latent-space representation of an input bathymetry image. These hyperparameters need to be tuned to achieve the best results. Detailed explanations of these hyperparameters and their effects are omitted here for brevity; more details can be found in many deep learning textbooks, e.g., Goodfellow et al. (2016).

Inversion algorithm
The trained surrogate model was used for the inversion, i.e., to find the bathymetry given a flow field. For convenience, the surrogate model can be written as

d = G(z_b) + ε,

where G represents the forward model or physical simulator, d = (u, v, WSE) is the flow-field data, and ε is the error due to numerical approximation and/or measurement inaccuracy. The inversion problem is then to find the inverse operator G⁻¹, which is often very challenging if the problem is ill-posed, nonlinear, and high-dimensional. A key strategy to stabilize the inversion is to impose additional constraints, generally referred to as regularization. Among many options, Tikhonov regularization is commonly used, which adds an additional constraint on the inverted variable. Many previous works have used the zeroth-order Tikhonov regularization, where the additional constraint is to minimize the norm of z_b. However, we will show that the zeroth-order Tikhonov regularization by itself is not enough for bathymetry inversion. In this work, we propose two new physically-based regularizations: one for the bed elevation value (in fact a modified zeroth-order Tikhonov regularization) and one for the bed slope (a first-order Tikhonov regularization).
The total inversion loss function has the form

L_total = L_prediction + α_value L_value + α_slope L_slope,

where L_prediction is the flow prediction error defined in Eq. 3, L_value is the loss due to the inverted ẑ_b values going beyond a prescribed range, and similarly L_slope is the loss due to the slope of the inverted bed exceeding an upper limit. For a specific river, the maximum bed slope is mostly determined by its sediment characteristics. α_value and α_slope are the regularization factors for their corresponding losses. Like other hyperparameters in the model, these two parameters are problem-specific and need to be tuned. For inverse problems, such parameters can be determined with methods such as the L-curve criterion. Their determination and effects will be shown in the results section.
Essentially, the value loss L_value and the slope loss L_slope discourage the inversion algorithm from searching for solutions outside the prescribed bounds for bed elevation and slope. These kinds of losses are commonly used in inversion problems for distributed parameters to find the most probable solution that reasonably matches the observation data. The slope loss here penalizes the total variation of bed elevation and is also referred to as the maximization of entropy. From the physics point of view, the slope of a sediment bed is bounded by the angle of repose, exceeding which will cause sand slides (Song et al., 2020). To embed this prior information in the inversion process, the value and slope losses are calculated with a double-bounded ReLU (dbReLU) function (Ren et al., 2020), which has the form

dbReLU(x) = max(0, x − (x_c + a)) + max(0, (x_c − a) − x),

where x is the input to the function, x_c is the center of x, and a is a constant which controls the range of x over which the loss is zero. A schematic diagram of the function is shown in Fig. 2. With this loss function, the loss is zero within the range [x_c − a, x_c + a], i.e., the zero loss is bounded by the two ends of the range, hence the name. Away from the two end points of this range, the loss increases linearly. With the above definition, the losses due to value and slope can be written as

L_value = (1 / N_points) Σ_{i=1}^{N_points} dbReLU(ẑ_{b,i}),
L_slope = (1 / N_points) Σ_{i=1}^{N_points} [ dbReLU(Ŝ_{x,i}) + dbReLU(Ŝ_{y,i}) ],

where N_points is the number of points in the input bathymetry array, and Ŝ_x and Ŝ_y are the slopes of the inverted bed in the x and y directions, respectively. The method for bathymetry inversion proposed in this work involves two main steps.
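The dbReLU penalty and the two regularization losses can be sketched in NumPy as follows (the bound centers and half-widths follow the example setup in this paper; computing the slopes with central finite differences is an assumption of this sketch, not necessarily the paper's discretization):

```python
import numpy as np

def db_relu(x, x_c, a):
    # Double-bounded ReLU: zero on [x_c - a, x_c + a], growing linearly outside.
    return np.maximum(0.0, x - (x_c + a)) + np.maximum(0.0, (x_c - a) - x)

def regularization_losses(z_b, dx=1.0, dy=1.0):
    """Value and slope losses: mean dbReLU penalty on the inverted bed
    elevations and on their x/y slopes. Bounds follow the paper's example:
    z_b in [-0.5, 0.5], S_x in [-0.08, 0.08], S_y in [-0.15, 0.15]."""
    l_value = np.mean(db_relu(z_b, 0.0, 0.5))
    s_y, s_x = np.gradient(z_b, dy, dx)  # slopes along axis 0 (y) and axis 1 (x)
    l_slope = np.mean(db_relu(s_x, 0.0, 0.08) + db_relu(s_y, 0.0, 0.15))
    return l_value, l_slope
```

A flat bed inside the bounds incurs zero loss; the penalties grow linearly once elevations or slopes leave the prescribed ranges.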
The first step is to train the CNN-based surrogate model, and the second is to perform the inversion. For the first step, the neural network is trained on a given set of data, made of input/output pairs of the SWEs simulator. The training data generation is described in the next section. The first step results in a surrogate model G.
In the second step, for any arbitrary input z_b, the surrogate model produces an output (û, v̂, ŴSE) and the inversion loss defined in Eq. 5 can be calculated. In the framework of a neural network, and due to its capability to perform automatic differentiation, the total inversion loss L_total is differentiable with respect to the input z_b. The gradient ∂L_total/∂z_b can then be used in a simple iterative scheme to invert the bathymetry, which can be written as

ẑ_b^{i+1} = ẑ_b^i − α_inversion ∂L_total/∂z_b |_{z_b = ẑ_b^i},

where i is the iteration number and α_inversion is the inversion step size (learning rate).
More advanced iterative schemes, such as the Gauss-Newton and Levenberg-Marquardt methods, can be used in future work to further take advantage of the gradient information. Equation 9 is essentially a gradient descent optimization to minimize the total inversion loss L_total. It is noted that during the inversion process, the parameters of the surrogate neural network are frozen (not trainable). The only trainable parameters are the bed elevations ẑ_b. To start the iterations, an initial guess of the bathymetry, ẑ_b^0, is needed. The initial guesses can be samples drawn from a prior distribution, each resulting in an inverted bathymetry. The ensemble can be used to assess the stability of the inversion and to calculate a mean inverted bathymetry.
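The frozen-surrogate gradient-descent update can be sketched with TensorFlow's automatic differentiation. This is a minimal version using only the prediction loss (the regularization terms of Eq. 5 are omitted); `surrogate` stands for the trained model, or any differentiable callable, and the function name is illustrative:

```python
import tensorflow as tf

def invert_bathymetry(surrogate, flow_obs, z_b_init, alpha=0.01, n_steps=1000):
    """Iterative inversion (Eq. 9 sketch): the surrogate's weights stay
    frozen; only the bed elevations z_b are updated by gradient descent."""
    z_b = tf.Variable(z_b_init, dtype=tf.float32)
    for _ in range(n_steps):
        with tf.GradientTape() as tape:
            flow_pred = surrogate(z_b, training=False)
            loss = tf.reduce_mean(tf.square(flow_pred - flow_obs))
        # Automatic differentiation gives dL/dz_b without an adjoint solver.
        grad = tape.gradient(loss, z_b)
        z_b.assign_sub(alpha * grad)
    return z_b.numpy()
```

With a differentiable stand-in forward model, the loop recovers the input that produced the observed output, mirroring the paper's "one-shot" whole-domain inversion.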

SWEs solver
The training data for the surrogate model was generated with the 2D SWEs solver SRH-2D, a popular 2D hydraulics model (Lai, 2008). More details about the turbulence model and the physical meanings of all terms can be found in Rodi (1993) and Lai (2008).
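For reference, the depth-averaged 2D SWEs solved by models of this class take the standard form below (generic notation; SRH-2D's exact presentation, including its diffusion and turbulence terms, is given in Lai (2008)):

```latex
\frac{\partial h}{\partial t} + \frac{\partial (hu)}{\partial x} + \frac{\partial (hv)}{\partial y} = 0,
\qquad
\frac{\partial (hu)}{\partial t} + \frac{\partial (huu)}{\partial x} + \frac{\partial (huv)}{\partial y}
= -gh\,\frac{\partial z_s}{\partial x}
+ \frac{\partial (h T_{xx})}{\partial x} + \frac{\partial (h T_{xy})}{\partial y}
- \frac{\tau_{bx}}{\rho},
```

with an analogous y-momentum equation. Here h is the water depth, (u, v) are the depth-averaged velocities, z_s = z_b + h is the water surface elevation, T denotes the depth-averaged turbulent stresses, and τ_b is the bed shear stress.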

Training bathymetry generation
There are different ways to generate bathymetries for the training cases. Among them, the most popular uses a stochastic process which can embed some prior knowledge of the bed. In this work, the training bathymetries are randomly drawn from a 2D Gaussian process. To control the properties of the generated bathymetry, such as the overall bed elevation range and feature size, the parameters of the Gaussian process can be tuned.
Additionally, if ground-truth bed elevations at certain locations are available, they can be assimilated into the Gaussian process prior to the draw. Following Landon et al. (2014), this work uses the radial basis function (RBF) as the 2D Gaussian kernel, which has the form

k(x, x′) = σ_{z_b} exp( −Δx² / (2 l_x²) − Δy² / (2 l_y²) ),

where x and x′ are two arbitrary points in the domain; Δx and Δy are their distances in the x and y directions, respectively; σ_{z_b} is the prior variance; and l_x and l_y are the length scales in the x and y directions, respectively.
The parameters of the kernel should be set based on basic prior information about the bathymetry. For demonstration purposes, this work generated hypothetical bathymetries as training data. The bathymetries were first generated on a unit square. The prior variance σ_{z_b} has a value of 0.5, and the length scales l_x and l_y have values of 0.1 and 0.2, respectively. Then, the bathymetries on the unit square were mapped to a rectangular domain with a length of 25.6 m and a width of 6.4 m. These parameters were set such that the generated bed elevation is in the range of -0.5 m to 0.5 m and there are approximately one or two significant bed features (bedform-like) in the simulation domain. Three thousand bathymetries were generated in this work. Figure 3 shows four samples; flow is from left to right. For example, Sample 0 shows one elongated bar on the top and one hump at the outlet. There is also a deep pool near the outlet hump. Sample 0 will be referenced later as an example for some detailed analysis.
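This generation step can be sketched as follows (the grid size is hypothetical, and the kernel amplitude is taken as the prior variance σ_{z_b} directly, matching the text; in the paper the unit-square draw is then mapped to the 25.6 m × 6.4 m channel):

```python
import numpy as np

def rbf_kernel(X, sigma2_zb=0.5, lx=0.1, ly=0.2):
    """Anisotropic RBF kernel on 2D points X of shape (N, 2);
    sigma2_zb is the prior variance, lx/ly the x/y length scales."""
    dx = X[:, None, 0] - X[None, :, 0]
    dy = X[:, None, 1] - X[None, :, 1]
    return sigma2_zb * np.exp(-dx**2 / (2 * lx**2) - dy**2 / (2 * ly**2))

def sample_bathymetry(n=32, seed=0):
    """Draw one random bed on an n-by-n grid over the unit square."""
    rng = np.random.default_rng(seed)
    xs, ys = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
    X = np.column_stack([xs.ravel(), ys.ravel()])
    K = rbf_kernel(X) + 1e-8 * np.eye(n * n)  # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(n * n), K).reshape(n, n)
```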

Flow data generation, parameter tuning, and implementation
Three thousand simulation cases were run with the generated bathymetries, of which 80%, 15%, and 5% were used for training, validation, and testing, respectively.
All cases were run with the same conditions except the bathymetry. Steady flow in the rectangular channel was simulated. The inflow discharge was 3 m³/s, and the downstream boundary condition was identical across cases.

The performance of the surrogate model and the inversion depends strongly on the choice of hyperparameters. The optimal values for these hyperparameters were obtained through manual tuning, which is manageable because of the relative simplicity of the hypothetical cases in this work. Automatic and more intelligent hyperparameter optimization can be performed for more complex cases. The CNN-based surrogate model used in this work contains two convolution layers in the encoder and four transpose convolution layers in the decoder for each of the outputs. The fully connected layer for the code has a length of 1,024. Other parameters of the surrogate model can be found in Table 1.
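The stated configuration can be sketched in Keras as follows. The layer counts and code length follow the text, but the filter counts, kernel sizes, strides, and input resolution are illustrative assumptions (Table 1 is not reproduced here):

```python
import numpy as np
from tensorflow.keras import layers, Model

def build_surrogate(input_shape=(64, 256, 1), code_len=1024):
    # Shared encoder: two convolution layers, then a fully connected code.
    z_b = layers.Input(shape=input_shape, name="bathymetry")
    x = layers.Conv2D(32, 5, strides=2, padding="same", activation="relu")(z_b)
    x = layers.Conv2D(64, 5, strides=2, padding="same", activation="relu")(x)
    enc_shape = tuple(x.shape[1:])
    h = layers.Dense(code_len, activation="relu")(layers.Flatten()(x))  # latent code

    def decoder(name):
        # One decoder branch per flow variable: four transpose-convolution
        # layers; the last uses a linear activation to allow negative values.
        y = layers.Dense(int(np.prod(enc_shape)), activation="relu")(h)
        y = layers.Reshape(enc_shape)(y)
        y = layers.Conv2DTranspose(64, 5, strides=1, padding="same", activation="relu")(y)
        y = layers.Conv2DTranspose(32, 5, strides=2, padding="same", activation="relu")(y)
        y = layers.Conv2DTranspose(16, 5, strides=2, padding="same", activation="relu")(y)
        return layers.Conv2DTranspose(1, 5, strides=1, padding="same",
                                      activation="linear", name=name)(y)

    return Model(z_b, [decoder("u"), decoder("v"), decoder("WSE")], name="swe_surrogate")
```

The three branches share the encoder and code h but reconstruct u, v, and WSE independently, matching the shared-encoder, separate-decoder design.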
During the surrogate training process, the initial learning rate for updating the weights and biases was set at 10⁻³. A learning rate scheduler was used to reduce the learning rate by a factor of 0.5 when the validation loss did not decrease over 5 epochs. One hundred epochs with a mini-batch size of 10 were performed, until both training and validation losses were simultaneously minimized. For inversion, a fixed learning rate of 0.01 was used to perform 1,000 iterations, which proved sufficient to obtain converged solutions.
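This schedule corresponds directly to Keras's built-in plateau callback (the values follow the text; the `monitor` key assumes the standard Keras validation-loss name):

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate when the validation loss stalls for 5 epochs.
scheduler = ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5)
```

The callback would be passed to `model.fit(..., callbacks=[scheduler])` during training.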
The surrogate model and the inversion method were implemented in TensorFlow (Abadi et al., 2015) with the Keras API (Chollet et al., 2015). The training of the surrogate model and the inversion were performed on Amazon AWS with an NVIDIA K80 GPU.
Results and discussion

Performance of surrogate model
The premise of accurate inversion using a surrogate model is the accuracy of the surrogate model itself. In this work, the surrogate model was evaluated with three metrics: the MSE defined in Eq. 3, and the root-mean-square error (RMSE) of the absolute error and of the relative error. The relative error at a grid point is defined as the ratio of the absolute error to the maximum of the predicted and ground-truth values (to avoid division by zero). Taking u as an example, the RMSE of the absolute error e_{m,u} and of the relative error e_{r,u} for each case are defined as

e_{m,u} = sqrt( (1/N_points) Σ_{i=1}^{N_points} (û_i − u_i)² ),
e_{r,u} = sqrt( (1/N_points) Σ_{i=1}^{N_points} [ (û_i − u_i) / max(|û_i|, |u_i|) ]² ),

where N_points is the number of points in each test case. The mean, max, and standard deviation of the two errors over all test cases are then calculated.
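These two metrics can be computed per field as follows (an illustrative helper, with the zero-denominator case handled explicitly):

```python
import numpy as np

def rmse_metrics(u_pred, u_true):
    """RMSE of the absolute and relative errors over all grid points.
    The relative error divides by the larger of |prediction| and |truth|,
    per the paper's definition, to avoid division by zero."""
    err = u_pred - u_true
    e_m = np.sqrt(np.mean(err**2))
    denom = np.maximum(np.abs(u_pred), np.abs(u_true))
    denom[denom == 0] = 1.0  # both values zero -> pointwise error is zero anyway
    e_r = np.sqrt(np.mean((err / denom) ** 2))
    return e_m, e_r
```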
Both losses were driven to their minimums, and there was no significant over-fitting because the validation loss is close to the training loss. This shows that the architecture of the CNN surrogate and its hyperparameters are appropriate for the given dataset. The surrogate model proposed in this work is highly accurate and properly captures the input-output dynamics embedded in the SWEs solver. Figure 5 shows an example case comparing the results from SRH-2D and the surrogate. The bathymetry of this case is "Sample 0" in Fig. 3. The difference between the results of the PBM and the surrogate is small. For this particular case, the RMSEs of the absolute error for u, v, and WSE are 0.0076 m/s, 0.0066 m/s, and 0.0007 m, respectively.
The prediction error for WSE is smaller, in both absolute and relative senses, than the errors for the two velocity components. This is not surprising because the WSE field is typically much smoother and has less variation than the velocity field. In fact, because of the hydrostatic assumption in the SWEs, the water depth h, which directly controls WSE, is equivalent to the pressure in the Navier-Stokes equations. In the Navier-Stokes equations, the pressure is governed by an elliptic Poisson equation, which has a self-smoothing effect over time and space for proper initial and boundary conditions.
The relatively small prediction error in WSE may have implications for inversion. For a given bathymetry, it is easier for the surrogate model to make accurate WSE predictions than velocity predictions; in other words, the total error is dominated by the velocity errors. If this is true, then during inversion the WSE error does not contribute as much as the velocity error. It is therefore possible to drop WSE and use only the velocity components u and v for the inversion. This is good news because it means less information is needed to invert the bathymetry; in practice, it is difficult to obtain velocity and WSE at the same time. Although the CNN surrogate model has three output branches (u, v, and WSE), we do not need to use all of them for the inversion.
More discussion will be presented in the inversion result section.
The spatial distributions of the prediction errors shown in Figure 5 do not show any clear trend in where large errors are located. For example, locations of large prediction errors in the velocity components do not seem to coincide with either lows or highs of the bathymetry or of the velocity itself. These observations apply to all test cases. It may be beneficial in future research to investigate what drives the spatial distribution of the CNN's prediction error and whether it has implications for inversion.
The statistics of the flow prediction errors for all 150 test cases are shown in Table 2. The statistics include the mean, max, and standard deviation (std). The prediction errors for all test cases are small, except the relative error for the velocity component v, which has a mean value of about 39.6% and a maximum of about 61.6%. This high relative error should not cause any alarm because it is due to the small magnitude of v in the flow results; the absolute error for v is reassuringly very small. The accuracy of the surrogate model was further analyzed with the distribution of the L2 norm of the prediction error in Eqn. 4. The histogram and the 5th, 50th, and 95th percentiles of the L2 norms for all 150 test cases are shown in Fig. 6. The L2 norm of the surrogate error is used in the determination of the inversion loss parameters α_value and α_slope. Here, the L2 norm only has contributions from the u and v prediction errors because only velocity is typically used for inversion. From the figure, it is clear that the surrogate model error falls in a range (approximately from 3×10⁻¹ to 1.0) and differs from case to case. Its implication for inversion is discussed next.

Inversion parameter determination
The performance of the inversion also depends heavily on its parameters. The values of relevant inversion parameters are listed in Table 3. Some of the parameters have physical meanings and should reflect the specific problem setup. For example, the mean and amplitude of the bed elevation value and slope should be set with prior knowledge of the bathymetry. In this case, the normalized bed elevation should be in the range [-0.5, 0.5], and the bed slopes in the x and y directions should be in [-0.08, 0.08] and [-0.15, 0.15], respectively. In practice, this prior information has to be obtained by other means such as surveys or historical maps.
The determination of α_slope balances several requirements, including the following:
• minimize the scatter (uncertainty) in the prediction loss L_prediction. The figure shows that the scatter in the prediction loss L_prediction, measured by the horizontal size of the confidence ellipse, decreases as α_slope decreases. This is again as expected because the surrogate model performs best with no extra terms added to the loss function. Based on this requirement, we may again choose a small α_slope value such as 0.01.
• minimize the scatter (uncertainty) in the slope loss L_slope. The figure shows that the scatter in the slope loss L_slope, measured by the vertical size of the confidence ellipse, is largest when α_slope has a value of 0.01. For other values, the scatter is comparable.

Balancing all the above requirements, an α_slope value of 0.1 seems most reasonable. In addition, the prediction losses L_prediction corresponding to this α_slope value are within the 5th and 95th percentiles of the surrogate model prediction accuracy. Results beyond the range bounded by the 5th and 95th percentiles do not make sense: they either over- or under-weight the α_slope value and drive the surrogate model out of its accuracy range.
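One simple way to automate part of this balancing step is the maximum-distance proxy for the L-curve corner (a sketch under the assumption that one (L_prediction, L_slope) pair has been recorded per candidate α_slope; the paper's actual selection weighs the requirements above heuristically):

```python
import numpy as np

def l_curve_corner(l_pred, l_reg):
    """Index of the L-curve 'corner': the point with the largest
    perpendicular distance from the chord joining the endpoints in
    log-log space, a common geometric proxy for maximum curvature."""
    x, y = np.log10(np.asarray(l_pred)), np.log10(np.asarray(l_reg))
    d = np.array([x[-1] - x[0], y[-1] - y[0]])
    d = d / np.linalg.norm(d)
    v0, v1 = x - x[0], y - y[0]
    dist = np.abs(v0 * d[1] - v1 * d[0])  # |2D cross product| = perpendicular distance
    return int(np.argmax(dist))
```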

Inversion result evaluation
This section shows the bathymetry inversion results. As in many previous studies, the inversion discussed in this section used only the velocities u and v as input. From a practical point of view, this is reasonable because velocity data can be obtained relatively easily. The effect of including or excluding WSE in the inversion is discussed later.
As an example, Fig. 8 shows the inversion results for one of the cases. All inversion cases are in the test dataset, not in the training and validation datasets. Note that the truth bathymetry here is again "Sample 0" shown in Fig. 3. The top row of Fig. 8, from left to right, shows the z_b truth, the mean of the inverted z_b fields from all eleven initial bathymetries, and the difference between the truth and the mean inverted bed. The remaining rows of Fig. 8 show three individual examples of the inversion from three different initial bathymetries, including the one starting from an initial flat bed with random white noise.
The results show that all inversions for this particular case converged to similar, though subtly different, final bathymetries. The inverted bathymetries not only qualitatively, but also quantitatively, recover the truth. For the example shown in Fig. 8, all eleven inversions in the ensemble capture the elongated bar on the top and the hump at the outlet. In addition, they capture the deep pool near the outlet. Quantitatively, the RMSE of the inversion error is about 0.08 m for the mean and about 0.09 m for individual inversions. The slight improvement by the mean shows the benefit of using an ensemble, instead of a single inversion, given the uncertainties.
To further appreciate the efficacy and uncertainty of the inversion method proposed in this work, Fig. 9 shows the profiles of the inverted beds for the same case shown in Fig. 8. The light profiles are the results inverted from different initial beds.

Inversion process analysis
Some detailed analysis was performed on the inversion process to shed light on how the iterative inversion algorithm finds the solution. Figure 10 shows the inverted bathymetry at eight iteration steps for the same example case shown in Fig. 8. This inversion started from the initial flat bed with random noise. An animation of this inversion example is in the Supplementary Information. To aid the analysis, the total inversion loss and its components are plotted in Fig. 11.
The inversion process clearly shows two stages. The first stage spans Iteration 0 to about 100. This initial stage is characterized by rapid changes of the bed in blocks or bands, such that the bed quickly evolves to a state that describes the overall landscape of the target bathymetry. This can be observed in the left column of Fig. 10, where the inverted beds at Iterations 0, 10, 50, and 100 are shown. The initial inverted beds show some hint of the two bed bars at the top and the outlet, as well as the deep pool at the outlet (the truth bathymetry is "Sample 0" in Fig. 3). The first stage can also be identified in the loss history plotted in Fig. 11. During this stage, the total loss and its three components all drop quickly. Note that at Iteration 0, the regularization losses due to bed elevation value and slope are zero. However, after the first iteration, which has not yet taken the regularizations into account, the inverted bed significantly violates the imposed bounds on bed elevation and slope. Thus, the value and slope regularization losses jump suddenly at Iteration 1. The inverted beds from the first several iterations are far from the truth, so the loss due to errors in the predicted flow field is also high. Continuing the iterations drives down all the losses efficiently because the total loss is very responsive to changes in the bed, i.e., the gradient ∂L total (z b )/∂z b has a very large magnitude in the first stage.
The blocks or bands in the inverted beds at the beginning of the inversion process are due to the convolutional nature of the filters in the encoder. Specifically, the length scale of the blockiness is directly proportional to the filter and stride sizes in the convolutional layers, especially the first one. Some of the information about the bathymetry, in particular the spatial scales smaller than the scale defined by the filter and stride sizes, is lost during encoding. This is an inherent limitation of image-based regressions such as CNNs. To demonstrate, Fig. 12 shows one example bed, two feature maps in the first convolution layer of the encoder corresponding to this bed, and the inverted bed at one early iteration. The two feature maps shown are representatives of the 128 feature maps in the first convolution layer. One can observe that the two feature maps roughly capture some bed peaks and depressions. It is also clear that the inverted bed at the shown iteration has a pattern similar to that of feature map 0. Intuitively, the inversion is initially guided by the large-scale, main features of the bed embedded in the feature maps.
One could argue that to reduce the blockiness of the initial inverted beds, we can reduce the filter and stride sizes and add more feature maps and layers to the CNN surrogate model. However, this naive approach also increases the complexity of the CNN architecture, and the number of trainable parameters grows rapidly. For a given training dataset, a surrogate model of increased complexity will quickly overfit, making its predictions useless. Therefore, there is a trade-off between the accuracies of forward prediction and inversion when using CNN-based surrogate models. It is beyond the scope of this work to investigate whether, and if so how, this trade-off can be resolved. Fortunately, the blockiness of the inverted beds at the beginning of the inversion is not a major concern. The banded bed is smoothed out in the second stage, which begins roughly after Iteration 100. As shown in the right column of Fig. 10, the blocks and edges of the inverted bed are gradually smoothed out. In this process, more small-scale bathymetric details emerge to better depict the final inverted bed. All of this is driven by the continued minimization of the total loss, which requires minimizing all three loss components. From Fig. 11, it is important to note that in this second stage (after Iteration 100), the prediction loss L prediction stays almost constant. The value regularization loss oscillates with negligibly small values, indicating that the inverted bed elevation is well within the imposed bounds. What changes significantly is the slope regularization loss. Indeed, the bed slope changes occur mostly at the edges of the large-scale blocks resulting from the first stage. The slope regularization resembles the detailing and finishing of a rough sculpture.
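The link between the encoder's filter/stride sizes and the block length scale can be illustrated with a nearest-neighbor expansion of a coarse latent grid: any bathymetric detail finer than the block size is unrepresentable at this stage. The latent values and the stride product below are hypothetical.

```python
import numpy as np

# hypothetical latent grid produced by an encoder with two stride-2
# convolutions: each latent cell maps back onto a 4x4 patch of pixels
latent = np.arange(6.0).reshape(2, 3)
stride_product = 4                     # 2 * 2
decoded = np.kron(latent, np.ones((stride_product, stride_product)))
print(decoded.shape)                   # (8, 12): blocky, piecewise-constant
```

Each 4x4 patch of `decoded` holds a single latent value, which is the piecewise-constant, banded pattern visible in the early iterations of Fig. 10.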
-27-manuscript submitted to Water Resources Research

Effects of inversion regularization
Regularization is the key to achieving usable inversion results.This section will show the effects of the two regularizations proposed in Section 2.2.Four cases will be compared, i.e., with both value and slope regularizations, with slope regularization only, with value regularization only, and with no regularization.
Figure 13 shows the contours of the inverted bathymetries for the four cases. The RMSE of the inverted z b is also reported in the figure. Figure 14 plots the longitudinal and cross-sectional profiles of the inverted bathymetries. Again, the truth bathymetry is "Sample 0" in Fig. 3. The results show that without proper regularization, the inversion produces suboptimal or even unusable solutions in comparison with the truth. With slope regularization, the inverted bathymetry is close to the truth. However, because there is no constraint on the elevation value, the inverted bed elevation exceeds the specified range in many places (see the bright white and deep blue pixels beyond the range of the colorbar in Fig. 13(b)).
With only value regularization, the inverted bathymetry shows more blockiness and noisiness. The noisiness can also be observed in the profiles shown in Fig. 14. The worst result among the four is the one with no regularization, whose RMSE is the highest. Although not plotted, the prediction losses L prediction for the slope-regularization-only and value-regularization-only cases are comparable. That means the inverted bathymetries shown in Fig. 13(b) and (c) are both admissible solutions if the flow prediction loss is the only metric. This is clear evidence of the non-uniqueness of the bathymetry inversion problem. To shrink the solution space and nudge the inversion toward a usable solution, physical constraints in the form of inversion loss regularizations must be used. The results show that the constraint on the bed elevation value alone, using the zeroth-order Tikhonov regularization, is not sufficient. The additional constraint on the bed slope, i.e., the smoothness of the inverted bathymetry, is also critical. For practical applications, these constraints on value and slope should be adjusted based on prior knowledge or belief about the bathymetry to be inverted.
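As a concrete sketch, the two regularizations can be written with a double-bounded penalty (cf. Fig. 2) applied to the bed elevation and to its finite-difference slopes. The exact functional form and the bed values below are illustrative assumptions; only the slope amplitudes a slopex = 0.08 and a slopey = 0.15 are reused from the hyperparameter table.

```python
import numpy as np

def double_bounded_relu_penalty(x, lo, hi):
    """Penalty that is zero when lo <= x <= hi and grows linearly outside,
    mimicking the double-bounded ReLU used for the value and slope losses
    (the paper's exact functional form may differ)."""
    return np.maximum(x - hi, 0.0) + np.maximum(lo - x, 0.0)

# hypothetical 2x3 bed elevation field (m)
zb = np.array([[0.2, 0.6, 0.3],
               [0.1, 0.0, -0.7]])

# value regularization: bed elevation bounded (hypothetical bounds)
L_value = double_bounded_relu_penalty(zb, -0.5, 0.5).mean()

# slope regularization: finite-difference slopes bounded by the
# amplitudes a_slopex = 0.08 and a_slopey = 0.15 around a zero mean
sx = np.diff(zb, axis=1)   # slopes along x (columns)
sy = np.diff(zb, axis=0)   # slopes along y (rows)
L_slope = (double_bounded_relu_penalty(sx, -0.08, 0.08).mean()
           + double_bounded_relu_penalty(sy, -0.15, 0.15).mean())
print(L_value > 0 and L_slope > 0)   # this toy bed violates both bounds
```

In the total inversion loss, these two terms are weighted by α value and α slope before being added to the flow prediction loss.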

Inversion uncertainty
Because of the non-uniqueness and ill-posed nature of bathymetry inversion, uncertainty quantification is of great interest for practical purposes. To demonstrate the use of the inversion method proposed in this work, a simple uncertainty study was performed.
The inversion input, i.e., velocity, was augmented with 10% uncertainty by adding random perturbations. Two hundred inversions were performed with the randomly perturbed velocity fields, and statistics were then calculated for the inverted bathymetries.
Figure 15 shows the mean inverted longitudinal and cross-sectional profiles with the uncertain velocity fields. The 95% confidence intervals around the mean profiles are shown as shaded areas, and the truth bed profiles as black solid lines. The case uses the "Sample 0" bathymetry in Fig. 3 as the truth. The results show that the truth bed profiles are bounded by the 95% confidence intervals, indicating the efficacy of the proposed inversion method and the appropriateness of the hyperparameter values.
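The uncertainty study follows a simple Monte Carlo pattern: perturb the velocity input, invert, and summarize the ensemble. In the sketch below, invert() is a hypothetical placeholder (a linear map) standing in for the full gradient-based inversion, so only the ensemble bookkeeping is meaningful:

```python
import numpy as np

rng = np.random.default_rng(42)

def invert(u_obs):
    """Placeholder for one full inversion; a hypothetical linear map
    from velocity to bed so the ensemble logic is runnable."""
    return 0.5 * u_obs - 0.1

u_true = np.sin(np.linspace(0, np.pi, 50))       # synthetic velocity profile

ensemble = []
for _ in range(200):                             # 200 perturbed inversions
    u_noisy = u_true * (1 + 0.10 * rng.standard_normal(u_true.shape))
    ensemble.append(invert(u_noisy))
ensemble = np.array(ensemble)

mean_bed = ensemble.mean(axis=0)
lo, hi = np.percentile(ensemble, [2.5, 97.5], axis=0)  # 95% CI band
inside = np.mean((invert(u_true) >= lo) & (invert(u_true) <= hi))
print(inside)    # fraction of the unperturbed result inside the band
```

The mean and the percentile band computed this way correspond to the heavy line and the shaded area in Fig. 15.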

Effects of CNN surrogate architecture
The architecture of the CNN surrogate model affects the forward prediction. It is also important to know how the architecture affects the inversion. This section discusses two aspects of this effect.
One aspect is that, so far, we have only used two outputs of the surrogate model, i.e., the two velocity components u and v, in the inversion. The question is then whether there is any advantage to designing a CNN surrogate that only has the two outputs u and v. To distinguish the two, we denote the surrogate with three outputs as NN (u,v,W SE) and the one with only two outputs as NN (u,v) . Theoretically, training the two surrogate models with the same dataset will result in different parameters (weights and biases) in the neural nets. Consequently, the gradient ∂L total (z b )/∂z b in Eq. 9 will be different. The surrogate model NN (u,v) was trained using the same dataset, except that it only has the two outputs u and v, not W SE (see Fig. 1).
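The distinction between NN (u,v,W SE) and NN (u,v) can be made concrete with a minimal stand-in for the shared-encoder, separate-decoder architecture. Dense layers replace the convolutional encoder and decoders here, and all layer sizes and weights are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# dense stand-ins for the convolutional encoder/decoders: the shared
# encoder maps the flattened bed to a latent code, and each output
# variable gets its own decoder head (layer sizes are hypothetical)
n_in, n_latent, n_out = 64, 16, 64
W_enc = rng.normal(size=(n_latent, n_in)) * 0.1
W_u, W_v, W_wse = (rng.normal(size=(n_out, n_latent)) * 0.1 for _ in range(3))

def surrogate(z_b, with_wse=True):
    latent = np.tanh(W_enc @ z_b)       # shared encoder
    u = W_u @ latent                    # separate decoder for u
    v = W_v @ latent                    # separate decoder for v
    if with_wse:
        return u, v, W_wse @ latent     # NN_(u,v,WSE)
    return u, v                         # NN_(u,v)

z_b = rng.normal(size=n_in)
u, v, wse = surrogate(z_b)
print(u.shape, v.shape, wse.shape)
```

Because the decoder heads share only the encoder, dropping the W SE head changes the trainable parameters but leaves the u and v pathways structurally identical, which is the comparison examined below.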
The second aspect is whether the inclusion of W SE in the inversion, in conjunction with u and v, makes any difference. In Section 3.1, we already noted the relatively small prediction error for W SE in comparison with those for u and v. We hypothesize that including W SE does not contribute significantly to the inversion, at least for the problem defined by the dataset in this work.
Based on the above discussion, we compare three different approaches, namely inversions using (u, v) from NN (u,v,W SE) , (u, v, W SE) from NN (u,v,W SE) , and (u, v) from NN (u,v) . Figure 16 shows the inverted bathymetries using the three approaches. Each subplot shows the mean of the bathymetries inverted from eleven different starting guesses of the bed. Although slightly different, all three approaches produce similar bathymetries with comparable RMSE. For the surrogate model NN (u,v,W SE) , the inversions with and without W SE produce almost identical results, which confirms our hypothesis regarding the importance of W SE for this problem. Whether this conclusion can be generalized is unclear at this point. It may simply be that u and v are sufficient for the inversion and W SE is redundant for this particular problem. For a different problem, where W SE contributes more new information, its inclusion may be necessary.
Judging by the comparable RMSE of the inverted bathymetries, the inversion using (u, v) from NN (u,v) has no advantage over that using (u, v) from NN (u,v,W SE) . Future work should investigate the generalizability of this conclusion. Again, for cases where W SE contributes unique information, its inclusion might be necessary, and inversion using (u, v) from NN (u,v) might then perform poorly.

Conclusion
Using a CNN-based surrogate model for a shallow water equations solver, a bathymetry inversion method was developed based on the gradient conveniently calculated with the neural network's automatic differentiation. The surrogate model uses a shared-encoder and separate-decoder architecture, which successfully captures the dynamics between the input (bathymetry) and the outputs (flow field). To perform the inversion, new regularizations have to be used for both the bed elevation value and the bed slope. These regularizations embed prior knowledge or belief about the bathymetry to be inverted. Without them, especially the slope regularization, the inverted bathymetry shows substantial blockiness and noisiness.
One of the difficulties in nonlinear inversion problems is the lack of a clearly defined approach for determining the parameters of the regularization losses. In this work, we found the α slope parameter for the slope loss to be more important than α value for the bed elevation value.
Using the "L-curve" criterion, a heuristic approach was proposed with some guiding requirements. This approach produces optimal α slope values and good inversion results.
The inversion process has two distinctive stages. The first is characterized by rapid changes of the bed in blocks and bands, such that the bed quickly evolves to a state that represents the overall landscape of the target bathymetry. The inversion loss due to flow prediction error quickly converges to its minimum during this stage. The blockiness of the intermediate inverted beds corresponds to the filter and stride sizes of the convolution layers in the encoder. During the second stage, the regularizations for bed elevation value and slope gradually smooth out the blocks and bands and add more details to the inverted bathymetry. In this second stage, the inversion loss due to flow prediction error stays almost constant. It is the local adjustment of bed slope and elevation that drives the bathymetry toward its final solution. In other words, it is the two regularizations that select the most probable solution, out of many, to fit the imposed physical constraints on the bathymetry.
The investigation of the CNN surrogate architecture reveals that the different options for the surrogate (NN (u,v,W SE) vs. NN (u,v) ) and for whether to include W SE in the inversion yield comparable bathymetries. The fact that NN (u,v,W SE) and NN (u,v) produce similar results may be due to the shared-encoder, separate-decoder architecture of the surrogate, in which the separate decoders for u, v, and W SE interfere minimally with one another. For NN (u,v,W SE) , the inclusion of W SE, in conjunction with u and v, is redundant, at least for the problem solved in this work. The root cause is that the prediction error for W SE is much smaller than those for the velocity components, so it contributes less to the total inversion loss. This conclusion may not hold in other cases where the prediction error for W SE responds strongly to variations in bathymetry.
The use of a surrogate model greatly reduces the computational cost of forward modeling runs during the inversion. However, this saving comes at the cost of offline surrogate training time. In future work, this cost may be reduced using other approaches. For example, instead of surrogate models, an alternative is to implement the physics-based forward models using machine learning platforms and languages such as PyTorch and TensorFlow. In this way, all operations on the inputs are recorded, and automatic differentiation can be carried out directly using backpropagation. However, this alternative currently faces at least two challenges. One is the upfront cost of implementing forward models in a new language. The second is the complexity and computational cost of automatic differentiation even if physics-based models can be implemented on these platforms. Solvers of the SWEs using traditional methods, such as the finite volume method, involve a tremendous amount of computation that would have to be recorded internally for automatic differentiation.

Open Research
A permanent copy of the code, data, and scripts used for this work has been archived in CUAHSI's HydroShare: http://www.hydroshare.org/resource/357c3c413622460a91d29cc61d0ba084.
The latest version of the code and data generation scripts can also be accessed at https://github.com/psu-efd/dl4HM/tree/main/examples/bathymetryinversion 2D.
for incompressible flows around objects. The training data was generated by running a sufficient number of SRH-2D simulations, whose input bathymetry data was randomly generated. The simple setup in this work is sufficient to prove the concept and show the feasibility of the proposed methodology. For practical use, the training data generation should take into consideration the prior knowledge of real bathymetry.

Figure 1 :
Figure 1: Schematic diagram of the CNN-based surrogate model architecture.

Figure 2 :
Figure 2: Double-bounded ReLU function for the calculation of inversion losses due to value and slope.

Figure 3 :
Figure 3: Four example synthetic bathymetries generated with a 2D Gaussian process. The two dashed lines on Sample 0 are the two profiles, longitudinal and cross-sectional, used later in this paper. Flow is from left to right.
The simulations with the physics-based model SRH-2D were performed on a desktop with an Intel Core 3.40GHz CPU. The 3,000 SRH-2D simulations took about 24 hours to finish. The training of the surrogate model took about 2 hours. Each inversion took about 1 minute. The code and data generation scripts used in this work can be accessed at https://github.com/psu-efd/dl4HM/tree/main/examples/bathymetryinversion 2D. Because of the large number of simulations needed, and to automate the data generation and processing, the Python package pyHMT2D (Python-based Hydraulic Modeling Tools) was used to control the SRH-2D modeling runs and transform the results into the inputs and outputs of the neural network.

Figure 4
Figure 4 shows the history of the training and validation MSE losses over 100 epochs.

Figure 5 :
Figure 5: An example case showing the performance of the CNN surrogate model. The bathymetry for this case is "Sample 0" shown in Fig. 3. The first and second columns show the contours of u, v, and W SE from SRH-2D and the CNN surrogate, respectively. The third column shows the differences.

Figure 6 :
Figure 6: Histogram of the L2 norms of surrogate prediction errors for all 150 test cases.
Weighting factor for slope loss regularization: α slope, case dependent
Slope mean for slope loss regularization in both x and y directions: x c,slope = 0.0
Slope amplitude for slope loss regularization in x direction: a slopex = 0.08
Slope amplitude for slope loss regularization in y direction: a slopey = 0.15

Figure 7 :
Figure 7: Determination of the slope loss parameter α slope for an example case. This case corresponds to "Sample 0" in Fig. 3. The scatter markers are colored by their corresponding α slope values, which are annotated in the figure. For each α slope value, 11 inversions were performed with different initial guesses of the bed. The mean of all 11 inversions for each α slope value is plotted as a larger marker with an edge. All the mean points are connected to show the "L-curve". For the 11 cases of each α slope value, the confidence ellipse with two standard deviations is also drawn to show the clustering. The percentiles of the surrogate prediction errors are also shown.

Figure 8 :
Figure 8: Example inversion results for z b . The top row, from left to right, shows the z b truth, the mean of all inverted z b fields from all initial conditions, and the differences between the truth and inverted beds. The remaining rows show three individual examples of inversion results started from different initial beds. Note that for the second row the initial bed is flat with random white noise.

Fig. 8.
The locations of the two profiles, one longitudinal and one cross-sectional, are shown as black dashed lines on bathymetry "Sample 0" in Figure 3. The two profiles highlighted with heavy-weight lines are the truth and the mean of all inversions. The ensemble of inverted bed profiles is blended into the background with light-weight lines to show the band of uncertainty. The mean of the inverted bed profiles generally follows the truth, and the uncertainty band approximately bounds the truth for both the longitudinal and cross-sectional profiles.

Figure 9 :
Figure 9: Profiles of the inverted bathymetry for the case shown in Fig. 8: (a) Longitudinal profile in the middle of the channel, (b) Cross-sectional profile at half channel length.

Figure 10 :
Figure 10: Example inversion process for the case shown in Fig. 8. This inversion used an initial flat bed with random noise.

Figure 11 :
Figure 11: Histories of the total inversion loss and its three components as functions of iteration step. The prediction error loss is due to the error in the predicted flow field. The other two losses are due to the bed elevation value and slope regularizations. The case is the same as that in Fig. 10.

Figure 12 :
Figure 12: Feature maps and the inversion process: (a) truth bathymetry z b , (b) one example feature map in the first convolution layer corresponding to the truth bathymetry, (c) another example feature map, and (d) the inverted bathymetry at one early iteration, showing a pattern similar to that of feature map 0.

Figure 13 :
Figure 13: Effects of inversion loss regularization on the inverted bathymetries: (a) with both slope and value regularizations, (b) with only slope regularization, (c) with only value regularization, and (d) with no regularization. The RMSE of the inverted z b is also reported.

Figure 14 :
Figure 14: Effects of inversion regularization on the inverted bed profiles. The first column shows the longitudinal profiles and the second the cross-sectional profiles. (a) and (b) show the mean profiles from all regularization scenarios in comparison with the truth. (c) and (d) show the profiles inverted with slope regularization only. (e) and (f) show the profiles inverted with value regularization only. (g) and (h) show the profiles inverted with no regularization.

Figure 15 :
Figure 15: Profiles of the inverted bed with 10% uncertainty added to the velocity field: (a) longitudinal profile in the middle of the channel, (b) cross-sectional profile at half the channel length. The 95% confidence intervals around the mean are shown as shaded areas.

Figure 16 :
Figure 16: Effects of different CNN surrogate architectures and different inversion inputs on the inversion: (a) inversion using (u, v) from NN (u,v,W SE) , (b) inversion using (u, v, W SE) from NN (u,v,W SE) , and (c) inversion using (u, v) from NN (u,v) .

Table 1 :
Details of the CNN-based surrogate model

Table 2 :
RMSE of the absolute and relative errors showing the accuracy of the surrogate model for all 150 test cases.