Fuzzy map comparisons enable objective hydro‐morphodynamic model validation

Numerical modeling represents a state‐of‐the‐art technique to simulate hydro‐morphodynamic processes in river ecosystems. Numerical models are often validated based on observed topographic change in the form of pixel information on net erosion or deposition over a simulation period. When model validation is performed by a pixel‐by‐pixel comparison of exactly superimposed simulated and observed pixels, zero or negative correlation coefficients are often calculated, suggesting poor model performance. Thus, a pixel‐by‐pixel approach penalizes quantitative simulation errors, even if a model conceptually works well. To distinguish between reasonably well‐performing and non‐representative models, this study introduces and tests fuzzy map comparison methods. First, we use a fuzzy numerical map comparison to compensate for spatial offset errors in correlation analyses. Second, we add a level of fuzziness with a fuzzy kappa map comparison to additionally address quantitative inaccuracy in modeled topographic change by categorizing data. Sample datasets from a physical lab model and datasets from a 6.9 km long gravel–cobble bed river reach enable the verification of the relevance of fuzzy map comparison methods. The results indicate that a fuzzy numerical map comparison is a viable technique to compensate for model errors stemming from spatial offset. In addition, fuzzy kappa map comparisons are suitable for objectively expressing subjectively perceived correlation between two maps, provided that a small number of categories is used. The methods tested and the resulting spatially explicit comparison maps represent a significant opportunity to improve the evaluation and potential calibration of numerical models of river ecosystems in the future.


| INTRODUCTION
Two-dimensional (2D) numerical modeling is a state-of-the-art tool in environmental and earth sciences to simulate hydrodynamic and morphodynamic (hydro-morphodynamic) processes in river ecosystems (Ganju et al., 2016). Typically, setting up a state-of-the-art hydro-morphodynamic numerical model involves: (1) acquiring input data on the terrain (e.g., topography/elevation and sediment grain sizes), watershed hydrology (e.g., discharge data), and hydraulics (e.g., stage-discharge relationships, local flow velocity, and water depth measurements); (2) using third-party software (e.g., open TELEMAC-MASCARET, Hervouet & Ata, 2020) to select a mathematical model, define a spatiotemporal discretization scheme (implicit or explicit; finite volumes, finite differences, or finite elements), generate boundary conditions based on the input data, and implement a numerical approximation and a method to solve the Navier-Stokes equations (in 2D models, typically in their depth-averaged form); (3) calibrating relevant input parameters; (4) validating the model to express its accuracy; and (5) performing a sensitivity analysis of model-specific and domain-specific parameters.
Model calibration of hydrodynamic parameters builds on measured hydraulic parameters such as flow velocity and water depth, which are compared with modeled values. For example, if the modeled water depth is generally lower than the observed water depth, the roughness value used in the numerical model is iteratively adapted until the best global match between modeled and observed data is achieved (e.g., Barker et al., 2018; Carr et al., 2018). Morphodynamic model parameters include, for instance, the Shields (1936) parameter (also referred to as the critical dimensionless bed shear stress), which is adjusted during calibration against observed elevation (i.e., topographic) change rates. For example, if the numerical model overestimates erosion, the Shields parameter is increased to emulate a higher resistance of grains against hydraulic forces. The outcome is a potentially overcalibrated hydro-morphodynamic model that needs to be validated with a second observation dataset to determine the model's accuracy (Beckers et al., 2020; Mosselman & Le, 2016). Both calibration and validation of hydro-morphodynamic models can be carried out using spatially explicit topographic data acquired at two different points in time. This time period is simulated in the numerical model by defining unsteady inflow conditions (i.e., a hydrograph) with sediment input at the upstream model boundary. Thus, the numerical model simulates erosion and deposition patterns that should ideally match the observed topographic change. The observed topographic change used for comparison with modeled topographic change stems from topographic data in the form of digital elevation models (DEMs). The precision and availability of DEMs have increased substantially in recent years due to advances in remote sensing, such as airborne survey techniques, combined with ground-based and boat-based data recordings (Gruen, 2008; Haun & Dietrich, 2021; Marcus & Fonstad, 2008; Notebaert et al., 2009; Pajares, 2015).
Subtracting a DEM recorded at an earlier time from a DEM recorded at a later time yields a DEM of difference (DoD) (Milan et al., 2007), whose variable observation uncertainty is accounted for in the procedure of producing the DoD (Wheaton et al., 2010). In turn, the uncertainty in DoDs can significantly influence sediment budgets calculated over the observation period (Carley et al., 2012; Wheaton et al., 2010).
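The DoD step can be sketched as follows. This is an illustrative, assumed approach rather than the procedure of this study: it subtracts the older from the newer DEM and sets changes below a minimum level of detection (LoD) to zero, where the LoD is computed from the propagated, independent vertical errors of both surveys, one common thresholding scheme in the DoD literature. The function name, the uncertainty inputs, and the default t = 1.96 (about 95% confidence) are assumptions.

```python
import numpy as np

def dem_of_difference(dem_old, dem_new, sigma_old, sigma_new, t=1.96):
    """Thresholded DoD (new minus old); hypothetical helper for illustration.

    sigma_old, sigma_new: vertical uncertainty of each DEM (scalar or
    per-pixel array, same units as the DEMs). t: critical t value.
    Changes with |DoD| below the minimum level of detection become 0.
    """
    dod = np.asarray(dem_new, dtype=float) - np.asarray(dem_old, dtype=float)
    # Propagated uncertainty of the difference of two independent surveys
    lod = t * np.sqrt(np.asarray(sigma_old) ** 2 + np.asarray(sigma_new) ** 2)
    return np.where(np.abs(dod) >= lod, dod, 0.0)
```

With 0.05 m uncertainty in both surveys, the LoD is about 0.14 m, so a 0.05 m change is treated as undetectable while changes of 0.5 m or −0.4 m are kept.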
The comparison of observed topographic change in a DoD with numerically modeled topographic change is often done via pixel-by-pixel map comparisons or visual comparison (expert assessment).
However, visual comparison is neither consistently repeatable (Shoarinezhad et al., 2020) nor transparent. Therefore, state-of-the-art techniques compare numerically modeled and observed DoDs based on statistical correlation parameters resulting from a pixel-by-pixel (element-wise) comparison (Sutherland et al., 2004), even though the pixel-by-pixel approach has been shown to be overly sensitive to minor shifts (Power et al., 2001). A pixel-by-pixel comparison typically converts gridded raster DoDs into numerical arrays and matches every modeled and observed array element (i.e., the topographic change at an ij-array element). If the modeled and the observed topographic change values are similar, pixel-by-pixel statistics indicate a high degree of correlation and therefore high model accuracy, and vice versa. Still, to the best of the authors' knowledge and experience, hydro-morphodynamic models of natural river ecosystems are far from simulating with sufficiently high precision to yield high correlation in pixel-by-pixel comparisons. For instance, sediment deposits or scour holes may be simulated slightly farther upstream or downstream than observed in reality. Such a model, although it simulates morphodynamic processes conceptually correctly, may yield as low a correlation with observed values in a pixel-by-pixel comparison as a completely random, non-representative model. This is why hydro-morphodynamic numerical models are often calibrated and validated based on expert assessments instead of statistical or stochastic methods to separate the wheat from the chaff. In this study, we introduce different levels of vagueness in the comparison of modeled and observed topographic change using fuzzy logic in an attempt to overcome subjective expert assessment.
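The offset problem can be reproduced with a toy raster. The sketch below is purely illustrative (the sinusoidal pattern and the shift sizes are assumptions, not data from this study): a "model" that reproduces an observed pattern of alternating deposition and erosion exactly, but one or two pixels too far downstream, obtains a pixel-by-pixel Pearson's r near zero or near −1.

```python
import numpy as np

def pearson_r(a, b):
    """Pixel-by-pixel Pearson correlation of two equally shaped rasters."""
    return np.corrcoef(np.ravel(a), np.ravel(b))[0, 1]

# Hypothetical "observed" topographic change: alternating deposition and
# erosion bands along x (wavelength of 4 pixels), identical in every row.
x = np.arange(16)
observed = np.tile(np.sin(2 * np.pi * x / 4), (8, 1))

# A "model" that reproduces the pattern exactly but offset downstream.
# np.roll wraps around the edges, which keeps this toy example periodic.
shift_1 = np.roll(observed, 1, axis=1)  # offset by one pixel
shift_2 = np.roll(observed, 2, axis=1)  # offset by two pixels

r_same = pearson_r(observed, observed)  # close to 1
r_1 = pearson_r(observed, shift_1)      # close to 0
r_2 = pearson_r(observed, shift_2)      # close to -1
```

A quarter-wavelength offset removes all linear correlation, and a half-wavelength offset inverts it, although the simulated pattern itself is perfect.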
To this end, and in the context of high uncertainty involved in hydro-morphodynamic modeling, we address the following research questions: (i) Are deterministic, hydro-morphodynamic numerical models performing better than a quasi-random, non-representative model, and how can the quality of different models be expressed? (ii) Can fuzzy logic aid in overcoming weaknesses of pixel-by-pixel map comparisons, and what are relevant fuzzification rules for map comparisons of modeled and observed data at different spatial resolutions?
To answer these questions, we adopt fuzzy map comparison methods that tolerate imprecision of topographic change in space (location) and magnitude (quantity). These methods were first proposed by Hagen (2003) for comparing categorical geodata. First, we introduce fuzziness in space and preserve the continuous numerical values of raster datasets, which is referred to as fuzzy numerical map comparison (Hagen, 2006). Second, we implement fuzzy sets and the kappa (κ) statistic to establish a method for comparing categorical (i.e., categorized numerical) raster maps. We assume that fuzzy set theory addresses uncertainty and spatial-quantitative inaccuracies more explicitly than traditional methods and thus yields a better, objective appreciation of the subjectively perceived correlation between two maps (in line with Pappenberger et al., 2007). To begin, we present a physical lab model of a shallow reservoir and a 6.9 km gravel-cobble bed river reach at the German-Austrian border, which both serve as test datasets in this study. Afterwards, we explain the implementation of fuzzy map comparison methods with novel features for the purpose of validating hydro-morphodynamic numerical models. The results and discussion evaluate the validity of the hypotheses and the sensitivity of particular parameters, and identify relevant fuzzy map comparison methods with reference to a quasi-random topographic change map.

| DATASETS
Two datasets serve to answer the above-defined research questions: a physical research model of a shallow reservoir (without specific scale) and a 6.9 km long gravel-cobble bed reach of the Salzach River between Germany and Austria. Both the shallow reservoir and the Salzach River were numerically modeled in previous studies, and observed topographic change datasets are available for both.

| Physical model of shallow reservoir
Kantoush (2008) conducted a series of physical model experiments to analyze flow fields and sedimentation processes in shallow reservoirs with different shapes. We use one of the datasets from these experiments (referred to as T14 in the original publication), with a diamond-shaped reservoir. The experimental dataset includes maps of loose sediment deposition height from before and after a 7.5 h experimental run with a steady inflow of 0.007 m³/s and a supply of suspended sediment in the form of crushed walnut shells (particle density of 1500 kg/m³) with an average diameter of 50 × 10⁻⁶ m and a sediment concentration of 3 × 10³ g/m³. A mixing tank upstream of the experimental setup served for the controlled supply of the water-sediment mixture. The supply rate was measured with a flow meter (precision ±0.0001 m³/s), and the sediment height was recorded at the end of the experiment with a mini echo sounder (precision ±1 × 10⁻³ m). At the beginning of the experiment considered, no sediment was present in the 4.0 m wide and 6.0 m long diamond-shaped reservoir, which consisted of hydraulically smooth PVC walls and a flat (zero-slope) bottom. Figure 1a shows the topographic change recorded after the experimental run. Shoarinezhad et al. (2021) reproduced the experimental run with a three-dimensional numerical model of the diamond-shaped physical model using the software SSIIM 2 (Sediment Simulation In Intakes with Multiblock, by Olsen, 2014) coupled with a calibration tool called PEST (Parameter Estimation and Uncertainty, by Doherty, 2015).
Both the experimentally observed and the numerically modeled topographic change datasets are used in this study on a regular grid with a spacing of 0.1 m in the x and y directions (resulting in a pixel size of 0.1 m).

| The Salzach River
The Salzach River at the German-Austrian border is a heavily impaired mountain river that has been the subject of multiple studies in the framework of river restoration efforts (Beckers et al., 2018). This study uses datasets from previous works that include topographic surveys and numerical modeling of a 6.9 km long river reach (between 59.3 and 52.4 km upstream of the river's confluence with the Inn River). A section of the studied river reach is shown in Figure 2. Beckers et al. (2018) used an expert-calibrated model (i.e., input parameters were adjusted based on expert assessment), and Beckers et al. (2020) presented the same model with a stochastic (Bayesian) calibration of input parameters. In this study, we test the fuzzy map comparison methods with results from both the expert-calibrated and the stochastically calibrated models.

| Baseline (placebo) map datasets
To assess model quality in the form of similarity between observed and modeled data, we generated quasi-random baseline rasters that serve as placebos for the results of the shallow reservoir and Salzach River numerical models. The randomness in the baseline rasters stems from the BitGenerator of the NumPy library (Harris et al., 2020), where we […] quasi-random, we assume that the subjectively perceived correlation of the numerically modeled maps with the observation maps is higher than the correlation of the baseline maps with the observation maps.
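A minimal sketch of how such a placebo raster could be drawn with NumPy's `Generator` (created by `numpy.random.default_rng`, which uses the PCG64 BitGenerator by default). Drawing values uniformly between the observed minimum and maximum is an assumption made here for illustration; the text above does not state the distribution or value range used.

```python
import numpy as np

def baseline_raster(observed, seed=None):
    """Quasi-random placebo raster with the shape of `observed`.

    Assumption for illustration: values are uniform between the minimum
    and maximum observed topographic change. No-data (NaN) pixels in
    `observed` stay NaN so that both rasters cover the same area.
    """
    observed = np.asarray(observed, dtype=float)
    rng = np.random.default_rng(seed)  # Generator backed by PCG64
    low, high = np.nanmin(observed), np.nanmax(observed)
    placebo = rng.uniform(low, high, size=observed.shape)
    placebo[np.isnan(observed)] = np.nan
    return placebo
```

Passing a fixed `seed` makes the placebo reproducible, which matters if placebo similarities are to be reported alongside model similarities.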

| METHODS
Instead of using a pixel-by-pixel comparison of modeled and observed data with statistical parameters such as the root mean square error (RMSE) or a correlation coefficient (e.g., Pearson's r), we introduce fuzzy logic to compare grid data in the form of maps. More information on model assessments with pixel-by-pixel (or value-by-value) statistics is provided in the supplemental material (A).
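For reference, the two pixel-by-pixel statistics can be computed as below. This is a generic sketch, not the study's code; it assumes that no-data pixels are stored as NaN and excludes them from both metrics.

```python
import numpy as np

def pixel_metrics(observed, modeled):
    """RMSE and Pearson's r over pixels that are valid in both rasters."""
    obs = np.asarray(observed, dtype=float).ravel()
    mod = np.asarray(modeled, dtype=float).ravel()
    valid = ~(np.isnan(obs) | np.isnan(mod))  # ignore no-data pixels
    obs, mod = obs[valid], mod[valid]
    rmse = np.sqrt(np.mean((mod - obs) ** 2))
    r = np.corrcoef(obs, mod)[0, 1]
    return rmse, r
```

A model that is off by a constant 1 m everywhere obtains r = 1 but RMSE = 1 m, which illustrates that the two statistics capture different aspects of agreement.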

| Fuzzy logic
Depending on its value, a variable can be classified as belonging (true) or not belonging (false) to a category. For instance, if topographic change is smaller than zero meters (i.e., negative), it can be assigned to the category erosion. A curve expressing degrees of truth (i.e., a membership function) for erosion can then be plotted against all possible topographic change values: it is a step function that equals 1 (true) for topographic change values smaller than zero meters and 0 (false) for values larger than zero meters. The category deposition can be defined conversely.
Thus, we attribute the membership of topographic change values to categories according to Boolean logic with a discrete true-false function. However, natural variables can cause more complex responses than just true or false, and with respect to topographic change we may want to consider an additional category of little change. Because defining little change by a vertical line at exactly zero meters of topographic change is not representative (such a value practically never occurs), we define the little change category by a triangle-like function assigning real numbers between 0 (the value does not belong to the category at all) and 1 (the value fully belongs to the category). Similarly, we can modify the membership functions for the categories erosion and deposition so that they do not merely jump between 0 and 1 in the vicinity of zero meters topographic change, but rather fall linearly from 1 to 0 (erosion) or rise from 0 to 1 (deposition). Thus, a topographic change value close to zero meters can be assigned to multiple categories. The assignment of a variable's state to multiple categories through non-discrete (non-Boolean) functions corresponds to the definition of so-called fuzzy sets and represents the fundamental idea behind fuzzy logic (Zadeh, 1965; Zimmermann, 2011). Figure 3 illustrates these membership functions. In the fuzzy kappa method introduced below, we compute membership vectors (i.e., the memberships of a pixel in all categories) for every pixel, which produces a map of fuzzy pixels. A fuzzy pixel represents multiple responses (in the form of fuzzy, non-Boolean pixel values) considering the response of a central pixel and its neighboring pixels. This is why the membership vector is also referred to as the fuzzy neighborhood vector (Hagen, 2003).
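The difference between crisp and fuzzy categories can be sketched in a few lines. The transition half-width `w` is a hypothetical parameter (for example, it could be set to the DoD accuracy); the exact breakpoints of the functions in Figure 3 are not given in the text, so the linear shapes below are an assumption consistent with the description (erosion falling from 1 to 0, little change as a triangle, deposition rising from 0 to 1).

```python
import numpy as np

def crisp_memberships(dz):
    """Boolean memberships: erosion if dz < 0, deposition if dz > 0."""
    dz = np.asarray(dz, dtype=float)
    return {"erosion": (dz < 0).astype(float),
            "deposition": (dz > 0).astype(float)}

def fuzzy_memberships(dz, w):
    """Linear fuzzy memberships in (erosion, little change, deposition).

    w > 0 is a hypothetical half-width (m) of the transition zone: full
    erosion membership at dz <= -w, full deposition at dz >= w, and the
    little change triangle peaks at dz = 0. Returns shape (..., 3).
    """
    dz = np.asarray(dz, dtype=float)
    erosion = np.clip(-dz / w, 0.0, 1.0)              # falls from 1 to 0
    little = np.clip(1.0 - np.abs(dz) / w, 0.0, 1.0)  # triangle
    deposition = np.clip(dz / w, 0.0, 1.0)            # rises from 0 to 1
    return np.stack([erosion, little, deposition], axis=-1)
```

With `w = 0.05`, a change of −0.025 m belongs half to erosion and half to little change, whereas the crisp functions would assign it to erosion only.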

| Fuzzy map comparison
A fuzzy map comparison of topographic change replaces pixel-by-pixel comparisons with fuzzy pixel comparisons to tolerate (numerical) modeling and/or measurement errors in space (location) and magnitude (quantity). According to the definitions introduced above, the neighborhood of every fuzzy pixel is defined by a finite distance from a central pixel, referred to as the neighborhood radius Rad (in pixels), and incorporates all pixels within this distance (including the central pixel). Thus, the neighborhood constitutes a window of N pixels (Figure 4). The pixels within the neighborhood (neighbors and central pixel) are referred to as neighboring pixels.
To account for errors (or levels of data vagueness) in space (location) and magnitude, we consider two types of fuzziness in the neighborhood: (1) fuzziness of location and (2) fuzziness of category.
Fuzziness of location means that a pixel in a modeled map may be similar to its exact equivalent pixel and to the neighbors of the equivalent pixel on a map of observation data. The degree to which a neighboring pixel contributes to the fuzzy representation of the considered (central) pixel is derived from a user-defined membership function.
For a fuzzy map comparison, a distance decay membership function is preferable to a linear membership function (Figure 3) (Hagen, 2003). Figure 4 shows a fuzzy pixel centered at position ij, where i represents a row number (i.e., y-coordinate on a map) and j a column number (i.e., x-coordinate on a map). The figure also illustrates the pixel neighborhood defined by the neighborhood radius Rad that introduces spatial fuzziness. In the following, we refer to the positions of the neighbors of central pixels by the subscript ij, ι, where ι indicates positions relative to the central pixel. Thus, the distance decay membership function in Figure 4 involves the distance $d_{ij,\iota}$ between neighboring pixels (at ij, ι-positions) and the central pixel, and the so-called halving distance $d_{halv}$:
$$\omega(d_{ij,\iota}, d_{halv}) = 2^{-d_{ij,\iota}/d_{halv}} \qquad (1)$$
The halving distance $d_{halv}$ is the distance (in pixels) at which the membership function decays to half of its maximum value at the central pixel (Hagen, 2009).
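The neighborhood and its distance decay weights can be sketched as follows. The sketch assumes a square (2·Rad + 1) × (2·Rad + 1) window, which is consistent with N = 9 neighboring pixels for Rad = 1 in Figure 4 (a circular window would be another option), and the exponential form ω = 2^(−d/d_halv), which equals 1 at the central pixel and 0.5 at the halving distance. The function names are illustrative.

```python
import numpy as np

def neighborhood(rad):
    """Row/column offsets and Euclidean distances of a square window.

    Assumption: square window including diagonal neighbors, so that
    N = (2 * rad + 1) ** 2 (N = 9 for rad = 1).
    """
    di, dj = np.mgrid[-rad:rad + 1, -rad:rad + 1]
    offsets = np.column_stack([di.ravel(), dj.ravel()])
    distances = np.hypot(offsets[:, 0], offsets[:, 1])
    return offsets, distances

def distance_decay(d, d_halv):
    """Exponential distance decay: 1 at d = 0, 0.5 at d = d_halv."""
    return 2.0 ** (-np.asarray(d, dtype=float) / d_halv)

# Weights of the 9 neighboring pixels for Rad = 1 and d_halv = 1 pixel
offsets, d = neighborhood(1)
weights = distance_decay(d, d_halv=1.0)
```

Diagonal neighbors lie √2 pixels away and therefore receive a weight below 0.5, while the four direct neighbors receive exactly 0.5.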
Thus, fuzziness of location introduces a spatial tolerance in map comparisons, which is defined by ω(d ij, ι , d halv ) and Rad, and constitutes fuzzy pixels. To avoid overestimating similarities between maps, a fuzzy pixel (e.g., in a modeled map) should not be compared to the analog fuzzy pixel of the map being compared (e.g., the map of observations) and vice versa. Consequently, a fuzzy pixel of one map is to be compared with the analog exact central (also called crisp) pixel of another (comparison) map (Hagen, 2003).
In addition to fuzziness of location, fuzziness of category can be applied to categorical data such as erosion, little change, and deposition.
Fuzziness of category recognizes that data categories may be similar to each other and expresses the similarity of categories in the form of a similarity matrix (see example in Supporting Information Table 8).
[Figure 3: Definition of the three categories erosion, little change, and deposition through a fuzzy set and linear membership functions (in line with Zadeh, 1965).]
[Figure 4: Qualitative example for introducing fuzziness of location through a fuzzy pixel with a neighborhood radius Rad of one pixel (total number of neighboring pixels N = 9) and an exponential distance decay membership function ω(d_ij,ι, d_halv) (Equation 1) with a halving distance d_halv of one pixel. d_ij,ι is the distance between the neighboring pixel and the central pixel at ij.]
The per-pixel similarities ξ_ij,ι,om and ξ_ij,ι,mo are high (i.e., close to unity) when the observed and modeled pixel values are close to each other; in contrast, they are low (i.e., close to zero) when the observed and modeled pixel values differ significantly.
The next step introduces spatial tolerance by calculating a per-pixel fuzzy similarity $s^{fn}_{ij,om}$ between exact pixels in the observation map and fuzzy pixels in the modeled map (Equation 4). Conversely, the per-pixel fuzzy similarity $s^{fn}_{ij,mo}$ between exact pixels in the modeled map and fuzzy pixels in the observation map is computed (Equation 5).
Both $s^{fn}_{ij,om}$ and $s^{fn}_{ij,mo}$ represent the maximum of the product of the distance decay membership $\omega(d_{ij,\iota}, d_{halv})$ and the exact similarities $\xi_{ij,\iota,om}$ and $\xi_{ij,\iota,mo}$, respectively. Thus, their computation involves iteratively calculating the exact similarity of every central pixel of one map with the corresponding N pixels of the fuzzy pixel of the other map (subscripts ij, ι in Equations 4 and 5):
$$s^{fn}_{ij,om} = \max_{\iota} \left[ \omega(d_{ij,\iota}, d_{halv}) \cdot \xi_{ij,\iota,om} \right] \qquad (4)$$
$$s^{fn}_{ij,mo} = \max_{\iota} \left[ \omega(d_{ij,\iota}, d_{halv}) \cdot \xi_{ij,\iota,mo} \right] \qquad (5)$$
Next, the two-directional per-pixel similarity $s^{fn}_{ij}$ is computed as the minimum of the two one-directional per-pixel fuzzy similarities:
$$s^{fn}_{ij} = \min\left(s^{fn}_{ij,om},\, s^{fn}_{ij,mo}\right) \qquad (6)$$
Ultimately, the global fuzzy numerical map similarity $S^{fn}$ between an observation map and a modeled map is the average of the $s^{fn}_{ij}$ values of all M map pixels:
$$S^{fn} = \frac{1}{M} \sum_{ij} s^{fn}_{ij} \qquad (7)$$
where M denotes the total number of ij-pixels in a map, and the superscript fn refers to the fuzzy numerical method (in contrast to the fuzzy kappa method, fk, presented in the next section).
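Equations (4) to (7) can be sketched in NumPy as below. Because the exact similarity ξ (Equations 2 and 3) is not reproduced in this section, `exact_similarity` uses an assumed relative-difference form, 1 − |a − b| / max(|a|, |b|); substitute the study's definition if it differs. This is not the Fuzzycorr implementation; it assumes NaN marks no-data pixels, a square neighborhood window, and ω = 2^(−d/d_halv).

```python
import numpy as np

def exact_similarity(a, b):
    """Assumed exact similarity xi: 1 - |a - b| / max(|a|, |b|), 1 if both are 0."""
    denom = np.maximum(np.abs(a), np.abs(b))
    with np.errstate(divide="ignore", invalid="ignore"):
        sim = 1.0 - np.abs(a - b) / denom
    return np.where(denom == 0, 1.0, sim)

def one_way_similarity(center_map, fuzzy_map, rad, d_halv):
    """Per pixel: max over the window of omega(d) * xi (Equations 4 and 5)."""
    rows, cols = center_map.shape
    best = np.full(center_map.shape, -np.inf)
    for di in range(-rad, rad + 1):
        for dj in range(-rad, rad + 1):
            if abs(di) >= rows or abs(dj) >= cols:
                continue  # offset falls entirely outside the map
            omega = 2.0 ** (-np.hypot(di, dj) / d_halv)  # distance decay
            # neighbor[i, j] = fuzzy_map[i + di, j + dj]; NaN outside the map
            dst = (slice(max(-di, 0), rows - max(di, 0)),
                   slice(max(-dj, 0), cols - max(dj, 0)))
            src = (slice(max(di, 0), rows - max(-di, 0)),
                   slice(max(dj, 0), cols - max(-dj, 0)))
            neighbor = np.full(center_map.shape, np.nan)
            neighbor[dst] = fuzzy_map[src]
            # np.fmax ignores NaN candidates (neighbors outside the map)
            best = np.fmax(best, omega * exact_similarity(center_map, neighbor))
    return np.where(np.isnan(center_map), np.nan, best)

def fuzzy_numerical_similarity(observed, modeled, rad, d_halv):
    """Global S_fn (Equation 7) and per-pixel s_fn (Equation 6)."""
    obs = np.asarray(observed, dtype=float)
    mod = np.asarray(modeled, dtype=float)
    s_om = one_way_similarity(obs, mod, rad, d_halv)  # exact obs vs fuzzy model
    s_mo = one_way_similarity(mod, obs, rad, d_halv)  # exact model vs fuzzy obs
    s_ij = np.minimum(s_om, s_mo)                     # two-directional minimum
    return np.nanmean(s_ij), s_ij
```

For a one-pixel offset such as observed [0, 1, 0, 0] versus modeled [0, 0, 1, 0] with Rad = 1 and d_halv = 1, the two offset pixels reach 0.5 through their neighbors instead of 0, so S_fn is 0.75, whereas the plain pixel-by-pixel mean of ξ would be 0.5.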

| Fuzzy kappa map comparison
A fuzzy kappa map comparison combines fuzzy logic and Cohen's κ statistic to calculate the similarity between two categorical raster maps (Hagen, 2003, 2009; Hagen et al., 2005). The κ statistic accounts for the relative observed agreement $p_a$ between categorical datasets and partials out the expected agreement $p_e$ that may occur by chance (Cohen, 1960):
$$\kappa = \frac{p_a - p_e}{1 - p_e} \qquad (8)$$
Subtracting $p_e$ from both $p_a$ (numerator) and its maximum possible value of 1.0 (denominator) normalizes the agreement between two categorical maps (Cohen, 1960).
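As a crisp reference point, Cohen's κ (Equation 8) for two categorical rasters can be computed with $p_e$ taken from the marginal category frequencies (Cohen, 1960). This is the standard non-fuzzy κ, shown only for comparison; the fuzzy kappa replaces $p_a$ and $p_e$ with neighborhood-based quantities.

```python
import numpy as np

def cohens_kappa(map_a, map_b):
    """Crisp Cohen's kappa for two categorical rasters of equal shape.

    Undefined (division by zero) if both maps consist of one and the
    same single category, because then p_e = 1.
    """
    a = np.asarray(map_a).ravel()
    b = np.asarray(map_b).ravel()
    categories = np.union1d(a, b)
    p_a = np.mean(a == b)  # observed agreement
    # Chance agreement: product of the category frequencies in both maps
    p_e = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return (p_a - p_e) / (1.0 - p_e)
```

Two maps with 50% matching pixels and equal category frequencies obtain κ = 0, because that much agreement is already expected by chance.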
The expected agreement $p_e$ depends on the size of the neighborhood and the relative frequency f of every category c in a map. The larger the neighborhood, the higher the probability that the central pixel matches one of the neighboring pixels. In the case of categorical map comparisons, $p_e$ can be calculated as the sum of weighted matching probabilities of so-called neighborhood rings λ (Hagen, 2003), where R denotes the number of neighborhood rings, $p_\lambda$ is the probability […] where the category memberships $\mu_{ij,c}$ are computed for every category c as the maximum of the product of the distance decay function $\omega(d_{ij,\iota}, d_{halv})$ and the categorical memberships $\mu_{ij,\iota,c}$ of neighboring pixels (with position index ij, ι) given by the similarity matrix:
$$\mu_{ij,c} = \max_{\iota} \left[ \omega(d_{ij,\iota}, d_{halv}) \cdot \mu_{ij,\iota,c} \right]$$
Thus, $v_{ij,neigh}$ represents the fuzzy union of all categorical memberships of the neighboring pixels ij, ι, weighted by the distance decay membership function. An illustrative example of the computation of $v_{ij,neigh}$ is provided in Supporting Information Figure 10 and Table 5 in the supplemental material (F).
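The category memberships $\mu_{ij,c}$ can be sketched with explicit loops for readability. Assumptions: categories are stored as integer indices 0..C−1, row k of the similarity matrix holds the memberships of a pixel of category k in all categories, the window is square, and ω = 2^(−d/d_halv). The function name is hypothetical.

```python
import numpy as np

def fuzzy_neighborhood_vectors(categories, similarity, rad, d_halv):
    """Fuzzy neighborhood vector (mu_ij,1 ... mu_ij,C) for every pixel.

    categories: 2D integer raster of category indices 0..C-1.
    similarity: C x C category similarity matrix (1 on the diagonal).
    Returns an array of shape (rows, cols, C).
    """
    cat = np.asarray(categories)
    sim = np.asarray(similarity, dtype=float)
    rows, cols = cat.shape
    v = np.zeros((rows, cols, sim.shape[0]))
    for i in range(rows):
        for j in range(cols):
            for di in range(-rad, rad + 1):
                for dj in range(-rad, rad + 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        omega = 2.0 ** (-np.hypot(di, dj) / d_halv)
                        # fuzzy union (max) of distance-weighted memberships
                        v[i, j] = np.maximum(v[i, j], omega * sim[cat[ni, nj]])
    return v
```

With an identity similarity matrix (no similarity between categories), a pixel keeps full membership in its own category and gains membership 0.5 in the category of a direct neighbor when d_halv = 1.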
Similar to the fuzzy numerical map comparison, the fuzzy kappa method also involves calculating a two-directional similarity per pixel.
Therefore, for every pixel, its exact value in the observation map is used with the corresponding fuzzy pixel in the modeled map to calculate $s^{fk}_{ij,om}$, and vice versa to calculate $s^{fk}_{ij,mo}$. The superscript fk refers to the fuzzy kappa method (in contrast to fn, referring to fuzzy numerical similarity). The fuzzy pixel is represented by the fuzzy neighborhood vector, while the exact pixel is represented by a membership vector $v_{ij,bool}$, which is a vector of Boolean match or no-match values. For instance, with the three categories erosion, little change, and deposition (in this order), $v_{ij,bool}$ for a pixel of the little change category is:
$$v_{ij,bool} = (0,\ 1,\ 0)$$
Similar to the fuzzy numerical comparison technique, the two-directional per-pixel similarity $s^{fk}_{ij}$ is calculated as the minimum of $s^{fk}_{ij,om}$ (similarity between the observation map and the modeled map) and $s^{fk}_{ij,mo}$ (similarity between the modeled map and the observation map):
$$s^{fk}_{ij} = \min\left(s^{fk}_{ij,om},\, s^{fk}_{ij,mo}\right)$$
where $s^{fk}_{ij,om}$ and $s^{fk}_{ij,mo}$ are the maximum (i.e., the highest assigned) category memberships resulting from the minimum of the fuzzy pixel and the exact pixel values, with superscripts o and m denoting the observation and the modeled map:
$$s^{fk}_{ij,om} = \max_{c} \min\left(v^{o}_{ij,bool,c},\, v^{m}_{ij,neigh,c}\right), \qquad s^{fk}_{ij,mo} = \max_{c} \min\left(v^{m}_{ij,bool,c},\, v^{o}_{ij,neigh,c}\right)$$
The global fuzzy kappa map similarity $S^{fk}$ between an observation map and a modeled map is the average of the $s^{fk}_{ij}$ values of all M map pixels:
$$S^{fk} = \frac{1}{M} \sum_{ij} s^{fk}_{ij}$$
To calculate the fuzzy kappa coefficient $\kappa_f$ as a similarity measure between an observation map and a map of modeled topographic change, $p_a$ in Equation (8) is substituted by $S^{fk}$:
$$\kappa_f = \frac{S^{fk} - p_e}{1 - p_e}$$
In the case of perfect agreement between the observation and modeled maps, κ is 1.0; it is 0.0 if the agreement is no better than expected by chance, and negative if it is worse (Cohen, 1960). Note that the use and interpretation of the κ statistic are controversial because its original application was limited to dichotomous (binary) data (e.g., Kraemer, 1980; Maclure & Willett, 1987). A discussion of the κ statistic with illustrative explanations is provided in the supplemental material (E, Supporting Information Table 4).
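Given the Boolean vectors of the exact pixels and the fuzzy neighborhood vectors, the per-pixel similarities, $S^{fk}$, and $\kappa_f$ follow in a few array operations. The sketch below assumes that the fuzzy neighborhood vectors are precomputed and takes $p_e$ as an input, because its ring-based computation (Equation 9) is not spelled out here. The function name is hypothetical.

```python
import numpy as np

def fuzzy_kappa(cat_obs, cat_mod, v_obs, v_mod, p_e):
    """Fuzzy kappa from crisp categories and fuzzy neighborhood vectors.

    cat_obs, cat_mod: 2D integer rasters of category indices 0..C-1.
    v_obs, v_mod: fuzzy neighborhood vectors, shape (rows, cols, C).
    p_e: expected agreement, supplied by the caller.
    Returns (kappa_f, per-pixel two-directional similarity).
    """
    n_cat = v_obs.shape[-1]
    # Boolean membership vectors v_bool of the exact (crisp) pixels
    b_obs = np.eye(n_cat)[np.asarray(cat_obs)]
    b_mod = np.eye(n_cat)[np.asarray(cat_mod)]
    # Highest category membership after the fuzzy intersection (minimum)
    s_om = np.max(np.minimum(b_obs, v_mod), axis=-1)
    s_mo = np.max(np.minimum(b_mod, v_obs), axis=-1)
    s_ij = np.minimum(s_om, s_mo)  # two-directional per-pixel similarity
    S_fk = s_ij.mean()             # global fuzzy kappa map similarity
    return (S_fk - p_e) / (1.0 - p_e), s_ij
```

For two one-row maps with swapped categories and an identity similarity matrix, each pixel reaches 0.5 through its neighbor, so S_fk = 0.5; with an assumed p_e of 0.25, κ_f = 1/3.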

| Hypotheses
To answer the research questions (i-ii) defined in the Introduction, we gradually test a set of hypotheses. The two fundamental hypotheses are:
I Any statistic identifies a numerical model as a better predictor of observed topographic change than the above-introduced quasi-random baseline rasters.
II Fuzziness of location and/or fuzziness of category enable a more representative map comparison than pixel-by-pixel methods.
The second hypothesis needs to be unpacked into two aspects: (a) the representation of subjectively perceived model performance; and (b) the capacity to compensate for spatial offset and/or magnitude errors of numerical models. Assuming that the perceived correlation corresponds to the actual correlation, these two aspects (a and b) conflict with each other because fuzzy similarities may identify correlation (i.e., $S^{fn} > 0$ and $\kappa_f > 0$) solely due to the introduced fuzziness rather than due to the model performance. Thus, we investigate both aspects in this study by splitting hypothesis II into two mutually exclusive sub-hypotheses (i.e., verifying one sub-hypothesis implicitly falsifies the other):
IIa Fuzzy map comparisons reproduce subjectively perceived similarity (i.e., model performance) as a function of objective, quantifiable parameters. Thus, the ratio between statistics comparing numerically modeled raster maps with observation maps and statistics comparing baseline rasters with observation maps must be higher for fuzzy similarities than for pixel-by-pixel statistics.
IIb Fuzzy map comparisons compensate for errors in location and/or magnitude, as opposed to pixel-by-pixel statistics, which can yield a statistic as low as that of a nonsense model. Thus, the ratio between statistics comparing numerically modeled raster maps with observation maps and statistics comparing baseline rasters with observation maps must be lower for fuzzy similarities than for pixel-by-pixel statistics.
To test the two sub-hypotheses, we calculate the ratio ϱ as the fraction of the similarity between numerically modeled and observation maps (numerator) and the similarity between baseline raster and observation maps (denominator):
$$\varrho(\mathrm{STAT}) = \frac{\mathrm{STAT}(\text{modeled},\ \text{observed})}{\mathrm{STAT}(\text{baseline},\ \text{observed})}$$
where STAT is substituted by the tested pixel-by-pixel statistics (RMSE and Pearson's r) and fuzzy similarities (numerical and kappa). Note that since the RMSE is an error metric (the lower the value, the higher the model accuracy), ϱ(RMSE) is to be interpreted inversely to the other ratios in the verification of the hypotheses.
Thus, if ϱ (1/ϱ for the RMSE) takes higher values for pixel-by-pixel similarities than for fuzzy similarities, we reject hypothesis IIa and accept hypothesis IIb for a fuzzy map comparison method. In the converse case, we accept hypothesis IIa and reject hypothesis IIb.
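The ratio and the decision rule can be written down explicitly. The function names are illustrative; the tie case (equal ratios) is not addressed in the text and is reported as undecided here.

```python
def rho(stat_model, stat_baseline):
    """Ratio STAT(modeled, observed) / STAT(baseline, observed)."""
    return stat_model / stat_baseline

def model_beats_placebo(ratio, error_metric=False):
    """Hypothesis I per statistic: rho < 1 for error metrics (RMSE), else rho > 1."""
    return ratio < 1.0 if error_metric else ratio > 1.0

def preferred_subhypothesis(rho_fuzzy, rho_pixel, pixel_is_error_metric=False):
    """Compare a fuzzy ratio with a pixel-by-pixel ratio (1/rho for the RMSE)."""
    pixel = 1.0 / rho_pixel if pixel_is_error_metric else rho_pixel
    if rho_fuzzy > pixel:
        return "IIa"  # fuzziness strengthens the model-vs-placebo contrast
    if rho_fuzzy < pixel:
        return "IIb"  # fuzziness narrows the contrast (compensates errors)
    return "undecided"
```

For example, a fuzzy ratio of 2.0 against a Pearson's r ratio of 3.0 points to IIb, and so does a comparison against an RMSE ratio of 0.25 (i.e., 1/ϱ = 4).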

| Algorithmic approaches
This study features the development of a novel algorithm for the validation of hydro-morphodynamic numerical models (available as an open-source Python package under the name Fuzzycorr; Negreiros, 2020). The algorithm pre-processes geospatial data to estimate the goodness of fit of a modeled topographic change map compared with an observed topographic change map and comes with new routines for calculating fuzzy numerical map similarity (see supplemental material, D). In a first step, the algorithm transforms irregularly spaced topographic points into regular data grids in the form of raster files, which is necessary because of measurement artifacts (e.g., caused by varying boat speed during sonar surveys) and the nature of numerical meshes (e.g., irregular triangular meshes). The interpolated points are placed in the upper left corner of raster pixels using the SciPy interpolation library and a user-defined interpolation method chosen from three options: linear, cubic polynomial, and nearest-neighbor interpolation.
Here, we used a simple linear interpolation for the shallow reservoir testbed ( Figure 1) and a cubic polynomial interpolation for the Salzach River (Supporting Information Figure 9 in the supplemental material, C).
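The gridding step can be sketched with `scipy.interpolate.griddata`, which supports the three methods named above (`"linear"`, `"cubic"`, and `"nearest"`; for 2D data, `"cubic"` is a piecewise cubic Clough-Tocher interpolant). This is not the Fuzzycorr code; the grid origin at the data extent and the north-up row order are assumptions.

```python
import numpy as np
from scipy.interpolate import griddata

def points_to_raster(x, y, z, pixel_size, method="linear"):
    """Interpolate scattered (x, y, z) points onto a regular raster.

    Grid nodes sit at the upper-left pixel corners, starting at the
    minimum x and maximum y of the data; row 0 is the northernmost row.
    Outside the convex hull, "linear" and "cubic" return NaN.
    """
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    xs = np.arange(x.min(), x.max() + pixel_size / 2, pixel_size)
    ys = np.arange(y.max(), y.min() - pixel_size / 2, -pixel_size)
    gx, gy = np.meshgrid(xs, ys)  # shape (len(ys), len(xs))
    return griddata((x, y), z, (gx, gy), method=method)
```

For sonar tracks or triangular mesh nodes, `x`, `y`, and `z` would be the point coordinates and elevation (or elevation change), and the resulting arrays can be compared pixel by pixel or with fuzzy methods.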
The algorithm runs fuzzy numerical map comparisons with the regular raster maps based on the equations shown above, which are implemented in modularized Python3 functions. Along with a pixel size value that defines the cell (pixel) size of the regular grid (raster), the algorithm accepts a float value defining the halving distance of the distance decay membership function (Equation 1) as function arguments.
In addition to the fuzzy numerical method implemented in the novel algorithm, we also tested fuzziness of category with the fuzzy kappa method using the Map Comparison Kit (Visser & De Nijs, 2006). For this purpose, we categorized the modeled and observed topographic change raster maps into degrees of sediment erosion and deposition based on natural breaks (Jenks, 1967). The rationale behind this choice is that natural breaks avoid excessive categorization of data (i.e., they create a reasonable set of erosion and deposition categories) while minimizing information loss. Moreover, the natural breaks method outperforms other methods, such as quantiles, standard deviation, or equal intervals, when the frequency distribution of the data shows clear differences between value frequencies (Toshiro, 2002).
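Natural breaks are usually computed by exact optimization of the within-class squared deviations (Fisher-Jenks). The dynamic-programming sketch below illustrates the idea; the study itself used the Map Comparison Kit, and this code makes no claim to reproduce its categories. Its O(k·n²) runtime means that large rasters should be subsampled first.

```python
import numpy as np

def natural_breaks(values, n_classes):
    """Class upper bounds minimizing the within-class squared deviations."""
    x = np.sort(np.asarray(values, dtype=float).ravel())
    n = x.size
    s1 = np.concatenate([[0.0], np.cumsum(x)])       # prefix sums
    s2 = np.concatenate([[0.0], np.cumsum(x ** 2)])  # prefix sums of squares

    def sse(i, j):  # squared deviations of x[i:j] from their mean (j exclusive)
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    cost = np.full((n_classes + 1, n + 1), np.inf)
    split = np.zeros((n_classes + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last class is x[i:j]
                c = cost[k - 1, i] + sse(i, j)
                if c < cost[k, j]:
                    cost[k, j], split[k, j] = c, i
    # Backtrack: the upper bound of each class is its largest value
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(x[j - 1])
        j = split[k, j]
    return sorted(bounds)
```

For clearly clustered values such as 1 to 3, 10 to 12, and 20 to 22, three classes yield the upper bounds 3, 12, and 22, i.e., the breaks fall into the gaps of the frequency distribution.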
The new algorithm also generates comparison maps, which indi-

| Baseline (placebo) similarities
For testing the hypothesis that fuzzy map comparison better accounts for subjectively perceived correlation than pixel-by-pixel statistics, the first required results are similarity statistics comparing the quasi-random baseline rasters with observed topographic change. Table 1 lists the resulting baseline pixel-by-pixel statistics (RMSE and Pearson's r; see the supplemental material, A), the global fuzzy numerical baseline similarity $S^{fn}_{base}$ (Equation 7), and the baseline fuzzy kappa coefficient $\kappa_{base}$ (Equation 17). We used a halving distance of $d_{halv}$ = 2 pixels and a neighborhood radius of Rad = 4 pixels for the shallow reservoir, and $d_{halv}$ = 4 pixels and Rad = 8 pixels for the Salzach River.
With a possible value range for $S^{fn}$ between −∞ (low correlation) and 1.0 (high correlation), the computed $S^{fn}_{base}$ of 0.660 for the shallow reservoir indicates some correlation between observed and quasi-random topographic change. The value of $S^{fn}_{base}$ for the Salzach River is lower, at 0.245. Yet, in both cases $S^{fn}$ is considerably higher than the pixel-by-pixel metrics RMSE and r, which indicates that the fuzzy numerical map comparison identifies a considerable degree of arbitrary similarity. $\kappa_{base}$ is close to zero in both cases, which is expected, not least because $\kappa_{base}$ partials out the expected agreement $p_e$ (see Equation 9). […] Table 6 in the supplemental material, G) correspond to the range of observed topographic change between Δz = 0.028 m and Δz = 0.056 m (Kantoush, 2008).

| Fuzzy similarities of the numerical model of the shallow reservoir
Both the fuzzy numerical and the fuzzy kappa map comparisons were calculated with a neighborhood radius Rad of four pixels and a halving distance d halv of two pixels.

| Sensitivity analysis of geometric input parameters
The sensitivity analysis uses both testbeds (shallow reservoir and expert-calibrated Salzach River models) to evaluate $S^{fn}$ (i.e., fuzziness of location only; Equation 7) as a function of pixel size, neighborhood radius Rad, and halving distance $d_{halv}$. […] Rad and $d_{halv}$, where the pixel size is constant at 0.1 m for the shallow reservoir and 5.0 m for the Salzach River. In the case of the shallow reservoir, the effect of both Rad and $d_{halv}$ is small (of the order of 10⁻⁴), which can be attributed to a generally more homogeneous topographic change pattern (see Figure 1) than in the case of the […] ϱ(RMSE) must be less than 1.0 and all other ratios should be significantly greater than 1.0 to verify hypothesis I. Table 3 lists ϱ for the similarity parameters used in this study and shows that both requirements (ϱ(RMSE) < 1.0 and all others > 1.0) are fulfilled. Therefore, hypothesis I holds, and we conclude that 2D numerical hydro-morphodynamic models have merit because they represent considerably better predictors than quasi-random placebo models.
The acceptance of either hypothesis IIa or IIb depends on the difference between ϱ of the pixel-by-pixel and the fuzzy map comparisons. Table 1  ratios of Pearson's r. The higher ϱ ratios for pixel-by-pixel comparisons do not support the claim that fuzzy numerical comparisons yield a stronger representation of perceived correlation. However, ϱ(κ_f) is much higher than ϱ(S_fn) and ϱ(r), which indicates that fuzziness of category outweighs fuzziness of location in reproducing perceived map correlation.
In addition, ϱ(κ_f) is much lower for the more complex set of categories of the Salzach River (12 categories) than for the shallow reservoir (six categories). These observations indicate that fuzziness of category has a much stronger effect on similarity scores than fuzziness of location, with fewer categories yielding higher scores. Thus, while fuzziness globally leads to similarities closer to their maximum value of 1.0 than pixel-by-pixel comparisons (see Table 2

| Perceived similarity
Verification of the hypotheses depends to some degree on the authors' perception of observed similarity (e.g., in Figure 1). To the best of our ability, we objectively express perceived similarities here, and our observations are consistent in the case of the shallow reservoir. In the case of the Salzach River, however, the subjectively perceived similarity is less straightforward because of the high heterogeneity of the observation data (see Figure 6 and Supporting Information Figure 9 in the supplemental material, C). A robust, quasi-objective opinion on the perceived similarity between datasets would require extensive surveys, which belong to disciplines beyond earth science. Here, we can only point to a future verification of our methods through interdisciplinary approaches.

| Data categorization
The categorization of real-valued topographic change aims at assigning classes that meaningfully characterize erosion and deposition processes for fuzzy kappa map comparisons. Yet, there are known challenges in interpreting the kappa statistic because it ignores similarity between categories (Kraemer, 1980; Maclure & Willett, 1987; Sun, 2011). As a result, the kappa statistic should not be used with many categories of either erosion or deposition. The similarity ratios ϱ (Table 3) highlight the decreasing relevance of the fuzzy kappa map comparison with an increasing number of categories (six categories for the shallow reservoir compared to 12 categories for the Salzach River). The rationale behind the categorization used here is to avoid subjectivity by using the minimum and maximum of the observation data and assigning intervals with natural breaks (Jenks, 1967). There are other, similarly objective possibilities for categorizing data, such as using the average or the standard deviation of the observation data to define numerical intervals for categories. However, because the interpretation of κ_f is challenging for multi-categorical data (Wealands et al., 2005), it is only meaningful to use a small number of categories for topographic change map comparisons, such as the three categories of erosion, little change, and deposition, where the little change category should cover elevation differences corresponding to the accuracy of the DoD used. Only such a simplified categorization is useful for assessing a numerical model with the fuzzy kappa method, which consequently only permits a rudimentary assessment of a numerical model. Yet, the fuzzy kappa map similarity distinguishes a rudimentarily working model from a placebo model much more sensitively (see Table 3) than pixel-by-pixel analysis does.
Therefore, the fuzzy kappa map comparison can represent an efficient feedback function for the validation of hydro-morphodynamic numerical models (see also discussion below).
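Both categorization options discussed above can be sketched with NumPy. `three_categories` implements the recommended erosion, little change, and deposition scheme with a symmetric threshold equal to an assumed DoD accuracy; `jenks_breaks` is one standard way to compute Jenks natural breaks (Fisher's exact dynamic programme) and may differ from the tool used in this study. All function names are hypothetical.

```python
import numpy as np


def three_categories(dz, dod_accuracy):
    """-1 = erosion, 0 = little change, +1 = deposition.

    'Little change' spans +/- dod_accuracy, the assumed DoD uncertainty in meters.
    """
    dz = np.asarray(dz, dtype=float)
    return np.where(dz < -dod_accuracy, -1, np.where(dz > dod_accuracy, 1, 0))


def jenks_breaks(values, n_classes):
    """Natural breaks minimizing the summed within-class squared deviation.

    Returns the upper bound of every class (the last one is the data maximum).
    Runs in O(n_classes * n**2), so it only suits modest samples.
    """
    x = np.sort(np.asarray(values, dtype=float).ravel())
    n = x.size
    s1 = np.concatenate(([0.0], np.cumsum(x)))  # prefix sums give O(1) slice statistics
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def ssd(i, j):  # squared deviation of x[i:j] about its mean
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    cost = np.full((n_classes + 1, n + 1), np.inf)
    start = np.zeros((n_classes + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):  # first j sorted values split into k classes
            for i in range(k - 1, j):  # the last class covers x[i:j]
                c = cost[k - 1, i] + ssd(i, j)
                if c < cost[k, j]:
                    cost[k, j], start[k, j] = c, i
    upper, j = [], n
    for k in range(n_classes, 0, -1):  # backtrack the class boundaries
        upper.append(x[j - 1])
        j = start[k, j]
    return upper[::-1]


def classify(values, upper_bounds):
    """Class index 0..k-1 for every value, given ascending class upper bounds."""
    return np.searchsorted(upper_bounds, np.asarray(values, dtype=float), side="left")
```

For example, `jenks_breaks([1.0, 1.1, 1.2, 5.0, 5.1, 9.0, 9.2], 3)` separates the three obvious clusters, and `three_categories` maps any elevation change within the DoD accuracy to the little change class.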

| Alternative assessment methods
In addition to fuzzy map comparisons, other authors have argued that skill metrics are more suitable for evaluating hydro-morphodynamic model performance than accuracy measures (e.g., RMSE) or correlation coefficients (e.g., Pearson's r) (Sutherland et al., 2004). A skill metric rates the performance of a model relative to the performance of a baseline (reference) prediction. To this end, a generic skill score expresses the improvement in model performance over a reference prediction relative to the total possible improvement (Murphy, 1988). Here, we additionally examine a novel skill score SS_fn (Equation 19) that follows from the generic definition of a skill score (Murphy, 1988) and resembles the fuzzy kappa coefficient (Equation 17). The skill score SS_fn therefore represents a performance statistic relative to the similarity of the baseline (placebo) rasters S_fn,base, with interpretation ranges similar to those of Pearson's r (i.e., 0.0 means no correlation and 1.0 indicates perfect correlation). The skill score yields SS_fn = 0.729 for the shallow reservoir, SS_fn = 0.135 for the expert-calibrated model of the Salzach River, and SS_fn = 0.144 for the stochastically calibrated model of the Salzach River. Thus, the skill scores are slightly lower than Pearson's r for the shallow reservoir and slightly higher for the Salzach River models (see Table 2). Consequently, SS_fn seems to introduce a reasonable dampening of the correlation tendency under lab conditions and a reasonable amplification of the correlation tendency under natural conditions.
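Assuming Equation 19 takes Murphy's generic form with a perfect similarity of 1.0, i.e., SS_fn = (S_fn − S_fn,base) / (1 − S_fn,base), which parallels the kappa form (p_o − p_e) / (1 − p_e), a minimal sketch is shown below. The function names and the input values are hypothetical.

```python
def skill_score(score, reference, perfect=1.0):
    """Generic skill score: gain over the reference relative to the maximum possible gain.

    0 = no better than the reference, 1 = perfect, < 0 = worse than the reference.
    """
    if perfect == reference:
        raise ValueError("reference already equals the perfect score")
    return (score - reference) / (perfect - reference)


def kappa_from_agreement(p_o, p_e):
    """Kappa has the same form, with the expected agreement p_e as the reference."""
    return skill_score(p_o, p_e)


# hypothetical similarities, for illustration only
ss_fn = skill_score(score=0.90, reference=0.66)  # (0.90 - 0.66) / (1 - 0.66)
```

Under this form, a model that is no better than its placebo scores 0 regardless of how high S_fn,base itself is, which is what removes the arbitrary similarity discussed for the baseline rasters.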

| Relevance of map comparisons for numerical modeling
This study originates from the challenge of evaluating the performance of hydro-morphodynamic numerical models. Based on the numerical model output compared with baseline (placebo) rasters (see Figure 1 and Supporting Information Figure 9 in the supplemental material, C) and the computed global fuzzy similarity values S_fn (Table 2), we verified the hypotheses that introducing fuzziness of location and category through fuzzy kappa map comparison helps to objectively mimic subjective perception. The verification applies particularly to a low number of categories, which enables a reasonable expression of an expert's opinion (e.g., on the shallow reservoir model; Shoarinezhad et al., 2021).
In the case of the Salzach River, the expert-calibrated and stochastically calibrated numerical models suggest the development of a geomorphic landscape pattern corresponding to alternate bars. Because the considered reach of the Salzach River has undergone bank fixation, the pattern produced by the numerical models is reasonable and in good agreement with observations (Beckers et al., 2020; Chang, 1988).
However, in reality, sediment deposition and erosion did not occur exactly at the same places and in the same amounts as predicted by the numerical models (see Figure 6 and Supporting Information Figure 11 in the supplemental material, H). Thus, the numerical models correctly predicted the geomorphic pattern but were imprecise with respect to exact topographic change rates. Yet, the pixel-by-pixel metrics of RMSE and Pearson's r suggest that the numerical models are almost completely wrong (r close to zero), whereas the fuzzy map comparisons slightly soften this strict assessment through higher similarities (Table 2). In addition, the above-discussed skill score SS_fn (Equation 19) accounts for vagueness of location through the fuzzy numerical method, judges a numerical model relative to a baseline reference model, and can therefore be considered a promising approach.
Finally, the geospatial assessment of model accuracy is a major advantage of fuzzy map comparison methods over global pixel-by-pixel statistics because fuzzy comparison maps indicate regions where the model performs particularly well or badly (e.g., visible in the comparison maps in Figures 5 and 6). Thus, fuzzy map comparisons are suitable for the validation of numerical models, but model calibration will require the further development of feedback loops to optimize model performance. Such feedback loops will need to interact with the input parameters of a numerical model and adapt these parameters based on the computed output for every pixel. In this process, the model output can be converted into rudimentary, spatially explicit feedback responses with a fuzzy comparison map that reflects spatially explicit model improvements. For instance, a comparison map could enable the calibration of spatially explicit variables, such as roughness coefficients, in an automated iterative process, which provides new opportunities compared with existing (semi-)automated iterative calibration tools (e.g., Beckers et al., 2020; Doherty, 2015) that adjust global model parameters, such as the Shields (1936) parameter.
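The feedback loop described above is not implemented in this study. A purely hypothetical sketch of a single iteration, assuming a per-pixel comparison map with similarities in [0, 1] and a spatially explicit roughness raster of the same shape, could look as follows; the threshold, the multiplicative update rule, and the function names are illustrative assumptions, not a calibration method.

```python
import numpy as np


def flag_for_recalibration(similarity_map, threshold=0.5):
    """Boolean mask of pixels whose local fuzzy similarity falls below threshold."""
    s = np.asarray(similarity_map, dtype=float)
    mask = s < threshold
    return mask, float(mask.mean())  # mask and fraction of flagged pixels


def update_parameter(field, mask, factor):
    """Hypothetical rule: rescale a spatially explicit parameter (e.g., a roughness
    raster) only where the model was flagged; keep it unchanged elsewhere."""
    return np.where(mask, field * factor, field)


similarity = np.array([[0.9, 0.2], [0.6, 0.4]])  # hypothetical comparison map
roughness = np.full((2, 2), 0.03)  # hypothetical uniform roughness raster
mask, flagged = flag_for_recalibration(similarity, threshold=0.5)
roughness_next = update_parameter(roughness, mask, factor=1.1)
```

In a full loop, the numerical model would be rerun with the updated raster and the comparison map recomputed until the flagged fraction stops decreasing; a physically based rule would still be needed to decide whether roughness should increase or decrease in each flagged region.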
Further research is needed to compare the novel performance statistics (ϱ(κ_f) and ϱ(S_fn)) with other purely data-driven approaches, such as machine learning methods, where decision making is based on numbers rather than intuition. In addition, more and different landform categories (e.g., crests, depressions, or plains) may lead to a more differentiated performance assessment of numerical models of reservoirs. In particular, a categorization of landforms based on standardized landform classification rules (e.g., with the LANDFORM algorithm of Klingseisen et al., 2008) has the potential to provide novel insights into numerical model performance assessment, as an alternative to the natural breaks of erosion and deposition categories used here.

| CONCLUSIONS
By means of map comparison methods, this study concludes that 2D hydro-morphodynamic numerical models are valuable predictors for the morphological evolution of river ecosystems. The viability of fuzzy logic in map comparisons is verified in the form of a fuzzy numerical map comparison method (fuzziness of location only) and a fuzzy kappa map comparison method (additional fuzziness of category). Introducing fuzziness of location generously compensates for model errors with respect to spatial offset of topographic change prediction over several years when the size of a fuzzy pixel is at least four exact (crisp) pixels. Fuzziness of category enables us to objectively reflect subjectively perceived correlation between observed and modeled topographic change, but only if a small number of categories (≤6) is used.
Thus, a fuzzy kappa map comparison compensates less for model errors than a fuzzy numerical map comparison, but better expresses subjectively perceived correlation. In general, a strength of fuzzy map comparisons is the generation of comparison maps that measure model quality spatially explicitly (i.e., per pixel). In the future, spatially explicit fuzzy map comparisons can support the development of efficient methods for (semi-)automated calibration of hydro-morphodynamic numerical models.