Reduced-complexity modeling of braided rivers: Assessing model performance by sensitivity analysis, calibration, and validation



[1] This paper addresses an important question in modeling stream dynamics: how can numerical models of braided stream morphodynamics be rigorously and objectively evaluated against a real case study? Using simulations from the Cellular Automaton Evolutionary Slope and River (CAESAR) reduced-complexity model (RCM) of a 33 km reach of a large gravel bed river (the Tagliamento River, Italy), this paper aims to (i) identify a sound strategy for calibration and validation of RCMs, (ii) investigate the effectiveness of multiperformance model assessments, and (iii) assess the potential of using CAESAR at mesospatial and mesotemporal scales. The approach used has three main steps: first sensitivity analysis (using a screening method and a variance-based method), then calibration, and finally validation. This approach allowed us to analyze 12 input factors initially and then to focus calibration only on the factors identified as most important. Sensitivity analysis and calibration were performed on a 7.5 km subreach, using a hydrological time series of 20 months, while validation was carried out on the whole 33 km study reach over a period of 8 years (2001–2009). CAESAR was able to reproduce the macromorphological changes of the study reach and produced annual bed load sediment estimates consistent with measurements in other large gravel bed rivers, but it showed a poorer performance in reproducing the characteristics of the braided channel (e.g., braiding intensity). The approach developed in this study can be effectively applied in other similar RCM contexts, allowing the use of RCMs not only in an explorative manner but also to obtain quantitative results and scenarios.

1 Introduction

[2] The study of channel adjustments and the prediction of morphological change play a primary role in management and river conservation policies. Nevertheless, the time and space scales of interest for river management (i.e., scales of decades and kilometers) are typically larger than those most frequently adopted for understanding geomorphological processes and modeling [Church, 2007]. Such large scales have been modeled less commonly, using conceptual [Simon, 1989; Surian et al., 2009a], empirical [Rhoads, 1992], or simple 1-D numerical models [Ferguson and Church, 2009; Martín-Vide et al., 2010]. Recent work by Nicholas [2013] has shown that computational fluid dynamic (CFD) models, following a more reductionist approach, can now be applied in those spatial and temporal contexts. Notwithstanding this recent study, no CFD model has been fully evaluated at mesoscales (10–100 km, 10–100 years), especially in braided river systems [Jagers, 2003; Bertoldi and Tubino, 2005; Ferguson, 2007]. Such restricted application of CFD modeling explains the interest in and recent studies using the so-called reduced-complexity models (RCMs), and in particular, cellular models [Nicholas, 2005; Brasington and Richards, 2007; Coulthard et al., 2007].

[3] Existing studies have assessed the ability of RCMs to reproduce laboratory experiments [Doeschl-Wilson and Ashmore, 2005], short river reaches with high topographic detail [Nicholas and Quine, 2007], and geological timescale processes [Coulthard et al., 2002], but little attention has been paid to channel behavior at management time and space scales in actual river systems. There are a number of significant simplifications involved in RCM approaches which can make them less suitable for precise predictions (e.g., the exact location of anabranches or bars in a braided stream) but, on the other hand, very useful for exploring the overall stream morphology (e.g., channel width and braiding intensity), evolutionary trajectory (e.g., narrowing/widening or incision/aggradation), and sediment dynamics (e.g., sediment budget). There have also been recent advances in reduced-complexity hydraulic model (RCHM) implementations [Bates et al., 2010; Neal et al., 2011], and these models could lead to further improvements in RCMs [Coulthard et al., 2013].

[4] However, it is important to understand the performance and limitations of RCMs and, therefore, we need methods able to assess their performance [Aronica et al., 2002; Hall et al., 2005; Lane, 2006]. These methods must account for the limitations inherent in the calibration and validation of this type of model and ideally should concentrate on using field and remote sensing data from real case studies [Nicholas, 2010]. A focus on the validation of RCMs is even more compelling when considering that (i) numerical models in the earth sciences cannot be validated conclusively [Oreskes et al., 1994; Haff, 1996; Lane et al., 2005; Murray, 2007]: a calibrated model can only be “empirically adequate” [Van Fraassen, 1980] and its validation is just a “confirmation” [Oreskes et al., 1994]; (ii) there is no international standard for the validation of fluvial morphodynamic models [Mosselman, 2012]; and (iii) most importantly, validation frameworks previously proposed are typically designed for hydrodynamic CFD models [ASME, 1993; Lane et al., 2005].

[5] Even more attention is required for RCM evaluation in braided river systems. Braided systems are chaotic, in the broad sense of being deterministic systems that show apparently unpredictable behavior [Paola and Foufoula-Georgiou, 2001]. The evolution of these systems is sensitively dependent upon their instantaneous state [Lane and Richards, 1997]. Therefore, it is highly unlikely that any braided river model, however sophisticated and physically based, will reproduce the exact time-dependent evolution of a braided system [Lane, 2006]. This is because of (i) the uncertainties in model initial and boundary conditions (e.g., inflow distribution of water and sediments, spatial distribution of grain size), (ii) the upscaling problems in bank erosion and nonuniform sediment transport modeling approaches [Mosselman, 2012], and (iii) the partially unknown nonlinear feedbacks in the coupled flow-morphology-sediment transport linkage [Haff, 1996]. This implies that the evolution of a braided system is predictable to some extent on short timescales, but prediction progressively breaks down over long timescales [Paola, 2001]. Therefore, as in this case study, model evaluation can be carried out by comparison of specific morphological features (e.g., channel bed elevation, position of flowing channels) for short timescales, while it seems more appropriate to refer to statistical characteristics of braided river systems (e.g., braiding intensity, average channel width) when dealing with long timescales [Nicholas, 2005]. To generate data useful for braided river model validation, some system-scale generalizations of braided river features and behavior are required.
There is a broad literature referring to braiding indices focused on planform [Mosley, 1983; Bridge, 1993; Egozi and Ashmore, 2008] and topographic aspects [Doeschl-Wilson and Ashmore, 2005; Doeschl et al., 2006] as well as methods designed to characterize the spatial pattern of a braided river [Murray and Paola, 1996; Sapozhnikov et al., 1998].

[6] Equifinality is a further crucial issue in morphodynamic RCM modeling. It is a well-known issue in hydrological [Beven, 2006] and hydraulic modeling [Aronica et al., 1998; Thorndahl et al., 2008], but it has a long tradition in geomorphology and landscape modeling too [Beven, 1996]. In such contexts, “equifinality” has mainly been used to express the concept that “similar landforms [may be] derived from different initial conditions in different ways by possibly different processes” [Beven, 1996]. Beven [2006] proposed a further, wider meaning of the term strictly related to the evaluation of environmental models: equifinality is a matter of finding a model setting “that satisfies some conditions of acceptability.” The consequences of equifinality are uncertainty in inference and prediction, which become crucial in RCM modeling of braided rivers because of the high number of processes modeled (e.g., hydraulics, sediment transport, bank erosion, vegetation growth) and the nonlinear nature of the modeled system.

[7] In order to achieve a model evaluation that is as solid as possible and based on real data, a comprehensive model evaluation strategy should also include a sensitivity analysis (SA). SA methods are important in order to reduce the number of parameters that require tuning to obtain the best fit between model outcomes and real data. SA methods identify factors that do or do not have a significant influence on model simulations of real-world observations. A large variety of SA techniques already exist [Lane et al., 1994; Saltelli et al., 2000; Hall et al., 2009; US EPA Crem, 2009], and selection of the most appropriate technique for a particular modeling context is often difficult because each technique has its own strengths and weaknesses. Therefore, the choice of SA depends mainly on (i) the problem that the modeler intends to solve, (ii) the characteristics of the model, (iii) the quality of reference data, and (iv) operational issues such as the computational cost of running the model.

[8] In this case study, the Cellular Automaton Evolutionary Slope and River (CAESAR) cellular model [Coulthard et al., 2002; Van De Wiel et al., 2007] was applied and evaluated on a 33 km braided reach of the Tagliamento River, Italy. The model, calibrated and validated, was used to (i) investigate sediment budget at reach scale, (ii) clarify links between human impact and morphological adjustments over the last 200 years, and (iii) set a what-if scenario framework with potential evolutionary future trajectories related to different sediment management strategies [Ziliani and Surian, 2012]. The main aims of this paper are (i) to identify a sound strategy for evaluating the performance of RCMs, (ii) to investigate the effectiveness of multiperformance model assessments, and (iii) to assess potentials of CAESAR at mesospatial and mesotemporal scales.

[9] The paper is organized as follows. In the first section, a brief description is provided of the study reach, the CAESAR model, the input data, and the performance evaluation techniques. A general overview of the workflow concludes this section. The second and third sections are dedicated to the sensitivity analysis and to calibration-validation. Finally, we critically discuss the results and examine the strengths and weaknesses of the RCM application strategy used and of CAESAR's capability in simulating morphological evolution and estimating sediment budgets.

2 CAESAR Application to the Tagliamento River

2.1 The Tagliamento River and the Study Reach

[10] The Tagliamento River is located in northeastern Italy, in the Friuli Venezia Giulia Region. It drains a 2580 km2 basin and has a length of 178 km (Figure 1a). There are several studies that have examined the Tagliamento River system, which is considered the last large natural Alpine river in Europe [Kollmann et al., 1999; Gurnell et al., 2000; Tockner et al., 2003; Gurnell and Petts, 2006; Bertoldi et al., 2009; Surian et al., 2009a; Surian et al., 2009b]. Despite its ecomorphological quality, human activities, including channelization, gravel mining, and torrent control works in the drainage basin, have led to sediment flux modifications and notable morphological adjustments during the last 200 years. Strong narrowing and moderate bed incision occurred from the 1950s to the first half of the 1990s, especially in the middle and lower river reaches [Ziliani and Surian, 2012]. Over the last 15–20 years, the almost complete cessation of gravel mining activity has coincided with a morphological recovery (average widening of 3.8 m/yr and slight bed aggradation of about 20 cm) which is still in progress [Ziliani and Surian, 2012].

Figure 1.

(a) General setting of the Tagliamento River and of the study reach, (b) validation reach (reach A in text), and (c) sensitivity analysis and calibration reach (reach B in text).

[11] The study reach is located between Pinzano and Carbona (Figure 1b, reach A) and is a braided reach of 33 km in length, with a 756 m average active channel width (140 m minimum, 1850 m maximum, 2009 data) and an average slope of 0.0035. The whole of reach A was chosen for the model validation, and a shorter reach (reach B, 7.5 km in length, with an average active channel width of 600 m and a slope of 0.0029) was selected for the SA and calibration steps (Figure 1c). Reach B was selected because its upstream edge is near the Casarsa bridges (Figure 1b), providing a stable morphological boundary that allowed fixed flow/sediment input points for the SA and calibration simulations. Reach A was selected for the following aspects: (i) its homogeneous channel configuration (braided morphology), (ii) the morphology of the upstream section (Pinzano gorge), where a gauging station is located (Figure 1), (iii) the well-documented historical and recent channel changes in this reach [Ziliani and Surian, 2012], and (iv) the relative abundance of data needed to set up, calibrate, and validate the model (i.e., topographic data, grain size measurements, hydrological series, and aerial and satellite images).

[12] The analysis of the historical morphological changes of the reach [Ziliani and Surian, 2012] highlighted a moderate widening process from the beginning of the 1990s to 2009 (54 m on average). Analysis of controlling factors indicated that the temporal trajectory of the width adjustment in this recent period is strictly linked to the frequency and intensity of formative discharges. The widening trend was highly variable in this period; in particular, in the 2001–2007 period a slight channel narrowing occurred (5 m), followed by a major widening between 2007 and 2009 (30 m). The net average channel width change between 2001 and 2009 is 25 m. This net width variation can be considered moderate, if compared to the average channel width of the reach (~750 m), but it was associated with several local bank retreats, up to 150–200 m, and with a notable internal dynamism of the system, because about 12% of the active channel (i.e., an area with an equivalent width of about 99 m) underwent a morphological change from active channel to island/floodplain or vice versa.

2.2 CAESAR Model

[13] CAESAR (Cellular Automaton Evolutionary Slope and River) is a fluvial geomorphology RCM developed initially as a landscape evolution model [Coulthard et al., 1998; Coulthard et al., 2002] to investigate catchment and alluvial fan evolution and later used to study morphological changes in river reaches [Coulthard and Van De Wiel, 2006; Coulthard et al., 2007]. CAESAR shares its essential features with precursor “coupled map lattice” models [Kaneko, 1993] and cellular models [Murray and Paola, 1994, 1997] (hereafter the MP model). Model variables are defined on a grid, and their values change according to equations meant to represent approximations of fluvial processes. In CAESAR (version 6.0j was used in this work), these equations are arranged in four main components: the flow model, the fluvial erosion/deposition scheme, the lateral erosion model, and the vegetation model.

[14] The CAESAR flow model uses a “flow-sweeping” (also called “multiscanning”) algorithm (for details see Coulthard et al. [2002], Van De Wiel et al. [2007], and Murray [2007]). During a sweep, the discharge is routed from a donor cell to a range of cells in front of it, identified through a “sweep width,” that is, the number of receiving cells (factor 9 in Table 1). As in the MP model, water is routed from each cell into multiple neighboring cells, but water depths are estimated so that water surface elevation can be used. Discharge is distributed to all receiving cells according to the difference between the water surface elevation of the donor cell and the bed elevation of the receiving cells. Only cells whose discharge exceeds a minimum value distribute discharge (Table 1, factor 7). After each flow-sweeping routine, flow depth and velocity are calculated from the distributed discharge using Manning's equation (roughness value fixed at 0.030 s m−1/3).
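The sweep rule described above lends itself to a compact illustration. The following is a minimal, hypothetical sketch (our own names and simplifications, not CAESAR's actual code) of distributing a donor cell's discharge by positive head difference and then recovering depth from Manning's equation:

```python
# Hypothetical sketch of the flow-sweeping distribution rule: a donor cell
# routes discharge to its receiving cells in proportion to the head
# difference (donor water surface elevation minus receiver bed elevation).

MIN_Q = 0.15       # m^3/s, minimum discharge for a cell to route flow (factor 7)
MANNING_N = 0.030  # s m^(-1/3), fixed roughness value

def route_discharge(donor_q, donor_wse, receiver_beds):
    """Split donor_q among receiving cells by positive head difference."""
    if donor_q < MIN_Q:
        return [0.0] * len(receiver_beds)
    heads = [max(donor_wse - zb, 0.0) for zb in receiver_beds]
    total = sum(heads)
    if total == 0.0:
        return [0.0] * len(receiver_beds)
    return [donor_q * h / total for h in heads]

def manning_depth(q, width, slope):
    """Invert Manning's equation for a wide rectangular cell (R ~ depth)."""
    # q = (1/n) * width * d^(5/3) * slope^(1/2)  =>  solve for depth d
    return (q * MANNING_N / (width * slope ** 0.5)) ** (3.0 / 5.0)

# donor at water surface 101.0 m routing 10 m^3/s to three receiving cells
qs = route_discharge(10.0, 101.0, [100.0, 100.5, 100.8])
```

Lower receiving beds (larger head differences) capture proportionally more of the discharge, which is the essence of the water-surface-based routing described above.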

Table 1. Description of the CAESAR Model Factors Analyzed (a)

| Code | Factor | Type (b) | Min | Max | No. of Levels | Sensitivity Analysis: Screening | Sensitivity Analysis: Variance-Based | Calibration | Validation Setting |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | lateral erosion rate | Q | 0.001 | 0.1 | 5 | x | x | x | 0.002 |
| 2 | number of passes for edge smoothing filter | Q | 0 | 200 | 5 | x | x | | 50 |
| 3 | lateral law (c) | C | 0 | 1 | 2 | x | | | 1 |
| 4 | max erode limit (m) | Q | 0.005 | 0.1 | 5 | x | x | x | 0.075 |
| 5 | min time step (s) | Q | 0 | 30 | 3 | x | | | 0 |
| 6 | slope used to calculate Tau | C | 0 | 2 | 3 | x | | | 2 |
| 7 | min Q for depth calculation (m3 s−1) | Q | 0.05 | 0.15 | 3 | x | | | 0.15 |
| 8 | water depth above which erosion can happen (m) | Q | 0.01 | 0.1 | 3 | x | x | | 0.03 |
| 9 | flow distribution width (no. of cells) | C | 3 | 7 | 5 | x | | | 3 |
| 10 | suspended sediment load (d) | C | 0 (no) | 1 | 2 | x | | | 0 |
| 11 | bed load transport formula (e) | C | 0 | 1 | 2 | x | | | 0 |
| 12 | vegetation critical shear (N m−2) | Q | 0 | 300 | 5 | x | | | 1 |

(a) All factors are configurable using the model graphical user interface.
(b) C, conceptual factor; Q, quantitative factor.
(c) 0, edge value law; 1, radius of curvature law.
(d) 0, suspended transport module not used; 1, suspended transport module used.
(e) 0, Einstein [1950] formula; 1, Wilcock and Crowe [2003] formula.

[15] The derived flow depths and velocities are then used to model erosion, transport, and deposition of sediments, with the shear stress determined from velocity and used to drive the Einstein [1950] or Wilcock and Crowe [2003] sediment transport formulas (Table 1, factor 11). The bed shear stress can also be calculated adopting the bed slope-depth approach [Wilcock, 1993] (Table 1, factor 6), with the bed slope calculated either as the effective bed slope between donor and receiving cells or as the maximum bed slope in the eight neighboring cell directions (D8). CAESAR allows up to nine separate grain size classes to be modeled, with sediment transported as bed load or suspended load according to user specification (Table 1, factor 10). Bed load is routed according to the bed slope, distributed proportionally to the local bed slope to the D8 neighboring cells with bed elevations lower than the donor cell. Sediment transport can take place only in cells with a water depth higher than a minimum user-defined value (Table 1, factor 8). The model uses a variable time step that is restricted to keep the amount of erosion and deposition that can occur in a cell below a user-defined limit (Table 1, factor 4). This prevents numerical instabilities or “chequer-board” effects [Hunter et al., 2005] caused by altering local slopes too rapidly. In addition, a minimum value for the time step can be fixed by the user (Table 1, factor 5).
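The variable time step control (factor 4) can be illustrated with a short, hypothetical sketch; the constants and function names below are ours, not CAESAR's:

```python
# Illustrative sketch (our own simplification) of a variable time step that
# caps per-cell elevation change at the "max erode limit" (Table 1, factor 4),
# avoiding instabilities from changing local slopes too rapidly.

MAX_ERODE_LIMIT = 0.075  # m of elevation change per step (validation setting)
MIN_TIME_STEP = 0.0      # s, user-defined floor on the step (factor 5)

def adapt_time_step(dt, erosion_rates):
    """Shrink dt so that max |rate| * dt stays at or below the erode limit."""
    peak = max(abs(r) for r in erosion_rates)  # fastest eroding/aggrading cell, m/s
    if peak * dt > MAX_ERODE_LIMIT:
        dt = MAX_ERODE_LIMIT / peak
    return max(dt, MIN_TIME_STEP)
```

With a peak rate of 0.002 m/s a 100 s step would move a cell 0.2 m, so the step is cut back until the change fits under the limit.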

[16] The lateral erosion model in CAESAR is based on an empirical scheme initially proposed in the bend theory of river meandering [Ikeda et al., 1981]. The model relates lateral bank migration to a constant bank erosion coefficient (Table 1, factor 1) and to the near-bank flow velocity, assumed to be a function of local channel curvature. Local curvature can be calculated in two similar ways: (i) directly, using the so-called cellular automata curvature term (Rca; see Coulthard and Van De Wiel [2006] for details) or (ii) through a local curvature value empirically related to Rca (Table 1, factor 3). The level of smoothing applied to the Rca grid values needs to be defined (Table 1, factor 2), as it alters the variability of the curvature term: smaller values lead to larger curvature terms, giving rise to tighter, more closely spaced meander bends, while larger values produce the opposite. A full description of the lateral erosion model and related factors (2, 3) is given in Coulthard and Van De Wiel [2006].

[17] The role of vegetation is integrated in CAESAR through a simple linear vegetation growth model. Vegetation maturity ranges between 0 and 1, growing over time and reaching its maximum value after a period of time which needs to be defined (e.g., 5 years for this case study). When vegetation cover maturity exceeds 0.5, no erosion can occur. While a cell is submerged, vegetation dies at a rate twice that of growth. If bed shear stress increases above a defined threshold value (Table 1, factor 12), vegetation is scoured away and growth is reset to zero. Moreover, vegetation can grow through sediments if buried, but if the burying layer is too thick, vegetation dies back and regrows at the surface. There are no additional feedbacks with vegetation, e.g., affecting evaporation rates or roughness parameters.
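The growth/dieback rules above can be sketched as follows (a minimal illustration under our own assumptions; the function names and clamping details are hypothetical, not CAESAR's implementation):

```python
# Minimal sketch of a linear vegetation maturity model: linear growth to 1
# over a fixed period, dieback twice as fast when submerged, and a reset to
# zero when shear stress exceeds the critical threshold (Table 1, factor 12).

YEARS_TO_MATURITY = 5.0  # period used in this case study
CRITICAL_SHEAR = 1.0     # N/m^2 (validation setting)

def update_maturity(maturity, dt_years, submerged, shear_stress):
    """Advance vegetation maturity (0..1) over dt_years."""
    if shear_stress > CRITICAL_SHEAR:
        return 0.0  # vegetation scoured away, growth reset
    rate = 1.0 / YEARS_TO_MATURITY
    if submerged:
        maturity -= 2.0 * rate * dt_years  # dieback twice as fast as growth
    else:
        maturity += rate * dt_years
    return min(max(maturity, 0.0), 1.0)

def erodible(maturity):
    """No erosion can occur once cover maturity exceeds 0.5."""
    return maturity <= 0.5
```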

[18] The addition of the modules briefly described above allows us to consider CAESAR a potentially useful predictive model, a step beyond purely explorative models such as the MP model [Murray, 2007]. The goal in using CAESAR is to make quantitative predictions, “and the appropriate way to test the model is to compare predictions to measurements” [Murray, 2007]. CAESAR is one of the few RCMs applicable at mesotemporal-spatial scales (1–100 years, 10–100 km) for quantitative predictions, but it still needs to be evaluated against real case studies at these scales.

2.3 Input Data for the Model

[19] Data required include defined boundaries of the geometric domain, flow and sediment input discharges, initial bed grain size composition, vegetation cover, and the presence and persistence over time of channelization structures (e.g., groynes, levees, bank protections). The initial bed elevation was defined using a resampled ground digital elevation model (bilinear interpolation, cell dimension 25 × 25 m, original 2 × 2 m) obtained from LiDAR data captured between 9 and 12 April 2001. The LiDAR data were acquired during less than ideal hydrometric conditions, with a discharge of 140 m3 s−1 and an associated wet area of about 30% of the whole active channel. The elevation of the wet areas could not be estimated because field surveys were not carried out at that time and high water turbidity prevented reliable bathymetry estimates from aerial photographs using depth retrieval models [Legleiter et al., 2004]. In the wet areas, the water surface return was used. Due to water turbidity, there were very few wet areas without LiDAR data. The “No Data” areas were filled using a linear interpolation of the elevations of neighboring cells. Despite this lack of bathymetric information, the digital elevation model (DEM) was considered suitable for the application because (i) the spatial-temporal scales used in the modeling (33 km and 8 years, respectively) required a low-resolution DEM and (ii) the lack of bathymetry in a braided channel morphology has a relatively small impact on the elevation of the large DEM cells used here (i.e., 25 × 25 m).

[20] Input flow data were derived from a historical series available at the Pinzano gauging station, located at the upstream section of reach A and 24 km upstream of reach B (Figure 1). These data were generated by combining hydrometric data at Pinzano with discharge measurements at Venzone station (about 25 km upstream of Pinzano), corrected by a constant factor of 1.15 to account for the contribution of the Arzino Torrent, which joins the Tagliamento upstream of Pinzano [Maione and Machne, 1982; Autorità di Bacino dei fiumi Isonzo, Tagliamento, Livenza, Piave, Brenta-Bacchiglione, 1998]. The same discharge time series was applied at the inlet of reach B, although there is a slight increase of drainage area downstream of Pinzano (about 5%) due to the confluence of the Cosa Torrent (see Figure 1). Preliminary runs, sensitivity analysis, and calibration were performed using the hydrometric series between 9 April 2001 and 30 November 2002 (Figure 2). Validation runs cover a longer period (about 8 years), from 9 April 2001 to 14 May 2009 (Figure 2). The 2001–2009 hydrograph covers a highly representative period of the record, with several flood events (recurrence intervals between 2 and 12 years) and a brief period of acute shortage of formative discharges (in 2005). Discharge inputs to all simulations were filtered to remove flows below a minimum formative threshold of 100 m3 s−1. This threshold value is below the discharge needed for sediment movement, which was estimated at 125 m3 s−1 by Surian et al. [2009b] and Mao and Surian [2010] for a reach near Casarsa and at 150 m3 s−1 by Bertoldi et al. [2010] for another braided reach located 7 km upstream of Pinzano. This filtering allowed a significant reduction in the effective simulated time (i.e., 226 days effectively simulated instead of 2957 days) and, therefore, in computational time.
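The thresholding of the input hydrograph can be illustrated with a small sketch (the discharge values below are invented for illustration only):

```python
# Sketch of the hydrograph preprocessing described above: hourly discharges
# below the 100 m^3/s formative threshold are dropped before simulation,
# which shrinks the effective simulated time.

THRESHOLD = 100.0  # m^3/s, minimum formative discharge

def effective_series(hourly_q, threshold=THRESHOLD):
    """Keep only the hours at or above the formative threshold."""
    return [q for q in hourly_q if q >= threshold]

hourly_q = [40.0, 80.0, 150.0, 420.0, 95.0, 130.0]  # invented example data
kept = effective_series(hourly_q)
reduction = 1.0 - len(kept) / len(hourly_q)  # fraction of time removed
```

In the actual application this filtering reduced the simulated record from 2957 days to 226 effectively simulated days.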

Figure 2.

Discharge data used as flow input to the model: (A) series used for preliminary, sensitivity analysis, and calibration runs (9 April 2001 to 30 November 2002); (B) validation series (9 April 2001 to 14 May 2009). Grey lines: hourly discharges not simulated because they are lower than the threshold discharge (100 m3 s−1) defined for sediment motion. Black lines: simulated discharges (above the threshold discharge).

[21] Grain size data were obtained from a study conducted in 2006–2007 by Catani et al. [2007], from which we extracted a probability density curve (D50 ≈ 32 mm, D15 ≈ 16 mm, D84 ≈ 46 mm) with nine fixed classes. The distribution across these nine classes was adopted for the whole reach because (i) CAESAR has a limit of nine individual grain sizes and (ii) the available data did not allow us to determine a comprehensive spatially variable grain size distribution. Moreover, the resolution of the available aerial photos prevented us from estimating such spatially variable sediment distributions, even for the surficial layer [Carbonneau et al., 2004].

[22] Vegetation cover and channelization structures were digitized from the 2001 aerial photos. Only areas with arboreal vegetation (islands and recent terraces) [Comiti et al., 2011] were digitized and then used within CAESAR as the initial vegetation cover condition, with maturity fixed to one. Bank protection structures, groynes, and levees were also digitized, including structures still effective in 2001 as well as undamaged but no longer effective structures built since the nineteenth century.

[23] The lack of bed load input data at the top of our study reach led us to use the sediment recirculation option available in CAESAR to define the sediment flux input at the upstream reach boundary. This implied setting the so-called “sediment proportion recirculated” (SPR) factor, a graphical user interface (GUI) configurable parameter that specifies the percentage of the sediment output flowing across the downstream boundary that is reintroduced at the upstream input cells (coincident with the flow input cells). This constant proportion is applied to the recirculated sediment volumes of every grain class. The output sediment volumes at the downstream boundary are distributed equally to all upstream input cells. Input sediment volumes at or below the bed load transport capacity defined at the input cells by the sediment transport formula are fully transported downstream; otherwise, the part of the sediment discharge that exceeds the bed load capacity is stored in the input cells.
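The recirculation rule can be sketched as follows (our own simplified reading of the SPR option; the names and the per-cell capacity treatment are assumptions, not CAESAR's code):

```python
# Illustrative sketch of the "sediment proportion recirculated" (SPR) option:
# a fixed fraction of the sediment leaving the downstream boundary is fed
# back, split equally across the upstream input cells; input beyond the local
# transport capacity is stored at the inlet.

SPR = 0.5  # fraction of output sediment recirculated (example value)

def recirculate(output_by_class, n_input_cells, capacity_per_cell):
    """Return (transported, stored) volumes per input cell, per grain class."""
    transported, stored = [], []
    for vol in output_by_class:
        per_cell = SPR * vol / n_input_cells  # equal split across inlet cells
        moved = min(per_cell, capacity_per_cell)
        transported.append(moved)
        stored.append(per_cell - moved)  # excess over capacity is stored
    return transported, stored
```

For example, with two grain classes exporting 100 and 20 volume units, ten inlet cells, and a per-cell capacity of 4 units, the coarse class exceeds capacity and part of it is stored at the inlet.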

2.4 Performance Evaluation Technique

[24] A model can be represented as a generic function f that relates input factors to model output values. In a model evaluation process, the output values to be taken into account should be (i) readily obtainable from the model (e.g., water stage in monitored sections, daily output water discharge), (ii) able to fully describe the model's ability to reproduce specific aspects of interest, and (iii) comparable with real data to evaluate specific performances. Most of the output factors used in this work were chosen with reference to a performance technique defined by Aronica et al. [2002], using indices developed by Bates and De Roo [2000] and Horritt and Bates [2001b] specifically for data available in raster format.

[25] These performance indices can be used to globally quantify the capability of a model to spatially reproduce process effects (e.g., vegetation cover evolution, lateral erosion activity). They determine a pixel-to-pixel correspondence between a spatially distributed model output variable (e.g., inundated area, vegetated area, erosion or deposition area) and raster data (real or reference data) with the same spatial resolution. A performance index F can be generically expressed in the form:

$$ F = \frac{\sum_{i=1}^{n}\left[(R_1M_1)_i + (R_0M_0)_i - (M_1R_0)_i - (M_0R_1)_i\right]}{n} \tag{1} $$

where Smodel and Sreal are, respectively, the area predicted by the model and the real area; n is the number of pixels into which those areas have been rasterized; (M1R0)i and (M0R1)i denote, respectively, a pixel where the variable is predicted by the model (M1) but absent in the field data (R0) and a pixel where the same quantity is absent in the model output raster (M0) but present in the field data (R1); and (R1M1)i and (R0M0)i are pixels where the model has correctly predicted the presence (R1M1) or absence (R0M0) of the investigated variable.
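The pixel-to-pixel bookkeeping behind such indices can be sketched in a few lines. The exact combination of terms in F follows Aronica et al. [2002]; the combination used below (matches rewarded, mismatches penalized, normalized by pixel count) is only one plausible form, chosen for illustration:

```python
# Sketch of a pixel-to-pixel performance index computed from two rasters of
# equal resolution, here flattened to 0/1 lists (model output vs. reference).

def agreement_counts(model, real):
    """Count the four agreement/disagreement terms over paired pixels."""
    r1m1 = sum(1 for m, r in zip(model, real) if m == 1 and r == 1)  # hit
    r0m0 = sum(1 for m, r in zip(model, real) if m == 0 and r == 0)  # correct absence
    m1r0 = sum(1 for m, r in zip(model, real) if m == 1 and r == 0)  # overprediction
    m0r1 = sum(1 for m, r in zip(model, real) if m == 0 and r == 1)  # miss
    return r1m1, r0m0, m1r0, m0r1

def performance_f(model, real):
    """One plausible F: reward matches, penalize mismatches, divide by n."""
    r1m1, r0m0, m1r0, m0r1 = agreement_counts(model, real)
    n = len(model)
    return (r1m1 + r0m0 - m1r0 - m0r1) / n

# toy 5-pixel rasters (e.g., wet = 1, dry = 0)
model = [1, 1, 0, 0, 1]
real  = [1, 0, 0, 1, 1]
```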

[26] As shown in Figure 3, during each stage of the workflow different sets of performance indices were used, depending on the computational cost estimated for each stage and on the typology, quality, and density of the available evaluation data. A multiperformance index approach was used (i) to evaluate different modeling capabilities, (ii) to reinforce each single test, and (iii) to overcome, at least partly, equifinality issues.

Figure 3.

Flowchart and main characteristics of each single step of the methodology: number of runs, number of factors analyzed, performance indices used, and total computational time demand.

[27] Wet area (i.e., flowing channel) performance (Fw), vegetated area performance (Fveg), and active channel area (i.e., area occupied by exposed sediment or flowing channels) performance (Fac) were calculated using expression (1). The elevation performance indices (Fe, Fsec) were calculated using the following expression:

$$ F = \frac{\sum_{j=1}^{3}\sum_{i=1}^{n}(R_jM_j)_i}{n} \tag{2} $$

where Rj and Mj (j = 1–3) refer to the bed elevation change class of, respectively, the real-data and model output cells (j = 1, eroded cell; j = 2, cell with no significant elevation change; j = 3, aggraded cell; the elevation change threshold is 0.1 m, about twice the D84). Index Fe refers to bed elevation changes within the whole active channel area, comparing the distributed elevation change output after every single run to the same output provided by a reference run (i.e., the run with the highest Fw performance). Fsec, instead, compares cell elevation changes across eight topographic cross sections surveyed in 2003, excluding those areas that were wet in 2001 (i.e., those areas where the LiDAR data correspond to the water surface). Fcs is the only performance index that differs entirely from the Aronica et al. [2002] scheme. It was introduced with the aim of evaluating the computational demand of every single run during the preliminary and sensitivity analysis stages (Figure 3). It is calculated as the ratio between the number of numerical iterations of the fastest simulation carried out during the stage and the number of iterations performed during the single run (the number of iterations and the computation speed are directly related, regardless of the computing power).
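The three-class comparison behind Fe and Fsec can be sketched as follows (our own simplification; the class encoding and function names are hypothetical):

```python
# Sketch of the three-class elevation comparison: each cell's elevation
# change is classified as eroded, unchanged, or aggraded using the 0.1 m
# threshold (about twice the D84), and the index is the fraction of cells
# whose modeled class matches the reference class.

THRESH = 0.1  # m, elevation change threshold

def classify(dz, thresh=THRESH):
    """Map an elevation change (m) to class 1 (eroded), 2, or 3 (aggraded)."""
    if dz <= -thresh:
        return 1
    if dz >= thresh:
        return 3
    return 2  # elevation invariance

def elevation_index(model_dz, real_dz):
    """Fraction of cells whose modeled class matches the reference class."""
    matches = sum(1 for m, r in zip(model_dz, real_dz)
                  if classify(m) == classify(r))
    return matches / len(model_dz)
```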

[28] In addition to the orthophotos associated with the 2001 LiDAR and covering the whole of reach A (image ground resolution of 0.5 m/pixel), other aerial photos were sourced and georeferenced using the Carta Tecnica Regionale at 1:5000 scale as a base layer. A set of aerial photos covering just reach B was acquired toward the end of the flow event of 30 November 2002 (Q = 420 m3 s−1, 0.7 m/pixel resolution); other available aerial photos covering the whole of reach A were taken on 14 May 2009 (Q = 130 m3 s−1, 0.3 m/pixel). This series of aerial photographs (2001, 2002, and 2009) was used to digitize the wet areas (i.e., flowing channels), active channels, islands, and channelization structures. Topographic cross-section surveys were also used for the SA and calibration stages. Cross sections surveyed in May 2003 at eight monumented sections equally distributed along reach B were used (Figure 1c). The temporal gap between the 2002 aerial photos (November 2002) and the topographic surveys (May 2003) is considered not significant given the absence of formative events between those dates (Figure 2).

2.5 Workflow to Assess Model Performance

[29] The workflow adopted in this study is organized as a simple sequence of distinct analyses (see the flowchart boxes in Figure 3), each providing information essential for the following stage, but designed as a whole to assess a generic RCM application and to understand its capability as fully as possible. The activity began with a preliminary exploration phase used to define the DEM cell size. The choice of a 25 × 25 m cell size took into account three main purposes: (i) to ensure low computational demand for every SA run and reasonable computational time, (ii) to maintain the same cell size throughout all the steps of the application, and therefore (iii) to intentionally exclude from the analysis the cell dimension factor, which may significantly affect the results of cellular models [Doeschl-Wilson and Ashmore, 2005; Doeschl et al., 2006; Nicholas and Quine, 2007].

[30] Long computational times and nonlinearity of the output response to single parameter variations were established during the preliminary runs. Thus, it was necessary to adopt a modeling strategy that was (i) economical (i.e., providing the largest number of evaluations with the smallest number of model runs), (ii) effective (i.e., investigating the domain of existence of the parameters as extensively as possible), and (iii) independent of modeler subjectivity. Considering (i) the large number of factors chosen for investigation (12), (ii) the quantitative and conceptual nature of the factors, (iii) the available computational power, limited to simple desktop machines, and hence (iv) the computational cost of every single run in both reaches (i.e., in reach A, but also in reach B), we discarded both Monte Carlo techniques and the automatic calibration procedures well known in hydraulics [Fabio et al., 2010; Dung et al., 2011] and hydrological modeling [Engeland et al., 2006; Van Griensven et al., 2006; Parajka et al., 2007]. We therefore adopted a more computationally sustainable two-step sensitivity analysis [Campolongo et al., 1999], followed by manual calibration and validation (Figure 3).

[31] The two SA methods used in this study differ significantly in (i) design of experiments (DOE) principles, (ii) sampling strategy, (iii) number of analyzed factors, and (iv) overall computational demand. The first SA stage was carried out to analyze effectively, at an explorative screening level, 12 factors selected among those directly user-configurable via the GUI (Figure 3 and Table 1). The second stage was instead applied to the five most significant quantitative factors (Figure 3 and Table 1) that emerged from the first SA, adopting a comprehensive variance-based approach with quasi-random sampling. Both SA stages were carried out in reach B (Figure 1), simulating the 2001–2002 hydrograph (Figure 2).

[32] The identification of key parameters achieved by the SA allowed us to address the calibration step more easily, tuning only the two most important quantitative parameters among the 12 initially investigated. The performance of the calibration runs was checked in the same spatial and temporal context as the SA steps (reach B, 2001–2002 hydrograph), evaluating performance against the 2002 digitized data and elevation performance against the 2003 topographic surveys. Once the model was calibrated, we proceeded to its validation, applying the model to reach A and simulating an 8 year period. Validation performance was assessed using morphological characteristics derived from the 2009 aerial photos and by comparing the mean annual bed load yield with estimates obtained in other rivers.

3 Sensitivity Analysis

[33] “Sensitivity analysis is the study of how variations in the output of a model can be apportioned, qualitatively or quantitatively, to different sources of variation, and of how the response of the model depends upon the information fed into it” [Saltelli et al., 2000]. It is widely recognized that sensitivity analysis (SA) is a prerequisite for model building in any setting where models are used [Lane et al., 1994; McKay, 1997; Saltelli et al., 2000; Hall et al., 2005; US EPA Crem, 2009]. Good modeling practice requires that the modeler proceed with an assessment of model confidence, using SA techniques to estimate the importance of each factor and to identify the factors most relevant for calibration. The term “factor” denotes any input that can be changed in a model prior to its execution. Originally, SA was designed only to deal with the uncertainty associated with model parameters. Over time, the term factor has acquired a broader meaning, incorporating uncertainties associated with the use of alternative model structures or alternative modeling assumptions.

[34] In this paper, SA was used to increase confidence in CAESAR, providing a sound approach to understanding how model results depend on variations of the input factors and identifying the key factors for calibration. The strategy adopted in this work involves two subsequent SA stages [Campolongo et al., 1999]: an exploratory screening method to identify a reduced set of important factors out of a large number, followed by a more accurate, quantitative variance-based method applied to the reduced set. Both analyses were purely exploratory, without reliance on prior statistical information to guide the sampling strategy.

[35] The SA focuses only on factors (quantitative and conceptual) that are configurable by the model GUI and does not consider all possible factors that may influence model performance (e.g., sediment layers scheme parameters, sediment transport formulas parameters, cell size dimension). In other words, this SA concerns exclusively uncertainty in model parameters, rather than uncertainty that may be generated through changes in the model input data. This choice was motivated by practical reasons: (i) during an explorative application, sensitive parameters are potentially extremely numerous and a full analysis is unfeasible in practice, and (ii) the sensitivity to input data uncertainty (e.g., topography, discharge, grain size distribution) deserves specific evaluations that are beyond the main goals of this work.

3.1 Step 1 of the Sensitivity Analysis: Screening Method

[36] Factor screening methods are mainly applied to models with numerous factors and therefore a considerably sized input space. In this paper, the method of Morris [1991], later extended by Campolongo et al. [2007], was applied. Its basic concept is to vary each factor individually (one-at-a-time) and compute the deviation of the model output from the previous numerical experiment (the so-called “elementary effect”):

dij = [y(x1, …, xi + Δi, …, xk) − y(x1, …, xk)]/Δi    (3)

where dij is the value of the jth elementary effect (j = 1, …, r) associated with the ith factor (i = 1, …, k; e.g., i = 1 refers to the “lateral erosion rate” factor, see the code column in Table 1), k is the number of analyzed factors (here 12), y(x1, x2, …, xk) is the value of the selected model output (e.g., one of the performance indices calculated in this SA stage; see the performance index columns in Table 1), r is the number of repetitions (i.e., the number of elementary effects calculated for each factor), and Δi is the value of the incremental step associated with the ith factor. The DOE proposed by the Morris method covers the space of existence of each individual factor, suitably discretized into a number of levels. The incremental step is the fixed interval by which the range of uncertainty of each factor is discretized (e.g., for the lateral erosion rate factor: investigated range [0.001–0.1], number of levels set to 5, incremental step equal to 0.0025; see Table 1). The main effect μ of a factor is then estimated by computing a number (r) of elementary effects (at different points, randomly extracted in the factor domain) and taking the average of their absolute values. This measure, μ, is an estimate of the factor influence on the model output considered. A second measure is the standard deviation, σ, of the same set of elementary effects. A high standard deviation associated with a factor signals that it likely has complex (nonlinear) interactions with other factors. The random selection of starting points (i.e., the combinations of the k factors at which the model output and the single elementary effects are calculated) reduces the dependence of the SA results on the choice of specific starting points.
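As an illustration, the elementary-effect computation can be sketched in Python for a toy model function. This is a minimal variant of the Morris design, not the actual DOE, which was generated with the R “sensitivity” package; the trajectory construction and the toy function are assumptions.

```python
import numpy as np

def morris_sample(k, r, levels, seed=0):
    """Generate r one-at-a-time trajectories of k+1 points on a k-dimensional
    grid with the given number of levels (minimal Morris-style design)."""
    rng = np.random.default_rng(seed)
    delta = levels / (2.0 * (levels - 1))        # standard Morris step
    grid = np.arange(levels) / (levels - 1)
    trajectories = []
    for _ in range(r):
        x = rng.choice(grid[grid <= 1 - delta], size=k)  # random base point
        order = rng.permutation(k)                       # factor move order
        traj = [x.copy()]
        for i in order:
            x = x.copy()
            x[i] += delta                                # move one factor
            traj.append(x.copy())
        trajectories.append((order, np.array(traj)))
    return trajectories, delta

def morris_measures(f, trajectories, delta):
    """Compute mu (mean absolute elementary effect) and sigma per factor."""
    k = trajectories[0][1].shape[1]
    effects = [[] for _ in range(k)]
    for order, traj in trajectories:
        y = [f(p) for p in traj]
        for step, i in enumerate(order):
            effects[i].append((y[step + 1] - y[step]) / delta)
    effects = np.array(effects)
    return np.abs(effects).mean(axis=1), effects.std(axis=1)
```

For a linear toy function such as f(x) = x1 + 0.1 x2, the estimated μ values recover the coefficients and σ is near zero, since the elementary effects are constant.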

[37] In this stage, k = 12 mutually independent GUI-configurable factors were analyzed (all the factors are described in section 2.2). Seven of these factors are quantitative, while the others are conceptual (see Table 1). The software package “sensitivity”, available within the R Statistical Environment [Pujol, 2009], was used to create the DOE. A total of 130 “evaluations” or “experiments” (= r × (k + 1)) was obtained, setting the number of repetitions to r = 10 (in order to keep the computational time within feasible limits), with the ranges and numbers of levels shown in Table 1. For each model evaluation (i.e., a single model run), the three performance indices defined in section 2 were computed (Fw, Fcs, and Fe). Once the 130 runs were completed, boxplots were plotted for every analyzed factor (e.g., the boxplot associated with the lateral erosion rate factor; Figure 4). The response of the model in terms of computational speed (Fcs) was poor, with just 12% of the evaluations requiring fewer iterations than the reference computational demand (i.e., about 13,000 iterations, corresponding to 2 years of effective time simulated in 4 days of computation using the best CPU among those indicated in Figure 3). The wet area performance ranged between 17% and 74% (mean 53%, standard deviation 14%, 75th percentile 64%; Figure 5), while the erosion index ranged between 7% and 46% (mean 22%, standard deviation 7%, 75th percentile 27%). For this screening stage, the absolute performance variation between pairs of runs (i.e., the elementary effect) is the most important result, rather than the absolute performance range, because the variation of performance shows the influence of the factor, not the absolute performance value itself.

Figure 4.

Example of boxplot outcomes from Morris method (lateral erosion rate factor). Plots were produced using all the Morris method runs (i.e., 130 runs).

Figure 5.

Example of wet area performance index outcomes from the Morris method. (a) Wet area digitized using the 30 November 2002 aerial photos; wet area model output of runs (b) 102 and (d) 52; overlay results associated with wet area index performance values of (c) 75% and (e) 17%.

[38] The analysis of the 130 runs provided, for each performance index, a sample of 130 performance values and 120 elementary effects (10 for each factor). Hence, the mean (μ) and standard deviation (σ) of the elementary effects could be calculated for each factor and performance index. To help visualize the influence of all factors, in terms of both magnitude and nonlinearity, the mean and standard deviation values were plotted (Figure 6). Concerning nonlinearity, the scatterplots in Figure 6 revealed an expected nonlinear model response to several factors. Factors 2, 4, 6, and 9 show a nonlinear influence on Fw and Fe, whereas the computational speed (Fcs) seems to be influenced nonlinearly mainly by factors 4 and 12 (see Table 1 for the factor codes). Although only qualitative, these observations justified the application of the second SA stage at a more accurate level of investigation, restricting the scope to a smaller number of quantitative, most relevant factors. To focus more clearly on the magnitude of factor influence, Figure 6 was complemented by Figure 7. The estimated μ values were used to establish a first-order ranking of factors. In particular, the rank values associated with each factor for each performance index were used to calculate a cumulative global score, as shown in Figure 7. Four groups of factors with increasing levels of relevance clearly emerge. The five most important quantitative factors identified by the screening method were considered for the variance-based analysis.
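The rank aggregation behind the cumulative global score can be sketched as follows. This is a minimal Python illustration; the exact scoring used to build Figure 7 may differ, and the input values below are hypothetical.

```python
import numpy as np

def cumulative_ranking(mu_star):
    """mu_star: array of shape (n_indices, k) holding the mean absolute
    elementary effect of each factor for each performance index. For each
    index, factors are ranked by influence (1 = least influential, k = most)
    and the ranks are summed across indices into a cumulative global score."""
    mu_star = np.asarray(mu_star, float)
    # double argsort turns values into within-row ranks (0-based), then +1
    ranks = mu_star.argsort(axis=1).argsort(axis=1) + 1
    return ranks.sum(axis=0)
```

Factors with the highest cumulative score across all performance indices form the most relevant group carried forward to the variance-based stage.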

Figure 6.

Estimates of the mean (μ in the text) and standard deviation (σ in the text) of the elementary effect distributions associated with the 12 factors analyzed. (a) Fw, wet area performance index estimates; (b) Fe, erosion pattern performance index estimates; (c) Fcs, computational speed performance index estimates.

Figure 7.

Factor ranking provided by screening sensitivity analysis step (Morris method). Factors 2, 4, and 6 are the most important; factors 5, 7, and 10 have a negligible influence on model output.

3.2 Step 2 of the Sensitivity Analysis: Variance-Based Method

[39] In variance-based methods, we are interested in the variance of the model output and its decomposition into components according to each input factor [Saltelli et al., 2000]. Intuitively, an input factor is considered important if, when fixing its value within its range of uncertainty, the expected conditional variance associated with the output of the model reduces significantly. Importantly, variance-based methods are model independent: they work regardless of whether the model is linear or additive. We can therefore also look for the presence of interactions amongst the input factors for nonlinear, nonadditive models. Variance-based methods can also capture the influence of the full range of variation of each factor and they are capable of dealing with groups of factors. For example, uncertain factors might pertain to different logical types, and it might be desirable to decompose the uncertainty according to these types. The drawback of variance-based measures is their computational cost in terms of number of model runs and the fact that the information on the uncertainty of the model output is captured by the second-order moment.

[40] Given the “model function” f defined over a k-dimensional unit hypercube domain Ωk = (X|0 ≤ xi ≤ 1; i = 1,…,k), where k is the number of independent factors, the decomposition of the total output variance V(Y) is

V(Y) = Σi Vi + Σi Σj>i Vij + … + V12…k    (4)


where

Vi = V[E(Y | xi = xi*)]    (5)

is the variance of the conditional expectation of Y, taken with factor xi fixed at the value xi*, and

Vij = V[E(Y | xi, xj)] − Vi − Vj    (6)

is the second-order term (and so on for the higher-order variance terms). Dividing both sides of expression (4) by the total unconditional variance V(Y), the following expression is obtained

1 = Σi Si + Σi Σj>i Sij + … + S12…k    (7)

which represents the summation of the so-called “sensitivity indices.” Two sets of indices provide a good description of model sensitivities: (i) the first-order sensitivity index (8) which quantifies the main direct effect induced by the ith factor on the output and (ii) the total sensitivity index which is a measure of the overall effect that the ith factor induces on the model output, including all its possible synergetic interaction terms with all the other factors (9) [Homma and Saltelli, 1996]

Si = Vi/V(Y)    (8)

STi = 1 − V[E(Y | X~i)]/V(Y)    (9)

[41] The first-order sensitivity index (8) is a normalized index that ranges between 0 and 1 as Vi varies between zero and V(Y). More specifically, the total effect (9) accounts for the first-order effect plus all higher-order effects due to interactions and is given by the sum of all the sensitivity indices that include the factor in question, excluding those that do not contain it (see the term V[E(Y | X~i)] in (9)). That is, according to the theory, STi ≥ Si. In practice, (i) a high first-order sensitivity index (i.e., close to one) means that the factor has a significant effect (i.e., high sensitivity) on the model output and deserves further investigation in a calibration procedure, and (ii) a low total effect sensitivity index (i.e., close to zero) means that the factor has little influence on the model output, so it can be fixed, simplifying the model structure and easing calibration. Sensitivity index values provided by the method are only estimates of the real values; that is, they always have an associated “bootstrap interval”, i.e., an uncertainty interval with an associated confidence level (90% adopted here, with the 5th percentile as lower bound and the 95th percentile as upper bound).

[42] The method developed by Saltelli et al. [2010] was selected to estimate such indices. This choice was due to (i) the basic conceptual simplicity of the method and (ii) the availability of Matlab routines provided by the JRC-IPSC research group (Joint Research Centre–Institute for the Protection and the Security of the Citizen, Ispra, Italy). The DOE needed to execute the variance-based analysis uses, as a basis, random points selected over the space Ωk of the input factors [Saltelli et al., 2010]. For the generation of this sample, we decided to use the quasi-Monte Carlo technique [Sobol, 1967] rather than a simple random number generator. The quasi-random sampling strategy ensures a better uniformity of the sample set and faster convergence of the sensitivity estimates than simple random sampling [Tarantola et al., 2012]. The commercial implementation of the Sobol sequence, SobolSeq8192 (developed by BRODA Ltd and available as C++ code [Tarantola et al., 2012]) was adopted, which generates uniformly distributed sequences on the unit interval.
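The Saltelli et al. [2010] estimators for Si and STi can be sketched as follows. This is an illustrative Python sketch, not the JRC Matlab routines: plain Monte Carlo sampling stands in for the SobolSeq8192 quasi-random sequence (which converges faster), and the test function used below is hypothetical.

```python
import numpy as np

def sobol_indices(f, k, n, seed=0):
    """Estimate first-order (Si) and total (STi) Sobol indices on the unit
    hypercube with the Saltelli et al. [2010] estimators, using two
    independent sample matrices A and B plus the k 'radial' matrices ABi."""
    rng = np.random.default_rng(seed)
    A = rng.random((n, k))
    B = rng.random((n, k))
    fA = np.apply_along_axis(f, 1, A)
    fB = np.apply_along_axis(f, 1, B)
    var = np.var(np.concatenate([fA, fB]))   # total output variance V(Y)
    Si, STi = np.empty(k), np.empty(k)
    for i in range(k):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                  # A with column i taken from B
        fABi = np.apply_along_axis(f, 1, ABi)
        Si[i] = np.mean(fB * (fABi - fA)) / var          # first-order effect
        STi[i] = 0.5 * np.mean((fA - fABi) ** 2) / var   # total effect
    return Si, STi
```

The cost is n(k + 2) model evaluations, which is exactly why the paper grew the DOE in increments (71, 148, 246 runs) rather than fixing n in advance.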

[43] The top five quantitative factors (see the distinction between conceptual and quantitative factors in Table 1) identified by the screening method (Figure 7) were selected for the variance-based SA. The seven factors not analyzed in the variance-based SA were fixed at the values used for the run that provided the best combined performances Fcs and Fw in the screening analysis. The performance indices used in the variance-based SA partially differ from those used during the screening SA (Figure 3). Computational speed performance (Fcs) and wet area performance (Fw) were still used, but the elevation performance index (Fe) was replaced by a conceptually similar index, Fsec.

[44] During the second SA stage, the DOE dimension was increased progressively in order to converge toward more satisfactory values of the sensitivity indices (first-order sensitivity indices, Si, Table 2; total sensitivity indices, STi, Table 3), yet within a reasonable computing time. A total of 246 model evaluations was carried out in three subsequent stages. The analysis of the Si indices clearly shows that the estimates obtained after the first two stages (71 and 148 runs, respectively) were not satisfactory. Indeed, the bootstrap intervals (i.e., the range between the upper and lower bounds of the estimated indices) were extremely wide. As an example, the Si estimates obtained for outputs Fsec and Fveg in the first and second stages of the analysis are reported in Table 2. Even at the end of the third stage, after 246 simulations, the Si estimates had not converged to satisfactory values.

Table 2. First-Order Sensitivity Index (Si) Estimates Achieved at the Second Sensitivity Analysis Stage (Lower Bound / Index Estimate / Higher Bound; Only the Reported Outputs Are Shown)

Stage 1, Output 4 (Fsec):
  Factor 1: −107.70 / −54.74 / 2.42
  Factor 2: −23.65 / −4.70 / 12.65
  Factor 4: −20.73 / −1.22 / 21.77
  Factor 8: −119.38 / −61.77 / −11.50
  Factor 12: −16.55 / −1.68 / 14.40

Stage 2, Output 3 (Fveg):
  Factor 1: −16.05 / −2.43 / 9.70
  Factor 2: −11.22 / −5.48 / −1.79
  Factor 4: −4.15 / 1.41 / 7.28
  Factor 8: −16.01 / −6.28 / 3.09
  Factor 12: −0.80 / 0.60 / 1.78
Table 3. Total Sensitivity Indices Estimates Achieved at Second Sensitivity Analysis Stage
  Output 1 FcsOutput 2 FwOutput 3 FvegOutput 4 Fsec
StageFactor CodeLower BoundIndex EstimateHigher BoundLower BoundIndex EstimateHigher BoundLower BoundIndex EstimateHigher BoundLower BoundIndex EstimateHigher Bound

[45] In general, convergence of the sensitivity index estimates is reached when all the estimates lie between 0 and 1 and the uncertainty intervals associated with them are narrow. However, only a sufficient DOE dimension (i.e., number of runs) can guarantee proper convergence of the estimates (according to the theory, convergence is ensured only as the number of samples tends to infinity). This dimension cannot be known prior to the application of the method. Therefore, it is necessary to proceed step by step until (i) the absolute index estimates and the associated bootstrap interval amplitudes stabilize, or (ii) partial results can at least be drawn from relative magnitudes (i.e., some index estimates are consistently higher than the others). The latter condition occurred in this case study. Theoretical convergence proved computationally unfeasible, but clear results were derived after the third DOE dimension increment step (see the step mark in Tables 2 and 3). In particular, the first-order sensitivity index for factor 4 (maximum erode limit) shows a significant behavior, with estimates always higher than those of all the other factors (see entries in boldface in Table 2). Therefore, it could be concluded with a reasonable level of confidence that factor 4 can be more accurately calibrated than the other factors.
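A percentile bootstrap of the kind used to bracket the index estimates can be sketched as follows. This is illustrative Python only; the JRC Matlab routines bootstrap the sensitivity estimators themselves, whereas here a generic statistic of per-sample contributions is resampled.

```python
import numpy as np

def bootstrap_interval(samples, statistic, n_boot=1000, alpha=0.10, seed=0):
    """Percentile bootstrap interval for a statistic: resample with
    replacement n_boot times and take the 5th-95th percentiles, i.e. the 90%
    confidence level adopted in the paper."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, float)
    n = samples.size
    stats = np.array([statistic(samples[rng.integers(0, n, n)])
                      for _ in range(n_boot)])
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

Tracking how the width of such intervals shrinks as runs are added is one practical way to implement the step-by-step stopping rule described above.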

[46] Unlike the first-order indices, the STi converged to reliable estimates (although not strictly lower than 1) after the second stage of the analysis. Table 3 (see entries in boldface) shows that factors 2 (number of passes for the lateral smoothing routine) and 12 (vegetation critical shear stress) have no significant influence on any performance index (index estimates ranged between 0 and 0.19). Therefore, these two factors can be considered to have no effect on the output. Referring to Fcs, the STi for factor 4 shows that this is the most influential factor. Factor 4 strongly influences the model not only by itself, as shown by the first-order indices in Table 2, but also by interacting with other factors (see Table 3, entries with text in parentheses). Factor 4 is followed in order of importance by factors 1 (lateral erosion rate) and 8 (water depth above which erosion can occur), respectively. In particular, factor 1 is dominant for both vegetation cover performance (output 3, Fveg) and elevation performance (output 4, Fsec; see entries in italics in Table 3). Factor 8, instead, is relevant only for wet area performance (output 2, Fw; see the Table 3 entry in bold italic font). The second SA stage thus allowed us to identify the less sensitive factors (2, 8, and 12) and to simplify the model for the subsequent calibration step. The calibration was carried out considering just two factors: maximum erode limit (4) and lateral erosion rate (1).

4 Calibration and Validation

[47] Earlier we described the classical concepts of calibration and validation presented by Darby and Van De Wiel [2003], referring to the American Society of Civil Engineers [1998] generic modeling framework procedure. It is well known that process-based models in fluvial geomorphology are difficult to calibrate [Hoey et al., 2003; Wilcock and Iverson, 2003]. Formann et al. [2007] and Papanicolaou et al. [2008] also showed that calibration difficulties grow with the complexity of the processes modeled and the number of parameters to be estimated. Considering these issues, we used the SA results to restrict the calibration process to no more than two factors.

[48] According to the SA results, the lateral erosion rate (1) and maximum erode limit (4) factors were selected for calibration. The other three factors investigated during the second SA stage were fixed at the values that provided the best combined performances Fcs and Fw over the whole sample of runs carried out in the SA (376 runs). A two-dimensional calibration domain was defined by resizing the investigated factor ranges of maximum erode limit [0.03–0.1] and lateral erosion rate [0.001–0.045]. The ranges were selected by visually analyzing the Fw performance scatterplot created using all 376 SA runs. The two-dimensional parameter space was reduced in size to allow the definition of a DOE that covered the whole domain homogeneously. A regular coverage was achieved with a total of 99 runs (see the grid points in Figure 8), and the performance indices Fw, Fveg, and Fsec were calculated. Using a kriging interpolation technique (ordinary kriging with an exponential semivariogram model; package ‘spatial’ version 7.3.1, available in the R Statistical Environment), contour-interpolated surfaces were generated for each performance index in order to identify graphically a global maximum performance point (Figure 8). This point was visually identified at the following value combination: maximum erode limit = 0.075 and lateral erosion rate = 0.002 (Fw 76%, Fveg 90.75%, Fsec 69.5%; Figures 8a–8c). This pair of values was used for the validation runs (see the validation factor setting in Table 1).
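The interpolation step can be sketched with a minimal ordinary-kriging routine. This is illustrative Python standing in for the R ‘spatial’ routines used in the paper; the semivariogram parameters below (sill, range, nugget) are hypothetical defaults, not the fitted values.

```python
import numpy as np

def ordinary_kriging(xy, z, grid_xy, sill=1.0, rng_par=0.02, nugget=0.0):
    """Ordinary kriging with an exponential semivariogram
    gamma(h) = nugget + sill * (1 - exp(-h / rng_par)).
    xy: (n, 2) sample locations; z: (n,) values (e.g., Fw at DOE points);
    grid_xy: (m, 2) prediction locations."""
    xy, z, grid_xy = map(np.asarray, (xy, z, grid_xy))
    gamma = lambda h: nugget + sill * (1.0 - np.exp(-h / rng_par))
    n = len(xy)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    # Kriging system: semivariances plus a Lagrange-multiplier row/column
    # enforcing that the weights sum to one.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    out = np.empty(len(grid_xy))
    for j, g in enumerate(grid_xy):
        b = np.ones(n + 1)
        b[:n] = gamma(np.linalg.norm(xy - g, axis=1))
        w = np.linalg.solve(A, b)
        out[j] = w[:n] @ z    # weighted average of sample values
    return out
```

Evaluating such a surface on a fine grid and taking its argmax is one way to locate the global maximum performance point that the paper identified visually on the contoured surfaces.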

Figure 8.

Kriging interpolation of performance index values obtained by calibration DOE. (a) Fw, wet area performance index interpolation; (b) Fveg, vegetation cover performance index interpolation; (c) Fsec, erosion and deposition performance in surveyed topographic sections. Calibration setting: maximum erode limit = 0.075 m; lateral erosion rate = 0.002 (see square data marker).

[49] Figure 8 shows that the two most relevant quantitative factors (lateral erosion rate and maximum erode limit) have different impacts on model performance. Maximum erode limit is clearly less influential on the development of vegetation cover, as shown in Figure 8b: the Fveg surface is similar to a plane with a constant slope toward the vertical axis. That is, regardless of the value of the maximum erode limit factor, simply decreasing the lateral erosion rate to the lowest value of its range yields the best performance values. This indicates that, for the calibrated case study, an excess of lateral erosion can produce too much vegetation removal, lowering the global performance of the model concerning the dynamics of vegetated areas. Figure 8c highlights some aspects concerning the Fsec performance index, treated as an indicator of model reliability in predicting bed elevation changes. This figure shows that (i) the areas of maximum performance are located in a narrow band next to the vertical axis, corresponding to the lowest lateral erosion rate values, regardless of the maximum erode limit; (ii) the maximum erode limit factor has a slight influence on Fsec at higher values of lateral erosion rate; and (iii) when associated with lower lateral erosion rate values, the maximum erode limit loses importance for the elevation response of the model. This indicates that (i) high values of lateral erosion rate produce high bank retreat rates, negatively affecting the global elevation response and making more relevant the efficiency of the in-channel movement of the sediment eroded from the banks (related to the value of the maximum erode limit factor), and (ii) at the lowest values of lateral erosion rate, the maximum erode limit becomes insignificant for the bed elevation performance. Less clear information can be derived from Figure 8a, which refers to Fw.
The wet area performance seems to be unevenly influenced by the two main factors analyzed, as shown by the irregular, patchy aspect of the performance surface. Finally, it is worth noting that the maximum performance area lies on the edge of the explored domain (Figure 8). This may suggest the following considerations: (i) the best performance may lie outside the domain investigated by the sensitivity analysis and calibration; specifically, the best combination could be achieved for a lateral erosion rate value within the range [0–0.001]; and (ii) the incremental step used to investigate the lateral erosion rate factor domain could be changed, carrying out a new sensitivity analysis applied to a different parameter domain. The latter option was discarded because it was judged unfeasible, although worthwhile for further research.

[50] The validation step led us to reconsider the sediment proportion recirculated (SPR) factor. In the sensitivity analysis and calibration runs, the recirculation option was used with SPR set equal to 1 to simulate the upstream sediment input boundary condition reasonably. This setting presumes overall elevation stability at the reach scale and a final bed configuration (in terms of grain mixture and bed morphology) strictly dependent on the initial conditions, akin to a recirculating flume experiment [Parker and Wilcock, 1993]. This hypothesis was considered acceptable for reach B because (i) the reach is 7.5 km long and characterized by a longitudinally constant geometry (i.e., channel width, slope, and pattern); (ii) the simulations referred to a relatively short period (April 2001 to November 2002) during which local human activities (gravel mining, bank protection construction) were not significant; (iii) the results of a historical channel adjustment analysis [Ziliani, 2011] and a detailed knowledge of the reach acquired directly in the field [Surian et al., 2009b] indicate that in the period 2001–2009 the channel was stable or slightly incising; and (iv) using an alternative boundary sediment discharge hypothesis (e.g., setting an upstream sediment transport capacity curve) would have been a source of additional uncertainty.

[51] However, unlike reach B, validation reach A is not morphologically homogeneous. The reach displays a braided pattern, but channel width varies from a minimum of about 140 m at Pinzano to a maximum of 1800 m at Spilimbergo (Figure 1). In addition, channel slope and width differ between the upstream end of the reach (slope 0.005, width 140 m) and the downstream end at Carbona (slope 0.002, width 555 m). Therefore, different bed load sediment transport capacities may be associated with different sections of the reach. Furthermore, historical analysis over recent timescales (2001–2009) showed that, overall, reach A is slightly aggrading (~0.2 m [see Ziliani and Surian, 2012]). Taking these aspects into account, the validation runs were carried out using SPR values greater than 1 (corresponding to an average annual sediment output yield of about 60 × 103 m3 yr−1); specifically, SPR was set equal to 1.5 and 2.

[52] Unlike the SA and calibration stages, which used shorter times and smaller spatial extents, the validation stage, applied at mesotemporal and mesospatial scales, led us to use a statistical approach to assess the global performance of the model. Here the performance indices used in the previous phases (Fw, Fveg, Fac) were complemented with other morphological parameters able to characterize the modeled reach in a more general statistical way (see Table 4, Part I). These included (i) the mean active channel width change (∆Lac), (ii) the equivalent wet area width (Lw), and (iii) the mean braiding index [Egozi and Ashmore, 2008]. The results summarized in Table 4 show a vegetation cover performance of about 92% and an active channel area performance of about 85%. Output values of mean active channel width confirmed these results, although the difference between the real mean active channel width change (25 m) and the modeled change (≈50 m) corresponds to the DEM cell size (25 m). Despite this, the ability of the model to reproduce morphological patterns was poorer: the braiding index observed in the 2009 aerial photos is twice that measured on the modeled inundated areas, the patterns of modeled and real inundated areas differ substantially (Figure 9), and the wet area performance index was low (Fw ≈ 23%).
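A channel-count braiding index of the kind compared here can be sketched on a boolean wet-area raster. This is an illustrative Python sketch of the Egozi and Ashmore [2008] channel-count approach; treating each grid row as a cross section is an assumption for simplicity.

```python
import numpy as np

def braiding_index(wet_grid):
    """Channel-count braiding index: mean number of distinct wet channels
    intersected per cross section, here taken as each row of a boolean
    wet-area raster."""
    wet = np.asarray(wet_grid, bool)
    counts = []
    for row in wet:
        # pad with dry cells and count dry-to-wet transitions, i.e. the
        # number of contiguous wet runs along the cross section
        transitions = np.diff(np.concatenate(([0], row.astype(int), [0])))
        counts.append((transitions == 1).sum())
    return float(np.mean(counts))
```

Applying the same counting rule to the digitized 2009 wet area and to the modeled inundated area gives directly comparable values, which is how a factor-of-two discrepancy such as the one reported here can be quantified.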

Table 4. Results of the Two Validation Runs Considering Morphological Features and Bed Load Sediment Yield Estimates

Part I: Morphological Features Evaluation

| | Active Channel 2009 (Vector Data) | Active Channel 2009 (Rasterized Data, 25 × 25 m) | CAESAR Results |
|---|---|---|---|
| Active channel width (m) | 756 | 756 | 782–786 |
| Active channel width change (m) | 24 | 24 | 50–54 |
| Wet area width (m) | 236 | 230 | 604–616 |
| Braiding index (channels counted) | – | – | 2.8 |
| Performance, active channel (%) | | | 84.9–85.0 |
| Performance, vegetated area (%) | | | 91.6 |
| Performance, wet area (%) | | | 23.6–23.3 |

Part II: Bed Load Annual Yield in Some Gravel Bed Rivers Compared to Model Results

| | Surian and Cisotto [2007] | Martin and Church [1995] | Ham and Church [2000] | Nicholas [2000] | Liebault et al. [2008] | CAESAR Results |
|---|---|---|---|---|---|---|
| River name | Brenta | Vedder | Chilliwack | Waimakariri, reach (a) km 10.1–13.3; reach (b) km 16.7–21.1 | (a) Eygues; (b) Lower Drome | Tagliamento |
| Local slope | 0.0040 | 0.0035–0.0046 | 0.0062 | (a) 0.0031; (b) 0.0048 | – | 0.0035 |
| Active channel width (m) | 250 | 98–245 (50 m at Vedder Crossing Bridge) | 50 m at Vedder Crossing Bridge | (a) 570 m*; (b) 950 m* | – | 756 |
| D50 (mm) | 26 | 30 | 32 | 32–37 | – | 32 |
| Drainage basin area (km2) | 1567 | 1230 | 1230 | 2460 | (a) 1100; (b) 1640 | 2580 |
| Mean annual discharge (m3 s−1) | 67 | – | 60 | 120 | – | 90 |
| Reach morphology | Wandering/braided | Braided | Braided | Braided | – | Braided |
| Method used | Morphological | Morphological | Morphological | Sediment transport model | (a) Morphological; (b) sediment trap measurements | CAESAR |
| Mean annual sediment yield (103 m3 yr−1) | 70.0 | 36.6 ± 5.6 (27.5–157.0) | 1952–1966: 4.9 ± 5; 1966–1973: 7.5 ± 3; 1973–1983: 22 ± 10; 1983–1991: 55 ± 10 | (a) 60; (b) 310 | (a) 66; (b) 35 | 60.4–62.0 |

* Braidplain width at Q = 1500 m3 s−1.
Figure 9. Morphological features validation results. (a) Channel morphology digitized from 14 May 2009 aerial photos; (b) wet area output from the model at the end of the validation run (14 May 2009); (c) evaluation of active channel area performance; (d) evaluation of wet area performance.

[53] To complement the morphological performance evaluations, the mean annual bed load sediment yield was estimated at the downstream end of reach A (see Table 4, Part II) and along the whole reach (Figure 10). Because field measurements of sediment transport were not available, reference values were derived from other similar case studies. At the downstream end, the mean annual bed load yields obtained from the two validation runs were 60 × 103 and 62 × 103 m3 yr−1 for SPR 1.5 and 2, respectively, which are comparable to those estimated in other similar rivers [Griffiths, 1979; Martin and Church, 1995; Ham and Church, 2000; Surian and Cisotto, 2007; Liebault et al., 2008]. The maximum mean annual values (85 × 103 m3 yr−1 at Spilimbergo and 92 × 103 m3 yr−1 4 km upstream of the Casarsa subreach, for SPR 1.5) are also comparable to those river reaches. Modeled mean annual sediment yield varies significantly along the reach (by up to 40%), and large differences also exist between the maximum and minimum annual sediment transport: the 2005 minimum corresponds to a reach-averaged mean annual sediment yield of about 10 × 103 m3 yr−1, versus a 2002 maximum of about 200 × 103 m3 yr−1 (Figure 10). Sediment transport values obtained from the model correspond very well with the estimates reported by Nicholas [2000] and Griffiths [1979] for the braided Waimakariri River (60–310 × 103 m3 yr−1), which displays morphological characteristics very similar to the Tagliamento. Good agreement also exists with annual sediment yield estimates for other gravel bed rivers, such as the Brenta River (70 × 103 m3 yr−1 [Surian and Cisotto, 2007]), the Eygues River (66 × 103 m3 yr−1 [Liebault et al., 2008]), the Chilliwack River (4.9–55 ± 10 × 103 m3 yr−1 [Ham and Church, 2000]), and the Vedder River (27–157 × 103 m3 yr−1 [Martin and Church, 1995]; Table 4).

Figure 10. Mean, maximum (in 2002), and minimum (in 2005) modeled annual bed load sediment transport estimated for the 2001–2009 period along the whole study reach (Pinzano-Carbona).

5 Discussion

5.1 Comments on RCMs Application Strategy

[54] The method adopted here (Figure 3) has no equivalent among RCM applications to fluvial morphodynamics at these mesospatial and mesotemporal scales. It represents an effort to implement a rigorous, quantitative, field-based application strategy, as clearly advocated in previous research on reduced-complexity modeling [Brasington and Richards, 2007]. This method allowed us to apply the CAESAR model in a systematic way and to explore its capability to reproduce the morphological evolution of a braided reach at mesoscales.

[55] Alternatively, validation and calibration of RCMs could be performed manually in a “trial-and-error” fashion or by adopting an automatic calibration process. The first strategy is subjective and time consuming and depends heavily on the expertise of the modeler; the second is complex, often computationally unfeasible, and requires a priori probabilistic information. The strategy used here minimized those problems, bridging the lack of a priori statistical information on factors lacking a physical basis and allowing a complete model application that can be considered a meaningful test of the RCM used. The strategy allowed us to analyze 12 factors and then focus calibration on the two most important quantitative factors. Overall, the number of simulations undertaken (547 in total) is smaller than or comparable to those carried out in other works [Saltelli et al., 2000; Aronica et al., 2002; Hall et al., 2005]. Setting aside the different computational power of the desktop CPUs used (see Figure 3), the average computational demand of a single run decreased considerably from the first SA stage (1.25 days per simulation) to the second SA stage (about 6 h per simulation) and the calibration runs (about 2 h per simulation). In addition, comparison of the computational demand of the preliminary runs with that of the calibration runs shows that the workflow increased model efficiency: using the same simulation space-timescales and the same CPU power, the 99 calibration runs were performed in 8 days of computation, while the 72 preliminary runs took 85 days (Figure 3).

[56] The adoption of a two-step sensitivity analysis, using complementary techniques with different levels of detail and sampling methods, was worthwhile. The Morris method was successful despite the significant computational time required (over 160 days of single-core computation). We consider the total computational demand reasonable in relation to (i) the large number of factors analyzed, (ii) the modest computing power deliberately used, and (iii) the considerable size of the investigated domain. Thanks to its global sampling strategy and its ability to handle purely conceptual factors, the Morris method allowed us to use the model without deep prior confidence in it or any preliminary factor ranking, thus avoiding a full Monte Carlo analysis.
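As an illustration of the screening logic, a minimal elementary-effects (Morris) sketch is given below. It is a generic sketch of the method (random one-at-a-time trajectories, mu* and sigma per factor), not the actual design, factor set, or sampler used in the study:

```python
import numpy as np

def morris_elementary_effects(model, n_factors, n_traj, delta=0.25, seed=0):
    """Morris screening on the unit hypercube: for each random trajectory,
    perturb one factor at a time by `delta` and record the elementary effect.
    Returns mu* (mean absolute effect) and sigma (spread) per factor."""
    rng = np.random.default_rng(seed)
    effects = [[] for _ in range(n_factors)]
    for _ in range(n_traj):
        x = rng.uniform(0, 1 - delta, n_factors)  # trajectory base point
        y0 = model(x)
        for i in rng.permutation(n_factors):      # one-at-a-time steps
            x_new = x.copy()
            x_new[i] += delta
            y1 = model(x_new)
            effects[i].append((y1 - y0) / delta)
            x, y0 = x_new, y1                     # continue from the new point
    mu_star = np.array([np.mean(np.abs(e)) for e in effects])
    sigma = np.array([np.std(e) for e in effects])
    return mu_star, sigma

# Toy model: factor 0 dominates, factor 1 is weakly nonlinear, factor 2 is inert
f = lambda x: 5 * x[0] + x[1] ** 2 + 0 * x[2]
mu_star, sigma = morris_elementary_effects(f, n_factors=3, n_traj=20)
print(np.argmax(mu_star))  # prints 0: factor 0 is ranked most influential
```

In the study, each "model evaluation" is a full CAESAR run, which is why the screening stage dominated the computational budget.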

[57] The Saltelli et al. [2010] method (stage 2 of the SA) was also found to be computationally efficient. The results reported in Tables 2 and 3 point out the difficulty of achieving converging sensitivity indices, due to the dimensionality of the SA design and the relatively small size of the evaluation sample. Tables 2 and 3 also highlight the value of integrating the results of the first-order (Si) and total (STi) sensitivity indices when assessing factor relevance. Few indications can be derived from the first-order indices, whereas useful information comes from the total indices, which show better convergence. The final STi values corroborate the Si analysis concerning the maximum erode limit factor (4) and reveal additional crucial roles for the number of passes of smoothing factor (2) and the vegetation critical shear factor (12) that were otherwise undetectable. Moreover, thanks to the STi results, it was possible to identify the lateral erosion rate factor (1) as the second most important quantitative factor to submit to calibration. This result was highly valuable considering that, after the first SA stage, lateral erosion rate had a low sensitivity rank (Figure 7), similar to that of the vegetation critical shear factor and lower than that of the number of passes of smoothing factor. In other words, conclusions drawn after the screening analysis alone would have been partially misleading, and the second SA stage avoided such errors.

[58] The analysis of the 12 model factors is therefore fundamental to establishing the relative importance of the factors and thus the overall model performance. The screening method is the only step of the analysis that provides sensitivity information for all the factors. This step highlights that several factors that might be expected to drive the model's process representation (i.e., factor 3, lateral erosion law; factor 9, flow distribution width) have only a low or moderate impact on channel evolution. Furthermore, factors that could influence sediment transport (i.e., factor 5, min time step; factor 7, minimum discharge for depth calculation; factor 10, suspended load presence; factor 8, water depth above which erosion can happen) turned out to have a low impact on the modeled morphology. The second sensitivity step can confirm the results of the Morris analysis but may also provide contrasting evidence, as for the lateral erosion rate factor. Overall (as shown in Figure 6), the max erode limit (factor 4) is the most relevant factor influencing sediment and flow dynamics, controlling the amount and intensity of sediment transport during every cycle step. In addition, the bed slope factor (6) has a notable impact on the erosion process (Figure 6b), as bed shear stress is directly related to slope and therefore to sediment motion. The role of factor 9 deserves highlighting: it strongly controls the water performance (Figure 6a) but only slightly influences erosion behavior and computation speed (Figures 6b and 6c).

[59] As pointed out in section 3, spatial data resolution was also excluded from the analysis, although it is a further potentially sensitive factor. It was excluded because, in most cases, it represents a restriction imposed on the modeler rather than a factor to adjust in pursuit of better-performing simulations. The choice of spatial resolution is therefore one the modeler has to make a priori in order to cover spatial and temporal contexts similar to those investigated here. There is considerable research examining the influence of topographic detail on model output using sensitivity analysis, especially in hydrodynamics [Bates et al., 1998; Horritt and Bates, 2001a; Casas et al., 2006; Frank et al., 2007; Legleiter et al., 2011] and landscape evolution modeling [Hancock, 2005; Hancock et al., 2006; Hancock, 2006]. We are aware that the factor ranking obtained in this study may depend on data resolution (i.e., DEM cell size), but accounting for spatial resolution in this study was judged unfeasible.

5.2 Evaluation of the Performance Assessment Techniques

[60] During the calibration stage, there was evidence of equifinality [Beven, 1996] in the model outputs (see Figures 8a and 8c), a problem that has already arisen in other RCM morphodynamics applications [Thomas, 2003]. The adoption of a multiperformance index approach [Wilcock and Iverson, 2003] partially overcame this issue. In Figure 8, the calibrated factor combination was identified by considering the three performance surfaces together. If only Fveg were used (Figure 8b), no narrow range for the maximum erode limit factor could be found. Conversely, evaluating the other two performance indices (Figures 8a and 8c) allowed us to detect a smaller subdomain of global maximum, though not a single global maximum point. More specifically, the calibration set was identified manually, implicitly assigning more weight to the Fw index. The subjectivity of this selection does not invalidate the choice, as the global Fsec maximum (71%) is only slightly higher than the performance of the selected calibration point (69.5%).

[61] The multiperformance index approach was also very useful during the second SA stage. Considering only the Fcs and Fw values (Table 3), the strong relevance of the lateral erosion rate factor to the model's capability to simulate channel morphodynamics could not have been observed. Furthermore, assessing only the Fveg and Fsec performance indices would have suggested that the water depth above which erosion can happen factor is more important than the maximum erode limit factor, questioning the reliability of the first-order sensitivity indices (Table 2).

[62] Another relevant point concerning the performance techniques is the spatial extent over which they were applied. The performance indices used in the SA and calibration steps were calculated by cell-by-cell comparison between model predictions and observations at a particular time (i.e., at the end of every run). Although this evaluation approach has been adopted in other RCM applications [Thomas and Nicholas, 2002; Nicholas and Quine, 2007], it is just one among a wide range of statistical approaches for comparing model output to observations. For a braided system, with apparently stochastic morphodynamics, such place-and-time predictions are expected to be feasible only over relatively short timescales [Church, 1996; Paola, 2001]. For this reason, this approach was used only for the SA and calibration steps, which were applied to the shortest period (2 years) and reach (reach B in Figure 1). Conversely, the validation output, associated with a broader spatial and temporal context, was evaluated using more general statistical properties of the river, such as braiding index, wet area width, active channel width, and sediment budget.
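A cell-by-cell comparison of this kind can be sketched as follows. The intersection-over-union form shown is one common definition of a wet-area fit statistic in inundation modeling and may differ from the exact Fw formulation used here; the arrays are toy data:

```python
import numpy as np

def wet_area_fit(observed, modeled):
    """Cell-by-cell fit between observed and modeled wet-area masks:
    F = |intersection| / |union| (one common definition; the exact Fw
    formulation used in the study may differ)."""
    obs = observed.astype(bool)
    mod = modeled.astype(bool)
    union = np.logical_or(obs, mod).sum()
    return np.logical_and(obs, mod).sum() / union if union else 1.0

# Toy wet-area masks at the same snapshot in time (1 = inundated cell)
obs = np.array([[0, 1, 1, 0],
                [0, 1, 1, 0],
                [0, 0, 1, 1]])
mod = np.array([[0, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 0, 1, 1]])
print(wet_area_fit(obs, mod))  # fraction in [0, 1]
```

Such a place-and-time score penalizes any spatial offset between predicted and observed channels, which is precisely why it was restricted to the short SA and calibration runs.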

5.3 Assessment of CAESAR Performance: Potentials and Limitations of the Model

[63] Validation is a crucial step in numerical modeling that allows switching from an exploratory approach to a more quantitative and predictive one. In this case study, validation pointed out both the limitations of CAESAR and its potential for application at mesotemporal and mesospatial scales. Overall analysis of the validation results (Table 4) leads to the following considerations. Modeled and real 2009 mean active channel widths (785 m and 756 m, respectively) differ by 29 m. This result may be considered unsatisfactory when compared to the measured average channel width change (25 m in 8 years), but acceptable considering that it is of the same order of magnitude as the DEM resolution.

[64] Simulated bed load yields from the model were satisfactory. Table 4 and Figure 10 show that (i) estimates of mean annual bed load sediment yield, ranging spatially between 60 and 92 × 103 m3 yr−1, are reasonable and coherent with those measured or modeled in other gravel bed rivers (Table 4); (ii) the longitudinal variation of sediment yield is coherent with channel morphology, showing local maxima in the wider subreaches (near the Spilimbergo and upstream Casarsa bridges) and local minima in the single-thread confined subreach (Pinzano) and in the downstream part of the study reach where slope decreases; and (iii) the variation between the temporal maximum (2002) and minimum (2005) annual sediment yields agrees with the expected correlation between sediment yield and the annual flood series. There are no significant differences in channel evolution or bed load yield between the simulations carried out with SPR 1.5 and 2.0. This suggests that an increase in upstream sediment supply has little effect on overall bed elevation over relatively short timescales (i.e., about 8 years). The effect of the increased sediment supply is shown only by local aggradation in a few subreaches just downstream of the Pinzano gorge.

[65] Therefore, the model was able to reproduce the bed load yield and the average change in channel width, although the observed net morphological changes were moderate. Only a portion of the 2001 channel changed from active channel (i.e., flowing channel or exposed gravel areas) to vegetated area (i.e., islands or floodplain) or vice versa. These changes involved an area of about 3.2 × 106 m2, with an equivalent average width of 99 m. Despite the moderate real average width change, the results presented in Table 4 highlight the model's difficulty in reproducing internal channel dynamics and braided river morphology. According to the indices used, the model could not satisfactorily reproduce the main channel pattern, in particular the topographic complexity of a braided river at low water stages. The output braiding index (BImodel ≈ 2.6–2.8) is approximately half the real braiding index (BIreal ≈ 4.8, evaluated from digitized 2009 data converted to raster format with 25 m pixels). Inundated area performance is also low (Fw ≈ 23%), and the modeled equivalent wet area width (≈610 m) is nearly three times the real value (≈230 m). Marked discrepancies between model output and real channel morphology can also be assessed qualitatively from Figure 9.
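The braiding index comparison above rests on counting wet channels along cross sections. A minimal raster-based sketch, assuming (for illustration only) that each row of the wet-area mask is treated as a cross section, is:

```python
import numpy as np

def braiding_index(wet_mask):
    """Mean number of distinct wet channels per cross section,
    counting runs of wet cells along each raster row."""
    counts = []
    for row in wet_mask:
        padded = np.concatenate(([0], row, [0]))
        # a channel starts wherever the mask steps from 0 to 1
        counts.append(int(np.sum(np.diff(padded) == 1)))
    return float(np.mean(counts))

# Toy mask: rows are cross sections, 1 = wet cell
mask = np.array([[1, 0, 1, 0, 1, 0],
                 [1, 1, 0, 1, 0, 0],
                 [0, 1, 1, 1, 0, 1]])
print(braiding_index(mask))  # mean channel count per cross section
```

At a 25 m cell size, secondary channels narrower than one cell merge or vanish in the mask, which is one mechanism by which a coarse raster depresses the modeled index relative to the real one.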

[66] This inability to reproduce the internal morphodynamics is partly due to the initial data quality in terms of (i) DEM resolution (25 m cell size), which corresponds approximately to the width of the secondary channels; (ii) DEM resampling from the original 2 m cell size to a 25 m cell size; and (iii) the lack of bathymetric data. The combination of these factors smoothed, and thus simplified, the channel morphology. The lack of bathymetric data and the smoothing had no direct effect on the bed elevation performance indices (Fe, Fsec), because Fe compares the relative overall bed change of each run to that of the reference run selected in the Morris screening DOE (i.e., run 102). Fsec was likewise not directly affected by the missing bathymetry because it refers only to areas that were dry in 2001. However, these inherent limitations probably influenced the model results indirectly, in terms of "spanning" of the flowing channel areas, reduced braiding intensity, and bed elevation changes, as illustrated by more detailed analysis of the results. Comparing the real wet area in the 2001 aerial photos with the inundated pattern provided by the model at the initial state of the validation run reveals significant differences in braiding index (3.2 real versus 2.8 modeled), average wet area width (256 m real versus 433 m modeled), and wet area performance (Fw ≈ 49% at the initial state). These discrepancies, already present in the initial conditions of the validation runs, could probably have been reduced, at least partially, by modifying the initial DEM (e.g., introducing an arbitrary bathymetry or thalweg elevations). We preferred to avoid such data manipulation, considering it a potential source of further uncertainty, inconsistent with an objective, rigorous, and data-based approach.

[67] The morphological response of CAESAR may depend strongly on DEM cell size, as pointed out in other cellular model applications [Doeschl-Wilson and Ashmore, 2005; Doeschl et al., 2006; Nicholas and Quine, 2007]. Furthermore, the 8 year time series of high and low flow events was shortened (to speed up model run times) by removing extensive periods of low flow. Morphological "gardening" (smaller changes to channel details) during these missing low flow periods might have produced smaller-scale morphological changes and thus greater channel heterogeneity.

[68] Despite the computational speed advantages that RCMs such as CAESAR afford over other approaches, we still cannot rapidly simulate 10–30 km reaches of a large braided river at resolutions finer than that used here (25 m). The limitation is computational, because model speed slows proportionately with the number of grid cells modeled. In this application, the model showed good computational performance (Fcs), close to expectations, with a single validation run taking 27 h (using the best CPU among those indicated in Figure 3), a reasonable duration relative to the 8 years simulated.

[69] The limitations described above in reproducing morphological evolution should be tested further in real case studies that are similar in terms of river morphology and spatiotemporal scales but supported by higher-quality input data. It is also worth noting that CAESAR could be enhanced in several areas, e.g., the flow routing scheme and parallel computation [Coulthard et al., 2013]. In agreement with previous work [Murray, 2007; Coulthard et al., 2007; Ziliani and Surian, 2012], this case study shows that CAESAR can be very useful for "what-if" scenario analysis over mesospatial and mesotemporal scales, but further research is needed to test its ability to reproduce satisfactorily specific morphological characteristics (e.g., braiding intensity) of a braided stream.

6 Conclusion

[70] This paper reports on the application of a reduced-complexity model (CAESAR) to simulate morphological changes within a 33 km reach of a large braided gravel bed river, the Tagliamento River (Italy). The main aim of this work was to define a general strategy for rigorous RCM evaluation, using a multiperformance assessment technique. The strategy was then used to assess the strengths and weaknesses of CAESAR at mesospatial and mesotemporal scales of application (10–100 km, 10–100 years). Nonlinearity, complexity, and a large parameter domain make verification and confirmation difficult. "Trial-and-error" strategies can provide a quick but weak model evaluation; alternatively, Monte Carlo strategies or automatic calibration methods may guarantee full-domain model evaluation but are often computationally unfeasible. The outcomes of this work show the effectiveness of an alternative approach involving two levels of sensitivity analysis before model calibration and validation. The approach developed here can be effectively applied in other similar RCM contexts, allowing RCMs to be used not only in an explorative manner but also to obtain quantitative results and scenarios.

[71] Recent advances in CFD modeling [Nicholas, 2013] have shown that only a few physically based models are able to simulate river morphodynamics at spatial and temporal scales comparable to those supported by CAESAR. A critical point is that a rigorous and objective methodology like the one used in this study is still unfeasible for CFD models because of the computational power and time demanded per single run. In other words, at present, CFD morphodynamic models applicable at mesotemporal and mesospatial scales can hardly be evaluated against real case study data.

[72] Therefore, this case study confirms the usefulness of CAESAR for modeling spatial and temporal scales still hardly accessible to 2-D and 3-D CFD morphodynamic models. From a morphological point of view, CAESAR proved able to reproduce the average change in channel width, although the real morphological changes observed during the simulated validation period were only moderate. On the other hand, the model performed poorly in reproducing braided in-channel pattern dynamics (e.g., braiding intensity). This is consistent with the well-known difficulty of reproducing braided system dynamics numerically [Nicholas, 2005]. The results for sediment yield are encouraging, because annual bed load estimates were coherent with measurements carried out in other large gravel bed rivers. In conclusion, the results suggest that CAESAR is a useful tool for sediment budget estimation, though it is worth stressing that further testing must be performed in other real cases, paying attention to the quality of the data used to define the initial channel geometry and to validate model results. Further improvements may come from refining the flow routines in line with recent reduced-complexity hydrodynamic model (RCHM) schemes and from converting the code to parallel programming.


Acknowledgments

[73] This research was supported by funds from MIUR (Ministero dell'Istruzione dell'Università e della Ricerca) PRIN 2007 project "Present evolutionary trends and possible future dynamics of alluvial channels in northern and central Italy"; Fondazione CARIPARO project "Linking geomorphological processes and vegetation dynamics in gravel bed rivers"; and University of Padua strategic project "GEO-RISKS". We would like to thank the Autorità di Bacino dell'Alto Adriatico for providing the LiDAR data and some of the aerial photos (2001 and 2002); the Regione Friuli Venezia Giulia for hydrological data; and Walter Bertoldi and Matilde Welber (Department of Civil and Environmental Engineering, University of Trento) for the rating curve at the Pinzano gauging station. We thank the Associate Editor Dimitri Lague and the three reviewers, Brad Murray, Andrew Nicholas, and an anonymous reviewer, for their comments that helped us improve the original manuscript. CAESAR is freely available for download at