Assessing the performance of a physically based hydrological model using a proxy‐catchment approach in an agricultural environment

Physically based models are useful frameworks for testing intervention strategies designed to reduce elevated sediment loads in agricultural catchments. Evaluating the success of these strategies depends on model accuracy, generally established by a calibration and evaluation process. In this contribution, the physically based SHETRAN model was assessed in two similar U.K. agricultural catchments. The model was calibrated on the Blackwater catchment (18 km2) and evaluated in the adjacent Kit Brook catchment (22 km2) using 4 years of 15 min discharge and suspended sediment flux data. Model sensitivity to changes in single and multiple combinations of parameters and sensitivity to changes in digital elevation model resolution were assessed. Model flow performance was reasonably accurate with a Nash–Sutcliffe efficiency coefficient of 0.78 in Blackwater and 0.60 in Kit Brook. In terms of event prediction, the mean of the absolute percentage of difference (μAbsdiff) between measured and simulated flow volume (Qv), peak discharge (Qp), sediment yield (Sy), and peak sediment flux (Sp) showed larger values in Kit Brook (48% [Qv], 66% [Qp], 298% [Sy], and 438% [Sp]) compared with the Blackwater catchment (30% [Qv], 41% [Qp], 106% [Sy], and 86% [Sp]). Results indicate that SHETRAN can produce reasonable flow prediction but performs less well in estimation of sediment flux, despite reasonably similar hydrosedimentary behaviour between catchments. The sensitivity index showed flow volume sensitive to saturated hydraulic conductivity and peak discharge to the Strickler coefficient; sediment yield was sensitive to the overland flow erodibility coefficient and peak sediment flux to raindrop/leaf soil erodibility coefficient. The multiparameter sensitivity analysis showed that different combinations of parameters produced similar model responses. Model sensitivity to grid resolution presented similar flow volumes for different digital elevation model resolutions, whereas event peak and duration (for both flow and sediment flux) were highly sensitive to changes in grid size.


| INTRODUCTION
Surface flow and soil erosion are natural processes affected by changes in agricultural land use and management in catchments.
Increases in surface run-off, event peak flows, and suspended sediment loads have been related to altered agricultural practices and management interventions (Deasy, Brazier, Heathwaite, & Hodgkinson, 2009;Deasy, Titman, & Quinton, 2014;Smith et al., 2018).Hydrological models provide useful frameworks for testing, understanding, and predicting catchment behaviour under different agricultural change scenarios and in response to mitigation measures designed to reduce surface run-off, erosion, and suspended sediment loads.
Physically based spatially distributed (PBSD) hydrological models represent processes using fundamental physical equations (Merritt, Letcher, & Jakeman, 2003) usually simulating the hydrological cycle and sediment generation processes (Daniel et al., 2011).These types of models are generally classified according to their spatial scale, temporal distribution, variables, and process descriptions (Aksoy & Kavvas, 2005).PBSD models have the advantage, over empirical or conceptual models, in their ability to better represent the effect of changes in catchment conditions as their parameters relate to physical properties (Bathurst, 2011).However, a frequent problem with PBSD models is the large number of parameters required; some of which may be unavailable or inaccessible, particularly in data scarce regions (Tarawneh, Bridge, & Macdonald, 2016).This data uncertainty can be reduced by a model calibration approach, followed by an evaluation procedure, providing increased confidence in model outputs for further application.Klemeš (1986) described several approaches for model calibration and evaluation.For example, the split-sample test uses measured data, divided into two periods of similar length, one for calibration and the other for evaluation, or the proxy-catchment test that comprises model calibration in one catchment and evaluation in a nearby catchment.The proxy-catchment approach is recommended as a framework for testing model performance (Pechlivanidis, Jackson, McIntyre, & Wheater, 2011;Xu & Singh, 2004) but is less commonly applied as it requires more resources than the split-sample test.Comparatively few studies use the proxy-catchment approach for model evaluation (Arsenault, Brissette, & Martel, 2018;Gumindoga, Rwasoka, Nhapi, & Dube, 2017;Yang, Herath, & Musiake, 2002) with limited applications using PBSD models (Refsgaard, Storm, & Refsgaard, 1995).Moreover, studies that adopt this model testing approach in combination with high-resolution measurements of discharge and sediment flux were not found in the available literature.
Another common criticism of PBSD model is grid size applications (Beven, 1991;Brazier et al., 2011), as parameter values applied to a catchment grid are usually an average of a physical property (measured or calibrated).This generalization fails to capture natural variability in a given property and may affect hydrological process representation (Thomas et al., 2016).Thompson, Bell, and Butler (2001) found that decreasing the digital elevation model (DEM) resolution tends to produce a smoother topography, as larger grids average the elevation of the covered area, losing important landscape detail.The loss of the topographic features can also lead to changes in overland flow pathways affecting sediment transport and deposition processes (Thomas et al., 2017).Moreover, the change in spatial resolution may affect model representation of tributaries (Daniel et al., 2011).Errors can be minimized by using the finest possible grid resolution.However, this requires more computing capacity and increases model run times (e.g., De Figueiredo & Bathurst, 2007).Despite the scale resolution problem being addressed in several studies (Lesschen, Schoorl, & Cammeraat, 2009;Valeo & Moin, 2000;Zhang & Montgomery, 1994), there is a lack of information on the effects of changes in DEM resolution on flow and sediment flux at an event scale with comparison against measured data.Furthermore, understanding PBSD model sensitivity to DEM resolution may be important for assessing predictions of the effectiveness of management interventions to reduce agricultural impacts, particularly when accurate representation of an intervention (e.g., minimum width of field buffer strips) is dependent on grid size.The selection of one PBSD model over another often depends on the available information and accessibility.The SHETRAN model was selected in the present study for its capability to simulate flow and sediment flux on an event and continuous basis, at any given resolution (with a maximum limit of 300 × 300 grids [Windows Version v4.4.5x64]).Moreover, event-based sediment prediction on a continuous basis can reduce errors related to initial conditions (Bussi, Francés, Montoya, & Julien, 2014).The SHETRAN model has been designed to simulate a wide range of land uses (e.g., agricultural, forest, or urban) and has been used globally, with a limited number of applications in the United Kingdom (Bathurst, Ewen, Parkin, O'Connell, & Cooper, 2004;Birkinshaw, 2008;Janes, Holman, Birkinshaw, O'Donnell, & Kilsby, 2017).
This study represents the first application of SHETRAN using a proxy-catchment approach with a semicoarse spatial resolution (50 × 50 m) at a high temporal resolution (15 min).Continuous flow and suspended sediment data from two nearby agricultural catchments in the United Kingdom were used for model calibration and evaluation, with the assessment of model efficiency focussing on flow volume, peak discharge, sediment yield, peak sediment flux, and event duration (flow and sediment) for a range of DEM grid sizes.

| The SHETRAN model
SHETRAN uses deterministic equations and finite-difference solutions to simulate water flow, sediment transport, and contaminant transport processes (Ewen, Parkin, & O'Connel, 2000).It has a distributed response at a catchment scale limited by 300 × 300 grids, each one containing soil and vegetation information.Soil profiles represented by columns of stacked grids (up to six layers) allow lateral and vertical flow transport for a three-dimensional subsurface flow simulation.
The river network is modelled as channel links along the edges of grids (Birkinshaw, 2010).
Water flow comprises the subprocesses of evapotranspiration/interception, overland/channel, and variably saturated subsurface.
The first uses climatological data of rainfall and potential evapotranspiration (PE).The PE can be introduced by the user or calculated by the model using the Penman-Monteith equation (Monteith, 1965;Penman, 1948).The interception process that calculates the amount of rainfall stored by the canopy is based on the canopy surface storage capacity (Rutter, Morton, & Robins, 1975).Evaporation from soil occurs at a rate determined by the proportion of bare ground.This soil moisture control is assessed by the state of the unsaturated flow limits; the losses on the soil are calculated by a simple linear relation between the soil moisture tension (Ψ ) and the actual evapotranspiration (AE) and potential evapotranspiration (PE) ratio (AE/PE), specified in the model as a table of AE/PE values against Ψ .SHETRAN simulates soil erosion by raindrop/leaf drip impact and overland flow, overland and channel transport of eroded sediment, and deposition (Lukey, Bathurst, Hiley, & Ewen, 1995).
In SHETRAN, the reported parameters to which the model is most sensitive on the water flow component are AE/PE ratio, saturated hydraulic conductivity (K sat ), and overland Strickler coefficient (Stk; Anderton, Latron, & Gallart, 2002;Bathurst, 1986;Op de Hipt et al., 2017).The AE/PE ratio influences water balance; the saturated hydraulic conductivity (K sat ) affects the surface-subsurface infiltration and storage; and the overland Strickler coefficient (Stk) has an effect on the surface flow velocity (Ewen et al., 2000).It is suggested by SHETRAN developers that AE/PE and soil depth are calibrated in the first instance, afterwards, if necessary, K sat and Stk (Birkinshaw, n.d.).
In the sediment transport component, the reported parameters to which model is most sensitive are raindrop soil erodibility coefficient (K r ) and overland flow erodibility coefficient (K f ; Adams & Elliott, 2006;Op de Hipt et al., 2017) with the parameters' values associated with catchment soil conditions and vegetation characteristics.For example, Verhaegen (1987) established a relationship between K r and soil texture.However, relationships between soil texture and K f are not commonly reported; hence, calibration of this coefficient is often used.
Performance of the SHETRAN water flow component has been reported in a number of studies using the Nash-Sutcliffe efficiency (NSE) coefficient.The term ranges from 1, for a perfect match, to −∞.The reported NSE values for SHETRAN are between 0.6 and 0.8 (Adams, Western, & Seed, 2012;Janes et al., 2017;Mourato, Moreira, & Corte-Real, 2015).Inaccurate peak discharge prediction has been addressed in previous SHETRAN applications where better estimation of base flow and poorer prediction of discharge peaks have been reported (Zhang, Santos, Moreira, Freire, & Corte-Real, 2013).Underestimation of event peak discharge was also found by comparing simulated flow with data from eight gauging stations within a 2,400 km 2 catchment in the United Kingdom (Janes et al., 2017).Conversely, successful double-peak discharge predictions were obtained in a U.K. catchment (Birkinshaw, 2008).
However, in this study, estimation of phreatic level against two measured sites was not accurate.This phreatic surface level is related to base flow.To assess model uncertainty, the "blind validation" method (Ewen & Parkin, 1996) was developed, which consists of establishing parameter bounds for the prediction of certain hydrological variables (e.g., hydrographs, phreatic surface level, peak discharge, soil water potential, and run-off) and calculating the degree (perceptual value established by the user) to which a measured variable lies within the predicted range.Using this method, a successful prediction of phreatic level (90%) but poorer prediction of peak discharge (81% [outlet] and 55% [channel]) was obtained for a U.K. catchment (Bathurst et al., 2004).
The sediment transport component efficiency has also been assessed.However, as continuous measurements of sediment flux are more difficult to obtain, evaluation has largely focussed on individual events with different coefficients.For example, in a small catchment in New Zealand, the difference between observed and predicted sediment loads (kg) for seven events varied between −3.3 (−97%) and 55 (134%) depending on the event (Adams & Elliott, 2006).Similarly, a single event in four nearby catchments in New Zealand (Elliott, Oehler, Schmidt, & Ekanayake, 2012) showed sediment yield error of 1%, 3%, 13%, and 110% and NSE values of 0.61, 0.70, 0.86, and 0.65, respectively, for each catchment.Additionally, the coefficient of determination (R 2 ) between simulated and measured daily sediment yield on two subcatchments of the Brazilian Sumé catchment showed values of 0.35 and 0.28 (De Figueiredo & Bathurst, 2007).Furthermore, using the blind validation, errors associated with 5-year sediment yields were reported with values ranging from 1% to 198% in a catchment in France (Lukey, Sheffield, Bathurst, Hiley, & Mathys, 2000).
The parameters to which SHETRAN is most sensitive varied according to catchment characteristics (location, topography, vegetation, etc.).Nonetheless, SHETRAN studies showed similar Stk values for land cover with the highest values for urban areas, followed by cropland, grassland, and woodland as the lowest (Bathurst, Moretti, El-Hames, Beguería, & García-Ruiz, 2007;Bathurst, Moretti, El-Hames, Moaven-Hashemi, & Burton, 2005;Birkinshaw, 2008;Elliott et al., 2012;Wicks & Bathurst, 1996).Moreover, on-site studies also indicated a similar Stk trend for land covers (Engman & American Society of Civil Engineers, 1986;Kvaernø & Stolte, 2012;Rahimy, 2011;Schob, Schmidt, & Tenholtern, 2006).In SHETRAN, hillslope surface run-off is simulated as a sheet flow (fine, extensive, and surficial) rather than confined flow (e.g., rills or gullies).The flow depth and velocity are an average value in each grid; therefore, a coarse spatial resolution will generate a wider sheet flow (equal to the size of the grid).The dependence of overland flow erosion by surface flow makes the K f coefficient an important parameter to be considered in the calibration process, with the empirical value representing a combined sheet and rill flow effect (Wicks & Bathurst, 1996).Nonetheless, different erodibility coefficient values were calculated in plot scale studies with different agricultural land uses (Wicks, Bathurst, & Johnson, 1992).Wicks et al. (1992) showed that erodibility coefficient variation depends on the land use (tilled, clipped grass, grazed, and ungrazed).However, there is high variation of soil erodibility coefficients in SHETRAN applications, with K r ranges from 0.05 to 70 (J −1 ) and K f between 5 × 10 −7 and 2 × 10 −5 (kg m −2 s −1 ; Bathurst, 2011).Most of the reported SHETRAN studies employ single value for the erodibility coefficients (K f and K r ).

| Study area
The Blackwater (18 km 2 ) and Kit Brook (22 km 2 ) catchments are part of the River Axe hydrological network in south-west England crops, and (d) grass.In Blackwater, the land cover was grass (60%), arable crops (27%), deciduous woodland (12%), and urban (1%) with comparable land cover proportions observed in the Kit Brook catchment: 57%, 29%, 13%, and 1%, respectively.In the south-west region of England, winter cereals are the most common crop type (Department for Environment, Food and Rural Affairs, 2015); hence, winter barley was used as the simulated arable crop for both catchments.Soil data were acquired from the National Soil Resources Institute (NSRI) of Cranfield University (Cranfield University, 2018).Five soil types were identified in the Blackwater catchment; the one with the largest extent (40%), namely, WICKHAM (Eutric Luvic Planosols), had a soil texture of 21% sand, 41% silt, and 30% clay.Each soil is characterized by five depth layers between 0 and 1.5 m; and the properties for each soil type and layer vary depending on the land use (Hollis et al., 2015).
Six soil classifications with five depth layers (0-1.5 m) were identified in the Kit Brook catchment; BATCOMBE (18% sand, 58% silt, and 24% clay) and CHARITY (16% sand, 58% silt, and 26% clay) soils cover the 37% and 31% of the catchment, respectively.The NSRI soils database provides the international standard soil classification (International Union of Soil Sciences, 2007) related to the soil classification of England and Wales, in which WICKHAM corresponds to Eutric Luvic Planosols, BATCOMBE to Profundic Chromic Endostagnic Luvisols, and CHARITY to Chromic Luvisols.The QUORND soil that covers 20% of the Kit Brook catchment was substituted for HENCE soil (60% sand, 25% silt, and 15% clay) due to the lack of information available in the NSRI.

| Event selection
Individual flow events were selected in both catchments to enable event-based analysis of suspended sediment flux for model calibration and testing.Issues with turbidity data quality arose during some periods; for example, sensor fouling or burial, leading to periods of persistent high turbidity values until cleaning, or on occasion, large rapid variations in measured turbidity occurred that were unrelated to any change in flow.Therefore, it was necessary to exclude some events from further analysis due to these data quality issues.
Discharge event analysis focussed on selecting a subset of events in each catchment that were determined to have "good quality" flow and turbidity data.The start of a flow event was defined by when flow exceeded the base flow and where sediment flux increased above the prior baseline data.The end of an event was determined when flow fell to the pre-event level or to a new temporary base level, which exceeded the flow prior to the event (Robson & Reed, 2008).Total flow volume (m 3 ; Equation (1)), maximum flow peak (m 3 s −1 ), sediment yield (t ha −1 ; Equation ( 2)), and maximum sediment flux peak (kg s −1 ) of each event were obtained.The reference date for each event was stipulated as the day in which the maximum flow peak was found.On the basis of this event selection, 53 events were identified in Blackwater and 46 in Kit Brook.For most selected events, a clockwise suspended sediment concentration-discharge (C-Q) hysteresis behaviour (Williams, 1989) was observed in both catchments (i.e., sediment peak arrives before the discharge at the outlet and/or presenting sediment exhaustion).

| Model set-up
The use of a 25 × 25 m grid size for Blackwater produced a grid number of 239 × 170, fitting with the model grid limit (300 × 300), but this limit was exceeded for Kit Brook (180 × 355).Furthermore, a run time of 15 days was observed for this resolution over the complete simulation period.Therefore, the 50 × 50 m grid size was selected to produce a reasonable running time (24 hr for 5 years); and this grid was converted from the 5 × 5 m DEM in both catchments.
Vegetation and sediment parameter values (Appendix A) were selected from a literature review (Birkinshaw, 2008;Lukey et al., Flow duration curve for Blackwater and Kit Brook streams, period 2010-2014 2000; Wicks et al., 1992).NSRI data were used in the model with a soil depth up to 1.5 m; parameter values (e.g., K sat , θ res , and θ sat ) varied according to each soil layer and land cover (Appendix B).
Rainfall data with a 15 min time step and calculated daily potential evapotranspiration were used for an initial simulation from October 2010 to September 2014 with a "spin-up" hydrological year (October 2009-September 2010) to obtain phreatic surface level equilibrium.

| Calibration and evaluation processes
Calibration aims to improve model performance by changing values of selected parameters, through either "manual" or automated methods.
A manual process is commonly applied, in which parameters are changed "one at a time" (OAT; Rabitz, 1989).This necessitates determination of which parameters require calibration and consideration of the relationship between parameters.It also requires a choice of the model output to be calibrated (e.g., run-off, phreatic level, sediment concentration, discharge peaks, and sediment flux peaks).Furthermore, multiple different combinations of parameter values might give a good fit (Beven & Binley, 1992;Jetten & Maneta, 2011).The decision on parameters to be calibrated depends on information about previous applications (i.e., sensitivity analysis) and/or user experience.
The calibration process was performed in the Blackwater catchment.The first simulation used five soil layers (0-1.5 m), although for an accurate base flow, subsequent simulations required the addition of a layer (sixth [1.5-20 m]) to represent soil from subsoil to bedrock (Birkinshaw, n.d.).The water flow component was calibrated by comparing the measured 4-year discharge record to the model simulation and by quantitative comparison of the selected discharge events.
Afterwards, the sediment transport component was calibrated using the selected events.Using the proxy-catchment approach, model evaluation was undertaken in the Kit Brook catchment for the same spin up and run period using parameter values from the final Blackwater calibration.
The NSE (Equation ( 3)) was used to assess model fit with the continuous discharge record.For assessing event-based performance, the coefficient of determination (R 2 ) between measured and simulated data was calculated for flow (R 2 Q ) and sediment flux (R 2 Sed ), respectively.Coefficient values (NSE and R 2 ) higher than 0.5 are considered as good fit (Moriasi, Arnold, Van Liew, Harmel, & Veith, 2007).
where O i = observe measurement at the time i  4)).
Simulations were run by changing parameters values using the OAT method with the objective of reaching the highest possible Sed ) for the selected events; and (c) R 2 for Qv, Qp, Sy, and Sp.

| Sensitivity analysis
The parameters reported as the most sensitive in other studies of SHETRAN sensitivity (Bathurst, 1986;Bathurst et al., 2004;Op de Hipt et al., 2017;Parkin et al., 1996;Wicks et al., 1992;Wicks & Bathurst, 1996;Wicks, Bathurst, Johnson, & Ward, 1988) include AE/PE ratio, Strickler coefficient (Stk), saturated hydraulic conductivity (K sat ), raindrop soil erodibility coefficient (K r ), and overland flow erodibility coefficient (K f ).In the present study, the sensitivity analysis consisted of varying values of selected parameters upward and downward, relative to the base run (final calibration), by factors of 10 (F 10 ), 0.1 (F 0.1 ), and 0.01 (F 0.01 ) with the OAT method.The factorial changes were applied to the parameters' value for each of the four land cover types, and the maximum possible value was used when a parameter value exceeded the physical limit.
An adaptation of the sensitivity index (SI; Sheikh, van Loon, Hessel, & Jetten, 2010) was implemented using model outputs of the sensitivity simulations (F 10 and F 0.01 ) and the base run (F 1 ).The SI was calculated for each assessed parameter according to Equation (5).The multiparameter approach to sensitivity analysis consisted of running simulations with a combination of parameter changes for the water flow (AE/PE, Stk, and K sat ) and sediment transport components (K r and K f ).Factorial change of parameters from F 0.1 to F 10 on the water flow component produced 27 simulations and nine simulations on the sediment transport component (Figure 6).
The percentage of difference (Pct diff ) for each model output relative to the base run was calculated by Equation ( 6).The base run simulation (S out br ) for the parameters on the water flow and sediment transport components corresponds to simulation numbers 14 and 5, respectively (Figure 6).Model sensitivity to changes in grid resolution was assessed by comparing outputs of the base run (50 × 50 m, final calibration) against each resolution (Equation ( 6)).The model output considered was 4-year flow volume (Qv), mean of discharge peaks (μQp), mean of event flow duration (μQw), 4-year sediment yield (Sy), mean of sediment flux peaks (μSp), and mean of event sediment duration (μSw).
The width of the fluxes (flow [Qw] and sediment [Sw]) was measured as the distance between the intercept points to the left and right of the one-half peak height (Robson & Reed, 2008). where

| Model calibration in Blackwater catchment
The value required an escalation of the plot measurements reported for tillage and clipped grass (Wicks et al., 1992).An over prediction for the majority of events was observed with the K f value of 59 × 10 −10 (kg m −2 s −1 ) and an under prediction with 6.9 × 10 −10 (kg m −2 s −1 ).The highest K f was chosen for cropland and the lowest for grass and woodland.A similar procedure was performed for K r parameter using plot values reported in Wicks et al. (1992).
Monthly mean run-off (Figure 3a) showed the lowest differences between simulated and measured data during autumn (2 mm) and winter period (−6 mm), whereas simulated run-off during spring and summer exceeded measured by 23 and 17 mm, respectively.The simulated annual run-off showed an over estimation of 5%.
Model performance on event scale showed that 83% of the discharge events and 64% of the sediment flux events have R 2 values higher than 0.5 (Figure 4a In contrast, the μAbs diff was higher for Sy (106%) than Sp (86%; Table 2.

| Model evaluation in Kit Brook catchment
Model simulation of flow and sediment flux in Kit Brook showed a lower NSE, larger μAbs diff (Qv, Qp, Sy, and Sp) and similar R 2 compared with the calibrated results for the Blackwater catchment (Table 2).
Model runs in Kit Brook used the final calibrated parameter values obtained from the Blackwater simulation.The measured mean monthly run-off (mm) was higher than simulated, especially for winter months.In contrast to Blackwater, simulated mean annual run-off in Kit Brook was 10% under predicted (Figure 3b).

The model showed poorer performance when comparing Kit
Brook with Blackwater on an event basis.Events with R 2 higher than 0.5 represent the 80% of the total discharge events (Figure 5a) and 41% of the sediment flux events (Figure 5b).The 46 measured events in Kit Brook presented a higher model predictive performance for Qv (R 2 = 0.8) than Qp (R 2 = 0.6; Figure 5c,d), which correspond to μAbs diff values for Qv and Qp of 48% and 66%, respectively.Lower performance was observed for sediment prediction (R 2 = 0.6 [Sy] and R 2 = 0.3 [Sp]; Figure 5e,f), corresponding to μAbs diff values for Sy and Sp of 298% and 438%, respectively.
The higher observed permeability in Kit Brook, based on comparison of the flow duration curves (Figure 2), supported adjustment of the K sat for the sixth soil layer in this catchment.
Increasing the K sat value in this layer showed a difference of 7% between simulated and measured annual run-off.Furthermore, better NSE and μAbs diff coefficients (Table 2) were observed with this K sat adjustment.However, the coefficient of determination for the event analysis (R 2 Q and R 2 Sed ) did not vary significantly compared with the nonadjusted Kit Brook simulation (Appendix C).

| Model sensitivity
The SI (Table 3) showed that variations of K sat affected flow volume (Qv) the most and Stk affected the mean of discharge peaks (μQp).
Variations in AE/PE had little effect on both flow volume and mean of flow peaks.For the sediment transport component, sediment yield   presented more prolonged events with lower flow peaks compared with S16, resulting in similar Qv changes, whereas S15 presented higher base flow during the 4-year period than S8; as a consequence, the mean of flow peaks showed similar μQp reductions when compared with the base simulation (S14).
The multiparameter analysis of parameters on the sediment transport component (Figure 6b) showed that minor changes in K r combined with high K f values produced similar sediment yield with respect to the base run (Pct diff [Sy = 100%]), for example, simulations ).Likewise, on the mean of sediment flux peaks (μSp), large changes in K r combined with small changes in K f (e.g., S7 and S8) presented similar differences compared with the base run (Pct diff [μSp = 100%]).
In terms of sensitivity to DEM resolution, the probability density distribution of slopes for each DEM resolution shows that decreasing grid size leads to increasing variance in the distribution of slope angles (Figure 7a).The change in grid size produced only a very minor effect on land cover proportions represented in simulations (Figure 7b).The largest difference between grid resolutions was in channel network lengths (Figure 7c-f).
The DEM resolution simulations show that increasing grid size from 25 to 200 m leads to increases in μQw and decreases in μQp and  K sat = 0.05 m day −1 for the lowermost soil layer (sixth).b K sat = 0.10 m day −1 for the lowermost soil layer (sixth).
c See Appendix C for the event-based performance.
Qv (Figure 8a).Only small differences (Qv, μQp, and μQw) were observed between grid sizes simulations of 25 and 50 m.In terms of sediment exports, increasing grid size (from 25 to 200 m) produced a decrease in μSp and increase in μSw.In contrast, Sy exhibited a much smaller but more varied response (Figure 8b).
In general, a better estimation (NSE, μAbs diff , and R 2 ) was obtained for the 50 m grid size when comparing simulations of each resolution with measured data (Table 4).The coefficients showed a better fit to measured data with decreasing grid size, except R 2 .Moreover, the two finest resolutions simulations (25 and 50 m) presented only slight differences between coefficients, especially NSE and μAbs diff .
The addition of a deep, low conductivity, and homogenous soil-rock layer (1.5 to 20 m depth) below the upper soil profile (parameterized using NSRI soil hydraulic data) enabled more accurate simulation of base flow (Table 2), which decreased gradually during extended dry periods.Moreover, the hydraulic conductivity of bedrock (2.7 to 20 m) in previous SHETRAN applications (Birkinshaw, 2008;Birkinshaw & Ewen, 2000) was similar to the present study.Stk (kg m −2 s −1 ) 0.15 1.33 Abbreviations: AE, actual evapotranspiration; PE, potential evapotranspiration.
The proxy-catchment test in this study showed better performance in the calibrated catchment (Blackwater) than the evaluated catchment (Kit Brook) probably due to a higher soil permeability in Kit Brook than Blackwater (Figure 2).Nonetheless, SHETRAN showed reasonably accurate flow prediction in both catchments, which are similar in terms of area, relief, and land cover characteristics.The prediction errors (e.g., Qv and Sy) may be attributed more to the parametrization and DEM resolution rather than lack of model process representation.An important factor to consider for model predictions is that soil (NSRI) under grass represents physical and hydraulic properties of permanent grass land cover, whereas the land cover may include permanent and temporary grass.Soils under temporary grass could have different properties (water retention curve, bulk density, and soil porosity) than under permanent grass.Moreover, temporary grass may be located in rotation fields, where soil physical properties can vary depending on time since last cultivation, and take between 2 and 9 years to recover (Thorud & Frissell, 1976;Tuzzin de Moraes et al., 2016) and up to 20 years in loamy soils (Froehlich, Miles, & Robbins, 1985).
In general, simulated discharge in both catchments compared well with measured discharge in terms of event timing and reproduced discharge peaks well during wet periods (Figures 4a-5a).However, the underestimation of peak discharge by SHETRAN could also be attributed to local infiltration variability, in terms of subfield scale run-off and run-on patches not represented in the model, as well as the absence of representation of impervious and hydrologically smooth roads and paths.These features could act as efficient flow pathways connecting run-off-generating areas to the stream network (Croke, Mockler, Fogarty, & Takken, 2005;Jordán-López, Martínez-Zavala, & Bellinfante, 2009) and have been implemented in other models (Elliot, 2004;Tiemeyer, Moussa, Lennartz, & Voltz, 2007) but not identified in any SHETRAN application.
The best-fit calibration for the Blackwater catchment resulted in lower mean errors at the event scale (i.e., μAbs diff [Sy and Sp]) than Kit Brook (Table 2).The lower model performance in Kit Brook (i.e., μAbs diff ) may be attributed to higher soil permeability in the catchment compared with its neighbour (Figure 2).A K sat adjustment in the sixth soil layer produced an improvement in model performance (Table 2).Nonetheless, coefficients of the adjusted simulation pres- Kit Brook (in contrast to Blackwater) could limit distinction between fields with permanent versus temporary grass cover.It is also possible that NSRI soil parameters in Kit Brook are less accurate (i.e., data extrapolation for arable soil) as cultivated soils in this catchment have been reported to be more degraded than soils in Blackwater catchment (Palmer, 2007).
Model sediment yield (i.e., μAbs diff ) showed lower estimation errors in the calibration process (Blackwater) and higher errors in the evaluation process (Kit Brook) compared with other studies (Adams & Elliott, 2006;Elliott et al., 2012;Lukey et al., 2000).In relation to sediment flux peaks, considerable uncertainty has been described in other SHETRAN applications (Elliott et al., 2012;Lukey, Bathurst, et al., 1995;Wicks & Bathurst, 1996) and is a common problem in PBSD models (e.g., Phomcha, Wirojanagud, Vangpaisal, & Thaveevouthti, 2011;Rankinen et al., 2010).Adams and Elliott (2006)  sources occur in a more distant location from the channel network than sediment sources, then some hysteresis effects could be observed between simulated flow and sediments.This difference in run-off and sediment source locations could in part be related to crop rotation, which was not represented in the calibration, as the 2010 land cover map was used for the 5 years of simulation (2009)(2010)(2011)(2012)(2013)(2014).
Moreover, urban areas (e.g., village and roads) can act as sources of flow and sediment transport (Jones, Swanson, Wemple, & Snyder, 2000), and parameters representing the hydraulic properties of these urban areas are not captured in the NSRI dataset.Furthermore, it was not possible to represent roads at the catchment scale due to the model-limited minimum grid size (>25 m).The above factors could explain some C-Q hysteresis behaviour that was observed in the measured data but not reproduced by SHETRAN.
The parameters to which the model is most sensitive on the water flow component were saturated hydraulic conductivity (K sat ) and the Strickler coefficient (Stk; Table 3).These findings were consistent with other studies (Bathurst, 1986;Bathurst et al., 2006).The sediment transport component showed similar SI values between the erodibility coefficients (K f and K r ).This similarity in sensitivity means that increases in either erodibility coefficient affect sediment output and sediment peaks to a similar extent.It is important to note that SHETRAN simulated a sheet surface flow as wide as the grid size (i.e., 50 × 50 m).Therefore, calibration of K f was prioritized over K r .This is consistent with the reported increase in K f values (kg m −2 s −1 ) with decreasing grid size (Bathurst et al., 2005;Bathurst et al., 2007;Janes et al., 2017;Lukey et al., 2000).The studies also showed increases in raindrop/leaf erodibility coefficient (K r ) values with decreases in grid size but with a lower ratio of change.
The multiparameter sensitivity analysis of the SHETRAN water flow component (Figure 6a) showed that different parameter combinations, particularly those including Stk and K sat , can produce similar changes in flow volume and discharge peaks.Anderton et al. (2002) followed a comparable approach, although parameters were limited to the infiltration module on the water flow component (i.e., K sat and Van Genuchten α and n).The authors found similar modelled flow response to some combinations of K sat and soil depth (0-0.2 and 0.2-3.3m).Furthermore, an automatic calibration procedure in Zhang et al. (2013) showed an equifinality problem with the adjustment of Stk, K sat , and soil depth.Likewise, combinations of erodibility coefficients (K r and K f ) produced similar changes in sediment yield and sediment peaks (Figure 6b).The method did not consider feedback from the water flow component to the sediment transport component as this required an excessively large number of simulations (i.e., 243).
However, this interaction may represent an important effect in the prediction of overland flow erosion with surface run-off.
The results of the multiparameter sensitivity analysis demonstrate how different combinations of Stk-K sat can produce similar changes in simulated water yield and event flow peaks.This occurred with some parameters having very high values compared with calibrated values (e.g., F 10 K sat = 0.5 m day −1 ).Nonetheless, comparable measured values (i.e., 0.3 m day −1 ) have been reported at field scale (Chappell & Franks, 1996) in the nearby Slapton Wood catchment.Similarly, K r -K f combinations can provide comparable changes in sediment yield and in the mean of sediment flux peaks without necessarily selecting the appropriate parameter value.
Testing sensitivity to grid resolution showed that decreasing grid size increases the variance in the slope angle distribution (Figure 7a) leading to increases in simulated event peaks (μQp and μSp) and a decrease in event duration (μQw and μSw) with only slight variations in flow volume (Qv; Figure 8a).More complex behaviour was observed for Sy (Figure 8b), which could be explained by differences

(
Figure 1).Catchment DEMs derived from Ordnance Survey (OS) data were obtained through EDINA Digimap with a 5 m resolution.The elevation of the Blackwater catchment ranges from 49 m at the outlet up to 250 m on the crest (south-east), and slopes range from 0 to 36 across the catchment.The Kit Brook catchment elevation rises from 42 m at the outlet to 251 m in the upper northeast reaches, with slope angles ranging between 0 and 36 .The nearest rainfall station with 15 min data available for the period overlapping hydrological measurements was at Raymond's Hill (Met Office, 2018 [50 46 0 0.84 00 N 2 57 0 46.08 00 W, 85 m]), approximately 5 km south of the catchments.Mean daily temperature data were obtained from Seavington station (ID 9092 [50 56 0 26.9 00 N 2 51 0 35.6 00 W, 85 m]) approximately 16 km north of the catchments, which was used to estimate daily PE (mm) using a PET formula (Oudin, Michel, & Anctil, 2005).Climatological data range from 1 October 2009 to 30 September 2014.Measurements of pressure and turbidity were obtained with a 15 min time step.Troll 9500 probes were installed at the outlet of each catchment in September 2010, and measurements continued until December 2014.Pressure data were converted to flow (m 3 s −1 ) based on the stage-discharge rating curve using available measurements collected for each catchment.An extrapolation method was used based on three types of regression equation: linear, quadratic, and power.Mean and base flow values were also obtained from similar size catchments in the region (Centre for Ecology and Hydrology, 2018) and compared with estimated flow.The quadratic equation gave values closest to other gauged flow and the highest regression coefficients (R 2 = 0.9 [Blackwater], 0.9 [Kit Brook]).Flow duration curves were calculated and show similar flow responses between the catchments (Figure 2).In Kit Brook, the fraction of daily discharge over annual mean flow was slightly higher than in Blackwater 90% of the time, whereas Blackwater exceeded Kit Brook only during periods of high flow, suggesting a higher permeability in Kit Brook than Blackwater.Turbidity data were converted to suspended sediment concentration (mg L −1 ) based on the regression equation, y = 1.1049 × −15.005 (R 2 = 0.9; Little, 2012), and suspended sediment flux (kg s −1 ) was computed from sediment concentration and flow.Issues with turbidity data quality arose during some periods; therefore, it was necessary to exclude some events from further analysis.The most recent land cover for each catchment was characterized by a 2010 ground-based field survey of the Blackwater catchment from the Westcountry Rivers Trust and by digitizing 2010 imagery from Google Earth for Kit Brook.The 2010 survey map was classified into four land covers: (a) urban, (b) deciduous woodland, (c) arable F I G U R E 1 Location of catchments, meteorological stations, and stream gauging sites instrumented with level and turbidity probes (Troll 9500) model simulation at the time i (i = 1 … m), and μO = mean of observe measurements (1 … m).Measured and simulated flow volume (Qv [m 3 ]), maximum flow discharge (Qp [m 3 s −1 ]), sediment yield (Sy [t ha −1 ]), and maximum sediment flux (Sp [kg s −1 ]) were obtained for each selected event.The absolute percentage difference between measured and simulated data (Qv, Qp, Sy, and Sp) for each event (Abs diff ) was calculated.The mean of the Abs diff was used to quantify event model performance (Equation ( NSE and the minimum μAbs diff .Model accuracy in flow and sediment flux prediction was assessed for the final calibration (Blackwater) and for the evaluated simulation (Kit Brook) using the previous coefficients and additionally by comparing measured against simulated data using (a) monthly mean run-off; (b) a monthly window of the coefficient of determination (R 2 ) for discharge and sediment flux (i.e., R 2 Q and R 2 Model outputs analysed included 4-year cumulative flow volume (Qv) and the mean of flow peaks (μQp) for the parameters in the water flow component (AE/PE, Stk, and K sat ) and 4-year sediment yield (Sy) and mean of sediment flux peaks (μSp) for the parameters in the sediment transport component (K r and K f ).output by factor change of 10 on the corresponding parameter, F 0.01 out = model output by the factor change of 0.01 on the corresponding parameter, and F 1 out = model output for the base run on the corresponding parameter.
Model sensitivity to grid resolution was examined by running simulations of the Blackwater catchment with grid sizes of (a) 25 × 25 m, (b) 50 × 50 m, (c) 100 × 100 m, and (d) 200 × 200 m.Considering the longer model run time associated with the finest resolution (15 days), the climate records were restricted to 3 years (October 2009-September 2010 [spin up] and October 2010-September 2012).
S out br = model output of base run simulation and S out i = model output of simulation i (i = 1 … m), m = 27 for the multiparameter analysis on the water flow component, 9 on the sediment transport component, and 4 for the simulation on model sensitivity to grid resolution.The catchment representation of grid size resolution (25 × 25, 50 × 50, 100 × 100, and 200 × 200 m) was assessed using probability density plots of the distribution of topographic slope and percentage of land cover represented.Model comparison with measured flow and sediment flux was undertaken for each of the grid size simulations using the same coefficients as in the model calibration and evaluation processes (NSE, μAbs diff [Qv, Qp, Sy, and Sp]).
Figure4c,d).Correspondingly, the absolute percentage of difference between the measured and simulated data (μAbs diff ) showed lower values for Qv (30%) than Qp (41%).A high variability was observed in the representation of sediment export compared with measured data; better prediction of sediment yield (Sy [R 2 = 0.4]) than sediment flux peaks (Sp [R 2 = 0.2]) was observed using the coefficient of determination (Figure4e-f).In contrast, the μAbs diff was higher for Sy (106%)

(
Sy) was most sensitive to the K f parameter, whereas the mean of sediment flux peaks (μSp) was more sensitive to K r .Nonetheless, SI differences between the erodibility coefficients (K r and K f ) were relatively small for both model responses (Sy and μSp).The multiparameter approach to sensitivity analysis on the water flow component showed how different combinations of the parameters, especially Stk and K sat , can produce similar percentage of differences (Pct diff ) with respect to the base run (Figure6a).For example, simulations S8 and S16 reduced flow volume (Qv) by 15%.Likewise, similar Pct diff in mean of flow peaks (μQp) was observed in simulations F I G U R E 3 Measured and simulated monthly mean runoff in (a) Blackwater catchment and (b) Kit Brook catchment of a sixth homogenous layer with 18.5 m depth (1.5-20 m).c Ratio of total leaf area to area of ground covered by vegetation.d Values varied over time.e Twenty-four events with NSE Q > 0.5.f Proportion of ground covered by vegetation.g Twenty-seven events with NSE Q > 0.5.h Proportion of ground shielded by near-ground cover.i Proportion shielded by ground-level cover.with different combination of K sat and Stk values.For example, the simulations S8 and S15 presented μQp changes of −66%; and S2 and S21 produced changes of approximately 53%.An equifinality problem, in terms of flow volume and peaks, appears when increases in K sat occur in combination with decreases in Stk, and vice versa.Although the shape of hydrographs varies between simulations, specifically, S8

F
I G U R E 4 (a) Coefficient of determination (R 2 Q ) between the measured and simulated discharge for each event in a monthly window, (b) coefficient of determination (R 2 Sed ) between the measured and simulated sediment flux for each event in a monthly window and plots of simulated versus measured events for (c) flow volume (Qv), (d) peak flow (Qp), (e) sediment yield (Sy), and (f) peak sediment flux (Sp) for the Blackwater catchment T A B L E 2 Comparison of model performance coefficients for the Blackwater and Kit Brook catchment simulations

Flow
measurements derived from the stage-discharge rating curve may contribute to event-scale model uncertainty.Nevertheless, underestimation of discharge peaks and accurate base flow was F I G U R E 5 (a) Coefficient of determination (R 2 Q ) between the measured and simulated discharge for each event in a monthly window, (b) coefficient of determination (R 2 Sed ) between the measured and simulated sediment flux for each event in a monthly window and plots of simulated versus measured events for (c) flow volume (Qv), (d) peak flow (Qp), (e) sediment yield (Sy), and (f) peak sediment flux (Sp) for Kit Brook catchment T A B L E 3 Sensitivity index (SI) of cumulative flow (Qv) and sediment yield (Sy), and mean of event peak discharge (μQp) and peak sediment flux (μSp) of the complete simulation period (October 2010-September 2014) ented lower values when compared with the best-fit calibration in Blackwater.This could be related to the necessary substitution of the HENSE soil type (covering 20% of the catchment) with the QUORND, based on their similar soil description as a result of the absence of soil parameters in the NSRI dataset (Cranfield University, 2018).Moreover, the lack of a land cover map produced by field-scale survey for explain that this lack of prediction performance in SHETRAN is a consequence of variation in soil cohesive strength, where increases in soil moisture can reduce soil cohesion and increase erodibility.The SHETRAN model does not represent spatial-temporal variability in soil erodibility within overland flow erosion.Seasonal changes in erosion are represented via variability in rainfall intensity in the raindrop/leaf drip F I G U R E 6 (a) Combinations of the changes in parameters (Stk, AE-PE, and K sat ) and the difference in flow volume (Qv) and mean of discharge peaks (μQp) for each simulation (S1-S27) with respect to the base run (S14), (b) combinations of the changes in parameters (K r and K f ) and the difference in sediment yield (Sy) and mean of sediment flux peaks (μSp) for each simulation (S1-S9) with respect to the base run (S5) impact equation and by temporal changes in the proportion of ground shielded by vegetation canopy(Lukey et al., 1995).Suspended sediment concentration (C)-discharge (Q) hysteresis behaviour was present in the majority of measured events and could influence SHETRAN sediment load predictions as the model predicts arrival of both fluxes at the same time(Adams et al., 2012;Elliott et al., 2012;Sheikh et al., 2010), which might be expected based on the sediment routing equation.It is possible that if major run-off F I G U R E 8 Percentage of changes between base run (50 × 50 m) and each DEM grid resolution simulation; (a) flow volume (Qv), mean of flow peaks (μQp) and mean of event flow duration (μQw), and (b) sediment yield (Sy), mean of sediment flux peaks (μSp), and mean of event sediment duration (μSw) T A B L E 4 Evaluation coefficients for each digital elevation model grid size simulation U R E 7 (a) Probability density plots of the grid slope for each Blackwater catchment digital elevation model (DEM) grid size; (b) land cover percentage for each Blackwater catchment DEM grid size; and Blackwater stream network (blue line) of each DEM resolution: (c) 25 × 25 m, (d) 50 × 50 m, (e) 100 × 100 m, and (f) 200 × 200 m in the channel network produced by the contrasting DEM grid resolutions (Figure7c-f).The differences between base run (50 × 50 m) coefficients in Tables2 and 4relate to the use of the first 2 years(2010)(2011)(2012) only for DEM simulations.The model performed better in wet than dry periods, and during the first year, rainfall (694 mm) was 68% of the annual average total during the complete measurement period (1,021 mm).Compared with measured data (2010-2012), the simulation with the best performance was 50 × 50 m (Table4); this was expected as it was calibrated.Nonetheless, similar model evaluation coefficients were observed between the 25 × 25-and 50 × 50 m simulations.It is possible that a better event prediction could be obtained by using the 25 × 25 m resolution for the calibration process but this represents an important compromise between model run times and grid size.5 | CONCLUSION SHETRAN showed reasonable performance in Blackwater and Kit Brook catchments by producing an accurate flow prediction for the 4-year period.Moreover, SHETRAN simulations of discharge and sediment flux performed well in terms of event timing in both catchments.Larger errors in event-scale estimation for both discharge and sediment yield were observed in Kit Brook than Blackwater.Adjustment of K sat for the lowermost subsoil-bedrock layer improved model event-scale accuracy in Kit Brook.Nonetheless, prediction of sediment yield and sediment flux peaks remained overestimated.The proxy-catchment test demonstrated that SHETRAN can predict event-scale flow with reasonable accuracy but requires catchment-specific calibration for sediment flux prediction.The SI showed that flow volume (Qv) was most sensitive to saturated hydraulic conductivity (K sat ) and flow peaks (μQp) to the Strickler coefficient (Stk), whereas sediment yield (Sy) was slightly more sensitive to the overland flow erodibility coefficient (K f ) and sediment flux peaks (Sp) to the raindrop/leaf soil erodibility coefficient (K r ).The multiparameter sensitivity analysis showed that potential equifinality behaviour occurring with K sat increases in combination with Stk decreases and vice versa.Similar effects were observed for combinations of K r and K f .The model showed better performance for the base run (50 × 50 m) than the other grid size simulations (range 25-to 200 m grid size) with small differences between the base run and the finest resolution (25 × 25 m).The DEM grid size variations showed the largest effects in event peaks and duration for both flow and sediment flux, with least variability in predicted event sediment yields.

Table 1
water flow component was calibrated by changing four parameters.The parameter value and NSE for continuous discharge and the μAbs diff (Qv and Qp) of the best fit simulation are presented in . The sequence in the table follows the order in which parameters were calibrated.After the best-fit discharge is obtained, four parameters were calibrated for the sediment transport component in which the μAbs diff (Sy and Sp) was calculated for each simulation.The second baseline simulation appearing in Table1corresponds to the last calibration for the water flow component and set-up values in the sediment transport component.The K f T A B L E 1 Calibrated simulations with the corresponding best fit parameter values (NSE and μAbs diff[Qv, Qp, Sy, and Sp])