When is a hydrological model sufficiently calibrated to depict flow preferences of riverine species?

Riverine species have adapted to their environment, particularly to the hydrological regime. Hydrological models and the knowledge of species preferences are used to predict the impact of hydrological changes on species. Inevitably, hydrological model performance impacts how species are simulated. Using the example of macroinvertebrates in a lowland and a mountainous catchment, we investigate how hydrological model performance and the choice of the objective function, based on a set of 36 performance metrics, affect the prediction of species occurrences. Besides species abundance, we use the simulated community structure for an ecological assessment as applied for the Water Framework Directive. We investigate when a hydrological model is sufficiently calibrated to depict species abundance. For this, we postulate that performance is not sufficient when ecological assessments based on the simulated hydrology are significantly different (analysis of variance, p < .05) from the ecological assessments based on observations. The investigated range of hydrological model performance leads to considerable variability in species abundance in the two catchments. In the mountainous catchment, links between objective functions and the ecological assessment reveal a stronger dependency of the species on the discharge regime. In the lowland catchment, multiple stressors seem to mask the dependence of the species on discharge. The most suitable objective functions to calibrate the model for species assessments are the ones that incorporate hydrological indicators used for the species prediction.

, using microcosm experiments (Ceola et al., 2013), statistical models (Kakouei et al., 2018), or process-based models (Mondy & Schuwirth, 2017). In the absence of direct measurement data or for scenario assessments, modelled streamflow is often used as the data basis for such analyses. Although it is important to match the spatial scale at which streamflow is produced (e.g., catchment scale) to the scale at which species are modelled (e.g., habitat scale), uncertainties and inaccuracies in simulated streamflow remain and will consequently affect the simulated species.
Hydrological simulations are affected by the quality and the spatio-temporal simplification of the input data (Melsen et al., 2016); the choice of hydrological model type, model algorithms, and depicted processes (Fenicia, Kavetski, & Savenije, 2011); the type and mathematical formulation of the model algorithms (Clark et al., 2015); the uncertainty and equifinality of the model parameters (Beven, 2007); the quality and type of observations to which the model results are compared (e.g., time step interval, time period length, and quality of rating curves; Seibert & McDonnell, 2002); and the type and number of objective functions used to parameterize the model (Shafii & Tolson, 2015). The factor integrating all of these dependencies is the overall model performance, measured by a variety of hydrological metrics that compare simulations to observations (Reusser, Blume, Schaefli, & Zehe, 2009) and that are used for model calibration and validation.
The hydrological literature contains guidelines and thresholds for certain metrics that enable an assessment of when model performance is sufficient for hydrological applications; for instance, Ritter and Muñoz-Carpena (2013) list limits for the Nash-Sutcliffe efficiency (NSE) above which hydrological model performance is acceptable, good, and very good. Such recommendations do not exist for ecological applications: until recently, it was not possible to assess how hydrological model performance impacts species responses because, to our knowledge, no quantitative link between flow and macroinvertebrate abundances existed. For Germany, Kakouei, Kiesel, Kail, Pusch, and Jähnig (2017) established these flow-species linkages for macroinvertebrates. By applying these linkages to simulated streamflow, it can now be tested how modelled species abundance changes for different hydrological model performances. Kakouei et al. (2017) developed these linkages using the indicators of hydrological alteration (IHAs; Olden & Poff, 2003). Multiple studies showed that a successful representation of IHAs in hydrological models requires a targeted optimization process towards these IHAs (Pool, Vis, Knight, & Seibert, 2017). Kiesel et al. (2017) developed a methodology for a tailored optimization of hydrological models for these IHAs.
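As an illustration of the kind of metric these thresholds refer to, the following minimal sketch computes the NSE in its standard form, one minus the ratio of the squared simulation error to the variance of the observations. The function name and the discharge values are hypothetical and are not taken from the study.

```python
from statistics import fmean


def nash_sutcliffe_efficiency(observed, simulated):
    """Return NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("observed and simulated must be non-empty and equally long")
    mean_obs = fmean(observed)
    # Squared error of the simulation against the observations
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of the observations from their own mean
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined for constant observations")
    return 1.0 - residual / variance


# Hypothetical daily discharges (m^3/s), invented for illustration only
obs = [2.0, 3.5, 5.0, 4.0, 2.5]
sim = [2.2, 3.0, 5.4, 3.8, 2.9]
nse = nash_sutcliffe_efficiency(obs, sim)
print(f"NSE = {nse:.3f}")
```

An NSE of 1 indicates a perfect fit, and a value of 0 means the simulation is no better than using the mean of the observations as the prediction.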
However, a key problem in assessing when a hydrological model has sufficient performance to model species occurrences remains, because species abundance alone is not yet a clear indicator for a riverine ecosystem status. A complex assessment considering the ecoregion, stream type, species richness and diversity, as well as its community structure is needed to assess the health of the riverine ecosystem for the European Water Framework Directive. In Europe, the assessment calculations are supported by ASTERICS software (Hering, Borja, Carvalho, & Feld, 2013), which calculates the ecological status of rivers as different metrics based on benthic invertebrate taxa lists. The assessment metrics are defined in classes, and if similar classes arise from different assessments, the results can be considered stable and robust.
We address two research questions: (1) Do different objective functions and different model performances matter for predicting species occurrences? (2) When does a hydrological model have sufficient performance to simulate species occurrences so that ecological assessments based on this simulation are stable? Both are pertinent research questions because the improvement of hydrological model performance requires significant efforts in minimizing the effects in the above-mentioned dependencies on model performance and may limit the application of species predictions to well-researched and data-rich study regions.
To answer these questions, we will assess the importance of hydrological model performance for simulating macroinvertebrate species in two mesoscale catchments in Germany. Therefore, species predictions are made with hydrological model simulations optimized (a) to the exact species flow preferences (IHAs), (b) to multi-objective functions (MOFs) considering the trade-off between multiple flow preferences, and (c) to standard hydrological performance criteria (HPC) on daily, monthly, and annual time steps. To evaluate the significance of these optimization steps, a comparison is made to species predictions using the observed flow conditions and models without any optimization.
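One common way to express a trade-off between several indicators in a single objective is the Euclidean distance of the per-indicator errors from the ideal point of zero error. The sketch below assumes relative errors as the per-indicator measure; the function names, indicator labels, and values are hypothetical, and the exact formulation of the MOFs used in the study may differ.

```python
import math


def relative_error(simulated, observed):
    """Absolute deviation of the simulated from the observed value, relative to the observed value."""
    if observed == 0:
        raise ValueError("relative error is undefined for an observed value of zero")
    return abs(simulated - observed) / abs(observed)


def euclidean_distance_objective(sim_indicators, obs_indicators):
    """Combine the relative errors of several indicators into one distance (0 is ideal)."""
    errors = [
        relative_error(sim_indicators[name], obs_indicators[name])
        for name in obs_indicators
    ]
    return math.sqrt(sum(e ** 2 for e in errors))


# Hypothetical indicator values, invented for illustration only
obs_ind = {"duration": 10.0, "frequency": 4.0, "magnitude": 2.0}
sim_ind = {"duration": 13.0, "frequency": 4.0, "magnitude": 1.2}
distance = euclidean_distance_objective(sim_ind, obs_ind)
print(f"Euclidean distance = {distance:.3f}")
```

Minimizing such a distance penalizes a large error in any single indicator more strongly than the same total error spread evenly over all indicators.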

2 | MATERIALS AND METHODS
To test the impact of different model performances, we need to generate different hydrological model parameterizations. These models provide different discharge time series, to which we add the observed discharge to complete the set of discharges that is used for the analysis (Figure 1, Step 1). In Step 2, these different discharges are translated into five IHA metrics related to the duration, frequency, magnitude, rate of change, and timing of the discharge. In Step 3, these five IHA metrics are then used to predict species abundance for each catchment, species, and discharge time series separately; this is the basis to answer Research Question 1. In Step 4, based on the resulting species lists of Step 3, metrics are calculated that define the ecological status originating from the different discharges. In Step 5, the distribution of these ecological metrics is assessed according to their similarity, which is the basis to answer Research Question 2.

2.1 | Study areas
The methodology is applied in two mesoscale catchments in Germany (Figure 2 and Table 1). The Treene is a northern German lowland catchment where hydrological processes are governed by low hydraulic gradients, high groundwater influence, and agricultural land use, which led to artificial tile drainage of approximately one third of the catchment (Fohrer, Schmalz, Tavares, & Golon, 2007). The Treene contains the catchment of the Kielstau, Germany's first UNESCO ecohydrological demonstration site (Fohrer & Schmalz, 2012). The Kinzig, located in the mid-mountain range of Germany, is part of the Rhine-Main-Observatory and is a long-term ecological research site (Haase, Frenzel, Klotz, Musche, & Stoll, 2016). At this site, different

2.2 | Hydrological model
The Soil and Water Assessment Tool (SWAT) model (Arnold, Srinivasan, Muttiah, & Williams, 1998) in the version SWAT3S (Pfannerstill, Guse, & Fohrer, 2014a) was used to simulate the hydrological processes in the catchments. In contrast to the original SWAT model, SWAT3S uses two groundwater storages that can be independently controlled for groundwater flow into the stream and a third storage that may be used to account for percolation into geologic  (BGR, 1995). Climate data were derived from precipitation, temperature, wind speed, solar radiation, and humidity stations (DWD, 2016; Figure 2). Channel geometry was taken from satellite images (Google Earth, 2016) and field observations. Sowing, fertilization, harvest, and tillage data followed standard German agricultural practices (KTBL, 2009). Tile drains were implemented according to the methodology described by Guse, Reusser, and Fohrer (2014), where HRUs with slopes smaller than 1.25% and agricultural land use patterns and soils prone to water logging were classified as "drained." This parameterization is designated as the "default" run.

2.3 | Obtain discharges with different model performances (Step 1)
To obtain different simulated discharges, the hydrological models were run 20,000 times for a 6-year calibration period from 2010 to 2015. Parameter combinations were identified by Latin hypercube sampling of the parameter space presented in Table 2; these parameters influence the major hydrological processes of snow accumulation and snowmelt, surface runoff, soil moisture, and groundwater. The analysis was performed according to the methodology described by Pfannerstill, Guse, and Fohrer (2014b) using the R package FME for calibration, sensitivity, and Monte Carlo analysis (Soetaert & Petzoldt, 2010). Thirty-six metrics (Table 3) were calculated for all simulations to assess the model performances gained from the 20,000 parameterizations. The selected metrics can be categorized into three groups: nine IHAs, three MOFs, and 24 standard HPC.
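The principle of Latin hypercube sampling can be sketched in a few lines: each parameter range is divided into as many equal-probability strata as there are samples, one value is drawn per stratum, and the strata are shuffled independently for each parameter. The sketch below is a generic illustration in Python and not the FME implementation used in the study; the parameter names and ranges are hypothetical placeholders, not the values of Table 2.

```python
import random


def latin_hypercube(bounds, n_samples, seed=None):
    """Draw n_samples parameter sets so that each parameter range is stratified.

    bounds maps a parameter name to a (low, high) tuple.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        # One uniform draw inside each of the n_samples equal-width strata of [0, 1)
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired randomly across parameters
        rng.shuffle(points)
        columns[name] = [low + p * (high - low) for p in points]
    return [{name: columns[name][i] for name in bounds} for i in range(n_samples)]


# Hypothetical parameter names and ranges, invented for illustration only
bounds = {
    "snowmelt_temperature": (-2.0, 2.0),
    "runoff_curve_change": (-15.0, 15.0),
    "groundwater_delay_days": (1.0, 60.0),
}
samples = latin_hypercube(bounds, n_samples=10, seed=42)
for parameter_set in samples[:3]:
    print(parameter_set)
```

Each returned dictionary corresponds to one model run. Compared with plain random sampling, the stratification ensures that the full range of every parameter is covered even with a moderate number of runs.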
The IHAs were selected as optimization criteria to ensure that the hydrological model depicts the individual IHAs and, therefore, the species preferences as well as possible. This is necessary because

TABLE 1 Main physical, climatic, and hydrological characteristics and information about the macroinvertebrate species of the Treene and Kinzig catchments. Abbreviations: JJA, summer (June, July, and August); DJF, winter (December, January, and February).

(ED Extr) according to Richter, Baumgartner, Powell, and Braun (1996) because hydrological extremes significantly impact species occurrence (Stubbington et al., 2009).
The HPCs were selected to evaluate the impact of applying the optimization methodology commonly used in hydrological modelling.
Therefore, standard performance metrics were selected that were optimized on daily (subscript D), monthly (subscript M), and yearly

2.4 | Calculation of IHA metrics for species models (Step 2)
As described in Table 3, we selected dh4, fl2, ml16, ra7, and ta3 in the Treene and dh4, fl1, ml18, ra4, and th3 in the Kinzig for the simulation of species abundance, because these were found to be most important for the communities of stream macroinvertebrates in each catchment (Kakouei et al., 2018). This was done based on the observations as well

2.6 | Calculate ecological status (Step 4)
Biological diversity has widely been used to assess ecosystem health. To estimate whether a change in species abundance would result in an ecological effect, we computed different

TABLE 3 Description of metrics used for optimization

2.7 | Assessment (Step 5)
Finally, we compared the values of the four selected ecological assessment (ASTERICS) metrics resulting from the observations and the hydrological simulations over all the sampling sites. As long as no significant differences (p > .05) are detected between ecological status classes, it can be argued that the respective model simulations have no significant ecological effect and can therefore be accepted as suitable.
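The logic of this comparison can be illustrated with a small sketch. Because a parametric analysis of variance requires the F distribution, which is not available in the Python standard library, the sketch substitutes a permutation test on the one-way ANOVA F statistic; this is a stand-in chosen for self-containedness and not the procedure used in the study. The function names and the metric values per sampling site are hypothetical.

```python
import random
from statistics import fmean


def f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group mean squares."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = fmean(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_anova_p(groups, n_permutations=2000, seed=0):
    """Estimate the p value by shuffling the values across the groups."""
    rng = random.Random(seed)
    observed_f = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed_f:
            exceed += 1
    # Add one to numerator and denominator so the estimate is never exactly zero
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical assessment metric values at six sampling sites (illustration only)
from_observed = [0.62, 0.58, 0.65, 0.60, 0.63, 0.59]
from_model_a = [0.60, 0.59, 0.64, 0.61, 0.62, 0.60]
from_model_b = [0.45, 0.41, 0.48, 0.43, 0.46, 0.42]

p_a = permutation_anova_p([from_observed, from_model_a])
p_b = permutation_anova_p([from_observed, from_model_b])
print("model A acceptable:", p_a > 0.05)
print("model B acceptable:", p_b > 0.05)
```

Under the acceptance rule stated above, a simulation whose metric values give p > .05 against the observation-based values would be considered suitable, whereas a simulation with p < .05 would be considered to have a significant ecological effect.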
3 | RESULTS AND DISCUSSION

3.1 | Hydrological model optimization
The detailed statistical performance of the selected simulations is shown in Table 4 and in the Supporting Information (Tables S1 and S2).
Optimizing to the HPCs shows a difference between the optimi-
The resulting hydrographs selected from the 36 metrics (Table 3) and the default run are compared with the observed flow in Figure 4 to give a visual impression of the calculated performance statistics.
Analysing the daily flow values shows that 83% and 76% of the observed values are within the range of simulations in the Treene and Kinzig, respectively. The simulated low-flow periods show a high range in the Treene, which is due to the strong groundwater influence. In the Kinzig, the recession phases show a high range in the simulations. The default model setting causes a single high peak flow in January 2011. The results for all the species are shown in Figure 6, which shows the variability directly. In both catchments, a strong gradient can be observed between species that are sensitive to flow changes and those that are not (Figure 6). In the Treene catchment, the strongest variability was detected according to magnitude and rate of change in flow events, whereas changes in frequency and rate caused the strongest variability in the abundance of species in the Kinzig catchment.
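The share of observed daily flows that lie within the range of the simulations, as reported at the start of the paragraph above, can be computed by checking each time step against the minimum and maximum over all simulations. The following sketch assumes equally long daily series; the function name and all values are hypothetical.

```python
def envelope_coverage(observed, simulations):
    """Fraction of observed values within [min, max] of the simulations at each time step."""
    if not observed or not simulations:
        raise ValueError("observed and simulations must be non-empty")
    if any(len(sim) != len(observed) for sim in simulations):
        raise ValueError("every simulation must be as long as the observed series")
    inside = 0
    for t, obs in enumerate(observed):
        values = [sim[t] for sim in simulations]
        if min(values) <= obs <= max(values):
            inside += 1
    return inside / len(observed)


# Hypothetical daily discharges (m^3/s), invented for illustration only
observed = [1.0, 2.0, 4.0, 3.0, 1.5]
simulations = [
    [0.8, 2.2, 3.0, 2.5, 1.6],
    [1.1, 2.5, 3.5, 3.2, 1.9],
    [0.9, 1.8, 3.8, 2.9, 1.7],
]
coverage = envelope_coverage(observed, simulations)
print(f"{coverage:.0%} of observed values are within the simulation range")
```

In this invented example, the peak on the third day and the low flow on the fifth day fall outside the envelope, so three of five observed values are covered.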

3.2 | Species abundance
The full range of species abundance is supplied as box plots for each species in Figure S1a,b.
The bar plots enable the assessment of whether species are generally more susceptible to flow changes in all the IHA categories or whether they are "specialists" for certain IHA variables only. B. vernus is among the most sensitive species in the Kinzig for duration and frequency, but its sensitivity varies across metrics and catchments (Figure 6, red bars). These results suggest that for a holistic assessment of overall species abundance, it is necessary to optimize the hydrological model to sufficiently depict all the indicators simultaneously.

3.3 | Ecological assessment
The response of communities to the different species abundances observations. This is reasonable because those groups also lead to the highest agreement between the simulated and observed discharge patterns.
In the lowland catchment of the Treene, only the models calibrated towards daily PBIAS (PBIAS D), the ED between extremely low- and high-flow IHAs (ED Extr), and the IHA seasonal predictability of nonflooding (th3) lead to significantly different ASTERICS metrics of the GD and GSIs.
If the results are grouped according to each IHA group within the four assessment metrics, it can be seen that the magnitude group in the Kinzig and the rate group in the Treene are subject to significant changes (Figure S3a,b). Similar to Figure 7, over all the IHA groups, more pronounced changes are found in the Kinzig.
A possible explanation for the smaller differences in the Treene compared with those in the Kinzig is that the species in the lowland show a stronger dependency on water quality and river morphology and less on the discharge pattern (Kiesel et al., 2015; Schröder et al., 2013). Discharge in the lowlands is generally less erratic and smoother than that of more mountainous catchments due to higher groundwater influence (Guse et al., 2019), whereas water quality and morphological degradation are more of a concern due to the high agricultural impact (Wagner, Hörmann, Schmalz, & Fohrer, 2018).

4 | CONCLUSIONS AND OUTLOOK
Our results show that the objective function and model performance influence the prediction of species occurrences and that different calibration efforts lead to different simulated species abundances (Research Question 1). As expected, these results are species dependent, where specialists that accept specific ranges of streamflows are more sensitive than generalists that are distributed over larger flow ranges (Kakouei et al., 2018). Hence, the species response to different calibration stages depends on the sensitivity of the species to the particular IHA and how well the model is able to replicate this IHA. These results are different for the two catchments, indicating that different stressors in the catchments lead to different species sensitivities to flow changes. To deduce more generalized results from the proposed method, the application of the method to a higher number of heterogeneous catchments is needed. This could potentially reveal the spatial differences between species sensitivity to flow changes.
Research Question 2 (sufficient performance to simulate species occurrences so that ecological assessments are stable) was answered by calculating the ecological assessments from all the simulated species lists and statistically evaluating their similarity. In the Kinzig, plausible results were found: models that performed poorly hydrologically against observed flows generally led to significantly different ecological assessments. In the Treene, no clear pattern between hydrological model performance and significantly different ecological assessments could be found; for instance, even the default model setting led to no significant differences. However, a direct comparison between the hydrological model performance and the number of significant changes in the ecological status classes in both catchments revealed that skilled hydrological models are sufficient to depict species responses, that is, lead to no significant differences between the status classes. This may provide a first careful threshold, but due to our small sample size of two catchments, we argue that studies assessing the impact of hydrological change on species should not evaluate the calibration performance on HPC alone. Until larger catchment sample studies lead to more robust results, it is necessary to first assess the performance of the model to predict the metric used for the species prediction and second assess the sensitivity of the species to this metric. Additionally, although stream discharge is a significant descriptor of macroinvertebrate abundance, our study shows that in catchments where multiple stressors, such as lower water quality and morphological degradation, occur, multiple stressors should be considered in the species simulation.

ACKNOWLEDGEMENTS
This study was funded through the "GLANCE" project (Global change effects in river ecosystems; 01LN1320A) supported by the BMBF.

FIGURE (caption fragment) below) of all the calibration runs and the corresponding significant changes in the ecological status classes (sum of red asterisks in Figure 7 for each column); the red line shows the cut-off model performance above which no significant changes occur.