Complexity in hydroecological modelling: A comparison of stepwise selection and information theory

Understanding of the hydroecological relationship is vital to maintaining the health of the river and thus its ecosystem. Stepwise selection is widely used to develop numerical models which represent these processes. Increasingly, however, there are questions over the suitability of the approach, and coupled with the increasing complexity of hydroecological modelling, there is a real need to consider alternative approaches. In this study, stepwise selection and information theory are employed to develop models which represent two realizations of the system which recognizes increasing complexity. The two approaches are assessed in terms of model structure, modelling error, and model (statistical) uncertainty. The results appear initially inconclusive, with the information theory approach leading to a reduction in modelling error but greater uncertainty. A Monte Carlo approach, used to explore this uncertainty, revealed modelling errors to be only slightly more distributed for the information theory approach. Consideration of the philosophical underpinnings of the two approaches provides greater clarity. Statistical uncertainty, as measured by information theory, will always be greater due to its consideration of two sources, parameter and model selection. Consequently, by encompassing greater information, the measure of statistical uncertainty is more realistic, making an information theory approach more reflective of the complexity in real‐world applications.

Hydroecological models are predominantly developed through statistical methods such as regression analysis, including multiple linear regression (e.g., Clarke &Dunbar, 2005, andMonk et al., 2007), and multilevel models (recent examples include Bradley et al., 2017, andChadd et al., 2017). Algorithms are commonly employed to do the "heavy lifting" in the determination of model structure; in hydroecology specifically, stepwise multiple regression is widely used.
Examples of the use of stepwise multiple regression in hydroecological modelling include Wood, Hannah, Agnew, and Petts (2001) for the identification of hydrological indicators of importance in a groundwater stream; Wood and Armitage (2004) to determine the influence of drought and low flow variability on macro-invertebrate abundance; Knight, Brian Gregory, and Wales (2008) to establish environmental flow requirements; Monk et al. (2007) and Worrall et al. (2014) on a Principal Component Analysis (PCA)-reduced set of hydrological indices; Surridge, Bizzi, and Castelletti (2014) included stepwise selection methods in their development of the iterative input variable selection algorithm (for the development of hydroecological models); Greenwood and Booker (2015) for the identification of important indices in the case of invertebrate response to floods; and Bradley et al. (2017) to realize important terms when considering the effects of groundwater abstraction and fine sediment pressures. Additionally, the authors have previously used a stepwise-based method as part of a preliminary analysis to identify a more complex aspect of the hydroecological relationship with regard to long-term flow variability and lag in ecological response (Visser et al., 2017). Nonstatistical hydroecological modelling is also known to make use of stepwise selection, for example, Parasiewicz et al. (2013) apply stepwise methods in their application of the MesoHABSIM model.
Stepwise methods are attractive, in general, as the statistical theory and assumptions are well established (Whittingham, Stephens, Bradbury, & Freckleton, 2006). Burnham and Anderson (2002) assert that they represent a particularly straightforward and accessible method for the nonstatistician. An algorithm adds and/or subtracts variables (indices) according to identified criteria, stopping once the criterion has been met, resulting in a single, final model. The assumption is that this single model represents the "best" model with the most predictive power.
Increasingly, there is widespread recognition of the limitations of stepwise methods, which have, in the past, been overlooked (Hurvich & Tsai, 1990;Steyerberg, Eijkemans, & Habbema, 1999;Whittingham et al., 2006). A model of a system is, by nature, only ever an approximation of reality; there is no such thing as a true model (Burnham & Anderson, 2002). Coupled with the increasing complexity of hydroecological modelling, the robustness and validity of the statistical approach is critical. In applied statistics, alternate modelling approaches are increasingly favoured, particularly in the ecological sciences (Burnham & Anderson, 2014;Hegyi & Garamszegi, 2011;Stephens, Buskirk, Hayward, & MartÍNez Del Rio, 2005;Wasserstein & Lazar, 2016;Whittingham et al., 2006). Alternate regression methodologies include partial least squares regression, an option when the predictors are not truly independent (common in hydroecological modelling); and shrinkage methods, where penalties/constraints are introduced; ridge and lasso regression can be effective when there are a large number of predictors (Dahlgren, 2010).
Since the beginning of the 21st century, three measures of statistical validity have been identified with unanimity across disciplines: effect size, levels of (statistical) uncertainty, and the weight of evidence supporting the hypothesis (Burnham & Anderson, 2002;Burnham, Anderson, & Huyvaert, 2011;Stephens et al., 2005;Wasserstein & Lazar, 2016;Whittingham et al., 2006). In asking what methods can satisfy these requirements, the field of information theory stands out as the dominant alternative (see position arguments and extensive discussion: Burnham et al., 2011 andWhittingham et al., 2006). Chadd et al. (2017) represents one of the few examples of the application of information theory in hydroecological modelling.
In this paper, the standard hydroecological approach for developing statistical models (stepwise selection) is compared with the increasingly popular information theory, now regularly utilized in applied ecology to investigate which is the most appropriate approach to model the complexities of the hydroecological relationship. Multiple regression models are developed for a groundwater-dominated catchment, where two scenarios of different levels of complexity are considered: The first features standard interannual variables, whereas the second considers lagged ecological response. The performance of each approach in each scenario is assessed.

| Catchment data
The River Nar has a distinctive change at its midpoint, from chalk to fen river. The focus of this paper is the 153.3 km 2 chalk subcatchment ( Figure 1). A reliance on groundwater and aquifer recharge (BFI 0.91) results in a highly seasonal flow regime (Sear, Newson, Old, & Hill, 2005). Aquifer recharge primarily occurs in the winter months, with a progressive rise in flow until March/April. Macro-invertebrates serve as the proxy for ecological response.
Response is determined using the Lotic-Invertebrate Index for Flow Evaluation (LIFE), accounting for macro-invertebrate flow velocity preferences (Extence, Balbi, & Chadd, 1999). Macro-invertebrate sampling data were provided by the Environment Agency for six sites ( Figure 1; EA, 2016); the sampling methodology follows the Environment Agency's standard semi-quantitative protocol (see Murray-Bligh (1999). Seventy-two macro-invertebrate samples, collected in the spring season (April-June, 1993-2012, were used to determine LIFE scores at the species level; see Figure 2 for the average spring LIFE scores during the study period. The ecological data were paired with the antecedent seasonal hydrologic indices.

| Modelling scenarios
The multiple linear regression modelling approaches are applied to two scenarios. In scenario A, the 10 (interannual) hydrologic indices described previously are considered. Scenario B incorporates ecological lag in response, a reflection of the inherent complexity of the hydroecological relationship. Following Visser et al. (2017), 30 hydrologic indices result from the interannual indices being time-offset up to 2 years (t-2).

| Stepwise regression
Two methods of stepwise selection are applied, backwards and bidirectional. Being unidirectional, backwards represents greater economy, performing fewer steps to select the smallest model. The algorithms are specified to remove variables which are not significant (alpha threshold = 0.05) and hence presumed unimportant to the hydroecological relationship. Bidirectional stepwise selection is applied using the function step, from the base statistical package stats, whereas the backwards algorithm is applied using the ols_step_backward function from olsrr, a package for the development of ordinary least squares regression models (Hebbali, 2017).
These methods yielded the same models, therefore no further differentiation is made.

| Information theory
The information theory approach provides a quantitative measure of support for candidate models. Subsequently, inference is made from multiple models through model averaging. The candidate models are evaluated with respect to the three steps detailed below; for further information, see Burnham and Anderson (2002).
Step 1. Loss of information from model f Kullback-Leibler measures the amount of information lost when model g is used to approximate reality, f . The model with the least information loss (greatest supporting evidence of the candidates) is considered the best approximation of reality.
The information loss, I( f , g), is determined through computation of an information criterion. The Akaike Information Criterion (AIC) represents the standard estimate (Burnham & Anderson, 2002). In hydroecological modelling, the sample size is often small relative to the number of variables; here, a second order bias correction, AICc, is used (Burnham & Anderson, 2002).
Step 2. Evidence in support of model g i The value of AICc is dependent on the scale of the data; the goal is to achieve the smallest loss of information. This difference is rescaled and ranked relative to the minimum value of AICc: This provides a measure of evidence, from which the likelihood that model g i is the best approximating model can be determined. This is known as the Akaike weight, w, ranging from 1 to 0, for the most and least likely models, respectively: Step 3. Multimodel inference The best approximating model is inferred from a weighted combination of all the candidates. Parameter averages, b θ, are the sum of the Akaike weights for each model containing the predictor, b θ: Parameter averages are ranked, such that the highest value represents the most important in the model.

| Package glmulti
There are two options for the application of information theory in R: MuMIn (Bartoń, 2018) and glmulti (Calcagno, 2013). The application of the former centres around "dredging" (data mining) to determine the model subset (e.g., see Grueber, Nakagawa, Laws, and Jamieson (2011)). The package glmulti offers apposite functionality (see below) and has been developed and applied in a relevant discipline (see). In glmulti, information theory is applied to subsets of models selected by a genetic algorithm (GA) from which the multimodel average is derived using the function coef. A GA is a type of optimization that mimics biological evolution. The GA incorporates an immigration operator, allowing reconsideration of removed variables. Immigration increases the level of randomisation and hence the likelihood of model convergence on the global optima (the best models from the available data) rather than some local optima (Calcagno & de Mazancourt, 2010). Inference from a consensus of five replicate GA runs has been shown by Calcagno and de Mazancourt (2010) to greatly improve convergence.

| Analysis
For each scenario/approach, the best approximating model is derived.
The comparative assessment looks at model structure, modelling error and statistical uncertainty.
The analysis of the model structures begins with a review of the selected indices and summary statistics (adjusted R-squared and P values). Being evidence-centric, these statistics are at odds with the underlying philosophies of information theory (revisited in the discussion). Instead, importance, the relative weight of evidence in support of each index in the model ( Step 3), is considered.
Model error assesses how well the given model simulates the data, here, the observed data. Analysis centres on relative error, defined as the measure of error difference divided by observed value.
These errors are presented as an observed-simulated plot. The distribution and magnitude of modelling errors is further considered through probability density functions.
Uncertainty is introduced throughout the modelling process. In this paper, the focus is on statistical uncertainty defined by Warmink et al. (2010, p.1520) as a measure of "the difference between a simulated value and an observation" and "the possible variation around the simulated and observed values," quantified as 1:96· ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi variance p , where 1.96 represents the 95% confidence level. Simply put the model with the least uncertainty, and hence, the most support should be the best representation of reality. In practice, statistical uncertainty dictates the usefulness of the model. Inaccurate appreciation of this uncertainty, however, prevents meaningful interpretation of the results, leading to less than optimal decision-making (Warmink et al., 2010).
The structure of the best approximating models is detailed in Table 1 and Figure 3  Summary statistics for the two best approximating models are detailed in Table 1

| Model error
Modelling errors are presented in Figure 4. Overall, there appear to be minimal differences between the two approaches, with the stepwise selected model featuring marginally less error. In Figure 4a, it can be seen that the models perform slightly worse at the extremes, with the stepwise model achieving a slightly better fit overall. This is further evidenced in Figure 4b, where errors can be seen to concentrate on the left. Finally, the fitted distributions in Figure 4c feature considerable overlap, further emphasizing the similarities in model performance.

| Model uncertainty
The statistical uncertainty, relative to the parameter estimate, is summarized in Figure 5 (facet 1); the stepwise selected model   3.2 | Scenario B

| Model structure
Here, the differences between model structures are greater than Scenario A (Table 1 and

| Model error
The errors associated with each model are detailed in Figure 7. At first glance, Figure 7a suggests that the models perform equally well for lower LIFE scores, whereas for higher values the information theory model provides marginally better estimates. This is reinforced in Figure 7b, where the relative errors are centred around 0% and −4% for the information theory and stepwise models, respectively. The stepwise model also has a tendency to overestimate.
The extent of these differences is evident in Figure 7c. For the information theory model, 56% of the estimated data points have 5% or less absolute relative error, in fact, almost 50% of the data has less than 2.5%. This is in direct contrast to the stepwise selected model, where only 15% of the data has less than 2.5% absolute relative error; this increases to approximately 48% at 5%. The models do not converge until approximately 9.25% absolute relative error, that is, the largest errors for both models are comparable.

| Model uncertainty
Relative to scenario A, these is an increase in the range of statistical uncertainty ( Figure 5, facets 2-4), particularly for the information  Figure 8a,b shows the information theory MC simulations to be more widely distributed than the stepwise. A snapshot of the error distributions at cumulative densities of 5/50/95% is shown in Figure 8c. It is noteworthy that, here, the range of densities on the y-axis is narrower than in scenario A. The difference is more marked at the 50% and 95% densities, where the distribution of the information theory simulations is flatter and wider, indicating a greater spread of error; in contrast, the error in the stepwise simulations tends towards the lower end.

| DISCUSSION
The initial focus herein is model inference and consideration of the explicit implications of the results. To gain further information on the relative strengths and weaknesses of the two approaches, it is necessary to look beneath the surface. The statistical robustness of the models produced is considered, as well as the underlying philosophies of each approach.

| Model inference
In scenario A, the principle difference in model structure is the parameterisation of summer low flows; information theory focuses on low flows (Q s 90) rather than moderate low flows (Q s 75). Consequently, the differences in modelling error is small. Consideration of statistical uncertainty and the error distributions reveals the stepwise selected model to be more balanced in terms of error distribution.
Despite demonstrated importance, as a groundwater-fed river (Sear et al., 2005), aquifer recharge is not recognized under the scenario B stepwise selected model. There is no consideration of hydrological winter and a low number of parameters overall; concerns thus emerge over the parameterisation of the stepwise selected model in this more complex scenario. In contrast, the information theory model includes seven hydrological indices, three of which reflect winter flows. The importance of these indices is further emphasized by the relatively high weight of evidence ( Figure 3, facets 2-4).
Given the difference in model structure, the similarities in modelling errors are unexpected. Interestingly, with the information theory model, the shape of the error distributions is consistent across the two scenarios, it is only the magnitude of the error that varies (increasing in the lagged scenario). In contrast, the stepwise selected approach sees an increase in error.
The increased uncertainty in scenario B can be considered a direct consequence of the increased modelling complexity. Figure 5 (facets 2-4) suggests that the stepwise model is subject to less uncertainty.
However, the MC simulations ( Figure 8) show that the associated error distributions are similar with regard to shape. However, the information theory curve is slightly flatter, leading to errors of higher magnitude.
Based on these findings, it could be concluded that these two hydroecological modelling approaches perform at similar levels. The principal area for concern may be the increase in statistical uncertainty for the information theory model in the lagged scenario. However, it should be noted that the reasons for the increased uncertainty are multifaceted, a matter discussed further below. Despite differences in model structure and statistical uncertainty, all models, and hence both approaches, have been able to provide satisfactory predictions with comparable modelling error.

| Philosophical underpinnings
Looking to the statistical robustness of the approaches, and underlying philosophies, may serve to further elucidate which method is most appropriate in the hydroecological setting. Considered herein are model selection, the definition of evidence, and statistical uncertainty.

| Model selection
In scenario A, the stepwise model was selected following six steps, that is, six hydroecological models were considered. In terms of their summary statistics, the models considered in the fourth and fifth steps are remarkably similar to the selected model; despite this, under this methodology, these models are rejected. Consequently, in the baseline scenario, hydrological indices capturing summer high flows (Q s 10 and Q s 25) were rejected as model parameters. It could be argued that, by simply making this observation, that this is an elementary form of multimodel inference. In practice, however, these second and third ranking models would not be subject to analysis, and thus, such information would be left unknown. Consequently, it is inferred that the selected model is the only model fit (Burnham & Anderson, 2002). In the alternate approach, information theory considers a larger candidate set, the result being a model-average. Consequently, important variables have not been subject to rejection. This is reinforced by the index of importance (Figure 3), an indication of the relative "impor-

| Evidence
The use of P values has been subject to considerable criticism in excess of 80 years (Burnham & Anderson, 2002). In the context of hydroecological modelling, the fundamental problem is with misinterpretation, where P values are interpreted as evidential. In a statistical sense, the P value is a measure of the probability that the effect seen is a product of random chance. Probability is a measure of uncertainty, not a measure of strength of evidence, which is based on likelihood. (Burnham & Anderson, 2002).
This misinterpretation is not exclusive to hydroecological modelling or stepwise selection, it is prevalent in academia (for example, see Wasserstein and Lazar (2016)). As such, this is such an ingrained error that it cannot be viewed as a criticism of the hydroecological modeller. In their paper on the development of the LIFE hydroecological index (Extence et al., 1999, p. 558

| Statistical uncertainty
Statistical uncertainty is the principal determinant of the usefulness and validity of a model. As suggested previously, given the lower uncertainty, it might be concluded that the stepwise approach performs better overall, particularly in the case of the more complex scenario B, where the uncertainty increases further. However, as discussed under methods, these two modelling approaches report different statistical uncertainties. The stepwise model considers parameter uncertainty, whereas the information theory model also quantifies error due to model selection, thereby providing a measure of the overall structural uncertainty. The subsequent higher uncertainty simply represents a more realistic measure, as Anderson, 2007 (p. 113) points out, when only parameter uncertainty is considered, the "confidence intervals are too narrow and achieved coverage will often be substantially less than the nominal level (e.g., 95%)."

| CONCLUSION
As further aspects of hydroecological relationships are understood, such as ecological lag in response, the likelihood of modelling errors and statistical uncertainty is increased, commensurate with the additional complexity. It is thus vital to ensure the modelling approach is suitably robust. Here, the performance of stepwise selection, one of the standard hydroecological approaches, is considered alongside an alternative popular in applied statistics, information theory. The best approximating models are analysed comparatively. The approaches are applied to two scenarios with increasing complexity: scenario A, focussing on standard interannual variables, and scenario B, taking into account any effect of lag in ecological response.
Notable differences in the models are confined to the lagged scenario. Of foremost concern is the structure of the stepwise selected model. Aquifer recharge is fundamental to flow in groundwater-fed rivers, which is a feature of the case study river examined.
In this paper, this physical property is assumed to be represented by the winter variables. In scenario A, this is accounted for through two winter low flow variables, Q w 75 and Q w 90. This is repeated in scenario B for the information theory model, plus an additional lagged high flow variable. Despite their recognized importance, the stepwise selected model includes no winter variables, leading to concerns over its ability to capture the essential physical processes in such complex scenarios.
In terms of model performance, the information theory approach resulted in fewer modelling errors but in greater statistical uncertainty. Despite this, the measure of uncertainty provided by stepwise selection is considered an underestimate (Burnham & Anderson, 2002) as the variance due to model selection is not incorporated; the estimate considers only parameter variance. It may seem contradictory to say that the model subject to greater uncertainty provides the better measure; however, the stepwise selected model inherently suffers from confidence intervals which are too narrow, with the achieved coverage being less than the standard, nominal 95% (Anderson, 2007).
From a utilitarian perspective, one might say that no approach has been demonstrated to be categorically better than the other. However, modelling is only ever an approximation of reality, and the best, true model remains an unknown. Still, in approaching the truth, we must have some criteria to adjudge success, and here, these have been identified as approaches which focus on effect size, statistical uncertainty, and the weight of evidence. Based on the results presented here, information theory satisfies these three best. In contrast, stepwise selection offers a P value, an arbitrary probability measure, which is not a measure of effect size. The uncertainty it measures is an underestimate and weight of evidence is not possible under this approach.
Finally, though no approach emerged a clear "winner," the information theory model still performed empirically better in the two scenarios considered. It presented significantly fewer modelling errors, and although the measure of statistical uncertainty is larger, and thus inconvenient, it may also be viewed as a truer representation of a complex reality, than that provided by its stepwise counterpart.