A classification approach to reconstruct local daily drying dynamics at headwater streams

Headwater streams (HSs) are generally naturally prone to flow intermittence. These intermittent rivers and ephemeral streams have recently seen a marked increase in interest, especially to assess the impact of drying on aquatic ecosystems. The two objectives of this work are (a) to identify the main drivers of flow intermittence dynamics in HS and (b) to reconstruct local daily drying dynamics. Discrete flow states—“flowing” versus “drying”—are modelled as functions of covariates that include information on climate, hydrology, groundwater, and basin descriptors. Three classifiers to estimate flow states using covariates are tested on four contrasted regions in France: (a) a linear classifier with regularization (LASSO for least absolute shrinkage and selection operator) and two non‐linear non‐parametric classifiers, (b) a one‐hidden‐layer feedforward artificial neural network (ANN) classifier, and (c) a random forest (RF) classifier. The three classifiers are compared with a benchmark classifier (BC) that simply estimates dominant flow state for each month based on observations (without using covariates). The performance assessment over the period 2012–2016 carried out by cross‐validation shows that the three classifiers for flow state based on covariates outperformed the BC. This demonstrates the predictive power of the covariates. ANN is the classifier that globally achieves the best performance to predict the daily drying dynamics whereas both RF and LASSO tend to underestimate the proportion of drying states. The covariates are ranked in terms of relevance for each classifier. The monthly proportion of drying states provided by the discrete observation network has a major importance for the three classifiers ANN, LASSO, and RF. This may reflect the proclivity of a site to flow intermittence. ANN gives higher importance to climatic and hydrological covariates and its non‐linearity allows a greater degree of freedom.

Due to their upstream position in the network, their size, and their high reactivity to natural or human disturbances, HS are generally naturally prone to flow intermittence (Datry, Larned, & Tockner, 2014; Fritz et al., 2013). Intermittent rivers and ephemeral streams (IRES) are defined by periodic flow cessation and may experience partial or complete drying at some locations in time and space (Larned et al., 2010; Leigh et al., 2016). They range from ephemeral streams that flow a few days after rainfall to intermittent rivers that recede to isolated pools. IRES have seen a marked increase in interest, stimulated by the challenges of water management in the context of global change (Acuña et al., 2014) and by the need to improve existing knowledge on aquatic ecosystems in IRES (Larned et al., 2010; Leigh & Datry, 2017; Sarremejane et al., 2017; Stubbington, England, Wood, & Sefton, 2017).
Citizen science creates opportunities to overcome the lack of hydrological data and may help densify the flow-state observation network (Buytaert et al., 2014; Datry, Pella, Leigh, Bonada, & Hugueny, 2016; Turner & Richter, 2011; van Meerveld, Vis, & Seibert, 2017). In France, new sources of observational data are available thanks to the Observatoire National des Etiages network (ONDE; https://onde.eaufrance.fr; Nowak & Durozoi, 2012). This network, unique in Europe in its coverage, its number of monitored sites, and the regularity of its observations, provides frequent discrete field observations (at least five inspections per year) of flow intermittence at more than 3,300 sites throughout France, located mostly in headwater areas.
However, discrete observations of intermittence made at irregular intervals, and at most weekly, cannot provide information on the persistence of dry conditions at daily temporal resolution. Thus, continuous time series of flow states are needed. Beaufort, Lamouroux, Pella, Datry, and Sauquet (2018) succeeded in relating ONDE observations to continuous hydrological and groundwater level data to predict the daily probability of drying at the regional scale and obtained robust predictions over France. However, as predictions are aggregated over large areas, this approach cannot differentiate the temporal variability of the "drying" state between neighbouring streams. Spatial variability of flow intermittence may be high, and understanding local drying dynamics is crucial. Hence, the main objective of this work is to extend this previous study. Specifically, we aim at (a) identifying the main drivers of the flow intermittence dynamics in HS and (b) reconstructing the daily drying dynamics of HS at the local scale. To achieve these objectives, discrete flow states ("flowing" versus "drying") from the ONDE observations are modelled as functions of covariates with continuous time series that include information on climate, hydrology, groundwater level, and basin descriptors.
The paper is organized in six parts. Section 2 describes the general modelling framework developed to predict flow states. Section 3 introduces the study area and the data, and Section 4 presents the performance assessment protocol. Results are presented in Section 5 and discussed in Section 6 before drawing general conclusions in Section 7.

| STATISTICAL FRAMEWORK FOR MODELLING DAILY DRYING DYNAMICS
Drying dynamics can be reconstructed from a classifier that relates flow states to covariates. More specifically, the classifier is calibrated to estimate, for each day, the probability of the drying state given a set of covariates, described in Sections 3.2 to 3.5 and summarized in Table 1, which introduce information on climate, groundwater level, hydrology, and basin descriptors. The flow state predicted at each ONDE site on a given day relies on the information provided by local and regional covariates observed at various dates at other ONDE sites in the same region. Because the observation period is limited, the predictions are restricted to the period between 1 May and 30 September of each year (see Section 3.2).
Four classifiers are considered: (a) a benchmark classifier (BC), which does not use any covariates but is entirely based on the historical proportions of observed flow states; (b) the so-called LASSO (least absolute shrinkage and selection operator) classifier, which is a linear classifier; (c) an artificial neural network (ANN) classifier, which is potentially non-linear but encompasses a linear classifier as a special case; and (d) a random forest (RF) classifier, which can also be non-linear but follows a different strategy from ANN. Each classifier relies on a function f to estimate the flow state at day d at each site from covariates g_1 to g_n, which depend on time and include hydro-climatic covariates, with x representing the location of the site.
Calibrating classifiers independently at each site is not possible because there are too few ONDE observations (32 observations per site on average between 2012 and 2016). The suggested approach for calibrating f, together with its assumptions, is commonly adopted in regionalization in hydrology (e.g., the index flood method for flood frequency analysis; Dalrymple, 1960): f also takes as inputs covariates e_1 to e_m that characterize the location x of the site. A performance analysis, focusing on four contrasted regions in France, is carried out over the 6-year period 2012-2017 to assess the ability of the classifiers to simulate the daily drying dynamics at ONDE sites and to compare their accuracy. In a second step, the influence of the covariates in each classifier is examined and the main environmental drivers of flow intermittence are identified.

| Benchmark classifier
BC is a simple classifier without any covariates that predicts, at a given site, the flow state that is the most frequently observed historically for the month considered. When there is a tie, drying is pre-
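The BC rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `fit_benchmark`, the tuple layout of the observations, and the toy data are hypothetical, and resolving ties in favour of drying is our reading of the truncated sentence above and should be treated as an assumption.

```python
from collections import Counter, defaultdict


def fit_benchmark(observations):
    """Return {(site, month): dominant flow state} from discrete observations.

    observations: iterable of (site, month, state) tuples, where state is
    either "flowing" or "drying" (hypothetical data layout).
    """
    counts = defaultdict(Counter)
    for site, month, state in observations:
        counts[(site, month)][state] += 1

    table = {}
    for key, c in counts.items():
        # Assumption: a tie between the two states is resolved as "drying".
        table[key] = "drying" if c["drying"] >= c["flowing"] else "flowing"
    return table


# Toy observations (invented for illustration only).
obs = [
    ("site_A", 7, "drying"), ("site_A", 7, "flowing"), ("site_A", 7, "drying"),
    ("site_A", 5, "flowing"), ("site_A", 5, "flowing"),
    ("site_B", 8, "drying"), ("site_B", 8, "flowing"),  # tie
]
bc = fit_benchmark(obs)
```

Because the lookup table depends only on the site and the month, every day of a given month receives the same predicted state, which is why the BC cannot follow year-to-year variations in drying.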

| LASSO classifier
The LASSO (Tibshirani, 1996) classifier estimates the drying state probability as a linear function of the covariates transformed with a sigmoid function to constrain the range to [0,1]. LASSO includes a regularization mechanism that may lead to a sparse model in which the coefficients of less relevant covariates are driven to zero (Bishop, 2006). The amount of regularization is determined through cross-validation (R package "elasticnet"; Zou & Hastie, 2018). To classify into either flowing or drying states, an optimal threshold is set after a second cross-validation procedure using the amount of regularization determined previously and leading to the best F-score (see Section 4.4; Equation 9). The relevance of each covariate is inspected directly through the magnitude of the associated coefficient estimated by the LASSO classifier. The LASSO method was recently considered in a hydrological application with other linear and nonlinear regression techniques to predict synthetic design hydrographs for ungauged catchments (Brunner et al., 2018).
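The two ingredients of this classifier, a sigmoid applied to a linear combination of covariates and a classification threshold chosen to maximize the F-score, can be illustrated as follows. This is a sketch under stated assumptions: the coefficients would normally come from a fitted LASSO model, the F-score is taken in its usual F1 form (Equation 9 is not reproduced here), and the threshold is picked on a single set of predictions rather than by the second cross-validation used in the study. All names and numbers are hypothetical.

```python
import math


def drying_probability(covariates, coefficients, intercept):
    """Sigmoid of a linear function of the covariates, bounded to [0, 1]."""
    z = intercept + sum(b * x for b, x in zip(coefficients, covariates))
    return 1.0 / (1.0 + math.exp(-z))


def f_score(observed, predicted):
    """F1 score for the drying class (True = drying); assumed form."""
    hits = sum(1 for o, p in zip(observed, predicted) if o and p)
    false_alarms = sum(1 for o, p in zip(observed, predicted) if not o and p)
    misses = sum(1 for o, p in zip(observed, predicted) if o and not p)
    if hits == 0:
        return 0.0
    precision = hits / (hits + false_alarms)
    recall = hits / (hits + misses)
    return 2 * precision * recall / (precision + recall)


def best_threshold(probabilities, observed, candidates):
    """Candidate threshold giving the highest F-score."""
    return max(
        candidates,
        key=lambda t: f_score(observed, [p >= t for p in probabilities]),
    )


# Toy example: a zero coefficient mimics a covariate discarded by LASSO.
p = drying_probability([0.8, 0.1, 0.3], [-2.0, 0.0, 1.5], 0.2)

probabilities = [0.1, 0.4, 0.6, 0.8]
observed = [False, False, True, True]
chosen = best_threshold(probabilities, observed, [0.3, 0.5, 0.7])
```

Selecting the threshold on the F-score rather than fixing it at 0.5 matters when drying states are rare, since a default threshold would then tend to favour the flowing state.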

| ANN classifier
ANNs (feed-forward neural networks with one hidden layer) estimate the drying state probability as a potentially non-linear function of covariates. This is a non-parametric approach that combines the contributions of the neurons in the hidden layer to build an approximation. The number of neurons is related to the number of coefficients and hence to the complexity of the classifier. As for LASSO, a sigmoid function is applied to constrain the range of the ANN output to [0,1] (see the implementation in the R package "nnet"; Venables & Ripley, 2002). We include a direct connection between inputs and outputs so that the case with zero hidden units corresponds to a linear relationship. Weight decay regularization, which corresponds to the penalty used in ridge regression, is applied to control overfitting by shrinking less relevant coefficients. Both the number of hidden units and the amount of weight decay are selected with a first cross-validation procedure (Bishop, 2006). As for the LASSO classifier, an optimal threshold is then set with a second cross-validation procedure, using the number of hidden units and the amount of weight decay determined previously, so as to obtain the best F-score.
The LASSO classifier can be thought of as a particular case of the ANN with no hidden units although the regularization mechanisms are different.
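The relationship between the two classifiers can be made concrete with a forward pass of a one-hidden-layer network that includes a direct input-output connection. The sketch below is illustrative only: the function and argument names are hypothetical, logistic hidden units are assumed, and the weights are arbitrary rather than fitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ann_drying_probability(x, skip_weights, bias, hidden=()):
    """Drying probability from a one-hidden-layer network with skip connections.

    hidden: sequence of (input_weights, hidden_bias, output_weight), one
    entry per hidden unit. With no hidden unit the model reduces to a
    sigmoid of a linear function of the covariates.
    """
    # Direct (skip) connection from the inputs to the output.
    z = bias + sum(w * xi for w, xi in zip(skip_weights, x))
    # Contribution of each hidden unit (logistic activation assumed).
    for w_in, b_h, w_out in hidden:
        activation = sigmoid(b_h + sum(w * xi for w, xi in zip(w_in, x)))
        z += w_out * activation
    return sigmoid(z)


x = [1.0, 2.0]
linear_case = ann_drying_probability(x, [0.5, -0.25], 0.0)
with_hidden = ann_drying_probability(
    x, [0.5, -0.25], 0.0, hidden=[([0.0, 0.0], 0.0, 2.0)]
)
```

With `hidden` left empty, the output has the same functional form as the LASSO classifier; only the way the coefficients are penalized during calibration differs.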
To quantify the relevance of the different covariates, the connection weight approach (Olden & Jackson, 2002; Olden, Joy, & Death, 2004) is employed.
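In its basic form, the connection weight approach scores each input by summing, over the hidden units, the product of the input-hidden weight and the hidden-output weight. The sketch below illustrates this basic form only. How the direct input-output connections of the network are handled is not specified here, so they are left out, and the function name and weight values are hypothetical.

```python
def connection_weight_importance(input_hidden, hidden_output):
    """Connection weight score of each input covariate.

    input_hidden[j][i]: weight from input i to hidden unit j.
    hidden_output[j]:   weight from hidden unit j to the output.
    """
    n_inputs = len(input_hidden[0])
    n_hidden = len(hidden_output)
    return [
        sum(input_hidden[j][i] * hidden_output[j] for j in range(n_hidden))
        for i in range(n_inputs)
    ]


# Two inputs, two hidden units (arbitrary weights).
scores = connection_weight_importance(
    input_hidden=[[1.0, -2.0], [0.5, 0.5]],
    hidden_output=[2.0, -1.0],
)
```

The sign of a score indicates the direction of the overall effect of a covariate on the output, and its magnitude can be used to rank covariates.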

| RF classifier
An RF combines decision trees obtained by resampling the calibration set (Breiman, 2001). Each tree is a structure made of binary nodes associated with binary rules of the type V ≤ s versus V > s, where V is one of the covariates and s is a bound. When reaching a terminal node, a majority vote is taken amongst the observations belonging to the node. A single decision tree tends to yield non-robust estimates (very dependent on the selected calibration set), and combining the trees in a forest circumvents this issue. We use the implementation in the R package "randomForest" (Liaw & Wiener, 2002). The relevance of each covariate is given directly by the randomForest package, which determines how much the mean square error in prediction increases when that covariate is randomly permuted within the tree. RF models have been recently used to predict the spatial distribution of intermittent and perennial rivers at the basin scale.
The four selected hydro-ecoregions (HERs; Table 2) are representative of most HERs in France except for mountainous regions. HER1 is distinguished by its hard, impermeable, and noncarbonated primary rocks, a landform of hills, and an oceanic climate. HER2 is a lowland region with an altitude of less than 200 m. The subsoil is mainly composed of carbonated sedimentary rocks.

3.2 | ONDE dataset: A discrete national flow-state observations network
The ONDE network was set up in 2012 by the French Biodiversity Agency (AFB, formerly ONEMA). Its aim is to constitute a perennial network recording summer low-flow levels that can be used to anticipate and manage water crises during severe drought events (Nowak & Durozoi, 2012).
The ONDE network is stable over time and distributed throughout France, with 3,300 sites regularly inspected (Figure 1).
ONDE sites are located on HS with a Strahler order strictly less than five and are balanced across HERs to account for the representativeness of the hydrological contexts (Nowak & Durozoi, 2012). There [...] (2008) [...] LeRoy Poff, 2015).
In this study, we consider stations as intermittent when five consecutive days with discharge less than 1 l/s are observed during the observation period.
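The intermittence criterion for gauging stations amounts to searching a daily discharge series for a run of at least five consecutive days below 1 l/s. A minimal sketch follows. The function name is hypothetical, the series is assumed to be complete and expressed in l/s, and the treatment of missing days is not addressed.

```python
def is_intermittent(daily_discharge_ls, threshold_ls=1.0, min_run=5):
    """True if at least `min_run` consecutive days fall below `threshold_ls`."""
    run = 0
    for q in daily_discharge_ls:
        # Extend the current low-flow run, or reset it on a day above threshold.
        run = run + 1 if q < threshold_ls else 0
        if run >= min_run:
            return True
    return False


dry_spell = is_intermittent([3.0, 0.2, 0.1, 0.0, 0.0, 0.9, 4.0])
interrupted = is_intermittent([0.5] * 4 + [2.0] + [0.5] * 4)
```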

| Explanatory groundwater level dataset
Daily groundwater levels are provided by the ADES database (http://www.ades.eaufrance.fr) at sites involved in groundwater/surface water exchanges (Brugeron, Allier, & Klinka, 2012). This dataset is composed of 750 piezometers with daily groundwater level data available from 2011 to 2017 and with less than 5% of missing data (continuous or not). The level of alteration of groundwater levels by water withdrawal is unknown because no information is available at this scale.
Groundwater level data are used as covariates. A post-processing similar to the one applied to daily discharge is applied to groundwater levels (Figure A1), except for the first step, which consists of selecting all piezometers located in the same HER instead of the same HER-HR combination. In a second step, in each HER, three covariates are computed: the average nonexceedance frequency of the observed groundwater level (a) on dayObs (FGw0), (b) over the 5 days before dayObs (FGw5), and (c) over the 10 days before dayObs (FGw10).
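The computation of FGw0, FGw5, and FGw10 can be sketched as follows. Several choices below are assumptions rather than details taken from the study: the nonexceedance frequency is estimated empirically from ranks within each piezometer record, frequencies are averaged across piezometers before being averaged in time, the 5-day and 10-day windows exclude dayObs itself, and the records are complete. Function names and data are hypothetical.

```python
from bisect import bisect_right


def nonexceedance_frequency(series):
    """Empirical share of the record with a level <= the level of each day."""
    ordered = sorted(series)
    n = len(series)
    return [bisect_right(ordered, v) / n for v in series]


def groundwater_covariates(levels_by_piezometer, day_obs):
    """FGw0, FGw5 and FGw10 for one HER at day index `day_obs`."""
    if day_obs < 10:
        raise ValueError("at least 10 days are needed before day_obs")
    freqs = [nonexceedance_frequency(s) for s in levels_by_piezometer]
    n_days = len(freqs[0])
    # Average across all piezometers of the HER, day by day.
    regional = [sum(f[d] for f in freqs) / len(freqs) for d in range(n_days)]

    def mean_before(k):
        # Assumption: the window covers the k days preceding day_obs.
        window = regional[day_obs - k:day_obs]
        return sum(window) / len(window)

    return {
        "FGw0": regional[day_obs],
        "FGw5": mean_before(5),
        "FGw10": mean_before(10),
    }


# Two toy piezometers over 12 days: one rising steadily, one constant.
levels = [[float(d) for d in range(1, 13)], [5.0] * 12]
cov = groundwater_covariates(levels, day_obs=11)
```

Working with nonexceedance frequencies rather than raw levels puts piezometers with different absolute levels and amplitudes on a common [0, 1] scale before they are averaged.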

| Explanatory meteorological dataset
Daily meteorological covariates are taken from the SAFRAN dataset

| PERFORMANCE ASSESSMENT AND COMPARISON
The performance of the classifiers is evaluated on the four selected HERs (Figure 1). The calibration and validation methods are described in the next sections.

| Cross-validation over 2012-2016
A cross-validation procedure is carried out for each classifier in each HER. The calibration set is built by randomly selecting 80% of the observations. The test set consists of the remaining 20%. Once the classifiers are trained on the calibration set, the evaluation criteria (see Section 4.4) are calculated from the predictions on the test set.
This step is repeated 20 times in order to evaluate the uncertainty associated with the selection of the calibration set.
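The resampling scheme described above (20 random 80/20 splits) can be sketched as follows. The function name, the fixed seed, and the number of observations are hypothetical, and the calibration and scoring of each classifier are left out.

```python
import random


def repeated_holdout(n_observations, n_repeats=20, calibration_share=0.8, seed=0):
    """Yield (calibration_indices, test_indices) for each random split."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    n_cal = round(calibration_share * n_observations)
    for _ in range(n_repeats):
        indices = list(range(n_observations))
        rng.shuffle(indices)
        yield indices[:n_cal], indices[n_cal:]


splits = list(repeated_holdout(50))
```

Each split would be used to calibrate a classifier on the first index set and to compute the evaluation criteria on the second; the spread of the criteria over the 20 repetitions then reflects the sensitivity to the choice of calibration set.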

| Extrapolation ability over 2017
In order to assess the extrapolation ability of the classifiers, they were

| Spatio-temporal extrapolation ability
In order to evaluate their spatio-temporal extrapolation ability over

| Evaluation criteria
Several validation criteria are calculated to compare the performance of the classifiers. First, criteria based on a 2 × 2 contingency table (Table 3) are used to evaluate the ability of the classifiers to accurately predict the flow state of a stream on a given day.
Derived from the contingency table, five criteria (Equations 5 to 9) are calculated to assess classifier performance: the probability of detection (POD; best value is 100%), the false alarm ratio (FAR; best value is 0%), the precision (best value is 1), the recall (best value is 1), and the F-score (best value is 1).
$$\mathrm{POD} = \frac{a}{a + c} \times 100$$
$$\mathrm{Recall} = \frac{a}{a + c}$$
In addition, the proportions of observed and predicted days with a drying state, named P_obs and P_pred, respectively, at gauging station i are compared to measure the spatio-temporal extrapolation ability (Section 4.3). The bias and the root mean square error are calculated for a performance assessment at the HER scale, where i is a gauging station located inside a given selected HER and G is the total number of gauging stations located in each HER.
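For readers who want to reproduce these criteria, a compact sketch is given below. Only the POD and recall formulas are taken from the text. The roles of a (hits) and c (misses) are inferred from those two formulas, while b (false alarms), the FAR, precision, and F-score expressions, and the bias and RMSE computed as means over the G stations of P_pred minus P_obs are the usual definitions and should be read as assumptions. Zero denominators are not handled.

```python
import math


def contingency_scores(a, b, c):
    """Scores from a 2 x 2 contingency table for the drying state.

    a: drying predicted and observed (hits)
    b: drying predicted, flowing observed (false alarms; assumed label)
    c: flowing predicted, drying observed (misses)
    """
    precision = a / (a + b)
    recall = a / (a + c)
    return {
        "POD": 100.0 * a / (a + c),
        "FAR": 100.0 * b / (a + b),
        "precision": precision,
        "recall": recall,
        "F-score": 2 * precision * recall / (precision + recall),
    }


def bias_and_rmse(p_obs, p_pred):
    """Bias and RMSE of predicted versus observed drying proportions (in %)."""
    n_stations = len(p_obs)  # G in the text
    errors = [pred - obs for obs, pred in zip(p_obs, p_pred)]
    bias = sum(errors) / n_stations
    rmse = math.sqrt(sum(e * e for e in errors) / n_stations)
    return bias, rmse


scores = contingency_scores(a=30, b=10, c=20)
bias, rmse = bias_and_rmse(p_obs=[10.0, 20.0, 0.0], p_pred=[13.0, 16.0, 1.0])
```

With this sign convention, a positive bias means that the classifier predicts more drying days than observed, which matches the way biases are interpreted in the results.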

| Evaluation of covariate contribution
In this evaluation, the covariates are grouped according to their type defined in Table 1.
In cross-validation over 2012-2016, the three classifiers based on covariates outperformed the BC. This can be seen from the POD, FAR, and F-scores in Figure 2. The LASSO classifier, although linear, performed only slightly worse than the ANN classifier, its non-linear counterpart (see the POD and FAR scores in Figure 2). Amongst the two non-linear classifiers, RF achieved the overall best performance in the four HERs (see the three scores in Figure 2).
The performance obtained by the two non-linear classifiers is very close, and the best POD is obtained by ANN in all selected HERs (Figure 2a). However, RF minimizes the FAR and obtains a better F-score than ANN on average over the four HERs.

| Performance of classifiers in extrapolation over 2017
The predictions in extrapolation over 2017 show that BC obtains the best POD, which is greater than 60% in the four HERs (Figure 3a). However, its FARs are very high and exceed 50%, so BC tends to strongly overestimate drying states in 2017, especially in HER1 and HER3 (Figure 3b). ANN obtains a POD higher than 50% and a FAR significantly lower than that of BC whatever the HER. The ANN classifier obtains the best F-score for each selected HER (F-score > 0.5).
The performance of LASSO and RF is rather moderate. They obtain very low PODs, lower than 30% in HER1, HER2, and HER3, but also a very low FAR, reaching zero for HER3. We deduce that these two classifiers tend to underestimate the number of drying states extrapolated in 2017. Their F-score is lower than that of BC in HER2 but remains very close to the F-score of ANN in HER4, which is the HER where the most drying states are observed.

| Performance of the spatio-temporal extrapolation over the period 2012-2016
For the three classifiers ANN, LASSO, and RF, P_pred is close to P_obs, especially for gauging stations whose proportions of drying states are greater than 20% (Figure 4). For stations where the proportion of drying states is lower than 20%, the accuracy of the classifiers is more contrasted.
Overall, ANN achieves the best performance, with an average root mean square error (RMSE) of 3.3% over the four HERs (Table 4). The RMSE is similar and close to 3% for each HER. ANN tends to slightly overestimate the proportion of observed drying states, as illustrated by positive biases and a FAR close to 40% in the four HERs (Table 4). On the other hand, ANN tends to predict very short drying states, persisting less than 2 days, at perennial stations, leading to a predicted proportion of drying states of less than 1% (Figure 4a). These incorrectly predicted drying states correspond to periods of severe low flows, that is, when most of the daily flows stay below the 90th quantile of the flow duration curve. Thus, although these periods are not strictly speaking drying states, they are consistent with the ANN classifier predictions.

| Identification of covariate contributions to drying state predictions
The analysis of the contributions of the covariates is summarized in [...] (Table 2).
ANN is the classifier that achieves the best performance for weakly intermittent HS. Classifiers tend to underestimate drying states with an observed frequency smaller than 20% (Figure 4). It may happen that no drying state is observed because zero-flow events are rare, so that the value of the monthly proportion of drying states (MPD) is zero whatever the month of the year. The high importance of this covariate in the classifiers then leads to an underestimation of predicted drying states or to the HS being considered perennial. This underlines the importance of a calibration dataset composed of years with contrasted climate, which allows a better representation of extreme events. Future observations of the ONDE network will make it possible to sample more contrasted situations and to better identify sites impacted by flow intermittence, leading to an updated ranking of the three classifiers.

| Drivers of flow intermittence
MPD is considered very important for the three classifiers. This covariate reflects the level of flow intermittence of each ONDE site [...] indicators in an IRES. The regional climate patterns (Snelder et al., 2013) [...]
One of the perspectives of this work would be to explore how the non-linearity of ANN is used to predict daily drying dynamics. Studying the statistical decision boundary of ANN, especially for weakly intermittent stations, constitutes a first step toward this goal. Another perspective concerns the determination of additional, more locally defined covariates that could be tested to analyse their added value in local drying predictions. All these approaches will have to be studied in contrasted climatic and environmental situations in order to accurately assess the performance of each classifier.
Building on this first application, the next step will be to extend the approach to all HERs in France. It is then conceivable to use the results of our models, that is, the reconstructed drying dynamics, in ecological studies that focus on the distribution and persistence of aquatic communities in response to flow alterations.
In addition, this work could be relevant for watershed management. [...] identify which metrics are the most relevant for detecting human impacts (Datry, Bonada, & Boulton, 2017). Many IRES occur in the headwaters of perennial systems, and the need for their conservation is widely recognized, especially for the supply of good quality water (Lowe & Likens, 2005). The reconstruction of local drying dynamics could guide stakeholders in improving the ecological restoration and protection of IRES.
Note. ANN: artificial neural network; HER: hydro-ecoregion; LASSO: least absolute shrinkage and selection operator; ONDE: Observatoire National des Etiages; RF: random forest. The first value is the cumulative importance of each covariate in a given type and the value in brackets represents the average importance of covariates.