When good signatures go bad: Applying hydrologic signatures in large sample studies

Hydrologic signatures are quantitative metrics that describe streamflow statistics and dynamics. Signatures have many applications, including assessing habitat suitability and hydrologic alteration, calibrating and evaluating hydrologic models, defining similarity between watersheds and investigating watershed processes. Increasingly, signatures are being used in large sample studies to guide flow management and modelling at continental scales. Using signatures in studies involving 1000s of watersheds brings new challenges as it becomes impractical to examine signature parameters and behaviour in each watershed. For example, we might wish to check that signatures describing flood event characteristics have correctly identified event periods, that signature values have not been biassed by data errors, or that human and natural influences on signature values have been correctly interpreted. In this commentary, we draw from our collective experience to present case studies where naïve application of signatures fails to correctly identify streamflow dynamics. These include unusual precipitation or flow regimes, data quality issues, and signature use in human‐influenced watersheds. We conclude by providing guidance and recommendations on applying signatures in large sample studies.


| INTRODUCTION
Hydrologic signatures are quantitative metrics that describe streamflow statistics and dynamics. Examples include runoff ratio, baseflow index or slope of the flow duration curve. Signatures have many applications such as assessing habitat suitability and hydrologic alteration, evaluating hydrologic models, defining similarity between watersheds and investigating watershed processes (McMillan, 2021). Signature concepts have been extended to characterize soil moisture (Araki et al., 2022), water quality (Ebeling et al., 2021) and other hydrologic quantities.
From earlier uses in small numbers of watersheds, signatures are now frequently calculated in large samples of watersheds spanning national to global scales. This expansion is part of a wider move towards large sample studies that develop datasets and draw conclusions across scales, hydroclimates and ecosystems (Addor et al., 2020; Kratzert et al., 2023). Studies that calculate signature values over 100s of gauged watersheds include evaluating national models (Almagro et al., 2021; Coxon et al., 2019; Donnelly et al., 2016; Massmann, 2020; McMillan et al., 2016), selecting model structures (David et al., 2022), interpreting machine learning models (Botterill & McMillan, 2022; Kratzert et al., 2019), predicting signatures from watershed attributes (Addor et al., 2018; Beck et al., 2015; Grantham et al., 2022; Janssen & Ameli, 2021) and classifying watersheds (Kuentz et al., 2017). However, signature use is challenged by large datasets, as it becomes impractical to check whether the signature quantifies the intended hydrograph property in each watershed. Although difficulties in transferring knowledge between individual samples and large populations are well established (Jelinski & Wu, 1996), these issues have accelerated in hydrologic analysis with more widespread use of large sample datasets. As authors we have encountered these challenges, prompting us to write this commentary to share our experiences and provide guidance on applying hydrologic signatures in large sample studies.
1.1 | Challenges in using signatures over large samples
Some signatures such as statistics of flow magnitude or timing are simple to calculate and robust to many errors, uncertainties or unusual flow patterns. Other signatures, such as those requiring identification of storm events or recession periods, require more time-series processing and may be less robust. All signatures are impacted by missing data in the time-series, and large sample studies often tolerate some missing data to maximize the number of sites. Signature toolboxes such as TOSSH (Gnann, Coxon, et al., 2021), eflows (Patterson et al., 2020) or Pastas (Collenteur et al., 2019) may reject the time-series if missing data exceed a certain percentage of the record, or interpolate the data for smaller missing portions. Signature values can be biassed where missing data are systematic, such as missing data under freezing conditions or under extreme high or low flows.
Large sample studies require users to specify signature parameter values. Parameters change how signatures retrieve information from flow series and affect the resulting signature value. For example, recession analysis signatures are influenced by parameters for recession extraction and fitting (Dralle et al., 2017). Large sample studies often use constant parameter values (Addor et al., 2018; McMillan et al., 2022), but these may be unsuitable for some watersheds. Strategies to select parameter values include using sensitivity analysis to explore how parameter choices affect study conclusions (Tashie et al., 2020), conducting in-depth checks for representative watersheds such as one watershed per climate region where signature parameters are sensitive to climate, or deriving parameters from the flow regime (Stoelzle et al., 2020).
Large sample studies often include watersheds with reservoir storage and releases, abstractions or agricultural water use. In human-impacted watersheds in California, signatures could not be calculated because "the annual hydrograph was extremely different compared with the predicted reference condition. These instances would often lack a seasonal flow pattern that the flow calculator relies on to derive subsequent metrics" (Peek et al., 2022). Hydropower operations cause sudden flow changes that make computations of change rates problematic, invalidating recession signatures (Zmijewski & Wörman, 2016). Solutions include designing signatures to measure human impacts (e.g., Indicators of Hydrologic Alteration, IHA; Magilligan & Nislow, 2005), or using existing signatures such as high flashiness signifying urban effects (Smith & Smith, 2015).
A related challenge is signature interpretability. In small sample studies, researchers often have prior expertise (or a perceptual model) against which they interpret the signature values. In large sample studies, it is more difficult to determine whether the same signature values correspond with the same hydrologic characteristics. Our case studies include human-influenced watersheds, where it is difficult to distinguish the effects of reservoir operations from differences in natural flow regimes, and soil moisture seasonality, where bimodal distributions have multiple possible causes. In small-sample studies, signatures can be checked for correct interpretation, but in-depth checks are usually not feasible for 1000s of watersheds.

| Aims of the paper
In this paper, we present eight case studies illustrating challenges that arose in our work when calculating hydrologic signatures over large samples of watersheds. These include unusual precipitation regimes, data quality issues and human-influenced watersheds. For each study we discuss the aim, which signatures were calculated, the issues that occurred and the solution, if any. We use the lessons learnt to provide guidance on applying signatures in large sample studies.

| Event identification failed in Arizona watersheds with convective monsoon rainfall
We used signatures to investigate overland flow generated by convective monsoonal rainfall in 21 subcatchments of the San Pedro River in Southern Arizona. We calculated infiltration- and saturation-excess signatures from the TOSSH toolbox (Gnann, Coxon, et al., 2021), including the significance of thresholds in plots of event runoff versus rainfall depth or intensity, and coefficients in regression equations to predict event runoff. These signatures rely on event identification algorithms to calculate event rainfall and flow totals. The algorithms identify rainfall peaks and then find the corresponding runoff, or vice versa (McMillan et al., 2011; Tarasova et al., 2018). The algorithms use parameters such as thresholds for magnitude or separation time between events. Checks on our rainfall-based algorithm showed that afternoon rainfall on consecutive days was often merged into a single event because the algorithm specified a dry period of 12 h to separate events (Figure 1a).
Afternoon rainfall could cause a flow peak overlapping with the following day's rainfall, which would be mistakenly associated. In such watersheds, hourly data and careful parameter choice are needed to avoid conflating separate events. We used hourly data, but large sample studies may use daily values (e.g., McMillan et al., 2022), which would be insufficient to identify convective rainstorms.
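To illustrate how the dry-period parameter controls event separation, the following is a minimal sketch of a rainfall-based event identifier. It is not the TOSSH algorithm; the function name `identify_rain_events`, its parameters and the synthetic rainfall series are assumptions made for illustration only.

```python
from typing import List, Tuple


def identify_rain_events(
    rain_mm: List[float],
    min_dry_hours: int = 12,
    wet_threshold_mm: float = 0.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of rainfall events in an hourly series.

    Wet hours are grouped into one event unless they are separated by at
    least `min_dry_hours` consecutive dry hours (hypothetical rule).
    """
    events = []
    start = None      # index of the first wet hour of the current event
    last_wet = None   # index of the most recent wet hour
    for i, r in enumerate(rain_mm):
        if r > wet_threshold_mm:
            if start is None:
                start = i
            elif i - last_wet - 1 >= min_dry_hours:
                # Dry spell was long enough: close the previous event.
                events.append((start, last_wet))
                start = i
            last_wet = i
    if start is not None:
        events.append((start, last_wet))
    return events


# Synthetic 48 h series: two short storms separated by 10 dry hours.
rain = [0.0] * 48
rain[16:18] = [8.0, 5.0]
rain[28:30] = [6.0, 4.0]

merged = identify_rain_events(rain, min_dry_hours=12)  # one merged event
split = identify_rain_events(rain, min_dry_hours=6)    # two separate events
```

With a 12 h dry period the two storms are returned as a single event, whereas a 6 h dry period separates them, which is why the parameter needs checking against the local rainfall regime.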

| Event identification required manual checks in drizzly United Kingdom
In the United Kingdom (U.K.), most watersheds experience at least 0.2 mm of rainfall every other day, rising to 78% of days for the wettest watersheds. In a large sample U.K. study, we calculated the event runoff coefficient signature (the ratio between the direct runoff volume and event rainfall) to reveal the space-time controls on event characteristics (Zheng et al., 2023a, 2023b). We used flow-based event identification with hourly data (Giani et al., 2022), but short intervals between events sometimes led to the misattribution of rainfall to runoff events (Figure 1b). Given these issues, we used manual checks on 1144 events from the Swale River at Crakehill (NRFA ID 27071) to create event selection criteria. For events with rainfall after the runoff peak, the time interval between events was calculated. If this interval was less than the watershed response time, both events were excluded. These criteria led us to exclude 23 393 out of 903 745 events from 431 watersheds.
High exclusion rates occur in large watersheds (> 1000 km²) and those with long response times (> 1 d). In this and the previous example, a more complex algorithm that allows for the time of concentration is required to associate rainfall with runoff.
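The exclusion criterion described above can be sketched as follows. This is one possible reading of the rule, not the code used in the study: the `Event` fields, the use of end-to-start time as the interval, and the example values are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    start_h: float          # event start (hours since start of record)
    end_h: float            # event end (hours since start of record)
    rain_after_peak: bool   # True if rainfall fell after the runoff peak


def screen_events(events: List[Event], response_time_h: float) -> List[Event]:
    """Exclude event pairs separated by less than the response time.

    Only pairs whose earlier event has rainfall after its runoff peak
    are tested (assumed interpretation of the criterion).
    """
    ordered = sorted(events, key=lambda e: e.start_h)
    excluded = set()
    for i in range(len(ordered) - 1):
        first, second = ordered[i], ordered[i + 1]
        if first.rain_after_peak:
            interval = second.start_h - first.end_h
            if interval < response_time_h:
                excluded.update({i, i + 1})   # drop both events
    return [e for i, e in enumerate(ordered) if i not in excluded]


# Hypothetical events for a watershed with a 12 h response time.
events = [
    Event(0, 20, False),
    Event(40, 60, True),     # rain after peak, next event starts 5 h later
    Event(65, 90, False),
    Event(150, 170, True),   # rain after peak, next event starts 30 h later
    Event(200, 220, False),
]
kept = screen_events(events, response_time_h=12)
```

In this example the second and third events are excluded as a pair, while the fourth event is retained because the following event starts well after the response time has elapsed.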
2.1.3 | Soil moisture seasonality was misidentified in sites with multiple wet seasons in a year
We used large sample studies to investigate controls on soil moisture seasonality, particularly the persistent wet and dry seasons that often appear in soil moisture data (Araki et al., 2022, 2023). We used two signature types: the number of peaks of the soil moisture frequency distribution, and (for bimodal distributions only) dates and duration of transitions between wet and dry states. These signatures were developed for a study of 82 New Zealand sites (Branger & McMillan, 2020).
Further errors will occur if meteorological data are snipped to incorrect basin polygons, and used in signatures such as runoff ratio (Q/P) and streamflow elasticity (ΔQ/ΔP). We investigated the impacts of basin polygon errors in the CAMELS dataset of 671 U.S. basins (Addor et al., 2017; Newman et al., 2015). CAMELS includes streamflow observations in [feet³ s⁻¹] and two different area estimates for each basin from the GAGES II and Geospatial fabric datasets. We used the two area values to calculate three magnitude-based signatures (5th/95th percentile flows, daily mean flow) in [mm d⁻¹]. The differences are typically within ±5%, but sometimes substantial (Figure 3a-c). The resulting differences in annual streamflow exceed 100 mm for many basins (Figure 3d). Some CAMELS studies exclude watersheds with uncertain area (Knoben et al., 2020; Kratzert et al., 2019) but others use all or most watersheds, implicitly accepting this uncertainty (Addor et al., 2018; Tyralis et al., 2021). Any large sample study using magnitude-based signatures is susceptible to catchment area uncertainty, such as the event runoff coefficient study described previously (Zheng et al., 2023a).
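As a worked illustration of how area uncertainty propagates into magnitude-based signatures, the sketch below converts a flow in cubic feet per second to specific discharge in mm per day using two area estimates. The flow and area values are hypothetical and are not taken from CAMELS.

```python
FT3_TO_M3 = 0.3048 ** 3      # cubic feet to cubic metres
SECONDS_PER_DAY = 86_400


def cfs_to_mm_per_day(q_cfs: float, area_km2: float) -> float:
    """Convert discharge [ft3/s] to specific discharge [mm/d]."""
    volume_m3_per_day = q_cfs * FT3_TO_M3 * SECONDS_PER_DAY
    depth_m_per_day = volume_m3_per_day / (area_km2 * 1e6)
    return depth_m_per_day * 1000.0


q_cfs = 100.0                    # hypothetical mean daily flow
area_a, area_b = 250.0, 275.0    # two hypothetical area estimates [km2]

q_a = cfs_to_mm_per_day(q_cfs, area_a)
q_b = cfs_to_mm_per_day(q_cfs, area_b)

# Because area is a divisor, the relative change in the signature
# equals area_a / area_b - 1 regardless of the flow value.
relative_difference = (q_b - q_a) / q_a
```

Here a 10% larger area estimate lowers the specific discharge by about 9%, and the same relative error carries through to any signature that scales linearly with flow magnitude.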

| High uncertainties in event rainfall and flow totals in Arizona
In the previously described investigation of 21 subcatchments of the San Pedro River (Southern Arizona), we used the TOSSH toolbox (Gnann, Coxon, et al., 2021) to calculate event rainfall and flow totals.
TOSSH plots rainfall totals against flow totals for a manual error check, and we found an unexpected lack of correspondence (example in Figure 4). We performed extended checks for other errors (such as errors in rainfall or flow series extraction, incorrect timestamps, or coding errors) but found none. We suggest that rainfall totals from the NLDAS-2 national gridded product do not accurately measure areal average rainfall for these small watersheds, due to a large grid size (0.125°), relatively sparse rain gauge network, and convective monsoon rainfall characterized by highly variable local accumulations.
Further, the flow series contains significant periods of gap-filled data that may affect signature values: all three of the flow peaks in Figure 4 consist of gap-filled data. Our findings highlight the importance of considering local variations in data accuracy.

| Non-stationarity in soil moisture values invalidated magnitudes and dynamic range
Soil moisture data often exhibit increasing or decreasing trends over time. These trends may arise from non-stationarity in hydrologic processes or data quality issues, such as changes in sensor voltage power (Martini et al., 2015), oxidation of sensor rods, salinization, and soil compaction (Dorigo et al., 2013). These causes are difficult to distinguish and so trends are rarely addressed during quality control. Trends can impact various signatures, as estimated field capacity and wilting point signatures may change from year to year, and the shape of frequency distributions becomes unclear (Chandler et al., 2017; Araki et al., 2022; Figure 5). The dynamic range of soil moisture (the range between field capacity and wilting point, or between maximum and minimum values), often used as a normalization factor, may be overestimated. For some signatures, trending time-series should be excluded, such as for field capacity and wilting point estimates (Araki et al., 2022). Alternatively, observations can be detrended by subtracting a 2-year moving average (Basak et al., 2017). An example of this approach is shown in Figure 5b, with the shape of frequency distributions and dynamic range calculated from the detrended time-series.
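The moving-average detrending can be sketched as below. This is an illustrative implementation under stated assumptions: a centred window that shrinks near the ends of the record, the overall mean added back so values stay in soil moisture units, and a synthetic daily series with a seasonal cycle plus a linear drift. None of these choices are specified in the text.

```python
import math
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Centred moving average; the window shrinks near the series ends."""
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    half = window // 2
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def detrend(values: List[float], window: int) -> List[float]:
    """Subtract the moving average and restore the overall mean level."""
    trend = moving_average(values, window)
    mean_level = sum(values) / len(values)
    return [v - t + mean_level for v, t in zip(values, trend)]


def dynamic_range(values: List[float]) -> float:
    """Range between maximum and minimum values."""
    return max(values) - min(values)


# Synthetic 4-year daily series: seasonal cycle plus a sensor-like drift.
days = 4 * 365
raw = [
    0.30 + 0.10 * math.sin(2 * math.pi * d / 365) + 0.05 * d / 365
    for d in range(days)
]
clean = detrend(raw, window=2 * 365)   # 2-year window on daily data

raw_range = dynamic_range(raw)
clean_range = dynamic_range(clean)
```

In this synthetic case the drift inflates the dynamic range of the raw series, and the detrended series gives a smaller range that is closer to the seasonal amplitude. Edge effects from the shrinking window remain and would need checking on real data.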

| Signature issues caused by human impacts
Many signatures for ecohydrology and hydrological processes were designed for natural flow regimes (McMillan, 2020; Yarnell et al., 2020). In human-impacted watersheds, signatures can fail if expected flow patterns no longer occur, and anthropogenic changes can override natural differences.

| Flow duration curve slope was modified for flow series impacted by water supply reservoirs
We used signatures to investigate how reservoirs impact flow regimes in Great Britain (Salwey et al., 2023). To maximize storage, operators frequently release only the minimum flow required to protect downstream ecosystems (Maynard & Lane, 2012). Although pre-existing techniques (e.g., IHA) have been used to assess reservoir impact, we lacked the data to upscale these techniques (flow records do not predate reservoir construction, or there are insufficient locations with up- and downstream gauges). Instead, we investigated signatures that require only a downstream flow series. We considered signatures that quantify flow duration curve (FDC) shape, because reservoirs may modify the full range of streamflow. While the most common FDC signature is the slope of its central portion (Yilmaz et al., 2008), this signature was unreliable in reservoir-impacted watersheds. Operations such as routine environmental flow releases cause flat segments and abrupt changes in the FDC, meaning that the central portion cannot be approximated by a linear slope (Figure 6a-d).

| Stream temperature signatures are dominated by dam impacts
Human activity affects water quality signatures such as stream temperature indices that evaluate ecosystem health and nutrient cycling (Ficklin et al., 2023). Dams influence stream temperature regimes due to thermal stratification in reservoirs, with this water moved downstream via top release (warmer) or bottom release (colder) (Bonnema et al., 2020). For 57 eastern U.S. sites, we quantified thermal sensitivity, a signature that measures the strength of the relationship between air and water temperatures (Kelleher et al., 2012). In general, thermal sensitivity is lower in headwaters, where stream temperatures are buffered by groundwater and riparian shading, and higher in large watersheds where heat accumulates along the river. However, cool water releases below dams reduce thermal sensitivity values. Wade et al. (2023) trained random forest models on monthly thermal sensitivity values for 400 U.S. sites. They found that dam storage overwhelmed 23 other influences on river thermal regimes. Teasing out other influences required training models on sites with minimal dam impact. Dams also impact basic statistical metrics of water temperature: among 138 U.S. sites, dam-regulated sites had the warmest and coolest 20-year trends in maximum monthly water temperatures (Kelleher et al., 2021). Overall, dam operations influence short-term behaviour and long-term trends in water temperature signatures.
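A minimal sketch of a thermal sensitivity calculation is given below, assuming the signature is computed as the least-squares slope of water temperature against air temperature. The text does not state the formula, so this definition and the synthetic temperature values are assumptions.

```python
from typing import Sequence


def thermal_sensitivity(
    air_temp_c: Sequence[float], water_temp_c: Sequence[float]
) -> float:
    """Least-squares slope of water temperature against air temperature."""
    n = len(air_temp_c)
    if n != len(water_temp_c) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_air = sum(air_temp_c) / n
    mean_water = sum(water_temp_c) / n
    cov = sum(
        (a - mean_air) * (w - mean_water)
        for a, w in zip(air_temp_c, water_temp_c)
    )
    var = sum((a - mean_air) ** 2 for a in air_temp_c)
    if var == 0:
        raise ValueError("air temperature has no variance")
    return cov / var


# Synthetic paired temperatures [deg C].
air = [2.0, 8.0, 14.0, 20.0, 26.0]
unregulated = [4.0, 8.8, 13.6, 18.4, 23.2]   # water = 2.4 + 0.8 * air
below_dam = [6.0, 7.8, 9.6, 11.4, 13.2]      # water = 5.4 + 0.3 * air

ts_unregulated = thermal_sensitivity(air, unregulated)
ts_below_dam = thermal_sensitivity(air, below_dam)
```

The synthetic site below a dam has the lower slope, which is the pattern described for cool water releases. A low value on its own does not distinguish dam effects from groundwater buffering, which is the interpretation problem raised in this section.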

| Summary of case studies
In this commentary, we described multiple challenges that we encountered when calculating hydrologic signature values over large samples of watersheds (Table 1). Signatures designed for some hydroclimates may lead to incorrect interpretations in others, data quality may vary with location, and uncertainties may occur from unexpected sources.
When signatures are used in human-impacted watersheds, careful analysis is required to separate human impacts from natural processes. When signatures are applied over continental or global datasets, all these factors should be considered when interpreting signature values and patterns.

| Guidance for signature users
We recommend that signature users be mindful of the range of precipitation or flow regimes for which signatures were designed. This design might have been intentional, for example, functional flow metrics that quantify ecologically-important flow features are designed for region-specific natural flow regimes (Yarnell et al., 2020), or inadvertent, for example, soil moisture signatures that were designed for NZ watersheds, all of which have similarities in climate (Branger & McMillan, 2020). Where a large sample analysis includes watersheds with different regimes, signatures might not perform correctly. This is particularly the case for human-impacted watersheds where reservoirs or other infrastructure may cause unusual flow patterns. Such patterns might cause signature codes to fail, or might mimic unrelated natural processes, for example, watersheds with groundwater gains can mimic those with pumped reservoir storage. Signatures may further be designed for a specific temporal resolution, for example, signatures of flow alteration under hydropower require hourly data (Bevelhimer et al., 2015). We recommend that, especially where signature parameters must be selected, users should visually assess signature behaviour. For example, overlay event or recession periods on the hydrograph to check correct identification. For large samples, we encourage plotting for a subset of representative watersheds.
Users should also monitor the structure of missing data, to avoid systematic biases. Biases should also be considered if sites or events are excluded from signature analyses as a solution to data quality or other issues (Table 1). Signature values can be compared with distributions of values from large sample studies (see signature distributions for the U.S., Great Britain, Australia and Brazil in McMillan et al. (2022)). Values outside or at the extremes of those distributions may indicate errors, including those due to data quality. When interpreting high or low signature values, users should consider their perceptual model of watershed processes and how these affect the signature. We showed an example where bimodal soil moisture regimes have multiple interpretations; similarly, high BFI (base flow index) values could be caused by snow, permeable geology, or wetlands (Gnann, McMillan, et al., 2021). To facilitate such analyses, large sample perceptual model resources are becoming available (McMillan et al., 2023), although regional gaps still remain.
We encourage toolbox authors to help reduce signature errors in large sample studies. One approach is to include warning flags to identify non-behavioural data series as part of signature code, such as a warning that few recession segments could be identified which may invalidate recession signatures. In addition, toolboxes can include plotting functions to allow rapid visual inspection of signature behaviour (as recommended above). Signature creators are in a good position to design relevant visual checks due to their expert knowledge of the signatures. Known issues, limitations and robustness of signatures can be shared in readme documents, such as is implemented for eflows documentation (https://eflows.gitbook.io/project/known_issues). Documents should specify requirements for the treatment of missing and gap-filled data. Through the combined efforts of signature creators and signature users, hydrologists can benefit from accurate signature calculations and interpretations in large sample studies.
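A warning flag of the kind suggested above could look like the following sketch. It is not taken from any existing toolbox: the function names, the simple "strictly decreasing flow" definition of a recession, and the thresholds `min_length` and `min_segments` are illustrative assumptions.

```python
import warnings
from typing import List, Tuple


def recession_segments(flow: List[float], min_length: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) indices of runs where flow strictly decreases."""
    segments = []
    start = 0
    for i in range(1, len(flow) + 1):
        # A run ends at the end of the series or when flow stops decreasing.
        if i == len(flow) or flow[i] >= flow[i - 1]:
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = i
    return segments


def checked_recession_segments(
    flow: List[float], min_length: int = 5, min_segments: int = 10
) -> List[Tuple[int, int]]:
    """Extract recession segments and warn if too few were found."""
    segments = recession_segments(flow, min_length)
    if len(segments) < min_segments:
        warnings.warn(
            f"only {len(segments)} recession segments found "
            f"(fewer than {min_segments}); recession signatures may be unreliable",
            UserWarning,
        )
    return segments
```

A large sample workflow could record these warnings per watershed (for example with `warnings.catch_warnings(record=True)`) and use them to select sites for visual inspection rather than silently returning a signature value.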
TABLE 1 Summary of case studies described in this commentary.
may reject the time-series if missing data exceed a certain percentage of the record, or interpolate the data for smaller missing portions. Signature values can be biassed where missing data are systematic, such as missing data under freezing conditions or under extreme high or low flows.

| Signature issues caused by precipitation and flow regimes
Many signatures make assumptions about watershed precipitation and flow regimes. If the assumptions are invalidated, as shown in these examples, signature values can lead to incorrect interpretations.
Figure 2a,b). However, applications to a wider range of climates showed that bimodal distributions had multiple causes. Sites in Texas showed multiple wet seasons (winter, summer monsoon) and large dry downs during wet seasons due to well-drained soil. This resulted in strongly bimodal signatures despite weak seasonality (Figure 2c), and invalidated the algorithm for transition signatures that assumed only one wet season per year. Pseudo-seasonality also occurred in Maqu watershed, China, where soil freeze-thaw created a bimodal distribution unrelated to wet and dry seasons (Figure 2d). Manual checks of rainfall and soil temperature seasonality were needed when interpreting signature values in these different regimes.

2.2 | Signature issues caused by data quality
Signatures vary in their robustness to error types (Westerberg & McMillan, 2015). In large sample studies, data errors will vary site-to-site. In our examples, data quality motivated exclusion of data from specific sites or regions, or development of new processing steps to mitigate errors.

FIGURE 2 Peaks in the distribution of soil moisture identified under (a) a weakly seasonal climate in Germany, (b) a strongly seasonal climate in Australia, (c) a climate with two wet seasons per year in Texas, (d) a site with soil freeze-thaw in China.
2.2.1 | Issues with catchment area affect flow magnitude signatures
Catchment polygons describe the area upstream of a flow gauge, and may contain errors due to incorrect outlet coordinates (e.g., Arheimer et al., 2020), errors in catchment delineation method or elevation data errors. Inaccurate basin areas can have large impacts. Streamflow is often transformed from raw units [M³ T⁻¹] to specific discharge [M T⁻¹], by dividing by basin area. Area errors will lead to incorrect specific discharge, and propagate into signatures. Further errors will occur if meteorological data are snipped to incorrect basin polygons,
FIGURE 3 Differences in magnitude-based signatures due to differences in catchment areas used to convert flows from [M³ T⁻¹] to [M T⁻¹]. (d) Red arrow shows the location of the final data point outside the axis limits.
, with the shape of frequency distributions and dynamic range calculated from the detrended time-series.
FIGURE 4 Comparison of event totals for 2001-2022 and hourly time-series for August 2006 of areal-average NLDAS-2 rainfall and observed flow, USGS flow gauge 09470700 Banning Creek near Bisbee, watershed area 22.6 km².
FIGURE 5 Example of the need to detrend a soil moisture time-series to enable correct calculation of distribution type and dynamic range signatures.
Our solution was to develop a modified signature that quantifies deviations from an expected naturalized, sigmoidal FDC. High deviations reliably indicated large reservoirs. Despite this, we could not always distinguish natural and reservoir-impacted flow regimes. Natural groundwater gains can mimic pumped reservoir storage, and FDCs from ephemeral streams can mimic those where an upstream reservoir only releases water when full. Therefore, ignoring reservoirs in large sample studies risks mistaking reservoir-induced behaviours for unrelated natural processes.
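For reference, the central-slope FDC signature discussed for reservoir-impacted watersheds can be sketched as below. The 33% and 66% exceedance bounds, the log transform and the synthetic flows (including a constant compensation release of 20 units) are assumptions for illustration; this is not the modified signature developed in the study.

```python
import math
from typing import List


def flow_at_exceedance(flow: List[float], p: float) -> float:
    """Flow exceeded a fraction p of the time (linear interpolation)."""
    ranked = sorted(flow, reverse=True)
    pos = p * (len(ranked) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ranked) - 1)
    return ranked[lo] + (pos - lo) * (ranked[hi] - ranked[lo])


def fdc_mid_slope(flow: List[float], p_low: float = 0.33, p_high: float = 0.66) -> float:
    """Slope of the log-transformed FDC between two exceedance fractions."""
    q_low = flow_at_exceedance(flow, p_low)
    q_high = flow_at_exceedance(flow, p_high)
    if q_low <= 0 or q_high <= 0:
        raise ValueError("log slope is undefined for zero flows")
    return (math.log(q_low) - math.log(q_high)) / (p_high - p_low)


# Synthetic flows: a smooth natural regime, and the same regime with a
# constant minimum release that creates a flat segment in the FDC.
natural = [100.0 * 0.97 ** i for i in range(100)]
regulated = [max(q, 20.0) for q in natural]

slope_natural = fdc_mid_slope(natural)
slope_regulated = fdc_mid_slope(regulated)
```

In the regulated series the lower bound of the central segment falls on the flat release level, so the slope is reduced even though the upper part of the curve is unchanged. A single linear slope therefore summarizes two different behaviours, which illustrates why the signature was unreliable in these watersheds.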
of values from large sample studies (see signature distributions for the U.S., Great Britain, Australia and Brazil in McMillan et al. (2022)). Values outside or at the extremes of those distributions may indicate
FIGURE 6 Observed flow duration curves (red) and associated naturalized flow duration curves (black dashed) at example reservoir catchments in the U.K. (a) Ness at Ness-side -6007; (b) Brenig at Llyn Brenig -67 003; (c) Vyrnwy at Vyrnwy Reservoir -54 003; (d) Hodder at Stocks Reservoir -71 002.