Comment on “Hydrometeorological Triggers of Periglacial Debris Flows in the Zermatt Valley (Switzerland) Since 1864” by Michelle Schneuwly‐Bollschweiler and Markus Stoffel

Generalizing properties of samples derived via data analysis to advance understanding of underlying processes constitutes the essence of statistical inference. This is especially true in geophysical research, where general findings often emerge from case studies. Consequently, statements about trends in case studies should naturally be grounded on a comprehensive and robust analysis of available data. Here, we focus on trends reported for a debris flow time series of a well‐known case study in the Zermatt Valley, Switzerland (Schneuwly‐Bollschweiler & Stoffel, 2012; https://doi.org/10.1029/2011jf002262). We elaborate that we were not able to confirm the findings of the original study with respect to changes in debris flow seasonality. On the contrary, we show that confusion between data analytic question types (exploratory vs. inferential) can lead to the confirmation of trends that do not necessarily hold beyond the data set, and we demonstrate how the reported effect could be an artifact of incomplete data. In doing so we employ a range of methods that can be used for adequately addressing the original question.

Our comment refers, by way of example, to an analysis of debris flow activity published by Schneuwly-Bollschweiler and Stoffel (2012), which is based on a localized but long and continuous record. In their study, the authors presented an assessment using data from eight torrents in the Zermatt Valley (Valais, Switzerland) at elevations between 1900 m a.s.l and 4600 m a.s.l. The data set was based on dendrochronological reconstructions and daily records of meteorological and gauge measurements over a period of almost 150 years. With reference to the collected and reconstructed event data, the authors concluded that "the debris flow season at these high-altitude sites now [2007] is much longer (May to October) than it used to be in the late nineteenth century when activity was limited to June-September" (Schneuwly-Bollschweiler & Stoffel, 2012).
Based on extensive inferential data analysis, we were not able to reproduce the findings of Schneuwly-Bollschweiler and Stoffel (2012) with respect to changes in debris flow seasonality in the Zermatt Valley. By contrast, we show in accordance with Leek and Peng (2015) that the study confuses data analytic question types by mistaking exploratory data analysis for an inferential analysis. In addition, we demonstrate how the reported effect could be an artifact of incomplete data.
Conducting a completeness analysis in terms of reporting rate of the "Zermatt Valley" event data set therefore represented a first and essential step . The completeness analysis comprised a series of statistical tests for homogeneity in the reporting rate, employing methods based on the cumulative number of events (using different loss functions) as well as change-point analysis (using parametric, nonparametric, and Bayesian variants). These methods differ in how they account for different data-generating processes.
While the change-point methods indicated that the time series could be regarded homogeneous for the entire period (beginning in the year 1864), the methods based on the cumulative number of events suggested that the time series was complete only from 1916 onwards (Figure 1). Both the entire (1864-2007) and the shortened time series  were examined for trends in the number of events per year and for trends in seasonality.
In a second step, we assessed the presence of trends in event frequency ( Figure 1) following Schlögl et al. (2021), again based on a plurality of methods (Appendix A). While no trends in frequency could be detected in the full data set, the null hypothesis of no trend in the time series could only be rejected in the case of one test on the reduced time series (Appendix B).
In a final step we analyzed the existence of trends in seasonality, using quantile regression and the Theil-Sen slope, again following Schlögl et al. (2021). Additionally we investigated the stability of seasonality between 1864 and 2007 by analyzing the temporal concentration of debris flow events in the course of a year (cf. Appendix C). The temporal concentration, that is, the magnitude of seasonality, was expressed as the mean resultant length, which takes values between zero and one (Jammalamadaka & Sengupta, 2001). A value close to one indicates low spread (i.e., high seasonality), while a value close to zero indicates high spread (i.e., uniformity in time).
Since the event dates were partially uncertain, a Monte Carlo simulation was performed to account for event date uncertainty. For all uncertain time points, the day was perturbed randomly (uniformly distributed) in the range of ±7 and 14 days, respectively. The width of the interval depended on the quality of the observation as stated by the authors in the supplementary material. The regression models were estimated for the data generated in this way. The entire computation procedure was repeated 1,000 times.
For the entire event series, no trend was observed in the day of event occurrence (upper panel Figure 2). Both the quantile regression, yielding a slope of 0.05 (−0.19, 0.20), and the Theil-Sen slope with a value of 0.01 (−0.12, 0.04) included 0 as a possible slope value within the confidence interval. Furthermore, the null hypothesis of no trend in seasonality could not be rejected by any of the trend tests (Table B2).
Similarly, the reduced event series also did not provide indications of significant trends in seasonality. Both the quantile regression, yielding a slope of −2 (−7, 1), and the Theil-Sen slope with a value of −2 (−4, 0.8) included 0 as a possible slope value within the confidence interval. Given the fact that 0 just lies within the upper bound of the confidence interval, a possible shift of 1-2 days per 10 years cannot be ruled out, since such slight trends simply might not yet be detectable due to the lack of statistical power. However, such a slight negative trend would be low in magnitude in any case, compared to the apparent variability in the event series, which exhibits a standard deviation of 38 (33, 42) days (lower panel Figure 2). Furthermore, the null hypothesis of no trend in seasonality could not be rejected by any of the trend tests (Table B2).
The temporal concentration, which was high across the whole event series with values consistently above 0.75, showed a periodical behavior (see Figure 3). Compared to the concentration estimated from the base period suggested by Schneuwly-Bollschweiler and Stoffel (2012) (1864-1900), which exhibits a value of 0.92 (0.87, 0.97), the event series showed a decreasing concentration. However, if the base period was based on the first 30 years after completeness , no departure from the expected concentration of 0.85 (0.77, 0.92) was visible. Hence, if stricter completeness criteria were applied, the change in concentration, which would imply an extension of the debris flow season in summer, disappeared.
The analysis suggests that the debris-flow event time series, compiled by Schneuwly-Bollschweiler and Stoffel (2012), does not in fact constitute compelling evidence that would refute the hypotheses of (a) no change in the number of events per year, as well as (b) no change in seasonality in the period between 1864 and 2007. Solely Local polynomial regression with a 95% pointwise confidence interval based on 1,000 bootstrap samples is shown as red dashed line. The span was chosen for each series individually by means of generalized cross-validation. The gray dashed lines equal the midpoints of the months May, June, July, August, September, and October. Panels B and D show the median, 2.5% and 97.5% percentile of the slope of the quantile regression (open gray circles) and the median, 2.5% and 97.5% percentile of the Theil-Sen slope (filled black circles). Error bars correspond to the 2.5% and 97.5% percentiles estimated from the 1,000 Monte Carlo samples to incorporate the uncertainty in the date of the observations. based on the available data set, it is reasonable to conclude that the number of events and seasonality of debris flow occurrence in small, steep, permafrost-influenced catchments at high-altitude sites in the Zermatt Valley did not change between 1864 and 2007.
Consequently, we argue that the data set does not hold enough empiric evidence to warrant the conclusion that "the debris flow season at these high-altitude sites now [2007] is much longer (May to October) than it used to be in the late nineteenth century when activity was limited to June-September" (Schneuwly-Bollschweiler & Stoffel, 2012). In the original study, this conclusion on seasonal shifts was based on an exploratory analysis (in essence a percent stacked barplot) across four equally long time segments. However, a formal statistical analysis that could have quantified whether the observed trend pattern is significantly different from patterns that are to be expected under the assumption of no trend was lacking. Therefore, the study confuses data analytic question types by mistaking exploratory data analysis (real question type) for an inferential analysis (perceived question type; Leek & Peng, 2015). While exploratory data analysis "seeks to make discoveries, but can rarely confirm those discoveries," inferential data analysis "quantifies whether an observed pattern will likely hold beyond the data set in hand" (Leek & Peng, 2015).
From a classical inferential point of view, the lack of significant results in the trend tests on seasonality does challenge this conclusion of Schneuwly-Bollschweiler and Stoffel (2012), as does the variability in the concentration (cf. Figure 3) from a purely exploratory perspective.
Our comment illustrates the caveat of selection biases as well as the intricacies of statistical inference. Properties of the underlying data need to be thoroughly reviewed and analyses need to be conducted with care in order to avoid faulty conclusions. Consequently, careful consideration of the robustness of results and conclusions is of prime importance, especially in light of possible confirmation bias in the climate change discussion. We are convinced that local studies do provide important insights into the dynamics of debris flow frequency and seasonality due to climate change, yet we advocate that data analysis has to be performed with caution.

Appendix A: Trend Tests
A number of (nonparametric) trend tests were conducted for assessing the presence of trends in the data set.

A1. Mann-Kendall Trend Test
The Mann-Kendall trend test can be used to test for a monotonic trend in a time series z(t) based on the Kendall rank correlation τ of z(t) and t (Kendall, 1975;Mann, 1945). Specifically, it can be used to test the null hypothesis that the data are i.i.d. (independent and identically distributed), that is, do not follow a monotonic trend.

The Mann-Kendall test statistic S is calculated according to
in case of positive S it can be assumed that observations obtained later in time tend to be larger than observations made earlier, and vice versa.
The statistic S is approximately normally distributed. Using continuity correction, the z score can be computed as: the mean of S is μ = 0, and the variance (including the correction term for ties) is where p denotes the number of the tied groups in the data set and t j is the number of data points in the jth tied group.
The critical region can be obtained from the quantiles of the standard normal distribution. In the two-sided case, ±z 1−α/2 is the critical region. α denotes the type I error rate, that is, probability of falsely rejecting the null hypothesis.

A3. Wald-Wolfowitz Test
The Wald-Wolfowitz runs test (Wald & Wolfowitz, 1940) can be used to test the null hypothesis of randomness in a time series. It is based on transforming the series into a dichotomous vector, depending on whether each value is above or below a given threshold. Usually, the sample median is used as threshold, and values above it are coded as positive, while values below it are coded as negative. Observations equal to the median are dropped. A run is a subset of the transformed sequence, which is characterized by adjacent equal (i.e., positive or negative) elements. The number of runs r in a series of n = n + + n − elements has a conditional distribution given the observation of n + positive values and n − negative values, which is asymptotically normally distributed under the null hypothesis of randomness (cf. Central limit theorem).
The mean is estimated using the expected number of runs and the variance is estimated using the variance of the number of runs 2 2 = 2 + − (2 + − − + − −) the test statistic is the standard score with ± z 1−α/2 as critical region in the two-sided case.

A4. Theil-Sen Slope
The Theil-Sen estimator is a method for robustly estimating the slope in simple linear regression. The original method uses the median of a set of slopes (y j − y i )/(x j − x i ) for all pairs of n two-dimensional points (x i , y i ) to estimate the slope coefficient (Sen, 1968;Theil, 1950). In this comment we employ the repeated medians method of Siegel (1982), which is more robust and maintains a 50% breakdown point. This method is based on first determining the slopes of all lines between each observation (x i , y i ) and all other observations, and then computing the median slope m i related to each of the n observations. The final slope estimator is eventually determined as the median of all these n medians m i .

Appendix B: Trend Test Results
Quantile regression and the Theil-Sen slope on both the entire (1864-2007) and the shortened time series  data set resulted in a slope of 0. Furthermore, the mean trend of the local polynomial regression is included in the range of an ensemble of 10,000 local regression smoothers, which were calculated on the permuted data set ( Figure 1). According to the Bartels test, the Mann-Kendall trend test and the Wald-Wolfowitz test, the null hypothesis of no trend in the time series could only be rejected in the case of the Wald-Wolfowitz test on the reduced time series (Table B1).

Appendix C: Temporal Concentration
The temporal concentration in the course of the year, that is, the magnitude of seasonality, is given by the length of the vector 〈r〉 (Equation C1), with 〈r〉 ∈ [0, 1].
A value of 〈r〉 close to unity indicates low spread that is, high seasonality, while a value close to zero indicates high spread, that is, uniformity in time. Note that a cyclic pattern also results in 〈r〉 being near 0. 〈x〉 and 〈y〉 are defined as where 〈x〉 and 〈y〉 are the coordinates of the mean occurrence date and θ i is the angle of the ith event according to Equation C4. The transformation, projecting the days of the years on the unit circle, is given by Equation C4.
where ( * ) is the Julian day at which event i happened.

Data Availability Statement
The data that support the findings of this study are openly available as supporting information (Table S1) in Schneuwly-Bollschweiler and Stoffel (2012). The code used for data preparation, processing, and statistical analysis is available on GitLab at https://gitlab.com/Rexthor/debris-flow-trends-zermatt.