Forests and floods: Using field evidence to reconcile analysis methods

The extent to which forests, relative to shorter vegetation, mitigate flood peak discharges remains controversial and relatively poorly researched, with only a few significant field studies. Considering the effect purely of change of vegetation cover, peak flow magnitude comparisons for paired catchments have suggested that forests do not mitigate large floods, whereas flood frequency comparisons have shown that forests mitigate frequencies over all magnitudes of flood. This study investigates the apparent inconsistency using field‐based evidence from four contrasting field programmes at scales of 0.34–3.1 km². Repeated patterns are identified that provide strong evidence of real effects with physical explanations. Magnitude and frequency comparisons are both relevant to the impact of forests on peak discharges but address different questions. Both can show a convergence of response between forested and grassland/logged states at the highest recorded flows but the associated return periods may be quite variable and are subject to estimation uncertainty. For low to moderate events, the forested catchments have a lower peak magnitude for a given frequency than the grassland/logged catchments. Depending on antecedent soil saturation, a given storm may nevertheless generate peak discharges of the same magnitude for both catchment states but these peaks will have different return periods. The effect purely of change in vegetation cover may be modified by additional forestry interventions, such as road networks and drainage ditches which, by effectively increasing the drainage density, may increase peak flows for all event magnitudes. For all the sites, forest cover substantially reduces annual runoff.

worldwide that forests prevent floods (e.g., Cambrian Wildwood, 2015; Confor, 2018). The main factors behind the controversy are disagreement over the means of quantifying the flood impact of forests and a relative lack of observational data, especially for large floods and for distinguishing between the effect of forests purely as a vegetation cover and the effects of management interventions, such as logging patterns and road networks. This paper addresses these points as follows.
1 Comparisons of peak flow magnitudes from paired catchments with and without forest cover for the same rainfall event (the so-called equal meteorology or chronological pairing method) suggest that forests (purely as a vegetation cover) can mitigate peak discharges for small to moderate floods but not extreme floods (e.g., Beschta, Pyles, Skaugset, & Surfleet, 2000; Thomas & Megahan, 1998). The higher annual evapotranspiration of forested catchments creates, on average, larger soil moisture deficits and therefore, during a storm event, the soil absorbs more of the rainfall that would otherwise contribute to flood runoff. For large events, this buffering effect is overwhelmed, and peak discharges are little affected by vegetation cover (e.g., Soulsby, Dick, Scheliga, & Tetzlaff, 2017). However, Alila and associates (Alila, and any other influences vary over time. Alila and associates show on this basis that, in contradiction to the above, forest logging significantly increases the frequency and magnitude of peak discharges relative to the unlogged state and this effect increases with increasing peak magnitude. Therefore, and for the first time, this paper tests the hypothesis that the two, apparently conflicting, approaches can be reconciled and that they are both of value.
2 The value of the methods turns on the questions which they address. Green and Alila (2012) propose the question: what is the change in magnitude (frequency) for an event of a specific frequency (magnitude) of interest? Clearly this is addressed by frequency pairing. Flood frequency curves are a standard means of characterizing a catchment response and link flood magnitude to flood frequency. Frequency pairing therefore addresses public concerns that loss of forest cover increases the frequency of floods. It is also very relevant to engineering projects which design for an event with a specific exceedance frequency, such as the 100-year flood.
Similarly, channel stability, as defined by channel geometry, is understood to vary with the flood frequency regime. However, there is another question: would the peak discharge have been as big if the forest cover had not been removed? This cannot be dismissed as irrelevant, because it is one that is asked by affected citizens (e.g., "But many whose properties were pulverized asked if clear-cutting boosted the magnitude of the flood." [Fowler, 2018]), who might for example wish to lodge a claim for damages against a forest company. There is a general public belief that removal of forest cover increases discharge peaks (e.g., Cambrian Wildwood, 2015; Confor, 2018).
Affected citizens may not therefore be so interested in (or may not understand) details of flood frequency; they are more likely to be concerned about the magnitude of a specific event. It is the responsibility of the hydrologist, therefore, at least to examine the question and to determine how well it can be answered. Here, frequency pairing may be less relevant because it provides an overall long-term characterization of flood response and does not comment on individual storms. By contrast, chronological pairing has nothing to say about frequency or risk but does allow comparisons on a storm-by-storm basis. An important aim of the paper is therefore to determine exactly what information can be derived from chronological pairing and whether this is useful in answering the question of whether a particular peak discharge would have occurred, or would have been as big, if a forest cover had been in place.
3 Stratford et al. (2017) find that the strongest support for a peak flow mitigation effect from forest cover comes from modelling studies and that the results of field studies are more conflicted. Carrick et al. (2018) similarly note an increasing reliance on modelling studies and a lack of direct field evidence. The few field studies that do exist concentrate mostly on the Pacific Northwest of North America (e.g., Alila et al., 2009; Beschta et al., 2000; Green & Alila, 2012; Jones & Grant, 1996; Kuraś et al., 2012; Thomas & Megahan, 1998). Chronological pairing and frequency pairing are therefore applied to four research catchments from across the world, providing new field evidence and greatly expanding on the previous geographically limited studies.
4 One impediment to field studies is the rarity of the larger flood peaks, combined with the lack of stationarity in conditions (e.g., in vegetation cover or climate) over the long periods between occurrences of large floods (e.g., Jones & Grant, 2001; Yu & Alila, 2019).
This makes it difficult to assemble a flood peak series for an individual catchment that both exhibits stationary conditions and is long enough for statistically robust analysis of the larger flood peaks.
This study therefore follows Lewis, Reid, and Thomas (2010) and Green and Alila (2012) in noting apparent trends in the field data, regardless of statistical significance, and in conducting a metastudy to investigate whether such trends recur across catchments even where each is individually statistically insignificant. The aim is to combine the records of the four catchments to determine if forest impacts on the few largest peak discharges in the records show similar behaviour, thus overcoming the limitations of record length. This goes beyond simply increasing the sample size of rare events for statistical analysis by pooling samples from multiple catchments. A repeated pattern across the catchments, explainable by physical reasoning, would provide strong evidence of a real effect, irrespective of statistical significance.
5 Stratford et al. (2017) note the need for more investigation of contextual factors, including the impacts of forest management practices (such as drainage ditching and road networks), compared with the impact of forest cover itself. Road and ditch networks may act similarly by, in effect, extending the stream network and increasing drainage efficiency. Both practices have thus been found to increase flood peaks (e.g., Jones & Grant, 1996; Robinson, 1998). La Marche and Lettenmaier (2001) suggest that road impacts increase with flood return period, while vegetation cover impacts decrease. Previous studies of the flood impact of different forest management practices have been carried out on a case-by-case basis. By combining catchments with a range of practices, this study offers a more integrated view, distinguishing the effect of forest cover on its own (apparent from the repeated patterns above) from the effects of the individual practices (apparent from distinctive deviations from these patterns).
Overall, the paper uses a new high-quality data set to make a first attempt at reconciling the analysis methods and, in so doing, presents a new conceptual model of the impact of forests and forest management interventions on peak discharge magnitude and frequency distributions.
The emphasis is on floods driven by rainfall rather than snowmelt.

| FIELD SITES
The sites, from the four corners of the Earth, have been the subject of long-running research programmes on the impacts of both afforestation and logging and represent a range of forest management practices (Figure 1, Table 1). The extensive data availability in each case is complemented by the authors' detailed knowledge of the sites. the Coalburn catchment before plantation in 1972. In its early years, the network increased both the annual runoff and the storm peak discharges compared with the pre-existing grassland condition. Subsequently, the ditches have partly filled with debris but a small part of the network still appears to affect storm flow response. Further details are given by Archer and Newson (2002), Bathurst et al. (2018), Birkinshaw, Bathurst, and Robinson (2014) and Robinson (1998 Palacios (2011), Iroumé, Mayen, and Huber (2006) and Iroumé, Palacios, Bathurst, and Huber (2010).

| METHODOLOGY
The sites are first assessed for stationarity of conditions (by considering precipitation trends and by double mass curve analysis) and for their compliance with the conventionally expected forest impact of reduced annual runoff (by considering the rainfall-runoff relationships). Chronological and frequency pairing comparisons then analyse the forest impact on peak discharges.
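A double mass curve plots the cumulative total of one record against the cumulative total of a reference record (cumulative runoff against cumulative precipitation, or against the runoff of a control catchment). Under stationary conditions the points fall on a straight line; a break in slope flags a change in catchment behaviour. The sketch below is a minimal illustration only: the function names, the hypothetical annual totals and the use of average (chord) slopes either side of a known intervention year are assumptions made here, not the authors' procedure.

```python
from itertools import accumulate


def double_mass(reference, test):
    """Cumulative totals for a double mass curve (test plotted against reference)."""
    return list(accumulate(reference)), list(accumulate(test))


def chord_slopes(cum_ref, cum_test, break_index):
    """Average slope of the curve up to and after a suspected break (0-based index).

    A clear change in slope at the break points to a change in the test
    catchment (e.g. logging or plantation) rather than in the reference record.
    """
    before = cum_test[break_index] / cum_ref[break_index]
    after = (cum_test[-1] - cum_test[break_index]) / (cum_ref[-1] - cum_ref[break_index])
    return before, after


# Hypothetical annual totals (mm): runoff falls after planting at the end of year 3.
precipitation = [1000.0] * 6
runoff = [600.0, 600.0, 600.0, 400.0, 400.0, 400.0]
cum_p, cum_q = double_mass(precipitation, runoff)
print(chord_slopes(cum_p, cum_q, break_index=2))
```

A least-squares slope for each segment would be more robust to single anomalous years; the chord slope is used here only to keep the example short.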
Chronological pairing compares peak discharges from different catchments or different catchment states when paired by the same or an equal rain event. For the paired catchments, the peak discharges are plotted against each other while, for La Reina, peak discharge is plotted against storm rainfall, distinguishing between the pre- and post-logging periods.
Flood frequency curves were prepared by ranking the peaks in size order and estimating exceedance probability using the Gringorten (1963) relationship:

$$P(x) = \frac{r - 0.44}{N + 0.12} \qquad (1)$$

where P(x) is the probability of a peak discharge equalling or exceeding a magnitude x in any given year, r is the rank (r = 1 for the largest peak discharge) and N is the number of years in the data record.
The return period T(x) for an annual maximum series (or recurrence interval for a partial duration series) was calculated for each peak discharge as 1/P(x). Generalized Extreme Value (GEV) distributions were fitted to the annual maximum series using L-moments.
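The frequency-curve steps described above (ranking, Gringorten plotting positions, return periods T = 1/P and a GEV fit by L-moments) can be sketched as follows. This is a minimal illustration, not the authors' code: the GEV step uses Hosking's widely cited closed-form approximation for the shape parameter k (sign convention: k > 0 gives a bounded upper tail), and the peak values are hypothetical. A library routine such as SciPy's `genextreme` could be substituted, noting that its shape parameter has its own sign convention.

```python
import math


def gringorten(peaks, n_years):
    """Rank peaks (largest first) and return (peak, P, T) tuples.

    P is the Gringorten exceedance probability (r - 0.44) / (N + 0.12)
    and T = 1 / P is the return period (or recurrence interval).
    """
    ranked = sorted(peaks, reverse=True)
    rows = []
    for r, q in enumerate(ranked, start=1):
        p = (r - 0.44) / (n_years + 0.12)
        rows.append((q, p, 1.0 / p))
    return rows


def sample_lmoments(data):
    """Sample L-moments l1, l2 and L-skewness t3 from unbiased PWMs b0, b1, b2."""
    x = sorted(data)  # ascending order for the PWM estimators
    n = len(x)
    if n < 3:
        raise ValueError("need at least three values")
    b0 = sum(x) / n
    b1 = sum(j / (n - 1) * x[j] for j in range(n)) / n
    b2 = sum(j * (j - 1) / ((n - 1) * (n - 2)) * x[j] for j in range(n)) / n
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    return l1, l2, l3 / l2


def fit_gev(l1, l2, t3):
    """GEV location xi, scale alpha and shape k via Hosking's approximation."""
    c = 2.0 / (3.0 + t3) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-6:  # Gumbel limit of the GEV
        alpha = l2 / math.log(2)
        return l1 - 0.5772156649 * alpha, alpha, 0.0
    g = math.gamma(1 + k)
    alpha = l2 * k / ((1 - 2 ** (-k)) * g)
    xi = l1 - alpha * (1 - g) / k
    return xi, alpha, k


def gev_quantile(xi, alpha, k, T):
    """Peak discharge with return period T years (non-exceedance F = 1 - 1/T)."""
    y = -math.log(1 - 1 / T)
    if k == 0.0:
        return xi - alpha * math.log(y)
    return xi + alpha * (1 - y ** k) / k


# Hypothetical annual maximum peaks (m3 s-1 km-2), not data from this study.
annual_max = [1.8, 2.6, 1.2, 3.9, 2.1, 1.5, 2.9, 1.9, 2.3, 4.6]
for q, p, t in gringorten(annual_max, len(annual_max))[:3]:
    print(f"Q={q:.2f}  P={p:.3f}  T={t:.1f} yr")
xi, alpha, k = fit_gev(*sample_lmoments(annual_max))
for T in (2, 10, 50):
    print(T, round(gev_quantile(xi, alpha, k, T), 2))
```

With only ten annual maxima, the fitted 50-year value is an extrapolation well beyond the record; it is shown only to illustrate the mechanics.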
For the paired catchments with records from before and after the relevant intervention (i.e., Glendhu and H.J. Andrews), and using the peak discharge data from the chronological pairings, linear calibration equations were derived for the pre-intervention period as:

$$Q_{\text{intervention}} = a\,Q_{\text{control}} + b \qquad (2)$$

where $Q_{\text{intervention}}$ is the peak discharge for the catchment due to undergo intervention, $Q_{\text{control}}$ is the discharge for the control catchment, and a and b are regression coefficients.

Double mass curve analysis for the other sites shows the expected trends of increased runoff in a catchment post-logging and decreasing runoff following plantation and during forest growth (Figure 3). For the H.J. Andrews sites, the analysis plots cumulative runoff against the cumulative runoff of the control catchment (WS2), rather than against cumulative precipitation, as this distinguishes the patterns a little more clearly.

FIGURE 3 Double mass curves for cumulative annual runoff for the field sites

Figure 4 shows the three alignments (including the one for the grassland/logged catchments) to be roughly parallel, so that, for annual rainfalls of 1,500 and 3,000 mm, forest runoff as a proportion of grassland/logged runoff is 29 and 72%, respectively, for the lower forest alignment and 68 and 89%, respectively, for the upper forest alignment (noting, though, that the lines are provided for visual guidance and are not proposed as a quantitative model).
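Why roughly parallel alignments imply a proportionally larger runoff reduction in drier years can be seen from a short derivation. It assumes, for illustration only, that the forest alignment is the grassland/logged alignment shifted down by a constant offset; the symbols m, c_g and Δ are introduced here and are not fitted values from the study.

$$R_g(P) = m P + c_g, \qquad R_f(P) = m P + c_g - \Delta, \qquad \Delta > 0$$

$$\frac{R_f(P)}{R_g(P)} = 1 - \frac{\Delta}{m P + c_g}$$

Because m > 0, the denominator grows with annual rainfall P, so the ratio rises toward 1 in wet years and falls in dry years even though the absolute runoff reduction Δ stays the same. The quoted proportions (for example, 29% at 1,500 mm rising to 72% at 3,000 mm for the lower alignment) follow this qualitative trend; the derivation illustrates the trend rather than reproducing the exact figures.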

| Comparison of peak discharges by chronological pairing
Ideally the chronological pairing would be based on annual maximum peak discharges but only the Glendhu site has an annual maximum time series long enough to define a statistically credible pattern. For the other catchments, some form of partial duration or peak-over-threshold series is employed to ensure sufficient data points.
For the Glendhu site, Figure 5a compares the annual maximum flood peaks determined for the grassland control catchment GH1 with the corresponding peaks for the forested catchment GH2 (i.e., for the same event). The data show significant scatter for 1980-1986, that is, the 1980-1981 pre-plantation period plus 5 years following the December 1981 contour ripping but when the plantation had little impact on annual evapotranspiration (as shown in Figure 3). Overall, though, the gradient (coefficient a) in the regression of GH2 on GH1 in Equation (2) There is no obvious effect of the mānuka scrub on the peak flows in the grassland catchment but, to check in more detail, mean storm peak discharges were calculated according to class for the periods 1993-2001 (mānuka encroachment less than 20%) and 2005-2013 (mānuka cover averaging 26%) (Table 2). Except for the largest storm class (in which the earlier period is poorly represented), there is little difference in mean peak discharges between the two periods, suggesting that the mānuka has had no discernible effect on peak discharges, at least for an encroachment of no more than 26% of the catchment area.
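The calibration regression of the treated catchment's peaks on the control catchment's peaks for the same events can be sketched as an ordinary least-squares fit. This assumes the calibration takes the linear form Q_intervention = a·Q_control + b; the function name and the paired values are hypothetical. A gradient a near 1 with b near 0 corresponds to the line of equality.

```python
def fit_calibration(q_control, q_intervention):
    """Least-squares fit of Q_intervention = a * Q_control + b.

    Both lists hold peak discharges for the same events from the
    pre-intervention period; returns the gradient a and intercept b.
    """
    n = len(q_control)
    if n != len(q_intervention) or n < 2:
        raise ValueError("need at least two paired peaks")
    xbar = sum(q_control) / n
    ybar = sum(q_intervention) / n
    sxx = sum((x - xbar) ** 2 for x in q_control)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(q_control, q_intervention))
    a = sxy / sxx
    b = ybar - a * xbar
    return a, b


# Hypothetical paired peaks (m3 s-1 km-2) for the same storms.
control = [0.4, 0.9, 1.3, 2.0, 2.6]
treated = [0.45, 0.85, 1.35, 1.95, 2.65]
a, b = fit_calibration(control, treated)
print(f"a = {a:.3f}, b = {b:.3f}")
```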
The Wark Forest comparison (Figure 5b) is based on a 6.83-year partial duration series of 40 flood events, corresponding to a threshold discharge of 0.183 m³ s⁻¹ km⁻² for the Coalburn catchment (Bathurst et al., 2018). In this case it is the forested, not the grassland, catchment that produces the higher peak discharges for a given rainfall event. It is thought that part of the Coalburn catchment's ditch network supports a flashier runoff response, while runoff in the Figure 5c and logarithmic scales are used to highlight the differences at low to moderate events. The fitted trendlines are therefore power laws. On average, the volume of precipitation from the individual rainstorms that generated the peak flows was not significantly different between the two periods (t-test at the 95% confidence level), thereby supporting the hypothesis that the evident increase in average peak flows results from the loss of forest cover. Both the pre- and post-logging periods are able to produce similar peak discharges for all event sizes, indicating no mitigating effect from the forest cover.

FIGURE 4 Comparison of annual runoff with annual rainfall for the four sites. Data are separated into three alignments characterized by fitted linear trendlines, as described in the text
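For La Reina, the text describes power-law trendlines (straight lines on logarithmic axes) and a t-test comparing storm rainfall between the pre- and post-logging periods. A minimal sketch of both follows. Fitting the power law by least squares on log-transformed values, and using Welch's unequal-variance form of the t statistic, are assumptions made here; the text does not state which variants were used.

```python
import math
from statistics import mean, variance


def fit_power_law(rain, peak):
    """Fit peak = c * rain**m by least squares on log10-transformed values."""
    xs = [math.log10(r) for r in rain]
    ys = [math.log10(q) for q in peak]
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    m = sxy / sxx
    c = 10 ** (ybar - m * xbar)
    return c, m


def welch_t(a, b):
    """Welch two-sample t statistic and its approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical storm rainfalls (mm) for the two periods.
pre_rain = [35.0, 52.0, 80.0, 41.0, 66.0, 120.0]
post_rain = [38.0, 47.0, 95.0, 58.0, 71.0, 102.0]
print(welch_t(pre_rain, post_rain))
```

To judge significance at the 95% level, |t| can be compared with the two-sided 5% critical value for df degrees of freedom, for example from `scipy.stats.t.ppf(0.975, df)`.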
However, the pre-logging period also shows peak discharges ranging to lower levels than the post-logging period for small to moderate events, indicating that a mitigating effect can occur, presumably when the event antecedent soil moisture conditions permit. Because of the logarithmic scale, though, the apparent visual convergence of the two data sets at the largest events may be misleading. Two reasons are advanced for the notably large variation in runoff response for a given rainfall, spanning over two orders of magnitude. The first is the selection method for the events. All events that correspond to the rainfall  (Figure 4). Consequently, there is potential for a greater reduction in peak discharges.
The peaks were selected by an algorithm that required a certain rate of rise of the hydrograph. Events were separated by a return to a threshold low flow and hence were considered to be independent.
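The event-selection rule described above (an event starts on a sufficiently rapid rise of the hydrograph and ends when flow returns to a low-flow threshold) might look like the sketch below. The rise-rate and low-flow thresholds, the function names and the handling of an event still open at the end of the record are hypothetical choices; the authors' algorithm may differ in detail.

```python
def pick_peaks(flow, min_rise, low_flow):
    """Return (index, peak) for independent events in a regularly sampled flow series.

    An event opens when the step-to-step rise is at least min_rise and flow is
    above low_flow; it closes when flow falls back to low_flow or below.
    """
    events = []
    in_event = False
    peak_i = 0
    for i in range(1, len(flow)):
        if not in_event:
            # Open an event on a sufficiently steep rise above the low-flow threshold.
            if flow[i] - flow[i - 1] >= min_rise and flow[i] > low_flow:
                in_event, peak_i = True, i
            continue
        if flow[i] > flow[peak_i]:
            peak_i = i  # track the highest point of the current event
        if flow[i] <= low_flow:
            # Recession back to low flow: close the event, treated as independent.
            events.append((peak_i, flow[peak_i]))
            in_event = False
    # An event still open at the end of the record is incomplete and is dropped.
    return events


def peaks_over_threshold(events, threshold):
    """Keep only events whose peak exceeds the partial duration series threshold."""
    return [e for e in events if e[1] > threshold]


# Hypothetical hydrograph (m3 s-1 km-2) sampled at a fixed interval.
flow = [0.1, 0.1, 0.5, 1.2, 0.8, 0.3, 0.1, 0.1, 0.6, 0.9, 0.4, 0.05]
print(pick_peaks(flow, min_rise=0.2, low_flow=0.15))
```

A peak-over-threshold series then keeps only peaks above a chosen threshold, such as the 0.183 m³ s⁻¹ km⁻² used for Coalburn in the Wark Forest comparison.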
The data refer to the period between late autumn (October/November) and the following early spring (April/May). For the natural conditions before logging (1955-1961), there is a clear linear relationship between WS1 and WS2 (control) (Figure 5d). For both the WS1 logging period (1962-1966) Comparison of WS3 and WS2 was initially carried out for the periods of pre-logging without roads (1955-1958), pre-logging with roads in WS3 (1959-1962) There is a small increase in the range of WS3 discharges for a given WS2 discharge at around 0.5 m³ s⁻¹ km⁻² (the two most noticeable points being in the early post-logging period) but there is no tendency for the two relationships to converge at the larger peak discharges.

| Comparison of peak discharges by frequency pairing
Flood frequency curves were prepared using the same peak discharge data as for the chronological pairing, apart from some modifications for the H.J. Andrews sites explained below. Following convention, the abscissa of the frequency diagram is labelled "return period" for analysis based on an annual maximum series and "recurrence interval" for analysis based on a partial duration series.
For the Glendhu catchment, as noted above, the line of equality in Figure 5a is adopted as the pre-intervention calibration equation. For H.J. Andrews, the post-logging increase in flood peak discharges evident for the WS1 catchment in Figure 5d was maintained, albeit in more subdued form, to at least 1979-1980 (e.g., Jones & Grant, 1996 1962-1963 to 1979-1980). The period is long enough for the curves to be determined for an annual maximum series rather than the partial duration series, enabling GEV distributions to be fitted. For the smaller return periods, peak discharge magnitudes for a given return period are larger for the observed (logged) than the expected (forested) states but both the rank-based curves and the fitted distributions converge at the largest peak discharges, subject to the associated uncertainties.
Rank-based frequency curves for the WS3 catchment were derived for the period 1961-1962 to 1967-1968, comparing the observed (with roads and partial logging) and expected (as if forested without a noticeable road impact) behaviours (Figure 6e). The peak discharge magnitudes for a given return period remain higher for the observed than for the expected state throughout and there is no convergence at the highest discharges.
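The observed-versus-expected comparison can be sketched as follows: the pre-intervention calibration is applied to the control catchment's post-intervention peaks to estimate what the treated catchment would have produced without the intervention, and both series are then ranked with Gringorten plotting positions. The function names, the coefficients a and b and the peak values are hypothetical. Because both series share the same length and record duration, ranks with equal return periods pair up, so the comparison is between magnitudes at equal frequency.

```python
def expected_peaks(q_control, a, b):
    """Peaks expected for the treated catchment had it stayed in its pre-intervention state."""
    return [a * q + b for q in q_control]


def rank_curve(peaks, n_years):
    """(peak, return period) pairs using Gringorten positions, largest peak first."""
    ranked = sorted(peaks, reverse=True)
    return [(q, (n_years + 0.12) / (r - 0.44)) for r, q in enumerate(ranked, start=1)]


# Hypothetical post-intervention annual maxima (m3 s-1 km-2).
observed = [1.9, 1.1, 2.8, 1.5, 3.6]
control = [1.4, 0.8, 2.4, 1.0, 3.5]
expected = expected_peaks(control, a=1.0, b=0.0)  # line of equality, as an example
for (qo, t), (qe, _) in zip(rank_curve(observed, 5), rank_curve(expected, 5)):
    print(f"T={t:5.1f} yr  observed={qo:.2f}  expected={qe:.2f}")
```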

| Suitability of the four catchments as a basis for a metastudy
While there is considerable year-to-year variation in precipitation, there is little long-term trend over the period of record to bias the analysis (Figure 2). Runoff trends are instead much more strongly related to the specific catchment interventions (Figure 3). Figure 4 confirms that the catchments, whatever their management interventions, display the conventionally expected behaviour of decreased annual runoff for a forest cover relative to a grass cover or logged state (e.g., Andréassian, 2004; Bosch & Hewlett, 1982; Zhang et al., 2017).
The more pronounced reduction for the La Reina catchment may be related to the high water demand of the exotic tree species, especially over the dry summer period, and a high soil water retention capacity (Huber, Iroumé, & Bathurst, 2008).

| Reconciliation of chronological and frequency pairing for analysing forest impact on peak discharges
The controversies that require resolving are whether forests (purely as a vegetation cover) can mitigate peak discharges for large events and whether, in addressing this matter, both the chronological and the frequency pairing methods contribute relevant evidence.
Considering first the impact of forest cover on peak discharge, the chronological pairings of forested and grassland or logged catchments show remarkably similar results for all four field sites (Figure 5). The Glendhu, La Reina and H.J. Andrews WS1/WS2 sites are the most relevant for the effect of forests purely as a vegetation cover, but Wark Forest illustrates the same principles in the response to land use differences. For the low to moderate events, the effect of an absence of forest cover on peak discharge for a given event ranges from a substantial increase (more than doubling; the direction is reversed for Wark Forest) to no effect at all. The variation is assumed to be due to differences in catchment conditions, especially soil moisture content: if the paired catchments are both saturated, the peak responses will be much the same, despite the differences in vegetation. Wark Forest (Figure 5b) shows that this dependency can vary seasonally. For the largest events, all three paired catchments show a convergence of response (Figure 5a, b, d). For La Reina (Figure 5c) this latter pattern is less clear; modelling by Birkinshaw et al. (2011) using 1,000 years of synthetic rainfall data suggests that the range of response remains constant in absolute terms but decreases as a percentage of the event discharge, indicating a relative rather than an absolute convergence.
The rank-based frequency pairing similarly presents a consistent pattern (Figure 6). For the small to moderate floods, the forested catchment has a lower peak magnitude for a given flood frequency (or a longer return period for a given peak magnitude) than the grassland/logged catchment. (Again the inverse applies to Wark Forest.) For the largest floods, the two curves converge.
The overall agreement between the sites provides confidence in the metastudy approach. It is also reinforced by the one case where there is no convergence, namely, H.J. Andrews WS1/WS3, as agreed by both pairing methods (Figures 5e and 6e). In that case, the road construction may have had the effect of changing the drainage density and thus flow paths (i.e., a hydraulic effect), which would influence the response of events of all magnitudes. Channel scour (discussed later) may also have enhanced the effect. By contrast, the difference in the other cases is assumed to depend on the forest creating a soil moisture deficit to absorb part of the storm rainfall, a hydrological effect that becomes increasingly irrelevant at the larger events.
The value of chronological pairing turns on its relevance to the question of whether a particular peak discharge would have occurred, or would have been as big, if a forest cover had been in place. For low to moderate events, the method shows that forest cover may mitigate peak discharge response for a given rainfall event but also that it may not; the effect depends not only on the vegetation cover but also on other factors such as soil moisture, soil depth and snow cover. In found this return period to increase with record length, raising the possibility that convergence may not occur within any practical range of flood flows. In addition, the confidence limits for fitted flood frequency distributions begin to widen markedly once the return period exceeds around half the length of the record on which the curve is based, and even more so once it exceeds the length of the record altogether (e.g., Linsley, Kohler, & Paulhus, 1975). The accuracy of the fitted distributions for the higher discharges (at which convergence is modelled) is therefore questionable. Indeed, the projections of the GEV curves beyond the highest measured discharges in Figure 6a converge. First estimates from Figure 6 indicate a range of return periods from around 50 years for Glendhu to roughly 10 years for the other three sites. In addition to the influence of record length, catchment characteristics may play a determining role. For example, catchments with deeper soils or greater forest evapotranspiration (e.g., La Reina in Figure 4) may maintain separate frequency curves for forested and non-forested states up to the very largest floods (as perhaps indicated by Birkinshaw et al.'s (2011) 1,000-year simulations for La Reina).
Invoking the power of the metastudy, the fact that all the paired catchments (except for WS1/WS3) show convergence of response within data records of no more than a decade or two nevertheless suggests that the flood events required for convergence to occur may in many cases be relatively common (with return periods of perhaps 5-20 years) rather than relatively rare.
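The argument that convergence-scale floods may be relatively common can be checked with the standard encounter probability. Assuming independent, stationary annual maxima, the chance that the T-year flood is equalled or exceeded at least once in an n-year record is 1 - (1 - 1/T)^n. For an illustrative 15-year record (chosen here as a length within "a decade or two"), this is roughly 0.79 for T = 10 years and 0.54 for T = 20 years but only about 0.14 for T = 100 years, so events of the 5-20-year class are likely to appear in records of this length while much rarer events usually will not.

```python
def prob_exceeded_in_record(T, n_years):
    """Chance that the T-year flood is equalled or exceeded at least once in n_years.

    Assumes independent annual maxima drawn from a stationary distribution.
    """
    return 1.0 - (1.0 - 1.0 / T) ** n_years


for T in (5, 10, 20, 100):
    print(T, round(prob_exceeded_in_record(T, 15), 3))
```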
For this study, the major differences in response between the paired catchments or catchment states at each site are explainable by plausible physical reasons related to the forestry interventions.
More subtle differences in the relative patterns between the sites will depend on site characteristics, such as soil properties, the significance of groundwater response, the vegetation type and the climate.
It was beyond the scope of this study to investigate such dependencies but they appear to have relatively little impact on the broad response pattern established for the type of catchment examined here. For a wider range of sites, though, they may hold greater significance (e.g., Cosandey et al., 2005).

| Impact of forest management practice
Considering the effect of vegetation cover only (without any other interventions), there is unanimity of response across the catchments.
Relative to the logged or grassland state, evapotranspiration is higher in forested catchments. At the annual scale runoff is therefore reduced quite substantially, especially in the drier years (Figure 4). At the event scale, as discussed in the previous section and Figure 7, forest cover may mitigate peak discharge magnitude and frequency at low to moderate floods but has less effect at the largest events when any buffer of soil moisture deficit is overwhelmed by the amount of rainfall (e.g., Figure 5d).
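The buffering mechanism described above can be illustrated with a deliberately crude toy model: storm rainfall first fills the antecedent soil moisture deficit and only the excess runs off. The deficit values are hypothetical, and event runoff depth is used as a rough proxy for peak response; this is an illustration of the reasoning, not a model used in the study.

```python
def event_runoff(rain_mm, deficit_mm):
    """Toy model: storm rainfall first fills the soil moisture deficit; the excess runs off."""
    return max(0.0, rain_mm - deficit_mm)


# Hypothetical antecedent deficits (mm): the forest keeps the soil drier than grass.
FOREST_DEFICIT, GRASS_DEFICIT = 40.0, 15.0

for rain in (30.0, 60.0, 150.0, 300.0):
    forest = event_runoff(rain, FOREST_DEFICIT)
    grass = event_runoff(rain, GRASS_DEFICIT)
    print(f"rain {rain:5.0f} mm  grass {grass:5.1f}  forest {forest:5.1f}  ratio {forest / grass:.2f}")
```

Once both deficits are exceeded, the absolute difference stays at 25 mm while the forest/grass ratio climbs toward 1 as rainfall grows, a relative convergence of the kind described for La Reina. With both deficits set to zero (saturated catchments), the two responses are identical.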
This pattern is clear where the contrast is between 100% and zero forest cover. For the 25% logging in H.J. Andrews WS3, the same pattern appears to be present but only to a minor degree (e.g., the small perturbations evident for WS2 discharges around 0.5 m³ s⁻¹ km⁻² in Figures 5e and 6e). Similarly, the spread of mānuka in the grassland Glendhu GH1 catchment seems to have had little impact on peak discharges, at least up to a catchment cover of 26%. This suggests that the threshold of percentage change in vegetation cover needed to cause a measurable impact on peak discharges is no smaller than the 20% change considered necessary to cause a measurable change in annual runoff (Bosch & Hewlett, 1982; Stednick, 1996) and may be larger.
The altered response in H.J. Andrews WS3 apparent in Figures 5e and 6e occurs in 1961, 2 years after the road construction in that catchment and 2 years before the 25% forest logging. Whether it is due to the road construction is therefore not entirely clear. Jones and Grant (1996) Thomas and Megahan (1998) for the full range of discharges (as might otherwise be expected) but is overcome at larger events or at times of soil saturation when the drainage efficiency of the Flothers matches that of Coalburn.
The hydrological effect of purely vegetation change on flood peak discharges can thus be dominated by the hydraulic effect of changes in drainage efficiency associated with roads, ditching and channel scouring, especially at large events. Other potential hydrological effects, not distinguishable in the data presented here but not necessarily inactive at the sites, may arise from differences in accumulated snowpack between forested and logged/grassland catchments (e.g., Green & Alila, 2012; Jennings & Jones, 2015; Jones & Perkins, 2010) and from alterations in soil permeability linked to ditching and contour ripping.

| CONCLUSIONS
Combining the data of four catchment studies has extended the geographical range of research into the impacts of forests on flood peak discharges and provided encouraging new field evidence to support the hypothesis that the chronological and frequency pairing methods for analysing the impacts can be reconciled. The study establishes an approach for comparing the methods and proposes a conceptual model for the impact of forests on discharge peaks (Figure 7) which other researchers can test. Central to the study has been a careful interpretation of the data for each catchment, based on physical reasoning and the authors' detailed understanding of their catchments.
1 Repeated patterns across the four study sites demonstrate consistency of response between field studies and provide strong evidence of real effects with physical explanations.
2 Both chronological and frequency pairing are relevant methods for determining the impact of forests on flood peak discharges but they address different, complementary questions. Frequency pairing provides an overall long-term characterization of the relationship between frequency and magnitude of discharge peaks; catchments with and without interventions have clearly differentiated relationships. Chronological pairing provides storm-by-storm commentary on peak magnitudes; relative responses between catchments with and without interventions may be highly variable, as a function also of catchment conditions.
3 Most of the available data refer to low to moderate floods. In this range, relative to a grassland or logged catchment, a forested catchment has a lower peak discharge magnitude for a given flood frequency or a larger return period for a given peak discharge magnitude. Within this overall pattern, though, the effect of the forest cover on a storm-by-storm basis depends on catchment conditions, especially the soil antecedent moisture content. Forest cover may mitigate the flood peak discharge (potentially by 50% or more). Equally, a given rainstorm may generate peaks of the same magnitude, but different return periods, for different vegetation covers, meaning no mitigation of peak discharge magnitude. More colloquially, forests do not prevent floods but they can make them less frequent.
4 For the largest events on record, both frequency and chronological pairing show a convergence of response, which suggests an increasing irrelevance of the vegetation cover.
5 The flood frequency, above which forest cover loses its potential mitigating effect, seems likely to vary between catchments, with moderate return periods (5-20 years) in some cases but more extreme values possible in others.
6 Catchment interventions other than purely change of vegetation cover can modify the above responses. Increased peak flows due to road networks, drainage works and channel scouring may be apparent for all events, with no convergence of response at large events. Ditch networks in the forested catchment may invert the relative magnitude of peak discharges otherwise expected between forested and grassland catchments.
7 For all the sites, whatever the management interventions, forest cover substantially reduces annual runoff by comparison with the grassland/logged state, especially in drier years.
Despite the striking similarities in the catchment responses which underlie the proposed model of Figure 7, wider confirmation of the model is required, through extension to a larger number of catchments (with a range of characteristics and management interventions), to catchment areas larger than the few square kilometres of this study and to tropical zones. Particular work is needed to determine if it is possible to identify a characteristic flood return period at which the responses of forested and non-forested catchments converge.