Forest disturbance and growth processes are reflected in the geographical distribution of large canopy gaps across the Brazilian Amazon

Canopy gaps are openings in the forest canopy resulting from branch fall and tree mortality events. The geographical distribution of large canopy gaps may reflect underlying variation in mortality and growth processes. However, a lack of data at the appropriate scale has until now limited our ability to study this relationship. We detected canopy gaps using a unique LiDAR dataset consisting of 650 transects randomly distributed across 2,500 km² of the Brazilian Amazon. We characterized the size distribution of canopy gaps using a power law and explored the variation in its exponent, α. We evaluated how α varies across the Amazon in response to disturbance by humans and to natural environmental processes that influence tree mortality rates. We observed that South‐eastern forests contained a higher proportion of large gaps than North‐western forests, which is consistent with recent work showing greater tree mortality rates in the Southeast than in the Northwest. Regions characterized by strong wind gust speeds, frequent lightning and greater water shortage also had a high proportion of large gaps, indicating that geographical variation in α is a reflection of underlying disturbance processes. Forests on fertile soils were also found to contain a high proportion of large gaps, in part because trees grow tall on these sites and create large gaps when they fall; thus, canopy gap analysis picked up differences in growth as well as mortality processes. Finally, we found that human‐modified forests had a higher proportion of large gaps than intact forests, as we would expect given that these forests have been disturbed. Synthesis. The proportion of large gaps in the forest canopy varied substantially over the Brazilian Amazon. We have shown that the trends can be explained by geographical variation in disturbance and growth. 
The frequency of extreme weather events is predicted to increase under climate change, and changes could lead to greater forest disturbance, which should be detectable as an increased proportion of large gaps in intact forests.

KEYWORDS
canopy height, environmental gradients, forest dynamics, gap size distribution, landscape ecology, power law, tropical forest

| INTRODUCTION
Gaps in tropical forest canopies arise from tree mortality and play an important role in forest regeneration processes and forest biodiversity by creating habitat heterogeneity for forest-dwelling organisms (Brokaw, 1985; Grubb, 1977; Muscolo et al., 2014; Yamamoto, 1992).
Many understorey plants survive in a low-light environment and depend on these occasional gaps to capture light and grow (Marthews et al., 2008). Small gaps favour shade-tolerant species, while large gaps favour light-demanding pioneer species (Brokaw, 1985; Yamamoto, 1992). Gap colonization is driven by the soil, plants and animals of the surrounding forest (Grubb, 1977). Gap size is also linked to the mode of death, with broken or uprooted trees leaving larger gaps than standing dead trees (Esquivel-Muelbert et al., 2020). In this study, we map the size distributions of canopy gaps across the Brazilian Amazon and show how they are related to canopy height and environmental variables. Remote sensing technologies make it possible to map canopy gaps over large areas of tropical forest (Asner et al., 2013; Dalagnol et al., 2021; Espírito-Santo et al., 2014; Kent et al., 2015; Lobo & Dalling, 2013; Wedeux & Coomes, 2015). Several studies using airborne LiDAR datasets have found that gap size distributions follow a simple power-law function, $f(x) = c\,x^{-\alpha}$, in which small gaps heavily outnumber large gaps in all forest environments (Asner et al., 2013; Espírito-Santo et al., 2014; Kellner & Asner, 2009; Lobo & Dalling, 2013; Silva et al., 2019). Identifying power-law distributions for ecological features such as canopy gaps provides insight into gap formation processes such as tree mortality (Goodbody et al., 2020). The power-law scaling coefficient α has been associated with the type and degree of disturbance in forested areas at landscape and regional scales (Yamamoto, 1992): it ranges from values typical of less intense disturbance (a low proportion of large gaps) to values typical of the mortality of large trees or stand-level damage (a high proportion of large gaps) (Asner et al., 2013; Silva et al., 2019).
Extremely large gaps are very rare and are mainly caused by windstorms (Espírito-Santo et al., 2014; Negrón-Juárez et al., 2018), fire and logging (Broadbent et al., 2008). Conversely, canopy openings due to branch falls result in very small gaps and are far more common (Asner et al., 2013; Espírito-Santo et al., 2014). Small gaps (<0.1 ha) account for an estimated 1.28 Pg of gross carbon losses per year over the entire Amazon region, which is 98.6% of the total carbon losses due to gap formation (Espírito-Santo et al., 2014).
The size distribution of canopy gaps is also related to the history of anthropogenic disturbance (Jucker, 2022; Kent et al., 2015).
Forest recovery after a disturbance event depends on the severity of the disturbance, the time since it occurred and local environmental factors (Cole et al., 2014; Kent et al., 2015), as well as on anthropogenic actions such as deforestation, logging and fire (Aragão et al., 2014). In the Gola rainforest park in Sierra Leone, Kent et al. (2015) found a higher gap fraction in logged blocks (3%-6.3%) than in old-growth forest blocks (1%-2.3%). In a peat swamp forest in Indonesia, Wedeux and Coomes (2015) showed that, even 8 years after becoming protected for conservation, logged plots had a higher gap fraction and a higher proportion of large gaps (lower α) than old-growth forest.
The size distribution of canopy gaps should vary along environmental gradients, since forest dynamics are controlled by environmental variables (Phillips et al., 2004; Quesada et al., 2012). Previous studies have found correlations between α and climate variables, topography and soils (Goodbody et al., 2020; Goulamoussène et al., 2017), as well as wind and lightning (Gora et al., 2021). In the Amazon, mortality and turnover rates vary mainly along an east-west gradient that coincides with a soil fertility gradient, with higher tree mortality and turnover rates on the rich soils of the western Amazon than in the eastern Amazon (Aragão et al., 2009; Esquivel-Muelbert et al., 2020; Phillips et al., 2004; Quesada et al., 2012). A large proportion of Amazonian forests have also experienced water stress from intense droughts (Anderson et al., 2018; Aragão et al., 2007; Marengo et al., 2018), which has increased rates of tree mortality and biomass loss (Phillips et al., 2009; Phillips et al., 2010). Wind has also been linked to high tree mortality (Negrón-Juárez et al., 2018; Rifai et al., 2016), with forests in the Northwest Amazon more vulnerable to windthrow and subject to higher tree mortality than central Amazonian forests (Negrón-Juárez et al., 2018). Dalagnol et al. (2021) found that gap fraction across the Brazilian Amazon was positively correlated with soil nutrients, water deficit, dry season length, wind speed and floodplain fraction, and negatively correlated with distance to the forest edge and precipitation. Building on this work, we focus on the size distribution of canopy gaps and its relationship with environmental factors. This focus allows us to distinguish areas with a high proportion of large gaps (i.e. heavily disturbed or human-modified forests) from intact or undisturbed forest.
The relationship between gap properties and environmental factors depends on how a canopy gap is defined, that is, whether the cut-off height is set relative to local canopy height or as a fixed value (Dalagnol et al., 2021). Therefore, interpreting environmental effects on gap properties across heterogeneous forests can be challenging. For example, a treefall event creates a much smaller gap in a forest with a substantial understorey layer than the same event in a sparser forest (Dalagnol et al., 2021; Leitold et al., 2018). Furthermore, the time it takes for a gap to close depends on the surrounding canopy height (Grubb, 1977; Muscolo et al., 2014) and on the size of the gap (Dalagnol et al., 2019).
Canopy height is related to environmental factors, and a recent study in the Brazilian Amazon showed that the presence of very large trees is explained by low wind, high soil clay content, high precipitation, high temperature and high light availability (Gorgens et al., 2020). We therefore expect canopy height and environmental factors to interact in controlling the size distribution of canopy gaps; however, little is known about these interactions in Amazonian forests.
In this study, we use a large tropical forest LiDAR dataset to explore the relationship of gap size distribution with environmental factors, anthropogenic disturbance and canopy height. This dataset, which was collected by the 'Improving Biomass Estimation Methods for the Amazon' project (Ometto et al., 2021), provides an unprecedented perspective on forest structural variation over 2,500 km² of forest. We formulated five hypotheses:
H1 Human-modified forests contain a higher proportion of large gaps than intact forest (i.e. have lower α).
H2 Tall forests will contain a higher proportion of large gaps than short forest, because big trees produce large gaps when they die.
H3 High wind speeds and lightning frequency will be associated with a high proportion of large gaps (i.e. low α), due to an increased rate of disturbance.
H4 Forests with high water deficit will be associated with a high proportion of large gaps, because water limitation will cause gaps in these areas to recover slowly.
H5 Low soil fertility will be associated with a high proportion of large gaps, because the rate of recovery from disturbance will be slow.

| LiDAR data collection and processing
We used 650 LiDAR transects (Figure S1) of 375 ha (12.5 × 0.3 km) each, collected by the 'Improving Biomass Estimation Methods for the Amazon' project (Ometto et al., 2021) between 2016 and 2018 (Tejada et al., 2019). The transects were allocated in forested areas using mask layers for primary and secondary forests (TerraClass). Within these classes, the transects were randomly located, except for a small number of transects that intentionally overlapped with existing field plots. The flights were […]
We reclassified all LiDAR point clouds into ground, vegetation and noise points, and excluded noise points from further analyses. Accurate point classification is important for producing reliable digital terrain models (DTMs) and, consequently, reliable height values used to estimate forest attributes such as volume or biomass (Leitold et al., 2015; Longo et al., 2016). Points corresponding to terrain (ground points) were isolated and interpolated using the triangulated irregular network (TIN) method, generating a DTM with 1-m spatial resolution. We then subtracted the corresponding DTM elevation from the elevation of each vegetation point to obtain vegetation heights (Popescu & Wynne, 2004). Lastly, we applied the pit-free algorithm (Khosravipour et al., 2014) to create a canopy height model (CHM) with 1-m spatial resolution, using the highest return per grid cell and triangulating these returns (Figure 1).
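The height-normalization step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the processing chain used for the dataset: it assumes ground and vegetation points are already classified and held as NumPy arrays of (x, y, z), uses SciPy's `LinearNDInterpolator` (a Delaunay triangulation with linear interpolation inside each triangle, i.e. a TIN surface) as the DTM, and replaces the pit-free algorithm with a plain highest-return-per-cell grid. The names `normalize_heights` and `highest_return_grid` are hypothetical.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator


def normalize_heights(ground_xyz, veg_xyz):
    """Height above ground for vegetation points (hypothetical helper).

    LinearNDInterpolator triangulates the ground points (Delaunay) and
    interpolates linearly inside each triangle, i.e. a TIN terrain surface.
    """
    tin = LinearNDInterpolator(ground_xyz[:, :2], ground_xyz[:, 2])
    ground_z = tin(veg_xyz[:, :2])  # NaN outside the convex hull of ground points
    return veg_xyz[:, 2] - ground_z


def highest_return_grid(x, y, h, x0, y0, nx, ny, cell=1.0):
    """Highest normalized return per grid cell (hypothetical helper).

    A simple maximum grid, not the pit-free algorithm.
    """
    col = np.floor((x - x0) / cell).astype(int)
    row = np.floor((y - y0) / cell).astype(int)
    ok = (col >= 0) & (col < nx) & (row >= 0) & (row < ny) & np.isfinite(h)
    grid = np.full((ny, nx), -np.inf)
    np.maximum.at(grid, (row[ok], col[ok]), h[ok])  # keep the max per cell
    grid[np.isinf(grid)] = np.nan  # cells that received no returns
    return grid
```

In outline, a pit-free CHM instead combines partial CHMs triangulated from returns above successive height thresholds, which removes the pits that the simple maximum grid above would retain.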

| Extracting gaps and characterizing their size distributions
As in other studies, we defined canopy gaps as contiguous areas of low canopy height that meet a number of thresholds. The first threshold (A) is that the canopy height must be below a cut-off height. We used a 10 m cut-off height following Silva et al. (2019), since this value is commonly used with LiDAR data yet low enough that openings below it are likely to result from a disturbance event. The second threshold (B) is that the area of low canopy height must be larger than 20 m², to focus on gaps that are more likely the result of disturbance and to filter out noise. The third threshold (C) is that the gap must be smaller than 10,000 m² (1 ha), to avoid classifying permanent features such as roads or rivers as gaps. We then filtered out erroneous gaps (D), which were usually found along the transect edges (Figure 1a). To do this, we calculated a topographic position index, which depends on the values of neighbouring pixels, and excluded all polygons with missing values (Figure 1b; Figure S2). These thresholds are somewhat arbitrary, and different studies choose different values (Brokaw, 1985; Marthews et al., 2008; Wedeux & Coomes, 2015).
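Thresholds A-D can be expressed compactly on a CHM raster. The sketch below is an illustration under stated assumptions, not the authors' code: it treats 8-connected cells below the cut-off as one gap, counts area in 1-m² cells, and stands in for the topographic-position-index filter (D) by discarding gaps that touch a missing (NaN) cell or the raster border, which is where a 3 × 3 neighbourhood index would be undefined. `extract_gaps` is a hypothetical name.

```python
import numpy as np
from scipy import ndimage


def extract_gaps(chm, cutoff=10.0, min_area=20.0, max_area=10_000.0, cell_area=1.0):
    """Areas (m²) of candidate canopy gaps in a CHM array (hypothetical helper)."""
    valid = np.isfinite(chm)
    low = valid & (chm < cutoff)                      # threshold A
    eight = np.ones((3, 3), dtype=int)                # 8-connectivity
    labels, n = ndimage.label(low, structure=eight)
    if n == 0:
        return np.array([])
    ids = np.arange(1, n + 1)
    areas = np.asarray(ndimage.sum(np.ones(chm.shape), labels, index=ids)) * cell_area
    # Stand-in for filter D: cells next to missing data or on the raster border.
    bad = ndimage.binary_dilation(~valid, structure=eight)
    bad[[0, -1], :] = True
    bad[:, [0, -1]] = True
    touches_bad = np.asarray(ndimage.maximum(bad.astype(int), labels, index=ids)) > 0
    keep = (areas > min_area) & (areas < max_area) & ~touches_bad  # thresholds B, C, D
    return areas[keep]
```

Varying `cutoff`, `min_area` and `max_area` in such a function is one way to reproduce the kind of threshold sensitivity analysis described later.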
We excluded transects where the median canopy height was under 15 m, since we could not reliably detect gaps in these cases. We calculated the area of each gap and fitted a simple power-law function to the gap size distribution (Equation 1):

$$f(x) = c\,x^{-\alpha} \tag{1}$$

where c is a normalization term, x is the gap size (m²) and the scaling parameter α quantifies the disturbance level. Using the powerlaw package (Gillespie, 2015), we estimated the scaling coefficient α for each transect.
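For a continuous power law above a lower bound x_min, the exponent has a closed-form maximum-likelihood estimate, α̂ = 1 + n / Σ ln(x_i / x_min), with approximate standard error (α̂ − 1)/√n. The sketch below implements that estimator with NumPy. It is a simplified stand-in for the powerlaw package, assuming x_min is fixed at the 20 m² threshold B (the package can instead estimate x_min from the data), treating 1-m² pixel counts as continuous, and ignoring the 1-ha upper cut-off. `fit_alpha` is a hypothetical name.

```python
import numpy as np


def fit_alpha(areas, x_min=20.0):
    """Continuous power-law MLE for alpha and its standard error (hypothetical helper).

    Only gap areas >= x_min are used; any upper cut-off is ignored.
    """
    x = np.asarray(areas, dtype=float)
    x = x[x >= x_min]
    n = x.size
    alpha = 1.0 + n / np.sum(np.log(x / x_min))
    return alpha, (alpha - 1.0) / np.sqrt(n)
```

Because the estimator uses only log-ratios to x_min, a few very large gaps pull α̂ down noticeably, which is why the handling of the upper tail (truncation or an exponential cut-off) matters.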
As a rule of thumb, α values higher than 2 are found in forests dominated by small gaps and subject to less intense disturbance, whereas α values lower than 2 indicate a higher proportion of large gaps (Asner et al., 2013; Silva et al., 2019). Deviation from the power-law pattern has been reported at large gap sizes, and we observed this in some of the transects in this study. In these cases, the distribution can be represented by a power law that transitions to an exponential distribution at a given gap size (Wedeux & Coomes, 2015). We tested fitting this more complex model to the gap size distributions and found α values similar to those from the powerlaw package (see Figure S4).
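Why 2 works as a dividing line can be seen from the share of total gap area held by large gaps. If gap sizes follow f(x) ∝ x^(−α) between x_min and x_max, the area in gaps between a and b is proportional to ∫ x^(1−α) dx; at α = 2 this equals ln(b/a), so every logarithmic size class holds the same total area, while α < 2 shifts area towards the largest gaps. The sketch evaluates this share for gaps larger than 0.1 ha, assuming the 20 m² and 10,000 m² bounds from the gap definition; the 0.1 ha split and the function name `large_gap_area_share` are illustrative choices.

```python
import numpy as np


def large_gap_area_share(alpha, x_min=20.0, x_max=10_000.0, x_large=1_000.0):
    """Share of total gap area in gaps larger than x_large (hypothetical helper),
    for gap sizes following f(x) ∝ x^(-alpha) on [x_min, x_max]."""

    def area(a, b):
        # Integral of x * x^(-alpha) = x^(1 - alpha) from a to b.
        if np.isclose(alpha, 2.0):
            return np.log(b / a)
        p = 2.0 - alpha
        return (b**p - a**p) / p

    return area(x_large, x_max) / area(x_min, x_max)
```

At α = 2, roughly 37% of gap area lies in gaps of 0.1-1 ha (ln 10 / ln 500); lower α raises that share and higher α lowers it.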

| Characterizing forest structure
Following previous studies (Feldpausch et al., 2011), we split the Amazon into four regions (North, West, Southeast, Central-East).
We tested for statistical differences in median α among regions using a post hoc Tukey's test at the 95% confidence level. We used these regions to make our study comparable to previous work, recognizing that our dataset does not sample these regions evenly.
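A Tukey honestly significant difference test across regions can be run with SciPy's `scipy.stats.tukey_hsd` (available from SciPy 1.8). The sketch assumes per-transect α values grouped in a dictionary keyed by region name; note that this test compares group means, and the function name `compare_regions` is hypothetical.

```python
import numpy as np
from scipy.stats import tukey_hsd


def compare_regions(alpha_by_region):
    """Tukey HSD on per-transect alpha grouped by region (hypothetical helper).

    Returns region names and the matrix of pairwise p-values; the test
    compares group means. Requires SciPy >= 1.8.
    """
    names = list(alpha_by_region)
    result = tukey_hsd(*(np.asarray(alpha_by_region[k], dtype=float) for k in names))
    return names, result.pvalue
```

Entry `[i, j]` of the returned matrix is the adjusted p-value for regions `names[i]` and `names[j]`.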
To classify intact forests, we used the intact forest landscapes (IFL) map (Potapov et al., 2008), which delineates contiguous areas of natural ecosystems, showing no signs of significant human activity, and large enough that all native biodiversity could be maintained.
The IFL map (scale 1:1,000,000) for 2016 was used to divide the dataset into two categories of forest: intact and human-modified. This product was created through expert-based visual mapping of fragmented and altered forest areas, using medium-spatial-resolution Landsat TM (circa 1990) and ETM+ (circa 2000) images as the primary data source for the year 2000 IFL mapping.
Updates to the IFL map, such as the one available for 2016, were based on more recent data sources and used a methodology similar to the year 2000 mapping to ensure consistency (see details at https://intactforests.org/). We included IFL as a factor in our linear models to test whether land use is related to gap size distribution (H1).
FIGURE 1 Canopy height model panels showing the gaps generated with a 10 m height threshold and filtered by the topographic position index: example of gap delineation (area ≥ 20 m²) in the transect with the lowest proportion of large gaps (α = 2.50), (a) before and (b) after applying the topographic position index filter.
We extracted the median canopy height and the 99th percentile of canopy height (Hmax) from the CHM. Hmax was used as a predictor variable to test H2.
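Transect-level summaries and the 15 m median-height screen described earlier reduce to a few NumPy calls. In this sketch, `transect_summary` is a hypothetical helper, NaN cells are assumed to mark missing CHM values, and `np.percentile` uses linear interpolation by default, which may differ slightly from the percentile definition in other software.

```python
import numpy as np


def transect_summary(chm, min_median=15.0):
    """Median height, Hmax (99th percentile) and usability flag (hypothetical helper)."""
    h = chm[np.isfinite(chm)]          # ignore missing CHM cells
    median_h = float(np.median(h))
    h_max = float(np.percentile(h, 99))
    return {"median": median_h, "hmax": h_max, "usable": median_h >= min_median}
```

Using a high percentile rather than the absolute maximum makes Hmax less sensitive to isolated spikes in the CHM.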
Elevation was derived from version 3 of the Shuttle Radar Topography Mission (SRTM) digital elevation model provided by NASA, at a spatial resolution of 30 m (Farr et al., 2007; Liu et al., 2014). This digital elevation model covers latitudes from 60° N to 56° S, about 80% of Earth's land surface.

| Environmental data
To test hypotheses (H3-H5), we downloaded spatial data on water deficit, soil cation concentration (SCC), wind gust speed and lightning frequency ( Figure S5) for the entire Amazon, and ran statistical models to evaluate their influence on canopy gaps.
Climatic water deficit (CWD, in mm) was obtained from the TerraClimate dataset, a global dataset of monthly climate and water balance for terrestrial surfaces spanning 1958-2015 (Abatzoglou et al., 2018).
With a spatial resolution of ~5 km, this layer combines high-spatial-resolution climatological normals from WorldClim with Climatic Research Unit (CRU) Ts4.0 and Japanese 55-year Reanalysis data.
CWD is a derived variable calculated as the annual evaporative demand that exceeds the water available in the soil. The reference evapotranspiration was calculated using the Penman-Monteith approach (Abatzoglou et al., 2018).
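To make the definition concrete, the sketch below computes an annual deficit as the sum of monthly PET − AET from a single soil-water bucket. This is a deliberately simplified illustration, not TerraClimate's water-balance model: the bucket capacity and initial storage (150 mm) are hypothetical values, PET is taken as given rather than computed with the Penman-Monteith approach, and `annual_cwd` is a hypothetical name.

```python
def annual_cwd(precip, pet, capacity=150.0, storage0=150.0):
    """Annual climatic water deficit (mm) from monthly P and PET (hypothetical helper).

    Simplified single-bucket model: each month AET = min(PET, P + storage),
    the month's deficit is PET - AET, and leftover water refills the soil
    bucket up to `capacity`. Capacity and initial storage are assumptions.
    """
    storage, deficit = storage0, 0.0
    for p, e in zip(precip, pet):
        available = storage + p          # water the vegetation can draw on
        aet = min(e, available)          # actual evapotranspiration
        deficit += e - aet               # unmet evaporative demand
        storage = min(capacity, available - aet)
    return deficit
```

A wetter month refills the bucket and contributes no deficit, while a run of dry months first drains storage and then accumulates deficit at the full PET rate.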
We used soil cation concentration (SCC, in cmol(+) kg⁻¹) as a proxy for soil fertility across the Amazon, following Zuquim et al. (2019). The SCC map was generated from field measurements of soil chemistry, expanded using maps of plant indicator species to derive soil information for locations covered by indicator-plant maps, and then interpolated by inverse-distance-weighted interpolation to produce a raster covering all of Amazonia at a spatial resolution of 6 arcmin (~11 km); the raster values were log-transformed to produce the soil fertility index (Zuquim et al., 2019).
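Inverse-distance weighting estimates each unsampled location as a weighted mean of the samples, with weights 1/d^p. The sketch below illustrates the interpolation and log-transform steps on toy coordinates; the power p = 2, the use of the natural logarithm, and the names `idw` and `fertility_index` are assumptions, since the text does not specify them.

```python
import numpy as np


def idw(xy_known, values, xy_query, power=2.0):
    """Inverse-distance-weighted interpolation (hypothetical helper).

    Returns the sample value exactly at sample locations; elsewhere a
    weighted mean with weights 1 / distance**power.
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    out = np.empty(len(xy_query))
    for i, row in enumerate(d):
        hit = row == 0
        if hit.any():
            out[i] = values[hit][0]      # query coincides with a sample
        else:
            w = row ** -power
            out[i] = np.sum(w * values) / np.sum(w)
    return out


def fertility_index(scc):
    """Log-transform of SCC; the natural log is an assumption."""
    return np.log(scc)
```

A larger power makes the surface follow the nearest samples more closely; a smaller one smooths it towards the regional mean.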
We used a map of instantaneous 10-m wind gust (WG, in m s⁻¹), which represents the maximum wind gust averaged over 3-second intervals at a height of 10 m above the Earth's surface (Olauson, 2018). This layer has a spatial resolution of ~25 km and comes from the fifth major global reanalysis (ERA5) produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). The reanalysis combines model data with observations from across the world into a globally complete and consistent dataset (Olauson, 2018). We resampled all the layers above to a spatial resolution of 500 m using bilinear interpolation, cropped them to the extent of the Amazon biome (Figure S5), and calculated the median value for each transect to relate it to that transect's level of disturbance, as represented by α. We used the raster package (Hijmans & van Etten, 2012) to work with these layers.
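The resampling and per-transect summary can be sketched with SciPy's `RegularGridInterpolator` using `method="linear"`, which on a 2-D grid is bilinear interpolation. This is a Python illustration, not the raster-package workflow used in the study: coordinates are assumed to be projected metres, the fine-grid cell centres must lie inside the coarse grid's range, the transect is represented by a hypothetical boolean mask, and `sample_bilinear` and `transect_median` are hypothetical names.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def sample_bilinear(coarse, y_coords, x_coords, y_fine, x_fine):
    """Bilinear interpolation of a coarse grid at fine-grid cell centres.

    Fine-grid points must lie within the coarse grid's coordinate range.
    """
    interp = RegularGridInterpolator((y_coords, x_coords), coarse, method="linear")
    yy, xx = np.meshgrid(y_fine, x_fine, indexing="ij")
    pts = np.column_stack([yy.ravel(), xx.ravel()])
    return interp(pts).reshape(yy.shape)


def transect_median(fine, mask):
    """Median of the resampled layer over the cells covered by a transect."""
    return float(np.median(fine[mask]))
```

Bilinear resampling does not add information at 500 m; it only smooths the ~25 km field so that neighbouring transects do not receive identical blocky values.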

| Statistical modelling
We first calculated Pearson correlations (r) among the environmental variables and forest structure metrics. The resulting correlation matrix guided our selection of predictor variables, so that strongly correlated variables would not be included together in the model (Figure S6). However, all absolute correlation coefficients were below 0.6, so we kept all variables in the models.
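A simple screen for strongly correlated predictor pairs, using the 0.6 cut-off mentioned above, might look like the sketch below. The function name `correlated_pairs` is hypothetical, and `X` is assumed to hold one column per predictor.

```python
import numpy as np


def correlated_pairs(X, names, threshold=0.6):
    """Predictor pairs whose |Pearson r| meets or exceeds threshold (hypothetical helper)."""
    r = np.corrcoef(X, rowvar=False)   # columns are variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs
```

An empty result corresponds to the situation reported here, in which no pair reached the cut-off and all variables were retained.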
To address our hypotheses, we fitted two separate multiple linear regression models containing the seven explanatory variables discussed above (see Table 1). The first model contained only linear terms, for simplicity, while the second model also included interaction terms. We determined which interaction terms to retain by first constructing a model containing all 42 possible interaction terms and then removing superfluous interaction terms by backward selection based on Akaike's information criterion (AIC). We removed interaction terms whenever this decreased model AIC. Following Burnham and Anderson (2004) and Symonds and Moussalli (2011), we further removed interaction terms that were not strongly supported (i.e. whose removal increased AIC by less than 10). Where two models were similarly parsimonious, we chose the model with fewer interaction terms (see Supplementary Materials for details, Figures S7 and S8).
We standardized all predictor variables prior to modelling, rescaling them to have a mean of zero and a standard deviation of one.
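The standardization and interaction selection can be sketched with plain NumPy least squares. This is an illustration, not the authors' code: it considers only pairwise products of main effects (21 candidates for seven predictors), which may differ from the candidate set used in the study, collapses the two removal rules into a single threshold `delta` = 10 AIC units (a decrease always qualifies), and omits the tie-breaking rule. `ols_aic` and `backward_select` are hypothetical names.

```python
import itertools

import numpy as np


def ols_aic(X, y):
    """Gaussian OLS AIC, up to a constant: n*ln(RSS/n) + 2*(p + 1)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = float(np.sum((y - X1 @ beta) ** 2))
    n, p = X1.shape
    return n * np.log(rss / n) + 2 * (p + 1)  # +1 for the residual variance


def backward_select(X, y, names, delta=10.0):
    """Backward elimination of pairwise interactions (hypothetical helper).

    Predictors are standardized first. At each step the interaction whose
    removal gives the lowest AIC is dropped, provided that removal raises AIC
    by less than `delta`. Main effects are always kept.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # mean 0, standard deviation 1
    inter = {f"{names[i]}:{names[j]}": Z[:, i] * Z[:, j]
             for i, j in itertools.combinations(range(Z.shape[1]), 2)}
    kept = list(inter)

    def aic(terms):
        cols = [Z] + [inter[t][:, None] for t in terms]
        return ols_aic(np.hstack(cols), y)

    current = aic(kept)
    while kept:
        trials = {t: aic([u for u in kept if u != t]) for t in kept}
        best = min(trials, key=trials.get)
        if trials[best] - current >= delta:
            break                                # every remaining term is well supported
        kept.remove(best)
        current = trials[best]
    return kept
```

Standardizing before forming products also makes the main-effect coefficients interpretable as slopes at the mean of the interacting variable.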
Our regression models had the following form:

$$\alpha_i = \beta_0 + \sum_{j} \beta_j X_{ij} + \varepsilon_i \tag{2}$$

where α_i is the power-law scaling coefficient for transect i, β_0 is the intercept, β_j is the regression coefficient for predictor variable j, X_ij is the value of predictor j for transect i, and ε_i is the residual error.
To assess goodness of fit, we performed a graphical analysis and calculated the AIC and the adjusted coefficient of determination (adjusted R²). We also evaluated collinearity among predictors using the variance inflation factor (VIF). Because our definition of canopy gaps was based on a number of arbitrary thresholds, we conducted a sensitivity analysis to determine how our model results depend on the choice of threshold (Figures S9-S11).
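The VIF of predictor j is 1 / (1 − R²_j), where R²_j comes from regressing predictor j on all other predictors plus an intercept. A NumPy sketch follows; `vif` is a hypothetical helper. statsmodels also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`, but that function expects the design matrix to already include a constant column.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (hypothetical helper)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        yj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, yj, rcond=None)
        resid = yj - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((yj - yj.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out
```

Values near 1 indicate little shared variance; common rules of thumb flag values above about 5 or 10.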

| Gap distributions across the Brazilian Amazon biome
We analysed 487 transects of LiDAR data distributed over the Brazilian Amazon to test whether the size distribution of canopy gaps varies systematically. Figure 2a shows the patterns of gap size distributions across the Amazon from Northwest to Southeast. The Southeast region had the highest proportion of large gaps (median α ± 95% confidence interval: 1.92 ± 0.02). The North (1.99 ± 0.01) and West (2.02 ± 0.04) regions contained similar distributions of gaps, while the Central-East region (2.10 ± 0.03) had the lowest proportion of large gaps (Figure 2b).

| Human-modified forests
The median α for intact forest areas was significantly higher than that for human-modified forests (2.08 and 1.96, respectively; Wilcoxon p < 0.001). This result supports H1, that human-modified forests contain a higher proportion of large gaps than intact forests (Figure 2c). It is also supported by the fact that intact forest status (IFL = 1) had a significant positive effect on α in all our linear regression models (Table 1).
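The intact versus human-modified comparison can be run on per-transect α values with SciPy's `mannwhitneyu`, which implements the Wilcoxon rank-sum (Mann-Whitney U) test. We assume the unpaired rank-sum version is meant, since the two groups consist of different transects; `compare_intact` is a hypothetical name.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def compare_intact(alpha_intact, alpha_modified):
    """Group medians and two-sided rank-sum p-value (hypothetical helper)."""
    res = mannwhitneyu(alpha_intact, alpha_modified, alternative="two-sided")
    return float(np.median(alpha_intact)), float(np.median(alpha_modified)), res.pvalue
```

A rank-based test avoids assuming that α is normally distributed within each forest class.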

| Modelling the size distribution of canopy gaps
We used two multiple linear regression models to address our hypotheses regarding how environmental variables and canopy height jointly explain the observed variation in gap size distribution. We found that the first model (containing only linear terms for each explanatory variable) explained 39% of the variance in α, while the second model (additionally containing five interaction terms) explained 45% of the variance in α (Figure 3a). The two models show broadly similar patterns and we discuss them both in relation to our hypotheses below.
Our second hypothesis (H2) was that forests with tall trees would contain a high proportion of large gaps because big trees produce large gaps when they die. We found that maximum tree height was negatively associated with α in both models, meaning that the presence of very tall trees was associated with a high proportion of large gaps (supporting H2). Model 2 included an interaction between maximum tree height and wind gust speed, suggesting that this negative relationship between α and maximum tree height is strongest in areas with a low wind gust speed (Figure 3b).
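Because model 2 contains product terms, the slope of α with respect to one predictor depends on the value of its interaction partner. A short derivation makes this explicit, written with hypothetical symbols H (standardized maximum canopy height) and W (standardized wind gust speed) for the retained height × wind term, with the other terms omitted:

```latex
\alpha_i = \beta_0 + \beta_H H_i + \beta_W W_i + \beta_{HW} H_i W_i + \cdots + \varepsilon_i
\quad\Longrightarrow\quad
\frac{\partial \alpha_i}{\partial H_i} = \beta_H + \beta_{HW} W_i
```

With standardized predictors, β_H is the slope of α on H at mean wind speed, and the slope one standard deviation below or above the mean is β_H − β_HW or β_H + β_HW. A negative relationship that is steepest at low wind therefore corresponds to β_HW > 0; the fitted coefficients are those reported in Table 1 and are not reproduced here.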
Our third hypothesis (H3) was that high wind speeds and lightning frequency would be associated with a high proportion of large gaps due to an increased rate of storm disturbance. Our results support this hypothesis, since both models found significant negative effects of wind and lightning on α. We also found significant interaction terms between lightning frequency and SCC (Figure 3c), and between wind gust speed and maximum tree height (Figure 3b).
Finally, model 2 included a significant positive interaction between lightning frequency and elevation. This suggests that the negative relationship between α and lightning is strongest at low elevations.
Our fourth hypothesis (H4) was that forests with high water deficit would be associated with a high proportion of large gaps because the rate of recovery after disturbance would be slow. Our results support this hypothesis, since α was negatively associated with water deficit in both models. Model 2 includes significant interactions between water deficit and both elevation and intact forest status.
These interactions show that the negative relationship between α and water deficit is strongest for intact forests (Figure 3d) and at relatively high elevations (Figure 3e).
Our final hypothesis (H5) was that low soil fertility (approximated by low SCC) would be associated with a high proportion of large gaps because gaps would recover less quickly. Our results do not support this hypothesis, since α was negatively associated with soil cation content in both models, which means that large gaps predominantly occurred on fertile soils. In model 2, we found a significant negative interaction between SCC and lightning frequency (Figure 3f). This suggests that the negative relationship between α and SCC is strongest in areas with frequent lightning.

| Sensitivity to gap definition
We defined canopy gaps using a number of arbitrary thresholds (see Section 2), and we therefore conducted a sensitivity analysis to test how our results depended on these thresholds. Our results were not highly sensitive to the choice of minimum or maximum gap size, or to the erroneous-gap filter (Figures S4, S10 and S11). Our results were sensitive to the choice of cut-off height (Figure S9). We found that cut-off heights of 5 m and 10 m produced similar estimates for the model coefficients. We were also concerned that the 2 m cut-off height would be more sensitive to errors in the ground detection algorithm used to create the canopy height model. We therefore used the 10 m cut-off height for our analysis, but models for the 2 m and 5 m cut-off heights are provided in the Supplementary Materials (Figures S9-S11).

| DISCUSSION
The size distribution of canopy gaps (α) detected in an area of forest is determined by the balance between disturbance (gap production) and productivity (gap recovery) processes (Jucker, 2022). Large gaps take longer to recover than small gaps, so areas with a high proportion of large gaps have either a high disturbance rate or a slow recovery rate. It is important to consider the different time-scales of the main processes determining α. Water and soil fertility gradients have long-term effects on forest structure and species composition (ter Steege et al., 2006) and therefore cause long-term changes in the canopy gap size distribution. On the other hand, the immediate effects of disturbance are short-lived in the tropics, since canopy gaps close after 3-6 years due to natural regeneration and infilling (Brokaw, 1985; Dalagnol et al., 2019). Repeated disturbance can also have long-term impacts on forest structure. For instance, decades of high deforestation rates have left behind a legacy of fragmentation, increased forest edges and degraded forests across parts of Brazil (Aragão et al., 2014).
FIGURE 3 Summary of the interaction model for α. The y-axes on all panels show the observed α. Panel a shows the full model predictions (model 2), which had R² = 0.45, RSE = 0.121, AIC = 661 and p < 0.001. The remaining panels (b-f) show the five interaction terms that contribute to the full model (see Table 1). In each case, one variable is shown on the x-axis and the other is displayed using a colour scale. For panels c-f, the black fit line represents the effect of the x-variable on α with no interaction. The red and blue fit lines illustrate the interaction effect: the red line is fitted only to the points where the interaction variable (in colour) is higher than its median value, and the blue line to the points where it is lower. For panel b, the fit lines represent the two classes (intact and human-modified forests).

| Human-modified forests contain a high proportion of large gaps
In support of H1, we found that human-modified forests are characterized by a high proportion of large gaps (α < 2 in 71% of transects in human-modified forests vs. 42% in intact forests). Similarly, Wedeux and Coomes (2015) found a high proportion of large gaps in areas affected by logging in Central Kalimantan, Indonesia. This suggests that human activities such as logging leave a legacy of large gaps that are slow to recover. Our models also showed that high water deficit was associated with an increase in the proportion of large gaps, but only across intact forest. The fact that this trend did not occur across human-modified forests suggests that the impact of humans on forest structure masks the potential impact of water availability.

| Tall trees produce large canopy gaps when they fall
We found that α was negatively correlated with local maximum canopy height. This effect supports our hypothesis (H2) that tall forests would contain a higher proportion of large gaps than shorter forests. This is likely because tall trees have large crowns and therefore produce large canopy gaps when they fall (Grubb, 1977). Conversely, short forests often contain a higher density of smaller trees, leading to smaller gaps when trees die and fall (Wedeux & Coomes, 2015).

| Wind and lightning are associated with a high proportion of large gaps
High wind gust speeds and lightning frequency were associated with a high proportion of large canopy gaps (supporting H3). This suggests that increased rates of natural disturbance change the structure of intact forests so that they resemble human-modified forests (i.e. a high proportion of large gaps).
Wind may be the direct cause of death for some individual trees and will also cause damaged or dead trees to snap or uproot, increasing the size of canopy gaps (Esquivel-Muelbert et al., 2020). Individual trees may acclimate to their local wind environment (Bonnesoeur et al., 2016), but when they are exposed to increased wind loading, for example due to the creation of a nearby canopy gap, they are more likely to be damaged (Aleixo et al., 2019; Kamimura et al., 2019; Mitchell, 2013). This leads to a gap 'contagion' effect, whereby large gaps may grow over time (Jucker, 2022). In extreme cases, wind disturbance can cause extensive damage (gaps >10 ha) to the forest canopy (Negrón-Juárez et al., 2018), but smaller-scale wind disturbances (<0.1 ha) likely account for a much larger proportion of biomass turnover (Espírito-Santo et al., 2014).
We found that the negative relationship between α and wind gust speed was strongest across forests with low maximum canopy heights. This seems counterintuitive, since we would expect tall trees to be the most vulnerable to strong winds (Jackson et al., 2021). One possible explanation is that forests containing tall trees have survived more windstorms and are therefore better acclimated to high winds than shorter forests (Bonnesoeur et al., 2016).
We also found that high lightning frequency was associated with a high proportion of large gaps. Lightning is often underestimated as a driver of tree mortality, partly because a struck tree can take many years to die and the proximate cause of death may be mislabelled (e.g. as wind damage). Recent studies show that a single lightning strike can kill multiple trees, that lightning predominantly affects taller trees, and that it could be responsible for approximately 40% of the mortality of tall trees in lowland tropical forests (Yanoviak et al., 2020). However, the interaction between lightning and maximum canopy height was not statistically significant in our model. Instead, the effect of lightning on α was strongest at low elevations and in forests with low soil cation concentrations. The latter effect may be related to the rate of recovery after disturbance, with forests on less fertile soils recovering more slowly after lightning disturbance (as in H5).

| High proportion of large gaps in drier forests
We hypothesized (H4) that forests with high water deficit would recover slowly from disturbance and would therefore contain a high proportion of large gaps. Our results supported this hypothesis: we found a high proportion of large gaps in forests on the south-eastern fringes of the Amazon, which are characterized by frequent, prolonged moisture deficits (Phillips et al., 2009). The negative effect of water deficit on α was strongest at high elevations, although this interaction is driven by a small number of high-elevation LiDAR transects on the south-eastern edge of the Brazilian Amazon (Figure S5).

| High proportion of large gaps in forests with fertile soils
Contrary to our hypothesis (H5), high soil nutrient availability (as measured by soil cation concentration) was associated with a high proportion of large gaps. For example, Acre state (see Figure 2) has fertile soils (Quesada et al., 2012; Zuquim et al., 2019) and high productivity (Phillips et al., 2004), yet a high proportion of large gaps.
Conversely, the nutrient-poor soils found in the centre of the biome, mostly in Amazonas state (Figueiredo et al., 2018), were associated with a lower proportion of large gaps. One possible explanation for the high proportion of large gaps in forests on fertile soils is that high productivity leads to high turnover rates, which may counteract the presumably fast recovery rates on these soils.
Another possible explanation is that high productivity leads to tall canopies, and maximum canopy height has a strong negative effect on α (H2). This is supported by a moderate correlation between soil cation concentration and maximum canopy height (see Figure S6).

| Large-scale trends in gap size distributions across the Amazon
Tree mortality rates vary across the Amazon, with higher mortality in the Western and South-eastern regions than in the Northern and Central-east regions (Esquivel-Muelbert et al., 2020). Similarly, using a network of field plots, Johnson et al. (2016) found the lowest rate of stem mortality in the Central-east Amazon, followed by the Northern, Western and South-eastern regions (Table 2). Dalagnol et al. (2021) predicted mortality rates from gap fraction and found a similar pattern, although with lower absolute mortality values (Table 2).
Our results partially align with these trends in tree mortality. Specifically, we found the highest proportion of large gaps in the South-eastern region (median α ± 95% confidence interval: 1.92 ± 0.01) and the lowest proportion in the Central-east region (2.10 ± 0.03), consistent with the mortality rates reported in previous studies (Dalagnol et al., 2021; Esquivel-Muelbert et al., 2020; Johnson et al., 2016). However, the median α for the North (1.99 ± 0.01) was similar to that for the West (2.02 ± 0.04). This is surprising because the Northern region has previously been shown to have lower mortality rates, similar to the Central-east. This disparity could be caused by the savanna of Roraima (in the Northern region), which has an open canopy structure and therefore contains a high proportion of large gaps (Barbosa & Campos, 2007).
We note that, to compare our study with previous work (e.g. Dalagnol et al., 2021; Esquivel-Muelbert et al., 2020; Johnson et al., 2016), we compared the median α across the four regions described by Feldpausch et al. (2011). However, these regions were defined to maximize differences in tree allometry (Feldpausch et al., 2011) and are therefore not directly related to mortality. In addition, our dataset does not sample these regions evenly; in particular, the Northern and Western regions are not fully covered by our LiDAR dataset (Figure 2a).
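The regional summaries above are reported as a median α with a 95% confidence interval. The text does not state how that interval was derived; the sketch below assumes a simple percentile bootstrap over per-transect α values, one common choice for a median. The function name `bootstrap_median_ci` and its defaults are hypothetical, and the input values are synthetic placeholders rather than data from this study.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=2000, level=0.95, seed=0):
    """Median of `values` with a percentile-bootstrap confidence interval."""
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(values)
    # Resample transects with replacement and record each resample's median.
    boot_medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    k = int(round((1.0 - level) / 2.0 * n_boot))  # resamples cut from each tail
    return statistics.median(values), boot_medians[k], boot_medians[n_boot - 1 - k]


# Synthetic per-transect alpha values for one hypothetical region.
gen = random.Random(1)
hypothetical_alphas = [gen.gauss(2.0, 0.1) for _ in range(50)]

median, lower, upper = bootstrap_median_ci(hypothetical_alphas)
print(f"median alpha = {median:.2f}, 95% CI [{lower:.2f}, {upper:.2f}], "
      f"+/- {(upper - lower) / 2:.2f}")
```

Half the interval width, (upper − lower)/2, gives a symmetric ± summary comparable in form to the values quoted above, although a percentile interval need not be symmetric around the median.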

| CONCLUSIONS
Canopy gaps reflect the balance between disturbance and regeneration in forests, and spatial variation in gap sizes reflects geographical variation in the drivers of forest growth and mortality. This study provides new insight into how canopy gap size distributions vary across the Brazilian Amazon and into the processes that drive forest dynamics in the region.
We found that forests in the south-eastern Brazilian Amazon contain a higher proportion of large gaps than forests in the North and West, consistent with previous studies showing greater mortality rates in the Southeast. As expected, human-modified forests contained a higher proportion of large gaps than intact forests. The presence of very tall trees was also associated with a high proportion of large gaps, presumably because large trees leave large gaps when they die.
We found that high water deficit, wind speed and lightning frequency were associated with a high proportion of large gaps. This suggests that stressors such as drought, wind and lightning increase forest disturbance rates. Finally, we found a high proportion of large gaps in forests on fertile soils, possibly due to the tall canopies or fast turnover rates in these areas. Because a low α indicates a high proportion of large gaps, we would expect α to be lower in regions with higher mortality rates. These findings suggest that the increased frequency of extreme weather events resulting from climate change may increase the proportion of large gaps in currently intact forests across the Brazilian Amazon.