Amazonian pollen assemblages reflect biogeographic gradients and forest cover

Pollen assemblages are commonly used to reconstruct past climates but have not yet been used to reconstruct past human activities, including deforestation. We aim to assess (i) how pollen assemblages vary across biogeographic and environmental gradients, (ii) the source area of pollen assemblages from lake sediment samples and (iii) which pollen taxa can best be used to quantify deforested landscapes.


| INTRODUCTION
The vastness of lowland Amazonia embraces a diversity of naturally occurring habitats ranging from savannas to dense rainforests.
Gradients of temperature, elevation, soil properties, precipitation and seasonality contribute to species distribution patterns, forest structural differences and propensity for fire (e.g. Gentry, 1988; Nepstad et al., 2004; ter Steege et al., 2006). This diversity of habitats has played a role in determining the distribution and density of past and present human occupation of the Amazon Basin, and also in the manner of land use (e.g. Bush et al., 2015; Neves & Petersen, 2006; Watling et al., 2015). For example, on the most nutrient-poor soils, people enriched the native clays with ash, faeces and silage to create Amazonian Dark Earths, but this practice was not followed on the more fertile soils emanating from the Andes (e.g. Glaser & Woods, 2004; Lehman et al., 2003). A commonality across Amazonia is that pre-Columbian settlers appeared to prefer close proximity to rivers or lakes as they offered additional food resources and, in the case of rivers, a means of transportation (e.g. Denevan, 1996).
In most settings, the characteristic signature of human occupation was forest clearance, commonly associated with fire or signals of cultivation. The scale of such disturbances in pre-Columbian Amazonia is actively debated, as is the detectability of these disturbances in paleoecological records (e.g., Clement et al., 2015; McMichael et al., 2012; Piperno et al., 2015). While identifying periods of human occupation in Amazonian forests can be relatively straightforward using archaeological or paleoecological approaches, quantifying past landscape modifications has proved challenging. Lake sediments can be particularly useful for quantifying aspects of past human activity as they offer the best possibility of obtaining an uninterrupted archive spanning millennia. Investigations of pre-Columbian land use and occupation patterns in Amazonia using lake sediments are routinely based on fossil pollen assemblages (Nascimento et al., 2022 and references within). The pollen assemblages are commonly analysed alongside charcoal fragments, which can be used to estimate the severity, frequency and intensity of fire events in Amazonian landscapes (Gosling et al., 2021; Power et al., 2008, 2013). Cultivation practices can also be identified (though not quantified) through the analysis of pollen grains or phytoliths found in lake sediments.
One parameter that has been lacking, however, is the extent of deforestation that occurred alongside occupation and cultivation. Using pollen assemblages to quantify past land cover has been successful in temperate ecosystems (Dawson et al., 2019; Gaillard, Sugita, Bunting, Dearing, et al., 2008; Gaillard, Sugita, Bunting, Middleton, et al., 2008; Liu et al., 2022; Prentice, 1988; Sugita, 1993, 2007; Trondman et al., 2015). The only attempt to quantify land cover in Amazonia was performed in the savanna-forest landscape of Bolivia, which revealed that deforestation was easier to detect in the closed-canopy rainforests than in the savannas (Whitney et al., 2019).
One of the problems encountered in quantitatively reconstructing aspects of climate or land cover is that for a given taxon (which can be at the species, genus, or family level), pollen percentages do not reflect a 1:1 relationship with plant abundances in the landscape (Bush, 1991; Bush & Rivera, 1998, 2001; Gosling, 2004). In tropical settings such as Amazonia, many forest taxa produce very little pollen, or produce many pollen grains that cannot be identified with certainty (Bush et al., 2021; Bush & Weng, 2007). Most Amazonian tree species have hermaphroditic flowers (both stamens and pistils in the same flower), and these species tend to be severely underrepresented in pollen assemblages though they may comprise the majority of aboveground biomass of the system (Bush, 1995). The pollen abundances of monoecious tree species (having both male and female flowers on the same individual plant) may be close to parity with their abundances in the landscape (Bush, 1995). Dioecious species have either male or female flowers on an individual plant, and as the pollen must travel some distance for pollination to occur, these species are likely to be over-represented in pollen assemblages. Many pioneer or early successional species in Amazonia are dioecious, and because of their over-representation, pollen assemblages may suggest a more disturbed landscape than is the reality (Bush & Rivera, 1998).
A further complication in translating pollen assemblages into total amounts of forest cover or relative amounts of vegetation types is that many pollen grains can only be identified at the family level, and some families include congeners that are shrubs, trees or vines, for example Combretaceae. Thus, natural biogeographic variation in species composition, with different balances of dioecy, monoecy, or hermaphroditism between habitats across Amazonian pollen assemblages, may indicate varying amounts of forest cover even when no differences exist. An alternative approach is to look at the relative abundances of pollen that are derived from non-tree taxa (e.g. grasses, the Poaceae family). Poaceae percentages within a pollen assemblage are often interpreted as indicators of disturbances or open landscapes within the closed-canopy forests of Amazonia (Bush, 2002). Poaceae pollen within closed-canopy forests, however, can also be derived from floodplains, marshes and floating aquatic grass mats and may not be a direct reflection of deforestation due to human activity (Bush, 2002; Colinvaux et al., 1985, 1988). Further, in the savannas located along the periphery of Amazonia, grass is naturally abundant, and Poaceae pollen may not be able to detect forest openings caused by human activities or other processes (Whitney et al., 2019).
Quantifying the amount of deforested areas using pollen assemblages in Amazonia is also dependent on the source area of pollen (Liu et al., 2022). Studies in the forest-savanna ecotone areas of Bolivia suggested that anemophilous (wind-dispersed) pollen found in lake sediments may be derived from distances of over 40 km (Whitney et al., 2019). In the closed-canopy forests that comprise most of the Amazon, pollen dispersal distances are believed to be smaller, limited by the high density of trees, low sub-canopy wind velocities, and predominance of entomophilous taxa (Bush et al., 2021; Gosling et al., 2009). There has been no assessment of the variability of the source area of pollen deposited into lake sediments across various regions of Amazonia.
Here we assess how pollen assemblages across Amazonia vary in aspects of biogeography, source area, forest cover and amounts of cleared landscapes. We use modern pollen assemblages from 65 Amazonian lakes to assess the variability in composition of Amazonian pollen assemblages and how they may be used to reconstruct past landscape deforestation. We specifically address the following questions: How do pollen assemblages derived from lake sediments vary across biogeographic regions in the Amazon? What spatial scale are pollen assemblages representing across the biogeographic regions of Amazonia? Which pollen taxa can be used to best quantify deforested landscapes?

| Data collection and preparation
A total of 65 sites located within Amazonia sensu stricto (Eva et al., 2005) were used in our analysis (Figure 1, Table S1). Samples from new sites (n = 44) were collected with a Universal corer during 2018 and 2019 (Table S1). Samples from previously published sites (n = 21) were collected with a Colinvaux-Vohnout piston corer or a Universal corer (Table S1) (Bush et al., 2021). Our sites were located within evergreen tropical rainforests (n = 56) or savannas close to the ecotone with forests (n = 9; Figure 1a). Forty-four of the forested sites were situated within terra firme soils, and 21 on floodplains (Table S1). The sites included permanent lakes, oxbow lakes and flooded marshes in the Brazilian savannas (Table S1).
We divided our sites into three geographic regions (northwestern, southwestern and eastern) based on previous classifications of the biogeographical distinctness of vegetation (Gentry, 1988; ter Steege et al., 2000, 2013, 2016; Figure 1). We included additional classifications of 'savanna' and 'Hill of Six Lakes' (hereafter Ho6L) sites, which are also biogeographically distinct from the surrounding landscape (Figure 1). The 'savanna' sites of Roraima (Brazil) are located in non-forested savanna areas, and the Ho6L sites are located atop an inselberg in northwestern Amazonia (Figure 1). We characterized the climatic and topographic conditions of each site (Table S1). The climatic variables of mean annual temperature, mean annual precipitation and dry quarter precipitation were derived from CHELSA, a free high resolution (30 arc sec, ~1 km) global climate dataset (Karger et al., 2017). Averages were derived from climatological data obtained from 1981 to 2010. Elevation data were obtained from Shuttle Radar Topography Mission hole-filled ~1 km resolution digital elevation models (Jarvis et al., 2008).
Climate and topographic data for a 40 km radius around each site were downloaded, and the average values from the grid cells falling within each of the buffers were calculated (see below for description of buffers).
FIGURE 1 Map of 65 lakes within Amazonia sensu stricto (Eva et al., 2005, green outline) with modern pollen assemblages used in the analysis. Sites are colour-coded based on assigned biogeographic regions. Inset maps show environmental gradients of mean annual temperature, mean annual precipitation, dry quarter precipitation and elevation within Amazonia (see Table S1 for values for individual sites). All maps were generated using ArcGIS Pro 3.0 and the environmental data were obtained from CHELSA V 2.1 (Karger et al., 2017) or Shuttle Radar Topography Mission digital elevation models (Jarvis et al., 2008).
Elevation ranged from 10 to 700 m a.s.l. across all sites, with the western and Ho6L regions bordering the Andes primarily containing sites with higher elevations and the eastern region containing sites at the lower elevation range. Mean annual temperatures were highest in the eastern and savanna sites (27°C) and decreased toward the northwestern sites (22-25°C; Figure 1; Karger et al., 2017). Annual precipitation ranged from 1600 to 3300 mm, with the northwestern sites receiving the highest amounts of precipitation and the savanna sites receiving the lowest (Figure 1). Seasonal precipitation (precipitation of the driest quarter) ranged from 138 to 923 mm and was highest in the northwestern region and lower in the eastern and savanna regions (Figure 1).
From each of the 65 sites, we collected the upper 1 cm of sediment (mud-water interface) for pollen analysis. Pollen samples were prepared in the Palaeoecological Laboratory at the Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Netherlands or in the Neotropical Paleoecology Laboratory at Florida Institute of Technology, USA, and followed standard procedures as documented in Faegri and Iversen (1989). For each sample, 300 to 500 pollen grains were counted to generate a pollen assemblage, and tablets of Lycopodium clavatum spores (Batch no: 483216, 18,583 spores per tablet) were added prior to sample preparation to allow for calculations of pollen concentration (Stockmarr, 1972).
Pollen identification of the new samples was at a finer taxonomic resolution than the previously published samples, and data were harmonized (e.g. merged to a coarser taxonomic resolution) prior to analysis (e.g. Bush et al., 2021).
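The exotic-marker calculation referred to above (Stockmarr, 1972) can be illustrated with a minimal sketch. The spore count per tablet (18,583) is taken from the text; the number of tablets added, the sample volume and the example counts are assumptions made only for illustration, and the function name is hypothetical.

```python
def pollen_concentration(pollen_counted, lycopodium_counted,
                         spores_per_tablet=18583, tablets_added=1,
                         sample_volume_cm3=1.0):
    """Estimate pollen concentration (grains per cm^3) from an exotic-marker spike.

    tablets_added and sample_volume_cm3 are assumed defaults, not values
    reported in the study.
    """
    if lycopodium_counted <= 0:
        raise ValueError("at least one Lycopodium spore must be counted")
    spores_added = spores_per_tablet * tablets_added
    # Ratio of fossil grains to marker spores, scaled by the known spike size
    grains_in_sample = pollen_counted * spores_added / lycopodium_counted
    return grains_in_sample / sample_volume_cm3


# Hypothetical count: 400 pollen grains and 50 marker spores in a 1 cm^3 sample
example = pollen_concentration(400, 50)
```

Because the analyses in this study were run on percentages, a concentration of this kind would serve mainly as a complementary check on pollen influx rather than as a model input.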
To assess the spatial scale of pollen representation, we generated a series of buffers around the perimeter of each of the 65 lakes at distances of 1, 2, 5, 10, 20 and 40 km using Google Earth Engine (Gorelick et al., 2017; Figure S1). We used the Global Forest Change v1.8 database, which is publicly available at 30 m spatial resolution and generated from Landsat satellite imagery, to calculate forest cover for each buffer around each site (Hansen et al., 2013). Within the database, forest cover is expressed as percentages per pixel, where 0 represents no forest cover within the pixel, and 100 represents a completely forested pixel. We considered pixels containing >50% of forest cover as forested. The number of forested pixels (number of pixels where forest canopy height exceeds 5 m) for each buffer was calculated by taking the number of forested pixels present in the year AD 2000 and subtracting the number of forest pixels lost each year until the time of sample collection. For example, if the lake sediment sample was collected in AD 2018, the number of forested pixels was the number of forested pixels present in AD 2000 minus the number of forested pixels lost each year until AD 2018 (Figure S1). JRC Global Surface Water Mapping Layers, v1.3 (Pekel et al., 2016) was used to delineate water body boundaries and to calculate the number of water-based pixels within each buffer (Figure S1). The final metric of relative forest cover was the proportion of forested pixels out of the total number of non-water pixels for each buffer (Figure S1).
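The forest cover metric described above can be sketched for a single buffer as follows. This is an illustration on plain Python lists, not the Google Earth Engine workflow used in the study. The pixel encoding (calendar years for loss, `None` for no recorded loss), the exclusion of water pixels from both numerator and denominator, and the function name are assumptions.

```python
def relative_forest_cover(tree_cover_2000, loss_year, is_water,
                          sample_year, threshold=50):
    """Proportion of non-water pixels in a buffer still forested at sampling.

    tree_cover_2000: per-pixel canopy cover in AD 2000 (0-100 %)
    loss_year: per-pixel calendar year of forest loss, or None if none recorded
    is_water: per-pixel True/False water mask
    """
    forested = 0
    non_water = 0
    for cover, lost, water in zip(tree_cover_2000, loss_year, is_water):
        if water:
            continue  # water pixels are excluded from the calculation
        non_water += 1
        was_forest = cover > threshold  # >50 % cover counts as forested
        lost_before_sampling = lost is not None and lost <= sample_year
        if was_forest and not lost_before_sampling:
            forested += 1
    if non_water == 0:
        raise ValueError("buffer contains no non-water pixels")
    return forested / non_water


# Hypothetical six-pixel buffer around a lake sampled in AD 2018
cover = [90, 80, 40, 100, 75, 60]
loss = [None, 2010, None, 2019, None, None]
water = [False, False, False, False, True, False]
result = relative_forest_cover(cover, loss, water, sample_year=2018)
```

In this toy buffer, one pixel was cleared before sampling, one was never forested, one is water, and one was lost only after the sample was taken, so it still counts as forest.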

| Data analysis
All pollen counts were converted to percentage data, and all analyses were run on pollen percentages. We performed Detrended Correspondence Analysis (DCA) to assess (dis)similarity between samples within and between the biogeographic regions. We also fitted smooth surfaces of elevation, mean annual temperature, annual precipitation and precipitation seasonality on DCA site score ordinations (Oksanen et al., 2013) to assess (dis)similarity among sites along the environmental gradients. All ordinations were performed using pollen taxa that exceeded 5% in at least one sample. We used this threshold to reduce the possibility that rare taxa (which are also potentially misidentified) would drive the sample differences (Correa-Metrio et al., 2014). To reduce the number of taxa shown on the plot (overcrowding), only taxa with a significant correlation (p < 0.05) with either of the first two axes were shown on DCA outputs.
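The percentage conversion and the 5% inclusion threshold described above can be sketched as follows. The counts are invented for illustration and are not data from the study; the function names are hypothetical, and the ordination itself is not reproduced here.

```python
def to_percentages(counts):
    """Convert one sample's raw counts {taxon: n} to percentages of the pollen sum."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has an empty pollen sum")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def taxa_for_ordination(samples, threshold=5.0):
    """Return taxa whose percentage exceeds the threshold in at least one sample."""
    keep = set()
    for counts in samples.values():
        for taxon, pct in to_percentages(counts).items():
            if pct > threshold:
                keep.add(taxon)
    return sorted(keep)


# Hypothetical counts for two sites (pollen sum of 300 grains each)
samples = {
    "site_A": {"Moraceae/Urticaceae": 180, "Cecropia": 90, "Poaceae": 12,
               "Alchornea": 12, "Mauritia": 6},
    "site_B": {"Moraceae/Urticaceae": 60, "Cecropia": 9, "Poaceae": 210,
               "Alchornea": 12, "Mauritia": 9},
}
selected = taxa_for_ordination(samples)
```

Here Poaceae is retained because it exceeds 5% at one site even though it is rare at the other, whereas Alchornea and Mauritia never pass the threshold and are dropped.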
For each pollen type identified, we categorized the growth form and pollination strategy based on expert knowledge and published literature (Bush, 1995; Ollerton et al., 2011; Renner & Feil, 1993) (Table S2) and used these to form pollen taxa groups (e.g., grasses, herbaceous, woody, wind pollinated, animal pollinated). We assessed how the percentages of specific pollen taxa (Poaceae, Cecropia and Moraceae/Urticaceae) or percentages of the pollen taxa groups (Table S2) could be used to predict relative amounts of forest cover. Poaceae (the grasses) are typically indicative of open areas, Cecropia is a light-loving disturbance tree that is indicative of early successional regrowth, and the Moraceae/Urticaceae families, which are not distinguishable using pollen, form a group composed primarily of Amazonian trees. The pollen taxa groups were either related to forest structure (e.g. herbaceous, woody and trees) or the pollination syndrome (i.e., anemophilous and entomophilous), as both could be linked to amounts of forest cover in the surrounding landscape. We used a beta regression model (Cribari-Neto & Zeileis, 2010), which was developed for response variables that are proportions or rates, to determine how well the pollen taxa and taxa groups could predict the proportion of forest cover pixels in the landscape. Lake size can have a strong influence on pollen source area (Jacobson & Bradshaw, 1981; Prentice, 1988); therefore, we also ran models with lake area (km²) as a secondary predictor variable to determine if size affected the apparent source area of pollen. We ran models for each of the buffers around the lakes where forest cover was measured (1, 2, 5, 10, 20 and 40 km). Model performance was assessed using pseudo-R² values and p-values of the beta regression coefficients for the predictor variables (Cribari-Neto & Zeileis, 2010). Pseudo-R² values for the beta regression model are squared correlation values of the linear predictor and link-transformed response (in this case, logit). To show proof of concept, we applied our beta regression model to previously published pollen data from Lake Kumpakᵃ in Ecuador (Åkesson et al., 2021). The superscripted "a" in Kumpakᵃ reflects pronunciation of the lake name in the Shuar language (Colinvaux et al., 1985).
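The pseudo-R² definition given above (squared correlation between the linear predictor and the logit-transformed response) can be written out as a short sketch. It does not fit a beta regression; it only evaluates the statistic for coefficients supplied by the user. The data and coefficients below are hypothetical, and the response is assumed to lie strictly between 0 and 1.

```python
import math


def logit(p):
    """Logit link; p must lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pseudo_r2(predictor, response, intercept, slope):
    """Squared correlation between the linear predictor and logit(response)."""
    eta = [intercept + slope * x for x in predictor]
    return pearson_r(eta, [logit(y) for y in response]) ** 2


# Hypothetical Poaceae percentages and forest-cover proportions for five sites
poaceae = [1.0, 5.0, 10.0, 30.0, 60.0]
forest = [0.95, 0.90, 0.80, 0.50, 0.10]
r2 = pseudo_r2(poaceae, forest, intercept=2.5, slope=-0.08)  # placeholder coefficients
```

With a single predictor, this statistic depends only on how strongly the predictor correlates with the logit of the response, because correlation is unchanged by a linear rescaling; the fitted coefficients matter once a second predictor such as lake area is added.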

| RESULTS
Pollen assemblages, based on 252 pollen types found across all sites, showed high taxonomic diversity and variation both within and among the biogeographic regions (Figure 2). The northwestern, southwestern and Ho6L sites generally contained higher abundances of Moraceae/Urticaceae pollen (10%-60%) compared with the eastern and savanna sites (Figure 2). The dry forest shrub Byrsonima was present in higher abundances (5%-35%) in the savanna and eastern sites compared with the other regions where it rarely exceeded 5%. Alchornea, a tree, was present in all regions, but highest in the Ho6L sites (Figure 2).
Percentages of the hardwood palm Iriartea deltoidea were higher in the northwestern and southwestern sites (5%-20%) compared with other regions, matching its known geographic distribution. Euterpe, a genus of palm that includes E. precatoria, the most common tree in the Amazon, had higher pollen abundances (5%-25%) in the eastern sites compared with other regions (Figure 2). Percentages of the weedy herbaceous taxon Ambrosia and the sedges (Cyperaceae family) were generally higher in the eastern and savanna sites compared with the other regions. All regions contained grasses (Poaceae), though abundances were highest in the savanna region (25%-68%) and ranged from 0% to 40% in the other regions (Figure 2). Some taxa were present in all regions but had highly variable percentages within the region.
For example, Cecropia, a genus of fast-growing, disturbance-oriented trees, was present in all regions, but within a region ranged from 0% to >30%. The palms Mauritia flexuosa and Mauritiella sp., which only grow in swampy conditions, were also present in all regions but were absent from many individual sites (Figure 2).
The DCA provided some separation according to the biogeographic affiliation of the 65 sites (Figure 3a). The northwestern and southwestern sites had pollen assemblages that were more similar to each other and were located on the positive end of Axis 1, whereas the eastern Amazonian and savanna sites overlapped and were both located on the negative end of Axis 1 (Figure 3a). In total, the 28 taxa used in this analysis were able to significantly separate the western sites from the eastern sites on DCA Axis 1.
Generally, sites within the NW Amazon and Ho6L regions had the highest percentages of surrounding forest cover (75%-99%) and the E Amazon sites typically had lower percentages (30%-78%) (Figure S2).
The savanna sites had the lowest forest cover values, ranging from <1% to 26%. The SW Amazon sites exhibited the most variability, where forest cover percentages ranged from 30% to 90%, depending on the size of the buffer. The percentages of forest cover around each lake, and thus site rankings, varied depending on the size of the buffer (Figure S2). Forest cover values around each lake were generally positively correlated, though some spatial heterogeneity existed in NW Amazonia, E Amazonia and the savanna sites (Figure S3).
The beta regressions indicated that Poaceae percentage was the best predictor of forest cover, based on the pseudo-R² values, model coefficients and p-values of model predictors (Table 1, Table S3).
Poaceae percentages were able to predict forest cover at all buffer sizes (1, 2, 5, 10, 20 and 40 km), but model performance decreased as buffer size increased, and performance was highest for the 1 km buffer (Figure 4, Table 1). Cecropia was also able to predict forest cover, though with lower pseudo-R² values and only for the 1, 2, 5 and 10 km buffers (Table 1). Poaceae plus Cecropia values (Poaceae + Cecropia) were able to predict forest cover at all buffer sizes better than Cecropia alone but not as well as Poaceae alone (Table 1). Pseudo-R² values for all significant models (Poaceae, Cecropia, Poaceae + Cecropia) increased when lake size was included as a secondary predictor, though lake size itself was not a significant predictor in any models (Table 1). All other pollen groupings based on pollination syndrome or ecological characteristics (Table S2) performed worse than Poaceae, Cecropia, or Poaceae + Cecropia percentages (Table 1, Table S3).

| DISCUSSION
Our analysis of 65 mud-water interface samples collected across a broad range of Amazonian habitats provides novel insights into the biogeographic and local-scale variability of pollen percentages across this heterogeneous landscape. To date, most quantitative studies of modern pollen representation in Amazonia have been on relatively local scales comparing sites within 10s of km (Bush, 2002; Gosling et al., 2009; Whitney et al., 2019). The greater spatial scale of this study, across >2000 km, embraces long environmental gradients (Figure 1, Figures S2 and S3). Whereas the beta diversity of habitats in local analyses, for example the difference between igapo and terra firme forest in pollen traps (Bush, 2002; Gosling et al., 2009), was readily identifiable, the larger catchment area afforded by lake sediments provides a more blended signal. In this analysis, we use the commonest pollen types to attempt to determine gamma diversity and relate those changes to long environmental gradients.

| Broad biogeographic patterns
Floristic studies show that environmental gradients across Amazonia are reflected in altered forest composition resulting in biogeographic sub-regions (Gentry, 1988; ter Steege et al., 2006).
These evaluations are based on species-level turnover and changes in stem density within forest plots (Gentry, 1988; Ruokolainen et al., 2019; ter Steege et al., 2006; Tuomisto et al., 2016). While we are usually unable to identify pollen to the species or genus level, we find that when using the 30 commonest pollen types, the biogeographic sub-regions can still be differentiated (Figures 2 and 3).
FIGURE 2 Pollen percentages of taxa reaching at least 5% abundance from 65 lakes in Amazonia. These taxa were included in the Detrended Correspondence Analysis (see Figure 3). Sites are colour-coded based on assigned biogeographical region.
In many DCA analyses, the Axis 1 versus Axis 2 site scores form a diamond shape (Gauch, 1982), but in practice they often form a triangle. Somewhat unusually, in our plot of the DCA Axis 1 versus Axis 2 results, the species scores form a diagonal, with forest taxa such as Urticaceae/Moraceae, Cordia, Ficus, Acalypha and Iriartea plotting at the lower right, while indicators of openness, for example Poaceae, Cyperaceae and Byrsonima, plot at the upper left (Figure 3a). We explain this diagonal trend by observing that the large environmental gradients are somewhat, but not completely, autocorrelated. Due to this partial autocorrelation there was enough variance for the first two axes of the DCA to cause this diagonal pattern (Figure 3b-f). Consequently, we find that the site scores plot from the hottest, lowest, driest, most seasonal savannas at the top left, that is low scores on Axis 1 and high scores on Axis 2, to the coolest, highest, wettest and least seasonal forests at the bottom right, with high scores on Axis 1 and low scores on Axis 2 (Figure 3b-f).
Our data reflect known distributional patterns. Poaceae abundances track the gradient of increasing precipitation and seasonality across the savanna, cerrado, dry forest, rainforest continuum (Figures 2 and 4; Flenley, 1979; Gosling et al., 2009; Rodgers & Horn, 1996). Poaceae pollen percentages are highest in the savanna region, where forest cover is lowest (Figure 2, Figure S2).
Similarly, the western Amazonian forests have higher stem densities of Cordia, Ficus, Acalypha and Iriartea than eastern forests (ter Steege et al., 2013, 2016), and this is reflected in the pollen analysis (Figure 2). The pollen of taxa derived from the eastern flank of the Andes mountains, such as Podocarpus, Alnus and Hedyosmum, is also more abundant in the western Amazonian sites. Collectively, these taxa could be used to differentiate western Amazonian sites from those in the Ho6L, eastern Amazon, or savanna sites (Figure 3a).

| Local gradients of forest cover
Detecting disturbance in paleoecological records is important for assessing the impacts of climate change or human activity on forests (e.g. Davis, 1981; Fyfe et al., 2015). A key metric of disturbance is the amount of openness in a forest or its reciprocal, percent forest cover (Broström et al., 1998; Fyfe et al., 2015). It might be thought that simply adding all 'forest pollen' taxa together would provide the best proxy for forest cover. There is considerable uncertainty, however, as to the correct classification of pollen types that could be large trees or vines (e.g., Combretaceae), or herbs, shrubs or vines (e.g., Asteraceae). Consequently, the 'forest pollen' group is not significant in predicting forest cover. Furthermore, many forest taxa are silent in the pollen record because of their reproductive mechanisms (Bush, 1995), whereas others, such as Acalypha and Cecropia, are strongly over-represented (Bush & Rivera, 1998, 2001). Weighting for the area and the perimeter of the lake, in case pollen taxa were growing on the shoreline, does not improve correlations. Of the many combinations of taxa, Poaceae percentages are the best predictor of forest cover in our samples, with an inverse relationship, regardless of the scale of examination (Table 1, Figure 4). Our data highlight that floating grass mats found on many Amazonian lakes do not contribute significantly to increased Poaceae percentages even when deforestation is minimal (e.g. Colinvaux et al., 1985, 1988). As grasses occasionally grow around the edges of the lake, our results also show that this is not a determining input of Poaceae pollen into our sites.
We found that as Poaceae percentages increased within the dense closed-canopy forests of the non-savanna sites, forest cover in the surrounding landscape decreased significantly (Figure 4).
These results suggested that the relationships between forest cover and Poaceae are stronger in the closed-canopy forest than in the savanna. The uppermost 1-2 cm of sediment contains pollen deposited over multiple years, and the forest cover estimates covered approximately 10 years (Hansen et al., 2013). Most of our samples were collected in 2018 and 2019, and forest opening was calculated collectively since the onset of the forest cover database in 2010 until the time of sample collection. It is thus possible that our analysis was affected by the timing of deforestation at the sites. For example, a site that has been continuously deforested for the last 10 years may have a stronger signal of forest opening than a site which has only had some opening in one out of the 10 years. Whitney et al. (2019) found that in a forest-savanna ecotone, small openings in dense forests were detectable by small increases in Poaceae, but in seasonally flooded savanna landscapes that were more open the Poaceae representation was not negatively proportional to forest cover. Our findings are consistent with those results.
The savanna sites in our study exhibited a wide range of Poaceae percentages (5%-68%) alongside little variation in forest cover (0%-22%; Figure 4). Unlike the Whitney et al. (2019) study, where Poaceae representation saturated in landscapes with >5% forest openings, we found a more persistent relationship. One explanation may be that our sites were small, closed-basin lakes (especially in the savanna settings), whereas the sample sites exhibiting this lack of sensitivity used by Whitney et al. (2019) had more seasonal flooding or were much larger waterbodies.
TABLE 1 Summary of beta regression results using groups of pollen taxa to predict forest cover values at 1, 2, 5, 10, 20 and 40 km buffers around Amazonian lakes (N = 65). For specifics on pollen groupings, see Table S2. For the full set of results showing all pollen groups, see Table S3.
The source area from which the Poaceae were most indicative of forest openings is also of great interest, as there is the perennial question in palynology: do signals reflect small-scale changes near the sampling point or large-scale changes at distance (e.g. Bunting & Middleton, 2009; Hellman et al., 2009)? Whitney et al. (2019) found that samples collected within forests were sensitive to small-scale clearance events close to the sampling point, but for sample points that lay outside forest in Bolivian savannas, the source area of pollen could be derived from up to 40 km. Our data were not inconsistent with these findings, though we can refine this understanding.
We found that the Poaceae percentages from sites collected in a forested setting best reflected forest cover within a 1-2 km radius of the sampled sites (Table 1, Table S3). The best fits between pollen percentages that had significant model coefficients (Poaceae, Cecropia and Poaceae + Cecropia) and forest cover were all strongest within the 1-2 km buffer (Table 1, Table S3). Our findings likely differed from Whitney et al. (2019) because most sites were in predominantly closed-canopy rainforests that were not near forest-savanna boundaries, apart from the nine savanna sites (Figure 1). These results provide increased confidence that the Poaceae percentages can reflect small canopy openings at distances of 1-2 km from the lake edge, increasing the detectability of small-scale land use in paleoecological contexts.
Our findings correspond with the fact that in closed-canopy forests, up to 98% of taxa have pollen that is dispersed by animals (i.e. insects, birds or bats; Bush & Rivera, 2001). The potential for long distance dispersal of pollen released within 1-2 m of the ground, that is grasses and herbs, is greater in savanna systems, where wind speeds are higher, there are fewer tree trunks to intercept pollen and convective heating is more likely to reach the herb layer and generate more turbulence, thereby helping pollen to disperse longer distances (Hoffmann et al., 2012; Tauber, 1977). Our dataset did not contain enough sites to model savanna separately from forested settings, but we consider it likely that the wind-dispersed pollen types would have shown a stronger relationship with the larger-sized buffers in our savanna sites, and that entomophilous taxa would have had a stronger relationship with the smaller buffers.

| An applied case study
Even within regions, for any given amount of forest cover there is considerable variability in Poaceae pollen representation. Local factors, that is basin shape, the presence of input streams, or fringing marsh extent, provide enough variability that we do not recommend taking an observed value of Poaceae and translating that into forest cover via the regression line without taking note of the confidence intervals. For example, Lake Kumpakᵃ had two overflight photographs taken during coring operations in 1983 and 2014 and the imagery revealed ca. 85% forest cover within 1 km of the lake during both flights (Figure 5; Åkesson et al., 2021). Pollen analysis revealed the Poaceae pollen percentages were functionally the same in 1983 and 2014 (i.e. 1.6% and 1.0%, respectively; Figure 5; Åkesson et al., 2021). These data give us confidence in our beta regression modelling results, namely that Poaceae representation is correlated with forest cover. Our regression predicts that 1.6% and 1% of Poaceae pollen represents 82.5%-83% forest cover within 1 km of the lake (Figure 5). Based on our regression, the least forest cover of the last 2000 years occurred c. 1005 CE, corresponding to c. 66% forest cover within a kilometre of the lake (potential range 48%-80%; Figure 5).
Regardless of these uncertainties, our results show promising relationships between pollen percentages, particularly Poaceae, and local forest cover. We provide a dataset that can be used to move forward with quantitative reconstructions of forest cover in Amazonian paleoecological records using Poaceae pollen, which readily preserves in lake sediments over thousands of years and is easily identifiable even by novice pollen analysts. We hypothesize that more nuanced estimates of forest cover and forest type can be obtained if classifications of forest type (using different remote sensing products or aerial imagery) are matched to pollen assemblages across our sites (and other sites). Quantitative estimates of forest cover and forest type in a paleoecological context are important for addressing research questions regarding the extent, duration and frequency of pre- and post-Colonial deforestation in Amazonia.

| The path forward
We can identify limitations within our study and offer some thoughts on profitable directions for future research. One limitation of our dataset was that the distribution of forest cover values for our sites was skewed toward high values of forest cover (Figure S2). The total amount of forest cover surrounding our sites was locally variable

FIGURE 3 Detrended Correspondence Analysis showing site-taxa relationships and site-environment relationships for pollen assemblages analysed from 65 lakes across Amazonia. Sites are colour-coded by assigned biogeographic regions. (a) Site-taxa relationships where vectors indicate taxa that were significant (p < 0.05) in separating sites from each other; (b-f) Site (dis)similarity across environmental gradients of: mean annual temperature (b), annual precipitation (c), elevation (d), dry quarter precipitation (e) and forest cover (f) for the 2 km buffer.
FIGURE 4 Beta regression models using Poaceae percentages to predict forest cover values (the proportion of forest pixels) calculated at 1, 2, 5, 10, 20 and 40 km buffers around the 65 Amazonian lakes. Sites are colour-coded by assigned biogeographic regions. Pseudo-R² values, model coefficients, and p-values are shown for each regression.
FIGURE 5 The fossil Poaceae record from Lake Kumpakᵃ, Ecuador, over the last 2200 years (orange line) alongside the aerial imagery collected in 1983 and 2014 (data from Åkesson et al., 2021). Predictions from the beta regression generated in this study show the likely amount of forest cover during the last 2200 years at the lake (green line with confidence intervals). Arrows on the Poaceae pollen curve (orange) and predicted forest cover (green) indicate the peak of human disturbance at c. 1005 CE and the years when overflight images are available (1983 and 2014).
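The case-study step of turning a fossil Poaceae percentage into a forest cover estimate amounts to applying the inverse logit to the linear predictor of the beta regression. The sketch below illustrates that transformation only. The coefficients are placeholders chosen for illustration and are not the fitted values of this study (those are shown in Figure 4), the Poaceae series is hypothetical, and the confidence intervals that the case study relies on are not computed.

```python
import math


def predict_forest_cover(poaceae_pct, intercept, slope):
    """Mean forest-cover proportion implied by a logit-link regression."""
    eta = intercept + slope * poaceae_pct  # linear predictor on the logit scale
    return 1.0 / (1.0 + math.exp(-eta))    # inverse logit maps back to (0, 1)


# Placeholder coefficients for illustration only, NOT the study's fitted values
INTERCEPT, SLOPE = 1.6, -0.06

poaceae_record = [1.0, 1.6, 5.0, 15.0]  # hypothetical Poaceae percentages
predicted = [predict_forest_cover(p, INTERCEPT, SLOPE) for p in poaceae_record]
```

With a negative slope, higher Poaceae percentages map to lower predicted forest cover, and the logit link keeps every prediction between 0 and 1, which is why a point estimate should still be read together with its interval.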

Pollen types or groups (predictor); Buffer size (km); Model using only pollen: p-value; Model using pollen + lake size: p-value (pollen), p-value (lake size)
Note: Bold p-values indicate significance of predictors used in beta regressions.