Chemical variations in Quercus pollen as a tool for taxonomic identification: implications for long-term ecological and biogeographic research

Aim Fossil pollen is an important tool for understanding biogeographic patterns in the past, but the taxonomic resolution of the fossil-pollen record may be limited to genus or even family level. Chemical analysis of pollen grains has the potential to increase the taxonomic resolution of pollen, but present-day chemical variability is poorly understood. This study aims to investigate whether a phylogenetic signal is present in the chemical variations of Quercus L. pollen, and to determine the prospects of chemical techniques for identification in biogeographic research.
Location Portugal
Taxon Six species of Quercus L. (Q. faginea, Q. robur, Q. robur ssp. estremadurensis, Q. coccifera, Q. rotundifolia and Q. suber) belonging to three sections: Cerris, Ilex, and Quercus (Denk, Grimm, Manos, Deng, & Hipp, 2017)
Methods We collected pollen samples from 297 individual Quercus trees across a 4° (∼450 km) latitudinal gradient and determined chemical differences using Fourier-transform infrared spectroscopy (FTIR). We used canonical powered partial least-squares regression (CPPLS) and discriminant analysis to describe within- and between-species chemical variability. Chemical functional groups associated with variation in the FTIR spectra between different Quercus species were determined.
Results We find clear differences in the FTIR spectra from Quercus pollen at the section level (Cerris: ∼98%; Ilex: ∼100%; Quercus: ∼97%). Successful discrimination is based on lipids and sporopollenins. However, discrimination of species within individual Quercus sections is more difficult: overall, species recall is ∼76% and species misidentifications within sections lie between 22% and 31% of the test set.
Main Conclusions Our results demonstrate that sub-genus level differentiation of Quercus pollen is possible using FTIR methods, with successful classification at the section level.
This indicates that operator-independent FTIR approaches can perform equally to traditional morphological techniques. However, although sporopollenins are identified as important functional groups for discrimination between some Quercus taxa, the importance of lipids to resolve phylogenetic signatures presents challenges for future biogeographic research because lipids may be less likely to be preserved in sediment sequences. We suggest that future work on further discrimination of isolated sporopollenin components is required before FTIR-based chemical discrimination can be used in long-term ecology and biogeography studies.


Introduction
Palaeoecology uses biological proxies (e.g. the sub-fossilised remains of plants and animals) preserved in sediments to reconstruct past biotic changes over decadal to multi-millennial timescales. These reconstructions can be used to address a range of biogeographic questions related to species-range dynamics in response to the Quaternary glaciations (Webb, 1986; Davis & Shaw, 2001), long-term successional processes and the establishment and dynamics of plant communities (Delcourt, Delcourt, & Webb, 1982; Ritchie, 1995; Mitchell & Cole, 1998), and temporal and spatial changes in plant-diversity gradients and their links with climate and anthropogenic change (Odgaard, 1999; Haskell, 2001). Fossil pollen and spores are a critical tool for providing insight into these topics. The exine, or outer cell wall, of pollen and spores is made up of a set of complex biomolecules known as sporopollenins (Scott, 1994). Sporopollenins are highly resistant to corrosion and provide a protective outer layer for the genetic material, energy and carbon reserves (carbohydrates and lipids), and proteins (nutrients and pollen-specific transcriptomes) stored within the pollen intine. In palaeoecological research, the sporopollenin-based exine has traditionally been used as the main tool to differentiate between pollen types (Erdtman & Praglowski, 1959; Faegri & Iversen, 1989). Since the structural complexity of the exine varies considerably between genera and families, analysis of subsequent sediment samples using traditional methods under the light microscope has enabled insights into biogeographic patterns and processes through reconstructions of past landscape and vegetation change (Prentice, Guiot, Huntley, Jolly, & Cheddadi, 1996; Prentice & Webb, 1998; Elenga et al., 2000).
Although traditional morphological approaches continue to yield insights into the patterns and processes of vegetation changes in the past, these methods of identification are time consuming, and the information they provide is often limited because pollen types can be resolved only to the genus, or even family, level. One potential solution lies in the chemical analysis of pollen and spores. Evidence is mounting that analysis of pollen grains using Fourier Transform Infrared Spectroscopy (FTIR) may provide a new tool for differentiating between pollen types extracted from sediment sequences (Julier et al., 2016; Woutersen et al., 2018; Jardine, Gosling, Lomax, Julier, & Fraser, 2019), since pollen-grain chemistry may itself show biogeographic patterns related to phylogeny and environmental conditions (Bağcıoğlu, Kohler, Seifert, Kneipp, & Zimmermann, 2017; Zimmermann et al., 2017). FTIR is a non-destructive method that is used to infer the chemical composition of a sample, because different molecular functional groups have different wavelength-specific absorbances of infrared radiation (Gottardini, Rossi, Cristofolini, & Benedetti, 2007; Ivleva, Niessner, & Panne, 2005; Pappas, Tarantilis, Harizanis, & Polissiou, 2003; Parodi, Dickerson, & Cloud, 2013; Schulte, Lingott, Panne, & Kneipp, 2008; Zimmermann, 2010; Zimmermann & Kohler, 2014). For example, Zimmermann (2010) classified pollen from 11 species at comparable accuracy to conventional light-microscopy methods, whilst later work by Zimmermann & Kohler (2014) and Depciuch, Kasprzyk, Drzymała, & Parlinska-Wojtan (2018) expanded on the differences in pollen composition by analysing over 300 different species and identified lipids (specifically triglycerides) as major sources of variation in the pollen spectra of congeneric species across several diverse genera (Quercus, Iris, Pinus, Betula, etc.).
Differences between the sporopollenin spectral regions of the pollen of 15 morphologically similar Pinales species have also recently been observed, whilst two independent studies have demonstrated that the FTIR spectra of pollen of both Poaceae and Nitrariaceae can be used to generate sub-family level differentiation (Julier et al., 2016; Woutersen et al., 2018; Zimmermann et al., 2017; Jardine et al., 2019).
Taxonomic differentiation of pollen based on the chemical variations inferred by FTIR undoubtedly shows promise, but a number of challenges remain to be solved to enable more extensive application across palaeoecology. First, lipids and proteins have been found to be major discriminants of variation in FTIR spectra (Zimmermann & Kohler, 2014), but, unlike sporopollenin, these may not be preserved in sediment sequences. For example, a study of fungal basidiospores from archived samples collected within the last 50 years demonstrated changes in lipid chemistry associated with increasing storage time (Zimmermann, Tkalčec, Mešić, & Kohler, 2015). Therefore, although variations in sporopollenin components between pollen types have been observed, the relative importance of the different chemical structures (e.g. lipids, proteins, sporopollenins) that are responsible for discrimination between many modern pollen types still needs to be established. Second, understanding of the potential impact of environmental variation on the pollen-chemistry signature remains limited. For example, some of the variation in lipid, protein, and carbohydrate chemistry may be explained by environmental influences such as temperature during pollen maturation (Lahlali et al., 2014; Zimmermann & Kohler, 2014; Jiang et al., 2015; Zimmermann et al., 2017), whilst sporopollenin composition may be influenced by external environmental influences such as plant exposure to UV-B radiation (Rozema et al., 2001; Blokker, Boelen, Broekman, & Rozema, 2006; Willis et al., 2011; Fraser et al., 2011; Jardine, Abernethy, Lomax, Gosling, & Fraser, 2017; Bell et al., 2018). As a result, the extent to which variation across larger environmental gradients might override taxonomic differences in chemical composition remains unclear.
Finally, the majority of studies investigating pollen chemistry have so far used pollen from herbaria, botanical gardens, or university campuses, with a limited number of replicates (usually fewer than three) per location and a large number of different species and families (often more than ten). Although sampling from botanic gardens offers ease of access, reliable identification, and a broad species range, the number of replicates in these analyses remains a limiting factor when investigating environmental and taxonomic variability in pollen chemistry. As a result, extensive sampling across large spatial gradients is required.
This study aims to address the challenges related to understanding the taxonomic variations of pollen chemistry by investigating the relative importance of within- and between-species chemical differences in Quercus pollen. Quercus makes an excellent study system for this purpose because it has a wide geographical distribution (Colombo, Lorenzoni, & Grigoletto, 1983; Rushton, 1993; Manos, Doyle, & Nixon, 1999; Petit, Bodénès, Ducousso, Roussel, & Kremer, 2003). Investigating the chemical differences between different Quercus pollen types also has implications for palaeoecological research, particularly for studies based in the Mediterranean region. Quercus pollen is often abundant in Mediterranean pollen records (e.g. Carrión, Parra, Navarro, & Munuera, 2000; Brewer, Cheddadi, de Beaulieu, & Reille, 2002; Petit et al., 2002), but despite the high species diversity of the genus, Quercus pollen can generally be separated into only three morphologically distinctive types in pollen sequences, based on differences in surface ornamentation of the grains. Given the importance of incorporating long-term ecological information to improve biodiversity forecasts of environmental change (Dawson, Jackson, House, Prentice, & Mace, 2011), the ability to differentiate Quercus to species level in the pollen record would greatly improve our understanding of the ecological dynamics underlying Mediterranean ecosystems in the past and future (e.g. Guiot & Cramer, 2016).
Here, we use a total of 297 trees sampled across a climatic and topographic gradient in Portugal to investigate within- and between-species variability in Quercus pollen chemistry at a regional scale. Our dataset is unique because, to our knowledge, it represents the largest collection of closely related species sampled from populations outside botanic gardens, with the specific goal of testing the possibilities of taxonomic differentiation using FTIR. Using this dataset, we aim to investigate whether within- and between-species variability can be used to differentiate sections and species of Quercus. Addressing these questions represents the first step if we are to successfully use pollen chemistry as a tool to improve our understanding of past biogeographic patterns and processes in plants.

Sample collection
We collected pollen samples from 297 individual trees belonging to six Quercus species across a 4° (~450 km) latitudinal gradient in Portugal, from Porto in the north to Lisbon in the south and the Spanish border east of Vila Nova de Foz Côa. The Quercus species in this study belong to the sections Cerris, Ilex, and Quercus according to Denk et al. (2017) (Table 1) and have different geographic distributions (Figure 1). Q. robur, for example, has a temperate distribution and is more abundant in north-west Portugal. It occurs in regions with sufficient summer rain, and is absent from areas with summer drought (Amigo, 2017; Ülker, Tavsanoglu, & Perktas, 2018). Q. faginea s.l. has a broad distribution in the Mediterranean Basin and the Iberian Peninsula (Tschan & Denk, 2012) and is more abundant on limestone with higher summer precipitation, tracking the sub-Mediterranean bioclimatic belt (Sanchez, Benito-Garzon, & Ollero, 2009). Q. suber is distributed across the whole study region on siliceous bedrock but is absent from areas with winter cold (Amigo, 2017; Matías, Abdelaziz, Godoy, & Gómez-Aparicio, 2019). Q. coccifera and Q. rotundifolia prefer xerophytic conditions and often co-occur. Both are indifferent to bedrock conditions but prefer soils without waterlogging, although Q. coccifera is a thermophilous species and is less tolerant of winter cold. A detailed summary of the number of trees sampled at each location is given in the Supplementary Material (Table S1). Trees were sampled along gradients of temperature and precipitation to cover a wide range of environmental conditions. Climatic conditions across the study region vary, with oceanic conditions along the coast, and a drier Mediterranean climate in the south and towards the Spanish border. East of Porto is the area with the highest precipitation in Portugal before the upper regions of the Douro river valley become semi-Mediterranean (Figure 1).
The underlying soil changes from predominantly silica-rich bedrock in the north to clay in the south and a pocket of calcareous bedrock around Sintra (Rivas-Martinez, Penas, del Rio, Gonzales, & Rivas-Saenz, 2017). All samples were collected in spring 2018 by taking whole-tree composite samples of ca. 30 catkins per individual tree. Several branches were sampled for catkins up to 5 m in height. The catkins were air-dried at room temperature (~23 °C) for at least 24 h and the pollen was separated from the anthers by light shaking. Pollen was also sieved through 60 µm sieves to remove excess plant material before analysis.

Pollen-chemistry measurements
Reflectance-infrared spectra were recorded using a Vertex 70 FTIR spectrometer (Bruker Optik GmbH, Germany) with a single-reflection attenuated total reflectance (SR-ATR) accessory. The ATR IR spectra were recorded with a total of 32 scans and a spectral resolution of 4 cm⁻¹ over the range 4000-600 cm⁻¹, using the horizontal SR-ATR diamond prism with a 45° angle of incidence on a High Temperature Golden Gate ATR Mk II (Specac, United Kingdom). Approximately 1 mg of dried pollen was deposited onto the ATR crystal for each measurement (3 replicate measurements). Between each measurement a background (reference) spectrum was recorded using the sample-free setup. The OPUS software (Bruker Optik GmbH, Germany) was used for data acquisition and instrument control.
We applied minimal processing to the spectra because the three replicate measurements were reproducible. All further processing was performed on the mean spectra of the replicate analyses. The processing included a single pass of Extended Multiplicative Signal Correction (EMSC) using the EMSC package (Liland, 2017), as well as second-derivative calculation and smoothing using the Savitzky-Golay algorithm (Edwards & Willson, 1974; Savitzky & Golay, 1964). Multivariate-regression methods (e.g. partial least squares; PLS) perform better with preprocessed spectra, as shown by the classification of FTIR spectral pollen data in other studies (Zimmermann & Kohler, 2013; Woutersen et al., 2018). For further analyses, the spectra were constrained to between 700 and 1800 cm⁻¹. We follow peaks of interest that have been attributed to chemical functional groups according to Pappas et al. (2003), Gottardini et al. (2007), Schulte et al. (2008), Zimmermann (2010), and Zimmermann & Kohler (2014) (Table 2).

Table 2 Wavenumbers of peaks attributed to specific functional groups in spectra of fresh pollen and the compounds most representative of them (Pappas et al., 2003; Gottardini et al., 2007; Schulte et al., 2008; Zimmermann, 2010; Zimmermann & Kohler, 2014; compiled in Zimmermann, Bağcıoğlu, et al., 2015). A (*) marks wavenumbers that are shared by more than one compound.

[Table 2 body not recoverable. Column headings: Compounds; Wavenumber. Surviving entry: Aromatic rings in phenylpropanoid acids.]
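The preprocessing chain described above (replicate averaging, EMSC, Savitzky-Golay second derivative, restriction to 700-1800 cm⁻¹) can be illustrated with a minimal NumPy sketch. It is not the authors' R workflow: the function names, the basic four-term EMSC model, the use of the mean spectrum as reference, the window length of 9 points, the polynomial order of 2, and the order of the steps are all assumptions made for illustration.

```python
import numpy as np


def emsc_basic(spectra, wavenumbers, reference=None):
    """Basic EMSC: fit x = a + b*ref + c*nu + d*nu^2, return (x - a - c*nu - d*nu^2) / b."""
    spectra = np.asarray(spectra, dtype=float)
    nu = np.asarray(wavenumbers, dtype=float)
    nu = (nu - nu.mean()) / (nu.max() - nu.min())  # rescale for numerical stability
    # Assumption: the mean spectrum is used as reference when none is supplied.
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    design = np.column_stack([np.ones_like(nu), ref, nu, nu ** 2])
    coefs, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)  # shape (4, n_spectra)
    a, b, c, d = coefs
    baseline = a[:, None] + c[:, None] * nu + d[:, None] * nu ** 2
    return (spectra - baseline) / b[:, None]


def savgol_second_derivative(spectra, window=9, polyorder=2, spacing=1.0):
    """Savitzky-Golay smoothed second derivative; window must be odd, edges are dropped."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    vander = np.vander(offsets, polyorder + 1, increasing=True)  # columns 1, k, k^2, ...
    # The second derivative of the fitted polynomial at the window centre is 2 * c2.
    kernel = 2.0 * np.linalg.pinv(vander)[2] / spacing ** 2
    spectra = np.asarray(spectra, dtype=float)
    return np.array([np.convolve(row, kernel[::-1], mode="valid") for row in spectra])


def preprocess(replicates, wavenumbers, window=9, polyorder=2, lo=700.0, hi=1800.0):
    """replicates has shape (n_samples, n_replicates, n_wavenumbers)."""
    wn = np.asarray(wavenumbers, dtype=float)
    mean_spectra = np.asarray(replicates, dtype=float).mean(axis=1)  # average replicates
    corrected = emsc_basic(mean_spectra, wn)
    spacing = abs(wn[1] - wn[0])  # assumes evenly spaced wavenumbers
    deriv = savgol_second_derivative(corrected, window, polyorder, spacing)
    half = window // 2
    wn_valid = wn[half:len(wn) - half]  # wavenumbers that survive the edge trimming
    keep = (wn_valid >= lo) & (wn_valid <= hi)
    return deriv[:, keep], wn_valid[keep]
```

A call such as `preprocess(replicates, wavenumbers)` returns the second-derivative spectra together with the retained wavenumbers, which could then be passed to a classifier.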

Statistical analyses
To explore within- and between-species chemical variability, we fitted a PLS model combined with canonical correlation analysis (canonical powered PLS; CPPLS) to the processed spectra (second derivative) to predict species identity. This analysis was implemented in the "pls" package (Mevik, Wehrens, Liland, & Hiemstra, 2019) in R version 3.6.0 (R Core Team, 2019). The PLS family of models has been shown to be powerful in multivariate analyses of FTIR-spectral data (Wold, Sjöström, & Eriksson, 2001; Liland, Mevik, Rukke, Almøy, & Isaksson, 2009; Telaar, Nürnberg, & Repsilber, 2010). Compared with standard PLS regression, the CPPLS method improves the extraction of predictive information by estimating optimal latent variables (Mehmood & Ahmed, 2016). Unlike standard PLS, CPPLS weights the contribution of the explanatory variables (spectral wavenumbers), which weakens the contribution of non-relevant wavenumbers in the spectra and optimises the covariance between the response (species) and the explanatory variables (spectra). Indahl, Liland, & Naes (2009) show improved accuracy and increased explained variance of CPPLS compared to conventional PLSR using spectral data.
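To make the idea of a covariance-maximising latent variable concrete, the following sketch computes the first component of a standard PLS discriminant model, in which the weight vector is the leading left singular vector of the cross-product between the centred spectra and the dummy-coded species labels. This is plain PLS, not CPPLS: the canonical-correlation step and the power weighting of CPPLS are deliberately omitted, and the function name is hypothetical.

```python
import numpy as np


def first_pls_component(X, labels):
    """Return the first PLS weight vector and sample scores for a class response."""
    X = np.asarray(X, dtype=float)
    classes = sorted(set(labels))
    # Dummy-code the class labels (one indicator column per class).
    Y = np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # The leading left singular vector of Xc'Yc maximises the covariance
    # between the scores Xc @ w and a linear combination of the responses.
    U, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    w = U[:, 0]
    return w, Xc @ w
```

Variables (here, wavenumbers) with large absolute weights are those that covary most strongly with class membership, which is the sense in which loadings are interpreted later in the text.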
To assess the classification performance of the CPPLS, the dataset was randomly split into training and test sets using a 60%/40% split. This split was repeated 10 times to create 10 versions of the dataset (folds) with different training/test splits. A CPPLS model was fitted for each fold using leave-one-out (LOO) cross-validation, and the extracted component scores were used to predict species identity using linear discriminant analysis. The performance of the classifier in predicting the test set was averaged over the folds and summarised in a confusion matrix (Table 3).
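The evaluation scheme (repeated random 60%/40% splits with a confusion matrix averaged over the repeats) can be sketched in plain Python as follows. The classifier here is a nearest-centroid stand-in chosen only to keep the example self-contained; it is not the CPPLS and discriminant-analysis model used in the study, and the function names and the row-normalisation of the confusion matrix are assumptions.

```python
import random


def fit_centroids(X, y):
    """Stand-in classifier: one mean vector per class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(centroids, row):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], row)))


def repeated_split_confusion(X, y, n_repeats=10, train_fraction=0.6, seed=0):
    """Average row-normalised confusion matrix over repeated random splits."""
    rng = random.Random(seed)
    labels = sorted(set(y))
    totals = {t: {p: 0.0 for p in labels} for t in labels}
    seen = {t: 0 for t in labels}  # repeats in which class t occurs in the test set
    for _ in range(n_repeats):
        order = list(range(len(y)))
        rng.shuffle(order)
        n_train = round(train_fraction * len(order))
        train, test = order[:n_train], order[n_train:]
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        counts = {t: {p: 0 for p in labels} for t in labels}
        for i in test:
            counts[y[i]][predict(centroids, X[i])] += 1
        for t in labels:
            n_true = sum(counts[t].values())
            if n_true:
                seen[t] += 1
                for p in labels:
                    totals[t][p] += counts[t][p] / n_true
    return {t: {p: (totals[t][p] / seen[t] if seen[t] else float("nan")) for p in labels}
            for t in labels}
```

Each row of the returned matrix gives, for one true class, the average fraction of test samples assigned to each predicted class, so the diagonal entries correspond to recall.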

Chemical variations in Quercus
Assessment of the mean spectra of the six Quercus species reveals clear differences in absorbance between taxonomic sections at specific wavenumbers associated with specific chemical functional groups (Figure 2). For example, the lipid peak absorbance at ~1750 cm⁻¹ is weaker in Ilex pollen compared to the other sections, whilst the sporopollenin and carbohydrate absorbance bands (at 1516, 1168, and 833 cm⁻¹, and at 985 cm⁻¹, respectively) are noticeably lower in absorbance in the taxa belonging to the Ilex section. Note, however, that although clear spectral differences exist at the section level, it is more difficult to separate species within individual sections (Figure 2).

Figure 2 Mean absorbance spectra of Quercus species and notable peak locations. Lipids (L), protein (P), sporopollenin (S), and carbohydrates (C) using wavenumbers compiled in Table 2. Spectra are offset.
The observations made from the mean spectra are confirmed by the analysis using CPPLS (Figure 3a). Here, the three sections of Quercus can be clearly separated using the variance explained by two CPPLS components (Figure 3a). For example, the Ilex section scores are negative along the first component (19.6% of the variation), whilst individuals from the Quercus and Cerris sections have positive scores along this axis. The Quercus and Cerris sections can mainly be separated along the second component (Cerris with negative scores and Quercus with positive scores). However, species within the same Quercus section are harder to differentiate, and there is large overlap between species within the sections Ilex and Quercus in particular. This overlap is reduced on later components, where the different species separate within their respective sections. Q. coccifera and Q. rotundifolia can be separated along the third component, while Q. robur and Q. faginea show some separation along the fourth component (Figure 3b). In total, the first two components explain ~34% of the variation in the dataset and separate the samples into Quercus sections, while the next two components explain a further ~8.8% (Figure S1). Since specific absorbance peaks in the FTIR spectra can be related to chemical functional groups (summarised in Table 2) [...]

[Figure caption, beginning missing] ... Table 2. High absolute loading indicates a high importance of a given wavenumber for the corresponding component. Loadings are chosen in such a way as to describe as much as possible of the covariance between the variables (spectra) and the response (species). Proportion of variance explained by each component in parentheses.

Discriminant analysis
Using four components (explaining ~40% of the variance), the confusion matrix of the CPPLS classification model shows clear differentiation between the Quercus sections, with only ~2±2% of Q. suber samples misidentified as belonging to the Quercus section (Table 3). Quercus robur ssp. estremadurensis has by far the worst accuracy in the model and is most often identified as its parent species Quercus robur, possibly due to the limited number of samples in the dataset (Table 1). In general, species misidentifications are contained within the different Quercus sections and lie between 22% and 31% of the test-set samples (within-section misidentifications: 23% of Q. robur as Q. faginea; 31% of Q. faginea as Q. robur; 27% of Q. coccifera as Q. rotundifolia; 22% of Q. rotundifolia as Q. coccifera). Overall species accuracy within sections ranges from 64% to 78% in the Ilex and Quercus sections. Increasing the number of components available to the model to 10 (~50% explained variance) increases species accuracy by 5-10 percentage points in the Ilex and Quercus sections (Table S2), with higher uncertainty between Ilex species than between Quercus section species. As demonstrated with our ordination plots (Figure 3), differentiation of the three sections of Quercus is possible using 40% of the variance in the spectral data, but these results indicate difficulties in differentiating between species of the same section.
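The relationship between species-level and section-level performance can be illustrated by collapsing a species confusion matrix into sections and computing recall at both levels. The sketch below is illustrative only: the species-to-section mapping follows the assignments described in the text, but the function names are hypothetical and any counts passed in are the user's own; none of the values in Table 3 are reproduced here.

```python
# Section membership as described in the text (after Denk et al., 2017).
SECTION_OF = {
    "Q. suber": "Cerris",
    "Q. coccifera": "Ilex",
    "Q. rotundifolia": "Ilex",
    "Q. robur": "Quercus",
    "Q. robur ssp. estremadurensis": "Quercus",
    "Q. faginea": "Quercus",
}


def collapse_to_sections(species_counts):
    """Sum a species confusion matrix (counts[true][predicted]) into sections."""
    sections = sorted(set(SECTION_OF.values()))
    out = {t: {p: 0 for p in sections} for t in sections}
    for true_species, row in species_counts.items():
        for predicted_species, n in row.items():
            out[SECTION_OF[true_species]][SECTION_OF[predicted_species]] += n
    return out


def recall(counts):
    """Per-class recall: correct predictions divided by all samples of that class."""
    result = {}
    for true_label, row in counts.items():
        total = sum(row.values())
        if total:  # skip classes with no samples
            result[true_label] = row.get(true_label, 0) / total
    return result
```

Because misidentifications that stay within a section count as correct after collapsing, section-level recall can be high even when species-level recall is moderate, which is the pattern reported above.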

Separation of Quercus according to chemical variation
Recent research has shown that spectroscopic methods such as FTIR are effective at differentiating pollen from distantly related families and/or genera using their chemical composition (Gottardini et al., 2007). Our results build on these previous studies to reveal the potential for chemical variations in pollen to distinguish sub-generic variation between species belonging to three different Quercus sections in pollen samples from 297 individuals from Portugal. We identify a clear separation of Quercus into the three taxonomic sections of Ilex, Cerris, and Quercus (Figure 3 and Table 3), using two components of a PLS model that explain only 30% of the variance in the spectral data. One component (component 1) can be used to differentiate Ilex from Quercus, while component 2 can be used to separate Cerris. Combined, these two components achieve performance equivalent to traditional palynological methods using light microscopy, where Quercus pollen can be identified to section level.
Despite finding that classification at the section level is possible using FTIR approaches, there is considerable overlap in variation between species of the same section. Furthermore, classification performance does not improve when using a more complex model in which the number of components used increases from four to ten. Using this more complex model, which explains ~50% of the variance (compared to ~40% in the four-component model), classification accuracies remain roughly similar within Quercus sections (Table S2). For example, Q. coccifera and Q. rotundifolia have a recall of ~75% with both 4 and 10 components. Similarly, approximately one third of Q. robur and Q. faginea samples (both belonging to the Quercus section) are misclassified as the other species. Thus, while our results indicate that sub-generic classification of Quercus pollen is possible at the section level using FTIR, we still find it difficult to distinguish between more closely related (i.e. within-section) pollen types.
These findings are approximately in line with other studies that have performed species classification using FTIR. For example, Julier et al. (2016) and Jardine et al. (2019) report classification success rates of ~80% and ~85%, respectively, using FTIR analysis of cryptic morphospecies within the family Poaceae. Their studies are based on a combination of specimens of mainly non-congeneric grass species (except two species of Oryza, Julier et al., 2016, and four species of Triticum, Jardine et al., 2019). In both these studies, classification success is lower for the samples belonging to congeneric species and higher for the more distantly related pollen types (i.e. those species belonging to different genera). In another study, Woutersen et al. (2018) report ~95% recall on largely congeneric species in the Nitrariaceae family using single-grain FTIR, but also note that lack of environmental variability (pollen from one individual per species) could have led to an overestimation of classification success. In contrast, Zimmermann et al. (2017) achieve ~100% accuracy on species identification and 75% accuracy on identification of origin using hierarchical PLSR on pollen from three species of Poaceae (Festuca ovina, Anthoxanthum odoratum, Poa alpina) of different genera and origins (Sweden, Norway, Finland) grown under controlled conditions (45 individuals per species). Such a high classification success on taxa grown in controlled conditions demonstrates the strong phylogenetic signature that can be observed using FTIR. Our results also demonstrate these strong phylogenetic differences in FTIR spectra (i.e. the ability to differentiate between Quercus sections), but our study additionally demonstrates the difficulty of distinguishing variability at the between-species level, even when relatively large sets of samples are used.

Key chemical drivers of variation within Quercus spp. pollen
Given that separation of different Quercus species is possible at the section level, a key question that follows is which of the chemical components of the pollen grain are mainly responsible for the separation between the three sections of Quercus under FTIR. In our study, we find that lipid peaks are the most important factors in separating Ilex samples from the other taxa in the dataset. The waveband at 1750 cm⁻¹ is particularly important in this regard. Previous research has shown that this waveband is a strong indicator of triglyceride lipids. Our results confirm previous findings by Zimmermann & Kohler (2014), who show extreme variations in the relative content of triglycerides and find this waveband to be an important separator between Iris, Quercus, and Pinus species. Indeed, our results extend the inferences made in that previous study by demonstrating sub-generic level variability in the relative lipid content. Specifically, we identify a distinctly lower amount of lipids in pollen sampled from individuals within the Ilex section (Figure 2).
In addition to the importance of triglyceride lipids as a tool for chemical separation, we also find that wavebands representing building blocks of sporopollenin (Table 2) are important for differentiating Q. suber from other taxa. For example, the peaks at 833, 852, 1516 and 1605 cm⁻¹ are associated with building blocks of sporopollenin and have the highest loadings on component 2, which is used in this study to isolate Q. suber. The peaks at 833 and 852 cm⁻¹ are related to different types of phenylpropanoid building blocks, and our results suggest relative differences in their abundance within the sporopollenin of Q. suber compared to the other species. The chemical structure of sporopollenin comprises phenylpropanoid building blocks and hydroxylated fatty acids (Schulze Osthoff & Wiermann, 1987), and it is thought that the composition of these phenylpropanoid units is species- and environment-dependent (Vogt, 2010). Thus, it is possible that a species-specific diagnostic tool based on sporopollenin exists for at least one taxon between the different Quercus sections using FTIR. Given that we analysed bulk samples, it is unlikely that these differences are a result of scattering effects related to the specific sporopollenin structures. Nevertheless, more detailed work on the composition of sporopollenin of different genera and how this affects pollen-grain structural elements (e.g. Li, Phyo, Jacobowitz, Hong, & Weng, 2019) is needed for this finding to be confirmed.
Finally, protein and carbohydrate peaks (carbohydrates: 1107, 1028, and 1076 cm⁻¹; proteins: 1535 and 1641 cm⁻¹) have the highest loadings on components 3 and 4 and are partly responsible for the partial distinction of species within the same section. These peaks represent amylose and cellulose as carbohydrates and amide functional groups within proteins (Table 2). For example, variation along component 3 contributes to the separation of Q. robur from Q. faginea. However, overall, protein and carbohydrate peaks have the least influence on classification success, and most of the phylogenetically important information we used to distinguish between the species is stored in the lipid and sporopollenin components of the pollen chemistry.
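The interpretation step described in this section, in which the wavenumbers with the highest absolute loadings on a component are matched to functional groups, can be sketched as follows. The band positions are those quoted in the text; the matching tolerance of 8 cm⁻¹ and the function names are assumptions, and loadings would come from the fitted model rather than from this example.

```python
# Peak positions (cm^-1) quoted in the text for each functional group.
BANDS = {
    "lipid": [1750],
    "sporopollenin": [833, 852, 1168, 1516, 1605],
    "carbohydrate": [985, 1028, 1076, 1107],
    "protein": [1535, 1641],
}


def assign_band(wavenumber, tolerance=8.0):
    """Return the functional group whose quoted peak is nearest, or None if none is close."""
    best = None
    for group, peaks in BANDS.items():
        for peak in peaks:
            distance = abs(wavenumber - peak)
            if distance <= tolerance and (best is None or distance < best[1]):
                best = (group, distance)
    return best[0] if best else None


def top_loadings(wavenumbers, loadings, n=5):
    """Rank wavenumbers by absolute loading and label each with a functional group."""
    ranked = sorted(zip(wavenumbers, loadings), key=lambda pair: abs(pair[1]), reverse=True)
    return [(wn, value, assign_band(wn)) for wn, value in ranked[:n]]
```

Applied to the loadings of one component, `top_loadings` lists the most influential wavenumbers together with the functional group (if any) whose quoted peak lies within the chosen tolerance.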

Prospects for developments using fossil pollen
Long-term ecological (i.e. fossil pollen) data are an important tool in macroecological and biogeographic research. When fossil pollen data are integrated from multiple sites at continental scales, they can be used to (i) track relative-niche shifts in association with the emergence of no-analogue climates since the Last Glacial Maximum (Veloz et al., 2012); (ii) forecast future range shifts as a result of climate change through integration with statistical species-distribution modelling approaches (e.g. Nogués-Bravo et al., 2016, 2018); and (iii) test hypotheses surrounding the factors relating to the emergence of disequilibrium of ecological communities (Webb, 1986; Gaüzère, Iversen, Barnagaud, Svenning, & Blonder, 2018). A number of recent palaeoecological studies have also attempted to link fossil pollen data to species-based functional-trait databases via a functional-biogeographic approach (Violle, Reich, Pacala, Enquist, & Kattge, 2014). However, a common limitation of the preceding studies has been the taxonomic mismatch between the pollen data and the macroecological tools and databases that have been used, with the fossil pollen data often resolved only to the genus or even family level. This taxonomic mismatch is likely to create large uncertainties in any model inferences. As a result, any technique that can increase the taxonomic resolution of fossil pollen records could have major biogeographic implications.
The results presented in this paper build on previous research which has proposed that FTIR may be a tool to improve classical, morphology-based classification of pollen in long-term ecological research based on palynological records (Julier et al., 2016; Woutersen et al., 2018; Zimmermann, 2018; Jardine et al., 2019). Given the high biodiversity of Quercus and the potential sensitivity of Mediterranean ecosystems to future climate changes (Guiot & Cramer, 2016), we were motivated to investigate the potential for using FTIR to distinguish variations in Quercus spp. What, then, do our results tell us about the applicability of these FTIR approaches as a tool for taxonomic identification of Quercus in pollen records?
Our study reveals clear differences in FTIR spectra across the dataset, enabling sub-generic level differentiation of Quercus species belonging to three different sections. In contrast, withinsection variability is more difficult to differentiate using these approaches. As a result, our findings demonstrate that FTIR methods may work at a comparable level to traditional pollen-morphological approaches, within which skilled pollen analysts are often able to differentiate between the three different Quercus sections. However, despite the potential, we note a number of caveats that should be considered before the application can be applied to the fossil record.
First, these analyses are based on fresh pollen sampled from modern taxa. Using these samples, successful differentiation between taxa from different sections is primarily driven by lipids, followed by sporopollenins (Figure 4). Although sporopollenin is stable enough to allow the identification of morphological features for pollen classification under favourable (i.e. anoxic) preservation conditions (Friis, Pedersen, & Crane, 2001), the extent to which these lipids are preserved in sub-fossil pollen sequences remains uncertain. Thus, although lipids are identified as a primary driver of chemical variation between Ilex and the Quercus and Cerris sections, it may be more beneficial in future studies to focus primarily on variations within the sporopollenins, whose preservation and stability in fossil sequences are well established (Fraser et al., 2012).
In this study we find some evidence to suggest that sporopollenin peaks may be useful for differentiating between the Quercus sections (Figure 3). In particular, the sporopollenin peak at 852 cm⁻¹ is influential in the differentiation of Q. suber from other taxa (Figure 4). However, although FTIR is very useful for characterising a broad range of chemical functional groups within a pollen grain and is therefore being used widely in palaeoecological research based on pollen chemistry, it may not be the optimal method for resolving finer-scale differences within sporopollenins. This is because of the nature of sporopollenin, which is rich in the non-polar bonds of long-chain fatty acids crosslinked with phenylpropanoid building blocks. Whilst FTIR relies largely on vibrational absorbance by polar chemical functional groups and to a lesser extent on non-polar bonds, Raman spectroscopy relies on induced energy changes, which preferentially target the non-polar bonds that characterise sporopollenins. In this regard, Raman spectroscopy may be a better candidate for fossil-pollen taxonomy in long-term ecological research, since it may resolve the structure of sporopollenin in more detail through detection of carotenoid building blocks (Merlin, 2009). Future research that aims to resolve finer-scale variations in the sporopollenin component may be useful to determine whether pollen-chemical variations can be used to distinguish phylogenetic variation in fossil pollen.
Second, while this study infers chemical differences using FTIR analysis on bulk pollen samples, fossil-pollen samples would require single-grain measurements since pollen grains are difficult to separate from other organic material within the sediment matrix. Single-grain FTIR spectra are less reproducible than bulk spectra, mostly due to spectral anomalies caused by scattering and by the non-radial symmetry of certain pollen types (Zimmermann, 2018). Although these issues have been addressed by adjusting experimental settings and by implementing numerical correction methods (Zimmermann et al., 2016; Zimmermann, 2018), future work is needed to test whether the patterns we observe at the bulk level can be replicated using single-grain FTIR measurements.
Third, our study shows the importance of using large numbers of replicates in the pollen samples to account for the large amount of variation present in the chemical spectra, even among replicates of the same species. The large numbers of samples and high levels of replication here (i.e. 50 ± 23 tree replicates per species) are a major advantage over previous studies, which have featured either fewer replicates (<5) (Jardine et al., 2019; Julier et al., 2016; Woutersen et al., 2018) or fewer/no congeneric species (Julier et al., 2016; Zimmermann et al., 2017). Although we do find clear signals in the data related to phylogenetic structure (Figure 3 and Table 3), we also find that ~60% of the total variation remains unexplained. We suggest it will be critical to understand the other factors that can account for this variation if these pollen-chemistry techniques are to be successfully applied to fossil sequences.
One possible reason for the unexplained variation observed in our study may be linked to the environmental controls on pollen chemistry. Previous studies have suggested plasticity of pollen chemistry to climate and other environmental variables (Zimmermann & Kohler, 2014; Zimmermann et al., 2017). For example, temperature has been connected to changes in protein as well as lipid content in both controlled and field experimental conditions (Lahlali et al., 2014; Zimmermann & Kohler, 2014; Jiang et al., 2015; Zimmermann et al., 2017), whilst the effects of exposure to UV-B radiation on pollen chemistry have gained increasing attention (Rozema et al., 2001; Blokker et al., 2006; Fraser et al., 2011; Willis et al., 2011; Jardine et al., 2017; Bell et al., 2018). Additionally, there may be confounding effects related to local adaptation and hybridisation. For example, Bell et al. (2018) investigated changes in Cedrus pollen chemistry in response to UV-B radiation, and their results suggest a heritable component in the pollen-chemical response to UV-B, based on analyses of FTIR spectra of pollen samples from botanic gardens, which closely resembled those from their source of origin.
Quercus as a genus also shows a propensity for hybridisation and complex phylogenetic patterns (Rushton, 1993), and together these factors might be expected to increase the amount of variation we observe in pollen-chemical spectra derived from FTIR. In summary, the larger number of replicates used in this study provides us with a greater understanding of the amount of variation observed in FTIR spectra of Quercus pollen grains. Separating the effects of environment, local adaptation, and sub-species variation is an important challenge for future studies of pollen chemistry if we are to fully characterise chemical variation within any given pollen type and use FTIR approaches on pollen grains from sediments.

Conclusions
We investigated the chemical variation in pollen sampled from 297 individuals of Quercus using FTIR to see whether this technique could enable taxonomic discrimination of pollen, with the goal of using this technique in future long-term ecological and biogeographic analyses. We achieved excellent (~97%) recall at the section level, showing that sub-genus level differentiation of pollen samples is possible using FTIR methods. However, despite these promising results at the section level, more detailed, species-level differentiation was complicated by overlapping variation in the chemical composition of closely related species.
We also aimed to identify whether variations in specific functional groups are responsible for any taxonomic discrimination in the data. Here, we found lipids and sporopollenins to be the key discriminators between different Quercus sections. Although the sporopollenin functional groups are identified as important for discrimination between some Quercus taxa, the importance of lipids in determining the overall signal presents challenges for future biogeographic research using FTIR because lipids may be less likely to be preserved in sediment sequences. Taken together, our findings build on previous studies and show that, whilst FTIR approaches on modern Quercus pollen can perform at a similar level to highly skilled palynologists using traditional morphological techniques, future work on the discrimination of sporopollenin components is required before FTIR can become a more widespread tool in long-term ecology and biogeography. Thus, our study represents a valuable step forward in improving our understanding of variation in pollen chemical composition and its application in long-term ecology and biogeography.
x. Data Availability: Data and code for analysis are available as supplementary material in a GitHub repository (https://github.com/FM-uib/Quercus_Portugal_FTIR) for review. Both will be uploaded to a Dryad repository upon acceptance of the manuscript.
xi. References
xii. Biosketch: Florian Muthreich is a palaeoecologist interested in developing new methods of pollen classification. This work represents a component of his PhD work at the University of Bergen within the PollChem project (https://www.uib.no/en/rg/EECRG/98775/pollchem). In this project he and the other authors collaborate to explore pollen chemistry applications in biogeography and long-term ecology.
Author contributions: FM, AWRS, BZ and HJBB conceived the idea; FM and CMVV conducted the fieldwork and collected the data; FM, BZ, and AWRS analysed the data; and FM and AWRS led the writing with assistance from BZ, HJBB, and CMVV.