Chemical variations in Quercus pollen as a tool for taxonomic identification: Implications for long‐term ecological and biogeographical research

Fossil pollen is an important tool for understanding biogeographical patterns in the past, but the taxonomic resolution of the fossil‐pollen record may be limited to genus or even family level. Chemical analysis of pollen grains has the potential to increase the taxonomic resolution of pollen analysis, but present‐day chemical variability is poorly understood. This study aims to investigate whether a phylogenetic signal is present in the chemical variations of Quercus L. pollen and to assess the prospects of chemical techniques for identification in biogeographical research.


| INTRODUCTION
Fossil pollen data can also be used to track relative-niche shifts in association with the emergence of no-analogue climates (Veloz et al., 2012) and forecast future range shifts as a result of climate change (e.g. Nogués-Bravo et al., 2016, 2018). The basis of all such studies is the reliable identification of fossil pollen to the lowest taxonomic level possible. With detailed identifications, reconstructions and answers to particular biogeographical and ecological questions can be similarly detailed. Indeed, many advances in historical plant geography (e.g. Birks, 2008; Birks, 2014; Godwin, 1975; Lang, 1994; Magri et al., 2006) have been made because of advances in the identification of plant fossils. However, although Quaternary botany (sensu Birks, 2019) has been dominated for over 100 years by pollen analysis, identifications can only be made to the genus or family level for many taxa. This limits the biogeographical information gained from fossil pollen to relatively coarse taxonomic levels.
This issue is of particular relevance for understanding the past, present and future distributions of the genus Quercus (oak).
Quercus contains 22 native species in two subgenera and three sections in Europe (Denk, Grimm, Manos, Deng, & Hipp, 2017; Tutin et al., 1993), several of which have striking and often distinct geographical distributions today (e.g. Iberia, Balkans, eastern Mediterranean, widespread Mediterranean, Apennine Peninsula, widespread to about 60°N; Jalas & Suominen, 1976). However, what is known about the history of Quercus is almost entirely based on pollen and is thus only at the genus level. Although three pollen-morphological types can, with care, be distinguished by conventional light microscopy (LM) (Beug, 2004) and scanning electron microscopy (SEM) (Denk & Grimm, 2009), fossil pollen of Quercus is usually determined as either Quercus Deciduous or Quercus Evergreen types. This situation has several implications for biogeographical research. Maps of the changing distribution and abundance of oak pollen in the late-glacial and Holocene of Europe (Brewer et al., 2017; Huntley & Birks, 1983) can only confidently be made using the two broad pollen morphotypes (i.e. Quercus Deciduous or Quercus Evergreen). Palaeo-biomization methods used to forecast the future responses of Mediterranean ecosystems to climate change have used the same distinction between these two morphotypes (Guiot & Cramer, 2016), while a recent attempt to model future responses of Quercus in Europe using fossil pollen was based on Quercus pollen resolved to the genus level (Nogués-Bravo et al., 2016). As the sensitivity and response of Quercus to recent environmental changes are species-specific in Mediterranean ecosystems (Acácio, Dias, Catry, Rocha, & Moreira, 2017), and because Quercus macrofossils are very rarely found, any improved understanding of its historical and future biogeography clearly depends on consistent pollen identifications at taxonomic levels lower than are presently available.
One potential approach lies in the chemical analysis of pollen.
[...] and likewise for forecasting future responses to climate change. However, before FTIR techniques can be applied more broadly across palaeoecology and biogeography, our results also highlight a number of research challenges that still need to be addressed, including developing sporopollenin-specific taxonomic discriminators and determining a more complete understanding of the effects of environmental variation on pollen-chemical signatures in Quercus.

KEYWORDS
chemical composition, ecology and environmental sciences, Fourier-transform infrared spectroscopy, palynology, partial least squares regression, pollen

However, although taxonomic differentiation of pollen based on the chemical variations inferred by FTIR shows considerable promise, widespread application of FTIR in biogeography and palaeoecology remains limited. One potential reason for this limitation is that lipids and proteins can be major discriminants of variation in FTIR spectra (Zimmermann & Kohler, 2014), but these may not be preserved in sediment sequences alongside the sporopollenin-based exines (Zimmermann, Tkalčec, Mešić, & Kohler, 2015). Therefore, the relative importance of the different chemical structures (e.g. lipids, proteins, sporopollenins) that are responsible for discrimination between many modern pollen types still needs to be established. Moreover, the influence of various abiotic stressors on pollen chemistry can hinder taxonomic differentiation of pollen samples by FTIR (Depciuch et al., 2016; Lahlali et al., 2014; Zimmermann et al., 2017), and this needs to be researched further. In addition, the majority of studies investigating pollen chemistry have used pollen from herbaria, botanical gardens or university campuses, with a limited number of replicates per location and a large number of different species and families. Although this sampling design offers ease of access, reliable identification, and a broad species range, the number of replicates remains a limiting factor for understanding chemical variations in response to both environmental and taxonomic variations.
This study aims to address the challenges related to understanding the taxonomic variations of pollen chemistry by investigating the relative importance of within- and between-species chemical differences in Quercus. Our dataset is unique because it represents the largest collection of closely related species sampled from populations outside botanic gardens across a large bioclimatic and biogeographical gradient in Portugal. We use multivariate-discriminant analysis to (i) investigate the potential for FTIR as a tool to differentiate six taxa of Quercus based on pollen, and (ii) determine the main chemical-functional groups responsible for chemical variation observed in the dataset. Addressing these questions represents the first step if we are to successfully use pollen chemistry as a tool to improve our understanding of past biogeographical patterns and the history of oaks in Europe.

| Sample collection
We collected pollen samples from 294 individual trees belonging to five Quercus species across a 4° (~450 km) latitudinal gradient in Portugal (Figure 1). The Quercus taxa in this study belong to the sections Cerris, Ilex, and Quercus according to Denk et al. (2017) and have different geographical distributions (Table 1). Trees were sampled along gradients of temperature and precipitation to cover a wide range of environmental conditions. A detailed summary of the number of trees sampled at each location is in Table S1.
All samples were collected in spring 2018 by taking whole-tree composite samples of ca. 30 catkins per individual tree. Several branches were sampled for catkins up to 5 m in height. The catkins were air-dried at room temperature (23°C) for at least 24 hr and the pollen was separated from the anthers by light shaking. Pollen was also sieved through 60 µm sieves to remove excess plant material before analysis.

| Pollen-chemistry measurements
Reflectance-infrared spectra were recorded using a Vertex 70 FTIR spectrometer (Bruker Optik GmbH) with a single reflectance-attenuated total-reflectance (SR-ATR) accessory. The ATR IR spectra were recorded with a total of 32 scans and spectral resolution of 4 cm⁻¹ over the range of 4,000–600 cm⁻¹, using the horizontal SR-ATR diamond prism with 45° angle of incidence on a High Temperature Golden Gate ATR Mk II (Specac). Approximately 1 mg of dried pollen was deposited onto the ATR crystal for each measurement (three replicate measurements). Between each measurement a background (reference) spectrum was recorded using the sample-free setup. The OPUS software (Bruker Optik GmbH) was used for data acquisition and instrument control.
We pre-processed the spectra since multivariate-regression methods (e.g. partial least squares; PLS) have been shown to perform better with pre-processed spectra in other studies (Woutersen et al., 2018; Zimmermann & Kohler, 2013). The processing of the spectra consisted of smoothing and calculation of the second derivative using the Savitzky-Golay algorithm, as implemented by the extended multiplicative signal correction (EMSC) package (Liland, 2017). The settings of the Savitzky-Golay smoothing algorithm (Edwards & Willson, 1974; Savitzky & Golay, 1964) were: second degree polynomial and a window size of 11. The second-derivative spectra were constrained between 700 and 1,900 cm⁻¹ and normalized using EMSC, a multiplicative signal correction model extended by a linear and quadratic component (Liland, 2017). For further analyses, the mean of the measurement replicates (three) was calculated for each tree (resulting in one spectrum per tree). We follow peaks of interest that have been attributed to chemical-functional groups according to Pappas et al. (2003), Gottardini et al. (2007), Schulte et al. (2008), Zimmermann (2010) and Zimmermann and Kohler (2014) (Table 2).
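The pre-processing chain described above can be illustrated with a minimal, hedged sketch in plain Python. It is not the EMSC package used in the study: it covers only the Savitzky-Golay second derivative (quadratic fit, window of 11), the restriction to 700–1,900 cm⁻¹ and the averaging of three replicates, and it omits the EMSC normalization step entirely. The function names, the evenly spaced wavenumber grid and the synthetic quadratic "spectra" are assumptions made for illustration.

```python
from statistics import fmean


def savgol_second_derivative_coeffs(half_width):
    """Savitzky-Golay weights for the second derivative (quadratic fit, unit spacing)."""
    m = half_width
    denom = m * (m + 1) * (2 * m - 1) * (2 * m + 1) * (2 * m + 3)
    return [30 * (3 * i * i - m * (m + 1)) / denom for i in range(-m, m + 1)]


def second_derivative(values, window=11, spacing=1.0):
    """Smoothed second derivative; edge points without a full window are dropped."""
    m = window // 2
    coeffs = savgol_second_derivative_coeffs(m)
    return [
        sum(c * values[k + j] for j, c in enumerate(coeffs)) / spacing**2
        for k in range(len(values) - 2 * m)
    ]


def preprocess_tree(wavenumbers, replicates, window=11, lo=700.0, hi=1900.0):
    """Derivative per replicate, crop to [lo, hi] cm^-1, then average the replicates."""
    m = window // 2
    spacing = abs(wavenumbers[1] - wavenumbers[0])  # assumes an evenly spaced grid
    centres = wavenumbers[m:len(wavenumbers) - m]
    derivs = [second_derivative(r, window, spacing) for r in replicates]
    keep = [i for i, w in enumerate(centres) if lo <= w <= hi]
    mean_spectrum = [fmean(d[i] for d in derivs) for i in keep]
    return [centres[i] for i in keep], mean_spectrum


# Synthetic example (hypothetical data): three quadratic "replicates" whose
# true second derivatives are 2e-6, 4e-6 and 6e-6, so their mean is 4e-6.
grid = [600 + 2 * i for i in range(701)]  # 600 to 2,000 cm^-1 in steps of 2
reps = [[a * (w - 1300) ** 2 for w in grid] for a in (1e-6, 2e-6, 3e-6)]
kept_wavenumbers, tree_spectrum = preprocess_tree(grid, reps)
```

In practice a library routine such as SciPy's `savgol_filter` (called with `window_length=11`, `polyorder=2`, `deriv=2`) would normally replace the hand-written weights; its handling of the spectrum edges differs from the simple truncation used here.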

| Statistical analyses
For the exploration of within- and between-species chemical variability, we fitted a canonical powered partial least squares (CPPLS) model, which combines PLS with canonical correlation analysis, to the processed mean spectra (second derivative) to predict species identity. This analysis was implemented in the 'pls' package (Mevik, Wehrens, & Liland, 2019) in R version 3.6.0 (R Core Team, 2019). The PLS family of models has been shown to be powerful in multivariate analyses of FTIR-spectral data (Liland, Mevik, Rukke, Almøy, & Isaksson, 2009; Telaar, Nürnberg, & Repsilber, 2010; Wold, Sjöström, & Eriksson, 2001; Zimmermann et al., 2017). The CPPLS method improves the extraction of predictive information by estimating optimal latent variables in comparison to standard PLS regression (Mehmood & Ahmed, 2016). Unlike standard PLS, CPPLS weights the contribution of the explanatory variables (wavenumbers), which weakens the contribution of non-relevant wavenumbers to optimize the covariance between response (species) and explanatory variables (wavenumbers). Indahl, Liland, and Naes (2009) show improved accuracy and increased explained variance of CPPLS compared with conventional PLS regression using spectral data.
To assess the classification performance of the CPPLS, the dataset was randomly split into training and test sets using a 60%/40% split. This split was repeated 100 times to create 100 versions of the dataset (folds) with different training/test splits.
A CPPLS model was fitted for each fold and the extracted component scores were used to predict species identity using linear discriminant analysis. The performance of the classifier in predicting the test set was averaged over the folds and summarized in a confusion matrix (Table 3).
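The resampling scheme described above (100 random 60%/40% splits, with test-set predictions averaged into one confusion matrix) can be sketched as follows. This is a hedged illustration only: a nearest-centroid rule stands in for the CPPLS scores plus discriminant analysis used in the study, and the two-dimensional toy points, the function names and the random seed are all assumptions.

```python
import random
from collections import defaultdict


def nearest_centroid_predict(train, test_points):
    """Stand-in classifier: assign each point to the closest class mean."""
    groups = defaultdict(list)
    for point, label in train:
        groups[label].append(point)
    centroids = {
        label: [sum(col) / len(pts) for col in zip(*pts)]
        for label, pts in groups.items()
    }

    def dist2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    return [min(centroids, key=lambda lab: dist2(p, centroids[lab])) for p in test_points]


def repeated_split_confusion(samples, labels, n_repeats=100, train_frac=0.6, seed=0):
    """Average confusion matrix (rows = true, columns = predicted) over random splits."""
    rng = random.Random(seed)
    index = {lab: i for i, lab in enumerate(labels)}
    totals = [[0.0] * len(labels) for _ in labels]
    n_train = round(train_frac * len(samples))
    for _ in range(n_repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        predicted = nearest_centroid_predict(train, [p for p, _ in test])
        for (_, true), pred in zip(test, predicted):
            totals[index[true]][index[pred]] += 1
    return [[cell / n_repeats for cell in row] for row in totals]


# Toy data (hypothetical): three well-separated clusters labelled by section.
toy = (
    [((0.0 + 0.1 * k, 0.0), "Cerris") for k in range(10)]
    + [((5.0 + 0.1 * k, 5.0), "Ilex") for k in range(10)]
    + [((10.0 + 0.1 * k, 0.0), "Quercus") for k in range(10)]
)
matrix = repeated_split_confusion(toy, ["Cerris", "Ilex", "Quercus"])
```

Each cell of `matrix` is the mean number of test samples per split that fall into that true/predicted combination, so the cells sum to the test-set size (12 of the 30 toy samples).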

| Chemical variations in Quercus
Assessment of the mean spectra of the five Quercus species and one subspecies reveals clear differences in absorbance between the major intrageneric lineages (sections) at wavelengths associated with specific chemical functional groups (Figure 2). For example, the lipid peak absorbance at ~1,745 cm⁻¹ is weaker in section Ilex compared with the other sections, while the sporopollenin and carbohydrate absorbance bands (at 1,516, 1,171, 833 and 985 cm⁻¹ respectively) are noticeably lower in absorbance in the taxa belonging to section Ilex. Note, however, that although clear spectral differences exist at the section level, it is more difficult to separate variations between species within different sections (Figure 2).

TABLE 1 Taxonomy of sampled Quercus trees and total number of trees sampled (n). Sections are according to Denk et al. (2017). [Table body mostly not recovered. Surviving row: Quercus, Quercus, Q. robur estremadurensis, n = 15. Surviving footnote fragment: "(Tschan & Denk, 2012) and is more abundant on limestone with higher summer precipitation, tracking the sub-Mediterranean bioclimatic belt (Sanchez, Benito-Garzon, & Ollero, 2009)."]
TABLE 2 Wavenumber of peaks attributed to specific functional groups in spectra of fresh pollen and their representative compounds (Pappas et al., 2003; Gottardini et al., 2007; Schulte et al., 2008; Zimmermann, 2010; Zimmermann & Kohler, 2014; compiled in Zimmermann, Bağcıoğlu, et al., 2015). [Table body not recovered. Surviving entry: Aromatic rings in phenylpropanoid subunits.]
Note: An asterisk (*) marks wavenumbers which are shared by more than one compound. The peak at 1,171 cm⁻¹ is an indicator for C-O-C stretching, which can be present in various types of lipids (triglycerides and phospholipids) and sporopollenins as well as some types of carbohydrates.

The observations made following assessment of the mean spectra are confirmed by the analysis using CPPLS (Figure 3a). Here, the three sections of Quercus can be clearly separated using [...] show some separation along the fourth component (Figure 3b). In total, the first two components explain ~37% of the variation in the dataset and separate the samples into the Quercus sections, while the next two components explain a further ~9.7% (Figure S1).
Since specific absorbance peaks in the FTIR spectra can be related to chemical functional groups (summarized in Table 2 [...]

| Discriminant analysis
Using four components (explaining ~45% of the variance), the confusion matrix of the CPPLS classification model shows clear differentiation between the Quercus sections, with some misidentified spectra (<~2 ± 3) from section Quercus and Q. suber (Table 3).
Quercus robur ssp. estremadurensis has by far the worst accuracy in the model and is most often identified as its parent species Quercus robur, possibly due to the limited number of samples in the dataset ( S2). As demonstrated with our ordination plots (Figure 3), differentiation of the three sections of Quercus is possible using 37% of the variance in the spectral data, but these results indicate difficulties in differentiation between species of the same section.
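The contrast between species-level and section-level performance can be made concrete with a small sketch that computes per-class recall from a confusion matrix and then collapses species into sections. The counts below are invented purely for illustration and are not the values in Table 3; the "Ilex taxon" label and the function names are likewise hypothetical.

```python
def recall_per_class(matrix, labels):
    """Recall = correct predictions / all true members, for each row of the matrix."""
    return {
        lab: (matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else float("nan"))
        for i, lab in enumerate(labels)
    }


def collapse_to_sections(matrix, labels, section_of):
    """Sum species-level cells into a section-level confusion matrix."""
    sections = sorted(set(section_of.values()))
    pos = {s: i for i, s in enumerate(sections)}
    out = [[0.0] * len(sections) for _ in sections]
    for i, true in enumerate(labels):
        for j, pred in enumerate(labels):
            out[pos[section_of[true]]][pos[section_of[pred]]] += matrix[i][j]
    return out, sections


# Hypothetical counts (rows = true taxon, columns = predicted taxon).
taxa = ["Q. suber", "Q. robur", "Q. robur ssp. estremadurensis", "Ilex taxon"]
section_of = {
    "Q. suber": "Cerris",
    "Q. robur": "Quercus",
    "Q. robur ssp. estremadurensis": "Quercus",
    "Ilex taxon": "Ilex",  # placeholder label, not a taxon named in the text
}
species_matrix = [
    [18, 2, 0, 0],
    [1, 15, 4, 0],
    [0, 5, 1, 0],
    [0, 0, 0, 20],
]
species_recall = recall_per_class(species_matrix, taxa)
section_matrix, sections = collapse_to_sections(species_matrix, taxa, section_of)
section_recall = recall_per_class(section_matrix, sections)
```

With these made-up counts, confusion between Q. robur and its subspecies lowers the species-level recall but disappears once both are pooled into section Quercus, which is the pattern the text describes.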

| Separation of Quercus according to chemical variation
Recent research has shown that spectroscopic methods such as [...] Our results build on these previous studies to reveal the potential for chemical variations in pollen to distinguish infrageneric variation between species in pollen samples from 297 individuals from Portugal, which belong to three different Quercus sections (Cerris, Ilex, Quercus). We identify a clear separation at the Quercus section level (Figure 3 and [...] differences in FTIR spectra (i.e. the ability to differentiate between Quercus-section level variability), but our study also demonstrates the difficulty of distinguishing between-species level variability, even when relatively large subsets of samples are used.

[Figure caption fragment] [...] Table 2. High absolute loading indicates a high importance of a given wavenumber for the corresponding component. Loadings are chosen in such a way as to describe as much as possible of the covariance between the variables (wavenumbers) and the response (species). Proportion of variance explained by each component in parentheses.

| Key chemical drivers of variation within Quercus spp. pollen
Given the result that identification is possible at the section level, a key question that follows is which of the chemical components of the pollen grain are mostly responsible for the difference between the three sections of Quercus under FTIR? In our study, we find that lipids are one of the most important functional groups in diagnosing samples belonging to section Ilex. The wavebands at 1,462 and 1,745 cm⁻¹ are particularly important in this regard.
Previous research has shown that these wavebands are indicators for triglyceride lipids. Our results also confirm previous findings by Zimmermann and Kohler (2014), who show extreme variations in the relative content of triglycerides and find this waveband to be an important separator between Iris, Quercus, and Pinus pollen types. Indeed, our results extend the inferences made in that previous study by demonstrating a subgeneric level variability of the relative lipid content. Specifically, we identify relatively fewer lipids in pollen sampled from individuals within the Ilex section (Figure 2).
In addition to the importance of triglyceride lipids as a tool for chemical separation, we also find that wavebands representing building blocks of sporopollenin (Table 2) are important for differentiating taxa on the first two components of our CPPLS analysis.
For example, the peaks at 833, 852, 1,516 and 1,605 cm⁻¹ are associated with building blocks of sporopollenin and have relatively high loadings on component 2, which is used in this study to isolate Q. suber. Peaks at 833 and 852 cm⁻¹ are related to different types of phenylpropanoid building blocks and our results suggest relative differences in their abundance within the sporopollenin of Q. suber compared with the other species. In addition to lipid variation, aromatic peaks at 833, 852, 1,516 and 1,605 cm⁻¹ also have a high loading on component 1, which can be used to separate the Ilex section pollen from the other sections. Thus, both lipids and sporopollenins are important functional groups for differentiating pollen between the three sections in our dataset.
Our observations of different chemical compositions of sporopollenin mirror the sequence of the development of the pollen wall in different Quercus sections described by Solomon (1983a, 1983b) and Denk and Grimm (2009). Evolutionarily, pollen of section Ilex represents the earliest, primitive state of Quercus pollen, with a microrugulate pattern on the pollen exine surface. A set of secondary sporopollenins is then added to this surface during exine formation in pollen of sections Cerris and Quercus (Denk & Grimm, 2009). It is possible that these key differences in structure and formation of the exine between the section Ilex and other Quercus pollen grains are responsible for the differences in sporopollenin chemistry identified using FTIR. More detailed work on the composition of sporopollenin of different genera, and how this affects pollen grain structural elements (e.g. Li, Phyo, Jacobowitz, Hong, & Weng, 2019), is needed for this finding to be confirmed.
Finally, protein and carbohydrate peaks (Carbohydrates: 1,107, 1,028, 1,076 cm⁻¹; Proteins: 1,535, 1,641 cm⁻¹) have the highest loadings on components 3 and 4 and are partly responsible for the partial distinction of species within the same section. These peaks represent amylose and cellulose as carbohydrates and amide functional groups within proteins (Table 2).
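The interpretation step used throughout this section, in which wavenumbers with high absolute loadings are matched to functional groups, can be sketched as a simple lookup. The band positions are those quoted in the text above; the matching tolerance of 4 cm⁻¹, the function names and the example loadings are assumptions for illustration and do not reproduce the loadings reported in the study.

```python
# Band positions (cm^-1) quoted in the discussion, grouped as in the text.
BANDS = {
    "lipid": [1462, 1745],
    "sporopollenin": [833, 852, 1516, 1605],
    "carbohydrate": [1028, 1076, 1107],
    "protein": [1535, 1641],
}


def assign_band(wavenumber, tolerance=4.0):
    """Return (group, reference) for the closest listed band within tolerance, else None."""
    best = None
    for group, positions in BANDS.items():
        for ref in positions:
            gap = abs(wavenumber - ref)
            if gap <= tolerance and (best is None or gap < best[0]):
                best = (gap, group, ref)
    return None if best is None else best[1:]


def top_loadings(wavenumbers, loadings, n=3):
    """Wavenumbers with the n largest absolute loadings, annotated with band groups."""
    ranked = sorted(zip(wavenumbers, loadings), key=lambda wl: abs(wl[1]), reverse=True)
    return [(w, l, assign_band(w)) for w, l in ranked[:n]]


# Hypothetical loadings for one component on a coarse wavenumber grid.
example_wavenumbers = [830, 1000, 1460, 1744, 1900]
example_loadings = [0.30, 0.02, -0.25, -0.40, 0.01]
annotated = top_loadings(example_wavenumbers, example_loadings)
```

Ranking by absolute value matters because a strongly negative loading is as informative as a strongly positive one. Bands shared by several compound classes, such as the 1,171 cm⁻¹ peak noted under Table 2, would need a many-to-one mapping that this sketch does not attempt.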

| Implications for understanding past and future Quercus dynamics
We investigated the potential for chemical separation of Quercus [...] Brewer et al., 2017; Huntley & Birks, 1983). As a result, there may be much detail missing in our current understanding about past Quercus dynamics, which could be improved through methods that result in refined taxonomic resolution. Indeed, our results indicate the potential for FTIR to surpass traditional LM methods used in palynology, and work at a comparable level to SEM (Denk & Grimm, 2009; Denk & Tekleva, 2014; Grímsson, Grimm, Meller, Bouchal, & Zetter, 2016; Grímsson et al., 2015). However, the extensive automated-classification possibilities offered by future IR analysis (Mondol et al., 2019), and the ease of sample preparation and data collection, may mean it will be easier to expand these technologies compared with the more time-consuming SEM methods in the long term.
The ability to differentiate at higher taxonomic resolution would enhance our understanding of past trajectories of co-occurring Quercus sections, in particular for understanding the expansion of Quercus since the Last Glacial Maximum (e.g. Brewer, Cheddadi, de Beaulieu, & Reille, 2002). FTIR techniques may also be useful for older interglacial sequences, where identification of Quercus pollen to section is often not possible due to degradation (Tzedakis, 1994).
This would complement studies that use genetic methods on modern samples to reconstruct colonization pathways, which have higher taxonomic resolution and complement the palynological data, but lack the temporal resolution that pollen records provide (Petit et al., 2002).
In addition, a number of studies have highlighted the need to incorporate long-term ecological information to improve biodiversity forecasts of environmental change (Dawson, Jackson, House, Prentice, & Mace, 2011). Rates of temperature increases in the Mediterranean are projected to outpace the rest of the temperate regions in Europe and are predicted to rapidly change the associated biomes in the region (Giorgi & Lionello, 2008; Guiot & Cramer, 2016; Guiot & Kaniewski, 2015), but the consequences of this change for Mediterranean oak forests remain uncertain (Acácio et al., 2017; Lindner et al., 2014). A number of studies have integrated pollen data in order to reduce uncertainties when forecasting the biotic responses of Quercus to climate change in the future (Nogués-Bravo et al., 2016; Guiot & Cramer, 2016). However, like the palaeoecological studies discussed above, the limited taxonomic resolution used may bias projections. For example, species distribution models based solely on Quercus pollen were only able to estimate niche-environment relationships at the genus level. Extensive application of FTIR techniques may therefore provide a bridge between long-term ecological and modern biogeographical approaches.
Nevertheless, despite the potential shown in our FTIR approach, our findings reveal a number of challenges before vibrational methods can be rolled out across biogeographical and palaeoecological applications. First, our results are still unable to resolve taxa at the species level, and so although taxonomic resolution would be refined using FTIR approaches, in many cases palaeoecological studies would still lack the taxonomic precision of other biogeographical tools (e.g. phylogenetic analysis). Second, our results are based on fresh pollen sampled from modern taxa, and lipids were some of the main functional groups used to differentiate between taxa in this study, in addition to sporopollenins (Figure 4). Although the preservation and stability of sporopollenins in fossil sequences are well established (Fraser et al., 2012), the extent to which lipids are preserved in subfossil pollen sequences remains uncertain. Variations in sporopollenin functional groups were still responsible for differentiation between the three main Quercus sections, but the ability for these functional compounds to be used as taxonomic tools in isolation is yet to be established. In the future it may be more beneficial to focus on variations of sporopollenins in pollen, perhaps through the use of Raman spectroscopy, which preferentially targets the vibration of non-polar bonds in sporopollenins and so may be able to achieve finer-scale differentiation of sporopollenin building blocks (Merlin, 2009).
Third, in this study we used bulk pollen samples to infer differences using FTIR, but fossil-pollen samples would require single-grain measurements since pollen grains are difficult to separate from other organic material within the sediment matrix. Single-grain FTIR spectra are less reproducible than bulk spectra, mostly due to spectral anomalies caused by scattering and by non-radial symmetry of certain pollen types (Zimmermann, 2018; Zimmermann, Bağcıoğlu, et al., 2015). Although these issues have been addressed by adjusting experimental settings and by implementing numerical correction methods (Zimmermann, 2018; Zimmermann et al., 2016), future work is needed to test whether the patterns we observe at the bulk level can be replicated using single-grain FTIR measurements.
Finally, our study shows the importance of using large numbers of replicates in the pollen samples to account for the large amounts of chemical variation present in the chemical spectra, even within replicate species. The large numbers of samples and high levels of replication here (i.e. 50 ± 23 tree replicates per species) are a major advantage over previous studies, which have featured either fewer replicates (<5) (Jardine et al., 2019; Julier et al., 2016; Woutersen et al., 2018) or fewer/no congeneric species (Julier et al., 2016; Zimmermann et al., 2017). Although we do find clear signals in the data linked to systematics (Figure 3 and Table 3), we also find that ~60% of the total variation remains unexplained. One probable reason for the unexplained variation observed in our study may be linked to the environmental controls on pollen chemistry. Previous studies have suggested plasticity of pollen chemistry to climate and other environmental variables (Depciuch et al., 2016; Depciuch, Kasprzyk, Sadik, & Parlińska-Wojtan, 2017; Zimmermann et al., 2017; Zimmermann & Kohler, 2014). The other probable reason is the intra-species variation between the genotypes of different populations as well as within populations. This suggests it will be critical to understand the other factors which can account for this variation if these pollen-chemistry techniques are to be successfully applied to fossil sequences.

| CONCLUSIONS
We investigated the chemical variation in pollen sampled from 294 individuals of Quercus using FTIR to investigate whether this technique could enable taxonomic discrimination of modern Quercus pollen. Our results achieved excellent (~97%) recall to section level, showing that subgenus level differentiation of pollen samples is possible using IR methods. However, despite these promising results at the section level, more detailed, species-level differentiation was complicated by overlapping variation in the chemical composition of closely related species.
We also aimed to identify which specific functional groups are responsible for the taxonomic discrimination in the data. Here, we found lipids and sporopollenins to be key determinants between different Quercus sections. Although the sporopollenin functional groups are identified as important for discrimination between Quercus taxa, isolating the effect of these sporopollenin groups from the effects of other functional groups which may not be preserved in sediment sequences (e.g. lipids) still presents a challenge. In addition, testing the application on single-grain Quercus samples, and developing a more complete understanding of the effects of environmental variation on pollen-chemical signatures in Quercus, are required. Taken together, our findings build on previous studies and show that, while FTIR approaches on modern Quercus pollen can perform at a similar level to SEM techniques, future work on the discrimination of sporopollenin components is required before FTIR can become a more widespread tool in long-term ecology and biogeography. Thus, our study represents a valuable step forward in improving our understanding of variation in pollen chemical composition and its application in long-term ecology and biogeography.

ACKNOWLEDGEMENTS
We dedicate this paper to the memory of John Flenley, who pioneered so many exciting and novel aspects of pollen analysis.

ORCID
Alistair W. R. Seddon https://orcid.org/0000-0002-8266-0947