Uncovering a phylogenetic signal in plant biopolymer chemistry: a comparison of sporopollenin isolation approaches for use in palynological research

Sporomorphs (pollen and spores) are a mainstay of research into past vegetation, and increasingly sporomorph chemistry is being used as a palaeoecological tool. To make extant sporomorphs directly comparable to fossil specimens, fresh material is processed to remove labile compounds and isolate the sporopollenin wall. A range of processing approaches are currently in use, but the chemistries produced by these different techniques have not yet been compared across a range of taxa. It is therefore not clear how they compare in terms of efficiently isolating sporopollenin without changing its chemical structure, and what impact they have on relative chemical similarities and differences among taxa (i.e. whether more closely related species will always appear chemically more similar, regardless of how they have been processed). Here, we test this by applying five different processing approaches to sporomorphs from 15 taxa from across the vascular plant phylogeny. We show that each approach has its own idiosyncrasies in terms of impacts on sporomorph chemistry. For the most part a common pattern of among‐taxon chemical variability is uncovered, and a phylogenetic signal within sporopollenin chemistry is supported. Working with spectral derivatives generally increases agreement among the different processing approaches, but decreases the strength of the phylogenetic signal. No one processing approach is ideal, and the choice of which to use is likely to depend on the goal of the study, the type and quantity of material being processed, and the laboratory facilities available for processing.

Sporomorphs (pollen and spores) are reproductive vectors of land plants (Traverse 2007). They are a cornerstone of research into past vegetation, environmental and climate change, pollination ecology and aerobiology, and are used as tools in biostratigraphy, melissopalynology (for understanding the botanical provenance of honey) and forensics (Moore et al. 1991; Jansonius & MacGregor 1996; Traverse 2007). More recently, sporomorphs have also been studied from a materials science perspective, with a focus on the physical and chemical structure of sporopollenin, the biopolymer that comprises the outer wall (exine) of sporomorphs, and with a view to using sporomorphs as microparticles, microcapsules, and as a target for biomimetic material synthesis (Diego-Taboada et al. 2013; Mackenzie et al. 2015; Gonzalez-Cruz et al. 2018; Li et al. 2019; Cojocaru et al. 2022).
The analysis of sporopollenin chemistry is also a focus of palaeopalynological research. This is partly stimulated by an interest in the fate of organic matter in the geological record, and the repolymerization pathway of sporopollenin from a biopolymer to a geopolymer (Yule et al. 2000; de Leeuw et al. 2006; Watson et al. 2012; Fraser et al. 2014a; Bernard et al. 2015; Jardine et al. 2021), but also for the use of sporopollenin chemistry as a palaeoclimatic and palaeoecological tool. Specifically, recent research has focused on using sporopollenin chemistry as a proxy for reconstructing ultraviolet-B (UV-B) and total solar irradiance through time (Rozema et al. 2001a; 2002; 2009; Blokker et al. 2005; Watson et al. 2007; Lomax et al. 2008; Fraser et al. 2011; 2014b; Willis et al. 2011; Lomax & Fraser 2015; Jardine et al. 2016; 2017; 2020; Seddon et al. 2017; 2019; Bell et al. 2018; Liu et al. 2023), as a chemotaxonomic tool for classifying fossil sporomorphs (Julier et al. 2016; Woutersen et al. 2018; Jardine et al. 2019; 2021; Muthreich et al. 2020) and, via stable carbon isotope analysis, reconstructing C₃ vs C₄ grass abundances (Nelson et al. 2006; 2014) and moisture availability (Griener et al. 2013; Bell et al. 2017; 2019), as well as identifying reworked sporomorphs in fossil samples (Korasidis et al. 2022). Key to these endeavours is understanding the evolution of sporopollenin, including its chemical disparity across the land plant phylogeny and through time, and the strength of a phylogenetic signal in sporopollenin chemistry vs evolutionary convergence (de Leeuw et al. 2006; Fraser et al. 2012; Woutersen et al. 2018; Nierop et al. 2019; Jardine et al. 2021).
Fresh sporomorphs (i.e. those collected from plants, rather than fossil sporomorphs from the geological record) comprise a mix of proteins, lipids and carbohydrates in addition to the sporopollenin-based exine (Zimmermann & Kohler 2014; Bağcıoğlu et al. 2015; Jardine et al. 2015; 2017; Zimmermann 2018; Muthreich et al. 2020). Chemical analysis of sporopollenin, or the use of sporopollenin 'shells' as microparticles or microcapsules, therefore requires the removal of these labile compounds to isolate the sporopollenin for subsequent investigation or use (Domínguez et al. 1998; 1999; Ahlers et al. 1999; Loader & Hemming 2000; Jardine et al. 2015; Mundargi et al. 2016; Gonzalez-Cruz et al. 2018; Li et al. 2019; Lutzke et al. 2020). Consequently, a range of processing methods have been developed for sporopollenin isolation, with differences in the reagents used, the durations and number of steps required, and the equipment needed (Table 1).
For example, chemical analysis of sporomorphs with a view to palaeoproxy development and chemotaxonomic applications has often used simple and quick methods developed for routine morphological palynological research, such as acetolysis and, to a lesser extent, treatment with potassium hydroxide (KOH) (Rozema et al. 2001a; 2001b; Blokker et al. 2005; Jardine et al. 2015; 2016; 2017; 2020; 2021; Woutersen et al. 2018). For carbon isotope analysis of sporomorphs, sulfuric acid (H₂SO₄) rather than acetolysis has been used to isolate sporopollenin, to avoid contamination of the δ¹³C signal by carbon-containing chemicals (Loader & Hemming 2000; Bell et al. 2017). Longer procedures involving multiple reagents and steps have typically been used for microencapsulation applications and research into the chemical composition of sporopollenin, where clean and unaltered sporopollenin is needed and bulk quantities (typically 100s of mg to 10s of g) of sporomorphs are processed (Domínguez et al. 1998; 1999; Gonzalez-Cruz et al. 2018; Li et al. 2019; Uddin et al. 2019; Lutzke et al. 2020) (Table 1).
While these various methods have the same overall aim (that of removing the non-sporopollenin components of sporomorphs and leaving isolated sporopollenin behind) they have different chemical effects, with the potential to either not fully remove the labile compounds, or to alter the chemical signature of the isolated sporopollenin. For palaeopalynological applications investigating either the chemical evolution of sporopollenin, or seeking to classify fossil sporomorphs based on their chemical signature, a key question is what impact sample processing choices have on the results obtained. In particular, it is essential to understand how relative inter-taxon chemical differences vary among processing approaches, whether the magnitude of these taxon-to-taxon chemical differences changes substantially, and how they compare with phylogenetic inter-taxon distances (i.e. the total length of the branches connecting species in a phylogeny through their most recent common ancestor as a measure of relatedness; Faith 1992; Verdú et al. 2012). The strength of a phylogenetic signal in sporopollenin, and its maintenance with processing, is important not only for understanding sporopollenin chemical evolution, but also for classification problems. A strong phylogenetic signal can be leveraged to classify sporomorphs at higher and more inclusive taxonomic levels, removing the need to focus on species-level classifications that are likely to be impractical in highly diverse taxa such as grasses (Julier et al. 2016; Jardine et al. 2019), or in deep time investigations (Jardine et al. 2021).
Here, we investigate these issues using sporomorphs from a broad phylogenetic range of plant taxa, whilst subjecting them to several different processing techniques. We document the chemical changes that occur with each processing approach, and the relative similarities and differences among taxa following processing, using Fourier transform infrared (FTIR) spectroscopy. FTIR spectroscopy is a vibrational spectroscopic technique that allows for the rapid and non-destructive characterization of extant and fossil sporomorph chemistry, and as such it has become a standard tool for biogeochemical measurement in (palaeo)palynological studies (e.g. Steemans et al. 2010; Zimmermann 2010; Fraser et al. 2012; Zimmermann & Kohler 2014; Jardine et al. 2017; 2020; Muthreich et al. 2020; Liu et al. 2023). Since FTIR provides information on the different components of sporomorphs (i.e. the sporopollenin wall and labile compounds such as proteins, lipids and carbohydrates) (Zimmermann & Kohler 2014; Bağcıoğlu et al. 2015; Jardine et al. 2015), it is appropriate for tracking chemical changes associated with different sporopollenin isolation techniques. We also study the impact of spectral processing measures (smoothing and taking derivatives), and the relationship between sporopollenin chemistry and phylogeny.

MATERIAL AND METHOD
The sample set comprises 15 taxa, which were selected to provide a broad phylogenetic range, incorporating angiosperms (10 species), gymnosperms (4 species) and a lycopod (1 species) (Fig. 1; Table 2). For the angiosperms, four species are from the superasterid clade, four are superrosids, one (Platanus hispanica) is a basal eudicot, and one (Secale cereale) is a monocot (Table 2). The angiosperm and gymnosperm samples were purchased from Pharmallerga (Lisov, Czechia), and the Lycopodium clavatum spores were purchased from Sigma-Aldrich (Gillingham, UK). Samples were purchased rather than collected from nature to ensure both uniformity and sufficient material to perform all processing techniques.
We selected a range of sporomorph processing approaches to provide a broad comparison of methods and techniques. We used a combined enzyme and solvent approach (hereafter enzymes/solvents) because, while this is not routinely applied in palaeopalynological research, it has been used for researching the chemical composition of sporopollenin (Ahlers et al. 1999; Li et al. 2019; Lutzke et al. 2020) and should remove all non-sporopollenin compounds without changing the sporopollenin chemical signature (Lutzke et al. 2020). Acetolysis and KOH were used because they are common in palaeopalynological research (Traverse 2007), and H₂SO₄ because it is routinely used in pollen carbon isotope studies (Loader & Hemming 2000; Bell et al. 2017). We also applied the anhydrous hydrogen fluoride treatment introduced by Domínguez et al. (1998), which uses hydrogen fluoride in pyridine (HF-Py), because it has previously been used to study the chemical composition of sporopollenin (Domínguez et al. 1998; 1999), and hydrofluoric acid (HF) is routinely handled in many palynology labs. Palynologists are therefore likely to be able to make use of this approach, even though it has not yet been widely adopted in palaeopalynology.
Fig. 1. Molecular phylogeny of the taxa included in this study, showing the main clades (see also Table 2). Abbreviations: L., Lycopods; M., Monocots; B.E., basal Eudicots.
For each treatment 1 mL of sporomorphs was processed, except for the enzymes/solvents treatment where 125 mg of sporomorphs were used. In each case the samples were processed in 15 mL centrifuge tubes, and deionized water was used throughout.
For the enzymes/solvents treatment ([…] 2020), the sporomorph samples were first suspended in 5 mL of 0.1 M sodium acetate buffer (pH 4.5) containing 1% w/v Cellulase Onozuka R-10 and Macerozyme R-10 for 72 h at 30°C, with regular stirring throughout. The samples were then centrifuged and washed five times with 5 mL of water, then left in water for 24 h at 30°C. Following this, the samples were again centrifuged and washed five times with 5 mL of water; thereafter they were washed three times with 2 mL of 100% methanol, and three times with 2 mL of 100% diethyl ether. This was followed by six 24-hour treatments with solvents of increasing polarity: 100% diethyl ether, 100% chloroform, ≥99.5% acetone, 100% methanol, 100% 2-ethoxyethanol, and water (5 mL in each case).
Between treatments, the sporomorph suspension was centrifuged and washed three times with 2 mL of the extraction solvent, followed by two washes with 2 mL of the following solvent in the sequence. Following the extraction steps, the samples were washed five times with 5 mL of water, incubated in water at 40°C for 72 h, centrifuged, and finally washed three times with 5 mL of water.
For the acetolysis treatment the samples were acetolysed in 3 mL of a 9:1 mixture of ≥99% acetic anhydride ([CH₃CO]₂O) and 96% sulfuric acid (H₂SO₄) for 10 min at 90°C in a water bath, followed by one wash with 100% glacial acetic acid (CH₃COOH) and two washes with water, with centrifugation and decanting in between washes.
For the H₂SO₄ treatment (Loader & Hemming 2000) the samples were processed in 3 mL 96% H₂SO₄ for 1 h at room temperature. A complicating factor when working with H₂SO₄ is its higher density compared to most other reagents, which makes centrifugation challenging. Therefore, following the treatment the samples were transferred to 50 mL tubes before being topped up with water, mixed, centrifuged and decanted, followed by repeated washing and centrifugation until the samples were pH neutral (c. 8 washes needed).
For the anhydrous hydrogen fluoride treatment (Domínguez et al. 1998), samples were treated in 3 mL HF-Py (70% hydrogen fluoride, 30% pyridine) for 5 h at 40°C in a water bath. The samples were stirred several times during treatment. Following treatment, the sample tubes were topped up with water, centrifuged and decanted, followed by repeated washing and centrifugation until the samples were pH neutral.
For the KOH treatment, the samples were processed in 3 mL 10% KOH for 1 h at 90°C in a water bath, followed by repeated washing and centrifugation until the samples were pH neutral.
Following all treatments, the samples were washed twice with ethanol, centrifuged, decanted and left to air dry.
IR spectra were generated using ATR (attenuated total reflectance) FTIR, because this approach is suitable for generating bulk scans from sporomorph samples, and provides high quality spectra without the distortions and scattering effects that are often produced in transmission-based IR analyses of a limited number of grains (Zimmermann et al. 2016; Jardine et al. 2021). We generated the data using an Agilent Cary 670 FTIR spectrometer, with a KBr beamsplitter and a thermo-electrically cooled DLaTGS detector, fitted with a Pike Technologies (Madison WI, USA) MIRacle universal ATR accessory with a single reflection ZnSe ATR crystal, housed in the Institute of Landscape Ecology at the University of Münster. The IR and ATR units were purged with compressed air to limit the impact of variations in atmospheric CO₂ and H₂O on the measurements. FTIR spectra were generated using 32 scans per sample and a resolution of 4 cm⁻¹, with a resultant spectral data interval (i.e. the spacing between data points in the FTIR spectra) of 1.93 cm⁻¹. We generated six replicate measurements per taxon and treatment, resulting in 15 taxa × 6 treatments × 6 replicate measurements = 540 measurements in total. Before each set of six replicate measurements a background spectrum was recorded using the empty ATR crystal, and automatically removed from the sample measurements during data generation.
Fig. 2. Stacked mean FTIR spectra for each taxon in the untreated sample set. Grey shaded regions show ±1 standard deviation about the mean. Key peaks are marked with dotted/dashed lines. Abbreviations: L, lipids; S, sporopollenin; P, proteins (see also Table 3); Crypto., Cryptomeria.
Prior to analysis the FTIR spectra were baseline corrected with a second-order polynomial baseline, and truncated to the 1800 to 800 cm⁻¹ region to focus on the main among-taxon chemical variations (Jardine et al. 2021). To remove variations in absolute absorbance values the truncated spectra were standardized using standard normal variates (SNV), also known as z-score standardization, by subtracting the spectrum mean and dividing by the standard deviation (Varmuza & Filzmoser 2009). This results in spectra with a mean of zero and variance of one. Peaks were identified and linked to the primary components of sporomorphs (sporopollenin, proteins, lipids, carbohydrates) using the published literature.
For each taxon × treatment combination, the mean FTIR spectrum was calculated and used in further analyses. We used principal components analysis (PCA) to visualize the inter-taxon chemical relationships for each treatment, because this approach reduces the number of dimensions of complex multivariate data, allowing the main variation in the dataset to be visualized on a limited number of axes (Varmuza & Filzmoser 2009). The inter-taxon Euclidean distances were compared across treatments, and with the phylogenetic distances between taxa, using pairs plots with correlation coefficients, and box plots (Zuur et al. 2009).
Transforming FTIR spectra to derivatives via the Savitzky-Golay smoothing algorithm is a common step prior to multivariate analysis and classification (e.g. Julier et al. 2016; Zimmermann et al. 2016; Muthreich et al. 2020): the use of derivatized spectra emphasizes smaller-scale spectral features that can be useful for classification, but also increases the influence of random noise in the spectra, which is counteracted by smoothing as part of the Savitzky-Golay algorithm (Varmuza & Filzmoser 2009). Since this approach has the potential to alter the relative pairwise chemical distances between taxa and their arrangement in ordination space (Jardine et al. 2019), we re-ran all analyses on second derivative spectra calculated using the Savitzky-Golay algorithm with a second-order polynomial and a smoothing window size of 21. This window size was chosen as an intermediate value from previous analyses of sporomorph chemistry where the spectral data interval is the same as that used here (i.e. 1.93 cm⁻¹), and where the smoothing window value has been optimized via the classification success rate, with values typically between 7 and 43 (Woutersen et al. 2018; Jardine et al. 2019; 2021).
The phylogeny was generated using the phylo.maker function from the V.PhyloMaker2 package (Jin & Qian 2019; 2022) for R (R Core Team 2013), and phylogenetic distances were measured as the pairwise cophenetic distances calculated from the branches of the phylogeny (Faith 1992).
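The spectral processing steps described above (second-order polynomial baseline correction, truncation to 1800 to 800 cm⁻¹, SNV standardization and the Savitzky-Golay second derivative) can be illustrated with a minimal Python sketch. This is not the code used in the study: the analyses there were run in R, the iterative polynomial fitting used for the baseline here is only one assumed variant of a second-order polynomial baseline, and the order in which the derivative and SNV are applied is likewise an assumption. All function names and the synthetic spectrum are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def baseline_correct(wavenumbers, absorbance, degree=2, n_iter=50):
    """Subtract a polynomial baseline fitted iteratively under the peaks.

    Assumed variant: at each pass the spectrum is clipped to the current
    fit, so that peaks stop pulling the polynomial upwards.
    """
    # Rescale the x axis so the polynomial fit is well conditioned.
    x = (wavenumbers - wavenumbers.mean()) / wavenumbers.std()
    work = absorbance.astype(float).copy()
    fit = np.zeros_like(work)
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(x, work, degree), x)
        work = np.minimum(work, fit)
    return absorbance - fit


def truncate(wavenumbers, absorbance, low=800.0, high=1800.0):
    """Keep only the 1800 to 800 cm^-1 region."""
    keep = (wavenumbers >= low) & (wavenumbers <= high)
    return wavenumbers[keep], absorbance[keep]


def snv(absorbance):
    """Standard normal variate: subtract the mean, divide by the std."""
    return (absorbance - absorbance.mean()) / absorbance.std(ddof=1)


def second_derivative(absorbance, window=21, polyorder=2, spacing=1.93):
    """Savitzky-Golay second derivative (window of 21 points, order 2)."""
    return savgol_filter(absorbance, window_length=window,
                         polyorder=polyorder, deriv=2, delta=spacing)


# Synthetic spectrum (hypothetical): one band at 1515 cm^-1 on a curved baseline.
wn = np.arange(600.0, 4000.0, 1.93)
spec = np.exp(-((wn - 1515.0) / 15.0) ** 2) + 1e-7 * (wn - 2000.0) ** 2 + 0.05

corrected = baseline_correct(wn, spec)
wn_t, spec_t = truncate(wn, corrected)
z = snv(spec_t)
d2 = second_derivative(z)
```

With a data interval of 1.93 cm⁻¹, a 21-point window spans roughly 40 cm⁻¹, which gives a sense of the scale of spectral feature that the smoothing step preserves.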

Overview of untreated sporomorph chemistry
Taken together, the FTIR spectra (Fig. 2) for the untreated samples exhibit peaks associated with lipids, proteins, carbohydrates, and sporopollenin-bound phenolic compounds, as expected (Table 3). For example, lipid peaks occur at c. 1740 and 1460 cm⁻¹, proteins at c. 1650 and 1550 cm⁻¹, peaks associated with the carbohydrate intine (the inner pollen wall) between 1200 and 900 cm⁻¹, and sporopollenin peaks at c. 1680, 1605, 1515, 1170, 850 and 830 cm⁻¹. However, there are a number of taxon-specific differences in the heights of the various peaks, reflecting differences in the underlying contributions of respective structural moieties.

Impact of processing on sporomorph chemistry
All the processing approaches have the anticipated effect of removing labile compounds, leaving sporopollenin behind and thus making the sporopollenin-related peaks in FTIR spectra more prominent. There are, however, substantial differences in how completely the non-sporopollenin peaks are removed, and what impact the processing approaches have on the sporopollenin chemistry. For each processing technique, the main spectral changes are summarized below and in Table 4. Figures 3-7 show both the spectra for each taxon following treatment, and the difference between the treated and untreated spectra (one figure per treatment).
Enzymes/solvents. This treatment leads to a removal of the labile compounds, with the protein, lipid and carbohydrate peaks decreasing in size or disappearing entirely (Fig. 3). The carbohydrate band decreased relatively less in some taxa, such as the gymnosperms […]
Fig. 3. Stacked FTIR spectra for the enzymes/solvents treatment sample set. Left panel shows the mean spectrum for each taxon, with grey shaded regions showing ±1 standard deviation about the mean. Right panel shows the difference between the mean spectrum and the mean untreated spectrum for each taxon, with the horizontal dashed grey lines showing zero difference in each case (see Fig. 2 for the untreated spectra). Key peaks are marked with dotted/dashed lines. Abbreviations: L, lipids; S, sporopollenin; P, proteins (see also Table 3); Lyco., Lycopodium; Crypto., Cryptomeria.
H₂SO₄. As with acetolysis, this treatment appears to efficiently remove labile compounds while also reducing the intensity of the aromatic peaks (Fig. 5). Again, additional peaks also appear in the 1200 to 1000 cm⁻¹ range. The form of these peaks differs from that seen with acetolysis, however, with a pronounced peak at 1200 cm⁻¹ and a relatively lower 1030 cm⁻¹ peak. The maintenance of the Amide I peak at 1650 cm⁻¹ in some taxa (e.g. Fraxinus, Populus) suggests that the proteins were also not fully removed by H₂SO₄ in these samples (Fig. 5).
HF-Py. This treatment removes most labile compounds with apparently less of an obvious impact on the sporopollenin chemistry compared to acetolysis and H₂SO₄ (Fig. 6). However, the aromatic peaks are slightly decreased in height relative to the intensities after the enzymes/solvents treatment, and in some cases the labile compounds are incompletely removed, such as with the 1745 cm⁻¹ lipid peak in Liquidambar and Bassia, and a peak or shoulder at 1650 cm⁻¹ in several taxa suggesting remaining proteins (Fig. 6).
KOH. Following this treatment, a number of labile compound peaks remain across the taxa, specifically peaks in the 1200 to 900 cm⁻¹ range suggesting incomplete removal of the carbohydrate intine, and peaks or shoulders at 1745 and 1460 cm⁻¹ suggesting incomplete removal of lipids (e.g. Liquidambar) (Fig. 7). A new peak at c. 1660 cm⁻¹ also appears in most taxa following treatment with KOH, which possibly represents a change in protein structure leading to a shift of the Amide I band to higher wavenumbers.
Fig. 4. Stacked FTIR spectra for the acetolysis treatment sample set. Left panel shows the mean spectrum for each taxon, with grey shaded regions showing ±1 standard deviation about the mean. Right panel shows the difference between the mean spectrum and the mean untreated spectrum for each taxon, with the horizontal dashed grey lines showing zero difference in each case (see Fig. 2 for the untreated spectra). Key peaks are marked with dotted/dashed lines. Abbreviations: L, lipids; S, sporopollenin; P, proteins (see also Table 3); Lyco., Lycopodium; Crypto., Cryptomeria.

Multivariate analysis
For the untreated samples and non-derivatized spectra, the inter-taxon Euclidean distances are positively correlated with phylogenetic distances, demonstrating that closely related taxa are chemically more similar than more distantly related taxa (Fig. 8). A PCA of the taxon mean spectra shows that the angiosperms group together in the upper left quadrant of the plot (lower end of PC1 and upper end of PC2), separating out from Lycopodium, which occurs at the upper end of PC1, and the gymnosperms, which occur between these two extremes (Fig. 9). Picea occurs separately to the other three gymnosperm taxa, consistent with their placement on the phylogeny (Fig. 1). Similarly, the superasterid taxa all cluster together, although the superrosids are more spread out in ordination space (Fig. 9). The PC loadings show that the heights of the 1650 and 1550 cm⁻¹ protein peaks, and the 1020 cm⁻¹ carbohydrate peak, are negatively correlated with PC1 (that is, the peak heights are relatively lower in taxa at the upper end of PC1, and relatively higher in taxa at the lower end of PC1), while the 1710 and 1170 cm⁻¹ sporopollenin peaks are positively correlated with PC1 (Fig. 10; Table 3). For PC2, peaks at 1740, 1515 and 1170 cm⁻¹ are positively correlated, while the 1020 cm⁻¹ peak is negatively correlated.
Fig. 5. Stacked FTIR spectra for the H₂SO₄ treatment sample set. Left panel shows the mean spectrum for each taxon, with grey shaded regions showing ±1 standard deviation about the mean. Right panel shows the difference between the mean spectrum and the mean untreated spectrum for each taxon, with the horizontal dashed grey lines showing zero difference in each case (see Fig. 2 for the untreated spectra). Key peaks are marked with dotted/dashed lines. Abbreviations: L, lipids; S, sporopollenin; P, proteins (see also Table 3); Crypto., Cryptomeria.
The inter-taxon distances for the enzymes/solvents, H₂SO₄ and HF-Py treatments are all positively correlated with each other, with the phylogenetic distances, and with the inter-taxon distances for the untreated samples (Spearman's rho = 0.4-0.8) (Fig. 8). Ordinations of these treatments reveal a broadly similar arrangement of taxa to the untreated samples, with the angiosperms separating out from the gymnosperms and Lycopodium (Fig. 9). While the loadings vary between the treatments, for PC1 peaks between 1200 and 1000 cm⁻¹ are positively correlated while peaks between 1700 and 1500 cm⁻¹ are negatively correlated (Fig. 10).
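The comparison between chemical and phylogenetic distances can be sketched as follows: pairwise Euclidean distances are computed among the taxon mean spectra and rank-correlated with the corresponding phylogenetic distances. This is a hedged Python illustration rather than the R workflow used in the study; the function name and the toy data (four hypothetical taxa) are assumptions made purely for demonstration.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def chemistry_vs_phylogeny(mean_spectra, phylo_distances):
    """Spearman correlation between chemical and phylogenetic distances.

    mean_spectra: array of shape (n_taxa, n_wavenumbers), one row per taxon.
    phylo_distances: condensed vector of pairwise phylogenetic distances,
        in the same pair order as scipy's pdist: (0,1), (0,2), ..., (n-2,n-1).
    """
    chem_distances = pdist(mean_spectra, metric="euclidean")
    rho, p_value = spearmanr(chem_distances, phylo_distances)
    return rho, p_value


# Toy data (hypothetical): spectra that differ only by a taxon-specific scale,
# and phylogenetic distances that increase with the same differences.
position = np.array([0.0, 1.0, 3.0, 7.0])
profile = np.array([0.2, 1.0, 0.5, 0.1])
toy_spectra = np.outer(position, profile)
toy_phylo = 2.0 * pdist(position[:, None], metric="euclidean")

rho, p_value = chemistry_vs_phylogeny(toy_spectra, toy_phylo)
```

Because pairwise distances share taxa they are not independent observations, so a p-value computed this way should be interpreted with caution; the rank correlation coefficient itself is the quantity of interest here.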
Fig. 6. Stacked FTIR spectra for the HF-Py treatment sample set. Left panel shows the mean spectrum for each taxon, with grey shaded regions showing ±1 standard deviation about the mean. Right panel shows the difference between the mean spectrum and the mean untreated spectrum for each taxon, with the horizontal dashed grey lines showing zero difference in each case (see Fig. 2 for the untreated spectra). Key peaks are marked with dotted/dashed lines. Abbreviations: L, lipids; S, sporopollenin; P, proteins (see also Table 3); Crypto., Cryptomeria.
While the inter-taxon distances for the KOH and acetolysis treatments are positively correlated with each other and the other treatments and the phylogenetic distances, the correlation coefficients are lower (Fig. 8). This is especially pronounced for acetolysis, which is at most only weakly correlated with the other treatments, and uncorrelated with phylogeny. A PCA of the KOH treated samples reveals a less clear phylogenetic signal in the arrangement of the taxa in ordination space compared to most of the other treatments, although the angiosperms are mostly limited to the lower end of PC2 and the gymnosperms are mostly limited to the upper end (Fig. 9). An exception to this is Liquidambar, which occurs at the upper end of PC2; the PC loadings show that this is because of the higher 1745 cm⁻¹ lipid peak that remains in these samples, and the relatively pronounced 1170 cm⁻¹ peak that it shares with Picea (Figs 7, 10). The arrangement of taxa on PC1 is largely controlled by the relative height of the carbohydrate peak, suggesting that incompleteness of the removal of the intine across the taxa overprints much of the underlying phylogenetic signal in these spectra (Fig. 10).
Fig. 7. Stacked FTIR spectra for the KOH treatment sample set. Left panel shows the mean spectrum for each taxon, with grey shaded regions showing ±1 standard deviation about the mean. Right panel shows the difference between the mean spectrum and the mean untreated spectrum for each taxon, with the horizontal dashed grey lines showing zero difference in each case (see Fig. 2 for the untreated spectra). Key peaks are marked with dotted/dashed lines. Abbreviations: L, lipids; S, sporopollenin; P, proteins (see also Table 3); Crypto., Cryptomeria.
A PCA of the acetolysed samples shows limited evidence of a phylogenetic signal in the arrangement of the taxa, with considerable overlap between the gymnosperms and angiosperms, and no clear clade-based groupings within the angiosperms (Fig. 9). Inspection of the PC loadings shows that peaks at c. 1720 and <1100 cm⁻¹ are positively correlated with the PC1 scores, while sporopollenin peaks at c. 1670, 1570 and 1170 cm⁻¹ are negatively correlated (Fig. 10).
The inter-taxon Euclidean distances are highest for the untreated and enzymes/solvents samples and lowest for the acetolysed samples, with the H₂SO₄, HF-Py and KOH samples intermediate between these (Fig. 11). The lower distances among the acetolysed samples are consistent with the observed simplification of the IR spectra (Fig. 4).
Repeating these analyses with second derivative spectra reveals broadly similar relationships among the inter-taxon distances to the non-derivatized spectra. The correlation coefficients between the treatments are in general higher with the derivatized spectra, especially for the acetolysed and KOH treated samples (Fig. 12). They are lower for the H₂SO₄ treatment, however, and the correlations between the chemical and phylogenetic distances are also generally weaker with the derivatized spectra (Fig. 12). Comparing the inter-taxon Euclidean distances for each treatment shows a similar pattern to the non-derivatized spectra, with the exception that the H₂SO₄ spectra have similarly low distances to the acetolysed spectra (Fig. 13). Applying PCA to the derivatized spectra shows a separation of the angiosperm and non-angiosperm taxa in ordination space for most treatments, and to some extent a separation between the superrosid and superasterid angiosperms (Fig. 14). This pattern is less clear in the untreated spectra, where Lycopodium and Picea separate out from the rest of the taxa on PC1, and for the KOH treated spectra where Liquidambar separates out on PC1 and the other taxa cluster together (Fig. 14). Compared to the non-derivatized spectra the acetolysed samples show a clearer within-clade clustering. The percentage of variance explained on the first two PCs is slightly lower with the derivatized spectra compared to the non-derivatized spectra, but in most cases the first two axes account for c. 70% of the variance in the dataset (Fig. 14).
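For readers wishing to reproduce this kind of ordination, the PCA scores, loadings and percentage of variance explained can be obtained from a singular value decomposition of the column-centred matrix of taxon mean spectra. The following Python sketch is illustrative only: the function name is hypothetical, the input is random placeholder data rather than the spectra analysed here, and no additional scaling of the variables is assumed.

```python
import numpy as np


def pca(mean_spectra, n_components=2):
    """PCA via SVD of the column-centred data matrix.

    Returns the taxon scores, the loadings (one row per component) and
    the fraction of total variance explained by each retained component.
    """
    centred = mean_spectra - mean_spectra.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Placeholder data (hypothetical): 15 taxa by 50 spectral variables.
rng = np.random.default_rng(0)
placeholder = rng.normal(size=(15, 50))

scores, loadings, explained = pca(placeholder)
percent_first_two = 100.0 * explained.sum()
```

The sign of each component is arbitrary in this decomposition, so a positive or negative loading on a given peak is only meaningful relative to the positions of the taxa on the same axis.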

DISCUSSION
Overall, these results suggest that using different treatment methods can lead to substantially different residual sporomorph chemistries, either through not completely removing labile compounds or by altering the chemistry of the sporopollenin. This shows that for machine learning-based classifications using sporomorph chemistry it will be essential to work with training sets that have been processed in exactly the same way as the material being classified to ensure that like is being compared with like. However, the overall ranking of inter-taxon distances is similar across most of the treatments, with high positive correlations among distances and a similar arrangement of taxa in ordination space (Figs 8, 9). This suggests that while the different treatments lead to a different sporomorph chemistry, the same underlying taxon-to-taxon relationships can be uncovered. There are exceptions to this, such as acetolysis and to some extent the treatment using KOH, although processing the spectra with derivatives and Savitzky-Golay smoothing increases the correlation with the enzymes/solvents treatment, suggesting that a common underlying chemical pattern can, to some extent, be recovered with these methods as well (Figs 12, 14).
These results support a phylogenetic signal in sporomorph chemistry, with a separation between the angiosperms and non-angiosperms in ordination space, and some evidence for clade-based groupings within these higher taxa. While the correlation between the inter-taxon chemical and phylogenetic distances in the processed samples shows that sporopollenin chemistry contains a phylogenetic signal (Woutersen et al. 2018; Nierop et al. 2019; Jardine et al. 2021), the decrease of this correlation with processing suggests that the labile compounds carry a phylogenetic signal as well, and it is the combination of these different components that drives the relationship with phylogeny in fresh sporomorph chemistry (Zimmermann & Kohler 2014; Bağcıoğlu et al. 2015). Interestingly, the correlations between the inter-taxon distances and phylogeny are generally lower with the derivatized spectra compared to the non-derivatized spectra (Figs 8, 12), and future analyses of sporopollenin evolutionary history should carry out data analysis both with and without spectral processing to take the impact of this step into account.
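The rank correlation between chemical and phylogenetic pairwise distances can be sketched as follows. The six distance pairs are invented for illustration, and the helper names (`average_ranks`, `pearson`, `spearman_rho`) are hypothetical; this is a plain implementation of Spearman's rho, not the code used to produce Figs 8 and 12.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pairwise distances for six taxon pairs (same pair order in both).
chemical = [0.17, 0.85, 1.07, 0.40, 0.62, 0.95]
phylogenetic = [120.0, 310.0, 420.0, 180.0, 350.0, 400.0]
rho = spearman_rho(chemical, phylogenetic)
```

Because pairwise distances share taxa and are therefore not independent, a significance test on such a coefficient would normally need a permutation-based approach such as a Mantel test, which this sketch does not attempt.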
The treatments used here each have pros and cons for use in palynological analyses, linked both to their ability to isolate sporopollenin efficiently while not altering its chemical structure, and to their general applicability for (palaeo)palynological research (Table 4). In particular, collecting samples for classification training sets will mean going beyond sporomorph types that can be purchased in bulk or collected from abundant sporomorph producers (such as Lycopodium, conifers or birches), and instead working with small samples from plants that produce limited quantities of sporomorphs, sampled in natural settings, botanical gardens or herbaria. Ideally this will also involve sampling a number of individual plants per species to incorporate intra-specific variation and chemical responses to environmental gradients (Zimmermann & Kohler 2014; Muthreich et al. 2020), again limiting potential sample sizes because multiple individuals cannot be pooled together. Such research is also likely to involve efficiently processing large numbers of samples in standard palynology laboratory set-ups, rather than in dedicated chemistry laboratories.
The enzymes/solvents treatment appears to remove most labile compounds while leaving the sporopollenin biopolymer largely unaffected, although the carbohydrate intine was incompletely removed in some taxa (Fig. 3).
One key difference in our study compared to previous applications of this approach (e.g. Lutzke et al. 2020) is that, due to equipment constraints tied to processing 15 samples at once in a palynological laboratory, the samples were not continuously stirred during the enzyme digestion step but were instead manually stirred at intervals across the 72-hour treatment. Since this included leaving the samples unattended overnight, it is possible that this led to the intine not being fully digested, and this is something that would need careful consideration if this approach were to be used again. However, the wider applicability of this technique also needs to be taken into account, since the processing protocol takes c. 2 weeks to work through, including treatment with multiple reagents and many cycles of centrifugation and decanting. This approach is therefore unlikely to be useful for high-throughput processing and working with very small sample sizes, limiting its use for developing chemotaxonomic libraries (Julier et al. 2016; Zimmermann et al. 2016; Jardine et al. 2019), analysing sporomorph chemistry evolution (Fraser et al. 2012; Nierop et al. 2019) or calibrating palaeoclimate proxies (Blokker et al. 2005; Lomax et al. 2008; Jardine et al. 2016, 2017).
Processing sporomorphs with KOH, while a standard approach for morphological studies (Traverse 2007), appears to be less suitable for chemical analyses, due to the incomplete removal of both carbohydrates and lipids (Fig. 7). However, it is perhaps unsurprising that not all labile compounds are removed by a 1-hour treatment with KOH, given that the standard approach used in sporopollenin isolation for microencapsulation applications involves refluxing with acetone for c. 12 hours to remove lipids, KOH for c. 12 hours to remove proteins, and phosphoric acid for 7 days to remove residual proteins and the carbohydrate intine, as well as multiple washes with solvents (Gonzalez-Cruz et al. 2018). While this approach and variations of it result in empty and clean sporopollenin shells suitable for microencapsulation (Mundargi et al. 2016; Gonzalez-Cruz et al. 2018; Uddin et al. 2019), as with the enzymes/solvents approach it is unlikely to be useful for routine palynological applications involving large numbers of what are often small samples.
Both the H₂SO₄ and HF-Py treatments were positively correlated with the enzymes/solvents and untreated samples, and would be suitable for working with smaller sample sizes. However, both approaches failed to remove all the labile compounds, suggesting longer treatment times would be needed, and the H₂SO₄ treatment appears to alter sporopollenin chemistry, with a reduction in the height of the phenolic peaks (Fig. 5). HF-Py is also expensive to purchase and hazardous to use, and since palynologists are actively looking for alternatives to HF for routine sample processing (Riding & Kyffin-Hughes 2011; Santos & Ledru 2022) this treatment seems less likely to attain widespread use.
Acetolysis leaves a clear imprint on the processed material, with a reduction in the height of phenolic peaks and the introduction of new absorbance bands at c. 1160 and 1030 cm⁻¹ (Fig. 4). This means that the resultant biopolymer is not unaltered sporopollenin (Lutzke et al. 2020) and inter-taxon chemical differences are reduced compared to most of the other treatments (Figs 11, 13). Processing the spectra with derivatives increases the correlation with all the other treatments, at least in terms of relative inter-taxon Euclidean distances and the arrangement of taxa in ordination space, although the correlation coefficients are still relatively low in most comparisons (Fig. 12), and determining whether this approach can be applied to future studies of the evolutionary history of sporopollenin chemistry requires further research. Acetolysis is, however, convenient and quick to use and gives consistent results, with all labile compounds removed, and is applicable to small sample sizes. Chemical classification with acetolysed material does appear to be feasible as well (Jardine et al. 2021), even if unaltered sporopollenin chemistry is not being used as the basis for classification, although again this requires further validation to ensure that the results are consistent and reliable across both extant and fossil specimens, and are not systematically biased by the processing procedure used. Similarly, UV-B irradiance reconstructions have been achieved using acetolysed sporomorphs (Jardine et al. 2016, 2020), and a direct comparison between UV-B absorbing phenolic compound levels in acetolysed and unacetolysed subfossil grass pollen grains revealed a strong positive correlation (Jardine et al. 2016), despite the reduction in phenolic content that acetolysis causes. Pragmatically, acetolysis may still be a suitable approach for future chemopalynological investigations, once the various factors (convenience of use and applicability to small sample sizes vs impacts on sporopollenin chemistry) have been considered, but only if it can be shown that the chemical alterations inherent in this approach do not invalidate the assumptions of the research.
The broader finding from this study (that no one approach both efficiently isolated the sporopollenin exine and left it unaltered) suggests that palynologists will need to continue investigating new methods, or adapt those used here, to try to find a better solution to the problem of processing extant sporomorphs for chemical analysis. We have emphasized keeping methods broadly applicable, including limiting techniques as far as possible to those that can operate in standard palynological laboratories, and ideally without the use of hazardous and expensive reagents such as HF, but being open to collaborating with chemists and incorporating novel techniques and equipment is likely to help solve some of the issues raised in this study.
While here we have used FTIR spectroscopy as a measurement tool, a range of vibrational and mass spectroscopic methods are available that can provide complementary information on different aspects of sporomorph chemistry, and future analyses of the kind carried out in this study would benefit from incorporating these. Like FTIR, Raman spectroscopy is a vibrational technique that provides information on the molecular structure of analysed samples, but with differing sensitivity to different functional groups, so that bands that are weakly detected in one approach are often strong in the other (Mayo 2003; Olcott Marshall & Marshall 2014). Although Raman spectroscopy has previously been used to analyse sporomorphs (e.g. Schulte et al. 2008; Bağcıoğlu et al. 2015; Diehn et al. 2020), strong fluorescence can swamp the Raman signal and therefore complicate measurements, most significantly with the isolated sporopollenin and associated bio/geopolymers analysed in palaeopalynological research (Olcott Marshall & Marshall 2014; Bernard et al. 2015). However, Fourier transform (FT) Raman appears to overcome this issue (Zimmermann 2010; Kenđel & Zimmermann 2020), and so may be a viable option for comparing sporopollenin isolation approaches and analysing fossil sporomorphs. While mass spectroscopic analysis of sporomorphs has traditionally been problematic because of the chemical inertness of the sporopollenin biopolymer, thermally assisted hydrolysis and methylation-pyrolysis-gas chromatography-mass spectrometry (THM-py-GC-MS) has successfully been used to provide detailed information on the identity and abundance of the specific compounds comprising sporopollenin (Blokker et al. 2005; Watson et al. 2007, 2012; Willis et al. 2011; Bell et al. 2018). Incorporating THM-py-GC-MS measurements would therefore be a natural extension of the FTIR-based analysis presented here, especially in terms of understanding the potential of the different isolation techniques to drive changes in sporopollenin chemistry. Future work in this area will also benefit from incorporating (sub)fossil specimens that have been treated with a range of chemical treatments, to better understand how extant and fossil datasets can be directly integrated in the same analysis.

CONCLUSION
In this study, we have compared sporopollenin isolation approaches using a range of taxa from across the tracheophyte phylogeny. The results of this research show that there is a phylogenetic signal both in fresh sporomorphs and in isolated sporopollenin. While different treatments lead to different sporomorph chemistries, either because some labile compounds remain following processing or because the sporopollenin itself is altered, the various treatments broadly converge on a common signal in terms of relative chemical similarities and differences among taxa. Although there are exceptions to this finding, such as processing with acetolysis and to some extent KOH, analysing derivatized spectra increases the overall similarity with other treatments. Future studies of chemical palynology will need to consider the trade-offs between the different processing options available, including the efficacy of the treatment, its impact on sporopollenin chemistry, and its applicability to small sample sizes and high-throughput processing of potentially large batches of samples.

For the enzymes/solvents treatment we adapted the protocols outlined in Ahlers et al. (1999), Li et al. (2019) and Lutzke et al.

TABLE 3. Band assignments for the main peaks in the ATR-FTIR spectra, based primarily on Lutzke et al. (2020) and Bağcıoğlu et al. (2015).

FIG. 8. Pairs plot for the non-derivatized taxon mean spectra and phylogenetic pairwise distances, showing the relationships among treatments and phylogeny via scatterplots (upper triangle) and Spearman's rho correlation coefficients (lower triangle), and the distribution of pairwise distances as histograms on the diagonal.

FIG. 9. Principal components analysis scatterplots for the non-derivatized taxon mean spectra, showing PC1 and PC2 for each treatment. Percentage variance explained by each principal component shown in parentheses.

FIG. 10. Principal components analysis loadings plots for the non-derivatized spectra, showing the PC1 (black solid lines) and PC2 (grey dashed lines) loadings for each treatment.

FIG. 11. Box plot of the pairwise inter-taxon Euclidean distances for each treatment, for the non-derivatized spectra. For each box the thick horizontal line shows the median, the edges of the box show the lower and upper quartile, and the whiskers show the extent of the data up to 1.5 times the interquartile range; values beyond this are shown as individual points. Notches around the median lines roughly correspond to a 95% confidence interval for the medians, and were calculated according to McGill et al. (1978).

FIG. 12. Pairs plot for the 2nd derivative taxon mean spectra and phylogenetic pairwise distances, showing the relationships among treatments and phylogeny via scatterplots (upper triangle) and Spearman's rho correlation coefficients (lower triangle), and the distribution of pairwise distances as histograms on the diagonal.

FIG. 13. Box plot of the pairwise inter-taxon Euclidean distances for each treatment, for the 2nd derivative spectra. For each box the thick horizontal line shows the median, the edges of the box show the lower and upper quartile, and the whiskers show the extent of the data up to 1.5 times the interquartile range; values beyond this are shown as individual points. Notches around the median lines roughly correspond to a 95% confidence interval for the medians, and were calculated according to McGill et al. (1978).

FIG. 14. Principal components analysis scatterplots for the 2nd derivative taxon mean spectra, showing PC1 and PC2 in each case. Percentage variance explained by each principal component shown in parentheses.
TABLE 1. Overview of sporopollenin isolation approaches, with examples of use in the published literature.
TABLE 2. Species used in this study; Clade 1 and Clade 2 are suprafamilial taxa that incorporate the focal species.
). Bassia, Liquidambar and Lycopodium have a more pronounced lipid peak at 1740 cm⁻¹, while this is barely expressed in Calluna, Betula and Secale, and is intermediate in height in the other taxa. The broad carbohydrate band centred on c. 1050 cm⁻¹ is clearly present in most taxa but not expressed in Picea and Lycopodium. Phenolic peaks representing the sporopollenin wall are most clearly expressed in Picea but more variably in the other taxa. In general the protein-related amide I and II peaks at 1650 and 1550 cm⁻¹, respectively, are more clearly expressed in the angiosperms relative to the gymnosperms and Lycopodium.
example, Taxus, Cryptomeria and Juniperus, suggesting that the enzymes/solvents treatment may not have been similarly effective in all cases. The sporopollenin peaks become more prominent, especially the 1680 cm⁻¹ carbonyl peak (Lutzke et al. 2020), which is less obvious in most of the untreated spectra. Similarly, the C=C peak at c. 1580 cm⁻¹ becomes more prominent, especially in the angiosperm taxa. A carboxylic acid peak at 1710 cm⁻¹, which has been identified in previous studies of isolated sporopollenin (Fraser et al. 2014a; Jardine et al. 2017, 2021; Lutzke et al. 2020), appears as a shoulder on the main 1680 cm⁻¹ carbonyl peak.