Plant–pollinator interactions over time: Pollen metabarcoding from bees in a historic collection

Abstract Pollination is a key component in agricultural food production and ecosystem maintenance, with plant–pollinator interactions an important research theme in ecological and evolutionary studies. Natural history collections provide unique access to samples collected at different spatial and temporal scales. Identification of the plant origins of pollen trapped on the bodies of pollinators in these collections provides insight into historic plant communities and pollinators’ preferred floral taxa. In this study, pollen was sampled from Megachile venusta Smith bees from the National Collection of Insects, South Africa, spanning 93 years. Three barcode regions, the internal transcribed spacer 1 and 2 (ITS1 and ITS2) and ribulose‐1,5‐biphosphate carboxylase (rbcL), were sequenced from mixed pollen samples using a next‐generation sequencing approach (MiSeq, Illumina). Sequenced reads were compared to sequence reference databases that were generated by extracting sequence and taxonomic data from GenBank. ITS1 and ITS2 were amplified successfully across all (or most) samples, while rbcL performed inconsistently. Age of sample had no impact on sequencing success. Plant classification was more informative using ITS2 than ITS1 barcode data. This study also highlights the need for comprehensive reference databases as limited local plant sequence representation in reference databases resulted in higher‐level taxon classifications being more confidently interpreted. The results showed that small, insect‐carried pollen samples from historic bee specimens collected from as early as 1914 can be used to obtain pollen metabarcodes. DNA metabarcoding of mixed origin pollen samples provided a faster, more accurate method of determining pollen provenance, without the need for expert palynologists. The use of historic collections to sample pollen directly from pollinators provided additional value to these collections. Sampling pollen from historic collections can potentially provide the spatial and temporal scales for investigations into changes in plant community structure or pollinator floral choice in the face of global climate change.


| INTRODUC TI ON
Our daily diet contains many plant products produced as a result of pollination, such as fruits, vegetables, nuts and seed-derived commodities. This crucial ecosystem service not only ensures food on our tables, but also the diversification and maintenance of natural plant populations (Daily et al., 1997;Klein et al., 2007;Kremen et al., 2007). Studying the interaction between plants and their pollinators has traditionally been done by field-based observation (Johnson, 1997;Wester, Stanway, & Pauw, 2009) and palynology (Dafni, 1992;Wilcock & Neiland, 2002). These methods are tedious and time-consuming, and require experts in the fields of palynology and taxonomy to identify both the pollen and the pollinator. Similar pollen morphologies, especially from closely related taxa, further complicate plant identification by palynology (Hargreaves, Johnson, & Nol, 2004;Mullins & Emberlin, 1997;. These requirements have limited studies on plant-pollinator interactions for many pollinator genera, particularly in species-rich regions where there is an abundance of both plants and pollinators.
Taxonomic activities in the areas of entomology and botany drive pollinator and palynology-related work, but studies are often conducted independently from each other. Samples are often collected for taxonomic purposes, such as species identification, distribution pattern determination or identifying new introductions.
Individual specimens are labelled with descriptive collection information, including collection date, location, collector and other relevant information before being stored in collection (Pennisi, 2000).
Flower-visiting animals housed within natural history collections may have pollen on their bodies. Although flower visitors were likely not collected with the aim of utilizing the pollen that was inadvertently collected along with the specimen, this pollen holds important information on the plants visited by the insect visitor, the identity of a possible pollinator and the plant community structure where the organism was collected. Additionally, a number of specimens from the same area, but from different temporal points, can be used to provide a chronological map of the area's plant and pollinator history.
Historic collections may therefore provide a meaningful resource to investigate, not only pollinator-plant interactions over time, but also plant community change over time, providing important information on diversity and distribution.
DNA barcoding allows for identification and classification of organisms based on a short nucleotide sequence (Hebert, Cywinska, Ball, & deWaard, 2003). Ideal DNA barcodes have significant interspecific genetic variation and are flanked by conserved regions for universal primer binding to allow easy amplification across a wide range of taxa (Kress & Erickson, 2008). There is still debate on the optimal DNA barcode for plants (Dong et al., 2015;Ferri et al., 2015;Kress, García-Robledo, Uriarte, & Erickson, 2015), but the ribulose-1,5-biphosphate carboxylase (rbcL) and maturase K (matK) chloroplast genes have been suggested as good candidate genes to target (CBOL Plant Working Group, 2009). Other chloroplast genes and regions have also been used successfully to barcode plants and pollen, including trnL (Kraaijeveld et al., 2015;Valentini, Miquel, & Taberlet, 2010), rpoC1 and trnH-psbA (CBOL Plant Working Group, 2009). Another well-utilized marker is the internal transcribed spacer 2 (ITS2) region that is found between the 5.8S and 26S rRNA genes in plants Yao et al., 2010). ITS2 has been used as the DNA barcode in recent pollen barcoding studies (Bell et al., 2017;Keller et al., 2015;Sickel et al., 2015). ITS1 can be similarly useful at identifying plants to species level (Wang et al., 2015). Generally, a multi-locus approach to identification yields better results due to increased discriminatory power (CBOL Plant Working Group, 2009;Burgess et al., 2011 andas reviewed in Bell et al., 2016).
Mixed origin, environmental samples, such as pollen, are characterized by the presence of DNA from different organisms that may or may not be degraded. Next-generation sequencing (NGS) technologies now allow high-throughput sequencing of complex DNA libraries (Liu et al., 2012) including pollen (Bell et al., 2017;Keller et al., 2015;Kraaijeveld et al., 2015;Sickel et al., 2015). Coupling together NGS and DNA barcoding (metabarcoding) could improve on the identification assessments of pollen compared to traditional microscopic identification methods employed by palynology.
In this study, we investigated the possibility of using a historical bee collection as a pollen source for ITS1, ITS2 and rbcL metabarcoding and examined the usefulness of this approach to identify plant species from small amounts of pollen carried by bee specimens collected over 100 years ago.

| Pollen sample collection from bee specimens
Selecting an appropriate bee species for this study was not only dependant on the availability of the species within the collection, but also when specimens were collected. Pollen loads of the specimens can also vary depending on whether the individual bee was captured on their way to or on their way from a floral visit. Based on these criteria, Megachile venusta Smith (Megachilidae) specimens were selected from the South African National Collection of Insects housed at Biosystematics, Plant Protection Research: Plant Health, of the Agricultural Research Council (ARC), Pretoria, South Africa. The collection is utilized for taxonomic classification of indigenous bee species, and houses type specimens.
Megachile venusta is indigenous to southern Africa, and bee specimens used for pollen sampling were collected from different biomes across South Africa over a period of 93 years .
Three bee specimens with visible pollen from each decade, starting from the 1910s up to the 2000s, were selected for inclusion in this study. No specimens with visible pollen were available for the 1930s or 1950s, and these decades are thus not represented here. Only one specimen was available and included for the 1940s. Accession information of the bee specimens used in this study is provided in Supporting Information Table S1.
Pollen samples from the selected M. venusta specimens were removed from bee abdomens using sterile micropipette tips dipped in sterilized glycerol while viewing specimens with a stereo dissection microscope (SteREO Discovery.V8 microscope, Carl Zeiss Microscopy GmbH, Jena, Germany). Care had to be taken when working with the old, fragile bee specimens. Each pollen sample was transferred to a sterile 1.5-ml Eppendorf tube and pollen crushed with the micropipette tip while still under magnification. The micropipette tip for each sample was left inside the tube after scraping off the pollen to include any pollen that inadvertently entered the micropipette tip during scraping.

| DNA extraction
To ensure that grains were ruptured evenly across taxa, the DNA extraction protocol was optimized prior to the extraction of pollen samples from historic bee specimens. Both fresh pollen, arbitrarily collected from plants growing at the ARC Biotechnology Platform's grounds at Onderstepoort, Pretoria, and historical pollen on bee specimens were used as test samples for pollen extraction optimization. The following commercial kits were tested: QIAamp DNA Micro Kit (Qiagen, Hilden, Germany), DNeasy® Plant Mini Kit (Qiagen) and Nucleospin® DNA Trace Kit (Macherey-Nagel GmbH & Co. KG, Düren, Germany). All kits were used according to manufacturer's protocols. In the first experiment, three fresh pollen mixtures were used. Each of these three test samples was divided into two and extracted in parallel using the kits mentioned above, with one reaction subjected to 3 mm steel bead disruption using a TissueLyser II (Qiagen). The other reaction was not subjected to disruption.
In the second optimization experiment, pollen samples collected from six different bees were selected for DNA extraction optimization based on sample age. Bee specimen pollen loads did not vary significantly in size when judged by eye, but actual pollen grain count was not taken into account in selection. Two bee specimens were selected from three different decades, respectively (1980s, 1960s and 2000s). Pollen samples from each decade were used for DNA extraction with the DNeasy® Plant Mini Kit (Qiagen), one sample with and one sample without bead disruption. Bead disruption was performed for 2 min at 25 Hz, with addition of lysis buffers both before and after disruption in different samples. Direct amplification from the pollen template was also tested on fresh pollen, as previously performed by Petersen, Johansen, and Seberg (1996).
DNA from pollen collected from 22 M. venusta (Supporting Information Table S1) was extracted using the DNeasy ® Plant Mini Kit (Qiagen) without bead disruption, as this method produced the most consistent results. Lysis buffer AP1 and Proteinase K (0.2 mg/ ml) were added directly to the Eppendorf tubes containing the micropipette tips used for scraping pollen off the bees. Before transferral of the lysate to the QIAshredder Mini Spin Columns, the micropipette tips were carefully removed using a pair of sterile forceps and excess liquid expelled with a micropipette. In all cases, equipment was cleaned with 10% bleach and 70% ethanol solutions between sampling to avoid cross-contamination. The remainder of the DNA extraction was performed following the manufacturer's protocol, with the elution step using the protocol recommendation for increasing DNA yield with a minor modification: The 20 μl eluate was reapplied to the DNeasy Mini Spin Column and eluted for a second time.

| Barcode amplification and sequencing
Three regions were targeted for DNA barcoding to identify pollen origins, namely, ITS1, ITS2 and rbcL, with primers available in the literature for ITS1 and ITS2 (White, Bruns, Lee, & Taylor, 1990) and rbcL (de Vere et al., 2012;Fazekas et al., 2008;Kress, Wurdack, Zimmer, Weigt, & Janzen, 2005). Primers were modified by adding overhang adapters compatible with the standard Illumina indexing PCR as described in the Illumina 16S Metagenomic Sequencing Library Preparation Guide (Illumina, 2013). The modified primer sequences are given in Table 1. Amplification of the rbcL region was poor, despite extensive optimization. Two reverse primers were tested for rbcL due to poor amplification and sequencing results. Amplification products using primer rbcLajf634R_Tag_IL produced some sequence reads, whereas sequencing products after amplification with rbcLr506_Tag_IL did not produce usable results.  Amplified barcodes were visualized using 2% agarose gel electrophoresis. ITS1 and ITS2 barcode sizes differ between plant taxa and ranged between 100 and 700 bp . The rbcL barcode ranged between 500 and 700 bp (Burgess et al., 2011;Fazekas et al., 2008).

| Bioinformatics and sample analyses
Sample demultiplexing was done using MiSeq Reporter v2.5.1 by separating the samples on perfect index matches. Quality and adapter trimming of reads were done using Trimmomatic 0.33 (Bolger, Lohse, & Usadel, 2014) using the script provided in Supporting Information Appendix S2. All sequences with a length below 50 were discarded, and a quality score of 20 was applied for bases with a sliding window of four bases. Nextera adapters were trimmed from sequences in this same step. Reads that passed quality trimming were merged in MacQiime 1.9.1-20150604 (Caporaso et al., 2010). Generalized linear models were used to test the relationship between the number of reads obtained post quality trimming and the age of the sample and quality score (Q-scores, prediction of the probability of an error in base calling) of sequences produced. The average Phred value (Qscore) for each sample was calculated from the raw ITS1 and ITS2 fragments before adaptor trimming. For each sample, a Phred score was calculated for each base position across all the fragments obtained for a sample. The average of these across the fragment was then calculated. Q-scores were calculated for forward and reverse sequences separately and then averaged to provide a single Q-score for each sample. The number of ITS1 and ITS2 reads was fitted as response and the age of the sample and Q-score as explanatory variables. A negative binomial regression analysis was performed, as the data were not normally distributed and there was evidence of overdispersion. Model fit was assessed using a likelihood ratio test.
Analyses were performed with R 3.5.0 (the R project for statistical computing) with a significance level of p < 0.05.
No curated, plant sequence database for ITS1 was available at the time of data analyses. A reference database was therefore constructed by downloading all Viridiplantae sequences for ITS from GenBank using custom Python scripts (Supporting Information Appendix S1). sanbi.org). (t = 4.48, p < 0.001) and ITS2 (t = 4.03, p < 0.001). Table 2 provides summary statistics for ITS1, ITS2 and rbcL processed reads.

| RE SULTS
The percentage of reads of both ITS1 and ITS2 assigned to the kingdom Viridiplantae varied between samples. Samples consisting of fewer than 1,000 reads that were classified to species level were regarded as unsuccessful and were discarded prior to further analyses. This cut-off was selected so that rare taxa identified will have at least two reads. The presence of unidentified reads did not influence the identification of plant origins of samples, even though a higher amount of total reads were necessary to reach sequence saturation for plant identification. Identification of rbcL reads to plant origins produced very variable results. In 45% of the samples, less than 1,000 reads were produced. Due to the extremely variable nature of amplification and sequencing results, rbcL data were not analysed further.
After removal of taxa representing <0.1% of reads per sample, only one or two plant genera per sample for ITS1 could be identified against the sequence reference database generated in this study.
Between one and eight plant species were identified per sample using ITS2. Rarefaction curves show that the sequencing depth for all samples was sufficient to obtain maximum taxon richness ( Figure 1). When all raw read data are included in rarefaction analyses, a maximum of ten species per sample for ITS2 was reached, with curves still reaching a plateau ( Figure S1), thus further indicated that sufficient sequencing depth was reached.
A total of 81.8% of ITS1 samples had more than 1,000 reads identified to Viridiplantae. One of the ITS2 samples only had 1,154 high-quality, merged reads that could be identified to Viridiplantae, but was still sequenced to saturation, as indicated by the rarefaction curve ( Figure 1b). A single ITS2 sample had less than 1,000 reads identified to Viridiplantae and was removed from further analyses, with 95.5% of samples remaining for further analyses.

| Plant origins of pollen collected from Megachile venusta specimens
When classifying sequence reads to the ITS1 database, two plant genera (Helianthus and Oryza) were identified. Helianthus was identi- With the confidence set at the recommended level of 85%, an average of four species, four genera and four families were identified per sample when classifying reads with the ITS2 database.
In total, 25 species from 21 different genera could be confidently identified with the ITS2 database. These species belonged to TA B L E 2 A summary of merged processed reads for ITS1, ITS2 and rbcL after next-generation sequencing. Numbers indicated are after reads were processed for quality with Q20 filtering, Nextera adapter trimming, fragments discarded that were less than 100 bp in length and forward and reverse reads merged  (Figure 3).

| Effect of sample age on sequencing success
Pollen This suggests that the age of sample did not have a negative impact on pollen barcode amplification. In contrast, Q-scores were a statistically significant predictor of the number of ITS1 (Z = 5.66, df = 19, TA B L E 3 Viridiplantae taxa that were not classified to specieslevel, but to genus and family-level. Two classes were also identified, Liliopsida and Magnoliopsida, as well as the order Cucurbitales

| D ISCUSS I ON
Historic specimen collections can essentially be seen as large, untapped resources of data, especially for plant-pollinator interaction investigations. This is particularly important for species where field observation is difficult. Honeybee pollination is well studied, and the NGS workflow tested in this study could significantly improve knowledge on less well-studied species. As pollinator specimens in historic collections usually have accompanying metadata, these specimens are invaluable to researchers interested in pollination and change in ecosystems or plant communities over time.
DNA barcoding has been used for years to successfully identify unknown plants (Burgess et al., 2011), and the recent uptake of pollen sequencing by the metabarcoding community (Galimberti et al., 2014;Hawkins et al., 2015;Keller et al., 2015;Kraaijeveld et al., 2015;Richardson, Lin, Sponsler, et al., 2015) has sparked new interest in the topic. In this study, DNA metabarcoding was used to determine the plant origins of limited pollen sampled directly from M. venusta bees taken from a historic collection. The pollen exine is exceptionally resilient ensuring DNA contained within the pollen grain maintains its integrity for a very long time, making ancient pollen studies possible (Parducci, Suyama, Lascoux, & Bennett, 2005).
The choice of DNA barcode to identify plant species has been controversial. This has led to the selection of a suite of markers identified as potential candidates (Hollingsworth et al. 2009;Hollingsworth 2011;Hollingsworth, Graham, & Little, 2011;Li, Gao, Poudel, Li, & Forrest, 2011). The utility of these on low concentration, degraded DNA from historical samples, has not been formally assessed. In this study, we examine the utility of three DNA barcode markers (ITS1, ITS2 and rbcL) to identify plants visited by pollinators from small amounts of pollen collected from an important bee collection. Sequencing results indicate that historic pollen identification using DNA barcoding on an NGS platform was most successful using ITS1 and ITS2, regardless of limited starting material or the age of the specimen. In contrast, rbcL produced variable amplification results between samples. Two different reverse primers were tested for this gene and only amplicons produced using the rbc-Lajf634F_Tag_IL primer produced sequence results. However, low clustering on the MiSeq flow cells occurred during both sequence runs, with variability between samples. RbcL has been successfully used before in pollen barcoding with both Sanger sequencing (Bruni, Galimberti, & Caridi, 2015;Galimberti et al., 2014) and with NGS . However, the amount of pollen used for DNA extraction in these cases was notably higher than that used in this study: 50 mg  and 100 mg (Galimberti et al., 2014). Plastids are maternally inherited in many floral taxa, and they will consequently not be present in all pollen grains (Bennett & Parducci, 2006;Corriveau, Goff, & Coleman, 1990 reads were necessary to reach sequence saturation, with the upper limit consistent with previous pollen metabarcoding results (Sickel et al., 2015).
The lower number of plant taxa identified per sample in this study is concordant with the observed floral constancy behaviour in foraging bees (Michener, 2000), where bees tend to visit flowers from plants of the same taxa during one foraging trip as long as this resource remains available. Pollen in this study was sampled directly from bee specimens that were actively foraging during their capture, and therefore, the lower number of plant taxa obtained is not surprising. Sampling pollen from pollen traps  or honey (Bruni et al., 2015;Hawkins et al., 2015) is expected to yield considerably more plant taxa as these pollen samples originate from multiple bees and cover many foraging trips. Indeed, the plant families identified here using metabarcoding correspond with the foraging information known for M. venusta, with the Asteraceae, Fabaceae and Poaceae associations observed for the species in South African (Eardley, 2013 The identification from a reference database will also only occur if the specific species was barcoded before, correctly classified and phylogenetically assigned. The International Barcode of Life (iBOL, www.ibol.org) project aims to achieve this. In this study of pollinator-plant interactions in a hyper-diverse region, more confidence should be placed in higher-level classifications, with family-level interpretation likely being the most accurate. platform. This shows that museum collections could indeed be a valuable resource in pollinator-plant studies. DNA metabarcoding was used to identify the plant origins in pollen. Using ITS2 as a barcode provided much better resolution for plant classification than ITS1.

| CON CLUS ION
Multi-locus approaches to DNA barcoding for plants are recommended, and ITS1 data should therefore be considered with ITS2.
Species-level plant classification is possible with ITS2, but without a comprehensive local plant sequence reference database, familybased interpretations are more reliable.