From feces to data: A metabarcoding method for analyzing consumed and available prey in a bird‐insect food web

Abstract Diets play a key role in understanding trophic interactions. Knowing the actual structure of food webs contributes greatly to our understanding of biodiversity and ecosystem functioning. The research of prey preferences of different predators requires knowledge not only of the prey consumed, but also of what is available. In this study, we applied DNA metabarcoding to analyze the diet of 4 bird species (willow tits Poecile montanus, Siberian tits Poecile cinctus, great tits Parus major and blue tits Cyanistes caeruleus) by using the feces of nestlings. The availability of their assumed prey (Lepidoptera) was determined from feces of larvae (frass) collected from the main foraging habitat, birch (Betula spp.) canopy. We identified 53 prey species from the nestling feces, of which 11 (21%) were also detected from the frass samples (eight lepidopterans). Approximately 80% of identified prey species in the nestling feces represented lepidopterans, which is in line with the earlier studies on the parids' diet. A subsequent laboratory experiment showed a threshold for fecal sample size and the barcoding success, suggesting that the smallest frass samples do not contain enough larval DNA to be detected by high‐throughput sequencing. To summarize, we apply metabarcoding for the first time in a combined approach to identify available prey (through frass) and consumed prey (via nestling feces), expanding the scope and precision for future dietary studies on insectivorous birds.

In this study, we test the metabarcoding methods in describing a boreal food web between insectivorous birds (four species of parids) and their invertebrate prey. According to previous studies (based on morphology, direct observations, etc.), breeding parids in northern Europe forage mainly on moth caterpillars feeding on birch leaves (Rytkönen, Koivula, & Orell, 1996;Rytkönen & Krams, 2003). Here, we will trial techniques to identify the diet for several bird species in the same food web in much greater detail than was previously possible. In addition, the availability of potential invertebrate prey in the birds' habitat is also monitored using the same methods. The prey consumed by the predators can be determined from prey DNA in the predators' feces (e.g., Vesterinen, Lilley, Laine, & Wahlberg, 2013;Vesterinen et al., 2016;Vesterinen, Puisto, & Blomberg, 2018;Wirta et al., 2015). Correspondingly, the available prey species (Lepidoptera) can be determined by collecting the feces of larvae (hereafter frass). To our knowledge for the first time in this context, we apply DNA metabarcoding to samples from both bird feces and insect frass to observe consumed prey and evaluate the food availability in the wild. Sampling of frass is already commonly used to determine caterpillar abundance in the canopy (e.g., Zandt, 1994;Rytkönen & Orell, 2001), as feces are produced in proportion to caterpillar biomass (Tinbergen & Dietz, 1994).
Firstly, we focused on whether we can extract, amplify and sequence the invertebrate DNA from both the bird and the insect feces. Secondly, we tested how well the sequences are identifiable on the grounds of the extensive regional DNA barcode reference library built by the national initiative, the Finnish Barcode of Life project (FinBOL, www.finbol.org), with the data maintained by BOLD Systems (Ratnasingham & Hebert, 2007). Thirdly, we investigated how much material is needed for accurate species identification from frass samples using a simple laboratory experiment with a single moth species. Finally, we analyzed whether varying amounts of DNA sequence material can be used to estimate the quantity of prey that is available and used by the predators. The success in this approach may open new perspectives in the studies of diet selection, food webs, and ecosystem functioning, including interspecific competition and resource partitioning.
Samia cynthia (Lepidoptera, Saturniidae) caterpillars used for the laboratory experiment were reared in university facilities. Animal handling was done according to ethical guidelines presented by the National Animal Experiment Board.

| Collection of nestling feces
The fecal samples of parid nestlings were collected in 2015 when the nestlings were about two weeks old (nearly three weeks for the Siberian tits), around mid-June. All feces acquired at a nest (typically 1 to 3 fecal sacs) were collected in the same 5-ml plastic tube containing 96% ethanol. The tubes were stored in a freezer at −20°C until analysis. The total number of fecal samples analyzed in this study was 14 (willow tits: 4, great tits: 4, blue tits: 2, Siberian tits: 4). This number was deemed adequate for a proof-of-concept study while simultaneously being manageable, and we tried to keep sampling effort even between the study species.

| Frass collection in the wild
Frass of moth caterpillars was collected 1996-2014 using plastic funnels (Ø 35 cm) attached to tree trunks under the canopy. Only birches (Betula spp.) were selected for the frass collection because they are known to form the most important foraging habitat during the nestling period of the focal birds (Rytkönen & Krams, 2003). The frass was collected into paper coffee filters attached under the funnels.

| Laboratory experiment to test DNA barcoding with frass
To analyze how much frass is needed for successful species determination via metabarcoding, we used Samia cynthia (Lepidoptera, Saturniidae) as a model species. The larvae originated from a commercial laboratory stock. Eggs were housed at 20°C. Once the eggs hatched, larvae (N = 20) were divided evenly between two clean plastic containers provided with mesh lids to prevent the larvae from escaping.
The larvae were reared at 20°C (light:dark cycle: 12 hr:12 hr) and fed ad libitum on fresh leaves of the cherry (Prunus cerasus, Rosaceae), which is a known host plant of the species. In the fifth larval instar, frass of the larvae was collected immediately after being observed in a container, and stored in a freezer at −20°C. Before analysis, two frass sets each produced by several individual caterpillars was dried, grinded, mixed, and divided into weight classes: 0.1, 0.3, 1.2, 4.2, 14.4, and 50 mg.
Samples were weighed with a precision balance (Mettler Toledo MT 5).

| DNA extraction of all samples
Before DNA extraction, the samples were unfrozen, homogenized, and dried at 50°C until the ethanol had vaporized. We used the QiaAmp Fast DNA Stool Mini Kit (ID: 51604, QIAGEN), which is specifically developed for fecal samples, for the extraction, and followed the manufacturer's instructions. The extracted DNA was further concentrated by evaporating samples in vacuum. A negative control treatment was carried out with each extraction batch, containing all the same chemicals but without any DNA sample. No products were formed in any of the extraction blanks, and they were not used in further sequencing analysis.

| PCR and library construction
The DNA of fecal samples is generally highly fragmented (Deagle, Eveson, & Jarman, 2006), and therefore, short mini-barcode primers (ZBJ-ArtF1c/ZBJ-ArtR2c) were selected for amplification (Zeale, Butlin, Barker, Lees, & Jones, 2011). These primers selectively amplify a 157 bp long target in mitochondrial cytochrome c oxidase subunit I (COI) gene in arthropods. Although the use of COI as a standard DNA barcode marker has received some critique for taxonomic bias (Clarke, Soubrier, Weyrich, & Cooper, 2014; Elbrecht & Leese, 2017), it has nevertheless been successfully applied in numerous studies and is by far the most commonly applied primers in studies targeting arthropod prey (e.g., Vesterinen et al., 2013, Clare et al., 2014, Wirta et al., 2015, Kaunisto et al., 2017. COI barcodes also have been used in one of the few dietary studies combining molecular data from both consumed arthropods and available arthropod prey (Vesterinen et al., 2016). Moreover, the complete COI reference library for ~2,600 species of Finnish Lepidoptera has proven its high functionality (Mutanen et al., 2016). Short sequences have previously been shown to be highly informative, allowing differentiation of species even in notoriously highly similar Lepidoptera (Hajibabaei et al., 2006;Mutanen, Kekkonen, Prosser, Hebert, & Kaila, 2015).
In addition to the locus-specific primers, linker sequences that allow for the easy inclusion of unique tags and adaptors can be attached in the second round (Clarke, Czechowski, Soubrier, Stevens, & Cooper, 2014). The first PCR step used the following primers: The first step PCR was performed in 25 μl total volume including 12.5 μl Phusion ® Flash High-Fidelity PCR Master Mix with GC Buffer (Thermofisher), 0.5 μM of forward and reverse primers, 2.5 μl of DNA template, and 7.5 μl of sterile water. PCR cycling profile was as follows: first denatured at 98°C for 2 min, followed by 38 cycles of 98°C for 10 s, 58°C for 30 s, and 72°C for 15 s, and then, after the last cycle hold at 72°C for 5 min. Two PCR reactions were prepared for each sample to offset PCR stochasticity and enhance detection of rare taxa (Alberdi, Aizpurua, Gilbert, & Bohmann, 2018), and combined after the PCR. Amplified PCR products were then purified with AMPure XP (Agencourt) at a 1.2× ratio, and concentration was measured using the MultiNA capillary electrophoresis system (Shimadzu).
In the second PCR stage, we used primers with unique 9-basespecific index-tags to track each individual sample. Besides index-tags, the primers had Ion Torrent PGM-specific adaptors A and TrP1 as well as universal linkers (the same as first-stage primers). The primer sequences used in the second step were (index-tags marked with "N"): The second PCR contained 25 μl Phusion ® High-Fidelity Flash PCR Master Mix with GC Buffer (Thermofisher), 0.5 μM of forward and reverse primers, 10 ng of the purified PCR product from the previous step, and the total volume was brought to 50 μl with sterile water.
Samples were initially denatured for 90 s at 98°C, followed by eight amplification cycles: 98°C for 10 s, 63°C for 20 s, and 72°C for 20 s, and after the cycles 72°C for 5 min.
The second PCR products were purified with AMPure XP at a 1.0× ratio and concentration was determined via picogreen dsDNA reagent (Thermofisher) after which the samples were pooled in equimolar ratios (25 ng of each sample). Pooled sample was further purified with AMPure XP at first in a 1.0× ratio, then double-sided 0.6× and 1.2× ratios and finally at 1.0x ratio. Prior to sequencing, library profile was checked using the MultiNA capillary electrophoresis system and final concentration was determined with the picogreen reagent.

| Sequencing of the pooled library
The sequencing of DNA was done at the Biocenter Oulu Sequencing Center, University of Oulu, by using Ion Torrent PGM device with a 316 chip. The manufacturer's instructions were followed and the following packages used: Ion PGM™ Template OT2 400 kit (Thermofisher) and Ion PGM™ Hi-Q™ Sequencing Kit (Thermofisher).

| Bioinformatics for the Ion Torrent data
Sequencing resulted in 3,484,589 raw reads of which about 1.8 million were COI amplicons. Resulting reads were processed with QIIME software (version 1.8.0, Caporaso et al., 2010). The raw FASTQ file was split on the basis of sample-specific index-tags into their own groups and low quality (average quality <Q20) raw reads were discarded (split libraries command). For passed reads, the presence of both forward and reverse primer sequence (from the first PCR stage) was required. Next, the reads were clustered based on their similarity (pick_otus de novo command) into Operational Taxonomic Units (OTUs). As sequences are particularly similar for the anticipated main prey (lepidopterans; Hebert et al., 2003), a low clustering value could result in clumping of closely related species together. Thus, we decided to use the 98% similarity threshold for clustering. For each OTU, a random sequence was selected as a representative sequence for the cluster (pick_rep_set command). Finally, an OTU table was made (make_otu_table command). OTUs with low abundance (<8 reads in all samples combined) were discarded from further analysis.
Finally, 760 OTUs remained to be assigned to biological species.

| Assignation of OTUs to species
We compared the OTUs to the international BOLD database (Ratnasingham & Hebert, 2007). Sequences were manually compared to the full BOLD database, which includes all the sequences uploaded into the BOLD systems, also those with no proper species assignation. The regional DNA barcode reference library is unusually comprehensive for insects, and for the most important prey group of tits, the Lepidoptera, its coverage is 100% (all species occurring in the area represented). The details of the assignation of the OTUs to species is in the supplements (Supplement 1: Assignation of OTUs to species).
The following basic principles were used in the determination of sequences: 1. 97%-100% match to the sequence in the database was required for the species determination, whereby the best match would be chosen as the species corresponding to the sample.
2. 97%-100% similarity to the database was also used as the threshold for genus determination.
3. If the species with the best match does not occur in the study area, a resident species with the next best matching sequence was selected instead, but only when the sequence showed >97% similarity to that species too.

The determination was left at the genus level (or another higher
taxonomic level) if two species (or higher taxon) were equally probable on the basis of similarity of the sequences. The determination was also left at a higher taxonomic level if no sufficiently similar species (or higher taxon) could be found among the sequences in the database.

The determination was left to the order or family level if similarity
was less than 97%.
6. Similarities below 90% were classified as unidentified to any taxonomic level. Often, the 20 best matches of such undefined sequences could belong to wholly different orders rendering class-level determinations unreliable. Three OTUs were determined to species level at 95% identity, as a different OTU from the same sample had already been determined to the same species based on 100% identity. In a similar fashion, nine others were determined to the genus level. Finally, another four were determined to the genus level despite below 97% identity as the majority of their similar hits belonged to the same genus.

| Statistical analyses
The R Statistical Environment (R Core Team, 2013) was used for the parametric tests and linear modeling on the relationships between sample size and metabarcoding success, as well as for the correlation analysis between prevalence and read abundance.

| Success in DNA extraction and PCR
DNA extraction and PCR yielded sufficient DNA for further analysis in 31 of 50 samples (62%). This can be broken down to the following for the various categories of samples: 11/14 fecal samples of birds (79%), 8/24 frass samples of wild moth larvae (33%), and 12/12 frass samples of laboratory-reared larvae (100%). The oldest successful frass samples were 20 years old (dating from 1996). The bird feces were collected during the preceding breeding season (2015), that is, about one year old. The success rate of DNA extractions from frass samples from the field sites seemed to be related to preservation method and the amount and quality of plant material in the samples. For dried frass, 4/4 attempts (100%) were successful, but for frozen frass samples, which included more plant fibers that were difficult to grind, this number decreased to 4/20 (20%).
The list of all identified species, split between frass and parid samples, is presented in Supporting Information Appendix S1. The vast majority of identifications in both data sets belonged to the order Lepidoptera: 84% of identified sequences and 80% of species detected in bird feces, while this was respectively the case for 58% and 64% in the frass data.

| The amount of fecal material and metabarcoding success
The mass of wild moth frass samples was not related to success in metabarcoding (i.e., successful and unsuccessful samples did not differ in their mass: t = −0.082, df = 8, p = 0.936). As these samples concern bulk material, however, no data of frass mass for individual moth species were available. The frass mass of the laboratory-reared Samia cynthia was significantly related to success in metabarcoding. The best model included dry mass as log-transformed variable ( Table 1). The proportion of correct identifications increased sharply with increasing sample dry mass, especially when dry mass exceeded 5 mg (Figure 1). The corresponding frass mass in field collections could be achieved by collecting ca. 20 average-sized frass pellets (per species); although the number cane range from 2 to 100 frass pellets due to substantial variation in their size (Figure 2).

| Quantitative data on species abundance
We found a correlation between the number of different DNA sequences (in a sample) and the fraction of samples including a species (prevalence). The higher the number of sequences belonging to the same species, the more likely it is that these species are prevalent in our data, both in frass and bird samples (r = 0.807 and 0.755, df = 15 and 39 for frass and bird data, respectively, both p < 0.001). It should be noted, however, that this effect is mainly caused by the high variation in sequences for a few species, as can be seen in Figure 3.

| D ISCUSS I ON
Our study aimed to test the feasibility of conducting a combined research on the feces of both arthropod prey and their avian predators, in order to obtain insights into prey abundance and the corresponding predator diet. The dietary results we obtained match those in previous observations. Whether expressed as the proportion of species (77%), unique sequences (84%) or samples they occurred in (79%), the proportion of lepidopteran species in the birds' feces is similar to the three quarters of prey items found for great TA B L E 1 Specifications of the best linear model explaining the relationship between frass sample dry mass (mg) and the percentage of correct species identifications in the feeding trial (where all samples were known to be of Samia cynthia)

F I G U R E 1
The relationship between frass sample dry mass (mg) and percentage of correct identifications in the feeding trial of lepidopteran species Samia cynthia and blue tits (Rytkönen & Krams, 2003) and the 20%-80% for willow tits (Rytkönen et al., 1996). By using metabarcoding combined with an expansive reference library, however, this large generalized group could be further investigated, and we were frequently able to identify prey accurately to the species level. The arthropod species encountered in the diet and environment contained those one would expect (e.g., moth species whose caterpillars feed on birch leaves), and the frass collection specifically also included nonlepidopterans (e.g., spiders, dipterans) accidentally trapped in the funnels. Although the latter could complicate quantitative comparisons across all species, their confirmed presence in the birds' habitat would still be of use in qualitative analyses. Only some species found in the nestlings' feces were also encountered in the frass samples and vice versa. Given the multitude of insect species occurring in the area, only a fraction was observed using either method, thus leading to a small overlap between the two. Our limited sample size is partially to blame for this, with most species only being found in one or two samples (see Supporting Information Appendix S1). The phenomenon of low overlap between available and detected prey is not uncommon even with larger sampling size however (Eitzinger et al., 2018;Vesterinen et al., 2016).
By experimentally manipulating the amount of fecal mass analyzed, we showed that the used metabarcoding method is powerful enough to detect DNA sequences from small amounts of material.
However, a certain threshold level seems to exist, which may affect the detectability of scarce prey species in bulk samples. Based on the data from the laboratory-raised moth species, the threshold to detect DNA with high probability was about 5 mg of dry frass mass.
At present, we do not know if the problems with smaller samples already occur in the DNA extraction phase, during PCR, or whether some sequences are filtered from data due to their rarity. We also did not test if the amplification of the total genomic DNA might have a positive effect on this. Based on our long-term frass data, the threshold value of 5 mg dry frass mass corresponds to ca. 20 average-sized frass pellets, ranging from 2 to 100 depending on the frass pellet size (Figure 2). It is therefore recommended that frass samples used for metabarcoding analyses should be large where possible. On the other hand, having several separate frass samples may be used to obtain the quantitative prevalence data for the estimation of relative abundances of different moth species.
Our results show that invertebrate prey species can be identified from both predator feces and feces of the prey species. When DNA from predators' diets and their available prey species are representatively sampled, one could obtain detailed information not only on the diet of predators and the food web structure, but also on the predators' preferred prey and niche differences between species. Existing knowledge on the ecology of parids (e.g., Rytkönen et al., 1996, Rytkönen & Krams, 2003 allowed us to focus on particular sections of their habitat: the birches, inhabited by moth caterpillars, which are the main foraging site during the nestling period. Metabarcoding analysis of these frass samples extends the traditional method of frass sampling to determine the overall abundance of caterpillars in the canopy (Tinbergen & Dietz, 1994) by allowing the detection of specific prey species.
Moreover, although DNA metabarcoding is typically used for qualitative analyses of species, it may nonetheless be possible to acquire insights regarding quantitative data on species' abundances as well (see Deagle et al., 2018). This can be done by utilizing intra- Given the limitations of our study, most notably the small sample size as well as the fact that bird and frass samples were collected in different years, it is too early to draw any sweeping conclusions on the ecological interactions within this food web.
Indeed, the substantial variation in identified species in different samples indicates a requirement for a large number of samples, both for the bird feces and frass. Given fluctuations in arthropod numbers between the years, simultaneous collections of both types of samples would be preferred whenever possible, especially since our results suggest a broad range of arthropods to be predated upon by the birds. If already available, however, we have found that well-stored historical samples can still yield good results using metabarcoding.
In this study, we demonstrated that metabarcoding can be employed to both determine a predator's diet from their feces (bird nestling feces) and the availability of those prey from their own feces (moth caterpillar frass), thus being applicable to trophic interaction studies. To our knowledge, this is the first time when the available insect prey species were successfully identified from prey feces using DNA metabarcoding. To validate our approach, we carried out a laboratory feeding experiment, and to prove the power of the method, we analyzed the diet of four widespread bird species. As both nestling feces and insect frass can be collected easily using harmless and unobtrusive methods, this methodology is ideal for many applications. With advances in sequencing technologies, detecting prey DNA from predator feces has recently become a norm in diet studies of many insectivorous animals (e.g., Clare et al., 2009;Wirta et al., 2015;Kaunisto et al., 2017;Vesterinen et al., 2018). Equipped with another new tool to dissect the diet of animals, we hope that our study opens a new wave of research utilizing both available and consumed prey.

ACK N OWLED G M ENTS
We thank Kvantum Institute at the University of Oulu, Academy of Finland (grants #258638 and #283609), Finnish Cultural Foundation (North Ostrobothnia Regional Fund), and Jane and Aatos Erkko Foundation for supporting the ongoing project, P. Tokola for providing the Samia larvae and S. Haapala for rearing them in the lab.

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R S CO NTR I B UTI O N
SR, EV, MH, and MO were primarily responsible for the collection of field samples (SR&EV great tit and blue tit, MH Siberian tit, MO willow tit). TL participated in the lab work. MS was responsible for the sequencing. PV, MM, and CW worked on the arthropod identification. EJV contributed to the final analyses. Writing and revision of the manuscript were primarily conducted by CW, EJV, SR, EV, and TL (methods), with contributions from the other authors.

DATA ACCE SS I B I LIT Y
OUT sequences and OTU