One‐locus‐several‐primers: A strategy to improve the taxonomic and haplotypic coverage in diet metabarcoding studies

Abstract In diet metabarcoding analyses, insufficient taxonomic coverage of PCR primer sets generates false negatives that may dramatically distort biodiversity estimates. In this paper, we investigated the taxonomic coverage and complementarity of three cytochrome c oxidase subunit I gene (COI) primer sets based on in silico analyses and we conducted an in vivo evaluation using fecal and spider web samples from different invertivores, environments, and geographic locations. Our results underline the lack of predictability of both the coverage and complementarity of individual primer sets: (a) sharp discrepancies exist observed between in silico and in vivo analyses (to the detriment of in silico analyses); (b) both coverage and complementarity depend greatly on the predator and on the taxonomic level at which preys are considered; (c) primer sets’ complementarity is the greatest at fine taxonomic levels (molecular operational taxonomic units [MOTUs] and variants). We then formalized the “one‐locus‐several‐primer‐sets” (OLSP) strategy, that is, the use of several primer sets that target the same locus (here the first part of the COI gene) and the same group of taxa (here invertebrates). The proximal aim of the OLSP strategy is to minimize false negatives by increasing total coverage through multiple primer sets. We illustrate that the OLSP strategy is especially relevant from this perspective since distinct variants within the same MOTUs were not equally detected across all primer sets. Furthermore, the OLSP strategy produces largely overlapping and comparable sequences, which cannot be achieved when targeting different loci. This facilitates the use of haplotypic diversity information contained within metabarcoding datasets, for example, for phylogeography and finer analyses of prey–predator interactions.


| INTRODUC TI ON
Diet studies are critical to the understanding of species interactions, trophic structures, and trophic dynamics (Nielsen, Clare, Hayden, Brett, & Kratina, 2018). They have been applied to a vast set of issues in ecology, evolution, and conservation, such as predator/prey interactions and habitat use (Corse et al., 2010;Sánchez-Hernández, 2014), trophic niche partitioning Trevelline et al., 2018), and the delineation of habitats for guiding species conservation (Quéméré et al., 2013), management (Chivers et al., 2013), and habitat restoration (Motte & Libois, 2002). Diet studies have also proved critical in interfacing agriculture and ecology by assessing the effects of agricultural practices or policies on the trophic behaviors of species (Llaneza & López-Bao, 2015;Mollot et al., 2014) and by evaluating the ecosystem services of wild species, such as in the control of crop pests (Aizpurua et al., 2018;McCracken et al., 2012).
Previously, we developed a benchtop-to-desktop workflow to analyze the diet of an invertivorous fish. We evaluated the taxonomic coverage of two primer sets targeting the 5′ end of the barcode region of the cytochrome c oxidase subunit I gene (COI) in silico, in vitro (using tissue-derived invertebrate DNA), and in vivo (using fish feces; Corse et al., 2017). Here, we introduce a new primer set designed to reduce false negatives. We combined it with the previous two primer sets using a "one-locus-several-primer-sets" (OLSP) strategy to assay environmental samples. We first assessed the taxonomic coverage of the three primer sets with in silico analyses of current available barcodes. We then conducted an in vivo evaluation of the taxonomic and haplotypic diversity coverage and complementarity of the three primer sets on environmental DNAs (eDNA) obtained from a variety of materials as follows: feces from within the same MOTUs were not equally detected across all primer sets. Furthermore, the OLSP strategy produces largely overlapping and comparable sequences, which cannot be achieved when targeting different loci. This facilitates the use of haplotypic diversity information contained within metabarcoding datasets, for example, for phylogeography and finer analyses of prey-predator interactions. invertivores living in brackish water, freshwater, and terrestrial habitats, and spider webs. Finally, we brought new insights into the diet of the brackish water fish Pomatoschistus microps (Krøyer, 1838) and of the African freshwater fish Epiplatys infrafasciatus (Günther, 1866), and we assessed the biodiversity of invertebrates trapped in spider webs from the Amazon rainforest.

| Primer design
In the workflow we developed previously (see Corse et al., 2017), we used two primers sets (MFZR and ZFZR; for details, see  Table 1 for details).

| In silico evaluation of primers
The taxonomic coverage of seven primers sets (of which three include newly designed primers; see org; Ratnasingham & Hebert, 2007) in February 2017. Sequences were clustered with VSEARCH v2.9.0 (Rognes, Flouri, Nichols, Quince, & Mahé, 2016) implemented in Primerminer using a 3% dissimilarity threshold to avoid redundancy, and then, the majority consensus sequences of the clusters were aligned. Only sequences that completely covered the primer annealing site were considered. The number of consensus sequences varied among taxa from 1 to 2075 (median 32; considered taxa listed in Supporting Information Table   S1). Primerminer then provided a penalty score: We used the default value (i.e., 120) to determine whether the consensus sequences should be successfully amplified (score < 120) or not (score > 120) by the primers.

| In vitro selection of primers
The primers sets LepLCO/McoiR1, LepLCO/McoiR2 (LFCR), and LepLCO/MLepF1-rev were assayed using the DNA from 16 distinct specimens of four invertebrate species (further details in Supporting Information Appendix S1). We selected the primer set that unambiguously amplified all 16 samples: LFCR.

| In vivo evaluation of primers
The taxonomic coverage of MFZR, ZFZR, and LFCR was empirically evaluated through metabarcoding of 107 eDNA samples (Table 2). DNA was extracted from samples following Corse et al. (2017), and the safety measures to prevent cross contamination are described by Monti et al. (2015). Our analysis also included extraction, aerosol, PCR, and tag negative controls (respectively,  were 11-13 nucleotide long sequences differentiated by at least three different nucleotides (for details see: Corse et al., 2017).
These tags were added on the 5′ end of the primers (12 and 8 distinct tags were used to label forward and reverse primers, respectively). Amplicons were then processed and sequenced on an The taxonomic assignment of each variant/contig was conducted (a) automatically using the lowest taxonomic group approach (Corse et al., 2017) and (b) manually using BOLD systems. When assignment levels were insufficient or when the two approaches conflicted, a third assignment method was conducted by building phylogenetic trees and/or considering biogeographic information (Corse et al., 2017). The combined use of these three assignment approaches led to a final taxonomic assignment for each variant/ contig. After an initial round of filtering and taxonomic assignment, the variants that were unexpected given their source habitat were identified. Based on the frequency of these unexpected variants, a second round of filtering was run using adjusted LFN thresholds that maximize the elimination of unexpected variants (Supporting Information Figure S1). Finally, to standardize the evaluation of coverage and complementarity between primer sets, all validated variants/contigs were clustered into molecular operational taxonomic units (MOTUs) based on a 3% divergence threshold using complete-linkage clustering.
Since all predators in this study were mainly invertivores, we assumed that most macroinvertebrates (and vertebrates) constituted relevant prey and referred to them as "Macrometazoans." Hence, items that most likely result from passive ingestion or secondary predation such as microinvertebrates (e.g., Amoebozoa, Acari, Tardigrada, Rotifera), diatoms, algae, and plants, as well as potential parasites (e.g., Acanthocephala, Nematoda), were excluded from the analyses (for a similar approach, see Hardy et al., 2017).
To evaluate the coverage and complementarity of the three primer sets, we looked at their performance in detecting different prey groups from the 107 eDNA samples at various taxonomic levels as follows: Phylum, Class, Order, Family, MOTU, and variant. The coverage of each primer set was estimated through the coverage ratio Bc (Ficetola et al., 2010). Here, Bc corresponds to the ratio between the number of taxa detected in samples by a given primer set and the total number of taxa detected by all three primer sets. The complementarity (Com) of the primer sets was assessed by dividing the number of prey items detected by one primer set only by the total number of detected taxa. In addition, we measured samples' pairwise differences in diet composition between primer sets (Wsd for within-sample dissimilarity) with pairwise the Bray-Curtis index (Bray & Curtis, 1957). Finally, pairwise Bray-Curtis dissimilarities between samples were also calculated (Bsd for between-sample dissimilarities). Kingdom or NA = 0), and then, a mean score among the variants was calculated for each sample.

| In silico evaluation of primers
The in silico analysis revealed that theoretical amplification success was generally quite low and varied strongly across primer sets and target taxa (Figure 1).

| In vivo evaluation of the coverage and complementarity of primer sets
The complementarity (Com) of primer sets depended both on the taxonomic level and on the predator type (Figure 3b). The mean Com index differed significantly between the taxonomic levels (Kruskal-Wallis test; Χ 2 = 88.97, df = 5, p < 10 −15 ) and steadily increased from phylum (mean Com = 14.8%) to variant (mean Com = 33.4%; Figure 3b). Although the increase of complementarity from phylum to variant is general, its order of magnitude differed sharply across predators (e.g., Zingel asper vs. E. infrafasciatus; Figure 3b).

| Taxonomic identification and resolution
The calculation of IR allowed for a standardized comparison between samples concerning their taxonomic resolution. Across all environmental samples, the mean IR for Macrometazoans was 5.33 (±0.91).
IR significantly differed between predator types, habitats, and geographic locations (Figure 4). The mean IR values were close to the maximal value for bats and Z. asper (IR = 6), which corresponded to an average taxonomic assignment of variants to species level. The mean IR was only slightly lower for P. microps samples, while the taxonomic resolution of Macrometazoans in the two types of equatorial samples (E. infrafasciatus feces and spider webs) were close to family level (mean IR = 4.05 ± 0.69).

| Diet results
The Macrometazoans detected in the 107 environmental samples covered a wide taxonomic array of invertebrates and included some vertebrate prey as well ( Figure 5). The proportion of cumulative MNI for non-Macrometazoans represented <20% of the total dataset and varied from ~10% (P. microps) to ~40% (E. infrafasciatus) of the total (Appendix 1).
Predator DNA was not detected in any of the 46 Z. asper fecal samples. The mean MNI per sample was 3.50 ± 1.97 ( Figure 6).
Macrometazoan prey of Z. asper was aquatic invertebrates such as Ephemeropera (8 MOTUs), Trichoptera (6 MOTUs), Diptera (13 MOTUs), and Gammaridae (4 MOTUs; Supporting Information Table S5). We also detected DNA from benthic fish species (Barbus barbus and Barbatula sp.) and allochthonous prey (undetermined Nymphalidae). More than one variant was detected for ~30% of Macrometazoan MOTUs. The MOTU assigned to Baetis fuscatus was the most diverse (six distinct variants) and was also the most frequently detected prey (37% of the total MNI). Predator DNA was detected in all 29 P. microps fecal samples.
The mean MNI per sample was 3.17 ± 2.59 ( Figure 6). Our data revealed sharp differences among sampling locations in P. microps diet. Predator DNA was detected in all six E. infrafasciatus fecal samples. The mean MNI per sample was 3.00 ± 1.78 ( Figure 6).
Macrometazoan DNA detected in E. infrafasciatus fecal samples was a mix of aquatic organisms such as crustacean (Atyidae) and terrestrial invertebrates such as ants (Formicidae) and springtails (Collembola), with the ratio of potential allochthonous prey being ~60%.
Interestingly, a variant assigned to ray-finned fishes (Actinopterygii) was detected in one fecal sample.
No DNA of spiders was detected from any spider web samples. The mean MNI per web sample was 6.76 ± 5.34 ( Figure 6).
Macrometazoans detected in spider webs were mostly flying insects with a high diversity detected for Diptera (52 MOTUs, including 24 Cecidomyiidae MOTUs) and Hymenoptera (9 MOTUs). We also detected variants assigned to Coleoptera, Trichoptera, and Collembola.

| In vivo tests are essential to determine primer sets
The discrepancies in coverage of primer sets that we observed among in silico, in vitro, or in vivo analyses illustrates the difficul-

| The use of multiple primers is key to finely describe species diversity
Our three primer sets were shown to perform equally well in detecting higher taxonomic ranks (phyla and classes). On the contrary, there were significant differences in coverage at finer taxonomic levels (e.g., more than 40% of the MOTUs were detected by one primer set only). Interestingly, complementarity slightly increased at the taxonomic level "variant" revealing that even variants belonging to the same MOTUs were not equally detected by all three primer sets. Consequently, the more desirable is a fine taxonomic resolution (MOTUs or variants), the more important is the use of multiple primer sets for revealing greater diversity and decrease false negatives. Furthermore, some primer sets will be more suitable than other depending on the predator. For example, ZFZR has the best coverage for Lepidoptera (Bc = 87%; Appendix 2).
When sequencing the diet of bats that eat in majority Lepidoptera ( Figure 5), this primer set will be better as it detects more MOTUs than the other primer sets (Figure 2). LFCR, on the other hand, has the best taxonomic coverage for Crustacea (Bc = 86%; Appendix 2) and, consequently, covered 94% of MOTUs and 79% of variants detected in P. microps feces. Overall, our results highlight the need to use multiple primers sets when one wants to tackle finely prey diversity and especially when the prey communities can be highly variable. It should be noted that some authors proposed to use one single highly degenerated primer set as an alternative (Elbrecht & Leese, 2017a, Piñol et al., 2019.

| The "one-locus-several-primer-sets" strategy
The COI gene is a locus of choice when studying invertebrate diversity and is largely used for this. However, the high variability that makes the COI a powerful marker consequently makes truly universal primer sets hard to design. Yet, several recent attempts, that involved highly degenerated primers, appeared very promising Alternatively, we followed an OLSP strategy (used in Corse et al., 2017; but see also Gibson et al., 2014) targeting the COI gene using three distinct primer sets (MFZR, ZFZR and LFCR). Initially developed for studying the diet of Z. asper, we applied this strategy to other types of invertivores to fully evaluate the applicability of our approach and found that each primer set detected at least some biodiversity that was hidden from the other primer sets (Figure 2). In some cases, we observed critical differences among primer sets: The estimates of the ingested prey communities were sharply different depending on the primer set considered for E. infrafasciatus or for P. microps from Prévost Lagoon ( Figure 5). Consequently, whether using one primer set or another is expected to considerably influence the qualitative interpretation of the trophic behavior of these two species. In fact, the differences in prey composition (Wsd) for a same sample obtained with different primers were very close or higher than that observed between samples (Bsd), the former representing technical noise, and the later representing a biological signal (Figure 3c,d), hence confirming the conclusions by Alberdi et al. and reads were merged into contigs when overlapping sequences (~130 bp) were identical such that the three different datasets could be either compared together or merged into a single dataset.
The use of amplicon sequence variants instead of MOTUs in metabarcoding analyses was recently suggested to improve and refine measurements of biodiversity (Callahan, McMurdie, & Holmes, 2017;Wares & Pappalardo, 2016). In fact, variant information was used to quantify intraspecific genetic diversity (Elbrecht, Vamos, Steinke, & Leese, 2018;Pedro et al., 2017;Sigsgaard et al., 2016) and was shown to approximate taxa abundance reliably for diet analyses (Corse et al., 2017). However, only robust and reliable datasets free from possible false positives and artifacts can allow such use of variant level information. Consequently, use of variant information in quantitative or semiquantitative ways will require reliable and robust bioinformatics filtering procedures of HTS data (see Corse et al., 2017).

| Public databases and the taxonomic assignment of prey
Both the confidence and the resolution of taxonomic assignment procedures are highly dependent on the completeness of reference sequence databases (e.g., Gibson et al., 2014;Porter et al., 2014;Devloo-Delva et al., 2018). In this study, we illustrated that the accuracy of taxonomic assignment of prey depended on the environment of the predators (Figure 4b). Palearctic samples (feces from Z. asper, P. microps and bats) displayed a high taxonomic resolution of prey with most variants assigned to species. On the contrary, samples from the tropics (feces of E. infrafasciatus from Africa and spider webs from South America) displayed a much lower taxonomic resolution generally corresponding to family level ( Figure 4b). In megadiverse environments and regions, such as equatorial environments, our ability to identify sequences precisely (e.g., at species level) is indeed strongly restricted by relatively incomplete public databases (e.g., Cowart et al., 2015;Beng et al., 2016;Lopes et al., 2017;Stat et al., 2017). Our results therefore illustrate the role of completeness of public databases and the importance in investing into species inventories to better understand food web and species interactions.

| Insights into the diet of Pomatoschistus microps and Epiplatys infrafasciatus
To our knowledge, our OLSP approach using a variant-based filtering procedure is the first to reveal the trophic ecology of P. microps and E. infrafasciatus with metabarcoding data.
Epiplatys species inhabit lentic rivers, usually near the surface under the leaves of plants (e.g., Romand & Morgalet, 1983). Diet of Epiplatys are mainly composed of aquatic insects (Guma'a, 1982;Ndome & Victor, 2002) although terrestrial organisms such as Formicidae may also represent important preys (Romand & Morgalet, 1983). This was the case here, with ~60% of Epiplatys prey being composed of terrestrial organisms such as Formicidae, Collembola, and Psocoptera and only 15% of preys being composed of unambiguously aquatic organisms ( Figure 5; Supporting Information Table   S5). Collembola may constitute allochthonous prey fallen from riparian vegetation or, given their small size (2-3 mm), secondary prey for E. infrafasciatus (i.e., ingested by their primary prey such as Formicidae). These results are consistent with E. infrafasciatus topmouth position that favors capture of prey at the water surface, notably allochthonous ones. We also detected ~25% of algae and fungi in E. infrafasciatus feces, suggesting that this species could also forage aufwuchs communities as reported for E. senegalensis (Ndome & Victor, 2002). Although only based on six fecal samples, our results suggest that E. infrafasciatus exhibits quite versatile foraging behaviors, exploiting both pelagic and benthic habitats and feeding on both allochthonous and aquatic prey.

| The capture of biodiversity by spider webs
Spider webs represent a potential noninvasive source of DNA for conservation, ecology, and management studies (Blake, McKeown, Bushell, & Shaw, 2016;Xu, Yen, Bowman, & Turner, 2015). They are able to trap a part of the local arthropod biodiversity present in natural (even pristine), agricultural, or urban habitats and may serve as "biodiversity capsules" (sensu Boyer, Cruickshank, & Wratten, 2015), especially for flying insects. To date, DNA extracted from spider webs enabled the detection of both the predator and its prey using diagnostic PCR under controlled conditions (Blake et al., 2016;Xu et al., 2015). We here present the first attempt to analyze the biodiversity captured by natural spider webs using metabarcoding. Among our environmental samples, spider webs from the equatorial forest of French Guiana displayed the highest biodiversity by far displaying the highest mean MNI ( Figure 6) and the highest MOTU diversity (Supporting Information Tables S6). The main taxa detected were flying insects, especially Diptera (52 MOTUs) and Hymenoptera (9 MOTUs), but also some Coleoptera, Trichoptera, Lepidoptera, and Hemiptera. However, contrary to previous studies using diagnostic PCR (Blake et al., 2016;Xu et al., 2015), we were unable to detect the DNA of the spiders that spun the webs even though our primer sets were able to detect a non-negligible part of Arachnida in bat fecal samples ( Figure 5). Regardless, we illustrate that spider webs do act as natural traps of biodiversity, and we confirmed that they are a promising tool for invertebrate diversity assessments when combined with the power of HTS (as suggested by Xu et al., 2015).

| CON CLUS IONS
We demonstrated the lack of predictability of both the coverage and complementarity of individual primer sets. Although in silico and in vitro (i.e., based on DNA extracted from isolated organisms) evaluations are useful to narrow down a subset of primer sets for metabarcoding studies (e.g., Elbrecht & Leese, 2017a;Piñol et al., 2019), in vivo evaluation of primers based on a subset of environmental samples is critically informative for minimizing false negatives before conducting larger scale analyses. Furthermore, we formalized the "one-locusseveral-primers" (OLSP) strategy that directly addresses primer set coverage biases to minimize false negatives. This strategy produces largely overlapping and comparable sequences, which cannot be achieved when targeting different loci, and facilitates the use of genetic diversity information contained within metabarcoding datasets.
However, we emphasize the importance of stringent variant-based filtering procedures to validate HTS data (Callahan et al., 2016;Corse et al., 2017) before genetic information of metabarcoding datasets can be used to estimate (semi-)quantitative diversity indices. Our workflow, combining the OLSP strategy and stringent variant-based filtering, can be easily adapted and extended to other loci and other applications for studying biodiversity through metabarcoding.

ACK N OWLED G M ENTS
We warmly thank the staff from the French Agence Française pour la Biodiversité (AFB) and from Aix-Marseille Université who par-

CO N FLI C T O F I NTE R E S T
None declared. and E.M. wrote the paper. All authors drafted the paper.

DATA ACCE SS I B I LIT Y
Supplementary data deposited in Dryad (https://doi.org/10.5061/ dryad.2ck7120) include the unfiltered HTS data from the MiSeq runs used in this paper, as well as the sequence alignment generated with Primerminer for in silico evaluation of metabarcoding primer sets.