Metabarcoding options to study eukaryotic endoparasites of birds

Abstract There is growing interest in the study of avian endoparasite communities, and metabarcoding is a promising approach to complement more conventional or targeted methods. In the case of eukaryotic endoparasites, phylogenetic diversity is extreme, with parasites from 4 kingdoms and 11 phyla documented in birds. We addressed this challenge by comparing different primer sets across 16 samples from 5 bird species. Samples consisted of blood, feces, and controlled mixes with known proportions of bird and nematode DNA. Illumina sequencing revealed that a 28S primer set used in combination with a custom blocking primer allowed detection of various plasmodiid parasites and filarioid nematodes in the blood, coccidia in the feces, as well as two potentially pathogenic fungal groups. When tested on the controlled DNA mixes, these primers also increased the proportion of nematode DNA by over an order of magnitude. An 18S primer set, originally designed to exclude metazoan sequences, was the most effective at reducing the relative number of avian DNA sequences and was the only one to detect Trypanosoma in the blood. Expectedly, however, it did not allow nematode detection and also failed to detect avian malaria parasites. This study shows that a 28S set including a blocking primer allows detection of several major and very diverse bird parasite clades, while reliable amplification of all major parasite groups may require a combination of markers. It helps clarify options for bird parasite metabarcoding, according to priorities in terms of the endoparasite clades and the ecological questions researchers wish to focus on.

To date, most studies have focused on selected parasite groups, potentially missing interactions beyond these groups for lack of a practical approach to assess broader parasite diversity. Such interactions between phylogenetically distant parasite groups are very likely in birds; for instance, positive and negative correlations between avian malaria apicomplexans and microfilaria blood parasites have been observed using conventional methods (Clark et al., 2016). Likewise, interactions between microparasites and helminths from different tissues (bloodstream and digestive tract) demonstrated in other hosts may also be consequential (Graham, 2008;Knowles, 2011).
Molecular approaches have greatly facilitated research across a broad diversity of parasitic taxa and may help unravel the huge and largely undescribed diversity of bird parasites (Carlson et al., 2020). Metabarcoding in particular has become a standard approach to describe biological communities in a variety of contexts (Deiner et al., 2017;Pollock et al., 2018;Xu, 2016) and holds potential for surveying parasite communities. Metabarcoding involves amplifying a small (up to a few hundred bases long) genetic marker ("the barcode") using a pair of primers that will anneal in all target taxa.
Next-generation sequencing technology is then used to sequence the diversity of barcode variants obtained. These sequences are searched against a reference database (e.g., the Silva or the BOLD databases) to identify the taxa amplified from the sample whose sequence is also present in the database. The primer pair is a defining feature of the barcode and will determine which taxa are amplified and thus detected as part of the sampled community. Finding and using adequate primers is therefore key to the success of any metabarcoding enterprise.
Exhaustive description of parasite communities on a large scale may be difficult with traditional techniques such as microscopy, and there can be important benefits of applying a metabarcoding approach to parasite identification. For instance, it enables detection of all parasite life stages; it makes high throughput processing of samples possible; and DNA sequences are easy to share between laboratories (Aivelo, 2015;Aivelo & Medlar, 2018). As such, metabarcoding is being used extensively to study bacterial communities, with the 16S rRNA gene being the barcode of choice across many host taxonomic groups, including birds (Ganz et al., 2017;Leclaire et al., 2019). Studies using metabarcoding for eukaryotic endoparasite characterization are much less common however. Most have been conducted in mammals, where they aimed at identifying fecal parasites (Aivelo & Medlar, 2018;Avramenko et al., 2018;Gogarten et al., 2020). In birds, where blood parasites such as apicomplexans also have a major importance, parasite metabarcoding studies investigating both fecal and blood samples are missing.
The phylogenetic diversity of bird eukaryotic endoparasites is extreme and spans no less than four different kingdoms. It includes unicellular parasites from both the Chromista and Protozoa kingdoms, fungi, and metazoans (birds, together with bony fish, have been predicted to harbor the largest number of undescribed helminth species; Carlson et al., 2020). The major issue preventing bird parasite metabarcoding investigations is the lack of a practical primer set to capture this astounding diversity, while not amplifying excessive amounts of unwanted targets such as host DNA. This issue stems from the fact that some major parasites groups (e.g., helminths) are phylogenetically closer to the avian host than they are from other parasites of interest (e.g., protozoans) and is especially acute with blood samples, where red blood cells are nucleated. Therefore, the aim of this study was to test different candidate metabarcoding primer sets for their ability to recover the broadest complement of parasites from major clades, in both blood and fecal samples, despite large quantities of nontarget (host) DNA being present.
From the literature, we selected five different primers sets that have potential to detect diverse bird-infecting parasite clades. Some are targeted at the 18S rDNA, either for all eukaryotes or for singlecelled eukaryotes only, while others are targeted at the 28S rDNA region and were complemented with custom blocking primers to reduce amplification of host sequences. We then carried out direct laboratory comparisons using a set of samples including bird blood and fecal DNA extracts, as well as artificial mixes consisting of bird blood DNA spiked in with different proportions of nematode DNA.

| Samples
Sixteen samples were used for this study, consisting of: -Eight passerine bird blood DNA samples, known to be positive for hemosporidian parasites, based on the cytochrome b nested PCR described in Reis et al. (2021).
-Six bird fecal DNA samples.
-Two artificial mixes consisting of hemosporidian-negative bird blood DNA spiked with different proportions of nematode DNA (1:100 or 1:1,000 by mass).
Bird samples were collected on São Tomé Island (Gulf of Guinea, Africa). Species (illustrated on Figure 1)

| Primers
We tested five different primer sets deemed promising for bird parasite metabarcoding ( Table 2). The five primer sets were applied to the same above set of 16 samples.
Primer set 1, here coded "xca," was developed for the simultaneous analysis of the intestinal parasites and diet from the feces of farmland birds through environmental DNA metabarcoding (Cabodevilla et al., 2020). It is a generalist eukaryotic primer pair targeting a short region of the 18S gene (~180 bp amplicons; Figure S1). It is capable of amplifying a wide range of eukaryotic organisms from unicellulars to plants, fungi, and metazoans, while excluding bacteria and Archaea.
It has been shown to amplify various potentially parasitic groups such as apicomplexans, nematodes, and platyhelminthes.
This primer set is designed to amplify unicellular eukaryotes only, thus excluding host DNA, but also potential metazoan parasites, such as helminths. It entails a nested PCR with a first round using a set of unicellular-specific primers to exclude metazoans, followed by a second round using the primers of Comeau et al. (2011), also developed to study micro-eukaryotic communities. In silico analyses suggested that this set could amplify potentially parasitic groups such as Alveolata or Heterokonta; empirical analysis of human fecal samples yielded less than 10% human reads.
Primer set 3, coded "dc2," was also developed to retrieve micro-eukaryotes (and exclude metazoans) from animal-and plantassociated microbiomes, using a single PCR round (Bass & del Campo, 2020). In silico analyses also suggested a good sequence match with potentially parasitic groups such as Alveolata and Heterokonta.
Whereas primer sets 1-3 above targeted the 18S gene, primer sets 4 and 5 both targeted the 28S gene. Both used the same PCR primers RM2F and RM3R described in Kounosu et al. (2019) and deemed to amplify efficiently both Apicomplexa and helminths, while offering a reasonable amplicon length (~240 bp) and excluding bacterial sequences. These primers were shown to effectively amplify nematodes, platyhelminthes, and coccidia. In an attempt to improve their specificity by reducing amplification of host DNA, we designed two different blocking primers ( Figure S2) to prevent annealing of the reverse PCR primer on bird sequences, resulting in primer sets 4 ("bl3") and 5, ("bl4") respectively. Blocking primers are oligonucleotides that are modified at their 3′ end so that they cannot initiate elongation during PCR; they are designed to match the sequence of unwanted templates, but not that of target templates. They shall therefore selectively block amplification of nontarget sequences during PCR, either by competition with the forward or reverse PCR primer during annealing, or by preventing complete elongation along the nontarget DNA strands (Vestheim et al., 2011).
Further primer description is provided in Table 2 and Table S1.

| Library preparation and nextgeneration sequencing
For primer sets xca, nes (first round), and dc2, PCR were carried out in reactions containing 5 µl 2× Qiagen Multiplex PCR Master Mix with a final concentration of 0.4 µM of each forward and reverse primer. For the bl3 and bl4 primer sets, reactions were carried out as above, except that the respective blocking primers were also included to a final concentration of 12 µM. Either 1 or 2 µl DNA extracts were used as template, respectively, for blood and fecal F I G U R E 1 Bird species whose blood or fecal samples were used to assess different primer sets for eukaryotic endoparasite metabarcoding. Top left: São Tomé paradise-flycatcher (Terpsiphone atrochalybeia); top right: São Tomé thrush (Turdus olivaceofuscus); bottom left: São Tomé weaver (Ploceus sanctithomae); bottom right: Principe seedeater (Crithagra rufobrunnea). Original pictures by Lars Petersson, reproduced with permission samples (which usually contain less DNA than blood samples), and water was added to a final reaction volume of 10 µl. The methods described here are intended to be applied to samples whose status for any given parasite clade has not been previously assessed.
Therefore, template DNA amount was not adjusted according to any previous information on, for example, parasitemia; back titrations indicated that the total template DNA amount ranged between 8 and 78 ng per reaction. For the nes primer set (second round), PCR were carried out in reactions containing 5 µl 2× Kapa HiFi HotStart ReadyMix Mix (Roche) with a final concentration of 0.5 µM of each forward and reverse primer. Two microliters of round one PCR products were used as template, and water was added to a final reaction volume of 10 µl. Cycling conditions varied between primers and are detailed in Table S1.
PCR primers included 5′ overhangs (Table 2)  and 30 s at 72℃, followed by a final elongation step of 5 min at 72℃. Indexed products were then cleaned using the AMPure XP beads (Beckman Coulter) and quantified using the Epoch Microplate pectrophotometer. Samples were pooled by barcode to an equal concentration of 0.25 nM of each sample in the pool, and product sizes were ascertained using the TapeStation (Agilent). Pool concentrations were quantified more precisely for Illumina sequencing using qPCR. Sequencing was carried out on a MiSeq (M01998) machine using a Nano v2 500-cycle kit, with a PhiX spike-in of 15% by molarity.  , was TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG (all sequences in this report are given from 5′ to 3′) and the 5′ overhang on the reverse primer, coded "roh," was GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG. # indicates a C3 spacer at the 3′ end of the blocking primers.

| Data analysis
Sequences were demultiplexed using bcl2fastq, and adapters were trimmed with the Illumina software. For each primer set, paired-end reads were merged accepting a 10% mismatch rate using vsearch.
Only merged reads of a length comprised between 50 and 500 bp and containing both forward and reverse PCR primers (allowing a 10% mismatch) were kept, and primers were then removed from these sequences using cutadapt. Reads containing ambiguous nucleotides were also removed. Reads were clustered using swarm following a two-step procedure, first with aggregation distance = 1 and then with aggregation distance = 3. vsearch was then used to identify and remove clusters whose seed sequence was deemed chimeric. The remaining clusters were affiliated using BLAST and the Silva 138 or 138.1 database (https://www.arb-silva.de/), resulting in an affiliated OTU (Operational Taxonomic Unit) table for each primer set. These analyses were carried out using the FROGS suite in Galaxy (Escudié et al., 2018). Subsequent analyses on the affiliated OTU tables were carried out using R. The OTU table taxonomy information was manually curated, and the datasets were trimmed to the twenty most abundant OTUs for each primer set × sample type (blood, fecal, or mix) combination for analyses.

| Sequencing and filtering metrics
We carried out one sequencing run that included all primer sets. A total of 2,157,180 reads were obtained, of which 2,038,148 (94.48%) passed the Illumina "chastity filter". Error rate was 1.08% and, aver- Between 114,461 and 179,008 read pairs were obtained for each primer set (for the 16 samples combined). These were kept if they could be paired-end assembled, contained both the forward and reverse primer sequences, were 50-500 nucleotides long, had no ambiguous base calls, and were deemed nonchimeric. The proportion of reads passing these filters varied between 66% and 92% for four of the primer sets, while it was 37% for primer set dc2. This was due to a substantial number of reads being eliminated because they were less than 50 nucleotides long. The remaining sequence clusters (hereafter referred to as Operational Taxonomic Units, OTU) were subsequently affiliated using the SILVA database (Quast et al., 2013).
While 97%-99% of reads were affiliated for four of the primer sets, only 1% of reads were affiliated in the case of dc2; this marker was subsequently abandoned. Table 3 provides a summary of the number of reads kept and affiliated for each primer set. Malassezia detected here could be contaminants from the skin of birds (or field workers), rather than bird blood parasites.

| Blood samples
In terms of blood parasite detection, the xca, bl3, and bl4 primers all detected substantial microfilaria DNA in both Turdus individuals (4,710 microfilaria reads were detected in individual Tur_2 by bl4; Figure 2). Consistent with its bias against metazoans, the nes primer set failed to detect any microfilaria nematodes. Of note, extremely low microfilaria read counts were also returned by bl3 in Cri_2, Plo_1, and Ter_1 (one single read in each individual), and bl4 in Plo_1 and Ter_1 (two reads each). Such low read counts are invisible in Figure 2. Detailed read counts are provided in Table S2.
Plasmodiidae, another major blood parasite group, was detected with substantial read numbers in three individuals using the bl3 set and in four individuals using the bl4 set (up to 1,434 reads in a single individual). Lower levels were also detected by bl3 in Cri_1 and Tur_1 (30 and eight reads, respectively) and by bl4 in Tur_2 (34 reads Various primer sets also allowed detection of fungal families that include bird pathogens, namely Aspergillaceae or Debaryomycetaceae which, respectively, host the avian pathogens Aspergillus fumigatus (Beernaert et al., 2010) and Candida albicans (Jacobsen et al., 2008).
Finally, a substantial number of plant DNA reads (up to 3,882 reads in one individual) occurred with the nes primers, which was not expected; most of these plant amplicons were less than 250 nt long, that is, substantially shorter than the main 420 nt nes peak ( Figure S1). Figure 3 shows the number of reads obtained for the most prevalent taxonomic groups detected in six bird fecal DNA samples. The diversity of taxa detected was generally higher than with blood samples and included parasite, host, and diet-related items.

| Fecal samples
Parasitic coccidia such as Eimeria or Isospora (family Eimeriidae) were detected in the two Terpsiphone individuals with all four primer sets. Remarkably, all four sets consistently returned a higher Eimeriidae read count for individual Ter_1 than individual Ter_2.
Besides these two individuals, the bl3 set also detected a sizeable Eimeria population in Plo_1 (67 reads).
The three primer pairs xca, bl3, and bl4 returned notable numbers of Aspergillaceae DNA reads, most consistently in both Turdus and one Terpsiphone individuals. Debaryomycetaceae were also recovered, notably in both Turdus individuals by both xca and bl4. All primers retrieved other Ascomycota (up to 1,641 reads in one individual with xca), belonging to a variety of taxa that were not bird parasites but rather plant pathogens such as Cladosporium or Ramularia.
The nes primer set also returned large numbers of fungi reads from the phylum Basidiomycota. These reads were from taxa that are taxa from the Auriscalpiaceae family-which interestingly are plant pathogens and wood saprotrophs (Nguyen et al., 2016), and therefore likely came from the diet.
The nes primer set identified a ciliate population (phylum Ciliophora) in one Turdus individual. These reads were affiliated to class Colpodea, a group of ciliates that are common in freshwater and soil habitats but generally not considered pathogenic.
Other Ciliophora pathogenic in birds, such as Balantidium (class

| Artificial DNA mixes
We quantified the ability of the four primer sets to detect a known,  the most nematode reads for both mixes ( Figure 4). As expected, the nes primers did not return any nematode read from either mix, whereas xca maintained the nematode reads close to their original proportions (0.7% in the 1% mix and 0.28% in the 0.1% mix, Table 4).
On the other hand, the bl3 set increased the proportion of nematode reads to 9.6% in the 1% mix, and 1.69% in the 0.1% mix (i.e., roughly a 17-fold increase). The bl4 set also increased the nematode DNA proportions, to 13.7% in the 1% mix, and to 1.71% in the 0.1% mix (also a 17-fold increase).
Reads from taxa other than bird or nematode were retrieved with all four primer sets, although their identity and proportions were variable. As expected, the nes primers yielded very few host (bird) reads, and instead returned mostly Malassezia. The bl3 and bl4 primers also yielded some Malassezia, while nes, bl3, and bl4 detected tropical lichens (Ascomycota), and the xca set returned some unexpected DNA reads from a variety of nonbird vertebrate groups.

| D ISCUSS I ON
We tested five different primer sets for avian eukaryotic endoparasite metabarcoding in order to shed light on the usefulness of the data that can be obtained from the different sets and help researchers choose the most appropriate for their aims. For that purpose, eight blood samples, six fecal samples, and two controlled "bird+nematode" mixes were analyzed, and the outcome (in terms of parasite detection) was compared, revealing the pros and cons of each primer set.  To apply those criteria, the notions of "parasite clade" and "parasite reads" need to be defined, in the context of a metabarcoding analysis that is contingent upon the phylogenetic resolution of the markers and the completeness of the reference database. In this study, "parasite clade" is meant as any taxonomic family which includes known avian eukaryotic endoparasites; and a "parasite read" is a sequencing read affiliated to a parasite clade.
Criterion (i) (the number of different parasite clades) clearly seems relevant to the description of the global biodiversity of parasite communities and could arguably be used first; Table 3 shows how the primers tested here fared according to this criterion.
Criteria (ii) and (iii) (the overall number or proportion of parasite reads, respectively) give an overview of the capacity of a primer set to yield workable numbers of parasite reads. We think that, provided the same sequencing effort is applied to the different test primers and samples, the total number of reads is preferable since it will also account for the efficacy of the whole process (from PCR to sequencing), directly answering the question of how many parasite reads can be obtained. The proportion-based criteria (iii) and (iv) may be used if the sequencing effort is different from primer to primer, with criteria Our results suggest that primer sets bl3 or bl4 are good options to study highly diverse blood parasites ( Figure 2 and  (Flaherty et al., 2021). Likewise, the penguin diet primers of Jarman et al. (2013) have good potential to detect fecal parasite populations. Another possible approach is to carry out multiple targeted PCR to specifically detect various parasite groups, as done by Cannon et al. (2018).
Some fascinating co-infection patterns were detected in the small sample set analyzed here. Figure  based on the detection of double peaks (Reis et al., 2021;Rooyen et al., 2013). However, co-infections may be missed using this method if a lineage is substantially more abundant or preferentially amplified compared to the others, resulting in a single visible peak (Bernotienė et al., 2016). The bl3 and bl4 set also revealed that the plasmodiid infecting both Crithagra rufobrunnea individ- bird species. This is interesting to note since it shows that a single marker is capable of detecting both genera.
Clearly, a primary purpose of using these fragments is to diagnose infections by a broad range of parasite groups. It should be noted, also, that using primers able to detect simultaneously host, diet, and parasites of both the host and its diet (insects and plants), may allow a broader understanding of the ecosystem, by facilitating the integration of ecological and epidemiological data (Ezenwa, 2021). As an illustration of its use for ecological studies, we are currently applying the bl4 set to a large number of samples from São Tomé to assess the impact of anthropogenic habitat modification (deforestation for oil palm monoculture) on bird parasite communities.
The use of these short fragments for parasite evolutionary analyses may also be possible, in instances where a given primer set enables detection of different OTU (or amplicon sequence variants) from the same parasite clade. For instance, the detection of several distinct Plasmodium OTU with the 28S primers may shed some light on their evolutionary relationships across bird samples.
It should be noted on the other hand that individual apicomplexan genomes harbor several very different 18S gene copies, as investigated in Plasmodium, Cryptosporidium, and Toxoplasma (Nishimoto et al., 2008;Rooney, 2004;Stenger et al., 2015). As such, 18S sequence variation cannot be used to infer phylogenetic relationships between individuals in this group; 28S rDNA could be a preferable marker for that purpose, provided it does not display such intragenome heterogeneity (which to our knowledge has not been described, to date).

| CON CLUS ION
This side-by-side comparison of some promising primer sets for bird parasite metabarcoding showed that combining the 28S primers RM2F and RM3R (Kounosu et al., 2019) with a blocking primer offers an interesting option to describe major components of parasite biological diversity in the blood and feces, including helminths, apicomplexans, and fungi. This strategy also demonstrated potential to enrich an experimental mixture by returning more nematode reads than were initially present, and more than is returned by the xca set. This is promising in terms of the ability of these primers to detect co-infections with helminths and malaria parasites, which will be interesting considering the known facilita-

ACK N OWLED G M ENTS
We are grateful to a number of colleagues from CIBIO who helped with this project, notably Maria Magalhães and Susana Lopes for assistance with library preparation and Illumina sequencing, and Vanessa Mata, Ana Pereira, Raquel Xavier, Ricardo Lopes for advice on blocking primer design and testing. We are also grateful to Paulo de Oliveira from Universidade de Évora for insights on fungal ecology and to the field assistants on São Tomé, who helped with bird sample collection: Sidney "Dulay" Samba, Martim Veiga, and Ricardo "Mito" Fonseca. The nematodes whose DNA was extracted to serve in the test mixes were donated by Anna Perera from CIBIO, and the DNA from the bird used in these test mixes was extracted by Sandra Reis from CIBIO. This work was funded by Portuguese National

CO N FLI C T O F I NTE R E S T
The authors declare that there is no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The sequencing data generated during this study and supporting the above discussion are publicly available through the Open Science Framework (OSF) at this URL: https://osf.io/9zxht/ ?view_ only=c956d adb5f 0a48d 093bc f082a c5ab38b.