A fine‐scale phylogenetic assessment of digenean trematodes in central Alberta reveals we have yet to uncover their total diversity

Abstract Despite over 100 years of digenean trematode parasite species descriptions, from a wide diversity of vertebrate and invertebrate host species, our ability to recognize the diversity of trematode species within a single lake remains an incredible challenge. The most challenging aspect is the identification of species from larval stages derived from intermediate hosts, due to the disjointed data of adult worm morphological descriptions, from which species are named, and links to corresponding molecular identifiers in depauperate databases. Cryptic species also play a significant role in the challenge of linking trematode larvae to adults, species identifications, and estimating diversity. Herein, we utilize a large, longitudinal dataset of snail first‐intermediate host infection data from lakes in Alberta, Canada, to infer trematode larval diversity using molecular phylogenetics and snail host associations. From our assessments, we uncover a diversity of 79 larval trematode species among just five snail host species. Only 14 species were identified to a previously described species, while the other 65 species are either cryptic or otherwise unrepresented by mitochondrial genes in GenBank. This study currently represents the largest and most diverse singular molecular survey of trematode larval fauna composed of over one thousand mitochondrial sequences. Surprisingly, rarefaction analyses indicate we have yet to capture the complete diversity of trematodes from our sampling area.

on parasites, causing gaps in our understanding of diversity and richness for defined geographical locations (Adlard, Miller, & Smit, 2015;Adlard & O'Donoghue, 1998;Mollaret et al., 1997). Taken together, plastic and cryptic morphology, with a lack of survey data, makes it more difficult to correctly identify a parasite sample from a new location.
Recent meta-and spatial analyses have shown that our understanding of parasite diversity is biased toward location, time, and parasite class, correlating with when and where taxonomists are most active during their careers, and it is argued that more taxonomists are needed (Poulin, 2014;Poulin & Jorge, 2018). Molecular methods have come a long way in allowing faster and more precise species identifications and the ability to make hypotheses about species relationships and evolution considering cryptic morphology.
However, even with these methods, regional checklists of hostparasite relationships remain incomplete (Poulin, Besson, Morin, & Randhawa, 2016). One major issue is depauperate and biased databases, directly related to research and funding interests, expertise, and the natural evolution of improving methodologies over time. So, not only do we need more taxonomists, but we need them to study more broadly to fill in these gaps in our understanding of parasite diversity.
Ecologically speaking, most parasites have incomplete life cycle descriptions. Likewise, our understanding of their distributions and interactions within and among host species is limited due to a lack of surveillance records and repeated or long-term studies. The dispersion of parasite data constrains our knowledge of the finer details of their ecology across broad geographic ranges. Additionally, unreliable morphological assessments in survey data present the caveats that (a) the species identities may not be accurate and (b) the survey may not represent true diversity within the area, missing cryptic species all together and underestimating overall diversity. Furthermore, the onset of molecular methods for species identifications has widened the knowledge gap through revelations of prior undetected diversity that cannot always be traced to a described species. In fact, the revelation of cryptic species is enhanced with greater sequencing effort, and more so for trematodes than any other group of parasitic helminth (Pérez-Ponce de León & Poulin, 2018). This, overall, can make it incredibly difficult to understand the larger picture when it comes to parasite ecology because we are lacking long-term, field studies, and precision in data collection.
Digenean trematodes are a very large group of parasitic helminths, with complex life cycles. The adult worms infect vertebrate hosts, in which their eggs are passed into the environment with the feces of the animal. The eggs hatch and infect a snail (or other mollusk), in which their larval development occurs. Larvae will emerge from their obligate, snail, first-intermediate host to then infect either a second-intermediate host or a definitive host, depending on the species. Current estimates for the number of trematode species range from 18,000 (Cribb et al., 2001) to 24,000 (Poulin & Morand, 2004).
Traditionally, taxonomic descriptions of trematodes are from morphological traits of adult worm stages derived from vertebrate hosts, as their most prominent features are fully developed and measurable, contrasting the less developed features of the larval stages (Schell, 1985). With the onset of molecular barcoding, not only have we realized the problems of cryptic morphology and the need for multiple lines of evidence for species delineation, but that for trematodes, we can now use larval stages to delineate species (Detwiler et al., 2010(Detwiler et al., , 2012Georgieva, Selbach et al., 2013;Gordy, Locke, Rawlings, Lapierre, & Hanington, 2017;Locke, Mclaughlin et al., 2010;Schwelm, Soldánová, Vyhlídalová, Sures, & Selbach, 2018;Soldánová et al., 2017). This is advantageous in that it is considerably easier to collect larvae from snail, first-intermediate hosts. The disadvantage is the lack of a direct connection between adult morphological records and molecular records.
The goal of this study was to capture an accurate identification of the trematode biodiversity among snail first-intermediate hosts to establish a better, ecological understanding of trematode communities and how they differ geographically and change over time.
In this study, we use molecular phylogenetic methods to assess species relationships, to identify collected specimens, and account for possible cryptic morphology. Snails and trematodes were collected from six lakes in central Alberta, Canada, over 3 years, from June to September. This longitudinal dataset provides novel contributions to the species diversity of trematodes, new geographical species records in central Alberta, and snail host association records, to better connect trematode life cycles. Though the data collected for this study were a continuation of our previous long-term dataset (Gordy, Kish, Tarrabain, & Hanington, 2016), the use of phylogenetic methods herein both expand and improve upon our understanding of trematode diversity and clarify identification issues we confronted previously.
Several trematode families have previously been given a considerable amount of attention in molecular phylogenies, more than others (e.g., Diplostomidae and Echinostomatidae). Therefore, species delimitation methods and acceptable sequence divergence limits have been tested for specific genes within these, well-studied, trematode families (Blasco-Costa & Locke, 2017;Detwiler et al., 2012;Georgieva et al., 2014;Georgieva, Selbach et al., 2013). Most trematode families have not been given such attention. Though there are general assumptions extrapolated from previous studies, such as 5% sequence divergence of cytochrome c oxidase subunit 1 (cox1) as an acceptable limit for species delimitation (Vilas, Criscione, & Blouin, 2005), this remains to be tested for all trematode families. Herein, we test this 5% assumption for delimitation using cox1 across seven trematode families.
The resulting diversity estimates from this study exemplify both the power and utility of molecular phylogenetics for species identification, but this study also identifies gaps and caveats that trematode taxonomists may face in future studies. Therefore, we provide commentary on the current caveats of the field of trematode taxonomy, cryptic species, depauperate databases, and areas in need of further research. We also provide a current record of trematode and host associations within Alberta and encourage the continued effort to better understand trematode diversity from both a regional and global context.

| Trematode and snail sample collection and selection
As a continuation of the 2-year survey described in Gordy et al. (2016), snails were collected for an additional year in the same manner from the following sites: Lake Isle, Lake Wabamun, Gull Lake-Aspen Beach, and Buffalo Lake-Pelican Point, Rochon Sands, and The Narrows ( Figure 1). All methods regarding collection and sample processing, including molecular methods, were the same as previously described (Gordy et al., 2016(Gordy et al., , 2017. Briefly, snails were collected from sites previously established and brought back to the laboratory for examination of patent infection by larval trematodes. Trematode infections, when patent, resulted in larval cercariae emerging from the snail into the surrounding water. Free-swimming cercariae were detected with a dissecting microscope, collected from the sample well, and preserved for downstream molecular work. Our original aim was to extract DNA and barcode every parasite sample. However, with over 2,400 samples, this goal was not feasible in cost and time. Nearly half the trematodes derived from the total collection were xiphidiocercariae, and previous sequencing efforts revealed these samples to be closest to Plagiorchis sp. (Gordy et al., 2016). Therefore, much of the sequencing efforts went to all other morphotypes for which there were enough cercariae available for sequencing (i.e., >10 cercariae, to keep a voucher stored in ethanol). For cost feasibility, we chose to sequence only 70 haphazardly sampled xiphidiocercariae samples, representative of sites and snail host species from which they were found. The sequencing effort strategy for all other morphotypes was complete coverage.
F I G U R E 1 Sample collection locations. Map of the six lakes from which snails and trematodes were collected in central Alberta, Canada. Depth of lake is given as a mean depth in meters Phylogenies were first constructed using a broad sampling of taxa within each family. Sequences of the same gene (either cox1 or nad1) were gathered from each species available in GenBank within that family. Because there are no standard methods yet employed for molecular taxonomic analysis within the Digenea, and much sequencing effort has been based on personal preference, we were unable to consistently attain a good representation of the species or even genera for several families, including Psilostomidae, Notocotylidae, and Plagiorchiidae. Because of issues with substitution saturation at broader taxonomic groupings for some families, their phylogenies were further refined into either genera or groups of closely related genera that were previously published as such (e.g., Hypoderaeum is paraphyletic to Echinoparyphium within the family Echinostomatidae (Detwiler et al., 2010;Kostadinova & Herniou, 2003)).
Phylogenies were constructed at a family-level with nonredundant sequences to understand species relationships. These family-based phylogenies were used as a benchmark for later phylogenies, in which redundant sequences were included for identification of individual sequences (specimen samples). Because there were many sequences, some phylogenies were divided to reduce the computation time (i.e., Strigeidae, Diplostomidae, and Echinostomatidae). We only present the information relevant to species identification phylogenies below, as the species relationships were the same as those within the nonredundant familylevel phylogenies.
After phylogenetic analyses, because many of our sequences were not directly within monophyletic groups of previously identified species, we employed additional tools to further distinguish taxa. The web app, Automatic Barcode Gap Discovery (ABGD; Puillandre, Lambert, Brouillet, & Achaz, 2012), was used in combination with a priori assumptions of a 5% cutoff in sequence divergence for species delimitation using p distances calculated in MEGA7 (Gordy et al., 2017;Kumar et al., 2016). For ABGD, nucleotide alignments were inserted and tested using all three distance measurements (Jukes-Cantor, Kimura 2.0, and simple distance) to look for agreements on grouping and prior maximal distance, using the default minimum slope of 1.5. Other specific methods will be described separately for each trematode family below. The one family that was included in downstream diversity analyses, but not described below is the Schistosomatidae because their phylogeny from this dataset was described and previously published (Gordy, Cobb, & Hanington, 2018).

| Family Notocotylidae
A final alignment of 98 cox1 sequences with a length of 327 bp was used for phylogenetic analyses. Echinostoma hortense (KR062182) was used as an out-group because of its prior demonstrated phylogenetic relationship to the notocotylid Ogmocotyle sikae (KR006934.1; Liu et al., 2016), which was one of only two sequences from GenBank we were able to use for comparison. The E. hortense sequence did cause one small gap in the final alignment.
Only O. sikae and Notocotylus sp. BOLD (KM538104) were used for comparison to the 95 sequences from this study, due to a lack of Notocotylid cox1 sequences available with significant overlap.
HKY + G was the best substitution model and was used for phylogenetic analyses. this alignment, because previous phylogenies have shown them as paraphyletic (Detwiler et al., 2010).

| Richness and recovery calculations
The following packages were utilized in R version 3.4.3 (R Core Team, 2017) to calculate richness and diversity metrics and plot them: vegan (Oksanen et al., 2018) and dplyr (Wickham, François, Henry, & Müller, 2017). Species richness was derived using the diversityresult (vegan) command to add unique species by site as well as pooled species richness for all sites, by snail species, and to view how they were represented by lake. Species accumulation and rarefaction were analyzed using the specaccum (vegan) command, utilizing the "collector" method to derive site richness in the order the data were collected and the "rarefaction" method to view an individual-based, rather than site-based, method for species accumulation, respectively. An Arrhenius nonlinear model was fit to a species accumulation curve to view the species-area relationship utilizing the specaccum with "random" method and fitspecaccum (vegan) commands. If we assume that morphological identification of larval trematodes gives the greatest confidence at the taxonomic scale of family, we predict that accumulation curves will plateau faster than with information derived from molecular phylogenetic identifications that can provide confidence to the species-level. To show this, we repeated the same accumulation and rarefaction analyses at the level of trematode family. This process was repeated for snail species, with exception of the Arrhenius nonlinear model, which would not converge.
Of these infections, 1,149 (46.8%) were classified as xiphidiocercariae by morphology (by having a clearly defined stylet in the anterior rim of the oral sucker (Schell, 1985)).
A total of 1,091 trematode cercariae samples were successfully extracted and sequenced for downstream molecular phylogenetic analyses. Less than 200 cercariae samples, excluding xiphidiocercariae, were not included in the final diversity analyses, either because of low quantities of cercariae, low quantity or quality of DNA, or bad sequencing results. Phylogeny results will be discussed in the same order as above, by family, in the sections below.
Several new lineage and singletons have emerged from these phylogenies, and we refer to them below as "species." We acknowledge the limitations of using molecular phylogenies for species identifications, without further supporting evidence (e.g., sequences from adult specimens) and that others would refer to them as operational taxonomic units (OTUs). However, we prefer to use the term species to remain consistent with our previous publications and sequence names.

| Family Notocotylidae
Despite there being 19 different species represented in GenBank from the superfamily Pronocephaloidea, only five species had cox1 F I G U R E 2 Molecular phylogeny of the Notocotylidae and Psilostomidae based on cox1. Bayesian inference phylogenies are given. Branches are colored by support values from phylogenetic analyses, with blue having the highest support. Posterior probabilities >0.50 and bootstrap values >50 are reported near the nodes, respectively. Accession numbers are given after species names. Emboldened taxa with three asterisks represent novel species from molecular analyses. (a) Notocotylidae. (b) Psilostomidae sequences available at the time of this analysis, and one of those sequences was from a region other than the typical barcoding region (Folmer). Two of the sequences, Notocotylidae gen. sp. 1 NZ and sp.
2 NZ, were eventually removed from analyses because they did not align well. Therefore, the only sequences from GenBank left for phylogenetic comparisons with our sequences were Ogmocotyle sikae

| The former Gorgoderidae
From BLAST results, several sequences in our dataset matched most closely to the sequence for Gorgoderina sp. (FJ477202) in GenBank.
When attempting to find other sequences for use in downstream analyses, we found that nearly every species in this family was only represented by 28S or ITS. A cox1 sequence was available from Pseudophyllodistomum macrobranchiola (LC002523); however, the sequence was downstream of the Folmer region and did not overlap with our sequences. Upon further investigation, we found that these sequences matched very close to our other sequences for Notocotylidae sp., despite no BLAST matches from GenBank to Notocotylids. We therefore dissolved the Gorgoderidae sequence group, merging these sequences with the other Notocotylidae sequences, and have updated our previously published sequence (KT831348).
Our phylogenetic analyses have revealed four Notocotylid species from our samples. Both ML and BI trees agreed on topology with strong statistical support ( Figure 2a). Though all ABGD methods agreed, they only split the groups into three (JC p max = 0.0215; K2 and simple p max = 0.0129): E. hortense, O. sikae, and Notocotylus sp. The only GenBank sequence to group with our sequences was Notocotylus sp. BOLD (KM538104).
Based on pairwise distances, however, Notocotylus sp. as a single group determined by ABGD was not supported based on the 5% cutoff, as several sequences within the group were more than 5% different from others, despite the average intraspecific divergence being 3.0% for all. Two sequences, isolates MGC683 and MGC1730, expressed 6.8%-10.2% and 5.0%-10.2% intraspecific divergence, respectively. Without including these sequences, the range of intraspecific divergence was 0.0%-5.6%, which is more reasonable for a single lineage, however, still beyond the cutoff.
We suspected further division within the tree topology, as some sequences continued to be closer or above the 5% divergence cutoff. Those that grouped outside of the primary clade (identified as Notocotylus sp. A) and closer to MGC683 were then separated further and support by intraspecific divergence was then within the cutoff range. In doing this, the average interspecific divergence between Notocotylus sp. A and D is 3.8% with a range of 2.8%-5.6% (Appendix:

| Family Psilostomidae
Only a few species within the Psilostomidae had representation by cox1 in GenBank and significant overlap with our sequences.
In molecular phylogenies, none of the sequences from this study grouped with any of the GenBank species representing the Psilostomidae family, but created their own monophyletic group, sister to P. varium. Both BI and ML trees agreed on topology ( Figure 2b). The six sequences from this study were 0%-0.8% divergent from each other, with an average intraspecific divergence of 0.4%, and interspecific divergence of 14.3%-24.6% (Appendix: Table A2). Because of the low identity to any of the available genera from this family, the sequences from this study have therefore been identified broadly as Psilostomidae gen. sp.
A. All six samples were derived from cercariae emerging from H. trivolvis snails.

| Family Haematoloechidae
Despite there being 18 Haematoloechus spp. with cox1 sequences available in GenBank at the time, only two sequences from this database overlapped with our sequences because of different choices in sequenced cox1 regions. In addition, no other genera within the Haematoloechidae were currently represented in GenBank.
The four Haematoloechidae sequences from this study were 100% identical to each other, but 13.4%-25.8% divergent from GenBank sequences (Figure 3a and Appendix: Table A3). These four sequences were generalized to Haematoloechidae gen. sp. A, because there were no specific species within GenBank or other evidence that could provide more specificity at this time. Both BI and ML trees agreed with strong support for topology, as suspected for such little information. All four sequences were derived from samples that came from S. elodes snails collected at Pelican Point at Buffalo Lake.
F I G U R E 3 Molecular phylogeny of the Haematoloechidae and Plagiorchiidae based on cox1. Bayesian inference phylogenies are given. Clades representing a single species have been condensed for space. Branches are colored by support values from phylogenetic analyses, with blue having the highest support. Posterior probabilities >0.50 and bootstrap values >50 are reported near the nodes, respectively. Accession numbers are given after species names. Numbers in parentheses after taxon names correspond to the number of sequences within the clade. The first number is number of GenBank sequences and the second number, if given, represents number of sequences from this study. Emboldened taxa with three asterisks represent novel species from molecular analyses.

| Family Plagiorchiidae
Most Plagiorchiidae sequences in GenBank use a different region of the cox1 gene, downstream from the Folmer region. The only sequence that aligned with ours was one Plagiorchis sp. (FJ477214).
Phylogenetic analyses of Plagiorchiid sequences resulted in both ML and BI trees agreeing on topology with strong statistical support for external nodes and moderate support for internal nodes ( Figure 3b). All methods within ABGD supported the differentiation of lineages within the tree to nine groups other than the out-group (p max (All) = 0.004-0.0599). Pairwise and averaged intraspecific divergence values were supported by the 5% cutoff, and the highest value was 2.1% within Lineage 1. The average interspecific divergence had a range from 8.9% to 18.8% (Appendix : Table A4). Further support for the differentiation of some lineages was found among intermediate host use, as Lineage 6 utilized H. trivolvis, Lineage 7 used L. stagnalis, and all other lineages were found emerging from S. elodes. Because this diversity was greater than we had expected by morphology (indicating possibly two species based on relative size) and prior BLAST results, we were unable to assign the unsequenced samples to these nine different lineages. Therefore, in downstream diversity analyses that require abundance information, these lineages have been conservatively lumped into one species, called Plagiorchis sp.

| Family Echinostomatidae
For each separate alignment, ML and BI phylogenies were compared and found to agree on major topology. In instances where external node topology disagreed between the two methods, this was identified as a separate tree.

| Drepanocephalus
Both nad1 sequences from this study grouped monophyletically with D. auritus sequences. Drepanocephalus sp. was paraphyletic to the D. auritus group and displayed a nucleotide divergence range to the D. auritus group of 14.4%-15.5% (Figure 4a). The intraspecific divergence within the auritus group ranged from 0.0% to 4.4%, with an average of 2.2% (Appendix:

| Neopetasiger
The 10 sequences from this study all grouped within Neopetasiger sp. 4 and were <1% different in nucleotide identity from Neopetasiger sp. 4 (KM191817), with an average intraspecific divergence of 0.2% and an interspecific divergence of 21.1%-28.2% ( Figure 4b and Appendix: Table A6). All Neopetasiger sp. samples from this study were derived from H. trivolvis snails, further indicating their specialization for planorbid snails, as indicated by other studies (Table 3).

| Echinoparyphium/Hypoderaeum
All methods in ABGD agreed on separation of the alignment into 17 groups (p max (all) = 0.0129; Figure 5a). This separation was further supported by considering the range of intraspecific divergence values reported previously for several of these same lineages (Detwiler et al., 2010). Furthermore, most groups supported a clear separa- and Hypoderaeum sp. Lineage 1. For example, though ABGD showed Echinoparyphium sp. Lineage 3 to be one group made of four sequences, their p distance values were very divergent. The two sequences from GenBank previously identified as Lineage 3 were 2.7% divergent from each other, but 9.7%-11% divergent from the two sequence from our study that were 3.7% divergent from each other. To us, this was a clear split and was also highly supported by posterior probabilities and bootstrap values in phylogenetic analyses as well.
We therefore derived a new lineage, Echinoparyphium sp. Lineage 4.
Within the Hypoderaeum sp. Lineage 1 clade, there was an obvious split occurring with three sequences forming their own clade (MGC577, MGC650, and MGC824). This split was not supported by ABGD or by host use, as all utilized S. elodes snails. The nucleotide divergence, however, ranged between 0.3% and 5.4%. The small clade that was found diverging from the rest was 0.3%-0.7% different from each other and 5.0%-5.4% different from the others.
The split was obvious and well supported within the phylogenies.
We have therefore split this lineage into two groups, now including Hypoderaeum sp. Lineage 2 (Figure 5a).

Several additional new lineages have been added to the genus
Echinoparyphium because of our sequencing efforts. We have labeled these as Echinoparyphium sp. A-E, and for the two that are close to the previously identified Echinoparyphium sp. Lineage 1, we have labeled as Echinoparyphium sp. Lineage 1 A-B (Figure 5a). F I G U R E 4 Molecular phylogeny of the Echinostomatidae: Drepanocephalus and Neopetasiger genera based on nad1. Bayesian inference phylogenies are given. Branches are colored by support values from phylogenetic analyses, with blue having the highest support. Posterior probabilities >0.50 and bootstrap values >50 from BI and ML analyses are reported near the nodes, respectively. Accession numbers are given after species names. Clades representing a single species have been condensed for space. Numbers in parentheses after taxon names correspond to the number of sequences within the clade. The first number is number of GenBank sequences and the second number, if given, represents number of sequences from this study. (a) Drepanocephalus. (b) Neopetasiger | 3165

GORDY anD HanInGTOn
One group we could not clearly delineate further, despite divergence higher than the cutoff. Echinoparyphium sp. Lineage 2 displayed above 5% intraspecific divergence, with an intraspecific range of 0.0%-5.7% and an average of 1.2%. The one isolate responsible for this greater divergence value is MGC369 that ranges from 1.7% to 5.7% from all other isolates within this lineage. All other isolates in this lineage have an intraspecific divergence range from 0.0% to 4.3% without MGC369. While this difference would seem a clear divergence, the phylogeny does not support the placement of MGC369 outside of this lineage. From host use, we find that MGC369 utilized L. stagnalis, whereas the majority of Lineage 2 isolates used S. elodes.
While this would also seem to support differentiation, one other member MGC16B also utilized L. stagnalis, with very close sequence homology to other Lineage 2 members (0.3%-4.3%). Because neither the phylogenies nor host use supports further differentiation for this group, MGC369 remains in this lineage.
The cox1 phylogenies for the Echinostomatidae (Figure 5b), for the most part, were well supported and matched patterns seen within the nad1 phylogenies for this family. Because a few samples had both cox1 and nad1 sequences available, the lineage identities were informed by nad1 because there were not many GenBank cox1 sequences that matched. Overall, there was only one lineage within F I G U R E 5 Molecular phylogeny of the Echinostomatidae: Echinoparyphium/Hypoderaeum genera based on nad1 and cox1. Bayesian inference phylogenies are given. Branches are colored by support values from phylogenetic analyses, with blue having the highest support. Posterior probabilities >0.50 and bootstrap values >50 from BI and ML analyses are reported near the nodes, respectively. Accession numbers are given after species names. Clades representing a single species have been condensed for space. Numbers in parentheses after taxon names correspond to the number of sequences within the clade. The first number is number of GenBank sequences and the second number, if given, represents number of sequences from this study. Emboldened taxa with three asterisks represent novel species from molecular analysis. (a) nad1. (b) cox1 F I G U R E 6 Molecular phylogeny of the Echinostomatidae based on nad1. Bayesian inference phylogenies are given. Branches are colored by support values from phylogenetic analyses, with blue having the highest support. Posterior probabilities >0.50 and bootstrap values >50 from BI and ML analyses are reported near the nodes, respectively. Accession numbers are given after species names. Clades representing a single species have been condensed for space. Numbers in parentheses after taxon names correspond to the number of sequences within the clade. The first number is number of GenBank sequences and the second number, if given, represents number of sequences from this study F I G U R E 7 Molecular phylogenies of the Diplostomidae-I Group based on cox1. Clades representing a single species have been condensed for space. Branches are colored by support values from phylogenetic analyses, with blue having the highest support. Bootstrap values >50 and posterior probabilities >0.50 are reported near the nodes. Numbers in brackets after taxon names correspond to the number of sequences within the clade. The first number is number of GenBank sequences and the second number, if given, represents number of sequences from this study. Emboldened taxa with three asterisks represent novel species from molecular analyses. (a) Maximum-likelihood tree. (b) Bayesian inference tree

| Echinostoma
There was strong nodal support by both BI and ML trees for the topology of the Echinostoma species ( Figure 6). All ABGD distance methods supported the separation of the alignment into 15 groups (p max (JC & K2) = 0.0359; p max (simple dist.) = 0.02154). Intraspecific divergence values, based on the delineation cutoff, did not always support the same groups. For instance, Echinostoma miyagawai, Echinostoma robustum, and Echinostoma revolutum all exhibited ranges >5%, despite the average being lower, except for E. robustum whose average was 5.4%. Placement of one sequence within the tree did not match expectations but had high statistical support; E. robustum (GQ463053) grouped within a clade of E. miyagawai. The inclusion of the robustum sequence did explain the greater intraspecific divergence within this clade, but there was not support for its placement with the other robustum sequences that also exhibit high intraspecific divergence.
Further inspection of this particular robustum sequence has shown previous assessments that have identified this same trend, indeed showing it to be E. miyagawai .

Sequences labeled/identified as Echinostoma trivolvis from
GenBank resulted in two paraphyletic groups within the tree, the separation of which was confirmed by ABGD and within-group divergence values of <5%. These observations confirmed previous lineage separation by Detwiler et al. (2010).
The Echinostoma sequences from the present study all fit within two clades, either E. revolutum or E. trivolvis Lineage A. The revolutum group exhibited higher than expected intraspecific divergence that ranged from 0.0% to 6.0%. Though not supported by ABGD, there did appear to be two separate groups emerging, one that has been found among S. elodes snails (Lineage B) and the other among Lymnaea spp. and ducks (Lineage A). By splitting these lineages, we saw more reasonable intraspecific divergence values within Lineage B (0.0%-1.6%), yet Lineage A continued to exhibit divergence higher than the cutoff (0.0%-5.7%; Appendix: Table A9). Because Lineage B isolates all utilized the same snail host, we were more confident in the grouping of this lineage, but believe that further sampling will likely show greater differentiation within Lineage A.

| Family Diplostomidae
For both Diplostomidae-I and Diplostomidae-II groups, BI and ML phylogenies agreed on minor topologies, with greater support for external nodes and less support and agreement between the two methods for internal nodes (Figure 7). For Diplostomidae-I, all distance methods in ABGD agreed on 41 total groups (p max = 0.059), further supported by the 5% cutoff. A result worth noting from this phylogeny is that a sequence we previously identified as T. scheuringi  Table A10). Other lineage splits seen within Diplostomum baeri and Tylodelphys sp. 2 have previously been described (Soldánová et al., 2017) and are further supported with our phylogeny.
Twenty-three groups were identified for Diplostomidae-II, supported by all distance methods of ABGD (p max = 0.059) and the 5% cutoff. Two lineages made up of sequences from this study did not group within a specific clade of previously identified sequences and have thus been identified generally as Diplostomidae gen. sp. O and sp. X. One such sequence was previously identified as being most like Ornithodiplostomum sp. 4 (KT831363); however, in this phylogeny, it grouped far from the other Ornithodiplostomum sequences. Of note is that a sequence from GenBank previously identified as Posthodiplostomum sp. 3 (FJ477217) grouped with high statistical support with sequences of Posthodiplostomum centrarchi (KX931421-KX931423), supporting a very recent report of this same identification (Stoyanov et al., 2017; Figure 8 and Appendix: Table A11).

| Family Strigeidae
Few species with sequences across the cox1 barcoding region were available from GenBank for comparison within the Strigeidae-I group. At the start of our analyses, only two species had matched with some of our sequences, Cotylurus cornutus and Cotylurus gallinulae. More recently, more Cotylurus species have been added to GenBank (Locke et al., 2018), and these additions helped define three previously unidentifiable lineages from phylogenies. Both ML and BI trees agreed with strong statistical support for the division of all aligned sequences into 16 groups, which was further supported by ABGD (p max (all) = 0.0077-0.0129). Sequences from the present study were all more closely related to Cotylurus as opposed to Ichthyocotylurus, based on p distances. Five could be identified to previously named species (C. cornutus, C. gallinulae, Cotylurus flabelliformis, Cotylurus marcogliesei, and Cotylurus strigeoides), and six other lineages did not match to any GenBank sequences and have been identified as Cotylurus sp. A-F. Clade division is further supported by intermediate host use. While intraspecific divergence was within the cutoff for all species, there was lower than expected interspecific divergence between C. cornutus and C. flabelliformis (4.2%; Figure 9a and Appendix: Table A12).
The Strigeidae-II group utilized the previously published phyloge- California, and JX977727, an adult from Mexico. Though they differed from some other LIN1 sequences >5%, they were more similar to other LIN1 sequences with divergence <5%, which made it difficult to clearly delineate whether there was one monophyletic clade or more. Currently, there is not enough evidence to clearly support more than one clade within Lineage 1. Therefore, with the best supported information, there appeared to be 23 groups within the Strigeidae-II, which revealed three new species of Apatemon: species A, which included our previously published Apatemon sp. (KT831859), and species B and C. Though these three species all utilized S. elodes, they were molecularly divergent.
Within the Australapatemon clade, a new lineage appeared from isolate MGC2030 that utilized P. gyrina identified herein as Lineage 10. Lineage 9, with the addition of more sequences, as predicted in Gordy et al. (2017), has revealed the greater likelihood and separation of this lineage into two, which we have called Lineage 9A and Lineage 9B, both of which were hosted by S. elodes snails ( Figure 9b and Appendix: Table A13).
Of the 79 total trematode species reported here, 59 are newly identified species in this report that have resulted in 15 updated identifications for previously published sequences (Table 3). Thirtynine of the 59 new identifications represent novel lineages/singletons (represented by "a" in Table 3), with another two lineages that represent a recent split (Australapatemon sp. LIN9A/9B). The remaining 20 species were previously identified, and for 15 of them, we have added further sequenced specimens, confirming their previous identifications and adding to our understanding of species presence and abundances in Alberta lakes (Table 3 and Appendix:   Table A14).
Examining the relationship of trematode species richness and sample size (by sites/area and individuals) through rarefaction and nonlinear models revealed a stark contrast between whether confidence for delimitation was at a family-level (morphological analysis) or a species-level (molecular analysis; Figure 10a-c). Considering the accumulation of trematode families, the curves plateaued (individual-based) or approached one (site-based), suggesting we likely captured the available trematode families within our samples and F I G U R E 8 Molecular phylogenies of the Diplostomidae-II Group based on cox1. Clades representing a single species have been condensed for space. Branches are colored by support values from phylogenetic analyses, with blue having the highest support. Bootstrap values >50 and posterior probabilities >0.50 are reported near the nodes. Numbers in parentheses after taxon names correspond to the number of sequences within the clade. The first number is number of GenBank sequences and the second number, if given, represents number of sequences from this study. Emboldened taxa with three asterisks represent novel species from molecular analyses. Black diamonds represent sequences identified uniquely in GenBank that have high similarity and likelihood of being the same as a different species. (a) Maximum-likelihood tree. (b) Bayesian inference tree sample region. However, when looking at the curves based on trematode species accumulation, there was no plateau, suggesting that there was potentially greater trematode species diversity than we captured from our sampling. Snails, on the other hand, plateaued in rarefaction analyses (Figure 10d,e). This was not at all surprising, considering that over 3 years of collections, we had yet to find more than seven species.
The greatest richness recoveries by snail host species were found among S. elodes (40 trematode species), followed by P. gyrina (26), H. trivolvis (15), L. stagnalis (10), and P. armigera (1), following the same trend as identified previously (Gordy et al., 2016), but with more total species (Table 5). Specificity for snail host species was high (55 specialist trematode species) among all but 15 trematodes, of which were found to infect two or more snail species (generalists). Some trematodes were found infecting snails from completely

| D ISCUSS I ON
Fine-scale molecular analyses of trematodes in central Alberta have revealed many new and important insights about their diversity. What is perhaps most surprising is that species accumulation curves would suggest we have yet to capture all the possible trematode species within our sample area. Comparing the species-level to family-level accumulation clearly demonstrates how important the molecular phylogenetic perspective is. Herein, we have used the family-level as a proxy for the type of results achieved by morphological analysis only, in considering trematode larval stages. While morphological identification of trematode larvae can be less costly in terms of materials, it does not afford a very high level of confidence because of the issues surrounding cryptic species and underdeveloped, definable features. Family-level accumulation based on individuals and sites is achieved at a much higher rate than species, as expected, and reaches a plateau earlier. If, for instance, this representation is true of the number of species attained by a typical survey, it is likely that trematode surveys are missing much of the actual diversity present. This is important to note because of the potential impact on how we might interpret community assembly and structure in natural environments, especially in consideration of cryptic species.
Overall, the trematode species richness found by this longitudinal survey exceeded expectations, and the number of snail species needed in a community to maintain a diverse set of trematodes was surprisingly small. In our original morphological assessments, we expected 29 trematode species. With the use of molecular assessments, based on BLAST identities and fewer sequenced samples, generated from the first 2 years of the study, we had expanded our view to 39 identified species (Gordy et al., 2016). Now, with more available sequence information, and the use of more stringent methods, we have, in total, recovered double the species from previous assessments at a total of 79 trematode species, 55 of which are new records to Alberta from this study alone. This raises the recorded trematode species in Alberta to 114, representing 16 families (Appendix : Table A14).
For an ecoregion that has previously been considered speciespoor (Hoberg, Galbreath, Cook, Kutz, & Polley, 2012), sub-Arctic lake ecosystems have presented a surprising amount of trematode diversity from recent surveys. From one lake in Norway, on the 69th parallel, 24 different trematode species were recovered, representing seven different, common families from lakes in the Northern hemisphere (e.g., Strigeidae, Diplostomidae, Schistosomatidae, Echinostomatidae, Notocotylidae, Plagiorchiidae; Soldánová et al., 2017). Though further South, between the 54th and 52nd parallel, our study is still considered within the sub-Arctic region and has uncovered a range of 3-38 trematode species representing 3-8 families, each, among six lakes (the lower end, from Pigeon Lake and Lac La Nonne, was only sampled in 1 year as opposed to 3 years for the other lakes). In between, sampling of fish from the Saint Lawrence River in Quebec (between the 49th and 44th parallel) has revealed 47 species of just diplostomoids (Locke, Mclaughlin et al., 2010).
From these surveys, it is apparent that our perspective of what constitutes incredible or unexpected diversity is changing and will continue to change as we take a closer look with molecular data. In all three of these studies, the unveiling of cryptic diversity has been a large component. From a recent meta-analysis of 110 studies, it has been noted that there is a trend, particularly among trematodes, that sequencing effort positively correlates with more cryptic species as opposed to any other group of helminths. This has been attributed to differences in trematode biology, and our ability as taxonomists to identify them by their morphological characters, or lack thereof (Pérez-Ponce de León & Poulin, 2018).
From a basic search of the GenBank database, we can see that trematodes are not a neglected group, as there are 877,472 molecular records specific to digeneans (as of August 2018). However, this is not to say that specific groups of digenean are not neglected nor that representation is not highly skewed to particular gene regions or to those species most important to human or veterinary health.
Of the digenean sequences in GenBank, 15,185 were of mitochondrial origin. Considering the two most used mitochondrial genes for F I G U R E 9 Molecular phylogenies of the Strigeidae based on cox1. Bayesian inference phylogenies are given. Clades representing a single species have been condensed for space. Branches are colored by support values from phylogenetic analyses, with blue having the highest support. Posterior probabilities >0.50 and bootstrap values >50 are reported near the nodes, respectively. Accession numbers are given after species names. Numbers in parentheses after taxon names correspond to the number of sequences within the clade. The first number is number of GenBank sequences and the second number, if given, represents number of sequences from this study. Emboldened taxa with three asterisks represent novel species from molecular analyses.

TA B L E 2 (Continued)
TA B L E 3 Host associations, geographical origins, and life stages of specimen sequences used in phylogenies.

Species
Australapatemon sp. LIN8 --9 --9 Australapatemon sp.  Gibson, Bray, & Harris, 2005). Prior evidence related to their snail hosts is limited to records from the United States: N. attenuatus has been identified from Physa acuta in the Eastern United States (Graczyk & Shiff, 1993b), and N. urbanensis was identified from Stagnicola emarginata in Michigan (Keas & Blankespoor, 1997).  Gibson et al., 2005). It is impossible at this point to know whether the presence of species from these families is from recent introductions or not, but they are rare in the fact that we only collected a few from each family over the course of 3 years. Considering their host species are quite prevalent across Alberta, it is possible that they have been here and remained undetected, but they could also have expanded their distributions westward into Alberta as well.
The gap between morphological and molecular species identities is growing larger, and the effort to find a solution is not growing at the same rate. Without the link between the two, we are missing important information about life cycle dynamics due to host associations and infection processes that could help inform wildlife managers and possibly influence control efforts for human and veterinary diseases caused by trematodes. One possible solution to this, aside from more molecular data from adult worm samples, is the development of methods to derive quality sequence information from historical, adult trematode specimens. As these vouchers have been our historical standard for species identifications, they are our ultimate source for generating molecular libraries by which to further our understanding of trematode diversity, speciation, and evolution with the added benefit of linking life cycles.
Furthermore, we urge the contribution of sequences that represent a broader diversity of digenean trematodes. One current issue is that novel lineages in molecular phylogenies could either represent cryptic species or they could represent described species for which we have no/limited molecular resources. Therefore, placing emphasis on capturing a broader diversity of trematodes might help bridge knowledge gaps.

ACK N OWLED G M ENTS
We would like to acknowledge Elisabeth Richardson and Joel Dacks for fruitful discussion surrounding phylogenetic analyses, and Valerie K. Philips for assistance with fieldwork. We thank Sean A.
Locke, Janet Koprivnikar, and Jillian T. Detwiler for an early review of the manuscript, as well as three anonymous reviewers for their constructive comments. This work was funded by Alberta Innovates

Energy and Environment Solutions grant 2078 and NSERC 418540
(PCH), with partial salary support for MAG by an NSERC CREATE Host-Parasite Interactions student scholarship.

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R CO NTR I B UTI O N S
MAG designed and performed the research, completed all analyses, and wrote the paper. PCH was involved with conception and design of the research, reviewing the results, and contributed to writing the paper.

DATA ACCE SS I B I LIT Y
DNA sequences: GenBank accessions MH368808-MH369793, TA B L E A 1 Average cox1 divergence within and between groups in the Notocotylidae. The number of base differences per site from between sequences are shown. Standard error estimate(s) are shown above the diagonal. The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The analysis involved 98 nucleotide sequences. Codon positions included were 1st + 2nd + 3rd + Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 322 positions in the final dataset. Standard error estimates are shown above the diagonal. Average within group divergence is given on the diagonal. Group intraspecific divergence ranges are given as percentages next to species names. Numbers in red are outside the delineation cut-off.

TA B L E A 3 Pairwise distance between individual cox1
sequences in the Haematoloechidae. The number of base differences per site from between sequences are shown. Standard error estimate(s) are shown above the diagonal. The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The analysis involved 7 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 454 positions in the final dataset.

TA B L E A 4
Average cox1 divergence within and between groups of Plagiorchis sp. The number of base differences per site from averaging over all sequence pairs between groups are shown. Standard error estimate(s) are shown above the diagonal. The average within group divergence is given on the diagonal. The range of pairwise distances within each group are given as percentages in the first column after group names. The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The analysis involved 55 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 435 positions in the final dataset.

TA B L E A 5 Pairwise distance between individual nad1 sequences in the genus
Drepanocephalus. The number of base differences per site from between sequences are shown. Standard error estimate(s) are shown above the diagonal. The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The analysis involved 6 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 388 positions in the final dataset.

TA B L E A 6
Average nad1 divergence within and between genera of Neopetasiger. The number of base differences per site from averaging over all sequence pairs between groups are shown. Standard error estimate(s) are shown above the diagonal. On the diagonal are the average within group divergence estimates. The range of pairwise distances within groups is given in parentheses next to the group names. The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The analysis involved 20 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 308 positions in the final dataset.

TA B L E A 7
Average nad1 divergence within and between genera of Echinoparyphium/Hypoderaeum. The number of base differences per site from averaging over all sequence pairs between groups are shown. Standard error estimate(s) are shown above the diagonal. Intraspecific divergence values are given on the diagonal. The range of p distances is given for each group in parentheses as percentages after species names. Numbers highlighted in red are above the 5% cut-off, while numbers in green represent average interspecific divergence below 5%. The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The analysis involved 261 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 299 positions in the final dataset. shown. Standard error estimate(s) are shown above the diagonal. Intraspecific divergence is given on the diagonal. Numbers in red represent groups that exceed the cut-off. The range of p-distances is given for each group in parentheses as percentages after species names. The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The analysis involved 111 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 383 positions in the final dataset.  shown. Standard error estimate(s) are shown above the diagonal. The average within group divergence is given on the diagonal. The range of pairwise distances within each group are given as percentages in the first column after group names. Numbers in red lie outside the delineation cut-off. The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The analysis involved 72 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated.
There were a total of 386 positions in the final dataset.  TA B L E A 1 0 Average cox1 divergence within and between genera of Diplostomidae-I. The number of base differences per site from averaging over all sequence pairs between groups are shown below the diagonal. Standard error estimate(s) are shown above the diagonal. On the diagonal are the average within group divergence values. Numbers within parentheses after species names represent the range of percent divergence within groups. Species with three asterisks represent novel species by molecular phylogeny. The analysis involved 196 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 334 positions in the final dataset.

TA B L E A 11
Average cox1 divergence within and between genera of Diplostomidae-II. The number of base differences per site from averaging over all sequence pairs between groups are shown. Standard error estimate(s) are shown above the diagonal. On the diagonal are the average within group divergence values. Numbers within parentheses after species names represent the range of percent divergence within groups. The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The analysis involved 102 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 307 positions in the final dataset.  Gordy, M.A., et al., 2016, Parasitol. Res. 115 (10)