Phylogenetic placement of environmental sequences using taxonomically reliable databases helps to rigorously assess dinophyte biodiversity in Bavarian lakes (Germany)

1. Reliable determination of organisms is a prerequisite to explore their spatial and temporal occurrence and to study their evolution, ecology, and dispersal. In Europe, Bavaria (Germany) provides an excellent study system for research on the origin and diversification of freshwater organisms including dinophytes, due to the presence of extensive lake districts and ice age river valleys. Bavarian freshwater environments are ecologically


| INTRODUCTION
Solid knowledge of ecosystem functioning and community dynamics during seasonal or longer periods, as well as conservation strategies and the impact of invasive species, essentially relies on precise original data about the spatial and temporal occurrence of the inhabiting organisms. Recent advances in the application of high-throughput sequencing (HTS) to analyse the molecular diversity in aquatic environments have enabled a better understanding of the community composition and species distribution in such ecosystems. Within the last decade, our knowledge of microbial biodiversity has increased faster than ever, especially after Edwards et al. (2006) published the first metagenome analysis of environmental samples using next-generation sequencing technologies. An unexpectedly high protist diversity has been described since then in various aquatic environments (Cuvelier et al., 2010; Keeling et al., 2014; Kohli, Neilan, Brown, Hoppenrath, & Murray, 2014; Massana et al., 2015; Stoeck et al., 2010), also revealing the existence of seasonal variation in many phytoplankton taxa (Tillmann, Salas, Jauffrais, Hess, & Silke, 2014; Toebe, Joshi, et al., 2013) and demonstrating the power of these methods in discovering the world's hidden microbial diversity (Lindeque, Parry, Harmer, Somerfield, & Atkinson, 2013; Medinger et al., 2010). Unknown sequences derived from environmental samples may additionally assist in discovering new species and even new lineages (Seenivasan, Sausen, Medlin, & Melkonian, 2013).
One of the most basic and important questions in the evolutionary biology of microorganisms concerns the mechanisms that have shaped their current distribution (O'Dwyer, Kembel, & Sharpton, 2015). The relative importance of principal processes such as divergence, dispersal, and selection by ecological filtering (Vellend, 2010) is a key point that needs to be worked out rigorously. There is an ongoing debate (Bass & Boenigk, 2011; Caron, 2009; Foissner, 2011) on whether all microbes are cosmopolitan and lack distinct distributions (Fenchel & Finlay, 2004; Finlay, 2002; Read et al., 2013), or rather follow a moderate endemism model (Bass, Richards, Matthai, Marsh, & Cavalier-Smith, 2007; Bates et al., 2013; Coleman, 2001).
Under a less dogmatic perspective, plankton communities may actually consist of both widespread species and those with a smaller range (Bik et al., 2012; Coleman, 2001; Foissner, 2008; Žerdoner Čalasan, Kretschmann, Filipowicz, et al., 2019), and the occurrences of such species remain to be identified individually.
A limitation that still precludes a comprehensive knowledge of protist distribution is the taxonomic confusion that exists due to complex determination procedures (Wheeler, 2008). To take full advantage of environmental sequencing, curated reference collections are necessary, linking the molecular data with scientific binominals.
A prerequisite to any assessment of the microbial biodiversity found in a given ecosystem is the confident placement of resulting molecular operational taxonomic units (OTUs) within particular lineages in the Tree of Life. Due to the huge number of reads, and the diversity of organisms captured by HTS of environmental DNA, initial OTU annotation using reference databases usually allows for confident taxonomic classifications only up to coarser taxonomic ranks (such as phylum, class, or order: Quast et al., 2013). A complementary approach, which enables finer levels of taxonomic assignments (i.e. family, genus, or species), is the phylogenetic placement of reads in reference trees, which also offers the advantage of estimating statistical support values for annotations, regardless of the read's length (Dunthorn et al., 2014). This method has been recently used in barcoding and biomonitoring projects of protists (Elferink et al., 2017; Keck, Vasselon, Rimet, Bouchez, & Kahlert, 2018; Medinger et al., 2010; Vergin et al., 2013).
In Germany, a number of dinophyte species have been originally described from the Bavarian region (Baumeister, 1957; Lindemann, 1920; Schrank, 1802), which is also the focus of the present study.
However, most algae inhabiting Bavarian waters are presumably wider-ranging, with type localities outside of that region. Mauch, Schmedtje, Maetze, and Fischer (2003) report 62 dinophyte species (online table available at www.gewaesser-bewertung.de/files/taxaliste.pdf), but this checklist does not provide any source references, is not taxonomically rigorous, and also includes species from the marine environment (e.g. Heterocapsa rotundata, which is unlikely to be present in Bavarian freshwater habitats). Morphology-based records of Bavarian dinophytes are otherwise rather sporadic, and only a few species have been reported so far, including the common Ceratium hirundinella, Gyrodinium (≡ Gymnodinium) helveticum, and Peridinium willei (Raeder, 1990; Schaumburg, 1996; Siebeck, 1982), as well as rarer species such as Cystodinium cornifax and Gloeodinium montanum (Höll, 1928). The true number of dinophyte species in Bavaria, and their spatial occurrences, are thus unknown at present, and more thorough biodiversity assessments of these key protists will benefit from the application of contemporary, high-throughput molecular methods.
In the present study, we provide an initial attempt to apply environmental amplicon sequences with the principal aim to uncover the yet unexplored dinophyte diversity in Bavarian lakes as an exemplary system. We determine their species by placing the ribotypes in phylogenetic reference trees (i.e. rRNA sequence comparison of multiple sequences including GenBank vouchers collected all around the world). Bavarian freshwater environments range from deep nutrient-poor mountain lakes to shallow nutrient-rich lakes and ponds (basic information is available at www.lfu.bayern.de/). These are characterised by a broad range of environmental conditions and resource levels and represent potential habitats for phytoplankton communities. We expect the dinophyte communities in those lakes to be composed of species with different ecological requirements and potentials for dispersal (Žerdoner Čalasan, Kretschmann, Filipowicz, et al., 2019), resulting in different effects on their distribution. Our results will lay the basis for a better knowledge of ecosystem functioning and evolutionary dynamics of protists such as freshwater dinophytes.

| METHODS
Surface plankton tow samples were collected from piers at 13 localities in Upper Bavaria (Germany) in April 2017 using a plankton net (mesh size 20 μm). The localities included 10 lakes (two lakes were sampled at two sites) and one subsidiary river, so that both standing and flowing bodies of water were covered (Table 1, Figure 1). Geographic coordinates were recorded for all sites using a standard GPS device (Garmin Ltd). Cells were observed, documented, and measured under a CKX41 inverted microscope (Olympus; Hamburg, Germany) equipped with a phase-contrast option and a DP73 digital camera (Olympus).
Environmental DNA was extracted using the Genomic DNA from Soil kit (Macherey-Nagel; Düren, Germany) following the manufacturer's protocol. The V4 region (c. 410 bp) of the small subunit (SSU or 18S) of the ribosomal RNA (rRNA) operon was the amplification target.
Forward and reverse primers were those used by Xiao, Wu, Liu, Xu, and Chi (2017). Amplification of DNA (PCR) for subsequent amplicon sequencing (Illumina) was carried out using 5 ng/μl template DNA, 1 μM of each primer and 2× KAPA HiFi HotStart ReadyMix (Roche; Penzberg, Germany). Resulting PCR products were visualised in 1% agarose gels and purified using AMPure XP Beads (Beckman Coulter). Dual indices and Illumina sequence adapters were attached by means of an Index PCR using the Nextera XT Index Kit (Illumina), and final PCR products were again purified using AMPure XP Beads. The library was validated using an Agilent 2100 Bioanalyzer
Operational taxonomic units classified as Dinoflagellata as search strings and with an abundance of ≥10 were classified more accurately by phylogenetic placement onto a reference tree based on concatenated rRNA alignments. Full voucher information of this Table S1. To further explore dinophyte identities based on DNA sequences, we performed BLAST searches (Altschul, Gish, Miller, Myers, & Lipman, 1990).
TABLE 1 Geographic origin of the samples used in the molecular analyses. To characterise the localities in terms of their ecology, we provide the corresponding trophic state index (Nürnberg, 1996; OECD, 1982)
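The OTU selection step described above (classification as Dinoflagellata and an abundance of at least 10) can be sketched as a simple filter. This is a minimal illustration only: the `Otu` record, the function name, the semicolon-separated lineage strings, and the example read counts are hypothetical and are not taken from the study, which does not specify the software or data layout used for this step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Otu:
    """Hypothetical OTU record; field names are assumptions for illustration."""
    otu_id: str
    taxonomy: str   # lineage string from the initial database annotation
    abundance: int  # total number of reads assigned to the OTU


def select_dinophyte_otus(otus, taxon="Dinoflagellata", min_abundance=10):
    """Keep OTUs annotated as `taxon` with at least `min_abundance` reads."""
    return [
        otu for otu in otus
        if taxon.lower() in otu.taxonomy.lower()
        and otu.abundance >= min_abundance
    ]


# Invented example data (not results from the study).
otus = [
    Otu("OTU_1", "Eukaryota;Alveolata;Dinoflagellata;Dinophyceae", 250),
    Otu("OTU_2", "Eukaryota;Alveolata;Dinoflagellata;Dinophyceae", 9),
    Otu("OTU_3", "Eukaryota;Stramenopiles;Ochrophyta", 1200),
    Otu("OTU_4", "Eukaryota;Alveolata;Dinoflagellata", 10),
]

selected = select_dinophyte_otus(otus)
print([otu.otu_id for otu in selected])  # ['OTU_1', 'OTU_4']
```

Only the OTUs passing both criteria would then be handed on to phylogenetic placement; the threshold is inclusive, matching the "≥10" stated in the text.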
The aligned matrices are available as *.nex files upon request.
Dinophyte phylogenetic analyses were carried out using maximum likelihood (ML) and Bayesian approaches, as described in detail previously

| RESULTS
In the plankton tow samples analysed using light microscopy, we observed a considerable morphological diversity of dinophytes (Figures 2 and 3), comprising about 10 different species and 2-3 easily distinguishable species per locality. It included photoautotrophic taxa such as the relatively easily recognisable Apocalathium Occasionally, coccoid developmental stages of dinophytes were also observed (Figure 2b-d).
Reliable species determinations of our study are summarised in Table 2.

| DISCUSSION
Reliable determination of organisms is a necessary prerequisite to explore their spatial and temporal occurrence and to rigorously test hypotheses on their diversification, ecology, and dispersal. Flowering plants, insects, and larger animals are well represented in extensive collections (Krupnick & Kress, 2005; Mayer et al., 2013; Rocha et al., 2014; Steinicke, 2014). In numerous cases, these have also been digitised over the course of the past decade, providing enduring and exact publicly available occurrence data (e.g. GBIF, GBOL, JSTOR, Tropicos®). Such powerful and continuously curated databases are scarce for protists, which are too small for direct observation and need microscopic expertise for examination. However, the problem is recognised, and considerable efforts have been made to build curated sequence databases and reference phylogenetic trees for dinophytes (Del Campo et al., 2018; Elferink et al., 2017; Mordret et al., 2018; Quast et al., 2013) and other microbial taxa.
Our reference tree comprising the known dinophycean sequence diversity is largely in agreement with previous rRNA approaches (Gu et al., 2013) as well as those based on extensive transcriptome sequence data (although using a much smaller taxon sample: Janouškovec et al., 2017; Price & Bhattacharya, 2017) sequencing amplicons. Our approach to determine dinophyte species using reference trees as inferred from multi-locus rRNA alignments has proven successful to a certain degree, at least for samples from the freshwater environment (providing also some new dinophyte records for Bavaria: Table 2), but also from the marine realm (Elferink et al., 2017; Wohlrab et al., 2018) Apocalathium (Mauch et al., 2003; Mischke, Riedmüller, Hoehn, Deneke, & Nixdorf, 2015; Schaumburg, 1996), Ceratium (Mauch et al., 2003; Raeder, 1990; Schaumburg, 1996; Schaumburg & Hehl, 2001), and Peridinium (Fröbrich, Mangelsdorf, Schauer, Streil, & Wachter, 1977; Mauch et al., 2003; Mischke et al., 2015; Raeder, 1990; Schaumburg, 1996), summing up to almost 80% of all identified OTUs. Our sequence-based findings seem to agree with our own morphological observations in the samples that we have investigated, and such combinatorial approaches are needed (Medinger et al., 2010; Rimet, Vasselon, A-Keszte, & Bouchez, 2018; Mora et al., 2019), as long as the taxonomic impediment and uncertainty with taxon determination continue to exist. However, when comparing morphological and genetic assessments of species diversity, one has to take morphological variation and cryptic speciation into account. More broadly, the two kinds of assessment may seemingly either corroborate or contradict each other. Thus, a broad knowledge of the biology of the investigated microbiota is of great importance, as in the example given below.
The Peridiniaceae are one of the most important groups of freshwater dinophytes and may comprise about a dozen species (Gottschling, Kretschmann, & Žerdoner Čalasan, 2017; Moestrup & Calado, 2018), half of which are already known from molecular DNA sequences (Table S1). In rRNA sequences, they show a considerable variation even within species (Izquierdo López, leading to long branches in phylogenetic trees (Gu et al., 2013). This intraspecific variability, in combination with a reliable taxonomy at the species level, makes the Peridiniaceae a good example of how species determination can be effective using environmental amplicon sequences. We are able to assign all peridiniacean OTUs gained in this study to an established species of Peridinium (at least as long as no cryptic speciation has been documented in this lineage). Peridinium bipes exhibits a distinct morphology, but our approach allows for efficient differentiation between, for example, P. cinctum and P. willei, which are challenging to tell apart using light microscopy alone in monitoring studies. However, sequence variation might be lower (and closely related species may exhibit identical SSU sequences) in other groups such as Scrippsiella, and it is therefore not always possible to differentiate between
Gonyaulax clevei has been reported from German lakes (Hickel & Pollhinger, 1986), but has never been sequenced. It is possible that those sequences, once gained, will group with marine taxa such as species of Lingulodinium. Moreover, a few organisms are able to overcome the physiological barrier between the oceans and freshwater habitats (Pokorný, 2009) and establish new populations. As living models, species such as Huia caspica and Kolkwitziella acuta might be key in this respect, as they are found in both marine and freshwater habitats (Gu, Mertens, & Liu, 2016; Mertens et al., 2015).
In any case, the precise biological role, and the overall biological activity, of marine taxa in freshwater environments have to be worked out. Future research should rigorously use physical specimens and preferably living strains. Moreover, our genetic analysis suggests that dinophyte species richness in Bavarian lakes may be greater than previously reported in the literature.
Particularly, the considerable number of so far unknown Pfiesteria-like sequences (Burkholder & Marshall, 2012; Calado, Craveiro, Daugbjerg, & Moestrup, 2009; Litaker et al., 2005) is impressive. A targeted search in future will assess this diversity and address questions such as whether these represent already accepted species for which DNA sequence information has been lacking, or even new species.
In the microbial world, the importance of DNA sequences linked to type material cannot be overestimated. In this respect, our approach to place OTUs on a reference tree using curated and vouchered representatives has proven successful with the documentation of sequences identical to Biecheleria brevisulcata (Suessiaceae), Palatinus apiculatus (Peridiniopsidaceae), and Theleodinium calcisporum (Thoracosphaeraceae). The first and last species were described only a few years ago (Craveiro et al., 2013; Takahashi, Sarai, & Iwataki, 2014), but Palatinus apiculatus refers to a historical name from the 19th century (Ehrenberg, 1838).
Usually, such names are not linked to DNA sequence information, but the application of epitypification has made the determination of such species unambiguous (Kretschmann, Žerdoner Čalasan, Kusber, & Gottschling, 2018). This strategy has not, to the best of our knowledge, been applied to the species of Ceratium, which makes the determination at the species level of numerous OTUs gained in this study vague. Once identified, strategic taxonomic clarifications of target organisms are possible and may refer to names such as Ceratium macroceras and Ceratium tetraceros, both described from Bavaria (Schrank, 1793, 1802).
Curated contemporary reference databases leave further room for improvements. For example, a number of the OTU sequences obtained in this study were assigned by the SILVA reference (Quast et al., 2013) to Scrippsiella, which is a predominantly marine dinophyte lineage. Such entries misleadingly refer to "Scrippsiella" hangoei, a name that is today classified under Apocalathium.
Therefore, the correct species name for the OTUs is (probably) the freshwater Apocalathium aciculiferum. It might be only a matter of time until this particular taxonomic confusion is corrected in future releases of the SILVA databases, as this error has already been corrected in the dinoref database (Mordret et al., 2018). The latter, in turn, relies on SSU reference sequence data only and is therefore unable to place environmental OTUs of studies using LSU (Elferink et al., 2017) and/or ITS sequences (Lutz, McCutcheon, McQuaid, & Benning, 2018). Accordingly, taxa such as Peridiniella, Sphaerodinium and Tyrannodinium, of which only LSU sequences are known at present, are undetectable using dinoref (however, †Leonella and †Posoniella are also missing, although SSU reference sequences are already available). The variety of sequencing approaches thus requires a database that assures both the provision of extensive rRNA sequence information and taxonomic reliability. In this respect, the indication of sequences that have been gained from type material is also important (Pawlowski et al., 2012) and needs to be added to dinoref.
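The marker-coverage argument made above can be expressed as a small set comparison: a taxon is undetectable with a given reference database if none of its sequenced loci is among the loci the database covers. The following is a minimal sketch under stated assumptions; the function name and data layout are hypothetical, the LSU-only entries follow the text, and the Peridinium entry is an illustrative assumption rather than a statement from the study.

```python
def undetectable_taxa(markers_by_taxon, database_markers):
    """Return taxa that share no sequenced marker with the reference database."""
    return sorted(
        taxon
        for taxon, markers in markers_by_taxon.items()
        if not (markers & database_markers)
    )


# Loci for which reference sequences are assumed to exist per taxon.
markers_by_taxon = {
    "Peridiniella": {"LSU"},       # LSU only, as stated in the text
    "Sphaerodinium": {"LSU"},      # LSU only, as stated in the text
    "Tyrannodinium": {"LSU"},      # LSU only, as stated in the text
    "Peridinium": {"SSU", "LSU"},  # illustrative assumption
}

# An SSU-only database versus a hypothetical multi-locus rRNA database.
missed_ssu_only = undetectable_taxa(markers_by_taxon, {"SSU"})
missed_multi_locus = undetectable_taxa(markers_by_taxon, {"SSU", "ITS", "LSU"})

print(missed_ssu_only)     # ['Peridiniella', 'Sphaerodinium', 'Tyrannodinium']
print(missed_multi_locus)  # []
```

The comparison only captures locus coverage; it says nothing about taxa that are absent from a database despite available reference sequences, such as the †Leonella and †Posoniella cases mentioned above.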
Our approach to detect dinophytes in Bavarian lakes is powerful and will lay the basis for solid information on which species are widely distributed and abundant, and which species are rarer and represent rather endemic entities with narrower distributions. If occurrences of dinophyte species correlate with environmental traits, then improved species circumscriptions that also take their ecological niche into account are possible. With our project, we may start to understand not only that a certain species occurs in a given freshwater habitat, but also why. Our example of field mapping, and the pursued predictability of freshwater dinophyte occurrences, thus has great potential to serve as a model for other taxonomic groups and/or the investigation of similar and alternative environments in other parts of the world.

ACKNOWLEDGMENTS
We are thankful to Susanne S. Renner (Munich) for funding the