Making sense of environmental sequencing data: Ecologically important functional traits of the protistan groups Cercozoa and Endomyxa (Rhizaria)

We have compiled a database of functional traits for two widespread and ecologically important groups of protists, Cercozoa and Endomyxa (Rhizaria). The functional traits of microorganisms are crucially important for interpreting results from environmental sequencing surveys. Linking morphological and ecological traits to environmental factors is common practice in studies involving micro‐ and macroorganisms, but is rarely applied to protists. Our database provides functional and ecologically significant traits linked to morphology, nutrition, locomotion and habitats. We discuss how the use of functional traits may help to unveil underlying ecosystem processes. This database is intended as a common reference for the molecular ecology community and will boost the understanding of ecosystem functions, especially those driven by biological interactions.

to ecosystem functioning or particular ecosystem functions, and (b) measuring these relevant traits." The last 10 years have seen a welcome increase of focus on the less studied part of the microbial biome, the protists. Protists are an immensely heterogeneous, polyphyletic assemblage of unicellular organisms featuring a vast array of functional roles (Pawlowski et al., 2012). Depending on their associated functional traits, different protistan groups will thus play contrasting or complementary roles in essential ecosystem functions and services, including primary production and nutrient cycling. Understanding traits and their redundancy in microbial taxa may thus facilitate the interpretation of the effects of biotic, abiotic and global change drivers on the protist community. For example, protistan functional traits have been used to identify the main actors driving the shift between net heterotrophy and autotrophy in the oceans and to establish models predicting phytoplanktonic blooms (Alexander et al., 2015). In soils, testate amoebae have been useful as indicators of ecological conditions. For example, the functional traits of testate amoebae allowed researchers to: (a) successfully reconstruct past environmental changes in bogs and peatlands (Fournier, Lara, Jassey, & Mitchell, 2015; Marcisz et al., 2016); (b) reflect several abrupt ecological changes in the last 800 years in a mountain wetland ecosystem, with increased anthropogenic modifications leading to a shift towards mixotrophic taxa (Kajukało, Fiałkiewicz-Kozieł, Gałka, Kołaczek, & Lamentowicz, 2016); (c) reveal that harder soil frost shifted the community composition of testate amoebae towards smaller, mixotrophic species; and (d) study their dependence on plant functional types (Jassey et al., 2014). Using the functional traits of ciliates, complex predator-prey interaction models could be assessed (Tirok & Gaedke, 2010).
Recently, the planktonic Protist Interaction DAtabase (PIDA) has been compiled using the available literature and it revealed that all major planktonic protistan lineages were involved in interactions as hosts, symbionts, parasites, predators and/or prey, with symbiotic association as the main interaction (Bjorbaekmo, Evenstad, Røsaeg, Krabberød, & Logares, 2019).
It is therefore crucial to expand publicly available databases of functional traits of ecological importance for the main protistan groups. However, gathering functional data on protistan traits requires specialists familiar with the specialized literature.
Here we provide a functional database for Cercozoa and Endomyxa (sensu Cavalier-Smith, Chao, & Lewis, 2018) to serve as a common reference and facilitate ecological studies. Increasing evidence from environmental sampling shows that these lineages are widespread and ubiquitous (de Vargas et al., 2015; Domonell, Brabender, Nitsche, Bonkowski, & Arndt, 2013; Geisen et al., 2015; Grossmann et al., 2016; Mahé et al., 2017; Turner et al., 2013; Urich et al., 2008). They comprise a vast array of functional traits in morphologies, nutrition and locomotion modes. Living modes include autotrophic algae, parasites of plants and animals (Bass, Ward, & Burki, 2019; Neuhauser, Kirchmair, Bulman, & Bass, 2014), and free-living predators (Burki & Keeling, 2014). Their community composition can be specifically and thoroughly assessed with an established protocol (Fiore-Donno et al., 2018). Because amplicon-based studies comprise the majority of the environmental surveys conducted so far, and the markers most frequently used for protists are sections of the rRNA genes (mostly 18S, variable regions V1, V4 and V9) (Pawlowski et al., 2012), we focused on taxa for which 18S sequences are publicly available.

| Taxonomy
Our table is hierarchically structured, with descending taxonomic ranks from left to right, down to the genus. We consider the genus level to be the most pragmatic for the purpose of assigning functional traits, but at the same time recognize that traits may differ significantly between species within each genus (e.g., food and parasite host preferences, morphology and behaviour).
However, sequences in reference databases are often not assigned to species. In addition, environmental sequencing data lack resolution at the species level, because similar sequences are usually clustered together in operational taxonomic units (OTUs) to compensate for amplification and sequencing errors. Consequently, closely related species will rarely, if ever, be distinguished in environmental sequencing data, making the genus level more adequate for our purpose. Traits are attributed to each taxonomic rank, and therefore the information is compatible with results from environmental sampling. For example, in the Silva database, sequences can be identified to any taxonomic rank (order, family or genus). A simple command or script can be used to join the user's taxonomic table with our functional table, based on the taxonomic entries (Table S1).
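The join described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script used in this study: the column names (`rank`, `taxon`, `nutrition`), the rank order, the OTU identifiers and the helper names `build_index` and `assign_trait` are all hypothetical, and the three trait rows only mirror statements made in the text rather than the actual layout of Table S1. Each OTU is matched at the finest rank for which the functional table has an entry.

```python
# Ranks are tried from finest to coarsest, so a genus-level entry wins when present.
RANKS = ("genus", "family", "order", "class")

# Illustrative stand-in for the functional table; the real columns and labels
# in Table S1 may differ. Nutrition values follow statements made in the text.
TRAIT_ROWS = [
    {"rank": "family", "taxon": "Cercomonadidae", "nutrition": "omnivore"},
    {"rank": "class", "taxon": "Phytomyxea", "nutrition": "parasite"},
    {"rank": "class", "taxon": "Chlorarachnea", "nutrition": "autotroph"},
]

# Hypothetical user taxonomy table: one dict per OTU, empty string = unassigned.
OTUS = {
    "OTU_1": {"class": "", "order": "", "family": "Cercomonadidae", "genus": "Cercomonas"},
    "OTU_2": {"class": "Phytomyxea", "order": "", "family": "", "genus": ""},
    "OTU_3": {"class": "", "order": "", "family": "", "genus": ""},
}


def build_index(trait_rows):
    """Map (rank, taxon name) to the trait value for fast lookup."""
    return {(row["rank"], row["taxon"]): row["nutrition"] for row in trait_rows}


def assign_trait(otu, index):
    """Return (trait, rank used) for the finest matching rank, or (None, None)."""
    for rank in RANKS:
        name = otu.get(rank, "")
        if name and (rank, name) in index:
            return index[(rank, name)], rank
    return None, None


INDEX = build_index(TRAIT_ROWS)
ASSIGNED = {otu_id: assign_trait(otu, INDEX) for otu_id, otu in OTUS.items()}
```

Recording the rank at which the match was made keeps coarse assignments (e.g., at class level) distinguishable from genus-level ones in downstream analyses.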

| Functional traits
We attempted to be as exhaustive as possible in the selection of functional traits; nevertheless, we mainly chose traits with known or potential ecological importance. We considered that the organisms' morphology, mode of movement, habitat and nutritional preferences were essential features. Thus, we classified the traits of interest into four categories: morphology, locomotion mode, nutrition mode and habitat (Table S1).
Each category was divided into discrete units to be compatible with statistical analyses (i.e., each taxon can only be assigned once in each category). We attributed traits to taxa by searching in the relevant literature, starting from the original description, until we could find all attributes for each taxon. All consulted references are provided (Data S1; S2; Table S1 can also be accessed via GitHub). The main distinction in all categories was between free-living organisms and parasites. In the morphology of the free-living cells, we separated cells protected by a shell from those that are naked. Two main categories of shells were recognized: organic/agglutinated or siliceous. We made a subcategory for the silica-shell-forming taxa because of their importance in the silica cycle in soil and water (Biard et al., 2016; Wilkinson & Mitchell, 2010). The other subcategory comprised organic and agglutinated shells, because only cells bearing an organic shell may facultatively develop an agglutinated shell around the organic layer (Dumack, Bonkowski, Clauß, & Völcker, 2018). We assumed that amoebae, amoeboflagellates and flagellates would differ in their locomotion and feeding behaviour and thus distinguished between them.
Two main locomotion modes were recognized: organisms bound to the substrate (i.e., gliding) or freely swimming. In nutrition, we first separated autotrophs (most of the marine Chlorarachnea) from the remainder. We further distinguished bacterivores, eukaryvores (feeding on fungi, microfauna, algae or other protists) and omnivores (feeding on bacteria and eukaryotes). We could not create more precise categories (e.g., fungivores) because of a lack of information (or contradictory reports/probable multiple trophic modes) for most taxa. We considered most Cercomonadidae as omnivores, based on recent observations (Flues, Bass, & Bonkowski, 2017; Geisen et al., 2016), although most characterized taxa were isolated with bacteria as the sole food source (Howe, Bass, Chao, & Cavalier-Smith, 2011). However, many taxa that were originally considered bacterivorous may actually be omnivorous, feeding on small flagellates, algae or yeasts, as ongoing observations and studies suggest. Further evidence from feeding experiments is required in this respect for many taxa.
We estimated that parasites of plants or stramenopiles (Phytomyxea) and animals (Ascetosporea) spent most of their life cycle as nonmotile endoparasites (within hosts), although they may be liberated as spores or ephemeral motile stages (Neuhauser et al., 2014).
Regarding ecology, we set the main distinction between soil/freshwater taxa and marine taxa, because freshwater and soil-inhabiting taxa may easily switch habitat, while marine taxa are rarely found in terrestrial or limnic habitats (Heger et al., 2010; Shalchian-Tabrizi et al., 2008). Genera known to accommodate species from both marine and soil/freshwater habitats were considered ecologically ubiquitous.
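The requirement that each taxon be assigned exactly once in each of the four categories can be checked mechanically before any statistical analysis. The sketch below is only an illustration: the category keys, the trait labels, the placeholder genera (`Genus_A`, `Genus_B`) and the function name `check_assignments` are assumptions made for the example, not the labels used in Table S1.

```python
from collections import defaultdict

# The four trait categories named in the text (key spelling is an assumption).
CATEGORIES = ("morphology", "locomotion", "nutrition", "habitat")

# Hypothetical long-format records: (taxon, category, value).
RECORDS = [
    ("Genus_A", "morphology", "naked flagellate"),
    ("Genus_A", "locomotion", "gliding"),
    ("Genus_A", "nutrition", "bacterivore"),
    ("Genus_A", "habitat", "soil and freshwater"),
    ("Genus_B", "morphology", "testate, siliceous"),
    ("Genus_B", "locomotion", "gliding"),
    ("Genus_B", "nutrition", "bacterivore"),
    ("Genus_B", "nutrition", "omnivore"),  # deliberate conflict for the demo
    # Genus_B has no habitat record: deliberate gap for the demo
]


def check_assignments(records):
    """List (taxon, category, problem) for every missing or conflicting entry."""
    seen = defaultdict(lambda: defaultdict(set))
    for taxon, category, value in records:
        seen[taxon][category].add(value)
    problems = []
    for taxon in sorted(seen):
        for category in CATEGORIES:
            values = seen[taxon][category]
            if not values:
                problems.append((taxon, category, "missing"))
            elif len(values) > 1:
                problems.append((taxon, category, "multiple"))
    return problems


PROBLEMS = check_assignments(RECORDS)
```

An empty result means every taxon carries exactly one value per category, which is the precondition stated above for treating the categories as discrete factors.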

| RESULTS AND DISCUSSION
To improve our knowledge of ecosystem functioning, we need an enhanced understanding of the functional relationships between components of the microbial communities (bacteria, fungi and protists) and their environment, transcending the traditional taxonomic framework. Understanding ecological processes can be facilitated through trait-based approaches that identify linkages between phenotypic and ecological traits of organisms. However, knowledge in this area for protists currently lags behind that for bacteria and fungi.
Additionally, culture-based experiments and behavioural/ecological observations of protists are now less widely undertaken than in the past, leading to a gap in such knowledge, which is regrettable as we are increasingly aware of the very high protistan diversity.
For environmental microbial diversity and ecology studies, the gold standard is currently amplicon-based surveys using high-throughput sequencing (de Vargas et al., 2015; Mahé et al., 2017), which provide the sequencing depth needed for large-scale sampling.
Our database has been designed to add ecological meaning to environmental studies reporting rhizarian sequences. For this, we compiled a reference database for cercozoan and endomyxan genera categorized by morphological and ecological features, here termed functional traits. For full transparency, we provide the sources consulted for each functional trait.
With respect to soil Cercozoa, we revealed several interesting and new interactions between taxonomic or functional groups, and the ecological processes shaping them.
For example, we showed a correlation between abundances of OTUs of (a) testate cells and soil moisture, (b) bacterivores and bacteria, (c) freely swimming protists and soil bulk density, and (d) plant parasites and C microbial biomass (Figure 1).
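To make the kind of analysis behind such correlations concrete, the sketch below sums OTU abundances per functional trait in each sample and correlates the totals with an environmental variable. It is a simplified illustration only: the statistical method is not specified here, so the Pearson coefficient is an assumption, the function names `trait_totals` and `pearson` are hypothetical, and all numbers are invented toy values, not results from any study.

```python
import math

# Invented toy data: read counts per OTU across three samples.
ABUNDANCE = {
    "OTU_1": [1, 2, 3],
    "OTU_2": [1, 2, 3],
    "OTU_3": [5, 5, 5],
}
# Hypothetical trait assignment per OTU (e.g., obtained by joining with Table S1).
OTU_TRAIT = {"OTU_1": "testate", "OTU_2": "testate", "OTU_3": "naked"}
# Invented environmental variable, one value per sample.
SOIL_MOISTURE = [10.0, 20.0, 30.0]


def trait_totals(abundance, otu_trait, trait):
    """Sum, sample by sample, the abundances of all OTUs carrying `trait`."""
    n_samples = len(next(iter(abundance.values())))
    totals = [0.0] * n_samples
    for otu, counts in abundance.items():
        if otu_trait.get(otu) == trait:
            for i, count in enumerate(counts):
                totals[i] += count
    return totals


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


TESTATE = trait_totals(ABUNDANCE, OTU_TRAIT, "testate")
R = pearson(TESTATE, SOIL_MOISTURE)
```

Real read counts are compositional and often overdispersed, so a rank-based coefficient or a suitable transformation may be preferable to raw Pearson correlation in practice.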
We compiled a table comprising every cercozoan and endomyxan genus sequenced so far and its associated traits in soil, marine and limnic environments (Table S1). For the table to be used in statistical analyses, categories represented by only one taxon had to be avoided. For example, there are only a few taxa that spend the majority of their life cycles as colonies (some Spongomonadidae).
We did not consider this trait, although it could confer significant behavioural and/or ecological differences from noncolonial lineages, as indicated by Fiore-Donno et al. (2019). Another example is Thaumatomonadida, which we classified as shelled, although they have interlocking silica scales that do not form an entirely rigid shell. It has been shown that they do not display the negative correlation with soil moisture that other shelled taxa show in this data set. For such particular cases, it is advisable to crosscheck the taxonomic and functional assignments and also to refer to the comments in the table.
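Trait levels that are too sparsely represented for statistical use, such as the colonial lifestyle mentioned above, can be flagged automatically. The following sketch assumes a simple mapping from taxon to trait level; the function name `sparse_levels`, the `min_taxa` threshold, the placeholder genera and the labels are hypothetical.

```python
from collections import Counter


def sparse_levels(trait_by_taxon, min_taxa=2):
    """Return the trait levels carried by fewer than `min_taxa` taxa, sorted."""
    counts = Counter(trait_by_taxon.values())
    return sorted(level for level, n in counts.items() if n < min_taxa)


# Hypothetical example: a candidate "lifestyle" trait with one rare level.
LIFESTYLE = {
    "Genus_A": "solitary",
    "Genus_B": "solitary",
    "Genus_C": "colonial",
}
FLAGGED = sparse_levels(LIFESTYLE)
```

Levels returned by such a check could be merged with a neighbouring level or, as done here for coloniality, left out of the trait scheme.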
Further research is likely to reveal a broader repertoire of traits for many of the taxa listed in Table S1 (updated versions will be accessible under https://github.com/Kenneth-Dumack/Functional-Traits-Cercozoa-Endomyxa). It is also desirable that trait-based data sets are eventually informative at the species level. We hope that this work, based on relevant and significant functional information at a detailed taxonomic scale, will facilitate the establishment of new models of microbial interactions in microbial ecology and more generally in ecology.