All together now: Limitations and recommendations for the simultaneous analysis of all eukaryotic soil sequences

The soil environment contains a large, but historically underexplored, reservoir of biodiversity. Sequencing prokaryotic marker genes has become commonplace for the discovery and characterization of soil bacteria and archaea. Increasingly, this approach is also applied to eukaryotic marker genes to characterize the diversity and distribution of soil eukaryotes. However, understanding the properties and limitations of eukaryotic marker sequences is essential for correctly analysing, interpreting, and synthesizing the resulting data. Here, we illustrate several biases from sequencing data that affect measurements of biodiversity that arise from variation in morphology, taxonomy and phylogeny between organisms, as well as from sampling designs. We recommend analytical approaches to overcome these limitations, and outline how the benchmarking and standardization of sequencing protocols may improve the comparability of the data.

need for prior isolation of target organisms (Pawlowski et al., 2020).
These markers allow researchers to study several groups simultaneously, filling gaps in soil biodiversity data and serving as the basis for synthesis efforts (Orgiazzi et al., 2015). Such efforts are underway both at regional and global scales (e.g., Bastida et al., 2020;Delgado-Baquerizo et al., 2020;Ramirez et al., 2014) and are becoming increasingly important to ecological research .
As amplicon sequencing of eukaryotic markers becomes standard practice (Pawlowski et al., 2020), it is important to understand how the variation among taxa, sampling, and sequencing affect the analysis and interpretation of amplicon sequences derived from eukaryotic markers (Bent & Forney, 2008;Nekola & White, 1999;Ruppert et al., 2019;Taberlet et al., 2018). Here, we explore how the wide range of morphological and phylogenetic variation, as well as F I G U R E 1 Issues arising from variation in morphology. (a) Soil biota comprise a broad range of sizes (adapted from Swift et al., 1979). (b) The ideal scenario for amplicon sequencing-based studies is that a sequenced read is equivalent to an organism, or proportional to its abundance. (c) The multicellularity of eukaryotes results in an overestimation of the abundance according to body size, while (d) the abundance of organisms with multiple copies of the marker gene per cell will also be overestimated. Finally, (e) organisms belonging to a single species, but that contain multiple, different copies of the marker gene (intragenomic polymorphisms) may be estimated as several species. These biases affect estimates of (f) diversity, (g) the total abundance of organisms, and (h) the abundances of specific species. In the case of intragenomic polymorphisms, compositional data may be biased by the incorrect classification of different sequences from a single organism as multiple species, indicated by error bars sampling practices, compromise the comparability of diversity and community composition among taxa, studies, and sampling designs from amplicon sequencing data. Our aim is to identify the challenges and limitations associated with this approach, as well as to provide recommendations on how to produce and analyse amplicon sequencing data so that the data are reusable and results interpretable.

| PROB LEMS
Morphological, taxonomic, phylogenetic, and sampling variation may all bias the quality of amplicon sequencing data of eukaryotes.

| Morphological variation
Soil eukaryotes range in body size from unicellular protists to multicellular organisms (i.e., earthworms and snails; Figure 1a), all of which may contribute DNA to a soil sample. The study of ecological communities with amplicon sequencing relies on the assumption that the genes belonging to each individual in the community are amplified proportionally to their abundance in the community (ideally, one read: one organism; Figure 1b). All life deviates from this assumption, but with their variable morphologies, soil eukaryotes deviate from one read: one organism in several ways, and to a greater extent than prokaryotes, which we overview below.
First, multicellularity disrupts estimates of relative abundances, as these become confounded with the organisms' sizes (Elbrecht & Leese, 2015;Figure 1c). Second, the variable number of copies of a marker gene is exacerbated in eukaryotes. While bacterial cells may contain up to 15 copies of the 16S RNA gene, protists may contain between 1 and 400,000 copies of the 18S rRNA gene (Kirchman, 2018;Figure 1d). In bacteria, this variable gene copy number can be corrected using bacterial sequence databases (although is discouraged for soil prokaryotes, which are poorly represented in sequence databases; Louca et al., 2018); however, eukaryotic sequence databases are considerably sparser . These two phenomena obfuscate the relationship between the number of gene copies detected from a sample and the abundance of organisms (i.e., number of individuals) in the community, leading to potential overestimation of the abundance of larger individuals or those with the most copies of the marker gene Figure 1g-h).
Studies which focus on groups with known cell numbers by isolating the organisms prior to sequencing have approached this limitation by modelling relative copy numbers per individuals (e.g., in nematodes; Darby et al., 2013) but this does not work for the majority of soil fauna that are highly variable in body size. Third, a single eukaryotic cell may have multiple, different copies of a gene (intragenomic polymorphisms, Figure 1e), as has been shown for protists, nematodes, and fungi (Bik et al., 2013;Thornhill & Santos, 2007;Wu et al., 2016). While multicellularity and multiple gene copies per cell lead to an overestimation of abundance, intragenomic polymorphisms can result in inflated estimates of α-diversity including richness, or the number of taxa (Figure 1f). These polymorphisms can emerge quickly (i.e., over 400 generations in a nematode population; Bik et al., 2013). Furthermore, the number of marker genes per cell may vary within an individual. For example, the number of mitochondria in a cell depend on the cell's function (Veltri et al., 1990), and this may also result in skewed estimates of individual abundances.

| Taxonomic and phylogenetic variation
To work accurately, the marker gene or region of choice must be sufficiently conserved on either flank of the DNA segment so that primers can capture all versions of the segment; but it must also be adequately variable in the centre of the segment to classify species according to variations in the DNA sequences. Such an ideal universal marker does not exist, as life exhibits a wide range of morphological and phylogenetic variation, and increased universality of a marker generally comes at the cost of taxonomic resolution. While the 16S rRNA gene is widely used to classify prokaryotes into taxonomic units, no such consensus exists among eukaryotes, and extant markers must consider several hurdles that arise from this variation.
Primer mismatches, in which a primer does not match the DNA template and fails to amplify it, occur selectively (Elbrecht & Leese, 2012;Tedersoo et al., 2016), resulting in an underestimation of αdiversity (Figure 2b,e). Primer mismatches result in the systematic exclusion of certain clades (Nichols et al., 2018), making diversity estimates primer-specific. For example, soil invertebrate communities exhibit different compositions depending on whether the 28S rRNA gene or the COI gene is sequenced (Dopheide et al., 2019), and diversity estimates may depend on the choice of marker gene (e.g., 18S rRNA or COI gene; Tang et al., 2012), or on the target region selected within the marker gene (e.g., in the 18S rRNA gene; Leasi et al., 2018).
The ability of a marker gene to detect taxonomic α-diversity is further obscured by the relationship between trait-and gene-based taxonomy. On the one hand, varying rates of evolution between different clades result in different taxonomic resolutions. For example, morphological differentiation in recently radiated lineages may be more apparent than genetic differences (Eberle et al., 2020). These may result in the classification of all members of a clade as a single species (Tang et al., 2012;Tedersoo et al., 2016;Figure 2c,e), the underestimation of α-diversity (Leasi et al., 2018), and the potential underestimation of β-diversity. On the other hand, marker genes may be better able to identify cryptic species (Fonseca, 2018).  (Callahan et al., 2017;Edgar, 2018). However, this level of resolution may confound biological variation with sequencing-related artefacts, or capture population-level variation, affecting the ecological interpretations. Whether a single marker region can provide sufficient resolution to accurately characterize community α-diversity has been questioned (Rodriguez-R et al., 2018), and the degree to which universal markers capture diversity is marker-specific ).

| Sampling variation
One of the greatest incentives for using amplicon sequencing is the potential to facilitate synthesis and the comparison among distinct groups of eukaryotic taxa. Spatial structuring has long been recognized in ecological sampling designs in microbial ecology (e.g., spatially explicit designs; Yergeau et al., 2007), and the study of the global distribution of microbes (Fierer & Jackson, 2006;Green et al., 2004;Martiny et al., 2006). The spatial scale of sampling is critical for comparability across studies (Dickie et al., 2018), but has received less attention. The spatial scale of sampling (Dungan et al., 2002) is characterized by volume or area of samples taken (their grain), the spatial extent of a study, and affect the resulting estimates of diversity and community composition. For example, due to DDS, samples that are taken more closely together will typically have more similar compositions, and thus have lower total diversity, than samples that are taken further apart. It is therefore essential to take the grain, distance, and extent of the samples into consideration when comparing across samples which were collected in disparate ways. However, this information is seldom considered (Dickie et al., 2018). In addition, the abundance of soil biota varies over time, but temporal patterns of soil eukaryote diversity are poorly understood (Bálint et al., 2018;Briones, 2018).
The laboratory methodologies used prior to sequencing may also bias soil diversity assessments. Estimates of diversity are influenced by the laboratory protocol used for DNA extraction (Santos et al., 2017). Current protocols for eukaryotic sequencing are nearly identical to those for prokaryotic sequencing (e.g., the Earth Microbiome Project's protocols; Thompson et al., 2017), despite the much wider range of body sizes in soil eukaryotes ( Figure 1). The suggested amount of the soil sample (less than 1 g in most commercial DNA extraction kits) is F I G U R E 2 Issues arising from variation in phylogeny and taxonomy. (a) The ideal scenario for amplicon sequencing-based studies is that an operational taxonomic unit (OTU) is equivalent to a taxonomic species, and that all species are detected with sequencing; however (b) primer mismatches result in the systematic exclusion of certain branches of the phylogenetic tree which are not captured by the selected primer set, resulting in the omission of present taxa. (c) Differing evolutionary rates among clades may result in the clustering of several species into one OTU, and (d) the taxonomic classification of species may greatly differ from the sequence-based classification. (e) All three phenomena may bias estimates of biodiversity; however, the difference between taxonomy and sequence-based classifications may compound the misestimation of diversity, indicated by grey lines also much smaller than that traditionally used in morphological assessments of soil fauna, and this may result in estimates of α-diversity that are lower and have higher between-replicate variability (e.g., in nematodes; Wiesel et al., 2014). Indeed, the finding that body size positively correlates with random variation in community structure (Zinger et al., 2019) may be due to the patchiness that arises from observing large organisms with relatively small samples (De Gruyter et al., 2019). Further studies sampling larger volumes (e.g., extracellular DNA extraction, Taberlet et al., 2012;Zinger et al., 2016) are necessary to determine the extent to which β-diversity is inflated in eDNA data targeting larger soil organisms. F I G U R E 3 Spatial sampling issues that affect average species richness per sample ( ), total richness across all samples (γ), and total richness of an entire site, i.e. both within and outside of the samples (γ site ). All of the expected effects stem from two ubiquitous empirical patterns: the increase of number of taxa with increasing area or volume (taxa-area or taxa-volume relationship) and the Tobler's law (a.k.a. distance decay of similarity) that states that closer locations are more similar in their taxonomic composition that distant ones

| MOVING FORWARD
Despite the long list of biases inherent to eukaryotic amplicon sequencing, certain precautions can be taken to mitigate their impact in ecological studies. We propose a three-step approach to using eukaryotic amplicon sequencing data to study soil communities, and recommendations for future experimental designs.

| Stratify analyses
We advocate for the separation of eukaryotic amplicon sequencing data by group (heretofore stratification) as a way to deal with the wide range of organisms assessed with this technique (Graham et al., 2018). This approach is related to using different marker groups to target specific groups (Oliverio et al., 2020;Tedersoo et al., 2016).
Stratification ensures that detection biases do not propagate to the rest of the community and that observations remain comparable within groups. For example, intragenomic polymorphisms of 18S in nematodes can inflate nematode diversity estimates (Dell'Anno et al., 2015). Here, stratifying the data may ensure that the resulting bias does not affect other soil eukaryotes. There is no consensus on the optimal grouping, and this requires careful consideration.
Historically, soil biota have been grouped according to physical traits, most notably body size (Orgiazzi et al., 2016). Size-based stratification has a phylogenetic component and may overcome errors associated with body size (i.e., multicellularity), but can ignore finer problems derived from genetic differences, such as primer mismatches. Alternatively, stratifications based strictly on phylogeny may more comprehensively account for errors (e.g., Zinger et al., 2019). Growing evidence suggests that traits, including body size, are phylogenetically conserved across the tree of life (Blomberg et al., 2003;Martiny et al., 2015), and size data may not be available a priori. However, most research on trait conservatism in eukaryotes has focused on plants and vertebrates, and whether relevant morphological and genetic features are conserved in soil eukaryotes (dominated by fungi, protists and invertebrates) requires investigation. Whether stratification is necessary may depend on the research question, as analysing organisms with diverse body sizes with amplicon sequencing may be equivalent to assessing the community through estimates of relative biomass, rather than individual abundances (Elbrecht & Leese, 2015;Schenk et al., 2019;Yoccoz et al., 2012), once accounting for the difference in marker gene copies per cell.

| Rarefy separately
In addition to the standard issues associated with the ecological analysis of observational data, sequencing data can be further distorted by the amplification and sequencing processes. Amplification artificially and exponentially increases the number of reads in the original sample, and the sequencer used imposes its own limits on the number of reads. Amplicon sequencing data must therefore be standardized prior to statistical analyses (Quinn et al., 2019).
However, no consensus on the best methodology for standardizing amplicon sequencing data exists, and the optimal method depends on the ecological questions of interest (McKnight et al., 2019) and the characteristics of the data (Weiss et al., 2017). We advocate for rarefaction (see Gotelli & Colwell, 2001), which randomly resamples observations to the same depth as a sensible compromise.
Rarefaction outperforms most bioinformatics methods in compositional analyses (McKnight et al., 2019), and deals well with small sample sizes and variable read depths (Weiss et al., 2017).
When cataloguing all eukaryotes simultaneously, biases arising from morphological and phylogenetic variation may interact to further distort estimates of diversity. We suggest rarefying phylogenetic groups separately, as many of the characteristics which bias abundance estimates (i.e., marker gene copies per cell, organism size, and taxonomic resolution of the marker gene of choice) are phylogenetically conserved (Briones, 2014;Martiny et al., 2015). Consider, for example, a comparison between the diversity in two adjacent soil samples, one which captured a segment of earthworm tissue and a second one which did not. In rarefying both samples to the same observation depth, a high proportion of the reads in the first sample may belong to the earthworm. Consequently, diversity will be underestimated in the first sample relative to the second. Stratifying data may be a good starting point, however further benchmarking is necessary to determine the optimal grouping to reduce biases.
Performing rarefactions separately for each group allows the adjustment of the minimum amount observations of individuals for each group, and prevents the propagation of biases across different groups.

| Consider presence/absence instead of abundance
One way to address the mismatch between number of reads and abundance is to work with binary presence/absence (incidence) data instead of abundances. This approach has been recommended for Many fundamental ecological variables are derived from incidences, and are robust and practically useful (i.e., species richness, incidence-based β-diversity, and their scaling relationships). For example, in traditional ecology, simple incidence-based number of species (richness) can be predictive of biomass productivity and other ecosystem services (Tilman et al., 2014). It has also been shown that the amount of information about an ecological process can be, in some instances, higher in incidence data than in abundance data (Bastow Wilson, 2012;Dale et al., 2001). This is because incidences and abundances can be driven by different ecological processes (Orrock et al., 2000;Potts & Elith, 2006;Wenger & Freeman, 2008).
For example, while extreme winter temperatures or soil type may be a limiting factor determining presence/absence, the actual abundance may depend on variation in microclimates and microhabitats, which cannot be studied with the common destructive sampling practices (Fierer, 2017). In other words, variation of abundances can mask important ecological correlations at the incidence level (Bastow Wilson, 2012).
While converting amplicon sequence data is the most conservative approach to analysing diversity through amplicon sequencing, it is not perfect. Incidences can inflate the importance of rare OTUs (Deagle et al., 2019), which may be artefacts of sequencing errors or an overly-strict OTU similarity threshold. Furthermore, incidence data may not match research objectives, such as when defining a core diet or microbiome (Alberdi & Gilbert, 2019) or when the measurement of interest is taxon biomass, rather than abundance.
Therefore, it is important to ensure that abundance data are retrievable for future reuse. The decision whether to work with abundances or incidences thus depends on the specific hypotheses and research questions.
Using a continuum of diversity measures differentially weighted by abundances, such as Hill numbers (Alberdi & Gilbert, 2019;Hill, 1973) can be a useful alternative. This allows weighting OTUs according to their rarity, offering a mathematically elegant continuum between incidence-and abundance-based measures of biodiversity (Chiu & Chao, 2016). This approach has already been used to quantify diversity in environmental DNA (Calderón-Sanou et al., 2020).

| Standardize sampling
One particularly underappreciated source of methodological variation is how soil samples are arranged spatially, since biodiversity and community composition vary with area, volume, and distance ( Figure 3). A community can be defined for any extent, and all extents may be ecologically meaningful Wiesel et al., 2014). However, communities defined at different spatial scales (both grain and extent) cannot be directly compared ( Figure 3). In soils, combining multiple subsamples from a defined plot, or pooling, is common, as is insufficient reporting of sampling metadata (Dickie et al., 2018). These two practices make accounting for experimental differences or sampling scale a posteriori impossible. The Earth Microbiome Project (Thompson et al., 2017) has pioneered the large-scale standardization of laboratory protocols and the recording of standard environmental metadata. We argue that additional metadata of the spatial and temporal components of sampling should be reported for each sample. This includes the distance among samples, precise reporting of the spatial location, volume, extent, and grain of sampling, and the time of sample collection. With such information, it will be possible to account for differences in sampling strategy statistically, to compare samples across studies, and study the relationship between sample volume, organism size, and the patterns of diversity detected in the heterogeneous soil matrix.

| Account for imperfect detection and false positives statistically
Many of the issues described above concern imperfect detection (i.e., the detection of OTUs that are not present in the sample, or the failure to detect present OTUs; Figures 1 and 2).
Several statistical methods have been developed to deal with the complexities of microbiome data (e.g., SparCC, isometric log ratio transforms, and machine learning algorithms, see Knight et al., 2018).
One additional solution is occupancy modelling (Guillera-Arroita, 2017), a powerful toolset that accounts for the biases caused by imperfect detection (Figure 4), by adjusting for false presences and false absences (Lahoz-Monfort et al., 2016). Occupancy models are hierarchical statistical models that explicitly separate the observed (biased) detections and the unobserved true presences/absences (i.e., occupancy) or abundances. This is possible by having separate submodels for the true occupancy and the observation process ( Figure 4). For the model to work, the data must be informative about the detection process, for example via repeated sampling in time or spatial subsamples (Willoughby et al., 2016). Importantly, there are also variants that can estimate abundance (Kery & Andrew Royle, 2010), work for multispecies (Iknayan et al., 2014), estimate multiple facets of biodiversity (Broms et al., 2015), and can be designed to account for the complex spatial structuring (Chen & Ficetola, 2019).
Occupancy modelling also offers an extension that considers the false positive rate (i.e., the rate at which OTUs which were not present in the sample are detected, Lahoz-Monfort et al., 2016;Royle & Link, 2006). However, complex occupancy models must be informed by further experimental work (e.g., from additional PCR calibrations, using confirmation designs, or in the form of Bayesian priors and plausible limits on the probabilities; see Lahoz-Monfort et al., 2016 for a review). The first promising steps have been made towards the application of multispecies occupancy models to various amplicon sequencing data (Doi et al., 2019;Ficetola et al., 2015;McClenaghan et al., 2020), and we argue that this is a good starting point for data analysis of soil eukaryotes.

| FUTURE DIREC TIONS: B EN CHMARKING AND S TANDARDIZING
While sequencing technologies for eukaryotes can be adopted from prokaryote-based techniques, benchmarking and standardization remain to be done. Empirical studies have shown how sample size (Wiesel et al., 2014), extraction method (Griffiths et al., 2018), and the primers used (Schenk et al., 2019) influence diversity estimates in isolated nematode communities, but less is known about how these procedures affect the diversity estimates of other groups of soil fauna (Marquina et al., 2019). One exception is the effect of DNA extraction and storage on diversity assessments, which has received considerable attention (Delmont et al., 2011;Guerrieri et al., 2020;İnceoǧlu et al., 2010;Zinger et al., 2016). Benchmarking is a formidable challenge, but it is necessary for soil eukaryotic biodiversity assessments. The accuracy of different universal gene markers needs to be compared for each phylogenetic subgroup of soil biota, especially since most studies currently use single markers for universal diversity assessments.
The affinity of any primer pair is likely to be unbalanced across the tree of life, and marker regions differ in their coverage of taxa . One alternative is to simultaneously sequence multiple marker regions, a practice that increases diversity estimates, more closely approximates morphology-based assessments , and may become increasingly accessible as the cost of high throughput sequencing continues to decrease (Eberle et al., 2020). Long-read metagenomic shotgun sequencing may also serve to compare diversity assessments performed with different gene segments, and may help uncover novel biota (Eloe-Fadrosh et al., 2016) and further biases, such as those associated with DNA extraction (Jacquiod et al., 2016).
Additionally, the sensitivity of the resulting sequence fragments for assigning species identities needs to be determined and reported within each phylogenetic group. In a first step, this can be done in silico using extant genetic repositories of fully sequenced individuals. Such benchmarking efforts are essential to characterizing and quantifying biases, within taxonomic groups as well as across all eukaryotes targeted (Elbrecht & Leese, 2015;Thomas et al., 2016), and may aid in selecting the appropriate phylogenetic grouping for stratifying the data prior to analyses.
Assessing soil communities using amplicon sequencing involves countless choices (e.g., sample size, marker gene, primers), which may affect the resulting output, and its comparability to other data. Another, easier way to ensure comparability is with the development of a standard protocol, such as that proposed by the Earth Microbiome Project (Thompson et al., 2017). However, the standardization of methods across studies may never be perfect, and the continued development of several protocols may maximize the discovery rate. Here, the best solution is to ensure that experimental methods, potential biases and deviations from the standard protocol (for example varying sample size or primer sequence) are always reported in the metadata. Guidelines for the deposition of comprehensive experimental metadata have been proposed (Yilmaz et al., 2011), but these do not require the standardized reporting of spatial sampling designs. The archiving of laboratory protocols (Rambold et al., 2019) may offer a more comprehensive paradigm for data reusability. If sufficient metadata are available, then data integration, meta-analyses, and comparison among studies are possible, as methodological biases can be modelled and accounted for a posteriori using statistical models and meta-analytical machinery (Gerstner et al., 2017).
As the cost of next generation sequencing continues to plummet and its throughput continues to rise, amplicon sequencing will probably become an integral part of soil ecology, filling long-standing gaps in the field and improving our understanding of belowground biota . Data created by amplicon sequencing F I G U R E 4 Example of a simple occupancy model that accounts for imperfect detectability when estimating the presence/absence of a single species at three sites (based on Kéry, 2010 andGuillera-Arroita, 2017). The model has two parts: (i) an occupancy submodel, which is an ordinary logistic regression of presence/absence against a covariate, and (ii) an observation submodel that estimates detectability of the species thanks to repeated sampling at each site. Both submodels are fitted at the same time, usually by Markov Chain Monte Carlo (MCMC) or maximum likelihood. Royle and Link (2006) and Lahoz-Monfort et al. (2016) provide generalizations of this model that also account for false positives may be integrated to inform ecological syntheses, and will serve as the record of soil biodiversity for future generations, aiding in the study of the effects of global change on the diversity and dynamics of soil biota. Ensuring that these data remain comparable in the long term is paramount for the both the present and the future of soil ecology.

ACK N OWLED G EM ENTS
This work was funded by the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, which is funded by the German Research Foundation (DFG FZT 118). BKS visit to iDiv was funded by Humboldt Research Award and Australian Research Council (DP190103714). We would like to thank S. Tem for the valuable discussions, and the anonymous reviewers for their constructive feedback. Open Access funding enabled and organized by Projekt DEAL.

CO N FLI C T O F I NTE R E S T
The authors declare no conflicts of interest.