msGBS: A new high‐throughput approach to quantify the relative species abundance in root samples of multispecies plant communities

Abstract Plant interactions are as important belowground as aboveground. Belowground plant interactions are however inherently difficult to quantify, as roots of different species are difficult to disentangle. Although for a couple of decades molecular techniques have been successfully applied to quantify root abundance, root identification and quantification in multispecies plant communities remains particularly challenging. Here we present a novel methodology, multispecies genotyping by sequencing (msGBS), as a next step to tackle this challenge. First, a multispecies meta‐reference database containing thousands of gDNA clusters per species is created from GBS derived High Throughput Sequencing (HTS) reads. Second, GBS derived HTS reads from multispecies root samples are mapped to this meta‐reference which, after a filter procedure to increase the taxonomic resolution, allows the parallel quantification of multiple species. The msGBS signal of 111 mock‐mixture root samples, with up to 8 plant species per sample, was used to calculate the within‐species abundance. Optional subsequent calibration yielded the across‐species abundance. The within‐ and across‐species abundances highly correlated (R 2 range 0.72–0.94 and 0.85–0.98, respectively) to the biomass‐based species abundance. Compared to a qPCR based method which was previously used to analyse the same set of samples, msGBS provided similar results. Additional data on 11 congener species groups within 105 natural field root samples showed high taxonomic resolution of the method. msGBS is highly scalable in terms of sensitivity and species numbers within samples, which is a major advantage compared to the qPCR method and advances our tools to reveal hidden belowground interactions.


| INTRODUC TI ON
Our understanding of root distributions is limited compared to our knowledge of the patterning of leaves and shoots. This difference is largely due to methodological challenges as roots of different species can generally not be identified visually. With the introduction of DNA-based detection techniques (e.g. Bobowski et al., 1999;Jackson et al., 1999;Jones et al., 2011;Linder et al., 2000;Mommer et al., 2011), the first steps were taken in opening the "black box of the underground". Until 2008 these techniques were based on classic PCR amplification of nuclear, chloroplast or mitochondrial plant barcode loci, often combined with Sanger sequencing or RFLP (e.g. Bobowski et al., 1999;Brunner et al., 2001;Jackson et al., 1999;McNickle et al., 2008;Ridgway et al., 2003;Wildová, 2004). Individual root segments were identified on the basis of obtained PCR product length, DNA sequence or RFLP pattern. In some studies the species abundances were estimated after identification of numerous single root segments isolated from a single root core (Frank et al., 2015;Kesanakurti et al., 2011). Mommer et al. (2008) and McKay et al. (2008) were the first to introduce quantitative polymerase chain reaction (qPCR) in studies on plant root distributions. Rather than extracting DNA from individual root segments, Mommer et al. (2008) extracted DNA from multispecies root samples. In a four-species model system, the across-species abundance of root samples was estimated by relating the qPCR signals from root mixtures of unknown assembly to the qPCR signals of hand-mixed root samples of equal biomass proportions (i.e. calibration samples). In addition, species-specific primers, rather than universal primers were used. This method was later successfully applied in biodiversity experiments using plant mixtures with up to eight species (e.g. Hendriks et al., 2015;Mommer et al., 2010;Oram et al., 2018;Padilla et al., 2013;Zeng et al., 2017).
Although many successful uses, there are three main drawbacks of using qPCR all connected to the use of species-specific primers; (a) the primer development for each new species and the increased difficulty of it if species are more related; (b) the variable sensitivity of these primers; and (c) each species has to be analysed separately.
These drawbacks inspired us to explore the use of high throughput sequencing (HTS) for the quantification of relative species abundance in mixed root samples.
DNA sequence identification and counts can be used for both species-identification and quantification. Hiiesalu and colleagues (2012) were first to apply HTS in the field of root ecology, using the 454 Life Sciences sequencing platform. Hiiesalu et al. (2012) showed the power of HTS, but the use of a single barcoding marker resulted in insufficient taxonomic resolution; the 37 species identified aboveground were represented by 29 belowground molecular operational taxonomic units (MOTUs). Matesanz et al. (2019) sequenced a 517 bp chloroplast rbcl marker using Miseq to analyse the root proportions of five shrubland dominant species but recorded insufficient biomass versus sequence reads correlations and high false positive rates. Lang et al. (2019) used the combined sequence information of 65 to 71 chloroplast protein coding genes (PCG) within "genome skims" (low-coverage, short-read sequence data sets) to estimate pollen donor proportions within pollen mixtures using.
However, for two out of six pollen donor species the taxonomic resolution was still insufficient. The use of genome skims to map to a small set of genes is very data inefficient, even more when applied on roots which contain much lower number of plastids (Bramham & Pyke, 2017). Peel et al. (2019) described RevMet; 49 wild reference species were represented by genome skims which were mapped to individual long Minion sequence reads derived from mixed species pollen samples. Each read was assigned to a plant species and species proportions calculated from the collection of identified reads.
The method was validated using six replicate mock pollen mixtures of known composition. This elegant approach shows promise but struggled with false positive assignments within one of the two congener plant species pairs. Root and bee pollen grains have in common that they host many Fungi (Brundrett, 2004;Leidenfrost et al., 2020) which influence the taxonomic resolution and the quantification of plant species proportions. Ondov et al. (2019) introduced Mash Screen, a MinHash (Ondov et al., 2016) based approach which enables containment estimates for every NCBI RefSeq genome within every SRA metagenome. Mash Screen has not been validated for quantification of species abundances in plant mixtures but has great potential. While current tests are still limited, the results so far suggest that none of the currently available methods are able to accurately identify all species in mixed samples.
In this paper we describe the application of genotyping by sequencing (GBS) (Elshire et al., 2011) on multiple species root samples (msGBS) which, combined with a gDNA cluster filtering strategy, has the potential of increasing taxonomic resolution. GBS is developed for SNP detection; gDNA is fragmented using endonucleases and a set of two synthetic dsDNA adapters ligated to the fragments. Due to this preparation only a subset of the full genome is PCR amplified.
GBS provides a middle ground between targeted-and whole-genome shotgun barcoding. The sequenced subset is clustered into a relative small reference genome which, in msGBS, is enriched for species unique clusters increasing the taxonomic resolution.
The msGBS method we developed was aimed for two purposes: (a) Quantify within-species abundance in mixed root samples in one single molecular analysis with unprecedented taxonomic resolution; and (b) link the within-species abundance to root biomass across-species abundance using the calibration procedure sensu Mommer et al. (2008). msGBS was succesfully applied by in 't Zandt (2020) to asses local soil legacy effects in a multispecies grassland community.
The monoculture root material was used for the assembly of 13 "monoculture", 20 "calibration" and 111 "mock-mixture" samples: • The 10 calibration samples per species pool were assembled from monoculture root material in equal per species proportions.
• 56 and 55 mock-mixture samples of pool 1 and 2, respectively, were assembled from monoculture root material of each of eight species per pool and varied in proportions from 0% to 50%. gDNA was extracted for these and the "unknown" mixed field plot samples and subsequently analysed by qPCR to quantify relative fine root abundances according to Mommer et al. (2008). Only the "monoculture", "calibration" and "mock-mixture" samples were processed using msGBS (Figure 1). The monoculture samples were processed to assemble the meta-reference and used for downstream meta-reference filtering, the calibration samples were used to calibrate the within-species abundance to across-species abundance and the mock-mixture samples were used for the evaluation of msGBS in terms of correlations (to weighed root biomass and qPCR) and false-positive and negative signals (FPS and FNS). Based on the FPS an analytical detection limit can be introduced (Alberdi et al., 2018;Garrido-Sanz et al., 2020).

| Experimental setup -Dutch field study
For the evaluation of the taxonomic resolution of msGBS we analysed the msGBS relative FPS (rFPS) of 11 congener groups within a field experiment, further referred to as the "Dutch field study".
In this field study aboveground vegetation surveys were compared to the belowground noncalibrated msGBS within-species abundances. Leaves of in total 120 plant species (Table S1) were collected from seven field sites across a 30 km trajectory along the main branch of the river Rhine dike grasslands between the villages Ooij and Tiel in The Netherlands for meta-reference creation. A Braun-Blanquet (Braun-Blanquet, 1932) vegetation survey was performed at two levels in each of the seven field sites. A 5 × 5 m 2 plot survey and 5 1 × 1 m 2 plots within the broader plot.
From each of the 1 × 1 m 2 plots two 40 × 400 mm root cores were F I G U R E 1 The Jena field study experimental setup. The monoculture, calibration and mock-mixture samples are assembled from washed monoculture root material from 2.5 m × 2.5 m monoculture field plots. Two species pools were created each consisting of 8 of the 13 species; these assemblies correlated to two 8-species field plots of the Jena experiment taken and were subdivided in 0-10 cm, 10-20 cm and 20-40 cm depth portions, the replicate samples were combined which, after careful root washing, totalled to 105 "field mixture root" samples.
gDNA was extracted from all leaf and root samples. The collected survey data and noncalibrated msGBS results was used to assess the congener (Table S1) msGBS relative false positive signals (rFPS).

| gDNA extractions and qPCR
gDNA of the Jena field study samples were previously extracted using the DNeasy plant kit (Qiagen). The qPCR methodology and root distributions were previously described in the Jena Trait-Based Experiment papers (Barry et al., 2019;Oram et al., 2018). gDNA of the Dutch field study samples was extracted using the Nucleospin plant II kit (MN).

| msGBS library preparations and sequencing
The GBS protocol, as described by Elshire et al. (2011), was altered regarding the restriction enzymes and the adapter design ( Figure S1 and Table S2). A more detailed lab protocol can be found in the Extended lab protocol section of the supporting information. In total three pooled sequence libraries were constructed; one for the 144 samples of the Jena field study, one for the 122 monoculture leaf samples of the Dutch field study and one for the 105 root samples of the Dutch field study. The Dutch field study samples were equimolar pooled using qPCR. Half a sequence run (lane) was used for the Jena msGBS library and a full sequence run for each of the Dutch field study msGBS libraries.
First, 300 ng of genomic DNA (gDNA) of each sample was digested by two restriction enzymes (PacI and NsiI) after which two indexed adapters were ligated to the DNA fragments. The main change in the adapter design was the incorporation of three random nucleotides per F I G U R E 2 Overview of the msGBS analysis as outlined in the text and supporting documentation. Process 1 (gray) depicts the pre-processing of the sequence reads and produces product 1; the assembled sequence reads. Process 2 (blue) is the creation of the BLASTN filtered meta-reference (product 2). Process 3 (orange) depict the sequence read mapping to produce a BAM alignment file (product 3). Process 4 (yellow) is the identification of PCR duplicates, the conversion of BAM to CSV format and the monoculture-based cluster filtering. Total, per sample per cluster, read counts are stored in a filtered CSV file (product 4). Process 5 (green) starts with the non-calibrated analysis which results in the within-species abundance (product 5a). Next, a calibration key was created from the calibration sample read counts. The calibration key was subsequently used to convert the withinspecies abundance into the across-species abundance (product 5b) adapter for the identification of PCR duplicates within each amplified msGBS library. After the ligation step the samples were pooled, mixed and aliquoted in eight portions per library for practical reasons and to prevent the effect of PCR bias. For the Dutch field study the diglig reactions were equimolar pooled based on a qPCR quantification using the KAPA Library Quantification Kit for HTS (KAPA Biosystems). The aliquots were concentrated (QIAquick, Qiagen), AMPureXP size selected preferring >150 bp DNA fragments (Beckman coulter) and PCR amplified using KAPA HiFi HotStart readyMix (Roche Diagnostics).
The PCR reactions were combined, QIAquick concentrated and quantified using the KAPA Library Quantification Kit for HTS. The final three pooled msGBS libraries were spiked with 10% PhiX DNA to increase the DNA complexity of the library in order to improve the Hiseq colour matrix estimation for which the first 11 sequencing cycles are used overlapping with our index region. Sequencing was performed by Novogene (Hongkong) on a Illumina (USA) Hiseq X-Ten sequencer; 2 × 150 bp paired-end (PE) sequencing reads, each one starting with 3 unique molecule identifiers (UMI) nucleotides and the adapter index.

| msGBS data processing
Computations were executed on a local Linux cluster node in Nijmegen, The Netherlands. Writing and debugging of Python scripts (Python Core Team, 2015) was performed using PyCharm Professional 2017 2.2. R (Suhl et al., 2014) was executed using Rstudio (RStudio Team, 2016). The msGBS data processing can be described by five processes which are outlined in Figure 2: Process 1: Sequence read preprocessing. First, the reads were demultiplexing; the sequence read adapter indices are coupled to the sample name which was added to the read header. The 2 × 3 bp UMI nucleotides were processed and together with the indices stripped from the sequence read and added to the read header ( Figure S1). Next, the reads were inspected for adapter traces and low-quality nucleotides (<Q10) and trimmed when needed. All PE reads were merged (minimum 20 bp overlap) or else joined. The combined merged and joined reads are the assembled reads ( Figure 2, product 1).
Process 2: Meta-reference creation. For each monoculture a de novo assembled reference was created from dereplicated and clustered (with 95% identity) monoculture assembled reads.
The clusters of all monoculture references are combined into a single meta-reference (a digital gDNA sequence database) while retaining original monoculture identifier names. The meta-reference was cleansed from all identifiable non-Eukaryota and Fungi clusters by a local BLASTN search against the NCBI nr database ( Figure S3).
Process 3: Sequence read mapping. The assembled reads from all samples were mapped to the meta-reference. A BAM (sequence F I G U R E 3 Illustration of the monoculture-based cluster filtering ( Figure 2, process 4, step 3). The monoculturebased filtering evaluates the mapped read counts of the monoculture samples. Each cluster is evaluated individually. If a read from a monoculture sample was mapped to a cluster that originated from that monoculture sample this is called a target read count and if mapped to a cluster that originated from another monoculture sample this is called a non-target read count. In this example 1,132 reads were counted for Plantago lanceolata meta-reference cluster Pl_1 which could be split in 1,094 target read counts and 38 (3.5%) non-target read counts. The Pl_1 cluster did pass the all evaluation steps so this cluster was accepted and recorded in the filtered CSV file alignment file) was created ( Figure 2, product 3), in which the read header information was retained.
Process 4: Post-processing of read mapping data. First the UMI's and the mapping alignment scores in the BAM file are processed; sequence reads are marked as "is_duplicate" or "qc_fail", respectively. PCR duplicates are evaluated on a per meta-reference cluster, within sample level. They can cause bias in the analysis as the duplication rate can vary between amplified regions and samples. The BAM file is converted to CSV format; only the total read counts per cluster per sample are retained, reads marked as "is_duplicate" or "qc_fail" are not counted. A minimum total read count threshold of 10 reads per cluster over all samples was set; clusters that failed this criteria were removed from the CSV file.
An important step of the post-processing is the monoculture-based cluster filtering which uses the monoculture read counts in the CSV file to identify and discard between monoculture root sample homologous clusters. Removal of these clusters increases the taxonomic resolution of msGBS. These homologous clusters are plant-born or nonplant-born clusters that are present in multiple monoculture root samples. Monoculture per cluster read counts were either "target read counts" or "non-target read counts" as illustrated in Figure  When a cluster passed all filter steps this cluster, and the reads counts of all samples, was recorded in the filtered CSV file. Finally the total number of read counts, of all filtered clusters combined, were counted for all samples. Jena field study samples for which, in total, less than 1,000 reads were counted (script filter parameter f3 = 1,000) were removed from the filtered CSV file ( Figure 2, product 4).
Process 5: Noncalibrated and calibrated analysis. msGBS filtered CSV data was processed in two steps as illustrated in Table 1. The first step, which was performed for both the Jena and Dutch field study samples, is the noncalibrated analysis in which the per species read counts is divided by the total reads count of the mock-mixture and field root mixture samples, respectively.
This resulted in the within-species abundance ( Figure 2, product 5a). The second step, which was only performed for the Jena field study, is the optional calibration of the within-species abundances.
Since typical gDNA yields vary among species, we expected biomass independent, species-specific variation in the number of reads within samples. To estimate across-species abundance in mixed samples, the within-species abundance thus needed to be calibrated (sensu Mommer et al., 2008). Ten calibration samples per pool, assembled from per species equal proportions of fresh monoculture root biomass, were used to calculate a calibration key.
The calibration key was used to convert the within-species abundance of the mock-mixture samples to across-species abundance TA B L E 1 Final calculations of a mock-mixture sample in non-calibrated and calibrated mode. First the per species read counts of both calibration and mock-mixture samples are divided by total read count. The relative read counts of a mock-mixture sample is the 'within-species abundance' (green) in non-calibrated mode. For calibrated mode the averaged relative read counts of the calibration samples is called the calibration key (blue). In calibrated mode the across-species abundance of a mock-mixture sample is calculated in two steps. First the relative read counts of the mock-mixture sample is divided by the calibration key which results in the calibrated msGBS signal. The across-species abundance (red) is calculated by dividing the per species calibrated signal to the total calibrated signal  More details on the bioinformatics can be found in the extended bioinformatics section of the Supporting Information.

| Statistical analysis
Regression analysis were performed for all comparisons of the biomass-based species proportions the qPCR-and msGBS estimates of relative species abundance. In order to evaluate the between species variation in sequence read mapping counts of the calibration samples we calculated the coefficient of variation (CV). A two-way ANOVA was used to test if the calibrated msGBS and qPCR results were significantly different.

| msGBS library preparations and sequencing
For the Jena field study samples a total of 144 msGBS reactions were pooled into a single msGBS sequencing library ( Figure S2)

| Jena field study msGBS results
The results of the Jena field study msGBS data processing following  Adapter traces were identified and trimmed in 13.8% of the reads.
Of the assembled reads ( Figure 2, product 1) 39% and 61% of the reads were merged and joined, respectively.
Process 2. De novo meta-reference creation. On average 102,155 meta-reference clusters were generated per monoculture root sample; on average one cluster per 12 PE reads which results in a total of 1,328,016 clusters (Table S5). The number of clusters per monoculture varied from 8,475 to 260,885. A positive correlation (R 2 = 0.92) was found between number of processed monoculture reads and generated clusters per species ( Figure S4) which implies that a higher sequencing effort will result in more clusters per species. BLASTN filtering removed 1.1% of the clusters from the meta-reference (Table S6) which were mainly annotated as arbuscular mycorrhizal (AM) fungi and bacteria (60% and 34%, respectively).
Process 3. Sequence read mapping. In total 62% (Table S4) of the assembled reads of monoculture-, calibration-and mock-mixture samples were recorded in the BAM file. That a high proportion of reads did not map to the meta-reference can be caused by (a) the absence of a homologous cluster due to low monoculture sequencing effort or BLASTN filtering; or (b) reads that were too short for mapping after low quality nucleotide trimming.
Process 4. On average 23.3% of the reads were marked as "is-duplicate" and/or "quality-fail" in the BAM file. The extracted CVS file counted 86,443,367 reads (Table S4) which were mapped to 726,605 clusters (Table S7).
The monoculture-based cluster filtering evaluated the read counts of the 9,446,804 monoculture reads (Table S4) to the 726,605 remaining clusters in the CSV file. In total 29.4% of the clusters (213,367) were retained in the filtered CSV file (Table S7). Indeed, there was large variation in between species read counts per unit of root biomass (Tables S9A and S10A). The averaged read counts varied from 609 to 13,298 in pool 1 and 315 to 4,597 in pool 2. This again illustrates why across-abundances within root samples cannot be based directly on read counts. As in the qPCR method of Mommer et al. (2008), the calibration procedure is sensitive for signal variation between the calibration samples, due to errors in weighing equal tiny fresh biomasses by hand. Specifically, msGBS also requires comparable relative read counts between the calibration samples (Table S9B and S10B). Outlier values, containing one or more values that deviated more than 2.5 STD, were detected (sample 1 of pool 1 and sample 15 and 18 of pool 2) and removed (Tables S9B and S10B, Figures S5 and S6). Removal of outlier "calibration" samples is standard procedure in the old qPCR method and justified because of the sensitive root weighing procedure.
The biomass-based species proportions of the pool 1 and 2 mock-mixture samples was compared to the msGBS-and qPCR across-species abundances (sensu Mommer et al., 2008) ( Figure 5a,b). In general, we found comparable correlations ranging from R 2 = 0.84 to 0.97 for msGBS and R 2 = 0.94 to 0.98 for qPCR (Table S11 and (Table S13 and S14). In pool 2 Plantago lanceolata and Holcus lanatus had a relative high average msGBS FPS (4.6% and 3.5%, respectively). For pool 1 and 2 no msGBS FNS were found in calibrated mode. The msGBS FPS in calibrated mode were comparable to those in noncalibrated mode (0.42% and 0.46%, respectively, Table S8).

| The effect of cluster filtering
The combined effect of the BLASTN-and monoculture-based cluster filtering on the msGBS results was evaluated by comparing the msGBS and qPCR across-species abundances of the mock-mixture samples of pool 1, with-and without the combined filtering steps (

| The influence of species assembly on the calibration key
To assess the influence of species assembly on the calibration key values and the regression model slopes of the biomass-based species proportions and noncalibrated msGBS estimated within-species abundance we compared these values of the three species present in both pools ( Figure S7). Due to the species assembly the slope shift ranged from 0.01 to 0.03, the calibration key shift ranged from 0.01 to 0.04 (Table S16).

| Dutch field study msGBS results
On average 44,044 meta-reference clusters were generated per monoculture leaf sample; on average one cluster per 76 PE reads, more reads compared to the one cluster per 12 PE reads of the Jena study monocultures. This can be explained by two reasons: (a) the Vsearch minuniquesize parameter which was increased from 2 to 3 because of the higher per monoculture sequence read input; and (b) the fact that the Jena study monocultures derived from root, rather than leaf material which is expected to contain more Bacteria and Fungi. The second reason is hypothesized to be the cause of the difference in the percentage of leaf monoculture-and root field sample assembled reads retained in the filtered CSV file; 18% and 6%, respectively (Table S4).
The noncalibrated msGBS data of the Dutch field study was used to evaluate the congener specificity; does the monoculture based cluster filtering effectively identify between congener species homologous clusters and sufficiently increase the taxonomic resolution? To answer this question we looked at the msGBS rFPS of 'absent' species (based on extensive field surveys), which is assumed to be caused by the presence of congener species. This rFPS is calculated by dividing the msGBS signal of the "absent" congener specie(s) by the msGBS signal of the "present" congener specie(s). The analysis was based on five congener pairs, five congener triplets and one congener quartet. For congener triplets and quartets comparisons were performed in all available F I G U R E 4 Biomass-based species proportions compared to the msGBS in non-calibrated and calibrated mode. Correlation of biomassbased species proportions to the msGBS-non-calibrated within-species abundance and calibrated across-species abundance of the mockmixture samples of species pool 1 (ab) and 2 (cd). Regression line slopes and correlations are inserted as a table combinations (Table 3). For five congener combinations we found less than 0.5% rFPS and for three congener combinations between 0.5% and 3% rFPS. For the remaining three congener combinations we found a higher rFPS ranging from 14.53% to 43.96%. A closer look at the actual field survey data and their msGBS within-species abundances was used to discus these high rFPS signals ( Figures S8   and S9).

| D ISCUSS I ON
As one of the very few molecular techniques to quantify relative species abundance in mixed root samples, the qPCR method of Mommer et al. (2008) produces robust results but also has its limitations. Here we present a new molecular method that solves these drawbacks by:

TA B L E 3
The performance of msGBS and RevMet in terms of rFPS within congener species groups. The average signal for RevMet is the bee pollen species abundance (%) and for msGBS it is the non-calibrated within-species root abundance of the Dutch field study root samples. N is the number of samples included in the comparison  (Peel et al., 2019) only limited congener data is available but so far high rFPS are reported for one of two tested congener pairs. Smart application of filtering strategies or MinHash (Ondov et al., 2019)

| Meta-reference assembly
The number of GBS reference clusters generated depends on the sequencing effort, the restriction enzyme choice, clustering parameters and genome related properties of a species. In msGBS, the restriction enzyme choice cannot be optimized per species. For the Jena field study data we observed a high variation in the number of clusters generated per species ranging from 8,475 for Geranium pratense to 260,885 for Phleum pratense. However, this was strongly correlated to the sequencing effort. For the Dutch field study, were we aimed for 3M sequence reads per sample, we observed much less between species variation in cluster numbers. Despite the large variation, sufficient clusters were yielded for the Jena field study to allow robust estimation of the across-or within-species abundance. Overall, we do not regard cluster number variation as a fundamental problem since the mock-mixture reads are all mapped to the same set of clusters.

| BLASTN filtering
The processing of the Jena field study meta-reference BLASTN output led to the removal of only 1% of the clusters. Removed clusters were predominantly annotated to AM Fungi and Bacteria (60% and 34%, respectively). Many clusters could not be identified because of the incompleteness of the NCBI nr database used. A demonstration of this is that >99% of the removed AM Fungi clusters had a hit against Rhizophagus irregularis strain DAOM_181602 = DAOM_197198; the only AM Fungi (Tisserant et al., 2013) of which genome-scale sequence information is present in the NCBI nr database. The monoculture material of the Dutch field study, for meta-reference assembly, was collected from aboveground leaf material to prevent unnecessary interference with soil biota.

| Mapping
We used assembled (merged and joined) reads for mapping instead nonassembled reads. We believe that, for msGBS, assembled reads is preferable; some fragments are in the size region were 20 bp overlap, needed for a read to merge, is just present for some read pairs but not for others resulting in merged and joined variants of the same locus. During clustering those variants are not collapsed and therefore result in more than one cluster. The mapping of assembled reads prevents bias as they will only map to either the merged or nonmerged variant cluster of that locus.

| Monoculture-based cluster filtering
Evaluation of the per cluster read counts in the CSV file showed that it was quite common that monoculture reads of multiple species were mapped to a single meta-reference cluster. Monoculture-based cluster filtering identifies clusters with relative high nontarget mapping. Nontarget mapped reads can be caused by (a) between species homologous clusters; (b) clusters that originated from root-or rhizosphere microbiota; (c) nontarget roots present in the monoculture plots or laboratory environment pollution; and (d) tag-or index jumping (Schnell et al., 2015) although this is mainly a problem in library types that have blunt-end ligation steps in the wet protocol. Oram et al. (2018)  For the Jena field study, we accepted a maximum of 6.7% (f2 = 15) nontarget reads resulting in high between msGBS and qPCR correlations, acceptable FPS, no FNS and minimal sample loss due to filter f3 (1,000 reads). For the Dutch field study, were we had a higher monoculture sample read average, we accepted a maximum of 0.33% (f2 = 300) nontarget reads resulting in low rFPS and only two discarded samples due to filter f3 (2000 reads).

| msGBS in noncalibrated and calibrated mode
Our results showed that msGBS in noncalibrated mode resulted in

| False positive and false negative signals (FPS, FNS)
No FNS was detected within the Jena field study mock-mixture sam-

| msGBS taxonomic resolution
Taxonomic resolution is an unresolved issue in plant taxonomy studies due to high homologies between closely related congener species and is further complicated by a plethora of natural hybrids.  (Baltisberger & Hörandl, 2016). Within the Rumex congener triplet; visual inspection ( Figure S9) of the R. thyrsiflorus and R. acetose signal showed interference in both directions. For R. crispus no significant rFPS was detected from the other two species, this again corresponds to their phylogenetic relatedness (Schuster et al., 2015). More RevMet congener data is needed to properly evaluate the RevMet taxonomic resolution but in the single case were a direct comparison with msGBS could be made (rFPS between R. repens and R. acris) the latter showed an improved taxonomic resolution.

| Genetic variation and natural hybrids
Genetic variation within species may be another source of error.

| msGBS application on multispecies samples
The origin of the monoculture roots, from which the calibration sam- To minimize the chance that, in a natural field setting, species are missing in the meta-reference the monoculture material for the creation of the meta-reference is best collected aboveground and throughout the year. Species with latent presence in the field plot in the form of seeds or tubers will not interfere with the msGBS signals as roots are first washed from the soil core.
Significant FPS signals from missing species are only expected when the species are present in the form of roots and when closely related to species in the meta-reference.
Sampling representative pure species-specific fine root tissue in high diversity plant communities in natural field settings will often be difficult to impossible. We have shown that even without the preferred calibration, msGBS can provide meaningful results on quantitative distribution differences for the species in the plant community. For example, the within-species relative to total sequence read mapping counts can be compared between samples of different locations and soil depth. In this way, the distribution of roots of a single species in the soil column can be compared to soil type, soil heterogeneity and the presence of other species. Likewise, the degree of clustering of roots in the horizontal plane may unravel spatial niches belowground that cannot be derived from aboveground patterns because roots generally have a much wider range than shoots. Although root quantities cannot be compared between species, root distributions can, by which positive or negative associations may be unraveled related to questions of species competition and facilitation. These new opportunities for studying belowground community assembly in relation to environmental change now open up even for most diverse plant communities such a species-rich grasslands (Frank et al., 2010;Kesanakurti et al., 2011) and tropical forests (Jones et al., 2011).
In conclusion, our results highlight msGBS in calibrated mode as a novel, robust and cost-effective approach to estimate across-species abundances in mixed root samples. We showed that msGBS can as well be used in noncalibrated mode to estimate within-species abundances in high diversity plant communities when the arduous assembly of calibration samples is not preferred. msGBS has a high taxonomic resolution and is well able to distinguish congener species.
However, the genetic distance between closely related congener species approaches to the within-species genetic distance and the genetic gap is in some cases filled by a spectrum of hybrid variants. Although msGBS was developed with plant roots in mind the methodology is applicable to other sample types like pollen-or diatom mixtures.

ACK N OWLED G EM ENTS
We thank the gardener team of the Jena Experiment and many field assistants and student helpers for maintaining the field. We are grateful for the support of Katie Barry