Metabarcoding unsorted kick-samples facilitates macroinvertebrate-based biomonitoring with increased taxonomic resolution, while outperforming environmental DNA

Abstract

Many studies have highlighted the potential of DNA-based methods for the biomonitoring of freshwater macroinvertebrates; however, only a few have investigated homogenisation of bulk samples that include debris to reduce sample-processing time. To explore the use of DNA-based methods in water quality assessment in South Africa, this study compares morphological and molecular-based identification of freshwater macroinvertebrates at the mixed higher taxon and mOTU level, while investigating abundance and comparing mOTU recovery with historical species records. From seven sites across three rivers in South Africa, we collected a biomonitoring sample, an intensive-search comprehensive sample and an eDNA sample per site. The biomonitoring sample was picked and scored according to standard protocols, and the leftover debris and comprehensive samples were homogenised including all debris. DNA-based methods recovered higher diversity than morphology, but did not always recover the same taxa, even at the family level. Regardless of the differences in taxon scores, most DNA-based methods, except some eDNA samples, returned the same water quality assessment category as the standard morphology-based assessment. Homogenised comprehensive samples recovered more freshwater invertebrate diversity than all other methods. The eDNA samples recovered 2 to 10 times more mOTUs than any other method; however, 90% of reads were non-target and, as a result, eDNA recovered the lowest target diversity. Nevertheless, eDNA did find some target taxa that the other methods failed to detect. This study shows that unsorted samples recover the same water quality scores as a morphology-based assessment and much higher diversity scores than both picked and eDNA samples. As a result, there is potential to integrate DNA-based approaches into existing metrics quickly, while providing much more information for the development of more refined metrics at the species or mOTU level, with distributional data that can be used for conservation and biodiversity management.

Introduction

[...] mixed taxon (typically family) level (Fig. S1) and are assigned a quality score based on pollution sensitivity (Dickens & Graham, 2002). The abundance of organisms is roughly estimated into categories (where 1 = 1 individual, A = 2-10, B = 10-100, C = 100-1000, D > 1000) and recorded on the scoring sheet.
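The abundance categories described above can be illustrated with a minimal Python sketch. The function name and the example counts are hypothetical. Because the published ranges share their endpoints (10, 100 and 1000), the sketch assumes each shared endpoint belongs to the lower category; the text does not state this.

```python
def sass_abundance_category(count: int) -> str:
    """Map an estimated number of individuals to a SASS abundance category.

    Assumption: shared range endpoints (10, 100, 1000) are assigned to the
    lower category, since the ranges in the text overlap at those values.
    """
    if count < 1:
        raise ValueError("a recorded taxon must have at least one individual")
    if count == 1:
        return "1"
    if count <= 10:
        return "A"
    if count <= 100:
        return "B"
    if count <= 1000:
        return "C"
    return "D"


# Hypothetical counts for three families picked from one sample.
example_counts = {"Baetidae": 57, "Chironomidae": 1200, "Perlidae": 1}
example_categories = {
    family: sass_abundance_category(n) for family, n in example_counts.items()
}
print(example_categories)
```

These categories are recorded for reference only and would not feed into any score computed from the taxon list.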
Abundance is not used in the SASS calculations, [...]

Following eDNA sampling a SASS sample was taken by an accredited river health practitioner, following the SASS protocol using a 30x30cm framed standard kicknet with a [...]

[...] from each sample were picked, morpho-sorted after scoring and then identified further if possible and counted (here termed "SASS picked" [...]

Following the second PCR, the tagged amplicons were purified using Agencourt AMPure XP beads at 0.8x ratio. The eDNA sample from site E3 was gel cut using the QIAquick Gel [...]

Taxonomy was assigned to remaining mOTUs using an R script to search against both BOLD and NCBI databases. The taxonomy assigned mOTU [...]

Historical species records

The taxa previously recorded for each river / sampling area were extracted from the database of the Albany Museum, Makhanda (previously known as Grahamstown), which houses the largest freshwater invertebrate collection in Africa, including from the sites sampled in this study, and compared to the target mOTUs recovered by molecular methods. Only records identified at the genus or species level were used for comparison. Where available, data on the number of described species for the broader region (Eastern Cape / South Africa / southern Africa as applicable) was added to the analysis.
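One way to picture the comparison with historical records is the short Python sketch below. It is not the procedure used in the study: the record list, the mOTU genus assignments and the choice to compare at genus level are all assumptions made for illustration. Only the rule that records must be identified to genus or species level comes from the text.

```python
# Hypothetical museum records: (taxon name, rank of the identification).
museum_records = [
    ("Baetis harrisoni", "species"),
    ("Simulium", "genus"),
    ("Chironomidae", "family"),  # too coarse, excluded by the rank filter
    ("Afronurus", "genus"),
]

# Hypothetical genus-level assignments of target mOTUs.
motu_genera = {"Baetis", "Simulium", "Cheumatopsyche"}


def genus_of(name: str) -> str:
    """Return the genus part of a genus or binomial species name."""
    return name.split()[0]


# Keep only records identified at genus or species level.
historical_genera = {
    genus_of(name)
    for name, rank in museum_records
    if rank in {"genus", "species"}
}

shared = historical_genera & motu_genera           # found by both sources
only_historical = historical_genera - motu_genera  # recorded but not sequenced
only_molecular = motu_genera - historical_genera   # sequenced but not recorded

print(sorted(shared), sorted(only_historical), sorted(only_molecular))
```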

Sequencing statistics

The MiSeq run yielded 16 million reads from the 68 tagged samples (raw data available from https://doi.org/10.5281/zenodo.3462633). After library demultiplexing, an average of 221,630 (SD = 69,340) read pairs were retained. After bioinformatic processing, a total of 17,660 mOTUs were detected which was then reduced to 5,117 mOTUs that were present in more than one technical replicate. Taxonomy was assigned using BOLD and NCBI, the [...]

Fig. 2. Proportion of reads lost during each processing step, an overview of sequences discarded from raw data, bioinformatics processing and non-target hits (e.g. bacteria) compared to the target macroinvertebrate taxa (green).
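The replicate filter mentioned above (keeping only mOTUs present in more than one technical replicate) could look roughly like the following Python sketch. The function name, the mOTU identifiers and the assumption that the filter is applied per sample are illustrative. The text states neither the number of replicates nor the exact implementation.

```python
from collections import Counter


def motus_in_multiple_replicates(replicates, min_replicates=2):
    """Return mOTU ids detected in at least `min_replicates` replicates.

    `replicates` is a list of sets, one set of mOTU ids per technical
    replicate of the same sample (assumed layout).
    """
    seen = Counter()
    for motus in replicates:
        seen.update(set(motus))  # count each mOTU once per replicate
    return {motu for motu, n in seen.items() if n >= min_replicates}


# Hypothetical detections in two technical replicates of one sample.
example_replicates = [
    {"motu_1", "motu_2", "motu_3"},
    {"motu_1", "motu_3", "motu_4"},
]
kept = motus_in_multiple_replicates(example_replicates)
print(sorted(kept))
```

mOTUs seen in a single replicate (here `motu_2` and `motu_4`) are dropped, which is the kind of reduction described for the 17,660 detected mOTUs.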

When considering the number of mOTUs shared between methods across all sites, only 57 of the 5,117 mOTUs (1.1%) found were shared between all four methods (Fig. 3a). As expected, SASS picked returned mostly arthropods (237 mOTUs), 76% of which were also present in the SASS leftover sample and 84% of SASS picked mOTUs were shared with the Comprehensive sample (Fig. 3a). The eDNA samples showed the least overlap with other [...]

[...] 19.14 ± 3.72) while taxon recovery from the eDNA samples was consistently lower than other methods, between 5 and 16 target families (mean = 10.29 ± 3.82) (Fig. 4a).
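The overlap statistics in this section reduce to set operations on the mOTU lists of each method. The Python sketch below uses invented mOTU identifiers and a hypothetical helper, `percent`, to show how the number shared by all four methods and the share of SASS picked mOTUs recovered elsewhere could be computed. It is not the authors' analysis code.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Hypothetical mOTU sets per method (identifiers are placeholders).
methods = {
    "SASS picked": {"m1", "m2", "m3", "m4"},
    "SASS leftover": {"m1", "m2", "m3", "m5", "m6"},
    "Comprehensive": {"m1", "m2", "m4", "m5", "m7"},
    "eDNA": {"m1", "m8", "m9"},
}

all_motus = set.union(*methods.values())
shared_by_all = set.intersection(*methods.values())
pct_shared_by_all = percent(len(shared_by_all), len(all_motus))

# Share of SASS picked mOTUs that also occur in the leftover debris.
picked = methods["SASS picked"]
pct_picked_in_leftover = percent(len(picked & methods["SASS leftover"]), len(picked))

print(sorted(shared_by_all), pct_shared_by_all, pct_picked_in_leftover)
```

Applied to the counts reported here, `percent(57, 5117)` gives 1.1.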

The relative proportion of reads assigned to each family was positively correlated with abundance when all sites were combined into a single regression (Fig. 6), with a significant but moderate adjusted R² value (R² = 0.557, p < 0.0001). [...] taxa is more apparent and mean mOTU numbers found with each method are considerably more variable (Fig. 7), highlighting the increased information obtainable at a higher taxon resolution.

[...] Table 1, Fig. S3)

Comparing mOTUs in these three rivers against historical records of species from the region highlighted several groups with potentially high levels of cryptic diversity (Table 1) [...]

[...] Albany Museum historical specimen records of species found at the rivers sampled in this study. Regional records for families with number of mOTUs >5 and for those with regional species information readily available.

In this study, we compared four DNA-based methods for recovering biodiversity at mixed higher taxon (typically family level) and the mOTU level in the context of water quality biomonitoring and rapid biodiversity assessment. The methods included a standardised SASS sample split into the picked individuals (SASS picked) and the leftover debris (SASS leftover), a Comprehensive sample searching for a longer time period with a finer mesh size, and an eDNA sample of filtered water from each site. Although the SASS picked samples contained mostly target sequences as a result of the sorting and picking process (Fig. 2), this method also recovered a lower diversity at the mOTU level than the SASS leftover debris which was part of the same sample (Figs. 3 and 7). When comparing morphotaxa with the [...]

The Comprehensive and SASS leftover samples showed similar proportions of non-target reads, and similar compositions of target and non-target taxa detected; however, sample inhibition, due to the substantial debris in both of these sample types, was not apparent, although this was not experimentally validated in our study.

The Comprehensive method found the highest number of target taxa and the highest number of unique target taxa, suggesting that a longer sampling time with a smaller size mesh net, followed by whole sample homogenisation, captures more of the local macroinvertebrate diversity than either the standardised SASS sampling protocol or the eDNA sampling method tested in our study. This increase in mOTU diversity recovered in the Comprehensive sample is potentially due to the inclusion of taxa that would morphologically be considered as "out of season" due to their lifestage (e.g. eggs / small larvae) and otherwise undetectable. [...]

Our eDNA sampling method found between 2 and 10 times more mOTUs than any other method; however, a remarkable proportion of reads were discarded during bioinformatics processing, as many reads were short or non-target reads (e.g. bacteria, Chromista, Plantae and some Chordata). Thus, the number of target taxa detected with eDNA was lower than for all other methods. These non-target reads are likely due to the highly degenerate primers [...]

For most biodiverse countries, DNA reference libraries are still a major limiting factor (e.g. Venter & Bezuidenhout, 2016); however, most taxa can at least be assigned confidently to family level or below, using current data on BOLD, as was the case for the target taxa in this study, suggesting that reference libraries do not hinder the incorporation of DNA metabarcoding into the current SASS protocol.

The results show that while some DNA-based metabarcoding methods detect more diversity than morphology, even at family level, they do not always find the same taxa, as has been [...]

[...] powerful tool in this respect and is able to uncover remarkable mOTU diversity for these groups. Across the sites, 101 chironomid mOTUs were found using the Comprehensive method, while less than a third of that number was picked from the SASS samples as morphospecies, and this diversity is usually recorded as a single family and potentially misrepresented by a relatively low taxon score. A similar pattern was seen within nearly all target groups in this study; in many cases a higher number of mOTUs was found than the number of species known from the province or even country, which is surprising given our limited sampling in only three rivers.

While access to morphological expertise has always limited the level to which taxa, and therefore patterns of biodiversity, can be identified, it is argued that the mOTU approach [...]

[...] because the largest and most diverse groups (e.g. Chironomidae) are grouped together or omitted from analyses.

The advantage of developing a metabarcoding tool now, despite the limited reference libraries in South Africa, is that the data gathered can be "time stamped" and can be [...]

Although there is still much uncertainty and concern with the conceptual and practical