Have the cake and eat it: Optimizing nondestructive DNA metabarcoding of macroinvertebrate samples for freshwater biomonitoring

Abstract DNA metabarcoding can contribute to improving cost‐effectiveness and accuracy of biological assessments of aquatic ecosystems, but significant optimization and standardization efforts are still required to mainstream its application into biomonitoring programmes. In assessments based on freshwater macroinvertebrates, a key challenge is that DNA is often extracted from cleaned, sorted and homogenized bulk samples, which is time‐consuming and may be incompatible with sample preservation requirements of regulatory agencies. Here, we optimize and evaluate metabarcoding procedures based on DNA recovered from 96% ethanol used to preserve field samples and thus including potential PCR inhibitors and nontarget organisms. We sampled macroinvertebrates at five sites and subsampled the preservative ethanol at 1 to 14 days thereafter. DNA was extracted using column‐based enzymatic (TISSUE) or mechanic (SOIL) protocols, or with a new magnetic‐based enzymatic protocol (BEAD), and a 313‐bp COI fragment was amplified. Metabarcoding detected at least 200 macroinvertebrate taxa, including most taxa detected through morphology and for which there was a reference barcode. Better results were obtained with BEAD than SOIL or TISSUE, and with subsamples taken 7–14 than 1–7 days after sampling, in terms of DNA concentration and integrity, taxa diversity and matching between metabarcoding and morphology. Most variation in community composition was explained by differences among sites, with small but significant contributions of subsampling day and extraction method, and negligible contributions of extraction and PCR replication. Our methods enhance reliability of preservative ethanol as a potential source of DNA for macroinvertebrate metabarcoding, with a strong potential application in freshwater biomonitoring.


| INTRODUC TI ON
Freshwater ecosystems are among the most threatened ecosystems in the world, facing numerous pressures associated with pollution, eutrophication, damming and regulation of rivers, water overuse, invasive species and climate change (Craig et al., 2017;Vörösmarty et al., 2010). These drivers are leading to fast biodiversity declines and hindering services provided by freshwater ecosystems (Craig et al., 2017;Vörösmarty et al., 2010). To counteract these trends, national and international regulations have been enacted to protect and rehabilitate freshwater ecosystems, such as the Water Framework Directive (WFD, Directive 2000/60/EC) applied in the European Union. These regulations involve country-specific, long-term and large-scale monitoring programmes, requiring the development of cost-effective methodologies to assess the ecological status of aquatic ecosystems (Birk et al., 2012;Pawlowski et al., 2018).
Currently, freshwater biological assessments around the globe are generally based on the characterization of communities of indicator organisms, which are used to derive biotic indices quantifying the biological quality status (Birk et al., 2012;Pawlowski et al., 2018). For example, assessments in rivers under the WFD include indicator organisms as diatoms, macroalgae and angiosperms, benthic invertebrates and fish (Birk et al., 2012). Typically, the monitoring programmes involve sampling at field sites, sample preparation (e.g. sorting), morphological species identification and quantification, calculation of biotic indices and quality assessment (Pawlowski et al., 2018). Although this approach has been successfully used since the mid-20th century, it is labourintensive and time-consuming, which in many cases may limit the number of sites that can be sampled, and the frequency of sampling (Hajibabaei, Shokralla, Zhou, Singer, & Baird, 2011). The need for morphological identification of organism is particularly troublesome, as this is laborious and requires taxonomic expertise that is often very limited. Also, for many organisms, misidentifications may occur or identifications may be impossible to achieve at the highest taxonomical resolution required for fine ecological assessments, due to difficulties in identifying certain groups and/ or life stages (e.g. larvae of some macroinvertebrates) (Hajibabaei et al., 2011). Given these difficulties and the advent of powerful high-throughput DNA sequencing, there has been an increasing interest in the use of molecular tools in ecosystem assessment (Sweeney, Battle, Jackson, & Dapkey, 2011;Taberlet, Coissac, Pompanon, Brochmann, & Willerslev, 2012), now often referred as biomonitoring 2.0 (Baird & Hajibabaei, 2012).
DNA metabarcoding may be particularly useful in freshwater biomonitoring because it is able to process complex multispecies assemblages, and is potentially faster, lower-priced and more refined than conventional methods (Aylagas, Borja, Irigoien, & Rodríguez-Ezpeleta, 2016;Gibson et al., 2014;Hajibabaei et al., 2011). By combining DNA taxonomic identification, high-throughput sequencing and bioinformatic pipelines, metabarcoding can achieve higher taxonomic resolution and thus potentially higher sensitivity of metrics to fine variations in freshwater ecosystems (Andújar et al., 2018;Carew, Pettigrove, Metzeling, & Hoffmann, 2013;Gibson et al., 2015). Despite its potential, there are still several technical and conceptual challenges associated with the use of DNA metabarcoding in freshwater bioassessment (Leese et al., 2016; detailed revision in Pawlowski et al., 2018), which need to be addressed before it can be mainstreamed into official monitoring programmes such as those undertaken under the WFD (Leese et al., 2016;Pawlowski et al., 2018). In the case of biotic indices based on benthic macroinvertebrates, for instance, one of the problems is the need for pre-processing bulk samples, such as cleaning and sorting of specimens before DNA extraction (Aylagas et al., 2016;Elbrecht, Peinert, & Leese, 2017;Elbrecht, Vamos, Meissner, Aroviita, & Leese, 2017), which increase processing time, costs and possible contamination. Furthermore, DNA extraction from a bulk sample requires its destruction, which may be problematic if the sample is required for other uses, or if it needs to be preserved for a certain period of time due to legal reasons, as currently required by some regulatory agencies.
In this study, we optimize and evaluate procedures for nonde-  Shokralla et al. (2010) and Zizka et al. (2018), and it was expected to be more effective at preserving DNA and bulk samples than the concentration of 70% used in other studies (Elbrecht, Vamos, et al., 2017;Stein, White, Mazor, Miller, & Pilgrim, 2013

| Laboratory procedures
After careful manual shaking, five 2-ml subsamples of preservative ethanol were taken from each macroinvertebrate bulk sample on days 1, 2, 3, 5, 7 and 14 following field sampling and stored at −20ºC until DNA extraction. The subsample volume was chosen as a balance between the objective of representing macroinvertebrate diversity in the bulk sample, and the need to take many replicate subsamples from each bulk throughout the experiment. The duration of the experiment was established based on the range used in other studies (e.g. Linard et al., 2016;Zizka et al., 2018), and considering a prior expectation that results would stabilize in about two weeks. Furthermore, a relatively short period was tested because National Regulatory Authorities need to have water quality information as soon as possible upon field sampling. A higher frequency of subsampling was carried out in the first days because this was the period when we expected the results to change more rapidly.
Prior to DNA extraction, ethanol was completely evaporated using an Eppendorf vacuum concentrator. Genomic DNA was then extracted from each 2-ml preservative ethanol subsample using one out of three extraction methods ( The quantity (ng/μl) and integrity of extracted DNA were assessed using an Agilent 2200 TapeStation system (Agilent Technologies, Inc., California, USA). DNA integrity was evaluated using the DIN (DNA Integrity Number) algorithm estimated with Genomic DNA ScreenTape, which is based on the size distribution of DNA fragments and varies between 1 (highly degraded) and 10 (highly intact).
Library preparation was performed in two steps, adapted from the protocol described by Kircher, Sawyer, and Meyer (2011) and Gansauge and Meyer (2013). First-round PCR amplifications were performed using the reverse primer BR2  and a redesigned forward primer (MARTINS-2019-COI_Fw, 5´-GGNTGAACHGTHTAYCCHCC-3´) from the Ill_C_R  reversed complemented. These primers were used because preliminary in vitro testing showed their ability to consistently amplify Ephemeroptera, Plecoptera, Trichoptera and Odonata (EPTO) taxa, which are widely considered the best macroinvertebrate indicators of freshwater biological quality (Bonada, Prat, Resh, & Statzner, 2006), and for which we had a considerable barcode database from specimens collected within or close to the study area (IBI, CIBIO- At the end of the experiment, the bulk samples were cleaned and sorted, and the WFD-targeted macroinvertebrate taxa were morphologically identified to the lowest possible taxonomic level.
A particular effort was taken to achieve species-level identification for EPTO taxa, since many had been identified at this level from metabarcoding data.

| Bioinformatic analysis
Sequence reads were processed using the OBITools software suite (Boyer et al., 2016), from pairwise alignment to clustering (Bálint, Márton, Schatz, Düring, & Grossart, 2018;Taberlet, Bonin, Zinger, & Coissac, 2018), following procedures detailed in the Supplementry Methods (see Supporting Information). Particular care was taken to remove artefacts resulting from PCR and sequencing errors, including the use of "obigrep" to eliminate sequences with a length outside the expected metabarcode size (310-316), and sequences occurring just once across the data set. The command "obiclean" was used to filter out potentially erroneous sequences compatible with indel or substitution errors, based on their lower frequency of occurrence and similarity to most common sequences (De Barba et al., 2014).
We also removed resulting cluster sequences with ≤0.03% of read coverage in at least one sample and ≤5 reads. Finally, extraction and PCR negative controls were used to filter out potential contaminants.
Each cluster sequence was taxonomically assigned considering three databases: using BLAST searches against NCBI Nucleotide database "nt" (downloaded in September 2017) and our private species database (IBI-InBIO Barcoding Initiative); and using the search engine of the BOLD database (details in Supplementry Methods in Supporting Information). Assignments to species level were required to have a percentage identity of at least 98%, whereas lower identity thresholds were required for assignments to Order or lower taxonomic levels (<92% of identity), Family (≥92%) and Genus (≥95%) levels. Assignments were cross-checked using the three databases, and the best match was retained. All assignments were manually checked for plausibility, including verification of the likelihood of species occurrence close to the study area, using information on species geographic range and occurrence records.
In the case of macroinvertebrate taxa targeted by the WFD in  (Kearse et al., 2012). This approach was used to visually detect spurious sequences that might have passed through the pipeline filtering (including pseudogenes), and define group-specific divergence thresholds. Assuming distance thresholds derived from sequence databases, we considered a distinct phylOTU each cluster of sequences that was separated from all other clusters by ≥5%, except in the case of the Trichoptera and Hemiptera for which we selected a threshold of 3% (further details in Supplementry Methods in Supporting Information). Species and phylOTU data were combined in a single matrix to analyse the diversity and composition of WFD and EPTO taxa communities. Because rare occurrences can result for instance from cross-contamination or tag jumps during the process (Taberlet et al., 2018), species/phylOTU with a read coverage <0.01% were removed from each sample. As the criteria and thresholds to remove rare taxa can influence results, analyses were repeated with the unfiltered taxa matrix, with the exclusion of "singleton" taxa from the matrix (i.e. taxa with only one read) and with the matrix trimmed at 0.03% and 0.05% thresholds. Results are presented considering the 0.01% threshold except where indicated otherwise.

| Statistical analysis
Our study was based on a hierarchically structured design that considered five stages: sampling site (n = 5), subsampling day (n = 6), extraction method (n = 3), extraction replicate (n = 1 in SOIL and BEAD; or n = 3 in TISSUE) and PCR replicate (n = 3). The first three stages are crossed, and the last two are nested within the hierarchical stages above (Schielzeth & Nakagawa, 2013). The experiment thus produced 450 sampling units, of which only 418 were carried out for subsequent analysis once units producing <500 reads of WFD-targeted taxa were discarded. For each sampling unit, we estimated species richness as the total number of WFD taxa detected through metabarcoding. We also used Chao1 estimator of species richness, thereby accounting for differences on sampling effort and sample completeness (Chao & Chiu, 2016 Generalized additive mixed models (GAMM) were used to model variation in each response variable in relation to independent variables and their interactions, which permit detecting nonlinear responses without needing a priori assumptions on the expected shape of such responses, while accounting for the hierarchical structure of the experiment (Wood, 2006). In the fixed component of the GAMM, we considered as independent variables the extraction method, the subsampling day and the interaction between the two. The fixed component also included the number of reads of the target taxa (i.e. WFD or EPTO taxa), thereby accounting for variation introduced by differences in coverage between samples. This approach was used instead of computing a rarefaction curve and truncating data considering a given read count threshold (Taberlet et al., 2018), because explicitly modelling the effects of coverage is increasingly considered a more robust and statistically efficient approach (McMurdie & Holmes, 2014). In GAMMs using either DNA concentration or integrity (DIN) as response variables, the nested random factors included the extraction replicate within extraction method within site. In GAMMs using either species richness or Chao1 diversity as response variables, the random component included the PCR replicate as an additional level nested within the other nested random factors. All GAMMs were built considering Gaussian errors and an identity link, except for observed species richness for which we used Poisson errors and a log link. In all models, we specified a penalized spline smoother with a basis dimension k = 4 for the subsampling day. The number of reads was log-transformed, assuming a stabilization of the effects for high read counts. GAMMs were fitted using the package gamm4 (Wood & Scheipl , 2017) and pl otted using the R ggplot2 package (Wickham, 2016).
To estimate the contributions of treatments and replicates to variation in community composition among units, we adopted the pro- To estimate the percentage of matching between metabarcoding and morphological identification results for each sampling unit, we computed the proportion of taxa identified through morphological analysis that were retrieved through metabarcoding. As deviations between morphology and metabarcoding could also be due to taxa retrieved from the latter that were not detected by the former method, we computed Jaccard index as a measure of overall distance between each molecular sampling unit and the corresponding morphological sample. Separate comparisons were made for identifications at either the family or species level of EPTO, since identifications for most other taxa were often very coarse due to the lack of adequate barcode reference collections. To estimate how percentage matching and the Jaccard index varied in relation to extraction method and subsampling day, we used GAMMs with a fixed component and a nested random structure as described above for species richness. All analysis used morphological identification as the benchmark rather than metabarcoding the bulk sample itself, because we wanted to compare metabarcoding with the standard morphological approaches used in WFD monitoring programmes. In addition, as our bulk samples were collected under a WFD monitoring programme, they need to be preserved for at least five years and so could not be destroyed for bulk metabarcoding.

| DNA concentration and integrity
The concentration of DNA was significantly lower in samples extracted with SOIL than with BEAD extraction protocol (Figure 1a; Note. Subsampling was conducted at 1, 2, 3, 5, 7 and 14 days after field sampling, and DNA extractions were performed using three methods: BEAD, TISSUE and SOIL as described in Table S1. For each model, we provide the parameter estimates, standard errors (SE) and statistical significance of parametric terms, and the effective degrees of freedom (edf) and approximate significance of smooth terms. The shape of the smooth terms is provided in Figure 1. ***p < 0.001. **p < 0.01. *p < 0.05. ns p > 0.05.  Table S3).

| Taxa richness
The mean number of taxa detected per sample was 34 (±13 SD) when using the 0.01% threshold for removing rare species, which was smaller than that obtained from the unfiltered matrix (37 ± 14 SD) and larger than that using the 0.05% threshold (26 ± 9 SD) ( Table S2). The mean observed richness increased significantly with the number of reads obtained for the sample, and it was significantly lower in samples extracted using SOIL and TISSUE than using BEAD (Table 2). There were also significant effects of subsampling day on observed richness of samples extracted with the three methods ( Table 2). The strongest effect was found for TISSUE, with observed richness increasing markedly up to about 10 days and declining slightly thereafter (Figure 3a), though the later decline may be a model artefact due to lack of data on the interval 7-14 days. For BEAD and SOIL, there was an overall trend for observed richness increasing with subsampling day, though with a small decline in the first three days for BEAD, and small fluctuations over time for SOIL (Figure 3a). It is noteworthy that although BEAD was the method detecting most species in average, the difference in relation to TISSUE was mostly apparent within the first 1-3 days, and largely converged thereafter. Alternative criteria of rare species removal produced broadly similar results, though effects were stronger when using the unfiltered taxa matrix and much weaker when using stricter criteria (Table S4). In particular, when using the 0.03% and 0.05% removal criteria, there were no longer significant differences between BEAD and TISSUE, and no significant effects of subsampling day for BEAD (Table S4).
Results using Chao1 were broadly similar, though the decline observed for BEAD in the first few days was stronger for Chao1 than for observed richness estimates ( Figure 3b, Table 2).

| Community composition
The PERMANOVA indicated that by far the largest share of variation in the composition of WFD taxa across samples was due to significant differences among sites (64.2%), while differences in coverage among samples had a very small (0.5%), albeit significant contribution to explained variation (Table 3). Although much lower, the subsampling day and the extraction method had significant effects on community variation, though each contributed <2% to explain such variation. In contrast, extraction and PCR replicates did not contribute significantly to explain variation in community composition. The patterns observed were consistent irrespective of the alternative criteria used to deal with rare species (Table S5).

| Differences between metabarcoding and morphology for EPTO
Overall, most EPTO taxa detected morphologically at sampling sites were also detected at the corresponding sites in at least one mo-

Note.
Extractions were performed from the ethanol used to preserve five unprocessed freshwater macroinvertebrate bulk samples and subsampled on days 1, 2, 3, 5, 7 and 14 after field sampling, using three DNA extraction methods (BEAD, TISSUE and SOIL). For each model, we provide the parameter estimates, standard errors (SE) and statistical significance of parametric terms, and the effective degrees of freedom (edf) and approximate significance of smooth terms. The shape of the smooth terms is provided in Figure 3. The matrix used to build the models for observed richness excluded rare species with a percentage read count <0.01% from each sample; the unfiltered taxa matrix was used for Chao1 (see Methods); models build using alternative criteria for dealing with observed rare species are given in Table S4. ***p < 0.001. **p < 0.01. *p < 0.05. ns p > 0.05. metabarcoding were not detected through morphological identification, either for BEAD (44.6% ± 9.6 SD; 34.6%-57.9%), TISSUE (44.5% ± 9.8 SD; 32.8%-58.8%) or SOIL (47.9% ± 14.6 SD; 28.0%-66.7%) (Table S6).
Both at family and species levels, the Jaccard distance index between metabarcoding and morphology of EPTO increased with the read coverage of the samples and it was significantly lower for BEAD than SOIL but not TISSUE (Figure 4a, c; Table 4). Distances  Table 4). For both the Jaccard distance and percentage matching, the results obtained were similar when using alternative criteria for dealing with rare taxa, though with stronger and more significant results obtained when using unfiltered taxa matrix than when using stricter removal criteria (Tables S7, S8). In particular, significant effects of extraction method disappeared or became very weak when using the 0.03% or 0.05% removal criteria, the same occurring for the temporal trends using BEAD and the 0.05% removal criteria (Tables S7, S8).

| D ISCUSS I ON
Our results confirmed previous studies showing that DNA metabarcoding of 96% ethanol used to preserve freshwater macroinvertebrate bulk samples can provide reliable information on taxa diversity and composition Zizka et al., 2018).
We further show that this information can be obtained even from F I G U R E 3 Variation predicted from GAMMs in (a) observed species richness and (b) Chao1 richness estimates assessed through the metabarcoding of DNA extracted from 96% ethanol used to preserve five unprocessed freshwater macroinvertebrate bulk samples, in relation to ethanol subsampling day and DNA extraction method, while controlling for variation in the number of reads across samples. Subsampling was conducted at 1, 2, 3, 5, 7 and 14 days after field sampling, and DNA extractions were performed using three methods: BEAD (purple, solid line), TISSUE (blue, dashed line) and SOIL (green, dotted line). Temporal variation trend lines are provided with the corresponding standard errors. Summary statistics of the GAMMs are provided in Note. The number of reads was also included to control for variation in coverage among samples. For each term, we provide the degrees of freedom (df), mean sum of squares (MSS), F model ratio (F), r-squared (R 2 ) and p-values. The matrix used to build the model excluded rare species with a percentage read count <0.01% from each sample; models build alternative criteria for dealing with rare species are given in Table  S5. ***p < 0.001. **p < 0.01. *p < 0.05. ns p > 0.05.
unprocessed bulk samples preserved in the field without sorting, and thus mixed with a wide range of potential contaminants and PCR inhibitors originated from sediments and plant material (Schrader, Schielke, Ellerbroek, & Johne, 2012). Similarly to Zizka et al. (2018), we were able to retrieve relatively high diversity of taxa from a wide range of phyla, across nearly all the samples analysed, with a strong representation of freshwater macroinvertebrate taxa considered in the WFD, the main target of this study. Furthermore, information from metabarcoding clearly detected the ecological signal corresponding to marked variations in the composition of macroinvertebrate communities across sampling sites. However, we also show significant effects of technical variants such as DNA extraction methods and timing of ethanol subsampling in relation to the field bulk collection, which can influence community diversity and composition estimates. Overall, our study suggests that metabarcoding of preservative ethanol from bulk samples may provide a promising tool for cost-effective biomonitoring programmes of freshwater benthic macroinvertebrates, though care should be taken with the choice and standardization of methods, from extraction to bioinformatic analysis (Leese et al., 2016;Pawlowski et al., 2018;Zizka et al., 2018).
As previously described for other types of samples (Deiner, Walser, Mächler, & Altermatt, 2015;Hermans, Buckley, & Lear, 2018;Majaneva, Diserud, Eagle, Hajibabaei, & Ekrem, 2018), the DNA extraction method was one of the main technical factors affecting metabarcoding community estimates from preservative ethanol. Overall, our results showed that column-based DNA extraction methods (TISSUE and SOIL) tended to have lower performance compared to the magnetic-based method (BEAD). In fact, TISSUE and SOIL, particularly the latter, resulted in lower concentrations and integrity of DNA than BEAD, detected less taxa and produced larger dissimilarities in relation to the community composition assessed morphologically. These results were robust to the effect of differences in coverage among samples, which was explicitly controlled statistically by incorporating the number of reads as an offset variable in all models. However, effects were less evident when using stricter criteria for removing rare species with low read counts (i.e. species with a proportion of counts in a sample <0.03% or <0.05%), which suggests that differences between methods were influenced to at least some extent for differential ability to detect rare species. The lower performance of column-based than magnetic-based methods was possibly related to the higher probability of DNA extracts to be washed away, while potentially causing higher retention of contaminants (e.g. cell debris, protein, polysaccharides or humic acids) despite the inclusion of an inhibitor removal solution. The performance of SOIL was particularly poor, likely due to the bead-beating cell lysis step. The majority of DNA in the preservative ethanol might be already in its free form and/or partially degraded, and mechanical lysis can be too damaging for its integrity (Hermans et al., 2018;Leray & Knowlton, 2015), thus affecting the number of taxa recovered. The highest performance of BEAD was probably related with (a) higher affinity of magnetic beads to high-molecular-weight genomic DNA, leaving out potential low, degraded DNA present in preservative ethanol; (b) minimize DNA F I G U R E 4 Variation predicted from GAMMs in Jaccard distance (a,b) and percentage matching (c,d) between EPTO community composition estimated from morphological and metabarcoding data, at family (a,c) and species (b,d) levels, in relation to ethanol subsampling day and DNA extraction method, while controlling for variation in the number of reads across samples. Subsampling was conducted at 1, 2, 3, 5, 7 and 14 days after field sampling, and DNA extractions were performed using three methods: BEAD (purple, solid line), TISSUE (blue, dashed line) and SOIL (green, dotted line). Temporal variation trend lines are provided with the corresponding standard errors. Summary statistics of the GAMMs are provided in Table 4 [Colour figure can be viewed at wileyonlinelibrary.com] washout; and (c) lower yields of contaminants. Overall, results suggest that variations observed among extraction methods were probably related to differential procedures for cell lysis and DNA capture.
The timing of ethanol subsampling did not have significant effects on DNA concentration, but there was a significant increase in DNA integrity over time when using TISSUE. In contrast, there were marked significant effects of subsampling day on metabarcoding results. Temporal effects were particularly strong for SOIL and TISSUE, with performance increasingly rapidly during the first seven to ten days after field sampling, and levelling-off or slightly declining thereafter. The later declines, however, could be a model artefact due to the lack of subsampling in days 8 to 13, as well as lack of data beyond day 14. Effects were also marked for SOIL, with a general tendency for increasing performance with subsampling day. Results for BEAD were generally weaker and less consistent than for the other methods, though the highest performance also tended to be obtained for subsampling in the period 7-14 days. These patterns were weaker when using stricter criteria for the removal of taxa with a low proportion of read counts, indicating that they were affected to at least some extent by the ability for detecting more rare species at later subsampling dates. Reasons for these results may derive at least partly from a progressive release of small quantities of DNA from the macroinvertebrates preserved in ethanol, particularly during the first week after field sampling. This release was probably not sufficient to cause appreciable changes in the concentration and integrity of the extracted DNA, but it was likely enough to enhance detection of rarer species and thus increase taxa richness and the similarity between metabarcoding and morphology.
These effects were weaker for BEAD probably because it consistently yielded higher DNA concentration and integrity irrespective of subsampling day, and thus likely retrieved DNA from rare species even at low concentrations in the preservative ethanol. In contrast, the two column-based methods always obtained lower concentrations and integrity of DNA, likely with comparatively higher presence of inhibitors, and thus could only detect rarer species when the concentration of their DNA increased in the preservative ethanol.
Although there were significant effects of extraction method and subsampling day on the estimates of community composition, these effects accounted for about 35 to 45 times less variation than that observed among sampling sites. Furthermore, variation in community composition accounted for by either extraction or PCR replicates was also much smaller compared to variation among sites, and even compared to variation among extraction methods and subsampling day. It should be borne in mind, however, that we selected TA B L E 4 Summary statistics of GAMM models relating Jaccard's distance and percentage matching between the composition of EPTO communities inferred from morphology and metabarcoding, in relation to subsampling day and DNA extraction methods Note. Extractions were performed from the ethanol used to preserve five unprocessed freshwater macroinvertebrate bulk samples and subsampled on days 1, 2, 3, 5, 7 and 14 after field sampling, using three DNA extraction methods (BEAD, TISSUE and SOIL). For each model, we provide the parameter estimates, standard errors (SE) and statistical significance of parametric terms, and the effective degrees of freedom (edf) and approximate significance of smooth terms. The shape of the smooth terms is provided in Figure 4. The matrix used to build the models excluded rare species with a percentage read count <0.01% from each sample; models build using alternative criteria for dealing with rare species are given in Tables S7 and S8. ***p < 0.001. **p < 0.01. *p < 0.05. ns p > 0.05. sites within the same river basin that a priori were expected to have contrasting macroinvertebrate communities due to differences in local habitats, to test the impacts of laboratory procedures in samples reflecting a wide range of ecological conditions. Therefore, the contribution of sampling sites to overall community variation would likely have been lower if we had chosen ecologically more similar sites. Nevertheless, our results suggest that metabarcoding from preservative ethanol can successfully detect at least large variations in community composition among sites, irrespective of the technical alternatives and levels of replication adopted. Finally, it is noteworthy that variation in coverage had a small, albeit significant effect on variation in community composition between samples, which was much smaller than the effect of this variable on species richness estimators.
The percentage of EPTO taxa morphologically identified at sampling sites that were detected through metabarcoding in at least one molecular sample was high, particularly when using BEAD or TISSUE (≈70%-95%). Percentage of matchings, however, were smaller when considering individual subsampling units, particularly at species level, though they were higher in analysis made with BEAD and with samples taken more than seven days after field sampling. The lower matchings in the individual units were probably a consequence of some species having low concentrations of DNA in the ethanol solutions, which were thus difficult to detect systematically due to the relatively small volume of the aliquot used in our study (2 ml). This suggests that larger volumes of ethanol may need to be taken in future studies to detect consistently all the species represented in the bulk (e.g. Zizka et al., 2018). In contrast, Jaccard's distances were relatively high between morphology and metabarcoding, though they also declined when using BEAD and in subsamples taken more than seven days after field sampling. This was because about 40%-50% of taxa detected through metabarcoding were not detected through morphology, irrespective of extraction method. There may be several reasons for metabarcoding missing species detected morphologically, including the lack of a sufficiently extensive DNA barcode reference collection (Elbrecht, Vamos, et al., 2017), which in our case resulted in 40% of reads unassigned to species. Also, primer bias may have caused some taxa to be missed Elbrecht, Vamos, et al., 2017), while high DNA dilution and degradation in the preservative ethanol may have resulted in the loss of rare taxa during subsampling, extraction or PCR amplification. The latter view is supported by the observation that percentage matching was lower for individual subsamples than for subsamples combined, suggesting that DNA from some taxa was present in some subsamples but not in others. The detection of more species by metabarcoding may also be a consequence of contamination, though care was taken during field and laboratory procedures to avoid it as much as possible. Besides the errors induced by metabarcoding, the patterns obtained can also reflect the limitations and errors of the morphological approach itself, making it difficult to compare our results with those of other studies using mock samples where the mix of species was known without error . In fact, our morphological data, as indeed any comparable data based on field sampling under real conditions, may have missed taxa that were detected with metabarcoding, due for instance to taxa misidentification, the inability to identify some small specimens and larval stages to species or even family levels, or the impossibility to detect eventual taxa represented by specimens that were destroyed or overlooked during the collection, sorting and identification processes (Elbrecht, Vamos, et al., 2017).
Overall, our results provide some guidance on future efforts to develop ecological monitoring programmes for freshwaters based on the metabarcoding of macroinvertebrate bulk samples. First, we confirm that DNA extracted from 96% ethanol used to preserve bulk samples in the field may provide a cost-effective approach to characterize freshwater macroinvertebrate communities (Zizka et al., 2018), as it avoids the preprocessing steps (e.g. cleaning and sorting) required when undertaking metabarcoding from the bulks themselves (Aylagas et al., 2016;Elbrecht, Peinert, et al., 2017).
This should minimize the potential for cross-contamination, the costs and time required to obtain the data following field sampling, and thereby potentially allowing an increase in the number of sites sampled. Second, we highlight the importance of obtaining comprehensive reference collections of DNA barcodes for target taxa, as this may greatly influence the results (Ekrem, Willassen, & Stur, 2007;Elbrecht, Vamos, et al., 2017). Though this may be particularly important in the case of indicator organisms such as EPTO, less known but highly diverse groups such as Diptera should not be neglected since they yield a large number of unassigned reads thus contributing to uncertainties in the data (Ekrem et al., 2007;Kwong, Srivathsan, & Meier, 2012). Third, we suggest that the DNA extraction method needs to be carefully selected and that magnetic-based protocols such as BEAD are likely to provide better results than column-based protocols such as TISSUE and SOIL. This is important because commercial column-based extraction kits are currently the most commonly used methods in metabarcoding for freshwater bioassessment studies (Andújar et al., 2018;Carew et al., 2013;Deiner et al., 2015;Gibson et al., 2014;Hermans et al., 2018;Linard et al., 2016), thereby requiring further assessment on the potential of magnetic bead technology (e.g. Leontidou et al., 2018;Krehenwinkel et al., 2018), or even other approaches such as the salting-out protocol (Elbrecht & Steinke, 2019;Elbrecht, Vamos, et al., 2017). Fourth, we suggest that the results of preservative ethanol metabarcoding are significantly improved when subsampling 7-14 days after field collection rather than earlier on, though this is less important when using the more efficient BEAD protocol. Nevertheless, further research is needed on how the timing of subsampling affects metabarcoding results beyond the time frame analysed in our study. Finally, we suggest that when trading-off biological replication (e.g. the number of sites sampled) and technical replication (e.g. the number of extraction or PCR replicates per site) due for instance to human, logistic and financial limitations, it should be duly considered that the former is often the main source of variation in community composition (Mata et al., 2018; this study), and thus that sampling a large number of sites should be essential to obtain an adequate appreciation of ecological variability in freshwater systems. Future studies should complement our research by further evaluating the impacts of additional methodological procedures, including for instance sample preprocessing (ultrasonic irradiation, shaking, freezing; Zizka et al., 2018) and how it affects subsequent steps in time, ethanol concentrations (Elbrecht, Vamos, et al., 2017;Linard et al., 2016), the ratio of ethanol to bulk volumes and the volume of ethanol analysed. Also, studies are needed on the differential recovery of DNA from different taxa due to variation in body characteristics (e.g. soft vs. hard bodied arthropods), which can affect metabarcoding results (Carew, Coleman, & Hoffmann, 2018;Zizka et al., 2018).
These studies should be essential to gain a better understanding of methodological challenges and potential biases throughout the metabarcoding workflow, from the field through the laboratory, to the bioinformatics processing of sequencing data, thereby contributing to standardize protocols to be used in the next-generation biomonitoring of freshwater ecosystems (Elbrecht, Vamos, et al., 2017;Pawlowski et al., 2018).

ACK N OWLED G M ENTS
This article has received funding from the European Union's

DATA ACCE SS I B I LIT Y
Raw sequencing output is provided as FASTQ files in ENA project Accession no. PRJEB31133, sample Accession nos.
ERS3202521-ERS3203000. All data needed to replicate the analyses reported in this study are provided in the Supporting Information Tables, and are also available at the BioStudies repository (https://www. ebi.ac.uk/biostudies/studies/S-BSST241), Accession no. S-BSST241.