Environmental DNA metabarcoding: Transforming how we survey animal and plant communities

The genomic revolution has fundamentally changed how we survey biodiversity on Earth. High‐throughput sequencing (“HTS”) platforms now enable the rapid sequencing of DNA from diverse kinds of environmental samples (termed “environmental DNA” or “eDNA”). Coupling HTS with our ability to associate sequences from eDNA with a taxonomic name is called “eDNA metabarcoding” and offers a powerful molecular tool capable of noninvasively surveying species richness in many ecosystems. Here, we review the use of eDNA metabarcoding for surveying animal and plant richness, and the challenges in using eDNA approaches to estimate relative abundance. We highlight eDNA applications in freshwater, marine and terrestrial environments, and in this broad context, we distill what is known about the ability of different eDNA sample types to approximate richness in space and across time. We provide guiding questions for study design and discuss the eDNA metabarcoding workflow with a focus on primers and library preparation methods. We additionally discuss important criteria for the bioinformatic filtering of data sets, with recommendations for increasing transparency. Finally, looking to the future, we discuss emerging applications of eDNA metabarcoding in ecology, conservation, invasion biology and biomonitoring, and how eDNA metabarcoding can empower citizen science and biodiversity education.


| INTRODUCTION
Anthropogenic influences are causing unprecedented changes to the rate of biodiversity loss and, consequently, to ecosystem function (Cardinale et al., 2012). Accordingly, we need rapid biodiversity survey tools for measuring fluctuations in species richness to inform conservation and management strategies (Kelly et al., 2014). Multispecies detection from DNA derived from environmental samples (termed "environmental DNA" or "eDNA") with high-throughput sequencing ("HTS"; Box 1) is a fast and efficient method for surveying species richness in natural communities (Creer et al., 2016). Bacterial and fungal taxonomic richness (i.e., richness of microorganisms) is routinely surveyed using DNA metabarcoding, which is a powerful complement to conventional culture-based methods (e.g., Caporaso et al., 2011; Tedersoo et al., 2014). Over the last decade, it has been recognized that animal and plant communities can be surveyed in a similar fashion (Taberlet, Coissac, Pompanon, Brochmann, & Willerslev, 2012; Valentini, Pompanon, & Taberlet, 2009).
Many literature reviews summarize how environmental DNA (eDNA) can be used to detect biodiversity, but they focus on single-species detections, on richness estimates from community DNA (see Box 1 for a definition and for how this differs from, and can be confused with, eDNA) or on general aspects of using eDNA to detect biodiversity in a specific field of study (Table S1). To complement these many recent reviews, we concentrate here on four aspects: a summary of eDNA metabarcoding studies on animals and plants to date; knowns and unknowns surrounding the spatial and temporal scale of eDNA information; guidelines and challenges for eDNA study design (with a specific focus on primers and library preparation); and emerging applications of eDNA metabarcoding in the basic and applied sciences.

| SURVEYING SPECIES RICHNESS AND RELATIVE ABUNDANCE WITH eDNA METABARCODING
Conventional physical, acoustic and visual-based methods for surveying species richness and relative abundance have been the major ways we observe biodiversity, yet they are not without limitations.
For instance, despite highly specialized identification by experts, identification errors are common in some taxonomic groups (Bortolus, 2008; Stribling, Pavlik, Holdsworth, & Leppo, 2008). Conventional physical methods can also have destructive impacts on the environment and on biological communities (Wheeler, Raven, & Wilson, 2004), making them difficult to apply in a conservation context. Furthermore, when a species' behaviour or size makes it difficult to survey (e.g., small-bodied or elusive species), conventional methods can require specialized equipment or species-specific observation times, making species richness and relative abundance estimates for entire communities intractable (e.g., many amphibians and reptiles; Erb, Willey, Johnson, Hines, & Cook, 2015; Price, Eskew, Cecala, Browne, & Dorcas, 2012). These limitations highlight the continued need to develop improved ways to survey global biodiversity, and the unique ways in which eDNA metabarcoding can complement conventional methods.

| Species richness: eDNA metabarcoding compared with conventional methods
Environmental DNA metabarcoding can complement (and overcome the limitations of) conventional methods by targeting different species, sampling greater diversity and increasing the resolution of taxonomic identifications (Table 1). For example, Valentini et al. (2016) demonstrated that, for many different aquatic systems, the number of amphibian species detected using eDNA metabarcoding was equal to or greater than the number detected using conventional methods.
When terrestrial haematophagous leeches were used as collectors of eDNA (the blood of their hosts), eDNA metabarcoding detected endangered and elusive vertebrate species and served as a valuable complement to camera trap surveys in a remote geographic region (Schnell, Sollmann, et al. 2015). In plants, Kraaijeveld et al. (2015) demonstrated that eDNA metabarcoding of filtered air samples allowed pollen to be identified with greater taxonomic resolution than visual methods.
The ways that eDNA can complement and extend conventional surveys are promising, but the spatial and temporal scale of inference is likely to differ between conventional and molecular methods.
For example, in a river, Deiner, Fronhofer, Mächler, Walser, and Altermatt (2016) showed on a site-by-site basis that eDNA metabarcoding resulted in higher species detection than a conventional physical-capture method (i.e., kicknet sampling; Table 1). However, eDNA in this case may have detected greater species richness at a site not because the species themselves were present, but because their DNA had been transported from another location upstream, creating an inference challenge in space and time for eDNA species detections. Therefore, research is needed to understand the complex spatiotemporal dynamics of the various eDNA sample types (Figure 1), about which we currently know very little. In addition, all sampling methods have inherent biases caused by their detection probabilities. Detection probabilities often vary by species, habitat and detection method (e.g., the mesh size of a net or a primer's match to a target DNA sequence), and the use of bias-corrected species richness estimators will be important to account for these biases when conducting statistical comparisons between the outcomes in measured richness (Gotelli & Colwell, 2011; Olds et al., 2016).

Community DNA
DNA is isolated from bulk-extracted mixtures of organisms separated from the environmental sample (e.g., soil or water).

Macro-organism environmental DNA
Environmental DNA originating from animals and higher plants.

Barcoding
First defined by Hebert et al. (2003), the term refers to taxonomic identification of species based on single-specimen sequencing of diagnostic barcoding markers (e.g., COI, rbcL).

Metabarcoding
Taxonomic identification of multiple species from a mixed sample (community DNA or eDNA) whose DNA has been PCR-amplified and sequenced on a high-throughput platform (e.g., Illumina, Ion Torrent).

High-throughput sequencing (HTS)
Sequencing techniques that allow simultaneous analysis of millions of sequences, in contrast to the Sanger method, which processes one sequence at a time.

Community DNA metabarcoding
HTS of DNA extracted from specimens or whole organisms collected together, but first separated from the environmental sample (e.g., water or soil).

Molecular Operational Taxonomic Unit (MOTU)
Group identified through the use of clustering algorithms and a predefined percentage sequence similarity (e.g., 97%; Blaxter et al., 2005).

Since the inception of high-throughput sequencing (HTS; Margulies et al., 2005), the use of metabarcoding as a biodiversity detection tool has drawn immense interest (e.g., Creer et al., 2016; Hajibabaei et al., 2011). However, there is not yet clarity regarding which source material is used to conduct metabarcoding analyses (e.g., environmental DNA versus community DNA). Without a clear distinction between these two source materials, differences in sampling and in laboratory procedures can affect the subsequent bioinformatics pipelines used for data processing and complicate the interpretation of spatial and temporal biodiversity patterns.
Here, we seek to clearly differentiate between the prevailing source materials and their effects on downstream analysis and interpretation for environmental DNA metabarcoding of animals and plants compared with community DNA metabarcoding.
With community DNA metabarcoding of animals and plants, the targeted groups are most often collected in bulk (e.g., soil, malaise trap or net), and individuals are removed from other sample debris and pooled together prior to bulk DNA extraction (Creer et al., 2016). In contrast, macro-organism eDNA is isolated directly from an environmental material (e.g., soil or water) without prior segregation of individual organisms or plant material from the sample, and it implicitly assumes that the whole organism is not present in the sample. Of course, community DNA samples may contain DNA from parts of tissues, cells and organelles of other organisms (e.g., gut contents, cutaneous intracellular or extracellular DNA). Likewise, macro-organism eDNA samples may inadvertently capture whole microscopic nontarget organisms (e.g., protists, bacteria). Thus, the distinction can at least partly break down in practice.
Another important distinction between community DNA and macro-organism eDNA is that sequences generated from community DNA metabarcoding can be taxonomically verified when the specimens are not destroyed in the extraction process: sequences can then be generated from voucher specimens using Sanger sequencing. As samples for eDNA metabarcoding lack whole organisms, no such in situ comparisons can be made. Taxonomic affinities can therefore only be established by directly comparing the obtained sequences (or bioinformatically generated molecular operational taxonomic units, MOTUs) with taxonomically annotated sequences, such as those in NCBI's GenBank nucleotide database (Benson et al., 2013) or BOLD (Ratnasingham & Hebert, 2007), or in self-generated reference databases of Sanger-sequenced DNA (… Sønstebø et al., 2010; Willerslev et al., 2014). Then, to at least partially corroborate the resulting list of taxa, comparisons are made with conventional physical, acoustic or visual-based survey methods conducted at the same time, or with historical survey records for a location (see Table 1).
The difference in source material between community DNA and eDNA therefore has distinct ramifications for interpreting the spatial and temporal scale of inference for the biodiversity detected. From community DNA, it is clear that the individual species were found at that time and place. For eDNA, however, the organism that produced the DNA may be upstream of the sampled location (Deiner & Altermatt, 2014); the DNA may have been transported in the faeces of a more mobile predatory species (e.g., birds depositing fish eDNA; Merkes, McCalla, Jensen, Gaikowski, & Amberg, 2014); or the organism may have been present previously but no longer be active in the community, with detection arising from DNA shed years to decades earlier (Yoccoz et al., 2012). This means that the scale of inference both in space and in time must be considered carefully when inferring the presence of a species in the community from eDNA.

In Table 1, "complementary" means that the two survey methods detected different diversity, but does not exclude that some of the diversity was detected by both methods. "Higher diversity" means that the study detected more diversity with eDNA than with conventional methods, but does not exclude that some taxa were detected by only one of the two methods. "Better taxonomic resolution" means that sequence-based identifications could be resolved to a lower taxonomic rank than with the conventional method.

Future methodological comparisons could also benefit from a quantitative ecological approach to sampling design, matching sampling effort and scope between eDNA and conventional methods. Multimethod species distribution modelling or site occupancy modelling is one way to achieve this and has been demonstrated in comparisons of qPCR for a single species with conventional methods (Hunter et al., 2015; Rees, Bishop, Middleditch, et al. 2014; Schmelzle & Kinziger, 2016; Schmidt, Kery, Ursenbacher, Hyman, & Collins, 2013), but rarely for eDNA metabarcoding. Thus, we expect that the robustness of species richness estimates for animals and plants from eDNA metabarcoding will be improved by coupling distribution or occupancy modelling with studies that determine the scale of inference in space and time for an eDNA sample (Figure 1).

| Species relative abundance: eDNA metabarcoding compared with conventional methods
Estimating a species' relative abundance using eDNA metabarcoding is an intriguing possibility. Here, we focus on the evidence from animals in aquatic systems. Studies based on the detection of a single animal species, whether controlled experiments in small systems such as aquaria and mesocosms (e.g., Minamoto, Yamanaka, Takahara, Honjo, & Zi, 2012; Moyer, Díaz-Ferguson, Hill, & Shea, 2014; Pilliod, Goldberg, Arkle, & Waits, 2013; Thomsen, Kielgast, Iversen, Wiuf et al., 2012) or field studies in natural freshwater systems (e.g., Doi et al., 2017; Lacoursière-Roussel, Côté, Leclerc, & Bernatchez, 2016) and marine environments (Jo et al., 2017; Yamamoto et al., 2016), demonstrate that eDNA can be used to measure relative population abundance with a species-specific primer set and qPCR. While many more controlled experiments are needed in all ecosystems to determine the relationship between abundance and the copy number observed in qPCR, the evidence thus far from water samples indicates that eDNA contains information about a species' relative abundance.
Overall, evidence that eDNA metabarcoding can yield abundance information for whole communities is still lacking, but some studies in aquatic environments have shown positive relationships between the relative number of reads and relative or rank abundance estimated with conventional methods. Evans et al. (2016) showed in mesocosms containing fishes and an amphibian that the relative abundance of individuals and biomass were correlated with relative read abundance. In a natural lake, Hänfling et al. (2016) found that read abundance for fish species was correlated with rank abundance derived from long-term monitoring, and positively correlated with gillnet surveys conducted at the same time as eDNA sampling. In deep-sea habitats, Thomsen et al. (2016) found that when reads for fish were pooled at the rank of family, they were correlated with the relative abundance of individuals and biomass captured in trawls. While these examples are promising, not all studies support such findings (e.g., Lim et al., 2016).
Challenges to accurate abundance estimation through eDNA metabarcoding stem from multiple factors in the field and the laboratory (Kelly, 2016). In the field, the copy number of DNA arising from an individual in an environmental sample is influenced by the "ecology of eDNA" (i.e., its origin, state, fate and transport; Barnes & Turner, 2016). Because different animal and plant species are likely to have different rates of eDNA production or "origin" (Klymus, Richter, Chapman, & Paukert, 2015), different rates of "transport" from other locations (… Deiner & Altermatt, 2014) and different stability or "fate" of eDNA over time (Bista et al., 2017; Yoccoz et al., 2012), the amount of eDNA in an environmental sample may not track a species' true local and current abundance. Therefore, continued research on how the origin, state, fate and transport of eDNA influence estimates of relative abundance is needed to understand the error this may introduce into abundance estimates.
In the laboratory, primer bias driven by mismatches with the target has been shown to skew the relative abundance of DNA amplified from mock communities (Elbrecht & Leese, 2015; Piñol, Mir, Gomez-Polo, & Agustí, 2015). The same mechanism could alter the relative abundance of a species' DNA amplified from eDNA (Figure 2). Primer bias increases the variance of observed read abundance relative to true abundance in an environmental sample (Figure 2). Another source of error is related to library preparation methods. Analysis of mock communities has shown that the amount of subsampling during processing steps can drive the loss of rare reads (Leray & Knowlton, 2017), and this likely occurs for eDNA samples as well (Shelton et al., 2016). The combination of primer bias and library preparation procedures alone could cause a large variance in the reads observed for any given species and could prevent the detection of rare species altogether (Figure 2). Technical approaches and potential solutions to alleviate primer bias, as well as alternative library preparation methods, are discussed in the "Challenges in the field, in the laboratory and at the keyboard" section. In the end, eDNA metabarcoding may not be the most accurate method for simultaneously measuring the relative abundance of multiple species, so researchers should consider whether it is accurate enough for a particular study or applied setting. Other methods such as capture enrichment are being examined and are promising because they avoid PCR and hence the bias it may cause, but they require extensive knowledge of the biodiversity to design targeted gene capture probes and come at a greater cost for analysis (Dowle, Pochon, Banks, Shearer, & Wood, 2016).

| Freshwater ecosystems
Mounting evidence suggests that the spatial and temporal scale of inference for eDNA sampled from surface water differs between rivers and lakes (Figure 1). Specifically, river water reflects species richness at a larger spatial scale … than lake surface water does (Hänfling et al., 2016).
Differences between lake and river eDNA signals may be due to the transport of eDNA over larger distances in rivers compared with the longer retention times of water in lake systems (Turner, Uy, & Everhart, 2015). However, lakes and ponds with river and surface run-off inputs, combined with lake mixing or stratification, may serve as eDNA sources for catchment-level terrestrial and aquatic diversity estimates similar to rivers. No studies to date have estimated the sources of eDNA in surface water from a lake's catchment and related them to the diversity locally occurring in the lake. However, ancient DNA from lake sediment cores (sedaDNA) has been used to determine historical plant (e.g., Pansu, Giguet-Covex, Ficetola, et al. 2015; Parducci et al., 2013) and livestock communities …

In addition to surface freshwater (~1%), groundwater (~30%) and ice (~69%) comprise much of Earth's freshwater (Gleick, 1993).
While these other freshwater habitats far surpass surface water in volume, their extant biodiversity is poorly described (Danielopol, Pospisil, & Rouch, 2000). Groundwater is known to harbour a wide range of specialist taxa that are difficult to assess using conventional survey methods because these habitats are inaccessible (Danielopol et al., 2000). Groundwater metabarcoding studies of microorganisms have shown high fungal (Sohlberg et al., 2015) and bacterial (Kao et al., 2016) diversity … (Valiere & Taberlet, 2000). Environmental DNA metabarcoding of water from glacial run-off will also likely be a valuable tool for surveying animal and plant richness in glacial and subglacial habitats, which are undergoing dramatic change because of climate warming (Giersch, Hotaling, Kovach, Jones, & Muhlfeld, 2017).

| Marine ecosystems
The use of eDNA metabarcoding is often described as challenging in marine ecosystems because of the potential dilution of eDNA in large volumes of water and additional abiotic factors (salinity, tides, currents) that likely affect eDNA transport and degradation (Foote et al., 2012; Port et al., 2016; Thomsen et al., 2012b), not to mention the logistics involved in undertaking such surveys. Nonetheless, eDNA metabarcoding surveys of marine fish from coastal water samples have demonstrated that eDNA can detect greater taxonomic diversity than conventional survey techniques (Table 1), while also improving the detection of rare and vagrant fish species and revealing cryptic species otherwise overlooked by visual assessments (O'Donnell et al., 2017; Port et al., 2016; Thomsen et al., 2012b, 2016). Marine mammals have been surveyed with both acoustic methods and eDNA metabarcoding, and here the conventional acoustic methods detected greater species richness (Foote et al., 2012). However, this study used low sample volumes compared with other marine studies (15-45 ml vs. 1.5-3.0 L), and the authors concluded that larger sample volumes would likely lead to greater similarity between eDNA and conventional methods. In Monterey Bay, California, water sampled from depths shallower or deeper than 200 m was used to detect marine mammals such as seals, dolphins and whales, in addition to many fishes and sharks (Andruszkiewicz et al., 2017). The taxonomic groups detected were spatially distinct and were found largely in water associated with their expected habitat.
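To put the volume difference in perspective, the ratio below uses only the ranges quoted above (our own back-of-the-envelope arithmetic, not a figure reported by Foote et al.): the other marine studies filtered roughly 33 to 200 times more water per sample, pairing the smaller large volume with the larger small volume for the lower bound.

```latex
\frac{1.5\ \mathrm{L}}{45\ \mathrm{ml}} = \frac{1500\ \mathrm{ml}}{45\ \mathrm{ml}} \approx 33,
\qquad
\frac{3.0\ \mathrm{L}}{15\ \mathrm{ml}} = \frac{3000\ \mathrm{ml}}{15\ \mathrm{ml}} = 200
```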
Longitudinal transport of animal and plant eDNA in marine environments is not well studied. However, as with sediment cores from lakes, vertical transport into marine sediments is likely to preserve a large proportion of eDNA from particulate organic matter or eDNA that has become directly adsorbed onto sediment particles.
This adsorption shields nucleotides from degradation (particularly oxidation and hydrolysis) and facilitates long-term preservation of genetic signals over potentially large spatiotemporal scales (Figure 1).
Marine sediment eDNA concentrations have been shown to be three orders of magnitude higher than seawater eDNA concentrations (Torti, Lever, & Jørgensen, 2015), and eDNA from both ancient and extant communities is typically recovered (Lejzerowicz et al., 2013). As with lake sediments, marine sediments can accumulate genetic information from both terrestrial and pelagic sources (Torti et al., 2015).

| Terrestrial and aerial ecosystems
Environmental DNA from terrestrial sediment cores is a valuable tool for investigating past environments and reconstructing animal and plant communities (Figure 1; Haouchar et al., 2014; Jørgensen et al., 2012; Willerslev et al., 2003). Animal remains also provide opportunities to reconstruct past trophic relationships. For example, eDNA metabarcoding of pellets in herbivore middens has been used to identify species in ancient animal and plant communities (Figure 2; Murray et al., 2012), and DNA traces from microplant fossils within coprolites were used to reconstruct former feeding relationships in rare and extinct birds (Wood et al., 2012). Here again, the recent reviews of Brown and Blois (2016) and Pedersen et al. (2015) provide a more extensive overview of how ancient DNA is used to uncover past animal and plant communities.
In modern environments, eDNA isolated from topsoils has been used to characterize biodiversity in earthworms (Bienert et al., 2012; Pansu, De Danieli, Puissant, et al. 2015) … and are not necessarily using eDNA sources from faecal DNA to estimate species richness of terrestrial communities. Boyer, Cruickshank, and Wratten (2015) proposed that faeces from generalist predators can act as "biodiversity capsules" and that analysis of this eDNA source should enable biodiversity surveys of prey communities in landscapes. While all of these sources are available, most of these sample types (e.g., leaves from a tree, faecal pellets, spider webs and dust) do not have a known scale of inference in space and time. A single eDNA sample from these sources is unlikely to confirm species richness beyond a local scale, but combining multiple sample sources (e.g., leaves, faecal pellets and spider webs throughout a park) sampled over time may allow spatial and temporal estimates of terrestrial species richness.
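Combining several samples into one richness estimate can be illustrated with an incidence-based, bias-corrected estimator of the kind recommended earlier for method comparisons. The sketch below uses the bias-corrected Chao2 estimator, one widely used option; the review does not prescribe a particular estimator, and the taxon names are hypothetical placeholders. Each sample contributes only presence or absence, so the estimate does not depend on read counts, whose relationship to abundance is uncertain.

```python
from collections import Counter


def chao2_bias_corrected(samples):
    """Bias-corrected Chao2 richness estimate from incidence data.

    `samples` is a list of collections, each holding the taxa detected in one
    eDNA sample (e.g., one leaf swab, faecal pellet or spider web).
    """
    m = len(samples)
    if m == 0:
        return 0.0
    # Number of samples in which each taxon was detected (incidence, not reads).
    incidence = Counter(taxon for sample in samples for taxon in set(sample))
    s_obs = len(incidence)
    q1 = sum(1 for n in incidence.values() if n == 1)  # taxa found in exactly one sample
    q2 = sum(1 for n in incidence.values() if n == 2)  # taxa found in exactly two samples
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


# Hypothetical detections from four samples collected across one site.
samples = [
    {"taxon_A", "taxon_B", "taxon_C"},
    {"taxon_A", "taxon_B"},
    {"taxon_A", "taxon_D"},
    {"taxon_A", "taxon_E"},
]
print(chao2_bias_corrected(samples))  # 6.125 (5 taxa observed)
```

Three of the five observed taxa occur in only one sample, which pushes the estimate above the observed count; adding samples in which those taxa recur would shrink the correction.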
Surveys of airborne eDNA have placed greater emphasis on the detection of bioaerosols that cause infection or allergic responses in animals and plants (West, Atkins, Emberlin, & Fitt, 2008). For example, Kraaijeveld et al. (2015) investigated airborne pollen that can cause hay fever and asthma in humans and showed that the source of allergenic plant pollen could be identified more accurately using eDNA from pollen filtered from the air than by microscopic identification. A particularly interesting area for further research is the scale of inference for air samples in space and time (Figure 1). Although plant eDNA can be recovered from air, to our knowledge surveys of other groups such as birds and insects from airborne eDNA have not yet been tested.

| CHALLENGES IN THE FIELD, IN THE LABORATORY AND AT THE KEYBOARD
FIGURE 3 Important guiding questions for consideration in the design and implementation phases of an environmental DNA metabarcoding study.

Despite the obvious power of the approach, eDNA metabarcoding is affected by a host of precision and accuracy challenges distributed throughout the workflow in the field, in the laboratory and at the keyboard (Thomsen & Willerslev, 2015). Following study design (e.g., hypothesis/question, targeted taxonomic group; Figure 3), the current eDNA workflow consists of three components: field, laboratory and bioinformatics. The field component consists of sample collection (e.g., water, sediment, air), with samples preserved or frozen prior to DNA extraction. The laboratory component has four basic steps: (i) DNA is concentrated (if this was not done in the field) and purified; (ii) PCR is used to amplify a target gene or region; (iii) unique nucleotide sequences called "indexes" (also referred to as "barcodes") are incorporated using PCR or ligated onto different PCR products, creating a "library" whereby multiple samples can be pooled together; and (iv) pooled libraries are sequenced on a high-throughput machine (most often the Illumina HiSeq or MiSeq platform). The final step after laboratory processing is to computationally process the output files from the sequencer using a robust bioinformatics pipeline (Figure 3, Box 2). Below, we emphasize important and rapidly evolving aspects of the eDNA metabarcoding workflow and give recommendations for reducing error.
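Step (iii) only works if every sample in a pool carries a distinguishable index combination. Below is a minimal sketch of a pre-pooling check; the sample names, index sequences and the minimum Hamming distance of 3 are illustrative assumptions, not values from the review. A distance of at least 3 means that a single sequencing error in an index cannot turn one sample's tag into another's.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def check_index_design(sample_indexes, min_distance=3):
    """Flag sample pairs whose index combinations are too similar to pool.

    `sample_indexes` maps a sample name to a tuple of index sequences
    (e.g., a forward and a reverse index). Returns (sample1, sample2, distance)
    for every pair whose summed Hamming distance is below `min_distance`.
    """
    problems = []
    for (s1, idx1), (s2, idx2) in combinations(sorted(sample_indexes.items()), 2):
        distance = sum(hamming(a, b) for a, b in zip(idx1, idx2))
        if distance < min_distance:
            problems.append((s1, s2, distance))
    return problems


# Hypothetical sample sheet with an accidental duplicate index pair.
sheet = {
    "pond_1": ("ACGTAC", "TTGACC"),
    "pond_2": ("ACGTAC", "TTGACC"),
    "river_1": ("TGCATG", "AACTGG"),
}
print(check_index_design(sheet))  # [('pond_1', 'pond_2', 0)]
```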

| In the field
As for any field study, the study design is of paramount importance … (Torti et al., 2015). It is clear that at least some freshwater eDNA comes from intact cellular or organellar sources, because it has recently been shown to be available in the genomic state. Thus, eDNA in water exists in both undegraded and degraded forms.
Continued research on the origin, state and fate of eDNA will greatly inform strategies for its acquisition (filtering, replication, sample volumes and spatial sampling strategies; Barnes & Turner, 2016). Many methods for addressing the current challenges of false negatives (e.g., biological replicate sampling, improved laboratory methods) and false positives (e.g., negative controls) in the field are explored in a recent review (Goldberg et al., 2016); we therefore refer readers to that review rather than treating those topics in depth here.

| In the laboratory
… (Hebert, Ratnasingham, & de Waard, 2003), and a two-locus combination of rbcL and matK as the plant barcode (Hollingsworth et al., 2009), with ITS2 also suggested as a valid plant barcode marker (Chen et al., 2010). However, there are limitations to using the standard barcoding markers in macro-organism eDNA metabarcoding. For COI, other DNA regions are commonly used instead because not all taxonomic groups can be differentiated to species equally well (Deagle, Jarman, Coissac, Pompanon & Taberlet, 2014) and because it is challenging to design primers in this gene for amplicons short enough for short-amplicon analysis, although some suitable regions have been identified (Leray et al., 2013). The most common alternative markers are mitochondrial ribosomal genes such as 12S and 16S, or protein-coding genes such as cytochrome b (Table S2). For the plant barcoding loci, the two loci can be amplified independently, but it is not always possible to recover which fragment of one gene belongs with which fragment of the other in an eDNA sample, which makes species identification with the standard plant barcode challenging. Bioinformatic methods can help resolve these situations to some extent and may work when diversity in a sample is low (Bell et al., 2016). Therefore, often one or dif… Additionally, some rapidly evolving noncoding loci, such as ITS rRNA, are used (Table S2).

BOX 2 BASIC BIOINFORMATIC PIPELINE FOR EDNA METABARCODING FOR PLANTS AND ANIMALS
Bioinformatic processing of sequence data is one of the most critical aspects of eDNA metabarcoding studies, as it substantiates the research findings that follow from the field and laboratory components. Standardization of bioinformatics in a "pipeline" can ensure the quality and reproducibility of findings; however, some level of customization is required across studies to accommodate advances in sequencing technology, software workflows and the question being addressed. Turning raw read data into a list of taxa therefore requires multiple quality assurance steps, some necessary and others optional. Reaching an absolute consensus on approaches and software is not necessary, as these will always be in flux, but we advise careful consideration of at least the following preprocessing steps for HTS data before embarking on further analyses (e.g., biodiversity estimates and tests of statistical significance). We focus primarily on processing Illumina-generated data sets; many of the bioinformatic tools and much of the advice highlighted here transfer to preprocessing of data produced on other platforms, although details may differ.

Terms
Chimeras
PCR artefacts made of two or more combined sequences, formed during the extension step of PCR amplification.

Phred quality score
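A per-nucleotide quality score for Illumina sequencing that encodes the probability P that a base call is incorrect, as Q = -10 log10(P); for example, Q20 corresponds to a 1% and Q30 to a 0.1% error probability.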

Sequence merging
Combining forward (R1) and reverse (R2) reads from paired-end (PE) sequencing, using criteria such as minimum overlap or quality score.

Sequence trimming
Removing the beginning or end of sequencing reads, either by searching for a specific sequence (removal of adaptors, indexes and primers) or based on quality score.

Singletons
MOTUs that appear only once in the data are likely to be rare taxa, false positives, low-level contamination or unremoved chimeras, and should be treated with appropriate consideration.
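A minimal sketch of how dereplication, MOTU clustering at a fixed similarity threshold (97%, as in Box 1) and singleton removal fit together. Assumptions, stated loudly: reads have already been trimmed to a common length, so identity is computed position by position without alignment; real tools (e.g., VSEARCH or UPARSE-style clustering) use pairwise alignment; and the minimum MOTU size of 2 reads is a study-specific choice, not a fixed rule.

```python
from collections import Counter


def identity(a, b):
    """Fraction of identical positions; ASSUMES equal-length, pre-trimmed reads."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_motus(reads, threshold=0.97):
    """Greedy abundance-sorted clustering: each unique sequence joins the first
    existing centroid it matches at >= threshold, otherwise it founds a new MOTU."""
    uniques = Counter(reads)  # dereplication: sequence -> read count
    motus = {}  # centroid sequence -> total reads assigned to that MOTU
    for seq, count in sorted(uniques.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in motus:
            if identity(seq, centroid) >= threshold:
                motus[centroid] += count
                break
        else:
            motus[seq] = count
    return motus


def drop_singletons(motus, min_reads=2):
    """Remove MOTUs supported by fewer than `min_reads` reads."""
    return {c: n for c, n in motus.items() if n >= min_reads}


# Hypothetical 100-bp reads: a variant 2 bp away from the dominant sequence
# (98% identity) and an unrelated sequence seen only once.
dominant = "A" * 100
variant = "A" * 98 + "CC"
unrelated = "C" * 100
reads = [dominant] * 5 + [variant] * 2 + [unrelated]
motus = cluster_motus(reads)
print({c[:5]: n for c, n in motus.items()})  # {'AAAAA': 7, 'CCCCC': 1}
print({c[:5]: n for c, n in drop_singletons(motus).items()})  # {'AAAAA': 7}
```

Processing unique sequences from most to least abundant lets abundant, probably genuine sequences become centroids, so rarer variants (possibly PCR or sequencing errors) are absorbed into them rather than forming MOTUs of their own.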

Primer-adaptor trimming
Preliminary steps of bioinformatics processing include demultiplexing of the samples based on the indices used (unique nucleotide tags incorporated into raw sequence data) and trimming (i.e., removal) of the adaptor sequences. The adaptors are specific DNA fragments that are added during library preparation for ligation of the DNA strands to the flow cell during Illumina sequencing. Additionally, the index sequences themselves and the primer sequences should be trimmed (e.g., using software such as Cutadapt, Trimmomatic, QIIME), allowing either zero or a low level of mismatch between the exact sequence of the primer or index and the observed reads.
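The demultiplexing and primer-trimming logic can be sketched as follows. This is an illustration, not a replacement for Cutadapt or similar tools, and it assumes an inline design in which each read starts with a sample index followed by the forward primer; in many Illumina designs indexes are read separately and demultiplexing happens before this step. The index and primer sequences are hypothetical, and the defaults (zero index mismatches, one primer mismatch) reflect the "zero or a low level of mismatch" advice above.

```python
def mismatches(a, b):
    """Count differing positions; a length difference also counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def demultiplex_and_trim(reads, sample_indexes, primer,
                         max_index_mismatch=0, max_primer_mismatch=1):
    """Assign reads to samples by a leading inline index, then strip index and primer.

    Returns (per-sample trimmed reads, unassigned reads). Reads that match no
    index, match more than one index within tolerance, or lack the primer
    are left unassigned.
    """
    assigned = {name: [] for name in sample_indexes}
    unassigned = []
    for read in reads:
        hits = [name for name, idx in sample_indexes.items()
                if mismatches(read[:len(idx)], idx) <= max_index_mismatch]
        if len(hits) != 1:
            unassigned.append(read)
            continue
        name = hits[0]
        rest = read[len(sample_indexes[name]):]
        if mismatches(rest[:len(primer)], primer) > max_primer_mismatch:
            unassigned.append(read)
            continue
        assigned[name].append(rest[len(primer):])
    return assigned, unassigned


# Hypothetical 4-bp indexes and a 6-bp primer.
sample_indexes = {"site1": "ACGT", "site2": "TGCA"}
primer = "GGATCC"
reads = ["ACGTGGATCCTTTT",   # site1, exact primer
         "TGCAGGATCACCCC",   # site2, one primer mismatch (tolerated)
         "AAAAGGATCCGGGG"]   # unknown index
assigned, unassigned = demultiplex_and_trim(reads, sample_indexes, primer)
print(assigned)    # {'site1': ['TTTT'], 'site2': ['CCCC']}
print(unassigned)  # ['AAAAGGATCCGGGG']
```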

Merging or end trimming
Sequences from Illumina runs tend to drop in quality towards the 3′ end of the reads, as phasing leads to increased noise (and lower signal) in later chemistry cycles. The quality scores of reads should therefore be reviewed to make informed decisions about the appropriate length of end trimming (single-end runs), merging (paired-end runs) and subsequent sequence quality filters. Visualizing the quality scores of raw reads or demultiplexed sequences (using software such as FastQC) will help with the selection of downstream quality cut-off levels.
When paired-end (PE) sequencing is used for an amplicon of suitable size, the forward (R1) and reverse (R2) reads should be combined (merged) to form the complete amplicon. Using merged sequences improves accuracy because the lower-quality bases at the tail end of one read can be corrected using the overlapping bases of the other. The minimum overlap between R1 and R2 reads should be specified, and "orphan" reads with little or no overlap between forward and reverse pairs can be discarded. Inspecting the quality scores, as described above, can help estimate optimal parameters for merging R1 and R2 reads. Although no consensus exists yet, an overlap of at least 20 bp is often selected (Deiner et al., 2015; Gibson et al., 2015).
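A minimal Python sketch of read merging follows. It assumes the amplicon is longer than each read (so R2 never reads past the start of R1) and that Phred scores are already decoded to integers. It scans overlaps from longest to shortest, accepts the first whose mismatch fraction is acceptable, and resolves disagreements by keeping the base with the higher quality score. The function name, the 20 bp minimum overlap and the 10% mismatch tolerance are illustrative choices, not the behaviour of any particular merging program.

```python
from typing import List, Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, q1: List[int], r2: str, q2: List[int],
               min_overlap: int = 20, max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge R1 with reverse-complemented R2, or return None for an 'orphan' pair."""
    r2_rc = reverse_complement(r2)
    q2_rc = q2[::-1]  # qualities follow their bases when R2 is reversed
    for overlap in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        start = len(r1) - overlap
        tail_r1, head_r2 = r1[start:], r2_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail_r1, head_r2))
        if mismatches > max_mismatch_frac * overlap:
            continue
        # Within the overlap, trust whichever read has the higher Phred score
        consensus = "".join(
            a if qa >= qb else b
            for a, b, qa, qb in zip(tail_r1, head_r2, q1[start:], q2_rc[:overlap])
        )
        return r1[:start] + consensus + r2_rc[overlap:]
    return None
```

For a 40 bp amplicon read as R1 = bases 1 to 30 and R2 = the reverse complement of bases 11 to 40, the two reads overlap by 20 bp and merge back into the full amplicon; a low-quality miscall in the R1 tail is replaced by the higher-quality R2 base.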

Quality filtering
For most HTS platforms, a Phred score is calculated for each base and subsequently used to determine the maximum error probabilities (Bokulich et al., 2013). Selected strategies include filtering based on a minimum Phred score cut-off, usually set at 20 or 30 (Bista et al., …), or on the maximum expected error, which is also derived from Phred scores. The lower the maximum expected error, the stricter the cut-off. A maximum expected error of 1 or 0.5 is commonly selected in macro-organism studies (Bista et al., 2017; Pawlowski et al., 2014; Port et al., 2016).
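The relationship between Phred scores and the maximum expected error can be made concrete with a short sketch. It assumes Phred+33 encoding (the FASTQ convention used by current Illumina software) and computes the expected number of errors in a read as the sum of its per-base error probabilities, P = 10^(−Q/10). The function names and default cut-offs are illustrative.

```python
from typing import List


def decode_phred33(quality_line: str) -> List[int]:
    """Convert a FASTQ quality string (Phred+33) into integer Phred scores."""
    return [ord(char) - 33 for char in quality_line]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def expected_errors(scores: List[int]) -> float:
    """Expected number of incorrect bases in a read: the sum of per-base error probabilities."""
    return sum(error_probability(q) for q in scores)


def passes_max_ee(scores: List[int], max_ee: float = 1.0) -> bool:
    """Keep a read only if its expected errors do not exceed the cut-off."""
    return expected_errors(scores) <= max_ee


def passes_min_q(scores: List[int], min_q: int = 20) -> bool:
    """Alternative filter: every base must reach the minimum Phred score."""
    return all(q >= min_q for q in scores)
```

As a worked example, a 100 bp read with a uniform Q30 has 100 × 0.001 = 0.1 expected errors and passes either a 1 or a 0.5 cut-off, whereas ten bases at Q10 alone already contribute about one expected error.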
Additionally, for single-end sequencing, or when long amplicons without sufficient overlap between the forward and reverse reads are used, trimming should be performed from the low-quality (3′) end. Reads are often trimmed to a common length, which facilitates downstream alignment and minimizes miscalled bases when a merging step cannot be used.
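End trimming and truncation to a common length can be sketched as two small functions. This is an assumption-laden illustration with hypothetical names and thresholds: bases are removed from the 3′ end while their Phred score is below a threshold, then reads shorter than the chosen length are discarded and the rest are cut to exactly that length.

```python
from typing import List, Optional, Tuple

Read = Tuple[str, List[int]]  # (sequence, Phred scores)


def trim_3prime(seq: str, scores: List[int], min_q: int = 20) -> Read:
    """Remove trailing bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], scores[:end]


def truncate(seq: str, scores: List[int], length: int) -> Optional[Read]:
    """Cut a read to a common length; reads that are too short are dropped (None)."""
    if len(seq) < length:
        return None
    return seq[:length], scores[:length]
```

Applying `trim_3prime` before `truncate` means that a read whose 3′ end is poor enough to fall below the common length is removed, rather than kept with its low-quality tail.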

Removing short reads
Many studies also remove short reads from the data set before clustering, as high length variation could influence the clustering process (see the USEARCH manual; Edgar, 2010). These sequences could result from sequencing primer dimers that have not been removed (Pawlowski et al., 2014). Different studies select a variety of minimum read lengths, from very short (20 bp), to medium (60–80 bp; Pawlowski et al., 2014; Shaw et al., 2016), up to 100 bp (Bista et al., 2017; Gibson et al., 2015; Hänfling et al., 2016; Pawlowski et al., 2014). Note that some demultiplexing or quality filtering workflows automatically set a minimum sequence length when processing input data, so it is advisable to check whether such a parameter is included by default.
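Inspecting the read-length distribution before choosing a minimum length can reveal, for example, a cluster of very short reads that might be primer dimers. The sketch below assumes reads are plain strings; the function names are hypothetical, and the 100 bp threshold in the test simply echoes the upper end of the range cited above rather than a recommendation.

```python
from collections import Counter
from typing import Dict, List, Tuple


def length_histogram(seqs: List[str]) -> Dict[int, int]:
    """Number of reads at each length, sorted by length."""
    return dict(sorted(Counter(len(s) for s in seqs).items()))


def length_filter(seqs: List[str], min_len: int) -> Tuple[List[str], int]:
    """Keep reads of at least min_len bases; also report how many were removed."""
    kept = [s for s in seqs if len(s) >= min_len]
    return kept, len(seqs) - len(kept)
```

Reporting the number removed, and not only the reads kept, makes it easy to record the effect of this step for the supplemental table recommended in "Recording removed data."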

Removing singletons and chimeras
Important steps after MOTU clustering involve the removal of singletons and chimeras. Chimeras are by-products of the PCR amplification process, formed from two or more parental sequences, most commonly through an incomplete extension step (Edgar, Haas, Clemente, Quince, & Knight, 2011). It has been shown that when unique reads, such as chimeras and singletons, are retained in the analysis, estimates of diversity can be severely inflated (Kunin, Engelbrektson, Ochman, & Hugenholtz, 2010). Because chimeric sequences can be present as high-quality reads, they cannot be removed through quality-based end trimming. Chimeras can be removed either de novo or by comparison with a reference database. The most common practice to date is the de novo method, as a sufficient reference database may not be available. Despite variation in the software used, such as UCHIME (Edgar et al., 2011), OBITOOLS or CHIMERASLAYER, there is consensus that removing chimeras and singletons is a minimum quality control step in a bioinformatics pipeline.
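To illustrate the logic of de novo chimera detection, the sketch below dereplicates reads, removes singletons and flags any sequence that can be reconstructed exactly as the prefix of one more abundant sequence joined to the suffix of another. It is a deliberately simplified stand-in, not UCHIME: it assumes equal-length, error-free sequences, requires candidate parents to be at least twice as abundant as the query (an illustrative abundance skew) and detects only two-parent chimeras. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def dereplicate(reads: Iterable[str]) -> Dict[str, int]:
    """Collapse identical reads into unique sequences with their abundances."""
    return dict(Counter(reads))


def remove_singletons(abundances: Dict[str, int]) -> Dict[str, int]:
    return {seq: n for seq, n in abundances.items() if n > 1}


def _prefix_match(a: str, b: str) -> int:
    """Length of the identical leading stretch shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def find_bimeras(abundances: Dict[str, int], min_skew: float = 2.0) -> Set[str]:
    """Flag sequences explained as parent A's prefix followed by parent B's suffix."""
    accepted: List[str] = []  # sequences judged non-chimeric, most abundant first
    chimeras: Set[str] = set()
    for query in sorted(abundances, key=abundances.get, reverse=True):
        parents = [p for p in accepted
                   if len(p) == len(query) and abundances[p] >= min_skew * abundances[query]]
        is_chimera = any(
            # prefix shared with a + suffix shared with b covers the whole query
            _prefix_match(query, a) + _prefix_match(query[::-1], b[::-1]) >= len(query)
            for a in parents for b in parents if a != b
        )
        if is_chimera:
            chimeras.add(query)
        else:
            accepted.append(query)
    return chimeras
```

In the test, the single-read sequence is dropped as a singleton, and the five-read sequence AAAAACCCCC is flagged because it joins the first half of the most abundant sequence to the second half of the next most abundant one.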

Abundance filtering
In addition to filtering based on quality scores and removing chimeras and singletons, many studies also apply further filtering to remove low-abundance sequences (Murray et al., 2015). This step arises from the need to control for laboratory contamination or for cluster contamination on the flow cell (unique to Illumina platforms; Olds et al., 2016).
Applying abundance filtering requires setting an MOTU abundance threshold, such that MOTUs are only retained in the analysis if their relative abundance is higher than the selected threshold (Bokulich et al., 2013). The threshold varies between studies, and there is no generally accepted definition of what constitutes an insufficiently abundant read (Murray et al., 2015), except perhaps for singletons. Abundance filtering may be applied minimally or avoided entirely, especially if stringent quality trimming parameters are applied to raw reads and the detection of "rare" MOTUs is an important aspect of a study (Bokulich et al., 2013). Alternatively, a threshold can be selected based on available empirical data, as in Valentini et al. (2016). An increasing number of studies have sequenced positive controls to establish a threshold level (Hänfling et al., 2016; Port et al., 2016; Stoeckle, Soboleva, & Charlop-Powers, 2017). Technical replicates can also be used to assess consistency, an approach shown to be effective for assessing omnivore diets (De Barba et al., 2014).
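A per-sample relative abundance threshold of this kind can be written in a few lines. The sketch assumes an MOTU table stored as nested dictionaries ({sample: {MOTU: reads}}) and, following the wording above, retains an MOTU only if its share of a sample's reads is strictly higher than the threshold. The function name is hypothetical, and the thresholds in the test are arbitrary, not recommendations.

```python
from typing import Dict

MotuTable = Dict[str, Dict[str, int]]  # {sample: {MOTU: read count}}


def relative_abundance_filter(table: MotuTable, threshold: float) -> MotuTable:
    """Drop MOTUs whose proportion of a sample's reads is not above the threshold."""
    filtered: MotuTable = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        filtered[sample] = {
            motu: n for motu, n in counts.items() if total > 0 and n / total > threshold
        }
    return filtered
```

Because the threshold is relative to each sample's total, the same MOTU can survive in a shallowly sequenced sample and be removed in a deeply sequenced one; this is one reason transparent reporting of the chosen value matters.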
Using a positive-control-defined error level works by identifying the abundance of sequences in the control sample that belong to nontarget taxa, which can result from errors such as contamination. Furthermore, the distribution of phiX reads assigned to target samples has been used to investigate the presence of "tag jumps" (Schnell, Bohmann, et al., 2015) and reads mis-assigned during demultiplexing (Hänfling et al., 2016; Olds et al., 2016). The exact mechanisms of read mis-assignment remain unknown, but an increasing number of studies report this error to affect between 0.01% and 0.03% of reads (Hänfling et al., 2016; Olds et al., 2016; Stoeckle et al., 2017). Adjustments for this include a threshold approach, based on negative and/or positive controls, that removes a low number of reads from any given sample. Abundance filtering causes the most uncertainty for low-abundance MOTUs and will continue to be a problem for the detection of rare species. Therefore, until these artefacts are well understood, careful consideration of, and transparency about, how technical artefacts are dealt with during bioinformatic data analysis are needed to avoid negative impacts on scientific insights or management decisions.
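One way to turn a control-only taxon into a threshold is sketched below. It assumes the positive control contains a taxon not expected in the study area, treats every read of that taxon found in a field sample as mis-assigned, and uses the largest such proportion across field samples as the per-sample threshold for all MOTUs. This is one possible formulation for illustration, not the procedure of the studies cited above; the table layout and function names are hypothetical, and the counts in the test are toy values.

```python
from typing import Dict, Iterable, List

MotuTable = Dict[str, Dict[str, int]]  # {sample: {MOTU: read count}}


def estimate_leak_rate(table: MotuTable, control_taxon: str,
                       control_samples: Iterable[str]) -> float:
    """Largest share of a field sample's reads that belongs to the control-only taxon."""
    controls = set(control_samples)
    rates: List[float] = []
    for sample, counts in table.items():
        total = sum(counts.values())
        if sample in controls or total == 0:
            continue
        rates.append(counts.get(control_taxon, 0) / total)
    return max(rates, default=0.0)


def remove_below_rate(table: MotuTable, rate: float) -> MotuTable:
    """Drop detections whose within-sample proportion does not exceed the leak rate."""
    cleaned: MotuTable = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        cleaned[sample] = {m: n for m, n in counts.items() if total > 0 and n / total > rate}
    return cleaned
```

Taking the maximum across samples, rather than the mean, is the more conservative choice: it removes more low-abundance detections and so trades some sensitivity to rare species for fewer false positives.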

Amplicon size is also an important consideration because there may be a trade-off between detection and amplicon length (e.g., short fragments are more likely to amplify). However, short fragments may also persist longer in the environment, broadening the spatial or temporal window that an environmental sample represents (Bista et al., 2017; Deagle, Eveson, & Jarman, 2006; Jo et al., 2017; Yoccoz et al., 2012). Additionally, using more than one locus for a target group allows tests of consistency between loci and increases the stringency of detection for any species.
Once primers are designed and PCR products are amplified, eDNA metabarcoding relies on multiplexing large numbers of samples on HTS platforms to make the tool cost-effective. Illumina sequencing platforms (MiSeq and HiSeq) currently outperform other platforms in accuracy (Loman et al., 2012), and samples are usually multiplexed by incorporating sample-specific nucleotide indices and sequencing adaptors during PCR amplification.
However, multiplexing creates opportunities for errors and biases. In this part of the workflow, it is important to avoid methods that induce sample-specific biases in amplification (O'Donnell, Kelly, Lowell, & Port, 2016) and to reduce the potential for index crossover, or "tag jumping" (see Box 2; Schnell, Bohmann, & Gilbert, 2015). To address these issues, Illumina has developed a two-step PCR protocol that uses uniformly tailed primers across samples in the first step and sample-specific indexes in the second, which could reduce bias related to variation in index sequences (Berry, Mahfoudh, Wagner, & Loy, 2011; Miya et al., 2015; O'Donnell et al., 2016). Regardless of the strategy employed, extreme care is needed to ensure primer quality control (e.g., using small aliquots of stocks and properly cleaning PCR-amplified products to remove indexing primers after amplification; Schnell, Bohmann, et al., 2015). When a species detection seems highly unlikely for a sample, single-species quantitative PCR (qPCR) can be used to verify its presence in the same eDNA sample, because qPCR does not suffer from the same technical sources of error. Additional suggestions for dealing with multiplexing artefacts are given in Box 2 under "abundance filtering." In addition, both positive and negative controls must be used in the laboratory to ensure sample integrity (Figure 3). Positive control samples (either pooled DNA extracts derived from tissue, added at the PCR stage, or controls processed at the extraction stage alongside eDNA samples) can help evaluate sequencing efficiency and multiplexing errors in the eDNA metabarcoding workflow (Hänfling et al., 2016; Olds et al., 2016; Port et al., 2016). Careful thought is needed in constructing the mock community.
Typically, species not expected in the study area are used (… Thomsen et al., 2016), so that their reads can be identified and removed if contamination occurs during the workflow, and their presence outside the positive control signals that contamination has occurred.
Negative controls should be introduced at each stage of laboratory work (i.e., filtration, if performed in the laboratory, as well as extraction, PCR and indexing). We recommend applying the same level of technical replication to negative and positive controls as to the actual samples. Furthermore, it is becoming important to sequence negative controls even when they contain no detectable DNA, because contamination can fall below the limits of quantification, and sequences found in these controls can be used to detect demultiplexing errors or in statistical modelling to rule out false-positive detections.
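A simple, conservative way to use sequenced negative controls is to subtract, for each MOTU, the largest read count observed in any negative control from every field sample, discarding MOTUs that fall to zero. This is one option for illustration rather than a standard (statistical modelling, mentioned above, is another); the sketch assumes a nested-dictionary MOTU table, and the function name is hypothetical.

```python
from typing import Dict, Iterable

MotuTable = Dict[str, Dict[str, int]]  # {sample: {MOTU: read count}}


def subtract_negative_controls(table: MotuTable, negatives: Iterable[str]) -> MotuTable:
    """Subtract each MOTU's maximum negative-control count from every field sample."""
    negative_ids = set(negatives)
    max_in_negatives: Dict[str, int] = {}
    for sample in negative_ids:
        for motu, n in table.get(sample, {}).items():
            max_in_negatives[motu] = max(max_in_negatives.get(motu, 0), n)
    cleaned: MotuTable = {}
    for sample, counts in table.items():
        if sample in negative_ids:
            continue  # controls themselves are not reported as field samples
        adjusted = {m: n - max_in_negatives.get(m, 0) for m, n in counts.items()}
        cleaned[sample] = {m: n for m, n in adjusted.items() if n > 0}
    return cleaned
```

Using the maximum across controls assumes the worst observed contamination could have affected any sample; a per-batch version, matching each sample only to the controls processed alongside it, would be less aggressive.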
Finally, an important but often neglected consideration for the eDNA metabarcoding workflow is the identification of technical artefacts that arise independently of true biological variation. For example, a recent study of bacterial biodiversity using the 16S locus showed that a run effect can be confounded with a sample effect if it is not accounted for (e.g., by splitting sample groups across multiple Illumina runs; Chase et al., 2016). It remains to be seen whether such technical artefacts are also prevalent for loci used to metabarcode plants and animals from eDNA (COI, 18S, ITS, etc.), and more research is needed. Until then, careful thought about how samples are pooled and run on a sequencer seems warranted so as not to confound the hypotheses being tested.

| At the keyboard
Bioinformatic processing of high-throughput sequence data sets requires the use of UNIX pipelines (or graphical wrappers of such tools; Bik et al., 2012). Metabarcoding of animal and plant community DNA is comprehensively outlined in Coissac, Riaz, and Puillandre (2012). Below and in Box 3, we highlight common practices in community DNA metabarcoding and the deviations needed for studies focusing on macro-organism eDNA metabarcoding.

Recording removed data
For all quality control steps, data removal should be transparent. Studies often report the total number of sequences obtained, but rarely show how each quality filtering step affects the number of sequences used to test ecological hypotheses, nor do they provide the subsets of sequences that were retained or omitted. Deleting data without a clear justification undermines transparency. We therefore advise including a supplemental table in eDNA metabarcoding studies showing the number of sequences remaining after each filtering step. Archiving the subset of reads retained after each filtering step on a platform such as DRYAD (http://datadryad.org/), or archiving the exact pipeline with version control information on a platform such as GITHUB (https://github.com/), will allow greater transparency and reproducibility of quality filtering.
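The supplemental table suggested above can be generated automatically by recording read counts as each filter runs. The sketch assumes each step is a function from a list of reads to a list of reads; the step names and filters in the test are hypothetical placeholders for whatever a given pipeline uses.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[List[str]], List[str]]]


def run_with_log(reads: List[str], steps: List[Step]) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Apply filtering steps in order, recording how many reads remain after each."""
    log = [("raw", len(reads))]
    for name, step in steps:
        reads = step(reads)
        log.append((name, len(reads)))
    return reads, log


def log_as_tsv(log: List[Tuple[str, int]]) -> str:
    """Format the log as a tab-separated table suitable for a supplement."""
    raw = log[0][1]
    lines = ["step\treads\tpercent_of_raw"]
    for name, n in log:
        pct = 100 * n / raw if raw else 0.0
        lines.append(f"{name}\t{n}\t{pct:.1f}")
    return "\n".join(lines)
```

Writing this table next to the archived reads and the version-controlled pipeline means that a reader can check exactly where data were lost.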

BOX 2 (Continued)
Bioinformatic pipelines and parameters must be carefully considered (Box 2), and it is important to work with a knowledgeable computational researcher to understand how processing can affect the biological results and conclusions. Perhaps the strongest message from Coissac et al. (2012) is that, before computationally processing an eDNA metabarcoding data set, one should identify the differences between analysing data derived from microbial and from macro-organismal groups.

MOTU clustering
Although this step is not always necessary and depends on the target set of taxa (Lacoursière-Roussel, Dubois, Normandeau, & Bernatchez, 2016), the amplicon length sequenced and the completeness of the reference database (Chain, Brown, MacIsaac, & Cristescu, 2016), clustering of sequencing reads into MOTUs is often performed prior to taxonomic assignment. MOTU clustering is the process whereby multiple reads are grouped according to set similarity criteria based on an initial seed (Creer et al., 2016; Egan et al., 2013). A centroid sequence is selected and, depending on the set radius or similarity cut-off, closely related sequences are grouped under each centroid sequence (USEARCH; Edgar, 2010). The level of similarity selected depends on the study and taxon, and could be based on known levels of intraspecific diversity of the studied taxon, estimated for example from an existing reference database. Commonly used cut-offs range from 97% to 99% (Bista et al., 2017; Fahner, Shokralla, Baird, & Hajibabaei, 2016; Olds et al., 2016). Some commonly used clustering algorithms include USEARCH (Edgar, 2010), …
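The centroid-based grouping described above can be sketched as a greedy procedure: unique sequences are processed in decreasing abundance, and each joins the first existing centroid it matches at or above the identity cut-off; otherwise it becomes a new centroid. This is a simplified illustration, not the USEARCH implementation. In particular, identity is computed here from a plain edit distance divided by the longer sequence length, which is only one of several possible definitions, and the approach scales quadratically with the number of unique sequences. The function names are hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions and deletions all cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution or match
        previous = current
    return previous[-1]


def identity(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def greedy_cluster(abundances: Dict[str, int], min_identity: float = 0.97) -> Dict[str, List[str]]:
    """Map each centroid to its member sequences (the centroid is listed first)."""
    clusters: Dict[str, List[str]] = {}
    for seq in sorted(abundances, key=lambda s: (-abundances[s], s)):
        for centroid in clusters:  # centroids in order of creation
            if identity(seq, centroid) >= min_identity:
                clusters[centroid].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters
```

With a 97% cut-off, a 100 bp variant differing from an abundant sequence at one position (99% identity) joins that sequence's cluster; raising the cut-off above 99% splits it into its own MOTU, which shows how strongly the chosen similarity level shapes MOTU counts.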

Taxonomic assignment
Identification of HTS reads is achieved by comparing anonymous MOTU clusters/centroid sequences, or the reads remaining after quality filtering, against a reference database. Depending on the taxon of study and the marker used, the reference database may consist of publicly available sequences or study-generated reference sequences.
The challenges of taxonomic assignment have been the subject of a considerable literature (e.g., Bazinet & Cummings, 2012), so we only briefly discuss this important aspect of the bioinformatics pipeline. A number of different approaches have been suggested, including assignment based on sequence similarity via alignment programs such as BLAST; similarity searches using hidden Markov models, such as JM-MOTU (Jones, Ghoorah, & Blaxter, 2011) and MG-RAST (Glass, Wilkening, Wilke, Antonopoulos, & Meyer, 2010); sequence composition and machine learning approaches (e.g., RDP (Wang, Garrity, Tiedje, & Cole, 2007) and TACOA (Diaz, Krause, Goesmann, Niehaus, & Nattkemper, 2009)); phylogenetic placement (e.g., PPLACER; Matsen, Kodner, & Armbrust, 2010); probabilistic taxonomic placement (e.g., PROTAX; Somervuo, Koskela, Pennanen, Henrik Nilsson, & Ovaskainen, 2016; Somervuo et al., 2017); minimum entropy decomposition (e.g., oligotyping; Eren et al., 2015); MEGAN (Huson, Auch, Qi, & Schuster, 2007); and ecotag. A number of widely used programs combine these methods; for example, SAP (Munch, Boomsma, Huelsenbeck, Willerslev, & Nielsen, 2008) uses BLAST searches of the NCBI database and phylogenetic reconstruction to establish the taxonomic identity of query sequences. Most of these methods and their various derivatives are discussed and compared by Bazinet and Cummings (2012). Two major determinants of the utility of these approaches are the specific eDNA markers and the breadth and resolution of reference databases: some markers are better represented in available databases and cover more of the relevant species diversity. Taxonomic assignment using the BLAST algorithm (Camacho et al., 2009) is common, and depending on the study, different selection criteria are specified, such as e-value, maximum identity, length of the matching sequence or number of top hits selected.
Caution is warranted in strictly relying on this approach, as errors in the curation of sequences in publicly available databases can propagate through the analysis and lead to misidentification of sequences. Ideally, a combination of approaches is used, and when feasible, the resultant species assignments should be vetted with independent data based on the known distribution and ecology of the species.
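When several reference sequences match a query almost equally well, one safeguard is to report only the taxonomic level on which the best hits agree, the lowest common ancestor (LCA) idea used, for example, by MEGAN. The sketch below assumes hits have already been parsed (e.g., from tabular BLAST output) into (percent identity, lineage) pairs, with lineages ordered from the highest rank downwards. The 97% identity floor and the 1% window around the best hit are illustrative parameters, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Hit = Tuple[float, List[str]]  # (percent identity, lineage from highest to lowest rank)


def lowest_common_ancestor(lineages: Sequence[List[str]]) -> List[str]:
    """Return the leading ranks shared by every lineage."""
    shared: List[str] = []
    for ranks in zip(*lineages):
        if any(rank != ranks[0] for rank in ranks):
            break
        shared.append(ranks[0])
    return shared


def assign_lca(hits: Sequence[Hit], min_identity: float = 97.0,
               top_window: float = 1.0) -> List[str]:
    """Assign a query to the LCA of acceptable hits within top_window of the best one."""
    good = [(pid, lineage) for pid, lineage in hits if pid >= min_identity]
    if not good:
        return []  # left unassigned rather than forced onto a poor match
    best = max(pid for pid, _ in good)
    top = [lineage for pid, lineage in good if pid >= best - top_window]
    return lowest_common_ancestor(top)
```

In the test, hits to two Salmo species within the window yield a genus-level assignment, while a narrower window yields the species; a single distant hit to pike leaves the query unassigned. Widening the window trades taxonomic resolution for robustness to errors in reference sequences.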

Diversity analysis
The goal of most eDNA metabarcoding studies is to accurately characterize the species richness of the community under study. Calculating diversity indices with appropriate software allows sequencing results to be modelled and associated with ecological variables. Important considerations when attempting ecological associations include appropriate data standardization to account for variation in sequencing depth and the careful selection of diversity indices. The most common assessments include alpha diversity (rarefaction, visualization of taxonomic profiles) and beta diversity (principal components/coordinates analysis, NMDS ordination, etc.), prior to hypothesis testing via downstream statistical analysis.
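To illustrate standardization for sequencing depth and a simple alpha and beta diversity calculation, the sketch below rarefies a sample by random subsampling of reads without replacement, then computes MOTU richness and the Jaccard distance on presence/absence. It is an illustration under stated assumptions (counts stored as {MOTU: reads}, hypothetical function names); established ecological packages offer many more indices and the ordination methods (PCoA, NMDS) mentioned above, which are not reproduced here.

```python
import random
from collections import Counter
from typing import Dict


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Randomly subsample a sample's reads, without replacement, to a fixed depth."""
    pool = [motu for motu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    return dict(Counter(random.Random(seed).sample(pool, depth)))


def richness(counts: Dict[str, int]) -> int:
    """Number of MOTUs with at least one read (alpha diversity)."""
    return sum(1 for n in counts.values() if n > 0)


def jaccard_distance(a: Dict[str, int], b: Dict[str, int]) -> float:
    """1 minus shared MOTUs over all MOTUs present in either sample (beta diversity)."""
    present_a = {m for m, n in a.items() if n > 0}
    present_b = {m for m, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)
```

Because rarefaction discards reads at random, repeating it with several seeds and averaging the resulting indices gives a sense of how much the standardization itself affects the results.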
Because microbial ecologists have used sequence-based identification of taxa over the past 40 years (Creer et al., 2016), the range of software solutions to analyse microbial metabarcoding data sets is unsurprisingly extensive (Bik et al., 2012). Perhaps more importantly, a number of established and maintained databases exist for many of the commonly used microbial taxonomic markers for prokaryotes (Cole et al., 2009), microbial eukaryotes (Guillou et al., 2013; Pruesse et al., 2007; Quast et al., 2012) and fungi (Abarenkov et al., 2010), meaning that microbial data sets can be analysed and their taxonomic affiliations established in a straightforward way.
For macro-organism communities, preprocessing and initial quality control of eDNA metabarcoding data sets do not differ from those for microbial data sets and can be carried out using packages developed either for microbial (Caporaso et al., 2010) or for macro-organism data, but taxonomic assignment requires a robust set of locus-specific reference sequences and the associated taxonomic data from a reference database (Box 3). Currently, the two most common reference sources for macro-organisms are NCBI's nucleotide database (Benson et al., 2013) and the Barcode of Life Database (Ratnasingham & Hebert, 2007). The utility and taxonomic breadth of these databases can be enhanced by creating custom-made or hybrid databases, at an additional workload and cost that depends on the number of focal taxa missing from current data sources. Recently, Machida, Leray, Ho, and Knowlton (2017) assembled and proposed metazoan mitochondrial gene sequence data sets that can be used for taxonomic assignment of environmental samples.
While these data sets do not account for future growth, their methods could be repeated at the time of any new study to generate a custom reference data set for taxonomic assignment.
Macro-organism eDNA metabarcoding data sets have advantages over microbial data sets because the number of taxa in any survey will be comparatively low, reducing the computational time needed for taxonomic annotation. Moreover, the species delimitation concepts and taxonomic markers associated with macro-organisms are well developed (de Queiroz, 2005) and can even be used to analyse population genetic structure (…; Thomsen & Willerslev, 2015) or delimit species boundaries (…; Hebert et al., 2003; Tang, Humphreys, Fontaneto, & Barraclough, 2014). The vast knowledge we have of animal and plant taxonomy and biogeography is a distinct advantage for eDNA metabarcoding because it provides an independent test with which to calibrate the tool and assess its precision and accuracy.

| Data archiving for transparency
As eDNA applications continue to develop, all procedures used in the field and laboratory and during bioinformatic data processing require a strong commitment to transparency on the part of researchers (Nekrutenko & Taylor, 2012). Here, we outline best practices for eDNA metabarcoding studies of macro-organisms, following on from well-established standards in the fields of microbiology and genomics (Yilmaz et al., 2011), … the minimum information about any "x" sequence (MIxS) specifications (Yilmaz et al., 2011). Goldberg et al. (2016) have made specific recommendations for upholding these reporting standards in eDNA studies (see Table 1 in Goldberg et al., 2016).
Third, computational processing of data needs to be reproducible (Sandve, Nekrutenko, Taylor, & Hovig, 2013). Sandve et al. (2013) provide 10 rules that can be followed to ensure such reproducibility, and we strongly encourage researchers using eDNA metabarcoding methods to uphold these practices and to archive the intermediate steps (Box 2) of their analysis for full transparency.

| Applications in ecology
Quantifying the richness and abundance of species in natural communities is and will continue to be a goal in many ecological studies.
Information about species richness garnered from eDNA is not necessarily different from that obtained with conventional approaches (Table 1) … (Hawkins et al., 2015; Xu et al., 2015). Knowledge of species co-occurrences and interactions in these instances will additionally foster the study of meta-ecosystems and provide data to guide management decisions at the ecosystem scale (Bohan et al., 2017). What will remain challenging is moving beyond richness estimates to also obtaining species abundance data (Figures 2 and 4).

| Applications in conservation biology
Given the rapid rate at which biodiversity is declining worldwide (Butchart et al., 2010), it is critical that we improve the effectiveness of strategies to halt or reverse this loss (Thomsen & Willerslev, 2015; Valentini et al., 2016). Accordingly, developing tools that enable rapid, cost-effective and noninvasive biodiversity assessment, such as eDNA metabarcoding, is paramount, especially for rare and cryptic species (Figure 4). Improved, noninvasively obtained estimates of the distribution of vulnerable species would facilitate policy development and allow management efforts to be targeted efficiently across habitats (Kelly et al., 2014; Thomsen & Willerslev, 2015). For example, documenting the presence of threatened species in a habitat can trigger a suite of actions under laws pertaining to biodiversity conservation (e.g., the US Endangered Species Act). Data relevant to policy are frequently derived from monitoring efforts mandated by environmental laws, which gives the collected data significant consequences (Kelly et al., 2014).
Environmental DNA-based monitoring is likely to be a tremendous boon to often underfunded public agencies charged with compliance with data-demanding laws. Specifically, eDNA metabarcoding will be useful for monitoring communities in which many species are of conservation concern. Vernal pools throughout California are a prime example because they contain 20 US federally listed endangered or threatened species of plants and animals. Monitoring species richness with soil and water samples from such a habitat would provide a comprehensive method for obtaining the community data needed for their conservation and management (Deiner, Hull, & May, 2017). However, while eDNA metabarcoding may be important for noninvasively mapping the distribution of vulnerable species, it cannot differentiate between living and dead organisms or estimate many demographic parameters important for population viability analysis (Beissinger & McCullough, 2002).
Quantifying baselines of animal and plant species richness, and departures from those baselines, is central to the assessment of environmental impact and to conservation (Taylor & Gemmell, 2016). Applying eDNA metabarcoding to different sample types, which taken together allow inference across time (e.g., surface water and sediment layers from a lake core; Figure 1), provides a unique tool for documenting local extinctions and long-term changes in ecosystems. Extinction models often rely on … and understanding extinction timelines (reviewed in Thomsen & Willerslev, 2015). The ability of eDNA metabarcoding to track the timing of extinctions associated with previous glacial events has been demonstrated in mammals (Haile et al., 2009) and plants.

| Applications in invasion biology
As one of the first applications of eDNA to macro-organisms was the detection of North American bullfrogs in French ponds (Ficetola et al., 2008), the method immediately came to the attention of researchers interested in invasion biology (e.g., Egan et al., 2013; Goldberg, Sepulveda, Ray, Baumgardt, & Waits, 2013; Jerde, Mahon, Chadderton, & Lodge, 2011; Takahara, Minamoto, & Doi, 2013; Tréguier et al., 2014). These initial studies, as well as much ongoing research, are based on species-specific primers, for which positive amplification provides evidence of occurrence for a particular invasive species. In invasion biology, such a targeted eDNA approach is referred to as "active" surveillance (Simmons, Tucker, Chadderton, Jerde, & Mahon, 2015).
In contrast, eDNA metabarcoding makes it possible to detect many species simultaneously, including species not previously suspected of being present. This broader, untargeted approach is called "passive" surveillance in management applications (Figure 4; Simmons et al., 2015). On the downside, because of a trade-off in primer specificity, we expect that eDNA metabarcoding may be less sensitive in detecting some species, or that the detection rate of a species may change depending on species richness. A dual approach of passive and active surveillance could be considered where the risk of a new invasion is high and where cost-effective eradication plans for undesirable species are likely to succeed.
Avoiding future introductions and reducing the spread of exotic species is paramount in natural resource policy.
Applications of eDNA metabarcoding relevant to management include early detection of incipient invasive populations in the environment and surveillance of invasion pathways, for example, the ballast water of ships (Egan et al., 2015; Zaiko et al., 2015) and the live bait trade (Mahon, Nathan, & Jerde, 2014). While eDNA metabarcoding is not yet routinely used for biosecurity regulation of invasive species or for enforcement in many settings, it has the potential to become a valuable monitoring tool for biological invasions. An important challenge for the use of eDNA metabarcoding in invasive species detection is posed by false positives and false negatives: false positives can trigger unnecessary action, and false negatives can lead to inaction when action is needed, both placing a potentially large burden on entities responsible for invasive species mitigation and control (Figure 4). Continued research to reduce, or to understand the nature of, false positives and false negatives will therefore reduce uncertainty in the tool and facilitate its wider adoption.

| Applications in biomonitoring
Pollution of air, water and land resources generated by processes such as urbanization, food production and mining is one of the many emerging global challenges we face in the 21st century (Vörösmarty et al., 2010). Determining the origin, transport and effects of most pollution is challenging because it accumulates from both point sources (e.g., wastewater effluent) and diffuse sources related to land-use types (e.g., agriculture or urbanization).
In this context, the presence of tolerant organisms or the absence of sensitive ones has been used throughout the world to determine the consequences of pollution for ecosystem health, an approach termed "biological monitoring" or "biomonitoring" (Bonada, Prat, Resh, & Statzner, 2006). The extent to which animals and plants have been used in biomonitoring depends on the unique characteristics of the taxonomic group monitored and its relationship to the pollution of interest (Bonada et al., 2006; Stankovic, Kalaba, & Stankovic, 2014). Most biomonitoring programmes take community composition, and often the abundance of taxa, into account to calculate what is known as a biotic index (Friberg et al., 2011). Biotic indices take many forms and are typically surrogates for the impacts of pollution

| Applications in citizen science and biodiversity education
The simplicity of the protocol used to collect environmental samples has created an avenue for citizen scientist programmes to be built around surveying for biodiversity using eDNA (Biggs et al., 2015).
With commercial companies now developing sample kits specifically for eDNA analysis (e.g., GENIDAQS, ID-GENE, Jonah Ventures, NatureMetrics, Spygen), there is a novel opportunity to engage the public in biodiversity science, which could accompany already-established biodiversity events such as BioBlitz (National Geographic Society). Use of eDNA metabarcoding in this context will likely provide an unprecedented tool for education and outreach about biodiversity and increase awareness of its decline. Challenges that hinder the integration of eDNA metabarcoding into citizen science projects and educational opportunities are the time and cost needed to process samples and the lack of user-friendly data visualization tools for exploring the data once they are provided.
Thus, finding ways to cut costs and speed up data generation (a goal common to any application of the tool), as well as creating applications for exploring data on smartphones and desktops alike, is needed to propel the use of eDNA applications in citizen science and education.

| CONCLUSIONS
As the tool of eDNA metabarcoding continues to develop, our understanding regarding the analysis of eDNA from macro-organismal communities, including optimal field, laboratory and bioinformatics workflows, will continue to improve in the foreseeable future.
Concurrently, we need to gain a better understanding of the spatial and temporal relationship between eDNA and living communities to improve precision and accuracy and to enhance the ecological and policy relevance of eDNA (Barnes & Turner, 2016; Kelly et al., 2014).
Ultimately, the errors and uncertainties associated with eDNA metabarcoding studies can often be mitigated by thoughtful study design, appropriate primer choice and robust sampling and replication: as Murray, Coghlan, and Bunce (2015)