Needle in a haystack? A comparison of eDNA metabarcoding and targeted qPCR for detection of the great crested newt (Triturus cristatus)

Abstract Environmental DNA (eDNA) analysis is a rapid, cost‐effective, non‐invasive biodiversity monitoring tool which utilises DNA left behind in the environment by organisms for species detection. The method is used as a species‐specific survey tool for rare or invasive species across a broad range of ecosystems. Recently, eDNA and “metabarcoding” have been combined to describe whole communities rather than focusing on single target species. However, whether metabarcoding is as sensitive as targeted approaches for rare species detection remains to be evaluated. The great crested newt Triturus cristatus is a flagship pond species of international conservation concern and the first UK species to be routinely monitored using eDNA. We evaluate whether eDNA metabarcoding has comparable sensitivity to targeted real‐time quantitative PCR (qPCR) for T. cristatus detection. Extracted eDNA samples (N = 532) were screened for T. cristatus by qPCR and analysed for all vertebrate species using high‐throughput sequencing technology. With qPCR and a detection threshold of 1 of 12 positive qPCR replicates, newts were detected in 50% of ponds. Detection decreased to 32% when the threshold was increased to 4 of 12 positive qPCR replicates. With metabarcoding, newts were detected in 34% of ponds without a detection threshold, and in 28% of ponds when a threshold (0.028%) was applied. Therefore, qPCR provided greater detection than metabarcoding but metabarcoding detection with no threshold was equivalent to qPCR with a stringent detection threshold. The proportion of T. cristatus sequences in each sample was positively associated with the number of positive qPCR replicates (qPCR score) suggesting eDNA metabarcoding may be indicative of eDNA concentration. eDNA metabarcoding holds enormous potential for holistic biodiversity assessment and routine freshwater monitoring. We advocate this community approach to freshwater monitoring to guide management and conservation, whereby entire communities can be initially surveyed to best inform use of funding and time for species‐specific surveys.


TARGETED qPCR
Prior to testing for T. cristatus, all extracted samples were tested for PCR inhibitors and sample degradation using methodology outlined by Biggs et al. (2015), where an Internal Positive Control was included in qPCR reactions of eDNA samples and a sample considered inhibited if replicates showed different Cq values (where samples move into the exponential phase of qPCR amplification). Targeted qPCR was carried out as part of the T. cristatus monitoring programmes mentioned above in the laboratories at Fera and ADAS during 2015 using a standardised protocol (Biggs et al. 2015). Extracted DNA was amplified by TaqMan probe qPCR using published primers and probe to amplify an 81 bp fragment of the cytochrome b gene: TCCBL (5'-CGTAAACTACGGCTGACTAGTACGAA-3'), TCCBR (5-CCGATGTGTATGTAGATGCAAACA) and TCCB_Probe (5'-CCACGCTAACGGAGCCTCGC-3') (Thomsen et al. 2012). PCR reactions were set up in a total volume of 25 µL consisting of: 3 µL of extracted template DNA, 1 µL of each primer (0.4 µM), 1 µL of probe (0.1 µM), 1x TaqMan ® Environmental Master Mix 2.0 (containing AmpliTaq GOLD DNA polymerase, Life Technologies) and ddH2O. The PCR included an initial incubation for 5 min at 50 °C, then a 10 min denaturation step at 95 °C, followed by 55 cycles of denaturation at 95 °C for 30 s and annealing at 56.3 °C for 1 min. For each sample, 12 qPCR replicates were performed and a sample recorded as positive for T. cristatus if one or more qPCR replicates were positive. Positive (T. cristatus DNA: 1 x 10 -1 ng/μL to 1 x 10 -4 ng/μL) and negative controls (ddH2O) were also included on each plate in quadruplicate. Following qPCR, the eDNA samples were placed in storage at -80 o C.

Appendix 2: Reference database construction
A custom, phylogenetically curated reference database of the target region was created for UK vertebrate species. For freshwater fish, we used a previously created database comprising 67 fish species, which includes all known native and non-native species in the UK and our positive control Rhamphochromis esox, a species of cichlid from Lake Malawi (Hänfling et al. 2016). For all remaining vertebrate species recorded in the UK (Natural History Museum UK Species Database, 2016), custom, phylogenetically curated reference databases were constructed using the ReproPhylo environment (Szitenberg et al. 2015) in a Jupyter notebook (Jupyter Team 2016). Each vertebrate group was processed separately in order to build Muscle alignments (Edgar 2004) and visualise constructed phylogenetic trees. Species lists containing the binomial nomenclature of UK vertebrate species were constructed using the Natural History Museum UK Species Database. All vertebrates recorded in the UK were included. The BioPython script performed a GenBank search based on the species lists and downloaded all available mitochondrial 12S ribosomal RNA (rRNA) sequences for specified species. Proportion of reference sequences available for species varied within each vertebrate group: amphibians 100.00% (N = 21), reptiles 90.00% (N = 20), birds 55.88% (N = 621), and mammals (83.93%, N = 112), Where there were no records on GenBank for a UK species, the database was supplemented with downloaded sequences belonging to sister species in the same genus. Species that had no 12S rRNA records on Genbank are provided in Table S1.
Redundant sequences were removed by clustering at 100% similarity using vsearch 1.1 (https://github.com/torognes/vsearch). Due to high proportion of partial 12S rRNA records on GenBank for the majority of UK species, only sequences longer than 500 bp were processed initially to increase alignment robustness to large gaps. Short sequences can cause problems in global paired alignments where the alignment algorithm attempts to align them to longer sequences. Short 12S rRNA sequences (<500 bp) were later incorporated into the existing long 12S rRNA alignment using the hmmer v3 program suite (HMMER development team 2016) to construct a Hidden Markov Model alignment containing sequences of all lengths. Alignments were trimmed using trimAl (Capella-Gutiérrez, Silla-Martínez & Gabaldón 2009). Maximum likelihood trees were inferred with RAxML 8.0.2 (Stamatakis 2006) using the GTR+gamma model of substitutions. The complete alignments were then processed using SATIVA (Kozlov et al. 2016) for automated identification of 'mislabelled' sequences which could cause conflict in downstream analyses. Putatively mislabelled sequences were removed and process of alignment and phylogenetic tree construction repeated for manual investigation of sequences.

PRIMER VALIDATION
Vertebrate DNA from eDNA samples was amplified with published 12S rRNA primers 12S-V5-F (5'-ACTGGGATTAGATACCCC-3') and 12S-V5-R (5'-TAGAACAGGCTCCTCTAG-3') (Riaz et al. 2011). Primers were validated for the present study in silico using ecoPCR software (Ficetola et al. 2010;Bellemain et al. 2010) against a custom, phylogenetically curated reference database for UK vertebrates. Parameters were set to allow a fragment size of 50-250 bp and maximum of three mismatches between the primer pair and each sequence in the reference database. Primers were previously validated in vitro for UK fish communities by Hänfling et al. (2016) and here were also validated against tissue DNA extracted from UK amphibian species (supplied by DICE, University of Kent, and University of Glasgow): great crested newt, smooth newt, palmate newt Lissotrition helveticus, Alpine newt, common frog and common toad. Primer validation tests were performed at University of Hull in a separate laboratory situated on a different floor to the dedicated eDNA laboratory. A dilution series (10 0 to 10 -8 ) was performed for DNA (standardised to 5 ng/μL) from each species to identify the limit of detection (LOD) for each species. Molecular grade sterile water (Fisher Scientific) substituted template DNA for the PCR negative control.

TWO-STEP PCR PROTOCOL
A two-step PCR protocol was performed on eDNA samples at University of Hull. Dedicated rooms were available for pre-PCR and post-PCR processes. Pre-PCR processes were performed in a dedicated eDNA laboratory, with separate rooms for filtration, DNA extraction and PCR preparation of sensitive environmental samples. PCR reactions were set up in a UV and bleach sterilized laminar flow hood. Eight-strip PCR tubes with individually attached lids were used instead of 96-well plates to minimise cross-contamination risk between samples (Port et al. 2016). After the first sequencing run revealed substantial human contamination across samples and PCR controls, reactions prepared for the second sequencing run were sealed with mineral oil as an additional measure against PCR contamination. For the first PCR, three replicates were performed for each sample to combat PCR stochasticity. Alternating PCR positive and negative controls were included on each PCR strip (six positive and negative controls on each 96-well plate), to screen for sources of potential contamination. The DNA used for the PCR positive control was R. esox, as occurrence in UK ponds is extremely rare or non-existent. The negative control substituted molecular grade sterile water for template DNA. During the first PCR, the target region was amplified using the primers described above, including adapters (Illumina 2011). First step PCR reactions were performed in a final volume of 21.1 μL, using 2 μL of DNA extract as a template. The amplification mixture contained 10.5 μL of Bioline ® MyTaq™ HS Red Mix, 1.05 μL of forward and reverse primer (final concentration -0.5 μM) and 6.5 μL of molecular grade sterile water (Fisher Scientific). PCR was performed on an Applied Biosystems ® Veriti Thermal Cycler and PCR conditions for the first component of the two-step protocol consisted of: an incubation step at 98 °C for 5 min, followed by 35 cycles of denaturation at 98 °C for 15 s, annealing at 56 °C for 20 s, and extension at 72 °C for 30 s with final extension at 72 °C for 10 min. PCR products were stored at 4 °C until fragment size was verified by visualising 5 μL of selected PCR products on 2% agarose gels (100 mL 0.5x TBE buffer, 2 g agarose powder). Gels were then stained with ethidium bromide and imaged using Image Lab Software. A PCR product was deemed positive where there was an amplification band on the gel that was of the expected size (200-300 bp). PCR replicates for each sample were pooled in preparation for the addition of Illumina indexes in the second PCR, which resulted in 63.3 μL of PCR product for each sample. PCR positive and negative controls were not pooled to allow individual purification and sequencing of all 228 PCR controls. All PCR products (30 μL samples and 15 μL PCR controls) were then purified to remove excess primer using E.Z.N.A. Cycle Pure V-Spin Clean-Up Kits (VWR International) following manufacturer centrifugation protocol. Eluted DNA was stored at -20 °C until the second PCR could be performed.
In the second PCR, Molecular Identification (MID) tags (unique 8-nucleotide sequences) and Illumina MiSeq adapter sequences were bound to the amplified product. These tags were included in the forward and reverse primers resulting in indexed primers for second PCR (O'Donnell et al. 2016). For each second PCR plate, 96 unique tag combinations were created by combining eight unique forward tags with 12 unique reverse tags or vice versa (Kitson et al. 2018). A total of 384 unique tag combinations were achieved, allowing samples to be distinguished during bioinformatics analysis. Second step PCR reactions were performed in eight-strip PCR tubes with individually attached lids in a final volume of 21.1 μL, using 2 μL of purified DNA from the first PCR product as a template. The amplification mixture contained 10.5 μL of Bioline ® MyTaq™ HS Red Mix, 2.1 μL of tagged primer mix (final concentration -0.5 μM) and 6.5 μL of molecular grade sterile water (Fisher Scientific). PCR was performed on an Applied Biosystems ® Veriti Thermal Cycler with the following profile: denaturation at 95 °C for 3 min, followed by 12 cycles of annealing at 98 °C for 20 s and extension at 72 °C for 30 s with final extension at 72 °C for 5 min. PCR products were stored at 4 °C before they were all visualised on 2% agarose gels (100 mL 0.5x TBE buffer, 2 g agarose powder) using 5 μL PCR product. Gels were then stained with ethidium bromide and imaged using Image Lab Software. Again, PCR products were deemed positive where there was an amplification band on the gel that was of the expected size (200-300 bp). Amplification bands were found to be present in some of the negative controls thus all negative controls were included for sequencing.

LIBRARY PREPARATION
All remaining library preparation was conducted at Fera. PCR products were transferred to a new 96-well PCR plate for individual purification with AMPure ® XP beads and a magnetic stand. The Illumina PCR clean-up protocol was adapted to use 18.6 μL AMPure ® XP beads (1.2x PCR product) to 15-16 μL PCR product. Illumina protocol was then followed until the beads were resuspended in 15 μL molecular grade water and incubated at room temperature for 5 minutes. The supernatant without beads in each well were not transferred to a new plate due to low volumes of purified product. Further pipetting may have resulted in loss of DNA. Each plate was sealed and stored at 4 °C until quality assurance. A Quant-IT™ PicoGreen™ dsDNA Assay was conducted for all samples on a ThermoFisher 96-well microplate reader. Samples were then normalised and pooled to create 4 nM pooled libraries before quantification using a Qubit™ dsDNA HS Assay. Both libraries passed quality assurance with concentrations of 2.62 ng/μl and 4.14 ng/μl respectively. A Tapestation D1000K assay was then used to check and compare size of the pooled libraries to selected samples. The pooled libraries were 272 bp and 299 bp (expected 286 bp) with samples in the same range. Equimolar libraries (4 nM) were then created using tapestation trace size estimates and Qubit concentrations. Libraries were run at 12 pM concentration on an Illumina MiSeq using 2 x 300 bp V3 chemistry. Both libraries included a 10% PhiX DNA spike-in control to improve clustering during initial sequencing.

BIOINFORMATIC PROCESSING
Raw reads were quality trimmed using Trimmomatic v0.32 (Bolger, Lohse & Usadel 2014), both from the read ends (minimum per base phred score Q30), as well as across sliding windows (window size 5bp; minimum average phred score Q30). Reads were clipped to a maximum length of 110 bp and reads shorter than 90 bp after quality trimming were discarded. To reliably exclude adapters and PCR primers, the first 25 bp of all remaining reads were also removed. Sequence pairs were merged into single high quality reads using FLASH v1.2.11 (Magoč & Salzberg 2011), if a minimum of 10 bp overlap with a maximum of 10% mismatch was detected between pairs. For reads that were not successfully merged, only forward reads were kept. To reflect our expectations with respect to fragment size, a final length filter was applied and only sequences of length 80-120 bp were retained. These were screened for chimeric sequences against our custom reference database using the uchime algorithm (Edgar et al. 2011), as implemented in vsearch v1.1 (Rognes et al. 2016). Redundant sequences were removed by clustering at 97% identity ('--cluster_fast' option) in vsearch v1.1 (Rognes et al. 2016). Clusters represented by less than five sequences were considered sequencing error and omitted from further analyses. Non-redundant sets of query sequences were then compared against our custom reference database using BLAST (Zhang et al. 2000). For any query matching with at least 98% identity to a reference sequence across more than 80% of its length, putative taxonomic identity was assigned using a lowest common ancestor (LCA) approach based on the top 10% BLAST matches. Sequences that could not be assigned (non-target sequences) were subjected to a separate BLAST search against the complete NCBI nucleotide (nt) database at 98% identity to determine the source via LCA as described above.

MANIPULATION OF METABEAT DATASET
Non-target sequence assignments and original assignments at 98% identity were merged. Any spurious assignments (i.e. non-UK species, invertebrates and bacteria) were removed from the dataset. Assignments to genera or families which contained only a single UK representative were manually assigned to that species. In our dataset, only genus Strix was reassigned to tawny owl Strix aluco. Where family and genera assignments containing a single UK representative did have reads assigned to species, reads from all assignment levels were merged and manually assigned to that species. Consequently, all taxonomic assignments included in the final database were of species resolution. A total of 60 species were detected by eDNA metabarcoding. Mis-assignments in our dataset were then corrected; again, only one instance was identified. Scottish wildcat Felis silvestris was reassigned to domestic cat Felis catus on the basis that Scottish wildcat does not occur where ponds were sampled (Kent, Lincolnshire and Cheshire).

GLMM COMPARISON OF eDNA METHODS FOR T. CRISTATUS DETECTION
Initially, a Poisson distribution was specified but tests using the R package RVAideMemoire v 0.9-45-2 (Hervé 2015) revealed models with this distribution were overdispersed. Models with a quasi-Poisson and zero-inflated distribution failed to resolve overdispersion (Ver Hoef & Boveng 2007). A negative binomial distribution was used to control for aggregation in the count data and prevent biased parameter estimates (Harrison 2014). Model overdispersion remained unresolved but model fit was improved. Model fit was assessed using the Hosmer and Lemeshow Goodness of Fit Test (Hosmer & Lemeshow 2000) within the R package 'ResourceSelection' v0.2-4 (Lele et al. 2016). Model predictions were obtained using the predictSE() function in the 'AICcmodavg' package v2.0-3 (Mazerolle 2017) and upper and lower 95% CIs were calculated from the standard error of the predictions. Table S2. List of species detected in PCR positive controls by eDNA metabarcoding and corresponding speciesspecific false positive sequence threshold applied.        Detections exceeding 100,000 reads (e.g. cow Bos taurus) were omitted during plotting to improve visualisation of lower read assignments in the dataset, but the data were not adjusted in this process. Each species was present in at least one sample although low read counts are not always visible. Figure S3. Proportion of eDNA samples in which each species was detected by eDNA metabarcoding. Figure S4. Presence of foreign DNA in PCR negative controls across sequencing runs. Highest contamination was observed from fish species, common roach Rutilus rutilus and European bullhead Cottus gobio, in addition to common frog Rana temporaria and pig Sus scrofa. Common roach occurred in six PCR negative controls, two of which exceeded 100 reads. European bullhead occurred in four PCR negative controls but all exceeded 1,000 reads. Notably, common frog occurred in 13 PCR negative controls but only two exceeded 100 reads, with 180 and 13,120 reads. Pig occurred in one PCR negative control only but exceeded 14,000 reads. Contamination from other species was relatively low with few species exceeding 100 sequence reads. Figure S5. Presence of cichlid DNA (PCR positive control) amongst PCR negative controls and eDNA samples. Contamination of PCR negative controls was more frequent on the first sequencing run but greater where it occurred during the second sequencing run. Contamination of environmental samples was most common on plates 3 and 4, which were also sequenced on the first MiSeq run. Figure S6. Presence of human DNA amongst PCR controls and eDNA samples. Contamination of PCR controls and environmental samples was less frequent in the second sequencing run. Human DNA contamination was most abundant in environmental samples on PCR plates 1, 3, 4 and 5.

VERTEBRATE METABARCODING
eDNA METABARCODING VS qPCR FOR T. CRISTATUS DETECTION   Table S8. Summary of agreement (+) and disagreement (-) between egg searches, qPCR NT, qPCR TA, metabarcoding NT, and metabarcoding TA for T. cristatus detection in ponds (N = 532).  Table S9. Details of expenditure breakdown provided as an excel spreadsheet.