Toward an ecoregion scale evaluation of eDNA metabarcoding primers: A case study for the freshwater fish biodiversity of the Murray–Darling Basin (Australia)

Abstract
High‐throughput sequencing of environmental DNA (i.e., eDNA metabarcoding) has become an increasingly popular method for monitoring aquatic biodiversity. At present, such analyses require target‐specific primers to amplify DNA barcodes from co‐occurring species, and this initial amplification can introduce biases. Evaluating the performance of different primers is therefore recommended before undertaking any metabarcoding initiative. While multiple software programs are available to evaluate metabarcoding primers, each has its own strengths and weaknesses. Therefore, a robust in silico workflow for the evaluation of metabarcoding primers will benefit from the use of multiple programs. Furthermore, geographic differences in species biodiversity are likely to influence the performance of metabarcoding primers and further complicate the evaluation process. Here, an in silico workflow is presented that can be used to evaluate the performance of metabarcoding primers on an ecoregion scale. This workflow was used to evaluate the performance of published and newly developed eDNA metabarcoding primers for the freshwater fish biodiversity of the Murray–Darling Basin (Australia). To validate the in silico workflow, a subset of the primers, including one newly designed primer pair, was used in metabarcoding analyses of an artificial DNA community and eDNA samples. The results show that the in silico workflow allows for a robust evaluation of metabarcoding primers and can reveal important trade‐offs that need to be considered when selecting the most suitable primer. Additionally, a new primer pair was described and validated that allows for more robust taxonomic assignments and is less influenced by primer biases than commonly used fish metabarcoding primers.


Contents
A genetic database for all freshwater Actinopterygii species with established populations in the Murray-Darling Basin (MDB) was generated by PCR amplification of the complete mitochondrial 12S ribosomal RNA gene followed by Sanger sequencing. For all species, either extracted DNA or tissue samples were obtained from previous studies (Hardy et al., 2011; MacDonald, Young, Lintermans, & Sarre, 2014) (Table S1). When only tissue samples were available, genomic DNA was extracted using the DNeasy Blood and Tissue Kit following the manufacturer's instructions (Qiagen, Hilden, Germany).
For most samples, the entire 12S gene was successfully amplified using the primer combinations 12SR and 12SL or Marinefish-12SrRNA-F and Marinefish-12SrRNA-R (Jin, Zhao, & Wang, 2013; Wang, Tsai, Tu, & Lee, 2000). PCR reactions contained 12.50 µL MyTaq™ HS Red Mix (Bioline Australia Pty Ltd, NSW, Australia), 0.25–1.00 µL of each primer (10 µM), 1.00–4.00 µL genomic DNA and DEPC-treated water to a final volume of 25 µL. Cycling conditions consisted of an initial activation of 2 min at 95°C; 35 three-step cycles of 1 min at 94°C, 1 min at 50°C and 1 min 30 s at 72°C; and a final extension of 10 min at 72°C. For three species (i.e. Galaxias ornatus, Maccullochella peelii and Pseudaphritis urvillii), the PCR protocol had to be modified. For G. ornatus and M. peelii, the 12SV5 primers described by Riaz et al. (2011) were used as internal PCR primers in combination with Marinefish-12S-F and 12SR. Additionally, a touchdown cycling stage (i.e. 10 three-step cycles of 1 min at 94°C, 1 min at 60°C and 1 min 30 s at 72°C, with the annealing temperature decreasing by 1°C per cycle) was added after the initial activation step to increase specificity and yield. Successful amplification of the 12S gene of P. urvillii required the newly developed primers Not-12S-F (5'-TATTTAAAACGTAACACTGAAAATG-3') and Not-12S-R (5'-TCATGATGCAAAAGGTACGAG-3'), as previously used primers contained significant base-pair mismatches with sequence records of other species within the suborder Notothenioidei.
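The touchdown stage is easiest to follow when the annealing temperature of every cycle is listed explicitly. The Python sketch below encodes the cycling conditions described above as a list of holds. It assumes that the 10 touchdown cycles run in addition to (not instead of) the 35 standard cycles, and it sums only the programmed hold times, ignoring ramp rates. The names `Step`, `touchdown_annealing`, `three_step_cycle` and `build_program` are illustrative and not part of any thermocycler software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold in the thermocycler program (temperature in degrees C, duration in seconds)."""
    temp_c: float
    seconds: int


def touchdown_annealing(start_c: float = 60.0, step_c: float = 1.0,
                        n_cycles: int = 10) -> list[float]:
    """Annealing temperature of each touchdown cycle, lowered by step_c after every cycle."""
    return [start_c - step_c * i for i in range(n_cycles)]


def three_step_cycle(anneal_c: float) -> list[Step]:
    """Denaturation 1 min at 94 C, annealing 1 min, extension 1 min 30 s at 72 C."""
    return [Step(94.0, 60), Step(anneal_c, 60), Step(72.0, 90)]


def build_program(touchdown: bool) -> list[Step]:
    """Assemble the full program; touchdown cycles (if any) follow the initial activation."""
    program = [Step(95.0, 120)]  # initial activation: 2 min at 95 C
    if touchdown:
        # Assumption: touchdown cycles are added before, not substituted for, the 35 cycles
        for anneal_c in touchdown_annealing():
            program += three_step_cycle(anneal_c)
    for _ in range(35):  # standard cycles with 50 C annealing
        program += three_step_cycle(50.0)
    program.append(Step(72.0, 600))  # final extension: 10 min at 72 C
    return program


def hold_time_minutes(program: list[Step]) -> float:
    """Sum of programmed hold times only (ramping between temperatures is ignored)."""
    return sum(step.seconds for step in program) / 60


if __name__ == "__main__":
    print(touchdown_annealing())
    print(hold_time_minutes(build_program(touchdown=False)))
    print(hold_time_minutes(build_program(touchdown=True)))
```

Under these assumptions, the touchdown annealing temperatures step from 60°C down to 51°C, one degree above the 50°C used in the standard cycles, and the programmed hold time increases from 134.5 min to 169.5 min.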
The presence of a single PCR product was confirmed by gel electrophoresis on a 2% agarose gel containing SYBR® Safe DNA gel stain, with a run time of 60 min at 90 V. Amplicons were purified using the MinElute PCR Purification Kit (Qiagen, Hilden, Germany) and Sanger sequenced on an AB 3730xl DNA Analyzer at the ACRF Biomolecular Resource Facility (The John Curtin School of Medical Research, Australian National University). The PCR primers were also used as sequencing primers, and for most samples (excluding G. ornatus and M. peelii) an internal sequencing primer (MT1478H; Fuller, Baverstock, & King, 1998) was used to improve the sequencing quality of the 5' region of the 12S gene.
Sequences were imported into Geneious v8.1.8 and assembled into contigs using the "DeNovo Assembly" option (Kearse et al., 2012). Assemblies were manually checked for quality, and a consensus sequence was obtained containing a partial sequence of the phenylalanine tRNA (tRNA-Phe) gene, the complete 12S ribosomal RNA and valine tRNA (tRNA-Val) genes, and a partial sequence of the 16S ribosomal RNA gene (NCBI accession codes: KY798443-KY798504).

Table S1. Complete list of all freshwater fish species in the Murray-Darling Basin (MDB) and the details of all the samples used for the PCR amplification and Sanger sequencing of the entire mitochondrial 12S ribosomal RNA gene (NCBI accession codes: KY798443-KY798504).

Sequence length distribution of all sequence reads assigned to their respective samples.
Figure S1. The number of internally amplified barcodes for each primer pair plotted against the length of the internal barcode sequences. The data are derived from all sequence records that were successfully assigned to their respective samples, and the vertical dashed lines represent the sequence length threshold used to remove short sequence records for each primer pair.
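The length distribution and per-primer-pair length threshold described for Figure S1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the reads are already demultiplexed and trimmed to the internal barcode, that reads exactly at the threshold are kept, and that thresholds are supplied by the user. The function names and the primer-pair key in the usage example are hypothetical.

```python
from collections import Counter


def length_distribution(reads: list[str]) -> Counter:
    """Count how many reads have each length (the kind of data plotted in Figure S1)."""
    return Counter(len(read) for read in reads)


def filter_short_reads(reads_by_primer: dict[str, list[str]],
                       min_length_by_primer: dict[str, int]) -> dict[str, list[str]]:
    """Drop reads shorter than the threshold chosen for their primer pair.

    Assumption: a read whose length equals the threshold is kept (>=).
    """
    return {
        primer: [read for read in reads if len(read) >= min_length_by_primer[primer]]
        for primer, reads in reads_by_primer.items()
    }


if __name__ == "__main__":
    reads = {"primer_pair_A": ["ACGTACGT", "ACG", "ACGTAC"]}
    thresholds = {"primer_pair_A": 6}  # hypothetical threshold, not a value from this study
    print(length_distribution(reads["primer_pair_A"]))
    print(filter_short_reads(reads, thresholds))
```

With the placeholder threshold of 6 nt, the 3-nt read is discarded and the 6-nt and 8-nt reads are retained. Keeping thresholds in a per-primer-pair mapping reflects the caption's point that each primer pair has its own cut-off.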

Best-fitting linear regression model for the eDNA metabarcoding data obtained from the artificial community sample.
Figure S2. The best-fitting model describing the relationship between the proportional read abundances and the PrimerMiner penalty scores for the artificial community.
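The relationship in Figure S2 is a regression of each species' proportional read abundance on its PrimerMiner penalty score. The Python sketch below shows a plain ordinary least-squares fit of the form y = intercept + slope × x with its coefficient of determination. It assumes the penalty scores have already been computed (it does not call PrimerMiner) and uses untransformed variables; the caption's "best-fitting" model may involve transformations or model selection that are not described in this section. `fit_linear` is a hypothetical helper name.

```python
def fit_linear(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x.

    x: penalty score per species (assumed precomputed);
    y: proportional read abundance of the same species.
    Returns (intercept, slope, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("penalty scores must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: share of variance in y explained by the line
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return intercept, slope, r_squared
```

A negative slope would indicate that species with higher penalty scores (more primer-template mismatches) contribute proportionally fewer reads, which is the kind of primer bias the artificial community was designed to expose.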
Summary of the metabarcoding data obtained from environmental DNA samples collected from two sites in the MDB.
Table S3. Summary of the metabarcoding data obtained from environmental DNA samples collected at two sites within the Murray-Darling Basin (i.e. 8 and 12 samples collected at the Blakney Creek and Murrumbidgee River sites, respectively) and analysed with the MiFish-U, Teleo and AcMDB07 primer pairs. Results are given as the number of samples testing positive for each species and the average proportion of sequence reads ± the standard deviation, given in brackets.
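The per-species summary reported in Table S3 can be derived from per-sample read counts. The Python sketch below is one way to do this under two assumptions the caption does not settle: proportions are averaged over all samples from a site (a sample in which a species was not detected contributes 0), and the sample standard deviation is reported. A sample with no reads at all also contributes 0. `summarise_site` and the species labels in the test are hypothetical.

```python
from statistics import mean, stdev


def summarise_site(samples: dict[str, dict[str, int]]) -> dict[str, tuple[int, float, float]]:
    """Summarise one site: samples maps sample ID -> {species: read count}.

    Returns species -> (number of positive samples, mean proportion, SD of proportion).
    """
    species = sorted({sp for counts in samples.values() for sp in counts})
    summary = {}
    for sp in species:
        n_positive = 0
        proportions = []
        for counts in samples.values():
            total = sum(counts.values())
            reads = counts.get(sp, 0)
            if reads > 0:
                n_positive += 1  # sample tests positive for this species
            # Assumption: empty samples and non-detections count as a proportion of 0
            proportions.append(reads / total if total else 0.0)
        sd = stdev(proportions) if len(proportions) > 1 else 0.0
        summary[sp] = (n_positive, mean(proportions), sd)
    return summary
```

Averaging over positive samples only would give larger mean proportions for rarely detected species, so the choice of denominator should be stated alongside any table produced this way.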