Metabarcoding for stomach‐content analyses of Pygmy devil ray (Mobula kuhlii cf. eregoodootenkee): Comparing tissue and ethanol preservative‐derived DNA

Abstract The application of high‐throughput sequencing to retrieve multi‐taxon DNA from different substrates such as water, soil, and stomach contents has enabled species identification without prior knowledge of taxon compositions. Here we used three minibarcodes designed to target mitochondrial COI in plankton, 16S in fish, and 16S in crustaceans, to compare ethanol‐ and tissue‐derived DNA extraction methodologies for metabarcoding. The stomach contents of pygmy devilrays (Mobula kuhlii cf. eregoodootenkee) were used to test whether ethanol‐derived DNA would provide a suitable substrate for metabarcoding. The DNA barcoding assays indicated that tissue‐derived operational taxonomic units (OTUs) were greater compared to those from extractions performed directly on the ethanol preservative. Tissue‐derived DNA extraction is therefore recommended for broader taxonomic coverage. Metabarcoding applications should consider including the following: (i) multiple barcodes, both taxon specific (e.g., 12S or 16S) and more universal (e.g., COI or 18S) to overcome bias and taxon misidentification and (ii) PCR inhibitor removal steps that will likely enhance amplification yields. However, where tissue is limited or no longer available, but the ethanol‐preservative medium is still available, metabarcoding directly from ethanol does recover the majority of common OTUs, suggesting the ethanol‐retrieval method could be applicable for dietary studies. Metabarcoding directly from preservative ethanol may also be useful where tissue samples are limited or highly valued; bulk samples are collected, such as for rapid species inventories; or mixed‐voucher sampling is conducted (e.g., for plankton, insects, and crustaceans).


| INTRODUC TI ON
High-throughput sequencing platforms enable generation of accurate and cost-effective multispecies genetic assays (Mardis, 2008).
Deoxyribonucleic acid (DNA) barcoding is a methodology that can provide precise and semi-automatable species identification through the design of forward-reverse primer sets for highly conserved regions of mitochondrial (mt) DNA (Hebert et al., 2003). Combining DNA barcoding and high-throughput sequencing has delivered a promising tool-DNA metabarcoding-to detect biodiversity in DNA extracted from materials including water (Stat et al., 2017), air (Kraaijeveld et al., 2015), soil (Drummond et al., 2015), fecal , and stomach-content samples (Berry et al., 2015).
In the last decade, DNA metabarcoding of fecal matter and stomach contents has been developed, with accurate taxonomic resolution of dietary biodiversity, in attempts to infer trophic interactions among both terrestrial (Bohmann et al., 2011;Clare, Fraser, Braid, Fenton, & Hebert, 2009) and aquatic organisms (Berry et al., 2015). A fundamental step in this kind of study is DNA extraction, usually from tissue or fecal samples. However, Shokralla, Singer, and Hajibabaei (2010) proposed a novel DNA extraction methodology; successfully Sanger sequencing a universal COI barcode directly from ethanol-derived DNA extracts. The ethanol used for sample preservation was hypothesized to be an adequate DNA carrier, thus through low temperature evaporation of ethanol and subsequent re-suspension of the DNA pellet, the target organism's mtDNA was polymerase chain reaction (PCR) amplified and Sanger sequenced.
More recently, the same ethanol DNA extraction protocol was tested for utility in retrieving trace DNA from freshwater benthic larval communities (Hajibabaei, Spall, Shokralla, & Konynenburg, 2012) using both Sanger and 454 sequencing. Only individuals present at very low abundance (i.e., 1 individual) were not ascertained via these DNA barcoding assays. In contrast, Robertson, Minich, Bowman, and Morin (2013)  Here, we propose that for stomach-content analyses, DNA extracted directly from preservative ethanol could provide an accurate representation of prey diversity in a more cost-effective and efficient manner than DNA extraction from tissue. Moreover, DNA metabarcoding may alleviate the impediment of identifying all prey DNA contained in stomach remains, if liquefied remains can be efficiently filtered in an unbiased manner. However, extracting DNA directly from stomach contents is not straightforward, because mixed-taxon stomach samples can be several kilograms in weight, particularly in apex-(e.g., great white shark (Carcharodon carcharias)) and mesopredators (e.g., blacktip shark (Carcharhinus limbatus), Thunnus ssp.) (Pompanon et al., 2012).
It can therefore be difficult to representatively subsample small amounts of (sometimes digested) tissue to recover the full diversity of ingested prey. Nonetheless, it is important to note that digestion rate plays a crucial role in the detection of tissue-dependent extractions, sometimes causing an underrepresentation of those taxa where tissue is rapidly digested (Sousa et al., 2016).
Despite recent debate over the taxonomy of M. kuhlii cf. eregoodootenkee (White et al., 2017), the conservation status of M. eregoodootenkee and M. kuhlii are, respectively, near threatened and data deficient (IUCN 2018). Knowledge regarding habitat type is also inaccurate since the former is described as an oceanic species, and M. kuhlii as neritic (IUCN 2018). However, these mobulids share the same distribution across the Indo-West Pacific Ocean.
Dietary habits of both species have been scarcely described in both species, but it is generally accepted that planktonic crustaceans and possibly small fishes and cephalopods are the main prey items (Couturier et al., 2012). The rare recovery of these rays allowed us to test a novel methodology for diet research in these taxa and to fill a knowledge gap for this understudied mobulid.
Since stomach-content studies preclude a priori knowledge of the target species, we used operational taxonomic unit (OTU) analysis to test our hypothesis of no differences in recovered predatorprey relationships via metabarcoding between ethanol-preservative and tissue-derived DNA.

| Sample collection
Between January and May 2017, 31 specimens of pygmy devil rays were identified and collected by the New South Wales Department of Primary Industries (NSW DPI) from five bather protection nets deployed off northern New South Wales, Australia. After recovery, the individual specimens were transported by vehicle at ambient temperature to NSW DPI Center in Ballina (NSW, Australia) and immediately frozen at −20°C to preserve the specimens, prior to bulk assessment of all animals, where the animals were collectively defrosted and necropsied following Broadhurst, Laglbauer, Burgess, and Coleman (2018). From visual inspection of stomach contents, it was possible to distinguish small fragments of prey remains, some of which were identified as sandy sprat (Hyperlophus vittatus). The stomach contents of 27 of the 31 specimens, which were relatively undegraded, were preserved in 50 ml of 100% ethanol to avoid further degradation of the samples.

| DNA extraction
In September 2017, 1.5 ml aliquots of ethanol were taken from each of 10 pygmy devil ray stomach-content samples without disturbing the settled tissue matter. For each sample, five technical replicates were taken to increase the final amount of DNA. These samples, along with five negative controls, were heated at 56°C until the ethanol had completely evaporated (Shokralla et al., 2010). Microbiology grade DNAse-free water was added, and the samples were left at ambient temperature overnight. The samples were then vortexed to encourage resuspension, and technical replicates were combined. Tissue was also taken from the same 10 samples for comparison. We attempted to subsample the tissue as representatively as possible, particularly subsampling larger tissue pieces in an attempt to avoid swamping the sample with a single taxon. The DNA was extracted using a Qiagen DNeasy blood and tissue kit (Qiagen, Germany) following the manufacturer's instructions (Qiagen, 2006), with 10 min incubation after addition of elution buffer, instead of 1 min, to ensure adequate elution. The DNA extraction was carried out in a pre-PCR laboratory to minimize the risk of contamination. Filter tips were used, and negative controls were included in all stages of the laboratory workflow. Clean room protocols were followed, with extensive bleaching of the work areas and UV-treatment of equipment, wherever possible.

| Positive controls
Positive controls are a vital inclusion in any metabarcoding PCR because they allow distinction between problems regarding PCR conditions and sample DNA extractions, and can also be used to troubleshoot and calibrate downstream metabarcoding issues (Deiner et al., 2017). Here, positive controls were sampled for DNA on a subsequent day to the stomach-content samples to avoid cross-contamination and comprised the following: white banana prawn (Penaeus  1985). These specimens were chosen because they were likely to amplify using the primers used in this study and were accessible from a local fish market. Sections of tissue were removed and the same DNeasy tissue extraction kit was used. Positive controls were tested using routine universal primers HCO2198 and LCO1490 for prawns (Folmer, Black, Hoeh, Lutz, & Vrijenhoek, 1994), and Fish F1 and Fish R2 for fish (Ward, Zemlak, Innes, Last, & Hebert, 2005) to confirm the extraction protocols and PCR assays were working.

| Metabarcoding assay
Three previously designed and tested group-specific barcode primers were selected for teleosts, crustaceans, and plankton, targeting 16S mtDNA (teleost and crustacean) and the cytochrome c oxidase subunit I gene (plankton) ( Table 1). The DNA extract concentrations were measured using a NanoDrop. The ratio of 260 and 280 nm absorbance was recorded to examine protein inhibition in the DNA extracts (Zarzoso-Lacoste et al., 2013). The PCR was performed following the protocol recommended by Taberlet, Bonin, Zinger, and Coissac (2018) using 10 µl of AmpliTaq Gold 360 DNA Master Mix (Thermo Fisher Scientific, USA), 2 µl of forward and reverse primer, 0.16 µl bovine serum albumin to decrease the risk of PCR inhibition, 2 µl of genomic DNA, and 4 µl of DNA-free water. After activation at 96°C for 10 min, each PCR run consisted of 35 cycles: 30 s at 96°C denaturation, 30 s at 50°C hybridization, and 1 min at 72°C elongation (Taberlet et al., 2018). The PCR products were tested on a 2% Agarose gel to confirm the successful amplification of the target amplicon of correct size, while subsequent PCR assays were undertaken using each primer with the addition of the appropriate Illumina overhang adaptors (forward overhang 5'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG [sequencespecific primer] 3' and reverse overhang 5'GTCTCGTGGGCTCGGAG ATGTGTATAAGAGACAG-[sequence-specific primer] 3') using the same PCR conditions as above. Two PCR replicates for each primer set were conducted for each sample. The PCR products were sent to the Ramaciotti Centre for Genomics at The University of New South Wales (Sydney, Australia) for PCR cleanup using AMPure XP beads and Index PCR. This process involves attaching Illumina unique MIDs (Multiplex IDentifiers) to the amplicons of each sample, using primers that bind to the Illumina overhang adaptors. After two repetitions of PCR clean-up, the libraries were quantified, normalized, and pooled before paired-end sequencing was performed on an Illumina MiSeq platform. Because single step (indexing) PCR has been shown to significantly bias sequence abundance, we chose to use the Illumina recommended two-step procedure for metabarcoding (O'Donnell, Kelly, Lowell, & Port, 2016

| Bioinformatic pipeline
All Illumina reads went through quality filtering comprising two main steps. The first involved pairing, merging, and trimming off the forward-reverse primers using Geneious software (Kearse et al. 2012). Reads were discarded if primers were not present, not exactly matching the primer nucleotide lengths, or if reads exceeded the mismatch number of ambiguities according to the primer sequences.
The second quality-filtering step was executed through USEARCH.
Reads were removed if containing ≥1 ambiguities, maximum error above 0.5, were less than 50 bp in size or exceeding the expected amplicon length (Table 1). Sequences were dereplicated into groups of unique sequences. Singleton groups were discarded (Edgar, 2010).
Following USEARCH OTU pipeline recommendations, we conducted an OTU analysis that consisted of clustering sequences with 97% similarity through the execution of the UPARSE algorithm (Edgar 2010

| Statistical analysis
R Studio was utilized for all statistical analysis (RStudio Team, 2015).
Data normality was evaluated by Shapiro (Shapiro and Wilk 1965) and Levene (Levene, 1960) (Wilcoxon, 1945) was executed in order to inspect the similarity of the read distribution of common OTUs between the TIS and ETH methods for the three minibarcodes (16S fish, 16S crustacean and COI plankton), using R Studio.

| DNA extraction
The initial DNA concentrations derived from TIS were significantly greater than those for the ETH (t-value = 3.47; p-value = 0.007;

| Metabarcoding results and Operational Taxonomic Unit analysis
Executing BLASTn helped to identify exogenous contaminations and non-target taxa such as Amniota and Bacteria, which were confidently excluded from further analyses. The negative controls had low levels of endogenous contamination, and those OTUs were The OTU analysis detected 11 and 13 OTUs in plankton COI, 6 and 5 OTUs in the fish 16S assay, and 2 and 13 OTUs in the crustacean 16S assay, for ETH and TIS, respectively. The distribution of reads for common OTUs across samples was similar between ETH and TIS (Table 3). The TIS method retained the greatest number of OTUs in plankton COI and crustacean 16S (Figure 3). In contrast, for the fish 16S assay, ETH showed one more OTU than TIS. In addition, we tested an 80% identity threshold to explore whether a lower similarity could give the same number of common OTUs. Our reasoning was that "false" OTUs could be generated when sequence lengths differed, or when high levels of intraspecific genetic diversity were present, or through sequencing error or other such biases. Although the 80% cut-off contributed to a larger number of common OTUs than for the 97% cut-off, it still indicated that TIS recovered more OTUs than the ETH (Figure 3).  (Albaina et al., 2016) and environmental DNA surveys (Stat et al., 2017). In fact, considering that bacterial sequences (also found in the raw data analysis and removed from OTU analysis), unidentified "no hits" and other eukaryotic taxa were detected by the COI barcode, this reflects its universal detection ability (Somervuo et al., 2017).

| D ISCUSS I ON
For our results, the high overall number of COI OTUs could be due to COI barcode limitations, that is a large number of unidentified "no hits".
Clearly, the period of digestion, and inhibition due to bacteria, digestive enzymes etc., are two crucial variables in dietary assays (Zarzoso-Lacoste, Corse, & Vidal, 2013). The longer prey DNA goes through digestive phases, the more inhibited it is likely to be, effectively influencing its quantification and amplification (Taberlet et al., 2018). Nonetheless, gut-derived prey DNA is thought to be of better quality than scat-derived prey DNA since digestion is at its initial phase. As one example, Kamenova, Mayer, Coissac, Plantegenest, and Traugott (2017) noted gut DNA was detectable for longer than scat DNA after periods of feeding and starvation in the predatory carabid beetle (Pterostichus melanarius). In fish stomachs, DNA persistence has been recorded ranging between 16 and 24 hr post digestion using mid-throughput sequencing (Carreon-Martinez et al., 2011). However, because the stomach likely contains bacteria and co-extracted substances, DNA can still certainly be affected by inhibition impairing PCR amplification (Zarzoso-Lacoste et al., 2013).
Various commercial DNA extraction kits enable the reduction or removal of inhibition. Most published studies describing marine organism diets using stomach-content samples either included an analysis step, or extraction kit that reduced inhibition (Table 2). An example of dealing with PCR inhibitors in dietary studies involves scat samples, which are less invasive and far easier to collect than stomach samples Hardy et al., 2017). Scat DNA was often found to be highly inhibited and degraded. Quantitative PCR (qPCR) is a key step that facilitates determining an appropriate dilution to reduce inhibition levels for downstream analysis, and for estimating whether sufficient target DNA is obtained from the DNA extraction. In fact, coupling qPCR and high-throughput sequencing is thought to be the best option for in-depth diet analysis when using scat samples (Murray et al., 2011). However, studies on stomach samples of marine organisms did not always include qPCR before NGS sequencing (Table 2). Although we did not use the qPCR approach here, it is clear that qPCR provides clear benefits for metabarcoding studies.
The study organisms' (Pygmy devil ray) diet has not been well studied. Nevertheless, dietary research on mobulids has described the group as mostly planktivorous, feeding on various zooplankton and small fish (Couturier et al., 2012). Although the techniques in use, that is stable isotope and fatty acid profiling, and visual identification are generally accepted, most existing methods have failed to resolve higher taxonomy in prey identification (Pompanon et al., 2012;Berry et al., 2015). Since the collection of scat samples from many marine organisms are particularly challenging, dietary studies rely on stomach samples, which in turn are dependent on fishery and fishery-independent sampling or strandings. Moreover, the eth-  (Wilcoxon 1945). When the p-value is lower than 0.05 (shown in BOLD), the two methods show different read distributions

| CON CLUS IONS
Studying trophic dynamics via high-throughput sequencing requires an efficient low-cost methodology that accurately recovers, identifies, and characterizes all ingested prey-in terms of both taxonomic diversity and ideally, levels of biomass (Taberlet et al., 2018). The methodology implemented for dietary studies of marine organisms varies according to the study taxon and the sample condition (Table 2). Even though this study investigated only one animal species, we suggest that DNA extractions from stomach-content tissue are more reliable than extractions directly from preservative ethanol but comparing these two methods in other species is required in order to verify whether this finding is universal. Ethanol-based methods can still facilitate a rapid overview of diet, and such methods may be applicable for liquefied stomach contents, using ethanol for resuspension when tissue samples are not available. The method might also be applied to bulk mixed taxonomic vouchers (e.g., bulk insect/plankton/crustacean collections preserved in ethanol)-a method routinely used in rapid biodiversity inventories.
Future research warrants testing ethanol extraction coupled with inhibition removal kits to determine if ethanol performs equally well as tissue. However, this may entail an increase of material costs and laboratory work. Moreover, as qPCR has an important role in ensuring robustness of the methodology, its effectiveness should be evaluated when DNA extraction kits include an inhibition removal step. The application of multiple DNA metabarcoding assays is certainly a promising approach for taxonomic studies on the feeding habits of marine organisms. However, this type of study will always be constrained by spatial and temporal coverage. Integrating the metabarcoding approach with isotope analysis and fatty acid profiling would improve resolution for longterm dietary studies.
We conclude that when stomach-content samples are available, researchers should consider which methodology is appropriate according to the sample status (e.g., level of degradation) and current knowledge of the host diet (e.g., generalist or specialist feeder).
However, assessing diet only via the visual identification of ingested prey may underestimate taxonomic diversity. It is essential to integrate new high-throughput approaches into dietary studies to inform fishery management and marine conservation; implicit among which is identifying appropriate, low-cost metabarcoding methodologies to accurately assess taxonomic composition.

ACK N OWLED G M ENTS
We thank Joseph Di Battista and Michael Stat for general metabarcoding discussions and guidance with our bioinformatic pipeline.
MdB acknowledges start-up funds from The University of Sydney and the USyd School of Life and Environmental Sciences. MKB acknowledges funding from NSW DPI. We are thankful to three anonymous referees for their helpful comments to improve the manuscript.

CO N FLI C T O F I NTE R E S T
None declared. wrote the paper. All authors contributed intellectually to the interpretation of the results and writing of the manuscript.