A simplified DNA extraction protocol for unsorted bulk arthropod samples that maintains exoskeletal integrity

Published in: Environmental DNA DOI: 10.1002/edn3.16 Publication date: 2019 Document version Publisher's PDF, also known as Version of record Document license: CC BY Citation for published version (APA): Nielsen, M., Gilbert, M. T. P., Pape, T., & Bohmann, K. (2019). A simplified DNA extraction protocol for unsorted bulk arthropod samples that maintains exoskeletal integrity. Environmental DNA, 1(2), 144-154. https://doi.org/10.1002/edn3.16


| INTRODUC TI ON
Arthropods greatly outnumber all other eukaryotic taxa in terms of species (Mora, Tittensor, Adl, Simpson, & Worm, 2011), and they have a profound influence on human well-being, ranging from vectors of deadly diseases to providing essential ecosystem services, such as pollination and pest control (e.g. reviewed in Noriega et al., 2018). However, the immense number of arthropod species has traditionally represented an enormous challenge for ecological studies, monitoring programs, and biodiversity assessment analyses that rely on bulk arthropod samples. Such studies can include hundreds of bulk samples each containing numerous species of which many might even be undescribed. Morphologybased identification of specimens in such bulk arthropod samples is a daunting task that can require thousands of working hours and a wide range of taxonomic expertise (Basset et al., 2012). One solution has been to restrict the focus, and thereby the taxonomic identifications, to indicator groups such as butterflies (e.g. Thomas et al., 2004) or beetles (e.g. Rainio & Niemelä, 2003), which while faster, results in the loss of much of the information contained within the samples and adds a taxonomic bias. DNA-based methods have been suggested as an alternative means with which to improve the speed and economic cost of biodiversity assessments in general (Baird & Hajibabaei, 2012), and in the case of arthropods, such approaches offer the potential to move from looking at just a few indicator species to assessing entire communities.
Among the DNA-based methods, metabarcoding is currently the methodology of choice, making it possible to identify multiple taxa with high-throughput sequencing (HTS) of DNA mini-barcodes obtained from PCR amplification (Taberlet, Coissac, Pompanon, Brochmann, & Willerslev, 2012). Studies have shown that metabarcoding can be used to identify arthropod taxa in bulk arthropod samples (e.g. Kocher et al., 2017;Liu et al., 2013;Morinière et al., 2016), and for example be applied to assess how arthropod diversity changes to different land-uses (Beng et al., 2016).
Metabarcoding has also been used to assess how invertebrates might leak DNA to their preservative ethanol which then potentially can serve as a mean to obtain sample diversity information (Erdozain et al., 2019;Hajibabaei, Spall, Shokralla, & Konynenburg, 2012). As an alternative to metabarcoding, shotgun sequencing, where the total DNA is sequenced, has been explored, for example to identify the species composition of insect communities  and to elucidate beetle phylogenies (Crampton-Platt et al., 2015;Gillett et al., 2014).
A key element of any DNA-based study of bulk arthropod samples is the DNA extraction, which often includes physical homogenization of individual bulk samples into "insect soups" before DNA extraction, loosing valuable morphological information (Gibson et al., 2014;Gillett et al., 2014;Morinière et al., 2016;Yu et al., 2012). Samples often undergo additional processing during the DNA extraction process, for example sorting into individual specimens to amplify their individual DNA barcodes (Wang, Srivathsan, Foo, Yamane, & Meier, 2018). To account for biomass differences, specimens within bulk samples have been sorted into size-groups (Elbrecht, Peinert, & Leese, 2017) or taxonomic groups (Morinière et al., 2016). Bulk samples have also been sorted out for individual DNA extractions, which were then pooled together to mimic community samples (Brandon-Mong et al., 2015;Crampton-Platt et al., 2015;Yu et al., 2012). While the rationale behind each of these approaches is to maximize the recovered taxonomic information, the labor incurred renders them unrealistic for large-scale studies with hundreds of bulk arthropod samples.  (Murray, Coghlan, & Bunce, 2015;Schrader, Schielke, Ellerbroek, & Johne, 2012) and which therefore require removal through labor-intensive purification, raising the question as to whether such methods add any benefit to the result of bulk arthropod studies. Lastly, there is an obvious need to automate extractions of bulk arthropod samples as this will make processing large numbers of samples more feasible, but the effectiveness and reliability of automation needs validation and quantified comparisons to manual approaches.
In this study we aim at answering the above-mentioned questions. First, we compare a non-destructive DNA extraction method to a destructive method. Second, we assess whether large quantities of digest obtained from bulk arthropod samples need to be purified, or if purifying a subset of the digest is sufficient to recover taxonomic information of bulk arthropod samples.
Third, we assess the need to use phenol/chloroform to remove inhibitors from extracts obtained from bulk arthropod samples.
Lastly, we evaluate whether automatic purification of digests on a QIAcube robot (Q iagen) performs as well as manual purification.
A detailed overview of the study is given in Figure 1. pairs were generated to be as pairwise similar as possible with regard to the number, body-sizes, and origin of specimens (Table S1).

| Mock bulk arthropod samples
Thus, each sample pair contained the same number of specimens within the same three size categories (small, medium, and large) all originating from the same Malaise trap sample. For example: The two mock bulk arthropod samples in sample pair 2 contained 30 small, 0 medium, and 2 large-sized arthropods all originating from the same Malaise trap sample collected in 2014 in Tanzania (Table   S1). Unfortunately, it was not possible to make the nine sample pairs fully identical. Mock samples were photographed with a digital single-lens reflex camera and specimens were morphologically identified to either taxonomic order or family level based on the images. Specimens in sample pair nine and the three larger samples were not identified because of the high number of specimens.
All mock samples were stored in 70% ethanol at room temperature until DNA extraction.

| DNA extraction
DNA extractions were carried out in a dedicated pre-PCR laboratory. Ethanol was carefully poured off the samples, and remaining ethanol was evaporated by placing the samples in an oven at 56°C until dry. Within each of the nine sample pairs, one sample was randomly chosen to be kept intact, while the other was homogenized using a small grinder covered with aluminium foil ( Figure 1c). New foil was used and the grinder cleaned with bleach and ethanol between each sample. The three large samples were F I G U R E 1 Study overview. Twentyone mock arthropod bulk samples falling into two categories were created: (a) Nine sample pairs (18 samples in total) in which each sample contained 11-100 individual arthropods and (b) three large samples containing 100, 510, and an unknown number of arthropods. The nine sample pairs were used to (c) compare physical homogenization with no homogenization prior to DNA extraction, (d) assess the effect of phenol/ chloroform inhibitor removal, and (e) compare DNA purification on a QIAcube robot to manual purification. The three larger samples were used to (f) assess the effect of purifying different digest volumes. Samples that underwent a phenol/chloroform inhibitor removal step were only purified on a QIAcube robot. Comparisons and assessments were based on quantitative PCR and metabarcoding. Note that it was not possible to make the nine sample pairs perfectly identical kept intact. Digestion of all samples was carried out using a digestion buffer modified from  consisting of 10 mmol/L Tris-HCl (pH 8.0), 10 mmol/L NaCl, 5 mmol/L CaCL, 2.5 mmol/L ethylene-diamine-tetra-acetic acid (EDTA), 1% sodium dodecyl sulfate (SDS), 1% proteinase K, 40 mmol/L dithiothreitol (DTT), and molecular biology grade H 2 O. Digestion buffer was added so that all specimens or specimen parts in a sample were covered. A negative extraction control was included alongside the digestion of homogenized samples, intact samples, and the three larger samples. After adding digestion buffer, samples were placed on a rotator in an oven at 56°C overnight (>14 hr) and briefly centrifuged to pellet arthropods. After collecting the DNA digest, Louis, MI) to 200 µl digest. This was mixed by resuspension, placed on a rotator for 5 min followed by centrifugation at 17,000x g for 5 min after which the supernatant was removed for later purification. Automatic purifications were carried out on a QIAcube robot (Protocol: "Purification of PCR products from 100-200 µl PCR samples", version 1.0) with the following settings: a binding step with 5× volume of PB buffer followed by centrifugation at 12,000× g for 60 s. A wash step with 750 µl PE buffer followed by centrifugation at 12,000× g for 60 s and a subsequent dry-spin centrifugation at 12,000× g for 60 s. Lastly, DNA was eluted in 30 µl elution buffer (EB), incubated at 1 min at room temperature and followed by centrifugation at 12,000× g for 60 s. Manual purifications were carried out with the QiaQuick PCR purification kit (Qiagen) following the manufacturer's protocol, but with the same parameters as for the QIAcube robot.
DNA extracts were stored at −20°C until further analyses.

| Metabarcoding
Metabarcoding was carried out on all 69 DNA sample extracts and negative extraction controls. To minimize contamination risk, all PCRs were set up in dedicated amplicon-free laboratories. All PCR amplifications were carried out on a 2720 Thermal Cycler (Applied Biosystems) using the above-mentioned arthropod primers. PCR amplifications were carried out as described for the qPCR above, but as determined by the qPCR screening, they were carried out with 1:10 dilutions of sample DNA extracts and with 35 PCR cycles.
Furthermore, SYBR Green/ROX and dissociation curve were omitted while a final extension of 72°C for 7 min was included. A negative PCR control was included for every seven reactions. A set of 60 uniquely 5′-nucleotide tagged forward and 60 uniquely tagged reverse primers were used (for tag sequences, see Schnell, Bohmann, & Gilbert, 2015). Tags were 7-8 nucleotides in length. Each sample was PCR-amplified using different combinations of matching tags for each of three PCR replicates. PCR products were visualized with GelRed Nucleic Acid Stain (Biotum, Heyward, CA) on a 2% agarose gel against a 50 bp ladder. All negative controls appeared negative.
The PCR products were pooled into amplicon pools avoiding that PCR products carrying the same tag combination were pooled together. Pooling was based on gel band strength to approximate equimolar ratios. Negative controls were included at a similar volume as the PCR amplicons. Amplicon pools were converted into sequencing libraries following the protocol described in (Schnell et al., 2015) using a NEBNext DNA Library Prep Master Mix Set (#E6070; NEB, Ipswitch, MA) and sequenced 230 bp paired-end on an Illumina MiSeq sequencing platform aiming for 20,000 reads per PCR replicate.
After sequencing, AdapterRemoval 2.1.7 (Lindgreen, 2012) was used to remove adaptors and low-quality reads. A customized Perl script was used to merge the paired reads. A modified version (https ://github.com/shyam sg/DAMe) of the tool kit DAMe (Zepeda-Mendoza, Bohmann, Carmona Baez, & Gilbert, 2016) was subsequently used to sort sequences by primer and tags and to filter sequences from each sample's PCR replicates so that only sequences present in minimum two of the three PCR replicates were retained, and singletons were removed (Alberdi, Aizpurua, Gilbert, & Bohmann, 2018). Operational taxonomic units (OTUs) were clustered at 97% similarity using Sumaclust (Mercier, Boyer, Bonin, & Coissac, 2013). Each sample was normalized to a total of 50,000 sequences. In each sample, OTU sequences with a copy number of less than 0.1% of total (less than 500 copies) were discarded as these were assessed as likely false positives (OTU table with all OTUs and sequences before the 0.1% filtering is supplied as Table S4).
Operational taxonomic units were identified through best matches in the Barcode of Life Data System (Ratnasingham & Hebert, 2007) or the NCBI database (https ://www.ncbi.nlm.nih.gov/). OTUs were assigned to taxonomic order or family if it was >97.3% similar to a reference arthropod sequence. For consistency, OTU taxonomy and morphological identifications were not performed at lower taxonomic level than family. Operational taxonomic units not assigned to the phylum Arthropoda were not included in the study.

| Analysing results
To enable comparison of ct values between different qPCR runs, ct values were normalized against the mean ct value of the four positive controls (inter-run calibrators) (Bustin et al., 2010). Ct values, normalized ct values and number of OTUs were statistically analyzed using a paired t test. Venn diagrams and Jaccard similarity coefficients were calculated using Rstudio (Version 1.0.143) and the packages vegan and VennDiagram.

| Non-destructive DNA extraction
Although initial physical homogenization of samples yielded higher DNA concentrations compared to samples that had not undergone homogenization prior to DNA extraction (Figure 2), they were also more strongly affected by PCR inhibitors ( Figure S2). Notably, we also found that the OTU diversity recovered from intact samples was generally at least comparable to, if not better than, that from the homogenized samples with R 2 values almost twice as high for intact samples compared to homogenized samples (Figure 3; Tables   S2, S3). Furthermore, homogenized samples exhibited more variation (less consistency) in OTU numbers between the three purification approaches (phenol/chloroform inhibitor removal followed by automatic purification, only manual purification, and only automatic purification) ( Figure S3).

| Digest volumes
We initially investigated whether the volume of purified digest affects the recovered OTU diversity. Within each of the three large mock bulk samples, there was a consistent and large overlap of recovered OTUs irrespective of whether purification was carried out on 1,000 or 200 µl digest volumes ( Figure 4a). Importantly, purification of 1,000 µl digest did not result in higher numbers of recovered OTUs within each of the three large mock bulk samples, meaning that purification of 200 µl digest was enough to represent the overall OTU diversity (Figure 4b; Figure S1).

| The effect of phenol/chloroform inhibitor removal
We subsequently explored the effect of purifying bulk arthropod sample digests with organic solvents (phenol/chloroform) to remove PCR inhibitors. While we found that this inhibitor removal yielded significantly (p < 0.01) higher amounts of DNA ( Figure 5), this did not affect the number of recovered OTUs (p > 0.05). Indeed, samples that were not treated with phenol/chloroform performed equally well, if not better ( Figure 6; Figures S4 and S5). We found no obvious difference in inhibitor levels between samples which were treated with phenol/chloroform and those that were not, and some samples still showed signs of PCR inhibitors even after the phenol/chloroform treatment ( Figure S6).

| Automated purification
Lastly, we explored the performance of automated purification of DNA extracts with a QIAcube robot versus manual purification.
DNA yields were generally the same between the automated and manual purification ( Figure S7), and no difference was observed with regard to PCR inhibitor levels (data not shown). Furthermore, the number of OTUs did not differ (p > 0.05) between the two purification methods (Figure 7), and the Jaccard similarity coefficients were high showing similar OTU diversity outputs between the two methods ( Figures S8 and S9).
F I G U R E 2 Δct values (calculated as the normalized ct intact − ct homogenized ) for undiluted (blue) and 1:10 diluted (green) sample extracts from the nine sample pairs with the three different purification approaches. Positive values indicate a higher DNA concentration for homogenized samples. * indicate a significant difference (t test, p < 0.05) in ct values between homogenized and intact samples

| D ISCUSS I ON
The principal aim of this study was to explore whether it is possible to increase the efficiency of DNA extractions from keeping bulk caught arthropod samples unsorted and without losing taxonomic information. Our results demonstrate how steps relating both to sample pre-processing and DNA extraction can be simplified without incurring significant loss of biodiversity information.
First, we demonstrate that it is not necessary to physically homogenize the arthropods into a "soup", and we believe that this offers F I G U R E 3 Scatterplot for the number of operational taxonomic units (OTUs; x-axis) and observed number of morphospecies (y-axis) for the nine sample-pairs of homogenized (left) and intact (right) mock bulk arthropod samples. For these, three different approaches applied for the DNA purification are shown; automatic purification using a QIAcube robot (Automatic), manual purification (Manual), and phenol/ chloroform inhibitor removal followed by automatic purification (P/C). The two top plots include all observed number of morphospecies whereas the bottom plots include an adjusted number of morphospecies in relation to the known biases of the used primer set (Tables S2  and S3). Data points above the x = y line indicates a higher number of morphospecies than the number of retained OTUs. R 2 values for an intercept at (0,0) are shown considerable potential gains. This is not only because eliminating the physical homogenization step retains the information represented by the specimens' morphology, but also because sample pre-processing rapidly becomes impractical and costly in man-hours when handling large numbers of samples. Furthermore, the incorporation of additional processing steps increases the risk of cross-contamination between samples. Perhaps the only disadvantage of not homogenising the samples, is the lower levels of DNA (Figure 2). Whether this is actually a disadvantage, however, is unclear and we hypothesize that DNA amplified from non-homogenized samples better reflects the overall community composition because larger individuals will not contribute with an equally larger DNA release when kept intact as opposed to being homogenized. This is because homogenized specimens will contribute in proportion to their full biomass (an approximately cubic relationship to their cross-sectional radius), whereas when left unhomogenized the DNA released is a function of their surface area, thus F I G U R E 4 Assessing the effect of purifying different digest volumes from arthropod bulk samples. For each of three large mock bulk arthropod samples, 1000 μl digest and four replicates of 200 μl digest were purified, resulting in five DNA extracts from each sample on which metabarcoding was carried out. (a) Overlap and (b) number of operational taxonomic units (OTUs) resulting from purifying 1000 μl and each of four replicates of 200 μl digest. The mean Jaccard similarity coefficient (JSC) indicates the similarity between the five DNA extracts. The higher the JSC, the higher similarity F I G U R E 5 Δct values (calculated as the normalized ct -phenol/ chloroform − ct +phenol/chloroform ) for undiluted (blue) and 1:10 diluted (green) sample extracts from the nine sample pairs with the three different purification approaches. Positive values indicate a higher DNA concentration for samples treated with phenol/chloroform. * Indicates a significant difference (t test, p < 0.01) in ct values between samples treated with phenol/chloroform and those that were not proportional to the square of their cross-sectional radius. Whether this is the case requires further study, although it might explain the inconsistent OTU diversity recovered within the same homogenized samples but purified in three different ways ( Figures S3-S5, S8, S9). It should be noted that a simple sorting of specimens within bulk arthropod samples according to their size (as in e.g. Elbrecht et al., 2017) followed by non-destructive extraction of each size group, might further reduce bias caused by different specimen sizes, but will also increase the number of samples substantially.
Another non-destructive approach of assessing biodiversity was reported by Hajibabaei et al. (2012), who extracted invertebrate DNA from their preservative ethanol thus keeping individual sample constituents intact. However, this approach might not be suitable for detecting the overall diversity in bulk arthropod samples as it failed to detect a large proportion of the arthropod species in a known bulk arthropod sample (Erdozain et al., 2019;Linard, Arribas, Andújar, Crampton-Platt, & Vogler, 2016).
Second, we demonstrate that while extraction of DNA from bulk-sampled arthropods typically requires large volumes (e.g. multiple ml) of digestion buffer unless they are first subsampled, purification of only a sub-fraction of the digestion buffer maintains the overall OTU diversity (Figure 4). This in turn renders savings in terms of both time and economic costs, and further makes automatic handling more feasible as a fixed digest volume is often the only option.
Third, we demonstrate that although organic solvents such as phenol/chloroform have been used in some previous studies on arthropod genetics Zhou et al., 2013), possibly with the aim of ensuring potential inhibitory substances are purified out of DNA extracts (Hunt, 1997;Sperling, Anderson, & Hickey, 1994), their use offered no significant benefit often resulting in the same or lower numbers of OTUs compared with their omission (Figure 6; Table S2-S3). This likely indicates that the occurrence of PCR inhibitors in DNA extracts from bulk arthropod samples are equally well accounted for through dilution of DNA extracts and given both the toxicity of the organic solvents, and the manual labor required to use them, their omission is a welcome step.
Lastly, we show that automated bulk arthropod digest purification on the QIAcube platform is an effective way to optimize their processing, as it returns similar results to manual purification (Figure 7), while concomitantly offering savings in manual labor, and almost certainly improving consistency and reducing the risk of human error and contamination. We acknowledge that the QIAcube robot used in this study only has a 12-sample throughput and therefore cannot be seen as high-throughput processing. The benefits obtained from automated sample handling can, however, also be obtained from other platforms such as QIAsymphony (Qiagen) or Opentrons (Opentrons, New York, USA) with higher sample-outputs.
Naturally, despite these successes we would be amiss if we were not to highlight the fact that our results are based upon a relatively limited dataset, and thus come with several caveats. First, we highlight that our results are based on metabarcoding. While currently the most popular approach in molecular biodiversity studies, the potential of alternate approaches are now being explored, such as F I G U R E 6 Comparison of the number of operational taxonomic units (OTUs) recovered from samples for which the DNA extraction protocol included a phenol/chloroform inhibitor removal step (x-axis) and where it did not (y-axis). Specifically, DNA digests from 18 bulk arthropod samples were treated with a phenol/chloroform inhibitor removal step and compared against the same DNA digest, which did not undergo phenol/chloroform inhibitor removal. Black marks indicate two data points in the same place. The linear regression line is depicted as a solid line with associated R 2 value F I G U R E 7 Number of operational taxonomic units (OTUs) resulting from metabarcoding analyses of DNA extract digest from 18 bulk arthropod samples, which were all both purified on a QIAcube robot (y-axis) and manually purified (x-axis). Black data points indicate two data points in the same place. One homogenized sample purified on a QIAcube robot failed to PCR amplify and was not used. The linear regression line is depicted as a solid line and the R 2 value included shotgun and target capture enrichment sequencing as a means of characterizing the taxonomic diversity of complex arthropod communities (e.g. Tang et al., 2015;Zhou et al., 2013). While utilization of such approaches is unlikely to change the results, studies that directly validate this will be welcomed. Metabarcoding also relies on PCR amplification which might introduce primer bias (Elbrecht & Leese, 2015;Piñol, Mir, Gomez-Polo, & Agustí, 2015). The primer set used in this study, ZBJ-ArtF1c and ZBJ-ArtR2c (Zeale et al., 2011) is known to have lesser affinity toward, for example Hymenoptera and Coleoptera (Brandon-Mong et al., 2015;Clarke, Soubrier, Weyrich, & Cooper, 2014). In this study, all samples are, however, influenced by the same primer bias, and we found no difference in results when taking the potential biases into account (Figure 3; Tables S2, S3).
Second, we also acknowledge that the complexity of the arthropod samples analysed here is moderate, as our mock samples contain no more than 48 different taxa (Tables S2, S3). Other studies will perhaps face far richer bulk samples -for example, one single Malaise trap in New Zealand was reported to catch an average of 3,800 individual arthropods per month across a year of sampling (Moeed & Meads, 1987), while a Malaise trap deployed in Costa Rica collected more than a thousand arthropod specimens during a 7-day period, resulting in more than 350 OTUs (Gibson et al., 2014). While an increase in community complexity is unlikely to affect either organic solvent or automated purification results, and as indeed the current extraction protocol may even perform better on non-homogenized samples for the reasons outlined above, there is certainly a need for future studies to explore whether purification of subsamples of digests is equally effective when applied to such complex mixtures.
Lastly, we highlight that while our method is certainly less destructive than homogenization, given that DNA is being extracted, it is clearly not a completely non-destructive method. The DNA principally derives from the internal soft tissue , with the external features left largely intact, but even so the degree of "nondestructiveness" to these features will be taxon-dependent, and more robust specimens (e.g. those with a more sclerotized exoskeleton such as most beetles) will be better suited than others for subsequent analysis or inclusion in museum collections post extraction. Having said that, in separate pilot experiments we have succeeded in extracting DNA using the protocol introduced here on a range of more fragile specimens including flies, and subsequently included the specimens into museum collections afterwards (unpublished). Therefore, while promising, we recommend that potential users may wish to undertake pilot tests before fully adopting the method on valuable specimens.
One should also note that when scaling up, a large volume of digest buffer will be needed. We have estimated a cost of around 3.8$ per sample (using 10 ml buffer) and increasing further for larger bulk samples. The benefits of the method presented here, such as not having to sort the samples (therefore fewer samples for the downstream work), reduced handling time and the non-destructive approach outweighs the potential small increase in extraction expenses.
We have demonstrated how steps related to both sample pre-processing and DNA extraction of bulk arthropod samples can be simplified and automated without incurring significant loss of biodiversity information. This workflow is efficient, cost-effective, reduces risk of cross-contamination, and importantly, leaves specimens intact. We believe this has the potential to change the way we utilize and value DNA-based biodiversity studies of arthropods in the future.

ACK N OWLED G M ENTS
We are grateful for the permits provided by Tanzania National Parks (TANAPA), the Tanzania Commission for Science and Technology (COSTECH: No. 2013-242-ER-2012-147 to TP and 2014-138-ER-2012 to MN) and the Tanzania Wildlife Research Institute (TAWIRI). We also thank Richard M. L. Laizzer and Aloyce Mwakisoma for their invaluable assistance in the field, and the Udzungwa Ecological Monitoring Centre (UEMC) and its staff for logistical support.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no competing interests.