Whole genome sequencing of marine organisms by Oxford Nanopore Technologies: Assessment and optimization of HMW‐DNA extraction protocols

Abstract
Marine habitats are Earth's largest aquatic ecosystems, yet little is known about the genomes of marine organisms. Molecular studies can reveal their genetic makeup, shedding light on species adaptation and speciation and enabling precise taxonomic authentication. However, extracting high molecular weight DNA from marine organisms and preparing DNA libraries from it for whole genome sequencing is challenging, largely because excessive secreted metabolites co-precipitate with the DNA and hinder its sequencing. In this work, we addressed this issue by describing an optimized isolation method and comparing its performance with the most commonly reported protocols and commercial kits: the SDS/phenol–chloroform method, the Qiagen Genomic Tips kit, the Qiagen DNeasy Plant mini kit, a modified protocol of the Qiagen DNeasy Plant kit, the Qiagen DNeasy Blood and Tissue kit, and the Qiagen QIAamp DNA Stool mini kit. Our method performed significantly better across different marine species regardless of their shape, consistency, and sample preservation, improving Oxford Nanopore Technologies sequencing yield 39-fold for Spirobranchus sp. and enabling the generation of almost 10 GB of data per flow cell/run for Chrysaora sp. and Palaemon sp. samples.

The morphological convergence of marine organisms and their phenotypic plasticity often leads to imprecise taxonomic assignment.
Studying their genomes can clarify the phylogenetic relationships among and within genera and species (Panova et al., 2016). However, the molecular approach has been impeded by the scarcity of available genomic data.
We recently launched a project that aims to sequence the whole genomes of Qatar's marine species, in order to identify potentially new marine organisms and to create a reference library of Qatar's marine species. We applied Oxford Nanopore Technologies (ONT) platforms to this aim because they generate long reads that facilitate genome assembly.
ONT sequencing flow cells use hundreds of nanopores concurrently to read single-stranded DNA at up to 450 nucleotides per second, yielding several gigabases of sequence during a 2-day run. Because the technology does not rely on PCR or discrete strand-synthesis events, DNA fragments can be arbitrarily long (Jansen et al., 2017). However, to use this genomic approach successfully, pure high molecular weight (HMW) DNA is required.
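The throughput claim above follows from simple multiplication. As a rough sketch, assuming a flow cell with 512 channels sequencing simultaneously (an assumption, not a figure given in this article) and a constant speed of 450 nt/s, the ideal output of a 48-h run can be estimated as below. Real runs fall well short of this bound because pore occupancy is below 100% and pores deplete over time. `max_output_bases` is a hypothetical helper introduced for illustration.

```python
def max_output_bases(channels: int, bases_per_second: float, hours: float,
                     occupancy: float = 1.0) -> float:
    """Theoretical upper bound on bases sequenced in one run.

    Assumes every channel reads continuously at a constant speed; `occupancy`
    scales the result by the fraction of channels actually sequencing.
    """
    if not 0.0 <= occupancy <= 1.0:
        raise ValueError("occupancy must be between 0 and 1")
    seconds = hours * 3600
    return channels * bases_per_second * seconds * occupancy


if __name__ == "__main__":
    # Assumed: 512 channels at 450 nt/s for a 48-h run.
    ideal = max_output_bases(channels=512, bases_per_second=450, hours=48)
    print(f"Ideal 48-h output: {ideal / 1e9:.1f} Gb")
    # The same bound at 40% occupancy, for comparison.
    print(f"At 40% occupancy:  {max_output_bases(512, 450, 48, 0.4) / 1e9:.1f} Gb")
```

Because output scales linearly with occupancy in this model, any extract that keeps pores blocked or idle loses output in direct proportion.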
Extracting such DNA from marine organisms remains an arduous task because of their varied tissue consistency and the high content of polysaccharides, polyphenols, and other secondary metabolites that are co-extracted with the DNA (Ramakrishnan et al., 2017). These contaminants can inhibit the enzymes used during library preparation, rendering the DNA useless for downstream applications.
Therefore, we surveyed a wide range of homogenization methods and genomic DNA extraction protocols, using either chemical extraction (Lin et al., 2016) or commercial extraction kits, to assess their efficiency in providing high-quality HMW-DNA suitable for long-read sequencing.

| MATERIALS AND METHODS
Specimens of the intertidal belt-forming worm Spirobranchus sp. were collected along the Qatari coast. Most specimens were found in dense aggregations of thousands of individuals covering rock surfaces. Individuals were removed from their tubes by carefully cracking the tubes with a knife, placed in vials, and transported to the laboratory for further analysis (Pazoki et al., 2020).
Because of their small size, two worm specimens (~70 mg) were used for DNA isolation with each of the six investigated extraction protocols. Two protocols included a liquid nitrogen homogenization step: a sodium dodecyl sulfate (SDS)-phenol/chloroform protocol (50 mM Tris-HCl pH 8, 0.1 M EDTA, 1% SDS, 0.2 M NaCl, and 100 μg/ml proteinase K) as reported previously (Lin et al., 2016), and the Qiagen Genomic Tips 100/G kit as recommended by ONT (Jansen et al., 2017). Four other protocols were based on three commercial kits widely used for difficult samples: the DNeasy Blood & Tissue kit, the QIAamp DNA Stool mini kit, Qiagen's DNeasy Plant mini kit, and a modified DNeasy Plant protocol. The effect of adding a mechanical homogenization step was also assessed, and DNA yield, purity, and sequencing profile were compared (Angthong et al., 2020).
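Preparing the SDS lysis buffer listed above is a C1V1 = C2V2 dilution problem. The sketch below computes stock volumes for a chosen final volume. The 20 mg/ml proteinase K stock matches the one used elsewhere in this protocol; the other stock concentrations (1 M Tris-HCl, 0.5 M EDTA, 10% SDS, 5 M NaCl) are assumptions for illustration, not values given in the article.

```python
# component: (final concentration, stock concentration); units match within each row
RECIPE = {
    "Tris-HCl pH 8": (0.05, 1.0),   # M; assumed 1 M stock
    "EDTA": (0.1, 0.5),             # M; assumed 0.5 M stock
    "SDS": (1.0, 10.0),             # % w/v; assumed 10% stock
    "NaCl": (0.2, 5.0),             # M; assumed 5 M stock
    "Proteinase K": (0.1, 20.0),    # mg/ml (100 ug/ml final); 20 mg/ml stock
}


def stock_volumes(final_volume_ul: float, recipe: dict) -> dict:
    """Volume (ul) of each stock, via V1 = C2 * V2 / C1, plus the water to top up."""
    volumes = {name: final_volume_ul * final / stock
               for name, (final, stock) in recipe.items()}
    water = final_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for this final volume")
    volumes["water"] = water
    return volumes


if __name__ == "__main__":
    for name, vol in stock_volumes(10_000, RECIPE).items():  # 10 ml of buffer
        print(f"{name:16s} {vol:8.1f} ul")
```

For 10 ml this gives 500 µl Tris-HCl, 2000 µl EDTA, 1000 µl SDS, 400 µl NaCl, 50 µl proteinase K, and 6050 µl water.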
The first three kit-based protocols were carried out according to the manufacturer's guidelines. For the fourth protocol, which uses the DNeasy Plant mini kit, a proteinase K digestion step (50 µl of a 20 mg/ml stock solution) was added during sample lysis (Table 1).
A new optimized protocol combining steps from the common methods was also investigated. Samples were first washed three times with PBS buffer, with a 5-min centrifugation at 4000 × g each time. Mechanical homogenization was then performed in Lysis Buffer G2 (provided with the Genomic Tips 100/G kit) using the TissueRuptor II (Qiagen) on ice at low speed (setting 2) for 10 s; a second homogenization step may be needed for specimens with hard shells. Proteinase K (20 mg/ml) and RNase A (200 µg/ml) were added as per the manufacturer's guidance, and the lysate was incubated at 50°C. Two incubation periods were tested: the recommended 2 h and 16 h (overnight). Debris was then pelleted by centrifugation for 5 min at 4000 × g. Two InhibitEX tablets (Qiagen) were added to the supernatant and dissolved by inversion, followed by a 5-min incubation at room temperature.
After another 5-min centrifugation at 4000 × g, the pellet was again discarded. The retained supernatant was then processed by the gravity-flow procedure according to the Genomic Tips kit guidelines, with two adjustments: an extra 5-min incubation at room temperature after the addition of isopropanol, and an extra 1-h incubation at 37°C for the final elution with pre-warmed elution buffer.
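The optimized workflow described above can be written down as an ordered checklist. The sketch below encodes only the timed steps stated in the text (the gravity-flow column steps are omitted because no durations are given for them) and sums their duration, which shows that the overnight lysis dominates the timed portion of the protocol. The `Step` class, the step labels, and the helper functions are hypothetical, introduced for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Step:
    """One timed step of the extraction (hypothetical representation)."""
    name: str
    minutes: float


# Timed steps of the optimized protocol as described in the text.
# Gravity-flow column steps are left out: the article gives no durations for them.
OPTIMIZED = [
    Step("PBS wash x3, 5 min at 4000 x g each", 3 * 5),
    Step("TissueRuptor II homogenization in Buffer G2", 10 / 60),
    Step("Proteinase K + RNase A lysis at 50 C", 16 * 60),  # overnight variant
    Step("Debris spin, 5 min at 4000 x g", 5),
    Step("InhibitEX tablets, room temperature", 5),
    Step("Inhibitor spin, 5 min at 4000 x g", 5),
    Step("Extra incubation after isopropanol", 5),
    Step("Elution with pre-warmed buffer at 37 C", 60),
]


def total_hours(steps) -> float:
    """Sum of the timed steps, in hours."""
    return sum(step.minutes for step in steps) / 60


def with_lysis_minutes(steps, minutes: float):
    """Copy of the protocol with a different lysis duration (e.g. the 2-h variant)."""
    return [replace(s, minutes=minutes) if s.name.startswith("Proteinase K") else s
            for s in steps]


if __name__ == "__main__":
    print(f"Overnight lysis: {total_hours(OPTIMIZED):.1f} h of timed steps")
    print(f"2-h lysis:       {total_hours(with_lysis_minutes(OPTIMIZED, 120)):.1f} h")
```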
DNA was quantified with the Qubit dsDNA High Sensitivity Assay kit on the Qubit® 4.0 Fluorometer (ThermoFisher Scientific). Purity was assessed from the NanoDrop A260/A280 ratio, and fragment length by visualization on a 0.7% agarose gel (Ketchum et al., 2018). Quality was also evaluated by PCR amplification with two primer pairs that target nuclear and mitochondrial genes and amplify DNA fragments of two different sizes (Halt et al., 2009).
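One way to apply the NanoDrop purity check consistently is to flag A260/A280 values that fall outside an acceptance window. The 1.8–2.0 window below is a commonly used rule of thumb for DNA and is an assumption here, not a threshold stated in the article. Ratios below the window usually point to protein or other contaminants absorbing near 280 nm, and ratios above it often to RNA carry-over. `classify_purity` is a hypothetical helper; the ratios in `REPORTED` are the values given in the Results.

```python
def classify_purity(a260_a280: float, low: float = 1.8, high: float = 2.0) -> str:
    """Flag a NanoDrop A260/A280 ratio against an assumed acceptance window."""
    if a260_a280 < low:
        return "low (possible protein or other 280-nm-absorbing contaminant)"
    if a260_a280 > high:
        return "high (possible RNA carry-over)"
    return "within window"


# A260/A280 ratios reported in the Results section of this article.
REPORTED = {
    "SDS-phenol/chloroform": 2.35,
    "Genomic Tips + liquid nitrogen": 1.98,
    "DNeasy Plant": 1.44,
    "Modified DNeasy Plant": 1.67,
    "DNeasy Blood & Tissue": 1.76,
    "QIAamp DNA Stool": 1.90,
    "Optimized (Spirobranchus sp.)": 1.83,
}

if __name__ == "__main__":
    for protocol, ratio in REPORTED.items():
        print(f"{protocol:32s} {ratio:.2f}  {classify_purity(ratio)}")
```

The Genomic Tips extract (1.98) falls inside the window yet failed to sequence, while the Tissue kit extract (1.76) falls outside it yet sequenced. The ratio is therefore best read together with A260/A230, the gel, and run metrics.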
The extracted DNA then underwent library preparation with ONT's Ligation Sequencing Kit (SQK-LSK109) according to the manufacturer's guidelines. Briefly, genomic DNA (800 ng) was end-repaired and dA-tailed with NEBNext FFPE DNA Repair Mix and the NEBNext Ultra II End Repair/dA-Tailing Module (New England Biolabs) before ligation. After purification with AMPure XP magnetic beads (Beckman Coulter, California, USA), the sequencing adapters were ligated using the NEBNext Quick Ligation Module (New England Biolabs).
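Loading a fixed DNA mass (800 ng here) into the end-prep reaction requires converting the Qubit concentration into a volume and topping up with water. The sketch below does that arithmetic. The 48 µl input volume is an assumption that should be checked against the current SQK-LSK109 protocol, and `prep_input` is a hypothetical helper name.

```python
def prep_input(target_ng: float, conc_ng_per_ul: float, input_volume_ul: float = 48.0):
    """Return (DNA volume, water volume) in ul that deliver target_ng in input_volume_ul.

    input_volume_ul defaults to an assumed 48 ul; check the kit protocol you follow.
    """
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    dna_ul = target_ng / conc_ng_per_ul
    if dna_ul > input_volume_ul:
        raise ValueError(
            f"{dna_ul:.1f} ul of extract needed but only {input_volume_ul} ul fit; "
            "concentrate the DNA first"
        )
    return dna_ul, input_volume_ul - dna_ul


if __name__ == "__main__":
    dna, water = prep_input(800, conc_ng_per_ul=40)  # e.g. a 40 ng/ul Qubit reading
    print(f"{dna:.1f} ul DNA + {water:.1f} ul nuclease-free water")
```

Raising an error instead of silently truncating makes it obvious when a dilute extract cannot supply the required mass in the available volume.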
After a final clean-up with Long Fragment Buffer (LFB), the sequencing library was loaded onto a primed FLO-MIN106 flow cell on a GridION device for a 48-h run. Device control, data acquisition, and real-time basecalling were handled by the MinKNOW software. Translocation speed, pore occupancy, cumulative output, and read-length N50 were monitored continuously during the sequencing runs.
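Read-length N50, which was monitored throughout the runs and is reported in the Results, is the length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal sketch of that definition follows; `n50` is a hypothetical helper, not a MinKNOW function.

```python
def n50(read_lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    lengths = sorted(read_lengths, reverse=True)
    if not lengths:
        raise ValueError("no reads")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length  # bases contained in reads at least this long
        if running >= half:
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 10]))  # 10 + 6 = 16 >= 30 / 2, so N50 is 6
```

Because N50 weights reads by their bases, a few very long reads can raise it even when most reads are short, so it is best read alongside total output.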

| RESULTS
With the two most commonly reported approaches, the SDS-phenol/chloroform protocol and the Genomic Tips protocol, each with liquid nitrogen snap freezing as a pre-treatment step, the A260/A280 purity ratio of the retained DNA was 2.35 and 1.98, respectively (Table 1). Signs of DNA degradation/contamination were observed after agarose gel electrophoresis (Figure 1a).
When this DNA was sequenced on ONT, the translocation speed was outside the recommended range from the very beginning of the run. For both protocols, pore occupancy did not exceed 40%, and more than 98% of the reads were classified as failed. Under the default settings of ONT runs, MinKNOW basecalling passes reads through a qscore filter that considers only data quality, not minimum read length: any read with a qscore below 7 is written to the "fail" folder during basecalling. The passed reads are therefore represented by a flat green line that is almost indistinguishable from the x-axis (Figure 2a,b). DNA extractions and ONT sequencing runs were repeated more than once, and the same DNA degradation and failed ONT reads were observed each time.
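The pass/fail rule described above can be sketched as a function of per-base Phred scores. Basecallers differ in exactly how they compute the read-level qscore. Here we assume the mean is taken over per-base error probabilities and converted back to the Phred scale. `mean_qscore` and `pass_or_fail` are hypothetical helpers, not MinKNOW functions. Consistent with the text, the filter ignores read length.

```python
import math


def mean_qscore(phred_scores):
    """Read-level quality: average per-base error probabilities, then convert to Phred."""
    if not phred_scores:
        raise ValueError("empty read")
    mean_error = sum(10 ** (-q / 10) for q in phred_scores) / len(phred_scores)
    return -10 * math.log10(mean_error)


def pass_or_fail(phred_scores, min_qscore: float = 7.0) -> str:
    """Quality-only filter: reads with a mean qscore below min_qscore fail; length is ignored."""
    return "pass" if mean_qscore(phred_scores) >= min_qscore else "fail"


if __name__ == "__main__":
    print(pass_or_fail([10] * 5))  # uniform Q10 read
    print(pass_or_fail([20, 2]))   # arithmetic mean Q11, error-based mean about Q4.9
```

The second example shows why the averaging convention matters. A read with one base at Q20 and one at Q2 has an arithmetic mean of Q11 but an error-based mean of about Q4.9, so under this assumption it fails.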
To minimize bench-to-bench variability, both extracts were also subjected to automated library preparation on the VolTRAX V2 device using the VolTRAX Sequencing kit (VSK-VSK002) and the VolTRAX Configuration kit (VCK-V2002). Library preparation was unsuccessful because blockages prevented the device from completing the process. The same result was obtained when the experiment was repeated twice with diluted DNA input.
With the TissueRuptor II for homogenization and silica membrane-based kits, the A260/A280 purity ratio was 1.44, 1.67, 1.76, and 1.9 for the Plant, modified-Plant, Tissue, and Stool Qiagen extraction protocols, respectively (Table 1).
Both Plant kit-based protocols yielded higher DNA quantity but  Table 2).
Successful sequencing runs, in which most called reads passed, were obtained with the Tissue and Stool kit extracts.
The Stool kit protocol gave a higher output and a longer read-length N50 (3.91 kb) than the Tissue kit protocol (1.56 kb) (Table 2).
TABLE 1 Assessment of the different extraction protocols
For the optimized protocol (Figure 3), the tissue lysis, proteinase K incubation conditions, DNA precipitation, and elution steps were optimized. When lysis was carried out as recommended in commercial kits (2 h), a substantial smear was detected on the agarose gel. Extending the incubation to overnight gave more efficient lysis, with a clear genomic DNA band and minimal smear (Figure 1c). The Genomic Tips manufacturer's guidelines call for a 1–2 h incubation at 55°C for the final elution step; however, with our samples the elution buffer evaporated, decreasing the elution volume. Changing this step to a 1-h incubation at 37°C gave satisfactory results. DNA yield also increased when a 5-min incubation was added after the precipitating agent (isopropanol) (Figure 3). Optimal nucleic acid purity was recorded, with an A260/A280 ratio of 1.83 (Table 1), suggesting that the extract was free of proteins and polyphenolic/polysaccharide compounds. Sequencing of this extract gave a read-length N50 of 13.82 kb and a 39-fold increase in ONT data generation compared with the mean of all previously tested protocols (Table 2).
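The 39-fold figure is defined relative to the mean output of all previously tested protocols. The helper below makes that definition explicit. `fold_change_vs_mean` is a hypothetical name, and the numbers in the example are toy values chosen only to show the arithmetic, not the Table 2 outputs.

```python
from statistics import mean


def fold_change_vs_mean(new_output: float, previous_outputs) -> float:
    """Ratio of one protocol's output to the arithmetic mean of the comparison protocols."""
    baseline = mean(previous_outputs)
    if baseline <= 0:
        raise ValueError("baseline output must be positive")
    return new_output / baseline


if __name__ == "__main__":
    # Toy values in arbitrary units, not data from this study.
    print(fold_change_vs_mean(78.0, [1.0, 2.0, 3.0]))
```

Because the baseline is an arithmetic mean, a single comparison protocol with relatively high output raises it. Per-protocol ratios or a geometric mean would be alternative summaries.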
When the same protocol was applied to two other marine species with different tissue consistency, a jellyfish (Chrysaora sp.) and a shrimp (Palaemon sp.), using up to ~200 mg of tissue, the A260/A280 ratio was 1.85 and 1.89, respectively (Table 1). On sequencing, flow cell occupancy exceeded 82% (Figure 4a), and 9.95 GB of data were generated for Palaemon sp., whereas Chrysaora sp. generated 8.64 GB of data with a read-length N50 of 24.03 kb (Figure 4b) within a 24-h run (Table 2).
Notably, DNA quality (A260/A280 ratio, agarose gel) was not always indicative of amplification capability: the DNA extracts from all seven protocols were amplified successfully for two independent genetic markers (Figure 5).

| DISCUSSION
Investigating the genomes of marine organisms can provide insights that clarify phylogenetic relationships among and within genera and species, shedding light on the plasticity and adaptability of the species. However, genomic data for marine fauna remain scarce, possibly because it is difficult to extract sufficient amounts of high-quality DNA from marine species of varied tissue consistency (Ketchum et al., 2018) and, in turn, to optimize the conditions for successful sequencing runs.
For non-destructive DNA extraction methods, homogenizing the target sample in liquid nitrogen before extraction is recommended. We used this approach for the two most commonly reported extraction protocols. The first, widely reported for marine species, uses phenol and chloroform with an SDS buffer to separate the DNA from other cellular molecules and debris (Ferrara et al., 2006; Pinto et al., 2000; Sokolov, 2000).
We used this protocol to extract DNA from a worm that inhabits the Arabian Gulf off the coast of Qatar. Because Spirobranchus sp. had the smallest genome in the collection, we used it as the starting model. However, the DNA extracted with these chemicals could not be sequenced successfully on the ONT GridION platform.
The second DNA extraction protocol followed the Genomic Tips kit guidelines, as recommended by ONT experts (Jansen et al., 2017). According to the manufacturer, the kit operates by gravity flow and therefore causes less of the DNA damage that centrifugation steps can introduce. The kit also uses anion-exchange technology to purify high molecular weight DNA of up to 150 kb, with an average length of 50–100 kb. DNA extracted with this kit after liquid nitrogen homogenization also failed to sequence well on the GridION platform. This suggested that the common liquid nitrogen homogenization step may be problematic, particularly because improvements were observed when it was replaced with mechanical homogenization within the Genomic Tips kit extraction (Table 2).
To optimize laboratory methods for obtaining high-quality HMW-DNA from marine samples, we then tested silica-based approaches, which are reported to be ideal for high-throughput DNA extraction, especially from small tissue samples (Street et al., 2020). In addition, liquid nitrogen homogenization was replaced by mechanical homogenization of the target tissues in the corresponding lysis buffer.
High molecular weight genomic DNA extraction from marine invertebrates has previously been reported with cetyltrimethylammonium bromide (CTAB) and polyvinylpyrrolidone (PVP) added to the extraction buffer (Lin et al., 2016). These chemicals bind polysaccharides and polyphenolics and are commonly used in DNA extraction protocols for plants (Jobes et al., 1995). We therefore investigated the yield of the DNeasy Plant extraction kit. This kit gave a high DNA concentration but low purity, with noticeable smearing after gel electrophoresis. The low purity might be due to the missing protease digestion step, which could allow proteins to co-precipitate during elution, as indicated by the low A260/A280 ratio (Chakraborty et al., 2020). We modified the protocol by adding a proteinase K step. The purity ratio improved slightly but remained outside the range recommended for ONT.
With the Tissue and Stool protocols, DNA quantity and purity were higher than with the previously investigated protocols. Mechanical homogenization and protease digestion therefore appear to be crucial steps for obtaining higher DNA yields (Ramakrishnan et al., 2017).
The Qiagen DNeasy Blood & Tissue kit protocol uses two lysis buffers. Buffer ATL contains sodium dodecyl sulfate (SDS), a detergent that aids cell lysis by disrupting non-covalent bonds in proteins, thereby denaturing and unfolding them. Buffer AL contains the chaotropic agent guanidinium chloride, which promotes lysis, inactivates nucleases, and promotes nucleic acid binding to the silica membrane. Comparing the two kits,

FIGURE 3 Flowchart representation of the step-by-step optimized protocol
Because the purity and quality of the DNA extracted with all commonly used protocols did not meet the requirements for ONT sequencing, we used an optimized combination of these protocols. Our optimized protocol (Figure 3) adds critical steps for tissue maceration and inhibitor removal, and adjusts the proteinase K incubation, on top of standard extraction techniques (Pinto et al., 2000). The protocol also includes initial PBS washes to remove potential impurities and any excess buffer retained with the sample, and all centrifugation steps (PBS wash, debris removal, and inhibitor removal) are run at low speed (4000 × g) to minimize DNA damage. Adding these steps to the Genomic Tips kit improved DNA quality, and the DNA was subsequently sequenced successfully on the ONT platform. Giraldes), and thus the subsequent protocols, optimization steps, and the final established protocol were based on fresh tissues. However, because of sampling seasonality, other organisms could not be resampled. The optimized protocol was therefore tested on the available species with different tissue viscosity: an ethanol-preserved jellyfish specimen (Chrysaora sp.) and a frozen shrimp specimen (Palaemon sp.).
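The protocol specifies its low-speed spins as relative centrifugal force (4000 × g), but many benchtop centrifuges are set in rpm. The standard conversion is RCF = 1.118 × 10^-5 × r × rpm^2, with r the rotor radius in cm. The sketch below applies it; the 10 cm radius in the example is an assumption, so substitute the radius of your own rotor.

```python
import math

K = 1.118e-5  # (2*pi/60)**2 / g, for a rotor radius given in cm


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return K * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a target RCF at a given rotor radius."""
    if radius_cm <= 0:
        raise ValueError("radius must be positive")
    return math.sqrt(rcf / (K * radius_cm))


if __name__ == "__main__":
    rpm = rpm_from_rcf(4000, radius_cm=10.0)  # assumed 10 cm rotor radius
    print(f"4000 x g at r = 10 cm is about {rpm:.0f} rpm")
```

Setting spins by RCF rather than rpm keeps the force on the DNA the same across centrifuges with different rotor radii.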
Sequencing output was not affected by the sample preservation method: pore blockage was no longer observed during the runs, and data acquisition was high.
In conclusion, we present an extraction protocol, optimized for different marine species, that provides high-quality HMW-DNA in increased yields for whole genome sequencing by ONT. Our work highlights the marked effects of several adjustments and additions to the Genomic Tips protocol that helped to obtain pure, high molecular weight DNA. Although the extracted DNA was tested only on the ONT platform, given the improvements observed we believe that this protocol could be applied to extract HMW-DNA suitable for other molecular analyses or for sequencing on other platforms.

ACKNOWLEDGMENTS
This work was funded by Qatar Petroleum project number: QUEX-QP-BRC-GH-18/19. Open Access funding was provided by the Qatar National Library.

CONFLICT OF INTEREST
All authors declare no competing interests.

DATA AVAILABILITY STATEMENT
This manuscript is a protocol paper, and the detailed methodology is given in the article text; no additional data were generated for deposition.