Harnessing the MinION: An example of how to establish long‐read sequencing in a laboratory using challenging plant tissue from Eucalyptus pauciflora

Abstract Long‐read sequencing technologies are transforming our ability to assemble highly complex genomes. Realizing their full potential is critically reliant on extracting high‐quality, high‐molecular‐weight (HMW) DNA from the organisms of interest. This is especially the case for the portable MinION sequencer which enables all laboratories to undertake their own genome sequencing projects, due to its low entry cost and minimal spatial footprint. One challenge of the MinION is that each group has to independently establish effective protocols for using the instrument, which can be time‐consuming and costly. Here, we present a workflow and protocols that enabled us to establish MinION sequencing in our own laboratories, based on optimizing DNA extraction from a challenging plant tissue as a case study. Following the workflow illustrated, we were able to reliably and repeatedly obtain >6.5 Gb of long‐read sequencing data with a mean read length of 13 kb and an N50 of 26 kb. Our protocols are open source and can be performed in any laboratory without special equipment. We also illustrate some more elaborate workflows which can increase mean and average read lengths if this is desired. We envision that our workflow for establishing MinION sequencing, including the illustration of potential pitfalls and suggestions of how to adapt it to other tissue types, will be useful to others who plan to establish long‐read sequencing in their own laboratories.

and efficiency of genome assembly, especially for genomes that contain long low-complexity regions; detailed investigation of segmental duplications and structural variation (Jain et al., 2018); major histocompatibility complex (MHC) typing (Liu et al., 2017); and detecting methylation patterns (Simpson et al., 2017). The number of genome assemblies using nanopore data either exclusively or in combination with other sequencing data is steadily increasing, for example the 3.5 gigabase (Gb) human genome, the 860 Mb European eel genome, the 1 Gb genome of the wild tomato species Solanum pennellii and the 135 Mb genome of Arabidopsis thaliana (Jain et al., 2018;Jansen et al., 2017;Michael et al., 2018;Schmidt et al., 2017). In short, nanopore sequencing solves the technical challenges of reading long DNA fragments, while still having room for improvement in terms of per read accuracy.
The Oxford Nanopore Technologies (ONT) MinION makes long-read sequencing accessible to most laboratories outside of a dedicated genome facility. It has very low capital cost, has the potential to generate more than 1 Gb of sequence data per 100 USD, has a footprint about the size of an office stapler and runs on a standard desktop or laptop computer. The MinION uses small consumable flowcells for sequencing, which contain fluid channels that flow samples onto a sequencing matrix and provide a small amount of fluid waste storage.
This democratization of sequencing brings the challenge that every laboratory has to establish the sequencing platform and concomitantly, new DNA extraction and library preparation protocols.
One of the primary remaining challenges is to extract and purify very long DNA fragments from the organisms or tissues of interest. This is especially important for nanopore sequencing as the native DNA molecules are directly translocated through the nanopore. Any contaminants and impurities directly interfere with the optimal sequencing outcome. Acquiring some data is easy, but it can be challenging and time-consuming to obtain reliable and good yields (>5 Gb as of writing of this article 03/2018) from challenging starting material.
Here, we illustrate the workflow we applied to establish MinION sequencing in our laboratories using the tree species Eucalyptus pauciflora as a case study. It is challenging to extract high-purity and high-molecular-weight DNA from E. pauciflora because the mature leaf tissue is physically tough, and because it contains very high levels of secondary metabolites which are known to reduce the efficacy of DNA extraction protocols (Coppen, 2002;Healey, Furtado, Cooper, & Henry, 2014). We illustrate reliable and repeatable ways of measuring DNA purity to optimize output from the MinION sequencer. We discuss important considerations for DNA library preparation, and methods to control and optimize the final distribution of read lengths. We show that during DNA extraction, small alterations in sample homogenization protocols can drastically alter DNA fragment lengths; introduce a novel low-tech size selection protocol based on solid-phase reversible immobilization (SPRI) beads; and assess the impact of size selection via electrophoresis and controlled mechanical DNA shearing. Finally, we introduce an open-source MinION user group that shares DNA extraction, size-selection and library preparation protocols for many additional organisms, making our workflow applicable well beyond the case study presented here.

| Tissue collection
Eucalyptus pauciflora leaf tissue was collected from Thredbo, New South Wales (NSW), Australia. After harvesting, the young twigs were transported in plastic bags and stored in darkness at 4°C in water until DNA extraction.

| High-molecular-weight DNA extraction and clean up
We extracted high-molecular-weight DNA based on Mayjonade's DNA extraction protocol optimized for our eucalyptus samples (Mayjonade et al., 2016;Schalamun & Schwessinger, 2017b) (Supporting information Appendix S1). Each extraction was carried out with 800-1,000 mg leaf tissue which was cut into small pieces and split between 8 separate 2-ml Eppendorf tubes, each containing two metal beads of 5 mm in diameter, before freezing in liquid nitrogen.
We lysed the tissue mechanically by grinding using the Qiagen TissueLyser II for 35 s at 25 Hz. Pulverized tissue was suspended in 700 µl SDS lysis buffer (1% w/v PVP40, 1% w/v PVP10, 500 mM NaCl, 100 mM Tris-HCl pH 8.0, 50 mM EDTA, 1.25% w/v SDS, 1% w/v sodium metabisulfite, 5 mM DTT, Milli-Q water) and heated to 64°C for 30 min to inactivate DNases. One µl RNase A (10 mg/ml) (Thermo Fisher) per 1 ml lysis buffer was added to the mixture after the heat treatment, followed by incubation at 37°C for 50 min at 400 rpm on a thermomixer. Twenty minutes into the incubation, we added 10 µl Proteinase K (800 Units/ml) (NEB). To precipitate proteins, the tubes were cooled on ice for 2 min before adding 0.3 vol (210 µl) 5 M potassium acetate pH 7.5 and mixing by inverting the tube 20 times. The precipitates containing leaf material and proteins were removed by centrifugation at 8,000 g for 12 min at 4°C. We transferred the supernatants to new tubes and added 1.0 vol (V/V) of the SPRI beads solution as described below, followed by incubation on a hula mixer for 10 min. After a brief (pulse) centrifugation step in a microcentrifuge, we placed the tube in a magnetic stand so that the beads bound to the rear of the tube, allowing for removal of the supernatant. We then washed the beads twice with 1 ml 70% ethanol. We kept the tube on the magnetic stand throughout the wash procedure to avoid loss of DNA bound to the beads (the tube can be rotated 360° within the stand, allowing comprehensive washing while ensuring bead retention). After the second wash, we centrifuged the tube briefly again to remove the last traces of ethanol. The beads were dried for no longer than 30 s before elution of the DNA in 50 µl TE buffer preheated to 50°C, for 10 min.
We further purified the samples using a chloroform:isoamylalcohol extraction. The eight aqueous DNA solutions were pooled to a total of 400 µl, to which one volume of chloroform:isoamylalcohol (24:1) was added, followed by mixing by inversion for 5 min. The phases were separated by centrifugation at 5,000 g for 2 min at room temperature (RT). We transferred the upper, DNA-containing phase to a fresh tube and performed another round of the chloroform:isoamylalcohol purification. After the second extraction, the DNA was precipitated by adding 0.1 volume 3 M sodium acetate pH 5.3 and 1 volume 100% cold ethanol, followed by centrifugation at 5,000 g for 2 min at RT. The short centrifugation at low speed may reduce DNA yields but potentially precipitates longer fragments in preference to shorter fragments. The transparent pellet was washed with 70% ethanol and resuspended in 50 µl 10 mM Tris-HCl pH 8.0 for 2 hr at RT. The solubilized DNA was stored at 4°C until library preparation, for a maximum of 10 days.

| g-TUBE shearing
We processed 5 µg of pure HMW DNA through a g-TUBE (Covaris) in an Eppendorf 5418 centrifuge at 3,800 rpm (1,243 g) for a total of 2 min. This speed is slightly lower than the 4,200 rpm (1,519 g) recommended for shearing to a 20 kb fragment size.
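Because g-TUBE shearing depends on the centrifugal force rather than on the rotor speed itself, it can be useful to convert between rpm and g when a different centrifuge is used. The following is a minimal sketch of the standard conversion, RCF = 1.118 × 10⁻⁵ × r (cm) × rpm². The rotor radius of 7.7 cm is an assumption back-calculated from the rpm and g pairs quoted above; it is not taken from a rotor datasheet and should be replaced by the value for the rotor in use.

```python
import math

# Assumed rotor radius in cm, back-calculated from the rpm/g pairs in the text
# (3,800 rpm ~ 1,243 g; 4,200 rpm ~ 1,519 g). Check the rotor datasheet.
ASSUMED_RADIUS_CM = 7.7


def rcf_from_rpm(rpm: float, radius_cm: float = ASSUMED_RADIUS_CM) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float = ASSUMED_RADIUS_CM) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


if __name__ == "__main__":
    for rpm in (3800, 4200):
        print(f"{rpm} rpm -> {rcf_from_rpm(rpm):.0f} x g")
```

With a different rotor, `rpm_from_rcf(1243, radius_cm=...)` gives the speed that reproduces the force used here.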

| BluePippin size selection
We used the BluePippin system (Sage Science) with 0.75% dye-free agarose cassettes and marker S1, selecting for fragments >20 kb. We loaded 6 µg of sample per lane rather than the recommended 5 µg because, as suggested by the sales representative, we expected a higher recovery from a slightly higher input amount.

| Removal of small DNA fragments <1.5 kb with optimized SPRI beads
In order to purify our DNA samples and remove small fragments, we optimized a SPRI beads solution which we used for clean-ups and library preparations. The improved beads solution consists of 11% PEG 8000, 1.6 M NaCl, 10 mM Tris-HCl pH 9.0, 1 mM EDTA, 0.4% Sera-Mag SpeedBeads (GE Healthcare PN 65152105050250) (Schalamun & Schwessinger, 2017a) (Supporting information Appendix S2). For the clean-up procedure, 0.8 vol of this beads solution was mixed with the DNA sample and incubated on a hula mixer for 10 min. After a brief (pulse) centrifugation step in a microcentrifuge, we placed the tube in a magnetic stand so that the beads bound to the rear of the tube, allowing for removal of the supernatant. We then washed the beads twice with 1 ml 70% ethanol for all steps except after the last adapter ligation step. For the last post-adapter ligation step, SPRI beads were washed with ONT's recommended ABB solution instead of ethanol. We kept the tube on the magnetic stand throughout the wash procedure to avoid loss of DNA bound to the beads (the tube can be rotated 360° within the stand, allowing comprehensive washing while ensuring bead retention). After the second wash, we centrifuged the tube briefly again to remove the last traces of ethanol. The beads were dried for no longer than 30 s before elution of the DNA in 50 µl Tris-HCl pH 8.0 preheated to 50°C, for 10 min.
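The effective PEG and NaCl concentrations in the binding reaction depend on how much beads solution is added per volume of sample. As a rough illustration, the sketch below computes the final concentrations after mixing, under the simplifying assumption that the DNA sample itself contributes no PEG or NaCl (this will not hold for crude lysates that already contain salt). The function name is ours and purely illustrative.

```python
def final_concentration(stock_conc: float, bead_vol_ratio: float) -> float:
    """Concentration of a bead-solution component after mixing.

    stock_conc:     concentration in the beads solution (e.g. 11 for 11% PEG)
    bead_vol_ratio: volumes of beads solution added per 1 volume of sample
    Assumes the sample contributes none of this component.
    """
    return stock_conc * bead_vol_ratio / (1.0 + bead_vol_ratio)


if __name__ == "__main__":
    # 0.8 vol as used for clean-up, 1.0 vol as used during extraction
    for ratio in (0.8, 1.0):
        peg = final_concentration(11.0, ratio)   # % PEG 8000
        nacl = final_concentration(1.6, ratio)   # M NaCl
        print(f"{ratio} vol: {peg:.2f}% PEG, {nacl:.2f} M NaCl")
```

Under these assumptions, lowering the bead ratio lowers the final PEG and NaCl concentrations, which is the lever that SPRI-based removal of short fragments relies on.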

| Library preparation and sequencing
We followed two versions of the 1D SQK-LSK108 ligation protocol; mostly, we used the SQK-LSK108 selecting for long reads (SQK-LSK108long) and in some cases the regular SQK-LSK108 protocol (Supporting information Before loading, the flowcell was primed with a solution consisting of 480 µl RBF and 520 µl NFW, and first 800 µl of this solution was added into the sample port with a closed SpotON port, incubating

| DNA sample purity
The first goal of our project was to optimize extraction protocols to yield highly intact and high-purity DNA suitable for long-read sequencing. High purity of DNA is defined by Nanodrop spectrophotometer (Thermo Fisher) absorbance of DNA with a 260/280 nm ratio between 1.8 and 2.0 and a 260/230 nm ratio between 2.0 and 2.2 when all absorbance at 260 nm is due to double-stranded (ds) DNA (Desjardins & Conklin, 2010;Mackey & Chomczynski, 1997). Therefore, it is important that the ratio of DNA concentrations measured on the Qubit and Nanodrop instruments, respectively, should be at least 1:1.5 and optimally 1:1. A 1:1 ratio indicates that most DNA molecules are double-stranded and that no other molecules (e.g., RNA) are present that absorb at 260 nm. For example, Qubit: 100 ng/µl and Nanodrop: 150 ng/µl gives an acceptable ratio of 1:1.5 (O'Neill, McPartlin, Arthure, Riedel, & McMillan, 2011).
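These criteria can be checked for each DNA preparation before committing a flowcell. The following is a minimal sketch of such a check that uses the thresholds stated above as defaults; the function name and the report layout are illustrative assumptions and are not part of our protocols.

```python
def purity_report(a260_280, a260_230, qubit_ng_ul, nanodrop_ng_ul):
    """Compare one DNA preparation against the purity criteria in the text.

    Returns the Qubit/Nanodrop concentration ratio and a dict of pass/fail
    flags. Thresholds: 260/280 in 1.8-2.0, 260/230 in 2.0-2.2, and a
    Qubit:Nanodrop ratio of at least 1:1.5.
    """
    ratio = qubit_ng_ul / nanodrop_ng_ul
    checks = {
        "260/280 between 1.8 and 2.0": 1.8 <= a260_280 <= 2.0,
        "260/230 between 2.0 and 2.2": 2.0 <= a260_230 <= 2.2,
        # Written as a product to avoid comparing rounded fractions.
        "Qubit:Nanodrop at least 1:1.5": nanodrop_ng_ul <= 1.5 * qubit_ng_ul,
    }
    return ratio, checks


if __name__ == "__main__":
    # Concentrations from the example in the text (100 vs. 150 ng/µl);
    # the absorbance ratios are placeholder values for illustration only.
    ratio, checks = purity_report(1.8, 2.0, 100, 150)
    print(f"Qubit/Nanodrop = {ratio:.2f}")
    for name, passed in checks.items():
        print(f"{name}: {'pass' if passed else 'fail'}")
```

A preparation that passes all three checks can still sequence poorly, so this is best treated as a first filter rather than a guarantee.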
In our workflow, we first aimed to recover high-molecular-weight DNA with a Nanodrop/Qubit concentration ratio that was close to 1:1. We then optimized DNA purity based on 260/280 nm ratios, which are indicative of protein contamination, and 260/230 nm ratios, which are indicative of contamination by salts, phenol and carbohydrates (O'Neill et al., 2011). To achieve this, we first tested a well-established hexadecyltrimethylammonium bromide (CTAB) extraction protocol to extract DNA from E. pauciflora leaves collected in June 2017 from adult trees in the Kosciuszko National Park near Thredbo, New South Wales, Australia (Doyle & Doyle, 1987, 1990;Healey et al., 2014;Schwessinger & Rathjen, 2017). While the CTAB protocol returned good yields of double-stranded DNA (~5 µg DNA per g tissue), the Qubit/Nanodrop ratio of 0.05 indicated significant contamination with RNA or single-stranded DNA.
Nanodrop absorption spectra from 220 to 350 nm (Figure 1a) deviated drastically from pure DNA absorption curves, revealing the presence of contaminants (Figure 1d). In such cases, it is often recommended to clean the DNA using SPRI paramagnetic beads in combination with a polyethylene glycol (PEG) and sodium chloride (NaCl) mixture, such as AMPure XP beads (Beckman Coulter). These beads bind to the DNA, but most contaminants do not and can be washed away (Krinitsina, Sizova, Zaika, Speranskaya, & Sukhorukov, 2015;Mayjonade et al., 2016). We were able to improve sample quality slightly in terms of Qubit to Nanodrop concentration ratios by adding the standard measure of 0.45 vol (V/V) AMPure XP beads (Figure 1b), but repeating this step did not increase the purity of the DNA further as measured by 260/230 nm and 260/280 nm ratios.
FIGURE 1 Illustration of different purity DNA preparations. Nanodrop readings of different DNA preparations. (a) DNA extraction with CTAB lysis buffer followed by phenol:chloroform:isoamylalcohol extraction (Schwessinger & Rathjen, 2017). (b) Sample A after SPRI beads clean-up. (c) DNA extraction using SDS lysis buffer and SPRI beads purification (Mayjonade et al., 2016). (d) Sample C followed by an additional chloroform:isoamylalcohol purification step. The curves are representations of 260/280 and 260/230 quality control numbers which can be found in Supporting information Table S1.
Next, we tested an extraction method that employs the detergent sodium dodecyl sulphate (SDS) and includes a PEG-NaCl precipitation step to capture the DNA onto SPRI beads. This approach has been reported to work well with many species including sunflower, human, and Escherichia coli (Mayjonade et al., 2016). Using this approach, we recovered high levels of double-stranded DNA (Qubit/Nanodrop = 1:1.5), but the Nanodrop absorption curves still indicated the presence of contaminants in the final DNA extract (Figure 1c). Again, we were unable to improve the DNA purity by repeated SPRI clean-up steps, obtaining a maximum of 1.5 for the 260/280 nm and a maximum of 1.0 for the 260/230 nm ratios. As an alternative method, we cleaned the crude DNA obtained from the SDS-based method using a chloroform:isoamylalcohol extraction followed by isopropanol or ethanol precipitation of the DNA, as described for some fungal DNA samples (Dong, 2017). This consistently resulted in high-purity DNA with Qubit/Nanodrop ratios of 1:1-1.5, 260/280 nm ratios of ~1.8, 260/230 nm ratios of ~2.0 and excellent Nanodrop absorbance curves (Figure 1d).
ONT 1D (1D because only one DNA strand is sequenced) library preparations involve the ligation of sequencing adapters at both 3' ends of end-repaired double-stranded DNA. Sequencing adapters carry a motor protein that guides the DNA to the pore and regulates the translocation speed of the DNA across the pore.
In addition, they carry a characteristic DNA sequence which is used by basecallers to recognize the translocation start of a new DNA molecule (Jain et al., 2016;Leggett & Clark, 2017). We tested the effect of sample impurities on MinION output using the 1D ligation protocol. Our three samples differed primarily in their 260/230 nm ratios. One suboptimal sample (sample 5, Table 1), for which no chloroform:isoamylalcohol clean-up step was performed, had a low ratio of 1.0. The other two samples (samples 10 and 27, Table 1) had close-to-optimal ratios of 2.1 and 2.3, respectively. The sample with the low 260/230 nm ratio yielded an order of magnitude less sequence data from a single flowcell compared to the other two samples (0.7 Gb vs. ~7 Gb, respectively, Table 1, Supporting information Table S1). It seems likely that the contaminants causing the reduced 260/230 nm ratio inhibited the library preparation or the sequencing itself.

| Sequencing library preparation
The manufacturer-recommended kit for library preparation, which was LSK108 at the time of the experiments, involves DNA repair and end-prep and is optimized for 0.2 pmol of input DNA with an average fragment size of 8 kb, which in turn requires 1 µg of double-stranded DNA. This implies that the DNA input as expressed in mass needs to be adjusted according to the concentration of free DNA ends available for adapter ligation, which is a function of fragment length (Mayjonade, 2018;Schwessinger, 2018). The molarity of the DNA sample can be calculated using the Promega BioMath calculator (https://www.promega.com/a/apps/biomath/) which requires the average fragment length to calculate the respective DNA mass for 0.2 pmol. For example, 0.2 pmol of DNA of mean length 24 kb requires a DNA input of 3 µg. In our case, we estimated a mean DNA fragment length of ~30 kb based on the slight low-molecular-weight smear observed during 0.8% agarose gel electrophoresis (Figure 2) and the strongest staining between 24 and 97 kb during pulsed-field gel electrophoresis (PFGE), which provides higher resolution in the high-molecular-weight range (Figure 3). When estimating mean DNA fragment length based on fluorescent intensity (e.g., after staining with SYBR red or ethidium bromide), it is important to consider that smaller DNA molecules incorporate less dye and so appear fainter during imaging. For example, even faint DNA smears below 10 kb can indicate the significant presence of short DNA fragments that are best avoided if long read lengths are a primary goal of the sequencing effort (see below). Failure to account for this can easily lead to overestimation of mean DNA fragment length, and miscalculation of the true molar concentration of DNA fragments.
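The relation between DNA mass, molar amount and mean fragment length can also be computed directly. The sketch below assumes an average molar mass of roughly 650 g/mol per base pair of double-stranded DNA; this is a commonly used approximation, and online calculators use slightly different constants, so results differ by a few per cent. The function names are illustrative.

```python
# Assumed average molar mass of one base pair of dsDNA (g/mol); an
# approximation, so expect small differences from online calculators.
BP_MASS_G_PER_MOL = 650.0


def dsdna_mass_ug(pmol: float, mean_length_bp: float,
                  bp_mass: float = BP_MASS_G_PER_MOL) -> float:
    """Mass (µg) of dsDNA corresponding to `pmol` molecules of mean length."""
    grams = pmol * 1e-12 * mean_length_bp * bp_mass
    return grams * 1e6


def mean_length_bp_from_mass(mass_ug: float, pmol: float,
                             bp_mass: float = BP_MASS_G_PER_MOL) -> float:
    """Mean fragment length (bp) implied by a mass and a molar amount."""
    return (mass_ug * 1e-6) / (pmol * 1e-12 * bp_mass)


if __name__ == "__main__":
    for length in (8_000, 24_000, 30_000):
        print(f"0.2 pmol at {length} bp: {dsdna_mass_ug(0.2, length):.2f} µg")
```

Under this assumption, 0.2 pmol corresponds to about 1 µg at 8 kb and about 3 µg at 24 kb, in line with the values given above. The inverse function is useful when an empirically determined optimal input mass is used to back-calculate the effective mean fragment length of a preparation.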
As a starting point, we defined the optimal DNA input based on our initial mean fragment length estimate of 30 kb. This was followed by empirical adjustments from plotting sequencing outputs versus the DNA input into adapter ligation (Figure 4). This approach revealed an optimum of ~2 µg dsDNA (Figure 4), which required an input of 2.9 µg DNA for the DNA preparation stage considering typical losses of 30% after clean-up using in-house SPRI beads (see below). Neither decreasing nor increasing the DNA input improved the sequencing output, due to too few adapter-DNA molecules, or too many free DNA molecules potentially interfering with the sequencing reaction. Assuming that 2.9 µg input DNA was the equivalent of 0.2 pmol (recommended concentration as per the ONT protocol LSK108 that was used), we estimate a mean DNA fragment length of 23 kb for our sample preparation. This suggests we initially
Table 2. #3 (sample 9) DNA accidentally sheared during the extraction procedure with mean read length of 5 kb as shown in Table 2
FIGURE 3 Purposeful mechanical shearing and high-pass filtering alter DNA fragment length distribution. Pulsed-field gel electrophoresis of differently treated DNA samples. Lane #1 and #5 (L) MidRange II PFG marker (BioLabs). Lane #2 (sample 10) DNA extracted following the default HMW DNA extraction protocol (mean read length of 13 kb as shown in Table 4). Lane #3 (sample 2) same DNA extraction as in #2 followed by size selection with the BluePippin using 20 kb high-pass filtering (a mean read length of 26 kb as shown in Table 4). Lane #4 (sample 4) same DNA extraction as in #2 followed by mechanical shearing with a g-TUBE (a mean read length of 11.8 kb as shown in Table 3)
(Figure 2) belied a drastic shift in read-length distributions; the mean read length dropped from ~13 kb to 4.9 kb and the median from ~7 to 2.5 kb (
Note. Read-length comparison for samples sheared during the extraction process. Comparison of N50 Q7, mean read length Q7 and median read length Q7 between untreated samples (#10 and #27) and the DNA sample sheared during DNA extraction as shown in Figures 2 and 3 (#9).
DNA shearing did not increase yield, but did affect the read-length distribution (
sequence read-length distribution from sheared reads shifted to smaller values and peaked at about 11 kb (Figure 6), with an N50 Q7 of 18 kb, compared to an N50 Q7 of ~26 kb from the unsheared samples (Table 3). Here, a quality score of 7 (Q7) represents the default quality threshold from the basecaller. Interestingly, the median read length from the sheared DNA samples increased to 7.5 kb from 6.5 kb when compared to unsheared DNA. At the same time, low-quality short reads were reduced in the sheared samples (Figure 6).
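For readers unfamiliar with these summary statistics, the N50 is the read length at which reads of that length or longer contain at least half of all sequenced bases. The sketch below shows one way to compute the N50, mean and median after applying a quality threshold. It takes plain (length, Q score) pairs as input; how these are obtained from a basecaller summary file is left open, and the example values are made up purely to illustrate the calculation.

```python
from statistics import mean, median


def n50(lengths):
    """Length L such that reads >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no reads supplied")


def read_length_summary(reads, min_q=7.0):
    """Summarize reads given as (length_bp, mean_q_score) pairs.

    Only reads with a Q score >= min_q are kept (Q7 mirrors the threshold
    used in the text).
    """
    kept = [length for length, q in reads if q >= min_q]
    return {
        "n_reads": len(kept),
        "mean": mean(kept),
        "median": median(kept),
        "n50": n50(kept),
    }


if __name__ == "__main__":
    # Hypothetical toy reads, not data from this study.
    toy_reads = [(10, 8.0), (20, 9.0), (30, 10.0), (40, 6.5), (5, 3.0)]
    print(read_length_summary(toy_reads))
```

Because the N50 is weighted by bases rather than by read count, it can change very little while the mean and median shift substantially, which is why all three values are reported together.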
We also tested the effect of removing DNA fragments below 20 kb by size selection using the BluePippin system in the high-pass mode, which enables the collection of DNA molecules above a certain size. When we applied the 20-kb high-pass filter, we were able to remove DNA fragments less than 20 kb while maintaining the high-molecular-weight size distribution (Figure 3). After sequencing, the read-length N50 Q7 increased to 35 kb from 26 kb, while the mean and median read lengths increased to 26 and 23 kb from 12 and 6.5 kb, respectively (Table 4 and Figure 3). The main drawbacks of BluePippin high-pass size selection were the high sample loss (65%-75%), the increase in cost and prolonged sample handling.
Note. Read-length comparisons for unsheared and sheared samples. Comparison of N50 Q7, mean read length Q7 and median read length Q7 of untreated samples (#10 and #27) and sheared (Covaris g-TUBE) samples (#4 and #23) (Figure 3).

| Real time and between run evaluation
TABLE 3 Targeted mechanical DNA shearing does not increase sequencing throughput
stop run). Initially, we only stopped sequencing runs with pore occupancy below 30%. As we improved our sample handling and library preparation, we only continued runs with pore occupancy greater than 70% within 1 hr of starting the run. Otherwise, we stopped the sequencing run, washed the flowcell and loaded a new library to ensure high throughput (Supporting information Figure S1). We reasoned that low-throughput runs were usually due to insufficient DNA molecules being ligated to sequencing adapters during the library preparation. We found that, given our DNA fragment length distribution, we had to load approximately 0.75 µg library DNA (~0.07 pmol) onto a flowcell to achieve optimal yields (Supporting information Table S1, Figure S2). To ensure sufficient adapter-ligated DNA, we started library preparation with at least 4 µg of high-quality starting DNA to account for potential losses during the SPRI bead clean-up steps. A second pore statistic to consider is the number of unavailable pores, for example, "zero" (black), "unavailable" (light blue), or "active feedback" (pink) (Mayjonade, 2018). If these numbers increase too quickly in the first few hours of the run, it is likely that the library is contaminated and the pores are being irreversibly blocked or damaged, or that the membrane has ruptured. Furthermore, the length distribution from the length graph can be easily assessed and, if unsatisfactory, the library exchanged for a separately prepared sample (Figure S1).
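The stop-or-continue rule described above can be written down as a small decision function, which may help when several people run flowcells and decisions need to be consistent. This is only a sketch of our rule of thumb: the function name and return strings are illustrative, the pore occupancy has to be read manually from the MinKNOW interface, and the 70% and 1 hr defaults should be adapted to each laboratory's experience.

```python
def run_decision(pore_occupancy_pct: float, hours_since_start: float,
                 min_occupancy_pct: float = 70.0,
                 decision_window_h: float = 1.0) -> str:
    """Suggest an action for a running flowcell based on pore occupancy.

    Continue if occupancy exceeds the threshold; keep watching while still
    inside the decision window; otherwise stop, wash and reload.
    """
    if pore_occupancy_pct > min_occupancy_pct:
        return "continue run"
    if hours_since_start < decision_window_h:
        return "keep monitoring"
    return "stop run, wash flowcell, load new library"


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    for occupancy, hours in [(85, 0.5), (40, 0.5), (40, 1.0)]:
        print(occupancy, hours, "->", run_decision(occupancy, hours))
```

Setting `min_occupancy_pct=30` reproduces the more permissive rule we used when we first started.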
One key to ongoing optimization of flowcells in our laboratories was the tracking of all parameters for each sequencing run using our monitoring spreadsheet (Table S2)
obtain more than 6 Gb of data from each flowcell, with mean read lengths consistently above 12 kb.
Note. Read-length comparisons for BluePippin size-selected samples. Comparison of N50 Q7, mean read length Q7 and median read length Q7 of untreated samples (10) and (27) and BluePippin size-selected samples (2) (Figure 3).
FIGURE 7 MinION nanopore sequencing workflow to optimize sequencing output. A short overview of important steps to consider when getting started, from preparation of sample to quality control of sequence output. Each box represents an essential step in this workflow. The workflow starts with a sample-optimized DNA extraction that achieves high yields of HMW DNA, followed by a quality control step using Nanodrop and Qubit values and agarose gels. A sequencing library is prepared only from samples that pass these QC requirements, with a minimum input amount of ~3 µg of ~30 kb DNA for the LSK108 (ONT) library protocol selecting for long reads. Once sufficient (~1 µg) prepared library has been loaded onto the flowcell, the sequencing run can be interpreted using the MinKNOW graphical user interface (Supporting information Figure S1). The sequence output is basecalled either in real time or after sequencing (as for this project) into fastq files. Using the "sequencing_summary.txt" file from the Albacore or Guppy basecaller, quality control can be performed using the minion QC script (Lanfear et al., 2018).

| DISCUSSION
Here, we present a complete workflow to establish MinION long-read sequencing (Figure 7) in any laboratory using the recalcitrant plant species eucalyptus as a test case.

| Recommendations for obtaining high-quality high-molecular-weight DNA
The key starting material to every successful nanopore run is clean input DNA into the library preparation. DNA purity can be measured
We (see above) and many others have reported that NaCl/PEG-SPRI bead solutions are not always ideally suited to clean up DNA as contaminants simply coprecipitate. Following a similar logic of preferential precipitation, we hypothesize that it is possible to first precipitate contaminants onto SPRI beads at low NaCl/PEG concentrations at which HMW DNA stays in solution. Contaminants with higher affinity to SPRI beads and lower solubility than DNA can thereby be removed from the solution. In a subsequent step, DNA can be precipitated out of the remaining supernatant by adding more of the initial NaCl/PEG-SPRI beads solution. This will increase the NaCl/PEG concentration and thereby precipitate the DNA out of solution onto the newly added SPRI beads (Nagar & Schwessinger, 2018b).
It is important to mention that we have had DNA preparations that fulfilled all our recommended quality control criteria but did not sequence well on the MinION. This was likely caused by "invisible" contaminants that did not absorb at the tested wavelengths (200-340 nm). However, applying a combination of the approaches suggested above enabled us to overcome this problem with our latest protocol (Nagar & Schwessinger, 2018b).
Figure 4. This approach can help to quickly optimize the amount of input DNA added to the ligation step. Most genome sequencing projects benefit from optimizing mean, median and N50 read length. Here, we tested the impact of DNA shearing using g-TUBEs and size selection via BluePippin on read-length distribution and sequencing output (Figures 6 and 8). Overall, we did not employ DNA shearing or size selection in our final sequencing protocol even though they reduced the variance in read-length distributions (Figures 6 and 8). In our case, the high-quality sequencing results achieved with our standard protocol using the improved SPRI beads mixture did not warrant the additional time and financial investment required when incorporating g-TUBE DNA shearing or BluePippin size selection into our workflow. However, other projects may well benefit from maximizing read length via BluePippin size selection or from ultra-long-read sequencing protocols using the transposase-based DNA library kit RAD004 (Jain et al., 2018;Quick, 2018).
One intriguing observation we made was that shearing DNA reduced the abundance of low quality "reads" (Figure 6). It is possible that removing long DNA fragments (>50 kb) reduces the incidence of long DNA molecules being stuck in the pore, at least when using the R9.5 pore chemistry. This hypothesis is supported by the observation that sheared DNA samples (4, 9, 23) have a lower tail of short low quality reads when compared to unsheared (10, 27) or BluePippin size-selected samples (2), as shown in Figure 6 when comparing the density plots of "All reads" versus "Q ≥ 7." This highlights that filtering reads based on their Q-scores, as well as removing short sequencing reads, may help to avoid error propagation during downstream analyses of the data.
Lastly, user experience (r = 0.67) and number of available pores on the flowcell (r = 0.62) are the other two tracked variables (Supporting information Tables S1 and S2) that are linearly related to sequencing yields as revealed by posterior linear regression analysis (Supporting information Figure S3). Hence, experience and high-quality flowcells in combination with high-quality DNA will generate the best sequencing results.

| Recommendations for genome sequencing projects
In total, the combined data from all flowcells described here (Supporting information Table S1) comprised 107 gigabases of sequence, in 12.6 million reads with an average length of 8,513 bases, a median length of 2,956 bases, and an N50 of 24,021 bases. Approximately 4 million of these "reads" were extremely short and very low quality such that the reads with an Albacore Q score ≥ 7 comprised 103 gigabases of sequence in 8.7 million reads, with an average length of 11,959 bases, a median length of 6,054 bases, and an N50 of 24,513 bases. If we assume that the genome size of E. pauciflora is approximately 500 Mb, approximately in line with the conserved genome sizes of other eucalypts, this represents around 200x coverage of the genome (Grattapaglia et al., 2012). The length distribution of the reads is such that we generated 61 gigabases of reads (or 122x coverage) longer than 20 kb, and 15 Gb of reads (or ~30x coverage) longer than 50 kb. These read distributions are expected to be more than sufficient to assemble a high-quality reference genome, particularly if they are combined with high-accuracy short read data to polish minor errors (Jiao & Schneeberger, 2017;Michael et al., 2018;Schmidt et al., 2017).
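The coverage figures above follow from dividing the number of sequenced bases by the assumed genome size. As a minimal sketch, using the 500 Mb genome size assumed in the text (an estimate, not a measured value):

```python
ASSUMED_GENOME_SIZE_BP = 500e6  # genome size assumed in the text (~500 Mb)


def coverage(total_bases: float,
             genome_size_bp: float = ASSUMED_GENOME_SIZE_BP) -> float:
    """Average fold-coverage given total sequenced bases and genome size."""
    return total_bases / genome_size_bp


if __name__ == "__main__":
    for label, bases in [("Q >= 7 reads", 103e9),
                         ("reads > 20 kb", 61e9),
                         ("reads > 50 kb", 15e9)]:
        print(f"{label}: {coverage(bases):.0f}x")
```

The same function can be used to plan how many flowcells are needed for a target coverage once a realistic per-flowcell yield is known.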

| CONCLUSION
The field of nanopore sequencing is extremely fast moving.
Updates on sequencing library kits and the MinKNOW software made some of the specific recommendations of the initial version of this manuscript less applicable. At the same time, many of the general pointers and much of the advice will be useful to new laboratories starting out with MinION sequencing independently of updates in sequencing chemistry and software. Overall, we highlight the importance of clean high-molecular-weight DNA for successful sequencing runs and provide detailed wet laboratory DNA extraction and purification protocols that include size selection. Once established under regular laboratory conditions, some of these protocols may also be adaptable to sequencing in the field by reducing their complexity. All of these protocols, and many others applicable to different starting material provided by other community members, are freely available on the open-access protocol sharing repository protocols.io in the form of a MinION user group (https://www.protocols.io/groups/awesome-DNA-from-all-kingdoms-of-life) (Schwessinger, 2016). We encourage others to contribute to this open science platform to accelerate research and for the community to save costs when establishing long-read DNA sequencing in their own laboratories. High-quality "living" protocols with careful run and run-to-run evaluations as described here (see Supporting information Table S2 and R script on https://github.com/gringer/minionuser-group for inspiration) will facilitate knowledge generation instead of constant "reinvention of the wheel" (Lanfear et al., 2018).