A low‐cost pipeline for soil microbiome profiling

Abstract Common bottlenecks in environmental and crop microbiome studies are the consumable and personnel costs necessary for genomic DNA extraction and sequencing library construction. This is harder for challenging environmental samples such as soil, which is rich in Polymerase Chain Reaction (PCR) inhibitors. To address this, we have established a low‐cost genomic DNA extraction method for soil samples. We also present an Illumina‐compatible 16S and ITS rRNA gene amplicon library preparation workflow that uses common laboratory equipment. We evaluated the performance of our genomic DNA extraction method against two leading commercial soil genomic DNA kits (MoBio PowerSoil® and MP Biomedicals™ FastDNA™ SPIN) and a recently published non‐commercial extraction method by Zou et al. (PLoS Biology, 15, e2003916, 2017). Our benchmarking experiment used four different soil types (coniferous, broad‐leafed, and mixed forest plus a standardized cereal crop compost mix) assessing the quality and quantity of the extracted genomic DNA by analyzing sequence variants of 16S V4 and ITS rRNA amplicons. We found that our genomic DNA extraction method compares well to both commercially available genomic DNA extraction kits in DNA quality and quantity. The MoBio PowerSoil® kit, which relies on silica column‐based DNA extraction with extensive washing, delivered the cleanest genomic DNA, for example, best A260:A280 and A260:A230 absorbance ratios. The MP Biomedicals™ FastDNA™ SPIN kit, which uses a large amount of binding material, yielded the most genomic DNA. Our method fits between the two commercial kits, producing both good yields and clean genomic DNA with fragment sizes of approximately 10 kb. Comparative analysis of detected amplicon sequence variants shows that our method correlates well with the two commercial kits. 
Here, we present a low‐cost genomic DNA extraction method for soil samples that can be coupled to an Illumina‐compatible simple two‐step amplicon library construction workflow for 16S V4 and ITS marker genes. Our method delivers high‐quality genomic DNA at a fraction of the cost of commercial kits and enables cost‐effective, large‐scale amplicon sequencing projects. Notably, our extracted gDNA molecules are long enough to be suitable for downstream techniques such as full gene sequencing or even metagenomics shotgun approaches using long reads (PacBio or Nanopore), 10x Genomics linked reads, and Dovetail genomics.


| INTRODUCTION
In the last decade, microbiome studies have rapidly increased in popularity, from 4505 publications by December 2010 to 66,250 publications by February 2020 (PubMed reports for search term "microbiome"). Next-generation sequencing (NGS) has made microbiome studies more accessible to a wider audience of researchers through increases in throughput and falling costs (Schwarze et al., 2020). Common remaining bottlenecks for larger-scale environmental microbiome studies are the price and the hands-on time required for NGS-quality genomic DNA (gDNA) extraction and NGS library preparation. Studies sampling inhibitor-rich materials such as soil (Bahram et al., 2018; Walters et al., 2018) are further restricted to the use of specialist commercial kits (costing up to $10.26 per extraction; Marotz et al., 2017). This cost, coupled with the many handling steps, can limit studies to smaller sample numbers. Custom gDNA extraction workflows have been described, but many current methods have low extraction yield and throughput, and are often neither tested for NGS or microbiome purposes nor optimized for soil (Abdel-Latif & Osman, 2017; Zou et al., 2017). Commonly used protocols for nucleic acid purification are often column and centrifuge based, which makes them more laborious, harder to automate, and therefore not easily used in a high-throughput manner or scaled economically (Hamedi et al., 2016; Miao et al., 2014; Narayan et al., 2016; Oberacker et al., 2019).
More recent open-source protocols for rapid DNA purification using coated magnetic particles (Oberacker et al., 2019) or even cellulose (Zou et al., 2017) address many of these shortcomings. Carboxylated or silica-coated magnetic particles as described by Oberacker et al. (2019) are commonly used in most NGS laboratories and can be readily automated, thus driving down costs effectively (Fisher et al., 2011). Researchers face a similar bottleneck with Illumina amplicon library construction for microbiome typing (e.g., using 16S, ITS or 18S markers) as for nucleic acid extraction. Commercial kits are limited to a small number of barcoded libraries (Minich, Humphrey, et al., 2018), while specialist workflows (e.g., the Earth Microbiome Project benchmarked protocols) use custom sequencing primers and therefore cannot be processed using standard Illumina protocols. This limits the choice of available sequencing providers and affects throughput and sequencing prices (Walters et al., 2016), restricting many projects to lower-throughput in-house platforms such as the Illumina MiSeq. For comparison, the Illumina MiSeq v2 500-cycle kit has an 8.5 Gb maximal output whereas the NovaSeq 6000 SP 500-cycle kit has a 400 Gb maximal output (Bahram et al., 2018; Bartoli et al., 2008; Thiergart et al., 2020).
In scenarios with a high number of low biomass and inhibitor-rich samples such as rhizosphere and soil (Lakay et al., 2007; Zhou et al., 1996), it is therefore often too costly to perform large-scale amplicon sequencing projects with a sufficient number of samples necessary for robust statistical analysis (Bahram et al., 2018; Kelly et al., 2015). To address this, we implemented a soil DNA extraction workflow (hereafter referred to as the SDE method) by combining aluminum sulfate-based humic acid removal with a magnetic bead gDNA cleanup (Figure 1) and a two-step PCR protocol creating Illumina sequencing-ready libraries. We show equal or better gDNA extraction performance from various soil types in comparison with two commercial kits and a recently published non-commercial extraction method by Zou et al. (2017; Figure A1). Further, we achieve this at a fraction of the cost per extraction ( reads per sample). The combination of our two new protocols therefore allows better utilization of state-of-the-art Illumina sequencing platforms to perform large-scale amplicon-based microbiome studies at a reduced cost.

| Soil material collection
We selected three different woodlands, as this captured different soil characteristics (Augusto et al., 2002; Zhou et al., 1996): a coniferous forest (52.661750, 1.095444, UK), a mixed forest (46.394474, 11.235371, Italy), and a broad-leafed forest (46.454682, 11.301284, Italy). We sampled the soil material from the topsoil after removing the litter layer into sterile 50 ml conical tubes (Supplier Starlab (UK) Ltd, E1450-0800) using nitrile gloves and a sterilized shovel. The sampled material was stored in a mobile refrigerator during transportation to the laboratory where the soil was stored at 4°C until used for gDNA extraction. The cereal compost mix was collected in the same manner but obtained from the John Innes Horticulture facilities (Norwich, UK).

KEYWORDS
amplicon library construction, DNA cleanup, gDNA extraction, microbiome, soil

| MoBio PowerSoil® DNA Isolation Kit
The kit was applied following the manufacturer's instructions with one alteration. The soil material was ground using the Geno/Grinder (SPEX SamplePrep 2010) for 1 min at 1750 rpm using the supplied grinding stones. The active hands-on time without incubation and centrifugation times is 16 min per extraction (S1, https://doi.org/10.5281/zenodo.4060156).

FIGURE 1
Overview of the SDE soil gDNA extraction protocol: The genomic DNA extraction protocol is divided into two parts, the gDNA extraction and the gDNA cleanup. The soil material is ground with buffers one and two; following centrifugation, the proteins are removed with buffer three and incubation on ice. After centrifugation, humic acid is removed using buffer four and incubation on ice. The extracted genomic DNA is then ready for cleanup after centrifugation. The gDNA cleanup is performed using magnetic bead-based isopropanol precipitation of the gDNA, plus two bead washing steps with 80% ethanol. The gDNA may then be eluted into a buffer of choice (in our case, TE). gDNA, genomic DNA; SDE, soil gDNA extraction method

| Paperdisk method
The paperdisk extraction used is based on the protocol by Zou et al. (2017). We followed the protocol as described previously (Zou et al., 2017).

| SDE method
A total of 250 mg soil was transferred to a 2 ml tube containing  (Dong et al., 2006). We centrifuged the reaction at 17,000 rcf for 10 min and transferred either 600 µl supernatant to a 1.5 ml tube or 140 µl supernatant to a 96-well plate. Supernatants were stored at −20°C until further use. Buffer composition for Buffer 1, 2, 3, and 4 adapted from Brolaski et al. (2008).

| SDE single tube gDNA cleanup
To perform gDNA cleanups in single tubes, we prepared 10 µl magnetic beads (Sera-Mag Carboxylate-Modified Magnetic Particles (Hydrophophylic), 24152105050250, GE Healthcare Life Sciences) per extraction. The beads were transferred to a 2 ml tube and washed twice with a large volume of 1% Tween-20. After washing, we resuspended the 10 µl beads in 20 µl 1% Tween-20 and added 20 µl of the Tween-20/bead mixture to each extraction (a final Tween-20 concentration of ~0.02%). DNA was precipitated onto the beads for 5-10 min by adding 0.7× volume of isopropanol (420 µl). We washed the magnetic beads twice with 500 µl 80% ethanol on a magnet rack, air-dried the beads, and eluted the gDNA in 50 µl 1× TE buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA) for 5-10 min. The eluted gDNA was transferred to a 1.5 ml tube and stored at −20°C until further use.
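The volumes in this cleanup follow from simple ratios, which can be checked with a minimal sketch. The function name and its parameters are hypothetical, and the sketch assumes that the quoted final Tween-20 concentration refers to the total volume after isopropanol has been added.

```python
def cleanup_volumes(supernatant_ul, bead_mix_ul,
                    isopropanol_ratio=0.7, tween_stock_pct=1.0):
    """Return isopropanol volume, total volume and final Tween-20 percentage.

    Assumption: the final Tween-20 concentration is calculated over the
    combined volume of supernatant, bead mixture and isopropanol.
    """
    # Isopropanol is added at a fixed ratio of the supernatant volume.
    isopropanol_ul = isopropanol_ratio * supernatant_ul
    total_ul = supernatant_ul + bead_mix_ul + isopropanol_ul
    # The Tween-20 stock in the bead mixture is diluted by the total volume.
    tween_final_pct = tween_stock_pct * bead_mix_ul / total_ul
    return {
        "isopropanol_ul": isopropanol_ul,
        "total_ul": total_ul,
        "tween_final_pct": tween_final_pct,
    }


if __name__ == "__main__":
    # Single-tube values from the text: 600 µl supernatant, 20 µl bead mixture.
    v = cleanup_volumes(supernatant_ul=600, bead_mix_ul=20)
    print(round(v["isopropanol_ul"], 1), round(v["tween_final_pct"], 3))
```

With the single-tube values, the sketch gives 420 µl isopropanol and a final Tween-20 concentration of about 0.019%, in line with the ~0.02% stated above.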

| SDE 96-well plate gDNA cleanup
To perform the gDNA cleanups in a 96-well format, we transferred 140 µl supernatant to a 96-well plate (96 Well Non-Skirted PCR Plate, CLEAR, 4TI-0750_50, 4titude Ltd). For each extraction, we washed 5 µl magnetic beads and resuspended the beads in 5 µl 1% Tween-20 (for 96 samples, we therefore prepared 480 µl magnetic beads). We added 5 µl of the 1% Tween-20/bead mixture to each reaction and precipitated the DNA on the beads by adding 0.7× volume of isopropanol (98 µl). We mixed the reactions by vortexing for 5 s and incubated the plates for 5 min to precipitate the gDNA. We then washed the beads twice on a magnet rack with 100 µl 80% ethanol, thoroughly removed the remaining ethanol, and eluted the gDNA in 50 µl 1× TE buffer for 10 min. The cleaned gDNA was transferred to a fresh 96-well plate (96 Well Non-Skirted PCR Plate, CLEAR, 4TI-0750_50, 4titude Ltd) and stored at −20°C until further use.
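Scaling the bead preparation to a full plate is a matter of multiplying the per-well volumes. The sketch below is illustrative only: the function and its `overage_fraction` parameter (extra volume for pipetting loss) are assumptions and not part of the described protocol, which prepares exactly 5 µl of beads per sample.

```python
def plate_batch(n_samples, beads_per_sample_ul=5.0, supernatant_ul=140.0,
                isopropanol_ratio=0.7, overage_fraction=0.0):
    """Total bead and isopropanol volumes for a plate-format cleanup.

    overage_fraction is a hypothetical safety margin (0.1 means 10% extra);
    the default of 0.0 reproduces the volumes given in the text.
    """
    scale = n_samples * (1.0 + overage_fraction)
    isopropanol_per_well_ul = isopropanol_ratio * supernatant_ul
    return {
        "beads_total_ul": beads_per_sample_ul * scale,
        "isopropanol_per_well_ul": isopropanol_per_well_ul,
        "isopropanol_total_ul": isopropanol_per_well_ul * scale,
    }


if __name__ == "__main__":
    batch = plate_batch(96)
    print({name: round(volume, 1) for name, volume in batch.items()})
```

For 96 samples without overage this gives 480 µl of beads and 98 µl of isopropanol per well, matching the values in the text.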

| Genomic soil DNA extraction and quality control
We performed the gDNA extraction using three replicates per soil sample and extraction method, using 250 mg soil from the same sample for each extraction. We tested four different extrac-  in 10 µl total reaction volume. We performed each PCR with three technical replicates using the following cycle conditions: initial denaturation at 95°C for 3 min, followed by 20 cycles of denaturation at 98°C for

TABLE 1 Genomic DNA quality control: The Qubit BR DNA kit was used to measure the total gDNA concentration, and the NanoDrop was used to compare the A260/280 and A260/230 ratios. The MoBio PowerSoil® kit delivers high quality but low yields whereas the MP Biomedicals™ FastDNA™ SPIN kit delivers high yields at low quality. The SDE method falls between the two commercial kits for both quality and quantity.

| Sequencing
The 16S and ITS library pools were sequenced using the MiSeq Nano reagent version 2, 500 cycle kit (Illumina) at an 8 pM loading concentration with a 10% PhiX spike-in. The 16S and ITS pools were sequenced separately, each pool using its own MiSeq Nano reagent kit.
The truncation length for forward reads was set to 180 bp and the truncation length for the reverse reads to 200 bp. For 16S libraries, we used the following parameters: maxN = 0, maxEE = 2 and truncQ = 11. For ITS libraries, we specified the following parameters: maxN = 0, maxEE = c(2, 2) and truncQ = 11 and a minimum length of 50 bp. Forward and reverse reads were merged with default settings. We used the Silva (silva_nr_v132) database to classify bacterial reads (Quast et al., 2013) and UNITE (sh_general_release_dynamic_s_01.12.2017) for the fungal dataset (Nilsson et al., 2019).
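To illustrate what filter parameters of this kind typically do, the following is a minimal sketch of a single-read quality filter. It is not the software used in the study. The function names, the order of operations, and the interpretation of each parameter (truncQ cuts the read at the first base with quality at or below the threshold, maxEE is the maximum expected number of errors summed from Phred scores, maxN is the maximum number of ambiguous bases) are assumptions made for illustration.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities implied by Phred quality scores."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(seq, quals, trunc_len=0, max_n=0, max_ee=2.0,
                trunc_q=11, min_len=0):
    """Return the filtered (seq, quals) pair, or None if the read is discarded.

    Assumed semantics, for illustration only:
    1. cut the read at the first base with quality <= trunc_q;
    2. if trunc_len > 0, discard shorter reads and cut longer ones to trunc_len;
    3. discard reads shorter than min_len, with more than max_n N bases,
       or with more than max_ee expected errors.
    """
    for i, q in enumerate(quals):
        if q <= trunc_q:
            seq, quals = seq[:i], quals[:i]
            break
    if trunc_len > 0:
        if len(seq) < trunc_len:
            return None
        seq, quals = seq[:trunc_len], quals[:trunc_len]
    if len(seq) < min_len:
        return None
    if seq.count("N") > max_n:
        return None
    if expected_errors(quals) > max_ee:
        return None
    return seq, quals


if __name__ == "__main__":
    # A synthetic 200 bp forward read with uniform Q30, truncated to 180 bp.
    kept = filter_read("A" * 200, [30] * 200, trunc_len=180)
    print(len(kept[0]), round(expected_errors(kept[1]), 2))
```

In this sketch a forward 16S read would be processed with `trunc_len=180`, and an ITS read with `trunc_len=0` and `min_len=50`, mirroring the values reported above.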
Reads that did not match to the bacterial or fungal database were

| DNA yield and fragment analysis of different extraction methods
We tested our SDE method by extracting gDNA from 250 mg samples of four different soil types taken from a mixed forest (MiF), a coniferous forest (CoF), and a broad-leafed forest (BrF), plus a standardized cereal crop compost mix used at the John Innes Centre (Cer). We compared our method to two frequently used commercial extrac- Our method scored between the two commercial kits for both quality and quantity (Table 1). We further evaluated the methods for the extracted gDNA fragment length on the Agilent TapeStation. For the MoBio PowerSoil® kit method, we found the majority of gDNA fragment sizes fall between 13.9 and 24.4 kb. The SDE method produced fragments centered between 11.3 and 11.7 kb and the MP Biomedicals™ FastDNA™ SPIN mostly extracted fragments below 10 kb (Figure A2). For the paperdisk method, we could not obtain enough DNA for fragment analysis.

| Extraction method effects on bacterial and fungal amplicon library construction
We constructed 16S V4 and ITS rRNA Illumina sequencing libraries from all extractions (three biological replicates per soil type) using 3 ng of gDNA input per library construction reaction and three technical replicates (similar to Tourlousse et al., 2018). All libraries were inspected using LabChip GX Touch high-sensitivity  (Figure 2). The paperdisk method did not produce a library with a detectable electropherogram trace. We pooled all libraries at equal mass (except the paperdisk method where we used the full amount as these libraries were not detectable) and submitted each library for 250 bp paired-end sequencing. Sequencing of the paperdisk extraction method did not produce any reads suggesting that the extracted gDNA concentration was too low for successful library construction.
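Pooling libraries at equal mass means taking a volume from each library that is inversely proportional to its concentration. The sketch below illustrates this under stated assumptions: the function name, the library names, the target mass and the available volume are hypothetical, and libraries with no detectable concentration are added in full, as described for the paperdisk libraries.

```python
def equal_mass_pool(conc_ng_per_ul, mass_ng, available_ul):
    """Volume (µl) to take from each library so that each contributes mass_ng.

    Libraries with no detectable concentration contribute their full
    available volume; other volumes are capped at the available volume.
    """
    volumes = {}
    for name, conc in conc_ng_per_ul.items():
        if conc <= 0:
            # Undetectable library: use everything that is available.
            volumes[name] = available_ul
        else:
            volumes[name] = min(mass_ng / conc, available_ul)
    return volumes


if __name__ == "__main__":
    # Hypothetical concentrations in ng/µl, for illustration only.
    libraries = {"MoBio_rep1": 2.0, "SDE_rep1": 4.0, "paperdisk_rep1": 0.0}
    print(equal_mass_pool(libraries, mass_ng=10.0, available_ul=20.0))
```

Capping at the available volume is a design choice of this sketch; a very dilute library then contributes less than the target mass rather than causing an impossible pipetting volume.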

| Comparison of extraction methods based on bacterial and fungal microbial composition
It has been previously reported that different microbial gDNA extraction methods can introduce a bias at the genus level (Sáenz et al., 2019). To test this, we compared the biological replicates of each library preparation method (apart from the paperdisk method) by correlation analysis of the detected bacterial (total of 4069) and fungal (total of 1549) amplicon sequence variants (ASVs; Table 2, Figure A3). The three tested gDNA extraction methods compared well across all soil types. We also compared the genus abundances of our SDE method with the two commercial kits by analyzing the bacterial and fungal genus abundances of each library. The genus abundance plots for each soil type were not statistically significantly different between the extraction methods used for either fungal (adonis test, p-value: 0.801, Table 3) or bacterial communities (adonis test, p-value: 0.579, Table 3). Instead, our data showed statistically significant variation between soil types (bacteria ANOSIM test, p-value: 9.99e-4, fungi ANOSIM test, p-value: 9.99e-4, Table 3) but not between gDNA extraction methods (bacteria: Figure 3a,b and fungi: Figure 3c,d).
TABLE 3 Statistical results of alpha- and beta-diversity analysis: Bacterial and fungal data were investigated for potential soil type and gDNA extraction method influence on alpha and beta diversity. For alpha diversity, Shannon diversity and Observed richness were used, and for beta diversity, a Bray-Curtis distance matrix was used.

We further tested the samples using beta diversity as a measure (Bray-Curtis) for between-sample similarity. This analysis agreed with the result of the genus abundance plots, that is, it clustered the soil types separately but did not cluster the data by the three extraction methods (Figure 3b,d). We confirmed this result with permutational multivariate analysis of variance (PERMANOVA, package "vegan" version 2.5.2, adonis function). We also checked the impact of gDNA extraction methods on alpha diversity (Figure 4).
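For readers unfamiliar with the diversity measures named here, the following minimal sketch shows the standard definitions of observed richness, Shannon diversity and Bray-Curtis dissimilarity on per-sample count vectors. It does not reproduce the vegan implementation used in the study; the function names are hypothetical, the example counts are invented, and the natural logarithm for Shannon diversity is an assumption.

```python
import math


def observed_richness(counts):
    """Number of taxa (or ASVs) with a non-zero count in a sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same taxa."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


if __name__ == "__main__":
    # Invented counts for two samples over the same three taxa.
    sample_1 = [10, 0, 5]
    sample_2 = [5, 5, 5]
    print(observed_richness(sample_1), round(shannon(sample_1), 3))
    print(round(bray_curtis(sample_1, sample_2), 3))
```

A Bray-Curtis value of 0 means two samples have identical counts, and a value of 1 means they share no taxa, which is why samples from the same soil type are expected to lie close together in an ordination built on this distance.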
We could not find an impact of the gDNA extraction method on the   Figure A4c; Table 2). These results confirm that our SDE method fits between two commonly used commercial soil gDNA extraction kits across a broad range of measurable parameters.
On the other hand, Zou et al. described a fast and very affordable gDNA extraction method that yields lower gDNA quantities (Zou et al., 2017). Here, we present a high-throughput gDNA extraction method that is suitable for low input and inhibitor-rich sample types such as soils (Figure 1; Figure A1). For our method, we first optimized mechanical lysis conditions by increasing the amount and types of grinding material, then the chemical additives to the lysis buffer that remove common contaminants. We also compared commercial kits (silica column and carboxylate-coated beads) with our carboxylated magnetic bead-based protocol to determine the best extraction yields. For the additives, we used sodium phosphate as a buffer matrix, which in the past together with aluminum ions was recommended for efficient removal of humic acids while minimizing DNA losses during extraction (Mandalakis et al., 2018). We found that adding aluminum sulfate to the lysis buffer leads to increased DNA yields and purity. Although we successfully extracted DNA adding only aluminum sulfate, we also observed that the addition of ammonium acetate to further precipitate impurities (Yu & Mohn, 1999) increased PCR amplification success (see S4, https://doi.org/10.5281/zenodo.4060156 for more experimental details). We compared our finalized protocol to two leading commercial (MoBio PowerSoil® and MP Biomedicals™ FastDNA™ SPIN kits) and a non-commercial (paperdisk based) extraction method (Zou et al., 2017) and observed that our SDE method delivered good quality and quantities of NGS-compatible gDNA at a fraction of the cost of commercially available solutions. We also find that our method requires fewer manually intensive centrifugation steps (MoBio: nine steps, MP: nine steps, SDE: three steps) and that the total hands-on time of our method is lower than the hands-on times for the MoBio and MP kits (MoBio: 16 min, MP: 10 min, SDE: 8 min).
We achieved this with the use of a modified SPRI bead extraction protocol that allows fast, scalable, and inexpensive extraction of nucleic acids (Oberacker et al., 2019), especially because magnetic particles enable the transfer of our protocol to a plate format, without the disadvantages of handling many tubes and minimizing potential sample mix-ups (S1, https://doi.org/10.5281/zenodo.4060156).
We tested our extraction method for 96-well plate compatibility by quantifying yields from 196 samples of 27 different soil types (Figure A1). Because it uses simple pipetting steps and SPRI bead-based purification and washing steps, our method should be easily adaptable from a multi-channel pipette to common liquid handling robotic systems (typically already able to use bead-based methods for DNA and RNA NGS library construction). The gDNA extracted from four distinctively different soil types using SDE is similar in quality and quantity to that from the two commercial kits (Table 1), and the gDNA extracted with the commercial kits and the SDE method led to similar amplicon library profiles (Figure 2). In contrast, the paperdisk method did not generate useable sequencing libraries.
We further investigated the gDNA preparation methods for extraction biases when analyzing fungal and bacterial communities.
Here, the results of the two commercial kits and the SDE method overlap, showing no statistical differences in microbiome composition (Figure 3) for beta diversity. Clustering of the sequencing data using PCoA separated soil types but not gDNA extraction methods (Figure 3b,d) (Table 3). Fungal alpha diversity is not affected by soil type and only partly affected by gDNA extraction method (Table 3; Figure 4). This altogether indicates that our SDE method overall does not induce an experimental bias in extracting bacterial and fungal community data.

ETHICS STATEMENT
None required.

ACKNOWLEDGMENTS
We thank Sarah Walkington at the Natural History Museum, London, for assistance with sequencing. Financial support was provided by