Pyrosequencing analysis of a bacterial community associated with lava-formed soil from the Gotjawal forest in Jeju, Korea

In this study, we analyzed the bacterial diversity in soils collected from Gyorae Gotjawal forest, where globally unique topography, geology, and ecological features support a forest grown on basalt flows from 110,000 to 120,000 years ago and 40,000 to 50,000 years ago. The soils at the site are fertile, with rocky areas, and are home to endangered species of plants and animals. Rainwater penetrates to the groundwater aquifer, which is composed of 34% organic matter containing rare types of soil and no soil profile. We determined the bacterial community composition using 116,475 reads from a 454-pyrosequencing analysis. This dataset included 12,621 operational taxonomic units at 3% dissimilarity, distributed among the following groups: Proteobacteria (56.2%) with 45.7% of α-Proteobacteria, Actinobacteria (25%), Acidobacteria (10.9%), Chloroflexi (2.4%), and Bacteroidetes (0.9%). In addition, 16S rRNA gene sequences were amplified using polymerase chain reaction and domain-specific primers to construct a clone library based on 142 bacterial clones. These clones were affiliated with the following groups: Proteobacteria (56%) with 51% of α-Proteobacteria, Acidobacteria (7.8%), Actinobacteria (17.6%), Chloroflexi (2.1%), Bacilli (1.4%), Cyanobacteria (2.8%), and Planctomycetes (1.4%). Within the phylum Proteobacteria, 56 of 80 clones were tentatively identified as 12 unclassified genera. Several new genera and a new family were discovered within the Actinobacteria clones. Results from 454-pyrosequencing revealed that 57% and 34% of the sequences belonged to undescribed genera and families, respectively. The characteristics of Gotjawal soil, which are determined by lava morphology, vegetation, and groundwater penetration, might be reflected in the bacterial community composition.


Introduction
On Jeju Island in Korea, the word "Gotjawal" refers to any naturally formed forest that grows on basalt-flow rocky terrain and presents a virtually impassable mixture of trees and undergrowth. In addition, these forests function as the main source of water for Jeju's population; rainwater is purified and recharged by the porous rocks and groundwater aquifers within the forest. Gotjawal forests are characterized by lava domes, microclimates, and ecological features shaped by volcanic activity occurring 110,000-120,000 and 40,000-50,000 years ago (Park 2010). Overall, the Gotjawal forest represents a species-rich ecosystem of coexisting plant species, such as ferns and broad-leaved trees, at both the northern and southern distributional limits (Kim et al. 2010), harboring a total of 506 plant and 784 insect species identified to date (Jung 2010; Kim et al. 2010). The forest occurs on a highly irregular substrate of a'a lava flows (Fig. 1). Jeju's lava forests may be a globally unique area and critical for understanding lava-formed forests. However, many regions of the Gotjawal forest have been deforested, used for charcoal and edible mushroom production, and grazed by horses, cows, and other herbivores. In response to these disturbances, secondary forests have developed on disturbed sites. Nonetheless, these forests have been gradually disappearing in recent decades, and approximately 50% of these forests have been destroyed; currently, only about 6% of the original forest area remains due to unregulated construction and urbanization (Jung 2009).
Until recently, few studies had characterized the microorganisms in Gotjawal forest soils; therefore, microbial analyses are necessary to understand the microbial communities within these soils and to elucidate the characteristics that permit the formation of these microbial communities (Kim et al. 2014a). Generally, soil microbial communities are highly diverse, and estimates of unclassified species may reach 99% of 16S rRNA gene sequence databases, since most soil bacteria are difficult to cultivate (Torsvik et al. 1990; Amann et al. 1995).
Microbial communities associated with volcanoes have been studied in the lava flow soils, rocks, and/or glass found at Kilauea volcano, Hawaii (King 2003; Dunfield and King 2004; Nanba et al. 2004; Gomez-Alvarez et al. 2007; King and Weber 2008; Nacke et al. 2011; King and King 2012); Mauna Loa, Hawaii (Crews et al. 2001); Miyake Island, Japan (Ohta et al. 2003); Llaima volcano, Chile (Hernandez et al. 2014); and Mt. Hekla, Iceland (Kelly et al. 2010). Cutler et al. (2014) suggested that plant community composition is a significant determinant for fungal communities, but is less relevant for bacterial community composition during long-term changes in soil microbial communities. Bacteria are able to colonize recent volcanic deposits, which can contain numerous unknown bacterial species (Gomez-Alvarez et al. 2007). Various aspects of the structure and function of microbial communities have been studied in recent Hawaiian volcanic deposits (Dunfield and King 2004), and these deposits in particular have been shown to harbor very distinct microbial assemblages. Three hundred-year-old lava-derived forest soils have been shown to exhibit substantial diversity (Nusslein and Tiedje 1998).
In this study, we analyzed the composition and diversity of bacteria in Gotjawal soils using 454-pyrosequencing and polymerase chain reaction (PCR) cloning-based approaches. The results provide insights into the soil microbial community of lava forest soils.

Collection of soil samples in Gotjawal
The geographic coordinates of the sample collection site were 33°26.023′ N and 26°39.46′ W (Gyorae Gotjawal; Fig. 1A and B). In May 2009, samples of soil from behind or between the lava and trees were collected aseptically using ethanol-disinfected spatulas. Samples were placed in clean, sealable plastic bags. These samples were stored in a cooler during transfer to the laboratory and were then stored at 5°C for 1 week until further processing. After sieving (using a 2-mm sieve), subsamples were frozen at −70°C. Soil DNA extraction was conducted within 1 week of collection. The soil samples had a pH of 4.5, electrical conductivity of 3.44 dS/m, organic matter content of 34%, and NO₃⁻ concentration of 300.48 mg/kg dry soil (Kim et al. 2014a).

Soil DNA extraction
DNA was directly extracted from three subsamples using a FastDNA SPIN kit for soil (QBiogene Inc., Vista, CA) according to the manufacturer's protocol. The extracted DNA was purified using a FastPure DNA kit (Takara Bio Inc., Shiga, Japan) and concentrated using a Zymoclean gel DNA recovery kit (Zymo Research Corp., Orange, CA). The purified DNA from 20 subsamples was then combined and used to generate amplicons for 454-pyrosequencing and for construction of the clone library.

Clone library analysis
Detailed information on the cloning and transformation has been previously published (Kim et al. 2012). The primers used to amplify the 16S rRNA genes for bacteria were 27F and 1492R (Lane 1991

Pyrosequencing
For the determination of operational taxonomic units (OTUs), we defined species, genus, family, and phylum levels at 3%, 5%, 10%, and 20% dissimilarity, respectively, following the procedures used by Schloss and Handelsman (2005). For taxonomy-based analysis, the SILVA database (Pruesse et al. 2007) was used with the unique sequences at an 80% confidence threshold cutoff. Rarefaction curves were analyzed using the R VEGAN package (Oksanen et al. 2011; Fig. 2), and richness estimates were analyzed using MOTHUR (Schloss et al. 2009). Reads shorter than 289 bp (26,382 reads) and reads with ambiguous nucleotides (17,730 reads) were removed from analysis. Reads longer than 724 bp were trimmed using CD-Hit-OTU (Li et al. 2012). Chimeric sequences (1091 reads) were removed using MOTHUR. Reads from all datasets were quality filtered using a Q20 quality cutoff.
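The length and ambiguity criteria described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study (which relied on MOTHUR and CD-Hit-OTU); the function names `filter_read` and `filter_reads` are hypothetical, the order of the checks is an assumption, and the Q20 quality filter and chimera removal are omitted.

```python
from typing import Iterable, List, Optional

MIN_LEN = 289  # reads shorter than this are discarded (threshold from the text)
MAX_LEN = 724  # reads longer than this are trimmed to this length (threshold from the text)


def filter_read(seq: str, min_len: int = MIN_LEN, max_len: int = MAX_LEN) -> Optional[str]:
    """Return the cleaned read, or None if the read should be discarded."""
    seq = seq.upper()
    if len(seq) < min_len:
        return None  # too short
    if any(base not in "ACGT" for base in seq):
        return None  # contains an ambiguous nucleotide such as N
    # Assumption: ambiguity is checked on the full read before trimming.
    return seq[:max_len]


def filter_reads(reads: Iterable[str]) -> List[str]:
    """Apply filter_read to every read and keep only the survivors."""
    cleaned = (filter_read(read) for read in reads)
    return [read for read in cleaned if read is not None]
```

Calling `filter_reads` on a list of raw sequence strings returns only the reads that pass both criteria, with over-long reads truncated to 724 bp.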

Clone library analysis
Putative chimeric sequences were identified using Bellerophon (Huber et al. 2004). The 16S rRNA gene sequences were aligned using the Nearest Alignment Space Termination (NAST) aligner (DeSantis et al. 2006a), and the aligned sequences were compared to the Lane mask using the Greengenes website (DeSantis et al. 2006b). The Sequence Match feature of RDPII (Cole et al. 2009) was used to find GenBank sequences representing the most closely related type strain for each clone, which were then included as references in the phylogeny. Using the Greengenes Automatic Taxonomic Classification algorithm (DeSantis et al. 2006a) and GenBank, a set of related sequences was interpreted as a novel genus (or species) if they were classified as the same genus (or species). Phylogenetic trees were constructed using neighbor-joining with MEGA version 5.0 for Windows (Tamura et al. 2011). Evolutionary distances were calculated using the Kimura 2-parameter method (Kimura 1980). Bootstrap analyses of the neighbor-joining data were conducted based on 1000 samples to assess the support for inferred phylogenetic relationships. DOTUR (Schloss and Handelsman 2005) was used to calculate taxon richness and diversity estimates. A distance matrix was obtained using the Calculate Distance Matrix algorithm from the Greengenes website (DeSantis et al. 2006a,b).
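For readers unfamiliar with the Kimura 2-parameter method, the distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively. The Python sketch below illustrates this formula only; it is not the MEGA implementation, the function name `k2p_distance` is hypothetical, and skipping sites with gaps or ambiguous bases is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Applying `k2p_distance` to every pair of aligned clone sequences would yield a distance matrix of the kind used as input to neighbor-joining.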

Nucleotide sequence accession numbers
All pyrosequencing reads were deposited in the DDBJ Sequence Read Archive (SRA) under the study accession number DRP002233. Bacterial clone sequences were deposited in the DDBJ under the following accession numbers: AB821051-AB821192.

Analysis of bacterial sequences
In this study, we examined a total of 116,475 reads and 142 clones representing 10 and 7 phyla, respectively, from Gotjawal forest soil. Figure 3 and Table 1 summarize the phylogenetic distribution of the 454-pyrosequences and clone sequences of the 16S rRNA gene. The class α-Proteobacteria and the phyla Actinobacteria and Acidobacteria dominated the bacterial community in the soil, representing 45.6%, 25.2%, and 10.9% of the 454-pyrosequences, respectively. The same phyla or classes dominated the clone library, representing 51.2%, 21%, and 10% of sequences, respectively.
At a cutoff of 3% dissimilarity (i.e., 97% sequence identity) among the pyrosequencing reads, 12,621 OTUs were obtained from the 116,475 reads (85,324 unique sequences; Table 2). To estimate species richness, the ACE (abundance-based coverage), Boot, and Chao1 estimators were used. The respective total numbers of species were estimated to be 23,489, 14,920, and 19,871 (Table 2). At 3% dissimilarity, 73 OTUs were obtained from the 142 bacterial clone sequences. ACE, Boot, and Chao1 estimators for this dataset were 199, 92, and 189, respectively (Table 2).
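To make the Chao1 estimator concrete, the following Python sketch computes it from a list of per-OTU read (or clone) counts. It uses the bias-corrected form, S_chao1 = S_obs + F1(F1 - 1) / [2(F2 + 1)], where F1 and F2 are the numbers of OTUs observed exactly once and exactly twice. This is one common variant, assumed here for illustration; the function name `chao1` is hypothetical, and the values in Table 2 were produced with MOTHUR and DOTUR, not with this code.

```python
from typing import Iterable


def chao1(otu_counts: Iterable[int]) -> float:
    """Bias-corrected Chao1 richness estimate from per-OTU abundance counts."""
    counts = [c for c in otu_counts if c > 0]  # ignore OTUs with no observations
    s_obs = len(counts)                        # observed number of OTUs
    f1 = sum(1 for c in counts if c == 1)      # singletons
    f2 = sum(1 for c in counts if c == 2)      # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

The estimate exceeds the observed OTU count only when at least two singleton OTUs are present, which is why richness estimates for sparsely sampled libraries can be much larger than the number of OTUs actually observed.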
To determine richness based on pyrosequencing datasets and bacterial clone sequences, we identified 12,621, 7500, 3126, and 807 OTUs and 73, 60, 40, and 18 OTUs based on 3% (species level), 5% (genus level), 10% (family level), and 20% (phylum level) dissimilarity, respectively (Table 2). Bacterial community composition based on 454-pyrosequences (3% dissimilarity) revealed that 12,621 OTUs were represented in the soil (Fig. 2). The Shannon-Wiener (H) and reciprocal Simpson's (1/D) indices based on the clone library were 3.95 and 3.52 (H) and 48.1 and 20.8 (1/D) at 3% and 5% dissimilarity, while the H and 1/D indices of pyrosequences were 7.75, 7.03 (H) and 526, 306 (1/D) at 3% and 5% dissimilarity, respectively. Chao1, based on the clone library and pyrosequences of the Gotjawal soil, was also higher than the samples (189, 130 and 19,871, 10,833) at the same dissimilarity levels (Table 2). Moreover, based on the comparison of the obtained 16S rRNA gene sequences to their closest known relatives, we discovered several new taxa at the species and genus levels. Figure 4A shows the discovery of two novel genera within δ-Proteobacteria and 10 novel genera within α-Proteobacteria.
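The two diversity indices reported above can be sketched in Python as follows. The Shannon-Wiener index is H = -Σ p_i ln p_i, and the reciprocal Simpson's index is taken here as 1/D with D = Σ p_i², where p_i is the proportion of sequences in OTU i. The choice of D is an assumption: some programs use the finite-sample form Σ n_i(n_i - 1) / [N(N - 1)] instead, so values may differ slightly from those in Table 2. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def _proportions(counts: Sequence[int]) -> List[float]:
    """Convert per-OTU counts to relative abundances, ignoring empty OTUs."""
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("at least one OTU must have a positive count")
    return [c / total for c in positive]


def shannon(counts: Sequence[int]) -> float:
    """Shannon-Wiener index H (natural logarithm)."""
    return -sum(p * math.log(p) for p in _proportions(counts))


def inverse_simpson(counts: Sequence[int]) -> float:
    """Reciprocal Simpson's index 1/D, with D = sum of squared proportions."""
    return 1.0 / sum(p * p for p in _proportions(counts))
```

For a community of S equally abundant OTUs, H equals ln S and 1/D equals S, so both indices rise with richness and with evenness.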

The order Rhizobiales of α-Proteobacteria
Among the α-Proteobacteria (45.7% of OTUs) represented in the 454-sequence dataset, the order Rhizobiales (30.2% of OTUs) was represented by the families Hyphomicrobiaceae, Xanthobacteraceae, Rhodobiaceae, Bradyrhizobiaceae, and Rhizobiaceae, with Xanthobacteraceae as the core group (16.3%; Fig. 4A). Overall, however, the phylotypes identified at the genus level were unknown. Within the family Bradyrhizobiaceae (4.8% of OTUs), the genus Bradyrhizobium accounted for 4.2% of the total OTUs, suggesting that this genus, and nitrogen fixers in general, may be major contributors to the Gotjawal nitrogen cycle. However, further careful studies are required to support this hypothesis.
The results of clone library analysis showed that α-Proteobacteria represented a high percentage of the total bacteria in the Gotjawal soil, which may be a key factor in the maintenance of the forest ecosystem (Table 1). Different results were obtained with a forest soil, where α-Proteobacteria made up only approximately 11.4% of the total bacteria (Nacke et al. 2011). Recently, our research group reported a new genus, Variibacter gotjawalensis gen. nov., sp. nov., isolated from the soil of Aewol Gotjawal forest, one of four Gotjawal sites on Jeju Island (Kim et al. 2014b); this isolate belongs to α-Proteobacteria and shows high similarity with Bradyrhizobium oligotrophicum LMG 10732 (93.6%).
Among the cloned sequences, Actinobacteria were dominant (21%). A phylogenetic analysis of these sequences revealed that the Gotjawal soil contained a variety of previously undiscovered genera and species of Actinobacteria (Fig. 4C). We discovered six new clades at the genus level and one new clade at the family level.

Functional bacterial genera
As Roesch et al. (2007) and Uroz et al. (2010) have suggested, and as is shown in Table 3, the nitrogen-fixing bacteria primarily detected by 16S rRNA gene pyrosequencing belonged to Bradyrhizobium (4.2%) and Rhizobium (0.6%). A large number of sequences from the cellulolytic bacterial genus Acidothermus (Barabote et al. 2009) were also detected in the forest soil (Table 1). In addition, a few anaerobic ammonia-oxidizing bacteria (AOB; 0.002%) and methane-oxidizing bacteria (0.002-0.04%, Methylocella genus) were detected. However, 16S rRNA gene taxonomies were only loosely correlated with function.
This study has some limitations. First, the dataset was small, and no replications were performed. This makes both the analysis and comparison with other studies difficult. Additionally, many of the identified sequences were not affiliated with known taxa. While this is a common occurrence, further studies are necessary to determine the importance of these results with regard to the ecology and characteristics of the Gotjawal forest. Analysis of the extent to which sequences from Gotjawal are completely novel, or whether these sequences have been observed in other soils or other systems, would also be of value. Further data are necessary to determine the novelty of these results. In addition, the presence of nitrogen-fixing bacteria is only important if we can document members of Fabaceae among the plants. Despite these limitations, this study is the first to analyze the bacterial communities within Gotjawal forest soils, and further studies are needed.
In conclusion, the high sequence identity of many of the bacterial clones to only environmental reference clones suggested that the majority of 16S rRNA gene pyrosequencing datasets and gene clones in Gotjawal soils were not affiliated with known genera or species. We discovered 18 novel genera and one novel family, as well as various novel species candidates, within the bacterial domain. The high rate of retrieval of new genus candidates (frequency of >50%) suggested that the communities may be highly specialized for growth in lava forest soils. Furthermore, the soil of the Gotjawal forest exhibited a unique bacterial composition containing unclassified Actinobacteria and α-Proteobacteria. Therefore, further work is necessary to fully elucidate the composition of the bacterial community and the functions of these soils.