Correspondence: Max M. Häggblom, Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA. Tel.: +1 848 932 5646; fax: +1 732 932 8965; e-mail: email@example.com
Acidobacteria are among the most abundant bacterial phyla found in terrestrial ecosystems, but relatively little is known about their diversity, distribution and most critically, their function. Understanding the functional activities encoded in their genomes will provide insights into their ecological roles. Here we describe the genomes of three novel cold-adapted strains of subdivision 1 Acidobacteria. The genomes consist of a circular chromosome of 6.2 Mbp for Granulicella mallensis MP5ACTX8, 4.3 Mbp for Granulicella tundricola MP5ACTX9, and 5.0 Mbp for Terriglobus saanensis SP1PR4. In addition, G. tundricola has five mega plasmids for a total genome size of 5.5 Mbp. The three genomes showed an abundance of genes assigned to metabolism and transport of carbohydrates. In comparison to three mesophilic Acidobacteria, namely Acidobacterium capsulatum ATCC 51196, ‘Candidatus Koribacter versatilis’ Ellin345, and ‘Candidatus Solibacter usitatus’ Ellin6076, the genomes of the three tundra soil strains contained an abundance of conserved genes/gene clusters encoding for modules of the carbohydrate-active enzyme (CAZyme) family. Furthermore, a large number of glycoside hydrolases and glycosyl transferases were prevalent. We infer that gene content and biochemical mechanisms encoded in the genomes of three Arctic tundra soil Acidobacteria strains are shaped to allow for breakdown, utilization, and biosynthesis of diverse structural and storage polysaccharides and resilience to fluctuating temperatures and nutrient-deficient conditions in Arctic tundra soils.
Acidobacteria represent one of the most abundant and ubiquitous bacterial phyla found in global soil environments (Barns et al., 1999; Janssen, 2006; Fierer et al., 2007; Jones et al., 2009; Kielak et al., 2009; Lauber et al., 2009; Lee & Cho, 2009; Eichorst et al., 2011), and they are widely distributed in Arctic and boreal soils (Goulden et al., 1998; Neufeld & Mohn, 2005; Dedysh et al., 2006; Männistö et al., 2007, 2009; Lee et al., 2008; Pankratov et al., 2008, 2011; Campbell et al., 2010; Chu et al., 2010). Nevertheless, relatively little is still known about their functional and ecological roles in these soils. Despite a large collection of Acidobacteria 16S rRNA gene sequences in databases representing diverse species from various habitats, only a few have been cultivated and described. Acidobacteria have been divided into up to 26 phylogenetic subdivisions based on 16S rRNA gene phylogeny (Barns et al., 2007), of which subdivisions 1, 3, 4, and 6 are most abundantly detected in soil environments (Jones et al., 2009). The phylogenetic diversity, ubiquity, and abundance of this group suggest that they play important ecological roles in soils. Acidobacteria are assumed to be genetically and metabolically diverse, as they inhabit a wide variety of natural environments over a range of temperature, salinity, organic matter, and pH (Jones et al., 2009; Faoro et al., 2010; Ganzert et al., 2011). The abundance of Acidobacteria correlates with soil pH, with subdivision 1 Acidobacteria being the most abundant in slightly acidic soils (Kishimoto et al., 1991; Männistö et al., 2007; Kleinsteuber et al., 2008; Jones et al., 2009; Lauber et al., 2009; Chu et al., 2010). An increasing number of Acidobacteria have recently been cultivated and described (for references see Männistö et al., 2012).
Nonetheless, the paucity of well-characterized Acidobacteria hampers our understanding of the physiology and ecological function of these organisms, as well as how they will respond and adapt to environmental change in these soil environments.
Arctic and boreal environments cover over 20% of the terrestrial surface and harbor about one-third of the total global soil carbon pool (Loya & Grogan, 2004). However, little is known about the microbial communities that utilize this large carbon pool, their activity, and community dynamics, despite their critical role in carbon mineralization and potential impact on atmospheric CO2 and future climate change. Acidobacteria have been reported to dominate soils rich in soil organic matter and are involved in microbial degradation of lignocellulosic plant biomass (Eichorst et al., 2011; Pankratov et al., 2011). Using combined molecular- and cultivation-based approaches, we have demonstrated that members of subdivision 1 Acidobacteria are a dominant bacterial group that is active at low temperatures and resilient to multiple freeze–thaw cycles in acidic tundra soils of northern Fennoscandia, where Acidobacteria can comprise more than 50% of sequences in clone libraries (Männistö et al., 2007, 2009). A concerted effort led to the cultivation of several new slow-growing and fastidious cold-adapted Acidobacteria belonging to the genera Terriglobus and Granulicella (Männistö et al., 2011, 2012). It appears that soils naturally exposed to harsh and changing environmental conditions may harbor frost-tolerant and resilient bacterial species. We hypothesize that these conditions have selected a stable bacterial community dominated by Acidobacteria that is only minimally affected by temperature fluctuation and freeze–thaw cycles.
Here, we report on the analysis of the genomes of three novel cold-adapted strains of subdivision 1 Acidobacteria, Granulicella mallensis strain MP5ACTX8, Granulicella tundricola strain MP5ACTX9, and Terriglobus saanensis strain SP1PR4, isolated from Arctic tundra soils (Fig. 1). Our genomic analysis of the three tundra soil strains is supported by physiological characterization to assess the mechanisms promoting their activity, dominance, and survival in these soil environments. These strains are compared with three other Acidobacteria strains for which finished genomes are available (Ward et al., 2009; Challacombe et al., 2011), namely Acidobacterium capsulatum ATCC 51196 isolated from acid mine drainage in Japan (Kishimoto et al., 1991) and two other soil strains, ‘Candidatus Koribacter versatilis’ Ellin345 and ‘Candidatus Solibacter usitatus’ Ellin6076, both isolated from soils of rye grass/clover pasture in Australia (Joseph et al., 2003; Davis et al., 2005). Our study provides genomic insights into the ecology of these Acidobacteria and their role in the turnover of soil organic carbon in Arctic and boreal environments.
Materials and methods
Habitat and strains of Acidobacteria
Granulicella mallensis MP5ACTX8T (= DSM 23137 = ATCC BAA-1857), G. tundricola MP5ACTX9T (= DSM 23138 = ATCC BAA-1859), and T. saanensis SP1PR4T (= DSM 23119 = ATCC BAA-1853) were isolated from Arctic tundra heaths located in northern Finland (Männistö et al., 2011, 2012). All strains originated from the organic layer of soil samples collected from oligotrophic wind-swept hills that experience large annual temperature variation and frequent freeze–thaw cycles in the autumn and spring. Vegetation in these sites is dominated by dwarf shrubs of the Ericaceae family, which produce acidic organic matter with a high C/N ratio (Eskelinen et al., 2009). The soil is acidic (pH 4.8–5.2) and its organic matter content is high (c. 30–50%). Strains were cultivated from soil samples using R2A agar (SP1PR4) or a mixture of carboxymethyl cellulose (CMC), starch, and xylan as carbon sources (MP5ACTX8 and MP5ACTX9). Once obtained as pure cultures, the strains were maintained and grown on R2 agar or broth (adjusted to pH 5.5, Difco) and stored at −70 °C in 20% glycerol.
Growth of strains and DNA extraction
The tundra soil strains were grown aerobically on half-strength R2A medium, pH 5.5 at 20 °C. Genomic DNA of high sequencing quality was isolated using a hexadecyltrimethylammonium bromide (CTAB) method (Doyle & Doyle, 1990) modified for genomic DNA extraction from bacterial cells. Cells (OD600 nm of not more than 1.2) were lysed with 2% SDS and 250 μg mL−1 proteinase K and incubated at 37 °C for 1 h. Then, CTAB buffer (1% CTAB, 0.75 M NaCl, 50 mM Tris pH 8, 10 mM EDTA) was added, and the suspension was incubated at 65 °C for 10 min. The suspension was extracted once with chloroform/isoamyl alcohol (24 : 1) and then with phenol/chloroform/isoamyl alcohol (24 : 1). Finally, the DNA was precipitated from the supernatant with 0.6 vol isopropanol (−20 °C) at room temperature for 30 min. Genomic DNA was pelleted, washed with 70% ethanol, and dried. The pellet was resuspended in TE (10 mM Tris, 1 mM EDTA) buffer containing 1 μL RNAse (10 mg mL−1) and incubated at 37 °C for 20 min. The DNA was evaluated according to the quality control guidelines provided by the DOE Joint Genome Institute (DOE-JGI).
Genome sequencing and assembly
Finished genomes for strains G. mallensis MP5ACTX8 (JGI ID 4088692), G. tundricola MP5ACTX9 (JGI ID 4088693), and T. saanensis SP1PR4 (JGI ID 4088690) were generated at DOE Joint Genome Institute using a combination of Illumina (Bennett, 2004) and 454 technologies (Margulies et al., 2005). Three libraries, an Illumina GAii shotgun library, a 454 Titanium standard library, and a paired-end 454 library were constructed. All general aspects of library construction and sequencing performed at JGI can be found at http://www.jgi.doe.gov/. The 454 Titanium standard data and the 454 paired-end data were assembled together with Newbler, version 2.3. The Newbler consensus sequences were computationally shredded into 2 kb overlapping fake reads (shreds). Illumina sequencing data were assembled with velvet, version 0.7.63 (Zerbino & Birney, 2008), and the consensus sequences were computationally shredded into 1.5 kb overlapping fake reads (shreds). The 454 Newbler consensus shreds, the Illumina velvet consensus shreds, and the read pairs were integrated in the 454 paired-end library using parallel phrap, version sps – 4.24 (High Performance Software, LLC). The software Consed (Ewing et al., 1998; Gordon et al., 1998) was used in the following finishing process. Illumina data were used to correct potential base errors and increase consensus quality using the software Polisher developed at JGI (Alla Lapidus, unpublished data). Possible misassemblies were corrected using gapResolution (Cliff Han, unpublished data), Dupfinisher (Han & Chain, 2006), or sequencing cloned bridging PCR fragments with subcloning. Gaps between contigs were closed by editing in Consed, by PCR, and by Bubble PCR (J-F Cheng, unpublished data) primer walks. A total of 291, 153, and 28 additional PCRs, and 5, 6, and 0 shatter libraries were necessary to close gaps and to raise the quality of the finished genomes of G. mallensis MP5ACTX8, G. tundricola MP5ACTX9, and T. saanensis SP1PR4, respectively. 
A combined depth of coverage of 231×, 294×, and 219× was achieved for the three genomes of G. mallensis MP5ACTX8, G. tundricola MP5ACTX9, and T. saanensis SP1PR4, respectively.
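The shredding of consensus sequences into overlapping fake reads can be illustrated with a minimal Python sketch. The function name shred and the overlap of 500 bp are assumptions made for illustration only; the overlap used in the JGI pipeline is not stated here, and this is not the JGI implementation.

```python
def shred(consensus: str, shred_len: int = 2000, overlap: int = 500) -> list[str]:
    """Cut a consensus sequence into overlapping pseudo-reads ("shreds").

    shred_len follows the 2 kb figure given for the Newbler consensus;
    overlap is a hypothetical value chosen only for this illustration.
    """
    if not 0 <= overlap < shred_len:
        raise ValueError("overlap must be non-negative and smaller than shred_len")
    if not consensus:
        return []
    step = shred_len - overlap
    shreds = []
    start = 0
    while True:
        shreds.append(consensus[start:start + shred_len])
        # Stop once the current shred reaches the end of the consensus.
        if start + shred_len >= len(consensus):
            break
        start += step
    return shreds


if __name__ == "__main__":
    toy_consensus = "ACGT" * 1250  # 5000 bp toy sequence
    pieces = shred(toy_consensus)
    print(len(pieces), [len(p) for p in pieces])
```

Each shred shares its last 500 bases with the start of the next one, so that an overlap-based assembler such as phrap can rejoin the pieces while combining them with the other data types.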
Sequence analysis, annotation, and bioinformatics
Gene prediction and/or functional annotation of genomes were retrieved through the Integrated Microbial Genome (IMG) system supported by DOE-JGI Microbial Annotation Pipeline (DOE-JGI MAP). Genes were identified using Prodigal as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline (Pati et al., 2010). The coding sequences (CDS) were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COGs, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Noncoding genes and miscellaneous features were predicted using tRNAscan-SE, RNAMMer, Rfam, TMHMM, and signalP.
Comparative genome analysis was carried out in part using the Integrated Microbial Genome (IMG) portal. The DNA and protein sequences were retrieved from web-based databases (e.g., NCBI Query and blast searches). To facilitate the predictive analysis of the open reading frames of the sequenced genomes, relevant information was extracted from databases such as clusters of orthologous genes (COGs) (http://www.ncbi.nlm.nih.gov/COG) (Tatusov et al., 1997) and CAZy (http://www.cazy.org) (Cantarel et al., 2009). Phylogenetic and molecular evolutionary analyses were conducted on DNA or protein sequences aligned with clustalw and muscle in mega version 5 (Tamura et al., 2011).
Homology-derived secondary structure of proteins (HSSP) distances (Sander & Schneider, 1991; Rost, 2002) between protein sequences were computed as follows: (1) run psi-blast (Altschul et al., 1997) all-against-all on the sequences in the six genomes (three iterations; inclusion e-value (h and e) parameters = 10^−10); (2) extract sequence identity and alignment length without gaps for each alignment; (3) apply the HSSP formula described in Rost (2002) to compute the distance. This measure represents the distance of a particular sequence alignment from a homology threshold curve, which is a function of alignment length and percent sequence identity. In other words, alignments are mapped to a two-dimensional space where points above the HSSP curve represent pairs of functionally similar proteins. The distance of one such point to the curve is correlated with the reliability of function transfer; that is, the percentage of functionally similar sequence pairs at HSSP distance above 30 is higher than at HSSP distance above 0. The similarity of genome A to genome B is then computed as the ratio of the number of genes in A that have homologues in B to the total number of genes in A (at a given HSSP cutoff). At each HSSP cutoff (range −5 to 45), we computed the numbers of genes overlapping between all different combinations of genomes. We mapped the percentage of genes in each genome common to (1) only the three tundra genomes vs. (2) all six genomes. To compute the ‘random’ baseline for this curve, we randomly selected from HAMAP (Lima et al., 2009) six genomes of fully sequenced bacteria with similar genome size (4000–5000 genes): Azotobacter vinelandii, Escherichia coli (strain K12), Methylacidiphilum infernorum (isolate V4), Novosphingobium aromaticivorans (strain DSM 12444), Rhodopseudomonas palustris (strain ATCC BAA-98/CGA009), and Vibrio fischeri (strain MJ11). Of these six, we randomly chose E. coli, A. vinelandii, and V. fischeri to be the reference outliers.
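The HSSP distance and the genome similarity ratio described above can be sketched in Python as follows. The threshold curve is written in the form commonly quoted from Rost's work on sequence alignment thresholds; we assume, without confirmation from the text, that this is the form applied here. The function names, the input dictionary, and the use of "greater than or equal to" at the cutoff are illustrative assumptions.

```python
import math


def hssp_curve(length: int) -> float:
    """Percent-identity threshold as a function of ungapped alignment length.

    Piecewise form as commonly quoted from Rost; assumed, not confirmed,
    to be the exact curve used in this study.
    """
    if length <= 11:
        return 100.0
    if length <= 450:
        return 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))
    return 19.5


def hssp_distance(pct_identity: float, length: int) -> float:
    """Distance of an alignment from the curve (positive = above the curve)."""
    return pct_identity - hssp_curve(length)


def genome_similarity(genes_a: list[str], best_hit: dict[str, float], cutoff: float) -> float:
    """Fraction of genes in genome A with a homologue in genome B.

    best_hit maps a gene of A to the best HSSP distance of any of its
    alignments against genome B (hypothetical input structure).
    A gene counts as shared when that distance is >= cutoff (assumption).
    """
    if not genes_a:
        return 0.0
    shared = sum(1 for g in genes_a if best_hit.get(g, -math.inf) >= cutoff)
    return shared / len(genes_a)


if __name__ == "__main__":
    toy_genes = ["a", "b", "c", "d"]
    toy_hits = {"a": 30.0, "b": 5.0, "c": -10.0}  # "d" has no alignment
    for cutoff in (-5, 0, 22, 45):
        print(cutoff, genome_similarity(toy_genes, toy_hits, cutoff))
```

Note that the ratio is asymmetric: the similarity of A to B is normalized by the gene count of A, so it need not equal the similarity of B to A.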
Assays for substrate and enzyme activities
Carbon source utilization assays and enzymatic activities were tested and reported earlier (Männistö et al., 2011, 2012), where utilization of sugars was detected by growth on 96-well plates, and hydrolysis of various polysaccharides was assayed as CO2 production (Männistö et al., 2011, 2012). Here, we assayed the utilization of soluble polysaccharides alginate, CMC, laminarin, lichenan, pectin, pullulan, and starch on 96-well plates with VL55 mineral medium (pH 5.5; Sait et al., 2002) supplemented with yeast extract (100 mg L−1) and 1–2 g L−1 of the polysaccharide. CMC hydrolysis was tested for colonies spotted on replicate diagnostic plates with 0.5 and 1% (w/v) of CMC sodium salt as sole source of carbon in VL55 medium containing yeast extract (100 mg L−1) and grown for 2 weeks. Formation of a zone of clearance around colonies after staining with iodine was used as a preliminary indication of enzyme activity (Kasana et al., 2008). Hydrolysis of CMC and xylan was further assayed on plates containing 0.25 g L−1 of peptone and yeast extract, 20 g L−1 of agar, and 2 or 5 g L−1 of each polysaccharide in VL55 buffer (pH 5.5). The plates were incubated at 20 °C for up to 4 weeks, and the polysaccharide hydrolysis was detected by flooding the plates with 0.1% Congo red for 15 min (Teather & Wood, 1982).
Utilization of chitin by the strains was assayed using chitin azure (chitin from crab shells covalently linked with Remazol Brilliant Violet 5R (RBV) dye; Sigma) and by a chitobiase assay as described by O'Brien & Colwell (1987). For the chitin azure assay, 0.5 mL of substrate (0.5 g L−1 in VL55 buffer) was mixed with 0.5 mL of culture grown for 5 days on glucose (1 g L−1) and yeast extract (0.5 g L−1). Chitinase activity was detected after 5 h, 20 h, 42 h, and 10 days of incubation at room temperature by measuring the absorbance at 550 nm. Chitobiase activity was assayed by the filter paper spot test using 4-methylumbelliferyl-N-acetyl-β-d-glucosaminide. Biomass for the assay was grown on R2A (pH 5.5) for 5 days, and a loop-full (ca. 0.2 mL) of the colonies was rubbed on antibiotic disks. Control disks without substrate and without inoculum were included, and 20 μL of the substrate solution was pipetted onto each disk. Chitobiase activity was detected after 20–30 min of incubation at room temperature by exposure to UV light.
The genomes of the three tundra soil strains, G. mallensis, G. tundricola, and T. saanensis, consist of one circular chromosome of 6.2, 4.3, and 5.0 Mbp, respectively (Fig. 2). In addition, G. tundricola has five mega plasmids (> 100 kb each) ranging in size from 1.1 × 10^5 to 4.7 × 10^5 bp (Fig. S1) for a total genome size of 5.5 Mbp. Among the five strains of subdivision 1 Acidobacteria, G. mallensis has the largest genome size of 6.2 Mbp. The general genome and physiological features of the three tundra soil strains are compared with those of A. capsulatum, ‘K. versatilis’, and ‘S. usitatus’ and shown in Table 1. The genome GC content for the three tundra strains ranges from 57% to 60%. The genomes of G. mallensis, G. tundricola, and T. saanensis are estimated to encode 4907, 4706, and 4279 protein CDSs, respectively, with ~70% of the genes having predicted functions (68–72% COGs) (Table 1). The genomes of G. mallensis and T. saanensis contain a large percentage of CDSs with signal peptides (43–44%), suggesting that a higher number of genes are involved in transport/translocation processes in these two tundra soil strains as compared to the other subdivision 1 Acidobacteria. The three tundra soil strains contained a large number of sequences in paralogous clusters (50–54%). In comparison, the large genome (9.9 Mbp) of ‘S. usitatus’, which belongs to subdivision 3 Acidobacteria, has an increased number of paralogs, which has been accounted for by gene acquisition and horizontal gene transfer events (Challacombe et al., 2011). We identified mobile genetic elements in all three tundra soil strains, with a large number in the genome of G. tundricola (n = 154), encoding phage integrases, transposases, and IS elements (Table S1). We did not find any CDSs for clustered, regularly interspaced, short palindromic repeats (CRISPRs) in the genome of any of the three tundra soil strains; however, a CDS for a CRISPR-associated protein (AciX8_2932) of the Cas5 family was present in the genome of G. mallensis. Together, the mobile genetic elements, the mega plasmids, and the absence of CRISPRs suggest that the genomes of the tundra soil strains are subjected to gene transfer events.
Table 1. Comparison of general genome and physiological features of six Acidobacteria species
Strains compared (table columns): G. mallensis MP5ACTX8; G. tundricola MP5ACTX9; T. saanensis SP1PR4; A. capsulatum ATCC 51196; ‘K. versatilis’ Ellin345; ‘S. usitatus’ Ellin6076.
Plasmids in G. tundricola MP5ACTX9, pACIX901, 0.48 Mbp; pACIX902, 0.3 Mbp; pACIX903, 0.19 Mbp; pACIX904, 0.12 Mbp; pACIX905, 0.12 Mbp.
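As a quick consistency check, the plasmid sizes listed above can be added to the 4.3 Mbp chromosome of G. tundricola to recover the reported total genome size. The short Python sketch below uses only the values given in the text; the variable names are ours.

```python
# Sizes in Mbp, taken from the text above.
chromosome_mbp = 4.3
plasmids_mbp = {
    "pACIX901": 0.48,
    "pACIX902": 0.30,
    "pACIX903": 0.19,
    "pACIX904": 0.12,
    "pACIX905": 0.12,
}

plasmid_total_mbp = sum(plasmids_mbp.values())
genome_total_mbp = chromosome_mbp + plasmid_total_mbp

# Rounded to one decimal place, this agrees with the reported 5.5 Mbp.
print(f"plasmids: {plasmid_total_mbp:.2f} Mbp, genome: {genome_total_mbp:.2f} Mbp")
```

The plasmids therefore account for roughly one-fifth of the G. tundricola genome.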
The physiology of the three tundra soil strains, G. mallensis, G. tundricola, and T. saanensis, has been described in detail (Männistö et al., 2011, 2012), and comparative data with strains A. capsulatum, ‘K. versatilis’, and ‘S. usitatus’ are presented in Table 1. The tundra soil strains are cold-adapted (grow at temperatures from +4 to 28 °C) compared with the mesophilic strains A. capsulatum, ‘K. versatilis’, and ‘S. usitatus’ (Kishimoto et al., 1991; Ward et al., 2009; Challacombe et al., 2011) isolated from temperate environments (Table 1). All six Acidobacteria strains are acidophiles. The two Granulicella strains are able to grow at an acidic pH range (pH 3.5–6.5), while T. saanensis grows at a pH range of 4.5–7.5.
General genome comparisons
Organisms occupying the same environmental niche are intuitively more functionally related to each other than to the outsiders. Here, we assessed organism functional similarity using whole-genome homology. Comparisons of the six Acidobacteria genomes were made by all-against-all sequence comparisons at the level of protein CDS, using HSSP distance as a measure of similarity (Rost, 2002). In short, the similarity of genome A to genome B is computed as the number of genes in A that have homologues in B to the total number of genes in A (at a given HSSP cutoff). At each cutoff (range −5 to 45), we computed the similarity of all six genomes in our study. For our analysis, we assume that a higher number of functionally similar proteins indicate overall organism similarity. Intuitively, higher stringency in assigning homology decreases the number of homologous genes; that is, with increasing HSSP cutoffs, there is a decrease in the percentage of genes common to all six genomes (Fig. 3a, X-axis). However, highly similar genes are still assigned homology even at high stringency thresholds, as illustrated by the increase in the percentage of genes common to only the three (similar) tundra genomes (Fig. 3a, Y-axis; HSSP −5 to HSSP 34). Thus, the increase in HSSP cutoffs stringency from −5 to 22 (Fig. 3a; beginning of the curve plateau) progressively better groups the tundra genomes closer to each other (increasing along the Y-axis) and away from the three other strains of Acidobacteria (decreasing along the X-axis). Cutoffs stricter than 22 remove homologues from both sets, with the tundra overlap set being replenished by the nontundra ‘drop-outs’ until HSSP = 34. Beyond this threshold, fewer and fewer genes find homologues in any of the genomes.
The gene pool shared by the genomes of the three tundra soil strains G. mallensis, G. tundricola, and T. saanensis at HSSP = 22 (beginning of curve plateau, Fig. 3a) is depicted by a Venn diagram (Fig. 3b; numbers in each intersection indicate shared CDSs). Some genes were specific to one strain only: 1679 CDSs in G. mallensis, 1640 CDSs in G. tundricola, and 1249 CDSs in T. saanensis. Between 1900 and 1966 CDSs were shared by all tundra genomes, with more than half (1551–1586 CDSs; data not shown) of these genes shared by all six Acidobacteria strains (the three tundra soil strains and A. capsulatum, ‘K. versatilis’, and ‘S. usitatus’). Further analysis identified a gene pool shared only by the three tundra Acidobacteria (but not identified in the three other species) consisting of 380 CDSs in G. mallensis, 370 CDSs in G. tundricola (including plasmids: 21 genes in pACIX902, 20 genes in pACIX901, 11 genes in pACIX903, four genes in pACIX904, and two genes in pACIX905), and 340 CDSs in T. saanensis (Fig. 3b, box). This gene pool was assigned to 261 COG and 273 pfam functions (Table S2), while 47 genes had no assigned function. Many of the CDSs in this gene pool were assigned via COG annotations to functions of metabolism and transport of carbohydrates (Table S2). These included glycoside hydrolases (GHs) of family GH1 (pfam00232), GH2 (pfam00703, pfam00754, pfam02836), GH20 (pfam00728), GH28 (pfam00295), GH31 (pfam01055), GH57 (pfam03065, pfam09210), GH88 (pfam07470), and GH92 (pfam07971), alginate lyase of polysaccharide family PL5 (pfam05426), glycosyl transferases (GTs) of family GT1 (pfam00534), GT2 (pfam00535), and GT9 (pfam01075), and transporters of Caenorhabditis elegans ORF (CEO) family (DUF1632)/sugar transport protein (pfam06800) and major facilitator superfamily (MFS) (pfam00083, pfam07690).
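The partitioning of each genome's CDSs into the Venn regions described above can be sketched with Python sets. This is a simplified illustration, not the procedure used in the study: the input structure (for each CDS, the set of other genomes in which a homologue was found at the chosen HSSP cutoff) and all function and category names are hypothetical.

```python
from collections import Counter

TUNDRA = frozenset({"G. mallensis", "G. tundricola", "T. saanensis"})
REFERENCE = frozenset({"A. capsulatum", "K. versatilis", "S. usitatus"})


def classify(home: str, partners: set[str]) -> str:
    """Assign one CDS of a tundra genome to a Venn region.

    home     -- the tundra genome that carries the CDS
    partners -- other genomes with a homologue of this CDS (hypothetical input)
    """
    other_tundra = TUNDRA - {home}
    if not partners & other_tundra:
        return "unique_among_tundra"
    if not other_tundra <= partners:
        return "two_tundra"
    # From here on, the CDS has homologues in both other tundra genomes.
    if REFERENCE <= partners:
        return "all_six"
    if not partners & REFERENCE:
        return "tundra_only"
    return "all_tundra_some_reference"


def tally(home: str, homologue_map: dict[str, set[str]]) -> Counter:
    """Count the CDSs of one genome per Venn region."""
    return Counter(classify(home, p) for p in homologue_map.values())


if __name__ == "__main__":
    toy = {
        "cds1": set(),
        "cds2": {"G. tundricola", "T. saanensis"},
        "cds3": {"G. tundricola", "T. saanensis", "A. capsulatum",
                 "K. versatilis", "S. usitatus"},
        "cds4": {"G. tundricola"},
    }
    print(tally("G. mallensis", toy))
```

Because each genome is tallied from its own CDSs, the count for a shared region can differ between genomes (for example, 380, 370, and 340 CDSs in the tundra-only pool), which is expected when paralogs are present.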
Functional diversity in Acidobacteria genomes
For all six strains of Acidobacteria, predicted genes were assigned to four main functional categories – metabolism (Me), cellular processes (Cp), information storage and processing (Isp), and poorly characterized (Pc) within the Cluster of Orthologous Groups (COG) database (Tatusov et al., 1997) as shown in Fig. 4. For metabolism (Me), the highest percent of genes could be assigned to carbohydrate transport and metabolism [G] (9–10%), with the highest abundance in genomes of G. mallensis and T. saanensis among all six Acidobacteria strains. This was followed by amino acid transport and metabolism [E] (7–8%), energy production and conversion [C] (5–6%), and lipid transport and metabolism [I] (3–4%). For cellular processes (Cp), the majority of genes were assigned to cell wall/membrane/envelope biogenesis [M] (8–9%) followed by signal transduction mechanisms [T] (4–5%) and for information storage and processing (Isp) to transcription [K] (7–9%). We infer that the genomes of the three tundra soil strains encode for functions involved in transport and utilization of nutrients, mainly carbohydrates for energy production and cell biogenesis to maintain cell integrity in cold tundra soils.
Carbohydrate transport and metabolism
To explore the genetic potential of the three tundra soil strains to metabolize organic carbon, we analyzed their genomes for CDSs predicted to code for modules that catalyze the breakdown, biosynthesis, or modification of carbohydrates of the carbohydrate-active enzymes (CAZy) family (http://www.cazy.org; Cantarel et al., 2009). CDSs predicted to encode CAZymes were more abundant in the genomes of the three tundra soil strains, G. mallensis (n = 321), G. tundricola (n = 215), and T. saanensis (n = 244), as compared to the genomes of two other strains of subdivision 1, A. capsulatum (n = 161) and ‘K. versatilis’ (n = 135) (Fig. 5). The genomes of G. mallensis, G. tundricola, and T. saanensis contained gene modules spanning four major CAZyme super families, namely glycoside hydrolases (GHs) (n = 166, 103, 110, respectively), glycosyl transferases (GTs) (n = 77, 74, 90), polysaccharide lyases (PLs) (n = 9, 4, 4), and carbohydrate esterases (CEs) (n = 16, 15, 16), as well as noncatalytic carbohydrate-binding modules (CBMs) (n = 53, 19, 24). This indicates that the tundra soil strains are abundant in genes encoding functional activities required for rearrangement of oligo- and polysaccharides. Predicted gene modules encompassed 59 different families of glycoside hydrolases (GHs), 21 GTs, seven PLs, nine CEs, and 12 CBMs (Table S3), emphasizing the elaborate set of enzymes needed for breakdown of different types of plant and/or microbial polysaccharides as well as for the biosynthesis of various polysaccharides. The three tundra Acidobacteria strains contained a large number of predicted CDSs encoding sugar transporters of the major facilitator superfamily.
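The per-class module counts given above add up to the reported CAZyme totals for each tundra strain, which can be verified with a short Python tally. All numbers are taken from the text; the dictionary layout and variable names are ours.

```python
# CAZyme module counts per class, as reported in the text.
cazyme_counts = {
    "G. mallensis": {"GH": 166, "GT": 77, "PL": 9, "CE": 16, "CBM": 53},
    "G. tundricola": {"GH": 103, "GT": 74, "PL": 4, "CE": 15, "CBM": 19},
    "T. saanensis": {"GH": 110, "GT": 90, "PL": 4, "CE": 16, "CBM": 24},
}

# Totals reported in the text for the same strains.
reported_totals = {"G. mallensis": 321, "G. tundricola": 215, "T. saanensis": 244}

computed_totals = {strain: sum(c.values()) for strain, c in cazyme_counts.items()}

for strain, total in computed_totals.items():
    gh_share = cazyme_counts[strain]["GH"] / total
    status = "matches" if total == reported_totals[strain] else "differs from"
    print(f"{strain}: {total} modules ({status} reported total), GH share {gh_share:.0%}")
```

In all three strains, glycoside hydrolases make up the largest class, followed by glycosyl transferases.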
Biodegradation of structural and storage polysaccharides
Cellulose and hemicelluloses are the most abundant plant structural carbon polymers found in the biosphere, and therefore, their degradation by microorganisms represents a significant part of the carbon cycle. The efficient degradation of polysaccharides requires the concerted action of many catalytic enzymes and/or noncatalytic CBMs, which facilitate the targeting of enzymes to the insoluble polysaccharides (Warren, 1996). The tundra soil Acidobacteria are versatile heterotrophs isolated using selective plant-based carbon sources (Männistö et al., 2011, 2012). In order to explore the metabolic potential of the three tundra soil strains to hydrolyze biomass polysaccharides, we analyzed their genomes for CDSs predicted to code for main-chain and side-chain cleaving enzymes of the CAZy family. The genomic data were validated by biochemical assays to bridge genome predictions to biochemical activities encoded in their genomes. We identified predicted CDSs of CAZyme families involved in breakdown of plant structural polysaccharides such as hemicelluloses, celluloses, pectin, and storage polysaccharides such as starch/glycogen (Table S3). The tundra Acidobacteria strains grew on a number of plant- and microbe-based polysaccharides as single carbon sources (Table 2).
Table 2. Comparison of carbon substrate utilization/hydrolysis by six strains of Acidobacteria
Hemicelluloses are highly complex heteropolysaccharides, and the hydrolysis of xylan-, mannan-, and arabinofuranosyl-containing hemicelluloses requires a battery of enzymes belonging to GH and CE families (Shallom & Shoham, 2003). We identified predicted CDSs encoding hemicellulolytic enzymes in the genomes of the three tundra soil strains (Table S2). These included endoxylanases represented by family GH10 and exoxylanases represented by family GH39 that successively hydrolyze xylan into short xylooligomers and xylose. In addition, CDSs for acetyl xylan esterases of carbohydrate esterase families CE1 and CE4 that hydrolyze the acetyl substituents of xylose moieties were identified in the genomes of all three tundra soil strains (Table S3). Predicted CDSs for families GH3, GH43, and GH51, which represent α-l-arabinofuranosidases required to cleave l-arabinofuranose side chains found in softwood xylans, and for GH1, GH2, and GH5, which are required for hydrolysis of β-mannan-based polymers, were also present in the genomes of the three tundra soil strains. In addition, a large number of CDSs of families GH27, GH36, and GH57, which represent α-galactosidases, and of GH1, GH3, GH30, and GH116, which represent β-glucosidases, were identified in the tundra soil strains (Table S3). In cultivation assays, degradation of xylan (from birch wood) was not detected in any of the three tundra strains as assayed by turbidity, CO2 production, or Congo red staining (Männistö et al., 2011, 2012; Table 2). Further studies are underway to assay the xylanase activities against different substrates and under different conditions.
Degradation of cellulose requires three enzyme activities, including endoglucanase, exoglucanase (or cellobiohydrolase), and β-glucosidase. Recently, cellulases were described within 13 GH families, of which GH5 and GH9 appear to have the largest number of biochemically characterized bacterial cellulases with both endo- and exocellulase activity, while no exocellulase activity is identified for GH8 in the CAZy database (Sukharnikov et al., 2011). We identified CDSs belonging to five different glycoside hydrolase families that represent cellulases: GH5, GH8, GH9, GH12, and GH51 (Table S3). CDSs for GH5 were identified in the genomes of G. mallensis and G. tundricola, but not in T. saanensis. CDS for GH9 was only identified in G. mallensis. GH9 cellulases were also present in the genomes of A. capsulatum, ‘K. versatilis’, and ‘S. usitatus’ (Ward et al., 2009). However, no CDSs for exoglucanases or cellobiohydrolase were identified in the genomes of any of the three tundra soil strains.
Carbohydrate-binding modules that are likely involved in cellulose degradation were identified in the three tundra strains (Table S2), which included CBMs binding to GH16 in both Granulicella strains and those binding to GH27 and GH36 in G. mallensis and T. saanensis genomes, but not in G. tundricola. CBMs binding to GH64 and GH55 representing β-1,3-glucanase were identified in all three tundra soil strains, while CBM6 binding to GH55 and GH16 was identified in genomes of G. mallensis and G. tundricola, respectively. CBM6 is reported to have both xylan and cellulose-binding function (Boraston et al., 2004).
Although the three tundra soil Acidobacteria contained cellulases from several glycoside hydrolase families, none of them effectively utilized cellulose when assayed by turbidity or CO2 production (Männistö et al., 2011, 2012; Table 2). No increase in turbidity or CO2 production was detected after 3 weeks of incubation in liquid culture with CMC and a small amount (100 mg L−1) of yeast extract. Subdivision 1 Acidobacteria have been linked to cellulose degradation in sphagnum peat, but the rates of cellulose degradation are extremely low (Pankratov et al., 2011). To determine whether the presence of an easily degradable substrate would trigger CMC hydrolysis, the strains were inoculated on plates containing CMC with peptone and yeast extract and with cellobiose, peptone, and yeast extract. After 3 weeks of incubation, G. mallensis strain MP5ACTX8 scored positive for CMC hydrolysis in plates with both amendments, while G. tundricola strain MP5ACTX9 produced clearing zones, indicative of CMC hydrolysis, only in plates amended with cellobiose. Terriglobus saanensis strain SP1PR4 did not produce a clearing zone on either of the plates. Further studies are needed to determine the factors that trigger cellulose utilization by these strains and by Acidobacteria in general.
Degradation of pectin, starch, and chitin
Pectin cross-links cellulose and hemicellulose fibers and is primarily degraded by a battery of enzymes that include endo- and exopolygalacturonases represented by glycoside hydrolases of family GH28, pectate lyases and oligo-d-galactosiduronate lyases of the polysaccharide lyase (PL) family, and pectin esterases of the carbohydrate esterase (CE) families CE8 and CE12 (Jayani et al., 2005; Abbott & Boraston, 2008). The genomes of all three tundra soil strains contained a large number of CDSs of the GH28 family predicted to code for polygalacturonases. We identified CDSs encoding pectate lyases of families PL1 and PL10 in G. mallensis, PL10 in G. tundricola, and PL9 in T. saanensis. CDSs for pectin methyl esterases of family CE12 were identified in the genomes of all three tundra soil strains, and pectin acetyl esterases of family CE8 were identified in T. saanensis and G. tundricola. Homologues for CE8 and CE12 were not found in ‘K. versatilis’, but were present in A. capsulatum. CDSs encoding polysaccharide lyases that represent alginate lyases of family PL5 were identified in all three tundra soil strains and of family PL7 in G. mallensis. However, homologues for alginate lyases were not identified in the genomes of A. capsulatum, ‘K. versatilis’, or ‘S. usitatus’.
Starch is hydrolyzed by the combined action of α-amylases, β-amylases, other exo-α-1,4-glucanases, and glucoamylase. In the genomes of G. mallensis, G. tundricola, and T. saanensis, we identified a large number of CDSs for glycoside hydrolases of families GH13 (n = 12, 16, 12, respectively), GH15 (n = 2, 2, 2), GH31 (n = 5, 3, 3), and GH57 (n = 2, 2, 2). Homologues of these glycoside hydrolases were also identified in A. capsulatum, ‘K. versatilis’, and ‘S. usitatus’. CDSs for the carbohydrate-binding module CBM48-GH13 were identified in all three tundra soil strains, while CBM41-CBM41-CBM48-GH13, involved in glycogen binding, was identified in the G. tundricola genome. All three strains utilized laminarin, starch, and pectin, as detected by both CO2 production and increase in turbidity (Table 2). In addition, G. tundricola strain MP5ACTX9 grew well on pullulan, and G. mallensis MP5ACTX8 grew well on lichenin, the major polysaccharide of lichens.
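The per-family counts reported above can be compared more easily once they are tabulated. The following is a minimal sketch in Python, not part of the original analysis: the counts are copied from the text, while the names `GENOMES`, `GH_COUNTS`, `totals_per_genome`, and `variable_families` are hypothetical and introduced only for illustration.

```python
# CDS counts for starch-related glycoside hydrolase families, copied from
# the paragraph above. Each tuple is ordered as in GENOMES.
GENOMES = ("G. mallensis", "G. tundricola", "T. saanensis")

GH_COUNTS = {
    "GH13": (12, 16, 12),
    "GH15": (2, 2, 2),
    "GH31": (5, 3, 3),
    "GH57": (2, 2, 2),
}


def totals_per_genome(counts, genomes=GENOMES):
    """Sum the CDS counts over all listed families for each genome."""
    return {
        genome: sum(row[i] for row in counts.values())
        for i, genome in enumerate(genomes)
    }


def variable_families(counts):
    """Return the families whose CDS count differs between genomes."""
    return sorted(fam for fam, row in counts.items() if len(set(row)) > 1)


if __name__ == "__main__":
    for genome, total in totals_per_genome(GH_COUNTS).items():
        print(f"{genome}: {total} CDSs in GH13/GH15/GH31/GH57")
    print("Families with differing counts:", variable_families(GH_COUNTS))
```

Under these inputs, only GH13 and GH31 differ in copy number between the three genomes; GH15 and GH57 are present in two copies each.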
Chitin, the second most prominent biopolymer after cellulose, is found in arthropod exoskeletons and is composed of β(1-4)-linked N-acetyl-d-glucosamine (GlcNAc). Chitinases (families GH18 and GH19) hydrolyze the β(1-4)-linkages; GH18 also includes endoglycosidases that hydrolyze the chitobiose core of N-linked glycoproteins. CDSs for GH18 chitinases were identified in the genomes of G. mallensis and T. saanensis, but not in G. tundricola. A GH19 chitinase was identified only in T. saanensis. We did not find any chitosanases (GH46) that hydrolyze chitosan. CDSs for predicted xylanase/chitin deacetylase and GH18-CE4-GT2 were identified in all three tundra soil strains. A large number of CDSs for the carbohydrate-binding modules CBM5, CBM12, CBM13, and CBM35 with chitin-binding function were identified in the G. mallensis genome. CBMs with cellulose-binding domains may also bind to chitin, given the similar structures of cellulose and chitin (Warren, 1996). Nonetheless, no chitinase activity was detected in any of the strains after a 10-day incubation with chitin azure (chitin labeled with Remazol Brilliant Violet). However, all strains were positive for chitobiase activity (Table 2) when assayed using 4-methylumbelliferyl-N-acetyl-β-d-glucosaminide as the substrate (O'Brien & Colwell, 1987).
Biosynthesis of extra-polymeric substances
A large number of gene modules representing GTs of families GT1 (n = 8, 1, 20), GT2 (n = 27, 32, 28), and GT4 (n = 22, 20, 22) were identified in genomes of all three tundra soil strains, G. mallensis, G. tundricola, and T. saanensis, respectively (Table S3). Predicted CDSs for biosynthesis of nucleotide sugars such as dTDP-l-rhamnose, GDP-mannose, cytidine 5′-monophospho-3-deoxy-d-manno-2-octulosonic acid (CMP-KDO), GDP-glucose, and other complex di-, oligo-, and polysaccharides were identified (Table S4). These include CDSs encoding for cellulose synthase (UDP-forming) (glycos_transf_2), α,α-trehalose phosphate synthase [UDP-forming] (glyco_transf_20), ADP-glucose: starch glucosyl transferase (glyco_transf_5), UDP-glucose: ceramide β-glucosyltransferase (glycos_transf_21) involved in biosynthesis of cellulose, trehalose, starch, hopanoid, and capsular/free exopolysaccharide (EPS).
Exopolysaccharide (EPS) biosynthesis
A gene cluster containing CDSs for capsular polysaccharide synthesis protein (COG3206), polysaccharide export protein (COG1596), and GTs of family GT1 was identified in the genomes of all three tundra soil strains. A gene cluster containing CDSs for EpsH (Exosortase_EpsH) (COG1368) (AciPR4_0588), EpsI (AciPR4_0589), and a tetratricopeptide (TPR_1) repeat-containing protein (AciPR4_0590) was identified in the T. saanensis genome. Three copies of the exopolysaccharide H (EpsH) gene were found in the genome of G. mallensis (AciX8_0208, AciX8_2453, and AciX8_4634), but none in that of G. tundricola. EPS includes capsular polysaccharide (CPS), as well as free extracellular polysaccharide (slime). EPS is thought to protect cells from desiccation and other environmental stresses and to serve as a cryoprotectant for enzymes of cold-adapted microorganisms, assisting in colonization of various ecological niches (Roberts, 1996; Nicolaus et al., 2010).
Cellulose biogenesis is reported for several bacteria, of which the most extensively studied is Gluconacetobacter xylinus (formerly Acetobacter xylinum) (Ross et al., 1991; Römling, 2002). Eight different proteins participate in the cellulose biosynthetic pathway and its regulation; these are UDP-glucose pyrophosphorylase, the cellulose synthase, diguanylate cyclase, phosphodiesterase PDE-A and PDE-B, and the recently discovered bacterial cellulose synthesis (bcs) operon that encodes four proteins, BcsA, BcsB, BcsZ, and BcsC. We identified predicted CDSs for a UDP-forming GT of family GT2 (Table S3), which encodes the cellulose synthase of the cellulose biosynthesis pathway (MetaCyc: PWY-1001). In the genomes of all three tundra Acidobacteria, clusters of genes were identified in close neighborhood of the cellulose synthase gene (bcsAB: AciX8_2186, AciX9_2052, AciPR4_1394, AciPR4_3357), which included cellulase (endoglucanase Y) of family GH8 (bcsZ: AciX8_2185, AciX9_2051, and AciPR4_1395), cellulose synthase operon protein (bcsC: AciX8_2184, AciX9_2050, and AciPR4_1397), and a cellulose synthase operon protein (yhjQ: AciX8_2187, AciX9_2053, and AciPR4_1393) (Fig. S3). Among other Acidobacteria, a cluster of genes, yhjQ (ACP_0074), bcsB (ACP_0075), and a putative endoglucanase Y (ACP_0076), was present in A. capsulatum. However, no homologue for bcsC was found in the A. capsulatum genome; an intergenic region spanned the corresponding position in the bcs operon. A PSI-BLAST search identified a tetratricopeptide repeat (TPR) motif, which is involved in the assembly of multiprotein complexes (D'Andrea & Regan, 2003), in all three bcsC genes. Homologues for yhjQ, bcsB, bcsC, or bcsZ were not identified in the genomes of ‘K. versatilis’ or ‘S. usitatus’.
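The clustering of the bcs genes can be illustrated directly from the locus tags listed above. The following Python sketch is illustrative only and rests on one loud assumption: that consecutive locus-tag numbers correspond to neighboring genes on the chromosome, which is a common annotation convention but is not stated in the text. The names `BCS_LOCI`, `tag_number`, and `cluster_gaps` are hypothetical. The second bcsAB tag of T. saanensis (AciPR4_3357) is left out because its number places it far from the other tags under this assumption.

```python
import re

# Locus tags of the bcs cluster genes, copied from the paragraph above.
# AciPR4_3357 (a second bcsAB tag in T. saanensis) is deliberately omitted.
BCS_LOCI = {
    "G. mallensis": {
        "bcsC": "AciX8_2184", "bcsZ": "AciX8_2185",
        "bcsAB": "AciX8_2186", "yhjQ": "AciX8_2187",
    },
    "G. tundricola": {
        "bcsC": "AciX9_2050", "bcsZ": "AciX9_2051",
        "bcsAB": "AciX9_2052", "yhjQ": "AciX9_2053",
    },
    "T. saanensis": {
        "yhjQ": "AciPR4_1393", "bcsAB": "AciPR4_1394",
        "bcsZ": "AciPR4_1395", "bcsC": "AciPR4_1397",
    },
}


def tag_number(tag):
    """Extract the numeric part of a locus tag such as 'AciX8_2184'."""
    match = re.fullmatch(r"[A-Za-z0-9]+_(\d+)", tag)
    if match is None:
        raise ValueError(f"unexpected locus tag format: {tag!r}")
    return int(match.group(1))


def cluster_gaps(loci):
    """Return tag numbers inside the cluster span that carry no listed gene.

    ASSUMPTION: tag numbers increase with gene order on the chromosome.
    """
    numbers = sorted(tag_number(tag) for tag in loci.values())
    listed = set(numbers)
    return [n for n in range(numbers[0], numbers[-1] + 1) if n not in listed]


if __name__ == "__main__":
    for genome, loci in BCS_LOCI.items():
        gaps = cluster_gaps(loci)
        status = "contiguous" if not gaps else f"unlisted tag numbers {gaps}"
        print(f"{genome}: {status}")
```

If the numbering assumption holds, the four genes are contiguous in both Granulicella genomes, whereas in T. saanensis one intervening tag number (1396) separates bcsZ from bcsC; the text does not say what, if anything, is annotated at that position.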
The gene for endo-1,4-β-glucanase (GH8) involved in cellulose biosynthesis is part of the bacterial cellulose synthesis operon in the genomes of the three tundra Acidobacteria. It is also observed in the cel operons (celABC and celDE) of the plant-colonizing bacteria Agrobacterium tumefaciens and Rhizobium leguminosarum, where cellulose is suggested to be an exopolysaccharide used for colonization of plant host cells (Matthysse et al., 1995; Ausmees et al., 1999).
Starch and trehalose biosynthesis
Glycogen and trehalose are both major bacterial storage carbohydrates used under growth-limiting conditions, when an excess of carbon source is available and other nutrients are deficient (Wilson et al., 2010). We identified CDSs for ADP-glucose type glycosyl transferase (Glycos_transf_1, Glyco_transf_5), which encodes glycogen/starch synthase (GlgA), in the genomes of G. mallensis (AciX8_3243) and T. saanensis (AciPR4_2465), but not in G. tundricola. CDSs for glucan phosphorylase (GlgC) and phosphoglucomutase (PGM), which converts glucose-6-phosphate to glucose-1-phosphate, were present in the genomes of the three tundra soil strains. In the common GlgC-GlgA pathway (reviewed by Preiss, 2009), after chain elongation to generate the linear glucan, glycogen is formed by glycogen branching enzyme (GlgB). However, CDSs for glycogen branching enzyme were not identified in any of the three tundra soil strains, nor in the two other strains of subdivision 1 Acidobacteria.
Trehalose is reported as a stress-protectant, helping bacteria to survive desiccation, cold, and osmotic stress (Freeman et al., 2010). A gene cluster involved in trehalose biosynthesis from maltose (TreS, MetaCyc: PWY-2622), which included CDSs encoding trehalose synthase (AciX8_1169, AciX9_2884, AciPR4_3489) and alpha amylase catalytic region (GH13) (AciX8_1170, AciX9_2883, AciPR4_3488), was identified in the genomes of the three tundra soil strains. A second pathway for trehalose biosynthesis from maltodextrins (TreYZ, MetaCyc: PWY-2661) was also identified, which included CDSs encoding malto-oligosyl trehalose trehalohydrolase, malto-oligosyl trehalose synthase, and glucoamylase/isoamylase of the GH15 family (AciX8_1171, AciX9_2882). A predicted CDS for α,α-trehalose phosphate synthase [UDP-forming] of GT family 20 was identified only in the genome of G. mallensis. In E. coli, trehalose-6-phosphate synthase (OtsA), trehalose-6-phosphate phosphatase (OtsB), and cold-shock proteins (Csps) are induced in response to cold shock. We identified CDSs predicted to code for the cold-shock DNA-binding domain protein CspA in the genomes of G. tundricola (n = 9), T. saanensis (n = 4), and G. mallensis (n = 3). CspA, the major cold-shock protein, belongs to a family of nine homologous proteins, CspA to CspI, in E. coli. In psychrophilic bacteria, a basal set of Csps exists and additional Csps appear with more severe cold shocks (Hebraud & Potier, 1999). In addition to the Csps (now called cold-induced proteins, CIPs) observed in mesophiles, cold acclimation proteins (Caps) are identified in psychrophilic microorganisms, where they are constitutively rather than transiently expressed at low temperatures (D'Amico et al., 2006).
Molecular analyses suggest a tremendous diversity of Acidobacteria in tundra and other soil environments, but although they are ubiquitous, the ecological role of the Acidobacteria remains elusive. Our concerted efforts led to the cultivation of several new slow-growing and fastidious cold-adapted Acidobacteria from tundra soils of northern Finland (Männistö et al., 2011, 2012). The integrated study of the taxonomic, genetic, and functional diversity of Acidobacteria is providing an ecosystem-level understanding of the metabolic networks of Acidobacteria and other species consortia involved in biogeochemical activities in tundra soil environments. We hypothesize that the harsh and changing environmental conditions have selected for a stable bacterial community dominated by Acidobacteria that is only minimally affected by temperature fluctuation and freeze–thaw cycles. Comparative genomic and physiological analysis of these Terriglobus and Granulicella species is providing insights into their roles in organic carbon utilization in Arctic tundra soils and revealing mechanisms promoting their activity and dominance. The genomes of the three tundra soil Acidobacteria contained an abundance of conserved genes/gene clusters encoding modules of the carbohydrate-active enzyme (CAZyme) family. We infer that gene content and biochemical mechanisms encoded in the genomes of these Acidobacteria strains are shaped to allow for breakdown, utilization, and biosynthesis of diverse structural and storage polysaccharides and resilience to fluctuating temperatures and nutrient-deficient conditions in Arctic tundra soils. We conclude that Acidobacteria communities are central to carbon cycling in Arctic and boreal systems and play a significant role in degradation of accumulated biomass as polar temperatures increase.
This work was supported in part by the National Science Foundation (IPY 0732956), the Academy of Finland (Grant 123725), and the New Jersey Agricultural Experiment Station. We thank Tanya Woyke (Joint Genome Institute) and her project team for sequencing and assembly of the genomes. We are greatly thankful to Lynn Goodwin (Joint Genome Institute) for technical assistance. We thank Bernard Henrissat for updating CAZyme-related information for the three tundra soil strain genomes. The work conducted by the US Department of Energy Joint Genome Institute is supported by the US Department of Energy.