Detection and in situ identification of representatives of a widely distributed new bacterial phylum


Corresponding author. Tel.: +49 (89) 2892 2370; Fax: +49 (89) 2892 2360; E-mail:


16S rRNA gene libraries were prepared by polymerase chain reaction amplification and cloning from soil samples taken periodically from a field with genetically modified plants. Sequence analyses of the cloned rDNAs indicated that 140 of them clustered apart from known bacterial phyla. Based on 31 full sequences a new phylum could be defined. It includes Holophaga foetida, ‘Geothrix fermentans’ and Acidobacterium capsulatum as the only cultured species so far. Therefore, this line of descent was named the Holophaga/Acidobacterium phylum. About 50 published partial sequences of cloned rDNAs retrieved from soil, freshwater sediments or activated sludge from different continents indicate the occurrence of further representatives of this phylum. Two specific hybridization probes were constructed for members of one of four subclusters. A careful data analysis revealed the importance and problems of identifying and dealing with artefacts such as chimeric structure when defining new phylogenetic groups based mainly upon cloned amplified rDNAs. For the first time, the presence of bacterial cells representing this group could be shown in soil, sediment, activated sludge and lake snow by in situ hybridization.


Until the introduction of modern rRNA-based methods for phylogenetic analyses of complex environmental communities as well as in situ detection and identification of microbial cells, the investigation of phylogenetic diversity was seriously hampered by the demand for pure cultures[1]. Only a small proportion of all prokaryotes have so far been cultivated. Meanwhile, the sequencing of rRNA genes directly retrieved from complex environmental samples by specific polymerase chain reaction (PCR) and cloning has become a routine technique in microbial ecology. Consequently, these environmental sequences now represent a considerable fraction of the rRNA sequence data deposited in public databases[2]. The rRNA approach has been applied by numerous research groups to investigate a variety of habitats. Not surprisingly, many hitherto unknown phylogenetic groups more or less closely related to well-known cultivated organisms were indicated by newly determined sequences. In contrast to Archaea, where a new major line of descent, i.e. the Korarchaeota, was proposed based upon environmental sequence data[3], no new phylum within the domain Bacteria has so far been defined on the basis of rRNA gene sequences retrieved from environmental samples. Recently, different authors have published 16S rRNA gene sequences derived from soil, sediment and sludge samples from different continents [4–9]. Some of these sequences indicated the existence of organisms phylogenetically related to Acidobacterium capsulatum, an organism which had been isolated from an acidic mineral environment[10]. A moderate relationship of A. capsulatum to the planctomycetes and chlamydia, the Gram-positive bacteria or the fibrobacters [9, 10], as well as a separate status[8], have been discussed. Given that all published cloned rDNA sequences with significant similarity to the homologous A. capsulatum 16S rRNA had been only partially determined (300–1300 nucleotides), the phylogenetic status of A. capsulatum and its relatives could not be clarified. In the present study, several almost complete 16S rRNA gene sequences were retrieved from soil samples. These comprehensive sequence data provide a more distinct picture of the phylogeny of A. capsulatum and related organisms.

2 Materials and methods

2.1 Extraction of nucleic acids from soil and rhizosphere samples

Extraction and purification of nucleic acids from soil samples followed a previously described method[11] with some modifications. Soil or macerated roots (5 g) together with 1 g polyvinyl polypyrrolidone, 5 mg lysozyme, and 2.5 g glass beads (diameter 0.17–0.18 mm) were suspended in 5 ml of a solution of 50 mM Tris, 100 mM NaCl and 25% (w/v) sucrose (pH 8.0) and incubated for 30 min at 37°C. Subsequently, 3 ml of a buffer containing 50 mM Tris, 50 mM EDTA, 1% (w/v) sodium dodecyl sulfate and 6% (v/v) phenol (Tris saturated) were added and the resulting solution treated in a bead beater for 1 min. After incubating the mixture at 0°C for 1 h, the nucleic acids were purified by (at least) two phenol/chloroform/isoamylalcohol (25:24:1) extractions followed by (at least) two phenol/chloroform (24:1) extractions.

2.2 In vitro amplification, cloning and sequencing of rRNA gene fragments

Almost complete 16S rRNA gene fragments corresponding to nucleotides 8–1541 of the Escherichia coli 16S rRNA molecule were amplified in vitro as described earlier[12]. The amplified rDNA was cloned into the pGEM-T vector and E. coli JM109 using the Promega pGEM-T Vector System II (Serva, Heidelberg, Germany) according to the instructions of the manufacturer. The cloned rDNA was sequenced using rDNA-specific[12] and vector-specific primers and the GATC®-Bio-Cycle Sequencing Kit (GATC GmbH, Constance, Germany) or the Thermo Sequenase Fluorescent Primer Cycle Sequencing Kit (Amersham, Braunschweig, Germany) in combination with the GATC Direct Blotting System (GATC GmbH) or the Licor 4200 Automatic DNA Sequencer (MWG Biotech, Ebersberg, Germany), respectively.

2.3 Sequence data analysis

The new 16S rRNA sequences were fitted into an alignment of about 8000 homologous full and partial primary structures using the respective automated tools of the ARB software package[13]. The sequences were screened for potential cloned PCR errors and chimeric structure by calculating likelihoods of the nucleotides at each particular alignment position and performing fractional treeing based on overlapping datasets of 300 contiguous nucleotides. Only full sequences (E. coli positions 8–1542) were used for the construction of phylogenetic trees. Distance matrix, maximum parsimony and maximum likelihood methods were applied as implemented in the ARB software package. Different datasets varying with respect to included outgroup reference organisms (sequences) as well as alignment positions were analyzed. Partial sequence data were incorporated into trees derived from full sequence data according to the maximum parsimony criteria without allowing changes of the existing tree topology using a special tool of the ARB software.
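The idea behind screening overlapping 300-nucleotide stretches can be illustrated with a minimal Python sketch. It is not the ARB procedure: instead of building a tree for each window, it simply asks which reference sequence is most similar to the query in each window, and flags the query when the nearest reference changes along the molecule. The function names, the step size of 150 and the toy sequences are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def windowed_neighbors(query: str, refs: Dict[str, str],
                       window: int = 300, step: int = 150) -> List[Tuple[int, str]]:
    """For each overlapping window, return (start column, name of most similar reference).

    All sequences are assumed to come from the same alignment (equal length).
    """
    hits = []
    for start in range(0, max(len(query) - window, 0) + 1, step):
        end = start + window
        best = max(refs, key=lambda name: identity(query[start:end], refs[name][start:end]))
        hits.append((start, best))
    return hits


def looks_chimeric(hits: List[Tuple[int, str]]) -> bool:
    """Flag a query whose nearest reference is not the same in every window."""
    return len({name for _, name in hits}) > 1


# Hypothetical toy alignment: two references that differ at every column.
refs = {"refA": "ACGT" * 150, "refB": "TGCA" * 150}
chimera = refs["refA"][:300] + refs["refB"][300:]  # 5' half from refA, 3' half from refB

chimera_hits = windowed_neighbors(chimera, refs)
clean_hits = windowed_neighbors(refs["refA"], refs)
print(chimera_hits, looks_chimeric(chimera_hits))
print(clean_hits, looks_chimeric(clean_hits))
```

A screen of this kind only works when the two source organisms are represented by sufficiently distinct reference sequences, which is the limitation discussed for intragroup chimeras in the Results.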

2.4 Specific probe design and in situ cell hybridization

Based upon the new data and the database reference sequence data, specific probes were designed using the respective tools of the ARB software package. The oligonucleotides were obtained from MWG Biotech (Ebersberg, Germany). Labelling with fluorescent dyes was performed as described earlier[14]. Cells were fixed and hybridizations were carried out at 48°C following standard protocols[14]. The hybridization solution contained 35% formamide.

3 Results and discussion

Beginning in the spring of 1993, soil and rhizosphere samples were periodically taken from the Roggenstein site near Munich (Germany) at which conventional as well as genetically modified plants were cultivated. After the extraction of nucleic acids from the samples, almost complete 16S rRNA gene fragments were amplified in vitro and cloned. The resulting gene libraries were screened by partial sequencing. The 3′- and/or 5′-terminal parts of the 16S rRNA sequences were determined in most cases. Against the background of a comprehensive rRNA sequence database in combination with the versatile ARB software, the partial sequence data containing one of the most variable parts of the molecule could generally be assigned to known phylogenetic groups. However, in each of the various gene libraries established from the Roggenstein samples, a major fraction of the clones contained sequences which shared significant to moderate sequence similarity and were clearly of bacterial origin, but could not be convincingly assigned to defined bacterial phyla.

Given that rRNA primary structures comprise a mixture of evolutionarily invariant to highly variable stretches which are informative for different levels of phylogenetic relatedness[15], partial sequence data, especially from the variable regions, allow rapid recognition of the closest neighbors of the respective organisms, provided that these neighbors are represented in the data library. However, such partial sequence comparisons cannot be used for reliable reconstruction of phylogenies[15]. Therefore, from a total of 140 clones a selection of 38 clones representing different levels of similarity was used for complete sequence determination of the rDNA. Phylogenetic analyses revealed that the new group includes three described and cultivated species for which no stable placing in the bacterial tree has yet been established. Acidobacterium capsulatum, an acidophilic chemo-organotrophic menaquinone-containing bacterium, was isolated from an acidic mineral environment[10]. A moderate relationship of A. capsulatum to the planctomycetes and chlamydia, the Gram-positive bacteria with a low DNA G+C content or Fibrobacter, as well as a separate status, were discussed by several authors [8, 10]. The same applies to Holophaga foetida[16] and ‘Geothrix fermentans’[17]. H. foetida, an obligately anaerobic organism that produces methanethiol and dimethylsulfide during growth on methoxylated aromatic compounds, was isolated from a black anoxic freshwater mud sample[16]. Phylogenetically, the organism was tentatively assigned to the δ-subclass of the Proteobacteria. However, a moderate relationship to the Gram-positive bacteria with a low DNA G+C content was also discussed[16]. Later on, ‘G. fermentans’, an Fe(III)-reducing microorganism isolated from a petroleum-contaminated aquifer, was shown to be a close relative of H. foetida[17]. This two-species branch was then discussed as a new line of descent within the bacterial domain. Several authors published partial 16S rRNA sequences retrieved from environmental samples by PCR and cloning which shared striking sequence similarities with the Acidobacterium 16S rRNA [4–9]. However, no relationship between the Acidobacterium cluster and the H. foetida-‘G. fermentans’ group had been established so far.

The comparative analysis of the complete 16S rRNA sequences retrieved from the Roggenstein soil samples during the present study and all available homologous primary structure data supports a common origin of A. capsulatum, H. foetida and ‘G. fermentans’ together with the uncultured organisms indicated by the new environmental sequences. Since H. foetida and A. capsulatum are the only validly described species, the new line of descent is called the Holophaga/Acidobacterium phylum.

Only full sequences were used for the analyses and calculations described below. These 31 full sequences have been deposited at the EBI database under accession numbers Z95707–Z95737. The phylum status of the group is shown in the tree of Fig. 1. This tree is based on distance matrix and maximum parsimony analyses of all available small subunit rRNA sequences which comprise at least 1400 nucleotides. A relative branching order cannot be unambiguously determined for the majority of the lines of descent. Furthermore, the monophyletic status of some phyla, e.g. the Gram-positive bacteria with a low DNA G+C content, is questionable. The positioning of the Holophaga/Acidobacterium phylum in the neighborhood of the planctomycetes and chlamydia has no phylogenetic significance. The monophyletic status of the new phylum, separate from all other phyla, is supported by the results obtained with different treeing methods and is also indicated by the overall sequence similarities. The binary similarity values (based on full sequences only) found for the members of the new phylum are 80% and higher, whereas the mean values for interphylum comparisons are in the range of 74–78%. The intraphylum structure is shown in the dendrogram of Fig. 2. This tree is based on a maximum parsimony analysis of the complete dataset (about 5500 sequences) of small subunit rRNA sequences comprising at least 1400 nucleotides and those complete gene library sequences which presumably are not of chimeric structure. These clones are indicated by thick branches in the tree. The topology of the tree was evaluated by distance matrix as well as maximum likelihood analyses of the cloned rDNA primary structures and representative selections of reference sequences from all other major lines of descent of Bacteria and Archaea. Branches for which a relative order could not be unambiguously determined or for which a common topology was not supported by the different treeing methods are connected by common vertical lines. Thin lines indicate cloned rDNA sequences for which a chimeric structure cannot be excluded. These clones were placed using a special ARB parsimony tool which allows the preservation of the topology of the initial tree while including new data. Currently, four subclusters (a–d; Fig. 2) can be defined within the phylum. Two of them (a and c) are represented only by cloned rDNAs so far. Subcluster b contains Holophaga and ‘Geothrix’ and cloned rDNAs, whereas Acidobacterium and other clones represent subcluster d. This subclustering is of low significance in the case of the Holophaga and ‘Geothrix’ group and generally has to be regarded as tentative, given that new environmental sequences may ‘fill up’ the ‘gaps’ between the clusters in the future. In addition to the roughly 100 partial 16S rRNA sequences determined in the present study, there are about 50 available in public databases provided by a number of authors [4–9]. The majority of these sequences comprise fewer than 1000 nucleotides. Therefore, these data were not included when constructing the tree in Fig. 2, but were added later using the special ARB tool described above. For those potential organisms which are represented by partial sequences of at least 600 nucleotides, stippled areas in Fig. 2 indicate the regions where the corresponding branches are most likely to occur.
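A binary (pairwise) similarity value of the kind quoted above can be computed from two aligned sequences as the percentage of identical residues. The following Python sketch shows one plausible way to do this; the decision to skip alignment columns containing a gap, the function names and the short toy sequences are assumptions for illustration and do not reproduce the ARB calculation.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_similarity(a: str, b: str) -> float:
    """Percentage of identical residues between two aligned sequences.

    Columns in which either sequence carries a gap ('-') are ignored
    (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def similarity_table(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Binary similarity values for every pair of sequences in an alignment."""
    return {
        (m, n): percent_similarity(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    }


# Hypothetical ten-column alignment, far shorter than a real 16S rRNA.
toy = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTAA",  # one mismatch against seq1
    "seq3": "ACGT-CGTAC",  # one gap, otherwise identical to seq1
}
table = similarity_table(toy)
for pair, value in table.items():
    print(pair, round(value, 1))
```

Applied to full-length aligned sequences, values computed this way could be compared with the ranges reported in the text (80% and higher within the phylum, 74–78% on average between phyla).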

Figure 1.

The bacterial phyla. The tree is based on a maximum parsimony analysis of the complete dataset (about 5500 sequences) of small subunit rRNA sequences comprising at least 1400 nucleotides. The bar indicates 10% estimated sequence divergence.

Figure 2.

The phylogenetic structure of the Holophaga/Acidobacterium phylum. All clones shown were constructed from Roggenstein soil samples. The tree is based on a maximum parsimony analysis of the dataset used for the tree in Fig. 1. Sequences from cloned rDNAs were included which presumably are not chimeric structures (thick lines). The topology of the tree was corrected (multifurcations) according to the results of distance matrix as well as maximum likelihood analyses. Some rDNA sequences for which a chimeric structure cannot be excluded (thin lines) were inserted into the optimized tree using a special ARB parsimony tool without affecting the initial tree topology. This tool was also used to include published cloned incompletely sequenced rDNAs[13]. The stippled areas indicate regions within which these incomplete sequences were placed. Bold-face letters indicate clones which contain the target sequences of probes IRog1 and IRog2. Subclusters are marked by a–d. The bar indicates 10% estimated sequence divergence.

The majority of cloned rDNAs retrieved from soil [7, 8] and freshwater sediment samples[9] cluster with the Acidobacterium 16S rRNA sequence. Two sequences originating from an analysis of microbial mats [18, 19] apparently group with clone RB25 which has some sequence similarity with Holophaga and ‘Geothrix’. One cloned rDNA sequence[20] was placed in subcluster a. Each partial sequence determined in the present study clearly grouped with one of the full sequences in the subclusters (data not shown). This is not the case with the published partial sequences of less than 600 nucleotides. Most of these sequences exhibit less than 90%, some up to 95% sequence similarity (within the sequenced region) to the full rDNAs. Therefore, a most probable positioning in a phylogenetic tree was not possible. However, they could roughly be assigned to the four subclusters (data not shown).

Artefacts such as PCR errors or chimeric structures may occur when analyzing environmental samples [21, 22]. False residues introduced during in vitro amplification and selected by chance with a particular clone can only be recognized as such, and consequently removed for phylogenetic analyses, if signature positions or residues usually involved in intra-rRNA base pairing are affected[15]. When variable positions not involved in helix formation are changed, false residues usually cannot be identified. However, the information content of variable positions is of lower value for elucidating major phylogenies[15]. Therefore, non-detected artefacts of this nature should not affect the phylogenetic definition of a new major line of descent too seriously. The chimeric structure of cloned rDNA can be recognized using public database facilities[23] or by fractional treeing if the sequence stretches originating from different sources (organisms) comprise a reasonable number of nucleotides and the source organisms are not closely related. When the chimeric fragments originate from closely related organisms, their influence on the quality of phylogenetic trees may be smaller than the (local) uncertainty resulting from peculiarities of the treeing methods. However, it may be rather difficult to identify chimeric structure as such when only a limited number of environmental sequences are available to define a new major phylogenetic group and the chimeric fragments originate exclusively from different representatives of this group. If inconsistencies are detected, the problem remains to decide which sequence is the correct reference and which one the chimera. In the present study it was assumed that performing multiple, independent nucleic acid extractions, amplification and cloning steps should result in a lower frequency of identical or similar chimeric structures than of correct structures. When the sequences were checked for chimeric structure, seven of the 38 clones from which complete 16S rRNA sequences were derived were identified as chimeras and removed from the dataset used for further investigations. Interestingly, only one of the cloned rDNA fragments was a chimera consisting of a 5′-terminal part from a proteobacterium of the α-subclass closely related to bradyrhizobia and the (major) 3′-part apparently from a representative of the new phylum. The other chimeric structures were exclusively formed by sequence parts from putative organisms of the Holophaga/Acidobacterium phylum. These intragroup chimeras were not detectable until a reasonable number (20–30) of complete sequences had been determined and stable subclusters (Fig. 2) could be detected by treeing analyses. Fractional treeing indicated that the rDNAs contained parts from putative representatives of different clusters. Some of the sequences might represent intra-subcluster chimeras or contain evolutionary peculiarities. These sequences are indicated by thin lines in Fig. 2.

Besides the problems of amplification and cloning artefacts, another important aspect of performing rRNA-based environmental analyses is answering the question whether the putative organisms from which the sequence data originate are really present in the environment studied. Currently, without cultivation this can only be demonstrated by applying in situ cell hybridization techniques. Most of the published environmental studies provide (more or rather less) complete sequence data but fail to perform the last step of the so-called full rRNA cycle[1]. Therefore, two hybridization probes, IRog1 (5′-AAGGCGGCATCCTGGACC-3′) and IRog2 (5′-GCAGTGGGGAATTGTTC-3′), specific for members of sequence cluster a of the Holophaga/Acidobacterium phylum (Fig. 2) were designed complementary to rRNA primary structure stretches homologous to positions 728–745 and 353–369 of the E. coli small subunit rRNA molecule, respectively. The target sites of the probes are present in the clones specified in Fig. 2 and a number of other clones for which only partial sequences were determined (data not shown). The combined application of two or more probes of identical or overlapping specificities targeting different parts of the 16S rRNA molecule is advantageous in two respects: (i) the non-chimeric nature of cloned sequences can be evaluated by double (multiple) hybridization of individual cells (at least for the primary structure region flanked by the probe target sites); (ii) the identification of targeted cells can be achieved more reliably. The underlying assumption for the latter statement is that the likelihood of identical target sites being present within rRNA primary structures from phylogenetically unrelated environmental organisms decreases with the number of different targets or probes.
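An oligonucleotide probe and its target site cover the same number of positions, so comparing probe lengths with the quoted E. coli coordinate ranges is a quick consistency check on a probe description. The Python sketch below performs this check for the two probe sequences given above and also reports their G+C content. The helper names are hypothetical, and the check says nothing about probe specificity, which depends on the reference alignment.

```python
from typing import Dict, List, Tuple

# Probe sequences (5'->3') as given in the text.
PROBES: Dict[str, str] = {
    "IRog1": "AAGGCGGCATCCTGGACC",
    "IRog2": "GCAGTGGGGAATTGTTC",
}

# E. coli 16S rRNA coordinate ranges quoted for the two target sites.
SPANS: List[Tuple[int, int]] = [(353, 369), (728, 745)]


def gc_percent(seq: str) -> float:
    """Percentage of G and C residues in an oligonucleotide."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def span_length(first: int, last: int) -> int:
    """Number of positions in an inclusive coordinate range."""
    return last - first + 1


def spans_matching_length(seq: str, spans: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Coordinate ranges whose length equals the length of the probe."""
    return [span for span in spans if span_length(*span) == len(seq)]


for name, seq in PROBES.items():
    print(name, len(seq), round(gc_percent(seq), 1), spans_matching_length(seq, SPANS))
```

By length alone, the 18-mer can only correspond to the 18-position range and the 17-mer to the 17-position range.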

In situ hybridizations were performed with the original soil and rhizosphere material as well as with samples from diverse environmental origins. Representatives of the new phylogenetic group were found in the soil and rhizosphere samples from the Roggenstein area, in soil from a coniferous forest (located north of Munich, Germany), in sediments and lake snow aggregates from Lake Constance (Germany) and even in activated sludge sampled from different wastewater plants close to Munich (Germany) (Fig. 3). In the case of Lake Constance lake snow and sediment, partial 16S rRNA sequence data, which upon comparative analysis grouped in subcluster c, could be retrieved from gene libraries (data not shown). Different morphotypes, coccoid cells, short rods and even threadlike structures could be detected. This morphological diversity is not surprising since the overall similarity values of sequences containing the target sites range from 90% to 94.6% indicating a wide phylogenetic spectrum. The targeted cells were usually found as more or less dense cell clusters within the different habitats studied.

Figure 3.

Whole cell hybridization of environmental samples with subcluster-specific probes. Scale bars, 10 μm. Upper panel: Sediment of Lake Constance (left picture) hybridized with CY3-labelled probe IRog1. Phase-contrast (left half) and epifluorescence micrograph (right half, CY3-specific filter). Lake snow aggregates from Lake Constance (right picture) hybridized with CY3-labelled probe IRog1. Phase-contrast (left half) and epifluorescence micrograph (right half, HQ filter). Middle panel: Activated sludge from wastewater treatment plant Dietersheim (Germany) hybridized with CY3-labelled probe IRog1. DAPI staining (left picture) and epifluorescence micrograph (right picture, HQ filter). Lower panel: Rhizosphere samples from rape plants cultivated at the Roggenstein area (Munich) simultaneously hybridized with probe Eub338 specific for the domain Bacteria[14] and the Holophaga/Acidobacterium subcluster-specific probes IRog1 and IRog2. Color 1 indicates cells hybridizing with the bacterial probe and IRog1 or IRog2, color 2 those hybridizing with the bacterial probe and IRog1 as well as IRog2, color 3 those hybridizing only with the bacterial probe. Other colors are due to autofluorescence.

The comparative sequence analyses of 16S rRNA gene libraries retrieved from diverse soil, sediment and aquatic environments (lake snow) indicated a major role of the corresponding organisms in the environment. This assumption is supported by the fact that these rDNAs were amplified and cloned from diverse locations in different parts of the world, such as field and wood soil as well as freshwater sediment and lake snow from southern Germany (this study), peat from northern Germany[8], forest soil from Great Britain[7], sediment from the USA[9], soil and activated sludge from Australia [4, 5], and soil from Japan[6]. For some of these environments the actual presence of bacterial cells belonging to this phylum was shown by in situ cell hybridization for the first time. Despite this ubiquitous distribution, the organisms escaped detection until recently. One of the reasons could be the difficulty of obtaining these organisms in pure culture. Only three representatives have been cultured so far [10, 16, 17].

Based upon the results obtained in the present study, it has to be assumed that the application of modern molecular techniques for the investigation of complex environments will reveal the existence of further hitherto undetected or underestimated lines of descent within the living world. However, the findings also show that comparable investigations were often based on insufficient data retrieval and analysis. For a reliable rRNA sequence-based identification and phylogenetic positioning, partial sequences of appropriate regions are only informative if close relatives are present in the reference data set. Otherwise, these data are of little value and may result in misallocations, as was the case with sequences from Australian soil samples [4, 8]. Especially when defining new major phylogenetic groups, not only full sequences but also a reasonable number of different sequences should be determined and used. It is well known that single organisms which are related to all reference organisms represented in rRNA sequence databases only at the phylum level usually cannot be stably positioned in phylogenetic trees until additional data from more closely related organisms are available. The history of Holophaga and Acidobacterium is an example. Moreover, cloned environmental sequences should be carefully screened for artefacts such as chimeric structure. An example of a chimeric sequence which was not detected as such and was described as a member of the Acidobacterium cluster is clone MC22 [4, 8]. The chimera check can successfully be performed if sequence data from cultured reference organisms (for the chimeric parts) are available; however, it may be difficult to recognize a chimeric structure as such when the individual parts originate from organisms of a phylogenetic group which is only represented by cloned amplified rDNA. Again, increasing the number of different full sequences helps to detect inconsistencies. However, when it cannot be unambiguously decided whether these inconsistencies result from experimental artefacts or from evolutionary peculiarities, in situ hybridization with multiple probes complementary to distant targets within the rRNA molecule should help.


This work was supported by grants from the Bayerische Forschungsstiftung (FORBIOSICH) and the German Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie.