Molecular characterization of the microbial species that colonize human ileal and colonic mucosa by using 16S rDNA sequence analysis


T.H.J. Florin, University of Queensland, Department of Medicine, Mater Health Services Adult Hospital, South Brisbane QLD 4101, Australia (e-mail:




Aims: To characterize the bacterial community adhering to the mucosa of the terminal ileum, and proximal and distal colon of the human digestive tract.

Methods and Results: Pinch samples of the terminal ileum, proximal and distal colon were taken from a healthy 35-year-old, and a 68-year-old subject with mild diverticulosis. The 16S rDNA genes were amplified using a low number of PCR cycles, cloned, and sequenced. In total, 361 sequences were obtained comprising 70 operational taxonomic units (OTU), with a calculated coverage of 82·6%. Twenty-three per cent of OTU were common to the terminal ileum, proximal colon and distal colon, but 14% OTU were only found in the terminal ileum, and 43% were only associated with the proximal or distal colon. The most frequently represented clones were from the Clostridium group XIVa (24·7%), and the Bacteroidetes (Cytophaga-Flavobacteria-Bacteroides) cluster (27·7%).

Conclusion: Comparison of 16S rDNA clone libraries of the hindgut across mammalian species confirms that the distribution of phylogenetic groups is similar irrespective of the host species. Lesser site-related differences within groups or clusters of organisms are probable.

Significance and Impact: This study provides further evidence of the distribution of the bacteria on the mucosal surfaces of the human hindgut. Data contribute to the benchmarking of the microbial composition of the human digestive tract.


Introduction

The human digestive tract is extensively colonized by autochthonous microbiota. The numbers are lowest in the stomach, where few organisms are adapted to the acid environment, and remain low in the small intestine due to relatively fast transit and high bile acid concentration (Cummings and Macfarlane 1991; Florin and Woods 1995). However, the total number of microbial cells (domains Bacteria and Archaea) can reach 10^13 in the caecum, which is 10 times higher than the total number of somatic cells in the entire body (Berg 1996; Tannock 2000). The total number of microbial cells in the ascending, transverse, descending and rectosigmoid colon is high, although the composition of this community changes depending on the diet and location. For example, from proximal to distal colon, the growth rates and activities of some bacteria decline as the environment becomes depleted of exogenous non-starch polysaccharide and resistant starch (Cummings and Macfarlane 1991). Evidence is accumulating that the composition and activity of the intestinal bacterial community have a significant impact on the health of the host because of their influence on nutrition, bowel habit, the physiology of the mucin barrier, and the ontogeny of the mucosal immune system (Moreau and Corthier 1988; Okada et al. 1994; Hooper and Gordon 2001; Hooper et al. 2002). In addition, the composition and activity of the intestinal bacterial community are likely to influence the pathogenesis of colorectal cancer (Van Munster and Nagengast 1993; Kado et al. 2001) and inflammatory bowel diseases (Dalwadi et al. 2001).

An understanding of complex microbial communities has been greatly enhanced by the development of molecular ecological techniques based on the 16S subunit of rRNA (Pace 1997). Early gut microbial ecology work depended on cultivation methods (Moore and Holdeman 1974), but a large majority of gut bacteria remain unculturable by known classical methods, so these methods do not reflect the true diversity of the gut bacteria (Raskin et al. 1997). A 16S rDNA clone library of a single human faecal sample has revealed extensive diversity (Suau et al. 1999). Only 24% of sequences in this single faecal sample were represented by cultured organisms, and 76% of analysed sequences belonged to unknown bacterial species. The numbers of yet to be cultured micro-organisms may be much higher still (Krause and Russell 1996; Pace 1997).

The mucosal vs luminal location of intestinal bacteria may be important for both normal health and pathology. It is believed that adhesion to intestinal cells and mucus is one of the most important properties of colonic bacteria necessary for them to colonize and proliferate inside the gastrointestinal tract (Adlerberth et al. 1996). However, from the host viewpoint, the bacteria that colonize a location adjacent, or adherent to epithelial cells, or mucus, are more likely to affect, or be affected by the host's immune system (Okada et al. 1994; Tannock 2000; Hooper et al. 2002). In addition, the adhesion of non-pathogenic bacteria to the epithelial surface may contribute to the barrier that effects host resistance to pathogenic bacteria (Hooper and DiSpirito 1985; Tannock 2000; Hooper et al. 2002). Due to enriched oxygen and other blood-borne nutrients, the microenvironment on the surface of the intestinal epithelium is different to the lumen, and hence the composition of the bacterial populations in the two locations may not be the same (Pryde et al. 1999; Zoetendal et al. 2002). In this study, we performed a 16S rDNA-based analysis of mucosal bacteria from the ileum, proximal and distal colon of normal human subjects. It is the first study to our knowledge that describes the mucosal bacterial populations at different sites of the human gastrointestinal tract in the same subject using a 16S rDNA sequence analysis.

Materials and Methods

Human biopsy collection

Pinch biopsy samples were collected during colonoscopy from the descending (part of the left or distal) colon of patient AB, a 68-year-old female with mild sigmoid colonic diverticulosis, and from the terminal ileum, proximal (right) colon and distal (left) colon of patient LC, a 35-year-old female undergoing cancer screening and found to have a normal colon. Sigmoid diverticulosis is a normal finding in older Angloceltic subjects, occurring in more than 50% (Manousos et al. 1967; Hughes 1969). Colons were well cleaned with an oral colonic lavage preparation, Picoprep™ (Pharmatel Pty Ltd, Thornleigh, Australia). Neither subject had taken antibiotics for at least 6 months. Biopsies were immediately washed in anaerobic phosphate buffer (0·1 M sodium phosphate), placed into labelled Eppendorf tubes, snap frozen in liquid nitrogen, and stored at −80°C.

Total DNA extraction

Stored biopsies were transferred to liquid nitrogen, then ground into powder using a pestle. Total DNA was extracted using the method described by Krause et al. (2001). In brief, the biopsy powder was suspended in 0·5 ml TE buffer (100 mM Tris–HCl plus 10 mM EDTA, pH 8·0) to which 100 μl of 10% SDS, and 100 μl of proteinase K (10 mg ml−1) were added. The mixture was incubated at 55°C for 150 min with occasional mixing, then 120 μl of 5 M NaCl and 65 μl of 10% CTAB (hexadecyltrimethylammonium bromide, Sigma, Castle Hill, Australia) were added, and incubated at 65°C for a further 10 min. The mixture was extracted twice with 0·7 ml of chloroform-isoamyl alcohol (24:1), and precipitated with two volumes of ethanol. After washing the DNA pellet with 70% ethanol, the pellet was resuspended in 50 μl of distilled water.

Amplification of bacterial 16S rDNA

Two sets of PCR primers (Genworks, Adelaide, Australia) were used to generate the partial and near-full length sequences of 16S rDNA. Partial-sequence amplification was performed on all biopsies; near-full sequence amplification was performed on the AB descending colon biopsy only. The partial sequence amplicons, which included two of the nine variable regions of the 16S rDNA, were ca 350 bp long, and were produced by using eubacterial primers 27f (5′ AGAGTTTGATCMTGGCTCAG 3′, E. coli positions 8–27) and 342r (5′ CTGCTGCSYCCCGTAG 3′, E. coli positions 342–358). The near-full sequence amplicons of bacterial 16S rDNA, which included all nine variable regions of the rDNA, were ca 1500 bp long and were produced by eubacterial primers 27f and 1492r (5′ TACGGYTACCTTGTTACGACTT 3′, E. coli positions 1492–1513) (Lane 1991). The PCR reaction mixture contained 50 ng DNA, 1 U of Taq DNA polymerase (Promega, Madison, WI, USA), 1 × Red Hot Taq reaction buffer, 2·5 mM MgCl2, 400 μM each dNTP and 6 pmol of each primer. PCR cycling conditions were one cycle at 94°C for 2 min, 60°C for 5 s, and 72°C for 8 s; followed by 20 cycles at 94°C for 2 s, 50°C for 5 s and 72°C for 8 s; and one cycle at 94°C for 2 s, 50°C for 5 s and 72°C for 8 min. The PCR reaction used a heated lid program with a final extension at 30°C for 30 min. Only 20 cycles of amplification were used to prevent bias towards one particular group (Wang et al. 1996; Bonnet et al. 2002).
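The primers quoted above contain degenerate positions (M, S and Y), so each primer is in practice a small mixture of concrete oligonucleotides. The following is a minimal sketch, not part of the original protocol, that expands the three primer sequences using the standard IUPAC ambiguity codes; the function name expand_primer and the dictionary names are hypothetical.

```python
from itertools import product

# Subset of the IUPAC nucleotide ambiguity codes needed for these primers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC",  # A or C
    "S": "CG",  # C or G
    "Y": "CT",  # C or T
}

# Primer sequences as quoted in the text (5' to 3').
PRIMERS = {
    "27f": "AGAGTTTGATCMTGGCTCAG",
    "342r": "CTGCTGCSYCCCGTAG",
    "1492r": "TACGGYTACCTTGTTACGACTT",
}


def expand_primer(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        variants = expand_primer(seq)
        print(name, len(seq), "nt,", len(variants), "variant(s)")
```

Counting the variants is a quick way to see how much degeneracy each primer carries; it says nothing about amplification efficiency, which depends on the template.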

The concentration of amplified DNA was determined densitometrically on 0·8% agarose gels. As the yield of both partial (350 bp) and near-full (1500 bp) sequence PCR products was low, two batches of PCR products (100 μl) were pooled together and concentrated 10× by ethanol precipitation. Briefly, 100 μl of PCR product was mixed with 10 μl of 3 M sodium acetate (pH 4·6) and 200 μl of ethanol. After centrifuging at room temperature for 15 min, the pellet was washed with 70% ethanol and redissolved in 20 μl of distilled water.

Construction of 16S rDNA clone libraries

The PCR product was ligated into the pCR 2.1-TOPO™ vector (Invitrogen, Carlsbad, CA, USA), then transformed into competent E. coli cells (One Shot™ E. coli, Invitrogen) by chemical transformation as described by the manufacturer. Recombinant cells were grown on Luria-Bertani (LB) agar plates which contained 50 μg ml−1 ampicillin, X-Gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside) and isopropyl-β-D-thiogalactopyranoside (IPTG). One hundred to 150 white colonies were picked from each cloning library and grown in LB broth containing 50 μg ml−1 ampicillin.

Plasmid DNA was extracted by a rapid plasmid DNA extraction method (Sambrook et al. 1989). Briefly, 0·5 ml of overnight culture was mixed with 0·5 ml of phenol:chloroform (1:1). After mixing, the mixture was centrifuged at 12 800 g for 5 s, then 300 μl of aqueous phase was removed to a clean Eppendorf tube and mixed with 300 μl of isopropanol. Following storage at −20°C for 5 min, plasmid DNA was precipitated by centrifuging at 12 800 g for 10 s. The pellets were finally resuspended in 50 μl of distilled water.

Restriction fragment length polymorphism (RFLP) analysis

The diversity of the clone libraries was initially examined using RFLP analysis and expressed as the percentage of a specific operational taxonomic unit (OTU). Although PCR products of two different lengths were generated, the RFLP database used in our experiment is based on partial 16S rDNA sequences covering the variable regions V1 and V2. Therefore, RFLP analysis for clone libraries with near-full sequence (ca 1500 bp) inserts was conducted with partial sequences using the PCR primers 27f/342r. In detail, a 35-cycle PCR reaction was performed with near-full sequence 16S rDNA using plasmid-targeted primers (M13 forward: 5′ GTAAAACGACGGCCAG 3′ and M13 reverse: 5′ CAGGAAACAGCTATGAC 3′), then the products were further amplified with the bacterial primer pair 27f and 342r to produce the final ca 350 bp bacterial 16S rDNA. RFLP patterns were obtained by digesting the 350-bp PCR products with restriction endonucleases Dde I and Alu I (Promega, Madison, WI, USA) and analysed by electrophoresis in 3·0% low melting point agarose gel (Promega). By comparing the patterns from each colony pick, the OTU were identified based on individual RFLP patterns, which were characterized further by DNA sequencing.
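The RFLP screen described above can be mimicked in silico once a clone sequence is known. The sketch below is illustrative only: it assumes the standard recognition sites for Alu I (AG^CT) and Dde I (C^TNAG), a complete digest, and by default a double digest, which the text does not specify. The names ENZYMES, cut_positions and rflp_pattern are hypothetical.

```python
import re

# Recognition site (as a regular expression) and the offset of the cut
# within the site on the top strand: AluI cuts AG^CT, DdeI cuts C^TNAG.
ENZYMES = {
    "AluI": ("AGCT", 2),
    "DdeI": ("CT[ACGT]AG", 1),
}


def cut_positions(seq, enzyme):
    """Return the cut coordinates of one enzyme in an upper-case sequence."""
    site, offset = ENZYMES[enzyme]
    # A lookahead reports overlapping sites as well.
    return [m.start() + offset for m in re.finditer("(?=" + site + ")", seq)]


def rflp_pattern(seq, enzymes=("AluI", "DdeI")):
    """Fragment lengths, longest first, after a complete digest."""
    seq = seq.upper()
    cuts = sorted({pos for name in enzymes for pos in cut_positions(seq, name)})
    bounds = [0] + cuts + [len(seq)]
    return sorted((end - start for start, end in zip(bounds, bounds[1:])),
                  reverse=True)


if __name__ == "__main__":
    # Toy insert with one AluI site and one DdeI site (not a real clone).
    toy = "AAAA" + "AGCT" + "TTTT" + "CTGAG" + "CC"
    print(rflp_pattern(toy))
    print(rflp_pattern(toy, enzymes=("AluI",)))
```

Clones whose predicted fragment lists match would be grouped into the same provisional OTU; fragments of equal length cannot be told apart on a gel, so sequencing remains the confirmatory step, as in the text.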

DNA sequencing

Plasmid inserts from individual OTU were obtained by PCR using M13 forward and reverse primers, and purified using the Macherey-Nagel NucleoSpin™ DNA purification kit (Integrated Sciences, Sydney, Australia). The sequencing reaction for partial sequences contained the ABI Big Dye Terminator sequencing kit (Perkin-Elmer, Vernon Hills, IL, USA) and primer M13 reverse. Three primers, M13 forward, M13 reverse and primer 907, were used for near-full sequencing. Sequencing was performed on an automated ABI 373A sequencer at the Australian Genome Research Facility (University of Queensland, St Lucia, Australia); reactions were done in duplicate when more than one OTU was believed to be the same.

Phylogenetic analysis

The two sets of partial (350 bp) and near-full (1500 bp) rDNA sequences were compared directly with those in GenBank by BLAST™ searches. Sequences with ≥99% similarity (<1% divergence) were designated as the same species. Only sequences that corresponded to domain Bacteria underwent further phylogenetic analysis, as a few human genome sequences were also detected. All sequences and their closest relatives were aligned using CLUSTALW™. Phylogenetic analysis was carried out using the TREECON™ phylogenetic analysis package. The SIMILARITY and DNADIST functions were used to analyse linkage and distance with the Kimura correction. Rooted trees were produced from distance matrices by the Neighbour-joining program. The stability of branches was checked by bootstrapping. Sequences were screened for possible chimeras with CHIMERA_CHECK (Maidak et al. 1997). Confirmed chimeric sequences were excluded from further analysis.
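To make the distance step concrete, the sketch below computes a pairwise distance between two aligned sequences. It assumes that the "Kimura correction" refers to the Kimura two-parameter model, d = −(1/2) ln[(1 − 2P − Q)√(1 − 2Q)], where P and Q are the proportions of transitions and transversions; the text does not state which Kimura model was used, and this is not the TREECON implementation. The function name k2p_distance is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences.

    Alignment columns containing a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy alignment with a single A<->G transition in ten columns.
    print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))
```

For closely related sequences the corrected distance is only slightly larger than the raw proportion of mismatches, which is why similarity percentages and distances give much the same picture at the ≥99% level.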

Nucleotide sequence accession numbers

The partial and near-full 16S rDNA sequences of isolates used in the phylogenetic analysis have been deposited in the GenBank under the acronym adhumuc for ‘adult human mucosal’ preceding the accession numbers, which were from AF499828 to AF499911. The reference strains used in phylogenetic analysis were also from the GenBank database.

Ecological statistical methods

The percentage of coverage of the diversity by the clone libraries was calculated with the formula [1 − (n/N)] × 100, where n is the number of molecular species represented by one clone (single-clone OTU) and N is the total number of sequences (Good 1953). The composition and diversity of different clone libraries were compared by using the Ochiai index equation (Ludwig and Reynold 1988). The Ochiai index ranges from 0 (most dissimilar) to 1 (most similar). Correlation analysis was conducted to compare the effects of partial and near-complete sequences on cloning bias. No post hoc analyses were conducted given that the correlation was high (r > 0·95).
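The two ecological statistics can be sketched in a few lines. The example below computes Good's coverage as the percentage of clones that do not belong to single-clone OTU, and the Ochiai index in its presence/absence form, shared OTU divided by the square root of the product of the OTU counts of the two libraries. Whether the original analysis used presence/absence or abundance data is not stated, so the presence/absence form is an assumption, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def goods_coverage(otu_assignments):
    """Good's coverage (%) from a list giving the OTU of each clone."""
    counts = Counter(otu_assignments)
    total = sum(counts.values())                      # N: all clones
    singletons = sum(1 for c in counts.values() if c == 1)  # n: single-clone OTU
    return (1 - singletons / total) * 100


def ochiai_index(otus_a, otus_b):
    """Ochiai similarity between two libraries (presence/absence of OTU)."""
    a, b = set(otus_a), set(otus_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


if __name__ == "__main__":
    # Toy library: 10 clones, two of which are single-clone OTU.
    library = ["otu1"] * 5 + ["otu2"] * 3 + ["otu3", "otu4"]
    print(goods_coverage(library))
    print(ochiai_index({"otu1", "otu2", "otu3", "otu4"},
                       {"otu3", "otu4", "otu5", "otu6"}))
```

In this form the Ochiai index is 1 when two libraries contain exactly the same OTU and 0 when they share none, matching the range given in the text.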

Ethical considerations

The study was approved by the Mater Health Services Research Ethics Committee.


Results

Comparison of partial and near-full sequence bacterial 16S rDNA clone libraries

We first compared the precision of a partial sequence screening of bacterial diversity with a near-full sequence screen, using the two different insert lengths to determine if there was a cloning bias. The concentrations of both lengths of PCR amplicon were similar, but both needed to be concentrated 10 times to achieve an appropriate concentration for cloning. A total of 110 clones from the 1500 bp and 80 from the 350 bp 16S rDNA AB distal colon (left colon) clone libraries were picked for RFLP analysis.

An estimation of the bacterial diversity in each library was made by plotting the cumulative frequency of the unique OTU as a function of the number of clones analysed. The results presented in Fig. 1 indicated that the two AB distal colon libraries had a similar overall diversity and that the sequence length did not influence the analysis (correlation analysis, r = 0·98). The phylogenetic analysis of the OTU showed that Bacteroidetes, Clostridium cluster XIVa (Collins et al. 1994) and Proteobacteria were the predominant bacterial groups identified from both of the AB clone libraries (Ochiai index 0·69). A similar percentage of recovery occurred in each major group from the partial and near-full clone libraries, which suggests that the PCR-based 16S rDNA cloning technique for analysis of the mucosal bacteria was stable and reproducible. The percentages of predominant bacterial groups from AB distal colon – both clone libraries combined in the table because they are so similar – are shown in Table 1.
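The cumulative OTU plot described above is straightforward to compute from the order in which clones were analysed. The following is a minimal sketch with made-up clone data; the function name cumulative_otu_curve is hypothetical, and the curve depends on the order of the clones, which is assumed here to be the order of picking.

```python
def cumulative_otu_curve(clone_otus):
    """Number of distinct OTU seen after each successive clone."""
    seen = set()
    curve = []
    for otu in clone_otus:
        seen.add(otu)
        curve.append(len(seen))
    return curve


if __name__ == "__main__":
    # Toy series of six clones assigned to three OTU.
    clones = ["A", "B", "A", "C", "B", "A"]
    for n_clones, n_otus in enumerate(cumulative_otu_curve(clones), start=1):
        print(n_clones, n_otus)
```

A curve that flattens as more clones are added suggests that most of the OTU present in the library have already been sampled.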

Figure 1.

Comparative biodiversity of distal (descending) colonic mucosal bacteria from partial (▴) and full (▵) sequence 16S rDNA clone libraries of patient AB. The results were derived from RFLP analysis. The cumulative OTU is expressed as a function of the total number of clones that have been analysed

Table 1.  Summary of bacterial diversity from mucosa of human terminal ileum and colon (proximal and distal) obtained by 16S rDNA analysis

Group                                  LC* (%)   AB† (%)   Overall (%)
Alphaproteobacteria                    0         6·7       3·3
Betaproteobacteria                     1·4       6·2       3·8
Gammaproteobacteria                    1         26·7      13·8
Bacteroidetes (Bacteroides CFB)        38        17·3      27·7
Clostridium cluster I                  0         1·3       0·6
Clostridium cluster IV                 0         17·9      8·9
Clostridium cluster IX                 1         1·8       1·4
Clostridium cluster XI                 13·7      0         6·9
Clostridium cluster XIVa               34·1      15·3      24·7
Clostridium cluster XVIII              10·7      0         5·4
Bacillus-Lactobacillus-Streptococcus   1         1·3       1·1

*Samples taken from terminal ileum, proximal and distal colon of a healthy 35-year-old female. Partial 350 bp sequences.
†Samples taken from the descending colon of a 68-year-old female with mild sigmoid colonic diverticulosis. Sequence lengths were both 350 and 1500 bp.

Comparison of 16S rDNA clones of mucosal bacteria from terminal ileum, proximal colon and distal colon

In the previous section, we showed that analysis of partial 16S rDNA covering V1 and V2 regions revealed the diversity of the human mucosal bacterial community as accurately as near-full sequence rDNA. Therefore, to compare the complexity of the human mucosal bacterial community vs intestinal site, three partial sequence 350-bp clone libraries were constructed using biopsy DNA from terminal ileum, proximal colon and distal colon of the healthy 35-year-old subject, LC (Fig. 2).

Figure 2.

Comparative biodiversities of mucosal bacteria from terminal ileum (▴), proximal colon (inline image) and distal colon (▵) 16S rDNA clone libraries of patient LC. The results were derived from RFLP analysis. The cumulative OTU is expressed as a function of the total number of clones that have been analysed

The composition of the clone libraries was compared by OTU and phylogenetic analyses of RFLP, and then confirmed with phylogenetic analysis of sequences. Twenty-seven OTU were identified from the terminal ileum, 30 OTU from the right colon, and 41 OTU from the left colon. Twenty-three per cent of OTU were common to the three clone libraries, 14% of OTU were unique to the terminal ileum, and 43% of OTU were only found in the proximal or distal colon. There appeared to be an internally consistent proximal-to-distal gradient in which the number of OTU increased and the shared OTU were least between the most distant sites, terminal ileum and distal colon (Ochiai index 0·38). The biodiversity of the ileal clone library appeared closer to that of the proximal colonic clone library (Ochiai index 0·57), which in turn was closer to the distal colonic clone library (Ochiai index 0·62).
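The shared and site-specific percentages reported above are set operations on the OTU lists of the three libraries. The sketch below shows one way to compute them, assuming that the denominator is the number of distinct OTU found across all three libraries; the text does not state the denominator, and the function name site_overlap and the toy OTU labels are hypothetical.

```python
def site_overlap(ileum, proximal, distal):
    """Percentages of OTU shared by all sites, ileum-only, and colon-only."""
    ileum, proximal, distal = set(ileum), set(proximal), set(distal)
    colon = proximal | distal
    all_otus = ileum | colon

    def pct(subset):
        return 100 * len(subset) / len(all_otus)

    return {
        "common_to_all": pct(ileum & proximal & distal),
        "ileum_only": pct(ileum - colon),
        "colon_only": pct(colon - ileum),
    }


if __name__ == "__main__":
    # Toy OTU lists, not the study data.
    result = site_overlap(ileum={1, 2, 3}, proximal={2, 3, 4}, distal={3, 4, 5})
    for label, value in result.items():
        print(label, value)
```

The three categories need not sum to 100%, because OTU shared by the ileum and only one colonic site fall outside all of them.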

The major bacterial phylogenetic groups and the percentage of total analysed clones from all three intestinal sites of LC were not dissimilar overall (Ochiai indices 0·38–0·62, Figs 3–6), and are presented in aggregated form in Table 1. Most species were in Bacteroidetes (Cytophaga-Flavobacteria-Bacteroides), and Clostridium clusters XIVa, XI and XVIII, with species in these major groups of bacteria presenting consistently in all three of the clone libraries. The Ochiai index comparing the composition and diversity of the partial sequence LC distal and AB distal colonic clone libraries was 0·25.

Figure 3.

Phylogenetic tree derived from analysis of 16S rDNA sequences from human gut mucosal biopsy samples for Bacteroidetes (CFB). The trees for this figure and Figs 4–6 were constructed with the Neighbour-joining program. Bootstrap values are based on 100 replications. Sequences derived from the database are shown in italics. Experimental sequences are in bold and are named on the basis of origin and OTU number. (Thus, in the naming, subject = LC or AB; site = terminal ileum TI, proximal (right) colon = RC, or distal (left) colon = LC; f = near-full sequence; number = experimental OTU number; number in brackets = number of OTU clones). Methanobacterium formicicum is used as the outgroup sequence. The scale bar represents substitutions per 100 nucleotides

Figure 4.

Phylogenetic tree derived from analysis of 16S rDNA sequences from human gut mucosal biopsy samples for Clostridium cluster XIVa

Figure 5.

Phylogenetic tree derived from analysis of 16S rDNA sequences from human gut mucosal biopsy samples for Clostridium clusters I, IV, IX, XI and XVIII

Figure 6.

Phylogenetic tree derived from analysis of 16S rDNA sequences from human gut mucosal biopsy samples for the members of Proteobacteria

Phylogenetic analysis of 16S rDNA sequences

Bacteroidetes. In the three LC libraries, a total of 80 clones, comprising 26 clones from terminal ileum, 27 clones from right colon and 27 from left colon, were allocated by BLAST search of the GenBank database to two clusters of Bacteroidetes. This accounted for 39% of the LC clones. Figure 3 shows the relationship of these clones to reference species in Bacteroidetes. The Bacteroides fragilis cluster contained the most clones. A total of 69 clones (88%) were identified as B. vulgatus with >98·8% sequence similarity. Of the remainder, three clones from LC terminal ileum and left colon libraries were identified (>99·1% sequence similarity) as the uncultured human fecal bacterium adhufec 51 (gi/6456097), which is closest to B. caccae, and three clones from the right colon and left colon libraries, which were closest to B. thetaiotaomicron (sequence similarity 96%), may belong to a novel species. Five sequences from LC left colonic and terminal ileal libraries, which were placed in a single robust cluster (bootstrap value of 100%), had no closely related sequences in the public database, with BLAST search showing them closest to B. distasonis (sequence divergence 5·1%).

The phylogenetic distribution of sequences from the two AB left colon libraries is also shown in Fig. 3. A total of 27 clones fell inside the Bacteroidetes, of which 26 clones belonged to the B. fragilis cluster. Phylogenetic analysis indicated that 92% of these clones were represented by B. vulgatus (sequence similarity >98·6%). Two clones were close to the uncultured bacterium HUCC30 (sequence similarity 98·8%), which was previously isolated from human colon, and B. ovatus (sequence similarity 94·6%). Only one sequence was found in the vicinity of B. distasonis (sequence similarity 97·6%).

Clostridium (Firmicutes). The majority of clones of all libraries were in Clostridium (Collins et al. 1994). There was a total of 122 LC sequences in Clostridium, which comprised 41 sequences from the LC terminal ileum, 41 sequences from LC right colon, and 40 sequences from LC left colon library. In addition, there were 18 sequences from the AB partial 16S rDNA left colon library and 35 sequences from the AB near-full sequence 16S rDNA left colon library. As shown in Figs 4 and 5, the phylogenetic distribution of cloned sequences from LC libraries is remarkably different from the AB libraries. Cluster XIVa (Clostridium coccoides), cluster XI (Clostridium bifermentans), cluster XVIII (Clostridium ramosum) and cluster IX (Selenomonas sputigena) were represented in LC libraries, while sequences from AB libraries were mainly found in cluster XIVa, cluster IX, cluster IV (Clostridium leptum) and cluster I (Clostridium perfringens).

In general, it was cluster XIVa that contained most sequences from both AB and LC libraries. Most of these sequences were closest to uncultured bacteria published in GenBank. Four of the cluster XIVa OTU from the AB left colon libraries designated as ABLCf20, ABLC21, ABLCf89 and ABLC75 were affiliated with butyrate producing bacteria published by Barcenilla et al. (2000) (butyrate producing T2-145 gi/6899961; butyrate producing T1-815 gi/12406799; butyrate producing A2-231 gi/13548352; and butyrate producing L1-83 gi/12331176, respectively) with <2% sequence divergence (Fig. 4).

Within Clostridium cluster XIVa (Fig. 4), 19 clones from LC terminal ileum, 10 clones from LC left colon, 14 clones from LC right colon and three clones from the AB partial 16S rDNA clone library had near-identical sequences (similarity 99·7%) to the uncultured bacterium mpn-group 24, near Eubacterium formicigenerans. One clone from the AB partial 16S rDNA clone library, three clones from the AB near-full sequence 16S rDNA clone library and 11 clones from LC libraries had similar sequences (similarity > 98·6%) to the uncultured bacterium AF54. Six clones from three LC libraries were closest to uncultured human fecal bacterium 66·25 (similarity 98·8%). Seven clones from LC libraries had sequences closest to uncultured human faecal bacterium 71·25 (sequence similarity 97·7%). Three cluster XIVa LC right colonic clones were near-identical to Ruminococcus gnavus (sequence similarity 99·7%), a known mucolytic bacterium (Hoskins et al. 1985). A further six clones, four ABLCf 6 and two ABLCf 11, were in the vicinity of Lachnospira pectinoschiza and Clostridium xylanolyticum respectively, with sequence divergence 5·4%.

In addition to numerous sequences in cluster XIVa, many LC sequences also fell in Clostridium cluster XI and cluster XVIII (Fig. 5). Twenty-two clones had sequences closest (similarity 99·7%) to Clostridium ramosum in cluster XVIII and 28 clones had sequences closest (similarity 98·2%) to uncultured bacterium F17 isolated from swine faeces (gi/8347693), which phylogenetically groups with Clostridium mayombei, located in cluster XI. There were two LC clones near-identical (similarity 99·2%) to Veillonella atypica.

Six other clostridial OTU, comprising 32 clones from the AB left colonic clone libraries, were mainly allocated to clusters IV, I and IX. Four of these clones were near-identical to butyrate-producing bacterium A2-165 (similarity 99·1%), which is close to Faecalibacterium prausnitzii in cluster IV, while another two clones may belong to a new species as there was only 91·1% sequence similarity to Ruminococcus bromii. The other 22 sequences belonging to cluster IV formed two OTU that branched deeply. BLAST data showed the closest relatives were uncultured bacterium CB25 (gi/16549091) and Anaerofilum pentosovorans (similarity 94·1%). In clusters I and IX, two AB clone sequences were closest (similarity 98·6%) to Clostridium perfringens, and two clone sequences were identified (similarity 99·2%) with Veillonella atypica.

Proteobacteria. A total of 78 clones, comprising mainly 36 AB near-full sequence and 37 AB partial sequence 16S rDNA left colonic clones, were in Proteobacteria; only five LC clones were related to this heterogeneous group. Based on phylogenetic analysis, the Proteobacteria clones fell into three subdivisions (Fig. 6). Most clones were in Gammaproteobacteria. Twenty-five AB near-full sequence and 16 AB partial sequence 16S rDNA clones had similar sequences to Shigella sonnei and S. flexneri with only 0·6% and 1·7% sequence divergence respectively, and thus accounted for the majority of Proteobacteria sequences. One AB clone was near Haemophilus parainfluenzae, with sequence divergence of 4·2%, and two LC clones were identified with H. influenzae (sequence similarity 99·0%). Three sequences were affiliated with Stenotrophomonas maltophilia (sequence similarity 98·6%), and one sequence was identified with uncultured Bihiii 13 (sequence similarity 99·0%), which was isolated from industrial waste samples.

One clone was identical to Leptothrix cholodnii (sequence similarity 100%) and six clones belonged to an uncultured environmental bacterium, D100 (sequence similarity 98·8%), adjacent to Herbaspirillum lemoignei in Betaproteobacteria. The sequences of two OTU, ABLC72 and ABLC15, were near uncultured bacterium S26-9 and CDC group Ivc-2 JHH 1448, respectively, but the sequence divergences of 6·1 and 8·3%, respectively, suggest that these two clones belong to novel species. There was one LC OTU that was near-identical to Burkholderia cepacia (sequence similarity 99·4%).

Three clones from the 350-bp clone library and four clones from the 1500-bp clone library of AB were closest to Methylobacterium sp., and three 350-bp AB clones and one 1500-bp AB clone were closest to Sphingomonas in Alphaproteobacteria.

Other bacteria. Two OTU from the 350-bp AB clone library were closest to Streptococcus pneumoniae (sequence similarity 98·8%) and two OTU from the terminal ileum of LC had sequences closest to S. salivarius.


Discussion

The human digestive tract harbours a highly complex and abundant community of micro-organisms (Tannock 2000; Hooper and Gordon 2001). We are in contact with components of this microbiota from birth, yet little is known about their influence on the normal physiology and pathology of the host (Hooper et al. 2002). Such an influence was recently demonstrated when germfree mice were colonized with B. thetaiotaomicron, a prominent component of the normal mouse and human intestinal microflora. Global intestinal transcriptional responses to colonization were observed with DNA microarrays, and revealed that this commensal bacterium modulated expression of genes involved in several important intestinal functions including nutrient absorption, mucosal barrier fortification, xenobiotic metabolism, angiogenesis, and postnatal intestinal maturation (Hooper et al. 2001). These intestinal responses to bacteria are expected to have a profound effect on the pathogenesis of diseases affecting the digestive tract. However, before significant progress can be made in these areas, benchmarking of the full complement of microbiota in normal human subjects is required.

Our study, which sampled two subjects at more than one site in the digestive tract, revealed that the most representative groups were the Bacteroidetes and Clostridium cluster XIVa (Table 1). This was true for both individuals and appears to be supported by the evidence from other studies (Table 2). The 16S rDNA clone libraries of Hold et al. (2002), who sampled the mucosa of the colon of three subjects, indicated that the Clostridium XIVa cluster was most dominant (46·3%) followed by the Bacteroidetes (Bacteroides CFB) cluster (26·4%). A similar study on the faeces of one 45-year-old individual (Suau et al. 1999) revealed that the Clostridium XIVa cluster (44%) was most dominant, followed by the Bacteroidetes (30·9%). Sghir et al. (2000), using northern blot analysis, demonstrated that Bacteroidetes in faecal flora can vary from 20 to 52% between individuals.

Table 2.  Comparative phylogenetic distribution of microbial species in the hindgut of the human, horse and pig as determined by 16S rDNA analysis

Group                              Human*    Horse†         Human‡    Pig§       Human¶
Sample type                        Mucosa    Mucosa/lumen   Mucosa    Contents   Faeces
Proteobacteria (%)                 20·9      2              2·7       5·3        1·8
Bacteroidetes (%)                  27·7      20             26·4      11·2       30·9
Clostridium cluster IV (%)         8·9       8              14·5      29·1       20·1
Clostridium cluster XIVa (%)       24·7      37             46·3      33·3       44
Clostridium other clusters (%)     14·3      14             5·4       4·0        2·0
Other sequences (%)**              2·4       17             4·7       4·8        0·5
Number of sequences                361       272            110       4720       284
Coverage (%)††                     82        ∼80            ∼80       97·8       85

*Samples taken from the mucosa of the terminal ileum, proximal and distal colon of a 35 and a 68 y human subject (this study).
†Samples taken from the colonic mucosa and lumen of the horse (Daly et al. 2001). Insufficient information to calculate coverage accurately.
‡Mucosa samples taken from the colonic mucosa from three elderly human subjects (70–82 y) (Hold et al. 2002). Insufficient information to calculate coverage accurately.
§Ileum, caecum, and colon contents (Leser et al. 2002).
¶Human faecal sample from a single 45 y subject (Suau et al. 1999).
**Other sequences comprise sequences not falling within the major groups described in the table.
††See Materials and Methods.

The study of Hold et al. (2002), and the present investigation, did not encounter any bifidobacterial sequences. Bifidobacteria are estimated by culture to comprise 10⁹/g faeces (Gibson et al. 1995). Not all sequences are amplified at exactly the same efficiency with a given set of primers, which may bias the clone libraries. However, the results have been consistent in the molecular studies, even though they employed different primer sets. In addition, we find bifidobacteria in other normal subjects (unpublished PCR-clone data). Bifidobacteria, which are relatively easy to culture, may be overestimated by traditional techniques. They have previously been detected by fluorescent in situ hybridization and were estimated at not more than 3% of the total bacterial population (Franks et al. 1998). On the other hand, it is conceivable that random PCR-amplification may be less efficient for the detection of the rarer microbiota, because the rarer sequence amplicons may be swamped by the amplicons of the more common sequences. This is why the PCR-cycle number should be kept as low as possible to maximize the recovery of OTU with the random cloning technique (Bonnet et al. 2002).

In our study, clone libraries were made from three different intestinal sites. It is not possible to conclude on the basis of data from one individual that differences in the three clone libraries were due to site, rather than due to small sample size. Zoetendal et al. (2002), who compared sites within the colon of 10 subjects, did not detect a site-related difference with DGGE. However, the findings in the present phylogenetic investigation were internally consistent with a proximal-distal gradient in which the number of OTU increased and the shared OTU were least between most distant sites.
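The comparison of OTU shared between sites can be illustrated with a minimal sketch. The OTU identifiers, the site names and the function names below are hypothetical and are not data or methods from this study; the Jaccard index is used here only as one convenient measure of overlap.

```python
from itertools import combinations


def shared_by_all(site_otus):
    """Return the OTU found at every site (site_otus maps site -> set of OTU ids)."""
    return set.intersection(*site_otus.values())


def unique_to_site(site_otus, site):
    """Return the OTU found at `site` and at no other site."""
    others = set().union(*(otus for s, otus in site_otus.items() if s != site))
    return site_otus[site] - others


def jaccard(a, b):
    """Jaccard index: shared OTU divided by all OTU seen at either site."""
    return len(a & b) / len(a | b)


# Hypothetical toy libraries, NOT the OTU recovered in this study.
toy = {
    "terminal_ileum": {1, 2, 3, 4},
    "proximal_colon": {2, 3, 5, 6},
    "distal_colon": {3, 5, 6, 7, 8},
}

pairwise = {
    (s1, s2): jaccard(toy[s1], toy[s2]) for s1, s2 in combinations(toy, 2)
}
```

In this toy example the overlap is smallest between the terminal ileum and the distal colon, the pattern expected of a proximal-distal gradient. With only one or two subjects, such a pattern cannot be separated from sampling noise, which is the limitation noted above.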

In our study, we obtained 361 sequences with 70 unique OTU that represent a calculated coverage of 82%. This means that the probability of an additional cloned sequence falling into a not yet observed OTU is 18% (Good 1953). Coverage gives an estimate of how much of the biodiversity in the sample was cloned with the methodology used. Another estimate was obtained by plotting the cumulative number of OTU as a function of the number of clones sequenced (Figs 1 and 2). Using this estimate, the asymptote would represent 100% coverage, and in both individuals, either using near-full or partial sequences, a significant proportion of the biodiversity was covered. These estimates are useful and indicate how much additional information is obtained by doing more sequencing. For example, the study of Leser et al. (2002) (Table 2) obtained 4720 sequences from the pig hindgut with a coverage of 97·8%. In essence, as one gets closer to the asymptote, vast amounts of additional sequence are needed for relatively small gains in new information.
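The two estimates discussed above can be sketched as follows. This assumes coverage is computed with Good's estimator, C = 1 − n1/N, where n1 is the number of OTU represented by a single clone and N is the total number of clones; the citation of Good (1953) suggests this form, but the formula actually used is the one given in Materials and Methods. The clone labels and function names are hypothetical.

```python
from collections import Counter


def goods_coverage(otu_labels):
    """Good's coverage C = 1 - n1/N for a list of per-clone OTU labels."""
    counts = Counter(otu_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no clones supplied")
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1.0 - singletons / total


def accumulation_curve(otu_labels):
    """Cumulative number of distinct OTU after each successive clone."""
    seen = set()
    curve = []
    for label in otu_labels:
        seen.add(label)
        curve.append(len(seen))
    return curve


# Hypothetical toy library of seven clones, NOT data from this study.
toy_clones = ["A", "A", "B", "C", "A", "B", "D"]
coverage = goods_coverage(toy_clones)    # 1 - 2/7, two singletons (C and D)
curve = accumulation_curve(toy_clones)   # flattens as fewer new OTU appear
```

Under this estimator, 1 − C is the estimated probability that the next clone belongs to an OTU not yet observed, which corresponds to the 18% figure quoted above. A single accumulation curve depends on the order in which clones were sequenced, so averaging over random orderings would give a smoother curve.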

In our study, pinch samples from the mucosa were taken during routine colonoscopic examination of patients. An oral colonic lavage treatment was used to prepare the colon for the examination. A criticism of our study might be that the lavage treatment, which is relatively severe, would have destroyed a significant proportion of the biodiversity inhabiting the mucosal surfaces of the digestive tract. This appears not to be the case. Hold et al. (2002) sampled the colonic mucosa of a deceased individual where no lavage was administered. They did not obtain greater diversity in their clone libraries than we did (Table 2).

We compared the diversity obtained using near-full-length sequences with that obtained using a 350 bp fragment (Fig. 1). When the cumulative number of OTU was plotted against the number of clones for both the near-full-length sequence and the 350 bp fragment, there was no significant difference in the OTU obtained. This indicates that an adequate representation of microbial diversity can be obtained using the shorter fragment. It has previously been demonstrated that phylogenetic trees based on partial sequences have the same topologies as those based on complete sequences. While the established groups are identical, there are some differences in deeply branching phyla, but these will not influence the view of diversity that is obtained (Lane et al. 1985; Lopez-Contreras et al. 2001).

Studies using either conventional culture methods or fluorescent in situ hybridization have revealed that over 96% of colonic mucosal bacteria were anaerobes and only 1–4% were facultative aerobes (Pathmakanthan et al. 1999; van der Waaij et al. 2002). The results of our study are consistent with this but demonstrate significant inter-individual differences in the distribution and population of anaerobes and facultative aerobes. Biopsy samples from subject LC had a high proportion of Bacteroidetes spp. (38%) in all three clone libraries and few Proteobacteria clones, whereas individual AB harboured fewer Bacteroidetes but more numerous γ-Proteobacteria. The proportion of Proteobacteria clones in the descending colon of AB was an order of magnitude larger than in other reported gut niches (Table 1), which meant that the mean percentage of Proteobacteria clones from the two subjects in our study, LC and AB, was also an order of magnitude larger (Table 2). They included bacteria such as Burkholderia cepacia, Leptothrix cholodnii, H. parainfluenzae, Shigella sonnei, S. flexneri and Stenotrophomonas maltophilia, all of which may be pathogenic, both in the immunologically compromised human (Nikkari et al. 2002; Saiman et al. 2002) and the normal human host. Thus, the possibility arises that the presence of so many Proteobacteria in AB is pathological. She had mild sigmoid colonic diverticulosis. Diverticulosis is a normal finding in elderly Caucasians (Manousos et al. 1967; Hughes 1969) but its aetiology is poorly understood. There may be a role for bacteria in its pathogenesis. Hold et al. (2002) also reported Burkholderia cepacia in their elderly subjects. However, it is quite likely that the Proteobacteria are commensals, which only become pathogenic if they translocate the mucus layer. The bacterial communities in LC were closer to the mucosal bacterial communities reported in previous culture-based studies, with a predominance of species from Bacteroidetes. This included frequent clones of B. vulgatus, B. caccae, B. ovatus, B. thetaiotaomicron and B. distasonis (Poxton et al. 1997; Pathmakanthan et al. 1999). However, the population structure of the other major bacterial groups, in particular Clostridium, is considerably different in the 16S rDNA clone library-based studies. Thus, members of the Clostridium cluster XIVa were the most frequently represented clones in LC, in the mucosal study of Hold et al. (2002) and in the study of a single faecal sample (Suau et al. 1999). In addition, 4% of sequences from AB, all from the distal colon, were closest to butyrate-producing bacteria. These physiologically important species are more likely to be underestimated by conventional culture (Barcenilla et al. 2000; Hold et al. 2002).

Comparative analysis between our study and similar studies of hindgut ecosystems reveals remarkable similarity between phyla represented (Table 2). The Bacteroidetes (CFB) and Clostridium XIVa groups are in all cases the predominant clusters. This indicates that phylogenetic diversity among gastrointestinal ecosystems is highly similar. It is likely that pathogenic states would alter this balance, possibly not in terms of the overall distribution of phenotypes but certainly within groups, or clusters of organisms.

In conclusion, we have characterized the bacterial community resident on the mucosa of the human terminal ileum, proximal colon and distal colon by constructing and analysing 16S rDNA clone libraries. Together with previously published data, our data reveal that the major phylogenetic groups are similar across the mammalian digestive tract, and are dominated by the Bacteroidetes and the Clostridium XIVa cluster. This study makes an important contribution to the benchmarking of the bacterial composition of the mucosal surfaces of the human digestive tract.


We gratefully acknowledge the technical help from Ms Wendy Smith (Longpocket Laboratories) and Mrs Jing-Chuan Li (Mater IBD Research Laboratory). Dr Xin Wang is the current holder of the Reginald Ferguson fellowship in IBD research.

The work was presented to the American Gastroenterological Association's Digestive Diseases Week in San Francisco. Gastroenterology May 2002; W938.