Human distal gut microbiome


Tel. (+44) 29208 74188; Fax (+44) 29208 74305.


The distal gut and its associated microbiota are a new frontier in the quest to understand human biology and evolution. The renaissance in this field has been driven partly by advances in sequencing technology and partly by the application of a variety of ‘omic’ technologies in a systems biology framework. In the initial stages of understanding what constitutes the gut microbiota, culture-independent methods, primarily inventories of 16S rRNA genes, have provided a clear view of the main taxonomic groups of Bacteria in the distal gut, and we are now moving towards defining the functions that reside in the distal gut microbiome. This review will explore recent advances in the study of the distal gut and the use of a variety of omic approaches to determine what constitutes this fascinating collection of microbes.


Gut or intestinal microbiology has undergone a mini-renaissance in the past 10 years. In a comprehensive review of the role of the gut microbiota in the health of the host, Sekirov and colleagues (Sekirov et al., 2010) charted the number of publications for the period from 1990 to 2009. Their data show that in this period there has been a near fivefold increase in the yearly publication rate. In fact, the overall number they show is an underestimate: if the ISI Web of Knowledge database is queried with similar keywords (Fig. 1), the trend is the same, but the number of publications, which now includes 2010, is nearly twice their figure and currently peaks at 1366 for 2010. Several reasons are responsible for this increased attention, including the recognition that the gut microbiota plays a central role in host health and the cross-pollination of ideas from microbiologists working in the varied areas of environmental microbiology. As a discipline, environmental microbiology has always been challenged by what has been referred to as ‘the great plate count anomaly’, which describes the disparity between what we can grow in the laboratory on conventional microbiological media and what we can directly count (Staley and Konopka, 1985). This challenge has resulted in a dramatic shift away from culturing (some may say a swing too far) towards developing culture-independent approaches to investigate ecosystem function and the role that microbes play (Amann and Kuhl, 1998). However, microbiologists working on the human body have been relatively fortunate because a significant proportion of the microbial community in these systems is culturable, a fact that delayed the introduction of culture-independent approaches to analyse this ecosystem. The suite of methods that has been used consists of variations on genomic, transcriptomic, proteomic and metabolomic methods. 
The most commonly used are the metagenomic and 16S/18S rRNA gene-based methods, which determine the functions in the microbiome and the species present, respectively; metatranscriptomic (Gosalbes et al., 2011; Bomar et al., 2011) and metaproteomic (Verberkmoes et al., 2009; Rooijers et al., 2011; Klaassens et al., 2007) methods are also being implemented, but to a much lesser extent. Using the gut as an example, the two most commonly studied niches are the distal gut and oral cavity, because logistically they are the easiest to access. In both instances the proportion of the microbial community that is as yet uncultivated is between 30% and 50% (Wade, 2002; Eckburg et al., 2005; Duncan et al., 2007), which provides researchers with a significant culturable microbial biomass for investigation. When this figure is compared with environmental ecosystems such as the deep biosphere or soil, where the culturable fraction can be < 0.1% and 1% respectively (Hugenholtz et al., 1998; Fry et al., 2008), it becomes clear why researchers in these areas needed to create a suite of tools to help in developing a more complete picture of microbial contributions to ecosystem function. The current burst of interest in the gut ecosystem and how its microbes influence host function and physiology has in some way been driven by microbiologists adopting the tools of environmental microbiologists and implementing them in a new setting. Without these methods we would not have realized that the gut microbiota is so diverse between different individuals, or so resilient to perturbations, or that key functions are redundant in some instances while in others they are not. These initial forays into the gut ecosystem using culture-independent methods paved the way for developing hypotheses in which the gut microbiota are drivers of health and disease. 
Hence in light of the numerous reviews on the microbiota of the human gut (524 for 2008–2010) this review will concentrate on the most recent and significant findings in the literature.

Figure 1.

The number of publications retrieved from the ISI Web of Knowledge database, obtained by using the following keywords and Boolean operators: ‘intestinal microbiota’ OR ‘gut microbiota’ OR ‘intestinal flora’ OR ‘gut flora’ OR ‘intestinal microflora’ OR ‘gut microflora’ OR ‘gut microbiome’ OR ‘intestinal microbiome’ (the addition of the terms gut microbiome and intestinal microbiome, not used by Sekirov and colleagues, added 92 publications compared with the same search without them).
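As a minimal sketch, the Boolean search string described in this legend can be assembled programmatically. The use of double quotes for phrase matching and the helper name build_query are assumptions made for illustration; the exact syntax accepted by the database may differ.

```python
# Search terms quoted in the Figure 1 legend.
TERMS = [
    "intestinal microbiota",
    "gut microbiota",
    "intestinal flora",
    "gut flora",
    "intestinal microflora",
    "gut microflora",
    "gut microbiome",
    "intestinal microbiome",
]


def build_query(terms):
    """Join quoted phrases with the Boolean OR operator (hypothetical helper)."""
    return " OR ".join(f'"{term}"' for term in terms)


QUERY = build_query(TERMS)
print(QUERY)
```

Dropping the last two terms from TERMS gives the narrower search against which the 92 additional publications were counted.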

Anatomically, the human gut is divided into six sections: the oral cavity, oesophagus, stomach, small intestine (subdivided into the duodenum, jejunum and ileum), the colon or distal gut (subdivided into the ascending, transverse and descending colons) and rectum (Fig. 2). While the physiological role of the gut is to process and digest the food we ingest, it also offers a niche for colonization by a variety of microbes. Each niche harbours a specific microbial community, which to some extent reflects the dynamics of that compartment. The numbers of microbes in each niche increase as one moves from the stomach to the rectum, resulting in one of the most densely populated ecosystems being found in the distal gut or colon, which contains between 10¹¹ and 10¹² bacteria per gram of luminal material. Because the distal gut contains one of the densest communities known (Whitman et al., 1998) and is very easy to access [up to 55% of a stool sample is bacterial biomass (Cummings and Macfarlane, 1997)], it has received the majority of the attention. However, this does not mean that a stool sample is a robust representation of the whole colon or small intestine; moreover, the mucosal surface contains a microbiota that is significantly different to that found in a stool sample from the same subject (Momozawa et al., 2011). Data obtained from analysis of faecal material must be considered in light of where the sample comes from, and conclusions based on these data must be tempered appropriately; however, these data still provide a very valuable insight into the functions and species present in the gut.

Figure 2.

The anatomy of the gastrointestinal tract, major bacterial phyla and their abundance in each niche. The information for this figure was compiled from (Eckburg et al., 2005; Bik et al., 2006; O'Hara and Shanahan, 2006; McConnell et al., 2008; van den Bogert et al., 2011).

The current census of the inhabitants of the distal gut: the early years of the distal gut

Unlike many environmental ecosystems being investigated, the establishment of the climax community in the gut is played out time and time again with every birth; moreover, it can very easily be perturbed and involves an immunological dialogue with the system in which it resides. Many of the major ecosystems that are studied (marine, terrestrial, deep-biosphere and atmospheric) have been colonized for many millions to billions of years. However, in the majority of cases humans are born sterile [although cases have been reported in which amniotic fluid sludge containing cultured isolates of Mycoplasma hominis, Streptococcus mutans and Aspergillus flavus has been observed (Espinoza et al., 2005; Romero et al., 2008)] and immediately upon exit from the mother start to be colonized by microbes. There is significant interest in understanding what drives this colonization process and how much nature or nurture plays an influencing role, specifically because we have a poor understanding of whether early life events, which may alter the composition of the gut microbiota, can have ramifications for health in later life. The climax community seems to be established within the first 2 years of life; after the first year it has started to converge on, and to reflect, a generalized adult distal gut community (Palmer et al., 2007). The factors that influence this process are the maternal microbiota (Dominguez-Bello et al., 2010), diet (breast fed vs. formula fed; Favier et al., 2003), mode of delivery (normal vs. caesarean; Biasucci et al., 2010; Dominguez-Bello et al., 2010), full or preterm gestation (Schwiertz et al., 2003; Morowitz et al., 2011), environmental exposure (Palmer et al., 2007) and clinical interventions [antibiotics (Palmer et al., 2007) or gastrointestinal surgery (Zhang et al., 2009; technically, this paper shows the impact of surgery on the adult gut)]. This progression has also been confirmed when using metagenomic DNA (mgDNA) instead of the 16S rRNA gene. 
Koenig and co-workers (Koenig et al., 2011) created inventories of the 16S rRNA gene (from 60 infants) and used this information to select 12 infants for a sequence-based metagenomic analysis on the Roche 454 platform. The data they generated were processed using MEGAN (Huson et al., 2007) and MG-RAST (Meyer et al., 2008) to assess the taxonomic source of, and the functions contained within, the mgDNA respectively. Using mgDNA the same succession was seen as with 16S rRNA gene data (Fig. 3), and other groups have also confirmed that random sequence reads can be used in lieu of taxonomically relevant genes such as the 16S rRNA gene (Manichanh et al., 2008; Ghosh et al., 2010; Gori et al., 2011). The consensus from these studies seems to be that the trajectory of the colonization process is towards a similar outcome, i.e. a distal gut microbiota which after the age of 2 is stable and colonized predominantly by Firmicutes and Bacteroidetes (see below). However, we lack information on which factors are driving this process, how the colonization process proceeds in different ethnic groups and to what extent the functions in the gut are established. Hence there is a clear need to continue to determine the key events that influence the establishment of the climax community.

Figure 3.

Taxonomic distribution of metagenomic sequences isolated from infant faecal DNA [adapted from Koenig et al. (2011) with data kindly provided by Prof Ruth Ley].

The adult and ageing distal gut microbiota

One of the most comprehensive early culture-independent analyses (using clone libraries of 16S rRNA genes and Sanger or first-generation sequencing platforms) was carried out on the distal gut by Eckburg and colleagues (Eckburg et al., 2005). This study revealed that while there were many bacteria in the gut, they were actually not as diverse as those of soil or marine ecosystems. In the majority of mammals the two main phyla present are the Bacteroidetes and Firmicutes (Ley et al., 2008), and it seems that members of these two phyla contribute approximately 90% of the species in the distal gut. The number of species estimated to be present in the distal gut is relatively small [compared with soil, in which millions of species are estimated to exist in 10 g (Gans et al., 2005)] and is in the hundreds (Qin et al., 2010), while a larger degree of diversity exists at the strain level, which may be in the thousands (Ley et al., 2006). The strain diversity may only be significant when the functions that a strain carries are non-redundant. For example, there are two hydrogenotrophic groups, the methanogens and the sulphate-reducing bacteria, which are represented by very few species and strains in the distal gut (Scanlan et al., 2009; Dridi et al., 2009). In this scenario, it would be easy for the host to lose the functions these organisms provide; whether these functions are health promoting or detrimental remains to be seen. A further consequence of this strain diversity is that phylogenetic trees of the gut tend to have few branches, which are not deep, but have a large degree of radiation at the ends. 
However, this census is based on a very small number of samples. To put it into perspective, a recent search for single nucleotide polymorphisms that correlate with adult height screened 183 727 individuals to determine statistically significant correlations (Lango Allen et al., 2010); in contrast, the majority of the studies that have been undertaken to determine the composition of the gut microbiota use small cohorts that can be counted in the tens rather than thousands. One recent study that has sampled hundreds of individuals has shown why we need large cohorts of subjects. While we can take averages of the numbers of species present in the distal gut and conclude that two phyla predominate, if the sample size is increased the proportions of these phyla can tell a very different story. In the Eldermet project, being undertaken at University College Cork, Ireland, the investigators profiled the distal gut of 386 individuals aged over 65 years using second-generation 454 pyrosequencing and obtained approximately 40 000 reads per sample, which spanned the V4 region of the 16S rRNA gene. Once again the composite picture of the distal gut was one in which gene sequences from the Bacteroidetes and Firmicutes contributed 97% of the overall sequences obtained (57% and 40% respectively; Claesson et al., 2011). However, when the individual profiles were plotted and ordered an entirely different picture was obtained (Fig. 4). The distribution of 16S rRNA genes showed that within the cohort there was a continuum: at one end Bacteroidetes made up nearly 90% of the distal gut microbiota, while at the other Firmicutes made up more than 95% of the sequences recovered. This larger study further highlights the necessity to increase the cohort size and move away from small studies of 2–3 subjects. 
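The contrast between a cohort average and the underlying individual profiles can be illustrated with a short sketch. The five profiles below are hypothetical values chosen only to span the kind of continuum described; they are not Eldermet data, and the function names are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical per-individual proportions (Bacteroidetes, Firmicutes);
# illustrative values only, not taken from any study.
samples = {
    "subject_a": (0.90, 0.08),
    "subject_b": (0.70, 0.27),
    "subject_c": (0.55, 0.42),
    "subject_d": (0.30, 0.66),
    "subject_e": (0.03, 0.95),
}


def cohort_means(profiles):
    """Mean proportion of each phylum across all individuals."""
    bacteroidetes = mean(p[0] for p in profiles.values())
    firmicutes = mean(p[1] for p in profiles.values())
    return bacteroidetes, firmicutes


def ordered_by_bacteroidetes(profiles):
    """Individuals sorted from most to least Bacteroidetes-dominated."""
    return sorted(profiles.items(), key=lambda item: item[1][0], reverse=True)


mean_bact, mean_firm = cohort_means(samples)
ranked = ordered_by_bacteroidetes(samples)

print(f"cohort mean: Bacteroidetes {mean_bact:.3f}, Firmicutes {mean_firm:.3f}")
for name, (bact, firm) in ranked:
    print(f"{name}: Bacteroidetes {bact:.2f}, Firmicutes {firm:.2f}")
```

In this toy cohort the two means are close to each other, yet no individual resembles the average, which is the point made by ordering the individual profiles.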
The fact that the distal gut microbiota is so variable at the phylum level does make one wonder whether it is this variable further down the taxonomic levels, for example, at the genus level. To answer this question, several groups have been exploring the concept of ‘the core microbiota’ and whether there exists a group of species found in all distal guts regardless of geography, ethnicity, age, gender or diet. While it would be safe to say that there is a core microbiota at the phylum level, i.e. all humans possess members of the Bacteroidetes and Firmicutes, when we drill down the taxonomic levels this concept becomes less well defined, and different studies and methods provide different answers. Tap and colleagues undertook a de novo analysis of the composition of the distal gut microbiota using first-generation sequencing and PCR amplification and cloning of the 16S rRNA gene (Tap et al., 2009). They generated 10 456 16S rRNA gene sequences from 17 human faecal DNA samples and analysed them to determine which sequences were shared and which were unique. In their conclusions, they state that on average each individual contains 259 operational taxonomic units (OTUs at the 98% level), but the range was large (159–383), and in total 3180 OTUs were identified from the total pool of 16S rRNA gene sequences. Approximately 79% of the OTUs were found in only one sample and 21% were found at least twice; however, no OTU was found in all 17 distal guts. They showed that 66 OTUs were found in 50% of the samples and proposed that these may in some way constitute a core microbiota. These OTUs belonged to 18 genera and were affiliated predominantly with the Firmicutes (57/66). In a study with a similar goal, to determine the members of the core microbiota of the distal gut, Rajilić-Stojanović and co-workers used a DNA array approach to determine the composition of the distal gut (Rajilić-Stojanović et al., 2009). 
While the aims were comparable the methods are not, even if they both target the same gene, the 16S rRNA gene. DNA arrays have the advantage of being more sensitive and are able to detect sequences in a mixture at much lower levels than a random sampling without replacement strategy, e.g. analysis of a clone library or second-generation amplicon sequencing (Harrington et al., 2008; Paliy et al., 2009; Rigsbee et al., 2011). However, they are only as good as the database used to design the array probes, and any novel sequences in the samples that are not represented on the chip will not be detected. Bearing this in mind, Rajilić-Stojanović and co-workers used their human intestinal tract chip (HITChip) to profile the distal gut of five young and five elderly volunteers. Their HITChip can measure the abundance of 1140 unique microbial phylotypes, and they concluded that there was a common core shared by all 10 individuals, which consisted mainly of probes from three phyla (Actinobacteria, Bacteroidetes and Firmicutes) found in the distal gut; they also confirmed that we possess a distal gut microbiota that is host specific. In addition, they were able to show that the young and elderly gut samples clustered according to the host's age, with all the young and elderly samples found in their respective clades. In the Eldermet study (Claesson et al., 2011), there was also a significant difference between the elderly and young distal gut. In the elderly distal gut more than half of the core microbiota (53%) were from the Bacteroidetes, from the genera Bacteroides (29%), Alistipes (17%) and Parabacteroides (7%), while in the younger distal gut this figure dropped to between 8% and 27%. Furthermore, the core clostridial species were predominantly in Clostridium cluster IV for the elderly, whereas cluster XIVa was more prevalent in the younger cohort, again highlighting the need for longitudinal studies rather than snapshots of the distal gut composition. 
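The prevalence-based notion of a core used in the studies above, for example OTUs shared by half of the samples as in Tap and colleagues, can be sketched as follows. Reading ‘found in 50% of the samples’ as ‘found in at least 50%’ is an assumption, and the function name, sample names and OTU labels are hypothetical.

```python
from collections import Counter


def core_otus(samples, min_prevalence=0.5):
    """Return the OTUs present in at least `min_prevalence` of the samples.

    `samples` maps a sample name to an iterable of OTU identifiers.
    """
    if not samples:
        return set()
    counts = Counter()
    for otus in samples.values():
        counts.update(set(otus))  # count each OTU once per sample
    threshold = min_prevalence * len(samples)
    return {otu for otu, n in counts.items() if n >= threshold}


# Hypothetical presence lists for four faecal samples.
toy = {
    "s1": ["otu1", "otu2", "otu3"],
    "s2": ["otu1", "otu3", "otu4"],
    "s3": ["otu1", "otu5"],
    "s4": ["otu2", "otu6"],
}

core_half = core_otus(toy, 0.5)  # shared by at least half of the samples
core_all = core_otus(toy, 1.0)   # shared by every sample
print(sorted(core_half), sorted(core_all))
```

In this toy example three OTUs pass the 50% threshold but none is present in every sample, mirroring the situation in which a strict all-samples core is empty while a relaxed threshold yields a candidate core.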
In the study conducted by Biagi and co-workers (Biagi et al., 2010), which looked at young (Y), elderly (E) and centenarian (C) subjects (groups of 20, 22 and 21 with average ages of 31, 72.7 and 100.5 respectively) using the HITChip platform and quantitative PCR, the trend for a variation in the main groups was also seen (Fig. 5). However, the changes in the subgroups within the clostridial group were not the same as those found in the Eldermet project, with an increase in Clostridium cluster XIVa going from Y to E, which decreased in the centenarians. Clostridium cluster IV remained the same between all three groups, while between the C and E groups only Faecalibacterium prausnitzii was significantly different, between the C and Y groups Bifidobacterium spp. differed and between the E and Y groups members of the genus Akkermansia differed. In a recent development to describe the core microbiota of the distal gut, Sekelja and colleagues undertook a post hoc analysis of previously published datasets from pyrosequencing projects targeting the 16S rRNA genes (Sekelja et al., 2011). They also changed the approach used, by moving away from defining taxonomic groups and, in their words, ‘search for a human core microbiota independent of both predefined phylogroup depths and phylogenetic trees’. Using an alignment-independent approach, they analysed 16S rRNA gene sequences (from eight previous studies and comprising 1 186 272 partial 16S rRNA sequences from 210 samples) and clustered them using principal component analysis based on their sequence similarity [calculated by establishing 5-mer nucleotide frequencies in each sequence (Rudi et al., 2006; 2007)]. From their analysis they report that there were two microbiota cores, which were consistently found in all samples. Both cores were affiliated to the Firmicutes and were members of the clostridial family Lachnospiraceae. 
Interestingly, they concluded that each core appeared at a defined moment in evolution, with core 2 co-evolving with the radiation of vertebrates and core 1 co-evolving with the mammals. These studies reinforce the stochastic nature of sampling the distal gut and the need for more large-scale studies to minimize confounding factors such as diet, environment and genetic/immunological variability of the host.
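The alignment-independent representation mentioned above, in which each 16S rRNA gene sequence is described by its 5-mer nucleotide frequencies, can be sketched as below. This is only an illustration of the frequency step under stated assumptions: overlapping windows, normalisation to relative frequencies and the skipping of ambiguous bases are choices made here, not details confirmed by the source, and the subsequent principal component analysis is not shown.

```python
from itertools import product


def kmer_frequencies(sequence, k=5):
    """Relative frequencies of all 4**k DNA k-mers, in a fixed (sorted) order."""
    seq = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Hypothetical toy fragment, far shorter than a real 16S rRNA gene read.
example = "ACGTACGTAC"
vector = kmer_frequencies(example)
print(len(vector), max(vector))
```

Every sequence, whatever its length, is mapped to a vector of the same dimension (1024 for 5-mers), so the vectors can be compared or passed to an ordination method without first aligning the sequences.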

Figure 4.

Proportions of the main bacterial phyla in 386 Eldermet faecal samples. The two main phyla are shown in the figure. The inset pie chart shows the mean values for the phyla (F, Firmicutes; B, Bacteroidetes) isolated from the distal guts of the elderly individuals; the category ‘others’ comprises the remaining phyla: Proteobacteria, Actinobacteria, Lentisphaerae and Verrucomicrobia (data to construct this figure were kindly supplied by Dr Paul O'Toole, University College Cork, Ireland and Eldermet principal investigator).

Figure 5.

Relative abundance of phylum/order phylotypes from centenarians (C), elderly (E) and young (Y) (adapted from Biagi et al., 2010).

A bacterial-centric view that needs to encompass all the microbiota

To date we still have a limited understanding of what constitutes the core microbiota of the distal gut and as such cannot define its limits. We still need to expand the number of distal guts sampled and the range of ethnic groups from which we obtain samples, and to assess the compositional stability of the core. While the previous studies have all indicated that the concept of a core microbiota is not dead, we have not reached any consensus as to which species should be considered members of this important group of bacteria, although we do agree that the main phyla are the Bacteroidetes and Firmicutes. Furthermore, the concept of a core microbiota has not been fully inclusive and should perhaps be renamed the core ‘bacteriota’, as it has not considered the micro-eukaryotic and viral components. Both of these groups of organisms have been studied in relation to the distal gut, but in a more limited fashion. Only a few studies have used culture-independent approaches to examine human micro-eukaryotic diversity (Ott et al., 2008; Scanlan and Marchesi, 2008) and viral diversity in faecal samples (Breitbart et al., 2003; 2008; Zhang et al., 2005; Reyes et al., 2010). Micro-eukaryotic diversity and numbers are several orders of magnitude lower than those of the Bacteria and are skewed towards Candida and Saccharomyces spp. when cultured, but culture-independent approaches using 18S rRNA genes show that Blastocystis spp. are very common in the distal gut and that yeasts are rarely obtained. In fact, it may be concluded that micro-eukaryotes are only really significant when there is a dysbiosis in the gut (Goldman and Huffnagle, 2009). For the viral component the story is very different, with their numbers being at least an order of magnitude higher than the bacterial numbers in the distal gut. Thus we might need to start to consider the viral component as a driver of community dynamics, as some marine microbiologists do (Suttle, 2007). 
In fact, Lepage and colleagues (2008) have hypothesized a role for distal gut bacteriophage as drivers of dysbiosis in the distal gut and of inflammatory bowel disease. While studies looking to define the core microbiota have focused on describing the Bacteria within the distal gut, there is also a significant number of Archaea in this niche. The most common species and 16S rRNA gene sequences isolated from the distal gut come from the Euryarchaeota, and in particular the Methanobacteriaceae family (Scanlan et al., 2008a; Dridi et al., 2009), with Methanobrevibacter smithii and Methanosphaera stadtmanae the two predominant Archaea found. However, other, rarer archaeal sequences have been reported that cluster in the Methanosarcinales [a methyl coenzyme reductase subunit A (mcrA) sequence (Scanlan et al., 2008a)], the Halobacteriaceae (Oxley et al., 2010) and a putative sixth archaeal order (Mihajlovski et al., 2008; 2010). Nevertheless, in all studies to date M. smithii and M. stadtmanae are the two main Archaea (Dridi et al., 2011), and one would question to what extent the much rarer species are autochthonous or are actually contaminants from our diet or environment.

The luminal microbiota versus the mucosal microbiota

One of the major criticisms of many of the studies on the distal gut is the reliance on stool or faecal material as the source of microbial biomass or genomic DNA. While stool is quite simple to collect, it is clear that faeces do not afford a robust proxy for the gut microbiota as a whole. In their 2005 culture-independent analysis of the distal gut, Eckburg and co-workers (Eckburg et al., 2005) clearly showed that while the microbiota attached to the mucosa was similar throughout an individual's large intestine, it was significantly different to that of the stool sample from the same individual. Whether there is any biological significance in this difference remains to be shown, as the numbers of luminal bacteria are 4–6 orders of magnitude greater than those of the mucosally associated bacteria (MAB; Zoetendal et al., 2002; Ahmed et al., 2007; Walker et al., 2011). Using a DNA microarray (Aus-HIT Chip), Aguirre de Carcer and colleagues (de Carcer et al., 2011) have shown not only a gender difference between the MAB, but also a qualitative change in the MAB composition moving from the caecum to the rectum, via the transverse and sigmoid colon. However, we still need larger studies to determine which species should be considered the prevalent colonizers of the different regions of the colon and at what scale the community starts to diverge.

Is there a core microbiome?

Qin and co-workers (Qin et al., 2010) and other sequence-based metagenomic studies have addressed the issue of whether there is a core microbiome (the collection of microbial genes) and, if so, what it looks like. To date the study of Qin and colleagues is by far the ‘deepest’ and largest metagenomic sequencing project to be undertaken; however, two smaller metagenomic studies do precede it (Kurokawa et al., 2007; Turnbaugh et al., 2009). In the most recent study the authors used an Illumina second-generation sequencing platform to generate 0.58 terabases of sequence from 124 volunteers (approximately 4.5 gigabases per individual with an average read-length of 75 bp) and determined that there were 3.3 million non-redundant genes in the distal gut metagenome. This figure is broadly in agreement with the previous estimate of 9 million genes (Yang et al., 2009), being of the same order of magnitude. Between the different gut samples there were 204 056 common genes, which comprised 38% of an individual's gut microbiome. In this project these genes were grouped into 6313 clusters of orthologous groups and could be divided into house-keeping genes and gut-specific genes. When studying the gut, it is the genes found only in the gut that are of key interest, as these may play a role in shaping the relationship between the host and its gut microbiota. While the house-keeping genes were part of the main metabolic pathways commonly associated with bacteria, for example, amino acid synthesis, nucleic acid processing and general secretory processes, the gut-specific genes were identified as being involved in adhesion to host proteins or in catabolizing globoseries glycolipids. However, the majority of the clusters of orthologous groups (74.3%) were not defined, and this fact highlights a key problem with sequence-based metagenomic projects: they lack the ability to reveal novel functions (Table 1). 
Many of the sequences, when compared with the current databases, will return ‘hits’ to annotated functions, hypothetical ORFs or unknowns. In the case at hand, when the supplementary data (tables 10 and 11 from Qin et al., 2010) are searched, there are no reported hits to genes involved in butyrate synthesis (Louis et al., 2010), bile catabolism (Jones et al., 2008) or glucuronidases (Gloux et al., 2011), or to functions that are not easily classified but may be important to the host, for example indole-3-propionic acid synthesis (Wikoff et al., 2009), choline catabolism (Wang et al., 2011) and NF-κB modulators (Lakhdari et al., 2010). However, this is not a criticism of the study, but rather an observation on the difficulty of the task and of deciding what should be classified as a core function of the microbiome (genes involved in bile catabolism and butyrate synthesis are present in the METAHIT datasets, but are not abundant). Determining which genes are important to the host, when they may be at low levels in the microbiome, cannot be achieved by sequencing alone. This point is further reinforced by the recent study of Arumugam and colleagues (Arumugam et al., 2011), who took 17 metagenomic datasets from previous studies (Gill et al., 2006; Kurokawa et al., 2007; Turnbaugh et al., 2009), as well as 22 they generated using first-generation Sanger sequencing, and analysed the information statistically and phylogenetically. The major outcome of this analysis was their conclusion that the distal gut is stratified into three ‘enterotypes’, which are predominantly driven by species composition. Enterotype 1 is dominated by the genus Bacteroides, enterotype 2 is dominated by the genus Prevotella, while in enterotype 3 the genus Ruminococcus is the discriminatory genus (Fig. 6; see Supporting information for Arumugam et al., 2011 for further information on genus abundance). 
Another interesting finding was that several abundant functions found in the different enterotypes are not associated with abundant genera; for example, bacterial pilus assembly was associated with the low-abundance genus Escherichia, while the hydrogenotrophic functions, which include acetogenesis, sulphate reduction and methanogenesis, were not detected using the functional marker approach. The mcrA functional gene was only detected in 3 out of the 22 European samples, although the methanogens that harbour this gene are found in > 95% of individuals (Dridi et al., 2009). Dridi and colleagues claim that the low incidence of methanogens in previous studies was due to an inappropriate DNA extraction method and PCR target; using their modified approach they improved detection from 19% to 95.7% in the 700 samples studied (Dridi et al., 2009). This raises the question of how much bias is introduced into these studies by such factors as the method used to extract the DNA. The method used in the METAHIT study was one developed to obtain high molecular weight genomic DNA for creating a fosmid metagenomic library and uses a ‘gentle’ extraction protocol that does not involve any mechanical shearing (Courtois et al., 2003); this may explain why some functions or groups are absent. Furthermore, if genes involved in hydrogenotrophic processes, which are known to be important to gut and host function (McNeil, 1984; Waniewski and Martin, 1998; Attene-Ramos et al., 2006; Sahakian et al., 2010), cannot be robustly detected, are the data suspect? The authors concede ‘[functional genes] from these less abundant microbes could barely be identified’. However, such studies do provide the wider scientific community with an invaluable resource from which we can derive hypotheses as to what constitutes a core microbiome, and these can be tested in either large human cohorts or animal models of the human gut. 
However, we may need to invest in even deeper sequencing projects to establish the limits of how deep we need to probe in order to find functions that are of importance to the host and define the gut ecosystem.
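The headline figures quoted above for the study of Qin and colleagues can be related to one another with some simple arithmetic. The sketch below uses only the rounded numbers given in the text; the variable names are illustrative, and the derived values are back-of-envelope estimates rather than figures reported by the authors.

```python
# Rounded figures quoted in the text for Qin et al. (2010).
TOTAL_BASES = 0.58e12       # 0.58 terabases of sequence
N_VOLUNTEERS = 124
COMMON_GENES = 204_056      # genes common to the different gut samples
COMMON_SHARE = 0.38         # share of an individual's gut microbiome
CATALOGUE_GENES = 3.3e6     # non-redundant genes in the metagenome

# Simple mean sequencing depth per volunteer, in bases.
bases_per_person = TOTAL_BASES / N_VOLUNTEERS

# If the common genes are 38% of an individual's genes, the implied
# number of genes carried by one individual is:
genes_per_person = COMMON_GENES / COMMON_SHARE

# Fraction of the whole gene catalogue represented in one individual.
catalogue_fraction = genes_per_person / CATALOGUE_GENES

print(f"{bases_per_person / 1e9:.2f} gigabases per volunteer")
print(f"{genes_per_person:,.0f} genes per individual (implied)")
print(f"{catalogue_fraction:.1%} of the catalogue per individual (implied)")
```

The simple mean from the rounded total comes out slightly above the quoted figure of approximately 4.5 gigabases, which presumably reflects rounding or the way the average was calculated. The implied gene count per individual suggests that any one person carries only a minority of the catalogued genes, which is consistent with the argument that low-abundance functions are easily missed.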

Table 1.  Comparison of the pros and cons of the two metagenomics methods used to study the functions in an ecosystem.
Feature | Function-based screen | Sequence-based screen
Screen large amounts of DNA | Yes – with the aid of colony picking and arraying robots | Yes – with the use of second-generation sequencing platforms
Provide novelty | Yes | No
Genomic context | Yes | Limited; relies on assembly of reads and assumptions on the pan-genomic nature of gut bacteria
Toxic genes | No | Yes
Expression issues | Yes | No
Storage issues | Yes – physical storage; one fosmid library can easily take 650 384-well plates, and 1950 if stored in triplicate | No
Computational issues | No | Yes – BLAST searches and data analysis are becoming bottlenecks in the analysis
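
The storage figures in Table 1 can be checked with simple arithmetic. The sketch below assumes one clone per well, which the table does not state explicitly; the function names are illustrative only.

```python
WELLS_PER_PLATE = 384  # standard 384-well plate


def clones_in_plates(n_plates: int) -> int:
    """Clones held by n_plates, assuming one clone per well."""
    return n_plates * WELLS_PER_PLATE


def plates_needed(n_clones: int, copies: int = 1) -> int:
    """Plates needed to store n_clones in the given number of copies."""
    plates_per_copy = -(-n_clones // WELLS_PER_PLATE)  # ceiling division
    return plates_per_copy * copies


library_size = clones_in_plates(650)
print(library_size)                           # 249600
print(plates_needed(library_size, copies=3))  # 1950
```

Under this assumption, a library occupying 650 plates corresponds to roughly a quarter of a million clones, and storing it in triplicate gives the 1950 plates quoted in the table.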
Figure 6.

A. Abundance of the main phylogenetic groups contributing to defining the three enterotypes of the distal gut. B. Network analysis showing the interrelationships between the main genera in each enterotype (taken from Arumugam et al., 2011).

One way to avoid missing functions is to adopt a top-down or reverse genetics strategy to determine the core functions in the gut (Nicholson et al., 2005; Martin et al., 2007). The most robust strategy would be to use a metabonomic approach, in either a targeted or a non-targeted fashion, using either mass spectrometry (hyphenated with chromatographic separation, e.g. UPLC-MS) or nuclear magnetic resonance to identify key metabolites that occur in the gut and can only be derived from microbial processes (see example in Fig. 7). From these metabolites it would be possible to develop a database of the core metabonome and work back to the microbial genes that are responsible for synthesizing them. Metabonomic studies have started to provide an insight into the key metabolites that are consistently seen in the gut at varying levels, for example, the short chain fatty acids (Martin et al., 2009), amines (Wang et al., 2011), amino acids (Wikoff et al., 2009) and bile salts (Martin et al., 2007). From these metabolite signals, we can start to develop strategies to investigate the diversity and expression of the microbial genes that are responsible for their synthesis. Louis and colleagues investigated the diversity of butyryl-CoA : acetate CoA-transferases (Louis et al., 2010) using a degenerate PCR method and showed that this gene and its associated function are found in all the samples studied and show a large degree of variation. Thus this function would be considered to be a core function of the microbiome, because it not only plays a role in the bacterium, but is also a significant factor responsible for a key interaction with the host itself. Hence the definition of a core function may need to be revised, so that we move away from defining the core microbiome as the functions/genes found in a gut, which include genes found in all bacteria, to one that includes the need to interact with the host and is undergoing positive selection by the host, either directly or indirectly. 
Using this definition many functions would not be included in the core microbiome and only those playing a role in both biological compartments would be considered. Another such example of a ‘core function’ of the microbiome would be the bile salt hydrolases (Jones et al., 2008). In the absence of these genes we can see that rodents have reduced bile acid deconjugation, produce more bile acids and absorb more cholesterol (Wostmann, 1973; Wilks, 2007). Furthermore, the microbial re-colonization of a gnotobiotic animal provides evidence of the gut microbiome's ability to modulate bile acid metabolites, which themselves are regulators of lipid absorption (Claus et al., 2011). These types of integrative or systems biology studies are bringing together the different biological compartments and help to develop a better understanding of what aspects of the core microbiome are really important in a superorganism.
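
The revised definition can be expressed as a simple two-part test: a function is 'core' only if it is found across the samples studied and also takes part in an interaction with the host. The following is a minimal sketch of that logic, not an established method. The record type, the `host_interaction` flag, the prevalence threshold and all counts are hypothetical placeholders chosen for illustration, not data from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionRecord:
    name: str
    samples_detected: int    # illustrative count, not real data
    host_interaction: bool   # hypothetical annotation supplied by the analyst


def is_core(record: FunctionRecord, n_samples: int,
            min_prevalence: float = 1.0) -> bool:
    """Core = prevalent across samples AND involved in a host interaction."""
    prevalence = record.samples_detected / n_samples
    return record.host_interaction and prevalence >= min_prevalence


# Placeholder records for a hypothetical cohort of 10 samples.
records = [
    FunctionRecord("butyryl-CoA:acetate CoA-transferase", 10, True),
    FunctionRecord("bile salt hydrolase", 10, True),
    FunctionRecord("housekeeping gene found in all bacteria", 10, False),
    FunctionRecord("rare host-interacting function", 2, True),
]

core = [r.name for r in records if is_core(r, n_samples=10)]
print(core)
```

In this sketch the housekeeping gene is excluded despite being ubiquitous, which is the point of the revised definition: prevalence alone is not sufficient.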

Figure 7.

Changes in urinary metabolites due to colonization of the gut by microbiota as shown by pattern recognition analysis [principal components (PC) analysis] of partial nuclear magnetic resonance spectroscopic data from gnotobiotic sequential rat urine samples. Samples were collected for up to 3 weeks during the gut microbiotal conventionalization process; the mapping positions of five different temporal subsets are shown (T1–T5). One animal (triangle marked by an asterisk next to d 21 cluster) completed conventionalization by day 17 (adapted from Nicholson et al., 2005).

The mobile microbiome or mobilome

In nearly all the functional and sequence-driven human metagenomic studies to date, very little regard is paid to genetic elements involved in gene transfer. However, we know that bacteria are frequently transferring DNA via phage, plasmids, transposons and other mobile genetic elements (MGEs) (Ochman et al., 2000). Among the functions most commonly found on these elements are genes involved in antibiotic resistance (Wright, 2007); however, the methods used to ‘pull out’ these MGEs are themselves highly biased. They tend to isolate bacteria showing a chosen function [positive screening or endogenous isolation (Smalla and Sobecky, 2002)]; this approach limits the range of functions that can be screened and microbes that can be cultured. Alternatively, the methods only isolate MGEs that can transfer into a suitable host [exogenous isolation (Bale et al., 1988)], which tends to be Gram-negative. Hence the ability to isolate and describe the functions on MGEs is limited by the current methods available and the fact that many functions are not easily maintained or screened for in a surrogate host (when using functional metagenomics) or reassembled into a whole plasmid in sequence-based approaches. Moreover, cryptic ORFs on MGEs may not be recognized as such if the complete element is not reassembled from the raw data. To this end, other approaches have been developed to specifically look at unknown functions on plasmids and these have been applied to the distal gut. The TRACA method (Jones and Marchesi, 2007) uses an in vitro transposition event coupled with a plasmid-safe DNase to tag circular DNA (plasmids and DNA phage) with a selectable marker and an E. coli plasmid origin of replication. This strategy can be used to capture small plasmids (< 15 kb) from the gut metagenome and stably maintain them in E. coli without the need for any selection, apart from that which was introduced (in this case kanamycin), or transfer to a suitable recipient. 
Using this approach, several plasmids have been isolated from the large intestine of an individual and sequenced. Coupling these sequence data with bioinformatic methods, it was possible to use them as a ‘DNA hook’ to pull out similar sequences from the metagenomic datasets deposited in the public databases (Jones, 2010; Jones et al., 2010). These studies have started to show that, as with certain gut functions such as butyrate production and bile salt deconjugation, there is also a core mobilome in the gut. Two of the plasmids isolated, pTRACA10 and pTRACA22, were found to be enriched in the metagenomes of the 15 human distal guts (from USA, Europe and Japan), while four others did not show any significant homology to these datasets (BLASTn, > 100 bp fragments, > 80% identity and E-value of 1e−5). However, when the same six plasmids were screened against the METAHIT dataset, all were shown to be represented in these datasets, with pTRACA22 showing a significant enrichment compared with the other five plasmids. pTRACA22 is a small 5.9 kb mobilizable plasmid that most probably originates from Blautia hydrogenotrophica, as all nine ORFs show > 98% identity to genes from this draft genome (Jones et al., 2010). The most notable feature of this plasmid is its RelBE or type II addiction module (Van Melderen, 2010), and these modules have been implicated in a range of host-specific functions, for example, modulation of gene expression, formation of persister cells and biofilm dispersal. However, whether this enrichment of these modules is biologically significant and of relevance to the gut or host still needs to be determined. It does show that even in the mobilome the gut shows interesting enrichments of some genes, and a more thorough investigation of this genetic compartment needs to be undertaken in order to establish its role in the ecology of this ecosystem.
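
The ‘DNA hook’ screen described above amounts to filtering BLASTn hits by the stated thresholds and counting what remains for each plasmid. The following is a minimal sketch of such a filter, not the pipeline used in the cited studies. It assumes the hits are in BLAST's standard 12-column tabular output, and it makes two interpretive assumptions: that ‘> 100 bp fragments’ refers to alignment length, and that the E-value of 1e−5 is an upper bound. The read identifiers and hit values in the example are invented for illustration.

```python
import csv
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str      # plasmid used as the 'hook'
    subject: str    # metagenomic read or contig
    pident: float   # percent identity
    length: int     # alignment length in bp
    evalue: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse standard BLAST tabular rows (qseqid, sseqid, pident, length, ..., evalue, bitscore)."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        yield Hit(row[0], row[1], float(row[2]), int(row[3]), float(row[10]))


def passes(hit: Hit, min_len: int = 100, min_ident: float = 80.0,
           max_evalue: float = 1e-5) -> bool:
    """Apply the thresholds as interpreted above (assumptions, see lead-in)."""
    return hit.length > min_len and hit.pident > min_ident and hit.evalue <= max_evalue


def count_hits_per_plasmid(lines: Iterable[str]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for hit in parse_tabular(lines):
        if passes(hit):
            counts[hit.query] = counts.get(hit.query, 0) + 1
    return counts


# Invented example rows; read IDs and scores are placeholders.
example = [
    "pTRACA22\tread_001\t99.1\t250\t2\t0\t1\t250\t1\t250\t1e-120\t450",
    "pTRACA22\tread_002\t85.0\t90\t13\t0\t1\t90\t1\t90\t1e-10\t80",       # alignment too short
    "pTRACA10\tread_003\t78.5\t300\t64\t0\t1\t300\t1\t300\t1e-30\t150",   # identity too low
    "pTRACA10\tread_004\t92.0\t150\t12\t0\t1\t150\t1\t150\t1e-40\t200",
]
print(count_hits_per_plasmid(example))  # {'pTRACA22': 1, 'pTRACA10': 1}
```

Raw hit counts of this kind would still need to be normalized, for example by plasmid length and dataset size, before any claim of enrichment could be made; the normalization used in the cited studies is not described here.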

The distal gut microbiome as a driver of health and disease

The whole concept of integrating the core microbiome into host biology and physiology is further extended and challenged by considering it as a driver of disease as well. If we have a core microbiota, evolved to the host's needs, and if two individuals share common features of this core, will they also share common emergent properties? Furthermore, if there is a dysbiosis in the gut microbiota, does this lead to the development of gastrointestinal diseases? Such concepts have been explored in the context of the gut microbiota as an environmental factor in functional gastrointestinal diseases, for example, inflammatory bowel disease (Scanlan et al., 2006; Frank et al., 2007), colorectal cancer (Scanlan et al., 2008b; Sobhani et al., 2011), irritable bowel syndrome (Kassinen et al., 2007; O'Mahony et al., 2009) and Clostridium difficile-associated diarrhoea (Khoruts et al., 2010), and more recently in extra-intestinal diseases such as cardiovascular disease (Wang et al., 2011), obesity (Backhed et al., 2004; Ley et al., 2005; Turnbaugh et al., 2008), although others have been unable to confirm this observation (Duncan et al., 2008; Schwiertz et al., 2009; Zhang et al., 2009; De La Serre et al., 2010; Fleissner et al., 2010), and psychiatric diseases (Desbonnet et al., 2008; Rook and Lowry, 2008; Bercik et al., 2011). However, there are several issues that complicate the idea of the gut microbiota as an environmental factor in these and other diseases. First, many of the studies look at the gut microbiota after diagnosis of the disease, hence we are unsure as to whether we are observing cause or effect. In order to circumvent this issue, large prospective studies need to be undertaken, which are statistically powered and in which frequent samples are taken and appropriately stored for retrospective analysis. 
Second, we are currently developing correlations between a disease state and a snapshot of microbial diversity in the gut or a potential metabolite; we need to develop stronger causal links and mechanistic models that are predictive and can be tested in suitable animal models. Even with these issues, researchers are developing the view that certain functions and the associated microbes are beneficial to the health of the host. Some of the most commonly seen bacterial metabolites in the human gut are the SCFA, butyrate, acetate, lactate and propionate (Saric et al., 2008). The two bacterial groups that are mainly responsible for producing butyrate are the F. prausnitzii and Eubacterium rectale/Roseburia groups (Louis and Flint, 2009; Louis et al., 2010). This metabolite has been implicated in a large array of effects in the intestine that include controlling apoptosis, cytokine production, energy for colonocytes and mucus synthesis (Guilloteau et al., 2010). Hence any changes in these groups would potentially have an impact on this function and host physiology. Beyond this ubiquitous function it does become an exercise in speculation as to what bacterial groups are important to host health. In one respect, moving away from trying to define a core microbiome to a core metabonome may aid in defining what we need to study and understand in order to maintain a healthy gut and thus a healthy host.

Concluding remarks

The paradigm of the human distal gut microbiome has shifted in recent years, from one that looked upon it as a source of opportunistic pathogens to one that embraces it as a virtual organ with the ability to influence the health status of the host. Taxonomically, we have established that this system is mainly composed of members of the Bacteroidetes and Firmicutes, but we are still struggling to determine the key functions that are important to the microbes and the host. The ability to catalogue the genes present in the distal gut does not equate to defining the core microbiome, and using a top-down approach will help to determine this feature in more detail. The question of cohort sizes needs to be addressed: we need to increase the sample sizes used in order to develop a much more complete picture of the functions in the gut, and start to combine sequence- and function-based metagenomic studies in order to determine the core microbiome. In addition, the integration of metabonomic data into this model will help to determine the core microbiome and establish how it varies both inter- and intra-individually. Once we have created this foundation we can begin to develop hypotheses that address such questions as ‘how does variation in the distal gut microbiome influence host function’ or ‘can we modulate the gut microbiome in order to promote health and should we even try’.


I wish to acknowledge the help of my colleagues in the Eldermet project in University College Cork for sharing their data and Professor Ruth Ley for kindly providing me with her data for Fig. 3.


  • 1

    The term metagenomics is routinely confused with creating inventories of 16S rRNA genes to describe bacterial diversity. Metagenomics is the analysis of random genomic fragments either by sequencing or functional analysis.