Having outlined ideas about ecological communities and the evolutionary processes in microorganisms that complicate the relationship between organismal phylogeny and function, we now consider current taxonomic and functional knowledge about microbial communities. These insights will allow us to develop ideas that fuse these aspects of microbial evolution and ecology into potentially new modes of analysis.
The search for a taxonomically defined ‘core’ microbiome
Ecological overlap or equivalence may underlie the taxonomic differences frequently observed among samples collected from the same or similar habitats. The most compelling example is the absence of a persistent ‘core’ microbiome at many human body sites. Huse et al. (2012) examined the distribution of OTUs defined at a 97% identity threshold for different variable regions of the 16S rRNA gene. Oral and stool samples yielded a small number of OTUs that were ubiquitous or nearly so, although these were not necessarily abundant in all samples. By contrast, no OTUs were ubiquitous in many of the vaginal locations sampled, arguing against a ‘core’ vaginal microbiome. Even OTUs that were ubiquitous in oral or stool samples were differentiated among samples at higher sequence identity thresholds, suggesting that important differences were masked at the 97% level. Nemergut et al. (2011) examined the distribution of OTUs in habitats such as soils, lake water, and saline sediments and found that no OTU was ubiquitous in any habitat, even when the sequence identity threshold was lowered to 89%. Deep sequencing of a marine sample (Gibbons et al., 2013) revealed substantial overlap with OTUs from a range of marine habitats, and the authors suggested that sufficiently deep sequencing at one site would reveal a ‘seed bank’ encompassing all marine OTUs.
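At its simplest, the search for a ‘core’ is an occupancy calculation over an OTU table. The following is a minimal Python sketch. It assumes that OTUs have already been clustered at a fixed identity threshold (e.g. 97%) and that each sample is stored as a dictionary of OTU read counts. The function name, the `min_occupancy` parameter, and the toy data are hypothetical and not taken from any of the studies cited.

```python
from typing import Dict, Set

# Hypothetical OTU table: sample -> {OTU id: read count}. OTUs are assumed
# to have been clustered beforehand at a fixed identity threshold (e.g. 97%).
OtuTable = Dict[str, Dict[str, int]]


def core_otus(table: OtuTable, min_occupancy: float = 1.0) -> Set[str]:
    """Return OTUs detected (count > 0) in at least `min_occupancy` of samples.

    min_occupancy=1.0 gives a strict 'ubiquitous' core; lower values give a
    'nearly ubiquitous' core. Abundance is ignored beyond presence/absence.
    """
    n_samples = len(table)
    if n_samples == 0:
        return set()
    occurrences: Dict[str, int] = {}
    for counts in table.values():
        for otu, count in counts.items():
            if count > 0:
                occurrences[otu] = occurrences.get(otu, 0) + 1
    return {otu for otu, k in occurrences.items() if k / n_samples >= min_occupancy}


# Invented toy data: OTU_A occurs everywhere but is rare in two samples.
samples: OtuTable = {
    "s1": {"OTU_A": 120, "OTU_B": 3, "OTU_C": 0},
    "s2": {"OTU_A": 2, "OTU_B": 50},
    "s3": {"OTU_A": 1, "OTU_C": 40},
}

print(sorted(core_otus(samples)))       # ['OTU_A']
print(sorted(core_otus(samples, 0.6)))  # ['OTU_A', 'OTU_B']
```

OTU_A is ubiquitous without being abundant, mirroring the oral and stool pattern described above. Because any nonzero count counts as presence, the result depends on sequencing depth, which is the sampling limitation that Gibbons et al. (2013) probed.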
Several causes beyond the sampling limitations probed by Gibbons et al. (2013) could contribute to the apparent lack of a ‘core’ in the many habitats examined. Dispersal limitation and biogeography may play a role (Hanson et al., 2012), with groups such as Pseudomonas (Cho & Tiedje, 2000) and Burkholderia (Pearson et al., 2009) showing strong evidence of spatial structuring. Habitat definitions such as ‘soil’ and ‘gut’ are clearly too broad: soil microbial diversity is strongly influenced by pH (Fierer & Jackson, 2006) and microhabitat (Carson et al., 2009; Dennis et al., 2009; Reim et al., 2012), and the composition of gut microbiota appears to depend strongly on factors such as diet (Muegge et al., 2011; Wu et al., 2011; Claesson et al., 2012) and the section of the gut that is sampled (Stearns et al., 2011). Although there is no core ‘gut’ microbiome, there may yet be a core ‘healthy transverse colon with high protein and animal fat inputs’ microbiome. Succession may also play a role, as seen for instance in the colonization of dental plaque: the same site can be occupied by ‘early’ or ‘late’ communities that emerge following a disturbance (Human Microbiome Project Consortium, 2012; Teles et al., 2012). Succession was also observed in the multiyear fermentation of American coolship ale, which showed a reproducible pattern of bacterial and yeast species (Bokulich et al., 2012). Finally, the lack of a core may reflect different outcomes of the lottery processes described earlier. Observed assemblages may reflect different initial colonization events, with the first organisms to become established potentially structuring the remainder of the community. Positively correlated groups of lineages exist, such as the ‘coabundance groups’ defined by Claesson et al. (2012) and the groups of organisms identified in network analyses (Steele et al., 2011; Faust et al., 2012; Friedman & Alm, 2012). Their existence does not distinguish between these alternative scenarios. It does, however, suggest that the members of these groups either interact positively with one another and constitute a real community, or interact with the environment in similar ways such that all are favored under the same conditions. The observed patterns also support the idea that taxonomic and phylogenetic approaches alone may be insufficient to understand the microbial ecology of a particular habitat (Shade & Handelsman, 2012).
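Coabundance groups and co-occurrence networks of the kind cited above start from pairwise correlations between taxon abundance profiles. The Python sketch below is a deliberately simplified stand-in. It applies a centred log-ratio (CLR) transform per sample, computes Pearson correlations, and keeps pairs whose correlation is at or above a threshold. This is not SparCC (Friedman & Alm, 2012) or the exact procedure of any cited study. The pseudocount, the 0.8 threshold, the function names, and the toy counts are all assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def clr(counts: List[float], pseudocount: float = 0.5) -> List[float]:
    """Centred log-ratio transform of one sample; the pseudocount avoids log(0)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def coabundance_edges(
    table: Dict[str, List[int]], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return taxon pairs whose CLR-transformed profiles correlate >= threshold.

    `table` maps taxon -> read counts, with samples in the same order for every taxon.
    """
    if not table:
        return []
    taxa = list(table)
    n_samples = len(table[taxa[0]])
    # CLR is applied per sample (column), across all taxa in that sample.
    columns = [clr([table[t][j] for t in taxa]) for j in range(n_samples)]
    profiles = {t: [columns[j][i] for j in range(n_samples)] for i, t in enumerate(taxa)}
    edges = []
    for a, b in combinations(taxa, 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Invented counts: A and B peak in the first two samples, C and D in the last two.
toy = {
    "A": [90, 80, 10, 5],
    "B": [45, 40, 5, 2],
    "C": [5, 10, 80, 90],
    "D": [2, 5, 40, 45],
}
for a, b, r in coabundance_edges(toy):
    print(a, b, r)
```

The sketch recovers two groups, {A, B} and {C, D}. As the text notes, an edge alone cannot tell whether the taxa interact with one another or simply share an environmental preference.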
Functional traits in microbial assemblages and communities
If a taxonomic or phylogenetic view fails to resolve a consistent set of community properties, trait-based approaches might yield more coherent results. The ecotype model of Cohan (2002, 2006) retains the requirement that entities constitute clades, but provides a useful working notion of a set of organisms subject to similar evolutionary pressures because of their close relatedness and ecological similarity. However, the evolutionary dynamics of microorganisms allow for rapid change that may bring disparate lineages into conflict, especially if one lineage acquires a particular function of another via LGT. It may therefore be more straightforward to focus on ecological similarities, approximated by function defined at one or more levels of organization. How can we integrate functional similarities into a community analysis?
Functional overlap in spite of the apparent lack of an organismal core among samples from the same habitat already provides convincing arguments in favor of a focus on ecological similarities. A recent example comes from the microbial communities associated with the green alga Ulva australis. Only six OTUs were present in all sampled habitats (Burke et al., 2011b), and species similarity between samples averaged only 15%, yet functional similarity across habitats was 70%. These functions spanned categories such as motility, cell adhesion, biofilm formation, interaction with the host, and mechanisms of LGT (Burke et al., 2011a). The proteins involved in these functions were often phylogenetically distinct in different samples, suggesting functional convergence in disparate lineages. Similar consistency of function has been observed for membrane proteins in the ocean. Patel et al. (2010) found correlations between transport proteins and inorganic chemical concentrations, but no corresponding link with species abundances. These functional profiles also correlated with environmental attributes including pollution, potentially allowing these gene abundances to be used to predict such events. Barberán et al. (2012a) reported on cases where 16S rRNA gene profiles failed to differentiate marine microbial communities. In those cases, genomic traits such as G+C content, genome size, and protein composition produced markedly different beta-diversity patterns. These traits better discriminated coastal from open-ocean samples, and samples from the Atlantic, Pacific, and Indian oceans. Finally, the clinical relevance of a shift from taxonomy-based to trait-based community ecology has already been demonstrated. Functional analyses and metagenomic linkage groups have been used to distinguish the microbiomes of type 2 diabetes patients from those of healthy individuals (Qin et al., 2012).
The above examples all imply that within a given environment certain functional repertoires, defined either by collections of genes or by genomic properties, may be selected for and thus should be the focus of comparisons between habitats.
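The contrast between low taxonomic similarity and high functional similarity can be illustrated with a single dissimilarity measure applied to two kinds of profile. The Python sketch below uses Bray-Curtis dissimilarity, a common choice for abundance data. It is not necessarily the measure used by Burke et al. or Barberán et al. The mapping from taxa to functions (`gene_content`) is hypothetical and stands in for annotated metagenomic data. Function abundances are approximated by summing the abundances of the taxa assumed to carry each function.

```python
from typing import Dict, Set


def bray_curtis(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical, 1 = disjoint)."""
    keys = set(x) | set(y)
    num = sum(abs(x.get(k, 0.0) - y.get(k, 0.0)) for k in keys)
    den = sum(x.get(k, 0.0) + y.get(k, 0.0) for k in keys)
    return num / den if den > 0 else 0.0


def functional_profile(
    taxon_counts: Dict[str, float], gene_content: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Sum each taxon's abundance into every function it is assumed to carry."""
    profile: Dict[str, float] = {}
    for taxon, count in taxon_counts.items():
        for func in gene_content.get(taxon, set()):
            profile[func] = profile.get(func, 0.0) + count
    return profile


# Hypothetical gene content: unrelated taxa carry equivalent functions.
gene_content = {
    "taxon1": {"motility", "adhesion"},
    "taxon2": {"biofilm"},
    "taxon3": {"motility", "adhesion"},
    "taxon4": {"biofilm"},
}
s1 = {"taxon1": 50, "taxon2": 50}
s2 = {"taxon3": 60, "taxon4": 40}

print(bray_curtis(s1, s2))  # 1.0 (taxonomic: no shared taxa)
f1, f2 = functional_profile(s1, gene_content), functional_profile(s2, gene_content)
print(round(bray_curtis(f1, f2), 3))  # 0.097 (functional)
```

The two invented samples share no taxa, yet their functional profiles are nearly identical. This is the kind of pattern in which a trait-based comparison separates or groups habitats differently from a taxonomic one.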
Although individual genes or ab initio-generated combinations of genes may be predictive of phenotype or ecological role (e.g. MacDonald & Beiko, 2010), analyses that treat genes as uncorrelated entities will not always succeed in identifying important functional traits. For instance, Muegge et al. (2011) found that fecal microbial communities from a diverse range of mammals clustered by diet type when 16S signatures were considered, but not when genes were summarized across all functional categories. Aggregating genes into pathways and metabolic modules exploits known associations between genes. It also allows incorrect predictions to be corrected via gap filling and by screening out unlikely or redundant pathways (Ye & Doak, 2009; Abubucker et al., 2012). At the level of sequenced genomes, pathway- and module-based analyses have identified important functional correlates of periodontal disease (Kastenmüller et al., 2009). It is essential to choose the right trait definition for the question under scrutiny. Conserved traits are often assumed to track genome or organism evolution and thus may be expected to correlate with a wide range of genomic properties and functions (Langille et al., 2013). By contrast, functional genes, pathways, or modules obtained from WGS data convey information about a distinct set of traits. These traits need not correlate with the phylogenetic relationships implied by the 16S rRNA gene or other marker genes. To the extent that these different types of information can generate distinct and conflicting patterns, it may be worth combining them in a single analysis.
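The screening out of redundant pathways can be framed as a parsimony problem: find the smallest set of pathways that explains the observed gene families. MinPath (Ye & Doak, 2009) formulates this as an integer programming problem. The Python sketch below instead uses a simpler greedy set-cover approximation, which is not guaranteed to find the true minimum. The tie-breaking rule, which prefers the pathway with a larger fraction of its member genes observed, is a hypothetical completeness heuristic added for illustration. The pathway definitions and gene identifiers are invented.

```python
from typing import Dict, List, Set


def minimal_pathways(observed: Set[str], pathways: Dict[str, Set[str]]) -> List[str]:
    """Greedy approximation of a parsimonious pathway set.

    Repeatedly picks the pathway that explains the most still-unexplained
    observed genes; ties go to the pathway with the higher fraction of its
    member genes observed (a crude completeness heuristic).
    """
    # Only genes that belong to at least one pathway can be explained.
    uncovered = observed & set().union(*pathways.values())
    chosen: List[str] = []

    def score(p: str):
        members = pathways[p]
        coverage = len(members & observed) / len(members) if members else 0.0
        return (len(members & uncovered), coverage)

    while uncovered:
        best = max(pathways, key=score)
        chosen.append(best)
        uncovered -= pathways[best]
    return chosen


# Invented example: P3 is redundant with P1; P4 is a less complete
# alternative to P2 for explaining g4.
observed = {"g1", "g2", "g3", "g4"}
pathways = {
    "P1": {"g1", "g2", "g3"},
    "P2": {"g3", "g4"},
    "P3": {"g2"},
    "P4": {"g4", "g5"},
}
print(minimal_pathways(observed, pathways))  # ['P1', 'P2']
```

Reporting only the chosen pathways avoids inflating functional diversity with pathways whose genes are already accounted for. This is the motivation behind the parsimony-based screening described in the text.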
How important are individual genes as mediators of community functions or interactions? Within a single cell, genes and gene products interact in a multitude of ways. Examples include direct chemical interaction, participation in the same biochemical pathway, transcriptional regulation, protein folding and refolding, and subcellular localization. These interactions constrain the evolutionary trajectory of genes. For example, the complexity hypothesis (Jain et al., 1999) predicts that genes whose products have many interactions are less likely to undergo LGT. This implies lower LGT frequencies for ‘informational’ genes, which tend to participate in large complexes such as the ribosome, than for ‘operational’ genes with key metabolic and regulatory roles. This idea was made more explicit by Cohen et al. (2011), who showed that connectivity rather than function was the crucial determinant of gene transferability. Their finding is consistent with the frequent transfer of aminoacyl-tRNA synthetases, which are informational but have few interaction partners in the cell (Woese et al., 2000; Andam & Gogarten, 2011).
In applying these insights from sequenced genomes to microbial communities, a central question is how interactions among gene products can mediate different types of interaction between community members. Gene loss and gene transfer according to the PGH and BQH, along with the processes of duplication and substitution, can lead to the formation of new community interactions. Several such examples were outlined above in the ecology of dechlorinating communities, insect endosymbionts, biofilms, and SAR11. Cross-feeding is an obvious example of a microbial interaction, but some described or implied interactions are more complex and difficult to elucidate. For example, targeted studies of homologous genes from environmental samples have revealed remarkable and seemingly stable sequence diversity (Sabehi et al., 2003; Atamna-Ismaeel et al., 2008; Gabor et al., 2012). This diversity suggests niche specialization (Bielawski et al., 2004) and the potential for rapid changes in nutrient sensitivity and host defense. The variation in these sequences is small, and they occur in closely related strains that may possess identical 16S rRNA genes. The effects of these variations are therefore likely to depend on subtle differences in enzyme specificity or kinetics. Although transcription factors are unlikely to migrate between cells, there have been remarkable demonstrations of the ability of one taxon to induce significant changes in another, with dramatic ecological consequences. An example is seen in the lungs of cystic fibrosis patients, who are subject to periodic exacerbations of the disease that lead to permanent declines in pulmonary function (Goss & Burns, 2007). With Pseudomonas aeruginosa as a primary pathogen of interest, researchers have identified a class of organisms collectively termed ‘synergens’, including the Streptococcus milleri group. On their own, synergens have neutral to positive impacts on hosts, but they increase mortality rates when combined with P. aeruginosa (Sibley & Surette, 2011). The specific interactions that induce the shift in pathogenic status remain to be elucidated, although transcriptional profiling under different association conditions should be highly informative (Duan et al., 2012).