Origins of bacterial diversity through horizontal genetic transfer and adaptation to new ecological niches


  • Editor: Fernando Baquero

Correspondence: Frederick M. Cohan, Department of Biology, Wesleyan University, Middletown, CT 06459-0170, USA. Tel.: +1 860 685 3482; fax: +1 860 685 3279; e-mail:


Horizontal genetic transfer (HGT) has played an important role in bacterial evolution at least since the origins of the bacterial divisions, and HGT still facilitates the origins of bacterial diversity, including diversity based on antibiotic resistance. Adaptive HGT is aided by unique features of genetic exchange in bacteria such as the promiscuity of genetic exchange and the shortness of segments transferred. Genetic exchange rates are limited by the genetic and ecological similarity of organisms. Adaptive transfer of genes is limited to those that can be transferred as a functional unit, provide a niche-transcending adaptation, and are compatible with the architecture and physiology of other organisms. Horizontally transferred adaptations may bring about fitness costs, and natural selection may ameliorate these costs. The origins of ecological diversity can be analyzed by comparing the genomes of recently divergent, ecologically distinct populations, which can be discovered as sequence clusters. Such genome comparisons demonstrate the importance of HGT in ecological diversification. Newly divergent populations cannot be discovered as sequence clusters when their ecological differences are coded by plasmids, as is often the case for antibiotic resistance; the discovery of such populations requires a screen for plasmid-coded functions.


The evolution of bacteria is not just the evolution of animals and plants writ small. The origin of bacterial species is accelerated by unique features of bacterial genetics, perhaps the most important being the ability of bacteria to readily acquire genes from other organisms (Ochman & Davalos, 2006; Cohan & Koeppel, 2008). Fully sequenced genomes reveal that a substantial fraction of ORFs have been horizontally transferred (Nakamura et al., 2004; McDaniel et al., 2010), and many of these acquisitions are thought to have driven the origins of new bacterial species (Gogarten et al., 2002).

Early evidence of the importance of horizontal genetic transfer (HGT) in bacterial evolution was seen in the spread of penicillin resistance through plasmid transfer across the Enterobacteriaceae (Datta & Kontomichalou, 1965). Bacteriologists might have interpreted this rapid spread of resistance as evidence for the general importance of HGT in bacterial evolution, but the extent and impact of HGT were not fully appreciated until much later. With eventual access to genome sequences, it became clear that HGT occurs throughout the genome, and has been responsible for the origins of extremely diverse adaptations (Ochman et al., 2000). Moreover, HGT has played a role in bacterial evolution at least since the origins of the bacterial divisions (Gogarten et al., 2002). For example, methanotrophs acquired the ability to synthesize some of the cofactors for methane utilization from methanogenic Archaea (Chistoserdova et al., 1998; Gogarten et al., 2002).

More recent HGT events have resulted in important ecological differences between closely related species and between populations within a single recognized species taxon (Welch et al., 2002). For example, the virulence factors that distinguish Salmonella from Escherichia coli were largely acquired by HGT (Groisman & Ochman, 1996; Gogarten et al., 2002). Also, antibiotic resistance factors coded on plasmids distinguish extremely close relatives within recognized species taxa. For many bacterial pathogens, the acquisition of antibiotic resistance is likely necessary to survive in the ecological niche of agricultural animals and humans frequently treated by antibiotics (O'Brien, 2002).

Through HGT, divergent populations can share an adaptation whose value transcends their differences in physiological capabilities, cellular structures, and ecological niches. This sharing of niche-transcending genes does not generally make the two populations more similar ecologically. Instead, HGT allows the recipient to build on its unique, pre-existing adaptations to either invade a new niche or to improve its performance in its current niche (Cohan & Koeppel, 2008). For example, enterotoxigenic E. coli, which attacks the epithelial cells of the small intestine, has shared the Class 5 fimbriae by HGT with Burkholderia cepacia (Anantha et al., 2004), which can reside in human lungs of cystic fibrosis patients, attacking the respiratory epithelium. Acquiring these niche-transcending genes thus allows each lineage to better attack cells of its respective niche, but does not cause the donor and the recipient to converge ecologically (Cohan & Koeppel, 2008; Cohan, 2011). Likewise, when ecologically disparate human pathogens acquire the same antibiotic resistance factors by HGT (Fondi & Fani, 2010), their ecological niches are not converging beyond their response to natural selection by antibiotics.

Here, we will review the properties of bacteria that make HGT so effective in fostering adaptations and the origins of new ecological populations. We will review the classes of adaptations most and least likely to be delivered by HGT, and the ecological and phylogenetic limits on genetic transfer. Also, we will review the evolutionary challenges on the recipient to accommodate horizontally acquired adaptations. Finally, we will review evidence from genome content comparisons that demonstrate a prominent role of HGT in the origins of bacterial diversity, and how genome comparisons have led to the discovery of the ecological dimensions by which bacterial divergence occurs.

The qualities of bacterial genetic exchange that foster adaptive HGT

Exchange of genes in bacteria is both rare and promiscuous. A broad survey of recombination rates has shown that recombination, even among close relatives, occurs at a per capita per gene rate that is generally close to the rate of mutation, and rarely more than about 10 times the rate of mutation (Vos & Didelot, 2009). The rarity of recombination on a per capita basis does not prevent acquisition of adaptations from other organisms, as the enormous population sizes of many bacteria can bring unlikely recombination events within reach (Levin & Bergstrom, 2000). Moreover, recombination in bacteria is too rare to hinder the ecological diversification of closely related populations. This is because natural selection against maladaptive, niche-specifying genes (genes that are adaptive only in the context of their home population) from other populations easily keeps these foreign genes at negligible frequencies (Cohan, 1994; Cohan & Koeppel, 2008) (Box 1).

Box 1. Why recombination does not hinder the adaptive divergence of bacterial populations
Haldane long ago showed why the rare introduction of niche-specifying alleles from one population to another cannot reverse the adaptive divergence between populations: the equilibrium frequency (p*) of a maladaptive, niche-specifying allele from another population is equal to its rate of entry into the population (m) divided by the selection intensity disfavoring the foreign allele (s), i.e. p* = m/s (Haldane, 1932). In the case of bacteria, the rate of entry of another population's allele is equal to the rate of recombination between populations (c_b) (Cohan, 1994). A broad survey of recombination rates in the Bacteria and Archaea showed that recombination generally occurs at about the same rate as mutation and never greater than about 10 times the rate of mutation (per gene per generation) (Vos & Didelot, 2009), which is about 10⁻⁶ per gene per generation (Drake, 2009). This analysis of recombination rates took special care to estimate only recombination rates among closest relatives, and so the recombination rates may be taken as estimates of the rate of recombination within populations (c_w). Thus, the estimate of recombination rate at 10⁻⁶ per gene per generation may be taken as an upper limit for the rate of recombination between populations (c_b ≤ c_w). If foreign alleles are disfavored even by weak selection, for example around 10⁻², the equilibrium frequency of foreign alleles would be negligible, in this case around p* = c_b/s ≤ c_w/s, or 10⁻⁶/10⁻² = 10⁻⁴.
Thus, even if recombination between populations is occurring at as high a rate as recombination within populations, each population will be able to hold onto its respective combinations of genes, which adapt it to its respective way of making a living. Thus, adaptive divergence in bacteria does not require sexual isolation (Cohan, 1994; Cohan & Koeppel, 2008). This argument has been ignored by workers claiming a central role for recombination and sexual isolation in bacterial divergence (Fraser et al., 2007; Sheppard et al., 2008), but the argument has never been refuted.
Nevertheless, recombination is an important force fostering adaptive evolution in bacteria. Recombination of niche-transcending genes has been shown to be an important means of introduction of adaptations into bacterial populations. The difference is that maladaptive recombination of niche-specifying genes can be rejected by natural selection, but rare, adaptive introductions of niche-transcending genes are amplified by natural selection.
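The arithmetic of Box 1 can be reproduced in a few lines. The sketch below is illustrative only: the function name is hypothetical, the cap at 1 is an added safeguard (p* = m/s is an approximation that holds when the entry rate is much smaller than the selection intensity), and the range of selection intensities is chosen here for illustration rather than taken from the cited studies.

```python
def equilibrium_frequency(entry_rate: float, selection: float) -> float:
    """Haldane's balance between allele entry and selection: p* = m / s.

    entry_rate -- rate at which the foreign allele enters the population (m);
                  for bacteria this is the between-population recombination rate
    selection  -- selection intensity against the foreign allele (s)

    The result is capped at 1.0 (an assumption of this sketch), because the
    simple ratio is only meaningful when m is much smaller than s.
    """
    if entry_rate < 0 or selection <= 0:
        raise ValueError("entry_rate must be >= 0 and selection must be > 0")
    return min(1.0, entry_rate / selection)


# Upper limit on between-population recombination used in Box 1
# (taken as equal to the within-population rate, per gene per generation).
C_W = 1e-6

if __name__ == "__main__":
    # Illustrative selection intensities, from fairly strong to very weak.
    for s in (1e-1, 1e-2, 1e-3, 1e-4):
        print(f"s = {s:g}: p* <= {equilibrium_frequency(C_W, s):.0e}")
```

With s = 10⁻², the function returns the 10⁻⁴ given in Box 1. Even with selection a hundred times weaker, the foreign allele would remain at a frequency of about 1%.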

Bacterial recombination is much more promiscuous than recombination in higher organisms, where it is always limited to very close relatives (usually members of the same species or very closely related species) (Mallet et al., 2007; Mallet, 2008). Bacterial recombination can extend across the bacterial divisions and even across the three domains of life (Garcia-Vallve et al., 2000; Rest & Mindell, 2003). Thus, bacterial recombination can foster the acquisition of adaptations from both close and distant relatives.

Transferred DNA is generally short, often the length of one to several genes. A short length helps to enable the success of adaptive HGT between deeply divergent bacteria, as it allows a recipient to pick up a niche-transcending gene (or set of genes) without also acquiring the niche-specifying genes of the donor, which would be maladaptive in the recipient (Zawadzki & Cohan, 1995; Cohan & Koeppel, 2008). This is in contrast to adaptive introgression in animals and plants. Because eukaryotic meiosis involves a 50 : 50 mix of the two parents' genomes, several generations of backcrossing are required to yield a pure transfer of niche-transcending genes across species (Rieseberg et al., 2003); in bacteria, a set of niche-transcending genes can be transferred, without other genes, in a single transfer event (Cohan, 2001).

Donor–recipient similarities required for genetic transfer

The promiscuity of bacteria and the opportunity to transfer adaptations across taxa are not absolute. Certain shared characteristics between donor and recipient are required for transfer to take place (Cohan, 2002). In some cases, genetic exchange takes place through a plasmid or a phage vector, whereby the vector incorporates genetic material from one host, the vector then infects another host, and the genetic material becomes recombined into the new host (Majewski, 2001). Clearly, vector-based genetic exchange between bacteria can take place only if the bacteria share their vectors (Cohan, 2002). The host ranges of phage and plasmids are generally narrow, but the host ranges of certain phage extend across bacterial genera (Jensen et al., 1998), and some plasmids have extremely broad host ranges that extend across bacterial divisions (Norman et al., 2009).

Sharing of vectors between hosts is challenged by bacterial restriction–modification systems. Many bacteria are equipped with a restriction endonuclease that recognizes one specific DNA sequence, usually a palindromic tetramer (e.g. GGCC), hexamer, or octamer; these bacteria are also equipped with a modification enzyme that methylates the recognition sequence and thereby protects the cell's DNA from its own restriction enzyme (Weiserová & Ryu, 2008). Thus, bacteria that share the same restriction–modification system can more easily share a phage or a plasmid, whereas bacteria with different restriction–modification systems will effectively digest the DNA of one another's vectors (Jeltsch, 2003; Budroni et al., 2011). Nevertheless, short plasmid and phage genomes may be shared across bacteria differing in their restriction–modification systems, as short vectors are less likely to contain a given recognition sequence and are thereby more protected from cleavage (Lacks & Springhorn, 1984). Even some larger plasmids, including the broad-host-range Inc-P1 plasmids, are able to defend against restriction endonucleases by removing restriction target sites from their sequences (Wilkins et al., 1996), whereas others simulate the host modification system, protecting their sequences from cleavage (Kruger & Bickle, 1983), thus contributing to their promiscuity.
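Why short vectors tend to escape restriction can be illustrated with a simple calculation. The sketch below assumes a random vector sequence with equal frequencies of the four bases and uses a Poisson approximation for the chance of carrying no recognition site; neither assumption comes from the cited studies, and the vector lengths in the loop are hypothetical.

```python
import math


def expected_sites(vector_length: int, site_length: int) -> float:
    """Expected number of copies of one specific recognition sequence.

    Assumes a random sequence with equal base frequencies, so a given site of
    length k matches any one position with probability 1 / 4**k.
    """
    positions = max(0, vector_length - site_length + 1)
    return positions / 4 ** site_length


def prob_no_site(vector_length: int, site_length: int) -> float:
    """Poisson approximation to the probability that the vector has no site."""
    return math.exp(-expected_sites(vector_length, site_length))


if __name__ == "__main__":
    # Hypothetical vector sizes in bp, compared for a hexamer recognition site.
    for length in (2_000, 5_000, 50_000):
        print(
            f"{length:>6} bp: expected sites = {expected_sites(length, 6):.2f}, "
            f"P(no site) = {prob_no_site(length, 6):.3g}"
        )
```

Under these assumptions, the chance of carrying no site falls off exponentially with vector length, which is consistent with the protection of short plasmid and phage genomes described above.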

In some taxa (e.g. Bacillus and Neisseria), genetic exchange can occur by transformation, through uptake of naked DNA from the environment. Most transforming bacteria are able to take up DNA indiscriminately (Lorenz & Wackernagel, 1994), but some species are selective about the DNA sequences that are allowed to cross the membrane. For example, in Neisseria gonorrhoeae, successful transformation requires a 10-bp uptake sequence that occurs abundantly in the genome of this species taxon. The requirement for this uptake sequence effectively prevents genetic transfer from divergent organisms (Hamilton & Dillard, 2006).

Once present in the cytoplasm, transferred DNA must ensure its replication so that it is not lost. For DNA that enters a recipient cell on a plasmid, replication may occur by simply remaining on the plasmid, as seen in many plasmids with antibiotic resistance factors (Fondi & Fani, 2010). Alternatively, if transferred DNA is contained within a pair of insertion sequences on a plasmid or a phage, the DNA flanked by the insertion sequences may replicate and then transfer to the recipient chromosome, without a need for homology between donor and recipient DNA (Vo et al., 2010).

Otherwise, entering DNA may integrate into the recipient chromosome through homologous recombination. This requires a ‘minimum efficient processing segment’ (MEPS) consisting of near-identical sequences of at least 25 bp at one or both ends of a donor segment (Shen & Huang, 1986; Majewski & Cohan, 1999). The requirement of near identity limits the divergence of genetic material transferred through homologous recombination, effectively allowing only close relatives (with <20% nucleotide divergence) to exchange genes in this manner (Majewski & Cohan, 1999). The MEPS requirement has resulted in an exponential decay in the recombination rate as sequence divergence increases, as seen in Bacillus (Roberts & Cohan, 1993; Zawadzki et al., 1995; Majewski & Cohan, 1999), Escherichia (Vulić et al., 1997, 1999), and Streptococcus (Majewski et al., 2000a), although requirements of near identity may be relaxed in hypermutator strains lacking mismatch repair (Matic et al., 2000).
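The link between the MEPS requirement and the decay of recombination with divergence can be sketched numerically. The code below is a deliberately simplified illustration and not the model fitted in the cited studies: it treats the MEPS as a run of exact identity (the text says near-identical), it expects two sequences that are already aligned and of equal length, it looks for the run anywhere in the alignment rather than only at the ends, and it assumes mismatches fall independently at each site. All function names are hypothetical.

```python
MEPS_LENGTH = 25  # minimum run of identity in bp, as stated in the text


def longest_identical_run(donor: str, recipient: str) -> int:
    """Longest stretch of consecutive matching positions in two aligned sequences."""
    if len(donor) != len(recipient):
        raise ValueError("sequences must be pre-aligned and of equal length")
    longest = current = 0
    for d, r in zip(donor, recipient):
        current = current + 1 if d == r else 0
        longest = max(longest, current)
    return longest


def passes_meps(donor: str, recipient: str, meps: int = MEPS_LENGTH) -> bool:
    """True if the alignment contains a run of exact identity of at least `meps` bp."""
    return longest_identical_run(donor, recipient) >= meps


def prob_window_identical(divergence: float, meps: int = MEPS_LENGTH) -> float:
    """Chance that one given window of `meps` sites contains no mismatch.

    Assumes each site differs independently with probability `divergence`.
    """
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must lie between 0 and 1")
    return (1.0 - divergence) ** meps


if __name__ == "__main__":
    for d in (0.0, 0.05, 0.10, 0.20):
        print(f"divergence {d:.2f}: P(identical window) = {prob_window_identical(d):.2e}")
```

Because (1 − d)^25 is roughly exp(−25d) for small d, the chance of finding a usable window drops approximately exponentially as divergence d increases, in line with the pattern described above.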

The requirement for near identity in homologous recombination is expected to reduce the transfer of homologous adaptations, such as penicillin resistance in Streptococcus, which occurs through a mutational change in the target protein (the so-called ‘penicillin-binding protein’) (Maynard Smith et al., 1991). Resistance alleles have been transferred within the genus through homologous recombination (Maynard Smith et al., 1991; Hoffman-Roberts et al., 2005; Barlow, 2009), and presumably, the rate of transfer across species taxa was reduced by their sequence divergence.

Sequence divergence of shared genes may also limit the transfer of heterologous genes. This is because homologous recombination can facilitate the transfer of heterologous genes, particularly when a heterologous gene is sandwiched between homologous genes shared by the donor and the recipient [i.e. homology-facilitated illegitimate recombination (HFIR)] (de Vries & Wackernagel, 2002). Provided that a donor segment's end sequences are homologous with recipient sequences and they pass the MEPS criterion, any genetic material sandwiched between the end segments may be cotransferred (Majewski & Cohan, 1999).

While acquisition of novel genes by HGT is possible between extremely divergent organisms, HGT occurs much more frequently between close relatives than between more distant relatives. For example, in the gammaproteobacterium Legionella pneumophila, the number of genes acquired by HGT events from a donor taxon is correlated with the phylogenetic distance (based on 16S rRNA gene divergence) between L. pneumophila and the donor (Coscollá et al., 2011). More broadly, a recent analysis of 657 sequenced prokaryotic genomes clearly showed that HGT events are much more common among close relatives than distant relatives (Popa et al., 2011) (Fig. 1).

Figure 1.

 The relationship between genome sequence similarity and the proportion of species pairs that engaged in HGT. The ‘donor–recipient’ pairs represent the pairs of species that were connected by at least one HGT event, based on 32 028 HGT events that could be attributed to a recipient and a donor organism, among 657 sequenced prokaryotic genomes; ‘disconnected pairs’ represent those species pairs that did not show evidence of any HGT events. The decrease in the number of species pairs apparently engaging in HGT among closest relatives (from 50% to 100% genome sequence similarity) was attributed to the difficulty in identifying HGT events among close relatives (Popa et al., 2011) (used with permission from Cold Spring Harbor Laboratory Press).

A close phylogenetic relationship fosters HGT in at least two ways. First, close relatives will have greater sequence identity in their shared genes, thus increasing their rate of homologous recombination (Zawadzki et al., 1995; Vulić et al., 1997; Majewski & Cohan, 1998, 1999; Majewski et al., 2000b), as well as HFIR (de Vries & Wackernagel, 2002). Second, close relatives are more likely to share the same habitat than distant relatives; even members of different families within the same order are more likely to share habitats than more distant relatives (Philippot et al., 2010). Because organisms can exchange genes only when they are in the same habitat (Matte-Tailliez et al., 2002), phylogenetic closeness tends to promote genetic exchange through a sharing of environments.

What adaptations can and cannot be widely transferred?

Beyond the requirement of similarity between donor and recipient, there are limitations on the kinds of adaptations that can be transferred. The various genes must fit on a chromosomal segment short enough to be transferred or they must fit on a mobile extrachromosomal element; also, the set of genes must be functional when it arrives in a new organism (Lawrence, 1999). The selfish operon model predicts that natural selection will favor the contiguous arrangement of a functionally related set of genes as an operon, enabling the gene set to be successfully transferred as a single, functional element across taxa (Lawrence & Roth, 1996; Lawrence, 1997, 1999). In this model, natural selection acts at the level of the operon, preserving the transferability of the unit, such that an operon can spread widely across the bacteria. Indeed, certain operons have passed between organisms at a high rate (Homma et al., 2007).

An operon may expand to include more functions through HGT acquisition of additional genes (Homma et al., 2007); when this occurs, it is generally the first step in the pathway that is gained, such as the gaining of the fucP gene in the fucPIKUR operon in E. coli (for the digestion of fucose) (Pál et al., 2005). In addition, regulators of newly added genes may be gained along with the genes they control (Price et al., 2008). The co-incorporation of newly acquired protein-coding genes along with their regulators may increase the successful transferability of an enlarged operon (Price et al., 2008).

Transferable gene clusters often contain a set of genes that are all involved in processing a single resource molecule. This is exemplified by the lac operon, providing inducible metabolism of environmental lactose (Jacob & Monod, 1961). The lac operon includes protein-coding genes for a lactose-binding repressor, which shuts down the operon in the absence of lactose; a lactose permease, allowing for the uptake of lactose; and two proteins for catabolizing lactose in different directions: one that cleaves lactose to its constituent monosaccharides and another that transacetylates lactose. Another motif for a transferable cluster is an entire biochemical pathway, for example the full synthetic pathway for cytochrome c biogenesis (Goldman & Kranz, 1998). In both of these cases, a contiguous set of genes constitutes a transferable adaptation.

Multiple antibiotic resistance genes are frequently arranged in clusters on plasmids (Gomez-Lus, 1998; Barlow, 2009). While not strictly operons in the sense that the various contiguously arranged antimicrobial-resistance genes on a plasmid may be regulated independently (Gomez-Lus, 1998), the contiguous organization of such gene sets can be explained through an extension of the selfish operon theory. The association of different antimicrobial resistance genes may contribute to a single, selectable function. Carrying resistance for multiple antibiotics is a highly successful strategy for human pathogens likely to be targeted by multiantibiotic therapy, either simultaneously or in a cycling regimen (Lawrence, 2000; Summers, 2006).

Surveys of the genes and adaptations that have been most and least frequently transferred demonstrate some of these principles of transferability. In a survey of completely sequenced genomes, the majority of genes identified as recently transferred were classified into four functional categories: plasmid, phage, and transposon functions; cell envelope functions; regulatory functions; and cellular processes (Nakamura et al., 2004). Within the cell envelope and regulatory function categories, commonly transferred genes include those encoding cell surface structures, biosynthesis and degradation of surface polysaccharides, and DNA interactions. Frequently transferred cell surface proteins may be understood as niche-transcending adaptations, as they are often the targets of immune systems, and a newly acquired surface protein may allow immune escape (Stein et al., 2010). The most commonly transferred cellular process genes were those involved in DNA transformation, pathogenesis, toxin production, and resistance (Nakamura et al., 2004). In the case of genes coding for pathogenesis, toxin production, and resistance, we may interpret their transferability in terms of providing niche-transcending adaptations for invading new environments or for resisting new challenges appearing in an organism's present environment, such as antibiotics.

The survey by Nakamura et al. (2004) also showed that plasmid, phage, and transposon functions comprised nearly a third of the transferred genes. Many of these transferred genes arrive through infection by these subcellular particles and provide no adaptation for bacteria. In one special case, bacteria can acquire defense against phage through HGT of gene segments from phage that have infected them; this is the modus operandi of the clustered regularly interspaced short palindromic repeats (CRISPR) system (Marraffini & Sontheimer, 2008) (Box 2). CRISPR units consist of a series of short repeats separated by spacer sequences that are derived most often from bacteriophage or plasmids (Bolotin et al., 2005; Mojica et al., 2005; Pourcel et al., 2005), and provide defense against the phages or plasmids from which they are derived (Barrangou et al., 2007). The CRISPR loci are transcribed into RNA segments, which are cleaved into smaller units that each target the complementary sequence on the phage (Brouns et al., 2008) by base-pairing with either the phage's DNA (Marraffini & Sontheimer, 2008) or RNA (Hale et al., 2009). Thus, the CRISPR system represents a form of acquired defense, where infection of a bacterium by a phage may be followed by the incorporation of phage DNA into new spacer regions of the CRISPR system, thereby providing defense against related phage in the future (Sorek et al., 2008; van der Oost et al., 2009; Horvath & Barrangou, 2010).

Box 2. Evolution of the CRISPR system through HGT
CRISPR modules are an acquired immunity defense of prokaryotes against phage and plasmids, and are possessed by most Archaea and many Bacteria (Makarova et al., 2011). Each CRISPR module within a genome contains a number of spacer regions, each derived from the sequence of a previously infecting phage or plasmid through HGT, and the spacer regions are separated by repeat regions. Resistance to phages through the CRISPR system is very precise. If the spacer sequences are not a 100% match to the phage, the phage may retain the ability to infect the bacterium (Barrangou et al., 2007). Thus, the incorporation of spacer sequences into CRISPR loci contributes to the evolutionary arms race between the bacterium and the phage, where selective pressure imposed on the phage through the CRISPR system may drive high evolutionary rates in the phage (Sorek et al., 2008). Spacer regions in CRISPR loci are also rapidly gained and lost – the most newly acquired spacer regions are often unique to individual bacterial isolates (Pourcel et al., 2005). Large-scale evolutionary changes have also been noted in CRISPR systems. Although sequences of repeats in CRISPR loci vary among different bacterial species, there are occurrences of unexpectedly similar repeat sequences between diverse bacteria (Makarova et al., 2002; Haft et al., 2005; Godde & Bickerton, 2006). These similarities point to the propagation of CRISPR systems by HGT. The transfer of CRISPR systems is hypothesized to be mediated by megaplasmids (plasmids >40 kb in size), as CRISPR loci in some genomes are located on megaplasmids rather than on the bacterial chromosome (Godde & Bickerton, 2006). The HGT of CRISPR loci may have helped to establish the widespread presence of CRISPR systems in bacteria [∼40% of bacterial genomes contain CRISPR loci (Godde & Bickerton, 2006)], and indicates the importance of the CRISPR system for phage defense.
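The precision of spacer-based resistance described in Box 2 can be illustrated with a toy matching routine. This is a sketch under strong simplifying assumptions: it models only the requirement for a 100% match between a spacer and the phage sequence, it checks both strands, and it ignores every other aspect of the mechanism. The sequences and function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def spacer_matches(spacer: str, phage_genome: str) -> bool:
    """True if the spacer occurs exactly, on either strand, in the phage genome."""
    spacer = spacer.upper()
    genome = phage_genome.upper()
    return spacer in genome or reverse_complement(spacer) in genome


def is_protected(spacers: list[str], phage_genome: str) -> bool:
    """True if at least one spacer in the locus matches the phage exactly."""
    return any(spacer_matches(s, phage_genome) for s in spacers)


if __name__ == "__main__":
    phage = "TTGACCGTAAGCTTAGGCAT"       # hypothetical phage sequence
    locus = ["CCGTAAGC", "GGGGTTTT"]     # hypothetical spacers; the first matches
    escape_mutant = phage.replace("CCGTAAGC", "CCGTTAGC")  # single-base change

    print("wild-type phage blocked:", is_protected(locus, phage))
    print("escape mutant blocked:  ", is_protected(locus, escape_mutant))
```

In this toy model a single substitution in the targeted region is enough for the phage to evade the locus, which illustrates the kind of selective pressure on the phage that Box 2 describes.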

Genes that have been acquired by a particular lineage through HGT cannot be assumed to be adaptations, and the adaptive value of an acquired gene should be confirmed, for example by showing that it is expressed under natural conditions in a way that is consistent with its likely adaptive value (Steunou et al., 2008). Acquired genes are most likely to be nonadaptive ‘craters’ that have fallen onto the genome under a variety of circumstances, particularly if the genes were newly acquired. One possibility, which genome analyses abundantly support, is that many transferred genes have entered the genome nonadaptively with invading phage or transposons (Nakamura et al., 2004). Another possibility is that a nonadaptive, transferred gene has randomly reached a high frequency within a population or a taxon by genetic drift or by periodic selection. Finally, a gene may have been adaptive in the past, but is no longer adaptive. This could be the case for an antibiotic resistance gene that was adaptive when a particular antibiotic was applied, but in the absence of the antibiotic, the gene is no longer useful.

Among the cellular processes that are transferred the least are informational genes, involved in translation and transcription (Jain et al., 1999; Nakamura et al., 2004). Within E. coli in particular, Davids & Zhang (2008) found genes encoding for informational processes to be dominated by core genes shared by all E. coli strains, with no significant contribution from HGT. The low rates of transfer of informational genes [as well as photosynthetic genes (Aris-Brosou, 2005)] can be understood most generally by Riedl's ‘burden’ hypothesis (Riedl, 1978). The burden hypothesis predicts (for animal evolution) that organs burdened by connections to many other organs would be slow to evolve; their burden of complex interactions with other functional units would make their evolution difficult. Aris-Brosou (2005) has argued that the property of burden (or connectivity) that makes adaptive evolution difficult for complex structures also renders their HGT unlikely to succeed. More specifically, the burden hypothesis predicts a low rate of transfer of genes coding for parts of the transcription and translation machinery, as these functions are burdened with many complex interactions; the transfer of only one part of a complex set of co-adapted structures would likely bring about an incompatibility and loss of function (Jain et al., 1999). The degree of burden, and thus the predicted resistance to the transfer of a gene, may be approximated by the number of protein–protein interactions of the gene's product (Wellner et al., 2007; Price et al., 2008). This measure of a protein's connectivity has been shown to be an important factor for the transferability of genes across all functional categories, in both ancient and recent transfers (Cohen et al., 2011) (Fig. 2).

Figure 2.

 The relationship between the number of HGT events and the number of protein–protein interactions. Each dot represents a gene family, and the number of protein–protein interactions for a given family is quantified as the number of other gene families with which the given family interacts. The negative relationship between the number of HGT events and the protein–protein interactions was quantified with a Spearman correlation coefficient of −0.422, with P < 8.18 × 10⁻¹⁰⁵ (Cohen et al., 2011) (used with permission from Oxford University Press).
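For readers unfamiliar with the statistic reported for Fig. 2, the sketch below shows how a Spearman rank correlation between connectivity and transfer counts could be computed. The implementation is a plain-Python illustration with hypothetical function names, and the six data points are invented solely to demonstrate the calculation; they are not taken from Cohen et al. (2011).

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(vx * vy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented toy values: interactions per gene family and HGT events per family.
    interactions = [1, 2, 3, 5, 8, 13]
    hgt_events = [40, 35, 20, 22, 9, 3]
    print(f"Spearman rho = {spearman(interactions, hgt_events):.3f}")
```

A negative coefficient, as in the toy example, means that gene families with more interaction partners tend to show fewer transfer events, which is the direction of the relationship reported in Fig. 2.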

Although uncommon, the transfer of highly connected informational genes can occur (e.g. the transfer of an rRNA operon or a gene encoding a ribosomal protein) (Gogarten et al., 2002; Lind et al., 2010). The burden hypothesis (or the ‘complexity’ hypothesis) predicts that for informational genes to be successfully transferred, the recipient and the donor should be closely related. As predicted, the fitness costs of transferred ribosomal genes in Salmonella in experimental manipulations were generally much higher from distant relatives than from close relatives (Lind et al., 2010). In contrast, the most phylogenetically distant transfers, such as those between the Bacteria and the Archaea domains, almost always involve metabolic genes (Kanhere & Vingron, 2009). This pattern is consistent with informational genes being incompatible with the physiology and structures of any but the most closely related organisms.

Among the most complex structures that show no evidence of HGT are features of cell architecture, which are profoundly different across the most anciently divergent bacterial taxa (Cohan & Koeppel, 2008; Cohan, 2010). The prokaryotic domains of Bacteria and Archaea differ fundamentally in the structure of their cell membranes, owing to their use of ester vs. ether linkages in their membranes' fatty acids. Some Archaean lineages build upon the ether linkage to produce a tetraether diglycerol structure yielding a lipid monolayer. This monolayer structure is of great ecological importance to hyperthermophiles, as the monolayer does not peel apart at high temperatures, in contrast to the lipid bilayer of Bacteria (Madigan et al., 2009). Thus, this profound structural difference between these domains is at least partly responsible for the ecological success of Archaea at extremely high temperatures. While this architecture-based ability to resist high temperature might be of value for some Bacterial lineages, such differences in the ‘bauplans’ of cells, however adaptive, are not likely to be packaged and transferred with success (Cohan & Koeppel, 2008). Because the lipid monolayer membrane interacts with many aspects of cell physiology, including the function of many trans-membrane proteins, transfer of such an adaptation is unlikely, even if it were succinctly packaged as a string of genes on a plasmid.

Other nontransferable adaptations include architectural structures of the Firmicutes and the Planctomyces. The Firmicutes are built with an extremely strong cell wall, containing multiple layers of cross-linked peptidoglycan (Madigan et al., 2009), conferring effective and constitutive resistance to osmotic stress (Schimel et al., 2007). The gram-positive cell wall thus confers to the Firmicutes the ability to thrive in drought-prone environments, which can deliver an osmotic shock by rapidly re-wetting the cell. The Planctomyces build a stalked morphology, unique in the bacterial world, which lifts the cell above the surface of sediment and brings the cell within reach of much higher levels of nutrients (Lawler & Brun, 2007). In all these cases, bauplan-level adaptations would in principle be gratefully received by other lineages for their ecological advantages, but they have not been transferred, presumably because such adaptations would integrate into the existing physiology and development of a recipient cell only with extreme difficulty, if at all (Cohan & Koeppel, 2008; Cohan, 2010; Philippot et al., 2010).

Accommodation of transferred adaptations

While a set of genes acquired through HGT may provide an important adaptation to a recipient, the transfer may also incur harmful pleiotropic effects (Cohan et al., 1994; Nogueira et al., 2009). In the case of animals and plants, large genetic changes often disrupt the organism's physiology and development (Fisher, 1958), and we should likewise expect HGT events to be disruptive in bacteria. It is easy to imagine that the expression of novel genes would disrupt existing physiology, but even acquiring an adaptive allele by homologous recombination could disrupt the smooth functioning of a cell, especially in the case of antibiotic resistance alleles. This is because many antibiotics target informational proteins, which are difficult to alter, as we have discussed. Thus, many resistance-conferring mutations in targeted information molecules bear a severe cost to the growth rate and competitiveness of a cell (Cohan et al., 1994; Andersson & Levin, 1999; Levin et al., 2000).

The deleterious side effects of a new adaptation can drive natural selection toward ‘domesticating’ the adaptation, that is, toward ameliorating its negative fitness effects. One mode of ameliorating the cost of a new adaptation is through compensatory evolution, whereby natural selection favors modifiers at other loci that compensate for the deleterious effects of the adaptation (Cohan et al., 1994; Andersson & Levin, 1999). These other loci are expected to code for functions that interact with the new adaptation. For example, a costly streptomycin-resistance allele of the S12 ribosomal protein locus can be ameliorated by compensatory mutations in the L19 locus, coding for another ribosomal protein (Maisnier-Patin et al., 2007).

The potential for compensatory evolution may be evaluated by observing spontaneous evolution among the descendants of a cell that has integrated an adaptation through HGT. This reveals the potential of new mutations in the genome's already-existing genes to ameliorate an adaptation's harmful effects (Andersson & Levin, 1999; Levin et al., 2000). Alternatively, one may place an adaptive gene into different genetic backgrounds within a species taxon and observe the extent to which the fitness effect of the adaptive gene varies between genetic backgrounds (Cohan et al., 1994). This provides an estimate of the potential for a species' standing genetic variation to ameliorate the adaptation. For example, placing rifampicin-resistance alleles of the rpoB locus into different strains of Bacillus subtilis yielded a large range of fitness costs across different genetic backgrounds (Cohan et al., 1994).

Amelioration can also occur through change in the adaptation itself. Newly acquired genes have higher rates of evolution than other genes in the genome (Daubin & Ochman, 2004; Hao & Golding, 2006; Marri et al., 2007; Davids & Zhang, 2008). For example, Hao & Golding (2006) found that within the Bacillaceae, recently transferred genes had longer tree lengths and higher Ka/Ks ratios than native genes, illustrating rapid evolution. As transferred genes persist in the genome, the Ka/Ks ratio decreases, suggesting adaptation of the new genes to the new environment of the host (Hao & Golding, 2006). Additionally, horizontally transferred genes are more likely to be regulated by multiple factors; this complex regulation is believed to evolve after the genes are transferred (Price et al., 2008).
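The Ka/Ks comparison described above can be made concrete with a minimal sketch. It assumes that the numbers of synonymous and nonsynonymous sites and differences have already been counted from a codon alignment, and it applies a Jukes-Cantor correction for multiple substitutions at a site. The function names and the example counts are hypothetical illustrations and are not taken from Hao & Golding (2006).

```python
import math


def jukes_cantor(p):
    """Correct an observed proportion of differences for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) divergence."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("Ks is zero, so the ratio is undefined")
    return ka / ks


# Hypothetical counts: both genes show the same synonymous divergence,
# but the recently acquired gene carries more nonsynonymous changes.
recent = ka_ks(nonsyn_diffs=30, nonsyn_sites=700, syn_diffs=20, syn_sites=300)
native = ka_ks(nonsyn_diffs=7, nonsyn_sites=700, syn_diffs=20, syn_sites=300)
print(f"recently acquired gene Ka/Ks: {recent:.2f}")
print(f"native gene Ka/Ks: {native:.2f}")
```

In this toy comparison both ratios stay below 1, but the higher ratio for the recently acquired gene is the kind of signal interpreted as faster protein evolution after transfer.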

Another mechanism for domesticating a horizontally acquired adaptation involves altering the amount of protein product of the transferred gene, by either gene amplification or gene repression. In full-genome studies of paralogous genes, it has been shown that genes newly acquired by the genome are more likely to undergo gene duplication (Hooper & Berg, 2003a). Although the reason for the duplication of horizontally transferred genes is not well established, it is thought that gene duplication functions to amplify the amount of gene product for genes with suboptimal levels of protein product, as genes with large amounts of protein product are rarely duplicated (Hooper & Berg, 2003b; Lind et al., 2010). Gene amplification may be especially important when HGT results in the homologous replacement of essential genes. For example, Lind et al. (2010) found that in Salmonella typhimurium, gene amplification was common following the homologous replacement of ribosomal genes from a variety of sources. The growth rates of these mutants with gene amplifications were significantly higher than those without, indicating that the gene duplication, in this case, served as amelioration following HGT.

Amelioration can also occur by way of gene repression. Horizontally acquired sequences are often initially repressed in the host genome by histone-like nucleoid-structuring proteins (H-NS) (Schechter et al., 2003; Dorman, 2004, 2009; Oshima et al., 2006). H-NS appears to have the ability to repress horizontally transferred DNA by recognizing sequences with an aberrant GC content (Dorman, 2009). In some strains, upwards of 40% of horizontally transferred genes are associated with H-NS proteins (Lucchini et al., 2006). In the case of Salmonella enterica, an Hns-like gene on a plasmid expresses H-NS-like proteins that target horizontally acquired genes only, allowing the silencing of horizontally transferred genes upon transfer (Beloin et al., 2003; Doyle et al., 2007; Baños et al., 2009).

The initial silencing of transferred genes may foster the incorporation and domestication of transferred DNA into the genome, as repression serves to limit the fitness costs of expression of the novel DNA on the host cell (Dorman, 2007; Baños et al., 2009). Dorman has argued that this initial silencing can be lessened over time as compensatory changes occur, allowing a potential benefit of a transferred gene while minimizing its disruptive effects (Dorman, 2009).

Finally, one other form of amelioration involves changes in an acquired set of genes to fit the nucleotide composition of the host genome. Organisms in different taxa above the genus level frequently differ in their frequencies of single nucleotides, dinucleotides, trinucleotides, and so on. The compositional differences between a donor segment and the recipient are diminished over time as incorporated genes are subjected to the host's pattern of mutation (Lawrence & Ochman, 1998), as well as natural selection to better integrate the acquired genes with host genome regulation (Lercher & Pal, 2008).
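One way to picture this compositional amelioration is to measure how far a segment's oligonucleotide frequencies lie from those of the host. The sketch here is only an illustration under stated assumptions: the function names are hypothetical, the sequences are short invented strings rather than real genes, and the distance used (the sum of absolute differences in k-mer frequencies) is one simple choice among many.

```python
from collections import Counter


def kmer_freqs(seq, k):
    """Relative frequencies of overlapping k-mers in a sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def composition_distance(seq_a, seq_b, k=2):
    """Sum of absolute k-mer frequency differences (0 = identical profiles)."""
    fa, fb = kmer_freqs(seq_a, k), kmer_freqs(seq_b, k)
    return sum(abs(fa.get(m, 0.0) - fb.get(m, 0.0)) for m in set(fa) | set(fb))


def gc_content(seq):
    """Fraction of G and C nucleotides in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Invented toy sequences (assumptions, not real data).
host = "GCGCCGGCGCATGCCGGCGC" * 5
recent_segment = "ATTATAAATATTTAGATAAT" * 5        # AT-rich, newly arrived
ameliorated_segment = "GCGCAGGCTCATGCCGACGC" * 5   # drifted toward host profile

for name, seg in [("recent", recent_segment), ("ameliorated", ameliorated_segment)]:
    print(name, round(gc_content(seg), 2), round(composition_distance(host, seg), 2))
```

A segment that has resided in the host genome for a long time would be expected to show a smaller distance than a newly arrived one, as the toy sequences are constructed to illustrate.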

HGT and ecological divergence among close relatives

How did the bacterial world diversify from a single common ancestor to the hundred or more divisions of today, representing profoundly different cell architectures, metabolisms, and ecologies? While the full answer is buried in the vast expanses of time, we can approach a partial answer by investigating a much more modest process, that of speciation, in which one lineage splits into two irreversibly separate lineages that are at least subtly different in ecological requirements and capabilities. One can approach the origins of new diversity through a genetic analysis of newly divergent bacterial populations. The availability of full genome sequences from closely related bacteria has enabled microbiologists to discover the ecological differences among closely related groups as well as the genetic differences underlying the ecological divergence.

In cases where habitat differences suggest ecological differentiation between close relatives, a genome-based analysis can reveal the physiological basis for the ecological divergence and the role of HGT. For example, a recent study by Luo et al. (2011) aimed to find the physiological basis of differences between E. coli clades associated with external environments (e.g. freshwater beaches and sediments) vs. clades associated with mammals and birds, either as commensals or as pathogens. Comparisons of genome content confirmed that clades associated with external environments were indeed ecologically distinct from clades associated with endosymbiont lifestyles within mammals and birds. In contrast to commensal and pathogenic clades, the environmental clades were found consistently to contain lysozyme, for breaking down the cell walls of other bacteria in the external environment, as well as the biochemical pathway for diol utilization, apparently for the use of diol as an energy substrate. The gut-associated clades were found to have transport and metabolic capability related to various substrates that are known to be common in the gut, including N-acetylglucosamine, gluconate, and five- and six-carbon sugars. Most importantly, by virtue of genome content differences, the environmental clades are not likely to compete well in the gut and are not expected to be of concern for human health (Luo et al., 2011).

Likewise, closely related gut and dairy taxa within Lactobacillus have acquired genes that foster adaptation to their respective environments (O'Sullivan et al., 2009). The gut-adapted taxon Lactobacillus acidophilus bears the gene for maltose-6-phosphate glucosidase, for degradation of the abundant maltose in the gut; also, the gut-adapted taxon has a gene for bile salt hydrolase, which contributes to resistance to bile in the gut. Both these functions are absent in the dairy-adapted taxon Lactobacillus helveticus. The dairy-adapted taxon has a carboxypeptidase gene not found in the gut-adapted taxon, which contributes to survival in environments, such as milk, where amino acid levels are low.

Genome comparisons among other extremely closely related populations in other taxa also identify the physiological basis of ecological divergence and point to a role for HGT in the origins of ecological diversification. Genome comparisons among soil populations of Pseudomonas putida associated with polluted and unpolluted environments show that the populations from polluted sites hold metal resistance genes not found in the populations from unpolluted sites (Wu et al., 2010). Also, closely related populations within Prochlorococcus marinus have diverged in their phosphate-acquiring adaptations: those populations adapted to marine environments with very low levels of phosphate have acquired a set of genes involved in the uptake, regulation, and utilization of organic phosphates (Martiny et al., 2009a). In studies like these, comparisons of genome contents among close relatives point to the physiological basis by which populations diverge ecologically and implicate a role for HGT as the motive force by which this diversification has proceeded.
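The core logic of such genome content comparisons can be sketched with set operations: find the genes present in every genome of one clade and absent from every genome of the other. This is an illustrative toy and not the pipeline of any study cited here. The strain names and gene labels are hypothetical, and a real analysis would first require gene prediction and ortholog assignment.

```python
def clade_specific_genes(clade_a, clade_b):
    """Genes found in every genome of clade_a and in no genome of clade_b.

    Each clade is a dict mapping a strain name to its set of gene labels.
    """
    core_a = set.intersection(*clade_a.values())  # shared by all of clade A
    pan_b = set.union(*clade_b.values())          # present anywhere in clade B
    return core_a - pan_b


# Hypothetical gene content (assumptions for illustration only).
polluted = {
    "strain_P1": {"rpoB", "gyrA", "metal_efflux_1", "metal_reductase"},
    "strain_P2": {"rpoB", "gyrA", "metal_efflux_1", "phage_tail"},
}
unpolluted = {
    "strain_U1": {"rpoB", "gyrA", "transposase_7"},
    "strain_U2": {"rpoB", "gyrA"},
}

print(clade_specific_genes(polluted, unpolluted))  # {'metal_efflux_1'}
print(clade_specific_genes(unpolluted, polluted))  # set()
```

Requiring presence in every member of one clade is a deliberately strict choice; it filters out strain-specific acquisitions such as phage or transposase genes, which are unlikely to explain a clade-wide ecological difference.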

Genome content comparisons can also add to our knowledge about the ecological dimensions by which populations have diverged. This is seen in the case of two clades of Synechococcus in a Yellowstone hot spring: one clade associated with hotter, upstream environments closest to the source of the spring (the A clade, 60–65 °C) and the other associated with cooler, downstream environments (B′ clade, 55–61 °C). Comparison of the genome content of these groups showed that beyond differences in their temperature optima (Allewalt et al., 2006), the groups were also different in their abilities to take up and store mineral nutrients (Bhaya et al., 2007). The genomes suggest that the cooler, downstream clade can accommodate lower concentrations of mineral nutrients (owing to consumption by upstream populations) by being able to take up phosphonate and store nitrogen, adaptations acquired by HGT.

Metagenomic approaches have also yielded evidence for HGT in the origin of ecological diversity. A shotgun sequencing of random clones from the microbial mat of a Yellowstone hot spring indicated an adaptive divergence among extremely closely related populations of Synechococcus (Bhaya et al., 2007). Whereas the complete genome of one isolate showed no genome content dedicated to transport of the reduced ferrous ion, several closely related organisms identified from the metagenome contained the ferrous transport genes feoA and feoB. This suggested that some populations, but not others, are able to scavenge reduced iron during the nighttime when the mat becomes anoxic (Bhaya et al., 2007). Thus, niche adaptation strategies can be inferred by comparing metagenome sequences with an ‘anchor’ isolate whose genome has been fully sequenced.

Genome-based analyses have also shown how ecological diversity can emerge from changes other than HGT, through changes in existing genes. Petersen et al. (2007) searched the set of genes shared by six E. coli strains for genes whose amino acid divergence was accelerated by natural selection. This search identified several genes coding for cell surface proteins, and the amino acids under selection were consistently found to be in the extracellular region, indicating that diversification has resulted from the arms race between the bacteria and their enemies, such as phage and the host immune system.

Analyses of genome-wide gene expression have also helped to characterize how changes in existing genes have led to niche partitioning among close relatives. Denef et al. (2010) found that one clade of the bacterial genus Leptospirillum, which is typically a late colonizer of acid-mine pools, differs from a closely related, early-colonizing clade in the expression of several shared proteins that may be important in growth at low nutrient concentrations, including proteins for cobalamin synthesis and glycine cleavage. It will be fascinating to find the extent to which ecological differences among close relatives are caused by the horizontal acquisition of novel genes vs. changes in the sequences and expression patterns of shared genes.

In summary, genome-based comparisons have confirmed that closely related populations are ecologically distinct; they have identified the ecological dimensions by which the populations have diverged; and they have determined the genetic and physiological bases of ecological divergence, including the role of HGT. Approaches adopted up to now, however, have not directly addressed the origins of species, by which one lineage irreversibly splits into two lineages, as we next discuss.

HGT and the origins of bacterial species

To discover the ecological and genetic changes that occur during speciation, it is not enough to compare closely related species taxa or even close relatives within a species taxon. To investigate the dynamics of lineage splitting and the origin of irreversible separateness, we need to identify the most newly divergent, ecologically distinct populations. Focusing on these most newly divergent populations will allow us to identify the features responsible for the earliest splitting of lineages. If we instead compare more divergent lineages, we cannot distinguish the features responsible for lineage splitting from those features added on after the lineages have become established. Studies of the speciation process in bacteria have not yet attempted to identify and compare the genomes of the most recent products of speciation.

The established systematics of bacteria does not help identify the most newly divergent, irreversibly separate, ecologically distinct bacterial populations. The problem is that the taxa recognized as species by bacterial systematics are extremely broadly defined, with enormous levels of divergence in physiology, genome content, and most importantly, ecology (Welch et al., 2002; Whittam & Bumbaugh, 2002; Tettelin et al., 2005; Lefébure & Stanhope, 2007; Rasko et al., 2008; Touchon et al., 2009; Paul et al., 2010). For example, isolates within the species taxon E. coli show huge differences in habitat (Walk et al., 2007, 2009) and genome content (Welch et al., 2002; Touchon et al., 2009; Luo et al., 2011). While there are increasing numbers of studies comparing the genomes of close relatives within the recognized species taxa, little attention has been paid as to whether the populations being compared represent the most newly divergent, irreversibly separate, ecologically distinct populations.

One approach to identifying such populations takes into account the dynamic properties that are widely understood to characterize species, at least outside of the systematics of bacteria (de Queiroz, 2005). Foremost among these dynamic properties is that species are cohesive, in the sense that genetic diversity within a species is limited by a force of evolution (Templeton, 1989; Cohan & Perry, 2007). In contrast, different species are viewed to be irreversibly separate, with no cohesion holding them together. Also, species are expected to be ecologically distinct, which allows the species to coexist into the indefinite future. Finally, each species is expected to be invented only once (de Queiroz, 2005).

The forces of species cohesion in bacteria potentially include genetic exchange between populations (Papke et al., 2007; Sheppard et al., 2008; Fraser et al., 2009; Retchless & Lawrence, 2010), genetic drift (Wernegreen & Moran, 1999; Kuo et al., 2009), and periodic selection, a diversity-purging process occurring in populations with low recombination rates (Koch, 1974; Levin, 1981; Cohan, 1994). Genetic exchange has been widely considered to be a cohesive force preventing ecological divergence in obligately sexual animals and plants; in some models, permanent, adaptive divergence between animal and plant populations is impossible unless barriers to genetic exchange have evolved (Mayr, 1963; Coyne & Orr, 2004), although the necessity of barriers to genetic exchange in animal and plant speciation has been recently challenged (Dieckmann et al., 2004; Mallet, 2008).

In the case of bacteria, the rate of exchange of genes between populations, even between differently adapted populations of the same microhabitat, is extremely low. This is because the flow of genes between populations is limited by the bacterial rate of recombination, which is consistently near the mutation rate (Vos & Didelot, 2009). At such low rates of gene flow between populations, genetic exchange is not a significant force preventing ecological divergence between bacterial populations (Cohan, 1994; Cohan & Koeppel, 2008) (Box 1).

Genetic drift and periodic selection are the most likely forces of cohesion within bacterial species. Cohesion by these forces is limited to the set of ecologically similar organisms within an ‘ecotype’ (Kopac & Cohan, 2011). Genetic drift is most likely to be an important force of cohesion for ecotypes with low effective population sizes, for example, obligate endosymbionts that are transmitted between hosts in extremely small numbers (Wernegreen & Moran, 1999). Periodic selection can act to purge the diversity within any ecotype, regardless of its population size (Levin & Bergstrom, 2000). In a periodic selection event, an adaptive mutation or a gene acquisition by HGT within an ecotype outcompetes to extinction all other members of the ecotype; owing to the low recombination rates in bacteria, selection favoring the adaptive mutation or recombinant can bring nearly the entire genome sequence of the mutant cell to 100% frequency within the ecotype (Cohan, 2005). Thus, the diversity within an ecotype is ephemeral, lasting only until the next periodic selection event (Fig. 3).

Figure 3.

 The dynamics of ecotype formation and periodic selection within an ecotype. Circles represent different genotypes, and asterisks represent adaptive mutations. (a) Ecotype-formation event. A mutation or a recombination event allows the cell to occupy a new ecological niche, founding a new ecotype. A new ecotype can be formed only if the founding organism has undergone a fitness trade-off, whereby it cannot compete successfully with the parental ecotype in the old niche. (b) Periodic selection event. A periodic selection mutation improves the fitness of an individual such that the mutant and its descendants outcompete all other cells within the ecotype; these mutations do not affect the diversity within other ecotypes because ecological differences between ecotypes prevent direct competition. Periodic selection leads to the distinctness of ecotypes by purging the divergence within, but not between ecotypes. (Used with permission from Landes Publishers.)

The origin of permanent diversity requires the origin of a new, ecologically distinct population, that is, the origin of an ecotype (Cohan & Perry, 2007). A new ecotype is formed when an organism invades a new ecological niche, by virtue of a mutation or an HGT event. The diversity-purging power of an ecotype's periodic selection events cannot reach beyond the ecological similarity of ecotype members; the ecological distinctness across ecotypes will generally protect an ecotype from periodic selection stemming from another ecotype (Fig. 3). An ecotype may be most precisely understood as the domain of competitive superiority of an adaptive mutant (Cohan, 1994).

The formation of a new ecotype requires that the new niche-invading organism must be able to coexist with the parental ecotype. This requires a fitness trade-off, where the success of the niche-invading organism in its new niche comes at a cost, usually in its ability to compete with the parental ecotype in the old niche. The fitness trade-off leads to two alternatively specialized ecotypes that can coexist (Fig. 4). A different dynamic ensues when a mutant or a recombinant organism acquires the ability to use new resources at no cost in the old niche. In this case, the new organism becomes more of an ecological generalist, and will simply outcompete the members of its ecotype to extinction, and no new ecotype is created (Fig. 4).
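The contrast between these two outcomes can be illustrated with a deliberately simple numerical sketch. It is not a model taken from the works cited here; it swaps in a two-niche soft-selection model for asexual types, in the spirit of Levene-type models, in which each niche contributes a fixed share of the next generation and the two types compete within each niche. The function names, the fitness values, and the equal niche shares are all assumptions chosen for illustration.

```python
def next_frequency(p, invader_fitness, niche_share):
    """One generation of a two-type, multi-niche soft-selection model.

    p is the frequency of the niche-invading type. The parental type has
    relative fitness 1 in every niche, and each niche contributes a fixed
    share of the next generation.
    """
    return sum(
        share * (p * w) / (p * w + (1 - p))
        for w, share in zip(invader_fitness, niche_share)
    )


def simulate(invader_fitness, niche_share=(0.5, 0.5), p0=0.001, generations=500):
    """Iterate the model from a rare invader and return its final frequency."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, invader_fitness, niche_share)
    return p


# Fitness is given as (old niche, new niche), relative to the parental type.
with_tradeoff = simulate(invader_fitness=(0.5, 2.0))  # gain in new niche, cost in old
without_cost = simulate(invader_fitness=(1.0, 2.0))   # gain in new niche, no cost

print(f"with trade-off, invader frequency: {with_tradeoff:.3f}")
print(f"without cost, invader frequency:   {without_cost:.3f}")
```

Under these assumed parameters, the invader with a trade-off settles at an intermediate frequency alongside the parental type, whereas the cost-free invader replaces it, which corresponds to a periodic selection event rather than the founding of a new ecotype.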

Figure 4.

 The consequences of a change in ecological niche following an HGT event. Acquisition of a new ecological function by HGT (indicated by the green triangle) can lead to either a periodic selection event (a) or an ecotype formation event (b). (a) The new ecological function is added to the ecological repertoire of the recipient strain, and the resultant strain is now able to outcompete the membership of its ecotype by virtue of its greater repertoire. An example of such a niche-expanding HGT event would be the acquisition of one more antibiotic resistance factor in a human pathogen that is already resistant to a number of antibiotics; this would expand the set of clinical conditions under which the strain could succeed. (b) Gain of one function is incompatible with an existing function or reduces the performance of an existing function. Thus, acquisition of the new ecological capability comes at the expense of some pre-existing function. In this case, acquisition of the new function leads to a new ecotype, which can coexist with the pre-existing ecotype. An example of this would be the acquisition of a new symbiosis plasmid in Rhizobium, conferring the recipient with the ability to infect a new plant species; however, two symbiosis plasmids within a cell are incompatible. Thus, the plasmid transfer event would create a splitting of the ecotype into two ecotypes. Alternatively, a mutation to better utilize a new carbon source could lead to the invention of a new ecotype, provided that the mutation causes lower performance in utilizing the old carbon source. This has been seen repeatedly in experimental populations of Escherichia coli that primarily used glucose for carbon; a mutation to utilize secreted acetate created a new ecotype because the acetate-utilizing bacteria were less able to utilize glucose (Treves et al., 1998).

Thus, the origin of bacterial ecotypes that can coexist can be viewed as the origin of species: a nascent ecotype is irreversibly separate from the parental ecotype from which it is derived, because the divergence of different ecotypes is not limited either by periodic selection (Fig. 3) or by recombination (Box 1); also, each ecotype is genetically cohesive, owing to the effect of genetic drift and/or periodic selection in purging the diversity among the ecologically interchangeable membership. Ecotypes thus hold the dynamic attributes of species, and the origin of species can be understood by exploring the origins of ecological divergence among newly formed ecotypes.

Fortunately, the most newly divergent ecotypes can potentially be identified even when we do not know the ecological dimensions by which they have diverged, provided that a particular model of bacterial speciation is assumed (Cohan & Perry, 2007). Under the Stable Ecotype model (Fig. 5a), ecotypes are formed rarely and each ecotype endures many periodic selection events within its lifetime. This model therefore allows the most newly divergent ecotypes to accumulate a unique set of neutral sequence mutations since diverging from their most recent common ancestor; moreover, the recurrent periodic selection events ensure that each ecotype appears as a sequence cluster of very close relatives, based on a phylogeny of any gene in the genome. Different ecotypes will therefore appear to be much more distantly related than members of the same ecotype and will correspond to sequence clusters (Fig. 5a).

Figure 5.

 Models of bacterial speciation. Ecotypes are represented by different colors; periodic selection events are indicated by asterisks and extinct lineages are represented by dashed lines. The letters represent the resources that each group of organisms can utilize. In cases where ecotypes utilize the same set of resources, but in different proportions, the predominant resource of each ecotype is denoted by a capital letter. (a) The Stable Ecotype model. The Stable Ecotype model is marked by a much higher rate of periodic selection than ecotype formation, such that each ecotype endures many periodic selection events during its lifetime. The Stable Ecotype model generally yields a one-to-one correspondence between ecotypes and sequence clusters. The ecotypes are able to coexist indefinitely because each has a resource not shared with the other. (b) The Nano-Niche model of bacterial speciation. In the figure, there are three Nano-Niche ecotypes (denoted by Abc, aBc, and abC) that use the same set of resources, but in different proportions. Each Nano-Niche ecotype can coexist with the other two because they have partitioned their resources, at least quantitatively. However, because the ecotypes share all their resources, each is vulnerable to a possible speciation-quashing mutation that may occur in the other ecotypes. This could be a mutation that increases efficiency in the utilization of all resources. These speciation-quashing mutations are indicated by a large asterisk; each of these extinguishes the other Nano-Niche ecotypes. Thus, in the Nano-Niche model, cohesion can cut across ecologically distinct populations, provided that they are only quantitatively different in their resource utilization. (c) The Recurrent Niche Invasion model. Here, a lineage may move, frequently and recurrently, from one ecotype to another, usually by acquisition and loss of niche-determining plasmids. 
In the figure, the red lines indicate the times in which a lineage is in the plasmid-containing ecotype; the blue lines indicate the times when the lineage is in the plasmid-absent ecotype. Periodic selection events within one ecotype extinguish only the lineages of the same ecotype. For example, in the most ancient periodic selection event shown, which is in the plasmid-absent (blue) ecotype, only the lineages missing the plasmid at the time of periodic selection are extinguished, while the plasmid-containing lineages (red) persist. Ecotypes determined by a plasmid are not likely to be discoverable as sequence clusters (Cohan, 2011).

For decades, sequence clustering has formed the basis for discovering ecotypes. For example, a diversity of ecotypes was hypothesized within the species taxon Mycobacterium bovis based on the clustering of rapidly evolving sequences, and these clusters were confirmed to be ecologically distinct on the basis of differences in host range (Smith et al., 2006). Most closely related ecotypes have similarly been hypothesized and independently confirmed to be ecologically distinct in many other taxa, including the photosynthetic marine cyanobacteria of Prochlorococcus (Martiny et al., 2009b), on the basis of temperature and nitrogen requirements.
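The intuitive form of sequence clustering can be sketched as single-linkage grouping of aligned sequences under a fixed divergence cutoff. This sketch is not the Ecotype Simulation or AdaptML algorithm; the function names, the isolate labels, the toy sequences, and the cutoffs are hypothetical. The cutoff is exactly the kind of investigator-chosen quantity on which an intuitive demarcation depends.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_clusters(seqs, threshold):
    """Group sequences linked by chains of pairwise distances <= threshold."""
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the cluster representative, compressing paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


# Invented 20-site alignment (an assumption, not real data).
seqs = {
    "iso1": "ACGTACGTACGTACGTACGT",
    "iso2": "ACGTACGTACGTACGTACGA",
    "iso3": "ACGTTCGTACGAACGTTCGT",
    "iso4": "ACGTTCGTACGAACGTTCGA",
}

print(single_linkage_clusters(seqs, threshold=0.10))
# [['iso1', 'iso2'], ['iso3', 'iso4']]
print(single_linkage_clusters(seqs, threshold=0.16))
# [['iso1', 'iso2', 'iso3', 'iso4']]
```

The same four isolates form two putative clusters or one, depending only on the cutoff, which shows why a demarcation that rests on an arbitrary threshold is difficult to defend.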

The demarcation of ecotypes can be aided by theory-based algorithms, which do not rely on an investigator's intuition about the phylogenetic size of a sequence cluster most likely to correspond to an ecotype. The algorithms ecotype simulation (Koeppel et al., 2008) and adaptml (Hunt et al., 2008) infer ecotype demarcations using universal molecular methods, generally interpreting ecotypes as sequence clusters for any gene shared among organisms. These algorithms have inferred ecotypes from sequence data in various bacterial systems, and the ecotypes have consistently been confirmed to be ecologically distinct. For example, ecotype simulation has identified extremely closely related ecotypes in soil Bacillus that are ecologically distinct in their associations with solar exposure (Koeppel et al., 2008), soil texture (Connor et al., 2010), rhizospheres, elevation, and salinity (S. Kopac, unpublished data). It has also identified ecotypes in hot spring Synechococcus that are ecologically distinct in their associations with horizontal (temperature and nutrient availability) and vertical (light level and quality) dimensions of the mat (Ward et al., 2006; Melendrez et al., 2011), and in the pathogen Legionella pneumophila, where ecotypes differ in their amoebic host ranges and in their gene expression patterns (Cohan et al., 2006). In turn, adaptml has identified ecotypes of marine Vibrio that are associated with different particle sizes and with seasons (Hunt et al., 2008).

These most closely related ecotypes, whether discovered by an intuitive or an algorithmic demarcation, are the key to discovering the ecological dimensions by which lineages split and the genetic basis of ecotype formation, as well as the role of HGT in the origins of ecological diversity. Genome-based comparisons of the most closely related ecotypes could confirm that the ecotypes are ecologically different, and could reveal the ecological dimensions of ecotype formation; moreover, they could identify the role of HGT in the origins of species. If genome investigations could be directed in the future toward comparing populations identified by algorithms such as ecotype simulation and adaptml to be the most closely related, ecologically distinct populations, we would make better progress toward studying the dynamics of bacterial speciation and identifying all the ecological diversity within bacterial taxa of interest (Cohan & Perry, 2007). To our knowledge, clades that have been identified as most closely related ecotypes have yet to be compared by genome-based analyses.

Genome comparisons among close relatives can also test the validity of alternative models of speciation. Of foremost importance is the need to test whether bacterial ecotypes are cohesive, and genome comparisons provide an opportunity to do this. Doolittle & Zhaxybayeva (2009) have argued that there is no fundamental reason why bacterial species should be cohesive, and they suggest that frequent HGT events may continuously cause a lineage to switch from one ecological niche to another. In the most extreme form of this argument, ecological interchangeability may extend only as far as a bacterium and its immediate offspring.

Doolittle & Zhaxybayeva (2009) have likely exaggerated the importance of HGT in fostering ecological diversity among closest relatives (Kopac & Cohan, 2011). While closest relatives inevitably differ in their rosters of genes, careful analyses of genome content differences have shown that nearly all HGT events among closest relatives involve transfers of genes not known to be involved in the ecological divergence of bacteria, such as phage-related genes, genes for transposition activity, and genes whose function is not known (the genes of so-called ‘hypothetical’ function) (Touchon et al., 2009; Wiedenbeck, 2011).

Nevertheless, the phylogenetic groups identified as ecotypes can be cohesive only if their members are ecologically interchangeable, as genetic drift and periodic selection can purge diversity only within populations of the same ecological niche; it is therefore essential to test whether the members of an identified ecotype are ecologically interchangeable (Kopac & Cohan, 2011). Genome-based comparisons, whether focusing on genome content differences, positive selection on shared genes, or changes in gene expression, can test whether the members of a putative ecotype (identified by sequence-based algorithms) are ecologically interchangeable and can identify the phylogenetic breadth of ecological interchangeability.

We have recently tested for ecological interchangeability within one putative ecotype hypothesized by ecotype simulation and adaptml, to our knowledge the only test of ecological interchangeability at this small phylogenetic scale (Wiedenbeck, 2011). This was an analysis of five strains from a putative ecotype (PE15) within B. subtilis ssp. spizizenii, including four strains isolated from a Death Valley canyon (Connor et al., 2010) and one well-characterized reference strain (Zeigler et al., 2008). The five strains were shown to be heterogeneous in their genome content, largely in genes related to phage and transposition as well as genes classified as hypothetical (Wiedenbeck, 2011). Remarkably, none of the remaining unshared genes provided any novel ecological functions; all these genes were either duplicates of genes already shared by every strain or novel genes contributing to functions already possessed by all strains. For example, all five strains had multiple genes involved in the transport and utilization of maltose, and additional genes unique to some strains either were copies of existing maltose genes or provided additional capability for metabolizing maltose. Every nonphage, nontransposing, nonhypothetical gene that was unique to a subclade of strains within our sample shared this pattern seen for maltose.
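The logic of such a gene-content comparison can be sketched with simple set operations. The following is only a minimal illustration under stated assumptions, not the pipeline used by Wiedenbeck (2011): the strain labels, gene identifiers, and functional categories are hypothetical placeholders, and a real analysis would begin from ortholog clustering and curated annotation.

```python
# Hypothetical gene-content tables: strain -> {gene id: functional category}.
# Every name below is invented for illustration; none comes from the PE15 data.
GENOMES = {
    "strain_A": {"malA": "maltose", "malB": "maltose",
                 "core1": "housekeeping", "phg1": "phage"},
    "strain_B": {"malA": "maltose", "malB": "maltose",
                 "core1": "housekeeping", "tnp1": "transposition"},
    "strain_C": {"malA": "maltose", "malB": "maltose",
                 "core1": "housekeeping", "malX": "maltose",
                 "hyp1": "hypothetical"},
}

# Categories not known to be involved in ecological divergence.
UNINFORMATIVE = {"phage", "transposition", "hypothetical"}


def partition_gene_content(genomes):
    """Split the pooled gene content of several strains into classes.

    core          : genes present in every strain
    uninformative : unshared phage, transposition, or hypothetical genes
    redundant     : unshared genes whose function is already present in the core
    novel         : unshared genes whose function is absent from the core
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    core = set.intersection(*gene_sets)
    unshared = set.union(*gene_sets) - core

    # Pool the annotations so that any gene id can be looked up.
    category = {}
    for genes in genomes.values():
        category.update(genes)

    core_functions = {category[gene] for gene in core}
    result = {"core": core, "uninformative": set(),
              "redundant": set(), "novel": set()}
    for gene in unshared:
        if category[gene] in UNINFORMATIVE:
            result["uninformative"].add(gene)
        elif category[gene] in core_functions:
            result["redundant"].add(gene)
        else:
            result["novel"].add(gene)
    return result


if __name__ == "__main__":
    for label, genes in partition_gene_content(GENOMES).items():
        print(label, sorted(genes))
```

In this toy data set the `novel` class is empty, which mirrors the pattern described for maltose: the only unshared gene outside the uninformative categories adds to a function that every strain already has.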

Thus, while the putative ecotype's members appear not to be ecologically interchangeable, the heterogeneity appears to be limited. We hypothesized that the divergence among genome types represents a kind of quantitative divergence, where each strain sampled has its own ecological niche, but no strain has a unique resource that protects it completely from competition (Wiedenbeck, 2011). In this case, the various members of PE15 may utilize maltose under different conditions, and each strain's unique set of maltose genes may allow that strain to utilize maltose best under its own optimal conditions.

This kind of quantitative divergence among ecotypes is different from the Stable Ecotype model, where each ecotype is more completely protected from competition from other ecotypes because each ecotype has some unique resources (Fig. 5a). In the case of the variants within PE15, each quantitatively different ecotype may have its own periodic selection events, but it is still possible that a periodic selection event in one ecotype could drive the other ecotypes to extinction, as in the Nano-Niche model of bacterial speciation (Cohan & Perry, 2007) (Fig. 5b). Thus, while the putative ecotype analyzed here turned out to be ecologically heterogeneous, it still may be cohesive owing to the incomplete and quantitative nature of the ecological divergence among populations. This study shows the potential for gene content comparisons to study both the phylogenetic breadth of ecological interchangeability and the extent to which a demarcated, putative ecotype forms a cohesive unit.

An alternative model of bacterial speciation, the Recurrent Niche Invasion model, takes into account the role of mobile genetic elements, such as plasmids or phage, in determining bacterial niches (Cohan & Perry, 2007) (Fig. 5c). For example, in the case of Rhizobium, a bacterial lineage may acquire a symbiosis plasmid that adapts it as an endosymbiotic mutualist to one set of legume hosts; then the lineage may lose that plasmid and acquire another symbiosis plasmid, thereby adapting it to another set of legumes (Segovia et al., 1991). Also, a pathogenic lineage may acquire a multiple-resistance plasmid, which will adapt it to habitats where antibiotics are free flowing, and then lose it when living in habitats where antibiotics are rarely used. In either case, a cell can be converted from one ecotype to another by acquiring and/or losing a niche-specifying plasmid or phage, and so a lineage moves back and forth between memberships in different, previously existing ecotypes. In contrast to chromosomally based ecotypes, ecotypes that are distinguished only by the presence of different plasmids cannot be discovered as sequence clusters in genes unrelated to ecological divergence (Cohan, 2011) (Fig. 5c).

We therefore cannot predict and demarcate plasmid-based ecotypes on the basis of sequence divergence in randomly chosen chromosomal genes. The ecological diversity created by plasmids can be discovered only by screening for functions that are known or expected, ahead of time, to be coded by the plasmids. As we next discuss, such a functional approach has been successful in identifying ecological diversity in antibiotic resistance factors, which are commonly carried and transferred on plasmids (Fondi & Fani, 2010).

The antibiotic resistance of our future

What are the resistance genes of our future? To find the resistance factors most likely to enter human pathogens through HGT in the near future, we should take into account the environmental and phylogenetic proximity of two organisms, both of which contribute to their sharing of genes, as well as the complexity and compatibility of the resistance factors themselves.

One approach screens for resistance factors with environmental proximity to potential human pathogens; here, bacteria associated with humans have been surveyed for their antibiotic resistance genes. Sommer et al. (2009) recently expressed DNA cloned from the human microbiome and selected clones for resistance to various antibiotics. This approach does not require homology or sequence similarity of resistance genes to any known resistance factor. Indeed, Sommer and colleagues found a broad diversity of resistance factors that were extremely divergent from, and some not even homologous with, previously known factors (Box 3). It is not understood why these abundant resistance factors have not yet made their way into E. coli and other familiar bacteria in our guts. While this approach identifies possible resistance factors in our future, it is important to note that the proximity of these factors to possible human pathogens might not be enough to foster their transfer.

Box 3. Identification of antibiotic resistance factors in human gut bacteria
The widespread occurrence of antibiotic resistance factors among bacteria, as well as their propagation by HGT, has led to an increased interest in characterizing the possible reservoirs for resistance factors (Riesenfeld et al., 2004; D'Costa et al., 2006). As the majority of microorganisms are not presently cultivable, metagenomic analyses of these possible reservoirs may supplement previous analyses of resistance factors present in bacterial isolates. Sommer et al. (2009) analyzed a possible resistance reservoir using a metagenomic analysis of human microbial communities. The authors isolated DNA from human saliva and fecal samples, and created clone libraries from these DNA isolates. The clones were then screened for resistance to a number of antibiotics, and the metagenomic inserts conferring antibiotic resistance were sequenced and annotated. Interestingly, many of the resistance inserts identified in this way did not match previously characterized antibiotic resistance genes (Sommer et al., 2009). Some of these genes encoded proteins that were identical to proteins annotated as hypothetical. This finding is especially intriguing, as it indicates that many hypothetical or poorly characterized proteins from sequenced bacterial genomes may indeed be resistance factors. Identifying these proteins will not only help to further characterize modes of antibiotic resistance in bacteria, but will also help to improve sequence annotation by identifying the functions of hypothetical proteins. The findings of Sommer et al. (2009) indicate that metagenomic analyses are a successful way to identify new resistance factors. The human microbiome, however, is only one of many possible reservoirs for antibiotic resistance. Other important sources for resistance factors include agricultural sources, water environments (Baquero et al., 2008), and soil (Riesenfeld et al., 2004). Further investigations into these reservoirs, including metagenomic analyses, will help to better characterize resistance factors likely to be medically significant.

Beyond searching for antibiotic resistance genes in the human microbiome, it will be informative to screen the metagenomes of other organisms from which we frequently acquire pathogens, including agricultural animals, mice, ticks, and mosquitoes. We will increasingly be able to identify the organism sources of resistance genes discovered in this way as more bacterial genome sequences become available. Those resistance factors that are in phylogenetic and environmental proximity to human pathogens will be of greatest concern (Matte-Tailliez et al., 2002).

As we discover increasing numbers of resistance factors, we may test them for their efficacy in human pathogens or, for safety purposes, in closely related, nonpathogenic model systems. This would include quantitative measures of the resistance benefit as well as the fitness cost of resistance factors, perhaps by following the high-throughput approach of Sorek et al. (2007) for introducing thousands of genes into a strain and measuring the fitness effects of gene acquisition. It will be interesting to observe the extent to which antibiotic resistance genes tend to be specialized to different phylogenetic groups of pathogens and how easily a resistance factor can co-evolve with a new pathogen to overcome the initial incompatibility between resistance and organism.
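One way to make the two quantities concrete is sketched below. This is a generic population-genetic calculation offered under stated assumptions, not the procedure of Sorek et al. (2007): the fitness cost is expressed as a per-generation selection coefficient estimated from a head-to-head competition in antibiotic-free medium, and the resistance benefit as a fold change in minimum inhibitory concentration (MIC). The function names and all numbers are hypothetical.

```python
import math


def selection_coefficient(carrier_start, reference_start,
                          carrier_end, reference_end, generations):
    """Per-generation selection coefficient of carriers versus a reference.

    Uses the change in the log ratio of carrier to reference counts over a
    competition lasting `generations` generations. Negative values indicate
    a fitness cost of carrying the resistance factor; positive values
    indicate an advantage.
    """
    counts = (carrier_start, reference_start, carrier_end, reference_end)
    if min(counts) <= 0 or generations <= 0:
        raise ValueError("counts and generations must be positive")
    ratio_start = carrier_start / reference_start
    ratio_end = carrier_end / reference_end
    return math.log(ratio_end / ratio_start) / generations


def mic_fold_change(mic_with_factor, mic_without_factor):
    """Resistance benefit as the fold increase in MIC (same units for both)."""
    if mic_with_factor <= 0 or mic_without_factor <= 0:
        raise ValueError("MIC values must be positive")
    return mic_with_factor / mic_without_factor


if __name__ == "__main__":
    # Hypothetical competition: carriers fall from a 1:1 to a 1:2 ratio
    # over 10 generations without antibiotic.
    cost = selection_coefficient(100, 100, 50, 100, generations=10)
    # Hypothetical MICs in the same units with and without the factor.
    benefit = mic_fold_change(64, 4)
    print(f"selection coefficient per generation: {cost:.4f}")
    print(f"MIC fold change: {benefit:.1f}")
```

Pairing the two numbers for each factor and host would show whether a large resistance benefit comes with a large cost in a given genetic background, and repeating the measurement after experimental evolution would indicate how far ameliorative evolution has reduced that cost.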

In summary, predicting our future of antibiotic resistance will involve identifying the resistance factors that are within reach by HGT of our species' pathogens. It will also be important to use the approaches outlined here to determine which of these resistance factors are compatible with our pathogens and which can become compatible through ameliorative evolution.

The population biology approaches we have outlined may also help to identify all the bacteria whose acquisition of antibiotic resistance may harm us and our agricultural dependents. We have illustrated how the broadly defined species taxa of bacterial systematics can obscure within them multiple clades differing in their pathogenic properties; predicting the next epidemic may require going beyond the existing taxonomy, by discovering new pathogenic ecotypes within the established taxa (Cohan & Perry, 2007) and by not being distracted by ecotypes that are not pathogenic (Cohan et al., 2006; Smith et al., 2006; Luo et al., 2011). We suggest using the universal approaches described here to discover all the ecotypes within known pathogenic taxa and to characterize their ecological capabilities through genome comparisons. The discovery of potential pathogens by these universal approaches, followed by surveillance of their acquisitions of antibiotic resistance, will be an important preemptive public health strategy.


This work was funded by a NASA-funded Connecticut Space Grant to J.W. and by Wesleyan University research funds to F.M.C. We thank Jessica Sherry for alerting us to several examples of niche-specifying HGT events. We thank Fernando Baquero and Dieter Haas, as well as two anonymous reviewers, for their valuable suggestions for improving the manuscript.