Horizontal gene transfer: the path to maturity

Authors

  • Eugene V. Koonin

    1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

E-mail koonin@ncbi.nlm.nih.gov; Tel. (+1) 301 435 5913; Fax (+1) 301 435 7794.

The realization that horizontal (lateral) gene transfer (HGT) might have had a major impact on biological evolution is perhaps the most fundamental change in our perception of general aspects of biology brought about by massive genome sequencing (Gogarten et al., 2002; Doolittle et al., 2003). As Lawrence and Hendrickson (2003) rightly point out in the MicroReview appearing in this issue, HGT might also be the most controversial topic in genomics. The main thesis of their review is that the study of HGT is still ‘in its adolescence’. They discuss four major questions that, in their reckoning, should be addressed for HGT to graduate to adulthood:

  • (i) How does HGT impact the evolutionary history of different genes?
  • (ii) How does the role of HGT differ among different lineages?
  • (iii) How does one reach robust conclusions on the presence or absence of HGT?
  • (iv) How does one integrate HGT into the continuum of genetic exchange to arrive at meaningful microbiological concepts?

I really like the adolescence metaphor and agree that it is applicable to our current state of understanding of HGT. There are different kinds of adolescents, though, ranging from the insecure and neurotic, if bright and lovable, Holden Caulfields of this world to precocious young stars busy preparing to take over. I believe that right now HGT science is more like the former but, for it to become the latter, only one fundamental issue needs to be resolved: have there been enormous amounts of HGT throughout the evolution of life on earth or not? This is related to the third question of Lawrence and Hendrickson, but I think that this more straightforward formulation highlights the real problem: for all the accumulating indications of widespread HGT, we still do not have a rigorous proof for this proposition; hence, the continuing argument that the extent and importance of HGT are greatly exaggerated (Kurland et al., 2003).

In my view, the crux of the problem is that any phylogenetic evidence of HGT can also be explained via a combination of gene duplication and lineage-specific gene loss events (Snel et al., 2002; Kunin and Ouzounis, 2003; Mirkin et al., 2003). Two scenarios for a simple, real-life case are illustrated in Fig. 1. Figure 1A shows an ‘obvious’ interpretation of the evolutionary history of the gene in question: a single HGT event between bacteria and archaea. Imagine, however, that this kind of HGT is somehow known to be impossible (a hard thing to prove but, nevertheless, worthy of consideration for the sake of the present argument). Then, excluding the impossible, we will have to accept the scenario in Fig. 1B, however improbable. According to this scenario, the gene in question was present in the last universal common ancestor (LUCA) of all extant life forms, but was subsequently lost in all lineages (at least those that include sequenced genomes) except for two (10 losses altogether). It is easy to see that, when applied on the whole-genome scale, this approach will lead to an inflation of the reconstructed gene set of LUCA. Just how dramatic is this inflation going to be? An estimate for reconstructions made with 27 genomes showed that the number of genes assigned to LUCA increased from ≈ 600 when HGT was considered as likely as gene loss to ≈ 1800 when HGT was (virtually) prohibited (Mirkin et al., 2003). This does not seem particularly scary: after all, under the no-HGT scenario, LUCA is supposed to have had about the same genome size as some of the simplest of the modern, free-living prokaryotes. However, the problem is that, with the increase in the number of genomes to analyse, the number of genes ‘traced’ to LUCA will inevitably grow leading, soon enough, to a ridiculous ‘supergenome’. The escape from the supergenome paradox is clear enough: one would postulate that there was not one LUCA but a population of ancestral organisms with different gene sets, which has been sampled to produce the modern lineages of prokaryotes. Obviously, this scenario involves . . . massive HGT! I believe that this reductio ad absurdum argument is one of the strongest pieces of indirect evidence for the major evolutionary role of HGT.
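
The loss-counting logic behind a loss-only scenario can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reconstruction procedure of Mirkin et al. (2003): the tree, the leaf names and the functions `has_carrier` and `count_losses` are all hypothetical, the topology is a toy one rather than that of Fig. 1, and the gene is simply assumed to have been present at the root (LUCA) and never regained.

```python
# Toy species tree as nested tuples; leaves are strings.
# The topology and the leaf names are hypothetical, not those of Fig. 1.
TREE = (
    "euk",
    (("arc1", "arc2"), "halo"),
    (("bac1", "bac2"), ("ecoli", "bac3")),
)


def has_carrier(node, carriers):
    """Return True if any leaf under `node` carries the gene."""
    if isinstance(node, str):
        return node in carriers
    return any(has_carrier(child, carriers) for child in node)


def count_losses(node, carriers):
    """Minimum number of losses at or below `node`.

    Assumes the gene was present in the parent of `node` (or in `node`
    itself when it is the root) and can only be lost, never gained.
    """
    if not has_carrier(node, carriers):
        return 1  # a single loss on the branch leading to this clade
    if isinstance(node, str):
        return 0  # extant carrier, nothing was lost
    return sum(count_losses(child, carriers) for child in node)


if __name__ == "__main__":
    # Gene found in one archaeon and one bacterium only.
    print(count_losses(TREE, {"halo", "ecoli"}))  # prints 4
```

With the gene present in only two of the eight toy leaves, the loss-only scenario already requires four independent losses, against a single transfer in the alternative scenario. Adding further genomes that lack the gene can leave this count unchanged or raise it, but never lower it.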

Figure 1.

Two scenarios of evolution for anaerobic glycerol-3-phosphate dehydrogenase (glpB). The data are from the COG database (Tatusov et al., 2001) in which the glpB orthologues comprise COG3075. The topology of the species tree is from Wolf et al. (2001).

A. Scenario with a single HGT event, presumably from bacteria to the archaeon Halobacterium, which has numerous genes of ‘bacterial character’ (Koonin et al., 2001).

B. Scenario based exclusively on lineage-specific gene loss. Hatched rectangles indicate the presence of the gene in a given lineage (or in LUCA); red crosses indicate gene loss. Species name abbreviations are as follows. Eukaryotes: y, Saccharomyces cerevisiae (yeast); archaea: a, Archaeoglobus fulgidus; k, Pyrococcus horikoshii; m, Methanococcus jannaschii or Methanothermobacter thermoautotrophicus; o, Halobacterium sp.; p, Thermoplasma acidophilum; z, Aeropyrum pernix; bacteria: b, Bacillus subtilis; c, Synechocystis sp.; d, Deinococcus radiodurans; e, Escherichia coli; f, Pseudomonas aeruginosa; g, Vibrio cholerae; h, Haemophilus influenzae; i, Chlamydia trachomatis or Chlamydophila pneumoniae; j, Mesorhizobium loti; l, Lactococcus lactis or Streptococcus pyogenes; n, Neisseria meningitidis; q, Aquifex aeolicus; r, Mycobacterium tuberculosis; s, Xylella fastidiosa; t, Treponema pallidum or Borrelia burgdorferi; v, Thermotoga maritima; x, Rickettsia prowazekii.

The other argument comes from the biological side and is related to Lawrence and Hendrickson's question (ii). In some cases, there is a clear positive correlation between the apparent amount of horizontal gene exchange among organisms from phylogenetically distant lineages and the similarities in their lifestyles. The first such case is the presence of many more genes of ‘archaeal character’ in the genomes of bacterial hyperthermophiles than in the genomes of mesophilic bacteria (Aravind et al., 1998; Nelson et al., 1999), indicating higher numbers of ‘shared genes’ in populations of very distinct prokaryotes inhabiting similar niches. The second, more recent observation is the preponderance of genes of ‘bacterial character’ in the mesophilic archaeon Methanosarcina, which lives in tightly knit microbial communities along with numerous bacteria, compared with the related thermophilic archaeal methanogens (Deppenmeier et al., 2002). Even if the demonstration of, respectively, the archaeal and bacterial ‘character’ of genes (based on relative levels of sequence similarity or phylogenetic affinity) is not error proof, I find the link between the amount of apparent HGT and the organism's lifestyle to be a strong argument that massive HGT is a biological reality.

I believe that the above arguments, although technically indirect, make a weighty case for a major evolutionary role of HGT and justify intense research into its implications. However, they still fall short of definitive proof. It remains unclear whether or not such proof can be obtained. One could imagine actual experiments with mixed cultures of diverse microbes and selective pressures carefully applied in the hope of selecting for HGT and observing its occurrence in real time. If this does not work, however, HGT will be ushered into its healthy youth by gradual accumulation of indirect evidence such as that discussed above, to the point when the sum of such evidence becomes compelling.

What is required for HGT studies to progress from youth to maturity? I believe that the key question, once the major evolutionary impact of HGT is considered to have been proven (if it is not, then there is not much point in further discussion), is: why? That is, why are horizontally transferred genes fixed in a microbial population? The general answer is, of course, that fixation is driven by Darwinian selection, i.e. the acquired genes increase the fitness of the recipient (it goes without saying that only a minuscule fraction of foreign DNA that invades a microbial cell is fixed) (Koonin et al., 2001; Berg and Kurland, 2002). However, the big enigma is that horizontally transferred genes can confer advantage on the cells that retain them, although they come from a donor whose evolutionary history is distinct from that of the recipient and should therefore be ill adapted to functioning in the latter. The answer could be relatively obvious in the case of acquisition of ‘selfish’ operons, which confer new metabolic capacities to the recipient (Lawrence and Roth, 1996; Lawrence and Hendrickson, 2003). In contrast, the selective advantage of xenologous (orthologous) gene displacement, which might be an even more common form of HGT (Koonin et al., 2001), is much harder to contemplate. One clue seems to be provided by antibiotic resistance, e.g. replacement of an antibiotic-sensitive bacterial aminoacyl-tRNA synthetase by a resistant eukaryotic one (Brown et al., 1998). How general this explanation might be, however, remains unclear. Numerous cases of apparent xenologous displacement suggest entirely unexpected compatibility of proteins that are not adjusted to functioning in the same cellular environment. I believe that, when a satisfactory understanding of the underlying causes of fixation of horizontally transferred genes is achieved at both the theoretical and the empirical levels, the study of HGT will reach the status of a mature research area. At that point, the dramatic implications of massive HGT, in particular, for the tree representation of life's history, will have to be faced in earnest.

Acknowledgements

The author would like to gratefully acknowledge helpful discussions with W. Ford Doolittle on the problems of HGT.
