The distribution, dynamics, and evolution of insertion sequences (IS), the most frequent class of prokaryotic transposable elements, are conditioned by their ability to horizontally transfer between cells. IS horizontal transfer (HT) requires shuttling by other mobile genetic elements. It is widely assumed in the literature that these vectors are phages and plasmids. By examining the relative abundance of IS in 454 plasmid and 446 phage genomes, we found that IS are very frequent in plasmids but, surprisingly, very rare in phages. Our results indicate that IS rarity in phages reflects very strong and efficient postinsertional purifying selection, mainly caused by a higher density of deleterious insertion sites in phages compared to plasmids. As they do not tolerate IS insertions, we conclude that phages may be rather poor vectors of IS HT in prokaryotes, in sharp contrast with the conventional view.
Transposable elements (TE) are discrete pieces of DNA that can move within and between genomes. They are widely distributed in eukaryotes and prokaryotes, and they sometimes represent substantial fractions of host genomes. This is well illustrated by a recent survey of 10⁷ genes from all domains of life which showed that the most abundant and ubiquitous genes in nature are transposases (Tpases) (Aziz et al. 2010), the genes that mediate the mobility of a major TE class known as DNA transposons in eukaryotes and insertion sequences (IS) in prokaryotes (Mahillon and Chandler 1998; Feschotte and Pritham 2007).
We downloaded the curated annotations of 1109 plasmid and 1217 phage genome sequences available in ACLAME (version 0.4, http://aclame.ulb.ac.be/), a comprehensive database of prokaryotic MGEs (Leplae et al. 2010). Because we were interested in evaluating the potential of phages and plasmids as shuttles for IS HT, we focused on mobile or mobilizable MGEs. Thus, we conservatively excluded the 760 prophage genomes (phages integrated in bacterial chromosomes) because we have no information on their mobility potential, and 11 phages for which gene content information was not available. We also conservatively excluded 655 plasmids that lack a relaxase gene, which is a prerequisite for plasmid mobility (Smillie et al. 2010). Thus, Tpase content was investigated in 446 phages and 454 plasmids.
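The filtering steps above can be summarized in a short sketch. The record layout (a dictionary with the keys "kind", "is_prophage", "has_gene_content", and "has_relaxase") is hypothetical and does not reflect the actual ACLAME export format; only the exclusion rules and the counts come from the text.

```python
def is_retained(mge):
    """Return True if an MGE record passes the filters described in the text.

    The dictionary keys used here are hypothetical field names.
    """
    if mge["kind"] == "phage":
        # Prophages and phages without gene content information are excluded
        return not mge["is_prophage"] and mge["has_gene_content"]
    if mge["kind"] == "plasmid":
        # Plasmids lacking a relaxase gene are excluded
        return mge["has_relaxase"]
    return False


# Consistency check on the counts reported in the text
phages_kept = 1217 - 760 - 11   # all phages - prophages - phages without gene content
plasmids_kept = 1109 - 655      # all plasmids - plasmids without relaxase
```

With the reported numbers, the two subtractions give the 446 phages and 454 plasmids analyzed in the study.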
TRANSPOSASE DETECTION AND VALIDATION
Tpase genes were recovered from the ACLAME curated annotations provided for each MGE. In these annotations, Tpase open reading frames (ORFs) were first automatically identified based on similarity to Tpases stored in GenBank (Leplae et al. 2004). Following expert verification, Tpase ORFs were assigned an IS family by using ISfinder, the international reference database for IS elements (Siguier et al. 2006).
To account for programmed frameshifting occurring in some IS families and for possible fragmented Tpases, we conservatively counted overlapping and/or contiguous (less than 50 bp apart) Tpases from the same IS family and in the same orientation as single copies.
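As an illustration of this counting rule, the sketch below merges Tpase ORFs that overlap or lie fewer than 50 bp apart when they share an IS family and an orientation. The `TpaseOrf` record and the function name are hypothetical, and the gap is taken here as the number of bases between the end of one ORF and the start of the next, which is one possible reading of "less than 50 bp apart".

```python
from dataclasses import dataclass


@dataclass
class TpaseOrf:
    start: int    # 1-based inclusive start coordinate on the MGE
    end: int      # 1-based inclusive end coordinate
    strand: str   # "+" or "-"
    family: str   # IS family label, e.g. "IS3"


MAX_GAP_BP = 50  # ORFs separated by fewer bases than this count as one copy


def count_tpase_copies(orfs, max_gap=MAX_GAP_BP):
    """Count Tpase copies after merging overlapping or near-contiguous ORFs
    of the same IS family and orientation."""
    copies = 0
    last_end = {}  # (family, strand) -> end of the copy currently being extended
    for orf in sorted(orfs, key=lambda o: (o.start, o.end)):
        key = (orf.family, orf.strand)
        # Bases between the previous ORF of this family/strand and this one
        # (negative when the two ORFs overlap)
        if key in last_end and orf.start - last_end[key] - 1 < max_gap:
            last_end[key] = max(last_end[key], orf.end)
        else:
            copies += 1
            last_end[key] = orf.end
    return copies


example = [
    TpaseOrf(100, 400, "+", "IS3"),
    TpaseOrf(398, 1200, "+", "IS3"),   # overlaps the first ORF (frameshift-like)
    TpaseOrf(1230, 1500, "+", "IS3"),  # 29 bp downstream, same family and strand
    TpaseOrf(1520, 2000, "-", "IS3"),  # opposite orientation: separate copy
    TpaseOrf(5000, 6000, "+", "IS5"),  # different family: separate copy
]
n_copies = count_tpase_copies(example)
```

In this toy example the first three ORFs collapse into a single copy, so five ORFs are counted as three Tpases. The sketch does not check whether an unrelated ORF lies between two merged fragments.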
ESSENTIAL GENE ANALYSIS
Essential genes were defined as genes that are important for MGE mobility and survival, according to Frost et al. (2005). Not all genes need to be present in all MGEs, but when present, these genes are considered essential. They were recovered from the ACLAME database by keyword searches on the functions of the encoded protein families: “phage DNA replication,” “phage genome replication,” “DNA replication initiation,” “lysis,” “tail,” “head,” “packaging,” “lysogeny/lysogenic,” “phage DNA translocation,” “terminase,” “portal,” “maturation,” “scaffold,” and “fiber” for phage essential genes, and “plasmid vegetative DNA replication,” “DNA replication initiation,” “relaxase/relaxosome,” “partition/partitioning,” “type IV,” “conjugation,” “mating,” “signal transduction,” “cyclase,” and “pilus/pilin” for plasmid essential genes. Densities of essential genes were calculated as the ratio between the coverage of essential genes and the coverage of the whole gene pool of a given MGE (overlaps excluded).
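The density calculation can be sketched as follows. The gene tuples, the case-insensitive substring matching, and the function names are assumptions made for illustration; the keyword list is a subset of the phage keywords given above, and the real searches were run against ACLAME protein family functions.

```python
def covered_length(intervals):
    """Number of bases covered by 1-based inclusive (start, end) intervals,
    counting overlapping bases only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block and open a new one
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# Subset of the phage keywords listed in the text
PHAGE_KEYWORDS = ("lysis", "tail", "head", "packaging", "terminase",
                  "portal", "maturation", "scaffold", "fiber")


def essential_gene_density(genes, keywords):
    """genes: list of (start, end, function_description) tuples.

    Returns coverage of keyword-matched genes divided by coverage of all genes.
    """
    all_coverage = covered_length([(s, e) for s, e, _ in genes])
    essential = [(s, e) for s, e, func in genes
                 if any(k in func.lower() for k in keywords)]
    return covered_length(essential) / all_coverage if all_coverage else 0.0


toy_genes = [
    (1, 100, "terminase large subunit"),
    (91, 200, "hypothetical protein"),   # overlaps the first gene by 10 bp
    (301, 400, "tail fiber protein"),
]
density = essential_gene_density(toy_genes, PHAGE_KEYWORDS)
```

For the toy genome, all genes cover 300 bp once overlaps are excluded and the two keyword-matched genes cover 200 bp, giving a density of 2/3. Plain substring matching can produce false positives (for example "tail" inside an unrelated word), so a curated search would need stricter matching.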
Statistical analyses were carried out using R (http://www.r-project.org/). Distributions were compared with unpaired, two-sided Wilcoxon two-sample tests, and correlation coefficients were estimated with nonparametric, two-sided Spearman tests.
Results and Discussion
PAUCITY OF IS IN PHAGES
From 446 phages and 454 plasmids analyzed, we detected a total of 56 Tpases from 36 phages and 2087 Tpases from 241 plasmids (Table 1). Thus, the vast majority of phage genomes (92%) do not contain any IS insertion. This proportion is significantly higher (chi-square test, P < 10⁻¹⁶) than in plasmids (47%). To account for heterogeneity in the amount of sequence data investigated (∼19.5 Mb of phage sequence vs. ∼39.5 Mb of plasmid sequence), we calculated Tpase densities. Tpase density was significantly lower (Wilcoxon test, P < 10⁻¹⁶) in phages (one Tpase every 346 kb on average) than in plasmids (one Tpase every 19 kb on average). The abundance of IS in plasmids is fully consistent with the classical view that plasmids are common vectors of IS HT. By contrast, the paucity of IS in phages is unexpected given the generally held view that phages are important contributors of IS shuttling between organisms.
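The proportions and densities quoted above can be recomputed from the reported totals. The sketch below uses the rounded sequence amounts given in the text (∼19.5 Mb and ∼39.5 Mb), so the pooled phage value comes out near 348 kb per Tpase rather than the reported 346 kb; the difference presumably reflects the rounding of the sequence totals. The dictionary layout and function name are illustrative only.

```python
# Totals reported in the text; sequence amounts are rounded values
phage = {"genomes": 446, "with_tpase": 36, "tpases": 56, "sequence_kb": 19_500}
plasmid = {"genomes": 454, "with_tpase": 241, "tpases": 2087, "sequence_kb": 39_500}


def summarize(mge):
    """Return (fraction of genomes without any Tpase, pooled kb per Tpase)."""
    frac_without = 1 - mge["with_tpase"] / mge["genomes"]
    kb_per_tpase = mge["sequence_kb"] / mge["tpases"]
    return frac_without, kb_per_tpase


phage_frac, phage_kb = summarize(phage)
plasmid_frac, plasmid_kb = summarize(plasmid)
print(f"phages:   {phage_frac:.0%} without Tpase, {phage_kb:.0f} kb per Tpase")
print(f"plasmids: {plasmid_frac:.0%} without Tpase, {plasmid_kb:.0f} kb per Tpase")
```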
Table 1. Summary information for MGEs analyzed in this study. Reported fields: total number in ACLAME; nucleotide coverage (kb); number with Tpases; average Tpase density (kb/Tpase).
WHY SO FEW IS IN PHAGE GENOMES?
There are two possible explanations for the paucity of IS in phages: (1) lack of transpositional opportunities (i.e., IS seldom transpose into phages) and (2) posttranspositional elimination (i.e., IS do transpose into phages but they are subsequently removed).
The paucity of IS in phages may simply reflect the short time that phages generally spend in infected cells, which may reduce opportunities for IS acquisition from the cell genome. Consistent with this idea, about half of the phage species sequenced to date are “virulent” (Lima-Mendez et al. 2008): their life cycle does not include any lysogenic phase (integration into the host genome) and, thus, they never stably coexist with the host genome (Weinbauer 2004). By contrast, the cycle of “temperate” phages includes a lysogenic phase, which favors interactions with the host genome and should promote IS acquisitions (Weinbauer 2004). Hence, if the paucity of Tpases in phages is due to the rarity of transpositional opportunities, we would expect Tpases to be less abundant in virulent than in temperate phages. To test this prediction, we analyzed the 216 phages in ACLAME for which lifestyle is documented (71 virulent and 145 temperate phages; http://aclame.ulb.ac.be/Classification/Phages/life_style.html). Genome size distributions of the two phage categories do not differ significantly (Wilcoxon test, P = 0.348), with a median value of ∼42 kb. We found that the proportion of phages carrying at least one Tpase is not significantly different (chi-square test, P = 0.204) between virulent (10/71 or 14%) and temperate (11/145 or 8%) phages. In addition, Tpase densities are not significantly different between virulent and temperate phages (Wilcoxon test, P = 0.208). Thus, we conclude that the paucity of Tpases in phage genomes does not reflect a lack of opportunities for IS transposition.
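The comparison of virulent and temperate phages can be checked from the counts given above. The sketch below is a plain 2 × 2 chi-square test with one degree of freedom; it applies Yates' continuity correction, which is the default of R's chisq.test for 2 × 2 tables. Whether the correction was used in the original analysis is an assumption, although the corrected test does give a P-value close to the reported 0.204.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test of independence for the 2 x 2 table [[a, b], [c, d]].

    Returns (chi2, p). The P-value uses the identity that, for one degree
    of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction (assumed here, as in R's default)
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Virulent: 10 with Tpase, 61 without; temperate: 11 with Tpase, 134 without
chi2, p = chi_square_2x2(10, 61, 11, 134)
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")  # approximately 1.61 and 0.204
```

Without the continuity correction the same table gives a smaller P-value (about 0.13), which does not change the conclusion that the two proportions are not significantly different.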
Alternatively, the paucity of IS in phages may result from postinsertional removals. It has been shown that Tpase number and density in bacterial chromosomes are positively correlated with chromosome size, because of the lower density of highly deleterious insertion sites in larger genomes (Touchon and Rocha 2007). Thus, IS insertions in small chromosomes are more likely to occur at deleterious sites and be eliminated by selection (Touchon and Rocha 2007). We found that Tpase abundance in the 446 phages is positively and significantly correlated with genome size (Tpases excluded), both by number (Spearman's ρ = 0.28, P < 10⁻⁸) and density (Spearman's ρ = 0.27, P < 10⁻⁷). Thus, selective constraints may explain the paucity of IS in phages.
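For readers unfamiliar with the statistic, Spearman's ρ is the Pearson correlation computed on ranks, with tied values receiving their average rank. The sketch below implements only the coefficient, not its P-value, and the genome sizes and Tpase counts are invented toy values, not data from this study. Following the text, genome size should be taken with Tpase sequence excluded.

```python
import math


def average_ranks(values):
    """1-based ranks of the values, with ties assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation (undefined if either variable is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: genome size (kb, Tpases excluded) and Tpase count
toy_sizes_kb = [12, 35, 42, 48, 60, 130]
toy_tpase_counts = [0, 0, 0, 1, 0, 2]
rho = spearman_rho(toy_sizes_kb, toy_tpase_counts)
```

Because most phages carry no Tpase, the count variable is dominated by ties at zero, which is why average ranks are needed rather than the simplified formula that assumes distinct values.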
WHY DO PHAGES AND PLASMIDS DIFFER SO DRAMATICALLY IN THEIR IS CONTENT?
Similar to phages, we found that Tpase abundance in the 454 plasmids is positively and significantly correlated with genome size, both by number (Spearman's ρ = 0.81, P < 10⁻¹⁰⁶) and density (Spearman's ρ = 0.64, P < 10⁻⁵³). This also suggests an influence of purifying selection against IS insertions in plasmids. If so, why is IS content so different between phages and plasmids?
Having excluded a size bias between phages and plasmids in our dataset (no significant difference in size distributions, Wilcoxon test, P = 0.205), we investigated the possibility of different selective constraints in these two MGE classes. Under this hypothesis, a higher IS abundance in plasmids than in phages implies a higher density of weakly constrained genomic regions in the former than in the latter. To test this hypothesis, we calculated the amount of intergenic sequence in the 446 phages and 454 plasmids by subtracting, for each MGE, the total coding sequence coverage (overlaps excluded) from the MGE genome size. Fully consistent with a selective explanation, we found that the proportion of intergenic sequence is significantly higher (Wilcoxon test, P < 10⁻³⁰) in plasmids (17% on average) than in phages (10% on average) (Fig. 1A).
Another prediction of the selection hypothesis is that a higher IS abundance in plasmids than in phages may also reflect a higher density of highly constrained genes in phages than in plasmids. To test this prediction, we estimated the density of essential genes among all genes in phages and plasmids (see Materials and Methods). Again fully consistent with a selective explanation, we found that the proportion of essential genes is significantly higher (Wilcoxon test, P < 10⁻⁶) in phages (34% on average) than in plasmids (18% on average) (Fig. 1B). Overall, we conclude that purifying selection is the most likely explanation for the paucity of IS in phages and the sharp difference of IS content between phages and plasmids.
PHAGES MAY BE RATHER POOR VECTORS OF IS HT AMONG PROKARYOTES
In sum, our results indicate that IS rarity in phages does not reflect a lack of exposure to transposition. Instead, two lines of evidence support the view that a postinsertional process counter-selects IS more strongly in phages than in plasmids: phages contain less intergenic sequence and more essential genes than plasmids (Fig. 1). This indicates that there is a higher density of deleterious insertion sites in phages than in plasmids. Phage genomes could also be less tolerant of size expansion. Indeed, bacteriophage biology requires transfer between bacteria through a capsid of limited size (Frost et al. 2005), which severely constrains the size of phage genomes. Acquisition of nonessential genetic material, such as IS elements, is thus probably more strongly counter-selected in phages than in plasmids. Moreover, presumably enormous effective population sizes make selection highly efficient in phages (Weinbauer 2004; Hambly and Suttle 2005). Consequently, phages experience very strong and efficient purifying selection, which dramatically limits IS abundance in their genomes. It is widely assumed in the literature that phages and plasmids both mediate IS HT (Blot 1994; Mahillon and Chandler 1998; Toussaint and Merlin 2002; Bordenstein and Reznikoff 2005; Frost et al. 2005; Touchon and Rocha 2007; Wagner and De la Chaux 2008). Our results fully support this classical view concerning plasmids. In sharp contrast, the paucity of IS in phages suggests that they may be rather poor vectors of IS HT among prokaryotes.
Associate Editor: J. Wernegreen
We thank R. Leplae for discussions on the ACLAME database. We also thank the three anonymous referees for their constructive comments on a previous version of the manuscript. This research was funded by a Young Investigator ATIP Award from the Centre National de la Recherche Scientifique (CNRS) and a European Research Council Starting Grant (FP7/2007–2013 grant 260729 EndoSexDet) to RC. SL was supported by a postdoctoral fellowship from the CNRS.