mgpB and mgpC sequence diversity in Mycoplasma genitalium is generated by segmental reciprocal recombination with repetitive chromosomal sequences


  • These data were presented, in part, at the 16th International Congress of the International Organization for Mycoplasmology, Cambridge, UK, 9–14 July 2006, the IUSTI-Europe Conference on Sexually Transmitted Infections, Versailles, France, 19–21 October 2006, and the 107th General Meeting for the American Society for Microbiology, Toronto, Canada, 21–25 May 2007.

*E-mail; Tel. (+1) 206 341 5350; Fax (+1) 206 341 5363.


Mycoplasma genitalium is associated with sexually transmitted infections in men and women that, if untreated, can persist, suggesting that mechanism(s) exist to facilitate immune evasion. Approximately 4% of the limited M. genitalium genome consists of repeat sequences, termed MgPar regions, that are homologous to mgpB and mgpC, which encode antigenic proteins associated with attachment. We have previously shown that mgpB sequences vary within a single strain of M. genitalium in a pattern consistent with recombination between mgpB and MgPar sequences (Iverson-Cabral et al., 2006). In the current study, we show that mgpC heterogeneity similarly occurs within the type strain, G-37T, cultured in vitro, and among cervical specimens collected from a persistently infected woman. In all cases, alternative mgpC sequences are indicative of recombination with MgPar regions. Additionally, the isolation of single-colony M. genitalium clonal variants containing alternative mgpB or mgpC sequences allowed us to demonstrate that mgpB and mgpC heterogeneity is associated with corresponding changes within donor MgPar regions, consistent with reciprocal recombination. Because better-defined systems of antigenic variation are typically mediated by unidirectional gene conversion, the generation of genetic diversity in M. genitalium by the mutual exchange of sequences makes this organism unique among bacterial pathogens.


The initial isolation of Mycoplasma genitalium from urethral specimens collected from two men with non-gonococcal urethritis in 1981 suggested that this bacterial species may be a novel cause of idiopathic urethritis (Tully et al., 1981; 1983). Because this fastidious organism is difficult to culture, nucleic acid detection assays have been crucial for epidemiological studies to demonstrate an association between M. genitalium and reproductive tract disease syndromes in men and women (Taylor-Robinson, 2002; Jensen, 2004; 2006; Totten et al., 2004). To date, this organism has been linked with acute and chronic non-gonococcal urethritis in men (Horner et al., 1993; Jensen et al., 1993; Maeda et al., 1998; Björnelius et al., 2000; Gambini et al., 2000; Totten et al., 2001; Horner et al., 2003; Taylor-Robinson et al., 2004; Andersen et al., 2007; Wikström and Jensen, 2006) and mucopurulent cervicitis, endometritis, pelvic inflammatory disease and tubal-factor infertility in women (Clausen et al., 2001; Cohen et al., 2002; 2005; 2007; Manhart et al., 2003; Simms et al., 2003; Pépin et al., 2005; Andersen et al., 2007; Haggerty et al., 2006; Korte et al., 2006). M. genitalium is associated with chronic urethritis in men (Wikström and Jensen, 2006), and a single M. genitalium strain can persist in the lower genital tract of infected women for 2 to 3 years (Cohen et al., 2007). This association with chronic infection is not surprising given the ability of several Mycoplasma species to establish long-term infection (Baseman and Tully, 1997; Razin et al., 1998; Razin, 1999). Mycoplasma persistence has been linked with immune evasion that most commonly involves changing the expression of surface-exposed, antigenic lipoproteins encoded by single- or multi-gene families (Wise, 1993; Chambaud et al., 1999; Rosengarten et al., 2000). 
Such variation is accomplished by several different mechanisms, including nucleotide insertions and deletions, DNA rearrangements, promoter inversion, gene conversion and site-specific recombination (Bhugra et al., 1995; Citti and Wise, 1995; Glew et al., 2000; Horino et al., 2003). The persistence of M. genitalium in the human reproductive tract suggests that this species, like other Mycoplasma species, may have mechanism(s) for immune evasion.

Because of its small size, the genome of the M. genitalium type strain G-37T was one of the first bacterial genomes to be fully sequenced and characterized (Fraser et al., 1995). At 580 kb, it is the smallest genome of any known self-replicating cellular organism (Colman et al., 1990; Su and Baseman, 1990; Fraser et al., 1995) and has therefore been used to study the minimal requirements for cellular life (Hutchison et al., 1999; Koonin, 2000; Glass et al., 2006). Despite its limited size, approximately 4% of the M. genitalium genome is composed of repeated sequences (Peterson et al., 1993), subsequently termed MgPar regions or MgPa repeats (Fraser et al., 1995) because of their homology to genes within the MgPa operon, including mgpB and mgpC. These genes encode proteins associated with the complex attachment organelle (Hu et al., 1987; Dhandayuthapani et al., 1999; 2002; Burgos et al., 2006) located at the characteristic tip structure of M. genitalium cells. Specifically, mgpB (also known as MG_191) encodes the MgPa protein (also referred to as P140), which mediates attachment to various cell types, including the ciliated epithelium of human fallopian tubes (Collier et al., 1990), and mgpC (also known as MG_192) encodes the P110 protein (also referred to as P114), which is less well characterized. Mutants that fail to express MgPa and/or P110 have distorted morphologies, lack the characteristic attachment organelle and are non-adherent (Mernaugh et al., 1993; Burgos et al., 2006).

MgPa and P110 are each produced from a single expression site, because the full-length mgpB and mgpC genes are each present in a single copy on the M. genitalium chromosome. In contrast, the repeated MgPar sequences contain only partial copies of mgpB and/or mgpC that are homologous to distinct regions of both genes (Peterson et al., 1993; Fraser et al., 1995; Iverson-Cabral et al., 2006). The existence of these MgPar regions and their homology to portions of mgpB and mgpC prompted the hypothesis that the homologous but not identical MgPar sequences recombine into either mgpB or mgpC, resulting in the expression of variant MgPa and P110 proteins (Dallo and Baseman, 1991; Fraser et al., 1995; Peterson et al., 1995; Iverson-Cabral et al., 2006). This proposed system of antigenic variation is especially appealing given that these proteins are immunogenic in infected or immunized animals (Hu et al., 1987; Morrison-Plummer et al., 1987) and in infected individuals (Clausen et al., 2001; Svenstrup et al., 2006). In the present study, we characterize the relationship of the mgpB and mgpC genes to the MgPar regions and show extensive mgpC diversity within the type strain of M. genitalium cultured in vitro, as well as within a single strain in a persistently infected woman. We also demonstrate that donor MgPar sequences covary with sequences within mgpB and/or mgpC in a pattern consistent with reciprocal recombination. The results of this analysis, together with our preceding study of mgpB heterogeneity (Iverson-Cabral et al., 2006), indicate that there is widespread intrastrain mgpB and mgpC diversity resulting from segmental recombination with MgPar donor sequences. Furthermore, we provide compelling evidence that sequences are exchanged between the mgpB or mgpC expression site and MgPar regions through reciprocal recombination, which distinguishes M. genitalium from other bacterial pathogens.
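The distinction between the reciprocal recombination proposed here and the unidirectional gene conversion typical of other antigenic variation systems can be made concrete with a toy model. The Python sketch below is purely illustrative: the function names, the short invented sequences and the assumption that the exchanged segment occupies the same, pre-aligned coordinates in both molecules are ours, not part of the study.

```python
"""Toy comparison of gene conversion and reciprocal segmental exchange.

Illustrative only: sequences are short invented strings, and the exchanged
segment is assumed to sit at the same (pre-aligned) coordinates in both.
"""
from typing import Tuple


def gene_conversion(recipient: str, donor: str, start: int, end: int) -> Tuple[str, str]:
    """Copy donor[start:end] into the recipient; the donor is not altered."""
    converted = recipient[:start] + donor[start:end] + recipient[end:]
    return converted, donor


def reciprocal_exchange(site: str, donor: str, start: int, end: int) -> Tuple[str, str]:
    """Swap the segment [start:end] between the expression site and the donor."""
    new_site = site[:start] + donor[start:end] + site[end:]
    new_donor = donor[:start] + site[start:end] + donor[end:]
    return new_site, new_donor


if __name__ == "__main__":
    expression_site = "AAAACCCCGGGG"  # hypothetical expression-site fragment
    mgpar_donor = "AAAATTTTGGGG"      # hypothetical homologous donor fragment
    # Gene conversion: the donor keeps its original sequence.
    print(gene_conversion(expression_site, mgpar_donor, 4, 8))
    # Reciprocal exchange: the donor now carries the displaced expression-site segment.
    print(reciprocal_exchange(expression_site, mgpar_donor, 4, 8))
```

In both models the expression site acquires the donor segment; only the reciprocal model also changes the donor, which is why covariation between expression-site and MgPar sequences is the signature sought in the single-colony clones.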


Architecture of MgPar sequences in M. genitalium G-37T

The MgPar partial repeat sequences have homology to mgpB and mgpC [MG_191 and MG_192 in the Institute for Genomic Research (TIGR) database], which are each present in a single copy on the G-37T chromosome. To define the architecture of MgPar regions, we used pairwise alignments to characterize mgpB and mgpC sequence repetition within each of the nine MgPar regions, using techniques described previously (Iverson-Cabral et al., 2006; GenBank accession numbers DQ248096–DQ248104). Our previous investigation of mgpB sequence repetition (Iverson-Cabral et al., 2006) confirmed that there are three distinct regions of repetition, termed B, EF and G, within this expression site (Fig. 1). In the current evaluation, we determined that eight of the nine MgPar regions have homology to sequences within the mgpC expression site (Fig. 1); MgPar 6, the smallest of the MgPar regions, has no homology to mgpC and is classified as an MgPar region based on its homology to mgpB repeat region EF (Fig. 1). The use of full-length MgPar sequences in this pairwise alignment confirms that, for MgPars 1, 3–5 and 7, homology to mgpC is concentrated within the previously defined repeat regions KL and LM (Fig. 1), which were preliminarily identified through an examination of incomplete genomic clones (Peterson et al., 1995). Using the complete G-37T genome sequence, we additionally determined that MgPars 2, 8 and 9 contain homologous sequences spanning both regions KL and LM, as well as the intervening sequences (Fig. 1), indicating that the mgpC expression site contains a single, large repeat region, referred to herein as KLM. This is consistent with the results of a recent homology search of the M. genitalium genome using sequences within the mgpC gene (Musatovova et al., 2006).

Figure 1.

Homology between the mgpB and mgpC expression sites and the nine MgPar sequences in the G-37T genome, determined by pairwise alignment, showing concentrated regions of homology within mgpB regions B, EF and G and mgpC regions KL and LM, as well as the discontinuous and alternating homology characteristic of the majority of MgPar regions. Because MgPars 2, 8 and 9 have homology to both mgpC regions KL and LM, as well as the intervening sequences, throughout this manuscript we refer to the single, large mgpC repeat region as KLM. Conserved portions of each gene are shown in white and repeat regions are highlighted in black; small arrows indicate the location of the ‘AGT’ trinucleotide repeats present within mgpB, mgpC and three MgPar sequences. Below the mgpB/mgpC schematic, MgPar sequences are listed with their corresponding sizes to the left. Sequences in each MgPar with homology to mgpB or mgpC are shown diagrammatically by a line; the placement of each line corresponds to the location in mgpB or mgpC at which homology was observed. Each line is numbered according to the sequences deposited in GenBank (Iverson-Cabral et al., 2006), so that the discontinuous homology to mgpB and mgpC sequences can be appreciated.

When the homology of the MgPar regions is compared with both mgpB and mgpC sequences, it is apparent that the majority of MgPar regions share a common architecture in which segments homologous to mgpB are interspersed with segments homologous to mgpC. For example, scanning along MgPar 4, homology to mgpB regions B and EF is followed by homology to mgpC region KL, mgpB region G and, finally, mgpC region LM (Fig. 1). This arrangement of interspersed mgpB and mgpC homology is common to the majority of MgPar regions, including MgPars 1, 3–5, 7 and 9 (Fig. 1). This discontinuous homology indicates that these potential donor sites are unlikely to act as alternative expression sites themselves, because they contain only fragments of both genes and lack the intervening conserved portions of either gene entirely (Fig. 1). Although the transcriptional activity of MgPar regions has not, to our knowledge, been evaluated, the alternating mgpB and mgpC sequence homology, together with the absence of any functional start codons and the presence of AT-rich regions that encode stop codons in all three reading frames (data not shown; Peterson et al., 1995), suggests that these sequences serve as a reservoir of alternative sequences for genetic variation of mgpB and mgpC.

Overall, the MgPars are 79–90% identical to sequences within mgpB and mgpC, in a pattern in which stretches of exact identity are interspersed with sequences that deviate from either gene by nucleotide substitutions, insertions and deletions (see Fig. S1 and Iverson-Cabral et al., 2006). For clarity, throughout this manuscript the term identical indicates sequences with 100% identity, while the term homologous indicates sequences that share a high degree of similarity (above 75%) but are not identical. As observed for mgpB (Iverson-Cabral et al., 2006), the correct reading frame within mgpC would be maintained following recombination with the MgPars, because nucleotide insertions and deletions within the homologous MgPar regions generally occur in multiples of three. However, a small segment of MgPar 3 that is homologous to the KL region of mgpC contains two extra nucleotides (Fig. S1); if the portion of MgPar 3 that includes these extra bases were inserted into mgpC, the reading frame would shift and premature stop codons would be introduced. Because recombination between mgpB or mgpC and the MgPar regions frequently occurs in a segmental manner (see below), an mgpC/MgPar 3 recombination event would not necessarily result in premature truncation of P110.
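The reading-frame argument reduces to simple arithmetic: replacing an expression-site segment of length L_old with a donor segment of length L_new keeps downstream codons in frame only if L_new − L_old is a multiple of three. The Python sketch below illustrates this with a hypothetical function name and invented sequences; it does not model how recombination boundaries are chosen.

```python
from typing import Tuple


def splice_segment(gene: str, start: int, end: int, donor_segment: str) -> Tuple[str, bool]:
    """Replace gene[start:end] with donor_segment.

    Returns the recombinant sequence and whether the downstream reading
    frame is preserved, i.e. whether the net length change is a multiple of 3.
    """
    recombinant = gene[:start] + donor_segment + gene[end:]
    net_change = len(donor_segment) - (end - start)
    return recombinant, net_change % 3 == 0


if __name__ == "__main__":
    gene = "ATGAAACCCGGGTTT"  # hypothetical in-frame fragment
    # Net +3 nt: downstream codons stay in frame.
    print(splice_segment(gene, 3, 6, "TTTTTT"))
    # Net +2 nt, analogous to the two extra bases in the MgPar 3 segment: frameshift.
    print(splice_segment(gene, 3, 6, "TTTTT"))
```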

To complement the pairwise alignment between mgpC and the MgPar sequences discussed above, we used a modification of the computer algorithm developed by Rocha and Blanchard (2002; Iverson-Cabral et al., 2006) to determine whether mgpC sequences were repeated elsewhere in the genome (outside of the nine MgPars). In this analysis, a sliding 23 bp window (bp 1–23, 2–24, 3–25, etc.) of the mgpC expression site was used to probe the entire G-37T genome for sequences of exact identity, allowing no mismatches. Consistent with our pairwise alignment (Fig. 1), sequence repetition is concentrated within the proximal end of mgpC covering repeat region KLM (Fig. 2). Conversely, the terminal end of mgpC clearly contains sequences that are unique to the expression site, because sequences in this region were detected only once in the genome (Fig. 2). Notably, this second analysis also revealed a sharp peak of repetition at bp 1225 in mgpC (Fig. 2), which results from consecutive repeats of the sequence ‘AGT’ (formerly referred to as ‘TAG’ repeats; see Fig. 1 and Iverson-Cabral et al., 2006) that also occur elsewhere in the genome, including in mgpB. The ‘AGT’ repeat region within mgpC contains 11 consecutive ‘AGT’ repeats that are predicted to encode 11 successive serine residues. Unlike the poly-‘AGT’ sequence within mgpB, which does not fall within a repeat region because it lies just beyond region EF (Fig. 1), the poly-‘AGT’ sequence within mgpC is located in the single, large repeat region KLM that is homologous to MgPars 2, 8 and 9. Correspondingly, within the G-37T genome, MgPars 2, 8 and 9 are reported to have 16, 10 and 9 ‘AGT’ repeats respectively (Figs 1 and S1).
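The sliding-window search can be sketched as an exact k-mer count. The Python below is a simplified illustration, not the Rocha and Blanchard implementation: it indexes every 23-mer of the genome once, then reports for each window of the gene how many times that window occurs. Two choices are our assumptions: both strands are searched, and windows spanning the origin of the circular chromosome are ignored. Because the window length is odd, no window can equal its own reverse complement, so searching both strands never counts the same locus twice.

```python
from collections import Counter
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_index(genome: str, k: int) -> Counter:
    """Count every k-mer on both strands of a linearised genome."""
    counts: Counter = Counter()
    for strand in (genome, reverse_complement(genome)):
        for i in range(len(strand) - k + 1):
            counts[strand[i:i + k]] += 1
    return counts


def window_repetition(gene: str, genome: str, k: int = 23) -> List[int]:
    """Genome copy number of each window gene[i:i+k] (bp i+1 to i+k).

    If the gene is part of the genome, a value of 1 means the window is unique
    to the gene; values above 1 mark repeated sequence.
    """
    counts = kmer_index(genome.upper(), k)
    gene = gene.upper()
    return [counts[gene[i:i + k]] for i in range(len(gene) - k + 1)]
```

For G-37T, `genome` would be the 580 kb chromosome and `gene` the mgpC sequence; plotting the returned list against window position would give a repetition profile of the kind shown in Fig. 2.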

Figure 2.

Regions of mgpC sequence repetition as identified by a computer-based algorithm that probed the G-37T genome for sequences with 100% identity to overlapping 23 bp segments of the full-length gene. Sequences within mgpC that are repeated within the G-37T genome correspond to the single, large repeat region KLM within mgpC (bp 125–1536), also identified in the pairwise alignment presented in Fig. 1.

In vitro heterogeneity of the mgpC gene in M. genitalium G37-S

The observation that the 5′, but not the 3′, end of mgpC contains sequences that are repeated throughout the genome indicates that only the proximal end of this gene would be a target for homologous recombination and, therefore, genetic variation. To determine whether mgpC repeat region KLM is heterogeneous within a single M. genitalium strain, we analysed the diversity of mgpC sequences within a stock culture of the type strain G-37, designated G37-S, which has been grown and maintained in Seattle since it was originally obtained from ATCC. Using primers that amplify only mgpC sequences present within the expression site (C1-F/C1-R, Fig. 3A), we amplified the 5′ end of mgpC repeat region KLM, cloned the polymerase chain reaction (PCR) products, and sequenced 25 plasmids, from which four sequences [G37-S (1) through (4)] were identified (Fig. 3B). Sequence G37-S (1) was identical to the published G-37T mgpC sequence (identified in 22 of 25 plasmid clones), while sequences G37-S (2), (3) and (4) (identified in one plasmid each) were divergent, with identity to MgPars 8 and/or 9 (Fig. 3B). These results suggest that portions of the putative MgPar regions had been inserted into the mgpC expression site within M. genitalium strain G37-S, a strain we have previously shown to contain heterogeneous mgpB variants that are also indicative of MgPar recombination (Iverson-Cabral et al., 2006). We also assessed G37-S diversity within the 3′ end of mgpC region KLM using a second PCR assay that amplified sequences within the expression site (primers C2-F/C2-R, Fig. 3A). Five different sequences were detected with this PCR among the 25 plasmids analysed, and they differed from the published G-37T mgpC sequence solely in the number of ‘AGT’ repeats present. As stated above, the published G-37T mgpC sequence has 11 such repeats, whereas the five sequences identified in G37-S contained 6–17 consecutive ‘AGT’ repeats (data not shown).
Although the homologous MgPar 2, 8 and 9 sequences also contain this trinucleotide repeat (Fig. 1), the DNA flanking the poly-‘AGT’ repeat regions within the heterogeneous G37-S mgpC sequences was unchanged, indicating that variation in the number of ‘AGT’ repeats is unlikely to be the result of recombination with homologous MgPar regions.
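The comparison of ‘AGT’ copy number with the flanking DNA can be expressed as a small function that locates the longest tandem run of a trinucleotide unit and returns the sequence on either side. The Python sketch below uses invented flanking sequences, and the function name is hypothetical. Variants whose flanks match but whose copy numbers differ correspond to the pattern described above, in which the repeat count changes without any accompanying change in the surrounding sequence.

```python
import re
from typing import Tuple


def split_at_longest_run(seq: str, unit: str = "AGT") -> Tuple[str, int, str]:
    """Return (left flank, number of tandem copies, right flank) for the longest run of `unit`."""
    runs = re.finditer(f"(?:{re.escape(unit)})+", seq)
    best = max(runs, key=lambda m: len(m.group()), default=None)
    if best is None:
        return seq, 0, ""
    return seq[:best.start()], len(best.group()) // len(unit), seq[best.end():]


if __name__ == "__main__":
    left, right = "GCTTCA", "GGTACC"  # hypothetical flanking sequence
    for copies in (6, 11, 17):
        variant = left + "AGT" * copies + right
        print(split_at_longest_run(variant))
```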

Figure 3.

Sequence heterogeneity of mgpC in M. genitalium strain G37-S showing the apparent insertion of MgPar sequences into the mgpC expression site.
A. Schematic of the full-length G-37T mgpC expression site with the location of the proximal repeat region KLM (black fill) and the terminal conserved region (white fill). Arrows indicate the location of primers used to amplify region KLM, as described in Experimental procedures.
B. Sequence variation identified using primers C1-F/C1-R. Note that sequence G37-S (1) is identical to the corresponding sequence of mgpC published in the M. genitalium G-37T genome, while alternative sequences G37-S (2), (3) and (4) contain 169, 515 and 130 bp of identity to MgPar 8 and those common to MgPars 8 and 9, as indicated by the key.
C. Illustration of the M. genitalium strain TW10-5 mgpC sequence (Musatovova et al., 2006), which is identical to the G-37T mgpC sequence except for bases 255–654, which are identical to MgPar 7. Arrows show the location of primers used in the MgPar 7-anchored PCR. Alternative sequences G37-S (5) through (11) identified using this PCR assay contain identity to various MgPar regions as indicated by the key. Sequences G37-S (2) through (11) are available under GenBank accession numbers EF458019 to EF458028.

To determine whether additional mgpC recombinant sequences were present within G37-S, we developed a more sensitive PCR assay using a primer specific for the mgpC expression site (C1-F, Fig. 3C) in combination with a primer specific for MgPar 7 (TW10-R, Fig. 3C). We chose to look specifically for mgpC/MgPar 7 recombinant sequences based on the recent publication of the mgpC sequence for M. genitalium strain TW10-5 (GenBank accession number AY679761; Musatovova et al., 2006), which we included in our sequence analysis and which differs from G-37T by the insertion of ∼400 bp from MgPar 7 (Fig. 3C). This MgPar 7-anchored PCR assay should produce a product only when the segment of MgPar 7 targeted by the reverse primer has been inserted into the mgpC expression site; this was verified by the failure to obtain PCR products in a similar analysis of our single-colony filtered clone, G37-C (data not shown), which contains the published G-37T mgpC sequence and lacks any detectable alternative sequences (see below). Using this MgPar 7-anchored PCR to amplify DNA from G37-S, we not only identified a sequence within our stock of G-37 (G37-S) that is identical to the TW10-5 mgpC sequence [sequence G37-S (5), Fig. 3C], but also detected six additional mgpC variants [sequences G37-S (6) through (11), Fig. 3C] among the 10 plasmids analysed. Each recombinant sequence contained a different portion of MgPar 7, and sequences G37-S (10) and (11) also included sequences with identity to other MgPars. The results of the more sensitive MgPar 7-anchored PCR indicate that mgpC diversity within G37-S is more extensive than our initial survey suggested, and that other mgpC variants would likely be identified if anchored PCR assays targeting other MgPar donor regions were employed.
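The logic of the MgPar 7-anchored assay, in which a product requires an expression-site primer and an MgPar 7 primer to bind the same molecule, can be illustrated with a minimal in silico PCR. The Python sketch below rests on simplifying assumptions: primers must match exactly, only the top strand of the template is searched, products longer than a chosen cut-off (`max_len`) are discarded, and the primer and template sequences are invented stand-ins for C1-F, TW10-R and the corresponding loci.

```python
import re
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd: str, rev: str, max_len: int = 5000) -> List[str]:
    """Return exact-match amplicons from the top strand of `template`.

    The forward primer must occur as written; the reverse primer must occur
    downstream as its reverse complement.
    """
    rev_site = reverse_complement(rev)
    fwd_hits = [m.start() for m in re.finditer(f"(?={re.escape(fwd)})", template)]
    rev_hits = [m.start() for m in re.finditer(f"(?={re.escape(rev_site)})", template)]
    products = []
    for f in fwd_hits:
        for r in rev_hits:
            end = r + len(rev_site)
            if r >= f and end - f <= max_len:
                products.append(template[f:end])
    return products


if __name__ == "__main__":
    fwd = "GATTACA"  # stand-in for an expression-site-specific primer
    rev = "TTTCCC"   # stand-in for an MgPar 7-specific primer
    wild_type = "GATTACA" + "CCCCCCCCCC" + "TTTT"   # no MgPar 7 segment inserted
    recombinant = "GATTACA" + "GGGAAATTT" + "TTTT"  # MgPar 7-like segment inserted
    print(in_silico_pcr(wild_type, fwd, rev))
    print(in_silico_pcr(recombinant, fwd, rev))
```

Only the recombinant template yields a product. In a whole genome, the donor MgPar 7 locus would also carry the reverse-primer site, but without the expression-site primer binding within `max_len` no product would form, which is the behaviour the G37-C control was used to confirm.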
Additionally, the evaluation of mgpC heterogeneity within G37-S described above shows that the 5′ end of mgpC is highly diverse within a single strain, indicating that the proposed use of sequence diversity within this region to differentiate strains (Musatovova et al., 2006) should be interpreted with caution.

To assess diversity at the amino acid level, all divergent sequences identified in the experiments above were translated in silico. Each G37-S mgpC variant maintained the correct reading frame and was predicted to encode a divergent amino acid sequence. Together, the in vitro evaluation of mgpC region KLM indicates that multiple variants are present within the single cultured strain G37-S, that alternative sequences are indicative of MgPar insertion into the mgpC expression site, and that the variant mgpC sequences are predicted to result in the expression of variant P110 proteins.
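In silico translation of mycoplasmal sequences should use the Mycoplasma/Spiroplasma genetic code (NCBI translation table 4), in which UGA encodes tryptophan rather than a stop; with the standard code, any in-frame TGA codon would be misread as a premature stop. The Python sketch below builds table 4 from the conventional TCAG codon ordering and checks an internal gene fragment for stop codons. It assumes the fragment begins at a codon boundary; the function names and example sequences are ours.

```python
from typing import Dict

BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
# NCBI translation table 4 differs from the standard code only at TGA (W instead of *).
TABLE4_AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE: Dict[str, str] = {
    first + second + third: TABLE4_AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate frame 0; a trailing partial codon is ignored and ambiguous codons become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def is_open_fragment(seq: str) -> bool:
    """True if an internal gene fragment contains no in-frame stop codon."""
    return "*" not in translate(seq)


if __name__ == "__main__":
    print(translate("AGT" * 11))           # poly-'AGT' encodes consecutive serines
    print(is_open_fragment("ATGTGAAGT"))   # TGA read as Trp under table 4
    print(is_open_fragment("ATGTAGAGT"))   # TAG is a stop in table 4
```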

In vivo heterogeneity of the mgpC gene in a persistent M. genitalium infection

The heterogeneity observed in a single strain of cultured M. genitalium prompted us to investigate whether mgpC diversity occurs during natural infection. For this analysis, we examined multiple cervical samples collected longitudinally from a woman persistently infected with a single strain of M. genitalium. This woman (participant #10090) had nine consecutive M. genitalium-positive cervical samples collected over a 19-month period. To evaluate in vivo mgpC variation, the 5′ end of mgpC repeat region KLM was amplified (primers C1-F/C1-R, Fig. 3A) using DNA isolated directly from cervical samples. We chose to evaluate in vivo mgpC sequence diversity using DNA derived from patient specimens not only because corresponding viable clinical strains are unavailable, but also because sequences amplified directly from cervical samples allowed us to assess diversity in vivo, unaffected by sequence changes that might occur during subsequent in vitro culture. For this evaluation, five of the M. genitalium-positive samples, collected on 24-May-01, 26-Jul-01, 29-Jan-02, 2-Apr-02 and 3-Dec-02, were examined to provide a longitudinal picture of the location and extent of mgpC diversity in vivo (Fig. 4).

Figure 4.

Heterogeneous mgpC sequences identified in cervical specimens collected from participant #10090, a woman persistently infected with a single strain of M. genitalium. A total of 50 sequences (10 per cervical sample) were analysed, and eight unique variants (A through H) were identified in the five time points examined.
A. Schematic of the eight mgpC sequences identified, as listed to the left. White regions are identical to the published G-37T mgpC sequence, while shaded areas illustrate divergence and are instead identical to G-37T MgPar sequences or have no homology to sequences within the G-37T genome (‘novel’), as indicated by the key. Block sizes in the key represent five nucleotides each. The star below the schematically depicted sequences designates the region shown in the amino acid alignment in C.
B. The temporal relationship of mgpC sequences identified. The majority of sequences were present at only one time point; however, sequences C and F were identified in two ‘early’ and three ‘late’ specimens respectively.
C. An alignment between amino acids 80–150 of the published P110 sequence for G-37T (GenBank accession number AAA2521; Inamine et al., 1989) and the amino acid sequences predicted for variants A through H. Residues identical to G-37T are highlighted in grey; divergent, missing or inserted amino acids are left unshaded. Below the alignment, the following symbols represent the level of residue conservation: (*) identical, (:) conserved, (.) semiconserved and ( ) no conservation, as defined by the Clustal alignment program. Sequences A through H have been deposited in GenBank under accession numbers EF458029 to EF458036.

Among the 50 mgpC sequences analysed from these five cervical samples (10 per time point), eight unique variants were identified. As illustrated in Fig. 4, three sequences (A to C) were detected at the first time point (24-May-01), three (C to E) at the second (26-Jul-01), two (F and G) at the third (29-Jan-02), two (F and H) at the fourth (2-Apr-02), and one (F) at the last (3-Dec-02). Most of the mgpC variants identified in this persistently infected woman were detected in only one cervical sample, with sequences A, B, D, E, G and H each unique to their respective specimen (Fig. 4A and B). Of the sequences identified in multiple samples, sequence C was restricted to the two early time points, while sequence F was observed only in the final three samples (Fig. 4A and B). The distribution of sequences over time in this woman (Fig. 4B) is consistent with a temporal shift in mgpC sequence prevalence that could be related to the function of, or immune response to, P110 during infection. All mgpC variants identified in this analysis contain regions of absolute identity to G-37T mgpC or MgPar sequences, as well as ‘novel’ sequences that have no identity to sequences within the published genome (Fig. 4A). Because the only reported MgPar donor sequences are derived from strain G-37T, which is distinct from the strain identified in this woman, these ‘novel’ sequences most likely represent MgPar sequences specific to this clinical strain. Such ‘novel’ MgPar sequences were also identified in the mgpB region B sequences of various M. genitalium clinical isolates (Peterson et al., 1995), as well as in our previous evaluation of in vivo mgpB region B sequences from another woman in this cohort (participant #10139) who was infected with a different M. genitalium strain (Cohen et al., 2007; Iverson-Cabral et al., 2006).
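The temporal pattern can be tabulated directly from the detections listed above. The Python sketch below records, for each sampling date, the variants reported in the text and inverts that mapping to show at which dates each variant was seen; it captures presence only, not the number of clones per variant, and the variable and function names are ours.

```python
from typing import Dict, List, Set

# Variants detected at each sampling date, as listed in the text (Fig. 4B).
DETECTIONS: Dict[str, Set[str]] = {
    "24-May-01": {"A", "B", "C"},
    "26-Jul-01": {"C", "D", "E"},
    "29-Jan-02": {"F", "G"},
    "2-Apr-02": {"F", "H"},
    "3-Dec-02": {"F"},
}


def dates_per_variant(detections: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert date -> variants into variant -> dates (dates kept in sampling order)."""
    seen: Dict[str, List[str]] = {}
    for date, variants in detections.items():
        for variant in sorted(variants):
            seen.setdefault(variant, []).append(date)
    return dict(sorted(seen.items()))


if __name__ == "__main__":
    for variant, dates in dates_per_variant(DETECTIONS).items():
        label = "recurrent" if len(dates) > 1 else "single time point"
        print(variant, label, dates)
```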

The mgpC sequences identified in participant #10090 were translated in silico, and, as the alignment of the eight predicted P110 sequences in Fig. 4C shows, there is considerable diversity among the proteins predicted to be expressed by this strain in vivo. Sequences A through H each maintain the correct reading frame, contain no premature stop codons, and are predicted to encode P110 proteins whose amino acid sequences are 80–98% identical and 87–99% similar to one another (data not shown). The predicted amino acid sequences of the ‘early’ mgpC variants (sequences A through E, Fig. 4C) are more closely related to each other than to the ‘late’ mgpC variants (sequences F through H, Fig. 4C), and vice versa. Additionally, both the ‘early’ and ‘late’ amino acid sequences differ considerably from the corresponding G-37T P110 sequence (data not shown).
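Pairwise identity figures such as those quoted above depend on how alignment gaps are treated. As one common convention (an assumption here; the study does not state its method), the Python sketch below counts identical residues over all alignment columns in which at least one sequence has a residue, so gaps count as mismatches, and reports the range over all pairs. Similarity would additionally require a residue-grouping scheme, which is omitted. The sequences are invented aligned fragments, not P110 data.

```python
from itertools import combinations
from typing import Sequence, Tuple


def percent_identity(a: str, b: str) -> float:
    """Identity between two rows of the same alignment; '-' marks a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned residues to compare")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


def identity_range(aligned: Sequence[str]) -> Tuple[float, float]:
    """Minimum and maximum pairwise identity across all pairs of aligned sequences."""
    values = [percent_identity(a, b) for a, b in combinations(aligned, 2)]
    return min(values), max(values)


if __name__ == "__main__":
    rows = ["MKSS-TQ", "MKSASTQ", "MRSA-TE"]  # hypothetical aligned fragments
    low, high = identity_range(rows)
    print(f"{low:.1f}-{high:.1f}% identical")
```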

To determine whether the in vivo mgpC heterogeneity observed in participant #10090 arose within a single M. genitalium strain rather than from coinfection with multiple strains, we performed strain typing using an established PCR amplification and sequencing method (Jensen et al., 2004; Hjorth et al., 2006). As reported previously (Hjorth et al., 2006), this method of strain typing has a high discriminatory index, is stable over time in infected individuals, and shows concordance between sexual partners. This sequence-based system discriminates M. genitalium strains based on single-nucleotide polymorphisms within the semiconserved 5′ region of mgpB, a region that, importantly, is present in a single copy in the M. genitalium chromosome. Using this method to evaluate the cervical samples from participant #10090, we determined that this woman was infected with a single strain over the 19 months of sample collection (GenBank accession number DQ248090; Cohen et al., 2007; Iverson-Cabral et al., 2006), indicating that the mgpC region KLM diversity observed (Fig. 4) occurred within a single infecting strain. Furthermore, this M. genitalium strain differed from those identified in other persistently infected women sampled at similar dates in the same cohort (Cohen et al., 2007; Iverson-Cabral et al., 2006; see also fig. 3 in Cohen et al., 2007). We conclude that sequence variation within mgpC is consistent with recombination with MgPar regions, and that in vivo mgpC heterogeneity is extensive and evolves over time in a persistent infection.
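A typing method's discriminatory index is commonly calculated with the Hunter–Gaston formula, D = 1 − [1/(N(N − 1))] Σ n_j(n_j − 1), where N is the number of unrelated strains typed and n_j is the number assigned to type j; D is the probability that two unrelated strains drawn at random are assigned different types. We assume this is the index referred to above; the Python sketch below simply evaluates the formula, and the type counts in the example are hypothetical.

```python
from typing import Iterable


def hunter_gaston_index(type_counts: Iterable[int]) -> float:
    """Hunter-Gaston discriminatory index from the number of strains assigned to each type."""
    counts = list(type_counts)
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two typed strains are required")
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical: 10 unrelated strains resolved into types of sizes 3, 2, 1, 1, 1, 1, 1.
    print(round(hunter_gaston_index([3, 2, 1, 1, 1, 1, 1]), 3))
```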

Isolation of M. genitalium single-colony clonal variants

Our in vitro evaluations of mgpB (Iverson-Cabral et al., 2006) and mgpC heterogeneity demonstrate that the G37-S culture is a heterogeneous population expressing multiple mgpB and mgpC variants. Before addressing whether recombination between mgpB or mgpC and the MgPar regions is reciprocal, we first isolated single-colony filtered clones, each expressing a single mgpB and mgpC sequence and free of the other sequences present within the diverse G37-S population. Three single-colony M. genitalium variants were selected and purified based on sequence variation within mgpB region B, mgpB region G and mgpC region KLM, respectively, in addition to a control clone that contained the published G-37T mgpB and mgpC sequences. To isolate these populations, the following strategy was employed: G37-S was inoculated onto H-agar plates, and mgpB repeat regions B and G and the 5′ end of mgpC repeat region KLM were amplified from 100 well-isolated colonies. The resulting amplicons were digested with restriction enzymes to identify colonies expressing mgpB or mgpC sequences that differed from those published for G-37T. As shown in Fig. 5, PflFI, PflMI and MscI digestion clearly demonstrated that the variant clones contain mgpB or mgpC repeat region sequences that deviate from those published for G-37T. Colonies identified in this screen were filter cloned at least three times, as recommended for single-colony cloning of Mycoplasma species (Tully, 1983). Following the final passage, the repeat region selected using the restriction enzyme screen was amplified, cloned and sequenced from 10 plasmids to determine the relative homogeneity of the alternative sequence expressed by each clone.
After filter cloning was complete, four single-colony clones were isolated that contained mgpB and mgpC sequences matching those published for G-37T (designated clone G37-C) or that had variant sequences within mgpB region B (designated clone G37-vB), mgpB region G (designated clone G37-vG), or mgpC region KLM (designated clone G37-vKLM). The mgpB region B sequence in clone G37-vB contained the apparent insertion of MgPar 7 (Fig. 6A–C) in 9 of the 10 plasmids analysed, while the single remaining sequence had identity to both MgPar 7 and MgPar 8 (data not shown). The mgpB region G sequence in the G37-vG variant contained the previously described alternative sequence 3-2 (Iverson-Cabral et al., 2006), which contains ∼115 bp of MgPar 3 (Fig. 6A, D and E), in all 10 sequences examined. The mgpC region KLM sequence of G37-vKLM contained multiple insertions of both MgPars 8 and 1 (Fig. 6A, F and G) in nine of the sequences analysed, while the remaining sequence was identical except that it contained one fewer segment of MgPar 8 within mgpC region KLM (data not shown). The dominant mgpC region KLM sequence amplified from G37-vKLM provides evidence for multiple recombination events within a single repeat region, because five segments of MgPar 8 and a small segment of MgPar 1 have been inserted into the mgpC expression site (Fig. 6A and F). In contrast, the changes within the mgpB repeat regions described above each reflect a single recombination event. Finally, the mgpB regions B and G and mgpC region KLM sequences of the control clone G37-C contained the published G-37T sequences in all plasmids examined (Fig. 6A).
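The restriction screen works because a sequence change that creates or destroys a recognition site alters the fragment sizes of the amplicon. The Python sketch below performs a simple in silico digest of a linear amplicon, allowing N in recognition sequences, and includes only MscI (TGG^CCA) as an example; recognition sequences and cut positions for PflFI, PflMI or other enzymes would need to be taken from REBASE or the supplier rather than assumed. Only the top-strand cut position is modelled, and the example amplicons are invented.

```python
import re
from typing import Dict, List, Tuple

# enzyme: (recognition sequence, top-strand cut offset within the site)
ENZYMES: Dict[str, Tuple[str, int]] = {
    "MscI": ("TGGCCA", 3),  # TGG^CCA, blunt ends
}

# Only unambiguous bases and N are supported in recognition sequences.
_IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def digest(amplicon: str, site: str, cut_offset: int) -> List[str]:
    """Fragments of a linear amplicon cut at every (possibly overlapping) site."""
    pattern = re.compile("(?=" + "".join(_IUPAC[b] for b in site.upper()) + ")")
    cuts = [m.start() + cut_offset for m in pattern.finditer(amplicon.upper())]
    bounds = [0] + cuts + [len(amplicon)]
    return [amplicon[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def fragment_sizes(amplicon: str, enzyme: str) -> List[int]:
    """Fragment lengths, largest first, as they would be read off a gel."""
    site, offset = ENZYMES[enzyme]
    return sorted((len(f) for f in digest(amplicon, site, offset)), reverse=True)


if __name__ == "__main__":
    reference = "AAAATGGCCAAAAAAAA"  # hypothetical amplicon carrying an MscI site
    variant = "AAAATGGACAAAAAAAA"    # hypothetical variant in which the site is lost
    print(fragment_sizes(reference, "MscI"))
    print(fragment_sizes(variant, "MscI"))
```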

Figure 5.

Use of restriction enzyme digest patterns to identify colonies of M. genitalium G37-S containing variant mgpB or mgpC sequences. The PCR products resulting from amplification of mgpB and mgpC repeat regions of interest were subjected to restriction enzyme digestion, thereby differentiating colonies containing sequences that deviated from those present in the control clone G37-C, which contained the mgpB and mgpC sequences published for G-37T. Colonies identified using this screen were filter cloned at least three times with identification and selection of variant colonies following each passage. Restriction enzymes used and clones identified were:
A. PflFI digestion identified the variant mgpB repeat region B sequence in G37-vB;
B. PflMI digestion identified the variant mgpB repeat region G sequence in G37-vG;
C. MscI digestion identified the variant 5′ mgpC repeat region KLM sequence in G37-vKLM.

Figure 6.

Sequences within MgPar regions covary with those in mgpB and mgpC in a pattern consistent with reciprocal homologous recombination. In panels A, B, D, and F, boxed areas in black represent repeat regions identical to published G-37T mgpB and mgpC sequences, as do bases shaded in grey in panels C, E, and G. For all panels, colour shading shows identity to G-37T MgPar sequences as indicated by the key.
A. Schematic of mgpB and mgpC sequences for single-colony M. genitalium clones G37-C, G37-vB, G37-vG and G37-vKLM showing the insertion of MgPar regions into repeat regions of either gene. Variation within mgpB region B is expanded in panels B and C, mgpB region G in panels D and E, and mgpC in panels F and G. Panels B, D and F schematically depict recombination events observed between mgpB/mgpC and MgPar regions, with hatched lines indicating the approximate location for recombination initiation or termination; those in black highlight the reciprocal portion of the exchange, while those in red indicate the associated non-reciprocal portion of the recombination event, if present.
B. The perfectly reciprocal exchange of DNA between mgpB region B and MgPar 7 in G37-vB.
D. The asymmetrical reciprocal exchange of sequences between mgpB region G and MgPar 3 in G37-vB, G37-vG and G37-vKLM, in which only a portion of MgPar 3 contains the original mgpB expression site sequence.
F. The multiple recombination events between mgpC region KLM and MgPars 8 and 1 in G37-vKLM. The black line at the bottom of the panel indicates the region shown in more detail in panel G. Panels C, E and G show nucleotide alignments between the published G-37T and variant sequences indicated to illustrate the reciprocal transfer of sequences.
C. The G37-vB mgpB region B and MgPar 7 sequences showing perfect reciprocal recombination between these two regions. The black line denotes nucleotides that have been mutually exchanged.
E. The G37-vB, G37-vG and G37-vKLM mgpB region G and MgPar 3 sequences, where nucleotides underlined in black demonstrate the perfectly reciprocal exchange of DNA, while those underlined in red highlight the asymmetric nature of the recombination event.
G. The G37-vKLM mgpC region KLM, MgPar 8 and MgPar 1 sequences showing the reciprocal exchange of DNA. Bases underlined in solid black show the mutual transfer of DNA between the mgpC expression site and MgPar 8, while those underlined with a dotted line note the exchange of sequences between MgPar 1 and MgPar 8 that most likely occurred before MgPar 8 was inserted into the mgpC expression site. Variant sequences identified in this analysis are available under the following GenBank accession numbers: G37-vB mgpB region B (EF458037); G37-vB mgpB region EF (EF458038); G37-vB, G37-vG and G37-vKLM mgpB region G (DQ248069); G37-vKLM mgpC region KLM (EF458041); G37-vB MgPar 7 (EF458039); G37-vB MgPar 5 (EF458040); G37-vB MgPar 3 (EF458044); G37-vG MgPar 3 (EF458045); G37-vKLM MgPar 3 (EF458046); G37-vKLM MgPar 8 (EF458042); and G37-vKLM MgPar 1 (EF458043).

To define fully the mgpB and mgpC diversity present within each of these G37-S single-colony clones, all four potentially variable regions within mgpB and mgpC (repeat regions B, EF, G and KLM) were amplified and sequenced. Although G37-vB was selected because it carried an alternative mgpB region B sequence, it also showed sequence variation within mgpB regions EF and G (Fig. 6A). The alternative mgpB repeat region EF sequence contained a ∼12 bp sequence identical to MgPar 5, and the mgpB region G sequence matched that expressed by G37-vG (Fig. 6A); in total, therefore, G37-vB contained the apparent insertion of MgPars in each of the three mgpB repeat regions (Fig. 6A). The G37-vG variant showed less diversity across the mgpB and mgpC expression sites: the alternative mgpB region G sequence described above was the only variation detected (Fig. 6A). Interestingly, although G37-vKLM was selected on the basis of deviation within mgpC region KLM, it also contained the alternative mgpB region G sequence identified in both G37-vB and G37-vG, indicating that this clone has accumulated diversity within both mgpB and mgpC (Fig. 6A). Altogether, the mgpB and mgpC sequences identified in these M. genitalium clones, which were all isolated from the same G37-S culture, illustrate that extensive sequence variation occurs within a single strain and that the mgpB and mgpC repeat regions can vary independently of each other.

Heterogeneity of MgPar sequences in M. genitalium single-colony variants can be explained by reciprocal recombination

Using the G37-S single-colony filtered clones, we next sought to determine whether insertion of alternative MgPar sequences into the mgpB/mgpC expression sites occurs via gene conversion, in which the donor MgPar sequences would remain unchanged, or via reciprocal recombination, in which there would be a mutual exchange between MgPar and expression site sequences. To distinguish between these two possible outcomes, the predicted MgPar donor regions were sequenced in each of the single-colony M. genitalium variants isolated. We predicted that if MgPar sequences were inserted into mgpB or mgpC through gene conversion, the corresponding MgPars would remain identical to G-37T, and conversely, that if sequences were exchanged through reciprocal recombination, the original G-37T mgpB or mgpC expression site sequence would be identified within the MgPar of interest. Remarkably, in G37-vB, the presence of 142 bp of MgPar 7 within mgpB region B was accompanied by the presence of 142 bp of the G-37T mgpB region B expression site sequence within MgPar 7 (Fig. 6B and C). Similarly, sequencing of MgPar 5 from this same variant clone revealed that the MgPar 5 segment inserted into mgpB region EF had been replaced by ∼12 bp of the original G-37T mgpB expression site (data not shown). In both of these examples, the collective sequence changes within mgpB and MgPar regions are indicative of a reciprocal recombination event, as the genomic loci involved appear to have exchanged sequences in a perfectly reciprocal manner.
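The prediction above can be written as a per-position comparison. The following is a minimal sketch, not the analysis pipeline used in this study: it assumes that the expression-site and donor segments, before and after the event, have already been aligned to equal length, and it uses only informative columns, where the original expression-site base and the original donor base differ. The function name and return labels are hypothetical.

```python
def classify_exchange(site_before: str, donor_before: str,
                      site_after: str, donor_after: str) -> str:
    """Distinguish gene conversion from reciprocal exchange.

    All four strings are assumed to be pre-aligned, equal-length windows.
    """
    seqs = (site_before, donor_before, site_after, donor_after)
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("segments must be aligned to equal length")
    site_gained = donor_gained = 0
    for s0, d0, s1, d1 in zip(*seqs):
        if s0 == d0:
            continue  # identical in both loci: uninformative column
        if s1 == d0:
            site_gained += 1  # expression site now carries the donor base
        if d1 == s0:
            donor_gained += 1  # donor now carries the expression-site base
    if site_gained == 0:
        return "no transfer into expression site"
    if donor_gained == 0:
        return "gene conversion (donor unchanged)"
    return "reciprocal exchange"
```

Under gene conversion only the first counter increases; under reciprocal recombination both do, which is the pattern reported here for mgpB region B and MgPar 7 in G37-vB.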

Sequencing of MgPar 3 from the G37-vB, G37-vG and G37-vKLM variant clones [each of which expressed the same altered mgpB region G sequence described above (Fig. 6A, D and E)] also supports reciprocal recombination in M. genitalium, although the DNA appears to have been exchanged less evenly. In this case, the alternative mgpB region G sequence contains a 115 bp segment identical to MgPar 3, yet the corresponding MgPar 3 sequence in these clonal variants contains only a short section, spanning fewer than 20 bp, from the original G-37T mgpB expression site (Fig. 6D and E). It therefore appears that the alternative mgpB region G sequence in these three M. genitalium variants arose from a small reciprocal exchange near the 3′ end combined with a larger, seemingly non-reciprocal recombination event further upstream (Fig. 6D and E). In this manuscript, we use the term asymmetrical reciprocal recombination to describe such events, in which the transfer of DNA is reciprocal, albeit uneven.
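The distinction between perfectly reciprocal and asymmetrical reciprocal exchange can be expressed by comparing how much informative sequence moved in each direction. In this hypothetical sketch, the inputs are assumed to be pre-aligned, equal-length windows, and transfer is measured as a count of informative columns (positions where the two original loci differ) rather than as the base-pair spans reported in the text, which is a simplification.

```python
def exchange_symmetry(site_before: str, donor_before: str,
                      site_after: str, donor_after: str) -> tuple[int, int, str]:
    """Count informative columns moved in each direction and label the event."""
    seqs = (site_before, donor_before, site_after, donor_after)
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("segments must be aligned to equal length")
    into_site = into_donor = 0
    for s0, d0, s1, d1 in zip(*seqs):
        if s0 == d0:
            continue  # column cannot reveal the direction of transfer
        into_site += s1 == d0   # donor base now present in expression site
        into_donor += d1 == s0  # expression-site base now present in donor
    if into_site and into_site == into_donor:
        label = "perfectly reciprocal"
    elif into_site and into_donor:
        label = "asymmetrical reciprocal"
    else:
        label = "not reciprocal"
    return into_site, into_donor, label


# Six informative bases enter the expression site, only two return to the donor.
print(exchange_symmetry("AAAAAAAA", "CCCCCCCC", "ACCCCCCA", "CCCCCAAC"))
```

The printed tuple is `(6, 2, 'asymmetrical reciprocal')`, the same qualitative pattern described for mgpB region G and MgPar 3.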

To complement the observations above, the pattern of genetic exchange between mgpC region KLM and MgPar regions within G37-vKLM provides several additional examples of reciprocal recombination with MgPars 8 and 1. Amplification and sequencing of these two MgPars from G37-vKLM revealed several deviations from the corresponding G-37T sequences. The MgPar 8 sequence within G37-vKLM is altered and contains at least four segments derived from the mgpC expression site, as well as a small section originating from MgPar 1 (Fig. 6F and G). Of note, the G37-vKLM MgPar 1 sequence is also modified by the insertion of a segment that has identity to the G-37T MgPar 8 sequence. A likely explanation for these findings is that, at some point during the propagation and/or isolation of G37-vKLM, a reciprocal recombination event occurred between MgPar 1 (between ∼1660 and 1741 bp) and MgPar 8 (between ∼1547 and 1628 bp) before MgPar 8 sequences were inserted into the mgpC expression site (Fig. 6F and G). Similar to the mgpB region B/MgPar 7 (Fig. 6B and C) and mgpB region EF/MgPar 5 (data not shown) recombination events observed within G37-vB, approximately half of the G37-vKLM recombination events observed between mgpC, MgPar 8 and MgPar 1 involve the perfectly reciprocal exchange of DNA (Fig. 6F). The remaining recombination events identified within G37-vKLM appear to be more uneven, or asymmetrical (Fig. 6F), as observed between mgpB region G and MgPar 3 in the three variant clones (Fig. 6D and E).

Because the MgPar 1 and MgPar 8 sequence data from G37-vKLM suggested that recombination may have taken place between these two donor MgPar regions, we amplified and sequenced all nine MgPar regions from G37-C, G37-vB, G37-vG and G37-vKLM to evaluate the occurrence and extent of such intra-MgPar recombination. This evaluation determined that, other than the MgPar variations described above, the remaining MgPar sequences matched those reported for G-37T (data not shown) with two exceptions. First, approximately 10 bases of MgPar 4 within the G37-vB variant differed from that published for G-37T; yet corresponding changes in any homologous sequences were not detected (data not shown). This single sequence deviation was the only one detected within all M. genitalium single-colony clones that would be supportive of non-reciprocal gene conversion. Second, variation in the number of ‘AGT’ trinucleotide repeats was observed in all four single-colony M. genitalium filter clones, including G37-C. Importantly, other than variation due to number of trinucleotide repeats (Table 1), the G37-C clone contained sequences identical to those published for G-37T in all regions analysed (mgpB/mgpC repeat regions and the nine MgPars; Fig. 6A and data not shown). Changes in the number of ‘AGT’ repeats were observed within mgpB and mgpC, as well as MgPars 2, 6, 8 and 9 (Table 1). Because the trinucleotide repeat regions within mgpB, mgpC, MgPars 2, 8 and 9 are larger, they were identified in our in silico analysis (Figs 1 and 2); however, the smaller ‘AGT’ repeat within MgPar 6 was not. It is important to note that, while MgPar 6 has homology to mgpB, the mgpB ‘AGT’ repeat region lies beyond region EF and, as a result, does not correspond to the trinucleotide repeat in MgPar 6. 
The sequences flanking all changes in trinucleotide repeat number described in Table 1 were unchanged (data not shown), as was also observed for the five in vitro G37-S sequences identified with the 3′ mgpC region KLM amplification described above (data not shown). Again, these data suggest that the number of ‘AGT’ repeats varies by a mechanism independent of MgPar recombination.
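Counting ‘AGT’ repeats and checking that the flanking sequence is unchanged can be sketched as follows. This is an illustrative helper, not the in silico method used for Figs 1 and 2: it finds the longest uninterrupted tandem run of the unit in any of the three phases and returns the sequence on either side of it. Because AGT is a serine codon, each additional in-frame repeat would add one serine residue. The function names are hypothetical.

```python
def longest_run(seq: str, unit: str = "AGT") -> tuple[int, int]:
    """Return (start, n_units) of the longest uninterrupted tandem run of unit.

    Ties keep the run found first (lowest phase, then leftmost);
    returns (-1, 0) when the unit is absent.
    """
    seq, k = seq.upper(), len(unit)
    best_start, best_n = -1, 0
    for phase in range(k):  # check all three reading phases
        run_start, n = -1, 0
        for i in range(phase, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                if n == 0:
                    run_start = i
                n += 1
                if n > best_n:
                    best_start, best_n = run_start, n
            else:
                n = 0
    return best_start, best_n


def flanks(seq: str, unit: str = "AGT") -> tuple[str, str]:
    """Sequence to the left and right of the longest repeat run."""
    seq = seq.upper()
    start, n = longest_run(seq, unit)
    if n == 0:
        return seq, ""
    return seq[:start], seq[start + n * len(unit):]


allele_a, allele_b = "CCAGTAGTAGTGG", "CCAGTAGTGG"
print(longest_run(allele_a)[1], longest_run(allele_b)[1])  # 3 2
print(flanks(allele_a) == flanks(allele_b))  # True: only repeat number differs
```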

Table 1. Variability in the number of ‘AGT’ trinucleotide repeats present in M. genitalium single-colony clones.ᵃ

Genomic location | G-37Tᵇ | G37-C | G37-vB | G37-vG | G37-vKLM
MgPar 2 | 16 | 17 | 16 | 17 | 18
MgPar 6 | 4 | 4 | 5 | 5 | 5
MgPar 8 | 10 | 9 | 9 | 11 | 10
MgPar 9 | 9 | 9 | 10 | 10 | 10

a. MgPar regions not included in the table above do not contain any detectable variations in trinucleotide repeat number.
b. Number of ‘AGT’ repeat sequences published for the M. genitalium G-37T genome.


The architecture of the MgPar regions compared with mgpB and mgpC first suggested the hypothesis that antigenic variation of the MgPa and P110 proteins may enable M. genitalium to alter attachment affinity or to evade the immune system (Dallo and Baseman, 1991; Fraser et al., 1995; Peterson et al., 1995; Iverson-Cabral et al., 2006). Because these MgPar regions contain homologous, yet non-identical and incomplete copies of mgpB and mgpC, they likely represent a pool of archived donor sequences that mediate mgpB and mgpC sequence variation by means of recombination. This hypothesis is supported by our current evaluation of mgpC diversity, as well as our previous investigation of mgpB heterogeneity (Iverson-Cabral et al., 2006), which indicate that these two genes vary within a single strain of M. genitalium both in vitro and in vivo in a pattern consistent with the insertion of MgPar sequences.

Sequence homology between the mgpB, mgpC and MgPar regions suggests that DNA is exchanged through homologous, rather than illegitimate, recombination. Segments of sequence identity between the mgpB or mgpC expression site and the corresponding MgPar regions, which would be required for the homologous exchange of DNA (Watt et al., 1985; Shen and Huang, 1986), are found throughout a given repeat region. These identical sequences flank regions of sequence diversity and could enhance mgpB and mgpC heterogeneity by facilitating recombination with any of the homologous repeat regions. Segments of sequence identity between MgPar regions could also enhance M. genitalium diversity via recombination between donor sites, as illustrated by the exchange of sequences between two MgPars in one of our single-colony cloned M. genitalium variants. While intra-MgPar recombination would not immediately affect mgpB or mgpC diversity, the continual exchange of sequences among the MgPar regions for eventual incorporation into mgpB or mgpC amplifies the overall potential for M. genitalium sequence diversity. The capacity for sequence variation is further enhanced by the observation that multiple MgPars are inserted at numerous positions within a single repeat region in a segmental manner. The combinatorial diversity within mgpB and mgpC that results from multiple, segmental recombination events is illustrated by the complexity and chimeric nature of in vivo mgpB (Iverson-Cabral et al., 2006) and mgpC sequences, which appear to change over time during persistent infection in women. Future studies of M. genitalium variation and recombination would ideally use an animal model in which a known variant could be inoculated and its diversification followed over the course of infection; however, the most successful M. genitalium animal model has been the chimpanzee (Møller et al., 1985), which is currently unavailable.
Notwithstanding the absence of such experimental tools, the homology between the mgpB or mgpC expression site and MgPar regions, in addition to our sequence analyses, indicates that M. genitalium has the potential capacity to create a seemingly unlimited number of mgpB and mgpC variants, despite a small genome, and that in the context of a persistent female infection, variant sequences evolve over time.

The isolation of single-colony cloned M. genitalium variants with alternative mgpB and/or mgpC sequences containing inserted MgPar sequences allowed us to demonstrate that virtually all of the recombination events observed involved sequence changes within the mgpB or mgpC expression site that were accompanied by corresponding changes in the putative MgPar donor region, consistent with reciprocal recombination. In the majority of these reciprocal recombination events, the DNA sequences involved were exchanged in a perfectly reciprocal manner, while in the remaining events, sequences appeared to be exchanged in a partially reciprocal or asymmetrical manner. The asymmetrical recombination events (part reciprocal, part non-reciprocal) within our G37-S variant clones may be a consequence of single-strand invasion, strand displacement, strand migration and/or repair of the resulting heteroduplexes during or after the recombination event (Snyder and Champness, 2003). Such asymmetric outcomes are not uncommon following homologous recombination (Sharples et al., 1999). Strikingly, in the four single-colony filter-cloned M. genitalium isolates examined, there was only one instance in which a sequence modification (i.e. ∼10 bases within MgPar 4 of G37-vB) was not accompanied by corresponding changes in the mgpB or mgpC expression site or remaining MgPar sequences. This was the only example that would be supportive of non-reciprocal recombination or gene conversion. All other mgpB and mgpC sequence changes detected in these variants had analogous changes within the donor MgPar sites, providing compelling evidence that mgpB and mgpC diversity in this organism is achieved through the reciprocal exchange of DNA.

The reciprocal nature of recombination observed distinguishes M. genitalium from other bacterial species that undergo antigenic variation by non-reciprocal gene conversion, in which the recipient sequence changes while the donor sites remain the same. Much like the mgpB/mgpC and MgPar system described here, a repertoire of variant pilin sequences in Neisseria gonorrhoeae is generated as silent, alternative pilS sequences recombine into the single pilE expression site (Nassif and So, 1995; Nassif et al., 1999). In contrast to M. genitalium, however, the pilS donor sequences in N. gonorrhoeae remain unchanged following recombination (Haas and Meyer, 1986; Zhang et al., 1992). Gene conversion has also been described for several systems of recombinatorial gene variation in other organisms, including tprK sequence diversity in the causative agent of syphilis, Treponema pallidum (Centurion-Lara et al., 2004), vlsE heterogeneity in the Lyme disease spirochete Borrelia burgdorferi (Zhang and Norris, 1998), msp2 variation in the rickettsial pathogen Anaplasma marginale (Brayton et al., 2002), and vlhA diversity in the poultry pathogen Mycoplasma synoviae (Noormohammadi et al., 2000). The reason for the difference between reciprocal recombination in M. genitalium and non-reciprocal recombination in these other organisms is unclear.

It is remarkable that M. genitalium is able to achieve such recombinatorial diversity in spite of its limited genome, which may lack many of the recombination genes described for other organisms. Annotation of the G-37T genome identified homologues of the Escherichia coli basic recombination proteins; however, homologues of the recombination initiation enzymes are apparently missing (Fraser et al., 1995; Carvalho et al., 2005; Rocha et al., 2005), suggesting that this organism contains unique recombination genes or genes encoding recombination enzymes that are not recognized by current methods of genomic annotation. Interestingly, as stated previously (Iverson-Cabral et al., 2006), the mgpA gene (MG_190 in the TIGR database), found immediately upstream of mgpB and mgpC within the MgPa operon (Inamine et al., 1989), contains a phosphoesterase motif with homology to the RecJ presynaptic recombination protein of E. coli (Clark and Sandler, 1994; Aravind and Koonin, 1998a, b). It is tempting to speculate that mgpA is located 5′ of mgpB and mgpC in the MgPa operon to promote recombination within these genes. The full set of enzymes required for homologous recombination in M. genitalium and how they are regulated, as well as the rate of mgpB and mgpC variation, are all avenues for future study.

Our analyses of sequence diversity within mgpB (Iverson-Cabral et al., 2006) and mgpC predict that recombination with the MgPar regions most often contributes to corresponding full-length MgPa and P110 amino acid sequence diversity, rather than premature truncation resulting from the insertion of stop codons. Indeed, in more than 200 variant mgpB and mgpC sequences analysed from in vitro and in vivo sources, only one was predicted to contain in-frame stop codons (Iverson-Cabral et al., 2006). M. genitalium MgPa and P110 phase variants have been reported, but they have been described for spontaneously arising mutants that were selected by their non-adherent phenotype (Mernaugh et al., 1993). Burgos et al. (2006) characterized these mutants and demonstrated that the lack of MgPa and/or P110 expression was the result of large genomic deletions caused by a single cross-over between the mgpB or mgpC expression site and an adjacent MgPar that resulted in the elimination of intervening sequences. It has not been determined whether this phenomenon occurs in vivo, and if so, how often such recombination events take place. Considering that for many Mycoplasma species, attachment is essential for successful colonization and infection (Baseman and Tully, 1997; Razin, 1999), and that for M. genitalium the expression of full-length, functional MgPa and P110 proteins is required for attachment (Burgos et al., 2006), we predict that such MgPa and P110 phase variants would have a selective disadvantage for M. genitalium survival in vivo. Alternatively, it is possible that the reversible loss of these immunogenic proteins could enhance survival in certain anatomical site(s) and/or aid in the dispersal of M. genitalium from the genital tract.

It is currently unknown how the mgpB (Iverson-Cabral et al., 2006) and mgpC sequence heterogeneity observed in persistently infected women influences expression of the MgPa and P110 proteins. The structural, functional and/or antigenic epitopes within these two proteins have not yet been defined. Exhaustive studies mapping antibody-accessible epitopes within these proteins have not been performed, and reports concerning the location of MgPa adherence epitopes are conflicting (Opitz and Jacobs, 1992; Svenstrup et al., 2002). Considering that both MgPa and P110 induce a humoral response in chimpanzees, men with acute urethritis and women with tubal-factor infertility (Morrison-Plummer et al., 1987; Clausen et al., 2001; Svenstrup et al., 2006), as well as in women with lower tract M. genitalium infections (S.L. Iverson-Cabral and P.A. Totten, unpublished data), one obvious hypothesis is that the variable mgpB and mgpC repeat regions encode antibody-accessible antigenic epitopes and that antigenic variation facilitates immune evasion. This hypothesis is consistent with the temporal shifts observed within the mgpB (Iverson-Cabral et al., 2006) and mgpC sequences identified in persistently infected women. Additionally, sequence variation within these proteins of the attachment organelle may influence the specificity of host cell receptor binding, tissue tropism and disease manifestation, as observed with N. gonorrhoeae infection (Rudel et al., 1992; Jonsson et al., 1994; Chen et al., 1997; Gray-Owen et al., 1997). Such changes in attachment could be mediated by the mgpB and mgpC sequence variation that arises from MgPar recombination, as well as by changes in the number of ‘AGT’ repeat sequences, and therefore in the number of serine residues within either protein.

Mycoplasmas are a unique group of organisms that are able to present a highly variable and versatile surface architecture to the immune system despite their characteristically small genomes and apparent absence of sophisticated regulatory mechanisms (Himmelreich et al., 1997). Diversification of the immunodominant MgPa and P110 proteins in M. genitalium, achieved by reciprocal recombination of mgpB or mgpC with alternative archived MgPar sequences, provides yet another possible mechanism for long-term Mycoplasma infection. Clearly, much remains to be learned about M. genitalium pathogenesis, and future studies are needed to investigate antibody-mediated selection of mgpB and mgpC variants, to define the structural, functional and antigenic epitopes of MgPa and P110, and to establish how protein variation influences the engagement of host cell receptors. Additionally, despite the limited recombination machinery present in the M. genitalium genome, we have demonstrated a system for homologous reciprocal recombination, which could lead to studies that provide insight into novel bacterial recombination pathways.

Experimental procedures

Mycoplasma genitalium media, growth conditions and strains

Mycoplasma genitalium cultures were propagated in H broth [0.02 g ml−1 soy peptone, 10% yeast dialysate, 20% horse serum, 0.005 g ml−1 sodium chloride, 5 mM glucose, 0.002% phenol red, and 200 units ml−1 penicillin G, pH 7.3 (Kenny, 1985)] at 37°C in 75 cm² cell culture flasks (Corning, Corning, NY) until the medium changed colour from red to orange, indicating growth of the organism through acid production. To obtain colonies of M. genitalium, cultures were inoculated onto 60 × 15 mm Falcon plates (Becton Dickinson, Franklin Lakes, NJ) of H-agar [H broth with 0.01 g ml−1 agarose, 0.004% phenol red instead of 0.002%, and pH 7.3 (Kenny, 1985)]. Inoculated plates were incubated at 37°C until the agar changed from red to orange, and small colonies with the typical M. genitalium fried-egg morphology were identified using an inverted microscope at 40× magnification.
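Scaling the H broth and H-agar recipes to a working volume is simple arithmetic on the concentrations listed above. The sketch below is illustrative only: it assumes that the yeast dialysate and horse serum percentages are v/v, that the phenol red percentage is w/v (g per 100 ml), and that glucose is weighed as the anhydrous form (180.16 g mol−1); the function name is hypothetical.

```python
GLUCOSE_MOLAR_MASS = 180.16  # g/mol, anhydrous D-glucose (assumed form)


def h_medium(volume_ml: float, agar: bool = False) -> dict[str, float]:
    """Scale the H broth (or H-agar) recipe to a final volume in ml."""
    phenol_red_pct = 0.004 if agar else 0.002  # % w/v, per the recipe
    recipe = {
        "soy peptone (g)": 0.02 * volume_ml,
        "yeast dialysate (ml)": 0.10 * volume_ml,  # 10% v/v
        "horse serum (ml)": 0.20 * volume_ml,      # 20% v/v
        "sodium chloride (g)": 0.005 * volume_ml,
        # 5 mM = 0.005 mol per litre of final medium
        "glucose (g)": 0.005 * (volume_ml / 1000) * GLUCOSE_MOLAR_MASS,
        "phenol red (g)": phenol_red_pct / 100 * volume_ml,
        "penicillin G (units)": 200 * volume_ml,
    }
    if agar:
        recipe["agarose (g)"] = 0.01 * volume_ml
    return recipe


for component, amount in h_medium(100).items():
    print(f"{component}: {amount:g}")
```

For 100 ml of H broth this gives 2 g soy peptone, 20 ml horse serum and roughly 0.09 g glucose; adjusting the pH to 7.3 is not modelled.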

The M. genitalium type strain G-37T, from which the published genome sequence was derived [GenBank accession number NC_000908 (Fraser et al., 1995)], was obtained from the American Type Culture Collection (ATCC 33530) and maintained in the laboratory of George Kenny (University of Washington, Seattle). This G-37T strain was single-colony cloned using the standard threefold filtration procedure before it was deposited in ATCC (Tully et al., 1983 and J. G. Tully and D. Taylor-Robinson, pers. comm.). We designated the stock culture of M. genitalium G-37T maintained and cultured in Seattle as G37-S to distinguish it from G-37T, the type strain maintained at ATCC, and from the cloned populations described below. Multiple 1 ml aliquots of G37-S were frozen at −80°C, and unless indicated otherwise, a 1 ml frozen aliquot was used to inoculate 100 ml of H broth. DNA was isolated from this culture using the Epicentre MasterPure DNA purification kit according to the manufacturer's directions (Epicentre, Madison, WI).

Amplification, cloning, sequencing and in silico analysis of mgpC

Designated regions of the mgpC gene were amplified from M. genitalium cultures using two separate, overlapping reactions with primers C1-F/C1-R or primers C2-F/C2-R as described in Table 2 and shown schematically in Fig. 3A. Primers C1-F and C2-R each contain sequences identical to unique regions of mgpC to ensure that only sequences within the expression site would be amplified, while primers C1-R and C2-F contain sequences common to mgpC and the homologous MgPars, ensuring that mgpC sequences would be amplified regardless of whether any MgPar regions had recombined into the expression site. For simplicity and consistency throughout this manuscript, all base-pair locations for mgpC are designated relative to the presumed translational start of the gene rather than to the base-pair location within the MgPa operon [containing mgpA, mgpB and mgpC, GenBank accession number M31431 (Inamine et al., 1989)]. Amplification reactions contained 1× PCR buffer (Promega, Madison, WI), the MgCl2 concentration indicated in Table 2, 0.1 μM of each of the forward and reverse primers, 200 μmol l−1 of each deoxynucleoside triphosphate, 1 U Taq polymerase (Promega), 10³ genome equivalents of purified M. genitalium DNA, and sterile, nuclease-free distilled water to a final volume of 100 μl. Each reaction was overlaid with 50 μl of mineral oil and cycled under the following conditions using a Perkin Elmer DNA thermal cycler (model 480): 4 min at 94°C, followed by 35 cycles of 1 min at 94°C, 1 min at the annealing temperature designated in Table 2, and 1 min at 72°C, with a final 10 min elongation step at 72°C. All G37-S PCR reactions were performed using DNA from the same purified sample, and all PCR protocols followed standard precautions to prevent PCR contamination.
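The reaction composition above follows from the dilution relation C1V1 = C2V2. As a worked sketch, the function below computes per-reaction volumes for a 100 μl reaction. The stock concentrations (10× buffer, 25 mM MgCl2, 10 μM primers, 10 mM of each dNTP, 5 U μl−1 Taq) and the 5 μl template volume are hypothetical placeholders, not values from this protocol, and the buffer is assumed to be MgCl2-free so that all Mg2+ comes from the added stock.

```python
def reaction_volumes(mgcl2_mM: float, final_ul: float = 100.0,
                     template_ul: float = 5.0) -> dict[str, float]:
    """Per-reaction volumes (µl) from C1*V1 = C2*V2.

    Stock concentrations are hypothetical placeholders.
    """
    def v(final_conc: float, stock_conc: float) -> float:
        # V1 = C2 * V2 / C1; both concentrations must share units
        return final_conc * final_ul / stock_conc

    vols = {
        "10x PCR buffer": v(1, 10),
        "MgCl2 (25 mM stock)": v(mgcl2_mM, 25),
        "forward primer (10 uM stock)": v(0.1, 10),
        "reverse primer (10 uM stock)": v(0.1, 10),
        "dNTPs (10 mM each stock)": v(200, 10_000),  # both expressed in uM
        "Taq (5 U/ul stock)": 1 / 5,                 # 1 U per reaction
        "template DNA": template_ul,
    }
    vols["water"] = final_ul - sum(vols.values())
    if vols["water"] < 0:
        raise ValueError("components exceed the final volume")
    return vols


# Example: a reaction requiring 2.5 mM MgCl2 (value taken from Table 2).
for name, ul in reaction_volumes(2.5).items():
    print(f"{name}: {ul:.1f} µl")
```

Changing only `mgcl2_mM` adapts the mix to each primer pair, since the MgCl2 concentration is the parameter that varies between assays in Table 2.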

Table 2. PCR primers used in this study.

Region amplified | Primer name | Primer sequence | Location in mgpB, mgpC or in the G-37T genome (bp)ᵃ | Annealing temperature (°C) | MgCl2 concentration (mM)
5′ end of mgpC region KLM | C1-Fᵇ | AATGTTACTGCTTACACCCC | 106–125 | 55 | 2.5
5′ end of mgpC region KLM using MgPar 7-anchored assay | C1-Fᵇ | AATGTTACTGCTTACACCCC | 106–125 | 55 | 3.5
 | TW10-Rᶜ | TTCAGACCAATGGAGTTACG | 563–581 | |
3′ end of mgpC region KLM | C2-Fᵇ | AGCAGTCACCAACAACCACA | 906–925 | 55 | 1.5
Internal portion of mgpB with ‘AGT’ repeatᵈ | mgpB AGT-F | ATCAGTTCTTAGACTTTCTCCCC | 2 285–2 307 | 55 | 1.6
MgPar 1ᵉ | Par1-F | ATTTTGGCTTTTGGTATTATTG | 85 352–85 373 | 50 | 2.3
MgPar 3ᵉ | Par3-F | TAATACTGAGAATAGGTAAGCAC | 174 766–174 788 | 57 | 3.0
MgPar 4ᵉ | Par4-F | AATACTAAAGTCCAATGGGAAGAA | 213 232–213 255 | 50 | 3.0
MgPar 5ᵉ | Par5-F | GGCACTTACTGAGTGAAGAGAT | 229 223–229 244 | 50 | 3.0
 | Par5-R | CACGCTTTTGCTGTGTTTA | 231 780–231 798 | |
MgPar 6ᵉ | Par6-F | TGTTGTTTGTACCACTTTTCCC | 273 161–273 182 | 57 | 2.0
MgPar 7ᵉ | Par7-F | TTCCAACAAGCTGCTAACAA | 312 632–312 651 | 50 | 2.3
MgPar 8ᵉ (Rxn 1) | Par8-F | TCAAAATAGAGTGTTGTGGGTCG | 349 160–349 183 | 50 | 2.2
MgPar 8ᵉ (Rxn 2) | Seq1-F | GGATGGGGGCGACTTAGAC | 349 661–349 679 | 50 | 2.2
MgPar 9ᵉ | Par9-F | CTTTTTGAGCCATCTTTCATTAC | 428 100–428 122 | 50 | 3.0

a. The location of each mgpB or mgpC primer is numbered so that the first base-pair represents the predicted translational start for each gene; the location of each MgPar primer is numbered according to its location in the G-37T genome.
b. The locations of each mgpC primer relative to repeat region KLM are also shown schematically in Fig. 3A.
c. Primer TW10-R was designed based on the sequence for MgPar 7 (bp 1729–1478; see also Fig. S1), which has little identity to sequences within mgpC of G-37T.
d. These primers were used in a reaction containing 45 µl Invitrogen Platinum® PCR Supermix (Carlsbad, CA), 0.2 µM of each primer, the MgCl2 concentration listed, and 5 µl template DNA using the same cycling conditions discussed for mgpC PCR reactions.
e.

Polymerase chain reaction (PCR) products were purified using the MinElute PCR Purification Kit (Qiagen, Valencia, CA) and cloned into the pCR®2.1-Topo® vector (Invitrogen, Carlsbad, CA) following the manufacturer's standard protocols. Ligation reactions were transformed into TOP10-competent E. coli (Invitrogen), which were plated onto Luria agar (Miller, 1972) containing 100 μg ml−1 ampicillin and 40 μg ml−1 X-gal. Well-isolated white colonies from each transformation were inoculated into 2.5 ml of Luria broth (Miller, 1972) containing 100 μg ml−1 ampicillin and incubated overnight at 37°C with shaking. Plasmids were isolated using the QIAprep Spin Miniprep kit (Qiagen), digested with EcoRI, and analysed by agarose gel electrophoresis to confirm insertion of the PCR product into the cloning vector. Plasmids containing inserts of the appropriate size were sequenced in the forward and reverse directions at the University of Washington Biochemistry DNA Sequencing Facility using the M13F or M13R primers provided in the Topo cloning kit (Invitrogen). The clustalw program was used to align the resulting sequences, which were manually adjusted to correct sequencing and/or alignment errors. For each mgpC PCR assay, five amplifications were performed, and 25 E. coli plasmids (five per PCR reaction) were sequenced after cloning. The mgpC sequences were also translated in silico with the European Bioinformatics Institute (EMBL-EBI) EMBOSS Transeq program, using both the mycoplasma genetic code [in which UGA encodes tryptophan instead of a translational stop (Inamine et al., 1990)] and the appropriate reading frame. The predicted amino acid sequences were then aligned using clustalw and examined for amino acid variability and for the occurrence of potential stop codons.
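The effect of the mycoplasma genetic code on stop-codon calls can be shown with a short translation sketch. This is not EMBOSS Transeq; it is a self-contained illustration that builds the standard codon table and then reassigns TGA (UGA) to tryptophan, which is the difference stated above for the mycoplasma code (NCBI translation table 4). Function and variable names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD = (
    "FFLLSSSSYY**CC*W"  # T__
    "LLLLPPPPHHQQRRRR"  # C__
    "IIIMTTTTNNKKSSRR"  # A__
    "VVVVAAAADDEEGGGG"  # G__
)
# Mycoplasma code: TGA (index 14) reads as Trp instead of stop.
MYCOPLASMA = STANDARD[:14] + "W" + STANDARD[15:]


def codon_table(amino_acids: str) -> dict[str, str]:
    codons = (a + b + c for a in BASES for b in BASES for c in BASES)
    return dict(zip(codons, amino_acids))


def translate(dna: str, frame: int = 0, code: str = MYCOPLASMA) -> str:
    """Translate one reading frame; ambiguous codons become 'X'."""
    table = codon_table(code)
    seq = dna.upper().replace("U", "T")
    return "".join(table.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def stop_positions(protein: str) -> list[int]:
    """1-based positions of in-frame stop codons."""
    return [i + 1 for i, aa in enumerate(protein) if aa == "*"]


print(translate("ATGTGATAA"))                 # MW*  (mycoplasma code)
print(translate("ATGTGATAA", code=STANDARD))  # M**  (standard code)
```

Translating with the standard code would wrongly report a premature stop at every TGA, which is why the mycoplasma code matters when screening variants for truncations. Biopython's `Seq.translate` accepts `table=4` for the same purpose.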
Because alternative mgpC sequences are flanked by regions of sequence identity, each region of variation was defined as the stretch extending from the first to the last divergent nucleotide.
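The rule for delimiting a region of variation can be expressed as a short Python sketch. It assumes that the reference and alternative sequences have already been aligned (for example, with clustalw) to the same length; positions are reported as 1-based alignment columns rather than genome coordinates, and the function name is hypothetical.

```python
def variable_region(reference, variant):
    """Return 1-based (first, last) alignment columns where two aligned sequences differ.

    Gap characters ('-') are compared like bases, so indels count as divergent
    positions. Returns None when the two sequences are identical.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must come from the same alignment (equal length)")
    divergent = [
        i
        for i, (r, v) in enumerate(zip(reference.upper(), variant.upper()), start=1)
        if r != v
    ]
    if not divergent:
        return None
    # Identical flanks are excluded; everything between the outermost
    # differences is treated as one region of variation.
    return divergent[0], divergent[-1]
```

Note that any identical positions lying between the first and last differences are included in the region, which matches a definition based only on the outermost divergent nucleotides.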

Analysis of clinical specimens

Cervical samples collected from a woman (participant #10090) who was persistently infected for 19 months were used to analyse mgpC heterogeneity in vivo. These samples were collected as part of a longitudinal study of a cohort of 300 female commercial sex workers in Nairobi, Kenya, who were followed for up to 3 years to assess correlates of sexually transmitted infections (Cohen et al., 2007), and were tested for M. genitalium using our standard diagnostic PCR assay (Dutro et al., 2003; Cohen et al., 2007). Because viable strains were not available for these retrospectively analysed specimens, we analysed mgpC directly in five samples from participant #10090, amplifying the M. genitalium DNA present with primers C1-F and C1-R (Table 2) and following the same protocols described above for in vitro cultures. Because sample volumes were limited, and because we wanted to determine whether genetic variation is a common feature of M. genitalium infection, we did not examine mgpC diversity in the cervical samples from participant #10139, the persistently infected woman who was the focus of our previous mgpB study (Iverson-Cabral et al., 2006). For participant #10090, again because of limited sample volume, one PCR was performed per time point using 6 μl of purified DNA (equivalent to 30 μl of the original cervical sample) in a final volume of 50 μl, and 10 transformed E. coli clones were examined per amplification. The specimens from participant #10090 were also analysed with the strain-typing method developed by Jensen et al. (2004; Hjorth et al., 2006), as described previously (Iverson-Cabral et al., 2006).
Briefly, the semiconserved 5′ region of mgpB was amplified directly from patient specimens using mgpB primers 1F and 1R (Iverson-Cabral et al., 2006); because of limited sample, only one reaction was performed per time point, and five transformed E. coli clones were sequenced per PCR.
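Because the clones from each time point were compared for heterogeneity, one simple way to summarize them is to collapse identical clone sequences and count how often each occurs. The Python sketch below assumes that every clone has already been trimmed to the same amplified region; the function names and the toy sequences are hypothetical and are not data from this study.

```python
from collections import Counter


def tally_haplotypes(clone_seqs):
    """Collapse clone sequences into distinct haplotypes, most frequent first.

    Case and alignment gaps ('-') are ignored so that identical reads compare equal.
    """
    normalized = (s.upper().replace("-", "") for s in clone_seqs)
    return Counter(normalized).most_common()


def is_homogeneous(clone_seqs):
    """True when every clone carries the same sequence."""
    return len(tally_haplotypes(clone_seqs)) == 1


# Hypothetical toy reads for two time points (illustration only, not study data).
clones_by_visit = {
    "visit 1": ["ACGTTG", "ACGTTG", "ACGATG"],
    "visit 2": ["ACGTTG"] * 3,
}
for visit, clones in clones_by_visit.items():
    print(visit, tally_haplotypes(clones), is_homogeneous(clones))
```

Exact string matching treats a single sequencing error as a new haplotype, so in practice alternative sequences would still need the manual review of sequencing errors described above.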

Isolation of variant mgpB and mgpC M. genitalium G37-S populations

We isolated derivatives of M. genitalium G37-S that contained alternative mgpB or mgpC expression-site sequences. To do this, 100 well-isolated M. genitalium colonies on H-agar were picked and inoculated into H broth. Following a pH shift in the medium, DNA was isolated using the Viral RNA Mini Spin kit (QIAGEN). To screen for colonies with variant sequences in mgpB regions B or G, these repeat regions were amplified from the 100 isolates using primers 3F/3R (region B) and 5F/5R (region G) (Iverson-Cabral et al., 2006) and digested with PflFI (region B) or PflMI (region G). To identify M. genitalium variants with a divergent mgpC sequence, DNA from the 100 isolated colonies was PCR amplified with primers C1-F and C1-R (Table 2), and the products were digested with MscI. In all restriction digest screens, colonies whose digest pattern differed from that predicted for the G-37T mgpB and mgpC sequences were used to inoculate fresh H broth; the cultures were passed through a 0.45 μm syringe filter (Corning, Corning, NY) and plated onto H-agar. This single-colony filter-cloning process was repeated at least three times for each M. genitalium variant. The resulting clones were designated G37-vB, G37-vG or G37-vKLM according to whether the restriction digest screen had selected them for an alternative mgpB region B, mgpB region G or mgpC region KLM sequence respectively.
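The screen depends on comparing observed digestion patterns with those predicted from the G-37T sequences. A minimal Python sketch of that prediction follows, assuming a linear PCR product, complete digestion and exact matching of each recognition site. The enzyme table is transcribed from standard recognition-site listings (such as REBASE) rather than from this study and should be confirmed before use; the function names are hypothetical.

```python
import re

# Recognition site and top-strand cut offset (bases from the start of the site).
# All three sites are palindromic, so scanning the top strand finds every site.
ENZYMES = {
    "MscI": ("TGGCCA", 3),        # TGG^CCA
    "PflMI": ("CCANNNNNTGG", 7),  # CCANNNN^NTGG
    "PflFI": ("GACNNNGTC", 4),    # GACN^NNGTC
}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def cut_positions(seq, enzyme):
    """Return 0-based top-strand cut positions for `enzyme` in a linear sequence."""
    site, offset = ENZYMES[enzyme]
    # A zero-width lookahead lets finditer report overlapping sites as well.
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in site) + ")")
    return [m.start() + offset for m in pattern.finditer(seq.upper())]


def fragment_sizes(seq, enzyme):
    """Predicted fragment lengths (bp) after complete digestion of a linear product."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

Running `fragment_sizes` on the G-37T amplicon and on a candidate variant sequence would show whether a sequence change creates or destroys a site, which is the kind of difference a gel-based screen can detect; changes that leave every site intact would pass such a screen unnoticed.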

Restriction digestion served only to screen colonies during single-colony cloning; the resulting variants G37-vB, G37-vG and G37-vKLM were then defined more precisely by sequence analysis. Following the final single-colony passage, DNA was isolated from these M. genitalium clones using the Viral RNA Mini Spin Kit (QIAGEN), and all potentially variable regions of mgpB and mgpC (mgpB regions B, EF and G, and mgpC region KLM) were amplified and sequenced using primers 3F/3R, 4F/4R and 5F/5R, described previously (Iverson-Cabral et al., 2006), and C1-F/C1-R and C2-F/C2-R (Table 2). In addition, primers mgpB AGT-F and mgpB AGT-R (Table 2) were used to determine the number of ‘AGT’ repeats present within the mgpB gene of these clones. To determine whether each variant clone had a homogeneous sequence in its repeat region of interest (for example, mgpB region B for clone G37-vB), the PCR product for that region was ligated into the pCR®2.1-Topo® vector (Invitrogen) as described above, and 10 E. coli clones were sequenced. The same single-colony screening and purification scheme was used to isolate an M. genitalium clone carrying the published G-37T sequence for mgpB and mgpC; this control clone, designated G37-C, is identical to G-37T in all potentially variable regions (mgpB regions B, EF and G, and mgpC region KLM). To verify that the original G37-S stock and the single-colony clones had not been contaminated with other M. genitalium strains, the strain-typing method developed by Jensen et al. (2004; Hjorth et al., 2006) was performed as described previously (Iverson-Cabral et al., 2006).
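If the AGT-F/AGT-R amplicon is sequenced, the number of ‘AGT’ repeats can be read directly from the sequence. The Python sketch below is an illustrative helper with a hypothetical name; it assumes a perfect tandem array, so an interrupted repeat would be reported as its longest uninterrupted stretch.

```python
import re


def max_tandem_copies(seq, unit="AGT"):
    """Return the largest number of consecutive copies of `unit` found in `seq`.

    For a unit such as AGT that cannot overlap itself, greedy non-overlapping
    matching finds every maximal run, so the longest match is the longest repeat.
    """
    pattern = re.compile("(?:" + re.escape(unit.upper()) + ")+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)
```

For repeat units that can overlap themselves (for example, mononucleotide runs), this greedy approach would need to be revisited, but it is adequate for a trinucleotide like AGT.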

Amplification and sequencing of MgPar regions from M. genitalium cultures

Each of the nine MgPar donor sequences was amplified using the primers listed in Table 2 at a concentration of 0.8 μM, 45 μl of Invitrogen's Platinum® PCR Supermix (Carlsbad, CA), the MgCl2 concentration indicated in Table 2 and 6 μl of template DNA; each reaction was overlaid with 50 μl of sterile, nuclease-free mineral oil. Amplification was performed in a Perkin Elmer DNA thermal cycler (model 480) with the following cycling conditions: 5 min at 94°C; 35 cycles of 1 min at 94°C, 1 min at the annealing temperature designated in Table 2 and extension at 72°C for 7 min (MgPars 1, 2, 4, 5, 7 and 8) or 1 min (MgPars 3 and 6); and a final incubation at 72°C for 10 min. All MgPar regions were amplified in a single reaction except MgPar 8, which was amplified in two overlapping reactions (Table 2). The PCR products were purified with the MinElute PCR Purification Kit (QIAGEN) and sequenced directly by the University of Washington Biochemistry DNA Sequencing Facility. Because each MgPar is 1–3 kb long, the complete sequence of each region was obtained from multiple overlapping sequencing reactions using the MgPar forward and reverse primers as well as some of the primers listed in the footnote to Table 2.
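Whether a set of overlapping sequencing reactions spans an entire MgPar amplicon can be checked by merging the read intervals and listing any uncovered stretches. The Python sketch below assumes that read positions are already expressed as 1-based, inclusive coordinates within the amplicon and that every read lies inside it; the function names are hypothetical, and the read coordinates used to exercise it are invented rather than taken from this study.

```python
def merge_intervals(reads):
    """Merge (start, end) read intervals (1-based, inclusive) into contiguous blocks."""
    merged = []
    for start, end in sorted(reads):
        # Overlapping or directly adjacent reads extend the current block.
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]


def coverage_gaps(length, reads):
    """Return uncovered (start, end) stretches of an amplicon spanning 1..length."""
    gaps, next_uncovered = [], 1
    for start, end in merge_intervals(reads):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= length:
        gaps.append((next_uncovered, length))
    return gaps
```

An empty result means the reads tile the whole region. Because directly adjacent reads are counted as contiguous, a stricter check that requires a minimum overlap between neighbouring reads would be a natural refinement.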

Data deposition

All sequence data presented in this study have been submitted to the GenBank database and are available, as indicated in the relevant figures, under accession numbers EF458019 to EF458046.


This work was supported by NIH R01 AI/HD48634, a University of Washington Royalty Research Fund Award and a University of Washington Provost Bridge Award. S.L.I.-C. is supported by the STD/AIDS Research Training Grant NIH/NIAID T32 AI007140. We thank Eduardo Rocha for access to, and assistance with, the in silico algorithm used to examine mgpC repetition. We also thank Eduardo Rocha, Arturo Centurion-Lara and Gwen Wood for critical reading of this manuscript, George Kenny for assistance with M. genitalium growth and Sheila Lukehart for scientific discussion and suggestions.