Repeat-associated phase variable genes in the complete genome sequence of Neisseria meningitidis strain MC58

Abstract

Phase variation, mediated through variation in the length of simple sequence repeats, is recognized as an important mechanism for controlling the expression of factors involved in bacterial virulence. Phase variation is associated with most of the currently recognized virulence determinants of Neisseria meningitidis. Based upon the complete genome sequence of the N. meningitidis serogroup B strain MC58, we have identified tracts of potentially unstable simple sequence repeats and determined their potential functional significance on the basis of sequence context. Of the 65 potentially phase variable genes identified, only 13 were previously recognized. Comparison with the sequences from the other two pathogenic Neisseria sequencing projects shows differences in the length of the repeats in 36 of the 65 genes identified, including 25 of those not previously known to be phase variable. Six genes that did not have differences in the length of the repeat instead had polymorphisms such that the gene would not be expected to be phase variable in at least one of the other strains. A further 12 candidates did not have homologues in either of the other two genome sequences. The large proportion of these genes that are associated with frameshifts and with differences in repeat length between the neisserial genome sequences is further corroborative evidence that they are phase variable. The number of potentially phase variable genes is substantially greater than for any other species studied to date, and would allow N. meningitidis to generate a very large repertoire of phenotypes through expression of these genes in different combinations. 
Novel phase variable candidates identified in the strain MC58 genome sequence include a spectrum of genes encoding glycosyltransferases, toxin related products, and metabolic activities as well as several restriction/modification and bacteriocin-related genes and a number of open reading frames (ORFs) for which the function is currently unknown. This suggests that the potential role of phase variation in mediating bacterium–host interactions is much greater than has been appreciated to date. Analysis of the distribution of homopolymeric tract lengths indicates that this species has sequence-specific mutational biases that favour the instability of sequences associated with phase variation.

Introduction

Pathogenic Neisseria species are responsible for causing bacterial meningitis and gonorrhoea. Their surface structures have been extensively studied and display substantial intrastrain diversification, providing adaptability to different microenvironments, stages of colonization, and immune responses within the human host. In Neisseria, phase variation is a mechanism for generating intrastrain diversification. Phase variation is associated with reversible mutations in simple DNA repeats (in which each identical repeated motif is less than 10 nucleotides). These are located either within open reading frames (ORFs), such that alteration in the length of the repeat alters the translational reading frame, or within promoters, where they affect the relative position of flanking promoter components and influence transcription. A majority of the recognized bacterial surface structures of meningococci which interact with the host are phase variable. This is consistent with a model in which phase variation is a feature of genes, described as contingency genes, which facilitate adaptation of bacterial populations to changing environments and are frequently important in bacterial virulence (Moxon et al., 1994). 
Phase variable structures in Neisseria include: capsule, which confers serum resistance and affects cell interactions (DeVoe, 1982; Virji et al., 1992; 1993a,b; Stephens et al., 1993; Hammerschmidt et al., 1994; 1996a), pili and pilus modifications, which affect adhesion (Stephens and McGee, 1981; Virji et al., 1991, 1993b; Rudel et al., 1992; 1995; Kupsch et al., 1993; McNeil et al., 1994; Nassif et al., 1994; Waldbeser et al., 1994; Jennings et al., 1998; Weiser et al., 1998), several surface proteins including Opas, Opc, PorA and iron binding proteins which have roles in adhesion, formation of surface pores and nutrient acquisition (Poolman et al., 1980; Stern et al., 1984; 1986; Sparling et al., 1986; Achtman et al., 1988; Tommassen et al., 1990; Bhat et al., 1991; Virji et al., 1992; 1993a; Hopman et al., 1994; Sarkari et al., 1994; van der Ende et al., 1995; Chen et al., 1996; 1998; Lewis et al., 1999; Schryvers and Stojiljkovic, 1999), and lipopolysaccharide (LPS) (Apicella et al., 1987; Schneider et al., 1988; 1991; Weel et al., 1989; van Putten and Robertson, 1995; Jennings et al., 1999). An increased rate of phase variation, associated with the loss of the Dam methylase, has been associated with some invasive disease isolates (Bucci et al., 1999). The variability associated with DNA repeats, the length of which is critical in expression, makes these repeats markers for phase variable genes likely to be involved in host adaptation.

Results and discussion

The repeats that were identified as likely to affect the expression of the associated ORFs are listed in Table 1. The most striking finding is how many putative phase variable genes are present in this genome sequence. Of the 65 potential phase variable genes identified in this analysis, 13 have been previously described as phase variable (indicated by K in Table 1). Of the remainder, 31 are strong and 21 are less strong candidates (indicated by S and M, respectively, in Table 1) – see below. This number contrasts with a total of 15 potential phase variable genes identified in the genome sequence of Haemophilus influenzae (Hood et al., 1996; van Belkum et al., 1997a), 26 in Helicobacter pylori (Saunders et al., 1998), and 42 (of which 22 are strong candidates) in Treponema pallidum (N.J.S., unpublished results).

Table 1. Repeat associated putative phase variable genes in N. meningitidis strain MC58.
Repeat (a) | Frameshift (b) | NMA (c) | NGO (d) | Gene similarities (e) | Pr (f) | NMB number (g)

Surface associated proteins
(G)11 | + | S | S | Pilus assembly protein (pilC2) | K | NMB0049
(G)14 | + | P | P | Pilus assembly and adhesion protein (pilC1) | K | NMB1847
(G)11 | Pro | P | P | Class I outer membrane protein (porA) | K | NMB1429
(C)12 | Pro | P | N | Class 5 protein/surface adhesion protein (opc) | K | NMB1053
(CTTCT)10 | + | P | P | Class 5 protein/surface adhesion protein (opa) | K | NMB0442
(CTTCT)11 | + | P | P | Class 5 protein/surface adhesion protein (opa) | K | NMB1636
(CTTCT)13 | + | P | P | Class 5 protein/surface adhesion protein (opa) | K | NMB1465
(TCTTC)16 | + | P | P | Class 5 protein/surface adhesion protein (opa) | K | NMB0926
(G)6 | + | S | P | ‘Cell adhesion molecule’ – from patent match | S | NMB2104
(C)9 |  | N | N | Outer membrane protein related to adhesion/invasion proteins and IgA protease | S | NMB1998
(TAAA)9 | Pro | N | N | Outer membrane protein (yop1/yadA related) | M | NMB1994
(GGCA)3 |  | NR | S | Adhesion and penetration protein homologue | M | NMB1985
(AC)4 |  | S | S | Outer membrane protein homologous to D15 (omp85) | M | NMB0182
(TG)4 |  | S | S | Transporter | M | NMB1277
(G)9 |  | P | NR | Haemoglobin receptor (hmbR) | K | NMB1668
(G)8 |  | NR | NR | Lactoferrin binding protein (lbpA) | S | NMB1540
(C)11 | Pro | NR | P | Iron acquisition protein (frpB) | S | NMB1988

Surface sugar biosynthesis proteins
(G)13 |  | P | P | Saccharide acetylase | S | NMB1836
(CAAACAA)34 | + | P | P | Glycosyltransferase | S | NMB0624
(G)14 |  | P | P | LPS glycosyltransferase (lgtA) | K | NMB1929
(C)12 | + | N | P | LPS glycosyltransferase (lgtG) | K | NMB2032
(C)7 |  | N | N | Capsule biosynthetic protein (siaD) | K | NMB0067
(G)11 |  | P | S | Pilus glycosyltransferase (pglA) | K | NMB0218

Toxin and secreted enzyme related
(ATAACAAA)4 | + | N | N | RTX-type toxin* | S | NMB1407
(C)10 |  | S | P | Serine protease | S | NMB1969

Bacterial population competition determinants
(G)7 | + | P | S | Restriction modification system specificity protein (hsdS)* | S | NMB0831
(A)9 | + | S | N | Type I restriction modification system modification protein (hsdM)* | M | NMB1223
(TG)4 |  | N | N | Type II restriction enzyme | M | NMB0726
(C)6 |  | NR | NR | Type II restriction enzyme | M | NMB1032
(CAGC)20 | + | P | P | Restriction-modification system modification protein (mod) | S | NMB1375
(CCCAA)16 | + | P | P | Type III restriction modification system modification protein (mod) | S | NMB1261
(CAAAT)5 |  | N | N | Bacteriophage gene (funZ) | S | NMB0961
(A)7 | + | N | N | Bacteriophage protein (ner) | S | NMB1080
(G)7 | + | N | N | Bacteriocin export protein (mtfB) | S | NMB0098
(C)5 | + | S | P | Colicin V secretion protein (cvaA) | S | NMB1783

Others
(AT)5 |  | S | S | Di-heme cytochrome C (fixP) | S | NMB1723
(C)8 |  | NR | NR | Protein involved in Fe-S complex generation (nifS) | S | NMB1379
(TGCG)3 |  | S | S | Glutaredoxin 2 | M | NMB1734
(TTCC)3 |  | S | S | Fatty acid/phospholipid synthesis protein (plsX) | M | NMB1913
(A)8 | + | S | S | Cell cycle protein (mesJ) | M | NMB1140

Proteins of unknown function and hypothetical proteins
(AAGC)9 | + | P | P | FUN | R | NMB0312
(AAGC)5 | + | P | P | FUN | R | NMB1525
(G)9 | + | S | P | FUN | S | NMB0415
(G)7 | + | S | (P) | FUN | S | NMB0486
(G)7 | + | S | (P) | FUN | S | NMB0970
(TTCC)4 | (+) | S | N | FUN | S | NMB1893
(G)7 | + | S | (P) | FUN | S | NMB1741
(C)7 |  | S | P | FUN | M | NMB0593
(C)8(N)9(G)7 | Pro | P | N | FUN | M | NMB1634
(C)8(N)10(G)7 | Pro | P | N | FUN | M | NMB1543
(TA)4 |  | NR | NR | FUN | M | NMB0432
(GAAA)3 |  | N | N | FUN | M | NMB1265
(AC)4 |  | S | S | FUN | M | NMB0471
(CAAG)11 |  | P | P | Hypothetical protein | R | NMB1507
(AGCA)3 | (+) | S | NR | Hypothetical protein | S | NMB1275
(C)7 |  | N | P | Hypothetical protein | S | NMB0488
(C)7 |  | N | P | Hypothetical protein | S | NMB1489
(G)7 | (+) | P | N | Hypothetical protein | S | NMB1931
(G)6 | (+) | S | NR | Hypothetical protein | S | NMB0300
(C)6 | (+) | N | N | Hypothetical protein | S | NMB1760
(A)11 |  | P | P | Hypothetical protein | S | NMB0368
(T)10 | Pro or (+) | N | N | Hypothetical protein | S | NMB0065
(C)7 | Pro | N | N | Hypothetical protein | M | NMB2008
(A)9 | Pro | P | P | Hypothetical protein | M | NMB1786
(A)11 | Pro | S | P | Hypothetical protein | M | NMB0032

  • Notes: *ORF inactivated by other mutations.
  • a. Sequence of the repeat associated with the putative phase variable gene.
  • b. + and – indicate the presence or absence of a frameshift in the ORF, respectively; (+) indicates a frameshift in the most probable ORF in cases where this cannot be implied from homologies. Pro indicates a promoter located repeat.
  • c. N. meningitidis serogroup A strain Z2491. S = the repeat is the same length in the compared sequence, P = polymorphic, i.e. the repeat is of a different length in the compared sequence, NR = there is no repeat at the equivalent location of the compared sequence, N = there is no homologue of the gene in the compared sequence.
  • d. N. gonorrhoeae strain FA1090. S, P, NR, and N as above.
  • e. Homology identification based on annotation of the MC58 genome.
  • f. Pr = Predicted likelihood of phase variability. K = previously recognized phase variable gene; S = strong candidate; M = moderate candidate; R = previously documented repeat for which the associated gene has not been confirmed to be phase variable.
  • g. ORF designation in the annotation of the MC58 genome.

The methodology used to identify nucleotide repeats that were potentially associated with phase variation was the same as that described for H. pylori (Saunders et al., 1998) in which the presence and length of repeats were interpreted on the basis of their sequence context. The sequence context of every homopolymeric tract of greater than 6 bp in length, every dinucleotide repeat of four or more copies, every other repeat motif of up to 10 bp in length with three or more copies, and the junctions of every frameshifted ORF were considered. Those repeats located such that changes in their length would be expected to alter expression, either through altering the reading frame or the relative position of promoter components, were used as the basis for identifying candidate phase variable genes.
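
The search thresholds described above can be illustrated with a minimal sketch. This is not the software used in the study: the function names, the regular-expression approach and the single-strand scan are assumptions made for illustration, and the interpretation of sequence context (ORF or promoter location) that the analysis relied on is not attempted here.

```python
import re

# Thresholds taken from the text: homopolymeric tracts of more than 6 bp,
# dinucleotide repeats of four or more copies, and other motifs of up to
# 10 bp with three or more copies.
MIN_COPIES = {1: 7, 2: 4}
DEFAULT_MIN_COPIES = 3
MAX_MOTIF_LEN = 10


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_simple_repeats(seq):
    """Return sorted (start, motif, copies) tuples for tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for size in range(1, MAX_MOTIF_LEN + 1):
        min_copies = MIN_COPIES.get(size, DEFAULT_MIN_COPIES)
        # A lookahead lets the scan test every start position, so a tract is
        # not hidden by an overlapping match that is later discarded.
        pattern = re.compile(
            r"(?=(([ACGT]{%d})\2{%d,}))" % (size, min_copies - 1)
        )
        last_end = 0
        for m in pattern.finditer(seq):
            tract, motif = m.group(1), m.group(2)
            end = m.start() + len(tract)
            if not is_primitive(motif):
                continue  # e.g. "GG" repeats are reported as "G" tracts
            if end - last_end < size:
                continue  # rotation of, or contained in, the previous tract
            hits.append((m.start(), motif, len(tract) // size))
            last_end = end
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence containing a (G)11 and a (CTTCT)10 tract.
    example = "ATGC" + "G" * 11 + "TACA" + "CTTCT" * 10 + "AG"
    for start, motif, copies in find_simple_repeats(example):
        print(f"{start}\t({motif}){copies}")
```

Each hit would still need to be assessed by eye, or against an annotation, to decide whether a change in its length could alter a reading frame or the spacing of promoter components.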

In addition, this analysis was combined with a Markov chain analysis (Cox and Miller, 1965). Markov chain analysis determines the expected frequency of any sequence ‘word’ on the basis of the frequency of its component parts within a sequence. An excess of a particular sequence above that expected suggests that the sequence is either selected for on the basis of function or is a product of a process which generates it at higher than normal frequency. Markov chain analysis of this type of simple repeat has not been previously described in this context. This approach has important advantages over methods that are based upon base composition and assumptions that the DNA sequence is random in the absence of the specific bias sought. For example, a Markov chain analysis will distinguish between an excess of a 4 bp sequence that is due to an abundance of its 2 bp and 3 bp components and a true bias for the specific presence of the 4 bp sequence. Based on Markov chain analysis, there are fewer G/C homopolymeric tracts of less than 5 bp than predicted on the basis of the frequency of their component parts, and a large excess of G/C homopolymeric tracts greater than 6 bp, most significantly 8 bp, in length (Fig. 1). For each of the repeats, the deficit or excess of the component parts is accounted for in the Markov prediction. These results suggest that while there is selection against the shorter G/C homopolymeric tracts, perhaps due to mechanistic biases or other factors such as codon choice, there is a strong selective bias for the generation and instability of G/C homopolymeric tracts of greater than 6 bases in length. This implies that, in addition to the specific selection for variability for the repeat-associated phase variable genes, there is a selection for the mutational processes which mediate the process of phase variation. This is reflected in the distribution of the repeats present in the genome as a whole. 
In this context, it should be noted that these repeats are not limited to locations in which they might mediate phase variation. The findings shown in Fig. 1 support the appropriateness of the search thresholds used in this analysis. In contrast to the results for the G/C homopolymeric repeats, there is an over representation of AAA/TTT trinucleotides (and AA/TT dinucleotides – data not shown) but longer repeats of up to 8 bp are less frequent than predicted, suggesting that they are selected against, as a consequence of increasing instability or mechanistic bias. There is a lesser excess of homopolymeric tracts of As or Ts of greater than 9 bases, but they are relatively infrequent, and they are predominantly located in sites that would not be expected to affect gene expression.
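
The idea of predicting a word's frequency from its component parts can be sketched as follows. The estimator shown is the standard maximal-order form, in which a word of length L is predicted from its two overlapping (L-1)-mers and their shared (L-2)-mer; this is consistent with the 'word length L-2' stated for Fig. 1 but is an assumption about the exact calculation. Overlapping word counts on a single strand and the function names are likewise illustrative choices.

```python
def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    count = 0
    start = seq.find(word)
    while start != -1:
        count += 1
        start = seq.find(word, start + 1)
    return count


def markov_expected(seq, word):
    """Expected count of word under a Markov model of order len(word) - 2.

    E(w1..wL) = N(w1..w[L-1]) * N(w2..wL) / N(w2..w[L-1])
    """
    if len(word) < 3:
        raise ValueError("word must be at least 3 bases long")
    prefix = count_overlapping(seq, word[:-1])
    suffix = count_overlapping(seq, word[1:])
    core = count_overlapping(seq, word[1:-1])
    return prefix * suffix / core if core else 0.0


def observed_over_expected(seq, word):
    """Ratio of observed to expected counts (nan if nothing is expected)."""
    expected = markov_expected(seq, word)
    if not expected:
        return float("nan")
    return count_overlapping(seq, word) / expected


if __name__ == "__main__":
    # Toy sequence, for illustration only; a real analysis would use the genome.
    toy = "AGGGGTGGA"
    print(markov_expected(toy, "GGG"))         # 4 * 4 / 6
    print(observed_over_expected(toy, "GGG"))  # 2 observed / (16 / 6)
```

A ratio above 1 for a given tract length would correspond to an excess of the kind plotted in Fig. 1, and a ratio below 1 to a deficit.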

Figure 1.

The figure shows the ratio of observed over expected homopolymeric tract frequencies determined by Markov chain analysis using word length L-2 for the predictions (Gs and Cs, solid line; As and Ts, dashed line) plotted against homopolymeric tract length.

Similar repeat-based analyses have identified candidate phase variable genes that have subsequently been confirmed experimentally and/or shown to be polymorphic in comparative studies (Hood et al., 1996; van Belkum et al., 1997a,b; Alm et al., 1999; Appelmelk et al., 1999; Ren et al., 1999). In addition, N. meningitidis represents a unique context in which to interpret the results of this type of analysis because the existing body of knowledge on the length, composition, and instability of repeats associated with phase variation in Neisseria facilitates interpretation of the repeat sequences that are identified. Both homopolymeric tracts and repeats composed of longer motifs are already recognized to mediate the phase variation of several contingency genes in N. meningitidis. These include relatively short homopolymeric tracts, e.g. of (C)7 in a gene for capsule biosynthesis (Hammerschmidt et al., 1996b), which, combined with the Markov modelling, indicate the lower limits for potentially unstable repeats in this species. A large proportion of the putative phase variable genes that have repeats located within the ORF (30 of 57) are associated with frameshifts. In one of these candidates (NMB0415), differences in the length of the repeat that alter the translational reading frame have been identified in different neisserial strains (L.A. Snyder, N.J. Saunders and W.M. Shafer, submitted). The proportion (53%, 29 of 55) of ORFs with repeats of a length that would prevent expression due to frameshifting, when frameshifts are otherwise uncommon in ORFs that do not have repeats, provides further evidence for the instability of these repeats, and is possibly a reflection of the absence of relevant selection pressures during in vitro cultivation.
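
The link between repeat length and reading frame is simple modular arithmetic, sketched below. The function names are hypothetical, and the 'in-frame' copy number passed to `is_in_frame` is an illustrative input rather than a value taken from this study.

```python
def frame_offset(motif, copy_change):
    """Reading-frame offset (0, 1 or 2) after gaining or losing motif copies."""
    return (len(motif) * copy_change) % 3


def is_in_frame(motif, copies, in_frame_copies):
    """True if `copies` leaves the ORF in the same frame as `in_frame_copies`."""
    return frame_offset(motif, copies - in_frame_copies) == 0


if __name__ == "__main__":
    # Gaining one copy of a 1, 4, 5 or 7 bp motif always shifts the frame.
    for motif in ("G", "AAGC", "CTTCT", "CAAACAA"):
        print(motif, frame_offset(motif, 1))
    # A homopolymeric tract returns to the original frame every third base.
    print(is_in_frame("G", 14, 11), is_in_frame("G", 12, 11))
```

Because none of the motif lengths in Table 1 is a multiple of three, the gain or loss of a single copy within an ORF is expected to move the downstream sequence out of frame.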

The candidacy of the putative phase variable genes is not equal. All of the intact genes associated with appropriately located long repeats, for instance like those described in H. influenzae (Hood et al., 1996), can be reasonably considered very likely to be phase variable. Those associated with shorter repeats, or with motifs that have not yet been shown to be unstable in a given species, whilst categorized as potentially phase variable, are less strong candidates. Table 1 describes the repeated motif, the number of copies, and an indication as to whether the candidate is previously recognized as phase variable (K), or has been assessed as either a strong (S), or a less-strong (M) candidate. It is likely that any genes incorrectly identified as potentially phase variable (false positives) will be included in the ‘M’ category. The data on polymorphisms from the comparative analysis of the genome sequences was not used in this assessment. However, it is noteworthy that of the 19 genes rated ‘M’, 13 have the repeat or gene present in at least one of the other sequences and five of these 13 display length polymorphisms.

Upon publication of the strain MC58 genome sequence (Tettelin et al., 2000), we described the number of the potential phase variable genes divided into broad functional categories with no description of the repeat components. In the serogroup A strain Z2491 genome sequence reported by Parkhill et al. (2000), the authors state that ‘there are around 26 tandem repeats indicating potentially phase variable genes’ (and refer to supplementary information available from http://www.sanger.ac.uk). Our comparative analysis identified 50 of the 65 candidate phase variable genes found in strain MC58 in strain Z2491, of which 33 include the respective, associated potentially unstable repeat sequence (see Table 1). Therefore we propose that the repertoire of potentially phase variable genes in the serogroup A strain is greater than was indicated.

The presence of potentially functional dinucleotide repeats is a novel finding for Neisseria species. The (AT)5 repeat in fixP is one of only two of this type/length in the genome and the only one that is located within an ORF. In the genome as a whole, there is apparently selection against (AT/TA)4 or (AT/TA)5 repeats in ORFs, possibly due to their instability (data not shown). (AT)n repeats have been associated with phase variation in H. influenzae (van Ham et al., 1993).

A comparison between the two available H. pylori genome sequences has been used to identify phase variable genes in strain J99 on the basis of polymorphisms in repeats present in both strains J99 and 26695 (Alm et al., 1999). The completed N. meningitidis serogroup A strain Z2491 sequence (available at http://www.sanger.ac.uk/Projects/N_meningitidis/) and the largely complete N. gonorrhoeae strain FA1090 sequence (available at http://www.genome.ou.edu/gono.html) were used to perform a similar comparative analysis of the 65 putative phase variable genes identified in strain MC58 (see Table 1). This is the first opportunity to perform this type of analysis on two genetically unrelated strains of a pathogenic species and a closely related species with a distinct pathogenesis. Such comparison provides additional data on polymorphisms which supports the candidacy of many of the genes as phase variable, but also addresses issues of interstrain and interspecies differences in the phase variability and gene complement of a subset of genes likely to be important in pathogenesis. The comparative analysis revealed polymorphisms in the length of the repeats in 36 of the 65 candidate genes [13 genes previously known to be phase variable (10) or associated with repeats (3), and 23 of the novel candidates]. The repeats predicted to mediate phase variation in strain MC58 were absent from a further six genes in one or both of the other sequences where length differences were not present. Fifteen of the strain MC58 genes were not present in the strain Z2491 sequence while 18 were absent from strain FA1090. In 12 instances there were no homologues in either of the other two neisserial sequences. While the gonococcal sequence remains incomplete, it is possible that some of the genes that are absent in this analysis are present in this strain but not currently included in the database. 
However, it is striking that whilst an estimated 8.8% of all the genes present in strain MC58 do not have homologues in strain Z2491 (Tettelin et al., 2000), 23% of the putative phase variable genes fall into this category.
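
The S, P, NR and N codes used for the comparison in Table 1 can be expressed as a small decision rule. The sketch below is illustrative only: the function name and inputs are hypothetical, and the judgement of whether a tract at the equivalent location still counts as a repeat (passed here as `None` when it does not) is left to the caller, as it was assessed from sequence context in the study.

```python
from collections import Counter


def compare_repeat(ref_copies, other_copies, homologue_present=True):
    """Classify a repeat in a compared sequence using the Table 1 codes.

    ref_copies   -- copy number of the repeat in the reference strain
    other_copies -- copy number in the compared sequence, or None if there
                    is no repeat at the equivalent location
    """
    if not homologue_present:
        return "N"   # no homologue of the gene in the compared sequence
    if other_copies is None:
        return "NR"  # homologue present but no repeat at this location
    if other_copies == ref_copies:
        return "S"   # repeat of the same length
    return "P"       # polymorphic: repeat of a different length


if __name__ == "__main__":
    # Hypothetical copy numbers, for illustration only.
    comparisons = [(11, 11, True), (14, 12, True), (8, None, True), (7, None, False)]
    codes = [compare_repeat(*c) for c in comparisons]
    print(codes)
    print(Counter(codes))
```

Tallying the codes across all candidates in this way would give the counts of polymorphic, absent and missing repeats reported in the comparison.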

Forty-six of the 65 (71%) putative phase variable genes are either associated with repeats that are of different lengths in the other sequences or with frameshifts in strain MC58. This is supportive evidence that these repeats are unstable and that the associated genes are phase variable. The proportion of these genes that lack the repeats or are not present in one or other of the other sequences contrasts with the comparison of the H. pylori sequences in which the complement of these genes was similar in both strains. This suggests that the repertoire of genes that undergo phase variation is a major source of interstrain and interspecies variability in Neisseria. In addition, the association of phase variation with gene families and genes with related functions also suggests that there may be considerable functional redundancy associated with phase variable genes.

Where possible functions are identifiable, the putative phase variable genes in strain MC58 include a similar repertoire of genes important in host interactions to that identified in H. influenzae (Hood et al., 1996) and H. pylori (Saunders et al., 1998). These functional groups typically include LPS biosynthetic genes, surface proteins, and restriction/modification system genes. Variation of neisserial LPS is known to play a role in virulence, niche adaptation, and host mimicry (Giardina et al., 1999). Novel, potentially variable glycosyltransferases identified in the strain MC58 genome sequence suggest that additional LPS structural variants and/or sugar modifications of surface structures occur. Not every LPS gene that is phase variable in other neisserial strains has repeats that are long enough to be likely to mediate high frequency variation in strain MC58 (e.g. lgtB and lgtE). The stability of shorter repeats in contexts that are recognized to be associated with phase variation has been demonstrated experimentally (Jennings et al., 1999). lgtB and lgtE in strain MC58 have a (G)5 at the same locations as the reported longer repeats in other strains suggesting that they are remnants of longer repeats, although it is also possible that these shorter repeats reflect the ancestral state. These may act as a genetic pool of genes predisposed to regain phase variability under appropriate selective conditions [the (G)6-associated acetyltransferase (NM0285/6) may be similar in this respect].

Of the genes encoding surface-exposed iron acquisition proteins, hmbR and frpB [FrpB is a vaccine candidate (Ala'Aldeen et al., 1994; van der Ley et al., 1996)] are known to be phase variable in Neisseria, the latter previously documented only in N. gonorrhoeae (Dyer et al., 1988; Beucher and Sparling, 1995; Pettersson et al., 1995; Lewis et al., 1999). Lactoferrin-binding protein (encoded by lbpA) was not previously considered to be phase variable. The lbpA sequence from N. gonorrhoeae has only a (G)5 repeat, and the previously reported N. meningitidis gene sequence has no repeat at this locus (Pettersson et al., 1993; Biswas and Sparling, 1995). This is the first unequivocal example of a gene that has features of phase variability in one strain and no potential variability in another. The other phase variable iron acquisition gene reported in Neisseria (hpuA) (Chen et al., 1996; Lewis et al., 1999) is absent in strain MC58. The repertoires of LPS and iron acquisition genes with different capacities for variation highlight the potential for interstrain differences that a combination of genetic exchange in a naturally competent organism and phase variation confer on a bacterial population.

The presence of several types of potentially phase variable gene in N. meningitidis that have not been previously seen in other species types extends the potential roles of phase variation in bacterial fitness and introduces new concepts into the field of phase variable gene functions. Phase variable restriction/modification enzymes have been identified previously in other species including Mycoplasma bovis, H. influenzae and H. pylori, but with the exception of the type I system in M. bovis, associated with resistance to bacteriophage, their function is unknown (Dybvig and Yu, 1994; Hood et al., 1996; Saunders et al., 1998). However, with the exception of a type III system in N. gonorrhoeae, these have not been previously recognized in Neisseria spp. (Belland et al., 1996). The different types of restriction/modification systems use different enzyme components which have different interdependence for their functions (Redaschi and Bickle, 1996). Type I restriction/modification systems comprise three components which confer modification, restriction and sequence specificity properties. 
In these systems, the specificity component is required for both modification and restriction of the target DNA and therefore phase variation of the specificity protein would be an efficient means by which to switch these systems on and off. However, the hsdS gene identified in this study would be the first example in any species of variation of this component in a type I restriction/modification system. Bacteriophage- and bacteriocin-related proteins have also not previously been associated with phase variation in any species. The funZ gene is a homologue of a gene that is involved in lysogenic conversion in ‘phage P2 whilst the Ner protein from ‘phage Mu is a DNA binding protein that may act as a repressor (GenBank Accession: GI3139107; Strzelecka et al., 1995). Variation in the expression of these genes would be expected to affect susceptibility to and productivity of bacteriophage. Variable expression of restriction/modification systems and bacteriophage proteins may produce similar consequences, generating mixed populations of resistant and susceptible populations and lysogenic and latent infections respectively. It is not known whether variation of the type II and III restriction/modification systems, or of the type I systems in Neisseria have similar functional consequences. N. meningitidis and several of the other species which have phase variable restriction systems are naturally transformable. It may be that variation of some restriction/modification systems can influence the efficiency with which recipient cells incorporate horizontally transferred DNA. Novel types of potentially phase variable genes identified in this analysis include secreted enzymes, toxins and toxin secretion systems. Phase variation of toxin production has been previously recognized in Bordetella pertussis (Gross and Rappuoli, 1989). The identification in the annotation of the genome sequence of strain MC58 is one of the first indications that meningococci may possess toxin systems. 
Potentially phase variable genes that are functionally different from any previously described are listed in the category ‘other’ in Table 1. These include a di-heme cytochrome C that would potentially adapt the population to different microenvironments in a novel way. This protein is an essential component of a terminal receptor of an apparently branched electron transport chain that would enable the organism to adapt to microaerophilic conditions (Preisig et al., 1993; 1996; Thony-Meyer et al., 1994; Koch et al., 1998). However, this type of oxidase has been reported to have a less tight coupling between oxygen reduction and proton translocation (de Gier et al., 1996), so expression may be disadvantageous when the bacterium is not in the environment for which it is adaptive. The homologous system from the symbiosis-specific respiratory chain of Bradyrhizobium japonicum is tightly regulated, is induced under low oxygen conditions, and is niche adaptive (Preisig et al., 1993). Phase variation of this type of gene highlights how alterations in metabolism could be niche adaptive, and how genes of this type can be seen to fall within the compass of contingency genes.

The number and functional diversity of known and candidate phase variable genes in N. meningitidis is unparalleled in any species investigated to date. The progeny of each bacterial cell potentially comprises many thousands of phenotypes due to the independent switching and combinatorial nature of this process. This study extends the role of phase variation beyond bacterium–host contact, nutrient acquisition and immune evasion, to include the relative fitness of subclones within the bacterial population and adaptation to different metabolic microenvironmental conditions. This extension does not alter the concept of phase variation as it relates to the function of contingency genes (Moxon et al., 1994) but it reveals both the phenotypic flexibility of N. meningitidis and the importance of stochastic switching processes in adaptation within bacterial populations to local environmental conditions.
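
The combinatorial argument can be made concrete with a short calculation. It is an upper bound under stated assumptions: every gene is taken to switch independently between two states (on and off), and every combination is counted as a distinct phenotype. The function name and the `states_per_gene` parameter are illustrative.

```python
def phenotype_combinations(n_genes, states_per_gene=2):
    """Number of expression combinations for independently switching genes."""
    return states_per_gene ** n_genes


if __name__ == "__main__":
    # The 13 previously recognized phase variable genes alone:
    print(phenotype_combinations(13))  # 8192
    # All 65 candidates identified in strain MC58:
    print(phenotype_combinations(65))  # 36893488147419103232
```

Even the 13 previously recognized genes allow 8192 on/off combinations under these assumptions, so the number of distinct phenotypes actually realized is more likely to be limited by functional redundancy and population size than by the number of combinations available.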

Experimental procedures

Using our previously described whole genome analysis methodology (Saunders et al., 1998), we have analysed the complete genome sequence of the serogroup B N. meningitidis strain MC58 (Tettelin et al., 2000) for simple DNA repeats to identify the potential repertoire of phase variable genes. We have also analysed the length distribution of the repeats to seek evidence of mutational biases that may underlie the use of this mechanism of switching in meningococci. The frequency of each length of homopolymeric tract up to 12 bases was used to determine the expected numbers of such tracts in the genome using high-order Markov chains (Cox and Miller, 1965). This analysis determines the expected frequency of sequence ‘words’ based upon the frequency of their component parts. In this way, ‘words’ can be determined to be present at higher or lower frequencies than expected (Saunders et al., 1999). Based upon the Markov chain analysis and existing published data on repeats in Neisseria, we investigated homopolymeric tracts of greater than 6 Gs or Cs, and 8 As or Ts. We also investigated repeats composed of four copies of dinucleotides (five for GC/CG), three copies of tetramer and longer motifs, and all repeats of five or more bases associated with frameshift changes. The significance of each repeat was interpreted on the basis of sequence context and their potential effect on associated reading frame expression as described in Results and discussion. The ORF, and where appropriate, flanking promoter sequences of the putative phase variable genes were extracted from the strain MC58 genome sequence and used to perform blastn searches of the N. meningitidis strain Z2491 and N. gonorrhoeae FA1090 genome sequences. The meningococcal strain Z2491 and the gonococcal FA1090 sequence were downloaded on the 5th of January 2000.

Acknowledgements

N.J.S. is supported by a Wellcome Trust Fellowship in Medical Microbiology. We acknowledge the use of the strain Z2491 sequence produced by the N. meningitidis Sequencing Group at the Sanger Centre that can be obtained from ftp://ftp.sanger.ac.uk/pub/N.meningitidis. We also acknowledge the Gonococcal Genome Sequencing Project, and B.A. Roe, S. P. Lin, L. Song, X. Yuan, S. Clifton, T. Ducey, L. Lewis and D.W. Dyer for the use of the strain FA1090 sequence that can be obtained from http://www.genome.ou.edu/gono.html
