Comparative proteome analysis of Mycobacterium tuberculosis and Mycobacterium bovis BCG strains: towards functional genomics of microbial pathogens

Authors


P. R. Jungblut. E-mail jungblut@mpiib-berlin.mpg.de; Tel. (+49) 30 28460 170; Fax (+49) 30 28460 174.

Abstract

In 1993, the WHO declared tuberculosis a global emergency on the basis that there are 8 million new cases per year. The complete genome of the causative microorganism, Mycobacterium tuberculosis strain H37Rv, comprising 3924 genes, has been sequenced. We compared the proteomes of two non-virulent vaccine strains of M. bovis BCG (Chicago and Copenhagen) with two virulent strains of M. tuberculosis (H37Rv and Erdman) to identify protein candidates of value for the development of vaccines, diagnostics and therapeutics. The mycobacterial strains were analysed by two-dimensional electrophoresis (2-DE) combining non-equilibrium pH gradient electrophoresis (NEPHGE) with SDS–PAGE. Distinct and characteristic proteins were identified by mass spectrometry and introduced into a dynamic 2-DE database (http://www.mpiib-berlin.mpg.de/2D-PAGE). Silver-stained 2-DE patterns of mycobacterial cell proteins and culture supernatants contained 1800 and 800 spots, respectively, of which 263 were identified. Of these, 54 belong to the culture supernatant. Sixteen and 25 proteins differing in intensity or position between M. tuberculosis H37Rv and Erdman, and between H37Rv and M. bovis BCG Chicago, respectively, were identified and assigned to protein classes. It is to be hoped that the availability of the mycobacterial proteome will facilitate the design of novel measures for the prevention and therapy of one of the great health threats, tuberculosis.

Introduction

Tuberculosis is one of the most prevalent infections globally. In 1998, Mycobacterium tuberculosis caused about 8 million new cases and 2 million deaths. As a result of the AIDS pandemic, the prevalence of tuberculosis increased greatly, which led the WHO to declare tuberculosis a global emergency in 1993.

The emergence of M. tuberculosis strains resistant to chemotherapeutic regimens necessitates an intensified search for novel therapeutic and prophylactic approaches. The currently available vaccine strain, M. bovis bacille Calmette–Guérin (BCG), confers some protection against childhood tuberculosis. In contrast, its protective capacity against the most common form, pulmonary tuberculosis in adults, is low and highly variable (Colditz et al., 1994). Although the BCG genome is very similar to that of M. tuberculosis, BCG lacks a number of genes that are present in M. tuberculosis (Mahairas et al., 1996; Brosch et al., 1998). It has been proposed that the products of these M. tuberculosis-specific genes are potential vaccine antigens, to be delivered as subunit vaccines, as naked DNA or by recombinant carriers. Conversely, M. tuberculosis genes that encode crucial virulence factors but are irrelevant for protective immunity could be deleted to generate M. tuberculosis knock-out vaccine candidates (Berthet et al., 1998).

To date, our knowledge of virulence genes in M. tuberculosis and their corresponding proteins is still fragmentary. One of the main characteristics of pathogenic mycobacteria is their capacity to survive within host macrophages by interfering with processes leading to phagolysosome fusion, but the genes and proteins involved in this survival strategy have not yet been defined. Therefore, a precise analysis of the mycobacterial proteome, comparing different mycobacterial strains and species in conjunction with the mycobacterial genome, is urgently required.

Recently, Cole et al. (1998) published the complete DNA sequence of the M. tuberculosis strain H37Rv genome, with a total of 3924 individual genes. The functional complement of this genetic information, the proteome, can be analysed by a combination of two-dimensional electrophoresis (2-DE) and mass spectrometry. Earlier attempts to separate mycobacterial proteins resolved about 50–170 proteins (Britton et al., 1987; Daugelat et al., 1992; Wallis et al., 1993; Lee & Horwitz, 1995; Mahairas et al., 1996). This number increased to 700 protein species when immobilized pH gradients were used in the first dimension of 2-DE (Urquhart et al., 1997; Sonnenberg & Belisle, 1997). Subtractive protein analysis, i.e. the comparison of gene expression in different biological situations, allows biological effects to be correlated with protein composition. The resulting 2-DE protein databases are continuously collected in the world 2-DE database (http://www.expasy.ch/ch2d/2d-index.html).

We used 2-DE methodology to resolve the mycobacterial proteome into 1800 distinct protein species. We compared the composition of cellular as well as culture supernatant proteins of two strains each of M. tuberculosis and M. bovis BCG. To date, 263 proteins have been identified by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS): 157 and 53 in the cell protein (CP) fraction of M. bovis BCG Chicago and M. tuberculosis, respectively, as well as 54 proteins from the H37Rv culture supernatant (CSN). In the CP patterns, eight proteins were unique to BCG and 13 to M. tuberculosis H37Rv. Identification was achieved by peptide mass fingerprinting (PMF) using MALDI-MS and, where necessary, confirmed by post-source decay sequencing. As a basis for further proteomic investigations, we have initiated the construction of a mycobacterial 2-DE database.

Results

Protein separation and identification

The 2-DE patterns of all mycobacterial strains investigated are highly similar and, because many landmark spots exist, easily comparable. Only obvious differences in spot intensity or position that were readily recognizable by visual evaluation were used to distinguish protein species between mycobacterial strains. Each comparison was repeated at least three times with different sample preparations of the same strains, and only differences confirmed in all preparations were accepted as strain specific. However, the 2-DE patterns comprise numerous additional variants, which await further evaluation. The identification of proteins separated by 2-DE has been reviewed recently (Patterson and Aebersold, 1995; Jungblut et al., 1996; Jungblut and Thiede, 1997). A protein species is defined as the smallest unit of protein classification, determined by its chemical structure (Jungblut et al., 1996). For the identification of the first 263 proteins, we chose in-gel tryptic digestion (Otto et al., 1996) and MALDI-MS peptide mass fingerprinting (Henzel et al., 1993; James et al., 1993; Mann et al., 1993; Pappin et al., 1993), with the option of sequencing by post-source decay MALDI-MS (Spengler et al., 1992), giving priority to high-intensity proteins and to variants between the investigated mycobacterial strains.

Mycobacterial 2-DE database

To organize the vast amount of data obtained from our mycobacterial proteome, our present results are published along with this paper in an electronic 2D–PAGE database available via the Internet (http://www.mpiib-berlin.mpg.de/2D-PAGE). It is constructed according to the rules of the world 2D–PAGE guidelines (Appel et al., 1996; H.-J. Mollenkopf et al., 1999). The management of the 2D–PAGE database not only allows interactive use via the Internet, but also facilitates active participation of different scientific groups with their own data and thereby minimizes administration and maintenance work.

Analysis of the mycobacterial protein composition

Whole-cell preparations of mycobacteria resulted in 2-DE patterns containing 1500–2000 distinct protein spots depending on silver-staining conditions and the amount of sample applied to the gels. Standard patterns of M. bovis BCG Chicago and M. tuberculosis H37Rv chosen for the construction of the mycobacterial 2-DE database are shown in Fig. 1A and B. Molecular mass and isoelectric point calibrations were obtained by internal mycobacterial marker proteins identified during this approach. Both mycobacterial species comprise patterns with a high density of spots in the acidic range, whereas, in the basic range, spot density is clearly reduced. The patterns of the four strains investigated are highly similar and can be compared easily. They were divided into six sectors to promote data handling for visual inspection and personal computer evaluation. Figure 2 shows sector C of M. bovis BCG Chicago CP (Fig. 2A) and M. tuberculosis H37Rv CSN (Fig. 2B) only. All sectors are shown in the Molecular Microbiology website (http://www.blackwell-science.com/mmi) and in the 2D–PAGE database (http://www.mpiib-berlin.mpg.de/2D-PAGE).

Figure 1. Two-dimensional gels of (A) total cell protein of M. bovis BCG Chicago, (B) total cell protein of M. tuberculosis H37Rv and (C) culture supernatant of H37Rv.

Figure 2. Sector C of the 2-DE patterns of (A) M. bovis BCG Chicago cell proteins and (B) M. tuberculosis H37Rv CSN. Identified proteins are marked with corresponding accession numbers in Table 1 (http://www.blackwell-science.com/mmi).

Selected proteins from the six sectors were identified by PMF using MALDI-MS. Although we started with the most intense protein spots, identifications were less certain at the beginning of our investigation, because the mycobacterial sequence database then contained only about 2500 entries. As the sequence database became more complete, less additional sequence information from post-source decay (PSD) sequencing was required. A protein was accepted as identified if the detected peptides covered at least 30% of its complete sequence. An assignment with a sequence coverage below 30% was accepted only if (i) at least the three main peaks of the mass spectrum matched a database sequence; (ii) the number of low-intensity peaks was clearly reduced and the molecular mass of the uncleaved protein agreed with the 2-DE estimate within 20%; or (iii) PSD confirmed the proposed protein. Most spots matched one database entry with a clearly higher number of common peptides than the second-best candidate. Only three spots in BCG contained two proteins: BCG Chicago spot C100 contains a protein homologous to a conserved hypothetical M. tuberculosis H37Rv protein, Rv3075c, together with the transcription antitermination protein NusG, Rv0639; spot C241 contains a probable adenylate kinase, Rv0733, and a probable transposase, Rv1041c; and spot C600 contains a thioredoxin reductase, Rv3913, and 3-hydroxyacyl-CoA dehydrogenase, Rv0468. In some cases, peptides from neighbouring spots were detected at reduced intensity in addition to the peptides of the main protein.
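The acceptance rule above can be written as a small decision function. The Python sketch below only illustrates the logic: the record type and its field names are hypothetical, and criteria (i) to (iii), which the authors judged from the spectra, are reduced to yes/no inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PmfAssignment:
    """Hypothetical record for one spot-to-protein assignment."""
    sequence_length: int            # residues in the candidate database entry
    covered_residues: int           # residues covered by matched peptides
    three_main_peaks_matched: bool  # criterion (i)
    low_peaks_reduced: bool         # criterion (ii), first condition
    mr_within_20_percent: bool      # criterion (ii), second condition
    psd_confirmed: bool             # criterion (iii)

    @property
    def coverage(self) -> float:
        return self.covered_residues / self.sequence_length


def accept(a: PmfAssignment, min_coverage: float = 0.30) -> bool:
    """Accept at >= 30% coverage; below that, require criterion (i), (ii) or (iii)."""
    if a.coverage >= min_coverage:
        return True
    return (
        a.three_main_peaks_matched
        or (a.low_peaks_reduced and a.mr_within_20_percent)
        or a.psd_confirmed
    )


if __name__ == "__main__":
    print(accept(PmfAssignment(400, 140, False, False, False, False)))  # 35% coverage
    print(accept(PmfAssignment(400, 80, False, True, False, False)))    # 20%, (ii) half met
```

Criterion (ii) needs both of its conditions, whereas (i), (ii) and (iii) are alternatives to one another; the sketch makes that structure explicit.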

Starting from gels stained with Coomassie brilliant blue R-250 or G-250 or, in some cases, negatively stained, we analysed 312 mycobacterial protein spots, from which PMF identified 263 proteins. Starting with the CP of M. bovis BCG strain Chicago, we identified 157 proteins. By PMF, we identified 53 and 12 proteins from M. tuberculosis strains H37Rv and Erdman, respectively. Additional sequence information from PSD confirmed the PMF assignments for 34 proteins. Because all PSD results confirmed the PMF assignments, we consider 30% sequence coverage sufficient for protein identification; PSD was needed only if the sequence coverage was < 30%. As determined by PMF, all 23 H37Rv spots tested had the same identity as their counterparts at the same position in the BCG pattern. We therefore identified additional cellular protein spots by comparing spot positions between the two species, resulting in a total of 162 identified proteins in BCG Chicago and a total of 626 identified proteins in CP of all strains.
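Transferring identifications between the BCG and H37Rv patterns by spot position amounts to a nearest-neighbour match in gel coordinates. The Python sketch below assumes that both gels are already aligned on common landmark spots; the tolerances (0.05 pI units and 0.02 log10 Mr units) and the example spots are hypothetical, not values reported here. Mr is compared on a logarithmic scale because migration in SDS–PAGE is approximately linear in log Mr.

```python
import math


def transfer_identifications(identified, unidentified, max_dpi=0.05, max_dlogmr=0.02):
    """Assign labels from a reference gel to spots at matching positions.

    identified:   spot_id -> (pI, Mr in kDa, protein label) on the reference gel
    unidentified: spot_id -> (pI, Mr in kDa) on the gel to be annotated
    Gels are assumed to be aligned already; the tolerances are hypothetical.
    """
    assigned = {}
    for uid, (pi_u, mr_u) in unidentified.items():
        best = None  # (scaled distance, label) of the closest reference spot
        for pi_r, mr_r, label in identified.values():
            dpi = abs(pi_u - pi_r)
            dlog = abs(math.log10(mr_u) - math.log10(mr_r))
            if dpi > max_dpi or dlog > max_dlogmr:
                continue  # outside the matching window
            dist = math.hypot(dpi / max_dpi, dlog / max_dlogmr)
            if best is None or dist < best[0]:
                best = (dist, label)
        if best is not None:
            assigned[uid] = best[1]
    return assigned


if __name__ == "__main__":
    ref = {"r1": (5.60, 34.0, "protein A"), "r2": (6.10, 22.0, "protein B")}
    new = {"u1": (5.62, 34.3), "u2": (7.00, 50.0)}
    print(transfer_identifications(ref, new))  # u2 has no reference spot nearby
```

A spot inside the window of two reference spots takes the closer one; ambiguous cases of this kind are the ones that would still need PMF confirmation, as was done for the 23 H37Rv spots.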

The identified proteins of the mycobacterial species investigated were classified according to the M. tuberculosis H37Rv gene classification of Cole et al. (1998) and assigned to the corresponding Rv numbers. A comprehensive table with complete information on the identified proteins can be found on the Web page of this journal. Having identified about 3% of all predicted gene products, starting with the most abundant proteins, we found proteins from many functional categories. However, only in two categories, protein translation and modification, and chaperones/heat shock, were more than 40% of the predicted gene products identified in our 2-DE patterns. The significance of the proteome approach for confirming predicted protein species has been emphasized (Humphery-Smith et al., 1997). To date, our study has revealed the expression of 30 conserved hypothetical proteins and six unknowns that had not previously been described at the protein level.

In the CSN of M. tuberculosis H37Rv, ≈800 proteins were resolved by 2-DE (Fig. 1C). So far, 54 protein spots have been identified within the CSN of M. tuberculosis H37Rv (see table on the Web page of this journal: http://www.blackwell-science.com/mmi). Similar to the CP patterns, CSN patterns were highly comparable. Compared with CP, CSN proteins occurred in more spot series relative to the total number of spots (Fig. 1C). This suggests the existence of different protein species, probably resulting from post-translational modification, such as phosphorylation, glycosylation or acylation. The higher proportion of spot series in CSN could also be caused by the higher load per protein on the gel, by a higher degree of post-translational modification of secreted proteins or by degradation of proteins outside of the bacterial cell. For instance, in CSN, three adjacent series containing eight spots were stained. Four of these spots were identified by PMF as elongation factor Tu (Tuf), Rv0685. The 14 kDa antigen (Rv2031c) and the 10 kDa chaperonin (Rv3418c) appeared as six and five spots respectively. An example from CP, steroid dehydrogenase of BCG Chicago corresponding to Rv0148, occurred in six spots randomly distributed within one sector of the 2-DE pattern.
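Why a charge-changing modification produces a horizontal spot series can be illustrated by estimating an isoelectric point from a sequence. The Python sketch below is an approximation: it uses one set of approximate textbook pKa values (tools such as ExPASy Compute pI/Mw use a different pK set, so absolute values will differ), treats all ionizable groups as independent, and models lysine acylation simply as the loss of one basic group. The sequence is a hypothetical toy example.

```python
# Approximate pKa values (illustrative only; tools differ in the set they use).
PKA_BASIC = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"Cterm": 2.3, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def ionizable_groups(seq: str, acylated_lys: int = 0) -> dict[str, int]:
    """Count ionizable groups; each acylated Lys loses its basic side-chain amine."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    if not 0 <= acylated_lys <= counts["K"]:
        raise ValueError("acylated_lys must lie between 0 and the number of Lys residues")
    counts["K"] -= acylated_lys
    counts["Nterm"] = 1
    counts["Cterm"] = 1
    return counts


def net_charge(groups: dict[str, int], ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    pos = sum(n / (1 + 10 ** (ph - PKA_BASIC[g])) for g, n in groups.items() if g in PKA_BASIC)
    neg = sum(n / (1 + 10 ** (PKA_ACIDIC[g] - ph)) for g, n in groups.items() if g in PKA_ACIDIC)
    return pos - neg


def isoelectric_point(groups: dict[str, int], tol: float = 1e-4) -> float:
    """Bisection on pH 0-14; the net charge falls monotonically as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(groups, mid) > 0:
            lo = mid  # still positively charged: the pI lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    toy = "MSKEALKDGTKVELHAEKGDRY"  # hypothetical sequence
    for n in range(3):
        print(n, round(isoelectric_point(ionizable_groups(toy, n)), 2))
```

Each additional acylated lysine moves the computed pI to a more acidic value while changing the mass only slightly, which is the pattern of a horizontal spot series. Exchanging a basic for an acidic residue, one possible cause of the mobility variants found in the strain comparisons, shifts the pI in the same direction.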

Comparison of protein patterns from different M. tuberculosis and M. bovis BCG strains

The genomes of the M. tuberculosis complex, which comprises all four strains investigated here, are highly conserved (Sreevatsan et al., 1997). The 2-DE patterns confirm the prediction that the vast majority of proteins have counterparts in all strains investigated. However, there were also clear differences between these strains in spot intensity, presence or absence, and spot position. We concentrated our evaluation on readily detectable spot variations that were consistent in all 2-DE patterns. The investigation was primarily aimed at the elucidation of proteins occurring exclusively in the virulent strains, in order to detect potential virulence factors and candidate vaccine antigens (Table 1). Between BCG Chicago and H37Rv, 31 variants were detected: in comparison with BCG, H37Rv comprised 13 additional spots and lacked eight spots; nine spots were decreased in intensity, and one spot was increased. Of the 31 variants, 25 were identified by PMF. Six identified proteins in H37Rv had no counterpart in BCG: L-alanine dehydrogenase (40 kDa antigen, Rv2780), isopropyl malate synthase (Rv3710), nicotinate-nucleotide pyrophosphatase (Rv1596), MPT64 (Rv1980c) and two conserved hypothetical proteins (Rv2449c and Rv0036c). The absence of L-alanine dehydrogenase and of MPT64 in BCG confirms previous observations (Andersen et al., 1992; Li et al., 1993; Behr & Small, 1999). Eight of the ± variants were shown to be mobility variants, possibly caused by amino acid exchanges or post-translational modifications. Two obvious positional variations, one intensity variant and one ± variant are shown in Fig. 3A. Succinyl-CoA synthase α-chain (Rv0952) shifted from a higher-Mr variant in BCG to a lower one in H37Rv. An oxidoreductase of the aldo/keto reductase family (Rv2971) was shifted diagonally from a more basic, lower-Mr form in BCG to a more acidic, higher-Mr form in H37Rv. Alkyl hydroperoxide reductase chain C (Rv2428) was decreased in intensity in H37Rv, and MPT64 (Rv1980c) occurred as an additional spot in H37Rv.
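The comparisons in this section were made by visual evaluation. As a hypothetical computational analogue, the Python sketch below sorts matched and unmatched spots into the categories used in Table 1 and keeps only calls made in every replicate comparison. The 2-fold intensity cut-off, the spot IDs and the intensities are assumptions for illustration, and spot IDs are assumed to refer to consistent master-gel spots across replicates. Mobility variants are not modelled: a shifted spot appears as one "absent" and one "additional" call, which is exactly the ambiguity between mobility and ± variants raised in the text.

```python
from collections.abc import Iterable, Mapping

CATEGORIES = ("increased", "decreased", "absent", "additional")


def classify(ref: Mapping[str, float], test: Mapping[str, float],
             matches: Mapping[str, str], fold: float = 2.0) -> dict[str, set[str]]:
    """Call spot differences between one reference gel and one test gel.

    ref, test: spot ID -> normalized spot intensity
    matches:   reference spot ID -> matched test spot ID
    fold:      hypothetical intensity-ratio cut-off
    """
    calls: dict[str, set[str]] = {c: set() for c in CATEGORIES}
    for ref_id, ref_int in ref.items():
        test_id = matches.get(ref_id)
        if test_id is None:
            calls["absent"].add(ref_id)  # spot missing from the test pattern
            continue
        ratio = test[test_id] / ref_int
        if ratio >= fold:
            calls["increased"].add(ref_id)
        elif ratio <= 1.0 / fold:
            calls["decreased"].add(ref_id)
    matched = set(matches.values())
    calls["additional"] = {t for t in test if t not in matched}  # only in test
    return calls


def reproducible(replicates: Iterable[dict[str, set[str]]]) -> dict[str, set[str]]:
    """Keep only calls made in every replicate comparison."""
    reps = list(replicates)
    if not reps:
        raise ValueError("need at least one replicate comparison")
    return {c: set.intersection(*(r[c] for r in reps)) for c in CATEGORIES}


if __name__ == "__main__":
    ref = {"S1": 10.0, "S2": 4.0, "S3": 2.0}
    pairs = {"S1": "T1", "S2": "T2"}
    gel_a = classify(ref, {"T1": 3.0, "T2": 4.2, "T9": 1.5}, pairs)
    gel_b = classify(ref, {"T1": 4.5, "T2": 9.0, "T9": 1.1}, pairs)
    print(reproducible([gel_a, gel_b]))
```

In the example, the increase of S2 is seen in only one replicate and is therefore discarded, mirroring the rule that only differences confirmed in all preparations were accepted.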

Table 1. Protein variability between cell proteins (CP) of different strains.

    Four comparisons were performed: (A) M. bovis BCG Chicago CP versus M. tuberculosis H37Rv CP; (B) M. tuberculosis H37Rv CP versus Erdman CP; (C) M. bovis BCG Chicago CP versus Copenhagen CP; and (D) M. bovis BCG Chicago CP versus M. tuberculosis Erdman CP. Each strain was prepared at least three times, and gels of at least three independently prepared samples were compared. Some obvious differences were checked for reproducibility, and only variations occurring reproducibly in all gels of one strain were accepted. From these 59 variant spots, we identified 50 proteins. [↑] spot intensity increased; [↓] spot intensity decreased; [−] spot not detected in the 2-DE pattern; MV, mobility variant (spot position shifted; the following spot number corresponds to the shifted spot).

    Figure 3. Pattern sectors showing differences in intensity or position between cell proteins of different mycobacterial strains. A. Comparison between (A, C and E) M. bovis BCG Chicago and (B, D and F) M. tuberculosis H37Rv. C645 is a mobility variant of C527. Both spots were identified as succinyl-CoA synthase α-chain (Rv0952). C126 and C125 are mobility variants, both identified as oxidoreductases of the aldo/keto reductase family (Rv2971). C31 was increased in intensity in BCG Chicago compared with C53 of H37Rv. This protein was identified as alkyl hydroperoxide reductase chain C (Rv2428). C71 was absent from BCG Chicago and was identified as MPT64 (Rv1980c). B. Comparison of (A and C) M. tuberculosis H37Rv with (B and D) Erdman. Proteins of the glutamate family were increased in intensity in the Erdman pattern: A511 and A195 and their corresponding spots in H37Rv, A386 and B17, were acetylornithine aminotransferases ArgD (Rv1655), and D20 was N-acetyl-glutamylphosphate reductase (Rv1652). Two spots in (A) and (B) were shifted to a more acidic position in the Erdman pattern; A473 and A267 were identified as transcriptional regulator MoxR (Rv1479). The region shown in (C) and (D) revealed three intensity differences: D59 was identified as Rv3213c, D153 as Rv1996 and D10 as haloalkane dehalogenase Rv2296.

    Comparison between M. tuberculosis Erdman and M. bovis BCG Chicago revealed four mobility variants: an oxidoreductase of the aldo/keto reductase family (described as Rv2971 in H37Rv), succinyl-CoA synthase α-chain (Rv0952), S-adenosylmethionine synthase (Rv1392) and transcriptional regulator MoxR (Rv1479).

    Comparison of the 2-DE patterns of M. tuberculosis H37Rv and Erdman revealed 18 variant proteins, 16 of which were identified. In the M. tuberculosis Erdman proteome, six protein species were increased in intensity, two appeared newly, six were absent and two were mobility variants. Some examples are shown in Fig. 3B. Two spots of the acetylornithine aminotransferase ArgD (Rv1655) were present in both H37Rv and Erdman, but both had clearly higher intensities in Erdman. The transcriptional regulator MoxR (Rv1479) was shifted to a more acidic position in the Erdman 2-DE pattern. The haloalkane dehalogenase (Rv2296), two spots containing L-alanine dehydrogenase (Rv2780) and protease IV (Rv0724) were absent from the Erdman proteome, whereas the unknown protein Rv3213c, which shares similarity with a Soj protein of possible relevance to chromosome segregation, and the conserved hypothetical protein Rv2641 were absent from the H37Rv proteome.

    BCG Chicago and Copenhagen expressed highly similar 2-DE patterns. Only three obvious variants were identified. The conserved hypothetical protein Rv0968 was absent in the Copenhagen proteome, and two spots of a probable neuraminidase (Rv3463) were increased in intensity in the Chicago strain.

    The detected mobility variants raise the question of whether some putative ± variants have an as yet unidentified counterpart and thus represent mobility variants rather than ± variants. However, positional variants are also interesting vaccine candidates if the positional variation is caused by amino acid exchanges within sequences relevant to T-cell recognition. If the positional change is caused by post-translational modification, then the gene per se is not different, shifting the interest towards the enzyme that mediates the protein modification.

    Discussion

    Proteome analysis of a biological entity depends on separation methods appropriate for the complexity of this entity. Whereas proteomes of ribosomes containing about 50–100 protein species can be investigated by small 2-DE systems (Kaltschmidt and Wittmann, 1970) or high-performance liquid chromatography (HPLC) (Kamp et al., 1984), proteome analysis of bacteria and higher organisms requires high-resolution techniques. The combination of isoelectric focusing (Vesterberg and Svensson, 1966) in its NEPHGE modification (O'Farrell et al., 1977) and SDS–PAGE (Laemmli, 1970), both per se high-resolution methods, and the use of large-sized gels (at least 20 cm × 30 cm) results in a resolution power of 5000–10 000 protein species (Klose and Kobalz, 1995), with sufficient quality to allow the comparison of gels between different laboratories (Jungblut et al., 1994). The preparation of the mycobacterial proteins has been simplified as much as possible to obtain reproducible samples. Growth stage and culture medium have been kept constant for the presented results. These results are a prerequisite for further experiments to elucidate spatial and time-dependent protein compositions.

    We describe comparative proteome analyses of different Mycobacterium strains as a first step towards the post-genomic characterization of medically important pathogens. We embarked on this endeavour with the aim of gaining insights into differences in protein composition between attenuated vaccine strains and virulent M. tuberculosis, which may aid in the design of novel vaccines and chemotherapy for the control of tuberculosis.

    Of the 263 proteins identified by 2-DE in the different preparations analysed (CP and CSN of both M. tuberculosis and M. bovis BCG), about one third corresponded to housekeeping proteins involved in gene regulation, biosynthesis, degradation or metabolism. We applied about 200 μg of protein to the preparative gels. With a mean detection limit of 100 ng for Coomassie brilliant blue R250, the lowest protein level investigated can be estimated at about 0.05% of the total protein amount. Of the 366 predicted proteins putatively associated with the cell envelope (Cole et al., 1998), three were identified (Rv1980c, Rv2145c and Rv0475). Although our sample preparation and 2-DE technique are, in principle, able to resolve even lipoproteins, as we have shown for OspA of Borrelia garinii (Jungblut et al., submitted), none of the 65 predicted lipoproteins has so far been identified in the mycobacterial strains studied.
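As a worked check, dividing the Coomassie detection limit by the amount of protein applied gives the smallest fraction of the total load that a detectable protein can represent, under the simplifying assumption that the whole amount of that protein is concentrated in a single spot:

```latex
\frac{m_{\text{detect}}}{m_{\text{load}}}
  = \frac{100\ \text{ng}}{200\ \mu\text{g}}
  = \frac{100\ \text{ng}}{2 \times 10^{5}\ \text{ng}}
  = 5 \times 10^{-4}
  = 0.05\,\%
```

A protein distributed over a spot series, as observed for several CSN proteins, must be correspondingly more abundant before each individual spot reaches this limit.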

    Four polypeptides play a role in transcription control, such as the RNA polymerase A (Rv3457c) and the transcription termination protein Rho (Rv1297). Four proteins are ribosomal proteins, such as the 50S L7/L12 (Rv0652), and seven proteins are involved in protein translation and modification, such as the elongation factors Tu (Rv0685) and Ts (Rv2889c) and the homologue to the transcription elongation factor GreA of M. leprae (Rv1080). The EF-Tu was present in the CP as well as in the CSN. This factor has been localized to the cell wall of M. leprae and is associated with the membrane and periplasmic space of other bacteria, such as Escherichia coli and Neisseria gonorrhoeae, but its function remains uncertain (Jacobson & Rosenbusch, 1976; Porcella et al., 1996; Marques et al., 1998).

    Two two-component response regulators (Rv1626 and Rv3133c) are present in the proteome. One of these proteins, Rv1626, shows strong similarity to two-component systems of Methanobacterium thermoautotrophicum, Azotobacter vinelandii and Streptomyces coelicolor, indicating that mycobacteria use environmental sensor and regulation systems similar to those of other prokaryotes (Gutierrez et al., 1995; Brian et al., 1996; Smith et al., 1997). In A. vinelandii, this protein is involved in negative regulation of the nitrite–nitrate reductase system. In S. coelicolor, an actinomycete closely related to the mycobacteria, it is a negative regulatory element in the synthesis of antibiotics. MoxR (Rv1479), which was apparently modified in H37Rv compared with Erdman, is a putative regulatory molecule probably involved in the formation of an active methanol dehydrogenase, as shown for Paracoccus denitrificans (Van Spanning et al., 1991). Similarly, the 40 kDa antigen (Rv2780), an alanine dehydrogenase that is unique to M. tuberculosis and M. marinum (Andersen et al., 1992), was upregulated in H37Rv compared with Erdman. It is not yet clear whether this polypeptide is expressed exclusively in virulent mycobacteria. However, it could contribute to virulence, because it has been implicated in the cell wall synthesis machinery, L-alanine being an important constituent of the peptidoglycan layer. Consistent with this notion, the protein is also present in the mycobacterial cell wall and even in the outermost capsule (Ortalo-Magne et al., 1995).

    Twenty-five protein spots were identified as putative heat shock proteins, including Hsp60 (groEL2; Rv0440), Hsp70 (DnaK; Rv0350), Hsp10 (GroES; Rv3418c) and ClpB (Rv0384c) (Kaufmann and Andersen, 1998). Owing to the high sequence homology between mycobacterial and human Hsp60, it has been suggested that this protein may be involved in infection-triggered autoimmune responses (Zügel and Kaufmann, 1999). DNA vaccination experiments also indicate that Hsp60 is a potential vaccine candidate (Tascon et al., 1996). A 14 kDa protein (HspX; Rv2031c) related to the heat shock protein α-crystallin is a strong inducer of antibodies in patients with pulmonary tuberculosis (Verbon et al., 1992). Interestingly, both M. bovis BCG and M. tuberculosis contain a putative rotamase (peptidyl-prolyl cis–trans isomerase; Rv0009) homologous to cyclophilins, the specific receptors for the immunosuppressive drug cyclosporin A. Like heat shock proteins, cyclophilins may participate in protein folding (Resch and Szamel, 1997; Wang and Tsou, 1998).

    A number of proteins identified within the mycobacterial proteome are involved in biosynthesis/degradation of fatty acids and glycolipids, which are essential components of the complex acid-fast cell wall. Examples are the methoxy mycolic acid synthase 4 (Rv0642c), the enoyl (ACP) reductase (Rv1484) and β-ketoacyl (ACP) synthase (Rv2246), which are central to the biosynthesis of mycolic acids and have recently been identified as targets for isoniazid (Sacchettini & Blanchard, 1996; Mdluli et al., 1998; Rozwarski et al., 1998). Members of the antigen 85 complex (Rv1886c, Rv3803c and Rv3804c) are also part of the enzymatic cascade of the cell wall synthesis, i.e. mycolyl transferases, but apparently also have the potential to mediate mycobacterial binding to fibronectin (Abou-Zeid et al., 1988; Belisle et al., 1997). In addition, they are considered as vaccine candidates (Kaufmann and Andersen, 1998).

    A group of proteins whose function is not yet fully understood, but which show high homology to oestradiol dehydrogenases (Rv0148), is also of interest. This class of enzymes has so far only been identified in eukaryotes such as yeast and humans (Baker, 1990; Sloots et al., 1991). In Candida tropicalis, the homologue is present in peroxisomes and has hydratase–dehydrogenase–epimerase activity, which is important in fatty acid β-oxidation (Baker, 1990; Sloots et al., 1991). A role for these enzymes in sugar metabolism and in the synthesis of antibiotics and steroids, such as oestradiol, corticosterone and hydrocortisone, appears likely. Although steroids are unlikely constituents of prokaryotes, steroid synthesis has recently been demonstrated for M. smegmatis (Lamb et al., 1998), and homologues of yeast cholesterol synthesis enzymes have been found in the M. tuberculosis genome. It is also of interest that the homologous gene in the yeast C. tropicalis contains response elements for oleic acid, which is a carbon source for mycobacteria (Baker, 1990; Sloots et al., 1991).

    Among the proteins identified within the mycobacterial proteome, several have been suggested as mycobacterial antigens of putative value for vaccine development and/or diagnosis. These include the alanine dehydrogenase (Rv2780), Hsp60 (Rv0440), Hsp70 (Rv0350), members of the antigen 85 complex (Rv1886c, Rv3803c and Rv3804c), α-crystallin (Rv2031c) and the 35 kDa antigen (Rv2744c) (O'Connor et al., 1990; Kaufmann and Andersen, 1998). The mycobacteria-specific 34 kDa protein, termed antigen 84 (Rv2145c), has been identified in M. kansasii, M. bovis BCG, M. leprae and M. tuberculosis and is recognized by antibodies in 60% of lepromatous leprosy patients (Hermans et al., 1995). MPT64 (Rv1980c) and MPT51 (Rv3803c), a homologue of antigen 85, are both CSN proteins, and MPT64 is a known inducer of delayed-type hypersensitivity responses in guinea pigs (Kaufmann and Andersen, 1998).

    The genome sequence revealed several genes for lipases, phospholipases C, esterases and proteases potentially contributing to mycobacterial virulence (Cole et al., 1998). So far, only two alkyl hydroperoxide reductases (AhpC Rv2428, AhpD Rv2429) have been identified within the proteome. Although not fully proven, these enzymes could play a role in resistance to reactive nitrogen intermediates (Chen et al., 1998).

    According to the annotation of the M. tuberculosis genome sequence, 39 of the proteins analysed to date are conserved hypothetical proteins and six are of unknown function. About 1500 as yet unidentified proteins leave room for further analysis, as they may include candidates responsible for virulence, intracellular survival or drug resistance.

    Complete genome sequencing of microbial pathogens not only offers insights into the biology of prokaryotes, but also provides the basis for rational prevention and therapy of infectious diseases. As a next step towards this goal, determination of protein expression, protein composition, post-translational modification and, ultimately, function is required. Disclosure of the proteomes of the tubercle bacillus and the vaccine strain BCG serves to complement available genome information of M. tuberculosis towards the development of improved measures for tuberculosis control.

    Experimental procedures

    Mycobacteria

    In this study, two virulent strains of M. tuberculosis, H37Rv and Erdman, and two vaccine strains, M. bovis BCG Chicago and Copenhagen, were analysed. To prepare CP, mycobacteria were grown in Middlebrook medium for 6–8 days to a cell density of 1–2 × 10⁸ cells ml⁻¹. The cells were washed and sonicated in the presence of proteinase inhibitors, and the proteins were treated with 9 M urea, 70 mM dithiothreitol (DTT) and 2% Triton X-100 to obtain completely denatured and reduced proteins. Culture supernatant proteins were prepared from mycobacterial cultures grown in Sauton medium under permanent shaking for 10–15 days or without shaking for 30 days to obtain a cell density of 1–2 × 10⁸ cells ml⁻¹. CSNs were collected by filtration and precipitation in 10% trichloroacetic acid.

    Two-dimensional electrophoresis

    For the resolution of the mycobacterial proteome, we applied a 2-DE gel system (Klose and Kobalz, 1995) in a 23 cm × 30 cm version and a resolution power of about 5000 protein species. For subtractive analyses (Aebersold and Leavitt, 1990) and database construction, we applied 50–100 μg of protein to the anodic side of the IEF gel. In the second dimension, we used 0.75-mm-thick gels. The proteins were detected by silver staining optimized for these gels (Jungblut and Seifert, 1990). For the identification of proteins, 200–300 μg of protein were applied and, in the second dimension, 1.5-mm-thick gels were used. The proteins were stained by Coomassie brilliant blue R250 (Eckerskorn et al., 1988), G250 (Doherty et al., 1998), or negative staining (Fernandez-Patron et al., 1995).

    Peptide mass fingerprinting

    In-gel tryptic digestion was performed using a peptide-collecting device to concentrate and wash the peptide mixture (Otto et al., 1996). The peptide solution was mixed with an equal volume of a saturated solution of α-cyano-4-hydroxycinnamic acid in 50% acetonitrile, 0.3% TFA, and 2 μl was applied to the sample template of a matrix-assisted laser desorption/ionization mass spectrometer (Voyager Elite; PerSeptive Biosystems). Data were obtained using the following parameters: 20 kV accelerating voltage, 70% grid voltage, 0.050% guide wire voltage, 100 ns delay and a low mass gate of 500.
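The 20 kV accelerating voltage sets how flight time scales with ion mass. In an idealized linear time-of-flight analyser, an ion of mass m and charge ze accelerated through a potential U gains kinetic energy zeU, so its flight time over a field-free path of length L is t = L·sqrt(m / (2zeU)). The Python sketch below evaluates this relation; the 2 m flight path is a hypothetical value, and the delayed extraction (100 ns delay) and grid-voltage settings of the actual instrument are ignored.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def flight_time_us(mass_da: float, charge: int = 1, voltage_v: float = 20_000.0,
                   path_m: float = 2.0) -> float:
    """Idealized linear TOF flight time in microseconds.

    Energy balance z*e*U = m*v**2 / 2 gives v = sqrt(2*z*e*U / m) and t = L / v.
    path_m is a hypothetical flight-path length.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * E_CHARGE * voltage_v / mass_kg)
    return path_m / velocity * 1e6


if __name__ == "__main__":
    for m in (1000.0, 2000.0, 4000.0):
        print(m, round(flight_time_us(m), 2))
```

Because t is proportional to the square root of m/z, quadrupling the peptide mass only doubles the flight time, which is why TOF spectra are calibrated with m/z as a quadratic function of flight time.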

    Peptide mass fingerprints were searched using the program MS-Fit (http://prospector.ucsf.edu/ucsfhtml/msfit.htm), restricting the NCBI database to mycobacterial proteins within a molecular mass range of ±20% around the value estimated from 2-DE and allowing a mass accuracy of 0.1 Da for peptide masses. If no match was found, the molecular mass window was extended. Partial enzymatic cleavage (up to two missed cleavage sites), oxidation of methionine, pyroglutamic acid formation at N-terminal glutamine and modification of cysteine by acrylamide were considered in these searches.
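The search settings above can be illustrated with a minimal in silico peptide mass fingerprint. The Python sketch below is not MS-Fit: it digests a single candidate sequence with trypsin (cleavage after K or R except before P, allowing up to two missed cleavages, which is our reading of the partial-cleavage setting), computes monoisotopic [M+H]+ masses with variable methionine oxidation, cysteine–acrylamide adducts and N-terminal pyroglutamate from glutamine, and matches observed masses within 0.1 Da. The ±20% Mr pre-filter and any scoring are omitted, and the sequence in the example is hypothetical.

```python
import itertools

# Monoisotopic residue masses (Da)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276
MET_OX = 15.994915     # methionine sulfoxide
CYS_ACRYL = 71.037114  # propionamide (acrylamide adduct) on Cys
PYRO_GLU = -17.026549  # loss of NH3 from N-terminal Gln


def tryptic_spans(seq: str, missed: int = 2) -> list[tuple[int, int]]:
    """Half-open [start, end) spans of tryptic peptides with <= `missed` missed sites."""
    sites = [i + 1 for i in range(len(seq) - 1) if seq[i] in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    return [(bounds[i], bounds[j])
            for i in range(len(bounds) - 1)
            for j in range(i + 1, min(i + 2 + missed, len(bounds)))]


def mh_masses(pep: str):
    """Yield [M+H]+ for every combination of the variable modifications."""
    base = sum(MONO[aa] for aa in pep) + WATER + PROTON
    pyro = (0.0, PYRO_GLU) if pep.startswith("Q") else (0.0,)
    for n_ox, n_cys, dq in itertools.product(
            range(pep.count("M") + 1), range(pep.count("C") + 1), pyro):
        yield base + n_ox * MET_OX + n_cys * CYS_ACRYL + dq


def fingerprint(seq: str, observed: list[float], tol: float = 0.1, missed: int = 2):
    """Map each matched observed mass to the peptide spans that explain it."""
    hits: dict[float, list[tuple[int, int]]] = {}
    for start, end in tryptic_spans(seq, missed):
        masses = list(mh_masses(seq[start:end]))
        for mz in observed:
            if any(abs(mz - m) <= tol for m in masses):
                hits.setdefault(mz, []).append((start, end))
    return hits


def coverage(seq: str, hits) -> float:
    """Fraction of residues covered by at least one matched peptide."""
    covered = {i for spans in hits.values() for s, e in spans for i in range(s, e)}
    return len(covered) / len(seq)


if __name__ == "__main__":
    toy = "AGKLMRPQEKCAR"  # hypothetical candidate sequence
    obs = [next(mh_masses("AGK")), next(mh_masses("LMRPQEK")) + MET_OX]
    hits = fingerprint(toy, obs)
    print(hits, round(coverage(toy, hits), 3))
```

The coverage value returned here is the quantity to which the 30% acceptance threshold described in the Results applies.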

    Database construction

    Gels were digitized with a UMAX Mirage IIse scanner using the topspot software (WITA). Before spot detection, the gels were divided into six sectors, in which spots were detected automatically and then corrected interactively. The resulting map files, together with GIF image files and the identification data collected in an Access database, were entered into the 2-DE database (H.-J. Mollenkopf et al., 1999).
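The relationship between gels, detected spots and identifications can be pictured as a small relational model. The sketch below uses Python's built-in sqlite3 module; the table and column names are hypothetical and are not the schema of the authors' Access database, and the inserted spot and protein values are placeholders.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gel (
    gel_id   TEXT PRIMARY KEY,          -- e.g. strain plus fraction
    strain   TEXT NOT NULL,
    fraction TEXT NOT NULL CHECK (fraction IN ('CP', 'CSN'))
);
CREATE TABLE spot (
    gel_id  TEXT NOT NULL REFERENCES gel(gel_id),
    spot_id TEXT NOT NULL,              -- e.g. 'C71'
    sector  TEXT NOT NULL,              -- one of the six gel sectors
    pi      REAL,
    mr_kda  REAL,
    PRIMARY KEY (gel_id, spot_id)
);
CREATE TABLE identification (
    gel_id    TEXT NOT NULL,
    spot_id   TEXT NOT NULL,
    rv_number TEXT NOT NULL,            -- H37Rv gene designation
    method    TEXT NOT NULL CHECK (method IN ('PMF', 'PMF+PSD', 'position')),
    coverage  REAL,                     -- fraction of sequence covered by PMF
    FOREIGN KEY (gel_id, spot_id) REFERENCES spot(gel_id, spot_id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the tables and enforce foreign keys (off by default in SQLite)."""
    con = sqlite3.connect(path)
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(SCHEMA)
    return con


def proteins_in_gel(con: sqlite3.Connection, gel_id: str):
    """List identified spots of one gel with their coordinates."""
    return con.execute(
        "SELECT s.spot_id, s.pi, s.mr_kda, i.rv_number FROM spot AS s "
        "JOIN identification AS i ON i.gel_id = s.gel_id AND i.spot_id = s.spot_id "
        "WHERE s.gel_id = ? ORDER BY s.spot_id",
        (gel_id,),
    ).fetchall()


if __name__ == "__main__":
    con = open_db()
    con.execute("INSERT INTO gel VALUES ('H37Rv-CP', 'M. tuberculosis H37Rv', 'CP')")
    con.execute("INSERT INTO spot VALUES ('H37Rv-CP', 'S1', 'C', 5.1, 25.0)")  # placeholder
    con.execute("INSERT INTO identification VALUES ('H37Rv-CP', 'S1', 'RvXXXX', 'PMF', 0.42)")
    print(proteins_in_gel(con, "H37Rv-CP"))
```

Keeping identifications in a separate table allows one spot to carry two proteins, as found for three BCG spots, and records whether an assignment came from PMF, PSD or position comparison.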

    Acknowledgements

    This project was made possible through the financial support of Chiron Behring, Marburg, Germany.
