Cloning and expression of multiple integral membrane proteins from Mycobacterium tuberculosis in Escherichia coli


  • Alla Korepanova,

    1. Department of Chemistry and Biochemistry, Florida State University, Tallahassee, Florida 32306, USA
    2. National High Magnetic Field Laboratory, Florida State University, Tallahassee, Florida 32306, USA
  • Fei P. Gao,

    1. Department of Chemistry and Biochemistry, Florida State University, Tallahassee, Florida 32306, USA
    2. National High Magnetic Field Laboratory, Florida State University, Tallahassee, Florida 32306, USA
  • Yuanzi Hua,

    1. Department of Chemistry and Biochemistry, Florida State University, Tallahassee, Florida 32306, USA
  • Huajun Qin,

    1. Department of Chemistry and Biochemistry, Florida State University, Tallahassee, Florida 32306, USA
  • Robert K. Nakamoto,

    1. Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22908, USA
  • Timothy A. Cross

    Corresponding author
    1. Department of Chemistry and Biochemistry, Florida State University, Tallahassee, Florida 32306, USA
    2. National High Magnetic Field Laboratory, Florida State University, Tallahassee, Florida 32306, USA
    3. Institute of Molecular Biophysics, Florida State University, Tallahassee, Florida 32306, USA
    • National High Magnetic Field Laboratory, 1800 E. Paul Dirac Drive, Tallahassee, FL 32310, USA; fax: (850) 644-1366.


Seventy integral membrane proteins from the Mycobacterium tuberculosis genome have been cloned and expressed in Escherichia coli. A combination of T7 promoter-based vectors with hexa-His affinity tags and BL21 E. coli strains with additional tRNA genes to supplement sparsely used E. coli codons has been most successful. The expressed proteins have a wide range of molecular weights and numbers of transmembrane helices. Expression of these proteins has been observed in the membrane and insoluble fractions of E. coli cell lysates and, in some cases, in the soluble fraction. The highest expression levels in the membrane fraction were restricted to a narrow range of molecular weights and relatively few transmembrane helices. In contrast, overexpression in insoluble aggregates was distributed over a broad range of molecular weights and numbers of transmembrane helices.

There is great scientific and pharmaceutical interest in integral membrane proteins because they are responsible for many essential cellular functions and represent >50% of all potential drug targets. Membrane proteins account for 20% to 30% of genes in sequenced genomes (Boyd et al. 1998; Wallin and von Heijne 1998); however, only 0.3% of the atomic resolution structures in the Protein Data Bank represent membrane proteins (Grisshammer 2003). There are many reasons why membrane proteins are so challenging for structural biology, the first of which is achieving high-level expression. Membrane protein expression is still a matter of “trial and error.” To date there are few comprehensive statistics on membrane protein expression. Several published statistical studies concentrate solely on soluble protein expression and purposely exclude membrane proteins as their targets (Christendat et al. 2000; Braun et al. 2002; Hammarström et al. 2002). Only recently has information on expression and purification of individual membrane proteins begun to accumulate. During the last decade, over two dozen bacterial transporter proteins have been overexpressed in Escherichia coli and purified in milligram quantities (Wang et al. 2003). A number of successful procedures for the expression and crystallization of the outer membrane proteins have also been developed (Bannwarth and Schulz 2003).

We have taken a genomics approach to methods development for expression of integral membrane proteins using the Mycobacterium tuberculosis strain H37Rv genome. M. tuberculosis is the causative agent of tuberculosis, the leading cause of infectious mortality in the world today. Nearly 3 million people die annually from this infection. As much as a third of the world's population carries this bacillus in their lungs, with 10 million cases in the United States alone. Increasingly, there is concern that multiple-drug resistance will leave many more susceptible to this disease. Structural analysis of the integral membrane proteins could help to better understand the molecular details of infection and help in the design of a new generation of more effective drugs, but the cloning and expression of these proteins are the first hurdles.

Membrane proteins can be divided into two superfamilies—the β-barrel proteins of the outer membranes and those membrane proteins that have transmembrane α-helices. In this effort we have focused exclusively on the α-helical proteins, although a few putative β-barrel transmembrane proteins have been predicted from the genome. Here, we have attempted to clone and express many putative membrane proteins by using a minimal set of expression vectors, E. coli host strains, and culture conditions. The practical goal of this research is to evaluate the technical details involved in membrane protein expression and to establish a robust initial approach for targeted membrane proteins.


Target selection

Figure 1 shows the distribution by molecular weight of the 1162 open reading frames (ORFs) identified in the M. tuberculosis database as putative helical membrane proteins and the 143 ORFs that have been targeted for this study. More than 60% of the potential membrane proteins from M. tuberculosis are <40 kDa, and >15% are <20 kDa. This distribution reflects only the molecular weights of the protein monomers and not the molecular weights of the functional membrane proteins as multiprotein complexes or oligomeric structures. The analysis of the transmembrane helical content of the ORFs in the genome indicates that most membrane proteins have just one or two transmembrane helices (Fig. 2), and 84% of these putative membrane proteins have fewer than seven transmembrane helices. During the course of this study, four of the expressed proteins were found to be lipoproteins rather than integral membrane proteins and are not included in the following analysis. In addition, two ORFs were excluded after we identified them as secreted proteins, i.e., proteins with a cleavable transmembrane signal sequence.

For the targeted ORFs we aimed to generate a large and diverse collection of membrane proteins biased somewhat toward low-molecular weight proteins that would be appropriate for structural characterization by NMR spectroscopy. While the targeted ORFs show this bias, the distribution by number of transmembrane helices resembles the complete set of 1162 membrane protein ORFs of this genome. Consequently, the bias in molecular weight distribution primarily reflects a bias against large extramembranous domains in our targeted set. Qualitatively, the ORFs in this sample were functionally annotated as putative membrane proteins (38%), conserved proteins (42%), and unknown or hypothetical proteins (20%).

Cloning of selected ORFs and modification of expression vectors

Cloning was based on the polymerase chain reaction (PCR). Primers were designed when possible to optimize codon usage, especially at the beginning of the cloned DNA sequences. Ten percent of initial PCR amplifications required additional optimization of temperature conditions to generate a desired PCR product. Overall, >85% of the DNA fragments encoding targeted proteins were successfully amplified from M. tuberculosis H37Rv genomic DNA (obtained from the TB Research Materials and Vaccine Testing Contract, Dr. John Belisle, Colorado State University).

The initial cloning effort of 137 targeted membrane protein ORFs showed that >72% were successfully cloned and ligated into one of two expression plasmid vectors. A total of 87 ORFs were cloned into pET16b (Novagen, Inc.) and 12 ORFs were cloned into pET29b(+), with short nonremovable N- or C-terminal (His)6 tags, respectively. Because >85% of the targeted fragments were amplified successfully, approximately half of the failures in cloning occurred at the PCR stage. Table 1 provides a listing of the successfully cloned ORFs from the targeted list. As shown in Figure 3A, cloning efficiency is significantly higher at low molecular weight (83% <30 kDa) than at high molecular weight (54% >30 kDa). In addition, cloning efficiency appears to be lower for proteins with a single transmembrane helix (58%) than for proteins with larger numbers of helices (81%; Fig. 3B).

Protein expression

In the interest of high throughput, a single set of growth conditions for testing membrane protein expression was used. Of the 99 successfully cloned membrane protein genes, 70 (70%) showed expression in E. coli BL21(DE3) CodonPlus-RP (Stratagene) or C43(DE3) (Avidis; Miroux and Walker 1996) strains. The C43(DE3) cells were used for a secondary expression effort, resulting in the expression of four ORFs that did not express in BL21(DE3) CodonPlus-RP cells. A total of 29 cloned genes were not expressed by these limited efforts. Figure 4 shows an example of small scale expression for ORF Rv1924c, a 14.2-kDa protein with three predicted transmembrane helices overexpressed in both the membrane and insoluble fractions of E. coli, and Table 1 displays the expression results for all of the ORFs. Very high expression was observed for 35 of the proteins using Coomassie stain, and the expression of the remaining 35 was detected by Western blots with antibodies to the N-terminal His-tag. The individual expression results are presented in supplementary material in table form. Overall, 70 M. tuberculosis membrane proteins were expressed in E. coli, with 62 proteins expressed with an N-terminal His-tag and eight proteins expressed with a C-terminal His-tag.
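The counts quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the counts (137 targeted, 99 cloned, 70 expressed, 35 detected by Coomassie stain) are taken from the text, while the stage labels and the helper name pct are ours.

```python
# Counts quoted in the text; the labels and the helper name are illustrative only.
targeted = 137   # ORFs entering the cloning effort
cloned = 99      # ORFs successfully cloned (87 in pET16b + 12 in pET29b(+))
expressed = 70   # cloned ORFs with detectable expression
coomassie = 35   # expressed ORFs visible by Coomassie stain


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


funnel = {
    "cloned / targeted": pct(cloned, targeted),
    "expressed / cloned": pct(expressed, cloned),
    "expressed / targeted": pct(expressed, targeted),
    "Coomassie-visible / expressed": pct(coomassie, expressed),
    "Coomassie-visible / targeted": pct(coomassie, targeted),
}

for stage, value in funnel.items():
    print(f"{stage}: {value}%")
```

Read this way, roughly half of the targeted ORFs reach detectable expression and about a quarter reach Coomassie-visible levels.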

The molecular weights of the expressed proteins range from 8.1 to 71.3 kDa, demonstrating expression over >90% of the molecular weight range of the M. tuberculosis membrane protein genome. In fact, observed expression seems to be virtually independent of molecular weight, with 70% efficiency for proteins having a molecular weight >30 kDa and 73% for those having a molecular weight <30 kDa (Fig. 3A).

Membrane proteins having one to 14 putative transmembrane helices have been expressed. The fraction of expressed ORFs relative to the number of cloned ORFs having three or more transmembrane helices is 64%, and the fraction of expressed ORFs with only one or two helices is 78%, indicating a small reduction in expression efficiency for membrane proteins having a larger number of transmembrane helices (Fig. 3B). Even so, 11% of our expressed proteins have more than six transmembrane helices, and only 16% of all putative membrane proteins from M. tuberculosis have more than six transmembrane helices. Despite the small decrease in efficiency for proteins with a large number of helices, this expression approach appears to be a robust general approach for membrane protein expression for this genome.

Membrane proteins are expressed in three different cell fractions: the soluble fraction, the membrane fraction, and what appears to be an insoluble fraction. More than 94% of the expressed proteins formed some degree of insoluble aggregates, 53% were detected in the membrane fraction, and 14% were observed in the soluble fraction. All of the proteins isolated from the soluble fraction were also expressed as insoluble aggregates, and 50% of these proteins were also expressed in the membrane fraction. The 50% that are not expressed in the membrane fraction have just one or two transmembrane helices, and all but one of these proteins have at least 85% of their residues outside of the hydrophobic region of the membrane (assuming that each predicted membrane-spanning segment listed in the supplementary material table is 20 residues long). The 50% that are expressed in the membranes have from two to 10 transmembrane helices, and their extramembranous content is between 34% and 57% of the total length of the protein.
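The extramembranous content quoted above follows from a simple estimate: subtract 20 residues per predicted helix from the total length. A minimal sketch of that estimate is given below; the function name and the example lengths are hypothetical, and only the 20-residue assumption comes from the text.

```python
TM_SEGMENT_LENGTH = 20  # residues per predicted helix, as assumed in the text


def extramembranous_fraction(n_residues: int, n_tm_helices: int,
                             tm_length: int = TM_SEGMENT_LENGTH) -> float:
    """Estimate the fraction of residues outside the hydrophobic membrane region."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    membrane_residues = n_tm_helices * tm_length
    if membrane_residues > n_residues:
        raise ValueError("predicted helices exceed the protein length")
    return 1.0 - membrane_residues / n_residues


# Hypothetical proteins, not entries from Table 1.
print(extramembranous_fraction(300, 2))  # long protein, two helices
print(extramembranous_fraction(100, 3))  # short protein, three helices
```

The first hypothetical protein is mostly extramembranous, the second mostly membrane-embedded, which mirrors the two groups contrasted in the paragraph above.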

Membrane proteins with just one or two transmembrane helices, based on the results in Figure 5A, are much more likely to be expressed as insoluble aggregates or “inclusion bodies” than in membranes. In fact, we have no examples of proteins with one or two helices that were expressed only in membranes, although a few were expressed in both membranes and inclusion bodies. For proteins with three or more helices, the proteins were typically expressed in both inclusion bodies and in the membrane fraction. The fraction of proteins that were, at least in part, expressed in membranes is 9% of proteins having a single helix, 36% for two helices, 81% for three helices, and 100% for those having four or more helices. In contrast, the fraction of proteins expressed in inclusion bodies having one or two helices is 100%; for three or more helices, 83%. We note, however, that these percentages are based on the assumption that the low-speed pellet represents insoluble aggregates. There have been reports that gross overexpression of some membrane proteins into membranes may cause a density change of the membrane, causing the protein-containing membrane fraction to sediment at very low speeds (see Arechaga et al. 2000).

Detection by Coomassie stain suggests considerably more expressed protein than detection by Western blot. Of the 70 expressed membrane proteins, 35 (50%) were overexpressed and detected via standard Coomassie R-250 staining of SDS-PAGE gels. The other 35 (50%) were detected via Western blot analysis only. There is a correlation between the number of helices and the probability of overexpression detected by Coomassie stain (Fig. 5B). Eighty-seven percent of the expressed proteins with a single helix were detected by Coomassie stain, compared with 40% of proteins with two or three transmembrane helices and 18% of those with four or more helices. Consequently, overexpression with this protocol is much more easily achieved for proteins with a small number of transmembrane helices. Proteins expressed in the membrane fraction and detected by Coomassie stain are restricted to proteins having two to four transmembrane helices (Fig. 5C) and a narrow molecular weight range from 10.1 to 14.9 kDa (Fig. 6A). Proteins expressed in inclusion bodies and detected by Coomassie stain are almost exclusively in the one to three transmembrane region of Figure 5C, but they are evenly distributed over the entire molecular weight range (Fig. 6B).



It is estimated by hydropathy analysis and sequence motifs that membrane proteins of the helical bundle class account for 20% to 30% of the ORFs in various genomes (Wallin and von Heijne 1998). We identified 1162 out of 3920 ORFs from M. tuberculosis that appear to encode integral membrane proteins, representing approximately one third of this bacterial genome. As seen in Figure 2, the vast majority of membrane proteins have only a few transmembrane helices per polypeptide sequence. This is in good agreement with the data obtained from a variety of microorganisms (Wallin and von Heijne 1998). The number of membrane proteins with a single transmembrane helix may be slightly overstated in this analysis due to the imperfections in the prediction algorithm. It is documented that cleavable signal peptides of secreted proteins sometimes are indistinguishable from transmembrane segments of membrane proteins and lead to false membrane protein predictions (Nielsen et al. 1997). Six of the 76 expressed proteins described here were shown not to be membrane proteins: Four are lipoproteins and two are secreted proteins. Additional analyses will be utilized in the future to identify specific lipoprotein amino acid sequences and signal peptides before selecting targets. While this will reduce the number of putative membrane proteins, the analysis also underestimates the number of β-barrel proteins for which better prediction tools are now available (Wimley 2002).


The primers for the M. tuberculosis gene amplification were designed using the most common codons for E. coli. The GC content of E. coli DNA is only 50%, while that for M. tuberculosis is 65% to 70% (Dale and Patki 1990). Using native M. tuberculosis codons may reduce the efficiency of translation in E. coli. Selective codon replacement has been shown to enhance the overexpression of several cytoplasmic mycobacterial proteins in E. coli (Lakey et al. 2000). More profound replacement of low-usage codons may become very useful during expression optimization steps when applied on a gene-by-gene basis. Our success rate for gene amplification via PCR is somewhat lower than that reported by other genomic studies of soluble proteins (Christendat et al. 2000; Hammarström et al. 2002). However, our success rate has improved significantly with experience as we have moved through the target list.
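The GC content figures quoted above are simple base-composition ratios. As a rough sketch, the fraction can be computed for any ORF as shown below; the function name and the toy sequences are hypothetical and are not taken from the M. tuberculosis genome.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_bases = sum(1 for base in seq if base in "GC")
    return gc_bases / len(seq)


# Toy sequences for illustration only.
print(gc_content("ATGGCCGCG"))  # GC-rich fragment
print(gc_content("ATGAAATTT"))  # AT-rich fragment
```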

Host strain

E. coli is one of the most widely used systems for heterologous expression. We successfully used the BL21(DE3) CodonPlus-RP E. coli strain, which is designed to compensate for codon usage differences between M. tuberculosis and E. coli (Stratagene). Most of the proteins in the tested sample were expressed using this strain, but we have also used the C43(DE3) E. coli strain. Previously, it was shown that the expression of the 2-hydroxycarboxylate transporter CitS was increased more than fivefold in C43(DE3) compared with the original BL21(DE3) strain (Kästner et al. 2000). To date, the capabilities of the C43(DE3) strain have been explored in only a small number of our expression trials. For four proteins (Rv0143c, Rv1487, Rv1607, and Rv1624c), expression was only achieved in C43(DE3) cells. In addition, C43(DE3) has been used to great advantage over other expression strains in the case of larger, more complex integral membrane proteins from M. tuberculosis (Y.B. Peskova, K.A. DiGiandomenico, and R.K. Nakamoto, unpubl.).
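One way to judge whether a given ORF is likely to benefit from a tRNA-supplemented host is to count the codons that the host is meant to rescue. The sketch below assumes that the RP variant supplies extra tRNAs for the arginine codons AGA and AGG and the proline codon CCC; that codon set reflects our reading of the vendor's description and should be treated as an assumption, as should the function name and the toy ORF.

```python
# Assumed codon set for the RP strain (arginine AGA/AGG, proline CCC); not stated in the text.
RP_SUPPLEMENTED_CODONS = ("AGA", "AGG", "CCC")


def count_codons(orf: str, codons=RP_SUPPLEMENTED_CODONS) -> dict:
    """Count in-frame occurrences of the given codons in an ORF."""
    seq = orf.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    counts = {codon: 0 for codon in codons}
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    return counts


# Toy ORF for illustration only: ATG AGA CCC AGG TGA
print(count_codons("ATGAGACCCAGGTGA"))
```

Only in-frame codons are counted, so a CCC that straddles two codons is ignored.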

Expression vector

The vector is one of the most important factors defining the level of protein expression. Protein fusion with affinity tags has become the method of choice for heterologous protein expression in E. coli. The tag greatly facilitates protein detection and purification. To date, the polyhistidine tags have proved to be most effective for the expression of soluble proteins (Edwards et al. 2000). The pET series vectors developed by Novagen are routinely used to express proteins with N- or C-terminal His-tags. The gene expression in this system is under control of the T7 promoter, which is a target of T7 RNA polymerase provided with the bacterial strain. The successful expression of several membrane transporters has also been achieved with the pET vectors (Cheng et al. 1996; Kästner et al. 2000). We have utilized two vectors, pET16b and pET29b(+), with slight modifications described in the Materials and Methods. Both modified vectors provide very short affinity tags consisting of only eight to nine amino acid residues, including six histidines, at the N or C terminus of the recombinant protein. Cloning of generated DNA fragments into the expression vectors was performed by using unique restriction sites to avoid the longer tags common in recombination-based cloning, where extra DNA sequences are required to ensure precise gene transfer. The high success rate for expression of membrane proteins described here may, in part, derive from the use of short affinity tags.

There is an ongoing discussion in the literature about the advantages and disadvantages of different fusion tags. Tags generally have little effect on protein structure. However, it has been shown for some soluble proteins that terminal extensions may change protein stability and folding properties (Scholtz et al. 1996; Korepanova et al. 2001). It has also been shown that affinity tags may affect protein crystallization and quality of protein crystals (Bucher et al. 2002). Increasing the length of the polyhistidine tag from six to 10 histidines decreases the expression of aquaporin (AqpZ). AqpZ with a 10-histidine tag more readily forms high-order nonphysiologically relevant oligomers (Mohanty and Wiener 2004). We do not have sufficient data to speculate about the similar effects of His-tags on M. tuberculosis membrane protein stability and structure at this time. Currently, we have purified a number of expressed M. tuberculosis membrane proteins using metal affinity chromatography (data not shown). His-tags were useful, both when the protein was solubilized in detergent solution from the bacterial membrane and when recovered from inclusion bodies.


Seventy predicted integral membrane proteins from M. tuberculosis have been successfully expressed using the T7 promoter expression system. Molecular weights of the expressed proteins range from 8.1 to 71.3 kDa, with both detectable expression and overexpression efficiency being largely independent of molecular weight. It is important to note that 76% of the 3920 putative proteins from the M. tuberculosis genome have predicted molecular weights between 10 and 50 kDa (Tekaia et al. 1999). This number is virtually identical for both the subset of soluble proteins and the subset of membrane proteins. As a function of the number of transmembrane helices, detectable expression is somewhat reduced for proteins with a large number of transmembrane helices, even though we have successfully expressed proteins having one to 14 such helices. Overexpression is more substantially suppressed for these proteins with a large number of helices. Despite this reduction in efficiency, this expression approach using short His-tags, pET vectors, and E. coli appears to be a solid general approach for membrane protein expression for this proteome.

It has been shown that the expression of some membrane proteins with a strong promoter may cause bacterial cell death (Miroux and Walker 1996), which may explain our inability to detect expression at any level for 29 cloned ORFs. This fraction of nonexpressed ORFs relative to cloned genes (29 of 99, or 29%) is not much higher than that reported for nonmembrane proteins of M. thermoautotrophicum (Christendat et al. 2000).

Structural studies by solution NMR, solid-state NMR, or X-ray crystallography require a plentiful supply of membrane proteins. Tens of milligrams of purified protein are used for these experiments, and few membrane proteins are so abundant in natural membranes. The production of at least 0.5 mg of protein per liter from bacterial cultures greatly facilitates structural studies (Wang et al. 2003). Expression in inclusion bodies avoids membrane insertion, even though the protein is frequently associated with lipid and is often partially structured, and provides high yields of recombinant protein that is less likely to interfere with the host cellular processes. However, the expression of membrane proteins in the cellular membrane fraction is generally preferred by structural biologists so that refolding is avoided and reconstitution of the membrane protein is simplified (Drew et al. 2003). Nevertheless, there have recently been numerous successes with the productive in vitro refolding of α-helical membrane proteins obtained from inclusion bodies (see Rogl et al. 1998; Ma et al. 2002; Tian et al. 2002; Kiefer 2003). Expression of several outer membrane proteins as inclusion bodies followed by solubilization in denaturing buffer and in vitro refolding has also been shown to be very successful (Bannwarth and Schulz 2003; Buchanan 2003). Consequently, there is interest in both the expression of membrane proteins as inclusion bodies and expression in membranes. Expression in a soluble cellular fraction may suggest a different conformation for membrane proteins than that in a membrane-bound state (Arumagam et al. 1996; Abrami et al. 2000; Oxenoid et al. 2001; Tsitrin et al. 2002); this soluble conformation may also be a native and hence significant conformation.

There is much to be learned about how cells direct expressed proteins to different cell fractions. It is known that inclusion body formation depends on the differences between the rate of protein folding (membrane protein insertion in the membrane) and the aggregation of unfolded and misfolded protein (Clark 1998). Certainly, cells have limited capacity to incorporate protein into their normal membranes, which may lead to inclusion body formation. But cells may also produce additional membrane environments for the expressed proteins (Weiner et al. 1984; Wilkinson et al. 1986). The capacity of the E. coli cell machinery to insert M. tuberculosis protein into membranes may be a further limitation resulting in inclusion body formation. In addition, the use of N-terminal His-tags may interfere with this machinery, especially for those proteins having the N terminus in the extracellular orientation, prompting the deposition of protein into inclusion bodies. Structural genomics has provided an opportunity to express many proteins under similar conditions. Further studies may clarify how proteins are directed to these various cellular fractions, so that greater yields of protein in the desired form can be obtained.


While high expression yields of membrane proteins remain a significant bottleneck for many proteins, this study shows that 50% of the targeted membrane proteins have been expressed and 25% have been overexpressed. The use of short His-tags, pET vectors, and E. coli as the expression host has been demonstrated to be effective for the expression of a broad range of protein molecular weights and number of transmembrane helices. This study has established a baseline of statistics against which other membrane protein expression studies can be compared. There are many variables that could be screened, for instance, to increase the overexpression of high-molecular weight proteins in the membrane fraction, or to increase the membrane fraction of single transmembrane helix proteins. With optimization and tailoring of approaches for specific proteins, our results show that this relatively simple expression strategy is a good first approach to achieve expression of many membrane proteins.

Materials and methods

The genomic DNA sequence for the H37Rv virulent strain of M. tuberculosis (Cole et al. 1998) was used to identify ORFs predicted to encode integral membrane proteins. Membrane protein topology prediction was performed using TMHMM (Krogh et al. 2001) and TMpred to select ORFs for cloning. TMHMM distinguishes between soluble and membrane proteins with a 97% fidelity rate (Möller et al. 2001; Chen et al. 2002). Lipoproteins were identified using a database of bacterial lipoproteins via the Internet. The signal sequence analysis was done using an available signal peptide prediction server.


M. tuberculosis H37Rv (ATCC25618) chromosomal DNA was used to clone and express selected ORFs. Two main cloning approaches were utilized with pET vectors (Novagen). The generated PCR fragments either were cloned into the entry vector pETBlue-1 AccepTor followed by recloning into the pET29b(+) expression vector or were cloned directly into the pET16b expression vector. In the interest of high throughput, no other expression vectors were tested. All primers were purchased from IDT, Inc. Pairs of gene specific primers were used to amplify ORFs from genomic DNA using standard PCR conditions and the enzyme Pfu Turbo DNA polymerase (Stratagene), thus lowering the probability for introducing unwanted mutations. Portions of DNA sequence of pET16b and pET29b(+) vectors were removed as a result of cloning. Figure 7 shows final cloning regions of utilized pET16b and pET29b(+) vectors.

The PCR fragments were cloned into pET16b expression vectors using digests with only the Eam1104I enzyme. The original pET16b plasmid DNA contains three sites for the Eam1104I restriction endonuclease. Prior to cloning the M. tuberculosis genes, these sites were removed via site-directed mutagenesis using the QuikChange site-directed mutagenesis kit (Stratagene). All ORFs selected for cloning into pET16b were analyzed for possible Eam1104I sites. Only those templates that did not contain the site for the endonuclease were targeted, resulting in a 3% reduction in the number of initially selected ORFs. Eam1104I is a type II restriction endonuclease that cuts outside its recognition sequence. Treatment with Eam1104I generates unique termini consisting of three nucleotides. Unique sticky ends of DNA were designed and incorporated into the primer sequences used to amplify both the inserted gene and vector DNA, which provided directional cloning during the subsequent ligation event. This approach simplified cloning and avoided the complication of selecting two compatible and appropriate restriction enzymes. The factor Xa protease recognition and cleavage sites along with part of the spacer region were removed during cloning. The genes were positioned in the final construct immediately after the hexa-His tag followed by a single glycine.
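Screening candidate ORFs for internal Eam1104I sites, as described above, amounts to a motif search on both strands. The sketch below is one hypothetical way to do it and is not the procedure actually used in this study; it assumes the recognition sequence CTCTTC for Eam1104I, and the function names and example sequences are ours.

```python
EAM1104I_SITE = "CTCTTC"  # assumed recognition sequence; the enzyme cuts outside it

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str = EAM1104I_SITE) -> list:
    """Return sorted (position, strand) hits for the site on either strand."""
    seq = seq.upper()
    hits = []
    for motif, strand in ((site, "+"), (reverse_complement(site), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Toy sequences for illustration only.
print(find_sites("ATGGCCCTCTTCGGGTGA"))  # one hit on the (+) strand
print(find_sites("ATGGCCGGGTGA"))        # no hits, so suitable for this scheme
```

An ORF with an empty result would pass the screen; any hit would remove it from the pET16b target list under the rule described above.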

Synthetic oligonucleotide primers were used to amplify the pET16b vector DNA: forward, 5′-agttactcttcatgaggatccggctgctaacaaagc; reverse, 5′-agttactcttcacatgccgtgatgatgatgatgatggcccatgg. Each primer carries the Eam1104I recognition site (ctcttc) near its 5′ end. Synthetic oligonucleotide primers for gene amplification consisted of the specific N-terminal and C-terminal regions for a targeted ORF and contained extensions encoding the Eam1104I restriction site: forward, 5′-agttactcttcaatg; reverse, 5′-agttactcttcatca, when the gene is encoded by the (+)-strand, and forward, 5′-agttactcttcatca; reverse, 5′-agttactcttcaatg, when the gene is encoded by the (−)-strand of DNA. The PCR products were digested with Eam1104I and purified by electrophoresis using a gel-extraction kit (QIAGEN, Inc.).

The pETBlue-1 AccepTor entry vector was used to clone PCR fragments, relying on the single 3′-dA nucleotide extensions on the reaction products. The purified plasmids carrying the inserts were digested with NdeI and XhoI enzymes. The inserts were purified as before and ligated between the NdeI and XhoI restriction sites of the pET29b(+) expression vector, which had previously been digested with the same enzymes, thus discarding the DNA region encoding the S-tag and the thrombin recognition and cleavage sites.

Colony PCR was used to select for clones with the correct insert size for either pET16b or pET29b(+). The resultant plasmids were sequenced to confirm the identity of the cloned genes and check for introduced mutations.

Small-scale expression screening

The recombinant plasmids were transformed into E. coli BL21(DE3)CodonPlus-RP (Stratagene) or C43(DE3) (Avidis; Miroux and Walker 1996) cells grown on LB agar plates containing 50 μg/mL ampicillin or 30 μg/mL kanamycin for pET16b and pET29b(+), respectively. Individual colonies were grown in tubes at 37°C at 220 rpm overnight in 2 mL LB supplemented with an antibiotic. An aliquot of overnight culture was taken to inoculate 10 mL fresh LB media containing antibiotic. Cells were grown for 2 to 2.5 h until an OD600 of 0.4 to 0.6 was reached, when expression was induced with the addition of 0.3 mM IPTG. The culture was grown for an additional 3 h and harvested by centrifugation at 4500g for 15 min at 5°C. Cells were resuspended in 1 mL water and lysed by sonication, using a Sonic Dismembrator, Model 100 (Fischer Scientific, Inc.) by sonicating three times for 15 sec each. The lysed preparation was inspected by optical microscopy, and >99% of the cells were deemed to have been lysed by this protocol. The lysate was clarified by centrifugation for 15 min at 7000g. The supernatant normally contained soluble proteins and solubilized biological membranes, while the pellet consisted of insoluble proteins and inclusion bodies. The pellet was resuspended in 0.5 mL 1% SDS and 4% urea. The supernatant was subjected to ultracentrifugation at 90,000g for 20 min to separate the membrane and soluble protein fractions. Tenmicroliter aliquots of soluble, insoluble, and membrane fractions were analyzed for protein expression by SDS-PAGE (Schagger and von Jaggow 1987) using tricine gels and 12% polyacrylamide followed by either Coomassie staining or Western blotting. For the latter detection scheme, the bands were transferred to PVDF Immobilon-PSQ membranes with a Mini Trans-Blot Electrophoretic Transfer Cell (Bio-Red) by standard methods (Gallagher et al. 1997). The blotting and detection were done using His-Tag AP Western Reagent kits from Novagen as directed by the manufacturer. 
Control experiments were performed under the same experimental conditions without IPTG induction.
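The bench arithmetic implied by this protocol can be sketched in a few lines of Python. This is only an illustrative sketch: the 1 M IPTG stock concentration and the 10 cm rotor radius are assumptions chosen for the example (neither is stated in the protocol), and the function names are hypothetical. The dilution uses C1V1 = C2V2 and neglects the small volume added to the culture; the speed conversion uses the standard relation RCF = 1.118 × 10^-5 × r(cm) × rpm^2.

```python
import math


def stock_volume_ul(final_mM, culture_mL, stock_mM):
    """Volume of stock (in microliters) giving final_mM in culture_mL.

    Uses C1*V1 = C2*V2 and ignores the small volume added to the culture.
    """
    if stock_mM <= final_mM:
        raise ValueError("stock must be more concentrated than the target")
    return final_mM * culture_mL * 1000.0 / stock_mM


def rpm_to_rcf(rpm, radius_cm):
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rcf_to_rpm(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


if __name__ == "__main__":
    # Assumed values: 1 M (1000 mM) IPTG stock, 10 cm rotor radius.
    iptg_ul = stock_volume_ul(final_mM=0.3, culture_mL=10, stock_mM=1000)
    print(f"IPTG stock to add: {iptg_ul:.1f} uL")
    for step, rcf in [("harvest", 4500), ("clarification", 7000)]:
        print(f"{step}: {rcf} g is about {rcf_to_rpm(rcf, 10.0):.0f} rpm")
```

With the assumed 1 M stock, inducing a 10 mL culture at 0.3 mM requires 3 μL of IPTG. The rotor speeds depend entirely on the actual rotor radius, which should be taken from the rotor specifications.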

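Table 1. Expression of membrane protein from M. tuberculosis H37Rv in E. coli

| No. | ORF | MW | No. TM | His-tag terminus | Coomassie IB | Coomassie SOL | Coomassie MEM | Western IB | Western SOL | Western MEM |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Rv0110 | 26.9 | 7 | N |  |  |  | * |  | * |
| 3 | Rv0143c | 50.8 | 10 | N | * |  |  | * |  | * |
| 4 | Rv0156 | 11.8 | 3 | N |  |  |  | * |  |  |
| 5 | Rv0203 | 13.9 | 1 | N | * |  |  | nt | nt | nt |
| 6 | Rv0205 | 38.0 | 8 | N |  |  |  | * |  | * |
| 8 | Rv0265c | 35.3 | 1 | N | * |  |  | nt | nt | nt |
| 10 | Rv0401 | 12.6 | 4 | N |  |  |  | * |  | * |
| 11 | Rv0403c | 15.3 | 2 | C |  |  |  | * |  |  |
| 13 | Rv0451c | 15.4 | 1 | N | * |  |  | * |  | * |
| 15 | Rv0463 | 10.1 | 3 | N |  |  |  | * |  | * |
| 19 | Rv0677c | 15.2 | 1 | N | * |  |  | nt | nt | nt |
| 20 | Rv0820 | 28.0 | 1 | N | * |  |  | nt | nt | nt |
| 21 | Rv0870c | 13.7 | 3 | N |  |  |  |  |  | * |
| 23 | Rv0882 | 9.6 | 3 | N |  |  |  |  |  | * |
| 24 | Rv0961 | 12.4 | 3 | N |  |  |  |  |  | * |
| 25 | Rv0985c | 16.0 | 2 | C |  |  |  | * | * |  |
| 26 | Rv0986 | 27.4 | 1 | N | * |  |  | nt | nt | nt |
| 27 | Rv1031 | 20.2 | 1 | N | * |  |  | nt | nt | nt |
| 28 | Rv1171 | 15.2 | 4 | N |  |  |  | * |  | * |
| 30 | Rv1239c | 41.5 | 2 | N | * |  |  | nt | nt | nt |
| 31 | Rv1280c | 63.4 | 3 | N | * |  |  | nt | nt | nt |
| 32 | Rv1305 | 8.1 | 2 | C |  |  |  | * |  |  |
| 33 | Rv1342c | 13.4 | 3 | C | * |  |  | * |  | * |
| 35 | Rv1446c | 32.7 | 1 | N | * |  |  | nt | nt | nt |
| 37 | Rv1487 | 15.2 | 3 | N |  |  |  | * |  | * |
| 39 | Rv1607 | 36.8 | 10 | N |  |  |  | * | * | * |
| 41 | Rv1624c | 20.4 | 6 | N |  |  |  | * | * | * |
| 42 | Rv1634 | 47.7 | 14 | N |  |  |  | * |  | * |
| 44 | Rv1819c | 71.3 | 6 | N |  |  |  | * |  | * |
| 46 | Rv1857 | 26.5 | 2 | N |  |  |  | * |  |  |
| 47 | Rv1861 | 10.3 | 3 | N | * |  | * | * |  | * |
| 48 | Rv1892 | 11.4 | 2 | N |  |  |  | * |  | * |
| 50 | Rv1924c | 14.2 | 3 | N | * |  | * | * |  | * |
| 51 | Rv1974 | 13.5 | 1 | N | * |  |  | * |  |  |
| 52 | Rv2076c | 9.0 | 3 | C |  |  |  | * |  |  |
| 55 | Rv2169c | 14.6 | 2 | C |  |  |  | * |  |  |
| 56 | Rv2198c | 30.9 | 1 | N |  |  |  | * |  | * |
| 57 | Rv2199c | 14.9 | 4 | N | * |  | * | * |  | * |
| 58 | Rv2272 | 12.9 | 3 | N | * |  | * | * |  | * |
| 59 | Rv2273 | 11.4 | 3 | N |  |  |  | * |  | * |
| 61 | Rv2398c | 29.3 | 7 | N |  |  |  | * |  | * |
| 63 | Rv2446c | 13.3 | 2 | N |  |  |  | * |  |  |
| 64 | Rv2507 | 28.5 | 1 | N |  |  |  | * | * |  |
| 65 | Rv2532c | 14.3 | 2 | N |  |  |  | * |  | * |
| 66 | Rv2599 | 14.9 | 1 | N | * |  |  | nt | nt | nt |
| 68 | Rv2620c | 14.6 | 4 | N |  |  |  | * |  | * |
| 69 | Rv2639c | 11.7 | 4 | N |  |  |  |  |  | * |
| 71 | Rv2719c | 17.3 | 1 | N | * | * |  | * | * |  |
| 74 | Rv2937 | 31.1 | 7 | N | * |  |  | * |  | * |
| 75 | Rv3004 | 12.2 | 2 | N | * |  |  | nt | nt | nt |
| 77 | Rv3069 | 14.3 | 3 | N | * |  | * | * |  | * |
| 81 | Rv3200c | 38.1 | 2 | N | * | * |  | * | * |  |
| 82 | Rv3217c | 14.3 | 4 | N |  |  |  | * |  | * |
| 83 | Rv3289c | 13.2 | 3 | N |  |  |  | * |  | * |
| 85 | Rv3368c | 23.7 | 1 | N | * |  |  | nt | nt | nt |
| 86 | Rv3419c | 35.1 | 1 | N | * |  |  | nt | nt | nt |
| 88 | Rv3453 | 11.9 | 2 | N |  |  |  | * |  | * |
| 89 | Rv3483c | 22.9 | 1 | N |  |  |  | * | * |  |
| 90 | Rv3524 | 36.3 | 1 | N | * |  |  | nt | nt | nt |
| 91 | Rv3550 | 26.3 | 1 | N |  |  |  | nt | nt | nt |
| 92 | Rv3635 | 63.5 | 8 | N |  |  |  | * |  | * |
| 93 | Rv3659c | 36.7 | 1 | N | * |  |  | nt | nt | nt |
| 95 | Rv3683 | 34.4 | 1 | N | * |  |  | nt | nt | nt |
| 96 | Rv3761c | 38.5 | 1 | N | * |  |  | nt | nt | nt |
| 97 | Rv3773c | 20.7 | 1 | N | * |  |  | nt | nt | nt |
| 98 | Rv3782 | 33.9 | 1 | N | * |  |  | nt | nt | nt |
| 99 | Rv3789 | 13.4 | 4 | N |  |  | * | * |  | * |
| 100 | Rv0821c | 23.5 | 0 | N | * | * |  | nt | nt | nt |
| 101 | Rv1270c | 24.9 | 0 | N |  | * |  | nt | nt | nt |
| 102 | Rv2330c | 18.8 | 0 | N | * |  |  | * | * |  |
| 103 | Rv2873 | 22.1 | 0 | N | * |  |  | nt | nt | nt |
| 104 | Rv3584 | 18.8 | 0 | N |  |  |  | * |  |  |

The abbreviations are as follows: (IB) inclusion body fraction, (SOL) soluble fraction, (MEM) membrane fraction, (No. TM) number of putative transmembrane helices, (MW) molecular weight in kDa. nt stands for nontested; * means that expression was observed, and for Coomassie stains it suggests ≥20 mg/L of culture. No mark means that expression was not observed.
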
Figure 1.

The distribution of the putative α-helical transmembrane proteins from the genome of M. tuberculosis H37Rv (A) and in the targeted sample of ORFs (B) according to the predicted molecular weights of the proteins.

Figure 2.

The distribution of the putative α-helical transmembrane proteins from the genome of M. tuberculosis H37Rv (A) and in the targeted sample of ORFs (B) according to the number of predicted transmembrane helices.

Figure 3.

Histograms displaying the targeted (137, light blue), cloned (99, green), and expressed (70, dark blue) membrane proteins of M. tuberculosis H37Rv as a function of molecular weight (A) and number of putative transmembrane helices (B).

Figure 4.

Results of small-scale expression of ORF Rv1924c as an N-terminal His-tag fusion in pET16b, giving an expressed protein with a molecular weight of 15.5 kDa (indicated by an arrow). Soluble, membrane, and inclusion body fractions were separated by 12% Tricine SDS-PAGE and stained with Coomassie blue R-250 (A) or subjected to Western blot detection (B). The lanes are as follows (left to right): 1, molecular-weight markers; 2, uninduced nonfractionated cells; 3, inclusion body fraction; 4, soluble fraction; and 5, membrane fraction. Molecular-weight markers are labeled with their masses in kDa.

Figure 5.

Expression of membrane proteins as a function of the number of putative transmembrane helices. (A) Expression in inclusion bodies and membrane fractions. (B) Expression detected by Coomassie stain or Western blot. (C) Coomassie detection of expression in inclusion bodies or membrane fraction.

Figure 6.

Expression of membrane proteins as a function of protein molecular weight. (A) Expression in the membrane fraction. (B) Expression in inclusion bodies. Dark color is Coomassie detection; light color is Western detection.

Figure 7.

Final cloning regions for the modified pET16b and pET29b(+) vectors. Start and stop codons are shown in bold. Restriction sites flanking the gene are also labeled. ORF designates the position of the cloned DNA fragment. Amino acid residues corresponding to the N- or C-terminal His-tags, along with the short spacer residues, are shown in single-letter amino acid code.


We thank the Department of Chemistry and Biochemistry at Florida State University for completely renovating laboratory space for this project. This work was supported by the NIH grant PO1 GM64676.