Figure 1 shows the distribution by molecular weight of the 1162 open reading frames (ORFs) identified in the M. tuberculosis database as putative helical membrane proteins and the 143 ORFs that have been targeted for this study. More than 60% of the potential membrane proteins from M. tuberculosis are <40 kDa, and >15% are <20 kDa. This distribution reflects only the molecular weights of the protein monomers and not the molecular weights of the functional membrane proteins as multiprotein complexes or oligomeric structures. The analysis of the transmembrane helical content of the ORFs in the genome indicates that most membrane proteins have just one or two transmembrane helices (Fig. 2), and 84% of these putative membrane proteins have fewer than seven transmembrane helices. During the course of this study, four of the expressed proteins were found to be lipoproteins rather than integral membrane proteins and are not included in the following analysis. In addition, two ORFs were excluded after we identified them as secreted proteins, i.e., proteins with a cleavable transmembrane signal sequence.
For the targeted ORFs, we aimed to generate a large and diverse collection of membrane proteins biased somewhat toward low-molecular-weight proteins that would be appropriate for structural characterization by NMR spectroscopy. While the targeted ORFs show this bias, their distribution by number of transmembrane helices resembles that of the complete set of 1162 membrane protein ORFs of this genome. Consequently, the bias in molecular weight distribution primarily reflects a bias against large extramembranous domains in our targeted set. Qualitatively, the ORFs in this sample were functionally annotated as putative membrane proteins (38%), conserved proteins (42%), and unknown or hypothetical proteins (20%).
Cloning of selected ORFs and modification of expression vectors
Cloning was based on the polymerase chain reaction (PCR). Where possible, primers were designed to optimize codon usage, especially at the beginning of the cloned DNA sequences. Ten percent of the initial PCR amplifications required additional optimization of temperature conditions to generate the desired PCR product. Overall, >85% of the DNA fragments encoding targeted proteins were successfully amplified from M. tuberculosis H37Rv genomic DNA (obtained from the TB Research Materials and Vaccine Testing Contract, Dr. John Belisle, Colorado State University).
In the initial cloning effort, >72% of the 137 targeted membrane protein ORFs were successfully cloned and ligated into one of two expression plasmid vectors. A total of 87 ORFs were cloned into pET16b (Novagen, Inc.) and 12 ORFs were cloned into pET29b(+), with short nonremovable N- or C-terminal (His)6 tags, respectively. Given that >85% of the fragments were successfully amplified, approximately half of the failures in cloning occurred at the PCR stage. Table 1 provides a listing of the successfully cloned ORFs from the targeted list. As shown in Figure 3A, cloning efficiency is significantly higher at low molecular weight (83% for <30 kDa) than at high molecular weight (54% for >30 kDa). In addition, cloning efficiency appears to be lower for proteins with a single transmembrane helix (58%) than for those with larger numbers of helices (81%; Fig. 3B).
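The cloning figures quoted above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch in Python: the counts are those stated in the text, whereas the PCR success rate is given only as ">85%", so the value 0.85 is an assumed bound rather than a measured number, and the variable names are ours.

```python
# Counts stated in the text.
targeted = 137        # targeted membrane protein ORFs
cloned_pet16b = 87    # ORFs cloned into pET16b (N-terminal His-tag)
cloned_pet29b = 12    # ORFs cloned into pET29b(+) (C-terminal His-tag)

# Assumption: the text reports PCR success only as ">85%",
# so 0.85 is used here as a bound, not as a measured value.
pcr_success_bound = 0.85

cloned = cloned_pet16b + cloned_pet29b
cloning_failures = targeted - cloned

# Upper bound on the number of ORFs lost at the PCR stage.
max_pcr_failures = targeted * (1 - pcr_success_bound)

cloning_efficiency = cloned / targeted
pcr_share_of_failures = max_pcr_failures / cloning_failures

print(f"cloned: {cloned} of {targeted} ({cloning_efficiency:.1%})")
print(f"cloning failures: {cloning_failures}")
print(f"share of failures at the PCR stage (bound): {pcr_share_of_failures:.0%}")
```

Under this assumption, the bound on PCR-stage losses amounts to roughly half of all cloning failures, in line with the statement above.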
In the interest of high throughput, a single set of growth conditions for testing membrane protein expression was used. Of the 99 successfully cloned membrane protein genes, 70 (70%) showed expression in E. coli BL21(DE3) CodonPlus-RP (Stratagene) or C43(DE3) (Avidis; Miroux and Walker 1996) strains. The C43(DE3) cells were used for a secondary expression effort, resulting in the expression of four ORFs that did not express in BL21(DE3) CodonPlus-RP cells. A total of 29 cloned genes were not expressed by these limited efforts. Figure 4 shows an example of small scale expression for ORF Rv1924c, a 14.2-kDa protein with three predicted transmembrane helices overexpressed in both the membrane and insoluble fractions of E. coli, and Table 1 displays the expression results for all of the ORFs. Very high expression was observed for 35 of the proteins using Coomassie stain, and the expression of the remaining 35 was detected by Western blots with antibodies to the N-terminal His-tag. The individual expression results are presented in supplementary material in table form. Overall, 70 M. tuberculosis membrane proteins were expressed in E. coli, with 62 proteins expressed with an N-terminal His-tag and eight proteins expressed with a C-terminal His-tag.
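As a consistency check on the expression counts, the totals and the per-vector expression rates can be recomputed from the numbers given in the text. This sketch in Python assumes that the N-terminally tagged proteins correspond to the pET16b clones and the C-terminally tagged proteins to the pET29b(+) clones, as implied by the description of the vectors. The per-vector rates are derived here and are not reported values.

```python
# Counts stated in the text. The pairing of vector and tag position
# is an assumption based on the description of the two vectors.
cloned = {"pET16b, N-terminal His-tag": 87,
          "pET29b(+), C-terminal His-tag": 12}
expressed = {"pET16b, N-terminal His-tag": 62,
             "pET29b(+), C-terminal His-tag": 8}

total_cloned = sum(cloned.values())
total_expressed = sum(expressed.values())
not_expressed = total_cloned - total_expressed

# Detection method for the expressed proteins, as stated in the text.
detected_by_coomassie = 35
detected_by_western_only = 35

overall_rate = total_expressed / total_cloned
# Derived per-vector expression rates (not reported in the text).
rate_by_vector = {name: expressed[name] / cloned[name] for name in cloned}

print(f"expressed: {total_expressed} of {total_cloned} ({overall_rate:.1%})")
print(f"not expressed: {not_expressed}")
for name, rate in rate_by_vector.items():
    print(f"{name}: {rate:.1%}")
```

The small number of pET29b(+) clones means that the derived rate for C-terminally tagged proteins should be read with caution.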
The molecular weights of the expressed proteins range from 8.1 to 71.3 kDa, demonstrating expression over >90% of the molecular weight range of the M. tuberculosis membrane protein genome. In fact, observed expression seems to be virtually independent of molecular weight, with 70% efficiency for proteins having a molecular weight >30 kDa and 73% for those having a molecular weight <30 kDa (Fig. 3A).
Membrane proteins having one to 14 putative transmembrane helices have been expressed. The fraction of expressed ORFs relative to the number of cloned ORFs having three or more transmembrane helices is 64%, and the fraction of expressed ORFs with only one or two helices is 78%, indicating a small reduction in expression efficiency for membrane proteins having a larger number of transmembrane helices (Fig. 3B). Even so, 11% of our expressed proteins have more than six transmembrane helices, and only 16% of all putative membrane proteins from M. tuberculosis have more than six transmembrane helices. Despite the small decrease in efficiency for proteins with a large number of helices, this expression approach appears to be a robust general approach for membrane protein expression for this genome.
Membrane proteins were expressed in three different cell fractions: the soluble fraction, the membrane fraction, or what appears to be an insoluble fraction. More than 94% of the expressed proteins formed some degree of insoluble aggregates, 53% were detected in the membrane fraction, and 14% were observed in the soluble fraction. All of the proteins isolated from the soluble fraction were also expressed as insoluble aggregates, and 50% of these proteins were also expressed in the membrane fraction. The 50% that were not expressed in the membrane fraction have just one or two transmembrane helices, and all but one of these proteins have at least 85% of their residues outside of the hydrophobic region of the membrane (assuming that each predicted membrane-spanning segment, as listed in the supplementary material table, is 20 residues long). The 50% that were expressed in the membranes have from two to 10 transmembrane helices, and their extramembranous content is between 34% and 57% of the total length of the protein.
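The extramembranous content quoted above follows from the stated assumption of 20 residues per predicted transmembrane segment. A minimal sketch of this calculation in Python is given below. The function name and the example protein (200 residues, one helix) are hypothetical and serve only to illustrate the arithmetic; they are not taken from the data set.

```python
def extramembranous_fraction(length_residues: int,
                             n_tm_helices: int,
                             residues_per_helix: int = 20) -> float:
    """Fraction of residues outside the hydrophobic region of the membrane.

    Assumes, as in the text, that each predicted membrane-spanning
    segment is 20 residues long.
    """
    if length_residues <= 0:
        raise ValueError("protein length must be positive")
    membrane_residues = n_tm_helices * residues_per_helix
    if membrane_residues > length_residues:
        raise ValueError("predicted helices exceed the protein length")
    return 1 - membrane_residues / length_residues


# Hypothetical example: a 200-residue protein with one predicted helix.
example_fraction = extramembranous_fraction(200, 1)
print(f"extramembranous content: {example_fraction:.0%}")
```

For a fixed protein length, each additional predicted helix lowers this fraction by 20 residues' worth, which is why proteins with few helices and long sequences are dominated by extramembranous domains.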
Based on the results in Figure 5A, membrane proteins with just one or two transmembrane helices are much more likely to be expressed as insoluble aggregates or “inclusion bodies” than in membranes. In fact, we have no examples of proteins with one or two helices that were expressed only in membranes, although a few were expressed in both membranes and inclusion bodies. Proteins with three or more helices were typically expressed in both inclusion bodies and the membrane fraction. The fraction of proteins that were, at least in part, expressed in membranes is 9% for proteins having a single helix, 36% for two helices, 81% for three helices, and 100% for those having four or more helices. In contrast, the fraction of proteins expressed in inclusion bodies is 100% for those having one or two helices and 83% for those having three or more helices. We note, however, that these percentages are based on the assumption that the low-speed pellet represents insoluble aggregates. There have been reports that gross overexpression of some membrane proteins into membranes may change the density of the membrane, causing the protein-containing membrane fraction to sediment at very low speeds (see Arechaga et al. 2000).
Detection by Coomassie stain indicates considerably more expressed protein than detection by Western blot alone. Of the 70 expressed membrane proteins, 35 (50%) were overexpressed and detected by standard Coomassie R-250 staining of SDS-PAGE gels. The other 35 (50%) were detected by Western blot analysis only. There is an inverse correlation between the number of helices and the probability of overexpression as detected by Coomassie stain (Fig. 5B): 87% of the expressed proteins with a single helix were detected by Coomassie stain, compared with 40% of those with two or three transmembrane helices and 18% of those with four or more helices. Consequently, overexpression with this protocol is much more easily achieved for proteins with a small number of transmembrane helices. Proteins expressed in the membrane fraction and detected by Coomassie stain are restricted to those having two to four transmembrane helices (Fig. 5C) and a narrow molecular weight range from 10.1 to 14.9 kDa (Fig. 6A). Proteins expressed in inclusion bodies and detected by Coomassie stain fall almost exclusively in the one to three transmembrane helix region of Figure 5C, but they are evenly distributed over the entire molecular weight range (Fig. 6B).