The evolutionary adaptations of thermophilic water-soluble proteins required for maintaining stability at high temperature have been extensively investigated. Little is known about the adaptations in membrane proteins, however. Here, we compare many properties of mesophilic and thermophilic membrane protein structures, including side-chain burial, packing, hydrogen bonding, transmembrane kinks, loop lengths, hydrophobicity, and other sequence features. Most of these properties are quite similar between mesophiles and thermophiles although we observe a slight increase in side-chain burial and possibly a slight decrease in the frequency of transmembrane kinks in thermophilic membrane protein structures. The most striking difference is the increased hydrophobicity of thermophilic transmembrane helices, possibly reflecting more stringent hydrophobicity requirements for membrane partitioning at high temperature. In agreement with prior work examining transmembrane sequences, we find that thermophiles have an increase in small residues (Gly, Ala, Ser, and Val) and a strong suppression of Cys. We also find a relative dearth of most strongly polar residues (Asp, Asn, Glu, Gln, and Arg). These results suggest that in thermophiles, there is significant evolutionary pressure to offload destabilizing polar amino acids, to decrease the entropy cost of side chain burial, and to eliminate thermally sensitive amino acids.
The proteins of thermophilic organisms are able to maintain stable folds at high temperatures. The adaptations required to maintain stability at high temperature have been of interest not only from the perspective of evolutionary mechanisms, but also because they may teach us how to stabilize proteins. As a result, the stabilizing mechanisms have been extensively explored for water-soluble proteins. The mechanisms observed include an increase in secondary structure propensity;1, 2 changes that decrease unfolded state entropy such as the introduction of Pro residues, reduction of Gly residues,3 smaller loops,4 and the addition of disulfide bonds;5–7 an increase of hydrogen bonds and salt bridges;8–12 and better optimized hydrophobicity.13 Szilágyi and Závodszky3 made an extensive study of many protein families and found that few of these mechanisms were general, except an increase in salt bridges. Rather, many of these mechanisms may be used for stabilization of a given protein, but not necessarily all of them. Berezovsky and Shakhnovich11 make the interesting observation that the preferred stabilization mechanisms can change depending on the evolutionary path. For example, proteins from organisms that evolved at high temperature show increased compactness, whereas proteins from organisms that evolved at lower temperatures and then adapted to higher temperatures are less compact and have increased salt bridges.
Because of the dearth of membrane protein structures, the evaluation of differences between mesophiles and thermophiles has been largely restricted to sequence analysis. Schneider et al.14 studied sequence differences between predicted transmembrane helices of membrane proteins in thermophilic and mesophilic genomes. They observed a striking depletion of Cys residues in thermophiles and an increase in Gly, Ser, and Ala pair motifs, suggesting a preference for the packing of small residues. By comparing mesophilic and thermophilic reaction center sequences and structures, Shlyk-Kerner et al.15 showed that subtle changes in cavity volumes in photosynthetic reaction centers were a key factor in switching between thermophilic and mesophilic properties, implying an important role for packing in thermal adaptation.
Recent advances in structure determination of membrane proteins prompted us to investigate whether the database had been enlarged enough to examine the generality of prior results and look for additional structural features that change between thermophilic and mesophilic proteins. We were able to develop a database of 25 independent thermophilic/hyperthermophilic α-helical membrane protein structures and 101 independent mesophilic structures. Although a far cry from what is available for soluble proteins, this database enables us to examine whether there are prominent structural features that distinguish mesophiles and thermophiles.
Results and Discussion
Thermophile–mesophile database construction
To explore the structural differences between thermophilic and mesophilic membrane proteins that may contribute to thermostability, we constructed a database of high-resolution thermophilic and mesophilic membrane protein structures. We were able to find independent structures from 25 thermophiles (growth temperature >50°C) and 101 mesophiles (see Materials and Methods section). The database allows us to compare the features of thermophilic and mesophilic structures globally. Different folds may have different global properties, however. For example, Hildebrand et al.16 showed that proteins that undergo large domain movement are less well-packed than other membrane proteins. Ideally, we would like to compare proteins from the same family,3 but there is currently not enough structural data. We therefore built a database of similar structural segments to compare the similar local structures between thermophiles and mesophiles. We refer to global comparisons of the properties of all thermophiles to all mesophiles as unpaired data and the comparisons between similar structured regions only as paired data.
Burial and packing
Van der Waals packing is thought to play an important role in stabilizing the transmembrane domains of membrane proteins.17 We have found that membrane proteins increase van der Waals packing contributions relative to soluble proteins by increasing the overall level of side-chain burial, rather than by improved packing efficiency (volume occupied by atoms).18, 19 Nevertheless, either increased packing efficiency or increased burial could be an important factor in stabilizing membrane proteins in hot environments.15 We therefore investigated whether packing density and burial are different in mesophiles and thermophiles.
When comparing the water-soluble domains, we found essentially no difference between mesophiles and thermophiles in burial or packing for both the paired and the unpaired data (Table I). We also found no significant difference in packing density in the transmembrane domains.
Table I. Differences Between Thermophilic and Mesophilic Membrane Protein Structures
Property                                      Thermophiles (a)   Mesophiles (a)    Paired difference (b)
Burial (fractional side-chain buried)         0.780 ± 0.027      0.733 ± 0.012     0.054 ± 0.045
                                              0.672 ± 0.021      0.661 ± 0.008     0.004 ± 0.025
Packing density                               0.719 ± 0.006      0.727 ± 0.002     −0.002 ± 0.012
                                              0.718 ± 0.004      0.717 ± 0.002     0.003 ± 0.004
Surface loop length (number of residues)      19.126 ± 0.735     19.189 ± 0.747    −0.063 ± 0.203
Helix length (number of residues)             25.627 ± 0.723     26.969 ± 0.789    0.404 ± 1.074
Kinks (number per TM-helix)                   0.070 ± 0.008      0.073 ± 0.004     −0.011 ± 0.009
Interhelical H-bonds (number per residue)     0.102 ± 0.020      0.115 ± 0.008
Hydrophobicity (kcal/mol per residue)         0.126 ± 0.024      0.220 ± 0.015     0.039 ± 0.034
(a) For unpaired structure or direct homolog data sets.
(b) Mean difference (thermophiles–mesophiles) with standard error (a paired difference of 0 indicates no difference between paired thermophiles and mesophiles).
*Statistically significant (P < 0.05).
We did observe a roughly 5% increase in the average fraction of the side-chain surface area buried in the transmembrane domains of thermophiles (P = 0.026, Mann–Whitney U-test). The increase was also seen in the paired data, albeit without statistical significance. Although the increase is small in magnitude, even mesophilic transmembrane domains bury a high fraction of their potential surface area and hence any increase is likely to be difficult to achieve and therefore hard to detect. Thus, we suggest that thermophiles may bury slightly more surface area on average in the transmembrane domains.
Surface loop length in thermophiles and mesophiles
Prior work on water-soluble proteins showed that thermophiles can have shorter loops between secondary structure elements, which is thought to be a mechanism for reducing unfolded state entropy.4 In a similar fashion, we might expect to see shorter loop lengths between transmembrane helices of thermophiles compared to mesophiles. On the other hand, it is possible that longer surface loop lengths may have evolved to help structure the transmembrane domains.
To investigate changes in loop length, we would ideally compare proteins within the same family, but as noted above there were few homologous thermophile/mesophile pairs. Consequently, we augmented our structural database with homologous sequences. For every structure, we identified homologous mesophile and thermophile sequences. We then utilized the sequence alignment with the protein of known structure to define the transmembrane segment endpoints. In this manner, we could use structural information to estimate loop lengths in all the homologous proteins. We found an average loop length of 19.189 ± 0.747 residues for mesophiles and 19.126 ± 0.735 for thermophiles (Table I). Thus, there is apparently no major difference in loop length between the two classes of proteins (P = 0.49, Mann–Whitney U-test).
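The Mann–Whitney U-test used for comparisons such as this one can be illustrated with a minimal sketch. The text does not say which implementation was used, so the function below is an assumption for illustration only: it computes the U statistic from midranks and a two-sided P value from the normal approximation with a tie correction and no continuity correction. The function name and the example values are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided P) via the normal approximation.

    Illustrative sketch only: ties receive midranks, the variance is
    tie-corrected, and no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical loop lengths for two small groups of proteins.
u, p = mann_whitney_u([12, 18, 25, 30], [14, 17, 26, 33])
```

For small samples an exact test is generally preferable to the normal approximation shown here.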
Transmembrane helix length in thermophiles and mesophiles
Another way to reduce the entropy cost of folding would be to decrease order in the folded state. This could be accomplished by increased fraying at the ends of transmembrane helices, reducing their length on average. We found a very small (5%) decrease in the average length of thermophilic transmembrane helices for the unpaired databases (P = 0.27, Mann–Whitney U-test). This difference completely disappears in the paired data, however. Thus, if transmembrane helices are shorter, it is not a large difference.
Transmembrane kinks in thermophiles and mesophiles
Kinks and other helix distortions should be higher energy features relative to regular helices.20 Consequently, it is possible that these distortions would be minimized in the transmembrane helices of thermophilic membrane proteins. In the overall analysis of all thermophiles and mesophiles, we found an average of 0.073 ± 0.004 kinks per transmembrane residue in mesophiles and 0.070 ± 0.008 kinks in thermophiles (Table I). Although the difference is within the margin of error, we also found a consistent decrease in the number of kinks in thermophiles when comparing the paired structures. Thus, there may be modest evolutionary pressure to reduce the number of kinks in transmembrane helices. Helix distortions are likely to be important for function, however, and hence there are likely to be limits on how much kinking can be reduced.
Interhelical hydrogen bonding in thermophiles and mesophiles
An increase in hydrogen bonds and salt bridges is a commonly observed trend when comparing water-soluble proteins from mesophiles and thermophiles.8–12 Although most hydrogen-bonded side-chain interactions that have been experimentally measured in membrane proteins appear to make relatively modest contributions,21–27 some have been measured as high as 1.8 kcal/mol and in theory it is possible that they could be made even stronger.26, 28 Thus, it seems reasonable to expect that the number of transmembrane interhelical hydrogen bonds might increase in thermophiles. However, we find a slight decrease in the number of interhelical hydrogen bonds in thermophiles relative to mesophiles in both the unpaired and the paired comparisons. This result is consistent with the view that strongly polar residues are generally present in the transmembrane domains for functional reasons, not to stabilize the structure.26, 27 As discussed below, the imperative to maintain hydrophobicity may trump any benefit accrued by additional hydrogen bonds.
Thermophilic transmembrane helices are more hydrophobic
Prior work comparing the amino acid composition of thermophilic and mesophilic transmembrane helices relied on predictions from sequence information.14 Here, we employed structures to define the transmembrane segments. To bolster our database, we added aligned sequences of close homologs of the known structures, using the structures to identify transmembrane helices.
The observed differences in amino acid composition are shown in Figure 1. In agreement with the earlier results from predicted transmembrane helices, we find a dramatic depletion of Cys in thermophiles, likely reflecting its higher reactivity at high temperature. We also find an enrichment of the small residues Ala and Gly and to a lesser extent Ser and Val. Schneider et al.14 found a significant increase in Asp and Glu among thermophiles and essentially no difference for the other strongly polar amino acids except Gln which was found to be depleted in thermophiles. We, however, find a large reduction in all the strongly polar residues (Asp, Asn, Glu, Gln, and Arg) with the exception of Lys and His. We also see a compensatory increase in the large apolar residues Phe, Leu, and Ile.
Why do we find a clear depletion of polar residues in thermophilic transmembrane helices that was not observed using transmembrane helices predicted from sequence data? It appears that the transmembrane helices of thermophiles are simply more hydrophobic on average than those of mesophiles (Supporting Information Table S1). Indeed, using the biological hydrophobicity scale,29 the average hydrophobicity of structurally observed transmembrane helices is +0.220 ± 0.015 kcal/mol per residue in mesophiles and +0.126 ± 0.024 kcal/mol per residue in thermophiles (P = 0.005, Mann–Whitney U-test). Thus, a uniform prediction criterion for both mesophiles and thermophiles would tend to reject more of the less hydrophobic transmembrane helices in mesophiles.
The most striking difference we observe between the membrane proteins in mesophiles and thermophiles is a general increase in the hydrophobicity of the thermophile transmembrane helices. This result implies more stringent requirements for transmembrane helix insertion at high temperature. To the extent that the translocon measures the thermodynamic preference for a sequence segment to insert in the bilayer or remain in an aqueous environment,29 it makes sense that a lower free energy well would be required at high temperature to ensure the insertion decision is made. Theoretical calculations and experimental measurements indicate that the free energy of partitioning peptides into bilayers is relatively constant with temperature,43, 44 and hence higher temperatures should decrease the probability of insertion. This imperative may explain the general depletion of strongly polar residues that is observed. It may also preclude the deployment of more stabilizing hydrogen bonding interactions. If so, membrane proteins need to find other ways to stabilize their structures at higher temperature. How might they do this? One observation made by Schneider et al.14 and confirmed here is the increase in small amino acids in thermophiles. This could have two stabilizing effects: (1) packing small residues rather than large residues lowers the entropy cost of packing,45, 46 which could be particularly important at higher temperature; and (2) small residues can allow for more intimate interactions between helices. Consistent with this latter view, we also find a modest increase in the average fraction of side-chain surface area buried in thermophiles. Finally, we observe a possible small reduction in the number of high energy kinks.
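The temperature argument above can be made concrete with a small numerical sketch. It assumes a simple two-state Boltzmann model in which a segment is either inserted or not, with a temperature-independent apparent free energy of insertion; this model, the function names, and the example values of −1.0 kcal/mol, 310 K, and 350 K are illustrative assumptions rather than quantities taken from the analysis.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def insertion_probability(delta_g, temp_k):
    """Two-state probability of the inserted state.

    delta_g is the apparent free energy of insertion in kcal/mol
    (negative values favor insertion); temp_k is in kelvin.
    """
    return 1.0 / (1.0 + math.exp(delta_g / (R_KCAL * temp_k)))

def delta_g_for_same_probability(delta_g_ref, temp_ref, temp_new):
    """Free energy needed at temp_new to match the insertion
    probability obtained with delta_g_ref at temp_ref."""
    return delta_g_ref * temp_new / temp_ref

# Hypothetical helix with a constant delta_g of -1.0 kcal/mol.
p_cool = insertion_probability(-1.0, 310.0)
p_hot = insertion_probability(-1.0, 350.0)
needed = delta_g_for_same_probability(-1.0, 310.0, 350.0)
```

Under these assumptions the same favorable free energy yields a lower insertion probability at the higher temperature, and a more negative free energy is needed to recover the original probability.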
A significant caveat with a global comparison of mesophiles and thermophiles is that we interpret all changes as reflecting the effects of growth temperature, the only common parameter that changes between the two groups of proteins analyzed here. But clearly there are other parameters that change when comparing proteins from different organisms. Moreover, some proteins in mesophiles are not any less stable than a thermophilic protein, but we do not have stability data for most membrane proteins at this time. Also, as pointed out by Szilágyi and Závodszky,3 any given protein may evolve a different predominant method of stabilization. As a result of all these variables, many actual stabilizing mechanisms may disappear in the noise. A careful parsing of the different effects will be more feasible when we have many examples of mesophile and thermophile structures from the same family, along with stability measurements, so that direct comparisons can be made.
Materials and Methods
A nonredundant database of mesophiles and thermophiles was created using α-helical membrane proteins from Stephen White's database (http://blanco.biomol.uci.edu/mpstruc/listAll/list).30 Growth temperatures were obtained from the American Type Culture Collection and web searches (Supporting Information Table S2). Structures were divided into mesophiles and thermophiles based on a 50°C optimal growth temperature cutoff.
Each group of structures (mesophiles and thermophiles) was made nonredundant separately. Within each group, no two protein sequences were similar to each other at the level of an expectation value of 1 × 10⁻¹⁰ according to BLAST.31
For each structure, the author-recommended oligomer was chosen as described by REMARK 350 in each Protein Data Bank (PDB) file. Only unique chains (i.e., nonidentical) were used for analysis. Monotopic membrane proteins, nuclear magnetic resonance, and electron crystallography structures were eliminated. We also eliminated structures that had fewer than three transmembrane helices per oligomer. Prosthetic groups for each structure were retained, but lipids and other nonintrinsic components were removed from each structure. The PDB codes for the two groups are as follows.
Structurally homologous pairs were identified using the structural alignment tool TM-align.32 One set of pairs was constructed with no resolution cutoff that could be used for sequence-based analysis, and a second set was constructed using a 3.2 Å or better resolution cutoff that was used for analyzing more detailed features. Only unique chains were used for structural pairing. Each thermophile structure was structurally aligned to every mesophile structure. A pair was considered similar if the alignment involved more than 100 residues and the root mean square deviation was less than 5 Å. The longest alignment was chosen as the representative pairing for each thermophile structure. Each thermophile was matched to one mesophile sequentially and in a nonredundant fashion so that no two thermophiles were matched to the same mesophile. Analysis (e.g., burial, packing, helix length, etc.) of structural pairs was restricted to aligned regions only.
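The pairing rules above can be sketched as a short procedure. This is one possible reading of the description, not the code used in the study: thermophiles are processed in the order given, each takes its longest qualifying alignment to a mesophile that has not already been used, and the input tuples are assumed to have been parsed from TM-align output beforehand. All identifiers and example values are hypothetical.

```python
def pair_structures(alignments, min_len=100, max_rmsd=5.0):
    """Match each thermophile to at most one mesophile.

    alignments: iterable of (thermophile_id, mesophile_id,
    aligned_length, rmsd) tuples. A candidate qualifies when more
    than min_len residues are aligned and the RMSD is below max_rmsd.
    """
    candidates = {}
    for thermo, meso, length, rmsd in alignments:
        if length > min_len and rmsd < max_rmsd:
            candidates.setdefault(thermo, []).append((length, meso))

    pairs, used_mesophiles = {}, set()
    # Dict insertion order gives the sequential processing order.
    for thermo, options in candidates.items():
        # Try the longest alignment first, skipping used mesophiles.
        for length, meso in sorted(options, reverse=True):
            if meso not in used_mesophiles:
                pairs[thermo] = meso
                used_mesophiles.add(meso)
                break
    return pairs

# Hypothetical alignment summaries.
example = [
    ("T1", "M1", 250, 3.0),
    ("T1", "M2", 180, 2.5),
    ("T2", "M1", 300, 4.0),
    ("T2", "M3", 120, 4.5),
    ("T3", "M4", 90, 2.0),
]
pairs = pair_structures(example)
```

Because the matching is sequential, the result can depend on the order in which thermophiles are processed; a global assignment method would remove that dependence but is not what the description suggests.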
Transmembrane region assignment
Transmembrane regions were obtained from a consensus database constructed from the Protein Data Bank of Transmembrane Proteins (PDBTM) and Orientations of Proteins in Membranes (OPM) database, or using TMDET to identify transmembrane regions for those proteins not present in these databases.33–35 The assigned transmembrane regions are summarized in Supporting Information Table S3.
Burial data were obtained by calculating relative solvent accessibilities per residue using a fast version of the Shrake and Rupley algorithm36 implemented in the Ezprot library (http://www.doe-mbi.ucla.edu/local/software/ezprot).37 Only unpaired and paired structures of 3.2 Å or better resolution were analyzed. Burial values were calculated separately for soluble and transmembrane regions of unpaired thermophiles and mesophiles, as well as for the paired thermophile–mesophile structures. For the unpaired structures, values were averaged over all mesophiles and thermophiles separately. In computing the average, burial values were weighted by the number of residues in a given protein to obtain a weighted burial average over all proteins; this ensured that burial values from smaller proteins did not unduly bias the results calculated over all proteins. For the paired structures, difference values between each thermophile–mesophile pair were averaged over all thermophile–mesophile pairs.
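The residue-weighted average described above can be sketched as follows. The function name and the example burial fractions and residue counts are hypothetical and serve only to show how weighting changes the result relative to a plain mean.

```python
def weighted_mean(values, weights):
    """Average per-protein values, weighting each by its residue count."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical per-protein burial fractions and residue counts.
burial = [0.80, 0.70]
n_residues = [300, 100]

weighted = weighted_mean(burial, n_residues)   # larger protein dominates
unweighted = sum(burial) / len(burial)         # each protein counts equally
```

With these numbers the weighted average is pulled toward the larger protein, which is the intended effect of the weighting.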
Packing values were calculated using Rother et al.'s Voronoia program for computing packing densities.38 Surface residues were excluded from the calculation. A 0.20-Å grid distance and the ProtOr radii were chosen as parameters based on similar ones chosen by Hildebrand et al.16 Only unpaired and paired structures of 3.2 Å or better resolution were analyzed. Packing values were calculated separately for soluble and transmembrane regions of unpaired thermophiles and mesophiles, as well as for the thermophile–mesophile paired structures (as was done above for burial). For the unpaired structures, the values were averaged over all thermophiles and mesophiles separately. In computing the average, packing values were weighted by the number of residues in a given protein to obtain a weighted packing average over all proteins. For the paired structures, difference values between each thermophile–mesophile pair were averaged over all thermophile–mesophile pairs.
Interhelical hydrogen bonding calculation
HBPLUS was used to identify hydrogen bonds.39–41 Only structures of 3.2 Å or better resolution were analyzed and we only counted hydrogen bonds between transmembrane helices (interhelical hydrogen bonds). Interhelical hydrogen bonds were counted for each unpaired thermophile or mesophile structure and normalized by the number of transmembrane residues in each structure. The average number of interhelical hydrogen bonds per transmembrane residue was then calculated for all thermophile or mesophile proteins separately. Similarly, for paired structures, interhelical hydrogen bonds were counted for each paired thermophile or mesophile structure and normalized by the number of transmembrane residues in each paired structure. The difference in this value between each thermophile–mesophile paired structure was then calculated. These differences were then averaged over all paired structures.
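The normalization and paired-difference averaging described above can be sketched as follows. The standard error is assumed here to be the sample standard deviation of the differences divided by the square root of the number of pairs; the text does not state the formula, and the function names and example counts are hypothetical.

```python
import math
import statistics

def per_residue(count, n_tm_residues):
    """Normalize a raw count by the number of transmembrane residues."""
    return count / n_tm_residues

def paired_difference(thermo_values, meso_values):
    """Mean and standard error of thermophile-minus-mesophile differences.

    The two lists must be in matching pair order.
    """
    diffs = [t - m for t, m in zip(thermo_values, meso_values)]
    mean = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, sem

# Hypothetical (hydrogen-bond count, TM residue count) per paired structure.
thermo = [per_residue(c, n) for c, n in [(20, 200), (30, 250), (16, 200)]]
meso = [per_residue(c, n) for c, n in [(22, 200), (25, 250), (24, 200)]]
mean_diff, sem_diff = paired_difference(thermo, meso)
```

A mean difference within about one standard error of zero would be read as no detectable difference between the paired structures.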
Transmembrane helix length calculation
Transmembrane helix length was examined in unpaired and paired structures with no restriction on structure resolution. To define the extent of transmembrane helices, we first started with the helical transmembrane segment in the hydrocarbon core region of the bilayer, and then continued the helices out to the helix endpoints as defined by DSSP.40, 41 No structures with fewer than three transmembrane helices were considered in analyzing helix length. For unpaired structures, helix lengths were averaged over each structure and then over all thermophiles (or mesophiles). For paired thermophile–mesophile structures, helix lengths were averaged over the chain of each paired structure for aligned residues. The difference in average helix length between the two paired structures was then calculated. These differences were then averaged over all paired structures.
The biological hydrophobicity scale was used to calculate the hydrophobicity of transmembrane segments.29 For each unpaired structure, hydrophobicity was averaged over each transmembrane segment. The average for each unpaired structure was then averaged over all thermophile or mesophile structures. Similarly, for each paired structure, hydrophobicity was averaged over each transmembrane segment. The difference in average hydrophobicity between paired structures was then calculated. Finally, the average of differences in hydrophobicity between paired structures was calculated.
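The two-level averaging described above, first within each transmembrane segment and then across segments of a structure, can be sketched as follows. The scale passed in below contains placeholder numbers chosen for easy arithmetic; they are not the published biological hydrophobicity scale values, and the function name is hypothetical.

```python
def mean_segment_hydrophobicity(segments, scale):
    """Average hydrophobicity of a structure's transmembrane segments.

    segments: list of amino acid strings, one per transmembrane segment.
    scale: dict mapping one-letter residue codes to a per-residue
    free energy (kcal/mol).
    """
    per_segment = [sum(scale[aa] for aa in seg) / len(seg) for seg in segments]
    return sum(per_segment) / len(per_segment)

# Placeholder scale for illustration only, NOT the published values.
toy_scale = {"L": -0.5, "A": 0.1, "D": 3.0}

structure_mean = mean_segment_hydrophobicity(["LLA", "LD"], toy_scale)
```

Averaging per segment first means that a short segment contributes as much to the structure average as a long one.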
Transmembrane helical kink calculation
Transmembrane helical kinks were identified as described previously.42 Only structures of 3.2 Å or better resolution were analyzed. Kinks were counted for each unpaired thermophile or mesophile structure and normalized by the number of transmembrane residues in each structure. The average number of kinks per transmembrane residue was then calculated for all thermophile or mesophile proteins separately. For the paired structures, kinks were normalized by the number of transmembrane residues in each paired structure. The difference in this value between each thermophile–mesophile paired structure was then calculated. These differences were then averaged over all paired structures.
Direct homolog identification for surface loop length and transmembrane amino acid preference studies
Neither the unpaired nor the paired structures provided sufficient statistical power to study surface loop length and transmembrane amino acid preferences. As a result, we identified direct sequence homologs of the unpaired mesophile structures using BLAST (identifying both mesophiles and thermophiles).31 We used an expectation cutoff value of 1 × 10⁻¹⁰ and limited the number of homologs identified to 10,000 for each unpaired mesophile structure's sequence. These BLAST results were screened for thermophile and mesophile homologs according to species name. We have provided the species names used in the Supporting Information. Only homolog alignments whose length was at least 70% of the length of the query mesophile sequence were retained. The average number of mesophile sequences obtained in this manner was 955 and the average number of thermophile sequences was 34.
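The screening steps above can be sketched as a filter over parsed BLAST hits. The record layout (dictionaries with `evalue`, `align_length`, and `species` keys), the function names, and the species sets are assumptions made for illustration; the actual species lists are those given in the Supporting Information.

```python
def filter_homologs(hits, query_length, max_evalue=1e-10,
                    min_coverage_pct=70, max_hits=10000):
    """Keep hits that pass the expectation-value and coverage cutoffs.

    Coverage is tested with integer arithmetic so that an alignment
    covering exactly 70% of the query is retained.
    """
    kept = []
    for hit in hits[:max_hits]:
        long_enough = hit["align_length"] * 100 >= min_coverage_pct * query_length
        if hit["evalue"] <= max_evalue and long_enough:
            kept.append(hit)
    return kept

def split_by_species(hits, thermophile_species, mesophile_species):
    """Sort retained hits into thermophile and mesophile groups by name."""
    thermo = [h for h in hits if h["species"] in thermophile_species]
    meso = [h for h in hits if h["species"] in mesophile_species]
    return thermo, meso

# Hypothetical hits for a 300-residue query.
hits = [
    {"evalue": 1e-50, "align_length": 280, "species": "Thermus thermophilus"},
    {"evalue": 1e-5, "align_length": 290, "species": "Escherichia coli"},
    {"evalue": 1e-20, "align_length": 100, "species": "Escherichia coli"},
    {"evalue": 1e-12, "align_length": 210, "species": "Escherichia coli"},
]
kept = filter_homologs(hits, query_length=300)
thermo, meso = split_by_species(kept, {"Thermus thermophilus"}, {"Escherichia coli"})
```

Hits from species in neither set are dropped by `split_by_species`, which mirrors screening by an explicit list of species names.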
Surface loop length calculation
Using the direct homologs of mesophiles to identify additional thermophiles and mesophiles, we analyzed loop length between thermophiles and mesophiles. Loops were identified as the intervening regions between transmembrane segments. We ignored loops longer than 50 residues to eliminate folded water-soluble domains. Only loops that were entirely contained within direct homolog alignments were considered. We analyzed one randomly chosen thermophile or mesophile homolog for every original, unique chain of each mesophile structure to avoid double counting homologs. Loop lengths were then averaged over all mesophiles or thermophiles.
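The loop definition above can be sketched as follows, assuming transmembrane segments are given as inclusive (start, end) residue positions sorted along the sequence. The function name and the example coordinates are hypothetical.

```python
def loop_lengths(tm_segments, max_loop=50):
    """Lengths of the loops between consecutive transmembrane segments.

    tm_segments: list of (start, end) residue positions, inclusive and
    sorted by position. Loops longer than max_loop are ignored, since
    they are likely to be folded water-soluble domains.
    """
    lengths = []
    for (_, prev_end), (next_start, _) in zip(tm_segments, tm_segments[1:]):
        length = next_start - prev_end - 1
        if 0 <= length <= max_loop:
            lengths.append(length)
    return lengths

# Hypothetical protein with four transmembrane segments.
segments = [(10, 30), (41, 62), (150, 172), (180, 200)]
loops = loop_lengths(segments)  # the 87-residue stretch is skipped
mean_loop = sum(loops) / len(loops)
```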
Transmembrane amino acid preferences calculation
Using the direct homologs of mesophiles to identify additional thermophiles and mesophiles, we analyzed transmembrane amino acid preferences between thermophiles and mesophiles. Only transmembrane amino acids that were contained within direct homolog alignments were considered. Sequence gaps were not counted. We employed a weighting scheme to prevent overcounting of highly similar sequences. For example, if two aligned sequences are 100% identical, each amino acid at each position would be counted twice if unweighted although the second sequence provides no new information. Counts were therefore weighted according to the average sequence identity between the original structure and the homolog sequence. The first homolog's sequence was given full weight for each amino acid, whereas additional homolog amino acids were given a weight of (1 − fractional identity). In other words, if the homolog sequence was 90% identical to the sequence of the structure, the count for every amino acid in the sequence was incremented only by 0.1 rather than 1. The number of amino acids of each type was counted for mesophiles or thermophiles. These counts were then normalized by the total number of amino acids for mesophiles or thermophiles. Finally, we calculated the ratio between thermophiles and mesophiles of each amino acid type.
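The weighting scheme above can be sketched as follows. The input layout (a list of aligned transmembrane sequences with their fractional identity to the structure's sequence, the first entry receiving full weight) and the function names are assumptions for illustration, and the example sequences are hypothetical.

```python
from collections import Counter

def weighted_composition(homologs):
    """Identity-weighted amino acid frequencies.

    homologs: list of (aligned_tm_sequence, fractional_identity) tuples.
    The first sequence gets full weight; each later one is weighted by
    (1 - fractional identity). Gap characters ('-') are not counted.
    """
    counts = Counter()
    for index, (sequence, identity) in enumerate(homologs):
        weight = 1.0 if index == 0 else 1.0 - identity
        for aa in sequence:
            if aa != "-":
                counts[aa] += weight
    total = sum(counts.values())
    return {aa: count / total for aa, count in counts.items()}

def preference_ratio(thermo_freq, meso_freq):
    """Thermophile/mesophile frequency ratio for residues seen in both."""
    return {aa: thermo_freq[aa] / meso_freq[aa]
            for aa in thermo_freq if meso_freq.get(aa, 0) > 0}

# Hypothetical aligned transmembrane fragments.
thermo = weighted_composition([("AAL", 1.0), ("AGL", 0.9)])
meso = weighted_composition([("ALL", 1.0), ("A-L", 0.5)])
ratios = preference_ratio(thermo, meso)
```

A ratio above 1 indicates enrichment of that residue type in thermophiles under this weighting, and a ratio below 1 indicates depletion.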
The authors thank members of the Bowie lab for comments on the manuscript.