Amino-acid interactions in psychrophiles, mesophiles, thermophiles, and hyperthermophiles: Insights from the quasi-chemical approximation



We investigate the mechanisms used by proteins to maintain thermostability throughout a wide range of temperatures. We use the quasi-chemical approximation to estimate interaction strengths for psychrophiles, mesophiles, thermophiles, and hyperthermophiles. Our results highlight the importance of core packing in thermophilic stability. Although we observed an increase in the number of charged residues, the contribution of salt bridges appears to be relatively modest by comparison. We observed results consistent with a gradual loosening of structure in psychrophiles, including a weakening of almost all types of interactions.

Organisms live in a wide range of environments, including extremes of temperature, pH, salt concentration, and pressure. While intracellular pH and salt concentration can be modulated, the various proteins in prokaryotes must be stable and functional throughout the entire range of temperatures that these organisms experience. At first it might seem surprising that the same mechanisms of protein stabilization act in both psychrophiles at temperatures below 0°C and hyperthermophiles active above 100°C. Because the relatively modest thermodynamic stability of proteins (5–10 kcal/mol) results from the small difference between large stabilizing (hydrophobic, ion-pair, hydrogen-bonding, van der Waals interactions) and destabilizing (conformational entropy) contributions, small changes in the number or strength of these contributions can cause a large proportional change in this difference. In fact, the wide range of interactions that can be adjusted means that different proteins from different organisms can use various combinations of modulations to adjust their thermostability. This has made it difficult to delineate common principles (Vogt and Argos 1997; Jaenicke and Bohm 1998).

One of the difficulties in studying the thermostability of organisms adapted to different temperatures is that the interactions that stabilize proteins are themselves temperature dependent. One of the main stabilizing effects in proteins is the hydrophobic effect. At room temperature, where the hydrophobic effect is maximal, contact between nonpolar molecules and aqueous solvent causes the solvent molecules to organize in order to reduce enthalpically unfavorable interactions. The result is a decrease in the entropy of the solvent, which is the dominant contribution. This entropic contribution to the free energy has an explicit temperature dependence, decreasing as the temperature is reduced and contributing to the phenomenon of cold denaturation. As the temperature is increased, the entropic character of the hydrophobic effect gives way to an enthalpic contribution, as it becomes more favorable to lose the enthalpically favorable interactions than to structure the higher-temperature water. This changing nature of the hydrophobic interaction results in a weakening hydrophobic effect as the temperature is increased above room temperature.

Other interactions can have their own temperature dependences. The conformational entropy of the protein is, like all entropic contributions, explicitly temperature dependent. Even van der Waals interactions, salt bridges, and hydrogen bonds can lose effectiveness at higher temperatures, as molecules become more disordered and fluxional and optimal geometries are no longer maintained.

The central question for thermophilic proteins is how they avoid denaturing at high temperatures. There has been much interest in thermophiles for two important reasons. One is the possibility that life has a thermophilic origin, arising, for instance, in deep-sea vents (Wiegel and Adams 1998). There are also important biotechnological applications for proteins that are able to function at higher temperatures. For this reason, many would like to engineer higher thermostability into normally mesophilic proteins. It is important to note, however, that while the mechanisms of natural thermophilic stabilization can provide insights into possible approaches for such protein-engineering applications, there might not be a strict equivalence between the gradual changes in protein sequence that can be accomplished through natural evolution and the relatively few mutations that are technologically optimal for engineered thermostability. There is experimental evidence, for instance, that enhanced thermal stability can result from interactions distributed throughout the structure (Hollien and Marqusee 1999).

Much less work has been done on psychrophiles, even though most of the biosphere consists of psychrophilic environments. Although proteins undergo cold denaturation owing to the temperature dependence of the hydrophobic effect, the emphasis has been more on function than on structure: how, for instance, enzymes can function at temperatures where significant activation energies are difficult to obtain, and where the flexibility required for catalysis may be reduced. Interestingly, however, it has been observed that regions of psychrophilic proteins not involved in catalysis might be equally or even more rigid than their mesophilic homologs (Liang et al. 2004; Papaleo et al. 2006), possibly related to the reduced efficacy of the hydrophobic interactions.

Two different approaches have generally been used to look at thermostability in extremophiles. The first involves studying a specific set of proteins, such as a mesophilic protein and its thermophilic homolog, and examining in depth the differences between them. The advantage of this approach is the detailed understanding it can generate of the specific proteins under examination. The disadvantage is the uncertainty of extrapolating the insight gained to other proteins that may use quite different mechanisms for thermostability. The second approach involves a broader investigation of sequence and structure properties of a set of database proteins by examining the frequency of different residues or different interactions at different types of locations. The result is a broader survey of the ensemble of different methods for changing thermostability. The difficulty is estimating the effect of these different interactions. The effect of an increased number of exposed salt bridges, for instance, can only be assessed if we have an idea about how much they would contribute to the folding free energy difference, a matter of some disagreement (Hendsch and Tidor 1994; Elcock and McCammon 1997).

One method of bridging this gap is to use the more general properties of the set of database proteins to estimate interaction strengths. This can be done using the quasi-chemical approximation first applied to proteins by Miyazawa and Jernigan (1985). They postulated that the number of any given type of interaction would be proportional to the Boltzmann factor of the strength of that interaction. While many of the assumptions in their analysis are unlikely to be strictly true, other investigators have established justifications for the Boltzmann-like distribution of interactions based on evolutionary considerations (Finkelstein et al. 1995a,b). The advantage of this approach is that one can look over a large database of structures and derive general principles, including estimates of the strength of temperature-dependent interactions, complementing the other approaches that have been developed.
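The core idea, that contact counts follow a Boltzmann factor of the contact energy, can be illustrated by inverting that factor. This is a minimal sketch for a single contact type, assuming an observed count and an expected count under random mixing are already available; the function name is hypothetical, and the full method described in Materials and Methods works with ratios of ij, ii, and jj contacts rather than a single count.

```python
import math

def contact_energy_kT(observed: float, expected: float) -> float:
    """Effective contact energy in units of kT under a Boltzmann-like assumption.

    If observed / expected = exp(-energy), then energy = -ln(observed / expected).
    A negative value means the contact is over-represented relative to random
    mixing (favorable); a positive value means it is under-represented.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected counts must be positive")
    return -math.log(observed / expected)

# Hypothetical counts: 150 observed contacts where random mixing predicts 100.
print(round(contact_energy_kT(150, 100), 3))  # -0.405
```

The energy is expressed in units of kT, as in the quasi-chemical treatment; no absolute temperature is needed to compare interaction types within one data set.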

In this study we apply the quasi-chemical approach of Miyazawa and Jernigan (1985) to four databases of proteins, from psychrophiles, mesophiles, thermophiles, and extreme thermophiles. We observed trends that seem to highlight the importance of packing interactions in thermophiles and extreme thermophiles, with a more modest role for salt bridges and other charged interactions. We also observed trends in the psychrophilic proteins consistent with less compact structures.


We analyzed a database of 47 proteins from psychrophilic organisms (maximum growth or general physiological temperature below 20°C), 814 proteins from mesophilic prokaryotic organisms (20°C–45°C), 269 proteins from thermophilic organisms (45°C–80°C), and 334 proteins from hyperthermophilic organisms (80°C and above). (The definition of “psychrophilic” was slightly expanded from the more common definition of organisms that thrive below 15°C in order to increase the size of the corresponding psychrophilic protein database.) The quasi-chemical approximation was used to compute the interaction potential for contacts between different amino acid types. We also computed the contact potential for interactions between amino acids from five broader categories of residues, dividing residues into positively charged (“POS”: Arg, Lys), negatively charged (“NEG”: Asp, Glu), aromatic (“ARO”: Phe, Trp, Tyr), uncharged polar (“POL”: Asn, Cys, Gln, His, Ser, Thr), and hydrophobic (“FOB”: Ala, Gly, Ile, Leu, Met, Pro, Val).
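The five-category grouping above can be written down directly as a lookup table. This is a minimal sketch for computing category fractions of the kind plotted in Figure 1A, assuming residues are pooled over all proteins in a set (averaging per protein would weight the sets differently); the helper names are hypothetical.

```python
from collections import Counter

# Five residue categories as listed in the text (one-letter amino-acid codes).
CATEGORIES = {
    "POS": "RK",       # Arg, Lys
    "NEG": "DE",       # Asp, Glu
    "ARO": "FWY",      # Phe, Trp, Tyr
    "POL": "NCQHST",   # Asn, Cys, Gln, His, Ser, Thr
    "FOB": "AGILMPV",  # Ala, Gly, Ile, Leu, Met, Pro, Val
}
RESIDUE_TO_CATEGORY = {aa: cat for cat, aas in CATEGORIES.items() for aa in aas}

def category_fractions(sequences):
    """Fraction of residues falling in each category, pooled over all sequences.

    Letters outside the 20 standard amino acids (e.g. X, U) are skipped.
    """
    counts = Counter(
        RESIDUE_TO_CATEGORY[aa]
        for seq in sequences
        for aa in seq.upper()
        if aa in RESIDUE_TO_CATEGORY
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {cat: counts[cat] / total for cat in CATEGORIES}

print(category_fractions(["MKEELR"]))  # POS, NEG and FOB each 1/3; ARO and POL 0
```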

Detailed data are presented in the Supplemental material. The fractions of amino acids from the various categories in the four protein sets are shown in Figure 1A. The average number of residue–residue contacts formed by each of the amino acids in each category, normalized by the expected total number of (residue plus solvent) contacts, is shown in Figure 1B. The contact potential between different categories of residues is shown in Figure 2, expressed as εij, the energy of forming a contact between these two residue types plus a solvent–solvent bond, relative to the energy of the two residue–solvent bonds that are lost. We can also compute εir, the value of εij averaged over the different amino acid types forming potential interaction partners, and εrr, the average value of εij over all potential pairs of amino acids; both are shown in Figure 1C. The difference εir–εrr, shown in Figure 1D, compares the average energy of a contact between a residue of type i and an average residue of the protein with the corresponding average over all residues; it therefore measures how strongly each residue type prefers to be buried in the protein, away from the solvent. Finally, we can compute the fraction that each type of interaction contributes to the overall protein stability, as shown in Figure 3.
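The averages εir and εrr, and the burial measure εir–εrr, can be sketched as weighted means of the εij matrix. This sketch assumes that partners are weighted by their composition fraction; the procedure in Materials and Methods instead uses expected contact numbers for a hypothetical 200-residue protein, so the weights here are a simplification and the function name is hypothetical.

```python
def burial_preference(eps, x):
    """Composition-weighted averages of a contact-energy matrix.

    eps: symmetric K x K list of lists, eps[i][j] = contact energy (units of kT).
    x:   composition of the K residue types (need not be normalized).
    Returns (eps_ir, eps_rr, eps_ir - eps_rr). Weighting partners by composition
    is a simplifying assumption made for this sketch.
    """
    k = len(x)
    total = sum(x)
    w = [xi / total for xi in x]
    # eps_ir: average energy of a contact between type i and a random partner
    eps_ir = [sum(eps[i][j] * w[j] for j in range(k)) for i in range(k)]
    # eps_rr: average over all residue pairs
    eps_rr = sum(w[i] * eps_ir[i] for i in range(k))
    return eps_ir, eps_rr, [e - eps_rr for e in eps_ir]

# Two hypothetical residue types in equal proportion.
ir, rr, pref = burial_preference([[-2.0, -1.0], [-1.0, 0.0]], [1, 1])
print(ir, rr, pref)  # [-1.5, -0.5] -1.0 [-0.5, 0.5]
```

A more negative εir–εrr (the first type here) indicates a stronger preference for the protein interior.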

Figure 1.

Various parameters as a function of temperature for different categories of residues: positively charged (red), negatively charged (yellow), uncharged polar (green), aromatic (purple), and hydrophobic (blue). (A) Composition of the proteins in the four databases among the different residue categories. (B) Average number of contacts made by residues in each of the five categories, relative to the maximum expected. (C) εir and εrr (black). (D) εir–εrr, measuring the relative tendency of the various amino acids to be buried in the protein interior. Errors were calculated as described in the Materials and Methods section. In this plot, as well as the following plots, error bars not shown are smaller than the smallest error bars shown.

Figure 2.

Interaction energy εij between residues of different categories as a function of temperature. εij represents the energy of forming a contact between residues of categories i and j as well as a solvent–solvent contact, compared with the energy of contacts between the two residues and solvent. Solid lines represent contact energy between residues of the same type, using the same coloring scheme as in Figure 1, e.g., solid blue line represents the contact energy between two hydrophobic residues. Multicolored dashed lines represent the contact energy between residues of different categories, e.g., green–purple line represents contact energy between an aromatic and an uncharged polar residue. Errors were calculated as described in the Materials and Methods section.

Figure 3.

Fraction of the interaction energy contributed by contacts between each pair of residue categories, using the same color scheme as in Figure 2. Errors were calculated as described in the Materials and Methods section.

Thermophiles and hyperthermophiles

Previous studies have generally ensured that the data sets corresponding to the various categories of proteins contain corresponding homologous proteins. This is important for avoiding systematic biases when comparing quantities such as the relative fractions of the various amino acids. We did not impose this requirement here, because we wanted to maximize the size of the database used and because we are most interested in computing quantities whose accuracy depends on the independent validity of the quasi-chemical assumption. Still, the changes in the relative fractions of the different types of residues in thermophiles relative to mesophiles are similar to those calculated by other researchers. We observed a significant increase in the number of charged residues (Glu, Lys, and Arg, but not Asp) and a more modest increase in the number of aromatic residues (Tyr and Phe, but not the less thermostable Trp) (Jaenicke and Bohm 1998), as well as a corresponding decline in the number of uncharged polar residues (Gln, Asn, Ser, His, Thr) (Haney et al. 1999; Cambillau and Claverie 2000; Chakravarty and Varadarajan 2002). Similar to Zeldovich et al. (2007), we observed a substantial increase in the sum of Ile, Val, Tyr, Trp, Arg, Glu, and Leu, although most of this increase is due to the contributions of Glu and Val. The increase in charged residues is consistent with theories that stress the role of surface-residue salt bridges in achieving stability (Vogt et al. 1997; Elcock 1998; Perl et al. 2000; Dominy et al. 2002; Zhou and Dong 2003), as well as the effect these surface charges have on the dielectric properties of the proteins (Dominy et al. 2004).
In addition, it has been suggested that interactions between cations and the π-orbitals of aromatic residues (Chakravarty and Varadarajan 2002), or between residues in aromatic clusters (Kannan and Vishveshwara 2000), could provide further stabilization, consistent with our observed increase in the number of aromatic residues. We observed a significant increase in Pro in thermophiles, but not in hyperthermophiles. We also observed a significant decrease in Cys, potentially due to its relatively low thermostability (Jaenicke and Bohm 1998).

It has been suggested that proteins use a variety of different mechanisms for achieving thermostability in thermophiles and extreme thermophiles. Consistent with this picture, there was an increase in the magnitude of almost all interactions, as indicated by the value of εij, the energy of forming a contact between residue type i and residue type j relative to those residues being in contact with solvent.

Core packing

One of the common mechanisms used to explain the increased thermostability of thermophilic proteins involves tighter packing of the interior (Jaenicke and Zavodszky 1990; Britton et al. 1995; Chan et al. 1995; Russell et al. 1997; Scandurra et al. 1998; Gromiha et al. 1999; Lo Leggio et al. 1999; Jaenicke 2000; Criswell et al. 2003; England et al. 2003; Berezovsky and Shakhnovich 2005; Pack and Yoo 2005; Nakamura et al. 2006), although analyses of protein structures have yielded mixed and inconclusive results (Vogt and Argos 1997; Karshikoff and Ladenstein 1998; Szilagyi and Zavodszky 2000). Our results are consistent with the tighter-packing hypothesis: the largest increase in interaction energy was in the interaction between hydrophobic residues, which increased by 9% between mesophiles and thermophiles, with an additional 23% increase between thermophiles and hyperthermophiles. There was also a significant increase in the strengths of the contacts between hydrophobic residues and aromatic, uncharged polar, and charged residues, as well as between aromatic residues. The same trend was observed in the increasingly negative values of εir and εir–εrr for hydrophobic residues with increasing temperature. As shown in Figure 3, the fraction of the interaction energy that came from interactions between hydrophobic residues increases steadily from mesophiles to thermophiles to hyperthermophiles. Figure 1B supports this observation, showing an increase in the number of inter-residue contacts made by hydrophobic residues with increasing temperature.

There was also an increase in the stabilization energy for interactions between aromatic residues, consistent with the suggestion that aromatic clusters may enhance thermophilic stability (Kannan and Vishveshwara 2000). We also observed substantially increased contact energy between uncharged polar residues, as well as between charged and polar residues. Many uncharged polar and charged residues are hydrogen-bond donors (Arg, Asn, Gln, His, Lys, Ser, Thr, Trp, and Tyr), acceptors (Asn, Asp, Glu, Gln, His, Ser, Thr, and Tyr), or both. Stronger hydrogen bonding, which has been proposed to contribute to thermophilic stabilization (Tanner et al. 1996; Vogt and Argos 1997), would appear as increased stabilization energy between polar residues and between polar and charged residues, as was observed.

Salt bridges

The role of external salt bridges in stabilizing proteins from thermostable organisms has been a matter of debate (Hendsch and Tidor 1994; Honig and Yang 1995; Xiao and Honig 1999). There is increasing evidence for the importance of these salt bridges (Vogt et al. 1997; Elcock 1998; Perl et al. 2000; Dominy et al. 2002; Zhou and Dong 2003), complemented by models of the effect surface charges have on the dielectric properties of the proteins (Dominy et al. 2004) as well as why these salt bridges might be particularly important for thermophiles (Elcock 1998; Thomas and Elcock 2004).

We observed a slight increase in the strength of the interaction between charged residues, about 10% between mesophiles and thermophiles, and only a small additional increase between thermophiles and hyperthermophiles. There was a small increase in the fraction of the stabilizing interaction energy due to interactions between positive and negative residues, from 2.3% for mesophiles to 2.7% for thermophiles and 2.9% for hyperthermophiles. This increase is consistent with the hypothesized advantages of salt bridges at higher temperatures, although both the increase in the strength of the interaction and the increase in its overall importance, measured as a fraction of stabilization energy, are much smaller than the increases due to core packing described above. One disadvantage of this approach is the inability to distinguish between surface and buried salt bridges. It is interesting to note, however, the increased value of εir–εrr for charged residues, as would be predicted if these residues were increasingly found on the protein surface at higher temperatures. Figure 1B shows the same trend: the number of inter-residue contacts made by charged residues decreases as the temperature increases, as would be expected if more of these residues were located on the surface of the protein.
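Percentages such as the 2.3%, 2.7%, and 2.9% above are shares of a total interaction energy. A minimal sketch of that bookkeeping, assuming each category pair contributes εij times its number of contacts (the exact weighting behind Figure 3 is not specified here); the helper names and all numbers are hypothetical.

```python
def pair(a, b):
    """Canonical key for an unordered pair of residue categories."""
    return tuple(sorted((a, b)))

def energy_fractions(eps, contacts):
    """Share of the summed contact energy contributed by each category pair.

    eps:      {pair: contact energy eps_ij (units of kT)}
    contacts: {pair: number of contacts of that type}
    Contribution of a pair is assumed to be eps_ij * contacts_ij. With mixed-sign
    energies, individual shares can fall outside [0, 1].
    """
    contrib = {p: eps[p] * n for p, n in contacts.items()}
    total = sum(contrib.values())
    return {p: c / total for p, c in contrib.items()}

# Hypothetical numbers, for illustration only.
eps = {pair("FOB", "FOB"): -1.0, pair("POS", "NEG"): -0.5}
contacts = {pair("FOB", "FOB"): 90, pair("NEG", "POS"): 20}
shares = energy_fractions(eps, contacts)
print(shares[pair("NEG", "POS")])  # 0.1
```

Canonicalizing the pair keys means ("POS", "NEG") and ("NEG", "POS") refer to the same salt-bridge category.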

Cation–aromatic interactions

While there was an increase in the stabilizing effect of positively charged residues interacting with aromatic residues, it was paralleled by an increase in the interaction of negatively charged residues with aromatic residues, although the cation–aromatic interaction is consistently more stabilizing. This suggests either that the increase reflects changes in the interactions of these residues with solvent rather than an increased residue–residue interaction strength, or that the cation–aromatic interaction, which involves electrostatic attraction to the π-electron cloud of the aromatic ring, is obscured by the number of cationic and aromatic residues in close proximity.

Hyperthermophiles versus thermophiles

Many of the trends described above changed in a consistent pattern from mesophiles to thermophiles to hyperthermophiles. There were, however, some interesting differences. For instance, the frequency of charged residues increased continuously from mesophiles to thermophiles to hyperthermophiles, while most of the decrease in polar residues occurred between mesophiles and thermophiles.

The strength of contacts between hydrophobic residues, as well as between hydrophobic and polar residues, increased more sharply at the high-temperature end, that is, between thermophiles and hyperthermophiles. Conversely, the interaction strength between aromatic residues, as well as between oppositely charged residues, increased most significantly between mesophiles and thermophiles. The latter observation is interesting, because the contact energy in the quasi-chemical approximation represents the relative number of such contacts formed compared with what would be expected from a random assortment of residues. Comparison with Figure 1B and D suggests that the relative tendency for oppositely charged residues to associate does not significantly increase; rather, such associations are more likely to be located on the surface of the protein. There is, by contrast, much less change in the fraction of charged residues on the surface between mesophiles and thermophiles.


In psychrophiles, there is also a decline in the number of uncharged polar residues, as well as in the number of charged and aromatic residues (especially Asn, Lys, Tyr, Phe, and Glu), with an increase in the number of hydrophobic residues relative to mesophiles (especially Ala, Cys, Gly, Val, and Met). We also observed a substantial increase in the number of Pro residues. Comparison with prior analyses of amino acid composition in psychrophiles was difficult, as these studies have yielded ambiguous results (Saunders et al. 2003; Methe et al. 2005; D'Amico et al. 2006).

It has been suggested that psychrophilic proteins enhance their flexibility by decreasing the number and strength of various interactions (Wallon et al. 1997; Gianese et al. 2001; Violot et al. 2005). We observed a decrease in interaction energy, continuing the trends observed in moving from hyperthermophiles to thermophiles to mesophiles. The increase in the number of Gly residues would also serve to increase protein flexibility and to lower stability, by increasing the conformational entropy of the unfolded state. In particular, there is a significant decrease in the interaction between hydrophobic residues, and between hydrophobic and aromatic residues, as would be expected given the temperature dependence of the hydrophobic effect. Despite this decrease, the total number of hydrophobic residues increases, so that the fraction of the interaction energy that results from contacts between hydrophobic amino acids actually shows a noticeable increase. Again, this would be consistent with a reduction in other types of interactions, such as hydrogen bonds and salt bridges.


To explore the mechanisms that different proteins use to achieve thermostability, numerous studies have examined the numbers and types of interactions. Structural studies can count the interactions in different proteins, but it is difficult to determine the strength of these interactions without detailed computational modeling. The quasi-chemical approach instead compares the observed number of interactions with the number that would be expected for a random mixture of the amino acids in order to evaluate the strength of the interactions. The advantage of this perspective is the ability to perform this type of analysis on large databases, yielding more general principles than can be obtained by examining individual structures.

There is an interesting interplay between the temperature dependence of the intra- and intermolecular interactions, the requirements for protein stability and functionality, and the resulting selective pressure and consequent evolutionary dynamics. As an example, the decreasing efficacy of the hydrophobic interaction with increasing temperature might result in an increased contribution of other interactions to thermostability in thermophiles, or alternatively, in an increase in the number or effectiveness of hydrophobic interactions in order to compensate. The choice of strategies pursued by these thermophiles is likely to relate to issues of sequence entropy, that is, the number of sequence changes that create alternative interactions vs. the number that enhance the contributions from hydrophobic interactions. There might be other reasons why specific interactions cannot be used as substitutes for the hydrophobic effect; for instance, the relatively low thermostability of Cys (Jaenicke and Bohm 1998) precludes a greater reliance on disulfide bonds. The evidence presented in this study shows that, even with the reduced hydrophobic interaction, selective pressure results in an increased reliance on these interactions. This may reflect the plasticity of hydrophobic interactions: because they do not rely on specific interatomic distances and orientations, it is still easier to strengthen them than to replace them with other interactions.

It is important to remember that the potentials described in this study can be considered “potentials of mean force” (Kirkwood 1935), where the strength of a given interaction includes an integration over all of the possible ways that the rest of the system adapts to that interaction. If the formation of a salt bridge affects the structure of nearby water molecules, the free-energy changes of these water molecules will be represented in the interaction potential between the charged residues forming the salt bridge. Temperature dependences of the solvent properties will thus result in a temperature dependence of this charge–charge interaction. If salt bridges are more often found on the outside of the protein in thermophiles than in mesophiles, then solvent rearrangement will make a larger contribution to this interaction in thermophiles. For this reason it is to be expected that different data sets, such as mesophiles vs. thermophiles, would show different potentials, even for enthalpic interactions that would seem to be largely temperature independent.

There are some factors that may be important in thermostability that cannot be assessed through this type of analysis. For instance, changes in conformational entropy, such as those resulting from changes in the number or location of prolines and glycines, or through the shortening of external loops, are unobservable. In addition, there has been recent work looking at the role that the choice of protein structure might have on generating thermostable proteins (England et al. 2003). Again, such mechanisms, although interesting, cannot be observed through a quasi-chemical analysis.

Additionally, we note that our results describe the interactions that occur in the folded state relative to those that would be found in the same structure if the amino acids were “scrambled” to make random contacts. There has been interest in the question of “negative design”, that is, how alternative misfolded states might be specifically destabilized (Pakula and Sauer 1990; Bowler et al. 1993; Lattman and Rose 1993; Koshi and Goldstein 1997; Bolon et al. 2005). In a recent publication, Berezovsky et al. (2007) have suggested that the increased number of charged residues in thermophiles is precisely because of this negative design, that is, raising the energy of misfolded configurations. This argument might explain why we observed an increase in the number of charged residues in thermophiles and hyperthermophiles, although this increase seems to result in only a small increase in the contribution of salt bridges to stabilizing the native state.

Our results are in general agreement with prior observations. Although the hydrophobic effect is generally maximal at around room temperature, we observed that the strength of the interaction between hydrophobic residues increases in thermophiles and hyperthermophiles. We also observed an increase in the strength of interactions between charged and uncharged polar residues, consistent with stronger hydrogen-bond networks. Both of these effects would be expected to result from improved packing of the internal core of the protein. We observed an increase in salt-bridge interaction strengths, as well as an increase in the number of charged residues in organisms adapted to higher temperature. However, the overall effect of these more numerous and stronger salt bridges is surprisingly modest relative to the effect of the tighter packing.

The trends observed in going from mesophiles to thermophiles and hyperthermophiles are largely reversed in psychrophiles. We observed a decrease in the strength of almost all interactions, especially the hydrophobic interaction, as would be expected from a looser, more flexible protein and from the explicit temperature dependence of the hydrophobic effect. We observed a decrease in the numbers of charged and uncharged polar residues, as would be expected with a decrease in both hydrogen bonding and salt bridges. Although the hydrophobic effect decreases in strength, the decrease in the number of non-hydrophobic residues means that the relative contribution of the hydrophobic effect increases at lower temperatures.

Materials and Methods


We started with the SCOP database (1.71) (Murzin et al. 1995), containing 71,796 domains. The corresponding structures were extracted from the Protein Data Bank (PDB) (Berman et al. 2000). Membrane proteins were removed, and data sets were constructed consisting of proteins from one of 30 psychrophilic organisms (Set 1, optimal growth temperatures from 0°C to 18°C), eight prokaryotic mesophilic organisms (Set 2, optimal growth temperature 20°C–42°C), 81 thermophilic organisms (Set 3, 45°C–78°C), and 48 hyperthermophilic organisms (Set 4, above 80°C). Proteins from these organisms were clustered with BlastClust so that no two had more than 50% sequence identity. Proteins were also excluded if they lacked side-chain coordinates, were shorter than 100 residues or longer than 1000, or were no longer in the current version of the PDB. Sets 2, 3, and 4 were then culled so that their overall distributions of protein sizes matched that of Set 1. The final data sets comprised 47 proteins (Set 1), 814 proteins (Set 2), 269 proteins (Set 3), and 334 proteins (Set 4).
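The length filter and the size-distribution culling can be sketched as follows. This is an illustration only: the 100-residue bin width, the rule of keeping as many proteins as the reference proportions allow, and the random subsampling are all assumptions, and the BlastClust redundancy step is not reproduced. The function names are hypothetical.

```python
import random
from collections import defaultdict

def length_ok(n_residues, lo=100, hi=1000):
    """Length filter stated in the text: keep chains of 100 to 1000 residues."""
    return lo <= n_residues <= hi

def match_size_distribution(reference_lengths, candidates, bin_width=100, seed=0):
    """Subsample candidates so that their length-bin proportions follow a reference set.

    reference_lengths: lengths of the reference proteins (e.g. Set 1)
    candidates:        list of (protein_id, length) pairs from another set
    The bin width and the "keep as many proteins as possible" rule are assumptions.
    Candidates whose bin never occurs in the reference are dropped, and a
    reference bin with no candidates forces an empty result.
    """
    ref_counts = defaultdict(int)
    for n in reference_lengths:
        ref_counts[n // bin_width] += 1
    ref_total = len(reference_lengths)

    pools = defaultdict(list)
    for pid, n in candidates:
        pools[n // bin_width].append(pid)

    # Largest total N such that N * (reference fraction in bin b) fits in every bin.
    n_max = min(len(pools[b]) * ref_total / c for b, c in ref_counts.items())

    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    kept = []
    for b, c in ref_counts.items():
        kept.extend(rng.sample(pools[b], int(n_max * c / ref_total)))
    return kept
```

Keeping the largest proportional subsample, rather than exactly as many proteins as the reference set, is consistent with the final Sets 2 to 4 being much larger than Set 1.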


We follow the approach described by Miyazawa and Jernigan (1985). We considered a lattice model of the protein where each residue makes a given number of contacts, either with solvent or with other (non-nearest-neighbor) residues; the average number of contacts qi only depends upon the identity of the residue i:

$$q_i n_i = 2n_{ii} + 2\sum_{j \neq i} n_{ij} = 2\sum_{j=0}^{20} n_{ij}, \qquad i = 0, 1, \ldots, 20 \tag{1}$$

where ni is the number of residues of type i, nii and 2nij (i ≠ j) are the numbers of contacts between two residues of type i and between a residue of type i and one of type j, respectively, and the subscript 0 refers to solvent. The assumption of one-to-one exchangeability of solvent and residue contacts is not so much an assumption about the structure of water as a model of the solvent as an ensemble of molecules whose effective size is such that Equation 1 is satisfied.
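The counting convention behind Equation 1 (nii is the number of ii contacts, while 2nij is the number of ij contacts for i ≠ j) is easy to get wrong when tallying contacts. A minimal sketch, assuming contacts are available as a list of type pairs with one entry per contact and 0 denoting solvent; the helper names are hypothetical.

```python
from collections import defaultdict

SOLVENT = 0  # type index 0 denotes solvent, as in the text

def half_count_matrix(contacts):
    """Build n[(i, j)] from a list of contacts, one (type_a, type_b) pair per contact.

    n[(i, i)] is the number of ii contacts; for i != j, n[(i, j)] = n[(j, i)]
    is half the number of ij contacts, so 2 * n[(i, j)] recovers the count.
    """
    n = defaultdict(float)
    for a, b in contacts:
        if a == b:
            n[(a, a)] += 1.0
        else:
            n[(a, b)] += 0.5
            n[(b, a)] += 0.5
    return n

def contact_ends(n, i, types):
    """q_i * n_i in Equation 1: total contacts made by type i, as 2 * sum_j n_ij."""
    return 2 * sum(n[(i, j)] for j in types)

# Residue types 1 and 2 plus solvent (0); one entry per contact.
contacts = [(1, 1), (1, 2), (1, SOLVENT), (2, SOLVENT), (SOLVENT, SOLVENT)]
n = half_count_matrix(contacts)
print([contact_ends(n, t, [0, 1, 2]) for t in (0, 1, 2)])  # [4.0, 4.0, 2.0]
```

With this convention, summing nij over all ordered pairs of residue types gives the total number of residue–residue contacts directly.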

We consider a pair-contact potential where the energy of a protein in a given conformation is given by

$$E = E_c + \sum_{i=1}^{20} \sum_{j=1}^{20} \varepsilon_{ij}\, n_{ij} \tag{2}$$

where Ec is a component of the energy that does not depend on the protein conformation and εij is the energy of forming a contact between these two residue types (Eij) plus forming a solvent–solvent bond (E00) relative to the energy of two separate residue–solvent bonds (Ei0 + Ej0):

$$\varepsilon_{ij} = E_{ij} + E_{00} - E_{i0} - E_{j0} \tag{3}$$

Here $e_{ij}$ is the energy of forming a contact between residues i and j (or between a residue and solvent if i or j equals 0) relative to half the energy of forming an ii and a jj contact: $e_{ij} = E_{ij} - \tfrac{1}{2}(E_{ii} + E_{jj})$.
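Because εij and eij are built from the same pairwise energies, they can be related directly. Taking eij = Eij − (Eii + Ejj)/2 for all indices, including solvent (0), and εij = Eij + E00 − Ei0 − Ej0 (Equation 3), a short algebraic check (not an additional assumption) gives:

$$
\begin{aligned}
e_{ij} - e_{i0} - e_{j0}
&= \left[E_{ij} - \tfrac{1}{2}(E_{ii}+E_{jj})\right]
 - \left[E_{i0} - \tfrac{1}{2}(E_{ii}+E_{00})\right]
 - \left[E_{j0} - \tfrac{1}{2}(E_{jj}+E_{00})\right] \\
&= E_{ij} + E_{00} - E_{i0} - E_{j0} = \varepsilon_{ij}.
\end{aligned}
$$

Since eii = 0, setting j = i gives εii = −2ei0, and therefore eij = εij − (εii + εjj)/2: eij measures how much an ij contact is favored over the average of an ii and a jj contact, whereas εij measures the contact relative to exposure to solvent.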

Assuming that all energies are in units of kT, where k is the Boltzmann constant and T is the absolute temperature, the quasi-chemical approximation states that the relative numbers of ij, ii, and jj contacts depend on the Boltzmann factors of the energies of these contacts, provided the number of contacts of each type is normalized by the number that would be expected if the contacts were made at random. This takes the form

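$$\frac{\left(N_{ij}/C_{ij}\right)^{2}}{\left(N_{ii}/C_{ii}\right)\left(N_{jj}/C_{jj}\right)} = \exp\left[-\left(2E_{ij} - E_{ii} - E_{jj}\right)\right] \tag{4}$$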

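where, for a single protein, $N_{ij} = n_{ij}$ and $C_{ij} = n_{ir} n_{jr}/n_{rr}$, with $n_{ir} = \sum_{j=1}^{20} n_{ij}$ and $n_{rr} = \sum_{i=1}^{20} n_{ir}$. Here $n_{rr}$ is the total number of residue–residue contacts, and $C_{ij}$ is the number of ij contacts expected if the residue–residue contacts were formed at random. For multiple proteins, these quantities are summed over the proteins P being considered: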

$$N_{ij} = \sum_{P} n_{ij}^{P} \tag{5}$$
$$C_{ij} = \sum_{P} \frac{n_{ir}^{P}\, n_{jr}^{P}}{n_{rr}^{P}} \tag{6}$$
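The pooling over proteins and the quasi-chemical estimate can be sketched as follows. This is a minimal sketch under stated assumptions: counts use the symmetric half-count convention (so $n_{rr} = \sum_i n_{ir}$ is the total number of residue–residue contacts), $N_{ij}$ and $C_{ij}$ are summed over proteins, and the quasi-chemical relation is taken to have the ratio form $(N_{ij}/C_{ij})^2 / [(N_{ii}/C_{ii})(N_{jj}/C_{jj})] = \exp[-2(E_{ij} - (E_{ii}+E_{jj})/2)]$, solved here for the relative energy in units of kT. The function names are hypothetical.

```python
import math


def pooled_counts(proteins, types):
    """Observed (N) and random-mixing expected (C) contact counts.

    proteins: list of dicts mapping (i, j) -> n_ij for residue types only,
    symmetric and in the half-count convention (2 * n_ij contacts for i != j).
    """
    N = {(i, j): 0.0 for i in types for j in types}
    C = {(i, j): 0.0 for i in types for j in types}
    for n in proteins:
        n_ir = {i: sum(n.get((i, j), 0.0) for j in types) for i in types}
        n_rr = sum(n_ir.values())  # total residue-residue contacts
        for i in types:
            for j in types:
                N[i, j] += n.get((i, j), 0.0)
                C[i, j] += n_ir[i] * n_ir[j] / n_rr
    return N, C


def estimate_relative_energy(N, C, i, j):
    """Assumed quasi-chemical form, solved for E_ij - (E_ii + E_jj)/2 in kT."""
    ratio = (N[i, j] / C[i, j]) ** 2 / ((N[i, i] / C[i, i]) * (N[j, j] / C[j, j]))
    return -0.5 * math.log(ratio)
```

A positive estimate means i–j contacts are rarer, relative to i–i and j–j contacts, than random mixing would predict; a protein whose counts equal the random-mixing expectation gives zero.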

Similarly we have, for residue–solvent contacts,

[Equation 7]

[Equation 8]
[Equation 9]
[Equation 10]

To calculate the above quantities, we need the effective number of solvent molecules in the system, that is, the number of solvent molecules that would interact with the unfolded protein chain if the intramolecular and residue–solvent interactions were all identical in strength to the solvent–solvent interactions. This number is estimated from Flory theory, taking the effective size of a solvent molecule, $q_0$, equal to the average size of the protein residues. Flory theory gives the average number of residue–residue contacts $\tilde{n}_{rr}$, where the tilde denotes the random-chain configuration. We can then calculate the effective number of solvent molecules $n_0$ using

[Equation 11]

where $q_r$ is the average number of contacts per amino acid residue. The quantities $n_{i0}$, $n_{r0}$, and $n_{00}$ can then all be calculated from Equation 1.
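Once $n_0$ is known, Equation 1 fixes the solvent contact counts by subtraction. Reading Equation 1 as $q_i n_i = 2\sum_j n_{ij}$ with symmetric half-counts, and writing $n_{ir} = \sum_{j \ge 1} n_{ij}$, one gets $n_{i0} = q_i n_i/2 - n_{ir}$, $n_{r0} = \sum_i n_{i0}$, and $n_{00} = q_0 n_0/2 - n_{r0}$. The sketch below assumes that reading; $n_0$ is passed in as a number (for example, the value from Equation 11) rather than computed, and the function name is hypothetical.

```python
def solvent_contacts(q, counts, n_res, q0, n0):
    """Residue-solvent and solvent-solvent counts implied by Equation 1.

    q[i]: contacts per residue of type i; counts[i]: number of residues n_i;
    n_res[i]: n_ir = sum of n_ij over residue types j (half-count convention);
    q0, n0: contacts per solvent molecule and effective number of solvent molecules.
    Returns (n_i0 per type, n_r0, n_00), all in the half-count convention.
    """
    n_i0 = {i: q[i] * counts[i] / 2.0 - n_res[i] for i in q}
    n_r0 = sum(n_i0.values())
    n_00 = q0 * n0 / 2.0 - n_r0
    return n_i0, n_r0, n_00
```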

To calculate $\varepsilon_{ir}$ and $\varepsilon_{rr}$, we follow Miyazawa and Jernigan (1985) and consider a hypothetical 200-residue protein with the same amino acid composition as the corresponding data set. We then calculate $\tilde{n}_{ij}$, the expected number of each type of contact for this hypothetical protein, by finding a simultaneous solution of Equation 1 and

[Equation 12]

We then calculate $\varepsilon_{ir}$ and $\varepsilon_{rr}$ using

[Equation 13]
[Equation 14]

We also computed effective potentials between categories of residues, dividing the residues into positively charged (POS: Arg, Lys), negatively charged (NEG: Asp, Glu), aromatic (ARO: Phe, Trp, Tyr), uncharged polar (POL: Asn, Cys, Gln, His, Ser, Thr), and hydrophobic (FOB: Ala, Gly, Ile, Leu, Met, Pro, Val) groups. The calculation proceeds as described above, with the size and $q_i$ of each category set to the composition-weighted averages of these quantities over its member residues.
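Grouping residues into categories changes only the bookkeeping. A minimal sketch, assuming one-letter residue codes and symmetric half-count contact counts (under which summing $n_{ij}$ over the members of two categories gives the category-level count in the same convention); the dictionary and function names are hypothetical.

```python
# Residue categories as listed above, in one-letter codes.
CATEGORIES = {
    "POS": "RK",
    "NEG": "DE",
    "ARO": "FWY",
    "POL": "NCQHST",
    "FOB": "AGILMPV",
}
CATEGORY_OF = {aa: cat for cat, members in CATEGORIES.items() for aa in members}


def weighted_average(values, composition, members):
    """Composition-weighted mean of a per-residue property over one category."""
    total = sum(composition[aa] for aa in members)
    return sum(values[aa] * composition[aa] for aa in members) / total


def category_contacts(n):
    """Sum residue-level counts n[(a, b)] into category-level counts."""
    out = {}
    for (a, b), value in n.items():
        key = (CATEGORY_OF[a], CATEGORY_OF[b])
        out[key] = out.get(key, 0.0) + value
    return out
```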

In the original derivation, the calculated energies were assumed to be in units of kT, where k is the Boltzmann constant and T the absolute temperature. The results presented in this study assume that this relationship holds, and the energies are reported in kcal mol−1.
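Converting an energy expressed in units of kT to kcal mol−1 means multiplying by the molar gas constant times the temperature. The temperature used for the conversion is a free parameter of this sketch; at 298.15 K, 1 kT corresponds to about 0.59 kcal mol−1. The function name is hypothetical.

```python
# Molar gas constant R = 8.314462618 J mol^-1 K^-1, divided by 4184 J per kcal.
R_KCAL = 8.314462618 / 4184.0  # kcal mol^-1 K^-1


def kt_to_kcal_per_mol(energy_kt, temperature_k):
    """Convert an energy in units of kT (per molecule) to kcal/mol at temperature T."""
    return energy_kt * R_KCAL * temperature_k
```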

Error estimation

We use a relatively small sample of proteins to represent the set of all psychrophilic, mesophilic, thermophilic, and hyperthermophilic proteins. To assess the error due to this finite sampling, we use standard bootstrapping (Efron and Tibshirani 1993). For example, for the psychrophilic data set we create 20 new data sets of 47 proteins each, drawn randomly with replacement from the original 47-protein data set. The various quantities are recalculated for each new data set, and the spread in the derived values gives an approximate estimate of the sampling error. As can be seen in the figures, the sampling error is small relative to the differences between the results obtained from the different data sets.
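The resampling procedure can be sketched as follows. Here `statistic` stands for any quantity computed from a list of proteins (for example, an estimated interaction energy); the function name and the use of the standard deviation as the summary of the spread are assumptions of this sketch.

```python
import random
import statistics


def bootstrap(proteins, statistic, n_resamples=20, seed=0):
    """Nonparametric bootstrap over proteins.

    Each resample draws len(proteins) proteins with replacement, and
    `statistic` maps a list of proteins to a number. Returns the mean and
    standard deviation of the statistic across resamples.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_resamples):
        resample = rng.choices(proteins, k=len(proteins))
        values.append(statistic(resample))
    return statistics.mean(values), statistics.stdev(values)
```

With 47 psychrophilic proteins and the default of 20 resamples, this mirrors the scheme described above.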


We thank Gisle Sælensminde for the list of organisms with associated optimal growth temperatures, Benjamin Blackburne for computational assistance, and Gisle Sælensminde, Benjamin Blackburne, and Anna Chernova for helpful discussions. This work was supported by the NIMR (MRC).