An amino acid sequence, in the context of the solvent environment, contains all of the thermodynamic information necessary to encode a three-dimensional protein structure. To investigate the relationship between an amino acid sequence and its corresponding protein fold, a database of thermodynamic stability information was assembled that spanned 2951 residues from 44 nonhomologous proteins. This information was obtained using the COREX algorithm, which computes an ensemble-based description of the native state of a protein. It was observed that amino acid types partitioned unequally into high, medium, and low thermodynamic stability environments. Furthermore, these distributions were reproducible and were significantly different than those expected from random partitioning. To assess the structural importance of the distributions, simple fold-recognition experiments were performed based on a 3D-1D scoring matrix containing only COREX residue stability information. This procedure was able to recover amino acid sequences corresponding to correct target structures more effectively than scoring matrices derived from randomized data. High-scoring sequences were often aligned correctly with their corresponding target profiles, suggesting that calculated thermodynamic stability profiles have the potential to encode sequence information. As a control, identical fold-recognition experiments were performed on the same database of proteins using DSSP secondary structure information in the scoring matrix, instead of COREX residue stability information. The comparable performance of both approaches suggested that COREX residue stability information and secondary structure information could be of equivalent utility in more sophisticated fold-recognition techniques. The results of this work are a consequence of the idea that amino acid sequences fold not into single, rigidly stable structures but rather into thermodynamic ensembles best represented by a time-averaged structure.
It is a longstanding idea that protein structures are the result of an amino acid chain finding its global free energy minimum in the solvent environment (Anfinsen 1973). Several exceptions to this so-called “thermodynamic control” have been discovered in recent years, including examples of proteins whose folding may be under “kinetic control” (Baker et al. 1992; Cohen 1999) and proteins requiring information not completely contained in the amino acid sequence (e.g., chaperone-assisted folding [Feldman and Frydman 2000; Fink 1999]). Although thermodynamic control is widely accepted as the default behavior for correct folding (Jackson 1998), a detailed understanding of the forces involved in thermodynamic control and how atomic interactions relate amino acid sequence to the folding and stability of the native structure has still proven elusive.
In this work, a computational method is used to address the issue of how sequence specifies structure in the energetic minimum around the native state. This is accomplished by an approach that determines the stability differences within a protein structure using the COREX algorithm (Hilser and Freire 1996). The COREX algorithm generates an ensemble of states using the high-resolution structure as a template. Based on the relative probability of the different states in the ensemble, different regions of the protein are found to be more stable than others. Thus, the COREX algorithm provides access to residue-specific free energies of folding. Perhaps the most important feature of the COREX algorithm is that the results have been verified by comparison with protection factors obtained from native state hydrogen exchange experiments. The good agreement between the calculated and experimental protection factors suggests that the calculated native state ensemble provides a reasonable representation of the actual native state ensemble (Hilser and Freire 1996; Hilser et al. 1998).
The goal of this work is to determine whether residue types have distinct preferences for thermodynamic environments in the folded native structure of a protein, and whether a scoring matrix based solely on thermodynamic information (independent of explicit structural constraints) can be used to identify correct sequences that correspond to a particular target fold. The strategy is to calculate the regional stabilities for a large database of COREX-analyzed proteins, and to determine whether amino acid types are distributed uniformly across the different thermodynamic environments. The strength of this approach is that it ignores traditional amino acid properties such as hydrophobicity and secondary structural propensity, and focuses only on the thermodynamics of the microscopic states in the native state ensemble. The results are validated using previously described fold-recognition profiling methods (Gribskov et al. 1987; Bowie et al. 1991). Control data sets of randomized data and secondary structure information are used to assess the significance of the fold-recognition results.
COREX residue stability constants
A database of 44 nonhomologous proteins (Table 1) was analyzed using the COREX algorithm. Briefly, COREX generates an ensemble of partially unfolded microstates using the high-resolution structure of each protein as a template (Hilser and Freire 1996). This is facilitated by combinatorially unfolding a predefined set of folding units (i.e., residues 1–5 are in the first folding unit, residues 6–10 are in the second folding unit, etc.). By means of an incremental shift in the boundaries of the folding units, an exhaustive enumeration of the partially unfolded species is achieved for a given folding unit size. The entire procedure is shown schematically in Figure 1A for ovomucoid third domain (OM3), one of the proteins in the database (PDB accession code 2ovo).
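The combinatorial enumeration described above can be sketched as follows. This is a minimal illustration rather than the COREX implementation: the function names, the zero-based residue indexing, the way the first unit is shortened when the boundaries shift, and the removal of duplicate states across partitionings are all assumptions made for the example.

```python
from itertools import product

def folding_units(n_res, window, offset):
    """Return (start, end) residue ranges for one partitioning of the chain.

    With offset 0 the units are [0, window), [window, 2*window), ...;
    a nonzero offset shortens the first unit, shifting every boundary.
    (Hypothetical helper; boundary handling is an assumption.)
    """
    first_boundary = offset if offset > 0 else window
    starts = [0] + list(range(first_boundary, n_res, window))
    ends = starts[1:] + [n_res]
    return list(zip(starts, ends))

def enumerate_microstates(n_res, window):
    """Collect every distinct folded/unfolded pattern over all partitionings.

    Each microstate is a tuple of booleans, one per residue (True = folded).
    """
    states = set()
    for offset in range(window):
        units = folding_units(n_res, window, offset)
        # Every combination of folded/unfolded folding units is one microstate.
        for flags in product((True, False), repeat=len(units)):
            state = [True] * n_res
            for (start, end), folded in zip(units, flags):
                for r in range(start, end):
                    state[r] = folded
            states.add(tuple(state))
    return states

# Toy chain of 4 residues with a window of 2 residues.
states = enumerate_microstates(n_res=4, window=2)
print(len(states))
```

Because the number of states grows as a power of the number of folding units, an exhaustive enumeration of this kind is only practical for short chains.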
For each microstate i in the ensemble, the Gibbs free energy is calculated from the surface area-based parameterization described previously (D'Aquino et al. 1996; Gomez et al. 1995; Xie and Freire 1994; Baldwin 1986; Lee et al. 1994; Habermann and Murphy 1996 [see also Fig. 1B and Materials and Methods]). The Boltzmann weight of each microstate, Ki = exp(−ΔGi/RT), is used to calculate its probability:
$$P_i = \frac{K_i}{\sum_i K_i} \qquad (1)$$
in which the summation in the denominator is over all microstates. From the probabilities calculated in equation 1, an important statistical descriptor of the equilibrium can be evaluated for each residue in the protein. Defined as the residue stability constant, κf,j, this quantity is the ratio of the summed probability of all states in the ensemble in which a particular residue, j, is in a folded conformation (ΣPf,j) to the summed probability of all states in which j is in an unfolded (i.e., non-folded) conformation (ΣPnf,j):
$$\kappa_{f,j} = \frac{\sum P_{f,j}}{\sum P_{nf,j}} \qquad (2)$$
From the stability constant, a residue-specific free energy can be written as:
$$\Delta G_{f,j} = -RT \ln \kappa_{f,j} \qquad (3)$$
Equation 3 reflects the energy difference between all microscopic states in which a particular residue is folded and all such states in which it is unfolded.
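The chain from microstate energies to residue stability constants can be illustrated with a short sketch. The microstate energies below are invented toy values, the function names are hypothetical, and the gas constant is assumed to be in kcal/(mol K); only the three relationships in the text (Boltzmann weights, state probabilities, and the folded/unfolded probability ratio) are taken from the description above.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K); the unit choice is an assumption

def residue_stability(states, delta_g, temperature=298.15):
    """Return ln(kappa_f) for each residue.

    states  : list of tuples of booleans (True = residue folded in that state)
    delta_g : Gibbs energy of each state relative to the fully folded state
    """
    rt = R * temperature
    weights = [math.exp(-g / rt) for g in delta_g]   # K_i = exp(-dG_i/RT)
    total = sum(weights)
    probs = [w / total for w in weights]             # probability of state i
    ln_kappa = []
    for j in range(len(states[0])):
        p_folded = sum(p for s, p in zip(states, probs) if s[j])
        p_unfolded = sum(p for s, p in zip(states, probs) if not s[j])
        # Ratio of summed folded to summed unfolded probabilities for residue j.
        ln_kappa.append(math.log(p_folded / p_unfolded))
    return ln_kappa

def residue_free_energy(ln_kappa, temperature=298.15):
    """Residue-specific free energy, -RT ln(kappa_f), for each residue."""
    return [-R * temperature * x for x in ln_kappa]

# Toy ensemble of a two-residue chain (energies in kcal/mol are made up).
toy_states = [(True, True), (True, False), (False, False)]
toy_dg = [0.0, 1.0, 2.0]
print(residue_stability(toy_states, toy_dg))
```

In this toy ensemble the first residue is folded in two of the three states and therefore receives the larger stability constant, even though no energy was assigned to it individually. This mirrors the point that the constant is a property of the ensemble rather than of the residue.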
The algorithmic basis of the residue stability constant (equation 2) warrants discussion. Although the stability constant is defined for each position, the value obtained at each residue is not the energetic contribution of that residue. Instead, the stability constant is a property of the ensemble as a whole. The energy difference between each partially unfolded microstate and the fully folded reference state is determined by the energetic contributions of all amino acids comprising the folding units that are unfolded in each microstate, plus the energetic contributions associated with exposing additional (complementary) surface area on the protein (Fig. 1B). The stability constant thus provides the average thermodynamic environment of each residue, wherein surface area, polarity, and packing are implicitly considered. This is important because it provides a thermodynamic metric wherein each of these static structural properties is weighted according to its energetic impact at each position.
As stated above, the most significant feature of the ensemble-based description provided by COREX is that the residue stability constant can be compared to hydrogen exchange protection factors. This represents an experimentally verifiable description of the energetic information calculated by the COREX algorithm. An example of such a comparison is shown for OM3 in Figure 2. The agreement in the location and relative magnitude of the protection factors with the stability constants for this and other proteins suggests that the calculated native state ensemble provides a good description of the actual ensemble (Hilser and Freire 1996). It naturally follows that the residue stability constants of a particular protein provide a good description of the thermodynamic environment of each residue in that structure.
Further inspection of Figure 2 reveals another important feature in the pattern of residue stability constants: The stability constants may vary significantly across a given secondary structural element, as observed for α helix 1 of OM3. The protection factors (and stability constants) are high at the N-terminal region of helix 1, but decrease over the length of the helix. This indicates that structural classifications, such as secondary structure type, do not obligatorily coincide with thermodynamic classifications. This result has potentially important consequences for cataloging propensities of amino acids in different environments. For example, in OM3 two threonine residues are located in different structural environments: Thr 47 is part of the loop that follows α helix 1, whereas Thr 49 is part of β strand 3. Despite the different structural environments for the two threonine residues, the stability constants show that both residues, to a first approximation, share the same thermodynamic environment.
Thermodynamic propensities of residue types
The individual residue stability constants for the 2951 residues comprising the 44 proteins in the database were sorted and empirically binned into three classifications of approximately equal size: high, medium, and low stability, as described in Materials and Methods. Statistics of amino acid type as a function of each of these stability categories were tabulated (Table 2), and normalized histograms of these numbers are shown in Figure 3. Striking asymmetries are often observed for the histograms of certain amino acids across the three stability environments. These asymmetries are well outside the standard deviation of the average of three random data sets, as indicated by the data points and error bars in Figure 3. For example, the aromatic amino acids Phe, Trp, and Tyr are mostly found in high stability environments, whereas Gly and Pro are overwhelmingly found in low stability environments. In contrast, residues such as Ala, Met, and Ser show distributions that do not significantly differ from randomized data.
Although the acidic residues Asp and Glu share a slight tendency to be found in medium stability environments, it is observed that several amino acid pairs with nominally similar chemical characteristics partition differently in the stability environments. For example, the basic residues Arg and Lys show opposite stability characteristics: The counts for Arg increase as the stability class is incremented from low to high, but the counts for Lys decrease. Asn is found less often in high stability environments, whereas Gln is found more often in high stability environments. Although the distribution for Ser does not differ significantly from the randomized data, Thr occurs more often in low stability environments and less often in high stability environments. Somewhat surprisingly, the aliphatic amino acids Ile, Leu, and Val do not show a general pattern, except perhaps a slight disfavoring of low stability environments.
To determine whether the trends in these data are simple functions of the side-chain surface area exposure in the different regions of the protein, probability histograms (identical to Fig. 3) were generated using side-chain surface area alone, averaged over a window size of five residues as described in Materials and Methods. As shown in Figure 4, frequencies of amino acids found in COREX stability environments are not correlated to frequencies of amino acids in exposed surface area environments. This is important as it suggests that the thermodynamic information calculated by the COREX algorithm is not simply monitoring a static property of a single X-ray or NMR structure, but instead is capturing a property of the native state ensemble as a whole.
Fold-recognition experiments based on thermodynamic propensities of residue types
As the distribution of amino acids in the different stability environments is nonrandom, it was hypothesized that these propensities could be used to correctly identify sequences corresponding to structures whose thermodynamic environments are known. In other words, if the thermodynamic environments within a structure are given, can the correct sequence of that structure be identified? To test this, simple fold-recognition experiments were performed based on the information contained in the distributions. The profiling method of Eisenberg and coworkers (Gribskov et al. 1987; Bowie et al. 1991) was used, and stability profiles as a function of residue position were created for each of the 44 proteins in the database. A 3D-1D scoring matrix for each protein that contained the log-odds probabilities of finding amino acid types in the high, medium, or low stability classes was calculated from the data in Table 2 as described in Materials and Methods. An analogous scoring matrix was also calculated using only DSSP secondary structure information (Kabsch and Sander 1983). This provided a control data set of identical dimensionality (i.e., COREX matrix: 20 amino acids × three stability categories [low, med., high] vs. DSSP matrix: 20 amino acids × three secondary structure categories [α, β, other]). It is important to note that no target protein was included in the scoring matrix used for its respective fold-recognition experiment, thus negating the possibility that target information could have directly biased the results.
The scoring matrices derived from COREX stability and secondary structure, averaged over all 44 target proteins, are shown in Table 3, A and B, respectively. The stability matrix scores faithfully reflect the histograms shown in Figure 3; for example, Gly and Pro score unfavorably in high stability environments but score favorably in low stability environments. Similarly, the secondary structure matrix scores follow intuitive notions of secondary structure propensity; for example, Ala scores positively in helical environments, the aromatics score positively in β environments, and Gly and Pro score negatively in both α and β environments. The standard deviations in both matrices were generally small as compared to the magnitude of the scores, suggesting that the scores were not affected by the removal of any one protein from the database.
Fold-recognition experiments were performed using a local alignment algorithm based on the Smith-Waterman algorithm (Smith and Waterman 1981) as implemented in PROFILESEARCH (Bowie et al. 1991). The 44 amino acid sequences corresponding to the structures used in the database were each threaded against the 44 target profiles of COREX stability or DSSP secondary structure. The local alignment algorithm was used to optimally score the target profile versus each amino acid sequence. For comparison, three randomized datasets of COREX stability and secondary structure were used to construct scoring matrices in an identical manner, and the results from these control experiments were averaged for comparison. The fold-recognition protocol was therefore designed so that the only variable between any set of experiments was the scoring matrix.
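The threading step can be sketched with a score-only Smith-Waterman recursion over an environment profile. This is a simplified stand-in, not PROFILESEARCH: it assumes a single linear gap penalty instead of separate opening and extension penalties, and the toy scoring matrix, sequences, and function names are invented for illustration.

```python
def local_profile_score(profile, sequence, score, gap=5.0):
    """Best local alignment score of a sequence against an environment profile.

    profile  : list of environment labels, one per target position
    sequence : string of one-letter amino acid codes
    score    : score[amino_acid][environment] -> 3D-1D matrix entry
    gap      : linear gap penalty (a simplification of affine penalties)
    """
    n, m = len(profile), len(sequence)
    prev = [0.0] * (m + 1)
    best = 0.0
    for i in range(1, n + 1):
        cur = [0.0] * (m + 1)
        for j in range(1, m + 1):
            match = prev[j - 1] + score[sequence[j - 1]][profile[i - 1]]
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0.0, match, prev[j] - gap, cur[j - 1] - gap)
            best = max(best, cur[j])
        prev = cur
    return best

def rank_of(target_name, profile, library, score):
    """Rank (1 = best) of the target's own sequence among all library sequences."""
    scores = {name: local_profile_score(profile, seq, score)
              for name, seq in library.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(target_name) + 1

# Toy two-letter alphabet and two stability classes (values are made up).
toy_score = {"G": {"low": 1.0, "high": -1.0},
             "F": {"low": -1.0, "high": 1.0}}
toy_profile = ["low", "high", "high"]
toy_library = {"correct": "GFF", "decoy": "FGG"}
print(rank_of("correct", toy_profile, toy_library, toy_score))
```

Repeating the ranking for every target profile and tallying the ranks gives a histogram of the kind used to compare scoring matrices.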
The results of the fold-recognition experiments are shown in Figure 5, and at least two conclusions can be drawn from these data. First, scoring matrices composed of either COREX stability or DSSP secondary structure data performed better than randomized data sets in matching a structural target to its amino acid sequence. In Figure 5, the results for COREX data are stacked toward the left (successful) side of the rankings, whereas the randomized data approach a bell-shaped distribution with a maximum near the median of the size of the sequence datasets (approximately 22 for the library size of 44 sequences). Second, the number of targets falling in the most successful bin was similar (approximately one-third to one-half of the total number of targets) for both the COREX stability and secondary structure matrices, suggesting that COREX stability propensities alone contain comparable fold-specification information to secondary structure propensities.
Because the local alignment algorithm used here computes a score without returning the complete alignment of profile to sequence, high scores may have been possible from nonstructurally significant local alignments. In other words, it is possible that a correct sequence may have scored well against its corresponding target structure without having placed the individual amino acids in their correct positions within the structure. To assess the extent of local alignments that were structurally significant, minor modifications were made to the PROFILESEARCH source code that saved the traceback of the alignment matrix. It was found that for targets scoring poorly in the fold-recognition rankings, local alignments of the corresponding sequence were often not significant. However, sequences that scored in the top two bins were often found to be completely and correctly aligned with their target profiles, even though not all of their residues contributed to the overall score because of the rules of the local algorithm. Three examples of successful alignment based on COREX stability data alone are shown in Figure 6, A, B, and C, and Table 4, A, B, and C, for the targets neuro-oncological ventral antigen 1 fragment (1dt4:a), protein G (1igd), and tendamistat (2ait), respectively. The alignments calculated using the local algorithm were correct, despite the fact that no sequence information about the target was used, and that only a subset of the amino acid sequence was used in the scoring. In addition, it is noteworthy that the success of these examples does not stem from merely a small fragment of the sequence, as the cumulative 3D-1D matrix scores steadily increased over the entire length of the sequence.
To test whether the results were dependent on the particular choice of proteins used in the database, a second, independent database was assembled from 50 proteins found in the PDBSelect database (Hobohm and Sander 1994) that were not contained in the original 44 protein database used above. The 3304 residues from these additional proteins were binned into high, medium, and low stability classes, and the normalized frequencies of stability as a function of residue type were compared to those shown in Figure 3. A scatterplot of these results, given in Figure 7, clearly shows that the two databases are correlated. The major outliers in the correlation are attributable to residue types with low statistics, such as His. These results are interpreted as evidence that the COREX stability data calculated in this work were not fundamentally biased by the original choice of 44 proteins.
The interpretation of a protein as a single, static structure is incomplete (Frauenfelder et al. 1988). An increasing amount of data from NMR spectroscopy and molecular dynamics simulation shows that it is more correct to view the native state of a protein as an ensemble of interconverting conformations, with an average structure approximating that of an X-ray or NMR structure (Kay 1998; Philippopoulos and Lim 1999). Although it is possible in favorable cases to assign a macroscopic stability difference between the native and denatured states of a protein, not all residues in a protein share the same microscopic stability, as is evident from hydrogen exchange experiments (Bai and Englander 1996; Chamberlain et al. 1996; Hilser et al. 1998; Huyghues-Despointes et al. 1999; Jaravine et al. 2000; Llinas et al. 1999; Sadqi et al. 1999). The COREX algorithm provides a means of estimating this energetic variability in the native state, and it is possible that this information may illuminate the relation between amino acid sequence and protein structure.
The most significant finding of this work is that amino acid types have different propensities to occur in high, medium, or low stability environments within a protein structure. It is remarkable that the distributions for many residues are far from random, despite the facts that the observed stability distributions are massive averages over many million microstates from different proteins, and certain nonsurface area-related interactions such as electrostatics are ignored in the COREX free energy parameterization. Furthermore, Figures 5 and 6 indicate that the simple three-dimensional stability classification is able to provide sufficient signal for correct sequence to structure alignments.
In general, the success of any fold-recognition experiment will depend on whether the scoring function is representative of the true determinants of the folds being predicted. For the current case, there are two relevant questions: First, is enough structural-energetic information retained to identify a fold by distilling thermodynamic stability down to three dimensions per amino acid? Second, is the database large enough to sample completely over all stability categories? Here we have implemented two strategies to address these issues. The first is the use of a three-dimensional secondary structure matrix. As the utility of secondary structure information in fold-recognition algorithms is well documented (Jones et al. 1999; Rice and Eisenberg 1997; Bowie et al. 1991), it provides a good metric for the performance of the COREX-derived data in fold recognition. The fact that the overall fold-recognition success rate is similar for COREX and secondary structure (Fig. 5) suggests that the COREX stability matrix indeed contains significant structural information. It is likely that breaking the overall stability into its component enthalpies and entropies would increase the discrimination of the scoring by introducing additional categories to describe a target profile. The impact on fold-recognition experiments of further dividing the stability data is currently under investigation.
The use of a global alignment scheme, as opposed to a local alignment scheme, provides a second means of assessing the information contained in the COREX-derived scoring matrix. For an ideal scoring function, the correct sequence should always score highly when each amino acid is aligned with its corresponding structural position. This is indeed the case for the majority of the structures studied here (data not shown). Although the local alignment scores are generally lower than the global scores, suggesting that information is lost in the local alignment, correct sequences that score highly in local alignments are often completely and correctly aligned with the structural target. This result indicates that even the rudimentary partitioning of residue stabilities described here is sufficient to identify correct sequence/structure pairs. It may be possible to improve the local alignment scheme by assessing scores based on more than one high-scoring subsequence, similar to the logic used in the BLAST algorithm (Altschul et al. 1990).
It is important to bear in mind that residues with functional instead of structural significance are not accounted for in the statistics, and these residues may therefore score poorly in this scheme. However, a surprising implication of these results for protein design is that careful consideration of specific interactions between some amino acids may not be necessary for determination of a given fold. A reflection of this phenomenon may be found in recent work in which almost half of the residues in BPTI were replaced by alanine, yet the resulting structure was stable and almost indistinguishable from the original protein in terms of activity (Kuroda and Kim 2000). The COREX matrix given in Table 3A would similarly predict that, in general, alanine replacements would have a greater probability of being tolerated in most positions throughout the protein structure, as the scores for alanine in any stability environment are near zero.
As noted, absolutely no sequence information about the target was used in the threading calculations. The implication of this observation is that it might be possible to routinely generate a correct sequence to structure alignment on a purely energetic basis. This result is particularly exciting because energetic information may be able to increase the chances of successful fold recognition in cases in which identity between the target sequence and a homologous sequence is below the threshold of the best sequence matching algorithms (Park et al. 1998). Indeed, preliminary results from our laboratory suggest that homologous structures with low sequence identity share similar features in their COREX stability profiles.
Historically, proteins have been classified according to their structural features (i.e., helix, sheet, turn) and/or their physico-chemical properties (i.e., hydrophobicity, polarity, packing). The underlying principle that unifies both the structural and physico-chemical properties is that they are all manifestations of the free energy of the system. Whether a sequence is helix, sheet, or turn is determined largely by the energetics of backbone and side-chain interactions within the different conformers, and the hydrophobicity, polarity, and packing are determined by the energetic interactions of the different groups with solvent and with each other. This work shows that some proteins can be classified in purely thermodynamic terms. In effect, this approach classifies both the structural and physico-chemical properties within a unifying framework, wherein the importance of these properties is implicitly considered at each site. This finding has potentially far-reaching consequences. As each amino acid sequence adopts the specific fold that is a thermodynamic minimum for that sequence, it is probable that the true determinants for each fold are thermodynamic in nature and universal for all proteins. In other words, thermodynamic rules represent a more fundamental characterization of a protein's fold.
Materials and methods
Selection of proteins used in dataset
A database of 44 proteins, 2951 residues in total (Table 1), was selected from the Protein Data Bank (Berman et al. 2000) on the basis of biological and computational criteria. Two biological criteria were that the proteins be globular and nonhomologous with every other member of the set as ascertained by SCOP (Murzin et al. 1995). One computational criterion was that the proteins be small (< ∼90 residues), because the CPU time and data storage needs of an exhaustive COREX calculation increase exponentially with the chain length. A second computational criterion was that the structures be mostly devoid of ligands, metals, or cofactors, as the COREX energy function was not parameterized to account for the energetic contributions of nonprotein atoms. The database comprised 21 X-ray structures, whose average resolution was 1.62 ± 0.07 Å (mean ± S.D.), and 23 NMR structures. An independent database of 50 proteins, 3304 residues in total, which were not included in the above set, was created from the PDBSelect database (Hobohm and Sander 1994). This second database was used as a control to check the results obtained from the first database, as shown in Figure 7.
The COREX algorithm (Hilser and Freire 1996) was run with a window size of five residues on each protein in the database, and the simulated temperature was 25°C. The calculated equilibrium constant for stability of residue j in the native structure, κf,j, was computed from the statistical weights of each microstate i in the folded and unfolded subensembles according to equation 2.
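The Gibbs energy for each microstate i, relative to the fully folded structure, was calculated using equation 4:
$$\Delta G_i = \Delta H_{i,\mathrm{solvation}} - T\left(\Delta S_{i,\mathrm{solvation}} + W\,\Delta S_{i,\mathrm{conformational}}\right) \qquad (4)$$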
In equation 4, the calorimetric enthalpy and entropy of solvation were parameterized from polar and apolar surface exposure, and the conformational entropy was determined as described previously (Hilser and Freire 1996). The maximum stability for each protein was normalized to a common arbitrary value of approximately 6.2 kcal/mol (max lnκf = 10.4) by adjusting its conformational entropy factor, W. The average entropy factor required for the normalization was 0.852 ± 0.098 (mean ± S.D.) over the 44 proteins. It was found that adjustment of a stable protein's conformational entropy factor did not change the relative patterns of high and low stability regions in the structure.
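As a rough consistency check on the normalization value, and assuming R = 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹ and T = 298.15 K (the 25°C simulation temperature), the stated maximum lnκf of 10.4 corresponds in magnitude to about 6.2 kcal/mol:

```latex
RT \ln \kappa_{f,\max}
  = \left(1.987 \times 10^{-3}\ \mathrm{kcal\,mol^{-1}\,K^{-1}}\right)
    \left(298.15\ \mathrm{K}\right)\left(10.4\right)
  \approx 6.2\ \mathrm{kcal/mol}
```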
Binning of residue stability constants
Lnκf values were binned into three stability classes: high, medium, and low. The cutoffs for each stability class were adjusted so that an approximately equal number of residues in the database fell into each class (Table 2). The low stability category was defined as lnκf ≤ 3.91, the medium stability category was defined as 3.91 < lnκf ≤ 7.08, and the high stability category was defined as lnκf > 7.08.
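The binning rule can be written as a small function. The cutoffs are those given in the text; the function name and the string labels are illustrative choices.

```python
LOW_CUTOFF = 3.91   # ln(kappa_f) at or below this value is "low"
HIGH_CUTOFF = 7.08  # ln(kappa_f) above this value is "high"

def stability_class(ln_kappa):
    """Assign a residue to the low, medium, or high stability class."""
    if ln_kappa <= LOW_CUTOFF:
        return "low"
    if ln_kappa <= HIGH_CUTOFF:
        return "medium"
    return "high"

# Example: classify a short, made-up profile of ln(kappa_f) values.
profile = [stability_class(x) for x in (2.5, 3.91, 5.0, 7.08, 9.3)]
print(profile)
```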
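Calculation of average native state side-chain surface area exposure
Average side-chain surface area exposure of residue j over a window size of five residues, ASAaverage,j, was calculated using equation 5:
$$\mathrm{ASA}_{\mathrm{average},j} = \frac{1}{5}\sum_{k=j-2}^{j+2} \mathrm{ASA}_{\mathrm{side\ chain},k} \qquad (5)$$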
Because equation 5 was undefined for the first two and last two residues in each protein, these four residues were ignored in the binning. The cutoffs for each side-chain area class were adjusted so that an approximately equal number of residues fell into each class. The low exposure category was defined as ASAaverage,j ≤ 40.81 Å², the medium exposure category was defined as 40.81 Å² < ASAaverage,j ≤ 57.77 Å², and the high exposure category was defined as ASAaverage,j > 57.77 Å².
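A minimal sketch of the windowed average follows, assuming a window centered on residue j and marking the undefined terminal positions with None. The function name and the input values are hypothetical.

```python
def windowed_average(asa, window=5):
    """Average side-chain surface area over a centered window.

    asa : per-residue side-chain accessible surface areas (square angstroms)
    Positions without a full window (the first and last window // 2
    residues) are returned as None so they can be skipped when binning.
    """
    half = window // 2
    averaged = []
    for j in range(len(asa)):
        if j < half or j >= len(asa) - half:
            averaged.append(None)
        else:
            averaged.append(sum(asa[j - half:j + half + 1]) / window)
    return averaged

# Made-up exposure values for a six-residue chain.
print(windowed_average([10, 20, 30, 40, 50, 60]))
```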
Random data sets
For comparison to the COREX and DSSP data sets from the 44 nonhomologous proteins in the database, control data sets were constructed by randomizing (i.e., shuffling) the calculated stability and the secondary structure data. The random data sets therefore contained the same amino acid composition, counts of high, medium, and low stabilities, and types of secondary structure, as the real data sets. However, any correlation between residue type and stability or secondary structural class was presumably destroyed by randomization. To assess internal variability of these data caused by differing numbers of counts of each residue type, the results from three randomized data sets were averaged and standard deviations calculated; these data are plotted as points and error bars in Figure 3.
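One way to build such a control set is sketched below. It assumes the environment labels are shuffled across the pooled residues while the residue types stay in place; the text does not say whether shuffling was done per protein or over the whole database, and the function name and seed handling are illustrative.

```python
import random
from collections import Counter

def shuffle_environments(residues, environments, seed=0):
    """Reassign environment labels to residues at random.

    The amino acid composition and the count of each environment label
    are unchanged; only the pairing between them is randomized.
    """
    rng = random.Random(seed)
    shuffled = list(environments)
    rng.shuffle(shuffled)
    return list(zip(residues, shuffled))

# Made-up residue types and stability classes.
toy_residues = ["G", "F", "P", "W", "A", "K"]
toy_classes = ["low", "high", "low", "high", "medium", "medium"]
control = shuffle_environments(toy_residues, toy_classes, seed=1)
print(Counter(env for _, env in control))
```

Running the function with three different seeds would give three independent control sets whose statistics can be averaged.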
Fold-recognition experiments were based on the profile method pioneered by Eisenberg and coworkers (Gribskov et al. 1987; Bowie et al. 1991). Briefly, the method characterizes each residue position of a target protein in terms of a structural environment score derived from analysis of a database of known structures. The resulting profile of the target protein is then optimally aligned to each member of a library of amino acid sequences by maximizing the score between the sequence and the profile. In this work, two structural environment scoring schemes were developed: one based on calculated COREX stability, and one based on DSSP secondary structure (Kabsch and Sander 1983) as contained in each target protein's PDB file. Each scoring scheme had three dimensions as a function of the 20 amino acids: high, medium, and low stability for COREX scoring (as described above), or α, β, and other for secondary structure scoring. Two alignment algorithms were used: a local scheme (Smith and Waterman 1981) as implemented in the PROFILESEARCH software package (Bowie et al. 1991), and a global scheme. The global alignment scheme simply paired the first residue of an amino acid sequence with the first position of a target profile with no allowance for gaps. No attempt was made to optimize the gap opening and extension penalties for the local algorithm; in all cases these were the default values given in the PROFILESEARCH package, 5.00 and 0.05, respectively. The library of amino acid sequences was composed of the sequences corresponding to each of the 44 structures in the database (Table 1).
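The global scheme described above reduces to a position-by-position sum. In this sketch the treatment of unequal lengths (scoring only the overlapping positions) is an assumption, and the toy matrix and names are invented for illustration.

```python
def global_score(profile, sequence, score):
    """Ungapped score: residue 1 on profile position 1, residue 2 on position 2, ...

    profile  : list of environment labels for the target structure
    sequence : string of one-letter amino acid codes
    score    : score[amino_acid][environment] -> 3D-1D matrix entry
    Positions beyond the shorter of the two are ignored (an assumption).
    """
    return sum(score[aa][env] for aa, env in zip(sequence, profile))

# Toy matrix with two residue types and two stability classes.
toy_matrix = {"G": {"low": 1.0, "high": -1.0},
              "F": {"low": -1.0, "high": 1.0}}
toy_target = ["low", "high", "high", "low"]
print(global_score(toy_target, "GFFG", toy_matrix))
```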
Construction of scoring matrices
The scoring matrices were calculated as log-odds probabilities of finding residue type j in structural environment k, as described below and in Bowie et al. (1991). The matrix score, Sj,k, was defined as:
$$S_{j,k} = \ln\left(\frac{P_{j|k}}{P_k}\right) \qquad (6)$$
In equation 6, Pj|k is the probability of finding a residue of type j in stability class k (i.e., number of counts of residue type j in stability class k divided by the total number of counts of residue type j) and Pk is the probability of finding any residue in the database in stability environment k (i.e., number of residues in stability class k, regardless of amino acid type, divided by the total number of residues in the entire database, regardless of amino acid type). The structural environment was described by either COREX stability information (high, medium, or low lnκf), or DSSP secondary structure (α, β, or other) as given in the target's PDB entry. The fold-recognition target was removed from the database, and the remaining 43 proteins were used to calculate the scores; therefore, information about the target was never included in the scoring matrix. The values in Table 3, A and B, are the average ± standard deviation of all 44 individual scoring matrices.
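The matrix construction, including removal of the target protein, can be sketched as follows. The natural logarithm, the handling of zero counts (scored as negative infinity here), the data layout, and the function names are assumptions for illustration; the two probabilities follow the definitions in the text.

```python
import math
from collections import Counter

def log_odds_matrix(observations):
    """Build {(residue_type, environment): score} from (type, environment) pairs."""
    obs = list(observations)
    pair_counts = Counter(obs)
    type_counts = Counter(j for j, _ in obs)
    env_counts = Counter(k for _, k in obs)
    total = len(obs)
    matrix = {}
    for j in type_counts:
        for k in env_counts:
            p_jk = pair_counts[(j, k)] / type_counts[j]  # counts of j in k / counts of j
            p_k = env_counts[k] / total                  # residues in k / all residues
            # Zero counts have no finite log-odds; -inf is an assumed convention.
            matrix[(j, k)] = math.log(p_jk / p_k) if p_jk > 0 else float("-inf")
    return matrix

def leave_one_out_matrix(database, target):
    """Score matrix computed from every protein except the target.

    database : dict mapping protein name -> list of (type, environment) pairs
    """
    obs = [pair for name, pairs in database.items()
           if name != target for pair in pairs]
    return log_odds_matrix(obs)

# Toy database of three "proteins" (entries are made up).
toy_db = {"a": [("G", "low"), ("F", "high")],
          "b": [("G", "low"), ("F", "high")],
          "c": [("G", "high")]}
m = leave_one_out_matrix(toy_db, "c")
print(m[("G", "low")])
```

Computing one such matrix per target and averaging the entries gives the kind of mean and standard deviation reported for the 44 individual matrices.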
Electronic supplemental material
This includes COREX lnκf values for 2951 residues in 44 proteins comprising the database (lnkf.txt), 44 3D-1D scoring matrices based on COREX lnκf values (matrices.txt), and the 44 amino acid sequences contained in the threading library (sequence.txt).
We thank Jim Bowie and UCLA DOE-MBI for a copy of the PROFILESEARCH software, Robert O. Fox for stimulating discussions, and Josephine Chu-Ferreon, Jim Hamburger, Roberto Galletto, Reza Razeghifard, Steve Whitten, and John Wooll for critically reading the manuscript. This work was supported by grants from GSE, Inc. (Columbia, MD), NSF (9875689), and the Welch Foundation (H-1461).