Electrostatic interactions are often critical determinants of protein structure and function. In an earlier protein design study, an overly simplistic electrostatic model was found to incorporate destabilizing electrostatic interactions into the designed proteins (Marshall et al. 2002). Energies calculated using the finite difference Poisson-Boltzmann (FDPB) model (Gilson et al. 1987; Honig and Nicholls 1995; Rocchia et al. 2001), a more sophisticated model for describing the electrostatic potential in proteins, correlated more strongly with experimentally determined stability. However, FDPB calculations, as normally performed, are computationally too costly for most protein design calculations.
Computational protein design algorithms (Gordon et al. 1999; Street and Mayo 1999; Kraemer-Pecore et al. 2001; Mendes et al. 2002) have relied on simple, often empirical methods to model electrostatic interactions between charged and polar protein groups and the desolvation of polar and charged side chains. For example, the ORBIT (Optimization of Rotamers by Iterative Techniques) protein design force field uses Coulomb's law with a distance-dependent dielectric and an explicit hydrogen-bond term to describe interactions between polar and charged groups and either a penalty for the burial of polar hydrogens or a penalty for the burial of polar surface area (Dahiyat and Mayo 1996; Gordon et al. 1999). Havranek and Harbury have developed a modified Tanford-Kirkwood model to describe electrostatic interactions and applied it to the design of homodimeric and heterodimeric coiled coils (Havranek and Harbury 1999, 2003). Baker and coworkers have used a volume-based solvent exclusion model to describe the desolvation of polar groups (Lazaridis and Karplus 1999), along with a distance-dependent dielectric model, in the successful design of a novel protein fold (Kuhlman et al. 2003). Recently, Hellinga and coworkers have empirically derived a large number of dielectric constants and interaction parameters to describe polar desolvation as well as charge–charge and charge–polar interactions between protein groups (Wisz and Hellinga 2003) and used these parameters to engineer catalytic function into a catalytically inert scaffold (Dwyer et al. 2004). Finally, Pokala and Handel (2004) have proposed a method for calculating Born radii in the context of protein design calculations.
Here, we describe a method for modeling electrostatic interactions in protein design calculations using a limited number of FDPB calculations performed with simplified surface representations. Typically, FDPB calculations require atomic coordinates for the protein backbone and all side chains in order to define the spatial regions that correspond to the low dielectric protein and high dielectric solvent. In protein design calculations, each possible rotameric sequence (a rotamer is a low energy amino acid side-chain conformation) will have a unique structure and require an independent FDPB calculation. Because the combinatorial complexity of design calculations is often astronomically large, it is not feasible to perform an independent calculation for each possible structure. Instead, we determine the electrostatic energy for each side chain or pair of side chains by performing FDPB calculations using simplified structures that include only the backbone and one or two side chains. The total energy is then obtained by summing the contribution of each side chain and side-chain pair.
Like the other electrostatic models that have been used for design, the simplified surface approach possesses the computational efficiency required for combinatorially complex protein design calculations. The method is two-body decomposable (meaning that each energy term depends on the identity and conformation of at most two amino acid side chains) and therefore compatible with deterministic search algorithms such as Dead End Elimination (DEE) (Desmet et al. 1992; Goldstein 1994; Gordon et al. 2003) that are often used for sequence selection. The two-body FDPB method described in this paper allows for calculation of both desolvation energies and electrostatic interactions between polar protein groups using a minimal number of free parameters. It explicitly captures the impact of sequence changes on the structure of the protein surface, which defines the boundary between the low dielectric protein and the high dielectric solvent. Finally, it efficiently produces energies that correlate well with standard FDPB methods, providing the accuracy demanded by protein design problems.
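Two-body decomposability is what makes an energy function usable with DEE. The following is a minimal sketch of the Goldstein elimination criterion, not the ORBIT implementation; the dictionary layouts `one_body[i][r]` and `two_body[(i, j)][(r, s)]` and all function names are hypothetical choices made for illustration.

```python
def pair_energy(two_body, i, r, j, s):
    """Look up a symmetric pair energy stored once per position pair (i < j)."""
    if i < j:
        return two_body[(i, j)][(r, s)]
    return two_body[(j, i)][(s, r)]


def goldstein_eliminates(i, r, t, one_body, two_body):
    """Return True if rotamer r at position i can be eliminated in favor of t.

    Goldstein criterion: r is a dead end if
    E(i_r) - E(i_t) + sum_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0.
    """
    gap = one_body[i][r] - one_body[i][t]
    for j in one_body:
        if j == i:
            continue
        # Worst case for r relative to t over all rotamers s at position j.
        gap += min(
            pair_energy(two_body, i, r, j, s) - pair_energy(two_body, i, t, j, s)
            for s in one_body[j]
        )
    return gap > 0.0


# Toy example: two positions, two rotamers each (made-up energies).
one_body = {0: {"a": 0.0, "b": 2.0}, 1: {"a": 0.0, "b": 0.5}}
two_body = {
    (0, 1): {("a", "a"): -1.0, ("a", "b"): 0.0, ("b", "a"): -0.5, ("b", "b"): 0.5}
}
```

The criterion needs only one-body and two-body terms, which is why an electrostatic model that depends on more than two side chains at a time cannot be used with it directly.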
Strategies for incorporating FDPB methods into protein design calculations
In this study, we have used the FDPB solver from the computer program DelPhi (Rocchia et al. 2001) to calculate electrostatic energies for 24 proteins selected from a group of 500 high resolution protein X-ray structures compiled by Richardson and coworkers (Lovell et al. 2003). The results of these “exact” FDPB calculations were compared to the results of a tractable number of FDPB calculations performed using simplified surface representations that require knowledge of the identity and conformation of no more than two amino acid side chains at a time in order to assess the accuracy of the simplified surface approximation.
Polar protein groups can form favorable electrostatic interactions with the solvent; we refer to the resulting energies as electrostatic solvation energies. The difference between the electrostatic solvation energy of a polar group in the folded state versus the unfolded state is the desolvation energy. In design calculations, the backbone conformation is typically held fixed. As shown in Figure 1A, the desolvation energy of the protein backbone can therefore be defined as the difference between the electrostatic solvation energy of the backbone in the presence of all of the protein's side chains versus the electrostatic solvation energy of the isolated backbone (a reference state that remains constant in the design calculation). As shown in Figure 2A, the desolvation energy of a side chain is defined as the difference between the electrostatic solvation energy of the side chain in the context of the folded protein versus the electrostatic solvation energy of the side chain and local backbone alone, where the local backbone is defined by the atoms CA(i − 1), C(i − 1), O(i − 1), N(i), H(i), CA(i), C(i), O(i), N(i + 1), H(i + 1), and CA(i + 1).
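The local backbone used for the side-chain reference state can be written down explicitly. This is a minimal sketch of that atom selection; the function name and the `(atom_name, residue_index)` tuple representation are assumptions made for illustration.

```python
def local_backbone_atoms(i):
    """Return (atom_name, residue_index) pairs for the local backbone of residue i.

    The list follows the definition in the text: CA, C, O of residue i - 1,
    the full backbone of residue i, and N, H, CA of residue i + 1.
    """
    return [
        ("CA", i - 1), ("C", i - 1), ("O", i - 1),
        ("N", i), ("H", i), ("CA", i), ("C", i), ("O", i),
        ("N", i + 1), ("H", i + 1), ("CA", i + 1),
    ]


atoms = local_backbone_atoms(10)
```

Terminal residues lack one neighbor, so a real implementation would need to handle chain ends; the sketch does not.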
Electrostatic interactions between polar protein groups and the solvent also act to screen Coulombic interactions within a protein. The screening energy is generally opposite in sign and weaker in magnitude than the Coulombic energy for a given interaction. The procedures used to calculate side-chain/backbone and side-chain/side-chain screening energies are shown in Figures 2A and 3A, respectively. In all cases, the screening energies and Coulombic energies are added to yield “screened Coulombic energies,” and the screened Coulombic energies predicted by the different electrostatic models are then compared. Because solvation energies are strongly anticorrelated with Coulombic energies, approximate electrostatic models should be validated by comparing screened Coulombic energies rather than screening energies alone (Scarsi and Caflisch 1999).
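For compatibility with the ORBIT protein design procedure, we have calculated backbone desolvation energies, side-chain desolvation energies, side-chain/backbone interaction energies, and side-chain/side-chain interaction energies separately. The total electrostatic energy of each rotameric state of a protein is then the sum of the backbone desolvation energy ($\Delta G^{\mathrm{desolv}}_{\mathrm{bb}}$), the desolvation energy of each side chain i ($\Delta G^{\mathrm{desolv}}_{i}$), the screened Coulombic interaction between each side chain i and the backbone ($\Delta G^{\mathrm{screenedCoul}}_{i/\mathrm{bb}}$), and the screened Coulombic interaction between each pair of side chains i and j ($\Delta G^{\mathrm{screenedCoul}}_{i/j}$):
$$\Delta G_{\mathrm{total}} = \Delta G^{\mathrm{desolv}}_{\mathrm{bb}} + \sum_{i} \Delta G^{\mathrm{desolv}}_{i} + \sum_{i} \Delta G^{\mathrm{screenedCoul}}_{i/\mathrm{bb}} + \sum_{i} \sum_{j>i} \Delta G^{\mathrm{screenedCoul}}_{i/j}$$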
When calculating the “exact” FDPB energies, each of the above terms is calculated using all of the protein atoms to define the low dielectric protein region versus the high dielectric solvent region.
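The sum of the four energy terms described above can be sketched in a few lines. The function name and the dictionary layout (side-chain terms keyed by residue index, pair terms keyed by `(i, j)` with `i < j`) are assumptions for illustration, and the numbers are made up.

```python
def total_electrostatic_energy(bb_desolv, sc_desolv, sc_bb, sc_sc):
    """Sum the backbone, side-chain, side-chain/backbone, and pair terms.

    bb_desolv: backbone desolvation energy (float)
    sc_desolv: dict residue index -> side-chain desolvation energy
    sc_bb:     dict residue index -> side-chain/backbone screened Coulombic energy
    sc_sc:     dict (i, j) with i < j -> side-chain/side-chain screened Coulombic energy
    """
    total = bb_desolv
    total += sum(sc_desolv.values())
    total += sum(sc_bb.values())
    total += sum(sc_sc.values())  # each pair is stored once, so it is counted once
    return total


energy = total_electrostatic_energy(
    bb_desolv=3.0,
    sc_desolv={1: 1.0, 2: 0.5},
    sc_bb={1: -0.5, 2: -1.0},
    sc_sc={(1, 2): -2.0},
)
```

Storing each pair under a single ordered key avoids double counting the side-chain/side-chain term.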
One-body FDPB decomposition
Several physical properties of proteins can be calculated using information derived from the protein surface. While protein surfaces cannot be perfectly represented using pairwise decomposable methods, earlier protein design studies have demonstrated that pairwise or sequence-independent approximations can yield satisfactory results for hydrophobic solvation and binary patterning, respectively (Street and Mayo 1998; Marshall and Mayo 2001). Similarly, it may be possible to obtain accurate estimates of the FDPB energies obtained using all the atomic coordinates to define the surface from FDPB energies obtained using simplified models for the protein surface that require knowledge of only one or two side-chain conformations at a time.
Since the protein backbone is fixed during design calculations, an approximate one-body (i.e., one side-chain rotamer) surface can be obtained using the atoms from the protein backbone and the side chain of interest only. It is necessary to include the side chain of interest when defining the protein surface to ensure that all protein charges are located in the low dielectric protein region rather than the high dielectric solvent region. The one-body backbone desolvation energy, which is an approximation of the desolvation of the backbone by each side chain, is calculated as the difference in solvation energy between the one-body folded state (which includes only the side chain of interest and the backbone) and the isolated backbone, as shown in Figure 1B. The total backbone desolvation energy for each protein is approximated as the sum of the one-body backbone desolvation energies of each of its side chains. As is shown in Figure 2B, one-body side-chain desolvation energies are calculated as the difference in solvation energy between the one-body folded state (which includes only the side chain of interest and the backbone) and the unfolded state (which includes side chain i and the local backbone). The one-body side-chain/backbone screened Coulombic energy of each side chain is calculated using the model in Figure 2C.
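As a rough illustration of the one-body backbone desolvation sum, the sketch below assumes that the backbone solvation energies have already been obtained from FDPB calculations; the function names, the dictionary layout, and the numerical values are hypothetical.

```python
def one_body_desolvation(solv_one_body_folded, solv_reference):
    """Desolvation as the folded-state solvation energy minus the reference-state value."""
    return solv_one_body_folded - solv_reference


def total_backbone_desolvation(bb_solv_with_side_chain, bb_solv_isolated):
    """Approximate the backbone desolvation as a sum over side chains.

    bb_solv_with_side_chain: dict residue index -> backbone solvation energy
        computed with only the backbone and that one side chain present.
    bb_solv_isolated: backbone solvation energy with no side chains present.
    """
    return sum(
        one_body_desolvation(e, bb_solv_isolated)
        for e in bb_solv_with_side_chain.values()
    )


# Made-up solvation energies, used only to exercise the arithmetic.
bb_total = total_backbone_desolvation({1: -98.0, 2: -99.5, 3: -100.0}, -100.0)
```

The same difference applies to one-body side-chain desolvation, with the side chain plus local backbone as the reference state instead of the isolated backbone.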
To test the accuracy of the one-body decomposition, we calculated the backbone desolvation energies, side-chain desolvation energies, and side-chain/backbone screened Coulombic energies for the set of 24 proteins. Backbone desolvation energies can be calculated reasonably well by summing the desolvation induced by the presence of each side chain, as shown in Figure 4A. Using the one-body decomposition, the backbone desolvation energy resulting from each side chain can be considered as a component of the side-chain/backbone energy of the side chain in design calculations. The extent to which backbone desolvation energy depends on protein sequence and side-chain conformations is not yet fully understood. Avbelj, Baldwin, and coworkers, however, have reported the importance of backbone desolvation in determining amino acid secondary structure propensities (Avbelj and Moult 1995; Avbelj and Fele 1998; Avbelj et al. 2000).
The one-body approximation grossly underestimates the majority of the side-chain desolvation and side-chain/backbone screened Coulombic energies, as shown in Figure 4, B and C, respectively. The one-body model neglects the contribution of the other side chains to the dielectric environment of the side chain of interest, resulting in an excessively solvated folded state. Deviations between the one-body and exact FDPB results are especially pronounced for large magnitude desolvation and screened Coulombic energies, which tend to occur in environments with a low effective dielectric.
Two-body FDPB decomposition
More accurate energies can be obtained using two-body methods (i.e., methods including two side-chain rotamers), in which the total side-chain desolvation or side-chain/backbone screened Coulombic energy for each side chain i is defined as the sum of its one-body energy and the two-body perturbation energies for each other side chain j. As shown in Figure 2, B and C, the perturbation energy of each other side chain is defined as the difference between the two-body energy, which is calculated using the backbone and two side chains to define the dielectric boundary, and the one-body energy calculated previously.
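The two-body perturbation sum can be sketched as follows. The function name and input layout are assumptions for illustration; each input energy would come from a separate FDPB calculation, and the numbers shown are made up.

```python
def two_body_energy(one_body, two_body_with):
    """Combine a one-body energy with two-body perturbations from other side chains.

    one_body: energy of side chain i computed with the dielectric boundary
        defined by the backbone and side chain i only.
    two_body_with: dict j -> energy of side chain i computed with the boundary
        defined by the backbone, side chain i, and side chain j.
    """
    # Perturbation due to j = two-body energy minus the one-body energy.
    perturbations = {j: e_ij - one_body for j, e_ij in two_body_with.items()}
    return one_body + sum(perturbations.values())


energy_i = two_body_energy(2.0, {3: 2.5, 7: 3.0})
```

Each perturbation depends on only two side chains, so the result remains two-body decomposable.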
Incorporating the effects of other side chains using the two-body perturbation method allows accurate calculation of electrostatic energies, as shown in Table 1 and Figure 5, A and B. Five outlier points, representing five different amino acid types from four structures, were observed to have large errors in their two-body side-chain desolvation energies, as shown in Figure 5A. These outliers likely arise from grid placement artifacts, a source of error in FDPB calculations that has been described previously (Gilson et al. 1987). Accurate two-body desolvation energies can be obtained for these five points by slightly altering the position of the molecule relative to the grid (data not shown).
The two-body approximation systematically underestimates the magnitude of the side-chain desolvation energy. The systematic error in the two-body desolvation energy was minimized by linearly scaling the two-body perturbation energy. The set of 24 structures was divided into two sets of 12 structures, and a scaling parameter, α, was derived by a linear least-squares fit for each set (with the five outlier points removed). The robustness of the scaling parameter was tested by cross-validation, as shown in Table 2, and sensitivity analysis, as shown in Figure 6A. The error in the two-body side-chain desolvation is reasonably insensitive to α around the optimal α value, and both sets have similar dependence on α, suggesting that this scaling parameter should be used in routine calculations.
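The fit and cross-validation of the scaling parameter can be illustrated with a short sketch. It assumes the model exact ≈ one-body + α × (summed two-body perturbation) with no intercept, which may differ from the fit actually performed; the function names and the data are hypothetical.

```python
import math


def fit_alpha(one_body, perturbation, exact):
    """Least-squares alpha for exact ~ one_body + alpha * perturbation (no intercept)."""
    num = sum(p * (e - o) for o, p, e in zip(one_body, perturbation, exact))
    den = sum(p * p for p in perturbation)
    return num / den


def rms_error(alpha, one_body, perturbation, exact):
    """Root-mean-square deviation of the scaled two-body energies from the exact ones."""
    sq = [(o + alpha * p - e) ** 2 for o, p, e in zip(one_body, perturbation, exact)]
    return math.sqrt(sum(sq) / len(sq))


# Made-up training set constructed so that alpha = 1.25 reproduces it exactly.
train = ([1.0, 2.0, 3.0], [2.0, 4.0, 1.0], [3.5, 7.0, 4.25])
# Made-up held-out set for cross-validation.
held_out = ([0.5, 1.0], [2.0, 2.0], [3.0, 3.5])

alpha = fit_alpha(*train)
cv_error = rms_error(alpha, *held_out)
```

Fitting on one half of the structures and evaluating `rms_error` on the other half mirrors the cross-validation described in the text.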
In the one-body FDPB method, we calculated side-chain and backbone desolvation energies and side-chain/backbone screening energies, but not side-chain/side-chain screening energies. Simply multiplying the one-body potential generated by side chain i by the partial atomic charges of side chain j is not very accurate (data not shown), especially for charged atoms located at or beyond the dielectric boundary defined by side chain i and the protein backbone. Side-chain/side-chain screened Coulombic energies were calculated using a two-body decomposable method that uses only the backbone and two side chains of interest to define the dielectric boundary, as shown in Figure 3B. Although the two-body model systematically overscreens the Coulombic interactions, the accuracy obtained using a two-body FDPB decomposition is quite good, as shown in Table 1 and Figure 5C. The two-body approximation is probably less accurate for certain large interaction energies owing to increased sensitivity to the shape of the dielectric boundary in regions of large electrostatic potential.
Analysis of the side-chain desolvation and side-chain/backbone screened Coulombic energies indicates that, in most cases, the perturbation caused by a second side chain is negligible. The small fraction of two-body perturbations that contribute significantly to the desolvation or side-chain/backbone energies involve pairs of residues that are close in space. Furthermore, side-chain/side-chain interaction energies for residues that are not close in space are typically small in magnitude and may be approximated using a simpler electrostatic model. We performed additional calculations in which two-body perturbations were calculated only for pairs separated by less than 6 Å or 4 Å. As shown in Table 1, we observe a slight decrease in accuracy as the distance cutoff is decreased from infinity to 6 Å to 4 Å. This arises from an increased underestimation of the side-chain desolvation energies and side-chain/backbone screened Coulombic energies, as well as increased inaccuracy in defining the dielectric environment, as fewer pairs are included.
When calculating screened Coulombic energies, the interaction of side-chain pairs separated by more than a distance cutoff of 6 Å or 4 Å was approximated by a distance-dependent Coulombic model, and the two-body FDPB model was applied only to pairs that are close in space. The two sets of protein structures used for the α parameterization were used to derive the optimal distance-dependent dielectric values for pairs separated by distances greater than the cutoff. The dielectrics derived for each set are similar, and the errors in the two-body approximation with the cutoffs are comparable to the error in the full two-body calculation including all pairs, as shown in Table 3. The sensitivity of the error and correlation with the exact FDPB energies to the dielectric value is shown in Figure 6, B and C.
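The cutoff scheme can be sketched as a simple switch between the two models. The sketch assumes a dielectric of the form ε(r) = k·r, a conversion constant of 332.06 kcal·Å/(mol·e²), and a caller-supplied pair separation; the functional form, the value k = 40, and all names are illustrative assumptions, not values taken from this study.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2); standard conversion factor


def distance_dependent_coulomb(charges_i, charges_j, dielectric_scale):
    """Coulomb energy with an assumed dielectric eps(r) = dielectric_scale * r.

    charges_i, charges_j: lists of (partial_charge, (x, y, z)) tuples.
    """
    energy = 0.0
    for q1, (x1, y1, z1) in charges_i:
        for q2, (x2, y2, z2) in charges_j:
            r = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
            # q1*q2 / (eps(r) * r) with eps(r) = dielectric_scale * r
            energy += COULOMB_CONSTANT * q1 * q2 / (dielectric_scale * r * r)
    return energy


def pair_screened_coulomb(separation, cutoff, fdpb_energy, charges_i, charges_j,
                          dielectric_scale):
    """Use the two-body FDPB energy for close pairs, the simple model otherwise.

    fdpb_energy: zero-argument callable, so the costly calculation is only
        run when the pair falls inside the cutoff.
    """
    if separation < cutoff:
        return fdpb_energy()
    return distance_dependent_coulomb(charges_i, charges_j, dielectric_scale)


# Made-up pair: unit charges of opposite sign 10 Angstroms apart.
side_chain_i = [(1.0, (0.0, 0.0, 0.0))]
side_chain_j = [(-1.0, (10.0, 0.0, 0.0))]
far = pair_screened_coulomb(10.0, 4.0, lambda: -5.0, side_chain_i, side_chain_j, 40.0)
near = pair_screened_coulomb(3.0, 4.0, lambda: -5.0, side_chain_i, side_chain_j, 40.0)
```

Passing the FDPB calculation as a callable keeps the expensive step lazy, which is the point of applying a cutoff.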
Considering only a limited subset of pairs significantly reduces the total calculation time, which is crucial since the number of pairs in a design calculation is often large. For instance, the reported surface design calculation for engrailed homeodomain considers 15,000,000 rotamer pairs (Marshall et al. 2002). The FDPB calculation for this number of pairs would require ∼3 wk of CPU time on a cluster of 128 IBM PowerPC 970 processors running at 1.6 GHz. The time required to complete the two-body calculation can be reduced to <1 d of CPU time by applying a distance cutoff of 4.0 Å.
It has been shown that, for a series of designed homeodomain variants, there is a correlation between experimental stability and exact FDPB electrostatic energies plus ORBIT van der Waals energies (Marshall et al. 2002). In order to assess the predictive power of the two-body method presented here, we have compared the two-body FDPB energies to these experimental results. For each variant, the sum of all two-body side-chain/backbone and side-chain/side-chain screened Coulombic energies and the sum of all two-body side-chain desolvation energies were added to the ORBIT van der Waals energies. As shown in Figure 7, the two-body FDPB energies are able to predict, with accuracy close to that of the exact FDPB calculations, trends in experimental stabilities of six of the seven variants tested, including the wild-type protein and NC3-Ncap, the most stable variant.
Thus far, we have developed and tested new electrostatic models for protein design calculations by maximizing the agreement between the approximate desolvation and screened Coulombic energies with the exact FDPB energies. While even “exact” FDPB energies are an approximation of the true electrostatic energy of the system, it is probable that, in the context of design calculations, the accuracy of the structural model will be a greater source of error than the limitations of the underlying FDPB model. To maximize computational efficiency, most protein design methods use a fixed backbone, discrete side-chain rotamers, and a very simple model of the unfolded state. As a result, certain errors in electrostatic energies can be observed in design calculations. For example, the energetic benefit of surface salt bridges is overestimated if the entropic cost of locking flexible side chains into a single conformation is not considered. Similarly, the folded-state stability conferred by interactions that are populated in the unfolded state, such as i, i ± 2 side-chain/backbone interactions, is overestimated if the unfolded state is modeled as the side chain and local backbone only.
Based on a single study of electrostatics in designed proteins (Marshall et al. 2002), either exact or two-body FDPB energies (with large magnitude side-chain/side-chain interactions truncated) are sufficiently accurate to provide a reasonable correlation with experimentally determined stability, as shown in Figure 7. Additional experimental studies will be required to assess the performance of the two-body decomposable model in the design of proteins with specific catalytic or binding properties. In cases where accurate modeling of electrostatics is especially critical, more sophisticated structural models, such as the flexible rotamer model (Mendes et al. 1999) and explicit modeling of alternate backbone conformations (Kuhlman et al. 2003), may prove useful.
Accurate electrostatic models, including the FDPB model, require knowledge of the full tertiary structure of the protein. As a result, these models cannot be applied directly to protein design calculations, which often consider >10^50 possible protein structures. While it is not possible to explicitly calculate electrostatic energies in each structural environment, it is also not prudent to neglect changes in the shape of a protein's surface that result from modifying the protein sequence.
We have found that it is possible to obtain accurate electrostatic energies using simplified surface models that depend on the identity and conformation of the protein backbone and only one or two side chains at a time. The success of the two-body FDPB method suggests that it is critical to define the surface accurately in the immediate vicinity of the partial charges that are “generating” and “feeling” the electrostatic potential in each calculation. The results also suggest that it is important to account for desolvation and screening due to other nearby side chains, but that the effects of each other side chain are fairly independent and can be captured pairwise. Finally, we have found that the effects of sequence-dependent variation in the dielectric boundary can be neglected if the perturbations are reasonably far removed from the partial charges that are “generating” or “feeling” the electrostatic potential in a given calculation.
Efficient and accurate electrostatic models are also critical for protein folding and docking calculations. The simplified surface methods discussed here could be used to explore different side-chain orientations given a fixed-backbone conformation. Similarly, derivatives of a small molecule scaffold, such as those generated by combinatorial chemistry methods, could be modeled. However, folding and docking calculations typically sample a large number of backbone conformations or relative molecular orientations. Since each backbone conformation would require an independent set of one- or two-body FDPB calculations, the computational demands of folding and docking calculations would be far greater than those for design.
The stability of designed proteins has already been demonstrated to be sensitive to the quality of the electrostatic model used in the design calculations. It is likely that electrostatic interactions are at least as important in determining the functional properties of proteins, including binding and catalysis. As a result, the development and testing of accurate electrostatic models are likely to significantly aid in the design of proteins with desired physical, chemical, and biological properties.