Computational analysis of binding of P1 variants to trypsin



The binding of P1 variants of bovine pancreatic trypsin inhibitor (BPTI) to trypsin has been investigated by means of molecular dynamics simulations. The specific interaction formed between the amino acid at the primary binding (P1) position of the binding loop of BPTI and the specificity pocket of trypsin was estimated by use of the linear interaction energy (LIE) method. Calculations for 13 of the naturally occurring amino acids at the P1 position were carried out, and the results obtained were found to correlate well with the experimental binding free energies. The LIE calculations rank the majority of the 13 variants correctly according to the experimental association energies and the mean error between calculated and experimental binding free energies is only 0.38 kcal/mole, excluding the Glu and Asp variants, which are associated with some uncertainties regarding protonation and the possible presence of counter-ions. The three-dimensional structures of the complex with three of the P1 variants (Asn, Tyr, and Ser) included in this study have not at present been solved by any experimental techniques and, therefore, were modeled on the basis of experimental data from P1 variants of similar size. Average structures were calculated from the MD simulations, from which specific interactions explaining the broad variation in association energies were identified. The present study also shows that explicit treatment of the complex water-mediated hydrogen bonding network at the protein–protein interface is of crucial importance for obtaining reliable binding free energies. The successful reproduction of relative binding energies shows that this type of methodology can be very useful as an aid in rational design and redesign of biologically active macromolecules.

The ability to predict the strength of noncovalent binding between molecules has been a longstanding goal in computational chemistry. Various methods have been developed during the past 10–15 yr to address the question of ligand–receptor binding, and several comprehensive reviews are available (Kollman 1993; Lamb and Jorgensen 1997; Böhm and Stahl 1999). Free energy perturbation (FEP) theory (Beveridge and DiCapua 1989; Jorgensen 1989; Kollman 1993) combined with conformational sampling by molecular dynamics or Monte Carlo simulations provides a rigorous way of calculating free energies upon modifications of a ligand or a receptor. Most FEP calculations take advantage of a thermodynamic perturbation cycle (Fig. 1), and modifications of the ligand or receptor are achieved through a nonphysical transformation process. As a result of sampling and convergence problems related to large perturbations, FEP calculations are in most cases limited to the evaluation of relative binding free energies for compounds of similar chemical structure. Even calculations of relative binding free energies can pose a major problem if extensive modifications are required to bring the system from one state to another. The recently proposed linear interaction energy (LIE) method (Åqvist et al. 1994), based on the electrostatic linear response approximation and an empirical estimate of the nonpolar binding contribution, has been found to provide a useful alternative to FEP in many cases. In contrast to FEP calculations, the LIE method requires only simulations of the corners of the thermodynamic perturbation cycle and, therefore, no unphysical transformations are needed. The LIE approach has been used to estimate absolute binding free energies of various ligands. It was first applied (Åqvist et al. 1994) to a set of endothiopepsin inhibitors for which absolute binding free energies were obtained in excellent agreement with experimental data. 
Subsequently, the method was used in studies of inhibitor binding to HIV protease, trypsin, dihydrofolate reductase (DHFR), and human thrombin as well as sugar binding to a bacterial receptor protein (Åqvist and Mowbray 1995; Hansson and Åqvist 1995; Åqvist 1996; Hulten et al. 1997; Marelius et al. 1998b; Ljungberg et al. 2001). Other research groups have also employed the LIE type of approach in studies of ligand binding to, for example, thrombin (Jones-Hertzog and Jorgensen 1997), FKBP12 (Lamb et al. 1999; Adalsteinsson and Bruice 2000a,b), P450cam (Paulsen and Ornstein 1996), DHFR (Gorse and Gready 1997), avidin (Wang et al. 1999a), neuraminidase (Wall et al. 1999), and catalytic antibodies (Xu et al. 1999).

The original version of the linear interaction energy approach employs two parameters, α and β, as scaling factors for the nonpolar and polar ligand-surrounding interaction energies, respectively. In several test cases, it was found that β = 0.5 and α = 0.16 were reasonably transferable among different protein systems. Subsequently, the coefficients were refined by FEP calculations, and the electrostatic coefficient β was then defined as a function of charge and the number of hydroxyl groups on the ligand (Hansson et al. 1998). This process resulted in four different β values ranging between 0.33 and 0.5, where the latter value is now used only for ionic ligands, and a value of the nonpolar parameter α = 0.18. The main idea behind refining the electrostatic coefficient was to account for deviations from the linear response approximation, which were particularly pronounced for ligands containing dipolar groups. This revised LIE model has only one free parameter, namely α, as the β values were obtained by FEP calculations and not fitted to experimental binding data. For inhibitor binding to human thrombin, however, it was recently found that an additional constant (γ = 3 kcal/mole) is required to reproduce the absolute binding free energies (Ljungberg et al. 2001). The previously derived values of α and β were, however, found to be optimal also for thrombin. Still, other studies indicate that different parameters may be needed to obtain quantitative agreement with experimental data. LIE calculations on binding free energies for P450cam–substrate complexes showed that the best fit was obtained with α = 1.043 and β = 0.5 (Paulsen and Ornstein 1996). A recent study by Wang et al. (1999b) also indicated that a higher value of the α parameter (or the addition of a constant γ) is needed for some systems. Therefore, it seems likely that these parameters are somewhat protein dependent, although differences in force fields and computational procedures may also be important. 
Notwithstanding the different parameterizations of the LIE method, binding free energies have, in most cases, been predicted in good agreement with experiments. The most promising feature is probably the capability to calculate absolute binding free energies, which is normally not within the scope of FEP calculations.
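The parameterization described above can be summarized in a short sketch. It assumes the standard LIE form, in which the binding free energy is a weighted sum of the bound-minus-free differences in the average nonpolar (van der Waals) and electrostatic ligand-surrounding energies, plus an optional constant. The function name, argument names, and the input energies are hypothetical and chosen only for illustration; the default α = 0.18 and β = 0.5 (the value quoted for ionic ligands) are taken from the text.

```python
def lie_binding_free_energy(d_vdw, d_el, alpha=0.18, beta=0.5, gamma=0.0):
    """LIE estimate of a binding free energy in kcal/mol.

    d_vdw, d_el: bound-minus-free differences of the average nonpolar and
    electrostatic ligand-surrounding interaction energies (kcal/mol).
    alpha, beta: scaling factors for the nonpolar and polar terms.
    gamma: optional additive constant (zero in the original model).
    """
    return alpha * d_vdw + beta * d_el + gamma


# Hypothetical energy differences, not taken from the study.
example = lie_binding_free_energy(d_vdw=-10.0, d_el=-4.0)
print(f"LIE estimate: {example:+.2f} kcal/mol")
```

Keeping α, β, and γ as explicit arguments makes it easy to test how sensitive an estimate is to the protein-dependent parameter choices discussed above.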

The interaction between serine proteinases and their natural inhibitors is one of the most extensively studied models of protein–protein recognition. A large amount of structural, kinetic, and mutagenic data (Bode and Huber 1992; Helland et al. 1999; Krowarsch et al. 1999) is available, which makes the model system an ideal choice for exploring features related to protein–protein association in addition to testing the validity of computational procedures. Association measurements for binding of bovine pancreatic trypsin inhibitor (BPTI) to trypsin (Krowarsch et al. 1999) revealed binding free energies spanning a range of more than 12 kcal/mole for a set of BPTI variants. The binding mode is characterized by an accommodation of the primary binding residue (P1, nomenclature of Schechter and Berger 1967) in the canonical binding loop at the specificity pocket (S1 site) of trypsin, and a highly complex hydrogen bonding network. Stable complex formation generally involves direct contact between about 10–12 amino acid residues on the inhibitor and 20–25 residues on the enzyme, providing an intermolecular buried area of 600–900 Å². Interaction free energies between residues in the canonical binding loop and the proteinase are often found to be additive (Qasim et al. 1997; Krowarsch et al. 1999), and the P1 residue is responsible for up to about 70% of this interaction energy (Krowarsch et al. 1999).

The present study was undertaken to gain insight into mechanisms related to protein–protein recognition as well as to examine procedures for investigations of specific energetics related to protein–protein interactions. The model system is well suited for this purpose as a large amount of structural and kinetic data are available. In this study, we used the LIE method to investigate the P1–S1 interaction in 13 trypsin–BPTI complexes that differ in the inhibitor P1 position. The results are found to be in excellent agreement with experimental association energies, and average simulated structures are close to the X-ray structures. Most important is the reproduction of the highly complex hydrogen bonding network at the S1 site, which is found to be of crucial importance for polar side chains.

Results and Discussion

As mentioned above, trypsin–BPTI provides an ideal model system for free energy calculations, as a vast amount of experimental data is available and forms a large data set for cross-validation of calculated binding free energies. Crystallographic analysis of 10 P1 variants in complex with trypsin revealed a highly complex water-mediated hydrogen bonding network at the protein–protein interface. The complexity of this network changes in response to the size and charge of the P1 residue and is thus crucial for the binding. Previous free energy calculations (Brandsdal and Smalås, 2000) showed that to obtain quantitative agreement with experimental association energies, water molecules had to be taken explicitly into consideration. The water-mediated network found at the contact area is not unique to trypsin–BPTI, but is found in other proteinases as well (Huang et al. 1995). The trypsin–BPTI interface is also characterized by specific interactions at secondary binding sites. The energetic contribution from these binding sites is similar for all complexes (Helland et al. 1999), and thus the main difference is the interactions formed by the P1 residue. Therefore, it is natural to choose the P1-Gly mutant as a reference state, and the difference $\Delta G_{\mathrm{X}} - \Delta G_{\mathrm{Gly}}$ as a measure of the strength of the interaction formed between the P1-X side chain and the S1 site of trypsin. Other studies also support this choice (Huang et al. 1995; Lu et al. 1997; Qasim et al. 1997; Krowarsch et al. 1999). Consequently, only the P1 residue is treated as the ligand in the LIE calculations, whereas the rest of BPTI is considered part of the surroundings. This approach turns out to be very useful as the interaction energy of the P1 residue converges quite rapidly in the MD simulations, which would, of course, not be the case for the entire BPTI molecule. 
Thus, the idea here, to treat only one specific residue of a complex as the ligand in the LIE scheme, provides a new strategy for using this technique to examine the energetics of protein–protein recognition. Absolute binding free energies cannot be estimated in this way. This is not a serious drawback, however, because protein–protein affinities are often more usefully analyzed in relative terms, that is, in terms of point mutations. The present approach can also, in principle, be extended to as many simultaneous mutations as the energy convergence allows.
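The reference-state construction can be written compactly as follows. This is a sketch that assumes the standard LIE form for each variant, with the P1 residue as the ligand. The notation (angle brackets for MD averages, the subscripts, and a variant-specific $\beta_{\mathrm{X}}$) is introduced here for illustration and is not taken from the original text.

```latex
\begin{align}
  \Delta G_{\mathrm{X}} &\approx
      \alpha \left( \langle V^{\mathrm{vdw}}_{\mathrm{X}} \rangle_{\mathrm{complex}}
                  - \langle V^{\mathrm{vdw}}_{\mathrm{X}} \rangle_{\mathrm{free}} \right)
    + \beta_{\mathrm{X}} \left( \langle V^{\mathrm{el}}_{\mathrm{X}} \rangle_{\mathrm{complex}}
                  - \langle V^{\mathrm{el}}_{\mathrm{X}} \rangle_{\mathrm{free}} \right), \\
  \Delta\Delta G_{\mathrm{X}} &= \Delta G_{\mathrm{X}} - \Delta G_{\mathrm{Gly}} .
\end{align}
```

Here "complex" refers to the simulation of the trypsin–BPTI complex and "free" to the simulation of unbound BPTI in water, so that contributions common to all variants cancel in the difference under the stated assumption of equal secondary-site interactions.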

P1-His

The P1-His side chain is not involved in any direct contacts with trypsin, but interacts via solvent water molecules. Still, the measured association energy for binding of BPTI P1-His to trypsin revealed ∼400 times stronger binding relative to P1-Gly, which corresponds to a binding free energy difference of −3.6 kcal/mole (Table 1). The experimental binding measurements were carried out at pH 8.3, at which a (solvent-accessible) histidine residue is usually unprotonated. The LIE calculations yield a free energy difference of −3.6 kcal/mole using a neutral description of the P1-His relative to P1-Gly, in good agreement with the experiments. It may also be noted here that the binding process of turkey ovomucoid third domain inhibitor (OMTKY3) with a histidine as P1 residue to Streptomyces griseus proteinase B revealed a drop in the pKa value for the P1-His of about 2 units (Qasim et al. 1995), which supports the notion of P1-His being unprotonated upon binding. On the other hand, a binding free energy difference of −6.6 kcal/mole is obtained relative to P1-Gly when using a charged (+1) description of P1-His. Because the P1 residue of BPTI is fully exposed to solvent, it will be unprotonated at pH 8.3 in unbound BPTI. Therefore, the calculated binding free energy using a charged P1-His needs to be corrected by the free energy of protonation, which is given by $\Delta\Delta G_{\mathrm{bind}}^{\mathrm{p}K_a} = 1.35\,(\mathrm{pH} - \mathrm{p}K_a)$ kcal/mole, where pH = 8.3 and pKa = 6.6 is that of a free histidine (e.g., see Qasim et al. 1995). After addition of the appropriate pH-correction term (+2.3 kcal/mole), the binding free energy associated with P1-His becomes −4.3 kcal/mole. To search for differences explaining the different interactions with the charged and neutral forms of P1-His, average structures from the MD simulations were calculated. A closer inspection of these average structures reveals a slightly different hydrogen bonding network (Fig. 2). 
The charged variant achieves a more stable hydrogen bonding network involving the main-chain oxygen of Gly219, and a water molecule at the second nitrogen. Repeating the neutral P1-His calculations with the hydrogen attached to the other nitrogen (Nε) yields a P1–S1 interaction free energy of only −1.5 kcal/mole (results not shown).
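The arithmetic behind the P1-His numbers can be checked with a minimal sketch. It assumes a temperature of 298.15 K for converting an affinity ratio into a free energy difference (the temperature is not stated in this section) and uses the pH-correction slope of 1.35 kcal/mole per pH unit from the text, which is approximately RT ln 10 near room temperature. The function and variable names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def fold_to_ddg(fold, temperature=298.15):
    """Free energy difference (kcal/mol) for a `fold`-times larger association constant.

    The temperature default is an assumption made for this sketch.
    """
    return -R_KCAL * temperature * math.log(fold)


def ph_correction(ph, pka, slope=1.35):
    """Cost (kcal/mol) of protonating a group with the given pKa at the given pH."""
    return slope * (ph - pka)


ddg_exp_his = fold_to_ddg(400.0)      # ~400-fold stronger binding than P1-Gly
corr_his = ph_correction(8.3, 6.6)    # free histidine pKa quoted in the text
ddg_charged_his = -6.6 + corr_his     # charged P1-His estimate after correction

print(f"experimental estimate: {ddg_exp_his:+.1f} kcal/mol")
print(f"pH correction:         {corr_his:+.1f} kcal/mol")
print(f"charged His, corrected: {ddg_charged_his:+.1f} kcal/mol")
```

Because the correction scales linearly with the assumed pKa, a shift of 0.5 pKa units in free BPTI changes the corrected value by about 0.7 kcal/mole.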

P1-Glu and P1-Gln

Binding of BPTI P1-Glu to trypsin results in unfavorable interactions at the bottom of the S1 pocket. Consequently, the side chain of the glutamic acid becomes highly flexible as revealed by relatively weak and delocalized electron density in the X-ray structure (Helland et al. 1999). The P1-Glu complex was also refined with two alternative positions. P1-Gln binds to trypsin with the same strength as P1-Glu and is very similar in terms of hydrogen bonding network. Free energy perturbation calculations on the transformation of P1-Glu to Gly and P1-Gln to Glu (B. Brandsdal, A.O. Smalås, and J. Åqvist, unpubl.) showed that trypsin was not capable of accommodating a negatively charged glutamate at the P1 position. The LIE calculations estimate the P1–S1 interaction to be +13.5 kcal/mole (including the contribution from distant charges) when using a negatively charged P1-Glu in both bound and unbound BPTI. Considering the experimental association energy associated with accommodation of P1-Gly at the S1 site and the calculated P1–S1 interaction free energy, BPTI P1-Glu should not bind to trypsin at all. Yet, the experimental association measurements show that BPTI P1-Glu binds about 100 times more strongly (−2.9 kcal/mole) than P1-Gly. This prompted us to study binding of P1-Glu using a protonated description in both states. The LIE calculations yield slightly different average electrostatic interaction energies for P1-GluH (protonated) and P1-Gln relative to P1-Gly, −1.4 and −1.7 kcal/mole, respectively, whereas the nonpolar part is −1.9 and −1.6 kcal/mole (Table 1). Thus, the LIE calculations estimate the P1–S1 interaction free energy to be −3.3 kcal/mole for both P1-GluH and P1-Gln, whereas the association measurements gave −2.9 and −3.0 kcal/mole, respectively. The use of a protonated P1-GluH does, however, require a correction term to be added to account for the energy required to protonate the ionizable P1 residue in the unbound state at pH 8.3. 
This correction amounts to approximately +5.1 kcal/mole using $\Delta\Delta G_{\mathrm{bind}}^{\mathrm{p}K_a} = 1.35\,(\mathrm{pH} - \mathrm{p}K_a)$ and assuming a pKa of 4.5 for P1-Glu in BPTI (Qasim et al. 1995). Hence, addition of this term to the calculated binding free energy for P1-GluH yields a P1–S1 interaction free energy of +1.8 kcal/mole. The above calculations of the P1-GluH complex were based on the conformation similar to that of P1-Gln but, as noted above, the P1-Glu complex was refined with two alternative P1 conformations. This motivated us to study the binding of P1-GluH also using the second conformation, in which the P1 side chain is in a more upward position with respect to Asp 189 (Helland et al. 1999). The results from this calculation are also presented in Table 1 and show a stronger stabilization of the polar P1-GluH side chain than observed for the other orientation. In this conformation, the P1–S1 interaction is −10.0 kcal/mole and addition of the pH correction term yields a binding free energy difference of −4.9 kcal/mole.
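To make the comparison of the three P1-Glu scenarios explicit, the following sketch tabulates the values quoted above against the experimental −2.9 kcal/mole. The scenario labels and variable names are illustrative; all numbers are those given in the text, and the pH correction is recomputed from the stated pH and assumed pKa.

```python
# pH correction for protonating P1-Glu in free BPTI (pH 8.3, assumed pKa 4.5).
PH_CORRECTION = 1.35 * (8.3 - 4.5)
EXPERIMENT = -2.9  # kcal/mol relative to P1-Gly

# Calculated P1-S1 interaction free energies (kcal/mol) for each description.
scenarios = {
    "Glu(-), charged in both states": 13.5,
    "GluH, Gln-like conformation": -3.3 + PH_CORRECTION,
    "GluH, upward conformation": -10.0 + PH_CORRECTION,
}

for label, value in scenarios.items():
    deviation = value - EXPERIMENT
    print(f"{label:32s} {value:+6.1f}  (deviation {deviation:+.1f})")

# Scenario whose estimate lies closest to the measured value.
closest = min(scenarios, key=lambda name: abs(scenarios[name] - EXPERIMENT))
print("closest to experiment:", closest)
```

Laying the scenarios side by side shows that only the protonated descriptions come near the measured affinity, and that the result depends strongly on which of the two refined conformations is used.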

P1-Asp and P1-Asn

The association measurements (Krowarsch et al. 1999) showed that P1-Asp is one of the weakest-binding variants, with a binding free energy difference of −0.9 kcal/mole relative to the reference state (P1-Gly). In contrast, P1-Asn is one of the strongest-binding noncognate variants, with a binding free energy difference of −4.3 kcal/mole. On the basis of this observation, it is clear that the binding modes of P1-Asp and P1-Asn must be different. Unfortunately, the structure of the P1-Asn complex has not yet been solved, and a model was built on the basis of the P1-Asp complex. The water structure from the P1-Asp complex was assumed to correspond to that of the P1-Asn complex, as the sizes of an aspartic acid and an asparagine are approximately the same. Table 1 presents the average interactions for P1-Asp, P1-Asn, and P1-AspH (protonated Asp), as well as the relative binding free energy differences. LIE calculations for the charged P1-Asp variant yield a destabilization of the charge at the S1 site similar to that of P1-Glu. This destabilization is reflected in the electrostatic contribution to the binding free energy, which is +12.5 kcal/mole in the case of P1-Asp. Because the nonpolar part is only −0.9 kcal/mole, the total binding free energy becomes +11.6 kcal/mole. However, with a protonated description for the P1 residue the electrostatic contribution is reduced to +1.0 kcal/mole, whereas the nonpolar contribution becomes −1.5 kcal/mole, yielding a total binding free energy difference of −0.5 kcal/mole. The protonation of the P1-Asp residue can be done at either of the two oxygens in the carboxylic group, and the first P1-AspH simulation was carried out with the proton on the oxygen closest to Asp 189. Repeating this calculation with the hydrogen attached to the other oxygen yields a slightly more unfavorable electrostatic contribution to the binding free energy (+2.6 kcal/mole). 
Note, however, that, in this case, a pH correction of ∼5.1 kcal/mole is also required, which would bring the binding free energies of the two P1-AspH simulations to +4.6 and +6.0 kcal/mole, respectively. Hence, all of the calculations for P1-Asp predict a much weaker binding than observed experimentally.

In contrast, the LIE calculations for P1-Asn show a rather strong electrostatic contribution to the binding free energy of −2.6 kcal/mole, and a nonpolar contribution of −1.0 kcal/mole, which gives a total binding free energy difference of −3.6 kcal/mole. The side chain of P1-Asn can, in principle, also be placed in two positions, with the nitrogen in the amide group up or down. Again, the natural choice was to have the oxygen pointing upward because of electrostatic considerations. The average structure from the simulations showed that the oxygen of the amide group interacts with the NH2 group of Gln 192, whereas the NH2 group of the P1-Asn forms a strong hydrogen bond to the main chain oxygen of Gly 216 (Fig. 3). However, the calculations with the nitrogen in the opposite position show an even larger gain in the electrostatic contribution to the binding free energy, which was somewhat surprising. A closer inspection of the average structure from this simulation shows that the amide group has rotated 180°, and the difference in the electrostatic contribution to the binding free energy in the two simulations of P1-Asn seems to be caused by a shorter distance in the hydrogen bond involving Gln 192. The average structures from the simulations of the P1-Asp and P1-Asn show two water molecules within hydrogen bonding distance at the same average positions in all four simulations, and these are also found in the X-ray structure. Nevertheless, the strong binding of P1-Asn can be ascribed to a stronger and more well-defined hydrogen bonding network than what seems to be the case in the P1-AspH simulations (Fig. 3).

P1-Phe, P1-Trp, and P1-Tyr

The binding of aromatic P1 variants to trypsin is characterized by a strong P1–S1 interaction, and the P1-Tyr variant shows the strongest noncognate association energy. However, as the structure of the P1-Tyr complex is not yet determined, a model was generated using the P1-Phe complex as a basis. Crystallographic water molecules at the enzyme–inhibitor interface as well as those present in the S1 pocket were included. Thus, assuming similar positions of the aromatic ring, the only unknown coordinates were those of the hydroxyl group. The calculations show that both electrostatic and nonpolar contributions favor all three aromatic P1 side chains relative to the reference state. Furthermore, all three nonpolar contributions are within 0.3 kcal/mole of each other. Thus, it is reasonable to expect that the difference in the binding free energy is caused by different electrostatic interactions. This indeed seems to be the case, as the electrostatic contribution is −1.6, −2.4, and −3.2 kcal/mole for P1-Trp, P1-Phe, and P1-Tyr, respectively. The calculated binding free energies are also in good agreement with the experimental association energies. The strong binding of P1-Tyr can be explained by a direct hydrogen bond formed between the hydroxyl group of P1-Tyr and the acidic group of Asp 189, which is shown in Figure 4.

P1-Ser and P1-Thr

The structure of the P1-Ser complex was modeled on the basis of information provided from the P1-Thr complex. Because serine and threonine are both small and differ only by a methyl group, the water-mediated hydrogen bonding network at the S1 site was assumed to be similar. Thus, the methyl group of the threonine at the P1 position was changed to a hydrogen to generate the serine. It is also interesting to note that the association measurements showed that binding of P1-Ser is 4.6 kcal/mole more favorable than P1-Gly, whereas P1-Thr is favored by only 1.6 kcal/mole. Table 1 shows that this difference in binding arises from different electrostatic energies, as the nonpolar part of the calculated binding free energy is −0.9 and −1.0 kcal/mole for P1-Thr and P1-Ser, respectively. The electrostatic contribution is −1.6 and −3.1 kcal/mole in P1-Thr and P1-Ser, yielding total binding free energy differences with respect to P1-Gly of −2.5 and −4.1 kcal/mole, respectively, in good agreement with experiments. Average structures from the simulations show that the hydroxyl group of the P1 side chain is located in a very similar position in the two complexes, but that the hydrogen bond that can be formed with a solvent molecule is more favorable in the P1-Ser case. Furthermore, the methyl group of the P1-Thr side chain is not in favorable contact with either water molecules or other protein groups (Fig. 5). In fact, as is also evident from the crystal structure of P1-Thr (Helland et al. 1999), the methyl group is located in a rather polar environment that would seem to favor the P1-Ser side chain. The difference between the average electrostatic P1-surrounding interaction energies from the P1-Ser and P1-Thr complex simulations is −1.6 kcal/mole in favor of P1-Ser (Table 1). 
This difference probably reflects both the somewhat strained hydrogen bond of the P1-Thr hydroxyl group as well as the unfavorable effect associated with insertion of a hydrophobic methyl group in the binding site, which also can be seen to displace one of the other nearby water molecules (Fig. 5).
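The decomposition for the two hydroxyl-containing variants can be reproduced with a small bookkeeping sketch. The class and field names are hypothetical; the electrostatic, nonpolar, and experimental values (kcal/mole, relative to P1-Gly) are those quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Free energy contributions relative to P1-Gly, in kcal/mol."""
    name: str
    electrostatic: float
    nonpolar: float
    experiment: float

    @property
    def total(self) -> float:
        # Calculated relative binding free energy as the sum of both parts.
        return self.electrostatic + self.nonpolar


variants = [
    Variant("P1-Thr", electrostatic=-1.6, nonpolar=-0.9, experiment=-1.6),
    Variant("P1-Ser", electrostatic=-3.1, nonpolar=-1.0, experiment=-4.6),
]

for v in variants:
    print(f"{v.name}: calc {v.total:+.1f}, exp {v.experiment:+.1f}, "
          f"error {v.total - v.experiment:+.1f}")

# How much of the Ser/Thr gap comes from each term.
thr, ser = variants
el_gap = ser.electrostatic - thr.electrostatic
np_gap = ser.nonpolar - thr.nonpolar
print(f"electrostatic gap {el_gap:+.1f}, nonpolar gap {np_gap:+.1f}")
```

Separating the two gaps makes the argument of the paragraph visible at a glance: almost the entire calculated preference for P1-Ser is electrostatic.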

P1-Lys and P1-Met

Association measurements showed that binding of BPTI with cognate P1 residues (Arg and Lys) to trypsin was 10⁵-fold stronger than that of noncognate P1 variants. The narrow trypsin specificity is caused by the negative potential at the S1 site, generated predominantly from Asp 189, which rests at the bottom of the S1 pocket. The LIE calculations reveal that the gain in electrostatic interactions upon binding of P1-Lys to trypsin corresponds to −10.8 kcal/mole. This value does not include the correction for neglected distant charges, which can be estimated with a screened Coulombic potential (ε = 80) and amounts to +0.4 kcal/mole. Including this correction and the nonpolar contribution (−1.9 kcal/mole), the total binding free energy becomes −12.3 kcal/mole, which compares well with the experimental binding free energy of −12.3 kcal/mole. It has been suggested (Helland et al. 1999; Krowarsch et al. 1999) to use the difference between the interactions of the P1-Lys side chain with the S1 site ($\Delta G_{\mathrm{Lys}} - \Delta G_{\mathrm{Gly}}$) and the interactions formed by the P1-Met side chain and the S1 site ($\Delta G_{\mathrm{Met}} - \Delta G_{\mathrm{Gly}}$) to decompose the total association energy of P1-Lys. In this way, the electrostatic and nonpolar interactions are separated, as well as the interactions that arise from secondary binding sites. This approach requires, however, that the orientations of the P1-Met and P1-Lys side chains in the bound state superimpose, which is not the case in trypsin (Helland et al. 1999). Canonical inhibitors in which the P1 residue is Met have been found to possess slightly different conformations when bound to various proteins. In the complex between rat trypsin and ecotin, a fully optimized P1-Met side chain (McGrath et al. 1994) in terms of dihedral angles was observed, whereas the P1-Met side chain was in a slightly bent conformation in the trypsin–BPTI complex. In the rat trypsin complex, the P1-Met side chain extends toward Asp 189 like the P1-Lys side chain in the trypsin–BPTI complex. 
Thus, on the basis of these observations, it was suggested that the nonpolar interactions could be more optimized in the P1-Met complex (trypsin) and that the electrostatic contribution from P1-Met would actually disfavor binding of P1-Met (Helland et al. 1999). Consequently, the electrostatic contribution from the P1-Lys side chain to the association energy would be even larger than the binding free energy difference. However, the LIE calculations show that the nonpolar contribution from the P1-Met is very similar to that of P1-Lys, −1.4 and −1.9 kcal/mole, respectively. Thus, it is correct that the electrostatic contribution from the P1-Lys side chain is larger than the actual difference in the binding free energy between P1-Lys and P1-Met. However, the P1-Met side chain has more favorable electrostatic interactions within the complex than in free BPTI, as the electrostatic contribution to the total binding free energy is −2.7 kcal/mole.
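The Lys/Met comparison can be checked numerically with the contributions quoted above. This is a sketch: the dictionary keys and variable names are illustrative, and the P1-Met total is simply the sum of the two contributions given in the text rather than a value read from Table 1.

```python
# Contributions relative to P1-Gly, in kcal/mol, as quoted in the text.
lys = {"electrostatic": -10.8, "distant_charges": 0.4, "nonpolar": -1.9}
met = {"electrostatic": -2.7, "nonpolar": -1.4}

lys_total = sum(lys.values())
met_total = sum(met.values())

# Electrostatic part of P1-Lys including the distant-charge correction.
lys_el = lys["electrostatic"] + lys["distant_charges"]

# Difference in total binding free energy between P1-Lys and P1-Met.
difference = lys_total - met_total

print(f"P1-Lys total: {lys_total:+.1f}")
print(f"P1-Met total: {met_total:+.1f}")
print(f"Lys - Met:    {difference:+.1f}")
print(f"Lys electrostatic part: {lys_el:+.1f}")
print("electrostatic part exceeds the Lys/Met difference:",
      abs(lys_el) > abs(difference))
```

The electrostatic term of P1-Lys is larger in magnitude than the Lys/Met difference because P1-Met itself gains favorable electrostatic interactions on binding, which is why the Lys/Met difference does not isolate the electrostatic contribution.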

Overall quality and error assessment

Both the magnitude of the error bars and the preservation of structures are important factors for the assessment of the quality of calculated binding free energies. As discussed above, MD simulations in general reproduce the structural properties of the different P1 variants to a high degree. The root-mean-square deviation of the average MD structures (sampled every 500 steps) from the X-ray structure is in the range 0.3–0.6 Å excluding hydrogen atoms and solvent molecules (results not shown), which confirms that the simulations maintain the experimental structures. Error assessment of the calculated binding free energies can be done in several ways. A common procedure is to divide the production phase of the simulations into two parts, and then to estimate the error as the difference in the average binding free energy for each of these two parts. This step has been taken for all calculations presented in Table 1, and the average deviation amounts to ±0.6 kcal/mole. This process provides an overall error assignment inherent in the calculated binding free energies and also confirms a high stability in the free energies. The average deviation is, however, dominated by the error in calculations involving binding of ionic side chains. Excluding the calculations involving charged P1 variants results in an average deviation of only ±0.45 kcal/mole. Thus, the error in the binding free energy calculations is somewhat larger for ionic P1 variants, which reflects that longer simulations are needed to allow the electrostatic contribution to converge. Association constants are measured with an estimated average error of ±40%, corresponding to an error of ±0.2 kcal/mole in the ΔG values. Thus, it is reasonable to expect an error of about ±0.4 kcal/mole in the experimental ΔΔG values (Krowarsch et al. 1999). Therefore, errors in both the experimental and calculated binding free energies are of the same order. 
Figure 6 presents a scatter diagram of the calculated versus the experimental binding free energies, and a correlation coefficient of 0.99 is obtained excluding the P1-Asp and P1-Glu variants. This figure also reflects the high degree of reproduction of the experimental data, which is further supported by the average discrepancy between calculated and experimental binding free energies of only 0.38 kcal/mole again excluding P1-Asp and P1-Glu.
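The error measures described above can be sketched as follows. The half-split estimate is demonstrated on a short made-up series (not simulation data from the study). The agreement statistics are evaluated only on the calculated/experimental pairs quoted in the running text, which are a subset of Table 1, so the resulting numbers are not expected to reproduce the reported values exactly. All function and variable names are illustrative.

```python
import math
from statistics import mean


def half_split_error(series):
    """Absolute difference between the averages of the two halves of a series."""
    half = len(series) // 2
    return abs(mean(series[:half]) - mean(series[half:]))


def mean_abs_error(calc, exp):
    """Mean absolute deviation between calculated and experimental values."""
    return mean(abs(c - e) for c, e in zip(calc, exp))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up free energy estimates from successive blocks of a production run.
toy_series = [-3.4, -3.6, -3.5, -3.7]
toy_error = half_split_error(toy_series)

# (calculated, experimental) pairs quoted in the text, kcal/mol vs. P1-Gly.
quoted = {
    "His (neutral)": (-3.6, -3.6),
    "Gln": (-3.3, -3.0),
    "Asn": (-3.6, -4.3),
    "Thr": (-2.5, -1.6),
    "Ser": (-4.1, -4.6),
    "Lys": (-12.3, -12.3),
}
calc = [pair[0] for pair in quoted.values()]
exp = [pair[1] for pair in quoted.values()]

mae = mean_abs_error(calc, exp)
r = pearson_r(calc, exp)
print(f"half-split error (toy series): {toy_error:.2f} kcal/mol")
print(f"mean absolute error (quoted subset): {mae:.2f} kcal/mol")
print(f"correlation coefficient (quoted subset): {r:.3f}")
```

Note that a correlation coefficient computed over a set containing one strongly binding variant such as P1-Lys is dominated by that point, so the mean absolute error is the more informative measure for the noncognate variants.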


Binding of BPTI P1 variants to trypsin has been investigated by use of molecular dynamics simulations combined with the linear interaction energy method. The LIE calculations yield a quantitative estimation of the energetic contribution from the interaction between the P1 side chain in BPTI and the S1 site of trypsin within the assumption that the secondary interactions are equal regardless of which amino acid is present at the P1 position.

Analysis of binding of ionizable P1 residues to trypsin requires knowledge of the protonation state of the respective residue. Crystallographic methods for solving structures of proteins are, at present, incapable of resolving the question of the protonation state of ionic species, as the resolution limits the accuracy of particularly hydrogen atom positions. However, crystallographic analysis combined with theoretical calculations presents a strategy for evaluation of specific interactions as well as energetic questions. The calculations presented here include four ionizable P1 variants, His, Glu, Asp, and Lys, and as expected, the LIE calculations predict that the accommodation of Glu and Asp side chains at the S1 site of trypsin requires that the respective side chain becomes protonated upon binding. The acidic side chains are not only opposed to another negative charge in the complex, but they also become embedded in an interior environment. From a thermodynamic viewpoint, transfer of charges from high dielectric medium (e.g., water) to a medium of lower dielectric constant (e.g., protein interior) is unfavorable. Thus, it is reasonable to expect a drastic increase in the pKa value for the acidic P1 variants. In a similar system, SGPB in complex with OMTKY3 (Qasim et al. 1995), huge pKa shifts were found for binding of glutamic and aspartic P1 variants, 4.3 and 4.9 units, respectively. The LIE calculations show that the electrostatic destabilization of the negative charge is somewhat larger for P1-Glu compared with P1-Asp, which is reflected in the more positive ΔΔVel in the P1-Glu. X-ray crystallographic analysis showed that the P1-Glu side chain was penetrating deeper into the S1 pocket than the P1-Asp side chain, which is also evident from the simulations. Hence, it is reasonable to expect a somewhat larger increase in the pKa of P1-Glu compared with P1-Asp.

It turns out, however, that the energetics of P1-Asp and P1-Glu are more problematic than those of the other P1 variants. The calculations clearly indicate that there is an unfavorable binding energy for the negatively charged forms, at least without accompanying counter ions in the binding site. However, the predicted binding affinity at pH 8.3 is still off by about 5 kcal/mole for the protonated P1-AspH, whereas the result for the most favorable conformation of P1-GluH is within 2.0 kcal/mole of the experimental value. It would, of course, have been useful here to have measurements available also at lower pH for comparison. There are also other factors that may complicate binding estimates for the P1-Asp and P1-Glu variants. For example, it has been suggested that one of the solvent molecules in the S1 site is actually a counter ion (Helland et al. 1999). Furthermore, if one proton is involved in the P1–S1 interaction, there are a number of different possibilities regarding the hydrogen bonding pattern and which oxygen is protonated (on the P1 residue or Asp 189). These issues have been examined here in a preliminary way but need to be studied further.

It is also shown that trypsin is capable of accommodating a positively charged P1-His at the S1 site, which is in contrast to binding of OMTKY3 P1-His to SGPB. The OMTKY3 P1-His residue is neutral when bound to SGPB at pH 8.3, as the pKa is changed to 4.3 when bound (Qasim et al. 1995). The difference between the S1 site of trypsin and SGPB is predominantly the lack of the Asp 189 in the latter. However, the present LIE calculations involving binding of neutral and charged P1-His predict a rather similar P1–S1 interaction free energy at pH 8.3. The results in Table 1 would suggest a preference for binding the protonated form of P1-His but, because the pH correction is sensitive to the exact pKa of P1-His in free BPTI, this is somewhat uncertain.
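The sensitivity of the pH correction to the assumed pKa can be illustrated with a minimal Python sketch of the relation given in footnote b of Table 1, ΔΔG = 1.35 (pH − pKa), at pH 8.3. The function name and the three pKa values below are hypothetical and serve only to show the trend; they are not values used in the calculations.

```python
KCAL_PER_PKA_UNIT = 1.35  # factor from footnote b of Table 1 (kcal/mole per pH unit)
EXPERIMENTAL_PH = 8.3     # pH quoted in the text


def protonation_correction(pka, ph=EXPERIMENTAL_PH):
    """Free energy (kcal/mole) required to protonate the P1 residue in unbound BPTI."""
    return KCAL_PER_PKA_UNIT * (ph - pka)


# Hypothetical pKa values for P1-His in free BPTI, chosen only to show the sensitivity.
for pka in (6.0, 6.5, 7.0):
    print(f"pKa = {pka:.1f}: correction = {protonation_correction(pka):.2f} kcal/mole")
```

Under this relation, each half unit of uncertainty in the pKa shifts the correction by about 0.68 kcal/mole, which is why the preference for the protonated form is stated with some caution.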

The binding of OMTKY3 P1 variants to SGPB has also been investigated by computational procedures (Fujinaga et al. 1998). In that study, continuum electrostatic calculations were used to evaluate the electrostatic contribution, whereas an accessible surface area-dependent term handled the nonpolar part of the binding free energy. This strategy also involves a multivariate fitting procedure in which the protein dielectric, atomic solvation parameters, and a constant term were fitted to the observed binding free energies. The parameters obtained for OMTKY3 in complex with SGPB were subsequently used to estimate the binding free energies for OMTKY3 in complex with both chymotrypsin and leukocyte elastase, but, in these cases, the binding energies were poorly predicted. Therefore, the parameters obtained on the basis of calculations for OMTKY3 and SGPB were not transferable. The experimental binding data that are available for BPTI with bovine chymotrypsin and trypsin (Krowarsch et al. 1999) and for OMTKY3 with both chymotrypsin and porcine pancreatic elastase (Lu et al. 1997) have also been used to validate the scoring of amino acid side chain conformations accommodated at the respective S1 sites, using an Ace-P2-P1-P1′-Nme peptide to represent the inhibitor (Lamb et al. 2001). These calculations were, on an overall basis, not capable of reproducing the experimental P1–S1 interaction energies, but the P1–S1 interaction free energies of small hydrophobic side chains were reasonably well reproduced. The main reason for this result is probably the lack of crystallographic water molecules and the exclusion of protein flexibility. On the other hand, the docking strategy is several orders of magnitude faster than the present LIE calculations, and inclusion of explicit water molecules and protein flexibility would add an unacceptable computational expense. The present LIE calculations do, however, clearly demonstrate that to obtain high correlation with experimental data, both protein flexibility and water molecules must be treated explicitly.

The present successful reproduction of relative binding energies in the context of protein–protein recognition holds considerable promise for applying this type of methodology as an aid in rational design and redesign of biologically active macromolecules. For example, attempts to increase or change the specificity of enzymes by site-directed mutagenesis can be made much more efficient by pretesting and screening a large number of computational mutants by molecular dynamics simulations prior to the experimental design. However, the present study also shows that the success of the method depends to a high degree on the accuracy of the three-dimensional model. For example, omission of water molecules at the protein–protein interface, a wrong protonation state, or wrong side chain rotamers led to incorrect values for the relative association energies. In the case of the trypsin–BPTI system, the large number of available crystal structures allowed for such corrections in a relatively straightforward manner. Therefore, accurate template structures of the protein–protein complex to be studied are probably of crucial importance. Nevertheless, it appears that the new strategy taken here, to use the LIE method for analyzing the relative binding energetics for a set of mutants, provides a useful route for addressing protein–protein interactions in a quantitative way by computer simulation.

Materials and methods

Computational details

All calculations were carried out with the molecular dynamics program Q (Marelius et al. 1998c) using the Amber95 force field (Cornell et al. 1995). X-ray structures (Helland et al. 1999) of trypsin in complex with BPTI were used as starting structures. These structures were superimposed and the Cα atom of the P1 residue was defined as the center of a 16 Å sphere within which unrestrained molecular dynamics simulations were carried out. Atoms outside this sphere were strongly restrained to their crystallographic positions, and nonbonded interactions across the boundary were excluded. A nonbonded cutoff of 10 Å was used in combination with a multipole expansion for long-range electrostatics (Lee and Warshel 1992). Nonbonded interactions involving the BPTI P1 residue were not truncated, so this residue was allowed to interact with the entire system. The net charge within the 16 Å sphere was +3, excluding the P1 residue, in all calculations, and distant charges were described using a neutral charge set. This step was taken to avoid problems related to the electrostatic contributions to the calculated binding free energy for ionic species (Åqvist 1996). Because the calculated binding free energy for ionic species then lacks the contribution from distant charges, a correction term was added using a screened Coulombic potential with ε = 80. Crystallographic water molecules closer than 12 Å from the reaction center were retained, and additional solvent molecules were added to fill the 16 Å sphere. Water molecules were described using the TIP3P model (Jorgensen et al. 1983). All structures were heated using a stepwise scheme followed by an equilibration period of 15 ps prior to the simulations. During the heating phase, heavy atoms (protein and crystallographic water) were restrained to their initial positions using a harmonic potential with a force constant of 5 kcal/(mole × Å²). The timestep used in the production phase of the simulations was 1.5 fs, and the temperature was maintained at 300 K using weak coupling to an external bath. SHAKE (Ryckaert et al. 1977) was used to constrain bonds and angles on solvent molecules. The total simulation time for the production period was 225 ps for all simulations, but some of the structures needed a longer equilibration period. Energies were sampled every fifth step in the production part of the simulations.
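The distant-charge correction mentioned above can be sketched as a sum of screened Coulomb terms between the charge on the P1 residue and each charged group that was neutralized in the simulation. The Python sketch below is only an illustration under stated assumptions: the function name, the point-charge representation, and the example charges and coordinates are hypothetical, and the exact bookkeeping used in the actual calculations is not specified in the text. The only values taken from the text are ε = 80 and the units; 332.06 is the standard conversion factor from e²/Å to kcal/mole.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mole*e^2), converts e^2/Angstrom to kcal/mole
EPSILON = 80.0             # dielectric constant quoted in the text


def screened_coulomb_correction(ligand_charge, ligand_pos, distant_charges):
    """Sum screened Coulomb energies (kcal/mole) between one ligand charge and
    a list of (charge, (x, y, z)) entries for neutralized distant groups."""
    total = 0.0
    for charge, pos in distant_charges:
        r = math.dist(ligand_pos, pos)  # distance in Angstrom
        total += COULOMB_CONSTANT * ligand_charge * charge / (EPSILON * r)
    return total


# Hypothetical example: a +1 charge on P1 and two distant unit charges.
example_groups = [(-1.0, (20.0, 0.0, 0.0)), (1.0, (0.0, 25.0, 0.0))]
correction = screened_coulomb_correction(1.0, (0.0, 0.0, 0.0), example_groups)
print(f"Distant-charge correction: {correction:.4f} kcal/mole")
```

Because of the high dielectric constant, each distant unit charge contributes only a few tenths of a kcal/mole at these distances, so the correction is small relative to the direct interaction energies.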

The LIE method

The linear interaction energy method (Åqvist et al. 1994) was originally designed for calculations of absolute free energies of binding. The main idea is to consider contributions from polar and nonpolar interactions to the total binding free energy separately. The electrostatic linear response approximation is used to estimate the polar part, whereas the nonpolar contribution is calculated using an empirical relationship calibrated against a set of experimental binding data. Using the energy data from two MD simulations, one of the associated complex and one of the ligand free in solution, the binding affinity is estimated by

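$$\Delta G_{\mathrm{bind}} = \alpha\left(\langle V_{l-s}^{\mathrm{vdW}}\rangle_{\mathrm{bound}} - \langle V_{l-s}^{\mathrm{vdW}}\rangle_{\mathrm{free}}\right) + \beta\left(\langle V_{l-s}^{\mathrm{el}}\rangle_{\mathrm{bound}} - \langle V_{l-s}^{\mathrm{el}}\rangle_{\mathrm{free}}\right) \tag{1}$$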

where the van der Waals and electrostatic components of the ligand-surrounding interaction energy V_{l-s} are denoted by the superscripts vdW and el, respectively. The terms 〈〉bound and 〈〉free are average energies from simulations of the solvated protein complex and of the free ligand in water, respectively. In this equation, α and β are the scaling factors for the nonpolar and electrostatic contributions to the free energy. Here, we use the earlier parametrization of the method with α = 0.18, whereas β is assigned one of four fixed values depending on the chemical nature of the P1 residue (Hansson et al. 1998; Marelius et al. 1998b). Although it might be expected that the LIE method would have to be reparametrized for the Amber force field, the results of the present calculations suggest that this is not necessarily the case. It has also been shown by Wang et al. (1999b) that the present parametrization works well for the trypsin–benzamidine case. A general point with the LIE method is that no simulation needs to be carried out for the empty receptor protein, and this issue has been discussed previously (Marelius et al. 1998a). That is, the linear response approximation in principle takes into account relaxation of the free receptor implicitly, and it has been shown to work, for example, for such cases as ion binding to crown ethers, where significant reorganization and desolvation of the receptor site take place upon binding (Marelius et al. 1998a). It is thus fundamentally different from approaches that address all energy terms contributing to binding separately.
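As a minimal illustration of how the LIE estimate is assembled from the two simulations, the following Python sketch averages sampled ligand-surrounding energies and applies the scaling factors. It is a sketch, not the implementation in Q: the function name and the energy samples are hypothetical, and the value β = 0.5 is used purely as an example, since the β actually assigned depends on the chemical nature of the P1 residue.

```python
from statistics import fmean

ALPHA = 0.18  # nonpolar scaling factor quoted in the text


def lie_binding_free_energy(vdw_bound, vdw_free, el_bound, el_free, beta, alpha=ALPHA):
    """Return the LIE estimate alpha*d<V_vdW> + beta*d<V_el> in kcal/mole.

    Each of the first four arguments is a sequence of sampled ligand-surrounding
    interaction energies (kcal/mole) from the bound or the free simulation.
    """
    d_vdw = fmean(vdw_bound) - fmean(vdw_free)  # bound minus free, nonpolar part
    d_el = fmean(el_bound) - fmean(el_free)     # bound minus free, electrostatic part
    return alpha * d_vdw + beta * d_el


# Hypothetical energy samples, invented purely for illustration.
vdw_bound = [-12.0, -11.0, -13.0]
vdw_free = [-7.0, -6.0, -8.0]
el_bound = [-30.0, -32.0, -31.0]
el_free = [-28.0, -29.0, -27.0]

dg = lie_binding_free_energy(vdw_bound, vdw_free, el_bound, el_free, beta=0.5)
print(f"Estimated LIE binding free energy: {dg:.2f} kcal/mole")
```

Only the two end-point simulations enter the estimate, which reflects the point made above that no simulation of the empty receptor is required.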

Because of the large size of a protein inhibitor such as BPTI, it is not possible to calculate the absolute binding free energy of the trypsin–BPTI complex, as this would require that one obtains convergent values for the entire interaction (polar and nonpolar) between the inhibitor and its surroundings. These energies are on the order of several thousand kcal/mole and, therefore, would require extremely long simulations to get stable averages. However, by measuring just the interactions of the P1 residue in equation 1, we examine whether the LIE approach can be used to estimate the contributions of this particular residue to the overall binding affinity of the complex. That is, we attempt to use the method for obtaining the relative binding energies of BPTI variants differing only in the P1 position.

Table 1. Calculated relative binding free energies and average interaction energies (kcal/mole) between the P1 variant of BPTI and its surroundings in complex with trypsin and in free BPTI

a Experimental binding free energies from Krowarsch et al. (1999).

b Corrected by the free energy required to protonate the P1 residue in unbound BPTI using ΔΔG_bind^pKa = 1.35 (pH − pKa), where pH is 8.3 and pKa is that of the free amino acid.

c Denotes switched positions of oxygen and nitrogen in P1-Asn, and the hydrogen attached to the second oxygen in P1-Asp.

d Corrected for neglected distant charges using a Coulombic potential with ε = 80.

e Second conformation of P1-Glu.

Fig. 1.

Schematic representation of the thermodynamic perturbation cycle. (Horizontal lines) Experimental binding free energy; (vertical lines) calculated mutation free energies in FEP simulations; (E) enzyme; (I) inhibitor; (I′) modified inhibitor. The difference between the experimental binding free energies equals the difference between the calculated mutation free energies: ΔG2 − ΔG1 = ΔG3 − ΔG4.

Fig. 2.

Average structures calculated from the MD simulations of trypsin–BPTI complexes with the primary binding residue (A) P1-His (charged form) and (B) P1-His (neutral form). Water molecules are shown in green. Overlapping atoms indicate mobile atoms, as the average structures were not energy minimized.

Fig. 3.

Average structures calculated from the MD simulations of trypsin–BPTI complexes with the primary binding residue (A) P1-Asn and (B) P1-AspH.

Fig. 4.

Average structures calculated from the MD simulations of trypsin–BPTI complexes with primary binding residue (A) P1-Tyr and (B) P1-Phe. Water molecules are shown as closed circles.

Fig. 5.

Average structures calculated from the MD simulations of trypsin–BPTI complexes with primary binding residue (A) P1-Thr and (B) P1-Ser.

Fig. 6.

Scatter diagram of the calculated versus the experimental binding free energies (kcal/mole) excluding P1-Asp and P1-Glu.


B.O.B. and A.O.S. gratefully acknowledge financial support from the Norwegian Research Council. B.O.B. appreciates financial support from NorFa. J.Å. acknowledges support from the Swedish Research Council.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.