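The change in binding affinity caused by a mutation, $\Delta\Delta G_{mut}$, is calculated as the difference between the binding free energies, $\Delta G_{bind}$, of the mutant and the wild type:
$$\Delta\Delta G_{mut} = \Delta G_{bind}(\mathrm{mutant}) - \Delta G_{bind}(\mathrm{wild\ type})$$
where $\Delta G_{bind} = G_{complex} - G_{unbound}$. Negative values of $\Delta\Delta G_{mut}$ correspond to a stabilizing effect of the mutation, and positive values to a destabilizing one. Following a number of recently published structure-based models [3, 4, 13], the energies of the complex and unbound states, $G_{complex}$ and $G_{unbound}$, are each approximated by a sum of a few interaction energy terms:
$$G(\mathrm{pH}) = a\,E_{vdw} + b\,\Delta G_{elec}(\mathrm{pH}) - c\,T S_{SC}$$
where a, b, and c are empirical weighting parameters, $E_{vdw}$ is the van der Waals energy, $-TS_{SC}$ is the entropy term for the cost of reduced side-chain flexibility, and $\Delta G_{elec}(\mathrm{pH})$ is the pH-dependent electrostatic interaction energy, for which the protein ionization characteristics are calculated using the same method as in our previous work [17].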
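As a minimal sketch of how the mutation energy follows from the weighted energy terms described above, the Python snippet below combines per-state terms into state energies, binding energies, and the mutation energy. The class and function names, the default weights of 1.0, and all numerical inputs are illustrative assumptions and are not taken from the published protocol; the entropy contribution is passed in already expressed as an energy.

```python
from typing import NamedTuple


class StateTerms(NamedTuple):
    """Energy terms (kcal/mol) of one state; a hypothetical container."""
    e_vdw: float         # van der Waals energy
    dg_elec: float       # electrostatic term at the pH of interest
    entropy_term: float  # side-chain entropy contribution, already as an energy


def state_energy(t, a=1.0, b=1.0, c=1.0):
    """Weighted sum of the terms; a, b, c are placeholder weights (assumed)."""
    return a * t.e_vdw + b * t.dg_elec + c * t.entropy_term


def binding_energy(complex_state, unbound_state, **weights):
    """Energy of the complex minus energy of the unbound state."""
    return state_energy(complex_state, **weights) - state_energy(unbound_state, **weights)


def mutation_energy(mut_complex, mut_unbound, wt_complex, wt_unbound, **weights):
    """Binding energy of the mutant minus binding energy of the wild type."""
    return (binding_energy(mut_complex, mut_unbound, **weights)
            - binding_energy(wt_complex, wt_unbound, **weights))


# Made-up numbers, for illustration only.
wt_c, wt_u = StateTerms(-100.0, -50.0, 10.0), StateTerms(-90.0, -45.0, 8.0)
mu_c, mu_u = StateTerms(-102.0, -51.0, 10.5), StateTerms(-90.5, -45.0, 8.0)
ddg = mutation_energy(mu_c, mu_u, wt_c, wt_u)
print(f"ddG_mut = {ddg:+.2f} kcal/mol")
```

With these invented inputs the result is negative, which under the sign convention above would be read as a stabilizing mutation.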
The free energy of binding between two molecular partners A and B is related to the equilibrium association constant $K_a$ of the reaction A + B ⇌ AB by $\Delta G_{bind} = -RT \ln K_a$.
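The relation between the binding free energy and the association constant can be checked numerically. The sketch below assumes a 1 M standard state, a default temperature of 298.15 K, and the gas constant expressed in kcal/(mol K); the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def dg_from_ka(ka, temperature=298.15):
    """Binding free energy (kcal/mol) from the association constant (1/M)."""
    return -R_KCAL * temperature * math.log(ka)


def ka_from_dg(dg, temperature=298.15):
    """Inverse relation: association constant from the binding free energy."""
    return math.exp(-dg / (R_KCAL * temperature))


# Example: a nanomolar binder (Ka = 1e9 1/M).
print(f"dG_bind = {dg_from_ka(1.0e9):.2f} kcal/mol")
```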
In most existing physics-based approaches (e.g., the MM/FDPB [1] or LIE [2] methods), the free energy terms needed to calculate $\Delta G_{bind}$ are evaluated in three separate calculations:
$$\Delta G_{bind} = G_{AB} - G_{A} - G_{B}$$
where the energy terms for the unbound partners A and B are calculated separately because of technical constraints imposed by the linear dimension of the FDPB grid or by the size of the water box (LIE).
Taking advantage of the pairwise Generalized Born approximation [18] in the CHARMM GBIM method [19, 20], the calculations are reduced to two sets:
$$\Delta G_{bind} = G_{complex} - G_{unbound}$$
where the unbound state is modeled simply by separating the binding partners by a large distance (e.g., 500 Å).
To illustrate the general difficulties in calculating the binding free energy of a protein complex, let us assume, as an approximation, that the possible structures of the binding partners in the bound and unbound states are represented by sets of discrete conformations $R_j$. Because changes in electrostatic interactions can be a key contributor to the binding free energy, and depend in turn on the ionization of acidic and basic groups, the titratable residues are represented by their possible protonation states, $X_i$ [21]. The free energies of the bound and unbound states can then be derived from the corresponding partition sums over all microstates:
$$G = -RT \ln \sum_{i=1}^{N_P} \sum_{j=1}^{N_C} \exp\left[-\frac{G(X_i, R_j)}{RT}\right]$$
where $N_P = 2^{N_s}$ is the number of possible protonation states of $N_s$ titratable groups, and $N_C$ is the number of all possible conformations. Because the total number of states arising from the multiple conformations and protonation states is huge, computational approaches make various approximations that completely or partially omit the treatment of multiple titration states or of protein conformational flexibility. Based on the level of approximation in the treatment of this combinatorial problem, we classify all methods that are used, or could be used, to model the effects of mutations into four general classes:
- (1) $N_P = 1$, $N_C = 1$: This approximation uses a single protonation state and a single conformation as input for the energy calculations. Some well-known programs, such as Robetta [3] and FoldEF [22], belong to this class; they calculate the binding free energy as a combination of forcefield energies and additional empirical energy terms. An earlier Discovery Studio [16] protocol also belongs to this class and is described below.
- (2) $N_P = 1$, $N_C > 1$: The second approximation also neglects multiple protonation states but models proteins as flexible structures. It is used in the most rigorous and computationally expensive approaches, such as free energy perturbation, methods that use ensemble averages over MD trajectories (e.g., MM/PBSA [1] and LIE [2]), and conformational sampling algorithms (CC/PBSA [4]).
- (3) $N_P \gg 1$, $N_C = 1$: As with the first class, this level of approximation neglects conformational flexibility, but it takes into account the equilibria of proton binding. The method presented here belongs to this class.
- (4) $N_P \gg 1$, $N_C \gg 1$: This approach considers both the multiple protonation states and the conformational flexibility of the protein. It should be the ultimate goal of future development.
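The combinatorial growth behind this classification, and the partition sum over microstates, can be illustrated with a toy enumeration. The sketch below is only feasible for very small systems; the function names are illustrative, and the microstate energies passed to it would have to come from an energy model that is not part of the sketch.

```python
import itertools
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def count_protonation_states(n_sites):
    """Number of protonation states of n_sites two-state titratable groups."""
    return 2 ** n_sites


def enumerate_microstates(n_sites, n_conformations):
    """All (protonation vector, conformation index) pairs; tiny systems only."""
    protonation = itertools.product((0, 1), repeat=n_sites)
    return [(x, j) for x in protonation for j in range(n_conformations)]


def free_energy_from_microstates(energies, temperature=298.15):
    """G = -RT ln sum_k exp(-G_k / RT), shifted by the minimum for stability."""
    rt = R_KCAL * temperature
    g_min = min(energies)
    z = sum(math.exp(-(g - g_min) / rt) for g in energies)
    return g_min - rt * math.log(z)


for n in (5, 20, 100):
    print(n, "sites ->", count_protonation_states(n), "protonation states")
```

Even before conformations are considered, the count for a hundred titratable sites is far beyond what can be enumerated, which is why each class above drops or approximates part of the sum.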
Regarding the third and fourth approaches, we were unable to find any computational method in the literature that rigorously reports pH-dependent mutation energy terms. Instead, to improve the predictions when the standard ionization model fails, some authors [2, 4] model the titratable residues in their neutral charge state and add a simplified correction term of 1.36|pKa − pH| kcal/mol to the calculated free energy. However, this approach completely ignores the cooperativity of proton binding, which in many cases is critical for the titration properties of complex systems with multiple ionization sites, such as proteins. It is not applicable at all pH values, and it needs other calculations or assumptions to estimate the ionization properties in the bound and unbound states. In an attempt to fill this gap, we developed a new computational protocol that automatically calculates $\Delta\Delta G_{mut}$ as a function of pH, described below as model MPH. For this purpose, we combined two existing computational components in Discovery Studio. The first component, developed in the “traditional” approximation using a single protonation state (model M0), performs a fast calculation of $\Delta\Delta G_{mut}$. The second component was developed earlier to predict the proton binding equilibria at a given pH and is used by another Discovery Studio protocol, “Calculate Protein Ionization and Residue pK” [17].
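For orientation, the factor 1.36 in the simplified correction term mentioned above is numerically RT ln(10) near room temperature. The sketch below evaluates that single-site correction; the function name and the default temperature of 298.15 K are assumptions, and the snippet deliberately reproduces only the simple term, not the cooperative treatment developed in this work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def neutral_state_correction(pka, ph, temperature=298.15):
    """RT * ln(10) * |pKa - pH| in kcal/mol for one isolated titratable site."""
    return R_KCAL * temperature * math.log(10.0) * abs(pka - ph)


# One pH unit away from the pKa gives the prefactor itself.
print(f"{neutral_state_correction(4.0, 5.0):.2f} kcal/mol per pH unit")
```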
The free energy terms are approximated by the sum of a van der Waals term, $E_{vdw}$, and an electrostatic term, $\Delta G_{elec}$, which represents the polar contribution of both the intramolecular and the protein–solvent interactions, as described below. Two additional energy terms are added: a solvent-dependent term, $\Delta G_{np}$, for the non-polar contribution to the solvation energy, and an empirical entropy term, $-TS_{SC}$, to account for changes in side-chain flexibility:
$$G = a\,E_{vdw} + b\,\Delta G_{elec} - c\,T S_{SC} + d\,\Delta G_{np}$$
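Similar to several existing methods [2–4], the empirical weighting coefficients a, b, c, and d are introduced to improve the fit to experimental data. All energy terms are calculated using CHARMm [23], and the method is implemented as program modules written in the CHARMm scripting language. $E_{vdw}$, $\Delta G_{elec}$, and $\Delta G_{np}$ are standard CHARMm energy terms calculated using the Momany and Rone forcefield [24]. The GBIM CHARMm module [20] is used to calculate the electrostatic term, which extends the applicability of the method to membrane proteins. The total electrostatic contribution, $\Delta G_{elec}$, is calculated from the pairwise interatomic distances $r_{ij}$ as:
$$\Delta G_{elec} = \frac{1}{2}\sum_{i \ne j}\frac{q_i q_j}{\epsilon_m r_{ij}} - \frac{1}{2}\left(\frac{1}{\epsilon_m} - \frac{1}{\epsilon_{slv}}\right)\sum_{i,j}\frac{q_i q_j}{\sqrt{r_{ij}^2 + \alpha_i \alpha_j \exp\left(-r_{ij}^2 / 4\alpha_i\alpha_j\right)}}$$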
where $q_i$ are the atomic charges, $\alpha_i$ are the effective Born radii, and $\epsilon_m$ and $\epsilon_{slv}$ are the dielectric constants of the molecule and the solvent, respectively. The contribution of the side-chain entropy, $S_{SC}$, is approximated as:
$$S_{SC} = \sum_i f(sa_i)\,\sigma_i$$
where $sa_i$ is the percentage side-chain solvent accessibility of residue i in the folded state, $\sigma_i$ is the average entropic cost of burying the residue in a folded structure, and the summation is taken over all amino-acid residues. The $\sigma_i$ values can be taken from one of the available empirical entropy scales [25, 26]; the results in this study were obtained using the data from Ref. 26. Instead of a linear dependence on $sa_i$ [25], we suggest a sigmoid function $f(sa_i)$ that ranges from 0 for entirely exposed residues to 1 for entirely buried residues.
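The two M0 terms described above, the Generalized Born electrostatics and the accessibility-weighted side-chain entropy, can be sketched in a few lines of Python. This is not the CHARMm GBIM implementation: it assumes the standard pairwise Still-type Generalized Born expression, adds a Coulomb conversion constant to obtain kcal/mol, and uses hypothetical default dielectric constants. The logistic form of the sigmoid, with its midpoint and steepness, is likewise an assumption chosen only to satisfy the stated limits (about 1 for buried and about 0 for exposed residues).

```python
import math

COULOMB = 332.0637  # kcal*Angstrom/(mol*e^2); unit conversion (assumed here)


def gb_electrostatics(coords, charges, born_radii, eps_m=1.0, eps_slv=80.0):
    """Coulomb plus pairwise Generalized Born polarization energy (kcal/mol)."""
    n = len(charges)
    coulomb = 0.0
    polarization = 0.0
    for i in range(n):
        for j in range(n):
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            aa = born_radii[i] * born_radii[j]
            f_gb = math.sqrt(r2 + aa * math.exp(-r2 / (4.0 * aa)))
            polarization += charges[i] * charges[j] / f_gb  # includes i == j
            if i < j:  # each distinct pair counted once
                coulomb += charges[i] * charges[j] / (eps_m * math.sqrt(r2))
    polarization *= -0.5 * (1.0 / eps_m - 1.0 / eps_slv)
    return COULOMB * (coulomb + polarization)


def burial_sigmoid(sa_percent, midpoint=50.0, steepness=0.2):
    """Hypothetical logistic f(sa): near 1 when buried, near 0 when exposed."""
    return 1.0 / (1.0 + math.exp(steepness * (sa_percent - midpoint)))


def side_chain_entropy_term(sa_percents, sigmas):
    """Sum over residues of sigma_i * f(sa_i)."""
    return sum(s * burial_sigmoid(sa) for sa, s in zip(sa_percents, sigmas))


# Single unit charge with a 2 Angstrom Born radius: pure self-polarization.
print(f"{gb_electrostatics([(0.0, 0.0, 0.0)], [1.0], [2.0]):.2f} kcal/mol")
```

For a single ion the expression reduces to the Born solvation energy, which gives a quick sanity check on the prefactors.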
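The main difference between the M0 model and the new, pH-dependent model is that the electrostatic term, and hence the total free energy, is calculated as a function of pH, while the other terms are calculated in exactly the same way as in M0:
$$G(\mathrm{pH}) = a\,E_{vdw} + b\,\Delta G_{elec}(\mathrm{pH}) - c\,T S_{SC} + d\,\Delta G_{np}$$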
To evaluate the pH-dependent term $\Delta G_{elec}(\mathrm{pH})$, we implemented a method based on integration over the binding isotherms [11]. Similar approaches have been used to model the pH dependence of protein stability [12, 27, 28], protein–DNA interactions [29], protein–ligand interactions [17], and the cooperativity of ion binding [15]. However, no computational tool has been reported that calculates the full pH dependence of the energy differences resulting from mutations of amino-acid residues.
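In the MPH model, the electrostatic contribution is calculated as:
$$\Delta G_{elec}(\mathrm{pH}) = \Delta G_{elec}^{deprot} + RT \ln 10 \int_{\infty}^{\mathrm{pH}} Q(\mathrm{pH}')\, d\mathrm{pH}'$$
where Q(pH) can be either the average number of bound protons or the total charge. The electrostatic free energy is conveniently referenced to the energy of the completely deprotonated state, $\Delta G_{elec}^{deprot}$ [12]. The model of the deprotonated state used to calculate $\Delta G_{elec}^{deprot}$ is constructed by assigning the corresponding partial charges to the atoms of all titratable groups. Q(pH) is derived from the fractional protonation $\theta_i(\mathrm{pH})$ of the titratable residues:
$$Q(\mathrm{pH}) = \sum_{i=1}^{N_s} \theta_i(\mathrm{pH})$$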
The calculations of $\theta_i(\mathrm{pH})$, as well as of all other pH-related properties of the wild-type and mutant structures, are carried out as described for a previous Discovery Studio method to calculate protein ionization [17]. That method is based on GBIM CHARMm calculations combined with the iterative mobile clustering (IMC) approach [30] to treat the combinatorial problem of multiple protonation states. Another difference from the M0 method is that the CHARMm GBIM module is used only to calculate the effective Born radii; the electrostatic energy terms used in the calculations of $\Delta G_{elec}(\mathrm{pH})$ and $\theta_i(\mathrm{pH})$ are computed by a separate C++ program that extends the method by including the effect of the ionic strength I [31].
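The idea of integrating over a binding isotherm can be illustrated with a deliberately simplified model. The sketch below assumes independent titration sites that follow the Henderson-Hasselbalch equation, so it ignores the cooperativity that the IMC treatment accounts for. The reference pH of 30 standing in for the fully deprotonated state, the trapezoid rule, and all function names are assumptions made for illustration. For independent sites the integral has a closed form, which is used as a check.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)
LN10 = math.log(10.0)


def fractional_protonation(ph, pka):
    """Henderson-Hasselbalch occupancy of one independent site (assumed model)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def bound_protons(ph, pkas):
    """Q(pH): average number of bound protons, summed over all sites."""
    return sum(fractional_protonation(ph, pka) for pka in pkas)


def delta_g_elec(ph, pkas, g_deprot=0.0, ph_ref=30.0, temperature=298.15,
                 n_steps=6000):
    """g_deprot + RT ln(10) * integral of Q from ph_ref to ph (trapezoid rule)."""
    rt = R_KCAL * temperature
    h = (ph - ph_ref) / n_steps  # negative when ph is below the reference
    total = 0.5 * (bound_protons(ph_ref, pkas) + bound_protons(ph, pkas))
    for k in range(1, n_steps):
        total += bound_protons(ph_ref + k * h, pkas)
    return g_deprot + LN10 * rt * total * h


def analytic_independent_sites(ph, pkas, temperature=298.15):
    """Closed form for independent sites: -RT * sum ln(1 + 10^(pKa - pH))."""
    rt = R_KCAL * temperature
    return -rt * sum(math.log(1.0 + 10.0 ** (pka - ph)) for pka in pkas)


# pH-dependent electrostatic contribution to binding for made-up pKa shifts.
bound_pkas, unbound_pkas = [4.0, 7.2], [4.0, 6.5]
for ph in (5.0, 6.0, 7.4):
    ddg = delta_g_elec(ph, bound_pkas) - delta_g_elec(ph, unbound_pkas)
    print(f"pH {ph}: {ddg:+.3f} kcal/mol")
```

In this toy example the only difference between the bound and unbound states is an upward pKa shift of one site, so the binding contribution is more favorable at lower pH.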
In addition to the mutation energy terms, the MPH-based protocol reports the predicted pKa values and the fractional protonation of the titratable residues for the wild type and the mutants in both the bound and unbound states, the mutation energy at the specified pH, the corresponding titration curves, and the pH-dependent electrostatic contribution to the binding free energy.
Modeling of the mutant structures
Both the M0 and MPH methods use a module written as a CHARMm script to generate and optimize the structures of the mutants. The construction of a mutant structure includes a sampling algorithm, similar to ChiRotor [32], that searches for the optimal side-chain conformation of the mutated residue with the backbone held fixed.
The method is implemented for both the CHARMm and CHARMm Polar H (hydrogens) forcefields [24]. The results shown in this study were obtained using CHARMm Polar H.
The methods use a number of CHARMm scripts and C++ and Perl program modules wrapped in a single Accelrys Pipeline Pilot protocol, “Calculate Mutation Energy (Binding).” The input list of mutations is generated automatically from the list of selected residues and the amino-acid types of the substitutions. The CPU time for a single mutation scales almost linearly with the size of the protein. For a medium-size protein of about 200 residues, a mutation takes about 30 s using the M0 approximation and about 1.5 min using MPH. In addition, a coarse-grained parallelization implemented in the protocol allows automatic and easy distribution of the individual mutations over a large number of available processors and servers.
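The way a mutation list follows from the selected residues and substitution types, and how the per-mutation timings translate into wall time under coarse-grained parallelization, can be sketched as follows. This is not the Pipeline Pilot protocol: the label format, the residue tuples, and the assumption that mutations are independent jobs spread evenly over the workers are all illustrative.

```python
import itertools
import math


def build_mutation_list(selected_residues, target_types):
    """Cross selected residues with substitution types, skipping identities.

    selected_residues holds (chain, wild_type, number) tuples; the label
    format "chain:WTnumber>NEW" is a hypothetical convention.
    """
    mutations = []
    for (chain, wt, num), new in itertools.product(selected_residues, target_types):
        if new != wt:
            mutations.append(f"{chain}:{wt}{num}>{new}")
    return mutations


def estimated_wall_time(n_mutations, seconds_per_mutation, n_workers):
    """Wall time in seconds if each worker processes its share sequentially."""
    return math.ceil(n_mutations / n_workers) * seconds_per_mutation


# Made-up selection: two residues, two substitution types.
residues = [("A", "HIS", 310), ("A", "ILE", 253)]
mutations = build_mutation_list(residues, ["ALA", "HIS"])
print(mutations)
print(estimated_wall_time(100, 30, 8), "s with M0;",
      estimated_wall_time(100, 90, 8), "s with MPH")
```

The timing estimate uses the per-mutation figures quoted above for a protein of about 200 residues and ignores scheduling overhead.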
The use of the Generalized Born solvation model in combination with the IMC approach makes the calculations fast and applicable to very large systems (e.g., more than 1000 titration sites) and, unlike many grid-based methods, independent of the size of the interacting molecules. The computational protocol is also applicable to membrane environments and, besides protein–protein complexes, can be used to study the effect of mutations on the binding of ions and other compounds such as organic ligands or DNA/RNA molecules.
Homology model of human Fc and FcRn
Given that the sequences of the Fc domains of the human IgG subtypes are very similar, we chose subtype 1 as input to the homology model. For FcRn, the human sequence from PDB structure 3m17, which includes the beta-2-microglobulin domain, is used. The sequence alignment of the Fc-FcRn complex between murine (PDB code 1i1a) and human was generated using the multiple sequence alignment method in Discovery Studio 3.5. The sequence identities between the murine and human sequences are 64% for the Fc domain and 70% for the FcRn domain. Twenty homology models are created using MODELER [33] as implemented in Discovery Studio, and the model with the smallest violation of the homology restraints is selected. Further analysis of the models reveals that some of the interface residues have highly conserved side-chain conformations, whereas others are highly variable. The residues with conserved side-chain conformations mostly have identical residue types in murine and human, and those with highly variable side-chain conformations have different residue types in the two organisms. For the highly variable residues, additional refinement is carried out using the CHARMm-based side-chain optimization method ChiRotor [32]. The resulting model is then used for all the mutation energy calculations.