A compound-based computational approach for the accurate determination of hot spots
The College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
Correspondence to: Lincong Wang, The College of Computer Science and Technology, Jilin University, Changchun, Jilin, China. E-mail: firstname.lastname@example.org (or) Yongli Bao, National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun, Jilin, China. E-mail: email@example.com
National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun, Jilin, China
A plethora of both experimental and computational methods have been proposed in the past 20 years for the identification of hot spots at a protein–protein interface. The experimental determination of a protein–protein complex followed by alanine scanning mutagenesis, though able to determine hot spots with much precision, is expensive and has no guarantee of success while the accuracy of the current computational methods for hot-spot identification remains low. Here, we present a novel structure-based computational approach that accurately determines hot spots through docking into a set of proteins homologous to only one of the two interacting partners of a compound capable of disrupting the protein–protein interaction (PPI). This approach has been applied to identify the hot spots of human activin receptor type II (ActRII) critical for its binding toward Cripto-I. The subsequent experimental confirmation of the computationally identified hot spots portends a potentially accurate method for hot-spot determination in silico given a compound capable of disrupting the PPI in question. The hot spots of human ActRII first reported here may well become the focal points for the design of small molecule drugs that target the PPI. The determination of their interface may have significant biological implications in that it suggests that Cripto-I plays an important role in both activin and nodal signal pathways.
The delicate balance between the binding affinity and specificity of a protein–protein interaction (PPI), perfected through the evolutionary process, is a prerequisite for the proper functionality of all the biological processes such as metabolic and signal transduction pathways. A deviation in either affinity or specificity of an interaction from its norm in healthy individuals could lead to diseases. Consequently, the modulation of PPIs using therapeutic agents has become one of the most active drug discovery efforts. A starting point of such an effort is an intermolecular interface (binding site) with atomic details. Over the past 20 years, a variety of experimental methods and computational approaches have been developed for the detection, quantification, and prediction of PPI sites.[2, 3] It has been found that most of the studied PPI sites share common features with each other including (1) a small percentage of the residues in the binding site contribute disproportionally to the binding, and these residues are called hot-spot residues (or hot spots in short)[4, 5]; (2) they are composed of a hydrophobic core surrounded by an O-ring of charged/polar residues[4, 5]; and (3) there exists amino acid preference in hot spots. The focus of the present article is on hot-spot determination, and thus in the following we only review the current state of their experimental measurement and in silico prediction. Previously, given a complex structure their hot spots have been identified experimentally through alanine scanning mutagenesis followed by the measurement of the binding affinity of each mutant. However, large-scale alanine scanning is rather costly. Furthermore, for the majority of PPIs there may exist no complex structures, even no apo-structures for any of the two interacting partners. 
Consequently, in the last 10 years or so different computational approaches have been proposed for hot-spot prediction including both structure-based[6-19] and sequence-based[8, 20] ones. However, the overall accuracy of sequence-based methods remains low, so most of the current approaches require a complex structure that is determined either experimentally or computed using a protein–protein docking program. Given a protein–protein interface, these approaches perform alanine scanning in silico to predict hot spots through the estimation of the binding affinities of the mutants generated for all the residues at the PPI site using an empirical scoring function. The following survey focuses on them.
A computational approach for hot-spot prediction consists of at least two components, either explicitly or tacitly: a scoring function devised to distinguish hot-spot residues from regular surface residues and a search algorithm designed to find all the hot spots. The previous search algorithms are often heuristic in nature and adopted from machine learning, such as neural networks and support vector machines and their variants.[13, 19] Even when a protein–protein docking program is used to find all the binding sites and poses, the search may still not be exact if the docking program itself relies on a heuristic search. A search algorithm is employed for both the tuning of the parameters in a scoring function and hot-spot identification. In theory, a scoring function is an approximation to the exact physics for a PPI expressed by $\Delta G = \Delta H - T\Delta S$, where $\Delta G$ is the binding Gibbs free energy, $\Delta H$ the enthalpy change, $\Delta S$ the entropy change, and $T$ the absolute temperature. Neither enthalpy nor entropy could be measured or computed precisely. The explicit forms of current scoring functions may differ but all of them use a very crude approximation to $\Delta S$ and a somewhat more accurate approximation to $\Delta H$. Part of $\Delta S$ could be estimated through a quantification of the desolvation effect by computing the difference in solvent accessible surface area (SAA) before and after the binding. According to how $\Delta H$ and $\Delta S$ are approximated and how sequence information is used, current scoring functions could be roughly divided into four categories.
Energy-based functions that consist of such terms as hydrogen bond (H-bond), salt bridge, van der Waals (VDW) and electrostatic interactions as well as SAA[6-8, 19];
Knowledge-based functions that include such terms as atomic (residue) contact and distance-dependent pair potential, the shape of the binding site, and the local structural environment such as structural conservation. These terms are typically mined from either a PPI or hot-spot database;
Sequence-based functions that have such terms as the degree of residue conservation and a sequence profile that includes neighboring residues;
Ad hoc functions that include terms from both energy-based and knowledge-based. For example, such a function may combine an SAA term with residue conservation, residue contact, and pair potential of interface residues or distance-dependent pair potential, or combine shape specificity with atomic contact, H-bond, and salt bridge.
In contrast to the terms in an energy-based function those in either a knowledge-based or sequence-based function have neither clear physical meaning nor proper energy unit. The parameters in these functions such as their weighting factors are learned through a training process using a database of complex structures either determined experimentally[6, 17-19] or predicted by a protein–protein docking program such as ICM. With an ad hoc scoring function that includes terms from different categories, it is tricky to determine its parameters and to assign proper energy unit to its terms. Though the individual terms in an energy-based function may have clear physical meaning, the function itself is still a very crude approximation to the overall Gibbs free energy of binding. The other scoring functions, though whose terms lack clear physical meaning, have been adopted mainly because of their good performance in practice. The development of both scoring functions and search algorithms relies heavily on experimental alanine scanning databases such as ASEdb, BID, and ISIS for their parameter optimization and prediction verification.
Although alanine scanning mutagenesis is an established technique for hot-spot identification, the results are somewhat difficult to interpret quantitatively because a point mutation to a single residue type is not always sufficient to quantify a residue's contribution to the binding. In fact, it is rather tricky to quantify the contribution of any individual residue to either a PPI or a protein–ligand interaction through point mutation without at the same time measuring the structural and dynamical changes brought about by the mutation. Thus, it is not straightforward to translate the knowledge of alanine scanning into the design of small chemicals that target the PPI because of the lack of specific and explicit information about the particular intermolecular interactions that have been modulated by the point mutation.
In this article, we present a novel computational strategy for hot-spot determination through the docking into a set of proteins homologous to one of the two interacting partners of a compound known to be capable of disrupting a PPI. The rationale behind our approach is in line with the recent findings that PPI sites have increased propensity for small molecule (ligand) binding.[25, 26] We have developed an ad hoc scoring function that includes intermolecular hydrogen bonding propensity, intermolecular VDW energy and SAA as well as residue conservation first to predict PPI sites and then to distinguish hot spots from the other surface residues. In our approach, the search for a PPI site is achieved through the docking of the known compound into a set of homologous structures while the hot spots themselves are determined by evaluating their specific interactions with the compound and the degree of conservation. Compared with the previous structure-based computational methods for hot-spot identification, our approach offers several advantages. First, only a homologous structure of one of the two interacting proteins is required. Second, because it starts with experimental observations, the prediction is more reliable, as having been confirmed experimentally through point mutation studies and by a comparison with a prediction made with a frequently used protein–protein docking program, ZDock. Third, the explicit atomic interactions between the docked compound and binding site residues make it possible to mutate the key residues according to the particular type of interaction they engage in, rather than to mutate all of them to a single type residue as in an alanine scanning approach, and thus the quantification of a particular residue's contribution to the PPI should be more precise than what could be possibly achieved through alanine scanning. 
Fourth, as our strategy is based on a competitive mechanism between the ligand binding and protein binding, the explicit types of interactions identified by our approach could become the starting point for the optimization of lead compounds that target the PPI. Our novel approach has been applied to predict the hot spots of human activin receptor type II (ActRII) critical for its binding toward Cripto-I. Both ActRII and Cripto-I are involved in nodal and activin signal pathways. However, it has not been proved with certainty whether Cripto-I binds directly to ActRII or whether the interaction between Cripto-I and ActRII is mediated by activin. The experimental confirmation of human ActRII and Cripto-I binding interface and ActRII hot spots first reported in this article provides concrete evidence for their direct interaction.
Our novel computational approach for hot-spot determination using a small molecule, alantolactone, as a probe has been successfully applied to the identification of residues of human ActRII critical for its binding to Cripto-I. In the following, we first describe the ActRII and Cripto-I interface determined for the first time by our approach. Then, we present a quantitative experimental assessment of the identified hot spots using a mammalian two-hybrid system with luciferase as a reporter.
Computational identification of human ActRII PPI site and hot spots
Out of all the possible alantolactone binding sites (Fig. 1) on the set of ActRIIA homologs (Table 1) output from the docking program GOLD, a site (Site I) on human activin receptor IIB (ActRIIB, PDBID 2H62, 62% sequence identity to human ActRIIA) has the best (lowest) overall score as determined by our scoring function (Table 2, and Supporting Information S1). This site is composed of residues W45, R46, N47, G50, T51, I52, V74, Y82, F95, T96, and H98 of ActRIIB (Fig. 2). Assuming that both ActRIIA and ActRIIB interact with alantolactone in a similar manner and thus share hot spots, we could use the ActRIIB interface to infer the contributions of the corresponding residues in ActRIIA to its binding toward Cripto-I.
Table 1. The List of Four Homologous Protein Structures used in the Docking Study
Sequence identity (extracellular)
Sequence identity (whole)
The four structures, three of which are complexes, are selected from the PDB based on the sequence similarity of both their extracellular domains and their whole sequences.
The two complexes have similar structures with different resolutions.
Table 2. The Ranking Scores of the Five Binding Sites
Surface SAA of bound ligand (Saa)
Sequence conservation (Sc)
Is there an O-ring (Or)
The total score, ST, and its individual terms, Ev, Eh, Saa, Sc, and Or, are computed as follows. The protein–ligand VDW term, Ev, is computed using CHARMM force field parameters for the protein and AMBER force field parameters for the ligand. The protein–ligand H-bond energy, Eh, is computed using the parameters as implemented in the program DSSP; a weighting factor is applied to the raw H-bond energy in order to bring its numerical values into the range between −1.0 and 0.0. The term that approximates the desolvation effect, Saa, is defined as the ratio of the change in SAA versus the total SAA of all the hydrophobic atoms in the binding site, $S_{aa} = (A_b - A_u)/A_u$, where Au is the total surface SAA of all the hydrophobic atoms of the unbound ligand, that is 278.6 Å² for alantolactone, and Ab is the total surface SAA of all the hydrophobic atoms of the bound alantolactone. Sc is defined as the minus of the percentage of the conserved residues with respect to all the residues in the binding site; Or is assigned −0.5 if at least one residue in the binding site is a part of an O-ring, and 0.0 otherwise.
The first number is H-bond energy while the second is the number of H-bonds.
The first number is the number of the conserved residues in the binding site while the second is the total number of the residues. The binding site residues are those that have at least one atom to be within 6 Å of at least one atom of the docked compound. The lower the total score ST is, the better.
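The 6 Å binding-site definition and the conservation term Sc described in the notes above can be illustrated with a minimal Python sketch. The input format (residue identifiers paired with atom coordinates), the function names, and the use of a fraction rather than a percentage for Sc (so that the term falls between −1.0 and 0.0) are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import math


def binding_site_residues(protein_atoms, ligand_atoms, cutoff=6.0):
    """Return the residues with at least one atom within `cutoff` angstroms
    of at least one ligand atom.

    protein_atoms: list of (residue_id, (x, y, z)) tuples (hypothetical format).
    ligand_atoms:  list of (x, y, z) tuples for the docked compound.
    """
    site = set()
    for residue_id, coord in protein_atoms:
        if residue_id in site:
            continue  # residue already selected through another atom
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms):
            site.add(residue_id)
    return site


def conservation_score(site_residues, conserved_residues):
    """Sc: minus the fraction of binding-site residues that are conserved.

    Assumption: a fraction is used so the value lies in [-1.0, 0.0].
    """
    if not site_residues:
        return 0.0
    n_conserved = len(set(site_residues) & set(conserved_residues))
    return -n_conserved / len(site_residues)


# Toy coordinates, not taken from any PDB entry.
protein = [
    ("W45", (0.0, 0.0, 0.0)),
    ("N47", (3.0, 4.0, 0.0)),
    ("K10", (20.0, 0.0, 0.0)),
]
ligand = [(1.0, 0.0, 0.0)]

site = binding_site_residues(protein, ligand)
sc = conservation_score(site, {"W45"})
```

In this toy case two residues fall inside the cutoff and one of them is conserved, so Sc is −0.5.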
We start with an analysis of the top-scored ActRIIB alantolactone binding interface in terms of the type of interaction the residues in the interfaces could possibly engage in. In the ActRIIB binding interface, there exist an H-bond between O34 of alantolactone and HE1 of W45 and two H-bonds between O29 of alantolactone and the carboxamide group of N47. Both HE1 of W45 and the carboxamide group of N47 are exposed without the bound compound. A comparison of human ActRIIA and ActRIIB sequence reveals that both residues are conserved (Fig. 3). We hypothesize that the side chains of both residues could participate in hydrogen bonding with Cripto-I while the binding of an alantolactone renders them unavailable for Cripto-I binding. With their statuses as hot spots having been confirmed experimentally through point mutation study, we extend into the neighboring region of the above ligand binding site to search for additional hot spots. The Cripto-I binding interface of an ActRII is expected to include the alantolactone binding interface as a subregion, and most likely extend further into the neighboring region. A visual inspection of the possible extensions indicates that if the above alantolactone binding region is expanded into its right side as shown in Figure 4, then the Cripto-I binding interface of ActRIIA would be formed by a valley of hydrophobic residues surrounded by an O-ring composed of polar/charged residues, a common structural feature in many PPI interfaces. Moreover, the extended binding site includes W45, Y82, and R94, the three types of residues that have been reported to occur frequently in PPI sites. In this extended Cripto-I binding site, N47 is a part of an O-ring while W45 in the valley. Other residues in the O-ring include both S49 and R94 with their side chains possibly forming either an H-bond or salt-bridge with Cripto-I because their side chains extend far into the solvent. 
In particular, S49 is conserved and its hydroxyl group could not form any intramolecular H-bonds. The residue in ActRIIA corresponding to R94 of ActRIIB is K94, both of them are positively charged and thus possibly form a salt-bridge with Cripto-I, while Y97 of ActRIIA corresponds to H97 of ActRIIB, both of them are aromatic and thus possibly engage in a hydrophobic or aromatic–aromatic interaction with Cripto-I. Y97(H97) covers the left side of the alantolactone binding site (Figs. 2 and 4) while K94(R94) is located at its upper left (Fig. 4). All of them may contribute to Cripto-I binding and thus are selected for experimental verification and quantitative assessment as detailed later.
Experimental validation and quantification of hot-spot residues
The binding affinities of both the wild-type and five mutants of ActRIIA toward Cripto-I have been measured by a mammalian two-hybrid system with luciferase as a reporter (Fig. 5). The data show that the N47V mutant retains less than 20% of the wild-type affinity. Though on average 70% of the affinity remains for the W45F mutant, none of the measured values exceeds 90% of the wild-type affinity (Fig. 5). The W45F and N47V mutants are designed to disrupt the capability of these residues to form a hydrogen bond. It is possible that the W45F mutant retains its aromatic–aromatic interaction with Cripto-I and thus has a smaller reduction in affinity than the N47V mutant. We predict that both of them are hot spots though W45 is likely to be a weak one. To quantify the contributions of the three residues S49, K94, and Y97 at the extended ActRIIA PPI site, S49 is mutated to an alanine to remove its hydrogen bonding propensity, K94 to a methionine to remove its capacity to form a salt-bridge, and Y97 to an alanine to disrupt a possible aromatic–aromatic interaction or to reduce a possible hydrophobic interaction with Cripto-I. Compared with the binding affinity of the wild-type, the affinity of S49A is reduced by 60% and that of Y97A by 10%, while the binding affinity of K94M remains more or less the same. Thus, we conclude that S49 and possibly Y97(H97) are hot spots while K94(R94) may not be.
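To compare the five mutants on a single scale, the figures quoted above can be converted into the percentage of wild-type affinity retained. The short Python sketch below does this arithmetic. The numbers are approximate values read from the text, not the raw luciferase measurements: N47V is entered at its stated upper bound, and the K94-to-methionine mutant at roughly 100%. The function and variable names are hypothetical.

```python
def retained_percent(reduction_percent):
    """Convert 'affinity reduced by X%' into the percentage of wild-type affinity retained."""
    if not 0.0 <= reduction_percent <= 100.0:
        raise ValueError("reduction must be between 0 and 100 percent")
    return 100.0 - reduction_percent


# Approximate values taken from the prose, for illustration only.
retained = {
    "N47V": 20.0,                    # upper bound: "less than 20%"
    "W45F": 70.0,                    # "on average 70%"
    "S49A": retained_percent(60.0),  # "reduced by 60%"
    "Y97A": retained_percent(10.0),  # "reduced by 10%"
    "K94M": 100.0,                   # "more or less the same"
}

# Mutants ordered from the largest to the smallest loss of affinity.
ranked = sorted(retained, key=retained.get)
```

The resulting order mirrors the conclusions drawn in the text: N47 and S49 show the largest effects, W45 an intermediate one, and Y97 and K94 the smallest.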
We first present the rationale behind our scoring function on which our computational approach for hot-spot identification depends. Next, we compare our approach with previous structure-based computational methods for hot-spot identification. We then discuss the biological implications of the identified hot spots of human ActRII critical for Cripto-I binding and highlight the significance of our computational approach for the design of small molecule drugs targeting a PPI. Finally, we discuss the limitations and possible extensions of our approach.
The scoring function
Our approach assumes that there exist only homologous structures for one of the two interacting proteins. Consequently, in contrast with the computational alanine scanning methods that require a complex structure, in our case no PPI energies such as intermolecular VDW or H-bond energy could be computed. On the other hand, given a pose from a docking program, the protein–ligand interaction energy is readily available. With the assumption that both the ligand and protein compete for the same site, it is possible to use protein–ligand binding affinity to estimate protein–protein binding affinity. Thus, for the purpose of PPI site identification, we have developed an ad hoc scoring function, $S_T$, that is composed of two types of terms that occur respectively in most protein–ligand and protein–protein scoring functions. The first three terms, Ev, Eh and Saa, occur frequently in many scoring functions for both protein–ligand docking[31, 32] and protein–protein docking as well as hot-spot identification[6-8] because of their clear physical meaning, while the next two terms, Sc and Or, are typical in most knowledge-based scoring functions for PPI ranking and hot-spot identification, though the concrete form of the term Or may differ from our expression. The ad hoc combination of terms from both protein–ligand interaction and PPI may seem odd, but the idea is consistent with the previous findings that a PPI site has a high propensity for ligand binding.[25, 26] The practice of combining physical terms with nonphysical ones is not new: in structural biology, especially in NMR structure determination, a typical target function includes both molecular-mechanics terms and pseudo-energy terms that assign a penalty to restraint violation.[33, 34] However, it is challenging to determine weighting factors and the associated energy units for individual terms in such an ad hoc function without first training it against a database of PPIs that could be inhibited by chemicals. 
At present, we are not aware of the existence of any such databases. Consequently, to keep the scoring function as simple as possible the numerical value for each term is scaled to the range between −1.0 and 0.0 except for VDW energy Ev whose value is scaled to the range from −1.0 to +1.0.
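A minimal Python sketch of how such a score might be assembled and used to rank sites is given below. It rests on two assumptions that are not stated explicitly in the text: that the total score is the unweighted sum of the five scaled terms, and that the desolvation term takes the form (Ab − Au)/Au, which follows from the stated definitions and the −1.0 to 0.0 range. The class name, field names, and example values are hypothetical and are not the values of Table 2.

```python
from dataclasses import dataclass

# Hydrophobic SAA of unbound alantolactone in square angstroms (value from the text).
ALANTOLACTONE_AU = 278.6


@dataclass
class SiteTerms:
    ev: float         # scaled VDW term, expected in [-1.0, +1.0]
    eh: float         # scaled H-bond term, expected in [-1.0, 0.0]
    a_bound: float    # hydrophobic SAA of the bound ligand, square angstroms
    sc: float         # conservation term, expected in [-1.0, 0.0]
    has_o_ring: bool  # True if a site residue is part of an O-ring


def saa_term(a_bound, a_unbound=ALANTOLACTONE_AU):
    """Desolvation term; assumed form (Ab - Au) / Au, so full burial gives -1.0."""
    return (a_bound - a_unbound) / a_unbound


def o_ring_term(has_o_ring):
    """Or: -0.5 when the site contains an O-ring residue, 0.0 otherwise."""
    return -0.5 if has_o_ring else 0.0


def total_score(terms):
    """Assumption: the total score is the unweighted sum of the five terms."""
    return (terms.ev + terms.eh + saa_term(terms.a_bound)
            + terms.sc + o_ring_term(terms.has_o_ring))


def rank_sites(sites):
    """Return site names ordered from best (lowest score) to worst."""
    return sorted(sites, key=lambda name: total_score(sites[name]))


# Hypothetical example values for two candidate sites.
example_sites = {
    "site_A": SiteTerms(ev=-0.6, eh=-0.4, a_bound=139.3, sc=-0.5, has_o_ring=True),
    "site_B": SiteTerms(ev=-0.2, eh=0.0, a_bound=250.0, sc=-0.2, has_o_ring=False),
}
ranking = rank_sites(example_sites)
```

Because every term is bounded and lower is better, a site that buries more of the ligand, forms H-bonds, and is more conserved sorts ahead of one that does not.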
The binding site search method and hot-spot identification
Our approach assumes that the ligand and protein share the same binding site; thus it is possible to use a protein–ligand docking program to search for a PPI site. The initial set of binding sites is identified by the program Ligsite and then the set is pruned by the protein–ligand docking program GOLD: if GOLD fails to output any pose for a site, the site is removed from further consideration. The accuracy of our PPI site search and hot-spot determination relies critically on the accuracy of GOLD, which itself employs a genetic algorithm to optimize a protein–ligand scoring function. To reduce the possibility of missing the correct binding sites, three rounds of docking experiments are performed, each initialized with a different center for a binding site. If any pose is output from GOLD, the site is kept for hot-spot identification. When a pose is used to identify hot spots, only its key intermolecular interactions such as intermolecular hydrogen bonding are used in order to minimize the effect of the possible low accuracy of the docked poses on hot-spot determination.
Comparison with the previous structure-based computational methods for hot-spot identification
Our novel computational strategy determines the hot spots of a PPI interface through the docking of a compound known to be capable of disrupting a PPI into a set of proteins homologous to only one of the two interacting partners. Compared with previous computational methods, our approach offers several advantages. First, no complex structures, not even the exact structures of any individual proteins, are required. In contrast, the previous structure-based hot-spot prediction methods require either a complex structure or the structures of the individual proteins; for the latter, a protein–protein docking must first be performed to search for the possible interfaces between the two proteins. Second, because our approach starts with an experimental observation, the predicted hot spots, as confirmed experimentally through point mutation study, are more accurate than those achieved by the previous methods. For the purpose of comparison, we have used a protein–protein docking program ZDock with the default parameters provided by the program to dock an ensemble of NMR solution structures of mouse Cripto-I into human ActRIIB (PDBID 2H62) used in our protein–ligand docking experiment. None of the poses output from ZDock could correctly identify any of the above hot spots (Supporting Information S2). Third, the specific intermolecular interactions with atomic details between the docked ligand and individual residues obtained by our approach make it possible to mutate the involved residues according to the specific type of interaction they engage in while at the same time keeping all their other physical–chemical properties as intact as possible. In contrast, the experimental alanine scanning itself and the computational methods built upon it mutate every residue to an alanine; consequently, the quantification of a particular residue's contribution to the PPI may be less precise than that obtained from our approach.
Implications for nodal and activin signal pathways and tumorigenesis
It has been shown that nodal signals via ActRIIA/ActRIIB and ALK4 (activin-like kinase 4 or activin type I receptor), and is essential for mesoderm formation and subsequent organization of left–right axial structures in early embryonic development. Specifically, nodal requires Cripto-I as a coreceptor for signaling via ActRIIA/ActRIIB and ALK4. Although previous studies have indicated that Cripto-I could form a trimer with activin and ActRII,[38, 39] and may interact directly with human ActRIIA, it has not been proved with certainty whether Cripto-I binds directly to ActRIIA/ActRIIB or whether the interaction between Cripto-I and ActRIIA/ActRIIB is mediated by activin as no complex structures have been reported and neither has the interface between Cripto-I and ActRIIA/ActRIIB nor the interface between Cripto-I and activin been mapped experimentally. Through a screening effort using a mammalian two-hybrid system, alantolactone has been identified previously as a possible candidate capable of disrupting Cripto-I and ActRIIA interaction. The conclusion is based on a co-PI experiment of Cripto-I and ActRIIA and the following experimental observation: alantolactone could induce activin/SMAD3 signaling in human colon adenocarcinoma HCT-8 cells. If Cripto-I could bind directly to ActRIIA, it should play an essential role in both activin and the nodal signal pathways. It has been well documented that activin controls diverse cellular processes ranging from tissue patterning during embryogenesis to the control of homeostasis, cell growth, and differentiation in many types of adult tissues. The malfunction of the activin signaling is associated with multiple pathological states including reproductive disorders and carcinogenesis. 
Thus, the role of Cripto-I in tumorigenesis should be consistent with the facts that (1) it could inhibit the activin signaling and (2) alantolactone could reverse the inhibition through the induction of activin/SMAD3 signaling by blocking the Cripto-I and ActRII binding. However, up to now it has not been known whether alantolactone binds to ActRIIA or to Cripto-I, nor has the alantolactone or Cripto-I binding site on ActRIIA been characterized. The alantolactone binding site, the interface between ActRIIA and Cripto-I, and the hot spots on ActRII, all determined for the first time by our computational approach, provide concrete evidence that supports a direct though possibly transient interaction between human ActRII and Cripto-I and further support the view that Cripto-I may play an important role in the activin signal pathway as well.
The significance for the design of small molecules that modulate PPIs
The delicate balance between the binding affinity and specificity of a PPI plays an essential role in the development of many diseases. Consequently, the modulation of PPIs using small molecules has become one of the most active drug discovery efforts. However, compared with the structure-based design of small molecules that target an enzyme or a ligand-binding protein, the design of small molecule PPI inhibitors is more challenging because of (a) the lack of complex structures, (b) the lack of definite information about both hot spots and the type of interaction at atomic level, and (c) the unique characteristics of a typical PPI interface such as being noncontiguous, flat, and devoid of a deep binding site, and its surface area being much larger than the potential binding area of a typical small molecule. Our computational approach may make it easier to design small molecule inhibitors because it could determine hot spots with high accuracy even when neither a complex structure nor the structures of any individual proteins are available and because the probe itself is a small molecule that has been experimentally proved to be capable of disrupting the PPI. Moreover, the hot spots and the detailed information about the PPI determined by our approach should be particularly useful for lead optimization with the probe itself as a scaffold and being guided by the specific types of intermolecular interactions. In our particular case, with the previous experimental observation that alantolactone exhibits antiproliferative function specific to tumor cells with almost no toxicity to normal cells at a concentration of up to 5 μg/mL, alantolactone may well become a promising lead for the design of an entirely new class of anticancer agents that target the activin signal pathway.
The limitations of our approach and possible extensions
In the current implementation of our approach, only limited flexibility of the shared ligand and protein binding site is taken into consideration as the current version of GOLD only lets the methyls of the binding site and the protein atoms that could possibly participate in hydrogen bonding with a ligand atom have multiple conformations. A possible remedy is first to use molecular dynamics simulation to explore the mobility of the binding site and then to dock the compound into the representative trajectories, and finally to rank all the possible binding sites. Our approach assumes that the ligand and the protein compete for the same binding site, and thus it will fail if the ligand achieves its inhibition through an allosteric effect. On the other hand, our approach could be used to determine whether the ligand achieves its inhibition through an allosteric mechanism as in this case, the point mutation of a ligand binding site residue will not affect the PPI in question. A possible disadvantage of our approach is that it requires the existence of both a compound capable of inhibiting the PPI and an apo-form structure of a protein homologous to one of the two interacting partners and thus only the hot spots on the protein that has a homologous structure could be determined. Finally, though the newly developed scoring function and the computational approach itself have been applied successfully to determine the hot spots of ActRII toward Cripto-I, for better assessment of both the scoring function and computational approach itself, they should be tested on more PPI systems that could be interrupted by a ligand. In particular, the fine tuning of our ad hoc scoring function including the determination of its weighting factors and the associated units requires a database of PPIs that could be disrupted by known compounds.
Materials and Methods
We first present our novel computational approach for hot-spot determination. Then we illustrate its application to human ActRIIA that binds to Cripto-I. Finally, we describe the experimental verification of our computational results.
The computational approach
Our approach does not require any complex structures; instead, it searches for hot spots using as a probe a compound known to be capable of modulating the interaction between the two proteins whose hot spots we are seeking to identify. Briefly, our approach proceeds as follows. Starting with the structures of a set of proteins homologous to only one of the two binding partners, we use a protein–ligand docking program to dock the probe into all the possible binding sites on all the homologs. These binding sites are identified initially with the program Ligsite. Those sites into which the program fails to dock the compound are removed from further consideration, while all the remaining sites are ranked using our newly developed scoring function. The scoring function is composed of the intermolecular H-bond energy, the intermolecular VDW energy, and the change in SAA upon ligand binding, as well as two additional terms that represent, respectively, the degree of residue conservation and whether the ligand binding site is a part of an O-ring structure characteristic of most known PPI sites. Finally, the conserved residues in the top-ranked site that make important contributions to the ligand binding are predicted to be hot spots. In the following, we present our approach in detail.
The selection of protein structures for docking study
There exist no complex structures for ActRIIA and Cripto-I. Instead, we use a set of crystal structures of activin receptors (Table 1) that are similar to ActRIIA at the sequence level.
Protein–ligand docking for PPI site identification
We first use the program Ligsite to identify all the possible binding sites on each structure in the set of homologs. Then, we perform several rounds of docking studies on each binding site of each homolog using the program GOLD, initialized with three distinct centers manually picked for each binding site. GOLD requires a user to pick a point as the center of a binding site, and the final poses depend, sometimes critically, on the location of the center. We use the default parameters provided by GOLD. Specifically, the number of final poses output from GOLD is decided by the program itself using its default fitness score. During the docking process, all the atoms in a binding site are kept rigid, except for methyl groups and those atoms that possibly participate in hydrogen bonding with the ligand, while the ligand itself is flexible.
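The bookkeeping behind this docking protocol can be sketched as follows. This is a minimal illustration, not the authors' code: the `Pose` record, the function name, the homolog labels, and the fitness values are all hypothetical, and it assumes that a higher docking fitness score indicates a better pose. For each (homolog, site) pair it keeps the best pose found across the three manually picked centers; a site for which docking produced no pose never appears in the result and is thereby excluded.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Pose:
    homolog: str    # hypothetical label for a homolog structure
    site: int       # index of a candidate binding site on that homolog
    center: int     # which of the three manually picked centers was used
    fitness: float  # docking fitness score (assumed: higher is better)


def best_pose_per_site(poses: List[Pose]) -> Dict[Tuple[str, int], Pose]:
    """Keep, for every (homolog, site) pair, the pose with the best fitness."""
    best: Dict[Tuple[str, int], Pose] = {}
    for pose in poses:
        key = (pose.homolog, pose.site)
        if key not in best or pose.fitness > best[key].fitness:
            best[key] = pose
    return best


# Illustrative values only, not data from the study.
poses = [
    Pose("homologA", 1, 0, 41.2),
    Pose("homologA", 1, 1, 47.9),
    Pose("homologA", 1, 2, 44.0),
    Pose("homologB", 1, 0, 39.5),
]
best = best_pose_per_site(poses)
```

Keying the result on the (homolog, site) pair keeps the three centers as alternative starting points for the same site rather than as separate sites.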
The scoring function for binding site ranking
The following scoring function, ST, has been developed to rank all the binding sites on all the homologs output from the above protein–ligand docking study:
where Ev and Eh are, respectively, the intermolecular protein–ligand VDW and the protein–ligand H-bond energies. Saa is the ratio of the change in SAA to the total SAA of the hydrophobic atoms, Saa = (Ab − Au)/Au, where Au is the total SAA of all the hydrophobic atoms of the unbound ligand and Ab is the total SAA of all the hydrophobic atoms of the bound ligand. Sc is the percentage of the conserved residues with respect to all the residues in the binding site, while Or is assigned −0.5 if there exists an O-ring in the ligand binding site and 0.0 otherwise. If there are several docking poses for a binding site, we compute ST using the pose with the best GOLD fitness score.
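How the five terms might be combined and used for ranking can be sketched as below. The weighting factors of ST are not reproduced in this text, so the sketch assumes unit weights purely for illustration. It also assumes Saa = (Ab − Au)/Au, so that burial of hydrophobic surface gives a negative value, and that a lower (more negative) total indicates a better site. All names and numerical inputs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class SiteTerms:
    ev: float          # scaled protein-ligand VDW term
    eh: float          # scaled protein-ligand H-bond term, in [-1, 0]
    a_unbound: float   # hydrophobic SAA of the unbound ligand (Au)
    a_bound: float     # hydrophobic SAA of the bound ligand (Ab)
    sc: float          # conservation term, in [-1, 0]
    has_o_ring: bool   # whether the site is part of an O-ring


def saa_ratio(a_unbound: float, a_bound: float) -> float:
    """Assumed form of Saa: relative change in hydrophobic SAA on binding."""
    if a_unbound <= 0.0:
        raise ValueError("unbound hydrophobic SAA must be positive")
    return (a_bound - a_unbound) / a_unbound


def total_score(t: SiteTerms,
                weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the five terms; unit weights are an assumption."""
    o_ring = -0.5 if t.has_o_ring else 0.0
    terms = (t.ev, t.eh, saa_ratio(t.a_unbound, t.a_bound), t.sc, o_ring)
    return sum(w * x for w, x in zip(weights, terms))


def rank_sites(sites: Dict[str, SiteTerms]) -> List[Tuple[str, float]]:
    """Sort sites from best to worst, assuming lower scores are better."""
    scored = [(name, total_score(t)) for name, t in sites.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Illustrative values only, not data from the study.
sites = {
    "siteA": SiteTerms(0.2, -0.5, 200.0, 50.0, -0.5, True),
    "siteB": SiteTerms(0.1, -0.2, 200.0, 150.0, -0.25, False),
}
ranking = rank_sites(sites)
```

Passing the weights as a parameter leaves room for the fine tuning of the weighting factors that the scoring function would still require.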
All the terms in ST are computed using our own implementation. The protein–ligand VDW term, Ev, is computed using the CHARMM force field parameters for the protein and the AMBER force field parameters for the ligand. The term Saa, which approximates the desolvation effect of both protein–ligand and protein–protein binding, is computed by following the procedure for SAA computation in the program DSSP. The protein–ligand hydrogen bond term Eh is also computed using the parameters in the program DSSP. As its numerical values are rather large, they are divided by a constant to bring them into the range −1.0 to 0.0 (0.0 if there exist no H-bonds), in line with the values of Saa and Sc, which also lie in the range −1.0 to 0.0. To keep the scoring function as simple as possible, the numerical value of every term is scaled to the range between −1.0 and 1.0. For example, the numerical value of Or is set to either −0.5 or 0.0. Except for the VDW energy Ev, the numerical values of all the terms are negative or zero. The residues used in the Sc computation are those that have at least one atom within 6 Å of at least one atom of the docked compound.
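The residue selection and the conservation term can be illustrated with a short sketch. It assumes that the protein is given as a mapping from residue labels to atom coordinates in Å, and that Sc is the negated fraction of conserved residues in the site, which is one reading consistent with the stated range of −1.0 to 0.0. The function names, residue labels, and coordinates are hypothetical.

```python
from math import dist
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def site_residues(protein: Dict[str, List[Coord]],
                  ligand: List[Coord],
                  cutoff: float = 6.0) -> Set[str]:
    """Residues with at least one atom within `cutoff` Å of any ligand atom."""
    return {
        residue
        for residue, atoms in protein.items()
        if any(dist(a, l) <= cutoff for a in atoms for l in ligand)
    }


def conservation_term(site: Set[str], conserved: Set[str]) -> float:
    """Assumed form of Sc: minus the fraction of site residues that are conserved."""
    if not site:
        return 0.0
    return -len(site & conserved) / len(site)


# Hypothetical coordinates, chosen only to exercise the 6 Å cutoff.
protein = {
    "RES1": [(0.0, 0.0, 0.0)],
    "RES2": [(10.0, 0.0, 0.0)],
    "RES3": [(3.0, 4.0, 0.0)],
}
ligand = [(0.0, 0.0, 1.0)]

site = site_residues(protein, ligand)
sc = conservation_term(site, conserved={"RES1"})
```

The all-pairs distance check is quadratic in the number of atoms, which is acceptable for a single binding site and a small ligand.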
Hot-spot identification and the design of point mutation
Our approach assumes that a ligand accomplishes its inhibition of a PPI through a competitive mechanism, and thus that a hot spot is also pivotal for protein–ligand binding. In other words, a residue in the top-ranked site that has an important interaction with the docked compound, such as an intermolecular H-bond or an aromatic–aromatic interaction, is predicted to be a hot spot if it is also conserved and a part of an O-ring structure. These predicted hot spots are then assessed by point mutation study. After an initial binding region has been confirmed experimentally, we extend the search into its neighboring regions to look for conserved residues that could potentially form a salt bridge or hydrogen bond, or participate in an aromatic–aromatic interaction, with a residue of the partner protein, because a protein–protein interface is likely to be larger than a protein–ligand interface, especially when the ligand itself is a small chemical. Point mutation is performed on these predicted hot spots as well to assess their contributions to the binding. The design of each point mutation is based on the type of interaction that the particular residue is expected to participate in when it binds to the protein. For example, if a residue is expected to form an intermolecular H-bond, we mutate it to a residue that removes the capacity of the side chain to form an H-bond while preserving its other physical and chemical properties as much as possible.
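The selection rule for the top-ranked site amounts to a conjunction of three conditions, as the following sketch shows. The record type, the interaction labels, and the residue names are hypothetical, and the per-residue annotations are assumed to have been produced by earlier analysis steps.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Interaction types treated as "important"; the labels are illustrative.
KEY_INTERACTIONS = frozenset({"h_bond", "aromatic"})


@dataclass(frozen=True)
class SiteResidue:
    name: str
    interactions: FrozenSet[str]  # interactions with the docked compound
    conserved: bool               # conserved across the homolog set
    in_o_ring: bool               # part of an O-ring structure


def predict_hot_spots(residues: List[SiteResidue]) -> List[str]:
    """Residues that are conserved, in an O-ring, and make a key interaction."""
    return [
        r.name
        for r in residues
        if r.conserved and r.in_o_ring and (r.interactions & KEY_INTERACTIONS)
    ]


# Hypothetical residues, not results from the study.
top_site = [
    SiteResidue("RES1", frozenset({"h_bond"}), True, True),
    SiteResidue("RES2", frozenset({"aromatic"}), True, False),
    SiteResidue("RES3", frozenset({"vdw"}), True, True),
    SiteResidue("RES4", frozenset({"h_bond", "vdw"}), False, True),
]
hot_spots = predict_hot_spots(top_site)
```

Only `RES1` satisfies all three conditions here; the other three each fail exactly one, which makes the effect of every condition visible.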
Experimental verification and quantification of hot spots
The effect of a single point mutation on human ActRIIA-Cripto-I interaction is assessed using a mammalian two-hybrid system with luciferase as a reporter.
Cell line and cell culture
Human embryonic kidney cells (HEK293T) were obtained from the Cell Bank of the Chinese Academy of Sciences (Shanghai, China). The cells were maintained in Dulbecco's modified Eagle's medium (DMEM, Gibco) supplemented with 10% fetal bovine serum (FBS, TBD, China), 100 U/mL penicillin, and 100 mg/mL streptomycin, and were incubated at 37°C with 5% CO2.
An N-truncated cDNA sequence, that is, one with its signal peptide sequence deleted, of either human ActRIIA or Cripto-I is cloned into a pBind vector (Promega). The five mutants, W45F, N47V, S49A, K94M, and Y97A, are generated using a Site Directed Mutagenesis Kit by following the protocol provided by the manufacturer (Beyotime, Jiangsu, China). Briefly, the five DNA constructs (pBind-ActRIIA W45F, pBind-ActRIIA N47V, pBind-ActRIIA S49A, pBind-ActRIIA K94M, and pBind-ActRIIA Y97A) are first created and then amplified by PCR using the respective primers. All five constructs are then transferred into E. coli cells for culture, and finally their cDNAs are extracted from the cells and sequenced to confirm the mutation.
Luciferase reporter assay
HEK293T cells are grown for about 24 h until they reach 60–70% confluence, at which point they are ready for cotransfection with plasmids. The cells are then cotransfected with three types of plasmids (1 μg/well) carrying, respectively, human pAct-Cripto-1 DNA, pG5luc luciferase DNA, and either the wild-type pBind-ActRIIA DNA construct or one of the above five point mutation constructs. The cotransfection is achieved using a Calcium Phosphate Transfection Kit (Beyotime, Jiangsu, China) according to the manufacturer's recommended protocol. After the cotransfection, the HEK293T cells are cultured for an additional 24 to 48 h prior to the luciferase reporter assay. The assay itself proceeds as follows. The cells are first harvested, washed twice with phosphate-buffered saline (PBS), and lysed in 85 μL cold lysis buffer (25 mmol/L glycylglycine, 1% Triton X-100 [v/v], 15 mmol/L MgSO4, 4 mmol/L ethylene glycol tetra-acetic acid [EGTA], and 1 mmol/L dithiothreitol, pH 7.8). Then, luciferin (100 μL, 25 mmol/L, BD Monolight; BD Biosciences, Pharmingen, San Jose, CA), potassium salt (20 μL of 0.5 M KH2PO4), and 5 μL assay cocktail (1 mol/L adenosine triphosphate, 15 mmol/L KH2PO4, 15 mmol/L MgCl2, pH 7.8) are added to each 45 μL of cell lysate. Finally, the luciferase activity and the β-galactosidase activity are measured, respectively, using a FLUOstar OPTIMA fluorescence spectrometer (BMG LABTECH, Offenburg, Germany) and a β-galactosidase assay system (Promega Biosciences, San Luis Obispo, CA).
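One common way to turn the two readouts into a per-mutant measure of interaction strength is sketched below. The text does not state how the readings are combined, so this is an assumption: the luciferase signal is divided by the β-galactosidase signal to correct for transfection efficiency, and each construct is then expressed relative to the wild type. The function names, the construct labels, and the readings are hypothetical.

```python
from typing import Dict, Tuple


def normalized_activity(luciferase: float, beta_gal: float) -> float:
    """Luciferase signal per unit of β-galactosidase signal (assumed scheme)."""
    if beta_gal <= 0.0:
        raise ValueError("β-galactosidase reading must be positive")
    return luciferase / beta_gal


def relative_to_wild_type(readings: Dict[str, Tuple[float, float]],
                          wt_key: str = "WT") -> Dict[str, float]:
    """Express each construct's normalized activity as a fraction of wild type."""
    wt = normalized_activity(*readings[wt_key])
    return {
        name: normalized_activity(luc, bgal) / wt
        for name, (luc, bgal) in readings.items()
    }


# Hypothetical (luciferase, β-galactosidase) readings, not measured data.
readings = {
    "WT": (1000.0, 2.0),
    "mutantA": (250.0, 1.0),
}
relative = relative_to_wild_type(readings)
```

Under this scheme, a value well below 1.0 for a mutant would indicate that the mutation weakens the reported interaction.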
The hot spots of a protein quantify its binding affinity toward another protein at the individual residue level with atomic details, and thus their identification is the starting point for the design of small chemicals targeting the PPI. However, the experimental determination of hot spots is time-consuming and resource demanding while the accuracy of the current computational methods for hot-spot identification remains low. In this article, we presented a novel structure-based computational approach for hot-spot determination through the docking of a compound known to be able to inhibit the particular PPI into a set of protein structures homologous to only one of the two interacting partners. The subsequent experimental confirmation of the predicted hot spots suggests its potential as a general method for hot-spot determination while the specific types of interactions identified by the approach could be used to guide lead optimization with the compound itself as a scaffold. The confirmation also demonstrates the adequacy of our ad hoc scoring function for hot-spot determination. The interfaces between human ActRIIs and Cripto-I as well as the hot spots on ActRIIs first determined by our approach support the view that Cripto-I plays an important role in both activin and nodal signal pathways.