This article was originally published online as an accepted preprint. The “Published Online” date corresponds to the preprint version. You can request a copy of the preprint by emailing the Biopolymers editorial office at biopolymers@wiley.com.
Efforts in drug discovery are routinely based on targeting active sites of proteins with small molecules. The characteristics of typical drug-like molecules were summarized in Lipinski's rule of five,1 namely not more than five hydrogen-bond donors and 10 hydrogen-bond acceptors, a molecular weight under 500 Da, and a partition coefficient (log P, a measure of lipophilicity) of less than five. These rules were later revised by Veber et al.,2 who proposed a more relaxed set of rules for predicting oral bioavailability: (1) 10 or fewer rotatable bonds and (2) a polar surface area of 140 Å² or less (or 12 or fewer hydrogen-bond donors and acceptors). However, in recent years, the productivity of drug-discovery efforts has shown a consistent decline.3 New paradigms have begun to emerge in an attempt to improve the situation. One exciting direction is that of targeting protein-protein interactions as an alternative to targeting active sites of enzymes. Protein-protein interactions between an enzyme and its substrate, between a catalytic domain and its regulatory domains, and between protein motifs and scaffolding domains are involved in most signal-transduction processes in the cell. Stabilization of protein-protein interactions,4, 5 disruption of protein-protein interactions,6 and disruption of intra-protein contacts during folding7 are all being pursued. However, despite some success stories, discovering small-molecule drugs that modulate protein-protein interactions remains an enormous challenge.8 The large interfacial areas and relatively flat topographies of protein-protein interfaces differ from the smaller and deeper cavities of active sites.6 Therefore, various types of molecules that are suitable for targeting protein-protein interactions, and that do not necessarily comply with the Lipinski or Veber rules, are now actively being sought.
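The Lipinski and Veber criteria summarized above can be expressed as a simple filter. The following is a minimal sketch, assuming that the molecular descriptors have already been computed by some external tool; the `Descriptors` container, the function names and the two example molecules (with invented descriptor values) are illustrative assumptions and are not taken from the cited works.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors (hypothetical container, for illustration only)."""
    h_bond_donors: int
    h_bond_acceptors: int
    mol_weight_da: float
    log_p: float
    rotatable_bonds: int
    polar_surface_area_a2: float


def passes_lipinski(d: Descriptors) -> bool:
    # Rule of five as stated in the text: at most 5 donors, at most 10 acceptors,
    # molecular weight under 500 Da, and log P below 5.
    return (d.h_bond_donors <= 5 and d.h_bond_acceptors <= 10
            and d.mol_weight_da < 500 and d.log_p < 5)


def passes_veber(d: Descriptors) -> bool:
    # Veber criteria as stated in the text: at most 10 rotatable bonds, and
    # polar surface area of at most 140 A^2 (or at most 12 donors plus acceptors).
    polar_ok = (d.polar_surface_area_a2 <= 140
                or d.h_bond_donors + d.h_bond_acceptors <= 12)
    return d.rotatable_bonds <= 10 and polar_ok


# Invented example values: a typical small molecule and a linear peptide.
small_molecule = Descriptors(2, 5, 350.0, 2.5, 4, 80.0)
linear_peptide = Descriptors(9, 10, 700.0, -1.5, 24, 280.0)
print(passes_lipinski(small_molecule), passes_veber(small_molecule))  # True True
print(passes_lipinski(linear_peptide), passes_veber(linear_peptide))  # False False
```

The peptide-like example fails both filters mainly because of its size and number of rotatable bonds, which illustrates why modulators of protein-protein interactions often fall outside these rules.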
Recent advances in peptide and peptidomimetic synthesis9, 10 are generating renewed enthusiasm for the use of peptides as therapeutics, for example, in vaccine development,11 angiogenesis-related12 and auto-immune diseases,13 diabetes,14 and neuroprotection.15 In parallel, the role of short linear motifs in mediating protein-protein interactions in signaling complexes16, 17 is being unraveled, and the importance of such motifs (and potentially of their peptidic mimics) in signal channeling has been demonstrated.17 In general, many of the highly abundant disordered regions in proteins contain short linear peptide sections (3–10 amino acids long) that are responsible for thousands of protein-protein interactions. Examples include the SH3 and WW domains, which bind to proline-rich regions; the SH2, PTB and 14-3-3 domains, which bind to phosphorylated peptides; the PDZ domain discussed below; and many others.17 It has become clear that protein-peptide interactions are very abundant in the cell, and estimates suggest that they represent 15–40% of all interactions,18 which makes peptide discovery and development widely applicable and of broad interest.
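Because linear motifs are short sequence patterns, they can be located with ordinary pattern matching. The sketch below scans a sequence for two deliberately simplified patterns: a proline-rich PxxP core of the kind recognized by SH3 domains, and a C-terminal [S/T]-x-hydrophobic tail of the kind recognized by many PDZ domains. The patterns, the motif names and the example sequence are assumptions chosen for illustration; curated motif definitions are considerably more detailed.

```python
import re

# Simplified, illustrative motif patterns (assumptions, not curated definitions).
# Lookaheads are used so that overlapping occurrences are all reported.
MOTIFS = {
    "SH3_PxxP": re.compile(r"(?=(P..P))"),
    "PDZ_Cterm": re.compile(r"(?=([ST].[VILF])$)"),
}


def scan_motifs(sequence):
    """Return (motif name, 0-based start, matched residues) for every hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence):
            hits.append((name, match.start(1), match.group(1)))
    return hits


# Invented example sequence containing a proline-rich stretch and a PDZ-like tail.
for hit in scan_motifs("AAPPLPPRKETSV"):
    print(hit)
```

Matches of this kind only flag candidate sites; whether a given occurrence is functional depends on its structural context.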
Cyclic peptides19–22 and other modified peptides23, 24 can be highly potent as well as selective, providing a compromise between structural preorganization and sufficient flexibility to mold to a target surface and maximize binding interactions. There is much evidence that despite having molecular masses well above Lipinski's rule constraints, cyclic peptides and other macrocycles can possess drug-like physicochemical and pharmacokinetic properties such as good solubility, lipophilicity, metabolic stability and bioavailability.25
Targeting protein-protein interactions by large and flexible molecules requires the adaptation of research methods from several disciplines: suitable compounds need to be synthesized, appropriate biochemical assays that are able to monitor modulation of protein-protein interactions need to be developed, and computational tools that are adequate for analyses of protein interactions with such compounds need to be devised. This review focuses on computational methods for the rational design of peptidic modulators of protein-protein interactions. The fact that the peptides are short proteins allows the use of sequence-based techniques, which are not suitable in the case of small molecules; on the other hand, the large size and high flexibility of peptides present a challenge for the conventional computational docking approaches that rely on the relatively small number of rotatable bonds in the typical “drug-like” molecules. In this review, we highlight the sequence-based approaches to peptide development as well as the adaptation of docking techniques to large and flexible peptides. The approaches described below are schematically represented in Figure 1.
Figure 1. Schematic representation of the approaches described in this review, in the general context of drug (or modulator of signaling) development. (A) Protein-protein interaction cartoon. (B) Mimicking the interface sequence of one of the partners by a peptide. (C) Finding inhibitory peptides by biochemical screening of peptide libraries. (D) Predicting the most probable bound conformation of the peptide in complex with the target protein (peptide docking). This approach can be applied to guide optimization of peptide sequences derived by B or C, or to simultaneously optimize the conformation and the sequence of the peptide.
SEQUENCE-BASED APPROACHES FOR DISCOVERY OF PEPTIDE LEADS
Mimicking a Binding Partner
One popular strategy used to search for peptides that inhibit a protein-protein interaction is to mimic the sequence of one of the interacting proteins26 (Figure 1B). The following are examples of this strategy applied to several important drug targets.
Protein kinases are enzymes that catalyze the phosphorylation of tyrosine or serine/threonine residues. Protein kinases are involved in a myriad of signaling pathways and the regulation of their catalytic activity is crucial for healthy functioning of the cell. Most protein kinase inhibitors in laboratory use, in the clinic and under clinical development are small molecules that interfere with ATP binding of their target kinase.27, 28
Recently, approaches aimed at targeting the protein-protein interactions of kinases, rather than their ATP-binding site, have emerged. These include inhibiting the interactions of the catalytic domain with its regulatory domains (e.g., Refs. 29–31), inhibiting interactions of the regulatory SH2 domain with its ligands,32 and inhibiting kinase-substrate interactions.22, 33–35 The main approaches that use sequence data for selectively targeting interactions between kinases and other proteins are: mimicking a common region of unrelated binders of the same protein, mimicking the substrate, or mimicking the substrate-binding regions in the kinase.
Mochly-Rosen and coworkers derived a peptide based on the only homologous region between annexin I and a member of the 14-3-3 family, two unrelated binders of protein kinase C (PKC). The authors reasoned that the region of homology is what mediates PKC binding to these two proteins. Indeed, a 15-mer peptide corresponding to this sequence inhibited PKC binding to the purified or cloned receptor for activated C kinase (RACK), as well as to annexin I; the peptide inhibited βIIPKC function in cells, and its injection into a Xenopus oocyte inhibited insulin-induced oocyte maturation.29, 36, 37
Inhibiting kinase-substrate interactions by mimicking the sequence of known substrate peptides is an approach that is being actively pursued by several labs.22, 35, 38–40 For example, the sequence of the activation loop of insulin-like growth factor-1 receptor, which undergoes autophosphorylation, was used as a pharmacophore template to be mimicked by novel macrocyclic molecules.22
A complementary approach is to derive peptides based on the sequence of the enzyme rather than the substrate. We analyzed structures and sequences of protein kinases and identified well-defined substrate-binding regions consisting of accessible and variable patches distributed between conserved and buried anchors. The variable nature of these regions suggests that they are involved in determining substrate specificity. We used these substrate-binding regions from several different kinases to derive peptides of 7–11 amino acids in length, all myristoylated at the N-terminus and amidated at the C-terminus. These peptides inhibited the activity of kinases from which they were derived in both cell-based and cell-free assays (whereas control peptides failed to do so), and inhibited cancer cell proliferation in the micromolar range.41 Administration via injection into animal models has confirmed that the kinase-mimicking peptides were nontoxic and had the expected physiological effects.34, 42 Because the peptides are derived from a well-defined region in the kinase sequence, they can easily be designed for a large number of kinases, including those that are novel or not characterized.
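As a rough illustration of how candidate peptides of 7–11 residues might be enumerated from a chosen region of a kinase sequence, the sketch below lists every contiguous window in that length range and labels each with the terminal modifications mentioned above. This is a hypothetical enumeration scheme, not the selection procedure used in the cited work; the function names and the example region are invented.

```python
def candidate_peptides(region, min_len=7, max_len=11):
    """Enumerate every contiguous window of min_len..max_len residues."""
    return [region[start:start + length]
            for length in range(min_len, max_len + 1)
            for start in range(len(region) - length + 1)]


def with_terminal_caps(peptide):
    # Text label only: N-terminal myristoylation and C-terminal amidation.
    return f"Myr-{peptide}-NH2"


# Invented 12-residue region used purely as an example input.
region = "ACDEFGHIKLMN"
peptides = candidate_peptides(region)
print(len(peptides))                    # 20
print(with_terminal_caps(peptides[0]))  # Myr-ACDEFGH-NH2
```

In practice, such a list would be narrowed using structural criteria such as accessibility and sequence variability of the region.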
A similar idea was recently adopted for the inhibition of β-APP cleaving enzyme-1 (BACE1),43 a primary therapeutic target in Alzheimer's disease.43, 44 BACE1 inhibitory peptides of 9–12 residues in length were derived from the catalytic domains of BACE1 and N-myristoylated for cell permeability. These molecules inhibited the formation of the BACE1 product APPβ, and significantly reduced the level of the BACE1 product Aβ40, a neurotoxic peptide whose accumulation induces Alzheimer's disease. Control peptides did not show this inhibitory effect.43
gp41, a transmembrane protein which is part of the HIV-1 envelope glycoprotein (Env), mediates HIV-cell fusion. It contains four major functional domains: the fusion peptide, heptad repeat 1 (HR1), HR2, and the transmembrane domain that anchors gp41 to the viral lipid bilayer. The FDA-approved anti-HIV drug Enfuvirtide is a synthetically produced 36-amino-acid peptide derived from a continuous sequence within the HR2 domain of gp41. The fusion of host and viral membranes is a complex process, involving several conformational transitions. In one of those transitions, the HR2 domain “zips” into a hydrophobic groove on the outer face of a pre-existing HR1 helix bundle. Enfuvirtide inhibits this stage by competing with the HR2 domain for HR1 binding.45
G-protein coupled receptors (GPCRs) are seven-transmembrane proteins that function as signaling switches throughout the body and regulate virtually every physiological response.46, 47 They constitute one of the largest gene families targeted for drug discovery.47 Peptides (usually around 20 amino acids in length) that mimic GPCR transmembrane domains48–50 are thought to inhibit GPCR oligomerization. Peptides of 10–20 amino acids that mimic the intracellular loops of GPCRs51–53 were designed to modulate interactions with downstream proteins. Shorter peptides and peptidomimetics that mimic the endogenous peptidic ligands of some GPCRs are under active development.21, 54
Integrins are a family of heterodimeric cell-surface receptors that regulate cell attachment and response to the extracellular matrix. Many integrins recognize a sequence in the matrix protein that contains the core amino-acid triplet “Arg-Gly-Asp” (RGD). Cilengitide is a cyclic, ligand-mimicking Arg-Gly-Asp peptide that binds to and inhibits the activities of the αvβ3 and αvβ5 integrins, thereby inhibiting endothelial cell-cell interactions, endothelial cell-matrix interactions, and angiogenesis.55 Cyclization of the peptide increases its stability against degradation in serum and introduces conformational constraints that make it selective with respect to other integrins. The compound is currently in clinical trials for the treatment of glioblastoma.
Finding an Optimal Synthetic Peptide by a Wide-Scale Screen
Recent studies have used wide-scale screens of synthetic or phage-display peptides for binding to a protein of interest (schematically represented in Figure 1C). Combinatorial approaches to peptides as protein-site mimetics have been discussed in a recent review.56 Such screens of either random57–61 or biased22, 38, 62 libraries of peptides up to 20 amino acids in length63 may identify potent peptidic inhibitors that are not necessarily similar to native ligands of the protein, but nevertheless efficiently inhibit the protein-protein interaction of interest.
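The size of the sequence space explains why libraries are usually random samples or biased subsets rather than exhaustive collections. The short sketch below counts the sequences of a given length, assuming only the 20 standard amino acids at every position; the function names are illustrative.

```python
def library_size(length, alphabet_size=20):
    """Number of distinct sequences of exactly `length` residues."""
    return alphabet_size ** length


def cumulative_library_size(max_length, alphabet_size=20):
    """Number of distinct sequences of 1..max_length residues."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


print(library_size(5))             # 3200000 pentapeptides
print(f"{library_size(20):.2e}")   # number of 20-mers, in scientific notation
```

Even for pentapeptides the space already contains millions of sequences, and for 20-mers it is far beyond what any physical library can cover.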
Lead inhibitory peptides, whether based on a known binding partner or obtained from a library screen, often serve as merely the first step in developing an inhibitor. For example, rational changes may be aimed at enhancing the stability of the active conformation, affinity, selectivity and other properties of the peptide. Towards this goal, detailed structural information about the interaction between the peptide and the target protein is most useful. However, such information may be difficult to acquire, both experimentally and computationally. In the following, we describe the challenges and advances in the development of computational tools for structure-based peptide design.
STRUCTURE-BASED APPROACHES FOR PEPTIDE DESIGN AND OPTIMIZATION
The key challenge for computational prediction of protein-peptide interactions is the flexibility of the systems under consideration: peptides have many more degrees of freedom than typical small molecules. Accounting for this flexibility is crucial for the rational design and optimization of peptides and peptidomimetics. In addition, some disruptors of protein-protein interactions aim to stabilize the inactive form of the protein. Thus, the rational design of such inhibitors must take into consideration the conformational plasticity of the protein and the interplay between different conformations. We focus on the methods used to predict the detailed interaction between the protein and the peptide, namely “peptide docking” (schematically represented in Figure 1D).
An excellent review on ligand docking to proteins can be found in Sousa et al.64 The ability of docking methods to dock ligands into a known native structure has been evaluated in several recent papers.65–68 Redocking (docking of the ligand back into its partner, which is kept in the “bound” conformation) of ligands with up to 10 and sometimes even 20 rotatable bonds is currently achievable with several docking algorithms. However, docking of highly flexible ligands such as peptides (which have four rotatable bonds per residue on average) remains nontrivial. There have been only a few comparative studies for such large ligands. In a comparative redocking study performed by Bursulaya et al., two ligands had about 30 rotatable bonds.67 ICM69 redocked both within 2 Å; one of these ligands was also successfully docked by AutoDock,70 while its pose for the other had a root mean square deviation (RMSD) of 9.10 Å. DOCK,71 FlexX,72 and GOLD73 misdocked these highly flexible ligands (with RMSD >5.9 Å).67 Another comparative test set74 included five ligands with more than 25 rotatable bonds: Surflex75 redocked three of them with RMSD <2 Å, FlexX72 and FRED76 redocked two, GOLD73 and GLIDE77 redocked one, and DOCK71 did not redock any of these flexible ligands within 2 Å.
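The RMSD criterion used in these comparisons, and the rough estimate of peptide flexibility, can be written down in a few lines. This is a minimal sketch, assuming that the predicted and reference ligand coordinates list the same atoms in the same order and are already expressed in a common frame (for example, after superimposing the protein); the function names and the coordinates are illustrative.

```python
import math


def pose_rmsd(predicted, reference):
    """RMSD between two equally ordered lists of (x, y, z) atom coordinates.

    No ligand superposition is performed: both poses are assumed to be in the
    same frame of reference.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum((p - r) ** 2
                  for atom_p, atom_r in zip(predicted, reference)
                  for p, r in zip(atom_p, atom_r))
    return math.sqrt(squared / len(predicted))


def approx_rotatable_bonds(n_residues, per_residue=4):
    # Rough estimate using the average of four rotatable bonds per residue.
    return n_residues * per_residue


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(pose_rmsd(predicted, reference))  # 2.0
print(approx_rotatable_bonds(8))        # 32
```

By this estimate, even an octapeptide has about as many rotatable bonds as the most flexible ligands in the benchmark sets discussed above.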
A few procedures have been developed specifically for peptide docking. A heuristic search of the potential energy surface of peptides in the binding grooves of (rigid) proteins was tested on a dataset of 56 complexes.78 While all 20 tripeptides were successfully redocked, only 3 of the 36 larger peptides (tetra-, penta-, hexa-, and heptapeptides) could be redocked within 2 Å, suggesting that the sampling strategy needs further adaptation for the longer peptides.
Modeling of the complexes between various nuclear receptors (NRs) and peptide coactivators was performed starting from the bound conformations of both partners and using a knowledge-based coarse-grained potential and replica exchange Monte Carlo sampling.79 Only 1 out of 10 peptides redocked within 2 Å.
Limiting the conformational space using knowledge-based constraints is often crucial for successful peptide docking. The constraints may come from experiments (such as NOE data80) or from previous knowledge of conserved positions for some of the peptide residues.81–83
Scheraga and coworkers used a “ligand-growing” procedure. Optimization of the ECEPP/3 potential function84 supplemented by the experimental NOE constraints was carried out using a collective variable Monte Carlo minimization technique.85 This approach facilitates structure determination from NMR experiments of weakly binding systems. It was applied to determine the structure of a fibrinogen A α-like peptide bound to an active-site mutant of thrombin.80
Schafroth and Floudas defined the pockets binding each of the peptide residues based on the known structure of the complex of a major histocompatibility complex (MHC) molecule HLA-DRB1*0101 with a cognate peptide, and used deterministic global optimization to dock MHC peptides. The pockets were predefined and considered to be independent of each other and the protein was considered to be rigid.86 Another work on MHC peptides made use of the fact that N-terminal and C-terminal parts of the peptides dock to well-conserved sites. The edges were docked as rigid bodies, and the rest of the peptide was placed using a loop-closure technique, achieving Cα RMSD <1.5 Å for 9- to 10-amino-acid-long peptides (heavy atom RMSD was not reported).87
Well-defined anchoring points of peptides to proteins such as in MHC-peptide binding also occur in other cases: many linear motifs have a well-defined docking region for the conserved residues but more variable docking areas for the residues flanking the motif.88 Two of the applications described below81, 83 utilize the anchoring point idea and succeed not only in redocking, but also in the much more challenging task of docking peptides to other structures and to homology models.
Our protocol for flexible peptide docking,81 implemented in CHARMM,89 is based on a simulated annealing molecular dynamics (MD) approach. An anchoring point between the peptide and the protein must be known. Then, the peptide-protein complex is heated to an elevated temperature (the peptide does not “evaporate” because of the tethered anchor) and the conformational space in the binding region is explored. Each of the hundreds of conformations obtained from the heated trajectory is cooled, and then optimal side-chain rotamers are chosen using SCWRL3.0.90 The resulting poses are minimized and scored by the sum of the interaction energies between the protein and the peptide, and the internal energy of the peptide. We tested the ability of this method to predict peptide binding to PDZ domains; the well-conserved position of the Cα atom of the C-terminal residue of the peptide was used as an anchoring point. Redocking to native structures yielded excellent results (RMSD <2 Å) for the six tested penta- to octapeptides. In docking to other structures of the same protein (either apo, or originally complexed with a different peptide), and even to homology models, we found that in 9 out of 12 cases, the best-scoring poses had RMSD <2.8 Å for heavy atoms of tetra- to octapeptides.81
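The final scoring step of such a protocol, in which each minimized pose is scored by the sum of the protein-peptide interaction energy and the internal energy of the peptide, can be sketched as follows. The energy values here are invented placeholders; in an actual run they would come from the molecular mechanics program, and the `Pose` class and `rank_poses` function are illustrative names rather than part of any package.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    label: str
    interaction_energy: float       # protein-peptide interaction energy
    peptide_internal_energy: float  # internal energy of the peptide

    @property
    def score(self) -> float:
        # Score = interaction energy + peptide internal energy (lower is better).
        return self.interaction_energy + self.peptide_internal_energy


def rank_poses(poses):
    """Return poses sorted from best (lowest score) to worst."""
    return sorted(poses, key=lambda pose: pose.score)


# Invented energies (arbitrary units) for three cooled and minimized poses.
poses = [
    Pose("frame_012", -85.0, 30.0),
    Pose("frame_140", -110.0, 42.0),
    Pose("frame_305", -95.0, 20.0),
]
best = rank_poses(poses)[0]
print(best.label, best.score)  # frame_305 -75.0
```

Including the internal energy penalizes poses that achieve favorable contacts only by straining the peptide.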
One of the motivations for studying the structures of PDZ-peptide complexes is the puzzling specificity profile of these interactions: one peptide may bind to dissimilar PDZ domains, while the same PDZ domain may bind dissimilar peptides.91, 92 Recent wide-scale peptide-library studies on PDZ specificity58, 60 have shed more light on the propensities of different PDZ domains towards peptides of particular sequences, but the ability of the same PDZ domain to bind peptides that have residues with different physical properties at a given position is still apparent (see Figure 3 in Tonikian et al.58). As suggested previously,92 this variability probably reflects structural differences in the binding conformations of the peptides and/or their binding-site residues, which enable accommodation of the peptides in the binding groove. The interest in PDZ domains as drug targets, and the attempts to develop peptidomimetic or small-molecule PDZ inhibitors20, 93 require structural data to complement the vast available sequence data. Since determining the crystallographic structure of all peptide-PDZ complexes is unfeasible, docking is an essential procedure.
Bordner and Abagyan83 tested and optimized peptide-MHC docking simulations using a flexible all-atom model of the complete peptide. A biased-probability Monte Carlo minimization method94 implemented in the ICM program, combined with grid potentials to represent interactions with the MHC, allowed a computationally efficient sampling of the peptide's degrees of freedom. Comparison of all available X-ray structures revealed conserved hydrogen bonds between the N and C termini of the peptide and particular MHC residues. Based on this observation, backbone atoms of the peptide's N and C termini were tethered to these conserved positions by a quadratic energy restraint. This work was not limited to redocking: it achieved accurate prediction (0.75 Å backbone RMSD) for cross-docking of a highly flexible decapeptide, as well as docking predictions using homology models with average backbone RMSDs of less than 1.0 Å.83
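A quadratic (harmonic) tether of the kind described above penalizes an atom in proportion to the square of its distance from the conserved position. The sketch below assumes the form E = k·d², with no factor of one half and no flat-bottom region; the exact functional form and force constant used in the cited work are not given here, so both are assumptions.

```python
import math


def tether_energy(position, anchor, force_constant):
    """Quadratic restraint energy E = k * d**2 for one tethered atom.

    `position` and `anchor` are (x, y, z) tuples; `force_constant` is k in
    energy units per squared length unit (assumed convention).
    """
    distance = math.dist(position, anchor)
    return force_constant * distance ** 2


anchor = (0.0, 0.0, 0.0)
print(tether_energy((0.0, 0.0, 0.0), anchor, 2.0))  # 0.0
print(tether_energy((3.0, 4.0, 0.0), anchor, 2.0))  # 50.0
```

The penalty vanishes at the conserved position and grows rapidly with displacement, so the termini stay near their anchors while the rest of the peptide is sampled freely.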
Sood and Baker aimed to recover not only the conformations but also the sequences of known binding peptides.82 Starting from an anchor point, the authors extended the peptide backbone by using a library of about 500,000 peptide fragments of the appropriate length, which was precompiled from a nonredundant subset of the Protein Data Bank (PDB). Extensions that resulted in steric clashes between backbone atoms, or that extended in such a way that there was little chance of favorable interactions with the target protein, were eliminated. Side chains were then added using RosettaDesign, and a Monte Carlo search was used to sample all rotamers of all residues at the protein-peptide interface and identify the sequence with the lowest energy. In cases of little sequence variation of the native peptide ligands, the protocol generated models with low Cα-RMSDs from the native structure, and the consensus sequences of the models recapitulated those observed in phage-display or site-directed-mutagenesis experiments. This is a promising approach to simultaneously model backbone flexibility and sequence diversity, which may prove useful in protein-peptide interface design and prediction.
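Comparing designed models with experimental data requires a consensus sequence over the models. A minimal sketch of a per-position majority vote is shown below; it assumes that all designed sequences have the same length and are already aligned, and the example sequences are invented. Ties are resolved in favor of the residue encountered first, which is a property of this sketch rather than of the cited protocol.

```python
from collections import Counter


def consensus_sequence(sequences):
    """Most frequent residue at each position of equal-length sequences."""
    if not sequences or len({len(s) for s in sequences}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*sequences))


# Invented designed sequences from three hypothetical models.
models = ["PTPPLP", "PTPALP", "ATPPLP"]
print(consensus_sequence(models))  # PTPPLP
```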
CHALLENGES AND FUTURE DIRECTIONS
Advances in peptide docking and design (for example, Refs. 95–98) have raised hope for the tractability and relevance of rational design, including docking of peptides and other large and flexible molecules to proteins. Below we highlight several directions that await further developments.
Calculation of Binding Affinity and Binder/Nonbinder Classification
In addition to the structural details of interactions that may be obtained by docking or experimentally, it is important to be able to predict the affinity of binding (or complete lack of binding) of peptides to target proteins. The binding affinity is determined by the interplay between energy and entropy. Upon binding, the ligand becomes less mobile, and the resulting loss in configurational entropy opposes the attractive forces that drive binding.99 Energy models often account for changes in torsional entropy with a term related to the number of rotatable bonds in a ligand. However, recent results suggest that the physical model underlying these rotamer-counting approaches overestimates the number of stable conformations accessed by the ligand in the free state. Vibrational entropy, on the other hand, is often underestimated. The common assumption that different ligands lose the same amount of rotational and translational entropy upon binding also appears to be unsupported.100 These challenges apply to ligands in general, but are even more significant for flexible peptides. Nevertheless, a combination of (even nonideal) physics-based methods with machine learning approaches appears to be promising.83, 101 For example, a support vector machine (SVM) was employed by Hou et al. to build classification models based on the molecular interaction energy of peptides with an SH3 domain. The best SVM classifier had prediction accuracies of 78% and 91% for the binding and nonbinding peptides, respectively.101
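Reporting separate accuracies for binders and nonbinders, as in the SVM example above, amounts to computing the fraction of correct predictions within each class. The following sketch shows this calculation for boolean labels; the function name and the example labels are invented, and the code assumes that both classes are present in the data.

```python
def per_class_accuracy(true_labels, predicted_labels):
    """Return (accuracy on binders, accuracy on nonbinders).

    Labels are booleans: True for a binder, False for a nonbinder.
    Both classes must occur at least once in `true_labels`.
    """
    correct = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    return correct[True] / total[True], correct[False] / total[False]


# Invented labels for eight peptides: four binders followed by four nonbinders.
truth = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False]
print(per_class_accuracy(truth, predicted))  # (0.75, 1.0)
```

Separate figures matter because a classifier can score well overall while performing poorly on the less populated class.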
Incorporation of Explicit Water
Water molecules can modify the shape and flexibility of a ligand-binding site in a protein, improving the complementarity between the two. In drug design, attempts have been made to incorporate water molecules into ligands at their binding sites with the goal of improving binding affinities, as in the design of a class of HIV protease inhibitors.102 Interestingly, however, the displacement of interfacial water molecules does not always lead to an increase in the ligand's binding affinity. It has been shown that the inclusion of critical water molecules can lead to significant improvement in docking results,102 and that some, but not all, crystallographic water molecules can make significant contributions to protein-ligand binding free energies.103 Explicit solvent MD and calculation of entropic contributions in preformed protein-peptide complexes have been investigated (for example, Ref. 104). The simulation of the dynamic process of peptide binding to a protein in explicit solvent is another exciting direction. The metadynamics algorithm was used to investigate the role of two peptides in the folding of the HIV-1 protease monomer.7 This algorithm is aimed at reconstructing the multidimensional free energy of complex systems, and is based on artificial dynamics performed in the space defined by a few collective coordinates, which are assumed to provide a coarse-grained description of the system. The dynamics is driven by the free energy of the system and is biased by a history-dependent potential constructed as a sum of Gaussians centered along the trajectory of the collective variables.105 The dynamic process of peptide binding to an SH3 domain has been simulated using unbiased atomistic MD simulations in explicit water.106 Thirteen independent, 50–150 ns long simulations were performed starting from peptide-protein distances of 13–20 Å. One of these runs resulted in a near-native conformation of the complex.
The simulated binding process was shown to have three characteristic phases: an initial rapid diffusion, a nonspecific encounter complex stabilized by salt bridges, and then a specific stable bound complex. The transition from the first to the second phase is driven by long-range electrostatic forces, while use of explicit solvent revealed that the transition to the last phase is accelerated by partial dewetting of the hydrophobic interface.
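The history-dependent bias used in metadynamics, described above as a sum of Gaussians centered along the trajectory of the collective variables, can be illustrated for a single collective variable. The sketch below evaluates V(s) = Σ w·exp(−(s − sᵢ)²/(2σ²)) over previously visited values sᵢ; the Gaussian height w, width σ and the visited values are arbitrary illustrative choices, and a real simulation would also propagate the dynamics under this bias.

```python
import math


def bias_potential(s, visited, height=1.0, width=0.5):
    """Metadynamics-style bias at collective-variable value `s`.

    `visited` holds the collective-variable values at which Gaussians were
    deposited earlier in the trajectory; `height` and `width` are the
    Gaussian height and standard deviation (illustrative values).
    """
    return sum(height * math.exp(-((s - center) ** 2) / (2.0 * width ** 2))
               for center in visited)


# Two Gaussians deposited at the same point double the bias there, while the
# bias far from any visited point is negligible.
print(bias_potential(0.0, [0.0, 0.0]))  # 2.0
print(bias_potential(10.0, [0.0]) < 1e-10)  # True
```

As Gaussians accumulate in frequently visited regions, the system is pushed toward unexplored values of the collective variable, and the accumulated bias provides an estimate of the underlying free energy.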
Docking Without Prior Knowledge of Interaction Region or Anchoring Point
There are cases in which an extended surface is available for the protein-peptide interactions.96 In such cases, biochemical (for example, mutational) data typically need to be included to narrow down the possible regions of interaction.96 Purely ab initio prediction of the peptide-protein binding site remains a formidable challenge. Nevertheless, recent work has demonstrated an example of successful ab initio docking followed by redesign of a peptide, probably facilitated by the fact that the peptides in that study were cyclic.98
A somewhat related issue is that of prediction of peptide aggregation, relevant to the study of Alzheimer's and other protein misfolding diseases.107 The mechanism by which peptides derived from the C-terminus of the Aβ42 protein inhibit the formation of neurotoxic oligomers of this protein (considered to be the cause of Alzheimer's disease) was investigated using a coarse-grained protein model and a discrete MD approach.108 Although incapable of providing a high-resolution structure of the complex, such methods enable the rationalization of experimentally observed phenomena.109
Peptide-Like Molecules Including Non-Natural Amino Acids or Building Blocks
Modification of lead peptides often includes introduction of non-natural amino acids or modulation of the backbone. For example, cyclization19–22, 110 and the inclusion of non-native residues such as N-methylated111, 112 or D-amino acids42 may improve the stability, cell permeability and efficacy of natural peptide leads. Sequence-based tools or knowledge-based protein force fields cannot be applied to these compounds, which are no longer merely short proteins. Structure-based approaches are applicable, but require adaptations, such as development of force-field parameters that would describe the potential energy surface of the modified amino acids.
Transformation into Small Molecules
A good model of a protein-peptide complex enables the determination of a pharmacophore model (i.e., geometrical arrangements of chemical features such as hydrogen bonding and electrostatic and hydrophobic interactions). This model may then be used to design small molecules that mimic the peptide.113 New computational tools are being developed to facilitate the search for such peptide-mimicking molecules.114–116 In a recent study, structures of peptides that correspond to known linear motifs, and for which structures in complex with proteins are available, were used to search against FDA-approved drugs to identify similar compounds.117 It will be interesting to see how these approaches contribute to the field of rational design of peptidomimetic and peptide-inspired inhibitors.
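One simple way to think about a pharmacophore as a geometrical arrangement of features is to compare the distances between corresponding features in a model and in a candidate molecule. The sketch below does this for features listed in the same order; the feature names, coordinates and the 1 Å tolerance are invented for illustration, and practical tools additionally handle feature assignment, conformer generation and alignment.

```python
import math
from itertools import combinations


def pairwise_distances(features):
    """Distances between all pairs of (feature type, (x, y, z)) entries."""
    return [math.dist(a[1], b[1]) for a, b in combinations(features, 2)]


def matches_pharmacophore(model, candidate, tolerance=1.0):
    """True if the candidate has the same feature types in the same order and
    every inter-feature distance agrees with the model within `tolerance`."""
    if [f[0] for f in model] != [f[0] for f in candidate]:
        return False
    return all(abs(d_model - d_cand) <= tolerance
               for d_model, d_cand in zip(pairwise_distances(model),
                                          pairwise_distances(candidate)))


# Invented three-feature model (distances 3, 4 and 5 A) and two candidates.
model = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
         ("hydrophobe", (0.0, 4.0, 0.0))]
close = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.5, 0.0, 0.0)),
         ("hydrophobe", (0.0, 4.2, 0.0))]
far = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (8.0, 0.0, 0.0)),
       ("hydrophobe", (0.0, 4.0, 0.0))]
print(matches_pharmacophore(model, close))  # True
print(matches_pharmacophore(model, far))    # False
```

Comparing internal distances avoids the need to superimpose the candidate on the model, at the cost of not distinguishing mirror-image arrangements.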
CONCLUSION
The combination of pharmaceutical need and recent advances in peptide synthesis makes it a particularly exciting time in the rational design of peptidic compounds and in the development of adequate computational methods. Indeed, computational tools suitable for the challenge of modeling peptide-protein interactions are being used by several groups and are starting to contribute to the modulation of protein-protein interactions and drug discovery. The development of force-field potentials suitable for peptide-like molecules which include non-natural amino acids is an important and promising direction. The peptidomimetics field is likely to benefit from the synergistic combination of biochemical studies, synthetic efforts and computational (sequence- and structure-based) techniques. We believe that computational effort directed at the many cases in which anchoring information of the peptide to the target protein is available will prove to be most fruitful, and will likely contribute to discovery and development of novel compounds and eventually drug candidates.