Quantitative Structure–Activity Relationships and Docking Studies of Calcitonin Gene-Related Peptide Antagonists


Corresponding author: Anahita Kyani, anahita_kyani@yahoo.com


Defining the role of calcitonin gene-related peptide in migraine pathogenesis could lead to the application of calcitonin gene-related peptide antagonists as novel migraine therapeutics. In this work, quantitative structure–activity relationship modeling of the biological activities of a wide range of calcitonin gene-related peptide antagonists was performed using a panel of physicochemical descriptors. The computational studies evaluated different variable selection techniques and demonstrated shuffling stepwise multiple linear regression to be superior to genetic algorithm-multiple linear regression. The linear quantitative structure–activity relationship model showed better cross-validation statistics than the non-linear support vector regression technique. Implementing only five peptide descriptors in this linear quantitative structure–activity relationship model resulted in a robust and highly predictive model with calibration, leave-one-out and leave-20-out validation R2 of 0.9194, 0.9103, and 0.9214, respectively. We performed docking of the most potent calcitonin gene-related peptide antagonists with the calcitonin gene-related peptide receptor and demonstrated that peptide antagonists act by blocking access to the peptide-binding cleft. We also demonstrated the direct contact of residues 28–37 of the calcitonin gene-related peptide antagonists with the receptor. These results are in agreement with the conclusions drawn from the quantitative structure–activity relationship model, indicating that both electrostatic and steric factors should be taken into account when designing novel calcitonin gene-related peptide antagonists.

Calcitonin gene-related peptide (CGRP) is a potent 37 amino acid neuropeptide generated by alternative tissue-specific splicing of the primary transcript of the calcitonin gene. CGRP was first discovered when alternative processing of RNA transcripts from the calcitonin gene was shown to result in the production of distinct CGRP-encoding mRNAs (1). Calcitonin gene-related peptide signals through a seven-transmembrane G-protein-coupled receptor that belongs to the secretin receptor family. The CGRP receptor is a heterodimer of the G-protein-coupled calcitonin receptor–like receptor (CLR) and receptor activity modifying protein 1 (RAMP1) as an accessory protein. Receptor activity modifying protein 1 facilitates the expression of CLR on the cell surface and is also necessary for the ligand-binding activity of the receptor (2). Calcitonin gene-related peptide is broadly expressed throughout the central and peripheral nervous systems (3) as well as in the heart, blood vessels, pituitary gland, thyroid gland, lungs and gastrointestinal tract. Although CGRP has a broad range of physiological and biological effects including neuromodulation, vasodilatation, cardiac contractility, bone growth, and mammalian development (4), its vasodilatory effect is the best studied (5). It is postulated that the release of CGRP is initiated early in the migraine process and increases as the headache intensifies (6). The role of CGRP in the pathogenesis of migraine has made CGRP antagonists attractive therapeutic candidates.

Recent studies have demonstrated that it is possible to design high-affinity CGRP antagonist analogs with significantly increased potency, selectivity, and metabolic stability as compared to the human αCGRP (8–37)-amide antagonist peptide (7). Structure–activity studies aimed at designing novel antagonists have been challenging, however, as the structure of the receptor has only recently been reported (8). As a result, antagonist interaction and potency have been hard to predict.

Quantitative structure–activity relationship (QSAR) modeling is of central importance to computational drug design as it provides useful information regarding medicinal chemistry and drug activity. These relationships are mathematical equations that relate chemical structures to a wide variety of physical, chemical, and biological properties. The resultant equations can be used to predict the properties of novel compounds in silico. Thus, we propose to use QSAR and docking studies to explain the recent CGRP antagonist findings and create a tool for the design of novel CGRP antagonists. This will also shed light on the mechanism of action of the CGRP antagonists and the properties that affect their potency.

In this work, we intended to build a powerful QSAR model for two different classes of CGRP antagonists using a wide range of descriptors. Shuffling stepwise multiple linear regression (MLR) was used as a variable selection technique to choose the best descriptors from different subsets of conventional (9) and inductive (10) descriptors and enable modeling of the pIC50 values of the CGRP antagonists. The predictive ability of the model was examined using a wide range of validation techniques. Shuffling stepwise MLR was compared with genetic algorithm-MLR (GA-MLR) for selecting the important features of the peptides included in this study. The linear model presented was compared with the non-linear model developed using the support vector regression technique. The linear model was used to predict the biological activity of peptides in the external set having less-accurate pIC50 values. Docking of the most potent peptides with the CGRP receptor was performed to confirm the binding site of the antagonists and determine the antagonizing mechanism of these peptides.

Materials and Methods

Software, data set and descriptor generation

Published data from two experimental structure–activity studies containing 97 peptides (representing systematic modifications of the C-terminal segments CGRP Y0-28-37 and CGRP 27–37) were used as data sets (7,11). The IC50 values (in M) were transformed to pIC50 to decrease the heteroscedasticity of the data. The sequences of the peptides and their corresponding pIC50 values are shown in Table 1.
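The transformation can be illustrated with a minimal sketch, assuming the usual definition pIC50 = −log10(IC50) with IC50 in mol/L. The function name `pic50` is hypothetical; the two input values are IC50 figures quoted elsewhere in this article.

```python
import math

def pic50(ic50_molar: float) -> float:
    """Return pIC50 = -log10(IC50), with IC50 given in mol/L (assumed definition)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)

# Bound reported for peptides 81, 92 and 93 in Table 1 (IC50 > 3E-6 M)
print(round(pic50(3e-6), 2))
# Peptide No. 5 (IC50 = 0.2 nM, Figure 6)
print(round(pic50(0.2e-9), 2))
```

On this scale a tenfold change in IC50 shifts pIC50 by one unit, which is why the spread of the residuals becomes more uniform across the activity range.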

Table 1.   Experimental and calculated pIC50 values (IC50 in M) and sequences for CGRP antagonists

No.   Peptide sequence (single letter code)                pIC50 (Exp.)        pIC50 (Cal.)
23*   Ac-WVTH(Cit)LAGLLS(Cit)SGGVV(hArg)KNFVPTDVGpFAF-NH2  6 (IC50 > E-6)      8.49
50*   YVPTAVGSEAF                                          4 (IC50 > E-4)      4.92
52*   YVPTNVGSAAF                                          5 (IC50 > E-5)      5.00
60*   YVPTnVGSEAF                                          4 (IC50 > E-4)      5.12
61*   YVPTNVGsEAF                                          4 (IC50 > E-4)      4.91
62*   YVPTNVGSEaF                                          4 (IC50 > E-4)      4.87
64*   YVPSNVGSEAF                                          4 (IC50 > E-4)      5.09
74*   FVPPNVGSEAF                                          5 (IC50 > E-5)      6.13
76*   FVPTNPGSEAF                                          5 (IC50 > E-5)      5.72
77*   FVPTNVPSEAF                                          5 (IC50 > E-5)      5.70
79*   FVPTNVGSPAF                                          5 (IC50 > E-5)      5.75
80*   FVPTNVGSEPF                                          5 (IC50 > E-5)      5.62
81*   FVPTDVGSEAF                                          5.52 (IC50 > 3E-6)  6.15
92*   FVPTDVG-Acp-FAF                                      5.52 (IC50 > 3E-6)  6.29
93*   FVPTDVG-Pac-FAF                                      5.52 (IC50 > 3E-6)  5.86

  1. CGRP, calcitonin gene-related peptide.
  2. *Molecules used as the external test set. Entries such as 5.52 (IC50 > 3E-6) give the pIC50 transformed from the less-accurate IC50 values of references (7,11), with the IC50 bound in parentheses.

The peptides were built as fully extended chains in MOE (Molecular Operating Environment v. 2006.10; Chemical Computing Group Inc., Montreal, Canada, 2006) software, and their structures were simulated using CHARMM27 force field settings enabling ‘bonded, van der Waals, electrostatics and restraints’ in Born solvation. A total of 211 conventional descriptors and 48 inductive descriptors (9,10) were calculated for the 97 peptides of the data sets. The Dragon program package, developed by Milano Chemometrics and QSAR Group (Dragon, v. 5; TALETE srl, Milano, Italy), was employed for calculating a total of 3132 different descriptors for each of the peptides in Table 1. To decrease the redundancy in the descriptor data matrix, descriptors with more than 90% constant or zero values and collinear descriptors (i.e., R2 > 0.90) were deleted from the data matrix. A total of 132 descriptors were retained after removing the constant, zero and highly correlated ones. Thus, the next computational steps were performed on a 97 × 132 data matrix including the 132 descriptors for the 97 CGRP antagonists. All programs for variable selection and modeling were written in MATLAB (version 7.1; MathWorks Inc., Natick, MA, USA).
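The redundancy filter described above can be sketched as follows. This is an illustration in Python with NumPy rather than the authors' MATLAB code; the function name `filter_descriptors` and the rule of keeping the first descriptor of each collinear pair are assumptions, since the text does not say which member of a pair was retained.

```python
import numpy as np

def filter_descriptors(X, const_frac=0.90, r2_max=0.90):
    """Return indices of columns kept after the two redundancy filters."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    keep = []
    for j in range(X.shape[1]):
        col = X[:, j]
        # Fraction of rows sharing the most common value (covers all-zero columns).
        _, counts = np.unique(col, return_counts=True)
        if counts.max() / n > const_frac:
            continue
        # Assumption: drop a column that is collinear with one already kept.
        collinear = False
        for k in keep:
            r = np.corrcoef(col, X[:, k])[0, 1]
            if r * r > r2_max:
                collinear = True
                break
        if not collinear:
            keep.append(j)
    return keep

# Synthetic demonstration: column 1 duplicates column 0, column 2 is all zeros.
rng = np.random.default_rng(0)
a, b = rng.normal(size=50), rng.normal(size=50)
X_demo = np.column_stack([a, 2 * a + 1, np.zeros(50), b])
kept = filter_descriptors(X_demo)
print(kept)
```

Because the collinearity check runs against descriptors already kept, the result depends on column order; a different tie-breaking rule would retain a different but equally non-redundant subset.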

Variable selection and model building

First, the data matrix containing all descriptors was subjected to principal component analysis, and the first two principal components were plotted against each other (Figure 1). As demonstrated in the principal component analysis score plot, the peptide molecules were divided into two classes. These two classes correspond to the two data sets with different peptide sequences (7,11). Thus, a model capable of predicting the activity of both classes of peptides over a wide range of activities can be useful for future peptide design studies. To reach this goal, the 82 peptides with experimental values of pIC50 were used for model building, and the remaining 15 peptides with less-accurate values of pIC50 were placed in the external test set (Table 1). Because of the wide range of peptide sequences, we used shuffling in both the variable selection and model building phases to decrease the probability of chance correlation.
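A score plot of this kind can be computed along the following lines. The sketch assumes autoscaled (mean-centered, unit-variance) descriptors, which the text does not specify, and uses synthetic data with two artificial classes in place of the 97 × 132 descriptor matrix; `pca_scores` is a hypothetical helper name.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return PCA scores and explained-variance fractions via SVD."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant columns
    Z = (X - X.mean(axis=0)) / std           # autoscaling (an assumption)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]

# Synthetic data: two classes of 30 samples shifted along every descriptor.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (30, 5)),
                    rng.normal(4.0, 1.0, (30, 5))])
scores, explained = pca_scores(X_demo)
# The two classes fall on opposite sides of PC1.
print(scores[:30, 0].mean(), scores[30:, 0].mean())
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the score plot; class separation along PC1 is what indicates two distinct groups of samples.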

Figure 1.

 Score plot of principal component analysis for total data set molecules.

Employing a powerful variable selection method that finds the most relevant variables, and thereby yields a model with high predictive power, is an important step of the work. As 132 molecular descriptors were available for model building and only a subset of them would be statistically significant in terms of correlation with pIC50, variable selection can lead to an optimal QSAR model. In the literature, an extensive range of methods for variable selection has been applied, such as stepwise, forward, backward, principal component analysis (12), genetic algorithms (13,14), and shuffling-adaptive neuro-fuzzy inference system (15).

In this study, we used shuffling stepwise MLR to select the most relevant descriptors having a linear correlation with biological activity. The MATLAB program performed stepwise MLR selection 1000 times, each time on a different random split of the peptide data into training and test sets. In each split, a test set comprising 25% of the whole data was selected at random, and the remaining data were employed as the training (calibration) set for stepwise selection. Pooling the descriptors selected across all 1000 splits gives a frequency plot indicating which descriptors are selected most often by stepwise MLR. The descriptors selected using the shuffling stepwise MLR method were applied for the construction of a linear model.
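The procedure can be sketched as below. This is an illustrative Python version, not the authors' MATLAB program: it uses forward selection only, with a fixed partial-F entry threshold (`f_enter=4.0`) as a stand-in for whatever entry and removal criteria the original stepwise routine applied, and it runs on synthetic data. All function names and parameters are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_stepwise(X, y, f_enter=4.0, max_vars=10):
    """Forward selection: repeatedly add the descriptor with the largest partial F."""
    n, p = X.shape
    selected = []
    current = rss(X[:, :0], y)                  # intercept-only model
    while len(selected) < max_vars:
        best_j, best_f, best_rss = None, f_enter, None
        for j in range(p):
            if j in selected:
                continue
            new = rss(X[:, selected + [j]], y)
            dof = n - len(selected) - 2         # residual degrees of freedom
            if dof <= 0 or new <= 0:
                continue
            f = (current - new) / (new / dof)
            if f > best_f:
                best_j, best_f, best_rss = j, f, new
        if best_j is None:                      # no candidate passes the threshold
            break
        selected.append(best_j)
        current = best_rss
    return selected

def shuffling_stepwise(X, y, n_rounds=1000, test_frac=0.25, seed=0, **kw):
    """Selection frequency (%) of each descriptor over random training splits."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_train = n - int(round(test_frac * n))
    counts = np.zeros(p, dtype=int)
    for _ in range(n_rounds):
        train = rng.permutation(n)[:n_train]    # remaining 25% is held out
        for j in forward_stepwise(X[train], y[train], **kw):
            counts[j] += 1
    return 100.0 * counts / n_rounds

# Synthetic demonstration: only descriptors 2, 7 and 11 carry signal.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(82, 20))
y_demo = (6.0 + 2.0 * X_demo[:, 2] - 1.5 * X_demo[:, 7]
          + 1.0 * X_demo[:, 11] + rng.normal(scale=0.3, size=82))
freq = shuffling_stepwise(X_demo, y_demo, n_rounds=100)
top3 = sorted(int(j) for j in np.argsort(freq)[-3:])
print(top3)
```

Descriptors that enter the model only for particular splits accumulate low frequencies, so ranking by frequency favors descriptors whose relevance does not depend on which peptides happen to be in the training set.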

The hybrid approach (GA-MLR) combining GA with MLR was used as the other linear variable selection technique to select the variables best correlated with pIC50. The performance of this approach was compared with shuffling stepwise MLR in the construction of a linear model. To check the probability of a non-linear correlation between the descriptors and pIC50 values, the performance of the support vector regression technique was evaluated on both subsets of descriptors selected using shuffling stepwise MLR and GA-MLR. The QSAR models were validated through a leave-20-out cross-validation procedure and statistically evaluated by the squared correlation coefficient of the experimental versus predicted pIC50 values (R2) and the standard deviations (SD) of the training and cross-validation sets. Leave-one-out (LOO) cross-validation was performed as the other validation technique on the most robust QSAR model. To check the risk of chance correlation, the feature selection and modeling results of the best QSAR model were validated with the Y-randomization test.
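A repeated leave-20-out validation of a linear model can be sketched as follows. The sketch assumes that 20 peptides are held out at random in each cycle and that the statistics are averaged over cycles; R2 is computed as the squared correlation between observed and predicted values, as defined above. Data and function names are illustrative.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares coefficients (intercept first) of a linear model."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def leave_n_out(X, y, n_out=20, n_rounds=1000, seed=0):
    """Mean R2 and RMSE of held-out predictions over random splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    r2s, rmses = [], []
    for _ in range(n_rounds):
        idx = rng.permutation(n)
        test, train = idx[:n_out], idx[n_out:]
        pred = predict_mlr(fit_mlr(X[train], y[train]), X[test])
        r = np.corrcoef(y[test], pred)[0, 1]
        r2s.append(r * r)
        rmses.append(np.sqrt(np.mean((y[test] - pred) ** 2)))
    return float(np.mean(r2s)), float(np.mean(rmses))

# Synthetic demonstration with 82 samples and five descriptors.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(82, 5))
y_demo = (6.0 + X_demo @ np.array([1.0, -1.0, 0.5, 0.8, -0.6])
          + rng.normal(scale=0.3, size=82))
r2_cv, rmse_cv = leave_n_out(X_demo, y_demo, n_out=20, n_rounds=200)
print(round(r2_cv, 3), round(rmse_cv, 3))
```

For LOO validation a per-split correlation is undefined with a single held-out sample, so the held-out predictions would instead be pooled over all samples before computing R2.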

Docking procedure

A threading method was used to obtain the best model of the peptide structure for the docking study. The aim of threading is to optimally ‘thread’ the amino acids of a query protein into the structural positions of a template structure as measured by a scoring function (16). The amino acid sequences of the peptides were submitted to the local meta-threading server (LOMETS; http://zhanglab.ccmb.med.umich.edu/LOMETS/) for quick and automated prediction of protein tertiary structures and spatial constraints (17). LOMETS generates 3D models by collecting high-scoring target-to-template alignments from eight locally installed threading programs. The pdb file with the highest confidence score was analyzed for its consistency with the CGRP secondary structure (18,19).

The pdb structure of the CGRP receptor was taken from the Protein Data Bank (8). Recently, Moore and coworkers (8) solved the crystal structure of the CLR/RAMP1 N-terminal ectodomain heterodimer, revealing how RAMP1 binds to CLR and potentially modulates the receptor’s activity.

The pdb structure of each CGRP antagonist obtained from the LOMETS server was placed close to the CGRP receptor ectodomain, and the pdb file of the antagonist–receptor complex was submitted to the GRAMM-X (global range molecular matching) web server v.1.2.0 (20,21) for peptide–protein docking. GRAMM-X is a program for protein docking and requires only the atomic coordinates of the two molecules (no information about the binding sites is needed) to predict the structure of the complex. The program takes an empirical approach to smoothing the intermolecular energy function by changing the range of the atom–atom potentials and performs an exhaustive six-dimensional search through the relative translations and rotations of the molecules (22,23). Three potent peptide antagonists (sequences No. 2, 4 and 5 in Table 1), with IC50 values close to that of telcagepant (MK0974), a potent migraine drug (8), were subjected to docking to find the main interactions of CGRP antagonists with the CGRP receptor.

Results and Discussion

QSAR modeling

Construction of a robust QSAR model will provide pertinent information for the prediction of the biological activity of CGRP antagonists and design of novel peptides. To fulfill this goal, we used different chemometric techniques to consider all aspects of the data sets. The principal component analysis score plot of the entire data set illustrates two classes of peptide sequences belonging to two different data sets from references (7) and (11) (Figure 1). To present a comprehensive model that predicts the activities of both classes of peptides with a large range of activities, different variable selection and modeling procedures were employed. The variable selection and model construction were performed on 82 peptides with observed values of pIC50. The best QSAR model was employed to predict the pIC50 of the 15 remaining peptides having less-accurate values observed for their biological activity. Shuffling was performed in both variable selection and the model building phases, to decrease the probability of inhomogeneous selection where a wide range of peptide sequences exists.

Shuffling stepwise MLR was performed on the matrix of 82 × 132 data points. The selection frequency for each of 132 descriptors over the course of 1000 shuffling stepwise MLR cycles clearly demonstrates that some descriptors are more important than others (Figure 2A). The five descriptors with the highest frequencies in this plot were chosen for the model building step. To compare shuffling stepwise MLR with other variable selection techniques, a GA-MLR (generation = 100, population size = 64, mutation rate and crossover frequency = 0.01) was designed. Finally, eight of the 132 descriptors were chosen based on the chromosome with the best fitness. These descriptors were subjected to the model building step for the prediction of biological activity.

Figure 2.

 (A) Frequency percent of descriptors appearing in shuffling stepwise multiple linear regression (MLR). The five most frequently selected descriptors were chosen. Adding more descriptors does not significantly change the statistical parameters of the model. (B) Plot of the mean effects of the descriptors in the proposed MLR model.

To consider the probability of non-linear correlation between descriptors selected by shuffling stepwise MLR and GA-MLR techniques and pIC50 values, we checked the performance of the support vector regression method as a non-linear modeling technique. Support vector regression with kernel type of radial basis function (kernel parameter = 0.1) and C and epsilon parameters (of 50 and 1, respectively) was performed on two sets of descriptors. The correlation coefficient and standard deviation derived from 1000 cycles of leave-20-out cross-validation on different training and cross-validation sets were used as criteria for comparison of QSAR methods. The statistical parameters of the different variable selection and modeling methods were all averages of 1000 rounds of data shuffling; thus, the probability for chance correlation should be very low. The shuffling stepwise MLR method with a lower number of descriptors demonstrated the highest correlation coefficient for the cross-validated data and the lowest standard deviation as compared to the GA-MLR method (Table 2). It was also demonstrated that support vector regression as a non-linear modeling method does not significantly improve on the model.

Table 2.   Statistical parameters of leave-20-out cross-validation for different QSAR models
Method   No. descriptors   R2   R2CV   RMSE   RMSECV

  1. GA-MLR, genetic algorithm-MLR; MLR, multiple linear regression; QSAR, quantitative structure–activity relationship.


Therefore, a simple linear technique (shuffling stepwise MLR) with only five descriptors (described in Table 3) can be employed for QSAR analysis of a diverse set of CGRP antagonists, and the in silico calculation of the therapeutic potential of new CGRP antagonists is:

Table 3.   Selected descriptors using shuffling stepwise MLR model

Descriptor            Definition                                                                                                                                              Descriptor class
PEOE_VSA+1            Sum of van der Waals surface area where qi is in the range (0.05,0.10) calculated by partial equalization of orbital electronegativities (PEOE) method   Conventional
DipoleZ               The z component of the dipole moment                                                                                                                    Conventional
Most_Neg_Sigma_i_mol  Largest negative atomic inductive parameter σ* (atom → molecule) for atoms in a molecule                                                                Inductive
Mor31m                3D-MoRSE – signal 31/weighted by atomic masses                                                                                                          3D-MoRSE (Dragon)
Mor16e                3D-MoRSE – signal 16/weighted by atomic Sanderson electronegativities                                                                                   3D-MoRSE (Dragon)

  1. MLR, multiple linear regression; 3D-MoRSE, molecule representation of structures based on electron diffraction.

The 95% confidence interval calculated for the model’s constant term has a lower bound of −1.060 and an upper bound of 2.775. Because this interval includes zero, the constant term is not statistically distinguishable from zero and is of less importance for the equation. A tighter standard error could be obtained by adding further descriptors, at the cost of the model’s simplicity.

The robustness of the model is further supported by the high R2 value and F values with low standard deviation. A correlation coefficient (R2) matrix for the five descriptors selected using shuffling stepwise MLR demonstrated no significant cross-correlation between the descriptors in the linear model (Table 4).

Table 4.   Correlation coefficient (R2) matrix for the descriptors selected using the shuffling stepwise MLR method

                      DipoleZ   Most_Neg_Sigma_i_mol   Mor31m   Mor16e
DipoleZ               1.0000    0.7456                 0.6504   0.2966
Most_Neg_Sigma_i_mol            1.0000                 0.7442   0.3110
Mor31m                                                 1.0000   0.2453
Mor16e                                                          1.0000

  1. MLR, multiple linear regression; PEOE, partial equalization of orbital electronegativities.

In addition, the high linear correlation between the pIC50 values predicted by the presented QSAR model in a LOO cross-validation process and the experimentally determined values for the CGRP antagonists (R2 = 0.9046) further demonstrates the high predictive ability of the proposed QSAR model (Figure 3). The residuals of the calculated values of pIC50 were plotted against the experimental pIC50 for the LOO cross-validated data, and their distribution on both sides of the zero line indicates that no systematic error exists in the development of the QSAR model (Figure 4). Moreover, the model is capable of predicting all peptides in the data set, over a wide range of activity, without any outliers in the residual plot [an outlier being a peptide with a residual error larger than three times the standard deviation of the model (3 × SD = 1.5078)].
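The outlier criterion amounts to a simple threshold on the absolute residual. In the sketch below the standard deviation is taken as 1.5078/3 from the limit quoted above, while the three experimental and calculated values are hypothetical and serve only to show the check; `flag_outliers` is an illustrative name.

```python
def flag_outliers(experimental, calculated, sd, k=3.0):
    """Return indices whose absolute residual exceeds k * sd."""
    limit = k * sd
    return [i for i, (e, c) in enumerate(zip(experimental, calculated))
            if abs(c - e) > limit]

sd_model = 1.5078 / 3                     # SD implied by the quoted 3 x SD limit
exp_demo = [9.7, 6.0, 5.0]                # hypothetical experimental pIC50 values
calc_demo = [9.5, 6.4, 7.0]               # hypothetical calculated pIC50 values
outliers = flag_outliers(exp_demo, calc_demo, sd_model)
print(outliers)
```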

Figure 3.

 Plot of the leave-one-out (LOO)-CV calculated pIC50 values against the experimental ones for the calcitonin gene-related peptide antagonists studied in this work.

Figure 4.

 Plot of residuals versus experimental values of pIC50 for the proposed quantitative structure–activity relationship model.

The presented model was used for predicting the missing pIC50 of the peptides of the external test set. These peptide sequences are marked with ‘*’ in Table 1. As the cross-validated model has low standard deviation values, it would be expected that the biological activity predicted for the peptides in the external test set is close to their actual values. The experimental and calculated pIC50 values for all 97 CGRP antagonists are given in Table 1. The biological activity values predicted using the proposed QSAR model for the CGRP antagonists of the external test set are clearly within the range reported in the references (7,11).

To assess the probability of chance correlation, validation of the variable selection and modeling results was performed using ten randomized sets of biological activity values. For each set, the response vector (pIC50 values) was randomly rearranged, both shuffling stepwise MLR and linear modeling were performed on the scrambled data, and R2 values for the training and cross-validation sets were recorded. All computations were carried out without a priori knowledge after scrambling the activities. No significant R2 values were observed for either the training set or the cross-validation sets (data not shown). Therefore, it can be concluded that the derived QSAR model is viable for prediction and is unlikely to result from chance correlation.
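A Y-randomization check can be sketched as follows. For brevity the sketch refits a linear model on a fixed descriptor set after scrambling the response, whereas the procedure described above also repeated the variable selection on each scrambled set; the data are synthetic and the function names are illustrative.

```python
import numpy as np

def r2_fit(X, y):
    """Squared correlation between y and the least-squares fitted values."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = np.corrcoef(y, A @ coef)[0, 1]
    return float(r * r)

def y_randomization(X, y, n_trials=10, seed=0):
    """R2 values obtained after randomly permuting the response vector."""
    rng = np.random.default_rng(seed)
    return [r2_fit(X, rng.permutation(y)) for _ in range(n_trials)]

# Synthetic demonstration with 82 samples and five descriptors.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(82, 5))
y_demo = (6.0 + X_demo @ np.array([1.0, -1.0, 0.5, 0.8, -0.6])
          + rng.normal(scale=0.3, size=82))
r2_true = r2_fit(X_demo, y_demo)
r2_scrambled = y_randomization(X_demo, y_demo)
print(round(r2_true, 3), round(max(r2_scrambled), 3))
```

A large gap between the R2 of the real model and the R2 values from scrambled responses is the evidence that the fitted relationship is not an artifact of the descriptor set.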

Interpretation of QSAR model

The mean effects of five descriptors can be extracted from the final optimized QSAR model (Figure 2B). As can be seen from the plot, the inductive descriptor of ‘Most_Neg_Sigma_i_mol’ had the highest value of mean effect among the model descriptors. This descriptor identifies the largest negative atomic inductive parameter σ* for atoms in the molecule and represents the steric and electronegativity effects of atoms on the CGRP molecules (10,24,25). Therefore, the identification of this descriptor as the most important feature in the QSAR model adequately describes the aspects of intra- and intermolecular electrostatic interactions that are of special relevance for the activity of the CGRP antagonists.

The second most influential descriptor according to the mean effect plot is ‘PEOE_VSA+1’. It is calculated using the partial equalization of orbital electronegativities (PEOE) method and can determine the effect of the electrostatic distribution in the molecule on the activity of the peptides. This descriptor demonstrates the role of electrostatic interactions of the CGRP molecules with the polar sites of CGRP receptor.

DipoleZ, describing the z component of the dipole moment, is another descriptor with an electrostatic characteristic and can show the effect of the electrostatic interactions of polar sites of peptide antagonists in a special direction with the polar residues of the binding site of receptor.

Mor16e, a 3D-Molecule representation of structures based on electron diffraction (3D-MoRSE) descriptor weighted by electronegativity, illustrates the role of the geometry of the peptide molecules and their electron diffraction properties during the interaction with the binding site of the receptor.

The four above-mentioned descriptors have positive mean effects and will increase the biological activity of the peptides as their values increase. This demonstrates the importance of electrostatic interactions as the key interactions between the antagonist molecules and CGRP receptor and is in agreement with the results reported by Moore and coworkers (8) on the interaction of bioavailable drug molecules with the CGRP receptor. Moreover, the electrostatic dipole–dipole interactions between the amino acids of the CGRP antagonists and the residues of the receptor can induce hydrogen bonding between the donor and acceptor sites and may be an additional antagonizing mechanism.

The values of the four descriptors with positive mean effects are larger for the CGRP antagonists described in reference (7) (sequences No. 1–32 in Table 1) as compared to the second class of CGRP antagonists described in reference (11) (sequences No. 33–97 in Table 1). As this correlated with the most potently active peptides, this demonstrates the high correlation of these four descriptors with the biological activity of the CGRP antagonists. Further, this also demonstrates the important role of electrostatic and hydrogen bonding interactions that should be considered in the design of new effective CGRP antagonists.

Mor31m is the only descriptor having a negative mean effect. As it is a 3D-MoRSE weighted by atomic masses related to 3D structure of peptides, it reveals the importance of geometry of molecules and steric effects on interaction of CGRPs with the receptor. Increasing the value of this descriptor resulted in a decrease in the activity of CGRP antagonists owing to more steric hindrance of the hydrophobic interaction between the ligand and the binding pocket of the receptor (8). In summary, the results demonstrate that electrostatic, hydrogen bonding, and hydrophobic interactions play important roles in the mechanism of blocking the CGRP receptor.

Docking study

Understanding the mechanisms of CGRP receptor antagonism by drug molecules and peptides is difficult owing to the complex nature of the target (8). The CGRP receptor consists of two protein components, CLR and RAMP1. CLR has an N-terminal extracellular domain, or ectodomain, that binds polypeptide hormones of 27–141 amino acids; additionally, it contains a seven-helix transmembrane domain and an intracellular carboxyl terminus (8). The ectodomain complex is able to bind CGRP, as well as other peptide and small molecule antagonists (8,26). Recently, crystallographic structures have been reported for the CGRP receptor ectodomain in the absence of ligand as well as in the presence of potent CGRP receptor antagonists (8). In the present docking study, we used the structure of the CGRP receptor (PDB code 3N7P) together with simulated structures of three of the most potent peptides from our data set. The structures of peptides No. 2, 4, and 5 were modeled using the LOMETS server. The LOMETS server uses eight threading programs, each generating 20 models, resulting in a total of 160 models for each peptide. The individual models are sorted by their z-scores in their respective algorithms. Among the 10 top models reported by LOMETS, we selected the peptide model with the highest confidence score and a high z-score value (20.365, 20.070 and 20.305 for peptides No. 2, 4 and 5, respectively). These threading alignments were obtained using MUSTER, a program available from the LOMETS server, and the structural template 2kj7A, a rat islet amyloid polypeptide (PDB code 2KJ7) (27). The secondary structure, 3D model and Ramachandran plot for peptides No. 2, 4, and 5 are given in Figure 5A–C. The results revealed that the majority of the amino acids in the peptides have phi–psi values that involve little or no steric interference. The phi–psi distribution is also consistent with the expected secondary structure of CGRP peptide antagonists (18,19), lending support to our hypothesized model.

Figure 5.

 Three-dimensional models for peptides No. 2 (A), 4 (B), and 5 (C), supported by their peptide sequence (Seq), secondary structure (SS) and Ramachandran plots (built using VEGA ZZ). Rat islet amyloid polypeptide (2kj7A) has been used as a template for threading alignment, and its sequence is shown at the top of the figure.

The verified structural models for peptides No. 2, 4, and 5 were placed parallel with the extracellular domain of the CGRP receptor and entered into the GRAMM-X server for rigid-body docking. While the best 3D models of the CGRP peptide antagonists were constructed using threading alignment, optimizing the rigid-body orientation of the peptides relative to the receptor can provide information regarding the possibility for interaction.

Docking simulation of all three CGRP antagonists with the CGRP receptor illustrates that they act by blocking access to the peptide-binding cleft at the interface of CLR and RAMP1 (Figure 6A–C). Thr122 of CLR has been reported as an effective hydrogen bond donor site of CLR and an important residue for interaction between the receptor and antagonists (8). Docking of peptide No. 2 illustrated a hydrogen bond between the phenylalanine (Phe31) of the peptide and Thr122 of CLR that may stabilize the antagonist–receptor interaction, resulting in increased potency of the peptide. The hydrophobic stacking interaction of hydrophobic residues (colored yellow) of peptide No. 2 with the CLR/RAMP1 hydrophobic pocket stabilized the peptide in the extracellular domain between CLR and RAMP1. Another key residue in the binding of peptide and non-peptide antagonists by the CGRP receptor is Trp84 (28). Docking of peptide No. 4 to the receptor demonstrated formation of a backbone–backbone hydrogen bond between Trp84 of the receptor and Pro22 in the antagonist. This, taken together with the hydrophobic stacking of Val25 from the antagonist into the hydrophobic pocket of the receptor, could explain the peptide’s high selectivity for the protein target (Figure 6B). Peptide No. 5, the most potent peptide among all 97 peptides, cross-links RAMP1 (Asp90) and CLR (Arg67) by forming two hydrogen bonds to Arg18 in the peptide (Figure 6C). Arg67 has been reported as a key determinant residue for high-affinity binding of antagonists to the receptor. Its hydrogen bonding interaction with peptide No. 5 likely plays an important role in the peptide’s high affinity for the CGRP receptor, and docking of the peptide is likely significantly stabilized by hydrogen bonding to Asp90. Hydrophobic stacking interactions of hydrophobic residues in the C-terminal end of peptide No. 5 with the CLR/RAMP1 hydrophobic pocket would further contribute to the potency and selectivity of this peptide for the CGRP receptor.

All three peptides bound to the binding pocket located in the extracellular domain between CLR and RAMP1, as reported previously for the clinical receptor antagonists telcagepant and olcegepant (8,28). This demonstrates the involvement of overlapping peptide–protein interactions that contribute to receptor antagonist potency. Key residues (Thr122, Trp84, Arg67 and Asp90) of the receptor participate in electrostatic and hydrogen bonding interactions with the peptide antagonists, stabilizing the peptide–protein interaction and increasing antagonist potency. Additionally, non-polar (hydrophobic) amino acids of the peptide antagonists interact with the hydrophobic sites of the receptor (8). Taken together, these results are in good agreement with the physicochemical properties covered by the five most influential descriptors that were identified through variable selection using shuffling stepwise MLR.

Figure 6.

 Structure of the calcitonin gene-related peptide antagonist/calcitonin receptor–like receptor (CLR)/receptor activity modifying protein 1 (RAMP1) ectodomain complex. Ribbon diagram of CLR (cyan) and RAMP1 (magenta) with (A) peptide No. 2 (IC50 = 0.64 nM), (B) peptide No. 4 (IC50 = 0.71 nM) and (C) peptide No. 5 (IC50 = 0.2 nM). All antagonists (brown) act by blocking access to the peptide-binding cleft at the interface of CLR and RAMP1 (8). The hydrophobic residues are shown in yellow, and the dashed green lines indicate hydrogen bonds to the backbone of Thr122 for peptide No. 2, Trp84 for peptide No. 4, and Arg67 and Asp90 for peptide No. 5 (structures were built using ViewerLite 5.0).

The molecular docking of the CGRP antagonists to the CGRP receptor also confirms that it is the C-terminal portion of the peptides that interacts with the receptor. This is in good agreement with previous investigations demonstrating that residues 28–37 (in the case of αCGRP) are required for high-affinity binding of peptides, and that this region is almost certainly in direct contact with the receptor (19,29). The fact that the first 32 peptides of the data sets contain this region and show high-affinity binding, compared with the remaining 65 peptides of low activity, provides further evidence for the importance of this region in binding.

The findings of our docking study indicate that the CLR/RAMP1 extracellular domain complex used in this study can be used for peptide antagonist binding and as a valid functional surrogate for the full-length receptor complex, in agreement with recent experimental studies (8,26). The results of the docking study also clarified, for the first time, how peptides bind to a class B G-protein-coupled receptor and highlighted the challenges of designing potent antagonists for the treatment of migraine.


Conclusions

The proposed linear QSAR model was demonstrated to be robust for the prediction of the biological activity of CGRP antagonists. The validation procedures utilized in this work (LOO cross-validation, leave-20-out cross-validation, and Y-randomization) demonstrated the accuracy and reliable prediction ability of the produced model. The validated model could successfully predict the biological activity of peptide antagonists with unknown experimental activity. As a powerful linear variable selection method, shuffling stepwise MLR could select the appropriate set of meaningful descriptors. The results of the QSAR method showed the importance of electrostatic, hydrogen bonding and hydrophobic properties in the potent CGRP antagonists. The docking study confirmed the mechanism of interaction proposed by the QSAR model and could successfully identify the CLR/RAMP1 extracellular domain complex as the binding site in the CGRP receptor. The developed QSAR model could likely be an important tool in the design of novel selective antagonists with increased potency for the treatment of migraine.



Acknowledgments

We gratefully acknowledge financial support from the Danish Council for Independent Research | Natural Sciences. We also wish to acknowledge the helpful comments made by the HDP class of 2011 that led to this improved manuscript.