Construction of a robust QSAR model will provide pertinent information for the prediction of the biological activity of CGRP antagonists and the design of novel peptides. To fulfill this goal, we used different chemometric techniques to consider all aspects of the data sets. The principal component analysis score plot of the entire data set illustrates two classes of peptide sequences belonging to the two different data sets from references (7) and (11) (Figure 1). To present a comprehensive model that predicts the activities of both classes of peptides over a large range of activities, different variable selection and modeling procedures were employed. The variable selection and model construction were performed on the 82 peptides with observed values of pIC50. The best QSAR model was then employed to predict the pIC50 of the 15 remaining peptides, for which only less-accurate biological activity values had been observed. Shuffling was performed in both the variable selection and the model building phases to decrease the probability of inhomogeneous selection where a wide range of peptide sequences exists.
Shuffling stepwise MLR was performed on the matrix of 82 × 132 data points. The selection frequency for each of 132 descriptors over the course of 1000 shuffling stepwise MLR cycles clearly demonstrates that some descriptors are more important than others (Figure 2A). The five descriptors with the highest frequencies in this plot were chosen for the model building step. To compare shuffling stepwise MLR with other variable selection techniques, a GA-MLR (generation = 100, population size = 64, mutation rate and crossover frequency = 0.01) was designed. Finally, eight of the 132 descriptors were chosen based on the chromosome with the best fitness. These descriptors were subjected to the model building step for the prediction of biological activity.
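The selection-frequency idea can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the forward-only selection rule, the fixed number of terms per cycle, the 80% subset fraction, and the synthetic data are all assumptions made for illustration, and the function names are hypothetical. The sketch repeatedly draws a random subset of compounds, runs a greedy forward selection on it, and reports how often each descriptor is picked.

```python
import random


def fit_rss(columns, y):
    """Residual sum of squares of a least-squares fit (intercept included)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    # Augmented normal equations: (X^T X | X^T y).
    aug = [
        [sum(row[r] * row[c] for row in design) for c in range(p)]
        + [sum(row[r] * yi for row, yi in zip(design, y))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(aug[r][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        if abs(aug[k][k]) < 1e-12:
            return float("inf")  # singular design: treat as an unusable fit
        for r in range(p):
            if r != k:
                factor = aug[r][k] / aug[k][k]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[k])]
    coef = [aug[r][p] / aug[r][r] for r in range(p)]
    return sum(
        (yi - sum(c * x for c, x in zip(coef, row))) ** 2
        for row, yi in zip(design, y)
    )


def forward_select(X, y, rows, max_terms):
    """Greedy forward selection on one row subset; returns chosen column indices."""
    y_sub = [y[i] for i in rows]
    cols = [[X[i][j] for i in rows] for j in range(len(X[0]))]
    chosen = []
    while len(chosen) < max_terms:
        candidates = [j for j in range(len(cols)) if j not in chosen]
        best = min(
            candidates,
            key=lambda j: fit_rss([cols[c] for c in chosen] + [cols[j]], y_sub),
        )
        chosen.append(best)
    return chosen


def selection_frequency(X, y, cycles=200, subset_fraction=0.8, max_terms=2, seed=0):
    """Percentage of shuffling cycles in which each descriptor is selected."""
    rng = random.Random(seed)
    counts = [0] * len(X[0])
    for _ in range(cycles):
        rows = rng.sample(range(len(y)), int(subset_fraction * len(y)))
        for j in forward_select(X, y, rows, max_terms):
            counts[j] += 1
    return [100.0 * c / cycles for c in counts]


def make_toy_data(n=40, n_desc=6, seed=1):
    """Synthetic data (assumption): only descriptors 0 and 3 carry signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_desc)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0, 0.05) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(selection_frequency(X, y))
```

On the toy data, the two informative descriptors should dominate the frequency profile, which is the behavior Figure 2A shows for the real descriptor pool.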
Figure 2. (A) Frequency percent of descriptors appearing in shuffling stepwise multiple linear regression (MLR). The five most potent descriptors have been selected. Adding more descriptors does not significantly change the statistical parameters of the model. (B) Plot of the mean effects of the descriptors in the proposed MLR model.
To consider the possibility of a non-linear correlation between the pIC50 values and the descriptors selected by the shuffling stepwise MLR and GA-MLR techniques, we checked the performance of support vector regression as a non-linear modeling technique. Support vector regression with a radial basis function kernel (kernel parameter = 0.1) and C and epsilon parameters of 50 and 1, respectively, was performed on the two sets of descriptors. The correlation coefficient and standard deviation derived from 1000 cycles of leave-20-out cross-validation on different training and cross-validation sets were used as criteria for comparing the QSAR methods. The statistical parameters of the different variable selection and modeling methods were all averages over 1000 rounds of data shuffling; thus, the probability of chance correlation should be very low. The shuffling stepwise MLR method, with a lower number of descriptors, demonstrated the highest correlation coefficient for the cross-validated data and the lowest standard deviation as compared to the GA-MLR method (Table 2). It was also demonstrated that support vector regression, as a non-linear modeling method, does not significantly improve the model.
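The repeated leave-20-out procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in this work: a single-descriptor linear fit stands in for the five-descriptor MLR model to keep the example short, the held-out R2 is taken as the squared Pearson correlation between predicted and observed values, the SD is the root-mean-square held-out residual, and the data are synthetic.

```python
import math
import random


def fit_line(x, y):
    """Least-squares slope and intercept for a single descriptor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r2(a, b):
    """Squared Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab * sab / (saa * sbb)


def repeated_leave_n_out(x, y, n_out=20, cycles=1000, seed=0):
    """Average held-out R2 and residual SD over repeated random splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    r2_sum = sd_sum = 0.0
    for _ in range(cycles):
        rng.shuffle(idx)  # new random split in every cycle
        test, train = idx[:n_out], idx[n_out:]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        pred = [slope * x[i] + intercept for i in test]
        obs = [y[i] for i in test]
        r2_sum += pearson_r2(pred, obs)
        sd_sum += math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n_out)
    return r2_sum / cycles, sd_sum / cycles


def make_toy_data(n=82, seed=1):
    """Synthetic stand-in (assumption) for 82 compounds and one descriptor."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [3.0 * v + 1.0 + rng.gauss(0, 0.1) for v in x]
    return x, y


if __name__ == "__main__":
    x, y = make_toy_data()
    print(repeated_leave_n_out(x, y))
```

Because every cycle uses a different random split, the averaged statistics depend much less on any single partition of the data than a one-off split would.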
Table 2. Statistical parameters of leave-20-out cross-validation for different QSAR models
Therefore, a simple linear technique (shuffling stepwise MLR) with only five descriptors (described in Table 3) can be employed for QSAR analysis of a diverse set of CGRP antagonists and for in silico calculation of the therapeutic potential of new CGRP antagonists. The resulting model is:
Table 3. Selected descriptors using shuffling stepwise MLR model
|Descriptor||Definition||Class|
|PEOE_VSA+1||Sum of van der Waals surface area where qi is in the range (0.05,0.10) calculated by partial equalization of orbital electronegativities (PEOE) method||Conventional|
|DipoleZ||The z component of the dipole moment||Conventional|
|Most_Neg_Sigma_i_mol||Largest negative atomic inductive parameter σ* (atom molecule) for atoms in a molecule||Inductive|
|Mor31m||3D-MoRSE – signal 31/weighted by atomic masses||3D-MoRSE (Dragon)|
|Mor16e||3D-MoRSE – signal 16/weighted by atomic Sanderson electronegativities||3D-MoRSE (Dragon)|
The 95% confidence interval was calculated for the model's constant term, giving a lower bound of −1.060 and an upper bound of 2.775. Because this interval contains zero, the constant term is not significantly different from zero and is of less importance for the equation, which supports the model. A tighter standard error could be obtained by adding further descriptors, but only at the cost of the model's simplicity.
The robustness of the model is further supported by its high R2 and F values and its low standard deviation. A correlation coefficient (R2) matrix for the five descriptors selected using shuffling stepwise MLR demonstrated no significant cross-correlation between the descriptors in the linear model (Table 4).
Table 4. Correlation coefficient (R2) matrix for the descriptors selected using the shuffling stepwise MLR method
|Most_Neg_Sigma_i_mol|| || ||1.0000||0.7442||0.3110|
|Mor31m|| || || ||1.0000||0.2453|
|Mor16e|| || || || ||1.0000|
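A cross-correlation matrix of this kind can be computed as in the following minimal Python sketch. The descriptor values are made up, the function names are hypothetical, and the 0.9 cutoff used to flag collinear pairs is an assumption for illustration only; no threshold is stated in this work.

```python
def pearson_r2(a, b):
    """Squared Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab * sab / (saa * sbb)


def r2_matrix(columns):
    """Upper-triangular R2 matrix, keyed by (descriptor, descriptor) pairs."""
    names = list(columns)
    return {
        (p, q): pearson_r2(columns[p], columns[q])
        for i, p in enumerate(names)
        for q in names[i:]
    }


def correlated_pairs(columns, threshold=0.9):
    """Descriptor pairs whose R2 exceeds the (assumed) collinearity cutoff."""
    return [
        (p, q)
        for (p, q), r2 in r2_matrix(columns).items()
        if p != q and r2 > threshold
    ]


if __name__ == "__main__":
    # Made-up descriptor values for four compounds.
    toy = {
        "d1": [1.0, 2.0, 3.0, 4.0],
        "d2": [2.0, 4.0, 6.0, 8.0],
        "d3": [1.0, -1.0, -1.0, 1.0],
    }
    print(correlated_pairs(toy))
```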
In addition, the high linear correlation between the pIC50 values predicted by the presented QSAR model in a LOO cross-validation process and the experimentally estimated values for the CGRP antagonists (R2 = 0.9046) further demonstrates the high predictive ability of the proposed QSAR model (Figure 3). The residuals of the calculated pIC50 values were plotted against the experimental pIC50 for the LOO cross-validated data, and their scatter on both sides of the zero line indicates that no systematic error exists in the development of the QSAR model (Figure 4). Moreover, the model is capable of predicting all peptides in the data set, over a large range of activity, without any outliers in the residual plot [an outlier being a peptide with a residual error larger than three times the standard deviation of the model (3 × SD = 1.5078)].
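The outlier criterion is simple enough to state as code. The following Python sketch flags any compound whose absolute residual exceeds k times the model standard deviation; the function names and the example activities are hypothetical, while the value SD = 0.5026 follows from the reported 3 × SD = 1.5078.

```python
import math


def residual_sd(residuals, n_params=0):
    """Root-mean-square residual, with an optional degrees-of-freedom correction."""
    return math.sqrt(sum(r * r for r in residuals) / (len(residuals) - n_params))


def find_outliers(observed, predicted, sd=None, k=3.0):
    """Indices of compounds whose |residual| exceeds k times the model SD."""
    residuals = [p - o for o, p in zip(observed, predicted)]
    if sd is None:
        sd = residual_sd(residuals)  # fall back to the SD of these residuals
    return [i for i, r in enumerate(residuals) if abs(r) > k * sd]


if __name__ == "__main__":
    # Made-up activities; the third compound is off by 1.6 log units.
    observed = [9.0, 8.0, 7.0, 6.0]
    predicted = [9.1, 7.8, 8.6, 6.0]
    print(find_outliers(observed, predicted, sd=0.5026))
```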
Figure 3. Plot of the leave-one-out (LOO)-CV calculated pIC50 values against the experimental ones for the calcitonin gene-related peptide antagonists studied in this work.
Figure 4. Plot of residuals versus experimental values of pIC50 for the proposed quantitative structure–activity relationship model.
The presented model was used to predict the missing pIC50 values of the peptides in the external test set. These peptide sequences are marked with ‘*’ in Table 1. As the cross-validated model has low standard deviation values, the biological activity predicted for the peptides in the external test set is expected to be close to their actual values. The predicted and calculated pIC50 values for all 97 CGRP antagonists are given in Table 1. The biological activity values predicted using the proposed QSAR model for the CGRP antagonists of the external test set are clearly within the range reported in the references (7,11).
To assess the probability of chance correlation, the variable selection and modeling results were validated using ten sets of randomized biological activity values. For each set, the response vector (pIC50 values) was randomly rearranged, both shuffling stepwise MLR and linear modeling were performed on the scrambled data, and the R2 values for the training and cross-validation sets were recorded. All computations were carried out without a priori knowledge after scrambling the activities. No significant R2 values were observed for either the training set or the cross-validation sets (data not shown). Therefore, it can be concluded that the derived QSAR model is viable for prediction, with little risk of chance correlation.
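The randomization test can be sketched as follows in Python. This is an illustration only: a single-descriptor correlation stands in for the full variable selection and MLR pipeline, the data are synthetic, and the function names are hypothetical. The response vector is permuted, the fit statistic is recomputed, and the scrambled R2 values are compared with the R2 of the unscrambled data.

```python
import random


def r_squared(x, y):
    """R2 of a one-descriptor linear fit (equal to the squared Pearson r)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def y_randomization(x, y, trials=10, seed=0):
    """Return the true R2 and the R2 values obtained after scrambling y."""
    rng = random.Random(seed)
    scrambled = []
    for _ in range(trials):
        y_perm = list(y)
        rng.shuffle(y_perm)  # breaks the descriptor-activity pairing
        scrambled.append(r_squared(x, y_perm))
    return r_squared(x, y), scrambled


def make_toy_data(n=82, seed=1):
    """Synthetic stand-in (assumption) for one descriptor and 82 activities."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [3.0 * v + 1.0 + rng.gauss(0, 0.1) for v in x]
    return x, y


if __name__ == "__main__":
    x, y = make_toy_data()
    true_r2, scrambled_r2 = y_randomization(x, y)
    print(true_r2, max(scrambled_r2))
```

A model that reflects a real structure-activity relationship should give a high R2 on the original data and R2 values near zero on every scrambled copy.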
Interpretation of QSAR model
The mean effects of the five descriptors can be extracted from the final optimized QSAR model (Figure 2B). As can be seen from the plot, the inductive descriptor ‘Most_Neg_Sigma_i_mol’ had the highest mean effect among the model descriptors. This descriptor identifies the largest negative atomic inductive parameter σ* for atoms in the molecule and represents the steric and electronegativity effects of atoms on the CGRP molecules (10,24,25). Therefore, the identification of this descriptor as the most important feature in the QSAR model adequately describes the aspects of intra- and intermolecular electrostatic interactions that are of special relevance for the activity of the CGRP antagonists.
The second most influential descriptor according to the mean effect plot is ‘PEOE_VSA+1’. It is calculated using the partial equalization of orbital electronegativities (PEOE) method and captures the effect of the electrostatic distribution in the molecule on the activity of the peptides. This descriptor demonstrates the role of electrostatic interactions of the CGRP molecules with the polar sites of the CGRP receptor.
DipoleZ, describing the z component of the dipole moment, is another descriptor with an electrostatic character and can show the effect of the electrostatic interactions, in a specific direction, between the polar sites of the peptide antagonists and the polar residues of the binding site of the receptor.
Mor16e, a 3D-Molecule Representation of Structures based on Electron diffraction (3D-MoRSE) descriptor weighted by electronegativity, illustrates the role of the geometry of the peptide molecules and their electrical diffraction properties during the interaction with the binding site of the receptor.
The four above-mentioned descriptors have positive mean effects, so the biological activity of the peptides increases as their values increase. This demonstrates the importance of electrostatic interactions as the key interactions between the antagonist molecules and the CGRP receptor and is in agreement with the results reported by Moore and coworkers (8) on the interaction of bioavailable drug molecules with the CGRP receptor. Moreover, the electrostatic dipole–dipole interactions between the amino acids of the CGRP antagonists and the residues of the receptor can induce hydrogen bonding between the donor and acceptor sites and may be an additional antagonizing mechanism.
The values of the four descriptors with positive mean effects are larger for the CGRP antagonists described in reference (7) (sequences No. 1–32 in Table 1) than for the second class of CGRP antagonists described in reference (11) (sequences No. 33–97 in Table 1). Because the first class also contains the most potent peptides, this demonstrates the high correlation of these four descriptors with the biological activity of the CGRP antagonists. It also demonstrates the important role of electrostatic and hydrogen bonding interactions, which should be considered in the design of new effective CGRP antagonists.
Mor31m is the only descriptor with a negative mean effect. As it is a 3D-MoRSE descriptor weighted by atomic masses and related to the 3D structure of the peptides, it reveals the importance of molecular geometry and steric effects for the interaction of CGRP antagonists with the receptor. Increasing the value of this descriptor results in a decrease in the activity of CGRP antagonists owing to greater steric hindrance of the hydrophobic interaction between the ligand and the binding pocket of the receptor (8). In summary, the results demonstrate that electrostatic, hydrogen bonding, and hydrophobic interactions play important roles in the mechanism of blocking the CGRP receptor.
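The mean effects discussed above can be computed directly from the regression coefficients and the descriptor matrix. The formula is not given in this section, so the Python sketch below assumes a definition that is common in the QSAR literature: the coefficient of descriptor j multiplied by the sum of its values over all compounds, normalized by the same quantity summed over all descriptors. The function name and the numbers are hypothetical.

```python
def mean_effects(coefficients, X):
    """Mean effect of each descriptor under the assumed definition.

    MF_j = b_j * sum_i(d_ij) / sum_j(b_j * sum_i(d_ij)),
    where b_j is the MLR coefficient and d_ij the value of descriptor j
    for compound i. The intercept is not included.
    """
    col_sums = [sum(row[j] for row in X) for j in range(len(coefficients))]
    weighted = [b * s for b, s in zip(coefficients, col_sums)]
    total = sum(weighted)
    return [w / total for w in weighted]


if __name__ == "__main__":
    # Two made-up descriptors for two compounds, with made-up coefficients.
    coefficients = [2.0, -1.0]
    X = [[1.0, 1.0], [3.0, 1.0]]
    print(mean_effects(coefficients, X))
```

Under this definition the mean effects sum to one, and their signs indicate whether the descriptor as a whole pushes the predicted activity up or down across the data set.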
Understanding the mechanisms of CGRP receptor antagonism by drug molecules and peptides is difficult owing to the complex nature of the target (8). The CGRP receptor consists of two protein components, CLR and RAMP1. CLR has an N-terminal extracellular domain, or ectodomain, that binds polypeptide hormones of 27–141 amino acids; additionally, it contains a seven-helix transmembrane domain and an intracellular carboxyl terminus (8). The ectodomain complex is able to bind CGRP, as well as other peptide and small molecule antagonists (8,26). Recently, crystallographic structures have been reported for the CGRP ectodomain in the absence of ligand as well as in the presence of potent CGRP receptor antagonists (8). In the present docking study, we used the structure of the CGRP receptor (PDB code 3N7P) together with simulated structures of three of the most potent peptides from our data set. The structures of peptides No. 2, 4, and 5 were modeled using the LOMETS server. The LOMETS server uses eight threading programs, each generating 20 models, resulting in a total of 160 models for each peptide. Within each algorithm, the individual models are sorted by their z-scores. Among the 10 top models reported by LOMETS, we selected the peptide model with the highest confidence score and a high z-score value (20.365, 20.070 and 20.305 for peptides No. 2, 4 and 5, respectively). These threading alignments were obtained using MUSTER, a program available from the LOMETS server, and the structural template 2kj7A, a rat islet amyloid polypeptide (PDB code 2KJ7) (27). The secondary structure, 3D model and Ramachandran plot for peptides No. 2, 4, and 5 are given in Figure 5A–C. The results revealed that the majority of the amino acids in the peptides have phi–psi values that involve little or no steric interference. The phi–psi distribution is also consistent with the expected secondary structure of CGRP peptide antagonists (18,19), lending support to our hypothesized model.
Figure 5. Three-dimensional model for peptides No. 2 (A), 4 (B), and 5 (C), supported by their peptide sequence (Seq), secondary structure (SS) and Ramachandran plots (built using VEGA ZZ). Rat islet amyloid polypeptide (2kj7A) has been used as a template for threading alignment, and its sequence is available at the top of the figure.
The verified structural models for peptides No. 2, 4, and 5 were placed parallel with the extracellular domain of the CGRP receptor and entered into the GRAMM-X server for rigid-body docking. While the best 3D models of the CGRP peptide antagonists were constructed using threading alignment, optimizing the rigid-body orientation of the peptides relative to the receptor can provide information regarding the possibility for interaction.
Docking simulation of all three CGRP antagonists with the CGRP receptor illustrates that they act by blocking access to the peptide-binding cleft at the interface of CLR and RAMP1 (Figure 6A–C). Thr122 of CLR has been reported as an effective hydrogen bond donor site of CLR and an important residue for interaction between the receptor and antagonists (8). Docking of peptide No. 2 illustrated a hydrogen bond between the phenylalanine (Phe31) of the peptide and Thr122 of CLR that may stabilize the antagonist–receptor interaction, resulting in increased potency of the peptide. The hydrophobic stacking interaction of the hydrophobic residues (colored yellow) of peptide No. 2 with the CLR/RAMP1 hydrophobic pocket stabilized the peptide in the extracellular domain between CLR and RAMP1. Another key residue in the binding of peptide and non-peptide antagonists by the CGRP receptor is Trp84 (28). Docking of peptide No. 4 to the receptor demonstrated formation of a backbone–backbone hydrogen bond between Trp84 of the receptor and Pro22 in the antagonist. This, taken together with the hydrophobic stacking of Val25 from the antagonist into the hydrophobic pocket of the receptor, could explain the peptide’s high selectivity for the protein target (Figure 6B). Peptide No. 5, the most potent peptide among all 97 peptides, cross-links RAMP1 (Asp90) and CLR (Arg67) by forming two hydrogen bonds to Arg18 in the peptide (Figure 6C). Arg67 has been reported as a key determinant residue for high-affinity binding of antagonists to the receptor. Its hydrogen bonding interaction with peptide No. 5 likely plays an important role in the peptide’s high affinity for the CGRP receptor, and docking of the peptide is likely significantly stabilized by hydrogen bonding to Asp90. Hydrophobic stacking interactions of the hydrophobic residues in the C-terminal end of peptide No. 5 with the CLR/RAMP1 hydrophobic pocket would further contribute to the potency and selectivity of this peptide for the CGRP receptor.
All three peptides bound to the binding pocket located in the extracellular domain between CLR and RAMP1, as reported previously for the clinical receptor antagonists telcagepant and olcegepant (8,28). This demonstrates the involvement of overlapping peptide–protein interactions that contribute to receptor antagonist potency. Key residues (Thr122, Trp84, Arg67 and Asp90) of the receptor participate in electrostatic and hydrogen bonding interactions with the peptide antagonists, stabilizing the peptide–protein interaction and increasing antagonist potency. Additionally, non-polar (hydrophobic) amino acids of the peptide antagonists interact with the hydrophobic sites of the receptor (8). Taken together, these results are in good agreement with the physicochemical properties covered by the five most influential descriptors that were identified through variable selection using shuffling stepwise MLR.
Figure 6. Structure of the calcitonin gene-related peptide antagonist/calcitonin receptor-like receptor (CLR)/receptor activity modifying protein 1 (RAMP1) ectodomain complex. Ribbon diagram of CLR (cyan) and RAMP1 (magenta) with (A) peptide No. 2 (IC50 = 0.64 nM), (B) peptide No. 4 (IC50 = 0.71 nM) and (C) peptide No. 5 (IC50 = 0.2 nM). All antagonists (brown) act by blocking access to the peptide-binding cleft at the interface of CLR and RAMP1 (8). The hydrophobic residues are shown in yellow, and the dashed green lines indicate hydrogen bonds to the backbone of Thr122 (peptide No. 2), Trp84 (peptide No. 4), and Arg67 and Asp90 (peptide No. 5) (structures were built using ViewerLite 5.0).
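For orientation, the nanomolar IC50 values quoted for the three docked peptides can be placed on the pIC50 scale used by the QSAR model with the standard relation pIC50 = −log10(IC50 in mol/L). The short Python sketch below performs this conversion; the function name is hypothetical, and it is an assumption that the pIC50 values in Table 1 are based on molar concentrations.

```python
import math


def pic50_from_nanomolar(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_nm * 1e-9)


if __name__ == "__main__":
    # IC50 values (nM) quoted for peptides No. 2, 4 and 5.
    for peptide, ic50 in (("No. 2", 0.64), ("No. 4", 0.71), ("No. 5", 0.2)):
        print(peptide, round(pic50_from_nanomolar(ic50), 2))
```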
The molecular docking of the CGRP antagonists to the CGRP receptor also confirms that it is the C-terminal portion of the peptides that interacts with the receptor. This is in good agreement with previous investigations demonstrating that residues 28–37 (in the case of αCGRP) are required for high-affinity binding of peptides and that this region is almost certainly in direct contact with the receptor (19,29). The fact that the first 32 peptides of the data sets contain this region and demonstrate high-affinity binding, in contrast to the remaining 65 peptides of low activity, provides further evidence for the importance of this region in binding.
The findings of our docking study indicate that the CLR/RAMP1 extracellular domain complex used in this study can be used for peptide antagonist binding and as a valid functional surrogate for the full-length receptor complex, in agreement with recent experimental studies (8,26). The results of the docking study also clarified, for the first time, how peptides bind to a class B G-protein-coupled receptor and highlighted the challenges of designing potent antagonists for the treatment of migraine.