Keywords:

  • CCR5 antagonist;
  • CoMFA;
  • CoMSIA;
  • force field;
  • support vector machine

Abstract

CCR5 is a key coreceptor for HIV-1 entry into host cells and has become an attractive target for antiretroviral drug design. To date, six classes of CCR5 antagonists have been synthesized and evaluated. To search for more potent bioactive compounds, a non-linear support vector machine was used to construct relationship models for 103 oximino-piperidino-piperidine CCR5 antagonists. Comparative molecular field analysis and comparative molecular similarity indices analysis models were then constructed after aligning the compounds on their common substructure. Twenty-one structurally diverse compounds that were not included in the support vector machine, comparative molecular field analysis, or comparative molecular similarity indices analysis models were used to validate these models. The results show that the models have good predictive ability. The results obtained from the support vector machine and 3D quantitative structure–activity relationship models are compatible; however, the 3D quantitative structure–activity relationship models perform substantially better than both the support vector machine model and a previously reported pharmacophore model. These models can help to make quantitative predictions of bioactivity before in vitro and in vivo testing.

Entry of HIV-1 into host cells is a key step in viral infection, so inhibiting this process is an attractive strategy for antiretroviral intervention and drug design (1). The infection process proceeds as follows: the HIV envelope glycoprotein gp120 first binds the CD4 receptor (2); the envelope protein then changes its conformation and finally engages the chemokine coreceptor CCR5 or CXCR4 (3). The CCR5 receptor is therefore one of the important targets for anti-AIDS drug design. CCR5 antagonists are currently classified into six main categories (4): anilide derivatives (5), oximino-piperidino-piperidine derivatives (6), chiral piperazine-based derivatives (7), tropane-based derivatives (8), spirodiketopiperazine-based derivatives (9), and acyclic and cyclic scaffold-based derivatives (10). Because oximino-piperidino-piperidine antagonists are safe and generally well tolerated (4), they have been widely investigated. However, because no crystal structure of a CCR5 receptor complex has been released, only ligand-based molecular modeling methods can currently be used to investigate CCR5 receptor antagonists.

The support vector machine (SVM) is an algorithm developed for regression and classification (11). Owing to its remarkable generalization performance, the SVM has attracted wide attention and has been applied extensively in quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) studies for drug design (12–16). In this work, SVM was used to construct a regression model, which was then compared with comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) models. Together, these models help to clarify how substituents influence the bioactivity of CCR5 receptor antagonists and allow quantitative prediction of bioactivity before in vitro and in vivo testing.

Computational Methods and Materials

Data sets

The structures and activities of the CCR5 receptor antagonists (Ki in nM, determined from inhibition of RANTES binding to CCR5) were extracted from the literature (17) and are listed in Table 1. Compounds marked with '*' constitute the test set; the remaining compounds make up the training set.
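Because the tabulated pKi values range from about −3 to +1, they appear to be −log10 of Ki expressed in nM rather than in mol/L; this reading is our inference from the value range, not a definition stated in the source. Under that assumption, converting between Ki and the tabulated values is a one-liner in each direction:

```python
import math


def pki_from_ki_nm(ki_nm: float) -> float:
    """Return -log10(Ki) with Ki in nM (assumed convention of the activity tables)."""
    if ki_nm <= 0:
        raise ValueError("Ki must be positive")
    return -math.log10(ki_nm)


def ki_nm_from_pki(pki: float) -> float:
    """Inverse conversion: Ki in nM from a tabulated pKi value."""
    return 10.0 ** (-pki)


# Example: M1 (pKi = -1.76) corresponds to a Ki of roughly 58 nM under this convention.
print(f"{ki_nm_from_pki(-1.76):.1f} nM")
```

A larger pKi therefore means a smaller Ki and a more potent compound, which matches the choice of M77 (largest pKi) as the alignment template.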

Table 1.   Structures and bio-activities

SVM method

The support vector machine is a promising classification and regression method developed by Vapnik et al. (11). Detailed descriptions of SVM theory can be found in several books (18–21). SVM was originally developed for classification problems and was later extended to non-linear regression by introducing the ε-insensitive loss function.
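The ε-insensitive loss ignores any prediction error smaller than ε and grows linearly beyond it. A minimal sketch (the function names are ours, not from LIBSVM):

```python
from typing import Sequence


def eps_insensitive_loss(y_true: float, y_pred: float, eps: float) -> float:
    """Vapnik's epsilon-insensitive loss: max(0, |y - f(x)| - eps)."""
    return max(0.0, abs(y_true - y_pred) - eps)


def total_loss(ys: Sequence[float], preds: Sequence[float], eps: float) -> float:
    """Sum of the loss over a data set (the data-fit term of epsilon-SVR)."""
    return sum(eps_insensitive_loss(y, p, eps) for y, p in zip(ys, preds))


# With eps = 0.05 (the value chosen later in this work) an error of 0.04 costs nothing,
# while an error of 0.14 costs about 0.09.
print(eps_insensitive_loss(-1.76, -1.80, 0.05), eps_insensitive_loss(-1.76, -1.90, 0.05))
```

Errors inside the ε-tube do not influence the fitted function, which is why the support vectors are only the training points lying on or outside the tube.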

The SVM approach minimizes the structural risk rather than the empirical risk; that is, it aims to preserve good generalization ability rather than to optimize agreement with a given (limited) training set. It therefore constitutes a trade-off between the complexity of the model and its capability to reproduce experimental observations.
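For reference, the standard ε-SVR primal problem (the formulation LIBSVM solves; the notation below is ours, since the source gives no equations) makes this trade-off explicit: the ½‖w‖² term limits model complexity, while the C-weighted slack variables ξ_i and ξ_i* penalize deviations larger than ε, and φ denotes the feature map implied by the kernel.

$$
\begin{aligned}
\min_{\mathbf{w},\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)\\
\text{subject to}\quad & y_i - \mathbf{w}^\top\phi(\mathbf{x}_i) - b \le \varepsilon + \xi_i,\\
& \mathbf{w}^\top\phi(\mathbf{x}_i) + b - y_i \le \varepsilon + \xi_i^*,\\
& \xi_i,\ \xi_i^* \ge 0,\qquad i = 1,\dots,n.
\end{aligned}
$$

A large C pushes the model toward fitting the training data closely (lower empirical risk), while a small C favors a flatter, simpler function.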

The regression performance of SVM depends on the settings of the parameters C and ε, the kernel type, and the corresponding kernel parameters. Parameter C is a regularization constant that determines the trade-off between model complexity and the degree to which deviations larger than ε are tolerated in the optimization formulation. The kernel function and its parameters are another important factor, because they determine how the training samples are distributed in the high-dimensional feature space.

All SVM models in this study were implemented with the freely available LIBSVM program developed by C. W. Hsu and C. J. Lin (22). The radial basis function (RBF) was used as the kernel function; for this kernel, the most important parameter is the width (γ) of the RBF. All SVM calculations were scripted as MATLAB M-files.
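LIBSVM parameterizes the RBF kernel as K(x, z) = exp(−γ‖x − z‖²), so a larger γ gives a narrower kernel and a more flexible model. The sketch below evaluates the kernel and the Gram matrix for descriptor vectors; the function names and the toy vectors are illustrative only.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2), the RBF kernel as parameterized in LIBSVM."""
    if len(x) != len(z):
        raise ValueError("descriptor vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(X: List[Sequence[float]], gamma: float) -> List[List[float]]:
    """Symmetric matrix of pairwise kernel values, as used internally by an SVM."""
    return [[rbf_kernel(a, b, gamma) for b in X] for a in X]


# Two hypothetical (already scaled) descriptor vectors, gamma = 1.0 as selected in this work
K = gram_matrix([[0.2, 0.5], [0.3, 0.1]], gamma=1.0)
```

Descriptors with very different numeric ranges would dominate the squared distance, so they are normally scaled to a common range before the kernel is applied.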

Molecular modeling and alignment

Three-dimensional structures were built with the SYBYL program package (footnote a). Energy minimization was performed with the Tripos force field (23), a distance-dependent dielectric, and the Powell conjugate gradient algorithm with a convergence criterion of 0.01 kcal/(mol Å). Partial atomic charges were calculated with the Gasteiger–Hückel method. Fifty conformers of each training- and test-set compound were generated with the Multisearch module in SYBYL, and the minimum-energy conformer of each compound was selected to build the 3D-QSAR models.

CoMFA results can be highly sensitive to factors such as the alignment rule, the overall orientation of the aligned compounds, the lattice shift step size, and the probe atom type (24). The predictive accuracy of CoMFA models and the reliability of their contour plots depend strongly on the structural alignment of the molecules. Because all compounds in this investigation share a common substructure, they were aligned on that substructure (Figure 1). The molecule with the largest pKi value, M77, was used as the alignment template, and the alignment was performed with the SYBYL 'database align' routine.
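The source does not describe how SYBYL's 'database align' fits each molecule onto the template. A standard way to superimpose matched common-substructure atoms is a least-squares rigid-body fit (the Kabsch algorithm), sketched below with NumPy under the assumption that the atom correspondence between each molecule and M77 is already known; it is a generic illustration, not SYBYL's implementation.

```python
import numpy as np


def kabsch_fit(mobile: np.ndarray, reference: np.ndarray):
    """Least-squares rigid-body fit of matched atoms (both arrays shaped (n_atoms, 3)).

    Returns (R, t, rmsd) such that mobile @ R.T + t best matches reference.
    """
    if mobile.shape != reference.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("need two (n, 3) arrays with the same atom order")
    mu_m = mobile.mean(axis=0)
    mu_r = reference.mean(axis=0)
    P = mobile - mu_m              # centred coordinates of the molecule being aligned
    Q = reference - mu_r           # centred coordinates of the template substructure
    H = P.T @ Q                    # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # forbid improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_r - mu_m @ R.T
    fitted = mobile @ R.T + t
    rmsd = float(np.sqrt(((fitted - reference) ** 2).sum(axis=1).mean()))
    return R, t, rmsd
```

In practice R and t would be computed from the common-substructure atoms only and then applied to every atom of the molecule, so the substituents move rigidly with the fitted core.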


Figure 1.  Common substructure (line in bold) for investigated compounds.


CoMFA models

CoMFA builds statistical and graphical models that relate molecular structure to activity and can be used to predict the activity of designed compounds (25); the method has contributed to the development of several drugs currently on the market (26). Briefly, the CoMFA procedure is as follows. After the molecules are consistently aligned within a lattice, an sp3 carbon probe atom with a +1 net charge is placed at each lattice point, and the steric and electrostatic interaction energies between the probe and each molecule are calculated. Steric and electrostatic fields were generated with the standard CoMFA method in SYBYL using the default energy cutoff of 30 kcal/mol; electrostatic interactions were modeled with a Coulomb potential and van der Waals interactions with a Lennard-Jones potential. Regression analysis was carried out with the partial least-squares (PLS) method (27), and the final model was built with the number of components that gave the highest cross-validated q2. The full set of antagonists was initially divided into training and test sets in an approximate ratio of 4:1 (82 training-set and 21 test-set compounds), selected so that low-, moderate-, and high-activity compounds are present in roughly equal proportions in both sets.
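A minimal sketch of how the two CoMFA fields are sampled at lattice points, assuming a +1 sp3-carbon probe, a Coulomb term with a distance-dependent dielectric (ε = r), and a 6-12 Lennard-Jones term. The per-atom Lennard-Jones coefficients `lj_a` and `lj_b` are hypothetical placeholders rather than Tripos force-field values, and the symmetric ±30 kcal/mol truncation is a simplification; SYBYL's exact cutoff handling is not described in the source.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

COULOMB_KCAL = 332.06  # converts e^2/Å to kcal/mol
CUTOFF = 30.0          # kcal/mol, the default energy cutoff quoted in the text


@dataclass
class Atom:
    xyz: Tuple[float, float, float]
    charge: float  # partial charge in e (e.g. Gasteiger-Hückel)
    lj_a: float    # hypothetical repulsive (r^-12) coefficient for this atom-probe pair
    lj_b: float    # hypothetical attractive (r^-6) coefficient for this atom-probe pair


def _cap(energy: float) -> float:
    """Simplified symmetric truncation at +/- CUTOFF."""
    return max(-CUTOFF, min(CUTOFF, energy))


def probe_fields(point: Sequence[float], atoms: List[Atom],
                 probe_charge: float = 1.0) -> Tuple[float, float]:
    """Steric (Lennard-Jones 6-12) and electrostatic (Coulomb, dielectric = r) energies
    of a charged probe at one lattice point."""
    steric = 0.0
    elec = 0.0
    for atom in atoms:
        r2 = sum((a - p) ** 2 for a, p in zip(atom.xyz, point))
        r2 = max(r2, 1e-6)  # avoid division by zero when the point sits on an atom
        r6 = r2 ** 3
        steric += atom.lj_a / (r6 * r6) - atom.lj_b / r6
        # q1*q2/(eps*r) with a distance-dependent dielectric eps = r gives a 1/r^2 term
        elec += COULOMB_KCAL * probe_charge * atom.charge / r2
    return _cap(steric), _cap(elec)


def field_row(lattice: List[Sequence[float]], atoms: List[Atom]) -> List[float]:
    """One molecule's row of the PLS X-matrix: all steric values, then all electrostatic values."""
    pairs = [probe_fields(p, atoms) for p in lattice]
    return [s for s, _ in pairs] + [e for _, e in pairs]
```

Stacking one `field_row` per aligned molecule gives the descriptor matrix that PLS regresses against pKi.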

CoMSIA models

The standard CoMFA approach describes only potential-energy contributions to the binding constants and neglects, or covers insufficiently, entropic influences. To include such contributions, Klebe et al. proposed the CoMSIA method (28,29). Five physicochemical properties, corresponding to steric, electrostatic, hydrophobic, hydrogen bond donor, and hydrogen bond acceptor fields, were evaluated with the probe atom. A Gaussian-type distance dependence was used to describe how the contribution of each molecular atom attenuates with its distance from a lattice point; this leads to much smoother sampling of the fields around the molecules than in CoMFA. The default value of 0.3 was used as the attenuation factor.
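In the CoMSIA formulation of Klebe et al., the similarity index for property k at lattice point q is A_k(q) = −Σ_i w_probe,k · w_ik · exp(−α r_iq²), where w_ik is the property value of atom i, r_iq its distance to q, and α the attenuation factor (0.3 here). The sketch below evaluates this sum; the atom property values are hypothetical inputs, not values from this study.

```python
import math
from typing import Dict, List, Sequence

ALPHA = 0.3  # attenuation factor used in this work


def comsia_index(point: Sequence[float], atoms: List[Dict], prop: str,
                 probe_value: float = 1.0, alpha: float = ALPHA) -> float:
    """CoMSIA similarity index for one property at one lattice point.

    atoms: list of dicts with an 'xyz' coordinate tuple and one entry per property,
    e.g. {'xyz': (0.0, 0.0, 0.0), 'charge': -0.4, 'hydrophobic': 0.5}.
    Implements A = -sum_i w_probe * w_ik * exp(-alpha * r_iq^2).
    """
    total = 0.0
    for atom in atoms:
        r2 = sum((c - p) ** 2 for c, p in zip(atom["xyz"], point))
        total += probe_value * atom[prop] * math.exp(-alpha * r2)
    return -total
```

Because the Gaussian stays finite at r = 0, no energy cutoff is needed even at lattice points inside a molecule, which is the source of the smoother field sampling described above.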

Descriptor calculation

Twenty-seven charged partial surface area (CPSA) descriptors (30) and 56 VolSurf descriptors (31) were calculated with the default methods implemented in SYBYL. After a number of tests on the training-set compounds, only eight descriptors were retained: they correlate strongly with the bioactivity of the CCR5 antagonists, but their mutual dependence is not significant. They are molecular polarizability and dispersion forces (W2 and W4), three capacity factors (Cw3, Cw4, and Cw5), the ratio of the charge of the most negative atom to the total negative charge (RNCG), and the energies of the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO).
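The source describes the descriptor selection only as "a number of tests" guided by two criteria: strong correlation with activity and weak mutual dependence. One simple procedure consistent with those criteria is a greedy correlation filter, sketched below; it is not necessarily the authors' procedure, and both thresholds are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def greedy_select(descriptors: Dict[str, List[float]], activity: List[float],
                  min_r_act: float = 0.3, max_r_mutual: float = 0.7) -> List[str]:
    """Keep descriptors strongly correlated with activity but weakly inter-correlated.

    Candidates are visited in order of decreasing |r| with activity; a candidate is
    accepted only if its |r| with every already-accepted descriptor is below max_r_mutual.
    Both thresholds are hypothetical defaults.
    """
    ranked = sorted(((abs(pearson(v, activity)), name) for name, v in descriptors.items()),
                    reverse=True)
    chosen: List[str] = []
    for r_act, name in ranked:
        if r_act < min_r_act:
            break  # remaining candidates are even less correlated with activity
        if all(abs(pearson(descriptors[name], descriptors[c])) < max_r_mutual for c in chosen):
            chosen.append(name)
    return chosen
```

A filter of this kind should be run on the training set only, so that the test set remains an independent check of the final model.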

Regression analysis

To derive the 3D-QSAR models, PLS regression as implemented in the SYBYL package was performed with the standard CoMFA and CoMSIA descriptors as independent variables and pKi as the dependent variable. CoMFA descriptors were calculated with an sp3 carbon probe atom with a charge of +1.0, giving steric (Lennard-Jones potential) and electrostatic (Coulomb potential, distance-dependent dielectric) field energies at each lattice point. CoMSIA descriptors were generated with an sp3 carbon probe atom with a +1.0 charge and a Gaussian-type potential. The predictive ability of the models was evaluated by leave-one-out (LOO) cross-validation, and the cross-validated coefficient q2 was calculated with eqns 1 and 2 (32).

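$$q^2 = 1 - \frac{\mathrm{PRESS}}{\sum_{i}\left(y_i - y_m\right)^2} \qquad (1)$$

$$\mathrm{PRESS} = \sum_{i}\left(y_i - y_{\mathrm{pred},i}\right)^2 \qquad (2)$$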

where y_i is the observed activity of training-set compound i, y_m is the mean observed value corresponding to each cross-validation group, and y_pred,i is the predicted activity for y_i.
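The cross-validated q2 (q2 = 1 − PRESS/Σ(y_i − y_m)², with PRESS = Σ(y_i − y_pred,i)²) can be computed with a generic leave-one-out loop. In this sketch `fit_predict` is a hypothetical placeholder for any model (PLS in SYBYL, or an SVR), and y_m is taken as the mean of the compounds used to fit each leave-one-out model, which is our reading of "the mean observed value, corresponding to the values for each cross-validation group".

```python
from typing import Callable, List, Sequence

FitPredict = Callable[[List[Sequence[float]], List[float], Sequence[float]], float]


def loo_q2(X: List[Sequence[float]], y: List[float], fit_predict: FitPredict) -> float:
    """Leave-one-out q2 = 1 - PRESS / sum_i (y_i - y_m)^2."""
    press = 0.0
    ss = 0.0
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        y_pred = fit_predict(X_train, y_train, X[i])  # model fitted without compound i
        y_m = sum(y_train) / len(y_train)              # mean of this cross-validation group
        press += (y[i] - y_pred) ** 2
        ss += (y[i] - y_m) ** 2
    return 1.0 - press / ss
```

A model that always predicts the group mean gives q2 = 0, and a perfect predictor gives q2 = 1, which is why q2 values such as 0.731 (CoMFA) are read as good internal predictivity.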

Results and Discussion

SVM method

LOO cross-validation was used to build the SVM model and to select suitable descriptors. The regression performance of SVM depends on several factors: the kernel function type and its parameters, the capacity parameter C, and the ε of the insensitive loss function. To obtain the best generalization ability, these factors were optimized. The LIBSVM package offers four kernel functions: linear, polynomial, RBF, and sigmoid. For regression tasks, the RBF kernel is often used because of its effectiveness and speed in training (33), and it was also used in our SVM models. With the RBF kernel, three parameters (ε, γ, and C) must be chosen; the selection process and the effect of each parameter on generalization performance are shown in Figures 2–4. First, only γ was varied, from 0.10 to 1.60, and the mean standard error (MSE) of the training set was computed by LOO cross-validation for each value; the MSE–γ curve is shown in Figure 2, and the optimal γ was 1.0. Using the same procedure, the optimal ε was 0.05 (Figure 3) and the optimal cost factor C was 1000 (Figure 4). The experimental activities (EA), selected descriptors, and predicted activities (PA) are listed in the Supporting Information (Tables S1 and S2).
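The one-parameter-at-a-time search can be sketched as a leave-one-out scan over γ. The paper used LIBSVM's ε-SVR from MATLAB; to keep this sketch dependency-free, a simple RBF-weighted kernel average (Nadaraya–Watson regression) stands in for the SVR, so only the scanning logic, not the model, corresponds to the text. The 0.10 grid step is also an assumption.

```python
import math
from typing import List, Sequence, Tuple


def rbf(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def kernel_average(X_train, y_train, x, gamma: float) -> float:
    """Stand-in regressor (Nadaraya-Watson kernel average); NOT the SVR used in the paper."""
    w = [rbf(xi, x, gamma) for xi in X_train]
    total = sum(w)
    if total == 0.0:  # all weights underflowed: fall back to the training mean
        return sum(y_train) / len(y_train)
    return sum(wi * yi for wi, yi in zip(w, y_train)) / total


def loo_mse(X: List[Sequence[float]], y: List[float], gamma: float) -> float:
    """Mean squared leave-one-out prediction error over the training set."""
    errors = []
    for i in range(len(y)):
        pred = kernel_average(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], gamma)
        errors.append((y[i] - pred) ** 2)
    return sum(errors) / len(errors)


def scan_gamma(X, y, gammas) -> Tuple[float, List[Tuple[float, float]]]:
    """Vary gamma alone and return (best gamma, [(gamma, LOO MSE), ...])."""
    curve = [(g, loo_mse(X, y, g)) for g in gammas]
    best_gamma = min(curve, key=lambda point: point[1])[0]
    return best_gamma, curve


# gamma grid 0.10 ... 1.60 as in the text (step of 0.10 assumed)
GAMMAS = [round(0.1 * k, 2) for k in range(1, 17)]
```

To follow the text more closely, `kernel_average` would be replaced by an ε-SVR fit with the current (C, ε, γ), and the same loop repeated for ε and then for C with the other two parameters held fixed.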


Figure 2.  γ versus mean standard error of the training set based on leave-one-out cross-validation (C = 1000, ε = 0.07).


Figure 3.  ε versus mean standard error of the training set based on leave-one-out cross-validation (C = 1000, γ = 1.0).


Figure 4.  Cost versus mean standard error of the training set based on leave-one-out cross-validation (ε = 0.05, γ = 1.0).


Examination of SVM model

For the training set, the correlation coefficient r2 between EA and PA is 0.904, with a standard error of estimate (SEE) of 0.210; for the test set, r2 is 0.742 with an SEE of 0.312 (M3 and M38 are outliers that cannot be predicted by this SVM model). The correlations between EA and PA for the training and test sets are shown in Figure 5. Overall, the model reflects the relationship between EA and PA reasonably well.
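The r2 and SEE statistics quoted throughout can be computed from paired EA and PA lists as below. Here r2 is taken as the squared Pearson correlation between EA and PA, and SEE as sqrt(Σ(EA − PA)²/(n − c − 1)), the usual QSAR definition with c model terms (for example, PLS components); the source does not state which degrees-of-freedom correction was used for the SVM model, so `n_terms` is an assumption.

```python
import math
from typing import Sequence, Tuple


def r2_and_see(ea: Sequence[float], pa: Sequence[float], n_terms: int = 0) -> Tuple[float, float]:
    """Return (r2, SEE) for experimental (EA) vs predicted (PA) activities.

    r2  : squared Pearson correlation between EA and PA.
    SEE : sqrt(sum (EA - PA)^2 / (n - n_terms - 1)); n_terms (e.g. PLS components)
          is an assumed degrees-of-freedom correction.
    """
    n = len(ea)
    if n != len(pa) or n - n_terms - 1 <= 0:
        raise ValueError("need paired lists with n > n_terms + 1")
    me, mp = sum(ea) / n, sum(pa) / n
    sxy = sum((a - me) * (b - mp) for a, b in zip(ea, pa))
    sxx = sum((a - me) ** 2 for a in ea)
    syy = sum((b - mp) ** 2 for b in pa)
    r2 = sxy * sxy / (sxx * syy)
    see = math.sqrt(sum((a - b) ** 2 for a, b in zip(ea, pa)) / (n - n_terms - 1))
    return r2, see
```

Note that a constant offset between EA and PA leaves r2 unchanged but increases SEE, which is why both statistics are reported.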


Figure 5.  Correlation between experimental and predicted activity of support vector machine model.


3D-QSAR model

Two methods, CoMFA and CoMSIA, were used to construct 3D-QSAR models for the CCR5 receptor antagonists. The alignment of the 103 training- and test-set compounds is shown in Figure 6, the statistical parameters of the models are given in Table 2, and the PA values and the residuals between EA and PA are listed in Tables 3 and 4.


Figure 6.  Alignment of compounds for training and test set.


Table 2.   Partial least-squares (PLS) statistical parameters of comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) models (a)

PLS parameter    CoMFA    CoMSIA
                 SE       SE       SEH      SED      SEA      SEHD     SEHA     SEDA     SEHDA
q2               0.731    0.528    0.520    0.374    0.451    0.479    0.487    0.391    0.462
r2               0.971    0.928    0.956    0.873    0.941    0.942    0.939    0.904    0.956
SEE              0.129    0.203    0.158    0.271    0.184    0.182    0.187    0.235    0.160
F                420.956  161.443  273.763  85.659   200.432  204.199  232.846  117.623  269.178
PLS components   6        6        6        6        6        6        5        6        6
Steric           0.573    0.366    0.188    0.266    0.253    0.150    0.146    0.187    0.120
Electrostatic    0.427    0.634    0.352    0.515    0.396    0.290    0.264    0.331    0.219
Hydrophobic      –        –        0.460    –        –        0.401    0.349    –        0.319
Acceptor         –        –        –        –        0.351    –        0.241    0.296    0.202
Donor            –        –        –        0.218    –        0.158    –        0.186    0.140

(a) S, steric field; E, electrostatic field; H, hydrophobic field; D, hydrogen bond donor field; A, hydrogen bond acceptor field; q2, leave-one-out cross-validated correlation coefficient; r2, non-cross-validated correlation coefficient; SEE, standard error of estimate; F, F-test value.

Table 3.   Experimental and predicted activities (PA) of the training set, with results from previous work

No.     Exp. pKi   CoMFA             CoMSIA            Previous work (17)
                   PA       Res.     PA       Res.
M1      −1.76      −1.85     0.09    −1.98     0.22    −1.15
M2      −1.83      −1.76    −0.07    −1.83     0.00    −1.89
M4      −1.40      −1.41     0.01    −1.46     0.06    −0.49
M5      −1.74      −1.73    −0.01    −1.67    −0.07    −1.88
M6      −2.04      −2.12     0.08    −2.07     0.03    −2.49
M8      −1.93      −1.93     0.00    −1.84    −0.10    −2.54
M9      −1.52      −1.46    −0.06    −1.41    −0.11    −2.72
M10     −2.78      −2.62    −0.16    −2.69    −0.09    −2.46
M11     −2.56      −2.56     0.00    −2.61     0.05    −2.66
M12     −1.73      −1.85     0.12    −1.58    −0.16    −1.28
M14     −1.79      −1.70    −0.09    −1.73    −0.06    −1.04
M15     −0.90      −0.85    −0.05    −0.82    −0.08    −2.11
M16     −1.46      −1.64     0.18    −1.67     0.21    −0.58
M18     −0.30      −0.34     0.04    −0.31     0.01    −0.30
M19     −1.68      −1.65    −0.03    −1.50    −0.18    −1.20
M21     −1.89      −2.06     0.17    −2.05     0.16    −1.89
M22     −0.95      −0.86    −0.09    −0.94    −0.01    −0.78
M23     −0.60      −0.61     0.00    −0.56    −0.04    −0.51
M25     −1.52      −1.36    −0.16    −1.42    −0.10    −0.51
M26     −1.48      −1.57     0.09    −1.38    −0.10    −0.43
M28     −1.26      −1.37     0.11    −1.27     0.01    −0.56
M29     −0.65      −0.43    −0.22    −0.21    −0.44    −0.83
M30     −0.48      −0.45    −0.03    −0.34    −0.14    −0.48
M31     −0.04      −0.20     0.16    −0.30     0.26    −0.48
M33     −1.41      −1.43     0.02    −1.13    −0.28    −0.66
M34     −1.48      −1.51     0.03    −1.33    −0.15    −0.40
M35     −0.85      −0.83    −0.02    −0.76    −0.09    −0.40
M36     −1.63      −1.57    −0.06    −1.55    −0.08    −0.43
M37     −1.48      −1.52     0.04    −1.42    −0.06    −0.43
M39     −1.48      −1.45    −0.03    −1.50     0.02    −0.56
M40     −1.48      −1.66     0.18    −1.45    −0.03    −0.28
M41     −1.20      −1.38     0.18    −1.26     0.06    −0.38
M42     −1.28      −1.28     0.00    −1.09    −0.19    −0.38
M44     −0.88      −0.75    −0.13    −0.75    −0.13    −0.72
M45     −1.48      −1.49     0.01    −1.38    −0.10    −2.63
M46     −1.38      −1.35    −0.03    −1.30    −0.08    −0.56
M47     −0.58      −0.63     0.05    −0.57    −0.01    −0.42
M49     −1.38      −1.22    −0.08    −1.27    −0.03    −2.58
M50     −0.75      −0.75     0.00    −0.92     0.17    −0.34
M51     −1.48      −1.46    −0.02    −1.70     0.22    −2.78
M52     −3.11      −2.81    −0.30    −2.85    −0.26    −2.42
M54     −1.79      −1.82     0.03    −2.09     0.30    −1.70
M55     −1.48      −1.44    −0.04    −1.40    −0.08    −2.43
M56     −1.30      −1.28    −0.02    −1.63     0.33    −2.43
M57     −0.90      −1.16     0.26    −0.95     0.05    −2.11
M59     −1.08      −1.10     0.02    −0.92    −0.16    −1.28
M60     −0.85      −0.94     0.09    −0.89     0.04    −0.98
M61      0.00      −0.04     0.04     0.12    −0.12    −0.53
M62     −0.70      −0.63    −0.07    −0.61    −0.09    −1.23
M64     −0.48      −0.37    −0.11    −0.49     0.01    −2.11
M65     −0.70      −0.89     0.19    −0.72     0.02    −1.71
M66     −1.18      −1.14    −0.04    −1.22     0.04    −1.86
M67      0.15      −0.063    0.21    −0.13     0.28    −1.64
M69     −0.72      −0.75     0.03    −0.95     0.23    −1.28
M70     −1.26      −1.29     0.03    −1.39     0.13    −2.08
M71     −0.48      −0.23    −0.25    −0.53     0.05    −1.00
M72     −1.00      −0.87    −0.13    −0.86    −0.14    −0.80
M74     −1.08      −1.17     0.09    −0.99    −0.09    −1.34
M75     −1.04      −0.92    −0.12    −1.13     0.09    −1.46
M76     −1.58      −1.80     0.22    −1.44    −0.14    −0.88
M77      1.00       0.74     0.26     0.68     0.32    −2.56
M78     −2.77      −2.61    −0.16    −2.51    −0.26    −0.40
M80     −0.36      −0.50     0.14    −0.45     0.09    −0.38
M81     −0.94      −0.83    −0.11    −0.86    −0.09    −0.30
M82      0.40       0.24     0.16     0.38     0.02    −0.72
M83     −0.08       0.01    −0.09     0.048   −0.13    −0.69
M84      0.15       0.21    −0.05     0.15     0.00    −0.26
M86     −0.34       0.11    −0.45    −0.043   −0.30    −0.32
M87     −0.30      −0.36     0.05    −0.37     0.06    −0.41
M88     −1.15      −1.00    −0.15    −1.39     0.24    −0.61
M89     −0.32      −0.25    −0.07    −0.30    −0.02    −0.38
M91     −0.52      −0.58     0.06    −0.69     0.17    −0.46
M92     −0.32      −0.49     0.17    −0.55     0.22    −0.52
M93     −0.51      −0.48    −0.03    −0.58     0.07    −0.49
M94     −0.76      −0.67    −0.09    −0.88     0.12     0.42
M96     −2.50      −2.59     0.09    −2.70     0.20    −0.51
M97     −1.63      −1.67     0.04    −1.75     0.12    −0.61
M98     −1.40      −1.41     0.01    −1.34    −0.06    −0.52
M99     −0.72      −0.70    −0.02    −0.68    −0.04    −2.04
M101    −0.85      −0.85     0.00    −0.89     0.04    −2.18
M102    −0.36      −0.24    −0.13    −0.29    −0.07    −1.15
M103    −0.85      −0.85     0.00    −1.03     0.18    −1.15

CoMFA, comparative molecular field analysis; CoMSIA, comparative molecular similarity indices analysis; Res., residual (Exp. pKi minus PA).

Table 4.   Experimental and predicted activities (PA) of the test set, with results from previous work

No.     Exp. pKi   CoMFA             CoMSIA            Previous work (17)
                   PA       Res.     PA       Res.
M3      −1.52      −1.54     0.02    −1.52     0.00    −1.28
M7      −2.19      −2.26     0.07    −2.24     0.05    −2.48
M13     −2.28      −2.33     0.05    −2.22    −0.06    −1.38
M17     −1.40      −1.32    −0.08    −1.32    −0.08    −0.96
M20     −0.30      −0.35     0.05    −0.32     0.02    −0.30
M24     −1.52      −1.60     0.08    −1.53     0.02    −0.46
M27     −0.53      −0.58     0.05    −0.18    −0.35    −0.53
M32     −0.04      −0.38     0.34    −0.35     0.31    −0.04
M38     −0.32      −1.22     0.90    −0.57     0.24    −0.32
M43     −0.72      −0.69    −0.04    −0.77     0.05    −0.89
M48     −0.64      −0.74     0.09    −0.58    −0.06    −1.42
M53     −2.64      −2.29    −0.36    −2.57    −0.07    −2.58
M58     −0.70      −0.77     0.07    −0.68    −0.02    −1.26
M63     −0.70      −0.63    −0.07    −0.80     0.10    −0.89
M68     −0.43      −0.82     0.39    −0.82     0.38    −1.41
M73     −1.00      −1.12     0.12    −0.94    −0.07    −1.04
M79      0.52       0.14     0.11     0.49     0.03     0.19
M85     −0.67      −0.67    −0.01    −0.69     0.02    −0.62
M90     −1.04      −1.08     0.04    −1.11     0.07    −0.64
M95     −1.78      −1.86     0.08    −1.73    −0.05     1.15
M100    −0.11       0.07    −0.18     0.01    −0.12    −0.49

CoMFA, comparative molecular field analysis; CoMSIA, comparative molecular similarity indices analysis; Res., residual (Exp. pKi minus PA).

Selection of CoMSIA fields

The CoMSIA model has five fields, and its performance depends strongly on how they are combined. To obtain the optimal result, we systematically varied the field combination and chose the one giving the best cross-validated and non-cross-validated statistics, the smallest SEE, and the largest F value. Figure 7 shows these parameters for the different field combinations. The model combining steric, electrostatic, and hydrophobic fields (SEH) gave the highest r2 (tied with SEHDA), the lowest SEE, and the largest F among the CoMSIA models, with a q2 (0.520) close to the best value (0.528, SE combination; Table 2). It was therefore chosen as the best CoMSIA model, and the contour plots are analyzed with this model.
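The field-combination search can be re-run from the CoMSIA statistics in Table 2 (values copied from the table). This sketch enumerates the eight combinations of the steric and electrostatic fields with any subset of the hydrophobic, donor, and acceptor fields, and reports the best combination for each statistic separately; weighing the criteria against each other remains a judgment call, as in the text.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# CoMSIA statistics copied from Table 2: (q2, r2, SEE, F) for each field combination
TABLE2: Dict[str, Tuple[float, float, float, float]] = {
    "SE": (0.528, 0.928, 0.203, 161.443),
    "SEH": (0.520, 0.956, 0.158, 273.763),
    "SED": (0.374, 0.873, 0.271, 85.659),
    "SEA": (0.451, 0.941, 0.184, 200.432),
    "SEHD": (0.479, 0.942, 0.182, 204.199),
    "SEHA": (0.487, 0.939, 0.187, 232.846),
    "SEDA": (0.391, 0.904, 0.235, 117.623),
    "SEHDA": (0.462, 0.956, 0.160, 269.178),
}


def field_combinations(extra: str = "HDA") -> List[str]:
    """Steric + electrostatic plus every subset of the H, D and A fields (8 combinations)."""
    return ["SE" + "".join(sub)
            for k in range(len(extra) + 1)
            for sub in combinations(extra, k)]


def best_by(index: int, larger_is_better: bool = True) -> str:
    """Best combination for one statistic (0 = q2, 1 = r2, 2 = SEE, 3 = F); ties keep the first."""
    names = field_combinations()
    pick = max if larger_is_better else min
    return pick(names, key=lambda name: TABLE2[name][index])


for label, idx, larger in [("q2", 0, True), ("r2", 1, True), ("SEE", 2, False), ("F", 3, True)]:
    print(label, best_by(idx, larger))
```

Run as written, SEH wins on r2 (tied with SEHDA), SEE, and F, while the plain SE combination has the highest q2.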


Figure 7.  The influence of different combined fields.


Evaluation of CoMFA and CoMSIA models

We now examine the correlations between EA and PA for the CCR5 antagonists.

For the CoMFA model, the cross-validated q2 of the training set is 0.731 with six PLS components, and the non-cross-validated r2 is 0.971 with an SEE of 0.129. Twenty-one structurally diverse compounds that were not included in the CoMFA and CoMSIA models were used as a validation set. For this test set, r2 between EA and PA is 0.927, with an SEE of 0.222. The correlations between EA and PA for the training and test sets are shown in Figure 8. With the exception of M38 (residual 0.90), the test-set compounds are predicted well by the CoMFA model. We cannot give an exact explanation for this large deviation; it may be caused by experimental error or by insufficient correction factors. For the CoMSIA(SEH) model, the cross-validated q2 of the training set is 0.520 with six PLS components, and the non-cross-validated r2 is 0.956 with an SEE of 0.158. For the test set, r2 is 0.967 with an SEE of 0.155; these correlations are also shown in Figure 8. The absolute residuals of all test-set compounds are below 0.40, which suggests that the CoMSIA(SEH) model has good predictive ability.


Figure 8.  Correlations between experimental and predicted activity for comparative molecular field analysis and comparative molecular similarity indices analysis model.


Analysis of CoMFA model

The PLS statistical parameters of the CoMFA model are summarized in Table 2. The steric and electrostatic field contributions are 0.573 and 0.427, respectively.

Figure 9 shows the contour plots of the CoMFA model with structure M77; the meaning of each color is given in the figure caption. Red regions near the 4-position of the aromatic ring of substituent R3 (a pyridine ring in M77) suggest that negatively charged groups are favorable to bioactivity. This could explain why compound M77, with a pyridine, is more active than compound M74, with a phenyl substituent. For compounds M93, M94, and M95, the activity order follows their electronegativity: M93 (–CF3) > M94 (–NH2) > M95 (–NHCOCH3). Blue regions near positions 2 and 6 of the phenyl ring of substituent R3 suggest that positively charged groups are favorable to activity, which could explain the activity orders M64 (2,6-diCH3) > M65 (2-Cl-6-NH2), M18 (2,6-diCH3) > M27 (2-Cl-6-NH2), and M58 (2,6-diCH3) > M59 (2-Cl-6-NH2). Green, blue, and yellow regions near the C=N–O substituent of R1 suggest that suitably bulky, positively charged groups are favorable to activity. This could explain why M88 (–CH2–CF3), with a negatively charged group, is less active than M89 (–CH2–C3H7), M90 (–CH2–C3H5), and M91 (–CH2–C2H6), which carry positively charged groups. Likewise, M23, with a positively charged group, is more active than M24, with a negatively charged group, again consistent with the CoMFA model. In addition, the –CH2–C2H6 substituent of M91 is less bulky than –CH2–C3H7 (M89) and –CH2–C3H5 (M90), and M91 is more active than M88 and M90. The blue region near the R1 substituent of the trans isomers M18 and M20 indicates that these trans isomers are more active than the corresponding cis isomers, M17 and M19, respectively.

Green, red, and yellow regions near position 4 of the phenyl ring of R1 suggest that suitably bulky, negatively charged groups are favorable to activity. This could explain the activity orders M4 (–CF3) > M3 (–I) > M1 (–Br) > M2 (–Cl) > M6 (–H) > M7 (–OCH3) and M43 (–CF3) > M44 (–OCF3) > M45 (–SO2CH3). It also accounts for the very low activity of M11 (–C6H5), which carries a bulky, electropositive group. Yellow regions near substituents R2 and R4 suggest that small groups are favorable to activity, which is also consistent with the experimental observations.


Figure 9.  The contour plots of comparative molecular field analysis steric and electrostatic fields. Green contours indicate regions where bulky groups increase activity, whereas yellow contours indicate regions where bulky groups decrease activity. Blue contours indicate regions where positive groups increase activity, whereas red contours indicate regions where negative charge increases activity.


Analysis of CoMSIA model

The PLS statistics of the CoMSIA model are also summarized in Table 2. The steric, electrostatic, and hydrophobic field contributions are 0.188, 0.352, and 0.460, respectively.

Figure 10 shows the contour plots of the CoMSIA model with structure M77; the meaning of each color is given in the figure caption. Red regions near the 4-position of the aromatic ring of substituent R3 suggest that negatively charged groups are favorable to activity, in agreement with the CoMFA result. Blue and purple regions near positions 2 and 6 of the phenyl ring of substituent R3 suggest that positively charged and hydrophobic groups are favorable to activity. This could explain why M65, M27, and M59 (2-Cl-6-NH2), which carry hydrophilic groups, are less active than M64, M18, and M58 (2,6-diCH3), which carry hydrophobic groups. White, purple, and green regions near the C=N–O substituent of R1 suggest that suitably hydrophobic, bulky groups are favorable to activity. The logP values of the substituents of M89 (–CH2–C3H7), M90 (–CH2–C3H5), and M91 (–C3H7) are 2.11, 1.54, and 1.63, respectively, and the activities of these compounds follow the same order (M89 > M91 > M90), consistent with the indication of the CoMSIA model. Purple regions near position 4 of the phenyl ring of R1 suggest that hydrophobic groups are favorable to activity. Comparison of the CoMFA and CoMSIA contour plots shows that the CoMSIA model can reflect the influence of the hydrophobic field, which compensates for a shortcoming of the CoMFA model.


Figure 10.  The contour plots of comparative molecular similarity indices analysis steric, electrostatic, and hydrophobic fields. Green contours indicate regions where bulky groups increase activity, whereas yellow contours indicate regions where bulky groups decrease activity. Blue contours indicate regions where positive groups increase activity, whereas red contours indicate regions where negative charge increases activity. White contours indicate regions where hydrophobicity is unfavorable to activity, whereas purple contours indicate regions where hydrophobicity is favorable to activity.


SVM versus 3D-QSAR models

The descriptors used to construct the SVM model are W2, W4, Cw3, Cw4, Cw5, RNCG, HOMO, and LUMO. W2 and W4 describe molecular polarizability and dispersion forces and can be assigned to the steric field of CoMFA and CoMSIA. RNCG is related to negative charge and represents the electrostatic interaction between receptor residues and the compound. Cw3, Cw4, and Cw5 represent hydrophilic capacity and may therefore reflect the hydrophobic properties of a molecule, consistent with the CoMSIA model, in which the hydrophobic field contributes 0.460. The HOMO and LUMO are localized on the aromatic moiety of the compounds, and their energies describe the ability of the aromatic ring to form π–π interactions with aromatic residues of the receptor (34). It is therefore plausible that the HOMO and LUMO descriptors relate to activity through such interactions, in agreement with the finding that a phenyl substituent at position R3 is favorable to activity. The SVM and 3D-QSAR models thus explain and complement each other. However, the test-set r2 between EA and PA is 0.742 for the SVM model, compared with 0.927 and 0.967 for the CoMFA and CoMSIA models on the same data set, which suggests that the 3D-QSAR models are substantially better than the SVM model.

Comparison with previously reported work

Debnath (17) studied these CCR5 antagonists (see Tables 3 and 4) with pharmacophore models: 25 training-set compounds were used to construct 10 pharmacophore models, each containing at least one hydrophobic feature, which indicates that activity is significantly correlated with molecular hydrophobicity. In our CoMSIA model, the hydrophobic field contributes 0.460, making it the main factor influencing activity among the three fields used; our CoMSIA model is therefore consistent with the previous work. The activities estimated with the pharmacophore model are also listed in Tables 3 and 4. The correlation between EA and PA for all activity data is shown in Figure 11A [excluding M61 and M103, whose residuals between estimated and experimental activities were too large in Debnath's results (17)]. The corresponding r2 is 0.330, whereas the EA–PA r2 of our CoMSIA model over the combined training and test sets is 0.958 (Figure 11B). Although this comparison is somewhat biased because the two studies used different training and test sets, it indicates that our model has better predictive ability.


Figure 11.  Correlation between experimental activities and predicted activities. (A: previous work; B: this work).


Conclusion

The non-linear SVM method can be used to construct activity models for CCR5 antagonists. An SVM model with eight descriptors (W2, W4, Cw3, Cw4, Cw5, RNCG, HOMO, and LUMO) describes this relationship well; r2 between EA and PA for the training set is 0.904. CoMFA and CoMSIA models were then constructed after alignment on the common substructure. The 3D-QSAR investigations suggest that the following features could be favorable to activity: a negatively charged group at position 4 of substituent R3; positively charged and hydrophobic groups at positions 2 and 6 of substituent R3; suitably bulky, hydrophobic, and positively charged groups on the C=N–O substituent of R1; and suitably bulky, hydrophobic, and negatively charged groups at position 4 of the phenyl ring of R1. These results can help to make quantitative predictions of activity. The SVM and 3D-QSAR models explain and complement each other, but the 3D-QSAR models are substantially better than the SVM model. For the same set of compounds, our 3D-QSAR and SVM models give better results than the pharmacophore model.

Footnotes
  a. SYBYL [computer program], Version 6.9. St Louis, MO: Tripos Associates Inc.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grants No. 30770502 and No. 20773085) and in part by grants from the Ministry of Science and Technology of China and the National 863 High-Tech Program (2007DFA31040).

References

  • 1
    LaBonte J., Lebbos J., Kirkpatrick P. (2003) Enfuvirtide. Nat Rev Drug Discov;2:345–346.
  • 2
    Maddon P.J., Dalgleish A.G., McDougal J.S., Clapham P.R., Weiss R.A., Axel R. (1986) The T4 gene encodes the AIDS virus receptor and is expressed in the immune system and the brain. Cell;47:333–348.
  • 3
    Berger E.A., Murphy P.M., Farber J.M. (1999) Chemokine receptors as HIV-1 coreceptors: roles in viral entry, tropism, and disease. Annu Rev Immunol;17:657–700.
  • 4
    Palani A., Tagat J.R. (2006) Discovery and development of small-molecule chemokine coreceptor CCR5 antagonists. J Med Chem;49:2851–2857.
  • 5
    Shiraishi M., Aramaki Y., Seto M., Imoto H., Nishikawa Y., Kanzaki N., Okamoto M., Sawada H., Nishimura O., Baba M., Fujino M. (2000) Discovery of novel, potent, and selective small-molecule CCR5 antagonists as anti-HIV-1 agents: synthesis and biological evaluation of anilide derivatives with a quaternary ammonium moiety. J Med Chem;43:2049–2063.
  • 6
    Palani A., Shapiro S., Clader J.W., Greenlee W.J., Cox K., Strizki J., Endres M., Baroudy B.M. (2001) Discovery of 4-[(Z)-(4-bromophenyl)-(ethoxyimino)methyl]-1′-[(2,4-dimethyl-3-pyridinyl)carbonyl]-4′-methyl-1,4′-bipiperidine N-oxide (SCH 351125): an orally bioavailable human CCR5 antagonist for the treatment of HIV infection. J Med Chem;44:3339–3342.
  • 7
    Tagat J.R., McCombie S.W., Steensma R.W., Lin S., Nazareno D.V., Baroudy B., Vantuno N., Xu S., Liu J. (2001) Piperazine-based CCR5 antagonists as HIV-1 inhibitors. I: 2(S)-methyl piperazine as a key pharmacophore element. Bioorg Med Chem Lett;11:2143–2146.
  • 8
    Wood A., Armour D. (2005) The discovery of the CCR5 receptor antagonist, UK-427,857, a new agent for the treatment of HIV infection and AIDS. Prog Med Chem;43:239–271.
  • 9
    Maeda K., Yoshimura K., Shibayama S., Habashita H., Tada H., Sagawa K., Miyakawa T., Aoki M., Fukushima D., Mitsuya H. (2001) Novel low molecular weight spirodiketopiperazine derivatives potently inhibit R5 HIV-1 infection through their antagonistic effects on CCR5. J Biol Chem;276:35194–35200.
  • 10
    Mills S.G., DeMartino J.A. (2004) Chemokine receptor-directed agents as novel anti-HIV-1 therapies. Curr Top Med Chem;4:1017–1033.
  • 11
    Vapnik V., Chapelle O. (2000) Bounds on error expectation for support vector machines. Neural Comput;12:2013–2036.
  • 12
    Chen H.F. (2008) Computational study of histamine H3-receptor antagonist with support vector machines and three dimension quantitative structure activity relationship methods. Anal Chim Acta;624:203–209.
  • 13
    Chen H.F. (2008) Computational study of the binding mode of epidermal growth factor receptor kinase inhibitors. Chem Biol Drug Des;71:434–446.
  • 14
    Chen H.F. (2008) Quantitative predictions of gas chromatography retention indexes with support vector machines, radial basis neural networks and multiple linear regression. Anal Chim Acta;609:24–36.
  • 15
    Chen H.F., Wu M.Y., Wang Z., Wei D.Q. (2007) Insight into the metabolism rate of quinone analogues from molecular dynamics simulation and 3D-QSMR methods. Chem Biol Drug Des;70:290–301.
  • 16
    Yao X.J., Panaye A., Doucet J.P., Zhang R.S., Chen H.F., Liu M.C., Hu Z.D., Fan B.T. (2004) Comparative study of QSAR/QSPR correlations using support vector machines, radial basis function neural networks, and multiple linear regression. J Chem Inf Comput Sci;44:1257–1266.
  • 17
    Debnath A.K. (2003) Generation of predictive pharmacophore models for CCR5 antagonists: study with piperidine- and piperazine-based compounds as a new class of HIV-1 entry inhibitors. J Med Chem;46:4501–4515.
  • 18
    Cristianini N., Shawe-Taylor J. (2000) An Introduction to Support Vector Machines. Cambridge, UK: Cambridge University Press.
  • 19
    Joachims T. (2002) Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms. Norwell: Kluwer.
  • 20
    Schölkopf B., Smola A. (2002) Learning with Kernels. Cambridge, MA: MIT Press.
  • 21
    Herbrich R. (2002) Learning Kernel Classifiers. Cambridge, MA: MIT Press.
  • 22
    Hsu C.W., Lin C.J. (2002) A comparison of methods for multiclass support vector machines. IEEE Trans Neural Netw;13:415–425.
  • 23
    Clark M., Cramer R.D., Van Opdenbosch N. (1989) The Tripos force field. J Comput Chem;10:982–1012.
  • 24
    Cho S.J., Tropsha A. (1995) Cross-validated R2-guided region selection for comparative molecular field analysis: a simple method to achieve consistent results. J Med Chem;38:1060–1066.
  • 25
    Cramer R.D., Patterson D.E., Bunce J.D. (1988) Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc;110:5959–5967.
  • 26
    Boyd D.B. (1990) Successes of Computer-Assisted Molecular Design. New York: VCH Publishers.
  • 27
    Clark M., Cramer R.D. (1993) The probability of chance correlation using partial least squares (PLS). Quant Struct-Act Relat;12:137–145.
  • 28
    Böhm M., Stürzebecher J., Klebe G. (1999) Three-dimensional quantitative structure-activity relationship analyses using comparative molecular field analysis and comparative molecular similarity indices analysis to elucidate selectivity differences of inhibitors binding to trypsin, thrombin, and factor Xa. J Med Chem;42:458–477.
  • 29
    Klebe G., Abraham U., Mietzner T. (1994) Molecular similarity indices in a comparative analysis (CoMSIA) of drug molecules to correlate and predict their biological activity. J Med Chem;37:4130–4146.
  • 30
    Stanton D.T., Jurs P.C. (1990) Development and use of charged partial surface area descriptors in computer-assisted quantitative structure-property relationship studies. Anal Chem;62:2323–2329.
  • 31
    Raichurkar A.V., Kulkarni V.M. (2003) Understanding the antitumor activity of novel hydroxysemicarbazide derivatives as ribonucleotide reductase inhibitors using CoMFA and CoMSIA. J Med Chem;46:4419–4427.
  • 32
    Leach A.R. (2001) Molecular Modelling: Principles and Applications. London: Henry Ling Ltd Press.
  • 33
    Keerthi S.S., Lin C.J. (2003) Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Comput;15:1667–1689.
  • 34
    Macchiarulo A., De Luca L., Costantino G., Barreca M.L., Gitto R., Pellicciari R., Chimirri A. (2004) QSAR study of anticonvulsant negative allosteric modulators of the AMPA receptor. J Med Chem;47:1860–1863.

Supporting Information

Table S1. The experimental and predicted activity of training set with SVM model.

Table S2. The experimental and predicted activity of test set with SVM model.

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

Filename                      Size   Description
CBDD_935_sm_TableS1-S2.doc    224K   Supporting info item