Modeling the Binding Affinity of p38α MAP Kinase Inhibitors by Partial Least Squares Regression

Authors


Corresponding authors: Maria Cristina Menziani, menziani@unimore.it; Marina Cocchi, marina.cocchi@unimore.it

Abstract

The p38 mitogen-activated protein kinase is activated by environmental stress and cytokines and plays a role in transcriptional regulation and inflammatory responses. In this paper, factors influencing the activity and selectivity of p38α mitogen-activated protein kinase inhibitors are investigated by inspecting the binding orientation and the possible residue-inhibitor interactions in the binding site. The binding pattern of a set of 45 different inhibitors against p38α mitogen-activated protein kinase was studied through molecular dynamics simulations of the protein-inhibitor complexes. Partial Least Squares regression was then used to develop a Quantitative Structure-Activity Relationship model to predict the binding affinities of the ligands. The selected model predicted the test set with a Root Mean Square Error of Prediction of 1.36 in −Log IC50 units. The regression coefficients and the Variable Importance in Projection plots highlighted the residue-inhibitor interactions with the largest absolute effect on ligand binding: van der Waals interactions with LYS50, ILE81, and ASP165; electrostatic interactions with SER29 and LEU164; hydrogen bonds with MET106; and total interaction energy with SER29 and LEU83.

Signal transduction via mitogen activated protein (MAP) kinases plays a key role in a plethora of cellular responses (1). Mitogen activated protein kinases constitute an evolutionarily conserved family of enzymes that form a highly integrated network required to achieve specialized cell functions controlling cell differentiation, cell proliferation, and cell death. These cytoplasmic proteins modulate the activities of other intracellular proteins by adding phosphate groups to their serine/threonine amino acids (2). Hence, MAP kinases provide a focal point for understanding the control of cellular events by growth factors and stresses. Consequently, selective inhibitors against specific kinases can be used to treat various disorders such as cancer, arthritis, and diabetes. Most protein kinase inhibitors are small molecules that either interfere with phosphorylation or bind the ATP binding site, an area within the activation loop of the MAP kinase in which the dual phosphorylation takes place. The major problem associated with ATP-competitive kinase inhibition is target specificity, as many other enzymes and kinases utilize ATP (3,4). Moreover, the issue of selectivity is complicated by the high degree of conservation among the different MAP kinase sub-families.

Five families of MAP kinases have been reported in mammalian cells: extracellular signal-regulated kinases (ERK1 and ERK2); c-Jun N-terminal kinases (JNK1, JNK2, and JNK3); p38 kinase isozymes (p38α, p38β, p38γ, and p38δ); ERK3/ERK4; and ERK5 (5). In general, the ERK cascade is activated by growth factors and is critical for cell proliferation (6). Conversely, the JNK and p38 pathways are stimulated by genotoxic agents and cytokines mediating the stress response, growth arrest, and apoptosis (7). The interest in p38 MAP kinase signaling pathways is mainly due to their implication in numerous diseases, including inflammation, arthritis and other joint diseases, septic shock, and myocardial injury. As p38 MAP kinase regulates the production of TNF-α and IL-1, p38 inhibitors are expected to inhibit not only the production of pro-inflammatory cytokines, but also their actions, thereby interrupting the vicious cycle that often occurs in inflammatory and immune responsive diseases (8).

The first p38 selective inhibitor, the pyridylimidazole SB-203580, was reported by SmithKline Beecham in 1994; since then, a plethora of substituent modifications and heterocyclic replacements of the imidazole and/or pyridine ring systems have been proposed (9–12). Rational drug design targeting p38α was greatly accelerated by the availability of p38 X-ray crystallographic structures and by an increasingly comprehensive sequence database of protein kinases. Structurally related p38 inhibitors were discovered which maintained or improved potency and/or minimized liabilities present in the original pyridylimidazole series (13), as well as different classes of p38α inhibitors that are active in in vitro and in vivo assays (14). Since the discovery of the first prototypical inhibitor, more than 150 patent applications from at least 30 pharmaceutical companies have claimed other p38 inhibitors (13).

The early discovered inhibitors bind in the ATP-mimetic mode (Type I inhibitors), targeting the enzyme in its active form, i.e., in an open conformation of the activation loop usually called DFG ‘in’ based on the position of the conserved aspartate-phenylalanine-glycine (DFG) motif at the beginning of the activation loop (14). Type I inhibitors are usually effective, but their low specificity and selectivity towards the p38 MAP kinase target make them unsuccessful in many instances. The second generation of inhibitors, defined as type II inhibitors, is more promising from a selectivity point of view. These compounds bind to the same area occupied by the type I compounds but also extend into the additional hydrophobic pocket which becomes accessible by the flip of the DFG-loop to the inactive form, i.e., DFG ‘out’ (15,16). Moreover, from the analysis of some recently published kinase inhibitors, Zuccotto et al. (17) highlighted an additional binding mode, that of the so-called type I1/2 inhibitors, which can be exploited in the design of compounds with the desired activity/selectivity profile. Type I1/2 ligands recognize the enzyme in the DFG ‘in’ form, bind to the ATP site like type I compounds, and reach the hydrophobic pocket, establishing interactions with those residues characteristic of the type II inhibitors, which are based on the DFG ‘out’ form.

Here, with the aim of taking into account a wide range of inhibitor structural features and rationalizing the binding modes of the different inhibitors proposed in the literature, a set of 45 inhibitors, diverse in structure and functionality and spanning a wide range of binding affinities for p38α, is analyzed. The recognition and selective binding of an inhibitor can be described by the residue-inhibitor interactions occurring in the active site of the protein. To this end, the binding site of each complex is carefully analyzed, after minimization of the average structure derived from molecular dynamics, in order to identify the amino acid residues that are likely to form the active (ATP) site or, more generally, the binding site, i.e., residues positioned so as to form significant interactions with the inhibitors. This could help in the future design of new selective inhibitors that are not restricted to the basic inhibitor structures; rather, the available molecules could be modified by introducing substituents that provide additional stabilizing interactions.

Materials and Methods

Data set

Forty-five inhibitors showing diversity in structure, functionality, and binding modes, i.e., the usual ATP-mimetic ligands along with others targeting the DFG-out conformation, were employed in this study. The binding affinities of the considered inhibitors, expressed as the concentration (nM) of the compound required to inhibit 50% of the MAP kinase activity (IC50), were taken from the literature (18–35) and are listed in Table 1.

Table 1.   2D structure, experimental binding affinity values (IC50, nM) against p38α, PDB identification code (PDB id), and binding mode of the 45 inhibitors considered in this study

Modeling of inhibitors and molecular docking

The 45 inhibitors were constructed using the 3D sketcher module of the Discovery Studio software from Accelrys and were subjected to geometry optimization and energy minimization by means of the CHARMM force field (36). Each optimized inhibitor was aligned to the most similar inhibitor among the complexes whose X-ray crystallographic structures were available in the Protein Data Bank (37). The inhibitors were then manually docked into the corresponding p38α protein structures listed in Table 1 (after removal of the docked inhibitor and solvent molecules), using inhibitor structure similarity as the criterion to guide the docking poses.

Twenty-four residues constitute the entire surface of the binding pocket: VAL27, SER29, TYR32, VAL35, ALA48, LYS50, GLU68, LEU72, ILE81, LEU83, LEU101, VAL102, THR103, HIS104, LEU105, MET106, GLY107, ASP109, ASN112, SER151, ASN152, LEU164, ASP165, GLU175; they were selected in order to determine their role in the molecular recognition of the docked inhibitors, covering both Type I and Type II inhibitor occupancy sites (see Figure S1, Supporting Information).

Molecular dynamics simulations

The complexes were subjected to a short minimization procedure (a fast sequence of steepest descent and conjugate gradient) using the CHARMM force field (36) in order to remove artifacts due to the manual docking. Standard protonation states corresponding to pH 7 were assigned to the amino acid residues.

Each complex was then solvated with TIP3 water molecules by constructing a sphere of 25 Å around the binding site of the protein using the InsightII software by Accelrys. A CHARMM spherical potential was applied to obtain spherical boundary conditions that maintain the spherical shape of the water shell. Solvent molecules were subjected to Langevin dynamics, which smooths the effect of the boundary conditions at the edge of the sphere. The SHAKE algorithm was further applied to constrain bonds involving hydrogen (38).

A standard protocol for molecular dynamics simulation of the protein-inhibitor complexes was followed, consisting of four main steps: i) minimization of the whole system; ii) heating of the whole system with the protein restrained, followed by a fast minimization; iii) an equilibration run; and iv) a production run. The first step removed close contacts between water molecules. In the second step, the solvent was equilibrated at 300 K while the protein was kept restrained, so that the water molecules surrounding the protein moved gradually and reached a better distribution of velocities and positions. The whole system was then equilibrated for 1 ns, and during the final step (production run) the system was free to move. The time step was 0.002 ps. The trajectories were analyzed, and the interaction energy values for the protein-inhibitor complexes were calculated on the minimized average structures.

The ligand-enzyme interactions were quantified in terms of the total interaction energy (ETOT) along with its components: van der Waals (EVDW), electrostatic (EEL), and hydrogen bond (EHB) interaction energies. The computational procedure performed on the selected inhibitor-protein complexes therefore yielded an interaction energy profile for each of the 45 inhibitors.

The data collected, consisting of 96 (24 residues × 4 energy values each: ETOT, EVDW, EEL, EHB) interaction descriptors for each inhibitor, were subjected to multivariate data analysis to highlight the important correlation between the measured variables and to build predictive models to estimate the binding affinity.
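As an illustration of how the 96 descriptors arise, the following Python sketch enumerates one label per residue and energy term. The label format (RESIDUE_TERM) and the residue-major column order are assumptions made here for illustration only; the text does not specify how the columns of the data matrix were ordered.

```python
from itertools import product

# The 24 binding-pocket residues listed in the text.
RESIDUES = [
    "VAL27", "SER29", "TYR32", "VAL35", "ALA48", "LYS50", "GLU68", "LEU72",
    "ILE81", "LEU83", "LEU101", "VAL102", "THR103", "HIS104", "LEU105",
    "MET106", "GLY107", "ASP109", "ASN112", "SER151", "ASN152", "LEU164",
    "ASP165", "GLU175",
]

# The four interaction energy terms computed for each residue.
ENERGY_TERMS = ["TOT", "VDW", "EL", "HB"]


def descriptor_labels(residues=RESIDUES, terms=ENERGY_TERMS):
    """Return one label per (residue, energy term) pair, residue-major.

    The naming scheme is hypothetical; it only makes the 24 x 4 layout explicit.
    """
    return [f"{residue}_{term}" for residue, term in product(residues, terms)]


labels = descriptor_labels()
print(len(labels))
print(labels[:4])
```

Each inhibitor then corresponds to one row of 96 values, so the full predictor block has 45 rows and 96 columns.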

Data analysis

The interaction-energy dataset obtained for each of the 45 inhibitors docked into the p38α MAP kinase protein was analyzed using Partial Least Squares (PLS) Regression (39). Partial Least Squares is a multivariate calibration technique which establishes a relationship between a set of predictors, X, and a set of responses, Y, by maximizing their covariance. This is achieved in the latent variable space by decomposing the predictor and response blocks according to PCA-like models and imposing an inner linear relation between the X-scores, T, and the Y-scores, U, through a weights matrix W, which rotates the latent variables in X to maximize the covariance between T and U in each dimension. Partial Least Squares can be re-expressed in the form:

Y = XB

where the pseudo-regression coefficients B are calculated as:

B = W (P^T W)^{-1} Q^T
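The expression for B can be made concrete with a minimal NumPy sketch of PLS for a single response (PLS1). The text does not state which PLS algorithm or software was used; the NIPALS variant below is an assumption, and the function names pls1_fit and pls1_predict are hypothetical. W holds the weights, P the X-loadings, and q plays the role of Q^T for a single response.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS (assumed algorithm) on mean-centered data.

    Returns the weights W, X-loadings P, y-loadings q and the coefficient
    vector b = W (P^T W)^{-1} q, together with the centering means.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # residual X block and response
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))
    P = np.zeros((n_features, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = E.T @ f                        # direction of maximum covariance
        w /= np.linalg.norm(w)
        t = E @ w                          # X-scores for this component
        tt = t @ t
        p = E.T @ t / tt                   # X-loadings
        q[a] = f @ t / tt                  # y-loading
        E = E - np.outer(t, p)             # deflate X
        f = f - q[a] * t                   # deflate y
        W[:, a], P[:, a] = w, p
    b = W @ np.linalg.solve(P.T @ W, q)    # B = W (P^T W)^{-1} Q^T
    return {"W": W, "P": P, "q": q, "b": b, "x_mean": x_mean, "y_mean": y_mean}


def pls1_predict(model, X):
    """Predict the response for new samples using the fitted coefficients."""
    X = np.asarray(X, dtype=float)
    return (X - model["x_mean"]) @ model["b"] + model["y_mean"]
```

When the number of components equals the rank of X, this coefficient vector coincides with the ordinary least squares solution, which is a convenient sanity check on an implementation.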

To avoid overfitting, the number of PLS latent variables (PLS components) has been assessed using Leave-one-out cross validation (LOO-CV) (39).
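A leave-one-out selection of the number of latent variables can be sketched as follows. This is a sketch under stated assumptions, not the procedure actually run for the paper: the NIPALS PLS1 routine, the function names, and the synthetic data are all illustrative, and the centering is recomputed inside each fold so that the left-out sample does not influence the model.

```python
import numpy as np


def pls1_coefficients(Xc, yc, n_components):
    """NIPALS PLS1 on already-centered data; returns the coefficient vector."""
    E, f = Xc.copy(), yc.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        c = f @ t / tt
        E = E - np.outer(t, p)
        f = f - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


def loo_rmsecv(X, y, max_components):
    """Leave-one-out RMSECV for models with 1..max_components latent variables."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    squared_errors = np.zeros((n, max_components))
    for i in range(n):
        keep = np.arange(n) != i
        x_mean, y_mean = X[keep].mean(axis=0), y[keep].mean()
        Xc, yc = X[keep] - x_mean, y[keep] - y_mean
        for a in range(1, max_components + 1):
            b = pls1_coefficients(Xc, yc, a)
            y_hat = (X[i] - x_mean) @ b + y_mean
            squared_errors[i, a - 1] = (y[i] - y_hat) ** 2
    return np.sqrt(squared_errors.mean(axis=0))


# Synthetic stand-in data (NOT the paper's data): 28 samples, 10 variables.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(28, 10))
y_demo = X_demo[:, 0] - 0.5 * X_demo[:, 1] + 0.1 * rng.normal(size=28)
rmsecv = loo_rmsecv(X_demo, y_demo, max_components=5)
best = int(np.argmin(rmsecv)) + 1
print(rmsecv.round(3), best)
```

The number of latent variables is then taken at the minimum of the RMSECV curve, which is the criterion described in the Results.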

In this work, the predictor matrix (X) comprises 45 inhibitors × 96 interaction variables, whereas the response vector (y) contains the corresponding −Log IC50 values.
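The response values can be computed from the IC50 data as sketched below. Whether the logarithm was taken of IC50 in nM or in mol/L is not stated in the text, so the helper (a hypothetical name) supports both; the two conventions differ only by a constant offset of 9, which mean centering removes, so the regression coefficients and the RMSE values are unaffected by the choice.

```python
import math


def neg_log_ic50(ic50_nm, molar=False):
    """Return -log10(IC50) for an IC50 given in nM.

    With molar=True the value is first converted to mol/L (the usual pIC50);
    which convention the paper used is an open assumption.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    value = ic50_nm * 1e-9 if molar else ic50_nm
    return -math.log10(value)


# The two extremes of the data set quoted in the text (IC50 in nM).
most_active = neg_log_ic50(0.078)
least_active_in_calibration = neg_log_ic50(30000)
print(round(most_active - least_active_in_calibration, 2))
```

The printed difference is the span, in log units, between 1PMN and 1PMV, which gives a scale against which the reported errors can be read.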

The dataset was randomly split into calibration (28 inhibitors × 96 variables) and validation (17 inhibitors × 96 variables) subsets; to cover a wider range of −Log IC50 values in the calibration set, the most active p38α inhibitor, 1PMN (0.078 nM), and one of the least active, 1PMV (30000 nM), were both included in the calibration set.
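A random split with forced members can be sketched as follows. The function name, the seed, and the placeholder inhibitor names are hypothetical; only the set sizes (28 and 17) and the two forced inhibitors come from the text, and the actual random assignment used in the paper is not reproduced.

```python
import random


def split_with_forced(names, forced, n_calibration, seed=0):
    """Randomly split names into calibration and validation sets,
    always placing the forced names in the calibration set."""
    forced = list(forced)
    missing = [name for name in forced if name not in names]
    if missing:
        raise ValueError(f"forced names not in data set: {missing}")
    if len(forced) > n_calibration:
        raise ValueError("more forced names than calibration slots")
    pool = [name for name in names if name not in forced]
    rng = random.Random(seed)
    rng.shuffle(pool)
    n_extra = n_calibration - len(forced)
    return forced + pool[:n_extra], pool[n_extra:]


# Placeholder names stand in for the other 43 inhibitors of Table 1.
names = ["1PMN", "1PMV"] + [f"inhibitor_{i:02d}" for i in range(43)]
calibration, validation = split_with_forced(names, ["1PMN", "1PMV"], 28)
print(len(calibration), len(validation))
```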

Data pretreatment.  The ETOT and EEL interaction energies show larger scales and variances than EVDW and EHB, which means that the two pairs of energy descriptors would influence the model to different extents. To overcome this without giving excessive weight to interaction terms showing very little variance, the ETOT and EEL interactions were transformed by taking the fourth root of their absolute values. This transformation also makes the variation in the attractive and repulsive terms of the electrostatic interactions more similar.

The energy terms were transformed according to:

[transformation equations: images not recovered from the source]

where x_ij is a generic element of the ETOT and EEL variables.

Prior to PLS modeling, the X and y data were also mean centered.
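The pretreatment can be sketched in NumPy as follows. Two points are assumptions rather than statements from the text: the sign of each energy is restored after taking the fourth root of its absolute value (suggested by the later interpretation of attractive versus repulsive ETOT and EEL terms, but not confirmed, because the original equations are not available here), and the validation set is centered with the calibration-set means. The function names are hypothetical.

```python
import numpy as np


def signed_fourth_root(x):
    """Return sign(x) * |x|**(1/4), elementwise.

    ASSUMPTION: the sign is kept so that attractive (negative) and repulsive
    (positive) energies remain distinguishable after the transformation.
    """
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.abs(x) ** 0.25


def pretreat(X_cal, X_val, root_columns):
    """Transform the selected columns, then mean center both sets.

    root_columns lists the column indices of the ETOT and EEL descriptors.
    Centering uses the calibration-set means only (an assumption).
    """
    X_cal = np.array(X_cal, dtype=float)   # copies, inputs are not modified
    X_val = np.array(X_val, dtype=float)
    for X in (X_cal, X_val):
        X[:, root_columns] = signed_fourth_root(X[:, root_columns])
    column_means = X_cal.mean(axis=0)
    return X_cal - column_means, X_val - column_means, column_means
```

The fourth root compresses large magnitudes much more strongly than small ones (for example, 16 maps to 2 while 1 maps to 1), which is how the scales of ETOT and EEL are brought closer to those of EVDW and EHB.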

Results and Discussion

The results of the Interaction Energy analysis derived by computational simulations of the complexes and rationalized by means of PLS Regression provide important information on the binding determinants of a set of 45 inhibitors belonging to several structurally unrelated chemotypes and representative of different binding modes (Table 1).

A PLS model with three latent variables (LVs) was selected according to the minimum leave-one-out cross-validation error; its Root Mean Square Errors in fit (RMSEC), cross-validation (RMSECV), and external test set prediction (RMSEP) are 0.5 (0.96), 1.4 (1.5), and 1.36, respectively, in terms of binding affinity values (−Log IC50). The higher RMSECV value with respect to RMSEC (due especially to the estimation of BIRB796 and 2QD9) indicates that many of the molecules in the training set bear unique information, as could be expected given their structural diversity. Overall, the model performance can be considered satisfactory for identifying, on a purely computational basis, the molecules worth further experimental investigation. In fact, it is worth noting that a high experimental uncertainty is associated with the IC50 determinations collected in this paper, as they come from different research laboratories and were measured under different experimental conditions, sometimes with different protocols.
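The three error measures share one formula and differ only in which predictions are compared with the measured values: fitted values for RMSEC, leave-one-out predictions for RMSECV, and test set predictions for RMSEP. A minimal sketch follows, with hypothetical function names; the last helper converts an error in −Log IC50 units into the corresponding multiplicative factor on IC50.

```python
import math


def rmse(y_measured, y_predicted):
    """Root mean square error between measured and predicted values."""
    if len(y_measured) != len(y_predicted) or len(y_measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(y_measured, y_predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fold_error(log_unit_error):
    """Multiplicative error on IC50 implied by an error in log10 units."""
    return 10.0 ** log_unit_error


# An RMSEP of 1.36 log units corresponds to a typical error of
# roughly this factor on the IC50 scale.
print(round(fold_error(1.36), 1))
```

Read this way, the external prediction error is somewhat larger than one order of magnitude in IC50, which should be weighed against the span of more than five log units covered by the data set.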

The experimental (measured) versus predicted binding affinity values are plotted in Figure 1; overall, the training set samples (open circles) lie close to the y = x line, showing a good model fit. The plot of the residuals versus the experimental (measured) y values, reported in Figure 2, can be more informative regarding model adequacy: if the residuals appear to behave randomly (no patterns), the model is adequate. The residuals plot for the above PLS model shows a fairly random distribution, with residuals of about one order of magnitude in IC50 (errors of about 1 in −Log IC50 units), for both the training samples and the externally predicted samples (test set, black triangles), except for the inhibitors JMC17, JMC13, SB203580, and 3C5U, which are poorly predicted and show high residuals. It may be noted that the worst predictions are mainly for weakly active to inactive inhibitors. Moreover, the worst predicted test set sample, JMC17, proved to be an outlier: when projected onto the training set model, it lies outside the 95% confidence limits of both statistics in the squared residuals (Q) versus Hotelling T2 plot (not shown), indicating that it behaves rather differently from the other inhibitors.
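The two outlier statistics mentioned above can be computed for a projected sample as sketched below. This is an illustrative NumPy sketch under assumptions: the NIPALS PLS1 routine and all function names are hypothetical, the data are synthetic, and the 95% confidence limits (usually derived from F and chi-square type approximations) are not computed here.

```python
import numpy as np


def pls1_x_model(X, y, n_components):
    """NIPALS PLS1 (assumed algorithm); returns what is needed to project samples."""
    x_mean = X.mean(axis=0)
    E, f = X - x_mean, y - y.mean()
    W, P, T = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        f = f - (f @ t / tt) * t
        E = E - np.outer(t, p)
        W.append(w)
        P.append(p)
        T.append(t)
    W, P, T = np.array(W).T, np.array(P).T, np.array(T).T
    R = W @ np.linalg.inv(P.T @ W)         # maps centered X directly to scores
    return {"x_mean": x_mean, "P": P, "R": R, "score_var": T.var(axis=0, ddof=1)}


def t2_and_q(model, X_new):
    """Hotelling T2 (distance within the model) and Q (squared X residual)."""
    Xc = np.atleast_2d(np.asarray(X_new, dtype=float)) - model["x_mean"]
    scores = Xc @ model["R"]
    t2 = np.sum(scores ** 2 / model["score_var"], axis=1)
    residual = Xc - scores @ model["P"].T
    q = np.sum(residual ** 2, axis=1)
    return t2, q


# Synthetic stand-in data (NOT the paper's data).
rng = np.random.default_rng(2)
X_train = rng.normal(size=(28, 8))
y_train = X_train[:, 0] + 0.5 * X_train[:, 1] + 0.1 * rng.normal(size=28)
model = pls1_x_model(X_train, y_train, n_components=3)
t2_train, q_train = t2_and_q(model, X_train)
t2_far, q_far = t2_and_q(model, X_train[0] + 20.0)   # a deliberately distant sample
print(t2_far[0] > t2_train.max() or q_far[0] > q_train.max())
```

A sample with a large T2 is extreme within the space spanned by the latent variables, whereas a large Q means that its descriptor pattern is not reproduced by the model at all; a sample exceeding both limits, as reported for JMC17, is unusual in both respects.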

Figure 1.

 Plot of the measured versus predicted values of y (p38α mitogen activated protein kinase binding affinity) for the training (open circles) and test set (black triangles) samples. Bisecting line is shown as dashed. Labels are the same as reported in Table 1, and for clarity only discussed inhibitor labels are reported.

Figure 2.

 Y Residuals versus Y Measured plot for Training (open circles) and Test set (black triangles).

An overview of the interaction energy terms contributing to the model, hence, influential for binding, can be obtained from the PLS regression coefficients plot, shown in Figure 3.

Figure 3.

 The Regression vector plot for the variables; each residue is labeled according to the kind of interaction term (TOT, VDW, EL or HB) they refer to.

The binding mode of the inhibitors is evaluated on the basis of their interactions with the amino acid residues of the binding site of the protein and measured in terms of the total interaction energy (ETOT) and its corresponding van der Waals (EVDW), electrostatic energy (EEL) and hydrogen bond (EHB) components.

The magnitude of these energy values highlights the role and importance of that particular amino acid residue in stabilizing or de-stabilizing the binding of the inhibitor.

From the regression coefficients plot (Figure 3), it is evident that the van der Waals and the total interaction energies, in general, show negative regression coefficient values; hence, the higher (less negative) the energy values for these residues, the higher the IC50 values and the lower the binding affinity. The reverse also holds, i.e., the more negative the energy values for these residues, the more stable the complex. LYS50 (EVDW) shows the most negative regression coefficient and influences the model more than the residues on the positive side of the regression coefficients plot. In addition, SER29 (ETOT), SER29 (EEL), ALA48 (EVDW), ILE81 (EVDW), LEU83 (ETOT), LEU105 (EVDW), MET106 (EHB), LEU164 (EEL), LEU164 (EVDW), and ASP165 (EVDW) also have negative regression coefficients.

The most active inhibitors, 1PMN and BIRB796, form strong attractive interactions with all the residues having the most negative regression coefficients, as shown by the interaction energy values listed in Table 2. 3BV2, 3C5U, and SB203580 show slight repulsion in the total and electrostatic energy terms of SER29 (3BV2 and 3C5U), in LEU83 (repulsive ETOT interaction with 3C5U and SB203580), and in LEU164 (weak repulsive ETOT interaction with 3BV2). On the other hand, JMC13, 1PMU, and 1PMV, i.e., the least active p38α inhibitors, form weaker interactions with, or are sterically hindered by, the above-mentioned residues.

Table 2.   Interaction energy (kcal/mol) contributions of selected p38α mitogen activated protein kinase amino acid residues to the binding of representative inhibitors, together with their experimental IC50 (nM) values

Inhibitor  IC50     SER29   SER29   ALA48   LYS50   ILE81   LEU83   LEU105  MET106  LEU164  LEU164  ASP165
                    TOT     EL      VDW     VDW     VDW     TOT     VDW     HB      TOT     EL      VDW
1PMN       0.078    −2.18   −1.96   −2.40   −3.57   −2.18   −1.18   −3.02   −6.27   −3.36   −0.16   −1.37
BIRB796    0.13     −1.06   −0.97   −2.45   −4.56   −3.55   −0.24   −2.04   −3.46   −4.72   −1.87   −4.17
3BV2       0.44      0.04    0.26   −2.40   −5.70   −2.75   −0.12   −2.87   −6.42   −1.51    0.62   −3.33
3C5U       6.4       0.24    0.30   −1.74   −2.67   −2.19    1.04   −2.34   −5.56   −3.36   −1.37   −0.71
SB203580   48       −0.46   −0.35   −0.95   −3.10   −1.76    1.55   −1.57   −2.08   −3.17   −1.75   −1.49
JMC13      450      −0.65   −0.54   −1.32   −2.18   −1.78   −0.78   −1.35   −2.61    0.53    2.98   −0.24
JMC17      5100      0.07    0.18   −1.06    3.94   −1.62    0.15   −1.30   −2.30    0.48    3.18   −1.83
1PMV       30000     0.02    0.06   −0.94   −1.45   −0.87    1.01   −1.78   −2.93    1.52    2.53   −0.70
1PMU       40000    −0.77   −0.73   −0.88   −0.38   −0.81    0.07   −2.23   −2.64   −0.07    1.68   −0.94

In particular, the increased potency of the most active p38α inhibitors is achieved by the simultaneous optimization of the hydrogen bond interaction with MET106 and the van der Waals interactions with the LYS50, ALA48, ILE81, and LEU105 amino acid residues. In contrast, the least active p38α inhibitors, 1PMV, 1PMU, JMC17, and JMC13, show weaker hydrogen bonds with MET106, overall moderate van der Waals interactions with LYS50, LEU105, ILE81, and ALA48, and repulsive interactions with LEU164 (EEL). The MET106 hydrogen bond is the only hydrogen bond interaction with a relatively strong interaction energy, which clearly highlights the important role of this particular inhibitor-residue interaction in the classes of inhibitors studied for p38α MAP kinase.

The interactions of the active p38α inhibitors can be better understood by looking at the orientations of 1PMN and BIRB796 in the binding pocket of p38α MAP kinase in the minimized average structures obtained from the dynamics runs (Figure 4).

Figure 4.

 Inhibitors 1PMN (in orange) and BIRB796 (in cyan) in the p38α ATP pocket, where they interact with MET106 and LYS50 (in cyan), ILE81 and LEU83 (in blue), and LEU164 and ASP165 (in green). The 2D structures of 1PMN (upper right-hand corner) and BIRB796 (lower right-hand corner) are also provided.

The binding pattern of 1PMN and BIRB796 shows the fluorophenyl ring (1PMN) and the naphthyl ring (BIRB796) lying in the hydrophobic pocket (represented in blue in Figure 4) constituted by the ILE81, LEU83, LEU101, and LEU105 residues. The pyrimidine ring in 1PMN and the methoxy-morpholino ring in BIRB796 form the important hydrogen bond with MET106. Further, the piperidine ring of 1PMN and the pyrazole moiety in BIRB796 lie near the activation loop (in green in Figure 4) and interact with LEU164 and ASP165.

Furthermore, the outlier behavior of JMC17 in the residuals plot can be better understood by looking at its binding pattern in the p38α ATP site (Figure 5).

Figure 5.

 Binding mode of JMC17 in the p38α ATP pocket.

When binding to the ATP pocket, JMC17 reproduces the hydrogen bond interaction with MET106 (2.52 Å), which is usually the most important and stabilizing interaction for p38α inhibitor binding. However, the core pyrimidine ring restricts the movement of the adjoining side-chain substituents, causing more repulsive interactions with residues LEU164 and LYS50 and hence lowering the activity of the inhibitor against the protein; this could be the reason for its distinct binding mode and interactions compared to the other inhibitors. Further, the marked inability of the model to fit the data for JMC17 (high Q residual and T2 values) could be due to the loss of the van der Waals interaction with LYS50. JMC17 is the only inhibitor showing a repulsive energy value for this term (Table 2), whereas all other inhibitors (both active and not active against p38α) show an attractive van der Waals interaction with LYS50. Moreover, the outliers JMC13, SB203580, and 3C5U all establish repulsive interactions with LEU83 or LEU164.

The above highlighted variables also show significant Variable Importance in Projection (VIP) values (39). Variable Importance in Projection is a parameter which defines the relative importance of each X variable in the PLS model. A variable with a VIP score close to or >1 can be considered important in a given model. Variables with VIP scores significantly <1 are less important and might be good candidates for exclusion from the model. Figure 6 shows a significantly higher VIP score for EVDW energy of LYS50 along with other interactions such as ETOT for ASP165, EVDW for ASP165, EEL and EVDW for LEU164, EVDW for ILE81, ETOT for SER29, EHB for MET106, which also show high VIP scores.
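VIP scores can be computed from the PLS weights, scores, and y-loadings as sketched below. The formula is the commonly used one for a single response, in which each component is weighted by the amount of y-variance it explains; the NIPALS routine, the function names, and the synthetic data are assumptions made for illustration and do not reproduce the paper's calculation.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """NIPALS PLS1 (assumed algorithm); returns weights W, scores T, y-loadings q."""
    E = X - X.mean(axis=0)
    f = y - y.mean()
    W, T, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        c = f @ t / tt
        E = E - np.outer(t, E.T @ t / tt)
        f = f - c * t
        W.append(w)
        T.append(t)
        q.append(c)
    return np.array(W).T, np.array(T).T, np.array(q)


def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a)), with unit-norm
    weight vectors and SS_a = q_a^2 * t_a^T t_a."""
    n_vars = W.shape[0]
    explained = q ** 2 * np.sum(T ** 2, axis=0)
    w_unit = W / np.linalg.norm(W, axis=0)
    return np.sqrt(n_vars * ((w_unit ** 2) @ explained) / explained.sum())


# Synthetic stand-in data (NOT the paper's data): only variable 0 drives y.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(40, 6))
y_demo = 2.0 * X_demo[:, 0] + 0.1 * rng.normal(size=40)
W, T, q = pls1_nipals(X_demo, y_demo, n_components=2)
vip = vip_scores(W, T, q)
print(vip.round(2))
```

Because the squared VIP scores sum to the number of variables, their mean square is exactly 1, which is why a value of 1 is the customary threshold for judging a variable important.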

Figure 6.

 Variable Importance in Projection (VIP) and the variable plot; the amino acid residues with the highest positive VIP score are important for binding affinity of the inhibitor against p38α.

The plot of the PLS variable weights, which capture the contribution of each variable in X to modeling the responses Y, for different components may provide further information on the relationships among the X variables and their correlation pattern. Figure 7 shows the PLS weights plot for the first versus the second component of a PLS model obtained by including only the X variables significant according to VIP. Although the prediction capability of the model does not improve significantly (the RMSEP decreases only from 1.36 to 1.32), plotting only the VIP-selected variables makes it easier to study the pattern of the variables and the correlations between them. The variables in the lower left quadrant of the plot (Figure 7) are highly correlated with each other and are relevant in predicting the dependent variable (pIC50 values). They are inversely related to the dependent variable (the more attractive, i.e., the more negative, the energy terms, the higher the binding affinity), whereas the variables occupying the upper right quadrant are relevant and directly correlated to the pIC50 values (meaning that the most active inhibitors show repulsion with these residues, see Discussion below). As evident from the plot, EVDW for LYS50, ASP165, and LEU164, EHB for MET106, EEL for LEU164, and ETOT for SER29 and LEU83 are relevant for increasing the binding affinity, and future inhibitors may be designed to improve these interactions. In contrast, the variables occupying the upper right-hand quadrant of the plot (ETOT for SER151 and TYR32; EEL for SER151; EVDW for VAL35; and EHB for HIS104) are characterized by interaction energy values that are positive or close to zero, thus registering repulsion, with the most active inhibitors. These residues should be considered in the design of inhibitors for p38α MAP kinase, as they directly influence the binding affinity of the bound inhibitor and could further influence the selectivity. To incorporate this information, the side chains of the inhibitors can be modified to further enhance the above-mentioned interactions.

Figure 7.

 Plot of Partial Least Squares weights first versus second component.

Conclusions

The results obtained by the computational approach based on molecular dynamics analysis of inhibitor-p38α MAP kinase complexes used in this study help in providing a better view of the binding pattern and orientation of the inhibitor in the binding pocket and of possible favorable substitutions and modifications which can be introduced on the molecular scaffolds to induce improved affinity against the target.

The PLSR approach used to rationalize the data furnishes a QSAR model with good performance in terms of Y-variance captured and low root mean square errors in cross-validation and external prediction. The van der Waals interactions with LYS50, ILE81, and ASP165; electrostatic interactions with SER29 and LEU164; hydrogen bonds with MET106; and total energy interactions with SER29 and LEU83 were highlighted as the residue interactions most important for strong binding affinity of the inhibitor against p38α.

Hence, multivariate regression analysis clearly shows its capability to analyze datasets of ligand-residue interactions, extracting the factors significant for increased binding affinity of the inhibitors for the target and highlighting molecules whose binding orientation differs from the others.

Acknowledgments

N.B. thankfully acknowledges the award of an Italian Government Fellowship for the Doctoral Research Program.
