Prediction of Protein (cid:2) compound Binding Energies from Known Activity Data: Docking-score-based Method and its Applications

17


Introduction
The quantitative structureÀactivity relationship (QSAR) approach is a useful tool for optimizing leads and predicting target/off-target activities and toxicity. QSAR-based affinity predictions are useful for the general drug development process, including the repositioning (repurposing) of already approved drugs, poly-pharmacology, and the prediction of drugÀdrug interactions. [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15] The recent accumulation of proteinÀcompound affinity data in public repositories, such as the PubChem and ChEMBL projects, has enabled us to carry out proteome-wide target/off-target predictions. [16,17] These predictions are based on QSAR models for multiple proteins, just as in conventional computer-aided drug design and virtual screening.
Wide application of QSAR-based models in computeraided drug development, such as proteinÀcompound binding free energy (affinity) prediction, target/off-target predictions, and counter screening based on QSAR models, has succeeded in many studies, including ours. [3,4,7,8] Most QSAR models rely on descriptors with sets of two-dimensional (2D) substructures; the most popular such descriptors are MDL's MACCS key and 0-3D molecular descriptors (e.g., 5,270 descriptors recorded in Dragon (Kode srl, Pisa, Italy)). In our previous studies, we developed QSAR methods for the affinity prediction of a compound by using docking studies against multiple proteins. [17][18][19] We used a proteinÀcompound affinity matrix as the set of descriptors and applied principal component regression (PCR). [18] The Q 2 value of calculated binding free energies was 0.44 and the RMSE was 1.54 kcal/mol for about 97 kinases and 18,491 compounds selected from the ChEMBL database. However, the coefficients of the regression equations for some targets were unrealistically (10 3 -10 5 ) higher than those for other targets. Either these coefficients should be restricted to a range of realistic values, or the applicability domain should be very restricted around the known experimental data.
In the present study, we applied a combination of the ridge (Tikhonov regularization) regression, robust estimation, and principal component analysis to the proteinÀcompound affinity matrix. [19][20][21] The robust estimation was expected to reduce the problem of error in the experimental data. The present method restricted the coefficients of the generated prediction equation around realistic values. The method was applied to the kinases and the matrix metalloproteinases (MMPs) of the ChEMBL database. [22][23][24] MMPs, which have zinc ions in the reaction pockets, require metal-binding groups in their inhibitor structures in many cases. [25] The metalÀligand interaction shows quantum effects such as electron donation and back donation, that form a weak covalent bond, in addition to the electrostatic and van der Waals interactions. [26][27][28][29][30][31][32][33][34][35][36] The quantum effect makes it difficult to estimate binding energy. A method to evaluate metal interaction has long been sought. In the framework of the classical force field, several methods have represented metal interactions. One is to add a metal contact term such as the van der WaalsÀtype potential and the potential considering the coordination number of the central metal ion. [27][28][29][30][31] The other method is to modify the parameters of the atomic charge and the van der Waals potential of the original force field. [32][33][34] The metal parameters depend on the environments of the metal atom, so the user should tune the parameters for each protein. [35,36] The metal contact terms enable the user to reproduce proteinÀligand complex structures and the absolute value of the binding energy. However, protein-dependent parameter tuning has been a time-consuming process, especially when the user analyzes multiple target proteins.
In addition, we evaluated the effect of the elimination of outlier data points from these multiple data points corresponding to each single proteinÀcompound pair, and we improved the present regression model. The method, which we call the "docking-score-based QSAR model", predicted the proteinÀcompound binding affinities of 107 kinases that have no metals in their pocket, and those of 5 MMPs (MMP2, MMP3, MMP7, MMP9, and MMP13) that have a zinc ion in each pocket. The docking-score QSAR method worked for these various targets. Namely, the RMSE values were~1 kcal/mol, respectively.

Background of Prediction Models
In the present study, we develop a binding-energy (affinity) prediction method based on the proteinÀcompound dock-ing scores obtained by a docking program; the present method is a modified version of our QSAR method. [18] In our present and previous QSAR models, the affinity of compounds can be estimated by using a pharmacophore model of the target protein. The IUPAC guidelines define a pharmacophore as "an ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target and to trigger (or block) its biological response". [37] Hydrogen donors, hydrogen acceptors, hydrophobic groups of ligands and receptors are called "pharmacophore features" and the functional groups can give these features. [38] These features are usually depicted as spheres connected by lines those lengths represent the spatial distances among these features of the ligands and receptors. In the present study, we temporarily define a pharmacophore as a set of spatially distributed pharmacophore features. Each pharmacophore feature represents the probability of existence of a hydrogen-bond donor, a hydrogen-bond acceptor, and both electrostatic and hydrophobic interaction sites. Both receptors and ligands have pharmacophores. We approximate that the receptorÀligand binding energy is given by the sum of interactions between the pairs of the ligand pharmacophore (f l ) and the receptor pharmacophore (f r ). Here, we introduce a function for the interaction k and let k(f r , f l ) be an interaction value between the two pharmacophores.
Because we discuss only the interaction, we do not need to explicitly represent fs, but rather we need only the value of the function k(f r , f l ) for the interaction. Our final DGestimation equation does not include fs explicitly but consists of docking scores. In this framework, the interaction between the receptor and ligand pharmacophores gives the binding energy of the pharmacophore l in a ligand to the pharmacophore r in a protein, DG l r , as where g l r is a parameter.
We try to generalize this discussion by introducing a linear combination of a basic pharmacophore. Suppose F (= {f 1 , f 2 , f 3 , …..}) is a set of all pharmacophores. The f functions form the basis set of the pharmacophores of any kind of ligand or receptor. Each pharmacophore (f i ) does not have to be found in an actual protein structures. In this discussion, pharmacophores work as descriptors of both protein and compound. The pharmacophores are used only to derive a regression model, and we do not calculate them explicitly. And the total number of pharmacophores could be infinite in the present discussion. We suppose that the binding energy of the i-th compound and the r-th pharmacophore in a protein is Then the proteinÀcompound binding energy (DG i a ) between the a-th protein and the i-th compound is given by the following linear combination of binding energies to pharmacophores {f m ; m= 1, 2, 3,…}, where w r a are the scaler coefficients (see Figure 1).
Various protein pockets correspond to the various pharmacophores, and they work as probes for a given compound instead of f in eqs. 2 and 3. Thus the binding free energy can be estimated by the regression based on the docking scores for the various protein structures. This is a simplified model, since it does not include the intramolecular interaction or the conformational entropy of the compound. Figure 2 shows the procedure of the present QSAR method. This method requires a learning set of 3D structures of compounds, the binding energy data between those compounds, and target proteins. We assume that proteinÀligand docking programs give DG i a . We proposed an approximation, as follows. [18] Here, s i b , R b a , and b a are the docking score of the i-th compound to the b-th protein, the weight parameter, and the parameter for fitting, respectively. The set of {b} can include the target protein (the a-th protein). Equation 4 showed the RMSE of predicted binding energies was 1.5 kcal/mol. [18] One of the most serious problems with QSAR models is, in general, the limited range of applicability domains, since these models cannot work for input data that is too different from the training data set. [39] Since docking scores have been developed to mimic binding free energy, we assume that a docking score is equal to the binding free energy, DG i a = s i a , R a a = 1, b a = 0, and R b a = 0 for a ¼ 6 b in eq. 4. In this case, eq. 4 can work without any experimental affinity data, and the problem of identifying an applicability domain is avoided.
Eq. 4 gives a linear regression model whose descriptors are docking scores, and the number of parameters is equal to the number of proteins.
In the previous [18] and present models, the protein (a)compound (i) binding energy DG i a is approximated by the PCR method based on the proteinÀcompound docking scores {s i b }. The optimal principal component (PC) axis was selected to maximize the correlation coefficient by the  The PC analysis projects the score vector (column vector) of each compound into a point in the PC space. In the PC analysis, each dot represents a compound. The red, green, and blue dots represent the strong, medium, and weak affinity compounds, respectively. In this example, the first and second principal component axes (PC1 and PC2) are not useful to describe the affinity difference. c: The factor rotation method selected the representative axes (PC-M and PC-N) from the total N p axes. PC-N and PC-M describe the affinity difference clearly. d: Finally, the regression model is constructed by using PC-N and PC-M.
Here, c j a , b a , p, and d b j are the parameter, offset parameter, principal component vector, and loading vector, respectively. The total number of optimal PC axes is N axis . The upper bar represents an average. The PC axis of the proteinÀcompound docking-score matrix s gives the loading vector d and the principal component vector (axis) p. The parameters c and b are determined by multilinear regression (MLR).
N axis and N p are the number of selected axes by the factor rotation and the total number of proteins used in the docking study as the compound descriptors, respectively. Here, N axis (N axis < N p ) is determined to maximize the correlation coefficient obtained by the LOO cross-validations. The parameters are determined based on the learning set and then are used for prediction.
To calculate the proteinÀcompound docking scores s i b , we used our own program, Sievgene, [40,41] a proteinÀligand flexible docking program for in silico drug screening. Sievgene is a part of the myPresto system, which is available online (http://presto.protein.osaka-u.ac.jp/myPresto4/) and is free for academic use.

New Prediction Model with Restricted Regression Method
In the present study, we introduced a regularization term and robust estimation into the previously developed QSAR model based on docking scores using eqs. 5-6. In eq. 5, the coefficients c j a and b were determined by the multiple linear regression in our previous study. [18] In some cases, c j a was unrealistically large (10 3 -10 5 ). The value should be 0 or 1 when the target protein structure is used in the docking calculation (we call this c j a value "ideal value"), and the docking score is equal to the binding energy. We would like to restrain c j a to around this values. In the present study, the coefficients were determined by minimizing the objective function by introducing the regularization term. Let Objf (a) be the objective function for the a-th target protein for the determined parameters c and b.
Here, c ideal j a is the ideal value of c j a , and b a and l are parameters. N cmp is the total number of compounds. c ideal j a is unknown. DG exp represents the experimental DG value. The last term restrains c j a to around the ideal values. Equation 7 is a generalized version of the Tikhonov regularization. [19][20][21] To estimate the l value, we considered the following things. Since eq. 4 suggest that c i a = DG i a under the ideal conditions (DG i a = s i a , R a a = 1, b a = 0, and R b a = 0 for a ¼ 6 b, discussed in section 2.1) and that {c j a } values should correspond to DG values. The larger the l value is, the smaller the { j c j a j} values are. In general, most j DG j < 18 kcal/mol. In the present study, l was set to 0, 00001, 0.0002,.., 0.01, to satisfy { j c j a j} < 18 kcal/mol. Also, c ideal j a was set to 0, since the protein set providing the docking scores does not include the target protein structures in the present study.
In addition, we apply the maximum-likelihood-like estimation (M estimation), a robust estimation method. [42] The M estimation method weights the difference between a calculated value and an experimental value considering the predicted experimental error. The M estimation version of our objective function is given as follows: where Here, W and d are the upper limit of allowed error and a scalar value, respectively. The d value is the difference between the experimental value and the fitted value. The parameters c and b are determined to minimize the objective function. The derivation of eq. 8 is not linear, and we solve eqs. 8-10 by an iterative procedure. The M Full Paper www.molinf.com estimation method places a higher weight on likely reliable data than unreliable data. W = 0, 5, 10, 15,…, 100 were examined in the present study. We call this model (eq. 8) the "docking score QSAR model".

Generation of the Docking-score Index by ProteinÀcompound Docking
The proteinÀcompound docking scores s i b were calculated by the proteinÀcompound docking program Sievgene. [40] This ligand-flexible program reconstructed about 50% of the receptorÀcompound complexes in PDB (132 in total) with an accuracy of less than 2 Å root mean square deviation (RMSD) in a self-docking test. [40] The computational setup in the present study was exactly the same as that in the previous study. Namely, Sievgene generated up to 100 conformers for each compound, and 200 3 200 3 200 grid potentials were adapted for all proteins. The pocket regions were suggested by the coordinates of the original ligands in the receptorÀcompound complex structures, and each edge length of the grid was about 35-45 Å. The docking-scoring function is based on the physical chemistry (accessible surface area, van der Waals potential, and electrostatic potential). The estimated error in binding free energy is almost 2.5 kcal/ mol. [42] It takes 1 second to dock one compound against one protein on a single core of a Xeon 5570 CPU (2.98 GHz).

Probe Protein Sets
To generate {s i b } in eqs. 5-8, we performed a proteincompound docking simulation based on the soluble protein structures registered in the Protein Data Bank (PDB). The probe protein set consisted of 600 arbitrarily selected protein structures, as in our previous study (see APPENDIX A in Supporting Information). All of these structures were proteinÀligand complexes. The protein set did not include the present target proteins. For protein sets, the complexes containing a covalent bond between the protein and ligand were removed, and all missing hydrogen atoms were added to form all-atom models of the proteins. All water molecules and cofactors were removed from the protein structures. All Asp and Glu were prepared as negatively charged forms, while Lys and Arg were prepared as positively charged forms. The atomic charges of the proteins were the same as those in AMBER parm99. [43] The docking pocket of each protein was indicated by the coordinates of the original ligand.

Training Set: Target Proteins and Compounds
To compare the present and previous results, the compounds and their assay information (compound structures, affinities against kinases) were downloaded from the Kinase SARfari website (https://www.ebi.ac.uk/chembl/sarfari/kinasesarfari/downloads) in the ChEMBL database, as in our previous work. [18] Note that the ChEMBL main page does not link to the KinaseSARfari website directly. The biochemical assay data, namely, K i , IC 50 , %residual activity, and/or %inhibition values of human kinase protein-inhibitor systems, were also extracted from the bioactivity table in KinaseSARfari, and these data were converted to binding free energy by the software package used in our previous report. [18] The biochemical assay data were translated into the binding free energy by the ChengÀPrusoff equation and others. [14,44] . We assumed that the experimental conditions were the same in all the assays. The procedure is described in details in APPENDIX B in the Supporting Information.
The first target was the kinase family. As target proteins, 107 kinases were selected. The 3D structures of the compounds were energy-optimized by cosgene [32] with the general AMBER force field (GAFF), [45] and the atomic charges were calculated by the MOPAC AM1 model using the Hgene program of the myPresto suite. Each functional group in all molecules was set to the dominant ionic form at pH 7. Finally, the filter condition reduced the number of data points used, and 45,663 assay data points of 107 kinases were derived.
The second target was the MMP family. We selected MMP2, MMP3, MMP7, MMP9, and MMP13. The protein structures were extracted from the PDB. The PDB IDs were 1hov, 2y6d, 4g9l, 5b5o, and 5cuh for MMP2, MMP3, MMP7, MMP9, and MMP13, respectively. [46][47][48][49][50] The protein structures were prepared in the same manner as the kinases. The zinc atom charges of these MMPs were set to + 2, which is the formal charge of the most stable singlet zinc ion. As mentioned in the previous works, the actual charges were smaller than the formal charge and the charge values should differ from each other depending on the pocket structures. The proteinÀcompound interaction data were extracted from the ChEMBL database. The compound structures and the DG values were prepared in the same manner as the kinases.

Definitions of Q 2 and RMSE
The definition of Q 2 and root-mean-square error (RMSE) are determined as follows.

Cross-validation Tests of the Docking-score QSAR Model
Each of the compounds that gave assay data for one or more of the 107 target proteins was docked to all proteins of a protein set to generate the proteinÀcompound docking-score matrix s. Then we adopted eq. 8 with changes to l and W values and the LOO cross-validation test to calculate the Q 2 . In addition, we applied the 4-fold cross validation test in all kinase cases to verify the results. Table 1 summarizes the RMSE and Q 2 values of DGs calculated by the docking-score QSAR model with changes to the l value in eq. 8 without the M estimation. The Q 2 and RMSE depended on l, and l = 0.0001 showed the best Q 2 and RMSE. These values did not change appreciably when l was > 0.00001 and < 0.005. The regularization term worked well, and l was set to around 0.0001-0.005 in the following calculations. In the present study, our regression model did not use the docking scores for the target proteins and instead used the 600 probe proteins. Indeed, we found the coefficient R b a values reasonably low. Namely, the maximum and minimum values of R b a were 3.9 and À3.8, respectively, and the average and standard deviations of R b a were 0.01 and 0.1, respectively.
In eq. 8, the estimated error d was unknown a priori. The d value was estimated by an iterative solution method. Starting from d = 0, the new d value was estimated by using the previous d value. The iteration converged within 4-6 steps in all cases. Tables 2 and 3 summarize the Q 2 , and RMSE of calculated DGs obtained by the docking-score QSAR model with changes to the W value in eq. 9. The data in Tables 2 and 3 were obtained by the LOO and 4-fold cross validation tests, respectively. The Q 2 and RMSE values were improved when 20 kcal/mol < W < 100 kcal/mol in many cases compared to the result by eq. 7 (W value is "À" in Tables 2 and 3) and the prediction with W = 5 kcal/mol gave the worst Q 2 and RMSE values ( Figure S1). The Q 2 and RMSE values depended on W weakly, and the results with W > 20 kcal/mol were almost equal to each other. The M estimation improved the results slightly. Equation 8 with l = 0.0002 and W = 20 kcal/mol in the LOO cross validation and that with l = 0.005 and W = 20 kcal/mol in the 4-fold cross validation gave the best Q 2 and RMSE values. Figure 3 shows the results of the LOO cross-validation test with l = 0.0002 and W = 20 kcal/mol for all 107 kinases. The average error reached 1.08 kcal/mol and Q 2 = 0.70 in the binding free energy. Some of the datasets showed very high accuracy, considering that the thermal fluctuation was about 0.6 kcal/mol at room temperature (Table S1). The results of the 4-fold cross-validation test with l = 0.005 and W = 20 kcal/mol for all 107 kinases showed qualitatively the same trends as those in Figure 3 ( Table S2). The W value was originally in the acceptable error range, and the regression/analysis ignored the data points with the estimation error (d) > W. The optimal W value was much bigger than the standard deviation of DG values that are multiply observed for each target proteinÀcompound pair

Effect of Elimination of Outliers from the Experimental Data
There are multiple experimental affinity data for some single proteinÀcompound pairs in the database. This is because the experimental data depend on the experimental conditions such as pH, temperature, density of buffer salt, and cell line, and then the unique proteinÀcompound pairs in the database correspond to these different experimental affinity data points under different experimental conditions. Some kinases are anti-cancer drug targets, and the aminoacid mutation of kinases causes drug resistance. The database includes such kinase data, and these mutants with different affinities share unique protein IDs in the database. In this case, a drug-resistant protein should show weaker binding affinity to the same compound than to a native protein. The enzyme activity could depend on the effector protein in the proteinÀprotein interaction network. In this case, the observed enzyme activity depends on the experimental environment, such as whether it is in vivo or in vitro. Also, the protein activities depend on the temperature, pH, and density of the salt of the buffer solvent. These conditions are not shown in the ChEMBL database clearly.
The average standard deviation of the Log 10 K i affinity data was about 0.24 kcal/mol, that of Log 10 IC 50 was about 0.44 kcal/mol, and that of Log 10 (%-inhibition or %-residual activity) was 0.093 kcal/mol. When all the experimental data were translated to DG, the average deviation of the binding affinity was about 0.73 kcal/mol (Table S3).
We examine the effect of eliminating outliers among multiple data on the prediction result. When multiple affinity data correspond to a single pair of a protein ID and a compound ID, we eliminate outliers among the multiple data. Let the number of multiple data points for the a-th protein and the i-th compound be M a i and the k-th affinity value be E a i (k). We define E a i (k) as an outlier of the data set when E a i (k) satisfies the following relationship.
where DG a i is the average value of a set of E a i (m), m = {1,.., M a i }, and s a i is the deviation of the data. Equation 14 defines the k-th compound as an outlier when the k-th compound satisfies this condition. In the present study, we removed outliers from all experimental data trying N = 0.2, 0.4, 0.5, 0.6, 0.8, 1, 2, and 3 before the following cross-validation tests. Table 4 summarizes the cross-validation results for N = 0.2, 0.4, 0.5, 0.6, 0.8, 1, 2, and 3. In Table 4, the regression models used were eq. 8 with l = 0.0002 (LOO cross validation) and l = 0.005 (4-fold cross validation), respectively. In both tables, W = 20 kcal/mol. In all cases, the elimination of outliers in the test data set improved the prediction results with decreasing N. Elimination of outliers worked well when N s was set to N < 0.8. Figure 4 shows the result of the 4-fold cross-validation test for all 107 kinases. The result by the LOO cross validation was similar to Figure 4. Some of the data points are located vertically at À4 kcal/mol. These data points correspond to 0% inhibitions or 100% residual activities. Except for these data points, the predicted data points clearly correlate to the experimental data. Also, we examined the 4-fold cross validation case with l = 0.002, and the result was close to that summarized in Table 5 (Table S4).
When N = 0.2, 0.5, 0.8, and 1.0, eqs. 13-14 eliminated 23%, 21%, 17%, and 4% of the data, respectively, out of the total of 45,663 data points. Since the standard deviation of the experimental DG values was 0.7 kcal/mol, the results of RMSE < 0.7 kcal/mol should be the accurately predicted cases. The outlier elimination increased the accurately predicted cases (RMSE < 0.7 kcal/mol) from 20 to 42 cases out of the 107 targets in the LOO cross validation.
In some cases, the outlier elimination did not work and the prediction accuracy remained low. The data sets with more than 1000 data points showed particularly low accuracy, such as the sets for cyclin-dependent kinase 2, epidermal growth factor receptor erbB1, tyrosine-protein kinase SRC, vascular endothelial growth factor receptor 2, and MAP kinase p38 alpha. The reason for this was unclear. When a target protein has multiple ligand binding sites, such as orthosteric and allosteric sites, the linear regression model is not suitable. In this case, nonlinear regression, such as logistic regression and neural networks, may solve the problem. But the present model is based on eqs. 2-4. When the target protein structure gave the docking score without computational error, then DG i a = s i a . The nonlinear regression must satisfy this simple condition. This problem is somewhat troublesome. We will examine it in the future.
The ChEMBL database did not provide details on the experimental conditions (density of native ligand, temperature, etc.). The DG values should be precise by using the actual experimental information. But this method requires natural language analysis, which is an expensive approach. One possible improvement of the present method might be batch effect correction. [52] That is, we could determine some optimal artificial values as the experimental parameters, such as the densities of ligands and substrates, in order to minimize the prediction error. In this approach, first the present method generates a prediction model based on the experimental data with the standard experimental values, then the batch-effect correction method would optimize the experimental parameters of each batch to minimize the computational error of the set of data points of the batch.
The outlier elimination improved accuracy more than the M-estimation did. This gap might be attributable to the fact that the standard deviations of DG values differ from each other among the different targets and different compound sets, while the W value is consistent throughout all the data. This difference could be the reason why the outlier elimination improved the results better than the M estimation.

Application to MMPs
We studied MMP2, MMP3, MMP7, MMP9, and MMP13 by the docking-score QSAR method and the naïve proteinÀcompound docking simulations. We applied the

Full Paper
www.molinf.com naïve proteinÀligand docking calculation by Sievgene. [40] Figure 5A shows the correlation between the experimental and the calculated proteinÀcompound binding free energies of MMP2 by Sievgene. The averaged values of the Q 2 and RMSE were 0.037 and 1.692 kcal/mol, respectively. Table 5 summarizes the total number of ligands as well as the Q 2 and RMSE values of MMPs by naïve docking. The accuracy was poor and the results did not show any experimental trends, since the Sievgene docking program does not have a metal-contact term to support the metal ions.
Next, we applied the docking-score QSAR method to the same data sets. We applied the LOO cross-validation test to the MMPs. Figure 5B shows the correlation between the experimental and calculated proteinÀcompound binding free energies of MMP2 by eq. 8. The Q 2 and RMSE values by the LOO cross-validation test were 0.88 and 1.08 kcal/mol, respectively. The RMSE values by the docking-score QSAR method were much better than those by the naive docking, while both methods used the same docking program (Table 5).
We checked the individual correlation results. The results obtained by the docking-score QSAR method were much better than those by the naïve docking in many cases. The overall trends of the results were the same as those in the kinase cases ( Figure S3). Namely, the average RMSE value by the naïve docking studies was 1.7 kcal/mol. These values were close to the deviation of the experimental DG values. On the other hand, the docking-score QSAR showed an average RMSE of 1.1 kcal/mol and a better Q 2 .
The docking-score QSAR method used only the Sievgene docking scores and was exactly the same program used in the above naïve docking study without any parameter tuning for the quantum mechanical interaction between the metal and the ligand. The docking-score QSAR method should take into account the quantum mechanical interaction implicitly to improve the prediction results. The RMSE values were similar to those given in Section 3.2.
The docking-score QSAR method was based on the docking program without consideration of the quantum effect between the metal atom and the ligands, but it improved the RMSE by 0.6 kcal/mol, the same as in the kinase cases. Thus, the present method predicted DG values with an RMSE of 1 kcal/mol even for the difficult targets like the MMPs, and it could be applied to general target proteins.
The force-field parameters for the metalÀligand interaction are generally poor, because such interactions should include quantum effects, which depend on the environment. Consequently, the naïve docking simulations could not provide good scores for the MMP systems. The current docking-score QSAR model is based on a docking score that does not include a quantum effect, but this QSAR model is a completely different approach from that of the naive docking study. The machine learning procedures provided smaller RMSE value than the naive docking study.

Conclusions
We developed a docking-score QSAR model based on combinations of multiple docking scores from proteinÀdrug docking simulations, and applied this model to 107 kinase proteins from the ChEMBL public database. The prediction model employed a descriptor-based weighted PCR with a regularization term and robust estimation (M estimation) methods in order to realize more realistic prediction than that by the ordinal multilinear regression model. The compound descriptor was a set of docking scores against many nontarget proteins. The LOO and 4-fold crossvalidation tests showed that the addition of the regularization term improved the RMSE from 1.5 kcal/mol to 1.1 kcal/ mol. In addition, the data preparation with outlier elimination worked to improve the results. We also applied the present method to MMPs that were difficult targets because of the quantum mechanical interaction. The present method improved the RMSE values for the MMPs without any manual parameter tuning, the same as in the kinase cases. The LOO cross-validation tests showed that the docking-score QSAR method improved the RMSE from 1.7 kcal/mol by the naïve docking calculation to 1.1 kcal/ mol. These results suggested that this method is applicable to general target proteins. The analysis performed was based on the cross-validation tests only, and there was no prospective experimental validation in this study. Further analysis of the validation tests designed for extrapolation should be performed.