Once both the KiA3 and REA3 responses had been desirability-scaled for each compound, the corresponding overall desirability values (DKiA3-REA3) were derived. The combined response DKiA3-REA3 was then mapped as a function of four simple 1D MDs, each with a direct structural and/or physicochemical interpretation. The aim was to identify the factors governing the trade-off between binding affinity and efficacy in this family of A3AR agonists. The resulting best-fit model, together with its statistical regression parameters, is given below:
N = 32 R2 = 0.781 R2Adj = 0.749 F = 24.13 s = 0.127
Q2LOO = 0.566 sLOO = 0.138 Q2Boost = 0.539 sBoost = 0.179 a(R2) = 0.0063 a(Q2) = −0.0039
The statistical significance and predictive ability of the model indicate that it is suitable for subsequent analyses.
No violations of the preadopted parametric assumptions were found for eqn (12).
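The desirability scaling that precedes eqn (12) can be sketched as follows. This minimal illustration makes two assumptions. First, it uses Derringer–Suich-type one-sided linear desirability functions: lower KiA3 is better and higher REA3 is better. Second, it combines the individual values by a geometric mean. The bounds and function names are hypothetical placeholders, not the values used to build the training set.

```python
import math


def d_smaller_is_better(y, best, worst):
    """One-sided linear desirability: 1 at or below `best`, 0 at or above `worst`."""
    if y <= best:
        return 1.0
    if y >= worst:
        return 0.0
    return (worst - y) / (worst - best)


def d_larger_is_better(y, worst, best):
    """One-sided linear desirability: 0 at or below `worst`, 1 at or above `best`."""
    if y <= worst:
        return 0.0
    if y >= best:
        return 1.0
    return (y - worst) / (best - worst)


def overall_desirability(ds):
    """Geometric mean of the individual desirabilities (0 if any of them is 0)."""
    if any(d == 0.0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))


# Hypothetical bounds: Ki in nM (lower is better), relative efficacy in % (higher is better).
KI_BEST, KI_WORST = 1.0, 1000.0
RE_WORST, RE_BEST = 0.0, 100.0


def d_ki_re(ki, re_):
    """Overall desirability of one compound from its KiA3 and REA3 values."""
    d_ki = d_smaller_is_better(ki, KI_BEST, KI_WORST)
    d_re = d_larger_is_better(re_, RE_WORST, RE_BEST)
    return overall_desirability([d_ki, d_re])


print(round(d_ki_re(10.0, 90.0), 3))
```

The geometric mean is used so that a compound that is completely undesirable for either property gets an overall value of 0. It cannot be rescued by the other property.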
In parallel, two QSAR PMs focused on predictive ability were derived, one for KiA3 and one for REA3 (hereafter referred to as prediction approach A2). Both are used together with the overall desirability PM described above (eqn (12), hereafter prediction approach A1) in an LBVS strategy that combines their concurrent predictions through belief theory.
The resulting best-fit models together with the statistical regression parameters are given in eqns (13 and 14):
N = 32 R2 = 0.985 R2Adj = 0.981 F = 230.82 s = 48.796
Q2LOO = 0.977 sLOO = 56.345 Q2Boost = 0.957 sBoost = 61.246 a(R2) = 0.0017 a(Q2) = −0.0052
N = 32 R2 = 0.966 R2Adj = 0.956 F = 96.79 s = 5.515
Q2LOO = 0.942 sLOO = 6.369 Q2Boost = 0.921 sBoost = 7.182 a(R2) = 0.0017 a(Q2) = −0.0055
According to their statistics, both models are good in terms of statistical significance and predictive ability. Unlike eqn (12), eqns (13 and 14) were derived from a pool of variables much larger than the number of training cases. As a consequence, the risk of finding chance correlations in such a vast variable space is high, so checking for them is of vital importance here. The very low values of a(R2) and a(Q2) obtained in the respective Y-scrambling experiments give no reason to ascribe the statistical significance and predictive ability of either PM to chance correlations.
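The Y-scrambling check can be sketched in plain Python. The sketch assumes the usual Eriksson-style formulation: a(R2) and a(Q2) are the intercepts of straight lines fitted to the R2 and Q2LOO of the scrambled models. Each value is plotted against the absolute correlation between the original and permuted responses, with the unscrambled model at a correlation of 1. The number of permutations, the synthetic data and all function names are illustrative choices.

```python
import random


def _inverse(a):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_r2_q2(x, y):
    """OLS with intercept; returns (R2, Q2_LOO) using the hat-matrix PRESS shortcut."""
    X = [[1.0] + list(row) for row in x]
    p = len(X[0])
    inv = _inverse([[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)])
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = press = 0.0
    for r, yi in zip(X, y):
        e = yi - sum(b * v for b, v in zip(beta, r))
        h = sum(r[i] * inv[i][j] * r[j] for i in range(p) for j in range(p))
        sse += e ** 2
        press += (e / (1.0 - h)) ** 2  # leave-one-out residual without refitting
    return 1.0 - sse / sst, 1.0 - press / sst


def _corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5


def _intercept(xs, ys):
    """Intercept of the least-squares line ys = a + b * xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / sum((u - mx) ** 2 for u in xs)
    return my - b * mx


def y_scrambling(x, y, n_perm=100, seed=0):
    """Return (a_R2, a_Q2): intercepts of R2 and Q2 versus |corr(y, y_permuted)|."""
    rng = random.Random(seed)
    r2, q2 = fit_r2_q2(x, y)
    cs, r2s, q2s = [1.0], [r2], [q2]  # the unscrambled model sits at corr = 1
    for _ in range(n_perm):
        yp = list(y)
        rng.shuffle(yp)
        r2p, q2p = fit_r2_q2(x, yp)
        cs.append(abs(_corr(y, yp)))
        r2s.append(r2p)
        q2s.append(q2p)
    return _intercept(cs, r2s), _intercept(cs, q2s)
```

Intercepts near zero, or negative for Q2, indicate that the scrambled models collapse once the structure–response link is broken. This is the pattern reported for eqns (13 and 14).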
Except for the non-multicollinearity of the independent variables included in the MLR model developed for REA3, no violations of the remaining MLR parametric assumptions were found (48). As mentioned above, multicollinearity affects the usual interpretation of a regression equation. However, it does not affect the predictive ability of the PM (46).
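The text does not state which diagnostic revealed the multicollinearity. One common option is the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing descriptor j on the others. Equivalently, it is the j-th diagonal element of the inverse of the descriptors' correlation matrix. The sketch below computes it that way, with hypothetical function names. The cut-off used to flag collinearity (values of 5 or 10 are often quoted) is a convention, not something specified here.

```python
def _corr(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5


def _inverse(a):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def vif(columns):
    """VIF of each descriptor column: diagonal of the inverse correlation matrix."""
    k = len(columns)
    r = [[_corr(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = _inverse(r)
    return [inv[i][i] for i in range(k)]


print(vif([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]]))  # two descriptors with r = 0.8
```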
See Supporting Information for details of the inspection of the parametric assumptions and of the establishment of the applicability domain of eqns (12–14).
Consequently, according to their statistical parameters, the goodness of fit of the PMs involved in both prediction approaches A1 and A2 can be considered statistically significant. Moreover, the PMs show satisfactory predictive ability, and the preadopted parametric assumptions hold. The resulting predictions can therefore be regarded as reliable within the domain of the training compounds: N6-substituted-4′-thioadenosine A3AR agonists, structurally coded as a linear function of the respective subsets of MDs. Therefore, all the PMs developed can be employed in an LBVS scheme with an adequate degree of reliability.
Desirability-based prediction model interpretation and theoretical design of N6-substituted-4′-thioadenosine A3AR agonist candidates
Given the satisfactory accuracy, statistical significance and predictive ability of the overall desirability PM (eqn (12)), we can proceed with an adequate level of confidence to the simultaneous analysis of the factors governing the balance between the binding affinity and relative efficacy profiles of A3AR agonists.
Although the compounds employed vary mainly at the N6 position of the adenine ring, the MDs used to map DKiA3-REA3 are global rather than fragment based. Therefore, any inference can only concern the influence of the N6 substituents on the global molecular system.
First, the information encoded in the MDs included in the model was analyzed. According to the model regression parameters, the most influential MD is the aromatic ratio (ARR). It is followed by the squared Ghose-Crippen octanol–water partition coefficient (ALOGP2), the number of circuits (nCIR) and the total number of secondary sp3 carbon atoms (nCs). All MDs except nCIR are inversely related to the overall desirability DKiA3-REA3 of N6-substituted-4′-thioadenosine A3AR agonists.
Specifically, ARR is the fraction of aromatic atoms in the hydrogen-suppressed molecular graph and encodes the degree of aromaticity of the molecule. According to the model parameters, N6 substitutions that increase the aromaticity of the molecule do not favor DKiA3-REA3.
ALOGP2 is simply the square of the Ghose-Crippen octanol–water partition coefficient (ALOGP), a group contribution model for the octanol–water partition coefficient. Because these MDs encode the hydrophobic/hydrophilic character of the molecule, DKiA3-REA3 could be favored by N6 substituents that reduce the hydrophobicity of the molecule.
nCIR is a complexity descriptor related to molecular flexibility. It serves as a measure of rigidity, with higher numbers of circuits corresponding to reduced flexibility. Cyclic and rigid (conformationally restricted) N6 substituents could therefore increase the overall desirability of the molecular system.
Finally, the presence of secondary sp3 carbon atoms in the molecule appears to be detrimental for DKiA3-REA3.
According to the model, the balance between the binding affinity and relative efficacy profiles of N6-substituted-4′-thioadenosine A3AR agonists could be favored by a molecule with three features: a low degree of aromaticity, no secondary sp3 carbon atoms, and cyclic, rigid N6 substituents that help reduce the hydrophobicity of the system.
Note that these conclusions, although derived from a simple 1D model, are very similar to those obtained by 3D-CoMFA/CoMSIA approaches (12). Kim and Jacobson concluded that a bulky, conformationally restricted group at the N6 position of the adenine ring will increase A3AR binding affinity. They also concluded that a small bulky group at this position might be crucial for A3AR activation. Note the agreement between the previous and present work: Kim and Jacobson suggest a ‘conformationally restricted bulky group’ at the N6 position, and the present model suggests ‘cyclic and rigid substituents’.
Note also that nCIR, although not the MD most strongly related to DKiA3-REA3, is very informative about the property. From nCIR, we can infer that the bulkiness of the N6 substituent suggested in (12) is better characterized by a cyclic than by an alkyl substituent.
Although useful, this information is incomplete, because steric factors are well known to be determinant in the design of A3AR agonists, especially for binding affinity (12). Consequently, it is important to determine the optimal size of the conformationally restricted cyclic N6 substituent. Unfortunately, simple inspection of the regression parameters of the PM does not provide this information. Therefore, a property/desirability profiling was carried out to identify the levels of the MDs in the PM that simultaneously generate the most desirable combination of binding affinity and relative efficacy.
The main goal of this analysis is to extract information on the factors governing DKiA3-REA3 rather than to optimize it. For this reason, the behavior of DKiA3-REA3 was profiled at the mean values of the four MDs rather than by searching for their optimal values (see first row in Figure 1). This made it possible to find the levels of the MDs that simultaneously produce the best possible DKiA3-REA3 within the training set employed. As can be seen in Figure 1 (second row), an A3AR agonist candidate should exhibit a DKiA3-REA3 value near 0.9 when ARR, nCs, ALOGP2, and nCIR are around 0.4, 2, 0, and 6, respectively.
Figure 1. Property/desirability profiling of the levels of the molecular descriptors that simultaneously produce the most desirable combination of binding affinity and relative efficacy of N6-substituted-4′-thioadenosine A3AR agonists.
The analysis reveals four conditions for the most favorable balance of binding affinity and agonist efficacy. ARR should be not just low but close to 0.4. ALOGP2 should be as low as possible. The number of secondary sp3 carbon atoms should be kept around two. Finally, nCIR should be not just high but close to six.
The thioadenosine nucleus already contains three secondary sp3 carbon atoms, so, at least within the applicability domain of the present model, three is the minimum attainable number of such atoms. Therefore, this type of carbon must be excluded from the substituents at the N6 position.
Likewise, since the nCIR value of the thioadenosine nucleus is four, the ideal nCIR contribution of the N6 substituent is two. Structurally, this translates into bicyclic N6 substituents.
The inclusion in the PM of nCIR instead of the number of rings in the chemical graph (nCIC) is also significant. Both MDs encode similar structural information (the number of cyclic structures in a chemical graph), but their graph-theoretical information is quite different. While nCIC encodes the number of rings, nCIR counts both rings and circuits (a circuit is a larger loop around two or more rings). For example, naphthalene contains 3 circuits but only 2 rings. This is illustrated in Figure 2.
Thus, additional information can be inferred: the bicyclic N6 substituent should not be fused. A fused bicyclic system such as naphthalene contributes three circuits and would raise nCIR to seven. Two non-fused rings contribute exactly the two circuits needed to reach the optimal value of six. This requirement could be related to the binding interaction of this type of fragment with the A3AR. In fact, a certain degree of rotational freedom between the two rings of the fragment could favor its docking into the receptor cavity.
This result agrees with previous experimental findings on the SAR of this family of thioadenosine derivatives (34). That SAR indicates that compounds with bulky N6 substituents lost their binding to the A3AR. Paradoxically, two compounds substituted with an N6-(trans-2-phenylcyclopropyl)amino group showed high binding affinity at the human A3AR and were found to be full agonists there. In addition, compounds with α-naphthylmethyl N6 substituents were found to lose their binding to the A3AR (34), which reinforces the present proposal.
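The difference between rings and circuits can be made concrete on small hydrogen-suppressed graphs. The sketch below enumerates every simple cycle (circuit) of an undirected graph by depth-first search and compares the result with the cyclomatic number, E − V + C, which gives the ring count. The atom numbering and function names are illustrative. That this matches the exact algorithm used by the descriptor software is an assumption.

```python
def count_circuits(n_atoms, bonds):
    """Count all simple cycles (circuits) of an undirected graph by DFS.

    Each cycle is enumerated from its lowest-numbered atom, only through
    higher-numbered atoms, so it is found exactly twice (once per direction).
    Exponential in the worst case; adequate for small substituent graphs.
    """
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    count = 0

    def dfs(start, node, visited, length):
        nonlocal count
        for nxt in adj[node]:
            if nxt == start and length >= 3:
                count += 1  # closed a cycle of `length` atoms
            elif nxt > start and nxt not in visited:
                visited.add(nxt)
                dfs(start, nxt, visited, length + 1)
                visited.remove(nxt)

    for s in range(n_atoms):
        dfs(s, s, {s}, 1)
    return count // 2


def count_rings(n_atoms, bonds, n_components=1):
    """Cyclomatic number E - V + C, i.e. the size of the smallest set of smallest rings."""
    return len(bonds) - n_atoms + n_components


# Naphthalene (fused): two 6-membered rings sharing the 4-9 bond.
NAPHTHALENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 9), (9, 0),
               (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]
# Biphenyl (non-fused): two 6-membered rings joined by the single bond 0-6.
BIPHENYL = ([(i, (i + 1) % 6) for i in range(6)]
            + [(6 + i, 6 + (i + 1) % 6) for i in range(6)] + [(0, 6)])

print(count_circuits(10, NAPHTHALENE), count_rings(10, NAPHTHALENE))  # circuits, rings
print(count_circuits(12, BIPHENYL), count_rings(12, BIPHENYL))
```

Circuits in parts of a molecule joined only by acyclic bonds simply add up. Attaching a naphthalene-like substituent to a nucleus with nCIR = 4 therefore gives 7, whereas a biphenyl-like (non-fused) one gives 6.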
The same study also concluded that bulky N6 substituents affect only binding affinity. However, bulky (bicyclic) substituents such as a trans-2-phenylcyclopropyl group could benefit agonist efficacy without loss of binding affinity. Although that experimental study did not analyze both properties simultaneously, its findings agree well with our theoretical results.
So far, we have shown the importance of bicyclic, rigid N6 substituents that help reduce the hydrophobicity of the system. Such substituents are key to an adequate balance between the binding affinity and relative efficacy profiles of N6-substituted-4′-thioadenosine A3AR agonists.
At first sight, this information is quite focused, and one might expect the task of finding promising candidates to be almost complete. However, consider the number of attainable N6 substituents of this type that can be generated from even a tiny portion of the chemical space indicated by this information. From it, we can extrapolate the huge number of possible candidates (Table 3). Note that this analysis only took into account unsaturated rings and the valence of the atoms. The number of options could rise or fall if double bonds or chemical feasibility were considered. In any case, although focused, the ‘haystack’ is vast, so a focused screening strategy is essential to efficiently find a ‘needle’ in it.
Table 3. Fraction of the chemical space defined by N6 substituents formed from the possible combinations of two non-fused rings linked by a single bond
Therefore, the previous information was employed for the theoretical design of new N6-substituted-4′-thioadenosine analogs with an adequate balance between binding affinity and agonist efficacy. Because ARR and ALOGP2 cannot easily be manipulated by structural modifications, the design efforts focused mainly on nCs and nCIR. Thus, a combinatorial library of N6-substituted-4′-thioadenosine candidates with nCs ≈ 3 and nCIR ≈ 6 was assembled with the aid of the SmiLib software (48), a tool for the rapid assembly of combinatorial libraries in SMILES notation. The library was designed to produce candidates with conformationally restricted bicyclic N6 substituents while keeping secondary sp3 carbon atoms to a minimum. The 4′-thioadenosine nucleus served as the scaffold, and a set of 25 cyclic or heterocyclic structures served as linkers and building blocks. The combinatorial scheme is shown in Table 4.
Table 4. Scaffolds, linkers, and building blocks employed to assemble the combinatorial library
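The scaffold/linker/building-block assembly can be illustrated with plain string templating over the Cartesian product of fragments. This is only a sketch of the combinatorial idea, not SmiLib's input format or algorithm. The SMILES fragments are hand-written, non-stereochemical examples that have not been validated with a cheminformatics toolkit. The `{R}`/`{B}` attachment markers and all names are hypothetical. Filtering products by nCs and nCIR would additionally require a toolkit to compute those descriptors.

```python
from itertools import product

# Hypothetical fragments. "{R}" marks the N6 attachment point on the scaffold;
# "{B}" marks where a building block is joined to a linker.
SCAFFOLD = "{R}Nc1ncnc2n(cnc12)C1SC(CO)C(O)C1O"  # 4'-thioadenosine core, no stereo
LINKERS = ["{B}C1CC1", "{B}c1ccc(cc1)"]          # cyclopropyl, 1,4-phenylene
BLOCKS = ["c2ccccc2", "c2ccncc2", "c2ccsc2"]     # phenyl, pyridyl, thienyl


def enumerate_library(scaffold, linkers, blocks):
    """Yield one product SMILES per (linker, building block) combination."""
    for linker, block in product(linkers, blocks):
        substituent = linker.replace("{B}", block)   # build the N6 substituent
        yield scaffold.replace("{R}", substituent)   # attach it at N6


library = list(enumerate_library(SCAFFOLD, LINKERS, BLOCKS))
for smi in library:
    print(smi)
```

The library size is simply the product of the fragment list sizes. This is why even a modest set of cyclic fragments quickly yields thousands of candidates.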
This combinatorial strategy produced more than 9000 candidates. According to the previous results, these can be submitted to a virtual screening campaign that uses the predicted DKiA3-REA3 value of each candidate as the ranking criterion. As mentioned before, only candidates within the applicability domain of the overall desirability PM (3395 candidate molecules) should be submitted to the ranking process. Figure 3 plots the predicted DKiA3-REA3 values of the 9782 candidate molecules against their leverage values. Predictions range from −0.31 to 1.70, but candidates within the PM applicability domain have predicted DKiA3-REA3 values between 0.22 and 1.44 only. As a result, a reduced set of candidates with a promising balance between A3AR binding affinity and agonist efficacy can be proposed for biological screening. Table 5 lists the values of the MDs in the overall desirability PM, together with the predicted DKiA3-REA3, for a portion of the ranked combinatorial library.
Figure 3. Predicted DKiA3-REA3 values of the candidate molecules included on the combinatorial library plotted vs. their respective leverage values.
Table 5. Fractions of the combinatorial library ranked according to the predicted values of DKiA3-REA3
| Rank | Comb. Lib. ID* | ARR | nCIR | nCs | ALOGP2 | Pred. DKiA3-REA3 |
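The applicability-domain filter plotted in Figure 3 can be sketched with the standard leverage definition, h = xᵀ(XᵀX)⁻¹x, where X is the training descriptor matrix with an intercept column. Candidates are kept only if h does not exceed a warning leverage. The sketch assumes the common threshold h* = 3p/n (p model parameters, n training compounds). The threshold actually used is documented in the Supporting Information, and the function names here are hypothetical.

```python
def _inverse(a):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def applicability_domain_ranking(train_x, cand_x, cand_pred, k=3.0):
    """Keep candidates with leverage h <= h* = k * p / n and rank them by prediction.

    p counts the model parameters (descriptors + intercept); n the training compounds.
    Returns (h_star, [(candidate_index, leverage, prediction), ...]) sorted by
    decreasing prediction.
    """
    X = [[1.0] + list(row) for row in train_x]
    n, p = len(X), len(X[0])
    inv = _inverse([[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)])
    h_star = k * p / n
    kept = []
    for idx, (row, pred) in enumerate(zip(cand_x, cand_pred)):
        v = [1.0] + list(row)
        h = sum(v[i] * inv[i][j] * v[j] for i in range(p) for j in range(p))
        if h <= h_star:  # inside the structural domain of the training set
            kept.append((idx, h, pred))
    kept.sort(key=lambda t: t[2], reverse=True)
    return h_star, kept


train = [[float(i)] for i in range(10)]
print(applicability_domain_ranking(train, [[4.5], [20.0], [0.0]], [0.5, 1.7, 0.9]))
```

In this toy case the candidate with the highest prediction (1.7) lies far outside the training descriptor range and is discarded. This mirrors how extreme predicted values in the library fall outside the domain.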
Library ranking based on the combination of desirability and belief theories
The idea of desirability-transforming and combining a number of related properties agrees with the concept of a pharmaceutical profile (32,33). Nevertheless, a parallel approach that provides feedback on the reliability of the properties predicted as a single Di value would also be useful.
Suppose two or more property values Yi of a compound (previously scaled to their respective di values with a proper DF) are combined into a single Di value. That value is then mapped as an MLR function of n MDs Xi (approach A1). It is reasonable to expect the resulting predicted Di value to be similar to the one obtained by the inverse approach (approach A2). The inverse approach consists of three steps: independently mapping each of the k properties Yi as an MLR function of n MDs Xi, desirability-scaling each predicted Yi value, and combining the corresponding di values into a single predicted Di value.
If this reasoning holds, the more similar the predicted Di values of both approaches are, the more reliable they should be, and vice versa. Clearly, the results will depend on the goodness of fit and predictive ability of the PMs involved. In addition, PMs built on different sets of MDs will carry different degrees of uncertainty.
Therefore, a framework is required for fusing the results of several approaches with different degrees of uncertainty, in order to assess the reliability of their predictions. In the present work, we selected Dempster–Shafer Theory (DST) (49–51), also known as belief theory, to achieve that goal. DST is a mathematical theory of evidence developed to combine separate pieces of information arising from different sources (52). It is based on two ideas: obtaining degrees of belief for one question from subjective probabilities for a related question, and Dempster’s rule for combining such degrees of belief when they are based on independent items of evidence (52).
The foundations of DST can be traced to the work of George Hooper, who published an article entitled ‘A calculation of the credibility of human testimony’ in the Philosophical Transactions of the Royal Society (50). In this article, Hooper formulated two rules relating the credibility of reports to the credibility of the reporters who make them (51).
These two rules are quite simple. The rule for successive testimony says that if a report has been relayed to us through a chain of n reporters, each having a degree of credibility p, then the credibility of the report is pⁿ. The rule for concurrent testimony says that if a report is concurrently attested to by n reporters, each with credibility p, then the credibility of the report is 1 − (1 − p)ⁿ, where 0 ≤ p ≤ 1. Thus, the credibility of a report is weakened by transmission through a chain of reporters but strengthened by the concurrence of reporters (50,51).
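Hooper's two rules translate directly into code. The function names in this minimal sketch are illustrative.

```python
def _check(p):
    if not 0.0 <= p <= 1.0:
        raise ValueError("credibility p must lie in [0, 1]")


def successive_credibility(p, n):
    """Successive testimony: a report relayed through a chain of n reporters -> p**n."""
    _check(p)
    return p ** n


def concurrent_credibility(p, n):
    """Concurrent testimony: a report attested by n reporters -> 1 - (1 - p)**n."""
    _check(p)
    return 1.0 - (1.0 - p) ** n


print(successive_credibility(0.9, 2), concurrent_credibility(0.9, 2))
```

With p = 0.9 and n = 2, relaying the report weakens its credibility to 0.81, while concurrent attestation strengthens it to 0.99 (up to floating-point rounding).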
The situation described above, with two parallel overall desirability PMs each approached inversely, is closely analogous. DST, and specifically Hooper’s rule for combining concurrent evidence (50,51), is therefore fully applicable to our problem. It is only necessary to replace ‘report’ with ‘prediction’ and ‘reporter’ with ‘PM’, and the previous paragraph describes our problem almost literally.
Developing a probability assignment is the basic function in DST; it expresses the level of confidence that can be ascribed to a particular measurement. In this work, however, we are interested in the desirability of a compound. Consequently, rather than a probability assignment for each compound, we use the desirability values from both overall desirability prediction approaches (D1 and D2) to derive the final joint belief values (BD):
Although desirability is not itself a probability, its values, like probabilities, range from 0 to 1. Therefore, it can be used to derive BD for each compound. In this way, BD encodes the reliability of the predicted desirability of a compound across two inverse but complementary prediction approaches. BD can then be used as a ranking criterion in a virtual screening scheme, which makes it particularly useful for LBVS.
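The expression for BD is not written out explicitly here. One reading consistent with Hooper's rule for concurrent testimony treats D1 and D2 as two "reporters" with different credibilities, giving BD = 1 − (1 − D1)(1 − D2) = D1 + D2 − D1·D2. The sketch below implements that assumed form. MLR predictions can fall outside [0, 1], as the range of predicted DKiA3-REA3 values for the combinatorial library shows. The sketch therefore clips its inputs first, which is also an assumption.

```python
def clip01(d):
    """Clamp a predicted desirability into [0, 1] (assumption: out-of-range MLR outputs)."""
    return min(1.0, max(0.0, d))


def joint_belief(d1, d2):
    """Assumed joint belief BD: Hooper's concurrent rule with credibilities D1 and D2."""
    d1, d2 = clip01(d1), clip01(d2)
    return 1.0 - (1.0 - d1) * (1.0 - d2)


print(joint_belief(0.5, 0.5))  # two moderately desirable predictions reinforce each other
```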
An LBVS strategy based on BD can be described by the sequence of steps detailed below:
The resulting ranking should be an ordered list with the most reliable compounds with the highest desirability values at the top. These are the compounds with a higher chance of exhibiting a desirable combination of the k properties modeled.
Subsequently, the BD-based virtual screening (VS) strategy described above was applied to the training set to test its performance as a ranking criterion. The assembled combinatorial library and our training set are structurally similar, so the latter can be used to infer the reliability of the ranking obtained for the combinatorial library. The predicted values of DKiA3-REA3 (approach A1) were also tested as a ranking criterion. This compared a VS strategy based on predictions from a single approach with one based on the combination of concurrent predictions. The quality of the respective rankings was compared according to Ψ*, as described earlier.
Based on the analysis of our training set, the quality of the ranking obtained using the predicted values of DKiA3-REA3 is around 80% (R% = 80.08%; Ψ* = 0.1992). This suggests an acceptable degree of confidence if the scheme is applied to our combinatorial library. As can be seen in Figure 4, using BD as the ranking criterion (R% = 82.81%; Ψ* = 0.1719) slightly outperforms the predicted values of DKiA3-REA3. In addition to the desirability of the compound, BD also encodes the reliability of that prediction. It is therefore well suited to screening larger and/or more structurally diverse libraries with a wider range of the mapped properties.
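One plausible reading of the BD-based LBVS steps can be sketched as follows. The sequence is: (1) take D1 as predicted directly by approach A1; (2) desirability-scale the predicted KiA3 and REA3 values of approach A2 and combine them into D2; (3) combine D1 and D2 into BD with the same assumed Hooper-style form, 1 − (1 − D1)(1 − D2); (4) sort by decreasing BD. The `to_d2` callable, the toy candidates and the clipping to [0, 1] are hypothetical choices made for illustration only.

```python
from typing import Callable, Iterable, List, Tuple


def rank_candidates(
    candidates: Iterable[Tuple[str, float, float, float]],
    to_d2: Callable[[float, float], float],
) -> List[Tuple[str, float]]:
    """Rank (id, D1_pred, Ki_pred, RE_pred) tuples by the assumed joint belief BD."""
    ranked = []
    for cid, d1, ki, re_ in candidates:
        d1 = min(1.0, max(0.0, d1))                # step 1: approach A1, clipped to [0, 1]
        d2 = min(1.0, max(0.0, to_d2(ki, re_)))    # step 2: approach A2
        bd = 1.0 - (1.0 - d1) * (1.0 - d2)         # step 3: assumed concurrent combination
        ranked.append((cid, bd))
    ranked.sort(key=lambda t: t[1], reverse=True)  # step 4: most believable first
    return ranked


# Hypothetical A2 desirability: geometric mean of linear d(Ki) and d(RE).
def toy_d2(ki, re_):
    return (max(0.0, 1.0 - ki / 1000.0) * min(1.0, re_ / 100.0)) ** 0.5


ranked = rank_candidates(
    [("c1", 0.8, 10.0, 90.0), ("c2", 0.3, 500.0, 40.0), ("c3", 1.2, 5.0, 95.0)],
    toy_d2,
)
print(ranked)
```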