Prediction models
Once both the Ki_{A3} and RE_{A3} responses had been desirability-scaled for each compound, the corresponding overall desirability (D_{KiA3REA3}) values were derived. To identify the factors governing the trade-offs between binding affinity and efficacy in this family of A_{3}AR agonists, the combined response D_{KiA3REA3} was mapped as a function of four simple 1D MDs with a direct structural and/or physicochemical explanation. The resulting best-fit model, together with the statistical regression parameters, is given below:
 (12)
N = 32 R^{2} = 0.781 R^{2}_{Adj} = 0.749 F = 24.13 s = 0.127
Q^{2}_{LOO} = 0.566 s_{LOO} = 0.138 Q^{2}_{Boost} = 0.539 s_{Boost} = 0.179 a(R^{2}) = 0.0063 a(Q^{2}) = −0.0039
The statistical significance and predictive ability exhibited by the model support its suitability for subsequent analyses.
No violations of the pre-adopted parametric assumptions were found for eqn (12).
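As a quick consistency check on the statistics reported for eqn (12), the adjusted R^{2} and the F statistic can be recomputed from N, R^{2} and the number of descriptors. The sketch below assumes an ordinary MLR model with an intercept and p = 4 descriptors (the four MDs mentioned above); the function names are illustrative and not taken from the original work.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for an MLR model with n cases, p descriptors and an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2: float, n: int, p: int) -> float:
    """Overall F statistic of the regression (p and n - p - 1 degrees of freedom)."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Values reported for eqn (12); p = 4 is taken from the text (four MDs).
N, P, R2 = 32, 4, 0.781
r2_adj = adjusted_r2(R2, N, P)
f_val = f_statistic(R2, N, P)
print(f"R2_adj = {r2_adj:.3f}, F = {f_val:.2f}")
```

With the rounded R^{2} reported, the adjusted value reproduces 0.749, and F comes out close to the reported 24.13; the small difference is presumably due to rounding of R^{2}.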
At the same time, two QSAR PMs (for Ki_{A3} and RE_{A3}) focused on predictive ability were derived (identified hereafter as prediction approach A_{2}). They are used in combination with the overall desirability PM described above (eqn (12), identified hereafter as prediction approach A_{1}) in an LBVS strategy that combines their concurrent predictions through belief theory.
The resulting best-fit models, together with the statistical regression parameters, are given in eqns (13) and (14):
 (13)
N = 32 R^{2} = 0.985 R^{2}_{Adj} = 0.981 F = 230.82 s = 48.796
Q^{2}_{LOO} = 0.977 s_{LOO} = 56.345 Q^{2}_{Boost} = 0.957 s_{Boost} = 61.246 a(R^{2}) = 0.0017 a(Q^{2}) = −0.0052
 (14)
N = 32 R^{2} = 0.966 R^{2}_{Adj} = 0.956 F = 96.79 s = 5.515
Q^{2}_{LOO} = 0.942 s_{LOO} = 6.369 Q^{2}_{Boost} = 0.921 s_{Boost} = 7.182 a(R^{2}) = 0.0017 a(Q^{2}) = −0.0055
According to their statistics, both models perform well in terms of statistical significance and predictive ability. In contrast to eqn (12), eqns (13) and (14) were derived from a pool of variables considerably larger than the number of cases used for training. As a consequence, the risk of finding chance correlations in such a vast variable space is high, and checking for them is of vital importance in this case. As can be deduced from the very low values of a(R^{2}) and a(Q^{2}) obtained in the respective Y-scrambling experiments, there is no reason to ascribe the statistical significance and predictive ability exhibited by each PM to chance correlations.
With the exception of the non-multicollinearity of the independent variables included in the MLR model developed for RE_{A3}, no violations of the remaining MLR parametric assumptions were found (48). As mentioned above, multicollinearity affects the common interpretation of a regression equation. However, the predictive ability of the PM is not affected in this situation (46).
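The Y-scrambling check referred to above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' procedure: it assumes that a(R^{2}) is the intercept obtained when the R^{2} of each scrambled model is regressed against the absolute correlation between the original and permuted responses, which is a common convention but is not confirmed by this section. All function names are hypothetical.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary least-squares fit of y on X (intercept included)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)


def y_scrambling(X, y, n_rounds=200, seed=0):
    """Refit the model on permuted responses; return |r(y, y_perm)| and R^2 arrays."""
    rng = np.random.default_rng(seed)
    corr, r2 = [], []
    for _ in range(n_rounds):
        y_perm = rng.permutation(y)
        corr.append(abs(np.corrcoef(y, y_perm)[0, 1]))
        r2.append(r_squared(X, y_perm))
    return np.array(corr), np.array(r2)


# Synthetic stand-in data (NOT the paper's data): 32 cases, 4 descriptors.
rng = np.random.default_rng(42)
X = rng.normal(size=(32, 4))
y = X @ np.array([1.0, -0.5, 0.8, 0.3]) + rng.normal(scale=0.2, size=32)

r2_true = r_squared(X, y)
corr, r2_scrambled = y_scrambling(X, y)

# Assumed definition: a(R^2) is the intercept of R^2 regressed on |r(y, y_perm)|,
# with the unscrambled model included as the point (1, r2_true).
slope, intercept = np.polyfit(np.append(corr, 1.0), np.append(r2_scrambled, r2_true), 1)
print(f"true R2 = {r2_true:.3f}, mean scrambled R2 = {r2_scrambled.mean():.3f}, "
      f"a(R2) = {intercept:.3f}")
```

A model whose fit is not due to chance should show scrambled R^{2} values far below the R^{2} of the original model, and hence a low intercept.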
See Supporting Information for details of the inspection of the parametrical assumptions as well as the establishment of the applicability domain of eqns (12–14).
Consequently, according to the statistical parameters exhibited, the goodness of fit of the PMs involved in both prediction approaches A_{1} and A_{2} can be considered statistically significant. At the same time, considering their satisfactory predictive ability and the validity of the pre-adopted parametric assumptions, the resulting predictions can be regarded as reliable within the domain of the N^{6}-substituted-4′-thioadenosine A_{3}AR agonists used for training and structurally coded as a linear function of the respective subsets of MDs. Therefore, all the PMs developed can be employed in an LBVS scheme with an adequate degree of reliability.
Desirability-based prediction model interpretation and theoretical design of N^{6}-substituted-4′-thioadenosine A_{3}AR agonist candidates
Based on the satisfactory accuracy, statistical significance and predictive ability of the overall desirability PM (eqn (12)), we can proceed with an adequate level of confidence to the simultaneous analysis of the factors governing the balance between the binding affinity and relative efficacy profiles of A_{3}AR agonists.
Although the main variation within the subset of compounds employed occurs at the N^{6} position of the adenine ring, the MDs employed in mapping D_{KiA3REA3} are global and not fragment based. Any inference made therefore has to be based only on the influence of the N^{6} substituents on the global molecular system.
First, the information encoded in the MDs included in the model was analyzed. According to the model regression parameters, the most influential MD is the aromatic ratio (ARR), followed by the squared Ghose-Crippen octanol–water partition coefficient (ALOGP2), the number of circuits (nCIR) and the total number of secondary sp^{3} carbon atoms (nCs). All MDs except nCIR were inversely related to the overall desirability D_{KiA3REA3} of N^{6}-substituted-4′-thioadenosine A_{3}AR agonists.
Specifically, ARR is the fraction of aromatic atoms in the hydrogen-suppressed molecular graph and encodes the degree of aromaticity of the molecule. According to the model parameters, N^{6} substitutions that increase the aromaticity of the molecule do not favor D_{KiA3REA3}.
ALOGP2 is simply the square of the Ghose-Crippen octanol–water partition coefficient (ALOGP), which is a group contribution model for the octanol–water partition coefficient. Because these MDs encode the hydrophobic/hydrophilic character of the molecule, D_{KiA3REA3} could be favored by the presence of N^{6} substituents that reduce the hydrophobicity of the molecule.
The nCIR is a complexity descriptor related to molecular flexibility. Because nCIR serves as a measure of rigidity, with higher numbers of circuits corresponding to reduced flexibility, cyclic and rigid or conformationally restricted N^{6} substituents could increase the overall desirability of the molecular system.
Finally, the presence of secondary sp^{3} carbon atoms in the molecule appears to be detrimental to D_{KiA3REA3}.
According to the model, a molecule with a low degree of aromaticity, without secondary sp^{3} carbon atoms, and containing cyclic and rigid N^{6} substituents that reduce the hydrophobicity of the system could favor the balance between the binding affinity and relative efficacy profiles of N^{6}-substituted-4′-thioadenosine A_{3}AR agonists.
It should be noted that these conclusions, although derived from a simple 1D model, are very similar to those obtained by 3D-CoMFA/CoMSIA approaches (12). Kim and Jacobson concluded that a bulky, conformationally restricted group at the N^{6} position of the adenine ring will increase the A_{3}AR binding affinity, and that a small bulky group at this position might be crucial for A_{3}AR activation. The previous and present works agree: Kim and Jacobson suggest a ‘conformationally restricted bulky group’, and the present model suggests ‘cyclic and rigid substituents’ at the N^{6} position.
It should also be noted that, although nCIR is not the MD most significantly related to D_{KiA3REA3}, it is very informative for the property. From nCIR, we can infer that the bulkiness of the N^{6} substituent suggested in (12) can be characterized by a cyclic rather than an alkyl substituent.
Although useful, this information is incomplete, because it is well known that steric factors are determinant for the design of A_{3}AR agonists, especially for binding affinity (12). It is therefore important to determine the optimal size of the conformationally restricted cyclic N^{6} substituent. Unfortunately, simple inspection of the regression parameters of the PM does not offer this information. Consequently, a property/desirability profiling was carried out to identify the levels of the MDs included in the PM that simultaneously generate the most desirable combination of binding affinity and relative efficacy.
As the main goal of this analysis is to extract information on the factors governing D_{KiA3REA3} rather than to optimize it, the behavior of D_{KiA3REA3} was profiled at the mean values of the four MDs rather than at their optimal values (see first row in Figure 1). Accordingly, it was possible to find the levels of the MDs that simultaneously produce the best possible D_{KiA3REA3} in the training set employed. As can be noted in Figure 1 (second row), an A_{3}AR agonist candidate should exhibit a value of D_{KiA3REA3} close to 0.9 at levels of ARR, nCs, ALOGP2, and nCIR around 0.4, 2, 0, and 6, respectively.
The analysis reveals that, for the most favorable balance of binding affinity and agonist efficacy, the ARR should be not just low but close to 0.4; ALOGP2 should be as low as possible; the number of secondary sp^{3} carbon atoms should be kept around two; and nCIR should be not just high but close to six.
Because the thioadenosine nucleus already contains three secondary sp^{3} carbon atoms, the minimum number of such atoms, at least within the applicability domain of the present model, is three. This type of carbon must therefore be excluded from the substituents located at the N^{6} position.
At the same time, considering that the nCIR value of the thioadenosine nucleus is four, one can deduce that the ideal nCIR value of the N^{6} substituent should be two. Structurally, this translates into N^{6} substituents of the bicyclic type.
The inclusion in the PM of nCIR, instead of the number of rings in the chemical graph (nCIC), is also significant. Although the structural information of this pair of MDs is very similar (the number of cyclic structures in a chemical graph), their graph-theoretical information is quite different. While nCIC encodes the number of rings, nCIR includes both rings and circuits (a circuit is a larger loop around two or more rings). As an example, naphthalene contains 3 circuits and 2 rings. This is illustrated in Figure 2.
Additional information can therefore be inferred: the bicyclic N^{6} substituent should not be fused. This inference could be related to the binding interaction of this type of fragment with the A_{3}AR. In fact, a certain degree of rotational freedom between the two rings of the fragment could favor its docking into the receptor cavity.
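The distinction between ring count and circuit count can be checked on small molecular graphs. The sketch below is illustrative only: it assumes nCIC corresponds to the cyclomatic number of a connected graph (bonds − atoms + 1) and nCIR to the number of distinct simple cycles, and it uses hand-numbered heavy-atom skeletons rather than any descriptor software. The function names are hypothetical.

```python
def count_rings(n_atoms, bonds):
    """Cyclomatic number of a connected graph: bonds - atoms + 1."""
    return len(bonds) - n_atoms + 1


def count_circuits(n_atoms, bonds):
    """Number of distinct simple cycles in an undirected graph."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    total = 0

    def dfs(start, current, visited):
        nonlocal total
        for nxt in adj[current]:
            if nxt == start and len(visited) >= 3:
                total += 1  # closed a cycle whose smallest atom index is `start`
            elif nxt > start and nxt not in visited:
                visited.add(nxt)
                dfs(start, nxt, visited)
                visited.remove(nxt)

    for s in range(n_atoms):
        dfs(s, s, {s})
    return total // 2  # every cycle is found once in each direction


# Naphthalene: ten-atom perimeter plus the fusion bond between atoms 4 and 9.
naphthalene = [(i, (i + 1) % 10) for i in range(10)] + [(4, 9)]
# Biphenyl: two six-membered rings joined by a single bond between atoms 0 and 6.
biphenyl = ([(i, (i + 1) % 6) for i in range(6)]
            + [(6 + i, 6 + (i + 1) % 6) for i in range(6)]
            + [(0, 6)])

print(count_rings(10, naphthalene), count_circuits(10, naphthalene))
print(count_rings(12, biphenyl), count_circuits(12, biphenyl))
```

Under these assumptions, a fused bicycle contributes one more circuit than ring, whereas two rings joined by a single bond contribute exactly two of each, which is why a substituent nCIR of two points to a non-fused bicyclic group.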
This result matches previous experimental findings on the SAR of this family of thioadenosine derivatives (34). The SAR obtained for this family suggests that compounds with bulky N^{6} substituents lose their binding to the A_{3}AR. Paradoxically, among compounds showing high binding affinity at the human A_{3}AR, two compounds substituted with an N^{6}-(trans-2-phenylcyclopropyl)amino group were found to be full agonists at the human A_{3}AR. In addition, it was found that compounds with α-naphthylmethyl N^{6} substituents lose their binding to the A_{3}AR (34), which reinforces the present proposal.
From that study it was also concluded that bulky N^{6} substituents only affect the binding affinity; however, bulky (bicyclic) substituents such as a trans-2-phenylcyclopropyl group could be beneficial for agonist efficacy without loss of binding affinity. Although that experimental study does not deal with the simultaneous analysis of both properties, its findings match our theoretical results.
So far, the analysis has highlighted the importance of bicyclic and rigid N^{6} substituents that reduce the hydrophobicity of the system for obtaining an adequate balance between the binding affinity and relative efficacy profiles of N^{6}-substituted-4′-thioadenosine A_{3}AR agonists.
At first sight, this information is quite focused, and one could expect that the task of finding promising candidates is almost complete. However, if we consider the number of attainable N^{6} substituents of this type generated from a tiny portion of the chemical space indicated by this information, we can extrapolate the huge number of possible candidates (Table 3). This analysis has been performed taking into account only unsaturated rings and the valence of the atoms. The number of options can rise or fall if double bonds or chemical feasibility are considered. In any case, although focused, the ‘haystack’ is vast, so a focused screening strategy is essential to efficiently find a ‘needle’ in it.
Table 3. Fraction of the chemical space determined by the N^{6} substituents formed by the possible combinations of two non-fused rings linked by a single bond
Therefore, the previous information is employed for the theoretical design of new N^{6}-substituted-4′-thioadenosine analogs with adequate balances between binding affinity and agonist efficacy. Because ARR and ALOGP2 cannot be easily manipulated by structural modifications, the design efforts are mainly focused on nCs and nCIR. Thus, a combinatorial library focused on the generation of N^{6}-substituted-4′-thioadenosine candidates was assembled with nCs ≈ 3 and nCIR ≈ 6. The library was built with the aid of the SmiLib software (48), a tool for the rapid assembly of combinatorial libraries in SMILES notation. The library was directed to produce candidates with conformationally restricted bicyclic N^{6} substituents while keeping the presence of secondary sp^{3} carbon atoms to a minimum, using the 4′-thioadenosine nucleus as scaffold and a set of 25 cyclic or heterocyclic structures as linkers and building blocks. The working combinatorial scheme is shown in Table 4.
Table 4. Scaffolds, linkers, and building blocks employed to assemble the combinatorial library
This combinatorial strategy produced a set of more than 9000 candidates which, according to the previous results, can be employed in a subsequent virtual screening campaign using the predicted value of D_{KiA3REA3} of each candidate as ranking criterion. As mentioned before, only candidates within the applicability domain of the overall desirability PM (3395 candidate molecules) should be submitted to the ranking process. Figure 3 shows the plot of the predicted D_{KiA3REA3} values of the 9782 candidate molecules versus their respective leverage values. As can be noted, predictions range from −0.31 to 1.70; however, candidates within the PM applicability domain are restricted to predicted values of D_{KiA3REA3} between 0.22 and 1.44. As a result, it is possible to propose for biological screening a reduced set of candidates with a promising balance between A_{3}AR binding affinity and agonist efficacy. The values of the MDs included in the overall desirability PM, as well as the predicted value of D_{KiA3REA3}, for a fragment of the ranked combinatorial library are shown in Table 5.
Table 5. Fractions of the combinatorial library ranked according to the predicted values of D_{KiA3REA3}
Rank  Comb. Lib. ID*  ARR    nCIR  nCs  ALOGP2  Pred. D_{KiA3REA3}
1     1.36_2          0.294  6     5    0.532   1.439
2     1.36_3          0.294  6     5    0.532   1.439
3     2.4_54          0.294  6     5    0.567   1.436
4     2.5_3           0.294  6     5    0.633   1.429
5     2.5_2           0.294  6     5    0.633   1.429
2221  1.32_55         0.455  6     3    2.161   1.000
2222  1.54_17         0.455  6     3    2.163   1.000
2223  1.17_86         0.441  6     3    2.527   1.000
2224  1.55_11         0.471  6     3    1.752   0.999
2225  1.35_40         0.441  6     3    2.541   0.998
2914  2.52_108        0.441  6     3    4.388   0.800
2915  1.34_87         0.441  6     3    4.402   0.799
2916  2.10_106        0.457  6     3    3.992   0.798
2917  1.58_90         0.357  5     3    4.7     0.798
2918  1.38_109        0.441  6     3    4.418   0.797
3343  2.35_106        0.441  6     3    7.185   0.500
3344  2.48_55         0.429  6     4    6.647   0.500
3345  2.54_53         0.441  6     3    7.198   0.499
3346  2.56_106        0.441  6     3    7.242   0.494
3347  2.48_109        0.429  6     4    6.702   0.494
3391  1.48_55         0.441  6     4    8.071   0.314
3392  1.48_109        0.441  6     4    8.132   0.307
3393  1.48_110        0.441  6     4    8.256   0.294
3394  1.48_52         0.441  6     4    8.74    0.242
3395  1.48_108        0.441  6     4    8.932   0.221
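The leverage-based applicability-domain filter followed by ranking on predicted D_{KiA3REA3} can be sketched as below. This is a hedged illustration on synthetic data: the warning leverage h* = 3(p + 1)/N is a common convention that is assumed here, since the actual threshold is given only in the Supporting Information, and the function names are hypothetical.

```python
import numpy as np


def leverages(X_train: np.ndarray, X_query: np.ndarray) -> np.ndarray:
    """Leverage h = x (X'X)^-1 x' of each query row w.r.t. the training design matrix."""
    Xt = np.column_stack([np.ones(len(X_train)), X_train])
    Xq = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.inv(Xt.T @ Xt)
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)


def rank_inside_domain(X_train, X_cand, d_pred):
    """Indices of in-domain candidates, sorted by decreasing predicted desirability."""
    n, p = X_train.shape
    h_star = 3.0 * (p + 1) / n  # assumed warning leverage (common convention)
    inside = np.flatnonzero(leverages(X_train, X_cand) <= h_star)
    return inside[np.argsort(-d_pred[inside])]


# Synthetic stand-in data (NOT the paper's library): 32 training cases, 4 MDs.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(32, 4))
# Five candidates near the training data plus one deliberate outlier (last row).
X_cand = np.vstack([rng.normal(size=(5, 4)) * 0.5, np.full((1, 4), 10.0)])
d_pred = np.array([0.9, 0.4, 1.1, 0.7, 0.2, 1.7])

ranking = rank_inside_domain(X_train, X_cand, d_pred)
print(ranking)
```

In this toy example the outlier carries the highest predicted desirability but is excluded from the ranking because its leverage exceeds the threshold, mirroring the restriction of the ranked library to the in-domain candidates.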
Library ranking based on the combination of desirability and belief theories
Although the idea of desirability-transforming and combining a number of related properties is in accordance with the concept of pharmaceutical profile (32,33), a parallel approach that provides feedback on the reliability of the properties predicted as a single D_{i} value is also desirable.
If two or more property values Y_{i} of a compound (previously scaled to the respective d_{i} values with a proper DF) are combined into a single D_{i} value, which is then mapped as an MLR function of n MDs X_{i} (denoted as approach A_{1}), it is reasonable to expect that the resulting predicted D_{i} value should be similar to that of the inverse approach. The inverse approach consists of the independent mapping of the k properties Y_{i} as MLR functions of n MDs X_{i}, the subsequent desirability-scaling of each predicted Y_{i} value, and the final combination of the corresponding d_{i} values into a single predicted D_{i} value (denoted as approach A_{2}).
 (15)
Assuming the previous analysis holds, one should expect that the greater the similarity between the predicted D_{i} values of both approaches, the higher their reliability, and vice versa. Clearly, the results will depend on the goodness of fit and prediction of the set of PMs involved. In addition, the degree of uncertainty of PMs with different sets of MDs will differ.
A framework is therefore required that allows the fusion of results from different approaches in order to assess the reliability of predictions with different degrees of uncertainty. In the present work, we selected Dempster–Shafer Theory (DST) (49–51), also known as belief theory, to achieve that goal. DST is a mathematical theory of evidence that has been developed to combine separate pieces of information that can arise from different sources (52). It is based on two ideas: the idea of obtaining degrees of belief for one question from subjective probabilities for a related question, and Dempster’s rule for combining such degrees of belief when they are based on independent items of evidence (52).
The foundations of DST can be traced to the work of George Hooper, who published an article in the Philosophical Transactions of the Royal Society entitled ‘A calculation of the credibility of human testimony’ (50). In this article, Hooper formulated two rules relating the credibility of reports to the credibility of the reporters who make them (51).
These two rules are quite simple. The rule for successive testimony says that if a report has been relayed to us through a chain of n reporters, each having a degree of credibility p, then the credibility of the report is p^{n}. The rule for concurrent testimony says that if a report is concurrently attested to by n reporters, each with credibility p, then the credibility of the report is 1−(1−p)^{n}; where 0 ≤ p ≤ 1. Thus, the credibility of a report is weakened by transmission through a chain of reporters but strengthened by the concurrence of reporters (50,51).
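The two rules just stated can be written down directly. The following is a minimal sketch for n reporters sharing the same credibility p, exactly as described in the text; the function names are illustrative.

```python
def successive_credibility(p: float, n: int) -> float:
    """Credibility of a report relayed through a chain of n reporters: p**n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("credibility p must lie in [0, 1]")
    return p ** n


def concurrent_credibility(p: float, n: int) -> float:
    """Credibility of a report attested concurrently by n reporters: 1 - (1 - p)**n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("credibility p must lie in [0, 1]")
    return 1.0 - (1.0 - p) ** n


# Three reporters, each with credibility 0.8.
chain = successive_credibility(0.8, 3)
concurrent = concurrent_credibility(0.8, 3)
print(f"successive: {chain:.3f}, concurrent: {concurrent:.3f}")
```

For p = 0.8 and n = 3, the chained report is less credible than any single reporter, whereas the concurrently attested report is more credible than any of them.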
If we draw a simple analogy between this situation and the one described above, in which two parallel overall desirability PMs are approached inversely, it becomes apparent that DST, and specifically Hooper’s rule for combining concurrent evidence (50,51), is fully applicable to our problem. It is only necessary to replace ‘report’ with ‘prediction’ and ‘reporter’ with ‘PM’, and the previous paragraph will describe our problem almost literally.
Developing a probability assignment is the basic function in DST and is an expression of the level of confidence that can be ascribed to a particular measurement. However, in this work, we are interested in the desirability of a compound. Consequently, rather than a probability assignment for each compound, we use the desirability values coming from both overall desirability PM approaches (D_{1} and D_{2}) to derive the final joint belief values (B_{D}):
 (16)
While desirability is not itself a probability, its values, like probabilities, range from 0 to 1. Therefore, they can be used to derive the values of B_{D} for each compound. In this way, it is possible to encode the reliability of the predicted desirability of a compound across two inverse but complementary prediction approaches. Given this information, B_{D} can be used as ranking criterion in a virtual screening scheme, which makes it particularly useful for LBVS.
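A ranking based on B_{D} might look like the sketch below. It assumes that the joint belief takes the form of Hooper's concurrent-testimony rule generalized to two unequal credibilities, B_{D} = 1 − (1 − D_{1})(1 − D_{2}); this form is an assumption inferred from the surrounding discussion, not a transcription of eqn (16). Clipping predictions to [0, 1] is likewise an assumption, and the candidate names and values are invented for illustration.

```python
def joint_belief(d1: float, d2: float) -> float:
    """Assumed joint belief of two concurrent desirability predictions."""
    # Clip to [0, 1]; predicted desirabilities can fall outside this range (assumption).
    d1 = min(max(d1, 0.0), 1.0)
    d2 = min(max(d2, 0.0), 1.0)
    return 1.0 - (1.0 - d1) * (1.0 - d2)


# Hypothetical predictions (D_1 from approach A_1, D_2 from approach A_2).
predictions = {
    "cand_a": (0.80, 0.70),
    "cand_b": (0.60, 0.65),
    "cand_c": (0.30, 0.40),
}

beliefs = {name: joint_belief(d1, d2) for name, (d1, d2) in predictions.items()}
ranking = sorted(beliefs, key=beliefs.get, reverse=True)
for name in ranking:
    print(f"{name}: B_D = {beliefs[name]:.2f}")
```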
An LBVS strategy based on B_{D} can be described in the sequence of steps detailed below:
The resulting ranking should render an ordered list in which the most reliable compounds with the highest desirability values are ranked at the top, that is, the compounds with the highest chance of exhibiting a desirable combination of the k properties modeled.
Subsequently, the B_{D}-based virtual screening (VS) strategy described earlier was applied to the training set to test the performance of B_{D} as ranking criterion. Considering the structural similarity between the combinatorial library assembled and our training set, it is possible to use the latter to infer the reliability of the ranking attained for the combinatorial library. The predicted values of D_{KiA3REA3} (according to approach A_{1}) were also tested as ranking criterion, to compare a VS strategy based on predictions coming from a single approach with a VS strategy based on the combination of concurrent predictions. The quality of the respective rankings was compared according to Ψ*, as described earlier.
Based on the analysis of our training set, the quality of the ranking attained using the predicted values of D_{KiA3REA3} is around 80% (R_{%} = 80.08%; Ψ* = 0.1992), which suggests an acceptable degree of confidence if the scheme is applied to our combinatorial library. As can be noted in Figure 4, the use of B_{D} as ranking criterion (R_{%} = 82.81%; Ψ* = 0.1719) slightly outperforms the predicted values of D_{KiA3REA3}. Considering that B_{D} encodes, in addition to the desirability of the compound, the reliability of such a prediction, it is well suited to screening larger and/or structurally more diverse libraries with a wider range of the mapped properties.
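The two pairs of values reported above are numerically consistent with R_{%} = 100(1 − Ψ*). The short check below assumes that relation; it is inferred from the reported numbers and should be verified against the definition of Ψ* given earlier in the work.

```python
def ranking_quality_percent(psi_star: float) -> float:
    """Ranking quality R% under the assumed relation R% = 100 * (1 - psi*)."""
    return 100.0 * (1.0 - psi_star)


r_single = ranking_quality_percent(0.1992)  # predicted D_{KiA3REA3} as criterion
r_belief = ranking_quality_percent(0.1719)  # B_D as criterion
print(f"{r_single:.2f}% vs {r_belief:.2f}%")
```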