• A3 adenosine receptor agonists;
  • belief theory;
  • Chemoinformatics;
  • desirability theory;
  • drug discovery;
  • ligand-based virtual screening;
  • simultaneous analysis



Desirability theory (DT) is a well-known multi-criteria decision-making approach. In this work, DT is employed as a prediction model (PM) interpretation tool to extract useful information on the desired trade-offs between binding and relative efficacy of N6-substituted-4′-thioadenosine A3 adenosine receptor (A3AR) agonists. At the same time, we show the usefulness of a parallel but independent approach that provides feedback on the reliability of the combination of properties predicted as a single desirability value. The application of belief theory allowed the reliability of the predicted desirability of a compound to be quantified according to two inverse and independent but complementary prediction approaches. This information proved useful as a ranking criterion in a ligand-based virtual screening study. The development of a linear PM of the overall desirability of A3AR agonists provides significant clues based on simple molecular descriptors. The model suggests a relevant role for the type of substituent at the N6 position of the adenine ring, which in general contributes to reducing the flexibility and hydrophobicity of the lead compound. Mapping the desirability function derived from the PM offers specific information such as the shape and optimal size of the N6 substituent. The model developed herein allows a simultaneous analysis of both the binding and relative efficacy profiles of A3AR agonists. The information retrieved guides the theoretical design and assembly of a combinatorial library suitable for filtering new N6-substituted-4′-thioadenosine A3AR agonist candidates with simultaneously improved binding and relative efficacy profiles. The utility of the proposed desirability/belief-based virtual screening strategy was deduced from our training set. Based on the overall results, the combined use of desirability and belief theories in computational medicinal chemistry research can aid the discovery of A3AR agonist candidates with a favorable balance between binding and relative efficacy profiles.

Adenosine receptors (ARs) are G-protein-coupled receptors, consisting of A1, A2A, A2B, and A3 subtypes, that are activated by the endogenous agonist adenosine and blocked by natural antagonists, such as caffeine and theophylline (1). The A1 and A3 subtypes are coupled to Gi/o proteins, while the A2A and A2B subtypes are Gs protein-coupled.

There is growing evidence that ARs could be promising therapeutic targets in a wide range of pathologies (1–6). In particular, A3AR agonists have been shown to be useful in preventing ischemic damage in the brain and heart and as anti-inflammatory, anticancer, and myeloprotective agents (7–11).

Although ARs are becoming important targets in drug design and development, several problems complicate the development of new AR agonists. Kim and Jacobson (12) point out several reasons for the bottleneck in this area:

  • (a)
     The ubiquitous expression of ARs in the body would result in diverse side-effects.
  • (b)
     The low density of a given receptor subtype in a targeted tissue may reduce its desired effect in the treatment of certain diseases (8).
  • (c)
     In many cases, nucleoside derivatives have lowered maximal efficacy at the A3AR and, consequently, behave as a partial agonist or antagonist.
  • (d)
     A major bottleneck for structure-based drug design of AR agonists or antagonists is the lack of three-dimensional (3D) structural information about G-protein-coupled receptors from standard structure determination techniques (X-ray and nuclear magnetic resonance studies), because of the difficulties in receptor purification and the insolubility of the receptors in environments lacking phospholipids.

The problem of side-effects raised in (a) clearly calls for selective and specific agonists. The simultaneous study of agonist efficacy, binding affinity for the target AR, and binding affinity for the remaining subtypes could offer practical clues in this regard and motivate future research in this area. In the present work, the last three problems [(b), (c), and (d)] are tackled.

From (b) and (c), it is clear that both binding affinity and agonist efficacy should be studied simultaneously to develop selective A3AR agonists. Moreover, studying the combination of both properties could be very informative and useful. However, (d) implies that a structure-based approach is hardly feasible. Therefore, in cases where the receptor structure is unknown, a ligand-based approach, based only on an extensive study of structure-activity relationships (SAR), could be an informative alternative. In particular, the quantitative structure-activity relationship (QSAR) paradigm has long been of interest in the drug-design process (13, 14). Recently, an excellent review on QSAR tools to find new A3AR agonists using 2D and 3D molecular descriptors (MDs) has been published (15).

When a medicinal chemist uses QSAR prediction models (PMs) to aid the search for new drug candidates, the goal is a PM that is both interpretable and predictive. In practice, however, the ‘dominant Boolean operator’ in this situation is rarely ‘AND’; more often it turns out to be ‘OR’. The interpretability of a PM is therefore traded off against its predictive accuracy. For example, linear regression models can be interpreted in a detailed fashion but generally have lower accuracy, especially for biological activities. On the other hand, one can achieve high accuracy using a neural network model, but extracting the encoded SAR can be very difficult. In the same way, MDs with a direct physicochemical or structural meaning, such as physicochemical properties or constitutional descriptors, can be easily translated into structural modifications that enhance the biological profile of a molecule, whereas highly informative MDs such as the 3D ones tend to be more abstract and do not readily reveal the substructures that are important for activity (16).

Thus, a PM provides the researcher with two things: a set of predicted values, and information regarding the SAR(s) present in the dataset. Unfortunately, these two aspects are not usually obtained together. Consequently, it is necessary to establish priorities in an investigation by weighing the importance of predictivity against interpretability, prioritizing whichever is decisive for the problem, and selecting the MDs and the modeling strategy accordingly.

At the same time, improving the profile of a molecule for the drug discovery process requires the simultaneous optimization of numerous, often competing objectives. Classic QSAR approaches usually ignore the multi-objective nature of the problem, focusing on the evaluation of each single property as it becomes available during the drug discovery process (17). An approach offering a simultaneous study of the several biological properties that determine a specific therapeutic activity is therefore a very attractive option in computational medicinal chemistry. In this sense, desirability functions (DF) are well-known multi-criteria decision-making methods (18,19). This approach has been extensively employed in several fields (20–31). However, despite fitting the drug development problem perfectly, reports of computational medicinal chemistry applications are at present very scarce (32,33).

Recently, a three-dimensional QSAR study (3D-QSAR) on the A3AR agonists binding affinity and relative efficacy profiles including oxo- and thioadenosine analogs exposed the outlier nature of thioadenosine derivatives (12). In a training set of 91 compounds, five of eight outliers were 4′-thioadenosine analogs, indicating the possibility of a subtle difference in the binding mode and activation mechanisms of 4′-thioadenosine analogs in comparison with the oxo analogs. The nature of the substituents on the N6 position of the adenine ring was found to play a significant role in the binding affinity and relative efficacy of the compounds. These interesting findings make N6-substituted-4′-thioadenosine analogs an attractive goal in A3AR agonists research.

Considering the medicinal and computational chemistry problems described above, we propose in this work the use of desirability theory as a tool to extract useful information on the desired trade-offs between binding and relative efficacy of N6-substituted-4′-thioadenosine A3AR agonists. Additionally, desirability and belief theories are combined into a ligand-based virtual screening (LBVS) protocol that fuses the results from independent approaches to assess the reliability of concurrent predictions.

Materials and Methods


Data set and computational methods

The multiple linear regression (MLR) PMs developed were based on the binding affinities (KiA3) and relative maximal efficacy (REA3) in the activation of the A3AR reported by Jeong et al. (34) for a library of thirty-two N6-substituted-4′-thioadenosines A3AR agonists. The chemical structures and property values are depicted in the Supporting Information related to this work.

The structures of all compounds were first drawn with the aid of ChemDraw Ultra 9.0ᵃ, and reasonable starting geometries were obtained with the MM2 molecular mechanics force field (35,36). Molecular structures were then fully optimized with the PM3 semi-empirical Hamiltonian (37), as implemented in the MOPAC 6.0 program (38). Subsequently, the optimized structures were imported into the DRAGON software packageᵇ to compute a total of 1664 MDs. Descriptors having constant or near-constant values were excluded. Thus, from the initial set, 1320 MDs remained for further variable selection and construction of the PMs (focused on predictivity) involved in the LBVS approach.

On the other hand, for the overall desirability PM (focused on interpretability) involved in the desirability-based interpretation approach, only 351 MDs were computed (48 constitutional descriptors, 154 functional group counts, 120 atom-centered fragments, and 29 molecular properties). These four families of MDs were chosen because their simple nature allows an easy structural or physicochemical interpretation of the resulting PM. To reduce noisy information that could lead to chance correlations, descriptors having constant or near-constant values, as well as highly pair-correlated descriptors (|R| > 0.9), were excluded. Consequently, from the initial set, only 32 MDs remained for further variable selection. The set of four variables finally included in the model is listed in Table 1.

Table 1.   Molecular descriptors (MDs) included in the overall desirability prediction model, identified through the Genetic Algorithm selection process

Symbol | Definition | Family
ARR | Aromatic ratio | Constitutional descriptors
nCIR | Number of circuits | Constitutional descriptors
nCs | Number of total secondary sp3 carbon atoms | Functional groups count
ALOGP2 | Squared Ghose-Crippen octanol–water partition coefficient (logP^2) | Molecular properties
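
The descriptor pre-filtering described above (removing constant or near-constant descriptors and highly pair-correlated ones, |R| > 0.9) can be sketched as follows. This is only an illustrative sketch: the near-constant threshold `min_sd`, the greedy rule that keeps the first descriptor of a correlated pair, the function names, and the toy values are all assumptions, not the procedure actually used by the descriptor software.

```python
from statistics import pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def filter_descriptors(table, r_max=0.9, min_sd=1e-6):
    """Return descriptor names surviving both filters.

    table maps a descriptor name to its values, one per compound.
    min_sd (assumed threshold) defines "near constant".
    """
    # 1) Drop constant or near-constant descriptors.
    varying = {name: vals for name, vals in table.items() if pstdev(vals) > min_sd}
    # 2) Greedily keep a descriptor only if |R| <= r_max with all kept so far
    #    (assumed tie-breaking rule: the first descriptor of a pair wins).
    selected = []
    for name, vals in varying.items():
        if all(abs(pearson(vals, varying[s])) <= r_max for s in selected):
            selected.append(name)
    return selected


# Hypothetical toy values for five compounds (not taken from the data set).
toy = {
    "ARR": [0.40, 0.45, 0.50, 0.38, 0.42],
    "nCIR": [5, 6, 6, 5, 7],
    "nCIR_x2": [10, 12, 12, 10, 14],  # perfectly correlated with nCIR
    "const": [1, 1, 1, 1, 1],         # constant descriptor
}
kept = filter_descriptors(toy)
print(kept)
```

In this toy case the constant column and the duplicate of nCIR are discarded, leaving two descriptors for variable selection.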

An optimization technique, the Genetic Algorithm (GA), was applied for variable selection (39–41) using the MobyDigs 1.1 software packageᶜ. The GA selection parameters were: population size = 100, maximum number of variables allowed in the model = 7, reproduction/mutation trade-off = 0.5, and selection bias = 50%. The determination coefficient of the leave-one-out cross-validation (Q2LOO) was employed as the fitness function.

The predictive ability of the PMs was evaluated by means of internal cross-validation (CV). Specifically, the leave-one-out (LOO) technique (42) is already implicit in the GA feature selection process and is characterized by the Q2LOO and sLOO statistics in eqns 12–14. Additionally, to ensure the predictive ability, the resulting PM was subjected to a bootstrap validation procedure (43) with 8000 resubstitutions (characterized by the Q2Boost and sBoost statistics in eqns 12–14). A Y-scrambling procedure (44), based on 500 random permutations of the Y-response vector and implemented in MobyDigsᶜ, was also applied to check whether the correlations established by the respective PMs were because of chance. See the a(R2) and a(Q2) statistics in eqns 12–14, where unstable models resulting from chance correlations are characterized by high values and vice versa. In this way, the quality and predictive ability of the PMs can be assessed.
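
To make the leave-one-out statistic concrete, the sketch below computes Q2LOO for an MLR model as 1 − PRESS/TSS, a commonly used definition that is assumed here rather than quoted from the software documentation. The function names, the normal-equations solver, and the toy data are hypothetical and serve only as an illustration.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve(ata, aty)


def predict(coef, row):
    """Predicted response for one row of descriptor values."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def q2_loo(X, y):
    """Leave-one-out Q2 = 1 - PRESS / TSS (assumed definition)."""
    press = 0.0
    for i in range(len(y)):
        # Refit the model without compound i, then predict compound i.
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    y_mean = sum(y) / len(y)
    tss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / tss


# Hypothetical one-descriptor toy data (not from the study).
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
print(round(q2_loo(X, y), 3))
```

A Y-scrambling check would repeat the same calculation after randomly permuting `y`; under the interpretation given above, a sound model should then yield much lower values.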

We have also checked the validity of the preadopted parametric assumptions, another important aspect in the application of linear multivariate statistical-based approaches. These include the linearity of the modeled property, normal distribution of residuals as well as the homoscedasticity and non-multicollinearity of the independent variables included in the MLR model (45,46).

Finally, the applicability domain of the final PMs was identified by a leverage plot, that is to say, a plot of the standardized residuals vs leverages for each training compound (42,47).

Scaling properties with desirability functions

The properties Yi were scaled to their respective desirability (di) values by means of the Derringer DF (19). Desirability functions are well-known multi-criteria decision-making methods, based on the definition of a DF for each property to transform their values to the same scale. Each attribute (KiA3 and REA3) is independently transformed into a desirability value (d(KiA3) and d(REA3)) by an arbitrary function. The original value is range scaled between 0 and 1 by:

  • $$d_i = \frac{Y_i - L_i}{U_i - L_i} \qquad (1)$$

where Li and Ui are the selected minimum and maximum values, respectively.

In this work, two specific DF (one for each property) were used.

If a property is to be maximized, its individual DF is defined as:

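  • $$d_i = \begin{cases} 0 & \hat{Y}_i \le L_i \\ \left(\dfrac{\hat{Y}_i - L_i}{T_i - L_i}\right)^{s} & L_i < \hat{Y}_i < T_i \\ 1 & \hat{Y}_i \ge T_i \end{cases} \qquad (2)$$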

In this case, Ti is interpreted as a large enough value for the property, which can be Ui.

On the other hand, if one wants to minimize a property, one might use:

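  • $$d_i = \begin{cases} 1 & \hat{Y}_i \le T_i \\ \left(\dfrac{U_i - \hat{Y}_i}{U_i - T_i}\right)^{t} & T_i < \hat{Y}_i < U_i \\ 0 & \hat{Y}_i \ge U_i \end{cases} \qquad (3)$$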

Here, Ti denotes a small enough value for the property, which can be Li.

Specifically, REA3 ought to be maximized (eqn 2), so that the compound with the highest value is the most desirable (di = 1) and the compound with the lowest value is the most undesirable (di = 0). Li was set to 0%, and the upper value Ui, made equal to the target value Ti, was set to 114%. In contrast, to maximize the binding affinity for the human A3AR, the KiA3 values must be minimized (eqn 3), with Li = Ti = 0.8 nM and Ui = 1650 nM, coinciding with the lowest and highest values of KiA3 in the data set, respectively.

For completeness, if a response is of the ‘target is best’ kind, its individual DF is defined as:

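  • $$d_i = \begin{cases} 0 & \hat{Y}_i < L_i \\ \left(\dfrac{\hat{Y}_i - L_i}{T_i - L_i}\right)^{s} & L_i \le \hat{Y}_i \le T_i \\ \left(\dfrac{U_i - \hat{Y}_i}{U_i - T_i}\right)^{t} & T_i < \hat{Y}_i \le U_i \\ 0 & \hat{Y}_i > U_i \end{cases} \qquad (4)$$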

The exponents s and t in eqns (2–4) determine how important it is to hit the target value Ti. For s = t = 1, the DF increases linearly towards Ti. Large values of s and t should be selected if it is very desirable that the value of Ŷi be close to Ti or increase rapidly above Li. On the other hand, small values of s and t should be chosen if almost any value of Ŷi above Li and below Ui is acceptable, or if having values of Ŷi considerably above Li is not of critical importance (19).

The individual desirabilities are then combined using the geometric mean, which gives the overall desirability Di:

  • $$D_i = \left(d_1 \times d_2 \times \cdots \times d_k\right)^{1/k} \qquad (5)$$

with k denoting the number of properties.

This single value of Di gives the overall assessment of the desirability of the combined property levels. Clearly, Di falls in the interval [0, 1] and increases as the balance of the properties becomes more favorable.
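
As an illustration of the scaling just described, the following sketch implements one-sided Derringer-type desirability functions and their geometric mean, using the limits stated in the text (REA3: Li = 0%, Ti = Ui = 114%; KiA3: Li = Ti = 0.8 nM, Ui = 1650 nM). The exponents are set to 1 as an assumption, because their values are not stated here; the function names and the example inputs are hypothetical.

```python
def d_maximize(y, low, target, s=1.0):
    """One-sided desirability for a property to be maximized."""
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return ((y - low) / (target - low)) ** s


def d_minimize(y, target, high, t=1.0):
    """One-sided desirability for a property to be minimized."""
    if y <= target:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - target)) ** t


def overall_desirability(ds):
    """Geometric mean of the individual desirabilities."""
    prod = 1.0
    for d in ds:
        prod *= d
    return prod ** (1.0 / len(ds))


def d_ki_re(ki_nM, re_percent):
    """Overall desirability of a compound from KiA3 (nM) and REA3 (%).

    Limits are those given in the text; exponents of 1 are an assumption.
    """
    d_ki = d_minimize(ki_nM, target=0.8, high=1650.0)
    d_re = d_maximize(re_percent, low=0.0, target=114.0)
    return overall_desirability([d_ki, d_re])


# Hypothetical compound: Ki = 10 nM, relative efficacy = 90 %.
print(round(d_ki_re(10.0, 90.0), 3))
```

Because the geometric mean is used, a compound that is completely undesirable in one property (di = 0) receives an overall desirability of 0 regardless of the other property.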

Ranking quality

To measure the quality of the ranking obtained we employ a quantitative measure also based on the application of DF.

We will use a simple notation to represent ordering throughout this article. Without loss of generality, for n cases to be ordered, we use the actual ordering position of each case as the label to represent this case in the ordered list. We assume the examples are ordered incrementally from left to right. Then, the true-order list is OT = 1(lowest), 2, 3, …, n (highest). For any ordered list generated by a ranking algorithm, it is a permutation of OT. We use OR to denote the ordered list generated by the ranking algorithm R. OR can be written as a1, a2, …, an, where ai is the actual ordering position of the case that is ranked ith in OR (see Table 2).

Table 2.   An example of ordered lists

The ranking validation includes the following steps:

  • 1
     Order the cases in the library according to Di in a decreasing fashion and label each case as described earlier (1, 2, 3, …, n). This ordering corresponds to the true-order list (OT).
  • 2
     Invert OT. This new ordering corresponds to the worst-order list (OW).
  • 3
     Order incrementally the cases in the library according to Δi (starting with the case exhibiting the lowest value of Δi) and label each case as described earlier (a1, a2, …, an). This ordering corresponds to the order generated by the ranking algorithm R (OR).
  • 4
     Normalize [through eqn (3)] the values (labels) assigned to each case in steps 1 to 3, where Li = Ti = 1 and Ui = number of cases included in the library (n). In this way, we obtain the respective normalized order values for the true (OTdi) and worst (OWdi)-order lists as well as for the order generated by the ranking algorithm R (ORdi).
  • 5
     Use the respective normalized order values to determine the difference between OR and OT (OT−ORδi):
    • image(6)
    and between OW and OT (OT−OWδi):
    • image(7)
    The ideal difference is 0 for all the cases and corresponds to a perfect ranking.
  • 6
     Estimate the quality of the order generated by the ranking algorithm R (OR) by means of the ranking quality index (Ψ), which can be defined as the absolute value of the mean of OT−ORδi, for the n cases included in the library to be ranked:
    • image(8)
  • 7
    Ψ lies in the range [0, 0.5], with Ψ = 0 for a perfect ranking and Ψ ≅ 0.5 for the worst ranking. Thus, the closer Ψ is to 0, the higher the quality of the ranking, whereas values of Ψ near 0.5 indicate low ranking quality. Because the value of Ψ associated with the worst ranking depends on the size of the library to be ranked, this value is not exactly, but only approximately, equal to 0.5. At the same time, a range of [0, 1] rather than [0, 0.5] is a clearer indicator of the quality of a ranking. For these reasons, a correction factor (F) is applied to Ψ:
    • image(9)
    where ΨOW is the quality index for the worst ranking. F is used here to obtain a more representative indicator of the quality of a ranking and, at the same time, to rescale Ψ to the range [0, 1], so that the corrected value for the worst ranking is exactly equal to 1. In this way, we obtain the corrected ranking quality index (Ψ*):
    • image(10)
  • 8
    Finally, it is possible to express Ψ* as the percentage of ranking quality (R%):
    • image(11)
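
The ranking validation steps above can be sketched in code. Because the equation images are not available here, the formulas in this sketch are assumptions inferred from the surrounding text: labels are normalized with eqn (3) using Li = Ti = 1 and Ui = n, the per-case difference is taken as an absolute difference of normalized labels, Ψ is their mean, the correction is Ψ* = Ψ/ΨOW, and R% = 100·(1 − Ψ*). All function names are hypothetical.

```python
def normalized_label(label, n):
    """Normalize an order label with a minimizing desirability (target 1, upper limit n)."""
    return (n - label) / (n - 1)


def psi(order):
    """Mean absolute difference between normalized true and generated orders.

    order[i-1] is the true position of the case ranked i-th by the algorithm,
    so the true-order list is simply 1, 2, ..., n.
    """
    n = len(order)
    diffs = [
        abs(normalized_label(i, n) - normalized_label(a, n))
        for i, a in enumerate(order, start=1)
    ]
    return sum(diffs) / n


def ranking_quality_percent(order):
    """Percentage of ranking quality, R% = 100 * (1 - psi / psi_worst) (assumed form)."""
    n = len(order)
    worst = list(range(n, 0, -1))  # inverted true order
    psi_star = psi(order) / psi(worst)
    return 100.0 * (1.0 - psi_star)


# Hypothetical ranking of five cases in which the two best cases are swapped.
print(round(ranking_quality_percent([2, 1, 3, 4, 5]), 1))
```

Under these assumptions a perfect ranking gives R% = 100 and the inverted ranking gives R% = 0, which matches the behavior described for Ψ* in the steps above.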

Results and Discussion


Prediction models

Once both the KiA3 and REA3 responses had been desirability scaled for each compound, the corresponding overall desirability (DKiA3-REA3) values were derived. To identify the factors governing the trade-offs between binding affinity and efficacy of this family of A3AR agonists, the combined response DKiA3-REA3 was mapped as a function of four simple 1D MDs with a direct structural and/or physicochemical explanation. The resulting best-fit model together with the statistical regression parameters is given below:

  • image(12)

N = 32, R2 = 0.781, R2Adj = 0.749, F = 24.13, s = 0.127

Q2LOO = 0.566, sLOO = 0.138, Q2Boost = 0.539, sBoost = 0.179, a(R2) = 0.0063, a(Q2) = −0.0039
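
As a quick consistency check on the statistics reported for eqn (12), the adjusted R2 and the F statistic can be recomputed from R2, taking N = 32 training compounds and p = 4 descriptors (Table 1). The formulas below are the standard MLR definitions, assumed here rather than quoted from the source.

```latex
R^2_{Adj} = 1 - (1 - R^2)\,\frac{N - 1}{N - p - 1}
          = 1 - (1 - 0.781)\,\frac{31}{27} \approx 0.749

F = \frac{R^2 / p}{(1 - R^2)/(N - p - 1)}
  = \frac{0.781 / 4}{0.219 / 27} \approx 24.1
```

The adjusted value reproduces the reported 0.749, and the small gap between 24.1 and the reported F = 24.13 is consistent with R2 having been rounded to three decimals.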

The statistical significance and predictive ability of the model support its suitability for subsequent analyses.

No violations of the preadopted parametric assumptions were found for eqn (12).

At the same time, two QSAR PMs (for KiA3 and REA3) focused on predictive ability (hereafter prediction approach A2) were derived. They are used in combination with the overall desirability PM described above [eqn (12), hereafter prediction approach A1] in an LBVS strategy that combines their concurrent predictions through belief theory.

The resulting best-fit models together with the statistical regression parameters are given in eqns (13 and 14):

  • image(13)

N = 32, R2 = 0.985, R2Adj = 0.981, F = 230.82, s = 48.796

Q2LOO = 0.977, sLOO = 56.345, Q2Boost = 0.957, sBoost = 61.246, a(R2) = 0.0017, a(Q2) = −0.0052

  • image(14)

N = 32, R2 = 0.966, R2Adj = 0.956, F = 96.79, s = 5.515

Q2LOO = 0.942, sLOO = 6.369, Q2Boost = 0.921, sBoost = 7.182, a(R2) = 0.0017, a(Q2) = −0.0055

According to their statistics, the models are good in terms of statistical significance and predictive ability. In contrast to eqn (12), eqns (13 and 14) were derived from a pool of variables considerably larger than the number of cases used for training. Consequently, the risk of finding chance correlations in such a vast variable space is always high, and checking for them is of vital importance in this case. As can be deduced from the very low values of a(R2) and a(Q2) obtained in the respective Y-scrambling experiments, there is no reason to ascribe to chance correlations the statistical significance and predictive ability exhibited by each PM.

With the exception of the non-multicollinearity of the independent variables included in the MLR model developed for REA3, no violations of the MLR parametric assumptions were found (48). Multicollinearity affects the usual interpretation of a regression equation; however, the predictive ability of the PM is not affected in this situation (46).

See Supporting Information for details of the inspection of the parametrical assumptions as well as the establishment of the applicability domain of eqns (12–14).

Consequently, according to the statistical parameters exhibited, the goodness of fit of the PMs involved in both prediction approaches A1 and A2 can be considered statistically significant. At the same time, considering their satisfactory predictive ability and the validity of the preadopted parametric assumptions, the resulting predictions can be regarded as reliable in the domain of the N6-substituted-4′-thioadenosine A3AR agonists used for training and structurally coded as a linear function of the respective subsets of MDs. Therefore, all the PMs developed can be employed in an LBVS scheme with an adequate degree of reliability.

Desirability-based prediction model interpretation and theoretical design of N6-substituted-4′-thioadenosine A3AR agonist candidates

Based on the satisfactory accuracy, statistical significance, and predictive ability of the overall desirability PM [eqn (12)], we can proceed, with an adequate level of confidence, to the simultaneous analysis of the factors governing the balance between the binding affinity and relative efficacy profiles of A3AR agonists.

Although the main variation in the subset of compounds employed is at the N6 position of the adenine ring, the MDs employed in mapping DKiA3-REA3 are global and not fragment based. Any inference made must therefore be based only on the influence of the N6 substituents on the global molecular system.

First, the information encoded in the MDs included in the model was analyzed. According to the model regression parameters, the most influential MD is the aromatic ratio (ARR), followed by the squared Ghose-Crippen octanol–water partition coefficient (ALOGP2), the number of circuits (nCIR), and the number of total secondary sp3 carbon atoms (nCs). All MDs except nCIR were inversely related to the overall desirability DKiA3-REA3 of N6-substituted-4′-thioadenosine A3AR agonists.

Specifically, ARR is the fraction of aromatic atoms in the hydrogen suppressed molecule graph and encodes the degree of aromaticity of the molecule. According to the model parameters, N6 substitutions increasing the aromaticity of the molecule do not favor DKiA3-REA3.

ALOGP2 is simply the square of the Ghose-Crippen octanol–water coefficient (ALOGP), which is a group contribution model for the octanol–water partition coefficient. Because these MDs encode the hydrophobic/hydrophilic character of the molecule, DKiA3-REA3 could be favored by the presence of N6 substituents contributing to reduce the hydrophobicity of the molecule.

The nCIR is a complexity descriptor related to molecular flexibility. Because nCIR serves as a measure of rigidity, with higher numbers of circuits corresponding to reduced flexibility, cyclic and rigid or conformationally restricted N6 substituents could increase the overall desirability of the molecular system.

Finally, the presence of secondary sp3 carbon atoms in the molecule appears to be detrimental for DKiA3-REA3.

According to the model, a molecule with a low degree of aromaticity, without secondary sp3 carbon atoms, and containing cyclic and rigid N6 substituents that contribute to reducing the hydrophobicity of the system could favor the balance of the binding affinity and relative efficacy profiles of N6-substituted-4′-thioadenosine A3AR agonists.

Note that these conclusions, although derived from a simple 1D model, are very similar to those obtained by 3D-CoMFA/CoMSIA approaches (12). Kim and Jacobson concluded that a bulky, conformationally restricted group at the N6 position of the adenine ring will increase the A3AR binding affinity, and that a small bulky group at this position might be crucial for A3AR activation. Note the agreement between the previous and the present work: Kim and Jacobson suggest a ‘conformationally restricted bulky group’, and the present work suggests ‘cyclic and rigid substituents’, at the N6 position.

Note that although nCIR is not the MD most significantly related to DKiA3-REA3, it is very informative for the property. From nCIR, we can infer that the bulkiness of the N6 substituent suggested in (12) can be characterized by a cyclic rather than an alkyl substituent.

Although useful, this information is incomplete, because it is well known that steric factors are decisive in the design of A3AR agonists, especially for binding affinity (12). Consequently, it is important to determine the optimal size of the conformationally restricted cyclic N6 substituent. Unfortunately, simple inspection of the regression parameters of the PM does not provide this information. Therefore, a property/desirability profiling was carried out to identify the levels of the MDs included in the PM that simultaneously generate the most desirable combination of binding affinity and relative efficacy.

As the main goal of this analysis is to extract information on the factors governing DKiA3-REA3 rather than to optimize it, the behavior of DKiA3-REA3 was profiled at the mean values of the four MDs rather than by searching for their optimal values (see first row in Figure 1). Accordingly, it was possible to find the levels of the MDs simultaneously producing the best possible DKiA3-REA3 in the training set employed. As can be noted in Figure 1 (second row), an A3AR agonist candidate should exhibit a value of DKiA3-REA3 near 0.9 at levels of ARR, nCs, ALOGP2, and nCIR around 0.4, 2, 0, and 6, respectively.


Figure 1.  Property/desirability profiling of the levels of the molecular descriptors that simultaneously produce the most desirable combination of binding affinity and relative efficacy of N6-substituted-4′-thioadenosine A3AR agonists.


The analysis reveals that, for the most favorable balance of binding affinity and agonist efficacy, ARR should be not just low but near 0.4; ALOGP2 should be as low as possible; the number of secondary sp3 carbon atoms should be kept around two; and nCIR should be not just high but close to six.

Because the thioadenosine nucleus already contains three secondary sp3 carbon atoms, the minimum number of such atoms, at least within the applicability domain of the present model, is three. This type of carbon must therefore be excluded from the substituents located at the N6 position.

At the same time, considering that the nCIR value of the thioadenosine nucleus is four, one can deduce that the ideal nCIR value of the N6 substituent should be two. Structurally, this translates into bicyclic N6 substituents.

The inclusion in the PM of nCIR, instead of the number of rings in the chemical graph (nCIC) is also significant. Although the structural information of this pair of MDs is very similar (the number of cyclic structures in a chemical graph) their graph-theoretical information is quite different. While nCIC encodes the number of rings, nCIR includes both rings and circuits (a circuit is a larger loop around two or more rings). As an example, naphthalene contains 3 circuits and 2 rings. This is illustrated in Figure 2.


Figure 2.  Graphical illustration of the definition of nCIC and nCIR for two chemical graphs.


So additional information can be inferred: the bicyclic N6 substituent should not be fused. This assumption could be related to the binding interaction of this type of fragments with the A3AR. In fact, the presence of a certain degree of rotational freedom between the two rings of the fragment could favor its docking into the receptor cavity.
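
The distinction between rings and circuits can be checked on small hydrogen-suppressed graphs. The sketch below is not the descriptor software's implementation: `count_rings` uses the cyclomatic number (bonds − atoms + 1 for a connected graph) as a stand-in for nCIC, and `count_circuits` enumerates all simple cycles as a stand-in for nCIR. The atom numbering of the two example graphs is arbitrary.

```python
def count_rings(n_atoms, bonds):
    """Cyclomatic number of a connected graph: bonds - atoms + 1 (nCIC analogue)."""
    return len(bonds) - n_atoms + 1


def count_circuits(n_atoms, bonds):
    """Number of simple cycles in an undirected graph (nCIR analogue)."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    found = 0

    def walk(start, current, visited):
        nonlocal found
        for nxt in adj[current]:
            if nxt == start and len(visited) >= 3:
                found += 1  # closed a cycle whose smallest atom index is start
            elif nxt > start and nxt not in visited:
                walk(start, nxt, visited | {nxt})

    for s in range(n_atoms):
        walk(s, s, {s})
    return found // 2  # each cycle is traversed once in each direction


# Naphthalene: two six-membered rings fused through the bond between atoms 4 and 5.
naphthalene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
               (4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]
# Biphenyl: two six-membered rings linked by a single bond between atoms 0 and 6.
biphenyl = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
            (6, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 6), (0, 6)]

print(count_rings(10, naphthalene), count_circuits(10, naphthalene))
print(count_rings(12, biphenyl), count_circuits(12, biphenyl))
```

Both graphs contain two rings, but the fused system has an extra circuit around its perimeter. A substituent that adds exactly two circuits to the nucleus is therefore compatible with two rings joined by a single bond, but not with a fused bicycle.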

This result matches previous experimental findings on the SAR of this family of thioadenosine derivatives (34). The SAR obtained for this family suggests that compounds with bulky N6 substituents lose their binding to the A3AR. Paradoxically, among compounds showing high binding affinity at the human A3AR, two compounds substituted with an N6-(trans-2-phenylcyclopropyl)amino group were found to be full agonists at the human A3AR. In addition, it was found that compounds with α-naphthylmethyl N6 substituents lose their binding to the A3AR (34), which reinforces the present proposal.

From that study it was also concluded that bulky N6 substituents affect only the binding affinity; however, bulky (bicyclic) substituents such as a trans-2-phenylcyclopropyl group could be beneficial for agonist efficacy without loss of binding affinity. Although that experimental study does not deal with the simultaneous analysis of both properties, its findings match our theoretical results well.

So far, we have shown the importance of bicyclic and rigid N6 substituents that contribute to reducing the hydrophobicity of the system in obtaining an adequate balance between the binding affinity and relative efficacy profiles of N6-substituted-4′-thioadenosine A3AR agonists.

At first sight, this information is quite focused, and one might expect the task of finding promising candidates to be almost complete. However, if we consider the number of attainable N6 substituents of this type generated from even a tiny portion of the possible chemical space indicated by this information, we can extrapolate the huge number of possible candidates (Table 3). This analysis was performed taking into account only unsaturated rings and the valence of the atoms. The number of options can rise or fall if we consider double bonds or chemical feasibility. In any case, although focused, the ‘haystack’ is vast, so a focused screening strategy is essential to efficiently find a ‘needle’ in it.

Table 3.   Fraction of the chemical space determined by the N6 substituents formed by the possible combinations of two non-fused rings linked by a single bond

Therefore, the previous information is employed for the theoretical design of new N6-substituted-4′-thioadenosine analogs with adequate balances between binding affinity and agonist efficacy. Because ARR and ALOGP2 cannot be easily manipulated by structural modifications, the design efforts are mainly focused on nCs and nCIR. Thus, a combinatorial library focused on the generation of N6-substituted-4′-thioadenosine candidates was assembled with nCs ≈ 3 and nCIR ≈ 6. The library was assembled with the aid of the SmiLib software (48), which allows the rapid assembly of combinatorial libraries in SMILES notation. The library was directed to produce candidates with conformationally restricted bicyclic N6 substituents while keeping the presence of secondary sp3 carbon atoms to a minimum, using the 4′-thioadenosine nucleus as scaffold and a set of 25 cyclic or heterocyclic structures as linkers and building blocks. The working combinatorial scheme is shown in Table 4.

Table 4.   Scaffolds, linkers, and building blocks employed to assemble the combinatorial library
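The assembly step can be pictured as a plain Cartesian product over the three component lists. The sketch below is not SmiLib and makes no use of its interface; it only enumerates identifiers in the scaffold.linker_building-block form used for the library IDs (for example 1.36_2). The function name and the component counts are hypothetical placeholders, not the values used in this study.

```python
from itertools import product

def enumerate_library_ids(n_scaffolds, n_linkers, n_blocks):
    """Return IDs such as '1.36_2' (Scaffold1.Linker36_Building Block2)."""
    return [
        f"{s}.{l}_{b}"
        for s, l, b in product(
            range(1, n_scaffolds + 1),
            range(1, n_linkers + 1),
            range(1, n_blocks + 1),
        )
    ]

# Placeholder counts chosen only for illustration (assumption).
ids = enumerate_library_ids(n_scaffolds=1, n_linkers=3, n_blocks=2)
print(ids)
```

The size of such a library grows as the product of the three counts, which is why even a small set of ring fragments yields thousands of candidates.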

This combinatorial strategy produced a set of more than 9000 candidates which, according to the previous results, can be employed in a subsequent virtual screening campaign using the predicted value of DKiA3-REA3 of each candidate as ranking criterion. As mentioned before, only candidates within the applicability domain of the overall desirability PM (3395 candidate molecules) should be submitted to the ranking process. Figure 3 shows the plot of the predicted DKiA3-REA3 values of the 9782 candidate molecules versus their respective leverage values. As can be noted, predictions range from −0.31 to 1.70; however, candidates within the PM applicability domain are restricted to predicted values of DKiA3-REA3 between 0.22 and 1.44. As a result, it is possible to propose for biological screening a reduced set of candidates with a promising balance between A3AR binding affinity and agonist efficacy. The values of the MDs included in the overall desirability PM, as well as the predicted value of DKiA3-REA3 for a fragment of the ranked combinatorial library, are shown in Table 5.


Figure 3.  Predicted DKiA3-REA3 values of the candidate molecules included in the combinatorial library plotted vs. their respective leverage values.
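A leverage-based applicability domain check of the kind plotted in Figure 3 can be sketched as follows. This is a minimal illustration with NumPy, assuming an MLR model with an intercept; the warning leverage h* = 3(p + 1)/n is a commonly used threshold and is an assumption here, as are the function names and the toy descriptor values.

```python
import numpy as np

def leverages(X_train, X_query):
    """Leverage h = x (X'X)^-1 x' of each query row w.r.t. the training matrix."""
    # Add an intercept column, as in an MLR model with a constant term.
    Xt = np.column_stack([np.ones(len(X_train)), X_train])
    Xq = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.pinv(Xt.T @ Xt)
    # Row-wise quadratic form x_i (X'X)^-1 x_i'.
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)

def in_domain(X_train, X_query, h_star=None):
    """True for query compounds whose leverage does not exceed h_star."""
    n, p = X_train.shape
    if h_star is None:
        h_star = 3 * (p + 1) / n  # common warning leverage (assumption)
    return leverages(X_train, X_query) <= h_star

# Toy one-descriptor example, invented for illustration only.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
X_query = np.array([[1.5], [10.0]])
print(in_domain(X_train, X_query))
```

In the toy example the first query lies inside the range of the training descriptor and is kept, while the second is a far extrapolation and is excluded from ranking.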


Table 5.   Fractions of the combinatorial library ranked according to the predicted values of DKiA3-REA3
Rank    Comb. Lib. ID*    ARR    nCIR    nCs    ALOGP2    Pred. DKiA3-REA3
  1. ARR, Aromatic ratio.

  2. *Combinatorial Library identification: 1.36_2 = Scaffold1.Linker36_Building Block2.


Library ranking based on the combination of desirability and belief theories

Although the idea of desirability-transforming and combining a number of related properties agrees with the concept of pharmaceutical profile (32,33), a parallel approach that provides feedback on the reliability of the properties predicted as a single Di value is also desirable.

If two or more property values Yi of a compound (previously scaled to the respective di values with a proper DF) are combined into a single Di value, which is then mapped as an MLR function of n MDs Xi (denoted as approach A1), it is reasonable to expect that the resulting predicted Di value should be similar to that of the inverse approach. The inverse approach consists of the independent mapping of the k properties Yi as MLR functions of n MDs Xi, the subsequent desirability scaling of each predicted Yi value, and the final combination of the corresponding di values into a single predicted Di value (denoted as approach A2).

  • image(15)

Assuming that the previous analysis holds, one would anticipate that the higher the degree of similarity between the Di values predicted by both approaches, the higher their reliability, and vice versa. Clearly, the results will depend on the goodness of fit and prediction of the set of PMs involved. In addition, the degree of uncertainty of PMs built with different sets of MDs will differ.
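As a notational sketch of the two routes described above (not the authors' original equation), A1 and A2 can be written as follows. The coefficient symbols a and b are introduced here for illustration, and the geometric-mean aggregation in the last line is the usual Derringer–Suich form, which is assumed rather than taken from this section.

```latex
% A1: model the combined desirability directly
\hat{D}_{A1} = a_0 + \sum_{j=1}^{n} a_j X_j

% A2: model each property, desirability-scale it, then combine
\hat{Y}_i = b_{i0} + \sum_{j=1}^{n} b_{ij} X_j , \qquad i = 1, \dots, k
\hat{D}_{A2} = \Bigl( \prod_{i=1}^{k} d_i(\hat{Y}_i) \Bigr)^{1/k}
```

Agreement between the two predicted values for the same compound is what the reliability argument above relies on.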

A framework is therefore required that allows the fusion of results from several approaches with different degrees of uncertainty in order to assess the reliability of their predictions. In the present work, we selected Dempster–Shafer Theory (DST) (49–51) (also known as belief theory) to achieve that goal. DST is a mathematical theory of evidence that has been developed to combine separate pieces of information that can arise from different sources (52). DST is based on two ideas: the idea of obtaining degrees of belief for one question from subjective probabilities for a related question, and Dempster’s rule for combining such degrees of belief when they are based on independent items of evidence (52).

The foundations of DST can be traced to the work of George Hooper, who published an article in the Philosophical Transactions of the Royal Society entitled ‘A calculation of the credibility of human testimony’ (50). In this article, Hooper formulated two rules relating the credibility of reports to the credibility of the reporters who make them (51).

These two rules are quite simple. The rule for successive testimony says that if a report has been relayed to us through a chain of n reporters, each having a degree of credibility p, then the credibility of the report is p^n. The rule for concurrent testimony says that if a report is concurrently attested to by n reporters, each with credibility p, then the credibility of the report is 1 − (1 − p)^n, where 0 ≤ p ≤ 1. Thus, the credibility of a report is weakened by transmission through a chain of reporters but strengthened by the concurrence of reporters (50,51).
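The two rules can be checked numerically with a few lines of Python. The function names are illustrative, and the values p = 0.8 and n = 3 are arbitrary.

```python
def successive_credibility(p, n):
    """Credibility of a report relayed through a chain of n reporters."""
    return p ** n

def concurrent_credibility(p, n):
    """Credibility of a report attested concurrently by n reporters."""
    return 1 - (1 - p) ** n

p, n = 0.8, 3  # arbitrary illustrative values
print(successive_credibility(p, n))  # weaker than a single reporter
print(concurrent_credibility(p, n))  # stronger than a single reporter
```

With these values the chained report is worth about 0.512 and the concurrent report about 0.992, which illustrates the weakening and strengthening effects stated in the text.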

If we draw a simple analogy between this situation and the one described above, in which two parallel overall desirability PMs are approached inversely, it can be noted that DST, specifically Hooper’s rule for combining concurrent evidence (50,51), is fully applicable to our problem. It is only necessary to replace ‘report’ with ‘prediction’ and ‘reporter’ with ‘PM’, and the previous paragraph will almost literally describe our problem.

Developing a probability assignment is the basic function in DST and is an expression of the level of confidence that can be ascribed to a particular measurement. However, in this work, we are interested in the desirability of a compound. Consequently, rather than a probability assignment for each compound, we will use the desirability values coming from both overall desirability PM approaches (D1 and D2) to derive the final joint belief values (BD):

  • image(16)

While desirability is not itself a probability, its values, like those of probabilities, range from 0 to 1. Therefore, it can be used to derive the values of BD for each compound. In this way, it is possible to encode the reliability of the predicted desirability of a compound according to two inverse but complementary prediction approaches. Given this information, BD can be used as a ranking criterion in a virtual screening scheme, which is particularly useful for LBVS.
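If Hooper’s rule for concurrent testimony is extended to two ‘reporters’ of unequal credibility, the joint belief takes the form below. This is offered as a plausible reading of the text, assuming D1 and D2 lie in [0, 1]; it is not a transcription of the original equation, and the numerical example is invented.

```latex
B_D = 1 - (1 - D_1)(1 - D_2), \qquad 0 \le D_1, D_2 \le 1

% With D_1 = D_2 = p this reduces to Hooper's rule for n = 2: 1 - (1 - p)^2
% Example: D_1 = 0.6, D_2 = 0.7 gives B_D = 1 - (0.4)(0.3) = 0.88
```

Under this form, BD is never smaller than either of the two desirabilities, and it is highest when both approaches predict a high desirability.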

An LBVS strategy based on BD can be described by the sequence of steps detailed below:

  1. Prediction models setup. Here, the predicted Di values for each compound are derived from A1 and A2 as expressed in eqn (13).

  2. Desirability assignment. Because of limitations inherent to the MLR approach, the predicted desirability values will not always fall within the interval [0,1] and consequently cannot be used as they are to derive BD. For the desirability values derived from approach A1, it is therefore necessary to rescale them using eqn (2), considering that D has to be maximized. In the case of approach A2, the derivation of the respective Di values is affected by the above-mentioned limitations of MLR, and the process is further complicated by the wider range of the mapped Yi properties. Consequently, di is scaled by using a two-sided DF (eqn (4)) with the same target Ti values employed in A1 for each Yi.

  3. Derivation of the joint belief BD by the application of Hooper’s rule for combining concurrent evidence.

  4. BD-based ranking.

The resulting ranking should render an ordered list in which the most reliable compounds with the highest desirability values are ranked at the top, that is, the compounds with the highest chance of exhibiting a desirable combination of the k modeled properties.
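The ranking steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: simple clipping to [0, 1] stands in for the rescaling with eqn (2) and eqn (4), the joint belief uses the two-source form of the concurrent-evidence rule, and the compound names and predicted values are invented.

```python
def clip01(x):
    """Force a predicted desirability into [0, 1] (stand-in for the rescaling step)."""
    return min(1.0, max(0.0, x))

def joint_belief(d1, d2):
    """Concurrent-evidence combination of two desirabilities in [0, 1] (assumed form)."""
    return 1.0 - (1.0 - d1) * (1.0 - d2)

def rank_by_belief(predictions):
    """predictions: {compound_id: (D from A1, D from A2)} -> list sorted by B_D, best first."""
    scored = {
        cid: joint_belief(clip01(d1), clip01(d2))
        for cid, (d1, d2) in predictions.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical predictions, invented purely for illustration.
toy = {"cpd_a": (0.9, 0.4), "cpd_b": (0.6, 0.5), "cpd_c": (0.2, -0.1)}
ranking = rank_by_belief(toy)
for cid, belief in ranking:
    print(cid, round(belief, 2))
```

Compounds for which both approaches agree on a high desirability end up at the top of the list, which is the behavior the strategy is designed to reward.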

Subsequently, the BD-based virtual screening (VS) strategy described earlier was applied to the training set to test its performance as a ranking criterion. Considering the structural similarity between the assembled combinatorial library and our training set, it is possible to use the latter to infer the reliability of the ranking attained for the combinatorial library. The predicted values of DKiA3-REA3 (according to approach A1) were also tested as ranking criterion to compare a VS strategy based on predictions coming from a single approach with a VS strategy based on the combination of concurrent predictions. The quality of the respective rankings was compared according to Ψ*, as described earlier.

Based on the analysis of our training set, the quality of the ranking attained using the predicted values of DKiA3-REA3 is around 80%, which suggests an acceptable degree of confidence if the scheme is applied to our combinatorial library (R% = 80.08%; Ψ* = 0.1992). As can be noted in Figure 4, the use of BD as ranking criterion (R% = 82.81%; Ψ* = 0.1719) slightly outperforms the predicted values of DKiA3-REA3. Considering that BD encodes, in addition to the desirability of the compound, the reliability of such a prediction, it appears well suited for screening larger and/or structurally more diverse libraries with a wider range of the mapped properties.
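The two reported quality figures appear to be linked by a simple complement relation. This relation is inferred from the numbers quoted above rather than taken from the definition of Ψ* given earlier, so it should be read as a consistency check only.

```latex
R\% = 100\,(1 - \Psi^{*})

% predicted D_{KiA3-REA3}:  100\,(1 - 0.1992) = 80.08
% B_D:                      100\,(1 - 0.1719) = 82.81
```

On this reading, the gain from using BD corresponds to a reduction of Ψ* by 0.0273, or 2.73 percentage points in R%.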


Figure 4.  Ranking of the training set compounds based on BD (top) and DKiA3-REA3 (bottom), respectively.



Conclusions

The development of a linear 1D PM of the overall desirability of A3AR agonists based on four simple MDs with a direct physicochemical or structural explanation, as well as the desirability analysis of this model, was described in this work. The results obtained provided significant clues on the desired trade-offs between binding and relative efficacy of N6-substituted-4′-thioadenosine A3AR agonists.

The desirability-based PM interpretation strategy proposed here suggests a favorable effect on binding affinity and agonist efficacy of conformationally restricted, but not fused, bicyclic N6 substituents. The overall data provide guidance for the rational design of new A3AR agonist candidates by assembling a combinatorial library useful for the prioritization of candidates with a promising balance between A3AR binding affinity and agonist efficacy through a virtual screening campaign. The VS protocol described, based on the combined use of desirability and belief theories, exhibited a slightly superior performance compared with the single use of predicted overall desirabilities.

Finally, the combined use of desirability and belief theories in computational medicinal chemistry research was demonstrated to be a valid approach. The model was able to consider several properties simultaneously, in a simple and interpretable manner, and to execute a multi-target LBVS strategy.

  a. CambridgeSoft. (2004) ChemDraw Ultra. Cambridge: CambridgeSoft.

  b. Todeschini R., Consonni V., Pavan M. (2005) DRAGON Software. Milano: Talete srl.

  c. Todeschini R., Consonni V., Pavan M. (2002) MOBY DIGS: Software for Multilinear Regression Analysis and Variable Subset Selection by Genetic Algorithm. Milan, Italy: Talete srl.


Acknowledgment

The authors acknowledge the Portuguese Fundação para a Ciência e a Tecnologia (FCT) (MCM SFRH/BD/30698/2006, MNDSC SFRH/BSAB/930/2009 grants and the project PTDC/QUI/70359/2006) and Xunta de Galicia (PGIDIT07PXIB) for financial support.


References
  1. Fredholm B.B., IJzerman A.P., Jacobson K.A., Klotz K.N., Linden J. (2001) International Union of Pharmacology. XXV. Nomenclature and classification of adenosine receptors. Pharmacol Rev;53:527–552.
  2. Clarke B., Coupe M. (1989) Adenosine: Cellular mechanisms, pathophysiological roles and clinical applications. Int J Cardiol;23:1–10.
  3. Chen Y., Corriden R., Inoue Y., Yip L., Hashiguchi N., Zinkernagel A., Nizet V., Insel P.A., Junger W.G. (2006) ATP release guides neutrophil chemotaxis via P2Y2 and A3 receptors. Science;314:1792–1795.
  4. Jacobson K.A., Gao Z.G. (2006) Adenosine receptors as therapeutic targets. Nat Rev Drug Discov;5:247–264.
  5. Grifantini M., Cristalli G., Franchetti P., Vittori S. (1991) Adenosine derivatives as agonists of adenosine receptors. Farmaco;46:161–169.
  6. Jacobson K.A., Joshi B.V., Wang B., Klutz A., Kim Y., Ivanov A.A., Melman A., Gao Z-G. (2008) Modified nucleosides as selective modulators of adenosine receptors for therapeutic use. In: Herdewijn P., editor. Modified Nucleosides as Selective Modulators of Adenosine Receptors for Therapeutic Use. Weinheim: Wiley-VCH; p. 433.
  7. Fishman P., Bar-Yehuda S. (2003) Pharmacology and therapeutic applications of A3 receptor subtype. Curr Top Med Chem;3:463–469.
  8. Yan L., Burbiel J.C., Maass A., Muller C.E. (2003) Adenosine receptor agonists: from basic medicinal chemistry to clinical development. Expert Opin Emerg Drugs;8:537–576.
  9. Leesar M.A., Stoddard M., Ahmed M., Broadbent J., Bolli R. (1997) Preconditioning of human myocardium with adenosine during coronary angioplasty. Circulation;95:2500–2507.
  10. Conti J.B., Belardinelli L., Curtis A.B. (1995) Usefulness of adenosine in diagnosis of tachyarrhythmias. Am J Cardiol;75:952–955.
  11. Madi L., Bar-Yehuda S., Barer F., Ardon E., Ochaion A., Fishman P. (2003) A3 adenosine receptor activation in melanoma cells: association between receptor fate and tumor growth inhibition. J Biol Chem;278:42121–42130.
  12. Kim S.K., Jacobson K.A. (2007) Three-dimensional quantitative structure-activity relationship of nucleosides acting at the A3 adenosine receptor: analysis of binding and relative efficacy. J Chem Inf Model;47:1225–1233.
  13. Brown N., Lewis R.A. (2006) Exploiting QSAR methods in lead optimization. Curr Opin Drug Discov Devel;9:419–424.
  14. Hansch C. (1976) On the structure of medicinal chemistry. J Med Chem;19:1–6.
  15. Gonzalez M.P., Teran C., Teijeira M., Helguera A.M. (2006) Quantitative structure activity relationships as useful tools for the design of new adenosine receptor ligands. 1. Agonist. Curr Med Chem;13:2253–2266.
  16. Guha R. (2008) On the interpretation and interpretability of quantitative structure-activity relationship models. J Comput Aided Mol Des;22:857–871.
  17. Nicolaou A.C., Brown N., Pattichis C.S. (2007) Molecular optimization using computational multi-objective methods. Curr Opin Drug Discov Devel;10:316–324.
  18. Harrington E.C. (1965) The desirability function. Ind Qual Control;21:494–498.
  19. Derringer G., Suich R. (1980) Simultaneous optimization of several response variables. J Qual Technol;12:214–219.
  20. Outinen K., Haario H., Vuorela P., Nyman M., Ukkonen E., Vuorela H. (1998) Optimization of selectivity in high-performance liquid chromatography using desirability functions and mixture designs according to PRISMA. Eur J Pharm Sci;6:197–205.
  21. Garcia-Gonzalez D.L., Aparicio R. (2002) Detection of vinegary defect in virgin olive oils by metal oxide sensors. J Agric Food Chem;50:1809–1814.
  22. Shih M., Gennings C., Chinchilli V.M., Carter W.H. Jr (2003) Titrating and evaluating multi-drug regimens within subjects. Stat Med;22:2257–2279.
  23. Kording K.P., Fukunaga I., Howard I.S., Ingram J.N., Wolpert D.M. (2004) A neuroeconomics approach to inferring utility functions in sensorimotor control. PLoS Biol;2:e330.
  24. Safa F., Hadjmohammadi M.R. (2005) Simultaneous optimization of the resolution and analysis time in micellar liquid chromatography of phenyl thiohydantoin amino acids using Derringer’s desirability function. J Chromatogr A;1078:42–50.
  25. Pavan M., Todeschini R., Orlandi M. (2006) Data mining by total ranking methods: a case study on optimisation of the “pulp and bleaching” process in the paper industry. Ann Chim;96:13–27.
  26. Coffey T., Gennings C., Moser V.C. (2007) The simultaneous analysis of discrete and continuous outcomes in a dose-response study: using desirability functions. Regul Toxicol Pharmacol;48:51–58.
  27. Rozet E., Wascotte V., Lecouturier N., Preat V., Dewe W., Boulanger B., Hubert P. (2007) Improvement of the decision efficiency of the accuracy profile by means of a desirability function for analytical methods validation. Application to a diacetyl-monoxime colorimetric assay used for the determination of urea in transdermal iontophoretic extracts. Anal Chim Acta;591:239–247.
  28. Wong W.K., Furst D.E., Clements P.J., Streisand J.B. (2007) Assessing disease progression using a composite endpoint. Stat Methods Med Res;16:31–49.
  29. Cojocaru C., Khayet M., Zakrzewska-Trznadel G., Jaworska A. (2008) Modeling and multi-response optimization of pervaporation of organic aqueous solutions using desirability function approach. J Hazard Mater;167:52–63.
  30. Fajar N.M., Carro A.M., Lorenzo R.A., Fernandez F., Cela R. (2008) Optimization of microwave-assisted extraction with saponification (MAES) for the determination of polybrominated flame retardants in aquaculture samples. Food Addit Contam Part A Chem Anal Control Expo Risk Assess;25:1015–1023.
  31. Jancic-Stojanovic B., Malenovic A., Ivanovic D., Rakic T., Medenica M. (2009) Chemometrical evaluation of ropinirole and its impurity’s chromatographic behavior. J Chromatogr A;1216:1263–1269.
  32. Cruz-Monteagudo M., Borges F., Cordeiro M.N. (2008) Desirability-based multiobjective optimization for global QSAR studies: application to the design of novel NSAIDs with improved analgesic, antiinflammatory, and ulcerogenic profiles. J Comput Chem;29:2445–2459.
  33. Cruz-Monteagudo M., Borges F., Cordeiro M.N.D.S., Cagide Fajin J.L., Morell C., Molina Ruiz R., Cañizares-Carmenate Y., Dominguez E.R. (2008) Desirability-Based Methods of Multiobjective Optimization and Ranking for Global QSAR Studies. Filtering Safe and Potent Drug Candidates from Combinatorial Libraries. J Comb Chem;10:897–913.
  34. Jeong L.S., Lee H.W., Kim H.O., Jung J.Y., Gao Z.G., Duong H.T., Rao S., Jacobson K.A., Shin D.H., Lee J.A., Gunaga P., Lee S.K., Jin D.Z., Chon M.W., Moon H.R. (2006) Design, synthesis, and biological activity of N6-substituted-4′-thioadenosines at the human A3 adenosine receptor. Bioorg Med Chem;14:4718–4730.
  35. Burkert U., Allinger N.L. (1982) Molecular Mechanics. Washington, D.C., USA: ACS.
  36. Clark T. (1985) Computational Chemistry. NY, USA: Wiley.
  37. Stewart J.J.P. (1989) Optimization of parameters for semiempirical methods I. Method. J Comp Chem;10:209–220.
  38. Frank J. (1993) MOPAC. MOPAC. Colorado Springs, CO: Seiler Research Laboratory, US Air Force Academy.
  39. Leardi R., Boggia R., Terrile M. (1992) Genetic algorithms as a strategy for feature selection. J Chemom;6:267–281.
  40. Todeschini R., Consonni V., Mauri A., Pavan M. (2003) MobyDigs: Software for Regression and Classification Models by Genetic Algorithms. In: Leardi R., editor. Nature-inspired Methods in Chemometrics: Genetic Algorithms and Artificial Neural Networks. Amsterdam: Elsevier; p. 141–167.
  41. Todeschini R., Consonni V., Mauri A., Pavan M. (2004) Detecting “bad” regression models: multicriteria fitness functions in regression analysis. Anal Chim Acta;515:199–208.
  42. Eriksson L., Jaworska J., Worth A.P., Cronin M.T., McDowell R.M., Gramatica P. (2003) Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. Environ Health Perspect;111:1361–1375.
  43. Efron B. (1987) Better bootstrap confidence intervals. J Am Stat Assoc;82:171–200.
  44. Lindgren F., Hansen B., Karcher W., Sjöström M., Eriksson L. (1996) Model validation by permutation tests: applications to variable selection. J Chemom;10:521–532.
  45. Stewart J., Gill L. (1998) Econometrics, 2nd edn. London: Prentice Hall.
  46. Kutner M.H., Nachtsheim C.J., Neter J., Li W. (2005) Multicollinearity and Its Effects. New York: McGraw Hill; p. 278–289.
  47. Atkinson A.C. (1985) Plots, Transformations and Regression. Oxford: Clarendon Press.
  48. Schüller A., Schneider G., Byvatov E. (2003) SMILIB: rapid Assembly of Combinatorial Libraries in SMILES Notation. QSAR Comb Sci;22:719–721.
  49. Dempster A.P. (1967) Upper and lower probabilities induced by a multivalued mapping. Ann Stat;28:325–339.
  50. Hooper G. (1699) A calculation of the credibility of human testimony. Philos Trans R Soc;21:359–365.
  51. Shafer G. (1986) The combination of evidence. Int J Intell Syst;1:155–179.
  52. Muchmore S.W., Debe D.A., Metz J.T., Brown S.P., Martin Y.C., Hajduk P.J. (2008) Application of belief theory to similarity data fusion for use in analog searching and lead hopping. J Chem Inf Model;48:941–948.

Supporting Information


Figure S1. Correlation Matrix for KiA3 Model (eqn 13).

Figure S2. Pareto chart of t-values for coefficients in KiA3 Model (eqn 13).

Figure S3. Correlation Matrix for REA3 Model (eqn 14).

Figure S4. Pareto chart of t-values for coefficients in REA3 Model (eqn 14).

Figure S5. Applicability domain (for training set compounds) of the MLR models employed on prediction approach A2.

Table S1. Chemical structures, MDs and property values of the library of N6-substituted-4′-thioadenosine analogues.

Table S2. Checking the main parametric assumptions and the applicability domain of the overall desirability PM involved on the prediction approach A1.

Table S3. Checking the main parametric assumptions related to the MLR models used in approach A2.

Please note: Wiley-Blackwell is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

CBDD_971_sm_FS1-5TableS1-3.doc (2680K): Supporting info item