Development and validation of the discriminant functions (DFs)
Linear discriminant analysis (LDA) has become an important tool for the prediction of chemical properties. Because of the simplicity of this method, many useful discriminant models have been developed and reported in the literature (21,23,32,42–44). LDA was therefore the technique used to generate the discriminant functions in the present work, and the principle of maximal parsimony (Occam’s razor) was adopted as the strategy for model selection (45). The general data set was randomly divided into a training set (346 compounds) and a test set (94 compounds), both containing active and inactive compounds. The best models obtained using atom-based non-stochastic and stochastic bilinear indices as MDs, together with their statistical parameters, are given below, respectively:
where Class refers to antitrypanosomal activity, N is the number of compounds, λ is Wilks’ statistic, QTotal is the accuracy of the model for the training set, MCC is the Matthews correlation coefficient, D² is the squared Mahalanobis distance, F is the Fisher ratio and p-value is its significance level.
Both equations were statistically significant at p < 0.001. The best non-stochastic model (Eq. 1) presents a good overall accuracy of 89.02% for the training set (see Table 1). In addition, this model showed an adequate Matthews correlation coefficient of 0.76; MCC quantifies the strength of the linear relation between the MDs and the classifications and usually provides a more balanced evaluation of the prediction than, for instance, the percentages (accuracies). Together with the accuracy, the sensitivity, specificity and false positive rate (also known as ‘false alarm rate’) are among the most commonly used parameters in medical statistics. The sensitivity (also known as ‘hit rate’) is the probability of correctly predicting a positive case, whereas the specificity, as used here, is the probability that a positive prediction is correct (37); elsewhere this quantity is often called precision or positive predictive value. For the training set, the non-stochastic model shows a good sensitivity of 85.83%, a specificity of 83.06% and a false positive rate of only 9.29% (see Table 1). Nevertheless, the most important criterion for accepting or rejecting a discriminant model is its performance on the external prediction set, which is known as the predictive power of the model. For the test set, the non-stochastic model showed an accuracy of 85.11%, an MCC of 0.67, a good sensitivity of 91.30% and a specificity of 63.64%, with a false positive rate of 16.90%.
Table 1. Prediction performances for LDA-based QSAR models for training and test sets
|Models||Matthews correlation coefficient (C)||Accuracy ‘QTotal’ (%)||Specificity (%)||Sensitivity ‘hit rate’ (%)||False positive rate (%)|
|Training set|
| Eq. 1||0.76||89.02||83.06||85.83||9.29|
| Eq. 2||0.77||89.60||82.81||88.33||9.73|
|Test set|
| Eq. 1||0.67||85.11||63.64||91.30||16.90|
| Eq. 2||0.74||88.30||68.75||95.65||14.08|
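All the statistics in Table 1 derive from the four cells of a confusion matrix. The following Python sketch shows how they could be computed. The counts passed in (103 true positives, 17 false negatives, 21 false positives, 205 true negatives) are not stated in the text; they are an assumption, chosen because they sum to the 346 training compounds and reproduce the training-set values reported for Eq. 1. Note that ‘specificity’ is computed as defined in this work, that is, TP/(TP + FP).

```python
from math import sqrt


def classification_metrics(tp, fn, fp, tn):
    """Return the Table 1 statistics (in %) and the MCC from confusion-matrix counts."""
    total = tp + fn + fp + tn
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        # Sensitivity ('hit rate'): fraction of true actives predicted active.
        "sensitivity": 100.0 * tp / (tp + fn),
        # 'Specificity' as defined in the text: fraction of positive
        # predictions that are correct (often called precision).
        "specificity": 100.0 * tp / (tp + fp),
        # False positive ('false alarm') rate: inactives predicted active.
        "false_positive_rate": 100.0 * fp / (fp + tn),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Assumed counts (not given in the source) consistent with Eq. 1, training set.
metrics = classification_metrics(tp=103, fn=17, fp=21, tn=205)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these assumed counts, the rounded output matches the first row of Table 1 (MCC 0.76, accuracy 89.02%, specificity 83.06%, sensitivity 85.83%, false positive rate 9.29%).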
On the other hand, the best stochastic model (Eq. 2) presents a good overall accuracy of 89.60% and a good MCC of 0.77 for the training set. These values are slightly better than those obtained with the non-stochastic model. The sensitivity and specificity were 88.33% and 82.81%, respectively, with a false positive rate of only 9.73%. For the test set, the stochastic model achieved an accuracy of 88.30%, an MCC of 0.74, a sensitivity of 95.65% and a specificity of 68.75%; these values are acceptable. All the values are reported in Table 1. The classification results obtained with Eqs. 1 and 2 for the compounds in both the training and test sets can be seen in the Supporting Information (Tables S1–S4).
The robustness of a model refers to the stability of its parameters (predictor coefficients) and, consequently, to the stability of its predictions when a perturbation is applied to the training set and the model is regenerated from the ‘perturbed’ training set. Here, we applied the leave-group-out (LGO) and Y-scrambling procedures (b, 3,46) as tools to assess what is sometimes referred to as ‘internal predictivity’ and to detect possible chance correlation in the models obtained, respectively (for details, see Section 1 of the Supporting Information). First, an LGO strategy was performed, and the accuracies calculated for the compounds in the new training and test sets allowed us to assess the models. The results of this validation process are illustrated in Figure S1 (see Supporting Information). This plot shows that the models are highly stable to perturbations of the database, and that the results of the stochastic model were better than those of the non-stochastic model. After that, the Y-scrambling test was carried out. The results of our randomization experiments are shown in Figure S2 (see Supporting Information) and indicate that, as the size of the randomized group increases, the overall accuracy of the model decreases gradually. This outcome indicates that the good overall classification is not because of chance correlation or structural redundancy in the training set.
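The two perturbation schemes described above can be sketched in Python as follows. This is only an illustration of how the training data might be perturbed, not the protocol of the Supporting Information: the function names, the group count and the toy label vector are hypothetical, and refitting the LDA model on each perturbed set is deliberately left out.

```python
import random


def scramble_labels(labels, fraction, rng):
    """Y-scrambling: permute the class labels within a random subset of compounds."""
    n_scrambled = round(fraction * len(labels))
    chosen = rng.sample(range(len(labels)), n_scrambled)
    values = [labels[i] for i in chosen]
    rng.shuffle(values)
    scrambled = list(labels)
    for index, value in zip(chosen, values):
        scrambled[index] = value
    return scrambled


def leave_group_out(n_items, n_groups, rng):
    """LGO: yield (training, held_out) index lists; each item is held out exactly once."""
    order = list(range(n_items))
    rng.shuffle(order)
    for group in range(n_groups):
        held_out = sorted(order[group::n_groups])
        held_out_set = set(held_out)
        training = [i for i in range(n_items) if i not in held_out_set]
        yield training, held_out


rng = random.Random(0)
toy_labels = [1] * 12 + [0] * 22  # hypothetical activity classes (1 = active)

# Increasing the scrambled fraction mimics a growing randomized group.
for fraction in (0.0, 0.25, 0.5, 1.0):
    perturbed = scramble_labels(toy_labels, fraction, rng)
    changed = sum(a != b for a, b in zip(toy_labels, perturbed))
    print(f"fraction {fraction:.2f}: {changed} labels changed")

for training, held_out in leave_group_out(len(toy_labels), 5, rng):
    print(f"train on {len(training)} compounds, predict {len(held_out)}")
```

In a full validation, the model would be refitted on each perturbed or reduced training set and its accuracy recorded, which is what Figures S1 and S2 are described as summarizing.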
In silico and experimental identification of novel antitrypanosomals
The entire procedure described in the sections above was designed with the main objective of exploring the applicability of the QSAR models obtained with the atom-based bilinear indices to the identification of ‘hits’ (pro-lead compounds) in large databases. Therefore, an in silico screening of novel compounds was performed to search for the biological activity addressed in this work. For this purpose, we chose a pool of approximately 200 compounds that were available from our academic collaborators and had never been described in the literature as antitrypanosomal agents. The in silico assays were then performed using all the models developed in this report, to identify chemicals with trypanocidal activity.
Here, 18 new organic compounds were selected as putative antitrypanosomals by the LDA-based QSAR models. However, it is generally acknowledged that QSARs are valid only within the domain for which they were developed. In fact, even if the models are developed on the same chemicals, the applicability domain (AD) for new chemicals can differ from model to model, depending on the specific MDs. Therefore, the leverage values (h) and standardized residuals of these 18 compounds were calculated. The leverage values of all the new compounds were lower than the warning leverage (h* = 0.06); the corresponding leverage plot is shown in Figure S3 (for details, see Section 2 of the Supporting Information). Accordingly, these chemicals lie within the AD of the model, and their predictions can be considered reliable. This supports the classification of this set of compounds as new antitrypanosomals and indicates that the model can be used for the prediction of new compounds within its AD.
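For orientation, the leverage and the warning leverage are commonly defined as shown below. This is the usual convention in QSAR applicability-domain analysis and is given here as an assumption, since the exact definitions are in the Supporting Information and are not reproduced in this section.

```latex
h_i = \mathbf{x}_i^{\mathsf{T}} \left( \mathbf{X}^{\mathsf{T}} \mathbf{X} \right)^{-1} \mathbf{x}_i ,
\qquad
h^{*} = \frac{3p'}{n}
```

Here, x_i is the descriptor vector of compound i, X is the descriptor matrix of the training set, n is the number of training compounds and p' is the number of model parameters (variables plus one). As an illustration only, n = 346 and an assumed p' = 7 would give h* ≈ 0.061, which is of the same order as the warning leverage reported above.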
After that, in vitro assays of the previously synthesized compounds (Figure 1) were carried out to corroborate the in silico predictions, using an epimastigote inhibition assay (40). The ΔP% values obtained for these compounds with all the DFs are listed in Table 2, and their chemical structures are depicted in Figure 1. A good agreement (16/18) is observed between the experimental antitrypanosomal activity and the theoretical predictions for this set of compounds. Sixteen compounds showed more than 70% epimastigote inhibition at a concentration of 100 μg/mL (see Table 2). In addition, three compounds (CRIS 112, CRIS 140 and CRIS 147) showed more than 70% epimastigote inhibition at a concentration of 10 μg/mL (79.95%, 73.97% and 78.13%, respectively). Even though none of them was more active than nifurtimox, the current results constitute a step forward in the search for efficient ways to discover new lead antitrypanosomals.
Table 2. Compounds evaluated in the present study, their classification (ΔP%) according to the obtained models, their antitrypanosomal activity and cytotoxicity at three different concentrations (100, 10 and 1 μg/mL) and antitrypanosomal activity of nifurtimox (reference)
|Compound||Exp.||ΔP% Eq. 1||ΔP% Eq. 2||%AE (SD), 100 μg/mL||%AE (SD), 10 μg/mL||%AE (SD), 1 μg/mL||%CI (SD), 100 μg/mL||%CI (SD), 10 μg/mL||%CI (SD), 1 μg/mL|
|CRIS 105||A||94.5||97.5||72.10 ± 0.28||38.20 ± 2.61||14.83 ± 5.16||27.58 ± 1.45||0.00 ± 4.35||0.00 ± 2.18|
|CRIS 109||A||96.0||97.3||84.21 ± 0.75||56.20 ± 1.39||0.00 ± 2.05||49.21 ± 0.60||10.88 ± 1.36||11.66 ± 1.70|
|CRIS 110||A||96.3||97.3||82.14 ± 0.72||54.15 ± 0.89||8.56 ± 0.47||65.85 ± 1.68||33.48 ± 4.61||7.14 ± 2.05|
|CRIS 111||A||96.2||97.8||83.80 ± 1.47||41.73 ± 1.25||23.94 ± 1.02||42.91 ± 0.47||8.68 ± 0.72||0.00 ± 1.64|
|CRIS 112||A||96.4||97.8||87.24 ± 0.29||79.95 ± 2.17||15.42 ± 1.34||57.99 ± 4.88||19.70 ± 0.85||0.00 ± 1.15|
|CRIS 116||A||97.9||98.5||70.84 ± 2.38||53.18 ± 1.88||6.98 ± 4.25||24.31 ± 1.52||9.71 ± 1.57||7.85 ± 1.30|
|CRIS 119||A||98.0||98.6||73.77 ± 1.66||30.71 ± 0.88||19.65 ± 2.57||63.22 ± 1.32||25.69 ± 1.32||11.22 ± 2.28|
|CRIS 130||A||97.9||98.6||76.45 ± 2.31||46.09 ± 2.53||0.00 ± 2.68||50.21 ± 0.82||12.60 ± 1.18||0.00 ± 2.14|
|CRIS 131||I||99.8||99.3||35.56 ± 2.35||21.71 ± 1.81||4.24 ± 0.82||20.54 ± 1.63||27.56 ± 1.45||7.14 ± 1.20|
|CRIS 135||A||94.6||97.6||81.13 ± 2.55||35.48 ± 4.16||10.69 ± 1.35||35.18 ± 1.54||11.71 ± 1.33||0.00 ± 0.85|
|CRIS 140||A||96.1||97.3||77.46 ± 2.69||73.97 ± 1.79||33.25 ± 1.78||64.19 ± 1.10||7.44 ± 1.47||0.00 ± 1.97|
|CRIS 141||A||96.2||97.8||75.64 ± 0.80||54.38 ± 0.55||8.27 ± 1.05||99.46 ± 0.21||99.90 ± 0.07||34.66 ± 1.91|
|CRIS 142||A||99.8||99.0||74.82 ± 1.65||22.23 ± 5.23||2.51 ± 1.67||31.41 ± 4.48||19.24 ± 1.72||5.72 ± 0.65|
|CRIS 143||A||99.8||99.1||80.35 ± 3.25||39.01 ± 2.11||7.80 ± 3.28||71.14 ± 3.60||23.14 ± 4.10||4.67 ± 0.80|
|CRIS 147||A||99.8||99.1||99.29 ± 0.74||78.13 ± 0.78||23.44 ± 2.00||37.23 ± 0.79||20.63 ± 2.12||6.28 ± 2.62|
|CRIS 148||A||99.8||98.9||82.26 ± 1.32||31.77 ± 0.78||12.56 ± 4.04||26.79 ± 2.42||26.74 ± 5.06||6.71 ± 1.06|
|CRIS 149||A||99.8||99.0||75.00 ± 2.96||48.56 ± 0.87||14.34 ± 1.95||41.32 ± 2.76||10.10 ± 1.32||0.00 ± 1.93|
|CRIS 153||I||99.9||99.5||20.31 ± 0.56||18.75 ± 0.54||21.41 ± 0.52||20.63 ± 1.20||20.70 ± 0.56||3.50 ± 1.63|
|Nifurtimox||A||99.98||98.39||100 ± 1.49||85.45 ± 2.43||38.21 ± 2.17||11.68||0.6||0.32|
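The 16/18 agreement quoted in the text can be checked directly against Table 2. The sketch below uses the experimental classes and ΔP% values transcribed from the table. The decision rule (a positive ΔP% means ‘predicted active’) is an assumption about how ΔP% is interpreted, as it is not spelled out in this section.

```python
# (compound, experimental class, ΔP% Eq. 1, ΔP% Eq. 2), transcribed from Table 2
TABLE_2 = [
    ("CRIS 105", "A", 94.5, 97.5), ("CRIS 109", "A", 96.0, 97.3),
    ("CRIS 110", "A", 96.3, 97.3), ("CRIS 111", "A", 96.2, 97.8),
    ("CRIS 112", "A", 96.4, 97.8), ("CRIS 116", "A", 97.9, 98.5),
    ("CRIS 119", "A", 98.0, 98.6), ("CRIS 130", "A", 97.9, 98.6),
    ("CRIS 131", "I", 99.8, 99.3), ("CRIS 135", "A", 94.6, 97.6),
    ("CRIS 140", "A", 96.1, 97.3), ("CRIS 141", "A", 96.2, 97.8),
    ("CRIS 142", "A", 99.8, 99.0), ("CRIS 143", "A", 99.8, 99.1),
    ("CRIS 147", "A", 99.8, 99.1), ("CRIS 148", "A", 99.8, 98.9),
    ("CRIS 149", "A", 99.8, 99.0), ("CRIS 153", "I", 99.9, 99.5),
]


def predicted_class(delta_p):
    """Assumed rule: a positive ΔP% classifies the compound as active ('A')."""
    return "A" if delta_p > 0 else "I"


def agreement(rows, column):
    """Return the number of correct predictions and the names of the mismatches."""
    mismatches = [row[0] for row in rows if predicted_class(row[column]) != row[1]]
    return len(rows) - len(mismatches), mismatches


for label, column in (("Eq. 1", 2), ("Eq. 2", 3)):
    n_correct, mismatches = agreement(TABLE_2, column)
    print(f"{label}: {n_correct}/{len(TABLE_2)} agree; mismatches: {mismatches}")
```

Under this rule, both equations predict all 18 compounds as active, so the only disagreements are the two experimentally inactive compounds, CRIS 131 and CRIS 153.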
After this preliminary in vitro test, the unspecific cytotoxicity against macrophages was determined at the same concentrations used in the previous assay (40,41). Three compounds (CRIS 105, CRIS 116 and CRIS 148) that showed more than 70% epimastigote inhibition at a concentration of 100 μg/mL (Table 2) also presented acceptable cytotoxicity values at that concentration (27.58%, 24.31% and 26.79%, respectively). The three compounds with more than 70% activity at a concentration of 10 μg/mL (CRIS 112, CRIS 140 and CRIS 147) showed low cytotoxicity values at that concentration (19.70%, 7.44% and 20.63%, respectively). Taking all these results into account, some compounds of this group can be optimized in forthcoming works, and we consider compound CRIS 140 to be the best candidate (see Figure 1).
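The selection reasoning above can be expressed as a simple filter over Table 2. The sketch below keeps the compounds with more than 70% epimastigote inhibition at 10 μg/mL and orders them by cytotoxicity at the same concentration. The 70% cut-off comes from the text; ranking by cytotoxicity alone is an illustrative simplification and not necessarily the criterion the authors applied.

```python
# (compound, %AE at 10 μg/mL, %CI at 10 μg/mL), mean values transcribed from Table 2
ASSAY_10_UG_ML = [
    ("CRIS 105", 38.20, 0.00), ("CRIS 109", 56.20, 10.88),
    ("CRIS 110", 54.15, 33.48), ("CRIS 111", 41.73, 8.68),
    ("CRIS 112", 79.95, 19.70), ("CRIS 116", 53.18, 9.71),
    ("CRIS 119", 30.71, 25.69), ("CRIS 130", 46.09, 12.60),
    ("CRIS 131", 21.71, 27.56), ("CRIS 135", 35.48, 11.71),
    ("CRIS 140", 73.97, 7.44), ("CRIS 141", 54.38, 99.90),
    ("CRIS 142", 22.23, 19.24), ("CRIS 143", 39.01, 23.14),
    ("CRIS 147", 78.13, 20.63), ("CRIS 148", 31.77, 26.74),
    ("CRIS 149", 48.56, 10.10), ("CRIS 153", 18.75, 20.70),
]


def rank_hits(rows, min_activity=70.0):
    """Keep compounds above the activity cut-off, least cytotoxic first."""
    hits = [(name, toxicity) for name, activity, toxicity in rows
            if activity > min_activity]
    return sorted(hits, key=lambda hit: hit[1])


for name, toxicity in rank_hits(ASSAY_10_UG_ML):
    print(f"{name}: {toxicity:.2f}% cytotoxicity at 10 μg/mL")
```

Only CRIS 140, CRIS 112 and CRIS 147 pass the filter, in that order of increasing cytotoxicity, which is consistent with the choice of CRIS 140 as the best candidate.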
Here, we briefly consider a possible structure–activity relationship for this set of compounds. According to the experimental results, comparing, for example, compound CRIS 140 with CRIS 149 and CRIS 153 suggests that sp3 hybridization of the carbon to which the pyridyl ring is attached is more favorable for trypanosomicidal action than sp2 hybridization. A similar situation is seen when compounds CRIS 112 and CRIS 131 are compared; in both cases, compounds with an sp3 carbon show a higher percentage of anti-epimastigote (AE) activity than those with an sp2 carbon in the same position.
On the other hand, the same group of chemicals used in this work was recently tested against another protozoan parasite, Trichomonas vaginalis, and all compounds were found to be inactive at all assayed concentrations, with the exception of compound CRIS 148 (47). This suggests that the antitrypanosomal activity predicted and experimentally corroborated in this work is quite specific for this group of compounds. However, a T. cruzi amastigote susceptibility assay and tests of activity against other protozoan parasites are still needed, in particular against other members of the Trypanosomatidae family such as Leishmania and Trypanosoma brucei.