Superaugmented Eccentric Distance Sum Connectivity Indices: Novel Highly Discriminating Topological Descriptors for QSAR/QSPR

Authors


*Corresponding author: Anil Kumar Madan, madan_ak@yahoo.com

Abstract

Four highly discriminating fourth-generation topological indices (TIs), termed superaugmented eccentric distance sum connectivity indices, as well as their topochemical versions (denoted by inline image, inline image, inline image and inline image), have been conceptualized in this study. The values of these indices for all possible structures with three, four, and five vertices containing one heteroatom were computed using an in-house computer program. The proposed superaugmented eccentric distance sum connectivity topochemical indices exhibited exceptionally high discriminating power, low degeneracy, and high sensitivity toward both the presence and the relative position of heteroatom(s) for all possible structures with five vertices containing at least one heteroatom. Intercorrelation analysis revealed the absence of correlation of the proposed indices with the Zagreb indices and the molecular connectivity index. Subsequently, the proposed TIs were utilized for the development of models for the prediction of checkpoint kinase inhibitory activity of 2-arylbenzimidazoles. A data set comprising 47 differently substituted analogs of 2-arylbenzimidazoles was selected for the study. The values of various TIs for each analog in the data set were computed using an in-house computer program. The resulting data were analyzed, and suitable models were developed through decision tree (DT), random forest (RF), and moving average analysis (MAA). The performance of the models was assessed by calculating the specificity, sensitivity, overall accuracy, and Matthews correlation coefficient. A decision tree was constructed for the checkpoint kinase inhibitory activity to determine the importance of topological indices. The decision tree identified the proposed TIs (inline image, inline image) as the most important indices. The decision tree learned the information from the input data with an accuracy of 96% and correctly predicted the cross-validated (10-fold) data with an accuracy of 77%. Random forest correctly predicted the checkpoint kinase inhibitory activity with an accuracy of 83%. Single index-based models were also developed for the prediction of checkpoint kinase inhibitory activity using MAA. The accuracy of prediction of the single index-based models derived through MAA varied from a minimum of 90% to a maximum of 95%. The exceptionally high discriminating power, low degeneracy, and high sensitivity of the proposed indices toward branching and the presence of heteroatoms can be of immense use in drug design, isomer discrimination, similarity/dissimilarity studies, quantitative structure activity/property relationships, lead optimization, and combinatorial library design.

The identification and optimization of lead compounds in a rapid and cost-effective way are the most critical steps in drug discovery. Computer-aided drug discovery offers an alternative to the real world of synthesis and screening (1,2). Computational techniques have advanced rapidly over the past few decades and have played a major role in the development of a number of drugs now on the market or going through clinical trials (3,4). A quantitative structure activity/property relationship (QSAR/QSPR) is a mathematical relationship linking chemical structure and pharmacological activity/property in a quantitative manner for a series of compounds (5). QSAR/QSPR modeling also reduces the number of compounds to be synthesized and promptly detects the most favorable compounds. Fundamentally, QSAR aims to identify relationships between some aspects of molecular structure and properties relevant to toxicology, pharmacodynamics, and pharmacokinetics (6).

The 2D approach has a number of advantages compared with the higher dimension QSAR methodologies. First, owing to the variety of molecular descriptors available, optimized coordinates are not always required. In fact, connectivity information (in the form of an adjacency matrix) alone can be used to develop QSAR models. As a result, models using topological descriptors can be built rapidly for very large sets of molecules. Second, this approach avoids the alignment step and thus can be used in the absence of experimental information regarding the binding of a molecule to its target.

The 2D QSAR approach makes use of TIs, which are numerical values associated with chemical constitution for the correlation of chemical structures with various physical properties, chemical reactivity, or biological activity (7). These are derived from the topological representation of molecules and can be considered structure-explicit descriptors (8). TIs are among the most useful descriptors known at present, as they can be rapidly computed for a large number of molecules and also offer a simple way of measuring molecular branching, shape, size, cyclicity, symmetry, chirality, complexity, and heterogeneity of atomic environments in the molecule (9–14). Over the past two decades, the use of TIs in QSAR models has enhanced the scope of drug design by producing reliable estimates of the therapeutic and toxic potential of chemicals (15).

The genetic integrity of a cell is constantly challenged by radiation, chemical agents, and replication errors (16). These agents mainly cause double strand breaks (DSB) and single strand breaks (SSB), leading to genomic instability that may result in tumor development if left unrepaired (17). DNA damage is also exploited to treat cancer. Many of the conventional anticancer treatments (ionizing radiation, hyperthermia, pyrimidine and purine antimetabolites, alkylating agents, DNA topoisomerase inhibitors, and platinum compounds) at least partly damage the DNA of cells. As these treatments are not specifically selective for cancer cells, patients suffer from serious side effects when receiving them (18). Thus, DNA damage causes the disease, is used to treat the disease, and is responsible for the toxicity of therapies for the disease (19).

In DNA damage response (DDR), eukaryotic cells activate checkpoint pathways to arrest the cell cycle (20–22). The checkpoints comprise a subroutine integrated into the larger DDR pathway that regulates a multifaceted response. Moreover, several checkpoint genes are essential for cell and organism survival (23–27) implying that these pathways are not only surveyors of occasional damage but are firmly integrated components of cellular physiology (22).

The DNA damage checkpoints are known to comprise signal transduction cascades that link the detection of DNA damage to several other processes, i.e. inhibition of progression through the cell cycle from G1 to S, through S and from G2 into M, activation of DNA repair and initiation of apoptosis (28). DNA damage is recognized by damage sensor proteins such as Mre11-Rad50-Nbs1 (MRN complex) and breast and ovarian cancer locus 1 (BRCA1)-associated genome surveillance complex (BASC). These proteins recruit and activate the upstream Ataxia-telangiectasia mutated (ATM) protein and ATM and Rad 3-related (ATR) kinases (17,29). Checkpoint kinases Chk1 and Chk2 are downstream key mediators of DDR through activation of an increasing number of substrates such as p53, NBS1, BRCA1, MDM2, Cdc25A, Cdc25C, and E2F1 (30–32). The relevance of these kinases in the maintenance of genome integrity is clearly indicated by the severe human genetic disorders and the predisposition to cancer associated with defects in these proteins (20,33–35).

Radiation therapy and chemotherapy for cancer often have serious side effects that limit their efficacy. Modulation of the checkpoint responses to these treatments appears to be a potential strategy to sensitize tumor cells to DNA-damaging agents (17). Checkpoint kinase 2 acts as a mediator of DNA damage signaling and also as a barrier to tumorigenesis (36). There is evidence in favor of the therapeutic value of Chk2 inhibitors (37,38). Checkpoint kinase 2 inhibitors are reported to augment the effect of various cytotoxic drugs, e.g. doxorubicin (39), cisplatin (40), and paclitaxel (41).

The side effects of radiation therapy have been reported to be more serious. As these side effects are in part determined by p53-mediated apoptosis, temporary suppression of p53 has been suggested as a therapeutic strategy to prevent damage to normal tissues during treatment of p53-deficient tumors (42,43). The p53 response to DNA breaks induced by radiation and certain chemical agents is controlled by Chk2 (36). Studies showed that Chk2 deficiency conferred radioresistance and that Chk2 plays a critical role in p53 function in response to ionizing radiation (IR) by regulating its transcriptional activity and its stability, indicating the utility of Chk2 inhibitors as radioprotectants for normal cells (44,45). Thus, Chk2 inhibitors may be useful drugs for reducing the side effects of cancer therapy and other types of stress associated with p53 activation (46,47).

Agents that target checkpoint kinases have demonstrated impressive evidence preclinically that this approach will provide tumor-specific potentiating agents and may have broad therapeutic utility. Only a few selective Chk2 inhibitors have been reported other than 2-arylbenzimidazole (48), NSC 109555 (49), VRX0466617 (50), isothiazole carboxamides (51), and PV 1019 (52). There are various published inhibitors of Chk1 (Staurosporin, Go6976, SB-218078, ICP-1, CEP-3891, and AZD7762) (53) and both Chk1 and Chk2 (TAT-S216A, UCN-01, and debromohymenialdisine) (54,55), CEP-6367, Sulforaphane (18,56,57).

The past decade has witnessed the development of checkpoint kinase inhibitors for the treatment of cancer. Three checkpoint kinase inhibitors have already entered clinical trials since 2005 (58). The pharmaceutical industry strives to explore novel scaffolds for checkpoint kinase inhibition.

In this study, four topological descriptors termed as superaugmented eccentric distance sum connectivity indices and their topochemical versions have been conceptualized and successfully utilized along with existing TIs for development of models for prediction of checkpoint kinase (Chk2) inhibitory activity of 2-arylbenzimidazoles.

Methodology

Calculation of topological indices

The values of inline image were calculated for all possible structures with three, four, and five vertices containing one heteroatom (Figures 1 and 2) using an in-house computer program.

Figure 1.

Index values for all possible structures with three, four, and five vertices containing one heteroatom. *Cpd no., compound number.

Figure 2.

 Calculation of values of superaugmented eccentric distance sum connectivity topochemical index-1 (inline image), superaugmented eccentric distance sum connectivity topochemical index-2 (inline image), superaugmented eccentric distance sum connectivity topochemical index-3 (inline image), and superaugmented eccentric distance sum connectivity topochemical index-4 (inline image), for three isomers of 11-membered molecule (decylamine).

Superaugmented eccentric distance sum connectivity indices

Superaugmented eccentric distance sum connectivity indices, inline image, proposed in this study can be defined as the inverse of the summation of quotients of the product of adjacent vertex degrees and the product of the squared distance sum and eccentricity of the concerned vertex for all vertices in a hydrogen-suppressed molecular graph. It can be expressed as follows:

image(1)

where Mi is the product of the degrees of all vertices (vj) adjacent to vertex i, which can be obtained by multiplying all the non-zero row elements in the augmentative adjacency matrix; Ei is the eccentricity and Si the distance sum of vertex i; n is the number of vertices in the graph; and N is equal to 1, 2, 3, or 4 for superaugmented eccentric distance sum connectivity indices-1, -2, -3, and -4, respectively.
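The verbal definition above can be turned into a short computation. The sketch below is one reading of that definition, assuming that the exponent N applies to the eccentricity term, so that the index equals the inverse of the sum of Mi/(Ei^N × Si^2) over all vertices. The function names are hypothetical, the graph is given as a neighbour list, and the in-house program used in this study is not reproduced here.

```python
from collections import deque
from math import prod


def bfs_distances(adj, source):
    """Shortest path lengths (edge counts) from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sed_connectivity_index(adj, N):
    """Hypothetical helper: inverse of sum_i M_i / (E_i**N * S_i**2).

    `adj` maps every vertex of a connected hydrogen-suppressed graph
    (at least two vertices) to the list of its neighbours.
    ASSUMPTION: N is the exponent on the eccentricity E_i.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    total = 0.0
    for i in adj:
        dist = bfs_distances(adj, i)
        E_i = max(dist.values())               # eccentricity of vertex i
        S_i = sum(dist.values())               # distance sum of vertex i
        M_i = prod(degree[j] for j in adj[i])  # product of adjacent vertex degrees
        total += M_i / (E_i ** N * S_i ** 2)
    return 1.0 / total


# Two three-vertex graphs without heteroatoms: a path and a triangle.
path3 = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(sed_connectivity_index(path3, 1), sed_connectivity_index(triangle, 1))
```

Under this reading the three-vertex path gives 36/17 (about 2.118) for N = 1, and the triangle gives 1/3 for every N because all eccentricities equal 1. A topochemical variant would substitute chemical degrees, chemical eccentricities, and chemical distance sums for their topostructural counterparts.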

Similarly, the topochemical version of superaugmented eccentric distance sum connectivity indices can be defined as the inverse of the summation of quotients of the product of adjacent vertex chemical degrees and the product of the squared chemical distance sum and chemical eccentricity of the concerned vertex for all vertices in a hydrogen-suppressed molecular graph.

It can be expressed as follows:

image(2)

where Mic is the product of the chemical degrees of all vertices (vj) adjacent to vertex i, which can be obtained by multiplying all the non-zero row elements in the augmentative chemical adjacency matrix; Eic is the chemical eccentricity and Sic the chemical distance sum of vertex i; n is the number of vertices in the graph; and N is equal to 1, 2, 3, or 4 for superaugmented eccentric distance sum connectivity topochemical indices-1, -2, -3, and -4, respectively (denoted by inline image, inline image, inline image, and inline image).

Superaugmented eccentric distance sum connectivity topochemical indices can be easily calculated from the chemical distance matrix (Dc), chemical adjacency matrix (AC), and augmentative chemical adjacency matrix (inline image). The calculation of proposed inline image, inline image, inline image, and inline image for three isomers of 11-membered molecule (decylamine) has been exemplified in Figure 2.

The index values of the proposed topochemical descriptors for all three-, four-, and five-membered structures containing one heteroatom, which illustrate their sensitivity toward the presence and the relative position of the heteroatom, have been compiled in Figure 1. The discriminating power and degeneracy of the superaugmented eccentric distance sum connectivity topochemical indices were investigated using all possible structures with three, four, and five vertices containing one heteroatom (Table 1). The intercorrelation of the proposed superaugmented eccentric distance sum connectivity indices with Wiener’s index, Zagreb indices, the molecular connectivity index, and eccentric connectivity indices was also investigated (Table 2).

Table 1.   Comparison of the discriminating power and degeneracy of inline image, inline image, inline image, and inline image using all possible structures with three, four, and five vertices containing one heteroatom
Parameter | Index-1 | Index-2 | Index-3 | Index-4
For three vertices
Minimum value | 0.363 | 0.395 | 0.428 | 0.461
Maximum value | 2.484 | 3.809 | 5.396 | 7.16
Ratio | 1:6.843 | 1:9.643 | 1:12.61 | 1:15.54
Degeneracy(a) | 0/3 | 0/3 | 0/3 | 0/3
For four vertices
Minimum value | 0.089 | 0.099 | 0.109 | 0.119
Maximum value | 6.585 | 14.858 | 32.737 | 70.8
Ratio | 1:73.989 | 1:150.080 | 1:300.34 | 1:594.96
Degeneracy | 0/11 | 0/11 | 0/11 | 0/11
For five vertices
Minimum value | 0.039 | 0.047 | 0.057 | 0.067
Maximum value | 11.814 | 30.236 | 74.188 | 176.075
Ratio | 1:302.923 | 1:643.319 | 1:1301.54 | 1:2627.99
Degeneracy | 0/47 | 0/47 | 0/47 | 0/47
(a) Degeneracy: number of compounds having same values/total number of compounds with same number of vertices.

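Table 2.   Intercorrelation matrix (Zagreb-1 and Zagreb-2: the two Zagreb indices; Proposed-1 to Proposed-4: the four proposed indices)
 | χA | ξc | Zagreb-1 | Zagreb-2 | Wc | Proposed-1 | Proposed-2 | Proposed-3 | Proposed-4
χA | 1 | 0.939 | 0.59 | 0.619 | 0.743 | −0.01 | 0.061 | 0.106 | 0.132
ξc | | 1 | 0.599 | 0.662 | 0.67 | −0.07 | −0.567 | 0.062 | 0.093
Zagreb-1 | | | 1 | 0.979 | 0.016 | −0.62 | −0.567 | −0.53 | −0.496
Zagreb-2 | | | | 1 | 0.045 | −0.57 | −0.502 | −0.46 | −0.422
Wc | | | | | 1 | 0.548 | 0.58 | 0.593 | 0.594
Proposed-1 | | | | | | 1 | 0.993 | 0.98 | 0.965
Proposed-2 | | | | | | | 1 | 0.996 | 0.988
Proposed-3 | | | | | | | | 1 | 0.998
Proposed-4 | | | | | | | | | 1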

Topological indices

The 26 descriptors including the proposed indices (Table 3) (59–75) of diverse nature were used in this study. Though a total of 26 descriptors were employed for the present study, only 14 indices were shortlisted on the basis of non-correlating nature and classification ability. These shortlisted indices used in the present study are defined below.

Table 3.   Topostructural and topochemical indices
Code | Index | References
A1 | Molecular connectivity topochemical index | (59,60)
A2 | Eccentric adjacency topochemical index | (61)
A3 | Augmented eccentric connectivity topochemical index | (62)
A4 | Superadjacency topochemical index | (63)
A5 | Eccentric connectivity topochemical index | (64)
A6 | Connective eccentricity topochemical index | (65)
A7 | Zagreb topochemical index, inline image | (66)
A8 | Zagreb topochemical index, inline image | (66)
A9 | Wiener’s topochemical index | (67)
A10 | Superaugmented eccentric connectivity topochemical index-1 | (68)
A11 | Superaugmented eccentric distance sum connectivity topochemical index-1 |
A12 | Superaugmented eccentric distance sum connectivity topochemical index-2 |
A13 | Superaugmented eccentric distance sum connectivity topochemical index-3 |
A14 | Superaugmented eccentric distance sum connectivity topochemical index-4 |
A15 | Molecular connectivity index | (69)
A16 | Eccentric adjacency index | (70)
A17 | Augmented eccentric connectivity index | (71)
A18 | Superadjacency index | (63)
A19 | Eccentric connectivity index | (72)
A20 | Connective eccentricity index | (73)
A21 | Zagreb index, M1 | (74,75)
A22 | Zagreb index, M2 | (74,75)
A23 | Superaugmented eccentric distance sum connectivity index-1 |
A24 | Superaugmented eccentric distance sum connectivity index-2 |
A25 | Superaugmented eccentric distance sum connectivity index-3 |
A26 | Superaugmented eccentric distance sum connectivity index-4 |

Wiener’s topochemical index (Wc)

Wiener’s topochemical index (67) is defined as the sum of the chemical distances between all pairs of vertices in a hydrogen-suppressed molecular graph. It is a refined form of the oldest and most widely used distance-based topological index, Wiener’s index (76), and this modified index takes into account the presence and the relative position of heteroatom(s) in a molecular structure. It can be expressed as

$$W_c = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} P_{i_c j_c} \qquad (3)$$

where Picjc is the chemical length of the path that contains the least number of edges between vertices i and j in the graph G, and n is the number of vertices in the hydrogen-depleted graph (67).
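As an illustration of the half-sum over all ordered vertex pairs, the following minimal sketch computes the topostructural Wiener index, in which every edge has unit length. The chemical weighting that distinguishes Wc is not specified in this section, so it is deliberately left out; the function name and the neighbour-list input are assumptions made for the example.

```python
from collections import deque


def wiener_index(adj):
    """Half the sum of shortest-path lengths over all ordered vertex pairs.

    `adj` maps each vertex of a connected graph to its neighbours.
    Unit edge lengths are ASSUMED (topostructural case, no heteroatom weights).
    """
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each unordered pair was counted twice


# Four-vertex path (carbon skeleton of n-butane).
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(wiener_index(path4))
```

For the four-vertex path the six pairwise distances are 1, 2, 3, 1, 2, and 1, giving a value of 10.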

Zagreb indices (M1 and M2)

This pair of indices (74,75), denoted by M1 and M2, was introduced in 1972 and is defined as per Equations 4 and 5.

$$M_1 = \sum_{i=1}^{n} \left(d(i)\right)^{2} \qquad (4)$$
$$M_2 = \sum_{\text{edges } \{i,j\}} d(i)\,d(j) \qquad (5)$$

where d(i) is the degree of vertex i, defined as the number of edges incident on vertex i, and d(i)d(j) is the weight of edge {i,j}.
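Both Zagreb indices follow directly from an edge list: the first sums squared vertex degrees, and the second sums the edge weights d(i)d(j). The sketch below shows this for the topostructural case; the function name and the edge-list input format are assumptions made for the example.

```python
def zagreb_indices(edges):
    """Return (M1, M2) for a simple graph given as a list of (i, j) edges."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    m1 = sum(d * d for d in degree.values())           # sum of squared degrees
    m2 = sum(degree[i] * degree[j] for i, j in edges)  # sum of edge weights
    return m1, m2


# Two four-vertex trees: an unbranched path and a star (one branching vertex).
path4_edges = [(0, 1), (1, 2), (2, 3)]
star4_edges = [(0, 1), (0, 2), (0, 3)]
print(zagreb_indices(path4_edges), zagreb_indices(star4_edges))
```

The path gives (10, 8) and the star gives (12, 9), which shows how both indices grow with branching.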

Similarly, the Zagreb topochemical indices (66) M1c and M2c are defined as per Equations 6 and 7.

$$M_1^{c} = \sum_{i=1}^{n} \left(d_c(i)\right)^{2} \qquad (6)$$

where dc(i) is the chemical degree of vertex i and n is the number of vertices.

$$M_2^{c} = \sum_{\text{edges } \{i,j\}} d_c(i)\,d_c(j) \qquad (7)$$

where dc(i)dc(j) is the chemical weight of edge {i, j} in the hydrogen-suppressed molecular graph and the sum runs over all n edges.

Connective eccentricity index

Connective eccentricity index (73) can be defined as summation of the ratios of the degree of a vertex (Vi) and its eccentricity (Ei) for all vertices in the hydrogen suppressed molecular structure. It can be expressed by the following equation:

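$$C^{\xi} = \sum_{i=1}^{n} \frac{V_i}{E_i} \qquad (8)$$

where n is the number of vertices. The eccentricity Ei of a vertex i in a graph G is the path length from vertex i to the vertex j that is farthest from i, that is, the largest distance d(i,j) over all vertices j in G.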
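The degree-to-eccentricity ratio summed over all vertices can be sketched as follows. Eccentricities are obtained by breadth-first search; the function name and the neighbour-list input are assumptions, and the graph is assumed to be connected with at least two vertices so that no eccentricity is zero.

```python
from collections import deque


def connective_eccentricity_index(adj):
    """Sum over vertices of degree / eccentricity for a connected graph."""
    total = 0.0
    for i in adj:
        dist = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        eccentricity = max(dist.values())  # distance to the farthest vertex
        total += len(adj[i]) / eccentricity
    return total


path3 = {0: [1], 1: [0, 2], 2: [1]}
print(connective_eccentricity_index(path3))
```

For the three-vertex path the terminal vertices contribute 1/2 each and the central vertex contributes 2/1, giving 3.0.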

Data set

A data set (48) comprising 47 analogs of 2-arylbenzimidazole was selected for the present investigation. The basic structure of these analogs is depicted in Figure 3, and the various substituents are listed in Figure 4. The values of the 26 descriptors (Table 3) used in this study were calculated for all the analogs in the data set using an in-house computer program. Compounds with reported IC50 values of ≤25 nM were considered active, whereas those with IC50 values >25 nM were treated as inactive for the purpose of the present study.

Figure 3.

Basic structures of 2-arylbenzimidazole analogs (48).

Figure 4.

 Relationship of superaugmented eccentric distance sum connectivity topochemical indices, Zagreb topochemical Index, Wiener’s topochemical Index with Checkpoint Kinase (Chk2) inhibitors. (+) active compound, (−) inactive compound and (±) compound in transitional range.

Decision tree

Decision trees provide a useful solution for many classification problems in which large data sets are used and the information contained is complex. A decision tree (generally defined) is a tree whose internal nodes are tests (on input patterns) and whose leaf nodes are categories (of patterns). A decision tree assigns a class number (or output) to an input pattern by filtering the pattern down through the tests in the tree. Each test has mutually exclusive and exhaustive outcomes.

Decision trees are constructed beginning with the root of the tree and proceeding down to its leaves. Decision trees are a rapid and effective method of classifying data set entries and can provide good decision support capabilities (77,78). In this study, the decision tree was grown to identify the importance of TIs. In a decision tree, the molecules at each parent node are classified, based on the index value, into two child nodes. The prediction for a molecule reaching a given terminal node is obtained by majority vote of the training-set molecules reaching the same terminal node. In this study, the R program (version 2.1.0; University of Auckland, Auckland, New Zealand) along with the RPART library was used to grow the decision tree. The active compounds were labeled ‘A’ (n = 18) and the inactive compounds were labeled ‘B’ (n = 29). Each analog was assigned a biological activity, which was then compared with the reported Chk2 inhibitory activity.
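The two mechanics described above, splitting a parent node on an index value and labeling a terminal node by majority vote, can be sketched in a few lines. The example below is not the RPART procedure used in this study: it assumes Gini impurity as the splitting criterion (the criterion is not stated here), searches a single index only, and uses made-up toy values; all function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a two-class node with labels 'A' (active) / 'B' (inactive)."""
    if not labels:
        return 0.0
    p = sum(1 for y in labels if y == "A") / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one index."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate identical index values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [y for x, y in pairs if x < threshold]
        right = [y for x, y in pairs if x >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


def majority(labels):
    """Class assigned to a terminal node: the most frequent training label."""
    return max(sorted(set(labels)), key=labels.count)


# Toy index values and labels, invented purely for illustration.
toy_values = [1, 2, 3, 10, 11, 12]
toy_labels = ["B", "B", "B", "A", "A", "A"]
print(best_split(toy_values, toy_labels), majority(["A", "B", "A"]))
```

On the toy data the best threshold is 6.5 with zero impurity in both child nodes. A full tree would apply the same search recursively to each child node and over every index.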

Random forest

A random forest (RF) was grown for checkpoint kinase (Chk2) inhibitory activity. A random forest grows numerous classification trees. To classify a new object from an input vector, the input vector is passed down each of the trees in the forest. Each tree gives a classification; that is, the tree ‘votes’ for that class. The forest chooses the classification having the most votes (over all the trees in the forest) (79). In this study, the RFs were grown with the R program (version 2.1.0) using the RF library.
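The voting step can be illustrated with a minimal sketch. It covers only the aggregation of per-tree predictions, not the growing of the trees; the function name is hypothetical and the votes are invented for the example.

```python
from collections import Counter


def forest_vote(tree_predictions):
    """Return the class that receives the most votes across all trees."""
    counts = Counter(tree_predictions)
    return counts.most_common(1)[0][0]


# Five hypothetical trees voting on one analog ('A' = active, 'B' = inactive).
votes = ["A", "B", "A", "A", "B"]
print(forest_vote(votes))
```

With three votes for ‘A’ and two for ‘B’, the forest assigns the analog to class ‘A’.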

Moving average analysis

Moving average analysis constitutes the basis for the development of single topological index-based models (70,80). For the selection and evaluation of range-specific features, exclusive activity ranges were discovered from the frequency distribution of the response level, and the active range was subsequently identified by analyzing the resulting data by maximization of the moving average with respect to active compounds (<35% = inactive, 35–65% = transitional, >65% = active). The checkpoint kinase (Chk2) inhibitory activity assigned to each compound was compared with the reported biological activity. The average IC50 (nM) values for each range and activity were also calculated.
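One possible reading of this procedure is sketched below: compounds are sorted by index value, the percentage of active compounds is computed in a sliding window, and each window is labeled with the 35% and 65% cut-offs quoted above. The window size, the function names, and the toy activity flags are assumptions made for the example and are not taken from the study.

```python
def moving_active_percentage(active_flags, window):
    """Percentage of actives in each sliding window.

    `active_flags` holds 1 (active) or 0 (inactive) for compounds that are
    already sorted by increasing index value. `window` is a hypothetical choice.
    """
    return [
        100.0 * sum(active_flags[start:start + window]) / window
        for start in range(len(active_flags) - window + 1)
    ]


def classify_percentage(pct_active):
    """Apply the cut-offs: <35% inactive, 35-65% transitional, >65% active."""
    if pct_active > 65:
        return "active"
    if pct_active < 35:
        return "inactive"
    return "transitional"


toy_flags = [0, 0, 1, 1, 1]  # invented activity pattern along the index axis
percentages = moving_active_percentage(toy_flags, window=3)
print([classify_percentage(p) for p in percentages])
```

Contiguous windows that receive the same label would then be merged into one range of index values, and the accuracy of each range assessed against the reported activity.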

Data analysis

The sensitivity and specificity values, which represent the classification accuracies for the active and inactive compounds, respectively, were calculated. The randomness of the model was also assessed by calculating the Matthews correlation coefficient (MCC). MCC values range between −1 and +1 and indicate the potential of the model. The Matthews correlation coefficient takes both the sensitivity and specificity into account, and it is generally used as a balanced measure in dealing with data imbalance (81).
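These measures can all be derived from the four cells of a confusion matrix. The sketch below uses the standard definitions, with active compounds treated as the positive class; the function name is hypothetical, and the counts in the example are the training-set entries of Table 4.

```python
from math import sqrt


def classification_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, accuracy, MCC) from confusion counts.

    tp/fn: actives predicted active/inactive; fp/tn: inactives predicted
    active/inactive. MCC is set to 0.0 when its denominator vanishes.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return sensitivity, specificity, accuracy, mcc


# Training-set counts of the decision tree model (Table 4).
sens, spec, acc, mcc = classification_metrics(tp=17, fn=1, fp=1, tn=28)
print(round(100 * sens, 1), round(100 * spec, 1), round(100 * acc, 1), round(mcc, 2))
```

For these counts the sketch gives a sensitivity of 94.4%, a specificity of 96.6%, and an MCC of about 0.91, in line with the training-set row of Table 4. Applied to the cross-validated counts (12, 6, 5, 24), the same formula gives an MCC of about 0.50, so the tabulated value for that row may rest on a different convention.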

The results are summarized in Tables 4 and 5 and Figures 5 and 6. The validation of the decision tree (DT)-based model and the self-consistency test were performed by the 10-fold cross-validation (CV) method, in which the data set was randomly split into 10 folds. The model was developed using nine folds, and the prediction was made on the remaining fold. The goodness of the DT-based model was also assessed by calculating the specificity and sensitivity. The 10-fold cross-validation results are presented in Table 4.
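The fold construction can be sketched as follows. This is a generic random 10-fold split rather than the routine actually run in R; the seed and the function names are assumptions, and the model-fitting step is left out.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Randomly partition item indices 0..n_items-1 into k disjoint folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary, hypothetical choice
    return [indices[f::k] for f in range(k)]


def train_test_pairs(folds):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for held_out, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


folds = k_fold_indices(47)  # 47 analogs in the data set
for train, test in train_test_pairs(folds):
    pass  # a model would be fitted on `train` and evaluated on `test` here
print([len(fold) for fold in folds])
```

With 47 compounds, seven folds contain five compounds and three contain four, and every compound is predicted exactly once.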

Table 4.   Confusion matrix for checkpoint kinase (Chk2) inhibitory activity and recognition rate of models based on decision tree and random forest (RF)
Model | Description | Ranges | Number of compounds predicted active | Number of compounds predicted inactive | Specificity (%) | Sensitivity (%) | Matthews correlation coefficient
Decision tree | Training set | Active | 17 | 1 | 96.5 | 94.4 | 0.9
 | | Inactive | 1 | 28 | | |
 | Cross-validated set | Active | 12 | 6 | 82.7 | 66.6 | 0.03
 | | Inactive | 5 | 24 | | |
RF | | Active | 16 | 2 | 82.7 | 88.8 | 0.098
 | | Inactive | 5 | 24 | | |

Table 5.   Proposed model for the prediction of checkpoint kinase inhibitors
Index | Nature of range | Index value | Total compounds in the range | Number of compounds predicted correctly | Overall accuracy of prediction (%) | Average IC50 (nM)
inline image | Lower inactive | <140599.3 | 24 | 21 | 90 | 1239.44 (1414.2)
 | Active | 140599.3–2076090.4 | 13 | 12 | | 66.49 (10.95)
 | Transitional | >2076090.4–<267916.7 | 7 | NA | | 123.38
 | Upper inactive | ≥267916.7 | 3 | 3 | | 3620 (3620)
inline image | Lower inactive | <1298135 | 17 | 17 | 94.59 | 1672.7 (1672.7)
 | Transitional | 1298135–<1859651 | 10 | NA | | 205.46
 | Active | 1859651–2357104 | 11 | 11 | | 10.4 (10.4)
 | Upper inactive | >2357104 | 9 | 7 | | 1301.41 (1671.42)
inline image | Inactive | <157.64 | 19 | 18 | 90.32 | 1508 (1590.444)
 | Lower transitional | 157.64–<171.64 | 9 | NA | | 132.33
 | Lower active | 171.64–184.285 | 6 | 5 | | 120.417 (8.5)
 | Upper transitional | >184.285–<202.267 | 7 | NA | | 123.38
 | Upper active | 202.267–245.43 | 6 | 5 | | 14.4167 (9.1)
Wc | Lower inactive | <2014.09 | 16 | 16 | >99 | 1152.25 (1152.25)
 | Lower transitional | 2014.09–<2223.32 | 9 | NA | | 146.7
 | Active | 2223.32–2431.06 | 7 | 7 | | 10.843 (10.843)
 | Upper transitional | >2431.06 | 15 | NA | | 165.7
1. NA, not applicable.
2. Values in brackets are based on correctly predicted analogs in the particular range.

Figure 5.

 A decision tree for distinguishing active analog (A) from inactive analog (B); A13-superaugmented eccentric distance sum connectivity topochemical index-3 (inline image), A14-superaugmented eccentric distance sum connectivity topochemical index-4 (inline image), A6 - Connective eccentricity topochemical index.

Figure 6.

Average IC50 (nM) value of correctly predicted analogs of 2-arylbenzimidazole in various ranges of topological models.

Results and Discussion

The successful application of many topological descriptors is somewhat limited owing to low discriminating power and high degeneracy. There is always a strong need for the development of descriptors and approaches that can provide explicit information on the molecular aspects responsible for drug action (1). Moreover, pharmacogenomics (82), combinatorial chemistry (83,84), and high-throughput screening (85) make it possible to obtain and evaluate thousands of compounds in a short time. These technologies have generated new challenges for computational scientists, as they demand novel approaches to accelerated computer-aided lead discovery and optimization (86).

As the structure of a compound depends on the connectivity of its constituent atoms, TIs based on connectivity can reveal the role of structural and substructural information of molecules in estimating biological activity and evaluating toxicity. Topological indices developed for predicting physicochemical properties and biological activities of chemical substances can be used for drug design (87,88). The applications of TIs in drug design include lead discovery and lead optimization, virtual screening, structure activity/property studies, structure pharmacokinetics studies, and structure toxicity relationships. Recently, these are also being used in similarity/dissimilarity studies, combinatorial chemistry in studying the chirality of the molecule, isomer discrimination, and molecular complexity (1,3).

As shown in Figure 2, the value of inline image changes by a factor of about 11 (from 238.801 to 20.804), the value of inline image by a factor of about 30 (1554.158–52.646), the value of inline image by a factor of about 77 (9810.431–127.118), and the value of inline image by a factor of about 203 (60235.7–296.55) with a minor change in the branching of an 11-membered molecule containing one heteroatom. These descriptors have high discriminating power, which is defined as the ratio of the highest to the lowest value for all possible structures with the same number of vertices. The discriminating power of inline image, inline image, inline image, and inline image is 302.9, 643.31, 1301.54, and 2627.99, respectively, for all possible structures containing only five vertices (Table 1).
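Both quality measures are simple to compute once the index values of all structures with a given number of vertices are available. The sketch below follows the definitions given in the text and in the footnote to Table 1; the function names and the rounding precision used to decide whether two values coincide are assumptions.

```python
from collections import Counter


def discriminating_power(values):
    """Ratio of the highest to the lowest index value in a set of structures."""
    return max(values) / min(values)


def degeneracy(values, decimals=3):
    """Return (number of structures sharing a value with another, total).

    Values are compared after rounding to `decimals` places, a hypothetical
    choice that mirrors the three decimals reported in Table 1.
    """
    counts = Counter(round(v, decimals) for v in values)
    shared = sum(c for c in counts.values() if c > 1)
    return shared, len(values)


# Minimum and maximum of the first topochemical index for five vertices (Table 1).
print(round(discriminating_power([0.039, 11.814]), 3))
# Invented values, used only to show a degenerate pair.
print(degeneracy([1.0, 2.0, 2.0, 3.0]))
```

The first call reproduces the ratio of 302.923 listed in Table 1; the second returns (2, 4), that is, two of four structures share a value.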

High discriminating power of proposed new descriptors renders them extremely sensitive toward any change in molecular structure. The indices having discriminating power ≥100 for structures containing only five vertices are treated as ‘fourth-generation’ topological descriptors (68,89). inline image, inline image, inline image, and inline image did not exhibit any degeneracy for all possible structures with three, four, and five vertices.

The extremely low degeneracy of the proposed indices ensures enhanced sensitivity toward minor changes in branching, connectivity, and molecular structure. The intercorrelation between the proposed superaugmented eccentric distance sum connectivity topochemical indices and other well-known TIs was also investigated. Pairs of TIs with r ≥ 0.97 are considered highly intercorrelated, those with 0.90 ≤ r < 0.97 appreciably correlated, those with 0.50 ≤ r ≤ 0.89 weakly correlated, and pairs of TIs with r < 0.50 not intercorrelated (90). As indicated in Table 2, inline image, inline image, inline image, and inline image are not intercorrelated with the well-known χA, ξc, M1, and M2. However, these indices were found to be weakly intercorrelated with Wc and highly intercorrelated with each other, as these are based on similar principles/matrices. The pairs of indices χA and ξc, and M1 and M2, are highly intercorrelated, whereas χA and Wc, ξc and Wc, ξc and M1, and ξc and M2 are found to be weakly intercorrelated, while M1 and M2 are found not to be intercorrelated with Wc.
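The classification scheme can be written as a small lookup. The sketch below applies the thresholds to the signed value of r, as the text does, and closes the gaps between the quoted intervals by using the lower bound of each class as its cut-off; both choices, and the function name, are assumptions.

```python
def correlation_class(r):
    """Map a correlation coefficient to the degree of intercorrelation."""
    if r >= 0.97:
        return "highly intercorrelated"
    if r >= 0.90:
        return "appreciably correlated"
    if r >= 0.50:
        return "weakly correlated"
    return "not intercorrelated"


# Arbitrary example values, one per class.
for r in (0.98, 0.93, 0.60, 0.10):
    print(r, correlation_class(r))
```

Under this scheme a negative coefficient always falls in the last class, whatever its magnitude.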

In this study, DT-, random forest (RF)-, and moving average analysis (MAA)-based models were developed for the prediction of checkpoint kinase (Chk2) inhibitory activity of 2-arylbenzimidazoles. The decision tree was built by utilizing 26 TIs of diverse nature. The index at the root node is the most important, and the importance of an index decreases with increasing depth in the tree. The classification of 2-arylbenzimidazole analogs as active or inactive using a single tree, based on A13, A14, and A6, is illustrated in Figure 5 (each descriptor is denoted by an alphanumerical abbreviation that refers to Table 3). The decision tree identified A13 (inline image) as the most important index. The decision tree classified the 2-arylbenzimidazole analogs in the training set with an accuracy of 96% and in 10-fold cross-validation with an accuracy of 76.6%. The specificity and sensitivity of the DT-based model in the training set were of the order of 96.5% and 94.4%, respectively (Table 4). The specificity and sensitivity of the DT-based model in the cross-validated set were of the order of 82.7% and 66.6%, respectively. The values of MCC for the DT-based model in the training set and cross-validated set are 0.9 and 0.03, respectively, suggesting the randomness and robustness of the model. The values of specificity, sensitivity, and MCC are shown in Table 4.

The RFs were grown with the 26 topological descriptors listed in Table 3. The importance of each node was determined by the mean decrease in accuracy. The RF classified 2-arylbenzimidazole analogs as either active or inactive with an accuracy of 83%. The specificity and sensitivity were of the order of 82.7% and 88.8%, respectively, and the value of MCC was found to be 0.098 (Table 4).

Using a single index at a time, four independent MAA-based models using inline image, inline image, inline image, and Wc were developed. The proposed models are shown in Table 5. The methodology used in this study aims at the development of suitable models for providing lead molecules through exploitation of the active ranges in the proposed models. These models are unique and differ widely from conventional QSAR models. Both systems of modeling have their own advantages and limitations. In the present case, the modeling system adopted has the distinct advantage of identifying narrow active ranges, which may be erroneously skipped during routine regression analysis in conventional QSAR modeling (68). As the ultimate goal of modeling is to provide lead structures, these active ranges can play a vital role in lead identification.

Retrofit analysis of the data (Figure 4 and Table 5) reveals that the MAA-based models derived from inline image, inline image, inline image, and Wc correctly predicted the checkpoint kinase (Chk2) inhibitory activity of the analogs with accuracies of 90%, 94.5%, 90.32%, and >99%, respectively. Transitional ranges were observed in all four models, indicating a gradual change in checkpoint kinase inhibitory activity. The active ranges of the models based on inline image and Wc correctly predicted the checkpoint kinase (Chk2) inhibitory activity of analogs with an accuracy of >99%. As observed from Table 5 and Figure 6, the average IC50 of correctly predicted analogs in the active ranges of all four models varied from only 8.5 to ∼11 nM, indicating exceptionally high potency. High accuracy of prediction combined with high potency renders the active ranges of the proposed models extremely beneficial for providing lead structures for the development of potent checkpoint kinase inhibitors.

Conclusion

The superaugmented eccentric distance sum connectivity topochemical indices, a set of novel molecular descriptors, exhibited exceptionally high discriminating power and sensitivity toward both the presence and the relative position of heteroatoms, combined with low degeneracy. Moreover, these indices were found to be non-correlating with important topological descriptors. These qualities ensure their utility in drug design, quantitative structure activity/property relationships, combinatorial library design, isomer discrimination, and similarity/dissimilarity studies.

Subsequently, proposed TIs along with other TIs were successfully employed for development of numerous models for Chk2 inhibitory activity of 2-arylbenzimidazoles through decision tree, RF, and MAA. Decision tree revealed that proposed superaugmented eccentric distance sum connectivity topochemical index-3 (inline image) and superaugmented eccentric distance sum connectivity topochemical index-4 (inline image) are the most important indices. The exceptionally high degree of predictability of the resulting models offers a vast potential for providing lead structures for the development of specific Chk2 inhibitors that will help in improving the therapeutic window of radiation therapy and chemotherapy by reducing their side effects on the normal cells.
