Keywords:

  • QSAR;
  • QSPR;
  • machine learning;
  • QSAR models interpretation

Abstract

In this paper we propose a novel approach to the structural interpretation of QSAR models. Its major advantage is universality: it can be applied to any QSAR/QSPR model irrespective of the chemical descriptors and machine learning methods used. This universality is achieved by using only information obtained from substructures of the compounds of interest to interpret model outcomes. The reliability of the approach was confirmed in three case studies covering end-points of different types (continuous and binary classification) and nature (solubility, mutagenicity, and inhibition of Transglutaminase 2), various fragment and whole-molecule descriptors (Simplex and Dragon), and multiple modeling techniques (partial least squares, random forest, and support vector machines). We compared the global contributions of molecular fragments obtained with our methodology against known, experimentally derived SAR rules. In all cases we observed high concordance between our interpretation and results published by others. Although the proposed approach can easily be extended to any type of descriptor, we recommend Simplex descriptors because they allow a larger variety of molecular fragments to be investigated. The approach is well suited to interpreting “black box” models such as random forests and neural networks. Analysis of the global contributions of fragments and their deviation across a dataset can help identify key fragments and structural alerts. This information can in turn be used to maximize the positive influence of the structural surroundings on a given fragment and to reduce the negative effects.
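
To make the idea of model-agnostic, substructure-based interpretation concrete, the following is a minimal sketch and not the authors' implementation. It assumes one plausible reading of the abstract: a fragment's contribution is taken as the difference between the model's prediction for a compound and its prediction for the same compound with that fragment removed. The toy molecule representation (a multiset of fragment labels), the `toy_model` function, its weights, and the helper names are all hypothetical stand-ins; a real workflow would edit molecular graphs and recompute descriptors before calling a trained model.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical toy representation: a molecule is a multiset of fragment labels.

def toy_model(mol: Counter) -> float:
    """Stand-in for any trained QSAR model; only its predictions are used."""
    weights = {"OH": 1.0, "C6H5": -2.0, "NO2": -0.5}  # illustrative values only
    score = sum(weights.get(frag, 0.0) * n for frag, n in mol.items())
    # Interaction term: makes the effect of a fragment depend on its surroundings.
    if mol["OH"] and mol["NO2"]:
        score += 0.5
    return score

def fragment_contribution(model, mol: Counter, frag: str):
    """Assumed definition: prediction(whole) - prediction(whole minus fragment)."""
    if mol[frag] == 0:
        return None  # fragment absent, no contribution defined
    rest = mol.copy()
    rest[frag] -= 1
    rest = +rest  # drop labels whose count fell to zero
    return model(mol) - model(rest)

def global_contribution(model, dataset, frag: str):
    """Mean and population standard deviation of a fragment's contributions."""
    values = []
    for mol in dataset:
        c = fragment_contribution(model, mol, frag)
        if c is not None:
            values.append(c)
    return mean(values), pstdev(values)

dataset = [
    Counter({"OH": 1, "C6H5": 1}),
    Counter({"OH": 1, "NO2": 1}),
    Counter({"C6H5": 1, "NO2": 1}),
]

avg, spread = global_contribution(toy_model, dataset, "OH")
print(f"OH: mean contribution {avg:.2f}, deviation {spread:.2f}")
```

Because the sketch only ever calls `model(...)`, the same procedure would apply unchanged to any predictor. A non-zero deviation for a fragment indicates that its effect depends on the structural surroundings, which is the kind of signal the abstract suggests using to identify key fragments.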