• Statistical;
  • Mechanistic;
  • Cycloalkyl-pyranones;
  • HIV-PI;
  • MLR;
  • Heuristic;
  • BMLR;
  • External validation


Two parallel approaches for quantitative structure-activity relationships (QSAR) are predominant in literature, one guided by mechanistic methods (including read-across) and another by the use of statistical methods. To bridge the gap between these two approaches and to verify their main differences, a comparative study of mechanistically relevant and statistically relevant QSAR models, developed on a case study of 158 cycloalkyl-pyranones, biologically active on inhibition (Ki) of HIV protease, was performed. Firstly, Multiple Linear Regression (MLR) based models were developed starting from a limited amount of molecular descriptors which were widely proven to have mechanistic interpretation. Then robust and predictive MLR models were developed on the same set using two different statistical approaches unbiased of input descriptors. Development of models based on Statistical I method was guided by stepwise addition of descriptors while Genetic Algorithm based selection of descriptors was used for the Statistical II. Internal validation, the standard error of the estimate, and Fisher’s significance test were performed for both the statistical models. In addition, external validation was performed for Statistical II model, and Applicability Domain was verified as normally practiced in this approach. The relationships between the activity and the important descriptors selected in all the models were analyzed and compared. It is concluded that, despite the different type and number of input descriptors, and the applied descriptor selection tools or the algorithms used for developing the final model, the mechanistical and statistical approach are comparable to each other in terms of quality and also for mechanistic interpretability of modelling descriptors. Agreement can be observed between these two approaches and the better result could be a consensus prediction from both the models.