QSPR study for prediction of boiling points of 2475 organic compounds using stochastic gradient boosting

Authors

  • Jue-hong Zhang,

    1. School of Mathematics and Statistics, Central South University, Changsha, China
    2. Department of Statistics and Financial Mathematics, Hunan Normal University, Changsha, China
  • Zai-ming Liu,

    Corresponding author
    1. School of Mathematics and Statistics, Central South University, Changsha, China
    • Correspondence to: Z.-m. Liu, School of Mathematics and Statistics, Central South University, Changsha 410083, China.

      E-mail: math_lzm@csu.edu.cn

  • Wan-rong Liu

    1. Department of Statistics and Financial Mathematics, Hunan Normal University, Changsha, China

Abstract

The normal boiling point is one of the major physicochemical properties used to characterize and identify an organic compound. In this study, a boosted regression tree model was developed to model the quantitative structure–property relationship (QSPR) for the boiling points of 2475 compounds with high structural heterogeneity. Stochastic gradient boosting (SGB) constructs an additive regression model by sequentially fitting a simple regression tree to the current “pseudo”-residuals by least squares at each iteration. The parameters of SGB were optimized using 10-fold cross-validation. The best SGB model, built using 2D descriptors, had an overall Q² of 0.957 and a root mean square error of validation of 17.89 for the validation set, and an R_T² of 0.954 and a root mean square error of test of 18.19 for the test set. Compared with other commonly used modeling methods such as partial least squares, classification and regression trees, and random forest, SGB not only achieved the best predictive ability but also, with the help of partial dependence plots, gave more useful insight into the relationship between the descriptors and the boiling points. SGB could be a promising tool in QSPR research, especially for the screening of new compounds. Copyright © 2014 John Wiley & Sons, Ltd.
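The SGB procedure described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes squared-error loss (so the pseudo-residuals are simply the observed value minus the current prediction), one-split regression trees (stumps) as the simple base learner, and illustrative settings for the number of trees, the learning rate, and the subsample fraction. The function names `fit_stump`, `predict_stump`, `fit_sgb`, and `predict_sgb` are hypothetical.

```python
import random


def fit_stump(X, r):
    """Fit a one-split regression tree to targets r by least squares."""
    n = len(X)
    total = sum(r)
    total_sq = sum(v * v for v in r)
    mean_r = total / n
    # (sse, feature index, threshold, left value, right value)
    best = (float("inf"), None, None, mean_r, mean_r)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            i = order[k - 1]
            left_sum += r[i]
            if X[order[k]][j] == X[i][j]:
                continue  # cannot split between equal feature values
            right_sum = total - left_sum
            # Sum of squared errors when each side predicts its own mean.
            sse = total_sq - left_sum ** 2 / k - right_sum ** 2 / (n - k)
            if sse < best[0]:
                thr = (X[i][j] + X[order[k]][j]) / 2
                best = (sse, j, thr, left_sum / k, right_sum / (n - k))
    return best[1:]


def predict_stump(stump, x):
    j, thr, left, right = stump
    if j is None or x[j] <= thr:
        return left
    return right


def fit_sgb(X, y, n_trees=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting with squared-error loss (assumed settings)."""
    rng = random.Random(seed)
    n = len(X)
    f0 = sum(y) / n                      # initial model: the mean response
    pred = [f0] * n
    stumps = []
    m = max(2, int(subsample * n))
    for _ in range(n_trees):
        # "Stochastic": each iteration sees a random subsample, drawn
        # without replacement, of the training data.
        idx = rng.sample(range(n), m)
        # Pseudo-residuals for squared-error loss on that subsample.
        resid = [y[i] - pred[i] for i in idx]
        stump = fit_stump([X[i] for i in idx], resid)
        stumps.append(stump)
        # Shrunken additive update of the predictions for all samples.
        for i in range(n):
            pred[i] += learning_rate * predict_stump(stump, X[i])
    return f0, learning_rate, stumps


def predict_sgb(model, x):
    f0, lr, stumps = model
    return f0 + lr * sum(predict_stump(s, x) for s in stumps)


# Toy usage on synthetic data (not boiling-point data): two made-up
# "descriptors" and a response that depends on both.
toy_X = [[i / 10, (i * 7 % 11) / 11] for i in range(60)]
toy_y = [10 * x[0] + (5 if x[1] > 0.5 else 0) for x in toy_X]
toy_model = fit_sgb(toy_X, toy_y)
```

In practice the number of trees, learning rate, tree size, and subsample fraction would be chosen by cross-validation, as the abstract reports for the SGB parameters, rather than fixed as they are here.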
