
Prediction of aqueous solubility of druglike organic compounds using partial least squares, back-propagation network and support vector machine

Abstract

Aqueous solubility of drug compounds plays a very important role in drug research and development. In this study, we collected 225 diverse druglike molecules with accurate aqueous solubility data. Three commonly used methods, namely partial least squares (PLS), back-propagation network (BPN) and support vector regression (SVR), were employed to model the quantitative structure–property relationship (QSPR) for the aqueous solubility of 180 druglike compounds. Twenty-eight molecular descriptors were used to relate molecular structure to aqueous solubility. To obtain reliable and robust aqueous solubility predictions, a novel outlier detection method was employed to detect all outliers in the established models simultaneously. In accordance with the Organization for Economic Co-operation and Development (OECD) principles, the QSPR models were checked by both internal and external statistical validation to ensure reliability and predictive ability. The results indicate that all three models provide good predictive ability for drug aqueous solubility. Furthermore, the predictive ability of SVR was found to be superior to that of PLS and BPN, and the 28 selected molecular descriptors give a reliable and direct interpretation of the aqueous solubility. Copyright © 2010 John Wiley & Sons, Ltd.
