Accurate Prediction of Aquatic Toxicity of Aromatic Compounds Based on Genetic Algorithm and Least Squares Support Vector Machines



Quantitative Structure–Toxicity Relationship (QSTR) modeling plays an important role in ecotoxicology because it offers a fast and practical way to assess the potential adverse effects of chemicals. The aim of this investigation was to develop accurate QSTR models for the aquatic toxicity of a large set of aromatic compounds by combining Least Squares Support Vector Machines (LS-SVMs) with a Genetic Algorithm (GA). Molecular descriptors calculated by the DRAGON package, together with log P, were used to describe the molecular structures. The descriptors most relevant to toxicity were selected by a GA-Variable Subset Selection procedure. Both linear Multiple Linear Regression (MLR) and nonlinear LS-SVM methods were then employed to build the QSTR models. The predictive ability of the derived models was validated using both a test set, selected from the whole data set by the Kennard–Stone algorithm, and an external prediction set. The applicability domain of the models was checked by the leverage approach, and the external prediction set was used to verify their predictive reliability. The results indicate that the proposed QSTR models are robust and satisfactory, and that they provide a feasible and promising tool for the rapid assessment of the toxicity of chemicals.
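
The Kennard–Stone partitioning mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `kennard_stone`, the toy descriptor values, and the choice of Euclidean distance on raw descriptor values are assumptions made for illustration only. The rule seeds the subset with the two most distant compounds in descriptor space, then repeatedly adds the compound that lies farthest from its nearest already-selected neighbor. Which of the two resulting subsets serves as the training set and which as the test set depends on the study design.

```python
from math import dist


def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the Kennard-Stone rule."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(X)")

    # Pairwise Euclidean distances in descriptor space (assumed metric).
    d = [[dist(a, b) for b in X] for a in X]

    # Seed the subset with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: d[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}

    # For each unselected sample, track the distance to its nearest selected one.
    min_d = {k: min(d[k][i0], d[k][j0]) for k in remaining}

    while len(selected) < n_select:
        # Pick the sample farthest from the current subset (max-min criterion).
        # Sorting makes tie-breaking deterministic (lowest index wins).
        nxt = max(sorted(remaining), key=lambda k: min_d[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_d[k] = min(min_d[k], d[k][nxt])
    return selected


# Toy, made-up descriptor rows (hypothetical values, two descriptors each).
descriptors = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [5.0, 4.0], [9.0, 9.0]]
chosen = kennard_stone(descriptors, 3)
left_out = [i for i in range(len(descriptors)) if i not in chosen]
```

In practice the descriptors are usually autoscaled before distances are computed, so that no single descriptor dominates the selection; that step is omitted here for brevity.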