• Support vector machine;
• Eye/skin irritation;
• Prediction program;
• Bioinformatics;
• Structure–activity relationships


In this study, ensembles built over both features and training samples were examined using a collection of support vector machines. The effects of the data sampling method, the ratio of positive to negative compounds, and the type of combiner used to merge the base models into an ensemble model were explored. The ensemble method was applied to produce four separate in silico models that classify the labels for eye/skin corrosion (H314), skin irritation (H315), serious eye damage (H318), and eye irritation (H319), which are defined in the “Globally Harmonized System of Classification and Labelling of Chemicals”. To the best of our knowledge, the training set used in this work, which consists of publicly available data, is one of the largest to yield acceptable prediction performance. These models are distributed via PaDEL-DDPredictor, which can be downloaded freely for public use.
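The three design choices named above (how training samples are drawn, the ratio of positive to negative compounds, and how base-model outputs are combined) can be illustrated with a minimal sketch. It is not the procedure used in this study: random undersampling of negatives, random feature subsets, and the two combiners shown (majority vote and mean score) are assumptions chosen for illustration, all function and parameter names are hypothetical, and the base learners (support vector machines in this work) are left out.

```python
import random
from collections import Counter


def sample_training_subset(labels, neg_per_pos=1.0, rng=None):
    """Keep every positive and a random undersample of the negatives.

    `neg_per_pos` is a hypothetical knob for the negative:positive ratio.
    """
    if rng is None:
        rng = random.Random(0)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = min(len(neg), int(round(len(pos) * neg_per_pos)))
    return pos + rng.sample(neg, n_neg)


def sample_feature_subset(n_features, fraction, rng):
    """Pick a random subset of descriptor columns for one base model."""
    k = max(1, int(round(n_features * fraction)))
    return sorted(rng.sample(range(n_features), k))


def majority_vote(predictions):
    """Combine 0/1 votes; ties go to the positive class (an assumption)."""
    counts = Counter(predictions)
    return 1 if counts[1] >= counts[0] else 0


def mean_score(scores, threshold=0.5):
    """Combine continuous base-model scores by averaging, then threshold."""
    return 1 if sum(scores) / len(scores) >= threshold else 0
```

Each base model would be trained on the rows returned by `sample_training_subset` and the columns returned by `sample_feature_subset`, and its predictions passed to one of the combiners. Varying `neg_per_pos` and swapping the combiner corresponds to the kind of comparison described above.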