Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction

Authors

  • Benhui Chen,

    Non-member
    1. Graduate School of Information, Production and Systems, Waseda University, Hibikino 2-7, Wakamatsu-ku, Kitakyushu-shi, Fukuoka 808-0135, Japan
    2. School of Mathematics and Computer Science, Dali University, Hongsheng Road 2, Dali, Yunnan 671003, China.
  • Jinglu Hu

    Member, Corresponding author
    1. Graduate School of Information, Production and Systems, Waseda University, Hibikino 2-7, Wakamatsu-ku, Kitakyushu-shi, Fukuoka 808-0135, Japan

Abstract

Hierarchical multi-label classification (HMC) is a variant of classification in which an instance may belong to several classes at once and the classes are organized in a hierarchy. Gene function prediction is a complex HMC problem with a large number of classes and usually strongly imbalanced class distributions. This paper proposes an improved HMC method based on over-sampling and a hierarchy constraint for the gene function prediction problem. The HMC task is decomposed into a set of binary support vector machine (SVM) classification tasks. Two measures then introduce the hierarchy constraint into the learning procedures to improve HMC performance. First, for imbalanced classes, a hierarchical synthetic minority over-sampling technique (SMOTE) is proposed as an over-sampling preprocessing step to improve SVM learning. Second, an improved True Path Rule (TPR) ensemble approach is introduced to combine the outputs of the binary probabilistic SVM classifiers; it improves the classification results and guarantees that the predicted classes satisfy the hierarchy constraint. Experimental results on four benchmark FunCat Yeast datasets show that the proposed method significantly outperforms the basic TPR method and the Flat ensemble method. © 2012 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
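The abstract names two building blocks: SMOTE over-sampling and a TPR ensemble. As a rough illustration only, the Python sketch below shows the standard SMOTE interpolation step and a simplified TPR-style combination of per-class probabilities over a tree-shaped class hierarchy such as FunCat. It is not the authors' hierarchical SMOTE or improved TPR. The function names smote and tpr_combine, the neighbour count k, the threshold t = 0.5, and the min-capping top-down step are assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Basic SMOTE (NOT the paper's hierarchical variant): each synthetic
    sample lies on the segment between a random minority sample and one of
    its k nearest minority neighbours."""
    data = [list(x) for x in minority]
    if len(data) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(data) - 1)
    rng = random.Random(seed)

    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(data))
        # k nearest minority neighbours of sample i (excluding itself)
        neighbours = sorted((j for j in range(len(data)) if j != i),
                            key=lambda j: sq_dist(data[i], data[j]))[:k]
        nb = data[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(data[i], nb)])
    return synthetic


def tpr_combine(scores: Dict[str, float], children: Dict[str, List[str]],
                roots: List[str], t: float = 0.5) -> Dict[str, float]:
    """Simplified TPR-style ensemble on a tree of classes (assumed, not the
    paper's improved TPR). Bottom-up: average a node's own probability with
    the combined scores of children predicted positive (score > t).
    Top-down: cap every node at its parent's score, so child <= parent."""
    combined: Dict[str, float] = {}

    def bottom_up(node: str) -> float:
        kid_scores = [bottom_up(c) for c in children.get(node, [])]
        positive = [s for s in kid_scores if s > t]
        combined[node] = (scores[node] + sum(positive)) / (1 + len(positive))
        return combined[node]

    def top_down(node: str, parent_score: float) -> None:
        combined[node] = min(combined[node], parent_score)
        for c in children.get(node, []):
            top_down(c, combined[node])

    for r in roots:
        bottom_up(r)
        top_down(r, 1.0)
    return combined


if __name__ == "__main__":
    pts = smote([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]], n_new=4, k=2)
    print(len(pts), pts[0])
    tree = {"01": ["01.01", "01.02"]}  # hypothetical FunCat-like class IDs
    probs = {"01": 0.4, "01.01": 0.9, "01.02": 0.2}
    print(tpr_combine(probs, tree, roots=["01"]))
```

For example, with a parent scored 0.4 and children scored 0.9 and 0.2, the bottom-up step raises the parent to about 0.65 (the mean of its own score and its one positive child), and the top-down step then caps the 0.9 child at 0.65, so no child outscores its parent. The capping step is one simple way to enforce the hierarchy constraint; the paper's improved TPR may differ.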
