
Protein Remote Homology Detection by Combining Chou’s Pseudo Amino Acid Composition and Profile-Based Protein Representation

Authors

  • Bin Liu,

    Corresponding author
    1. School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, Guangdong, P. R. China phone: +86-755-26033283
    2. Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, Guangdong, P. R. China
    3. Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, P. R. China
  • Xiaolong Wang,

    1. School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, Guangdong, P. R. China phone: +86-755-26033283
    2. Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, Guangdong, P. R. China
  • Quan Zou,

    1. School of Information Science and Technology, Xiamen University, Xiamen, Fujian, P. R. China
  • Qiwen Dong,

    1. School of Computer Science, Fudan University, Shanghai, P. R. China
  • Qingcai Chen

    1. School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, Guangdong, P. R. China phone: +86-755-26033283
    2. Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, Guangdong, P. R. China

Abstract

Protein remote homology detection is a key problem in bioinformatics. Currently, discriminative methods such as the Support Vector Machine (SVM) achieve the best performance. The most effective way to improve SVM-based methods is to find a general protein representation that converts proteins of different lengths into fixed-length vectors while capturing the properties of the proteins that are useful for discrimination. The main obstacle in designing such a representation is that native proteins vary in length. Motivated by the success of the pseudo amino acid composition (PseAAC) proposed by Chou, we applied this approach to protein remote homology detection. Additional indices derived from the amino acid index (AAIndex) database were incorporated into the PseAAC to improve its generalization ability. Finally, performance was further improved by combining the modified PseAAC with a profile-based protein representation that contains evolutionary information extracted from frequency profiles. Experiments on a well-known benchmark show that this method achieves performance superior or comparable to that of current state-of-the-art methods.
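The abstract does not spell out how PseAAC turns a variable-length sequence into a fixed-length vector. The sketch below follows Chou's original (type 1) formulation as it is commonly described: 20 normalized amino acid frequencies plus λ sequence-order correlation factors θ_1..θ_λ, where θ_k averages a correlation function over residue pairs k positions apart, and the two parts are balanced by a weight w. It is a minimal illustration under stated assumptions, not the authors' implementation: the correlation function is generalized to an arbitrary list of physicochemical indices (the point where AAIndex-derived indices would plug in), λ = 3 and w = 0.05 are assumed defaults, the function names `standardize` and `pseaac` are hypothetical, and `TOY_INDEX` is a made-up placeholder, not a real AAIndex entry, because the abstract does not list which indices were used.

```python
import statistics
from typing import Dict, List, Sequence

# The 20 standard amino acids, in a fixed order that defines the first 20 features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(index: Dict[str, float]) -> Dict[str, float]:
    """Rescale one physicochemical index to zero mean and unit variance over the 20 amino acids."""
    values = [index[a] for a in AMINO_ACIDS]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("index is constant over the 20 amino acids")
    return {a: (index[a] - mean) / std for a in AMINO_ACIDS}


def pseaac(seq: str, indices: Sequence[Dict[str, float]],
           lam: int = 3, w: float = 0.05) -> List[float]:
    """Type-1 (Chou-style) pseudo amino acid composition with 20 + lam features (sketch)."""
    seq = seq.upper()
    n = len(seq)
    if not indices:
        raise ValueError("at least one index is required")
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    if any(res not in AMINO_ACIDS for res in seq):
        raise ValueError("sequence contains non-standard residues")
    std_indices = [standardize(idx) for idx in indices]

    # Normalized occurrence frequency of each standard amino acid (sums to 1).
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]

    def correlation(a: str, b: str) -> float:
        # Mean squared difference between residues a and b over all standardized indices.
        return sum((idx[b] - idx[a]) ** 2 for idx in std_indices) / len(std_indices)

    # theta_k averages the correlation of residue pairs that are k positions apart.
    thetas = [
        sum(correlation(seq[i], seq[i + k]) for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]

    # Weight the sequence-order terms by w and normalize so all features sum to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Hypothetical placeholder index (NOT a real AAIndex entry): each residue's position in AMINO_ACIDS.
TOY_INDEX = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}

# Sequences of different lengths map to vectors of the same length (20 + lam).
short_vec = pseaac("ACDEFGHIKLMNPQRSTVWY", [TOY_INDEX])
long_vec = pseaac("ACDEFGHIKLMNPQRSTVWY" * 2 + "MKV", [TOY_INDEX])
print(len(short_vec), len(long_vec))  # prints "23 23"
```

The resulting vector always has 20 + λ entries regardless of sequence length, which is what makes it usable as SVM input. How the authors fused this vector with the profile-based representation is not described in the abstract, so the sketch stops at the PseAAC features.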
