Improving protein structural class prediction using novel combined sequence information and predicted secondary structural features

Authors

  • Qi Dai,

    Corresponding author
    1. College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
    • These authors contributed equally to this work.

  • Li Wu,

    1. Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
    • These authors contributed equally to this work.

  • Lihua Li

    Corresponding author
    1. Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China

Abstract

Predicting protein structural class from sequence alone is a challenging problem in bioinformatics. Numerous efficient methods have been proposed, but challenges remain. We proposed a novel scheme that improves structural class prediction by coupling novel combined sequence information with predicted secondary structural features (PSSF). Given an amino acid sequence, we first transformed it into a reduced amino acid sequence and calculated its word frequencies and word position features, which together form the combined sequence information. We then added the PSSF to the combined sequence information to predict protein structural classes. The proposed method was tested on four low-homology benchmark datasets and achieved overall prediction accuracies of 83.1%, 87.0%, 94.5%, and 85.2%, respectively. Comparison with existing methods shows overall improvements ranging from 2.3% to 27.5%, which indicates that the proposed method is more efficient, especially for low-homology amino acid sequences. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011
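
The sequence-side feature extraction summarized in the abstract (reduce the alphabet, then compute word frequencies and word position features) can be illustrated with a minimal Python sketch. The abstract does not specify the reduced alphabet, the word length, or how the position feature is defined, so the three-group alphabet, the default `k=2`, and the normalized mean start position used here are all assumptions made for illustration, not the authors' definitions. The names `reduce_sequence` and `word_features` are hypothetical, and the PSSF step is not covered.

```python
from collections import Counter
from itertools import product

# ASSUMPTION: a hypothetical three-group reduced alphabet; the abstract
# does not say which reduction the authors used.
REDUCED_GROUPS = {
    "H": "AVLIMFWC",  # hydrophobic
    "P": "STNQYGP",   # polar or small
    "C": "DEKRH",     # charged
}
AA_TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each standard residue to its group letter; skip unknown symbols."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def word_features(reduced, k=2):
    """Return frequency and position features for every possible k-word.

    ASSUMPTION: the position feature is the mean 1-based start position of
    a word divided by the number of sliding windows; this definition is
    illustrative only.
    """
    alphabet = sorted(REDUCED_GROUPS)
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    n_windows = len(reduced) - k + 1

    counts = Counter()
    position_sums = Counter()
    for i in range(max(n_windows, 0)):
        word = reduced[i:i + k]
        counts[word] += 1
        position_sums[word] += i + 1  # 1-based start position

    features = {}
    for word in words:
        c = counts[word]
        features["freq_" + word] = c / n_windows if n_windows > 0 else 0.0
        features["pos_" + word] = position_sums[word] / (c * n_windows) if c else 0.0
    return features


if __name__ == "__main__":
    reduced = reduce_sequence("MKVLAAGDES")
    print(reduced)
    print(word_features(reduced, k=2))
```

In a full pipeline, a feature vector of this kind would be concatenated with the predicted secondary structural features before being passed to a classifier.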
