Get access

LabCaS: Labeling calpain substrate cleavage sites from amino acid sequence using conditional random fields

Authors

  • Yong-Xian Fan,

    1. Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
    Search for more papers by this author
  • Yang Zhang,

    Corresponding author
    1. Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109
    2. Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109
    • Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109
    Search for more papers by this author
  • Hong-Bin Shen

    Corresponding author
    1. Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
    2. Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109
    • Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
    Search for more papers by this author

Abstract

The calpain family of Ca2+-dependent cysteine proteases plays a vital role in many important biological processes which is closely related with a variety of pathological states. Activated calpains selectively cleave relevant substrates at specific cleavage sites, yielding multiple fragments that can have different functions from the intact substrate protein. Until now, our knowledge about the calpain functions and their substrate cleavage mechanisms are limited because the experimental determination and validation on calpain binding are usually laborious and expensive. In this work, we aim to develop a new computational approach (LabCaS) for accurate prediction of the calpain substrate cleavage sites from amino acid sequences. To overcome the imbalance of negative and positive samples in the machine-learning training which have been suffered by most of the former approaches when splitting sequences into short peptides, we designed a conditional random field algorithm that can label the potential cleavage sites directly from the entire sequences. By integrating the multiple amino acid features and those derived from sequences, LabCaS achieves an accurate recognition of the cleave sites for most calpain proteins. In a jackknife test on a set of 129 benchmark proteins, LabCaS generates an AUC score 0.862. The LabCaS program is freely available at: http://www.csbio.sjtu.edu.cn/bioinf/LabCaS. Proteins 2013. © 2012 Wiley Periodicals, Inc.

Ancillary