Research Article
Prediction of the bonding states of cysteines Using the support vector machines based on multiple feature vectors and cysteine state sequences
Article first published online: 16 APR 2004
DOI: 10.1002/prot.20079
Copyright © 2004 Wiley-Liss, Inc.
Proteins: Structure, Function, and Bioinformatics
Volume 55, Issue 4, pages 1036–1042, 1 June 2004
How to Cite
Chen, Y.-C., Lin, Y.-S., Lin, C.-J. and Hwang, J.-K. (2004), Prediction of the bonding states of cysteines Using the support vector machines based on multiple feature vectors and cysteine state sequences. Proteins, 55: 1036–1042. doi: 10.1002/prot.20079
Publication History
- Issue published online: 6 MAY 2004
- Article first published online: 16 APR 2004
- Manuscript Accepted: 4 DEC 2003
- Manuscript Received: 22 SEP 2003
Funded by
- National Science Council in Taiwan, Republic of China
Keywords:
- support vector machines;
- disulfide bonds;
- cysteine state sequences;
- multiple feature vectors
Abstract
The support vector machine (SVM) method is used to predict the bonding states of cysteines. In addition to local descriptors such as local sequences, we include global information about the proteins, such as their amino acid compositions and the patterns of cysteine states (bonded or nonbonded), or cysteine state sequences. We found that SVMs based on either local sequences or global amino acid compositions yielded similar prediction accuracies for the data set comprising 4136 cysteine-containing segments extracted from 969 nonhomologous proteins. However, the SVM method based on multiple feature vectors (combining local sequences and global amino acid compositions) significantly improved the prediction accuracy, from 80% to 86%. When coupled with cysteine state sequences, SVM based on multiple feature vectors yields 90% overall prediction accuracy and a Matthews correlation coefficient of 0.77, around 10% and 22% higher, respectively, than the corresponding values obtained by SVM based on local sequence information. Proteins 2004;55:1036–1042. © 2004 Wiley-Liss, Inc.
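The idea of a "multiple feature vector" (a local sequence descriptor joined with a global composition descriptor for each cysteine) can be illustrated with a short sketch. The abstract does not give the window length, the residue encoding, or how the two descriptors are combined, so everything below is an assumption for illustration: a one-hot window of `2 * half_width + 1` residues centred on the cysteine (with an extra padding slot per position), simply concatenated with the 20-dimensional amino acid composition of the whole chain. The names `local_window_features`, `composition_features`, `multiple_feature_vector`, and `matthews_cc` are hypothetical. The resulting vectors would be passed to an SVM implementation (not shown), and predictions can then be scored with overall accuracy and the Matthews correlation coefficient, the two measures the abstract reports.

```python
import math

# The 20 standard amino acids. Slot 20 at each window position is reserved
# (an assumption of this sketch) for "outside the chain or nonstandard residue".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
SLOTS_PER_POSITION = len(AMINO_ACIDS) + 1


def local_window_features(seq, pos, half_width=5):
    """One-hot encode the residues in a window centred on the cysteine at `pos`.

    half_width=5 (an 11-residue window) is a hypothetical default, not a value
    taken from the paper.
    """
    if seq[pos] != "C":
        raise ValueError(f"residue {pos} is {seq[pos]!r}, not a cysteine")
    features = []
    for i in range(pos - half_width, pos + half_width + 1):
        block = [0.0] * SLOTS_PER_POSITION
        if 0 <= i < len(seq) and seq[i] in AA_INDEX:
            block[AA_INDEX[seq[i]]] = 1.0
        else:
            block[-1] = 1.0  # padding beyond a terminus, or a nonstandard residue
        features.extend(block)
    return features


def composition_features(seq):
    """Global descriptor: fraction of each standard residue in the whole chain."""
    counts = [0] * len(AMINO_ACIDS)
    total = 0
    for aa in seq:
        if aa in AA_INDEX:
            counts[AA_INDEX[aa]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def multiple_feature_vector(seq, pos, half_width=5):
    """Concatenate the local (window) and global (composition) descriptors."""
    return local_window_features(seq, pos, half_width) + composition_features(seq)


def accuracy(tp, tn, fp, fn):
    """Overall fraction of cysteines whose state is predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient, treating 'bonded' as the positive class."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    vec = multiple_feature_vector("MKCAVLCGA", 2, half_width=3)
    print(len(vec))  # 7 window positions * 21 slots + 20 composition values = 167
```

Concatenation is only the simplest way to combine the two descriptors; because the composition block is identical for every cysteine in the same protein, it mainly shifts each protein's cysteines as a group, while the window block distinguishes cysteines within a protein.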
