Research Article
Improved prediction for N-termini of α-helices using empirical information
Article first published online: 8 JUL 2004
DOI: 10.1002/prot.20218
Copyright © 2004 Wiley-Liss, Inc.
Issue
Proteins: Structure, Function, and Bioinformatics
Volume 57, Issue 2, pages 322–330, 1 November 2004
Additional Information
How to Cite
Wilson, C. L., Boardman, P. E., Doig, A. J. and Hubbard, S. J. (2004), Improved prediction for N-termini of α-helices using empirical information. Proteins: Structure, Function, and Bioinformatics, 57: 322–330. doi: 10.1002/prot.20218
Publication History
- Issue published online: 18 AUG 2004
- Article first published online: 8 JUL 2004
- Manuscript Accepted: 5 APR 2004
- Manuscript Received: 22 JAN 2004
Keywords:
- α-helix;
- secondary structure;
- structure prediction;
- N-terminus;
- N-cap
Abstract
The prediction of the secondary structure of proteins from their amino acid sequences remains a key component of many approaches to the protein folding problem. The most abundant form of regular secondary structure in proteins is the α-helix, in which specific residue preferences exist at the N-terminal locations. Propensities derived from these observed amino acid frequencies in the Protein Data Bank (PDB) database correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method to exploit this data to improve protein secondary structure prediction through identification of the correct N-terminal sequences in α-helices, based on existing popular methods for secondary structure prediction. With this algorithm, the number of correctly predicted α-helix start positions was improved from 30% to 38%, while the overall prediction accuracy (Q3) remained the same, using cross-validated testing. Although the algorithm was developed and tested on multiple sequence alignment–based secondary structure predictions, it was also able to improve the predictions of start locations by methods that use single sequences to make their predictions. Furthermore, the residue frequencies at N-terminal positions of the improved predictions better reflect those seen at the N-terminal positions of α-helices in proteins. This has implications for areas such as comparative modeling, where a more accurate prediction of the N-terminal regions of α-helices should benefit attempts to model adjacent loop regions. The algorithm is available as a Web tool, located at http://rocky.bms.umist.ac.uk/elephant. Proteins 2004. © 2004 Wiley-Liss, Inc.
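The idea of using position-specific propensities to relocate a predicted helix start can be illustrated with a minimal sketch. It is not the published algorithm: the abstract does not give the scoring scheme, so the log-propensity sum, the search window, the toy counts, and all function names below are assumptions made for illustration only.

```python
import math

def propensities(position_counts, background_counts):
    """Propensity of each residue at one helix position:
    (frequency at that position) / (frequency over all positions)."""
    pos_total = sum(position_counts.values())
    bg_total = sum(background_counts.values())
    return {
        aa: (position_counts.get(aa, 0) / pos_total) / (background_counts[aa] / bg_total)
        for aa in background_counts
    }

def score_start(sequence, start, position_tables):
    """Sum of log-propensities when `start` is taken as the N-cap index.
    position_tables[0] applies to the N-cap, position_tables[1] to N1, etc.
    (assumed scoring scheme, not taken from the paper)."""
    score = 0.0
    for offset, table in enumerate(position_tables):
        idx = start + offset
        if idx >= len(sequence):
            break
        # Residues missing from a table are treated as neutral (propensity 1).
        p = table.get(sequence[idx], 1.0)
        score += math.log(max(p, 1e-6))  # floor avoids log(0)
    return score

def refine_start(sequence, predicted_start, position_tables, window=2):
    """Return the best-scoring start within +/- `window` residues of the start
    proposed by an existing secondary structure predictor (hypothetical input)."""
    lo = max(0, predicted_start - window)
    hi = min(len(sequence) - 1, predicted_start + window)
    return max(range(lo, hi + 1), key=lambda s: score_start(sequence, s, position_tables))

# Toy counts, invented for this example (not data from the PDB or the paper).
background = {"A": 50, "S": 25, "D": 25}
ncap_counts = {"A": 10, "S": 50, "D": 40}
ncap_table = propensities(ncap_counts, background)
refined = refine_start("AASAAAA", predicted_start=1, position_tables=[ncap_table])
```

In this toy case serine is enriched at the N-cap relative to background, so the predicted start moves from index 1 to the serine at index 2. A realistic version would use one table per N-terminal position and would need a rule for when to leave the original prediction unchanged.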
