Keywords:

  • α-helix;
  • secondary structure;
  • structure prediction;
  • N-terminus;
  • N-cap

Abstract

The prediction of protein secondary structure from amino acid sequence remains a key component of many approaches to the protein folding problem. The most abundant form of regular secondary structure in proteins is the α-helix, at whose N-terminal positions specific residue preferences exist. Propensities derived from the amino acid frequencies observed at these positions in the Protein Data Bank (PDB) correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method that exploits these data to improve protein secondary structure prediction by identifying the correct N-terminal sequences of α-helices; the method builds on existing, popular secondary structure prediction methods. In cross-validated testing, the algorithm increased the percentage of correctly predicted α-helix start positions from 30% to 38%, while the overall prediction accuracy (Q3) remained the same. Although the algorithm was developed and tested on secondary structure predictions based on multiple sequence alignments, it also improved the helix start locations predicted by methods that use single sequences. Furthermore, the residue frequencies at the N-terminal positions of the improved predictions better reflect those observed at the N-terminal positions of α-helices in proteins. This has implications for areas such as comparative modeling, where a more accurate prediction of the N-terminal regions of α-helices should benefit attempts to model adjacent loop regions. The algorithm is available as a Web tool at http://rocky.bms.umist.ac.uk/elephant. Proteins 2004. © 2004 Wiley-Liss, Inc.
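The abstract relies on three quantities: Q3, the percentage of correctly predicted α-helix start positions, and N-terminal propensities derived from observed residue frequencies. The Python sketch below shows one common way to compute each. It is not the authors' algorithm, and the function names are hypothetical. It assumes three-state secondary structure strings (H = helix, E = strand, C = coil), counts a predicted start as correct only when it falls on exactly the same residue as an observed start, and takes the N-cap to be the residue immediately preceding the first helical residue. All three of these conventions are assumptions rather than definitions given in the abstract.

```python
from collections import Counter


def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose three-state label (H, E or C) is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def helix_starts(ss: str) -> set[int]:
    """Indices of the first residue of each helical (H) segment."""
    return {i for i, s in enumerate(ss) if s == "H" and (i == 0 or ss[i - 1] != "H")}


def start_accuracy(observed: str, predicted: str) -> float:
    """Fraction of observed helix starts predicted at exactly the same residue (assumed criterion)."""
    obs = helix_starts(observed)
    if not obs:
        return 0.0
    return len(obs & helix_starts(predicted)) / len(obs)


def ncap_propensities(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Propensity = (frequency of an amino acid at the N-cap) / (its overall frequency).

    Assumption: the N-cap is the residue immediately before the first H of a helix,
    so helices starting at index 0 contribute no N-cap residue.
    """
    ncap_counts: Counter = Counter()
    all_counts: Counter = Counter()
    for seq, ss in pairs:
        all_counts.update(seq)
        for i in helix_starts(ss):
            if i > 0:
                ncap_counts[seq[i - 1]] += 1
    n_ncap = sum(ncap_counts.values())
    n_all = sum(all_counts.values())
    # Only residues actually seen at an N-cap are reported; each is also in all_counts.
    return {
        aa: (ncap_counts[aa] / n_ncap) / (all_counts[aa] / n_all)
        for aa in ncap_counts
    }


observed = "CCHHHHHCCCHHHHCC"
predicted = "CCCHHHHCCHHHHHCC"
print(q3(observed, predicted))              # 0.875
print(start_accuracy(observed, predicted))  # 0.0
```

On this toy pair, 14 of 16 residues are labelled correctly (Q3 = 0.875), yet neither observed helix start is predicted at the right residue. This shows how start-position accuracy can change while Q3 stays essentially the same, which is the situation the abstract describes.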