Protein structure prediction begins well but ends badly



The accurate prediction of protein structure, both secondary and tertiary, is an ongoing problem. Over the years, many approaches have been implemented and assessed. Most prediction algorithms start with the entire amino acid sequence and treat all residues identically, independent of sequence position. Here, we analyze blind prediction data to investigate whether predictive capability varies along the chain. Free modeling results from recent Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments are evaluated, as are the most up-to-date data from EVA, a fully automated blind test of secondary structure prediction servers. The results demonstrate that structure prediction accuracy depends on sequence position. Both secondary structure and tertiary structure predictions are more accurate in regions near the amino (N)-terminus than in analogous regions near the carboxy (C)-terminus. Eight of 10 secondary structure prediction algorithms assessed by EVA perform significantly better in regions at the N-terminus. CASP data show a similar bias, with N-terminal fragments being predicted more accurately than fragments from the C-terminus. Two analogous fragments are taken from each model: the N-terminal fragment begins at the start of the most N-terminal secondary structure element (SSE), whereas the C-terminal fragment finishes at the end of the most C-terminal SSE. Each fragment is locally superimposed onto its respective native fragment. The relative terminal prediction accuracy, measured by root-mean-square deviation (RMSD), is calculated on an intramodel basis. At a fragment length of 20 residues, the N-terminal fragment is predicted with greater accuracy in 79% of cases. Proteins 2010. © 2009 Wiley-Liss, Inc.
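
The intramodel fragment comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each structure is given as an array of C-alpha coordinates, that the SSE boundaries are already known as residue indices, and that local superposition is done with the Kabsch algorithm (a standard choice that the abstract does not name). The function names `kabsch_rmsd`, `terminal_fragments`, and `n_terminus_better` are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n, 3) coordinate sets after optimal superposition.

    Assumption: Kabsch superposition stands in for the paper's
    'local superposition'; the abstract does not specify the method.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both fragments on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ P.T).T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def terminal_fragments(coords, first_sse_start, last_sse_end, length=20):
    """Cut the N- and C-terminal fragments of a given length.

    first_sse_start: index of the first residue of the most N-terminal SSE.
    last_sse_end: index of the last residue of the most C-terminal SSE.
    """
    coords = np.asarray(coords, dtype=float)
    c_start = last_sse_end - length + 1
    if c_start < 0 or first_sse_start + length > len(coords):
        raise ValueError("fragment length does not fit inside the chain")
    n_frag = coords[first_sse_start:first_sse_start + length]
    c_frag = coords[c_start:last_sse_end + 1]
    return n_frag, c_frag


def n_terminus_better(model, native, first_sse_start, last_sse_end, length=20):
    """True if the model's N-terminal fragment has the lower local RMSD."""
    m_n, m_c = terminal_fragments(model, first_sse_start, last_sse_end, length)
    t_n, t_c = terminal_fragments(native, first_sse_start, last_sse_end, length)
    return kabsch_rmsd(m_n, t_n) < kabsch_rmsd(m_c, t_c)
```

Applying `n_terminus_better` to every model in a set and averaging the results gives the fraction of cases in which the N-terminal fragment is predicted more accurately, which is the kind of quantity the 79% figure reports.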