Combining prediction of secondary structure and solvent accessibility in proteins



Owing to the use of evolutionary information and advanced machine learning protocols, secondary structures of amino acid residues in proteins can be predicted from the primary sequence with more than 75% per-residue accuracy for the 3-state (i.e., helix, β-strand, and coil) classification problem. In this work we investigate whether further progress may be achieved by incorporating the relative solvent accessibility (RSA) of an amino acid residue as a fingerprint of the overall topology of the protein. Toward that goal, we developed a novel method for secondary structure prediction that uses predicted RSA in addition to attributes derived from evolutionary profiles. Our general approach follows the 2-stage protocol of Rost and Sander, with a number of Elman-type recurrent neural networks (NNs) combined into a consensus predictor. The RSA is predicted using our recently developed regression-based method that provides real-valued RSA, with the overall correlation coefficients between the actual and predicted RSA of about 0.66 in rigorous tests on independent control sets. Using the predicted RSA, we were able to improve the performance of our secondary structure prediction by up to 1.4% and achieved the overall per-residue accuracy between 77.0% and 78.4% for the 3-state classification problem on different control sets comprising, together, 603 proteins without homology to proteins included in the training. The effects of including solvent accessibility depend on the quality of RSA prediction. In the limit of perfect prediction (i.e., when using the actual RSA values derived from known protein structures), the accuracy of secondary structure prediction increases by up to 4%. We also observed that projecting real-valued RSA into 2 discrete classes with the commonly used threshold of 25% RSA decreases the classification accuracy for secondary structure prediction. While the level of improvement of secondary structure prediction may be different for prediction protocols that implicitly account for RSA in other ways, we conclude that an increase in the 3-state classification accuracy may be achieved when combining RSA with a state-of-the-art protocol utilizing evolutionary profiles. The new method is available through a Web server at Proteins 2005. © 2005 Wiley-Liss, Inc.