SEARCH

SEARCH BY CITATION

Keywords:

  • side chain conformation;
  • machine learning;
  • neural network;
  • Monte Carlo methods;
  • protein structure prediction

Abstract

Accurate protein side-chain conformation prediction is crucial for protein modeling and existing methods for the task are widely used; however, faster and more accurate methods are still required. Here we present a new machine learning approach to the problem where an energy function for each rotamer in a structure is computed additively over pairs of contacting atoms. A family of 156 neural networks indexed by amino acid and contacting atom types is used to compute these rotamer energies as a function of atomic contact distances. Although direct energy targets are not available for training, the neural networks can still be optimized by converting the energies to probabilities and optimizing these probabilities using Markov Chain Monte Carlo methods. The resulting predictor SIDEpro makes predictions by initially setting the rotamer probabilities for each residue from a backbone-dependent rotamer library, then iteratively updating these probabilities using the trained neural networks. After convergences of the probabilities, the side-chains are set to the highest probability rotamer. Finally, a post processing clash reduction step is applied to the models. SIDEpro represents a significant improvement in speed and a modest, but statistically significant, improvement in accuracy when compared with the state-of-the-art for rapid side-chain prediction method SCWRL4 on the following datasets: (1) 379 protein test set of SCWRL4; (2) 94 proteins from CASP9; (3) a set of seven large protein-only complexes; and (4) a ribosome with and without the RNA. Using the SCWRL4 test set, SIDEpro's accuracy (χ1 86.14%, χ1+2 74.15%) is slightly better than SCWRL4-FRM (χ1 85.43%, χ1+2 73.47%) and it is 7.0 times faster. On the same test set SIDEpro is clearly more accurate than SCWRL4-rigid rotamer model (RRM) (χ1 84.15%, χ1+2 71.24%) and 2.4 times faster. Evaluation on the additional test sets yield similar accuracy results with SIDEpro being slightly more accurate than SCWRL4-flexible rotamer model (FRM) and clearly more accurate than SCWRL4-RRM; however, the gap in CPU time is much more significant when the methods are applied to large protein complexes. SIDEpro is part of the SCRATCH suite of predictors and available from: http://scratch.proteomics.ics.uci.edu/. Proteins 2012; © 2011 Wiley Periodicals, Inc.