Prediction-based fingerprints of protein–protein interactions

Authors

  • Aleksey Porollo

    1. Division of Biomedical Informatics, Children's Hospital Research Foundation, Cincinnati, Ohio 45229
  • Jarosław Meller

    Corresponding author
    1. Division of Biomedical Informatics, Children's Hospital Research Foundation, Cincinnati, Ohio 45229
    2. Department of Informatics, Nicholas Copernicus University, 87-100 Toruń, Poland

Abstract

The recognition of protein interaction sites is an important intermediate step toward identification of functionally relevant residues and understanding protein function, facilitating experimental efforts in that regard. Toward that goal, the authors propose a novel representation for the recognition of protein–protein interaction sites that integrates enhanced relative solvent accessibility (RSA) predictions with high-resolution structural data. The observation that RSA predictions are biased toward the level of surface exposure consistent with protein complexes led the authors to investigate the difference between the predicted and actual (i.e., observed in an unbound structure) RSA of an amino acid residue as a fingerprint of interaction sites. The authors demonstrate that RSA prediction-based fingerprints of protein interactions significantly improve the discrimination between interacting and noninteracting sites, compared with evolutionary conservation, physicochemical characteristics, structure-derived and other features considered before. On the basis of these observations, the authors developed a new method for the prediction of protein–protein interaction sites, using machine learning approaches to combine the most informative features into the final predictor. For training and validation, the authors used several large sets of protein complexes and derived from them nonredundant representative chains, with interaction sites mapped from multiple complexes. Alternative machine learning techniques are used, including Support Vector Machines and Neural Networks, so as to evaluate the relative effects of the choice of a representation and a specific learning algorithm. The effects of induced fit and uncertainty of the negative (noninteracting) class assignment are also evaluated. Several representative methods from the literature are reimplemented to enable direct comparison of the results.
Using rigorous validation protocols, the authors estimated that the new method yields an overall classification accuracy of about 74% and a Matthews correlation coefficient of 0.42, as opposed to up to 70% classification accuracy and a Matthews correlation coefficient of up to 0.3 for methods that do not utilize RSA prediction-based fingerprints. The new method is available at http://sppider.cchmc.org. Proteins 2007. © 2006 Wiley-Liss, Inc.
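
The two quantities the abstract relies on, the per-residue RSA difference and the Matthews correlation coefficient, can be illustrated with a minimal sketch. This is not the authors' implementation: the sign convention (predicted minus observed), the function names, the toy RSA values, and the fixed cutoff are all assumptions made for illustration. The published method combines several features with machine learning and does not threshold a single value.

```python
from math import sqrt


def rsa_fingerprint(predicted_rsa, observed_rsa):
    """Per-residue difference between predicted and observed RSA.

    Assumed convention: predicted minus observed, both scaled to [0, 1].
    A strongly negative value means the residue is predicted to be more
    buried than it appears in the unbound structure.
    """
    if len(predicted_rsa) != len(observed_rsa):
        raise ValueError("RSA sequences must have the same length")
    return [p - o for p, o in zip(predicted_rsa, observed_rsa)]


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for two equal-length boolean sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy data (hypothetical values, not taken from the paper).
predicted = [0.10, 0.50, 0.20, 0.80]
observed = [0.40, 0.45, 0.60, 0.75]
is_interface = [True, False, False, False]

fingerprint = rsa_fingerprint(predicted, observed)

# Hypothetical cutoff standing in for the trained classifier.
CUTOFF = -0.2
flagged = [d < CUTOFF for d in fingerprint]

counts = confusion_counts(is_interface, flagged)
acc = accuracy(*counts)
mcc = matthews_cc(*counts)
print(f"accuracy={acc:.2f} MCC={mcc:.2f}")
```

Reporting the Matthews correlation coefficient alongside accuracy matters here because interacting residues are usually a minority class, so accuracy alone can look favorable even for a predictor that rarely flags an interface.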
