• binding affinity;
• discrimination;
• feature selection;
• machine learning techniques;
• protein–protein interactions


Protein–protein interactions are intrinsic to virtually every cellular process. Predicting the binding affinity of protein–protein complexes is one of the challenging problems in computational and molecular biology. In this work, we related sequence features of protein–protein complexes to their binding affinities using machine learning approaches. We set up a database of 185 protein–protein complexes for which the interacting pairs are heterodimers and experimental binding affinities are available. We developed a set of 610 features from the sequences of the protein complexes and selected specific features with the Ranker search method, which combines an attribute evaluator with the Ranker method. We analyzed several machine learning algorithms for discriminating protein–protein complexes into high- and low-affinity groups based on their Kd values. Our results showed a 10-fold cross-validation accuracy of 76.1% with a combination of nine features using support vector machines. Further, we observed an accuracy of 83.3% on an independent test set of 30 complexes. We suggest that our method would serve as an effective tool for identifying the interacting partners in protein–protein interaction networks and human–pathogen interactions based on the strength of interactions. Proteins 2014. © 2014 Wiley Periodicals, Inc.
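
As a rough illustration of two steps mentioned in the abstract, the sketch below labels complexes as high or low affinity from their Kd values and deals 185 samples into 10 cross-validation folds. It is a minimal sketch under stated assumptions, not the authors' implementation: the cutoff of 1e-8 M, the temperature of 298.15 K, and the function names are all hypothetical, since the abstract does not state the threshold or the fold assignment procedure. The conversion to binding free energy uses the standard relation ΔG = RT ln Kd. The feature ranking and the support vector machine are not reproduced here.

```python
import math
import random

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def kd_to_delta_g(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in molar units.

    Uses delta_G = R * T * ln(Kd); the default temperature is an assumption.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)


def label_affinity(kd_molar, cutoff_molar=1e-8):
    """Return 'high' when Kd is below the cutoff, else 'low'.

    The cutoff is hypothetical; the abstract does not give the value used.
    """
    return "high" if kd_molar < cutoff_molar else "low"


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# 185 complexes split into 10 folds, matching the dataset size in the abstract.
folds = k_fold_indices(185, k=10)
```

In each of the 10 rounds, one fold would be held out for testing and the other nine used for training, so every complex is tested exactly once.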