• protein–protein interactions;
  • protein binding sites;
  • data set of protein–protein interfaces;
  • structural comparison of binding sites;
  • alignment of interfaces, physicochemical properties


Protein–protein interfaces are regions between 2 polypeptide chains that are not covalently connected. Here, we have created a nonredundant interface data set generated from all 2-chain interfaces in the Protein Data Bank. This data set is unique, since it contains clusters of interfaces with similar shapes and spatial organization of chemical functional groups. The data set allows statistical investigation of similar interfaces, as well as the identification and analysis of the chemical forces that account for the protein–protein associations. Toward this goal, we have developed I2I-SiteEngine (Interface-to-Interface SiteEngine) [Data set available at; Web server:]. The algorithm recognizes similarities between protein–protein binding surfaces. I2I-SiteEngine is independent of the sequence or the fold of the proteins that comprise the interfaces. In addition to geometry, the method takes into account both the backbone and the side-chain physicochemical properties of the interacting atom groups. Its high efficiency makes it suitable for large-scale database searches and classifications. Below, we briefly describe the I2I-SiteEngine method. We focus on the classification process and the obtained nonredundant protein–protein interface data set. In particular, we analyze the biological significance of the clusters and present examples which illustrate that given constellations of chemical groups in protein–protein binding sites may be preferred, and are observed in proteins with different structures and different functions. We expect that these would yield further information regarding the forces stabilizing protein–protein interactions. Proteins 2005. 2005 Wiley-Liss, Inc.