### Abstract

Quantitative prediction of protein–protein binding affinity is essential for understanding protein–protein interactions. In this article, an atomic-level potential of mean force (PMF) that incorporates a volume correction is presented for the prediction of protein–protein binding affinity. The potential is obtained by statistically analyzing X-ray structures of protein–protein complexes in the Protein Data Bank. This approach circumvents the complicated steps of the volume correction process and is very easy to implement in practice. It yields more reasonable pair potentials than the traditional PMF and reproduces the classic picture of nonbonded atom-pair interactions, resembling the Lennard-Jones potential. To evaluate the ability to predict protein–protein binding affinity, six test sets were examined. Sets 1–5 were each used as the test set in one of five published studies, and set 6 is the union of sets 1–5, with a total of 86 protein–protein complexes. The correlation coefficient (*R*) and standard deviation (SD) obtained by fitting the predicted affinities to experimental data were calculated to compare the performance of our approach with that reported in the literature. Our predictions on sets 1–5 were as good as the best predictions reported in the published studies, and for the union set 6, *R* = 0.76 and SD = 2.24 kcal/mol. Furthermore, we found that the volume correction significantly improves the predictive ability. This approach can also advance research on docking and protein structure prediction.

### Introduction

Protein–protein interactions participate in an extremely wide range of life processes, including the cellular metabolism of matter and energy and signal transduction. Understanding protein–protein interactions is therefore a very important issue in biology. However, satisfactory solutions to many problems in this field have not yet been obtained, including the prediction of protein–protein affinity and of the structures of protein–protein complexes. All of these problems require a precise energy function. Many efforts have been made to develop such functions, but the achieved accuracy still needs to be improved in practice.1–3 In this article, we focus on structure-derived statistical potentials for predicting protein–protein affinity.

Structure-derived statistical potentials have been widely applied not only in protein structure prediction and design but also in studies of protein complexes, such as protein–ligand affinity prediction (where the ligand can be a protein, peptide, DNA, RNA, or other molecule), mutation-induced changes in protein stability, and rational drug design.4–13 In these approaches, the potential is extracted by statistically analyzing known three-dimensional structures of biomolecules. They are therefore also termed knowledge-based potentials. One kind, the potential of mean force (PMF), is derived from the statistical mechanics of simple liquids14–16 and converts the distribution of particle-pair distances into a distance-dependent potential function. PMF has been frequently used in affinity prediction and structure scoring because its physical meaning and function curve are similar to those of the “true” energy potential, which in principle can be derived from a fundamental analysis of the forces between particles,10, 17 such as quantum chemical calculations. For this reason, PMF is also called an energy-like potential or quantity.
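For orientation, the standard relation from simple-liquid theory that performs this conversion can be written as below. The notation is an assumption made here and is not taken from the article: *g(r)* denotes the pair distribution function (the observed pair density at distance *r* relative to the expected density), *k*_B is the Boltzmann constant, and *T* is the absolute temperature.

$$
u(r) = -k_{B} T \,\ln g(r)
$$

Distances that occur more often than expected (*g* > 1) therefore map to favorable (negative) energies, and distances that occur less often map to unfavorable ones.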

Volume correction must be considered when PMF is applied to protein systems. It is one of the key factors that can improve both the precision of prediction and the reasonableness of the potential function. Since PMF was introduced into the study of protein systems, the understanding and application of volume correction (or frequency correction) have undergone a series of developments.

Sippl18 observed the frequency of alpha-carbon (Cα) distances between residue pairs and normalized it by the average frequency over all residue pairs. The normalized frequency was then transformed directly into a potential, without any frequency correction. This traditional PMF approach was the mainstream method in early research.19, 20

Subsequently, some approaches for calculating PMF were based on the radial distribution function (RDF) from the statistical mechanics of simple liquids.14–16 In those approaches, the frequency was normalized by dividing the occurrence numbers by a whole-sphere volume, without any correction. However, the occupied volume in a more complex system, such as a protein, is not a whole sphere. Therefore, when the occurrence frequency of atom pairs is normalized, the whole-sphere volume is not a good indicator of the volume actually occupied. For example, Bahar and Jernigan21 took the RDF as the theoretical basis of PMF and normalized the occurrence numbers by the numbers in a whole-sphere volume (4π*r*^{2}d*r*). They further analyzed in detail how the occurrence numbers of residue pairs in protein systems vary with increasing distance and compared this trend with the occurrence numbers in a whole sphere (Fig. 2 in Ref. 21; the tangent in that figure corresponds to the distribution of numbers in a whole sphere). This figure suggests that the whole-sphere distribution of occurrence numbers could be corrected with a function to better approximate the distribution in protein systems. Mitchell *et al*.22 found that the whole-sphere factor (4π*r*^{2}d*r*) gives an average potential that is weakly repulsive over the entire distance range, with no attractive region at typical interaction distances. They attributed this abnormality to the occupied volume of atoms in protein complexes deviating significantly from *r*^{2} proportionality.

The imperfections in the aforementioned studies show that, in systems as complicated as proteins, the occupied volume is not proportional to that of a whole sphere. In simple liquid systems, by contrast, the normalized frequency of atom pairs (or residue pairs) works well15 when computed as *f(r)* = *N(r)*/volume(*r*), where the occupied volume is that of a whole spherical shell: volume(*r*) = 4π*r*^{2}d*r*.
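The whole-sphere normalization *f(r)* = *N(r)*/(4π*r*^{2}d*r*) can be illustrated with a minimal sketch. The function name, the bin width, the cutoff, and the choice of evaluating the shell volume at the bin center are all assumptions made for illustration; none of them comes from the studies cited above.

```python
import math

def shell_normalized_frequency(distances, r_max=10.0, dr=0.5):
    """Return (bin_centers, f) with f(r) = N(r) / (4*pi*r^2*dr).

    distances : iterable of pair distances (assumed here to be in angstroms)
    r_max, dr : hypothetical cutoff and bin width chosen for this sketch
    """
    n_bins = int(round(r_max / dr))
    counts = [0] * n_bins
    for d in distances:
        k = int(d // dr)              # index of the shell that contains d
        if 0 <= k < n_bins:           # ignore distances outside [0, r_max)
            counts[k] += 1
    centers = [(k + 0.5) * dr for k in range(n_bins)]
    # Whole-sphere shell volume, evaluated at the bin centre.
    f = [n / (4.0 * math.pi * r * r * dr) for n, r in zip(counts, centers)]
    return centers, f

# Usage: three pair distances, two of which share the 1.0-1.5 shell.
centers, f = shell_normalized_frequency([1.2, 1.3, 3.7])
```

Because the divisor grows as *r*^{2}, the same raw count contributes far less to *f(r)* at long range than at short range, which is exactly the assumption that breaks down when the surrounding space is only partly accessible.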

Since then, volume correction has developed along two lines to obtain the real occupied volume in protein systems: one is based on correction functions and the other on structural statistics. The first corrects the occupied volume with a chosen function to obtain a better approximation than the whole-sphere volume 4π*r*^{2}d*r*. Zhou and Zhou23 established the DFIRE approach, which corrects the volume with *r*^{α}. The exponent α is a constant whose empirical value was first found to be 1.57 (Ref. 23) and was subsequently refined to 1.61 (Ref. 24). DFIRE was later applied to the affinity prediction of protein complexes.25 In this article, we tested our approach on the test set from DFIRE. Shen and Sali26 went a step further and analytically derived a statistical potential, termed DOPE, for decoy discrimination of single protein structures. DOPE corrects the volume with a factor of *r*^{α(r)}, where the effective exponent α(*r*) is a function of the interparticle distance *r*, which makes the correction more flexible.
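The effect of replacing *r*^{2} with *r*^{α} can be sketched as follows. This is an illustrative simplification in the spirit of a constant-exponent correction, not a reproduction of the published DFIRE implementation: the function name, the use of the last bin as the reference, the equal bin widths, and the value of `kT` are assumptions made here.

```python
import math

def r_alpha_potential(counts, centers, alpha=1.61, kT=0.593):
    """Energy-like score per bin from counts corrected by r**alpha.

    counts[k]  : observed pair count in distance bin k (equal bin widths)
    centers[k] : centre of bin k; the last bin serves as the reference
    alpha      : volume exponent (2.0 recovers the whole-sphere shell)
    kT         : energy unit in kcal/mol (about 0.593 at 298 K), assumed here
    """
    r_ref, n_ref = centers[-1], counts[-1]
    potential = []
    for n, r in zip(counts, centers):
        expected = n_ref * (r / r_ref) ** alpha   # reference-state count
        if n == 0 or expected == 0:
            potential.append(None)                # undefined: nothing observed
        else:
            potential.append(-kT * math.log(n / expected))
    return potential

# Usage: score the same counts under both exponents for comparison.
centers = [1.0, 2.0, 4.0]
counts = [100.0 * r ** 1.61 for r in centers]
u_alpha = r_alpha_potential(counts, centers, alpha=1.61)
u_sphere = r_alpha_potential(counts, centers, alpha=2.0)
```

Setting `alpha=2.0` recovers the whole-sphere shell, so the two normalizations can be compared on the same counts. A distance-dependent exponent α(*r*) could be passed per bin instead of a constant, although that variant is not shown here.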

It should be noted that the approaches above correct the volume with a single factor applied uniformly to all atom types. In other words, they use the same correction factor for distinct atoms. In fact, however, the occupied volume differs from one atom type to another. A distinct volume correction should therefore be used for each atom type.

This problem is solved naturally by the second type of volume correction, which acquires the correction factors directly from structural statistics. Unlike the first type, this approach does not depend on a particular functional form to correct the occupied volume. Moreover, it can distinguish different atom types surrounded by distinct environments by generating a unique volume correction for each atom type, so a more accurate correction can be acquired. Muegge and Martin27 corrected the volume based on structural statistics. In their approach, each atom type is treated with a different volume correction. Their approach performed well in the prediction of protein–ligand binding affinity. However, its implementation is very complicated in practice, which has hindered its adoption.

The approach presented in this article belongs to the second type, but it circumvents the complicated steps of the volume correction process. The volume correction is achieved using a novel and very simple frequency correction. More reasonable potentials were obtained, and the prediction of protein–protein binding affinity on six test sets derived from five published studies also demonstrated the good performance of our approach.