SOFTDOCK application to protein–protein interaction benchmark and CAPRI


  • Nan Li,

    1. Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing 100080, People's Republic of China
    2. Graduate School, Chinese Academy of Sciences, Beijing 100080, People's Republic of China
  • Zhonghua Sun,

    1. Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing 100080, People's Republic of China
    2. Graduate School, Chinese Academy of Sciences, Beijing 100080, People's Republic of China
  • Fan Jiang

    Corresponding author
    1. Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing 100080, People's Republic of China
    • Institute of Physics, Chinese Academy of Sciences, Zhong Guan Cun Nan San Jie No. 8, Beijing 100080, People's Republic of China


The success of molecular docking requires the cooperation of sampling and scoring of various conformations. The SOFTDOCK package uses a coarse-grained docking method to sample all possible conformations of complexes. SOFTDOCK uses a new Voronoi molecular surface and calculates several grid-based scores. Leave-one-out tests show that three geometry scores and an FTDOCK-like electrostatics score contribute the most to the discrimination of near-native conformations, whereas an atom-based solvation score is ineffective. We also find that increasing the Voronoi surface thickness greatly increases the accuracy of the docking results. Finally, the clustering procedure improves the overall ranking but leads to less accurate docking results. The application of SOFTDOCK in the Critical Assessment of PRedicted Interactions (CAPRI) involves four steps: (i) sampling with INTELEF; (ii) clustering; (iii) AMBER energy minimization; and (iv) manual inspection. Depending on the target, biological information from the literature is used as a filter in the sampling and manual inspection steps. Two of our submissions have L_rmsd around 10 Å. Although they are not classified as acceptable solutions, we consider them successful because their accuracy is comparable to the expected accuracy of our method. Availability: SOFTDOCK is open source code and can be downloaded at Proteins 2007. © 2007 Wiley-Liss, Inc.


Protein–protein interactions play an important role in various biochemical processes. With the development of genomics and proteomics, more and more protein interaction data are being obtained by experimental methods,1, 2 but their structural annotations are lacking. Complete structural annotation of these interaction data will take years with current structure determination methods.3 Computational methods that predict the structural details of protein interactions have therefore been designed and continuously improved in recent years.

Docking methods are computational methods that predict interactions between two proteins. In the last two decades, various docking methods have been designed.4–23 Jiang and Kim designed a soft docking method that matches surface cubes and surface normals.6, 7 A grid method was also used by Cherfils et al.8 Katchalski-Katzir et al.9 were the first to use FFT to sample the conformational space. Vakser10 used a coarse cube size with the FFT procedure to dock various protein complexes. Sternberg and coworkers11, 12 designed FTDOCK, which performs the search with FFT and uses a soft electrostatic function as a filter for docking unbound structures. Abagyan and coworkers13–18 used pseudo-Brownian search and biased probability Monte Carlo search for the rigid-body search and the side-chain conformation search, respectively. Weng and coworkers19–21 developed ZDOCK, which combines a geometry function counting surface atom pairs and the ACE solvation function with a novel implementation of the original FFT method. Baker and coworkers22 devised RosettaDock, which samples the conformational space with a two-stage search.

One of the most important tasks in docking is calculating scores that can discriminate true complexes from false ones. Of the various scoring functions that have been applied in different docking procedures, the most useful and efficient ones are based on the concept of geometric or shape complementarity; they are used, for example, in SOFTDOCK,6, 7 FTDOCK,11 and ZDOCK.19, 20 A more physical version, namely the Lennard–Jones potential, is used in ICM-DISCO13 and RosettaDock.22 Besides geometric scoring functions, other types of functions have been developed to represent other aspects of complementarity in complex formation, including electrostatics, solvation, and hydrogen bonding. Electrostatic energy is generally calculated with a modified Coulomb equation.11, 15, 22 For solvation energy, several methods15, 22, 24 based on various solvation models have been used. Most of them are knowledge-based statistical potentials using either contact surface areas or contact pairs of atoms with predefined atom types. Hydrogen bonding energy is estimated in some docking methods.15, 22

A community-wide assessment, Critical Assessment of PRedicted Interactions (CAPRI), has been set up to help developers improve their methods. Reviews of CAPRI rounds 1–5,25, 26 as well as descriptions of more docking methods participating in CAPRI,26 are available. We participated in CAPRI rounds 6–11 and predicted eight targets.27–33

In this article, we present an improved rigid-body docking procedure that uses a large cube size to sample the conformational space. We systematically test the method with the protein–protein interaction Benchmark V2.0.34 We show that our coarse-grained rigid-body docking procedure is able to find correct solutions and rank them within the top 2000 for 66 of the 83 complexes. We also present our results in CAPRI rounds 6–11.


AA, antigen–antibody; EI, enzyme-inhibitor; OT, others.


Generation of Voronoi molecular surface

Molecular surfaces of the ligand and the receptor are created by an algorithm that generates surface dots on the Voronoi surface.35 The algorithm works as follows. A molecule is first enclosed by its accessible surface, and the dots generated on the accessible surface are used as the probe centers for the Voronoi surface. The Voronoi surface is then defined between the probe centers and the atom centers of the molecule. Voronoi surface dots are generated according to the method shown in Figure 1. We take advantage of the fact that each Voronoi surface dot belongs to both a probe center and an atom center, namely the closest probe center and the closest atom center, respectively. Let Dp and Da be the distances from a dot to these two centers. Since the Voronoi surface is a bisecting plane, Dp and Da should be equal in the ideal case; the tolerated difference between them defines the thickness of the Voronoi surface. In our implementation, this surface thickness, represented by a layer of surface dots, is an adjustable parameter.

Figure 1.

Algorithm for generating Voronoi surface dots. This is a schematic diagram. See the main text for detailed descriptions.
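The dot-selection rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SOFTDOCK's code: the candidate points are assumed to be supplied from elsewhere (for example a fine grid around the molecule), and a dot is kept when |Dp − Da| does not exceed the thickness parameter. Along the line joining an atom center and a probe center, this tolerance yields a layer whose spatial width equals the thickness.

```python
import math


def nearest_distance(point, centers):
    """Distance from a 3D point to the closest of a list of 3D centers."""
    return min(math.dist(point, c) for c in centers)


def voronoi_surface_dots(candidates, probe_centers, atom_centers, thickness):
    """Keep candidate points that lie on the (thick) Voronoi surface.

    Dp is the distance to the nearest probe center and Da the distance to the
    nearest atom center. On the ideal bisecting surface Dp == Da; here a point
    is kept when |Dp - Da| <= thickness (tolerance rule assumed for this sketch).
    """
    dots = []
    for p in candidates:
        dp = nearest_distance(p, probe_centers)
        da = nearest_distance(p, atom_centers)
        if abs(dp - da) <= thickness:
            dots.append(p)
    return dots
```

For one atom at the origin and one probe at (4, 0, 0), a thickness of 1.0 keeps the candidate points with x between 1.5 and 2.5, that is, a 1 Å layer centered on the bisecting plane x = 2.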

Each surface dot has the following attributes: its Cartesian coordinates, a surface normal, and the area of its surface patch. Based on the atomic charges, a charge is assigned to each ligand surface dot and an electrostatic potential to each receptor surface dot.

Mapping the atoms and the surface dots into a cube space

The surface dots of both the ligand and the receptor are projected into a cube (grid) space. Each surface cube has the following attributes: three integer coordinates; a surface normal, obtained by averaging over all surface dots projected into the cube with weights given by the dot areas; the sum of the areas of all surface dots in the cube; the electrostatic potential (receptor cubes) or the electrostatic charge (ligand cubes); and an integer index for the atom type.
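The projection step can be sketched as below. This is a hypothetical illustration: each dot is a dict with keys `xyz`, `normal`, `area`, and `charge` (names chosen here), cube indices come from flooring coordinates divided by the cube size, and the area-weighted mean normal is rescaled to unit length, which is an assumption of the sketch. The atom-type index and the receptor potential are omitted; the potential could be accumulated in the same way as the charge.

```python
import math
from collections import defaultdict


def map_dots_to_cubes(dots, cube_size):
    """Project surface dots into cubes keyed by integer (i, j, k) coordinates.

    Each dot is a dict with 'xyz', 'normal', 'area' and 'charge'. Each cube gets
    the area-weighted average normal (rescaled to unit length), the summed area
    and the summed charge of its dots.
    """
    acc = defaultdict(lambda: {"nsum": [0.0, 0.0, 0.0], "area": 0.0, "charge": 0.0})
    for d in dots:
        key = tuple(math.floor(c / cube_size) for c in d["xyz"])
        cube = acc[key]
        for k in range(3):
            cube["nsum"][k] += d["area"] * d["normal"][k]  # area weighting
        cube["area"] += d["area"]
        cube["charge"] += d["charge"]

    cubes = {}
    for key, cube in acc.items():
        norm = math.sqrt(sum(v * v for v in cube["nsum"])) or 1.0
        cubes[key] = {
            "normal": tuple(v / norm for v in cube["nsum"]),
            "area": cube["area"],
            "charge": cube["charge"],
        }
    return cubes
```

With the 4.5 Å cube size used in this work, two dots near the origin with normals along x and y fall into cube (0, 0, 0), whose normal then points along the diagonal between them.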

Searching the conformational space

The search scheme of INTELEF samples the six-dimensional rotation and translation space exhaustively.6 INTELEF is an updated version of SOFTDOCK's coarse-grained global search program and can calculate any form of pair potential between atom pairs. Before INTELEF is run, a list of all rotation angles at 5° intervals is generated, containing 85,254 rotations (a new sampling method has been implemented to improve on the method in Ref. 6). In the rotational search, each rotation in the list is applied to the ligand. After the ligand has been rotated, a fast translational search finds the best translation vectors. The translational search samples all possible translation vectors between the ligand and receptor surface cubes: for each pair of ligand and receptor surface cubes, the intercube vector is taken as the translation vector, and all scoring functions (defined below) between this pair of cubes are calculated and assigned to that vector. The translation vectors of all cube pairs, together with their associated pairwise scores, are stored in a translation (vector) space. The best translation vectors are chosen according to the peak positions in the distribution of geometric scores (GEO, defined below) in the translation space. For each rotation, the top 100 peaks are output as candidate solutions, each together with the recorded values of all the other scoring functions.
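For one fixed rotation, the translational search can be sketched as follows. The sketch is a simplified assumption, not INTELEF itself: it accumulates a single score per integer translation vector (INTELEF records all scores), `pair_score` is a hypothetical stand-in for the GEO contribution of one cube pair, and a "peak" is taken to be a translation whose score is at least that of each of its 26 neighbors.

```python
import heapq
from collections import defaultdict


def translational_search(rec_cubes, lig_cubes, pair_score, top_n=100):
    """Return up to top_n (score, translation) peaks for one ligand rotation.

    rec_cubes, lig_cubes: dicts mapping integer cube coordinates to cube data.
    pair_score: callable(rec_cube, lig_cube) -> float (hypothetical GEO-like term).
    """
    # Every receptor/ligand cube pair votes for the intercube translation vector.
    space = defaultdict(float)
    for rkey, rc in rec_cubes.items():
        for lkey, lc in lig_cubes.items():
            t = (rkey[0] - lkey[0], rkey[1] - lkey[1], rkey[2] - lkey[2])
            space[t] += pair_score(rc, lc)

    # A peak is a local maximum over the 26 neighboring translation vectors.
    offsets = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               for dz in (-1, 0, 1) if (dx, dy, dz) != (0, 0, 0)]
    peaks = []
    for t, s in space.items():
        neighbours = (space.get((t[0] + dx, t[1] + dy, t[2] + dz), float("-inf"))
                      for dx, dy, dz in offsets)
        if all(s >= n for n in neighbours):
            peaks.append((s, t))
    return heapq.nlargest(top_n, peaks)
```

Repeating this for each of the precomputed rotations and keeping the top 100 peaks per rotation gives the candidate list that the filtering step then processes.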

Definitions of scoring functions

Seven scoring functions are utilized to describe different aspects of complementarity of a candidate solution. They are geometry complementarity score (GEO), surface normal correlation score (CORR), interface cube pair score (PAIR), atomic contact solvation score (ACS), electrostatic score (ELEC), surface difference score (DIFF), and volume overlap score (OVERLAP). All scores are calculated over the interface cubes, which are defined as the superimposed surface cubes between the ligand and the receptor after a rigid-body transformation is applied to the ligand.

GEO reflects the degree of surface shape complementarity in the predicted interface. GEO is calculated as follows:

[Equation: GEO score]

where P (probe) and T (target) denote the ligand and the receptor, respectively. The tuple (i_k, j_k) denotes an interface cube defined by a pair of receptor surface cube i_k and ligand surface cube j_k, where k is the serial number of the interface cube. The formula involves the receptor surface normal of cube i_k, the ligand surface normal of cube j_k, and the surface areas of the receptor and ligand cubes. N is the total number of interface cubes, and θ is an input parameter defining a cone angle. Each interface cube contains a pair of surface normals, one from the ligand and one from the receptor; the acute angle formed by the two normals is defined as the surface matching angle, which varies considerably from one interface cube to another. The input cone angle θ determines how much each interface cube contributes to the total geometry score. The weighting function is Gaussian, and the cone angle used in this work is 10°. This makes the matching of surface cubes both soft and hard: soft because the cube size is large, and hard because the cone angle is small.7 This weighting function sharpens the peaks in the distribution of scores in the translation space.
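Since the GEO formula itself is not reproduced here, the sketch below shows only the ingredients the text describes: the acute matching angle between the two normals of an interface cube and a Gaussian cone weight with θ = 10°. The Gaussian form exp(−(angle/θ)²) and the aggregation (each cube contributing its smaller area times its weight) are hypothetical choices of this sketch, not the paper's equation.

```python
import math


def matching_angle(n_rec, n_lig):
    """Acute angle (degrees) between the lines of two unit surface normals."""
    dot = sum(a * b for a, b in zip(n_rec, n_lig))
    dot = max(-1.0, min(1.0, abs(dot)))  # abs() folds obtuse angles to acute ones
    return math.degrees(math.acos(dot))


def cone_weight(angle_deg, theta_deg=10.0):
    """Hypothetical Gaussian weight: 1 for a perfect match, decaying with angle/theta."""
    return math.exp(-((angle_deg / theta_deg) ** 2))


def geo_like_score(interface, theta_deg=10.0):
    """Hypothetical GEO-like score summed over interface cubes.

    interface: list of (n_rec, n_lig, area_rec, area_lig) tuples, one per cube.
    Each cube contributes its smaller area times its cone weight.
    """
    return sum(min(a_r, a_l) * cone_weight(matching_angle(n_r, n_l), theta_deg)
               for n_r, n_l, a_r, a_l in interface)
```

With a 10° cone, a cube pair whose normals are 20° off contributes only about exp(−4) ≈ 0.018 of a perfectly matched pair, which illustrates how a small cone angle makes the matching "hard" and sharpens the score peaks.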

CORR reflects the accuracy of shape complementarity. It is the correlation coefficient of the ligand and receptor surface normals and is calculated as follows:

[Equation: CORR score]

The formula uses the average of the receptor surface normals weighted by dot areas and the corresponding average for the ligand. CORR ranges from −1 to 1; larger values indicate better geometric complementarity of the predicted interface.
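A minimal sketch of a correlation coefficient between paired receptor and ligand normals, assuming area-weighted means as the text describes and unweighted sums elsewhere. Because complementary surfaces have roughly antiparallel normals, the ligand normals are flipped by default (`flip_ligand`, a hypothetical parameter) so that good complementarity gives a positive value. The paper's exact weighting and sign convention are not recoverable here. The result always lies in [−1, 1] by the Cauchy–Schwarz inequality.

```python
import math


def weighted_mean(vectors, weights):
    """Weighted average of 3D vectors."""
    total = sum(weights)
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(3)]


def corr_score(rec_normals, lig_normals, rec_areas, lig_areas, flip_ligand=True):
    """Correlation coefficient of paired receptor/ligand normals (sketch).

    flip_ligand reverses the ligand normals so that antiparallel (complementary)
    surfaces correlate positively; this sign convention is an assumption.
    """
    if flip_ligand:
        lig_normals = [[-c for c in n] for n in lig_normals]
    mt = weighted_mean(rec_normals, rec_areas)
    mp = weighted_mean(lig_normals, lig_areas)
    dt = [[n[i] - mt[i] for i in range(3)] for n in rec_normals]
    dp = [[n[i] - mp[i] for i in range(3)] for n in lig_normals]
    cov = sum(sum(a * b for a, b in zip(u, v)) for u, v in zip(dt, dp))
    var_t = sum(sum(a * a for a in u) for u in dt)
    var_p = sum(sum(b * b for b in v) for v in dp)
    if var_t == 0.0 or var_p == 0.0:
        return 0.0  # undefined when all normals coincide
    return cov / math.sqrt(var_t * var_p)
```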

PAIR is the number of matched surface cubes in the predicted interface. It measures the size of the interface in units of cubes.

ACS adopts the ACE parameters proposed by Zhang et al.24 It calculates the solvation effect of candidate predicted complexes. The non-pairwise scoring matrix is used in our calculation as in ZDOCK.19, 20

To calculate ELEC, Delphi partial charges36, 37 are assigned to the atoms of the ligand and receptor molecules. The electrostatic potential at each receptor surface dot is calculated following the method of FTDOCK,11 which uses atomic partial charges and a distance-dependent dielectric coefficient.
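The FTDOCK-style potential at a receptor surface dot can be sketched as a Coulomb sum with a distance-dependent dielectric. The piecewise dielectric below (4 up to 6 Å, 38r − 224 between 6 and 8 Å, 80 beyond) is the form commonly attributed to FTDOCK. The 2 Å distance floor and the 332 kcal·Å/(mol·e²) conversion factor are assumptions of this sketch and should be checked against the FTDOCK reference.

```python
import math

COULOMB = 332.0  # approximate conversion factor, kcal*Angstrom/(mol*e^2)


def ftdock_dielectric(r):
    """Distance-dependent dielectric of the form attributed to FTDOCK."""
    if r <= 6.0:
        return 4.0
    if r < 8.0:
        return 38.0 * r - 224.0  # linear ramp, continuous at 6 and 8 Angstrom
    return 80.0


def surface_potential(point, atoms, r_min=2.0):
    """Electrostatic potential at a surface dot from (xyz, charge) atoms.

    Distances shorter than r_min are raised to r_min to avoid the singularity
    (assumed floor).
    """
    phi = 0.0
    for xyz, q in atoms:
        r = max(math.dist(point, xyz), r_min)
        phi += COULOMB * q / (ftdock_dielectric(r) * r)
    return phi
```

The ELEC score would then combine these receptor-dot potentials with the ligand charges in the matched interface cubes; that combination step is not shown.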

DIFF is defined as the difference between the numbers of ligand and receptor surface cubes in the predicted interface.

OVERLAP is a scoring function that calculates the atom clashes in the predicted interface by counting the number of overlapping volume cubes weighted by a factor that reflects the packing density of the atoms.

Filtering process and scoring combination

The filtering process removes solutions with poor complementarity scores to reduce the noise in the final solution list. After filtering, the scores are combined into a single score for each solution. The score combination function is as follows:

[Equation: combined score]

In the equation, m is the number of scoring functions, and the summation runs over all seven scores. The αs are the filtering cutoffs used in the filtering process, and the βs are input parameters that set the bin sizes used to discretize the scores. After score combination, the solutions are sorted by the combined score and output for further analysis.
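A hypothetical sketch of filtering followed by a discretized combination: a solution is kept only if every score reaches its cutoff α, and each score then contributes the number of whole bins of size β by which it exceeds that cutoff. This particular formula is an illustrative assumption, not the paper's equation. It also assumes all scores are oriented so that higher is better (scores such as OVERLAP would need their sign reversed first).

```python
import math


def combine_scores(solutions, alphas, betas):
    """Filter solutions by per-score cutoffs and combine discretized scores.

    solutions: list of dicts mapping score name -> value (higher assumed better).
    alphas: score name -> filtering cutoff; betas: score name -> bin size.
    Returns (combined, solution) pairs sorted best first.
    """
    kept = []
    for sol in solutions:
        if all(sol[name] >= alphas[name] for name in alphas):
            combined = sum(math.floor((sol[name] - alphas[name]) / betas[name])
                           for name in alphas)
            kept.append((combined, sol))
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept
```

Discretizing before summing keeps one score with a large dynamic range from swamping the others; the βs control how coarse each score's contribution is.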

Clustering process

A simple clustering method is designed to enhance the signal of near-native solutions. The solutions are first sorted by the combined score. Each solution is then examined in sorted order, starting with the top solution. Starting from an empty list of clusters, if a solution is within 10 Å RMSD of an existing cluster, it is merged into that cluster. If a solution does not belong to any existing cluster under this 10 Å cluster radius, a new cluster is created with this solution as its first member. The clustering process stops when a sufficient number of clusters has been created.
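The greedy procedure can be sketched as below. Two details are assumptions of this sketch: a solution is compared with each cluster's first (highest-scoring) member, and "sufficient number of clusters" is modeled by a hypothetical `max_clusters` limit that stops the scan when one more cluster would be needed. Each solution is a (score, ligand_coords) pair.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of 3D points."""
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def cluster_solutions(solutions, radius=10.0, max_clusters=None):
    """Greedy clustering of (score, ligand_coords) solutions, best score first.

    A solution joins the first cluster whose representative (first member) is
    within `radius` RMSD; otherwise it starts a new cluster.
    """
    clusters = []
    for sol in sorted(solutions, key=lambda s: s[0], reverse=True):
        for cluster in clusters:
            if rmsd(sol[1], cluster[0][1]) < radius:
                cluster.append(sol)
                break
        else:
            if max_clusters is not None and len(clusters) >= max_clusters:
                break  # enough clusters have been created
            clusters.append([sol])
    return clusters
```

Ranking clusters (for example by size or by their best member) then turns many similar low-ranked solutions near the true binding site into one strong signal, which is the effect reported for the benchmark.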

Criterion of near-native solutions

Near-native solutions are defined as those whose angular distance from the ligand in the native complex is less than 50° and whose translational distance is less than 10 Å.
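The criterion can be checked as follows, assuming ligand orientations are given as 3 × 3 rotation matrices and that the translational distance is measured between ligand centers (both assumptions of this sketch). The angular distance is the rotation angle of the relative rotation R1ᵀR2, obtained from its trace as arccos((trace − 1)/2).

```python
import math


def rotation_angle_deg(r1, r2):
    """Angle (degrees) of the relative rotation R1^T R2 between two 3x3 matrices."""
    # trace(R1^T R2) equals the elementwise (Frobenius) product of R1 and R2.
    trace = sum(r1[k][i] * r2[k][i] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def is_near_native(rot, center, native_rot, native_center,
                   max_angle=50.0, max_shift=10.0):
    """True if the pose is within 50 degrees and 10 Angstrom of the native pose."""
    return (rotation_angle_deg(rot, native_rot) < max_angle
            and math.dist(center, native_center) < max_shift)
```

For example, a pose rotated 90° about z relative to the native ligand fails the angular test even when its center coincides with the native one.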

Comparison between different parameters

When testing the performance of our procedure, different parameter sets are used. We use the relative R factor to compare the performance between two parameter sets, which is defined as:

[Equation: relative R factor Rrel(1, 2)]

where the summation is over all 83 complexes in the benchmark. r1_i and r2_i are either the ranks or the RMSDs produced by parameter sets 1 and 2, respectively. For ranks, r1_i is the best rank among the near-native solutions, sorted by the combined score, for complex i with parameter set 1. For RMSDs, r1_i is the RMSD of the best-ranked solution for complex i with parameter set 1. Parameter set 2 serves as the reference, and set 1 is compared against it: a positive Rrel(1, 2) means that set 1 performs better than set 2, and a negative Rrel(1, 2) means that set 1 performs worse.

Additional procedure on CAPRI

When participating in CAPRI, we add an energy minimization step using AMBER 8.0,38 with the implicit solvent option. The main purpose of this step is to exclude candidate complexes with bad steric clashes. After energy minimization, the candidate structures are inspected manually.


Docking results on the benchmark

INTELEF was tested on the 83 complexes of Benchmark V2.0, with a cube size of 4.5 Å and a surface thickness of 2.5 Å. Four scoring functions were used: GEO, CORR, ELEC, and OVERLAP. The average rank of near-native solutions over the 83 complexes was 1018, and the average RMSD was 11.003 Å. Sixty-six of the 83 complexes had at least one near-native solution ranked in the top 2000 solutions: 20 of the enzyme–inhibitor (EI) class, 16 of the antibody–antigen (AA) class, and 30 of the other (OT) class, such as signaling complexes. For 30 complexes, a near-native solution was even ranked in the top 100; of these, 16 were of the EI class, 3 of the AA class, and 11 of the OT class.

The effect of side-chain removal of surface exposed Lys and Arg

The surface-exposed side chains of Lys and Arg often undergo large conformational changes between the unbound and bound states and cause large steric clashes in the docked complexes. We tested the effect of removing the surface-exposed Lys and Arg side chains in both the ligand and the receptor. A positive Rrel(removal, no removal) of 0.037 was found, indicating that the removal improves the docking results.
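A sketch of the side-chain trimming step. The atom-record layout (chain, residue number, residue name, atom name) is hypothetical, the set of exposed residues is assumed to come from a separate accessibility calculation whose criterion is not given here, and keeping the backbone plus Cβ is an assumption about where the side chain is cut.

```python
# Backbone atoms plus C-beta are kept; atoms beyond CB are removed (assumption).
KEEP = {"N", "CA", "C", "O", "CB"}


def trim_exposed_lys_arg(atoms, exposed_residues):
    """Drop side-chain atoms beyond CB for surface-exposed LYS and ARG residues.

    atoms: list of (chain, resnum, resname, atomname) records.
    exposed_residues: set of (chain, resnum) judged surface exposed.
    """
    return [a for a in atoms
            if not (a[2] in ("LYS", "ARG")
                    and (a[0], a[1]) in exposed_residues
                    and a[3] not in KEEP)]
```

The trimmed coordinates would then be used to generate the Voronoi surface dots, so the long, flexible side chains no longer block the interface during the rigid-body search.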

The effect of Voronoi surface thickness

The thickness of the Voronoi surface layer determines the softness of the surface and is important for the performance of INTELEF. A range of surface thicknesses was tested, and the results are shown in Figure 2. The average logarithm of the rank improved (decreased) as the surface thickness increased; the only exception was at 3.0 Å, where it was slightly larger than at 2.5 Å. The average RMSD reached its minimum at 2.0 Å and then increased gradually as the thickness grew from 2.0 to 3.5 Å. Considering both rank and RMSD, the optimal surface thickness is about 2.0–2.5 Å.

Figure 2.

Effect of surface thickness. This figure shows how surface thickness affects the averaged logarithm rank and the averaged RMSD of the docking results. The black line shows the averaged logarithm rank and the gray dashed line shows the averaged RMSD.

The effect of scoring functions using leave-one-out tests

Leave-one-out tests were performed to examine the effect of the different scoring functions. The results are shown in Table I. For X = GEO, CORR, ELEC, and OVERLAP, the values of Rrel(remove X, All) were all negative, meaning that the docking results got worse when score X was removed. All four scores therefore improve docking, that is, they increase the ability to discriminate near-native solutions from noise solutions. In contrast, Rrel(remove PAIR, All) was positive, meaning that this score does not help docking. Rrel(remove ACS, All) was nearly zero, so ACS had little effect in this test.

Table I. Effect of Scoring Functions
Score X | Rank_avg | Rrel_rank (remove X, All) | RMSD_avg | Rrel_RMSD (remove X, All)
  1. This table shows the effect of removing each score X on the docking results. When calculating Rrel (remove X, All), the reference parameter set was ‘All scores’ and X was any of the six scores listed in Column 1. The average values (Rank_avg and RMSD_avg) do not directly determine the sign of Rrel, because Rrel is the average of the per-complex Rrels for the 83 complexes in the benchmark.


In all the tests, the RMSDs were about the same, around 11 Å, and the Rrels were all about zero, suggesting that different scoring schemes do not change the accuracy of the docking results.

The effect of scores for different biological complex classes

The leave-one-out test was also performed separately for the three biological complex classes (EI, AA, and OT). Three scoring functions (GEO, CORR, and ELEC) showed different effects among the three functional classes. GEO and CORR were clearly of great importance for docking the EI class, with Rrel of −0.377 and −0.334, respectively [here and below, Rrel denotes Rrel(remove X, All)]. ELEC, on the other hand, was dominant for docking the AA and OT classes, with Rrel of −0.477 and −0.667, respectively. The only positive Rrel, 0.237, was for GEO in the AA class, suggesting that the GEO score is not suitable for the AA class.

Scores with secondary effects were GEO for the OT class and CORR for the AA class, with Rrel of −0.310 and −0.211, respectively. The remaining scores had very little effect, with Rrel of −0.030 for CORR in the OT class and −0.123 for ELEC in the EI class. Overall, the three scores GEO, CORR, and ELEC captured almost all the important effects for these three functional classes of complexes.

The effect of clustering

The clustering procedure greatly improved the ranks, with an Rrel(clustering, no clustering) of 0.777 for rank, but its effect on RMSD was very small, with an Rrel of 0.033.

The effect of the choice of ligand and receptor

For this test, the 83 complexes were divided into two classes. Class 1 contains complexes with a convex–concave interface. This shape is identified by a relative difference, Rrel, between the numbers of interface atoms (within 5 Å of the partner) of the two monomers; complexes with Rrel of at least 0.12 were placed in Class 1. For these complexes, the monomer with fewer interface atoms was chosen as the ligand and the one with more interface atoms as the receptor. Class 2 contains complexes with no obvious convex–concave interface (Rrel < 0.12); for these, the ligand and receptor assignment of the benchmark was used. Two tests were performed with this classification: Test I docked the ligand to the receptor, and Test II deliberately reversed the partners, docking the receptor to the ligand. The Rrel(Test I, Test II) of rank was 0.606 for Class 1 and 0.159 for Class 2, showing that the correct choice of ligand and receptor indeed affects the docking results.
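The partner-assignment rule can be sketched as follows. Molecules are plain lists of atom coordinates. The relative difference |nA − nB| / (nA + nB) is a hypothetical definition chosen for illustration, since the exact formula for this Rrel is not given; the 0.12 threshold and the 5 Å interface cutoff are from the text. For Class 2, the first argument is assumed to be the benchmark's ligand.

```python
import math


def interface_atom_count(mol_a, mol_b, cutoff=5.0):
    """Number of atoms in mol_a within `cutoff` Angstrom of any atom in mol_b."""
    return sum(1 for a in mol_a if any(math.dist(a, b) <= cutoff for b in mol_b))


def assign_partners(mol_a, mol_b, threshold=0.12, cutoff=5.0):
    """Return (ligand, receptor, class_id).

    mol_a is assumed to be the benchmark ligand. The relative difference
    |nA - nB| / (nA + nB) is a hypothetical definition.
    """
    na = interface_atom_count(mol_a, mol_b, cutoff)
    nb = interface_atom_count(mol_b, mol_a, cutoff)
    rel = abs(na - nb) / (na + nb) if na + nb else 0.0
    if rel >= threshold:  # Class 1: convex-concave interface
        return (mol_a, mol_b, 1) if na < nb else (mol_b, mol_a, 1)
    return mol_a, mol_b, 2  # Class 2: keep the benchmark assignment
```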

Application of INTELEF in CAPRI rounds 6–11

We used INTELEF to dock the CAPRI targets of rounds 6–11. In total, predictions for five targets were submitted and assessed successfully. The RMSD and Fnat of the best solutions we submitted are listed in Table II. Although none of our submissions was classified as an acceptable solution, our results improved steadily. The best and second-best results for Target 27 approached the criterion for an acceptable solution. We nevertheless consider it a success for our coarse-grained docking procedure to produce such results without any refinement step.

Table II. Best Results Submitted for CAPRI Targets
  1. This table shows our best results of each CAPRI target. The data of Fnat, L_rmsd and I_rmsd_bb were taken from the assessment results given by the CAPRI evaluation team.


As CAPRI progressed from round 6 to round 11, we tried to improve our algorithms and optimize the parameters. First, the α and β values for each score were adjusted for the filtering step. These were purely empirical values; we have since found that calculating the combined score with weights from linear regression works better. Second, we increased the Voronoi surface thickness from 0.8 to 2.5 Å. Finally, we implemented the removal of exposed Lys and Arg side chains before calculating the Voronoi surface dots for INTELEF.


The most important factor in our docking procedure is shape complementarity. The most important score is GEO, which is designed to measure the level of complementarity over the interface. Because INTELEF uses this score to sort the peaks, the Rrel for GEO in Table I is underestimated. The second most important score is CORR, which is designed to reflect the precision of surface complementarity and is perhaps more sensitive to surface normal pairs than to interface area. CORR can therefore discriminate solutions with good complementarity but a small area from solutions with poor complementarity but a large area. The combination of GEO and CORR is the crucial factor in the success of our docking. Since our method is a coarse-grained rigid-body search, this suggests that shape complementarity is one of the most important factors in the recognition stage of docking.

ELEC is another important factor in distinguishing correct solutions from noise and false ones, as shown in Table I. We have found that different implementations of electrostatics had varying effects on the results of our scoring function. When a more complicated implementation using the partial charges of all atoms, including the hydrogen atoms of the 20 amino acids, was used, the electrostatic term hardly contributed to the correct docking score. Considering that the electrostatic interaction in the recognition stage may not be as precise as in the binding stage, we used a simple implementation in which only the formal charges of the commonly charged amino acids were assigned. The charge of each surface dot was then calculated by summing the atomic charges within a radius of 5 Å.
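The simplified dot-charge assignment can be sketched as below. Which residues count as "commonly charged" and which atoms carry the formal charge are not specified, so the table here (Lys NZ, charge split over Arg NH1/NH2, Asp OD1/OD2, and Glu OE1/OE2; His and termini ignored) is an assumption of this sketch; the 5 Å summation radius is from the text.

```python
import math

# Hypothetical placement of formal charges on side-chain atoms.
FORMAL_CHARGES = {
    ("LYS", "NZ"): 1.0,
    ("ARG", "NH1"): 0.5, ("ARG", "NH2"): 0.5,
    ("ASP", "OD1"): -0.5, ("ASP", "OD2"): -0.5,
    ("GLU", "OE1"): -0.5, ("GLU", "OE2"): -0.5,
}


def dot_charge(dot_xyz, atoms, radius=5.0):
    """Sum of formal charges of atoms within `radius` Angstrom of a surface dot.

    atoms: list of (resname, atomname, xyz) records; unlisted atoms count as 0.
    """
    return sum(FORMAL_CHARGES.get((res, name), 0.0)
               for res, name, xyz in atoms
               if math.dist(dot_xyz, xyz) <= radius)
```

A dot near a Lys NZ and both carboxylate oxygens of a Glu thus ends up neutral, which is the kind of local cancellation this coarse scheme is meant to capture.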

The ACS score made almost no contribution to the docking results of our method. The main reason is probably the coarse-grained nature of INTELEF, because ACS was originally designed as an atom-based score. When large cubes are used to calculate ACS, the errors are larger than in atomic-level scoring. Our experience with ACS suggests that atom-level pair potentials may not be suitable for coarse-grained methods.

The inherent limitations of grid-based scoring methods have been analyzed by examining the score distributions of both near-native and false-positive solutions, which were found to overlap heavily.39 This finding is consistent with our calculations of the average, median, and standard deviation of each individual score. It has been hoped that a combination of several scores would improve the discrimination power. However, this overlap in the distributions is unavoidable and seems to partly explain the limited success of the various scores used by SOFTDOCK on the benchmark.

The different performances of the scoring functions for different complex classes reflect intrinsic differences in their interface characteristics. The interfaces of different biological complexes have been analyzed,40–42 and the most important characteristics have been found to be geometric and electrostatic.7, 11 Because of the different natures of these two types of interactions, it is generally hard to evolve an interface that satisfies both types of complementarity. Perhaps a set of rules could be found to identify both types of complementarity and the corresponding interfaces; such rules may facilitate protein docking and interface design.

Grid-based methods often generate complexes with steric clashes. We tested two methods to alleviate this problem. The first is the removal of the long side chains of Lys and Arg, which often intrude deeply into the interface of the other protein.20 Removing them allows better near-native complexes to be generated. The second is the adjustment of surface thickness: as the surface thickness increases, the proportion of complexes with large steric clashes decreases.

The surface thickness is also related to the surface flexibility. When the surface thickness increases, more potential positions of the surface atoms are implied. But more uncertainty in atom positions results in more false-positive solutions as well as more true-positive solutions. Apparently, a balance must be reached between the flexibility of surface and the relative signal-to-noise ratio of near-native solutions. Thus, there is an optimal value for the surface thickness as shown in Figure 2.

The clustering procedure is very helpful to condense the signal of near-native solutions. However, one problem remains, that is, we cannot always pick out the solution with the best RMSD as the representative of a cluster. This is probably why our best RMSDs were only around 11 Å. This problem with accuracy of solutions needs to be solved by improving the scoring functions.

Our results also show that the correct choice of ligand and receptor is important in docking. This effect is primarily due to the clustering procedure. The clustering signal (the size of a cluster) is bigger on the binding site of a receptor than on that of a ligand. For the same clustering criteria on rotation and translation, the centers (the center of mass of the molecule) of the clustered ligands around a receptor site are more densely distributed than the centers of the clustered receptors around a ligand site. This is mainly due to the different sizes of the molecules and may be corrected by a scale factor proportional to the radius of the molecules.

Our CAPRI results improved from round to round (Table II), but further improvement in scoring is needed. We have examined the best solutions generated by INTELEF for the 83 complexes of Benchmark V2.0, and they fall well within the criteria for acceptable solutions. We are testing new methods that apply atom-based scoring to the complexes generated by INTELEF. The current scoring functions could be improved further with methods such as multiple linear regression to find local near-native peaks and machine learning algorithms to discriminate false peaks across the global landscape.43


In this article, we tested a rigid-body coarse-grained docking procedure on a recent benchmark. Correct solutions ranked in the top 2000 were found for 66 of the 83 complexes. We applied our method in CAPRI rounds 6–11 and submitted two solutions with backbone RMSD around 10 Å. These results show that our current docking method is promising and requires further development.