In order to test the usefulness of our filter and ranking methods, we applied our algorithms to a benchmark of 59 nonredundant protein complexes first used by Chen et al. (2003b). This benchmark set includes 22 enzyme–inhibitor complexes, 19 antibody–antigen complexes, 11 other complexes, and seven difficult test cases. This benchmark has been used by other groups to test their docking methods (Gray et al. 2003; Li et al. 2003). Gottschalk et al. (2004) also used 21 complexes of this benchmark to test their scoring function of tightness of fit. Since unbound–unbound docking (using single protein crystal structures as input) is more challenging than bound–bound docking (using the structures obtained from protein-complex crystals), we have carried out the unbound–unbound docking and unbound–bound docking as given by the benchmark (Chen et al. 2003b).

#### Analysis of FTDock performance

Using FTDock (http://www.bmm.icnet.uk/docking) (Gabb et al. 1997; Moont et al. 1999), we obtained 10,000 docked models and their ranks according to the correlation function (equation 3) of shape complementarity and pair potential (see “Docking calculations” below). For these 10,000 models, we calculated the root mean square deviation (RMSD) of C_{α} atoms of each model structure from the native complex structure. We then defined “hits” as the number of models having RMSD <4.5 Å from the native structure (shown in Table 1). Also shown in Table 1 are the lowest RMSD (LRMSD) complex obtained with FTDock and its corresponding shape-complementarity rank and pair-potential rank. It can be seen in Table 1 that there are 26 complexes with LRMSD <2.5 Å, 15 complexes with LRMSD >2.5 Å but <3.5 Å, and eight complexes with LRMSD >3.5 Å. We are thus confident that FTDock can generate model complexes close to native structures. Nonetheless, for five complexes (1AVW, 1BQL, 1EFU, 1FIN, 1GOT), FTDock failed to generate near-native structures, as the LRMSDs for these complexes are >4.5 Å.
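The hit-counting step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the RMSD routine assumes that the model and native C_{α} coordinates are already expressed in the same reference frame (no superposition is performed).

```python
import math

HIT_CUTOFF = 4.5  # Å, the near-native threshold quoted in the text


def ca_rmsd(model, native):
    """RMSD between two equally long lists of (x, y, z) C-alpha coordinates.

    Assumption: both structures share a reference frame, so no
    superposition is carried out here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, native)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


def count_hits(rmsds, cutoff=HIT_CUTOFF):
    """Number of models whose RMSD is strictly below the cutoff."""
    return sum(1 for r in rmsds if r < cutoff)


def lowest_rmsd(rmsds):
    """Return (model index, RMSD) of the lowest-RMSD (LRMSD) model."""
    best = min(range(len(rmsds)), key=rmsds.__getitem__)
    return best, rmsds[best]
```

With per-model RMSD values in hand, `count_hits` gives the "hits" column and `lowest_rmsd` identifies the LRMSD model whose ranks are then looked up.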

To explore the effect of conformational change on the docking procedure, we also carried out bound–bound docking for 1FIN (listed in Table 1 as 1FIN_BB). Bound–bound docking gives a model complex with LRMSD = 0.41 Å, with ranks of 15 and 21 for shape complementarity and pair potential, respectively. By contrast, unbound–unbound docking of 1FIN yielded a lowest RMSD model of only 5.94 Å, with very poor ranks of 9597 (shape complementarity) and 7502 (pair potential).

The rank based on shape complementarity predicts near-native structures very poorly: the average rank of the LRMSD complexes is 4123, with only three of the 60 complexes registering ranks better than 100. It is thus clear that shape complementarity is not by itself an adequate means for choosing near-native structures.

The pair-potential rank did improve the ranks for 47 complexes out of the 60 cases. From Table 1, it can be observed that there are only 12 complexes with pair-potential ranking worse than shape complementarity. Nonetheless, ranks based on pair potential do not have impressive predictive ability. For example, only five complexes (1BRC, 1BRS, 1PPE, 2MTA, 2SIC) have ranks <20 for the LRMSD model, and another three complexes (1CGI, 1CHO, 2BTF) have ranks of LRMSD complexes <100. The rest have very high rank values.

#### Filter performance

First, we tried to reduce the number of possible docked models from the generated 10,000 without filtering out the lower RMSD models. As described in “Filters” below, we developed two filters based on residue conservation information. For functionally interacting natural proteins, such as enzyme–inhibitor complexes, we gave higher ranks to models with a higher number of conserved positions in the interface region. In antigen–antibody interactions the interacting regions are highly variable, so we gave higher ranks to models with low numbers of conserved positions. After applying the first filter, we used filter II (see below) to reduce the number of complexes to ∼2000–4000 models. These results are also shown in Table 1. Combining the conservation filter with filter II reduces the number of complexes by 56% to 86%.
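The direction of the conservation-based ranking can be illustrated with a short sketch. It covers only the ordering rule stated above (more conserved interface positions is better, except for antibody–antigen complexes, where fewer is better). The function name, the input format, and the tie-breaking rule are assumptions made for illustration and are not taken from the “Filters” section.

```python
def conservation_ranks(conserved_counts, antibody_antigen=False):
    """Return 1-based ranks (1 = best), one per docked model.

    conserved_counts[i] is the number of conserved residue positions
    found in the interface of model i.
    Assumption: ties keep the original model order.
    """
    def sort_key(i):
        count = conserved_counts[i]
        # Antibody-antigen interfaces are variable: fewer conserved
        # positions ranks higher. Otherwise more conserved is better.
        return count if antibody_antigen else -count

    order = sorted(range(len(conserved_counts)), key=sort_key)
    ranks = [0] * len(conserved_counts)
    for rank, model_index in enumerate(order, start=1):
        ranks[model_index] = rank
    return ranks
```

For example, three models with 3, 7, and 5 conserved interface positions would receive ranks 3, 1, 2 for an enzyme–inhibitor complex and 1, 3, 2 for an antibody–antigen complex.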

In Table 1, there are 11 complexes (1A0O, 1AHW, 1BRS, 1DFJ, 1FQ1, 1IGC, 1UDI, 1UGH, 1WQ1, 2MTA, 4HTC) for which sufficient homolog sequences were not available from nonredundant databases to calculate the conserved residue position information. Therefore, only filter II is applied for these complexes (in this case, filter II only includes three normalized ranks without the conserved residue position information).

When we applied the filters to the model sets, some near-native structures were filtered out (false negatives) along with nonnative structures. We therefore define the improvement factor (I_fact) as

$$I_{fact} = \frac{(\mathrm{hits}/\mathrm{models})_{f}}{(\mathrm{hits}/\mathrm{models})_{i}}$$

where hits/models is the ratio of the number of structures with RMSD <4.5 Å from the native structure to the number of complex models, before, (hits/models)_{i}, and after, (hits/models)_{f}, applying the filters.
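As a minimal sketch, the improvement factor can be computed as the hit fraction after filtering divided by the hit fraction before filtering. The function name and argument names are hypothetical. Returning 1.0 when there are no hits before filtering is a convention chosen here to match the value reported for the complexes without hits, and is not a formula taken from the text.

```python
def improvement_factor(hits_before, models_before, hits_after, models_after):
    """I_fact = (hits/models) after filtering / (hits/models) before filtering."""
    if models_before <= 0 or models_after <= 0:
        raise ValueError("model counts must be positive")
    if hits_before == 0:
        # Assumed convention: with no hits to begin with, filtering can
        # neither help nor hurt, so report 1.0.
        return 1.0
    return (hits_after / models_after) / (hits_before / models_before)
```

Under this definition, keeping 8 of 10 hits while cutting 10,000 models to 2000 gives I_fact of about 4, and losing every hit gives I_fact = 0.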

The results are shown in Table 1 and Figure 1. It is observed that there are 48 out of 60 complexes with I_fact >1.0. Most of them (44) are >2.0, which means the improvement is >100%. For a few complexes, applying the filter resulted in >400% improvement.

There are five out of 60 complexes (1AVW, 1BQL, 1EFU, 1FIN, 1GOT) with I_fact = 1.0. From Table 1, it can be observed that for these five complexes (see “Analysis of FTDock performance” above), FTDock did not generate any near-native structure (with RMSD <4.5 Å), that is, no hits are found. When we examined these structures more carefully, we found that except for 1FIN, in which the LRMSD structure was filtered out, the LRMSD structures are still in the filtered subset of these proteins. Moreover, the filters have reduced the number of model structures for these five complexes by a factor of 2.5 to 4. This shows that the filters assist with even these five complexes.

Our filters failed for seven complexes: there are three complexes (1FSS, 1IGC, 1MAH) for which I_fact is <1.0 (Fig. 1; Table 1). For these structures proportionately more near-native model structures are filtered out than unrelated ones. In Figure 1, it can also be observed that four complexes (1EO8, 1L0Y, 1NCA, 1QFU) have I_fact = 0. This means that we filtered out all of the near-native structures (two, one, seven, and five hits for the four complexes, respectively). When we examined the number of conserved residue positions at the interface for these four complexes, we found that there is a high number of conserved residue positions for antibody–antigen systems 1EO8 and 1QFU, and a low number of conserved residues for non-antibody 1L0Y and 1NCA, contrary to most of the complexes investigated.

The global ranks (see next section) for these four failed complexes (1EO8, 1L0Y, 1NCA, 1QFU) and for two of the complexes without improvement (1FSS, 1MAH), obtained without using filter I, are also given in Table 1. Except for 1L0Y, the I_fact values of the remaining five complexes are >1.0, and the lower RMSD models are still in the subset. 1L0Y has only one hit (see Table 1), which is filtered out by filter II, but other lower RMSD models are still in the subset. Conserved residue position information cannot be calculated for 1IGC, since there are not enough homologous sequences in the database. The result for 1IGC listed in Table 1 is obtained using filter II alone. Its improvement (I_fact) is still <1.0 since lower RMSD models are filtered out.

Comparing the results before and after filtering (Table 1) shows that the LRMSD model structure was filtered out in only a few cases (1AHW, 1CHO, 1FIN, 1FQ1, 1IGC, 1KKL, 1WQ1), and even in these cases the second lowest RMSD complex is retained in the remaining subset. For all other complexes the structure closest to the native structure is always in the remaining subset. This demonstrates that our conserved residue information filters work well for the benchmark set.

To check the redundancy of filter I and filter II, we tested them separately on those complexes that have enough conserved residue position information. The I_fact values obtained with each filter alone are also listed in Table 1 (columns I1 and I2). Both filters improve the efficiency, with most I_fact values (I1, I2) being >1.0. After combining them, we observed further significant improvement (I_fact in Table 1): the combined I_fact values are greater than the individual I_fact values (I1, I2). We thus conclude that it is necessary to include filters when conserved residue information is available, in order to substantially decrease the number of model structures and improve the prediction.

#### The efficiency of global ranking

The free energy of binding would in principle suffice to determine the native structure from a large set of complexes. Unfortunately, the free energy we calculated does not rank near-native structures at the top of the list. This could be the result of inaccuracies in the potential force fields used for calculating enthalpic terms or in the empirical entropic terms. Conformational changes upon binding, whether local or global, can also result in significant changes in the free energy of binding (Camacho et al. 2000a). As a result we have to resort to empirical descriptors, and since none can individually predict near-native structures with great accuracy, we decided to combine multiple descriptors in a global ranking scheme.

Empirical rankings based on more than one descriptor have been attempted before: in ZDOCK (Chen et al. 2003a), shape complementarity, electrostatics, and desolvation energies were combined into a final target function, and AutoDock (Morris et al. 1998) incorporated more energy terms into its scoring function. A major bottleneck for composite, global scoring functions is that the weights for the different quantities are difficult to determine.

As described in “Global normalized ranking” below, we derived a global ranking function by renormalizing the rank of each descriptor used (equation 11), and used weights 1, 1, 2, 4, and 5 for shape complementarity, binding free energy, conservation index, desolvation energy, and pair-potential energy, respectively, in a new global ranking function (equation 12). Using this function we obtained a new global rank for each model complex. Some examples (18 out of 60 complexes) of the global rank versus the RMSD are shown in Figure 2.
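The weighting scheme can be sketched as follows. Equations 11 and 12 are not reproduced in this section, so the normalization used here (rank divided by the number of models) and the weighted sum are assumptions for illustration only; only the five weights come from the text. All names are hypothetical.

```python
# Weights quoted in the text for the five descriptors.
WEIGHTS = {
    "shape_complementarity": 1,
    "binding_free_energy": 1,
    "conservation_index": 2,
    "desolvation_energy": 4,
    "pair_potential": 5,
}


def global_ranks(ranks_by_descriptor, weights=WEIGHTS):
    """Combine per-descriptor ranks into one global rank per model.

    ranks_by_descriptor maps each descriptor name to a list of 1-based
    ranks, one entry per model (1 = best).
    Assumption: a rank is normalized as rank / number_of_models, and the
    global score is the weighted sum of normalized ranks (lower = better).
    """
    names = list(weights)
    n = len(ranks_by_descriptor[names[0]])
    if any(len(ranks_by_descriptor[name]) != n for name in names):
        raise ValueError("every descriptor needs one rank per model")

    scores = [
        sum(weights[name] * ranks_by_descriptor[name][i] / n for name in names)
        for i in range(n)
    ]
    order = sorted(range(n), key=scores.__getitem__)
    result = [0] * n
    for rank, model_index in enumerate(order, start=1):
        result[model_index] = rank
    return result
```

Because the pair-potential and desolvation terms carry the largest weights, a model that ranks well on those two descriptors can overtake one that ranks better on shape complementarity and binding free energy.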

The rank of the LRMSD structure for each complex is also listed in Table 1 (G_rank). From Figure 2 and the G_rank values, we can see that for most complexes the near-native models have lower ranks. Comparing our G_rank with PP_rank in Table 1, there are only four cases (1CGI, 1EO8, 1JHL, 1L0Y) for which our G_rank is higher (worse) than the pair-potential rank. For the other 56 complexes, our global ranking fares better than the pair-potential rank. Compared with the results obtained by rigid-body displacement (Gray et al. 2003), ProMate (Gottschalk et al. 2004), ZDOCK (Chen et al. 2003a), and RDOCK (Li et al. 2003) (ranks are also listed in Table 1 for comparison), our ranking scheme produces a similar fraction of accurate predictions, although the methods do not necessarily produce accurate predictions for the same complexes. Overall, our ranking results compare well with the ZDOCK and ProMate results. RDOCK results are better than ours in most cases.

Since the methods for generating the decoy complexes, for evaluating and ranking them are dissimilar in all these studies, the information obtained and reported herein can be considered as complementary to other methods.

In Table 1, we also give the number of hits (E_hits) within the first 100 ranks. For 22 complexes, application of the global rank resulted in no hits in the top 100 ranked structures. We should note that for five of them there were no hits to begin with, because FTDock did not generate any. For the remaining 38 complexes, application of the global ranking substantially improves the predictive ability. Specifically, we calculate the improvement over random (IOR) for these 38 complexes

where NRC is the number of complexes after filtering, and we find significant IOR values (see Table 1). The average calculated IOR for these 38 complexes is 11.18. Even when the 17 complexes with IOR = 0 are included in the average calculation, the average IOR for the 55 complexes for which FTDock generated hits is 7.72.
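The IOR formula itself is not reproduced in this section. The sketch below assumes one plausible reading: the number of hits found in the top 100 ranks, divided by the number expected if 100 models were drawn at random from the NRC filtered models. This definition, and all names in the code, are assumptions. The second function only checks the arithmetic relating the two averages quoted above.

```python
def improvement_over_random(e_hits, hits_after_filtering, nrc, top_n=100):
    """Assumed IOR: observed top-N hits / hits expected from a random pick.

    e_hits               hits among the top_n globally ranked models
    hits_after_filtering hits remaining in the filtered set
    nrc                  number of complexes (models) after filtering
    """
    if hits_after_filtering <= 0 or nrc <= 0:
        raise ValueError("IOR is undefined without hits or models")
    expected_random_hits = top_n * hits_after_filtering / nrc
    return e_hits / expected_random_hits


def mean_including_zeros(mean_nonzero, n_nonzero, n_zero):
    """Average over all cases when n_zero of them contribute IOR = 0."""
    return mean_nonzero * n_nonzero / (n_nonzero + n_zero)
```

For instance, `mean_including_zeros(11.18, 38, 17)` evaluates to roughly 7.72, which agrees with the average reported for the 55 complexes with hits.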

Figure 3 shows model structures of the best predictions superimposed on the native structures for some of the selected targets with rank <10. The complexes 4HTC, 2MTA, 1SPB, 1STF, 1KXQ, and bound–bound 1FIN (1FIN_BB) gave excellent predictions, with a rank of 1 or 2 for the lowest RMSD structure.

Comparing 1FIN with 1FIN_BB, we can see that bound–bound docking (1FIN_BB) gives better results than unbound–unbound docking (Fig. 3d). This is to be expected, since unbound–unbound docking involves a large conformational change. Because we applied our algorithms only to unbound–unbound cases (except for 1FIN, where we did both, and the unbound–bound complexes in the benchmark), we expect that our docking procedure will give better results for bound–bound docking systems. Moreover, if the initial docking procedure (FTDock) gave more hits, our ranking procedure could potentially identify the near-native structures.

#### Concluding remarks

In this work we have demonstrated the usefulness of conserved residue position information in identifying possible near-native complex model structures from docking solutions. We have used this information to develop two filters, reducing the number of docked model structures by 56% to 86% depending on the complex, while keeping near-native complexes in the remaining subset. We applied our method to a benchmark set of 59 complexes. For 11 complexes we did not find enough homolog sequence information, so we could not apply our conservation filter to them at present. Only for four of the remaining complexes did our filter fail to retain the near-native structures, and for another three out of 60 complexes (the 59 benchmark complexes and the 1FIN bound–bound calculation), our filter did poorly compared to the FTDock results.

After filtering, we minimized the side-chain structure of the remaining model structures, and we calculated the binding free energy and desolvation energy. We developed a ranking scheme by renormalizing and weighting a combination of the ranks based on conservation position information, shape complementarity, desolvation energy, pair potential, and binding free energy. Excluding the five complexes for which FTDock did not generate any hits (with RMSD <4.5 Å), the average improvement over random for the top 100 ranked structures is 7.72. For 17 complexes IOR = 0, but for the majority (38 complexes) we observed significant improvements in predictive ability, in terms of predicting near-native structures among the 100 highest-ranked structures. More generally, our approach can be easily adapted to any other docking algorithm to refine its ranking results.