Ninety-six effective targets in CASP7 were split by the assessors into 124 domains, comprising 28 HA-TBM, 77 TBM, 4 TBM/FM, 15 FM, and 1 decoration targets. For conciseness, we divide our analysis into two broad categories: TBM (including HA-TBM and TBM) and FM (including TBM/FM and FM).
In Table I, we present a summary of the average performance of Zhang-Server and Zhang compared with the best threading templates used by I-TASSER. Column 5 gives the average RMSD and the alignment coverage of the best threading template in each category. Here, the best template is the one with the highest TM-score to the native structure among all the templates exploited by I-TASSER. It is usually worse than the true best template found by structural alignment against the PDB library, whose identification requires the native structure.38 As expected, the PPA threading identified much better alignments for the TBM targets than for the FM targets. On average, the PPA alignments for the TBM targets have an RMSD of 5.0 Å over 90% aligned regions and a TM-score of 0.66. For the FM targets, the templates have an average RMSD of 13.5 Å over 81% aligned regions. Their average TM-score (0.21) is close to that expected for random structure matches (0.17),46 which is understandable because, by definition, there are no appropriate templates in the PDB for the FM targets. Overall, incorporating the CASP7 server alignments, as done in the human predictions, yields a slightly better set of threading alignments in both the TBM and FM categories. The TBM category benefits as well because many TBM targets are categorized as Medium/Hard targets by the PPA system, so the threading alignments from the CASP7 servers are exploited for them too. The average TM-score of all templates increases from 0.591 to 0.603 (by 2%).
Table I. Average Results of I-TASSER Predictions in Both Human and Server Sections
| | Type | Ntarget | Size | Best template: Ralia/Frab (%) | Best template: TM | First model: Ralia | First model: Rallc | First model: TM | Best model: Ralia | Best model: Rallc | Best model: TM | Constraints: Acd/Cove (%) | Constraints: Erf |
Column 7 is the average RMSD to native of the first I-TASSER models, calculated over the same aligned regions as in the templates. The decrease in RMSD (∼1 Å) relative to the templates in these regions therefore reflects improvement due purely to the I-TASSER reorientation of the secondary structure fragments. It should be mentioned that I-TASSER does not attempt to “re-tune” the alignments because the local fragments are kept rigid during the simulations. The fragment repacking is driven by the inherent I-TASSER force field and the external consensus restraints. Columns 9 and 12 show the TM-scores of the first and the best I-TASSER models. On average, I-TASSER reassembly results in a TM-score increase of ∼14% in the TBM category. On the basis of previous statistics,23 a simple loop connection can lead to a TM-score increase of 3.5% because of the increase in model length. Therefore, about 10% of the TM-score increase may be due to topology improvements. For the FM targets, the TM-score increase is about 70%, more significant than that for the TBM targets, since the low TM-score templates leave much more room for improvement. In contrast, the RMSD improvement from 13 Å to 10–12 Å for the FM targets appears marginal, partially because RMSD is not an appropriate measure for distinguishing topologies in this range of accuracy.46
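The contrast between RMSD and TM-score drawn above can be illustrated with a minimal sketch. It assumes the standard TM-score form, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24(L − 15)^(1/3) − 1.8 Å, evaluated for one fixed superposition (the full TM-score additionally searches for the superposition that maximizes the sum). The function names, the 0.5 Å floor on d0, and the toy distances are assumptions made for illustration and are not taken from the I-TASSER pipeline.

```python
import math


def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Å) used by the TM-score."""
    # The 0.5 Å floor for very short chains is an assumption of this sketch.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score of one fixed superposition.

    distances: Cα-Cα distances (Å) of the aligned residue pairs.
    target_length: full length of the target, so unaligned residues
    contribute zero and lower the score.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rmsd(distances):
    """Root-mean-square deviation over the same aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical 100-residue target: a well-placed core plus a 10-residue
# tail that is either misoriented (30 Å off) or corrected (1.5 Å off).
bad_tail = [1.5] * 90 + [30.0] * 10
good_tail = [1.5] * 100

for label, dists in [("misoriented tail", bad_tail), ("corrected tail", good_tail)]:
    print(f"{label}: RMSD = {rmsd(dists):.1f} Å, TM-score = {tm_score(dists, 100):.2f}")
```

In this toy case, correcting the tail changes the RMSD severalfold while the TM-score moves by less than 0.1, because each residue contributes at most 1/L to the TM-score regardless of how far it deviates. Normalizing by the full target length also shows why filling in unaligned loops raises the TM-score even when the aligned core is unchanged.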
Column 13 lists the consensus contact constraints collected from the PPA threading alignments (or from PPA plus the CASP7 threading servers for the Medium/Hard targets in the human predictions). For the Easy/Medium/Hard targets, the top 20/30/50 templates are employed with a contact occurrence frequency cutoff of 0.2/0.1/0.1. Because of the differences in alignment quality, the average accuracy and coverage of the contact restraints are much higher in the TBM than in the FM category. Even for the FM targets, the contact prediction is still much better than random prediction. (Wu ST, Zhang Y. Could the sequence-based contact prediction be useful for protein tertiary structure modeling? Submitted for publication 2007.) Given that a set of contact predictions with an average accuracy higher than 0.22 helps ab initio MC simulations drive the topology in the correct direction,20 we estimate that in about half of the FM cases using the predicted restraints should be better than not using them in the ab initio modeling. It should be mentioned that the purpose of the contact collection is to provide helpful constraints for the I-TASSER simulation rather than to generate the most accurate contact prediction. Certainly, if we collected the contacts only from the most confident templates and with a higher frequency cutoff, the accuracy of the contacts might be higher but the coverage would be lower. However, we found that the current settings of the template number and cutoff parameters work best for I-TASSER in our benchmark test. For each of the 10 residues, we generate up to four distance predictions for the long-range Cα pairs (with |i−j| > 6). Column 14 shows the average difference between the native Cα distances and the best predicted distances. As with the contacts, the distance map prediction is more accurate for the TBM than for the FM targets.
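The consensus contact collection described above can be sketched as follows. This is only an illustration under stated assumptions: the occurrence frequency is taken to be the fraction of the employed templates that contain a contact, the cutoff is treated as inclusive, and accuracy is taken as the fraction of predicted contacts found in the native structure. The function names and data layout are hypothetical and do not reproduce the actual I-TASSER code.

```python
from collections import Counter

# Number of top templates and frequency cutoff per target class,
# as quoted in the text (Easy/Medium/Hard: 20/30/50 and 0.2/0.1/0.1).
SETTINGS = {"Easy": (20, 0.2), "Medium": (30, 0.1), "Hard": (50, 0.1)}


def normalize(pair):
    """Store a contact as (smaller index, larger index)."""
    i, j = pair
    return (i, j) if i <= j else (j, i)


def consensus_contacts(template_contacts, target_class):
    """Collect contacts that recur across the top-ranked templates.

    template_contacts: list of contact collections, one per template,
    ranked best first; each contact is a pair of residue indices.
    """
    n_top, cutoff = SETTINGS[target_class]
    used = template_contacts[:n_top]
    if not used:
        return set()
    counts = Counter()
    for contacts in used:
        # Count each contact at most once per template.
        counts.update({normalize(p) for p in contacts})
    # Assumption: frequency = fraction of employed templates, cutoff inclusive.
    return {pair for pair, n in counts.items() if n / len(used) >= cutoff}


def contact_accuracy(predicted, native):
    """Fraction of predicted contacts that are present in the native structure."""
    if not predicted:
        return 0.0
    return len(predicted & native) / len(predicted)
```

Raising the cutoff or lowering the number of templates in `SETTINGS` shrinks the returned set, which is the accuracy versus coverage trade-off mentioned in the text.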
In Figure 2, we present the comparison of the first I-TASSER models and the best threading templates for both server and human predictions. There is a consistent improvement of the final models over the templates in both RMSD and TM-score. There is no systematic difference in template refinement between the TBM and FM targets or between the models by Zhang and Zhang-Server. One notable exception is T0358 in Figure 2(c), where the RMSD of the final human model is 3.3 Å worse than that of the best template (from 5.3 Å to 8.6 Å). The main reason is that our human prediction combined threading alignments from the CASP7 servers, which used wrong templates for this target, although our in-house PPA threading programs hit the best template, 2a2pA. The mixture of bad server templates results in the biggest cluster having a wrong orientation at the C-terminus, although the third human model has a correct topology with a full-length RMSD of 5.3 Å. There are also some cases where the big differences in RMSD between templates and models may not be entirely due to improvement of the structural topology. In T0347_1, for example, the RMSD is reduced from 18.2 Å in the template to 5.2 Å in the model, mainly because the misoriented tails in the template have been corrected by the I-TASSER reassembly. However, the core region does not change much in the I-TASSER modeling, and the overall TM-score increases by only 0.12 in this case. Because the templates from the CASP7 servers are sometimes better than our in-house PPA templates, the RMSD improvement over the templates in the human prediction appears less dramatic in some of those cases [see Fig. 2(c)].
Figure 2. Comparison of the first predicted models by human (“Zhang”) and server (“Zhang-Server”) with respect to the best exploited templates. The RMSD is calculated in the same set of aligned residues. The TM-score is calculated in the aligned regions for the templates and in full-length for the models.
In the upper panel of Figure 3, we show four representative examples where I-TASSER successfully refines the templates from high RMSD (3.3–16 Å) to low RMSD (1.5–3 Å). In all these cases, the consensus contact predictions have a high accuracy and coverage, namely 0.5/142% for T0338_1, 0.41/162% for T0363, 0.36/198% for T0369, and 0.43/173% for T0370. The consensus restraints combined with the optimized I-TASSER inherent potential serve as the major driving force for the refinement of the templates. In all four cases, the accuracy and coverage of the constraints are higher than those extracted from the best individual template (data not shown), which helps to refine the loops and tails and sometimes the global topology, as in T0369.
Figure 3. Representative examples for the TBM (upper panel) and FM (lower panel) targets. The thin lines represent the backbone of the experimental structures and the thick lines are the threading templates or the final models. The two numbers under the TBM models are the RMSD to native in the threading aligned regions and the RMSD of the full-length model. Blue to red runs from N- to C-terminals. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]
In the lower panel of Figure 3, we also show four FM examples where I-TASSER builds models of correct topology with an RMSD of 3–4 Å. Figure 4 shows a more detailed analysis of a typical example, T0382. It is a new-fold protein from Rhodopseudomonas palustris CGA009 crystallized by the structural genomics project.47 The topology of T0382 consists of six joggled α-helices. The left panel of Figure 4 shows the top five templates hit by the multiple threading programs used by I-TASSER, all having correct local secondary structure elements but incorrect global topologies, with the best RMSD of 9.3 Å from 1xm9A1 (TM-score = 0.28). Our contact prediction program generates 148 side-chain contacts, of which 37 are correct (accuracy 25%). The average error of the best predicted Cα distances is 2.2 Å. I-TASSER cuts the fragments from the template alignments and reassembles the topology under the guidance of the predicted restraints and the inherent potential, which results in a model with a full-length RMSD of 3.6 Å and a TM-score of 0.66 (right panel of Fig. 4). The correlation between the I-TASSER energy and the RMSD of the structure decoys is 0.72, which demonstrates the consistency of the external restraints and the inherent force field.
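As an illustration of the energy versus RMSD correlation quoted above, the following sketch computes a Pearson correlation coefficient over a set of decoys. The text does not state which correlation measure was used, so the choice of Pearson is an assumption, and the decoy energies and RMSD values below are invented purely for demonstration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Invented decoy set: lower (more negative) energy tends to go with lower RMSD.
decoy_energies = [-3300.0, -3200.0, -3100.0, -3000.0, -2900.0]
decoy_rmsds = [3.5, 5.0, 4.5, 8.0, 9.5]

print(f"energy-RMSD correlation: {pearson(decoy_energies, decoy_rmsds):.2f}")
```

A positive coefficient means that decoys with lower energy tend to lie closer to the native structure, which is the behavior expected when the restraints and the force field point toward the same topology.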
Figure 4. Structure comparison of the threading templates, the final model, and the experimental structures for the target T0382. Blue to red runs from N- to C-terminals.
Human versus server predictions
The data in Table I show that the overall performance of our human predictions is slightly better than that of the automated server predictions. The improvement mainly occurs in the FM category, where the average TM-score of the first model increases from 0.302 for the server to 0.341 for the human prediction (by 13%). The increase for the TBM targets is modest, from 0.729 to 0.740 (by 1.5%), which leads to an overall TM-score increase of 2.3% (0.664 to 0.679) for the first model. The overall increase for the best of the top five models for all targets (1.6%) is lower than that for the first model (2.3%), which indicates that the use of multiple CASP7 servers tends to improve the ranking of the models rather than the best topology.
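The percentages above are relative increases of the human over the server TM-scores. A minimal sketch of the arithmetic, using only the averages quoted in the text (the function name is an illustrative assumption):

```python
def relative_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Average first-model TM-scores quoted in the text: (server, human).
first_model_tm = {
    "FM": (0.302, 0.341),
    "TBM": (0.729, 0.740),
    "All": (0.664, 0.679),
}

for category, (server, human) in first_model_tm.items():
    print(f"{category}: {server:.3f} -> {human:.3f} "
          f"({relative_increase(server, human):.1f}% increase)")
```

Rounded, these ratios reproduce the 13%, 1.5%, and 2.3% increases stated for the FM, TBM, and combined sets.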
In Figure 5, we present a detailed comparison of the human versus server predictions for the first model. Consistent with the trend in Table I, for the high-quality models (mainly TBM targets), for example, the models with RMSD < 5 Å or TM-score > 0.75, there is no notable difference between human and server. For the hard targets, however, the human prediction tends to generate more models with better scores than the server.
There are several reasons why the human prediction outperforms the server prediction. (1) For hard targets where the PPA programs have no confident hit, we exploited multiple threading templates from the CASP7 servers. Figure 6(a) shows one example where a better template, 1kk1A, hit by HHsearch43 was exploited by the human prediction, which results in a TM-score increase from 0.36 to 0.45. (2) Visual inspection of the multiple threading alignments and the template tertiary structures usually leads to a better domain parsing than the simple domain assignment procedure used by the server (see section “Materials and Methods”). For example, T0289 is a two-domain protein, and the PPA threading hit both domains of 2bconA with high Z-scores. The server prediction fails to split the domains based on the sequence alignments and therefore folds the entire chain together. In the human prediction, by viewing the template structures, we correctly split the target into two domains at residue ILE224 and folded the domains separately. As a result, the quality of both domains improved, since I-TASSER tends to handle the simulation of small single-domain proteins better, partially because of the reduction in conformational entropy.21 The TM-score of the final human prediction increases from 0.68 to 0.7 for T0289_1 and from 0.37 to 0.51 for T0289_2. Figure 6(b) shows the structure superposition for T0289_2. (3) The human prediction can benefit from longer CPU running time. Because of the limited computing power currently available in our lab, most of the hard targets in the server prediction were not run with as many trajectories as in the benchmark. Figure 6(c) shows the example of T0382, where I-TASSER needs to reconstruct the models from wrong templates. In the server prediction, the average energy of the largest cluster is −3024 kT. With a longer run, the human simulation reached a cluster with an average energy of −3264 kT, which results in a TM-score improvement from 0.54 to 0.66.
Figure 6. Examples where the human predictions generate better (a–c) and worse (d) models than the server predictions. The thin lines represent the backbone of the experimental structures and the thick lines are the final models. Blue to red runs from N- to C-terminals. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]
There are also some cases where the human prediction is worse than the server prediction. Figure 6(d) shows a typical example, T0358, where the server prediction generates better models because our in-house PPA threading programs consistently hit the correct template, 2a2pA, although with weak Z-scores. In the human prediction, we exploited the multiple templates from the CASP7 servers, which actually hit worse templates. The incorporation of incorrect templates can result in worse constraint data and therefore reduce the performance of the I-TASSER modeling. For the first model, the server prediction has an RMSD of 5.8 Å, whereas the human prediction has an RMSD of 8.5 Å with a misoriented C-terminus.