
Evaluating the absolute quality of a single protein model using structural features and support vector machines



Knowing the quality of a protein structure model is important for its appropriate usage. We developed a model evaluation method that assesses the absolute quality of a single protein model using only structural features and support vector machine regression. The method assigns an absolute quantitative score (i.e., GDT-TS) to a model by comparing its secondary structure, relative solvent accessibility, contact map, and beta-sheet structure with their counterparts predicted from its primary sequence. We trained and tested the method on the CASP6 dataset using cross-validation; the correlation between predicted and true scores is 0.82. On the independent CASP7 dataset, the correlation averaged over 95 protein targets is 0.76; the average correlations for template-based and ab initio targets are 0.82 and 0.50, respectively. Furthermore, the predicted absolute quality scores can be used to rank models effectively. The average difference (or loss) between the scores of the top-ranked models and the best models is 5.70 on the CASP7 targets. The method performs favorably compared with the other methods evaluated on the same dataset. Moreover, the predicted absolute quality scores are comparable across models of different proteins. These features make the method a valuable tool for model quality assurance and ranking. Proteins 2009. © 2008 Wiley-Liss, Inc.
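The two evaluation measures reported above, per-target correlation and ranking loss, can be illustrated with a minimal sketch. It is not the authors' code: it assumes the correlation is the Pearson coefficient (the abstract says only "correlation") and that the loss for a target is the true GDT-TS of the best model minus the true GDT-TS of the model with the highest predicted score. The function names and the input layout (a dictionary mapping a target name to a list of predicted and true score pairs) are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists (assumed metric)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def ranking_loss(predicted, true):
    """True score of the best model minus true score of the top-ranked model.

    "Top-ranked" means the model with the highest predicted score
    (an assumption about how the loss is defined).
    """
    top_index = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top_index]


def evaluate(targets):
    """Average per-target correlation and loss.

    `targets` maps a target name to a list of (predicted, true) score pairs,
    one pair per model of that target (hypothetical layout).
    """
    correlations, losses = [], []
    for pairs in targets.values():
        predicted = [p for p, _ in pairs]
        true = [t for _, t in pairs]
        correlations.append(pearson(predicted, true))
        losses.append(ranking_loss(predicted, true))
    return sum(correlations) / len(correlations), sum(losses) / len(losses)
```

Averaging per target, rather than pooling all models, keeps targets with many submitted models from dominating the summary; whether the reported 0.76 and 5.70 were computed exactly this way is not stated in the abstract.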