Prediction of functional class of novel plant proteins by a statistical learning method
Author for correspondence: Y. Z. Chen Tel: +65 6874 6877 Fax: +65 6774 6756 Email: firstname.lastname@example.org
- • In plant genomes, the function of a substantial percentage of the putative protein-coding open reading frames (ORFs) is unknown. These ORFs have no significant sequence similarity to known proteins, which complicates the task of functional study of these proteins. Efforts are being made to explore methods that are complementary to, or may be used in combination with, sequence alignment and clustering methods.
- • A web-based protein functional class prediction software, SVMProt, has shown some capability for predicting functional class of distantly related proteins. Here the usefulness of SVMProt for functional study of novel plant proteins is evaluated.
- • To test SVMProt, 49 plant proteins (without a sequence homolog in the Swiss-Prot protein database, not in the SVMProt training set, and with functional indications provided in the literature) were selected from a comprehensive search of MEDLINE abstracts and Swiss-Prot databases in 1999–2004. These represent unique proteins the function of which, at present, cannot be confidently predicted by sequence alignment and clustering methods.
- • The predicted functional class of 31 proteins was consistent, and that of four other proteins was weakly consistent, with published functions. Overall, the functional class of 71.4% of these proteins was consistent, or weakly consistent, with functional indications described in the literature. SVMProt shows a certain level of ability to provide useful hints about the functions of novel plant proteins with no similarity to known proteins.