Heterogeneous data integration by tree-augmented naïve Bayes for protein–protein interactions prediction


  • Colour Online: See the article online to view Figs. 1–6 in colour.

Correspondence: Dr. Xiaotong Lin, Department of Electrical Engineering and Computer Science, The University of Kansas, 1520 West 15th Street, Lawrence, KS 66045, USA

Fax: +313-577-6868

E-mail: cindylin317@yahoo.com


Most proteins execute their functions by interacting with other proteins. Understanding protein–protein interactions (PPIs) is therefore essential to deciphering biological function in a living cell. Predicting PPIs on a large scale calls for effective and efficient computational approaches that integrate the heterogeneous data sources provided by advanced technologies. In this paper, we extend our previous work on a Bayesian classifier that predicts human PPIs from model organisms by introducing a tree-augmented naïve Bayes (TAN) classifier. TAN retains the simplicity and robustness of a naïve Bayes classifier while allowing for dependence among variables. Our empirical results show that, by integrating features extracted from microarray expression measurements, Gene Ontology values, and orthologous scores, TAN achieves higher classification accuracy than both the manually constructed Bayesian network classifier and naïve Bayes. For human PPI prediction, TAN obtains 88% sensitivity while keeping a reasonable 70% specificity on test samples.
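To illustrate how a TAN classifier can allow for dependence among variables, the following minimal sketch shows the textbook structure-learning step: each pair of features is weighted by its conditional mutual information given the class, and a maximum spanning tree over those weights gives every feature at most one feature parent (in addition to the class). This is not the implementation used in this work. The function names, the choice of root, and the toy data are illustrative assumptions, and the sketch assumes features that have already been discretized, which continuous expression, Gene Ontology, or orthology scores would first require.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(xi, xj, y):
    """Estimate I(Xi; Xj | Y) in nats from equal-length sequences of discrete values."""
    n = len(y)
    n_abc = Counter(zip(xi, xj, y))  # joint counts of (xi, xj, class)
    n_ac = Counter(zip(xi, y))       # counts of (xi, class)
    n_bc = Counter(zip(xj, y))       # counts of (xj, class)
    n_c = Counter(y)                 # class counts
    cmi = 0.0
    for (a, b, c), count in n_abc.items():
        # p(a,b,c) * log[ p(a,b|c) / (p(a|c) * p(b|c)) ], written with raw counts
        cmi += (count / n) * math.log(count * n_c[c] / (n_ac[(a, c)] * n_bc[(b, c)]))
    return cmi


def tan_tree(features, y, root=0):
    """Return {feature index: parent feature index}; the root's parent is None.

    `features` is a list of columns (hypothetical layout), one sequence per feature.
    """
    k = len(features)
    weight = {}
    for i, j in combinations(range(k), 2):
        w = conditional_mutual_information(features[i], features[j], y)
        weight[(i, j)] = weight[(j, i)] = w

    parent = {root: None}
    while len(parent) < k:
        # Prim's algorithm: attach the outside feature joined to the tree
        # by the heaviest edge.
        i, j = max(
            ((i, j) for i in parent for j in range(k) if j not in parent),
            key=lambda edge: weight[edge],
        )
        parent[j] = i
    return parent


# Toy data (made up for illustration): f1 duplicates f0, f2 is unrelated to both.
y = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 1, 0, 1, 0, 1, 0, 1]
f1 = [0, 1, 0, 1, 0, 1, 0, 1]
f2 = [0, 0, 1, 1, 0, 0, 1, 1]
print(tan_tree([f0, f1, f2], y))
```

In the toy data the duplicated feature attaches to the root because that edge carries the largest conditional mutual information. In a full classifier, each feature's conditional probability table would then be conditioned on both the class and its tree parent, which is the only change relative to naïve Bayes.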