Standard Article

Functional inference from probabilistic protein interaction networks

Part 3. Proteomics

3.8. Systems Biology

Specialist Review

  1. Joel S. Bader

Published Online: 15 APR 2005

DOI: 10.1002/047001153X.g308202

Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics

How to Cite

Bader, J. S. 2005. Functional inference from probabilistic protein interaction networks. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. 3:3.8:111.

Author Information

  1. Johns Hopkins University, Baltimore, MD, USA

Publication History

  1. Published Online: 15 APR 2005

Abstract

The ability to obtain complete genome sequences for humans and model organisms has shifted the grand challenge of genomics to describing the functional units of a genome and their interactions at the systems level. Protein-coding genes are the dominant class of functional units, and protein–protein interactions provide the framework for understanding manifold biological processes. Recent years have seen exponential growth in the quantity of protein–protein interaction data available for inferring protein networks. Inference remains challenging, however, because of high false-positive and false-negative rates in the high-throughput studies that generate the majority of protein interaction data. Experimental studies now report confidence metrics together with interaction data, and similar methods have been applied retrospectively to analyze legacy data sets. Thus, a convenient level of abstraction for protein interaction data is a weighted graph in which vertices represent proteins, perhaps labeled by biological function, and edge weights represent confidence scores or likelihoods for pairwise interactions. Graph-based models permit intuitive inference of protein function based on the function of neighboring proteins. This review provides a critical view of recent advances in algorithms for protein functional inference from probabilistic protein interaction networks. The types of algorithms considered, in order of increasing statistical rigor and computational cost, include nearest-neighbor approaches, adaptive neighborhoods and tree-building methods, energy minimization by simulated annealing and free energy minimization by message-passing methods, and stochastic methods based on Gibbs sampling. The review explores the trade-offs among computational cost, predictive accuracy, and generalization error.
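As a rough illustration of the weighted-graph abstraction and the simplest of the approaches listed above (nearest-neighbor inference), the following Python sketch scores candidate functions for an unlabeled protein by summing the confidence weights of its edges to labeled neighbors. It is a minimal sketch under stated assumptions, not the method of any particular study: the function name `weighted_neighbor_vote`, the protein identifiers, the function labels, and the edge weights are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def weighted_neighbor_vote(edges, labels, protein):
    """Score candidate functions for `protein` by summing the confidence
    weights of its edges to neighbors that carry each function label.

    edges:  dict mapping an undirected pair (a, b) to a confidence weight
    labels: dict mapping a protein to a set of known function labels
    """
    scores = defaultdict(float)
    for (a, b), weight in edges.items():
        if protein not in (a, b):
            continue  # this edge does not touch the query protein
        neighbor = b if a == protein else a
        for function in labels.get(neighbor, ()):
            scores[function] += weight
    return dict(scores)


# Hypothetical toy network: weights stand in for interaction confidence.
edges = {
    ("P1", "P2"): 0.9,
    ("P1", "P3"): 0.4,
    ("P1", "P4"): 0.3,
    ("P2", "P3"): 0.8,
}
# Hypothetical annotations; P1 is the unlabeled protein to be inferred.
labels = {"P2": {"transport"}, "P3": {"signaling"}, "P4": {"signaling"}}

scores = weighted_neighbor_vote(edges, labels, "P1")
best = max(scores, key=scores.get)
```

In this toy network a single high-confidence edge outweighs two low-confidence ones, so the weighted vote favors "transport" even though an unweighted majority vote over neighbors would favor "signaling". The more rigorous approaches named above differ mainly in how far beyond the immediate neighborhood they propagate such evidence.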

Keywords:

  • proteomics;
  • protein interactions;
  • spin lattice;
  • Markov random field;
  • generalized belief propagation;
  • Gibbs sampling