Standard Article

Large-scale protein annotation

Part 4. Bioinformatics

4.3. Protein Function and Annotation

Short Specialist Review

  1. Sarah K. Kummerfeld

Published Online: 15 JUL 2005

DOI: 10.1002/047001153X.g403304

Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics

How to Cite

Kummerfeld, S. K. 2005. Large-scale protein annotation. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. 4:4.3:34.

Author Information

  1. MRC Laboratory of Molecular Biology, Cambridge, UK

Publication History

  1. Published Online: 15 JUL 2005


In recent years, genome projects have scaled up production to the point where they now generate hundreds of protein sequences every day. This has created a wealth of sequence data and raised several questions, such as: (1) what do these proteins do? (2) how do they fold? (3) what are their evolutionary relationships? Large-scale protein annotation aims to address these questions using approaches that can be applied quickly to millions of proteins. This review traces the annotation of a hypothetical proteome to illustrate the computational techniques and data sources available for large-scale annotation. We begin by outlining some of the most extensive catalogs of experimentally determined protein annotation and then discuss how homology methods can be used to relate these small- to medium-scale information sources to the large-scale annotation problem. Finally, we discuss some of the biological questions that are being addressed now that whole-proteome annotations are available.


  • protein annotation;
  • function prediction;
  • structure prediction;
  • protein evolution;
  • homology;
  • proteome comparison;
  • sequence comparison