Standard Article

Pfam: the protein families database

Part 3. Proteomics

3.6. Proteome Families

Short Specialist Review

  1. Robert D. Finn

Published Online: 15 APR 2005

DOI: 10.1002/047001153X.g306303

Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics

How to Cite

Finn, R. D. 2005. Pfam: the protein families database. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. 3:3.6:86.

Author Information

  1. Wellcome Trust Sanger Institute, Cambridge, UK

Abstract

Systematic analysis has shown that the majority of proteins can be grouped into approximately 1000 sequence families, which often correspond to protein domains. Pfam is a database of such protein families, and this article describes its basic contents and availability. Genome sequencing projects, including those for human and fly, have used Pfam extensively for large-scale functional annotation of genomic data, while smaller research groups studying a single protein or biochemical pathway frequently use it in their analyses. Typically, Pfam matches between 55 and 90% of the proteins in a complete proteome. Pfam also allows domain distributions to be compared across completed genomes. In addition to sequence domain annotation, Pfam contains information on domain–domain interactions through iPfam, a new resource that describes these interactions at the molecular level. The contents of iPfam are briefly outlined.

Keywords:

  • Pfam;
  • genome annotation;
  • HMM;
  • Markov;
  • protein interaction