Standard Article

COGs: an evolutionary classification of genes and proteins from sequenced genomes

Part 3. Proteomics

3.6. Proteome Families

Short Specialist Review

  1. Eugene V. Koonin

Published Online: 15 APR 2005

DOI: 10.1002/047001153X.g306307

Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics

How to Cite

Koonin, E. V. 2005. COGs: an evolutionary classification of genes and proteins from sequenced genomes. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. 3:3.6:90.

Author Information

  1. National Institutes of Health, Bethesda, MD, USA

Abstract

A comprehensive, evolution-based classification of genes from sequenced genomes is essential for comparative and functional genomics. One such classification system is the database of Clusters of Orthologous Groups of proteins (COGs), which was constructed by clustering the results of an all-against-all comparison of the protein sequences encoded in prokaryotic and eukaryotic genomes. Each COG consists of genes, or sets of genes, from three or more genomes that are orthologous to each other, that is, that evolved from a single ancestral gene in the common ancestor of the analyzed organisms. Between 50% and 85% of the genes from the sequenced genomes belong to COGs, indicating notable evolutionary conservation. The COG system provides a natural framework for comparative genomics and can facilitate both functional annotation of genomes and large-scale evolutionary studies.
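
The clustering idea summarized above can be illustrated with a toy sketch. It is not the published COG procedure: it assumes that the all-against-all comparison has already been reduced to per-gene best hits, links genes by reciprocal best hits between different genomes, and keeps only the groups that span three or more genomes. The function names, the input layout, and the example genes are hypothetical.

```python
from collections import defaultdict


def cluster_orthologs(best_hits, min_genomes=3):
    """Group genes linked by reciprocal best hits (a simplifying assumption).

    best_hits maps (genome, gene) -> set of (genome, gene) best hits
    found in other genomes. Returns a list of gene sets, each spanning
    at least min_genomes distinct genomes.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for gene, hits in best_hits.items():
        find(gene)  # register genes that have no reciprocal partner
        for hit in hits:
            reciprocal = gene in best_hits.get(hit, ())
            if gene[0] != hit[0] and reciprocal:
                union(gene, hit)

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)

    # Keep only groups represented in at least min_genomes genomes.
    return [g for g in groups.values()
            if len({genome for genome, _ in g}) >= min_genomes]


def coverage(clusters, best_hits, genome):
    """Fraction of a genome's genes that fall into any retained cluster."""
    genes = {g for g in best_hits if g[0] == genome}
    clustered = {g for c in clusters for g in c if g[0] == genome}
    return len(clustered) / len(genes) if genes else 0.0


# Hypothetical input: three genomes (A, B, C) with two genes each.
best_hits = {
    ("A", "a1"): {("B", "b1"), ("C", "c1")},
    ("B", "b1"): {("A", "a1"), ("C", "c1")},
    ("C", "c1"): {("A", "a1"), ("B", "b1")},
    ("A", "a2"): {("B", "b2")},   # reciprocal, but only two genomes
    ("B", "b2"): {("A", "a2")},
    ("C", "c2"): {("A", "a2")},   # one-directional hit, ignored
}

clusters = cluster_orthologs(best_hits)
fraction_a = coverage(clusters, best_hits, "A")
```

In this toy input, only the a1/b1/c1 group satisfies the three-genome requirement, so half of the genes of genome A are covered. Plain connected components are used here for brevity; they can merge paralogs and unrelated families more readily than a stricter consistency criterion would.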

Keywords:

  • comparative genomics;
  • genome evolution;
  • orthologs;
  • paralogs;
  • functional annotation;
  • phyletic patterns;
  • lineage-specific gene loss;
  • horizontal gene transfer