Standard Article

The CATH domain structure database

Part 3. Proteomics

3.6. Proteome Families

Short Specialist Review

  1. Frances Pearl,
  2. Christopher Bennett,
  3. Christine Orengo

Published Online: 15 APR 2005

DOI: 10.1002/047001153X.g306313

Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics

How to Cite

Pearl, F., Bennett, C. and Orengo, C. 2005. The CATH domain structure database. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. 3:3.6:89.

Author Information

  1. University College London, London, UK

Abstract

The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 60 435 domain structures classified into 917 fold groups, 1606 superfamilies, and 5202 sequence families. Recent developments include improved methods for rapidly recognizing domain boundaries in multidomain proteins. These methods exploit the principle of domain recurrence during evolution: a domain in a newly determined structure frequently resembles one that has already been classified. One such algorithm, CATHEDRAL, identifies domain regions using a fast comparison of secondary structure arrangements between proteins. In a recent CATH release, 75% of the protein chains from the Protein Data Bank (PDB) that had no significant sequence similarity to entries in CATH contained domains that could be recognized using this approach. Because domain boundary assignment is a significant bottleneck in the classification of new structures, CATHEDRAL will also help increase the frequency of CATH updates. CATH has also recently been used to provide structural annotations for completed genomes: the Web-based Gene3D resource assigns complete and partial sequences from 120 completed genomes to CATH domain structure superfamilies.
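The counts quoted above (fold groups, superfamilies) correspond to successive levels of the CATH hierarchy, in which each domain carries a dotted code of the form Class.Architecture.Topology.Homologous superfamily, and the topology level is the fold group. The following is a minimal sketch of how such counts could be derived from a list of classified domains. The domain identifiers are made-up placeholders, the codes are chosen only for illustration, and the helper names `truncate` and `group_by_level` are hypothetical rather than part of any CATH software.

```python
from collections import defaultdict

# Placeholder domain identifiers mapped to illustrative four-level codes
# (Class.Architecture.Topology.Homologous superfamily). Not real CATH data.
DOMAINS = {
    "dom_a": "3.40.50.300",
    "dom_b": "3.40.50.300",
    "dom_c": "3.40.50.720",
    "dom_d": "2.60.40.10",
    "dom_e": "1.10.8.10",
}


def truncate(code: str, depth: int) -> str:
    """Return the first `depth` levels of a dotted classification code."""
    parts = code.split(".")
    if depth < 1 or len(parts) < depth:
        raise ValueError(f"code {code!r} has fewer than {depth} levels")
    return ".".join(parts[:depth])


def group_by_level(domains: dict, depth: int) -> dict:
    """Group domain identifiers by their code truncated to `depth` levels."""
    groups = defaultdict(list)
    for domain_id, code in domains.items():
        groups[truncate(code, depth)].append(domain_id)
    return dict(groups)


# Depth 3 (C.A.T) gives fold groups; depth 4 (C.A.T.H) gives superfamilies.
fold_groups = group_by_level(DOMAINS, 3)
superfamilies = group_by_level(DOMAINS, 4)

print(len(fold_groups), "fold groups;", len(superfamilies), "superfamilies")
```

In this toy data set, three domains share the fold group 3.40.50 but fall into two different superfamilies, which is why the database reports more superfamilies than fold groups.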

Keywords:

  • protein structure classification and comparison;
  • domain boundary recognition