A database of protein structure families with common folding motifs

Authors

  • Liisa Holm,

    Corresponding author
    1. European Molecular Biology Laboratory, Heidelberg, Germany
    • Protein Design Group, EMBL, Meyerhofstrasse 1, D-6900 Heidelberg, Germany
  • Christos Ouzounis,

    Corresponding author
    1. European Molecular Biology Laboratory, Heidelberg, Germany
    • Protein Design Group, EMBL, Meyerhofstrasse 1, D-6900 Heidelberg, Germany
  • Chris Sander,

    Corresponding author
    1. European Molecular Biology Laboratory, Heidelberg, Germany
    • Protein Design Group, EMBL, Meyerhofstrasse 1, D-6900 Heidelberg, Germany
  • Georg Tuparev,

    Corresponding author
    1. European Molecular Biology Laboratory, Heidelberg, Germany
    • Protein Design Group, EMBL, Meyerhofstrasse 1, D-6900 Heidelberg, Germany
  • Gert Vriend

    Corresponding author
    1. European Molecular Biology Laboratory, Heidelberg, Germany
    • Protein Design Group, EMBL, Meyerhofstrasse 1, D-6900 Heidelberg, Germany

Abstract

The availability of fast and robust algorithms for protein structure comparison provides an opportunity to produce a database of three-dimensional comparisons, called families of structurally similar proteins (FSSP). The database currently contains an extended structural family for each of 154 representative (below 30% sequence identity) protein chains. Each data set contains: the search structure; all its relatives with 70–30% sequence identity, aligned structurally; and all other proteins from the representative set that contain substructures significantly similar to the search structure. Very close relatives (above 70% sequence identity) rarely have significant structural differences and are excluded. The alignments of remote relatives are the result of pairwise all-against-all structural comparisons in the set of 154 representative protein chains. The comparisons were carried out with each of three novel automatic algorithms that cover different aspects of protein structure similarity. The user of the database can choose between strict rigid-body comparisons and comparisons that take into account interdomain motion or geometrical distortions, and between comparisons that require strictly sequential ordering of segments and comparisons that allow altered topology of loop connections or chain reversals. The data sets report the structurally equivalent residues both as a multiple alignment and as a list of matching fragments, to facilitate inspection by three-dimensional graphics. If substructures are ignored, the result is a database of structure alignments of full-length proteins, including those in the twilight zone of sequence similarity. The database makes architectural similarities in the known part of the universe of protein folds explicitly visible and may be useful for understanding protein folding and for extracting structural modules for protein design. The data sets are available via the Internet.
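
The composition of a single data set, as described in the abstract, can be illustrated with a minimal Python sketch. It is not the authors' software: the names `classify_relative` and `build_family`, the dictionary layout, and the treatment of values falling exactly on the 30% and 70% boundaries are assumptions made here for illustration. Only the two thresholds and the three components of a data set are taken from the text.

```python
# Percent sequence identity thresholds quoted in the abstract.
CLOSE_CUTOFF = 70.0   # above this: very close relative, excluded
REMOTE_CUTOFF = 30.0  # below this: candidate remote relative


def classify_relative(identity_pct: float) -> str:
    """Assign a protein chain to a category by its sequence identity
    to the search structure. Boundary handling (exactly 30% or 70%)
    is an assumption; the abstract does not specify it."""
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage in [0, 100]")
    if identity_pct > CLOSE_CUTOFF:
        return "excluded"
    if identity_pct >= REMOTE_CUTOFF:
        return "aligned_relative"
    return "remote_candidate"


def build_family(search, identities, structural_hits):
    """Assemble one hypothetical data set.

    identities      -- mapping of chain name to % identity with `search`
    structural_hits -- names of chains found by structure comparison to
                       contain substructures similar to `search`
    """
    family = {"search": search, "aligned_relatives": [], "remote_relatives": []}
    for name, pct in sorted(identities.items()):
        kind = classify_relative(pct)
        if kind == "aligned_relative":
            family["aligned_relatives"].append(name)
        elif kind == "remote_candidate" and name in structural_hits:
            # Low sequence identity alone is not enough; the chain must
            # also show significant structural similarity.
            family["remote_relatives"].append(name)
    return family
```

For example, `build_family("A", {"B": 85, "C": 45, "D": 12, "E": 20}, {"D"})` keeps `C` as an aligned relative and `D` as a remote relative, drops `B` as a very close relative, and drops `E` because it has no significant structural match.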
