Toward the detection and validation of repeats in protein structure

Authors

  • Kevin B. Murray,

    1. European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
    2. Department of Computer Science, University College, London, UK
    3. Department of Biochemistry and Molecular Biology, University College, London, UK
  • William R. Taylor,

    1. Division of Mathematical Biology, National Institute for Medical Research, London, UK
  • Janet M. Thornton

    Corresponding author
    1. European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
    2. Department of Biochemistry and Molecular Biology, University College, London, UK
    3. Department of Crystallography, Birkbeck College, London, UK
    • EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Abstract

We present a method called DAVROS to detect, localize, and validate repeating motifs in protein structure, allowing for insertions and deletions (indels). DAVROS uses the score matrix from a structural alignment program (SAP) to search for repeating motifs, using an algorithm based on concepts from signal processing and on the statistical properties of the alignments. The method was tested against a nonredundant set of Protein Data Bank chains, and each chain was assigned a score. Of the top 50 chains ranked by score, 70% contain repeating motifs that were detected without error. These chains represent 14 fold types covering the α, β, and αβ protein classes. A second data set, comprising protein chains from different sequence families with triosephosphate isomerase (TIM) barrel, leucine-rich repeat (LRR), trefoil, and α–α barrel folds, was used to assess the ability of DAVROS to detect all motifs within a specific fold. For this second test set, the percentage of motifs detected was highest for the LRR chains (88.7%) and lowest for the TIM barrels (60%). This variability reflects the regularity of the LRR motif compared with the αβ units of the TIM barrel, which generally contain many more indels. Indels weaken the repeat signal in the SAP matrix, making repeat detection more difficult. Proteins 2004. © 2004 Wiley-Liss, Inc.
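The abstract describes DAVROS only at a high level, so the following is not DAVROS. It is a minimal, hypothetical sketch of the general signal-processing idea that internal repeats appear as periodic signal along the off-diagonals of a structure's self-comparison score matrix. It assumes a generic square NumPy array stands in for the SAP score matrix (the actual SAP output format is not described here), ignores indels entirely, and uses a crude z-score rather than the statistics the authors use. The function names `diagonal_profile` and `candidate_repeat_length` are illustrative inventions.

```python
import numpy as np


def diagonal_profile(score: np.ndarray) -> np.ndarray:
    """Mean score along each off-diagonal of a square self-comparison matrix.

    Element k - 1 of the result is the mean of score[i, i + k], for k = 1 .. n - 1.
    """
    n = score.shape[0]
    return np.array([np.diagonal(score, offset=k).mean() for k in range(1, n)])


def candidate_repeat_length(score: np.ndarray, min_period: int = 5):
    """Return (offset, z) for the first local maximum of the diagonal profile
    at an offset >= min_period, or None if no such maximum exists.

    The offset is a candidate repeat length. z is a crude significance proxy:
    the peak's distance from the profile mean in units of its standard deviation.
    Assumes no indels, so every copy of the unit sits on the same off-diagonal.
    """
    profile = diagonal_profile(score)
    mu, sigma = profile.mean(), profile.std()
    for k in range(max(min_period, 2), len(profile)):
        prev_v, cur, nxt = profile[k - 2], profile[k - 1], profile[k]
        if cur > prev_v and cur >= nxt:
            z = (cur - mu) / sigma if sigma > 0 else 0.0
            return k, float(z)
    return None


# Synthetic example (not real data): a 120-residue chain made of five copies of a
# 24-residue unit, modelled as a score that oscillates with sequence separation i - j.
n, unit = 120, 24
i, j = np.indices((n, n))
score = np.cos(2 * np.pi * (i - j) / unit)
print(candidate_repeat_length(score))
```

On this idealized matrix the first peak falls at an offset of 24, the unit length. With real structures, indels shift copies off a single diagonal and smear the peak, which is consistent with the abstract's explanation of why TIM barrels are harder to detect than LRR proteins.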
