Rank information: A structure-independent measure of evolutionary trace quality that improves identification of protein functional sites

Authors

  • Hui Yao,

    1. Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas 77030
    2. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
  • Ivana Mihalek,

    1. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
  • Olivier Lichtarge

    Corresponding author
    1. Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas 77030
    2. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
    • Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030

Abstract

Protein functional sites are key targets for drug design and protein engineering, but their large-scale experimental characterization remains difficult. The evolutionary trace (ET) is a computational approach to this problem that has been useful in a variety of case studies, but its proteome-scale application is partially hindered because automated retrieval of input sequences from databases often includes sequences with errors that degrade functional site identification. To recognize and purge these sequences, this study introduces a novel and structure-free measure of ET quality called rank information (RI). It is shown that RI decreases in response to errors in sequences, alignments, or functional classifications. Conversely, an automated procedure that increases RI by selectively removing sequences improves functional site identification so as to nearly match manually curated traces in kinases and in a test set of 79 diverse proteins. Thus we conclude that RI partially reflects the evolutionary consistency of sequence, structure, and function. In practice, as the size of the proteome continues to grow exponentially, RI provides a structure-free measure of ET quality that increases the accuracy of ET for large-scale automated annotation of protein functional sites. Proteins 2006. © 2006 Wiley-Liss, Inc.
