Recurrent use of evolutionary importance for functional annotation of proteins based on local structural similarity

Authors

  • David M. Kristensen,

    1. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
    2. Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas 77030, USA
  • Brian Y. Chen,

    1. Department of Computer Science, Rice University, Houston, Texas 77030, USA
  • Viacheslav Y. Fofanov,

    1. Department of Statistics, Rice University, Houston, Texas 77030, USA
  • R. Matthew Ward,

    1. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
    2. Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas 77030, USA
  • Andreas Martin Lisewski,

    1. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
  • Marek Kimmel,

    1. Department of Statistics, Rice University, Houston, Texas 77030, USA
  • Lydia E. Kavraki,

    1. Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas 77030, USA
    2. Department of Computer Science, Rice University, Houston, Texas 77030, USA
    3. Department of Bioengineering, Rice University, Houston, Texas 77030, USA
  • Olivier Lichtarge

    Corresponding author
    1. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
    2. Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas 77030, USA
    • Olivier Lichtarge, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA; fax: (713) 798-5386.

Abstract

The annotation of protein function has not kept pace with the exponential growth of raw sequence and structure data. An emerging solution to this problem is to identify 3D motifs or templates in protein structures that are necessary and sufficient determinants of function. Here, we demonstrate the recurrent use of evolutionary trace information to construct such 3D templates for enzymes, search for them in other structures, and distinguish true from spurious matches. Serine protease templates built from evolutionarily important residues distinguish between proteases and other proteins nearly as well as the classic Ser-His-Asp catalytic triad. In 53 enzymes spanning 33 distinct functions, an automated pipeline identifies functionally related proteins with an average positive predictive power of 62%, including correct matches to proteins with the same function but with low sequence identity (the average identity for some templates is only 17%). Although these template-building, searching, and match-classification strategies are not yet optimized, their sequential implementation demonstrates a functional annotation pipeline that requires no experimental information, only local molecular mimicry among a small number of evolutionarily important residues.
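
The idea of searching a structure for a small 3D template and then scoring the resulting matches can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the brute-force search, the 1.0 Å tolerance, and the use of a single coordinate per residue are all assumptions made for illustration. A candidate match is a set of residues with the same amino acid types as the template, scored by the root-mean-square difference between corresponding inter-residue distances. Positive predictive power is taken in its usual sense, true matches divided by all reported matches.

```python
from itertools import permutations
from math import dist, sqrt


def pairwise_distance_rmsd(a, b):
    """RMS difference between corresponding inter-point distances of a and b.

    Comparing internal distances avoids an explicit superposition, but it
    cannot tell a geometry from its mirror image (a known limitation of
    this simplified score).
    """
    n = len(a)
    squared = [
        (dist(a[i], a[j]) - dist(b[i], b[j])) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sqrt(sum(squared) / len(squared))


def search_template(template, structure, threshold=1.0):
    """Brute-force search for template-like residue sets in a structure.

    template:  list of (residue_type, (x, y, z))
    structure: list of (residue_id, residue_type, (x, y, z))
    threshold: hypothetical distance tolerance in angstroms
    Returns a sorted list of (score, [residue_id, ...]) under the threshold.
    """
    template_types = [res_type for res_type, _ in template]
    template_xyz = [xyz for _, xyz in template]
    hits = []
    for combo in permutations(structure, len(template)):
        # Residue types must match the template position by position.
        if [res[1] for res in combo] != template_types:
            continue
        score = pairwise_distance_rmsd(template_xyz, [res[2] for res in combo])
        if score <= threshold:
            hits.append((score, [res[0] for res in combo]))
    return sorted(hits)


def positive_predictive_value(true_positives, false_positives):
    """Fraction of reported matches that are true matches."""
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Toy three-residue template and a toy structure (made-up coordinates).
    template = [("SER", (0, 0, 0)), ("HIS", (3, 0, 0)), ("ASP", (3, 4, 0))]
    structure = [
        (195, "SER", (10, 10, 10)),
        (57, "HIS", (13, 10, 10)),
        (102, "ASP", (13, 14, 10)),
        (200, "GLY", (0, 0, 0)),
        (300, "ASP", (50, 50, 50)),
    ]
    for score, residues in search_template(template, structure):
        print(f"match {residues} with distance RMSD {score:.2f}")
```

The exhaustive enumeration is only practical for toy inputs; a real search would prune candidates geometrically, and a real classifier would do more than apply a single threshold.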
