Structure-based function inference using protein family-specific fingerprints

Authors

  • Deepak Bandyopadhyay,

    1. Department of Computer Science, University of North Carolina at Chapel Hill, North Carolina 27599, USA
    Current affiliation:
    1. Johnson & Johnson Pharmaceutical Research and Development, 665 Stockton Drive, Exton, PA 19341, USA.
  • Jun Huan,

    1. Department of Computer Science, University of North Carolina at Chapel Hill, North Carolina 27599, USA
  • Jinze Liu,

    1. Department of Computer Science, University of North Carolina at Chapel Hill, North Carolina 27599, USA
  • Jan Prins,

    1. Department of Computer Science, University of North Carolina at Chapel Hill, North Carolina 27599, USA
  • Jack Snoeyink,

    1. Department of Computer Science, University of North Carolina at Chapel Hill, North Carolina 27599, USA
  • Wei Wang,

    1. Department of Computer Science, University of North Carolina at Chapel Hill, North Carolina 27599, USA
  • Alexander Tropsha

    Corresponding author
    1. UNC School of Pharmacy, Medicinal Chemistry and Natural Products, University of North Carolina at Chapel Hill, North Carolina 27599, USA
    • Alexander Tropsha, UNC School of Pharmacy, Medicinal Chemistry and Natural Products, CB# 7360 Beard Hall, Room 327A, University of North Carolina, Chapel Hill, NC 27599-7360, USA; fax: (919) 966-0204.

Abstract

We describe a method to assign a protein structure to a functional family using family-specific fingerprints. Fingerprints represent amino acid packing patterns that occur in most members of a family but are rare in the background, a nonredundant subset of the PDB; the information they carry is additional to that provided by sequence alignments, sequence patterns, structural superposition, and active-site templates. Fingerprints were derived for 120 families in SCOP using Frequent Subgraph Mining. For a new structure, all occurrences of these family-specific fingerprints can be found by a fast algorithm for subgraph isomorphism; the structure can then be assigned to a family with a confidence value derived from the number of fingerprints found and their distribution in background proteins. In validation experiments, we infer the function of new members added to SCOP families, and we discriminate between structurally similar but functionally divergent TIM barrel families. We then apply our method to predict function for several structural genomics proteins, including orphan structures. Some predictions have been corroborated by other computational methods, and some have been validated by subsequent functional characterization.
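The assignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the confidence formula, so the code below assumes a simple empirical one: the fraction of background proteins that contain fewer of a family's fingerprints than the query structure does. The function names, the `min_confidence` threshold, and the example counts are hypothetical and serve only as illustration; they are not the authors' implementation.

```python
from bisect import bisect_left


def empirical_confidence(query_count, background_counts):
    """Fraction of background proteins with fewer fingerprints than the query.

    ASSUMPTION: this empirical rank is a stand-in for the paper's confidence
    value, whose exact definition is not given in the abstract.
    """
    if not background_counts:
        raise ValueError("background_counts must be non-empty")
    ordered = sorted(background_counts)
    # bisect_left gives the number of entries strictly less than query_count.
    n_fewer = bisect_left(ordered, query_count)
    return n_fewer / len(ordered)


def assign_family(query_counts, background_by_family, min_confidence=0.99):
    """Return (family, confidence) for the best-supported family, or (None, best).

    query_counts: family name -> number of that family's fingerprints
        found in the query structure.
    background_by_family: family name -> list of fingerprint counts
        observed in background proteins.
    min_confidence: hypothetical cutoff below which no assignment is made.
    """
    best_family, best_conf = None, 0.0
    for family, count in query_counts.items():
        conf = empirical_confidence(count, background_by_family[family])
        if conf > best_conf:
            best_family, best_conf = family, conf
    if best_conf >= min_confidence:
        return best_family, best_conf
    return None, best_conf


if __name__ == "__main__":
    # Made-up counts for two placeholder families, for illustration only.
    query = {"family_A": 10, "family_B": 1}
    background = {"family_A": [0, 0, 1, 2], "family_B": [0, 1, 2, 3]}
    print(assign_family(query, background, min_confidence=0.9))
```

In this sketch, a query is assigned to a family only when it contains more of that family's fingerprints than nearly all background proteins do, which mirrors the idea that fingerprints are common within a family but rare in the background.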
