
Probabilistic Substructure Mining From Small-Molecule Screens

Authors

  • Sayan Ranu,

    1. Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA, USA
  • Bradley T. Calhoun,

    1. Division of Laboratory and Genomic Medicine, Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
  • Ambuj K. Singh,

    1. Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA, USA
  • S. Joshua Swamidass

    Corresponding author
    1. Division of Laboratory and Genomic Medicine, Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA

Abstract

Identifying the overrepresented substructures in a set of molecules with similar activity is a common task in chemical informatics. Existing substructure miners are deterministic: they require the activity of every mined molecule to be known with high confidence. In contrast, we introduce pGraphSig, a probabilistic structure miner that effectively mines structures from noisy data in which many molecules are labeled with their probability of being active. We benchmark pGraphSig on data from several small-molecule high-throughput screens and find that it identifies overrepresented structures more effectively than a deterministic structure miner.
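The contrast between deterministic and probabilistic activity labels can be illustrated with a toy enrichment score. This is a minimal sketch under stated assumptions, not pGraphSig's algorithm: the molecule representation (a set of hypothetical substructure labels per molecule), the 0.5 activity threshold, and the enrichment ratio (fraction of actives containing a substructure divided by the fraction of all molecules containing it) are all illustrative choices. The deterministic version forces each molecule to a hard active/inactive label; the probabilistic version lets each molecule contribute its probability of being active as an expected count.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: a molecule is just the set of substructure
# labels it contains, paired with an estimated probability of being active.
Molecule = Tuple[FrozenSet[str], float]


def deterministic_enrichment(mols: List[Molecule], feature: str,
                             threshold: float = 0.5) -> float:
    """Enrichment after thresholding probabilities into hard active labels."""
    actives = [subs for subs, p in mols if p >= threshold]
    if not actives:
        return 0.0
    frac_actives = sum(feature in subs for subs in actives) / len(actives)
    frac_all = sum(feature in subs for subs, _ in mols) / len(mols)
    return frac_actives / frac_all if frac_all else 0.0


def expected_enrichment(mols: List[Molecule], feature: str) -> float:
    """Enrichment from expected counts: each molecule contributes its probability."""
    total_p = sum(p for _, p in mols)
    if total_p == 0:
        return 0.0
    frac_actives = sum(p for subs, p in mols if feature in subs) / total_p
    frac_all = sum(feature in subs for subs, _ in mols) / len(mols)
    return frac_actives / frac_all if frac_all else 0.0


# Hypothetical screen: two "amide" molecules sit just below the 0.5 threshold.
mols: List[Molecule] = [
    (frozenset({"amide", "phenyl"}), 0.9),
    (frozenset({"amide"}), 0.45),
    (frozenset({"amide", "thiol"}), 0.4),
    (frozenset({"phenyl"}), 0.1),
    (frozenset({"thiol"}), 0.05),
]

for feat in ("amide", "phenyl"):
    print(feat, deterministic_enrichment(mols, feat), expected_enrichment(mols, feat))
```

On this toy data, thresholding discards the two near-threshold amide molecules and ranks "phenyl" above "amide", while the expected-count score keeps their partial evidence and ranks "amide" above "phenyl". This is only meant to show why hard labels can lose information from noisy screens.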
