Predicting the similarity search performance of fingerprints and their combination with molecular property descriptors using probabilistic and information theoretic modeling

Authors

  • Martin Vogt,

    1. Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
  • Britta Nisius,

    1. Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
  • Jürgen Bajorath

    Corresponding author
    1. Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany

Abstract

Similarity searching is currently one of the most widely applied approaches to computationally screen large databases for novel active compounds, and molecular fingerprints are among the most popular search tools. Fingerprint searching has recently also been applied in chemical biology to identify compounds that are selective for one target within a group of related targets. In general, fingerprints are bit string representations of molecular structure and properties, but their design, size, and complexity often vary substantially. Like essentially all similarity search tools, fingerprints display a strong compound class dependence in their ability to identify active molecules and distinguish them from other database compounds. In practical applications, this limitation makes it very difficult to select or prioritize the fingerprints that are most suitable for a given search problem. We have previously (i) devised a Bayesian scoring scheme to combine fingerprints and molecular property descriptors for similarity searching and (ii) developed an information-theoretic approach to predict active compound recall rates for fingerprint searching. Herein, we combine these methods and present an approach for the prediction of compound recall in search calculations using Bayesian screening with molecular property descriptors, fingerprints, and their combination. For practical similarity search applications, this analysis is highly relevant because it makes it possible to identify the search methods that are most likely to be successful for a given compound activity class and screening database.

Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 123-134, 2009
