Improving fingerprint search performance: An activity-oriented feature-filtering procedure and a corresponding similarity function were developed for molecule-specific fingerprints that record ensembles of structural patterns, such as the popular extended connectivity fingerprints. Shown are comparisons of search calculations for cyclooxygenase inhibitors based on k-nearest-neighbor (1NN, 10NN) and Tanimoto coefficient (Tc) calculations, and the ACF BDM approach introduced herein.
The Pipeline Pilot extended connectivity fingerprints (ECFPs) are currently among the most popular similarity search tools in drug discovery settings. ECFPs do not have a fixed bit string format; instead, they generate a variable number of structural features for each test molecule. This variable string design makes ECFP representations amenable to compound-class-directed modification. We have devised an intuitive feature-filtering technique that focuses ECFP search calculations on the feature string ensembles of given compound activity classes. In combination with a simple bit-density-dependent similarity function, feature filtering consistently improved the search performance of ECFP calculations relative to Tanimoto similarity and state-of-the-art data fusion techniques on a diverse array of activity classes. Feature filtering and the bit density similarity metric are easily implemented in the Pipeline Pilot environment. The approach provides a viable alternative to conventional similarity searching and should be of general interest for further improving the success rate of practical ECFP applications.
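For orientation, the baseline calculations named above can be sketched in a few lines of Python. This is a minimal illustration, not the Pipeline Pilot implementation: fingerprints are modeled as plain sets of integer feature identifiers, `knn_score` assumes the common fusion rule of averaging the k highest Tc values against the reference compounds, and `class_features` / `filter_features` are hypothetical helpers that show one simple way to restrict a fingerprint to features seen in an activity class. The bit-density-dependent similarity function is not specified in this abstract and is therefore not reproduced here.

```python
from collections import Counter
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_score(candidate: Set[int], references: List[Set[int]], k: int) -> float:
    """Mean of the k highest Tc values to the references (k=1 gives 1NN).

    Assumption: kNN fusion by averaging the top-k similarities.
    """
    if not references or k < 1:
        raise ValueError("need at least one reference and k >= 1")
    sims = sorted((tanimoto(candidate, r) for r in references), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)


def class_features(references: List[Set[int]], min_count: int = 1) -> Set[int]:
    """Hypothetical filter set: features found in >= min_count references."""
    counts = Counter(f for r in references for f in r)
    return {f for f, c in counts.items() if c >= min_count}


def filter_features(fp: Set[int], allowed: Set[int]) -> Set[int]:
    """Keep only the features of fp that belong to the class feature set."""
    return fp & allowed


# Toy data: two reference actives and one database compound.
references = [{1, 2, 3, 4}, {2, 3, 5}]
candidate = {2, 3, 9, 10}

score_1nn = knn_score(candidate, references, k=1)
filtered = filter_features(candidate, class_features(references))
score_filtered = knn_score(filtered, references, k=1)
```

On this toy data the unfiltered 1NN score is 0.4, and after filtering the candidate down to `{2, 3}` it rises to 2/3, because features absent from the reference class no longer enlarge the union in the Tc denominator. Whether such an increase improves ranking in practice depends on how inactive database compounds are affected by the same filter.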