Diversity Selection of Compounds Based on ‘Protein Affinity Fingerprints’ Improves Sampling of Bioactive Chemical Space



Diversity selection is a frequently applied strategy for assembling high-throughput screening libraries, on the assumption that a diverse compound set increases the chances of finding bioactive molecules. Building on previous work on experimental ‘affinity fingerprints’, this study benchmarks a novel diversity selection method that uses predicted bioactivity profiles as descriptors. Compounds were selected based on their predicted activity against half of the targets (training set), and diversity was assessed as coverage of the remaining targets (test set). Fingerprint-based diversity selection was performed in parallel for comparison. On average, the original version of the method increased target space coverage by 5% and an improved version by 10% compared with the fingerprint-based methods. As a typical case, bioactivity-based selection of 231 compounds (2%) from a particular data set (‘Cutoff-40’) resulted in 47.0% and 50.1% coverage (original and improved versions, respectively), while fingerprint-based selection achieved only 38.4% target coverage for the same subset size. In conclusion, the novel bioactivity-based selection method outperformed the fingerprint-based method in sampling bioactive chemical space on the data sets considered. The compounds retrieved were structurally more acceptable to medicinal chemists but also more lipophilic; hence, bioactivity-based diversity selection would best be combined with physicochemical property filters in practice.
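
The select-then-assess protocol described above can be illustrated with a minimal sketch. The abstract does not state which selection algorithm or distance measure was used, so the greedy MaxMin picker and Euclidean distance below are assumptions, as are the function names `maxmin_select` and `target_coverage` and the toy data. Compounds are picked using only their predicted activity profiles on the training targets, and coverage is then computed as the fraction of held-out test targets hit by at least one selected compound.

```python
import numpy as np


def maxmin_select(profiles, n_select, seed_index=0):
    """Greedy MaxMin picking (an assumed algorithm, not necessarily the paper's).

    profiles: (n_compounds, n_training_targets) predicted activity profiles.
    Returns the indices of the selected compounds in order of selection.
    """
    profiles = np.asarray(profiles, dtype=float)
    if not 1 <= n_select <= len(profiles):
        raise ValueError("n_select must be between 1 and the number of compounds")
    selected = [seed_index]
    # Distance from every compound to its nearest already-selected compound.
    min_dist = np.linalg.norm(profiles - profiles[seed_index], axis=1)
    min_dist[seed_index] = -np.inf  # never re-select a chosen compound
    while len(selected) < n_select:
        nxt = int(np.argmax(min_dist))  # compound farthest from the selection
        selected.append(nxt)
        dist_to_new = np.linalg.norm(profiles - profiles[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
        min_dist[nxt] = -np.inf
    return selected


def target_coverage(active, selected):
    """Fraction of targets for which at least one selected compound is active.

    active: (n_compounds, n_test_targets) boolean activity matrix.
    """
    active = np.asarray(active, dtype=bool)
    return float(active[selected].any(axis=0).mean())


# Toy data (hypothetical): 6 compounds, 2 training targets, 3 test targets.
train_profiles = np.array([
    [0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
    [0.2, 0.8], [0.5, 0.5], [0.9, 0.0],
])
test_active = np.array([
    [1, 0, 0], [1, 0, 0], [0, 1, 0],
    [0, 1, 0], [0, 0, 1], [1, 0, 0],
], dtype=bool)

picked = maxmin_select(train_profiles, n_select=3)
print(picked, target_coverage(test_active, picked))
```

In this toy case the three picked compounds have dissimilar training-target profiles and together hit all three test targets, whereas picking near-duplicates (for example compounds 0 and 1) would cover only one. A structural fingerprint baseline could be compared by passing fingerprint vectors to the same picker in place of the predicted profiles.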