Diversity selection is a frequently applied strategy for assembling high-throughput screening libraries, on the assumption that a diverse compound set increases the chances of finding bioactive molecules. Building on previous work on experimental ‘affinity fingerprints’, this study benchmarks a novel diversity selection method that uses predicted bioactivity profiles as descriptors. Compounds were selected based on their predicted activity against half of the targets (training set), and diversity was assessed as coverage of the remaining targets (test set). Fingerprint-based diversity selection was performed in parallel for comparison. The original version of the method showed on average a 5% increase, and an improved version on average a 10% increase, in target space coverage compared with the fingerprint-based methods. As a typical case, bioactivity-based selection of 231 compounds (2%) from a particular data set (‘Cutoff-40’) resulted in 47.0% and 50.1% target coverage, whereas fingerprint-based selection achieved only 38.4% for the same subset size. In conclusion, the novel bioactivity-based selection method outperformed the fingerprint-based method in sampling bioactive chemical space on the data sets considered. The compounds retrieved were structurally more acceptable to medicinal chemists but also more lipophilic; bioactivity-based diversity selection would therefore best be combined with physicochemical property filters in practice.
Compound diversity selection is a commonly applied approach for assembling screening libraries. In this study, we show that selecting compounds by knowledge-based ‘Protein Affinity Fingerprints’ on a panel of training set proteins also significantly increases target coverage on a panel of test set proteins. In addition, the selected compounds look more favourable to a medicinal chemist, albeit being larger on average.
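The benchmarking protocol summarised above (select on training-set targets, score coverage on test-set targets) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the greedy MaxMin picker, the Euclidean distance, the function names and the toy data are hypothetical and are not taken from the study, which does not state its selection algorithm in this summary.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length profile vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maxmin_select(profiles, n_pick, seed_index=0):
    """Greedy MaxMin diversity selection (assumed picker, for illustration).

    profiles: one vector per compound of predicted activities against the
    training-set targets. Starting from seed_index, repeatedly add the
    compound whose distance to its nearest already-picked compound is largest.
    """
    picked = [seed_index]
    # Distance from every compound to its nearest picked compound.
    min_dist = [euclidean(p, profiles[seed_index]) for p in profiles]
    while len(picked) < min(n_pick, len(profiles)):
        candidates = [i for i in range(len(profiles)) if i not in picked]
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, p in enumerate(profiles):
            min_dist[i] = min(min_dist[i], euclidean(p, profiles[nxt]))
    return picked


def target_coverage(selected, actives_by_compound, test_targets):
    """Fraction of test-set targets hit by at least one selected compound."""
    hit = set()
    for i in selected:
        hit |= actives_by_compound[i] & test_targets
    return len(hit) / len(test_targets)


# Toy data (hypothetical): 4 compounds, 2 training targets, 2 test targets.
profiles = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.5, 0.5]]
actives_by_compound = [{"T3"}, {"T3"}, {"T4"}, set()]
test_targets = {"T3", "T4"}

selected = maxmin_select(profiles, n_pick=2)
print(selected)                                                      # [0, 2]
print(target_coverage(selected, actives_by_compound, test_targets))  # 1.0
```

In this toy case the two compounds with the most dissimilar predicted profiles also hit different held-out targets, which is the effect the benchmark measures. A fingerprint-based baseline could be scored with the same `target_coverage` function by replacing `profiles` with structural fingerprints and a suitable distance.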