The key concept in chemogenomics is the similarity principle, which states that similar ligands should bind similar targets. Chemogenomic analysis requires large amounts of data as well as powerful computational algorithms and computers. The data can either be compiled from open sources or produced in-house, as is often done in the pharmaceutical industry. To assemble enough data for analysis, the chemogenomic modeller often has to resort to mixing activity values from different laboratories and even from different assay types. The amount of chemogenomic data available in the public domain has increased dramatically in recent years, allowing fully traceable analysis on a continuously increasing scale. However, warning flags have been raised about data quality, and because the primary data determine the accuracy of chemogenomic analysis, data quality is one of the key questions in chemogenomics. This mini-review discusses some of the most common issues with public domain biological data as they relate to chemogenomic analysis. Errors in the data can originate from problems with the experiments themselves and their interpretation, or from more mundane issues such as data extraction and annotation. These issues are not unique to any one database; they are shared by all public domain databases and can plague commercial and in-house bioactivity databases as well.