Get access

Information Theory and Voting Based Consensus Clustering for Combining Multiple Clusterings of Chemical Structures



Many consensus clustering methods have been applied in different areas such as pattern recognition, machine learning, information theory and bioinformatics. However, few methods have been used for chemical compounds clustering. In this paper, an information theory and voting based algorithm (Adaptive Cumulative Voting-based Aggregation Algorithm A-CVAA) was examined for combining multiple clusterings of chemical structures. The effectiveness of clusterings was evaluated based on the ability of the clustering method to separate active from inactive molecules in each cluster, and the results were compared with Ward’s method. The chemical dataset MDL Drug Data Report (MDDR) and the Maximum Unbiased Validation (MUV) dataset were used. Experiments suggest that the adaptive cumulative voting-based consensus method can improve the effectiveness of combining multiple clusterings of chemical structures.