Bayesian cluster ensembles



Cluster ensembles provide a framework for combining multiple base clusterings of a dataset to generate a stable and robust consensus clustering. There are important variants of the basic cluster ensemble problem, notably including cluster ensembles with missing values, row- or column-distributed cluster ensembles. Existing cluster ensemble algorithms are applicable only to a small subset of these variants. In this paper, we propose Bayesian cluster ensemble (BCE), which is a mixed-membership model for learning cluster ensembles, and is applicable to all the primary variants of the problem. We propose a variational approximation based algorithm for learning Bayesian cluster ensembles. BCE is further generalized to deal with the case where the features of original data points are available, referred to as generalized BCE (GBCE). We compare BCE extensively with several other cluster ensemble algorithms, and demonstrate that BCE is not only versatile in terms of its applicability but also outperforms other algorithms in terms of stability and accuracy. Moreover, GBCE can have higher accuracy than BCE, especially with only a small number of available base clusterings. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 4: 54–70 2011