Model-based cluster analysis



Cluster analysis seeks to identify homogeneous subgroups of cases in a population. This article provides an introduction to model-based clustering using finite mixture models and their extensions. Finite mixtures have been used successfully for clustering and classification for more than a hundred years, but have become increasingly popular in the last decade owing to advances in computer technology and software availability. Unlike traditional methods of cluster analysis, which are based on heuristic or distance-based procedures, finite mixture modeling provides a formal statistical framework on which to base the clustering procedure. Finite mixture models assume that the population is made up of several distinct subsets (or clusters), each following a different multivariate probability distribution. Model-based cluster analysis can deal with a mix of nominal, ordinal, count, or continuous variables, any of which may contain missing values. We demonstrate how the problems of determining the number of clusters and choosing an appropriate clustering method reduce to a model selection problem, for which objective procedures exist. We briefly discuss how model-based cluster analysis can be used to analyze complex and structured (e.g., longitudinal) datasets.

WIREs Comput Stat 2012. doi: 10.1002/wics.1204
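
The idea that choosing the number of clusters becomes a model selection problem can be illustrated with a small sketch. It is not taken from the article. It assumes the simplest possible setting: univariate Gaussian components fitted by the EM algorithm, with the Bayesian information criterion (BIC) as the selection criterion. The function names (`fit_mixture`, `bic`), the simulated data, the quantile-based starting values, and the variance floor are all illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """Mixture density: weighted sum of the component densities."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_mixture(data, k, n_iter=200, var_floor=1e-3):
    """Fit a k-component Gaussian mixture by EM; return (parameters, log-likelihood)."""
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    ordered = sorted(data)
    # Starting values (an assumption): equal weights, means at evenly
    # spaced sample quantiles, and the overall variance for every component.
    weights = [1.0 / k] * k
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [grand_var] * k
    for _ in range(n_iter):
        # E-step: posterior probability that each case belongs to each cluster.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and variances from the posteriors.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)  # floor avoids degenerate components
    loglik = sum(
        math.log(max(mixture_density(x, weights, means, variances), 1e-300)) for x in data
    )
    return (weights, means, variances), loglik


def bic(loglik, k, n):
    """BIC with k means, k variances, and k - 1 free mixing weights."""
    n_params = 3 * k - 1
    return -2.0 * loglik + n_params * math.log(n)


# Simulated data (illustrative only): two well-separated subpopulations.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(150)] + [rng.gauss(6.0, 1.0) for _ in range(150)]

# Fit models with 1 to 4 clusters and keep the one with the lowest BIC.
fits = {k: fit_mixture(data, k) for k in range(1, 5)}
scores = {k: bic(loglik, k, len(data)) for k, (_, loglik) in fits.items()}
best_k = min(scores, key=scores.get)
print("BIC by number of clusters:", {k: round(s, 1) for k, s in scores.items()})
print("selected number of clusters:", best_k)
```

Each candidate number of clusters corresponds to a separate statistical model, so comparing them needs only the maximized log-likelihood and a complexity penalty. The posterior probabilities computed in the E-step are also what one would use to assign each case to its most likely cluster. In practice, a dedicated mixture-modeling package would handle multivariate data, mixed-scale variables, and missing values, none of which this sketch attempts.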