Soft clustering for information retrieval applications



This paper surveys soft clustering algorithms applied in the context of information retrieval (IR). First, the utility of soft clustering approaches in IR is motivated. Then, the two main flat soft approaches, namely probabilistic clustering and fuzzy clustering, are outlined. Specifically, the expectation maximization and fuzzy c-means algorithms are introduced, together with some of their extensions designed to overcome their main drawbacks when organizing large document collections. Finally, soft hierarchical clustering algorithms designed for generating taxonomies of documents are introduced. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 138-146 DOI: 10.1002/widm.3