6. Data Mining Algorithms I: Clustering

  1. Amiya Nayak, B.Math., Ph.D., Adjunct Research Professor, Associate Editor, Full Professor (affiliation 2) and
  2. Ivan Stojmenović, Ph.D., Chair Professor, founder editor-in-chief (affiliations 2, 3)
  1. Dan A. Simovici

Published Online: 1 MAR 2007

DOI: 10.1002/9780470175668.ch6

Handbook of Applied Algorithms: Solving Scientific, Engineering and Practical Problems

How to Cite

Simovici, D. A. (2008) Data Mining Algorithms I: Clustering, in Handbook of Applied Algorithms: Solving Scientific, Engineering and Practical Problems (eds A. Nayak and I. Stojmenović), John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/9780470175668.ch6

Editor Information

  1. 2

    SITE, University of Ottawa, 800 King Edward Ave., Ottawa, ON K1N 6N5, Canada

  2. 3

    EECE, University of Birmingham, UK

Author Information

  1. Department of Mathematics and Computer Science, University of Massachusetts at Boston, Boston, MA 02125, USA

Publication History

  1. Published Online: 1 MAR 2007
  2. Published Print: 14 FEB 2008

ISBN Information

Print ISBN: 9780470044926

Online ISBN: 9780470175668

Keywords:

  • data mining algorithms I - clustering;
  • ultrametric spaces;
  • PAM algorithm

Summary

Clustering is the process of grouping together objects that are similar. The similarity between objects is evaluated using several types of dissimilarities (in particular, metrics and ultrametrics). After discussing partitions and dissimilarities, two basic mathematical concepts important for clustering, we focus on ultrametric spaces, which play a vital role in hierarchical clustering. Several types of agglomerative hierarchical clustering are examined, with special attention to single-link and complete-link clustering. Among the nonhierarchical algorithms, we present k-means and the PAM algorithm. Kleinberg's well-known impossibility theorem is included to illustrate the limitations of clustering algorithms. Finally, ways of evaluating clustering quality are examined.
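
As an informal illustration of the link between ultrametrics and hierarchical clustering mentioned in the summary, the following is a minimal sketch of naive single-link agglomerative clustering. It is not taken from the chapter. The function names (`single_link_merge_levels`, `is_ultrametric`), the four sample points, and the use of absolute difference as the dissimilarity are all assumptions made for illustration. The sketch records, for each pair of objects, the level at which they first fall into the same cluster, and then checks that these levels satisfy the ultrametric inequality u(x, z) <= max(u(x, y), u(y, z)).

```python
from itertools import combinations, permutations


def single_link_merge_levels(points, dist):
    """Naive single-link agglomerative clustering (illustrative sketch).

    Returns a dict mapping each unordered pair of point indices to the
    dissimilarity level at which the two points first share a cluster.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    levels = {}
    while len(clusters) > 1:
        # Single-link: the dissimilarity between two clusters is the
        # smallest dissimilarity between a member of one and a member
        # of the other. Pick the closest pair of clusters.
        (a, b), level = min(
            (
                ((a, b), min(dist(points[i], points[j]) for i in a for j in b))
                for a, b in combinations(clusters, 2)
            ),
            key=lambda item: item[1],
        )
        # Every cross pair (i in a, j in b) is joined at this level.
        for i in a:
            for j in b:
                levels[frozenset((i, j))] = level
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return levels


def is_ultrametric(n, u):
    """Check u(x, z) <= max(u(x, y), u(y, z)) for all distinct triples."""
    def get(i, j):
        return u[frozenset((i, j))]

    return all(
        get(x, z) <= max(get(x, y), get(y, z))
        for x, y, z in permutations(range(n), 3)
    )


# Hypothetical sample data: four points on the real line.
points = [0.0, 1.0, 3.0, 7.0]
levels = single_link_merge_levels(points, lambda p, q: abs(p - q))
raw = {
    frozenset((i, j)): abs(points[i] - points[j])
    for i, j in combinations(range(len(points)), 2)
}
```

On this sample, the original dissimilarity `raw` is a metric but not an ultrametric (the distance from 0.0 to 3.0 exceeds both intermediate distances via 1.0), whereas the merge levels in `levels` do pass the ultrametric check. The quadratic search for the closest pair of clusters is chosen for readability, not efficiency.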