Chapter 7. Clustering and Classification Methods

Research Methods in Psychology

I. FOUNDATIONS OF RESEARCH ISSUES

  1. Glenn W. Milligan, PhD (1)
  2. Stephen C. Hirtle, PhD (2)

Published Online: 26 SEP 2012

DOI: 10.1002/9781118133880.hop202007

Handbook of Psychology, Second Edition

How to Cite

Milligan, G. W. and Hirtle, S. C. 2012. Clustering and Classification Methods. Handbook of Psychology, Second Edition. 2:I:7.

Author Information

  1. The Ohio State University, Fisher College of Business, Columbus, Ohio, USA
  2. University of Pittsburgh, School of Information Sciences, Pittsburgh, Pennsylvania, USA

Abstract

The chapter by Milligan and Hirtle provides an overview of the current state of knowledge in the field of clustering and classification. Such methods are used to find groups in multivariate data sets. The methods are discussed within the context of exploratory data analysis, though some confirmatory or testing methods are also reviewed. A survey of the issues critical to the analysis of empirical data is presented, along with “best practice” recommendations for the applied user. Coverage includes sections on data preparation, data models, and data representation using distance and similarity measures. The section on clustering algorithms covers a wide range of classification methods, including latent profile analysis, and discusses the known cluster recovery performance of selected clustering methods. The fourth section covers issues important for applied analyses, such as data sampling, variable selection, variable standardization, choosing the number of clusters, and post-classification analysis of the results. Threaded into the discussion are three example applications of the methodology to empirical data. The examples are based on perceived kinship data, animal similarity data, and the classification of single-malt Scotch whiskies.

Keywords:

  • clustering algorithms;
  • classification validation;
  • tree models of data;
  • Monte Carlo methods;
  • latent profile analysis