Volume 87, Issue 1
Original Article

Distance Metrics and Clustering Methods for Mixed‐type Data

Alexander H. Foss

Department of Biostatistics, University at Buffalo, 706 Kimball Tower, Buffalo, 14214 NY, USA

Marianthi Markatou

Corresponding Author

E-mail address: markatou@buffalo.edu

Department of Biostatistics, University at Buffalo, 706 Kimball Tower, Buffalo, 14214 NY, USA

First published: 21 June 2018
Citations: 6

Summary

Despite the abundance of clustering techniques and algorithms, clustering mixed interval (continuous) and categorical (nominal and/or ordinal) scale data remains a challenging problem. To identify the most effective approaches for clustering mixed‐type data, we use both theoretical and empirical analyses to present a critical review of the strengths and weaknesses of the methods identified in the literature. We provide guidelines on which approaches to use under different scenarios, along with potential directions for future research.

