Keywords:

  • clustering;
  • Hamiltonian path;
  • multidimensional scaling;
  • multivariate mode;
  • runs test;
  • shortest path;
  • Traveling Salesman Tour;
  • low-dimensional projection

Abstract

We introduce the ‘snake’, a new tool for the visualization and exploration of a multivariate dataset. The snake connects all the data points along a single short path. Using techniques from the Traveling Salesman Problem (TSP), it is possible to find such a path in polynomial (nearly quadratic) computational time. A plot of the individual segment lengths versus their position along the path transforms the original multidimensional dataset into a one-dimensional ‘time series’ of interpoint distances. The snake traces the local structure of a datacloud, so this visualization is most useful for detecting density fluctuations: regions of high density appear as many consecutive short segments, while regions of low density appear as many consecutive long segments. Dips in the time series reveal the presence of clustering and can be used to count the number of modes in the datacloud. We illustrate the technique on a variety of artificial and real-world datasets.

Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 236-252, 2010
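The idea of turning a datacloud into a sequence of segment lengths can be illustrated with a minimal sketch. It does not reproduce the authors' TSP-based construction: as a stand-in it assumes a greedy nearest-neighbour heuristic, which also runs in quadratic time but generally yields a longer path. The function names `snake_path` and `segment_lengths` and the toy two-cluster data are hypothetical, chosen only for illustration.

```python
import math
import random


def snake_path(points, start=0):
    """Greedy nearest-neighbour path through all points, O(n^2).

    Assumption: a simple stand-in for a TSP heuristic, not the
    construction used in the paper.
    """
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    order = [start]
    while unvisited:
        last = points[order[-1]]
        # Move to the closest point that has not been visited yet.
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order


def segment_lengths(points, order):
    """Lengths of consecutive segments along the path (the 'time series')."""
    return [math.dist(points[a], points[b]) for a, b in zip(order, order[1:])]


def cluster(rng, center, n, sd=0.1):
    """Draw n points from an isotropic Gaussian around center (toy data)."""
    return [tuple(c + rng.gauss(0, sd) for c in center) for _ in range(n)]


# Toy example: two tight, well-separated clusters in three dimensions.
rng = random.Random(0)
points = cluster(rng, (0, 0, 0), 30) + cluster(rng, (10, 10, 10), 30)

order = snake_path(points)
lengths = segment_lengths(points, order)

# Long segments mark low-density gaps between high-density regions.
long_segments = [i for i, d in enumerate(lengths) if d > 5]
print("positions of long segments:", long_segments)
```

In this toy setting the runs of short segments correspond to the two dense clusters, and the single long segment marks the gap between them. Plotting `lengths` against position along the path would give the one-dimensional series of interpoint distances described in the abstract.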