Keywords:

  • Compositional data analysis;
  • Correspondence analysis;
  • Dataset heterogeneity;
  • Dissimilarity measures;
  • Divisive classification;
  • Heterogeneity measures;
  • Hierarchical classification;
  • Numerical classification;
  • Plant community;
  • Vegetation classification

Abstract

Aim: To propose a modification of the TWINSPAN algorithm that enables production of divisive classifications that better respect the structure of the data.

Methods: The proposed modification combines the classical TWINSPAN algorithm with an analysis of cluster heterogeneity prior to each division. Four heterogeneity measures were tested: Whittaker's beta, total inertia, average Sørensen dissimilarity and average Jaccard dissimilarity. Their performance was evaluated using empirical vegetation datasets with different numbers of plots and different levels of heterogeneity.
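
As an illustration of three of the measures named above, the following minimal Python sketch computes them for presence/absence data. It is not taken from the paper: the function names and the representation of a plot as a set of species names are assumptions made here, and Whittaker's beta is taken as total species richness divided by mean plot richness (some formulations subtract 1). Total inertia is omitted because it requires a correspondence analysis of the full plot-by-species table.

```python
from itertools import combinations

# Assumption: each plot is a non-empty set of species names (presence/absence
# only), and a cluster is a list of at least two such plots.

def whittaker_beta(plots):
    """Total species count (gamma) divided by mean species count per plot."""
    gamma = len(set().union(*plots))
    mean_alpha = sum(len(p) for p in plots) / len(plots)
    return gamma / mean_alpha

def jaccard_dissimilarity(a, b):
    """1 minus shared species over all species found in either plot."""
    return 1 - len(a & b) / len(a | b)

def sorensen_dissimilarity(a, b):
    """1 minus twice the shared species over the sum of both plot richnesses."""
    return 1 - 2 * len(a & b) / (len(a) + len(b))

def average_dissimilarity(plots, dissimilarity):
    """Mean of a pairwise dissimilarity over all pairs of plots in a cluster."""
    pairs = list(combinations(plots, 2))
    return sum(dissimilarity(a, b) for a, b in pairs) / len(pairs)
```

For a cluster of only two plots the average is a single pairwise value, which gives some intuition for why such averages can be extreme in small clusters.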

Results: While the classical TWINSPAN algorithm divides each cluster coming from the previous division step, the modified algorithm divides only the most heterogeneous cluster in each step. The four tested heterogeneity measures may produce identical or very similar results. However, average Jaccard and Sørensen dissimilarities may reach extreme values in small clusters and may produce classifications with highly unbalanced cluster sizes.
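
The control flow described above (divide only the most heterogeneous cluster in each step) can be sketched as follows. This is a hedged outline, not the authors' implementation: `modified_divisive`, `heterogeneity` and `split` are hypothetical names, and the real TWINSPAN dichotomy is not implemented here. The two toy functions at the end are stand-ins that operate on plain numbers purely to make the loop runnable.

```python
def modified_divisive(plots, n_clusters, heterogeneity, split):
    """Repeatedly split the most heterogeneous cluster until n_clusters exist.

    heterogeneity: callable mapping a cluster (list) to a number.
    split: callable mapping a cluster to two sub-clusters (stand-in for a
           TWINSPAN dichotomy, which is NOT implemented in this sketch).
    """
    clusters = [list(plots)]
    while len(clusters) < n_clusters:
        # Only clusters with at least two members can be divided.
        candidates = [i for i, c in enumerate(clusters) if len(c) > 1]
        if not candidates:
            break
        i = max(candidates, key=lambda j: heterogeneity(clusters[j]))
        left, right = split(clusters[i])
        if not left or not right:
            break  # the split failed to separate anything; stop
        clusters[i:i + 1] = [left, right]
    return clusters

# Toy stand-ins (assumptions for illustration only): each "plot" is a number,
# heterogeneity is the value range, and a split cuts at the range midpoint.
def toy_heterogeneity(cluster):
    return max(cluster) - min(cluster)

def toy_split(cluster):
    mid = (min(cluster) + max(cluster)) / 2
    return [x for x in cluster if x <= mid], [x for x in cluster if x > mid]

result = modified_divisive([1, 2, 3, 10, 11, 50], 3, toy_heterogeneity, toy_split)
# result == [[1, 2, 3], [10, 11], [50]]
```

Because the loop stops as soon as the requested number of clusters is reached, any number of terminal clusters can be requested, not only powers of two.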

Conclusions: The proposed modification does not alter the logic of the TWINSPAN classification, but it may change the hierarchy of divisions in the final classification. Thus, unsubstantiated divisions of homogeneous clusters are prevented, and classifications with any number of terminal clusters can be created, which increases the flexibility of TWINSPAN.