Multivariate exploratory analysis of ordinal data in ecology: Pitfalls, problems and solutions
Article first published online: 24 FEB 2009
© 2005 IAVS - the International Association of Vegetation Science
Journal of Vegetation Science
Volume 16, Issue 5, pages 497–510, October 2005
How to Cite
Podani, J. (2005), Multivariate exploratory analysis of ordinal data in ecology: Pitfalls, problems and solutions. Journal of Vegetation Science, 16: 497–510. doi: 10.1111/j.1654-1103.2005.tb02390.x
- Issue published online: 24 FEB 2009
- Received 15 September 2004; Accepted 3 September 2005
Keywords:
- Multidimensional scaling;
- Ordinal measure.
Question:
Are ordinal data appropriately treated by multivariate methods in numerical ecology? If not, what are the most common mistakes? Which dissimilarity coefficients, ordination and classification methods are best suited to ordinal data? Should we worry about such problems at all?
Methods:
A new family of classification models, OrdClAn (Ordinal Cluster Analysis), is proposed for hierarchical and non-hierarchical classification of ordinal ecological data, e.g. the abundance/dominance scores commonly recorded in relevés. During clustering, objects are grouped so as to minimize a measure calculated from the ranks of within-cluster and between-cluster distances or dissimilarities.
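The clustering objective described above can be illustrated with a minimal sketch. The abstract does not give the exact OrdClAn formula, so the criterion below is an assumption: all pairwise dissimilarities are pooled and ranked, and the criterion is the proportion of (within-cluster, between-cluster) comparisons in which the within-cluster dissimilarity is larger, with ties counting one half (the Mann-Whitney U of the within-cluster group, divided by the number of comparisons). The names `rank_criterion` and `greedy_agglomerate` and the greedy merging strategy are hypothetical illustrations, not the author's implementation.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ascending ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values that begins at position `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_criterion(d, labels):
    """Hypothetical rank-based criterion (an assumption, not the published formula).

    Ranks all pairwise dissimilarities of the symmetric matrix `d` together and
    returns the share of (within, between) comparisons in which the within-cluster
    dissimilarity is larger, ties counting 1/2: U_within / (n_w * n_b).
    """
    pairs = list(combinations(range(len(labels)), 2))
    ranks = average_ranks([d[i][j] for i, j in pairs])
    within = [r for (i, j), r in zip(pairs, ranks) if labels[i] == labels[j]]
    n_w, n_b = len(within), len(pairs) - len(within)
    if n_w == 0 or n_b == 0:
        return 0.0  # undefined without both kinds of pairs; 0.0 is an arbitrary placeholder
    u_within = sum(within) - n_w * (n_w + 1) / 2
    return u_within / (n_w * n_b)


def greedy_agglomerate(d, k):
    """Hypothetical hierarchical sketch: repeatedly merge the two clusters whose
    union gives the lowest rank_criterion, until k clusters remain."""
    labels = list(range(len(d)))
    while len(set(labels)) > k:
        best = None
        for a, b in combinations(sorted(set(labels)), 2):
            trial = [a if lab == b else lab for lab in labels]
            score = rank_criterion(d, trial)
            if best is None or score < best[0]:
                best = (score, trial)
        labels = best[1]
    return labels


# Six objects on a line: two tight groups far apart.
points = [0, 1, 2, 10, 11, 12]
d = [[abs(a - b) for b in points] for a in points]
print(greedy_agglomerate(d, 2))  # [0, 0, 0, 3, 3, 3]
```

Because this criterion only compares dissimilarities with each other, replacing `d` by any strictly increasing transform of it (for example, cubing every entry) leaves the resulting partition unchanged, which is the kind of order invariance the abstract argues for.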
Results and Conclusions:
Evaluation of the various steps of exploratory analysis of ordinal ecological data shows that methodological consistency throughout the study is of primary importance. Ideally, each methodological step is order invariant. This property ensures that the results are unaffected by changes that preserve ordinal relationships, and guarantees that no illusory precision is introduced into the analysis. However, the multivariate procedures most commonly applied in numerical ecology do not satisfy these requirements and are therefore not recommended. For example, it is inappropriate to analyse Braun-Blanquet abundance/dominance data by methods that assume Euclidean distance is meaningful. The proposed solution is that the dissimilarity coefficient should be compatible with ordinal variables, and the subsequent ordination or clustering method should consider only the rank order of dissimilarities. A range of artificial data sets exemplifying different subtypes of ordinal variables, e.g. indicator values or species scores from relevés, illustrates the advocated approach. Detailed analyses of an actual phytosociological data set demonstrate the OrdClAn classification of relevés and species and the subsequent tabular rearrangement, in a numerical study that remains within the ordinal domain from the first step to the last.
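The order-invariance argument can be checked on a toy table. The sketch below assumes a hypothetical table of 3 relevés by 2 species scored on an ordinal 1 to 5 scale, and a hypothetical strictly increasing recoding of those scores into cover-like values (the mapping is invented for illustration, not a published conversion). It compares Euclidean distance with a simple Gower-type coefficient computed on within-species ranks; that coefficient is an illustrative choice, not necessarily the one the article recommends.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ascending ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values that begins at position `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def euclidean_all(table):
    """Euclidean distance for every pair of rows (relevés)."""
    return {(i, j): math.dist(table[i], table[j])
            for i, j in combinations(range(len(table)), 2)}


def rank_gower_all(table):
    """Gower-type dissimilarity on within-species ranks (illustrative choice).

    Each column is replaced by the ranks of its values across relevés, so any
    strictly increasing recoding of a column leaves the result unchanged.
    Rank differences are divided by the column's rank range and averaged over
    columns; constant columns contribute 0.
    """
    n_var = len(table[0])
    cols = [average_ranks([row[v] for row in table]) for v in range(n_var)]
    ranges = [max(c) - min(c) for c in cols]
    result = {}
    for i, j in combinations(range(len(table)), 2):
        total = sum(abs(c[i] - c[j]) / r for c, r in zip(cols, ranges) if r > 0)
        result[(i, j)] = total / n_var
    return result


def pair_order(dists):
    """Pairs of relevés sorted from most to least similar."""
    return sorted(dists, key=dists.get)


# Hypothetical relevé table: 3 relevés x 2 species, ordinal scores 1..5.
codes = [[1, 2], [3, 2], [4, 3]]
# Hypothetical strictly increasing recoding into cover-like values.
recode = {1: 1, 2: 3, 3: 15, 4: 40, 5: 90}
cover = [[recode[x] for x in row] for row in codes]

print(pair_order(euclidean_all(codes)))  # [(1, 2), (0, 1), (0, 2)]
print(pair_order(euclidean_all(cover)))  # [(0, 1), (1, 2), (0, 2)]
print(rank_gower_all(codes) == rank_gower_all(cover))  # True
```

Under Euclidean distance the nearest neighbour of relevé 1 switches from relevé 2 to relevé 0 merely because the ordinal scores were recoded, whereas the rank-based coefficient returns identical values for both codings. A method that then uses only the rank order of these dissimilarities keeps the whole analysis within the ordinal domain.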