Evaluating classification strategies



Abstract. This contribution is a comment on the paper by Belbin & McDonald (1993) on the comparison of three classification strategies for use in ecology. There are two problems in evaluating clustering methods: whether the sample adequately reflects the population structure, and what the nature of the clusters sought is. First, one has to decide on the number of clusters to be obtained; possibly the best approach of all is the Bayesian coding theory for inductive inference. The choice may also depend on the objectives of the clustering, which can be manifold. Phytosociologists do not agree on the nature of the clusters they seek, and are reluctant to provide a formal definition of their clusters. As a method for identifying gradients, Correspondence Analysis has had some success, so that a classification method largely based on it, notably TWINSPAN, may better reflect what phytosociologists are intuitively seeking than alternative variance minimisation methods. Additionally, TWINSPAN incorporates the characterisation of clusters through indicator species. Perhaps we are more interested in these differentiating species than in the existence of clusters per se.