## Introduction

Methods of classifying species and samples from multivariate species occurrence data were much investigated in the 1960s and 1970s. A distinction was made between Q-mode methods, in which the samples or stands were clustered, and R-mode methods, in which the species were clustered. Occasionally, as in Lambert & Williams's (1962) nodal analysis and Hill's (1979) program Twinspan, both samples and species were clustered, one after the other. By the end of the 1970s, it was accepted that the correct procedure was to classify the samples first. R-mode methods were in eclipse.

More recently, in the period 1995–2010, there has been renewed interest in numerical classification, mainly in the fields of text mining (Manning, Raghavan & Schütze 2008) and genomics.

Along with the general increase of interest in numerical classification, two-way classification has received increased attention. Two-way classification is variously known as biclustering (Madeira & Oliveira 2004; Gupta & Aggarwal 2010), co-clustering (Banerjee *et al*. 2007; Jain 2010) or two-mode clustering (Van Mechelen, Bock & De Boeck 2004; Schepers & Van Mechelen 2011; Hageman, Malosetti & van Eeuwijk 2012). The term biclustering, used here, was apparently introduced by Mirkin (1996), who cites Twinspan as an example. There has, however, been little flow of methodologies from text mining and bioinformatics into ecology.

A promising approach to clustering and biclustering is to treat these methods as fitting models to a data matrix. An interesting example is set out by Martella & Vichi (2012). They and several other authors (ter Braak *et al*. 2009; Schepers & Van Mechelen 2011) use the least-squares criterion to approximate either a raw matrix or a similarity matrix. Approximations to a raw matrix based on unweighted least squares are generally not suitable for occurrence data in ecology and biogeography. We set out a crude multiplicative model for such data, but use it only as a means of estimating the Akaike Information Criterion to select the numbers of clusters.

Our interest in R-mode clustering was rekindled during a study of European plant distributions (Finnie *et al*. 2007). For this purpose, we compared species distributions with cluster centroids, using the cosine measure of similarity. This measure is widely used in text mining (Manning, Raghavan & Schütze 2008). Finnie's (2007) clustering algorithm was agglomerative, building up clusters from pairs of similar individual species. It was rather complicated and had some arbitrary parameters. Therefore, in a subsequent study of British and Irish liverworts (Preston, Harrower & Hill 2011), we used a simpler method, which we called Clustaspec. It starts by being agglomerative, and continues with a second phase in which the smallest clusters are systematically removed and their species distributed to larger ones. When Clustaspec was applied to other datasets, it usually gave good results, but it had a tendency to generate small clusters of rare species confined to special habitats. We were not entirely satisfied with it.

Both Finnie's (2007) method and Clustaspec tidied up the final clustering by means of an iterative relocation algorithm, by which each species was allocated to the nearest cluster centre, repeating the process until stability was reached. For clustering in Euclidean space, this method is known as the k-means algorithm (Krishna & Murty 1999). Finnie's algorithm and Clustaspec defined proximity in terms of the cosine similarity measure. Their relocation algorithm was therefore a case of the spherical k-means (SKM) algorithm, whose properties have been investigated by Vinh (2008). There is, however, an important difference. In the SKM algorithm described by Vinh, the objects to be clustered are first projected onto the surface of the unit hypersphere, and are thereafter clustered by the SKM algorithm. In the algorithm used by us, the unit hypersphere was not considered, the cluster centres being calculated simply as the centroids of untransformed vectors. As explained below, this amounts to weighted SKM, with weights proportional to the length of the untransformed vectors. The weights make a big difference.
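
The relation between the two variants can be illustrated with a minimal Python sketch. It is not the code of Finnie's method or of Clustaspec; the function names `unit_rows` and `relocate`, the `weighted` switch and the toy matrix are assumptions made purely for illustration. The sketch relies on the identity that the sum of untransformed vectors equals the sum of their unit vectors weighted by their lengths, so a centroid of untransformed vectors points in the same direction as a length-weighted centroid on the hypersphere.

```python
import numpy as np

def unit_rows(X):
    """Scale each row to unit Euclidean length; all-zero rows are left as zeros."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms > 0, norms, 1.0)

def relocate(X, labels, k, weighted=True, max_iter=100):
    """Iterative relocation using cosine similarity (illustrative sketch).

    weighted=True  : centres are built from the untransformed rows of X
    weighted=False : centres are built from rows projected onto the unit hypersphere
    Rows of X (species) are assumed to be non-zero.
    """
    U = unit_rows(X)
    source = X if weighted else U
    labels = np.asarray(labels).copy()
    for _ in range(max_iter):
        # A sum has the same direction as a mean, so sums suffice for cosines.
        # An empty cluster yields a zero centre, which has cosine 0 with every row.
        centres = np.vstack([source[labels == c].sum(axis=0) for c in range(k)])
        new_labels = (U @ unit_rows(centres).T).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # stability reached
        labels = new_labels
    return labels

# Hypothetical occurrence matrix: 5 species (rows) by 6 samples (columns).
X = np.array([[1, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 0, 0, 1, 1],
              [1, 0, 0, 0, 0, 0]], dtype=float)

# Sum of raw rows equals the length-weighted sum of unit rows.
lengths = np.linalg.norm(X, axis=1, keepdims=True)
same_direction = np.allclose(X.sum(axis=0), (lengths * unit_rows(X)).sum(axis=0))

final_labels = relocate(X, [0, 0, 1, 1, 1], k=2)
```

In this toy example the fifth species, initially placed in the second cluster, is moved to the first. With `weighted=True`, widespread species (long vectors) pull the centres towards themselves more strongly than rare species do.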

In Clustaspec, we used the SKM algorithm merely for tidying up the clusters. Vinh (2008) shows that the SKM algorithm will converge to a local optimum of the SKM objective function, defined as the sum of squared chord distances between cluster centres and individual cluster vectors. He also points out that there are very many such local optima, so many that the quest for the global optimum can be very arduous. For this quest, we have devised an algorithm based on ‘key species’. These are defined as those species that are most closely aligned to the cluster centres. Key species were used by Finnie *et al*. (2007) and Preston, Harrower & Hill (2011) to name the clusters. In the algorithm described below, they are used also to initiate the clusters.
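
As a rough illustration of these two definitions, the following Python sketch evaluates the unweighted objective for a given clustering and picks one key species per cluster. The function name `skm_summary` and the toy data are hypothetical, and the sketch assumes that the key species is sought among the members of each cluster and that every species vector is non-zero. For unit vectors, the squared chord distance to a unit-length centre equals 2 − 2 cos θ, where θ is the angle between them.

```python
import numpy as np

def skm_summary(X, labels, k):
    """Return (objective, key_species) for a clustering of the rows of X.

    objective   : sum of squared chord distances between each projected row
                  and the unit-length centre of its cluster (unweighted form)
    key_species : for each cluster, the index of the member row with the
                  highest cosine to the cluster centre (None if cluster is empty)
    """
    labels = np.asarray(labels)
    U = X / np.linalg.norm(X, axis=1, keepdims=True)  # project onto unit hypersphere
    objective = 0.0
    key_species = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            key_species.append(None)
            continue
        centre = U[members].sum(axis=0)
        centre = centre / np.linalg.norm(centre)
        cosines = U[members] @ centre
        objective += float(np.sum(2.0 - 2.0 * cosines))  # squared chord distances
        key_species.append(int(members[np.argmax(cosines)]))
    return objective, key_species

# Hypothetical occurrence matrix: 6 species (rows) by 6 samples (columns).
X = np.array([[1, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 0, 1]], dtype=float)

objective, keys = skm_summary(X, [0, 0, 0, 1, 1, 1], k=2)
```

Here the key species are rows 1 and 4, each of intermediate range within its cluster. A lower objective indicates species vectors that are more tightly aligned with their cluster centres, so competing local optima can be compared by this value.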