The two essential considerations for the test are: (i) the construction of the test-statistic, and (ii) the calculation of a P-value using some method of permutation. I will describe the method, which I shall simply call non-parametric MANOVA, first for the one-way design and then for more complex designs, followed by some ecological examples. I deal here only with the case of balanced ANOVA designs, but analogous statistics for any linear model, including multiple regression and/or unbalanced data, can be constructed, as described by McArdle and Anderson (in press).
The test statistic: an F-ratio
The essence of analysis of variance is to compare variability within groups versus variability among different groups, by means of an F-ratio. The larger the value of F, the more likely it is that the null hypothesis (H0) of no differences among the group means (i.e. locations) is false. For univariate ANOVA, partitioning of the total sum of squares, SST, is achieved by calculating sums of squared differences (i) between individual replicates and their group mean (SSW, the within-group sum of squares; Table 1a), and (ii) between group means and the overall sample mean (SSA, the among-group sum of squares). Next, consider the multivariate case where p variables are measured simultaneously for each of n replicates in each of a groups, yielding a matrix of data where rows are observations and columns are variables. A natural multivariate analogue may be obtained by simply adding up the sums of squares across all variables (Table 1b). An F-ratio can then be constructed, as in the univariate case.
Table 1. Calculations of within-group sums of squares for partitioning in (a) univariate ANOVA, (b) a multivariate analogue obtained by summing across variables, (c) a multivariate analogue equivalent to (b) obtained using sums of squared Euclidean distances, (d) the traditional MANOVA approach, which yields an entire matrix (W) of within-group sums of squares and cross products, and (e) the partitioning using inter-point distances advocated here, equivalent to (b) and (c) if Euclidean distances are used
This multivariate analogue can also be thought of geometrically (e.g. Caliński & Harabasz 1974; Mielke et al. 1976; Edgington 1995; Pillar & Orlóci 1996), as shown in Fig. 1 for the case of two groups and two variables (dimensions). Here, SSW is the sum of the squared Euclidean distances between each individual replicate and its group centroid (the point corresponding to the averages for each variable, Fig. 1 and Table 1c). Note that this additive partitioning using a geometric approach yields one value for each of SSW, SSA and SST as sums of squared Euclidean distances. This geometric approach gives sums of squares equivalent to the sum of the univariate sums of squares (added across all variables) described in the previous paragraph. This differs from the traditional MANOVA approach, where partitioning is done for an entire matrix of sums of squares and cross-products (e.g. Mardia et al. 1979; Table 1d).
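As a quick numerical check of this equivalence, the two routes to SSW can be computed side by side on a small invented data set (the numbers and code below are my own illustrative sketch, not material from the paper):

```python
# Verify that summing univariate within-group sums of squares across
# p variables equals the sum of squared Euclidean distances from
# replicates to their group centroid. Data are made up for illustration:
# a = 2 groups, n = 3 replicates, p = 2 variables.
groups = {
    "A": [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)],
    "B": [(6.0, 5.0), (7.0, 7.0), (8.0, 6.0)],
}

# (i) univariate route: SSW computed per variable, then summed over variables
ssw_univariate = 0.0
for reps in groups.values():
    p = len(reps[0])
    for v in range(p):
        vals = [r[v] for r in reps]
        mean = sum(vals) / len(vals)
        ssw_univariate += sum((x - mean) ** 2 for x in vals)

# (ii) geometric route: squared Euclidean distances to each group centroid
ssw_geometric = 0.0
for reps in groups.values():
    p = len(reps[0])
    centroid = [sum(r[v] for r in reps) / len(reps) for v in range(p)]
    for r in reps:
        ssw_geometric += sum((r[v] - centroid[v]) ** 2 for v in range(p))

print(ssw_univariate, ssw_geometric)  # the two routes agree
```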
Figure 1. A geometric representation of MANOVA for two groups in two dimensions where the groups differ in location. The within-group sum of squares is the sum of squared distances from individual replicates to their group centroid. The among-group sum of squares is the sum of squared distances from group centroids to the overall centroid. (——) Distances from points to group centroids; (.......) distances from group centroids to overall centroid; (⋆), overall centroid; (), group centroid; (●), individual observation.
The key to the non-parametric method described here is that the sum of squared distances between points and their centroid is equal to (and can be calculated directly from) the sum of squared interpoint distances divided by the number of points. This important relationship is illustrated in Fig. 2 for points in two dimensions. The relationship between distances to centroids and interpoint distances for the Euclidean measure has been known for a long time (e.g. Kendall & Stuart 1963; Gower 1966; Caliński & Harabasz 1974; Seber 1984; Pillar & Orlóci 1996; Legendre & Legendre 1998; see also equation B.1 in Appendix B of Legendre & Anderson 1999). What is important is the implication this has for analyses based on non-Euclidean distances. Namely, an additive partitioning of sums of squares can be obtained for any distance measure directly from the distance matrix, without calculating the central locations of groups.
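This identity is easy to verify numerically on a small made-up configuration of points (a sketch of mine, not the paper's):

```python
# Check that the sum of squared distances from points to their centroid
# equals the sum of squared interpoint distances divided by the number
# of points. The coordinates are arbitrary illustrative values.
import itertools

points = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0), (3.0, 3.0)]
n = len(points)

# sum of squared distances from each point to the centroid
centroid = [sum(p[k] for p in points) / n for k in range(2)]
ss_to_centroid = sum(
    sum((p[k] - centroid[k]) ** 2 for k in range(2)) for p in points
)

# sum of squared distances over all unordered pairs of points
ss_interpoint = sum(
    sum((pi[k] - pj[k]) ** 2 for k in range(2))
    for pi, pj in itertools.combinations(points, 2)
)

print(ss_to_centroid, ss_interpoint / n)  # equal
```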
Figure 2. The sum of squared distances from individual points to their centroid is equal to the sum of squared interpoint distances divided by the number of points.
Why is this important? In the case of an analysis based on Euclidean distances, the average for each variable across the observations within a group constitutes the measure of central location for the group in Euclidean space, called a centroid. For many distance measures, however, the calculation of a central location may be problematic. For example, in the case of the semimetric Bray–Curtis measure, a simple average across replicates does not correspond to the ‘central location’ in multivariate Bray–Curtis space. An appropriate measure of central location on the basis of Bray–Curtis distances cannot easily be calculated directly from the data. This is why additive partitioning (in terms of ‘average’ differences among groups) has not previously been achieved using Bray–Curtis (or other semimetric) distances. However, the relationship shown in Fig. 2 can be applied to achieve the partitioning directly from interpoint distances.
Thus, consider a matrix of distances between every pair of observations (Fig. 3a). If we let N = an, the total number of observations (points), and let dij be the distance between observation i = 1,…, N and observation j = 1,…, N, the total sum of squares is

$$SS_T = \frac{1}{N}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} d_{ij}^{2} \qquad (1)$$
Figure 3. Schematic diagram for the calculation of (a) a distance matrix from a raw data matrix and (b) a non-parametric MANOVA statistic for a one-way design (two groups) directly from the distance matrix. SST, sum of squared distances in the half matrix divided by N (total number of observations); SSW, sum of squared distances within groups divided by n (number of observations per group). SSA = SST − SSW and F = [SSA/(a − 1)]/[SSW/(N − a)], where a = the number of groups.
That is, add up the squares of all of the distances in the subdiagonal (or upper-diagonal) half of the distance matrix (not including the diagonal) and divide by N (Fig. 3b). In a similar fashion, the within-group or residual sum of squares is

$$SS_W = \frac{1}{n}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} d_{ij}^{2}\,\epsilon_{ij} \qquad (2)$$
where εij takes the value 1 if observation i and observation j are in the same group, otherwise it takes the value of zero. That is, add up the squares of all of the distances between observations that occur in the same group and divide by n, the number of observations per group (Fig. 3b). Then SSA = SST − SSW, and a pseudo F-ratio to test the multivariate hypothesis is

$$F = \frac{SS_A/(a-1)}{SS_W/(N-a)} \qquad (3)$$
If the points from different groups have different central locations (centroids in the case of Euclidean distances) in multivariate space, then the among-group distances will be relatively large compared to the within-group distances, and the resulting pseudo F-ratio will be relatively large.
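Putting equations (1)–(3) together, the one-way calculation can be sketched in a few lines of code (the function name `pseudo_f` and the toy data are my own, not the paper's; Euclidean distance is used for simplicity here, but any distance matrix could be supplied in its place):

```python
# One-way non-parametric MANOVA pseudo F-ratio computed directly from a
# distance matrix, following equations (1)-(3): SST from all pairs / N,
# SSW from same-group pairs / n, SSA by difference.
import math

def pseudo_f(d, group_labels, n_per_group):
    """d: N x N symmetric distance matrix; group_labels: length-N list."""
    N = len(group_labels)
    a = len(set(group_labels))
    ss_t = sum(d[i][j] ** 2 for i in range(N - 1) for j in range(i + 1, N)) / N
    ss_w = sum(
        d[i][j] ** 2
        for i in range(N - 1)
        for j in range(i + 1, N)
        if group_labels[i] == group_labels[j]  # epsilon_ij = 1
    ) / n_per_group
    ss_a = ss_t - ss_w
    return (ss_a / (a - 1)) / (ss_w / (N - a))

# made-up data: a = 2 groups, n = 3 observations each, p = 2 variables
data = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (6.0, 5.0), (7.0, 7.0), (8.0, 6.0)]
labels = ["A", "A", "A", "B", "B", "B"]
d = [[math.dist(x, y) for y in data] for x in data]
print(pseudo_f(d, labels, n_per_group=3))  # large F: well-separated groups
```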
One can calculate the sums of squares in equations (1) and (2) and the statistic in equation (3) from a distance matrix obtained using any distance measure. The statistic in equation (3) corresponds exactly to the statistic in equation (4) of McArdle and Anderson (in press), who have shown more generally how partitioning for any linear model can be done directly from the distance matrix, regardless of the distance measure used. Another important aspect of the statistic described above is that, in the case of a Euclidean distance matrix calculated from only one variable, equation (3) gives the same value as the traditional parametric univariate F-statistic.
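That final equivalence is straightforward to check numerically: with a single variable and Euclidean (i.e. absolute-difference) distances, the distance-based statistic and the classical univariate F-statistic coincide. The data below are invented for illustration; this is a sketch under that assumption, not code from the paper.

```python
# Single variable, a = 2 groups, n = 4 replicates each (made-up values).
y = {"A": [3.1, 2.8, 3.6, 2.9], "B": [4.0, 4.4, 3.9, 4.5]}

a = len(y)
n = 4
N = a * n

# classical univariate ANOVA F-ratio
grand = sum(sum(v) for v in y.values()) / N
ss_w = sum(sum((x - sum(v) / n) ** 2 for x in v) for v in y.values())
ss_a = sum(n * (sum(v) / n - grand) ** 2 for v in y.values())
f_classic = (ss_a / (a - 1)) / (ss_w / (N - a))

# distance-based pseudo F from the matrix of |y_i - y_j|
obs = [(x, g) for g, v in y.items() for x in v]
ss_t_d = sum(
    (obs[i][0] - obs[j][0]) ** 2 for i in range(N - 1) for j in range(i + 1, N)
) / N
ss_w_d = sum(
    (obs[i][0] - obs[j][0]) ** 2
    for i in range(N - 1)
    for j in range(i + 1, N)
    if obs[i][1] == obs[j][1]
) / n
f_distance = ((ss_t_d - ss_w_d) / (a - 1)) / (ss_w_d / (N - a))

print(f_classic, f_distance)  # the two statistics agree
```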
This is proposed as a new non-parametric MANOVA statistic that is intuitively appealing, due to its analogy with univariate ANOVA, and that is extremely relevant for ecological applications. The results (in terms of sums of squares, mean squares and pseudo F-ratios) obtained for individual terms in a multivariate analysis can be interpreted in the same way as they usually are for univariate ANOVA. The difference is that the hypothesis being tested for any particular term is a multivariate hypothesis.