## Introduction

Field, Clarke & Warwick (1982) described a non-parametric multivariate strategy for analysing multispecies distribution patterns or, in other words, changes in species composition. Although formal tests for differences in species composition between groups of samples did not form part of the original strategy, as pointed out by Clarke & Green (1988) many community data possess some a priori defined structure within a set of samples, for example replicates from a number of different sites (and/or times). A prerequisite to interpreting community differences between sites should be a demonstration that there are statistically significant differences to interpret. Within the non-parametric similarity-based framework for analysis of assemblage data such tests generally involve ‘distance’ matrix comparisons, either explicitly or implicitly.

One method for comparing two (symmetric) similarity or distance matrices, computed for the same objects, is the Mantel test (Mantel 1967). First, a statistic is calculated between corresponding elements in the two lower-triangular matrices. The basic form of the Mantel statistic is the sum of cross-products of corresponding elements in the two similarity or distance matrices. Values may or may not be standardized, or transformed into ranks, before computing the statistic (Legendre & Legendre 1998). Transforming distances into ranks before computing a standardized Mantel statistic is equivalent to computing a Spearman rank correlation (ρ) between corresponding values in the two matrices. As suggested by Dietz (1983) and Hubert (1985), using such a non-parametric rank correlation has several advantages. Primarily, it diminishes the influence of large differences in values and is appropriate for non-linear monotonic relationships (Somerfield & Gage 2000).
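The rank form of the statistic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the two small matrices (`biotic`, `transect`) are hypothetical, tied values are given average ranks, and both matrices are assumed to contain some variation among their off-diagonal elements.

```python
from math import sqrt

def lower_triangle(matrix):
    """Return the below-diagonal elements of a square symmetric matrix, row by row."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Product-moment correlation (assumes neither input is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rank_mantel(matrix_a, matrix_b):
    """Spearman correlation between corresponding lower-triangular elements."""
    ranks_a = average_ranks(lower_triangle(matrix_a))
    ranks_b = average_ranks(lower_triangle(matrix_b))
    return pearson(ranks_a, ranks_b)

# Hypothetical example: four samples and their positions 0, 1, 2, 3 on a transect.
biotic = [
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.3, 0.6],
    [0.5, 0.3, 0.0, 0.4],
    [0.9, 0.6, 0.4, 0.0],
]
transect = [[abs(i - j) for j in range(4)] for i in range(4)]
rho = rank_mantel(biotic, transect)
print(round(rho, 3))
```

Because only ranks enter the calculation, any monotonic rescaling of either matrix leaves the value unchanged, which is the property that makes the rank form suitable for non-linear monotonic relationships.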

Whichever statistic is calculated, a decision needs to be made whether to accept or reject the null hypothesis. Standard tables for testing correlations such as ρ are invalidated by the lack of independence of elements of a similarity matrix. More usually, the significance of the statistic in a particular test is determined by a permutation procedure (Hope 1968). One such procedure is to reallocate object-labels randomly in one of the matrices a number of times, recalculating the relevant statistic each time, in order to construct the distribution of the statistic under the null hypothesis. The actual value is then compared to this reference distribution and, as with any statistical test, the null hypothesis is rejected for observed values of the test statistic in the upper *P* tail of this distribution. We adopt the modern practice of quoting the actual value of *P*, the probability of obtaining a value of the test statistic as large as, or larger than, the observed value if the null hypothesis is true. Note that the test is one-sided because an appropriate alternative hypothesis is that intergroup dissimilarities are greater than intragroup dissimilarities (two-sided alternatives can be of interest in some special cases, see Chapman & Underwood 2000).
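The permutation procedure described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names and example matrices are hypothetical, the statistic used is the basic (unstandardized) sum of cross-products, and the convention of counting the observed arrangement as one member of the reference distribution (hence the `+ 1` terms) is an assumption of the sketch.

```python
import random

def cross_product_statistic(matrix_a, matrix_b):
    """Basic Mantel statistic: sum of products of corresponding lower-triangular elements."""
    n = len(matrix_a)
    return sum(matrix_a[i][j] * matrix_b[i][j] for i in range(n) for j in range(i))

def permute_labels(matrix, order):
    """Reallocate object labels: rows and columns are reordered together."""
    n = len(order)
    return [[matrix[order[i]][order[j]] for j in range(n)] for i in range(n)]

def permutation_p_value(matrix_a, matrix_b, statistic, n_permutations=999, seed=0):
    """One-sided P: proportion of arrangements with a statistic at least as large as observed."""
    rng = random.Random(seed)
    observed = statistic(matrix_a, matrix_b)
    labels = list(range(len(matrix_a)))
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        permuted = statistic(permute_labels(matrix_a, labels), matrix_b)
        # Small tolerance guards against floating-point noise in tied arrangements.
        if permuted >= observed - 1e-12:
            at_least_as_large += 1
    # The observed arrangement is counted as one member of the reference set (assumption).
    return observed, (at_least_as_large + 1) / (n_permutations + 1)

# Hypothetical example: six samples in two groups of three, with every
# between-group dissimilarity larger than every within-group dissimilarity.
groups = [0, 0, 0, 1, 1, 1]
model = [[0 if groups[i] == groups[j] else 1 for j in range(6)] for i in range(6)]
biotic = [[abs(i - j) + (5 if groups[i] != groups[j] else 0) for j in range(6)]
          for i in range(6)]
observed, p_value = permutation_p_value(biotic, model, cross_product_statistic)
print(observed, p_value)
```

With only six samples in two groups of three there are just ten distinct ways of splitting the samples, so *P* cannot fall much below 0·1 in this example however strong the separation; the attainable significance level is limited by the number of distinct permutations.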

The Mantel test may be used to assess how well observational data match an a priori model. The test compares the empirical resemblance matrix to a model matrix, so-called because it is constructed to represent the model to be tested. In other words, it depicts the alternative hypothesis of the test. If the model is a classification of the objects into groups, the Mantel test is then a type of non-parametric multivariate analysis of variance, and addresses a null hypothesis of the one-way anova type:

- H₀: There are no differences between two (or more) groups defined a priori, with the alternative;
- H₁: There are differences between two (or more) groups.

Focusing on problems of analysis of variance that involve community composition, Clarke & Green (1988) and Clarke (1993) developed a parallel approach to model-based Mantel tests, called anosim (ANalysis Of SIMilarities), which encompassed both the one-way layout and the two-way crossed and nested anova-type designs. As demonstrated by Legendre & Legendre (1998), the main difference between the model-based Mantel test and one-way anosim, as test procedures, is one of parametric or non-parametric tradition. Instead of using a Mantel statistic, calculated using actual or standardized distances, for anosim distances are converted into ranks prior to calculating the anosim statistic *R*, a difference between inter- and intragroup rank dissimilarities. In the simple two-group one-way layout Legendre & Legendre (1998) show that *R* is a monotonic function of a Mantel statistic computed on ranked distances. It is thus analogous to a Spearman correlation coefficient in this case, although for more complex model structures the precise link is less clear. The anosim statistic *R* has the advantage of an absolute interpretable value which may be compared across different tests. By contrast, the Mantel statistic is usually of the normalized type, with the statistic divided by its standard deviation, which limits its interpretation to testing the null hypothesis (rather than interpreting the size of group differences when the null is rejected).
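As an illustration of how such a statistic is computed, the sketch below assumes the form in which *R* is usually given: the mean rank of between-group dissimilarities minus the mean rank of within-group dissimilarities, divided by *M*/2, where *M* = *n*(*n* − 1)/2 is the number of pairs among *n* samples. The function names and the example data are hypothetical, and tied dissimilarities are given average ranks.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def anosim_r(dissimilarity, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    n = len(groups)
    pairs = [(i, j) for i in range(n) for j in range(i)]
    ranks = average_ranks([dissimilarity[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    return (mean(between) - mean(within)) / (len(pairs) / 2)

# Hypothetical example: two groups of three samples, with every between-group
# dissimilarity larger than every within-group dissimilarity.
groups = [0, 0, 0, 1, 1, 1]
separated = [[abs(i - j) + (5 if groups[i] != groups[j] else 0) for j in range(6)]
             for i in range(6)]
r_separated = anosim_r(separated, groups)
print(r_separated)
```

Under this scaling *R* reaches 1 only when all between-group dissimilarities exceed all within-group dissimilarities, and lies near 0 when group membership is unrelated to dissimilarity, which is what gives the statistic an interpretable absolute value.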

Often, however, an investigator anticipates more than an undefined difference between groups of samples and may have an a priori expectation of the probable direction and magnitude of differences to be expected. For example, a model matrix may be constructed to represent the hypothesis that a gradient in species composition is present in the data, in which case the model matrix could represent geographical distances along a transect, and the null and alternative hypotheses would then be:

- H₀: The dissimilarities among samples from the ecological matrix are not (non-parametrically) correlated with the corresponding model distances;
- H₁: The sample dissimilarities are correlated to the distances in the model matrix.

Despite the fact that a Mantel test based on expected differences between samples, for example samples at opposite ends of a gradient being more dissimilar than samples at intermediate positions along the gradient, might be expected to be commonly employed, this is not the case. The majority of investigations employing the non-parametric rank-similarity based approach rely on anosim to test for significant differences between groups. This is a perfectly valid approach, but can lead to a lack of sensitivity, as we shall show.

One area where the type of multivariate analyses discussed in this paper have become commonplace is in analyses of data resulting from pollution surveys. In the Norwegian sector of the North Sea such analyses were used to demonstrate that the effects of disturbance from drilling and extraction activities are far more extensive than previously realized (Gray *et al*. 1990) leading to changes in the legislation regulating such activities in the sector. Until recently (Gray 1999) biological monitoring was required around each installation every 3 years. These surveys, in which samples are collected at different distances from a putative point source of pollution (Olsgard & Gray 1995), provide good examples of data in which gradients in species composition are to be expected (Olsgard, Somerfield & Carr 1997, 1998). Macrobenthic assemblage data from surveys at 6 fields are used in the present study.

anosim tests between groups of samples near (< 1 km), further (1 km) and far (> 1 km) away at five of these fields (Table 1, Fig. 1) could be interpreted as showing that oilfield activities are only affecting macrobenthic community structure at Statfjord C, although there is a suggestion that these activities are having a mild influence on macrobenthic abundances in at least some of the others. Conventional wisdom, however, states that further analyses should only be undertaken if the global test is significant, and the null hypothesis of ‘no difference between groups’ is rejected with confidence. Remembering, however, that in each of these surveys a gradient in community structure may be expected, what happens if we choose to ignore conventional wisdom and continue with pairwise anosim tests between the three distance groups of samples in each survey? The results (Table 1) do indeed indicate that there is more going on than is revealed by a simple global test for differences between groups. In each case *R*-values for the pairwise test between samples at opposite ends of the gradient (< 1 km and > 1 km) are considerably higher than *R*-values for pairwise tests between samples from each end of the gradient and samples in the middle. In two of the surveys these pairwise tests have *P* < 0·05, even though the relevant global test failed to reject the null hypothesis of ‘no differences between groups’. MDS plots (Fig. 1) also confirm that in each of the surveys there is some evidence of a gradient in macrobenthic community structure related to a gradient in distance from the centre of the field.

| Location, year, number of stations | anosim global *R* | anosim global *P* | anosim pairwise A v B *R* | anosim pairwise A v B *P* | anosim pairwise A v C *R* | anosim pairwise A v C *P* | anosim pairwise B v C *R* | anosim pairwise B v C *P* | relate ‘seriation with replication’ *R* | relate ‘seriation with replication’ *P* |
|---|---|---|---|---|---|---|---|---|---|---|
| Veslefrikk 1993, 14 | 0·174 | 0·071 | 0·119 | 0·246 | 0·356 | 0·008 | 0·031 | 0·389 | 0·245 | 0·005 |
| Gullfaks A 1989, 16 | 0·114 | 0·141 | −0·156 | 0·865 | 0·303 | 0·027 | 0·013 | 0·442 | 0·200 | 0·019 |
| Gullfaks B 1993, 13 | 0·065 | 0·217 | 0·019 | 0·333 | 0·162 | 0·143 | 0·042 | 0·400 | 0·218 | 0·014 |
| Statfjord A 1993, 11 | 0·294 | 0·061 | 0·356 | 0·063 | 0·545 | 0·095 | −0·143 | 0·600 | 0·310 | 0·034 |
| Statfjord C 1993, 13 | 0·349 | 0·012 | 0·088 | 0·238 | 0·694 | 0·008 | 0·302 | 0·029 | 0·379 | 0·002 |

What happens if we apply a test designed for the detection of a gradient in the data? For each survey a model distance matrix was constructed in which samples were grouped, as before, into samples < 1 km, 1 km and > 1 km from the centre of the field. Samples in the same distance group are considered to be at distance 0 apart, in adjacent groups at distance 1, and at opposite ends of the gradient at distance 2. The absolute numbers used are not important, only the rank order of their differences being used in the non-parametric form of the Mantel test. Clarke, Warwick & Brown (1993) examined the impact of dredging activities on the structure of coral communities with ρ as a test statistic, which they called an ‘index of seriation’. Following their terminology, a relate test of ‘no relationship’ between the resulting distance and biotic matrices may be referred to as a relate ‘seriation with replication’ test (Table 1), and the results are unequivocal. In each of the five surveys the hypothesis ‘there is no gradient in the data’ can be rejected with a high degree of confidence.
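The construction of such a model matrix can be sketched in Python. The function name and the station layout below are hypothetical; the only assumption carried over from the text is the coding of the three distance groups so that same-group pairs are at distance 0, adjacent groups at distance 1 and the two ends of the gradient at distance 2.

```python
def seriation_model_matrix(group_codes):
    """Model distances for 'seriation with replication': |code_i - code_j|."""
    return [[abs(a - b) for b in group_codes] for a in group_codes]

# Hypothetical layout: two stations in each distance group,
# coded 0 (< 1 km), 1 (1 km) and 2 (> 1 km).
group_codes = [0, 0, 1, 1, 2, 2]
model = seriation_model_matrix(group_codes)
for row in model:
    print(row)
```

Replicates within a distance group share the same code, so the matrix contains many tied values; this is why a rank correlation that handles ties is needed when it is compared with the biotic dissimilarity matrix.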

The failure of anosim to detect differences between groups of samples, when there are apparent differences to be interpreted, and the greater ability of relate to detect a gradient when one exists, are directly related to the power of the different tests for specific alternative hypotheses. There is no analytical framework for calculating the power of such multivariate tests, and simulation studies comparing type I error and power of the various methods of matrix comparison (anosim, Mantel tests, etc.) are needed (Legendre & Legendre 1998). To understand better the advantage, in terms of power of the test, that the relate approach may have over the anosim procedure, we look in detail at the direct analogue of this comparison in the univariate case, where a general analytical framework is available. The improved power of univariate gradient (correlational) analyses in comparison with control-impact (categorical) designs, specifically for oil-field studies, was examined by Ellis & Schneider (1997). They, however, calculated only relative power values for a specific data set. In contrast this paper provides general results for power comparisons of correlational and categorical tests in the univariate case. Simulations of specific cases are then used to determine the extent to which the multivariate tests match their univariate analogues in terms of their behaviour and relative power.