## Introduction

One of the primary goals of field studies in ecology is to estimate how many species of a given taxon (or of all taxa) occur in an area. As larger areas are sampled, many species are found initially, so a plot of the accumulated number of species against area sampled rises steeply at first and then more slowly as increasingly rare species are added. The species–accumulation curve may approach an asymptote for groups whose species can be identified easily, such as plants and breeding birds in Britain (Rosenzweig 1995) or tropical tree species in 50-ha plots (Hubbell 2001), where it is possible to obtain a count of all the species present (Colwell & Coddington 1994). For other habitats (or taxa) one cannot expect to count all the species. Examples are beetles in tropical forest tree canopies (Erwin 1988, 1991) and species inhabiting coral reefs or marine sediments. In such habitats all that can be done is to estimate total species richness and the sampling effort needed to obtain a reliable estimate of this richness. Here we focus on fitting species–accumulation curves for one of these difficult habitats for which we have extensive data, namely marine soft sediments.

Gotelli & Colwell (2001) state that for patchy distributions an individual-based rarefaction curve ‘inevitably overestimates the number of species that would have been found with less effort’. They suggest that it is preferable to use sample-based species–accumulation curves that take account of between-sample heterogeneity, a view with which we entirely concur. Sample-based species–accumulation curves are plotted from samples taken randomly within a given area (Gotelli & Colwell 2001). They take into account the number of species and their identity, but use no information on the distribution of individuals among species.

The order in which samples are added to the species–accumulation curve affects the shape of the curve produced. This variation in shape results from sampling error and heterogeneity among the species in the samples (Colwell & Coddington 1994). To overcome this problem various sample randomization procedures have been developed (Colwell & Coddington 1994; Gray *et al*. 1997; Colwell 2001; Gotelli & Entsminger 2001). The traditional method of plotting a species–accumulation curve starts by calculating and plotting the mean number of species (and its standard deviation, SD) for the smallest sample size. Then all combinations of the next sample size are randomized and the mean cumulative number of species is calculated. This procedure is followed for all sample sizes. Once a curve has been obtained from the randomized sample data, it can be used to estimate species richness: the traditional approach is simply to extrapolate a parametric model fitted to the species–accumulation curve to the larger area for which an estimate is needed. Here we develop an analytical method which gives exact cumulative numbers of species and so obviates the need for randomization using Monte Carlo techniques and curve fitting.
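The contrast between randomization and an exact calculation can be sketched in a few lines of Python. This is a minimal illustration and not necessarily the derivation developed in this paper: it assumes presence/absence data held as one set of species per sample, and the exact curve uses the standard combinatorial expectation for sampling without replacement, in which a species present in f of n samples is missed by a random draw of k samples with probability C(n − f, k)/C(n, k). The function names and the toy data are hypothetical.

```python
import random
from math import comb


def randomized_accumulation(samples, n_perm=2000, seed=0):
    """Mean cumulative species count over random sample orderings (Monte Carlo)."""
    rng = random.Random(seed)
    n = len(samples)
    totals = [0.0] * n
    for _ in range(n_perm):
        order = samples[:]          # copy so the caller's list is not reordered
        rng.shuffle(order)
        seen = set()
        for k, sample in enumerate(order):
            seen |= sample          # add the species found in this sample
            totals[k] += len(seen)
    return [t / n_perm for t in totals]


def exact_accumulation(samples):
    """Expected cumulative species count for k = 1..n samples, without randomization."""
    n = len(samples)
    species = set().union(*samples)
    # Number of samples in which each species occurs.
    freq = {sp: sum(sp in s for s in samples) for sp in species}
    curve = []
    for k in range(1, n + 1):
        denom = comb(n, k)
        # A species is missed only if all k samples come from the n - f without it.
        curve.append(sum(1 - comb(n - f, k) / denom for f in freq.values()))
    return curve


# Hypothetical toy data: four samples, four species (letters stand for species).
toy_samples = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}, {"a"}]
print(exact_accumulation(toy_samples))
print(randomized_accumulation(toy_samples))
```

With enough permutations the randomized means converge on the exact values, which is why an analytical expression can replace the Monte Carlo step.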

In the traditional method the curve for randomized data takes account of the variance in number of species between samples, but not of the fact that within the total area sampled there may be heterogeneity between subareas, such that one subarea is species-rich, another is species-poor and others are of intermediate richness. By randomizing over all samples such heterogeneity is ignored. Here we take a different approach by recognizing that heterogeneity in species richness can occur within subareas and that this may have important consequences for estimating species richness.

Consideration of the covariance structure between species and between subareas leads to a largely unrecognized aspect of predicting numbers of species in large areas: when new subareas are added, the new species–accumulation curve will not only cover a larger area, but will usually also lie above that for one subarea taken alone. It is the rate of increase of this new (and each subsequent) species–accumulation curve as more subareas are combined that leads to the best estimate of total species richness, and this estimate may be considerably higher than richness estimates from the traditional species–accumulation curve approach.
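A small numerical illustration may help to show why a curve built from several subareas can lie above the curve for a single subarea at the same number of samples. The sketch below is not the method of this paper: it assumes presence/absence data, uses the standard expectation for drawing k of n samples without replacement, and runs on invented toy data for two hypothetical subareas that share only one species. The names `expected_richness`, `subarea_a` and `subarea_b` are illustrative.

```python
from math import comb


def expected_richness(samples, k):
    """Expected number of species in k samples drawn at random without replacement."""
    n = len(samples)
    species = set().union(*samples)
    total = 0.0
    for sp in species:
        f = sum(sp in s for s in samples)       # samples containing this species
        total += 1 - comb(n - f, k) / comb(n, k)
    return total


# Hypothetical toy data: two subareas with largely different species lists.
subarea_a = [{"a", "b"}, {"a", "c"}, {"b", "c"}]
subarea_b = [{"d", "e"}, {"d", "f"}, {"a", "d"}]
pooled = subarea_a + subarea_b

# Compare the two curves at equal sampling effort (same number of samples).
for k in range(1, len(subarea_a) + 1):
    single = expected_richness(subarea_a, k)
    combined = expected_richness(pooled, k)
    print(k, round(single, 2), round(combined, 2))
```

In this toy case the two curves coincide at one sample, because mean richness per sample is the same in both subareas, and the pooled curve is higher at two and three samples because samples drawn from different subareas share few species.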

We first develop the analytical approach to the species–accumulation curve and then illustrate the importance of taking into account covariance structure of species between subareas.