## Introduction

The relationship between pattern and scale is fundamental to population-, community- and macroecology (Levin 1992; Brown 1995; Gaston & Blackburn 2000; McGeoch & Price 2004). Indeed, understanding and predicting the form of species distributions, or occupancy patterns, is dependent on the identification of spatial scaling relationships that underlie the patterns observed (Kunin 1998; He & Gaston 2000a, 2003; Kunin, Hartley & Lennon 2000; Dungan *et al*. 2002; McGeoch & Gaston 2002; Perry *et al*. 2002). Species occupancy, or records of presence and absence across a series of sites or quadrates, has become central to the debate on several patterns in ecology, including occupancy frequency distributions (McGeoch & Chown 1997; McGeoch & Gaston 2002) and the occupancy–abundance relationship (Brown 1995; Gaston & Blackburn 2000; He & Gaston 2000b, 2003; Kunin *et al*. 2000; Holt, Gaston & He 2002). However, although spatial scale is clearly an important determinant of occupancy patterns, the relationship between occupancy and spatial scale (measured as grain or window size) remains difficult to predict (He & Gaston 2000a; McGeoch & Gaston 2002).

At the root of variation in the relationship between occupancy and spatial scale is the manner in which species aggregation patterns change over distance (Moloney *et al*. 1992; He, Gaston & Wu 2002; He & Gaston 2003; He & Hubbell 2003), with aggregation determined by a combination of species biology, behaviour, abundance and environmental heterogeneity (Nachman 1981; Taylor *et al*. 1983; Levin 1992; Dungan *et al*. 2002; Perry *et al*. 2002). Indeed, the range of statistical distribution models used to describe the occupancy–abundance relationship reflects in part the inherent variation in the distribution patterns of individuals across space (He *et al*. 2002). Furthermore, although these models describe the occupancy–abundance relationship adequately, the parameter estimates of the relationship remain dependent on the scale of observation (He & Gaston 2000a). As a consequence, scaling models for species distributions have been developed from the relationships between species occupancy, spatial scale and pattern of aggregation (Kunin *et al*. 2000; He & Gaston 2003).

Some of the above models have been extended to use the scale-dependent nature of the occupancy–abundance relationship to predict its parameters (Kunin 1998, 2000; He *et al*. 2002). Kunin (1998) used a fractal power relationship to estimate the total area occupied by species (the sum of the number of occupied cells). He & Gaston (2000b) followed this with a model and parameterization method for predicting species abundance from measures of occupancy, across spatial scales, using the negative binomial distribution. Because abundance across the sampling extent is scale invariant, this model provides a scaling theory of occupancy. Methods such as this, of predicting species abundance and occupancy, are potentially very valuable to both conservation and pest management (He & Hubbell 2003; Warren, McGeoch & Chown 2003; Tosh, Reyers & van Jaarsveld 2004). Broad-scale distributional data, particularly abundance data, are generally labour-intensive and costly to obtain and generally not feasible for a broad range of taxa (He & Gaston 2000b). Therefore, approaches that are able to model occupancy patterns and predict species abundance accurately are an important avenue of ecological investigation.

Unfortunately, the accuracy of abundance estimates using the models mentioned above is often highly variable (Kunin *et al*. 2000; He & Gaston 2003; Warren *et al*. 2003; Tosh *et al*. 2004). Significantly, however, the information on aggregation incorporated in these statistical models (for example, mean abundance and the aggregation parameter *k* in the negative binomial distribution) is spatially implicit rather than explicit (*sensu* Perry 1998). In other words, no spatial information, such as relative location or spatial autocorrelation, is incorporated into the model (Wiens 2000; Perry *et al*. 2002; Veldtman 2004). The inclusion of spatial information in other forms of ecological models has significantly advanced our understanding of several ecological processes (e.g. Hanski 1998; Dieckmann, Law & Metz 2000; Veldtman & McGeoch 2004) and, as suggested by Warren *et al*. (2003) and shown by He & Hubbell (2003), may also improve the accuracy of occupancy–abundance model estimates.

Here we propose a novel scaling model (SSO model) for the estimation of species occupancy and first-order spatial correlation [i.e. the correlation in the probability of occupancy between two adjacent sites or quadrates (equivalent to local density, see below)] using an approach that includes spatially explicit information (*sensu* Perry 1998). The approach presented here facilitates accurate, data-sparse estimation of occupancy, and provides an alternative approach towards modelling species distributions. We test the accuracy of model predictions using a mesocosm of flies occupying decaying fruit, and compare the accuracy of estimates of our spatially explicit model with occupancy estimates obtained from He & Gaston's (2003) spatially implicit occupancy–abundance model.

### Occupancy–abundance scaling models

We first outline developments in occupancy–abundance models as the basis for the spatial scaling approach that we develop here. Spatial variance σ^{2}, or the variation in counts or abundance between samples, is a measure of statistical heterogeneity and is spatially non-explicit (Wiens 2000; Perry *et al*. 2002; Veldtman & McGeoch 2004). Three well-known distribution patterns are defined by spatial variance (Pielou 1969), i.e. aggregated (over-dispersed, clumped or contagious; spatial variance larger than mean abundance), random (spatial variance equivalent to mean abundance) and uniform (under-dispersed, even or regular; spatial variance less than mean abundance). Spatial variance is also estimated by formal statistical distributions, such as the Poisson and negative binomial distributions (Wright 1991), with the negative binomial most commonly adopted to describe species distributions:

$$p_{x}(a) = \frac{\Gamma(k + x)}{x!\,\Gamma(k)} \left(\frac{\mu_{a}}{\mu_{a} + k}\right)^{x} \left(\frac{k}{\mu_{a} + k}\right)^{k} \qquad \text{(eqn 1)}$$

where *p*_{x}(*a*) is the proportion or the probability that a sampling quadrate of grain size *a* contains *x* individuals, µ_{a} is the mean abundance across sampling, *k* is a clumping parameter of the species’ distribution from highly aggregated at *k* = 0 to random at *k* = +∞ (Wright 1991). As *k* tends to infinity, the negative binomial describes a Poisson distribution. Based on the negative binomial distribution, the absence probability in a sample is *p*_{0}(*a*) = (1 + µ_{a}/*k*)^{−k}. Therefore, the presence probability in samples, i.e. the occupancy or occurrence, will be *p*_{+}(*a*) = 1 − (1 + µ_{a}/*k*)^{−k}, which is a relationship between occupancy *p*_{+}(*a*) and mean abundance µ_{a} (He & Gaston 2000b). On the other hand, the mean abundance µ_{a} is the product of the density per unit area, *d*, and grain size, *a* (µ_{a} = *a* × *d*) (Hubbell 2001).
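The occupancy–abundance relationship above can be illustrated numerically. The following is a minimal Python sketch, not part of the original analysis: the function names and the density and grain values are hypothetical, chosen only to show how *p*_{+}(*a*) = 1 − (1 + µ_{a}/*k*)^{−k} approaches the Poisson occupancy 1 − exp(−µ_{a}) as *k* becomes large.

```python
import math


def mean_abundance(density: float, grain: float) -> float:
    """Mean abundance per quadrate, mu_a = a * d."""
    return grain * density


def nb_occupancy(mu_a: float, k: float) -> float:
    """Occupancy p_+(a) = 1 - (1 + mu_a/k)^(-k) under the negative binomial."""
    return 1.0 - (1.0 + mu_a / k) ** (-k)


def poisson_occupancy(mu_a: float) -> float:
    """Occupancy under a Poisson (random) distribution, the limit k -> infinity."""
    return 1.0 - math.exp(-mu_a)


if __name__ == "__main__":
    # Hypothetical density (individuals per m^2) and grain (m^2), for illustration only.
    mu = mean_abundance(density=25.0, grain=0.04)
    for k in (0.1, 1.0, 10.0, 1e6):
        print(f"k = {k:g}: p_+ = {nb_occupancy(mu, k):.4f}")
    print(f"Poisson limit: {poisson_occupancy(mu):.4f}")
```

For a fixed mean abundance, smaller *k* (stronger aggregation) gives lower occupancy, because the same number of individuals is concentrated in fewer quadrates.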

Substituting the linear grain–density relationship into the presence or absence probabilities above thus provides a scaling theory of occupancy. However, this relationship has been shown to underestimate abundance in high occupancy (or highly aggregated) species, and to a lesser extent in rare and moderately abundant species (Kunin *et al*. 2000; He & Hubbell 2003; He & Gaston 2000b; Warren *et al*. 2003). He & Gaston (2003) found this underestimation resulted from discordance between the negative binomial and observed data. The relationship between the statistical variance and mean abundance in the negative binomial is σ^{2} = µ_{a} + µ_{a}^{2}/*k*; however, the observed relationship for most species fits a Taylor's power law σ^{2} = *c* × µ_{a}^{*b*}, where *c* and *b* are constants (Taylor 1961). The assumption that exponent *b* is >1 not only has a theoretical explanation, but is empirically well supported (Downing 1986). Under this assumption (*b* > 1), the variance–mean abundance ratio (called the coefficient of diffusion or index of dispersion, Pearson & Hartley 1966) changes from zero to infinity with scaling up (i.e. an increase in grain *a*). Thus, based on this spatially implicit approach species distributions change from uniform to random, and finally to aggregated with an increase in grain [note that this ratio does not describe spatial distribution patterns accurately, but provides a measure of statistical heterogeneity (Hurlbert 1990)].

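Based on Taylor's power law, the variance of the negative binomial should be replaced by a varying *k*-value, *k* = µ_{a}^{2}/(σ^{2} − µ_{a}) (He & Gaston 2003; He & Hubbell 2003). Therefore, the scaling theory of occupancy should be as follows (He & Gaston 2003):

$$p_{+}(a) = 1 - \left(c\,\mu_{a}^{\,b-1}\right)^{-\mu_{a}/\left(c\,\mu_{a}^{\,b-1} - 1\right)}, \qquad \mu_{a} = a \times d \qquad \text{(eqn 2)}$$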

This is also an occupancy–abundance relationship in the spatially implicit, rather than spatially explicit, sense. In sum, this scaling theory of occupancy is a combination of the negative binomial distribution (eqn 1) and Taylor's power law (henceforth referred to as the NBT model). Therefore, although the NBT model was developed to examine the spatial distributions of species, it paradoxically includes no explicit spatial information (Wiens 2000). While the NBT model provides accurate estimates in some cases (He & Gaston 2003), it requires two occupancy maps at different grains to do so (Kunin 1998; He & Gaston 2000b; Kunin *et al*. 2000). As shown below, a spatially explicit model that incorporates spatial correlation requires data at only a single spatial scale.
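The NBT model can be sketched in a few lines of Python by combining the negative binomial occupancy with a *k*-value derived from Taylor's power law, *k* = µ_{a}^{2}/(σ^{2} − µ_{a}) with σ^{2} = *c* × µ_{a}^{*b*}. This is an illustrative sketch only: the function name is ours and any parameter values passed to it are hypothetical rather than fitted estimates.

```python
import math


def nbt_occupancy(grain: float, density: float, c: float, b: float) -> float:
    """Occupancy from the negative binomial with a Taylor's-power-law k.

    mu_a = a * d; sigma^2 = c * mu_a^b; k = mu_a^2 / (sigma^2 - mu_a);
    p_+(a) = 1 - (1 + mu_a / k)^(-k), where 1 + mu_a/k = sigma^2 / mu_a.
    """
    mu_a = grain * density
    ratio = c * mu_a ** (b - 1.0)  # variance-to-mean ratio, sigma^2 / mu_a
    if abs(ratio - 1.0) < 1e-12:
        # Variance equals the mean: k is infinite, so use the Poisson limit.
        return 1.0 - math.exp(-mu_a)
    k = mu_a / (ratio - 1.0)
    return 1.0 - ratio ** (-k)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    for grain in (0.04, 0.16, 0.64):
        print(grain, round(nbt_occupancy(grain, density=25.0, c=2.0, b=1.5), 4))
```

Because *k* is recomputed at each grain, the model needs estimates of *c* and *b*, which in turn require abundance data at more than one grain.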

### A spatial scaling model of occupancy

The terms ‘aggregated’ and ‘random’ are also used to describe spatially explicit patterns of heterogeneity, although their interpretation is somewhat different to that described by spatially implicit models (Wiens 2000; Fortin, Dale & ver Hoef 2002; Perry *et al*. 2002; Veldtman 2004). For example, Veldtman & McGeoch (2004) show that conclusions drawn regarding patterns of heterogeneity, and correlations between them, differ with the use of spatially explicit vs. implicit approaches. Therefore, spatial variance, as a spatially implicit measure, is clearly insufficient to describe patterns in the physical distribution of individuals across space (see also Hurlbert 1990).

One approach to describing species distributions in a spatially explicit manner is join-count statistics (Fortin *et al*. 2002). The join-count statistic is conceptually and mathematically similar to the pair-approximation (or moment approximation) approach [see Dieckmann *et al*. (2000) for outline of the latter] used to describe spatial distributions in metapopulation ecology. The spatiotemporal dynamics of binary maps forms the basis of metapopulation ecology, based on Levins’ patch occupancy model (Levins 1969; Hanski 1998; Hui & Li 2003). Join-count statistics can also be used to classify distributions as spatially aggregated, segregated or random in terms of the global and local densities used in pair approximation (Sato & Iwasa 2000; Hui & Li 2004). Global density is the probability that a randomly chosen sample is presently occupied by a local population, which has the same meaning as occupancy *p*_{+}(*a*). Local density *q*_{+/+}(*a*) is the conditional probability that a randomly chosen adjacent quadrate of an occupied quadrate is also occupied (Dieckmann *et al*. 2000; Hui & Li 2004). Therefore, spatially explicit aggregation can be described by *q*_{+/+}(*a*) − *p*_{+}(*a*) > 0, indicating the positive first-order spatial correlation between two adjacent, occupied samples. The spatial random distribution has *q*_{+/+}(*a*) − *p*_{+}(*a*) = 0 and implies the independence of two adjacent, occupied samples. Lastly, therefore, the spatial segregated distribution can be depicted by *q*_{+/+}(*a*) − *p*_{+}(*a*) < 0, i.e. a negative spatial correlation between two adjacent samples (Hui & Li 2004).
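As a sketch of how global and local density might be computed from a binary occupancy map, the Python code below counts joins between adjacent quadrates. Several choices here are assumptions rather than part of the method described above: adjacency is taken as the four edge-sharing neighbours, quadrates on the map boundary simply have fewer neighbours, and the tolerance used to call a pattern random is arbitrary.

```python
from typing import List, Tuple


def occupancy_and_local_density(grid: List[List[int]]) -> Tuple[float, float]:
    """Return (p_+, q_+/+) for a rectangular 0/1 occupancy map."""
    rows, cols = len(grid), len(grid[0])
    occupied = sum(sum(row) for row in grid)
    p_plus = occupied / (rows * cols)

    neighbours = 0  # ordered pairs (occupied quadrate, adjacent quadrate)
    joins = 0       # ... of which the adjacent quadrate is also occupied
    for i in range(rows):
        for j in range(cols):
            if not grid[i][j]:
                continue
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    neighbours += 1
                    joins += grid[ni][nj]
    q_pp = joins / neighbours if neighbours else float("nan")
    return p_plus, q_pp


def classify(p_plus: float, q_pp: float, tol: float = 1e-9) -> str:
    """Classify the pattern by the sign of q_+/+ - p_+."""
    diff = q_pp - p_plus
    if diff > tol:
        return "aggregated"
    if diff < -tol:
        return "segregated"
    return "random"


if __name__ == "__main__":
    clumped = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    checker = [[(i + j) % 2 for j in range(4)] for i in range(4)]
    for name, grid in (("clumped", clumped), ("checkerboard", checker)):
        p, q = occupancy_and_local_density(grid)
        print(name, round(p, 3), round(q, 3), classify(p, q))
```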

We therefore use the pair-approximation approach (Sato & Iwasa 2000) to develop a scaling model for the prediction of species occupancy and spatial correlation. The spatial scaling occupancy (SSO) model predicts occupancy and spatial correlation with a change in grain size. Suppose the sampling unit *a* is a square quadrate sample (Fig. 1a). The most common shapes generated when such units are combined, such that the area is 4*a*, are a transect and chessboard (Fig. 1b,c). For a transect, the probability of absence in a sample is *p*_{0}(4*a*) = *p*_{0}(*a*) × *q*_{0/0}(*a*)^{3}. The absence probability for a chessboard is: *p*_{0}(4*a*) = *p*_{0}(*a*) × *q*_{0/0}(*a*)^{2} × *b*_{0}(*a*), in which

$$b_{0}(a) = \frac{p_{0}(a)\,q_{0/0}(a)^{2}}{p_{0}(a)\,q_{0/0}(a)^{2} + p_{+}(a)\,q_{0/+}(a)^{2}}$$

is the estimation, according to Bayes’ rule, that a sample unit with two absent neighbours (i.e. three-quadrates) is also absent (Pitman 1993). The absence probability in the chessboard case above is a product of three probabilities: the global density of absence *p*_{0}(*a*) (probability that a patch is empty), the local density of absence in a pair *q*_{0/0}(*a*) (correlation between the conditional absence probabilities of two adjacent patches) and the Bayes estimation of absence in the three-quadrate situation *b*_{0}(*a*). The conditional probability *q*_{0/0}(4*a*) has two forms in the transect case (adjacent to either the long or short edge) and only one form in the chessboard case. The probability *q*_{0/0}(4*a*) in the transect adjacent to the long edge (Fig. 1d) is *q*_{0/0}(4*a*) = *q*_{0/0}(*a*) × *b*_{0}(*a*)^{3} and adjacent to the short edge (Fig. 1e) is *q*_{0/0}(4*a*) = *q*_{0/0}(*a*)^{4}. In the chessboard case (Fig. 1f), we have *q*_{0/0}(4*a*) =*q*_{0/0}(*a*)^{2} *b*_{0}(*a*)^{2}.

Therefore, according to probability rules that *p*_{+} = 1 − *p*_{0} and *q*_{+/+} = 1 − (1 − *q*_{0/0}) × *p*_{0}/*p*_{+}, by scaling-up we obtain occupancy (global density) and spatial correlation (the local density minus the global density). Because the formula is complex, here we provide the probability for only the chessboard case, as this is the most common form of occupancy data. The occupancy and spatial pattern of transect samples can be obtained similarly from the above probabilities. For chessboard samples, the occupancy is:

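$$p_{+}(4a) = 1 - \frac{\nabla^{4}}{\Delta} \qquad \text{(eqn 3)}$$

and the local density is:

$$q_{+/+}(4a) = \frac{\Delta^{3} - 2\Delta^{2}\nabla^{4} + \nabla^{10}}{\Delta^{2}\left(\Delta - \nabla^{4}\right)} \qquad \text{(eqn 4)}$$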

where ∇ = *p*_{0}(*a*) − *q*_{0/+}(*a*) *p*_{+}(*a*) and Δ = *p*_{0}(*a*)[1 − *p*_{+}(*a*)^{2} (2*q*_{+/+}(*a*) − 3) + *p*_{+}(*a*)(*q*_{+/+}(*a*)^{2} − 3)]. The conditional probability *q*_{0/+}(*a*) = 1 − *q*_{+/+}(*a*) is the absence probability in a quadrate adjacent to an occupied one. If species occupancy *p*_{+}(*a*) and local density *q*_{+/+}(*a*) are known for the feasible region (coloured region of Fig. 2a,b), that is 0 ≤ *p*_{+}(*a*) ≤ 1 and 2 − 1/*p*_{+}(*a*) ≤ *q*_{+/+}(*a*) ≤ 1 (it is not possible for combinations of occupancy and local density to lie outside this region) (Hui & Li 2004), then it is possible to obtain, from eqns 3 and 4, the occupancy and local density (and thus spatial correlation) of a species at larger scales (i.e. scaling-up) (Fig. 2a,b).
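The scaling-up step for the chessboard case can be sketched directly from the component probabilities given in the text, without using the closed forms. In the Python sketch below the function name is ours, and the Bayes term *b*_{0}(*a*) is written as *p*_{0}*q*_{0/0}^{2}/(*p*_{0}*q*_{0/0}^{2} + *p*_{+}*q*_{0/+}^{2}), which is our reading of the Bayes' rule description and should be treated as an assumption.

```python
from typing import Tuple


def sso_scale_up(p_plus: float, q_pp: float) -> Tuple[float, float]:
    """Scale occupancy and local density from grain a to 4a (chessboard case)."""
    if not 0.0 < p_plus < 1.0:
        raise ValueError("p_plus must lie strictly between 0 and 1")
    p0 = 1.0 - p_plus
    q0_plus = 1.0 - q_pp                  # q_{0/+}: absence next to an occupied quadrate
    q00 = 1.0 - q0_plus * p_plus / p0     # q_{0/0}, from q_{+/+} = 1 - (1 - q_{0/0}) p_0 / p_+
    if q00 < 0.0:
        raise ValueError("(p_plus, q_pp) lies outside the feasible region")
    # Assumed Bayes estimate that a quadrate with two absent neighbours is absent.
    b0 = p0 * q00 ** 2 / (p0 * q00 ** 2 + p_plus * q0_plus ** 2)
    p0_4a = p0 * q00 ** 2 * b0            # absence probability at grain 4a
    q00_4a = q00 ** 2 * b0 ** 2           # conditional absence probability at grain 4a
    p_plus_4a = 1.0 - p0_4a
    q_pp_4a = 1.0 - (1.0 - q00_4a) * p0_4a / p_plus_4a
    return p_plus_4a, q_pp_4a


if __name__ == "__main__":
    # Observed D. simulans values at the 0.04 m^2 grain, taken from Table 1.
    print(sso_scale_up(0.505, 0.568))
    # Spatially random pattern: occupancy and local density stay equal.
    print(sso_scale_up(0.3, 0.3))
```

Applying the function repeatedly gives predictions at 16*a*, 64*a* and so on, so a single fine-grain occupancy map is sufficient for scaling-up.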

From eqns 3 and 4, several outcomes are obtained. First, at the greatest degree of segregation *q*_{+/+}(*a*) = 2 − 1/*p*_{+}(*a*), (the boundary between the feasible and non-feasible regions in Fig. 2a–c), occupancy and local density will rapidly tend to 1 with scaling-up (Fig. 2). This result can be obtained by the substitution of *q*_{+/+}(*a*) in eqns 3 and 4. Another interesting outcome is that, for the spatially random distribution *q*_{+/+}(*a*) = *p*_{+}(*a*), the absence probability will decrease with sampling area as an exponential function, which concurs with the spatially implicit equivalent of randomness. Furthermore, the random spatial pattern is insensitive to grain size *a*, i.e. by substituting *q*_{+/+}(*a*) = *p*_{+}(*a*) into the equation, we have 1 − *p*_{+}(4*a*) = [1 − *p*_{+}(*a*)]^{4} and *q*_{+/+}(4*a*) = *p*_{+}(4*a*). The first term implies *p*_{0}(*x* × *y*) = *p*_{0}(*y*)^{x}; however, the only function that coincides with this condition is exponential *p*_{0}(*a*) = Exp[–*d* × *a*] and *d* is a constant coefficient. Compared to randomness in a spatially implicit sense (Poisson process), the constant *d* is, indeed, the real density in the region or sampling extent, i.e. *p*_{+}(*a*) = *q*_{+/+}(*a*) = 1 − Exp[– µ_{a}]. Finally, the edge effect (i.e. the effect of the perimeter to area ratio) will inflate the occupancy observed under conditions of spatial aggregation, and decrease it under spatial segregation, with no influence on the occupancy of a spatially random distribution (as evident from a comparison of the occupancy probabilities for the chessboard and transect cases: see above explanation). Because the abundance in the whole region (sampling extent) does not change, this outcome also implies that the abundance or density in the long-border samples will be smaller in spatially aggregated species and will be larger in spatially segregated species. 
This can be demonstrated by a comparison of probabilities *p*_{+}(4*a*) in the transect and chessboard cases (Fig. 2c).
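The edge effect can be checked numerically by comparing the transect and chessboard occupancies at grain 4*a* for the same fine-grain pattern. The Python sketch below is illustrative only: the function name is ours, the segregated example values are hypothetical, and the Bayes term *b*_{0}(*a*) is again assumed to be *p*_{0}*q*_{0/0}^{2}/(*p*_{0}*q*_{0/0}^{2} + *p*_{+}*q*_{0/+}^{2}).

```python
from typing import Tuple


def occupancy_4a(p_plus: float, q_pp: float) -> Tuple[float, float]:
    """Return (transect, chessboard) occupancy at grain 4a."""
    p0 = 1.0 - p_plus
    q0_plus = 1.0 - q_pp
    q00 = 1.0 - q0_plus * p_plus / p0
    # Assumed Bayes estimate of absence given two absent neighbours.
    b0 = p0 * q00 ** 2 / (p0 * q00 ** 2 + p_plus * q0_plus ** 2)
    transect = 1.0 - p0 * q00 ** 3          # p_0(4a) = p_0 q_{0/0}^3
    chessboard = 1.0 - p0 * q00 ** 2 * b0   # p_0(4a) = p_0 q_{0/0}^2 b_0
    return transect, chessboard


if __name__ == "__main__":
    cases = {
        "aggregated (q > p)": (0.505, 0.568),
        "random (q = p)": (0.3, 0.3),
        "segregated (q < p)": (0.5, 0.3),  # hypothetical values
    }
    for label, (p, q) in cases.items():
        transect, chessboard = occupancy_4a(p, q)
        print(f"{label}: chessboard/transect = {chessboard / transect:.4f}")
```

A chessboard-to-transect ratio below 1 indicates aggregation, a ratio above 1 indicates segregation, and a ratio of 1 indicates a spatially random pattern.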

An important result here is that the occupancy *p*_{+}(*a*) and local density *q*_{+/+}(*a*) will both approach 1 in the limit with scaling-up. However, local density will at first decrease under aggregated conditions and then increase rapidly (Fig. 3), which means that the spatial distribution of a species will change from aggregated to random with scaling-up. In addition, as shown above, segregation will also limit to random with scaling-up. These results are opposite to those achieved using measures of statistical heterogeneity, where species spatial distributions change from random to aggregated with scaling-up. Because statistical variance increases faster than mean abundance (σ^{2}/µ_{a} ∼ *a*^{b−1}, *b* > 1), statistical heterogeneity thus increases with scaling-up. In contrast, spatial correlation converges to zero and spatial heterogeneity decreases with scaling-up, i.e. distribution patterns change from spatially aggregated to random. This highlights the significance of distinguishing spatially implicit from explicit patterns of species distributions in ecological studies (see also Veldtman & McGeoch 2004). Finally, Fig. 3 demonstrates, in agreement with He & Hubbell (2003; Fig. 2), that points of lowest local density (or spatial correlation) correspond with occupancy inflection points (i.e. where the rate of change of occupancy is highest). In other words, when the spatial autocorrelation structure of the species distribution is weak (low local density values), their distribution pattern will be strongly scale-dependent. This suggests that studies that are interested in the scaling behaviour of ecological relationships should focus on scales (grains) around this point.

Because fine-scale binary (presence/absence) data are more information rich than coarse-scale (low-intensity) data (McGeoch & Gaston 2002), scaling-up predictions of occupancy are bound to be more accurate than those obtained by scaling-down (Fig. 2a,b). However, it may be possible to predict fine-scale spatial pattern from coarse-scale data under certain conditions by zooming in on a binary map. Based on the result that occupancy and spatial correlation approach 1 in the limit with scaling up, coarse-scale binary data always tend to have both high occupancy and spatial correlation. Therefore, the prediction of spatial pattern by scaling-down will be highly sensitive to the accuracy of coarse-scale data. Small deviations will lead to widely divergent predictions at fine scales. However, under loose mathematical constraints, the inverse function of eqns 3 and 4 exists if the spatial pattern is aggregated or random, from which the estimation of occupancy and local density with scaling down is possible (Fig. 2d). The inverse function does not exist for the segregated distribution, because it is difficult to distinguish segregation from randomness at coarse scales. Nevertheless, if the occupancy is greater than local density, i.e. negative spatial correlation, the numerical approaches that use eqns 3 and 4 to find occupancy *p*_{+}(*a*) and local density *q*_{+/+}(*a*) can be used to predict occupancy and spatial correlation with scaling-down (Fig. 2d).

### Empirical evaluation

We tested the accuracy of the predictions of the spatial scaling occupancy (SSO) model (eqns 3 and 4) using a data set of Drosophilidae (Diptera) inhabiting a 12 × 18 decaying fruit (nectarine, *Prunus persicae* Miller) matrix (see Warren *et al*. 2003 for details). Six adjacent plots (2 × 3) with 36 nectarines in each plot were used, with three of the plots in alternate rows of the two columns shaded artificially with 80% shade netting to impose a level of microclimatic heterogeneity on the experiment. The occupancy data of four species on the 25th day (peak in temporal abundance) were used to test the accuracy of the predictions of eqns 3 and 4 and to compare these with predictions from the spatially implicit NBT model (eqn 2). The species examined were *Drosophila simulans* Sturtevant, *D. melanogaster* Meigen and a *Zaprionus* morphospecies group (see Warren *et al*. 2003). Warren *et al*. (2003) showed that these data were fitted by a negative binomial distribution. Based on three different sampling grains [0·04 m^{2}, 0·16 m^{2} and 1·14 m^{2}, as in Warren *et al*. (2003)], the Taylor's power law relationships between mean abundance and spatial variance were obtained (*D. simulans*, σ^{2} = , *R*^{2} = 0·995; *D. melanogaster*, σ^{2} =, *R*^{2} = 0·995; *Zaprionus* morphospecies group, σ^{2} =, *R*^{2} = 0·994). Occupancy and local density were calculated (SSO model, eqns 3 and 4) using the three sampling grains, and compared with the observed data and predictions from the NBT model.
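For readers unfamiliar with fitting Taylor's power law, one common approach is ordinary least squares on the log-transformed means and variances, since log σ^{2} = log *c* + *b* log µ_{a}. The Python sketch below assumes this approach, which may differ from the fitting procedure actually used, and the input values are synthetic and generated from a known power law, not the Drosophilidae data.

```python
import math
from typing import Sequence, Tuple


def fit_taylor_power_law(means: Sequence[float],
                         variances: Sequence[float]) -> Tuple[float, float]:
    """Estimate (c, b) in sigma^2 = c * mu^b by log-log least squares."""
    if len(means) != len(variances) or len(means) < 2:
        raise ValueError("need at least two (mean, variance) pairs")
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope on the log-log scale
    c = math.exp(y_bar - b * x_bar)    # back-transformed intercept
    return c, b


if __name__ == "__main__":
    # Synthetic means at three grains, with variances from sigma^2 = 2 * mu^1.5.
    synthetic_means = [0.5, 2.0, 8.0]
    synthetic_vars = [2.0 * m ** 1.5 for m in synthetic_means]
    print(fit_taylor_power_law(synthetic_means, synthetic_vars))
```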

The occupancy values predicted by the SSO and NBT models were, for the most part, strongly correlated with the observed values (Table 1). The coefficients of determination for the interspecific relationship between observed and predicted occupancy were > 0·99 for both models (predictions 1, 2 and 3) (Table 1). Occupancy values were thus predicted equally accurately by the spatially implicit and explicit (scaling-up or -down) approaches (Table 1). Local density was also accurately predicted with scaling-up, whereas with scaling-down the observed–predicted relationship was not significant (Table 1). Generally, local density estimates were less accurate than occupancy estimates, and scaling-down estimates of local density were particularly inaccurate, with accuracy < 0·90 in all cases (Table 1). Therefore, the SSO model performed as well as the NBT model in the prediction of occupancy. However, the SSO model also provided estimates of local density (with scaling-up) that were related significantly to observed values. This model therefore provides both range size information and information on how that range is spatially structured. Furthermore, the NBT model [when used as it is here to predict occupancy, rather than as in its original usage to predict abundance (He & Gaston 2000b)] requires at least three different sampling grains, as well as the mean and variance of abundance, for the estimation of parameters *c* and *b*. The SSO model requires only a single occupancy map, and is thus less data demanding than the NBT model.

| Species | Grain (m^{2}) | Observed Occ | Observed LD | SSO model prediction 1 Occ | SSO model prediction 1 LD | SSO model prediction 2 Occ | SSO model prediction 2 LD | NBT model prediction 3 Occ |
|---|---|---|---|---|---|---|---|---|
| *D. sim.* | 0·04 | 0·505 | 0·568 | | | _0·639_ | _0·817_ | 0·458 |
| | 0·16 | 0·854 | 0·890 | 0·904* | 0·907** | | | 0·814** |
| | 0·64 | 1·000 | 1·000 | 1·000*** | 1·000*** | 0·988** | 0·988** | 0·990** |
| *D. mel.* | 0·04 | 0·036 | 0·143 | | | _0·033_* | _0·226_ | 0·034* |
| | 0·16 | 0·104 | 0·200 | 0·124 | 0·172 | | | 0·122 |
| | 0·64 | 0·417 | 0·283 | 0·392* | 0·407 | 0·324 | 0·358 | 0·389* |
| *Z. msp* | 0·04 | 0·151 | 0·201 | | | _0·223_ | _0·578_ | 0·182 |
| | 0·16 | 0·438 | 0·587 | 0·459** | 0·472 | | | 0·439*** |
| | 0·64 | 0·833 | 0·817 | 0·907* | 0·907* | 0·799** | 0·818*** | 0·804** |
| Linear regression results | | | | | | | | |
| Slope | | | | 1·03 | 0·99 | 1·14, 1·26 | 0·87, 1·13 | 0·95 |
| R^{2} | | | | 0·991 | 0·936 | 0·999, 0·994 | 0·999, 0·763 | 0·996 |
| F | | | | 456·11 | 58·96 | > 2·5 × 10^{4}, 181·64 | 4060·7, 3·23 | 1818·0 |
| (d.f.) | | | | (1,4) | (1,4) | (1,1) | (1,1) | (1,7) |
| P < | | | | 0·0001 | 0·01 | 0·01, 0·05 | 0·01, 0·32 | 0·0001 |

Occ: occupancy *p*_{+}; LD: local density *q*_{+/+}. Values in prediction 1 are calculated from eqns 3 and 4 using observed data at 0·04 m^{2} grain (i.e. scaling-up). In prediction 2, values are obtained from observed data at grain 0·16 m^{2} for 0·04 m^{2} (i.e. by scaling-down) and 0·64 m^{2} (i.e. by scaling-up). Italic values are thus obtained from scaling-down. Prediction 3 indicates the occupancy predicted from the NBT model (eqn 2). Asterisks denote accuracy (A) = 1 − Abs[predicted − observed]/predicted; \*\*\*A > 0·99; \*\*A > 0·95; \*A > 0·9. *D. mel.*, *Drosophila melanogaster*; *D. sim.*, *D. simulans*; *Z. msp*, *Zaprionus* morphospecies group.
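The accuracy measure used to annotate Table 1, A = 1 − Abs[predicted − observed]/predicted, together with its asterisk thresholds, can be written as a short Python sketch. The function names are ours; the example values are taken from Table 1.

```python
def accuracy(predicted: float, observed: float) -> float:
    """Accuracy A = 1 - |predicted - observed| / predicted."""
    return 1.0 - abs(predicted - observed) / predicted


def accuracy_stars(a: float) -> str:
    """Asterisk code used in Table 1: *** A > 0.99; ** A > 0.95; * A > 0.9."""
    if a > 0.99:
        return "***"
    if a > 0.95:
        return "**"
    if a > 0.9:
        return "*"
    return ""


if __name__ == "__main__":
    # (predicted, observed) occupancy pairs at the 0.16 m^2 grain from Table 1.
    examples = {
        "D. sim., prediction 1": (0.904, 0.854),
        "D. mel., prediction 1": (0.124, 0.104),
        "Z. msp, prediction 3": (0.439, 0.438),
    }
    for label, (pred, obs) in examples.items():
        a = accuracy(pred, obs)
        print(f"{label}: A = {a:.3f} {accuracy_stars(a)}")
```

Because the deviation is divided by the predicted value, the measure is not symmetric in predicted and observed values, and small occupancies are penalized heavily for small absolute errors.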

In addition, the demonstration by the SSO model that the edge effect will increase occupancy and therefore decrease the density, or mean abundance, within samples when individuals are aggregated (and *vice versa* when segregated) is supported by this empirical evaluation. The ratio of occupancy in a chessboard to occupancy in a transect at grain 0·16 m^{2} was 0·9437 ± 0·0325 (mean ± SE) for *D. simulans*, 0·7738 ± 0·0595 for *D. melanogaster* and 1·0023 ± 0·0477 for the *Zaprionus* morphospecies group. These results suggest that *D. simulans* and *D. melanogaster* are spatially aggregated (as the ratios are less than 1), which coincides with the observed patterns (that local densities are larger than the occupancies of these two species in Table 1). The *Zaprionus* spp., on the other hand, was distributed randomly (as the ratio is approximately 1). This contradicts the observed data (Table 1), which suggests that individuals are aggregated (local density is greater than occupancy at grain 0·04 m^{2}). The occupancy predictions for *Zaprionus* spp. were, however, not accurate (Table 1), as a probable consequence of its low abundance (56, compared to 2869 *D. simulans* individuals), and may thus not be expected to produce an accurate estimate of the edge effect.

We raise the edge effect issue here to demonstrate that edge length may be used to describe species distribution patterns (as also pointed out by He & Hubbell 2003). De Grave & Casey (2000) also tested the effect of sample edge on density estimates of intertidal macrofauna (data available for seven species). Densities obtained from square (16·8 × 16·8 cm) samples were compared to rectangular (33·5 × 8·4 cm) samples of the same area. Mean densities in rectangular samples of some species were found to be significantly lower than in square samples. The authors suggested that these density differences were a result of the spatial distribution and intensity of spatial aggregation. The edge effect results that we present here (Fig. 2c) provide a theoretical explanation for this phenomenon. The three species with no significant differences are likely to be spatially randomly distributed, whereas those with densities significantly lower than expected in rectangular samples are likely to be spatially aggregated. From Fig. 2, it is evident that the random distribution in fact occurs at the boundary between aggregation and segregation in the plane of *p*_{+}(*a*) vs. *q*_{+/+}(*a*). Under the assumption that species in the assemblage are distributed randomly across the phase plane in Fig. 2, the ratio of the area of segregation to aggregation in the feasible region is 2 ln(2) − 1 (≈ 0·39). If we compare the mean abundances of De Grave & Casey's (2000) seven species in square samples with those in rectangular samples, the proportion of segregated to aggregated species is 2 : 5 (= 0·4). This ratio is virtually identical to that predicted by Fig. 2 (≈ 0·39). Accordingly, a proportion of 1/[2 ln(2)], or 72%, of the species in this case are spatially aggregated. This suggests, under the random assumption outlined above, a possible scale-invariant ratio for aggregated to segregated species in assemblages.
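The area ratio quoted above can be checked numerically. The Python sketch below integrates the aggregated (*q*_{+/+} > *p*_{+}) and segregated (*q*_{+/+} < *p*_{+}) parts of the feasible region with a midpoint rule. It assumes the lower bound on local density is max(0, 2 − 1/*p*_{+}), i.e. that local density, being a probability, cannot be negative; the function name and the number of integration steps are arbitrary.

```python
import math
from typing import Tuple


def region_areas(steps: int = 20000) -> Tuple[float, float]:
    """Return (aggregated area, segregated area) of the feasible (p_+, q_+/+) region."""
    dp = 1.0 / steps
    aggregated = 0.0
    segregated = 0.0
    for i in range(steps):
        p = (i + 0.5) * dp                  # midpoint of the i-th occupancy slice
        q_min = max(0.0, 2.0 - 1.0 / p)     # lower feasibility bound on q_+/+
        aggregated += (1.0 - p) * dp        # p < q <= 1
        segregated += (p - q_min) * dp      # q_min <= q < p
    return aggregated, segregated


if __name__ == "__main__":
    agg, seg = region_areas()
    print("segregated : aggregated =", round(seg / agg, 4))
    print("closed form 2 ln(2) - 1 =", round(2.0 * math.log(2.0) - 1.0, 4))
    print("proportion aggregated   =", round(agg / (agg + seg), 4))
    print("closed form 1/[2 ln(2)] =", round(1.0 / (2.0 * math.log(2.0)), 4))
```

The aggregated area is 1/2 and the segregated area is ln(2) − 1/2, which gives the ratio 2 ln(2) − 1 and the aggregated proportion 1/[2 ln(2)] stated in the text.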