A spatially explicit approach to estimating species occupancy and spatial correlation

Authors


Cang Hui, Spatial, Physiological and Conservation Ecology Group, University of Stellenbosch, Private Bag X1, Matieland 7602, South Africa. E-mail: chui@sun.ac.za

Summary

  1. Understanding and predicting the form of species distributions, or occupancy patterns, is fundamental to macroecology and is dependent on the identification of scaling relationships that underlie the patterns observed.
  2. Occupancy–abundance models based on the negative binomial distribution and Taylor's power law are spatially implicit, rather than explicit, as they include no information on the relative positions of individuals. Here we present a spatially explicit model, the spatial scaling occupancy (SSO) model, to estimate species occupancy and spatial correlation, based on join-count statistics, or a pair approximation, approach. This model provides a spatially explicit description of species range size and aspects of range structure.
  3. Occupancy data from Drosophilidae species inhabiting a decaying fruit mesocosm were used to test the SSO model. Predictions from the spatially implicit and explicit models were largely equally accurate. The SSO model is thus more efficient as it is less data demanding, and more informative as it provides an estimation of spatial correlation.
  4. The results also showed that species distribution patterns differ when examined with spatially implicit vs. explicit approaches; the scaling relationship between occupancy and local density identifies a focal grain for studying the scale-dependent nature of ecological relationships; and the longer the length of the sample edge, the higher the occupancy observed under conditions of spatial aggregation.
  5. The SSO model presents a step towards a general scaling model for occupancy, and demonstrates that the inclusion of spatially explicit information in macroecological models warrants further attention.

Introduction

The relationship between pattern and scale is fundamental to population-, community- and macroecology (Levin 1992; Brown 1995; Gaston & Blackburn 2000; McGeoch & Price 2004). Indeed, understanding and predicting the form of species distributions, or occupancy patterns, is dependent on the identification of spatial scaling relationships that underlie the patterns observed (Kunin 1998; He & Gaston 2000a, 2003; Kunin, Hartley & Lennon 2000; Dungan et al. 2002; McGeoch & Gaston 2002; Perry et al. 2002). Species occupancy, or records of presence and absence across a series of sites or quadrates, has become central to the debate on several patterns in ecology, including occupancy frequency distributions (McGeoch & Chown 1997; McGeoch & Gaston 2002) and the occupancy–abundance relationship (Brown 1995; Gaston & Blackburn 2000; He & Gaston 2000b, 2003; Kunin et al. 2000; Holt, Gaston & He 2002). However, although spatial scale is clearly an important determinant of occupancy patterns, the relationship between occupancy and spatial scale (measured as grain or window size) remains difficult to predict (He & Gaston 2000a; McGeoch & Gaston 2002).

At the root of variation in the relationship between occupancy and spatial scale is the manner in which species aggregation patterns change over distance (Moloney et al. 1992; He, Gaston & Wu 2002; He & Gaston 2003; He & Hubbell 2003), with aggregation determined by a combination of species biology, behaviour, abundance and environmental heterogeneity (Nachman 1981; Taylor et al. 1983; Levin 1992; Dungan et al. 2002; Perry et al. 2002). Indeed, the range of statistical distribution models used to describe the occupancy–abundance relationship, reflects in part the inherent variation in the distribution patterns of individuals across space (He et al. 2002). Furthermore, although these models describe the occupancy–abundance relationship adequately, the parameter estimates of the relationship remain dependent on the scale of observation (He & Gaston 2000a). As a consequence, scaling models for species distributions have been developed from the relationships between species occupancy, spatial scale and pattern of aggregation (Kunin et al. 2000; He & Gaston 2003).

Some of the above models have been extended to use the scale-dependent nature of the occupancy–abundance relationship to predict its parameters (Kunin 1998, 2000; He et al. 2002). Kunin (1998) used a fractal power relationship to estimate the total area occupied by species (the sum of the number of occupied cells). He & Gaston (2000b) followed this with a model and parameterization method for predicting species abundance from measures of occupancy, across spatial scales, using the negative binomial distribution. Because abundance across the sampling extent is scale invariant, this model provides a scaling theory of occupancy. Methods such as this, of predicting species abundance and occupancy, are potentially very valuable to both conservation and pest management (He & Hubbell 2003; Warren, McGeoch & Chown 2003; Tosh, Reyers & van Jaarsveld 2004). Broad-scale distributional data, particularly abundance data, are generally labour-intensive and costly to obtain and generally not feasible for a broad range of taxa (He & Gaston 2000b). Therefore, approaches that are able to model occupancy patterns and predict species abundance accurately are an important avenue of ecological investigation.

Unfortunately, the accuracy of abundance estimates using the models mentioned above is often highly variable (Kunin et al. 2000; He & Gaston 2003; Warren et al. 2003; Tosh et al. 2004). Significantly, however, the information on aggregation incorporated in these statistical models (for example, mean abundance and the aggregation parameter k in the negative binomial distribution) is spatially implicit rather than explicit (sensu Perry 1998). In other words, no spatial information, such as relative location or spatial autocorrelation, is incorporated into the model (Wiens 2000; Perry et al. 2002; Veldtman 2004). The inclusion of spatial information in other forms of ecological models has significantly advanced our understanding of several ecological processes (e.g. Hanski 1998; Dieckmann, Law & Metz 2000; Veldtman & McGeoch 2004) and, as suggested by Warren et al. (2003) and shown by He & Hubbell (2003), may also improve the accuracy of occupancy–abundance model estimates.

Here we propose a novel scaling model (SSO model) for the estimation of species occupancy and first-order spatial correlation [i.e. the correlation in the probability of occupancy between two adjacent sites or quadrates (equivalent to local density, see below)] using an approach that includes spatially explicit information (sensu Perry 1998). The approach presented here facilitates accurate, data sparse estimation of occupancy, and provides an alternative approach towards modelling species distributions. We test the accuracy of model predictions using a mesocosm of flies occupying decaying fruit, and compare the accuracy of estimates of our spatially explicit model with occupancy estimates obtained from He & Gaston's (2003) spatially implicit occupancy–abundance model.

Occupancy–abundance scaling models

We first outline developments in occupancy–abundance models as the basis for the spatial scaling approach that we develop here. Spatial variance σ², or the variation in counts or abundance between samples, is a measure of statistical heterogeneity and is spatially non-explicit (Wiens 2000; Perry et al. 2002; Veldtman & McGeoch 2004). Three well-known distribution patterns are defined by spatial variance (Pielou 1969), i.e. aggregated (over-dispersed, clumped or contagious; spatial variance larger than mean abundance), random (spatial variance equivalent to mean abundance) and uniform (under-dispersed, even or regular; spatial variance less than mean abundance). Spatial variance is also estimated by formal statistical distributions, such as the Poisson and negative binomial distributions (Wright 1991), with the negative binomial most commonly adopted to describe species distributions:

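$$p_x(a) = \frac{\Gamma(k + x)}{x!\,\Gamma(k)} \left(\frac{\mu_a}{\mu_a + k}\right)^{x} \left(1 + \frac{\mu_a}{k}\right)^{-k}$$ (eqn 1)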

where px(a) is the proportion or the probability that a sampling quadrate of grain size a contains x individuals, µa is the mean abundance across samples, and k is a clumping parameter describing the species' distribution, from highly aggregated at k = 0 to random at k = +∞ (Wright 1991). As k tends to infinity, the negative binomial approaches a Poisson distribution. Based on the negative binomial distribution, the absence probability in a sample is p0(a) = (1 + µa/k)^(−k). Therefore, the presence probability in samples, i.e. the occupancy or occurrence, will be p+(a) = 1 − (1 + µa/k)^(−k), which is a relationship between occupancy p+(a) and mean abundance µa (He & Gaston 2000b). On the other hand, the mean abundance µa is the product of the density per unit area, d, and grain size, a (µa = a × d) (Hubbell 2001).
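The occupancy–abundance relationship above can be sketched numerically. The following is a minimal illustration, not the authors' code; the function names and the grain and density values are hypothetical, chosen only so that µa = 1. It shows occupancy rising towards the Poisson value 1 − exp(−µa) as k grows.

```python
import math


def nb_occupancy(mu: float, k: float) -> float:
    """Occupancy p+(a) = 1 - (1 + mu/k)^(-k) under the negative binomial."""
    return 1.0 - (1.0 + mu / k) ** (-k)


def poisson_occupancy(mu: float) -> float:
    """Occupancy under a random (Poisson) distribution, the limit k -> infinity."""
    return 1.0 - math.exp(-mu)


def mean_abundance(a: float, d: float) -> float:
    """Mean abundance per sample: grain size a times density per unit area d."""
    return a * d


if __name__ == "__main__":
    # Hypothetical values for illustration only: grain 0.04 m2, density 25 per m2.
    mu = mean_abundance(a=0.04, d=25.0)
    for k in (0.1, 1.0, 10.0, 1000.0):
        print(f"k = {k:7.1f}  occupancy = {nb_occupancy(mu, k):.4f}")
    print(f"Poisson limit  occupancy = {poisson_occupancy(mu):.4f}")
```

For a fixed mean abundance, smaller k (stronger aggregation) gives lower occupancy, because the same individuals are packed into fewer samples.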

Substituting the linear grain–density relationship into the presence or absence probabilities above thus provides a scaling theory of occupancy. However, this relationship has been shown to underestimate abundance in high occupancy (or highly aggregated) species, and to a lesser extent in rare and moderately abundant species (Kunin et al. 2000; He & Hubbell 2003; He & Gaston 2000b; Warren et al. 2003). He & Gaston (2003) found this underestimation resulted from discordance between the negative binomial and observed data. The relationship between the statistical variance and mean abundance in the negative binomial is σ² = µa + µa²/k; however, the observed relationship for most species fits a Taylor's power law σ² = c × µa^b, where c and b are constant (Taylor 1961). The assumption that exponent b is > 1 not only has a theoretical explanation, but is empirically well supported (Downing 1986). Under this assumption (b > 1), the variance–mean abundance ratio (called the coefficient of diffusion or index of dispersion, Pearson & Hartley 1966) changes from zero to infinity with scaling up (i.e. an increase in grain a). Thus, based on this spatially implicit approach, species distributions change from uniform to random, and finally to aggregated with an increase in grain [note that this ratio does not describe spatial distribution patterns accurately, but provides a measure of statistical heterogeneity (Hurlbert 1990)].

Based on Taylor's power law, the variance of the negative binomial should be replaced by a varying k-value, k = µa²/(σ² − µa) (He & Gaston 2003; He & Hubbell 2003). Therefore, the scaling theory of occupancy should be as follows (He & Gaston 2003):

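$$p_+(a) = 1 - \left(\frac{\mu_a}{c\,\mu_a^{b}}\right)^{\mu_a^{2}/(c\,\mu_a^{b} - \mu_a)}, \qquad \mu_a = a \times d$$ (eqn 2)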

This is also an occupancy–abundance relationship in the spatially implicit, rather than spatially explicit, sense. In sum, this scaling theory of occupancy is a combination of the negative binomial distribution (eqn 1) and Taylor's power law (henceforth referred to as the NBT model). Therefore, although the NBT model was developed to examine the spatial distributions of species, it paradoxically includes no explicit spatial information (Wiens 2000). While the NBT model provides accurate estimates in some cases (He & Gaston 2003), it requires two occupancy maps at different grains to do so (Kunin 1998; He & Gaston 2000b; Kunin et al. 2000). As shown below, a spatially explicit model that incorporates spatial correlation requires data at only a single spatial scale.
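As a rough sketch of how the NBT model combines the two ingredients, the code below substitutes the Taylor's power law variance into the varying k-value and then into the negative binomial occupancy. The function names and the parameter values (c = 2, b = 1.5, d = 25) are hypothetical and chosen only for illustration; they are not fitted values from any data set.

```python
def taylor_variance(mu: float, c: float, b: float) -> float:
    """Taylor's power law: variance = c * mu^b."""
    return c * mu ** b


def nbt_k(mu: float, c: float, b: float) -> float:
    """Varying clumping parameter k = mu^2 / (variance - mu)."""
    variance = taylor_variance(mu, c, b)
    if variance <= mu:
        # The negative binomial needs variance > mean for a positive k.
        raise ValueError("variance must exceed the mean abundance")
    return mu ** 2 / (variance - mu)


def nbt_occupancy(a: float, d: float, c: float, b: float) -> float:
    """Occupancy at grain a: 1 - (1 + mu/k)^(-k) with mu = a * d and varying k."""
    mu = a * d
    k = nbt_k(mu, c, b)
    return 1.0 - (1.0 + mu / k) ** (-k)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    for grain in (0.04, 0.16, 0.64):
        print(f"grain = {grain:.2f}  occupancy = {nbt_occupancy(grain, 25.0, 2.0, 1.5):.4f}")
```

Because c and b must be estimated from the variance and mean of abundance at several grains, this route needs more data than a single occupancy map.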

A spatial scaling model of occupancy

The terms ‘aggregated’ and ‘random’ are also used to describe spatially explicit patterns of heterogeneity, although their interpretation is somewhat different to that described by spatially implicit models (Wiens 2000; Fortin, Dale & ver Hoef 2002; Perry et al. 2002; Veldtman 2004). For example, Veldtman & McGeoch (2004) show that conclusions drawn regarding patterns of heterogeneity, and correlations between them, differ with the use of spatially explicit vs. implicit approaches. Therefore, spatial variance, as a spatially implicit measure, is clearly insufficient to describe patterns in the physical distribution of individuals across space (see also Hurlbert 1990).

One approach to describing species distributions in a spatially explicit manner is join-count statistics (Fortin et al. 2002). The join-count statistic is conceptually and mathematically similar to the pair-approximation (or moment approximation) approach [see Dieckmann et al. (2000) for outline of the latter] used to describe spatial distributions in metapopulation ecology. The spatiotemporal dynamics of binary maps forms the basis of metapopulation ecology, based on Levins’ patch occupancy model (Levins 1969; Hanski 1998; Hui & Li 2003). Join-count statistics can also be used to classify distributions as spatially aggregated, segregated or random in terms of the global and local densities used in pair approximation (Sato & Iwasa 2000; Hui & Li 2004). Global density is the probability that a randomly chosen sample is presently occupied by a local population, which has the same meaning as occupancy p+(a). Local density q+/+(a) is the conditional probability that a randomly chosen adjacent quadrate of an occupied quadrate is also occupied (Dieckmann et al. 2000; Hui & Li 2004). Therefore, spatially explicit aggregation can be described by q+/+(a) − p+(a) > 0, indicating the positive first-order spatial correlation between two adjacent, occupied samples. The spatial random distribution has q+/+(a) − p+(a) = 0 and implies the independence of two adjacent, occupied samples. Lastly, therefore, the spatial segregated distribution can be depicted by q+/+(a) − p+(a) < 0, i.e. a negative spatial correlation between two adjacent samples (Hui & Li 2004).
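The global and local densities defined above can be estimated from a binary occupancy map. The sketch below is one possible way to do so, assuming rook (four-neighbour) adjacency without wrap-around at the map edge; that neighbourhood choice, the function names and the two toy maps are assumptions made for illustration.

```python
def global_and_local_density(grid):
    """Return (p+, q+/+) for a binary map given as a list of rows of 0/1."""
    rows, cols = len(grid), len(grid[0])
    occupied = sum(sum(row) for row in grid)
    p = occupied / (rows * cols)
    pairs_from_occupied = 0  # ordered adjacent pairs whose first cell is occupied
    pairs_both_occupied = 0  # those pairs whose second cell is also occupied
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    pairs_from_occupied += 1
                    pairs_both_occupied += grid[rr][cc]
    q = pairs_both_occupied / pairs_from_occupied if pairs_from_occupied else float("nan")
    return p, q


def classify(p, q, tol=1e-9):
    """Classify the pattern by the sign of q+/+ - p+."""
    if q - p > tol:
        return "aggregated"
    if p - q > tol:
        return "segregated"
    return "random"


if __name__ == "__main__":
    clumped = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    alternating = [[(r + c) % 2 for c in range(4)] for r in range(4)]
    for name, grid in (("clumped", clumped), ("alternating", alternating)):
        p, q = global_and_local_density(grid)
        print(name, round(p, 3), round(q, 3), classify(p, q))
```

On a real map the difference q+/+(a) − p+(a) will rarely be exactly zero, so a tolerance or a significance test would be needed before calling a pattern random.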

We therefore use the pair-approximation approach (Sato & Iwasa 2000) to develop a scaling model for the prediction of species occupancy and spatial correlation. The spatial scaling occupancy (SSO) model predicts occupancy and spatial correlation with a change in grain size. Suppose the sampling unit a is a square quadrate sample (Fig. 1a). The most common shapes generated when such units are combined, such that the area is 4a, are a transect and chessboard (Fig. 1b,c). For a transect, the probability of absence in a sample is p0(4a) = p0(a) × q0/0(a)^3. The absence probability for a chessboard is p0(4a) = p0(a) × q0/0(a)^2 × b0(a), in which

$$b_0(a) = \frac{q_{0/0}(a)^{2}\,p_0(a)}{q_{0/0}(a)^{2}\,p_0(a) + q_{0/+}(a)^{2}\,p_+(a)}$$

is the estimation, according to Bayes' rule, that a sample unit with two absent neighbours (i.e. three-quadrates) is also absent (Pitman 1993), with q0/+(a) the absence probability in a quadrate adjacent to an occupied one. The absence probability in the chessboard case above is a product of three probabilities: the global density of absence p0(a) (probability that a patch is empty), the local density of absence in a pair q0/0(a) (correlation between the conditional absence probabilities of two adjacent patches) and the Bayes estimation of absence in the three-quadrate situation b0(a). The conditional probability q0/0(4a) has two forms in the transect case (adjacent to either the long or short edge) and only one form in the chessboard case. The probability q0/0(4a) in the transect adjacent to the long edge (Fig. 1d) is q0/0(4a) = q0/0(a) × b0(a)^3 and adjacent to the short edge (Fig. 1e) is q0/0(4a) = q0/0(a)^4. In the chessboard case (Fig. 1f), we have q0/0(4a) = q0/0(a)^2 × b0(a)^2.

Figure 1.

Sampling unit, or grain (a), transect (b), chessboard (c), and their different forms (d, e, f) when scaling-up (see text).

Therefore, according to probability rules that p+ = 1 − p0 and q+/+ = 1 − (1 − q0/0) × p0/p+, by scaling-up we obtain occupancy (global density) and spatial correlation (the local density minus the global density). Because the formula is complex, here we provide the probability for only the chessboard case, as this is the most common form of occupancy data. The occupancy and spatial pattern of transect samples can be obtained similarly from the above probabilities. For chessboard samples, the occupancy is:

$$p_+(4a) = 1 - \frac{\nabla^{4}}{\Delta}$$ (eqn 3)

and the local density is:

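$$q_{+/+}(4a) = \frac{\nabla^{10} - 2\nabla^{4}\Delta^{2} + \Delta^{3}}{\Delta^{2}\,(\Delta - \nabla^{4})}$$ (eqn 4)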

where ∇ = p0(a) − q0/+(a) × p+(a) and Δ = p0(a)[1 − p+(a)^2 (2q+/+(a) − 3) + p+(a)(q+/+(a)^2 − 3)]. The conditional probability q0/+(a) = 1 − q+/+(a) is the absence probability in a quadrate adjacent to an occupied one. If species occupancy p+(a) and local density q+/+(a) are known for the feasible region (coloured region of Fig. 2a,b), that is 0 ≤ p+(a) ≤ 1 and 2 − 1/p+(a) ≤ q+/+(a) ≤ 1 (it is not possible for combinations of occupancy and local density to lie outside this region) (Hui & Li 2004), then it is possible to obtain, from eqns 3 and 4, the occupancy and local density (and thus spatial correlation) of a species at larger scales (i.e. scaling-up) (Fig. 2a,b).
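The chessboard scaling-up step can be sketched from the chessboard probabilities given earlier, p0(4a) = p0(a) × q0/0(a)^2 × b0(a) and q0/0(4a) = q0/0(a)^2 × b0(a)^2, written here in terms of ∇ and Δ. This is an illustrative reading of the text, not the authors' implementation; the function name and the input checks are assumptions. The example input is the observed D. simulans occupancy and local density at the 0·04 m2 grain from Table 1.

```python
def sso_scale_up(p: float, q: float):
    """Scale occupancy p+(a) and local density q+/+(a) from grain a to 4a (chessboard)."""
    if not 0.0 < p < 1.0:
        raise ValueError("occupancy must lie strictly between 0 and 1")
    if not max(0.0, 2.0 - 1.0 / p) <= q <= 1.0:
        raise ValueError("local density lies outside the feasible region")
    p0 = 1.0 - p        # global density of absence
    q0_plus = 1.0 - q   # absence probability next to an occupied quadrate
    nabla = p0 - q0_plus * p
    delta = p0 * (1.0 - p ** 2 * (2.0 * q - 3.0) + p * (q ** 2 - 3.0))
    occupancy = 1.0 - nabla ** 4 / delta
    local_density = (nabla ** 10 - 2.0 * nabla ** 4 * delta ** 2 + delta ** 3) / (
        delta ** 2 * (delta - nabla ** 4)
    )
    return occupancy, local_density


if __name__ == "__main__":
    # Observed D. simulans values at grain 0.04 m2 (Table 1).
    occ, ld = sso_scale_up(0.505, 0.568)
    print(f"predicted at 0.16 m2: occupancy = {occ:.3f}, local density = {ld:.3f}")
```

With these rounded inputs the sketch gives values close to the 0·904 and 0·907 listed under prediction 1 in Table 1; small differences are expected because the table inputs are themselves rounded.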

Figure 2.

(a) Global density p+(4a) and (b) local density q+/+(4a) (i.e. first-order spatial correlation as defined in text) values (represented by colour scale below, left) obtained by scaling-up from grain a to 4a. (c) The ratio of density 1/p+(4a) in a transect to that in a chessboard (colour scale below, right). (d) An example of scaling-down estimation of occupancy and local density. The two curves are the contours of occupancy p+(4a) and local density q+/+(4a), respectively, for the values provided. The intersection point is the solution of occupancy and local density achieved with scaling-down.

From eqns 3 and 4, several outcomes are obtained. First, at the greatest degree of segregation, q+/+(a) = 2 − 1/p+(a) (the boundary between the feasible and non-feasible regions in Fig. 2a–c), occupancy and local density will rapidly tend to 1 with scaling-up (Fig. 2). This result can be obtained by the substitution of q+/+(a) in eqns 3 and 4. Another interesting outcome is that, for the spatially random distribution q+/+(a) = p+(a), the absence probability will decrease with sampling area as an exponential function, which concurs with the spatially implicit equivalent of randomness. Furthermore, the random spatial pattern is insensitive to grain size a, i.e. by substituting q+/+(a) = p+(a) into the equation, we have 1 − p+(4a) = [1 − p+(a)]^4 and q+/+(4a) = p+(4a). The first term implies p0(x × y) = p0(y)^x; however, the only function that coincides with this condition is the exponential p0(a) = Exp[−d × a], where d is a constant coefficient. Compared to randomness in a spatially implicit sense (Poisson process), the constant d is, indeed, the real density in the region or sampling extent, i.e. p+(a) = q+/+(a) = 1 − Exp[−µa]. Finally, the edge effect (i.e. the effect of the perimeter to area ratio) will inflate the occupancy observed under conditions of spatial aggregation, and decrease it under spatial segregation, with no influence on the occupancy of a spatially random distribution (as evident from a comparison of the occupancy probabilities for the chessboard and transect cases: see above explanation). Because the abundance in the whole region (sampling extent) does not change, this outcome also implies that the abundance or density in the long-border samples will be smaller in spatially aggregated species and will be larger in spatially segregated species. This can be demonstrated by a comparison of probabilities p+(4a) in the transect and chessboard cases (Fig. 2c).
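The edge effect can be illustrated by comparing the transect and chessboard absence probabilities at grain 4a given earlier in the text. The sketch below is an assumption-laden illustration: the function names and the three example (p+, q+/+) pairs are hypothetical, and b0 is taken as the Bayes estimate of absence given two absent neighbours.

```python
def absence_at_4a(p: float, q: float):
    """Return (transect, chessboard) absence probabilities at grain 4a."""
    p0 = 1.0 - p
    q0_plus = 1.0 - q                     # absence next to an occupied quadrate
    q00 = 1.0 - q0_plus * p / p0          # absence next to an empty quadrate
    b0 = q00 ** 2 * p0 / (q00 ** 2 * p0 + q0_plus ** 2 * p)
    transect = p0 * q00 ** 3
    chessboard = p0 * q00 ** 2 * b0
    return transect, chessboard


def chessboard_to_transect_ratio(p: float, q: float) -> float:
    """Occupancy in a chessboard divided by occupancy in a transect at grain 4a."""
    transect, chessboard = absence_at_4a(p, q)
    return (1.0 - chessboard) / (1.0 - transect)


if __name__ == "__main__":
    # Hypothetical (p+, q+/+) pairs for illustration only.
    for label, p, q in (("aggregated", 0.3, 0.6), ("random", 0.3, 0.3), ("segregated", 0.3, 0.1)):
        print(f"{label:10s} ratio = {chessboard_to_transect_ratio(p, q):.4f}")
```

A ratio below 1 means the longer-edged transect records higher occupancy, which is the signature of aggregation described above; a ratio above 1 indicates segregation.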

An important result here is that the occupancy p+(a) and local density q+/+(a) will both approach 1 in the limit with scaling-up. However, local density will at first decrease under aggregated conditions and then increase rapidly (Fig. 3), which means that the spatial distribution of a species will change from aggregated to random with scaling-up. In addition, as shown above, segregation will also tend to random with scaling-up. These results are opposite to those achieved using measures of statistical heterogeneity, where species spatial distributions change from random to aggregated with scaling-up. Because statistical variance increases faster than mean abundance (σ²/µa ∼ a^(b−1), b > 1), statistical heterogeneity thus increases with scaling-up. In contrast, spatial correlation converges to zero and spatial heterogeneity decreases with scaling-up, i.e. distribution patterns change from spatially aggregated to random. This highlights the significance of distinguishing spatially implicit from explicit patterns of species distributions in ecological studies (see also Veldtman & McGeoch 2004). Finally, Fig. 3 demonstrates, in agreement with He & Hubbell (2003; Fig. 2), that points of lowest local density (or spatial correlation) correspond with occupancy inflection points (i.e. where the rate of change of occupancy is highest). In other words, when the spatial autocorrelation structure of the species distribution is weak (low local density values), their distribution pattern will be strongly scale-dependent. This suggests that studies that are interested in the scaling behaviour of ecological relationships should focus on scales (grains) around this point.

Figure 3.

Scaling patterns of occupancy (solid lines) and local density (dashed lines) under spatial aggregation, calculated from eqns 3 and 4 (note scale differences between the occupancy and local density axes). Occupancy starting from p+(a) = 0·03 (○); p+(a) = 0·23 (×); p+(a) = 0·43 (▵); and p+(a) = 0·63 (+). All the local density values start from q+/+(a) = 0·93.

Because fine-scale binary (presence/absence) data are more information rich than coarse-scale (low-intensity) data (McGeoch & Gaston 2002), scaling-up predictions of occupancy are bound to be more accurate than those obtained by scaling-down (Fig. 2a,b). However, it may be possible to predict fine-scale spatial pattern from coarse-scale data under certain conditions by zooming in on a binary map. Based on the result that occupancy and spatial correlation approach 1 in the limit with scaling up, coarse-scale binary data always tend to have both high occupancy and spatial correlation. Therefore, the prediction of spatial pattern by scaling-down will be highly sensitive to the accuracy of coarse-scale data. Small deviations will lead to widely divergent predictions at fine scales. However, under loose mathematical constraints, the inverse function of eqns 3 and 4 exists if the spatial pattern is aggregated or random, from which the estimation of occupancy and local density with scaling down is possible (Fig. 2d). The inverse function does not exist for the segregated distribution, because it is difficult to distinguish segregation from randomness at coarse scales. Nevertheless, if the occupancy is greater than local density, i.e. negative spatial correlation, the numerical approaches that use eqns 3 and 4 to find occupancy p+(a) and local density q+/+(a) can be used to predict occupancy and spatial correlation with scaling-down (Fig. 2d).
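One simple numerical route to scaling-down is sketched below: for each candidate local density on a grid, the fine-grain occupancy that reproduces the coarse occupancy is found by bisection, and the candidate whose scaled-up local density is closest to the coarse value is kept. This is only one possible numerical approach under the chessboard equations, not necessarily the procedure used for Table 1; the function names, the grid resolution and the bisection settings are assumptions.

```python
def _nabla_delta(p: float, q: float):
    """Helper terms of eqns 3 and 4 for fine-grain occupancy p and local density q."""
    p0 = 1.0 - p
    nabla = p0 - (1.0 - q) * p
    delta = p0 * (1.0 - p ** 2 * (2.0 * q - 3.0) + p * (q ** 2 - 3.0))
    return nabla, delta


def occupancy_4a(p: float, q: float) -> float:
    nabla, delta = _nabla_delta(p, q)
    return 1.0 - nabla ** 4 / delta


def local_density_4a(p: float, q: float) -> float:
    nabla, delta = _nabla_delta(p, q)
    return (nabla ** 10 - 2.0 * nabla ** 4 * delta ** 2 + delta ** 3) / (
        delta ** 2 * (delta - nabla ** 4)
    )


def sso_scale_down(occ_coarse: float, ld_coarse: float, q_steps: int = 1000):
    """Search for fine-grain (p, q) whose scaled-up values match the coarse ones."""
    best = None
    for i in range(q_steps + 1):
        q = i / q_steps
        # Feasible occupancies for this q satisfy p <= 1 / (2 - q).
        lo, hi = 1e-9, min(1.0, 1.0 / (2.0 - q)) - 1e-9
        for _ in range(60):  # bisection: scaled-up occupancy increases with p
            mid = 0.5 * (lo + hi)
            if occupancy_4a(mid, q) < occ_coarse:
                lo = mid
            else:
                hi = mid
        p = 0.5 * (lo + hi)
        error = abs(local_density_4a(p, q) - ld_coarse)
        if best is None or error < best[0]:
            best = (error, p, q)
    return best[1], best[2]


if __name__ == "__main__":
    # Observed D. simulans values at grain 0.16 m2 (Table 1).
    p, q = sso_scale_down(0.854, 0.890)
    print(f"estimated at 0.04 m2: occupancy = {p:.3f}, local density = {q:.3f}")
```

As the text notes, the result is sensitive to small errors in the coarse-scale inputs, so estimates obtained this way should be treated with more caution than scaling-up predictions.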

Empirical evaluation

We tested the accuracy of the predictions of the spatial scaling occupancy (SSO) model (eqns 3 and 4) using a data set of Drosophilidae (Diptera) inhabiting a 12 × 18 decaying fruit (nectarine, Prunus persicae Miller) matrix (see Warren et al. 2003 for details). Six adjacent plots (2 × 3) with 36 nectarines in each plot were used, with three of the plots in alternate rows of the two columns shaded artificially with 80% shade netting to impose a level of microclimatic heterogeneity on the experiment. The occupancy data of four species on the 25th day (peak in temporal abundance) were used to test the accuracy of the predictions of eqns 3 and 4 and to compare these with predictions from the spatially implicit NBT model (eqn 2). The species examined were Drosophila simulans Sturtevant, D. melanogaster Meigen and a Zaprionus morphospecies group (see Warren et al. 2003). Warren et al. (2003) showed that these data were fitted by a negative binomial distribution. Based on three different sampling grains [0·04 m2, 0·16 m2 and 1·14 m2, as in Warren et al. (2003)], the Taylor's power law relationships between mean abundance and spatial variance (σ² = c × µa^b) were obtained (D. simulans, R2 = 0·995; D. melanogaster, R2 = 0·995; Zaprionus morphospecies group, R2 = 0·994). Occupancy and local density were calculated (SSO model, eqns 3 and 4) using the three sampling grains, and compared with the observed data and predictions from the NBT model.

The occupancy values predicted by the SSO and NBT models were, for the most part, strongly correlated with the observed values (Table 1). The coefficients of determination for the interspecific relationship between observed and predicted occupancy were > 0·99 for both models (predictions 1, 2 and 3) (Table 1). Occupancy values were thus equally accurately predicted with the spatially implicit and explicit (scaling-up or -down) approaches (Table 1). Local density was also accurately predicted with scaling-up, whereas with scaling-down the observed–predicted relationship was not significant (Table 1). Generally, local density estimates were less accurate than occupancy estimates, and scaling-down estimates of local density were particularly inaccurate, with accuracy < 0·90 in all cases (Table 1). Therefore, the SSO model performed as well as the NBT model in the prediction of occupancy. However, the SSO model also provided estimates of local density (with scaling-up) that were related significantly to observed values. This model therefore provides both range size information, as well as information on how that range is spatially structured. Furthermore, the NBT model (when used as it is here to predict occupancy, rather than as in its original usage to predict abundance (He & Gaston 2000b)) requires at least three different sampling grains, as well as the mean and variance of abundance, for the estimation of parameters c and b. The SSO model requires only a single occupancy map, and is thus less data demanding than the NBT model.

Table 1.  Observed and predicted occupancy and local density for Drosophilidae species across the mesocosm arena, and results of the linear regression between observed and predicted occupancy and local density. Predictions derived from the spatial scaling occupancy model (SSO) and the model derived from the negative binomial and Taylor's power law (NBT)
| Species | Grain (m2) | Observed Occ | Observed LD | SSO prediction 1 Occ | SSO prediction 1 LD | SSO prediction 2 Occ | SSO prediction 2 LD | NBT prediction 3 Occ |
|---|---|---|---|---|---|---|---|---|
| D. sim. | 0·04 | 0·505 | 0·568 | | | 0·639 | 0·817 | 0·458 |
| | 0·16 | 0·854 | 0·890 | 0·904* | 0·907** | | | 0·814** |
| | 0·64 | 1·000 | 1·000 | 1·000*** | 1·000*** | 0·988** | 0·988** | 0·990** |
| D. mel. | 0·04 | 0·036 | 0·143 | | | 0·033* | 0·226 | 0·034* |
| | 0·16 | 0·104 | 0·200 | 0·124 | 0·172 | | | 0·122 |
| | 0·64 | 0·417 | 0·283 | 0·392* | 0·407 | 0·324 | 0·358 | 0·389* |
| Z. msp | 0·04 | 0·151 | 0·201 | | | 0·223 | 0·578 | 0·182 |
| | 0·16 | 0·438 | 0·587 | 0·459** | 0·472 | | | 0·439*** |
| | 0·64 | 0·833 | 0·817 | 0·907* | 0·907* | 0·799** | 0·818*** | 0·804** |
| Linear regression results | | | | | | | | |
| Slope | | | | 1·03 | 0·99 | 1·14, 1·26 | 0·87, 1·13 | 0·95 |
| R2 | | | | 0·991 | 0·936 | 0·999, 0·994 | 0·999, 0·763 | 0·996 |
| F | | | | 456·11 | 58·96 | > 2·5 × 10^4, 181·64 | 4060·7, 3·23 | 1818·0 |
| (d.f.) | | | | (1,4) | (1,4) | (1,1) | (1,1) | (1,7) |
| P < | | | | 0·0001 | 0·01 | 0·01, 0·05 | 0·01, 0·32 | 0·0001 |

Occ: occupancy p+; LD: local density q+/+. Values in prediction 1 are calculated from eqns 3 and 4 using observed data at 0·04 m2 grain (i.e. scaling-up). In prediction 2, values are obtained from observed data at grain 0·16 m2 for 0·04 m2 (i.e. by scaling-down) and 0·64 m2 (i.e. by scaling-up). The prediction 2 values at the 0·04 m2 grain (italic in the original) are thus obtained from scaling-down. Prediction 3 indicates the occupancy predicted from the NBT model (eqn 2). Asterisks denote accuracy (A) = 1 − Abs[predicted − observed]/predicted; ***A > 0·99; **A > 0·95; *A > 0·9. D. mel., Drosophila melanogaster; D. sim., D. simulans; Z. msp, Zaprionus morphospecies group.
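The accuracy measure used for the asterisks in Table 1 can be written out as a short sketch. The function names are hypothetical; the thresholds and the example values are those given in the table and its note.

```python
def accuracy(predicted: float, observed: float) -> float:
    """Accuracy A = 1 - |predicted - observed| / predicted."""
    return 1.0 - abs(predicted - observed) / predicted


def stars(predicted: float, observed: float) -> str:
    """Asterisk code from the table note: *** A > 0.99, ** A > 0.95, * A > 0.9."""
    a = accuracy(predicted, observed)
    if a > 0.99:
        return "***"
    if a > 0.95:
        return "**"
    if a > 0.9:
        return "*"
    return ""


if __name__ == "__main__":
    # (predicted, observed) occupancy pairs taken from Table 1.
    examples = {
        "D. sim., 0.16 m2, prediction 1": (0.904, 0.854),
        "D. mel., 0.16 m2, prediction 1": (0.124, 0.104),
        "Z. msp, 0.16 m2, prediction 3": (0.439, 0.438),
    }
    for label, (pred, obs) in examples.items():
        print(f"{label}: A = {accuracy(pred, obs):.3f} {stars(pred, obs)}")
```

Because the error is divided by the predicted value, the same absolute error weighs more heavily on rare species, which is why small absolute misses for D. melanogaster earn no asterisk.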

In addition, the demonstration by the SSO model that the edge effect will increase occupancy and therefore decrease the density, or mean abundance, within samples when individuals are aggregated (and vice versa when segregated) is supported by this empirical evaluation. The ratio of occupancy in a chessboard to occupancy in a transect at grain 0·16 m2 was 0·9437 ± 0·0325 (mean ± SE) for D. simulans, 0·7738 ± 0·0595 for D. melanogaster and 1·0023 ± 0·0477 for the Zaprionus morphospecies group. These results suggest that D. simulans and D. melanogaster are spatially aggregated (as the ratios are less than 1), which coincides with the observed patterns (that local densities are larger than the occupancies of these two species in Table 1). The Zaprionus spp., on the other hand, was distributed randomly (as the ratio is approximately 1). This contradicts the observed data (Table 1), which suggests that individuals are aggregated (local density is greater than occupancy at grain 0·04 m2). The occupancy predictions for Zaprionus spp. were, however, not accurate (Table 1), as a probable consequence of its low abundance (56, compared to 2869 D. simulans individuals), and may thus not be expected to produce an accurate estimate of the edge effect.

We raise the edge effect issue here to demonstrate that edge length may be used to describe species distribution patterns (as also pointed out by He & Hubbell 2003). De Grave & Casey (2000) also tested the effect of sample edge on density estimates of intertidal macrofauna (data available for seven species). Densities obtained from square (16·8 × 16·8 cm) samples were compared to rectangular (33·5 × 8·4 cm) samples of the same area. Mean densities in rectangular samples of some species were found to be significantly lower than in square samples. The authors suggested that these density differences were a result of the spatial distribution and intensity of spatial aggregation. The edge effect results that we present here (Fig. 2c) provide a theoretical explanation for this phenomenon. The three species with no significant differences are likely to be spatially randomly distributed, whereas those with densities significantly lower than expected in rectangular samples are likely to be spatially aggregated. From Fig. 2, it is evident that the random distribution in fact occurs at the boundary between aggregation and segregation in the plane of p+(a) vs. q+/+(a). Under the assumption that species in the assemblage are distributed randomly across the phase plane in Fig. 2, the ratio of the area of segregation to aggregation in the feasible region is 2 ln(2) − 1 (≈ 0·39). If we compare the mean abundances of De Grave & Casey's (2000) seven species in square samples with those in rectangular samples, the proportion of segregated to aggregated species is 2 : 5 (= 0·4). This ratio is virtually identical to that predicted by Fig. 2 (≈ 0·39). Accordingly, a proportion of 1/(2 ln 2), or 72%, of the species in this case are spatially aggregated. This suggests, under the random assumption outlined above, a possible scale-invariant ratio for aggregated to segregated species in assemblages.
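The two area ratios quoted above follow from integrating over the feasible region 2 − 1/p+(a) ≤ q+/+(a) ≤ 1, with aggregation where q+/+(a) > p+(a) and segregation where q+/+(a) < p+(a). The sketch below checks them numerically with a midpoint rule; the function name and the number of integration steps are arbitrary choices made for illustration.

```python
import math


def region_areas(n: int = 100_000):
    """Midpoint-rule areas of the segregated and aggregated parts of the feasible region."""
    segregated = aggregated = 0.0
    dp = 1.0 / n
    for i in range(n):
        p = (i + 0.5) * dp
        lower = max(0.0, 2.0 - 1.0 / p)   # lower feasible bound on local density
        aggregated += (1.0 - p) * dp      # local density between p and 1
        segregated += (p - lower) * dp    # local density between the bound and p
    return segregated, aggregated


if __name__ == "__main__":
    seg, agg = region_areas()
    print(f"segregated : aggregated = {seg / agg:.4f} (analytic {2 * math.log(2) - 1:.4f})")
    print(f"aggregated share = {agg / (agg + seg):.4f} (analytic {1 / (2 * math.log(2)):.4f})")
```

The aggregated area is exactly 1/2 and the segregated area is ln(2) − 1/2, which gives the ratio 2 ln(2) − 1 and the aggregated share 1/(2 ln 2).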

Conclusion

First, it is possible to predict occupancy more efficiently, and as effectively, using a spatially explicit compared with a spatially implicit approach. As demonstrated here using the SSO and NBT models, the former is less data demanding and equally accurate. Furthermore, using the pair approximation approach, both occupancy and spatial correlation can be predicted across spatial scales, providing information on both range size and structure. Secondly, it is clearly necessary to distinguish spatially implicit from explicit spatial patterns of species distributions (see also Hurlbert 1990; Veldtman & McGeoch 2004). When scaling-up, species distribution patterns were found to change from random to aggregated with an implicit approach, whereas they changed from aggregated to random with the spatially explicit approach. Thirdly, occupancy and spatial correlation may be predicted by scaling-down when individuals have random or aggregated distributions (but not when segregated), albeit with less accuracy. Fourthly, the scaling relationship between occupancy and local density identifies a focal grain for examining the scale-dependent nature of ecological relationships. Finally, the edge effect on occupancy estimates provides additional information on species, and potentially also assemblage, distribution patterns.

Modelling and understanding the occupancy–abundance relationship remains one of macroecology's central themes (Holt et al. 2002; He & Gaston 2003; He & Hubbell 2003). Here we show that the join-count statistic (pair approximation) approach can be used to obtain a spatial scaling theory of occupancy. By describing spatial patterns in occupancy in terms of mean occupancy and first-order spatial correlation, the SSO model represents a step towards a general scaling model for occupancy. The model based on pair approximation, presented here, is clearly discrete (in contrast to previous models such as the NBT that scale continuously with area). Although a continuous, general scaling theory of occupancy is likely to be overly complex, further exploration of the inclusion of spatial autocorrelation structure into occupancy–abundance models is likely to be productive. Indeed, the fields of spatial structure documentation (e.g. Fortin et al. 2002; Perry et al. 2002) and modelling of spatial processes (e.g. Hanski 1998; Dieckmann et al. 2000) have to date developed largely independently. Combinations of these two approaches are likely to result in significant advances towards the development of a general macroecological framework. Here, the insights into species and assemblage distribution patterns gained from the SSO model suggest that the inclusion of spatially explicit information in macroecological models warrants further attention.

Acknowledgements

We thank K. J. Gaston, S. L. Chown, R. Veldtman and S. H. Hurlbert for comments and discussion, and B. Laniewski for editorial assistance. We are very grateful for the helpful comments by F. He and an anonymous reviewer. This work was supported by the National Research Foundation of South Africa (GUN 2053618) and the University of Stellenbosch.
