### ABSTRACT

**Aim** To investigate the influence of choice of the measure of mean abundance on the abundance–occupancy relationship, and to examine the implications for identifying causal mechanisms.

**Innovation** Simulations were performed to generate stochastic abundance–occupancy data sets covering a wide range of scenarios representative of empirical abundance–occupancy data. Two common measures of mean abundance were used: local mean abundance (mean abundance calculated using only data from occupied sites) and global mean abundance (mean abundance calculated using all sites or samples). I found that the choice of mean abundance measure had a strong effect on the correlation between abundance and occupancy. Local mean abundance was associated with a high proportion of negative correlations (mean percentage of negative correlations across 24 simulations = 44.39), while global mean abundance was strongly associated with positive correlations (mean percentage of negative correlations across 24 simulations = 0.02).

**Main conclusions** The choice of abundance measure influences the correlation between abundance and occupancy. Negative correlations between local mean abundance and occupancy are an inherent and unavoidable consequence of using this measure of abundance. Efforts to identify causal mechanisms that give rise to the abundance–occupancy relationship have attempted to explain occasional negative correlations when the expectation was for positive correlations. This study shows that negative correlations arise from the choice of mean abundance measure and that this artefact confounds efforts to identify ecological causal mechanisms.

### INTRODUCTION

Macroecology examines relationships between attributes of ecological systems measured at large spatial and temporal scales and attempts to infer the processes that give rise to them (Brown, 1995; Gaston & Blackburn, 2000; Blackburn & Gaston, 2004). Pairwise relationships between four key attributes – body mass, species richness, abundance and spatial distribution – have dominated the macroecological literature (Brown, 1995; Gaston & Blackburn, 2000; Blackburn, 2004; Blackburn & Gaston, 2004). The relationship between abundance and spatial distribution (i.e. some index of the amount of space occupied by a species) has received particularly intense scrutiny, with the conceptual foundation for the study of the relationship laid down by Ricklefs (1972), Hanski (1982), Bock & Ricklefs (1983) and Brown (1984, 1995). The motivation for these studies was to link abundance within resource patches (often referred to as ‘local abundance’) with the distribution or availability of resources.

Local abundance is defined as the mean number of individuals in a set of samples computed by excluding zero-count samples (Hanski, 1982; Bock & Ricklefs, 1983; Brown, 1984; Lacy & Bock, 1986; Brown, 1995). I will use the term ‘local mean abundance’ (LMA) to distinguish this measure from ‘global mean abundance’ (GMA) which is the mean abundance computed using all samples. Similar distinctions between methods for computing mean abundance were introduced by Elton (1932).

In the earliest exploration of the relationship by Hanski (1982), the chosen measure of spatial distribution was the proportion of samples or habitat patches occupied by a species; this is usually referred to as ‘occupancy’. Other researchers correlated abundance measures with the number of occupied sites (Bock & Ricklefs, 1983; Brown, 1984; Lacy & Bock, 1986; Brown, 1995), but this is directly proportional to occupancy for a set of samples and therefore does not alter correlation coefficients (Zar, 1999). Although many other measures of abundance and spatial distribution have been used (Gaston, 1996; Quinn *et al*., 1996; Gaston *et al*., 2000; Blackburn *et al*., 2006; Wilson, 2008), the early use of LMA and occupancy has led to the adoption of the generic term ‘abundance–occupancy relationship’ (Gaston *et al*., 2000).

Bock & Ricklefs (1983), Brown (1984, 1995) and Hanski (1982) concluded, largely on the basis of limited empirical evidence, that the relationship between LMA and occupancy within a given data set should result in a positive correlation. Hanski (1982), for example, postulated as a law of nature that species with the highest local abundance also occupied the largest number of sites. However, subsequent studies have revealed that zero and negative correlations do occur (Gaston & Lawton, 1990; Gaston, 1996; Gaston *et al*., 1997, 2000; Blackburn *et al*., 2006; Wilson, 2008), and their presence has led to the search for explanatory processes that generate mostly positive but occasionally zero and negative correlations. The search for explanatory processes has included the development of mathematical models of metapopulation dynamics (Gyllenberg & Hanski, 1992; Hanski & Gyllenberg, 1997; Holt *et al*., 1997; Freckleton *et al*., 2006), and conceptual models invoking evolutionary scenarios (Symonds & Johnson, 2006) or the action of ecological and biogeographical constraints on distribution (Gregory & Gaston, 2000).

LMA and GMA values for a given species are related to occupancy (Pennington, 1983; Wright, 1991). Let the total number of samples in a data set, *N*, be composed of two parts: the number of samples in which one or more individuals were *present* (*N*_{P}, the occupied samples), and the number of samples from which individuals were *absent* (*N*_{A}, the zero-count or unoccupied samples; equals *N* – *N*_{P}). Occupancy (*ω*) is then computed as the proportion of samples with a count greater than zero:

$$\omega = \frac{N_{P}}{N}$$

Occupancy may be interpreted as the probability that a randomly selected sample is occupied, and can therefore range from 0 to 1, inclusive.

Local mean abundance (LMA) is the mean computed using only counts (*x*) from occupied samples. In statistical terminology, LMA (*m*_{L}) is a statistic computed using left-censored, conditioned or zero-truncated data. That is,

$$m_{L} = \frac{1}{N_{P}} \sum_{i:\, x_{i} > 0} x_{i}$$

Global mean abundance (GMA) is the unbiased estimate of the mean or expected value of the statistical population from which a set of samples was drawn (Freund, 1972). GMA (*m*_{G}) is computed using the standard formula from elementary statistics, namely,

$$m_{G} = \frac{1}{N} \sum_{i=1}^{N} x_{i}$$

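Because zero-count samples contribute nothing to the sum of counts, the two mean abundances are related via occupancy:

$$m_{L} = \frac{m_{G}}{\omega} \tag{1}$$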

Since 0 < *ω* ≤ 1, it follows that *m*_{L} ≥ *m*_{G} always. A further artefact of using *m*_{L} is that the minimum attainable value for this measure is not zero. Because local mean abundance is measured as individuals per occupied sample, the minimum value for LMA is 1, which occurs when only one individual is detected in each occupied sample (for example, a single individual in a single sample within a sample set).
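
The definitions above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `abundance_summary` and the example counts are hypothetical and are not taken from the study.

```python
def abundance_summary(counts):
    """Return (occupancy, LMA, GMA) for one species' counts across samples."""
    n_total = len(counts)                      # N: all samples
    occupied = [x for x in counts if x > 0]    # counts from occupied samples only
    n_present = len(occupied)                  # N_P: occupied samples
    if n_total == 0 or n_present == 0:
        raise ValueError("need at least one occupied sample")
    occupancy = n_present / n_total            # omega = N_P / N
    lma = sum(occupied) / n_present            # local mean abundance, m_L
    gma = sum(counts) / n_total                # global mean abundance, m_G
    return occupancy, lma, gma


# Hypothetical counts for one species in 10 samples.
counts = [0, 0, 3, 0, 1, 0, 0, 4, 0, 0]
occupancy, lma, gma = abundance_summary(counts)
print(occupancy, lma, gma)
```

For these hypothetical counts, occupancy is 3/10, GMA is 8/10 and LMA is 8/3, so LMA equals GMA divided by occupancy, as in equation 1. A single individual in a single sample gives an LMA of exactly 1, whatever the number of samples.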

Equation 1 defines a hyperbolic function, and the relationship between *m*_{G} and *m*_{L} is therefore nonlinear and occupancy dependent. On an abundance–occupancy plot, low-occupancy species will be shifted to the right more than higher-occupancy species when LMA is used, and the shift at very low occupancy (e.g. a very rare species in a large set of samples) may be of several orders of magnitude (Fig. 1). It is therefore possible for a scatter of points on an abundance–occupancy plot with a positive correlation between GMA and occupancy to be transformed into a negative one between LMA and occupancy.
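
A small worked example may make the sign reversal concrete. The five occupancy and GMA values below are hypothetical, chosen only so that GMA rises with occupancy; LMA is then derived from them using equation 1. The helper `pearson` is a plain implementation of the Pearson correlation coefficient written for this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical five-species data set: GMA increases with occupancy.
occupancy = [0.01, 0.05, 0.20, 0.50, 0.90]
gma = [0.05, 0.15, 0.40, 0.80, 1.20]

# Equation 1: m_L = m_G / omega.
lma = [m_g / w for m_g, w in zip(gma, occupancy)]

r_gma = pearson(gma, occupancy)
r_lma = pearson(lma, occupancy)
print("GMA vs occupancy: r = %.2f" % r_gma)
print("LMA vs occupancy: r = %.2f" % r_lma)
```

In this constructed case the rarest species has the largest LMA (about 5 individuals per occupied sample) and the most widespread species the smallest (about 1.33), so the same five species give a positive correlation with GMA and a negative one with LMA.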

The statistical attributes of LMA were considered by Aitchison (1955) and Pennington (1983), and the implications of LMA for modelling the abundance–occupancy relationship were examined by Wright (1991) and mentioned by Hartley (1998). In light of the preliminary assessment that LMA transforms the relationship between abundance and occupancy, I sought to answer the following questions using simulations: What is the influence of the transformation between GMA and LMA on the correlation of each with occupancy? Does the use of LMA explain the observation of occasional negative correlations between LMA and occupancy in empirical data?

### DISCUSSION

LMA represents a nonlinear, occupancy-dependent rescaling of GMA (Pennington, 1983; Wright, 1991). Although Wright (1991) highlighted important aspects of the relationship between the two abundance measures from a macroecological perspective, there has not been a thorough appraisal of the impact of LMA on the abundance–occupancy relationship. My results clearly show that zero and negative correlations between LMA and occupancy are always possible in survey data. Further, the probability of a negative correlation increases as the maximum occupancy within a set of samples decreases. For example, if only rare species are examined, the probability of a negative correlation increases to approximately 0.5. Zero and negative correlations may occur when GMA is used, but are extremely rare events only occurring with probabilities of the order of 0.001 when sample size (e.g. number of species in an inter-specific abundance–occupancy relationship) is small.
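
The kind of comparison described above can be sketched with a toy simulation. This is not the simulation design used in the study. It assumes, purely for illustration, that each species has independent Poisson counts across samples, with a per-species intensity drawn log-uniformly between 0.01 and a chosen maximum. All function names, parameter values and the seed are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation, or None when either variable has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def negative_fractions(max_intensity, n_species=20, n_samples=100,
                       n_reps=200, seed=1):
    """Fractions of replicates with r < 0 for (LMA, occupancy) and (GMA, occupancy)."""
    rng = random.Random(seed)
    neg_lma = neg_gma = used = 0
    for _ in range(n_reps):
        occ, lma, gma = [], [], []
        for _ in range(n_species):
            # Assumption: intensities are log-uniform on [0.01, max_intensity].
            lam = math.exp(rng.uniform(math.log(0.01), math.log(max_intensity)))
            counts = [poisson_draw(rng, lam) for _ in range(n_samples)]
            n_present = sum(1 for x in counts if x > 0)
            if n_present == 0:
                continue  # an undetected species has no LMA and is dropped
            occ.append(n_present / n_samples)
            lma.append(sum(counts) / n_present)
            gma.append(sum(counts) / n_samples)
        if len(occ) < 3:
            continue  # too few detected species for a meaningful correlation
        r_lma, r_gma = pearson(lma, occ), pearson(gma, occ)
        if r_lma is None or r_gma is None:
            continue  # correlation is undefined for a constant variable
        used += 1
        neg_lma += r_lma < 0
        neg_gma += r_gma < 0
    if used == 0:
        raise ValueError("no usable replicates")
    return neg_lma / used, neg_gma / used


rare = negative_fractions(max_intensity=0.05)   # rare species only
wide = negative_fractions(max_intensity=3.0)    # wide range of occupancy
print("rare species only:    LMA %.3f, GMA %.3f" % rare)
print("wide occupancy range: LMA %.3f, GMA %.3f" % wide)
```

The printed fractions depend on the seed and on the assumed count model, so they should not be compared directly with the percentages reported in this study. The sketch only shows how the proportion of negative correlations can be tallied separately for LMA and GMA under scenarios that differ in maximum occupancy.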

The results I have obtained here help clarify the nature of the abundance–occupancy distribution when mean abundance and occupancy are used as the indices of abundance and spatial distribution, respectively. This is the form in which the abundance–occupancy relationship was first described, but many subsequent studies have used other measures for abundance and spatial distribution. Even though there often is a positive correlation between most alternative measures of spatial distribution and occupancy (Gaston, 1991; Gaston, 1994; Quinn *et al*., 1996), a recent meta-analysis (Wilson, 2008) has shown that these other measures have an impact on the sign of the correlation. A study of the impact that various combinations of measures have on the sampling distribution of correlations between abundance and other measures of distribution may prove to be a fruitful area of further study.

There is long-standing recognition that mean abundance (or its counterpart, mean density) conditioned or truncated in some way may be a meaningful measure of abundance (Elton, 1932; Pennington, 1983; Rudran *et al*., 1996; Gaston *et al*., 1999a). However, there does not appear to have been a similar long-standing recognition in ecology that correlating a transformed and truncated variable (e.g. LMA) with an unconditioned variable (e.g. occupancy) may introduce statistical artefacts, even though Brett (2004) highlighted the statistical implications of correlations of the form *X*/*Y* correlated with *Y*. The introduction of statistical artefacts into the correlation between LMA and occupancy appears analogous to the instability in linear relationships when data collected with differing extents and grains are compared (Wiens, 1989; Schneider, 1994, 1998) and when different arrangements of regional aggregations are used in geographical analyses (the change of support or modifiable areal unit problem, MAUP; Openshaw & Taylor, 1979; Cressie, 1996). Recognition of the possible impact that LMA may have on the relationship between measures of abundance and distribution is, however, beginning to emerge in macroecology (e.g. Pautasso & Weisberg, 2008).
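
The *X*/*Y* versus *Y* artefact can be illustrated with two independent random variables. The sketch below is a generic illustration, not an analysis from Brett (2004) or from this study: the uniform distributions on [1, 2], the sample size and the seed are arbitrary assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
n = 2000

# X and Y are drawn independently, so they carry no real association.
x = [rng.uniform(1.0, 2.0) for _ in range(n)]
y = [rng.uniform(1.0, 2.0) for _ in range(n)]

# Dividing X by Y builds a dependence on Y into the ratio itself.
ratio = [xi / yi for xi, yi in zip(x, y)]

r_xy = pearson(x, y)
r_ratio_y = pearson(ratio, y)
print("r(X, Y)   = %.3f" % r_xy)
print("r(X/Y, Y) = %.3f" % r_ratio_y)
```

Although *X* and *Y* are unrelated, the ratio *X*/*Y* is clearly negatively correlated with *Y*, because *Y* appears in the denominator. LMA has the same structure, since equation 1 defines it as GMA divided by occupancy.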

The correlation between LMA and occupancy (represented by the number of occupied sites) was first used by Bock & Ricklefs (1983), and later justified and expanded in scope by Lacy & Bock (1986) to avoid a statistical artefact that they thought was created by using GMA. Their reasoning was as follows. (1) Assume that species are most abundant in the centre of their ranges and decline in abundance towards limits of their range (but see Blackburn *et al*., 1999; Sagarin & Gaines, 2002; Sagarin *et al*., 2006). (2) It follows that those species with range limits falling within a study region will have low mean abundance and occur in fewer samples than species whose ranges do not end within the region. (3) This could result in a positive correlation between abundance and occupancy ‘where no such relationship actually existed’ (Bock & Ricklefs, 1983, p.295). Subsequently, Brown (1984, 1995), Gaston and co-authors (e.g. Gaston *et al*., 1997, 1999b, 2000) and others (e.g. Soininen & Heino, 2005; Leger & Forister, 2009) have continued to use LMA because of its assumed ability to correct for perceived artefactual positive correlations caused, in their view, by inclusion of zero-count or empty samples.

Using LMA does not solve this problem, because sampling the spatial pattern of each species in a region by standardized methods will typically give a positive correlation between GMA and occupancy. The only exceptions occur when sample sizes are small, for example when restricted to a narrow range of abundance values. Fundamental principles from spatial point pattern analysis indicate that this relationship is expected to hold without regard to the position of range boundaries relative to the boundaries of a study region, and without regard to variation in the intensity of the spatial pattern of any species (P.D.W., unpublished). That is, a positive correlation between GMA and occupancy is an inherent property of *any* sufficiently large ensemble of sampled spatial patterns collated into an abundance–occupancy plot, and not a sampling artefact as suggested by Bock & Ricklefs (1983), Lacy & Bock (1986) and others. Regarding the inclination of some ecologists to discard or avoid zero-count samples, it was noted by Diggle (2003, p.32) that this practice is founded ‘in the mistaken belief that an empty quadrat contains no information’. The impact of this discarded information is clearly apparent when we try to relate LMA to occupancy.

An important aspect of sampling biological populations is the impact of imperfect detection of organisms leading to an underestimate of occupied sites and consequent errors in estimates of mean abundance (Royle *et al*., 2005; MacKenzie *et al*., 2006). The impact of so-called zero-inflated data has received some consideration with respect to identifying valid models linking abundance and occupancy (e.g. Wenger & Freeman, 2008; Sileshi *et al*., 2009). However, it can be shown that zero-inflated data will still be affected by the rescaling of abundance between GMA and LMA when correlated with occupancy. This is because underdetection increases LMA, decreases apparent occupancy and therefore increases the impact of the hyperbolic rescaling of GMA into LMA represented in Fig. 1.

My results may have broad implications for the study of macroecological relationships. Macroecology seeks to infer causal mechanisms from relationships observed at large spatial and temporal scales (Brown, 1995; Gaston & Blackburn, 2000). Although it is necessary to first document and classify observed patterns before making causal inferences (Underwood *et al*., 2000), and even though there are occasional successes (Diniz-Filho, 2006; Chapman *et al*., 2009), it has in general proved difficult to make the link between pattern and process in ecology (Cale *et al*., 1989; Lepš, 1990; McArdle *et al*., 1997; Tyre *et al*., 2001). I suggest that efforts to select amongst competing explanatory or causal mechanisms for the abundance–occupancy relationship (e.g. Gaston *et al*., 1997, 2000; Krüger & McGavin, 2000; Heino, 2005; Päivinen *et al*., 2005; Blackburn *et al*., 2006; Webb *et al*., 2009; Zuckerberg *et al*., 2009) have been confounded by the presence of a statistical artefact when using LMA to describe spatial patterns.

The present study lends weight to Wright's (1991) view that simple correlations between abundance and occupancy are of limited explanatory value in macroecology, and provides an instance of the general findings of Brett (2004) regarding spurious correlations or statistical artefacts in ecological relationships. The results I have presented suggest that greater care is required to avoid confounding influences of statistical artefacts in the study and application of macroecological relationships.