Sampling effort, regression method, and the shape and slope of size–abundance relations



1. Despite a substantial body of work there remains much disagreement about the form of the relationship between organism abundance and body size. In an attempt at resolving these disagreements the shape and slope of samples from simulated and real abundance–mass distributions were assessed by ordinary least squares regression (OLS) and the reduced major axis method (RMA).

2. It is suggested that the data gathered by ecologists to assess these relationships are usually truncated in respect of density. Under these conditions RMA gives slope estimates which are consistently closer to the true slopes than OLS regression.

3. The triangular relationships reported by some workers are found over smaller mass and abundance ranges than linear relations. Scatter in slope estimates is much greater and positive slopes more common at small sample sizes and sample ranges. These results support the notion that inadequate and truncated sampling is responsible for much of the disagreement reported in the literature.

4. The results strongly support the notion that density declines with increasing body mass in a broad, linear band with a slope around −1. However there is some evidence to suggest that this overall relation results from a series of component relations with slopes which differ from the overall slope.


Considerable effort has been spent investigating the shape, slope and interpretation of relationships between body size and organism abundance, the appropriate type of data to use, and the appropriate method of analysis. Despite a substantial body of published work much disagreement remains.

Damuth (1981, 1987) showed (log) density to decline with increasing (log) body mass in a broad linear band with a slope of −0.75. This value suggested a metabolic explanation for the relation, with the implication that populations of different-sized species used equivalent amounts of energy (the energetic equivalence rule, EER). Triangular or polygonal relationships with much flatter slopes (e.g. Brown & Maurer 1986) have subsequently been described (e.g. Brown & Maurer 1987; Blackburn, Harvey & Pagel 1990), implying that larger species were energetically dominant. Energy limitation might determine upper bound abundances (Fig. 1) (Blackburn et al. 1993) whereas lower bound abundances could be the result of sampling artefacts (Griffiths 1992; Blackburn et al. 1993; Blackburn, Lawton & Pimm 1993; Currie 1993) or reflect minimum viable population sizes (Silva & Downing 1994). Navarrete & Menge (1997) showed that predation could reduce the slope of a size–abundance relation by preferential consumption of small species. Silva & Downing (1995) found the overall relation for mammal data to be non-linear, with larger species exhibiting flatter slopes: Damuth (1993) produced evidence for the converse within dietary groups. Griffiths (1992) pointed out that a potentially more appropriate technique, the reduced major axis method (RMA) gave steeper slope estimates (of about −1.0) than those obtained by ordinary least squares (OLS) regression. It has been argued that compiled data will bias the relation because of a tendency to report or study species in high density populations (Lawton 1989, 1990): Blackburn & Gaston (1996) and Smallwood & Schonewald (1996) found evidence for this. However Blackburn & Gaston (1997) found no differences between compiled and natural assemblage data: they showed that the scale at which the study was conducted was the most important factor influencing results. 
Blackburn & Gaston (1997) examined a range of possible explanations for the variability in size–abundance relations but were unable to decide whether the patterns which have been detected were real or were sampling artefacts.

Figure 1.

The various components of a size–abundance distribution.

Here I investigate the effect of sampling range and regression model on the shape and slope of the relations detected when sampling from various simulated data distributions and compare the results obtained with observed relationships. The findings support the notion that much of the variability previously reported is artefactual, and that abundance declines with increasing mass in a broad band of slope approximately −1.

Describing the shape of size–abundance relations

A size–abundance relationship is derived by plotting the mean density (or number of individuals counted) of species against their mean body size (usually mass). Figure 1 names the various components characterizing the shape of this relationship. Mean slope (usually referred to simply as slope) is determined by regression of all the data points. The boundary slopes might not be linear, as illustrated for the lower- and upper-bound slopes shown in the figure. Blackburn, Lawton & Perry (1992) described a procedure for determining the slope of upper bounds: this can also be applied to lower bounds.

I assume that the size-independent component of the lower bound is a sampling artefact (Griffiths 1992; Currie 1993; Blackburn & Gaston 1994a). Any distribution in which this component dominates is termed triangular whereas if most of the lower bound is a function of body size the distribution is described as linear. I ignore the rising component of the upper bound at small body sizes noted by some authors (see Marquet, Navarrete & Castilla 1995 and references therein), a number of whom regard it as a sampling artefact.

The data distributions

Body mass distributions were generated by randomly sampling from uniform, positively skewed unimodal or polymodal distributions: the latter two mass distributions are common in nature. Densities for distributions 1–3 were then generated by assuming that

log10 density = 12 − log10 mass

over the range 0–12 log10 mass. This slope value is close to that detected by Griffiths (1992). The resulting densities were randomised as normally distributed variables with standard deviation = 1 which produced a 3–4 fold range in log density at any given size, consistent with observed values. Distributions 4 and 5 also assumed the same overall density–mass relation across modes. However within each size mode density was weakly mass-dependent in distribution 4 and strongly mass-dependent in distribution 5.

Distribution 1

800 species following a uniform mass distribution were created. The observed ordinary least squares (OLS) slope was −1.00 ± 0.010.

Distribution 2

A skewed mass distribution was generated from a gamma distribution with shape parameter 6 and scale parameter 0.4 (Wilkinson 1990): 900 species were created. The observed OLS slope was −1.01 ± 0.017.

Distribution 3

Mass was polymodally distributed. Three normally-distributed random variates of mean log mass 2, 6 and 10 and standard deviation 1 were generated and combined to obtain a polymodal size distribution. A total of 900 species were created, 17 of which were outside the 0–12 log mass range. This distribution, like that of distribution 1, is homoscedastic. The observed OLS slope was −1.01 ± 0.017.

Distribution 4

The same mass distribution was used as in distribution 3. However within each of the size modes density was calculated as a normally distributed random variate of standard deviation 1 and slope −0.5: the intercepts were 11, 9, and 7 for progressively larger size modes. The overall observed OLS slope was −0.96 ± 0.010 rather than −1.00.

Distribution 5

The same mass distribution as distribution 3 was used but within each size mode density was calculated as a normally distributed random variate of standard deviation 1 and slope −1.5: the intercepts were 13, 15, and 17 for progressively larger size modes. The overall observed OLS slope was −1.06 ± 0.011 rather than −1.00. These distributions, with LOWESS smoothed lines fitted, are shown in Fig. 2.

Figure 2.

Data distributions 2–5 (a–d respectively) with LOWESS smoothing (tension 0.2) applied. A smoothed line of tension 0.7 (dashed line) is also shown for distribution 4.

A symmetrical version of distribution 2 (generated as a normally distributed random variate of mean 6 and standard deviation 2) gave similar results to distribution 2 and the results obtained are not reported here.
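The construction of the simulated distributions can be sketched in a few lines of Python. This is a minimal illustration rather than the original procedure: the intercept of 12 for distribution 1 is inferred from the intercepts quoted for distributions 4 and 5, the split of 300 species per mode is an assumption, and the function names (ols_slope, distribution_1, polymodal) are hypothetical.

```python
import random

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def distribution_1(n, rng):
    """Uniform log10 mass on 0-12 with unit-variance scatter in log10 density.
    The intercept of 12 is an assumption inferred from distributions 4 and 5."""
    mass = [rng.uniform(0.0, 12.0) for _ in range(n)]
    density = [12.0 - m + rng.gauss(0.0, 1.0) for m in mass]
    return mass, density

def polymodal(n_per_mode, within_slope, intercepts, rng):
    """Three normal size modes (means 2, 6, 10; sd 1), each with its own
    density-mass line, as described for distributions 4 and 5."""
    mass, density = [], []
    for mode_mean, intercept in zip((2.0, 6.0, 10.0), intercepts):
        for _ in range(n_per_mode):
            m = rng.gauss(mode_mean, 1.0)
            mass.append(m)
            density.append(intercept + within_slope * m + rng.gauss(0.0, 1.0))
    return mass, density

rng = random.Random(1)  # fixed seed so the sketch is repeatable
m1, d1 = distribution_1(800, rng)
m4, d4 = polymodal(300, -0.5, (11.0, 9.0, 7.0), rng)    # weak within-mode slope
m5, d5 = polymodal(300, -1.5, (13.0, 15.0, 17.0), rng)  # strong within-mode slope

slope_1 = ols_slope(m1, d1)
slope_4 = ols_slope(m4, d4)
slope_5 = ols_slope(m5, d5)
print(f"distribution 1 OLS slope: {slope_1:.2f}")
print(f"distribution 4 OLS slope: {slope_4:.2f}")
print(f"distribution 5 OLS slope: {slope_5:.2f}")
```

With these settings the overall slopes should fall close to −1, slightly flatter for distribution 4 and slightly steeper for distribution 5, which is the pattern reported above.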

I assume that crude density estimates (sensu Blackburn & Gaston 1997) for assemblages/communities are obtained following one of two sampling procedures. Many ecologists census or sample once from a defined area (which supports a finite number of individuals): the lowest abundance estimate is obtained when a species is represented by a single individual. When repeated censuses are taken, species which occur very infrequently are often indicated simply as present (e.g. Thiollay 1994) and cannot be used in size–abundance analyses: Gaston (1994) lists other potential sampling biases. I imitated such ‘density-prescribed’ sampling by analysing the shape and slope of the data distributions, truncated over an increasing range of densities from common to rare (Fig. 3a). In contrast, in ‘size-prescribed’ sampling samples are taken over an increasing size range, with the proviso that all samples include the smallest, often most common, species (Fig. 3b). Such samples might be obtained by investing increasing effort until a particular sized species is incorporated in the samples. Some reported assemblages are size-prescribed, consisting, for example, of small or large mammals (see the reference list in Silva & Downing 1995 for examples).

Figure 3.

(a) Density-prescribed (horizontal lines) and (b) size-prescribed (dotted vertical lines) sampling from a size–abundance distribution (shaded band). Note that the ratio of the solid horizontal and vertical lines (approximately the same as the standard deviations of the two variates and hence the RMA slope) is constant in (a) but changes with the size range sampled in (b).

In this manuscript range is measured as the difference in log10 maximum and minimum values i.e. in orders of magnitude. Where appropriate statistics are reported with standard errors unless otherwise stated.
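The two sampling procedures and the range measure can be expressed as simple filters on a list of (log10 mass, log10 density) pairs. The sketch below is illustrative only: the function names, the thresholds and the simulated species list (uniform masses, an assumed intercept of 12, unit scatter) are hypothetical choices, not values taken from the analysis.

```python
import random

def log_range(values):
    """Range in orders of magnitude for values already on a log10 scale."""
    return max(values) - min(values)

def density_prescribed(species, min_log_density):
    """Keep only species at or above a density threshold, so rare species are missed."""
    return [(m, d) for m, d in species if d >= min_log_density]

def size_prescribed(species, max_log_mass):
    """Keep all species up to a maximum size; the smallest species are always included."""
    return [(m, d) for m, d in species if m <= max_log_mass]

rng = random.Random(2)  # fixed seed so the sketch is repeatable
species = []
for _ in range(800):
    m = rng.uniform(0.0, 12.0)
    species.append((m, 12.0 - m + rng.gauss(0.0, 1.0)))  # assumed generating relation

common = density_prescribed(species, 10.0)  # only the most abundant species
small = size_prescribed(species, 3.0)       # only the smallest species

for label, sample in (("all", species), ("density >= 10", common), ("mass <= 3", small)):
    masses = [m for m, _ in sample]
    dens = [d for _, d in sample]
    print(f"{label}: n={len(sample)}, mass range={log_range(masses):.1f}, "
          f"density range={log_range(dens):.1f}")
```

Lowering the density threshold, or raising the mass limit, step by step gives the sequence of progressively larger samples described above.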

Regression methods

Ricker (1973) drew the assumptions made by various regression models to the attention of ecologists. Estimated slopes differ little between regression methods when the variables are highly correlated but differ considerably when there is substantial scatter in the relationship.

In the analysis of mass–abundance distributions debate about the appropriate regression model has centred around the relative sizes of the error variances in the dependent and independent variables. When there is no variance in estimates of the predictor variable ordinary least squares regression is favoured while the reduced major axis (RMA) method is preferred when the error variances are equal to the variances of the two variables (Ricker 1973; McArdle 1988). However, Ricker (1973) pointed out other circumstances when RMA was a superior estimator. For example, when samples were taken from naturally polymodal distributions (Ricker type B), or were part of open-ended distributions (Ricker type E) RMA was consistently better at estimating the true slope than OLS regression. Over large size ranges species size distributions are frequently polymodal and open-ended (e.g. Figure 1 of Blackburn & Gaston 1994b; Kirchner, Anderson & Ingham 1980; Warwick 1984; Griffiths 1986; Holling 1992).

However, RMA is a potentially misleading slope estimator when correlation coefficients are close to zero: the RMA slope equals the OLS slope divided by |r| and takes the sign of r, so its magnitude approaches ∞ as r approaches 0. Furthermore, RMA slopes around 0 will generate a bimodal slope distribution even when the OLS slope distribution is unimodal simply because RMA slopes are always more positive or more negative than the corresponding OLS slopes.
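The relation between the two estimators can be checked numerically. The following sketch computes both slopes from sums of squares; the function name ols_rma and the six data points are made up for illustration, and the RMA slope is given the sign of the correlation coefficient.

```python
import math

def ols_rma(x, y):
    """Return (OLS slope, RMA slope, correlation coefficient r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    b_ols = sxy / sxx
    # RMA slope: ratio of the standard deviations, carrying the sign of r
    b_rma = math.copysign(math.sqrt(syy / sxx), r)
    return b_ols, b_rma, r

# Hypothetical log10 mass and log10 density values with appreciable scatter
log_mass = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
log_density = [10.4, 10.9, 8.2, 8.8, 6.1, 6.6]

b_ols, b_rma, r = ols_rma(log_mass, log_density)
print(f"OLS b = {b_ols:.3f}, RMA b = {b_rma:.3f}, r = {r:.3f}")
print(f"OLS b / |r| = {b_ols / abs(r):.3f}")
```

Because |r| cannot exceed 1, the RMA slope is never closer to zero than the OLS slope, and the two diverge as the scatter increases.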

LOWESS (locally weighted scatterplot smoothing) regression has been used to detect the pattern of change in mean slope (Silva & Downing 1995): it should be noted that this, like OLS regression, is a least squares procedure and assumes that there is no error in the predictor variable (Cleveland 1985). The degree of flexibility in a LOWESS smooth is controlled by the tension parameter (Wilkinson 1990) which ranges between 0 and 1 for flexible to stiff lines.


Simulated data

If samples are density-prescribed the shape of the size–abundance relation depends on the range sampled, varying from triangular at small sample ranges to linear at large ranges (Fig. 4).

Figure 4.

LOWESS fits (tension 0.5) to distribution 1 data for densities prescribed above 10^10 and 10^4. Note that when a small sample is taken (densities greater than 10^10) the relation appears to be non-linear and the slope flatter.

Despite the underlying relation being linear in distributions 1–3, the degree of linearity of the slope detected by LOWESS varies with the sample range because rare, larger species are missed, i.e. density-truncated sampling reduces the density range for large species (Fig. 4). Hence triangular profiles produce concave slopes while largely linear relations are obtained at larger sample ranges.

The measured slope also depends on the range sampled (Fig. 4) because at small ranges sampling bias becomes progressively more important. In addition it depends on the estimator used. When samples are density-prescribed OLS slopes become progressively poorer estimates of true slope as the sampled size range decreases, with estimates at small ranges being less than half the true slope (Fig. 5a–e). RMA was a good estimator until the number of data points (species) dropped below about 40, when it too seriously under-estimated slopes. RMA performed better than OLS regression for the non-linear distributions 4 and 5 but still appreciably under-estimated within mode slopes (Fig. 5d,e). If samples are size-prescribed OLS regression was the better estimator (Fig. 5f) for all distributions.
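The contrast between the two estimators under the two sampling procedures can be reproduced with a small simulation. This is a sketch under stated assumptions, not the analysis behind Fig. 5: it uses a uniform mass distribution with an assumed generating relation of log10 density = 12 − log10 mass plus unit-variance scatter, and the function names and thresholds are hypothetical.

```python
import math
import random

def slopes(points):
    """Return (OLS slope, RMA slope) for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    ols = sxy / sxx
    rma = math.copysign(math.sqrt(syy / sxx), sxy)  # sign follows the covariance
    return ols, rma

def simulate(n_species, seed):
    """Uniform log10 mass on 0-12; true slope -1 (intercept 12 is an assumption)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_species):
        m = rng.uniform(0.0, 12.0)
        points.append((m, 12.0 - m + rng.gauss(0.0, 1.0)))
    return points

def density_prescribed(points, min_log_density):
    """Truncate on density: only species at or above the threshold are recorded."""
    return [p for p in points if p[1] >= min_log_density]

def size_prescribed(points, max_log_mass):
    """Truncate on size: only species up to the maximum mass are recorded."""
    return [p for p in points if p[0] <= max_log_mass]

species = simulate(800, seed=3)

print("density-prescribed samples")
for threshold in (4.0, 6.0, 8.0, 10.0):
    sample = density_prescribed(species, threshold)
    ols, rma = slopes(sample)
    print(f"  log density >= {threshold:.0f}: n={len(sample)}, OLS={ols:.2f}, RMA={rma:.2f}")

print("size-prescribed samples")
for max_mass in (3.0, 6.0, 9.0):
    sample = size_prescribed(species, max_mass)
    ols, rma = slopes(sample)
    print(f"  log mass <= {max_mass:.0f}: n={len(sample)}, OLS={ols:.2f}, RMA={rma:.2f}")
```

In runs of this kind the OLS slope should flatten as the density threshold rises while the RMA slope stays nearer −1, and the reverse should hold for size-prescribed samples, where RMA over-steepens at small size ranges.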

Figure 5.

OLS (·) and RMA (○) slopes as a function of mass range for (a–e) density-prescribed samples from distributions 1–5 respectively and (f) for size-prescribed samples from distribution 3. All plots are LOWESS smoothed with tension = 0.4. The triangles on the ordinates indicate the true slope values: note that within mode and across mode true slopes are shown in (d) and (e). 95% confidence limits are not shown for all distributions to maintain clarity in the plots: note however that confidence limits for both estimators are a strong function of range.

The performance of OLS and RMA methods in estimating slopes was summarised for distributions 1–3 by determining the number of estimates that failed to enclose the generating slope (b = −1.00) and by recording which of the two estimates was closer to this value (Table 1). RMA outperformed OLS, though neither method was satisfactory for distribution 2 where both methods frequently failed to enclose the true slope. Note, however, that RMA again did much better than OLS in a random sample of 100 observations from distribution 2. For distributions 1–3 it was only at large sample sizes (when close to 100% of observations were included) that OLS regression was a better estimator than RMA. While this is not a rigorous evaluation of the performance of the two methods it does suggest the general superiority of the RMA method.

Table 1. Summary of the performance of OLS and RMA methods in estimating the slopes of density-prescribed samples. Distribution 2* is a random sample of 100 observations from distribution 2
Distribution | Number of estimates | % of estimates significantly different from true value (OLS) | % of estimates significantly different from true value (RMA) | % of estimates when RMA better

Published data

This section examines published data to test if there is evidence for sampling artefacts affecting the shape and slope estimates of size–abundance relations.

Figure 6 shows the relation between the range of densities and of mass encountered in a variety of published size–abundance relations (Peters & Wassenberg 1983; Terborgh 1983; Damuth 1987; Gaston & Lawton 1988; Morse, Stork & Lawton 1988; Macpherson 1989; Marquet, Navarrete & Castilla 1990; Basset & Kitching 1991; Carrascal & Telleria 1991; Griffiths 1992; Novotny 1992; Blackburn et al. 1993; Cotgreave, Hill & Middleton 1993; Blackburn & Lawton 1994; Strayer 1994; Ebenman et al. 1995; Gregory & Blackburn 1995; Nilsson & Svensson 1995; Silva, Brown & Downing 1997). These distributions were classified by data type (natural assemblages or compiled data) and were subjectively identified as triangular (polygonal) or linear (I omitted four distributions intermediate in shape, i.e. ones I found hard to classify). There were significant differences in the mean ranges of triangular and linear plots for both density and mass data (density F(1,26) = 24.77, P < 0.001; mass F(1,25) = 12.43, P = 0.002), consistent with the notion that triangularity is a sampling artefact. However, there was also a significant association between data type and the shape of the distribution, with linearity being associated with compiled data (χ² = 10.22, P = 0.001, though two of the four cells had fewer than five observations). There were insufficient observations to distinguish between the data type and sampling range hypotheses.

Figure 6.

The relation between the range of densities and mass encountered in published size-abundance data for datasets that were subjectively classified as triangular (·) or linear (○).

Some datasets have sufficient observations to allow estimation of the upper or lower bound and mean slopes. Data in Strayer (1994) allow determination of all three slopes (because part of the lower bound was size-independent the descending limb slope was estimated by piece-wise linear regression (Wilkinson 1990)). The respective slopes (up to log mass = 3, beyond which there are only two data points) were similar to each other (RMA slopes ± approximate 95% C.I. −0.49 ± 0.12, −0.62 ± 0.08 and −0.61 ± 0.51 respectively), suggesting parallel upper and lower bounds to the data. Strayer noted that there were no density estimates for an appreciable number of the species known to occur in Mirror Lake but no evidence of any size-bias was presented. Blackburn et al. (1993) presented mean and upper bound slopes for a number of predominantly local assemblages. The median mean and upper bound RMA slopes for the 10 local assemblages were −1.24 and −1.53 respectively. All figured assemblages were triangular in shape but this appeared due to sampling artefacts (i.e. to species represented by a single individual per species). For compiled mammal data the lower bound RMA slope (−0.82 ± 0.15 95% C.I.) (Silva & Downing 1994) is similar to the mean and upper bound slopes of −0.97 ± 0.05 and −0.99 respectively (Damuth 1987), consistent with the notion that density declines in a broad, linear band with increasing size.

Figure 7 shows OLS slopes as functions of the number of observations and mass range for local assemblages (Brown & Maurer 1986; Griffiths 1986; Marquet, Navarrete & Castilla 1990; Blackburn et al. 1993; Strayer 1994; Thiollay 1994; Brawn, Karr & Nicholls 1995; Silva & Downing 1995; Navarrete & Menge 1997). There was no trend in slope value with sample size but the scatter increased greatly for small samples, and the majority of positive OLS slopes occurred in assemblages with few species and a limited species size range. Multiple regression showed that absolute deviations from the mean slope value of −0.302 were significantly related to both the number of species and the mass range (slope deviations vs. number of species OLS b = −0.011 ± 0.004, P = 0.003; slope deviations vs. range OLS b = 0.070 ± 0.035, P = 0.050, n = 76). This result strongly suggests that positive slopes are sampling artefacts. 22% of the OLS slopes were positive (Blackburn & Gaston (1997) report 28% for their data) but restricting the analysis to assemblages with ≥25 species and a logarithmic size range ≥2 reduces this to 13%. Just 19% of the 126 assemblages analysed here met these criteria, suggesting a strong sampling bias in current data. Only one of the three positive RMA slopes in this restricted dataset was statistically significant compared to 17/21 of the negative slopes. The mean negative RMA slope of the restricted dataset was −1.05 ± 0.14 (95% C.I.) (Fig. 8).

Figure 7.

OLS slopes for assemblage datasets as a function of (a) number of observations and (b) mass range. The regression line in the latter plot is statistically significant (OLS b = −0.127 ± 0.054, P = 0.020).

Figure 8.

Histogram of RMA slopes for the entire data presented in Fig. 7 (unshaded columns) and for datasets with ≥25 observations and a mass range ≥2. Note that the apparent bimodality in slopes is an artefact of the RMA method.

Some data and analyses are consistent with variable slopes within a dataset. LOWESS smoothing of the Mirror Lake data detected only a linear relation: however, Strayer (1994) showed that RMA slopes for all 10 taxa were steeper than that of the overall dataset. Data in Gordoa & Duarte (1992) give an overall RMA slope for the 38 most abundant fish species of −0.46, considerably flatter than the median, individual species, slope of −0.75 (assuming W ∝ L^3). In contrast the mammal data compiled by Fa & Purvis (1997) showed little density change within a size mode and thus correspond to a type 4 distribution.


The results suggest that attention needs to be paid to how data are collected, to the number and size range of species incorporated in the samples, and to the regression method used to avoid the danger of sampling artefacts when presenting and analysing size–abundance data in future.

McArdle (1988) concluded that the RMA method should be used when the error (both natural and measurement) in the X variable was greater than one third the error in the Y variable. He noted that the effects of non-normality and non-homogeneity in these errors had not been investigated. In the analysis presented here truncated (density-prescribed) sampling generates non-normal, heteroscedastic errors in the Y variable. In these circumstances the RMA method is the superior estimator in the majority of instances: a definitive answer to this question will depend on more extensive analysis.

Why should the RMA method be more effective at estimating true slopes in density-prescribed data and OLS be better for size-prescribed data? The reason is most easily seen by examining the shape of the sampled assemblages. In OLS regression the slope is determined by the covariance whereas in RMA it is simply the ratio of the standard deviations of the X and Y variables (Ricker 1973). For density-prescribed data the shape of the relationship (and hence the OLS slope) changes with increasing sample range (Fig. 3a) but the relative sizes of the standard deviations of X and Y variates (and hence the RMA slope) does not change. The converse is true for size-prescribed data (Fig. 3b).

Blackburn & Gaston (1997) questioned the sufficiency of the sampling range hypothesis to explain observed relations because the coefficient of determination (CD) rose and OLS slopes were steeper at small sample ranges (their Fig. 6), contrary to expectation (Currie 1993) when sampling from a linear relation. This pattern could be real, because within mode slopes are steeper than the overall slope (a type 5 distribution), or be an artefact. Polynomial regression failed to detect any non-linear components (data digitized from their Fig. 6). All but two of the data points for their body mass range below three orders of magnitude (that part of the dataset causing the discrepancy from the expected pattern) come from the mammal data of Silva & Downing (1995). The median number of species in these mammal assemblages was 9 and the mass range 1.7 i.e. the statistics are based on very small samples and are well within the range at which sample statistics become unreliable (see Fig. 7).

Blackburn & Gaston (1997) noted differences between the mean CD but not the slopes of mammal vs. bird assemblages. Comparison of mammal (Silva & Downing 1995) with bird (Brown & Maurer 1986) data found similar differences. In contrast to the findings of Blackburn & Gaston (1997), there was no difference between taxa in the size range sampled in bird data from Griffiths (1986), but there were significantly fewer species in mammal than in bird assemblages (means 10.2 and 19.1 respectively). Again sampling artefacts could account for the reported differences.

Whether the slope of the size–abundance relation is flatter for larger species remains unresolved. Silva & Downing's (1995) claim of non-linearity is unproven since it is based on compiled data which are potentially subject to large sampling biases (Smallwood & Schonewald 1996). While LOWESS is an excellent smoother it can be misleading if an inappropriate tension parameter is chosen. For example two lines of differing tension are shown in Fig. 2(c): despite the underlying relation being linear within the largest mode LOWESS generates a levelling off in slope at lower tensions for the largest species. Hence it is possible that the similar pattern illustrated in Fig. 1 of Silva & Downing (1995) is, in part, an artefact of the smoothing procedure.

How realistic are the simulated distributions? Mass distributions of species are uni- or polymodal rather than uniform (e.g. Kirchner et al. 1980; Warwick 1984; Griffiths 1986; Brown & Nicoletto 1991; Blackburn & Lawton 1994). However, with available data it is difficult to distinguish between distributions in which the component slopes are the same as, or differ from, the overall slope though the literature provides some instances of the latter. For example Kozlowski & Weiner (1997) proposed that interspecific allometries in life history characteristics result from natural selection operating on intraspecific allometries for species of varying ecologies. Similarly, Dickie, Kerr & Boudreau (1987) showed that within taxon slopes of production–biomass ratios declined more steeply with body mass than the overall slope: they suggested an overall physiological slope, made up of component slopes where ecological processes were more important. Component, parabolic, relationships in normalized biomass–size spectra (see below) have been suggested to be a consequence of predator–prey interactions (Thiebaux & Dickie 1992, 1993). Griffiths (1992) pointed out that energetic equivalence for organisms consuming the same resource would imply a size–abundance (RMA) slope of −0.8 whereas across trophic levels a slope of −1 would be expected because of inefficiencies in energy transfer (Platt 1985). Since there is a reasonable correlation between size and trophic level in aquatic communities one might expect to find type 4 distributions there.

There are a few instances in which component slopes differ from the overall slope but suitable data to test this are scarce. Normalized biomass–size class plots, common in the aquatic literature, are based on summed densities within logarithmic size classes and therefore approximate the upper bound lines of size–abundance relations (Vidondo et al. 1997). Biomass–size spectra have overall negative slopes of about −1.2, consistent with the findings presented here, and in most instances show systematic deviations from linearity, some size classes being more and others less abundant than expected (e.g. pelagic habitats: Witek & Krajewska-Soltys 1989; Rodriguez, Echevarria & Jimenez-Gomez 1990; Gasol, Guerrero & Pedrós-Alio 1991; Sprules & Stockwell 1995, benthic habitats: Schwinghamer 1981; Hanson, Prepas & Mackay 1989; Poff et al. 1993). In at least some instances, lower than expected abundances correspond to transitions between counting and/or sampling methods (e.g. Schwinghamer 1981; Rodriguez et al. 1990) so that artefacts due to methodology cannot be ruled out. In all of the cited examples, lower than expected abundances coincide with transitions between taxa or other groupings, e.g. between pico-, phyto- and zoo-plankton. This pattern could result from type 4 species distributions since the largest and smallest organisms within taxa are usually less speciose than medium-sized ones. If this is generally true then the differing conclusions reached by previous workers simply reflect different scales of investigation, with size- or taxon-prescribed samples being more likely to detect weak or non-existent relations between size and abundance while cross-taxon (community) samples will find strongly negative slopes. Note that a dome-shaped component slope is consistent with the positive relation between size and abundance noted for the smallest species (Marquet et al. 1995).

Some of the disagreement between workers reflects population versus ecosystem perspectives. The EER argument is only sensible when applied at the resource or ecosystem levels and should therefore not be applied to compiled data or to samples from communities arbitrarily delimited by, for example, taxon. Whether energy does limit abundance over a large mass range is unresolved, though the biomass flux models proposed by aquatic ecologists (Platt 1985; Boudreau, Dickie & Kerr 1991) suggest that it does at the ecosystem level. What is clear is that the overall slope of size–abundance relations of −1.05 indicates that, on average and within assemblages, small species consume more energy than large.
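The energetic reading of the overall slope can be made explicit with a short calculation. It assumes, as in the metabolic argument behind the −0.75 slope, that individual metabolic rate scales as body mass to the power 0.75; the symbols N (density), M (body mass), R (individual metabolic rate), E (population energy use) and b (size–abundance slope) are notation introduced here for illustration.

```latex
\begin{align*}
N &\propto M^{b}, \qquad R \propto M^{0.75} \\
E = N R &\propto M^{\,b + 0.75} \\
b = -0.75 &\;\Rightarrow\; E \propto M^{0} \quad \text{(energetic equivalence)} \\
b = -1.05 &\;\Rightarrow\; E \propto M^{-0.30}
\end{align*}
```

Under this assumption a slope of −1.05 implies that population energy use roughly halves for each tenfold increase in body mass, since 10^−0.30 ≈ 0.5.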


My thanks to Paul Harvey and two referees for helpful comments on the manuscript.

Received 18 September 1997; revision received 21 January 1998