Species abundance distributions: pattern or process?


Author to whom correspondence should be addressed: E-mail: aem1@st-and.ac.uk

Hubbell's (2001) ‘unified neutral theory of biodiversity and biogeography’ raises many intriguing, and sometimes perplexing, questions (Whitfield 2002). It is remarkable that a model that assumes that the demographic properties of individuals are independent of the identity of the species to which those individuals belong can generate distributions of species abundance that replicate, to a considerable degree, those observed in real communities. The hypothesis that species are ecologically identical, at least in terms of their contribution to community and regional diversity, has been considered elsewhere (see, for example, Brown 2001; Condit et al. 2002; Clark & McLachlan 2004). One facet of the theory that has received little critical attention, however, is its set of assumptions regarding the distribution of species abundances over large sweeps of space or time. This is important because Hubbell predicts that species abundances in local communities will reflect, to an extent that depends on migration rates within the regional assemblage, the structure of the metacommunity. Hubbell argues that the diversity of a metacommunity is a function of the mode (and rate) of speciation occurring therein. He proposes two speciation mechanisms – point mutation and fission (see below for details) – which result in different abundance distributions. All of the analytical results presented by Hubbell and colleagues (Hubbell 2001; Volkov et al. 2003) relate to the mutation model of speciation, which predicts that species abundances will be distributed according to Fisher's (Fisher et al. 1943) log-series. For example, in a recent defence of the neutral model, Volkov et al. (2003) state that: ‘Under neutrality at large spatial and temporal scales, Fisher's log-series distribution is the expected steady-state distribution of relative species abundance at the speciation–extinction equilibrium in the metacommunity when the per capita birth and death rates are density independent and the same for all species, and speciation is introduced’. In this paper I therefore revisit Fisher's model and consider its potential as a descriptor of large-scale patterns of species abundance. I also highlight the need for better methods of distinguishing between rival species abundance distributions and the theories that underpin them.

local communities and metacommunities

Hubbell (2001, p. 5) defines a ‘local’ or ‘ecological’ community as a ‘group of trophically similar, sympatric species that actually or potentially compete in a local area for the same or similar resources’. In doing so he offers a more restrictive definition of a community than is usual in investigations of biological diversity; most ecologists measure the diversity of a taxonomically (and geographically) delimited set of organisms, such as forest beetles or pelagic fish, rather than a trophically delimited one (Magurran 2004). However, even though this neutral theory is conceived in the context of a grouping that might otherwise be termed an ‘ecological guild’ or ‘ensemble’ (Fauth et al. 1996), its ability to fit much broader assemblages, such as Amazonian fish communities, which typically contain species occupying a variety of niches (A. E. Magurran, personal observation) suggests robustness against variation in species role. Local communities are embedded in a ‘metacommunity’, which is defined as a regional collection of communities. The metacommunity is the arena in which speciation occurs and the source from which local communities are colonized. The immigration rate, m, is the proportion of individuals in a local community that are replaced by individuals from the metacommunity. When m = 1 the local community is not isolated from the metacommunity and will share the same species abundance distribution (the log-series). As m decreases, the local community becomes more isolated and supports fewer rare species. This has the effect of translating a log-series type distribution in the metacommunity into a log-normal type distribution of abundance in the local community. One of the successes of Hubbell's neutral theory has been to show how shifts in the value of m can generate the sort of negatively skewed log-normal distribution that has so intrigued ecologists (Nee et al. 1991; Harte et al. 1999; Gaston & Blackburn 2000; Magurran & Henderson 2003).
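
The role of m described above can be illustrated with a very small simulation. The following is only a sketch of zero-sum drift with immigration, not Hubbell's own implementation: it assumes a fixed local community of J individuals, one death per time step, and a metacommunity whose abundances never change. The function name, the parameter `steps` and the example metacommunity are all hypothetical.

```python
import random
from collections import Counter
from itertools import accumulate


def simulate_local_community(meta_abundances, J, m, steps, seed=0):
    """Zero-sum drift in a local community of J individuals (illustrative sketch).

    At each step one individual dies. With probability m it is replaced by an
    immigrant drawn from the metacommunity in proportion to abundance;
    otherwise it is replaced by the offspring of another local individual.
    """
    if J < 2:
        raise ValueError("J must be at least 2")
    rng = random.Random(seed)
    species = range(len(meta_abundances))
    cum_weights = list(accumulate(meta_abundances))

    def immigrant():
        return rng.choices(species, cum_weights=cum_weights, k=1)[0]

    # Assumption: the community is founded by J immigrants.
    local = [immigrant() for _ in range(J)]
    for _ in range(steps):
        dead = rng.randrange(J)
        if rng.random() < m:
            local[dead] = immigrant()
        else:
            # Pick a parent among the other J - 1 individuals.
            parent = rng.randrange(J - 1)
            if parent >= dead:
                parent += 1
            local[dead] = local[parent]
    return Counter(local)  # species label -> local abundance


if __name__ == "__main__":
    # Hypothetical metacommunity: 40 species with geometrically declining abundance.
    meta = [round(1000 * 0.9 ** k) + 1 for k in range(40)]
    for m in (1.0, 0.1, 0.001):
        community = simulate_local_community(meta, J=200, m=m, steps=20000)
        print(m, len(community))  # local species richness for each value of m
```

Comparing the richness and the number of rare species across values of m gives a feel for how isolation reshapes the local distribution; a single run with one seed is of course only one realisation of a stochastic process.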

fisher's logarithmic series

Fisher's (Fisher et al. 1943) logarithmic series represented one of the first attempts – preceded only by Motomura (1932) whose work the authors were unaware of – to provide a mathematical description of the relationship between the number of species (S) and the number of individuals (N) in a sample.

The log-series takes the form

$$\alpha x,\ \frac{\alpha x^2}{2},\ \frac{\alpha x^3}{3},\ \ldots,\ \frac{\alpha x^n}{n},$$

where $\alpha x$ is the number of species predicted to have one individual, $\alpha x^2/2$ is the number predicted to have two individuals, and so on. The statistic x, which is estimated by iteration, is usually around 0·99. In other words, x ≈ 1. α, which is an excellent measure of diversity in its own right (see Magurran 2004 for further details), is therefore also an approximate predictor of the number of singleton species (those represented by a single individual) in the assemblage. A key aspect of the log-series distribution is that the singleton class is also the modal class. Species with one individual are more numerous than those with two individuals, doubleton species are more numerous than those with three individuals, and so on as the series progresses.
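
For reference, the series can be tied to the sample totals S and N through the standard log-series identities. The symbol $f_n$, for the expected number of species with n individuals, is introduced here for convenience and is not notation used in the text.

```latex
\begin{align}
f_n &= \frac{\alpha x^n}{n}, \qquad n = 1, 2, 3, \ldots \\
S &= \sum_{n \ge 1} f_n = -\alpha \ln(1 - x) \\
N &= \sum_{n \ge 1} n f_n = \frac{\alpha x}{1 - x} \\
x &= \frac{N}{N + \alpha}
  \quad\Longrightarrow\quad
  S = \alpha \ln\!\left(1 + \frac{N}{\alpha}\right)
\end{align}
```

Because x is close to 1 whenever N is much larger than α, the first term $f_1 = \alpha x$ is close to α, which is why α approximates the number of singletons.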

The log-series provided a compelling fit to the butterfly and moth collections described in Fisher, Corbet & Williams's (1943) original paper. However, it is worth noting that these assemblages were hardly representative communities. Only the first 24 individuals of a species were recorded in Corbet's Malayan butterfly assemblage. As a result, a biased value of N (number of individuals) was used to deduce x and α and the size of the assemblage was underestimated. Light traps, of the sort used to collect Williams's moths at Rothamsted in England, provide a notoriously selective sample of the fauna (Southwood & Henderson 2000). The use of the term ‘a random sample of an animal population’ in the title of Fisher et al.'s (1943) paper is perhaps a little optimistic.

Despite these precedents the log-series has been shown to fit a range of assemblages, particularly those that have a high frequency of rare species and that are not too speciose (May 1975; Magurran 2004). However, the log-series distribution may also arise as a sampling distribution (Taylor 1978). At small sample sizes the log-series and (truncated) log-normal distributions are often indistinguishable (see Figs 2·13 and 2·14 in Magurran 2004). It is only when sample size increases and the ‘veil line’ (Preston 1948) is pulled back to reveal the mode of the distribution that the full log-normal becomes apparent and the log-series ceases to be a good fit. This transition from log-series to log-normal is clearly seen in Williams's (1964) 8-year survey of Rothamsted moths (see also Fig. 2·4 in Hubbell 2001). In a related observation Magurran & Henderson (2003) noted that the distribution of abundance of transient species in an estuarine fish assemblage was log-series in character whereas the core species (the permanent residents) were log-normally distributed.
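
The veil-line argument is usually examined by grouping species into doubling abundance classes (Preston's ‘octaves’) and comparing a full census with a small sample. The sketch below uses a simplified binning in which class k holds abundances from 2^k up to, but not including, 2^(k+1); Preston's original scheme splits boundary species between neighbouring classes, which is not done here. The function names and the example community are hypothetical.

```python
import random
from collections import Counter


def octave_histogram(abundances):
    """Number of species per log2 class; class k holds 2**k <= n < 2**(k + 1)."""
    counts = Counter(n.bit_length() - 1 for n in abundances if n > 0)
    top = max(counts) if counts else -1
    return [counts.get(k, 0) for k in range(top + 1)]


def subsample(abundances, k, seed=0):
    """Draw k individuals without replacement; return abundances of species seen."""
    rng = random.Random(seed)
    individuals = [sp for sp, n in enumerate(abundances) for _ in range(n)]
    drawn = Counter(rng.sample(individuals, k))
    return sorted(drawn.values(), reverse=True)


if __name__ == "__main__":
    # Hypothetical community with an interior mode on the log2 scale.
    species_per_octave = [1, 3, 6, 8, 6, 3, 1]
    community = [2 ** (k + 2) for k, s in enumerate(species_per_octave) for _ in range(s)]
    print(octave_histogram(community))                 # full census
    print(octave_histogram(subsample(community, 60)))  # small sample of 60 individuals
```

In a small sample many species fall into the lowest classes or are missed altogether, so the mode can sit at or near the singleton class even though the full census has an interior mode.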

species abundances in the metacommunity

Metacommunities vary markedly in size and diversity. Benthic nematodes in the deep oceans (>4 km deep), a habitat that extends across some 50% of the Earth's surface, are an example of a large metacommunity (Lambshead & Boucher 2003); the fauna associated with tree-hole microcosms (Yanoviak 1999) is an example of a small one. How likely is it that their species abundances will follow the log-series? Fisher et al.'s (1943) recipe can be used to deduce the frequency of singletons and other rare species in an assemblage of any size. A metacommunity with 60 species and 2000 individuals is expected to have 12 singletons (20% of species) while a metacommunity with 60 000 species and 2 000 000 000 individuals will have 6000 (10%) singleton species. There is also a high proportion of rare species (which I define as species with 10 or fewer individuals) across a range of metacommunity sizes (see Fig. 1). The percentages for the two examples given above are 58% and 29% of species, respectively.
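
The calculation behind figures of this kind can be sketched as follows. The code assumes the standard log-series relations S = α ln(1 + N/α) and x = N/(N + α), and solves for α by bisection; this is one convenient iterative scheme, not necessarily the one used to produce the values quoted here. The function names are hypothetical.

```python
import math


def fisher_alpha(S, N, iterations=200):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection."""
    if not 0 < S < N:
        raise ValueError("need 0 < S < N")

    def predicted_species(alpha):
        # Increasing in alpha, rising from 0 towards N.
        return alpha * math.log1p(N / alpha)

    lo, hi = 1e-12, 1.0
    while predicted_species(hi) < S:  # widen the bracket until it holds the root
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if predicted_species(mid) < S:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def expected_species_counts(S, N, max_n):
    """Expected number of species with 1, 2, ..., max_n individuals."""
    alpha = fisher_alpha(S, N)
    x = N / (N + alpha)
    return [alpha * x ** n / n for n in range(1, max_n + 1)]


if __name__ == "__main__":
    counts = expected_species_counts(S=60, N=2000, max_n=10)
    print(round(counts[0]))  # expected singletons: about 12 for these inputs
```

Summing the first ten entries of the returned list and dividing by S gives the expected share of species with 10 or fewer individuals; small differences from published percentages can arise from rounding conventions.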

Figure 1.

The number of singleton (•, n = 1 individual) and rare (◊, n ≤ 10 individuals) species in relation to overall species richness, expected by a log-series distribution, in various metacommunities. Table 1 of Ricklefs (2003) and Figs 8·17 and 8·18 in Hubbell (2001) were used as a guide to possible metacommunity sizes. These range from 200 to 20 000 000 000 individuals and from 20 to 200 000 species.

empirical evidence of rarity

The above results are a product of the fact that the log-series describes a situation where there are many uncommon species and, as a corollary, a few extremely abundant ones. On first inspection the existence of many singleton species seems reasonable. For example, Novotny & Basset (2000) found that 278 out of the 1050 species of leaf-chewing and sap-sucking insects in a tropical insect community were represented by a single individual, while Scharff et al. (2003) detected 19 singletons and 12 doubletons out of a total of 66 spider species sampled during an inventory of mature beech forest in Denmark. However, sampling intensity has a large impact on the perception of rarity. Longino et al. found that the proportion of unique species (detected in a single sample) of ants in a Costa Rican forest varied from 0·13 to 0·47, depending on sampling method. When data from all sampling techniques were combined, the overall proportion of unique species declined to 0·12 (51 out of 437 species). Longino et al. then examined the status of their 51 unique species. The rarity of 20 of these species was attributed to ‘edge effects’, that is species likely to be abundant at the La Selva biological station but hard to sample, or species known to be common elsewhere but rare in this particular geographical locality. Only six species – the so-called ‘global uniques’ – were found in a single sample and known nowhere else on Earth. Even these are likely to lose their singleton status given more intense sampling. There is no reason to assume that these ants are unusual or that the status of apparent singletons in other taxa will not be challenged by more intensive investigation. Similar conclusions emerge from other comprehensively investigated assemblages. Only 13 out of 217 breeding bird species in Great Britain, an exceptionally well-documented taxon, have 10 or fewer individuals (Gaston & Blackburn 2000). (The proportion of rare species recorded in any assemblage will depend on how that assemblage is defined, and the time period over which it has been sampled; Magurran 2004.)

rarity and persistence

Small populations are known to be vulnerable to extinction. Although a threshold of 50 individuals (Franklin 1980) is often cited, recent work shows that populations may need to be considerably larger than this to ensure persistence (Reed & Bryant 2000). Reed et al. (2003) used population viability analysis to estimate the minimum viable population size of 102 vertebrate species. A median of 5826 adults was required to achieve a 99% probability of persistence for 40 generations. The metapopulation, and therefore the sum of all the potentially reproductive individuals, of a given species is presumably contained within a metacommunity. A log-series distribution at the level of the metacommunity thus implies that a large fraction of the species found there is committed to extinction. Accelerating extinction rates are of real concern (Thomas et al. 2004) but current levels of species loss, even with human assistance, are not on a par with those anticipated by a log-series scenario.

How then are metacommunities likely to be structured? The available evidence suggests that large accumulations of species, at a regional scale, approximate a log-normal distribution, though with certain qualifications (Gaston & Blackburn 2000). For example, fully unveiled distributions often have more rare species than Preston (1948) anticipated. A common difficulty is that large-scale assemblages may have political boundaries (such as the birds of the Czech Republic; Hudec et al. 1995; Gaston & Blackburn 2000) rather than biological ones. However, Gaston & Blackburn (1996) show that the distribution of global population sizes of wildfowl species is indeed approximately log-normal. Whether or not a world species abundance distribution is equivalent to a metacommunity is another issue.

modes of speciation

As noted earlier, a log-series distribution of abundances in the metacommunity is a product of the mutation model of speciation (Hubbell 2001). Indeed this is the only speciation mode that will produce a log-series distribution (Hubbell 2003). This deliberately simple and analytically tractable mechanism assumes that new species arise like point mutations and originate as single individuals. Species formation through polyploidy would be an example. One consequence is that instantaneously produced species may not be sufficiently differentiated from their parents to be recognized as distinct species (Hubbell 2003). It also implies that many lineages go extinct before they become established. Although this might seem to get round the difficulty of an excess of rare species in the metacommunity, it also implies that surveys of local communities vastly underestimate species richness (Ricklefs 2003). The good fit between Hubbell's model and empirical data sets, such as the tree species on Barro Colorado Island in Panama (Volkov et al. 2003), is made using conventional species identities.

Hubbell (2001) discusses an alternative speciation mechanism which he dubs the ‘random fission model’. The model, which is equivalent to allopatric speciation (the most parsimonious mode of species formation; Coyne & Orr 2004), produces new species whose abundances are typically well in excess of one individual. It has the advantage of generating metacommunity species abundance distributions that are approximately log-normal (and hence probably representative of the real world) but suffers the disadvantage of having no analytical solution. Furthermore, Ricklefs (2003) points out that under the random fission model, tropical forests populated by reasonable numbers of tree species ($10^3$–$10^4$) will have unrealistically small metacommunities or unrealistically low speciation rates. A variant is the ‘peripheral isolate’ mode of speciation (Hubbell & Lake 2002; Hubbell 2003). This produces a relative abundance distribution that falls between the high-diversity, log-normal like, random fission and the low-diversity, log-series like, point mutation. These models differ in their assumptions about the size of the founding population. In reality huge variance in the size of emergent species is likely.

questions of fit

One thorny issue in ecology is how best to discriminate between competing solutions to a problem. For example, there are at least five different explanations for the excess of rare species (log-left or negative skew) often found in log-normal distributions (Sugihara 1980; Tokeshi 1996; Harte et al. 1999; Hubbell 2001; Magurran & Henderson 2003). It is relatively easy to demonstrate that something does not match up, as I have tried to do for the notion that metacommunities have a log-series distribution and as Ricklefs (2003) has shown for the patterns of species life spans and community size predicted by Hubbell's model. Goodness-of-fit tests are, however, a blunt instrument when it comes to distinguishing between rival species abundance distributions. Indeed one of the frustrations of model fitting is that several different distributions, founded on markedly different assumptions, will often be good descriptors of the same data set (Magurran 2004) and that different goodness-of-fit tests lead to different conclusions about the appropriateness of a model. Volkov et al. (2003) point out that, according to a $\chi^2$ test, the neutral model fits the distribution of tree species on Barro Colorado marginally better than the log-normal model (McGill 2003). Small differences in sample size, measurement error (Harte 2003) and the manner in which class intervals are selected (Tokeshi 1993; Colwell & Coddington 1994; Magurran 2004) can change the outcome of goodness-of-fit tests. The usual advice is to replicate sampling (Pielou 1975; Wilson 1988; Tokeshi 1993) since one snapshot of community structure may give a biased image of the outcome of many probabilistic events. But replicated inventories of large communities, whose recorded species abundance distributions are often in any case an amalgam of samples collected over space and time, are rarely feasible. There is a pressing need for better methods of assessing fit.
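
The sensitivity to class intervals mentioned above is easy to demonstrate. The sketch below computes a Pearson chi-square statistic for the same observed counts against log-series expectations under two different binning schemes. The observed counts, the values of α and x, and the class boundaries are all invented for illustration; none of them comes from a real data set or from the studies cited.

```python
def log_series(alpha, x, max_n):
    """Expected number of species with 1..max_n individuals under a log-series."""
    return [alpha * x ** n / n for n in range(1, max_n + 1)]


def bin_by_upper_bounds(per_abundance, upper_bounds):
    """Sum per-abundance values (index 0 = abundance 1) into classes.

    Each class runs from just above the previous upper bound to its own
    upper bound, inclusive; bounds must be ascending.
    """
    binned, start = [], 0
    for upper in upper_bounds:
        binned.append(sum(per_abundance[start:upper]))
        start = upper
    return binned


def chi_square(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Hypothetical species counts for abundances 1..16 (illustration only).
    observed = [12, 6, 4, 3, 2, 2, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1]
    expected = log_series(alpha=11.6, x=0.994, max_n=16)  # assumed parameters

    for bounds in ([1, 2, 4, 8, 16], [4, 8, 12, 16]):
        o = bin_by_upper_bounds(observed, bounds)
        e = bin_by_upper_bounds(expected, bounds)
        print(bounds, chi_square(o, e))
```

The statistic, and the degrees of freedom against which it would be judged (number of classes, minus one, minus the number of fitted parameters), both change with the binning, so the same data can appear to fit more or less well depending on an essentially arbitrary choice.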

As Volkov et al. (2003, p. 1037) remark, ‘fitting exercises in and of themselves do not constitute an adequate test of underlying theory’. Failure to reject a null hypothesis does not necessarily vindicate the assumptions upon which it is based; the putative mechanisms involved must be demonstrated to be correct beyond all reasonable doubt. It is disappointing that despite a long history of attempts to explain the relative abundances of species in terms of niche apportionment (see Magurran 2004 for a review) there have been few direct experimental tests of the processes involved. Correlations between expected species abundance distributions and empirical ones only tell part of the story. A recent study, which presented experimental evidence for apparent competition in a tropical forest food web (Morris et al. 2004), exemplifies the power of direct tests of ecological theory. Hubbell's neutral model has been invaluable in forcing ecologists to look again at how communities are assembled. The challenge now is to devise better tests of the assumptions and predictions of null models in order to answer fundamental questions they raise about the relative abundance of species (May 1986; Hubbell 2003). Testing the relationship between dispersal limitation, resource use and species abundances in some tractable local communities might not be a bad place to start.


I am grateful to the referees for their helpful comments.