*Ecology Letters* (2010) 13: 627–642

# Towards a unification of unified theories of biodiversity

E-mail: mail@brianmcgill.org

## Abstract

A unified theory in science is a theory that shows a common underlying set of rules that regulate processes previously thought to be distinct. Unified theories have been important in physics, including the unification of electricity and magnetism and the unification of the electromagnetic with the weak nuclear force. Surprisingly, ecology, specifically the subfields of biodiversity and macroecology, also possesses not one but at least six unified theories. This is problematic as only one unified theory is desirable. Superficially, the six unified theories seem very different. However, I show that all six theories use the same three rules or assertions to describe a stochastic geometry of biodiversity. The three rules are: (1) intraspecifically individuals are clumped together; (2) interspecifically global or regional abundance varies according to a hollow curve distribution; and (3) interspecifically individuals are placed without regard to individuals of other species. These three rules appear sufficient to explain local species abundance distributions, species–area relationships, decay of similarity with distance and possibly other patterns of biodiversity. This provides a unification of the unified theories. I explore implications of this unified theory for future research.

## Unified theories

A unified theory is a theory that ties together branches formerly seen as separate and unconnected. Physics has unified the electromagnetic and weak forces producing two Nobel prizes in the 20th century, unified the electric and magnetic forces (by Maxwell in the late 19th century), and unified motion on earth and in the heavens (Newton in the 17th century). Einstein spent the last 20 years of his life unsuccessfully trying to unify gravity with the electromagnetic forces. Similarly chemistry deifies Mendeleev for the unifying role of the periodic table. One might think ecology was too immature or too complex to support unified theories of its own.

On the contrary, in the last 10 years ecology, specifically macroecology, has produced not one, but at least half a dozen different unified theories of biodiversity. These theories broadly unify ideas of area, abundance and richness to produce from a few underlying principles such seemingly distinct patterns as the species–area curve and the species abundance distribution. With one exception (neutral theory), these unified theories have arrived with relatively little fanfare. Unlike in physics, unification has not been heralded as one of the highest achievements in ecology. No doubt this is in part due to certain sociological tendencies in ecology which fail to appreciate theory in general and especially theory that greatly simplifies the natural world (Kingsland 1995; Simberloff 2004). But it is also undeniably a problem that there is not one, but at least six different unified theories. And the theories seem extremely different from each other. They start with radically different assumptions. One starts with the niche while another explicitly rejects the niche. Some are at scales of 100s of m while others are at scales of 1000s of km. The math ranges from birth–death processes to the recursive nature of fractals to Gaussian bell-curves. But all share the property of being highly stochastic (probably further working against their broad acceptance).

Here I show that the differences are superficial and that at a deep level all of the unified theories share a common set of rules and approaches. Once one navigates through superficialities, there is a single unified theory of biodiversity that starts with a few simple rules or assertions that in turn can explain disparate features of ecology. In short, there is a unification of all the unified theories.

## Review of unified theories of biodiversity

I review six different unified theories. Two of these were first presented, at least in partial form, in the 1970s, but then remained relatively dormant (at least as unified theories), only to then receive major reworkings, improvement, testing and attention in the last decade. The other four all were first developed and published within the last decade. Thus the last decade has provided a rapid burst of unified theories of biodiversity. My delimitation of what is or is not a unified theory is somewhat arbitrary, but I have tried here to focus on theories that reproduce at least two major previously known patterns of macroecology (usually the species–area relationship and the local species abundance distribution), and I have tended to group together conceptually related efforts. I have also deemphasized theories (Harte *et al.* 1999, 2005) where the authors have themselves moved on to newer theories (Harte *et al.* 2008). I now briefly summarize each of these six theories, proceeding in chronological order. Also see Tables 1 and 2 and Fig. 1.

Unified theory | Key references | Input parameters | Math | Spatial model | Assertion 1: Intraspecific spatial | Assertion 2: Interspecific variation in global abundance | Assertion 3: Interspecific independence |
---|---|---|---|---|---|---|---|
Continuum | (Gauch & Whittaker 1972; Hengeveld et al. 1979; Coleman et al. 1982; McGill & Collins 2003) | S, N_{max}∼, σ∼, A | Probability theory (analytical) | Density surface (aggregate) | Peak and tail | N_{max} and σ are sampled from a distribution (input) | Each peak is located according to a Poisson process (random with respect to other species) |
Neutral | (Caswell 1976; Hubbell & Foster 1986; Bell 2000, 2001; Hubbell 2001) | S or Θ, N, m | Birth–death (analytical) + Lattice (simulation) | Lattice (individual) | Dispersal-limited | Metacommunity processes create logseries regional abundances (derived) | Each lattice cell can be populated by any species |
Metapopulation | (Hanski & Gyllenberg 1997) | S, A∼, w∼ | Levins metapopulation differential equation (analytical) | Probability present\|A (aggregate) | Incidence | Density of species w_{i} is sampled from a loguniform distribution (input) | Presence of one species on a patch is modelled independently of any other species, allowing simple summation |
Generalized Fractal | (Harte et al. 1999; Green et al. 2003; Storch et al. 2008) | S, l_{i}∼, D_{i}∼ | Hierarchical division (simulation) | Spatially explicit (aggregate) | Hierarchically clumped | Each species is modelled with four nested levels of multiplication of uniformly distributed random variables, approaching a central-limit-theorem-like situation (derived) | Each species is modelled and placed in space independently of other species, allowing simple summation |
Clustered Poisson | (Plotkin & Muller-Landau 2002; Plotkin et al. 2002; Morlon et al. 2008) | S, N, aggregation parameters | Neyman–Scott process | Point process (individual) | Explicitly aggregated | Regional species abundance distribution is specified (various used) (input) | Each species is its own clustered Poisson process without reference to other species, allowing simple summation |
MaxEnt | (Harte et al. 2005; Pueyo et al. 2007; Harte 2008) | S, A, N, E | MaxEnt (analytical) | Not spatially explicit (aggregate) | Derived exponential P | MaxEnt plus a constraint on total abundance gives a logseries SAD (derived) | P_{i}(n\|A) is independent of other species, allowing simple summation |

Unified theory | Global SAD | Local SAD | SAR | Abundance–occupancy | Decay of similarity | Other | Test data | Test scale |
---|---|---|---|---|---|---|---|---|

Continuum | X | X | X | X | Elevational transects Breeding bird survey | 10s km 1000s km | ||

Neutral | X | X | X | X | BCI | 100s m | ||

Metapopulation | X | X | Moths on offshore islands; birds in habitat patches | 1s km 10s km? | ||||

Poisson cluster | X | X | BCI, Pasoh | 100s m | ||||

Fractal | X | X | X | Box Dim, P(A) | BCI, Czech birds | 100s m 10s km | ||

MaxEnt | X | X | X | X | EAR, P(A) | BCI | 100s m |

### Continuum theory

Gauch & Whittaker (1972) presented a model that was intended to capture all the rules observed in Whittaker’s gradient studies (Whittaker 1952, 1960; Whittaker & Niering 1965) in such a way that realistic communities across gradients could be simulated. Nine empirically derived rules were given, of which I repeat four here: (1) the abundance of a species along a linear environmental gradient is roughly a Gaussian bell-curve in shape; (2) the locations in space of the peaks are distributed randomly (with some caveats about dominant species); (3) the maximum observed abundance across species is distributed log-uniform (i.e. geometric) in small assemblages and log-normal in large assemblages; and (4) the widths of the bell curves are normally distributed. Gauch and Whittaker showed that these rules led to realistic communities by the test of visually inspecting the simulated communities along a gradient and getting realistic ordination results, but they did not explicitly link this model to macroecological patterns. Hengeveld made the connection that this model could explain local species abundance distributions and derived analytical results (Hengeveld *et al.* 1979; Hengeveld & Haeck 1981). McGill & Collins (2003) ended up in the same place but started from the literature on Gaussian bell-curves of abundance across two-dimensional species ranges (Brown *et al.* 1995, 1996) rather than along an environmental transect. They independently derived a model based on these same four rules (except that the width of the bell curves was lognormal). They showed that these assumptions produce realistically shaped species–area curves, species abundance distributions, decay of similarity with distance, and abundance–occupancy correlations. They derived these conclusions analytically (with help from Allen & White 2003) and showed using Monte Carlo simulation that the results were robust to minor variations in the assumptions. They also used data from the North American Breeding Bird Survey and showed that, without curve fitting, the model explained the species abundance distribution and species–area relationship well.
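The four rules lend themselves to a small simulation. The sketch below is only an illustration under stated assumptions, not the procedure of Gauch & Whittaker (1972) or McGill & Collins (2003): the gradient is one-dimensional, all distribution parameters are arbitrary, the lognormal variant of rule 4 is used, and a species counts as "present" in a segment when its expected abundance reaches a hypothetical `threshold` at some sampled point. The function names (`simulate_continuum`, `abundance`, `richness`) are invented for this example.

```python
import math
import random


def simulate_continuum(n_species=200, gradient_length=100.0, seed=1):
    """Return one (peak, n_max, width) tuple per species (all parameters assumed)."""
    rng = random.Random(seed)
    species = []
    for _ in range(n_species):
        peak = rng.uniform(0.0, gradient_length)        # rule 2: peak locations are random
        n_max = rng.lognormvariate(1.0, 1.5)            # rule 3: lognormal peak abundance
        width = rng.lognormvariate(math.log(5.0), 0.5)  # rule 4, lognormal variant
        species.append((peak, n_max, width))
    return species


def abundance(sp, x):
    """Rule 1: Gaussian bell-curve of abundance along the gradient."""
    peak, n_max, width = sp
    return n_max * math.exp(-0.5 * ((x - peak) / width) ** 2)


def richness(species, x_lo, x_hi, threshold=1.0, step=1.0):
    """Count species whose expected abundance reaches `threshold` at a sampled point."""
    n_steps = int(round((x_hi - x_lo) / step))
    xs = [x_lo + k * step for k in range(n_steps + 1)]
    return sum(any(abundance(sp, x) >= threshold for x in xs) for sp in species)


species = simulate_continuum()
# Nested segments centred on the middle of the gradient give a crude species-area curve.
for half_width in (5, 25, 50):
    print(2 * half_width, richness(species, 50 - half_width, 50 + half_width))
```

Because the segments are nested and sampled on the same grid, richness can only stay level or rise as the segment widens, which is the qualitative behaviour a species–area curve should have.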

### Neutral theory

Caswell (1976) suggested that neutral molecular evolution models could be applied to abundances of species in ecological communities. He showed that such an approach produced realistic species abundance distributions and species–area relationships but that other patterns such as change in evenness over succession were not realistically produced. Hubbell (1979) and Hubbell & Foster (1986) also suggested neutral drift might be the dominant factor in structuring communities. In a series of papers 20 years later, Bell and Hubbell (Bell 2000, 2001; Hubbell 2001) proposed a neutral theory of biodiversity that assumed: (1) neutral demographics (per capita birth and death rates are identical across species); (2) neutral dispersal limitation (dispersal distance is identical across species and in the form of diffusion); (3) immigration from an external metacommunity to prevent drift to fixation; (4) explicit absence of differential response to environmental heterogeneity; and (5) absence of species interactions. Hubbell (2001) also added a neutral evolution component. These five assumptions were able to accurately reproduce many of the patterns held central in community ecology such as species abundance distributions, species–area curves, and decay of similarity with distance. Later work has shown that this model can accurately predict clumping of individuals in space (Chave & Leigh 2002; Houchmandzadeh 2008).
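A minimal sketch of the demographic core of such a model is given below. It is a simplification under stated assumptions rather than the model of Bell or Hubbell: the local community is non-spatial and of fixed size `J`, the metacommunity is held fixed with an assumed hollow-curve abundance vector instead of arising from speciation and drift, and the function name and default values are invented for the example.

```python
import random
from collections import Counter


def neutral_local_community(J=200, m=0.1, n_meta_species=50, steps=20000, seed=1):
    """Zero-sum drift with immigration; returns species -> local abundance."""
    rng = random.Random(seed)
    meta_ids = list(range(n_meta_species))
    # Assumed metacommunity: relative abundances proportional to 1/rank.
    meta_weights = [1.0 / (i + 1) for i in meta_ids]
    community = rng.choices(meta_ids, weights=meta_weights, k=J)
    for _ in range(steps):
        dead = rng.randrange(J)  # every individual is equally likely to die
        if rng.random() < m:
            # replacement immigrates from the metacommunity
            community[dead] = rng.choices(meta_ids, weights=meta_weights)[0]
        else:
            # replacement is the offspring of a random local individual
            # (for simplicity the parent may be the individual that just died)
            community[dead] = community[rng.randrange(J)]
    return Counter(community)


counts = neutral_local_community()
print(sorted(counts.values(), reverse=True))  # local abundances, most common first
```

No species-specific parameter appears anywhere in the update rule, which is what "neutral" means here; any differences in local abundance arise from drift and from the assumed metacommunity weights.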

### Metapopulation

Hanski & Gyllenberg (1997) started from metapopulation theory, which studies patchy networks (or island networks). They used the standard differential equation model (Levins 1969) of patch occupancy, *p*_{ij}, for species *i* on patch *j*: d*p*_{ij}/d*t* = *C*_{i}(*t*)(1 − *p*_{ij}(*t*)) − *μ*_{ij}*p*_{ij}(*t*). The two parameters *C* and *μ* are functions of island area, *A*_{j}, species density, *w*_{i}, and island population *K*_{ij} = *w*_{i}*A*_{j}, where *A*_{j} and *w*_{i} are each assumed to be log-uniform distributed. In the case of a mainland/island model, *C*_{i}(*t*) = *c*_{1}*w*_{i} and *μ*_{ij} = *c*_{2}/*K*_{ij}. A slightly more complicated form of *C*_{i} is used in archipelagos with no mainland. These assumptions produce a Michaelis–Menten-like incidence curve *p*_{ij} (the probability that a species is present on an island given species abundance and island area). The expected number of species on island *j* is then (assuming independence between species) obtained by summing over species *i*: *E*(*S*_{j}) = Σ_{i} *p*_{ij}(*A*_{j},*w*_{i}). This gives empirically realistic island species–area curves (distinct from the nested species–area curves of the previous two theories). Similarly, what they called a DA curve, giving the expected patch occupancy of species *i* (*P*_{i}), can be obtained by summing over patches *j*: *E*(*P*_{i}) = Σ_{j} *p*_{ij}(*A*_{j},*w*_{i}). The links to species–area relationships were analysed in more detail later (Ovaskainen & Hanski 2003).
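For the mainland/island case, the equilibrium incidence follows from setting the time derivative to zero, which gives *p* = *C*/(*C* + *μ*). The sketch below evaluates that expression and sums it over species to get an island species–area curve. It is an illustration only: the constants `c1` and `c2`, the range of densities and the function names are assumptions made for the example, not values from Hanski & Gyllenberg (1997).

```python
import math
import random


def incidence(area, density, c1=1.0, c2=1.0):
    """Equilibrium occupancy p = C / (C + mu) for one species on one island."""
    colonization = c1 * density         # C_i = c1 * w_i
    extinction = c2 / (density * area)  # mu_ij = c2 / K_ij, with K_ij = w_i * A_j
    return colonization / (colonization + extinction)


def expected_richness(area, densities, c1=1.0, c2=1.0):
    """Expected species number on an island: incidences summed over species."""
    return sum(incidence(area, w, c1, c2) for w in densities)


def log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(1)
densities = [log_uniform(rng, 0.01, 10.0) for _ in range(100)]  # assumed species pool
for area in (0.1, 1.0, 10.0, 100.0):
    print(area, round(expected_richness(area, densities), 2))
```

Substituting the two rates gives *p* = *c*_{1}*w*²*A*/(*c*_{1}*w*²*A* + *c*_{2}), which saturates with area in the Michaelis–Menten-like manner described above.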

### Fractal

A fractal object is one that is self-similar, i.e. it maintains basic geometric measurements across spatial scales. A power-law (*S* = *cA*^{z}) form to a species–area distribution (something often found empirically to be at least approximately true) suggests that individuals are distributed in a self-similar fashion. Harte *et al.* (1999) demonstrated that assuming a fractal distribution of individuals can produce a number of macroecological patterns including not just the aforementioned power-law species–area distribution, but a distribution of occupancies (specifically, the probability, *P*(*n*, *A*|*A*_{0}) that *n* individuals are observed in an area *A* given a total area *A*_{0}), a species abundance distribution derived from the occupancy distribution (by taking *A* = the area occupied by one individual), and a new pattern known as the endemics–area relationship (giving the number of species found only in the given area). A debate ensued about whether this theory assumed community-level self-similarity or species-level self-similarity. Lennon *et al.* (2002) pointed out that species-level self-similarity does not produce a power-law species–area relationship. An empirical test (Green *et al.* 2003) showed that the community-level self-similar assumption did not produce realistic communities (due to the assumption that all species are the same), but the species-level self-similarity model worked fairly well, failing only in slightly overestimating the degree of spatial aggregation (clumping of individuals). This suggested that the occupancy function depended on *N*_{0}, the global abundance of a species (*P*(*n*|*N*_{0},*A*,*A*_{0})), and now required a distribution of global abundances. Harte *et al.* (2005) proposed an alternative theory, known as HEAP, that incorporated such variation between species and was no longer self-similar but produced an appropriate degree of clumping.

One problem with assuming self-similarity is that species distributions do not appear self-similar (Condit *et al.* 2000; Hartley *et al.* 2004). Borda-de-Agua *et al.* (2002) developed a model using multifractals, in which the fractal dimension changes systematically with scale. This model was able to produce species–area relationships, species abundance distributions (and how they change with scale), and range–size-abundance relationships. Storch *et al.* (2008) proposed a model based on generalized fractals. Generalized fractals suggest combining patches of species presence in a hierarchical fashion, in a manner not dissimilar to fractals, but allowing key parameters such as the number of clumps and proportion of area to vary from scale to scale. This model also produces realistic patterns including species–area relationships, probability of abundance, *P*_{i}(*n*|*A*,*A*_{0}), which in the limit of small area gives species abundance distributions, and distribution of fractal dimensions. Because Harte and colleagues have replaced their fractal model with a MaxEnt model (Harte 2008) and because the generalized fractal model fits empirical data better, I focus hereafter on the generalized fractal model.
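The idea of hierarchical division with scale-dependent parameters can be sketched as follows. This is a loose illustration, not the algorithm of Storch *et al.* (2008): it assumes square cells split into four at each level, and a per-level retention probability (`keep_probs`, a hypothetical parameter) stands in for the scale-varying "proportion of area". A strictly self-similar pattern would correspond to using the same value at every level.

```python
import random


def hierarchical_occupancy(keep_probs, seed=1):
    """Return the set of occupied (column, row) cells at the finest level."""
    rng = random.Random(seed)
    occupied = {(0, 0)}  # the whole study area at the coarsest level
    for keep in keep_probs:
        finer = set()
        for i, j in occupied:
            children = [(2 * i + a, 2 * j + b) for a in (0, 1) for b in (0, 1)]
            kept = [c for c in children if rng.random() < keep]
            if not kept:
                # an occupied cell must contain at least one occupied sub-cell
                kept = [rng.choice(children)]
            finer.update(kept)
        occupied = finer
    return occupied


def coarsen(cells):
    """Occupancy one level up: a parent is occupied if any child is."""
    return {(i // 2, j // 2) for i, j in cells}


cells = hierarchical_occupancy([0.9, 0.6, 0.4, 0.7])  # one probability per level
level = cells
while len(level) > 1:
    print(len(level))  # number of occupied cells, from the finest level upward
    level = coarsen(level)
```

Counting occupied cells at each level is the raw material for a box-counting dimension; letting `keep_probs` differ between levels is what makes the pattern hierarchical without being self-similar.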

### Clustered Poisson

Starting from the empirical observation that individuals of a given species are nearly always spatially aggregated (clumped) on scales from m^{2} to hectares (He & LaFrankie 1997; Condit *et al.* 2000; Plotkin *et al.* 2000), several authors have used the clustered-Poisson (aka Neyman–Scott) point process as a model. A Poisson-cluster model is one of the simplest and best-known point processes (stochastic models of the location of points in space). A Poisson-cluster process (Stoyan & Stoyan 1994) first places ‘mother’ points at random locations (a Poisson process), then places multiple ‘daughter’ points centred around the mother points. Parameters to the model include the number (intensity) of mother points, the number (or parameters for the probability distribution of the number) of daughter points around each mother point, and the distance and fashion of placing daughter points around the mother points (e.g. a bivariate Gaussian density with distance to inflexion given). Plotkin *et al.* (2000) showed that such models are good fits to empirical data in tropical forest tree plots and lead to species–area relationships that fit the data well. Plotkin & Muller-Landau (2002) later added the assumption of a global species abundance distribution to produce a model of decay of Sørensen similarity with distance. Morlon *et al.* (2008) provide a highly general, scale-explicit version where a species abundance distribution and a Poisson-cluster model of spatial distribution produce a decay-of-similarity with distance curve.
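The mother/daughter construction can be written down in a few lines. The sketch below is one simple variant under stated assumptions (a unit-square plot, Poisson numbers of mothers and daughters, Gaussian displacement with standard deviation `sigma`, and daughters falling outside the plot discarded); the parameter values and function names are illustrative and are not taken from any of the cited studies.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson random integer by Knuth's multiplication method (fine for modest lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def neyman_scott(mean_mothers, mean_daughters, sigma, seed=1):
    """Points of one species in the unit square under a Poisson-cluster process."""
    rng = random.Random(seed)
    points = []
    for _ in range(poisson_draw(rng, mean_mothers)):
        mx, my = rng.random(), rng.random()  # mother placed completely at random
        for _ in range(poisson_draw(rng, mean_daughters)):
            x, y = rng.gauss(mx, sigma), rng.gauss(my, sigma)
            if 0.0 <= x < 1.0 and 0.0 <= y < 1.0:  # discard daughters outside the plot
                points.append((x, y))
    return points


def quadrat_counts(points, n_side):
    """Number of points in each cell of an n_side x n_side grid."""
    counts = [0] * (n_side * n_side)
    for x, y in points:
        counts[int(y * n_side) * n_side + int(x * n_side)] += 1
    return counts


def variance_to_mean(counts):
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return variance / mean


points = neyman_scott(mean_mothers=20, mean_daughters=30, sigma=0.02)
print(len(points), variance_to_mean(quadrat_counts(points, 10)))
```

For a homogeneous Poisson process the variance-to-mean ratio of quadrat counts is close to one; values well above one indicate clumping.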

### MaxEnt

Maximum entropy is a generic tool for predicting a probability distribution subject to certain minimal knowledge about the distribution (such as its mean) (McGill 2006; McGill & Nekola in press). Maximum entropy is justified based on a minimum information logic – it starts with a no-information prior (often all species are equally abundant) and adds, in a technical sense, as little information as possible subject to the constraints. It uses a standard optimization technique (Lagrange multipliers) and produces a Gibbs probability distribution with parameters that are the solved Lagrange multipliers. Often this distribution collapses to more familiar distributions such as the exponential or normal distributions. Pueyo *et al.* (2007) showed that the tool of maximum entropy can produce realistic logseries (Fisher *et al.* 1943) species abundance distributions with very minimal input (specifically a constraint on mean abundance and a prior of 1/*n*). Harte *et al.* (2008) produced a unified theory making multiple predictions. The starting assumptions (aside from the use of maximum entropy) involve equal abundance priors, a constraint on mean abundance and a constraint on mean energy. The exact constraints are critical (more or fewer constraints produce very different results). This system can be solved using fairly standard MaxEnt techniques. The central result is a joint distribution for energy and abundance. When summed over all energy states, this produces a logseries distribution for abundance depending only on *S* and *N*. It also produces the function *P*_{i}(*n*|*A*,*A*_{0},*N*_{0},*S*_{0}) giving the probability of observing *n* individuals of a species in area *A* (given the number of individuals, *N*_{0}, and species, *S*_{0}, in some larger study area, *A*_{0}). With the function *P* in hand, species–area relationships and endemic–area relationships can be easily derived by summing across the *P*_{i}.
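The step from a constraint on mean abundance to a logseries can be made explicit. The derivation below is a textbook-style sketch, not a reproduction of the calculations in Pueyo *et al.* (2007) or Harte *et al.* (2008): it assumes abundances *n* = 1, 2, …, a prior *q*_{n}, and a single constraint on the mean; the symbols *λ*_{0}, *λ*_{1}, *Z* and *x* are introduced here for illustration.

```latex
% Maximize relative entropy with respect to a prior q_n, n = 1, 2, ...
\begin{align}
  H[p] &= -\sum_{n \ge 1} p_n \ln\frac{p_n}{q_n},
  \qquad \text{subject to} \quad
  \sum_{n \ge 1} p_n = 1, \quad \sum_{n \ge 1} n\,p_n = \bar{n}. \\
  % Lagrangian with multipliers \lambda_0 (normalization) and \lambda_1 (mean)
  \mathcal{L} &= H[p] - \lambda_0\Big(\sum_{n} p_n - 1\Big)
                      - \lambda_1\Big(\sum_{n} n\,p_n - \bar{n}\Big), \\
  \frac{\partial \mathcal{L}}{\partial p_n}
    &= -\ln\frac{p_n}{q_n} - 1 - \lambda_0 - \lambda_1 n = 0
    \;\;\Longrightarrow\;\;
    p_n = \frac{q_n\,e^{-\lambda_1 n}}{Z},
    \qquad Z = \sum_{k \ge 1} q_k\,e^{-\lambda_1 k}. \\
  % Prior q_n proportional to 1/n, writing x = e^{-\lambda_1} with 0 < x < 1
  p_n &= \frac{1}{-\ln(1 - x)}\,\frac{x^{n}}{n}
    \qquad \text{(logseries)}. \\
  % Flat prior q_n = const
  p_n &= (1 - x)\,x^{\,n-1}
    \qquad \text{(geometric, i.e. the discrete exponential)}.
\end{align}
```

In both cases *x* is fixed by the mean constraint. The only difference between the logseries and the geometric outcome is the prior, which illustrates why the choice of prior and constraints matters so much to the result.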

## Different mathematical languages

A major impediment to identifying a minimally sufficient set of rules to specify the stochastic geometry of biodiversity is the fact that the six unified theories reviewed above all use extremely different mathematical languages and tackle extremely different spatial scales. At the most basic split, four models work with population densities in an area (continuum, metapopulation, fractal and MaxEnt, along with the spatially implicit version of neutral theory), while two model the precise spatial location of individuals (spatially explicit neutral and clustered Poisson). Even for the models that deal with aggregated individuals (i.e. densities), the spatial scales vary widely, with the continuum model covering entire species ranges (McGill & Collins 2003) or entire elevational transects (Gauch & Whittaker 1972) and explicitly including climatic variation. In contrast, the MaxEnt model (Harte 2008) is likely intended for scales close to the individual and uses test data similar to those used by the individual-based models. The metapopulation and fractal models fall in between.

Similarly, the clustered Poisson, continuum and generalized fractal models are spatially explicit (precise spatial locations are given to objects and distances between objects can be derived). The MaxEnt model is spatially implicit. The neutral model has both spatially implicit (the analytical solution) and explicit (the lattice simulation) versions, and the metapopulation model is intermediate between being spatially explicit and implicit. More generally, the conception of space and its mathematical description differ among all six models – this is summarized in Fig. 1 and Tables 1 and 3.

Unified theory | Null intraspecific spatial (no clumping) | Alternative (clumped) |
---|---|---|
Continuum | Equally abundant everywhere (flat abundance surfaces) | Gaussian variation across space (rare most places, common in one area) |
Neutral | Well-mixed (infinite dispersal) | Dispersal-limited |
Metapopulation | Present in all patches | Incidence function (logistic curve for presence/absence vs. patch area) – mixed presence/absence |
Fractal | Equally abundant everywhere (flat abundance surfaces) or strictly fractal | Generalized fractal (not self-similar but hierarchical division) or multifractal |
Point process | Poisson | Clustered-Poisson (Neyman–Scott process) |
MaxEnt | P_{i}(n\|A) is Poisson | P_{i}(n\|A) is exponential |

Finally, although this should in principle be irrelevant, the six different models use fundamentally different branches of math (Table 1). The metapopulation model starts from Levins’ (1969) colonization–extinction differential equation. The neutral theory uses birth–death processes (Hubbell 2001). The fractal (Harte *et al.* 1999) and generalized fractal (Storch *et al.* 2008) models use recursive equations or simulations, respectively. The clustered Poisson model uses point processes. And the MaxEnt (Harte 2008) and continuum models (McGill & Collins 2003) use probability theory in a fairly general fashion (with MaxEnt also using Lagrange multipliers).

## Distributions of organisms = stochastic geometry

These six types of unified theory have largely been perceived as entirely distinct. After all, how could a model based on such distinct mechanisms as niches (continuum theory), neutrality and MaxEnt have anything in common? But in fact, these six theories have the commonality that they are all exercises in what a mathematician would call multitype stochastic geometry. Stochastic geometry is the study of objects placed stochastically in space (Stoyan & Stoyan 1994). The multitype qualifier indicates that the objects not only have a location (and in some cases a size/shape) but also have a type which in ecology corresponds to different species.

It should not be surprising that these unified theories have this common thread of locating typed organisms in space. In the real world ecologists go into the field (*in situ* studies) and place down boxes (quadrats) in different configurations and count the number and type (species) found within the box. In the real world this leads to data giving rise to all of the patterns addressed by unified theories of biodiversity such as species abundance distributions (SAD), species–area relationships (SAR), decay of similarity with distance, endemics area relationships, etc.

All six unified models are doing this exact same process in a modelling fashion. First, the model places organisms down in space according to some rules. This creates an exact analogue of the real world where organisms are spread out spatially and identified to species. Then boxes are drawn in different fashions according to which pattern is reproduced (i.e. SAD, SAR, etc). In some cases these analyses are *in papyro* (pseudo-Latin for on paper, meaning by analytical formula) and in other cases the analyses are done *in silico* (i.e. in a computer via a Monte Carlo simulation). Presumably if we can find the minimally sufficient set of rules such that the *in papyro* or *in silico* analyses match the *in situ* (field-based real world) analyses to accurately reproduce the macroecological patterns of biodiversity, we will have achieved a useful description of rules governing nature. This is the central goal of this paper.

A major challenge to the acceptance of this approach (witness the dormancy of both continuum and neutral theory for over 25 years after first being introduced) is that these unified models are inherently stochastic. Traditionally in ecology, stochasticity has been treated as noise that is inherently uninteresting. Indeed most null hypotheses are stochastic (e.g. two means differ by less than the 95% bounds of a *t*-distribution, random reshuffling of individuals) whereas most explanatory theories in ecology have long been seen primarily as deterministic. The defining models in ecology such as the Lotka–Volterra model, resource competition, and optimal foraging have all been 100% deterministic. But arguably scientific fields increasingly use stochastic modelling techniques as the discipline matures. For example, physics moved from the deterministic (essentially differential equation) world of Newtonian mechanics (glorified by Descartes’ hypothetical watchmaker) to the increasingly probabilistic world of statistical mechanics and quantum mechanics. In these worlds, scientists can only make probabilistic statements. Unfortunately, ecology has not yet made this transition and stochastic models seem very unfamiliar to most ecologists. Most ecologists receive much more training in the differential equation tools common to population dynamics than in the various forms of probability theory (e.g. the birth–death processes of neutral theory or the MaxEnt machinery). Many ecologists find the idea of explanatory stochastic theories deeply disturbing, but it may be a necessary paradigm shift.

Indeed, stochastic geometry models in ecology, and especially in biodiversity and macroecology, have become increasingly common. There are several such models of species–area relationships where ranges of varying size are given a position in space and then SARs are calculated (Coleman 1981; Leitner & Rosenzweig 1997; Maurer 1999; Allen & White 2003). An alternative approach is to start with sampling from a species abundance distribution and build collector’s curves, which can be equated to SARs with an assumption of a constant number of individuals per unit area (Arrhenius 1921; He & Legendre 1996; Ugland *et al.* 2003). Green & Ostling (2003) have produced endemics–area relationships using similar principles. A similar approach has produced decay of similarity with distance (Plotkin & Muller-Landau 2002; Morlon *et al.* 2008). This same paradigm has produced the mid-domain effect to explain the latitudinal gradient in species richness (Colwell & Hurtt 1994; Colwell & Lees 2000). The key innovation of the six unified theories relative to these approaches is not the use of stochastic geometry, but only the derivation of multiple patterns from the given stochastic geometry.
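The collector's-curve route from a species abundance distribution to a species–area relationship can be illustrated with the standard rarefaction expectation: if *n* individuals are drawn at random without replacement from a pool of *N* individuals, species *i* (with *N*_{i} individuals) is missed with probability C(*N* − *N*_{i}, *n*)/C(*N*, *n*). The sketch below assumes that sampling scheme and a hypothetical abundance vector; it is not claimed to be the exact formula used in any of the papers cited above.

```python
from math import comb


def expected_richness(abundances, n):
    """Expected species number in a random sample of n individuals (no replacement)."""
    total = sum(abundances)
    return sum(1.0 - comb(total - a, n) / comb(total, n) for a in abundances)


# Hypothetical hollow-curve pool: a few common species and many rare ones.
pool = [120, 60, 30, 15, 8, 4, 2, 2, 1, 1, 1, 1]
for n in (1, 10, 50, 100, sum(pool)):
    print(n, round(expected_richness(pool, n), 2))
```

With a constant number of individuals per unit area, the sample size *n* is proportional to area, so this curve can be read as a species–area relationship under completely random placement of individuals.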

## Minimally sufficient rules for the stochastic geometry of biodiversity

To date, the differences in scale, biological assumptions and mathematical language have tended to obscure any possible similarities between the distinct unified theories. Indeed several authors have suggested the only commonality is the fact that they are unified theories and some have gone to great pains to draw distinctions between the theories (Harte *et al.* 2005; Harte 2008; Storch *et al.* 2008), although certain structural similarities have been recognized (Harte *et al.* 2005; Storch *et al.* 2008). I have already suggested that all six unified theories also share the fact of using stochastic geometry.

However, I here make a strong claim that all six models implicitly or explicitly share three key assertions and that these three key assertions (in some cases invoked as assumptions and in some cases derived from assumptions) alone represent the minimally sufficient set of rules for describing the stochastic geometry of biodiversity. These three rules then lead inexorably to key patterns in biodiversity such as local species–abundance distributions, species–area relationships, decay of similarity with distance, abundance occupancy correlations and others (Fig. 2).

The three assertions or rules are:

1. Individuals are spatially clumped within a species.
2. Abundance between species at a regional or global scale varies drastically and is roughly hollow curve in distribution.
3. Individuals of different species can be treated as independent and placed without regard to other species.

The right hand side of Table 1 shows how these three assertions are formulated in each of the six theories. Further commentary on each of these assertions follows. McGill & Collins (2003) also earlier identified these three principles as the key assumptions. To advance the field, these assertions need to be falsifiable with alternative options clearly available (Platt 1964; Lakatos 1978). Table 4 summarizes each of these three assertions and gives an obvious alternative possibility.

Question | Null | Alternative |
---|---|---|
1. Spatial arrangement of intraspecific individuals | Random (Poisson) or even (uniform) | Clustered |
2. Species similarity | All species have equal abundance | Species differ strongly in global/regional abundance (some form of hollow curve) |
3. Spatial correlation between species | There is no correlation between species | Interspecific correlation is positive or negative |
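To make the three assertions concrete, the sketch below builds a toy landscape from them and then "draws boxes" on it. Everything specific is an assumption chosen for illustration: the unit-square landscape with wrap-around edges, the lognormal regional abundances, the three Gaussian clusters per species, and the function names. It is not an implementation of any one of the six theories.

```python
import random


def place_species(rng, abundance, n_clusters=3, sigma=0.05):
    """Assertion 1: individuals of a species are scattered around a few cluster centres."""
    centres = [(rng.random(), rng.random()) for _ in range(n_clusters)]
    points = []
    for _ in range(abundance):
        cx, cy = rng.choice(centres)
        # wrap around the unit square so that no individuals are lost at the edges
        points.append((rng.gauss(cx, sigma) % 1.0, rng.gauss(cy, sigma) % 1.0))
    return points


def simulate_landscape(n_species=100, seed=1):
    rng = random.Random(seed)
    landscape = {}
    for sp in range(n_species):
        # Assertion 2: regional abundances follow a hollow curve (lognormal assumed)
        abundance = max(1, int(rng.lognormvariate(3.0, 1.5)))
        # Assertion 3: each species is placed without reference to any other species
        landscape[sp] = place_species(rng, abundance)
    return landscape


def local_abundances(landscape, side):
    """Abundances of the species found in a centred square box of the given side."""
    lo, hi = 0.5 - side / 2.0, 0.5 + side / 2.0
    counts = [sum(lo <= x < hi and lo <= y < hi for x, y in pts)
              for pts in landscape.values()]
    return sorted((c for c in counts if c > 0), reverse=True)


def richness_in_box(landscape, side):
    return len(local_abundances(landscape, side))


landscape = simulate_landscape()
for side in (0.1, 0.2, 0.4, 0.8):
    print(side * side, richness_in_box(landscape, side))  # area, species richness
print(local_abundances(landscape, 0.2))                   # a local abundance distribution
```

The same simulated landscape yields both a nested species–area curve and a local species abundance distribution, which is the sense in which a single stochastic geometry can underlie several patterns at once.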

### Assertion 1 – intraspecific individuals clumped

Probably the single most important feature of all six models is that individuals within a species are spatially aggregated or clumped (Table 1, third column from the right). This commonality was noted earlier (Storch *et al.* 2008) but only as a launching point for an entirely new model. In three of the models, the clumping assertion is an explicit assumption (i.e. the clumped Poisson process, continuum model and the fractal model). In the other three models this assertion is derived. Specifically, neutral theory assumes dispersal limitation, which leads to clumping; metapopulation theory assumes populations in a patch are either at abundance 0 or density *w* (with nothing in between), which is a form of clumping; and MaxEnt produces an exponential form for *P*(*n*|*A*,*A*_{0},*N*_{0}) which is much more strongly clumped than a Poisson distribution (more cases of *n* = 0 and more cases of large *n*). Table 3 summarizes how each model specifies clumping and contrasts this with an alternative non-clumped possibility. The empirical evidence for making this assertion is reasonably strong, although more work is needed. A number of recent studies at the scale of individuals have shown that such intraspecific clumping occurs (He & LaFrankie 1997; Condit *et al.* 2000; Plotkin *et al.* 2000; Conlisk *et al.* 2009). At very large spatial scales individuals are also clumped: abundance surfaces across space tend to show a small, very high abundance peak and a large area of low abundance (Gauch & Whittaker 1972; Brown *et al.* 1995; McGill in revision). Although the mathematical language of clumped individuals (Ripley’s K, Condit’s Ω) sounds very different from the language of abundance surfaces, in the end both describe a propensity for individuals to be spatially clumped. Indeed several authors have recently begun improving earlier models which assumed complete spatial randomness (e.g. Coleman 1981) by explicitly incorporating clumping (e.g. He & Legendre 1996; Conlisk *et al.* 2009) and have shown that such refinements lead to improved fits to empirical data.
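The contrast between clumping and complete spatial randomness can be made concrete with a small simulation. The sketch below places the same number of individuals on a unit square either uniformly at random or scattered around a few parent points, then counts individuals per grid cell. The parent-point scheme, the function names and the values of `n_parents` and `sigma` are assumptions chosen for illustration; they are not taken from any of the six theories.

```python
import random
from statistics import mean, pvariance

def place_random(n, rng):
    """Complete spatial randomness: n points uniform on the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def place_clumped(n, rng, n_parents=5, sigma=0.03):
    """Clumped placement: each point is scattered around a random parent.

    Coordinates wrap around (torus) so no individuals are lost at the edges.
    """
    parents = [(rng.random(), rng.random()) for _ in range(n_parents)]
    points = []
    for _ in range(n):
        px, py = rng.choice(parents)
        points.append(((px + rng.gauss(0, sigma)) % 1.0,
                       (py + rng.gauss(0, sigma)) % 1.0))
    return points

def quadrat_counts(points, k=10):
    """Count individuals in each cell of a k x k grid."""
    counts = [0] * (k * k)
    for x, y in points:
        col = min(int(x * k), k - 1)   # guard against a coordinate of exactly 1.0
        row = min(int(y * k), k - 1)
        counts[row * k + col] += 1
    return counts

def summarize(counts):
    """Return (fraction of empty cells, variance-to-mean ratio, largest count)."""
    return counts.count(0) / len(counts), pvariance(counts) / mean(counts), max(counts)

rng = random.Random(1)
random_summary = summarize(quadrat_counts(place_random(500, rng)))
clumped_summary = summarize(quadrat_counts(place_clumped(500, rng)))
print("random :", random_summary)
print("clumped:", clumped_summary)
```

Under these assumptions the clumped pattern should show more empty cells, a larger maximum count and a variance-to-mean ratio well above the value of about 1 expected for a Poisson pattern, which is the "more *n* = 0 and more *n* large" signature described above.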

### Assertion 2 – interspecific abundance varies

In all of these models it is necessary to create variability in the global or regional pool abundance of species (Table 1, next to last column). This has been shown most strongly in the fractal model where early models assumed similarly abundant species (Harte *et al.* 1999) which was rejected by empirical data (Green *et al.* 2003) leading to later explicit incorporation of variation in abundance between species (Harte *et al.* 2005). These abundances are invariably distributed with some hollow curve shape (McGill *et al.* 2007) in which there are many rare and a few common species. Although most measures of interspecific abundance are local, it is well documented that a hollow curve at a global/regional scale is empirically justified (Nee *et al.* 1991; Gregory 2000; Hubbell 2001; McGill & Collins 2003). In some models, like the continuum, metapopulation and Poisson cluster, a specific distribution of regional abundances is assumed explicitly as an input to the model. In other models, the hollow curve distribution of regional abundances is derived from other assumptions (the speciation/drift to extinction balance in neutral theory, the constraint on mean abundance in MaxEnt, or the repeated multiplication of fractional box sizes across hierarchical levels in the generalized fractal model leading to a central limit theorem like process). It is interesting to note that these models require no differences between the species to successfully reproduce biodiversity patterns except the variation in abundance. Thus other traits of species such as body size or life history that are presumably highly relevant to some aspects of ecology do not appear important for driving biodiversity patterns except for how they influence abundance.
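A hollow curve is easy to recognize numerically: most species fall below the mean abundance and a small fraction of species holds most of the individuals. The sketch below draws a hypothetical regional pool from a lognormal distribution, which is only one of several candidate forms; the choice of distribution, the parameter values and the function names are assumptions for illustration.

```python
import random
from statistics import mean, median

def regional_abundances(n_species, rng, mu=3.0, sigma=2.0):
    """Draw integer abundances (at least 1) from a lognormal distribution."""
    return [max(1, round(rng.lognormvariate(mu, sigma))) for _ in range(n_species)]

def hollow_curve_summary(abundances):
    """Summaries that expose the 'many rare, few common' shape."""
    ranked = sorted(abundances, reverse=True)
    top = ranked[: max(1, len(ranked) // 10)]      # the most abundant 10% of species
    avg = mean(ranked)
    return {
        "mean": avg,
        "median": median(ranked),
        "share_below_mean": sum(n < avg for n in ranked) / len(ranked),
        "top10_share": sum(top) / sum(ranked),
    }

rng = random.Random(42)
abund = regional_abundances(200, rng)
summary = hollow_curve_summary(abund)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

A mean far above the median and a large share of individuals concentrated in the top tenth of species are the signatures to look for; an even-abundance pool would show neither.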

### Assertion 3 – interspecific spatial arrangement is independent

All six models treat the spatial location of different species as completely unrelated to each other (Table 1, last column). This in turn makes the math simpler as it makes the probabilities of species occurrences independent, and allows for simple summing across species to derive multispecies patterns such as richness. In non-spatially explicit models there is no spatial arrangement of species and this assertion may not be strictly necessary (Green & Ostling 2003). After decades of assuming species interactions are central to ecology, this assertion is unpalatable for many. However, to date models assuming no spatial interactions have been very successful at making predictions about macroecological biodiversity patterns that match empirical data. Indeed, a few recent empirical studies lend support to the assertion. Veech (2006) found that pairwise correlations of abundance across space were most commonly zero, with some positive correlations also found. Hoagland & Collins (1997) also found that 24 of 42 communities showed no correlation in locations of peak abundance and the rest showed a positive (clumped) correlation. A recent paper examining correlations of abundance across time likewise found that most correlations were zero or positive (Houlahan *et al.* 2007). The existence of some positive interactions has several interpretations. They could indicate predation (although one would expect matching negative interactions), or they could indicate mutualism (although most people would not expect specific pairs of species in these studies to be mutualists and a weaker non-species specific facilitation would not produce these results). A third explanation, the one adopted by Houlahan and colleagues, is that some points in space (first two studies) or time (third study) are inherently more favourable (benign) to most species, resulting in higher abundances across several species at that point, leading to detection of clumping.
However, the presence of some weak positive clumping does not appear to break the models (explicitly tested in McGill & Collins 2003). Also note that it would be incorrect to interpret any success of this assertion as rejecting the importance of competition, predation, mutualism and other species interactions. Independence may be more a consequence of statistical arguments. If one starts with a community of 30 species, then there are 30 × (30 − 1)/2 = 435 possible pairwise interactions. If we assume that each species interacts strongly with three other species in a symmetric fashion then there are only (30 × 3)/2 = 45 strong interactions, i.e. only about 10% of all possible interactions are strong. Thus in communities of many species, pairwise interactions may on average be quite weak, despite the existence of some strong interactions (Paine 1988; Wootton 1997). This appears sufficient for this assertion of independence to become accurately predictive.
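The counting argument generalizes directly. With *S* species and *k* strong partners per species, the strong fraction is (*S* × *k*/2) / (*S* × (*S* − 1)/2) = *k*/(*S* − 1), so it shrinks as richness grows. The sketch below reproduces the 30-species example and extends it to other richness values; holding *k* fixed at three as *S* changes is an assumption made here for illustration, and the function name is hypothetical.

```python
def strong_interaction_fraction(n_species, partners_per_species):
    """Fraction of all species pairs that interact strongly (symmetric links)."""
    possible = n_species * (n_species - 1) // 2        # all unordered pairs
    strong = n_species * partners_per_species / 2      # each link is shared by two species
    return strong / possible

for s in (10, 30, 100, 300):
    print(s, round(strong_interaction_fraction(s, 3), 3))
```

For 30 species this returns 45/435, the roughly 10% figure in the text.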

It is interesting to note (Table 4) that of the three assertions, one (independence between species) would fit our *a priori* null hypothesis and seems relatively uninformative about biology, while the other two (clumping within species, variability in abundance between species) are rejections of the obvious nulls and appear to represent significant underlying biology. It is also worth commenting on the box labelled ‘Antecedent assumptions’ in Fig. 2. Some theories (continuum, clustered point process) start with assumptions that exactly match the three assertions identified here. Others (neutral theory, fractal theory and MaxEnt theory) start with different assumptions (the antecedent assumptions) and derive the three assertions presented here. Thus it must be emphasized that while the three assertions highlighted here are sufficient to produce a stochastic geometry theory of biodiversity, and represent a minimal set in the sense that removing any one of the assertions will cause the theory to fail, neutral, MaxEnt and fractal theories can also produce the same results. Could one of those sets of assumptions be more minimal? It is hard to say. Much has been made by various authors (Hubbell 2001; Volkov *et al.* 2003; Harte *et al.* 2005; Storch *et al.* 2008) of numbers of parameters, strength of assumptions and numbers of predictions to justify claims of superiority. However, many of the models have various numbers of hidden parameters (e.g. is the assumption of MaxEnt or a Gaussian bell-curve one parameter?), and there is not even agreement on exactly how many quantitative parameters the neutral model contains (Nee & Stone 2003; Volkov *et al.* 2003; McGill *et al.* 2006). But it is probably moot to try to choose one model as superior to the others. Indeed the main argument of this paper is that the models are essentially a single model with different mathematical representations.
In the end the theories will probably be judged on success at prediction and stimulation of new research rather than parsimony. Moreover, it is hard to imagine how neutral or MaxEnt theory could create realistic stochastic geometry without somehow reproducing the three minimally sufficient assertions identified here. One clear benefit of the three assumptions used here is that they make strong biological statements that can be directly tested and studied (see Box 1). But other theories have advantages too, such as the predictions over time of neutral theory or (paradoxically) the relative lack of biological inputs to MaxEnt.

## Predictions and testing

The above three assertions are in general adequate, when worked through the various mathematical methods of the six different unified theories, to predict multiple patterns that are commonly observed in nature (Table 2). Although the math differs greatly, the conceptual, geometric process of producing the basic patterns of biodiversity is the same (Fig. 3 and the steps identified in the legend of Fig. 3). All six theories produce species–area relationships. Four of the six have produced a local species abundance distribution (SAD), and the remaining two (clustered Poisson point process and metapopulation) could probably be used to produce local SADs with a little effort. Three of the models derive the global SAD from other assumptions (neutral, generalized fractal, MaxEnt) while the other three (continuum, metapopulation, clustered Poisson) make it an explicit assumption (but all agree that a hollow curve-shaped global SAD is a key step in predicting the stochastic geometry). Three theories produce explicit decay of similarity with distance predictions (continuum, neutral and clustered Poisson). The other three models presumably could derive such curves as well. Three theories (continuum, metapopulation and MaxEnt) derive the positive correlation between abundance and occupancy (or range size and occupancy in the case of the continuum) and the other three probably could as well. The bottom line is that once the stochastic geometry has been realistically produced we can in principle derive any basic macroecological pattern using *in papyro* or *in silico* methods that exactly match the *in situ* methods used to collect the analogous empirical data. At a minimum this should always be possible in a computer simulation. What has perhaps proven surprising is in how many different mathematical languages/spatial descriptions and for how many different patterns it has proved possible to do this analytically.
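The claim that a computer simulation can always recover the patterns can be sketched in a few lines. The code below is a minimal *in silico* version of the three assertions, not an implementation of any one of the six theories: regional abundances are lognormal (assertion 2), each species is scattered around a few parent points of its own (assertion 1), and every species is placed without reference to any other (assertion 3). Nested squares are then sampled to read off a species–area relationship and a local SAD. All parameter values and function names are assumptions for illustration.

```python
import random

def simulate_community(n_species, rng, mu=2.0, sigma=1.5, spread=0.05):
    """Return {species_id: [(x, y), ...]} on the unit square."""
    community = {}
    for sp in range(n_species):
        # Assertion 2: hollow-curve regional abundance (lognormal assumed here).
        abundance = max(1, round(rng.lognormvariate(mu, sigma)))
        # Assertion 3: parents are drawn independently for every species.
        parents = [(rng.random(), rng.random()) for _ in range(rng.randint(1, 3))]
        points = []
        for _ in range(abundance):
            # Assertion 1: individuals cluster around a parent (wrapping at edges).
            px, py = rng.choice(parents)
            points.append(((px + rng.gauss(0, spread)) % 1.0,
                           (py + rng.gauss(0, spread)) % 1.0))
        community[sp] = points
    return community

def richness_in_square(community, side):
    """Number of species with at least one individual in [0, side] x [0, side]."""
    return sum(
        any(x <= side and y <= side for x, y in points)
        for points in community.values()
    )

def local_abundances(community, side):
    """Abundances of the species present in the square (a local SAD)."""
    counts = (sum(x <= side and y <= side for x, y in pts)
              for pts in community.values())
    return sorted((c for c in counts if c > 0), reverse=True)

rng = random.Random(7)
community = simulate_community(100, rng)
sides = [0.05, 0.1, 0.2, 0.4, 0.8, 1.0]
sar = [(side ** 2, richness_in_square(community, side)) for side in sides]
for area, richness in sar:
    print(f"area={area:.4f}  species={richness}")
print("local SAD at side 0.4:", local_abundances(community, 0.4))
```

Because the squares are nested, richness can only stay level or rise with area, and the full square contains every species in the pool. Other patterns, such as decay of similarity with distance, could be measured on the same simulated map by comparing pairs of quadrats.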

Simply producing curves of an appropriate shape is a weak test. Elsewhere (McGill 2003a) I have called this the lowest possible test or a Level I test. All of the published unified theories have gone beyond this level though, using empirical data to parameterize their model, and then demonstrating good fit not just in shape but in slope, intercept, etc., of the predicted curves to the empirical data (Table 4, two rightmost columns and Fig. 4). In other words, they curve fit the predicted functional forms to the data. Although most authors do not report *r*^{2} values, the visual fits demonstrated are in most cases impressive. I called such curve-fitting tests Level II tests. Stronger tests are possible and desirable (e.g. fitting empirical data with *a priori* parameters or predicting dynamics over time or predicting previously unknown patterns; see McGill 2003a). At least three theories produce such Level III tests (McGill & Collins 2003; Harte *et al.* 2008; Storch *et al.* 2008). Moreover, unified theories by definition make many simultaneous predictions, which if they prove true has to count as a strong test, even if individual predictions are weak; Rosenzweig described this as the dipswitch test, where many weak (binary) predictions are unlikely to align correctly by chance (Rosenzweig & Abramsky 1997; McGill *et al.* 2007).
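The force of the dipswitch argument is simple arithmetic. If each binary prediction had a 50% chance of being right by luck and the predictions were independent (both are simplifying assumptions made here, and the function name is hypothetical), the chance that all of them align falls off geometrically:

```python
def chance_all_correct(n_predictions, p_each=0.5):
    """Probability that n independent predictions are all right by luck alone."""
    return p_each ** n_predictions

for n in (1, 3, 5, 10):
    print(n, chance_all_correct(n))
```

With ten such predictions the chance of a fully correct set is 0.5^10, about one in a thousand. Correlated predictions would weaken the test, so the independence assumption matters.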

## Limitations and scope of applicability

Although I hope this identification of minimally sufficient rules and similarities between formerly distinct theories represents a useful step forward, it is clear that it is currently incomplete and represents an intermediate point along the path (Boxes 1 and 2). There is probably more left undone than done. Most noticeably there is not a unified set of equations that covers all scales (Box 2). It is also important to be clear about the limits of applicability, beyond which this theory does not apply. First, although implicit in much of the discussion, it is perhaps important to reiterate that these minimally sufficient rules lead to predictions about biodiversity and macroecology. They do not lead to predictions about any other branch of ecology such as physiological ecology, behavioural ecology or even population ecology.

Second, the discussion so far has been quite vague about which taxa and how broad a group of species it applies to. Hubbell’s version of neutral theory was built on a zero-sum assumption (Hubbell 2001; but see Etienne *et al.* 2007) which he interpreted to mean that the theory applied only to a single guild or group of organisms at one trophic level directly competing with each other. He later relaxed this assumption applying neutral theory to all birds in Britain which clearly contains multiple trophic levels. I am unaware of explicit statements of scope for the other unified theories. From first principles, the theory proposed herein would apply to any group of organisms that fit the identified minimally sufficient rules or assertions. I know of no studies suggesting that the strong propensity to clump disappears in any group of organisms but clumping has been primarily studied in plants. Similarly, if the statement of spatial independence between species holds for closely related organisms, one would expect it to also apply to more distantly related organisms. Probably the most constraining assertion is the hollow curve distribution of abundances. Only taxonomic extents meeting this constraint would be addressed by this unification. However, the hollow curve species abundance distribution has, in practice, been measured across very diverse groups such as all birds (Gaston & Blackburn 2000; McGill & Collins 2003), all fish (Winemiller 1990) or even across phyla as in all zooplankton or all marine invertebrates (Ugland *et al.* 2007). Thus, the assertions and the predictions discussed here would be presumed to apply to nearly any community of any taxonomic extent pending further study of the generality of the assertions.

The timescale of this theory is very similar to that of the original theory of island biogeography. Namely it is a dynamic equilibrium. Thus it makes predictions about all points in time without being specific about the time trajectory by being vague about species identity. However, the theory presented here uses *S* and *N* as inputs. So it is clear that predictions would change over situations and timescales where *S* and *N* are changing. I perceive the lack of statements about trajectories over time to be one of the larger limitations of the current theory. I have not been too precise about the definition of community covered by the theory, but it does not appear to matter. The spatial area being modelled is precise, the time period is any time over which the input conditions (*S* & *N*) are constant, and the species involved can be pretty much any set of interest. In this way the community studied here is not so different from past definitions of community (Fauth *et al.* 1996).

## Conclusion

The central goal of this paper has been to see if there was a commonality across all six unified models to produce a minimally sufficient set of rules to successfully describe the stochastic geometry of biodiversity patterns in the real world. I identified three assertions or rules (intraspecific clumping, interspecific variation in global abundance, and interspecific spatial independence) that either explicitly (as assumptions) or implicitly (as results) are central to all six theories. This strongly points to these three assertions as a minimally sufficient set of rules to produce a unified stochastic geometry theory of biodiversity. Conceptually this stochastic geometry can then be used to derive any biodiversity pattern of interest that depends only on species and the spatial structure and abundance of organisms. Aside from the importance of having a single unified theory from a theoretical perspective, we can treat the progress towards a unified theory of unified theories of biogeography as a filter for distinguishing interesting from uninteresting future research directions (Boxes 1 and 2). Perhaps biodiversity ecology is beginning to have a strong unified theory to serve as a central organizing paradigm.

**Box 1** What this theory tells us about what we do know, what we don’t know, and what we need to know

### What we know

1. **The processes driving local species abundance distributions, species–area relationships, and decay of similarity with distance.** All six of the unified theories are successful in explaining the first two of these well known macroecological patterns and several explain the third as well. And as shown here, all six do it in more or less the same way. Local species abundance distributions arise from sampling from clumped spatial distributions (sometimes in the clump, usually not in the clump) overlaid with global variation in abundance. Species–area curves and decay of similarity derive from random placement of species with many small ranges and a few large ranges. We don’t need to continue producing two to three new theories explaining species abundance distributions per year (McGill *et al.* 2007), but we probably will.

### What we don't know

2. **How does clumping change across scales?** Are species more clumped or less clumped at larger scales (e.g. He & LaFrankie 1997; Plotkin & Muller-Landau 2002)? Can we quantify the nature of this variation?
3. **How general are clumped distributions beyond plants and at large spatial scales?** Clumping has been studied almost entirely in plants and almost entirely at scales of 100s–1000s m, although birds do appear to be clumped at the scale of their geographic ranges (McGill & Collins 2003).
4. **What processes cause most species to show clumping?** Presumably it is some mixture of dispersal limitation (not necessarily neutral) and clumping of underlying environmental factors, with current evidence giving a nod to environment being stronger (Gilbert & Lechowicz 2004; Jones *et al.* 2008). A powerful, predictive theory of how neutral dispersal limitation affects clumping exists (Houchmandzadeh 2008), but equivalent theories for non-neutral dispersal limitation or environment are lacking.
5. **How general is the hollow curve global abundance distribution?** All attempts known to me to measure global abundance distributions show a hollow curve, but there are probably fewer than a dozen such attempts.
6. **What drives the variation of global abundance?** It seems probable that global species abundance distributions must derive from evolutionary processes, although they may also emerge as limit theorems of local processes (Šizling *et al.* 2009). The attempt to relate species traits to global abundance (a more logical agenda than relating species traits to local abundances) has had little success to date (Murray *et al.* 2002; White *et al.* 2007), perhaps due to the complex interplay of forces involved (McGill 2008).
7. **To what degree and at what spatial and taxonomic scales are species spatially independent and why?** Of the three assertions, this has been the least studied (the three studies cited in the main text are the only attempts I know of to measure this). It likely depends on scale (Wiens 1989; Russell *et al.* 2006).
8. **What are the ramifications of the spatial non-independence of species?** The assertion with the most contrary evidence to date is that of interspecific spatial independence, where a solid minority of species show interspecific clumping. This appears not to break the theory. Why, and how much clumping can be tolerated?

### What we need to know

9. ***S* and *N* are always inputs. What drives these?** This paper suggests that the central unanswered question is what determines *S* and *N*. Despite my calling the unified theory a theory of biodiversity, in every case the species richness, *S*, and number of individuals, *N*, are inputs to the model rather than predictions (nb: neutral theory uses *θ* as an input but *θ* is not directly measurable and is highly correlated with *S*: McGill 2003b). To date the greatest success in the study of these factors has been empirical (i.e. looking for correlations with environmental variables), where factors like productivity, climate, and altitude seem important (Mittelbach *et al.* 2001; Hurlbert 2004). Arguably one consequence of the unified unified theory is a strong indication that one of, if not the, central focus of future biodiversity research needs to be directed towards mechanistic explanations of *S* and *N*.

**Box 2** Moving further towards a useful unified unified theory

1. **Can we develop a general mathematical machinery?** Can we find a generic mathematical machinery that efficiently captures the three core assertions and allows the derivation of multiple predictions (Table 2) which can be tested against data at multiple scales? Having mathematical equations will allow us to: (1) make additional predictions (such as those called for in point 3 below) and (2) make precise quantitative predictions that are subject to more robust testing. Although the six different unified theories make qualitatively similar predictions (or one of them would be falsified by data), they do differ in specific detail. Perhaps most extreme are the predictions about species–area relationships, which range from negative to positive second derivatives (Fig. 4). To date, probably the most promising general approach for developing equations in this unified context has been the *P*(*n*|*A*,*A*_{0},*N*_{0}) idea found in several unified theories (Harte *et al.* 1999, 2005, 2008; Storch *et al.* 2008) as well as several simpler theories (e.g. He *et al.* 2002). By changing *A* this approach makes statements ranging from individuals to large areas, so by summing across species the derivation of SADs and SARs is trivial. However, *P*(*n*|*A*,*A*_{0},*N*_{0}) is not spatially explicit, making derivation of patterns like decay of similarity with distance difficult, and *P*(*n*|*A*,*A*_{0},*N*_{0}) was inspired by theory without much empirical precedent, so we know little about its true empirical patterns and it will require enormous amounts of data to fill this in. The sampling language of neutral theory (Alonso & McKane 2004; Etienne 2005), which is independent of the neutrality assumption, might also be a possibility. Multifractals (Borda-de-Agua *et al.* 2002) are another possibility. Or we may need something completely new.
2. **Can we describe a unified model that works across scales?** As highlighted in Table 2, although very few of the unified theories are explicit about the scales they operate at, it becomes clear from the different empirical datasets used that different theories are targeted with different spatial scales in mind. The fact that some theories deal with individuals and some deal only with densities per unit area also suggests this. In fact the neutral, MaxEnt, and clustered Poisson theories seem targeted at smaller scales of a few thousands of individuals and 100s of meters, the metapopulation and generalized fractal theories seem targeted at intermediate scales, not dealing with individuals but targeting 10s of kms, and the continuum theory seems targeted at macroscales (elevational gradients and continents). This suggests several research questions. At the simplest level, can we paste the models at different scales together to produce an ‘all-scales’ model? This is suggested in Fig. 4, where no one unified theory produces the triphasic species–area relationship but the MaxEnt and continuum models combined successfully reproduce the entire range from scales of m^{2} to continents. At a more profound level, can we develop a single model and mathematical machinery that can span this range of scales? One key feature of such a model will be an ability to go mathematically from locations of individuals to population density (abundance surfaces) (i.e. Fig. 1a/b to 1e). Another key feature will be either a prediction or incorporation of empirical data on how clumping changes with spatial scales (Question 2, Box 1).
3. **Conservation implications?** It would be disappointing if the unified unified theory proved interesting only to academics. One hopes it will carry over into adding tools to conservation biology. To date there has been a noticeable failure to do this (Clark 2009). But it seems hard to imagine that a truly general and accurate stochastic geometry of biodiversity will not influence conservation biology. Some of this may come through exploring in depth the three assertions. But I think one of the most promising areas occurs if we succeed in moving towards a general all-scales theory; such a theory can be used to extrapolate from easily obtainable data up or down to spatial scales for which it is more difficult to obtain data. Several attempts at this research program have already begun (Kunin 1998; He & Gaston 2003; Harte *et al.* 2009).
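The remark in point 1 that summing *P*(*n*|*A*,*A*_{0},*N*_{0}) across species yields a species–area relationship can be sketched as follows. A species is expected to be present in area *A* with probability 1 − *P*(0|*A*,*A*_{0},*N*_{0}), and because species are treated as independent, expected richness is the sum of these probabilities. Two forms for *P*(0) are compared: random placement, (1 − *A*/*A*_{0})^*N*_{0}, and a negative binomial with aggregation parameter `k`, used here as one common way to represent clumped counts. The negative binomial form, the value of `k`, the species pool and the function names are all assumptions for illustration and are not the specific functions used by any of the unified theories.

```python
def p_absent_random(area_fraction, n0):
    """P(n = 0) when n0 individuals are placed independently at random."""
    return (1.0 - area_fraction) ** n0

def p_absent_clumped(area_fraction, n0, k=0.5):
    """P(n = 0) under a negative binomial with aggregation parameter k.

    Smaller k means stronger clumping; large k approaches the Poisson case.
    This form is an approximation intended for areas well below the full extent.
    """
    return (1.0 + area_fraction * n0 / k) ** (-k)

def expected_richness(area_fraction, abundances, p_absent):
    """Sum occupancy probabilities across independently placed species."""
    return sum(1.0 - p_absent(area_fraction, n0) for n0 in abundances)

abundances = [1, 1, 2, 3, 5, 8, 20, 60, 150, 400]   # hypothetical hollow-curve pool
for a in (0.001, 0.01, 0.1, 0.5):
    print(a,
          round(expected_richness(a, abundances, p_absent_random), 2),
          round(expected_richness(a, abundances, p_absent_clumped), 2))
```

In this sketch clumping always lowers expected richness in a sample of a given area relative to random placement, because clumped species are more often absent from any one sample. The local SAD could be obtained from the same machinery by using the full distribution over *n* instead of only the *n* = 0 term.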

## Acknowledgements

I thank Cathy Collins, John Harte, Fangliang He, Tomasso Zillio, Bill Shipley, David Storch and Arnost Sizling for many generous and free-ranging discussions that greatly improved my understanding of the unified theories. I thank the members of the Enquist/Saleska/McGill labgroup for their feedback on my ideas about a unified theory.