1. Analyzing the phylogenetic structure of natural communities may illuminate the processes governing the assembly and coexistence of species. For instance, an association between species co-occurrence in local communities and their phylogenetic proximity may reveal the action of habitat filtering, niche conservatism and/or competitive exclusion.
2. Different methods were recently proposed to test such community-wide phylogenetic patterns, based on the phylogenetic clustering or overdispersion of the species in a local community. This provides a much-needed framework for addressing long-standing questions in community ecology as well as the recent debate on community neutrality. The testing procedures are based on (i) a metric measuring the association between phylogenetic distance and species co-occurrence, and (ii) a data set randomization algorithm providing the distribution of the metric under a given ‘null model’. However, the statistical properties of these approaches are not well established and their reliability must be tested against simulated data sets.
3. This paper reviews metrics and null models used in previous studies. A ‘locally neutral’ subdivided community model is simulated to produce data sets devoid of phylogenetic structure in the spatial distribution of species. Using these data sets, the consistency of Type I error rates of tests based on 10 metrics combined with nine null models is examined.
4. This study shows that most tests can become liberal (i.e. they reject too often the null hypothesis that only neutral processes spatially structured the local community) when the randomization algorithm breaks down a structure in the original data set that is unrelated to the null hypothesis being tested. Hence, when overall species abundances are distributed non-randomly across the phylogeny or when local abundances are spatially autocorrelated, better statistical performances were achieved by randomization algorithms preserving these structural features. The most reliable randomization algorithm consists of permuting species with similar abundances among the tips of the phylogenetic tree. One metric, RPD-DO, also proved to be robust under most simulated conditions using a variety of null models.
5. Synthesis. Given the suboptimal performances of several tests, attention must be paid to the testing procedures used in future studies. Guidelines are provided to help choose an adequate test.
Useful knowledge about ecological processes governing species assemblages can be inferred from community biodiversity patterns, and integrating phylogenetic information can add further insights because a phylogeny conveys indirect information on the shared origin, the shared adaptations, and the potential for competition between species (Webb et al. 2002). For example, if related species tend to be adapted to similar habitats because they share similar traits (a pattern called ‘niche conservatism’; Lord et al. 1995), local communities distributed along an ecological gradient may show an excess of related species co-occurring locally, that is, a pattern of ‘spatial community phylogenetic clustering’. Conversely, a pattern of ‘spatial community phylogenetic overdispersion’ (i.e. sister-species co-occur within sites less often than expected by chance) can result from (i) competitive exclusion between related species that share similar ecological requirements, (ii) mortality dependence on the density of related species (e.g. Gilbert & Webb 2007), or (iii) ecological speciation causing habitat differentiation between sister-species (Cavender-Bares et al. 2004, 2006). Hints about these processes can thus be gathered from the correlation between co-occurrence and phylogenetic distance of species pairs in natural communities. Note that such ‘spatial phylogenetic structure’ may also have a biogeographic origin when the speciation rate exceeds the dispersal rate across biogeographic barriers (Webb et al. 2002; Hardy & Senterre 2007).
The phylogenetic structure of communities is particularly informative for the current debate on neutrality because, in a neutral community, no ecological process involving species–species or species–environment interactions can generate a phylogenetic structure in the spatial distribution of species. However, purely neutral processes can create complex structures in the spatial distribution of species (Ulrich 2004), so that testing for a spatial phylogenetic structure by randomizing a data set, for example by permuting species abundances among locations (there are many ways to do so; see Gotelli 2000), may not be adequate.
A relaxed and more realistic version of community neutrality is the concept of ‘locally neutral’ communities, where neutrality (i.e. species equivalence) holds within a focal community but not necessarily outside this community (Dolman & Blackburn 2004; Ulrich & Zalewski 2007). If the locally neutral community receives migrants from a regional species pool at random locations, the spatial distribution of species will depend only on neutral processes (dispersal, ecological drift) and should be devoid of any phylogenetic structure, though a phylogenetic structure could occur in the species abundance distribution because the latter can be influenced by non-neutral processes occurring outside the focal community. Hence, it is important to be able to test separately the occurrence of a ‘spatial phylogenetic structure’ (Are species found in a given site more, or less, related on average than species found in distinct sites?) from the occurrence of an ‘abundance phylogenetic structure’ (Is the distribution of overall species abundances non-random in the phylogeny?) (Helmus et al. 2007a).
The null model approach has been widely used in community ecology, in particular to test whether patterns of species co-occurrence are ‘random’, a context in which its interpretation has been much debated (e.g. Diamond 1975; Connor & Simberloff 1979; Gotelli & Graves 1996; Gotelli 2000; Gotelli & McCabe 2002; Ulrich 2004; Ulrich & Gotelli 2007a,b). Defining a null model algorithm adequate for the specific null hypothesis to be tested is not straightforward because different randomization schemes can be conceived and they are not necessarily congruent (Gotelli 2000). In fact, a null model, defined by a specific randomization algorithm, always correctly tests an implicitly defined null hypothesis (i.e. it tests whether the observed pattern conforms to the set of patterns that the randomization scheme can generate). However, this implicit hypothesis can differ from the explicitly defined null hypothesis one wishes to test (e.g. Does the observed pattern conform to a neutral community?). A problem arises when the randomization algorithm breaks down a structure in the original data set that is unrelated to the null hypothesis being tested.
Although testing procedures to detect ‘non-random’ community phylogenetic structure are becoming routinely used, their statistical behaviour and their reliability in detecting the imprint of non-neutral ecological processes have rarely been discussed (Kembel & Hubbell 2006; Kraft et al. 2007; Helmus et al. 2007a). For tests originally designed by Webb (2000), Kraft et al. (2007) reported consistent Type I error rates when applying them to a single simulated community created under various assembly models. Many empirical studies, however, are based on multiple communities, so that the problem of the statistical independence of community samples, a feature that can greatly affect the validity of tests, must be evaluated.
To investigate how tests perform on multiple community samples, I simulate a locally neutral community that is spatially subdivided and devoid of spatial phylogenetic structure. I then assess the validity of different testing procedures, each corresponding to a particular combination of a metric and a randomization algorithm (‘null model’), based on the rate of rejection of the null hypothesis that the community is locally neutral. The Type I error rate (i.e. proportion of false positives) is explored when the community is subject to restricted dispersal, variable sub-community sizes, and/or a non-random phylogenetic distribution of global species abundances, thus under conditions that affect structural features of the community that randomization algorithms do not necessarily preserve. In the following, I first present the different metrics and null models that will be investigated, then the simulation algorithm and data analyses.
Metrics of spatial community phylogenetic structure
Different metrics using presence/absence or abundance data have been developed to describe the extent of community phylogenetic structuring (Table 1; a more detailed description is available online in Appendix S1 in the Supplementary Material). Two categories can be distinguished: (i) metrics expressing the relatedness between species or individuals co-occurring within a site, in absolute terms (MPD, MNTD, the mean phylogenetic distance between individuals, PSV, PSR, PSE) or relative to the relatedness between species or individuals from distinct sites (BST, ΠST, or the difference PST − IST), and (ii) metrics expressing the correlation between the matrix of phylogenetic distances between species and a matrix of pairwise co-occurrence indices between species (RPD-CA, RPD-CO, RPD-RA, RPD-RO, RPD-DO, which differ in the way co-occurrence is estimated).
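As an illustration of the first category, the two presence/absence metrics computed for a single site can be sketched as follows. This is a minimal sketch and not the implementation used in this study: it assumes the usual reading of the metric names (MPD as the mean phylogenetic distance over all pairs of species present in the site, MNTD as the mean distance from each species to its nearest relative present in the site), and the function names `mpd` and `mntd` and the input layout are hypothetical.

```python
def mpd(pd, present):
    """Mean phylogenetic distance over all pairs of species present in a site.

    pd: symmetric matrix (list of lists) of phylogenetic distances.
    present: indices of the species found in the site (at least two).
    """
    pairs = [(i, j) for i in present for j in present if i < j]
    return sum(pd[i][j] for i, j in pairs) / len(pairs)


def mntd(pd, present):
    """Mean distance from each species to its closest relative in the site."""
    nearest = [min(pd[i][j] for j in present if j != i) for i in present]
    return sum(nearest) / len(nearest)


# Hypothetical four-species tree ((A,B),(C,D)): 2 within a clade, 10 between.
PD = [[0, 2, 10, 10],
      [2, 0, 10, 10],
      [10, 10, 0, 2],
      [10, 10, 2, 0]]
print(mpd(PD, [0, 1, 2]), mntd(PD, [0, 1, 2]))
```

The multiple-site versions (mMPD, mMNTD) would simply average these values over sites.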
Table 1. Metrics to characterize the spatial phylogenetic structure of communities
The metrics described above and in Table 1 usually seek to characterize a ‘spatial phylogenetic structure’: Are co-occurring species more related than expected? There could also be an ‘abundance phylogenetic structure’: Are abundant species randomly distributed across the phylogeny? The latter pattern can be quantified by the Abundance Phylogenetic Deviation index, which I define as APD = (ΔP − DB)/ΔP, where $\Delta P = \frac{1}{n(n-1)} \sum_{i} \sum_{j \neq i} PD_{ij}$ is the mean phylogenetic distance between distinct species in the whole community (PDij is the phylogenetic distance between species i and j), and $DB = \frac{\sum_{i} \sum_{j \neq i} f_i f_j \, PD_{ij}}{\sum_{i} \sum_{j \neq i} f_i f_j}$ is the mean phylogenetic distance between individuals of distinct species (n being the number of species and fi the relative abundance of species i in the whole community). Thus, DB is a species abundance weighted version of ΔP. If abundant species tend to belong mostly to one or several related clades, DB < ΔP, so that APD > 0, a situation I will name ‘species abundance phylogenetic clustering’. On the contrary, if abundant species tend to be spread among distant clades, DB > ΔP, so that APD < 0, a situation I will name ‘species abundance phylogenetic overdispersion’. Note that APD can also be defined using the number of sites in which a species is found to compute relative species abundances.
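The sign behaviour of APD can be checked on a toy example. The sketch below follows the verbal definitions above (ΔP as the unweighted mean distance between distinct species, DB as the same mean weighted by the product of relative abundances); the function name `apd` and the four-species distance matrix are hypothetical and serve only as an illustration.

```python
def apd(pd, f):
    """Abundance Phylogenetic Deviation, APD = (delta_p - d_b) / delta_p.

    pd: symmetric matrix (list of lists) of phylogenetic distances PD_ij.
    f:  relative abundances f_i of the n species in the whole community.
    """
    n = len(f)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    # delta_p: mean distance between distinct species (all pairs weighted equally)
    delta_p = sum(pd[i][j] for i, j in pairs) / len(pairs)
    # d_b: mean distance between individuals belonging to distinct species
    weight = sum(f[i] * f[j] for i, j in pairs)
    d_b = sum(f[i] * f[j] * pd[i][j] for i, j in pairs) / weight
    return (delta_p - d_b) / delta_p


# Hypothetical tree ((A,B),(C,D)): distance 2 within a clade, 10 between clades.
PD = [[0, 2, 10, 10],
      [2, 0, 10, 10],
      [10, 10, 0, 2],
      [10, 10, 2, 0]]
print(apd(PD, [0.25, 0.25, 0.25, 0.25]))  # equal abundances: close to 0
print(apd(PD, [0.45, 0.45, 0.05, 0.05]))  # abundant species are sisters: positive
print(apd(PD, [0.45, 0.05, 0.45, 0.05]))  # abundant species are distant: negative
```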
Null models for testing community phylogenetic structure
Randomization can be applied on the inter-species phylogenetic distance matrix, that is on the phylogenetic tree (Fig. 1), or on the species by site matrix, that is on the species spatial distribution (Fig. 2). Here I detail nine variants (i.e. distinct null models) and emphasize their impact on the structure of the randomized data set (Table 2).
Table 2. Constraints of the different null models on the structural features of data. Equivalences with previous works are given as footnotes
Columns: Null model (equivalence); Constraints (structural features of data left unchanged after randomization): local species diversity per site; among-sites abundance distribution per species; spatial autocorrelation of local abundances for each species; phylogenetic distribution of global species abundances.
The simplest phylogeny randomization consists of reshuffling species positions among the tips of the phylogenetic tree (null models 1; Fig. 1), thereby keeping the tree topology and branch lengths unchanged. If a reference species pool is defined and contains non-sampled species, there are two variants of this null model depending on whether the phylogenetic tree contains only the species sampled in the set of sites studied (null model 1s; Fig. 1), or all the species of a reference species pool (null model 1p; Fig. 1). A third variant considers the overall abundance (or the occurrence frequency among sites) of each species, restricting permutation to species with similar abundances. To this end, species are grouped into distinct abundance classes characterized by a fixed ratio K = maximal abundance / minimal abundance (e.g. for K = 4, the limits between abundance classes could be 1, 4, 16, ... ) and species are randomly permuted within each class (null model 1a; Fig. 1). This randomization algorithm maintains most of the abundance phylogenetic structure originally present in a data set (Table 2). Class limits follow a geometric progression, but it is preferable to change them from one randomization to the next (e.g. for K = 4, class limits could also be 0.5, 2, 8, 32, ... , then 0.3, 1.2, 4.8, ... ); otherwise, two species with similar abundances that fall on either side of a fixed class limit could never be permuted with each other.
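The logic of null model 1a can be sketched as follows. This is an illustrative sketch only, not the code used for the analyses: the function name, the dictionary-based input and the use of a uniform random offset on the log scale to shift class limits between randomizations are assumptions made for the example.

```python
import math
import random


def permute_within_abundance_classes(abundance, k=3.0, rng=random):
    """Return a dict mapping each tip of the tree to the species placed on it.

    abundance: dict species -> overall abundance (must be > 0).
    k: ratio between the upper and lower limits of an abundance class.
    """
    # A random offset in [0, 1) shifts the geometric class limits at each call,
    # so that a given pair of species is not always split by the same limit.
    offset = rng.random()
    classes = {}
    for species, a in abundance.items():
        index = math.floor(math.log(a, k) + offset)
        classes.setdefault(index, []).append(species)
    mapping = {}
    for members in classes.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # permutation restricted to one abundance class
        mapping.update(zip(members, shuffled))
    return mapping


counts = {"sp1": 1, "sp2": 2, "sp3": 10, "sp4": 25, "sp5": 300, "sp6": 2.5}
print(permute_within_abundance_classes(counts, k=3.0, rng=random.Random(1)))
```

Any two species exchanged by this procedure differ in abundance by a factor smaller than K, which is why most of the abundance phylogenetic structure survives the randomization.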
The species by site matrix can be randomized in different ways by permuting the elements of the matrix within each site (null models 2), or within each species (null models 3), independently or not (Fig. 2). Each randomization algorithm may constrain different features of the data set (Table 2) and six null models are described below.
A first simple algorithm consists of permuting, independently for each site, the local species abundances among species (null models 2s and 2p; Fig. 2). As for null model 1, if a reference species pool is defined and contains non-sampled species, there exist two variants depending on whether the species by site matrix includes only the sampled species (null model 2s) or all species from the reference species pool (null model 2p). The local species diversity is maintained within each site but not the original inter-site abundance distribution of each species. With presence/absence data, this null model is equivalent to randomly sampling species from the pool of species (without replacement) until the original species richness in each site is achieved. It has often been used to test community phylogenetic structuring (e.g. Webb et al. 2002; Horner-Devine & Bohannan 2006; Kembel & Hubbell 2006). Helmus et al. (2007a) noted that these null models break down both the spatial and the abundance phylogenetic structures so that the two types of patterns can be confounded.
A second simple algorithm consists of permuting, independently for each species, the local abundances among sites (null model 3i; Fig. 2). Here, the inter-site abundance distributions are maintained for each species but not the original local species diversities per site (Table 2). This approach has been applied by Cavender-Bares et al. (2004, 2006) and is advocated by Helmus et al. (2007a) for testing if there is a significant spatial phylogenetic structure (independently of an abundance phylogenetic structure).
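The two simple matrix randomizations described above (within sites for null model 2s, within species for null model 3i) can be sketched as follows, assuming a species by site matrix stored as a list of rows (one row per species, one column per site). The function names are hypothetical.

```python
import random


def null_2s(matrix, rng=random):
    """Permute abundances among species, independently within each site (column)."""
    n_species, n_sites = len(matrix), len(matrix[0])
    out = [row[:] for row in matrix]
    for s in range(n_sites):
        column = [matrix[i][s] for i in range(n_species)]
        rng.shuffle(column)
        for i in range(n_species):
            out[i][s] = column[i]
    return out


def null_3i(matrix, rng=random):
    """Permute abundances among sites, independently for each species (row)."""
    out = []
    for row in matrix:
        shuffled = row[:]
        rng.shuffle(shuffled)
        out.append(shuffled)
    return out


data = [[3, 0, 1],
        [0, 2, 2],
        [5, 0, 0],
        [1, 1, 0]]
print(null_2s(data, random.Random(5)))
print(null_3i(data, random.Random(5)))
```

The first function keeps the set of abundance values found in each site, and the second keeps the set of abundance values of each species, which corresponds to the constraints listed for these null models.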
The species by site matrix can also be randomized in a more constrained way by maintaining both local richness and total species occurrences (null models 2x and 3x). This is achieved by the Gotelli ‘swap’ algorithm (Gotelli & Graves 1996; Gotelli 2000; Gotelli & Entsminger 2001). The algorithm searches the matrix at random for submatrices of four elements (where rows and columns do not need to be adjacent) showing a (1, 0)(0, 1) or a (0, 1)(1, 0) presence/absence configuration, then it swaps the values to get a (0, 1)(1, 0) or a (1, 0)(0, 1) configuration, respectively. Hence, the row and column totals of the original matrix are preserved. The matrix randomization is achieved by repeating this swapping procedure many times. Although the Gotelli swap algorithm was designed for presence/absence data, I extend it to abundance data where submatrices are in (x, 0)(0, y) configuration. The latter can thus be swapped into (0, x)(y, 0) or into (0, y)(x, 0) configurations, defining two variants: one maintains local diversities (null model 2x; Fig. 2), the other maintains inter-site species abundance distributions (null model 3x; Fig. 2). For these models, submatrices in (r, s)(t, u) configuration, where r, s, t, u are all ≠ 0, are also swapped into (s, r)(u, t) in null model 2x and into (t, u)(r, s) in null model 3x.
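For presence/absence data, the swap procedure can be sketched as below. This is a simplified illustration under stated assumptions, not the program used here: the function name, the `max_tries` safeguard and the way candidate submatrices are drawn (two random rows and two random columns per attempt) are choices made for the example, and the extension to abundance data is not covered.

```python
import random


def swap_randomize(matrix, n_swaps, rng=random, max_tries=10**6):
    """Randomize a 0/1 matrix by checkerboard swaps, keeping row and column totals."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        a, b = m[r1][c1], m[r1][c2]
        c, d = m[r2][c1], m[r2][c2]
        # Only (1, 0)(0, 1) and (0, 1)(1, 0) submatrices can be swapped
        if a == d and b == c and a != b:
            m[r1][c1], m[r1][c2] = b, a
            m[r2][c1], m[r2][c2] = a, b
            done += 1
    return m


presence = [[1, 0, 1, 0],
            [0, 1, 0, 1],
            [1, 1, 0, 0]]
print(swap_randomize(presence, 50, random.Random(7)))
```

Because each accepted swap exchanges a 1 and a 0 within two rows and two columns simultaneously, species occurrence totals and site richness are unchanged after any number of swaps.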
All the preceding species by site matrix randomization schemes break down any spatial autocorrelation of local species abundances (although constraints induced by the swap algorithm may maintain some spatial autocorrelation). To avoid this, when sites are arranged regularly on a rectangular lattice (or along a transect), the whole spatial pattern of each species' local abundances can be translated by a random number of lattice units in each direction, independently for each species (null model 3t; Fig. 2). Such randomization displaces some sites beyond the lattice boundary, in which case the sites are ‘wrapped around’ on to the opposite boundary, as in torus-translation tests (Harms et al. 2001). Except for disruptions at the boundaries, this procedure holds the within-species spatial structure intact while rendering the spatial structures of different species independent, though it does not preserve local species diversity.
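A torus translation of this kind can be sketched as follows, assuming that the local abundances of each species are stored as a rectangular grid (list of rows) matching the lattice of sites. The function names are hypothetical.

```python
import random


def torus_translate(grid, rng=random):
    """Shift one species' abundance map by a random offset, wrapping at the edges."""
    n_rows, n_cols = len(grid), len(grid[0])
    dr, dc = rng.randrange(n_rows), rng.randrange(n_cols)
    return [[grid[(r - dr) % n_rows][(c - dc) % n_cols] for c in range(n_cols)]
            for r in range(n_rows)]


def null_3t(grids, rng=random):
    """Translate each species independently (one grid per species)."""
    return [torus_translate(g, rng) for g in grids]


species_map = [[1, 2, 3],
               [4, 5, 6]]
print(torus_translate(species_map, random.Random(3)))
```

Since each species receives its own offset, the relative positions of abundances within a species are kept (up to wrapping at the boundaries) while the associations between species are randomized.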
Simulation model of a subdivided community
To verify the statistical properties of tests of community phylogenetic structuring, I developed an individual-based model of a locally neutral community subdivided into n sites. Each site contains a fixed number of individuals and replacement follows a lottery model where individuals of all species have the same competitive ability (Chesson & Warner 1981). This community is connected to a constant large regional species pool defined by a set of P species with predefined relative abundances. Contrary to Hubbell's neutral model (but as in the 3L-SINM model of Munoz et al. 2008), relative species abundances in the regional species pool are not assumed to result from an ecological drift-speciation equilibrium: they are assumption-free and can take any form, or result from any kind of process (neutral or not). The only assumption is that they are stable on the time-scale investigated. Hence, species equivalence is assumed only within the focal subdivided community (hence the expression ‘locally neutral community’).
The initial community is composed of a random sample from the regional species pool, that is, individuals are sampled from the source community (with replacement) with probability equal to the pre-defined species relative abundances. At each time step, or generation, all individuals die and are replaced by the progeny of individuals from the previous generation. Progeny can come from (i) the same site (no migration), (ii) the regional species pool (at rate Mp), or (iii) another site. In the latter case, there are three possible patterns of migration between sites: (i) all sites are interconnected and migration occurs randomly between sites (rate Mr); (ii) sites are arranged in a spatially explicit way and migration is limited by spatial distance, occurring only between adjacent sites (rate Md); or (iii) sites are organized into distinct groups and migration is limited by group membership, occurring only between sites from the same group (rate Mg). High Mp would result in sub-communities essentially identical to the regional species pool whereas if the rate of local recruitment is close to one (very low migration rates), monodominance would result as a consequence of ecological drift. To obtain a balanced level of local diversity and correlated species composition among sites, one needs 0 < Mp < Mr or Md or Mg << 1. Md causes a spatial correlation between nearby sites whereas Mr tends to homogenize composition among sites and Mg results in a homogenization of sites within groups and a differentiation among groups. Hence, all pairs of sub-communities are expected to show equal levels of differentiation under Mr whereas differentiation will vary according to spatial distance under Md and according to group membership under Mg. The relationships between species are defined by a dated phylogenetic tree providing a phylogenetic distance for each pair of species in terms of divergence time (see below). This model was simulated using a C-coded program.
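One generation of such a model can be sketched as follows for the simplest case of random migration between sites. This is not the C program used for the simulations: the function name, the data layout (one list of species labels per site) and the choice of drawing the donor site uniformly among the other sites are assumptions of this example, and the distance-limited (Md) and group-limited (Mg) migration patterns are omitted.

```python
import random


def next_generation(sites, pool_species, pool_freqs, m_pool, m_random, rng=random):
    """Replace every individual by the offspring of a randomly drawn parent.

    sites: list of sites, each a list of species labels (one per individual).
    m_pool: probability that the parent comes from the regional pool (Mp).
    m_random: probability that the parent comes from another site (Mr).
    """
    n = len(sites)
    new_sites = []
    for k, site in enumerate(sites):
        offspring = []
        for _ in range(len(site)):  # site size stays constant
            u = rng.random()
            if u < m_pool:
                # immigrant drawn from the fixed regional abundance distribution
                parent = rng.choices(pool_species, weights=pool_freqs)[0]
            elif u < m_pool + m_random and n > 1:
                donor = rng.choice([j for j in range(n) if j != k])
                parent = rng.choice(sites[donor])
            else:
                parent = rng.choice(site)  # local recruitment (lottery)
            offspring.append(parent)
        new_sites.append(offspring)
    return new_sites


community = [["a"] * 5, ["b"] * 7, ["c"] * 3]
community = next_generation(community, ["x", "y"], [0.5, 0.5], 0.001, 0.1,
                            random.Random(0))
print([len(site) for site in community])
```

All species are treated identically inside the function, which is what makes the simulated community locally neutral; any non-neutral signal can only enter through the pool frequencies.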
Simulations were run for T generations before analyzing the phylogenetic structure, and 1000 replicates were run for each parameter set (see below). It is assumed that T is sufficiently short that speciation and drift in the regional species pool can both be neglected (drift-migration equilibrium is not necessarily reached).
In the simulations presented here, the regional species pool (source community) is composed of P = 227 species, the number of angiosperm trees with DBH > 10 cm in the 50-hectare Forest Dynamics Plot on Barro Colorado Island, Panama (BCI data set; Condit 1998). The species phylogeny for the 227 BCI species was constructed from a dated phylogenetic tree of all angiosperm families (Davies et al. 2004) using the Phylomatic software (Webb & Donoghue 2005). Within families, genera and species are positioned as polytomies with node ages equal to, respectively, 2/3 and 1/3 the age of the family (Hardy & Senterre 2007). Relative species abundances in the source community follow one of three possible modes: (1) the actual abundances observed on BCI (available online in Table S1), (2) the abundance distribution observed on BCI but randomized among species, (3) the 34 highest species abundances observed on BCI (which constitute 72% of the community) are re-attributed randomly to the 34 species belonging to the (APG) Fabaceae family (APGII 2003), the other abundances being re-attributed randomly to non-Fabaceae species. Thus, the frequency distribution of species abundances is the one observed on BCI but the phylogenetic distribution of species abundances is realistic (case 1), random (case 2), or highly clustered (case 3). Simulations consider n = 100 sites on a 10 × 10 grid arranged in a toroidal fashion (i.e. opposite sides of the grid are connected, so that there are no borders). The number of individuals per site (i.e. per sub-community) is constant (Ns = 100) or variable (Ns alternates spatially between 40 and 160), but the overall number of individuals is fixed (10 000). Immigration from the source community occurs at a rate Mp = 0.001. Species migration among sites is random (Mr = 0.1 and Md = Mg = 0), limited by distance (Md = 0.1 and Mr = Mg = 0), or limited by group membership within each of four groups of 5 × 5 sites (Mg = 0.1 and Mr = Md = 0).
There are thus three variable parameters (three states for the phylogenetic distribution of species abundances, three states for the migration pattern and two states for sub-community sizes) that are combined in a factorial way, leading to 18 parameter sets. For each set, after T = 100 generations, the global and average local species richness is computed, as well as the abundance phylogenetic deviation index, APD. Then, each metric quantifying community phylogenetic structuring (Table 1, including the MPD and MNTD metrics for a single site) is computed and tested by each of the nine null models (Table 2). For null models 1p and 2p, the reference species pool is constituted by the 227 species of the regional species pool, whereas null models 1s and 2s consider only species sampled in the focal community (84 species on average). For null model 1a, the threshold ratio was set to K = 3, using species abundances or species occurrence frequencies in accordance with the type of data treated by each metric. For null models 2x and 3x, the number of submatrix swaps (Fig. 2) used to randomize the species by site matrix is equal to five times the matrix size (i.e. 5 × 227 × 100 = 113 500) for the first randomized data set, which starts from the original data set, and equal to the matrix size for the next randomized data sets, where each starts from the previously produced randomized data set.
For each metric by null model combination, the number of significant tests over 1000 replicates was recorded at α = 0.05, distinguishing phylogenetic clustering and overdispersion (i.e. there are two one-sided tests at α = 0.025). For exact tests, there should be by chance an average of 25 replicates where phylogenetic clustering is detected and 25 replicates where phylogenetic overdispersion is detected (false positives). Deviations from these expectations are significant when the number of significant tests exceeds 37 or is less than 13 (χ2 test at P < 0.01, 1 d.f.). To get an overall assessment of Type I error rate conformity of a given test across a range of parameter sets, the mean square error is computed: $\mathrm{MSE} = \frac{1}{S} \sum (N_{obs} - N_{exp})^2$, where the sum runs over the S parameter sets considered and Nobs and Nexp are, respectively, the observed and expected numbers of replicates for which the test suggests significant clustering, or overdispersion (here, Nexp = 25).
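The 13–37 acceptance range quoted above can be recovered with a short calculation. The sketch below assumes that the χ2 test compares the observed and expected counts of significant and non-significant replicates (two categories, 1 d.f.) and uses the standard critical value of 6.635 for P = 0.01; the function names are hypothetical. It also shows the square root of the mean square error for a hypothetical list of observed counts.

```python
import math


def chi2_one_df(n_obs, n_exp, n_total):
    """Chi-square statistic for significant versus non-significant replicates."""
    return ((n_obs - n_exp) ** 2 / n_exp
            + ((n_total - n_obs) - (n_total - n_exp)) ** 2 / (n_total - n_exp))


CRITICAL = 6.635  # chi-square quantile for P = 0.01 with 1 d.f.
band = [k for k in range(1001) if chi2_one_df(k, 25, 1000) < CRITICAL]
print(min(band), max(band))  # 13 37


def root_mse(counts, n_exp=25):
    """Square root of the mean squared deviation between observed and expected counts."""
    return math.sqrt(sum((n - n_exp) ** 2 for n in counts) / len(counts))


print(root_mse([20, 30]))  # hypothetical counts, deviation of 5 each: 5.0
```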
For brevity, results (Tables 3 and 4) are presented for seven of the 18 parameter sets, 5 of the 10 metrics, and five of the nine null models. Complete results are available in the online supplementary material (Tables S2 and S3).
Table 3. Mean global and local species richness, mean abundance phylogenetic deviation index (APD), and mean values of the metrics characterizing the spatial phylogenetic structure of a simulated locally neutral community for different sets of simulation parameters
Species abundances in the source community: Phylogenetically random, realistic using BCI data, or phylogenetically clustered.
Species migration among sites: spatially random, limited by distance, or limited by group membership.
Sub-community size: Constant (Ns = 100 individuals) or Variable (Ns alternates spatially between 40 and 160).
Other simulation parameters are fixed (see text). Parameters that differ from the first parameter set are highlighted (bold).
Table 4. Results of different tests of phylogenetic structuring applied on a simulated locally neutral community under various simulation parameters (see text and Table 3 for details). Each test is a particular combination of a metric and a null model. Values (X/Y) indicate the number of replicates over 1000 for which each test detects significant phylogenetic clustering (X) or overdispersion (Y) at P ≤ 0.05 (25/25 is expected for exact tests). In bold: cases where Type I error rate (α) is consistent for both clustering and overdispersion (values inside the 13–37 range, P ≤ 0.01). In italic: cases where the test is extremely liberal, being significant at a frequency at least four times the fixed α (values ≥ 100 for clustering or for overdispersion)
Columns: Null model 1s; Null model 1a (K = 3); Null model 2s; Null model 2x; Null model 3i; Null model 3t.
In the simulated community, the mean species richness is about 85 overall and about 24 within a given site. These values are lower when the sub-community size is variable, and local richness is further reduced when migration between sites is limited by group membership (Table 3). As expected, APD = 0 when species abundances are phylogenetically random in the source community, whereas it is positive (APD = 0.27 ± 0.04 SD) under phylogenetically clustered species abundances (simulations with Fabaceae species very abundant). Interestingly, the actual species abundances found on BCI lead to slightly negative values (APD = −0.02 ± 0.02 SD), indicating that species abundances are phylogenetically overdispersed on BCI (note that although c. 81% of replicates gave negative APD, application of null model 1s failed to detect significance in the majority of cases).
Within sites, the mean phylogenetic distance between species (mMPD) ranges from 79 to 115 million years, being highest when BCI species frequencies are used in the source community, and lowest when species frequencies are phylogenetically clustered. The within-site mean nearest taxon distance between species (mMNTD) and the mean phylogenetic distance between individuals follow the same trends (Table 3). ΠST and BST are very close to zero, as expected in the absence of spatial phylogenetic structure. RPD-CA and RPD-CO are much affected by the phylogenetic distribution of species abundances in the source community, being close to zero under a phylogenetically random distribution, very negative under a phylogenetically clustered distribution, and somewhat positive with the phylogenetically overdispersed BCI distribution (Table 3). By contrast, the RPD-RA, RPD-RO and RPD-DO metrics are much less affected by the phylogenetic distribution of species abundances and always remain close to zero (Table 3).
When the phylogenetic tree is randomized without constraint on species abundances (null models 1s and 1p), the tests respect the Type I error rates for all the metrics as long as the distribution of species abundances is phylogenetically random (Table 4, Table S3). Under abundance phylogenetic structuring, most metrics lead to liberal (sometimes extremely liberal) tests. However, some metrics such as BST and RPD-DO provide (near) exact tests with the phylogenetically overdispersed BCI abundance distribution and they are the most reliable metrics using null models 1s and 1p (Table 5). Results obtained using null models 1s and 1p are similar, though deviations from expected Type I error rates are somewhat higher for null model 1p. When permutations are allowed only among species having similar abundances (null model 1a), conformance of Type I error rates improves substantially for most metrics, even under extreme abundance phylogenetic clustering, and tests become satisfactory using ΠST, BST, RPD-RA, RPD-RO and RPD-DO (Tables 4 and 5). Hence, in the absence of a phylogenetic structure in the species abundance distribution, null model 1s can be recommended; otherwise randomizations should be constrained to preserve the pre-existing abundance phylogenetic structure (null model 1a).
Table 5. Summary of Type I error rate conformance of the different tests (combination of a metric with a null model) applied on the simulated data sets conforming to the null hypothesis to be tested (local neutrality). Values are the quadratic averages of the absolute difference between observed and expected number of significant tests over 1000 replicates, averaged over the 18 parameter sets (MSE^1/2). Low values (in bold) indicate the best tests (Type I error rates nearly conforming to nominal values in all simulation conditions) whereas high values (in italic) indicate potentially highly liberal tests (high risk of false positive in some simulation conditions). The statistical properties of mMPD and of the mean phylogenetic distance between individuals are representative of the properties of, respectively, the PSV (and PSR) metric and the PSE metric defined by Helmus et al. (2007a), when applied on multiple samples
Rows for single-site metrics: MNTD (one site); MPD (one site).
I now consider the tests based on species by site matrix randomization. Essentially all the tests based on null models 2p and 2s are highly liberal (Tables 4 and 5). Resampling species from the reference species pool (null model 2p) worsens the statistical properties (Table S3). Hence, these randomization procedures are clearly inadequate to test whether non-neutral processes generate a spatial phylogenetic structure. The tests based on null model 3i are generally valid as long as sub-community sizes remain constant and migration is random, otherwise they are too liberal, especially when migration is limited by group membership (Table 4, Table S3). The tests based on the swap algorithm (null models 2x and 3x) are valid in the same conditions as the preceding one (null model 3i) but perform better when sub-community size varies among sites, at least for the metrics based on presence/absence data (Table 4). Nevertheless, they remain too liberal when migration is not random, or when abundance phylogenetic clustering is combined with varying sub-community sizes, though the RPD-DO metric keeps good statistical properties (Table 4, Table S3). Finally, null model 3t, which is similar to null model 3i but maintains the spatial autocorrelation of local abundances for each species, recovers good statistical performance on most metrics when migration is limited by distance, provided that sub-community size is constant, although it is too conservative with BST and it is not very effective when migration is limited by group membership (Table 4). This null model seems thus adequate to test patterns generated under limited dispersal but it requires constant sample sizes and sampling sites located on a regular rectangular grid or on a transect.
Overall, very few tests gave consistent Type I error rates in all conditions (c. 15 of 108 tests; Table 5), liberal tests being usually observed when some structural feature of the data set was lost by the randomization algorithm. However, RPD-DO proved very robust with all randomization algorithms (except null model 2p; Table 5). Among the metrics based on abundance data, RPD-RA and BST used with null model 1a provided the most satisfactory tests (Table 5).
MNTD and MPD assessed for a single site, which are the main test metrics used by the software Phylocom, show better conformance with Type I error rates than their multiple sample counterparts (mMNTD and mMPD, Table 5). They were often used with null model 2s, in which case they perform satisfactorily provided that there is no strong phylogenetic structure in the species abundance distribution (whereas mMNTD and mMPD give highly liberal tests in all conditions, Table S3). However, they are expected to suffer from low statistical power because they do not provide a global test for multiple samples.
The ‘neutral model’ is a mechanistic model of community assuming species equivalence (identical per capita rates of birth, death, migration and speciation; Hubbell 2001). The more realistic variant of the neutral model considered here, the ‘locally neutral model’, assumes that neutrality holds at some defined scale (a set of sub-communities) but not necessarily at a higher level (regional species pool). Gotelli & McGill (2006) insisted on the difference between neutral and null models. The latter are pattern-generating models based on data randomization whereby certain structural features are held constant while others are allowed to vary stochastically in order to create new ‘somehow random’ patterns. Null models are particularly suited to design tests by generating the distribution of a particular metric. A well chosen null model (i.e. a particular randomization algorithm) is expected to allow testing the absence of a particular ecological mechanism that affects a feature of species assembly pattern. Hence, null models can be of interest to test the neutral assumption. A locally neutral community is expected to show no ‘spatial phylogenetic structure’, as long as biogeographic effects can be neglected (speciation rate << large scale migration rate), though it may show an ‘abundance phylogenetic structure’, in particular when non-neutral processes affect abundances in the regional species pool. The goal of this study was to assess the conformance of Type I error rates of different null models used to test the locally neutral assumption considering different metrics describing the spatial phylogenetic structure. A potential problem with null models is that data randomization may simultaneously affect not only the association between phylogenetic distance and the co-occurrence patterns of species, but also species abundances across samples, the spatial autocorrelation of local species abundances, and/or the phylogenetic distribution of species abundances. 
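The logic of a null model test (generating the distribution of a metric by data randomization) can be illustrated with a minimal sketch. It assumes that phylogenetic tree randomization amounts to permuting species labels across the tips, and it uses the average within-site mean pairwise distance as the metric. All function names, the toy distances and the one-sided p-value convention are illustrative assumptions, not the procedure of any particular software.

```python
import random

def mean_pairwise_distance(species, dist):
    """Mean phylogenetic distance between distinct species of one site
    (the site must contain at least two species)."""
    pairs = [(a, b) for k, a in enumerate(species) for b in species[k + 1:]]
    return sum(dist[a][b] for a, b in pairs) / len(pairs)

def within_site_metric(sites, dist):
    """Average over sites of the within-site mean pairwise distance."""
    return sum(mean_pairwise_distance(s, dist) for s in sites) / len(sites)

def shuffle_tips(dist, rng):
    """Return a distance matrix with species labels permuted across tips."""
    species = sorted(dist)
    shuffled = species[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(species, shuffled))
    return {a: {b: dist[relabel[a]][relabel[b]] for b in species} for a in species}

def randomization_test(sites, dist, n_perm=999, seed=0):
    """Observed metric and one-sided p-value for phylogenetic clustering
    (within-site distances as small as, or smaller than, observed)."""
    rng = random.Random(seed)
    observed = within_site_metric(sites, dist)
    null = [within_site_metric(sites, shuffle_tips(dist, rng)) for _ in range(n_perm)]
    n_as_low = sum(1 for value in null if value <= observed)
    return observed, (n_as_low + 1) / (n_perm + 1)

def toy_distances():
    """Hypothetical tree: A and B are close, C and D are close, the rest distant."""
    close = {frozenset("AB"), frozenset("CD")}
    names = "ABCD"
    return {a: {b: 0.0 if a == b else (2.0 if frozenset((a, b)) in close else 10.0)
                for b in names} for a in names}

observed, p_value = randomization_test([["A", "B"], ["C", "D"]], toy_distances())
```

Only the association between species and tips is randomized here; the species by site data are left untouched, so every feature of the species by site matrix is preserved.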
Simulations of locally neutral communities have shown that most of the tests do not respect Type I error rates in all simulated conditions, though they usually perform satisfactorily under specific and often predictable conditions (Table 4). This is not unexpected because most of the null models cannot constrain all the features of the data set listed in Table 2, except null model 1a which performs satisfactorily with most metrics (Table 5). Hence, the simulation results suggest that null models should ideally preserve all structural features of the data unrelated to the spatial phylogenetic structure, otherwise randomization tests can become too liberal. Similar problems were reported in related randomization tests, for instance to assess whether patterns of species co-occurrence are random (Gotelli 2000; Ulrich & Gotelli 2007a,b), or with partial Mantel tests (Raufaste & Rousset 2001).
The conditions leading to valid tests are largely predictable from the features of the randomized data set that are preserved. For instance, phylogenetic tree randomization (null models 1s and 1p) keeps all features listed in Table 2 except the species abundance phylogenetic structure, which is randomized. Consequently, it leads to exact tests when global species abundances show no phylogenetic structure (Table 4). For similar reasons, null models based on species by site matrix randomizations that do not preserve the spatial autocorrelation of local species abundances (null models 2s, 2p, 2x, 3i, 3x) generally provided liberal tests under limited dispersal (migration limited by distance or by group membership, Table 4). Null models 3i and 3t are similar (Fig. 1) except that only the latter preserves the spatial autocorrelation of local species abundances. When applied on a data set where such spatial autocorrelation occurs because migration is limited by spatial distance, the frequency distributions of all metrics are more dispersed under null model 3t (not shown). Hence, null models not preserving spatial autocorrelation tend to be liberal under limited dispersal because they underestimate the stochastic variance of the metric. More generally, it seems that when there is a complex pattern of differentiation between sub-communities (dependency on spatial distance or group membership), randomization of the species by site matrix must preserve this complex pattern, otherwise liberal tests may result. Therefore, for the simulated data where dispersal was limited by group membership, one can expect that a null model applying the swap algorithm within each subgroup would show good statistical properties because group membership would be preserved. By analogy with the present results, it is likely that dispersal limitation also affects Type I error rates of null-model-based tests of nestedness and co-occurrence patterns. Hence, the swap algorithm, which is usually considered as the most reliable to test these patterns (Gotelli 2000; Ulrich & Gotelli 2007a,b), should be evaluated further under dispersal limitation.
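The idea of applying the swap algorithm within subgroups of sites can be sketched as follows. This is a minimal illustration on a presence/absence matrix using checkerboard swaps, which keep species occurrence totals and site richness constant. The function name, the `groups` argument and the stopping rule are assumptions made for the example; such a variant was not tested in the simulations.

```python
import random

def swap_randomize(matrix, n_swaps=1000, groups=None, seed=0, max_tries=100000):
    """Randomize a species-by-site 0/1 matrix by checkerboard swaps.

    matrix: list of rows (one per species), each a list of 0/1 values per site.
    groups: optional list with one group label per site; when given, the two
            sites involved in a swap must belong to the same group.
    """
    rng = random.Random(seed)
    m = [row[:] for row in matrix]          # work on a copy
    n_species, n_sites = len(m), len(m[0])
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(n_species), 2)
        a, b = rng.sample(range(n_sites), 2)
        if groups is not None and groups[a] != groups[b]:
            continue                        # sites from different groups
        # A checkerboard is [[1, 0], [0, 1]] or [[0, 1], [1, 0]].
        if m[i][a] == m[j][b] and m[i][b] == m[j][a] and m[i][a] != m[i][b]:
            m[i][a], m[i][b] = m[i][b], m[i][a]
            m[j][a], m[j][b] = m[j][b], m[j][a]
            done += 1
    return m

# Hypothetical toy data: four species, six sites in two groups of three.
toy = [[1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1],
       [1, 1, 0, 0, 1, 0],
       [0, 0, 1, 1, 0, 1]]
randomized = swap_randomize(toy, n_swaps=200, groups=[0, 0, 0, 1, 1, 1], seed=1)
```

Because both sites of a swap share a group, the number of occurrences of each species within each group is also preserved, which is the feature expected to matter when dispersal is limited by group membership.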
Interestingly, some metrics provide more robust tests than others. This is particularly the case of the RPD-DO metric (Table 5). The robustness of some metrics is probably due to their way of quantifying the phylogenetic structure in a more ‘standardized’ or ‘relative’ fashion. For example, mMPD is an absolute inter-species divergence time within sites whereas ΠST is a ratio involving inter-species divergence time within sites (Δw) and among sites (Δa): mMPD = Δw and ΠST = (Δa − Δw)/Δa. Therefore, if global species abundances are phylogenetically clustered (APD > 0), the randomization of the phylogenetic tree following null model 1s or 1p will increase Δw, affecting mMPD but not necessarily ΠST. In other words, mMPD confounds abundance and spatial phylogenetic structures, whereas ΠST reveals only the spatial phylogenetic structure. This explains why, using null model 1s or 1p, ΠST provides more robust tests than mMPD under species abundance phylogenetic structuring (Table 4). Similarly, RPD-CO and RPD-CA are based on a co-occurrence index (proportional similarity, Cij, Schoener 1970; see Appendix S1) that is sensitive to the global abundance of the species being compared because Cij tends to be high for common species and low for rare species, whereas RPD-DO, RPD-RO and RPD-RA are based on more standardized measures of species co-occurrence, which are close to zero for species showing independent spatial distributions whatever their global abundances. Consequently, the latter metrics, and particularly RPD-DO, appear much more robust than the former (Table 5). For example, in RPD-DO, co-occurrence is computed as DOij = (Pij − PiPj)/(PiPj), where Pi, Pj and Pij are the proportions of sites where species i occurs, species j occurs, and both species occur, respectively, so that DOij ≅ 0 under independent distributions of species i and j because the product PiPj is the expectation of Pij.
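The DOij formula given above can be evaluated directly from two presence/absence vectors. The short sketch below does only that; the function name `do_index` and the toy vectors are illustrative, and the later step relating DOij to phylogenetic distance in RPD-DO is not shown.

```python
def do_index(presence_i, presence_j):
    """DOij = (Pij - Pi*Pj) / (Pi*Pj) for two species scored 0/1 over the same sites.

    Raises ZeroDivisionError if either species is absent from all sites.
    """
    n_sites = len(presence_i)
    p_i = sum(presence_i) / n_sites
    p_j = sum(presence_j) / n_sites
    p_ij = sum(1 for x, y in zip(presence_i, presence_j) if x and y) / n_sites
    return (p_ij - p_i * p_j) / (p_i * p_j)

# Three toy cases over four sites, each species occurring in half of the sites.
independent = do_index([1, 1, 0, 0], [1, 0, 1, 0])  # Pij = 0.25 = Pi*Pj
aggregated = do_index([1, 1, 0, 0], [1, 1, 0, 0])   # always together
segregated = do_index([1, 1, 0, 0], [0, 0, 1, 1])   # never together
```

In these toy cases the index is 0 for independent distributions, 1 for the two species that always co-occur and −1 for the two that never do, whatever the number of sites occupied in total.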
Recently, Kraft et al. (2007) simulated the species assembly of a single local community connected to a source community under various scenarios of species trait evolution and community assembly processes, to assess the statistical properties of the single sample MPD and MNTD metrics combined with null model 2p. Contrary to the present study, they found consistent Type I error rates under scenarios that do not generate community phylogenetic structure. However, as they acknowledged, their species (rather than individual) based model did not consider species abundance variation so that each species had the same probability of migrating into the local community. In addition, because a single local community was considered, dispersal spatial limitation and community size variation were not relevant. Hence, there is no contradiction with the present results which also predict consistent Type I error rates under the model simulated by Kraft et al. (2007). An important conclusion of the present study is that the statistical properties of the tests can change substantially when applied on multiple samples, and null models 2s and 2p then become highly unreliable to test for the absence of a spatial phylogenetic structure because they underestimate the variance and/or shift the frequency distribution of the metrics (Fig. 3). This is clearly revealed by the statistical behaviour of the single-site MPD and MNTD metrics (used in the software Phylocom, Webb et al. 2007). These metrics lead to correct tests when combined with null model 2p, which consists of resampling species at random from a reference species pool, whenever there is no species abundance phylogenetic structuring, whereas the all-site average mMPD and mMNTD metrics always perform very badly with this null model (Table S3).
As noted by Helmus et al. (2007a) when discussing the use of different null models with the metric PSV, which behaves in the same way as MPD (Appendix S1), a significant MPD or MNTD value for a site according to null model 2p indicates that the species abundance distribution in this site is phylogenetically non-random, but one cannot assess whether the pattern results from ecological processes sorting species in this site or from a phylogenetically non-random abundance distribution in the pool of species able to migrate into the site. More generally, Helmus et al. (2007a) argued that null model 2s makes it possible to detect a phylogenetic structure but without distinguishing an abundance phylogenetic structure from a spatial phylogenetic structure, whereas null model 3i detects only a spatial phylogenetic structure, and they had concerns about the interpretation of null model 3x (swap algorithm). However, the present simulation results show that their statement regarding null model 2s does not hold for multiple communities (though it is correct for a single community as explained above) and the one regarding null model 3i holds only when all local communities are equivalent (same size) and equivalently interconnected (random migration). Null model 3x resolves to a large extent the problem of null model 3i under variable community sizes, but remains inadequate when migration among local communities is not random, except when a single community is examined, explaining why the single-site MPD and MNTD metrics perform well with the swap algorithm (null models 2x and 3x) in all conditions (Table 5).
Some previous pioneering and important studies that have led to the development of community phylogenetic structure analyses (e.g. Webb et al. 2000; Cavender-Bares et al. 2004, 2006; Horner-Devine & Bohannan 2006; Kembel & Hubbell 2006; Helmus et al. 2007a) have used combinations of metrics (MNTD, MPD, PSV, RPD-CA or RPD-CO) and null models (usually null models 1s, 2p, 3i or 3x) that potentially lead to biased tests of local neutrality (Table 4). This does not necessarily invalidate their conclusions because the latter were mostly based on convergent lines of evidence and deviation from expected Type I error rates may have often been limited. In addition, tests are not biased per se (they are always exact with respect to a null hypothesis implicit in the randomization scheme) but can be biased when used to test the action of non-neutral processes acting at a local scale. Hence, the present results call for precautions in future studies.
A useful test should not only respect Type I error rate but also be powerful to avoid Type II errors (false negative). Different processes can generate a phylogenetic structure in a community and evaluating the power of the different tests under the many possible non-neutral models is beyond the scope of this paper. However, to verify whether metrics or null models that provided tests respecting Type I error rates did not suffer a particular loss of power, I ran some simulations of a non-neutral model involving habitat heterogeneity and species habitat specialization. The latter could be phylogenetically conserved, causing a spatial phylogenetic structure, or phylogenetically random, causing no spatial phylogenetic structure (O. J. Hardy, unpublished data). In general, all tests showing consistent Type I error rates under phylogenetically random habitat specialization were able to detect a phylogenetic structure under phylogenetically conserved habitat specialization, and the constrained null model 1a was as powerful as other null models. In some conditions, metrics based on abundance data (e.g. BST) were more powerful than metrics based on presence/absence data (e.g. RPD-DO). Thus, the higher reliability of RPD-DO regarding Type I error rate might sometimes be at the cost of a higher risk of Type II errors (i.e. reduced testing power). Similarly, although MPD and MNTD have better statistical properties than mMPD and mMNTD regarding Type I error rates, their statistical power to detect a phylogenetic structure was much reduced. Whether tradeoffs occur between Type I and Type II error rates deserves further study but it is not unlikely that phylogenetic tree randomization algorithms (null models 1) are more powerful than species by site matrix randomization algorithms (null models 2 and 3) where there are many species and few sites, and vice-versa when there are many sites and few species.
Analyzing the phylogenetic structure of natural communities is potentially very insightful to infer indirectly the kind of processes governing the composition of species assemblages, or as a general test of community (local) neutrality. Ulrich (2004) insisted that neutral stochastic processes (ecological drift, dispersal) can cause complex (‘non-random’) co-occurrence patterns of species, so that detecting a ‘non-random’ pattern using a null model does not demonstrate that niche-assembly processes were involved. In other words, a significant test means that the observed pattern is different from the set of possible patterns generated under a particular randomization algorithm. But the implicit null hypothesis of a null model, as embodied by the randomization algorithm, does not necessarily match the actual null hypothesis one wishes to test. The simulation results indicate that conformance of Type I error rates is more likely to hold for more constrained null models, that is, for randomization algorithms preserving most structural features of the data unrelated to the null hypothesis to test.
I propose the following guidelines to help choose an adequate test to detect a spatial phylogenetic structure (independently from an abundance phylogenetic structure) for a given data set. First, the phylogenetic distribution of global species abundances must be tested. To this end, the APD index can be tested by randomizing the phylogenetic tree (using null model 1s). Alternative approaches are also available, for example using the K statistic of Blomberg et al. (2003), trait analysis in the software Phylocom (Webb et al. 2007), or the software Mesquite (Maddison & Maddison 2005). A non-significant test would suggest that null model 1s can be used to test community phylogenetic structuring, a priori with any metric (though using the most robust ones is better). Otherwise, null model 1a should be applied to avoid confounding a spatial phylogenetic structure with an abundance phylogenetic structure, and it will perform better with ΠST, BST, RPD-RA, RPD-RO or RPD-DO. In principle, null model 1a could also be used in the absence of an abundance phylogenetic structure but the constraint imposed by this null model might reduce its testing power in some cases, especially when the number of species is limited. I do not recommend the null models randomizing the species by site matrix (null models 2 and 3) in a general case, though they might be applied to confirm results, in which case I recommend using the metric RPD-DO with null model 3x (swap algorithm). Other tests (combination of a metric and a null model) could be used provided that additional structural features of the data set are checked (spatial autocorrelation of local species abundances per site, homogeneity of sample sizes and local diversity per site) and taken into account when choosing the null model so that they are preserved as far as possible.
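The constrained tree randomization recommended above (null model 1a, permuting species with similar abundances among the tips of the phylogenetic tree) can be sketched as a permutation restricted to abundance classes. How abundance classes are delimited is an assumption here: species are simply ranked by global abundance and cut into consecutive classes of a fixed, hypothetical size (`class_size`).

```python
import random

def abundance_classes(abundance, class_size=3):
    """Rank species by global abundance and cut the ranking into consecutive classes.

    abundance: dict mapping species name to its global abundance.
    class_size is an illustrative choice, not a value taken from the study.
    """
    ranked = sorted(abundance, key=lambda sp: (abundance[sp], sp))
    return [ranked[k:k + class_size] for k in range(0, len(ranked), class_size)]

def permute_within_classes(abundance, class_size=3, seed=0):
    """Map each species to the tip of a species drawn from its own abundance class."""
    rng = random.Random(seed)
    mapping = {}
    for members in abundance_classes(abundance, class_size):
        shuffled = members[:]
        rng.shuffle(shuffled)
        mapping.update(zip(members, shuffled))
    return mapping

# Hypothetical global abundances for six species.
toy = {"A": 120, "B": 95, "C": 90, "D": 12, "E": 10, "F": 3}
tip_of = permute_within_classes(toy, class_size=3, seed=1)
```

Because abundant species only exchange tips with other abundant species, and rare with rare, the phylogenetic distribution of global abundances is approximately kept while the link between phylogeny and species spatial distributions is randomized. Small classes constrain the permutation strongly, which is consistent with the possible loss of power noted above when the number of species is limited.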
I thank the CTFS for providing the BCI data set and Jonathan Davies for providing the supertree of Angiosperm families. Many thanks are due to Jeannine Cavender-Bares, Jérôme Chave, Pierre Couteron, Nick Gotelli, Cam Webb and three anonymous reviewers for their comments on previous drafts. This project was born during the workshop ‘Phylogenies and community ecology’ from the ‘GDR Interactions Biotiques dans les Communautés: Théories et Modèles’ financed by INRA and CNRS, and its completion benefited from the project OSDA financed by the programme ‘Ecosystèmes Tropicaux’ of the ‘Ministère de l’Ecologie, du Développement et de l’Aménagement Durables’, France, and from the project BRIDGE financed by the programme ‘Biodiversité’ of the ‘Agence Nationale de la Recherche’, France. I also thank the Belgian Fund for Scientific Research (FNRS), where I am a Research Associate, for financial support.