Characterizing the phylogenetic structure of communities by an additive partitioning of phylogenetic diversity



    1. Behavioural and Evolutionary Ecology – CP 160/12, and *Laboratoire de Botanique systématique et de Phytosociologie – CP 169, Université Libre de Bruxelles, B-1050 Brussels, Belgium


Olivier Hardy (tel. +32 (0)2650 6585; fax +32 (0)2650 2445, e-mail


  • 1. Analysing the phylogenetic structure of natural communities may illuminate the processes governing the assembly and coexistence of species in ecological communities.
  • 2. Unifying previous works, we present a statistical framework to quantify the phylogenetic structure of communities in terms of average divergence time between pairs of individuals or species, sampled from different sites. This framework allows an additive partitioning of the phylogenetic signal into alpha (within-site) and beta (among-site) components, and is closely linked to Simpson diversity. It unifies the treatment of intraspecific (genetic) and interspecific diversity, leading to the definition of differentiation coefficients among community samples (e.g. IST, PST) analogous to classical population genetics coefficients expressing differentiation among populations (e.g. FST, NST).
  • 3. Two coefficients which express community differentiation among sites from species identity (IST) or species phylogeny (PST) require abundance data (number of individuals per species per site), and estimators that are unbiased with respect to sample size are given. Another coefficient (ΠST) expresses the gain of the mean phylogenetic distance between species found in different sites compared with species found within sites, and requires only incidence data (presence/absence of each species in each site).
  • 4. We present tests based on phylogenetic tree randomizations to detect community phylogenetic clustering (PST > IST or ΠST > 0) or phylogenetic overdispersion (PST < IST or ΠST < 0). In addition, we propose a novel approach to detect phylogenetic clustering or overdispersion in different clades or at different evolutionary time depths using partial randomizations.
  • 5. IST, PST or ΠST can also be used as distances between community samples and regressed on ecological or geographical distances, allowing us to investigate the factors responsible for the phylogenetic signal and the critical scales at which it appears.
  • 6. We illustrate the approach on forest tree communities in Equatorial Guinea, where a phylogenetic clustering signal was probably due to phylogenetically conserved adaptations to the elevation gradient and was mostly contributed to by ancient clade subdivisions.
  • 7. The approach presented should find applications for comparing quantitatively phylogenetic patterns of different communities, of similar communities in different regions or continents, or of populations (within species) vs. communities (among species).


Community assembly of the individuals of different species depends on both historical (biogeographical) and ecological phenomena (Webb et al. 2002). To co-occur, species must have both overlapping geographical distributions and overlapping habitat affinities (although a species could occur in a suboptimal habitat through dispersal from other nearby habitats). In addition, when species ecological niches overlap excessively, competitive exclusion could limit coexistence. Ecological niches depend on the similarities between species traits. Ideally, an assessment of species geographical ranges and a detailed characterization of the species traits relevant for habitat preferences and biotic interactions would be necessary to understand and predict community assemblages, but this is an insurmountable task for species-rich communities. However, species phylogeny is highly relevant to understanding community assembly because it provides insight into the divergence time among species, and hence is a proxy for interspecific biogeographical similarity as well as for ecological similarity if niche conservatism occurs during evolution. Investigating the phylogenetic structure of communities can thus provide useful insight to understand the historical and ecological factors shaping species assemblages (Webb et al. 2002; Cavender-Bares et al. 2004).

The correlation between any two species ranges is expected to decrease with their divergence time because of the accumulation of independent dispersal and local extinction events over time. Likewise, the similarity in habitat preferences of species pairs should decrease with increasing phylogenetic distance because of the accumulation of independent trait changes over time, a pattern known as phylogenetic trait conservatism (Lord et al. 1995; Webb et al. 2002). Therefore, a pattern of phylogenetic clustering (i.e. species co-occurring within communities are more related on average than species from different communities) is expected whenever the communities compared are geographically distant (i.e. biogeographical origin; Fig. 1a), or whenever the communities occur in contrasted habitat (ecological origin; Fig. 1c). However, the correlation between divergence time and geographical or ecological differentiation does not necessarily hold, in particular for closely related species, and a pattern of phylogenetic overdispersion (i.e. species co-occurring within communities are less related on average than species from different communities) may result. Such a pattern could result from allopatric speciation of widely distributed ancestral species caused by a biogeographical barrier (Fig. 1b). It could also occur among communities situated in distinct habitats because sister species have specialized into these habitats, as a result of sympatric speciation driven by ecological differentiation, or as a result of secondary habitat differentiation driven by competitive exclusion (Fig. 1d). Finally, phylogenetic overdispersion can occur among communities situated in similar habitats as a direct consequence of competitive exclusion between sister species with widely overlapping niches (Fig. 1e).

As illustrated by Cavender-Bares et al. (2004), community phylogenetic patterns driven by ecological factors ultimately depend on the interplay between the evolution of species traits, which can be phylogenetically conservative or convergent, the environmental filtering of species traits, which favours phenotypic clustering, and the competitive interaction between species, which favours phenotypic overdispersion. A random pattern (i.e. absence of phylogenetic clustering and overdispersion) may mean that the impact of these three processes is not significant, as would be the case for a neutral community (Hubbell 2001). In fact, overdispersion may occur in some lineages, or just among closely related species, whereas clustering affects other lineages or more distantly related clades, so that the overall pattern may be difficult to distinguish from a random one.

It must be emphasized that the scales of observation (geographical range, range of habitats covered, taxonomic delimitation of the communities) are very important to consider when interpreting community phylogenetic patterns. For example, studying oak species spread over a range of forest habitats in north central Florida, Cavender-Bares et al. (2004) detected phylogenetic overdispersion, but the pattern reverted to phylogenetic clustering when all plant species were considered in the same habitats, and this clustering became amplified when a larger pool of habitats (including wetlands and coastal communities from Florida) was included (Cavender-Bares et al. 2006). The same type of dependency towards taxonomic and species pool scaling was reported by Swenson et al. (2006) when analysing the phylogenetic structure of neotropical forest tree communities. Thus, while community phylogenetic overdispersion may occur at a shallow phylogenetic depth because of competitive exclusion among species from a radiating clade, phylogenetic clustering could be present at deeper phylogenetic depth because of niche conservatism. Hence, methods to characterize the phylogenetic structure of a community at different time depths are very valuable (e.g. Webb 2000).

Figure 1.

Types and origins of community phylogenetic patterns. Biogeographical origin: (a) phylogenetic clustering due to local speciation of allopatric clades; (b) phylogenetic overdispersion due to allopatric speciation of two ancestral sympatric species caused by the same biogeographical barrier. Ecological origin: (c) phylogenetic clustering due to habitat filtering of phylogenetically conserved traits; (d) phylogenetic overdispersion due to habitat filtering of phylogenetically convergent traits (when sister species arise from ecological speciation); (e) phylogenetic overdispersion due to competitive exclusion of related species showing phylogenetically conserved traits. The communities compared occur in contrasted habitats in cases (c) and (d), and in similar habitats in case (e).

Using species inventory data and the topology of a phylogenetic tree, Webb (2000) developed a method to assess whether species co-occurring locally are more related than species from a regional species pool. Similar studies have been published recently (e.g. Webb et al. 2002; Cavender-Bares et al. 2004, 2006; Horner-Devine & Bohannan 2006; Kembel & Hubbell 2006; Lovette & Hochachka 2006; Silvertown et al. 2006; Swenson et al. 2006), and given the recent availability of dated super-trees based on molecular phylogenetic data (e.g. Davies et al. 2004, for angiosperm families) and of new tools and software performing phylogenetic community structure analyses (e.g. Webb & Donoghue 2004; Webb et al. 2004), many new studies are likely to be forthcoming.

As for any new research area, the development of appropriate statistical tools is fundamental. We present a statistical framework that quantifies and partitions additively into alpha and beta components (i) the phylogenetic diversity of communities expressed by the average divergence time between pairs of individuals, and (ii) the phylogenetic distinctness of species assemblages expressed by the average divergence time between pairs of species. This framework is based on several previous treatments of biodiversity organization (e.g. Rao 1986; Lande 1996; Ganeshaiah et al. 1997; Clarke & Warwick 1998; Shimatani 2001; Veech et al. 2002; Webb et al. 2002; Couteron & Pélissier 2004; Pavoine et al. 2004, 2005; Pavoine & Dolédec 2005; Chave et al., in press). The framework generalizes some classical diversity coefficients (such as Simpson's diversity) in a phylogenetic perspective, and defines differentiation coefficients analogous to classical population genetics indices (e.g. FST, NST) that quantify the strength and direction of the phylogenetic signal. These differentiation coefficients can also be assessed for pairs of community samples so that the phylogenetic signal can be interpreted according to ecological or geographical distances between communities. Testing the phylogenetic signal can be achieved by randomizing species at the tips of the phylogenetic tree. In addition, we propose new partial randomization tests to detect a phylogenetic signal within different clades or within clades younger than some time threshold. Such randomizations can potentially discern phylogenetic clustering and overdispersion if both occur at different levels or depths of a phylogenetic tree.

To illustrate this approach, we analyse floristic inventories performed in rain forests of Equatorial Guinea: (i) we test for phylogenetic clustering/overdispersion within sites, (ii) we assess in which clades a phylogenetic signal occurs, (iii) we compare the testing power of statistics based on species abundance vs. species presence/absence, (iv) we identify which ecological factors best predict the phylogenetic signal using pairwise differentiation coefficients, and (v) finally we assess the robustness of the approach with respect to the precision of the phylogenetic tree by comparing our results with those obtained using a rank-based species classification.


Partitioning species diversity within and among sites is fundamentally analogous to partitioning allele diversity within and among populations, and the basic processes determining the dynamics of populations and communities are also very similar in nature. Consequently, the way genetic diversity is partitioned given a genealogy of alleles (Pons & Petit 1996) can be used to describe the phylogenetic structure of communities (Pavoine & Dolédec 2005; Chave et al., in press). In the following, we present descriptive statistics of the phylogenetic structure of communities requiring species abundance data, or species incidence (presence/absence) data, and we show how essentially unbiased estimates can be obtained. We then present randomization procedures to test whether phylogenetic clustering or overdispersion occur. Our definition of ‘community’ is any assemblage of species spatially localized and possibly restricted to some clade, functional group and/or phenotypic features (e.g. all angiosperm trees with a diameter above 10 cm in a 1-ha plot).

partitioning phylogenetic diversity from species abundances

Different measures quantify species diversity of a community, some of the best known being the species richness, the Shannon–Wiener index, the Simpson diversity index and Fisher's alpha (Magurran 2004). These statistics differ in the way species frequency is accounted for (e.g. Couteron & Pélissier 2004), but also in their sensitivity to sample size (Gimaret-Carpentier et al. 1998). For example, Simpson diversity gives much weight to common species, nearly ignoring rare species, and is unbiased with respect to sample size. In contrast, species richness weights common and rare species equally, but is highly dependent on sample size so that unbiased estimates are difficult to obtain in species-rich communities (Gotelli & Colwell 2001).

These statistics do not consider the phylogenetic relationships between species, but the Simpson diversity index, D, can be extended to incorporate phylogenetic information. D is

$$D = 1 - \sum_i f_i^2 \qquad \text{(eqn 1)}$$

where $f_i$ is the frequency of species i in a community. As $f_i^2$ is the probability that both individuals of a random pair from the community belong to species i, $\sum_i f_i^2$ is the probability that the pair belongs to the same species. Hence, D is the probability that two individuals from the community belong to different species. It can also be expressed as

$$D = \sum_i \sum_j \delta_{ij} f_i f_j \qquad \text{(eqn 2)}$$

if δij is an indicator variable equal to 0 when i = j (same species) and equal to 1 when i ≠ j (different species).

Equation 2 can be extended to incorporate species phylogeny by letting δij be a continuous variable expressing the phylogenetic distance between i and j rather than being a binary (0, 1) variable. For example, if δij is the divergence time between species i and j (i.e. the age of the most recent common ancestor of i and j), D is the mean divergence time between two randomly sampled individuals in the community, and can be interpreted as a measure of phylogenetic diversity. In the following we will distinguish DI, the original Simpson diversity index which accounts only for species identity (i.e. not their phylogeny), and DP, a diversity measure accounting for species phylogeny and equivalent to Rao's ‘quadratic entropy’ (Rao 1982) for the special case where δij is a phylogenetic distance (see Table 1 for equivalence with other publications). Note that δij could represent other types of distances between species (e.g. taxonomic, morphological, functional). However, a diversity coefficient should ideally always increase when a species is added, a property of D ensured when δij is ultrametric (Pavoine et al. 2005), which is the case when δij is the divergence time or, more generally, when δij is obtained from the branch lengths of a rooted tree in which all the end nodes are equidistant from the root (ultrametric tree).
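As a minimal illustration of equation 2, the sketch below computes D once with the binary indicator distance (giving DI) and once with divergence times (giving DP). The three-species community, its frequencies and its divergence times are hypothetical values chosen only for the example, and the function names are ours.

```python
from itertools import product

def quadratic_diversity(freqs, dist):
    """Mean distance between two individuals drawn with replacement (eqn 2).

    freqs: dict mapping species -> relative frequency (summing to 1).
    dist:  function (i, j) -> distance, with dist(i, i) == 0.
    """
    return sum(freqs[i] * freqs[j] * dist(i, j)
               for i, j in product(freqs, repeat=2))

def identity_distance(i, j):
    # Binary indicator: 0 for the same species, 1 otherwise.
    return 0.0 if i == j else 1.0

# Hypothetical community and divergence times (Myr).
freqs = {"A": 0.5, "B": 0.3, "C": 0.2}
ages = {frozenset("AB"): 10.0, frozenset("AC"): 100.0, frozenset("BC"): 100.0}

def divergence_time(i, j):
    return 0.0 if i == j else ages[frozenset((i, j))]

d_identity = quadratic_diversity(freqs, identity_distance)  # Simpson diversity, DI
d_phylo = quadratic_diversity(freqs, divergence_time)       # mean divergence time, DP
```

With the indicator distance the result coincides with 1 minus the sum of squared frequencies, as in equation 1.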

Table 1.  Main coefficients presented in this paper and correspondences with other publications
Coefficient | Definition | Notation in other publications
DI | Probability that two individuals belong to different species (Simpson diversity index). | (ref. 2); D (ref. 3)
DP | Mean phylogenetic distance (e.g. divergence time) between individuals (an index of phylogenetic diversity). | D (ref. 2); Δ (ref. 3, 4); AI (ref. 5); HΔ (ref. 6)
IST | Proportion of the overall species diversity (DI) expressed among sites. Analogous to FST and GST (ref. 1‡) in population genetics. |
PST | Proportion of the overall phylogenetic diversity (DP) expressed among sites. Analogous to NST (ref. 1‡) in population genetics. | FST (ref. 2)
ΔP | Mean phylogenetic distance (e.g. divergence time) between distinct species (an index of phylogenetic species distinctness). | Δ+ (ref. 4); 1/2 MPD (ref. 7)
ΠST | Proportion of the overall phylogenetic species distinctness (ΔP) expressed among sites. |

When divergence times between species are known, DI can be partitioned additively according to classes of divergence time, c (Shimatani 2001; Ricotta 2005). For instance, $D^I = \sum_c D^I_c$, where $D^I_c$ is the probability that two individuals belong to distinct species whose divergence time is included in class c (classes must be non-overlapping and cover the full range of divergence times between species). In the limit of very narrow classes, $D^P = \sum_c \delta_c D^I_c$, where δc is the average divergence time corresponding to class c.
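The partition by divergence-time class can be sketched as follows. The frequencies, divergence times and 20-Myr class width are hypothetical, and the half-open class boundaries are our assumption, since the text only requires classes to be non-overlapping and exhaustive.

```python
def partition_by_age_class(freqs, divergence_time, width):
    """Return {class index c: D^I_c}; class c covers [c*width, (c+1)*width)."""
    parts = {}
    for i in freqs:
        for j in freqs:
            if i != j:  # only pairs of distinct species contribute to Simpson diversity
                c = int(divergence_time(i, j) // width)
                parts[c] = parts.get(c, 0.0) + freqs[i] * freqs[j]
    return parts

# Hypothetical community and divergence times (Myr).
freqs = {"A": 0.5, "B": 0.3, "C": 0.2}
ages = {frozenset("AB"): 10.0, frozenset("AC"): 100.0, frozenset("BC"): 100.0}

def divergence_time(i, j):
    return ages[frozenset((i, j))]

parts = partition_by_age_class(freqs, divergence_time, width=20.0)
simpson = 1.0 - sum(f * f for f in freqs.values())
```

The class contributions sum to the Simpson diversity, which is the additivity property stated in the text.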

DI and DP can be used to define differentiation coefficients between local communities (sites) as follows. Diversity can be perceived at different levels and it is customary to decompose the total amount (gamma diversity) into a local component (alpha diversity) and an intersite or interhabitat component (beta diversity; Whittaker 1972). D can be partitioned additively, as in a nested anova (e.g. Lande 1996; Couteron & Pélissier 2004; Pavoine & Dolédec 2005). Let $f_i$ and $f_{ik}$ be the frequency of species i overall in the region and within site k, respectively. The total (gamma) diversity is defined as

$$D_T = \sum_i \sum_j \delta_{ij} f_i f_j$$

The diversity within site k is

$$D_k = \sum_i \sum_j \delta_{ij} f_{ik} f_{jk}$$

The average within-site (alpha) diversity, $D_S$, is the expectation of $D_k$ over all sites. The beta diversity is the difference $D_T - D_S$. This among-site component can be rewritten as a fraction of the total diversity, expressing differentiation among sites using information from species identity or species phylogeny:

$$I_{ST} = \frac{D^I_T - D^I_S}{D^I_T}, \qquad P_{ST} = \frac{D^P_T - D^P_S}{D^P_T}$$

where the subscripts S and T refer to the fact that diversity within site is compared with total diversity. IST and PST thus represent fractions of the overall species or phylogenetic diversity expressed among sites. They are equivalent to, respectively, FST (or GST, the proportion of genetic diversity expressed among populations considering allele identity) and NST (the proportion of genetic diversity expressed among populations considering the phylogeny of alleles), which are defined at the within-species level (Pons & Petit 1996). Note that the same formalism of community diversity decomposition was proposed by Chave et al. (in press), who used the symbol ‘FST’ to denote PST in reference to population genetics literature (Table 1). However, to avoid confusion, here we introduce new symbols. Moreover, we demonstrate below how to compare, estimate and test IST and PST to make inferences on the phylogenetic structure of communities.

Among-site differentiation occurs when PST > 0 or IST > 0. An interesting property of these differentiation coefficients is that PST = IST when there is no community phylogenetic structuring, whereas PST > IST (PST < IST) indicates phylogenetic clustering (overdispersion), i.e. species found within a same site are more (less) related on average than species taken from different sites. The reason is that IST expresses only the among-site differences in species frequencies whereas PST also expresses the gain of phylogenetic divergence among species. The strength and direction of the phylogenetic signal can thus be quantified by the difference PST − IST.
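The comparison of PST with IST can be illustrated at the parameter level with a toy example. The sketch assumes equal weights for all sites when pooling frequencies, and uses a hypothetical four-species tree in which (A, B) and (C, D) are sister pairs; none of these values come from the study.

```python
def mean_distance(fa, fb, dist):
    """Mean distance between an individual drawn from fa and one drawn from fb."""
    return sum(fa[i] * fb[j] * dist(i, j) for i in fa for j in fb)

def differentiation(sites, dist):
    """(D_T - D_S) / D_T for a list of per-site frequency dicts (equal site weights assumed)."""
    n = len(sites)
    d_s = sum(mean_distance(f, f, dist) for f in sites) / n
    species = set().union(*sites)
    pooled = {s: sum(f.get(s, 0.0) for f in sites) / n for s in species}
    d_t = mean_distance(pooled, pooled, dist)
    return (d_t - d_s) / d_t

def identity_distance(i, j):
    return 0.0 if i == j else 1.0

def tree_distance(i, j):
    # Hypothetical divergence times (Myr): sister pairs at 10, root at 100.
    if i == j:
        return 0.0
    return 10.0 if {i, j} in ({"A", "B"}, {"C", "D"}) else 100.0

clustered = [{"A": 0.5, "B": 0.5}, {"C": 0.5, "D": 0.5}]      # relatives share a site
overdispersed = [{"A": 0.5, "C": 0.5}, {"B": 0.5, "D": 0.5}]  # relatives are separated

i_st = differentiation(clustered, identity_distance)
p_st_clustered = differentiation(clustered, tree_distance)
p_st_overdispersed = differentiation(overdispersed, tree_distance)
```

Both layouts have the same IST, because species turnover is identical, but PST exceeds IST when sister species share a site and falls below it when they are separated.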

To obtain unbiased estimators (Pons & Petit 1996), we assume that N randomly chosen sites have been sampled in a large community, and that $n_k$ individuals have been sampled in site k ($n_k$ > 3). Let $\hat{f}_{ik}$ be the observed frequency of species i in site k (an estimate of the parameter $f_{ik}$, which is the actual value). The phyletic distance between species, δij, is assumed to be known without error. Unbiased estimates of diversity coefficients can be obtained from the principles of an anova, considering that the factor site is a random effect (Pons & Petit 1996):
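A sketch of estimators of this kind is given below. The exact form is an assumption on our part, not a transcription of the equation: within-site diversity is corrected by n_k/(n_k − 1) for sampling without replacement, and total diversity is taken as the mean distance between individuals drawn from two different sites, which is the usual consequence of treating site as a random effect. The counts are hypothetical.

```python
def mean_distance(fa, fb, dist):
    return sum(fa[i] * fb[j] * dist(i, j) for i in fa for j in fb)

def estimate_diversities(counts, dist):
    """Return (D_S, D_T) estimates from a list of per-site count dicts.

    Assumed forms (see lead-in): n_k/(n_k - 1) bias correction within sites,
    and D_T averaged over ordered pairs of distinct sites.
    """
    freqs, within = [], []
    for site in counts:
        n_k = sum(site.values())
        f_k = {sp: c / n_k for sp, c in site.items()}
        freqs.append(f_k)
        within.append(n_k / (n_k - 1) * mean_distance(f_k, f_k, dist))
    n_sites = len(counts)
    d_s = sum(within) / n_sites
    d_t = sum(mean_distance(freqs[k], freqs[l], dist)
              for k in range(n_sites) for l in range(n_sites) if k != l)
    d_t /= n_sites * (n_sites - 1)
    return d_s, d_t

def identity_distance(i, j):
    return 0.0 if i == j else 1.0

# Hypothetical counts of individuals per species in two sites.
counts = [{"A": 3, "B": 1}, {"A": 1, "B": 3}]
d_s, d_t = estimate_diversities(counts, identity_distance)
i_st_hat = (d_t - d_s) / d_t
```

With only two sites in the list, the same function yields a pairwise differentiation coefficient.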


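Estimators of differentiation coefficients are obtained as:

$$\hat{I}_{ST} = \frac{\hat{D}^I_T - \hat{D}^I_S}{\hat{D}^I_T}, \qquad \hat{P}_{ST} = \frac{\hat{D}^P_T - \hat{D}^P_S}{\hat{D}^P_T}$$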

PST and IST assess the proportion of the total diversity explained by species turnover among sites. These coefficients can also be used to quantify pairwise differentiation between sites by applying the above formulae in the particular case where N = 2. Such between-site differentiation coefficients can then be regressed on explanatory variables that express the distance (geographical, ecological) between sites to infer which factors might be responsible for the phylogenetic signal.

partitioning phylogenetic distinctness from species incidence

The diversity measures presented above require abundance data (counts of individuals) and rare species are underemphasized. As an alternative, we follow Clarke & Warwick (1998) to define a measure of phylogenetic ‘distinctness’ based on species incidence (presence/absence data) that should remain robust with respect to sample size. For a given community

$$\Delta^P = \frac{\sum_i \sum_{j \neq i} \delta_{ij} p_i p_j}{\sum_i \sum_{j \neq i} p_i p_j}$$

where pi = 1 if species i is present (fi > 0), otherwise pi = 0 (fi = 0). The denominator is twice the number of pairwise comparisons between existing species. Note that, contrary to D, the double sums exclude comparisons of a species with itself.

It is easiest to understand ΔP by analogy with DP: while DP is the mean phyletic distance between distinct individuals, ΔP is the mean phyletic distance between distinct species. It can thus be interpreted as a measure of ‘phylogenetic distinctness’ between species (see Table 1 for equivalence with other publications). Contrary to DP, ΔP is not a measure of community phylogenetic diversity because it does not necessarily increase with the addition of new species. But like DP, ΔP can be evaluated at different levels: $\Delta^P_S$, the average within site, and $\Delta^P_T$, over all sites. Hence, a coefficient analogous to PST can be defined as:

$$\Pi_{ST} = \frac{\Delta^P_T - \Delta^P_S}{\Delta^P_T}$$

ΠST expresses the gain of phyletic divergence between species occurring in different sites compared with species occurring in the same site. It is not sensitive to the gain in species richness among sites vs. within a site because ΔP expresses phylogenetic distinctness independently of species richness. Its expectation is ΠST= 0 when there is no community phylogenetic structuring and ΠST > 0 (ΠST < 0) under phylogenetic clustering (overdispersion).

To derive estimators that should be unbiased when phylogenetic distances and differences in species abundances are uncorrelated, let $\hat{p}_{ik} = 1$ if $\hat{f}_{ik} > 0$, otherwise $\hat{p}_{ik} = 0$:
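An incidence-based computation of ΠST can be sketched as follows. We assume, as a plausible reading rather than a transcription of the estimators, that the within-site term is the average over sites of the mean distance between distinct species, and that the total term is the mean distance between distinct species taken from two different sites. The tree and species lists are hypothetical, and every site must contain at least two species.

```python
def pair_sums(site_a, site_b, dist):
    """Sum of distances and number of ordered pairs (i, j), i != j, i in a, j in b."""
    pairs = [(i, j) for i in site_a for j in site_b if i != j]
    return sum(dist(i, j) for i, j in pairs), len(pairs)

def pi_st(sites, dist):
    """Pi_ST from presence/absence data; sites is a list of sets of species."""
    within = []
    for s in sites:
        total, n = pair_sums(s, s, dist)
        within.append(total / n)
    delta_s = sum(within) / len(sites)
    among_total, among_n = 0.0, 0
    for k, a in enumerate(sites):
        for l, b in enumerate(sites):
            if k != l:
                total, n = pair_sums(a, b, dist)
                among_total += total
                among_n += n
    delta_t = among_total / among_n
    return (delta_t - delta_s) / delta_t

def tree_distance(i, j):
    # Hypothetical divergence times (Myr): sister pairs at 10, root at 100.
    if i == j:
        return 0.0
    return 10.0 if {i, j} in ({"A", "B"}, {"C", "D"}) else 100.0

pi_clustered = pi_st([{"A", "B"}, {"C", "D"}], tree_distance)
pi_overdispersed = pi_st([{"A", "C"}, {"B", "D"}], tree_distance)
```

The sign of the result distinguishes the two toy layouts: positive when sister species co-occur, negative when they are split between sites.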


testing the phylogenetic structure of communities

To detect phylogenetic clustering or overdispersion within sites, PST or ΠST can be computed after randomizing the species among the tips of the phylogenetic tree used to define the δij phyletic distances (Fig. 2a). Because such tree randomization breaks down the actual phyletic relationships among species (while keeping the tree architecture intact), the resulting pseudo PST or ΠST values (hereafter denoted pPST or pΠST) are representative of a community without phylogenetic structuring, conforming to the null hypothesis to be tested. Hence, the actual PST (ΠST) can be compared with the distribution of pPST (pΠST) obtained for many independent randomizations, providing an estimate of the P value for the null hypothesis that there is no phylogenetic signal. This procedure tests if PST = IST or if ΠST = 0 because pPST and pΠST have statistical expectations equal to IST and 0, respectively (see Hardy et al. 2003 for an analogy in population genetics).
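The complete tree randomization can be sketched by permuting species labels before looking up distances, which leaves the tree architecture untouched. The sketch uses parameter-level PST with equal site weights, a hypothetical four-species tree, and a one-sided P value for clustering computed as (1 + number of pseudo-values at least as large as the observed one) / (number of permutations + 1); these choices are ours.

```python
import random

def mean_distance(fa, fb, dist):
    return sum(fa[i] * fb[j] * dist(i, j) for i in fa for j in fb)

def p_st(sites, dist):
    n = len(sites)
    d_s = sum(mean_distance(f, f, dist) for f in sites) / n
    species = set().union(*sites)
    pooled = {s: sum(f.get(s, 0.0) for f in sites) / n for s in species}
    d_t = mean_distance(pooled, pooled, dist)
    return (d_t - d_s) / d_t

def tip_randomization_test(sites, species, dist, n_perm=999, seed=0):
    """Permute species among tree tips; return (observed, pseudo-values, P for clustering)."""
    rng = random.Random(seed)
    observed = p_st(sites, dist)
    null = []
    for _ in range(n_perm):
        shuffled = species[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(species, shuffled))
        # Each species takes the tree position of another species.
        null.append(p_st(sites, lambda i, j: dist(relabel[i], relabel[j])))
    n_extreme = sum(v >= observed - 1e-12 for v in null)
    return observed, null, (1 + n_extreme) / (n_perm + 1)

def tree_distance(i, j):
    # Hypothetical divergence times (Myr): sister pairs at 10, root at 100.
    if i == j:
        return 0.0
    return 10.0 if {i, j} in ({"A", "B"}, {"C", "D"}) else 100.0

sites = [{"A": 0.5, "B": 0.5}, {"C": 0.5, "D": 0.5}]
observed, null, p_value = tip_randomization_test(sites, ["A", "B", "C", "D"], tree_distance)
null_mean = sum(null) / len(null)
```

In this toy case the mean of the pseudo-values lies close to IST = 1/3, in line with the stated expectation of pPST. With only four tips the test has very little power, so the example shows the mechanics rather than a realistic analysis.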

Figure 2.

Modes of phylogenetic tree randomizations for testing phylogenetic patterns at different evolutionary levels. The structure of the tree is left unchanged but species are randomly permuted within the grey areas so that the apparent divergence times between species are modified. (a) Complete tree randomization. (b) Partial tree randomization for the clade shown by the arrow. (c) Partial tree randomization for clades younger than the age threshold shown by the stippled line.

The same principle could be applied to test the phylogenetic diversity/distinctness measures themselves (DP, ΔP) rather than their ratios (PST, ΠST), but care must be taken in the interpretation of the results. Indeed, a value of DP (or ΔP) that departs significantly from its randomized counterpart would mean that the difference in abundance (or frequency) between species is correlated with their phyletic distance, and thus that the most abundant species belong to particular clades of the phylogenetic tree (clustering/overdispersion is relative to the particular tree considered). It does not demonstrate that the species found within sites are more or less related than species found among sites. The latter situation can only be tested using the ratios (PST, ΠST).

partial randomizations of the phylogenetic tree

When a non-random community phylogenetic structure is observed, one may wonder which parts of the phylogenetic tree contribute to the pattern. Partial tree randomizations, where species positions are randomized only within a defined clade, allow one to test phylogenetic clustering or overdispersion independently within each clade (Fig. 2b).

A community phylogenetic structure may also show overdispersion within recent clades and clustering within old clades. To assess phylogenetic patterns according to evolutionary depth, partial tree randomizations where species positions are randomized only within clades younger than an age threshold (Fig. 2c) can provide interesting clues, because the resulting pPST (or pΠST) retains part of the phylogenetic information and its expectation lies between IST and PST (or between 0 and ΠST for pΠST). The difference pPST − IST (or pΠST) then expresses the extent of community phylogenetic structuring contributed by the separation of clades that diverged before the age threshold.

If phylogenetic overdispersion occurs in some clades while phylogenetic clustering occurs in other clades, these patterns may cancel out. Partial randomizations should thus permit the identification of such situations using PST or ΠST.
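A partial randomization by age threshold can be sketched from the distance matrix alone. The sketch assumes that δij is a divergence time on an ultrametric tree, so that "diverged more recently than the threshold" is an equivalence relation and each species can be assigned to a clade by comparing it with one representative. The tree and thresholds are hypothetical.

```python
import random

def young_clades(species, divergence_time, threshold):
    """Group species into clades younger than threshold (ultrametric distances assumed)."""
    clades = []
    for s in species:
        for clade in clades:
            # On an ultrametric tree, one representative per clade is enough.
            if divergence_time(s, clade[0]) < threshold:
                clade.append(s)
                break
        else:
            clades.append([s])
    return clades

def partial_relabel(species, divergence_time, threshold, rng):
    """Permute species only within clades younger than threshold."""
    relabel = {}
    for clade in young_clades(species, divergence_time, threshold):
        shuffled = clade[:]
        rng.shuffle(shuffled)
        relabel.update(zip(clade, shuffled))
    return relabel

def tree_distance(i, j):
    # Hypothetical divergence times (Myr): sister pairs at 10, root at 100.
    if i == j:
        return 0.0
    return 10.0 if {i, j} in ({"A", "B"}, {"C", "D"}) else 100.0

species = ["A", "B", "C", "D"]
rng = random.Random(1)
relabel_50 = partial_relabel(species, tree_distance, 50.0, rng)   # shuffles within pairs only
relabel_5 = partial_relabel(species, tree_distance, 5.0, rng)     # nothing to shuffle
```

The resulting mapping can be passed to the distance lookup in the same way as in a complete randomization. A threshold older than the root reproduces the complete randomization, and a clade-specific test corresponds to shuffling a single chosen group.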

Application to a tropical forest tree community

The approach described above has been applied to detailed floristic inventories conducted in a forest from Atlantic central Africa by Senterre (2005). The Monte Alén National Park in Equatorial Guinea, an area that has not suffered substantial human disturbance, is situated on the transition between the ‘Sacoglottis littoral evergreen rain forests’ and the ‘Biafrean evergreen rain forests’ sensu stricto (Letouzey 1968; White 1983), and ranges from 300 m to 1300 m in altitude (Senterre 2001; Senterre & Lejoly 2001; Senterre et al. 2004). A submontane belt occurs above 700–900 m. Rainfall is quite variable depending on exposure and elevation, and ranges from 1500 mm year−1 to 3800 mm year−1 with two short dry seasons (Fa 1991). Temperature is about 25 °C and is seasonally constant. Soils in this region are derived from granite and gneiss and are mostly ferralitic and acid.


The flora of 28 sample plots was inventoried in a 25 × 25 km area within the Monte Alén National Park. These plots were placed in order to represent the main mature forest types of the region, and are in homogeneous stands of about 1 ha. They are distributed along five mountain slopes situated at different distances from the sea coast. Species inventories were made at several forest layers but, for simplicity in the present study, only the upper layer comprising canopy and emergent trees (i.e. trees generally over 20–25 m in height exposed to direct solar radiation) is considered. In each plot, 100 individuals were randomly sampled along a transect. Morphospecies were recognized in the field and their exact identification was based on herbarium material (2874 specimens). Nomenclature follows Lebrun & Stork (1991, 1992, 1995, 1997, 2003). Among the 2800 trees inventoried, 40 (1.4%) were unidentified and 111 (4%) were identified to morphospecies known to genus.

ecological data

The following quantitative variables were measured at each plot: altitude, stand dynamic (intensity of perturbation assessed from the frequency of windfall gaps; three ordinal classes), hygrometry (i.e. air humidity; four ordinal classes), soil hydromorphy (i.e. the degree of water saturation in soil; five ordinal classes), soil depth (five ordinal classes), presence of rocks in the soil (binary) and presence of an impenetrable gravel layer in the soil (binary). We used Bryophyte cover as an indicator of hygrometry (Frahm & Gradstein 1991; Wolff 1993; Kessler 2001). The absolute difference of values between plots for each variable was used as interplot ecological distance.

phylogenetic tree

Phylogenetic distances between species from different families are estimated from the dated Angiosperm families super-tree of Davies et al. (2004). This super-tree is based on rbcL gene sequencing and is calibrated using fossil data related to the split between Fagales and Cucurbitales 84 million years ago (Wikström et al. 2001). The tree for the families found is shown in supplementary Fig. S1. Below the family level, the different genera were treated as polytomies with a divergence age arbitrarily set at two-thirds the age of the family, where the family age is the estimated age of the node between sister families of the whole super-tree (379 families), and species from the same genus were treated in the same way considering that they diverged at one-third the age of their family.
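The rules used below the family level can be written as a small lookup. The taxon names, family ages and between-family divergence time in the sketch are hypothetical placeholders, not values from the super-tree.

```python
def divergence_time(sp_a, sp_b, taxonomy, family_age, between_families):
    """Divergence time (Myr) between two species under the polytomy rules in the text.

    taxonomy:         dict species -> (genus, family)
    family_age:       dict family -> age of the node joining it to its sister family
    between_families: dict frozenset({family1, family2}) -> divergence time from the super-tree
    """
    if sp_a == sp_b:
        return 0.0
    genus_a, family_a = taxonomy[sp_a]
    genus_b, family_b = taxonomy[sp_b]
    if family_a != family_b:
        return between_families[frozenset((family_a, family_b))]
    if genus_a != genus_b:
        return family_age[family_a] * 2.0 / 3.0  # genera: two-thirds of the family age
    return family_age[family_a] / 3.0            # congeneric species: one-third

# Hypothetical placeholders.
taxonomy = {
    "sp1": ("GenusA", "FamilyX"),
    "sp2": ("GenusA", "FamilyX"),
    "sp3": ("GenusB", "FamilyX"),
    "sp4": ("GenusC", "FamilyY"),
}
family_age = {"FamilyX": 90.0, "FamilyY": 60.0}
between_families = {frozenset(("FamilyX", "FamilyY")): 120.0}

same_genus = divergence_time("sp1", "sp2", taxonomy, family_age, between_families)
same_family = divergence_time("sp1", "sp3", taxonomy, family_age, between_families)
different_family = divergence_time("sp1", "sp4", taxonomy, family_age, between_families)
```

Because the within-family ages are fixed fractions of the family age, the resulting distances stay ultrametric as long as the family-level tree is.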

To investigate the robustness of the measures of phylogenetic diversities with respect to the precision of the phylogenetic tree, we also considered a very simplified tree following a hierarchical APG classification (APG II 2003). Phylogenetic distances, δ, between individuals were then equal to 0 (same species), 1 (same genus), 2 (same family), 3 (same order), 4 (same higher clade, distinguishing asterids, rosids, magnoliids, monocots and orders not belonging to these clades) or 5 otherwise.
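The rank-based distance can be sketched as the index of the lowest rank shared by two species. All taxon names are hypothetical, and we assume that orders falling outside the named higher clades are each given their own higher-clade label.

```python
RANKS = ("species", "genus", "family", "order", "higher_clade")

def rank_distance(a, b):
    """0 = same species, 1 = same genus, ..., 4 = same higher clade, 5 otherwise."""
    for distance, rank in enumerate(RANKS):
        if a[rank] == b[rank]:
            return distance
    return len(RANKS)

# Hypothetical classifications.
sp1 = {"species": "s1", "genus": "G1", "family": "F1", "order": "O1", "higher_clade": "rosids"}
sp2 = {"species": "s2", "genus": "G1", "family": "F1", "order": "O1", "higher_clade": "rosids"}
sp3 = {"species": "s3", "genus": "G2", "family": "F2", "order": "O1", "higher_clade": "rosids"}
sp4 = {"species": "s4", "genus": "G3", "family": "F3", "order": "O2", "higher_clade": "asterids"}
```

Checking ranks from the lowest upwards relies on the classification being strictly nested, so that sharing a rank implies sharing all higher ranks.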

data analyses

Coefficients representing local ($D^I_S$, $D^P_S$, $\Delta^P_S$) and total ($D^I_T$, $D^P_T$, $\Delta^P_T$) (phylogenetic) diversity/distinctness as well as their ratios (IST, PST, ΠST) were computed. Simpson's diversities ($D^I_S$, $D^I_T$) were also partitioned according to divergence time using 20 million year (Myr) wide age classes (i.e. classes of divergence times: 0–20 Myr, 20–40 Myr, … , 140–160 Myr). In this way, the distributions of divergence time between individuals sampled within plots and among plots were compared.

To test for phylogenetic structuring, tips (species) from the phylogenetic tree were permuted 999 times (complete randomization), considering a tree containing only the observed species. To assess how this test is affected by the species pool of the phylogenetic tree when species from another ecological guild are included, we also applied the randomization procedure on a tree containing twice as many species by adding Angiosperm species found in the herbaceous layer (which was very rich in Monocots), so that sampled species are also permuted with species not represented in the data set analysed.

Partial tree randomizations were performed by setting an age threshold above which the tree is kept intact (i.e. no species permutations among clades older than the threshold). We considered different threshold dates ranging from 150 Myr (equivalent to complete randomizations in our case) to 40 Myr.

In addition, pairwise IST, PST and ΠST between plots were computed and regressed on pairwise spatial and ecological distances between plots for each variable (multiple regression). The significance of explanatory variables was checked by partial Mantel tests.
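The logic of a Mantel-type permutation test can be sketched as follows. Note that this is a simple (not partial) Mantel test on two matrices, a deliberate simplification of the partial tests used here; function names and data are hypothetical.

```python
import random
from itertools import combinations


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(mat_y, mat_x, n_perm=999, seed=0):
    """One-sided simple Mantel test: correlation between two square distance
    matrices, tested by permuting plot labels of the first matrix."""
    n = len(mat_y)
    pairs = list(combinations(range(n), 2))
    x = [mat_x[i][j] for i, j in pairs]
    observed = pearson([mat_y[i][j] for i, j in pairs], x)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(order)
        permuted = [mat_y[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, x) >= observed - 1e-12:
            hits += 1
    return observed, hits / (n_perm + 1)


# Toy example: six plots along a gradient; differentiation grows with distance.
position = [0, 1, 2, 3, 4, 5]
mat_x = [[abs(a - b) for b in position] for a in position]
mat_y = [[2.0 * v + 1.0 if v else 0.0 for v in row] for row in mat_x]
r, p = mantel(mat_y, mat_x)
```
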



Over all 28 plots inventoried, 273 species and morphospecies belonging to 168 genera from 36 families were identified. The most species-rich families (following APG classification) are the Fabaceae (which includes Caesalpiniaceae and Mimosaceae following older classifications), Euphorbiaceae, Annonaceae, Burseraceae, Clusiaceae and Rubiaceae, represented each by at least 10 species (supplementary Fig. S1). The maximal divergence time between species is 144 Myr following the super-tree of Davies et al. (2004).

Within a plot, the probability that two individuals belong to different species is inline image, the mean divergence time between individuals is inline image Myr, and the mean divergence time between species is inline image Myr (Table 1). According to these coefficients, most diversity occurs within a plot, the among-plot contribution (β diversity) being always less than or equal to 6% (IST = 0.042, PST = 0.059, ΠST = 0.0031; Table 2).

Table 2.  Partition of (phylogenetic) diversity/distinctness measures within and among 28 tree plots from an Atlantic central African forest (2760 individuals belonging to 273 species). For coefficients based on species phylogeny, mean values and 95% central distribution envelopes after 999 randomizations of the phylogenetic tree are given (italic values): (a) using a tree containing only the observed species, and (b) using a tree containing additional herbaceous species

Coefficients based on:             Local diversity α        Total diversity γ = α + β   Differentiation β/γ
Species identity and abundance     [inline image]           [inline image]              IST = 0.042
Species phylogeny and abundance    [inline image]           [inline image]              PST = 0.059
  (a)                              96.6 (91.8, 103.5)       100.9 (95.9, 108.0)         0.042 (0.037, 0.049)
  (b)                              110.7 (105.2, 116.3)     115.6 (109.8, 121.6)        0.042 (0.037, 0.049)
Species phylogeny and incidence    [inline image]           [inline image]              ΠST = 0.0031
  (a)                              102.9 (100.1, 106.2)     102.9 (101.2, 106.3)        0.0000 (−0.0024, 0.0036)
  (b)                              117.9 (114.6, 121.5)     117.9 (114.6, 121.4)        0.0000 (−0.0023, 0.0032)

Note: DP and ΔP are mean divergence times expressed in million years.

When these measures were computed using taxonomic ranks to produce surrogates of phyletic distances, estimates of phylogenetic diversity/distinctness were naturally different because divergence times were not included (e.g. inline image, inline image), but estimates of phylogenetic differentiation among plots were only slightly affected (PST = 0.055, ΠST = 0.0032).

The distribution of divergence times between individuals, which is a partition of DI, shows that more than half the pairs of individuals diverged between 100 and 120 Myr ago, belonging to different major clades such as rosids and asterids (Fig. 3). Slightly more pairs of individuals belong to species that diverged less than 100 Myr ago within plots than among plots, the reverse pattern occurring for species that diverged more than 100 Myr ago (Fig. 3). Hence, a trend of phylogenetic clustering is observed.
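The decomposition of a within-plot Simpson diversity into divergence-time classes can be sketched as below: each pair of individuals from different species contributes its sampling probability to the 20-Myr class of its divergence time, and the classes sum to DI. Function names and the toy data are assumptions.

```python
from collections import defaultdict


def within_plot_pairs(counts, dist):
    """Yield (divergence time, probability) for ordered pairs of individuals
    from different species, drawn without replacement from one plot."""
    n = sum(counts.values())
    for a in counts:
        for b in counts:
            if a != b:
                yield dist(a, b), counts[a] * counts[b] / (n * (n - 1))


def age_class_profile(pairs, width=20.0):
    """Sum pair probabilities per divergence-time class [lower, lower + width)."""
    profile = defaultdict(float)
    for time, weight in pairs:
        lower = int(time // width) * width
        profile[(lower, lower + width)] += weight
    return dict(profile)


# Toy plot: A and B diverged 10 Myr ago, C diverged from both 110 Myr ago.
ages = {frozenset("AB"): 10.0, frozenset("AC"): 110.0, frozenset("BC"): 110.0}


def toy_dist(a, b):
    return ages[frozenset((a, b))]


profile = age_class_profile(within_plot_pairs({"A": 2, "B": 1, "C": 1}, toy_dist))
```

The same binning applied to pairs of individuals taken from different plots gives the among-plot profile to which the within-plot profile is compared.
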

Figure 3.  Decomposition of within- and among-plot Simpson's diversity indices according to the divergence time (i.e. frequency distributions of divergence time between individuals from different species for pairs of individuals sampled within plots or among plots).

testing community phylogenetic structuring

After complete randomization of the phylogenetic trees (for a tree containing only the observed species or also additional ones), mean pPST = IST and mean pΠST = 0, as expected on theoretical grounds (Table 2). The observed PST is above the 95% central pPST distribution interval, indicating that species within plots tend to be more related than species among plots (phylogenetic clustering; Table 2). ΠST is marginally significantly larger than zero, the observed ΠST being within the 95%, but outside the 90%, pΠST distribution interval. Interestingly, mean divergence times between individuals or species (i.e. DP and ΔP) were within their 95% distribution intervals obtained after randomizing a tree containing only the observed species, but they were outside such intervals when randomizing a tree containing additional species from the herbaceous layer (Table 2). This result occurs because tree species do not form a random sample of a phylogenetic tree containing both tree and herb species. Thus, testing absolute phylogenetic diversity measures by randomizing a phylogenetic tree strongly depends on the phylogenetic tree used and is not adequate to test whether species co-occurring within plots are more (or less) related on average than species from different plots. In contrast, testing for phylogenetic clustering/overdispersion within plots using ratios such as PST and ΠST seems robust with respect to the phylogenetic tree used.
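The comparison of an observed coefficient with its randomization envelope can be sketched as follows. The quantile rule used to cut the tails is an assumption of this sketch; the intervals in the usage lines are the 95% envelopes reported in Table 2 for the tree containing only the observed species.

```python
import math


def central_envelope(null_values, level=0.95):
    """Central `level` envelope of a list of randomized values."""
    ordered = sorted(null_values)
    n = len(ordered)
    # Number of values trimmed from each tail (small epsilon guards against
    # floating-point error when the product is an exact integer).
    k = int(math.floor((1.0 - level) / 2.0 * n + 1e-9))
    return ordered[k], ordered[n - 1 - k]


def outside(observed, envelope):
    low, high = envelope
    return observed < low or observed > high


# Values from Table 2 (tree containing only the observed species).
pst_significant = outside(0.059, (0.037, 0.049))     # observed PST vs pPST envelope
pist_significant = outside(0.0031, (-0.0024, 0.0036))  # observed ΠST vs pΠST envelope
```
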

Under partial randomizations, when families are kept intact (age threshold at 40 Myr), pPST does not differ from PST (Fig. 4). The same trend is observed using ΠST (results not shown). Hence, within a family, no tendency for phylogenetic clustering or overdispersion is observed. Significant phylogenetic patterns emerge using older age thresholds for partial randomizations: a substantial drop of the mean pPST value (about half of the phylogenetic signal) is observed when the age threshold is larger than 143 Myr, i.e. when species from core eudicots and magnoliids (Lauraceae, Myristicaceae and Annonaceae) are allowed to be permuted (Fig. 4). Similar trends are observed using ΠST (not shown). Hence, phylogenetic clustering is mostly explained by deep phyletic divisions.

Figure 4.  Results of partial phylogenetic tree randomizations on PST. Error bars represent the 95% central pPST values obtained for 999 randomizations, where species permutations are done only among clades younger than the age threshold (in millions of years). Values can be compared with IST and PST (without randomizations).

Partial randomizations within each clade detected significant phylogenetic clustering within eudicots, rosids, eurosids1, eurosids2, Malpighiales, Malvales, Rosales, Sapindales and Annonaceae. Thus, only one family (Annonaceae) shows some trend of non-random phylogenetic pattern, and phylogenetic overdispersion was never observed. It must be noted, however, that the resolution of our phylogenetic hypothesis below the family level may be too low to detect existing phylogenetic signals.

phylogenetic signal and ecological differentiation

Pairwise differentiation between plots using IST, PST, ΠST or the difference (PST − IST) was not correlated with the spatial distance between plots, but did yield significant positive correlations with the ecological distance between plots for several variables, which together explained from 17% (for ΠST) to 29% (for PST) of the variance of differentiation coefficients (Table 3). The differences in altitude explained most of the variance (Table 3). When pairwise IST and PST between plots are averaged for a set of elevation difference intervals (0–100 m, > 100–200 m, > 200–300 m, > 300–500 m, > 500–1000 m; Fig. 5), only pairs of plots situated at very similar elevation (first interval) show no phylogenetic signal, and the magnitude of phylogenetic clustering, expressed by the difference PST − IST, increases steadily with the elevation difference between plots (Fig. 5).
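Averaging pairwise coefficients by elevation-difference interval can be sketched as follows, using the intervals listed above. The data layout (a list of plot elevations and a dictionary of pairwise values keyed by plot indices) and the numbers in the example are hypothetical.

```python
from itertools import combinations

# Elevation-difference intervals in metres: 0-100, >100-200, >200-300, ...
BINS = [(0, 100), (100, 200), (200, 300), (300, 500), (500, 1000)]


def mean_by_elevation_class(elevation, pairwise, bins=BINS):
    """Average a pairwise coefficient (e.g. PST - IST) per interval.

    elevation : list of plot elevations (m).
    pairwise  : dict (i, j) -> coefficient for plots i < j.
    Returns a dict interval -> mean value, or None for empty intervals.
    """
    values = {b: [] for b in bins}
    for i, j in combinations(range(len(elevation)), 2):
        diff = abs(elevation[i] - elevation[j])
        for lower, upper in bins:
            # Intervals are (lower, upper]; the first one also includes 0.
            if lower < diff <= upper or (diff == 0 and lower == 0):
                values[(lower, upper)].append(pairwise[(i, j)])
                break
    return {b: (sum(v) / len(v) if v else None) for b, v in values.items()}


# Illustrative numbers only.
means = mean_by_elevation_class(
    [100, 150, 400], {(0, 1): 0.01, (0, 2): 0.05, (1, 2): 0.07})
```
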

Table 3.  Variables explaining the (phylogenetic) structure of tree communities in an Atlantic central African forest. Values are partial regression slopes of the linear regression of pairwise IST, PST or ΠST on spatial and ecological distances between plots. They are tested by partial Mantel tests (999 permutations). The last line shows the amount of variance explained when only the variables showing a significant effect (P < 0.05) are included in the regression model (R2)

Spatial distance        −0.054     −0.065     −0.060     0.004
Soil depth              0.099*     0.149*     0.156*     −0.009
Soil hydromorphy        0.204**    0.263*     0.255*     −0.012
Rocks in soil           0.127*     0.058      0.001      0.016
Gravel layer in soil    −0.084     −0.055     −0.025     0.051
Stand dynamic           0.019      −0.014     −0.033     0.037

* P < 0.05, ** P < 0.01, *** P < 0.001.

Figure 5.  Phylogenetic signal for interplot differentiation. Average pairwise PST and IST are represented according to the elevation difference between plots. The stippled lines represent the 95% envelope of the pPST values obtained for 999 randomizations, showing that phylogenetic clustering (PST > IST) occurs when the elevation difference exceeds c. 150 m.

The difference PST − IST represents the part of the differentiation between plots that is explained only by the increase in phyletic divergence between individuals. Interestingly, it is not affected by hygrometry or the presence of rocks in the soil, two variables that significantly explain IST values (Table 3).


This paper provides a consistent theoretical framework to characterize the extent of community phylogenetic structuring through the partitioning of measures of phylogenetic diversity or distinctness into α (within-site) and β (between-site) components. It also proposes new testing procedures to detect a phylogenetic signal in different clades or at different evolutionary time depths. This work is closely linked to previous efforts aimed at defining general frameworks to treat diversity patterns in a way that includes similarities between species (Rao 1982, 1986; Ganeshaiah et al. 1997; Clarke & Warwick 1998; Webb 2000; Shimatani 2001; Webb et al. 2002; Pavoine et al. 2004; Pavoine & Dolédec 2005; Chave et al., in press), partition diversity additively within and among sites or habitats (e.g. Lande 1996; Veech et al. 2002; Couteron & Pélissier 2004; Pavoine & Dolédec 2005; Ricotta 2005; Chave et al., in press), or allow species or plot ordination (Pélissier et al. 2003; Pavoine et al. 2004). The proposed framework has the advantage of defining parameters with clear interpretations and, moreover, unifies the analysis of interspecific biodiversity patterns with the classical analysis of intraspecific (genetic) diversity patterns undertaken by population geneticists (‘F-statistics’, Wright 1965). In fact, phylogenetic diversity could be analysed without using any species concept (i.e. without categorizing individuals into a set of species) provided that phylogenetic distances between individuals can be assessed, for example using molecular data. The synthetic parameters describing diversity and/or phylogenetic distinctness (DI, DP, ΔP, IST, PST, ΠST) should be useful for comparing phylogenetic patterns, for example, of similar communities in different regions or continents, of different guilds in the same region, or at the levels of populations (within species) vs. communities (among species).

Using differentiation coefficients (IST, PST, ΠST) in a pairwise fashion, the origin of a phylogenetic signal can be investigated by regressing these coefficients on ecological or geographical distances between community samples. A software program performing these analyses is being developed by O. J. Hardy and interested readers are invited to contact him.

comparison with other descriptors of phylogenetic diversity

Among the measures of phylogenetic diversity, Faith's PD measure (Faith 1992, 1994) has gained popularity in conservation biology. PD is the total phylogenetic branch length spanned by the species composing a given community, and can be interpreted as the amount of evolutionary history. The concept of PD can be used to describe phylogenetic diversity and also phylogenetic similarity between communities from the proportion of shared evolutionary history (Faith et al. 2004). How do DP and ΔP compare with Faith's PD? First, PD does not account for species abundance and is thus more related to ΔP than to DP. Second, ΔP is an average divergence time between pairs of species and is not influenced by species richness, explaining why it is a measure of ‘phylogenetic distinctness’, whereas PD increases with species richness and better fits the notion of ‘phylogenetic diversity’ (Clarke & Warwick 1998). The disadvantage of PD is that it requires an exhaustive sampling of all the species composing a community (if branches to missing species are not counted PD would be underestimated), whereas ΔP and DP do not suffer such estimation problems and are thus easy to estimate using community samples.
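The contrast between PD and ΔP can be illustrated on a toy tree. In this sketch PD is taken as the total length of the branches connecting the species down to the root of the supplied tree, which is one possible convention and an assumption here; the tree encoding (a parent dictionary with branch lengths) and all names are hypothetical.

```python
from itertools import combinations


def faith_pd(species, parent, branch_length):
    """Sum of branch lengths on the paths from `species` to the root,
    each branch being counted once."""
    visited = set()
    total = 0.0
    for sp in species:
        node = sp
        while node in parent and node not in visited:
            visited.add(node)
            total += branch_length[node]  # branch leading from node to its parent
            node = parent[node]
    return total


def mean_pairwise_divergence(species, dist):
    """Mean divergence time between pairs of distinct species."""
    pairs = list(combinations(species, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Toy tree: a 50-Myr-old polytomy X (tips A, B, D) attached to the root
# by a 50-Myr branch.
parent = {"A": "X", "B": "X", "D": "X", "X": "root"}
branch_length = {"A": 50.0, "B": 50.0, "D": 50.0, "X": 50.0}


def toy_dist(a, b):
    return 50.0  # every pair of tips diverged at node X


pd_two = faith_pd(["A", "B"], parent, branch_length)
pd_three = faith_pd(["A", "B", "D"], parent, branch_length)
```

Adding a third species raises PD from 150 to 200 Myr in this toy tree, whereas the mean pairwise divergence stays at 50 Myr, which mirrors the diversity versus distinctness distinction made above.
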

Webb et al. (2002) defined the ‘net relatedness index’ (NRI) and ‘nearest taxon index’ (NTI) to characterize the phylogenetic structure of communities. NRI is a standardized measure of ΔP defined for each plot as NRIplot = −[ΔPplot − E(ΔPplot)]/SD(ΔPplot), where E(ΔPplot) and SD(ΔPplot) are the expectation and standard deviation of ΔPplot when the species in the plot are randomly resampled from a defined species pool, or randomly redistributed among plots under some constraints. Thus, NRIplot quantifies the extent of phylogenetic clustering in a plot relative to a reference tree (species pool), and/or a particular randomization null model (Kembel & Hubbell 2006). By comparison, ΠST quantifies the extent of phylogenetic clustering within a plot relative to species found among plots. Both NRIplot and ΠST provide a way to compare community phylogenetic patterns across studies and permit significance testing in a very similar way. However, NRIplot relies on the choice of a particular randomization null model, which affects its value (Kembel & Hubbell 2006; O. J. Hardy, unpublished data), and/or on a subjectively defined reference species pool, adding levels of complexity, whereas ΠST and PST are not affected by such factors. Swenson et al. (2006) suggested that the scale sensitivity of NRIplot with respect to the species pool could be exploited to identify critical scales at which local or regional influences gain primacy for the structuring of communities. The same objective can be reached using pairwise differentiation coefficients (IST, PST or ΠST) by assessing how geographical distance or habitat differentiation between community samples affects the phylogenetic signal (e.g. Fig. 5). The advantage of the latter approach is that prior knowledge of species habitat affinities and of the existing species pools at different scales is not required.
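For comparison, an NRI-type index can be sketched as the standardized deviation of the mean pairwise divergence of a plot from its expectation under random resampling from a species pool, with the sign chosen so that positive values indicate clustering. The sign convention, the resampling null model and every name below are assumptions of this sketch.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def mean_pairwise_divergence(species, dist):
    pairs = list(combinations(species, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def net_relatedness_index(plot_species, pool, dist, n_rand=999, seed=0):
    """Standardized mean pairwise divergence of a plot; the null model draws
    the same number of species at random from `pool`."""
    rng = random.Random(seed)
    observed = mean_pairwise_divergence(plot_species, dist)
    null = [mean_pairwise_divergence(rng.sample(list(pool), len(plot_species)), dist)
            for _ in range(n_rand)]
    return -(observed - mean(null)) / stdev(null)


# Toy pool: two clades of three species; 10 Myr within, 100 Myr between clades.
clade = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}


def toy_dist(a, b):
    return 10.0 if clade[a] == clade[b] else 100.0


nri = net_relatedness_index(["A", "B", "C"], list(clade), toy_dist)
```

A plot made of a single clade, as in the example, yields a positive value. Changing the pool or the null model changes the value, which is the dependence discussed above.
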

NTI is defined in the same way as NRI, but replacing ΔP by the mean phylogenetic distance to the nearest taxon of each species, so that it quantifies the extent of terminal clustering, focusing on recent evolutionary events. Although possible, we have not defined measures analogous to ΔP and DP focusing on terminal phylogenetic clustering (but the data can always be reduced to a given clade). However, partial randomization tests permit an assessment of phylogenetic clustering/overdispersion up to any evolutionary depth, generalizing on a continuous time scale the distinction between NRI and NTI.
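The quantity on which NTI is built, the mean distance from each species to its nearest relative in the plot, can be sketched as follows (function name and toy ages are hypothetical).

```python
def mean_nearest_taxon_distance(species, dist):
    """Average, over species, of the distance to the closest other species."""
    species = list(species)
    nearest = [min(dist(a, b) for b in species if b != a) for a in species]
    return sum(nearest) / len(nearest)


# Toy plot: A and B are close relatives (10 Myr), C is distant (100 Myr).
ages = {frozenset("AB"): 10.0, frozenset("AC"): 100.0, frozenset("BC"): 100.0}


def toy_dist(a, b):
    return ages[frozenset((a, b))]


mntd = mean_nearest_taxon_distance(["A", "B", "C"], toy_dist)
```

Here the two close relatives each contribute 10 Myr and the distant species 100 Myr, so the statistic is dominated by recent splits in a way the mean pairwise divergence is not.
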

robustness with respect to the precision of the phylogenetic tree

When based on a dated phylogenetic tree, DP and ΔP depend on the time calibration of the super-tree, which can be problematic because it is based on incomplete fossil evidence and assumptions regarding nucleotide evolution (the ‘molecular clock’). For example, time calibrations of phylogenetic trees using single calibration points have very large confidence intervals and published trees can provide fairly different clade ages according to the set of species used and the gene regions analysed (e.g. the age of Malpighiales is 114 Myr according to Davis et al. 2005 whereas we considered 79 Myr following Davies et al. 2004). Using a taxonomic proxy for age is also problematic. For example, Dinizia excelsa in the Fabaceae is almost as old as the family with 54-Myr fossils, yet the genus Inga is less than 6 Myr old (C. Dick, personal communication). The more detailed information presented in fossil-calibrated phylogenetic studies of individual families or genera will provide better resolution for future analyses. Fortunately, whereas the accuracy of absolute DP or ΔP coefficients might be questionable given present knowledge, ratios of these coefficients, such as PST or ΠST, are much more robust with respect to the interpretation of fossil evidence, and our results have shown that even a taxonomic rank-based classification provides estimates very similar to those based on a dated phylogenetic tree. This does not mean that rank-based classifications are as informative as a more resolved phylogenetic tree, because the ability to detect a phylogenetic signal within a clade always depends on the phylogenetic resolution of this clade. This may explain why essentially no phylogenetic signal was detected below the family level in our study.

testing phylogenetic patterns

All our tests of community phylogenetic structuring were based on a randomization of species at the tips of a phylogenetic tree. The underlying logic is that we create artificial data sets matching the null hypothesis to be tested. Besides removing any pattern of phylogenetic clustering (overdispersion) regarding species co-occurrence within plots, such randomization also removes any pattern of phylogenetic clustering (overdispersion) regarding species abundances/frequencies. Hence, two types of phylogenetic patterns can be tested: (i) one relative to species spatial distribution (‘Are species within sites more or less related than species from different sites?’) for which differentiation coefficients (PST or ΠST) are the relevant measures to test against their distribution after randomization; and (ii) one relative to species abundance/frequency (‘Are there clades of mostly abundant species and clades of mostly rare species?’) for which total diversity coefficients (total DP or ΔP) are the relevant measures to test. This study focuses on the species spatial distribution and, interestingly, testing this pattern is not greatly affected by the species content of the tree used to provide randomization (Table 2). Simulations show that the randomization procedure of the phylogenetic tree gives an exact test (i.e. exact type I error rate) for this pattern when overall species abundances are phylogenetically random (O. J. Hardy, unpublished data). For our data set, tests on total DP and ΔP are not significant using the phylogenetic tree containing only the observed species (Table 2), and hence the tests on PST or ΠST should be exact.

Is species abundance important to consider when testing community phylogenetic patterns? For the data set analysed here, higher testing power was obtained using PST than ΠST (Table 2). However, other data sets have sometimes given the opposite result (O. J. Hardy, unpublished data). It is likely that species abundance (rather than just incidence) is both informative, because abundance differences can reflect the action of ecological sorting, and noisy, because abundance differences are subject to stochastic processes that may lower testing power. Hence, whether abundance or incidence information should be used depends on each data set. We suggest always testing both PST and ΠST.
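For incidence data, a ΠST-type ratio can be sketched as the relative gain in mean divergence time between distinct species taken from different plots compared with distinct species taken within a plot. Excluding pairs made of the same species from the among-plot mean is an assumption of this sketch, as are the function names and the toy ages.

```python
from itertools import combinations


def mean_within(species, dist):
    """Mean divergence between distinct species of one plot (needs >= 2 species)."""
    pairs = list(combinations(sorted(species), 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def mean_among(species_k, species_l, dist):
    """Mean divergence between distinct species, one from each plot."""
    values = [dist(a, b) for a in species_k for b in species_l if a != b]
    return sum(values) / len(values)


def pi_st(plots, dist):
    """(among - within) / among, averaged over plots and over pairs of plots."""
    within = sum(mean_within(p, dist) for p in plots) / len(plots)
    pairs = list(combinations(plots, 2))
    among = sum(mean_among(p, q, dist) for p, q in pairs) / len(pairs)
    return (among - within) / among


# Toy ages (Myr): A-B and C-D are close relatives, other pairs are distant.
ages = {frozenset("AB"): 10.0, frozenset("CD"): 10.0}


def toy_dist(a, b):
    return ages.get(frozenset((a, b)), 100.0)


clustered = pi_st([{"A", "B"}, {"C", "D"}], toy_dist)
identical = pi_st([{"A", "B"}, {"A", "B"}], toy_dist)
```

Running the same tip permutations on this ratio and on its abundance-based counterpart is one way to follow the suggestion of testing both PST and ΠST.
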

To test whether species within plots are more related than species among plots, other randomization procedures have been used in previous studies, in particular randomizing species abundance or incidence among sites, sometimes under specific constraints, such as keeping the per-site species richness unchanged (Cavender-Bares et al. 2004, 2006; Kembel & Hubbell 2006). To our knowledge the validity of such tests has not been demonstrated formally or by simulations (but see Kembel & Hubbell 2006), but preliminary results suggest that they can lead to liberal tests (i.e. rejecting the null hypothesis too often) when the randomization null model does not constrain species abundances or when species abundances are spatially autocorrelated (O. J. Hardy, unpublished data). Assessing the reliability of each randomization procedure using simulated data sets is beyond the scope of this paper but will be the subject of a forthcoming study by the first author.

phylogenetic clustering and overdispersion in forest tree communities

Partial randomizations of the phylogenetic tree are useful to assess the respective contributions of different clades to the phylogenetic pattern observed. This procedure should discern between clustering and overdispersion patterns occurring at different levels. Our data revealed that most of the phylogenetic signal was contributed by niche differentiation among clades that diverged long ago, and notably between eudicots and magnoliids, the latter being represented in our sample by 14 species of Annonaceae, three species of Myristicaceae and one species of Lauraceae. This signal is consistent with the fact that the number of magnoliid species per plot is negatively correlated with the altitude (r = −0.77, P < 0.001), the factor that best explains the observed phylogenetic signal (Table 3). This pattern is consistent with diversity gradients in the neotropics for the Myristicaceae, which shows highest diversity at low altitude, but not for Annonaceae, which shows highest diversity within the 1400–2000-m altitude range (Gentry 1988). The partial tree randomizations did not detect any clades within which phylogenetic overdispersion occurred, which might indicate that phylogenetic clustering, or random phylogenetic patterns, may be more common than overdispersion in tropical tree communities. However, it is too early to conclude because our data set did not include many congeneric species and phylogenetic relationships were poorly defined below the family level, so that actual overdispersed clades may have been undetectable (see below). Progress in phylogeny reconstruction should soon allow much finer analyses.

In agreement with previous studies focusing on tropical tree diversity (Webb 2000; Chave et al., in press), we detected an overall pattern of phylogenetic clustering. In contrast, phylogenetic overdispersion was demonstrated for Florida oak communities (Cavender-Bares et al. 2004). These contrasting results may come from the different taxonomic scales investigated: the whole angiosperm clade for tropical tree diversity studies vs. the genus Quercus for the oak communities. Processes of competitive exclusion, which can cause phylogenetic overdispersion, are more likely to occur among closely related species because of their niche overlap. Likewise, speciation by habitat specialization should also cause phylogenetic overdispersion of closely related species. On the contrary, phylogenetic clustering due to niche conservatism should increase on average with the divergence time between species because major shifts of species niche are rare. Hence, community phylogenetic patterns might combine overdispersion patterns in some recent clades nested within clustering patterns in older clades. Similarly, a clustering pattern appearing for a range of contrasted habitats might hide overdispersion patterns among moderately contrasted habitats. Co-occurring clustering and overdispersion patterns at different levels might sometimes compensate for each other, leading to an apparently overall absence of community phylogenetic structure when tested using a single statistic such as PST. Partial randomizations of the phylogenetic tree could in principle distinguish such a situation from a pattern where there is truly no community phylogenetic structuring at any level. However, the power of these tests to distinguish such situations remains to be evaluated.

testing community neutrality from its phylogenetic structure

Neutral community theories assume that all individuals behave in the same way (every individual has the same chance of reproduction), independently of the species they belong to (Bell 2001; Hubbell 2001; Chave 2004). They emphasize the processes of demographic drift, dispersal and speciation, in sharp contrast to many ‘classical’ theories of community organization that emphasize niche differentiation. The goal of neutral theory is to provide a testable null hypothesis, so that observed deviations can give clues about the importance of non-neutral processes. In this respect, testing the phylogenetic structure of communities can be viewed as a test of community neutrality whenever biogeographical explanations for observed patterns can be ruled out, because a significant phylogenetic structure would demonstrate that habitat differentiation among species and ecological sorting occur. Conversely, phylogenetic community analyses might be used to delimit a community behaving putatively neutrally whenever no phylogenetic signal can be detected among community subdivisions.

Biogeographical explanations for community phylogenetic clustering can be excluded when migration rates largely exceed speciation rates at the geographical scale investigated (e.g. regional or local scales). Hence, the phylogenetic patterns detected in our case study must involve an ecological sorting of species coupled to some degree of niche conservatism during evolution. Accordingly, the strength of the phylogenetic signal between sites was well explained by their ecological differentiation, especially altitude (Fig. 5). It is worth noting that differentiation of some ecological variables (e.g. hygrometry, Table 3) was well correlated with floristic differentiation (IST) but without showing any phylogenetic signal (i.e. no correlation with ΠST or PST − IST), whereas other variables (e.g. altitude) were correlated with both phylogenetic and non-phylogenetic floristic differentiation (Table 3). This pattern could be explained if hygrometry and altitude are major determinants of species assemblages but adaptation to hygrometry is not phylogenetically conserved, contrary to adaptation to altitude. More data analysis is required to investigate this hypothesis further, in particular by characterizing the habitat affinities of each species in order to assess how traits are conserved in the phylogeny. In conclusion, we believe that phylogenetic community analyses could prove very insightful to identify general patterns of trait evolution and ecological sorting.


We thank Jonathan Davies for providing the super-tree of Angiosperm families and David Ackerly, Jérôme Chave, Christopher Dick, Sandrine Pavoine, Vincent Savolainen and three anonymous reviewers for helpful comments on a previous draft. We also thank the participants of the workshop ‘Phylogenies and community ecology’ from the ‘GDR Interactions Biotiques dans les Communautés: Théories et Modèles’ for their comments, and INRA for financial support. O.J.H. is a Research Associate from the Belgian National Fund for Scientific Research (FNRS). B.S. gathered his data set during his PhD thesis which was financed by a grant from the Belgian Fund for Formation to Research in Industry and Agriculture (FRIA).