Partitioning species diversity within and among sites is fundamentally analogous to partitioning allele diversity within and among populations, and the basic processes determining the dynamics of populations and communities are also very similar in nature. Consequently, the way genetic diversity is partitioned given a genealogy of alleles (Pons & Petit 1996) can be used to describe the phylogenetic structure of communities (Pavoine & Dolédec 2005; Chave et al., in press). In the following, we present descriptive statistics of the phylogenetic structure of communities requiring species abundance data, or species incidence (presence/absence) data, and we show how essentially unbiased estimates can be obtained. We then present randomization procedures to test whether phylogenetic clustering or overdispersion occur. Our definition of ‘community’ is any assemblage of species spatially localized and possibly restricted to some clade, functional group and/or phenotypic features (e.g. all angiosperm trees with a diameter above 10 cm in a 1-ha plot).
Partitioning phylogenetic diversity from species abundances
Different measures quantify species diversity of a community, some of the best known being the species richness, the Shannon–Wiener index, the Simpson diversity index and Fisher's alpha (Magurran 2004). These statistics differ in the way species frequency is accounted for (e.g. Couteron & Pélissier 2004), but also in their sensitivity to sample size (Gimaret-Carpentier et al. 1998). For example, Simpson diversity gives much weight to common species, nearly ignoring rare species, and is unbiased with respect to sample size. In contrast, species richness weights common and rare species equally, but is highly dependent on sample size so that unbiased estimates are difficult to obtain in species-rich communities (Gotelli & Colwell 2001).
These statistics do not consider the phylogenetic relationships between species, but the Simpson diversity index, D, can be extended to incorporate phylogenetic information. D is

$$D = 1 - \sum_i f_i^2 \tag{1}$$

where fi is the frequency of species i in a community. As $f_i^2$ is the probability that both individuals of a random pair from the community belong to species i, $\sum_i f_i^2$ is the probability that the pair belongs to the same species. Hence, D is the probability that two individuals from the community belong to different species. It can also be expressed as

$$D = \sum_i \sum_j \delta_{ij} f_i f_j \tag{2}$$

if δij is an indicator variable equal to 0 when i = j (same species) and equal to 1 when i ≠ j (different species).
Equation 2 can be extended to incorporate species phylogeny by letting δij be a continuous variable expressing the phylogenetic distance between i and j rather than being a binary (0, 1) variable. For example, if δij is the divergence time between species i and j (i.e. the age of the most recent common ancestor of i and j), D is the mean divergence time between two randomly sampled individuals in the community, and can be interpreted as a measure of phylogenetic diversity. In the following we will distinguish DI, the original Simpson diversity index which accounts only for species identity (i.e. not their phylogeny), and DP, a diversity measure accounting for species phylogeny and equivalent to Rao's ‘quadratic entropy’ (Rao 1982) for the special case where δij is a phylogenetic distance (see Table 1 for equivalence with other publications). Note that δij could represent other types of distances between species (e.g. taxonomic, morphological, functional). However, a diversity coefficient should ideally always increase when a species is added, a property of D ensured when δij is ultrametric (Pavoine et al. 2005), which is the case when δij is the divergence time or, more generally, when δij is obtained from the branch lengths of a rooted tree in which all the end nodes are equidistant from the root (ultrametric tree).
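As an illustration of how the same quadratic form yields either the Simpson index or the mean divergence time, the following minimal sketch evaluates Σi Σj δij fi fj for a hypothetical three-species community. The function names, the frequencies and the divergence times are assumptions made up for the example, not data from this study.

```python
def quadratic_diversity(freqs, dist):
    """Return sum_i sum_j dist[i][j] * freqs[i] * freqs[j]."""
    n = len(freqs)
    return sum(dist[i][j] * freqs[i] * freqs[j]
               for i in range(n) for j in range(n))


def identity_distance(n_species):
    """Binary distance: 0 for the same species, 1 for different species."""
    return [[0.0 if i == j else 1.0 for j in range(n_species)]
            for i in range(n_species)]


# Hypothetical community: species A and B diverged 2 time units ago,
# and C split from the (A, B) lineage 10 time units ago.
freqs = [0.5, 0.3, 0.2]
div_time = [[0.0, 2.0, 10.0],
            [2.0, 0.0, 10.0],
            [10.0, 10.0, 0.0]]

d_i = quadratic_diversity(freqs, identity_distance(3))  # Simpson diversity (DI)
d_p = quadratic_diversity(freqs, div_time)              # mean divergence time (DP)
print(d_i, d_p)
```

With the binary distance the result is 1 − Σ fi², whereas with divergence times it is the mean divergence time between two randomly drawn individuals.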
Table 1. Main coefficients presented in this paper and correspondences with other publications

| Coefficient | Definition | Symbol in other publications |
|---|---|---|
| DI† | Probability that two individuals belong to different species (Simpson diversity index). | D̄ (ref. 2); D (ref. 3) |
| DP† | Mean phylogenetic distance (e.g. divergence time) between individuals (an index of phylogenetic diversity). | D (ref. 2); Δ (ref. 3, 4); AI (ref. 5); HΔ (ref. 6) |
| IST | Proportion of the overall species diversity (DI) expressed among sites. Analogous to FST and GST (ref. 1‡) in population genetics. | |
| PST | Proportion of the overall phylogenetic diversity (DP) expressed among sites. Analogous to NST (ref. 1‡) in population genetics. | FST (ref. 2) |
| ΔP† | Mean phylogenetic distance (e.g. divergence time) between distinct species (an index of phylogenetic species distinctness). | Δ+ (ref. 4); 1/2 MPD (ref. 7) |
| ΠST | Proportion of the overall phylogenetic species distinctness (ΔP) expressed among sites. | |
DI and DP can be used to define differentiation coefficients between local communities (sites) as follows. Diversity can be perceived at different levels and it is customary to decompose the total amount (gamma diversity) into a local component (alpha diversity) and an intersite or interhabitat component (beta diversity; Whittaker 1972). D can be partitioned additively, as in a nested ANOVA (e.g. Lande 1996; Couteron & Pélissier 2004; Pavoine & Dolédec 2005). Let fi and fik be the frequency of species i overall in the region and within site k, respectively. The total (gamma) diversity is defined as

$$D_T = \sum_i \sum_j \delta_{ij} f_i f_j$$
The diversity within site k is

$$D_k = \sum_i \sum_j \delta_{ij} f_{ik} f_{jk}$$
The average within-site (alpha) diversity, DS, is the expectation of Dk over all sites. The beta diversity is the difference DT − DS. This among-site component can be rewritten as a fraction of the total diversity, expressing differentiation among sites using information from species identity (DI) or species phylogeny (DP):

$$I_{ST} = \frac{D^{I}_{T} - D^{I}_{S}}{D^{I}_{T}}, \qquad P_{ST} = \frac{D^{P}_{T} - D^{P}_{S}}{D^{P}_{T}}$$
where the subscripts S and T refer to the fact that diversity within site is compared with total diversity. IST and PST thus represent fractions of the overall species or phylogenetic diversity expressed among sites. They are equivalent to, respectively, FST (or GST, the proportion of genetic diversity expressed among populations considering allele identity) and NST (the proportion of genetic diversity expressed among populations considering the phylogeny of alleles), which are defined at the within-species level (Pons & Petit 1996). Note that the same formalism of community diversity decomposition was proposed by Chave et al. (in press), who used the symbol ‘FST’ to denote PST in reference to population genetics literature (Table 1). However, to avoid confusion, here we introduce new symbols. Moreover, we demonstrate below how to compare, estimate and test IST and PST to make inferences on the phylogenetic structure of communities.
Among-site differentiation occurs when PST > 0 or IST > 0. An interesting property of these differentiation coefficients is that PST = IST when there is no community phylogenetic structuring, whereas PST > IST (PST < IST) indicates phylogenetic clustering (overdispersion), i.e. species found within the same site are more (less) closely related on average than species taken from different sites. The reason is that IST expresses only the among-site differences in species frequencies, whereas PST also expresses the gain of phylogenetic divergence among species. The strength and direction of the phylogenetic signal can thus be quantified by the difference PST − IST.
To obtain unbiased estimators (Pons & Petit 1996), we assume that N randomly chosen sites have been sampled in a large community, and that nk individuals have been sampled in site k (nk > 3). Let f̂ik be the observed frequency of species i in site k (an estimate of the parameter fik which is the actual value). The phyletic distance between species, δij, is assumed to be known without error. Unbiased estimates of diversity coefficients can be obtained from the principles of an ANOVA, considering that the factor site is a random effect (Pons & Petit 1996):

$$\hat{D}_k = \frac{n_k}{n_k - 1} \sum_i \sum_j \delta_{ij} \hat{f}_{ik} \hat{f}_{jk}$$

$$\hat{D}_S = \frac{1}{N} \sum_k \hat{D}_k$$

$$\hat{D}_T = \frac{1}{N(N-1)} \sum_k \sum_{l \neq k} \sum_i \sum_j \delta_{ij} \hat{f}_{ik} \hat{f}_{jl}$$
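Estimators of differentiation coefficients are obtained as:

$$\hat{I}_{ST} = 1 - \frac{\hat{D}^{I}_{S}}{\hat{D}^{I}_{T}}, \qquad \hat{P}_{ST} = 1 - \frac{\hat{D}^{P}_{S}}{\hat{D}^{P}_{T}}$$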
PST and IST assess the proportion of the total diversity explained by species turnover among sites. These coefficients can also be used to quantify pairwise differentiation between sites by applying the above formulae in the particular case where N = 2. Such between-site differentiation coefficients can then be regressed on explanatory variables that express the distance (geographical, ecological) between sites to infer which factors might be responsible for the phylogenetic signal.
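The estimation of a differentiation coefficient from counts of individuals can be sketched as follows. This is an illustrative sketch only: it assumes ANOVA-type estimators in the spirit of Pons & Petit (1996), in which the within-site diversity is corrected by nk/(nk − 1) and the total diversity is taken as the mean distance between individuals drawn from two different sites. The function names and the two-site data set are hypothetical.

```python
def mean_distance(freq_a, freq_b, dist):
    """Mean distance between one individual drawn from each frequency vector."""
    s = len(dist)
    return sum(dist[i][j] * freq_a[i] * freq_b[j]
               for i in range(s) for j in range(s))


def estimate_st(counts, dist):
    """Return (D_S, D_T, 1 - D_S / D_T) from per-site species counts.

    Assumed estimators: within-site diversity with an n/(n - 1) correction,
    total diversity from all ordered pairs of distinct sites.
    """
    n_sites = len(counts)
    sizes = [sum(c) for c in counts]
    freqs = [[x / n for x in c] for c, n in zip(counts, sizes)]
    d_within = [mean_distance(f, f, dist) * n / (n - 1)
                for f, n in zip(freqs, sizes)]
    d_s = sum(d_within) / n_sites
    d_t = sum(mean_distance(freqs[k], freqs[l], dist)
              for k in range(n_sites) for l in range(n_sites) if l != k)
    d_t /= n_sites * (n_sites - 1)
    return d_s, d_t, 1.0 - d_s / d_t


# Hypothetical data: rows are sites, columns are species A, B, C.
counts = [[5, 5, 0],
          [0, 0, 10]]
identity = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
div_time = [[0.0, 2.0, 10.0], [2.0, 0.0, 10.0], [10.0, 10.0, 0.0]]

_, _, i_st = estimate_st(counts, identity)  # species identity only
_, _, p_st = estimate_st(counts, div_time)  # species phylogeny
print(i_st, p_st, p_st - i_st)
```

In this made-up example the two closely related species share a site, so the phylogeny-based coefficient exceeds the identity-based one, the pattern expected under phylogenetic clustering. With only two rows in `counts`, the function returns the pairwise coefficient (N = 2).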
Partitioning phylogenetic distinctness from species incidence
The diversity measures presented above require abundance data (counts of individuals) and rare species are underemphasized. As an alternative, we follow Clarke & Warwick (1998) to define a measure of phylogenetic ‘distinctness’ based on species incidence (presence/absence data) that should remain robust with respect to sample size. For a given community

$$\Delta^{P} = \frac{\sum_i \sum_{j \neq i} \delta_{ij} p_i p_j}{\sum_i \sum_{j \neq i} p_i p_j}$$
where pi = 1 if species i is present (fi > 0), otherwise pi = 0 (fi = 0). The denominator is twice the number of pairwise comparisons between existing species. Note that, contrary to D, the double sums exclude comparisons of a species with itself.
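A minimal sketch of this incidence-based measure, computed as the mean distance over all ordered pairs of distinct species present, is given below. The function name and the three-species distance matrix are assumptions chosen for illustration.

```python
def phylo_distinctness(present, dist):
    """Mean distance between distinct species that are present.

    present: sequence of 0/1 incidence values, one per species.
    dist:    square matrix of phylogenetic distances between species.
    """
    idx = [i for i, p in enumerate(present) if p]
    # Ordered pairs of distinct species: twice the number of unordered pairs.
    pairs = [(i, j) for i in idx for j in idx if i != j]
    if not pairs:
        raise ValueError("at least two species must be present")
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


# Hypothetical divergence times for species A, B, C.
div_time = [[0.0, 2.0, 10.0],
            [2.0, 0.0, 10.0],
            [10.0, 10.0, 0.0]]

delta_abc = phylo_distinctness([1, 1, 1], div_time)
delta_ab = phylo_distinctness([1, 1, 0], div_time)
print(delta_abc, delta_ab)
```

Because abundances do not enter the calculation, a species represented by a single individual weighs as much as a dominant one.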
It is easiest to understand ΔP by analogy with DP: while DP is the mean phyletic distance between distinct individuals, ΔP is the mean phyletic distance between distinct species. It can thus be interpreted as a measure of ‘phylogenetic distinctness’ between species (see Table 1 for equivalence with other publications). Contrary to DP, ΔP is not a measure of community phylogenetic diversity because it does not necessarily increase with the addition of new species. But like DP, ΔP can be evaluated at different levels: the average within site (subscript S), and over all sites (subscript T). Hence, a coefficient analogous to PST can be defined as:

$$\Pi_{ST} = \frac{\Delta^{P}_{T} - \Delta^{P}_{S}}{\Delta^{P}_{T}}$$
ΠST expresses the gain of phyletic divergence between species occurring in different sites compared with species occurring in the same site. It is not sensitive to the gain in species richness among sites vs. within a site because ΔP expresses phylogenetic distinctness independently of species richness. Its expectation is ΠST= 0 when there is no community phylogenetic structuring and ΠST > 0 (ΠST < 0) under phylogenetic clustering (overdispersion).
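To derive estimators that should be unbiased when phylogenetic distances and differences in species abundances are uncorrelated, let p̂ik = 1 if f̂ik > 0, otherwise p̂ik = 0:

$$\hat{\Delta}^{P}_{k} = \frac{\sum_i \sum_{j \neq i} \delta_{ij} \hat{p}_{ik} \hat{p}_{jk}}{\sum_i \sum_{j \neq i} \hat{p}_{ik} \hat{p}_{jk}}$$

$$\hat{\Delta}^{P}_{S} = \frac{1}{N} \sum_k \hat{\Delta}^{P}_{k}$$

$$\hat{\Delta}^{P}_{T} = \frac{1}{N(N-1)} \sum_k \sum_{l \neq k} \frac{\sum_i \sum_{j \neq i} \delta_{ij} \hat{p}_{ik} \hat{p}_{jl}}{\sum_i \sum_{j \neq i} \hat{p}_{ik} \hat{p}_{jl}}$$

$$\hat{\Pi}_{ST} = 1 - \frac{\hat{\Delta}^{P}_{S}}{\hat{\Delta}^{P}_{T}}$$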
Testing the phylogenetic structure of communities
To detect phylogenetic clustering or overdispersion within sites, PST or ΠST can be computed after randomizing the species among the tips of the phylogenetic tree used to define the δij phyletic distances (Fig. 2a). Because such tree randomization breaks down the actual phyletic relationships among species (while keeping the tree architecture intact), the resulting pseudo PST or ΠST values (hereafter denoted pPST or pΠST) are representative of a community without phylogenetic structuring, conforming to the null hypothesis to be tested. Hence, the actual PST (ΠST) can be compared with the distribution of pPST (pΠST) obtained for many independent randomizations, providing an estimate of the P value for the null hypothesis that there is no phylogenetic signal. This procedure tests if PST= IST or if ΠST = 0 because pPST and pΠST have statistical expectations equal to IST and 0, respectively (see Hardy et al. 2003 for an analogy in population genetics).
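The complete tree randomization can be sketched by permuting species labels in the distance matrix, which leaves the tree architecture unchanged. The sketch below is illustrative only: the PST estimator it uses (within-site correction nk/(nk − 1), total diversity from between-site pairs), the add-one convention for the P value, the numerical tolerance and all names and data are assumptions rather than a prescribed implementation.

```python
import random


def p_st(counts, dist):
    """Assumed ANOVA-type estimator of P_ST from per-site species counts."""
    n_sites, n_sp = len(counts), len(dist)
    sizes = [sum(c) for c in counts]
    freqs = [[x / n for x in c] for c, n in zip(counts, sizes)]

    def mean_dist(f, g):
        return sum(dist[i][j] * f[i] * g[j]
                   for i in range(n_sp) for j in range(n_sp))

    d_s = sum(mean_dist(f, f) * n / (n - 1)
              for f, n in zip(freqs, sizes)) / n_sites
    d_t = sum(mean_dist(freqs[k], freqs[l])
              for k in range(n_sites) for l in range(n_sites) if l != k)
    d_t /= n_sites * (n_sites - 1)
    return 1.0 - d_s / d_t


def shuffle_tips(dist, rng):
    """Randomly reassign species to the tips of the tree (same topology)."""
    n = len(dist)
    perm = list(range(n))
    rng.shuffle(perm)
    return [[dist[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def tip_shuffle_test(counts, dist, n_perm=999, seed=0, tol=1e-12):
    """Compare the observed P_ST with pseudo-values from shuffled trees."""
    rng = random.Random(seed)
    observed = p_st(counts, dist)
    pseudo = [p_st(counts, shuffle_tips(dist, rng)) for _ in range(n_perm)]
    # One-sided P values with the common add-one correction (an assumption).
    p_cluster = (1 + sum(v >= observed - tol for v in pseudo)) / (n_perm + 1)
    p_overdisp = (1 + sum(v <= observed + tol for v in pseudo)) / (n_perm + 1)
    return observed, pseudo, p_cluster, p_overdisp


# Hypothetical data: tree ((A,B),(C,D)); sister species share a site.
div_time = [[0.0, 2.0, 10.0, 10.0],
            [2.0, 0.0, 10.0, 10.0],
            [10.0, 10.0, 0.0, 2.0],
            [10.0, 10.0, 2.0, 0.0]]
counts = [[6, 4, 0, 0],
          [0, 0, 5, 5]]

observed, pseudo, p_cluster, p_overdisp = tip_shuffle_test(counts, div_time)
print(observed, p_cluster, p_overdisp)
```

With only four species the number of distinct permutations is small, so this toy example cannot reach a low P value; it is meant to show the mechanics of the procedure, not its power.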
Figure 2. Modes of phylogenetic tree randomizations for testing phylogenetic patterns at different evolutionary levels. The structure of the tree is left unchanged but species are randomly permuted within the grey areas so that the apparent divergence times between species are modified. (a) Complete tree randomization. (b) Partial tree randomization for the clade shown by the arrow. (c) Partial tree randomization for clades younger than the age threshold shown by the stippled line.
Partial randomizations of the phylogenetic tree
When a non-random community phylogenetic structure is observed, one may wonder which parts of the phylogenetic tree contribute to the pattern. Partial tree randomizations, where species positions are randomized only within a defined clade, allow one to test phylogenetic clustering or overdispersion independently within each clade (Fig. 2b).
A community phylogenetic structure may also show overdispersion within recent clades and clustering within old clades. To assess phylogenetic patterns according to evolutionary depth, partial tree randomizations where species positions are randomized only within clades younger than an age threshold (Fig. 2c) can provide interesting clues, because the resulting pPST (or pΠST) retains part of the phylogenetic information and its expectation lies between IST and PST (or between 0 and ΠST for pΠST). The difference pPST − IST (or pΠST) then expresses the extent of community phylogenetic structuring contributed by the separation of clades that diverged before the age threshold.
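Partial randomizations can be sketched by restricting the permutation of species to predefined groups. In the sketch below, which assumes that δij is the divergence time read from an ultrametric tree, clades younger than the threshold are recovered directly from the distance matrix (species whose divergence time is below the threshold fall in the same clade). The function names and the five-species tree are hypothetical.

```python
import random


def clades_younger_than(div_time, threshold):
    """Group species into clades whose crown age is below the threshold.

    Relies on the distances being ultrametric, so that 'diverged more
    recently than the threshold' is an equivalence relation.
    """
    n = len(div_time)
    assigned = [False] * n
    groups = []
    for i in range(n):
        if assigned[i]:
            continue
        group = [j for j in range(n) if j == i or div_time[i][j] < threshold]
        for j in group:
            assigned[j] = True
        groups.append(group)
    return groups


def permute_within_groups(dist, groups, rng):
    """Shuffle species among tips within each group only."""
    n = len(dist)
    perm = list(range(n))
    for group in groups:
        shuffled = group[:]
        rng.shuffle(shuffled)
        for src, dst in zip(group, shuffled):
            perm[src] = dst
    return [[dist[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


# Hypothetical tree (((A,B),C),(D,E)) with crown ages 1, 4, 3 and root age 10.
div_time = [[0, 1, 4, 10, 10],
            [1, 0, 4, 10, 10],
            [4, 4, 0, 10, 10],
            [10, 10, 10, 0, 3],
            [10, 10, 10, 3, 0]]

groups_5 = clades_younger_than(div_time, 5)  # randomize within both clades
groups_2 = clades_younger_than(div_time, 2)  # only A and B are exchangeable
partial = permute_within_groups(div_time, groups_5, random.Random(0))
print(groups_5, groups_2)
```

Randomizing within a single named clade (Fig. 2b) corresponds to passing that clade as the only group, e.g. `permute_within_groups(div_time, [[0, 1, 2]], rng)`. The permuted matrix can then be passed to whichever PST or ΠST estimator is in use to obtain the partial pseudo-values.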
If phylogenetic overdispersion occurs in some clades while phylogenetic clustering occurs in other clades, these patterns may cancel out. Partial randomizations should thus permit the identification of such situations using PST or ΠST.