Does phylogenetic distance aid in detecting environmental gradients related to species composition?



Questions: How should we evaluate the success of new distance measures combining community abundance and phylogenetic information? How do we interpret ordinations using these metrics?

Methods: We generated synthetic data along a known environmental gradient with two hypothetical underlying phylogenetic structures: niche phylogenetically conserved or dispersed along a gradient. We also examined tree species composition associated with gradients in elevation and longitude in Oregon, USA. NMS ordinations of plots in species space from phylogenetic (PD) and Sørensen distance (SD) matrices allowed comparison of the use of PD in different scenarios.

Results: PD caused plots to cluster based on the clades that they contained, reducing stress with the synthetic data but not with the real example. Phylogenetic distance highlighted clades related to gradients when these were associated. When phylogeny was not conserved along a gradient, that gradient was less strong. Regardless of phylogenetic conservation, NMS using SD consistently extracted the strongest gradients in species composition.

Conclusions: The success of PD should be evaluated on how well it extracts gradients in species composition and allows community ecologists to determine which gradients are partially explained by phylogeny, not on its ability to reduce ordination stress. PD ordinations can help community ecologists interpret niche conservation but may obscure gradients related to species composition when niches are not conserved along the gradient of interest at the scale of the study.


What can community phylogenetics tell us about niches of species in a community and the evolutionary processes that led to their existence? This is a growing area of study that helps us understand evolution and community assembly (Kembel & Hubbell 2006; Emerson & Gillespie 2008). The signal of ecological and spatial relationships of organisms is manifest as phylogenetic and niche clustering or dispersion at different spatial and temporal scales (Emerson & Gillespie 2008). Community analysis techniques that incorporate genetic or morphological relationships are vital in understanding the complexity of the phylogenetic signal in the context of ecological associations. Many studies have found that incorporating phylogenetic distance (PD) in community analysis aids interpretation of ecological gradients in the context of evolutionary relationships and perhaps provides better resolution of environmental gradients than standard distance (SD) measures alone (Webb et al. 2002; Nipperess et al. 2010; Duarte 2011). Most methods use presence–absence (binary) data derived from vegetation abundance data to calculate the relatedness of different communities.

A few new approaches have begun to incorporate abundance measures in ordinations of community data (Nipperess et al. 2010; Pillar & Duarte 2010). Pillar & Duarte (2010) developed an elegant approach in which fuzzy weights indicate species relatedness and produce a PD-style matrix that can be used in ordination. A recent paper has proposed an improved distance measure incorporating abundance and phylogenetic information (Nipperess et al. 2010). This method assigns the abundance of each taxon in each plot to the terminal branch on a phylogenetic tree. Each branch throughout the tree is assigned the summed abundances of the more distal branches; this branch abundance matrix is used to calculate a Bray-Curtis-style similarity index. Both of these techniques may be superior to net relatedness indices (Kembel & Hubbell 2006) because they retain the multivariate nature of the community rather than reducing it to a single variable (Duarte 2011).
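The branch-abundance idea described above can be illustrated with a minimal sketch. The four-taxon tree, its branch lengths and the plot abundances are hypothetical, and weighting each branch by its length is an assumption of this sketch rather than a detail stated here; it is not the phylosim implementation.

```python
# Hypothetical four-taxon tree. Each branch is named by the node at its distal end;
# parent[node] is the next node toward the root and length[node] is the
# (illustrative) length of the branch leading to that node.
parent = {"A": "n1", "B": "n1", "C": "n2", "D": "n2", "n1": "root", "n2": "root"}
length = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0, "n1": 2.0, "n2": 2.0}


def branch_abundances(plot):
    """Add each taxon's abundance to every branch between its tip and the root."""
    totals = {branch: 0.0 for branch in parent}
    for taxon, abundance in plot.items():
        node = taxon
        while node != "root":
            totals[node] += abundance
            node = parent[node]
    return totals


def phylo_bray_curtis(plot1, plot2):
    """Bray-Curtis-style dissimilarity on branch abundances (length-weighted: assumption)."""
    a, b = branch_abundances(plot1), branch_abundances(plot2)
    num = sum(length[k] * abs(a[k] - b[k]) for k in parent)
    den = sum(length[k] * (a[k] + b[k]) for k in parent)
    return num / den


def bray_curtis(plot1, plot2):
    """Ordinary Bray-Curtis dissimilarity on species abundances."""
    taxa = set(plot1) | set(plot2)
    num = sum(abs(plot1.get(t, 0.0) - plot2.get(t, 0.0)) for t in taxa)
    den = sum(plot1.get(t, 0.0) + plot2.get(t, 0.0) for t in taxa)
    return num / den


plot_x = {"A": 10.0}  # contains only taxon A
plot_y = {"B": 10.0}  # contains only B, the sister of A
plot_z = {"C": 10.0}  # contains only C, from the other clade

print(bray_curtis(plot_x, plot_y), bray_curtis(plot_x, plot_z))
print(phylo_bray_curtis(plot_x, plot_y), phylo_bray_curtis(plot_x, plot_z))
```

In this toy case the species-based measure treats both pairs of plots as completely different, whereas the branch-based measure places the two plots holding sister taxa closer together because they share the internal branch leading to their clade.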

The success of Nipperess et al.'s (2010) method was gauged by reduction in stress in a non-metric multidimensional scaling ordination (NMS; Kruskal 1964) without regard to known ecological gradients, which we do not believe to be the best approach for evaluating its success. PD should be further investigated as a tool for community ecologists by testing its ability to recover gradients related to community composition and phylogeny in synthetic and real data sets. A standard approach for evaluating new distance measures and ordination techniques has been to simulate species responses along a gradient and determine how effectively the array of analytical options extracts the known data structure (e.g., Austin & Noy-Meir 1971; Kenkel & Orloci 1986; McCune 1994). We suggest combining this approach with PD to provide a better understanding of its properties.

To explore the effect of PD on the ability of ordination to extract ecological gradients, we used two simulated and a real data set with strong and apparent ecological gradients. The simulated data sets examined two scenarios: (1) species responses to ecological gradients that are phylogenetically conserved and (2) species responses of taxa along the ecological gradient that are unrelated to phylogeny. The real data set consists of tree species along a transect crossing the Cascade Mountains of Oregon, USA. We expect strong elevation and longitudinal gradients to be easily detected in the vegetation community in this region. We build on Nipperess et al.'s (2010) proposal of the new PD metric by discussing its effects in examples with specific ecological gradients, highlighting situations in which it may be most useful to community ecologists, and outlining its potential drawbacks and future directions to take.


Synthetic examples

We propose two test scenarios for species responses: a community with niche conservation or dispersion in relation to the phylogeny. These two types of community were simulated in species response curves along a hypothetical ecological gradient: (1) related species occur in similar positions along the gradient (Ex1), or (2) related species do not occur in similar positions along the gradient (Ex2). We used the hypothetical phylogeny from the example in Nipperess et al. (2010) for both scenarios (Fig. 1a) and generated species response curves using truncated beta functions with shape parameters of 0.9, widths of 40 gradient units, and means varying along the gradient (Fig. 1b, similar to the approach of Minchin 1987). We arbitrarily selected 13 sampling plots along the gradient for study. We calculated standard Bray-Curtis distance among these plots (SD; Bray & Curtis 1957) in the R statistical language (v. 2.7.2; R Foundation for Statistical Computing, Vienna, Austria) with the vegan package (v. 1.17-5), and PD with the ape package (v. 2.6-2) and the phylosim function, using the Sørensen equivalent (Nipperess et al. 2010).
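As a rough illustration of this simulation step, the sketch below builds symmetric beta-shaped response curves and computes Bray-Curtis distances among 13 plots. The exact functional form, the gradient length of 100 units, the species optima, the maximum abundance and the evenly spaced plot positions are all assumptions made for the example; the original analysis was run in R, not Python.

```python
def beta_response(x, optimum, width=40.0, shape=0.9, max_abundance=100.0):
    """Symmetric beta-shaped response, truncated to zero outside its range.

    u rescales x to 0..1 across the species' range; 4*u*(1-u) peaks at 1 when
    u = 0.5, so the curve reaches max_abundance at the optimum (assumed form).
    """
    u = (x - (optimum - width / 2.0)) / width
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return max_abundance * (4.0 * u * (1.0 - u)) ** shape


# Hypothetical optima for six taxa on a gradient assumed to run from 0 to 100.
optima = {"A": 10.0, "B": 25.0, "C": 40.0, "D": 55.0, "E": 70.0, "F": 85.0}
species = sorted(optima)

# Thirteen evenly spaced sampling plots (spacing is an assumption).
plot_positions = [i * 100.0 / 12 for i in range(13)]
abundances = [[beta_response(x, optima[s]) for s in species] for x in plot_positions]


def bray_curtis(row1, row2):
    """Bray-Curtis dissimilarity between two plots' abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(row1, row2))
    den = sum(a + b for a, b in zip(row1, row2))
    return num / den if den > 0 else 0.0


sd = [[bray_curtis(r1, r2) for r2 in abundances] for r1 in abundances]

# Plots at opposite ends of the gradient share no species.
print(round(sd[0][1], 3), round(sd[0][12], 3))
```

Swapping which taxon name is attached to which optimum, while keeping the tree fixed, is one simple way to move between a conserved and a dispersed scenario in a sketch like this.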

Figure 1.

 (a) Hypothetical phylogenetic tree with taxa A–F. (b) Synthetic gradient with species response curves. Species names in Ex1 refer to the case where related species are in similar habitats; in Ex2, related species are in different habitats.

We evaluated the relationship between SD and PD for each of the examples using a Mantel test (Mantel 1967). Non-metric multidimensional scaling (NMS) ordinations (Kruskal 1964) of plots in species space were calculated directly from these distance matrices in PC-Ord (version 6.255 beta, MjM Software, Gleneden Beach, OR, USA) without penalization for ties, using a random start, 250 runs with real data and forcing a one-dimensional solution to focus on the single gradient of interest and to maximize comparability among the methods. To visualize gradient retrieval, we rotated and re-scaled the ordination axes and gradient to the unit interval.
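For readers unfamiliar with the Mantel test, the following is a minimal permutation-based sketch. The two small distance matrices are hypothetical, and the number of permutations, the one-sided p-value and the use of Pearson correlation are choices made for this illustration, not settings reported above.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lower_triangle(matrix, order):
    """Below-diagonal entries after reordering rows and columns by `order`."""
    return [matrix[order[i]][order[j]] for i in range(len(order)) for j in range(i)]


def mantel(d1, d2, permutations=999, seed=1):
    """Matrix correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = lower_triangle(d1, identity)
    observed = pearson(x, lower_triangle(d2, identity))
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute the plots of the second matrix
        if pearson(x, lower_triangle(d2, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical example: six plots on a line, second matrix proportional to the first.
positions = [0, 1, 2, 3, 4, 5]
d1 = [[float(abs(a - b)) for b in positions] for a in positions]
d2 = [[2.0 * v for v in row] for row in d1]
r, p = mantel(d1, d2)
print(r, p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, which is why an ordinary correlation test is not appropriate here.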

Real example

Our vegetation abundance data were basal areas of 31 tree species from 180 plots (USDA 2010) with basal areas exceeding 20 m² ha⁻¹ in a geographic band spanning the Cascade Mountains from longitudes −122.85 to −121.500 and latitudes 44.100 to 44.708 in Oregon, USA. We expected that this band would allow easy detection of gradients in longitude and elevation relating to tree species composition, because it spans the dry to wet sides of the Cascade Mountain range and elevations ranging from 90 to 2100 m.

We obtained sequences from GenBank of the chloroplast DNA locus coding for the protein RuBisCO (rbcL) from each species. Of 31 species, 25 had rbcL sequences available; in two cases, these came from closely related species that would be expected to have very similar patterns in relation to the rest of the taxa (Table 1). We removed species lacking rbcL data from the data set. Sequences were aligned using the MAFFT algorithm in CLC sequence viewer (v. 6.5, CLC bio A/S, Aarhus, Denmark) with the default gap penalty settings (opening penalty of 1.53 and gap extension penalty of 0.123). Ambiguously aligned end regions were excluded using Gblocks (Castresana 2000) with the default settings, leaving 1239 base pairs per species in the final alignment. This alignment of 26 species (the 25 tree species plus the outgroup) was imported to the CIPRES web portal (Miller et al. 2010), and a maximum likelihood analysis was run in RAxML blackbox (Stamatakis 2006) with the default settings. The best tree from this search was rooted with one outgroup (Polystichum munitum) in TreeView (Page 1996) and exported as a Newick file for use in R. When calculating PD, branch lengths were measured as base pair changes per hundred (Fig. 3a).

Table 1.   Acronyms for tree species in forest inventory and analysis plots in Oregon, USA and GenBank numbers for rbcL sequences for each species.

Acronym   Species                                               GenBank number
Abiama    Abies amabilis                                        AB029650
Abicon    Abies concolor                                        AB029648
Abigra    Abies grandis                                         AB029646
Abilas    Abies lasiocarpa                                      AY664855
Abipro    Abies procera                                         AB029651
Calnoo    Callitropsis nootkatensis                             HM024268
Junocc    Juniperus occidentalis                                HM024317
Larocc    Larix occidentalis (sequence from Larix laricina)     AF479878
Caldec    Calocedrus decurrens                                  HM024269
Pinalb    Pinus albicaulis                                      DQ155294
Pincon    Pinus contorta                                        AY497230
Pinlam    Pinus lambertiana                                     DQ155292
Pinmon    Pinus monticola                                       AY497223
Pinpon    Pinus ponderosa                                       AY497234
Psemen    Pseudotsuga menziesii                                 AY664856
Taxbre    Taxus brevifolia                                      AF249666
Thupli    Thuja plicata                                         AF127428
Tsuhet    Tsuga heterophylla                                    X63659
Tsumer    Tsuga mertensiana                                     AF145463
Acemac    Acer macrophyllum                                     DQ978414
Acegla    Acer glabrum                                          DQ978410
Arbmen    Arbutus menziesii                                     AF419813
Poptre    Populus tremuloides                                   AF206812
Poptri    Populus trichocarpa (sequence from P. balsamifera)    AJ418826
Prunus    Prunus spp. (sequence from P. emarginata)             PEU06820

Figure 3.

 (a) Best tree from maximum likelihood analysis of relationships between an Oregon tree species example based on rbcL sequences. (b, c) Species response curves along the elevation and longitude gradients for the Oregon tree species example generated from a kernel smoother of species basal areas on plots along these gradients.

Species response curves along elevation and longitude were generated using a kernel smoother (HyperNiche, version 2.10, MjM Software, Gleneden Beach, OR, USA) for species giving models with a cross-validated xR² > 0.10 (Fig. 3b, c). NMS ordinations of plots in species space were calculated as with the simulated data, except that we forced two-dimensional solutions to focus on the two gradients of interest (longitude and elevation) and because adding a third dimension did not substantially reduce stress. We used a generalized log transformation to decrease the influence of dominant tree species. We rigidly rotated ordinations to maximize comparability.
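The idea of a kernel-smoothed response curve screened by a cross-validated fit can be sketched as follows. This is not HyperNiche's algorithm: the Gaussian kernel, the bandwidth of 250 m, the leave-one-out scheme and the elevation and basal area values are all assumptions chosen for illustration.

```python
from math import exp


def leave_one_out_predictions(x, y, bandwidth):
    """Gaussian-kernel local mean at each point, estimated without that point."""
    predictions = []
    for i, xi in enumerate(x):
        weights = [
            0.0 if j == i else exp(-0.5 * ((xi - xj) / bandwidth) ** 2)
            for j, xj in enumerate(x)
        ]
        predictions.append(sum(w * yj for w, yj in zip(weights, y)) / sum(weights))
    return predictions


def cross_validated_r2(y, predictions):
    """1 - RSS/TSS, with the residuals taken from leave-one-out predictions."""
    mean_y = sum(y) / len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, predictions))
    tss = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - rss / tss


# Hypothetical plots: elevation (m) and basal area of one species (m2/ha).
elevation = [100.0 + 200.0 * i for i in range(11)]
basal_area = [0.0, 1.0, 4.0, 9.0, 14.0, 16.0, 13.0, 8.0, 3.0, 1.0, 0.0]

predictions = leave_one_out_predictions(elevation, basal_area, bandwidth=250.0)
xr2 = cross_validated_r2(basal_area, predictions)
print(round(xr2, 2), xr2 > 0.10)  # the second value mimics the screening threshold
```

Because each prediction excludes the plot being predicted, the cross-validated statistic can be negative for a species with no real relationship to the gradient, which makes a fixed threshold a reasonable screen.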

Figure 2.

 Ordination scores for plots in species space using the synthetic data plotted against the true underlying ecological gradient scores. An ordination that perfectly detected the gradient would follow a line through the origin with a slope of one. Each line represents a different ordination: SD refers to Bray-Curtis distance, PD Ex1 refers to phylogenetic distance when related species are in similar habitats, PD Ex2 refers to phylogenetic distance when related species are in different habitats.


Synthetic examples

Sørensen distance detected the synthetic gradient most accurately compared with the ideal of a perfect correlation with a slope of one (Fig. 2), but also had the greatest stress (Table 2). PD performed better at detecting the synthetic gradient when species responses were related to phylogeny (Ex1) than when they were not (Ex2). Ordination using PD for Ex1 arranged plots in three groups (Fig. 2) to create a step-like interpretation of the gradient with the lowest stress. These groups corresponded to those with only species in the clade with species C–F (plots 1–6), those with species in both clades (plots 7–9) and those dominated by species in the clade with species A and B (plots 10–13). Ordination using PD for Ex2 also arranged plots in groups associated with clades and had a lower stress than ordination using SD, but extracted the gradient poorly.

Table 2.   Statistics comparing the methods. Analyses for synthetic data include Sørensen distance (SD) and abundance incorporating phylogenetic distance (PD) under scenarios one (related species are in similar habitats) and two (related species are in different habitats). Real data are tree basal area data with SD and PD. R² with SD expresses how phylogenetic distance is related to Sørensen distance. Stress percentages are the final stresses from NMS ordinations. R² with gradients express the relationship between NMS ordination scores and gradients, including synthetic gradient with the single ordination axis and elevation and longitude with two ordination axes.

Analysis            R² with SD   Stress    R² with gradient
Synthetic data
  SD                -            9.97      Synthetic: 97
  Scenario 1: PD    71           0.00005   Synthetic: 95
  Scenario 2: PD    57           7.65      Synthetic: 90
Real data
  SD                -            11.28     Elevation: Axis 1: 68, Axis 2: 45
                                           Longitude: Axis 1: 71, Axis 2: 1
  PD                39           11.82     Elevation: Axis 1: 68, Axis 2: 0
                                           Longitude: Axis 1: 63, Axis 2: 1

Real example

Our phylogenetic tree was consistent with our hypothesis that the gymnosperms were sister to the angiosperms. Within the gymnosperms, it was consistent with our expectation that Pinaceae and Cupressaceae were sister groups. In both the angiosperms and Pinaceae, congeners were placed as monophyletic groups (Fig. 3a).

Most tree species had roughly unimodal distributions along gradients of elevation and longitude (Fig. 3b, c). Low- to mid-elevation species included Pseudotsuga menziesii, Tsuga heterophylla, Acer macrophyllum and Juniperus occidentalis. Mid- to high-elevation species included the Pinaceae Abies spp., Tsuga mertensiana and Pinus albicaulis. Western species included the Cupressaceae Thuja plicata, angiosperms Acer glabrum, Acer macrophyllum and Arbutus menziesii, and Pinaceae Pseudotsuga menziesii, Tsuga heterophylla and Abies amabilis. Species of middle longitudes included the Pinaceae Pinus contorta, Tsuga mertensiana and Abies lasiocarpa. Eastern plots were characterized by a mix of Cupressaceae Juniperus occidentalis, Pinaceae Pinus ponderosa, and occasional Populus tremuloides.

As expected, ordination using SD detected strong ecological gradients related to elevation and longitude (Fig. 4a, Table 2). Phylogenetic distances (PD) among plots were generally lower than Sørensen distances (SD), but the distance matrices were weakly correlated (Table 2). Using PD, plots formed clumps in the ordination and showed weak associations with elevation and longitude (Fig. 4b, Table 2). Plots at the top of the ordination included Cupressaceae whereas those on the left included angiosperms and those at the right were entirely composed of members of the Pinaceae. Use of PD weakened the longitudinal gradient by clumping eastern and western plots containing Cupressaceae and angiosperms. The ordination highlighted the elevation gradient, as plots at higher elevations were dominated by Pinaceae.

Figure 4.

 (a) NMS ordination of plots (small grey circles) in species space using Bray-Curtis (SD) distance. The weighted average location of species is shown with clades symbolically overlain. Biplot vectors represent trends in elevation and longitude. (b) NMS ordination of plots in species space using phylogenetic distance (PD) with an overlay showing locations of plots with species from Cupressaceae, Pinaceae and angiosperms. Both elevation and longitude biplot vectors point in the same direction; however, the longitude vector is shorter.


Abundance-based phylogenetic distance (PD; Nipperess et al. 2010) did not consistently enhance our ability to detect ecological gradients relating to species composition; instead it highlighted which gradients were related to phylogeny by bringing plots with related taxa together in a clumped pattern. Where habitats were somewhat phylogenetically conserved along gradients (synthetic Ex1 and elevation in the real data), PD allowed interpretation of clades associated with positions along these gradients. When niches along gradients were not phylogenetically conserved (synthetic Ex2 and longitude in the real data), PD weakened interpretation of environmental gradients.

Incorporating phylogenetic and community abundance information to examine habitat associations is best used as an exploratory tool to examine which patterns in community composition are partially explained by phylogeny. Comparison of ordinations using PD to those using SD can suggest which gradients correspond to phylogenetic conservatism and which are more related to other factors. At small spatial scales, closely related species may have divergent niches (over-dispersion; Kembel & Hubbell 2006; Emerson & Gillespie 2008) and incorporation of PD would be expected to downplay environmental gradients associated with species composition. At larger spatial scales, related species may have similar niches, showing phylogenetic clustering along environmental gradients. In these cases, ordination incorporating PD would be expected to highlight the clustering of clades in similar niches. Over very long gradients, plots may share genera or families but not species or genera; PD could allow analysis that takes into account the potential functional similarity of related taxa. The scale at which over-dispersion and clustering are observed may depend on the environmental gradient.

Community ecologists should use ordinations based on SD and PD to address different questions: ‘Which environmental gradients are most related to species composition?’ is best answered using SD, whereas, ‘Which environmental gradients related to species composition can be partially explained by evolutionary histories?’ is best addressed by comparing SD and PD ordinations. We should also be aware of potential pitfalls of PD. (1) There is no precise definition of what species response surfaces should look like when habitats are phylogenetically conserved (Webb et al. 2002), although conceptual evolutionary scenarios leading to different ecological outcomes have been summarized (Wiens & Graham 2005; Emerson & Gillespie 2008). (2) Phylogenies are working hypotheses; distances and topologies are under revision as we gain more information about evolutionary histories. Despite these caveats, we believe that many patterns in community ecology may be related to evolutionary history, especially at broad spatial scales, and that combining SD with PD will allow further exploration of mechanisms controlling species distributions and habitat associations.

Further development of PD may include a more comprehensive comparison of fuzzy-set weighted communities (Pillar & Duarte 2010) to those calculated with reference to tree topology (Nipperess et al. 2010). Further development of permutation tests could evaluate competing models, similar to those suggested for binary species response data (Kembel & Hubbell 2006). Niches could be interpreted as phylogenetically conserved when related taxa are clustered along a gradient. Since PD highlights this situation, using it should allow detection of strong correlations between ordination scores and the environmental gradient of interest. A potential null model could evaluate whether this correlation was much lower when taxon positions on the phylogenetic tree are randomized. We suggest that future studies develop these randomization tests.
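One possible shape for such a randomization test is sketched below. It is a simplified stand-in, not a tested method: instead of correlating ordination scores with the gradient, it correlates pairwise PD with pairwise separation along the gradient (a Mantel-style statistic). The six-taxon tree with unit branch lengths, the triangular response curves, the plot positions and the 199 randomizations are all hypothetical.

```python
import random
from math import sqrt

# Hypothetical tree with unit branch lengths: ((A,B),((C,D),(E,F))).
parent = {"A": "n1", "B": "n1", "C": "n3", "D": "n3", "E": "n4", "F": "n4",
          "n1": "root", "n3": "n2", "n4": "n2", "n2": "root"}
tips = ["A", "B", "C", "D", "E", "F"]
optimum = dict(zip(tips, [10.0, 25.0, 40.0, 55.0, 70.0, 85.0]))  # conserved niches
gradient = [i * 100.0 / 12 for i in range(13)]

# Triangular response curves (an assumed, simplified shape).
plots = [{t: max(0.0, 1.0 - abs(x - optimum[t]) / 20.0) for t in tips} for x in gradient]


def branch_totals(plot, tip_of):
    """Sum abundances onto every branch from each taxon's assigned tip to the root."""
    totals = {b: 0.0 for b in parent}
    for taxon, abundance in plot.items():
        node = tip_of[taxon]
        while node != "root":
            totals[node] += abundance
            node = parent[node]
    return totals


def pd_values(tip_of):
    """Pairwise Bray-Curtis-style distances on branch abundances (lower triangle)."""
    totals = [branch_totals(p, tip_of) for p in plots]
    values = []
    for i in range(len(plots)):
        for j in range(i):
            num = sum(abs(totals[i][b] - totals[j][b]) for b in parent)
            den = sum(totals[i][b] + totals[j][b] for b in parent)
            values.append(num / den)
    return values


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


separation = [abs(gradient[i] - gradient[j]) for i in range(len(gradient)) for j in range(i)]
observed = pearson(pd_values({t: t for t in tips}), separation)

# Null distribution: shuffle which taxon sits at which tip of the tree.
rng = random.Random(42)
null = []
for _ in range(199):
    shuffled = tips[:]
    rng.shuffle(shuffled)
    null.append(pearson(pd_values(dict(zip(tips, shuffled))), separation))

p_value = (sum(r >= observed for r in null) + 1) / (len(null) + 1)
print(round(observed, 3), p_value)
```

Shuffling tip labels keeps the tree shape and the community data intact and breaks only the link between relatedness and gradient position, which is the property the proposed null model is meant to isolate.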

We applaud Nipperess et al.'s (2010) development of ordination of community data using phylogenetic distance that incorporates abundances. However, we felt that its utility needed to be examined, not by noting a reduction in stress but by examining its performance with known environmental gradients. We have addressed this omission using two small examples and discussed situations in which this exciting new approach may be most helpful in community ecology.


David Nipperess provided the phylogeny from his example. The paper benefitted from comments by Bruce McCune and the community analysis discussion group at Oregon State University. Leandro Duarte, Mark Fishbein and David Nipperess provided valuable reviews of the manuscript.