Does phylogenetic distance aid in detecting environmental gradients related to species composition?
Article first published online: 11 JUL 2011
© 2011 International Association for Vegetation Science
Journal of Vegetation Science
Volume 22, Issue 6, pages 1143–1148, December 2011
How to Cite
Root, H. T. and Nelson, P. R. (2011), Does phylogenetic distance aid in detecting environmental gradients related to species composition?. Journal of Vegetation Science, 22: 1143–1148. doi: 10.1111/j.1654-1103.2011.01320.x
Co-ordinating Editor: Michael Palmer
- Issue published online: 5 OCT 2011
- Received 7 March 2011, Accepted 6 June 2011
- Community analysis;
- Niche conservation;
- NMS ordination;
- Oregon trees;
- Phylogenetic distance;
- Synthetic data
Questions: How should we evaluate the success of new distance measures combining community abundance and phylogenetic information? How do we interpret ordinations using these metrics?
Methods: We generated synthetic data along a known environmental gradient with two hypothetical underlying phylogenetic structures: niche phylogenetically conserved or dispersed along a gradient. We also examined tree species composition associated with gradients in elevation and longitude in Oregon, USA. NMS ordinations of plots in species space from phylogenetic (PD) and Sørensen distance (SD) matrices allowed comparison of the use of PD in different scenarios.
Results: PD caused plots to cluster based on the clades that they contained, reducing stress with the synthetic data but not with the real example. Phylogenetic distance highlighted clades related to gradients when these were associated. When phylogeny was not conserved along a gradient, that gradient was less strong. Regardless of phylogenetic conservation, NMS using SD consistently extracted the strongest gradients in species composition.
Conclusions: The success of PD should be evaluated on how well it extracts gradients in species composition and allows community ecologists to determine which gradients are partially explained by phylogeny and not based on its ability to reduce ordination stress. PD ordinations can help community ecologists interpret niche conservation but may obscure gradients related to species composition when niches are not conserved along the gradient of interest at the scale of the study.
What can community phylogenetics tell us about niches of species in a community and the evolutionary processes that led to their existence? This is a growing area of study that helps us understand evolution and community assembly (Kembel & Hubbell 2006; Emerson & Gillespie 2008). The signal of ecological and spatial relationships of organisms is manifest as phylogenetic and niche clustering or dispersion at different spatial and temporal scales (Emerson & Gillespie 2008). Community analysis techniques that incorporate genetic or morphological relationships are vital in understanding the complexity of the phylogenetic signal in the context of ecological associations. Many studies have found that incorporating phylogenetic distance (PD) in community analysis aids interpretation of ecological gradients in the context of evolutionary relationships and perhaps provides better resolution of environmental gradients than standard distance (SD) measures alone (Webb et al. 2002; Nipperess et al. 2010; Duarte 2011). Most methods use presence–absence (binary) data derived from vegetation abundance data to calculate relatedness of different communities.
A few new approaches have begun to incorporate abundance measures in ordinations of community data (Nipperess et al. 2010; Pillar & Duarte 2010). Pillar & Duarte (2010) developed an elegant approach in which fuzzy weights indicate species relatedness and produce a PD-style matrix that can be used in ordination. A recent paper has proposed an improved distance measure incorporating abundance and phylogenetic information (Nipperess et al. 2010). This method assigns the abundance of each taxon in each plot to the terminal branch on a phylogenetic tree. Each branch throughout the tree is assigned the summed abundances of the more distal branches; this branch abundance matrix is used to calculate a Bray-Curtis-style similarity index. Both of these techniques may be superior to net relatedness indices (Kembel & Hubbell 2006) because they incorporate the multivariate nature of the community more fully than a reduction to a single variable could (Duarte 2011).
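The branch-abundance idea described above can be illustrated with a minimal sketch. It assumes a toy four-species tree, made-up branch lengths, and weighting of each branch by its length; the names `TOY_TREE`, `branch_abundances`, `phylo_distance` and `species_distance` are hypothetical and are not the phylosim function or its interface.

```python
from typing import Dict, FrozenSet, List, Tuple

# Each branch is (length, set of tips distal to it).
# Hypothetical toy tree: topology and lengths are illustrative only.
Branch = Tuple[float, FrozenSet[str]]

TOY_TREE: List[Branch] = [
    (1.0, frozenset({"A"})),
    (1.0, frozenset({"B"})),
    (1.0, frozenset({"C"})),
    (1.0, frozenset({"D"})),
    (2.0, frozenset({"A", "B"})),  # internal branch above clade (A, B)
    (2.0, frozenset({"C", "D"})),  # internal branch above clade (C, D)
]


def branch_abundances(plot: Dict[str, float], tree: List[Branch]) -> List[float]:
    """Give each branch the summed abundance of all tips distal to it."""
    return [sum(plot.get(tip, 0.0) for tip in tips) for _, tips in tree]


def bray_curtis(x: List[float], y: List[float], w: List[float]) -> float:
    """Weighted Bray-Curtis dissimilarity; all weights 1 gives the usual index."""
    num = sum(wi * abs(a - b) for a, b, wi in zip(x, y, w))
    den = sum(wi * (a + b) for a, b, wi in zip(x, y, w))
    return num / den if den > 0 else 0.0


def phylo_distance(p1: Dict[str, float], p2: Dict[str, float],
                   tree: List[Branch]) -> float:
    """Bray-Curtis-style distance on branch abundances (length weighting assumed)."""
    lengths = [length for length, _ in tree]
    return bray_curtis(branch_abundances(p1, tree),
                       branch_abundances(p2, tree), lengths)


def species_distance(p1: Dict[str, float], p2: Dict[str, float]) -> float:
    """Ordinary Bray-Curtis (Sorensen-type) distance on species abundances."""
    species = sorted(set(p1) | set(p2))
    x = [p1.get(s, 0.0) for s in species]
    y = [p2.get(s, 0.0) for s in species]
    return bray_curtis(x, y, [1.0] * len(species))


if __name__ == "__main__":
    plot_a, plot_b, plot_c = {"A": 10.0}, {"B": 10.0}, {"C": 10.0}
    # No shared species, so the species-based distance is maximal in both cases.
    print(species_distance(plot_a, plot_b), species_distance(plot_a, plot_c))
    # Sister species A and B share an internal branch, so PD is smaller.
    print(phylo_distance(plot_a, plot_b, TOY_TREE))
    # A and C share no branch in this toy tree, so PD stays maximal.
    print(phylo_distance(plot_a, plot_c, TOY_TREE))
```

In this toy case two plots with no species in common are maximally distant under the species-based index, yet are drawn together under the branch-based index when their species are sisters, which is the behaviour that groups plots by clade.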
The success of Nipperess et al.'s (2010) method was gauged by reduction in stress in a non-metric scaling ordination (NMS, Kruskal 1964) without regard to known ecological gradients, which we do not believe to be the best approach for evaluating its success. PD should be further investigated as a tool for community ecologists by testing its ability to recover gradients related to community composition and phylogeny in synthetic and real data sets. A standard approach for evaluating new distance measures and ordination techniques has been to simulate species responses along a gradient and determine how effectively the array of analytical options extracts the known data structure (e.g., Austin & Noy-Meir 1971; Kenkel & Orloci 1986; McCune 1994). We suggest combining this approach with PD to provide a better understanding of its properties.
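For reference, the stress criterion discussed above can be written in the form usually attributed to Kruskal (1964). The notation is ours and is an assumption about presentation only: d_ij is the distance between plots i and j in the ordination, and \hat{d}_{ij} is the value fitted by monotone regression of those distances on the original dissimilarities.

```latex
S = \sqrt{\frac{\sum_{i<j} \left( d_{ij} - \hat{d}_{ij} \right)^{2}}{\sum_{i<j} d_{ij}^{2}}}
```

A low S indicates only that the rank order of the input dissimilarities is well preserved in the ordination. It carries no information about whether a known environmental gradient has been recovered, which is why stress alone is a weak criterion for judging a distance measure.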
To explore the effect of PD on the ability of ordination to extract ecological gradients, we used two simulated data sets and one real data set with strong and apparent ecological gradients. The simulated data sets examined two scenarios: (1) species responses to ecological gradients that are phylogenetically conserved and (2) species responses of taxa along the ecological gradient that are unrelated to phylogeny. The real data set consists of tree species along a transect crossing the Cascade Mountains of Oregon, USA. We expect strong elevation and longitudinal gradients to be easily detected in the vegetation community in this region. We build on Nipperess et al.'s (2010) proposal of the new PD metric by discussing its effects in examples with specific ecological gradients, highlighting situations in which it may be most useful to community ecologists, and outlining its potential drawbacks and future directions to take.
We propose two test scenarios for species responses: a community with niche conservation or dispersion in relation to the phylogeny. These two types of community were simulated in species response curves along a hypothetical ecological gradient: (1) related species occur in similar positions along the gradient (Ex1), or (2) related species do not occur in similar positions along the gradient (Ex2). We generated the hypothetical phylogeny used in the example from Nipperess et al. (2010; Fig. 1a) for both scenarios (Fig. 1a) along with species response curves using truncated beta functions with shape parameters of 0.9, widths of 40 gradient units, and means varying along the gradient (Fig. 1b, similar to the approach of Minchin 1987). We arbitrarily selected 13 sampling plots along the gradient for study. We calculated standard Bray-Curtis distance among these plots (SD; Bray & Curtis 1957) using the R statistical language (v. 2.7.2, R Foundation for Statistical Computing, Vienna, Austria) with the vegan package (v. 1.17-5), and PD using the ape package (v. 2.6-2) and the phylosim function with the Sørensen equivalent (Nipperess et al. 2010).
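A minimal sketch of how such a synthetic data set might be built follows. It assumes a symmetric beta-type curve scaled to a fixed peak, evenly spaced plots, and evenly spaced species means; the exact functional form, the peak abundance, `SPECIES_MEANS` and `PLOT_POSITIONS` are hypothetical choices, since the text specifies only the shape parameter (0.9), the width (40 gradient units) and the number of plots (13).

```python
from typing import List


def beta_response(x: float, mean: float, width: float = 40.0,
                  shape: float = 0.9, peak: float = 100.0) -> float:
    """Symmetric beta-type response, zero outside mean +/- width / 2."""
    lower = mean - width / 2.0
    t = (x - lower) / width  # relative position within the species' range
    if t <= 0.0 or t >= 1.0:
        return 0.0
    # 4 t (1 - t) equals 1 at t = 0.5, so the curve reaches `peak` at the mean.
    return peak * (4.0 * t * (1.0 - t)) ** shape


def bray_curtis(a: List[float], b: List[float]) -> float:
    """Bray-Curtis dissimilarity between two plots."""
    num = sum(abs(p - q) for p, q in zip(a, b))
    den = sum(p + q for p, q in zip(a, b))
    return num / den if den > 0 else 0.0


# Hypothetical layout: six species and 13 plots along a gradient of 0 to 96 units.
SPECIES_MEANS = [10.0, 25.0, 40.0, 55.0, 70.0, 85.0]
PLOT_POSITIONS = [8.0 * i for i in range(13)]

# Rows are plots, columns are species.
plots = [[beta_response(x, m) for m in SPECIES_MEANS] for x in PLOT_POSITIONS]

# Full plot-by-plot SD matrix.
sd = [[bray_curtis(p, q) for q in plots] for p in plots]

if __name__ == "__main__":
    for x, row in zip(PLOT_POSITIONS, plots):
        print(x, [round(v, 1) for v in row])
```

Reassigning the same response curves to different tips of the phylogeny, rather than changing the curves themselves, is one way to move between a conserved and a dispersed scenario while holding the species-based distances constant.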
We evaluated the relationship between SD and PD for each of the examples using a Mantel test (Mantel 1967). Non-metric multidimensional scaling (NMS) ordinations (Kruskal 1964) of plots in species space were calculated directly from these distance matrices in PC-Ord (version 6.255 beta, MjM Software, Gleneden Beach, OR, USA) without penalization for ties, using a random start, 250 runs with real data and forcing a one-dimensional solution to focus on the single gradient of interest and to maximize comparability among the methods. To visualize gradient retrieval, we rotated and re-scaled the ordination axes and gradient to the unit interval.
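The Mantel comparison of two distance matrices can be sketched as a simple permutation test. This is an illustration under stated assumptions rather than the routine used for the analyses: the function names, the one-sided test, the number of permutations and the toy matrices are all hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def upper(m: Matrix) -> List[float]:
    """Flatten the upper triangle (excluding the diagonal) of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a: Matrix, b: Matrix, permutations: int = 999,
           seed: int = 0) -> Tuple[float, float]:
    """Return (r, one-sided p) by permuting rows and columns of b together."""
    rng = random.Random(seed)
    n = len(a)
    flat_a = upper(a)
    observed = pearson(flat_a, upper(b))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper(permuted)) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


# Toy matrices: seven plots on a line, with the second matrix a monotone
# (square-root) distortion of the first.
positions = [float(i) for i in range(7)]
dist_a = [[abs(p - q) for q in positions] for p in positions]
dist_b = [[math.sqrt(abs(p - q)) for q in positions] for p in positions]

if __name__ == "__main__":
    r, p = mantel(dist_a, dist_b)
    print(round(r ** 2, 3), p)
```

Squaring the returned r gives a quantity comparable in kind to an R² between two distance matrices.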
Our vegetation abundance data were basal areas of 31 tree species from 180 plots (USDA 2010) with basal areas exceeding 20 m² ha⁻¹ in a geographic band spanning the Cascade Mountains from longitudes −122.85 to −121.500 and latitudes 44.100 to 44.708 in Oregon, USA. We expected that this band would allow easy detection of gradients in longitude and elevation relating to tree species composition, which spans the dry to wet sides of the Cascade Mountain range and elevations ranging from 90 to 2100 m.
We obtained sequences from GenBank of the chloroplast DNA locus coding for the protein RuBisCO (rbcL) from each species. Of 31 species, 25 had rbcL sequences available; in two cases, these came from closely related species that would be expected to have very similar patterns in relation to the rest of the taxa (Table 1). We removed species lacking rbcL data from the data set. Sequences were aligned using the MAFFT algorithm in CLC sequence viewer (v. 6.5, CLC bio A/S, Aarhus, Denmark) with the default gap penalty settings (opening penalty of 1.53 and gap extension penalty of 0.123). Ambiguously aligned end regions were excluded using Gblocks (Castresana 2000) with the default settings, leaving 1239 base pairs per species in the final alignment. This alignment of 26 species was imported to the CIPRES web portal (Miller et al. 2010), and a maximum likelihood analysis was run in RAxML blackbox (Stamatakis 2006) with the default settings. The best tree from this search was rooted with one outgroup (Polystichum munitum) in TreeView (Page 1996) and exported as a NEWICK file for use in R. When calculating PD, branch lengths were measured as base pair changes per hundred (Fig. 3a).
|Code||Species (source of sequence)||GenBank accession|
|Larocc||Larix occidentalis (sequence from Larix laricina)||AF479878|
|Poptri||Populus trichocarpa (sequence from P. balsamifera)||AJ418826|
|Prunus||Prunus spp. (sequence from P. emarginata)||PEU06820|
Species response curves along elevation and longitude were generated using a kernel smoother (HyperNiche, version 2.10, MjM Software, Gleneden Beach, OR, USA) for species giving models with a cross-validated xR² > 0.10 (Fig. 3b, c). NMS ordinations of plots in species space were calculated as with the simulated data, except that we forced two-dimensional solutions to focus on the two gradients of interest (longitude and elevation) and because adding a third dimension did not substantially reduce stress. We used a generalized log transformation to decrease the influence of dominant tree species. We rigidly rotated ordinations to maximize comparability.
Sørensen distance detected the synthetic gradient most accurately compared with the ideal of a perfect correlation with a slope of one (Fig. 2), but also had the greatest stress (Table 2). PD performed better at detecting the synthetic gradient when species responses were related to phylogeny (Ex1) than when they were not (Ex2). Ordination using PD for Ex1 arranged plots in three groups (Fig. 2) to create a step-like interpretation of the gradient with the lowest stress. These groups corresponded to those with only species in the clade with species C-F (plots 1–6), those with species in both clades (plots 7–9) and those dominated by species in the clade with species A and B (plots 10–13). Ordination using PD for Ex2 also arranged plots in groups associated with clades and had a lower stress than ordination using SD, but extracted the gradient poorly.
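The comparison against the ideal of a slope of one can be made concrete with a small sketch. It assumes a one-dimensional ordination whose axis direction is arbitrary, rescales scores and gradient to the unit interval, and reports both R² and the root-mean-square departure from the 1:1 line. The scores below are invented to mimic a smooth solution and a three-group, step-like solution; they are not values from the analyses, and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def rescale_unit(values: List[float]) -> List[float]:
    """Linearly rescale values so that the minimum is 0 and the maximum is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def align_to_gradient(scores: List[float],
                      gradient: List[float]) -> Tuple[List[float], List[float]]:
    """Rescale both to 0..1 and reflect the axis if it runs against the gradient."""
    s, g = rescale_unit(scores), rescale_unit(gradient)
    if pearson(s, g) < 0:
        s = [1.0 - v for v in s]
    return s, g


def rmse_from_identity(s: List[float], g: List[float]) -> float:
    """Root-mean-square departure of rescaled scores from the 1:1 line."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, g)) / len(s))


# Hypothetical gradient positions for 13 plots.
gradient = [8.0 * i for i in range(13)]
# Invented scores: a smooth axis that happens to run backwards ...
smooth_scores = [3.0 - 0.5 * x for x in gradient]
# ... and a step-like axis with three groups (plots 1-6, 7-9, 10-13).
stepped_scores = [0.0] * 6 + [0.5] * 3 + [1.0] * 4

if __name__ == "__main__":
    for name, scores in (("smooth", smooth_scores), ("stepped", stepped_scores)):
        s, g = align_to_gradient(scores, gradient)
        print(name, round(pearson(s, g) ** 2, 3), round(rmse_from_identity(s, g), 3))
```

A step-like axis can still correlate strongly with the gradient while departing visibly from the 1:1 line, so reporting both quantities separates a high correlation from faithful recovery of plot positions.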
|Analysis||R² with SD||Stress||R² with gradient|
|Scenario 1: PD||71||0.00005||Synthetic: 95|
|Scenario 2: PD||57||7.65||Synthetic: 90|
|SD||-||11.28||Elevation: Axis 1: 68, Axis 2: 45; Longitude: Axis 1: 71, Axis 2: 1|
|PD||39||11.82||Elevation: Axis 1: 68, Axis 2: 0; Longitude: Axis 1: 63, Axis 2: 1|
Our phylogenetic tree was consistent with our hypothesis that the gymnosperms were sister to angiosperms. Within the gymnosperms, it was consistent with our expectation that Pinaceae and Cupressaceae were sister groups. In both the angiosperms and Pinaceae, congeners were placed as monophyletic groups (Fig. 3a).
Most tree species had roughly unimodal distributions along gradients of elevation and longitude (Fig. 3b, c). Low- to mid-elevation species included Pseudotsuga menziesii, Tsuga heterophylla, Acer macrophyllum and Juniperus occidentalis. Mid- to high-elevation species included the Pinaceae Abies spp., Tsuga mertensiana and Pinus albicaulis. Western species included the Cupressaceae Thuja plicata, angiosperms Acer glabrum, Acer macrophyllum and Arbutus menziesii, and Pinaceae Pseudotsuga menziesii, Tsuga heterophylla and Abies amabilis. Species of middle longitudes included the Pinaceae Pinus contorta, Tsuga mertensiana and Abies lasiocarpa. Eastern plots were characterized by a mix of Cupressaceae Juniperus occidentalis, Pinaceae Pinus ponderosa, and occasional Populus tremuloides.
As expected, ordination using SD detected strong ecological gradients related to elevation and longitude (Fig. 4a, Table 2). Phylogenetic distances (PD) among plots were generally lower than Sørensen distances (SD), but the distance matrices were weakly correlated (Table 2). Using PD, plots formed clumps in the ordination and showed weak associations with elevation and longitude (Fig. 4b, Table 2). Plots at the top of the ordination included Cupressaceae whereas those on the left included angiosperms and those at the right were entirely composed of members of the Pinaceae. Use of PD weakened the longitudinal gradient by clumping eastern and western plots containing Cupressaceae and angiosperms. The ordination highlighted the elevation gradient, as plots at higher elevations were dominated by Pinaceae.
Abundance-based phylogenetic distance (PD; Nipperess et al. 2010) did not consistently enhance our ability to detect ecological gradients relating to species composition; instead it highlighted which gradients were related to phylogeny by bringing plots with related taxa together in a clumped pattern. Where habitats were somewhat phylogenetically conserved along gradients (synthetic Ex1 and elevation in the real data), PD allowed interpretation of clades associated with positions along these gradients. When niches along gradients were not phylogenetically conserved (synthetic Ex2 and longitude in the real data), PD weakened interpretation of environmental gradients.
Incorporating phylogenetic and community abundance information to examine habitat associations is best used as an exploratory tool to examine which patterns in community composition are partially explained by phylogeny. Comparison of ordinations using PD to those using SD can suggest which gradients correspond to phylogenetic conservatism and which are more related to other factors. At small spatial scales, closely related species may have divergent niches (over-dispersion; Kembel & Hubbell 2006; Emerson & Gillespie 2008) and incorporation of PD would be expected to downplay environmental gradients associated with species composition. At larger spatial scales, related species may have similar niches, showing phylogenetic clustering along environmental gradients. In these cases, ordination incorporating PD would be expected to highlight the clustering of clades in similar niches. Over very long gradients, plots may share genera or families but not species or genera; PD could allow analysis that takes into account the potential functional similarity of related taxa. The scale at which over-dispersion and clustering are observed may depend on the environmental gradient.
Community ecologists should use ordinations based on SD and PD to address different questions: ‘Which environmental gradients are most related to species composition?’ is best answered using SD, whereas, ‘Which environmental gradients related to species composition can be partially explained by evolutionary histories?’ is best addressed by comparing SD and PD ordinations. We should also be aware of potential pitfalls of PD. (1) There is no precise definition of what species response surfaces should look like when habitats are phylogenetically conserved (Webb et al. 2002), although conceptual evolutionary scenarios leading to different ecological outcomes have been summarized (Wiens & Graham 2005; Emerson & Gillespie 2008). (2) Phylogenies are working hypotheses; distances and topologies are under revision as we gain more information about evolutionary histories. Despite these caveats, we believe that many patterns in community ecology may be related to evolutionary history, especially at broad spatial scales, and that combining SD with PD will allow further exploration of mechanisms controlling species distributions and habitat associations.
Further development of PD may include a more comprehensive comparison of fuzzy-set weighted communities (Pillar & Duarte 2010) to those calculated with reference to tree topology (Nipperess et al. 2010). Further development of permutation tests could evaluate competing models, similar to those suggested for binary species response data (Kembel & Hubbell 2006). Niches could be interpreted as phylogenetically conserved when related taxa are clustered along a gradient. Since PD highlights this situation, using it should allow detection of strong correlations between ordination scores and the environmental gradient of interest. A potential null model could evaluate whether this correlation was much lower when taxon positions on the phylogenetic tree are randomized. We suggest that future studies develop these randomization tests.
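One way the suggested null model might look is sketched here. To stay short, it swaps in a simpler statistic for the ordination step: the correlation between pairwise PD and pairwise separation of plots along the gradient, rather than the correlation of NMS scores with the gradient. The tree, branch lengths, triangular response curves, optima and all function names are hypothetical.

```python
import math
import random
from typing import Dict, FrozenSet, List, Tuple

Branch = Tuple[float, FrozenSet[str]]

# Hypothetical six-tip tree with clades (A, B) and (C, D, E, F); lengths are made up.
TREE: List[Branch] = (
    [(1.0, frozenset(t)) for t in "ABCDEF"]
    + [(3.0, frozenset("AB")), (3.0, frozenset("CDEF")),
       (1.0, frozenset("CD")), (1.0, frozenset("EF"))]
)
# Hypothetical optima: clade C-F occupies the low end, clade A-B the high end.
OPTIMA = {"C": 10.0, "D": 25.0, "E": 40.0, "F": 55.0, "A": 70.0, "B": 85.0}
PLOTS_X = [8.0 * i for i in range(13)]


def abundance(x: float, optimum: float, half_width: float = 20.0) -> float:
    """Triangular response curve (a simplification chosen for brevity)."""
    return max(0.0, 1.0 - abs(x - optimum) / half_width)


def community(optima: Dict[str, float]) -> List[Dict[str, float]]:
    """Abundance of every species in every plot along the gradient."""
    return [{sp: abundance(x, opt) for sp, opt in optima.items()} for x in PLOTS_X]


def pd_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Length-weighted Bray-Curtis distance on branch abundances."""
    num = den = 0.0
    for length, tips in TREE:
        a = sum(p[t] for t in tips)
        b = sum(q[t] for t in tips)
        num += length * abs(a - b)
        den += length * (a + b)
    return num / den


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gradient_signal(optima: Dict[str, float]) -> float:
    """Stand-in statistic: correlation of pairwise PD with gradient separation."""
    plots = community(optima)
    pd, gap = [], []
    for i in range(len(plots)):
        for j in range(i + 1, len(plots)):
            pd.append(pd_distance(plots[i], plots[j]))
            gap.append(abs(PLOTS_X[i] - PLOTS_X[j]))
    return pearson(pd, gap)


def tip_shuffle_test(permutations: int = 199,
                     seed: int = 1) -> Tuple[float, List[float], float]:
    """Compare the observed statistic with values from randomized tip labels."""
    rng = random.Random(seed)
    observed = gradient_signal(OPTIMA)
    tips = sorted(OPTIMA)
    null = []
    for _ in range(permutations):
        shuffled = tips[:]
        rng.shuffle(shuffled)
        # Each response curve is kept but moved to a random tip of the tree.
        relabeled = {new: OPTIMA[old] for old, new in zip(tips, shuffled)}
        null.append(gradient_signal(relabeled))
    p = (sum(v >= observed for v in null) + 1) / (permutations + 1)
    return observed, null, p


if __name__ == "__main__":
    obs, null, p = tip_shuffle_test()
    print(round(obs, 3), round(sum(null) / len(null), 3), p)
```

Shuffling tip labels leaves the species response curves and the tree shape untouched and changes only which curve sits on which tip, so any drop in the statistic under the null reflects the loss of association between phylogeny and gradient position.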
We applaud Nipperess et al.'s (2010) development of ordination of community data using phylogenetic distance that incorporates abundances. However, we felt that its utility needed to be examined, not by noting a reduction in stress but by examining its performance with known environmental gradients. We have addressed this omission using two small examples and discussed situations in which this exciting new approach may be most helpful in community ecology.
David Nipperess provided the phylogeny from his example. The paper benefitted from comments by Bruce McCune and the community analysis discussion group at Oregon State University. Leandro Duarte, Mark Fishbein and David Nipperess provided valuable reviews of the manuscript.
- Austin & Noy-Meir. 1971. The problem of non-linearity in ordination: experiments with two-gradient models. Journal of Ecology 59: 763–774.
- Bray & Curtis. 1957. An ordination of the upland forest communities in southern Wisconsin. Ecological Monographs 27: 325–349.
- Castresana. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution 17: 540–552.
- Duarte. 2011. Phylogenetic habitat filtering influences forest nucleation in grasslands. Oikos 120: 208–215.
- Emerson & Gillespie. 2008. Phylogenetic analysis of community assembly and structure over space and time. Trends in Ecology and Evolution 23: 619–630.
- Kembel & Hubbell. 2006. The phylogenetic structure of a neotropical forest tree community. Ecology 87: S86–S99.
- Kenkel & Orloci. 1986. Applying metric and nonmetric multidimensional scaling to ecological studies: some new results. Ecology 67: 919–928.
- Kruskal. 1964. Nonmetric multidimensional scaling: a numerical method. Psychometrika 29: 115–129.
- Mantel. 1967. The detection of disease clustering and a generalized regression approach. Cancer Research 27: 209–220.
- McCune. 1994. Improving community analysis with the Beals smoothing function. Ecoscience 1: 82–86.
- Miller et al. 2010. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. Proceedings of the Gateway Computing Environments Workshop (GCE) 14: 1–8.
- Minchin. 1987. Simulation of multidimensional community patterns: towards a comprehensive model. Vegetatio 71: 145–156.
- Nipperess et al. 2010. Resemblance in phylogenetic diversity among ecological assemblages. Journal of Vegetation Science 21: 1–12.
- Page. 1996. TREEVIEW: an application to view phylogenetic trees on personal computers. Computer Applications in the Biosciences 12: 357–358.
- Pillar & Duarte. 2010. A framework for metacommunity analysis of phylogenetic structure. Ecology Letters 13: 587–596.
- Stamatakis. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688.
- USDA-Forest Service. 2010. Online database of all United States Department of Agriculture forest health monitoring and forest inventory and analysis datasets. Available at: http://fia.fs.fed.us/tools-data/data (accessed December 29, 2010).
- Webb et al. 2002. Phylogenies and community ecology. Annual Review of Ecology, Evolution and Systematics 33: 475–505.
- Wiens & Graham. 2005. Niche conservatism: integrating evolution, ecology, and conservation biology. Annual Review of Ecology, Evolution and Systematics 36: 519–539.