1. Here, I present a new, multifunctional phylogenetics package, phytools, for the R statistical computing environment.
2. The focus of the package is on methods for phylogenetic comparative biology; however, it also includes tools for tree inference, phylogeny input/output, plotting, manipulation and several other tasks.
3. I describe and tabulate the major methods implemented in phytools, and in addition provide some demonstration of its use in the form of two illustrative examples.
4. Finally, I conclude by briefly describing an active web-log that I use to document present and future developments for phytools. I also note other web resources for phylogenetics in the R computational environment.
In recent decades, phylogenies have assumed a central role in evolutionary biology (Felsenstein 1985, 2004; Harvey & Pagel 1991; Losos 2011). Among phylogeneticists, the scientific computing environment R (R Development Core Team 2011) has grown by leaps and bounds in popularity, particularly since the development of the multifunctional ‘ape’ (Analysis of Phylogenetics and Evolution) R package (Paradis, Claude & Strimmer 2004) and since the publication of Paradis’s ‘UseR!’ phylogenetics book (Paradis 2006). Recent years have witnessed a rapid expansion of the phylogenetic capabilities of R in the form of numerous contributed packages. Most, such as the popular packages ‘geiger’ (Harmon et al. 2008) and ‘phangorn’ (Schliep 2011), work by building off the functionality and data structures developed in ape.
Throughout, I have striven to maximize the interoperability of phytools with the ape package. For instance, one of the new functionalities of phytools is the capacity to generate, plot, read, and write stochastic character mapped trees (Nielsen 2002). Rather than create a new type of R object to store stochastically mapped phylogenies, I instead build directly on the existing ‘phylo’ structure developed for ape and employed in many other R phylogenetics packages. At present, phytools is not interoperable with the ‘phylobase’ package (R Hackathon et al. 2011), although this capability will be added in the future.
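To make the point about object structure concrete, the following is a minimal sketch of what a stochastically mapped tree looks like in R. The tree size, the seed, and the two-state transition matrix are arbitrary choices made for illustration only; pbtree and sim.history are phytools functions.

```r
library(phytools)  # also attaches ape

set.seed(3)                        # arbitrary seed, for reproducibility only
tree <- pbtree(n = 20, scale = 1)  # illustrative 20-tip pure-birth tree

# hypothetical two-state transition matrix (rows sum to zero)
Q <- matrix(c(-1, 1, 1, -1), 2, 2,
            dimnames = list(c("A", "B"), c("A", "B")))
mtree <- sim.history(tree, Q)

# the mapped tree is still a 'phylo' object
print(inherits(mtree, "phylo"))

# the mapping is stored in an extra list element with one entry per edge;
# each entry gives the time spent in each state along that edge
print(mtree$maps[[1]])

# the times along each edge sum to that edge's length
seg.sums <- sapply(mtree$maps, sum)
print(all.equal(as.numeric(seg.sums), as.numeric(mtree$edge.length)))
```

Because the object remains a ‘phylo’ object, functions written for ape trees can in general still be applied to it.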
In the sections that follow, I will describe the major functionality of the phytools library; I will provide two illustrative examples that demonstrate some of the functionality of phytools; and, finally, because phytools is a work in progress, I will describe a web-log that I will use to keep phytools users up to date on bugs, updates, and future software development for the package.
So far, I have implemented numerous functions for the phytools package; however, I should also note that phytools is a work in progress and I expect the capabilities of phytools to expand considerably in the coming years. In Table 1, I provide an annotated list of the major functions thus far implemented in the phytools library. These functions cover methods in a few different areas of phylogenetic biology, described later.
Table 1. Major functions of the phytools package
Adds a tip to all edges of a tree
Generates all possible bi- and multifurcating trees for a set of taxa
Performs ancestral character estimation with a trend using likelihood
Creates an animation of Brownian motion evolution with speciation
Plots a stochastic character map format tree (Fig. 3)
Plots a phylogenetic tree with several options (Fig. 2)
Reads one or multiple stochastic character map format trees from file (Bollback 2006)
Reorders the edges of a stochastic map format tree
Re-roots a phylogenetic tree at an arbitrary position along an edge
Simulates a stochastic history for a discretely valued character trait on the tree
Simulates multiple evolutionary rates on the tree using a Brownian evolution model
Cuts a tree and returns all subtrees
Writes stochastic map style trees to file
Several methods in phylogenetic comparative biology have been implemented in phytools. These cover a wide range of areas including ancestral character estimation (e.g. anc.trend), likelihood-based methods for studying the evolution of character traits over time (e.g. brownie.lite, evol.vcv, fitDiversityModel and phylosig), a Bayesian method for detecting the location of a rate shift in the tree (evol.rate.mcmc), estimation of phylogenetic signal, including with sampling error (phylosig), and various methods for statistical hypothesis testing in a phylogenetic context (e.g. phyl.cca, phyl.pairedttest, phyl.pca and phyl.resid).
Several simulation methods are implemented in phytools. These include Brownian motion simulation under various conditions (fastBM), simulation of discrete character evolution (sim.history), simulation of stochastic character maps (make.simmap) and simulation of multiple evolutionary rates (sim.rates), among other functionality (Table 1).
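As a brief illustration of the Brownian motion simulator, the sketch below simulates a single trait at two different rates and then several traits at once. The tree (generated here with pbtree), the seed, and the rate values are assumptions chosen only for the example.

```r
library(phytools)  # also attaches ape

set.seed(42)                        # arbitrary seed
tree <- pbtree(n = 100, scale = 1)  # illustrative 100-tip pure-birth tree

# one trait under two different Brownian rates (sig2 values are illustrative)
x.slow <- fastBM(tree, sig2 = 1)
x.fast <- fastBM(tree, sig2 = 10)

# a higher rate should generally produce more variable tip values
print(var(x.slow))
print(var(x.fast))

# several independent traits at once: one row per tip, one column per simulation
X <- fastBM(tree, nsim = 3)
print(dim(X))
```

With nsim greater than one, the result is a matrix rather than a named vector, which is convenient for multivariate methods.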
A few different phylogenetic inference procedures are implemented in the phytools package. These functions are, in general, highly dependent on calculations and algorithms in the phangorn library of Schliep (2011). Some of the functionality includes matrix representation parsimony supertree estimation (mrp.supertree) and least-squares phylogeny inference (optim.phylo.ls; Table 1).
Several graphical methods are implemented in phytools. Among these are projection of a tree into bivariate morphospace (phylomorphospace; Fig. 1), plotting stochastic character maps and histories (plotSimmap), lineage through time plotting with extinct lineages (ltt), animation of Brownian motion and speciation (branching.diffusion), and other functions (Table 1).
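Two of these plotting functions can be tried on simulated data as follows. This is only a sketch: the tree, the seed, and the two Brownian traits are invented for the example, and default plotting options are used throughout.

```r
library(phytools)  # also attaches ape

set.seed(7)                        # arbitrary seed
tree <- pbtree(n = 30, scale = 1)  # illustrative 30-tip pure-birth tree

# two simulated continuous traits; rows are named by tip label
X <- fastBM(tree, nsim = 2)
colnames(X) <- c("trait1", "trait2")

# project the tree into the bivariate trait space
phylomorphospace(tree, X)

# lineage-through-time plot; the returned object holds the plotted values
lineages <- ltt(tree)
print(head(cbind(time = lineages$times, lineages = lineages$ltt)))
```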
In addition to the aforesaid scientific functions, phytools also includes a number of utility functions for phylogeny input, output and manipulation. These are meant to supplement and complement the existing diverse array of utility functions in the ape and phangorn packages. Several of these functions are listed in Table 1.
To demonstrate the use of phytools, I have created two short illustrative examples that can be easily reproduced by the reader. In the first, I use simulated data and the phytools function evol.rate.mcmc to identify the location of a shift in the evolutionary rate over time (Revell et al. in press). In the second, I simulate a stochastic discrete character history and a continuous character whose rate depends on the discrete character state, and then I fit a multi-rate Brownian character evolution model using the phytools function brownie.lite (O’Meara et al. 2006).
Example 1: Detecting the Location of a Rate Shift
In this example, I first simulate a stochastic pure-birth phylogeny; next, I simulate evolutionary change for a single continuously valued character on the phylogeny under two different evolutionary rates in different parts of the tree; I analyse the tree and data using the Bayesian MCMC method for identifying the location of a shift in the evolutionary rate over time (Revell et al. in press); finally, I analyse the MCMC results to estimate the location of the shift and the evolutionary rates tipward and rootward of this point.
First, I load the phytools package. This also loads ape and other required packages:
> # load the phytools package (and ape)
> require(phytools)
Loading required package: phytools
Loading required package: ape....
Next, I set the random number seed for reproducibility (here, it is just set to 1):
> set.seed(1) # set the seed
I use the ape function rbdtree to simulate a stochastic pure-birth tree. In this instance, the tree has 91 taxa.
> # simulate a tree (using ape)
> tree<-rbdtree(b=log(50),d=0,Tmax=1)
Now, for the purposes of simulation, I split the tree at a predetermined position – specified here by the number of the descendant node and the distance along the edge from the root. To do this, I use the phytools function splitTree. It should be noted that the node and edge position used below are only guaranteed to work conditioned on having set the random number seed at 1 (see above), otherwise a different split point should be chosen.
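The splitting step can be sketched roughly as follows. This is an illustration under stated assumptions, not the listing used for the analysis: pbtree is substituted for rbdtree so that the number of tips is fixed, the split point is chosen programmatically (node numbers depend on the particular simulated tree), and the tenfold rate multiplier is an assumed value.

```r
library(phytools)  # also attaches ape

set.seed(1)
# assumption: pbtree is used instead of rbdtree to fix the number of tips
tree <- pbtree(n = 50, scale = 1)

# illustrative split point: halfway along the longest edge that leads
# to an internal node (bp is measured from the rootward end of the edge)
internal <- which(tree$edge[, 2] > Ntip(tree))
e <- internal[which.max(tree$edge.length[internal])]
split <- list(node = tree$edge[e, 2], bp = tree$edge.length[e] / 2)

# cut the tree into a rootward part and a tipward subtree
trees <- splitTree(tree, split)

# stretch the tipward subtree (including its root edge) by an assumed
# rate multiplier, so branch lengths become rate multiplied by time
rate.mult <- 10
trees[[2]]$edge.length <- trees[[2]]$edge.length * rate.mult
trees[[2]]$root.edge <- trees[[2]]$root.edge * rate.mult

# reattach the stretched subtree to obtain the generating tree
sim.tree <- paste.tree(trees[[1]], trees[[2]])
print(sim.tree)
```

Simulating on the stretched tree with a single Brownian rate is then equivalent to simulating on the original tree with a higher rate tipward of the split.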
I can then plot the generating tree for simulation (which has its branches stretched to be proportional to the evolutionary rate multiplied by time; Fig. 2), using the phytools function plotTree, and simulate on this stretched tree using phytools function fastBM:
> # plot the generating tree for simulation
> plotTree(sim.tree,fsize=0.5)
> x<-fastBM(sim.tree) # simulate on the tree
Now, I perform Bayesian MCMC analysis using the phytools function evol.rate.mcmc (Revell et al. in press). This analysis required about 20 min on a Dell i5 650 CPU running at 3·20 GHz.
The MCMC function first prints the control parameters (which can be set by the user, although above they have been given their default values, see below), and then prints the state of the MCMC chain at a frequency given by the control parameter print (here, every 100 generations; generations after 200 not shown above).
Next, I can estimate the location of the shift point by finding the split in the posterior sample with the smallest summed distance to all the other samples (this is one of multiple possible criteria; see Revell et al. in press). For this analysis, I use the phytools function minSplit and exclude the first 20 000 generations as burn-in:
This analysis took about 6 s to run on the same hardware as described earlier.
Finally, I need to pre-process the posterior sample to get the sampled rates tipward and rootward of the average shift, for each sample (see Revell et al. in press). I do this using the phytools function posterior.evolrate. I can then print the results (estimated shift point and evolutionary rates) to screen:
Here, the parameter estimates are very close to the generating shift point of [153, 0·09] and to the generating evolutionary rates.
It should be noted that in actual practice, users should pay much closer attention to the control parameters of the MCMC than is given here, and in particular, to the proposal distribution for the model parameters. More information about function control can be obtained by calling the help file of evol.rate.mcmc:
> ?evol.rate.mcmc
or by referring to Revell et al. (in press). In addition, users should assess convergence and compute effective sample sizes for their samples from the posterior distribution. This can be carried out using the MCMC diagnostics package ‘coda’ (Plummer et al. 2006). Please refer to Revell et al. (in press) for more information about this method.
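The workflow of this example, from the MCMC through to the effective sample sizes, can be sketched end to end as below. This is a deliberately shortened illustration and not the analysis reported above: the tree is a small pbtree simulation, the data are simulated under a single rate purely to exercise the functions, the chain length and burn-in fraction are arbitrary, and the column names "sig1" and "sig2" are assumed to be those used in the MCMC output.

```r
library(phytools)  # also attaches ape; coda is called via coda::

set.seed(1)
tree <- pbtree(n = 30, scale = 1)  # small illustrative tree
x <- fastBM(tree)                  # single-rate data, for illustration only

# run a short chain (far too short for real inference)
res <- evol.rate.mcmc(tree, x, ngen = 2000, control = list(print = 500))

# discard the first quarter of the samples as burn-in (arbitrary choice)
n.samp <- nrow(res$mcmc)
keep <- (floor(n.samp / 4) + 1):n.samp

# split with the smallest summed distance to all other sampled splits
shift <- minSplit(tree, res$mcmc[keep, c("node", "bp")])

# rates rootward and tipward of that split, for each retained sample
post <- posterior.evolrate(tree, shift, res$mcmc[keep, ], res$tips[keep])

print(shift)
print(colMeans(post[, c("sig1", "sig2")]))

# effective sample sizes for the two rates
print(coda::effectiveSize(coda::mcmc(post[, c("sig1", "sig2")])))
```

For a real analysis, the chain should be much longer and the effective sample sizes should be checked before the posterior means are interpreted.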
Example 2: Simulate and Analyse Multi-Rate Brownian Evolution
In this example, I first simulate the character history of a discretely valued character trait with three states evolving on a phylogeny. I then simulate the evolution of a continuous trait with a rate that depends on the value of the discrete trait. Finally, I fit single and multiple rate evolutionary models to the data and tree using the likelihood method of O’Meara et al. (2006).
After loading phytools, I first set the seed (arbitrarily to 10; done here for reproducibility only):
> set.seed(10) # set the seed
Now, I simulate a stochastic pure-birth tree using ape:
> tree<-rbdtree(b = log(50),d = 0,Tmax = 1)
This tree contains 129 taxa. Next, I simulate a stochastic character history on the tree for a character with three states, A, B, and C, using the phytools function sim.history as follows:
> # this is our transition matrix
> Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3)
> rownames(Q)<-colnames(Q)<-c("A","B","C")
> mtree<-sim.history(tree,Q)
I can plot the simulated history using the phytools function plotSimmap to see what it looks like:
> # set colors
> cols<-c("red","blue","green")
> names(cols)<-rownames(Q)
> # plot tree with labels off
> plotSimmap(mtree,cols,ftype="off")
Next, I simulate continuous character evolution using three different rates using the phytools function sim.rates:
> # set rates
> sig2<-c(1,10,100)
> names(sig2)<-rownames(Q)
> X<-sim.rates(mtree,sig2) # simulate
Finally, I fit a multi-rate Brownian model using the likelihood method of O’Meara et al. (2006) with the phytools function brownie.lite. This likelihood optimization took about 3 s to run on the same hardware described earlier.
         B          C          A
10.4646399 99.9563314  0.8247106
"Optimization has converged."
It should be noted that the order of the three rate regimes in the fitted model is the order in which they are encountered in the tree (Fig. 3), rather than in alphabetical or numerical order. In this case, the fitted parameter estimates (0·82, 10·46, 99·96) are very close to their generating values (1, 10, and 100). For more details on this likelihood method, please refer to O’Meara et al. (2006) or Revell (2008).
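The steps of this example can be collected into one self-contained sketch, shown below. It is an approximation rather than the original listing: pbtree is substituted for rbdtree so that the tree size is fixed, the number of tips and the maxit value are assumptions, and the fitted values will therefore differ from those printed above.

```r
library(phytools)  # also attaches ape

set.seed(10)
# assumption: pbtree is used instead of rbdtree to fix the number of tips
tree <- pbtree(n = 100, scale = 1)

# three-state transition matrix with equal rates among states A, B and C
Q <- matrix(c(-2, 1, 1, 1, -2, 1, 1, 1, -2), 3, 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
mtree <- sim.history(tree, Q)

# state-dependent Brownian rates, named by state
sig2 <- setNames(c(1, 10, 100), rownames(Q))
X <- sim.rates(mtree, sig2)

# fit the single-rate and multi-rate models (maxit value is an assumption)
fit <- brownie.lite(mtree, X, maxit = 4000)

print(fit$sig2.single)    # rate under the one-rate model
print(fit$sig2.multiple)  # one rate per mapped state, named by state
print(c(single = fit$logL1, multiple = fit$logL.multiple))
print(fit$convergence)
```

Indexing fit$sig2.multiple by state name, for example fit$sig2.multiple[rownames(Q)], avoids relying on the order in which the regimes are reported.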
phytools development web-log and other resources
The package so far implements a number of methods for phylogenetic comparative biology, phylogeny inference, tree manipulation and graphing. However, phytools remains a work in progress. To keep users of phytools up to date on bugs, improvements, and new functionality, I maintain an active web-log (i.e. ‘blog’; http://phytools.blogspot.com). This blog acts both as a conduit between the developer (presently myself) and users of the phytools package and as a sort of open lab notebook (Butler 2005; Bradley et al. 2011) in which I document the details of bug fixes, software implementation, and use. Most of the functions listed earlier have already been featured on the blog (in the course of their development and refinement). Future work on phytools will also be documented there.
Scientists using phytools in a published paper should cite this article. Users can also cite the phytools package directly if they are so inclined. Citation information can be obtained by typing:
> citation("phytools")
at the command prompt.
Credit is due to L. Harmon for encouraging me to learn R, develop phytools, and publish this note. Thanks to L. Mahler for sharing his data and helping to create Fig. 2. C. Boettiger, an associate editor, and an anonymous reviewer provided very helpful criticism on an earlier version of this article.