Measuring the temporal structure in serially sampled phylogenies

Authors

  • Rebecca R. Gray,

    1. Department of Pathology, Immunology and Laboratory Medicine and Emerging Pathogens Institute, University of Florida, PO Box 100009, 2055 Mowry Rd, Gainesville, FL, USA, 32611
  • Oliver G. Pybus,

    1. Department of Zoology, University of Oxford, South Parks Road, Oxford, OX1 3PS, United Kingdom
  • Marco Salemi

    Corresponding author
    1. Department of Pathology, Immunology and Laboratory Medicine and Emerging Pathogens Institute, University of Florida, PO Box 100009, 2055 Mowry Rd, Gainesville, FL, USA, 32611

Correspondence author. E-mail: salemi@pathology.ufl.edu

Summary

1. Nucleotide sequences sampled at different times (serially sampled sequences) allow researchers to study the rate of evolutionary change and the demographic history of populations. Some phylogenies inferred from serially sampled sequences are described as having strong ‘temporal clustering’, such that sequences from the same sampling time tend to cluster together and to be the direct ancestors of sequences from the following sampling time. The degree to which phylogenies exhibit these properties is thought to reflect interesting biological processes, such as positive selection or deviation from the molecular clock hypothesis.

2. Here, we introduce the temporal clustering (TC) statistic, which is the first quantitative measure of the degree of topological ‘temporal clustering’ in a serially sampled phylogeny. The TC statistic represents the expected deviation of an observed phylogeny from the null hypothesis of no temporal clustering, as a proportion of the range of possible values, and can therefore be compared among phylogenies of different sizes.

3. We apply the TC statistic to a range of serially sampled sequence data sets, which represent both rapidly evolving viruses and ancient mitochondrial DNA. We also calculate the TC statistic for phylogenies simulated under a neutral coalescent process.

4. Our results indicate significant temporal clustering in many empirical data sets. However, we also find that such clustering is exhibited by trees simulated under a neutral coalescent process; hence, the observation of significant ‘temporal clustering’ does not by itself indicate the presence of strong positive selection in a population.

5. Quantifying topological structure in this manner will provide new insights into the evolution of measurably evolving populations.
