The coup de grâce for the nested clade phylogeographic analysis?
Article first published online: 23 OCT 2007
© 2007 The Author
Volume 17, Issue 2, pages 516–518, January 2008
How to Cite
PETIT, R. J. (2008), The coup de grâce for the nested clade phylogeographic analysis?. Molecular Ecology, 17: 516–518. doi: 10.1111/j.1365-294X.2007.03589.x
- Issue published online: 20 DEC 2007
- Received 5 September 2007; revision received 18 September 2007; accepted 18 September 2007
Keywords: permutation test; random mating
Nested clade phylogeographic analysis (NCPA) has become a popular method for reconstructing the history of populations across species ranges. Ever since its invention in 1995, it has drawn criticism, but the method, which has been regularly updated, continues to attract investigators. Molecular Ecology has published a large fraction of the literature on the topic, both pro and con. A recent study by Panchal and Beaumont (2007) finally allows a precise evaluation of the method by developing software that automates the somewhat complicated NCPA procedure. Using simulations of random-mating populations, Panchal and Beaumont find a high frequency of false-positives with their automated NCPA procedure (over 75%). These findings, which echo and amplify earlier warnings, appear serious enough that researchers would be well advised to await further evaluation of the method before using it. Although no other all-encompassing method such as NCPA currently exists to evaluate phylogeographic data sets, researchers have many alternative methods to test ever more refined hypotheses.
As pointed out by Hey & Machado (2003) and Wakeley (2003), the study of structured populations is a difficult and divided science. Two different schools of thought are recognized: the phylogeographic school, which relies on graphical representation of intraspecific phylogenetic trees or networks (e.g. Avise 2000), and the mathematical one, which avoids this step and relies instead on more abstract mathematical models of population genetics. The essence of the problem was well captured by Smouse (1998) in his paper entitled ‘To tree or not to tree’. A current illustration of this debate is provided by the controversy surrounding one of the most successful tree-based methods of analysis of phylogeographic data, the nested clade phylogeographic analysis (NCPA, Templeton et al. 1995; Templeton 1998, 2001, 2004). NCPA seeks to distinguish between different historical processes that might have influenced the geographic distribution of haplotypes or lineages relative to higher-level clades. It includes permutation tests of the spatial distribution of these lineages and a protocol for inference of the processes resulting in the particular phylogeographic structure observed (based on an inference key). As noted by Hey & Machado (2003), the promise to identify the relevant processes (isolation by distance, past fragmentation, range expansions, ...) has raised considerable interest, because such inferences are generally impossible with any single alternative method. A user-friendly program, called geodis, was devised to help researchers implement the NCPA method (Posada et al. 2000). As of August 2007, there were over 1600 citations of the original NCPA papers, and about 20% of studies in phylogeography make use of this method (Fig. 1), illustrating its wide diffusion in the field.
Alternative methods to NCPA have been proposed, based on analytical models, and these are considered to better take into account the stochasticity inherent in the evolutionary process (Knowles & Maddison 2002; Knowles 2004). Nevertheless, many researchers have continued to use NCPA for single-locus data, pleading for pluralism and pointing to the lack of handy alternatives to analyse the hard-won data (e.g. Weiss & Ferrand 2006), while efforts to include multiple gene trees in NCPA studies have been made (Templeton 2004). Other criticisms of the NCPA method have emerged following the realization that there were deficiencies in the inferences drawn from NCPA for specific historical scenarios or demographic events (e.g. Alexandrino et al. 2002; Paulo et al. 2002; Masta et al. 2003). These findings have led to changes in the methodology or to recommendations on when the approach is more likely to fail (Templeton 2004). A more serious criticism has focused on the practical implementation of the permutation tests used to evaluate the data (Petit & Grivet 2002). Contrary to what is customary when testing the existence of spatial genetic structure by permutation, it is not the populations that are being reshuffled at random in the NCPA procedure but the individuals themselves. As a consequence, processes that can affect local haplotype frequencies, such as bottlenecks (Johnson et al. 2007) or mating between relatives, are being confounded with historical processes that have shaped the genetic structure at the range-wide scale, greatly increasing the risk of false-positives (Fig. 2). According to Templeton (2002), the NCPA method would still be valid provided some variation exists within populations, but it is unclear how much within-population variation is needed for the method to perform well. Despite these uncertainties, NCPA has continued to attract much interest (Fig. 1).
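The difference between the two randomization schemes can be illustrated with a toy simulation. The sketch below is not the NCPA procedure or the geodis implementation: the sampling design (eight populations on a line, ten individuals each), the two haplotypes 'A' and 'B', the 90% local frequencies and the simplified clade-distance statistic are all assumptions made for illustration. Each population is dominated by one haplotype, as after a local bottleneck, but the dominant haplotype is assigned to populations at random, so there is no range-wide structure. The test asks whether carriers of 'A' are more geographically clustered than expected, using either individual-level or population-level shuffling.

```python
import random

N_POPS, N_IND = 8, 10  # hypothetical design: 8 populations on a line, 10 individuals each

def clade_distance(pops, target="A"):
    """Simplified clade distance: mean distance of carriers of `target`
    from their geographic centre. Population i sits at position i."""
    xs = [i for i, pop in enumerate(pops) for h in pop if h == target]
    centre = sum(xs) / len(xs)
    return sum(abs(x - centre) for x in xs) / len(xs)

def simulate(rng):
    """Locally drifted populations without range-wide structure:
    4 randomly chosen populations are 90% 'A', the other 4 are 90% 'B'."""
    a_major = set(rng.sample(range(N_POPS), N_POPS // 2))
    pops = []
    for i in range(N_POPS):
        major, minor = ("A", "B") if i in a_major else ("B", "A")
        pops.append([major] * (N_IND - 1) + [minor])
    return pops

def shuffle_individuals(pops, rng):
    """Reshuffle haplotypes among all individuals, ignoring populations."""
    flat = [h for pop in pops for h in pop]
    rng.shuffle(flat)
    return [flat[i * N_IND:(i + 1) * N_IND] for i in range(N_POPS)]

def shuffle_populations(pops, rng):
    """Reshuffle whole populations among locations, keeping each
    population's haplotype composition intact."""
    shuffled = pops[:]
    rng.shuffle(shuffled)
    return shuffled

def p_value(pops, shuffler, rng, n_perm=99):
    """One-sided permutation p-value for a significantly small clade distance."""
    observed = clade_distance(pops)
    hits = sum(clade_distance(shuffler(pops, rng)) <= observed + 1e-9
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

def false_positive_rate(shuffler, n_rep=200, seed=1):
    """Fraction of structure-free data sets declared significant at 5%."""
    data_rng = random.Random(seed)      # same data sets for both schemes
    perm_rng = random.Random(seed + 1)  # separate stream for permutations
    rejections = sum(p_value(simulate(data_rng), shuffler, perm_rng) <= 0.05
                     for _ in range(n_rep))
    return rejections / n_rep

fpr_individuals = false_positive_rate(shuffle_individuals)
fpr_populations = false_positive_rate(shuffle_populations)
print("individual-level shuffling:", fpr_individuals)
print("population-level shuffling:", fpr_populations)
```

In this toy setting, shuffling individuals should reject the null hypothesis far more often than the nominal 5%, because the local clumping of haplotypes within populations is mistaken for geographic structure. Shuffling populations should stay close to or below the nominal level, since population compositions are exchangeable across locations by construction.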
By far the largest limitation of NCPA is that few simulation studies have been carried out to validate the inferences made. The only type of validation performed by the proponents of the method was the analysis of a number of case studies with ‘known’ a priori expectations (Templeton 1998). Knowles & Maddison (2002) carried out the first (small-scale) simulation study to examine whether NCPA was able to infer the processes used to simulate the data (few simulations were used because the results had to be evaluated manually using the inference key). They found that NCPA was misleading in 70% of the cases. However, the significance of these simulations was disputed on the grounds that some key aspects of the method had been misrepresented (Templeton 2004).
The new paper by Panchal & Beaumont (2007) describes the automation of the whole NCPA inference process (building the haplotype network, performing the nesting and permutation tests, and applying the inference key) and its evaluation on a larger number of simulated data sets. The description of the corresponding computer program (aneca) is in Panchal (2007); another program called autoinfer was published at nearly the same time to achieve the same objectives (Zhang et al. 2006). After carefully checking that the automated method was working, Panchal and Beaumont simulated, under a panmictic model, 100 genealogies from which they sampled DNA sequences from varying numbers of populations and individuals. Finally, they ran geodis on these simulated samples. The proportion of false-positives reached 76%. This is actually a conservative estimate, since 87% of the simulated data sets have at least one of the two spatial statistics (Dc or Dn) that is significant. The (false) inferences arrived at with their simulated panmictic data sets (restricted gene flow with isolation by distance and contiguous range expansion) are precisely those most frequently found in previous analyses of empirical data sets.
In conclusion, at least three independent analyses (Knowles & Maddison 2002; Petit & Grivet 2002; Panchal & Beaumont 2007) have now shown that the NCPA method returns an excess of false-positives under a number of scenarios. In retrospect, the propensity of NCPA to return too many significant outcomes might have contributed to its appeal among researchers eager to identify statistical support for their interpretation of the data. Given these findings, I suggest that the method no longer be used until it has been more thoroughly and critically evaluated, which should be facilitated by the availability of new automated NCPA procedures.
I thank Martin Lascoux, Mahesh Panchal and Mark Beaumont for helpful comments on an earlier draft of this paper.
- Alexandrino et al. (2002) Nested clade analysis and the genetic evidence for population expansion in the phylogeography of the golden-striped salamander, Chioglossa lusitanica (Amphibia: Urodela). Heredity, 88, 66–74.
- Avise (2000) Phylogeography: The History and Formation of Species. Harvard University Press, Cambridge, Massachusetts.
- Hey & Machado (2003) The study of structured populations — new hope for a difficult and divided science. Nature Reviews Genetics, 4, 535–543.
- Johnson et al. (2007) Effects of recent population bottlenecks on reconstructing the demographic history of prairie-chickens. Molecular Ecology, 16, 2203–2222.
- Knowles (2004) The burgeoning field of statistical phylogeography. Journal of Evolutionary Biology, 17, 1–10.
- Knowles & Maddison (2002) Statistical phylogeography. Molecular Ecology, 11, 2623–2635.
- Masta et al. (2003) Population genetic structure of the toad Bufo woodhousii: an empirical assessment of the effects of haplotype extinction on nested cladistic analysis. Molecular Ecology, 12, 1541–1554.
- Panchal (2007) The automation of nested clade phylogeographic analysis. Bioinformatics, 23, 509–510.
- Panchal & Beaumont (2007) The automation and evaluation of nested clade phylogeographic analysis. Evolution, 61, 1466–1480.
- Paulo et al. (2002) Using nested clade analysis to assess the history of colonization and the persistence of populations of an Iberian lizard. Molecular Ecology, 11, 809–819.
- Petit & Grivet (2002) Optimal randomization strategies when testing the existence of a phylogeographic structure. Genetics, 161, 469–471.
- Posada et al. (2000) geodis: a program for the cladistic nested analysis of the geographical distribution of genetic haplotypes. Molecular Ecology, 9, 487–488.
- Smouse (1998) To tree or not to tree. Molecular Ecology, 7, 399–412.
- Templeton (1998) Nested clade analyses of phylogeographic data: testing hypotheses about gene flow and population history. Molecular Ecology, 7, 381–397.
- Templeton (2001) Using phylogeographic analyses of gene trees to test species status and processes. Molecular Ecology, 10, 779–791.
- Templeton (2002) ‘Optimal’ randomization strategies when testing the existence of a phylogeographic structure: a reply to Petit and Grivet. Genetics, 161, 473–475.
- Templeton (2004) Statistical phylogeography: methods of evaluating and minimizing inference errors. Molecular Ecology, 13, 789–809.
- Templeton et al. (1995) Separating population structure from population history: a cladistic analysis of the geographical distribution of mitochondrial-DNA haplotypes in the Tiger Salamander, Ambystoma tigrinum. Genetics, 140, 767–782.
- Wakeley (2003) Inferences about the structure and history of populations: coalescents and intraspecific phylogeography. In: The Evolution of Population Biology (eds Singh R, Uyenoyama M), pp. 193–215. Cambridge University Press, Cambridge, UK.
- Weiss & Ferrand (2006) Current perspectives in phylogeography and the significance of South European refugia in the creation and maintenance of European biodiversity. In: Phylogeography of Southern European Refugia (eds Weiss S, Ferrand N), pp. 341–357. Springer, Amsterdam, The Netherlands.
- Zhang et al. (2006) autoinfer 1.0: a computer program to infer biogeographical events automatically. Molecular Ecology Notes, 6, 597–599.