Starting with “mitochondrial Eve” in 1987, genetics has played an increasingly important role in studies of the last two million years of human evolution. It initially appeared that genetic data resolved the basic models of recent human evolution in favor of the “out-of-Africa replacement” hypothesis in which anatomically modern humans evolved in Africa about 150,000 years ago, started to spread throughout the world about 100,000 years ago, and subsequently drove to complete genetic extinction (replacement) all other human populations in Eurasia. Unfortunately, many of the genetic studies on recent human evolution have suffered from scientific flaws, including misrepresenting the models of recent human evolution, focusing upon hypothesis compatibility rather than hypothesis testing, committing the ecological fallacy, and failing to consider a broader array of alternative hypotheses. Once these flaws are corrected, there is actually little genetic support for the out-of-Africa replacement hypothesis. Indeed, when genetic data are used in a hypothesis-testing framework, the out-of-Africa replacement hypothesis is strongly rejected. The model of recent human evolution that emerges from a statistical hypothesis-testing framework does not correspond to any of the traditional models of human evolution, but it is compatible with fossil and archaeological data. These studies also reveal that any one gene or DNA region captures only a small part of human evolutionary history, so multilocus studies are essential. As more and more loci become available, genetics will undoubtedly offer additional insights and resolutions of human evolution.
With the publication of a paper entitled “Mitochondrial DNA and human evolution” by Cann et al. in 1987, genetics has played an increasingly important role in our understanding of human evolution over the last two million years. Cann et al. (1987) presented a genetic survey based on restriction site polymorphisms in 147 human mitochondrial DNA samples whose maternal origins came from five different geographical regions that spanned the globe. They estimated the tree of the resulting 133 mitochondrial haplotypes using maximum parsimony, although incorrectly (Maddison 1991). Subsequent data and analyses showed that despite their errors, two basic inferences were correct: (1) the mitochondrial tree was rooted in Africa, and (2) all the branches were relatively short, implying a recent common mitochondrial ancestor (dubbed “Eve” in popular accounts). From these observations, Cann et al. concluded that their data supported a model of human evolution in which anatomically modern humans first evolved in Africa around 150,000 years ago, then spread out from Africa starting around 100,000 years ago, totally replacing all other human populations such that all living humans are descendants only from this expanding African population. This out-of-Africa replacement hypothesis (Stringer and Andrews 1988) was already being debated in the paleoanthropological literature of the time, with the major alternative in the debate being the multiregional hypothesis that human populations in both Africa and Eurasia contributed to the evolution of anatomically modern humans (Wolpoff et al. 1988).
Although Cann et al. (1987) did not present a pictorial representation of these alternative models, many subsequent papers and textbooks did, with Figure 1 being redrawn from Lewin (1989, p. 104), one of the first textbooks on human evolution to feature the work of Cann et al. (1987). Both hypotheses in Figure 1 have the human lineage initially only in Africa, and both have the human lineage expanding out of Africa at the time of what is now called Homo erectus, an expansion most recently dated to 1.9–1.7 million years ago (Mya; Aguirre and Carbonell 2001; Bar-Yosef and Belfer-Cohen 2001; Antón et al. 2002; Vekua et al. 2002). As shown in Figure 1A, the candelabra model (often called the “multiregional model”) then posits that three separate lineages of H. erectus were established after this early Pleistocene expansion out of Africa, and each lineage independently evolved into their modern forms. In contrast, the out-of-Africa replacement model in Figure 1B has a second expansion out of Africa at about 100,000 years ago that represents the single African human population that had evolved into modern form. The earlier Eurasian lineages are shown as broken, to portray replacement such that all living humans are descendants only of this second population to expand out of Africa. Both models represent the relationship of African, European, and Asian populations as separate branches on an evolutionary tree, but the branch lengths are very different in the two models.
Cann et al. (1987) noted that their mitochondrial tree did not have branches displaying evidence of antiquity and that non-Africans did not represent distinct lineages, as expected under Figure 1A. Hence, the genetic data seemingly falsified the multiregional hypothesis and supported the out-of-Africa replacement hypothesis. Soon thereafter, the out-of-Africa replacement hypothesis became the model of human evolution, particularly in undergraduate textbooks and the popular science literature.
Unfortunately, there were serious flaws in the argument of Cann et al. (1987) and in subsequent studies supporting replacement, flaws that have been perpetuated even until today. First, Figure 1A is not the multiregional hypothesis. Second, Cann et al. performed no hypothesis testing. Third, there are many potential models of human evolution, not just two, so even if one is falsified, it does not mean that the other one is necessarily true. A fourth flaw in much subsequent work is the ecological fallacy (Freedman 1999) in which a higher order pattern is used to “prove” through goodness of fit that an underlying process model is true. These flaws will be examined in this perspective, but it will also be shown that although genetic data cannot be used to prove a favored hypothesis to be true, they can be used to reject or falsify hypotheses, which is the more traditional and epistemologically justified role of the scientific method. The model that emerges when all inferences are based upon rejecting null hypotheses overlays well upon fossil and archaeological data in a manner that is extremely informative of recent human evolution.
WHAT IS THE MULTIREGIONAL HYPOTHESIS?
The “multiregional” model shown in Figure 1A requires an amazing parallel evolution of nonmodern humans into their modern forms. This model was already largely discredited before the publication of Cann et al. (1987) just on the basis of the theoretical implausibility of such a threefold parallel evolution. Moreover, early genetic studies had already falsified this hypothesis by showing that the genetic differences among the major populations of humanity were rather small and showed no evidence of ancient, highly divergent sublineages (Lewontin 1972). So why was there even an on-going debate about the multiregional model in 1987? The answer is straightforward: Figure 1A was not the multiregional model being debated at the time (except to some of the advocates of replacement). The multiregional model was formulated by the anthropologist Franz Weidenreich (1946), who provided a pictorial representation of his model, which is redrawn here as Figure 2. In total contrast to the “multiregional” model in Figure 1A, the original multiregional model in Figure 2 regards human populations as being interconnected by nearly continuous gene flow throughout the Pleistocene, with the gene flow being of sufficient magnitude such that the human continental populations define an intertwined trellis. There is no tree of human populations of any sort in Weidenreich's figure. Weidenreich (1946) argued that regional populations could display differences, and some local differences could persist through time in the same locality, but there is no assumption of independent, parallel evolution. Instead, humanity consists of a single evolutionary lineage with no subbranches because humanity's geographically dispersed populations were and are interconnected by gene flow and lines of recent, not ancient, common descent due to this gene flow. The genetic implications of the original multiregional model bear no resemblance to the characterization given in Cann et al. (1987).
Cann et al. (1987) give two references for the multiregional model in their paper: Wolpoff et al. (1984) and Coon (1962). Wolpoff and his associates were the primary group arguing for the multiregional model sensu Figure 2 in the late 1980s. In contrast, Coon was an anthropologist who strongly believed that human races were ancient and displayed vast genetic differences. It was Coon who advocated the candelabra model shown in Figure 1A. Because Cann et al. (1987) cited references for both of these alternative “multiregional” models, perhaps they were arguing against Weidenreich's model (Fig. 2) and not Coon's candelabra model (Fig. 1A). However, Cann et al. (1987) state that the model they are arguing against “holds that the transformation of archaic to anatomically modern humans occurred in parallel [emphasis mine] in different parts of the Old World.” Cann et al. (1987, p. 35) then continue that the multiregional model “leads us to expect genetic differences of great antiquity within widely separated parts of the modern pool of mtDNAs.” Parallel evolution is Coon's model, not Weidenreich's, and the genetic expectations they describe only occur with long-term isolation, not continuous gene flow with no tree-like structure at all (Fig. 2). This issue was further clarified by Cann (1993), who described the thought process behind Cann et al. (1987) in more detail. In particular, Cann (1993, p. 78) states, “It appeared to us [i.e., Cann et al. (1987)] that instead of evolving from isolated archaic ancestors on many continents [emphasis mine], modern people stemmed from some pool of African ancestors who colonized new areas.” There is no doubt that Cann et al. (1987) equated multiregional evolution to Coon's model of parallel evolution of isolated archaic populations.
Because the advocates of the multiregional model strongly objected to the candelabra model as a representation of their model, given its emphasis on parallel evolution due to a lack of gene flow (Wolpoff et al. 2000), many of the subsequent authors who favored the out-of-Africa replacement model would slightly modify their tree-like figures by drawing in a few horizontal lines between the continental lineages to represent gene flow. These subsequent figures still portray human evolution under the multiregional model as essentially tree-like with gene flow being a weak, sporadic force rather than as in the Weidenreich trellis of Figure 2 that has continuous, not sporadic gene flow, and no tree-like structure whatsoever. This visual downplaying of the role of gene flow in such figures is often reinforced by the text. For example, the latest work on human evolution at the time of writing this perspective is a book by Stone et al. (2007). Stone et al. (2007, p. 33) state that “This idea [parallel evolution] is at the core of the multiregional model of human evolution (Figure 2.12),” where “Figure 2.12” refers to a figure of the same generic type as shown here in Figure 1, but now with a few sporadic horizontal arrows added to depict gene flow. Stone et al. (2007) explain their inclusion of the arrows showing gene flow as follows (p. 34): “Over the years, the multiregional model has been modified, and its current version – the Thorne-Wolpoff hypothesis – assumes that considerable admixture (represented by double arrows in Figure 2.12) took place between African, Asian, and European populations as they were evolving into H. sapiens.” However, as is clear from Weidenreich (1946) and many subsequent papers by Wolpoff and his associates, parallel evolution was never part of the multiregional model, much less its core, whereas gene flow was not a recent addition, but rather was present in the model from the very beginning (Fig. 2).
Such misrepresentations, including the pictorial and textual dismissals of the importance of gene flow in the multiregional model, are found not only in books geared for a lay audience, but in scientific books as well. For example, Stephen J. Gould was an advocate of the replacement model, and in his last major scientific text he stated:
Multiregional evolution should be labeled iconoclastic, if not a bit bizarre. How could a new species evolve in lockstep parallelism from three ancestral populations spread over more than half the globe? Three groups, each moving in the same direction, and all still able to interbreed and constitute a single species after more than a million years of change? (I know that multiregionalists posit limited gene flow to circumvent this problem, but can such a claim represent more than necessary special pleading in the face of a disabling theoretical difficulty?) (Gould 2002: pp. 911–912)
Once again, parallelism is Coon's model, not the multiregional model, and continuous gene flow throughout the Pleistocene has been part of the multiregional model since at least 1946 and is not a “special pleading” in response to recent difficulties.
This misrepresentation has certainly been convenient for the advocates of the replacement model because parallel evolution is so easy to discredit. In contrast, Weidenreich's multiregional model is not such an easy target. The mitochondrial haplotype tree described by Cann et al. (1987) with no long branches is compatible with the multiregional model if there were sufficient gene flow to prevent any long-term isolation among subpopulations of H. erectus, as Weidenreich (1946) depicted (Fig. 2). The short time to a common ancestor is also compatible with gene flow uniting human populations into a single lineage, so there is no necessity to go back to the time of H. erectus for the common mitochondrial ancestor in the model portrayed in Figure 2. Finally, H. erectus was mostly limited to the southern tier of Eurasia throughout most of the Pleistocene, with Africa having more occupied area than either Europe or Asia (Hassan 1981; Weiss 1984; Eller et al. 2004). When regional populations are interconnected by gene flow, the geographical place of coalescence can occur anywhere in the species' range, so an African coalescence is compatible with a multiregional model with gene flow. Moreover, the probability of neutral coalescence to a particular geographic region is equal to the proportion of the total population that inhabits that region when gene flow unites the regional populations into a single evolutionary lineage. Consequently, the estimates of occupied area (Hassan 1981; Weiss 1984; Eller et al. 2004) and pre-expansion estimates of population sizes (Relethford 1998) imply that an African coalescence is the most likely outcome for the mitochondrial haplotype tree under the multiregional model with gene flow. Thus, the data and analysis in Cann et al. (1987) did not distinguish at all between the models shown in Figure 1B versus Figure 2. 
The original multiregional model is just as compatible with the mtDNA data as the out-of-Africa replacement model (Templeton 1993, 1997).
The multiregional model is not just a single alternative of recent human evolution, but rather is a class of models. Weidenreich (1946) created this model to deal with morphological data from the fossil record, and his diagram (Fig. 2) invokes continuous and sufficient gene flow throughout the Pleistocene to explain the global evolution of modern human traits, but with the gene flow being sufficiently restricted by distance to allow some regional differentiation and continuity. Population genetic theory and observations indicate that there is a broad range of conditions that will result in this pattern (Templeton 2006b). When gene flow is highly restricted, say by distance, local differentiation of even neutral traits is possible while selectively favored traits can still sweep through the species. When gene flow is high, neutral traits show little differentiation, but locally adapted traits can show much differentiation and continuity. Consequently, there is a broad range of gene flow and selective conditions that could explain the patterns that were of concern to Weidenreich (1946). Accordingly, the multiregional model is really a class of models and not a single, well-defined alternative to replacement. Moreover, although Weidenreich's model (Fig. 2) does not show any major population expansion events after the initial expansion out of Africa, many regard population expansions that were accompanied by interbreeding and not total replacement as a variant of the multiregional model (Wolpoff et al. 1994), resulting in many current flavors of the original multiregional model (Relethford 2001). By positing total replacement, the out-of-Africa replacement model renders the population structure of Pleistocene populations irrelevant, which in turn makes the replacement model simpler and more well defined than the multiregional model. 
As a consequence, the out-of-Africa replacement hypothesis is actually a more appropriate null hypothesis of human evolution than the multiregional model, a feature that will be used shortly.
HYPOTHESIS COMPATIBILITY VERSUS HYPOTHESIS TESTING
We teach our students that science is about gathering data to test hypotheses, but unfortunately much of the genetic work on human origins avoids hypothesis testing and only shows that a favored, a priori hypothesis can be made to be compatible (often with additional, ad hoc assumptions) with the genetic data (Templeton 1994). As pointed out above, the original study on mtDNA (and later subsequent mtDNA and Y-DNA studies) was equally compatible with the replacement and multiregional models (Figs. 1B and 2). The major distinction between the replacement and multiregional models is whether Eurasian replacement occurred, and not an African root to the mtDNA and Y-DNA haplotype trees nor the relatively recent coalescent times of mtDNA and Y-DNA to a common ancestral molecule. By substituting Coon's model for the multiregional model, Cann et al. (1987) mistakenly argued that this was a phylogenetic problem of distinguishing between an mtDNA tree with short branches (replacement) versus an mtDNA tree with long branches (the “genetic differences of great antiquity within widely separated parts of the modern pool of mtDNAs” in the Cann et al.  depiction of multiregional evolution). But replacement, the true distinguishing feature, is a population-level demographic process and not a phylogenetic branch-length problem when populations are interconnected by sufficient gene flow. To test replacement, it is necessary that the data should be informative about the populations in the time periods before, during, and after the alleged replacement event occurred. Only in that case is there any potential to test the hypothesis that anything was indeed replaced. Genetic data are informative at the population level only if there is genetic variation in the populations. 
Hence, statistical information about replacement is present only in genetic datasets that can reveal the presence of genetic variation in the populations that existed before, during, and after the hypothesized time of replacement.
In regard to statistical information, the strong focus on mtDNA and Y-DNA studies by advocates of the replacement hypothesis has been a colossal, but often not appreciated, mistake. Standard coalescent theory (Ewens 1990; Hudson 1990) predicts that the average coalescent time to the most recent common ancestor (MRCA) of a sample of $n$ genes is $2xN_{ef}(1 - 1/n)$, where $x$ is the ploidy level and $N_{ef}$ is the inbreeding effective size. Note that the expected time for ultimate coalescence approaches $2xN_{ef}$ as the sample size ($n$) increases. Standard coalescent theory also shows that a sample of $n$ genes requires $n - 1$ coalescent events to yield the MRCA, and $n - 2$ of these events are expected to occur in the first half of this coalescent process, reducing the number of DNA lineages to just two. These last two DNA lineages take as much time to coalesce to the MRCA ($xN_{ef}$ generations) as the first $n - 2$ coalescent events take to yield the total expected time to the MRCA of $2xN_{ef}$ generations. Information for much population genetic inference requires genetic variation, and just two DNA lineages are often insufficient (Templeton 2002). When the ultimate coalescence occurs, there is no longer any genetic variation and all information for most types of population genetic inference is completely lost. Interestingly, fragmentation into isolates is one of the few cases in which a single lineage of DNA can be informative because it contains information about branch lengths. Because branch lengths are not informative in distinguishing the model in Figure 1B from the model in Figure 2, haplotype trees are generally most informative about human evolution only for the first half of the time to the MRCA (i.e., the time period in which three or more polymorphic lineages are expected to coexist), with all information being lost when ultimate coalescence occurs (Templeton 2005).
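The coalescent expectations quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the standard neutral coalescent in which the expected time during which k lineages coexist is 2xNef/[k(k − 1)] generations. The function names and the example values of x, Nef, and n are illustrative assumptions and are not taken from any of the cited studies.

```python
def expected_interval(x, nef, k):
    """Expected generations during which exactly k lineages coexist."""
    return 2.0 * x * nef / (k * (k - 1))


def expected_tmrca(x, nef, n):
    """Expected generations to the MRCA of n sampled genes."""
    return sum(expected_interval(x, nef, k) for k in range(n, 1, -1))


# Hypothetical values chosen only for illustration.
x, nef, n = 2, 10_000, 50

total = expected_tmrca(x, nef, n)          # sums to 2*x*nef*(1 - 1/n)
last_two = expected_interval(x, nef, 2)    # time spent with only two lineages
first_phase = total - last_two             # time taken by the first n - 2 events

print(f"closed form 2xNef(1 - 1/n): {2 * x * nef * (1 - 1 / n):.0f}")
print(f"summed intervals:           {total:.0f}")
print(f"last two lineages:          {last_two:.0f}")
print(f"first n - 2 events:         {first_phase:.0f}")
```

With these hypothetical inputs the final pair of lineages accounts for roughly half of the expected time to the MRCA, which is the period in which a haplotype tree carries little information for population-level inference.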
Both mtDNA and Y-DNA have a ploidy level of $x = 1$, as opposed to $x = 2$ for autosomal regions and $x = 1.5$ for X-linked regions. Moreover, both mtDNA and Y-DNA are unisexually inherited, so $N_{ef}$ refers only to the effective size of just one sex for these molecules as opposed to the autosomal or X-linked regions for which the effective size includes both sexes. As a result, mtDNA and Y-DNA are expected to have the shallowest coalescent times of all genetic regions found in humans, as is indeed observed (Fig. 3, modified from Templeton 2005). This means that mt- and Y-DNA lose their population genetic information more rapidly than any other human genetic element. In particular, the oldest event or process that anyone has ever claimed to infer from either of these molecules is the recent population expansion out of Africa into Eurasia (Templeton 2005). There is no information in either molecule about any events or processes that occurred prior to this recent expansion event. Therefore, both mt- and Y-DNA have zero information in a statistical sense about replacement or the multiregional model with gene flow, although the shallowness of these coalescence times makes the candelabra model or a multiregional model with little or highly sporadic gene flow unlikely. Thus, the advocates of the replacement hypothesis are completely right when they claim that the mt- and Y-DNA are 100% compatible with replacement; but this compatibility trivially arises from these molecules being noninformative about replacement. This lack of statistical information also explains the equal compatibility of mt- and Y-DNA with both the replacement and multiregional hypotheses. It is essential to focus upon other data sources to make the transition from hypothesis compatibility to hypothesis testing, as will now be done.
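To make the ploidy and unisexual-inheritance argument concrete, the sketch below compares the limiting expected time to the MRCA, 2xNef, across the four classes of genetic region. It rests on a strong simplifying assumption that is not part of the text: an idealized population of N breeding adults with equal numbers of females and males, so that Nef is N for autosomal and X-linked regions and N/2 for mtDNA and Y-DNA. The names `limiting_tmrca` and `regions` are hypothetical.

```python
def limiting_tmrca(x, nef):
    """Expected generations to the MRCA as the sample size grows (2*x*Nef)."""
    return 2.0 * x * nef


# Assumption for illustration only: N breeding adults, half female and half male.
N = 10_000
regions = {
    "mtDNA":     (1.0, N / 2),   # ploidy 1, females only
    "Y-DNA":     (1.0, N / 2),   # ploidy 1, males only
    "X-linked":  (1.5, N),       # ploidy 1.5, both sexes
    "autosomal": (2.0, N),       # ploidy 2, both sexes
}

depths = {name: limiting_tmrca(x, nef) for name, (x, nef) in regions.items()}
baseline = depths["mtDNA"]
relative = {name: depth / baseline for name, depth in depths.items()}

for name, ratio in relative.items():
    print(f"{name:<10} expected depth relative to mtDNA: {ratio:.1f}")
```

Under these idealized assumptions X-linked and autosomal regions are expected to coalesce three and four times deeper than mtDNA or Y-DNA. Real populations depart from the equal-sex, constant-size idealization, so the ratios indicate direction and rough magnitude only.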
NUCLEAR HAPLOTYPE TREE ROOTS
Starting with Cann et al. (1987), one of the primary forms of evidence presented as favoring the replacement hypothesis is the African root of the mtDNA haplotype tree, but an African root is also the most likely result under the multiregional model under the reasonable assumption that the bulk of the human population resided in Africa throughout most of the Pleistocene (Hassan 1981; Weiss 1984; Eller et al. 2004). Similarly, subsequent work showed an African root of the Y-DNA tree (Hammer et al. 1998), although the authors of this later paper were careful to state that the African origin of the Y-chromosome was only compatible with replacement and did not test it.
Haplotype trees can now be estimated for many regions of the nuclear genome, and most of the coalescent times to the root are much older than 100,000 years ago (Fig. 3). Hence the roots of these nuclear gene trees or old haplotype clades within them are potentially informative about replacement. One strong, qualitative prediction that discriminates between replacement and multiregional origins is that all genetic regions or haplotype clades with coalescent times greater than 100,000 years ago must have an African root under replacement, whereas under the multiregional model most haplotype trees should have an African root, but there should be at least some non-African roots as well. These qualitative predictions can be transformed into a quantitative statistical test. Under the replacement model, the probability of an African root for a haplotype tree or clade older than 100,000 years is 1, and the probability of a Eurasian root is 0. Under the multiregional model, these probabilities are $1 - P$ and $P$, where $0 < P < 0.5$ because an African root is the most likely single outcome under the multiregional model, as explained above. Hence, a conservative test would test the alternative hypotheses of $P = 0$ (replacement) versus $P = 0.5$. A sample based just on mtDNA and Y-DNA (which represent just two haplotype trees despite containing multiple genes within each) is simply inadequate to discriminate between these two models because the probability of obtaining two African roots is 1 under replacement and at least $(0.5)^2 = 0.25$ under the multiregional model. Note that with this quantitative test, even a single observed root in Eurasia will definitively falsify the replacement hypothesis because the probability of the hypothesis $P = 0$ being true is 0 in that situation. It is important to point out that this test could also reject the multiregional hypothesis if all $n$ loci studied had African roots such that $(1 - P)^n < 0.05$, which at the conservative value $P = 0.5$ is $(0.5)^n < 0.05$.
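The arithmetic of this root-location test is simple enough to sketch directly. The code below is an illustration only, assuming that the haplotype trees are independent and share a common probability P of a Eurasian root; the function names are hypothetical and the 0.05 threshold follows the text.

```python
def prob_all_african(n_loci, p_eurasian):
    """Probability that every one of n_loci independent trees has an African root."""
    return (1.0 - p_eurasian) ** n_loci


def min_loci_to_reject(p_eurasian=0.5, alpha=0.05):
    """Smallest number of all-African roots whose probability falls below alpha."""
    if not 0.0 < p_eurasian < 1.0:
        raise ValueError("p_eurasian must lie strictly between 0 and 1")
    n = 1
    while prob_all_african(n, p_eurasian) >= alpha:
        n += 1
    return n


# mtDNA and Y-DNA alone: two trees, both with African roots.
print(prob_all_african(2, 0.0))   # replacement, P = 0
print(prob_all_african(2, 0.5))   # conservative multiregional value, P = 0.5
print(min_loci_to_reject())       # trees needed before all-African roots reject P = 0.5
```

At the conservative value P = 0.5, two African roots have probability 0.25, and five independent all-African trees are the minimum needed to push that probability below 0.05. This is why mtDNA and Y-DNA alone cannot discriminate between the models.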
Harding et al. (1997) reported haplotypes at the beta-hemoglobin locus that were of Asian origin and older than 200,000 years. Takahata et al. (2001) examined 15 X-linked and autosomal DNA regions and inferred the geographical root for 10 of these, with nine being in Africa and one in Asia. Zietkiewicz et al. (2003) show that the oldest haplotype lineage at the human dystrophin gene is virtually absent in Africa yet is older than the hypothesized out-of-Africa expansion, which indicates admixture outside of Africa. Garrigan et al. (2005) report another X-linked gene with a clearly Asian origin with an ultimate coalescence time of approximately 2 Mya. Old, Eurasian-origin haplotype lineages and tree roots cannot be explained through ancient gene flow introducing them into African populations prior to replacement because the methods infer the population of origin that has genetic continuity to the present, which is African if there had been a total replacement of Eurasian populations. Moreover, such an explanation would leave unanswered the virtual absence in Africa of the oldest haplotype lineage at the dystrophin gene (Zietkiewicz et al. 2003). Using the quantitative test described in the previous paragraph, the probability of replacement being true given these data is zero. Therefore, these results offer a definitive refutation of the out-of-Africa replacement model. Incredibly, the Takahata et al. paper has been cited as support for the replacement model over the multiregional model (e.g., Pearson 2004; Ray et al. 2005). This probably stems from the nonstandard wording used in Takahata et al. (2001). Takahata et al. contrast the “uniregional” versus the “multiregional” models in their paper. To many, the “uniregional” model (modern humans evolved in one region, specifically Africa) is a synonym for the out-of-Africa replacement model (e.g., Stone et al. 2007), but Takahata et al. (2001, p. 181) state that they are using “uniregional” only in the sense that “one population predominated and the rest played minor roles in the evolution of anatomically modern H. sapiens.” Takahata et al.'s definition of “uniregional” therefore allows P > 0, and indeed their data clearly falsify complete replacement.
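The statement that these data give replacement a probability of zero can be illustrated with a binomial likelihood for the root counts reported by Takahata et al. (2001), nine African and one Asian. This is a minimal sketch, assuming that the ten regions are independent and share one value of P; the function name is hypothetical, and the nonzero values of P are arbitrary illustrations rather than estimates.

```python
from math import comb


def root_likelihood(n_african, n_eurasian, p_eurasian):
    """Binomial probability of the observed root counts given P."""
    n = n_african + n_eurasian
    return (comb(n, n_eurasian)
            * p_eurasian ** n_eurasian
            * (1.0 - p_eurasian) ** n_african)


# Root counts reported by Takahata et al. (2001): nine African, one Asian.
n_african, n_eurasian = 9, 1

for p in (0.0, 0.1, 0.5):   # 0.1 and 0.5 are illustrative values, not estimates
    print(f"P = {p:.1f}: likelihood = {root_likelihood(n_african, n_eurasian, p):.4f}")
```

Only P = 0 yields a likelihood of exactly zero. Any P > 0, however small, assigns positive probability to a single Eurasian root, which is why one well-supported non-African root is sufficient to reject complete replacement.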
MULTILOCUS GENETIC DISTANCES AND DIVERSITIES AND THE ECOLOGICAL FALLACY
Genetic distances and diversity patterns among current human populations based on multilocus surveys of nuclear genes may also be informative about modern human origins. Eswaran et al. (2005) showed via computer simulations that a model of isolation-by-distance extending over Africa and Eurasia coupled with selective sweeps on one to eight loci associated with the evolution of modern traits explains well the observed patterns of genetic distances and diversities, whereas the replacement model does not. Indeed, Eswaran et al. (2005) concluded that up to 80% of the human nuclear genome is significantly affected by assimilation with non-African archaic human populations (this does not mean that 80% of our genes have a non-African root). Therefore, they used strong words in their paper and even in their title: “Genomics refutes an exclusively African origin of humans.”
Interestingly, about one week after the publication of Eswaran et al. (2005), another paper was published using computer simulations to explain human genetic distance and diversity patterns (Ray et al. 2005). Ray et al. (2005) claimed that their simulation results strongly favored the complete replacement model. Thus, one paper claims 80% of the human genome has been influenced by assimilation with non-African archaic populations, and the other claims that this percentage is zero. Both use computer simulations to justify these contradictory assertions.
How could these contradictory results arise? First, neither paper presents any formal tests of alternative hypotheses. Rather, both merely give an assessment of goodness of fit to the data under the simulated alternatives. For example, Ray et al. (2005) note that the proportion of explained variation under their simulated replacement model is four times better than that of any of their simulated multiregional models. But what does four times better mean in a statistical sense? There is no indication that any of their models are statistically significantly different from another, and indeed, even their best-fitting replacement model only explained 10% of the observed variation. The critical question is whether a fourfold difference discriminates between the alternatives at the 5% probability level or less. This question is unanswered. Therefore, although both of these papers present themselves as testing alternative models, in point of fact both papers are merely showing hypothesis compatibility with a favored simulated scenario.
The question still remains as to why the two papers end up with different best-fitting scenarios. Perhaps the devil is in the details. Each simulation model had to make a large number of assumptions: how many human populations, what are their sizes, how did they grow, is there gene flow and if so how much and so on. One can only guess these parameter values. Another major difference in their simulations is that Eswaran et al. (2005) allowed one or eight loci to experience selective sweeps (to model selection favoring the universal traits found in anatomically modern humans) whereas Ray et al. (2005) only simulated neutral evolution. The different results are likely to be found in one or more of these differences in their underlying simulation parameters and features, but there are many differences and they are all confounded. This illustrates one of the major weaknesses of the simulation approach; large differences can arise from different assumptions about parameters that are largely unknown and unknowable. Thus, are these papers comparing basic models of human evolution or are they really examining the impact of rather arbitrary assumptions? Until this question can be answered, it is not clear whether genetic distance and diversity patterns truly contain information to test hypotheses about recent human evolution.
Ramachandran et al. (2005) also used computer simulations to evaluate the goodness of fit of the replacement model to human genetic distance and diversity data on a global scale. They first point out the excellent fit of the genetic distance data to the expectations of an isolation-by-distance model (the linear regression of genetic distance against geographical distance corrected to exclude travel over large bodies of water has an $R^2$ of 0.7835, a result expected from the multiregional model shown in Fig. 2). To explain this pattern under the out-of-Africa replacement hypothesis, they simulate a situation in which many sequential founder effects occurred in the expanding African population as it spread throughout the world. In their simulations, there is no gene flow after founding and genetic drift is modeled only during the founder event itself and is ignored as the founding populations grow to a constant carrying capacity. There is no independent evidence for such serial founder events resulting in isolated colonies in human evolution, so they are free to invoke as many as are needed to fit the data. They report that they can explain 76–78% of the observed genetic variation by these hypothesized serial founder events. They provide no statistical evaluation of the relative merits of an isolation-by-distance model to the serial founder event model, so this paper is also one of hypothesis compatibility and not hypothesis testing. Despite the lack of hypothesis testing, Ramachandran et al. (2005) claim that their results discredit the isolation-by-distance model. They reach this conclusion by first stating (p. 15945) that “the observed relationship of genetic and geographic distance should not be interpreted simply as following from theories of isolation by distance, which are valid only at equilibrium between migration, mutation, and drift.” Note that the only observation that they are mentioning is the correlation between genetic and geographical distances. There is nothing in the theory of isolation-by-distance that requires that this correlation instantaneously appear only (their word) upon reaching equilibrium; rather, such a correlation can arise in nonequilibrium situations as well (Slatkin 1993). Thus, there is no theoretical basis for this claim given in Ramachandran et al. (2005).
The next sentence in their argument against isolation by distance is (Ramachandran et al. 2005, p. 15945) “There clearly has not been time to reach equilibrium between the extremes of man's inhabited range, or even within continents, in the very short evolutionary history of modern humans (29),” where “(29)” refers to a paper supporting the replacement hypothesis (Cavalli-Sforza and Feldman 2003). This statement repeats the false premise that an equilibrium situation is required under the isolation-by-distance model, and it further asserts that isolation by distance cannot be true because of the “very short evolutionary history of modern humans.” However, if replacement is not true, there can be a deep evolutionary history of isolation-by-distance in humans (Fig. 2). Indeed, when human evolution is approached in a hypothesis-testing framework with formal statistical tests, the evidence for isolation-by-distance among African and Eurasian populations extends back to nearly 1.5 Mya with 95% statistical confidence (Templeton 2002, 2005, 2006a). Thus, there clearly has been much time to reach a positive correlation between genetic and geographical distances under an isolation-by-distance model of gene flow. The argument of Ramachandran et al. (2005) is nothing more than circular reasoning: they argue that the replacement model must be true because the times are too short for isolation-by-distance if the replacement model is true. In this manner, Ramachandran et al. (2005) dismiss the straightforward and simple model of isolation-by-distance, which is consistent with observed patterns of gene flow in humans (Templeton 2006b), in favor of a cumbersome, ad hoc model of serial founder events resulting in isolated colonies, a model with no independent supporting evidence and one that is grossly inconsistent with observed patterns of gene flow in humans.
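To make the regression statistic discussed above concrete, the following minimal Python sketch computes the R² of an ordinary least-squares regression of pairwise genetic distance on pairwise geographical distance. The function name `r_squared` and all numerical values are illustrative assumptions and are not data from Ramachandran et al. (2005). A high R² only summarizes the aggregate pattern; by itself it does not discriminate among the processes that could have generated that pattern.

```python
def r_squared(x, y):
    """Return R^2 of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # For simple linear regression, R^2 equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


# Hypothetical pairwise distances, invented purely for illustration.
geo_km = [1000.0, 2500.0, 4000.0, 7000.0, 9500.0, 12000.0]
gen_dist = [0.012, 0.021, 0.035, 0.048, 0.071, 0.080]

print(round(r_squared(geo_km, gen_dist), 3))
```

Note that pairwise distances are not independent observations, so a sketch like this describes the strength of the pattern but does not by itself supply a valid significance test.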
Collectively, the papers of Eswaran et al. (2005), Ray et al. (2005), and Ramachandran et al. (2005) show that computer simulations can be used to discover models that can explain the observed patterns of human genetic distances and diversities. All of these papers only show hypothesis compatibility and none test alternative models of human evolution against one another or against a null hypothesis, despite claims to the contrary. Finally, the fact that the best-fitting models in these papers are contradictory indicates that the entire simulation/goodness-of-fit approach is subject to the ecological fallacy (Freedman 1999). Ecological inferences in this sense are inferences about underlying processes drawn from data on higher-order aggregates. In this case, the higher-order aggregate data are the patterns of genetic diversity and distances observed in multiple human populations, and the ecological inferences concern the underlying demographic and evolutionary processes that can generate these patterns, such as the dynamics of spatial movement by individuals or individual demes. The simulations model the underlying processes, and then judge how well they mimic the aggregate properties. The ecological fallacy consists in thinking that the relationships observed for the aggregate data prove that an underlying process model is true. Both philosophically and statistically, one cannot prove an underlying process model to be true simply because it generates a good fit to an aggregate pattern. The fallacy of proof arises because (1) there is no one-to-one mapping of lower-order process to aggregate patterns (e.g., recall the good fits of the Eswaran et al. model vs. the Ray et al. model, or the good fits of the isolation-by-distance model vs. the serial founder event model), (2) it is usually impossible to prove that one has considered all the possible underlying processes that could potentially generate the aggregate patterns, (3) the underlying process models usually differ in multiple ways (the problem of confoundment mentioned above), and (4) the observed patterns can only be measured at the higher-order level and not the underlying process level, which can lead to a misleading “aggregation bias” (Freedman 1999). Because of the ecological fallacy, the simulation/goodness-of-fit approach can never be considered a test of any model of human evolution. This approach is legitimate for delineating possibilities and eliminating specific simulated models (with all features and parameters of that model confounded), but it is philosophically and statistically impossible to use this approach to prove that a good-fitting model is true. This does not mean that higher-order patterns cannot be used to test hypotheses. Such patterns can be used to test appropriately worded null hypotheses such that the tests are designed to reject the null hypothesis when it is false rather than to prove that a model is true. For example, as shown earlier, the pattern of the geographical locations of the roots of haplotype trees can legitimately be used to reject the replacement hypothesis, but this does not necessarily mean that the multiregional hypothesis is true. Much statistical inference is about rejecting null hypotheses in a probabilistic sense and not about proving that null hypotheses are true. Therefore, the problem of human evolution will be treated in the next subsection only with inferences arising from rejecting, not proving, hypotheses.
NESTED CLADE PHYLOGEOGRAPHIC ANALYSIS
Nested clade phylogeographic analysis (Templeton 1993, 1998, 2004a,b; Templeton et al. 1995) is one method of testing null phylogeographic hypotheses. In this methodology, a haplotype tree (or more properly, a 95% confidence set of haplotype trees, Templeton et al. 1992) is estimated from the genetic data. For a nuclear region, the tree is next tested for evidence of recombination by rejecting the null hypothesis of no recombination (Templeton and Sing 1993; Templeton et al. 2000a). If recombination is detected but is limited to just one or a few haplotypes, they are excluded from the initial analysis (Templeton et al. 1987), but the recombinants can be added later and can sometimes provide critical insights (Templeton et al. 1987; Templeton 2004c). If recombination is common but concentrated into one or more hotspots, the region is subdivided and separate haplotype trees are estimated for the DNA subregions with little or no recombination (Templeton et al. 2000b). If recombination is common and uniform, the DNA region cannot be analyzed with this method.
Given an estimated haplotype tree or set of trees for a DNA region with no to little recombination, the next step of the nested clade analysis is to convert the tree into a series of nested clades or branches. Any phylogenetic ambiguity in the estimated tree set can either be directly incorporated into the nested design (Templeton and Sing 1993) or the design can be iterated over all possible resolutions, retaining only those inferences robust to phylogenetic ambiguity (Brisson et al. 2005). Therefore, nested clade analysis can explicitly incorporate phylogenetic ambiguity and tree estimation error into its inference structure, contrary to the unsubstantiated claims found in Felsenstein (2004). Geographic location is next overlaid upon the nested design, and various statistics are calculated to measure the geographical range of a clade and its distance from its closest evolutionary neighbors. The null hypothesis of no geographical association is then tested with these statistics, and all phylogeographic inferences are based upon the rejection of this null hypothesis.
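The logic of testing the null hypothesis of no geographical association can be illustrated with a deliberately simplified permutation test. The Python sketch below is not the published nested clade procedure: it uses a single statistic (the mean distance of a clade's sampled individuals from their geographical center), planar coordinates with Euclidean distance, and one level of nesting, all of which are assumptions made here for brevity. The names `clade_distance` and `permutation_test` and the sample data are hypothetical.

```python
import math
import random


def clade_distance(coords):
    """Mean distance of sampled individuals from their geographic center."""
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)
    return sum(math.hypot(x - cx, y - cy) for x, y in coords) / len(coords)


def permutation_test(samples, clade, n_perm=999, seed=0):
    """One-sided test that `clade` is more geographically restricted than
    expected if clade labels were unrelated to sampling location.

    samples: list of (clade_label, (x, y)) pairs within one nesting clade.
    Returns (observed statistic, permutation p-value).
    """
    labels = [label for label, _ in samples]
    coords = [xy for _, xy in samples]
    if clade not in labels:
        raise ValueError("clade label not present in samples")

    def stat(lbls):
        return clade_distance([xy for l, xy in zip(lbls, coords) if l == clade])

    observed = stat(labels)
    rng = random.Random(seed)
    as_small = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # break any label/location association
        if stat(shuffled) <= observed:
            as_small += 1
    # The +1 terms count the observed arrangement as one permutation.
    return observed, (as_small + 1) / (n_perm + 1)


# Hypothetical data: clade "A" is tightly clustered, clade "B" is widespread.
samples = [
    ("A", (0.0, 0.0)), ("A", (0.0, 1.0)), ("A", (1.0, 0.0)),
    ("A", (1.0, 1.0)), ("A", (0.5, 0.5)), ("A", (0.2, 0.8)),
    ("B", (100.0, 100.0)), ("B", (-100.0, 50.0)), ("B", (50.0, -100.0)),
    ("B", (-80.0, -80.0)), ("B", (120.0, -30.0)), ("B", (-40.0, 110.0)),
]
observed, p_value = permutation_test(samples, "A")
```

Rejecting the null here says only that the clade's range is smaller than expected by chance; as with the full method, the biological explanation for that rejection is a separate inference.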
Because this is a statistical procedure, nested clade analysis is subject to both false positives in which the null hypothesis is rejected even though it is true, and false negatives in which the null hypothesis is not rejected even though it is false. Moreover, when the null hypothesis is rejected, this method sometimes identifies an event or process that could explain the result. These inferences are not proven to be true by this method, only compatible with the pattern of observations that led to the rejection of the null hypothesis. Another type of false positive is when the null hypothesis is correctly rejected, but in which the inferred explanation for the rejection is incorrect. Templeton (2004b) evaluated these error rates by using positive controls; that is, analyzing actual datasets for which prior information exists for a past historical event such as a fragmentation event or a range expansion event. A total of 150 positive control events were identified from the literature, making nested clade analysis the most extensively validated technique using positive controls in the area of intraspecific phylogeography. By using actual datasets, this method of validation avoids the serious shortcoming of validation through computer simulation in which unrealistic assumptions in the simulation model can have a major effect on the error rates, as occurred in the simulations of Knowles and Maddison (2002) (see Templeton 2004b for further discussion of this point). Moreover, positive controls are the only direct way to validate the method with respect to its intended use, that is, on real data. In this analysis, all inferences that were not expected a priori were counted as false positives. Note that this means that false positives include those cases in which the null hypothesis was rejected correctly, but in which the inference used to explain this rejection was not expected a priori.
Because some of these nonexpected inferences could have been true, this procedure yields an upper bound to the false positive rate. Despite this bias to inflate the false positive rate, these validation studies revealed that false negatives were the most common error in nested clade analysis (28% from tables 4 and 5 in Templeton 2004b, with single haplotype trees being the unit of analysis). False negatives occur not only for the usual statistical reasons related to sampling, but also because nested clade analysis can only infer a past event or process if it is marked by one or more mutations occurring at the right place in space and time, a factor over which an investigator has no control. As a result, any one gene or DNA region will always miss some of the events or processes that have affected a species' current array of haplotype diversity. The upper bound of the false positive rate was 13% when the type I error rate had been set at 5% for testing the null hypothesis. False positives are generally regarded as the more serious error, so although it was good that the upper bound for the false positive error rate was less than half the false negative rate, the upper bound of 13% still indicates that some inferences from any one DNA region could be misleading beyond the preassigned 5% level. One method of decreasing both error rates is to perform a multilocus nested clade analysis coupled with explicit cross validation of all inferences across loci. The first step in such a cross validation procedure is to match the inferences qualitatively for inference type (e.g., a range expansion) and location (e.g., a range expansion out of Africa into Eurasia). Next, the null hypothesis is tested that all qualitatively matched inferences across loci occurred at the same time (e.g., all loci detecting a range expansion out of Africa into Eurasia occurred at a common time) using a maximum-likelihood statistical framework (Templeton 2002, 2004a,b).
This same framework can be used to construct confidence intervals of multilocus inferences of recurrent gene flow.
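The general shape of such a test of temporal concordance can be sketched as a likelihood ratio test. The Python sketch below rests on assumptions made here for illustration only: each locus is taken to supply one estimated event time that follows a gamma distribution with a known shape parameter, and the test statistic is referred to a chi-square distribution with (number of loci − 1) degrees of freedom. The function names, the gamma form, and the numerical values are hypothetical and should not be read as the exact likelihood used in the cited papers.

```python
import math


def gamma_loglik(t, mean, shape):
    """Log density at t of a gamma distribution with the given mean and shape."""
    return (shape * math.log(shape / mean) + (shape - 1.0) * math.log(t)
            - shape * t / mean - math.lgamma(shape))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if y <= 0.0:
        return 1.0
    # Base case: Q(1, y) for even df, Q(1/2, y) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def common_time_test(times, shapes):
    """Test the null hypothesis that all loci share one event time.

    Returns (pooled time estimate, likelihood ratio statistic, p-value).
    """
    if len(times) != len(shapes) or len(times) < 2:
        raise ValueError("need matching times and shapes for at least 2 loci")
    # Maximum-likelihood common time under the null (shape-weighted mean).
    pooled = sum(a * t for t, a in zip(times, shapes)) / sum(shapes)
    ll_null = sum(gamma_loglik(t, pooled, a) for t, a in zip(times, shapes))
    # Under the alternative each locus has its own time; the MLE is t itself.
    ll_alt = sum(gamma_loglik(t, t, a) for t, a in zip(times, shapes))
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    return pooled, stat, chi2_sf(stat, len(times) - 1)


# Hypothetical event-time estimates (in Mya) and shape parameters for 3 loci.
pooled, stat, p_value = common_time_test([0.55, 0.70, 0.65], [4.0, 6.0, 5.0])
```

Failing to reject the null in such a test is what allows inferences from several loci to be pooled into one cross-validated event; rejecting it indicates that the loci are marking events at different times.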
Templeton (2005, 2006a) identified 25 human DNA regions from the literature that satisfied minimal sampling requirements for phylogeographic analysis (mtDNA, Y-DNA, 11 X-linked regions, and 12 autosomal regions). Figure 4 summarizes the cross-validated inferences, which yield a model of human evolution that is remarkably consistent with the fossil and archaeological data (Templeton 2005, 2006a). A trellis structure is shown after the initial expansion out of Africa that was molecularly dated to 1.9 Mya to indicate the cross-validated inference of gene flow constrained by isolation-by-distance throughout much of the Pleistocene and continuing into the present. Using the multilocus data, there is 95% confidence of gene flow involving African and Eurasian populations going back to 1.46 Mya. Quite interestingly, this analysis detected two cross-validated population expansions out of Africa after the initial H. erectus expansion, and not zero as predicted by the original multiregional model (Fig. 2) nor one as predicted by the replacement model (Fig. 1B). Thus, the model shown in Figure 4 differs qualitatively from all the alternatives that were at the core of the original debate over human origins and is more similar to the mostly out-of-Africa hypothesis (Relethford 2001). This illustrates one of the great strengths of approaching this problem via testable null hypotheses; the inferences one draws are determined by the data and not by some a priori model. This also illustrates one of the major weaknesses of the simulation approach; one only evaluates the alternatives simulated (and all of the specific parameters associated with such a simulation) so that the inference universe is limited exclusively to the simulated a priori models and their specific parameter values.
For example, none of the simulation studies mentioned in the previous section simulated a model such as that shown in Figure 4, so none of these studies can make any statements about the relative merit of the replacement model to the model shown in Figure 4. In contrast, the validity of the replacement model can be evaluated under nested clade analysis with the same log-likelihood ratio testing framework used for cross-validation by restating replacement as a testable null hypothesis.
Such a restatement can be achieved by noting that if an expanding population replaced other populations, then there should be no detectable events or processes in the putatively replaced areas that are older than the expansion event (all inferences in nested clade analysis are strictly limited to populations in the past that have contributed genes to the present). Hence, the null hypothesis is that all the putatively older Eurasian events and processes are no older than the expansion event. The replacement null hypothesis is rejected with a probability of 0.035 for the out-of-Africa expansion dated to 650,000 years ago in Figure 4, and is rejected with a probability of less than 10⁻¹⁷ for the recent out-of-Africa replacement hypothesis, the one that has been the primary focus of this debate (Templeton 2005).
Genetic data have indeed played a critical role in studies on human evolution over the last two million years, but not in the manner thought two decades ago nor in the popular science literature of today. Far from supporting the out-of-Africa replacement hypothesis, the genetic data are definitive and unambiguous in rejecting replacement both by the test of haplotype tree or clade root locations (Harding et al. 1997; Takahata et al. 2001; Zietkiewicz et al. 2003; Garrigan et al. 2005), which yields a P-value of 0 under the replacement hypothesis, and by a nested clade statistical test that yields a P-value less than 10⁻¹⁷ under the replacement null hypothesis (Templeton 2005, 2006a). Much of the genetic literature supporting replacement has been flawed by misrepresenting hypotheses, ignoring other alternatives, focusing upon hypothesis compatibility rather than hypothesis testing, using zero-information datasets, committing the ecological fallacy, and using circular logic. Nevertheless, when genetic data are used to test null hypotheses rather than to “prove” that a favored hypothesis is “true,” much insight and resolution into the details of human evolution are possible.
Genetics will undoubtedly continue to provide even more insight and resolution into human evolution as additional data become available. One of the dramatic conclusions that arises from the multilocus nested clade analyses is just how little information is obtained from a single gene or DNA region. Any one gene captures only a small subset of the historical factors and processes that influenced a species' evolution because of a lack of appropriate mutational markers and because any one gene is informative for only a limited time period in the past (Templeton 2002). Also, a single gene potentially has a substantial false positive error rate (Templeton 2004b). It is therefore essential to take a multilocus approach. In 2002, 10 DNA regions were used in a multilocus nested clade analysis of human evolution (Templeton 2002), and just three years later, 25 loci were used (Templeton 2005). This increase in the number of loci led to new inferences and far greater statistical power in testing null hypotheses. As more and more loci become available, even more details of human evolution should emerge. This is particularly true for the older aspects of human evolution. Recall that, for most types of phylogeographic inference, all information is lost once coalescence to a common ancestral molecule has occurred, because most inferences require genetic variation (fragmentation being an exception). As seen in Figure 3, few loci are even potentially informative about human evolution at around 2 Mya or older. (Note that the locus MX1 is shown in Templeton 2005 as being informative of human evolution in this time period, but Justin Fay in a personal communication showed that the old coalescence time was an artifact of combining orthologous copies with then-unknown paralogous copies. Interestingly, no inferences made from this flawed dataset could be cross-validated, so its inclusion in the original analysis had no impact on the inferences shown in Figure 4.
This is an illustration of cross-validation eliminating false positives.) In the 2002 analysis, there were not enough loci informative about the early Pleistocene to even yield a single cross-validated inference in this time period, and hence there was no inference of the original expansion out-of-Africa by H. erectus from the genetic data. In contrast, by 2005 more loci had been sampled, and the early Pleistocene expansion of H. erectus was cross-validated by three loci. As more and more loci are sampled, the informative times will undoubtedly extend even further back into time. In this manner, genetics will become an increasingly useful tool for probing our evolutionary history. The genetic probing over the last two million years revealed completely unanticipated results, such as the middle expansion out-of-Africa shown in Figure 4 (although unanticipated, this result overlays well upon the archaeological record of the spread of the Acheulean tool culture out of Africa at the same time, as discussed in Templeton 2005, 2006a). Further surprises about our evolutionary ancestry can be expected as the amount of genetic information grows with future studies.
Associate Editor: M. Rausher
This work was supported in part by NIH grant GM065509. I would also like to thank the anonymous reviewers for their suggestions, most of which were incorporated into this version.