Recent studies indicate that Neanderthal and Denisova hominins may have been separate species, while debate continues on the status of Homo floresiensis. The decade-long debate between “splitters,” who recognize over 20 hominin species, and “lumpers,” who maintain that all these fossils belong to just a few lineages, illustrates that we do not know how many extinct hominin species to expect. Here, we present probability distributions for the number of speciation events and the number of contemporary species along a branch of a phylogeny. With estimates of hominin speciation and extinction rates, we then show that the expected total number of extinct hominin species is 8, but may be as high as 27. We also show that it is highly unlikely that three very recent species disappeared due to natural, background extinction. This may indicate that human-like remains are too easily considered distinct species. Otherwise, the evidence suggesting that Neanderthal and the Denisova hominin represent distinct species implies a recent wave of extinctions, ostensibly driven by the only survivor, H. sapiens.

The evolution of our own species and its relations with its closest relatives have interested biologists for more than a century. A continuous source of debate in paleoanthropology has been how many hominin species have existed since the divergence of the lineages that led to Homo sapiens and Pan, our closest extant relatives (Conroy 2002; Marks 2005). Undoubtedly, several different populations existed, but there is little consensus on the question of how many of those populations would have constituted biological species (Hunt 2003). In the last century, many new human-like fossils were classified as new species, until the hominin tree counted as many as 60 taxa. That number has dropped more recently to a median of 14, ranging from five to 23 (Curnoe and Thorne 2003), yet some view all human-like fossils as racial variants of one or a few lineages (Wolpoff et al. 2002; Curnoe and Thorne 2003).

Fossil morphology is not the most reliable means to identify separate species (Collard and Wood 2000; Wolpoff et al. 2002; Wood 2010; Klingenberg and Gidaszewski 2010). Nevertheless, estimates of numbers of extinct hominin species are largely based on morphological characteristics of fossil remains, as DNA can only be extracted from the youngest fossils. Comparison of mitochondrial (Krause et al. 2010) and nuclear (Reich et al. 2010) DNA from approximately 40,000-year-old remains from the Denisova cave to DNA extracted from modern humans and Neanderthal remains (Green et al. 2010) suggests low levels of gene flow (Bi-Rached et al. 2011), hence the existence until recently of at least three hominin species (H. sapiens, Neanderthal, and Denisova hominins). These discoveries, like many earlier ones (Leakey 2001; Brunet 2002; Brown et al. 2004), not only revived interest in the question of how many hominin species have existed (Lieberman 2001; Wolpoff et al. 2002), they also stress the need for an independent expectation (Conroy 2002; Marks 2005). Here, we present statistical models to estimate how many species the “direct line to modern humans” consists of (DLMH, the branch from the human–chimp split to present-day H. sapiens), and how many species derived from it.

Materials and Methods

We regard the DLMH as a branch of a reconstructed phylogeny (Nee et al. 1994b) and following previous work (Raup et al. 1973; Nee et al. 1994b), we assume that the stochastic rates of speciation and extinction (λ and μ, respectively) that gave rise to that phylogeny have been constant across the DLMH, and equal in all species that descended from it. We denote t_{o} and t_{e} the times when a branch originated and ended (measured throughout in millions of years ago with the present being time 0 and time increasing going into the past). For the DLMH, t_{e}= 0 and t_{o} is the time of the human–chimp split. Evolution of the DLMH started with the single species that existed at t_{o}, immediately after the human–chimp split. It can be shown that the probability that this species led to n descendant species after time t is (Bailey 1964), for n > 0:

P_{n}(t) = (1 − α)(1 − β) β^{n−1}, (1)

where

α = μ(e^{(λ−μ)t} − 1) / (λe^{(λ−μ)t} − μ) and β = λα/μ.

For λ≠μ, α is the probability that a species has no descendants after time t, that is, for n= 0 we have P_{0}(t) =α. In the special case where λ=μ, P_{0}(t) =λt/ (1 +λt).
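
The probabilities above can be evaluated numerically. The following is a minimal sketch, assuming the standard closed form for a constant-rate birth–death process started from one species, P_{n}(t) = (1 − α)(1 − β)β^{n−1} with β = λα/μ; the function names `alpha_beta` and `p_n` are illustrative and do not come from the original analysis.

```python
import math


def alpha_beta(lam, mu, t):
    """Return (alpha, beta) for a constant-rate birth-death process after time t.

    alpha is the probability of having no descendants after time t (P_0(t)).
    The closed form for lam != mu is an assumption of this sketch.
    """
    if t == 0:
        return 0.0, 0.0
    if math.isclose(lam, mu):
        # Special case lam == mu: alpha = beta = lam*t / (1 + lam*t)
        a = lam * t / (1.0 + lam * t)
        return a, a
    e = math.exp((lam - mu) * t)
    alpha = mu * (e - 1.0) / (lam * e - mu)
    beta = lam * (e - 1.0) / (lam * e - mu)
    return alpha, beta


def p_n(n, lam, mu, t):
    """Probability that one species has exactly n descendant species after time t."""
    alpha, beta = alpha_beta(lam, mu, t)
    if n == 0:
        return alpha
    return (1.0 - alpha) * (1.0 - beta) * beta ** (n - 1)


if __name__ == "__main__":
    # Illustrative rates (per million years) and a 6-million-year branch.
    print([round(p_n(n, 0.46, 0.43, 6.0), 4) for n in range(4)])
```

Summing `p_n` over n gives 1, which is a quick way to check that α and β have been entered consistently.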

Speciation occurs at rate λ anywhere along the DLMH. Any speciation that took place cannot have led to extant descendants, because in that case we would have observed them. The chance of a speciation not leading to any extant descendants is P_{0}(t), where t is the time at which the speciation took place. Thus, “hidden” speciation events occur at rate 2λP_{0}(t) anywhere on the branch. The factor 2 accounts for the fact that either one of the two species may be the one that leaves no descendants. The expected number of hidden speciation events on a branch between time t_{o} and t_{e} becomes:

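E(S) = ∫_{t_{e}}^{t_{o}} 2λP_{0}(t) dt = 2λ(t_{o} − t_{e}) − 2 ln[(λe^{(λ−μ)t_{o}} − μ) / (λe^{(λ−μ)t_{e}} − μ)] for λ≠μ. (2)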
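
The expected number of hidden speciation events can be checked numerically. The sketch below integrates the rate 2λP_{0}(t) between t_{e} and t_{o} with a simple midpoint rule and compares the result to a closed form obtained by integrating P_{0}(t) analytically; that closed form, the function names, and the example rates are assumptions made for illustration.

```python
import math


def p0(lam, mu, t):
    """Probability that a species has no descendants after time t."""
    if math.isclose(lam, mu):
        return lam * t / (1.0 + lam * t)
    e = math.exp((lam - mu) * t)
    return mu * (e - 1.0) / (lam * e - mu)


def expected_hidden_numeric(lam, mu, t_o, t_e=0.0, steps=20000):
    """Midpoint-rule integral of 2*lam*P0(t) from t_e to t_o."""
    h = (t_o - t_e) / steps
    total = sum(p0(lam, mu, t_e + (i + 0.5) * h) for i in range(steps))
    return 2.0 * lam * total * h


def expected_hidden_closed(lam, mu, t_o, t_e=0.0):
    """Closed form of the same integral (an assumption of this sketch)."""
    d = t_o - t_e
    if math.isclose(lam, mu):
        return 2.0 * lam * d - 2.0 * math.log((1.0 + lam * t_o) / (1.0 + lam * t_e))
    r = lam - mu
    ratio = (lam * math.exp(r * t_o) - mu) / (lam * math.exp(r * t_e) - mu)
    return 2.0 * lam * d - 2.0 * math.log(ratio)


if __name__ == "__main__":
    lam, mu = 0.46, 0.43  # illustrative rates per million years
    print(expected_hidden_closed(lam, mu, 6.0, 0.0))   # whole 6-My branch
    print(expected_hidden_closed(lam, mu, 6.0, 5.0))   # old 1-My branch
    print(expected_hidden_closed(lam, mu, 1.0, 0.0))   # young 1-My branch
```

With these illustrative point values the whole branch gives roughly 2.7 expected hidden speciation events, and the old 1-million-year branch carries more of them than the young one of the same length.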

Note that equation 2 not only depends on the length of a branch (t_{o}–t_{e}), but also on the start and end times of a branch t_{o} and t_{e}. From equation 2, we can observe that we expect more hidden speciation events for older branches of length (t_{o}–t_{e}) than for younger branches of the same length. The intuitive reason is that older clades have had more time to go extinct, thereby leaving more hidden speciation events on older branches. Equation 2 provides the expected number of speciation events, but to obtain the probability distribution we need further derivations. First, we derive this probability for λ≠μ. As noted above, the probability density of a speciation event happening at time point T that is “hidden” (i.e., has no extant descendants) is 2λP_{0}(T). The probability that neither a speciation nor an extinction event happens on a branch during a time interval of length τ is e^{−(λ+μ)τ}. Together, the probability density that a branch at time point t_{o} is not extinct by a time point t_{o}–t_{e} later, and unobserved speciation events occur at times T_{1}, T_{2}, …, T_{S} is

(1/S!) e^{−(λ+μ)(t_{o}−t_{e})} ∏_{i=1}^{S} 2λP_{0}(T_{i}).

The factor S! accounts for the fact that the times T_{1}, T_{2}, …, T_{S} can be ordered in S! different ways, while we are only interested in the case in which they happen sequentially through time. Now, the probability of S unobserved speciation events happening between t_{o} and t_{e} is obtained by integrating the previous equation over T_{1}, T_{2}, …, T_{S}:

(1/S!) e^{−(λ+μ)(t_{o}−t_{e})} [∫_{t_{e}}^{t_{o}} 2λP_{0}(T) dT]^{S}.

We condition our probability function on the fact that we observe a branch between t_{o} and t_{e} (which has probability density P_{1}(t_{o})/P_{1}(t_{e}), see [Stadler 2010]), which yields

P(S) = [P_{1}(t_{e})/P_{1}(t_{o})] (1/S!) e^{−(λ+μ)(t_{o}−t_{e})} [∫_{t_{e}}^{t_{o}} 2λP_{0}(T) dT]^{S}.

Evaluating the integral yields:

P(S) = (1/S!) e^{−2λ(t_{o}−t_{e})} [(λe^{(λ−μ)t_{o}} − μ) / (λe^{(λ−μ)t_{e}} − μ)]^{2} {2λ(t_{o}−t_{e}) − 2 ln[(λe^{(λ−μ)t_{o}} − μ) / (λe^{(λ−μ)t_{e}} − μ)]}^{S}.

In the limit for λ=μ (using the property that for small x, we have e^{x}≈ 1 +x), we obtain,

P(S) = (1/S!) e^{−2λ(t_{o}−t_{e})} [(1 + λt_{o}) / (1 + λt_{e})]^{2} {2λ(t_{o}−t_{e}) − 2 ln[(1 + λt_{o}) / (1 + λt_{e})]}^{S}.

We can rewrite P(S) (for λ≠μ and λ=μ) as function of S and E(S) (where E(S) is provided in eq. 2):

P(S) = E(S)^{S} e^{−E(S)} / S!. (3)

Thus, we established that the number of hidden speciation events on a branch between time t_{o} and t_{e} follows a Poisson distribution with mean E(S). The times of the hidden speciation events follow a nonhomogeneous Poisson process with rate 2λP_{0}(T), where T denotes the time of the process.
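
The Poisson result can be illustrated by simulation. The sketch below draws hidden speciation times by thinning: candidate events are generated at the constant rate 2λ, and each candidate at time T is kept with probability P_{0}(T), which gives a process with rate 2λP_{0}(T). Thinning is one standard way to sample such a process and is not necessarily how the original analysis was done; the function names and rates are illustrative assumptions.

```python
import math
import random


def p0(lam, mu, t):
    """Probability that a species has no descendants after time t."""
    if math.isclose(lam, mu):
        return lam * t / (1.0 + lam * t)
    e = math.exp((lam - mu) * t)
    return mu * (e - 1.0) / (lam * e - mu)


def poisson_pmf(k, mean):
    """Poisson probability of observing k events given the mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def sample_hidden_times(lam, mu, t_o, t_e, rng):
    """Sample hidden speciation times on a branch from t_o to t_e by thinning.

    Candidates arrive at rate 2*lam (an upper bound on the true rate, since
    P0 <= 1); a candidate at time t is accepted with probability P0(t).
    Requires lam > 0.
    """
    times = []
    t = t_e
    while True:
        t += rng.expovariate(2.0 * lam)
        if t >= t_o:
            return times
        if rng.random() < p0(lam, mu, t):
            times.append(t)


if __name__ == "__main__":
    rng = random.Random(42)
    counts = [len(sample_hidden_times(0.46, 0.43, 6.0, 0.0, rng)) for _ in range(5000)]
    print(sum(counts) / len(counts))  # should be close to E(S) for these rates
```

The sample mean of the event counts should approach E(S) from equation 2, and the histogram of counts should approach `poisson_pmf(k, E(S))`.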

Splits may occur not only on the DLMH, but also on every lineage that derived from it. For example, the Denisova hominin may be more closely related to the Neanderthal than to H. sapiens. Thus, equation 3 describes the number of splits on the lineage that persisted to the present, but not the number of species resulting from these speciation events. To estimate how many species originated from these events, we need some further derivations. First, we will calculate the distribution of the number of simultaneously existing species s at time T in the past, conditional on a single species at present. Again, we start with deriving the case λ≠μ. The process starts with the single species that existed at time t_{o}, just after the human–chimp split. At some point in time T where t_{o} > T > t_{e}, the probability that there are s species simultaneously in existence is given by equation 1 above, (1 − α_{oT})(1 − β_{oT})β_{oT}^{s−1} (where the subscript indicates that (t_{o}–T) is substituted for t in α and β). We observe only H. sapiens at present (t_{e}), which happens if one of the species extant at time T has one surviving descendant species, and all other species extant at time T have no surviving descendant species. That has the probability s(1 − α_{Te})(1 − β_{Te})α_{Te}^{s−1}, where the subscript indicates that (T–t_{e}) is substituted for t in α and β. Thus, the probability that s species existed at time T and one species exists at present is s(1 − α_{oT})(1 − α_{Te})(1 − β_{oT})(1 − β_{Te})(α_{Te}β_{oT})^{s−1}. The distribution of the number of simultaneously existing species s, conditional on a single species at present, then becomes:

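P(s) = s(α_{Te}β_{oT})^{s−1} / Σ_{k=1}^{∞} k(α_{Te}β_{oT})^{k−1} = s(1 − α_{Te}β_{oT})^{2}(α_{Te}β_{oT})^{s−1}. (4)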

In the limit for λ=μ (using the property that for small x, we have e^{x}≈ 1 +x), we obtain

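P(s) = s[1 + λ(t_{o}−t_{e})]^{2} [λ^{2}(t_{o}−T)(T−t_{e})]^{s−1} / {[1 + λ(t_{o}−T)][1 + λ(T−t_{e})]}^{s+1}.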

For the DLMH, this equation gives the probability that there existed s hominin species at some time t_{o}– T after the human–chimp split, given that at present (t_{e}) only H. sapiens is left. The expected number of species at any point in time is

E(s) = Σ_{s=1}^{∞} s P(s) = (1 + α_{Te}β_{oT}) / (1 − α_{Te}β_{oT}). (5)
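
The distribution of coexisting species and its mean can be sketched numerically. The code below normalizes the joint probability given in the text, which is proportional to s(α_{Te}β_{oT})^{s−1}, using the identity Σ s x^{s−1} = 1/(1 − x)^{2}; the resulting closed forms are a reconstruction and should be treated as assumptions, as should the function names and the example rates.

```python
import math


def alpha_beta(lam, mu, t):
    """Return (alpha, beta) of the birth-death process after time t."""
    if t == 0:
        return 0.0, 0.0
    if math.isclose(lam, mu):
        a = lam * t / (1.0 + lam * t)
        return a, a
    e = math.exp((lam - mu) * t)
    denom = lam * e - mu
    return mu * (e - 1.0) / denom, lam * (e - 1.0) / denom


def _x(lam, mu, t_o, T, t_e):
    """Product alpha_Te * beta_oT that drives both the pmf and the mean."""
    _, beta_oT = alpha_beta(lam, mu, t_o - T)   # from t_o down to T
    alpha_Te, _ = alpha_beta(lam, mu, T - t_e)  # from T down to t_e
    return alpha_Te * beta_oT


def coexisting_pmf(s, lam, mu, t_o, T, t_e=0.0):
    """P(s species at time T | exactly one species at t_e), for s >= 1."""
    x = _x(lam, mu, t_o, T, t_e)
    return s * (1.0 - x) ** 2 * x ** (s - 1)


def expected_coexisting(lam, mu, t_o, T, t_e=0.0):
    """Mean number of coexisting species at time T: (1 + x) / (1 - x)."""
    x = _x(lam, mu, t_o, T, t_e)
    return (1.0 + x) / (1.0 - x)


if __name__ == "__main__":
    lam, mu, t_o = 0.46, 0.43, 6.0  # illustrative point values
    for T in (0.04, 1.5, 3.0, 4.5, 5.96):
        print(T, round(expected_coexisting(lam, mu, t_o, T), 3))
```

At both ends of the branch (T = t_{o} and T = t_{e}) the expected number of species is exactly 1, and it peaks in between, which matches the qualitative argument that richness needs time both to build up and to disappear.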

To obtain quantitative estimates of numbers of extinct species, we need estimates of t_{o}, λ, and μ. Recent estimates using autosomal genes suggest that the human–chimp split occurred between 5 and 7 Mya (Kumar et al. 2005; Hobolth et al. 2007), whereas sex-chromosomal sequences instead point to a divergence as recent as 4 Mya (Patterson et al. 2006). To reflect this uncertainty, we assigned t_{o} for the DLMH a normal distribution with mean 6 and SD 0.66 (so that 8 > t_{o} > 4 with 99.7% certainty). We estimated the speciation and extinction rates λ and μ from the branching times of a reconstructed phylogeny (Nee et al. 1994a; Stadler 2008). We used the following phylogeny of the hominoid primates: ((((Pan_paniscus:0.93, Pan_troglodytes:0.93):4.37, Homo_sapiens:5.3): 1.6, (Gorilla_gorilla:1.3,Gorilla_beringei:1.3):5.6):6.1, (Pongo_abelii:1.1, Pongo_pygmaeus:1.1):11.9) (Fig. S1). Consensus seems to exist on the topology of this phylogeny, discussion having focused on its branching times. We adopted an estimate of an approximately 0.93-million-year-old divergence of the bonobo and common chimpanzee (Won and Hey 2005; Hey 2010). The split between the two Gorilla species is estimated at 1.3 Mya (Thalmann et al. 2007), and the orangutan species are estimated to have diverged 1.1 Mya (Warren et al. 2001). We adopted estimates for the ages of deeper splits from Glazko and Nei (2003).
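
As a sanity check on the branch lengths in the phylogeny quoted above, the following sketch parses the bracketed tree and sums branch lengths from the root to every tip. It is a hand-written minimal parser for this simple case (named tips, a branch length after each non-root node), not a general Newick reader, and the function name `tip_depths` is hypothetical.

```python
def tip_depths(newick):
    """Return {tip_name: root-to-tip distance} for a bracketed tree string."""
    s = newick.replace(" ", "").rstrip(";")
    pos = 0

    def read_node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # skip "("
            tips = {}
            while True:
                tips.update(read_node())
                if s[pos] == ",":
                    pos += 1  # another child follows
                    continue
                pos += 1  # skip ")"
                break
        else:
            start = pos
            while pos < len(s) and s[pos] not in ":,()":
                pos += 1
            tips = {s[start:pos]: 0.0}
        # Optional branch length leading to this node.
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return {name: depth + length for name, depth in tips.items()}

    return read_node()


HOMINOID_TREE = (
    "((((Pan_paniscus:0.93, Pan_troglodytes:0.93):4.37, Homo_sapiens:5.3): 1.6, "
    "(Gorilla_gorilla:1.3,Gorilla_beringei:1.3):5.6):6.1, "
    "(Pongo_abelii:1.1, Pongo_pygmaeus:1.1):11.9)"
)

if __name__ == "__main__":
    for name, depth in tip_depths(HOMINOID_TREE).items():
        print(name, round(depth, 2))
```

All seven tips lie at the same distance from the root, so the tree is ultrametric, as the branching-time likelihood requires.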

Results

We obtained Bayesian estimates of λ and μ by MCMC sampling as described by Bokma (2008), assuming uniform priors <0,∞> for λ and μ, and using the likelihood of the branching times in a reconstructed phylogeny from Stadler (2008), which assumes a uniform prior <0,∞> on the age of the root. The broad prior for the root age tends to lead to higher estimates of λ and μ and, hence, high numbers of extinct species. We estimate a mean λ= 0.46 (95% CI 0.12–1.37) and μ= 0.43 (0.01–1.44) per million years, suggesting an average time from origin to extinction of a species of 0.43^{−1}= 2.3 million years, which is similar to fossil estimates for other mammals (Alroy 2009). Overall, we estimate high species turnover, but due to the relatively few branching times, we estimate a relatively long tail of high species turnover rates μ/λ reminiscent of the prior (Fig. S2). Also due to few branching times, including uncertainty in these has little effect on the posterior distribution of λ and μ.

Substituting 1000 samples of λ, μ, and t_{o} in equation 3, we found that the expected number of speciation events is just 3.6, although the 95% confidence region extends to as many as 12 such splits (Fig. 1A). We then substituted estimates of λ, μ, and t_{o} in equation 4 and found that the number of simultaneously existing species s is most likely low (Fig. 2), even for high species turnover rates μ/λ. According to equation 5, even though speciation events may have occurred anywhere along the DLMH, greatest extinct hominin species richness is expected about halfway along the DLMH. That is because very soon after the human–chimp split, time had been insufficient for species richness to build up, whereas close to the present, time was insufficient for species richness to disappear. Yet, even halfway along the DLMH, where the expected number of coexisting species reaches its maximum, we expect only two species, and with 95% probability, the number of hominin species existing at the same time has always been lower than 6 (Fig. 2).

Species that split from the DLMH may also themselves give rise to new species, so the total number of species derived from the DLMH (including H. sapiens) may well be larger than the number of speciation events on the DLMH, and also higher than the number of species s simultaneously existing. We simulated phylogenies with samples of λ and μ following Rambaut et al. (2004) and discarded those that did not have exactly one species after time t_{o}. For each of 10,000 retained phylogenies, we counted the total number of species. In that way, we determined that the expected total number of species more closely related to us than to the chimpanzee is 7.7, but with 95% probability there have been at most 27 hominin species (Fig. 1B).
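
The rejection scheme described above can be sketched with a simple event-driven simulation. This is not the software of Rambaut et al. (2004): it is a minimal stand-in that uses fixed illustrative rates instead of posterior samples, and it assumes that each speciation event adds one new species while the parent species persists. Both choices, and all function names, are assumptions of this sketch.

```python
import random


def simulate_clade(lam, mu, t_o, rng):
    """Simulate a birth-death clade for t_o million years from one species.

    Returns (number of species alive at the end, total species that ever existed).
    """
    alive = 1
    total = 1
    t = 0.0
    while alive > 0:
        # Waiting time to the next event among all living species.
        t += rng.expovariate(alive * (lam + mu))
        if t >= t_o:
            break
        if rng.random() < lam / (lam + mu):
            alive += 1   # speciation: one new species (assumption)
            total += 1
        else:
            alive -= 1   # extinction
    return alive, total


def conditional_totals(lam, mu, t_o, n_keep, seed=0):
    """Keep only simulations that end with exactly one living species."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_keep:
        alive, total = simulate_clade(lam, mu, t_o, rng)
        if alive == 1:
            kept.append(total)
    return kept


if __name__ == "__main__":
    totals = sorted(conditional_totals(0.46, 0.43, 6.0, n_keep=2000))
    print("mean total species:", sum(totals) / len(totals))
    print("95% quantile:", totals[int(0.95 * len(totals))])
```

Because this sketch fixes λ, μ, and t_{o} at single values, its output will not reproduce the posterior-averaged figures reported here; it only shows the mechanics of conditioning on a single survivor.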

Discussion

Applications of “mainstream evolutionary biology” to paleoanthropology invariably suggest a lower number of hominin species than commonly recognized. Analysis of chromosomal rearrangements in humans and chimpanzees suggested a most likely number of around 3, and up to 5 speciation events on the direct line to modern humans (Curnoe and Thorne 2003), in agreement with the expected number of 3.6 speciation events we calculated here. A more recent analysis suggested a similar number of 2 to 4 on the DLMH (Curnoe 2008). Allometric analyses of mammal families of similar size and weight as humans (Conroy 2002, 2003; Hunt 2003) also suggested a low number of hominin species simultaneously in existence. Similar to these studies, we calculated that it is highly unlikely that there ever simultaneously existed more than 5 hominin species. These estimates contrast sharply with some fossil interpretations: Where we estimated that the total number of extinct species is 7.7, fossil studies recognize a median of 14 species with a lower limit of 5, and some taxonomies recognize up to 25 species (Curnoe and Thorne 2003; Foley 2005). Several explanations can be given for the discrepancy between high fossil estimates of the number of hominin species and low estimates from other sources, and these explanations are not mutually exclusive. First, the species concepts used (e.g., morphological vs. evolutionary) may affect the number of species distinguished (Henneberg 2009). A second possibility is that numbers are inflated by phenotypic plasticity of morphological characters by which fossil species are identified, although this does not necessarily influence the reliability of classification (Collard and Lycett 2008). Finally, claims to “uniqueness” and individual prestige may lead to a bias toward recognizing many species, which, even though recognized in the field (Derricourt 2009; Tarver et al. 2011), might be very difficult to correct. 
Combining different types of data will help to provide a more accurate picture of human evolution, but based on the evidence already available (discussed above), it seems likely that splitters will have to adjust downward the number of hominin species they recognize.

Compared to the expectations calculated here, in particular, the number of species in the very recent past is improbably high (Fig. 2). Analyses of nuclear genomes revealed gene flow between H. sapiens, Neanderthal, and Denisova hominins (Green et al. 2010; Reich et al. 2010; Bi-Rached et al. 2011), which may indicate that these populations did not constitute biological species. However, reported levels of gene flow are low and may have occurred early in the divergence of these lineages (Reich et al. 2010). If these lineages were separate species and not isolated populations of H. sapiens, then around 40,000 years ago at least three hominin species existed simultaneously. As is illustrated in Figure 2, the expected number of coexisting species 40,000 years ago is 1 (i.e., H. sapiens) and with 96% probability there was no other species at that time. The probability that three (e.g., H. sapiens, Neanderthal, and Denisova hominins) separate lineages existed is 0.2%. These results do not depend much on our estimates of speciation and extinction rates. If we assign species status and very recent common ancestors to all the populations of Gorilla, Pan, and Pongo that some consider subspecies, we obtain very high speciation and extinction rates (λ≈μ≈ 3), but even then we expect only one species 40,000 years ago. Indeed, if H. floresiensis is also a distinct species, then there were four very recent species, which would be expected only if λ≈μ≈ 150, which is an extinction rate three orders of magnitude higher than fossil estimates for mammals (Alroy 2009). In other words, it is extremely unlikely that two out of three (or, even less likely, three out of four) very recent hominin species disappeared due to natural, background extinction. Thus, if future investigations confirm the species status of these lineages, this provides strong evidence for the idea that recent H. sapiens are not typical hominins, but a competitively superior species, the rise of which first restricted congeners to isolated refuges such as the island of Flores (Brown et al. 2004) and Gorham's cave on Gibraltar (Stringer et al. 2008), to eventually drive them to extinction on a global scale, as it continues to do today (Fig. S1).

Associate Editor: Dr. J. Vamosi

ACKNOWLEDGMENTS

We want to thank the referees for valuable comments on earlier versions of this manuscript. This research was funded by the Swedish research council (FB).