Potential conflict of interest: Nothing to report.
The origin of hepatitis B virus (HBV) infection in humans and other primates remains largely unresolved. Understanding the origin of HBV is crucial because it provides a framework for studying the burden, and subsequently the evolution, of HBV pathogenicity with respect to changes in human population size and life expectancy. To address this question, we examined the relationship between HBV phylogeny and the genetic diversity of modern humans, investigated the timescale of global HBV dispersal, and tested the hypothesis of HBV-human co-divergence. We find that the global distribution of HBV genotypes and subgenotypes is consistent with the major prehistoric modern human migrations. We calibrate the HBV molecular clock using the divergence times of different indigenous human populations based on archaeological and genetic evidence and show that HBV jumped into humans around 33,600 years ago; 95% highest posterior density (HPD): 22,000-47,100 years ago (estimated substitution rate: 2.2 × 10⁻⁶; 95% HPD: 1.5-3.0 × 10⁻⁶ substitutions/site/year). This coincides with the origin of modern non-African humans. Crucially, the most pronounced increase in the HBV pandemic correlates with the global population increase over the last 5,000 years. We also show that the non-human HBV clades in orangutans and gibbons resulted from cross-species transmission events from humans that occurred no earlier than 6,100 years ago. Conclusion: Our study provides, for the first time, an estimated timescale for the HBV epidemic that closely coincides with dates of human dispersals, supporting the hypothesis that HBV has been co-expanding and co-migrating with human populations for the last 40,000 years. (HEPATOLOGY 2013)
Hepatitis B is a major global public health concern with approximately 2 billion individuals infected with hepatitis B virus (HBV) and with more than 350 million chronic carriers.1 HBV has been phylogenetically classified into eight distinct genotypes (A-H), which are further divided into subgenotypes denoted by numerical subscripts (A1, B1, C3, etc.).2–4 Debate about the origin of the infection in humans and other apes has focused on three competing hypotheses.5 In the first scenario, because the South American-specific genotypes, F and H, are outliers to the rest of the genotypes, it has been suggested that HBV was endemic in the New World and spread to the rest of the world 400 years ago, soon after colonization by Europeans (New World Origin).5 In addition, this scenario suggests that HBV was transmitted to human populations of the New World as a result of one cross-species transmission from New World monkeys to humans around 2,000 years ago. A second hypothesis suggests that HBV was present in the common ancestor of the Old World primates and New World monkeys and co-speciated with them from 35 Myr to 10 Myr ago (co-speciation).6 Moreover, to explain the fact that HBV strains from primates and humans do not form distinct phylogenetic clades, this hypothesis further proposes that humans have been infected as a result of multiple cross-species transmission events from primates. Finally, and chronologically in the middle of the other two, it has been proposed that HBV could have been present in anatomically modern humans when they migrated from Africa, ∼60-70 thousand years ago (ka) (Out of Africa hypothesis).7–9
On current evidence, none of these three hypotheses can be accepted as the most probable. Modern humans originated in Africa 200 ka before present (BP), moved out of Africa 60-70 ka BP, spread to Eurasia and Oceania 40-45 ka BP, and finally to the Americas ∼15 ka BP.9–11 The “Out of Africa” hypothesis would be supported if global HBV genotype distributions matched these anatomically modern human (Homo sapiens) migrations. Crucially, HBV sequences sampled from several isolated indigenous populations belong to separate subgenotypes.12–16 In some cases, such as the Indonesian archipelago, the distribution of HBV genotypes/subgenotypes is associated with the ethnic origin of the populations.12 These geographical patterns indicate that HBV diversity might be associated with early waves of human migration, although HBV phylogeny does not match perfectly the evolutionary history of human populations or primates.5
We investigated the controversy about the origin of HBV in humans and systematically searched for patterns in HBV phylogeny related to modern human history. Based on evidence supporting the coincidence of HBV and human migrations, we investigated the timescale of global HBV dispersal and tested the hypothesis of co-divergence of the virus with modern humans using phylodynamic and phylogeographic methods. We also propose a model for the origin of HBV in Old World primates. We suggest, based on multiple lines of evidence, that the “Out of Africa” hypothesis is far more likely than the alternative hypotheses about the HBV origin in humans.
Abbreviations: HBV, hepatitis B virus; 95% HPD, 95% highest posterior density; ka, thousand years ago; tMRCA, time to most recent common ancestor.
Materials and Methods
Review and Analysis of HBV Epidemiology with Respect to Indigenous Populations.
If HBV co-diverged with human populations, we should be able to find distinct patterns relating to ancient human population movements. We systematically searched the literature of HBV epidemiology using the keywords “Amerindians,” “Pacific,” “Aborigines,” “Indigenous,” AND “HBV.” We also downloaded nucleotide sequences isolated from populations using these keywords. The search was completed in August 2010 and updated in May 2012 (Supporting Information).
Rationale of Analyses.
We tested for HBV-human co-divergence using a stepwise calibration-test approach. Briefly, we checked whether the coalescence times of the Amerindian population (13.0-20.0 ka BP), when used to calibrate the ages of the Amerindian-specific genotypes on the HBV tree, correctly recovered the timing of the co-migration of HBV and humans in Polynesia. These dates are based on genetic and archaeological evidence for the dispersal times of modern humans in the Americas.17 We then added the Polynesian and the Haitian calibration dates to our molecular clock analyses (6.6 ± 1.5 ka and no earlier than 500 years ago, respectively) so that the calibrations covered a larger part of the HBV genetic diversity. If HBV had only appeared in the human population a few thousand years ago, we would not expect both early and late coalescent dates in the human phylogeny to match those in the phylogeny of their HBV isolates. We also tested whether historical human population sizes correlated with the inferred effective population sizes of HBV.
Dating the Time to Most Recent Common Ancestor (tMRCA) of the HBV Lineages.
Molecular clock analyses were performed in two steps: First, we examined whether the tMRCAs of HBV lineages circulating specifically in isolated (indigenous) human populations coincided with the populations' well-established coalescent events from mitochondrial and Y chromosome haplogroup data (M1 model; Table 1). The calibration was placed at the root node of the F/H HBV genotypes from the Amerindians, corresponding to the first colonization of the Americas. This event is estimated to have occurred approximately between 13.0 and 20.0 ka BP,17 but probably towards the younger end of this range.18 The prior was approximated using a gamma distribution with a minimum bound of 12.5, median of about 15.0, and an upper 95% limit of about 19.0 ka.
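The shape of such an offset gamma prior can be explored with a short sketch. The text reports only the minimum bound, the approximate median, and the approximate upper 95% limit, so the shape values tried below (2 and 3) are assumptions for illustration, and the "upper 95% limit" is read here as the 95th percentile. The helper names are hypothetical and do not correspond to any phylogenetics package.

```python
import math

def erlang_cdf(x, shape, scale):
    """CDF of a gamma distribution with integer shape (closed form)."""
    if x <= 0:
        return 0.0
    z = x / scale
    tail = sum(z**n / math.factorial(n) for n in range(shape))
    return 1.0 - math.exp(-z) * tail

def erlang_quantile(p, shape, scale):
    """Invert the CDF by bisection."""
    lo, hi = 0.0, scale * (shape + 50.0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, shape, scale) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

OFFSET_KA = 12.5         # minimum bound stated in the text
TARGET_MEDIAN_KA = 15.0  # approximate median stated in the text

def fit_scale(shape):
    """Scale for which offset + gamma median equals the target median."""
    unit_median = erlang_quantile(0.5, shape, 1.0)
    return (TARGET_MEDIAN_KA - OFFSET_KA) / unit_median

prior_summary = {}
for shape in (2, 3):  # assumed shapes; the text does not report one
    scale = fit_scale(shape)
    median = OFFSET_KA + erlang_quantile(0.5, shape, scale)
    upper95 = OFFSET_KA + erlang_quantile(0.95, shape, scale)
    prior_summary[shape] = (scale, median, upper95)
    print(f"shape={shape} scale={scale:.3f} "
          f"median={median:.2f} ka upper95={upper95:.2f} ka")
```

Under these assumptions the two shapes bracket the stated upper limit of about 19.0 ka, which suggests that a shape between 2 and 3 would reproduce the reported summary values.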
Table 1. Inferred Coalescence Times For C3 and D4 Autochthonous Subgenotypes in the Pacific
Median Estimate (95% HPD), ka (M1 Model)
Median Estimate (95% HPD), ka (M2 Model)
Coalescent event used as calibration point.
M1 model: Results of phylodynamic analysis using a single calibration placed at the root node of the F/H HBV genotypes from the Amerindians, corresponding to the first colonization of the Americas (approximately 13.0 to 20.0 ka BP).
M2 model: Results of phylodynamic analysis using additional calibration points based on the coalescence time of the Asian founders (6.6 ± 1.5 ka) of Remote Oceania, used as a prior for the tMRCA of HBV subgenotype D4 in Polynesia. Moreover, an upper bound of 500 years was used as a prior for the coalescence of A5 in Haiti.
Given that the estimated dates for human and HBV lineages of Polynesian populations match (Table 1), we repeated the molecular clock analyses (second step) using additional calibration points (M2 model). Specifically, the second calibration point was based on the coalescence time of the Asian founders (6.6 ± 1.5 ka) of Remote Oceania,19 used as a prior for the tMRCA of HBV subgenotype D4 in Polynesia. We used the coalescence time of D4 rather than C3 as the calibration point because the time estimates for the origin from Near Oceania (6.2–12.0 ka) are wider than those for the origin from Asia (5.1–8.1 ka). Finally, given that the slave trade in Haiti started at the beginning of the 16th century, we used a conservative upper bound of 500 years for the coalescence of A5 in Haiti. Details about the analyses are described in the Supporting Information.
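The two calibration schemes can be summarized as a small data structure. The sketch below is illustrative only: the `Calibration` class and `from_mean_halfwidth` helper are hypothetical names, and treating "6.6 ± 1.5 ka" as a simple interval is an assumption (it reproduces the 5.1–8.1 ka range quoted for the Asian origin). The actual prior distributions are described in the Supporting Information.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Calibration:
    """A node-age constraint in thousands of years (ka); None means unbounded."""
    node: str
    lower_ka: Optional[float]
    upper_ka: Optional[float]

def from_mean_halfwidth(node, mean_ka, halfwidth_ka):
    """Build an interval calibration from a 'mean ± half-width' statement."""
    return Calibration(node, mean_ka - halfwidth_ka, mean_ka + halfwidth_ka)

# Values taken from the text; the interval reading of each is an assumption.
AMERINDIAN = Calibration("root of genotypes F/H (Amerindians)", 13.0, 20.0)
POLYNESIAN_D4 = from_mean_halfwidth("tMRCA of subgenotype D4 (Polynesia)", 6.6, 1.5)
HAITI_A5 = Calibration("coalescence of A5 (Haiti)", None, 0.5)  # at most 500 years

M1 = [AMERINDIAN]                            # first step: single calibration
M2 = [AMERINDIAN, POLYNESIAN_D4, HAITI_A5]   # second step: added calibrations

for name, model in (("M1", M1), ("M2", M2)):
    print(name, [(c.node, c.lower_ka, c.upper_ka) for c in model])
```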
Results
HBV Molecular Epidemiology Suggests that HBV Followed Modern Human Major Migrations.
We explored systematically the HBV dispersal in indigenous populations around the world (Supporting Table 1). Most strikingly, we found that in Australian Aborigines the prevalence of HBV infection is very high, ranging from 3% to 35%. Notably, two full-length HBV isolates from the Australian Aborigines, classified as genotype C, appear as outliers to the clade C radiation and are termed “novel variant genotype C.”16 The high divergence between genotype C strains and these novel variants suggests an ancient origin of HBV infection in this population. In the alternative scenario, with HBV infection in Aborigines being introduced after the European colonization of Australia about 200 years ago, we would expect the “Aboriginal” genotype C genetic diversity to be nested within the diversity of globally sampled genotype C sequences. However, this pattern is observed only for a few cases, which are most probably spillover infections from recent Australian settlers.
The distribution of HBV genotypes in South America also correlates with the ethnic origin of the population.13 Specifically, genotype F has been detected at a very high frequency among the HBV-infected Amerindians in all countries of South America (Venezuela, Colombia, Peru, Bolivia, Argentina, and Brazil) and, in general, its prevalence depends on the degree of admixture of the population with Amerindians.13,20 Moreover, F sequences sampled from sporadic cases among the non-Amerindian population appear as nested clades within the “Amerindian” genotype F radiation.3 Genotype H, which has been isolated from Amerindians in North and Central America, displays a close phylogenetic relationship with genotype F.21,22 This suggests an ancient introduction of the F/H ancestral strains to the Americas, certainly occurring before the recent European colonization.
HBV sequences sampled from several isolated indigenous populations, such as the Canadian Arctic, Indonesian tribes, Papua Indonesia, and Pacific islands, form distinct subgenotypes (B6 for Canadian Arctic, C3 for Pacific, C3 and C5-C10 for Indonesia). This pattern suggests that HBV genotypes and subgenotypes were shaped by different waves of human migration across the continents.12,23–26 This hypothesis is further supported by the observed gradient of nucleotide and amino acid diversity from west to east, as well as the clustering of HBV sequences from three Polynesian islands, which is in accordance with archeological and linguistic evidence for the initial west-to-east settlement of Polynesia.19 Crucially, analyses of the Y chromosome and the mitochondrial DNA (mtDNA) markers revealed a dual genetic origin of Polynesians (Remote Oceania) from Near Oceania (Melanesia) and Asia (see Supporting Information).19 The detection of two autochthonous subgenotypes (C3 and D4) in Remote Oceania is consistent with the dual genetic origin of this population.19
Phylogenetic analyses of the HBV sequences isolated from Haiti revealed that a proportion of them formed a monophyletic clade within subgenotype A5 from Africa. The latter finding suggests that the particular strains spread as the result of a founder effect that occurred during the period of the slave transport from Africa to Haiti between the 16th and 19th centuries. The fact that the nested “Haitian” A5 clade originated 200-500 years in the past suggests that the subgenotypes within A have been circulating in Africa for several centuries, well before the start of slave migration from Africa to the Caribbean.
Molecular Clock Estimates Are Compatible With the Out of Africa HBV Hypothesis and With Ancient HBV DNA Findings.
To test the co-divergence of HBV with humans, we examined whether the tMRCA of HBV lineages from particular regions correlated with previous estimates of divergence times of isolated human populations. Briefly, we employed a stepwise calibration of the HBV molecular clock. We initially tested whether the oldest calibration points (i.e., the migration into the Americas and Oceania) were reciprocally concordant. We then added a series of younger calibration points (see Materials and Methods and Supporting Information), allowing us to cover a larger part of the HBV history.
First, we performed molecular clock analysis in the overlapping S/P region, using a single calibration point at the root node of the F/H genotypes from Amerindians, to test whether we could infer the timing of the settlement of Polynesia (M1 model; Supporting Information) (Table 1). More specifically, human settlement of Remote Oceania occurred recently, and linguistic and archeological evidence points toward an origin from Asia or Near Oceania (Melanesia), respectively.27 Previous tMRCA estimates (6.2–12.0 ka for Y chromosome and 5.1–8.1 ka for mtDNA), in addition to the double origin of the Polynesians, were used as the major hypotheses to be tested by our molecular clock analyses of HBV.19 Our first calibration point, which was placed at the root node of the F/H genotypes from the Amerindians, allowed us to accurately recover the previously mentioned coalescence times of Polynesian populations (Table 1).
The molecular clock analysis using the additional younger calibration points (i.e., D4 subgenotype and A5 clade from Haiti; see Materials and Methods and Supporting Information) gave an estimate for the substitution rate of HBV of 2.2 × 10⁻⁶ (95% highest posterior density [95% HPD]: 1.5–3.0 × 10⁻⁶) substitutions/site/year. Our estimate for the tMRCA of HBV in humans was therefore 33.6 ka (95% HPD: 22.0–47.1 ka) (Table 2). The median tMRCAs for most HBV genotypes (A, B, D, and F) are similar to each other, ranging from 8.9 to 12.7 ka (Table 2; Figs. 1-3; Supporting Figs. S2-S4). Genotype C was the oldest, due to the inclusion of the outlier “Aboriginal” strains (median estimate 26.2 ka; Fig. 2). In contrast, genotypes E, H, and G appeared much more recently, although considerable differences were observed in their median tMRCAs (0.7–6.0 ka; Table 2).
Table 2. Estimated Time to the Most Recent Common Ancestor (tMRCA) for Major HBV Subgenotypes
Is there evidence that HBV is evolving so slowly? Notably, in a recent study Bar-Gal et al.28 described the detection and molecular characterization of HBV DNA isolated from a Korean child naturally mummified in the 16th century A.D. This finding provides the first physical evidence that humans were infected with HBV at least 400 years ago, but also allows us to check if our molecular clock findings are consistent. The ancient sequence from the Korean mummy was not an outlier to the most recent HBV subgenotype C2 sequences (Fig. 2), confirming that HBV is a slow-evolving pathogen and that its clades (genotypes and subgenotypes) were shaped long before the 16th century A.D.
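A back-of-the-envelope calculation illustrates why a roughly 400-year-old sequence is expected to look almost modern at this rate. The sketch below assumes a strict clock and ignores repeated substitutions at the same site, both simplifications made here for illustration; the function name is hypothetical.

```python
# Median substitution rate and 95% HPD reported in the text
# (substitutions/site/year).
RATE = 2.2e-6
RATE_HPD = (1.5e-6, 3.0e-6)

def expected_change(rate, years):
    """Expected substitutions per site along one lineage (strict-clock assumption)."""
    return rate * years

mummy_age_years = 400  # approximate age of the 16th-century sample
per_site = expected_change(RATE, mummy_age_years)
low, high = (expected_change(r, mummy_age_years) for r in RATE_HPD)

print(f"median rate: {per_site:.2e} substitutions/site over {mummy_age_years} years")
print(f"HPD range:   {low:.2e} to {high:.2e} substitutions/site")
print(f"approximate identity to the ancestral sequence: {100 * (1 - per_site):.2f}%")
```

At the median rate this amounts to fewer than one change per thousand sites along a single lineage, so an ancient sequence nested among recent C2 sequences is what a slow clock would predict.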
The Most Pronounced Period of HBV Growth Coincides With Exponential Growth of Modern Humans.
The estimated population history of HBV, measured as the product of the effective number of infections and generation time (NeT) (Fig. 4), suggests that the most pronounced period of growth began about 5.0 ka and lasted for at least 4,000 years. The exponential phase in the HBV epidemic coincides with the population expansion of modern humans over the past 5,000 years, during which the global population increased from 15 million to 3,000 million (P < 0.001) (Fig. 4).29,30
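To give a sense of scale, the human population figures quoted above can be converted into an average growth rate. The sketch assumes constant exponential growth across the whole 5,000-year interval, which is a simplification introduced here and not a claim of the study; the function name is hypothetical.

```python
import math

def exponential_growth_rate(n_start, n_end, years):
    """Average per-year rate r such that n_end = n_start * exp(r * years)."""
    return math.log(n_end / n_start) / years

# Figures from the text: 15 million to 3,000 million over the past 5,000 years.
rate = exponential_growth_rate(15e6, 3000e6, 5000)
doubling_time = math.log(2) / rate

print(f"average growth rate: {rate:.5f} per year")
print(f"implied doubling time: {doubling_time:.0f} years")
```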
HBV and Human Co-cladogenesis Coincide in Shape.
Is there any similarity between the phylogenies of HBV and of human populations to support co-cladogenesis of HBV and humans? If so, we would be able to see the formation of HBV clades coinciding with the formation of clades in the human phylogenetic tree. During the last 20 years, phylogenetic analyses of both uniparental (mtDNA and Y chromosome) and X chromosome genetic markers have revealed that non-African populations are nested within the African diversity.31 The uniparental markers revealed that the first split of non-Africans from their ancestors occurred ∼60,000 years ago (K = 2; where K corresponds to the number of distinct clades or populations into which the dataset is divided) (Fig. 5). The next split separates data into partitions from five geographic regions (K = 5; Africa, East Asia, South Asia, Oceania, and Europe) occurring ∼40,000 years ago.31 Finally, an additional geographic division appears from America more recently at ∼20,000 years. The cladogenesis of the major HBV lineages, estimated to have occurred ∼20,000 years ago (Fig. 5), is highly consistent with the split times in both uniparental markers' trees distinguishing all major continents,31 suggesting that HBV major clades were generated as a result of major migrations of human populations (Fig. 5).
As suggested by phylogeographic analyses performed on the Bayesian trees, genotype A originated in Africa ∼10.0 ka (Table 2). At ∼7.9 ka, genotype A spatially divided into two major branches from western, southern (A1, A2, and A6), and eastern Africa (A3, A4, and A5) (Supporting Fig. S2),2 which further divided into the subgenotypes and area-specific branches within subgenotypes (A2 Europe, A5 Haiti). The tMRCA of the founder A2 lineage in Europe was estimated at ∼5.0 ka (95% HPD: 2.7–7.5 ka), suggesting a migration from Africa to Europe. In addition to the A2 subclades that spread outside Africa, A1 strains have been detected in East and South Asia at a similar point in time to the European A2 (Table 2; Supporting Fig. S2).
Genotype B also showed geographical clustering into three branches from indigenous Arctic populations (B6), East Asia (B1 and B2), and South-East Asia (B3–B9). However, we found no evidence relating to the source of this dispersal (Supporting Fig. S3).
For genotype C, most subgenotypes (except for C1 and C2 in East Asia) showed a strong geographic pattern in their clustering, suggesting that dispersal occurred through different founders (Near and Remote Oceania, insular Southeast Asia) (Fig. 2). Notably, strains from Remote Oceania (subgenotype C3) clustered with those from Near Oceania (C6), matching modern human migrations in this area (Fig. 2).19 However, no conclusions about the source of dispersal can be drawn.
Genotype D showed the highest level of spatial complexity. Specifically, except for D4 and D7 isolated from Remote Oceania, Australian Aborigines and Tunisia,15,32 genotype D isolates from Western Asia showed no clustering according to their geographic origin (Fig. 3).33–35 There were several routes of dispersal from Western Asia (Iran, India, Asia Minor) to Europe (D2), South-East Asia (D6), or East Asia (D1) ∼5.0-6.0 ka, probably matching late agricultural spread from the Middle East to Europe and eastwards. Notably, a few isolates from the Canadian Arctic formed a monophyletic group within the D4 subgenotype from indigenous populations in Australia and Remote Oceania.36 The D4 strains were associated with the First Nation (Dene) population from the Western Arctic, in contrast to the subgenotype B6 found in Inuit living in the Eastern Arctic. Interestingly, the eight D4 strains were detected in different communities distantly located across the southwest Canadian Arctic.36 Identification of monophyletic strains among the indigenous Arctic populations suggests a potential settlement of the Canadian Arctic from the Pacific. The tMRCA of D4 in the Arctic was 4.1 ka (95% HPD: 1.8–6.2 ka). The latter hypothesis cannot be rejected on the basis of the estimated tMRCA of D4, given that the colonization of the Pacific occurred during the last 2.0–3.0 ka.37
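The compatibility argument for D4 in the Arctic amounts to asking whether two date ranges overlap. A minimal sketch of that check follows; the function name is hypothetical, and a plain interval overlap is a crude stand-in for a formal statistical test, used here only to make the reasoning explicit.

```python
def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return max(a_lo, b_lo) <= min(a_hi, b_hi)

# Ranges taken from the text, in ka.
d4_arctic_hpd_ka = (1.8, 6.2)         # 95% HPD of the D4 tMRCA in the Arctic
pacific_colonization_ka = (2.0, 3.0)  # colonization of the Pacific

compatible = intervals_overlap(d4_arctic_hpd_ka, pacific_colonization_ka)
print("dates compatible:", compatible)
```

Because the colonization window lies inside the 95% HPD, the dates alone do not exclude a Pacific source, which is all the argument requires.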
A well-defined geographical separation was also observed for genotypes F and H. The genetic diversity of genotype F was greater than that of H, but no geographic origin could be traced for genotype F (Supporting Fig. S4). Notably, genotype F diversified into subgenotypes termed F1b, F2a, F2b, etc., suggesting high levels of isolation for the Amerindian population carrying HBV.
Cross-Species Transmission of HBV From Human to Nonhuman Primates.
The branching order of primate HBV sequences indicates three independent transmission events, giving rise to the gibbon, orangutan, and chimpanzee HBV lineages, with minimum ages of 12.8, 6.9, and 8.2 ka, respectively (Figs. 1, 6). The orangutan HBV lineage is closely related to the C4 and J human lineages. Chimpanzee-derived HBV sequences, on the other hand, are more distantly related to extant human lineages, resembling a “new” genotype within the HBV human radiation. This pattern suggests a cross-species event from humans to chimpanzees from an ancient human lineage that went extinct (Figs. 1, 6). This is not surprising, given the ancient nature of potential chimpanzee ancestors. Based on the currently available HBV sequences and the nested clustering of both Asian and African ape sequences within human-derived HBV sequences, the opposite scenario of HBV origins in humans (ape-to-human transmission) is unlikely.
Discussion
Our systematic survey of HBV dispersal in isolated human populations provides several lines of evidence that HBV co-diverged with modern humans. First, there is a high congruence between branching points in the HBV genealogy and those of humans: if we calibrate the HBV tree at the root node of the F/H genotypes from Amerindians using dates from genetic and archeological evidence, the estimated divergence times for subgenotypes C3 and D4 in Near and Remote Oceania are highly consistent with inferred colonization times.19 Second, our estimate of the population history of HBV over 20.0 ka closely mirrors that of humans. Third, the age of HBV infection in humans, dating back to 33.6 ka with an upper bound of ∼47.1 ka, is in agreement with the estimated coalescence time of modern non-African human mitochondrial and Y chromosomal lineages.9,11 Fourth, the distant branching of the C4 subgenotype from the rest of genotype C strains is compatible with the ancient nature of the HBV Aboriginal strains. The estimated timeframe of the genotype C coalescence (26.2 ka; 95% upper bound: 38.9 ka) is also in accordance with previous date estimates of ∼30.0 ka for the separation of Australian and Asian human and bacterial genetic markers.38 The human genetic and archeological evidence points to a colonization of Australia and New Guinea around 40.0 to 45.0 ka9,11; however, the divergence time of New Guineans from Asia was recently estimated at 27.0 ka,39 which matches our estimates for genotype C. Finally, the date of cladogenesis for the major HBV lineages matches the tMRCA (20.0 ka) (Fig. 5) in both mtDNA and Y chromosome trees of modern human populations from five continents.31
In the absence of ancient HBV DNA samples, we divided our co-divergence hypothesis into independent components and tested them for their robustness. For example, do molecular estimates from HBV sequences, calibrated using the date of the human colonization of the Americas, correctly predict the colonization of the Pacific Islands? If so, then the pattern of HBV dispersal through these regions was similar to that of their host, suggesting that the co-divergence hypothesis is at least internally consistent. Recently, Bar-Gal et al.28 detected HBV in a Korean mummy dating from the 16th century A.D. Given that there is no contamination from recent HBV-DNA samples, as the authors explained thoroughly in their study, the high similarity of the ancient sequences with synchronous samples (∼99%) is concordant with our substitution rate (∼10⁻⁶) and the age of the sample (∼400 years).
Our model indicates that the major HBV genotypes and subgenotypes resulted from multiple founder events that occurred subsequent to the Out-of-Africa human migration.40 This recent generation of the global HBV genetic tapestry (∼10 ka) explains why only one of the genotypes (A) is endemic in Africa. That we do not find the highest genetic diversity of HBV in Africa is because recent studies on the populations of Africa suggest that the ancient Homo sapiens, and most probably their associated HBV lineages, were replaced by more recent population expansions.41
Given our proposed model about the long march of the virus in modern humans, another question is how the clinical manifestations of the infection remained hidden for such a long time. HBV infection results in chronic infection at a rate of 10%–80%, depending on the age of the infected population. Moreover, among the chronically HBV-infected individuals, 1%–14% have a risk of developing hepatocellular carcinoma (HCC) after > 30 years of infection.42 Thus, the very long induction period and the low HBV-associated mortality rate can explain the ancient nature of the pathogen in humans, given that the lifespan of humans was less than 40 years until 150–200 years ago. Moreover, HBV can be effectively transmitted vertically or horizontally (sexually, bloodborne, or intrafamilial), suggesting that HBV may have caused extensive epidemics in the past, spreading either vertically or through human practices.
Other, divergent lineages of HBV have been isolated from different avian and rodent species, indicating its ancient origin.43–45 In contrast, HBV has been detected in only a few nonhuman primates, with all of these strains (except for those from the woolly monkey) falling within the human HBV radiation. This pattern suggests that the lineages of HBV from nonhuman primates were the result of at least three different human-to-ape cross-transmission events that occurred no earlier than 6,100 years ago. The apparent absence of HBV infection in other primate families (Cercopithecidae, Atelidae, Cebidae, Lemuridae and Callimiconidae) supports our hypothesis about a more recent, human-derived origin of HBV infection in these animals. The abundance of highly divergent HBVs from birds (Ross' goose, Sheldgoose, Duck and Snow goose) and other species (e.g., woodchuck and squirrel)45 also suggests that these viruses have been infecting different animal hosts for a long time and, therefore, that one of the animal hosts also provided the source of HBV infection to humans.
Our study using “deep” calibration ages provides an older estimate for the long-term evolution of the HBV infection in modern humans. Although it was previously proposed that HBV might follow the migrations of modern humans out of Africa,7,8 ours is the first study providing compelling lines of evidence that this hypothesis is the most likely. We also found evidence for HBV infection in Old World nonhuman primates being the result of human-to-ape transmission events. We have described a complementary approach to study the history of pathogens, based on evidence of phylogeographic co-divergence with their host.38 This approach might be applied to clarify other host-pathogen histories.