Inferring archaic introgression from hominin genetic data

Abstract Questions surrounding the timing, extent, and evolutionary consequences of archaic admixture into human populations have a long history in evolutionary anthropology. More recently, advances in human genetics, particularly in the field of ancient DNA, have shed new light on the question of whether or not Homo sapiens interbred with other hominin groups. By the late 1990s, published genetic work had largely concluded that archaic groups made no lasting genetic contribution to modern humans; less than a decade later, this conclusion was reversed following the successful DNA sequencing of an ancient Neanderthal. This reversal of consensus is noteworthy, but the reasoning behind it is not widely understood across all academic communities. There remains a communication gap between population geneticists and paleoanthropologists. In this review, we endeavor to bridge this gap by outlining how technological advancements, new statistical methods, and notable controversies ultimately led to the current consensus.


| ESTIMATES OF ARCHAIC ADMIXTURE BASED ON ANCIENT MITOCHONDRIAL DNA SEQUENCING
The first ancient Neanderthal DNA to be successfully isolated and analyzed was mtDNA from the Neander Valley type specimen, discovered in 1856. 15,16 The relative abundance of mtDNA in cells, compared with nuclear DNA, made it a logical starting place for sequencing ancient Neanderthal DNA (Box 2). These first studies found that, across a total of 600 base pairs (bp) of sequence, Neanderthal mtDNA fell well outside the bounds of extant human mtDNA variation ( Figure 1), exhibiting on average three times as many pairwise differences from extant humans as different human populations did between each other. Importantly, these researchers also did not find that Neanderthal mtDNA was more similar to that of Europeans than to that of Africans or Asians. This observation went against a key prediction under the multi-regional model that Neanderthals contributed substantially to the ancestral gene pool of modern Europeans. 2,17,18 Sequence differences were also used as a molecular measure of divergence time, calibrated using a human-chimpanzee divergence of 4-5 million years ago. 19,20 Both studies consistently found a mtDNA sequence divergence time of approximately half a million years between the Neander Valley specimen and modern humans, which is approximately three to four times older than the average divergence time between extant human mtDNA sequences ( Figure 1). 15,16 These results from a single Neanderthal showed that its mtDNA was evolving separately from AMHs for over half a million years and is no longer present in the modern human gene pool.
Additional work expanded these analyses by including mtDNA sequence data from an additional Neanderthal individual from Vindija Cave, Croatia. 21 This individual's mtDNA also exhibited large sequence differences from extant human mtDNA sequences, and phylogenetic analysis placed these Neanderthals together in a deeply diverged clade. 21 The degree of sequence diversity of the Neanderthal population was estimated by comparing the two sequences to each other and to a third shorter mtDNA sequence from a more ancient Neanderthal individual from Mezmaiskaya Cave, Russia. 21 By sequencing multiple archaic individuals, especially ones so geographically dispersed, researchers could confidently say that Neanderthal mtDNA sequences were highly distinct from those of modern humans and were not more closely related to any one extant population. Furthermore, mitochondrial aDNA sequences from nearly contemporaneous Upper Paleolithic AMH specimens were found to fall within the range of modern human mtDNA variation, distant from the Neanderthals. 22,23 The presence of significant genetic differences between the mtDNA of AMH and Neanderthal groups that lived within just 15 ka of each other implied strong reproductive boundaries between the two groups, and contradicted the classic multiregional hypothesis.
The analysis of mtDNA led most geneticists to initially conclude that archaic introgression did not occur. 22,24,25 The availability of additional mtDNA sequencing data has also not significantly changed the broad phylogenetic pattern ( Figure 1). However, mtDNA is a single locus, and can therefore offer only limited information about potential archaic admixture (Box 2). Non-neutral forces such as natural selection for AMH mtDNA (or against archaic mtDNA) could have also led to the complete loss of Neanderthal mitochondrial variation in AMH. 26 Another possibility is that the interbreeding event(s) were sex-biased; in the extreme case, where 100% of interbreeding events involved a Neanderthal male and an AMH female, Neanderthal mtDNA would have never entered the modern human gene pool. Additionally, genetic drift could have erased evidence of archaic introgression from the extant pool of mitochondrial variation. Several population genetics models showed that some degree of interbreeding is compatible with an absence of archaic mtDNAs in the modern gene pool (Table 1).
These various models were, however, difficult to test further without information from additional independent loci, such as from the nuclear genome. Despite these data limitations, geneticists generally agreed that archaic-modern human matings were an unlikely (or at least infrequent) occurrence, a consensus that held until the first archaic hominin nuclear DNA sequencing results were published in 2006.

| ARCHAIC AUTOSOMAL GENOMES
It was not always obvious that the full nuclear genome of an archaic individual would ever be sequenced. aDNA, if it survives in any appreciable quantity, is highly damaged and fragmented, which makes piecing long sequences together a major technological and computational challenge. However, the development of "next-generation sequencing" (NGS) technology in the 2000s significantly mitigated this problem. One benefit of NGS is that individual loci do not need to be specifically targeted to be sequenced; it is capable of sequencing a random selection of all the fragments in a DNA sample. The resulting short reads can later be assembled computationally by mapping (or aligning) them to a reference genome. By contrast, the Sanger sequencing method used in earlier studies required researchers to

TABLE 1 Estimates of initial Neanderthal genomic contribution to AMH based only on mtDNA evidence

| m | Model | Citation |
| --- | --- | --- |
| <10% | Effective population size of AMH females is 16,000, and no archaic mtDNA is observed in a modern sample of 5,000 mtDNA sequences | 26 |
| Up to 25% | Single pulse, panmictic population | 56 |
| 0% | No model, examined differences between mtDNA hypervariable regions of Neanderthals and AMHs (pairwise and in MDS space) | 22 |

BOX 2 Mitochondrial DNA
The human mitochondrial genome is a small (16,569 bp) stretch of non-recombining DNA that is passed from mother to child through the mother's egg cell. Over evolutionary time, mutations accumulate in different mitochondrial lineages, which makes it possible to reconstruct past relationships between different groups and trace the genetic history of females in the population. The pedigree figure above left, which depicts females as circles and males as squares, shows the transmission of mitochondrial genomes (colored ovals) through the generations. Without recombination, offspring carry the same sequence as their mother, except when novel mutations occur.
Early aDNA studies found that, in old, degraded specimens, mitochondrial sequences were the most readily recoverable DNA/genetic material. This is primarily due to their abundance; each cell in the body carries only two copies of the nuclear genome, but up to thousands of mitochondria that each contain several copies of their genome. Furthermore, because the modern human mitochondrial sequence was well known, it was feasible to target a phylogenetically informative region in an ancient specimen for sequencing using the older Sanger sequencing technology. For these reasons, mitochondrial aDNA quickly became an important source of information for studies of archaic admixture.
However, mtDNA has limited power to conclusively answer whether or not archaic hominins and AMH interbred. One reason is that, because it is transmitted through the generations only by females, the mitochondrial genome always has a smaller effective population size than the autosomal nuclear genome and is subject to a proportionately stronger degree of genetic drift. Therefore, while the absence of archaic mtDNA lineages in modern humans was interpreted by some to indicate no introgression, this observation is in fact compatible with a substantial level of introgression.
This situation is illustrated in the figure below left, where introgression occurs at generation t1 with a small number of yellow Neanderthal mitochondrial sequences migrating into the modern human gene pool (red arrow). Over the next few generations, the frequency of the yellow mitochondrial type drifts to zero even in the absence of negative selection.
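The drift process described here can also be explored numerically, separately from the figure. The following is a minimal sketch and is not taken from any of the studies cited in this box: it assumes a neutral Wright-Fisher model with a constant number of females, a single pulse of introgression at fraction m, and no selection. The function names, the population size, and the number of replicates are illustrative choices, kept small so that the example runs quickly.

```python
import random


def archaic_mtdna_lost(n_females, m, generations, rng):
    """Run one neutral Wright-Fisher history of a maternally inherited locus.

    Returns True if the archaic mitochondrial type is absent at the end.
    """
    count = round(n_females * m)  # archaic copies right after introgression
    for _ in range(generations):
        if count == 0 or count == n_females:
            break  # lost or fixed: pure drift cannot change this any more
        p = count / n_females
        # Each daughter draws her mother at random from the previous generation.
        count = sum(1 for _ in range(n_females) if rng.random() < p)
    return count == 0


def loss_probability(n_females, m, generations, replicates=200, seed=1):
    """Fraction of independent replicate histories in which the archaic type is lost."""
    rng = random.Random(seed)
    lost = sum(
        archaic_mtdna_lost(n_females, m, generations, rng)
        for _ in range(replicates)
    )
    return lost / replicates


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for m in (0.05, 0.25, 0.5):
        print(m, loss_probability(n_females=100, m=m, generations=1000))
```

Under these assumptions, the archaic type is lost in most replicates when m is small and less often when m is large, which matches the qualitative dependence on m described in this box. Each replicate corresponds to one possible history of a single locus.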
It is important to note that this illustration depicts only one possible iteration of the highly stochastic process that leads to new generations. Under this model, it is also possible that the yellow type persists in the human gene pool until the present day. The likelihood of this scenario increases with higher levels of initial migration (m), and decreases with the age of the gene flow event. In order to determine how many independent loci (i.e., different iterations of the evolutionary process) would be needed to make a determination on the occurrence of admixture between modern humans and Neanderthals, Wall conducted a power analysis assuming a specific demographic model, and estimated that information from 50 to 100 independent loci would be needed. 137 Therefore, while mitochondrial sequence information can paint a general picture of the evolutionary relationship between populations, it offers inadequate resolution to rule out low levels of archaic introgression.

Anthropology produced a full, low-coverage Neanderthal genome. 34 The researchers produced this genome by combining sequencing data from three Neanderthal individuals from Vindija. 34 Importantly, they explicitly estimated contamination levels of their libraries. By looking at diagnostic positions in the mitochondrial genome where Neanderthals and modern humans carried fixed differences, Green et al. concluded that contamination by modern humans contributed less than 1% to their dataset. 34 The most highly publicized result of this 2010 article was that individuals from certain extant human populations contain a substantial amount, between 1 and 4%, of Neanderthal-derived ancestry in their genomes (Table 2). 34 The researchers arrived at this figure by developing a novel test, which came to be known as the "D" or

shed. 35 This individual was sequenced from a single finger bone, and was found to be genetically divergent from both modern humans and Neanderthals.
Nuclear sequence data placed this group as sister to Neanderthals. 35 The specimen was designated as a member of an unknown archaic population, which was named "Denisovan" after Denisova cave in Siberia where it was discovered. 35 As in the Neanderthal study, the researchers estimated f, the proportion of Denisovan ancestry in modern humans, using both parametric and non-parametric approaches. Interestingly, they found a large contribution (4-6%) of this archaic group to modern Melanesians, but no contribution to Eurasians (Table 3). 35 Subsequent research has estimated the Denisovan contribution to Melanesians to be only about half that, after also accounting for Neanderthal admixture. 36,37 Additional studies have used a variety of methods to estimate f for both Neanderthals and Denisovans; a summary of these estimates is found in Tables 2-3. In general, initial estimates of the archaic fraction of modern human genomes have tended to be high, with later publications revising these figures downward.

FIGURE 1 The mitochondrial phylogeny of a Sima de los Huesos hominin, four Denisovan, 19 Neanderthal, 5 extant human, and 4 ancient AMH mitochondrial sequences (15,788 aligned base pairs in total) constructed using the neighbor joining method. 120,121 The branch lengths are proportional to the evolutionary distances computed using maximum composite likelihood. All analyses were conducted in MEGA7. 122 Branch tips are labeled with a sample name, the accession number of the downloaded sequence in brackets, and the approximate date of the specimen. 32,35,75,123-136 The tree shows that Neanderthal mitochondrial sequences are more highly diverged from extant humans than all AMH (ancient and extant) are from each other. Interestingly, the mitochondrial phylogeny places Neanderthals and AMHs as sister groups to the exclusion of Denisovans and the Sima de los Huesos hominin, as observed previously. 136 This is in contrast to the phylogeny constructed from multiple loci of autosomal DNA, which instead places Neanderthals and Denisovans as sister groups. 35 This discrepancy highlights the fact that inferences of population history based on single loci can be misleading, as they reflect the history of only one gene lineage.

| HAPLOTYPE-BASED METHODS TO IDENTIFY GENOMIC REGIONS OF INTROGRESSION
Following the publication of the low-coverage Neanderthal genome sequence, researchers began to highlight specific loci where some modern humans carried haplotypes that were hypothesized to have an archaic source, uncovering evidence for introgression on a finer genomic scale. 38,39 Even before nuclear data from archaic hominins were available, haplotype analyses of modern humans were used to identify genomic candidates of archaic introgression. 40-43 These methods took the general approach of looking for haplotypes that were both highly diverged from other modern humans and also relatively long (Box 5). Once high-coverage archaic genomes became available, some of these cases were re-evaluated by comparing the hypothesized archaic haplotypes to their putative ancestral source.
In one study, Yotova et al. studied a specific haplotype on the X chromosome that is nearly absent in sub-Saharan Africans, common in non-Africans, and the most basal human haplotype. 39

BOX 3 Early Neanderthal genome studies
In 2006, the first two studies of a nuclear Neanderthal genome published significant quantities of ancient sequence and also inferred population genetic parameters such as Neanderthal-AMH divergence time and relatedness. 27,28 It was immediately clear that these two studies had inconsistent estimates of these fundamental parameters, motivating further analyses to understand the drivers of these discrepancies. 29

shared the derived allele with some modern humans at far more loci than they would have expected under a simple demographic model. 28 They concluded that this excess of derived SNP sharing was due to the occurrence of archaic introgression into some ancestral human populations. 28 By contrast, the study by Noonan et al. found no evidence of introgression; they surveyed their data for derived alleles that were at low frequency in Europeans and were also shared with the archaic individual, and found none. 27 In a reanalysis of both datasets using a uniform set of methods, Wall and Kim confirmed large inconsistencies between the two studies. 29

In recent years, this idea of modern humans acquiring beneficial genetic variants through introgression with archaic hominins has become a popular model for explaining how early human populations were able to rapidly adapt to the novel environments they encountered throughout the world. 44

BOX 4 Estimating the fraction of archaic ancestry in modern human genomes
The D statistic was first used by Green et al. (2010) to demonstrate that Neanderthals appeared to be more similar to non-African modern humans than Africans. 34 The appeal of this statistic, and its subsequent widespread use, can be attributed to its simplicity and the fact that it can be calculated even when there is only a single haplotype representing the archaic population. As illustrated above, the D statistic compares the number of derived alleles shared between the archaic specimen (N) and one of the modern human populations but not the other (H1/H2) at biallelic sites that exhibit either an "ABBA" or "BABA" pattern. These are determined through comparison to an outgroup, in this case the chimpanzee (Pan). The chimpanzee state is assumed to be ancestral, and is denoted as "A," while the derived allele is denoted as "B." While this assumption may not always hold, such as in the case of recurrent mutations on the chimpanzee lineage, the effects of this type of misspecification are not expected to systematically bias this statistic, as long as mutation rates across human groups are constant. 102 Multiple loci are tested for "ABBA" and "BABA" patterns, which do not follow the population tree and are thus expected to be a result of either introgression, ILS, or recurrent mutation. As the latter two processes are expected to affect all human populations equally, they should generate as many ABBA single nucleotide polymorphisms (SNPs) as BABA SNPs. In the equation above, c is either 1 or 0 based on whether the pattern is seen or not. To calculate D, the number of sites that conform to the ABBA pattern is subtracted from the number that conform to the BABA pattern and divided by the total number of sites considered. Thus, values of D that significantly deviate from 0 (ABBA-BABA equality) can support the presence of introgressed archaic ancestry in one of the modern populations.
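The site counting described above can be sketched in a few lines. This is an illustrative sketch only, not the implementation used in the cited studies: it assumes one aligned haplotype each for H1, H2, the archaic specimen, and the outgroup, treats the outgroup allele as ancestral, and normalizes by the number of ABBA plus BABA sites. The order of subtraction is a sign convention; here ABBA minus BABA is used, so a positive value indicates excess sharing between H2 and the archaic genome, and reversing the order, as in the wording above, only flips the sign. The function names and the toy sequences are hypothetical.

```python
VALID_BASES = set("ACGT")


def abba_baba_counts(h1, h2, archaic, outgroup):
    """Count ABBA and BABA sites in four aligned, equal-length sequences.

    Site patterns are read in the order (H1, H2, archaic, outgroup), where
    A is the outgroup (assumed ancestral) allele and B the derived allele.
    """
    abba = baba = 0
    for a1, a2, n, c in zip(h1, h2, archaic, outgroup):
        alleles = {a1, a2, n, c}
        if not alleles <= VALID_BASES or len(alleles) != 2:
            continue  # skip missing data and sites that are not biallelic
        if n == c:
            continue  # the archaic genome must carry the derived allele
        if a1 == c and a2 == n:
            abba += 1  # H2 shares the derived allele with the archaic genome
        elif a1 == n and a2 == c:
            baba += 1  # H1 shares the derived allele with the archaic genome
    return abba, baba


def d_statistic(abba, baba):
    """Return (ABBA - BABA) / (ABBA + BABA)."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / total


if __name__ == "__main__":
    # Toy sequences (hypothetical): two ABBA sites and one BABA site.
    h1 = "AAGAAG"
    h2 = "GGAAAA"
    archaic = "GGGGAA"
    outgroup = "AAAAAA"
    abba, baba = abba_baba_counts(h1, h2, archaic, outgroup)
    print(abba, baba, d_statistic(abba, baba))
```

In practice, significance is assessed across many loci (for example with a block jackknife), which this sketch does not attempt.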
Importantly, the D-statistic does not directly yield an estimate of the archaic ancestry proportion (f), but is simply an observation that parameterized demographic models can be compared with. Another method of obtaining a point estimate of f uses the S-statistic, which is simply the numerator of the D-statistic. The equation above and diagram below show how, in theory, the ratio between S-statistics can be used to estimate f directly, where H_nAfr is a modern non-African human population whose ancestors experienced introgression, H_Afr is the African population that is assumed to have not experienced introgression, and N_A is the ancestral Neanderthal population that contributed genetic material to the ancestors of H_nAfr. In practice, N_A cannot be known for certain, so a second Neanderthal individual is used as a proxy. The numerator measures how much more similar the first Neanderthal is to the modern non-African than to the modern African. The denominator yields an estimate of the maximum value of S when comparing two Neanderthals. By normalizing the observed level of sharing between non-Africans and Neanderthals by this theoretical maximum, this ratio infers the proportion of the observed similarity that is due to introgression.

However, recent work by Chen et al. invalidates the assumption that Africans carry negligible Neanderthal ancestry, which is often made in estimating f in non-Africans using S-statistic ratios of the above form. 70 The presence of excess derived allele sharing between Africans and Neanderthals due to introgression may bias estimates of f in non-Africans by reducing the numerator S-statistic. The magnitude of this effect would depend on what proportion of the African-Neanderthal sharing is also shared by the non-African population; this would decrease the number of sites available to calculate the S-statistic, but should not downwardly bias f. Interestingly, 94% of the Neanderthal ancestry in Africans is also shared with a non-African group. 70 African-Neanderthal sharing could also inflate estimates of

the relationships between haplotypes (see Box 5), Huerta-Sanchez et al. found that the Tibetan version of the EPAS1 gene was most similar to the Denisovan. 51 A subsequent network analysis conducted on a more comprehensive panel of modern humans showed that the Denisovan haplotype was also found in high altitude populations of the Himalayas, and clusters within a wide array of diverse African haplotypes that share many EPAS1 alleles with the Denisovan. 53 This broader context demonstrates that EPAS1 haplotype variants were likely polymorphic in the ancestral human-Denisovan population and underwent incomplete lineage sorting (ILS) (see section on "Alternative explanations") prior to introgression. 53 Additionally, a follow-up study of modern Tibetan genomes found that their EPAS1 haplotypes exhibit a combination of Denisovan and non-Denisovan variants. 54 Based on these additional variants, the authors conclude that the population that contributed this haplotype to the ancestral Tibetan population had diverged from the reference Denisovan by between 238 and 952 ka. 54 Further questions regarding the precise genetic basis of hypoxia adaptation and the timing of acquisition and selection on this archaic EPAS1 haplotype in modern high-altitude populations continue to be investigated by both geneticists and archeologists.

An under-emphasized result of studies of the Neanderthal and Denisovan genomes is the lack of corroboration of genomic regions that had been previously hypothesized in earlier studies to be of archaic origin. For example, researchers suggested that the microcephalin (MCPH1) gene, which is involved in regulating brain volume, showed signatures of introgression. 43

Neanderthal mtDNA provided the first archaic sequence data that could be used to address this question, and showed that Neanderthal and modern human mtDNA gene pools were distinct and highly diverged. However, for reasons previously discussed, the absence of Neanderthal mitochondrial lineages in modern humans did not preclude the possibility of archaic introgression. Assuming that interbreeding did occur, Nordborg tested two admixture models to estimate the expected impact of Neanderthal mtDNA sequences on the extant human gene pool. 56 Nordborg showed that, under the implausible scenario that AMH and Neanderthals comprised a single, randomly mating population, the observed mtDNA phylogeny would be highly unlikely. 56 However, when considering a much more realistic model where some Neanderthal individuals were absorbed into a randomly mating modern human population, Nordborg showed that substantial levels of admixture could not be rejected (Box 2, Table 1). 56 Specifically, if the hypothetical ancestral mtDNA pool was 25% Neanderthal, a much higher fraction than has been proposed in the literature, there was still a considerable chance (over 50%) that these archaic lineages would have gone extinct by the present (Table 1). 56 This conclusion was subsequently challenged by Currat and Excoffier who argued that even this admixture model was overly simplistic. 24

BOX 5 Using haplotypes to infer relationships
A haplotype is a specific combination of alleles at loci that lie close together along a chromosome. Because of this physical proximity and linkage, the individual variants composing a haplotype tend to be inherited together. Three distinct haplotypes comprised of six alleles each are depicted above, with the dark bar representing the intervening sequence that is shared between all of them. At each variable position, a haplotype can carry one of two alleles. Along with the variants themselves, the associations between them provide information about demographic history and evolutionary processes. Haplotypes are passed down from parent to offspring with recombination between the parent's chromosomes. Both mutation and recombination affect haplotype patterns in a generation time-dependent manner, making them useful for inferring parameters related to archaic introgression, including the extent and timing of gene flow between groups.
Whenever recombination occurs, it disrupts the continuity of the haplotype. Because recombination occurs at a particular rate per generation, distinct haplotypes are expected to break down steadily over time. Therefore, haplotype length can be used to approximately date introgression events. 139 As shown in the figure above, in the first generation after gene flow, the hybrid offspring would have a full complement of AMH (green) and archaic (purple) chromosomes. With each subsequent generation, pieces of the introgressed chromosome are shuffled by recombination (red arrows) into an AMH genetic background and eroded by successive recombination.
This would eventually lead to individuals in the population carrying their archaic ancestry in small tracts. With archaic genomes, it is possible to identify autosomal haplotypes in modern humans that approximately match either Neanderthals or Denisovans. It is assumed that these extended matching haplotypes entered the human gene pool via archaic introgression; the shorter the shared haplotype, the more recombination has occurred and the older the introgression event.
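The inverse relationship between tract length and time since gene flow can be made concrete with a small calculation. This is a rough sketch under simplifying assumptions that are not stated in the box: a single pulse of introgression, a uniform recombination rate, a small archaic ancestry fraction, and tract lengths that are exponentially distributed with a mean of 1/t Morgans after t generations. The function names, the example tract lengths, and the 29-year generation time are hypothetical values used only for illustration.

```python
def generations_since_admixture(tract_lengths_cm):
    """Estimate generations since a single admixture pulse.

    Assumes the mean tract length equals 1/t Morgans, i.e. 100/t centimorgans.
    """
    if not tract_lengths_cm:
        raise ValueError("at least one tract length is required")
    mean_morgans = sum(tract_lengths_cm) / len(tract_lengths_cm) / 100.0
    return 1.0 / mean_morgans


def expected_tract_length_cm(generations):
    """Expected mean tract length in centimorgans after the given number of generations."""
    return 100.0 / generations


if __name__ == "__main__":
    tracts_cm = [0.04, 0.05, 0.06]  # hypothetical archaic tract lengths
    t = generations_since_admixture(tracts_cm)
    years_per_generation = 29       # assumed generation time
    print(round(t), round(t * years_per_generation))
```

With these inputs the mean tract length is 0.05 cM, which corresponds to roughly 2,000 generations under the stated model. Real analyses also have to account for uncertainty in the recombination map and for the difficulty of detecting very short tracts, both of which this sketch ignores.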
Haplotype divergence is another feature that can be used to estimate the relative age of a genomic segment. Because mutations also occur at a particular rate per generation, the number of nucleotide differences between two haplotypes reflects their evolutionary distance. In the figure above, the colored blocks in the sequence which are not yellow represent only the variable positions of the haplotype. Some of these haplotypes are passed down to the next generation with the occasional mutation. With each generation, the diversity of the set increases as the haplotypes become more different from each other. Haplotypes within the two populations are more similar than the ones between populations. AMH carry some haplotypes that are unusually diverged from the rest, given our relatively recent origin. Archaic introgression is often invoked to explain this pattern, since gene flow will carry haplotypes from one population into another. Additionally, the worldwide pattern of haplotype variation can support an introgression hypothesis for a particular locus. For example, given the geographic range of Neanderthals, it is unlikely that the ancestors of sub-Saharan Africans would have interbred with them. Therefore, a highly diverged haplotype that is common in Europeans and is highly uncommon in sub-Saharan Africans is consistent with being of Neanderthal origin. In cases where genomic data from the hypothesized archaic source exists, it is also possible to compare their sequences and determine if the haplotypes are closely related. However, the extreme lack of representation of modern Africans in genetic databases may be biasing this view; basal haplotypes that are assumed to be absent in Africa may simply be unsampled there as of yet.

A common way that haplotypes are represented is through haplotype networks, illustrated above. The nodes represent groups of haplotypes that are identical, and their sizes are proportional to the number of haplotypes they contain. The edge lengths represent the number of genetic differences from that node to the next most closely related one. The nodes are usually colored by the population that the haplotype was sampled from. For example, the leftmost network has a large node that contains multiple colors, representing a single haplotype that is shared across populations 1-4. The network in the middle shows a locus where haplotypes are highly population specific. All the haplotypes have a common origin, which is carried mostly by individuals belonging to population 1, and to a much lesser extent, populations 2 and 4. The rightmost network shows a haplotype that has a high degree of differentiation among samples, with many unique haplotypes that are only slightly different from the haplotype of origin.

result in even a small number of hybridization events having a disproportionate impact on the AMH gene pool. 24 Therefore, they interpreted the absence of Neanderthal mtDNA lineages in extant humans as strong evidence that interbreeding between archaic and modern humans did not occur. 24 Further studies explored additional demographic scenarios, each based on different models and assumptions, and inferring different values of m (Table 1). The availability of the nuclear genome provided new fodder to explore this topic. Initial analyses reporting an f of 1-4% seemed to demonstrate significant non-zero levels of migration from Neanderthals into AMH populations. 34 In light of the Neanderthal nuclear data, Currat and Excoffier revisited their spatially explicit models, and found that a hybridization rate of less than 2% was compatible with the estimated levels of Neanderthal ancestry in modern humans, and concluded that the new observations were still compatible with strong reproductive isolation between Neanderthal and AMH populations and a complete lack of mtDNA sharing. 57 In order to estimate m from whole genome data, Kuhlwilm et al. applied a Bayesian method to neutral stretches of sequence throughout the genome. 58 By targeting regions that were less likely to be affected by natural selection, Kuhlwilm et al. estimated the initial migration fraction of Neanderthals into non-Africans to be 0.3-2.6% (Table 2). 58 Harris and Nielsen, however, argued that neglecting to account for hybrid fitness could provide a skewed picture of patterns of Neanderthal ancestry in modern human genomes. Using simulations, they demonstrated that if Neanderthal-modern human hybrids exhibited higher fitness than modern humans, the average fraction of Neanderthal ancestry in modern humans could increase from an initial 1% to the currently observed approximately 3% within 500 generations (~15 ka) after introgression. 59 However, if Neanderthal-human hybrids exhibited depressed fitness, an initial admixture fraction of 10% is compatible with current observations. 59 Since the true fitness effects of Neanderthal variation on a modern human genetic background are not known, both scenarios and initial admixture fractions are plausible. Therefore, the persistence of a few fundamental uncertainties means that the initial level of gene flow between archaic and anatomically modern humans still cannot be known.

| WHERE AND WHEN DID ARCHAIC INTROGRESSION OCCUR?
Morphological arguments for admixture have long been made by paleoanthropologists, particularly those espousing regional continuity between AMH and preceding taxa. For example, Erik

Figure 2. (a) A structured ancestral population is comprised of two distinct ancestries (blue and orange) in distinct demes (dashed circles) that give rise to new demes over time. The two leftmost demes eventually give rise to AMHs, but one of them shares more ancestry with the deme that eventually gives rise to Neandertals and Denisovans. Due to recombination over generations, this ancestry is carried in the second AMH population in short tracts that are highly divergent in sequence from the blue ancestry carried by the first AMH population. This pattern occurs without needing to invoke post-population-split introgression from the archaic hominin. (b) Different gene lineages within individuals and populations can have different evolutionary histories. A concordant gene lineage is one that conforms to the topology of the overall population tree. However, depending on the depth of divergence between the groups and the size of the ancestral population, some fraction of these lineages is expected to be affected by ILS. (c) Balancing selection can maintain highly diverged variants (blue and gray) of a specific genetic trait within a population (dashed circle) on long haplotypes over evolutionary time. Alternatively, if there is no selection acting to maintain variation at a locus, a long, highly diverged tract of ancestry could come from an archaic source. Recent introgression (red arrow) could bring this diverged ancestry into an AMH population, where it would lie on a long ancestral tract because relatively few generations of recombination have occurred. (d) A reference sequence (top) is used to align ancient archaic reads (green) from a sequencing experiment to recover the full sequence. Ancient DNA reads are typically short and contain a relatively high proportion of mismatches, either due to damage or diverged ancestry, compared with the reference. Observed C to T mutations (red) are due to a common form of DNA damage. Real mismatches (blue) can also occur because the archaic individual is usually substantially diverged from the reference, which is based on modern humans. Contaminant sequences from modern humans (orange), even if rare, can be favored by mapping algorithms because those fragments are longer and are more similar to the reference sequence. This leads to a reference-biased consensus sequence (bottom).

Several directly dated fossil AMH which also carry Neanderthal ancestry suggest that gene flow must have occurred prior to 35 ka.
Two AMH individuals, "Kostenki 14" dated to 36-38 ka in Russia and "Ust'-Ishim" dated to 45 ka in western Siberia, carry Neanderthal haplotypes that are longer than the modern human average. 64

was not negligible. Using a new method for identifying Neanderthal sequence without relying on a "non-introgressed" reference population, the authors found that 0.3% of sub-Saharan African genomes was shared with Neanderthals. 70 Importantly, they found that this was not due to primary admixture between the ancestors of modern sub-Saharan African populations and Neanderthals. Rather, this higher-than-expected level of Neanderthal sharing was driven by a combination of AMH migration back to Africa and introgression from earlier AMH out-of-Africa migrants into Neanderthals prior to their extinction. 70 In the latter scenario, the sequence sharing between Neanderthals and modern Africans can be explained by ancestral variation shared between the earlier out-of-Africa population, which admixed with Neanderthals, and the ancestral African population. 70 It is unclear whether, and by how much, estimates of f will have to be revised in light of these findings (Table 2).

Neanderthal admixture event, noting minimal variation in Neanderthal haplotypes across all modern non-African populations. 37 Chen et al. find that if the Neanderthal ancestry in modern Africans was introduced primarily by back-to-Africa migration from ancestral Europeans, levels of Neanderthal ancestry in Europeans would be systematically underestimated relative to East Asians. 70 Therefore, by accounting for the Neanderthal ancestry in modern Africans, Chen et al. find the discrepancy between estimates of f in Europeans and East Asians to be greatly reduced (Table 2). 70

Inferring the timing and location of Denisovan introgression is an even more challenging problem given that there is little physical evidence of their presence, and genetic data from only a single cave in

approximately 2%, implying strong selection against archaic genetic elements. 82 However, a recent re-analysis of the data shows that the observed decline in f was an artifact of the statistic used in the original article, which failed to account for recent gene flow between modern human populations. 83 Using an updated version of the statistic, the authors of the re-analysis showed that the Neanderthal fraction in AMH has remained relatively steady at approximately 2.5% for over 40,000 years. 83

With aDNA sequencing, however, these "human specific" variants were thrown into doubt when Neanderthals (and later, Denisovans) were shown to carry the same alleles as modern humans. 93 Furthermore, several introgression studies found that this genomic region is notable for its lack of Neanderthal or Denisovan ancestry in modern humans. 70 (Table 1). Therefore, when reading the archaic genomics literature, it is important to pay careful attention to the assumptions made, assess whether these are reasonable, and to con-

| Ancestral population structure (non-random mating)
In studies of archaic admixture, ancestral human populations are often modeled as panmictic; that is, every member of the population chooses a mate at random from among all others in the population. In reality, a multitude of factors (e.g., geography, language, and culture) structure populations such that certain pairings of individuals are much more likely than others. There is strong evidence from AMH morphology to suggest that the ancestral population was structured within Africa. 96-101 A potential consequence of such structuring is that certain groups of modern humans might share more genetic variants with archaic hominins than others, even in the absence of recent introgression.
Neanderthals, for example, could have split from the same ancestral subpopulation that later also gave rise to all non-African AMH populations (Figure 2a). Under this scenario, the observed excess of variants shared between out-of-Africa individuals and Neanderthals would be due to ancient sharing of genetic lineages through population structure that persisted over time.
The authors of an early Neanderthal genome study pointed out that they could not distinguish between ancestral population structure and archaic introgression. 34,102 Indeed, Eriksson and Manica 103 demonstrated that spatial structure in the ancestral hominin population could produce values of the D-statistic comparable with those obtained by Green et al., which had been interpreted as evidence for archaic admixture. 34 The degree to which ancestral population structure is responsible for the observed patterns of archaic ancestry is still debated. 104-107 The presence of long tracts of archaic ancestry in extant non-African humans is the most convincing demonstration that the genetic similarity between non-Africans and archaic hominins is driven by a recent introgression event and not by ancestral population structure. 55 However, the process of identifying regions that are shared between archaic and modern human genomes can be computationally challenging, with smaller (and older) tracts being more difficult to detect. It is, therefore, possible that ancestral population structure accounts for a significant proportion of the signal attributed to introgression.
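To make the D-statistic concrete, the following minimal Python sketch counts the two discordant site patterns on which it is based. It assumes the common ABBA-BABA formulation, D = (nABBA - nBABA) / (nABBA + nBABA), with the outgroup allele taken as ancestral; the function name, the input layout, and the toy alleles are hypothetical and are not taken from any of the studies cited here.

```python
def d_statistic(sites):
    """Compute an ABBA-BABA D-statistic from per-site alleles.

    `sites` is an iterable of 4-tuples (h1, h2, h3, outgroup), one allele per
    lineage, e.g. h1 = African, h2 = non-African, h3 = archaic, outgroup = chimpanzee.
    The outgroup allele is treated as ancestral ("A"); any other allele is derived ("B").
    Returns (D, n_abba, n_baba).
    """
    n_abba = 0
    n_baba = 0
    for h1, h2, h3, out in sites:
        if h3 == out:
            # The archaic lineage carries the ancestral allele: uninformative site.
            continue
        if h1 == out and h2 == h3:
            n_abba += 1  # pattern A B B A: h2 shares the derived allele with h3
        elif h2 == out and h1 == h3:
            n_baba += 1  # pattern B A B A: h1 shares the derived allele with h3
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (n_abba - n_baba) / total, n_abba, n_baba


# Toy data (invented for illustration): 6 ABBA sites, 4 BABA sites,
# plus sites that are uninformative for D.
toy_sites = (
    [("A", "G", "G", "A")] * 6      # ABBA
    + [("G", "A", "G", "A")] * 4    # BABA
    + [("A", "A", "A", "A")] * 10   # no derived allele
    + [("G", "G", "G", "A")] * 5    # derived allele shared by all three
)
d_value, abba, baba = d_statistic(toy_sites)
print(f"ABBA={abba} BABA={baba} D={d_value:.2f}")
```

Under this sign convention a positive D means that the second lineage shares more derived alleles with the archaic lineage than the first does. As the text emphasizes, such an excess is expected under recent introgression but can also arise from ancestral population structure, so the value of D alone does not discriminate between the two.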

| Incomplete lineage sorting
Incomplete lineage sorting (ILS) refers to a discrepancy between the relationships among populations (or species) and the relationships among genetic lineages. ILS and ancestral population structure are distinct concepts that can create similar patterns in genomic data. Over evolutionary time, both populations and genetic lineages generate trees through splitting and divergence (Figure 2b). In cases of relatively recently separated groups, such as Denisovans, Neanderthals, and AMH, a genetic lineage found in one individual can sometimes share its most recent common ancestor with an individual from another group, even if each group is panmictic. 107 Therefore, some proportion of genetic lineages will be more recently shared between a particular human population and an archaic group by chance; this probability increases with the size of the ancestral population. As with ancestral population structure, the age of the shared variation is a distinguishing factor; if the archaic variant is on a long human haplotype, this is more indicative of recent admixture. Unlike ancestral population structure, however, ILS is not expected to generate more overall archaic sharing with one modern human group than with another. Therefore, when looking across the entire genome, as in the D-test, the effect of ILS would theoretically be averaged out.
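The two quantitative points in this paragraph can be sketched numerically. The Python example below uses two textbook approximations that are assumptions of this sketch rather than results from the studies cited here: the neutral three-population coalescent expectation that a gene tree disagrees with the population tree with probability (2/3)exp(-T/2N), where T is the number of generations between the two population splits and N is the effective size of the ancestral population in that interval, and the first-order expectation that a shared tract has a mean length of 1/(rt) base pairs after t generations of recombination at rate r per base pair. All function names and parameter values are hypothetical.

```python
import math


def ils_discordance_probability(generations_between_splits, ancestral_ne):
    """Probability that a gene tree is discordant with the population tree.

    Neutral coalescent approximation for three populations:
    P = (2/3) * exp(-T / (2 * N)), with T in generations and N the effective
    size of the ancestral population between the two splits.
    """
    t = generations_between_splits
    return (2.0 / 3.0) * math.exp(-t / (2.0 * ancestral_ne))


def expected_tract_length_bp(generations, recomb_rate_per_bp=1e-8):
    """Mean length (bp) of an unbroken shared tract after `generations`.

    First-order approximation 1 / (r * t); it ignores the admixture
    proportion and variation in recombination rate along the genome.
    """
    return 1.0 / (recomb_rate_per_bp * generations)


# Illustrative values only: a larger ancestral population yields more ILS.
for ne in (5_000, 10_000, 20_000):
    p = ils_discordance_probability(10_000, ne)
    print(f"ancestral Ne={ne:>6}: P(discordant) = {p:.3f}")

# Illustrative values only: recent sharing leaves long tracts, old sharing short ones.
for label, gens in (("recent admixture", 2_000), ("ancient shared ancestry", 20_000)):
    print(f"{label}: mean tract = {expected_tract_length_bp(gens):,.0f} bp")
```

With these illustrative inputs, variation shared through an event 2,000 generations ago sits on tracts roughly ten times longer than variation shared 20,000 generations ago, which is the logic behind using haplotype length to separate recent admixture from ILS.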

| Balancing selection
Balancing selection is a type of natural selection that maintains more than one haplotype in a population at intermediate frequencies over  (Figure 2c). Specifically, longer-than-expected haplotypes can persist when there are epistatic interactions between polymorphisms along their length, that is, when the fitness of an allele depends on the presence of another allele some distance away. 110 As a safeguard, regions that encode genes, and therefore might have been affected by balancing selection, are often excluded from analysis. 40 However, this filtering greatly reduces the power to identify biologically consequential cases of archaic introgression. Furthermore, it is difficult in practice to conclusively determine that a given region is not, or has never been, under balancing selection, even if it is not near any genes. The possibility of balancing selection should therefore always be considered when studies purport to find evidence of adaptive introgression at a particular locus.

| Contamination
Contamination remains a problem in aDNA studies. Small amounts of modern contamination in archaic sequencing experiments can "modernize" ancient individuals, leading to incorrect inferences of population history and archaic admixture 28-30 (Figure 2d). aDNA studies should always explicitly address the measures that were taken, both in handling and extracting the sample in the lab and in processing the sequence data, to measure and mitigate the effects of contamination. 111
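The "modernizing" effect can be illustrated with a simple linear mixing model. This model is an assumption made here for illustration and is not the estimation procedure of any cited study: a fraction c of reads is taken to come from a modern contaminant, so any per-site rate measured from the library (for example, the rate at which reads match the modern human allele) is a weighted average of the endogenous and contaminant rates. The function names and numbers in the Python sketch below are hypothetical.

```python
def observed_rate(endogenous_rate, contaminant_rate, contamination):
    """Rate observed in a library in which a fraction `contamination` of reads
    comes from a modern contaminant (simple linear mixing model)."""
    if not 0.0 <= contamination <= 1.0:
        raise ValueError("contamination must be between 0 and 1")
    return (1.0 - contamination) * endogenous_rate + contamination * contaminant_rate


def corrected_rate(observed, contaminant_rate, contamination):
    """Invert the mixing model to recover the endogenous rate, given an
    independent estimate of the contamination fraction."""
    if not 0.0 <= contamination < 1.0:
        raise ValueError("contamination must be in [0, 1)")
    return (observed - contamination * contaminant_rate) / (1.0 - contamination)


# Hypothetical example: the archaic individual truly matches the modern human
# allele at 90% of informative sites, and contaminant reads always match.
true_match = 0.90
for c in (0.0, 0.01, 0.05, 0.20):
    obs = observed_rate(true_match, 1.0, c)
    rec = corrected_rate(obs, 1.0, c)
    print(f"contamination={c:.2f}: observed match={obs:.3f}, corrected={rec:.3f}")
```

Even under this simplified model, the apparent similarity to modern humans rises steadily with the contamination fraction, and the correction is only as good as the independent estimate of that fraction, which is one reason explicit reporting of contamination estimates matters.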

| Reference bias
When using a modern human reference to assemble genomes of highly diverged individuals, reference bias (or "mapping bias") can occur. Reads in the sequencing library that are more similar to the reference are more likely to map, and thus be included in subsequent analyses (Figure 2d). 112 Additionally, ancient fragments can be more difficult to map to a modern human reference because of sequence differences that are real (due to divergence) and/or artificial (due to DNA damage) (Figure 2d). This type of bias can also cause archaic genomes to look artificially similar to modern human genomes. 112

| Ghost admixture
Recent evidence has highlighted the importance in hominin evolutionary history of ghost admixture, that is, introgression from populations for which there is neither a descendant group nor fossil evidence. Certain features of the available genetic data from archaic and modern humans are best fit by population genetic models that include introgression events with as yet unidentified groups. 78,113 Developing statistical methods to better detect the genetic signatures of introgression from ghost populations, for which there is by definition no reference genome, continues to be an active area of research. 41,76,114,115 Ghost admixture introduces complexities to population genetic models that are typically unaccounted for, especially in earlier studies of archaic introgression.
Rogers and Bohlender showed that estimators of f based on pairwise allele counts (such as the ratio of S-statistics) are prone to bias when introgression from ghost populations has occurred. 113 The severity of this bias depends on how deeply diverged the populations in question are from each other. 113 Rogers and Bohlender also found that different count-based estimators of the Denisovan contribution to Melanesians, based on a model of a single pulse of Denisovan introgression, are inconsistent with each other. 113 They speculated that this may be due to a misspecification of the underlying demographic model. 113 Indeed, while early studies assumed a single introgression event in the ancestors of Melanesians, subsequent research has found evidence of multiple events from different Denisovan or Denisovan-like populations 37,76,77,116 (see section on "Where and when did archaic introgression occur?").

| CONCLUSIONS AND FUTURE DIRECTIONS
The field of aDNA and archaic introgression continues to expand rapidly as new specimens are sequenced and novel laboratory and analytical techniques are developed. However, in the midst of these exciting advances, the statistical methods employed across studies often remain difficult for non-specialists to understand and evaluate. 117 In exploring the ever-burgeoning archaic admixture literature, it is prudent to pay careful attention to the details of these statistical tests, which are often relatively new and have been developed to accommodate the peculiarities and limitations of ancient data.
Readers should always carefully note which assumptions are being made by the researchers, ask whether these assumptions are reasonable, and consider the consequences of violating them for the overall conclusions. Alternative explanations for the observed patterns, some of which are outlined in this review, are often inadequately explored.
Given the sheer quantity of discoveries being made each year, it has not been possible to cover all interesting facets of ancient introgression. Other recent reviews take complementary anthropological perspectives and dive deeper into many of the topics raised here. 46,118,119 It will remain important that geneticists and paleoanthropologists continue to critically engage with, and evaluate, the findings of archaic introgression studies. In doing so, future multidisciplinary research will hopefully be able to address outstanding questions in the field, such as: What are the phenotypic effects of  (Table 2). Finally, we anticipate that ever larger and more diverse human genomic reference databases will enable the evaluation of more sophisticated hypotheses about how archaic admixture has impacted historically understudied populations in Africa, Asia, and the Americas.