The genomic origin of Zana of Abkhazia

Abstract Enigmatic phenomena have sparked the imagination of people around the globe into creating folkloric creatures. One prime example is Zana of Abkhazia (South Caucasus), a well‐documented 19th century female who was captured living wild in the forest. Zana's appearance was sufficiently unusual that she was referred to by locals as an Almasty, the analog of Bigfoot in the Caucasus. Although the exact location of Zana's burial site was unknown, the grave of her son, Khwit, was identified in 1971. The genomes of Khwit and the alleged Zana skeleton were sequenced to an average depth of ca. 3× using ancient DNA techniques. The identical mtDNA and parent‐offspring relationship between the two indicated that the unknown woman was indeed Zana. Population genomic analyses demonstrated that Zana's immediate genetic ancestry can likely be traced to present‐day East‐African populations. We speculate that Zana might have had a genetic disorder such as congenital generalized hypertrichosis, which could partially explain her strange behavior, lack of speech, and long body hair. Our findings elucidate Zana's unfortunate story and provide a clear example of how prejudices of the time led to notions of cryptic hominids that are still held and transmitted by some today.


| INTRODUCTION
The local folklore of the South Caucasus region of Abkhazia records a "wild woman" named Zana, who lived in the 19th century and was referred to by some locals as a female Abnauayu or Almasty: names for a creature, similar to the infamous Yeti of the Himalayas and Bigfoot of North America, that supposedly lives in the Caucasus and Central Asia. 1,2 Originally captured while living outdoors in the forest, Zana was later enslaved by a succession of local wealthy individuals, and was finally bought by the Abkhaz nobleman Edgi Genaba, who took her to his estate at Tkhina, where she lived until her death around 1890. 3 Inspired by the speculation that she might have been a female Yeti, Soviet scientists visited the region in 1962 to gather descriptions and accounts from the elders living in the village of Tkhina who still recalled her. The locals described her as being "part human and part animal," 2 m tall, dark-skinned, and covered with thick hair, and as able to lift a 50 kg sack of flour with one hand and outrun a horse in a race. 1,3 According to the eyewitness accounts she also lacked speech, which, along with her alleged strange behavior and appearance, likely resulted in her reputation as an Almasty. Zana is also documented to have given birth to two sons and two daughters fathered by local men. Following her death, she was buried in the Genaba family cemetery, and although the exact location of Zana's burial site was unknown, the grave of her youngest son, Khwit, was identified in 1971. After several attempts to locate Zana's burial site, the remains of an anonymous female were discovered in the Genaba family cemetery, leading to speculation that they may have belonged to Zana herself. 4,5 To overcome limitations of a previous DNA analysis and the possible ambiguities of craniometric studies, we sequenced the genomes of both the unknown female and Khwit, to 3.1- and 3.3-fold coverage, respectively.
We performed genome-wide analysis to explore their genetic ancestry and kinship relations, which allowed us to shed light on Zana's story based on objective genome-wide data.

| MATERIAL AND METHODS
See the Supplementary Methods for a more in-depth description of the materials and methods used in this study.

| Data generation and bioinformatics analysis
All ancient DNA (aDNA) laboratory work was carried out in dedicated clean laboratory facilities at the GLOBE Institute, University of Copenhagen, according to aDNA standards described elsewhere. 6 We used teeth and petrous bones to extract ancient DNA from both individuals and built double-stranded BGISeq libraries following the BEST protocol, using adapters compatible with BGI sequencing according to Mak et al. 2017. 7 The amplified libraries were sequenced on two lanes of the BGISEQ-500 platform.
We used the BAM workflow implemented in the PALEOMIX pipeline 8 to trim and map the sequencing reads against the human reference genome build GRCh37 and the revised Cambridge reference sequence (rCRS, NCBI accession number NC_012920.1).
We used mapDamage v2.0 to get the read length distribution and approximate Bayesian estimates of damage parameters. 9 To estimate the levels of contamination in the ancient samples we used contamMix 10 and the X-chromosome based contamination method implemented in ANGSD. 11 The sex of each individual was assigned according to the Rγ estimates described elsewhere. 12
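The sex-assignment statistic is not defined in the text. The following is a minimal sketch, assuming it is the ratio of reads mapped to the Y chromosome to reads mapped to either sex chromosome (often written R_y), with a normal-approximation 95% confidence interval. The function name, the read counts, and the two decision thresholds (0.016 and 0.075) are illustrative assumptions based on common usage for this statistic, not values reported in this study.

```python
import math


def ry_sex(n_x, n_y, female_max=0.016, male_min=0.075, z=1.96):
    """Assign chromosomal sex from counts of reads mapped to X and Y.

    The thresholds are assumptions (commonly used defaults), not values
    taken from this study.
    """
    total = n_x + n_y
    if total == 0:
        raise ValueError("no reads mapped to the sex chromosomes")
    ry = n_y / total
    # Binomial standard error of the proportion of Y reads.
    se = math.sqrt(ry * (1.0 - ry) / total)
    lower, upper = ry - z * se, ry + z * se
    if upper < female_max:
        call = "XX"
    elif lower > male_min:
        call = "XY"
    else:
        call = "undetermined"
    return ry, (lower, upper), call


# Hypothetical read counts, for illustration only.
print(ry_sex(100000, 300)[2])    # very few Y reads
print(ry_sex(90000, 10000)[2])   # substantial Y fraction
print(ry_sex(50, 2)[2])          # too few reads for a confident call
```

Requiring the whole confidence interval to fall on one side of a threshold, rather than the point estimate alone, keeps low-coverage samples from receiving an overconfident call.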
A phylogenetic network analysis of complete mtDNA sequences including that of "Zana", Khwit and other L2 haplogroup sequences 14 (n = 93) was conducted with POPART 15 using the "Median Joining Network" algorithm. We used BEAST v2.6.1 16 to estimate the divergence time of "Zana's" mtDNA lineage using the Bayesian skyline plot (BSP) method. The tree height was calibrated based on the previous work by Silva et al. 14 The kinship coefficients were calculated by first generating a site allele frequency likelihood (saf) file in ANGSD (http://www.popgen.dk/software/index.php/IBSrelate), followed by estimation of the IBS sharing matrix based on the two-dimensional site frequency spectrum (2D-SFS) from realSFS implemented in ANGSD. 11
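How a two-individual 2D-SFS translates into relatedness estimates can be sketched as follows, assuming the statistics of interest are the R0, R1 and KING-robust kinship ratios used by the IBSrelate approach. The 2D-SFS is taken as nine expected counts, one for each pair of genotypes (0, 1 or 2 copies of an allele) in the two individuals, flattened row by row. The function name and the example counts are hypothetical.

```python
def ibs_relate(sfs2d):
    """Compute R0, R1 and KING-robust kinship from a flattened 3x3 2D-SFS.

    sfs2d holds nine counts in row-major order: rows index the genotype of
    individual 1 (0, 1, 2 allele copies), columns that of individual 2.
    """
    if len(sfs2d) != 9:
        raise ValueError("expected nine 2D-SFS categories")
    a, b, c, d, e, f, g, h, i = (float(x) for x in sfs2d)
    ibs0 = c + g          # opposite homozygotes: no allele shared
    ibs1 = b + d + f + h  # one heterozygote, one homozygote
    hethet = e            # both individuals heterozygous
    if hethet == 0 or ibs0 + ibs1 == 0:
        raise ValueError("not enough informative sites")
    r0 = ibs0 / hethet
    r1 = hethet / (ibs0 + ibs1)
    king = (hethet - 2.0 * ibs0) / (ibs1 + 2.0 * hethet)
    return r0, r1, king


# Made-up counts with no opposite homozygotes, the pattern expected for a
# parent-offspring pair, who share one allele at every site.
r0, r1, king = ibs_relate([500, 100, 0, 100, 200, 100, 0, 100, 500])
print(r0, r1, king)
```

With no opposite-homozygote sites, R0 is zero, which is the feature that separates parent-offspring pairs from full siblings at a similar kinship value.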

| Population genetics
To assess the genetic relationship between "Zana," Khwit, and other populations, we merged the shotgun sequencing data from the historical individuals with the Affymetrix Human Origins SNP array panel of worldwide populations. 17,18 We also included data from three archaic humans (two Neanderthals 19 and the Denisovan 20) and chimpanzee genotypes, as well as genomes from two Caucasus hunter-gatherer (CHG) individuals (the ca. 13 300-year-old SATP specimen and the ca. 9700-year-old KK1 specimen) originally excavated in the South Caucasus, 21 for comparison.
We conducted unsupervised maximum likelihood-based clustering analysis with ADMIXTURE 22 after pruning the data set for linkage disequilibrium using plink v1.9. 23 The program pong 24 was used to identify and visualize the best run for each K and similar components between different Ks.
The principal components analysis (PCA) was performed using plink v1.9 with the ancient genomes projected onto the modern variation. The first 30 eigenvectors of PCA were used as input for the uniform manifold approximation and projection (UMAP) analysis using the "umap" R package.
D-statistics estimates were calculated using ADMIXTOOLS 25 and the R package "admixr." 26 The maximum likelihood phylogenetic tree of "Zana" and the African populations was constructed with TreeMix. 27
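As an illustration of what a D-statistic measures, the sketch below implements the standard allele-frequency form for four populations W, X, Y and Z: the numerator sums (w - x)(y - z) over SNPs and the denominator sums (w + x - 2wx)(y + z - 2yz). It is not the ADMIXTOOLS implementation; it omits the block jackknife used to obtain standard errors and Z-scores, and the function name and toy data are hypothetical.

```python
def d_statistic(freqs):
    """Return D(W, X; Y, Z) from per-SNP allele frequencies.

    freqs is an iterable of (w, x, y, z) tuples, each value being the
    frequency of the same reference allele in one population.
    """
    num = 0.0
    den = 0.0
    for w, x, y, z in freqs:
        num += (w - x) * (y - z)
        den += (w + x - 2.0 * w * x) * (y + z - 2.0 * y * z)
    if den == 0.0:
        raise ValueError("no informative sites")
    return num / den


# Toy haploid data: three sites where W and Y share an allele, one site
# where W and Z share an allele, and five uninformative sites.
sites = [(1, 0, 1, 0)] * 3 + [(1, 0, 0, 1)] + [(1, 1, 0, 0)] * 5
print(d_statistic(sites))  # 0.5
```

A value near zero is expected when W and X are equally related to Y and Z; a consistent excess of one sharing pattern shifts D away from zero, and its significance would be judged from a jackknife standard error in a real analysis.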

| RESULTS AND DISCUSSION
A total of 1 219 599 801 BGISeq sequences were generated from the teeth and petrous bones of both individuals (see Mapping Statistics in Table S1). As expected from the results of previous studies, 28 the endogenous content was much higher in the petrous bones ("Zana": 41.95%, Khwit: 33.93%) than in the teeth samples ("Zana": 1.16%, Khwit: 12.7%). We used the data to obtain the genomes at an average sequencing depth of coverage of 3.1× and 3.3× for "Zana" and Khwit, respectively. The sequences show typical ancient DNA damage profiles and short DNA fragment lengths. 29 The contamination estimates based on mtDNA (for both) and X chromosome (only for Khwit) were less than 1%, and the chromosomal sexes of the individuals matched their anthropological descriptions: Khwit as male and "Zana" as female.

| Uniparental markers
The mtDNA sequences were identical for both individuals, consistent with the hypothetical mother-son relationship, and could be assigned to haplogroup L2b1b, the parental haplogroup of which (L2b clade) is widely distributed in western Africa, 14,30 but is also found across Africa. 31 The same haplogroup was identified by an earlier, independent analysis of teeth samples from Khwit and "Zana" at the Southern Research Institute based on human DNA enriched libraries (unpublished data).
Khwit's Y-chromosomal lineage belongs to haplogroup R1b1a1b1b, which clearly reflects his non-African paternal heritage. This is a sub-haplogroup of the major haplogroup R1b1a1b, defined by the M269 mutation, which is encountered at high frequencies in Europe and western Asia. 32-34 Although an earlier study that analyzed Khwit's mtDNA sequence revealed "Zana's" maternal origin as African, 35 the authors suggested that "Zana" could have belonged to an ancient African lineage, likely due to the lack of a suitable comprehensive comparative data set at that time. We therefore conducted a mitochondrial network analysis to assess the relationship of the putative "Zana" sample's maternal lineage with other (n = 93) human L2 haplogroup sequences, 14 and found that it clusters together with other individuals of the L2b lineage, as expected (Figure 1).
FIGURE 1 Median-joining network. The analysis of mtDNA sequences of "Zana" and Khwit alongside 93 complete mtDNA sequences from the human L2 mitochondrial clade using the "Median-Joining" algorithm implemented in PopArt. Each circle represents a certain haplotype; smaller black circles indicate median vectors. Small black lines connecting branches between the haplotypes denote the number of mutation steps separating the haplotypes. Since the mtDNA sequences of "Zana" and Khwit are identical, only Zana's mtDNA haplotype is mentioned in the plot.
Silva et al. have estimated that mitochondrial clade L2b likely originated ca. 24 kya, 14 thus our mtDNA assignment can be used to reject the hypothesis that the hypothetical "Zana" sample had an ancient or archaic origin. In order to obtain an approximate estimate of the time to the most recent common ancestor (TMRCA) of the maternal lineage of "Zana" and its sister groups, we ran BEAST based only on L2b sequences (n = 57) ( Figure S3). With the limited number of L2b1b sequences (n = 4) in the data set, the divergence time was estimated to be ca. 9800 ya (3515-13 000; 95% highest posterior density intervals).

| Kinship
We next tested whether the two individuals were directly related using PC-relate as implemented in PCAngsd. 36

| Genetic Affinities
To further assess the genetic relationship between Zana, Khwit and various worldwide populations using nuclear genome variation, we ran a PCA based on the Human Origins (HO) panel. To aid visualisation, we reduced the total number of reference populations in the panel to represent the major genetic lineages of the world. However, given its geographic relevance, we included relatively more populations from the Caucasus, and given previous hypotheses that Zana may have had some archaic hominid ancestry, we also included genome-wide data from three archaic humans, and used the chimpanzee as an outgroup. Additionally, we included the two Mesolithic hunter-gatherers (SATP and KK1) from the South Caucasus for comparison.
The results clearly show that Zana is genetically close to neither the archaic humans nor the chimpanzee, but clusters closely with modern humans. Unsupervised clustering analysis using ADMIXTURE also clearly rejects any hypothesis that Zana was of "nonhuman" origin, for example as suggested by various sources. 1,2 Rather, it is clear that she shared genetic ancestry with present-day western and eastern African populations (Figure 2B). To explore this African origin further, we conducted additional PCA and admixture analyses based solely on African groups from the HO panel (Figure 3A and Figure S4). Here again, Zana shows ancestry components from the eastern (eg, Dinka) and western (eg, Yoruba) African groups, with no significant genetic contribution from southern, northern, and central African populations. We were unable, however, to resolve whether she (a) was an individual derived from admixture between a Dinka-like and a Yoruba-like population (purple and pink components in Figure 3A) or (b) originated solely from eastern African groups such as Luhya and Luo.
To estimate Zana's ancestry proportion more accurately, we conducted the admixture analysis in "supervised" mode based on 13 African populations (Figure 3B).
The contemporary reports and subsequent tales of Zana's wildness were at least partially based on some of her unusual physical characteristics, such as the lack of speech, intellectual disability, and long hair covering her whole body, to name a few. With the genomic data clearly rejecting all nonhuman hypotheses, we speculate that, if these descriptions of her physical characteristics are accurate, she may have had a rare human genetic disorder such as congenital generalised hypertrichosis: a syndrome with dysmorphic facial features, intellectual disability, and hypertrichosis. 39

| CONCLUSIONS
Our results prove that the unknown female buried in the Genaba family cemetery was Zana herself. In contrast to the speculations that she might have been a female Almasty, we provide definitive genome-wide data to put an end to the accounts of her as anything but a human woman.
Zana was likely of eastern African descent, although we cannot rule out partial western African ancestry. We hypothesise that her lineage could have arrived in the territory of present-day Abkhazia (South Caucasus) as a result of the slave trade practiced by the Ottoman Empire between the 16th and 19th centuries CE. Lastly, we speculate that it was simply her unfamiliar individual physical characteristics (such as unusual behavior, physical strength, tall stature, lack of recognisable speech and hypertrichosis) and the subsequent rumors over generations that fueled the myth of a non-human origin.

CONFLICT OF INTEREST
The authors declare no competing interests.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1002/ggn2.10051 and in Supplementary File 3.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in the European Nucleotide Archive (ENA) at https://www.ebi.ac.uk/ena/, reference number PRJEB45032.