Ethiopian indigenous goats offer insights into past and recent demographic dynamics and local adaptation in sub‐Saharan African goats

Abstract Knowledge on how adaptive evolution and human socio‐cultural and economic interests shaped livestock genomes particularly in sub‐Saharan Africa remains limited. Ethiopia is in a geographic region that has been critical in the history of African agriculture with ancient and diverse human ethnicity and bio‐climatic conditions. Using 52K genome‐wide data analysed in 646 individuals from 13 Ethiopian indigenous goat populations, we observed high levels of genetic variation. Although runs of homozygosity (ROH) were ubiquitous genome‐wide, there were clear differences in patterns of ROH length and abundance and in effective population sizes illustrating differences in genome homozygosity, evolutionary history, and management. Phylogenetic analysis incorporating patterns of genetic differentiation and gene flow with ancestry modelling highlighted past and recent intermixing and possible two deep ancient genetic ancestries that could have been brought by humans with the first introduction of goats in Africa. We observed four strong selection signatures that were specific to Arsi‐Bale and Nubian goats. These signatures overlapped genomic regions with genes associated with morphological, adaptation, reproduction and production traits due possibly to selection under environmental constraints and/or human preferences. The regions also overlapped uncharacterized genes, calling for a comprehensive annotation of the goat genome. Our results provide insights into mechanisms leading to genome variation and differentiation in sub‐Saharan Africa indigenous goats.


| INTRODUC TI ON
Threats to biodiversity, and projected future food demands and climatic conditions, underscore the urgent need to characterize, monitor and maintain agricultural biodiversity. Data on molecular changes have been critical in understanding genome architecture, maintaining biodiversity and fitness and exploring impacts of genome evolution. This has been made possible through the investigation of genetic diversity and structure, demographic dynamics (Bosse et al., 2012) and assessment of autozygosity under varying degrees of reproductive isolation and inbreeding (Ceballos, Joshi, Clark, Ramsay, & Wilson, 2018;Kirin et al., 2010;Peripolli et al., 2016). As livestock dispersed from their centres of domestication, they encountered diverse environments with unique climatic, anthropological and biophysical limits. These animals responded to the niche-specific environmental pressures through behavioural and biological adjustments. The former provided short-term buffer against many of the stressors, and the latter ensured long-term survival through physiological and/or genetic adaptation.
Among domestic ungulates, goats have the longest socio-cultural and economic co-existence with humans. They were domesticated around 11,000 years ago in the Fertile Crescent of Southwest Asia and adjacent areas (Zeder & Hesse, 2000). Since their domestication, goats have been contributing to human cultural and socio-economic transformations that have shaped ancient and modern human civilizations and the goat's genomes. Their rusticity and resilience have allowed them to cope and adapt to a wide range of environments. However, the evolutionary history of goats is complex with two contrasting hypotheses explaining their domestication and little geographic pattern in their maternal diversity. Luikart et al. (2001) observed at least six distinct mtDNA haplogroups with little geographic partitioning. They interpreted this to be the consequence of multiple domestication events and ease of translocation through commercial activities. Naderi et al. (2008) on the other hand suggested that such diversity of mtDNA haplogroups was compatible with a single domestication event, followed by a phase of human management of wild or semi-domesticated goats comprising multiple mtDNA lineages, prior to their global dispersion. Evidence from the analysis of ancient goat genomes suggests the domestication of multiple divergent wild goats in a dispersed manner resulting in distinct Neolithic populations that contributed disproportionately to modern goat genomes (Daly et al., 2018). The initial diffusion of goats to Africa is also complex. Archaeological evidence suggests at least three entry points into the continent (Gifford-Gonzalez & Hanotte, 2011). At least two mtDNA haplogroups have also been observed in Kenyan (Kibegwa, Githui, Jung'a, Badamana, & Nyamu, 2015), Ethiopian (Tarekegn et al., 2018), Sudanese (Sanhory, Giha, Ebrahim, Eldin, & Gornas, 2014) and Egyptian (Naderi et al., 2008) indigenous goats, suggesting the influence of at least two independent maternal lineages.
Palaeogenomic evidence has shown that the north-eastern and Horn of Africa region were at the centre of a vast web of maritime and terrestrial routes of ancient and modern trade and thus a major entry point of domesticates into the continent (Fuller & Boivin, 2009;Marshall, 2000). The Horn of Africa landscape exhibit considerable changes in elevation within short geographic distances due to complex volcano-tectonic activities over the past millennia (Mohr, 1971).
In particular, the Ethiopian highlands, which range from 125 m below sea level in the Afar depression to altitudes exceeding 4,000 metres above sea level in the Arsi-Bale mountains, form an extensive uplifted plateau, that is delimited by pronounced escarpments (Umer et al., 2007). The altitudinal gradients result in a wide range of agroeco-climates and production environments. The region is also characterized by human ethnic diversity of ancient origin that has been associated with livestock husbandry. Palaeo-climatic data have also revealed periods of prolonged and severe droughts in the region with profound impacts (Verschuren, Laird, & Cumming, 2000). These factors make the region particularly attractive for investigating how the genomes of indigenous livestock might have been shaped by past and recent events. Here, we generated 52K SNP genotypes in 13 populations of Ethiopian indigenous goats. The data were used to investigate past and recent demographic dynamics through the analysis of genome-wide diversity and admixture, autozygosity and evidence for selection.

| Animals and genotypes
The animals used in this study were provided by farmers and pastoralists who participated in the study by agreeing to have their and production traits due possibly to selection under environmental constraints and/ or human preferences. The regions also overlapped uncharacterized genes, calling for a comprehensive annotation of the goat genome. Our results provide insights into mechanisms leading to genome variation and differentiation in sub-Saharan Africa indigenous goats.

K E Y W O R D S
autozygosity, diversity, effective population size, genome dynamics, LD decay, runs of homozygosity, selection signatures animals sampled. The sampling was done following standard guidelines and procedures of the Ministry of Livestock and Fisheries of the Federal Democratic Republic of Ethiopia. No ethical approvals were required at the time.
In total, 646 unrelated goats from 13 populations (FARM-Africa 1996; Table 1) were sampled by collecting whole blood from two mature unrelated animals per flock via jugular venipuncture into EDTA vacutainers. Prior to sampling, and in the absence of written pedigree records, flock owners were interviewed in detail on the extent of genetic relationships between their animals. Genomic DNA was recovered with the salting out procedure (Shinde, Gujar, Patil, Satpute, & Kashid, 2008). Genotyping was performed with the Illumina® Caprine SNP 52K panel featuring 53,347 SNPs (single nucleotide polymorphisms). Using PLINK v1.9 (Purcell, Neale, Todd-Brown, Thomas, & Ferreira, 2007), SNPs with call rates <95%, minor allele frequency (MAF) <0.01, unknown and redundant genome coordinates, nonautosomal location and those not in Hardy-Weinberg equilibrium (HWE; p < .001), and samples with >5% missing genotypes, were filtered from the data set. This left 44,723 SNPs and 628 goats for analysis. A subset of these samples had been analysed for mtDNA D-loop variation (Tarekegn et al., 2018).

| Data analysis
To estimate genetic diversity, observed (H O ) and expected (H E ) heterozygosity, proportion of polymorphic SNPs (P N ) and average MAF were calculated with PLINK v1.9. The average pairwise genetic distances between individuals within a population were also calculated with PLINK v1.9. Higher values indicate elevated genetic distances and thus low genetic relationships. The average proportion of alleles shared between two individuals was calculated as D ST using PLINK v1.9 with the "-genome" command line: IBS 1 and IBS 2 represent the number of loci which share either one or two alleles that are identical by state in pairwise comparisons between individuals, respectively, and N is the number of loci tested. The genetic distance between all possible pairwise combinations of individuals was calculated as D = 1−D ST .
Detection of ROH was performed by invoking the "--homozyg" option in PLINK v1.9. To account for medium marker density, the following set of conditions were imposed: minimum length of ROH = 1 Mb, number of SNPs present in the ROH = 5, number of missing SNP in the ROH = 1, minimum allowed density of SNPs within a run = 1 SNP/100 Kb, number of heterozygous SNPs in each ROH = 1 and maximum gap between consecutive homozy- For each population, runs of homozygosity (F ROH ; Kim et al., 2013) and excess of homozygosity (F HOM ; Wright, 1922)  where the numerator represents the sum of ROH per animal above a certain criteria length and the denominator (L) is the total length of the genome covered by autosomal markers (McQuillan et al., 2008). The F HOM statistic was derived using the formulae: O HOM and E HOM represent the observed and expected homozygosity for each population, respectively.
To determine the extent of linkage disequilibrium (LD) between adjacent SNPs, the r 2 statistic was calculated for each pair of loci using the formulae: where f(AB) is the frequency of haplotypes having allele A at locus 1 and allele B at locus 2 and f(A) and f(B) are the observed allele frequencies at each locus (Hill & Robertson, 1968). For the calculation, we used the "--ld-window 9,999 --ld-window-kb 1,000 --ld-window-r 2 0" command line in PLINK v1.9. These settings allowed the analysis of SNPs that were not more than 9,999 SNPs apart, set the window size to 1,000 kb and used all SNPs in the analysis.
The r 2 values were used to model changes in effective population sizes (N e ) over generations for each population and across the 13 populations using the SNeP tool (Barbato, Orozco-terWengel, Tapio, & Bruford, 2015). Sved (1971) described the relationship between LD (r 2 ), N e and c (recombination rate) using the equation: Here, r 2 is the LD between different markers, N e is the effective population size, c is the genetic distance between various markers measured in Morgans, n is the chromosome experimental sample size, α is a correction for the occurrence of mutations (Ohta & Kimura, 1971; α = 1 in the absence of mutation and α = 2 if mutation is considered), k = 4 for autosomes and k = 2 for the X chromosome.
In contemporary studies, physical distance is used instead of genetic distance to estimate population size. A physical distance of 100 Kb is approximately equivalent to a genetic distance of 0.1 cM (or 1 cM ≈1 Mbp; Dumont & Payseur, 2008). When population size is changing linearly, the expectation of chromosome segment homozygosity is ~1/(4Ntc + 1), where Nt is the population size 1/(2c) generations ago (Hayes, Visscher, McPartlan, & Goddard, 2003 To investigate patterns of population splits and cross-population gene flow, the maximum-likelihood tree-based approach incorporating admixture events and implemented in TreeMix v1.13 was used.
The phylogenetic tree was built without rooting, and the migration events "m" modelled as edges were added to the phylogeny until the model explained at least 99% of the variance in ancestry. The value of "m" with the highest log-likelihood following six replicate runs of TreeMix v1.13 was chosen as the optimal.
To reveal fine-scale population stratification independent of a priori ancestry information, network analysis was carried out using matrix with R scripts provided in the hapFLK webpage (https:// forge -dga.jouy.inra.fr/proje cts/hapflk). In the construction of the NJ tree, no outgroup population was defined, but the software was prompted to use all populations and the midpoint as outgroup. The kinship matrix captured the population structure, which was used to model covariance matrix of allele frequencies, whereas a multipoint LD model was used to create haplotype clusters on each chromosome (set to 13 (-k, 13) per chromosome). This was determined using cross-validation based estimation in fastPHASE 1.4.0 (Scheet & Stephens, 2006) using the setting -KL10 -KU20 -Ki3. The hapFLK statistic was calculated for each SNP as the average of 20 expected maximization runs fitting the LD model. P-values were then computed based on a chi-square distribution using a Python script (https://forge -dga.jouy.inra.fr/docum ent/588). To limit false positives, a q-value threshold of 0.01 was applied to control for false discovery rate (FDR). Putative selection signatures were defined by the regions with a threshold of p < .001.
To further pinpoint loci under selection, we also used the cross-population extended haplotype homozygosity (XP-EHH) test (Sabeti et al., 2002(Sabeti et al., , 2007 to make comparisons between the 13 populations. The XP-EHH assesses haplotype differences between two populations and is designed to detect alleles that have increased in frequency to the point of fixation or near-fixation in one of the populations (Pickrell et al., 2009;Sabeti et al., 2007). The test uses the integrated EHH (iHH) of a core SNP in two populations, A and B, rather than two alleles in a single population. The unstandardized XP-EHH statistic is calculated as follows: where iHH A and iHH B are the integrated EHH of a given core Because previous studies showed that the standardized XP-EHH statistics follow standard normal distribution (Ma, Zhang, Zhang, & Ding, 2014;Sabeti et al., 2007), P-values were estimated using the standard normal distribution. For each cross-population comparison, we determined the regions under selection based on the threshold P < .001.
Using the NCBI map viewer, the regions revealed by hapFLK and XP-EHH were annotated using the ARS1 release 102 goat reference genome assembly (Bickhart et al., 2017). Functional enrichment and gene ontology (GO) analysis were performed with Panther 14.1 (Mi, Muruganujan, Ebert, Huang, & Thomas, 2019) using Bos taurus as the background. We performed text mining with STRING 11.0 (https:// strin g-db.org/) to identify phenotypes and protein-protein interaction networks associated with the candidate genes.

| RE SULTS
Genetic diversity indices (mean ± standard deviation (SD); .051 (Keffa). The trends in LD decay over genomic distances reveal high LD over short distances, which decays rapidly with distance ( Figure 2a). LD decays more rapidly in Woyto-Guji but slower in Keffa, Western highland/Agew and Western lowland/Gumez.
One thousand (1,000) generations ago, Nubian and Keffa had the highest and lowest N e , respectively (Table 3). Thirteen (13)   Comparative analysis between Arsi-Bale and Nubian revealed the four putative signatures (Figure 5a,b), suggesting they are specific to these two populations. Arsi-Bale goats carry dense hairy coats and reside at an altitude of ≥4,000 m above sea level (www. We annotated the four candidate regions based on the ARS1 release 102 goat reference genome assembly. We found a total of 123 (Arsi-Bale = 29; Nubian = 94) unannotated genes, that is prefixed with "LOC/ENSCHIG" in the candidate regions. Although potentially under selection, these unannotated genes were not included in the functional enrichment and ontology analysis. In total, 20 (CHI6) and 45 (CHI12) genes were present in the Arsi-Bale regions, while 62 (CHI8) and 244 (CHI13) were observed in the Nubian regions (Table   S1). Functional enrichment and GO analysis were performed separately for each population-specific gene sets using

| Genetic variation and marker polymorphisms
We present findings from an analysis of genome diversity, structure

| ROH and inbreeding
Genomic inbreeding coefficients, such as F ROH and F HOM , are more accurate at estimating autozygosity and detecting past and recent inbreeding compared to estimates derived from pedigree data (Curik, Ferenèakoviae, & Sölkner, 2014;Ferenčaković et al., 2013;Knief, Kempenaers, & Forstmeier, 2017). Furthermore, in the absence of genealogical information, molecular data can be used to infer population history and inbreeding (Kim, Sonstegard, Van Tassell, Wiggans, & Rothschild, 2015;Zavarez et al., 2015). The latter is important especially for African indigenous livestock that most often lack written pedigree records. The strong and positive correlation (r = .978) between F ROH and F HOM observed in our study corroborates previous findings in pigs  and cattle (Mastrangelo et al., 2016). It confirms that the extent of a genome under ROH can be used to predict the proportion of the genome that is identical by descent (IBD). A high and positive correlation between F ROH and conventional pedigree-based estimates of inbreeding (F PED ) has been reported (Ferenčaković et al., 2013;Martikainen, Tyrisevä, Matilainen, Pösö, & Uimari, 2017;Purfield, Berry, McParland, & Bradley, 2012) confirming F ROH as an appropriate estimator of IBD alleles. The importance of understanding and quantifying genome-wide autozygosity has also been highlighted through correlations of F ROH with inbreeding depression for a range of production (Bjelland, Weigel, Vukasinovic, & Nkrumah, 2013;Kim et al., 2015;Pryce, Haile-Mariam, Goddard, & Hayes, 2014) and fertility (Kim et al., 2015;Martikainen et al., 2017) traits. In this study, F ROH and F HOM revealed low levels of genomic inbreeding in Ethiopian indigenous goats, results that are consistent with findings on Egyptian Barki (Kim et al., 2016) and Ugandan goats (Onzima et al., 2018). Colli et al. (2018) suggested that significant introgression of other breeds in African goats explains their low inbreeding.
We however suggest that low inbreeding in African goats is the result of extensive outcrossing within and between diverse populations and individuals considering that these flocks are communally grazed and watered, and mating is mostly uncontrolled.

| Demographic history and dynamics
Genome-wide ROH, LD and N e are valuable sources of information on how population structure, demography and management evolve over time (Ceballos et al., 2018;Kirin et al., 2010). Generally, F I G U R E 5 Genome-wide hapFLK statistics between Arsi-Bale and Nubian goat populations. Green and red lines correspond to p < .005 and <.001, respectively short ROH is most likely correlated with ancestral inheritance, ancient bottlenecks or consanguinity, whereas long ROH is associated with recent inbreeding (Browning & Browning, 2013;Purfield et al., 2012). The distribution of ROH can also provide information on specific selection events.  (Brito et al., 2017;Kim et al., 2016;Onzima et al., 2018), sheep (Purfield, McParland, Wall, & Berry, 2017), cattle (Mastrangelo et al., 2016) and pigs (Bosse et al., 2012). Period (AHP) (Wright, 2017), interspersed with short dry spells, were experienced in eastern Africa (Verschuren, 2007;Verschuren et al., 2000). Indigenous goats seem to have thrived when conditions were favourable but drastically shrunk during the subsequent drought (Verschuren, 2007;Verschuren et al., 2000) or following the abrupt or gradual termination of the AHP. Each population, however, seems to have responded differently to the climatic events as deduced from the individual trends ( Figure S1)

| Past and recent population genetic structure
The use of several methods to investigate population structure allowed us to reveal and describe past and recent population genetic structure in indigenous goats in Ethiopia and by extension sub-Saharan Africa. NetView revealed two broad genetic groups lacking a clear phylogeographic structure that corresponded slightly to the genetic backgrounds revealed by ADMIXTURE at K = 2. These two genetic groups may possibly represent two deep ancient ancestries that arrived with the first introduction of goats in eastern Africa. It mirrors findings of mitochondrial DNA which identified two haplogroups that also lacked a clear phylogeographic structure in Ethiopian (Tarekegn et al., 2018), Kenyan (Kibegwa et al., 2015), Sudanese (Sanhory et al., 2014) and Egyptian (Naderi et al., 2008) indigenous goats. With the current data set, we nevertheless cannot determine the origin and route of entry and dispersal of the two deep ancestries and whether they arrived together or independently.
However, their occurrence in all the 13 populations, albeit at different frequencies, points to a relatively early dispersal in the region, most likely facilitated by human socio-cultural and trade interactions as inferred from anthropologic, linguistic and human genetic studies (Pagani et al., 2012).
Cryptic population genetic structure, represented by seven genetic groups, was revealed by PCA and TreeMix. These results were supported by the output of ADMIXTURE: the optimal number of clusters was K = 7. Except Keffa, the genomes of the other populations are a mosaic of at least two genetic backgrounds. This pattern was repeated at K ≥ 7 except for Abergelle, Woyto-Guji and Afar which showed one predominant genetic background. How these seven backgrounds evolved from the two ancestries remains a subject of speculation. Geo-specific local founder events accentuated by genetic drift arising from past reproductive isolation could be a possible source of the observed evolutionary hepta-groupings.
With extensive human migrations across Ethiopia dating back to the 16th century (Habitamu, 2014;Yelma, 1966), it is possible that human cultural and socio-economic interactions dispersed the seven backgrounds across the country resulting in admixed genomes.

| Genome-wide signatures of selection
Four strong selection signatures were identified, two each in Arsi-Bale and Nubian goats. Despite the observation of a unique genetic background in Keffa that we suggest could be the outcome of trypanotolerance, no selection signatures were observed in this population. This was rather surprising and difficult to explain but may suggest that reproductive isolation and the resulting genome autozygosity could be the cause. We attribute the four strong selection signatures to differential adaptation to contrasting environments namely high altitude and the associated low temperatures experienced by the Arsi-Bale population, and low dry altitude and high temperatures experienced by the Nubian population. One Arsi-Bale candidate region spanned ALOX5AP a gene that was identified in sheep as a potential candidate for climate-mediated adaptation (Lv et al., 2014). In humans, a mutation in ALOX5AP was associated with lung function (Ro et al., 2012).
Given the high altitude and restricted oxygen concentration in the Arsi-Bale mountains, ALOX5AP may play a role in adaptation to high altitude, especially in so far as respiration is concerned.
Several candidate genes with critical roles in maintaining genome integrity through DNA repair processes (BRCA2, PDS5B, RAD51) (Couturier et al., 2016;Prakash, Zhang, Feng, & Jasin, 2015) overlapped with the Arsi-Bale candidate genomic regions. In the tropics, high-altitude regions receive much higher annual levels of UV radiation. Therefore, maintaining genome integrity against possible DNA damage induced by ionizing radiation is essential. Other candidate genes found in selective sweeps included KATNAI1, FRY and RXFP2. KATNAI1 has been associated with variation in fibre diameter in domestic sheep breeds (Seroussi, Rosov, Shirak, Lam, & Gootwin, 2017;Zhang et al., 2013). The FRY has been linked with variations in coat pigmentation in many sheep breeds (Garcia-Gamez et al., 2011;Seroussi et al., 2017;Wei et al., 2015;Zhang et al., 2013) and is also involved in growing wing hairs and bristles in Drosophila (Cong et al., 2001;Fang, Lu, Emoto, & Adler, 2010;He, Fang, Emoto, Jan, & Adier, 2005). Mutations in RXFP2 were linked to horn type and development in sheep (Johnston et al., 2011;Wang, Zhou, Li, Zhao, & Chen, 2014), reproductive success and survival in Soay sheep (Johnston et al., 2013), and the genomic region spanning RXFP2 was also identified to be under selection in Creole (Gautier & Naves, 2011) and West African Borgou cattle (Flori et al., 2014). We also observed genes (STARD13/CCNA1, FLT1, DCLK1, NBEA) that are reported to be associated with female reproduction in bovines (Kfir et al., 2018) and chicken (Shen et al., 2017;Sun et al., 2015). The PITX2 and CDX2 have been associated with puberty in cattle (Cánovas et al., 2014) and embryogenesis in the mouse (Lu et al., 2018). Evidence drawn from animal experiments suggests that conditions in high-altitude environments and prolonged exposure to altitude-initiated stress, including cold and hypoxia, can have direct negative effects on reproductive function (Heath & Williams, 1989). The exposure can have significant negative effects on female reproduction through reduction in fertility, increasing foetal loss and/or reducing fecundability (Adekilekun, Aboua, Oyeyipo, & Oguntibeju, 2019).
The Nubian is a goat breed that is adapted to arid environments.
It is therefore not surprising that hapFLK and XP-EHH identified selective sweeps overlying regions that spanned genes associated with adaptation to environmental stress (PCK1, CTCFL, SPO11, BMP7), oxidative stress (PLAGL2, SRXN1), adaptation to different ecological environments (ZBTB46, ARFRP1, STMN3, GMEB2), and mitochondrial homeostasis (TDRD7) including classical innate immunity and fever induction (HCK, PROCR, CTSZ) (Table S2). Coat and skin colour are an integral part of adaptation. The coat of Nubian goats is light coloured and most often white. In one of the candidate regions, there were two genes that have been associated with colouration (GNE, EDN3).
Surprisingly, we observed genes that have been associated with  (Table   S2). With candidate regions overlapping these genes our results demonstrate the potential of the Nubian to be used in breeding programmes to increase production to meet future demand for proteins of animal origin under unpredictable climatic conditions.

| CON CLUS ION
The genetic history of African indigenous goats is complex. It has obviously been closely intertwined with the history of local human communities. The need to adapt to diverse African environments may also have shaped present-day African goat genomes. Here, we have provided insights that improve our understanding of the genomic landscape and demographic history of sub-Sahara African indigenous goats in Ethiopia. Our results provide a foundation to formulate and test biological hypotheses relating to population demographic profiles and genome dynamics in African livestock. Further studies are needed to refine and confirm our interpretations and the proposed hypotheses. These include among others, the association between the uniform genetic background in Keffa and reduced susceptibility to trypanosomosis, the origin and route of entry and dispersal of the two deep ancestries into the region, and the suggestion that the historical demographic dynamics of the indigenous goats may be more complex than is reflected by the composite trend.

ACK N OWLED G EM ENTS
We would like to specially thank flock owners who volunteered their animals for sampling. We acknowledge the assistance of the district agriculture and rural development offices during sampling.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest regarding this manuscript.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data used here will be made available upon request.