Domestication of cattle: Two or three events?

Abstract Cattle have been invaluable for the transition of human society from nomadic hunter‐gatherers to sedentary farming communities throughout much of Europe, Asia and Africa since the earliest domestication of cattle more than 10,000 years ago. Although current understanding of relationships among ancestral populations remains limited, domestication of cattle is thought to have occurred on two or three occasions, giving rise to the taurine (Bos taurus) and indicine (Bos indicus) species that share the aurochs (Bos primigenius) as common ancestor ~250,000 years ago. Indicine and taurine cattle were domesticated in the Indus Valley and Fertile Crescent, respectively; however, an additional domestication event for taurine in the Western Desert of Egypt has also been proposed. We analysed medium density Illumina Bovine SNP array (~54,000 loci) data across 3,196 individuals, representing 180 taurine and indicine populations to investigate population structure within and between populations, and domestication and demographic dynamics using approximate Bayesian computation (ABC). Comparative analyses between scenarios modelling two and three domestication events consistently favour a model with only two episodes and suggest that the additional genetic variation component usually detected in African taurine cattle may be explained by hybridization with local aurochs in Africa after the domestication of taurine cattle in the Fertile Crescent. African indicine cattle exhibit high levels of shared genetic variation with Asian indicine cattle due to their recent divergence and with African taurine cattle through relatively recent gene flow. Scenarios with unidirectional or bidirectional migratory events between European taurine and Asian indicine cattle are also plausible, although further studies are needed to disentangle the complex human‐mediated dispersion patterns of domestic cattle. This study therefore helps to clarify the effect of past demographic history on the genetic variation of modern cattle, providing a basis for further analyses exploring alternative migratory routes for early domestic populations.


| INTRODUC TI ON
Between the late Pleistocene and early Holocene, the most commonly occurring cattle species was the aurochs (Bos primigenius), ranging from northern Africa to both the Atlantic and Pacific coasts of Eurasia (Zeuner, 1963). This formerly widespread wild species recently became extinct, with the last recorded herd found in 1627 AD in Poland (Götherström et al., 2005). Similar to sheep and goats, archaeological and genomic evidence suggests that the ancestors of taurine cattle (Bos taurus) were domesticated from Bos primigenius primigenius in the Fertile Crescent during the Neolithic, more than 10,000 years ago (YA; Bruford, Bradley, & Luikart, 2003;Ajmone-Marsan, Garcia, & Lenstra, 2010;MacHugh, Larson, & Orlando, 2017). However, approximately 1,500 years later a second domestication event took place in the Indus Valley from Bos primigenius nomadicus, separated from the taurine branch ~250-330,000 YA, eventually giving rise to the extant indicine cattle (Bos indicus), often also termed zebu cattle (Loftus, MacHugh, Bradley, Sharp, & Cunningham, 1994).
Despite support for the extant distribution of cattle arising from two main domesticated lineages, a third domestication event has been hypothesized to have occurred in north-east Africa about 8,000-9,000 YA, giving rise to the divergent African taurine cattle. Support for this hypothesis derives from archaeozoological evidence and genetic data from contemporary cattle and aurochs. The archaeozoological evidence is supported by comparative osteological analyses between ancient wild and domestic cattle from Europe, Asia and Africa (Applegate, Gautier, & Duncan, 2001;Grigson, 1991Grigson, , 2000Stock & Gifford-Gonzalez, 2013). Bradley, MacHugh, Cunningham, and Loftus (1996), using maternal mitochondrial DNA (mtDNA), showed that African cattle feature a higher frequency of the T1 mitochondrial haplogroup than is common in other regions, estimated that the separation between African and European taurine ancestors occurred 22,000-26,000 YA (earlier than the Fertile Crescent domestication), and found patterns of population expansions consistent with domestication that were more recent than the corresponding signature of African/European divergence. These results are supported by analyses of extant taurine cattle from Europe, the Middle East and Africa, as well as extinct British aurochs (Troy et al., 2001). Furthermore, evidence from nuclear DNA analyses (Bovine Hapmap Consortium et al., 2009;Hanotte et al., 2002;Pérez-Pardal et al., 2010) is consistent with a local domestication of African taurine cattle and the subsequent admixture of Near East and the Indus Valley cattle in Africa. However, Bonfiglio et al. (2012) support the Near Eastern origin of the T1 mitochondrial DNA haplogroups and defended that the North African subhaplogroup T1d could have originated in already domesticated cattle shortly after their arrival from the Near East. Along this line, analyses on wholegenome SNP arrays supported two domestication events for taurine and indicine lineages followed by introgression from wild aurochs in Africa, East Asia and Europe (Decker et al., 2014). Taken together, controversial archaeological and genetic data seem to support both two and three domestication events in cattle; therefore, an analysis using genealogical modelling of alternative scenarios that could give rise to the extant patterns of cattle genetic diversity needs to be addressed.
Taurine cattle dispersed quickly after their domestication northwest from the Fertile Crescent through Turkey into the Balkans and into northern Italy, either following a Mediterranean coastline or a route partially along the Danube River, and subsequently dispersing across Europe (Figure 1) (Beja-Pereira et al., 2006;Pellecchia et al., 2007). Taurine cattle may have also migrated along the northern coast of Africa, eventually crossing into the Iberian Peninsula and admixing with local cattle (Figure 1) (Beja-Pereira et al., 2006;Decker et al., 2009) Additionally, a modern indicine migration via pastoralists into eastern Africa (~2,500-3,500 YA) and subsequently throughout central and southern areas of the continent has been described ( Figure 1) (Decker et al., 2014;Hanotte et al., 2002;Payne & Hodges, 1997).
Previous studies using traditional molecular markers (e.g., mitochondrial DNA and microsatellites) generated insights into the genetics underlying the domestication history and postdomestication processes, such as the taurine dispersal patterns in Europe (Beja-Pereira et al., 2006;Bradley et al., 1996;MacHugh, Shriver, Loftus, Cunningham, & Bradley, 1997). However, the development of newer genomic technologies, such as SNP arrays, has enabled the rapid collection of information for thousands of markers with high precision, facilitating comprehensive surveys of genomewide variation in domestic cattle (Bovine Hapmap et al. 2009).
Here, we collated a data set of ~54,000 SNPs genotyped in 180 cattle populations from across the world, representing both indicine and taurine breeds (Supporting Information Table S1), to characterize the genetic diversity, population structure and demographic history of extant cattle. To disentangle the long-standing debate around the existence of a third domestication event in north-east Africa, we used approximate Bayesian computation to model a variety of possible scenarios that could give rise to the extant patterns of cattle genetic diversity, including analyses to determine whether modern cattle derive from two or three domestication events and migratory patterns among breeds.
Additionally, one individual was genotyped from the archaeological remains of a B. p. primigenius animal from England (Supporting Information Table S1; Park et al., 2015) and used as an outgroup.
The UMD3.1 bovine assembly was used as a reference to map the genomic positions for each SNP. All data sets were merged and quality controlled using PLINK 1.90 (Chang et al., 2015;Purcell et al., 2007). Across all breeds, only autosomal SNPs were retained and any locus with a minor allele frequency less than 5% and a call rate less than 90% was removed, leaving 8,081 SNPs (~8 k) for further analyses.

| Genetic variation and population divergence
The molecular inbreeding coefficient (F IS ), and observed (H o ) and expected (H e ) heterozygosities per breed were calculated using PLINK on the breeds with a sample size of at least eight individuals (breeds with lower sample sizes were not included to avoid potentially biased estimates deriving from unrepresentative sample sizes).
Welch's t test was used to test or differences between H e and H o within each breed. Admixture v1.3 (Alexander, Novembre, & Lange, 2009) was used to assess population structure in the data for values of K (the number of clusters in the data) between 1 and 150.
The predictive accuracy for each K was estimated using the fivefold cross-validation (CV) procedure implemented in Admixture. For this analysis, we further pruned the 8K data set for linkage disequilibrium (LD) in PLINK, using a sliding window of 50 kb, a step size of 10 SNPs, and randomly removing one SNP from each pairwise comparison with an r 2 of 5% or higher. Sixty-one values of K ranging between 1 and 150 were tested. Due to the computational power required for assessing higher values of K, we restricted the analysis to all values of K between 1 and 50, and used intervals of 5 K from 55 to 100, and examined K = 150. The partition solutions were visualized using the POPHELPER R package (Francis, 2017).

Multidimensional scaling (MDS) was calculated in PLINK using
Hamming distances to determine the distances between individuals and breeds across 20 dimensions. The first two major axes explaining the highest proportion of the variance in the data set were visualized using R (R Core Team, 2014). F ST was also estimated between all pairs of populations featuring more than one sample, and the resultant matrix was used to construct a neighbour-net tree in SplitsTree v.4.14.4 (Huson & Bryant, 2006).

| Approximate Bayesian computation (ABC)
The domestication history of cattle was analysed with approximate Bayesian computation (ABC) as implemented in the software ABCtoolbox (Wegmann, Leuenberger, Neuenschwander, & Excoffier, 2010) with simulations generated using Fastsimcoal2 (Excoffier, F I G U R E 1 Hypothesized main domestication sites and migration routes of taurine (Bos taurus) and indicine (Bos indicus) cattle, including the postulated third domestication site in Egypt. Lesser migration routes, such as the dispersal across Europe, are not depicted Dupanloup, Huerta-Sánchez, Sousa, & Foll, 2013;Excoffier & Foll, 2011). To reduce the computational power required to perform the analyses, as well as to reduce the effect of ascertainment bias (e.g., Barbato et al., 2017;Kijas et al., 2012;Orozco-terWengel et al., 2015), the 8K data set was pruned in PLINK as above but with an r 2 of 2.5%, resulting in 2,202 SNPs (2K). Four breeds were selected for these analyses, a European taurine ( (Table 1). These breeds were chosen avoiding outlier breeds based on the MDS and neighbour-net analyses. Two additional breed sets were selected with the same criteria and used as replicates for the ABC analysis to assess that the results obtained were not dependent on the breed set used but rather a general outcome reflective of the domestication history of cattle (Table 1). To ensure the 2K data set retained information about the genealogical history of these breeds, we compared the results of running the Admixture analysis on each set of four breeds using the 2K and 8K data sets. The 2K data set generated qualitatively the same results as the 8K data set.
The observed data for each breed set were analysed with Arlequin 3.5 (Excoffier & Lischer, 2010) to produce 34 summary statistics describing the genetic variation within and among breeds (Supporting Information Table S3).
The underlying model used in the ABC analyses for the four geographically and genetically divergent groups assumed that: (a) ZebuAF and ZebuAS have a common indicine ancestor; (b) TaurEU and TaurAF have a common taurine ancestor; (c) indicine and taurine have a common aurochs ancestor; and (d) bifurcation of aurochs occurred approximately 250,000 YA (Bovine Hapmap et al. 2009;Bradley et al., 1996). Initially, four alternative scenarios were tested to discriminate among the main demographic histories involving two or three domestication events, modelled as bottlenecks, and migratory patterns among breeds (examples shown in Figure 2; full detail in Supporting Information Figure S1). The software ABCtoolbox was used to parametrize the models to be simulated and to control Fastsimcoal2 for the random generation of the 1 million reverse coalescent simulations drawing parameter values from defined prior distribution ranges for each scenario (Supporting Information   Table S2). Based on a set of 5,000 randomly selected simulations, F I G U R E 2 Example of two initial modelled scenarios for determining the domestication history of cattle using approximate Bayesian computation (ABC). Scenario 1 models only two domestication events (black circles) that coincide with the divergence of taurine and indicine cattle. Scenario 4 models three domestication events: one in indicine cattle prior to divergence within the indicine group and two within taurine cattle, after divergence within the taurine group Scenario 1. ABCtoolbox using the 1,000 simulations that produced summary statistics closest to the observed data. The retained simulations were compared to the observed data under a generalized linear model (GLM) to produce a posterior probability p-value. The p-value is the proportion of simulations that have a smaller or equal likelihood to the observed data; larger p-values represent a better fitting GLM and therefore a good match between the simulated and observed scenarios. Different scenarios were compared with Bayes factors (BF), measured as the quotient of the marginal density (MD) of one scenario divided by the MD of an alternative one; a BF greater than 3 is sufficient to reject the alternative scenario (Wegmann et al., 2010). Upon selecting the best-fitting model of the initial four scenarios proposed, alternative scenarios were hypothesized using this as a base model, and the process was repeated multiple times until all 15 scenarios were tested (Supporting Information Figure   S1). More complex scenarios were not tested to avoid uninformative overparameterization. Finally, the eight best-fitting scenarios were repeated with the additional two breed sets as replicates.
TreeMix (Pickrell & Pritchard, 2012) was used to address any ambiguity between ABC models differing by only migration. The three replicate breed sets used in the ABC analyses were merged together, and a maximum-likelihood tree was constructed using TreeMix under default parameters. Iteratively, weighted migration edges were added to the network, until 99.8% of the variance of the ancestry between the populations was explained by the model. F I G U R E 3 Admixture individual assignment plots for 179 cattle populations and one auroch sample for K = 2, 3, 4 and 70. Each vertical bar represents an individual, and the proportion of each colour in that bar corresponds to the ancestry (genetic variation) of an individual deriving from a given cluster

| Approximate Bayesian computation (ABC)
Approximate Bayesian computation was used to reconstruct the demographic history of taurine and indicine breeds. We first assessed whether the reduced data set (2K) resulted in qualitatively the same admixture pattern as the larger data set (8K). The CV error values for K from 1 to 5 were similar for both data sets (less than 1% difference for K = 3 and K = 4; Supporting Information Figure S4). For K = 4, each breed was assigned to one cluster with the ZebuAF showing admixture partially deriving from TaurAF but mostly from ZebuAS (Supporting Information Figure S4), while for K = 3 the clusters corresponded to TaurAf, TaurEU and ZebuAS. Both results reflected the divergence patterns observed in the population structure analyses.
Of the 34 summary statistics originally chosen for the ABC analyses, 17 were removed because of correlations with other statistics, for example, mean number of alleles per locus and pairwise differences between populations due to the high correlation with heterozygosity (Supporting Information Table S3; Figure S5). The remaining 17 summary statistics included measurements of diversity (e.g., mean and standard deviation of heterozygosity across loci for each population) and pairwise divergence (e.g., mean and standard deviation of F ST ; Supporting Information Table S3). Each observed summary statistics was within the 95% quantiles of the simulated data from that summary statistics, consistently across all scenarios and replicates.
The test of the first four scenarios showed a BF > 3 when comparing scenarios 1 and 2 with two domestication events, over scenario 4, which included three episodes of domestication (scenario 3 with two domestication events showed the lowest MD and p-value, reflecting its unsuitability to represent the observed genetic data). Although additional models with three domestication events were tested, scenarios with only two domestications were better supported than those with three domestication events in general (Table 2). Of the 15 modelled scenarios, initial testing showed low support for scenarios 1 to 7 and therefore was not replicated.
Among the replicated scenarios, three were identified consistently as best fitting across the three breed sets tested (in bold in Table 2).
Scenario 8 Figure 6). The MD difference between these three scenarios was too small to discriminate between them (i.e., BF < 3;    (Gautier et al., 2010;Matukumalli et al., 2009;Orozco-terWengel et al., 2015), as reflected by the significantly higher genetic variation within taurine cattle when compared to indicine breeds. Higher H o in European taurine populations and a F I G U R E 5 Neighbour-net using F ST distances for 174 populations of Bos indicus, Bos taurus and hybrids featuring more than one sample (see Supporting Information Table S1 for label information). Scale for F ST distance is displayed in the top left    Table   S1). Ascertainment bias was reduced through removing a proportion of the markers in LD, and highly correlated markers with similar genealogical history were removed from the data set, reducing multicollinearity effects that would lead to overestimation of differentiation measurements. LD pruning of SNPs has been shown to be effective at reducing heterozygosity overestimations when compared to whole-genome sequencing and, although this methodology underestimates F ST , it is consistent regardless of relatedness to the ascertainment populations (Malomane et al., 2018). Furthermore, by replicating the ABC scenarios to increase the sampling scheme, we attempted to reduce any breed specific biases and more effective at capturing the demographic history representative of whole subgroups (Rougemont et al., 2016).

| D ISCUSS I ON
The Admixture analysis best supported the partition for K = 70; however, the low variation in CV error for K values between 15 and 100 suggests that sample groupings above K = 15 only reflect minor improvements in the resolution of the clusters detected. This observation is likely to be an outcome of several factors influencing the statistical power to discriminate between the populations analysed, The three best-modelled scenarios for the domestication history of cattle using approximate Bayesian computation (ABC). Domestication episodes and migratory events between populations are shown by black circles and arrows, respectively F I G U R E 7 Phylogenetic network of the inferred relationships between 12 cattle breeds estimated using TreeMix. Migration edges between breeds are shown as arrows pointing towards the recipient population and coloured according to the proportional ancestry received from the donor population. Scale bar is 10 times the mean standard error of the estimated entries in the covariance matrix

Bos taurus
African

Bos taurus
African
The substantial differentiation between the European and in N e reaching modern N e values of ~500 or less for many populations (Barbato, Orozco-Terwengel, Tapio, & Bruford, 2015;Orozco-terWengel et al., 2015), with the African taurine population showing, on average, larger N e values than Eurasian taurine populations, albeit only marginally.
The hypothesis of an additional domestication event in Africa from local aurochs has been also proposed to explain the large difference between taurine branches. We explicitly tested the hypothesis of whether three domestication events better explain the genetic variation observed in modern domestic cattle when compared to the more widely accepted two domestication scenario. ABC tested on a number of demographic scenarios showed reasonable support for three domestication events (scenarios 4 and 6) in terms of how well the estimated GLM fitted the observed data (i.e., the p-value).
However, when scenarios including three domestications were compared to similar models in which only two domestications occurred, the latter received stronger support (BF > 3) (  et al., 2008;Park et al., 2015;Upadhyay et al., 2017). These results are consistent with those in other domesticated animals such as horses (Warmuth et al., 2012), pigs (Frantz et al., 2015) and camels (Almathen et al., 2016), where gene flow with wild populations has been observed in modern breeds.
Previous studies using SNP data genotyped in taurine and ind-  (Gray et al., 2014;Rougemont et al., 2016). These limitations restrict the intrinsic complexity in the models, ideally, events such as divergences and domestication would be reconstructed as gradual processes occurring over many generations as cattle were shifted from prey animals, to managed herds, to captive bred stock (Larson & Burger, 2013).
Despite efforts to limit overparameterization by reducing the ranges of many parameters, both the number of prior parameters and their range may have reduced statistical power of posterior estimates (Wegmann et al., 2010). The progression of refining the ABC model through serial selection and diversification of preferentially chosen scenarios was the most computationally efficient methodology to define increasingly more complex models. "Less favourable" models were rejected at each step to constrain the otherwise very large number of models simulated. Although this saves significant time and resources, it is built on the assumption that discarded models will never be viable regardless of alterations; unfortunately, this compromise, along with the reduction in the data set from 8K to 2K, was necessary logistical restrictions. However, the first three migration edges incorporated into the maximum-likelihood tree generated in TreeMix recaptured the same migratory patterns between TaurAF and ZebuAF as tested in ABC scenario 8 (Figures 6 and 7) and explained up to 99.78% of the variance in the tree. Adding a fourth migration edge marginally increased the explanation of the variance in the tree and corresponded to a weak migration edge from TaurEU into ZebuAS.
Overall, this supports the ABC results and indicates that the more important gene flow between populations occurred between B. taurus and B. indicus within Africa, although minor movements of cattle between Europe and Asia may also contribute to shape the population structure we see today.
In conclusion, SNP array data collected in more than three thou- African indicine and European taurine cattle), will help to disentangle the complex human-mediated microevolution of domestic cattle, in particular if using data deriving from whole-genome sequence data that suffer less of ascertainment bias as SNP arrays do.
Paleogenomics analyses of Middle Eastern and African wild aurochs predating domestication and early Middle Eastern and African domestic cattle will provide the data required to fully address these questions.

ACK N OWLED G EM ENTS
We