Deciphering the patterns of genetic admixture and diversity in southern European cattle using genome‐wide SNPs

Abstract The divergence between indicine cattle (Bos indicus) and taurine cattle (Bos taurus) is estimated to have occurred approximately 250,000 years ago, but a small number of European cattle breeds still display shared ancestry with indicine cattle. Additionally, following the divergence of African and European taurine, the gene flow between African taurine and southern European cattle has also been proposed. However, the extent to which non‐European cattle ancestry is diffused across southern European cattle has not been investigated thoroughly. Also, in recent times, many local breeds have suffered severe reductions in effective population size. Therefore, in the present study, we investigated the pattern of genetic diversity in various European cattle based on single nucleotide polymorphisms (SNP) identified from whole‐genome sequencing data. Additionally, we also employed unlinked and phased SNP‐based approaches on high‐density SNP array data to characterize non‐European cattle ancestry in several southern European cattle breeds. Using heterozygosity‐based parameters, we concluded that, on average, nucleotide diversity is greater in southern European cattle than western European (British and commercial) cattle. However, an abundance of long runs of homozygosity (ROH) and the pattern of Linkage disequilibrium decay suggested recent bottlenecks in Maltese and Romagnola. High nucleotide diversity outside ROH indicated a highly diverse founder population for southern European and African taurine. We also show that Iberian cattle display shared ancestry with African cattle. Furthermore, we show that Podolica is an ancient cross‐bred between Indicine zebu and European taurine. Additionally, we also inferred similar ancestry profile of non‐European cattle ancestry in different Balkan and Italian cattle breeds which might be an indication of the common origin of indicine ancestry in these breeds. Finally, we discuss several plausible demographic scenarios which might account for the presence of non‐European cattle ancestry in these cattle breeds.

extent to which non-European cattle ancestry is diffused across southern European cattle has not been investigated thoroughly. Also, in recent times, many local breeds have suffered severe reductions in effective population size. Therefore, in the present study, we investigated the pattern of genetic diversity in various European cattle based on single nucleotide polymorphisms (SNP) identified from whole-genome sequencing data. Additionally, we also employed unlinked and phased SNP-based approaches on high-density SNP array data to characterize non-European cattle ancestry in several southern European cattle breeds. Using heterozygosity-based parameters, we concluded that, on average, nucleotide diversity is greater in southern European cattle than western European (British and commercial) cattle. However, an abundance of long runs of homozygosity (ROH) and the pattern of Linkage disequilibrium decay suggested recent bottlenecks in Maltese and Romagnola. High nucleotide diversity outside ROH indicated a highly diverse founder population for southern European and African taurine. We also show that Iberian cattle display shared ancestry with African cattle. Furthermore, we show that Podolica is an ancient cross-bred between Indicine zebu and European taurine. Additionally, we also inferred similar ancestry profile of non-European cattle ancestry in different Balkan and Italian cattle breeds which might be an indication of the common origin of indicine ancestry in these breeds. Finally, we discuss several plausible demographic scenarios which might account for the presence of non-European cattle ancestry in these cattle breeds.

| INTRODUC TI ON
Modern cattle originated from at least two different species of wild aurochs: Bos primigenius primigenius (European aurochs) and Bos primigenius namadicus (Indian aurochs). Analyses based on mitochondrial DNA (mtDNA) have estimated the divergence date between these two species from ~117,000 to ~332,400 years before present (YBP) (Achilli et al., 2008;Bradley, Machugh, Cunningham, & Loftus, 1996;Loftus, Machugh, Bradley, & Sharp, 1994). Subsequently, the near-Eastern population of Bos. p. primigenius was domesticated ~10,000 YBP giving rise to domesticated taurine population (Troy et al., 2001), while Bos. p. namadicus was domesticated ~8,000 YBP, somewhere in the Indus Valley giving rise to the modern zebu (cattle with hump aka indicine) population (Chen et al., 2010). The occurrence of a third domestication event involving African aurochs in north-eastern Africa is still debated (Grigson, 1991;Loftus et al., 1994;Pitt et al., 2018). However, recent studies based on genome-wide single nucleotide polymorphisms (SNPs) refuted this hypothesis of a third domestication centre and, instead, proposed the gene flow from African aurochs resulting in high divergence of African taurine (Decker et al., 2014;Pitt et al., 2018). This hypothesis of African aurochs introgression, however, is yet to be tested because DNA from an ancient specimen of African cattle is not available. Conversely, analyses of genome-wide SNPs from ancient European aurochs have provided novel insights into the post-domestication contribution of ancestral aurochs to the gene pool of modern European cattle. For instance, studies have suggested that British cattle followed by Iberian and Dutch cattle may carry an abundance of aurochs specific alleles compared to either African or near-Eastern taurine breeds (Park et al., 2015;Upadhyay et al., 2016). More aurochs samples from different parts of Europe and at different time points, however, are needed for further validation of these results.
Based on the archaeological evidences (Martins et al., 2015;Zilhao, 1993) and genomic analysis of DNA sequences of ancient farmers (Hofmanová et al., 2016), at least two migration routes have been suggested to explain the expansion of Neolithic farmers along with their domesticated animals after early domestication of European aurochs in the Near East. Following these early migrations, various instances of gene flow involving non-European cattle into the gene pool of southern European cattle have been suggested. For instance, mtDNA, microsatellite markers, and genome-wide SNP-based studies have shown the presence of African taurine ancestry in Iberian breeds (Beja-Pereira et al., 2003;Cymbron, Freeman, Isabel Malheiro, Vigne, & Bradley, 2005;Decker et al., 2014;Ginja, Telo Da Gama, & Penedo, 2010). On the other hand, several Italian breeds also carry complex non-taurine ancestry. For instance, analysing microsatellite markers, Cymbron et al. (2005) reported the highest frequency of indicine population-associated alleles in Greek and Italian cattle breeds, while Decker et al. (2014) reported the dual ancestry-African taurine as well as indicine-in three Italian cattle breeds, namely Chianina, Romagnola and Marchigiana. However, the origin of such dual ancestry remains unclear. Moreover, it is also not known whether other Balkan and Italian (BAI) cattle breeds, such as Busha and Maremmana, also carry the similar dual ancestry. Further, the genetic admixture pattern in southern European cattle has mostly been investigated using either mitochondrial DNA (Di Lorenzo et al., 2018;Pellecchia et al., 2007) or microsatellite markers (D'Andrea et al., 2011). Conversely, genome-wide high-density SNP markers have scarcely been used for detailed characterization of the non-European ancestry in southern European cattle.
The European indigenous native cattle breeds are valuable genetic resources as they are well adapted to local environments. For instance, Maremmana became well adapted to the hot and humid environment of Tuscan Maremma plain, once wetlands where malaria was endemic. Further, it has been postulated that some local breeds like Busha might have conserved an abundance of rare alleles because of their large effective population sizes (Medugorac et al., 2009). Indeed, conservation analyses have prioritized some local European cattle for conservation, namely from Iberia (Canon et al., 2001;Ginja et al., 2013). Additionally, in some cases, the certified products obtained from local breeds provide an additional value that distinguishes them from non-native breeds (Di Trana et al., 2015).
Furthermore, local breeds are also attached to several traditions of cultural heritage. The rapid decline of local breeds, however, remains a major concern. It has been estimated that of the 640 global cattle breeds with known "risk status," 171 breeds can be classified under "at risk" category while 184 breeds are already extinct (FAO, 2015).
In our previous study, we also reported the recent reduction in effective population size for several southern Europe breeds such as Mirandesa and Maltese (Upadhyay et al., 2016). Therefore, a comprehensive understanding of the status of current genetic diversity and demographic processes driving these changes will have a large impact on their ongoing conservation efforts.
In this study, genome-wide SNP data were generated using two different approaches: BovineHD SNP array genotyping and whole-genome sequencing (WGS). At first, we used the genotypes obtained from WGS of European cattle to assess genetic heterozygosity and change in recent demography, followed by identification and assessment of non-European ancestry in southern European cattle (Iberian, and BAI cattle) using BovineHD SNP array data. Library construction was carried out with 0.5-3 µg of genomic DNA following the Illumina library prepping protocols (Illumina Inc.).
All the 19 individuals were paired-end re-sequenced with the Illumina sequencing technology (Illumina Inc.). We also obtained additional 18 WGS data (15 raw sequenced data and three WGS alignment) of several commercial and traditional cattle from previous studies (Bickhart et al., 2016;Daetwyler et al., 2014;Kim et al., 2017;Murgiano et al., 2014). All the detailed sample information is given in Supporting Information Table S1. To perform the quality-based trimming on each fastq file, sickle (Joshi and Fass et al., 2011) was run with default settings except for the length threshold of 50 bp.
Following this trimming, BWA-mem (Li, 2013) algorithm was used to align the quality-trimmed fastq files against the bovine reference genome build UMD 3.1. After the alignment, duplicate reads were removed from the bam files using "Samtools rmdup" (Li et al., 2009).
Multi-sample variant calling was carried out using freebayes (Garrison & Marth, 2012) with the default settings except for the parameters minimum base quality (--min-base-quality 20) and haplotype length (--haplotype-length 0). Under the default parameters, freebayes only considers the alternate allele as a SNP if it is covered by at least two reads or present in at least 0.2 fractions of the total reads aligned at a position in at least one sample. Following this early round of SNP calling, vcftools (Danecek et al., 2011) was used to discard variants using these criteria: (a) indel positions, (b) SNP variants with more than two alleles (c) SNP variants with genotyping quality less than thirty and (d) SNP with a minimum depth less than four and a maximum depth greater than thirty (avoid erroneous genotypes due to CNV). The concordance between genotypes as called from the whole-genome sequencing data set and BovineHD SNP array data set was examined using seven samples for which both genotype sets were available.

| Estimation of heterozygosity using wholegenome sequences
The genetic heterozygosity was estimated in each individual wholegenome sequence using mlRho (Haubold, Pfaffelhuber, & Lynach, 2010). The method implemented in mlRho co-estimates the population mutation rate (Ɵ) and sequencing error rate (є) using maximumlikelihood approach. If the value of Ɵ is small, the estimate of theta approximately reflects heterozygosity under an infinite allele model.
For this analysis, we used UMD 3.1 aligned data with mapping quality >15, base quality >25 and sites with the depth between 4 and 30.

| Assessment of recent demographic change using runs of homozygosity (ROH) analysis
Runs of homozygosity (ROHs) were extracted from whole-genome sequence data following the procedure implemented as in Bosse et al. (2012), using sliding windows approach of 10 kbps. We defined an ROH as a genomic region of at least 20 kbps where the number of heterozygous SNPs per bin (or SNP count) was less than 0.25 times the whole-genome nucleotide diversity, and at least 20 consecutive bins showed a total SNP average lower than the total genomic average. Local assembly or alignment errors were minimized by relaxing the threshold for individual bins within a candidate homozygous stretch, allowing the number of SNPs per bin to be maximum twice the genomic average and the average SNP count within the candidate ROH to not exceed 2/3 the genomic average. Both wholegenome nucleotide diversity and nucleotide diversity within a candidate ROH were estimated from well-covered sites only, which were defined by a depth of coverage between 4× and 30×.

| BovineHD genotyping array data and filtering
BovineHD SNP genotyping array data of various European, African and Indian cattle as published in the previous studies (Bahbahani, Tijjani, Mukasa, & Wragg, 2017;Barbato et al., unpublished;Upadhyay et al., 2016) were obtained and merged for the top alleles using PLINK v1.07 (Purcell et al., 2007). For the present study, several additional animals of various Italian and Iberian cattle breeds were genotyped using BovineHD SNP array. The merged BovineHD genotyping array data were filtered using PLINK (Purcell et al., 2007) to keep the animals with more than 90% of genotypes called (--mind 0.1), SNPs that were present across at least 95% of the samples (-geno 0.05), and SNP with minor allele frequency ≥1% (--maf 0.01).
The complete total BovineHD genotyping data set consisted of about 670 k SNPs and 358 samples (Supporting Information Table   S2).

| Population admixture using the unlinked SNPs
Genetic admixture patterns of southern European cattle breeds in relation to non-European taurine and zebu were characterized using (a) model-based clustering as implemented in ADMIXTURE (Alexander, Novembre, & Lange, 2009), (b) treemix analysis (Pickrell & Pritchard, 2012) and (c) three-population test (Reich, Thangaraj, Patterson, Price, & Singh, 2009). The ADMIXTURE analysis was carried out with 1,000 bootstrap replicates for population cluster (K) values from 2 to 6. Prior to the ADMIXTURE analysis, a linkage disequilibrium (LD) pruning was performed, to reduce the overall pairwise LD to <0.10. We used the python package pong to generate the figures based on ADMIXURE result (Behr, Liu, Liu-Fang, Nakka, & Ramachandran, 2016). Treemix analysis (Pickrell & Pritchard, 2012) was carried out to investigate the migration events of zebu and African cattle during the domestication history of European taurine. In brief, we followed the procedure of Decker et al. (2014): first, we generated maximum-likelihood-based phylogenetic tree of all cattle populations, and iteratively, we added one migration edge to the previously generated graph with "m" migration edge. We rooted the graphs with yak (as an outgroup), used blocks of 1,000 SNPs and applied the "-se" option to estimate standard errors of migration proportions. For this analysis, the scaffolds of yak genome assembly (Hu et al., 2012;Qiu et al., 2012) were aligned to the bovine UMD 3.1 assembly and processed as described in Upadhyay et al. (2016).
Moreover, the SNP genotyping data of the British aurochs (Park et al., 2015) were also used in this treemix analysis. The treemix analysis was run five separate times, each time using different seeds, to assess the consistency of migration edges across different runs.
Additionally, the f3 tests, which consider the correlation of allele frequencies across the genome-wide markers, were also carried out to provide support to admixture analysis. The f3 tests with Z-score less than −3.0 were considered as significant. The three-population tests were carried out using ADMIXTOOLS version 4.1 (Patterson et al., 2012).

| Phasing
We phased genotypes of each autosomal chromosome separately using Beagle 4.1 (Browning & Browning, 2007) by setting all the parameters as default. The recombination map of cattle was used from the previous study (Ma et al., 2015). This recombination map comprises of 59,309 SNP markers for 29 autosomes with an average distance of 0.043 cM in males and 0.039 cM in females. Our data, however, consisted of ~670 K SNPs, and hence, for an SNP with unknown recombination rate in our data set, we assign it the value of nearest SNP with a known location in recombination map, thus keeping the overall genetic distance between markers same as that of the consecutive markers in original recombination map (Ma et al., 2015).

| ChromoPainterv2 to infer the coancestry matrix
To infer population admixture using the phased data, Li and Stephens (2003) algorithm as implemented in ChromoPainterv2 (Lawson, Hellenthal, Myers, & Falush, 2012) was used. The underlying algorithm takes into consideration LD and underlying recombination process along the markers and reconstructs each haplotype as a recipient of a series of genetic chunks from all the other "donor" haplotypes. ChromoPainterv2 first calculates nuisance parameters, n (like effective population size) and M (population mutation rate), to implement them in a probabilistic model that calculates copying probability of genetic chunks in a recipient haplotype given the other donor haplotypes. We used SNPs located on chromosomes 1, 2, 7 and 12 to infer the "n" and "M" using 10 iterations of expectation maximization (EM) algorithms. The obtained values of "n" and "M" were 307 and 0.0012, respectively. Finally, these inferred values were fixed in the algorithm to obtain the ChromoPainter coancestry matrix (count matrix as well as length matrix) that measures the haplotype sharing among the samples across all the chromosomes.

| FineSTRUCTURE algorithm to cluster the samples
Once we obtained the count of shared haplotypes between different samples, we used the FineSTRUCTURE to cluster the samples into genetically homogeneous groups. Following Leslie et al. (2015), FineSTRUCTURE was run for 2 million Markov chain Monte Carlo (MCMC) iterations with the initial 1 million iterations discarded as "burn-in", and following these "burn-in," sampling from the posterior distribution was carried out at every 10,000 iterations. Later, as recommended in Lawson et al. (2012), we ran 10,000 "hill climbing" iterations on the MCMC iteration with the highest posterior probability to get the final cluster assignment.

| GLOBETROTTER to estimate the admixture proportion
The multiple linear regression model as implemented in the GLOBETROTTER algorithm was used to assess the ancestral makeup of BAI cattle breeds in terms of ancestry contribution from various taurine and zebu clusters. In brief, we model the genome of each BAI cattle breed as a linear mixture of the taurine and zebu donor populations using the method described in Leslie et al. (2015). To estimate the standard error in these ancestry proportions, we applied delete-one chromosome jackknife approach (Busing, Meijer, & Leeden, 1999) as described in Montinaro et al. (2015).
As the estimates of admixture dates and ancestry proportions depend on the true underlying recombination rate between any two consecutive SNPs, we interpolated true average recombination rate based on the cattle recombination map constructed by Ma et al. (2015).

| Profile of heterozygosity and runs of homozygosity
Individual genomes of 19 southern Eastern European cattle were sequenced and genotyped along with 18 additional whole-genome sequences downloaded from NCBI SRA (detailed information in Supporting Information Table S1). These additional sequences were comprised of individuals from Indian zebu (two Gir), African zebu  Table S1). The alignment was performed against UMD3.1 using bwa-mem, and the alignment rate was greater than 98% for all the samples. The total number of SNPs identified after performing quality filtering in bam file and post-SNP calling was greater than ~30 million. The genotyping concordance (Supporting Information   Table S1) between SNPs genotyped using the BovineHD SNP array and SNPs genotyped using whole-genome sequencing was approximately 94% for all seven samples, probably indicating the low proportion of false positive SNPs in the data set.
We used two different approaches to estimate heterozygosity from whole-genome sequences of cattle. Both the approaches-population mutation rate (Ɵ) and nucleotide diversity calculated as an average number of heterozygous sites in a 10kbp window-indicated relatively low heterozygosity in western European cattle, while several southern European individuals displayed the levels of heterozygosity comparable to African taurine ( Figure 1). All Maremmana (MA) individuals consistently displayed a high value of heterozygosity in both the approaches (Figure 1a,b).
The African zebu cattle displayed the highest values of heterozygosity, probably due to the admixed nature of their genome (Bahbahani et al., 2017).
The number and total length of ROHs in the genome did not vary sharply among cattle from different regions (Figure 2b). In concordance with the previously described heterozygosity estimates,  Table S1 for the breed abbreviations of diverse ancestral population. We note that the ROH profile ( Figure 2b) of Busha, which is characterized by the low number and low cumulative length-might also be a consequence of their low genome coverage (~2-4×).

| Investigation of genetic admixture using unlinked SNPs
To investigate the presence of indicine and African cattle ancestry among southern European cattle, admixture analysis was initially run To supplement the admixture analysis, the f3 tests ( Figure 4) were also carried out to determine whether any southern European cattle breed could be modelled as an admixture between European taurine and other non-European cattle. The f3 tests, which were performed with British, Dutch or Iberian breeds as one of the reference population and any zebu breeds as another reference population, resulted in significant Z-scores for the target population of Busha shown that, on the whole, f3 tests are unaffected by SNP ascertainment bias, they also indicated that differential population frequency, just by chance, could also lead to false positive signals. To test this hypothesis, we performed three-population tests using two sets of genotypes obtained from WGS data: (a) genotypes identified from  Table S1 for the breed abbreviations aligning the cattle sequences against the taurine reference (UMD 3.1) and (b) genotypes identified from aligning the cattle sequences against indicine reference (Bos_indicus_1.0). Interestingly, across both genotype sets, only a Podolica sample generated significant negative Z-scores for f3 tests with British, Dutch or Iberian cattle as one and Gir as another reference population (Figure 4d).
To investigate the cattle phylogeny and relationship across Indicine, African and taurine lineages, the phylogenetic tree was constructed by applying the maximum-likelihood (ML) algorithm as implemented in the software treemix. The ML-based phylogenetic tree ( Figure 5) topology perfectly captures known relationships among taurine and zebu cattle populations. Using Yak as an outgroup, the tree displays the first split between taurine and zebu cattle which is followed by a split between domestic taurine and wild British aurochs. African taurine appears to be the most divergent among all domestic taurine, while BAI breeds appear to be paraphyletic. All Iberian breeds display short branch lengths and form a single clade.
To investigate the migration events involving indicine and African cattle in southern European cattle, we ran five independent treemix analysis, each initialized with different random seeds. Across all the runs, the graph model without any migration edges explained about 98.73% of the total variance (Supporting Information Figure S1) in relatedness between populations meaning that adding migration F I G U R E 3 Unsupervised hierarchical clustering (a) showing results for an inferred number of clusters, K varying from 2 to 6. Linkage disequilibrium decay analysis (b) comparing central Italian cattle breeds against commercial cattle breeds. Refer to Supporting Information Table S2 for the breed abbreviations F I G U R E 4 Three-population tests with Busha (a), Pajuna (b), Marchigiana (c) and Podolica (d) as target populations. In the case of Podolica, f3 tests were performed on whole-genome sequencing data, while for the remaining population, f3 tests were performed on SNP array data. The dot shows f3 statistics value, while the horizontal bar shows a plus or minus standard error. Refer to Supporting Information Table S2 for the breed abbreviations events would improve the fit within a graph. Across all the treemix runs (Supporting Information Figure S2), we either observed migration edges between African taurine and Iberian breeds, or the same clade for African taurine and Iberian samples. Interestingly, all BAI breeds displayed inconsistent migration edges across all the different treemix runs (Supporting Information Figure S2). Decker et al. (2014) also reported inconsistent placement of Italian breeds across the different treemix runs, probably as the result of an underlying complex phylogeny.

| Investigation of admixture pattern using phased SNPs
The pattern of haplotype sharing and in-depth analysis of the complex admixture pattern of southern European cattle were investigated by ChromoPainter and fineSTRUCTURE. The pattern that emerged from ChromoPainter coancestry matrix ( Figure 6 and Supporting Information Figure S4 To account for the drift effect, we re-run the ChromoPainter analysis with Italian breeds only allowed to receive the ancestry from the non-Italian cattle donor groups (four clusters) assigned based on the fineStructure-inferred tree. Also, we did not consider EAZ as donor population because it is the cross-bred of African taurine and indicine zebu. With this analysis, we specifically wanted to compare the ancestry profile of different BAI breeds. In particular, we note that if introgression events involving non-European cattle ancestry are ancient and split between Italian cattle is relatively recent, then all Italian cattle would display similar ancestry profile. The ancestry profile of Italian cattle displays similar zebu and African taurine ancestry (Figure 7). The zebu and African taurine ancestry (Supporting Information Table S3) in these cattle breeds varied between approximately 9%-12.5% (except BU02) and 8%-10%, respectively.

| Genetic diversity and change in recent demography
The average autosomal heterozygosity estimated using mlRho did not sharply contrast the cattle from different regions except that, on average, zebu and African cattle showed greater heterozygosity compared to the European cattle ( Figure 1). However, differences in the estimates between different breeds were clearly visible. For instance, all Maremmana (MA), Romagnola (RO) and Podolica (PO) individuals showed greater heterozygosity compared to all western European cattle. In fact, all the estimates for Maremmana (MA) were either equal to or more than the previously reported estimates for European commercial European cattle that ranged from 1.27 × 10 −3 to 1.97 × 10 −3 (Gautier et al., 2016). As intensive artificial selection or/and genetic isolation usually led to a reduction in effective F I G U R E 5 Maximum-likelihood-based phylogenetic tree. Scale bar shows 10 times the average standard error of the estimated entries in the sample covariance matrix. Refer to Supporting Information Table S2 for (Figures 1 and 2). However, we also note that LD decay analysis using SNP array markers indicates a recent effective population size in Romagnola (ROMFH) that is comparable to that of the commercial cattle ( Figure 2). This result can be attributed to the strong decline in the effective population size of Romagnola.
The fact that Maltese (MT) displayed large between the individual variations in theta as well as in ROH profile indicates that genetic substructure exists in a population (Figure 2). This genetic substructure may have been formed as a result of the varying degree of Chianina admixture in a population (Lancioni et al., 2016). Also, Maltese (MT) individuals display a large proportion of genome under long ROHs indicating a strong recent bottleneck event (Figure 2b).
F I G U R E 6 Clustering of individuals based on fineStructure algorithm. Note that breeds are coloured according to their geographic origin. The intensity of colour indicates shared haplotypic segments (values in terms of centiMorgan) based on chunklength coancestry matrix generated by ChromoPainter algorithm. The number beside the clusters indicates (1) Zebu cattle, (2) African cattle (N'Dama and EAZ), (3) southern European cattle (Iberian and Italian) and (4) west European cattle (commercial and British cattle). Note that African cattle clusters with European when performed the same analysis but with a smaller number of zebu and taurine samples (Supporting Information Figure S3). Refer to Supporting Information Table S2 for the breed abbreviations F I G U R E 7 Ancestry proportion of different donor populations in the genome of Italian and Balkan cattle breeds. INDZEB: Indian Zebu; AFRTAU: African taurine; SOUTAU: southern European taurine (without Italian cattle breeds); WESTAU: western European taurine. Note that Busha individuals come from two different subpopulations and hence treated separately in this analysis. Refer to Supporting Information Table S2 for the breed abbreviations This result is in good agreement with our previous study, where we observed a similar long ROH pattern using BovineHD SNP array (Upadhyay et al., 2016).
The nucleotide diversity outside ROH is a good indicator of ancestral haplotype diversity as it reflects the haplotype variations that remained weakly affected by recent consanguinity (Bosse et al., 2012). Our analysis of nucleotide diversity outside ROH (Figure 2a) consistently resulted in greater values for Italian individuals compared to the rest of European samples which indicates that either the ancestral founder population for these breeds was large or the ancestral population received gene flow from some genetically distinct population. Both these scenarios are likely to explain these results as Italy is much closer to the centre of domestication compared to western Europe, and hence, it could be hypothesized that serial founder effect was severe in western Europe (Scheu et al., 2015)  Hence, the hypothesis of introgression from diverse cattle population in the genepool of African taurine cannot be ruled out.

| Characterizing non-European cattle ancestry
Our results from linked and unlinked SNP-based approaches, to detect admixture pattern, not only confirmed the previous reports but also extended the support for the complex origin of southern a single donor population). We note that it is not surprising for BAI cattle to display shared ancestry with African taurine as they both likely have the same centre of domestication. However, shared ancestry between BAI and zebu due to divergence is less likely to occur, since both the lineages are estimated to have diverged about ~250,000 YBP. To test the two hypotheses related to introgression, we applied the algorithm implemented in a tool, GLOBETROTTER (Hellenthal et al., 2014). In the absence of true admixing population, GLOBETROTTER can identify the most suitable proxy for the true donor (aka "surrogates") among all the populations in the data set. The algorithm, however, did not identify clear admixture signal involving any surrogate population in our data set indicating that either the admixture event is very old, or the admixture events were recurring and involved multiple donor populations.
Indeed, BAI cattle breeds are inhabiting in their respective regions since ancient times. Therefore, it is likely that these admixture events are too old to date using the GLOBETROTTER approach as the simulated data have shown that GLOBETROTTER can only estimate the admixture events accurately if they have occurred in about last ~170 generations (Hellenthal et al., 2014). In fact, analysing microsatellite markers from a thousand-year-old bone sample, Gargani et al. (2015) reported that, compared to Iberian and western European cattle, Chianina and Romagnola were genetically closer to the thousand-year-old ancient bovine sample.
Perhaps, genotyping ancient cattle bone samples from Italy will shed more light on the age and origin of the complex ancestry present in modern central Italian cattle.
Applying three-population tests on SNPs identified from whole-genome sequencing data, we provide clear evidence of the cross-bred (taurine and zebu) ancestry in an Italian Podolica sample. However, we note that this zebu ancestry might have not necessarily come from the Indian zebu itself; instead, it could have come from some zebu-related population not sampled in our data set. Because Podolica is believed to have originated in the Podolian region of Eastern Europe, it can be hypothesized that this zebu ancestry is either of the Eastern European or western Asian origins. These results confirmed the previous hypothesis of crossbred origin for Podolica based on the similarity of the β-globin variant between Podolica and Zebu (Pieragostini, Scaloni, Rullo, & Di Luccia, 2000).
Our admixture (Figure 3a) and ChromoPainter ( Figure 6) results indicated shared ancestry between Iberian cattle (PA, SA and MN), Limousin (LMS) and African taurine (ND and MUT) which is in concordance with previous reports (Beja-Pereira et al., 2003;Cymbron et al., 2005;Decker et al., 2014;Ginja, Telo Da Gama et al., 2010). Moreover, we also observed shared ancestry between EAZ and southern European cattle (Figures 3a and   6); however, as EAZ itself is a cross-bred of African taurine and zebu, it is difficult to interpret this shared ancestry. While the clustering of LMS with Iberian cattle probably reflects the use of this commercial breed to upgrade local Iberian cattle, namely Pajuna (Martínez et al., 2012), previous studies have also reported the presence of EAZspecific microsatellite alleles in Iberian breeds such as Mertolenga and Pajuna (Beja-Pereira et al., 2003;Martín-Burriel et al., 2011).
The fineStructure-inferred tree ( Figure 6) also clusters Marchigiana (MCGFH) with Chianina (CH) which is consistent with the known history of Marchigiana as a cross-bred with a high fraction of Chianinarelated ancestry.
The three-population tests involving British cattle as one of the reference population resulted in relatively greater significant z-values for Busha and Podolica (Figure 4a,d). This could mean that British cattle is the most suitable surrogate for European taurine in our data set. We also note that the British cattle displayed the least average heterozygosity estimates (Figures 1 and 2). Taken together both these results, it is safe to propose that British cattle is the least admixed population among all European cattle populations in the data set, and thus, they might have preserved a large number of European taurine-specific variants. Interestingly, this hypothesis may also partially explain the previous reports of British cattle sharing a high frequency of derived alleles with ancient aurochs sample, while Italian cattle reportedly displayed the least frequency of aurochs specific alleles (Park et al., 2015;Upadhyay et al., 2016).
Taken together, our results in this study provided a holistic view on non-European cattle ancestry in southern European cattle. The fine-scale dissection of the demographic history and evolutionary events that have shaped the genome of modern southern European cattle, however, still demand improved methods and data.

DATA A R C H I V I N G S TAT E M E N T S
The whole-genome sequencing data of 11 of the 19 newly sequenced animals are available from NCBI under the Bioproject ID PRJNA514237. The SRA id of these 11 whole-genome sequences is SRR8426534, SRR8426535, SRR842636, SRR8426537, SRR8426538, SRR84265359, SRR8426540, SRR8426541, SRR8426542, SRR8426543, SRR8426544. The whole-genome sequencing data of remaining 8 animals are available to interested researchers upon the request.

ACK N OWLED G EM ENTS
The authors would like to thank Bert Dibbits, Wageningen University, and Research, Animal Breeding and Genomics, for DNA extraction.
The authors also would like to thank Garrett Hellenthal for provid-