Demography and rapid local adaptation shape Creole cattle genome diversity in the tropics

Abstract The introduction of Iberian cattle in the Americas after Columbus’ arrival imposed high selection pressures on a limited number of animals over a brief period of time. Knowledge of the genomic regions selected during this process may help in enhancing climatic resilience and sustainable animal production. We first determined taurine and indicine contributions to the genomic structure of modern Creole cattle. Second, we inferred their demographic history using approximate Bayesian computation (ABC), linkage disequilibrium (LD) and N e Slope (NeS) analysis. Third, we performed whole genome scans for selection signatures based on cross‐population extended haplotype homozygosity (XP‐EHH) and population differentiation (FST) to disentangle the genetic mechanisms involved in adaptation and phenotypic change by a rapid and major environmental transition. To tackle these questions, we combined SNP array data (~54,000 SNPs) in Creole breeds with their modern putative Iberian ancestors. Reconstruction of the population history of Creoles from the end of the 15th century indicated a major demographic expansion until the introduction of zebu and commercial breeds into the Americas ~180 years ago, coinciding with a drastic N e contraction. NeS analysis provided insights into short‐term complexity in population change and depicted a decrease/expansion episode at the end of the ABC‐inferred expansion, as well as several additional fluctuations in N e with the attainment of the current small N e only towards the end of the 20th century. Selection signatures for tropical adaptation pinpointed the thermoregulatory slick hair coat region, identifying a new candidate gene (GDNF), as well as novel candidate regions involved in immune function, behavioural processes, iron metabolism and adaptation to new feeding conditions. The outcomes from this study will help in future‐proofing farm animal genetic resources (FAnGR) by providing molecular tools that allow selection for improved cattle performance, resilience and welfare under climate change.


| INTRODUC TI ON
Until recently, selection has occurred at a relatively slow rate in cattle and has been largely passive, driven by adaptations to diseases, dietary variation and local climatic patterns (Russell, 2007).
After the domestication of cattle ~7,000-10,000 years ago (YA; Bruford, Bradley, & Luikart, 2003), farmers started to artificially breed animals with preferred phenotypes, although it was not until ~200 YA that European farmers began the formation of closed herds which developed into modern breeds (Taberlet, Coissac, Pansu, & Pompanon, 2011). However, another type of human endeavour has  (Rodero, Rodero, & Delgado, 1992). Introductions of northern European cattle into North America were also reported between 1608 and 1640 (Felius et al., 2014). After three centuries featuring the predominance of Creole cattle, population declines started with the introduction of other cattle around the middle of the 19th century, better suited to more intensive production and breeding systems (Willham, 1982). The introduction of European breeds (poorly adapted to the tropics but normally highly productive) and zebus (highly adapted to the tropics, but normally not as productive) resulted in the substitution of Creoles by a series of less adapted, admixed or commercial populations, displacing them into marginal areas.
Reconstructing the demographic history of Creole populations is therefore key to disentangling American livestock colonization dynamics and can contribute to a better understanding of the genomic signatures of breed evolution. Additionally, ongoing climate change is likely to lead to reductions in animal production and welfare in the future, which makes an understanding of the genomic regions selected under the major and rapid environmental changes imposed on Creole cattle, a useful tool for enhancing resilience and sustainable production in the short term. Therefore, the aims of this study were first to determine the contributions of different taurine and indicine ancestors on the genomic make-up of Creole cattle. Our second aim was to infer the demographic history of Creole cattle populations by combining different approaches to investigate trends in effective populations size (N e ): approximate Bayesian computation (ABC; Wegmann, Leuenberger, Neuenschwander, & Excoffier, 2010); linkage disequilibrium (LD) structure (SNeP; Barbato, Orozco-terWengel, Tapio, & Bruford, 2015); and N e Slope analysis (NeS). Finally, our third aim was to perform a whole genome scan for the signatures of selection based on cross-population extended haplotype homozygosity tests (XP-EHH; Sabeti et al., 2007) and population differentiation (F ST ; Wright, 1949). To tackle these questions, we combined SNP array data in modern Creole cattle with modern day samples from breeds comprising their putative Iberian ancestors. By identifying genomic regions responding to these selection pressures, we aimed to provide valuable tools for improving cattle resilience, performance and welfare under climate change.

| Cattle populations and SNP array data
The data set comprised SNP array data from 412 individuals genotyped using the Illumina BovineSNP50 array versions 1 and 2, and the Bovine High Density BeadChip (Bovine Hapmap et al., 2009;Decker et al., 2009Decker et al., , 2014Gautier, Laloë, & Moazami-Goudarzi, 2010;Upadhyay et al., 2017;Supporting Information Table S1). Twentynine animals were newly genotyped using the Illumina BovineSNP50 version 2 and Geneseek Genomic Profiler Bovine 150k (Supporting Information Table S1). We included six Creole populations adapted either to tropical humid (three Colombian breeds: Costeño con  (Baoule, Lagune, N'Dama, Somba;Miretti, Dunner, Naves, Contel, & Ferro, 2004), representatives of commercial European stock introduced to the Americas around the middle of the 19th century (Angus, Red Poll, Holstein, Jersey, Shorthorn) and potential indicine introgression into Creole cattle from tropical areas (Brahman, Nelore, Gir). SNP array data were merged, and those SNPs detected as triallelic were flipped using PLINK 1.90 (Chang et al., 2015;Purcell et al., 2007). The data set was then phased with Beagle 3.3.2 (Browning & Browning, 2007) and the genomic positions for each SNP mapped to the UMD3.1 bovine assembly (RefSeq:GCF_000003055.5). Only autosomal SNPs with a minor allele frequency (MAF) above 1% and a call rate of at least 90% across all breeds were retained for downstream analyses, leaving 33,342 SNPs.

| Estimation of autosomal ancestry proportions and population divergence in Creole cattle
To determine the relative contribution of different potential taurine and indicine ancestors on the genomic structure of Creole cattle, population admixture analysis was carried out using the software Admixture v1.3 (Alexander, Novembre, & Lange, 2009) with 2,000 bootstraps for eight population clusters (K), corresponding to the African (two clusters), Iberian, Angus, Shorthorn, Holstein and Jersey taurine ancestries, as well as the Asian zebu ancestry. For this analysis, autosomal SNP array data were further pruned for LD higher than 0.1 using a sliding window approach of 50 SNPs and a step size of 10 SNPs. The results were graphically displayed using the POPHELPER R package (Francis, 2017).

| Demographic analysis
The population history of Creole cattle was reconstructed from the late 15th century to the present day using approximate Bayesian computation (ABC) as in Pitt et al. (2018). Briefly, a subset of the data was divided into four clusters: Col including all Colombian breeds (Costeño con Cuernos, Romosinuano, San Martinero), Senepol, Texas Longhorn and Iber (for all Iberian breeds). Eight alternative demographic histories were modelled based on historical records, results from Admixture, MDS and neighbour-net analyses, and prior N e estimates obtained with SNeP. The scenarios in- Fastsimcoal2 (Excoffier, Dupanloup, Huerta-Sánchez, Sousa, & Foll, 2013;Excoffier & Foll, 2011) using a pipeline implemented in ABCtoolbox (Wegmann et al., 2010), with a required computation time of eight days per scenario splitting simulations in ~50 parallel runs. Seventeen summary statistics were calculated in Arlsumstat (Excoffier & Lischer, 2010) for simulated and observed data (Supporting Information Table S2). A Spearman's rank correlation was calculated between each pair of summary statistics in R, and statistics with consistently high negative or positive correlation were removed (Supporting Information Figure S1, Table S2).
ABCtoolbox was used to perform rejection sampling on the simulated data set, retaining the 5,000 (0.5%) simulations that closest fit to the observed data for each of the eight scenarios. Marginal density (MD) and posterior probability P-values (i.e., the proportion of simulations that have a smaller or equal likelihood to the observed data) were calculated from the retained simulations after a postsampling regression adjustment using a general linear model. Bayes factors (BF) were calculated between scenarios by taking the quotient of the MD from two scenarios to choose the best modelled scenario fitting our data (i.e., if BF > 3, the alternative scenario can be rejected- Wegmann et al., 2010-).
To examine the most recent changes in N e , the software SNeP v1.11 (Barbato et al., 2015) was used to estimate the demographic history for each population by the relationship between LD and N e up until approximately 13 generations in the past. Default options were used apart from sample size correction for unphased genotypes, correction to account for mutation and Sved and Feldman's (1973) mutation rate modifier. To identify subtle changes in the inferred N e curve that might be diagnostic of changes in N e not visually explicit when observed in the N e plot, a "N e Slope analysis" (NeS) was used to investigate the rate and directionality of N e changes occurring in recent generations (Supporting Information Figure S2). The slope of each segment linking pairs of neighbouring N e estimates was first calculated and then normalized using the median of the two most proximal past N e slope values as in NeS n = (S n −X n )(1 +X n ) −1 where S n is the slope of the n th pair of neighbouring N e estimates, and X n = med{S n ,S n + 1 ,S n + 2 } .

| Selection signatures
We scanned for recently generated selection signatures to

| Ancestry estimation at candidate regions
Local Ancestry in adMixed Populations (LAMP) version 2.5 (Pasaniuc, Sankararaman, Kimmel, & Halperin, 2009)   found in Senepol are in accordance with the results obtained by Flori et al. (2012) and Huson et al. (2014), and they argue against the reporting of direct incorporation of N'Dama into Senepol breeding (Miretti et al., 2004). Although these authors attributed all European taurine contribution to Red Poll ancestry, our results strongly imply a major Iberian origin (68%) with a much lower ancestral contribution from commercial breeds (14%), including Red Poll. Despite the claimed admixture of Romosinuano with polled British breeds to incorporate polledness into its phenotype (Huson et al., 2014), contribution from European commercial breeds (including Red Poll samples) was inferred to be low and equal to that of Costeño con Cuernos (7%), from which the Romosinuano was developed. Although theoretically Florida Cracker has not been crossed with European commercial breeds (Ekarius, 2008), this ancestry represents 36% of its genomic pool. Finally, despite indicine introgression having been described in the Texas Longhorn (Decker et al., 2014), the values detected here are within the mean range for all Creole cattle populations (8%).
These results illustrate the influence of taurine and indicine ancestry that may underlie some of the demographic patterns and selection signatures found in Creole populations.

| Demographic history
ABC modelling was used to explore the recent demographic history of Creole cattle from the arrival of the first individuals to the Americas at the end of the 15th century to present. Thirteen summary statistics were retained after removing correlated measurements (Supporting Information Figure S1,  Figure S3). Higher N e values were retrieved for the Colombian (755) and Iberian (2,577) breeds derived from the grouping of three and eight populations, respectively, which overestimated diversity values and therefore provide a rough estimation of effective population sizes of around 252 (Colombia) and 322 (Iberia) genomes per breed in each group. These events are in close agreement with the known history of foundation, expansion and later contraction of cattle of Iberian origin in the Americas (de Alba, 1987;Eusebi, Cortés, Dunner, & Cañón, 2017;Rodero et al., 1992;Villalobos Cortés et al., 2009;Willham, 1982), and with the general trend displayed by populations that successfully colonize new habitats, undergoing a bottleneck followed by rapid growth usually due to lack of competition but here more likely due to habitat modification (see Gray et al., (Gray et al., 2014), which may explain the discrepancy we found between the colonization time t3 (635 YA) and known dates such as the arrival of cattle to the Americas after 1492 (524 YA; although within the 50th quartile range of 460-650 years). However, the drastic N e reduction from t1 (180 YA) to present closely correlates with the introduction of zebu and commercial cattle breeds to the Americas, starting around the middle of the 19th century and causing the gradual replacement of Creole populations that has led to their small current effective population sizes (de Alba, 1987;Willham, 1982). Despite the influence of European commercial breeds and zebu cattle de-

| Signatures of selection
We applied two methodologies that analyse different patterns of genetic variation, mainly related to evolutionary timescale, to investigate selection pressures enforced by the new tropical environment in six Creole populations, five of which are adapted to humid and hot conditions and one to dry and hot conditions. We used F ST , better suited to detect signals in the more distant past (Sabeti et al., 2006) that might reflect the zebu ancestral component found in Creole populations, and the LD-based XP-EHH method, which provides better resolution for recent selection (Cadzow et al., 2014) and is more suitable for disentangling the differences between Creole and Iberian populations expanding over the last 500 years.  Cracker, Senepol and Texas Longhorn clusters, respectively (Table 4).
GO analysis using DAVID produced a total of 12 enriched functional clusters (Supporting Information Table S5) and 13 enriched KEGG signalling pathways (Table 5).
Estimation of different ancestries using LAMP allocated slightly different contributions to Iberian, commercial, African and zebu genomic components (Table 4), when compared with the Admixture results (Table 1). Several regions under selection in Creole populations showed strong deviations in ancestry contributions (two standard deviations-SD-above or below the genomewide average, see Table 4), mostly detecting increases in the zebu compo-  (Table 4).
The region in BTA20 shared by Colombian (region #11) and Senepol (region #33) populations showed signals of selection with the XP-EHH analysis and demonstrated a strong increase in zebu ancestry of 38% (more than 6 SD) in Colombian breeds and 45% (almost 4 SD) in Senepol (  (Flori et al., 2012;Huson et al., 2014). Slick hair coat is characterized by sleek, short hair coupled with increased perspiration. The sleek and shiny properties of this coat may reflect solar radiation more efficiently, and the hair coat thickness F I G U R E 7 Manhattan plots of genomewide distribution of selection signatures detected with XP-EHH for Creole clusters when compared to the Iberian ancestral group IB1. Threshold is set at −log 10 (P-XPEHH) = 2 Colombian breeds -IB1 Florida Cracker -IB1

Senepol -IB1
Texas Longhorn -IB1 1 2 3 4 5 6 7 8 9 1 1 13 1 5 17 19 21 24 26 28 F I G U R E 8 Selection signatures in the BTA20 genomic region shared by the Colombian cluster (Costeño con Cuernos, Romosinuano, San Martinero) and the Senepol breed. Plot of −log 10 (P-XPEHH) values (y-axis) around loci (x-axis in Mb). Dots mark significant SNPs QTL in cattle so far and includes genes such as C1QTNF7, related to Trypanosoma cruzi cardiomyopathy (Deng et al., 2013), FBXL5, which controls iron metabolism processes key for the regulation of reactive oxygen species that augment with the exposure of animals to high environmental temperatures (Paital et al., 2016), BST1 that has immune functions facilitating pre-B-cell growth, and CD38 that has pleiotropic functions in T-cell activation (Würsch et al., 2016), social behaviour through its effect on the release of oxytocin (Krol, Monakhov, Lai, Ebstein, & Grossmann, 2015) and cancer. BST1 and CD38 are also implicated in salivary and pancreatic secretion and nicotinate and nicotinamide metabolism pathways (Table 5). The genes in this region represent adaptations to new and challenging environments, including immune function, nervous and behavioural processes that may be key for animals to adapt to new environmental conditions, metabolism, high environmental temperatures and diet.
Although the genes included in the region under selection in BTA05 shared by Florida Cracker (region #12) and Texas Longhorn (region #38) and detected with XP-EHH are mostly uncharacterized novel genes in Ensembl, as well as the antimicrobial agent lysozyme (LYZ) and other genes with no clear role in reproduction, this region has been associated with reproduction traits in Tropical Composite bulls. Concordantly, here we found a substantial increase in zebu (by 13%) and African (by 6%) ancestries in the Texas Longhorn, although this was not found in the Florida Cracker. Another region under selection in two clusters, Colombian (region #9) and Texas Longhorn (region #42), was also detected with XP-EHH methodology and included genes in BTA13 with roles in reproduction (CFAP61), neuroendocrine differentiation (INSM1), cancer (RALGAPA2) or cell cycle (KIZ). This region has been previously associated with QTLs related to production traits in cattle (Table 4) and displayed a strong increase in African ancestry (10%, more than 5 SD) in Texas Longhorn, but again imperceptible in the Colombian cluster.
Apart from these genomic regions under selection in more than one cluster, we detected signatures of selection associated with a variety of traits (Tables 4-5, Supporting Information Table S5). These include regions of the genome enriched for genes involved in immune system activation in response to infectious diseases (tick resistance in the Colombian group and Florida Cracker, tuberculosis susceptibility in Florida Cracker and Senepol, mastitis in the Colombian group and Senepol), or enriched immune pathways in Senepol (cytokine-cytokine receptor interaction, Jak-STAT signalling, Toll-like receptor signalling, natural killer cell-mediated cytotoxicity, osteoclast differentiation, and responses to viral diseases -measles, influenza A, herpes simplex-). In addition, we found regions enriched for genes associated with heat tolerance, including regulation of blood pressure and, importantly, thermoregulation in lactating cows exposed to heat stress in the Florida Cracker (region #24). This region in BTA26 showed a strong increase in zebu ancestry (43%, more than three SD) and was also implicated in temperament, with the SLC18A2 gene involved in the dopamine and serotonin pathways associated with temperament in cows (Garza-Brenner et al., 2017). Phenotypic variation driven by production aims, such as beef or dairy traits, may have had an impact in the genomic areas under selection, highlighted here by the regions detected within QTLs associated with milk and meat production, fatty acid profile, performance, conformation and reproduction.
Finally, we have also validated the signal for the polled locus (Flori et al., 2012;Medugorac et al., 2012) in Senepol (BTA01 region #25), with both XP-EHH and F ST methodologies. This region showed a strong zebu component increase of 47% (almost four SD deviations above the genome mean). None of the previously described polled mutations are located in known coding regions. Within our candidate region, the most significant SNPs peaked around three genes, GART, DNAJC28 and TMEM50B, none of them with a clear role in polledness ontogenesis. The key immune functions displayed by several genes in this region (IFNAR2, IFNGR2, IFNAR1; Table 4), which could be important in responses against tropical diseases and parasite infections, may distort the signal from the polled locus.
Although F ST -and LD-based methodologies are widely used, there are other possible factors apart from selection that may mimic the signals obtained, such as demographic events (e.g., the bottlenecks and expansions detected with the ABC and SNeP analyses; Vitti, Grossman, & Sabeti, 2013). Moreover, the use of SNP array markers may underestimate genetic diversity through ascertainment bias, distorting allele frequencies and derived statistics such as LD (Vitti et al., 2013). Also, selection response for complex traits caused by weak selection at many sites across the genome may leave few or no classical signatures (Kemper, Saxton, Bolormaa, Hayes, & Goddard, 2014), reducing the signal obtained.
However, other studies on cattle adaptation to new environments (Makina et al., 2015;Porto-Neto et al., 2014), including tropical adaptation, reported the slick hair coat and QTLs associated with tick resistance, heat tolerance and reproduction in tropical populations.
In conclusion, we compared modern Creole cattle with modern day samples from breeds comprising their putative Iberian ancestors for the first time to reconstruct their demographic history and search for selection signatures enforced by American environments on a small number of founder animals during a brief period of time. We show that despite strong evidence for rapid genomic adaptation to their new tropical environments (e.g., for slick hair coat genes improving thermotolerance), Creole cattle have recently undergone a major decline and will require genetic conservation measures if they are to continue to thrive. The outcomes from this study will contribute to the design of innovative breeding schemes that will include, apart from traditional performance traits, resilience biomarkers, allowing sustainable production in harsh environments and improving sanitary conditions in farms under the ongoing climate changes.

DATA A R C H I V I N G S TAT E M E N T
The SNP array data of the 29 animals newly genotyped and the synchronized and filtered 33,342 SNPs data set are available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.g4f4790.

CO N FLI C T O F I NTE R E S T
None declared.