Landscape genomic approach to detect selection signatures in locally adapted Brazilian swine genetic groups

Abstract Samples of 191 animals from 18 different Brazilian locally adapted swine genetic groups were genotyped using Illumina Porcine SNP60 BeadChip in order to identify selection signatures related to the monthly variation of Brazilian environmental variables. Using BayeScan software, 71 SNP markers were identified as FST outliers and 60 genotypes (58 markers) were found by Samβada software in 371 logistic models correlated with 112 environmental variables. Five markers were identified in both methods, with a Kappa value of 0.073 (95% CI: 0.011–0.134). The frequency of these markers indicated a clear north–south country division that reflects Brazilian environmental differences in temperature, solar radiation, and precipitation. Global spatial territory correlation for environmental variables corroborates this finding (average Moran's I = 0.89, range from 0.55 to 0.97). The distribution of alleles over the territory was not strongly correlated with the breed/genetic groups. These results are congruent with previous mtDNA studies and should be used to direct germplasm collection for the National gene bank.

Russian Belorussian, Kazakhstan, and Ukraine pig breeds. This adaptation to the environment can be evaluated by genomic analyses of areas of the genome that have been, or still are, under selection (Luikart, England, Tallmon, Jordan, & Taberlet, 2003; Storz, 2005; Vitalis, Gautier, Dawson, & Beaumont, 2014). Such regions can be detected using single nucleotide polymorphisms (SNPs) spread throughout the genome with the population-level FST outlier approach, in which outlier loci are assumed to be signatures of natural selection (Lewontin & Krakauer, 1973; Luikart et al., 2003; Joost et al., 2007; Lotterhos & Whitlock, 2014). Signatures of selection were found by Ottoni et al. (2013) in pigs from archaeological sites, helping to clarify some events of pig domestication in Western Eurasia and the introgression of Asian genes into European pigs through human selection, and enabling the identification of introgression among different breeds. These selection signatures can help us understand the complex relation between adapted swine genetic groups and the environment, as well as the process of adaptation of swine over the Brazilian territory, and help to overcome the challenges of swine management in a country with continental dimensions and different climatic conditions. In a constantly changing world, the identification of those signatures may be the key to promoting more sustainable animal production, improving productivity and welfare, and decreasing sanitary expenses with medication and management (Mirkena et al., 2010; Shabtay, 2015). They can also be used for branding of particular regional products (Herrero-Medrano et al., 2013). In addition, these results might be an auxiliary tool to help enrich National gene banks (Paiva, McManus, & Blackburn, 2016) and conservation programs, as suggested by Nuijten et al. (2016) and Bosse et al. (2015), who show that management strategies to preserve variation in managed populations can benefit from whole-genome, high-density, marker-assisted methods.
The hypothesis of this study is that monthly variation in the Brazilian environment over the years influenced the successful adaptation of swine in the Brazilian territory and left detectable signatures of natural selection, as was seen with Vietnamese (Pham et al., 2014) and American village pigs (Burgos-Paz et al., 2012) and with Chinese sheep (Yuan et al., 2017). Understanding the influence of the environment on the process of allele selection can be useful to improve gains on small farms, preserve genetic variation in herds, and support adaptation to global climate change.
To test this hypothesis, medium-density SNP chip genotypes from locally adapted swine populations, with animals sampled across the main Brazilian regions, were used to identify selection signatures through the FST outlier approach.

| Sampling
The Brazilian territory is divided into five regions (each further divided into states) based on natural, cultural, social, and economic features. Despite the high mobility of swine, free movement of animals between states and regions is restricted by legal and sanitary factors (Classical Swine Fever, African Swine Fever, Foot-and-Mouth disease, Aujeszky's disease). Therefore, to capture high spatial representation of the environment and of the genetic territorial dispersion of the swine breeds over the Brazilian territory, the sample selection (Table 1 and S1) was structured with at least one sample from each Political Region (Figure S1). A total of 191 samples of nonrelated animals from 18 different swine genetic groups (13 locally adapted Brazilian swine genetic groups, four commercial or global breeds, and one group formed by crossbred animals) were randomly selected. All samples used in this experiment are deposited in Embrapa's Gene Bank (http://aleloanimal.cenargen.embrapa.br) located at Embrapa Genetic Resources and Biotechnology Center, Brasilia, DF. The samples from locally adapted Brazilian swine genetic groups were classified in accordance with a phenotypic description suggested by Viana (1956), Germano, Albuquerque, and Castro (2002), and Mariante and Cavalcante (2006).
Helix, Inc., Bozeman, MT, USA 2015). We chose parameter thresholds as reported in the literature (Burgos-Paz et al., 2012; Traspov et al., 2016) that eliminated low-quality SNPs and samples but preserved a maximum number of samples: a minimal individual genotype call rate of 90%, which excluded 11 samples, and a 95% call rate and 0.05% minor allele frequency (MAF) for the markers, which excluded 21,605 SNPs. Additional linkage disequilibrium (LD) pruning was performed using a window size = 50, window increment = 5, and r² threshold = 0.05, which eliminated a further 11,646 SNPs. The final data had 28,860 SNP markers with a density of one SNP per 87.026 kb.
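The call-rate and MAF filters described above can be sketched as follows. This is only a minimal illustration and not the software used in the study: the 0/1/2 genotype coding with `None` for missing calls, the function names, and the toy thresholds in the usage lines are assumptions made for the example, and the LD pruning step is left out.

```python
def call_rate(calls):
    """Fraction of non-missing genotype calls in a sequence."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF of one biallelic marker coded as 0/1/2 allele counts."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    p = sum(observed) / (2 * len(observed))
    return min(p, 1 - p)


def quality_filter(genotypes, min_sample_cr, min_marker_cr, min_maf):
    """Filter samples first, then markers on the retained samples.

    genotypes: list of per-sample lists (rows = animals, columns = SNPs).
    Returns (kept_sample_indices, kept_marker_indices).
    """
    kept_samples = [
        i for i, row in enumerate(genotypes) if call_rate(row) >= min_sample_cr
    ]
    kept_markers = []
    for j in range(len(genotypes[0])):
        column = [genotypes[i][j] for i in kept_samples]
        if (call_rate(column) >= min_marker_cr
                and minor_allele_frequency(column) >= min_maf):
            kept_markers.append(j)
    return kept_samples, kept_markers


# Toy data with illustrative thresholds (not the values used in the study).
toy = [
    [0, 1, 2, 0],
    [1, 1, 2, 0],
    [2, None, 2, 0],
    [None, None, None, 0],
]
samples, markers = quality_filter(toy, 0.5, 0.6, 0.05)
```

Filtering samples before markers mirrors the order in which the exclusions are reported in the text; the reverse order can retain a different set of markers.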

| Signatures of selection and outlier detection
Loci with high or low allelic differentiation in relation to the expected neutrality, from the 28,860 SNPs in final data, were used as an indication of selection (Hoffmann & Willi, 2008) and were tested by two different methodologies of outlier identification.
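To make the notion of allelic differentiation concrete, a per-locus FST can be computed from group allele frequencies. The sketch below uses the simple heterozygosity-based estimator FST = (HT − HS) / HT with sample-size weights; it is an assumption chosen for illustration and is not the Bayesian model implemented in BayeScan. The function name and inputs are hypothetical.

```python
def locus_fst(frequencies, sizes):
    """FST = (HT - HS) / HT for one biallelic locus.

    frequencies: frequency of the reference allele in each genetic group.
    sizes: number of sampled animals in each group (used as weights).
    """
    total = sum(sizes)
    p_bar = sum(p * n for p, n in zip(frequencies, sizes)) / total
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # monomorphic locus: no differentiation to measure
    # Mean expected heterozygosity within groups.
    h_within = sum(
        2 * p * (1 - p) * n for p, n in zip(frequencies, sizes)
    ) / total
    return (h_total - h_within) / h_total
```

A locus fixed for alternative alleles in two groups gives an FST of 1, while identical frequencies give 0; outlier methods ask whether a locus lies further toward either extreme than expected under neutrality.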
BayeScan software V 2.1 (Foll & Gaggiotti, 2008)  to other loci among the MCMC outputs of its simulations (Beaumont & Balding, 2004). The software was set up with 5,000 burn-in iterations, followed by 10,000 iterations with a thinning interval of 10.
Convergence was verified using the CODA package for R (Plummer, Best, Cowles, & Vines, 2006), accepting chains whose z scores fell within the critical values −1.96 < z < +1.96. A second analysis was performed using the software Samβada (Joost et al., 2007; Stucki et al., 2016), which uses logistic regression models to determine the probability of allele presence/absence in a specific environment. The models were considered significant when both the G score and the Wald score were significant at the α = 0.01 threshold with a Bonferroni correction. The G score is a likelihood-ratio statistic: it compares the maximum log-likelihood of the model that includes the independent variable with that of the model without it, and so measures how much the independent variable improves the log-likelihood of the model.
The Wald score tests whether goodness of fit is affected when the independent variable is removed from the model. Using the FREQ procedure (Proc FREQ) of SAS v9.3 (SAS Institute Inc., 2011), the agreement between the two methods was evaluated through the Kappa index. The Kappa index is a measure of interrater agreement between two or more methods: when the observed agreement exceeds chance agreement, kappa is positive, with its magnitude reflecting the strength of agreement. Gene annotations within candidate regions were obtained using the data provided by Ensembl (Cunningham et al., 2015) and NCBI (http://www.ncbi.nlm.nih.gov). To explore the linkage disequilibrium (LD) of the detected selection signatures with other FST outliers and with nearby genes, we calculated the LD for these markers using Plink software (Purcell, 2014).
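The Kappa index described above can be illustrated for two detection methods that each flag or do not flag a marker. The sketch below is a plain Cohen's kappa on a 2 × 2 agreement table, written in Python rather than SAS. The counts in the usage lines are assembled from figures quoted in the text (28,860 markers, 71 and 58 flagged, five shared) purely as an assumed contingency table, so the result need not reproduce the reported value exactly.

```python
def cohen_kappa(both, only_a, only_b, neither):
    """Cohen's kappa for two binary classifications of the same items.

    both:    items flagged by both methods
    only_a:  items flagged by method A only
    only_b:  items flagged by method B only
    neither: items flagged by neither method
    """
    n = both + only_a + only_b + neither
    observed = (both + neither) / n
    a_yes = (both + only_a) / n
    b_yes = (both + only_b) / n
    # Agreement expected by chance from the marginal proportions.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


# Assumed table built from counts quoted in the text.
total, flagged_a, flagged_b, shared = 28860, 71, 58, 5
kappa = cohen_kappa(
    shared,
    flagged_a - shared,
    flagged_b - shared,
    total - flagged_a - flagged_b + shared,
)
```

Because almost all markers are flagged by neither method, raw agreement is close to 1 while kappa stays small; kappa corrects for the agreement that such unbalanced margins produce by chance.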
To measure the degree of spatial association for markers signaled as FST outliers by both methods, the Global spatial autocorrelation

| RESULTS
Molecular variance analysis among states grouped into regions (Table S3) showed 93.35% of the genetic variance was contained within states and only 0.87% among regions. The genetic variance among a group of animals from commercial breeds and a group of locally adapted genetic groups (Table S4) showed individual variance (81.85%) was larger than variance between groups (3.66%) or from individuals within groups (8.27%).
The FIS (Table S5)  for 42% of all models generated by Samβada (Figure S3). Five markers, associated with different environmental conditions (Table 2)  0.97), and reaching close to zero with 15 (fifteen) neighbor windows.
The highest value for Moran′s I was associated with solar radiation in the summer months. The selection signal markers had a high global spatial correlation between 5 and 10 neighbors and showed a rapid decrease to zero with 35 neighbors (Figure 3). With five neighbors, the maximum local I was 0.7072 for marker CASI0001257 and the minimum was −0.04346 for marker ASGA0002592 (Figure 3).
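A correlogram of Moran′s I over increasing numbers of neighbors, as reported above, can be sketched as follows. This is an illustrative implementation under stated assumptions: row-standardized k-nearest-neighbor weights, Euclidean distance on planar coordinates, and hypothetical function names. The weighting scheme actually used in the study is not specified in this section.

```python
import math


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor weights as {i: {j: w_ij}}."""
    weights = {}
    for i, (xi, yi) in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        weights[i] = {j: 1.0 / k for j in others[:k]}
    return weights


def morans_i(values, weights):
    """Global Moran's I for one variable and one weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(w for row in weights.values() for w in row.values())
    cross = sum(
        w * dev[i] * dev[j]
        for i, row in weights.items()
        for j, w in row.items()
    )
    return (n / s0) * cross / sum(d * d for d in dev)


def correlogram(coords, values, ks):
    """Moran's I for each neighborhood size in ks."""
    return {k: morans_i(values, knn_weights(coords, k)) for k in ks}
```

With `values` set to genotype codes (for example 0/1/2) or to an environmental variable at each sampling point, `correlogram(coords, values, [5, 10, 15, 30, 35])` would trace how autocorrelation changes as the neighborhood widens.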

For these five markers, considered selection signatures in both BayeScan and Samβada, we found a nucleus of homozygotes in the neighborhood, but only with up to 30 neighbors. A regionalization of these markers was observed around a nucleus of climatic variation (Figure 4), with a loss of influence as the geographical distance between samples, or the distance from the climatic influence center, increased.

| DISCUSSION
The evolution and adaptation of pigs are subject to environmental influences, as has been observed in humans (Storz, 2010), humans and cattle (Beja-Pereira et al., 2003), fish (Nielsen et al., 2009), and other species (Manthey & Moyle, 2015). The genetic variability and population structure found were similar to those of other populations (Boitard et al., 2010; Burgos-Paz et al., 2012) and to those obtained with other approaches (Sollero et al., 2009). For this work, we used two methods to detect outliers in FST.
According to Pérez-Figueroa, García-Pereira, Saura, Rolán-Alvarez, and Caballero (2010), BayeScan's algorithm admits less than 1% of false discoveries under the neutral hypothesis, when we assume the Dirichlet distribution and that the population has a neutral structure.
Those assumptions about distribution and structure may introduce bias when there is more than one sample within the population, or when individuals share a common ancestor in the recent past (Lotterhos & Whitlock, 2014). Feng, Jiang, and Fan (2015) argue that some BayeScan configurations can affect the proportion and the direction of the markers under selection. This kind of bias does not occur with Samβada, because it translates samples into allele frequencies associated with environmental data and uses these to calculate logistic regressions, which explain allele presence in a specific environment. As the Samβada algorithm works at the individual and local levels and uses the p-value after Bonferroni correction to determine the significance of the models, the probability of mistakenly considering an association between a marker and environmental variables significant decreases (Stucki, 2014; Stucki et al., 2014).
FIGURE 3 Moran′s I correlogram from genotypes of the markers identified as selection signatures in Brazilian locally adapted swine breeds by BayeScan and Samβada. Maximum, minimum, and average from all markers.
The rates of spatial autocorrelation (Figure 3) showed that the 5 to 10 closest neighbors tend to have high spatial autocorrelation at low levels (Favero & de Figueiredo, 2009; Gama et al., 2013). The probability of genetic similarity at a distance greater than 10 neighbors decreases, and this might be related to the limited dispersion due to sanitary legislation within the country for swine species, as well as to market organization.
The pattern of spatial distribution of the genotypes, identified as selection markers (Figure 4), associated with environmental conditions such as temperature, solar radiation, and BIO18-precipitation of the warmest quarter (Table 2), during some periods in the year, shows adaptive selection linked to seasonality. The genotypic frequency of these signatures of selection divides the territory into two regions (Table 4), one in the north where we have predominantly the occurrence of one of the genotypes and the other to the south where the alternative genotype occurs. According to Nimer (1979), these two regions are identified by different climates: the north shows "equatorial," "tropical," and "northeast occidental tropical" climates; the south shows "temperate" and "central Brazil tropical" climates.
Although the markers MARC0021990, ASGA0033717, and MARC0007678 were responsible for a high number of significant models identified in Samβada, we did not find any significant multivariate model (Figure S3). When one marker is linked to several environmental variables, this suggests that many evolutionary steps within the environment, throughout the year, influence the presence of markers. Although only univariate models were found, there were associations between these alleles and the variation of temperature throughout the year, but not among the seasons as discussed by Plummer et al. (2006).
The environmental temperature is closely linked to welfare (Lee & Phillips, 1948) and animal productivity (Collier & Gebremedhin, 2015), affecting pigs in all stages of life (Ross et al., 2015; Wildt et al., 1975), including intrauterine development, with consequences for the postnatal development of animals (Johnson et al., 2013, 2015a). The significant models found by Samβada for mean diurnal range (BIO2) associated with the marker ALGA0012967, in an intronic region of the LGR4 gene, which directly influences testicular development and spermatogenesis, were in accordance with Petrocelli et al. (2015), who reported seasonal variation of seminal quality parameters affecting the reproductive performance of females. Since the survival and adaptation of a species in the environment are limited by the reproductive success of individuals, when environmental conditions such as temperature and humidity are outside thermal comfort limits we can see physiological alterations leading to reproductive failure in females (Nteeba et al., 2015) and males (Flowers, 2015; Wettemann & Bazer, 1985). Ai, Huang, and Ren (2013), working with Chinese pigs in Tibet, and adaptive components of genetic diversity (Bradbury, Smithson, & Krauss, 2013) across a dynamic and heterogeneous unpredictable landscape. Selection signatures from autochthonous breeds may be a tool to improve livestock production through changes in the frequencies of these alleles in commercial herds, improving adaptation to different environments. This is important in a world marked by environmental change that acts by altering the composition of the community and shifting range boundaries, phenology, genetic diversity, and genetic structure of organisms (Manel et al., 2012), probably imposing strong selection pressures on traits important for fitness (Gienapp, Teplitsky, Alho, Mills, & Merilä, 2008).

| CONCLUSION
Allele frequencies of markers from Brazilian locally adapted swine breeds were seen to be under the influence of environmental conditions, showing evidence of footprints of divergent selection in at least 8 (eight) SNP markers. These markers were associated with temperature, solar radiation, and BIO18, were linked with intracellular activity and the circulatory system, and were considered important for species adaptation.
The distribution of SNP alleles over the Brazilian territory demonstrates a clear north-south orientation, dividing the country into two distinct regions according to climatic conditions, drier and sunnier in the North and wetter and colder in the South. This information on the distribution of selection signatures across the Brazilian territory could be included in programs of marker-assisted selection, helping farmers through easier management of animals selected for adaptive characteristics. In the same way, the markers could be used to direct animals to more suitable regions according to their genotype, both in traditional husbandry situations and in genetic resource conservation programs.