Resequencing and signatures of selection scan in two Siberian native sheep breeds point to candidate genetic variants for adaptation and economically important traits

Resequencing and signatures of selection scan in two Siberian native sheep breeds point to candidate genetic variants for adaptation and economically traits. Summary Russian sheep breeds represent an important economic asset by providing meat and wool, whilst being adapted to extreme climates. By resequencing two Russian breeds from Siberia: Tuva ( n = 20) and Baikal ( n = 20); and comparing them with a European (UK) sheep outgroup ( n = 14), 41 million variants were called, and signatures of selection were identiﬁed. High-frequency missense mutations on top of selection peaks were found in genes related to immunity ( LOC101109746 ) in the Baikal breed and wool traits ( IDUA ), cell differentiation ( GLIS1 ) and fat deposition ( AADACL3 ) in the Tuva breed. In addition, genes found under selection owing to haplotype frequency changes were related to wool traits ( DSC2 ), parasite resistance ( CLCA1 ), insulin receptor pathway ( SOCS6 ) and DNA repair ( DDB2 ) in the Baikal breed, and vision ( GPR179 ) in the Tuva breed. Our results present candidate genes and SNPs for future selection programmes, which are necessary to maintain and increase socioeconomic gain from Siberian breeds.

Following the domestication of sheep (Ovis aries) and their migration with human populations, natural selection allowed improved adaptation to local environments, and artificial selection and breed formation affected economically important traits (Zeder, 2008). In Russia, where populations of livestock face environmental stresses in temperature, sheep have been selectively bred to meet production demands while being adapted to their local conditions. Using whole-genome genotyping, Deniskova et al. (2018) demonstrated genetic clustering of Russian breeds based on their wool type and further showed coarse wool breeds clustering with Asian breeds, and fine-wool breeds with European breeds. Therefore, it may be expected that each of these two clusters of breeds would demonstrate distinct molecular traces of adaptation. This was shown through a signatures of selection scan using high-density genotyping of 15 Russian sheep breeds (Yurchenko et al., 2019). Promising candidate gene regions were identified including those linked to wool traits, environmental adaptations, and domestication. The next step is to identify candidate genetic variants contributing to adaptations and economic traits. The aim of this study was to identify candidate genes containing missense SNPs under selection from two resequenced Russian sheep breeds from Siberiathe Tuva and Baikal.
The decorrelated composite of multiple signals pipeline was used to calculate H1, H12, Tajima's D, Pi and F ST as described in Yurchenko et al. (2019). The method allows integration of the major measures of the signatures of selection into a single statistic (Ma et al., 2015). P-values were converted to q-values to correct for false discovery rate using BIOCONDUCTOR qvalue package. q-Values were used to render Manhattan plots using qqman manhattan function in R. To identify regions under selection genome-wide, all intervals with SNPs expressing decorrelated composite of multiple signal q-values less than 0.01 were identified. Selected interval boundaries were defined by the first SNP with q-value greater than 0.2 up-and downstream. SNPs found within the regions were annotated using the NGS-SNP pipeline (Grant et al., 2011). Putative effects of missense SNPs were predicted using PolyPhen score range 0-1 (0 = benign, 1 = deleterious; Adzhubei et al., 2013). Representation of haplotypes was rendered by HAPLOSTRIPS (Marnetto & Huetra-S anchez, 2017) from phased SNP data. Functional enrichment analysis was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID version 6.8; Huang et al., 2009) using all genes found within regions under selection per breed against a background of all genes. Any group with enrichment score greater than 1.3 was described.
Copy number variant (CNV) analyses were conducted with cn.Mops R package (Klambauer et al., 2012) in windows of 700 SNPs. These were merged into CNV regions (CNVRs) with BEDOPS bedmap function using at least 50% reciprocal overlap in at least three individuals per breed. Duplicate CNVRs were removed and count number was used to infer duplication or deletion. Effective population sizes (N e ) were calculated with SMC++ version 1.15.2 (Terhorst et al., 2017) retrospectively excluding individuals with excessive homozygosity, all SNPs within regions under selection and CNVRs. Generation times of 4 years and mutation rate 1.0 9 10 À8 were assumed from the literature (Kijas et al., 2012).
Fifty-four resequenced samples with mean coverage of 11.99 were aligned to the reference genome (Table S1). The GENOME ANALYSIS TOOLKIT pipeline called 41.6 million SNPs, which were pruned to 38.3 million after filtering. Population statistics (Table S2), showed high levels of polymorphic loci in both breeds as well as low inbreeding in the Baikal breed and moderate inbreeding in Tuva. Equal ranges of H e were seen in both breeds and the N e calculated was larger for the Tuva than the Baikal breed (Fig. S1), in line with estimates by Deniskova et al. (2018). Transitiontransversion ratios align with those previously seen in commercial cattle breeds (Jiang et al., 2008).
CNV regions covered 1 and 3% of Baikal and Tuva breed genomes respectively (Tables S3 & S4), which overlapped a list of known ovine CNVRs by 78% in Baikal and 81% in Tuva sheep. Non-overlapping CNVRs can be seen in Tables S5 & S6. Four regions under selection in Baikal and 34 in Tuva breeds overlapped CNVRs present in their respective breeds (Tables S7 & S8). These regions, clustering on OAR3, OAR6 and OAR17 in Tuva and OAR17 in Baikal breeds, were treated as artefacts of alignment and SNPs from these regions were discounted from the selection scan.
The remaining 739 selected intervals (Baikal = 296; Tuva = 443) spanned 1.0 and 1.3 Mbp in Baikal and Tuva breeds respectively containing 15 954 and 24 978 SNPs where 3084 (19%) and 9430 (38%) SNPs were not present in the NCBI SNPdb. Annotation of SNPs in the regions under selection found 31 missense mutations in Baikal (Table S9) and 12 in Tuva sheep (Table S10). DAVID functional clustering demonstrated enrichment for ubiquitinylation, transmembrane helix proteins and keratin filament formation terms in Baikal (Table S11) and none in the Tuva breed. A list of all genes found within the regions under selection entered for DAVID analysis is available (Tables S12 & S13).
We focused on the genes found in the top selected intervals with the difference in allele frequencies between populations supported by F ST or haplotype analysis (H1/H2 statistics; Fig. 1; Table 1). In the Baikal sheep, DSC2 (q-  value = 0.001), which encodes a desmosomal protein found in hair follicles, linked to cashmere traits in goats (Simpson et al., 2009;Wang et al., 2016), overlapped a selected interval with the strongest signal originating from the H1/ H2 statistics. We failed to identify missense mutations with large F ST values in the gene, suggesting that selection probably acts on haplotypes. This locus was previously found under selection in a scan for selected regions in Russian long-haired sheep (Yurchenko et al., 2019). Three additional genes in Baikal breed overlapped intervals recognised by H1/H2 statistics: CLCA1, a chloride channel regulatory protein, upregulated in sheep resistant to Teladorsagia infection (Chitneedi et al., 2018); SOCS6, which is known to regulate the insulin receptor pathway (Krebs et al., 2002); and DDB2, which recruits DNA repair factors after ultraviolet radiation damage (Nag et al., 2001). Only one of the top signatures of selection contained missense mutation with a high F ST (F ST = 0.3) in the LOC101109746 gene which encodes the HLA class II histocompatibility antigen DM beta chain, needed for the major histocompatibility complex for antigen presentation to the adaptive immune system (Fling et al., 1994; Fig 1).
Tuva sheep showed multiple missense mutations on the top peaks of selected regions supported by the F ST statistics. Of these, the strongest signature of selection was found in the derived allele of IDUA (qvalue = 0.0004; F ST = 0.5), encoding L-iduronidase. Human pathologies in this gene lead to mucopolysaccharidosis type I, which presents a global phenotype that includes coarse hair (Scott et al., 1995;Kloska et al., 2005). Furthermore, an alternative allele of GLIS1, an enhancer of pluripotency markers (Maekawa et al., 2011), and the reference allele of AADACL3, which is associated with fat deposition in Chinese sheep breeds (Lu et al., 2020), were found in the regions under selection with support from F ST . The reference allele of GPR179, highlighted by H1/H2 statistics, which is linked to genetic blindness, was also found under selection in Tuva sheep (Audo et al., 2012).
The results of this study, based on whole-genome resequencing, point to stronger and often more narrow signatures of selection than previously reported in the literature using the same two Russian sheep breeds but with high-density genotyping data (Yurchenko et al., 2019). Owing to whole-genome resequencing data, we were able to point to novel candidate SNPs and haplotypes under selective pressure in both breeds. These include haplotypes in candidate genes and missense SNP candidates for wool traits, fat deposition and immunity in the Baikal and Tuva sheep breeds. Knowledge of these genetic variants confers insight into the adaptations of each breed to its local environment, highlights economically important traits and provides variations for marker-assisted breeding.