Introgression of Chinese haplotypes contributed to the improvement of Danish Duroc pigs

Abstract The distribution of Asian ancestry in the genome of Danish Duroc pigs was investigated using whole‐genome sequencing data from European wild boars, Danish Duroc, Chinese Meishan and Bamaxiang pigs. Asian haplotypes deriving from Meishan and Bamaxiang occur widely across the genome. Signatures of selection on Asian haplotypes are common in the genome, but few of these haplotypes have been fixed. By defining 50‐kb windows with more than 50% Chinese ancestry, which did not exhibit extreme genetic differentiation between Meishan and Bamaxiang as candidate regions, the enrichment of quantitative trait loci in candidate regions supports that Asian haplotypes under selection play an important role in contributing genetic variation underlying production, reproduction, meat and carcass, and exterior traits. Gene annotation of regions with the highest proportion of Chinese ancestry revealed genes of biological interest, such as NR6A1. Further haplotype clustering analysis suggested that a haplotype of Chinese origin around the NR6A1 gene was introduced to Europe and then underwent a selective sweep in European pigs. Besides, functional genes in candidate regions, such as AHR and PGRMC2, associated with fertility, and SAL1, associated with meat quality, were identified. Our results demonstrate the contribution of Asian haplotypes to the genomes of European pigs. Findings herein facilitate further genomic studies such as genomewide association study and genomic prediction by providing ancestry information of variants.

Domestication of pigs has occurred independently from wild boars in Europe and Asia (Giuffra et al., 2000;Larson et al., 2005). It is well documented that around 1,700 Chinese pigs were introduced into Northern Europe to improve local breeds to meet the demand of intensified agriculture (White, 2011). A genomic study of Isla del Coco feral pigs, a population derived from British pigs and isolated since 1793, showed evidence of crossbreeding with Chinese pigs, demonstrating crossing as early as the 18th century (Bianco, Soto, Vargas, & Perez-Enciso, 2015). A previous study (Chen et al., 2017) confirmed that most European pig breeds contained about 20% ancestry from Chinese pigs. Artificial selection on introgressed Chinese haplotypes has further been demonstrated in European Large White pigs (Bosse et al., 2014). Besides, adaptive introgression in pigs has also been observed on the X chromosome (Ai et al., 2015).
To identify introgressed haplotypes accurately, extant groups genetically similar to the source populations of admixture should be sampled. In our previous study (Chen et al., 2017), we found that the main source of the introgression from China to Europe was pigs from South China, which were genetically close to Bamaxiang. To better characterize the hybrid nature of European pig genome, in this study, we used both Bamaxiang pigs from South China and Meishan pigs from East China to identify the introgressed Chinese haplotypes in the Danish Duroc pigs.
The aim of the study was to detect selection on introgressed Chinese haplotypes and evaluate their contribution to the improvement of European pigs. To achieve this goal, we carried out a study utilizing whole-genome sequencing data from 16 European wild boars (EUW), 90 Danish Duroc, 30 Meishan and 6 Bamaxiang pigs.
EUW, Meishan and Bamaxiang pigs were used as donors to paint the genomes of Duroc pigs using ChromoPainter (Lawson, Hellenthal, Myers, & Falush, 2012), a recently developed software for local ancestry inference.

| Sample collection and DNA preparation
Samples from 30 Meishan pigs and 90 Danish Duroc pigs were collected. For Meishan pigs, genomic DNA was extracted from ear tissue using a standard phenol-chloroform method; for Duroc pigs, genomic DNA was extracted from blood sample using the QIAsymphony DNA Mini Kit (Qiagen). Sequencing was performed on the Illumina HiSeq 2000 platform. Whole-genome sequencing data from 16 EUW (Bosse et al., 2014;Frantz et al., 2015) and six Bamaxiang pigs (Ai et al., 2015)

| SNP calling and filtering
For each individual, the read-pairs were pre-processed using PRINSEQ (Schmieder & Edwards, 2011). Reads were trimmed to a minimum base PHRED quality of 20 from the 3′-end and removed if shorter than 51 bp, with more than 3 Ns, or a mean quality score <18. Filtered reads were aligned to the porcine reference genome build 10.2 (Groenen et al., 2012) by the Burrows-Wheeler Aligner (BWA version 0.7.17) ), employing the "mem" algorithm. SAMtools (version 1.8)

| Population structure analyses
Population structure analyses were conducted together with our previously merged SNP array data from Eurasian domestic pigs and wild boars (Chen et al., 2017). These array data were collected from multiple studies (Ai, Huang, & Ren, 2013;Chen et al., 2017;Goedbloed et al., 2013;Manunza et al., 2013;Wilkinson et al., 2013), involving 695 individuals and 30,549 autosomal SNPs. Therefore, we extracted from the sequence data to obtain genotypes at all variants present in the merged data set. To reduce the effect of uneven sample size per population (McVean, 2009), we randomly selected six individuals for SNP array genotyped populations with more than six individuals, while keeping all sequenced individuals. We performed PCA (principal component analysis) using EIGENSOFT 6.0.1 (Patterson, Price, & Reich, 2006) and population structure analysis by ADMIXTURE (version 1.23) (Alexander, Novembre, & Lange, 2009) for K = 1 to 20 ancestral populations using default options.

| Chromosome painting using ChromoPainter
Chromosome painting is a method to characterize shared ancestry between individuals that takes linkage disequilibrium (LD) information into account. We reconstructed haplotypes and imputed missing genotypes using BEAGLE (version 4.1) (Browning & Browning, 2007). Only bi-allelic SNPs were used for chromosome painting. We used ChromoPainter v2 (Lawson et al., 2012) to perform chromosome painting on Duroc individuals using EUW, Meishan and Bamaxiang pigs as donor populations. We ran ChromoPainter twice, as recommended in the user manual. In the first run, we used all donor individuals to paint nine randomly selected Duroc individuals on six randomly selected chromosomes (SSC 2,5,7,10,11,16 and 17). The aim of this run was to estimate switch rate, global mutation rate and copying probabilities by running 50 iterations of the expectation-maximization algorithm.
We averaged estimated values of each parameter across these chromosomes, weighting by the number of SNPs, and then across individuals, to obtain estimates of parameters. In the second run, we used the estimates from the first run to perform chromosome painting for all individuals and autosomes. As a result, we obtained an estimated ancestry proportion for each SNP in the genome of Duroc.
To evaluate the variation of ancestry proportion, we divided the genome into non-overlapping 50-kb windows. In each window, we separately estimated the average proportion of ancestry from Meishan, Bamaxiang and EUW across SNPs. We focused our analyses on windows with a high proportion of introgression from Chinese pigs, rather than windows with a significant signal of selection. Specifically, we selected windows with more than 50% Chinese ancestry (the sum of Meishan and Bamaxiang ancestries) as candidate regions of adaptive introgression. To avoid the confounding of introgression from European pigs to China, we further excluded windows with extreme F ST between Meishan and Bamaxiang, that is, windows with F ST falling in the top 5% of the empirical distribution.
Because the range of true F ST values by definition is between 0 and 1 (Wright, 1951), we set F ST to 0 for windows with negative average estimates of F ST . The differentiation between Duroc and Chinese pigs was calculated as the average of F ST between Duroc and Meishan, and F ST between Duroc and Bamaxiang.

| Gene annotation and functional analyses
We conducted gene annotation on five regions with the highest proportions of Meishan ancestry and five regions with the highest proportions of Bamaxiang ancestry. We took the following approaches to define these regions using Meishan ancestry as an example. We first selected 50-kb windows with the highest proportions of Meishan ancestry, which did not exhibit extreme F ST between Meishan and Bamaxiang (i.e., exclude windows with F ST higher than 95% quantile of the empirical distribution); then, these windows were extended in both directions until the next window with <50% Meishan ancestry or with extreme F ST between Meishan and Bamaxiang. Since long regions were more likely to be caused by selection, we chose the five regions with at least two windows. Gene contents and QTL (quantitative trait locus) numbers in these regions were retrieved from the Ensembl Genes 89 Database using BioMart (Kinsella et al., 2011) and from the Animal QTL Database (Hu, Park, & Reecy, 2016).
We counted the number of QTLs in Animal QTL Database (Hu et al., 2016) which overlapped with candidate regions with more than 50% Chinese ancestry without extreme F ST between Meishan and Bamaxiang. First, 25,610 QTLs were extracted from the database.
Among these, 8,937 have been identified by linkage analysis. These were excluded due to uncertain genomic locations. The remaining 16,673 QTLs, identified by association studies, were retained for further analysis. These QTLs were classified into five groups corresponding to five classes of traits as defined in the Animal QTL Database (Hu et al., 2016): meat and carcass, production, reproduction, health and exterior traits. We excluded 817 QTLs not located on autosomes or spanning more than 1 Mb. A total of 15,856 QTLs remained, with 7,654, 957, 837, 4,774 and 1,634 associated with meat and carcass, production, reproduction, health and exterior traits. The length of those QTL ranged from 40 bp to 1 Mb, with a mean of 24 ± 115 kb (mean ± SD). Of note, the database defined QTL regions as follows: (a) when flanking SNPs were given, the region was defined by the actual locations; (b) when only the centre SNP was given, the region was defined as a small region to create a minimum visible map window in a genome browser (the middle of the region was the actual location of centre SNP); (c) when SNPs were completely missing, the region was estimated from linkage association. We defined the mid-point of a QTL region as the peak position.
Finally, we counted the number of QTLs whose peak positions were located within candidate regions. When two QTL records associated with traits in the same trait group and had the same genomic interval in the Animal QTL Database, they were counted as one QTL. The same method was applied to each trait group separately.
To test for enrichment among all QTLs and among trait-specific QTLs within candidate regions, we applied a permutation test. We concatenated the autosomes into a circular genome, in the order from SSC1 to SSC18. A permuted sample was formed by moving candidate regions along the genome by a randomly chosen amount. This permutation did not change the relative positions between these windows preserving their correlation structure. We then computed the number of all QTLs and those belonging to each trait group that presented in these simulated windows using the same method as above. In total, 10,000 permutations were performed. The distribution of numbers of QTLs observed in the permutated regions was treated as the null distribution from which we computed the significance levels of the number of QTLs observed in the real data.

| Haplotype clustering in NR6A1
To check whether the NR6A1 haplotype in Danish Duroc pigs was introgressed from Meishan pigs or a breed genetically close to Meishan, we compared haplotypes within a reference pool of more Chinese pig breeds. We further collected whole-genome sequencing data from Chinese pigs, including five Erhualian (Ai et al., 2015), two Jinhua , four Tongcheng (Wang et al., 2015), six Luchuan (Ai et al., 2015) and six Wuzhishan pigs (Ai et al., 2015). For Meishan and Duroc, we randomly selected 10 individuals from each population. We used the same procedure as above to call variants in the 2-Mb region (298-300 Mb on SSC1) around the NR6A1 gene. BEAGLE (version 4.1) (Browning & Browning, 2007) was used to impute missing genotypes and to phase haplotypes. Haplostrips

| Population structure
The first four components of PCA are shown in Figure 1.

| Chromosome painting
The first run of ChromoPainter (Lawson et al., 2012) Table   S1). Among these, 1,640 windows had more than 50% Meishan ancestry, and 2,630 windows had more than 50% Bamaxiang ancestry. These results suggest selection for Chinese-derived haplotypes in the Duroc genome. The strong correlation confirmed the accuracy of local ancestry inference. In windows with more than 70% Meishan ancestry, the average of 50-kb window-based F ST between Duroc and Meishan was 0.33 ± 0.18 (mean ± SD), much lower than the value (average F ST 0.45 ± 0.17) between Duroc and Bamaxiang. This demonstrates the presence of Meishan ancestry in Duroc.

| Gene annotation and functional analyses
Gene annotation of five regions with the highest proportion of Meishan ancestry and five regions with the highest proportions of Bamaxiang ancestry is shown in Table S2. In total, these ten regions In order to identify potentially selected traits, QTLs located within candidate regions (windows with more than 50% Chinese ancestry and F ST lower than 95% quantile of empirical distribution) were counted. The 1,710 QTLs within these windows included 587, 162, 177, 575 and 209 of them associated with meat and carcass, production, reproduction, health and exterior traits, respectively.

| D ISCUSS I ON
In this study, we used ChromoPainter (Lawson et al., 2012) Figure S3), Meishan and Duroc haplotypes locate as an outgroup of EUW and Chinese pigs. One possible reason might be that there is not enough variation in this region to correctly place Meishan and Duroc cluster in the tree. This gene has been extensively investigated and is considered a causal gene affecting the number of vertebrae (Mikawa et al., 2005(Mikawa et al., , 2007. There is also evidence showing that this gene underwent a selective sweep in European commercial and local breeds (Rubin et al., 2012). In our study, the high haplotype homozygosity in Duroc haplotypes (as shown in Figure 6) also confirms the selection on this haplotype. These results suggest that the NR6A1 haplotype from Meishan or a closely related breed was introduced into European breeds and subsequently fixed due to selective breeding for large body size.
We here identified a list of genes where the introgressed KIT  with <1% Chinese ancestry) and MBNL1 with 19% Chinese ancestry). These results suggest that in these cases, Chinese haplotypes were not positively selected.

| CON CLUS ION
Admixture during the Industrial Revolution of Chinese pigs imported into Europe to improve the performance of European breeds has contributed a substantial fraction of the genomes of European domesticated pigs. This introgression is highly uneven across the genome, and that introgressed regions are associated with traits that have been actively selected for in the direction that would favour haplotypes of Asian origin.

CO N FLI C T O F I NTE R E S T
None declared.

DATA ACCE SS I B I LIT Y
Sequencing data from Meishan pigs have been deposited in the NCBI Short Read Archive (SRA) (project accession no. PRJNA378496).