Maternal genomic variability of the wild boar (Sus scrofa) reveals the uniqueness of East‐Caucasian and Central Italian populations

Abstract The phylogeography of the European wild boar was mainly determined by postglacial recolonization patterns from Mediterranean refugia after the last ice age. Here we present the first analysis of SNP polymorphism within the complete mtDNA genome of West Russian (n = 8), European (n = 64), and North African (n = 5) wild boar. Our analyses provided evidence of unique lineages in the East‐Caucasian (Dagestan) region and in Central Italy. A phylogenetic analysis revealed that these lineages are basal to the other European mtDNA sequences. We also show close connection between the Western Siberian and Eastern European populations. Also, the North African samples were clustered with the Iberian population. Phylogenetic trees and migration modeling revealed a high proximity of Dagestan sequences to those of Central Italy and suggested possible gene flow between Western Asia and Southern Europe which was not directly related to Northern and Central European lineages. Our results support the presence of old maternal lineages in two Southern glacial refugia (i.e., Caucasus and the Italian peninsula), as a legacy of an ancient wave of colonization of Southern Europe from an Eastern origin.


| INTRODUC TI ON
Wild boar (Sus scrofa) is widely distributed all over the world, and their population is almost four million, hence, considered the second most abundant ungulate species in Europe and a pest in several regions (Apollonio, Andersen, & Putman, 2010;Bobek, 2018).
Genetic studies can be used to discover more diversity patterns as well as in explaining the changes observed in the species' geographical range. Throughout its history, the wild boar has been strongly influenced by human practices such as hunting, pig domestication, and animal translocation (Larson et al., 2005;Scandura, Iacolina, & Apollonio, 2011). Thus, analysis of the changes in the geographical range of S. scrofa could aid our understanding of not only the global patterns of evolution and ecosystem change (Anijalg et al., 2018). Previous studies revealed the complex genetic structure of wild boar populations in Eurasia, including multiple domestication events and gene flow between wild boar and domestic pig breeds (Iacolina et al., 2016;Larson et al., 2005;Ramíres et al., 2009;Ribani et al., 2019;Šprem et al., 2014).
Since the most of these studies had been focused on Europe or Eastern Asia (Kusza et al., 2014;Larson et al., 2005;Scandura et al., 2011;Veličković et al., 2015;Vilaça et al., 2014), the specific status and phylogenetic position of West Asian pigs have only recently been reported. A recent investigation by Khalilzadeh et al. (2016) revealed the presence of Middle Eastern, European, and East Asian haplotypes in Iranian wild boar and proposed that there had been contact between the European and East Asian wild boar populations. Thus, analysis of samples from a large number of geographical localities, across different regions, could provide a more comprehensive description of the evolutionary history of wild boar.
A factor that could affect the quality of analysis is the type of genetic marker used in any study. Most genetic studies on the diversity or phylogenetics of wild boar were based on partial D-loop sequences of mitochondrial DNA (mtDNA; less than 7% of the whole mtDNA genome), or cytochrome b sequences (Fang & Andersson, 2006;Kusza et al., 2014;Larson et al., 2005;Ramíres et al., 2009;Scandura et al., 2008;Veličković et al., 2015Veličković et al., , 2016Vilaça et al., 2014), which occupies less than 7% of the whole mtDNA genome, sometimes in combination with another region (e.g., cytochrome b).
To the best of our knowledge, there are only a few previous examples of using a complete mtDNA genome to assess the phylogeographical relationships of S. scrofa. Ni et al. (2018) addressed their global phylogeography, based on sequences data from domestic pigs (72%), and a comparatively low number of wild boar samples. The authors showed patterns similar to those obtained by use of partial mtDNA genome sequences, thus demonstrating a clear separation of European and Asian clades. Chen et al. (2018) addressed the relationships between North Asian and South Asian wild boars, based on the effect population size (Ne) and climate on the non-synonymous/synonymous mutation ratio (Ka/Ks). Thus, there is a gap in genetic studies of wild boar, due to the lack of complete mtDNA on the Western part of the species' geographical range.
In this study, we present an analysis of SNPs variability in the whole mtDNA genome of wild boar from North Africa and continental Eurasia (various regions from Western Europe to Western Siberia). Firstly, we aimed to determine the phylogenetic position of animals from Eastern Caucasus-a region not previously included in any phylogeographical study. Secondly, we addressed the question of how a phylogenetic tree of wild boar, based on SNPs of whole mtDNA genome data, would differ from a phylogenetic tree based on shorter fragments (D-loop and cytochrome b markers).

| Ethics statement
We declare that we have no financial or personal relationships with other people or organizations that can inappropriately influence our work, and that we have no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing our research.
No wild boars were culled solely for the purpose of the present study. Tissue samples from each different country were obtained from collaborators and hunters. All samples were collected in compliance with each county's national regulations on wild boar management plans.

| Laboratory methods
In this research, some samples were extracted with QIAamp DNA Blood Spin kits (Qiagen), and most of which were extracted using the phenol-chloroform method (Sambrook, Fritsch, & Maniatis, 1989).
The quality and quantity of the extracted DNA were checked using a NanoDropTM 8000 spectrophotometer (NanoDrop Technologies Inc.). All experiments were performed according to the ethical regulations of the Chinese Academy of Sciences (Approval ID: SYDW-2015012).

| Sequence analysis
Library preparation for the Illumina sequencing platform required fragmentation of our DNA (1-3 μg of genomic DNA), followed by repair of 3′ and 5′ ends to form blunt-ended, phosphorylated molecules, and the addition of a non-templated dA-tail before ligation to an adaptor. DNA libraries were prepared according to the standard Illumina library preparation protocol, with a short insert size range of 300-500 bp, and then sequenced on an Illumina HiSeq 2000 platform with the 150 bp paired-end sequencing kits. Pair-end reads (150 bp) were sequenced to about 10× sequencing depth and ≥99% coverage for each individual.
Quality control procedure was used to remove reads with low sequencing quality. Particularly, reads were trimmed for minimum Phred quality >20 over three consecutive base pairs and discarded if shorter than 45 bp. Clean reads were trimmed from raw reads that were preprocessed to remove index adaptors and low-quality reads.
Quality control for removing the low-quality reads was done based on the following criteria: up to 10% of the read bases include "N" content of each sequenced reads, up to 50% of the read bases include low-quality (Q <= 5) base content in any sequenced reads, and finally, by removing duplicate reads, using Picard tools v.2.12.1.
After quality trimming, clean reads of each sample were aligned against the mitochondrial genome from the domestic pig reference genome (S. scrofa 10.2, which was downloaded from Ensembl genome browser) using Burrows-Wheeler Aligner (BWA) software (Li & Durbin, 2010). After extracting whole mitochondrial genome from our raw data by VCFtools 1.13 (Danecek et al., 2011), variants were called using Genome Analysis Toolkit (GATK) (Nekrutenko & Taylor, 2012). To avoid potential bias between our extracted data and the publicly available data, we called SNPs by comparing the mitogenome sequence of each individual to the mitochondrial reference genome and then merged the called SNPs to form a common set of SNP data for the 82 individuals (including 77 samples of wild boar and five sequences for outgroups). Then, several filtering steps were applied before using candidate SNPs for further analyses to minimize the number of false positive calls. Then, we removed all of the missing sites and also SNPs with minor allele frequency of 0.01, leaving 933 SNPs for next step. Afterward, the Integrative Genomics Viewer 2.5 (IGV) (Robinson, Thorvaldsdóttir, Wenger, Zehir, & Mesirov, 2017) was used as a high-performance visualization tool for interactive exploration of large, integrated genomic datasets and to confirm reliability of high-quality polymorphic sites (269 SNPs) with no any linkage makers based on reference genome for further analyses.
The Bayesian phylogenetic tree was constructed using 78 sequences (adding 1 sequence of warthog [P. africanus] as an outgroup) using MrBayes 3.2.2 software (Ronquist & Huelsenbeck, 2003). The evolutionary parameters were given by jModeltest2.1 (Posada, 2008). For a large number of haplotypes, the complexity of a network can be solved with a median-joining (MJ) network algorithm. So, based on all haplotypes of the complete mtDNA, a MJ network was constructed using NeTwork, version 5.0.0.3 (Bandelt, Forster, & Röhl, 1999). We also performed a principal component analysis (PCA) with population-scale SNPs, as implemented in the smartPCA program in the eigeNsofT package v.6.1.4 (Galinsky et al., 2016;Price et al., 2006).
TreeMix v.1.1 (Pickrell & Pritchard, 2012) (Purcell et al., 2007) for windowed pruning of the data to obtain a pruned file which we then used to get second Ped/Map files, the final input for subsequent analyses. Therefore, our analysis was based on non-link markers. Together, the nature of our dataset, Biallelic SNP VCF of the whole mtDNA, and data filtering steps made it possible for us to reliably use TreeMix.

| Mitogenome variability of wild boars
A total of 77 complete wild boar mtDNA genomes from Europe (n = 64), North Africa (n = 5), and Russia (n = 8) were sequenced for this study. After the quality control steps, 269 polymorphic sites were retrieved for the analyses. A total of 33 haplotypes were detected in our dataset, of which 27, 4 and 3 were found in Europe, North Africa and Russia, respectively (Figure 1, Table 1

| D ISCUSS I ON
Most published studies of the genetic diversity and phylogenetic relationships among wild boar mtDNA haplogroup lineages have been performed using the D-loop or the cytochrome b regions of the mtDNA genome (Alexander, Novembre, & Lange, 2009;Kusza et al., 2014;Larson et al., 2005;Scandura et al., 2008;Veličković et al., 2015).  about their complete mtDNA genome diversity Kijas & Andersson, 2001;Ni et al., 2018;Tan et al., 2018). However, According to Maselli et al. (2016), the proportion of "Asian" haplotypes in Southern Italy and Sardinia was about 9%. Manunza et al. (2013) based on the analysis of autosomal SNPs also suggested the scenario of migration from Trans-Caucasus region and Western Asia to Western Europe.
The lack of distinct clades (similar to E1 and E2) on the Balkan Peninsula (Alexander et al., 2009;Veličković et al., 2015)  We conclude that inclusion of the data from a previously unsam-

CO N FLI C T S O F I NTE R E S T
The authors declare no conflict of interests.

AUTH O R CO NTR I B UTI O N S
The study was designed by YPZ, AE, SK, NM. SK, and CPH performed the molecular analysis. SK and Sz.K performed the population genetic analyses. SK, NM, MS, EB, NS, IVS, LP, AE, and HBX provided the data. NM, Sz.K, and SK wrote the paper and all authors read, corrected and approved the manuscript.

DATA AVA I L A B I L I T Y S TAT E M E N T
Our novel data has an accession number of PRJEB30825 and is available in the European Bioinformatics Institute (EMBL-EBI).