The role of introgression and ecotypic parallelism in delineating intraspecific conservation units

Abstract Parallel evolution can occur through selection on novel mutations, standing genetic variation or adaptive introgression. Uncovering parallelism and introgressed populations can complicate management of threatened species as parallelism may have influenced conservation unit designations and admixed populations are not generally considered under legislations. We examined high coverage whole‐genome sequences of 30 caribou (Rangifer tarandus) from across North America and Greenland, representing divergent intraspecific lineages, to investigate parallelism and levels of introgression contributing to the formation of ecotypes. Caribou are split into four subspecies and 11 extant conservation units, known as designatable units (DUs), in Canada. Using genomes from all four subspecies and six DUs, we undertake demographic reconstruction and confirm two previously inferred instances of parallel evolution in the woodland subspecies and uncover an additional instance of parallelism of the eastern migratory ecotype. Detailed investigations reveal introgression in the woodland subspecies, with introgressed regions found spread throughout the genomes encompassing both neutral and functional sites. Our investigations using whole genomes highlight the difficulties in unequivocally demonstrating parallelism through adaptive introgression in nonmodel species with complex demographic histories, with standing variation and introgression both potentially involved. Additionally, the impact of parallelism and introgression on conservation policy for management units needs to be considered in general, and the caribou designations will need amending in light of our results. Uncovering and decoupling parallelism and differential patterns of introgression will become prevalent with the availability of comprehensive genomic data from nonmodel species, and we highlight the need to incorporate this into conservation unit designations.

and when discussed, the focus is typically on interspecies hybrids and not conservation units below the species level .
Here, we investigate nuclear genomic structure of caribou (Rangifer tarandus) across North America and Greenland and investigate how intraspecific parallelism contributed to the formation of caribou ecotypes. We then investigate levels of introgression between divergent intraspecific lineages. In Canada, there are four caribou subspecies largely based on morphology (Banfield, 1961; Figure 1). They are distributed in widely different ecozones, including the High Arctic, mountains, taiga and boreal forests (Banfield, 1961;COSEWIC, 2011). They display evidence of local adaptation, with differences in morphology, diet, behaviour and life history in different regions, leading to the classification of 12 designatable units (DUs; 11 extant and 1 extinct; COSEWIC, 2011; Figure S1), often referred to as ecotypes, by the Committee on the Status of Endangered Wildlife in Canada (COSEWIC, 2011). Importantly, all 11 extant ecotypes are now listed as at risk of extinction (COSEWIC, 2011(COSEWIC, -2017 and many have been declining rapidly due to human-mediated disturbances including climate change (Festa-Bianchet, Ray, Boutin, Côté, & Gunn, 2011;Vors & Boyce, 2009;Weckworth, Hebblewhite, Mariani, & Musiani, 2018). Additionally, caribou are of huge cultural, spiritual and economic significance to many indigenous communities (Festa-Bianchet et al., 2011;Polfus et al., 2016). It is also a keystone species for the ecosystem, important for vegetation structure, nitrogen cycling and predator populations (Festa-Bianchet et al., 2011).
Previous mitochondrial DNA studies indicate two major phylogenetic lineages of caribou in North America which likely have origins in different glacial refugia (Cronin, MacNeil, & Patton, 2005;Flagstad & Røed, 2003;Klütsch, Manseau, & Wilson, 2012;Weckworth, Musiani, Devitt, Hebblewhite, & Mariani, 2012). The mitochondrial lineages do not line up with the current subspecies or DU designations in all areas. For example, the boreal DU, which extends from the east coast of Canada to the northern regions of the Northwest Territories, contains individuals belonging to the North American phylogenetic lineage, or NAL, in the central and eastern part of the range (Klütsch et al., 2012;Polfus, Manseau, Klütsch, Simmons, & Wilson, 2017). However, boreal caribou from the northern part of the Northwest Territories belong to the Beringian-Eurasian phylogenetic lineage, or BEL, indicating potential parallel evolution (Polfus et al., 2017). The northern mountain DU also sits within the BEL, even though they belong to the woodland subspecies along with the boreal DU (Polfus et al., 2017; Figure 1) also indicating potential parallel evolution. Additionally, the eastern migratory DU has two disjunct ranges, one in northern Manitoba and Ontario and the other in northern Quebec and Labrador ( Figure S1). Eastern migratory caribou from the Ontario and Manitoba region were found to be an admixture of boreal caribou from the NAL lineage and barrenground caribou from the BEL lineage (Klütsch, Manseau, Trim, Polfus, & Wilson, 2016). However, it is unknown whether the Quebec and Labrador eastern migratory ecotype share the same origin.
We examined high coverage whole-genome sequences of 30 caribou in the most comprehensive study to date covering six DUs and all four subspecies ( Figure 1; Table 1). We used genomewide variation using population and phylogenomic approaches to investigate (a) what is the nuclear phylogenomic structure of caribou across North America and Greenland and does this match the mitochondrial lineages and potential origins in two major refugia? (b) Do we confirm parallel evolution of northern mountain and boreal caribou in the northern part of the range with the rest of the woodland subspecies and did the eastern migratory caribou evolve the same phenotype in parallel in the two disjunct ranges? (c) Was there introgression among the caribou lineages we uncovered using phylogenomic analyses? What do these patterns look like across the genome and do we see introgression of genes putatively suggesting adaptive introgression in facilitating the parallel evolution of ecotypes? And finally, (d) are the eastern migratory caribou from Quebec/Labrador admixed as was found for eastern migratory caribou from Ontario/ Manitoba? Issues of parallelism and complex patterns of introgression will certainly become more prevalent with advances in sequencing technologies and we discuss how the definition and delineation of conservation units could be informed by our results.

| Sample collection, extraction and sequencing
Tissue was collected from 28 caribou from across Canada and two caribou from Greenland between 1992 and 2015, representing all four subspecies and six Canadian designatable units (DUs; Figure 1, Table 1 and Table S1). Samples were collected on road kills or from harvested animals by biologists or veterinarians with the British Columbia, Manitoba and Ontario provincial governments, the Canadian federal government, the Greenland government, the Sahtú Renewable Resources Board, The Royal Ontario Museum, the University of Manitoba and an independent consultant (see Table   S1). Tissues were stored in RNA later ICE (Thermo Fisher Scientific, MA, USA). Phenol chloroform extractions were performed on three of the samples (The Pas, Snow Lake and Ignace) using 0.

| Filtering raw reads and variant calling
We used Trimgalore 0.4.2 (available here: https://github.com/Felix Krueg er/TrimG alore), a wrapper script for CutAdapt (Martin, 2011), to remove sequencing adaptors and to trim low-quality ends from reads with a phred quality score below 30. Reads were aligned to the caribou reference genome  using Bowtie2 2.3.0 (Langmead & Salzberg, 2012), and the SAM file converted to a BAM file using Samtools 1.5 (Li et al., 2009 were also filtered as below. We downloaded the raw reads for a Sitka deer (Odocoileus hemionus sitkensis) genome from the NCBI database (Bioproject PRJNA476345, run SRR7407804) sequenced as part of the CanSeq150 Initiative, to use as an outgroup. We aligned and filtered the reads in the same way as for the caribou genomes to produce an individual VCF file. We then used the Combine GVCFs function and performed joint genotyping using Genotype GVCFs, both in GATK, to produce a VCF file with all caribou, the reindeer and the Sitka deer, for analyses requiring an outgroup.
We did some additional filtering on the combined VCF files to ensure quality. We used VCFtools 0.1.14 (Danecek et al., 2011) to do two rounds of filtering. First, we removed indels (using the remove-indels command), and any site with a depth of less than 10 or more than 80 (approximately double the average depth across the genome, using the min-meanDP and max-meanDP commands) and removed any low-quality genotype calls, with a score below 20, (using the minGQ command) which in VCFtools are changed to missing data. In the second round, we filtered to remove genotypes with more than 10% missing data (using the max-missing command).
We did not filter to remove any SNP with a minor allele frequency (MAF) as we have only one or two individuals from each location and this results in removing the private sites, instead relying on very high depth and stringent filtering to ensure a high-quality data set.
However, we did conduct the PCA removing sites with an MAF of less than 0.05 and these looked identical to the data without the SNPs. After filtering, we measured the mean depth (using the depth command), the frequency of missing data (using the missing-indv command) and the inbreeding coefficient, F (using the het command), for each individual in the final VCF file of 30 caribou plus the reindeer using VCFtools.

| Population and phylogenomic structure
We performed principal component analyses (PCA) in r 3.4.4 (R Core Team, 2018) using the packages vcfR (Knaus & Grüwald, 2017) and Adegenet (Jombart, 2008). The PCA was done on the VCF file containing all caribou and the reindeer (but not the Sitka deer). We then ran subsets of individuals on different PCAs to gain higher resolution of different lineages (see Results).
We used VCFkit (available here: https://vcf-kit.readt hedocs.io/ en/lates t/, using numpy 1.14 as the programme does not work with newer versions) to generate a fasta file using the 'phylo fasta' command. The programme concatenates SNPs for each sample, using the first genotype of each allele (e.g. for diploids where the genotype is A/T, the A is used) and replacing missing values with an N. We ran this on the VCF file without the Sitka deer to create an unrooted tree as including the Sitka deer pushed all caribou too closely together to discern the branches. The resulting file was input into RAxML 8 (Stamatakis, 2014) and run using the GTRGAMMA model and 1,000 bootstrap replicates. We visualized the best tree in FigTree 1.4.2 (https://github.com/ramba ut/figtree). We also aligned each of the conserved mammalian genes extracted from the genomes using BUSCO (above) to construct phylogenies, from which we made a consensus tree. We used the Sitka deer outgroup to root the tree.
We used muscle (Edgar, 2004) to align the sequences for each individual to create a combined fasta file for each gene. We then used RAxML as above to create a gene tree for each file and then used ASTRAL-III (Zhang, Rabiee, Sayyari, & Mirarab, 2018) to create a consensus tree which was visualized in FigTree.
We used the populations module in Stacks 2.4.1 (Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013) to convert our VCF files (both with and without the Sitka deer) into an input file for Treemix 1.13 (Pickrell & Pritchard, 2012). We ran Treemix from 0 to 9 migration events, with ten iterations of each, grouping the SNPs in windows to account for possible linkage using a block size of 1,000 SNPs for seven of the iterations and 5,000 SNPs for three of the iterations (because the OptM package, below, must have different likelihood scores between iterations). We plotted the resulting trees, and the residual plots, in RStudio 1.0.136 (RStudio Team, 2015). We then used the R package OptM (available here: https://cran.r-proje ct.org/ web/packa ges/OptM/index.html) to calculate the second-order rate of change in the log-likelihood of the different migration events (the ad hoc statistic delta M) to help infer how many migration events to visualize.

| Demographic reconstruction and admixture analyses
We made a consensus fastq file for each caribou and the reindeer from their BAM files, using the Samtools and BCFtools 1.5. This was converted into an input file using the 'fq2psmcfa' command and run using the 'psmc' command in Pairwise Sequentially Markovian Coalescent (PSMC) model in PSMC (Li & Durbin, 2011) to investigate past effective population size changes. These were plotted using the general mammal mutation rate of 1.0E-9 per year (Li & Durbin, 2011) and a generation time of 7 years (COSEWIC, 2011(COSEWIC, -2017. To calculate admixture statistics, we used the R package admixr (Petr, Vernot, & Kelso, 2019) to run ADMIXTOOLS (Patterson et al., 2012). We converted our VCF file containing the Sitka deer (to use as an outgroup) into EIGENSTRAT format using a C++ script (found here: https://github.com/bodka n/vcf2e igens trat). As the package does not work when including more than 600 scaffolds, we filtered the data set to include SNPs found only on the 600 largest scaffolds, which encompassed over 98% of the reference genome assembly (the scaffold L90 is 285; Taylor  occurred into one of the two sister groups and an outgroup (which we always set as the Sitka deer). An f4 statistic which significantly differs from 0 indicates gene flow, whether it is positive or negative tells you into which of the sister populations. In the f4-ratio test, alpha is calculated, which is the proportion of the genome in population 'X' that originates from population 'B' as opposed to population 'A' (the proportion of population 'A' is calculated as 1 -alpha).
For these tests, we grouped the four barrenground genomes from Bluenose and Qamanirijuaq as they show no differentiation and testing them separately made no difference to the results. The four boreal caribou genomes from Ontario and Manitoba were run separately as these do show differentiation and grouping them did affect the outcome. We focussed on using these tests to investigate 1) the amount of barrenground introgression into eastern migratory caribou in Ontario/Manitoba and Quebec/Labrador (f3, f4 and f4-ratio tests) separately as they have nonoverlapping ranges, 2) introgression between eastern migratory caribou in Ontario/Manitoba and Quebec/Labrador (f4 test), 3) introgression between boreal caribou of NAL origin and the mountain caribou (f4 and f4-ratio tests for significant populations) and 4) introgression between boreal caribou of NAL origin and the Northwest Territories boreal caribou of BEL origin (f4 and f4-ratio tests), since one of our aims was to investigate the amount of introgression between the lineages and the potential role of adaptive introgression in leading to parallel evolution.
We first investigated introgression from barrenground into the Ignace into all mountain populations as adaptive introgression is unlikely to have played a role in the parallel evolution of these populations due to the uncovered patterns of introgression (see Results).
To do this, we extracted the sequences for all regions across the genome with an f DM score over 0.2 (as it is the most conservative statistic) using Bedtools 2.29 (Quinlan & Hall, 2010). To make sure the sister group used in the set-up of the test did not bias the results, we only included regions that were flagged as highly introgressed from the NAL group when using both Grant's and Peary caribou as the sister group. We used the command line version of blast 2.6 (Altschul, Gish, Miller, Myers, & Lipman, 1990) to search for the genes present in these introgressed regions and genes with mRNA or predicted mRNA hits in at least two species and with an E score of 0.

| Genome quality assessments
We sequenced 28 caribou genomes from across Canada and two caribou from Greenland to high coverage (35.57 -43.03X; Table   S1) representing all four subspecies and six Canadian designatable units (DUs; Figure 1 and Table 1) and used an additional reindeer   (Table   S1), and missing data per individual were low at 0.3%-1.0% with the exception of the previously published reindeer genome at 16.0% (Table S1).

| Population and phylogenomic structure and demographic history
Principal component analyses (PCAs) revealed four major clusters corresponding to the NAL and BEL as well as Peary and Greenland clusters (Figure 2a). The genome clustering did not conform to current subspecies or ecotype designations but did match previous mitochondrial results pertaining to the two refugial lineages (Figure 1; Table 1). Specifically, in the NAL cluster were some caribou popula- Greenland caribou forming a sister clade within the group (Figure 3).
The rooted BUSCO phylogeny shows similar patterns to the SNP reconstruction although with shorter branch lengths between groups and lower support of nodes which is unsurprising given that is was reconstructed from conserved mammalian genes and so does not give as much resolution for the fine-scale structure. Importantly, the analysis also separated the NAL and BEL clades with high support ( Figure 4) as with the SNP tree (  Table S1).

| Patterns of introgression
To assess the contribution of admixture and introgression among lineages in positioning caribou lineages, we applied Treemix and f3, f4 and f4-ratio statistics. The Treemix phylogeny with no migration events gave a similar topology to the RAxML tree (Figure 6a).

NAL lineage
When visualizing seven migration events, which shows the least standard error and has the highest delta m score ( Figure S7 (Table S1; Figure 6b).
The f3 results indicated that the genomes of eastern migratory caribou in Ontario/Manitoba resulted from admixture between NAL boreal caribou from Igance (our reference NAL genome, see Methods) and barrenground caribou, as well as from between NAL boreal caribou from Igance and the other genomes from the BEL (Table 2). This indicates that eastern migratory caribou from Ontario/Manitoba have had introgression from the BEL lineage, in agreement with previous findings (Klütsch et al., 2016). The f4-ratio statistic showed the Ontario/Manitoba eastern migratory caribou genomes to be of 7% barrenground origin (see Supporting Information for all statistics).
In contrast, there were no negative f3 scores for eastern migratory caribou in Quebec/Labrador, including from barrenground, with no proportion of their genome of barrenground origin (

Genome sequences of 30 caribou from across North America and
Greenland provided a comprehensive data set in a nonmodel terrestrial mammal species at risk. We reconstructed phylogenomic and demographic history and measured levels of introgression between ecotypes and investigated the potential role this introgression has played in parallel evolution. Our results are concordant with previous mtDNA studies (Cronin et al., 2005;Flagstad & Røed, 2003;Klütsch et al., 2012Klütsch et al., , 2016Weckworth et al., 2012) which found two major mitochondrial DNA phylogenetic lineages, NAL and BEL, which likely correspond to divergence within refugia during glacial cycles (Flagstad & Røed, 2003;Weckworth et al., 2012), as per our first aim. This reflects why the Inner Mongolia reindeer is uncovered as within the BEL lineage as it likely resided in the same Beringian refugia during glacial maxima. We found Peary caribou to be genetically distinct from the others in the BEL lineage (Figure 2a), supporting previous evidence of an additional High Arctic refugium . Greenland were recovered as a sister group to the Peary caribou within the BEL linage, but were also genetically distinct (Figure 2a).
Inbreeding coefficients varied greatly between sampling locations. Peary and Greenland caribou have elevated inbreeding coefficients (Table S1)

| Parallel evolution and introgression in caribou ecotypes
As per our second aim, our results confirm previous evidence that northern mountain and boreal caribou from the Northwest Territories are within the BEL genomic lineage, even though they are both currently within the woodland subspecies alongside the boreal caribou from the east and central part of their range which are in the NAL genomic lineage. This confirms that the woodland ecotype appears to have arisen in parallel for both (Polfus et al., 2017). Our central mountain caribou are also within the BEL genomic lineage, and this population has been found to be a mixture of the two mtDNA lineages (McDevitt et al., 2009). Within the NAL lineage, we found evidence for another, as yet undocumented, case of parallel evolution within the eastern migratory ecotype. The eastern migratory caribou from Ontario/Manitoba and those from Quebec/Labrador are not sister groups (Figures 2b and 3) and have different demographic and introgressive histories.
Recent studies are highlighting that introgression between lineages is far more common than previously realized (Coates, Byrne, & Moritz, 2018;Hamilton & Miller, 2015), and the same appears to be true for caribou with introgression likely playing a role in the evolution of the ecotypes. For our third and fourth aims, we investigated patterns of genomewide introgression among the caribou lineages Introgression is also seen from the NAL into some of the mountain caribou, with negligible levels of introgression detected into more northerly northern mountain caribou. It thus seems unlikely that introgression drove the parallel evolution of the woodland phenotype of the mountain caribou. We find high levels of introgression and many introgressed genes from the NAL lineage into the Northwest Territories boreal caribou. However, when we compare the gene compliment of the most highly introgressed regions in the Northwest Territories boreal caribou to those found in the mountain caribou, we again find introgressed regions spread across the genome including many genes, even though there are fewer regions overall. There are a few explanations for this pattern, including ILS. ILS would be difficult to exclude, especially given that they are closely related intraspecific ecotypes (Lamichhaney et al., 2017).
Whether these regions are a result of ILS or introgression the high gene compliment suggests that they could have persisted in the genome due to selection, even if they have not been involved in the parallel evolution of phenotype, due to filtration for maintenance of adaptive genome segments. Additionally, when studying cases of adaptive introgression in interspecies comparisons, areas of introgression are often restricted to single genomic regions (Schweizer et al., 2019); however, in intraspecific taxa, we may see larger introgressed regions persisting across the genome because the fitness costs may be lessened.
Alternatively, given the genomewide patterns of the introgressed regions ( Figures S9-S14), a likely explanation for the variation in admixture could be genetic drift. As we see introgres-  (Taylor & Larson, 2019). In contrast, investigating parallel evolution of ecotypes, which will inevitably involve many functional regions, in a nonmodel species with divergent intraspecific linages and complex demographic histories is a difficult task.

| Conservation unit designations in the light of complex demographic histories
Given current rates of extirpation and extinction, it is imperative to have strong, scientifically supported management frameworks, particularly given tight resources for conservation (Jackiw, Mandil, & Hager, 2015). Recent work shows that admixture between lineages is common (Coates et al., 2018;vonHoldt et al., 2018)  be admixed and also to have the lowest individual inbreeding coefficients, and similarly, eastern migratory caribou from Ontario/ Manitoba have lower inbreeding coefficients than the nonadmixed individuals from Quebec/Labrador (Table S1). Some argue that gene flow could even be facilitated to aid populations under threat from climate change (or an increase in the fitness of a population due to the introduction of new alleles, that is genetic rescue; Hamilton & Miller, 2015;Whiteley, Fitzpatrick, Funk, & Tallmon, 2015), which would be easiest between intraspecific populations (Hedrick & Fredrickson, 2010). Good conservation unit designations with an understanding of natural patterns of admixture are key to assess the potential to use such a strategy (Coates et al., 2018 Conservation unit designations depend on the goal of conservation, and whether the focus is on the preservation of phenotypes (or 'pure' genomes), or evolutionary and ecological processes to maintain resilience of an ecosystem vonHoldt et al., 2018;Waples & Lindley, 2018). The latter is likely more useful when attempting to designate units for nondiscrete entities, such as we see in caribou. With this in mind, some authors have suggested a flexible approach with each case considered on a context-specific basis (Jackiw et al., 2015), whereas others promote the need for a structured and uniform framework to decide on management decisions (Coates et al., 2018). For caribou, and other taxa, it seems appropriate for a structured approach in the naming of subspecies. Coates et al. (2018) suggest that subspecies show local adaptation with or without gene flow. Coupling this idea with our phylogenomic and population genomic results and results from previous studies, Canadian caribou appear to fit into three subspecies; those in the NAL, those in the BEL, and Peary caribou which sit phylogenetically in the BEL but show strong population genomic differences and clear local adaptation of phenotype (Banfield, 1961;COSEWIC, 2011).
The most relevant application of our findings is in the delinea- These divisions have significant implications for the status listing of each DU as threat status is assessed based on criteria such as abundance, and priority for management is given to DUs at greatest risk of extinction (COSEWIC, 2015). Given recent rapid declines in both range and population sizes, efficient conservation strategies are needed for caribou.
Our guidelines add to the current discussion about management of admixed populations and those with complex demographic histories (Coates et al., 2018;Fitzpatrick et al., 2015;Hamilton & Miller, 2015;Jackiw et al., 2015;Supple & Shapiro, 2018;vonHoldt et al., 2018). Namely, that subspecies designations are useful and could follow a structured framework (Coates et al., 2018), but that conservation units below the subspecies level likely require a case-by-case consideration especially given different regulations in different countries (Coates et al., 2018;vonHoldt et al., 2018).
Genomic data allow detailed investigations of demographic histories and genomewide patterns of introgression, and these results should be considered in conservation unit designations to make meaningful management decisions. Many taxa are facing an increasing threat from climate change and habitat destruction (Hoffman, Sgrò, & Kristensen, 2017;Ikeda et al., 2017) and genomic data and appropriate conservation unit designations will help with prioritization given limited resources. This means considering whether taxa need dividing into different units to ensure populations with different evolutionary histories and trajectories are maintained, or whether they should be kept as one conservation unit as over splitting could spread resources too thinly. For example, this decision will need to be made for eastern migratory caribou. A key next step to achieve these goals, including for caribou, is to use the wealth of data available from genomewide markers to investigate adaptive genomic variation to incorporate with demographic history information (Funk, Forester, Converse, Darst, & Moreys, 2019;Funk, McKay, Hohenloe, & Allendorf, 2012).