Tracing the footprints of a moving hybrid zone under a demographic history of speciation with gene flow

Abstract A lack of optimal gene combinations, as well as low levels of genetic diversity, is often associated with the formation of species range margins. Conservation efforts rely on predictive modelling using abiotic variables and assessments of genetic diversity to determine target species and populations for controlled breeding, germplasm conservation and assisted migration. Biotic factors such as interspecific competition and hybridization, however, are largely ignored, despite their prevalence across diverse taxa and their role as key evolutionary forces. Hybridization between species with well‐developed barriers to reproductive isolation often results in the production of offspring with lower fitness. Generation of novel allelic combinations through hybridization, however, can also generate positive fitness consequences. Despite this possibility, hybridization‐mediated introgression is often considered a threat to biodiversity as it can blur species boundaries. The contribution of hybridization towards increasing genetic diversity of populations at range margins has only recently gathered attention in conservation studies. We assessed the extent to which hybridization contributes towards range dynamics by tracking spatio‐temporal changes in the central location of a hybrid zone between two recently diverged species of pines: Pinus strobiformis and P. flexilis. By comparing geographic cline centre estimates for global admixture coefficient with morphological traits associated with reproductive output, we demonstrate a northward shift in the hybrid zone. Using a combination of spatially explicit, individual‐based simulations and linkage disequilibrium variance partitioning, we note a significant contribution of adaptive introgression towards this northward movement, despite the potential for differences in regional population size to aid hybrid zone movement. Overall, our study demonstrates that hybridization between recently diverged species can increase genetic diversity and generate novel allelic combinations. These novel combinations may allow range margin populations to track favourable climatic conditions or facilitate adaptive evolution to ongoing and future climate change.


| INTRODUC TI ON
The rate and direction of change in species range margins is determined by standing levels of genetic diversity, biotic and abiotic factors, and an interaction between genotypes and the environment (Sexton, McIntyre, Angert, & Rice, 2009). The "centre-periphery hypothesis" (CPH) is a long-standing ecological hypothesis that is used to explain how the above-mentioned factors influence range margins of species (Brown, 1984). Specifically, the CPH states that populations at the core of a species range are often at carrying capacity, whereas populations near the range margins are also likely near the margins of their ecological niche and tend to exhibit lower genetic diversity thereby limiting further range expansion (Eckert, Samis, & Lougheed, 2008;Lawton, 1993). The CPH makes key assumptions about the fundamental niche being similar across the geographical range of a species, and that climate optima are stable over time (Pironon et al., 2017;Sheth & Angert, 2018). Efforts to predict changes in range margins via ecological niche modelling (ENM) are now incorporating within-species variation in the fundamental niche and genetic structure (Ikeda et al., 2017;Malone, Schoettle, & Coop, 2018). Still, biotic factors such as competition, tolerance to insects and pathogens and hybridization with a closely related species are often neglected in predictive modelling (but see Engler, Rödder, Elle, Hochkirch, & Secondi, 2013;Pollock et al., 2014). In particular, hybridization-induced introgression is often considered a threat to biodiversity, as on one hand it can cause dilution of the local gene pool (Fitzpatrick et al., 2010) and can generate offspring with reduced fitness levels. On the other hand, introgression can increase genetic diversity, specifically in range margin populations, thus creating deviations from the patterns of genetic diversity expected under the CPH. Moreover, introgression often facilitates colonization of novel habitats by bringing together novel allelic combinations not seen in the range of either parental species (Abbott, Barton, & Good, 2016;Rieseberg et al., 2007;Stebbins, 1959). Mounting evidence for introgression facilitating evolutionary and ecological diversification in several taxa, such as cichlid fishes (Meier et al., 2017), Saccharomyces yeast (Stelkens, Brockhurst, Hurst, & Greig, 2014), conifers (de Lafontaine, Prunier, Gérardi & Bousquet, 2015), Darwin's finches (Lamichhaney et al., 2018) and even hominids (Jagoda et al., 2017), indicates its importance as an evolutionary process.
For parapatrically distributed species as well as recently diverged species, genome-wide introgression can cause range shifts due to its effect on standing levels of genetic diversity (Hamilton & Miller, 2016;Pfenning, Kelly, & Pierce, 2016). Such hybridization-induced changes in range margins tend to leave signatures of spatio-temporal shifts in the location of hybrid zones. Spatio-temporal dynamics of hybrid zones can be driven by varied processes such as interspecific competition, changes in population size, dynamic features of the landscape and varying selection pressures (Barton & Hewitt, 1985;Buggs, 2007;Pfenning et al., 2016). Whether these processes will cause a hybrid zone to experience asymmetric expansion, bidirectional expansion or contraction will additionally depend on the divergence history and life history characteristics of the hybridizing species. For instance, introgression is often considered to be a factor causing range contraction of native species (Todesco et al., 2016). However, such hybridization between native and invasive species may not be of concern if populations of the native species are not highly fragmented at the region of contact and are ecologically differentiated from the niche space that is suitable for hybrids (Currat, Ruedi, Petit, & Excoffier, 2008). Further, the hybrid zone dynamics literature has focussed on divergence histories involving secondary contact while those with punctuated gene flow or continual gene flow during divergence are largely missing.
The long-term consequences of hybridization are rarely explored due to the paucity of field records describing the locations and composition of hybrid populations across generations (Britch, Cain, & Howard, 2008;Buggs, 2007;Taylor, White, et al., 2014a). This shortcoming is often overcome by utilizing the mathematical theory of clines to assess coincidence in cline centres between nuclear and mitochondrial genomic data (Krosby & Rohwer, 2009;Souissi, Bonhomme, Manchado, Sfar, & Gagnaire, 2018), as well as between genetic markers and morphological traits associated with reproductive isolation (Arntzen, de Vries, Canestrelli, & Martínez-Solano, spatially explicit, individual-based simulations and linkage disequilibrium variance partitioning, we note a significant contribution of adaptive introgression towards this northward movement, despite the potential for differences in regional population size to aid hybrid zone movement. Overall, our study demonstrates that hybridization between recently diverged species can increase genetic diversity and generate novel allelic combinations. These novel combinations may allow range margin populations to track favourable climatic conditions or facilitate adaptive evolution to ongoing and future climate change.

K E Y W O R D S
CDMetaPOP, cline analysis, conifers, forest management, hybrid zone movement, hybrid zones, range dynamics 2017; Gay, Crochet, Bell, & Lenormand, 2008;Martin & Cruzan, 1999;Rohwer, Bermingham, & Wood, 2001). As such, the lack of coincidence in cline centre across different datasets occurs due to nonequilibria between drift and selection and is often referred to as shifts in species range margins or in the central location of the hybrid zone. This signature of hybrid zone movement, as seen through noncoincident cline centres, can help forecast the rate and direction of change in species range margins (Walsh, Shriver, Olsen, & Kovach, 2016), thereby streamlining conservation efforts. Although hybrid zone movement is ubiquitous across systems, the eco-evolutionary processes at play are not well understood. Here, we overcome this hurdle by combining genomic and geographic cline analyses with individual-based spatial simulations of a hybrid zone. We utilize two species of hybridizing white pines inhabiting mountainous regions in western North America to assess hybrid zone movement under a history of continuous gene flow and ecological divergence (Menon et al., 2018b). Besides being key components of the montane ecosystems in western North America (Looney & Waring, 2013;Windmuller-Campione & Long, 2016), most sister species of pines exhibit weak isolating barriers (Critchfield, 1986), rendering them useful to study the interaction between adaptive and neutral introgression in driving hybrid zone dynamics.
Pinus flexilis E. James is distributed from northern Arizona and New Mexico in the southwestern United States to central Alberta, Canada, while P. strobiformis Engelm. ranges from southern Arizona and New Mexico to Jalisco in southern Mexico. The hybrid zone between P. strobiformis and P. flexilis was first described by Engelmann in 1971 as extending across northern Arizona and north-central New Mexico. This has recently been corroborated by morphological and genetic data (Bisbee, 2014;Menon et al., 2018b;Tomback, Samano, Pruett, & Schoettle, 2011). Both species grow at moderate to high elevations, but are divergent when characterized within multivariate niche space (Menon et al., 2018b;Moreno-Letelier, Ortíz-Medrano, & Piñero, 2013). This multivariate niche space is represented by a combination of drought intensity and duration or magnitude of subzero temperatures, such that P. strobiformis occurs in areas of relatively higher drought and fewer days of sub-zero temperatures when compared to P. flexilis. Despite the ecological niche divergence and a long history of divergence with gene flow, the two species lack strong isolating barriers and the hybrid zone continues to exchange genes only with P. flexilis ( Figure 1; Menon et al., 2018b). The absence of strong isolating barriers is supported by field observations indicating the presence of populations containing trees with mixed ancestry being proximal to trees with pure parental characteristics (Steinhoff F I G U R E 1 Map of sampled populations (squares) with overlaid polygons representing the geographical range of Pinus strobiformis (green) (obtained from Shirk et al., 2018) and P. flexilis (blue). The two horizontal lines represent the geographic locations of the cline centre as estimated from morphological (continuous line) and genomic data (dashed line). The dashed oval represents the full extent of the hybrid zone (as defined here). The inset figure shows the best fit demographic model from Menon et al. (2018b), indicating a history of divergence with gene flow and contemporary gene flow only between P. flexilis and the hybrid zone & Andresen, 1971) and by a genome-wide dataset demonstrating the lack of loci associated with reproductive isolation (Menon et al., 2018b). These observations lead to two possible implications for the hybrid zone dynamics in this system. First, given a history of divergence with gene flow, the P. strobiformis-P. flexilis hybrid zone could be relatively spatially stable in comparison with recent hybrid zones or hybrid zones formed by secondary contact (Barton & Hewitt, 1985). Second, the ongoing introgression from P. flexilis into the hybrid or P. strobiformis genomic background could cause spatio-temporal movement of the hybrid zone. We hypothesized that the latter is more likely the case given the spatial distribution of the hybrid zone on a fragmented landscape facilitating a higher rate of neutral introgression from P. flexilis than vice versa and the possible adaptive introgression of freeze-related loci that may facilitate northward hybrid zone movement due to preference for cooler climatic conditions in this group of pines (Frankis, 2009;Larson & Moir, 1987;Moreno-Letelier et al., 2013;Shirk et al., 2018). If our hypothesis holds true, then, management of hybrid populations should be prioritized, as these regions may contain novel allelic combinations that could make trees resilient by facilitating northward range expansion or by providing the raw material to adapt to rapidly changing climatic conditions.
To test our hypothesis, we combined information from empirical analyses of genetic and morphological data with spatial simulations of hybrid zones. Specifically, we address three questions that help us test distinguish the hypothesis: (1) Can we detect a signature of movement in the P. strobiformis-P. flexilis hybrid zone that was formed under a history of divergence with gene flow? (2) How does the empirical data pattern compare to hybrid zone dynamics noted under other models of divergence common in the literature? and (3) If hybrid zone movement is noted in the empirical dataset, can the magnitude of movement be explained purely by demographic processes?
We address question 1 and question 2 by comparing geographic cline centre estimates between the two empirical datasets (genomic and morphological) and then assessing changes in geographic cline centre across different simulated scenarios. To address question 3, we combine results from genomic and geographic cline analyses within a contingency table along with linkage disequilibrium (LD) variance partitioning. Since our simulated dataset does not assume any form of selection, comparing temporal change in LD and relative shifts in the geographic cline centre between the empirical and simulated dataset proved useful to address question 2. Our results shed light on the role of introgression in facilitating hybrid zone movement, beyond what is likely to occur due to differences in regional population sizes. Thus, we emphasize the utility of hybrid zone populations as a key conservation resource owing to the value they provide for assisted migration or adaptation to rapidly changing climatic conditions.

| Data generation
We subsampled 34 of the 55 populations from Menon et al. (2018b) in order to match the locations for which morphological data were available (details under morphological dataset, Figure 1 and Figure   S1). This resulted in a dataset of 332 trees with 3-10 individuals per population. For P. strobiformis, we sampled 21 populations, of which 16 were from the putative hybrid zone and 5 from pure parental populations. In Menon et al. (2018b), the pure populations of P. strobiformis were referred to as the "Core" while the hybrid zone populations were referred to as the "Periphery." For P. flexilis, we sampled 13 populations, of which 6 were closer to the hybrid zone and 7 were from pure parental populations outside the hybrid zone.

| Morphological data set
Pinus strobiformis morphological data were obtained from 40 and 39 natural populations (3-8 trees/population) in Mexico and the United States, respectively ( Figure S1). Each population was separated by a minimum distance of 50 km, and each tree within a stand was separated by a minimum distance of 50-70 m (Goodrich, Waring, & Flores-Renteria, 2018). Mean cone length (cm) and mean seed weight for 10 filled seeds (g) were obtained from 10 air-dried ripe cones per tree with no visible signs of insect or disease damage. Further information about data collection and processing of the samples are detailed in Leal-Saenz et al., 2018). For P. flexilis, morphological data were obtained from 13 natural populations (5-10 trees/population) in Colorado and southern Wyoming (Figure 1). Choice of populations and trees along with the protocol used for cone and seed measurements were similar to those for P. strobiformis.
The use of cone length and seed weight to assess hybrid zone movement is based on their association with fitness and specifically with reproductive output in conifers (Mosseler et al., 2000). Further, they are often used as diagnostic traits to distinguish pine species (Bisbee, 2014;Frankis, 2009;Leal-Saenz et al., 2018). Of the 55 populations in the genetic dataset, only 24 had exactly matching coordinates with the morphological dataset. To increase our sample size, we averaged the morphological data from populations that were within a 5-km radius of each missing population in our genetic dataset. Through this approach, we were able to add 10 populations, resulting in a total of 34 populations in the final dataset ( Figure 1). The choice of 5 km is based on pollen dispersal kernels and paternity analysis in pines demonstrating pollen viability even at 41 km away from its source of origin (Williams, 2010). Hence, individual trees within a 5-km radius can be considered closely related to each other. To bolster this argument, we utilized an independent genetic dataset and estimated individual relatedness for trees along gradients of geographical distance from 0 to 800 km within the hybrid zone by using the probability of sharing two alleles implemented within RelateAdmix (Moltke & Albrechtsen, 2014). Our assessment indicated that relatedness did not change much across this range of geographical distances (results not shown), thereby justifying the use of a 5-km threshold to group populations.

| Genetic dataset
Genotypic data were taken from Menon et al. (2018b). In brief, ddRADseq libraries following Parchman et al. (2012) were generated from total genomic DNA extracted from needle tissue of 445 individuals across 55 populations. These were downsampled to 332 individuals across 34 populations, following the approach detailed above. Each library contained up to 96 multiplexed samples that were individually digested using EcoR1 and Mse1 restriction enzymes.
Fragments in the 300-400 bp size range were selected post-PCR and sequenced on Illumina HiSeq 2500 at the Nucleic Acids Research Facility located in Virginia Commonwealth University. An initial set of single nucleotide polymorphisms (SNPs) were obtained by processing the output FASTQ file using the dDocent bioinformatics pipeline (Puritz, Hollenbeck, & Gold, 2014). These SNPs were subsequently filtered using custom Python scripts to yield a final set of 51,633 SNPs.

| Simulation data set
We used CDMetaPOP v1.14 (Landguth, Bearlin, Day, & Dunham, 2016) to simulate the spatial movement of individuals under varying modes of speciation. Briefly, CDMetaPOP is a spatially explicit and individualbased eco-genetic model of meta-population dynamics that simulates demographic and genetic processes as interactions between individuals located across a number of patches containing meta-populations (hereafter, "groups"). Our landscape included three groups: P. strobiformis, hybrid zone and P. flexilis. We matched the spatial set-up (location and extent of each group), dispersal parameters and the degree of genetic divergence among groups within CDMetaPOP to empirical estimates appropriate to our study system. Further details about parameterization and landscape set-up are listed in Appendix S1 and Table S1.
In order to match the simulation framework to the empirical data set, we divided the simulation workflow into three phases across the two models of speciation shown in Figure 2. Phase I only included two groups, pure P. strobiformis (green patches) and pure P. flexilis (blue patches) and represents the initial process of divergence between them. Patches in the middle were available but not yet occupied. We allowed the cycle of birth, migration, reproduction and death to occur every year and continue for 200 nonoverlapping generations. Phase II was the colonization phase, during which individuals from pure P. strobiformis started colonizing empty patches in the middle of the landscape for 20 nonoverlapping generations (light green). Once each patch in the middle had attained 50% carrying capacity, the amonggroup gene flow parameter was modified to generate hybrids and to incorporate the influence of spatially restricted gene flow in scenarios A.ii, B.ii and B.iii (Phase III: Figure 2 and Table S1). Phase III consisted of two scenarios that are common across the models of secondary contact (A) as well as models of speciation with gene flow (B). For scenario 1 (Phase III.i), the hybrid zone experienced bidirectional dispersal from both P. flexilis and P. strobiformis, whereas for scenario 2 F I G U R E 2 Layout for the simulation framework implemented in CDMetaPOP. The three colours (blue, brown and green) correspond to patches representing P. flexilis, hybrid zone and P. strobiformis. Arrows represent the directionality of dispersal between groups, with the dashed arrows indicating dispersal reduced by 50%. The scenario names are listed below Phase IV. Patch and movement parameters are detailed in Table S1 (Phase III.ii) the hybrid zone experienced bidirectional dispersal only from P. flexilis, in accordance with the best fit demographic model in Menon et al. (2018b). For the speciation with gene flow model, we added a third scenario (PhaseIII.iii) that was similar to Phase III.i but here among-group dispersal was reduced by 50%. All scenarios in Phase III were run for 500 generations. Phase IV of our simulation was the same across all conditions, and only included bidirectional dispersal between the hybrid zone and P. flexilis for 300 generations.
Parametrization for Phase IV was set in accordance with the contemporary pattern of gene flow as estimated from the best fit demographic model identified in Menon et al. (2018b). Overall, we had five scenarios at the end of Phase IV, of which scenarios A.i and A.ii were nested within the secondary contact model (Model A) and scenarios B.i, B.ii and B.iii were nested within the speciation with gene flow model (Model B). For each scenario, we performed 12 replicate runs.
To ensure that each of the scenarios mimicked the overall pattern of divergence in our empirical dataset, we estimated the overall levels of genetic differentiation (F ST ) and differentiation among groups (F CT ) using the hierarchical model implemented in HIERFSTAT (Goudet, 2005). Measures of genetic differentiation were obtained every 50 generation throughout our simulation to ensure that our divergence level matched the observed level of divergence. Our overall analyses indicated that the average value of F ST and F CT converged to the empirical estimates, specifically for scenarios that included some form of gene flow during Phase III (Table 1). Further details about the change in F ST and F CT values across time and across scenarios are presented in Appendix S2.

| Geographic and genomic cline analyses
The geographic cline analysis was conducted for both empirical (genomic and morphological dataset) and the simulated datasets; however, genomic cline analysis was conducted only for the empirical dataset. We first compared the geographic cline centre estimates between the genomic and the morphological dataset to determine whether the empirical data supported hybrid zone movement (hypothesis 2). Next, we used simulations to assess the temporal change in geographic cline centre across various demographic scenarios with two common types of divergence histories. Finally, we combined the results from genomic and geographic cline analyses to assess whether hybrid zone movement can be expected in the absence of selection.
We estimated great circle geographical distances (km) from the southernmost population to all other sampled populations using the package Geosphere v-1.5.7 (Hijmans, 2017) Team, 2017). Prior to fitting clines to the morphological dataset, we conducted a Shapiro-Wilk's test for normality in R and visualized the Q-Q plot of cone length and seed weight across the pure parental populations of both species. Both seed weight and cone length were normally distributed (P. strobiformis seed weight p-value = 0.44, and cone length p-value = 0.07; P. flexilis seed weight p-value = 0.38, and cone length p-value = 0.81) and hence satisfied assumptions of cline models (Barton & Gale, 1993). For the genetic data, we conducted two sets of geographical cline analyses-one for the global admixture coefficients estimated using fastSTRUCTURE (Raj, Stephens, & Pritchard, 2014) based Q-scores from all 51,633 SNPs, and a second set using the allele frequency of each of the nearly diagnostic SNPs (as in Wielstra et al., 2017). Nearly diagnostic SNPs were defined as those in the top 10% percentile of allele frequency difference between the pure parental ranges of both species (n = 4,857 SNPs).
We utilized the Metropolis-Hastings Markov chain Monte Carlo (MCMC) algorithm implemented within HZAR v:0.2.5 in R (Derryberry, Derryberry, Maley, & Brumfield, 2013) to conduct cline fitting. We ran six replicate cline models for each dataset with different random seeds, each having a chain length of 100,000 steps after a burn-in of 10,000 steps. For each fitted model, the trace plot of each parameter estimate across replicate MCMC runs was examined to assess whether runs had converged to the same value. For the morphological data, we fit five models with varying exponential tail estimations (none, left-only, right-only, mirror tails and both tails estimated separately) and assessed their fit using the corrected Akaike information criteria (AICc) model selection framework. For the best fit model, we obtained maximum-likelihood estimates (MLEs) of geographical cline centre and cline width, as well as the 95% confidence interval (CI) given by ±2 log-likelihood units (2 LLU) around the MLEs.
For the Q-score and individual allele frequency estimates from 4,857 nearly diagnostic SNP set, we fit 15 different cline models with varying combinations of the tail (5 possibilities detailed above) and scaling parameters (fixed at 0 & 1, estimated or observed values). The 15 different genetic cline models were compared against each other and also against a null model of no cline using AICc model selection.
Geographic centre estimates from the best fit model were obtained for a total of 4,858 different cline models (1 global Q-score + 4,857 nearly diagnostic SNPs). If the geographic cline centre estimates for the Q-score and morphological dataset were noncoincident, it TA B L E 1 Among group (mean F CT ± 1SE) and among population within group (mean F ST ± 1SE) genetic differentiation measures from Phase I to Phase III across Model A and Model B in the simulations indicated a shift in the location of the hybrid zone towards the estimate obtained using the Q-score.
For the simulated datasets, Q-score estimates from fastSTRUC-TURE at K = 2 were used to conduct the geographic cline analysis using a similar approach as detailed above. However here, the comparison was made across time rather than between Q-score and the morphological dataset. We estimated the geographic cline centre across replicate runs for all scenarios starting at generation 500 for every 5 generations up to generation 520 and then every 50 generations up to generation 1,020. The geographic cline centres from the simulated scenarios were compared to the Q-score estimate from the empirical dataset by assessing the percentage change in the cline centre relative to the total vertical spatial extent of the respective landscapes (simulated vs. empirical). We then examined whether the relative change noted was equal to or greater than the    Table 2). The MLE of cline width for seed weight was 53 km wider than that of cone length (Table 2).

| Geographic and genomic cline analyses
Using the Q-score obtained from all 51,633 SNPs in fastSTRUC-TURE, the MLE of geographic cline centre and width was 1,523 and 106 km, respectively ( Figure 3c; Table 2). This best fit model included an exponential mirror tail fit and fixed scaling parameter set to 0 and 1 at either end of the cline. The 95% CI around the MLE of Q-score cline centre did not overlap with the cline centre estimate for either morphological trait (Table 2). Overall, the average geographic cline centre estimate for Q-score was shifted northward by 446 km relative to that of the morphological data, corresponding to a shift of 21% relative to the full vertical extent of the landscape. Of the 4,857 nearly diagnostic SNPs (defined in the Section 2), 1,266 were excluded from further analysis either due to the null model (no cline) being the best fit to the data (n = 977) or because the cline centre estimate was greater than the actual length of the transect used in our study (n = 289). For the remaining 3,590 SNPs, the mean (median) estimate of geographic cline centre was 1,358 km (1,484 km) ± 6 km (standard error, SE) units around the mean.
Of these 3,590 SNPs, the geographic cline centres for 608 were coincident with the Q-score cline centre, whereas 523 had a cline centre coincident with the morphological trait centre. Of the 608 that were coincident with the Q-score cline centre, 109 were positive α outliers (P. flexilis ancestry) and 325 were negative α outliers (P. strobiformis ancestry). Of the 523 SNPs that were coincident with the morphological trait centre, 56 were positive α outliers and 212 were negative α outliers. The 3 × 3 chi-square contingency test for the association between α outlier categories and geographic cline centre categories was significant (Χ 2 = 76.727, df = 4, p-value = 8.8 × 10 −16 ). This significance was predominantly driven by loci that were not α outliers and exhibited a strong association with the morphological cline centre. Our results also indicated that SNPs overlapping with the Q-score cline centre (i.e., exhibiting a northward shift) were strongly associated with positive α outliers (i.e., retention of P. flexilis ancestry). However, SNPs with geographic cline centres overlapping with the morphological data were likely to be identified as either negative α outliers (i.e., retention of P. strobiformis ancestry) or not an outlier with respect to α (Table 3).
For both Models A and B in the spatial simulations, the geographic cline centre estimate stabilized across time and across replicates as we moved from high levels of dispersal to spatially restricted patterns of dispersal (Figure 4). Across these models, Phase III exhibited greater variation in the geographic cline centre estimate F I G U R E 3 Empirical data-based geographic cline as a function of distance from the southernmost population plotted for (a) cone length (b) seed weight and (c) Q-score obtained from fastSTRUCTURE

| Linkage disequilibrium
For the 4,857 nearly diagnostic SNPs obtained from the empirical dataset, the ratio of within-population-to-among-population LD (D IS :D ST ) exhibited a sigmoidal association with geographic distance from the southernmost population in our study (Figure 5a). Towards the southern edge, D IS was lower than D ST . As we move towards the northern margin of the hybrid zone, the ratio exhibits a steep positive increase, which is attributable to a sharp increase in D IS estimate while D ST remained mostly constant across the transect (Table S3).
Spatial pattern of change in D IS :D ST across all simulated scenarios exhibited a similar sigmoidal pattern as noted in the empirical dataset. Although the overall sigmoidal pattern of change does not differ between generation 300 and generation 1,020, the D IS :D ST values were larger near the northern limit of the hybrid zone during generation 1,020 ( Figure S3). The largest increase in D IS :D ST was noted for scenarios A.ii and B.ii at generation 1,020 relative to that at 300.

| D ISCUSS I ON
Reduced genetic diversity and maladaptive gene flow from core populations limit species geographical range expansions (Gilbert et al., 2017). Using a combination of individual-based spatial simulations and a genome-wide empirical dataset, we examined how differences in regional population size between parental species and the hybrid zone (see Landscape set-up under Appendix S1) along with adaptive introgression can overcome these constraints in a white pine species complex. Linking individual-based, spatially explicit simulations with empirical analysis of genomic data has enabled us and several others (Cushman, 2014(Cushman, , 2015 to predict and explain the influence of complex genetic processes such as introgression and varying divergence histories on patterns of genetic diversity across the landscape. Our results demonstrate that adaptive introgression is likely facilitating northward range expansion in Pinus strobiformis, beyond that which could be attributed to differences in regional population size alone, as seen in our simulation study. However, we caution that conservation efforts at the species or population level should not solely rely on results from cline analysis, and that results such as ours should ideally spawn further detailed studies to aid management of interacting species. TA B L E 3 Percentage contribution of each category to the 3 × 3 contingency test using genomic cline centre (α) and geographic cline centre estimates. Values listed are the percentage contribution to the chi-square statistic F I G U R E 4 Relative percentage change in the geographic cline centre estimate across generations for all five simulated scenarios in CDMetaPOP. The change at any given generation is relative to the estimate at generation 500 and to the total spatial extent of the simulated landscape tre as compared to that of loci with P. strobiformis ancestry (Table 3).

| Climate-driven movement of the hybrid zone
Such coincidence between locus-specific ancestry and geographic

| Role of demographic processes
Although rare, previous studies have demonstrated hybrid zone movement in the direction opposite of that predicted by global F I G U R E 5 Change in the ratio of within-population-to-among-population variance components (D IS :D ST ) for the empirical dataset plotted as a function of geographical distance for 11 sets of 4 populations (a) using 4,857 nearly diagnostic SNPs and (b) using 20 bootstrapped sets of nondiagnostic SNPs. The dotted vertical line represents the Q-score geographic cline centre. The three colours represent the geographical location of pure P. strobiformis (grey), hybrid zone (brown) and pure P. flexilis (green) climate change. For example, in an oak species complex, differences in relative abundances among Quercus species at the zone of contact have been shown to drive the genomic composition of hybrid individuals and the directionality of backcrossing (Lepais et al., 2009). In this study system, undocumented differences in relative abundance (but see Appendix S1 for proxy estimates) or ongoing introgression only from P. flexilis (Menon et al., 2018b) could also be driving the inferred hybrid zone movement. If true, such recent introgression at the expanding range fronts could globally elevate within-population LD relative to the among-population LD (Ohta, 1982), as observed at the northern range front of the hybrid zone in the empirical data herein.
Patterns of LD have had a long history of use to detect hybrid zone dynamics (Dasmahapatra et al., 2002;Mallet et al., 1990;Wang et al., 2011); however, the implementation of variance partitioning to distinguish between genetic drift and selection has been infrequent.

| Role of intrinsic processes
In addition to differences in regional population sizes, the asymmetric pattern of introgression could result from intrinsic selection pressures such as the formation of co-adapted gene complexes (Barton & Hewitt, 1985). For example, in the hybrid zone between Townsend and Hermit Warblers, a downward latitudinal shift in the hybrid zone has been attributed to fitness differences and competitive superiority of Townsend Warbler, which is a more northerly species (Krosby & Rohwer, 2009 (Table 3).
Comparisons between the empirical and simulated datasets demonstrated that although differences in regional population size of each species and intrinsic selection pressures can drive spatio-temporal dynamics of the hybrid zone under certain scenarios of gene flow, the magnitude of movement observed in our study is unlikely to have occurred without invoking adaptive introgression from P. flexilis. hybrid populations within the expanding front will be determined by differences in their ability to withstand the dual pressure of freezing temperatures and white pine blister rust. However, coincidence of both biotic and abiotic stress tolerance traits may be problematic considering that they may be interrelated (Vogan & Schoettle, 2015. Documenting this trade-off across populations and among hybrid versus parental species will be important for management efforts given the projected northward range expansion of the hybrid zone and continued spread of C. ribicola (Schwandt, Lockman, Kliejunas, & Muir, 2010). Preliminary data from common garden trials screening for genetic resistance to C. ribicola, however, suggest higher quantitative resistance in hybrid populations in comparison with pure P. flexilis (Pers. Comm. Sniezko RA and Schoettle AW). If this pattern holds true, the putative combination of greater freeze tolerance (in comparison with pure P. strobiformis) and higher resistance to white pine blister rust (in comparison with pure P. flexilis) in the hybrid populations will make them an ideal source for assisted migration and germplasm conservation efforts. Future and ongoing research across several common gardens will be able to address this hypothesis in detail.

| Implications for forest management
Managers considering outplanting of seedlings, a key strategy for resilience of both species (Schoettle, Jacobi, Waring, & Burns, 2019), should include the expanded hybrid zone identified herein as this might contain novel adaptive trait combinations, while not neglecting southern or isolated populations that may also contain important adaptive traits related to drought stress (Hampe & Petit, 2005). For example, the southern US and Mexican populations may be able to withstand higher drought stress that is projected to occur under the current climate scenario for western North America and hence could be a sustainable source of pure P. strobiformis. Approved seed zones further complicate efforts to select the best planting stock for future conditions since these typically do not incorporate information about hybridization or shifting climate niches. Communication between researchers and managers will be important as new research results become available, to enable managers to make the best choices under existing management and budgetary frameworks.
Our findings present a valuable case study showing that introgression has contributed to increasing genetic diversity in marginal populations and facilitated adaptive evolution in a forest tree hybrid zone. As emphasized since the late 1950s by Stebbins (1959) and recently reviewed by Janes and Hamilton (2017), hybridization can provide genetic resources to assist rapid adaptive evolution and range shifts in species such as P. strobiformis and P. flexilis, which are confronting the dual threats of rapidly changing climate and the invasive tree disease, white pine blister rust. Documenting genetic diversity across the range of the entire species and of the principal hybridizing species is useful for identifying populations requiring the most intervention and for identifying novel seed sources for future reforestation and assisted migration efforts (Aitken & Whitlock, 2013).
Thus, whether long-lived species such as forest trees will adapt, move, or exhibit plastic responses in in the face of rapidly changing climate (Aitken, Yeaman, Holliday, Wang, & Curtis-McLane, 2008) will depend on the availability of standing genetic variation, which is constantly modulated by the dynamic nature of landscapes resulting in new and altered species interactions through time. and VCU Integrative Life Sciences is also acknowledged. We thank the generous computational resources available through VCU HPC grid-engine system and through University of Montana. We also acknowledge the field sampling and laboratory work crew in Mexico and in the United States. We are thankful to Socorro Gonzalez for valuable comments and discussions on this manuscript. Finally, we thank the reviewers for helping improve our manuscript.

CO N FLI C T O F I NTE R E S T
None declared.

DATA ACCE SS I B I LIT Y
Genomic data used in this study are publicly available through Dryad