Characterizing the physical and genetic structure of the lodgepole pine × jack pine hybrid zone: mosaic structure and differential introgression



Catherine I. Cullingham, Department of Biological Sciences, CW 405 – Biological Sciences Building, University of Alberta, Edmonton, Alberta, Canada T6G 2E9.

Tel.: +1 780 492 8368;

fax: +1 780 492 9234;



Understanding the physical and genetic structure of hybrid zones can illuminate factors affecting their formation and stability. In north-central Alberta, lodgepole pine (Pinus contorta Dougl. ex Loud. var. latifolia) and jack pine (Pinus banksiana Lamb.) form a complex and poorly defined hybrid zone. Better knowledge of this zone is relevant, given the recent host expansion of mountain pine beetle into jack pine. We characterized the zone by genotyping 1998 lodgepole pine, jack pine, and hybrid trees from British Columbia, Alberta, Saskatchewan, Ontario, and Minnesota at 11 microsatellites. Using Bayesian algorithms, we calculated genetic ancestry and used this to model the relationship between species occurrence and environment. In addition, we analyzed the ancestry of hybrids to calculate the genetic contribution of lodgepole and jack pine. Finally, we measured the amount of gene flow between the pure species. We found that the distribution of the pine classes is explained by environmental variables, and these distributions differ from classic distribution maps. Hybrid ancestry was biased toward lodgepole pine; however, gene flow between the two species was equal. The results of this study suggest that the hybrid zone is complex and influenced by environmental constraints. As a result of this analysis, range limits should be redefined.


Hybridization between closely related species is relevant to spatial evolutionary dynamics (Barton and Hewitt 1985; Seehausen 2004) and also to questions of conservation and wildlife management. For example, the composition of a hybrid zone can be important for invasive species management (Hoban et al. 2009), maintaining high-quality breeding stock (Burgarella et al. 2009) and quantifying potential losses of species diversity in response to climate change (Quintela et al. 2010). The physical structure of a hybrid zone can take many forms depending on the nature of selection on hybrid genotypes. For example, a tension zone occurs when two species hybridize, but the hybrid offspring show reduced viability/fecundity not related to the environment (Key 1968). This results in a relatively narrow, linear region (e.g., Swenson 2006; Carling and Zuckerberg 2011) that is constrained by selection and immigration (Key 1968; Barton and Hewitt 1985, 1989; Barton and Gale 1993; Arnold 1997). A hybrid zone can also develop when there is an environmental selection gradient (May et al. 1975); this occurs when hybridization happens between species with different environmental adaptations. The structure of the latter class of hybrid zone can be quite complex, and environmental heterogeneity can result in a ‘mosaic’ structure in the hybrid zone (Harrison 1993; Arnold 1997; Bridle et al. 2001; Vines et al. 2003) in which there is often differential selection of genotypes that depends on habitat attributes (May et al. 1975). Therefore, identifying and exploring the factors that influence the physical and genomic structure of a hybrid zone can allow us to predict areas of hybridization and better understand genetic interactions between hybridizing species.

Often, morphological characteristics are not sufficient to distinguish hybrids from parental species. In recent years, molecular methods have been shown to distinguish hybrids more reliably (Thulin et al. 2006; Quintela et al. 2010; Cabria et al. 2011). Molecular information, coupled with knowledge of an individual's location relative to the parentals, can be used to identify associations with environmental features that can, in turn, be used to infer the type of hybrid zone. For instance, one would not expect habitat associations for a tension zone, as this zone is considered to be maintained by selection and migration independent of the environment (Harrison 1993; Arnold 1997). As well, because hybrids in a tension zone have lower survival and/or fecundity, the probability of observing advanced generation hybrids would be drastically reduced, with the consequence that most hybrids should be F1 (Jiggins and Mallet 2000; Gay et al. 2008).

Statistical modeling can be used to investigate the relationship between parentals and hybrids with the environment and, in doing so, shed light on the nature of the hybrid zone. Habitat association in the hybrid zone is expected when species have adapted to different environments and have subsequently come into secondary contact (Arnold 1997; Gee 2004; Carson et al. 2012). As a result, parentals will tend to occur in their adapted habitats and hybrids will be found in intermediate environments, largely because they are unable to compete with parentals in the contrasting environments. Using climate and geographic variables, hybrid associations have been investigated for a number of diverse species. For example, Vines et al. (2003) found a strong association between aquatic habitat and allele frequencies for a hybrid zone between fire-bellied toads (Bombina bombina × Bombina variegata), and Dodd and Afzal-Rafii (2004) found that species admixture proportions were correlated with climatic variables for a western North America oak species complex (Quercus wislizeni, Q. parvula, Q. agrifolia, and Q. kelloggii). Swenson (2006) used ecological niche models to determine whether the environment played a significant role in maintaining the position of four avian hybrid zones between Pleistocene-diverged species in North America. In all four examples, aspect and mean annual temperature were the two most important determinants of hybrid zone structure and location. Here, we have an opportunity to investigate the environment–hybrid status relationship across a broad geographic region in a hybrid zone between two important forest species, lodgepole pine (Pinus contorta Dougl. ex Loud. var. latifolia) and jack pine (Pinus banksiana Lamb.).

Lodgepole pine occurs throughout British Columbia, south into the United States following the Rocky Mountains and into western and central Alberta, and is found at a wide range of elevations (Carlson et al. 1999). It is considered mesophytic and occupies a broad range of soil types, including clay soils and bogs (Carlson et al. 1999; Yang et al. 1999). Jack pine is distributed throughout the boreal forest and extends from the Northwest Territories and Alberta to the east coast of Canada (Fig. 1). This species is considered xerophytic and typically occurs on well-drained, nutrient-poor soils (Kenkel et al. 1997). These two species diverged in allopatry owing to Pleistocene glaciations (Wheeler et al. 1983) and came into secondary contact in north-central Alberta approximately 6000 years ago (MacDonald and Cwynar 1985; McLeod and MacDonald 1997; MacDonald et al. 1998). Here, the two species form a hybrid zone with a complicated and poorly defined spatial structure (Wheeler and Guries 1987; Ye et al. 2002; Rweyongeza et al. 2007; Yang et al. 2007). One of the reasons that the hybrid zone has not been well delineated is that lodgepole × jack pine hybrids can closely resemble one or the other of the pure species (Wheeler and Guries 1987; Rweyongeza et al. 2007). However, the recent development of molecular markers has made it much easier to reliably identify hybrids (Cullingham et al. 2011).

Figure 1.

Jack pine (Pinus banksiana) and lodgepole pine (Pinus contorta) species ranges and sample locations in western North America. Range distributions for Pinus contorta and Pinus banksiana were obtained from USGS (accessed 29 July 2010) and are based on Little (1971).

The area of lodgepole pine and jack pine species overlap in Alberta has recently become one of considerable interest. The mountain pine beetle (MPB; Dendroctonus ponderosae Hopkins) is a bark beetle indigenous to western North America that primarily feeds on lodgepole pine. Beetle populations are typically in an endemic phase (Raffa et al. 2008). However, when conditions allow, populations irrupt into large-scale outbreaks (Safranyik and Carroll 2006; Raffa et al. 2008). The most recent outbreak is the largest recorded in the past 125 years, affecting millions of hectares of forest in western Canada and the United States (Hicke et al. 2006; Safranyik and Carroll 2006; Raffa et al. 2008; Bentz et al. 2010). In Canada, the range of MPB has recently expanded into northern British Columbia and across the Rocky Mountains into novel territory in Alberta (Robertson et al. 2009; Bentz et al. 2010; Safranyik et al. 2010) where lodgepole pine and jack pine overlap. There had been some uncertainty regarding the susceptibility of jack pine; however, MPB establishment in this new host was recently documented in natural stands in Alberta (Cullingham et al. 2011). While lodgepole pine is thought to share a long coevolutionary history with MPB (Kelley and Farrell 1998), jack pine is a novel host and thus is considered ‘naïve’. Coevolution of lodgepole pine with the MPB suggests that lodgepole pine defenses would be adapted to protect against MPB, whereas one might speculate that this would not be the case for jack pine. Indeed, the two species exhibit notable differences in their monoterpenoid profiles, a key component of the resin defense system (Lusebrink et al. 2011). These and other potential differences between lodgepole, jack, and hybrid pines in susceptibility to MPB attack or in MPB fitness, as well as differences in species densities, could result in different host-beetle dynamics with important consequences for MPB outbreak dynamics and spread. Accurate species and hybrid distribution mapping is thus an important component of improved risk assessment and prediction of MPB spread.

In this study, we build upon the aforementioned analysis by Cullingham et al. (2011) through the addition of 1320 samples genotyped at the same 11 microsatellite loci. Our objective was to delineate the area of jack–lodgepole pine hybridization in Alberta and estimate the proportion of ancestry from each species into the hybrid zone. Understanding the distribution of pure parentals and their hybrids – and the factors influencing this distribution – is important information for forest management and can be used to develop improved models of MPB spread risk. To understand the factors affecting the distribution of hybrids, we used logistic regression to assess how environmental and climate variables explain the distribution of genetic ancestry. Our specific objectives were to (i) describe the hybrid zone between lodgepole and jack pine, including patterns of ancestry and gene flow in Alberta and British Columbia, (ii) develop a species distribution model using genetic species assignment as a function of environmental and climate variables to determine the extent to which the environment predicts species distribution, and (iii) using this model, spatially delineate the current hybrid zone and compare this with the historical, morphologically based hybrid zone.


Sample collection and microsatellite genotyping

We used 662 of the 678 lodgepole pine, jack pine, and hybrid samples collected in British Columbia, Alberta, Saskatchewan, Ontario, and Minnesota that were previously genotyped by Cullingham et al. (2011, Dryad data doi: 10.5061/dryad.8677) for which detailed geographic information was available. Additional foliage samples were collected from 17 locations in British Columbia and Alberta (n = 901). The majority of the foliage samples were collected from January 2007 to April 2008 from the crown using pole pruners or a shotgun. Detailed field sampling procedures can be found in the study by Cullingham et al. (2011). Additional samples were obtained from nursery stock originating from five distinct locations in British Columbia (n = 18) and seeds from the Alberta Tree Improvement and Seed Centre seed bank (n = 426, seven from a provenance in British Columbia and the remainder from 40 locations in Alberta; Fig. 1).

Seeds were germinated to obtain seedlings for genotyping. Prior to germination, seeds were surface sterilized according to Groome et al. (1991) to remove potential fungal contamination and then stratified to improve germination rates. For stratification, seeds were placed on a moist, autoclaved Kimpad (Kimberly-Clark, Irving, TX, USA) in closed containers and stored at 4°C in the dark for 2 weeks. To germinate, seeds were transferred to fresh, autoclaved seed germination trays lined with moist Kimpad, and placed in a growth chamber (25°C; 75% humidity; 12-h dark/12-h light; 250 μmol m−3 light intensity). Seeds were checked daily and watered when necessary. Seedlings were harvested when they were 2–3 cm long and stored at −20°C until DNA extraction.

Genomic DNA was isolated from ground needle and seedling tissue as described in Cullingham et al. (2011). Genotyping was completed for all individuals at 11 microsatellite loci; details of amplification and data collection can be found in Cullingham et al. (2011).

Diversity measures

The likelihood of microsatellite scoring errors, including stutter errors, large-allele dropout, and null alleles, was estimated in microchecker (Van Oosterhout et al. 2004). We assessed Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium (LD) across loci separately for lodgepole and jack pine, once defined, in genepop 4.0 (Raymond and Rousset 1995; web version, accessed 27 April 2012). Significance was assessed using Bonferroni-corrected alpha-values (Rice 1989). The following estimates of allelic diversity were calculated in GenAlEx 6.0 (Peakall and Smouse 2006): number of alleles, effective number of alleles, observed heterozygosity (HO), expected heterozygosity (HE), and the fixation index (F). Again, these were calculated separately for lodgepole and jack pine once defined.
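As a minimal sketch of how the per-locus diversity measures relate to one another, the following computes the number of alleles, HO, HE, and F from a list of diploid genotypes. It assumes the common definitions HE = 1 − Σp² and F = 1 − HO/HE; the function name and the toy genotypes are hypothetical and are not taken from the study data or from GenAlEx itself.

```python
from collections import Counter


def diversity(genotypes):
    """Return (Na, Ho, He, F) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid tree.
    Assumes He = 1 - sum(p_i^2) and F = 1 - Ho/He (no small-sample correction).
    """
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    total_alleles = 2 * n
    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity under Hardy-Weinberg proportions.
    he = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    # Fixation index: positive values indicate a heterozygote deficiency.
    f = 1.0 - ho / he if he > 0 else float("nan")
    return len(allele_counts), ho, he, f


# Hypothetical microsatellite genotypes (allele sizes in base pairs).
toy = [(100, 100), (100, 104), (104, 104), (100, 100)]
na, ho, he, f = diversity(toy)
print(na, ho, he, round(f, 3))
```

A positive F, as in this toy example, corresponds to the heterozygote deficiency discussed for several loci in the Results.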

Defining species classes

We used both newhybrids 1.1 beta (Anderson and Thompson 2002) and structure 2.3.1 (Pritchard et al. 2000; Falush et al. 2003, 2007) to calculate ancestry using the protocol outlined in Cullingham et al. (2011). For both programs, we used a burn-in of 50 000 and 500 000 Markov chain Monte Carlo (MCMC) sweeps for data collection.

Hybrid ancestry

We calculated the proportion of lodgepole and jack pine ancestry among hybrid individuals using introgress (Gompert and Buerkle 2010). This method calculates a hybrid index that is based on the proportion of alleles inherited from each parental population, and where alleles are shared between the parental populations, the uncertainty is included in the index (Gompert and Buerkle 2009). The parental populations were defined using jack pine in Ontario and Saskatchewan, and lodgepole pine from British Columbia to ensure no hybrids were included. Next, we randomly selected 100 individuals each of lodgepole pine, jack pine, and hybrids from Alberta, and then calculated the hybrid index based on the parental contributions and generated an ancestry plot. The microsatellite markers we used were initially optimized on lodgepole pine; therefore, to ensure that there was no ascertainment bias in estimating the ancestry of hybrids, we used the simulated hybrid classes generated by Cullingham et al. (2011) using hybridlab ver. 1.0 (Nielsen et al. 2006) and estimated the ancestry plots as above in introgress for comparison.
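The idea of a likelihood-based hybrid index can be illustrated with a short sketch. It is not the introgress implementation: it assumes that each observed allele copy is drawn from the jack pine parental frequencies with probability h and from the lodgepole pine frequencies with probability 1 − h, and it finds h by a simple grid search. The function name, the orientation of h (proportion of jack pine ancestry), and all allele frequencies are hypothetical.

```python
import math


def hybrid_index(genotype, freq_lodgepole, freq_jack, steps=1000):
    """Grid-search estimate of h, the assumed proportion of jack pine ancestry.

    genotype: {locus: (allele_a, allele_b)}
    freq_*:   {locus: {allele: frequency}} for each parental population.
    """
    best_h, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        h = i / steps
        ll = 0.0
        for locus, alleles in genotype.items():
            for allele in alleles:
                # Mixture of the two parental allele frequencies, weighted by h.
                p = (h * freq_jack[locus].get(allele, 0.0)
                     + (1.0 - h) * freq_lodgepole[locus].get(allele, 0.0))
                ll += math.log(max(p, 1e-12))  # guard against log(0)
        if ll > best_ll:
            best_h, best_ll = h, ll
    return best_h


# Hypothetical parental allele frequencies at two loci; alleles are shared
# between species, so ancestry is inferred from frequency differences.
freq_lodgepole = {"A": {1: 0.9, 2: 0.1}, "B": {1: 0.8, 2: 0.2}}
freq_jack = {"A": {1: 0.1, 2: 0.9}, "B": {1: 0.2, 2: 0.8}}

h_lodgepole_like = hybrid_index({"A": (1, 1), "B": (1, 1)}, freq_lodgepole, freq_jack)
h_mixed = hybrid_index({"A": (1, 2), "B": (1, 2)}, freq_lodgepole, freq_jack)
h_jack_like = hybrid_index({"A": (2, 2), "B": (2, 2)}, freq_lodgepole, freq_jack)
print(h_lodgepole_like, h_mixed, h_jack_like)
```

Because the parental populations share alleles, a single genotype rarely identifies ancestry with certainty; the likelihood surface around the estimate reflects that uncertainty.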

Gene flow between the parental species

Gene flow between species was estimated using the software package migrate (v3.2; Beerli 2006). The program uses a coalescent approach to obtain maximum-likelihood estimates of migration (M) and the population-size parameter Θ = 4Neμ, where Ne is the effective population size and μ is the mutation rate, without assuming equal migration or population size (Beerli and Felsenstein 2001). To estimate these two demographic parameters, we used individuals assigned as lodgepole or jack pine at > 0.95 in Alberta and selected a random subset of individuals from the lodgepole pine data equal to the number of jack pine in the sample (n = 297), to avoid the confounding effects of differences in sample size on the estimate of gene flow. We used maximum-likelihood estimation with 10 short chains (1000 trees used of 1 000 000 sampled) and one long chain (50 000 trees used of 5 000 000 sampled), discarding the first 100 000 trees as the initial burn-in. We ran migrate five times with different random number seeds to verify the consistency of our results.

Distribution modeling

We used logistic regression and molecular information to model the probability of occurrence of jack pine, lodgepole pine, and hybrids. The output from structure includes ancestry values (Q) for each of the population clusters, where the ancestry values across the population clusters sum to one for each individual. In this case, we used these continuous values for one of the population clusters as the response variable, and a set of spatial variables that included climate, moisture, and elevation (Table 1) were used as predictors. We did not include samples in the Cypress Hills area (southeast Alberta/Saskatchewan border), Saskatchewan, and Ontario because of their geographic discontinuity with the Alberta–British Columbia samples. We selected the ‘best’ model from among the full set of predictors by balancing a low Akaike Information Criterion (AIC; Burnham and Anderson 2002) against the requirement that the variance inflation factor (VIF) for all predictors be <10 (Zuur et al. 2009). The VIF is a measure of correlation among predictor variables. Model validation was undertaken through bootstrapping 60% subsets of the full data set and comparing the resulting predictions for the remaining 40% to their actual observed values. Model accuracy for each bootstrapped realization was assessed using a receiver operating characteristic (ROC) curve analogue designed for nonbinomial (i.e., continuous) response variables (Obuchowski 2005; Nguyen 2007). ROC curves are a frequently used tool for assessing model performance and sensitivity in predictive habitat models (Fielding and Bell 1997; Jiménez-Valverde 2011). ROC curves are produced by plotting the true-positive rate against the false-positive rate at different thresholds based on the cells from a confusion matrix. Overall model performance can be measured by the area under the ROC curve, also known as the AUC. In effect, AUC represents the probability that the model is making the correct predictions (i.e., true occurrence and true absence). Logistic regression was performed using the glm function in R (Hastie and Pregibon 1992), and model performance was assessed using the nonbinROC package in R (R Core Development Team 2011).
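To make the use of a continuous response in a logistic model concrete, the sketch below fits Q as a function of a single predictor by gradient ascent on the binomial log-likelihood, which accepts fractional responses in [0, 1]. It is an illustration under stated assumptions, not the analysis reported here (which used the glm function in R with several predictors): the single standardized elevation predictor, the toy Q-values, the learning rate, and the function names are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_fractional_logit(xs, qs, lr=0.1, iters=5000):
    """Fit Q ~ b0 + b1 * x on the logit scale.

    qs may be any values in [0, 1]; the gradient of the binomial
    log-likelihood, sum((q - p) * x), has the same form for fractional
    and for 0/1 responses.
    """
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, q in zip(xs, qs):
            resid = q - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical data: standardized elevation and jack pine ancestry (Q).
elevation_z = [-2.0, -1.0, 0.0, 1.0, 2.0]
q_jack = [0.98, 0.90, 0.50, 0.10, 0.02]

b0, b1 = fit_fractional_logit(elevation_z, q_jack)
predicted = [sigmoid(b0 + b1 * x) for x in elevation_z]
print(round(b0, 3), round(b1, 3))
```

In this toy example the fitted slope is negative, so the predicted jack pine ancestry declines with elevation; predicted values stay within [0, 1] and can be thresholded into species classes.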

Table 1. Summary of predictor variables examined when developing a model to spatially predict Q-values. The ClimatePP model used to generate spatial climate data is available at:
Variable | Description | Source
Drought | Climate Moisture Index (CMI) | CFS; Hogg 1997
Elevation | Elevation in meters | NASA ASTER DEM
EXT_Cold | Extreme minimum temperature | ClimatePP v.3.2; Wang et al. 2006; Hamann and Wang 2005; Daly et al. 2002
DD >5 | Growing degree-days (>5°C) | ClimatePP v.3.2
MAP | Mean annual precipitation | ClimatePP v.3.2
MAT | Mean annual temperature | ClimatePP v.3.2
MCMT | Mean coldest month temperature | ClimatePP v.3.2
MWMT | Mean warmest month temperature | ClimatePP v.3.2
Continentality | MWMT − MCMT | ClimatePP v.3.2
NFFD | Number of frost-free days | ClimatePP v.3.2
SHM | Summer heat/moisture index | ClimatePP v.3.2
Latitude | Northing in decimal degrees | Centroid of 10-km cell
Longitude | Easting in decimal degrees | Centroid of 10-km cell

To better understand the individual components of the continuous Qs-based logistic model (i.e., lodgepole pine, jack pine, and hybrids) and their contributions to overall model performance, three individual logistic regression models were built to test the accuracy of the continuous model described above for each species class independently. Here, the response variable for each model was the presence or absence of each of the three species classes. Model performances were tested using cross-validated subsets (60% training, 40% testing) and ROC analysis. Overall model performance was assessed using the AUC. ROC curves and AUC calculation for binomial data were performed using the ROCR package in R (Sing et al. 2009).
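For the binary presence/absence models, the AUC can be read as the probability that a randomly chosen presence receives a higher predicted score than a randomly chosen absence. The sketch below computes that quantity directly and shows one way to draw a random 60%/40% training/testing split; it does not reproduce the ROCR workflow, and the function names, labels, and scores are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (presence, absence) pairs ranked correctly.

    labels: 1 for presence, 0 for absence; ties in score count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def split_indices(n, train_fraction=0.6, seed=0):
    """Random split of n sample indices into training and testing sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]


# Hypothetical test-set labels (hybrid present = 1) and model scores.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
train_idx, test_idx = split_indices(10)
print(example_auc, len(train_idx), len(test_idx))
```

Repeating the split with different seeds, refitting on each training set, and averaging the AUC over the testing sets gives a cross-validated summary of the kind reported in Fig. 3.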

Following model validation, Qs were predicted across Alberta and British Columbia using the continuous logistic regression model. Spatial prediction was carried out at a resolution of 10 km2 and was restricted to the relevant extent of available environmental data. Thresholding of this predicted Qs layer was then undertaken to delineate a molecularly defined jack pine–lodgepole pine hybrid zone, where hybrids are defined by a Qs >0.1 and <0.9 according to Cullingham et al. (2011).
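The thresholding step can be sketched as a simple classification rule. The sketch assumes that Q is expressed as the proportion of jack pine ancestry, so that values near 0 indicate lodgepole pine and values near 1 indicate jack pine; the function name and the example values are hypothetical.

```python
def classify_q(q_jack, lower=0.1, upper=0.9):
    """Assign a species class from a Q-value (assumed jack pine ancestry).

    Hybrids are the values strictly between the two thresholds.
    """
    if not 0.0 <= q_jack <= 1.0:
        raise ValueError("Q must lie between 0 and 1")
    if q_jack <= lower:
        return "lodgepole pine"
    if q_jack >= upper:
        return "jack pine"
    return "hybrid"


# Hypothetical predicted Q-values for four grid cells.
classes = [classify_q(q) for q in (0.02, 0.10, 0.45, 0.97)]
print(classes)
```

Applying the same rule to every cell of the predicted Q layer yields the contours that bound the molecularly defined hybrid zone.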


Diversity measures

A total of 1998 pine samples were genotyped at 11 microsatellite loci. The majority of these had complete genotypes, with a small proportion missing one locus (n = 123) or two loci (n = 11). Data were scored twice independently to ensure minimal scoring errors. A number of loci were out of HWE with a heterozygote deficiency for both jack pine and lodgepole pine (Table 2). One locus had extreme deficiencies and was removed from all analyses. Only one locus pair for jack pine had significant LD and two for lodgepole pine. Allelic diversity measures were calculated for jack pine and lodgepole pine separately and are included in Table 2. Diversity (both number of alleles and heterozygosity) was greater in lodgepole pine.

Table 2. Diversity measures for 11 microsatellite loci typed in jack and lodgepole pine. Number of individuals typed at that locus (N), number of alleles (Na), observed heterozygosity (HO), expected heterozygosity (HE), and the fixation index (F) were calculated in GenAlEx 6.4 (Peakall and Smouse 2006). Bold indicates loci out of Hardy–Weinberg equilibrium
Locus | N | All | Jack pine | Lodgepole pine
Average |  | 0.852, 0.165 | 0.620, 0.093 | 0.831, 0.099

Defining species classes

Species class assignments differed between structure and newhybrids for 46 individuals. In all instances, one method assigned either jack or lodgepole pine, while the other method assigned a hybrid. For the method that assigned the hybrid, the majority of the probability belonged to the pure species assigned by the first method. Of the 1998 trees that were genotyped, the final breakdown for species classes gave 386 jack pine, 1264 lodgepole pine, and 348 hybrids. The average assignment for lodgepole and jack pine was >0.98 for both assignment methods. Figure 2 illustrates the U-shaped distribution of Q-values (Qs) generated in structure for all trees sampled.

Figure 2.

Frequency distribution of Q-values generated in structure for trees sampled in Alberta and British Columbia. The cutoff values used to delineate lodgepole pine and jack pine ancestry are indicated by the dashed vertical lines.

Distribution modeling

The final best-fit model of species class using structure-derived Qs included elevation, drought, precipitation, a summer heat/moisture ratio, extreme cold, and location (latitude and longitude) (Table 3). Cross-validation using 60% training data and 40% testing data and summarized using a nonbinary ROC indicated reasonable overall model performance (mean continuous ROC = 0.758, SD = 0.01, n = 100). Performance of individual species classes was assessed using traditional ROC and AUC (Fig. 3). Performance was best for jack pine and lodgepole pine, and less so for the hybrids. Spatial prediction of Qs using this model corresponded well with the observed hybrid areas. Figure 4 shows that the predicted hybrid zone is more complex, and that both the hybrid zone and the distribution of pine in Alberta are more extensive, than previously described by Little (1971).

Figure 3.

Receiver operating characteristic (ROC) curves for bootstrapped cross-validation tests of individual species logistic regression models. In these models, the response variable was a reclassified binary variable derived from the original continuous Q-values. ROC curves were plotted, and average area under the curve (AUC) was calculated for 100 cross-validation tests using 60% training and 40% testing data. A greater AUC indicates better model performance.

Figure 4.

Species distribution for lodgepole and jack pine modeled by logistic regression (QValue ~ Elev + CMI + MAP + SHM + EXT_Cold + Latitude + Longitude; see Table 3). Color gradient represents the continuous predicted Qs from lodgepole pine (dark green) in the west, to jack pine (light green) in the east. Solid lines represent the 10th and 90th percentiles of the predicted Qs and indicate the revised ‘pure’ species boundaries according to our molecular criteria. The area between the black contours represents the revised, genetically determined pine hybrid zone. Also included are the historical eastern and western boundaries of the lodgepole and jack pine distributions, respectively (dashed lines), outlined by Little (1971) for reference. Historical distribution data were obtained from USGS (accessed 29 July 2010). The spatial extent of prediction was determined by the extent of the climate data model (ClimatePP v.3.2).

Table 3. Logistic model summary. Summary of the chosen ‘best’ model based on minimization of Akaike's Information Criterion (AIC) with variance inflation factors (VIF) <10. VIF is a measure of correlation among predictor variables. LRT refers to a likelihood ratio test that was used to determine the significance of each predictor. All predictors listed were significant. p(jP) refers to the probability of the occurrence of jack pine based on the Q-values. The opposite directions apply to lodgepole pine.
Predictor | VIF | Coefficient | LRT | Effect on p(jP)
(Intercept) |  | 51.112 |  | 
Elevation (Elev) | 6.096 | −0.007 | 724 556 | 
Drought index (CMI) | 2.303 | 0.060 | 329 853 | +
Mean Annual Precipitation (MAP) | 3.761 | −0.001 | 10 140 | 
Summer heat/moisture index (SHM) | 3.671 | −0.006 | 1217 | 
Extreme min. temp. (EXT_Cold) | 3.804 | −0.324 | 111 228 | +
Northing – Latitude | 5.583 | −0.608 | 340 266 | 
Easting – Longitude | 2.189 | 0.235 | 330 995 | +

Hybrid ancestry

There was generally greater lodgepole pine ancestry among hybrids than jack pine, as jack pine only contributed ≥0.5 ancestry for approximately 10 of the 100 hybrids analyzed (Fig. 5A). In contrast, simulated data predicted equal lodgepole pine and jack pine ancestry (Fig. 5B).

Figure 5.

Ancestry plots generated in introgress for hybrid individuals for each microsatellite locus: dark green – lodgepole homozygotes; green – lodgepole/jack pine heterozygote; light green – jack pine homozygote and for the entire individual. (A) 100 hybrid individuals genotyped in Alberta; (B) Eighty simulated hybrid individuals generated from reference lodgepole pine and jack pine.

Gene flow between the parental species

Results were consistent across all five runs. The average migration estimate across runs for lodgepole into jack pine (M = 2.47, SD = 1.01) was very similar to the average migration estimate from jack into lodgepole pine (M = 2.47, SD = 0.81).


Genetic analyses of jack and lodgepole pine in Alberta indicate that the hybrid zone is more extensive and complex than that proposed by Little (1971) and that the ranges of jack and lodgepole pine need to be redefined. The probability of genetic ancestry for these species is well predicted in Alberta by geography and habitat. Hybrids were found to occur in patchily distributed intermediate habitats that suggest a mosaic hybrid zone (Harrison 1993; Arnold 1997). Among hybrids, there is a greater proportion of lodgepole pine ancestry that could allow for the transfer of genes associated with effective defense against MPB from the coevolved lodgepole pine populations to naïve jack pine populations.

We found homozygote excess across a number of loci (Table 2), a phenomenon that has been attributed to selfing in other studies of deciduous and coniferous trees (Guries and Ledig 1982; Korshikov et al. 2007; Sutherland et al. 2010). Dancik and Yeh (1983) also reported positive values of F for lodgepole (F = 0.025) and higher values than we found for jack pine (F = 0.097), which they attributed to local inbreeding where seeds may tend to fall close to their maternal parent (Libby et al. 1969). However, in allozyme studies on two lodgepole pine populations, Epperson and Allard (1984, 1989) estimated outcrossing to be almost complete. Yet, their findings may be atypical and they suggested that the high density of trees (~1000/acre) contributed to the high outcrossing estimate. Therefore, a low rate of selfing may contribute to the homozygote excess observed in lodgepole pine. Our data were also collected across a very broad geographic range where some weak population substructure, perhaps owing to isolation by distance, could cause a Wahlund effect (Hedrick 2005); when we analyzed lodgepole pine at the stand level, we did find an average reduction in F to 0.060 (data not shown). Based on this, the Wahlund effect accounts for approximately 0.04 of the bias we observed and the residual 0.06 represents the potential effects of inbreeding and genotyping error. Genotyping errors are expected to strongly affect parentage and individual identification studies in which correct genotypes are critical (Bonin et al. 2004; Pompanon et al. 2005). On the other hand, the effects of genotyping error on analyses based on allele frequencies can be minor as long as sample sizes are large. Given the large sample size we have analyzed and the fact that detection of hybrids will rely on differences in allele frequency between the parental types, a small percentage of error is not expected to significantly influence our main results. 
The rate of missing data for each locus (Table 2) in our study is quite low (0.1–2%). Even if all missing data were attributed to null allele homozygotes, the null allele frequency would still be quite low (<4%). Given that our estimated error rate based on analyzing 48 duplicate samples is 0.8% (Cullingham et al. 2011), any cumulative effects of error on our analysis and the main results should be negligible.

The distribution of lodgepole pine, jack pine, and their hybrids is well predicted using geographic and environmental variables (Fig. 3) and supports the hypothesis that this hybrid zone developed following secondary contact of the two tree species that evolved in allopatry (Arnold and Bennett 1993; Harrison 1993; Moore and Price 1993). In these instances, parental genotypes have the potential to outcompete hybrids in their adapted habitat; if this is true for the species under consideration, we would expect to find jack pine at low elevations and on highly drained soils, and lodgepole pine at higher elevations and in moist soil conditions, given their habitat preferences (Kenkel et al. 1997; Carlson et al. 1999; Yang et al. 1999). Our statistical model bears this out: parental species and hybrids each co-localize with their preferred habitats. Most importantly, the zone of hybridization is much larger yet patchier than previously described (Little 1971) and represents a significant and spatially complex region.

Species classes were accurately predicted by elevation and drought index, as has been identified in other tree species distribution models (Sykes et al. 1996). Additional significant predictors were related to soil moisture and temperature (Table 3). Extreme cold (EXT_Cold, summarized as a 30-year normal) was also identified as significant, whereas the mean coldest month temperature (MCMT) was not. This result supports recent findings that temperature extremes can be more useful in predicting species distribution than averages (Jentsch et al. 2007), especially in trees (Zimmermann et al. 2009). That longitude and latitude were significant predictors, even in combination with multiple climatic predictors, suggests that additional sources of variation remain to be elucidated in the spatial distribution of lodgepole and jack pines in western Canada. As currently implemented, the geographic predictors likely capture one or several latent predictors that may relate to soil type or ecoregional classification. The basis of the habitat relationships likely resulted from the adaptation of the tree species to different environments during the Pleistocene glaciations (Godbout et al. 2005, 2008). Using this model, we found that hybrids tend to occur in transition areas between parental habitats. Using continuous habitat data, we predicted the probability of occurrence for the different species classes – including hybrids in Alberta and eastern British Columbia – and redefined the parental distributions and the hybrid zone (Fig. 4).

We were less successful at predicting hybrids in the southern region of Alberta, near the British Columbia border. Here, the model over-predicted the occurrence of lodgepole pine (Qs < 0.1), whereas genetic analysis identified trees in this area as hybrids (Fig. 4). One potential explanation for this over-prediction is that the coarse spatial resolution of the prediction layer (10 km²) does not accurately capture fine-scale habitat variation. More likely, however, is that lodgepole pine individuals were misassigned to the hybrid class in this region, given the large distances from the jack pine distribution. Analysis of simulated data using these markers did show that a small percentage (<2%) of pure lodgepole pine were misassigned as hybrids (Cullingham et al. 2011). Our data also show increased ‘noise’ for lodgepole pine and hybrid ancestry (Fig. 2). Indeed, the histogram of Qs (Fig. 2) indicates a definite peak for jack pine that drops sharply, whereas lodgepole pine displays a more gradual decline, which makes the threshold value that differentiates lodgepole from hybrids less clear. This result is not unexpected given the greater proportion of lodgepole pine ancestry in the hybrid class (Fig. 5A). We hypothesize that model prediction could be further improved using genomic, mitochondrial, and chloroplast species-discriminating SNPs, which would increase resolution because the parentals would not share alleles (Thompson et al. 2010; Väli et al. 2010).
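The sensitivity of class assignment to the threshold can be made concrete with a minimal sketch. It assumes that Q is the estimated proportion of jack pine ancestry, that lodgepole pine is assigned when Q < 0.1 (as quoted above), and that a symmetric upper cut-off of 0.9 defines jack pine; the upper cut-off and the function name `classify_by_ancestry` are assumptions made for illustration, not values taken from this section.

```python
def classify_by_ancestry(q_jack, lower=0.1, upper=0.9):
    """Assign a species class from an ancestry coefficient.

    q_jack : assumed proportion of jack pine ancestry (0 to 1).
    lower  : Q below this value is called lodgepole pine (0.1 in the text).
    upper  : Q above this value is called jack pine (0.9 is an ASSUMPTION).
    """
    if not 0.0 <= q_jack <= 1.0:
        raise ValueError("ancestry coefficient must lie between 0 and 1")
    if q_jack < lower:
        return "lodgepole"
    if q_jack > upper:
        return "jack"
    return "hybrid"


if __name__ == "__main__":
    # Two trees with nearly identical ancestry fall on either side of the
    # lower threshold, so small estimation noise changes the assigned class.
    for q in (0.08, 0.12, 0.50, 0.95):
        print(q, classify_by_ancestry(q))
```

Because the lodgepole pine end of the Q distribution declines gradually, many individuals sit close to the lower cut-off, so modest estimation error near that value is enough to move a tree between the lodgepole and hybrid classes.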

Unequal ancestry within the hybrid population despite equal gene flow between the two species can be explained by several factors. First is the history of the two species. Based on pollen records, lodgepole pine colonized Alberta approximately 6000 years ago, 1000 years before jack pine (Ritchie and Yarranton 1978; MacDonald and Cwynar 1985; McLeod and MacDonald 1997; MacDonald et al. 1998). As well, the initial invading populations of jack pine were small and took a considerable amount of time to reach modern densities (MacDonald et al. 1998), and modern lodgepole pine populations tend to have higher densities than jack pine (Nealis and Peter 2008). These distributional differences could lead to increased opportunities for lodgepole pine to mate with hybrids, both historically and contemporarily, leading to greater lodgepole pine ancestry among hybrids. Second, lodgepole pine tends to be more of a habitat generalist (Carlson et al. 1999) in comparison with jack pine. As such, hybrids with increased lodgepole pine ancestry would presumably exhibit greater habitat tolerances and be able to successfully establish in a wider range of habitats. Finally, there could be reproductive barriers. For some hybridizing tree species, differential gene flow has been shown to occur because only one of the parentals is able to successfully mate with hybrid offspring. For example, Floate and Whitham (1993) looked at a cottonwood (Populus fremontii S. Wats. × P. angustifolia James) hybrid zone and found that F1 offspring only backcrossed with one of the parents. The same has been found for other Populus species (Populus alba × P. tremula, Lexer et al. 2005; and P. deltoides Bartr. ex Marsh. × P. balsamifera L., Hamzeh et al. 2007) and to some extent for oak (Quercus coccifera L. × Q. ilex L.; Ortego and Bonal 2010). Additional analysis of the hybrid zone with mitochondrial and chloroplast species-discriminating markers would allow us to investigate this further, as these markers can help to distinguish maternal and paternal contributions.

Implications for MPB management

The distribution of the hybrids, as well as the uncertain degree to which hybrids and jack pine compare to lodgepole pine as MPB hosts, will have important implications for MPB population dynamics and outbreak consequences as the beetle continues to move eastward and northward. When a pest or pathogen encounters a novel host, it must be able to survive the abiotic conditions, find and infect suitable susceptible hosts, reproduce, and disperse (Parker and Gilbert 2004). Although we already know that MPB can reproduce in both hybrids and jack pine (Cullingham et al. 2011), we do not know its reproductive success or other factors that impact fitness, such as the ability of larvae to survive harsh winter conditions.

Lodgepole pine has coevolved with MPB, and therefore, it is hypothesized that the defense system of this species has evolved in the presence of selective pressures imposed by MPB. In fact, there is evidence of population structure for resistance to MPB in lodgepole pine populations (Yanchuk et al. 2008). Furthermore, adaptive differences have been documented for lodgepole pine, where trees that have not been exposed previously to epidemic MPB populations support higher beetle reproductive success than do trees that are within epidemic areas (Cudmore et al. 2010). Together, these data demonstrate that MPB reproductive success can be influenced by pine genotype and suggest that in pine–MPB interactions, there is a genetic basis for the capacity of pines to affect MPB reproductive success. These differences could translate into different spread rates in jack pine and hybrid stands than in lodgepole pine, making accurate distribution maps important for predicting the impact of MPB.

Mountain pine beetle spread rates may also differ across the range of jack pine. We found evidence of gene flow between the two species, which would provide an opportunity for introgression of potentially beneficial MPB defense alleles to jack pine in Alberta and Saskatchewan given selective pressure (Zavarin et al. 1969; Rieseberg and Wendel 1993). For example, there is evidence of favorable introgression of jack pine alleles into lodgepole pine, as demonstrated by an increasing resistance to pest species (western gall rust [Endocronartium harknessii], stalactiform blister rust [Cronartium coleosporioides], needle cast [e.g., Lophodermium seditiosum], and sequoia pitch moth [Synanthedon sequoiae]) in regions closer to the jack pine distribution (Wu et al. 1996). Thus, jack pine in the eastern portion of its range would be far less likely to acquire these beneficial alleles than jack pine that are proximal to lodgepole pine. From this, there could be an increased risk of spread with increasing distance from lodgepole pine, not taking other factors such as climate suitability into account.


Using microsatellite genotyping together with geographic and genetic modeling, we described the physical structure of the lodgepole × jack pine hybrid zone in Alberta and the genomic contribution from the parental species. Given the strong association of genetic ancestry with environmental variables and the range of generational hybrids, our results suggest this is a relatively stable hybrid zone that has existed for many generations. As well, the ecological adaptation of the parentals to distinct habitats should naturally maintain their ranges. Characterization of this stable tree species distribution represents useful information for forest management and quantifies one of the many uncertainties surrounding the likely continued eastward spread of the MPB, namely the likely species of host tree to be encountered. The Little (1971) distribution did not cover the full distribution of pine in Alberta (Figs 1 and 4). We determined that the hybrid zone is much more spatially complex and extensive than previously thought and also includes some regions in northern British Columbia not previously documented (Fig. 4). The redefined range maps will help managers make informed decisions and select appropriate genetic stock for reforestation. This study also raises questions for hybridization dynamics and evolutionary processes. Does the differential gene flow of lodgepole pine into the hybrids represent the movement of beneficial genes across species boundaries (Martinsen et al. 2001), or is this simply owing to the differences in tree distribution and density? A more comprehensive analysis using functional markers and species-discriminating SNPs could provide more insight into the factors contributing to differential gene flow.

Data archiving statement

Data for this study are available at Dryad. DOI: 10.5061/dryad.456q26k3.


The authors would like to thank Sophie Dang, Matthew Bryman, Brad Jones, Darryl Edwards, Ed Hunt, and Stephane Bourassa (University of Alberta); Daniel Lux, Sunil Ranasinghe, and Tom Hutchinson (Alberta Sustainable Resources Development); Barry Cooke and Jim Weber (Canadian Forest Service, Natural Resources Canada); Michael Carlson (Government of British Columbia); Rory McIntosh and Rob Moore (Saskatchewan Ministry of the Environment); and Denys Yemshanov and Daniel McKenney (Canadian Forest Service, Great Lakes Forestry Centre). We acknowledge funding for this research from the Government of Alberta (AAET/AFRI-859-G07), as well as grants from Genome Canada, the Government of Alberta through Genome Alberta, and Genome British Columbia in support of the Tria I and Tria II projects, of which J.E.K.C. and D.W.C. are principal investigators. P.M.A.J. was additionally supported by a Killam Postdoctoral fellowship at the University of Alberta.