Differential introgression reveals candidate genes for selection across a spruce (Picea sitchensis × P. glauca) hybrid zone

Authors


Author for correspondence:

Jill Hamilton

Tel: +1 604 822 9102

Email: jillahamilton@gmail.com

Summary

  • Differential patterns of introgression between species across ecological gradients provide a fine-scale depiction of extrinsic and intrinsic factors that contribute to the maintenance of species barriers and adaptation across heterogeneous environments.
  • Introgression was examined for 721 individuals collected from the ecological transition zone spanning maritime to continental climates within the Picea sitchensis–Picea glauca contact zone using a panel of 268 candidate gene single nucleotide polymorphisms.
  • Geographic clines showed a strong spatial relationship between allele frequencies and both distance from the ocean along major rivers and mean annual precipitation, indicating a strong role for environmental selection. Interspecific patterns of differentiation using outlier tests revealed three candidate genes that may be targets of long-term divergent selection between the parental species, although contemporary genomic clines within the hybrid zone suggested neutral patterns of introgression for these genes.
  • This study provides a fine-scale analysis of locus-specific introgression, identifying a suite of candidate loci that may be targets of extrinsic or intrinsic selection, with broad application in understanding local adaptation to climate.

Introduction

The genetic architecture of hybrid zones offers a unique opportunity to evaluate how gene flow and selection facilitate the spread of genetic material in a natural setting (Barton & Hewitt, 1985; Rieseberg et al., 1999; Lexer et al., 2005, 2010). While much work has addressed broad-scale patterns of introgression between species, we are only beginning to resolve fine-scale patterns, identifying both candidates within select regions of the genome contributing to the maintenance of species barriers and candidates for divergent selection (Strasburg et al., 2012). Heterogeneous patterns of introgression across the genome reveal those regions that may be under divergent selection and important to local adaptation. Such analyses provide a fine-scale depiction of genomic patterns influenced by extrinsic or intrinsic factors that contribute to the evolution and maintenance of species barriers (Wu, 2001; Gompert et al., 2012).

Incomplete barriers to reproduction may permit gene flow between species, although genetic exchange will be restricted if recombination is limited or there is strong selection against hybridization (Barton & Hewitt, 1985; Scotti-Saintagne et al., 2004; Kane et al., 2009). Outside of these regions, however, both neutral and positively selected loci may move across the permeable genome through hybridization and introgression (Kane et al., 2009). In contrast to the traditional whole-genome view of isolation advocated by the biological species concept, this perspective identifies the gene as the unit of differentiation between species (Wu, 2001). Thus, locus-specific patterns of introgression across a porous genome can identify those regions that may be candidates for differentiation (Wu, 2001; Nosil & Schluter, 2011).

Genome scans examining interspecific differentiation, along with analyses of introgression at both landscape and genome levels, have identified candidate regions important to the maintenance of species barriers in a number of hybrid zones (Scotti-Saintagne et al., 2004; Yatabe et al., 2007; Carling & Brumfield, 2009; Nolte et al., 2009; Lexer et al., 2010; Teeter et al., 2010; Renault et al., 2012). Combining these approaches provides a powerful approach for identifying genomic regions between species that may be important for local adaptation or the maintenance of species barriers (Lexer et al., 2004; Payseur, 2010; Gompert et al., 2012).

Differential patterns of introgression among loci indicate the spatial movement of alleles across the landscape, as well as movement of genetic material into varying genomic backgrounds (Gompert et al., 2012). Traditional cline analysis uses the changes in allele frequencies across the landscape to describe cline shape and estimate the strength of intrinsic and extrinsic (environmental) selection (Barton & Hewitt, 1985; Brennan et al., 2009; Payseur, 2010). More recently, novel admixture mapping techniques have facilitated analysis of introgression of individual genotypes against genome-wide average amounts of admixture (Buerkle & Lexer, 2008; Gompert & Buerkle, 2010; Payseur, 2010). Selection may promote introgression of select genes or genomic regions into new genetic backgrounds, or enhance barriers to gene flow resulting in differential patterns of introgression across the genome (Gompert et al., 2012).

While hybrid zone clines reflect broad, recent selection dynamics across the hybrid zone, patterns of interspecific differentiation as estimated by locus-specific FST are informative on a longer timescale (Whitlock, 1992; Lexer et al., 2010). The strength of selection is indicated by the extent to which gene flow counteracts the accumulation of differentiation over time (Whitlock, 2008). These interspecific and intrahybrid zone patterns combined are complementary and informative at different timescales, providing greater power to detect segments of the genome that may be important in terms of adaptation and maintenance of species barriers (Bierne et al., 2011).

In tree species, where the creation of multiple generations of artificial crosses remains largely impractical because of long generation times, natural hybrid zones provide a range of recombinants with which to examine introgression (Martinsen et al., 2001; Scotti-Saintagne et al., 2004; Lexer et al., 2005). With an increasing call to explore new forest management strategies in the face of climate change, understanding local adaptation in a changing environment will benefit from multilocus genome scans across hybrid zones. Analysis of fine-scale introgression of candidate genes linked to functionally adaptive phenotypes will allow inference of selective forces at play in the maintenance of the introgression zone, the potential transfer of adaptations between species, and identification of genomic regions that maintain species. Furthermore, these data will inform our projections regarding the capacity of natural hybrid populations to adapt to future climates. Adaptation from standing variation may benefit from admixture where beneficial alleles have been ‘selectively filtered’ in past environments or in separate parts of species ranges, contributing to increased variation available for natural selection (Barrett & Schluter, 2008; De Carvalho et al., 2010). Determining how long-term and more contemporary processes influence and maintain patterns of genomic variation and evolutionary potential remains an important challenge.

In this paper we examine the fine-scale transfer of genetic material between two genetically and ecologically distinct species, Sitka spruce (Picea sitchensis) and white spruce (Picea glauca). Sitka spruce, the largest of Picea species, exhibits a narrow latitudinal distribution along the Pacific Coast, while the distribution of white spruce spans the boreal forests of North America. These species come into contact along the Nass and Skeena river valleys of northwestern British Columbia, and into Alaska. The contact zone in British Columbia spans an ecological gradient from the maritime climate of Sitka spruce to the continental climate of white spruce (Roche, 1969; O'Neill et al., 2002; Bennuah et al., 2004). The mild, wet maritime climate receives up to 2 m of precipitation annually, with a mean annual temperature of c. 8°C (Pojar et al., 1991). The continental climate is characterized by comparatively lower mean annual precipitation (MAP), and while average annual temperatures are somewhat lower, the temperature range is greater as a result of seasonal extremes: hot, dry summers with average temperatures exceeding 10°C and cold, moist winters with temperatures below 0°C are the norm (Ketcheson et al., 1991).

The objective of this research is to explore variable patterns of introgression and differentiation in this spruce hybrid zone across candidate gene loci putatively associated with adaptive phenotypic traits in forest tree species. Recent identification of candidate gene single nucleotide polymorphic markers (SNPs) putatively associated with adaptive phenotypes in spruce (Namroud et al., 2008; Holliday et al., 2010) has made it feasible to evaluate locus-specific patterns of introgression, differentiation, and adaptation. An earlier study (Hamilton et al., 2012) estimated hybrid index and characterized genome-wide patterns of admixture as well as phenotypic variation across this hybrid zone. Taking a genome scan approach in the current study, we tested for patterns of differentiation amongst loci identified as candidates for differential selection or linked to regions influenced by selection using a combination of approaches. We used FST-outlier tests to identify signatures of divergent or balancing selection among loci between parent species under near-equilibrium conditions. We then combined geographic and genomic clines to test the strength of selection and identify deviations from the genome-wide average as estimated with the studied SNPs. Locus-specific patterns depict fine-scale patterns of spatially varying selection and may identify those regions of the genome that are responsible for reproductive isolation or those important to local adaptation across this hybrid zone.

Materials and Methods

Sampling

Open-pollinated seeds were sampled from the upper canopy of mature trees across each of 29 locations spanning the contact zone between P. sitchensis and P. glauca along the Nass and Skeena rivers in northwestern British Columbia in 1997. Allopatric reference populations of P. sitchensis on Haida Gwaii (HG) and P. glauca from the Ottawa valley region, Ontario, were also sampled (Eastern North America, ENA; Table 1, Fig. 1). For each seed parent, 10 open-pollinated progeny were planted in a nursery for 1 yr and then transplanted into a common garden near Kitwanga, British Columbia, within the hybrid zone by the British Columbia Ministry of Forests, Land and Natural Resource Operations (55°17′N, 128°10′W, Fig. 1). Newly flushed lateral buds were sampled from 721 individuals within the common garden in June 2009 and frozen in liquid nitrogen for subsequent DNA extraction and genetic analysis. The same genetic materials were analyzed in Hamilton et al. (2012).

Table 1. Origins of Sitka × white spruce (Picea sitchensis × Picea glauca) study populations, sample size (N) and associated geographic distance and climatic variables
Population N LatitudeLongitudeElevationDrainage distance (km)MAT (°C)MAP (mm)Continentality (°C)
  1. MAT, mean annual temperature; MAP, mean annual precipitation. NA, not applicable.

  2. Reproduced, with permission, from Hamilton et al. (2012).

  3. a

    Parental populations used to calculate hybrid index.

Haida Gwaii (HG)a2453.65−132.201007.5148213.6
Mt Kaien1854.30−130.3360006.2306510.8
Seal Cove1854.32−130.323008290110.2
Bella Bella1652.17−128.166008.6282111.6
Ocean Falls1752.36−127.6960257.8443713.4
Ramsden Point3055.02−130.1320437308614.3
Kinkolith – low3055.00−129.9330516.6297914.8
Exstew – low1854.42−129.0730855.6169119.6
Eskers O/P – low2454.41−129.03100855.6163919.3
Eskers O/P – middle1854.41−129.02450853.9158119.8
Greenville – low2455.09−129.50220875.2170018.8
Kitimat – low3054.07−128.541001086.3272218.5
Hagensburg1852.39−126.58601157178716.6
Kitimat – high3054.20−128.552501195.4204419.1
Nass Camp – low3055.30−129.051851265.598919.9
Nass2955.28−129.021001275.5102020.3
Douglas Creek – high3054.83−128.738001502.5139619.8
Douglas Creek – middle3054.83−128.694501534.6168619.3
Kiteen – low2955.44−128.833101564.4108320.3
Doreen – low2454.83−128.31220166596520
Talchako River – low1852.15−125.963051804.990719.7
Cedarvale – low1855.02−128.351101885.483220.3
Talchako River – high1852.06−125.924881913.698519.8
Cedarvale – middle1855.03−128.422301924.589420.1
Donaker Creek1857.45−131.083002171.869226.5
Kitwanga – low2455.37−128.144502203.464021.8
Kitwanga Lake – middle2455.36−128.136802212.867321.8
Skeena Crossing1855.10−127.812102224.959022.8
Meziadin1856.10−129.302652344.790821.1
Moricetown1855.03−127.344402773.754023.1
Eastern North America (ENA)a4245.48−76.7279.2NA6.3914.2NA
Figure 1.

Sampling across the introgression zone between Sitka (Picea sitchensis) and white (Picea glauca) spruce in northwestern British Columbia, indicating provenance of origin of samples (circle) collected and planting location of the common garden experiment (square). Inset map provides North American range of P. sitchensis (dark grey) and P. glauca (light gray), and indicates source of the reference Eastern North American population. Reproduced, with permission, from Hamilton et al. (2012).

Genetic data analysis

DNA extraction was performed using a modified CTAB protocol (Doyle & Doyle, 1990). Fifty milligrams of tissue were ground in liquid nitrogen for extraction using a Mixer Mill MM 400 (Retsch, Newtown, PA, USA). Individuals were genotyped for 384 SNPs identified from previous intraspecific studies of parental species as putative candidates for roles in bud set timing and development of cold-hardiness (Holliday et al., 2008, 2010), weevil resistance (Porth et al., 2011) and growth traits (Namroud et al., 2008; Pavy et al., 2008; Pelgas et al., 2011). SNP selection and quality criteria are described in Hamilton et al. (2012). SNPs were genotyped using the Illumina bead array platform in conjunction with the GoldenGate allele-specific assay (Shen et al., 2005; Fan et al., 2006). Genotyping quality was evaluated using the GenomeStudio Genotyping Module (v1.0) as described in detail in Hamilton et al. (2012). Of the initial 384 SNPs genotyped, 268 met our quality standards for subsequent analysis.

Descriptive statistics

Global and locus-specific estimates of FST, expected heterozygosity (He), and inbreeding coefficient (FIS) were calculated for all populations and between allopatric populations using Arlequin 3.1 (Excoffier et al., 2005). The significance of the inbreeding coefficient was examined using Fisher's exact tests to test for departures from Hardy–Weinberg equilibrium. Linkage disequilibrium (D′) was estimated for SNPs within the same gene using R (R Development Core Team, 2007). Within-population maternal family structure may impact allele frequencies; however, a previous study in oaks indicates that mixing five to six families yields a population sample that is in perfect Hardy–Weinberg equilibrium (Streiff et al., 1999). Furthermore, previous research near the hybrid zone indicates that the effective number of pollen parents is very high (18–20 per mother tree), suggesting the use of open-pollinated progeny likely offers a large sample of paternal haplotypes (Mimura & Aitken, 2007).

Identifying outlier loci potentially affected by selection

To identify candidate gene SNPs that deviate from a null hypothesis of selective neutrality at near-equilibrium conditions between allopatric reference populations (HG and ENA), we used an FST-outlier detection method to assess interspecific patterns of differentiation. Based on indications of limited population structure among white spruce populations using a subset of 120 SNPs, we assumed the eastern Canadian white spruce populations are representative of British Columbia populations (Hamilton et al., 2012). Loci that exhibit extreme genetic differentiation relative to average patterns of the majority of loci are candidates for diversifying selection, while those loci that exhibit unusually low degrees of differentiation may be under balancing selection.

We used a Bayesian likelihood method developed by Foll & Gaggiotti (2008) implemented in the program BayeScan to examine patterns of differentiation. This approach teases apart the influence of two models, differentiating locus-specific (αi) and population-specific (βj) effects using logistic regression to calculate locus-specific estimates of FST (Foll & Gaggiotti, 2008). This approach is appropriate where gene flow is nonsymmetrical (Beaumont & Nichols, 1996; Nielsen et al., 2009), as previous research has suggested for this hybrid zone (Hamilton et al., 2012).

Positive values of the locus-specific parameter (αi) indicate that locus i may be more differentiated on average, while negative values indicate less differentiation on average. The model including only population-specific (βj) effects was compared with the model including both locus-specific and population (αi and βj) effects to test for selection, assuming equal prior probabilities for each model. The ratio of the marginal likelihoods of the two models provides a Bayes' factor (BF; Foll & Gaggiotti, 2008), which may be interpreted in terms of amount of support for loci being subject to selection (very strong support between log10(BF) = 1.5 and 2; decisive support for log10(BF) ≥ 2) corresponding to posterior probabilities of the model of selection (substantial evidence between 0.91 and 0.99, decisive between 0.99 and 1). Following 20 pilot runs of 5000 iterations, we used 50 000 additional iterations (sample size of 5000 and thinning interval of 10) to identify loci that may be candidates for selection. While these tests are informative, they remain subject to other influences of variation, including population demography and hierarchical population structure, and can result in overestimation of selection (Excoffier et al., 2009; Siol et al., 2010).

Genomic cline analysis

Genome-wide estimates of admixture were calculated using the R package Introgress (Gompert & Buerkle, 2010), which uses a maximum-likelihood approach to estimate ancestries of individuals summarized in a hybrid index. This program requires a priori knowledge of pure parental individuals, here identified as those reference populations HG (Sitka spruce) and ENA (white spruce), to estimate the proportion of ancestry attributed to either parent. A hybrid index of zero corresponds to pure white spruce, while a hybrid index of one corresponds to pure Sitka spruce.

Employing the genomic clines method of Gompert & Buerkle (2009) implemented in Introgress (Gompert & Buerkle, 2010), genomic clines were estimated for individual loci based on the probabilities of observing three possible genotypes at each locus as a function of genome-wide estimates of admixture (hybrid index): WW (homozygous white spruce), WS (heterozygous genotype), or SS (homozygous Sitka spruce). These genomic clines are estimated using multinomial regression of observed single-locus genotypes as a function of genome-wide admixture (Gompert & Buerkle, 2009). The likelihood of the regression model is compared with a null model of genotype frequencies simulating neutral introgression using parametric simulations as described in Gompert & Buerkle (2010).

Deviations from the genome-wide average as estimated with the SNPs are based on departures from the 95% confidence envelope for the logistic regressions as summarized through comparisons of the probability density of observed genotypic data with those of the neutral model of introgression following 1000 stochastic parametric simulations. A false discovery rate for a P-value of 0.05 was calculated using qvalue (Storey, 2002). Evidence for an excess (+) or deficit (−) of homozygous genotypes (WW and SS) corresponds, respectively, to an increase or decrease in the total probability density of the observed genotype compared with the corresponding probability density of the homozygous genotype of the neutral model (Gompert & Buerkle, 2009, 2010; Nolte et al., 2009). Similarly, evidence for an excess (+) or deficit (−) of heterozygous genotypes (WS) is obtained from comparing the observed frequency of heterozygous genotypes with the probability density of heterozygous genotypes under the neutral model.

Geographic cline analysis

The geographic gradient was estimated using drainage distance along river valleys from the Pacific Ocean. Previous studies have shown that genomic composition as estimated by hybrid index is best predicted by drainage distance through this mountainous region (O'Neill et al., 2002; Bennuah et al., 2004; Hamilton et al., 2012). Drainage distance characterizes the likely corridor of gene flow along valleys through which pollen, and to a lesser extent seed dispersal, occurs between the coastal maritime climate and the interior continental climate (Bennuah et al., 2004). Drainage distance was calculated from the coast inland using ArcGIS (Version 10), replicating methods from Bennuah et al. (2004). The pure white spruce reference population from Ontario (ENA) was excluded from this analysis as a result of its long distance from all other populations.

Geographic cline parameters were estimated individually for each SNP using a modified approach to traditional analysis. Traditional cline analysis assumes a sigmoidal shape that describes sharp changes in allele frequencies near the centre of the cline and shallower asymptotes towards the edges of the contact zone (Barton & Hewitt, 1985; Payseur, 2010). The most common parameters used to describe cline shape are width and centre (Porter et al., 1997; Carling & Brumfield, 2009; Payseur, 2010; Teeter et al., 2010). The existing field experiment sampled for this study was established to assess seed transfer for reforestation within the hybrid zone and not in allopatric areas, and thus did not sample either a pure white spruce reference population from British Columbia or populations at the eastern margin of the hybrid zone that may also include introgression from a third species (Picea engelmannii; Roche, 1969). The lack of pure white spruce populations in our sample is evidenced by an exponential increase in white spruce allele frequencies to the east without an upper asymptote. Consequently, we developed an alternate approach, estimating proxies for traditional parameters to account for the distortion in allele frequencies attributed to incomplete sampling of white spruce parent populations.

This novel approach has been adapted from equations found in biological growth curves that allow estimation of maximum slope of a nonlinear regression and ‘length of lag phase’ or lambda (Kahm et al., 2010; Paine et al., 2011). We interpret the maximum slope as an estimate of the strength of selection, comparable to the traditional estimate of width (slope = 1/width) in a sigmoid curve (Porter et al., 1997). Steeper slopes indicate stronger barriers to introgression between species, whereas flatter slopes may indicate either extensive allele-sharing through recurrent gene flow or shared polymorphisms. ‘Length of lag’ is interpreted as the drainage distance accumulated from the coast to the point of maximum rate of change in allele frequency. This parameter associates a geographic estimate on the landscape with maximum rate of change in allele frequency.

Linear regressions were compared with nonlinear regressions estimated using both asymptotic (logistic) and nonasymptotic (exponential) functions (Paine et al., 2011). The exponential model assumes a constant rate of increase with respect to change in allele frequency and drainage distance with no asymptote, whereas the logistic model assumes an initial asymptote without a constant rate of increase. All regressions were implemented in R using the self-starting routines (SSlin, SSlogis and SSexp) following recommendations from Paine et al. (2011). Akaike information criterion (AIC) and R2 values were compared between regression models to identify the best-fit model. Where the logistic model failed to converge, the exponential model was used as the sole comparison with the linear model. Using the fitted data from the best-fit model, we fitted a smoothed cubic spline using the gcFitSpline function within the R package grofit (Kahm et al., 2010) to estimate maximum slope and lag phase. The advantage of this approach is the step-by-step development of a best-fit model for each SNP using the approach described in Paine et al. (2011).

Enrichment of heterozygotes and homozygotes for genomic clines and directional shifts were assessed in geographic and genomic clines to identify those loci that may be candidates for involvement in reproductive isolation or adaptation. Where similar patterns were observed in geographic and genomic clines, the distribution of genotypes was associated with a previously identified important climatic variable, MAP (Hamilton et al., 2012) across the hybrid zone.

Results

The global estimate of mean genetic differentiation (FST) across all populations sampled was 0.18, ranging from 0.02 to 0.47 for individual loci (Supporting Information, Table S1). Expected heterozygosity (He) averaged 0.31 across all loci, ranging from 0.04 to 0.5. The inbreeding coefficient (FIS) averaged 3.73 × 10−5, ranging from −0.29 to 0.29 for individual loci, indicating overall low levels of inbreeding. No loci exhibited significant deviations from Hardy–Weinberg equilibrium based on exact tests (< 0.05).

Comparison of allopatric populations using AMOVA indicated that the majority of variation is accounted for between species (Table 2). Overall, the estimate of FST between allopatric Sitka spruce (HG) and white spruce (ENA) was 0.59, indicating that interspecific differentiation exceeded those where all populations, sympatric and allopatric, were evaluated. These results are in line with locus-specific estimates of interspecific differentiation values between the allopatric populations, ranging from zero to one.

Table 2. AMOVA for 268 candidate gene single nucleotide polymorphisms (SNPs) between Picea sitchensis and Picea glauca
Source of variationdfSum of squaresVariance components% of variation
Between species12902.5246.9758.70
Among individuals within species642126.180.170.21
Within populations662170.0032.8841.09
Total1317198.7080.02 

Outlier analysis

Using BayeScan, which uses a Bayesian decision-making approach to model choice with a ‘Bayes factor’ to provide evidence in support of a model of selection or neutrality, we observed three loci with a high posterior probability (> 0.90 or log10(BF) > 1) of being subject to selection. All three SNPs (53, 55, and 106) are candidates for or are linked to candidates for diversifying selection (Fig. 2). There were no candidates for balancing selection, an expected outcome given that loci were initially selected based on degree of interspecific differentiation in SNP frequencies (FST).

Figure 2.

Log-transformed Bayes factors and locus-specific FST estimated from Bayescan for 268 candidate gene single nucleotide polymorphisms (SNPs), identifying potential candidates for diversifying (excess differentiation) or balancing (deficient differentiation) selection across the Sitka (Picea sitchensis)–white (Picea glauca) spruce hybrid zone. The vertical line log10(BF) = 1 (solid) corresponds to a posterior probability of 0.91. For points beyond this line there is substantial to very strong evidence in favor of selection..

Genomic cline analyses

Extensive heterogeneity in genomic clines was observed, indicating many loci deviated considerably from the genome-wide average as estimated from all SNPs (Tables 3, S1, Fig. S1). All loci were assessed for deviations from the genome-wide average estimate of ancestry and null model of neutral introgression, excluding loci that lacked allele frequency differences between parent species. Of these, 94 exhibited a neutral pattern of introgression, indicating no deviation from the genome-wide estimate of ancestry. Other patterns included 64 loci exhibiting evidence for excess ancestry of white spruce alleles in a Sitka spruce background (WW+, Table 3, Fig. S1) and 67 loci with an excess of Sitka spruce alleles in a white spruce background (SS+, Table 3, Fig. S1), indicating potential positive selection. Sixty-seven loci had patterns consistent with a deficit of heterozygous genotypes (WS−), while 51 loci showed an excess of heterozygotes (WS+) compared with the model of neutral introgression (Table 3, Fig. S1). Of these 118 loci, two small groups exhibited a pattern of deficiency (16 loci, WS−) and excess (nine loci, WS+) of heterozygotes, respectively, while homozygous genotypes exhibited no deviation. Loci where heterozygotes are overrepresented may be candidates for or are linked to genomic regions involved in the transfer of adaptations contributing to transgression in adaptive traits. Where heterozygotes are underrepresented, these loci may be candidates for regions of the genome involved in reproductive isolation.

Table 3. Summary statistics for genomic clines, geographic clines and genetic differentiation (FST) across all loci
Homozygous genomic clinesHeterozygous genomic clines N F ST Slope (per km)Lambda (km)
  1. Individual loci (N) are classified based on genomic clines, indicating an excess (+) or lack (−) of homozygous white spruce (WW), Sitka spruce (SS) and heterozygous (WS) genotypes given estimates of genome-wide ancestry, or neutral. Mean and range for each class are provided for estimates of genetic differentiation (FST), slope (per km), and lambda (km).

WW+ SS−WS+210.20 (0.07–0.36)0.004 (0.001–0.008)116 (0–190)
WS140.20 (0.05–0.39)0.004 (0.000–0.008)115 (0–201)
WS−80.12 (0.03–0.44)0.002 (0.000–0.009)41 (0–163)
WW− SS+WS+80.23 (0.10–0.43)0.003 (0.000–0.005)137 (0–221)
WS160.15 (0.04–0.47)0.001 (−0.001 to 0.003)57 (0–198)
WS−280.26 (0.10–0.47)0.003 (0.000–0.009)144 (34–220)
WW+ SSWS+40.10 (0.08–0.12)0.002 (0.001–0.003)117 (77–146)
WS140.21 (0.09–0.34)0.006 (0.000–0.012)167 (96–231)
WS−30.10 (0.04–0.16)0.002 (0.000–0.003)73 (0–146)
WW− SSWS+20.17 (0.05–0.29)0.002 (0.000–0.003)127 (127–127)
WW SS+WS+20.26 (0.13–0.38)0.004 (0.001–0.006)70 (0–140)
WS10.19 (0.19–0.19)0.003 (0.003–0.003)59 (50–50)
WS−120.23 (0.10–0.47)0.003 (0.001–0.005)129 (0–205)
WW SS−WS+50.19 (0.07–0.25)0.003 (0.001–0.004)77 (1–155)
WS70.16 (0.03–0.25)0.003 (0.000–0.005)100 (46–119)
WW SSWS+90.16 (0.03–0.29)0.002 (0.000–0.006)97 (18–174)
WS940.19 (0.02–0.43)0.003 (0.000–0.011)118 (0–236)
WS−160.21 (0.04–0.45)0.003 (0.000–0.011)159 (84–215)

Genetic map relationships among these markers remain unknown, and therefore some values may be inflated if linkage disequilibrium (LD) exists between a marker and an adjacent genetic region under selection. SNPs within the same candidate gene indicate that physical proximity likely contributes to LD (i.e. SNP 53 and 55 – D′ = 0.99, < 0.0001). Preliminary assessment of genetic map locations for a subset of SNPs used in this study suggests the loci used are well distributed throughout the genome (J. Bousquet & N. Isabel, unpublished).

Geographic clines

While patterns of interspecific differentiation may reveal long-term influences of selection, direct examination of the introgression zone using clinal analysis may provide additional power to detect targets of selection under nonequilibrium conditions. Geographic cline parameters, slope, and ‘lag phase’ (lambda) were estimated for each SNP (Table 3, Fig. S2). Cline parameters reflect the change in frequency of the minor ‘Sitka spruce’ allele (the less frequent allele in Sitka spruce corresponding to the more frequent allele in white spruce) with increasing drainage distance from the ocean. Maximum slope of regressions, describing changes in allele frequency km–1 of drainage distance, averaged 0.003 km−1 and ranged from −0.001 to 0.012 (Table 3). Lambda is the intercept of maximum slope in km, indicating the point along the drainage distance transect from the ocean at which the greatest rate of change in allele frequency occurs. Locus-specific estimates of lambda ranged from 0 to 236 km, with an average of 119 km across all loci. Negative drainage distance values were set to zero for 22 loci, and lambda was not estimated for 45 loci where the slope was estimated as zero. Few loci exhibited maximum cline slopes > 0.011 km−1, suggesting the absence of strong or complex barriers to gene flow between the two species.

For loci where patterns were comparable between geographic and genomic clines, the distribution of genotypes across the hybrid zone was strongly associated with MAP (Fig. 3). Analysis of variance indicated that locus-specific genotypes were significantly differentiated based on precipitation (< 0.001). Sitka spruce genotypes spanned a range above 1000 mm, white spruce genotypes were distributed below 1000 mm, and heterozygous genotypes had a small range in precipitation centered at c. 1000 mm, with little overlap with either homozygous genotypes (Fig. 3).

Figure 3.

Comparison of genomic clines (top), geographic clines (middle) and climatic distribution of genotypes for single nucleotide polymorphisms (SNPs) exhibiting a cline width < 100 km. (a) Genomic clines indicate locus-specific patterns of introgression using the genome-wide estimate of admixture (hybrid index, 0; white spruce (Picea glauca), 1; Sitka spruce (P. sitchensis)) to estimate the probability of observing a particular genotype at that locus; P-values are provided in the right corner of the observed data under a model of neutral introgression. The 95% confidence envelope of the probability of the homozygous white spruce genotype (dark green) and the heterozygous genotype (light green) are based on 1000 neutral parametric simulations. Fitted genomic clines are observed for the homozygous white spruce genotype (solid line) and heterozygous genotype (dashed line), while open circles indicate observed genotypes: white spruce (WW, top), heterozygous (WS, middle) or Sitka spruce (SS, bottom). The frequency of observed genotypes is indicated on the right of the panel. (b) Geographic clines indicate the relationship between the frequency of the minor Sitka spruce allele and drainage distance (km). Geographic cline parameters indicate the proportion of variance accounted for using a linear or nonlinear (logistic or exponential) regression and maximum slope and intercept (lambda, dashed line) for fitted values (solid line). (c) Distribution of genotypes with respect to mean annual precipitation (mm) across the hybrid zone.

Discussion

Differential patterns of introgression exhibited within this natural hybrid zone suggest that targets of selection may vary strongly geographically across the landscape and between divergent genomic backgrounds. Substantial heterogeneity, both across the geographic landscape and within the genome itself, indicate the general permeability of these spruce genomes to introgression, yet identify specific genomic regions and geographic barriers where interspecific recombination may be limited. Moreover, fine-scale variability amongst loci indicates that both extrinsic and intrinsic selective factors likely influence introgression. Correspondence between geographic and genomic clines for some SNPs indicates a strong role for extrinsic selection, particularly in association with precipitation gradients. In addition, patterns of differentiation identified through outlier analysis point toward candidate gene loci that may be linked to genes important to local adaptation or the maintenance of species barriers over a longer evolutionary period. These results provide a fine-scale picture of the movement of genetic material across this ecological transition zone, identifying regions of the genome that warrant further exploration.

Interspecific patterns of genetic differentiation

Selection can be inferred from the distribution of genetic differentiation among species or populations across the genome. Candidate genes for divergent selection have a greater likelihood of residing in areas of the genome that are highly differentiated between species and may be involved in either reproductive isolation or adaptation (Beaumont & Nichols, 1996; Beaumont & Balding, 2004). We observed few candidate genes for divergent selection (Fig. 2); although some of these candidates may be linked to loci under divergent selection (Via, 2012).

The use of SNPs preselected from putatively adaptive genes developed from intraspecific studies requires a caveat for development of a neutral model of no-locus effect. Few outlier loci, combined with evidence from Arabidopsis thaliana that suggest different genes may be targets of selection in different environments (Hancock et al., 2011), indicate that these loci are likely reasonable for development of a neutral distribution. While the interpretation of posterior probabilities provides evidence for selection, other neutral processes may contribute to the patterns of variation observed. However, in a continuously distributed, wind-pollinated study system with large effective population sizes, these processes are likely less important. Consequently, while these test results should be interpreted with some caution, they offer a useful first step in pointing towards some candidate genes that may be targets of or linked to targets of selection. Indeed, the reported number of outliers may be conservative, owing to the use of candidate genes associated with putative function from gene expression studies. As these candidate loci have a higher probability of being under selection, shifting the distribution of FST estimates upwards compared with purely neutral loci, some loci under weak selection may not be detected.

Our results for the Sitka-white spruce hybrid zone across an east–west precipitation and temperature gradient identified different candidates for selection than those observed along a north–south temperature gradient in Sitka spruce by Holliday et al. (2010). SNPs 53 and 55 are found in a gene homologous to sphingolipid desaturase (SLD) identified in Arabidopsis (At2g46210.1), and are c. 100 bp apart. The strong LD between SNPs 53 and 55 (D′ = 0.99, < 0.0001) implies that one of these SNPs or a locus linked to both is likely the target of selection. These SNPs have been assayed in previous studies examining intraspecific variation and SNP–phenotype associations in the development of autumnal cold-hardiness and bud set timing in Sitka spruce (Holliday et al., 2008, 2010). Holliday et al. (2010) studied populations spanning gradients in Sitka spruce across the species' range of mean annual temperature and degree-days associated with phenological cues and development of cold hardiness. This gene, believed to be a fatty acid/sphingolipid desaturase, produces divergent phenotypes in Arabidopsis in response to cold temperatures (Chen et al., 2012). Sphingolipids likely contribute to both ion permeability and osmotic adaptation during response to both freezing and dehydration (Steponkus, 1984). This candidate gene was not associated with bud set or cold injury traits within Sitka spruce by Holliday et al. (2010). These results may provide support for different genes being targets of selection in different environments (Barrett & Hoekstra, 2011; Hancock et al., 2011).

SNP106 (GAI, Gibberellic Acid Insensitive), with putative function in Arabidopsis in gibberellic acid-mediated signaling (At1g14920.1), also exhibited excess interspecific differentiation. Interestingly, this was one of a number of candidate genes identified from expression studies of resistance in Sitka spruce to the white pine shoot tip weevil (Pissodes strobi, Ralph et al., 2006). Given gibberellins' role in the promotion of growth and induction of mitotic division, they may have a putative role in the formation of constitutive or traumatic resin canals involved in defense (Richards et al., 2001). Previous studies have observed significant differentiation between parental species in the formation of resin canals (O'Neill et al., 2002). Association tests within a phenotyped population of advanced-generation introgressed individuals are needed to further test these loci.

Genomic clines

Using genomic clines, we compared deviations from a model of neutral introgression for individual loci indicated by an excess or deficit of specific genotypic classes given expectations based on genome-wide ancestry across the hybrid zone. Loci exhibited a diversity of patterns ranging from under- to over-representation of certain genotypes. The single most common category across homozygous and heterozygous genotypes fitted the neutral model of introgression (WW WS SS, 94/264 loci) although a majority of loci displayed a diversity of patterns inconsistent with neutral introgression. The most common categories deviating from a pattern of neutral introgression were a deficit (WW− WS− SS+, 28/264) and an excess of white spruce genotypes (WW+ WS+ SS−, 21/264). These loci may point towards candidate regions of the genome involved in either reproductive isolation or adaptive introgression, respectively, although we cannot exclude the influence of neutral evolution. Surprisingly, all three previously described FST-outlier loci exhibit neutral patterns of introgression in the genomic cline analysis. However, the outlier approach analyzes the long-term accumulation of differentiation between allopatric populations, while the genomic cline analysis reveals the more contemporary influence of locus-specific selection. Furthermore, while FST-outlier tests may reveal loci under selection, there are many conditions where these tests may fail to identify loci involved in adaptation (Le Corre & Kremer, 2012), particularly for polygenic traits, pointing towards the importance of a combined analytical approach.

While a false discovery rate of 0.007 at α = 0.05 indicates few false positives, sampling design may have influenced the patterns we detected (Storey, 2002; Teeter et al., 2010). Few individuals were sampled from the eastern end of the introgression zone, and this likely impacted the observed frequency of white spruce genotypes. However, because genomic clines reflect locus-specific introgression with respect to genome-wide ancestry across all loci, the patterns reflect the rate of introgression given present ancestry. In addition, asymmetric introgression toward Sitka spruce (Hamilton et al., 2012) may result in selection against white spruce genotypes where strong pre- or postzygotic barriers exist.

A deficit of heterozygotes (WW WS− SS) was observed in 16 loci. These loci may be within or linked to genes that result in decreased fitness of heterozygotes, although whether this is a result of intrinsic or extrinsic genetic factors remains unclear. Selection against heterozygotes may indicate either simple under-dominance or intrinsic Dobzhansky–Muller incompatibilities resulting from heterozygote incompatibility (Teeter et al., 2010). SNP16 had a very steep geographic cline, exhibiting a pattern of underdominance, further supporting comparisons between geographic and genomic clines. This SNP has putative function within a Scarecrow-Like protein (SCL3, At1g50420.1), which interacts with the gibberellin (GA) pathway, regulating growth in Arabidopsis (Zhang et al., 2011).

Few markers (9/264) exhibited an excess of heterozygotes relative to neutral expectation. These loci, however, may indicate an adaptive advantage for hybrid genotypes (Nolte et al., 2009). Transgressive patterns exhibited in hybrids for cold tolerance at moderately cold temperatures in a previous study suggest higher fitness of hybrid genotypes within the hybrid zone (Hamilton et al., 2012). This lends support to the role of hybrid zones in the transfer of adaptations between species (Rieseberg & Wendel, 1993), and may have important implications in the response of admixed populations to a changing climate.

Genomic clines tell us much about locus-specific introgression, although some caveats are necessary. While the majority of loci exhibit deviations from a neutral model of introgression, some of these could result from genetic drift alone, particularly if drift occurred independently in different populations. However, it is unlikely that drift would overwhelm selection, particularly given the large effective population sizes throughout the hybrid zone (Nolte et al., 2009). Furthermore, the number of loci in categories deviating from a pattern of neutral introgression may be overestimated as a result of LD among loci, which is expected to be substantial within hybrid zones (Lexer et al., 2007). Estimates of LD between SNPs on the same gene (two to four SNPs per gene) indicate that those SNPs are significantly linked (average D′ = 0.89 across 18 genes, < 0.0001) and so signatures of selection may reflect nearby polymorphisms rather than specific SNPs. In future studies, estimation of linkage relationships and the degree of recombination following first contact will be possible as genetic maps and a whole genome sequence become available for spruce. Finally, distinguishing loci under neutral introgression from those loci under positive or negative selection is integral to the genomic clines method. With interspecific crosses we expect more loci to have negative fitness consequences than positive ones (Barton, 2001). There may be a slight bias toward overestimating positively selected loci in the hybrid zone and underestimating negatively selected ones if rates of neutral introgression are underestimated (Rieseberg et al., 1999). This suggests our estimate of the frequency of loci exhibiting negative selection is conservative.

Geographic clines

The ecological transition from maritime to continental climates is reflected at a finer spatial scale in the substitution of largely Sitka spruce alleles for white spruce alleles with increasing drainage distance up river valleys. While there is variation among locus-specific clines, on average the maximum rate of change in allele frequencies (0.003 km−1) occurs at c. 120 km from the coast. This distance is concordant with the beginning of a sharp decline in MAP, suggesting that moisture may be a strong selective factor across this ecological gradient. Our previous research indicates that ancestry is influenced by precipitation and temperature gradients spanning the ecological transition zone (Hamilton et al., 2012). Particularly sharp clines in genome-wide ancestry associated with precipitation indicate that the intensity and direction of selection may change below a threshold of c. 1000 mm of precipitation annually (Hamilton et al., 2012).

We have estimated proxies for the most commonly used cline parameters: maximum slope, indicating the maximum rate of change in allele frequency km–1 drainage distance; and lambda, the geographic point on the landscape where we observe the greatest rate of change. Loci that exhibit shallower clines may be candidates for neutral introgression, while steep clines are more likely to be attributed to selection against hybridization, indicating intrinsic Dobzhansky–Mueller incompatibilities or extrinsic (environmental) incompatibilities limiting introgression. Teasing apart the influence of intrinsic or extrinsic incompatibilities would require testing the fitness of progeny from controlled crosses in multiple environments across the hybrid zone.

Relating geographic and genomic clines

The five candidate gene loci (16, 137, 142, 153, and 255, Fig. 3) that exhibited steepest clines (maximum clines slopes of 0.011–0.012 km−1) are candidates for divergent selection or reproductive isolation or are linked to such genes. Only SNP16, associated with regulating gibberellin signaling and plant development (Zhang et al., 2011), exhibited a pattern of selection against heterozygotes (WS−), while SNPs 137, 142, and 153 exhibited positive selection for the white spruce genotypes (WW+) and SNP255 was neutral (Fig. 3). This may indicate that SNP16 is a candidate for or is linked to a locus involved in intrinsic Dobzhansky–Mueller incompatibilities resulting in hybrid inferiority or breakdown (Carling & Brumfield, 2008; Schluter & Conte, 2009; Teeter et al., 2010). SNPs 137, 142, and 153, on the other hand, all exhibited the greatest rate of change in allele frequency between 213 and 220 km, along with an excess of white spruce genotypes and a lack of heterozygotes in a white spruce background (Fig. 3). This geographic region again corresponds to the drier end of the steep precipitation gradient that begins c. 120 km inland from the ocean (Hamilton et al., 2012). At 200 km drainage distance from the coast, all MAP values fall below 1000 mm of annual precipitation. An ANOVA comparing homozygous and heterozygous genotypes with MAP suggests that genotypes across the hybrid zone are distributed based on precipitation (< 0.0001, Fig. 3). Both SNP137 (Plasma membrane instrinsic protein (PIP2E), At2g39010.1) with putative function in water channel activity and SNP255 (early-response-to-drought-6 (erd6), At1g08930.2) are homologous to genes important to drought response in Arabidopsis. Signatures of selection in candidate genes related to drought stress in Pinus taeda identified a related early-response-to-drought-3 (erd3) gene as potentially adaptive (González-Martínez et al., 2006). This suggests these genes may be important in terms of drought response across species.

While Sitka spruce genotypes and heterozygous genotypes are common across a range of precipitation values exceeding 1000 mm, there appears to be a threshold below which the white spruce is the sole genotype observed. Similar trends were observed for mean summer precipitation and precipitation as snow; however, the relationship with these climatic variables was weaker and did not reflect the abrupt transition seen for MAP. These results suggest the distribution of some alleles across the landscape may be strongly influenced by extrinsic selection in drier environments, but not in wetter areas, where they may be neutral.

Studies of Arabidopsis thaliana populations sampled across steep environmental gradients show some SNPs are neutral in one test environment, but adaptive in another (Fournier-Level et al., 2011; Hancock et al., 2011). Our results point towards a similar dynamic within this spruce hybrid zone, where some SNPs appear neutral against a Sitka spruce background but are potentially adaptive against a white spruce background, where there is strong selection for the white spruce genotype. Comparing the genomic and geographic clines of SNPs 137, 142, and 153, we observe that where there is strong evidence of selection favoring the white spruce genotype, there are fewer heterozygotes, suggesting potential divergent selection across environments (Fig. 3).

Examining differential introgression using interspecific differentiation combined with geographic and genomic clines identifies heterogeneous patterns among loci across the genome, suggesting porous spruce genomes. Not unexpectedly, these tests identify different subsets of candidate genes under selection, depending on gene action, strength of selection, and distribution of underlying environmental variation. This points towards the importance of using multiple approaches to investigate SNPs that may be under selection. Candidates for divergent selection identified using outlier tests merit further investigations using functional assays in combination with genomic association tests. Furthermore, patterns of differential introgression across the landscape validate patterns of introgression observed into varying genomic backgrounds, identifying an important role for environmental selection, particularly along precipitation gradients. These approaches identify a suite of candidate gene SNPs for future inclusion in association and functional assays. This fine-scale approach provides the context for broader questions regarding the maintenance of species barriers in these long-lived species, distinguishing regions of the genome where selection may prevent introgression. Identifying the genomic regions where differences accumulate in this and other conifer hybrid zones will provide an opportunity to identify loci and genomic regions that maintain species.

Acknowledgements

We would like to thank John King for establishing the common garden; Christine Chourmouzis, Lisa Erdle, Nina Lobo, and Jon Sweetman for field assistance; Dorothea Lindtke for advice with analysis; and Loren Rieseberg and Kermit Ritland for helpful manuscript comments. The work was supported by Genome British Columbia, Genome Canada and the Forest Genetics Council of British Columbia (funding to S.A.) by the Natural Sciences and Engineering Research Council of Canada (NSERC; grant to S.A.), and by an NSERC Canada Graduate Scholarship and University of British Columbia Doctoral Fellowship to J.H.

Ancillary