Department of Biology, The University of British Columbia, Kelowna, BC, Canada
Correspondence: Michael Russello, Department of Biology, The University of British Columbia, Okanagan Campus, 3333 University Way, Kelowna, BC V1V 1V7, Canada. Tel.: +1 250 807 8762; fax: +1 250 807 8005; e-mail: email@example.com
Recent progress in methods for detecting adaptive population divergence in situ shows promise for elucidating the conditions under which selection acts to generate intraspecific diversity. Rapid ecological diversification is common in fishes; however, the role of phenotypic plasticity and adaptation to local environments is poorly understood. It is now possible to investigate genetic patterns to make inferences regarding phenotypic traits under selection and possible mechanisms underlying ecotype divergence, particularly where similar novel phenotypes have arisen in multiple independent populations. Here, we employed a bottom-up approach to test for signatures of directional selection associated with divergence of beach- and stream-spawning kokanee, the obligate freshwater form of sockeye salmon (Oncorhynchus nerka). Beach- and stream-spawners co-exist in many post-glacial lakes and exhibit distinct reproductive behaviours, life-history traits and spawning habitat preferences. Replicate ecotype pairs across five lakes in British Columbia, Canada were genotyped at 57 expressed sequence tag-linked and anonymous microsatellite loci identified in a previous genome scan. Fifteen loci exhibited signatures of directional selection (high FST outliers), four of which were identified in multiple lakes. However, the lack of parallel genetic patterns across all lakes may be a result of: 1) an inability to detect loci truly under selection; 2) alternative genetic pathways underlying ecotype divergence in this system; and/or 3) phenotypic plasticity playing a formative role in driving kokanee spawning habitat differences. Gene annotations for detected outliers suggest pathogen resistance and energy metabolism as potential mechanisms contributing to the divergence of beach- and stream-spawning kokanee, but further study is required.
There has been recent progress in our understanding of the formative role of selection in driving population divergence, even in the face of ongoing gene flow (Via & West, 2008; Wolf et al., 2010; Feder et al., 2012). Technological advances in data collection, paired with the analytical framework of population genomics, have steadily improved our ability to detect adaptive population divergence in situ, even without prior knowledge of the phenotypes under selection or the availability of extensive genome maps (Luikart et al., 2003; Nielsen, 2005; Storz, 2005; Stinchcombe & Hoekstra, 2008). Yet many questions remain unresolved regarding the number, arrangement and genome-wide influence of the ecologically relevant loci under selection that are needed to overcome the homogenizing effects of gene flow and establish the reproductive barriers ultimately required for speciation (Via, 2009). This recognition has led to an increasing number of studies of locally diverging populations in which whole-genome isolation has not been achieved (Turner et al., 2005; Savolainen et al., 2006; Nosil et al., 2009; Stölting et al., 2013), and to the development of models describing the genomics of speciation with gene flow (Feder et al., 2012).
Salmonids offer ample opportunities for studying the genetic basis of local adaptation within recently diverged populations where gene flow persists. They inhabit a wide range of habitats and exhibit extensive phenotypic diversity, which is often predicted to result from local adaptation (reviewed in Hendry & Stearns, 2004; Quinn, 2005). In many cases, precise natal homing has led to isolation and the subsequent emergence of new ecological forms in multiple, independent populations over a short period of time (< 10 000 years; Taylor, 1991; Dittman & Quinn, 1996). However, the mechanisms driving ecotype divergence remain poorly understood. In the absence of conspicuous drivers, a bottom-up approach may be fruitful for detecting adaptive variation and identifying traits that are strongly linked to fitness in particular environments (e.g. behavioural, life-history or physiological traits).
Parallel patterns of genotypic and phenotypic divergence can provide strong evidence for the action of natural selection, allowing insight into the processes involved in speciation (Butlin, 2010; Schluter et al., 2010). Post-glacial fishes have played a prominent role in such studies, including work on parallel adaptation and phenotypic evolution in the three-spine stickleback (Colosimo et al., 2004; Hohenlohe et al., 2010; Deagle et al., 2012) and on the divergence of lake whitefish morphotypes (Rogers & Bernatchez, 2005). Variation in spawning habitat preference and reproductive strategies has also been observed in other post-glacial fishes, particularly sockeye salmon (Oncorhynchus nerka; Burgner, 1991; Quinn, 1999). However, the processes underlying this divergence can be difficult to identify in populations that exploit distinct environments but show little difference in morphological features.
Kokanee is an obligate freshwater form of sockeye salmon, with parallel evolutionary origins from anadromous sockeye throughout the North Pacific since the last (Wisconsinan) glaciation (~15 000 years ago; Ricker, 1940; Taylor et al., 1996). There are two reproductive ecotypes of kokanee, which spawn either in inlet streams or along lake beaches (Burgner, 1991). Stream-spawners exhibit a reproductive strategy more typical of anadromous sockeye, whereas beach-spawners form large spawning aggregations along the shoreline (Shephard, 2000) weeks or months after stream-spawners, likely owing to the different temperature regimes of their spawning substrates (Burgner, 1991; Burger et al., 1995; Winans et al., 2003). Beach habitats are typically deeper (0.1–10 m) and warmer, and are characterized by low flow and larger rocky substrates (Hassemer & Rieman, 1981; Dill, 1996). Beach-spawners usually do not exhibit courting behaviour, mate selection, redd excavation or nest defence (Dill, 1996; Ashley et al., 1998; Shephard, 2000); however, no targeted behavioural study has been published to date. In some lakes the two ecotypes co-occur with each other and, in others, with anadromous sockeye as well (Burgner, 1991). Although kokanee ecotypes often display different reproductive traits and experience different incubation environments associated with their respective stream and shoreline spawning areas (Ashley et al., 1998; Shephard, 2000; Andrusak & Andrusak, 2011), they show no morphological or ecological differences between emergence and maturation (Taylor et al., 1997, 2000; Winans et al., 2003).
The specific fitness-related traits involved in generating reproductive barriers between kokanee ecotypes are currently unknown, but are likely related to habitat preference, mating behaviour, energy metabolism and/or life-history ecology that increase reproductive success in adult spawners and/or survival at early life stages (Lecomte & Dodson, 2004). Previous studies have demonstrated weak neutral differentiation between kokanee ecotypes in Okanagan Lake in British Columbia, Canada (Taylor et al., 1997, 2000). More recently, a population genomics study of Okanagan Lake kokanee screened over 11 000 expressed sequence tag (EST)-linked microsatellites, resulting in a panel of 57 markers (49 EST-linked and 8 anonymous; Russello et al., 2012). Eight loci exhibited patterns of genetic variation that deviated from expectations under a neutral model of evolution (i.e. outlier loci), representing candidate gene regions exhibiting signatures of selection or regions genetically linked to selected loci. However, that study focused on a single lake and was unable to validate the role of selection in ecotype divergence.
Here, we tested for evidence of directional selection driving the adaptive divergence of beach- and stream-spawning kokanee salmon in five lakes in British Columbia, Canada. We used the panel of EST-linked microsatellites developed in Russello et al. (2012) and four conceptually different outlier detection tests to reduce false positives. Neutral loci were used to determine whether ecotypes represent distinct genetic lineages (i.e. monophyletic groups with a common ancestor) or have arisen multiple times (i.e. polyphyletic origins). If a common genetic basis (e.g. same outlier loci) can be identified across multiple, independent ecotype pairs, we can infer preliminary evidence for natural selection driving parallel divergence in this system (Johannesson, 2001; McKinnon et al., 2004; Hendry, 2009). Based on this approach, we describe a number of candidate loci and infer potential mechanisms associated with the divergence of beach- and stream-spawning kokanee.
Materials and methods
Study sites and sample collection
Between 2007 and 2011, we sampled kokanee from five British Columbian lakes where beach- and stream-spawners co-exist: Duncan, Kootenay, Okanagan, Wood and Tchesinkut Lakes (Fig. 1). These lakes have minimal or no documented stocking history, little or no overlap with anadromous sockeye (Table S1) and are distributed across two major drainage systems (Fraser and Columbia Rivers).
A total of 536 live and recently moribund individuals, caught in gillnets and dip nets, were sampled post-spawn at two to seven spawning sites (Table S2) within each lake during the peak of their respective spawning periods. Adipose fin or operculum tissue was collected from each fish, preserved in 2 mL vials containing 100% ethanol and stored at −20 °C for subsequent DNA analysis.
Total genomic DNA was extracted with the NucleoSpin Tissue kit (Macherey-Nagel) following the manufacturer's suggested protocol for 96-well plates. Samples collected from Duncan, Kootenay, Tchesinkut and Wood Lakes were amplified via polymerase chain reaction (PCR) at 49 EST-linked and eight anonymous microsatellite loci. These loci were selected because they showed consistent amplification and polymorphism among ecotypes in Okanagan Lake kokanee in a previous study that screened 11 488 EST-linked loci, of which 243 were characterized (Russello et al., 2012).
For each PCR, we combined 1.25 μL of 10× PCR buffer, 1.25 μL of 2 mM dNTP mix, 0.5 μL of 1 μM forward primer, 0.5 μL of 10 μM M13 fluorescent-labelled primer, 0.5 μL of 10 μM reverse primer, 0.5 units of Taq polymerase and 20–80 ng of DNA template, for a total reaction volume of 12.5 μL. Forward primers were modified to include an M13 tail [5′-TCCCAGTCACGACGT-3′] (Schuelke, 2000), which allowed a fluorescent-labelled universal primer (6-FAM, VIC, NED or PET; Applied Biosystems) to be incorporated into PCR amplicons. Reverse primers were modified to include a GTTT tail to improve scoring quality (Brownstein et al., 1996). All reactions used KAPA Taq DNA polymerase (Kapa Biosystems), except for markers EV170, OMM5008, OMM5067 and One14, for which AmpliTaq Gold DNA polymerase (Applied Biosystems) was used to promote amplification. Amplification used a touchdown cycling programme: an initial denaturation at 94 °C for 2 min (10 min for reactions using AmpliTaq Gold); 20 cycles of 94 °C for 30 s, 60 °C for 30 s and 72 °C for 30 s, with the annealing temperature decreasing by 0.5 °C per cycle; 15 further cycles with the annealing temperature held at 50 °C; and a final extension of 2 min at 72 °C. DNA fragments were sized on an Applied Biosystems 3130XL automated DNA sequencer using the GS500 LIZ size standard. Two independent investigators manually scored all alleles in genemapper 4.0 (Applied Biosystems). The new genotypic data were then combined with those previously collected for Okanagan Lake kokanee by Russello et al. (2012).
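The reaction recipe above implies final working concentrations through C1V1 = C2V2, and the touchdown programme implies a per-cycle annealing schedule. The sketch below makes both explicit. It assumes that the first touchdown cycle anneals at 60 °C and that the 0.5 °C decrement is applied before each subsequent cycle; the function names are illustrative and not part of any published protocol.

```python
"""Sketch: final reagent concentrations and the touchdown annealing schedule."""


def final_conc(stock_conc: float, added_ul: float, total_ul: float) -> float:
    """C1 * V1 = C2 * V2, solved for C2 (same units as stock_conc)."""
    return stock_conc * added_ul / total_ul


TOTAL_UL = 12.5  # total reaction volume (uL)

# (component, stock concentration, unit, volume added in uL), as listed in the Methods
COMPONENTS = [
    ("PCR buffer", 10.0, "x", 1.25),
    ("dNTP mix", 2.0, "mM", 1.25),
    ("M13-tailed forward primer", 1.0, "uM", 0.5),
    ("M13 fluorescent primer", 10.0, "uM", 0.5),
    ("reverse primer", 10.0, "uM", 0.5),
]


def touchdown_annealing(start_c: float = 60.0, step_c: float = 0.5, td_cycles: int = 20,
                        hold_c: float = 50.0, hold_cycles: int = 15) -> list:
    """Annealing temperature (deg C) for each cycle; assumes cycle 1 runs at start_c."""
    temps = [start_c - step_c * i for i in range(td_cycles)]  # touchdown phase
    temps += [hold_c] * hold_cycles                            # fixed-temperature phase
    return temps


if __name__ == "__main__":
    for name, stock, unit, vol in COMPONENTS:
        print(f"{name}: {final_conc(stock, vol, TOTAL_UL):.2f} {unit}")
    temps = touchdown_annealing()
    print(f"{len(temps)} cycles; last touchdown cycle at {temps[19]} C; hold at {temps[20]} C")
```

Under these assumptions the buffer ends at 1×, the dNTPs at 0.2 mM, the tailed forward primer at 0.04 μM and the labelled M13 and reverse primers at 0.4 μM each. Keeping the tailed forward primer at a tenth of the concentration of the other two is the usual logic of the M13-tail approach: it is depleted early, so the labelled universal primer carries the amplification in later cycles. The last touchdown cycle anneals at 50.5 °C, just above the 50 °C hold.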
Data quality and definition of genetic units
Loci were tested for null and false alleles using microchecker (Van Oosterhout et al., 2004). Deviation from Hardy–Weinberg equilibrium (HWE) was assessed using the Markov chain Monte Carlo (MCMC) approximation of Fisher's exact test (1000 batches of 1000 iterations; Guo & Thompson, 1992), and linkage disequilibrium (LD) was assessed for all pairs of loci using simulated exact tests as implemented in genepop 3.3 (Raymond & Rousset, 1995). A sequential Bonferroni procedure (Rice, 1989) was used to correct for multiple comparisons (initial α = 0.05) in tests of LD and departure from HWE. Because selection violates a critical assumption of HWE and can generate LD, beach- and stream-spawners were evaluated separately in each lake, and a locus was removed only if such patterns were detected in both ecotypes in at least one lake. Pairwise FST comparisons among spawning sites were conducted in arlequin 3.5 (Excoffier & Lischer, 2010) using the anonymous microsatellites. These comparisons ensured that spawning sites of similar habitat within a lake were not significantly differentiated before individuals were pooled by ecotype for all downstream analyses.
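The sequential Bonferroni correction can be illustrated with the step-down form that Rice (1989) popularized, also known as Holm's procedure. The sketch below is generic and is not the code used in the study. The P-values in the example are hypothetical, and the choice of `<=` rather than `<` at each threshold is an assumption.

```python
from typing import List


def sequential_bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Step-down (Holm-type) sequential Bonferroni; True means 'significant after correction'."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    significant = [False] * m
    for rank, idx in enumerate(order):
        threshold = alpha / (m - rank)  # alpha/m, then alpha/(m-1), ...
        if p_values[idx] <= threshold:
            significant[idx] = True
        else:
            break  # first non-significant test: it and all larger P-values are retained
    return significant


if __name__ == "__main__":
    # Hypothetical P-values from four HWE tests in one ecotype
    print(sequential_bonferroni([0.001, 0.04, 0.012, 0.3]))
```

The number of tests grows quickly for LD. For example, 57 loci give 57 × 56 / 2 = 1596 pairwise tests per ecotype, so the first threshold becomes 0.05/1596 ≈ 3.1 × 10⁻⁵.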
Outlier locus detection
In each lake, putative outliers were identified based on patterns of genetic diversity (lnRH; Kauer et al., 2003), differentiation (DetSel; Vitalis et al., 2003) or both (Lositan Selection Workbench, Antao et al., 2008; BayeScan, Foll & Gaggiotti, 2008). First, the lnRH test (Kauer et al., 2003) detects loci that depart from the genomic average by calculating, for each locus, the log-ratio of expected heterozygosity (HE)-based estimates between the two ecotypes. Monomorphic loci were assumed to have one allele that differed from the others, to avoid dividing by zero. Estimates were standardized to a mean of 0 and a standard deviation of 1, so that 90%, 95% and 99% of loci were expected to have values within ±1.64, ±1.96 and ±2.58, respectively; loci with values outside these boundaries were considered putative outliers at the corresponding level. Second, we employed the probabilistic approach implemented in DetSel 1.0 (Vitalis et al., 2003), which assumes a pure divergence model and uses coalescent simulations to generate the joint distribution of expected FST values under a neutral model. For the post-glacial lakes, this null distribution (i.e. confidence envelope) was generated using the following parameters, following Russello et al. (2012): Ne = 500, 1000 or 10 000; N0 = 500; T0 = 50, 100 or 1000; μ = 0.0001 or 0.00001; t = 100. For Duncan Lake, we used modified parameters (Ne = 100, 500 or 5000; N0 = 50; T0 = 5, 10 or 20; t = 1) given its recent formation 47 years ago following the construction of Duncan Dam. Using two-dimensional arrays of 50 × 50 square cells, loci falling outside the confidence envelope at the 90%, 95% and 99% levels (i.e. empirical P < 0.10, 0.05 and 0.01) were identified as outliers (Vitalis et al., 2001). Finally, the fdist2 approach of Beaumont & Nichols (1996) was implemented in both Lositan and Bayescan. The Lositan test (Antao et al., 2008) assumes an island model of migration and uses coalescent simulations to generate the null distribution of FST values.
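The lnRH statistic and its standardization can be sketched as follows. This assumes the commonly cited form RH = [(1/(1 − HE,1))² − 1] / [(1/(1 − HE,2))² − 1], where the transformed heterozygosity is a stepwise-mutation proxy for θ. It also assumes that "one allele that differed from the others" means a sample in which exactly one gene copy carries a different allele. The functions and HE values are hypothetical illustrations, not the software used in the study.

```python
import math
from statistics import mean, stdev
from typing import List, Optional


def theta_proxy(he: float) -> float:
    """(1 / (1 - HE))^2 - 1: heterozygosity-based diversity measure used in lnRH."""
    return (1.0 / (1.0 - he)) ** 2 - 1.0


def he_one_divergent_copy(n_genes: int) -> float:
    """HE for n_genes gene copies in which exactly one copy differs (assumed monomorphic fix)."""
    p = 1.0 / n_genes
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)


def ln_rh(he1: float, he2: float, n_genes1: int, n_genes2: int) -> float:
    """Natural log of the ratio of theta_proxy between two ecotypes at one locus."""
    if he1 == 0.0:
        he1 = he_one_divergent_copy(n_genes1)
    if he2 == 0.0:
        he2 = he_one_divergent_copy(n_genes2)
    return math.log(theta_proxy(he1) / theta_proxy(he2))


def standardize(values: List[float]) -> List[float]:
    """Rescale to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def outlier_level(z: float) -> Optional[str]:
    """Most stringent level at which a standardized lnRH value falls outside the neutral band."""
    for cutoff, label in ((2.58, "99%"), (1.96, "95%"), (1.64, "90%")):
        if abs(z) > cutoff:
            return label
    return None


if __name__ == "__main__":
    # Hypothetical (HE beach, HE stream) pairs, 100 gene copies per ecotype
    pairs = [(0.62, 0.60), (0.45, 0.48), (0.70, 0.66), (0.0, 0.35), (0.81, 0.79), (0.55, 0.57)]
    z = standardize([ln_rh(a, b, 100, 100) for a, b in pairs])
    print([(round(v, 2), outlier_level(v)) for v in z])
```

The ±1.64/±1.96/±2.58 bands are only meaningful if standardized lnRH values are roughly normal, which is the assumption behind the test. With very few loci the sample standard deviation also caps how extreme any single z-score can be.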
In Lositan, after an initial 50 000 simulations, loci outside the 95% confidence interval (CI) were removed to calculate a more accurate mean neutral FST, which was then used in a second set of 50 000 simulations to calculate the probability of each locus being under selection at the 90%, 95% and 99% CIs. In Bayescan 2.0 (Foll & Gaggiotti, 2008), the fdist2 approach is modified to use a Bayesian framework: for each locus, posterior probabilities were estimated for two alternative models (one with and one without locus-specific effects) using reversible-jump MCMC. We conducted 20 pilot runs of 5000 iterations, followed by 100 000 iterations with a burn-in of 50 000. Posterior probabilities of > 0.76, > 0.91 and > 0.97 were interpreted as ‘substantial’, ‘strong’ and ‘very strong’ support for the action of selection, respectively (Foll & Gaggiotti, 2008).
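The three posterior-probability cut-offs follow from Jeffreys' scale of evidence. They come from converting a Bayes factor (BF) into a posterior probability, PP = BF/(1 + BF), which holds when the two models have equal prior odds. The sketch below reproduces that conversion. The equal-prior-odds default is an assumption of the sketch and should not be read as a statement about BayeScan's own settings.

```python
def posterior_from_log10_bf(log10_bf: float, prior_odds: float = 1.0) -> float:
    """Posterior probability of the model with selection, given log10(Bayes factor) and prior odds."""
    posterior_odds = prior_odds * 10.0 ** log10_bf
    return posterior_odds / (1.0 + posterior_odds)


# Jeffreys' scale of evidence (lower bounds on log10 BF)
JEFFREYS = [(0.5, "substantial"), (1.0, "strong"), (1.5, "very strong"), (2.0, "decisive")]

if __name__ == "__main__":
    for log10_bf, label in JEFFREYS:
        print(f"{label:>12}: log10 BF > {log10_bf} -> posterior probability > "
              f"{posterior_from_log10_bf(log10_bf):.2f}")
```

With equal prior odds this yields 0.76, 0.91 and 0.97 for ‘substantial’, ‘strong’ and ‘very strong’ support, plus 0.99 for ‘decisive’ support, which the study did not use.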
Potential false outliers were eliminated based on a lack of concordance across tests within each lake (Vasemägi & Primmer, 2005; Bonin et al., 2006; Foll & Gaggiotti, 2008; Shikano et al., 2010), rather than by applying a Bonferroni correction or a false discovery rate (Benjamini & Hochberg, 1995; Schlötterer, 2003). Therefore, only loci detected by two or more tests at the 90% level or higher within the same lake are hereafter referred to as ‘outliers’ and included in the ‘outlier’ data set. Any outlier that met this criterion in two or more lakes was also included in the ‘repeat outlier’ data set. Finally, any locus that was not detected by any test in any lake, and that was polymorphic in every lake, was included in the ‘neutral’ data set.
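The data-set definitions above amount to a small set of rules, sketched below. The input layout, the lake and locus names and the function name are hypothetical. The sketch assumes that a detection means a test flagged the locus at the 90% level or higher in that lake.

```python
from typing import Dict, Set


def classify_loci(detections: Dict[str, Dict[str, Set[str]]],
                  monomorphic: Dict[str, Set[str]],
                  all_loci: Set[str],
                  min_tests: int = 2) -> Dict[str, Set[str]]:
    """
    detections[lake][locus]: set of tests (e.g. {"Bayescan", "DetSel"}) that flagged
        the locus in that lake at the 90% level or higher.
    monomorphic[lake]: loci that are monomorphic in that lake.
    """
    lakes_as_outlier: Dict[str, int] = {}
    flagged_any: Set[str] = set()
    for per_locus in detections.values():
        for locus, tests in per_locus.items():
            if tests:
                flagged_any.add(locus)
            if len(tests) >= min_tests:  # concordance within a single lake
                lakes_as_outlier[locus] = lakes_as_outlier.get(locus, 0) + 1
    outliers = set(lakes_as_outlier)
    repeat_outliers = {locus for locus, n in lakes_as_outlier.items() if n >= 2}
    false_outliers = flagged_any - outliers           # flagged, but never concordant
    mono_anywhere = set().union(*monomorphic.values())
    neutral = all_loci - flagged_any - mono_anywhere  # never flagged, polymorphic everywhere
    return {"outlier": outliers, "repeat_outlier": repeat_outliers,
            "false_outlier": false_outliers, "neutral": neutral}


if __name__ == "__main__":
    # Hypothetical loci A to E in three lakes
    detections = {
        "lake1": {"A": {"Bayescan", "Lositan"}, "B": {"Lositan", "DetSel"}, "C": {"lnRH"}},
        "lake2": {"A": {"Lositan", "DetSel"}, "B": {"DetSel"}},
    }
    monomorphic = {"lake3": {"E"}}
    print(classify_loci(detections, monomorphic, {"A", "B", "C", "D", "E"}))
```

In this toy example A is a repeat outlier, B is an outlier in one lake only, C is a false outlier (a single test), D is neutral, and E is excluded from the neutral set because it is monomorphic in one lake.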
Population genetic analyses
Mean number of alleles (NA), observed heterozygosity (HO), expected heterozygosity (HE; Nei, 1987) and the percentage of polymorphic loci were calculated for each ecotype in each lake using GenAlEx version 6.2 (Peakall & Smouse, 2006). We tested for evidence of a recent demographic contraction within each lake because some populations have undergone substantial declines (Shephard, 2000) and bottlenecks can significantly bias outlier detection results (Teshima et al., 2006). We used the neutral loci with the one-tailed Wilcoxon (Cornuet & Luikart, 1996) and mode-shift indicator tests implemented in Bottleneck (Piry et al., 1999), under a two-phase model (TPM) of mutation (80% stepwise mutation model; Di Rienzo et al., 1994). We also calculated the M-ratio using M_P_VAL.exe (Garza & Williamson, 2001), with a marker mutation rate μ of 5 × 10⁻⁴ and a pre-bottleneck Ne ranging from 500 to 5000, giving values of θ (= 4Neμ) from 1 to 10. For this TPM, the proportion of single-step mutations was ps = 0.9 and the mean size of multistep mutations was 3.5 repeat units, as recommended by Garza & Williamson (2001).
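The per-locus summaries and the bottleneck quantities above reduce to a few formulas, sketched below. They are HO as the proportion of heterozygous individuals, HE = 1 − Σp², the small-sample correction 2N/(2N − 1), M = k/(r + 1) with r the allelic size range in repeat units, and θ = 4Neμ. Which HE variant GenAlEx reported in this study is not stated here, so both are returned. The genotypes and allele sizes are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Genotype = Tuple[int, int]  # two allele sizes (bp) for one diploid individual


def locus_diversity(genotypes: List[Genotype]) -> Tuple[int, float, float, float]:
    """Return (NA, HO, HE, unbiased HE) for one locus in one population."""
    alleles = [a for g in genotypes for a in g]
    n = len(alleles)  # number of gene copies, 2N
    counts = Counter(alleles)
    ho = sum(a != b for a, b in genotypes) / len(genotypes)
    he = 1.0 - sum((c / n) ** 2 for c in counts.values())
    he_unbiased = n * he / (n - 1)  # small-sample correction, 2N / (2N - 1)
    return len(counts), ho, he, he_unbiased


def m_ratio(allele_sizes_bp: List[int], repeat_len_bp: int) -> float:
    """Garza & Williamson (2001): M = k / (r + 1), r = allelic size range in repeat units."""
    k = len(set(allele_sizes_bp))
    r = (max(allele_sizes_bp) - min(allele_sizes_bp)) / repeat_len_bp
    return k / (r + 1)


def theta(ne: float, mu: float) -> float:
    """theta = 4 * Ne * mu."""
    return 4.0 * ne * mu


if __name__ == "__main__":
    genos = [(100, 102), (102, 102), (100, 104), (104, 104)]  # hypothetical dinucleotide locus
    print(locus_diversity(genos))               # NA = 3, HO = 0.5, HE = 0.65625, uHE = 0.75
    print(m_ratio([100, 102, 104, 110], 2))     # 4 alleles over 5 repeat steps -> 4/6
    print(theta(500, 5e-4), theta(5000, 5e-4))  # the 1-to-10 range given in the text
```

A bottleneck tends to remove alleles faster than it shrinks the allelic size range, so M falls. This is why unusually low M-ratios are read as a signature of demographic contraction.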
To ensure that each ecotype pair could be considered an independent sample, we used Structure 2.3.3 (Pritchard et al., 2000) to test whether geographically separated populations and ecotypes corresponded to discrete genetic units at neutral loci. Run length was set to 1 000 000 MCMC steps after a burn-in period of 500 000, using correlated allele frequencies under an admixture model with the locprior option (i.e. ecotype and lake were used as prior information to assist clustering when structure is relatively weak; Hubisz et al., 2009). Values of K from 1 to 10 were evaluated, with five iterations per K. The most likely number of clusters was determined using the method of Pritchard et al. (2000) and the ∆K approach (Evanno et al., 2005) as implemented in Structure Harvester (Earl & vonHoldt, 2012). The level of genetic similarity within and among ecotype pairs from the five lakes was visualized using a discriminant analysis of principal components (DAPC), conducted in R (R Development Core Team, 2012) with the adegenet package version 1.3.0 (Jombart et al., 2010). DAPC is a model-free approach that produces synthetic variables that maximize among-population variation while minimizing variation within predefined groups, allowing the differences to be visualized in two dimensions. We retained principal components explaining 90% of the variance and plotted all individuals on the first two principal components.
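The ∆K statistic of Evanno et al. (2005) can be sketched from replicate ln P(D) values as shown below. The sketch uses the mean-based form: L′(K) is the difference between mean ln P(D) at K and K − 1, |L″(K)| = |L′(K + 1) − L′(K)|, and ∆K = |L″(K)| divided by the standard deviation of ln P(D) at K. We believe Structure Harvester reports ∆K in this form but have not checked its code, so treat that as an assumption. The replicate likelihoods below are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def evanno_delta_k(lnp: Dict[int, List[float]]) -> Dict[int, float]:
    """
    lnp[K]: replicate ln P(D) values from STRUCTURE runs at that K.
    Returns delta K for every K with both K - 1 and K + 1 present (so not the smallest or largest K).
    """
    m = {k: mean(v) for k, v in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 in lnp and k + 1 in lnp and len(lnp[k]) > 1:
            l1_k = m[k] - m[k - 1]       # L'(K)
            l1_next = m[k + 1] - m[k]    # L'(K + 1)
            l2 = abs(l1_next - l1_k)     # |L''(K)|
            sd = stdev(lnp[k])           # sample SD of ln P(D) across replicates at K
            if sd > 0:
                delta[k] = l2 / sd
    return delta


if __name__ == "__main__":
    # Hypothetical ln P(D) values, two replicate runs per K
    runs = {1: [-1000.0, -1000.0], 2: [-801.0, -799.0], 3: [-791.0, -789.0], 4: [-786.0, -784.0]}
    dk = evanno_delta_k(runs)
    print(dk, "best K:", max(dk, key=dk.get))
```

Because ∆K needs both neighbours, it cannot be computed for K = 1. It therefore cannot by itself support a single cluster, which is one reason it is paired here with the likelihood-based method of Pritchard et al. (2000).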
Divergence at neutral and outlier loci
Because loci truly under selection are expected to show patterns of genetic variation distinct from those of neutral loci (Storz, 2005), we compared global and lake-by-lake estimates of population differentiation across four data sets: all outliers, outliers identified in each respective lake (‘lake-specific outliers’), repeat outliers and neutral loci. An analysis of molecular variance (AMOVA; Excoffier et al., 1992) was used to assess the hierarchical partitioning of genetic variation among lakes and among ecotypes within lakes. Pairwise FST (Weir & Cockerham, 1984) was used to assess genetic differentiation between ecotypes within each lake. Both analyses were implemented in arlequin 3.5 (Excoffier & Lischer, 2010). The 95% CIs were estimated by bootstrapping over loci. We used Structure 2.3.3 to determine the number of genetically distinct clusters within each lake, with the same settings as above: 1 000 000 MCMC steps after a burn-in of 500 000, correlated allele frequencies, an admixture model with the locprior option (i.e. ecotype) and K from 1 to 10 with five iterations per K. The most likely number of clusters was again determined using the method of Pritchard et al. (2000) and the ∆K approach (Evanno et al., 2005) as implemented in Structure Harvester (Earl & vonHoldt, 2012).
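Bootstrapping over loci treats loci as the resampling unit. A multilocus FST estimated as a ratio of summed variance components is recomputed for each resample, and percentiles of the resampled values give the CI. The sketch below shows a generic percentile bootstrap of this kind. It assumes that per-locus components (the among-population component and the total) are already available, for example from a Weir & Cockerham (1984) decomposition. The numbers are hypothetical, and we do not claim this mirrors arlequin's internal procedure.

```python
import random
from typing import List, Tuple

Components = List[Tuple[float, float]]  # per locus: (among-population component, total)


def multilocus_fst(components: Components) -> float:
    """Ratio-of-sums estimator: sum of among-population components / sum of totals."""
    num = sum(a for a, _ in components)
    den = sum(t for _, t in components)
    return num / den


def bootstrap_ci(components: Components, n_boot: int = 10000,
                 level: float = 0.95, seed: int = 1) -> Tuple[float, float]:
    """Percentile CI from resampling loci with replacement."""
    rng = random.Random(seed)
    n = len(components)
    reps = sorted(
        multilocus_fst([components[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    alpha = 1.0 - level
    lo_idx = max(0, round(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return reps[lo_idx], reps[hi_idx]


if __name__ == "__main__":
    # Hypothetical per-locus components for five loci
    comps = [(0.01, 0.5), (0.05, 0.6), (0.02, 0.4), (0.10, 0.7), (0.00, 0.3)]
    print(multilocus_fst(comps), bootstrap_ci(comps, n_boot=2000))
```

With few loci the interval is wide and driven by whichever loci happen to be resampled. This is worth keeping in mind when comparing CIs from the 4-locus repeat-outlier set with those from the 15-locus sets.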
Sequence similarity searches were conducted for all outlier loci against the consortium for Genomics Research on All Salmon (cGRASP) database and the Salmonidae database in BLAST to identify candidate genes and their functional annotations (Salem et al., 2010). For the repeat outliers and two strong outliers (Table S3), primers avoiding the repeat region were designed using Primer3 (Rozen & Skaletsky, 2000). Approximately 300–600 base pairs of each candidate gene were Sanger sequenced for four stream- and four beach-spawning individuals. Sequences were aligned with the reference nucleotide sequence in Geneious 5.6 (Drummond et al., 2011) and screened for single nucleotide polymorphisms (SNPs). If SNPs were identified, the locus was sequenced in an additional eight beach- and eight stream-spawners to better resolve population-level patterns of differentiation.
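Screening an alignment for SNPs amounts to scanning each column for more than one unambiguous nucleotide, and flagging columns where a base is aligned against a gap as candidate indels. The study did this in Geneious. The stand-alone sketch below only illustrates the logic, and the sample identifiers and sequences are hypothetical.

```python
from typing import Dict, List


def find_variable_sites(aligned: Dict[str, str]) -> List[dict]:
    """
    aligned: sample id -> aligned sequence (equal length; '-' = gap, 'N' = ambiguous).
    Returns one record per alignment column that varies among samples (1-based positions).
    """
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for pos in range(length):
        column = [s[pos].upper() for s in seqs]
        bases = {c for c in column if c in "ACGT"}  # ignore gaps and ambiguity codes
        if len(bases) > 1:
            sites.append({"pos": pos + 1, "type": "SNP", "alleles": sorted(bases)})
        elif "-" in column and bases:
            sites.append({"pos": pos + 1, "type": "indel", "alleles": sorted(bases) + ["-"]})
    return sites


if __name__ == "__main__":
    # Hypothetical short alignment: two stream- and two beach-spawners
    aln = {
        "stream_1": "ACGTACGTTA",
        "stream_2": "ACGTACGTTA",
        "beach_1": "ACGTACGTTA",
        "beach_2": "ATGTAC--TA",
    }
    for site in find_variable_sites(aln):
        print(site)
```

Real data would also need quality filtering of base calls before this step. Otherwise sequencing errors near read ends are easily mistaken for SNPs.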
Results
Data quality and defining genetic units
Eleven individuals with > 20% missing data were removed from the data set. Three loci that exhibited false alleles (EV103, EV626 and One109) and deviations from HWE (EV626 and One109) in Okanagan Lake were removed. Three pairs of loci were in LD in Okanagan and Wood Lakes (OMM5099 & Ots29, Ca687 & EV712 and Ca613 & Ots14), so the locus with more missing data in each pair was removed (Ca613, Ca687 and Ots29). Locus Ssa85 was also removed owing to inconsistent genotypic data. As a result, 525 individuals and 50 loci were retained in the final data set, which had 1.4% missing data overall. No lake showed evidence of a recent population bottleneck regardless of approach (heterozygote excess, mode-shift, M-ratio; Table 1). Spawning sites of the same habitat type were pooled in each lake, as estimates of neutral genetic differentiation within each group were low on average (FST = 0.008), ranging from −0.003 to 0.021 (Table S4). Only two of 12 intra-ecotype comparisons were significant, both in Okanagan Lake (NE beach/NW beach: FST = 0.012; Penticton Creek/Mission Creek: FST = 0.021; Table S4). Given these extremely low FST values and the results of previous detailed analyses of Okanagan Lake that revealed no site-specific patterns (Russello et al., 2012), we pooled individuals by ecotype within each lake for all downstream analyses.
Table 1. Estimates of population genetic parameters for each ecotype within each lake using all 50 loci including: sample size (N), mean number of alleles per locus (NA), range in the number of alleles, mean expected heterozygosity (HE), mean observed heterozygosity (Ho) and the percentage of polymorphic loci (%poly)
Within-population diversity at all loci
The level of polymorphism varied across lakes but was high overall (80–100%; Table 1). Polymorphism was lower in lakes outside the Okanagan River Chain: Tchesinkut Lake kokanee had eight monomorphic loci (EV291, EV691, OMM5032, OMM5037, OMM5099, OMM5121, One8, Ots06), Duncan Lake had three (EV723, EV911, Ots06) and Kootenay Lake had one (Ots06). No consistent differences in NA or HE were observed between beach- and stream-spawners, although HE was slightly higher in stream-spawners in four of five lakes. Overall, NA ranged from 1 to 23 per locus (mean 5.35), and HE ranged widely from 0 to 0.93 across loci (mean 0.46).
Outlier locus detection and dataset definition
Outlier locus detection rates varied substantially across the four methods. Bayescan was the most conservative whereas Lositan and DetSel had the highest outlier detection rates (Table 2). In total, 30 of 43 EST-linked microsatellite loci (69.8%) and three of seven anonymous microsatellite loci (42.9%) were detected as putative outliers by at least one method in at least one lake. Based on the patterns in positive detections across the four methods, 15 outliers were retained (30% overall) and 18 were discarded as false outliers (Table 2). Four of the 15 outliers were repeatedly detected in multiple lakes (‘repeat outliers’), although no locus was identified as an outlier in all five lakes. Of the 17 loci not detected as outliers by any test in any lake, 15 were polymorphic in all five lakes and retained in the neutral data set.
Table 2. Loci detected as outliers by four different algorithms (Bayescan/Lositan/DetSel/lnRH) in each of five British Columbian Lakes. Loci detected by only one approach are listed as false outliers (False). Loci detected by at least two approaches in at least one lake are outliers (Outlier; light shading), and those outliers that are detected in two or more lakes are repeat outliers (R outlier; dark shading). Each marker is identified as either an EST-linked microsatellite marker (EST) or an anonymous microsatellite marker (Anon)
mono indicates that the locus is monomorphic in that lake.
The Bayesian clustering analysis revealed two discrete genetic clusters (primary ΔK peak at K = 2). One corresponded to kokanee native to the Columbia River drainage (Okanagan, Wood, Kootenay and Duncan Lakes) and the other to Tchesinkut Lake, the sole representative of the Fraser River drainage (Fig. 2a). A secondary ΔK peak at K = 6 further indicated discrete clusters by lake, with Wood Lake stream- and beach-spawners constituting separate units (Fig. 2b). Similarly, the DAPC plot clustered groups geographically: the first principal component emphasized genetic differences between populations from the Columbia River drainage and Tchesinkut Lake, and the second reflected differences among and within the Okanagan and Kootenay River Chains (Fig. 3). Considerable overlap was observed between Okanagan and Wood Lake kokanee, and between Kootenay and Duncan Lake kokanee (Fig. 3).
Neutral vs. adaptive genetic variation
A global AMOVA revealed that genetic variation among ecotypes putatively due to selection (outlier loci) exceeded that due to drift (neutral loci) (Table 3). In the AMOVA, a significant difference was detected between the four repeat outliers and the 15 neutral loci (chi-squared test, P = 0.002), but not between all 15 outliers and the 15 neutral loci (chi-squared test, P = 0.332). Within each lake, ecotype differentiation was higher at outlier loci (average FST = 0.074 ± 0.034, 95% CI) than at neutral loci (average FST = 0.009 ± 0.012, 95% CI; Table 4). Furthermore, ecotype differentiation in Okanagan, Kootenay and Tchesinkut Lakes was significantly greater at lake-specific outliers than at repeat outliers (paired t-test; P = 0.018). Nevertheless, among-lake divergence substantially exceeded within-lake ecotype divergence in all comparisons (Table 4).
Table 3. The percentage of genetic variation occurring among lakes and among ecotypes within lakes as assessed by a hierarchical analysis of molecular variance (AMOVA). Estimates were obtained separately for repeat outliers, all outliers and neutral loci
4 Repeat outliers (%)
15 Outliers (%)
15 Neutral loci (%)
Indicates a significant level of differentiation (P < 0.05).
Table 4. The proportion of genetic variation occurring among ecotype pairs from five lakes as assessed by pairwise FST using repeat outliers, outliers identified in each respective lake and neutral loci
4 Repeat outliers
Lake-specific outliers (number of loci)
15 Neutral loci
Indicates a significant level of differentiation (P < 0.05).
Using all loci, Structure analyses revealed distinct within-lake structure corresponding to ecotype (K = 2; Fig. 4) in three of the five lakes (Okanagan, Wood and Kootenay). Using the neutral loci, complete admixture (K = 1) was inferred for all lakes except Wood Lake, which was highly differentiated. No evidence of genetic structure within ecotype pairs was detected using the four repeat outliers alone (except in Wood Lake; data not shown).
Functional annotations of outlier loci
Annotations for nine of the 15 outlier loci were recovered via BLAST database searches (using ‘salmonidae’), including three of the four repeat outliers (Table 5). Two of the loci are associated with genes coding for transmembrane receptors (EV358, OMM5003), three with enzymes (EV170, EV685, EV740), two with protein transporters (EV691, TAP2) and one with a transcription regulator (EV642) (Table 5). The percentage identity scores for four of the loci (EV685, EV691, EV740, EV862) were exceedingly low, suggesting their annotations should be interpreted with caution. Annotations were not found for Ots06 and five of the OMM markers originally described in rainbow trout (Oncorhynchus mykiss) by Rexroad et al. (2005).
Table 5. Gene annotations for nine EST-linked microsatellite loci identified as outliers. Each annotated gene had the largest similarity score in either the cGRASP or BLAST database and was originally described in Atlantic salmon (Salmo salar). The name, accession number, similarity score (e-value), type of protein, location of expression and function are given if known. Annotated repeat outliers shaded in grey
Transcription activator for mitogen-activated protein kinase-signalling pathway
All five reliably annotated outlier loci (EV170, EV358, EV642, OMM5003, TAP2) and three strong lake-specific outliers (OMM5067, OMM5125, Ots06) were Sanger sequenced in adjacent regions. Two loci (EV170, EV642) produced unreliable sequence. Of the remaining loci, only TAP2 exhibited variation: two distinct haplotypes differing by 22 SNPs and a 12 base pair insertion. One haplotype was shared by all eight stream-spawners and five beach-spawners, whereas the other, highly divergent haplotype was unique to three beach-spawning individuals from Tchesinkut Lake.
Discussion
When using population genomic approaches to detect divergent selection, sampled populations should be closely related, recently diverged and subject to ongoing gene flow, all conditions that were met in the current study (Beaumont & Nichols, 1996; Schlötterer, 2002; Stinchcombe & Hoekstra, 2008). We found evidence of neutral differentiation among lakes (Table 3), including adjacent lakes, but not between ecotypes within lakes, except in Wood Lake (Table S4). Trends in polymorphism (Table 1) and clustering analyses (Fig. 3) suggest that Tchesinkut Lake kokanee are genetically isolated from the other four populations. This is consistent with the proposed colonization and phylogeographic history of Oncorhynchus nerka in British Columbia (Foote et al., 1989; Taylor et al., 1996). Specifically, sockeye from the Bering Refugium likely colonized post-glacial lakes in northern BC and Alaska, whereas sockeye from the Columbia Refugium colonized southern BC and the northern USA (Foote et al., 1989; Taylor et al., 1996). Weak neutral differentiation among populations from adjacent lakes in the Kootenay (i.e. Duncan and Kootenay Lakes) and Okanagan River Chains (i.e. Okanagan and Wood Lakes) further reflects the recent connectivity among them. After water levels reached their present level < 10 000 years ago, migration among these lakes would have been possible until the construction of several dams in the 1910s and 1960s (Table S1). Overall, these results indicate that kokanee in BC share a recent common lineage and that ecotype divergence was likely initiated and maintained within each lake. Neutral evolutionary processes are therefore unlikely to confound our interpretation of the putative signatures of selection detected in this study.
Wood Lake kokanee were unusual in exhibiting much greater differentiation at both neutral and putatively adaptive loci than any other ecotype pair. The annual supplementation of ~50 000 captively reared fry spawned from Wood Lake stream-spawners may be producing a bottleneck effect, although no signatures of demographic contraction were detected. Alternatively, the stocking programme may impose an artificial barrier to migration between spawning habitats, effectively rendering a subset of the stream-spawning population allopatric to the beach-spawning fish each year. From an ecological perspective, density-dependent selection may be stronger during the spawning or incubation period if abundance is not otherwise limited by lake productivity and predation. Wood Lake lacks the usual predators and is eutrophic, allowing it to support a high density of kokanee that reach much larger sizes than those in the other four, oligotrophic lakes. The high outlier detection rate and number of potential false positives identified in this lake emphasize the importance of sampling divergent populations that experience ongoing gene flow and/or multiple populations to reduce Type I error in studies of adaptive population divergence.
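The value of replicate populations for limiting Type I error can be made concrete with simple binomial arithmetic. The sketch below assumes, for illustration only, a per-test false-positive rate of α = 0.05 and that outlier tests in different lakes behave as independent trials; neither assumption is an estimate from this study, and real tests across connected lakes are rarely fully independent.

```python
from math import comb


def expected_false_positives(n_loci: int, alpha: float) -> float:
    """Expected number of neutral loci passing a test at level alpha by chance."""
    return n_loci * alpha


def prob_flagged_in_at_least(k: int, n_lakes: int, alpha: float) -> float:
    """Binomial probability that one neutral locus is flagged in >= k of
    n_lakes, treating each lake as an independent test with rate alpha."""
    return sum(
        comb(n_lakes, j) * alpha**j * (1 - alpha) ** (n_lakes - j)
        for j in range(k, n_lakes + 1)
    )


# 57 loci and 5 lakes as in this study; alpha = 0.05 is an illustrative assumption.
n_loci, n_lakes, alpha = 57, 5, 0.05
per_lake = expected_false_positives(n_loci, alpha)
p_repeat = prob_flagged_in_at_least(2, n_lakes, alpha)
repeat_fp = n_loci * p_repeat

print(f"Expected false positives per lake: {per_lake:.2f}")
print(f"P(neutral locus flagged in >= 2 lakes): {p_repeat:.4f}")
print(f"Expected neutral 'repeat outliers': {repeat_fp:.2f}")
```

Under these assumptions, roughly three neutral loci per lake would be expected to pass a 5% threshold by chance, whereas only about one neutral locus would be expected to appear as a repeat outlier in two or more of the five lakes.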
In contrast, Tchesinkut Lake kokanee exhibited substantially lower levels of genetic variation than those in all other lakes included in the analysis. There was no evidence for population contraction in this system that could account for this finding (Table 1). Rather, we suggest that ascertainment bias related to initial marker choice is the likely cause of the lower levels of variation detected in Tchesinkut Lake. The loci used were identified in a previous study of Okanagan Lake kokanee (Russello et al., 2012); Okanagan Lake, along with Wood, Kootenay and Duncan lakes, is part of the Columbia River system. As the sole representative of the Fraser River drainage, Tchesinkut Lake has an independent history that may have contributed to ascertainment bias and the associated lower levels of detected genetic variation.
Outlier locus detection
Accurately estimating parameters for natural populations with complex demographic histories is a challenge for all outlier detection methods, contributing to high Type I and II error rates, especially when selection is weak (Storz, 2005; Excoffier et al., 2009; Narum & Hess, 2011). Although stringent significance criteria and corrections for multiple testing are commonly used to reduce the number of false positives (Storz & Nachman, 2003; Egan et al., 2008), they also increase the risk of Type II error (Moran, 2003). Here, we combined the results of four conceptually different tests to minimize Type I error, as detection by methods that use different algorithms and assumptions (e.g. regarding the metric used, the model of evolution, and whether population sizes are equal or constant) is more likely to indicate truly adaptive genomic regions (Oetjen & Reusch, 2007; Shikano et al., 2010). Although many loci detected at the 90–95% level were later identified as putative false outliers, in many instances these loci were also detected by other algorithms and/or in other lakes at a higher level (e.g. OMM5003 in Tchesinkut Lake was detected at a posterior probability > 0.76 in Bayescan, > 95% in DetSel and > 99% in Lositan). This result suggests that the frequency and consistency of parallel outlier detection across lakes reported here may be underestimated owing to Type II error associated with our conservative approach to identifying outliers. Similarly, the genomic baseline was estimated using a combination of coding (EST-linked) and noncoding (anonymous) markers, which may have elevated the ‘background’ divergence and increased the probability of false negatives.
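Two elements of the combined approach can be sketched numerically. The first function converts a Bayes factor into a posterior probability, assuming equal prior odds for the neutral and selection models (an assumption made here for illustration; BayeScan allows other prior odds). Under that assumption, a log10 Bayes factor of 0.5, conventionally ‘substantial’ evidence on Jeffreys' scale, corresponds to the 0.76 Bayescan threshold quoted above. The second function applies a hypothetical consensus rule that retains loci flagged by at least a chosen number of methods; the method and locus names are placeholders, not results from this study.

```python
from __future__ import annotations

from collections import Counter


def posterior_from_log10_bf(log10_bf: float, prior_odds: float = 1.0) -> float:
    """Posterior probability of the selection model.

    Posterior odds = Bayes factor * prior odds, and P = odds / (1 + odds).
    prior_odds = 1.0 (equal prior odds) is an assumption for illustration.
    """
    odds = (10.0 ** log10_bf) * prior_odds
    return odds / (1.0 + odds)


def consensus_outliers(calls: dict[str, set[str]], min_methods: int) -> set[str]:
    """Return loci flagged by at least `min_methods` of the methods in `calls`."""
    counts = Counter(locus for flagged in calls.values() for locus in flagged)
    return {locus for locus, n in counts.items() if n >= min_methods}


# Jeffreys' 'substantial' threshold: log10(BF) = 0.5
print(round(posterior_from_log10_bf(0.5), 2))  # 0.76

# Hypothetical outlier calls from four methods in one lake (placeholders only)
calls = {
    "method_1": {"locus_A", "locus_B"},
    "method_2": {"locus_A", "locus_C"},
    "method_3": {"locus_A", "locus_B", "locus_D"},
    "method_4": {"locus_C"},
}
print(sorted(consensus_outliers(calls, min_methods=2)))  # ['locus_A', 'locus_B', 'locus_C']
```

Raising `min_methods` trades Type I for Type II error, which is the same tension described for stringent significance criteria.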
That said, our final detection rate (30%) was relatively high compared with other studies of similar scope (13–20%; Vasemägi et al., 2005; Namroud et al., 2008; Shikano et al., 2010), likely because of the initial screening for polymorphic EST-linked microsatellites that preceded this study (Russello et al., 2012) and the broad geographic scale of the analysis.
The repeat outliers represent the most promising candidate loci because they were detected in several ecotype pairs from different lakes (Table 2), suggesting that sampling bias did not generate the observed departure from the genomic average (Foll & Gaggiotti, 2008). Greater ecotype differentiation (Tables 3 and 4) and better resolution of population structure at outlier loci than at neutral loci suggest that selection may be acting on gene regions linked to the outlier loci identified here (Table 5). Duncan Lake kokanee showed very low levels of differentiation and structure regardless of the data set used, likely because the lake was formed only 47 years ago following the construction of Duncan Dam.
Lack of parallel genetic patterns
Some studies have revealed compelling evidence for a common genetic basis underlying the repeated evolution of adaptive traits in discrete populations (Colosimo et al., 2004). Here, we found consistent patterns in outlier detection across more than one lake, but no outlier locus was shared by more than three of the five lakes. It is unclear whether this lack of concordance across all lakes is a result of: 1) an inability to detect loci truly under selection; 2) alternative genes or genetic pathways underlying ecotype divergence in this system; and/or 3) phenotypic plasticity playing a formative role in driving kokanee spawning habitat differences. A simulation study by Teshima et al. (2006) showed that population genomic approaches may miss many loci under selection, especially if the selected locus was previously neutral. Also, if a soft selective sweep has taken place, the allele associated with the favoured trait would have been present in the population at low frequency for some time, allowing polymorphisms to accumulate at a linked microsatellite marker through neutral processes (i.e. mutation and drift). In such cases, many marker alleles may have hitchhiked when the population encountered a new selection regime and the selected allele moved towards fixation (Barrett & Hoekstra, 2011). Signatures of selection may also be weak if the environment is heterogeneously distributed, if traits under selection are not pleiotropic (Liao et al., 2010), or if linkage between the neutral marker and the fitness-relevant mutation has been broken up by recombination (Nielsen, 2005; Teshima et al., 2006; Stinchcombe & Hoekstra, 2008). Likewise, signatures of selection may be obscured by other confounding factors, including inflated estimates of neutral differentiation due to random chance, sampling bias (e.g. ascertainment bias; Thornton & Jensen, 2007; Deagle et al., 2012; Roesti et al., 2012) or the inclusion of unidentified strays (Excoffier et al., 2009).
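The erosion of a hitchhiking signal by recombination follows the standard decay of linkage disequilibrium under random mating, D_t = D_0 (1 - r)^t, where r is the recombination fraction between the marker and the selected site and t is the number of generations. The sketch below assumes, purely for illustration, about 10 000 years since post-glacial colonization and a 4-year generation time; these values are not estimates from this study.

```python
def ld_remaining(d0: float, r: float, generations: int) -> float:
    """Linkage disequilibrium after `generations` of random mating,
    D_t = D_0 * (1 - r) ** t, with r the recombination fraction."""
    return d0 * (1.0 - r) ** generations


# Illustrative assumptions only: ~10 000 years since colonization and a
# ~4-year generation time.
generations = 10_000 // 4  # 2500 generations

for r in (0.0001, 0.001, 0.01):
    remaining = ld_remaining(1.0, r, generations)
    print(f"r = {r}: {remaining:.3g} of the initial LD remains")
```

Even at r = 0.001, less than a tenth of the initial association would remain after ~2500 generations under these assumptions, which illustrates why only tightly linked markers are likely to retain a detectable signature.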
On the other hand, focusing on repeat outliers may result in overlooking loci of interest (Stinchcombe & Hoekstra, 2008). Even if selection favours the same phenotype in multiple isolated populations, mutations in different genes or different pathways could produce similar phenotypes. A recent meta-analysis estimated a 32% probability of reuse of the same genes in repeated phenotypic evolution in natural populations; although larger than expected, this proportion still suggests that different genes frequently underlie the same phenotype (Conte et al., 2012). Many other population genomic studies have failed to detect parallel patterns (Campbell & Bernatchez, 2004; Egan et al., 2008; DeFaveri et al., 2011; Renaut et al., 2011; Tice & Carlon, 2011), and others have shown that mutations in different genes have produced the same phenotype in closely related populations (Hoekstra et al., 2006). Here, the incongruence of outlier loci across lakes suggests that there may not be a common genetic basis underlying ecotype divergence in different lakes. For example, some loci were detected as strong outliers (e.g. > 99% CI) by two or three different methods in only a single lake (i.e. OMM5125 in Okanagan Lake and Ots06 in Wood Lake), and some loci identified as outliers in one lake were monomorphic in others. Imperfect parallelism in habitat features (e.g. water depth, substrate type, flow regimes) and/or phenotypic traits across the five lakes may explain these patterns. Another potentially confounding factor was the comparatively small number of loci sampled (n = 57); however, recent characterization of the kokanee transcriptome holds promise for identifying SNPs at a broader genomic scale to further investigate a common genetic basis for ecotype divergence (Lemay et al., 2013).
We were unable to identify polymorphic SNPs in the annotated genes associated with the outliers, except for TAP2B. At this locus, only two alleles were present across our entire sampling, corresponding to two distinct haplotypes differing at 22 SNPs and a 12-bp insertion. Interestingly, there is some preliminary indication of ecotype-specific haplotype frequencies at TAP2B in Okanagan, Wood and Kootenay lakes: the major allele in stream-spawners carried the 12-bp insertion (frequency 0.54–0.62), whereas the inverse pattern was detected in beach-spawners (0.33–0.40). TAP2B was detected as a high FST outlier, as has been reported previously in brown trout (Salmo trutta L.; Jensen et al., 2008), and exhibited no significant difference in observed heterozygosity between ecotypes. Although no similar insertions have been documented in the literature, the high conservation of alleles across the landscape resembles that found in brown trout (Keller et al., 2011). Further study involving larger sample sizes and additional lakes is required to determine whether this pattern is biologically meaningful in kokanee.
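The magnitude of differentiation implied by the TAP2B frequencies can be illustrated with Nei's G_ST for a biallelic locus, G_ST = (H_T - H_S)/H_T, where H_S is the mean within-ecotype expected heterozygosity and H_T is the expected heterozygosity computed from the mean allele frequency. The sketch below reads the reported ranges as frequencies of the insertion-bearing allele and uses their midpoints purely for illustration; it is not the estimator or the data used in the outlier analyses.

```python
from __future__ import annotations


def expected_het(p: float) -> float:
    """Expected heterozygosity 2pq at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def nei_gst(freqs: list[float]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T for one biallelic locus, weighting
    each population equally."""
    h_s = sum(expected_het(p) for p in freqs) / len(freqs)  # mean within-population
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_het(p_bar)  # from the mean allele frequency
    return (h_t - h_s) / h_t


# Midpoints of the reported ranges, read as insertion-allele frequencies
# (illustrative only).
stream = (0.54 + 0.62) / 2  # 0.58
beach = (0.33 + 0.40) / 2   # 0.365
print(round(nei_gst([stream, beach]), 3))  # 0.046
```

Ecotypes are weighted equally here; sample-size-corrected estimators such as Weir and Cockerham's θ would give somewhat different values.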
For all candidate genes, we sequenced only 300–600 bp in immediate proximity to the microsatellite repeat, whereas empirical studies have shown that a hitchhiking effect may extend over regions of 20–300 kb around a selected gene (Nash et al., 2005; Teschke et al., 2008). Consequently, our inability to detect additional informative SNPs does not necessarily invalidate these regions (or linked regions) as strong candidates for divergent selection. A more concerted effort to sequence adjacent exons and explore linked regions may yield additional insights.
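A quick calculation shows how small a fraction of a putative hitchhiking region the sequenced fragments represent, using only the ranges given above (300–600 bp sequenced; 20–300 kb affected) and assuming, for illustration, that each fragment lies entirely within that region.

```python
def coverage_fraction(sequenced_bp: float, region_bp: float) -> float:
    """Fraction of a hitchhiking region covered by one sequenced fragment,
    assuming the fragment lies entirely within the region."""
    return sequenced_bp / region_bp


best = coverage_fraction(600, 20_000)    # longest fragment, narrowest region
worst = coverage_fraction(300, 300_000)  # shortest fragment, widest region
print(f"{worst:.1%} to {best:.1%} of a putative hitchhiking region")
```

At most a few per cent of such a region would have been surveyed, so the absence of informative SNPs in the sequenced fragments says little about the rest of the region.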
Traits putatively under selection
Currently, little is known about the specific adaptations that differentially influence individual fitness in stream vs. beach environments. We have identified several candidate genes that allow us to make preliminary inferences about the traits potentially involved, based on their annotated functions. In particular, locus EV358 is linked to an immunological gene, CXCR4. In Atlantic salmon, CXCR4 is differentially expressed in response to sea lice (Skugar et al., 2008) and saprolegniasis, a fungal infection (Roberge et al., 2007). CXCR4 has potent chemotactic activity for lymphocytes and has been shown to inhibit haematopoietic stem cell proliferation (Nie et al., 2008). The TAP2B gene is also associated with the immune response: it encodes a subunit of an antigen-peptide transporter that hydrolyses ATP to translocate cytosolic peptides across the membrane of the endoplasmic reticulum, where the transporter associates with MHC class I molecules (Monaco, 1992). The functional role of TAP2 polymorphism is poorly understood even in humans (McCluskey et al., 2004), but some studies have found differences among allelic products in their ability to transport specific peptides (Monaco, 1992; Powis et al., 1992; Heemels & Ploegh, 1994) and strong linkage disequilibrium (LD) between MHC I and TAP2 (Ohta et al., 2003). The function of this protein can be suppressed by some viruses in brown trout (Salmo trutta), which suggests that selection may be pathogen-mediated (Abele & Tampé, 2006; Jensen et al., 2008; Keller et al., 2011). Moreover, studies of Alaskan sockeye have correlated habitat types and spawning sites with variation at SNPs in the MHC that are linked to immune function and disease resistance (Creelman et al., 2011; Gomez-Uchida et al., 2011).
In the case of kokanee, stream-spawners typically defend territories, mates (Morbey, 2002, 2003) and nests to a significantly greater extent than beach-spawners (Dill, 1996), making them more prone to physical injury and potentially increasing their susceptibility to infection. Interestingly, indirect, preliminary evidence of differential pathogen exposure between beach- and stream-spawners has been reported (Lemay et al., 2013), but additional studies are required to test this hypothesis explicitly.
Locus EV170 is associated with an enzyme in the citric acid cycle and is therefore critically involved in energy metabolism throughout the body. Selection on this gene may reflect the lower metabolic demands of beach-spawners, because migrating to beach sites is less taxing than migrating upstream. Overexpression of genes with key roles in the citric acid cycle (including malate dehydrogenase) has also been implicated in the divergence of dwarf and normal lake whitefish (St-Cyr et al., 2008). Although these candidate genes offer some insight into plausible mechanisms driving ecotype divergence in kokanee, the loci used here represent a very limited portion of the genome. For example, stream- and beach-spawning are temporally separated in many lakes, which may have a genetic basis, but no loci linked to spawning time were identified or explicitly included in this study (e.g. CLOCK genes; O'Malley et al., 2007). Ongoing genotyping-by-sequencing efforts to characterize genome-wide variation within and among kokanee ecotypes may reveal additional candidate genes within this broader system (M. Lemay and M. A. Russello, unpublished).
Loci exhibiting parallel patterns of outlier behaviour across multiple replicate ecotype pairs have the greatest potential to be under natural selection and to be involved in generating barriers to gene flow between divergent ecotypes. Here, we provide evidence for the independent colonization of beach habitats by stream-spawning counterparts in each lake. The lack of parallel patterns in outlier locus detection suggests that each beach-spawning population may be uniquely adapted to its habitat. We identified several candidate gene regions that putatively exhibit signatures of directional selection, which require further validation. Based on these initial findings, we infer that pathogen resistance and energy metabolism are plausible mechanisms contributing to the independent divergence of beach- and stream-spawning kokanee, yet we cannot rule out a role for plasticity and nonadaptive processes in contributing to spawning habitat differences. Overall, these insights demonstrate the utility of a population genomic approach for identifying candidate loci underlying fine-scale ecotype divergence and underscore the need to integrate multiple approaches when investigating causal links between phenotype, genotype and fitness in natural populations.
Paul Askey (BC Ministry of Forests, Lands and Natural Resource Operations) offered valuable background information regarding this system and, together with Jason Webster (Chara Consulting), was instrumental in providing samples from Okanagan and Wood Lakes. Jeff Burrows and Joe DeGisi (BC Ministry of Environment) facilitated sampling in Kootenay and Tchesinkut Lakes, respectively, and all samples from Duncan Lake were provided by Ico de Zwart (Masse Environmental Consultants). Matt Lemay provided helpful comments on an earlier version of this manuscript and Mark Rheault offered valuable advice regarding gene annotations. Funding was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC; Discovery grant # 341711-07 to MR) and KF was partially supported by an NSERC Canada Graduate Scholarship and a Pacific Century Award.