Data quality and summary statistics
DNA extracts from a total of 847 individuals were analyzed with the SNP assay (231 historical samples were discarded due to contamination or poor DNA quality). In these samples, 1011 SNPs were successfully genotyped, of which 935 passed the quality criteria and were used for analysis. The mean genotype concordance among replicate samples was 98%, and the mean call rate per sample was 93%. Individual samples were polymorphic at between 86% and 99% of loci, and expected heterozygosity (He) ranged from 0.25 to 0.32 (Table 1).
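The per-sample summaries above can be computed along the following lines. This is a minimal sketch, not the pipeline used in the study. It assumes genotypes coded as 0, 1 or 2 copies of an arbitrary reference allele, with None for a failed call. It also computes He as the mean of 2p(1 − p) over loci without a sample-size correction, and that choice of estimator is an assumption. All function names are hypothetical.

```python
from typing import List, Optional, Sequence

# Genotype = number of copies (0, 1, 2) of an arbitrary reference allele;
# None marks a failed call. This coding is an assumption for illustration.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of loci with a non-missing call for one sample."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def replicate_concordance(rep_a: Sequence[Genotype], rep_b: Sequence[Genotype]) -> float:
    """Fraction of loci called in both replicates whose genotypes agree."""
    both = [(a, b) for a, b in zip(rep_a, rep_b) if a is not None and b is not None]
    if not both:
        raise ValueError("no locus was called in both replicates")
    return sum(a == b for a, b in both) / len(both)


def mean_expected_heterozygosity(loci: Sequence[Sequence[Genotype]]) -> float:
    """Mean of 2p(1 - p) over loci, with p estimated from called genotypes.

    No small-sample correction is applied (an assumption; unbiased
    estimators multiply by 2n / (2n - 1)).
    """
    values: List[float] = []
    for locus in loci:
        called = [g for g in locus if g is not None]
        if not called:
            continue  # skip loci with no calls in this sample
        p = sum(called) / (2 * len(called))
        values.append(2 * p * (1 - p))
    return sum(values) / len(values)


# Example: one sample genotyped at ten loci, two of which failed.
print(call_rate([0, 1, 2, None, 1, 1, 0, 2, None, 1]))  # 0.8
```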
In single-locus tests for HWE, 1471 of 28 050 tests had P < 0.05, with the highest concentrations in samples OWE10 and QAQ08 (87 and 77 of 935 loci with P < 0.05, respectively). However, after FDR correction, only 13 tests remained significant (q < 0.05), and these were scattered across loci and samples. The number of significant LD associations among loci varied between samples; overall, 1747 of the 436 612 possible pairwise comparisons among loci had a mean r² > 0.1 within ‘pure’ samples (Fig. S1). After discarding one locus from each of these high-LD pairs, a set of 693 loci remained, which was used for specific steps in the analysis as described below.
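The two filtering steps in this paragraph can be sketched as follows. The FDR procedure is not named here, so the sketch assumes the Benjamini–Hochberg adjustment as one common choice. The LD filter assumes a greedy rule that repeatedly drops the locus involved in the most remaining high-LD pairs (r² > 0.1). That is one way of "discarding one locus from each pair", but not necessarily the rule the authors applied. Both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (often reported as q-values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def prune_ld_pairs(n_loci: int, ld_pairs: Iterable[Tuple[int, int]]) -> List[int]:
    """Greedily drop loci until no listed high-LD pair has both members kept.

    At each step the locus in the most remaining pairs is removed
    (ties broken by the lower index). Returns the indices that are kept.
    """
    partners: Dict[int, Set[int]] = {i: set() for i in range(n_loci)}
    for a, b in ld_pairs:
        if a == b:
            continue  # ignore malformed self-pairs
        partners[a].add(b)
        partners[b].add(a)
    removed: Set[int] = set()
    while True:
        degree = {k: len(v) for k, v in partners.items() if v}
        if not degree:
            break
        victim = max(degree, key=lambda k: (degree[k], -k))
        removed.add(victim)
        for other in partners[victim]:
            partners[other].discard(victim)
        partners[victim].clear()
    return [i for i in range(n_loci) if i not in removed]


# Example: HWE p-values for four tests, then LD pruning of six loci.
q = bh_qvalues([0.01, 0.04, 0.03, 0.5])
significant = [i for i, value in enumerate(q) if value < 0.05]
kept = prune_ld_pairs(6, [(0, 1), (1, 2), (3, 4)])
```

A locus can appear in several high-LD pairs, so any rule of this kind removes fewer loci than there are pairs. This is consistent with 1747 pairs leaving 693 of the 935 loci.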
Pairwise FST estimates between the CAN08 sample and all other samples ranged from 0.072 to 0.170. For comparisons within the Greenland–Iceland system, estimates ranged from −0.003 to 0.072 and were highest between ISC02 and most other samples, except the other Icelandic samples and the Nuuk inshore samples (Fig. S2). Most pairwise comparisons (393 of 406) showed significant differences in allele frequencies after correction for multiple testing. Notable exceptions were the comparisons among the Nuuk samples and among the west coast offshore samples (Fig. S2).
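The sketch below shows how pairwise FST values of this kind can be computed. It uses Hudson's estimator in its ratio-of-averages form, which sums numerators and denominators over loci before dividing. This is a swapped-in estimator. The text does not say which estimator produced the values above, and a Weir–Cockerham θ will give somewhat different numbers. Sample sizes are counts of sampled allele copies (twice the number of genotyped diploid individuals). The function name is hypothetical.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float],
               n1: Sequence[int], n2: Sequence[int]) -> float:
    """Ratio-of-averages Hudson FST between two samples.

    p1, p2: per-locus frequencies of the same allele in each sample.
    n1, n2: per-locus numbers of sampled allele copies (must be > 1).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, na, nb in zip(p1, p2, n1, n2):
        # Squared frequency difference minus its expected sampling noise.
        numerator += (a - b) ** 2 - a * (1 - a) / (na - 1) - b * (1 - b) / (nb - 1)
        # Between-sample heterozygosity for this locus.
        denominator += a * (1 - b) + b * (1 - a)
    return numerator / denominator


# Example: three loci, 50 diploid fish (100 allele copies) per sample.
fst = hudson_fst([0.10, 0.45, 0.80], [0.15, 0.40, 0.70], [100] * 3, [100] * 3)
```

Expected sampling variance is subtracted in the numerator. As a result, two samples drawn from the same population give estimates scattered around zero, which is how small negative values such as −0.003 arise.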
Consistent with these results, the K-means analysis (excluding the divergent CAN08 sample) showed that clustering solutions with three or four groups had the lowest BIC scores and were therefore best supported (Fig. S3A). Two groups were consistent across both solutions: one (the ‘East’ cluster) containing the majority of individuals in the Icelandic offshore sample, the east Greenland samples and the southernmost offshore samples from western Greenland, and another (the ‘West’ cluster) containing the majority of individuals from the remaining western Greenlandic samples, except the fjord samples from around Nuuk and portions of the contemporary Sisimiut samples (Table 1). The three-cluster solution grouped the Icelandic and Nuuk inshore samples together, whereas the four-cluster solution separated them (Fig. S4A). Because this separation is geographically meaningful and the differences between these samples are significant and temporally stable, we proceeded with the four-cluster solution.
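The model-selection step can be illustrated with plain K-means and a BIC computed from the within-cluster sum of squares (WSS): BIC = n·ln(WSS/n) + k·ln(n). We assume this form mirrors the criterion of the `find.clusters` function in the R package adegenet, which is usually run on retained principal-component scores, but that correspondence is an assumption. The hypothetical helpers below use k-means++ seeding with several restarts and accept any list of numeric coordinate tuples.

```python
import math
import random
from typing import Dict, List, Sequence

Point = Sequence[float]


def _sqdist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(points: Sequence[Point], centers: List[List[float]]) -> List[int]:
    return [min(range(len(centers)), key=lambda j: _sqdist(p, centers[j])) for p in points]


def _kmeans_wss(points: Sequence[Point], k: int, rng: random.Random, max_iter: int = 100) -> float:
    """One k-means run (k-means++ seeding, Lloyd updates); returns the WSS."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(_sqdist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # every point already coincides with a center
            centers.append(list(rng.choice(points)))
            continue
        threshold, acc = rng.uniform(0, total), 0.0
        for p, w in zip(points, d2):  # sample a point with probability ~ d2
            acc += w
            if acc >= threshold:
                centers.append(list(p))
                break
    for _ in range(max_iter):
        labels = _assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous center if a cluster becomes empty.
            new_centers.append([sum(c) / len(members) for c in zip(*members)] if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = _assign(points, centers)
    return sum(_sqdist(p, centers[lab]) for p, lab in zip(points, labels))


def kmeans_bic(points: Sequence[Point], k_values: Sequence[int],
               n_init: int = 10, seed: int = 1) -> Dict[int, float]:
    """BIC = n*ln(WSS/n) + k*ln(n) for each k, using the best of n_init runs."""
    rng = random.Random(seed)
    n = len(points)
    bic = {}
    for k in k_values:
        wss = min(_kmeans_wss(points, k, rng) for _ in range(n_init))
        bic[k] = n * math.log(max(wss, 1e-12) / n) + k * math.log(n)
    return bic


# Example: three simulated groups of 30 points in two dimensions.
gen = random.Random(0)
points = [(cx + gen.gauss(0, 0.5), cy + gen.gauss(0, 0.5))
          for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(30)]
scores = kmeans_bic(points, [1, 2, 3, 4])
```

The penalty term is weak, so the BIC curve can keep decreasing slowly beyond the number of real groups. For that reason, candidate values of k are usually read from where the curve levels off rather than from a strict minimum.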
The positions of individuals on the DFs overlapped considerably among samples. However, the mean coordinates of each sample show that the first DF (representing 61.7% of the discriminating power) resolves a continuum from the Greenlandic inshore samples through the West and East offshore samples to the Icelandic inshore samples (Fig. 2). The second DF (27.6% of the discriminating power) separates inshore samples (in both Greenlandic and Icelandic waters) from offshore samples (Fig. 2A). The third DF (10.6% of the discriminating power) separates both the inshore and the offshore groups into Icelandic and Greenlandic components, except for a few Greenlandic samples that cluster with the Icelandic samples, likely due to the presence of migrants (see below; Fig. 2B). Recoding the coordinates on the first two DFs as the intensities of red and green, respectively, visualizes the geographic distribution of these patterns (see the blended color plotted at each sample position in Fig. 1). Inspection of the allele loadings showed that discrimination along the first and third DFs was driven by a large number of SNPs spread across different LGs, whereas the strongest allele contributions to DF 2 (which separates inshore from offshore samples) came almost exclusively from SNPs in LG1 (Fig. S5).
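The red/green recoding of DF coordinates can be sketched as a min–max rescaling of each DF to [0, 1]. The rescaled values then serve as the red (DF 1) and green (DF 2) channel intensities. The linear rescaling across samples, the unused blue channel and the hexadecimal output are assumptions made for illustration. The exact mapping behind Fig. 1 is not described here, and the function names are hypothetical.

```python
from typing import List, Sequence


def minmax_scale(values: Sequence[float]) -> List[float]:
    """Linearly rescale values to [0, 1]; constant input maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def df_to_hex_colors(df1: Sequence[float], df2: Sequence[float]) -> List[str]:
    """Red intensity from DF 1, green from DF 2, blue fixed at 0."""
    reds = minmax_scale(df1)
    greens = minmax_scale(df2)
    return ["#{:02x}{:02x}00".format(round(r * 255), round(g * 255))
            for r, g in zip(reds, greens)]


# Example: mean coordinates of three hypothetical samples on DF 1 and DF 2.
colors = df_to_hex_colors([-2.0, 0.0, 2.0], [1.0, -1.0, 1.0])
```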
Figure 2. Scatterplots of the mean sample coordinates on the first and second (A) and the first and third (B) discriminant functions (DF) from the discriminant analysis of principal components (DAPC) based on the four inferred clusters. Contemporary sample names are plotted in white and historical sample names in gray. The background shading of the plot area illustrates the blended color gradient resulting from recoding coordinates on the first and second DF to intensity of red and green, respectively (see text).
With K-means clustering based on the full data set, 87% of individuals had a posterior membership probability >0.95 for one of the four clusters. In the cross-validation, where only half of the individuals were used as training data, assignment power remained high: 82% of the hold-out individuals had a posterior membership probability >0.95 for one of the clusters, and 94% of these assigned to the same cluster as in the full-data analysis. Because the results were consistent even when hold-out individuals were not used to define clusters or DFs, the reported cluster configuration appears well supported by the data.
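The sketch below computes the two cross-validation summaries: the share of individuals with a maximum posterior >0.95, and the share of those that keep the same cluster as in the full-data run. It assumes posteriors stored as one list of cluster probabilities per individual, in the same individual order for both runs. It also assumes that cluster labels have already been matched between the hold-out and full-data runs; K-means labels are arbitrary, so a real workflow needs that alignment step. Function names are hypothetical.

```python
from typing import Sequence


def _argmax(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=lambda j: probs[j])


def confident_fraction(posteriors: Sequence[Sequence[float]], threshold: float = 0.95) -> float:
    """Share of individuals whose largest posterior exceeds the threshold."""
    return sum(max(p) > threshold for p in posteriors) / len(posteriors)


def holdout_agreement(holdout: Sequence[Sequence[float]], full: Sequence[Sequence[float]],
                      threshold: float = 0.95) -> float:
    """Among confidently assigned hold-out individuals, the share assigned to
    the same cluster as in the full-data analysis (labels assumed aligned)."""
    confident = [(h, f) for h, f in zip(holdout, full) if max(h) > threshold]
    if not confident:
        raise ValueError("no hold-out individual passes the threshold")
    same = sum(_argmax(h) == _argmax(f) for h, f in confident)
    return same / len(confident)


# Example: posteriors for four hold-out individuals (two clusters), and the
# full-data posteriors for the same four individuals.
holdout = [[0.97, 0.03], [0.50, 0.50], [0.02, 0.98], [0.99, 0.01]]
full = [[0.99, 0.01], [0.90, 0.10], [0.96, 0.04], [0.98, 0.02]]
```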
At the aggregate level, 20 of the 28 samples had a mean membership probability >0.6 for a single cluster, while the remaining eight appeared to consist of mixtures of cod from different clusters (Table 1). Both ‘pure’ and ‘mixed’ samples consisted primarily of individuals that assigned with high probability to a single cluster (Fig. 3). However, some individuals appeared to be admixed, with relatively even membership probabilities across clusters. Notably, most of the Greenlandic west coast offshore samples appeared to contain roughly even mixtures of fish assigning with high probability to the ‘East’ and ‘West’ clusters, respectively. Meanwhile, the vast majority of fish in the coastal west coast samples assigned to the ‘West’ cluster (Fig. 3). There were two exceptions: the contemporary samples from SIS, which appeared to contain a considerable proportion of fish assigning to the ‘Nuuk’ cluster, and the contemporary samples from PAA and QAQ, which appeared to be made up of fish from the ‘Iceland-inshore’ and ‘East’ clusters, respectively (Fig. 3). Because the historical samples from these latter two locations contained almost exclusively ‘West’ individuals, the contemporary dominance of other clusters suggests complete population replacement in this region. In contrast to these stark temporal changes, other locations (UMM, ILL, KAP, and DAB) exhibited a high degree of temporal stability, as evident both from the assignment results (Fig. 3) and from the tight clustering of temporal replicates (Figs. 1 and 2).
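Classifying a sample as ‘pure’ or ‘mixed’ from individual posteriors amounts to averaging the posteriors within each sample and applying the 0.6 threshold to the largest mean. The sketch below assumes input as (sample ID, posterior list) pairs. The function name, data layout and example values are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def classify_samples(individuals: Iterable[Tuple[str, Sequence[float]]],
                     cluster_names: Sequence[str],
                     threshold: float = 0.6) -> Dict[str, Tuple[str, List[float]]]:
    """Average posteriors within each sample; label the sample by its top
    cluster if that mean exceeds the threshold, otherwise 'mixed'."""
    totals: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for sample, post in individuals:
        acc = totals.setdefault(sample, [0.0] * len(cluster_names))
        for j, p in enumerate(post):
            acc[j] += p
        counts[sample] = counts.get(sample, 0) + 1
    result = {}
    for sample, acc in totals.items():
        means = [a / counts[sample] for a in acc]
        best = max(range(len(means)), key=lambda j: means[j])
        label = cluster_names[best] if means[best] > threshold else "mixed"
        result[sample] = (label, means)
    return result


# Example with two hypothetical samples and two clusters.
summary = classify_samples(
    [("A", [0.9, 0.1]), ("A", [0.8, 0.2]), ("B", [0.9, 0.1]), ("B", [0.1, 0.9])],
    ["West", "East"],
)
```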
Figure 3. Plot of the posterior membership probabilities of each individual to the Iceland inshore (yellow), East (red), West (green), and Nuuk (brown) clusters, respectively. Each vertical line represents an individual and is divided into color segments proportional to its posterior membership probability to each of the geographic clusters derived from the discriminant analysis of principal components (DAPC) including only the ‘pure’ samples (see text). The order of individuals within samples is random, but samples are ordered according to hydrographic distance from the easternmost sample.
When loci potentially under selection (see below) and loci in strong LD were removed from the data, pairwise FST estimates were considerably lower than with all loci (ranging from 0.058 to 0.137 for the Canadian sample and from −0.003 to 0.028 in comparisons among Greenlandic and Icelandic samples), but 337 of 406 comparisons still showed significant differences in allele frequencies (Fig. S2). K-means clustering of this data subset clearly supported a solution with only two clusters (Fig. S3B): one containing the Icelandic samples (both inshore and offshore), the east coast samples, the contemporary QAQ and PAA samples and portions of the Nuuk samples, and a second containing the remainder of the Greenlandic samples (not a single Icelandic individual assigned to this cluster). The three-cluster solution corroborated this, except that it split the ‘Nuuk’ samples into their own cluster (Fig. S4B).
Spatial outlier detection
In all analyses, bayescan detected considerably more outliers than arlequin (often more than twice as many), but the arlequin outliers were almost exclusively a subset of the bayescan outliers. Here, we describe only outliers identified by both methods. In the comparison of all contemporary samples, 47 loci were either FST (differentiation among all samples) or FCT (differentiation among clusters) outliers, most of them both (Table S2), and all but six of these loci were located in one of three regions characterized by significant LD among loci within LG1, 2, and 7, respectively (see Fig. S1). Analysis of the Icelandic samples alone identified a large proportion of the global outliers in LG1 and LG7, but notably not those in LG2. Within Greenland, most of the global outliers in LG1, along with a number of single loci in other LGs, were also outliers at the regional scale (Table S2). Analysis of the historical Greenlandic samples suggested that this pattern was stable over time, although there were 30% fewer outliers among the historical samples (Table S2).
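Restricting attention to loci flagged by both methods and tallying them by linkage group can be sketched as below. The q < 0.05 cut-off for bayescan follows the threshold shown in Fig. 4. The p < 0.05 cut-off for arlequin is an assumption, since the exact arlequin criterion is not given here. The input dictionaries, locus names and function names are also hypothetical, and unanchored loci are counted under a placeholder label.

```python
from collections import Counter
from typing import Dict, Set


def shared_outliers(bayescan_q: Dict[str, float], arlequin_p: Dict[str, float],
                    q_cut: float = 0.05, p_cut: float = 0.05) -> Set[str]:
    """Loci flagged by both methods (the arlequin cut-off is an assumption)."""
    by_bayescan = {locus for locus, q in bayescan_q.items() if q < q_cut}
    by_arlequin = {locus for locus, p in arlequin_p.items() if p < p_cut}
    return by_bayescan & by_arlequin


def tally_by_linkage_group(loci: Set[str], locus_lg: Dict[str, str]) -> Counter:
    """Count outlier loci per linkage group; unmapped loci go under 'unanchored'."""
    return Counter(locus_lg.get(locus, "unanchored") for locus in loci)


# Example with hypothetical locus names and values.
shared = shared_outliers({"snp1": 0.01, "snp2": 0.02, "snp3": 0.30, "snp4": 0.04},
                         {"snp1": 0.001, "snp2": 0.20, "snp3": 0.01, "snp4": 0.03})
counts = tally_by_linkage_group(shared, {"snp1": "LG1"})
```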
Pairwise comparisons between the clusters showed that LG7 loci were outliers only in tests involving the ‘Iceland-inshore’ group (Fig. 4). Most of the global outliers in LG1 were outliers in all comparisons involving the ‘Iceland-inshore’ and the ‘Nuuk’ clusters, but less so in the comparison between these two, indicating a shared divergence of these two clusters from the others in this genomic region (Fig. 4, Table S2). The smallest number of outliers was found in the ‘West’–‘East’ comparison, but these outliers were located in different LGs and thus likely represent independent instances of genomic divergence. Few significant outliers were detected within clusters, except for a few cases in both the ‘East’ and ‘West’ historical samples.
Figure 4. Matrix of results from the bayescan spatial outlier tests in pairwise comparisons of the clusters. Each cell shows the q-value for each locus being under selection, plotted against genome position (ordered by linkage group, LG). Loci above the horizontal lines (representing q = 0.05) are considered significant outliers, and loci that were also outliers in the Arlequin analysis are marked by filled symbols. Circles represent loci with known position within LGs, whereas triangles denote loci that were anchored to an LG but with unknown position within the LG. Loci in LG1, 2, and 7 are highlighted in blue, purple, and red, respectively, whereas the remaining LGs are plotted in alternating shades of gray and loci that could not be anchored to the linkage map are plotted in black.
The bayenv analysis identified between 1 and 29 loci that were highly correlated with the environmental variables in the different comparisons (Table S2). All but two of the significantly correlated loci were also identified as spatial or temporal outliers. The high-LD group on LG1 that exhibited strong spatial outlier patterns correlated with a number of variables, including distance to shore, sea surface temperature range, and salinity. The spatial outlier loci on LG7 were correlated with longitude, which is expected given that these loci appeared divergent only between the Iceland-inshore cluster and the rest. However, a number of additional loci distributed across LGs also correlated with longitude. Different sets of loci, some on LG1, correlated with maximum and mean bottom temperature, whereas a consistent set of four loci correlated with minimum and mean surface temperature. Three of these four loci were involved in the differentiation between the Iceland-inshore samples and, in particular, the Nuuk samples (Table S2).
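bayenv tests for association between allele frequencies and environment while controlling for the covariance of allele frequencies among populations, and a short example cannot reproduce that. As a rough first look only, the sketch below ranks loci by the absolute Pearson correlation between per-population allele frequencies and one environmental variable. This naive substitute ignores population structure and is not the method used in the study. Names, data layout and example values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_loci_by_env(freqs: Dict[str, Sequence[float]], env: Sequence[float]) -> List[str]:
    """Order loci by |r| between per-population allele frequency and one
    environmental variable (populations in the same order in both inputs)."""
    score = {locus: abs(pearson_r(f, env)) for locus, f in freqs.items()}
    return sorted(score, key=lambda locus: score[locus], reverse=True)


# Example: four hypothetical populations along a temperature gradient.
ranking = rank_loci_by_env({"x": [0.1, 0.2, 0.3, 0.4], "y": [0.3, 0.1, 0.4, 0.2]},
                           [1.0, 2.0, 3.0, 4.0])
```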