Phylogeographic mitogenomics of Atlantic cod Gadus morhua: Variation in and among trans‐Atlantic, trans‐Laurentian, Northern cod, and landlocked fjord populations

Abstract The historical phylogeography, biogeography, and ecology of Atlantic cod (Gadus morhua) have been impacted by cyclic Pleistocene glaciations, where drops in sea temperatures led to sequestering of water in ice sheets, emergence of continental shelves, and changes to ocean currents. High‐resolution, whole‐genome mitogenomic phylogeography can help to elucidate this history. We identified eight major haplogroups among 153 fish from 14 populations by Bayesian, parsimony, and distance methods, including one that extends the species coalescent back to ca. 330 kya. Fish from the Barents and Baltic Seas tend to occur in basal haplogroups versus more recent distribution of fish in the Northwest Atlantic. There was significant differentiation in the majority of trans‐Atlantic comparisons (ΦST = .029–.180), but little or none in pairwise comparisons within the Northwest Atlantic of individual populations (ΦST = .000–.060) or defined management stocks (ΦST = .000–.023). Monte Carlo randomization tests of population phylogeography showed significantly nonrandom trans‐Atlantic phylogeography versus absence of such structure within various partitions of trans‐Laurentian, Northern cod (NAFO 2J3KL) and other management stocks, and Flemish Cap populations. A landlocked meromictic fjord on Baffin Island comprised multiple identical or near‐identical mitogenomes in two major polyphyletic clades, and was significantly differentiated from all other populations (ΦST = .153–.340). The phylogeography supports a hypothesis of an eastern origin of genetic diversity ca. 200–250 kya, rapid expansion of a western superhaplogroup comprising four haplogroups ca. 150 kya, and recent postglacial founder populations.

in the latter half of the 20th century led to severe population declines, first in Europe and then in North America. The northeast Newfoundland and Labrador stock ("Northern cod" NAFO 2J3KL) declined by >99%, and other areas by >90% (COSEWIC, 2003). As a result, a moratorium on commercial cod fishing was introduced in Canadian waters in 1992.
A food fishery, "Sentinel Survey," and some limited local commercial fishing have since been reopened (Rose, 2007). Populations in the Northwest Atlantic are categorized as Vulnerable on the IUCN Red List (International Union for Conservation of Nature; Sobel, 1996). In Based on distinctions in meristic analysis of vertebral counts (Templeman, 1979(Templeman, , 1981, cod in the waters off Newfoundland and Labrador have traditionally been managed as five stock divisions designated by the North Atlantic Fisheries Organization (NAFO; see Table 1). Elsewhere, cod stocks are delineated by marine area (e.g., Baltic Sea, Barents Sea, North Sea, and Iceland) with numerous regions subdivided further (e.g., Northeast Arctic vs. Norwegian coast).
Furthermore, genome scans of nuclear SNP markers have suggested F I G U R E 1 Atlantic cod (Gadus morhua) caught near Hopedale, Newfoundland and Labrador, Canada. Copyright: Linda Lait (2013) the presence of local adaptation and temperature gradients at both large and small scales (Árnason & Halldórsdóttir, 2015;Berg et al., 2016;Bradbury et al., 2010Bradbury et al., , 2013Nielsen et al., 2009).
We present here a mitogenomic analysis of population genetic structure in Atlantic cod. Carr and Marshall (2008a) Table S1).

| Sample collection and population units
A total of 153 Atlantic cod from 14 sampling locations were examined (Tables 1 and S1). Samples were collected by Fisheries and Oceans Canada from eight sampling locations in the Northwest Atlantic and by commercial trawlers from the Norwegian coast between July 1989 and February 1995 ( Figure 2, Table 1 Primers were originally designed for use with Atlantic cod (Coulson et al., 2006) or as part of this study. The control regions for the 32 partial mitogenomes from Carr and Marshall (2008a) were amplified using the primers g20F with g20-2R1 and g20-2F1 with g20R. PCRs were carried out using Qiagen PCR kits following the manufacturer's protocol with the following modifications: 2.0 mM MgCl 2 , 0.4 μM forward and reverse primer, and 40 extension cycles. Primer details and annealing temperatures are given in Table S2.
Five of the samples (3 LBH, 1 NGB, and 1 RAN) were also amplified in a series of overlapping long-range PCR fragments ranging from 2,876 to 5,741 bp, with an average overlap of 406 bp (77-846 bp).
There were no differences between the two sequencing methods; samples did not cluster by method, and four samples were sequenced with both methods to ensure that there were no differences (see Table   S1).

| Data analyses
The microarray DNA data compare relative signal intensities of experimental-to-reference DNA binding and use a base-calling algorithm based on a decision-tree from empirical rules (Carr et al., 2008(Carr et al., , 2009 the confidence in the call is estimated as a differential signal-to-noise ratio (dS/N), defined as the difference between the two highest signal intensities divided by the sum of all intensities. Empirically, calls made on both strands with dS/N ≥ 0.13 are considered "strong" and reliable, either in agreement with the reference sequence (invariant) or for a SNP variant, and calls with 0.10 ≤ dS/N < 0.13 are considered "weaker" but reliable if in agreement with the reference (Pope et al., 2011). Positions at which the two strand calls differ, or where dS/N < 0.10, were flagged with ambiguity codes for further analysis.

Population pairwise genetic distances (Φ ST ) for mtDNA genomes
were calculated in Arlequin v3.5.1 (Excoffier & Lischer, 2010). A modified false discovery rate correction (Benjamini & Yekutieli, 2001)  Clustering analysis was run with Bayesian Analysis of Population Structure (BAPS) v6 (Corander, Marttinen, Siren, & Tang, 2008). The analysis used the linked loci option, the codon model of linkage, and variable K (K ≤ 13; see Corander & Tang, 2007). The optimal K was determined based on the log marginal likelihood of the best-visited partitions. An unrooted statistical parsimony network was constructed in TCS v1.21 with a 90% connection limit (Clement et al., 2000) to visualize the relationship between haplotypes. All connections were confirmed by visual examination. A principal coordinate analysis (PCoA) was carried out in GenAlEx v6.5 (Peakall & Smouse, 2006 on individual samples and on population pairwise differences.

| Estimates of divergence and coalescent times
Divergence times were estimated from the Bayesian tree, based on the strict clock model in MrBayes v3.2 (Huelsenbeck & Ronquist, 2001;Ronquist & Huelsenbeck, 2003), and with the constant population model in BEAST v2.3 (Bouckaert et al., 2014). The MrBayes analysis was run as described above with uniform branch lengths.
The BEAST analysis was run using a strict clock and the Hasegawa, Kishino, and Yano (HKY) model with gamma-distributed rate variation and allowing for invariable sites (HKY + Γ + I). The models were run for 5,000,000 steps with a 25% burn-in and sampled every 10,000 steps. All ESS parameters were > 1,000. The trees were calibrated with a normal distribution on the assumption that the Atlantic cod separated from the congeneric Pacific cod G. macrocephalus ca. 3.8 million years ago during the last opening of the Bering Strait (Coulson et al., 2006;Grant & Ståhl, 1988;Vermeij, 1991). The observed mean genetic distance of 0.039 substitutions per site then gives an estimated divergence rate of 1.03 × 10 −8 substitutions/site/year, and a temporal interval of 5,844 years between substitutions. These values are similar to those calculated by Carr and Marshall (2008a) for the 15,655-bp cod genome without the control region, and only slightly higher than that calculated for the Alaska pollock (Carr & Marshall, 2008b) and Homo (Achilli et al., 2004). Similar results are obtained with Alaska pollock (Gadus chalcogrammus) as the outgroup (Lait, 2016). Estimates of time since expansion were calculated in DnaSP v5.10 (Librado & Rozas, 2009) using pairwise mismatches and τ = 2μt (Rogers & Harpending, 1992).

| Monte Carlo randomization test of phylogeographic structure
We used the Monte Carlo method described by Carr et al. (2015) to determine the degree to which the observed phylogeographic distribution of clades among sampling locations shows a significantly nonrandom fit to a priori models of geographic structure, as compared to the random distribution expected for such models. We grouped various sets of the 14 populations into four partitions, each comprising one or more populations, with a total count of N genomes, where the partitions comprised contiguous populations separated according to an a priori phylogeographic hypothesis. We tested three main models (I, I, and III) with two to six variants each (see Table 2). These were selected to explore (I) pan-Atlantic differentiation, (II) trans- The length of the observed tree (L) and the % cumulative frequency of the L-inclusive tail among 10,000 random trees (C%) are given. Refer to (Carr et al., 2015) for details of how L and C% are calculated. and 34 (4.8%) were first position transversions for alternative Leu co-

| Population structure
A total of 142 distinct mitogenome sequences were identified. Of these, 139 were unique to single individuals, and three were found in two, five, and seven fish from Lake Qasigialiminiq. Population sample sizes ranged from 7 to 18 (Table 1), except for GIL (n = 2), which is excluded from all population comparisons below. Nucleotide diversity (π) within populations ranged from 0.0016 to 0.0028 (Table 1) with Bayesian clustering analysis (Table 1)  Most Northwest Atlantic-only models were not significantly shorter than random, either with QAS included or excluded (p = .1202 and p = .4373, respectively). Alternatives that excluded either FLM alone or together with QAS remained highly significant (p < .0001 in either case). This indicates that the diphyletic composition of QAS did not unduly lengthen random trees, and that trans-Atlantic populations were significantly differentiated.
Atlantic sampling locations (Table 3a) Table 3a). This pattern is similar to that seen in the congeneric Pacific cod Gadus macrocephalus, where a combination of short mtDNA sequences (ND2 and CYTB) and microsatellites showed significant differences between trans-Pacific populations, and no distinction among geographically proximal locations (Canino, Spies, Cunningham, Hauser, & Grant, 2010). In this case, the authors suggested multiple glacial refugia on either side of the North Pacific Ocean with significant admixture following deglaciation.
In the Monte Carlo analysis, all five models that examined trans-Atlantic partitions (Models I and II, see Table 2)  In contrast, Models IIIa and IIIc test the geographic extremes of structure in the offshore Northwest Atlantic considered on its own, and neither departs from random (p > .4).
The present results are broadly consistent with past studies on cod based on microsatellite data and/or short mtDNA sequences, except for paraphyly in the latter (see Carr & Marshall, 2008a for a discussion on paraphyly in cod sequences). Based on one set of microsatellite markers, Ruzzante et al. (1996) claimed significant differentiation among some offshore and inshore-spawning aggregations in NAFO 2J3KL, but Carr and Crutcher (1998) noted this was found only in some partitions and was dependent on a single marker. With the same set of markers, Bentzen et al. (1996) found heterogeneity between Labrador (North) and Grand Banks (South) samples, and also suggested that the Flemish Cap and Scotian Shelf populations were significantly differentiated. Hardie et al. (2006), with a different set of microsatellite markers, found differences among the three landlocked Baffin (307-401 bp) alone consistently suggest an absence of genetic differentiation within regions and trans-Atlantic differences in haplotype frequencies (Árnason, 2004;Árnason & Pálsson, 1996;Carr & Crutcher, 1998;Carr & Marshall, 1991a,b;Carr et al., 1995;Hardie et al., 2006).
Whereas the majority of cod populations are migratory between inshore feeding grounds and offshore spawning locations (Lear, 1984;Templeman, 1979), a historic hypothesis is the existence of "bay cod" that remain more or less permanently inshore in deepwater bays that surround the island of Newfoundland (Carr et al., 1995). However, mitogenomes from inshore-spawning fish near Random Island in Trinity Bay and at Gilbert Bay in southern Labrador (RAN and GIL) are not differentiated from the nearest offshore populations (NGB and HAW) and are no less diverse as might be expected in an isolated population (cf. QAS).

BAPS analysis separated out the QAS and BLT populations from
all others, which reflects their co-occurrence in the B and E haplogroups. Although both Baffin Island and the Baltic Sea basin were glaciated until ca. 5,000-8,000 years ago, their genetic patterns contrast sharply. This is likely due to differences in how they were recolonized and their subsequent effective population numbers. Lake Qasigialiminiq is physically small and has maintained a very small census population size; it shows an odd polyphyletic composition and reduced haplotype diversity. In contrast, the Baltic Sea has apparently maintained a very large population size that includes diverse basal lineages, and has maintained high diversity and no shared haplotypes.

| Landlocked Arctic fjord population
The shared haplotypes seen in the landlocked fjord population at Lake Qasigialiminiq, at the southwest end of Cumberland Sound, Baffin Island, seem to reflect recent isolation and maintenance at small population size (500-1,000 individuals). Shared mitogenomic haplotypes are not seen in marine populations of Atlantic cod, and the presence of multiple distinct, private haplotypes within a single population is unprecedented in marine fish mitogenomic studies. The major haplogroups in Lake Qasigialiminiq are polyphyletic, and, by implication, of distinct temporal origins, although a single colonization event by a mixed population is also possible. The QAS fish assigned to haplogroup F is most closely related to an individual in NGB and is likely a more recent introduction from a lineage that is otherwise restricted to the Northwest Atlantic.
With respect to the Monte Carlo test, the QAS sample might be expected to introduce a somewhat artifactual distinction between the observed pattern of clustered identical or near-identical haplotypes, versus a randomized distribution in which extradispersal events are necessarily required to account for random scatter. Put another way, as the two clusters in the B and E haplogroups and the singleton F mitogenome found in QAS require exactly three events in the observed tree, their accommodation in any random tree requires more.
However, probabilities of the observed trees in Model I variants are similar with or without QAS (Ia vs. Ib: p = .0065 vs. .0105). In the various partitions of the Northwest Atlantic in Model III, the two variants that include QAS separately are highly structured (IIIb and e: p < .001), whereas variants that group it with NAFO 2H or exclude it altogether are not (IIIa, c, and d; p > .1; Table 2).
The identical QAS genomes occur in haplogroups that are either basal (B) or shared predominantly with BLT cod (E). This suggests a closer phylogeographic affinity with the east than with the west; this is supported by recent findings that the Greenland cod populations were colonized from Iceland (Therkildsen et al., 2013), and supports a westerly movement. Additional samples from the Northeast and mid-Atlantic would help to clarify this. The calibrated Bayesian tree puts the origin of diversity in either haplogroup at <20 kya ( Figure 3). This probably overcorrects pairwise differences of zero or one SNP, recalling that the mean interval between fixations F I G U R E 4 Statistical parsimony network of complete mitogenomes for 153 Atlantic cod. Each symbol represents an individual, the black dots are inferred haplotypes, and each connection represents one nucleotide change. Shared haplotypes are encased by a box, and the dashed boxes correspond to the haplogroups identified in Figure 2. Refer to Table 1  Atlantic cod ca. 5,000-8,000 years ago (Hardie et al., 2006). The individual from haplogroup F indicates a third invasion, consistent with the latter hypothesis. That the lake at present includes descendants of at least three distinct lineages suggests that either a single source population was highly variable and the lake has maintained variation despite opportunities for a founder effect and subsequent drift, or that there were at least three invasion events. Atlantic cod have been absent from the Davis Strait farther north than northern Labrador since the last glacial maximum, making a recent trans-Atlantic event unlikely. However, if water temperatures in the Arctic Ocean and the Davis Strait were several degrees higher from 3,000 to 6,000 years ago (Aitken & Gilbert, 1989), such movements may have been possible.
Based on partial cytochrome b fragments, Hardie et al. (2006) assigned all cod sampled in the three Baffin Island lakes to haplotypes "A," "E," or "G" (sensu Carr et al., 1995), where "A" is the most common, widespread haplotype in Atlantic cod. Of 18 QAS samples, our data for the homologous region would assign 11 and seven mitogenomes as identical within haplotypes "E" and "A," respectively.
Mitogenomic analysis assigns the former to a tight cluster within haplogroup E, whereas six of the "A" samples are instead assigned to the more basal haplogroup B and one to the younger haplogroup F (Figures 3 and 4; Figure S1). Carr and Marshall (2008a)

| Glacial refugia
The Pleistocene consisted of a series of ice ages interspersed with shorter warm periods (Hewitt, 2000(Hewitt, , 2004. During the most recent Wisconsinan (Nearctic) or Würm (Palaearctic) glaciations, ice sheets covered much of the Northern Hemisphere. Sea levels were as much as 120 m lower than today, exposing continental shelves that became covered with ice (Mitrovica, 2003;Pielou, 1991). Haplogroup expansion in Atlantic cod ( Figure 3) coincides with the interglacial and warming periods experienced during the Wisconsinan (ca. 110-12 kya) and Illinoian (ca. 200-130 kya) glaciations (Gibbard & van Kolfschoten, 2004 Sangamonian interglacial. Many of the other separations are estimated to have occurred at some point in the Illinoian glaciation; there were at least two warming periods ca. 250 and 300 kya (Gibbard & van Kolfschoten, 2004;Wright, 2000), and much of the early divergence dates to these warmer periods, or even prior to the Illinoian.
The patterns of variation seen in Atlantic cod suggest distinctive Northeast and Northwest Atlantic cod populations. Pairwise comparisons show that the Norway and Baltic Sea populations are significantly differentiated from the Northwest Atlantic populations in 75% of comparisons (Table 3). Both the AMOVA and the PCoA ( Figure 5) were also heavily influenced by these two populations. As seen in previous studies, despite a lack of fixed differences there appears to be significant trans-Atlantic differentiation. Bradbury et al. (2013) identified consistent differences between Northwest Atlantic and European samples with neutral and non-neutral markers. The magnitude of the separation suggests that it predates the most recent glacial cycle, and may be a relic of the Illinoian glaciation or earlier. The lack of fixed allelic differences and geographical structuring among marine cod populations suggests that recent admixture and gene flow may be confounding older interglacial patterns.
During the last glacial maximum, suitable habitat occurred in the Northeast Atlantic, in the western Atlantic south of the ice sheets, and may have occurred along the continental shelves of Atlantic Canada including the Flemish Cap, the Grand Banks, and in the Davis Strait between Baffin Island and western Greenland (Bigg et al., 2008;Maggs et al., 2008). Nucleotide diversity (Table 1) Table 3a). These data suggest that the Flemish Cap likely either did not act as a refugium for Atlantic cod, or that any refugial distinctiveness has since been eliminated by gene flow. It is worth noting that FLM also does not differ significantly from BLT (Φ ST = .075, p > .05), and that although Flemish Cap fish occur predominantly in haplogroup F, they are also present in every other haplogroup except the two smallest (D and H). Likewise, the NGB data do not support the Grand Banks as a possible refugium.
However, the landlocked population had mitogenomes shared among multiple individuals and unique to that population. In contrast, Atlantic wolffish (Anarhichas lupus; Lait, 2016;Lait & Carr, under review), green turtle (Chelonia mydas; Shamblin et al., 2012), and speartooth shark (Glyphis glyphis; Feutry et al., 2014) all include mitogenomes shared among individuals that are widely dispersed. The latter suggests recent origin of these lineages followed by dispersal, whereas the limitation of identical cod to a confined space is suggestive of founder effect and/or local drift on a scale of a few thousand years. Study of the mtDNA genome composition of the other fjord lakes would be of interest.

| CONCLUSIONS
The current picture of genetic diversity in Atlantic Cod is based on sequence analysis of mitochondrial DNA (Carr & Marshall, 2008a) and nuclear SNP markers (Bradbury et al., 2010(Bradbury et al., , 2013Nielsen et al., 2009). Due to its wide-scale distribution and pelagic ecology, Atlantic cod have been predicted to have limited population structure over small geographical scales, and only weak structure in trans-Atlantic comparisons. Although this was found in early studies of short mtDNA sequences, a greater degree of population structure has been suggested based on some nuclear microsatellites at local scales, and with most markers at larger scales.
The mitogenomic data indicate little or no stock differentiation within the commercial range of the Northwest Atlantic. The clockcalibrated Bayesian network shows multiple haplogroup expansions ca. 120-100 kya, around the Sangamonian interglacial/Wisconsinan glacial boundary, that comprise the most common haplotypes found in North America. This suggests contemporaneous recovery from one or possibly more western Atlantic refugia. However, diversity measures and the predominance of basal haplogroups support the eastern Atlantic and adjacent waters as an origin and ultimate source population. Haplogroup H is anomalous in this respect, but may be an artifact of geographic undersampling, as is apparent in retrospect for haplogroup D in Carr and Marshall (2008a). As the pattern of variation does not indicate a common geographic location of origin, it is likely that historically separated populations have undergone secondary admixture. Identity of mitogenomes from Lake Qasigialiminiq in the west, and perhaps Lake Mogilnoe in the east, shows the expected patterns in isolated populations founded since the LGM. Additional samples from across the range, particularly from mid-and eastern Atlantic and adjacent waters, may help to resolve uncertainties regarding refugial identification and colonization routes. The high level of variation seen in the cod mitogenomic sequences, including the trans-Atlantic differences and isolation of Arctic fjord populations on Baffin Island, shows that mitogenomics has sufficient power to resolve these questions.

ACKNOWLEDGMENTS
We would like to thank Fisheries and Oceans Canada for providing tissue samples, and both the Nunatsiavut Government and Pangnirtung Hunters and Trappers Organisation for helping to organize and carry out field work. A special thank you to Ian Winters, Errol Andersen, George Gear, Noah, and Jackie for their invaluable help arranging field seasons in the north, and William Hunter, Errol Andersen, Leon Jacque, and Norman Mike for catching the cod. We thank everyone involved in the extraction, amplification, and preliminary analyses of microarray samples and data: Dr. KA Johnstone, Dr. AT Duggan, Dr. SMC Flynn, and L MacDonald.
Primer sequences: uploaded as online supporting information (Table   S1).

CONFLICTS OF INTEREST
None declared.

AUTHOR CONTRIBUTIONS
SMC and HDM conceived the ideas; HDM and LAL collected the samples; LAL and HDM collected the data; LAL and SMC analyzed the data; SMC and LAL funded the project, and LAL led the writing with contributions from all authors.