Genotyping‐by‐sequencing illuminates high levels of divergence among sympatric forms of coregonines in the Laurentian Great Lakes

Abstract Effective resource management depends on our ability to partition diversity into biologically meaningful units. Recent evolutionary divergence, however, can often lead to ambiguity in morphological and genetic differentiation, complicating the delineation of valid conservation units. Such is the case with the “coregonine problem,” where recent postglacial radiations of coregonines into lacustrine habitats resulted in the evolution of numerous species flocks, often with ambiguous taxonomy. The application of genomics methods is beginning to shed light on this problem and the evolutionary mechanisms underlying divergence in these ecologically and economically important fishes. Here, we used restriction site‐associated DNA (RAD) sequencing to examine genetic diversity and differentiation among sympatric forms in the Coregonus artedi complex in the Apostle Islands of Lake Superior, the largest lake in the Laurentian Great Lakes. Using 29,068 SNPs, we were able to clearly distinguish among the three most common forms for the first time, as well as identify putative hybrids and potentially misidentified specimens. Population assignment rates for these forms using our RAD data were 93%–100% with the only mis‐assignments arising from putative hybrids, an improvement from 62% to 77% using microsatellites. Estimates of pairwise differentiation (F ST: 0.045–0.056) were large given the detection of hybrids, suggesting that reduced fitness of hybrid individuals may be a potential mechanism for the maintenance of differentiation. We also used a newly built C. artedi linkage map to look for islands of genetic divergence among forms and found widespread differentiation across the genome, a pattern indicative of long‐term drift, suggesting that these forms have been reproductively isolated for a substantial amount of time. The results of this study provide valuable information that can be applied to develop well‐informed management strategies and stress the importance of re‐evaluating conservation units with genomic tools to ensure they accurately reflect species diversity.

sequencing to examine genetic diversity and differentiation among sympatric forms in the Coregonus artedi complex in the Apostle Islands of Lake Superior, the largest lake in the Laurentian Great Lakes. Using 29,068 SNPs, we were able to clearly distinguish among the three most common forms for the first time, as well as identify putative hybrids and potentially misidentified specimens. Population assignment rates for these forms using our RAD data were 93%-100% with the only mis-assignments arising from putative hybrids, an improvement from 62% to 77% using microsatellites. Estimates of pairwise differentiation (F ST : 0.045-0.056) were large given the detection of hybrids, suggesting that reduced fitness of hybrid individuals may be a potential mechanism for the maintenance of differentiation. We also used a newly built C. artedi linkage map to look for islands of genetic divergence among forms and found widespread differentiation across the genome, a pattern indicative of long-term drift, suggesting that these forms have been reproductively isolated for a substantial amount of time. The results of this study provide valuable information that can be applied to develop well-informed management strategies and stress the importance of re-evaluating conservation units with genomic tools to ensure they accurately reflect species diversity.

| INTRODUC TI ON
Defining conservation units is one of the most fundamental yet challenging aspects of resource management (Coates, Byrne, & Moritz, 2018). Partitioning species into units with substantial reproductive isolation provides managers with the ability to monitor and regulate independently evolving groups that may respond differently to harvest, disease, habitat alteration, or climate change (Allendorf & Luikart, 2007;Ryder, 1986). Over the past several decades, advancements in genetic analysis have provided scientists with powerful tools to estimate the amount of gene flow between species or populations to inform the creation of conservation units (Olsen et al., 2014;Palsbøll, Bérubé, & Allendorf, 2007;Palsbøll, Peery, & Bérubé, 2010;Schwartz, Luikart, & Waples, 2006). With the arrival of the genomics era, the power and accuracy to discern levels of reproductive isolation, inbreeding, and effective population size has vastly improved (Allendorf, Hohenlohe, & Luikart, 2010), and tools such as genome scans have revolutionized our ability to identify and understand adaptive genetic variation (Funk, McKay, Hohenlohe, & Allendorf, 2012;Waples & Lindley, 2018). Despite these advancements, delineating discrete conservation units can still be problematic. For example, taxonomic uncertainty can lead to confusion regarding species boundaries (Bickford et al., 2007;Hey, Waples, Arnold, Butlin, & Harrison, 2003), and observed phenotypic, spatial, temporal, or behavioral differences can be opposed by apparent genetic panmixia (Als et al., 2011;Hoey & Pinsky, 2018;Palm, Dannewitz, Prestegaard, & Wickström, 2009).
Perhaps no other group embodies the challenges of defining conservation units better than the coregonines. A subfamily of the Salmonidae, coregonines are comprised of three genera of freshwater and anadromous fishes distributed throughout cold water habitat in North America, Europe, and Asia. The most speciose genus, Coregonus, includes the ciscoes and whitefishes, which exhibit an extreme array of phenotype variability that is attributed to recent adaptive radiation into lacustrine habitat following glacial retreat during the Pleistocene epoch (Behnke, 1972;Schluter, 1996;Svärdson, 1979). Often, distinct phenotypes can be found in sympatry and allopatry, which leads to difficulty in distinguishing a single, monophyletic origin of forms from parallel ecological speciation in individual lakes. Several coregonines exhibit sympatric dwarf and normal forms, including members of the European whitefish C. lavaretus species complex, North American whitefish C. clupeaformis, cisco C. artedi, and least cisco C. sardinella (Huitfeldt-Kaas, 1918;Mann & McCart, 1981;Shields, Guise, & Underhill, 1990;Vuorinen, Bodaly, Reist, Bernatchez, & Dodson, 1993). Empirical evidence of hybridization and introgression (Garside & Christie, 1962;Kahilainen et al., 2011;Lu, Basley, & Bernatchez, 2001) raises the question of how to manage forms when reproductive isolation is incomplete and has led some to suggest that the broad phenotypic variation observed in coregonines is a result of reticulate evolution (Svärdson, 1998;Turgeon & Bernatchez, 2003). This breadth of taxonomic ambiguity in the coregonines was first termed "the coregonid problem" by Svärdson (1949) and persists today as the more accurate "coregonine problem" (Eshenroder et al., 2016;Mee, Bernatchez, Reist, Rogers, & Taylor, 2015).
The most extensive regional adaptive radiation within the Coregonus genus in North America occurred in the Laurentian Great Lakes, but the detection of genetic differentiation or reproductive isolation among Great Lakes forms has been mostly unsuccessful. Rapid diversification of one or more colonizing lineages (Eshenroder et al., 2016;Smith & Todd, 1984;Turgeon & Bernatchez, 2003) into newly available deepwater habitat following the Wisconsin Glacial Episode resulted in the evolution of at least eight distinct forms within the C. artedi species complex (Koelz, 1929;Scott & Crossman, 1998). Morphological differences occur across a variety of traits including body and head shape, lower jaw position, eye size, fin length, and gill raker counts (Koelz, 1929), though subtle variations among forms and between lakes K E Y W O R D S adaptive divergence, conservation units, coregonines, genomic islands of divergence, hybridization, population genomics, RAD sequencing, species complex can often make visually distinguishing them difficult without all possible forms present (Eshenroder et al., 2016;Turgeon et al., 2016). Different forms typically occur in specific depth ranges leading to the hypothesis that depth niche specialization is a primary driver of divergence among forms (Smith & Todd, 1984).
Stable isotope analysis further supports niche differentiation, indicating that many of the Great Lakes forms occupy different trophic levels with observed changes in proportion of pelagic and benthic food sources (Schmidt, Harvey, & Vander Zanden, 2011;Schmidt, Vander Zanden, & Kitchell, 2009;Sierszen et al., 2014).
All forms likely undergo seasonal spawning migrations, with two of the three most common forms-C. artedi and C. hoyi-forming nearshore aggregations in November-December and January-March, respectively, and the third-C. kiyi-suspected to spawn during November-December (Dryer & Beil, 1968;Koelz, 1929;Yule, Addison, Evrard, Cullis, & Cholwek, 2009;Yule, Stockwell, Evrard, Cholwek, & Cullis, 2006). However, very little is known about behavioral and habitat differences among forms during overlapping periods of spawning, maintaining the possibility that hybridization during spawning events could be preventing genetic divergence.
Over the past century, anthropogenic impacts have greatly reduced the original diversity of the C. artedi species complex in the Great Lakes, underscoring the need for establishing well-informed conservation units in ciscoes. The introduction of invasive forage fish, overfishing, and habitat loss led to large decreases in cisco abundance and lake-wide extirpation to complete extinction of historically documented deepwater forms (Commission & Christie, 1973;Smith, 1968Smith, , 1970Wells & McClain, 1973). Of the eight accepted forms originally described, only C. artedi and three deepwater forms-C. hoyi, C. kiyi, and C. zenithicus-are extant in the Great Lakes (referred to henceforth by specific epithet; Bailey & Smith, 1981;Todd & Smith, 1992). The range of a fifth deepwater form, C. nigripinnis, has been reduced to nearby Lake Nipigon (Ontario, Canada), though an extant nigripinnis-like form is still periodically caught in Lake Superior (Eshenroder et al., 2016). Lake Superior's peripheral location relative to both large human populations (and associated fishing pressure) and the canal construction that opened the Great Lakes to invasion from non-native species appears to have provided some protection from the impacts that extirpated cisco forms from the other four lakes (Koelz, 1926). Of the four lakes in which members of the C. artedi complex remain, Lake Superior is the only lake where all extant Great Lakes forms can still be regularly found (Eshenroder et al., 2016).
Recent evidence in the Great Lakes for declining abundance of invasive fish such as alewife and increasing abundance in cisco (Bronte et al., 2003;Mohr & Ebener, 2005;Schaeffer & Warner, 2007) has led to growing interest in re-establishing lost populations. An understanding of the roles of heritable genetic differences and reproductive isolation in the establishment and persistence of remnant forms is vital for developing both informed conservation units and restoration strategies. The main goal of our study was to employ genomic methods to improve our understanding of genetic variation among these forms.
Specifically, we (1) examined genetic differentiation and diversity among putative forms of cisco, (2) compared the performance of SNPs and microsatellites in this system, and (3) leveraged a newly built cisco linkage map (Blumstein et al., 2019) to investigate adaptive divergence Wisconsin among forms. In order to remove the potentially confounding factor of distinguishing spatial genetic structure from form-based genetic structure, we focused on a single region in Lake Superior where multiple forms of cisco are found in sympatry, the Apostle Islands.

| MATERIAL S AND ME THODS
Tissue samples preserved in >95% ethanol were collected for the three most common cisco forms-artedi, hoyi, and kiyi in the Apostle Islands ( Figure 1, Table 1 All samples were identified to putative form using a suite of standardized morphological characteristics (Eshenroder et al., 2016).

DNA was isolated from fin tissue samples with Qiagen DNeasy®
Blood & Tissue Kits.

| RAD library prep and sequencing
Restriction site-associated DNA (RAD) libraries were prepared following the BestRAD protocol (Ali et al., 2016). Extracted DNA was quantified using a Quant-it™ PicoGreen ® dsDNA Assay (Invitrogen, Waltham, MA) and normalized to a concentration of approximately 50 ng/µl for a 2-µl digestion reaction with the restriction enzyme SbfI followed by ligation with barcoded adaptors.

| Read processing and SNP filtering
Raw sequences generated from RAD sequencing were processed in the software pipeline Stacks v2.3d (Rochette, Rivera-Colón, & Catchen, 2019). Sequences were demultiplexed by barcode, filtered for presence of the enzyme cut-site and quality, and trimmed in the subprogram process_radtags (parameter flags: -e SbfI -c -q -r -t 140 --filter_illumina --bestrad). Filtered reads for each individual were aligned to create matching stacks with ustacks following TA B L E 1 Sample statistics, diversity, and effective population size estimates Note: N is the number of individuals successfully genotyped, A is the percentage of total sampled alleles found in each group, H o /H e is observed/ expected heterozygosity, G IS is the inbreeding coefficient, Assignment is the percentage of individuals that were correctly assigned to their population of origin in a leave-one-out test, and N e is effective population size calculated using the LDNE method and reported with 95% confidence intervals.
a Three genotyped hoyi samples are suspected to be misidentified kiyi from the RAD-based PCA and ADMIXTURE analysis and were removed to prevent bias in estimates of diversity and N e . b The one artedi that assigned to HOY and the two hoyi that assigned to KIY appear to be hybrids. Outside of these putative hybrids, assignment to the ART and HOY groups was 100%.
guidelines suggested from empirical testing to avoid under-or over-merging loci from RAD datasets (
Fragment analysis was performed using a Genetic Analyzer 3.0 (Life Technologies), and genotypes were assigned at each locus using GeneMapper 3.7 (Life Technologies). We used Genepop v4 (Rousset, 2008) to conduct exact tests for deviations from Hardy-Weinberg and linkage equilibrium (α = 0.01). Three loci were removed for being out of Hardy-Weinberg Equilibrium in all three forms (Bwf1, Cocl23, CoclLav6), and no loci showed significant linkage disequilibrium.

| Genetic differentiation and diversity
To examine the influence of selection on estimates of differentiation and diversity with our RAD dataset, we employed two statistical tools for the identification of putative outlier loci-BayeScan v2.1 (Foll & Gaggiotti, 2008)  indicated that the optimal K was 2 for computing correlations between our RAD loci and K principal components. We used Benjamini and Hochberg's (1995) method for correction of the false discovery rate in both BayeScan and pcadapt at α = 0.05. Exploratory analysis comparing the full dataset and a dataset of neutral loci resulted in no substantial differences in estimates of genetic differentiation (see Results), so we proceeded using the full dataset of RAD loci.
To ensure that our putative form designations were appropriate, we used two approaches to assess genetic similarity among individuals. First, we conducted a PCA in the R package "adegenet" (Jombart, 2008) for both the RAD and microsatellite datasets. Next, we estimated the number of ancestral populations, K, contributing to contemporary genetic clustering for the RAD dataset using the program ADMIXTURE v1.3 (Alexander, Novembre, & Lange, 2009). We tested K from 1 to 5 with ADMIXTURE's cross-validation procedure and a k-fold of 10 (parameter flag: --cv = 10) to examine support for each K. ADMIXTURE analysis was only conducted on the RAD dataset because this program is not compatible with microsatellite data.
PCA and ADMIXTURE analysis with RAD genotypes revealed that three individuals originally identified as hoyi fell within the cluster of kiyi samples. Given the discrete clustering of the remaining of hoyi and kiyi, there is strong possibility that these three hoyi samples were misidentified in the field, so we removed these individuals from both RAD and microsatellite datasets for all further analyses to prevent bias in estimates of diversity, differentiation, and N e .
We calculated a variety of summary statistics for the groups comprised of the five forms (artedi-ART, hoyi-HOY, kiyi-KIY, nigripinnis-NIG, and zenithicus-ZEN) including percentage of total observed alleles, observed and expected heterozygosity, and an inbreeding coefficient (G IS ). Summary statistics for each locus and populations were calculated using both microsatellite and RAD datasets in GenoDive v2.0b23 (Meirmans & Van Tienderen, 2004). Genetic differentiation among all forms was estimated across all loci in the RAD dataset with pairwise F ST (Weir & Cockerham, 1984) in Genepop and tested using exact tests (Goudet, Raymond, de Meeüs, & Rousset, 1996;Raymond and Rousset, 1995; alpha = 0.01) in Arlequin (Excoffier & Lischer, 2010).
We also calculated locus-specific overall and pairwise-F ST values (Weir & Cockerham, 1984) in Genepop using a dataset that included the three forms with n > 10 (ART, HOY, and KIY). To compare genetic differentiation between RAD and microsatellite datasets, we used GenoDive to estimate standardized pairwise genetic differentiation, G' ST (Hedrick, 2005), which employs an additional correction for bias from sampling a limited number of populations (Meirmans & Hedrick, 2011).
The rate at which individuals were assigned back to their form with RAD and microsatellite datasets was tested using population assignment in GenoDive for all forms with n > 10. Assignment was performed by calculating the home likelihood (L h ) that an individual genotype is from a specific group given the allele frequencies (Paetkau, Calvert, Stirling, & Strobeck, 1995) using the leave-one-out method to avoid the bias from a target individual's contribution to the allele frequencies of a source population. Zero frequencies were replaced with 0.005, and a significance threshold of alpha = 0.002 was applied separately to each group over 1,000 replicated datasets Effective population size (N e ) was estimated for all forms with n > 10 using RAD and microsatellite datasets with the bias-corrected linkage disequilibrium method (LDNE; Hill, 1981;Waples, 2006;Waples & Do, 2010) in the software package NeEstimator v2.1 (Do et al., 2014). We used a p-crit of 0.05 for the RAD dataset (Waples, Larson, & Waples, 2016) and 0.02 for the microsatellite dataset (Waples & Do, 2010). For the RAD dataset, only comparisons between sampled loci that were found on different linkage groups (LGs) of the cisco linkage map (Blumstein et al., 2019) were included to correct for physical linkage (Waples et al., 2016). Since no putative ART-KIY hybrids were observed in our RAD dataset, we chose to implement a one-dimensional stepping-stone model of migration (Kimura & Weiss, 1964) among simulated populations.
Each unique parameter set for these simulations was run using the maximum possible number of loci in EASYPOP (8,000), and a genepop file was output containing a subset of 100 randomly selected individuals per population (50 females/50 males) and compared to a reduced dataset of 8,000 neutral loci from our empirical data.
The reduced dataset was generated by removing putative outliers from the loci that were placed on a linkage map (see below) and randomly subsampling the remaining markers. ADMIXTURE was used to generate Q-scores for each individual in both simulated and reduced empirical datasets. Putative hybrids were identified using a Q-score of less than 70% following similar thresholds applied in the literature (Kapfer, Sloss, Schuurman, Paloski, & Lorch, 2013;Marie, Bernatchez, & Garant, 2011;Weigel et al.., 2018), and hybrid combinations were assigned to the two populations representing the largest Q-scores within the hybrid individual.

| Differentiation across the genome
We examined genetic differentiation across the genome by pairing our data with the artedi linkage map constructed by Blumstein et al. (2019).
Catalog IDs were identical between the current study and linkage map; therefore, no alignment step was needed to compare loci. To identify putative genomic islands of divergence that were highly differenti- We found over 50 significant windows for each comparison (see results). Conducting an in-depth investigation of all significant windows was not feasible; therefore, we isolated in-depth analysis to one LG (Cart21) that contained the most significant windows and outlier loci in the dataset.
To investigate this highly differentiated LG, we aligned consensus sequences for all loci from Cart21 to chromosome Ssa05 in Atlantic salmon (Salmo salar), which is syntenic to Cart21 in artedi (Blumstein et al., 2019). Ssa05 sequences were obtained from genome version ICSASG_v2 (Lien et al., 2016), and alignments were conducted with BLASTN. The best alignment for each locus was retained, and all alignments had e-values < 1e−58. We then visualized the relationship between recombination and physical distance using alignments to Ssa05 and information from the artedi linkage map to determine whether this highly differentiated region is characterized by lower recombination. We also obtained annotation information from the Atlantic salmon genome to determine whether genes of interest were co-located with areas of high divergence. Finally, we plotted the allele frequencies of the 10 SNPs on Cart21 with the highest overall F ST to investigate whether these SNPs show consistent patterns of population structure.

| Sequencing and genotyping
A total of 137 individuals were RAD sequenced producing more than 455 million reads and an average of 3,346,457 reads per sample. After filtering, 119 individuals with representatives from all five putative cisco forms in Lake Superior were genotyped at 29,068 loci (Table 1). More than half of these loci (n = 15,348 loci) were also placed on the linkage map. Since RAD sequencing and microsatellite amplification were performed on the same hoyi and kiyi samples, microsatellite genotypes used in our analyses were restricted to the same individuals that successfully genotyped with our RAD loci. An additional 30 artedi from Stockton Island were genotyped at the 9 microsatellite loci.

| Genetic differentiation and diversity
BayeScan identified 244 outliers, and pcadapt identified 318 outliers, 150 of which were identified by both programs (see Figure S1).
All 412 outlier loci were putatively under divergent selection and were removed from our full RAD dataset to generate a neutral dataset of 28,656 loci. We examined both the full and neutral datasets using PCA and ADMIXTURE and found no significant differences between the patterns of differentiation between datasets. Likewise, estimates of pairwise genetic differentiation (F ST ) decreased by an average of only 0.0039 with the neutral dataset. Results from exploratory analyses with the neutral dataset are included ( Figure S2 and Table S1), and all results presented below are for the full RAD dataset unless otherwise stated.
PCA showed a sharp contrast in resolution between marker sets with microsatellites producing one large cluster of overlapped forms across the first two principal components and the RAD dataset producing three major clusters primarily composed of ART, HOY, and KIY Microsatellites grouped in the center of the PCA, with five of the six samples falling out between the ART, HOY, and KIY clusters (Figure 2). The sixth sample fell within the ART cluster, and like the three hoyi in the KIY cluster, possibly represents a misidentified specimen. Unlike the NIG samples, which suggest the possibility for a distinct cluster with the addition of more specimens, the four ZEN samples closely associated with either the HOY cluster (n = 1) or the KIY cluster (n = 3). Low representation of both NIG and ZEN in our RAD dataset reduces our ability to draw strong conclusions based on these PCA results, so both groups were unaltered for estimates of diversity, inbreeding, and differentiation and dropped for the remaining analyses.
The most supported number of ancestral populations (K) estimated using the cross-validation procedure in ADMIXTURE was two (cross-validation error: 0.498; Figure 3). Examining additional Ks for significant substructuring among forms generated results that corroborated those from the PCA. When K = 2, the ART cluster splits from HOY, KIY, and ZEN. Individuals in the NIG cluster exhibited mixed ancestry between the two major groups as seen on PC1 of the PCA (Figure 3). When K = 3 (cross-validation error: 0.509), the major genetic ancestries differentiate the ART, HOY, and KIY clusters as seen on PC2. The three putatively misidentified hoyi first noted in the PCA all had Q estimates of 100% for the KIY cluster and were removed from further analyses. Additional Ks did not differentiate either the NIG or ZEN cluster but begin to differentiate small subsets of individuals within groups, which was likely statistical noise (see Figure S3).  (Table 1). Inbreeding coefficients were not substantially different from zero in both datasets. The largest G IS was measured in ZEN from only four samples (0.279), and the rest were between −0.028 and 0.196. All estimates of pairwise genetic differentiation among forms with n > 10 (ART, HOY, KIY) with the RAD dataset were significant ( Table 2). The magnitude of genetic differentiation followed similar trends observed in the PCA, with ART being slightly more differentiated from deepwater forms (F ST = 0.049-0.056). This pattern remained the same with a standardized measure of genetic differentiation in the RAD and microsatellite datasets (Table 3). All pairwise comparisons in both datasets were significant with higher overall values of G' ST generated using microsatellites (0.110-0.122) than SNPs (0.060-0.75). See Tables S1 and S2 for locus-specific summary statistics.
Population self-assignment rate using the microsatellite dataset ranged from 61.9% to 76.7% with artedi exhibiting the highest likelihood of being assigned back to the ART group (Table 1). Assignment rate with the RAD dataset was 100% for the KIY group and 98.3% and 92.5% for the ART and HOY groups, respectively, a result of one putative artedi assigning to HOY and two putative hoyi individuals assigning to KIY. In the ADMIXTURE analysis, these three individuals exhibited relatively high Q estimates for ancestry to the populations to which they were assigned (artedi, Q HOY = 67.4%; hoyi, Q KIY = 64.2% and 56.2%). In the PCA generated from the same data, these individuals were oriented between the main clusters. In the microsatellite dataset with the same hoyi and kiyi samples, neither of the two potentially misclassified hoyi assigned to the HOY group, with one being assigned to ART and one to KIY. Assignment scores are reported in Tables S3-S5.
Estimates of N e with the microsatellite dataset ranged from 73 in HOY to 659 in KIY (Table 1). Only the estimate for ART produced both upper and lower bound confidence intervals (N e : 91, CI: 42-6, 126), whereas the estimates for HOY and KIY produced confidence intervals with an "infinite" upper bound. An "infinite" upper bound is typically an indication that the data are not powerful enough to produce an accurate estimate of N e given the sample size, population size, and/or marker resolution (Do et al., 2014;Waples & Do, 2010).
For the RAD dataset, estimates of N e were generated with loci that were placed on the linkage map and ranged from 1,701 to 2,126 with confidence intervals within 10% of these values.
The fact that we observed hybridization coupled with relatively high genetic differentiation was puzzling and prompted us to conduct two types of simulations to investigate how hybridization (i.e., gene flow can influence genetic differentiation). Simulated hybridization over a 15-generation period resulted in declines in genetic differentiation among populations for all tested levels of migration ( Figure S4).
When the migration rate was set to 5 or 10%, levels of differentiation

F I G U R E 3 Genetic lineages in
Apostle Islands ciscoes estimated with ADMIXTURE. Each vertical bar represents a single individual and is colored by the proportion of ancestry (Q) assigned to each genetic lineage (K). In the K = 3 plot (bottom), filled circles represent putatively misidentified individuals within forms (with Q-scores of 100% to other forms) and asterisks represent putative hybrids based on a Q-score threshold of less than 70% A migration rate of 1% resulted in steadily declining genetic differentiation but only began to approach a halved F ST after 15 generations.
Simulations of stepping-stone migration between three populations and subsequent ADMIXTURE analysis resulted in an overall pattern similar to that observed in our empirical data (Figure 4, baseline ADMIXTURE plot at generation 0 presented as supplemental Figure   S5). One of the major goals of these simulations was to determine whether Q-scores between 0.1 and 0.2 for alternative forms observed in our empirical dataset could be the result of statistical noise. Results from the simulations with 2 generations of migration (G2), where only F1 migrants are possible, demonstrated that these Q-scores in the range of 0.1-0.2 can be generated by statistical noise as they were present in this simulation (Figure 4). However, since no forms contained private alleles, we do not have the ability with our empirical data to confidently distinguish statistical noise from true hybrid backcrosses. We documented a relatively high level of hybridization in our empirical dataset, with ART-HOY hybrids comprising 9.9% of the total number of sampled artedi and hoyi, HOY-KIY hybrids comprising 4.3% of sampled hoyi and kiyi, and ART-KIY hybrids comprising 0% (Table 4). Simulation results indicated that the proportion of observed ART-HOY hybrids is more similar to that found after 5 generations (G5) of hybridization (G5: 10.6%), whereas the proportion of observed HOY-KIY hybrids is more similar that found after 2 generations of hybridization (G2: 3.5%).
Similar to our observed data, ART-KIY hybrids were absent or in low frequency (G5: 1.2%) in all simulations. Coupled with the high levels of genetic differentiation we observed, these results suggest two possible scenarios-that ART-HOY and HOY-KIY hybridization is a relatively recent development or that some pre-or postzygotic mechanism is reducing hybrid fitness.  (Hedrick, 2005), for SNP and microsatellite datasets

F I G U R E 4
Levels of genetic admixture in three simulated populations after 2, 5, and 10 generations (gen) of steppingstone migration. Simulations were run with random mating of 1,000 females and 1,000 males in each population using 8,000 biallelic loci, and preliminary conditions that produced a similar level of differentiation observed in our RAD dataset among forms (F ST ≈ 0.05; 1,000 generations with an island migration rate of 0.001). Each ADMIXTURE plot represents a random subset of 50 males and 50 females from each population. Empirical data were reduced to a dataset of 8,000 randomly selected, neutral loci that could be placed on the linkage map and run in ADMIXTURE for comparison to simulated data Note: Data simulated in EASYPOP were comprised of 8,000 biallelic loci from three populations (P1-P3) that experienced stepping-stone migration (m = 0.05) for 2, 5, or 10 generations (G2, G5, G10) after 1,000 generations of low migration (m = 0.001) to approximate an F ST similar to that observed between our three major cisco forms (ART, HOY & KIY, FST ≈0.05). Putative hybrids were identified using a Q < 70% and were assigned to the combination of source populations that had the two highest Q-scores. Empirical data were reduced to a dataset of 8,000 randomly selected, neutral loci and similarly assessed for comparison to simulated data.

F I G U R E 5
Genetic differentiation across the genome visualized with a bubble plot (top) and plot with the overall F ST of each marker (bottom). The size of each bubble in the bubble plot represents the number of genomic windows that were significantly differentiated from the rest of the genome according to kernel smoothing analysis for each form comparison. The "overall" designation is overall F ST across the dataset. Each dot in the graph of differentiation across the genome represents a marker, and red lines denote significantly differentiated windows. Red dots are loci that were found to be putatively under divergent selection in either Bayescan or PCAdapt. Linkage groups are separated by dashed lines. Form abbreviations are in Table 1. See Figures S4-S8 for visualizations of genetic differentiation for each chromosome and form comparison Distance in cM and HOY-KIY comparison (1). We were able to place 351 loci on Cart21, and 12 of these loci displayed overall F ST values > 0.3 (Figure 6a). The largest cluster of high-F ST loci was found between 0 and 10 cM on the linkage map. This region appears to be characterized by relatively low recombination, as loci found in the first 10 cM of Cart21 span about 25 megabases of the Atlantic salmon genome (Figure 6b). Alignments to the Atlantic salmon genome were possible for 151 loci on Cart21, and these alignments revealed that the highest F ST loci were found between positions 15 million and 25 million on Ssa05 (Figure 6c). Some of these loci were found in genes with functions that include cell signaling and membrane transport. However, there are over 2,000 genes on Ssa05, making it difficult to reach any robust conclusions about the functional significance of our loci. Allele frequencies at the high-F ST loci were generally the most diverged between ART and the other two forms, KIY in particular (Figure 6d).

| D ISCUSS I ON
The detection of genetic structure in recently diverged species complexes has proved challenging with traditional genetic methods, prompting the reevaluation with genomics of a taxonomically uncertain species complex in the Laurentian Great Lakes.
Using a genome-wide panel of SNPs to examine differentiation and diversity of sympatric coregonines in the Apostle Islands of Lake Superior, we were able to unambiguously assign individuals to the three major forms as well as identify putative hybrids and misidentified individuals. Despite a century of anthropogenic impacts in the Great Lakes that has seen the extirpation and extinction of historically documented forms in the C. artedi species complex, estimates of N e and diversity in Apostle Islands populations do not suggest the three major extant forms are experiencing bottleneck effects. Genetic differentiation among forms was notably high despite the presence of hybrids, and simulations suggest either that hybridization at the rates we observed between forms is a relatively recent phenomenon or that the fitness of hybrids is reduced. This second scenario is further supported by the discovery of widespread differentiation between forms across the genome, indicating that much of the divergence observed may be driven by long-term reproductive isolation and drift.

| Hypotheses for high genetic differentiation among forms
Genetic differentiation of the three primary cisco forms in our study (artedi, hoyi, kiyi) was relatively high compared to previous research in cisco using allozymes, mtDNA, microsatellites, AFLPs, and RAD data, which has largely suggested that forms are not diverged within lakes (but see Stott et al., 2020;Turgeon, Estoup, & Bernatchez, 1999;Turgeon et al., 2016). For example, Piette-Lauzière, Bell, Ridgway, and Turgeon (2019)   were much lower than the F ST values of ~ 0.05 observed in our study. Unfortunately, genetic data for cisco in the Great Lakes are relatively sparse; however, a previous study using mtDNA and microsatellites did not find strong evidence of differentiation among forms (Turgeon & Bernatchez, 2003). Results from the microsatellites genotyped in the current study are similar and indicate that these markers are unable to differentiate species in Lake Superior, even though genetic structure was relatively high according to the RAD data. Interestingly, our estimates of genetic divergence among forms more closely mirror two studies in lake whitefish and European whitefish that used RAD sequencing (Feulner & Seehausen, 2019; than previous studies in cisco that genotyped mtDNA and microsatellites. The high genetic divergence among forms observed in cisco is not typical of other fishes in the Laurentian Great Lakes. Lake trout (Salvelinus namaycush) and brook trout (Salvelinus fontinalis) display significant life history polymorphisms in the Great Lakes, with lake trout exhibiting morphologically distinct ecotypes related to depth and brook trout exhibiting both fluvial and adfluvial life histories.
Two recent studies used RAD sequencing to investigate life history polymorphism in these species, and neither was able to document strong signals of divergence among forms (Elias, McLaughlin, Mackereth, Wilson, & Nichols, 2018;Perreault-Payette et al., 2017).
It is possible that divergence within these forms was reduced through introgression mediated by stocking, as these species were stocked heavily, whereas cisco has not been stocked in consistently large numbers (Baillie, Muir, Scribner, Bentzen, & Krueger, 2016;Wilson et al., 2008). However, it is also possible that reproductive isolation among cisco forms is more complete, reducing the potential for introgression to erode divergence among forms.
Very little is known about the spawning biology of forms outside of artedi, although our observation of relatively frequent hybrids between ART-HOY and HOY-KIY suggests that there is at least some overlap in reproductive timing among the three forms. It is also notable that we did not observe any putative hybrids between ART-KIY.
These three cisco forms are encountered in different depths in Lake Superior, with artedi inhabiting waters < 80 m deep, hoyi inhabiting depths between 60 and 160 m, and kiyi inhabiting depths from 80 to 200 m (Eshenroder et al., 2016). This depth stratification likely explains our observation that hoyi hybridizes with artedi and kiyi, but artedi and kiyi do not hybridize with each other.
Our observation that hybrids appear to be relatively common even though genetic differentiation among forms is high may indicate that hybrids possess reduced fitness and potentially genetic incompatibilities. Successful hybridization can homogenize genetic structure within a few generations, as evidenced by our simulations and by a large body of literature in species such as European whitefish and cichlids (reviewed in Seehausen, 2006). For example, Vonlanthen et al. (2012) documented rapid speciation reversal in European whitefish in response to habitat degradation. It is possible that cisco forms in Lake Superior were historically more differentiated, and we are observing the beginning of reverse speciation. However, this is unlikely since Lake Superior has remained the least impacted relative to the other Great Lakes, and cisco populations in Lake Superior remain robust (see Eshenroder et al., 2016). Alternatively, we hypothesize that lower fitness of hybrids caused by negative interactions between hybrid genomes, such as Dobzhansky-Muller incompatibilities (Dobzhansky, 1936;Muller, 1942), may be at least partially responsible for maintaining high genetic differentiation among forms (Dagilis, Kirkpatrick, & Bolnick, 2019;Ellison & Burton, 2008). Multiple lines of evidence for hybrid incompatibilities have been found in a sister taxon of cisco, lake whitefish (Coregonus clupeaformis, reviewed in Bernatchez et al., 2010). For example, Rogers and Bernatchez (2006) found that hybrid backcrosses had ~ 6 times higher mortality than F1 crosses, Renaut, Nolte, and Bernatchez (2009)

| Genetic differentiation across the genome
Comparison of patterns of genomic divergence found in our study with previous research suggests that diversification of cisco forms in the Great Lakes is likely polygenic and that these forms may have been isolated with limited gene flow for a relatively long period of time. We identified over 100 significantly differentiated genomic windows in our study, and genetic differentiation among forms was consistently high across the genome. This result is similar to two other investigations of genomic divergence in coregonines, which also hypothesized that adaptive divergence among forms likely involves a large number of genes (Feulner & Seehausen, 2019;. Specifically, Feulner and Seehausen (2019) (Lotterhos & Whitlock, 2015;Noor & Bennett, 2009;Yeaman, 2015). We hypothesize that cisco forms in Lake Superior diverged in sympatry thousands of years ago, and our genome scans seem to reveal a complex history of both drift and selection consistent with this hypothesis. Sympatric speciation, even when involving many genes of small effect, is predicted to create highly differentiated regions (i.e., genomic islands of divergence), as selectively advantageous loci that are clustered together are protected from between population recombination (Via, 2012;Via & West, 2008;Yeaman & Whitlock, 2011). We believe the first 10-20 cM of LG21 may represent one such island, as this region contained the highest concentration of outlier loci in our dataset and displayed potential evidence of reduced recombination. However, genetic differentiation appeared to be relatively similar across many LGs, with peaks and valleys but few conspicuous regions of differentiation. It is likely that many of these peaks may have been caused by genetic drift resulting from relatively complete and long-term isolation among forms (Via, 2012). Taken together, our results suggest a heterogenous landscape of divergence across the genome that has likely been shaped by both selection and drift. We suggest that future studies attempting to disentangle selection from drift in this system compare cisco forms among multiple lakes and potentially even compare cisco to other coregonine species (e.g., Rougeux et al., 2019).

| Conservation implications
We documented high genetic differentiation among the three major cisco forms in Lake Superior and, based on this information, we suggest that separate conservation units could be constructed for each form. This strategy differs from the current conservation recommendation that the entire C. artedi species complex be considered C. artedi sensu lato (translation, in the broad sense) and that units of conservation should be designed to preserve environments that have facilitated the evolution of different forms (i.e., lakes) rather than on forms at a larger scale (Turgeon & Bernatchez, 2003;Turgeon et al., 2016). These recommendations were informed by the best available data, which up to this point, have been generated with commonly employed markers for the detection of taxonomic and conservation units-AFLPs, mtDNA, and microsatellites (Reed et al., 1998;Turgeon & Bernatchez, 2003;Turgeon et al., 2016)-none of which have been able to consistently resolve different forms within the Great Lakes (but see Stott et al., 2020 for a single lake example).
Our results suggest that "last generation" markers may be insufficient for capturing differentiation in evolutionarily young species, such as cisco, and highlight the utility of genomic data for designating conservation units in these species. However, it is important to note that we only surveyed animals from one lake, and a larger genotyping effort across the Great Lakes is necessary to accurately inform conservation units for Great Lakes ciscoes. Additionally, our ability to draw conclusions regarding the relationships of the rare forms nigripinnis and zenithicus to the three major forms was limited by low sample size; therefore, genotyping additional contemporary and/or historic specimens may help resolve the placement these forms.
The potential of genomic data to revolutionize construction of conservation units has been frequently discussed (reviewed in Allendorf et al., 2010;Funk et al., 2012), and many studies have found that genomic data provide increased resolution for delineating population structure compared to last generation markers, such as microsatellites (Hodel et al., 2017;Vendrami et al., 2017;Wagner et al., 2013). However, conservation units that were constructed with these last generation markers have generally proven to be robust and are usually only updated slightly, if at all, based on genomic data (e.g., Hecht, Matala, Hess, & Narum, 2015;Larson et al., 2014;Moore et al., 2014). Our findings do not follow this pattern and suggest that current conservation unit recommendations for cisco may not accurately reflect the diversity of this species complex. We found that individual-based analyses conducted using genomic data were consistently able to differentiate cisco forms, whereas this was not possible with our dataset of nine microsatellites. We believe that this discrepancy largely stemmed from lack of power with the microsatellite dataset rather than, for example, heterogenous genomic divergence that was captured with genomic data but not with the microsatellites. The discrepancy that we observed between marker types has larger implications for constructing conservation units.
Specifically, our data suggest that conservation units constructed based on data from small panels of last generation markers that did not document structure among groups of animals with different phenotypes could potentially be re-evaluated with genomic tools to ensure within-species diversity is being adequately conserved.

| Conclusions and future directions
Our study provides some of the first evidence that cisco forms within the Great Lakes are genetically differentiated. We documented high genetic differentiation among the three major forms in Lake Superior, and highly differentiated markers were distributed across the genome, with islands of divergence found on nearly every linkage group. Additionally, we identified putative hybrids but hypothesize that fitness breakdown of backcrosses may aid in maintaining differentiation among these forms. The results of this study provide the foundation for a new understanding of the ecology and evolution of the C. artedi species complex within the Great Lakes.
The ability to differentiate forms with genomics provides researchers with a powerful tool for ground truthing morphological phenotypes and identifying cisco species at any life stage. In particular, the ability to identify larval ciscoes will allow researchers to estimate recruitment, which is vital for management and conservation, and will also significantly improve our understanding of early life history characteristics and reproductive dynamics in this species. Finally, our results suggest that some management units created using last generation markers may not adequately reflect species diversity and could be re-evaluated with genomic data.

ACK N OWLED G EM ENTS
We thank Brad Ray with the Wisconsin Department of Natural

CO N FLI C T O F I NTE R E S T
None declared.

DATA AVA I L A B I L I T Y S TAT E M E N T
Demultiplexed RAD sequence data used in this research along with corresponding metadata were archived in the NCBI sequence read archive using the publicly accessible Genomic Observatories