Genomic and environmental influences on resilience in a cold‐water fish near the edge of its range

Abstract Small, isolated populations present a challenge for conservation. The dueling effects of selection and drift in a limited pool of genetic diversity make the responses of small populations to environmental perturbations erratic and difficult to predict. This is particularly true at the edge of a species range, where populations often persist at the limits of their environmental tolerances. Populations of cisco, Coregonus artedi, in inland lakes have experienced numerous extirpations along the southern edge of their range in recent decades, which are thought to result from environmental degradation and loss of cold, well‐oxygenated habitat as lakes warm. Yet, cisco extirpations do not show a clear latitudinal pattern, suggesting that local environmental factors and potentially local adaptation may influence resilience. Here, we used genomic tools to investigate the nature of this pattern of resilience. We used restriction site‐associated DNA capture (Rapture) sequencing to survey genomic diversity and differentiation in southern inland lake cisco populations and compared the frequency of deleterious mutations that potentially influence fitness across lakes. We also examined haplotype diversity in a region of the major histocompatibility complex involved in stress and immune system response. We correlated these metrics to spatial and environmental factors including latitude, lake size, and measures of oxythermal habitat and found significant relationships between genetic metrics and broad and local factors. High levels of genetic differentiation among populations were punctuated by a phylogeographic break and residual patterns of isolation‐by‐distance. Although the prevalence of deleterious mutations and inbreeding coefficients was significantly correlated with latitude, neutral and non‐neutral genetic diversity were most strongly correlated with lake surface area. Notably, differences among lakes in the availability of estimated oxythermal habitat left no clear population genomic signature. Our results shed light on the complex dynamics influencing these isolated populations and provide valuable information for their conservation.


| INTRODUC TI ON
Understanding the nature of resilience and the capacity for evolution in the face of rapidly changing environments is a cornerstone of conservation genomics. Sources of selective pressure are numerous, and studies using genotype-environment association analyses that detect locally adapted populations can provide valuable insight on the environmental forces that influence population viability (Berg et al., 2015;Eizaguirre et al., 2012;Yeaman et al., 2016). Since heritable genetic variation provides the foundation for adaptive capacity under selection, the maintenance of diversity is fundamental to the preservation of healthy populations (Hoban et al., 2013;Sgrò et al., 2011). The mutations that underlie allelic diversity occur on a spectrum that can be broadly reduced to three categories (advantageous, neutral, or deleterious), with the frequency of these categories varying under mutation-selection balance.
In small populations, the mutation-selection balance can often be upset, making the frequency of deleterious mutations a conservation concern. Most mutations are mildly deleterious (Fay et al., 2001;Keightley & Lynch, 2003;Ohta, 1995), and in a large, randomly outcrossing population, recombination and segregation are expected to keep the mutational load from significantly affecting absolute mean fitness (Agrawal & Whitlock, 2012;Haag & Roze, 2007). The role of genetic drift increases as population sizes decline, such that natural selection may be ineffective at removing deleterious mutations (Whitlock & Bürger, 2004;Wright, 1931), leading to "mutational meltdown" and eventually extirpation of small populations (Lynch et al., 1995). As new genomic tools have become available to identify putatively deleterious variants, studies of small populations are quantifying deleterious mutations to assess the viability of at-risk populations (Andrews, 2010) and to inform conservation efforts (e.g., Ferchaud et al., 2018;Grossen et al., 2020;Perrier et al., 2017).
Conservation of small populations is inherently challenging, especially when they are isolated from nearby populations (Pekkala et al., 2012;Ralls et al., 2018). Without the option of being rescued by immigration, isolated populations are especially susceptible to environmental fluctuations and local habitat degradation (Pimm, 1991). These unique attributes mean that conservation principles developed for larger populations or metapopulations may not be suitable for managing small populations. For example, many conservation plans attempt to understand how species will respond to rising global temperatures on relatively large spatial scales (e.g., 100s of kms), but the viability of isolated, small populations depends on the details of how their local environment changes-which may deviate from large-scale trends. For example, aquatic ecosystems vary widely in the source of water (Lisi et al., 2013), habitat dimensions and nutrient loads (Olden et al., 2001), and precipitation inputs (Bay et al., 2018) even within the same geographic region. Additionally, environmental variation at local scales has been linked to immunocompetence in aquatic populations . In habitats separated by as little as 1 km, Larson et al. (2016) found differences in water temperature were significantly correlated with diversity in a major histocompatibility complex (MHC) gene involved in immune system response. Understanding how these local factors influence the genetic integrity of small, isolated populations is vital for managing them.
Here, we use cisco (Coregonus artedi) as a model organism to understand how spatial and environmental factors influence genetic integrity of small, isolated populations under a changing climate. Cisco are an important forage fish found in deep, cold lakes across northern North America (Scott & Crossman, 1973). As coldwater specialists, ciscoes require both cold-water temperatures and sufficient dissolved oxygen, found within the oxythermal habitat layer, to survive the stratification that occurs in deep lakes during the summer (Cahn, 1927;Frey, 1955;Jacobson et al., 2008;Lyons et al., 2018). Rising global temperatures have been linked to freshwater fish die-offs (Till et al., 2019), and declines in abundance as well as complete extirpations of inland lake cisco populations have been documented along the southern edge of the species range in Minnesota, Wisconsin, and Indiana (Honsey et al., 2016;Jacobson et al., 2012). Contemporary surveys across 133 Wisconsin lakes that historically contained ciscoes reveal a mosaic pattern of extirpation rather than a clear latitudinal gradient (Lyons et al., 2015); geographic proximity does not predict which populations were extirpated and which continue to thrive, suggesting that local factors mediate population outcomes.
A growing body of research focuses on using oxythermal habitat modeling to predict extirpations of cold-water fish populations and design conservation strategies (Jacobson et al., 2013;Jiang et al., 2012;Lyons et al., 2018;Magee et al., 2019;Renik et al., 2020), but this perspective has not been fully integrated with genomic analyses of cisco. Previous genetic analyses across the cisco species range using mitochondrial and microsatellite DNA have found evidence for at least two major lineages (Bernatchez & Dodson, 1990;Turgeon & Bernatchez, 2001a, 2001b, 2003. More recently, a study of cisco in inland lakes of Ontario using a genomic method similar to restriction site-associated DNA (RAD) sequencing detected two lineages and measured high levels of differentiation among lake systems (Piette-Lauzière et al., 2019), but did not quantify genomic diversity within or among lakes. The southern edge of the species' range offers opportunities to assess how genetic metrics correlate with both spatial variables and static and climate-dynamic environmental variables that may be associated with population persistence.
In this study, we applied multiple genomic tools to profile surviving populations of cisco near their southern range limit in Minnesota, Wisconsin, Indiana, and Michigan, and reconstructed the dynamics of oxythermal habitat availability in each lake across a 37-year period before genetic sampling. We used restriction site-associated DNA capture (Rapture) sequencing Ali et al. (2016) to survey genomic diversity and differentiation, and to enumerate the presence and frequency of putative deleterious mutations among lakes. We also amplified the MHC IIβ exon 2 to examine the contributions of neutral or pathogen-mediated adaptive processes to haplotype diversity across these southern inland cisco populations. To understand the spatial and environmental context for genomic variation in ciscoes, we correlate each of these genetic metrics to the latitude, size, depth, and oxythermal habitat availability in each lake.

| Development of Rapture panel
We developed a Rapture panel using restriction site-associated DNA (RAD) data from 129 individuals (data previously published in Euclide et al., 2020)  . RAD library preparation was conducted using the SbfI enzyme according to the methods of Ali et al. (2016), and libraries were sequenced on two lanes of an Illumina HiSeq 4000 at the Michigan State Genomics Core Facility. Sequence data were analyzed with Stacks v1.46 (Catchen et al., 2011(Catchen et al., , 2013 with the following parameters: process_radtags (flags = c, -q, -r, -t 140), ustacks (flags = -m 3, -M 5, -H --max_locus_stacks 3, --model_type bounded, --bound_high 0.05), cstacks (-n of 4, 29 individuals with the most data included in the catalog), and populations (flags = -r 0.7, --min_maf 0.05). In addition, since all salmonids including cisco have experienced a recent genome duplication, putatively paralogous loci were identified with the program HDPlot (McKinney et al., 2017), and any loci with heterozygosity >0.6 or a read ratio deviation >10 and <-6 were classified as duplicated. We chose a total of 11,330 tags containing high-quality singleton SNPs (i.e., not from duplicated regions of the salmonid genome) and 1670 tags containing duplicated SNPs for a total of 13,000 tags. Sequence data for the loci that met our quality standards were sent to Arbor Biosciences (Ann Arbor, MI) for Rapture bait development. Arbor Biosciences conducted additional quality filters to ensure robust performance and synthesized two baits per-locus when possible, resulting in a final panel of 7753 loci and 14,382 unique baits.

| Rapture loci processing
Rapture libraries were prepared with the RAD protocol used for panel development followed by bait capture using the myBaits protocol v4.01 (Arbor Biosciences) with minor modifications. Libraries were hybridized with the bait mixture for 16 hr at 65°C and amplified with 10 PCR cycles, universal primers, and an annealing temperature of 56°C. Baited Rapture libraries were purified with 1X AMPureXP beads and sequenced on five Illumina HiSeq 4000 lanes at either the Michigan State Genomics Core Facility or Novogene.
Raw sequences were processed with Stacks v2.41 (Rochette et al., 2019). Sequences were demultiplexed by barcode, filtered for presence of the enzyme cut-site and quality, and trimmed in the subprogram process_radtags (parameter flags: -e SbfI -c -q -r -t 140 --filter_illumina --bestrad --disable-gapped -m 3 -M 5 -H --max_locus_stacks 4 --model_type bounded --bound_high 0.05). A master catalog of consensus loci was built in cstacks from a catalog of 51 C. artedi used in the development of a cisco linkage map (Blumstein et al., 2020) that was appended with five individuals from each lake that retained the highest number of reads after process_radtags (flags: -n 3 -p 6 -disable_ gapped). Locus stacks for individuals were matched to the catalog using sstacks (flag: --disable_gapped), data were oriented by locus in tsv2bam, and reads were aligned to loci and SNPs were called with gstacks. SNPs genotyped in >30% of individuals (flag: -r 0.3) were exported with populations to a variant call format (vcf) file.
SNP filtering was performed with vcftools v0.1.15 (Danecek et al., 2011) and included the following: (1) removing loci genotyped in <70% of individuals; (2) removing individuals missing >50% of loci; and (3) removing loci with a minor allele frequency <0.01. Any loci remaining after primary filtering that were not genotyped in every lake were also removed. Putatively paralogous loci were identified with the program HDPlot (McKinney et al., 2017), and loci with heterozygosity >0.5, a read ratio deviation >15 and < -15, and an allele balance >0.7 and <0.3 were removed. Finally, loci on the same RAD tag are likely to be physically linked. Therefore, only the SNP with the highest minor allele frequency on each tag was included in the final dataset. File format conversions were performed with PGDSpider v2.1.1.5 (Lischer & Excoffier, 2011).

| Genomic differentiation and diversity
We calculated diversity statistics for each population with GenoDive v3.0 (Meirmans, 2020), including observed and expected heterozygosity, an inbreeding coefficient (G IS , a heterozygosity-based analog TA B L E 1 Location and environmental data for the 29 cisco lakes sampled across the southern periphery of the species range. Complete oxythermal data can be found in Data S2 of F IS ; Nei, 1987), and the proportion of polymorphic alleles present in each population. Effective population size (N e ) was estimated in each population with the bias-corrected linkage disequilibrium method (LDNE; Hill, 1981;Waples, 2006;Waples & Do, 2010) in the software package NeEstimator v2.1 (Do et al., 2014). We restricted pairs of loci used in calculations to those comprised of two loci on different linkage groups of the cisco linkage map (Blumstein et al., 2020) using the LD locus pairing option to correct for the effects of physical linkage on the estimates of N e (Waples et al., 2016). We used a p-crit of 0.05 (Waples et al., 2016) to reduce potential bias introduced by low frequency alleles. N e calculations using the linkage disequilibrium method can be biased slightly downward when individuals from multiple cohorts are included in the sample due to a slight Wahlund effect (7% downward bias on average; Waples et al., 2014). Nonetheless, this small bias should not greatly affect the interpretation of the N e results.
We also used GenoDive to estimate standardized pairwise genetic differentiation between populations (F ST ). Analysis of genetic differentiation across our populations indicated that we likely sampled inland lakes across a putative phylogeographic break. We constructed a population tree from allele frequency data using the software POPTREE2 to identify which populations belonged to which divergent clade. Genetic distance between inland lake populations was measured with D A (Nei et al., 1983), and trees were generated using the neighbor-joining (NJ) method of Saitou and Nei (1987) with 1000 bootstrap replicates. Correlation between Slatkin's lin- (Rousset, 1997) and planar geographic distance between populations was examined using Mantel and partial Mantel tests to test for isolation-by-distance (IBD) while accounting for lineage effects (Mantel, 1967;Meirmans, 2012;Sokal, 1979), and the regression between F ST /1 − F ST and geographic distance (km) was plotted. We also examined the relationship of genetic distance and spatial heterogeneity in effective population size by testing the with log (surface area) as a proxy for local carrying capacity, as well as the correlation of average pairwise genetic differentiation to log (surface area) as implemented in Perrier et al. (2017). Finally, we used analysis of molecular variance (AMOVA) in GenoDive to test for the relative amounts of variance in our dataset that could be explained by individual, population, or lineage effects.

| Identifying putatively deleterious mutations
We identified putative deleterious mutations in our Rapture data using the program PROVEAN (Choi et al., 2012) and methods similar to Perrier et al. (2017) and Ferchaud et al. (2020), Ferchaud et al. (2018); scripts included with our supplementary material on Dryad).
First, we created a fasta file including each allele for each locus from the catalog.fa file output from Stacks and a VCF file using the script make_fasta_with_all_alleles.py. We then aligned sequences for each RAD locus to the protein coding sequence from the Atlantic salmon (Salmo salar) genome (Lien et al., 2016)

| MHC processing
We amplified a 300-bp region of the MHC IIβ exon 2 using primers and reaction conditions described in Pavey et al. (2011). We then attached sequencing barcodes and adapters and normalized the DNA using the genotyping-in-thousands by sequencing (GT-seq) protocol (Campbell et al., 2015) following Bootsma, Gruenthal, et al. (2020). Each plate was barcoded and then batch normalized using SequalPrep DNA Normalization plates (Invitrogen). Normalized products were pooled, then cleaned and size selected using Magoč & Salzberg, 2011), and the final 300 bp reads were quality filtered with the perl tool PRINSEQ-lite (http://prins eq.sourc eforge. net/) using a minimum base quality score (min_qual_score) of 30. The demultiplexing process removed barcodes from sequences so those were extracted from sequence headers and inserted at the beginning of reads (barcodes-min.py) before importing into the software jMHC (Stuglik et al., 2011). To maximize the number of sequences imported to the jMHC SQLite database, the forward and reverse primers were truncated to 10 bps. Allowing for one mismatch per 12 bp i7+i5 barcode, reads were then searched from the forward orientation, and if both primers and barcode were found, the read was imported to the database. Unique haplotypes with >50 reads were then exported for analysis. An MHC haplotype was present in an individual if the haplotype contained >10% of reads in that individual. MHC haplotypes that were only found once in the dataset were likely errors and were excluded. Many individuals contained >2 MHC haplotypes, indicating that our primers amplified multiple MHC genes, which is an expected result for salmonids (Harstad et al., 2008). However, we were unable to assign dosage (i.e., copy number) for each allele because there were no clear differences in read count distributions among putative genotype classes. Our genotype matrix is therefore coded as presence/absence data rather than complete genotype calls.
We used the generated MHC genotype matrix to investigate diversity, selection, and differences in MHC haplotype frequencies among populations. Estimates of allelic diversity can be highly influenced by sample sizes, but rarefaction methods such as that implemented in the program HPRARE (Kalinowski, 2005) can be used to estimate unbiased allelic richness. Unfortunately, this program can only take diploid data and many of the individuals in our dataset contained >2 MHC alleles. We therefore calculated a rarefied average number of MHC haplotypes present in each population (N hap ) by randomly sampling N = 5 individuals (minimum sample size in our study) from each population 1000 times and calculating the mean number of haplotypes found in those samples. We also estimated the proportion of individuals in each population that contain more than one allele (Prop poly ) and the proportion of individuals in a population with more than two MHC alleles (Prop >2 alleles ). We conducted a codon-based Z-test of selection in MEGA 7 using the Nei-Gojobori method (Nei & Gojobori, 1986) to test whether the ratio of nonsynonymous mutations (DN) was greater than synonymous mutations (DS). Finally, we visualized MHC haplotype frequencies calculated as the number of individuals containing a given haplotype divided by the total number of individuals for each population with a stacked bar plot. This metric was standardized to 1 by dividing by the sum of the frequencies for each population which was >1 as individuals can have >1 haplotype.

| Oxythermal habitat modeling
We used the open source, vertical one-dimensional hydrodynamic Historic climate drivers for each lake were estimated using data from (Winslow et al., 2017) and expanded following similar methods for lakes that were not available in the original dataset. The model does not consider surface inflow of water into the system, and we assumed that water levels remained constant over the study period.
To counteract any long-term negative change in lake level, we applied a small correction to precipitation in summer as in Winslow et al. (2017). In situ water temperature and dissolved oxygen profiles were collected from the NTL-LTER program (https://lter.limno logy.

| Environmental variables in sampled cisco populations
We obtained data on surface area and maximum depth from Minnesota Department of Natural Resources databases for Minnesota lakes, from Renik et al. (2020) for Wisconsin Lakes, from Honsey (2014) for Indiana lakes, and from an EPA report for Howard Lake in Michigan (EPA, 1975 we also extracted long-term averages and extreme values that could lead to bottlenecks in habitat availability. Lake surface area and maximum depth were considered as static descriptors of each lake.

| Correlating genetic variables with spatial and environmental parameters
We screened spatial and environmental variables for multicollinearity using pairwise correlations and variance inflation factors (VIF) with the R packages Hmisc (Harrell Jr & others, 2020) and usdm (Naimi et al., 2014). Predictors were pruned iteratively based on a threshold of VIF <5 and pairwise Pearson's correlation coefficients of |r| >0.7 (Dormann et al., 2013). We conducted RDA analysis in the R package vegan (Okansen et al., 2016) to test for a relationship between spatial and environmental variables and genetic variables. For this analysis, the response matrix was composed of genetic diversity metrics and the predictor matrix was composed of spatial and environmental variables. Our choice to conduct RDA instead of canonical correspondence analysis, a similar multivariate method, was informed by a detrended correspondence analysis, which indicated that variables were linear and thus more appropriate for RDA (Legendre & Legendre, 1998).
The significance of each predictor (i.e., their influence on variation in genetic metrics) was assessed with ANOVAs. We then used multiple linear regressions to test for relationships between genetic (response) variables and environmental (predictor) variables. The relative importance of each variable was assessed with the lmg method implemented in the R package relaimpo (Groemping, 2006). Finally, we conducted univariate linear regressions with the 'lm' function in R (and visualized with ggplot2; Wickham, 2011) to test for individual relationships between genetic variables and spatial and environmental variables that were significant in the RDA.

Rapture loci
Sequence reads were generated for a total of 850 samples from our 29 lakes, and after processing in Stacks, 543,108 shared variant sites were genotyped in >30% of individuals. A total of 59,865 loci were genotyped in >70% of individuals. Eighty individuals were dropped for missingness >50%. A large percentage of samples dropped were from previously archived DNA extracts, dried fin clips, and/or collections from die-offs. The loss of these individuals was most likely due to low quality (degraded) DNA in the extracts used for Rapture library preparation, which has been shown to highly impact the percentage of retained tags (Graham et al., 2015). The dataset contained 16,584 loci after filtering for minor allele frequency. Following the removal of putatively paralogous loci (n = 3578), removal of loci that were not genotyped in every lake (n = 59), and filtering to a single SNP per Rapture locus, 7546 SNPs genotyped in 770 individuals (see Table 2 for number of individuals genotyped per population) remained for analysis.
Differentiation among populations was high (average F ST : 0.361, Figure 2a; Table S1). The lowest F ST was measured between two  Table S2). Outside of ATK, the largest pairwise F ST value was measured between Sugar Lake in Minnesota (SGR) and EVE in Similarly, an AMOVA indicated that lineages accounted for 13.6% of the variance in the data (p < 0.001), whereas populations accounted for 25.6% of the variance (p < 0.009, Table 3). Given the lineage effects on our estimates of genetic differentiation, we limited our exploration of the relationships of variance in genetic differentiation to variance in population size [as di of log (surface area)] and average pairwise differentiation to log (surface area) to the MS lineage with ATK excluded (a total of 18 lakes, see Figure S1). We found no significant correlation between F ST /(1 − F ST ) and di of log (surface area) (adjusted r 2 < 0.01, p = 0.615); however, the average pairwise differentiation for a lake was slightly negatively correlated with log   Table 2). Median N e across populations was 296 with most populations exhibiting an N e between 100 and 1000 (n = 22). The two populations that had an N e > 1000-White Sand (WSL) and Rainbow (RBW) lake-were both in Wisconsin, and the three lakes that had an N e < 100-Sugar Lake in Minnesota (SGR), Black Oak Lake in northern Wisconsin (BLK), and Crooked Lake in Indiana (CRD)were distributed across the sample range. An additional two populations generated 'infinite' estimates, EVE in Indiana and Howard Lake (HOW) in Michigan. An 'infinite' N e is more likely a sign that the sample sizes in these populations were too small to generate a finite estimate with the LDNE method rather than a reflection of extremely large population sizes (see Do et al., 2014;Waples & Do, 2010). Due to potential sample size effects on estimates of N e and the inability to generate bounded estimates for all sample sites, we did not include N e in the genetic variables used in correlations.

| Identifying putatively deleterious mutations
A total of 912 of the 7546 loci were successfully aligned to protein sequences from the Atlantic salmon genome. Of these 912 loci, 315 had alleles that coded for different protein sequences (i.e., nonsynonymous mutations), and 173 of these mutations (54.9%) were predicted to be putatively deleterious (PROVEAN score ≤ −2.5).
The proportion of all identified deleterious mutations found in each population (i.e., with frequency >0 in a given population) averaged 0.95 and varied from 0.80 to 1 (Table 2)

| Haplotype diversity in the MHC
A total of 629 fish were successfully genotyped for MHC IIβ exon 2 ( Pike Lake (PIK) to 9.15 in Lake Geneva (GNV, average = 5.93). MHC haplotype frequencies did not closely reflect geographic proximity; large differences in MHC haplotypes were observed at small spatial scales ( Figure S2). This was especially apparent for geographically proximate populations in northern Wisconsin. For example, Big Muskellunge (MSK) and White Sand (WSL) lakes had an almost completely different set of haplotypes despite being located <10 km apart. MHC haplotypes were also decoupled from underlying population genomic structure; for instance, the seq00111 and seq00024 haplotypes were found in high frequencies in populations from both major lineages (see next section, Data S1 and S2).

| Relationships among genetic variables and spatial and environmental parameters
Sampled populations varied substantially in habitat metrics (Table 1;   Table S4). Results from the RDA (global p < 0.001, 46% of variation in genetic metrics explained by predictor variables, R 2 adj =0.375) demonstrated that AF del mut and G IS responded differently to predictors and were not strongly associated with other genetic variables, as compared to H o , Prop del mut , and Prop poly (Rapture), which had similar loadings on RDA1&2 ( Figure 5). MHC Prop poly and N hap also displayed similar loadings. Latitude contributed most to differences in genetic variables with high loadings on RDA1 (particularly AF del mut and G IS ), whereas lake surface area and maximum depth were associated with differences in genetic variables with high loadings on RDA2. Populations did not cluster tightly by geography indicating that variation in genetic and environmental variables is at least somewhat spatially decoupled. Analysis of variance on the RDA results indicated that latitude and surface area significantly influenced genetic variables (p < 0.01, Table 4). Depth nearly met the threshold for significance (p = 0.053). To assess whether previously removed environmental variables that were correlated with depth or TDO3 max (r > 0.80) might explain a significant portion of genomic variance, we replaced these variables and reran the ANOVA. When depth was replaced with minimum oxythermal habitat volume (average), an even lower proportion of variance was explained (p = 0.549). Similarly, when TDO3 max was replaced with COSD, the amount of genomic variance explained remained nonsignificant (p = 0.591).
Multiple linear regressions conducted with seven genetic metrics (Rapture data: H o , G IS , Prop poly , AF del mut , Prop del mut ; MHC data: Prop poly , N hap ) as the response variables and four spatial or environmental metrics (latitude, surface area, depth, TDO3 max ) as the predictor variables indicated that the predictor with the highest relative importance on average was latitude followed by surface area, and maximum depth (Table 5)

| DISCUSS ION
Our coupled genomic-environmental approach offers new insights into the factors influencing genetic diversity and differentiation at the southern range boundary for cisco. High levels of genetic differentiation between populations were punctuated by a phylogeographic break and residual patterns of isolation-by-distance (IBD). Although and non-neutral (MHC) genetic diversity were most strongly correlated with lake surface area. Our results provide valuable information for guiding restoration efforts as well as spatial and environmental factors that may govern the persistence of ciscoes and other coldwater specialist species as the climate continues to warm.

| Historic vicariance and drift in small, isolated populations
The discovery of two divergent sets of populations in our study that we hypothesize are distinct genetic lineages provides new insights on the phylogeographic history of C. artedi. Bernatchez (2001a, 2003)  River basins. A similar pattern of genetic association within these river basins has also been found in walleye (Sander vitreus; Bootsma, Miller, et al., 2021), suggesting these drainages maintained corridors of migration between nearby refugia before receding water levels isolated inland lakes.
The drift that followed the isolation of inland lake populations is reflected in the high levels of genetic differentiation we found between our inland cisco populations as well as the pattern of increased  Table 1. Latitude and surface area significantly influenced genetic variables according to an analysis of variance (p < 0.01, Table 4). Other environmental variables had p-values > 0.05 inbreeding (Franklin, 1980); however, ATK did not exhibit a correspondingly elevated inbreeding coefficient as observed in two populations with equal or lower genetic diversity, Okauchee (OKA) and Eve (EVE). In light of this, it is worth considering that ATK is unique among our sampled lakes in that it contains the dwarf cisco ecotype.
Dwarf coregonines are smaller-bodied, spawn earlier, and typically exhibit shorter life spans than their alternative ecotype counterparts (Mann & McCart, 1981;Svardson, 1950). Transplanted dwarf cisco maintain their early spawning behavior (Shields & Underhill, 1993) suggesting heritable differences between ecotypes exist, and since contemporarily ATK is only known to contain the dwarf ecotype, this could lead to a small but significant increase in invariant loci when a panel developed to target polymorphic loci in alternative ecotype cisco is used.

| Spatial and environmental correlates of genetic metrics
Loss of adaptive genetic variation and inbreeding depression via the accumulation of deleterious mutations are major threats to the persistence of small, isolated populations (Charlesworth & Willis, 2009;Hedrick & Garcia-Dorado, 2016;Hedrick & Kalinowski, 2000;Keller & Waller, 2002;Lande, 1995). Therefore, we used a variety of metrics to quantify neutral (Rapture) and non-neutral (MHC) genetic diversity, inbreeding, and the frequency of putative deleterious mutations in inland lake cisco populations at the southern extent of their range. Understanding how spatial or environmental factors are correlated with any of these risk factors can provide managers with potential avenues for mitigating problems before they lead to extirpation, as well as identify characteristics of suitable lakes for reintroductions.
Latitude and the frequency of putative deleterious mutations exhibited the strongest correlation between a predictor variable and a genetic metric in our data. Across our sampled lakes, the proportion and frequency of deleterious mutations decreased the further south a population was located. While the proportions of deleterious mutations across populations of inland cisco were similar in range to those found in North American lake trout populations in Ferchaud et al. (2018), the allele frequencies we measured appear much larger than those of widespread continuous populations where most deleterious mutations tend to be a minor allele (Conte et al., 2017;Zhang et al., 2016). The most similar systems to our small, isolated inland cisco populations that have had mutational load evaluated are North American lake trout and brook trout populations; however, differences in methodologies make comparisons challenging. For example, Ferchaud et al. (2018), Ferchaud et al. (2020) used the same PROVEAN threshold we did, but interpreted the minor allele as the deleterious mutation regardless of +/− state due to expectations that strongly deleterious mutations should be purged and in low frequency. The range of average deleterious mutation allele frequencies under this assumption for lake trout was 0.12-0.24 (Ferchaud et al., 2018), and when we analyze our data using the same assumption, our average deleterious mutation allele frequencies similarly range from 0.07 to 0.18, so it is possible that Ferchaud et al. (2018) may have also found high deleterious allele frequencies if they had followed the same interpretation we employ here. A PROVEAN score divides nonsynonymous mutations into binary categories of  (Crow, 1993;Glemin, 2003;Kimura et al., 1963).
The relatively high average allele frequencies of deleterious mutations we measured could be reflective of an accumulation of mildly deleterious mutations rather than high frequencies of strongly deleterious mutations.
Predicting what patterns of mutational load we might expect to see in inland cisco populations along the southern (trailing) edge of the species range is not straightforward. Several studies have observed increased mutational load on the edges of a species range (Henry et al., 2015;A. Perrier et al., 2020;Willi et al., 2018;Zhang et al., 2016) as a result of increasingly smaller peripheral populations driving increased genetic drift and less effective purifying selection (Crow, 1993;Kimura et al., 1963;Whitlock, 2000). Alternately, many studies have simulated or observed increased mutational load and decreased heterozygosity specifically at the leading edge of a species range as a result of the demographic processes associated with colonization (Lohmueller et al., 2008;Peischl et al., 2013;Rougemont et al., 2020). Our observations of increased heterozygosity and increased mutational load with latitude in southern inland cisco populations do not perfectly align with either of these previously observed patterns of mutational load. The majority of study systems that have been used to examine mutational load across species ranges thus far have exhibited at least some level of migration and interpopulation gene flow (e.g., Rougemont et al., 2020;Willi et al., 2018;Zhang et al., 2016), unlike cisco populations, which were successively isolated in lakes as glaciers receded over thousands of years (Dyke & Prest, 1987). The small but significant correlations for decreased heterozygosity and increased inbreeding we observed the further south a population was found could expose recessive deleterious mutations to selection and lead to purging rather than accumulating mutational load (Hedrick, 1994;Hedrick & Garcia-Dorado, 2016;Kirkpatrick & Jarne, 2000;Lande & Schemske, 1985;Lynch et al., 1995). Perrier et al. (2017) investigated the proportion of deleterious mutations in isolated populations of lake trout in Quebec, Canada, and although they did not examine whether these proportions correlated with latitude, they similarly detected a weak but significant correlation between latitude and inbreeding, suggesting positive associations between latitude and the frequency of deleterious mutations may exist in other glacial lake populations.
Ultimately, extended sampling across a larger portion of the species range is needed to assess the relative roles of drift and selection versus demographic processes such as directional colonization and lineage admixture on the latitudinal pattern of mutational load we observed in southern inland lake cisco populations.
Genetic diversity in inland cisco populations was also linked to lake-specific environmental variation, reflecting both static and dynamic variables. The number of MHC haplotypes observed in each population, which showed no association with latitude, was positively correlated to lake surface area. No other patterns among populations or correlations with environmental factors were found in the MHC IIβ exon 2, including between lineages, suggesting that diversity in this gene may be linked to population size and drift.
Some measures of diversity that were associated with broad latitudinal patterns were also correlated to lake surface area, including neutral diversity and the proportion of deleterious mutations in cisco populations. A positive relationship between genetic diversity and lake size has also been found in several other salmonids, including lake trout (Perrier et al., 2017;Valiquette et al., 2014) and brook trout (Castric et al., 2001), as well as three-spined stickleback (Gasterosteus aculeatus) (Bolnick & Ballare, 2020;Caldera & Bolnick, 2008). Moreover, Bolnick and Ballare (2020) found that genomewide heterozygosity in lacustrine three-spined stickleback was not correlated with resource diversity and morphological variation, consistent with the expectation that larger population sizes supported by larger lakes experience less drift and therefore increased neutral genetic diversity. Decreasing amounts of drift in larger effective population sizes has also been suspected of driving the positive correlation of proportion of deleterious mutations and lake size observed in lake trout (Perrier et al., 2017).
It is notable that estimates of TDO3 were not significantly correlated with any genetic variables, and this may reflect the complexities of using one-dimensional modeling to describe threedimensional habitat. TDO3 and related metrics are useful, but these metrics and corresponding one-dimensional modeling and deephole lake monitoring cannot capture isolated pockets of refugia habitat that are often present in large lakes. For example, when using three-dimensional thermal and oxygen models to simulate Lake Mendota, Wisconsin, results indicated that the majority of cisco habitat in late summer may actually be around the 12-15 m contour of the lake where internal seiche and downwelling events are more likely to occur, bringing pockets of oxygenated water into the hypolimnion, which may act as refugia habitat to sustain the remnant cisco population (see Figure S3). Similarly, by their nature, onedimensional models also cannot capture these pockets of refugia within large lakes with multiple deep locations and abrupt changes in bathymetry. In addition, TDO3 only captures one important lake characteristic influencing inland cisco populations. Lake size, which was correlated with genetic diversity, likely encompasses a multitude of factors influencing population size such as prey availability, refuge habitat from predators, and spawning habitat.
F I G U R E 6 Relationships among four genetic metrics and latitude and lake surface area which were found to significantly influence genetic data in the redundancy analysis ( Figure 5). Genetic metrics presented here include observed heterozygosity (H O ), the average allele frequency of putatively deleterious mutations in each population (AF del mut ), the proportion of deleterious mutations found in a population (Prop del mut ), and the number of MHC haplotypes per population (N hap ). p-Values and correlation coefficients for each regression are denoted 4.3 | Conserving inland lake cisco populations at the southern extent of their range Genomic data can provide valuable information to guide decision making in fish conservation (Bernos et al., 2020), In Wisconsin, where historic MS and GL lineages converge, careful consideration should be given before implementing a reintroduction or augmentation plan that involves mixing divergent genetic lineages as the presence of fixed differences in highly divergent populations is a risk factor for outbreeding depression (Frankham et al., 2011).
A current reintroduction initiative involving live-transfers of cisco from WSL to nearby Crystal and Sparkling lakes in fall 2020 will provide important baseline information for the use of regional singleand multiple-source strategies more broadly.
Augmentation through supplementation is rare in inland cisco populations compared to other northern lacustrine salmonids such as lake whitefish (Coregonus clupeaformis) and lake trout, which are more likely to be targeted by recreational fishers (Frey, 1955). A handful of records from cisco stocking events in the early 20 th century in Indiana and Michigan exist (Frey, 1955;Homola et al., 2021), and increases in mitochondrial diversity from historic stocking in a few Michigan lakes has been observed (Homola et al., 2021). To the best of our knowledge, none of the lakes included in our study were ever stocked, and Homola et al. (2021) found no evidence of stocking in the one lake we sampled in Michigan (HOW). The broad patterns in IBD and frequency of deleterious mutations we observed across our sampled lakes further support a lack of evidence for stocking events. Contemporarily, conservation measures for inland lake cisco populations focus on mitigating cold-water habitat degradation and controlling harvest (e.g., Jacobson et al., 2010Jacobson et al., , 2012Tingley et al., 2019) rather than supplementation. With little empirical evidence available for the influences of supplementation on inland lake cisco, it is difficult to predict the viability of this strategy to retain beneficial genetic diversity. Genetic rescue of depleted populations through outcrossing has been shown to increase diversity and fitness in many small, isolated populations (Fitzpatrick et al., 2020;Frankham, 2015Frankham, , 2016. There is some contention, however, on whether ideal source populations used for augmentation should maximize the genetic diversity of target populations (Ralls et al., 2020) or minimize the introduction of strongly deleterious mutations (Kyriazis et al., 2020;Robinson et al., 2020). Ferchaud et al. (2018) found increased neutral diversity and fewer deleterious mutations in stocked versus unstocked populations of lake trout, attributing this trend to a reduction of the influence of drift through supplementation. Given the pattern of decreased mutational load in southern inland lake cisco populations, however, the introduction of new deleterious mutations could be problematic. Additionally, supplementing inland lake cisco populations with fish from the Laurentian Great Lakes could be detrimental regardless of mutational load since these populations have been observed to contain different phenotypic and behavioral adaptations from native inland lake populations (Jacobson et al., 2018), which are also risk factors for outbreeding depression (Frankham et al., 2011). Given the unpredictability of augmentation, it may be best to document interpopulation interactions through cisco reintroduction initiatives before applying this approach to vulnerable extant populations.
The correlation we observed between genetic diversity and lake surface area suggests one key to resilience in southern inland lake cisco populations may be maintaining populations that are large and diverse enough to survive the stochastic demographic, environmental, or genetic factors that drive the extinction of small populations (Frankham, 2008;Pimm, 1991). The predictor variables examined for correlations to genetic metrics in our study are only a subset of the landscape heterogeneity predicted to play a role in the persistence of inland lake cisco populations. While larger lake size was found to be a good predictor of the presence of historic cisco populations in Indiana (Honsey et al., 2016), nonpoint-source nutrient loading, urbanization, and sedimentation from land-use are all observed or predicted to influence the quality of contemporary cisco habitat in southern lakes (Honsey et al., 2016;Jacobson et al., 2012;Magee et al., 2019). Additional threats are lake-specific, such as the introduction of invasive species like rainbow smelt Osmerus mordax (Hrabik et al., 1998). The combined interaction of these idiosyncratic factors with fine-scale heterogeneity in genetic diversity likely explains the lack of a predictable pattern to extirpations (Lyons et al., 2015;Renik et al., 2020).
Current recommendations for protecting suitable cold-water habitat suggest management at the catchment scale (Jacobson et al., 2013) jointly with adapted regional plans to address threats such as climate change (Tingley et al., 2019), and our findings of broad, latitudinal, and local-scale patterns of maladaptation and genetic diversity support this approach for conserving southern inland lake cisco populations. The mosaic pattern of observed extirpations in this region coupled with the high levels of differentiation we observed at small spatial scales and correlations of lake size and diversity suggest that genetic resilience in inland lake cisco populations is closely associated to levels of genetic drift.
The most successful conservation strategies, therefore, may be those that adequately account for the fine-scale environmental processes that influence population sizes in order to reduce the chance of stochastic natural or anthropogenic events resulting in further extirpations.