Multilocus genetic analyses and spatial modeling reveal complex population structure and history in a widespread resident North American passerine (Perisoreus canadensis)

Abstract An increasing body of studies of widely distributed, high latitude species shows a variety of refugial locations and population genetic patterns. We examined the effects of glaciations and dispersal barriers on the population genetic patterns of a widely distributed, high latitude, resident corvid, the gray jay (Perisoreus canadensis), using the highly variable mitochondrial DNA (mtDNA) control region and microsatellite markers combined with species distribution modeling. We sequenced 914 bp of mtDNA control region for 375 individuals from 37 populations and screened seven loci for 402 individuals from 27 populations across the gray jay range. We used species distribution modeling and a range of phylogeographic analyses (haplotype diversity, ΦST, SAMOVA, F ST, Bayesian clustering analyses) to examine evolutionary history and population genetic structure. MtDNA and microsatellite markers revealed significant genetic differentiation among populations with high concordance between markers. Paleodistribution models supported at least five potential areas of suitable gray jay habitat during the last glacial maximum and revealed distributions similar to the gray jay's contemporary during the last interglacial. Colonization from and prolonged isolation in multiple refugia is evident. Historical climatic fluctuations, the presence of multiple dispersal barriers, and highly restricted gene flow appear to be responsible for strong genetic diversification and differentiation in gray jays.

The gray jay (Perisoreus canadensis; Figure 1) is ideal for investigating patterns of postglacial colonization and the impact of dispersal barriers on resident species for several reasons. Gray jays are a relatively sedentary species, like their putative sister species the Siberian jay (Perisoreus infaustus; Strickland & Ouellet, 2011), which exhibits strong population genetic structure in fragmented habitats (Uimaniemi et al., 2000). Adult gray jays remain in the same territory between breeding seasons, and natal dispersal is limited to nearby territories, though some irruptive juvenile dispersal has been observed (Strickland & Ouellet, 2011). Gray jays are broadly distributed across northern and western North America ( Figure 2) and strongly associated with spruce (Picea spp.). Gray jay contemporary range encompasses a number of purported barriers to dispersal (e.g., Salish Sea, Strait of Belle Isle, Columbia Basin), in addition to previously glaciated (e.g., most of Canada) and unglaciated areas (e.g., Alaska, western United States).
Gray jays display plumage and morphological trait variation across their range (Strickland & Ouellet, 2011). The presence of distinct morphs suggests the potential for reduced gene flow and population structure (Arnoux et al., 2014;Burg et al., 2005;Miller-Butterworth, Jacobs, & Harley, 2003), though morphological characteristics have also been shown to vary with temperature and other environmental variables (Diniz-Filho et al., 2009).
Using both mitochondrial DNA and nuclear microsatellite markers, we examine genetic structure and the effect of Pleistocene glaciations and dispersal barriers on genetic variation in this species. A previous study by van Els, Cicero, and Klicka (2012) using mtDNA data found that gray jays exhibit high levels of genetic diversity and genetic structure throughout their range; these patterns likely stem from populations residing in multiple ice-free refugia during the LGM. Although this study had a relatively large sample size (n = 205), many of the sites included in the study had small sample sizes (mean = 3.9 individuals/ site). Here, we use expanded sampling to include more populations from previously glaciated areas and incorporate more sites from the full distribution of gray jays. In addition, incorporating both mtDNA and microsatellite markers allows us to compare historical (mtDNA) and contemporary (microsatellite) genetic patterns in this species. Based on limited dispersal, patterns of glaciation during the LGM, and present distribution, we predict that gray jays expanded from multiple refugia throughout North America, and will exhibit high levels of genetic divergence between populations separated by barriers to dispersal.

| Sample collection
From 2007 to 2012, we captured gray jays at each sampling site (hereafter referred to as a population) using standard mistnetting techniques with call playback. We limited mistnetting locations to within a 50 km radius and sites contained no obvious barriers to dispersal. Sampling sites were paired in two ways: (1) located in areas that were previously glaciated and unglaciated during the last glacial maximum and (2) on either side of possible barriers to dispersal ( Figure 2). We collected less than 100 μl of blood from each bird, and blood was stored in 95% ethanol. Each bird was banded with a US Fish & Wildlife Service aluminum band, and aged and sexed when possible using standard procedures and protocols (Tables S1-S5). Additional genetic samples were obtained from museum collections taken from birds during the breeding season within the past 20 years (Table 1;   Table S1). DNA was extracted from blood, tissue, and feather samples using a modified Chelex protocol (Burg & Croxall, 2001;Walsh, Metzger, & Higuchi, 1991 we used a shrimp alkaline phosphatase-exonuclease clean up followed by sequencing and sodium acetate precipitation (Graham & Burg, 2012) before electrophoresis.

| Microsatellite DNA
We screened a subset of individuals at 30 microsatellite primer pairs developed for and used in other corvids. Seven of the 30 loci were polymorphic. To allow for integration of a fluorescently labeled primer (700 or 800 nm) directly into the PCR product, we modified all forward primers by adding an M13 sequence (5′-CAC GAC GTT GTA AAA CGA C-3′) to the 5′ end. DNA was amplified in a 10 μl reaction with 1× buffer, 1 mmol/L MgCl 2 , 200 μmol/L dNTP (Fisher Scientific), 1 μmol/L of each primer (forward and reverse), 0.05 μmol/L of the fluorescent primer (Eurofins MWG Operon) and 0.5 units taq polymerase under the following conditions: one cycle of 94°C for 120 s, T 1 for 45 s, and 72°C for 60 s, seven cycles of 94°C for 60 s, T 1 for 30 s and 72°C for 45 s, 31 cycles of 94°C for 30 s, T 2 for 30 s, and 72°C for 45 s, and one final elongation cycle at 72°C for 5 min (Table S2). PCR products were mixed with a stop solution (95% formamide, 20 mmol/L EDTA and bromophenol blue), denatured for 3 min at 94°C, then run on a 6% polyacrylamide gel using a LI-COR 4300 DNA Analyzer (LI-COR Inc., Lincoln, NE). Alleles were scored via visual inspection, and genotypes were independently confirmed by a second person. Three controls of known allele sizes (pre-screened individuals) plus a size standard were included on each load to ensure consistent scoring along with a negative control to ensure no contamination was present.

| Mitochondrial DNA
We edited and aligned sequences from chromatograms using mega v 5.0 (Tamura et al., 2011). To assess population structure and evaluate F I G U R E 2 Sampled gray jay populations. Gray jay range (light green) in North America and central location of sampled populations (white circles) overlaid on digital elevation model of North America. Population abbreviations and locations are given in Table 1 T A B L E 1 Summary  Table S1 for additional museum collection information including voucher/specimen numbers, latitude and longitude, and sex. relationships among haplotypes, we constructed a statistical parsimony network (95% probability) using tcs v 1.21 (Clement, Posada, & Crandall, 2000). We measured genetic variation within populations and haplogroups by calculating haplotype (H d ) and nucleotide (π) diversity using arlequin v 3.11 (Excoffier, Laval, & Schneider, 2005). To examine population structure and assess genetic differentiation among populations and haplogroups, we calculated pairwise Φ ST values (an analogue of Wright's fixation index F ST ) using arlequin v 3.11 (Excoffier et al., 2005). We corrected significance values using a Benjamini-Hochberg correction (Benjamini & Hochberg, 1995) to control for false discovery rate (FDR). We examined genetic structure within and among populations by performing an analysis of molecular variance (AMOVA) in arlequin v 3.11 (Excoffier et al., 2005) and used a spatial analysis of molecular variance (SAMOVA; Dupanloup, Schneider, & Excoffier, 2002) approach to assess barriers between gray jay populations.
To reconstruct the phylogenetic relationship among populations, we used the Bayesian inference program MrBayes 3.2 (Ronquist et al., 2012). For our analyses, we analyzed all CR haplotypes using a GTR G+I model as this was the best-fit model, as determined in JModelTest (version 0.1.1; Posada, 2008). We ran the analyses for 10 million generations using four chains, sampling every 100th generation. We used a burn-in percentage of 25%, using the remaining trees to construct consensus trees, which we viewed using FIGTREE 1.3.1 (Rambaut & Drummond, 2006).
Bayesian clustering analyses were conducted using Structure v2.3.3 (Falush, Stephens, & Pritchard, 2003;Pritchard, Stephens, & Donnelly, 2000); we used the following settings for our initial run examining all 27 populations: a burn-in of 100,000 followed by 500,000 runs, admixture assumed, correlated allele frequencies without population information as an a priori. Ten replicates were performed for each value of K. In structure, it can be difficult to decide when K captures major structure in the data due to similar lnP(X|K) values, thus structure Harvester (Earl & von Holdt, 2012) was used to confirm the most parsimonious clustering of groups. Following our initial run that included all 27 populations, we tested for hierarchical structure, following the procedure used by Adams and Burg (2015). For these runs, we used the same settings as our initial run, although we used a burn-in of 50,000 followed by 100,000 chains.

| Correlates predicting genetic structure
We used two separate approaches to examine the factors that influence genetic structure. First we used the program BARRIER to identify potential barriers that may contribute to genetic structure. BARRIER uses Delaunay triangulation and Monmonier's distance matrix to identify potential barriers. We identified the first 10 genetic barriers using both our mtDNA and microsatellite datasets; distance matrices were created using pairwise Φ ST and F ST values. We identified barriers with each dataset separately, so that we could compare patterns between markers and determine if similar barriers influence historical and contemporary genetic patterns.
Next, we used a distance-based redundancy analysis (dbRDA) to test the role of ecological variables on genetic variation. We ran two separate analyses, one for mtDNA genetic variation and a second for microsatellite genetic variation. DbRDA is a multivariate approach to test the effect of multiple predictor variables on one or more response variables (Legendre & Legendre, 1998). Although Mantel tests are often used to measure the relationship between genetic matrices and other distance matrices, recent studies have suggested that canonical statistical approaches like dbRDA are better suited for examining questions where distance matrices are not applicable (Legendre & Fortin, 2010). This approach is especially useful for studies examining the influence of environmental variation or other abiotic factors because it allows for the testing of those variables directly.
To construct our dbRDA models, we used the "capscale" function in the R package Vegan (R Core Team, 2016). We performed this analysis at the individual level so that we could examine the full-extent genetic variation in both mtDNA and microsatellite patterns. For our response variable, we calculated Nei's genetic distance between all individuals for mtDNA and microsatellite datasets using GenAlEx (Peakall & Smouse, 2006). We examined six predictor variables in our models, including geographic location (latitude and longitude) for each individual and geographic distance. For our geographic distance, we used the first principal coordinate for each individual; similar to our genetic response variables, we performed a principal coordinate analysis in GenAlEx on a geographic distance matrix following the approach of Kierepka & Latch, (2016). For our remaining four variables, we used information obtained from our spatial distribution models. We examined the influence of mean annual temperature and precipitation during the coldest quarter, as these were the two most important variables that predicted gray jay distributions in those models. Additionally, we examined the role of altitude, which we obtained from the BIOCLIM dataset. All three variables were obtained using "the point sampling" tool in QGIS (Quantum GIS Team, 2017). Finally, we examined the effect of glaciation by scoring an area as glaciated or unglaciated based on the results of our spatial distribution modeling results from the last interglacial.

| Genetic structure
We collected samples from and genotyped mitochondrial DNA of 375 individual gray jays from 37 populations (Table 1, Figure 2) and seven polymorphic microsatellite loci for 402 individuals from the 27 populations with five or more samples from across the range (Table 2).  Table 3; Table S4).
SAMOVA population groupings corresponded with those suggested in the statistical parsimony network ( Figure 3) and the same groups used in the analysis of molecular variance (AMOVA) to explain the most among group variation.

| Microsatellite DNA
A total of seven polymorphic microsatellite loci were used for analyses (Table S2). Twenty-seven populations with five or more samples were included in general analyses and initial Bayesian analyses of population clustering. Total number of alleles for each locus ranged from six for MJG1 and ApCo41 to 16 in ApCo37 (  Table S5). The initial structure clustering analysis suggested that the optimal number (K) of gray jay populations was two (mean

| Species distribution modeling
maxent modeling predicted a current range similar to that known for gray jays in North America with little variance (Figure 5a) (Figure 5b).
The model also shows suitable gray jay habitat may have existed near Newfoundland. During the last interglacial period (LIG; ~120-140), suitable gray jay habitat reflected that of the present distribution, with greater levels of suitable habitat in the Intermountain West and southern Ontario and Quebec (Figure 5c). This suggests that gray jays expanded into previously occupied areas after the ice sheets of the LGM receded (Figure 5c). Only populations with greater than five samples were used; n = number of samples used in genotyping and analyses; A n , number of alleles; A r , allelic richness; H o , observed and H e , expected heterozygosity; P, departures from Hardy-Weinberg equilibrium (-, not calculated, ns, not significant, *p < .05, **p < .01, ***p < .001. See Table 1 for population location abbreviations). a Removed from subgroup clustering analyses due to missing data.
T A B L E 2 (Continued)

| Barrier analyses
Using BARRIER, we found congruent patterns between mtDNA and microsatellite markers ( Figure 6). The majority of barriers identified were located in the western portion of the gray jay range and appear to correspond with the location of mountain ranges, water barriers, or breaks in suitable habitat. While patterns were mostly congruent between marker sets, there were some differences. In Our dbRDA models at the individual level found a significant relationship between the six environment variables we examined and both mtDNA and microsatellite genetic structure (Table 6). Similar environmental variables appear to influence both mtDNA and microsatellite genetic structure, although environment accounted for greater variance with respect to mtDNA genetic structure than microsatellite genetic structure. Precipitation during the coldest quarter accounted for twice as much variance (r 2 = .29) than geographic distance (r 2 = .14) or geographic location (r 2 = .13), while glaciation, altitude, and mean temperature were all significant, but accounted for a relatively small portion of the variance. For microsatellite genetic structure, the six variables accounted for a very small portion of variance (0.01-0.02). Similar to mtDNA patterns, precipitation during the coldest quarter was the top predictor of genetic variation among the six we tested (F = 17.11, p = .001). Our results indicate a weak effect of isolation by distance on genetic patterns overall, further suggesting the influence of barriers on genetic structure in gray jays.

| LGM refugia and patterns of postglacial colonization
High-mitochondrial genetic diversity exists within most groups, suggesting few founder events occurred during gray jay recolonization after deglaciation. Most areas have haplotype diversity approaching one. High-haplotype diversity and few shared haplotypes between populations also suggest limited maternal gene flow among groups, as might be expected in a sedentary species (Barrowclough et al., 2004;Bertrand et al., 2014;Burg et al., 2006;Graham & Burg, 2012).
Mitochondrial DNA patterns in the gray jay suggest long-term isolation in multiple refugia and low levels of gene flow following the retreat of the ice sheets. Species distribution modeling (SDM) and fossil data (Wetmore, 1962) reinforce the presence of multiple southern refugia and SDM data support a northern refugium. While SDM shows refugia during the LGM and these maintained isolation of genetically distinct groups (e.g., CO-NM, UT), isolation during earlier glaciations likely created many of the haplogroups seen. In addition, SDM modeling for the LIG suggests a similar distribution to that at present, though with greater concentration of suitable habitats in areas near refugia, corresponding to mtDNA groups. The Clearwater refugium has been suggested as a refugium for other species in the area (Godbout et al., 2008;Shafer et al., 2010), including emerging pollen evidence for Picea species (Herring & Gavin, 2015).
While our mtDNA data support isolation, the paleodistribution model-

| Tree refugia
Gray jays are dependent on forested habitat and, in particular, several species of spruce trees (Picea spp.). CO-NM, UT, and IMW groups are all closely associated with Engelmann and blue spruce, which are highly fragmented in the southern portion of their range (i.e., UT and CO; Ledig, Hodgskiss, & Johnson, 2006). Populations of Engelmann and blue spruce in the IMW and NE UT are genetically distinct (cpDNA) and physically isolated from each other by the Snake River Basin (Ledig et al., 2006), corresponding to the mitochondrial DNA patterns found here.
Further support for gray jay colonization throughout the Boreal-East from both a Beringia and a southeastern refugium comes from phylogeographic studies of spruce (Picea spp;Jaramillo-Correa et al., 2009). The strong association of gray jays with spruce species in these areas (Strickland & Ouellet, 2011) means it is possible that the birds may have followed the colonization of spruce into previously glaciated areas, a pattern seen in other boreal species (Burg et al., 2006;Graham & Burg, 2012). The colonization by spruce is suggested to have occurred from multiple refugia north (Beringia) and south (both east and west of the Appalachian Mountains), particularly for white spruce (Picea glauca; Jaramillo -Correa et al., 2009;de Lafontaine, Turgeon, & Payette, 2010).
Black spruce (Picea mariana) has a similar colonization history in the east. However, west of the Rocky Mountains, black spruce is thought to have colonized only from a southern, Pacific refugium (Gérardi et al., 2010), contrary to the pattern of colonization from multiple refugia that we suggest for gray jays in mainland British Columbia.

| Dispersal barriers and peripheral isolation
Congruent patterns between mtDNA and microsatellite markers suggest that similar factors are influencing historical and contemporary genetic patterns. We found limited support to suggest that distance or environmental factors are influencing genetic patterns, in this species, as has been shown in other North American resident species (Graham & Burg, 2012;Lait, Friesen, Gaston, & Burg, 2012 %Var shows the percentage of genetic variation for mtDNA and microsatellite patterns explained by each of the biotic and abiotic variables tested in our dbRDA models. and isolation in these disjunct populations. In other taxa, peripheral populations are more likely to be isolated due to reduced gene flow, which is particularly pronounced for disjunct populations (Burg et al., 2006;Eckert, Samis, & Lougheed, 2008 (Ledig et al., 2006) and Douglas fir (Gugger, Sugita, & Cavender-Bares, 2010); both species of trees that gray jays are closely associated with in the Intermountain West area (Strickland & Ouellet, 2011).

| Marker choice and overall patterns
While some studies question using a highly variable marker like control region versus ND2 or cytochrome b for phylogeographic and phylogenetic studies, previous work has shown that this marker can be used to resolve deep splits in evolutionary history among avian species (Barker, Benesh, Vandergon, & Lanyon, 2012) and of corvids in particular (Saunders & Edwards, 2000). Within a single species, some loci may not be variable enough to detect differences between populations (e.g., cytochrome b (Steeves, Anderson, McNally, Kim, & Friesen, 2003) versus control region (Steeves, Anderson, & Friesen, 2005) in masked boobies (Sula dactylatra)). Thus, using control region sequences in this study provides a valuable comparison and complement to previous research.
Similar haplogroup patterns are found in van Els et al. (2012); however, our work differs in several ways. We suggest that gray jays fall into seven haplogroups across North America compared to four; additional groups are Utah, which is similar to the Boreal group as in van Els et al. (2012) but with higher resolution control region data create a distinct group, and Vancouver Island, with higher diversity in the CO-NM and Pacific Coast groups. While some evidence exists in our paleodistribution model for a Newfoundland LGM refugium, also suggested by van Els et al. (2012), genetic data in both studies do not support this refugium and rather suggest a case of long-term isolation, possibly in a nearby refugium. One benefit to using the control region is that it allows us to distinguish additional genetic splits (e.g., NL) that might not be as evident using less variable markers. Adding microsatellite markers to our analyses provided additional support and resolution for geographic patterns. Strong differentiation between most populations is similar to that found with mitochondrial DNA, and clustering provides additional insights into patterns throughout the range. Though van Els et al. (2012) suggest that three distinct morphogroups exist, similar to that found in Sibley (2000), our observations of morphology and plumage in the field suggested less distinct groups with greater clinal variation. One notable exception is that of birds in Newfoundland, which were heavier and had shorter tarsi than other groups (Dohms, 2016). Overall, we did not observe distinct differences corresponding to haplogroups in our work.

| Conclusions and future research
Gray jay populations are highly differentiated, likely a consequence of limited dispersal for both males and females. Historical and contemporary gene flow is influenced by glaciation, barriers to movement such as large bodies of water and large areas of unsuitable habitat, and peripheral isolation. Additional research could include greater numbers of microsatellite loci or other nuclear markers to further enhance and complete our understanding of gray jay history and contemporary gene flow in North America.
Overall our findings provide greater insight into the ecology, evolution, and conservation of boreal organisms. For example, gray jay geographic genetic patterns are similar to those found in spruce species, the conifer genus most commonly associated with preferred gray jay habitats, suggesting a close association between habitat and diversification in this species. Given this parallel, we would recommend future comparative phylogeography research that integrates genetic markers and species distribution modeling for gray jay, spruce, and other codistributed species. Incorporating this integrative approach is important, given that boreal habitats are under threat, as a result of climate change.