Spatial patterns of immunogenetic and neutral variation underscore the conservation value of small, isolated American badger populations

Abstract Small and isolated populations often exhibit low genetic diversity due to drift and inbreeding, but may simultaneously harbour adaptive variation. We investigate spatial distributions of immunogenetic variation in American badger subspecies (Taxidea taxus), as a proxy for evaluating their evolutionary potential across the northern extent of the species’ range. We compared genetic structure of 20 microsatellites and the major histocompatibility complex (MHC DRB exon 2) to evaluate whether small, isolated populations show low adaptive polymorphism relative to large and well‐connected populations. Our results suggest that gene flow plays a prominent role in shaping MHC polymorphism across large spatial scales, while the interplay between gene flow and selection was stronger towards the northern peripheries. The similarity of MHC alleles within subspecies relative to their neutral genetic differentiation suggests that adaptive divergence among subspecies can be maintained despite ongoing gene flow along subspecies boundaries. Neutral genetic diversity was low in small relative to large populations, but MHC diversity within individuals was high in small populations. Despite reduced neutral genetic variation, small and isolated populations harbour functional variation that likely contribute to the species evolutionary potential at the northern range. Our findings suggest that conservation approaches should focus on managing adaptive variation across the species range rather than protecting subspecies per se.

inbreeding (Eckert, Samis, & Lougheed, 2008;Munwes et al., 2010;Wagner et al., 2012). These characteristics suggest that small and isolated populations are more likely to go extinct before they can adapt to new environmental conditions (Bijlsma & Loeschcke, 2012). Because economic and logistic resources in conservation management are limited, it has been argued that prioritization of populations for conservation should target less vulnerable populations such as those with large abundances and high genetic diversity (Jamieson & Allendorf, 2012;Lesica & Allendorf, 1995). On the other hand, local selective pressures and restricted gene flow can also drive local adaptation in small and isolated populations (Lenormand, 2002;Vucetich & Waite, 2003), which implies that these populations are also relevant for conservation because of their unique adaptive genetic potential (Mayr, 1954;Petren, Grant, Grant, & Keller, 2005). There is an increasing recognition that conservation management should, as much as possible, focus on maintaining the adaptive genetic diversity within a species' range by understanding the processes shaping their evolutionary potential (Eizaguirre & Baltazar-Soares, 2014).
Hence, effectively mitigating a species' declines due to anthropogenic-related threats requires an understanding of both spatial distributions of genetic variation and patterns of local adaptation to predict how populations may respond to changing selective pressures (Bourne et al., 2014;Gibson, Van der Marel, & Starzomski, 2009). The success of interventions such as translocations, assisted migration and genetic rescue depends strongly on maximizing individual fitness in a novel environment. Ideally, genetic data should inform such interventions (Hedrick, 2014). Neutral genetic markers, such as microsatellites, provide information about gene flow and demography (Kirk & Freeland, 2011), but do not accurately predict adaptive genetic potential (Reed & Frankham, 2003;Volis, Ormanbekova, Yermekbayev, Song, & Shulgina, 2015). Thus, adaptive genetic variation should be assessed through examinations of highly polymorphic, functional markers that respond directly to selective pressures (Holderegger, Buehler, & Gugerli, 2010;Kirk & Freeland, 2011).
The relative effects of selection, gene flow and genetic drift on MHC can be disentangled by contrasting genetic structure at MHC and neutral genetic loci (Ekblom et al., 2007;Kyle et al., 2014;Spurgin & Richardson, 2010). For instance, balancing selection through heterozygote advantage or negative frequency-dependent selection maintains MHC polymorphism and counteracts the loss of rare alleles by genetic drift (Kamath & Getz, 2012;Rico, Morris-Pocock, Zigouris, Nocera, & Kyle, 2015;Strand et al., 2012). This scenario predicts that genetic differentiation among populations is weaker for MHC genes than for neutral loci (Tobler et al., 2014;Van Oosterhout et al., 2006).
Conversely, signatures of adaptive divergence at MHC relative to neutral loci (Dionne et al., 2009;Herdegen et al., 2014;Oliver et al., 2009) indicate heterogeneous selection caused by variation in local selective pressures. On the other hand, genetic drift has been described as the main evolutionary force shaping MHC polymorphism in small, isolated populations (Luo, Pan, Liu, & Li, 2012;Munguia-Vega et al., 2007).
Here, we assess the spatial variation of MHC polymorphism relative to the spatial distribution of neutral genetic diversity to evaluate the evolutionary potential of small and geographically isolated populations and large and well-connected populations of American badgers (Taxidea taxus), using populations across the northern extent of the species' range. American badgers are divided into four subspecies based on differences in skull size, pelage colour and geographical distribution (Long, 1972). Evidence from microsatellites and mitochondrial DNA (mtDNA) showed that genetic substructure exists within subspecies (Ethier, Laflèche, Swanson, Nocera, & Kyle, 2012;Kyle, Weir, Newhouse, Davis, & Strobeck, 2004). Three recognized subspecies of T. taxus reach their northern range limits in Canada ( The most peripheral western and eastern populations (TO and ON) are also the most genetically differentiated (Ethier et al., 2012;Kyle et al., 2004).
In this study, we aim to investigate the conservation genetic value of peripheral populations of an endangered carnivore by evaluating the relative influences of selection and neutral processes as drivers of MHC polymorphism among and within badger subspecies at the northern extent of the species' range. We compare spatial patterns of functional immunogenetic (MHC DRB exon 2) and neutral genetic (20 microsatellites) variations to account for the effects of gene flow and genetic drift. Specifically, we sought to determine whether small, isolated badger populations show lower levels of MHC diversity relative to large and well-connected nonperipheral populations, resulting from high rates of genetic drift or whether these isolated peripheral populations act as repositories of local immunogenetic adaptation. We also sought to determine whether functional immunogenetic markers reflect current subspecies and conservation designations for T. taxus.
Overall, these empirical data on the diversity of functional loci aid in evaluating the evolutionary potential of small, isolated populations and have the potential to better inform conservation strategies focusing on the maintenance and management of adaptive genetic variation.

| Major histocompatibility complex DRB-2 amplification and genotyping
We used 454 sequencing to characterize MHC class II DRB, fol- We used jMHC (Stuglik, Radwan, & Babik, 2011) to extract and assign raw FASTA-format reads to each individual. Sequences without complete primers and tags, sequences containing indels or ambiguous base pairs or sequences that did not match the expected allele size of 185 bp were discarded. Analysis of 454 sequencing data can be challenging because true MHC alleles must be distinguished from artefact sequences generated during PCR or 454 sequencing. To filter true alleles from artefacts, we applied two approaches: (i) the allele validation threshold following the multistep criteria described by Galan, Guivier, Caraux, Charbonnel, and Cosson (2010) and Sepil, Moghadam, Huchard, and Sheldon (2012) and (ii) the degree of change sequencing modelling by Lighten, Van Oosterhout, Paterson, Mcmullan, and Bentzen (2014). Both approaches rely on the assumption that artefacts are less frequent than true alleles (Lighten et al., 2014;Sepil et al., 2012). To apply the allele validation threshold method, we calculated the maximum per amplicon frequency (MPAF) for each variant, which is the maximum proportion of the individual's reads for a given variant among all individuals in which the variant was present.
We ranked variants based on the highest MPAF values. We started filtering variants with the lowest MPAF (≤1%) to check whether they could be explained as artefacts based on point mutations (≤2 bp substitutions) from a sequence of higher frequency (MPAF >10%) within the same amplicon. This initial assessment indicated that sequences with MPAF ≤3% were artefacts. In contrast, variants with MPAF ≥10% were present in more than one individual from an independent run and were considered true alleles. Variants between MPAF ≥3%-10% were checked manually to determine whether they could be explained by a difference of 1-2 bp from a parental true allele, contained premature stop codons, or produced a frame-shift mutation. Using the degree of change sequencing modelling approach, variants were ranked based on the number of reads per amplicon to calculate the cumulative sequencing depth among ranked variants. Variants with the highest degree of change were used as a basis to calculate statistical breakpoints between artefacts and true alleles based on sequencing depth models for each amplicon. We used the Excel Macro for MHC genotyping by Lighten et al. (2014). We contrasted the genotyping reliability of both models using the duplicated samples.

| Test for selection and recombination
We tested for signatures of historical positive selection in MHC using the one-tailed Z-test and likelihood codon-based approaches. In the one-tailed Z-test, the selection parameter ω quantifies the ratio of  (Brown et al., 1993;Stern et al., 1994). Codon-based maximumlikelihood methods of balancing selection were implemented in Codeml in PAML (Yang, 2007). We tested six models allowing for different   Therefore, we could not assign alleles to specific loci, requiring different measures of MHC diversity. At the population level, we estimated the mean number of pairwise differences (k) for each population in ARLEQUIN v3.11 (Excoffier, Laval, & Schneider, 2005). We also calculated the relative frequency of MHC alleles by counting the number of individuals carrying a particular allele divided by the total number of alleles in each population (Ekblom et al., 2007). At the individual level, we calculated MHC individual diversity as the number of alleles per individual divided by the maximum number of alleles found within individuals in the total data set (n = 4). We tested for significant differences in MHC individual diversity among populations using a type II ANOVA followed by pairwise comparisons (Tukey's HSD, familywise α = .05). We tested the correlation between microsatellite allelic richness and MHC individual diversity using Pearson product moment correlations in R (R Core Team, 2014).

| Data analysis: microsatellites
Significant departures from Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) for each sampling location were examined using probability test in GENEPOP v.4.2 (Rousset, 2008). For each location, we estimated observed (H o ) and expected heterozygosity (H e ) using GENALEX v.6.5 (Peakall & Smouse, 2012), inbreeding coefficient (F IS ) in FSTAT v.2.9.3.2 (Goudet, 1995) and allelic (A r ) and private allelic richness (P ar ) adjusted for unequal sample sizes by rarefaction in HP-RARE v.1.0 (Kalinowski, 2005). We tested for significant differences in neutral genetic diversity (A r , H o ) between small, peripheral populations (TO and ON) and nonperipheral populations using 1,000 permutations in FSTAT v.2.9.3.2 (Goudet, 1995). We estimated N e using the software LDNe (Do et al., 2014) by selecting the linkage disequilibrium and molecular co-ancestry methods. Confidence intervals were estimated by jackknife resampling with 1,000 iterations.  Earl & vonHoldt, 2012), and assessing the increase in P r (X|K) (Pritchard et al., 2000) and using the ad hoc ∆K method (Evanno, Regnaut, & Goudet, 2005). Individual membership probabilities of the inferred k-clusters from 10 independent replicates were averaged using CLUMPP v.1.1.2 (Jakobsson & Rosenberg, 2007), and clusters were visualized using DISTRUCT v.1.1 (Rosenberg, 2004). We performed analyses of molecular variance (AMOVA), by partitioning the genetic variance among the eight sampling regions and the three subspecies groups. Significance of AMOVA components was tested with 1,000 permutations in GENALEX (Peakall & Smouse, 2012). To evaluate the influence of neutral processes on population genetic differentiation at MHC, we performed a co-inertia analysis (CoA) to assess the joint structure of MHC and microsatellite loci. CoA is a multivariate method that assesses the covariance structure between data sets having the same observations (Dray & Dufour, 2007). This multivariate method is not limited to population-pair comparisons such as F ST and does not rely on mutational, HWE and LD equilibrium assumptions (Jombart, 2008).

| Comparisons between neutral and functional markers
For each binary-matrix, we performed a factorial PCA using sampling regions as predefined groups, and the two most important PCA components of each marker were input for CoA using the ade4 R package (Dray & Dufour, 2007). We tested the significance of the co-relationship between matrices by comparing the CoA estimated from the empirical data set with the CoA distribution estimated after 1,000 bootstraps. admixture between their geographically adjacent clusters (Fig. 1a).

| MHC selection and recombination test
All observed MHC alleles differed by at least one amino acid (Table S2) and were presumed functional based on lack of stop codons, an absence of frame-shift mutations, or extremely high amino acid sequence identity (90%-96%) with functional mammalian DRB alleles (GenBank). The one-tailed Z-test of positive selection showed a nonsignificant excess of nonsynonymous substitutions for all regions, PBR and non-PBR sites.
However, the difference between the d S and d N ratios was larger for PBR sites compared to non-PBRs (0.7 vs. 0.92, respectively; Table S3).
The genetic algorithm recombination detection method found no significant evidence (p > .05) of recombination in the 22 potential breakpoints explored. Table 1 (Table S4). FEL identified two codons, REL three, and MEME four. From 14 codons under positive selection, eleven corresponded to PBR sites and five were identified by more than one method.

| Microsatellite diversity and population structure
We found <2% genotyping error rate, largely corresponding to
Factorial PCA based on MHC data did not identify clear genetic differentiation among subspecies or sampling regions (Fig. 2a), but factorial PCA based on microsatellites confirmed the strong genetic differentiation of TO and ON from the other regions (Fig. 2b). The global co-inertia coefficient revealed a significant positive correlation between MHC and microsatellite variation in the full data set (RV-coefficient = .87, p = .0001), but the strength of this correlation varied among regions (for example, AB, MB and SK showed the shortest vector length and thus the highest correlation). The CoA plot shows the joint trend of the covariance between the MHC and microsatellites for each region (Fig. 2c). The vector length is inversely proportional to the covariance between data sets; that is, the longer the length the higher the divergence between markers.
MHC canonical weights indicated that alleles H07 and H09 contributed largely to the separation of ON and LP badgers; alleles H15, H13 and H19 contributed to the discrimination of TO and EK badgers; allele H10 to the separation of MB, AB and SK; and allele H01 contributed to the discrimination of UP badgers (Fig. 2c). These alleles occurred with the highest frequencies in each of these sampling regions (Fig. 1).
Pairwise F ST values for both microsatellites and MHC supported the genetic differentiation of TO and ON badgers from other populations (Table S5)

| DISCUSSION
Our T A B L E 3 Estimates of effective population size (N e ) in eight sampling regions of American badgers. N e was estimated using two methods: Linkage disequilibrium and molecular co-ancestry with their corresponding 95% confident intervals, which are shown in parentheses. Abbreviations as in  Kyle et al., 2014) and the brown bear (Ursus arctos, Kuduk et al., 2012).
Moreover, the occurrence of one to four alleles within individuals suggests the presence of two MHC DRB exon 2-like loci. Copy number variation is a common feature in many vertebrates (e.g. Figueroa et al., 2001;Van Oosterhout et al., 2006;Oomen et al., 2013;Kyle et al., 2014;Lighten et al., 2014), which is explained by gene duplications from a birth-death process where some duplicated genes are maintained by bal-  Radwan, 2008;Luo et al., 2012). However, maximum-likelihood methods detect historical periods of balancing selection during the evolutionary trajectory of a species (Garrigan & Hedrick, 2003). Signatures of selection acting across contemporary populations are thus better evaluated by contrasting genetic differentiation at neutral and MHC loci (Ekblom et al., 2007)  was in agreement with previous studies using microsatellites (Kyle et al., 2004) and mtDNA (Ethier et al., 2012). Based on mtDNA, there was clear genetic differentiation between EK and T. t. taxus, whereas UP was genetically more similar to T. t. taxus than to T. t. jacksoni, and thus, UP badgers were re-designated as T. t. taxus (Ethier et al., 2012).

| Patterns of genetic structure in neutral and MHC loci
In contrast, our microsatellite data showed much weaker genetic structuring along the boundaries of the recognized subspecies, which suggest substantial gene flow among them, and sex-biased dispersal.
In the west, the neutral genetic insularity of TO badgers can be explained by limited dispersal imposed by the rugged topography of the Selkirk Mountains between TO and EK regions. Likewise, the lower quality habitat (i.e. rocky clay soils and low prey availability) between TO and EK, and the Flathead Montana and north-west Washington is also expected to limit population connectivity (COSEWIC, 2012). In The American badger is an omnivore across its range, feeding on most available small mammals, reptiles, birds and invertebrates (Azevedo et al., 2006). Variation in badger diet across Canada corresponds with a shift in community structure of prey species from east to west. Eastern badgers are sympatric with small mammal communities dominated by eastern chipmumk (Tamius striatus), mice (Peromyscus sp.), woodchucks (Marmota monax) and eastern cottontail (Sylvilagus floridanus) (Dobbyn, 1994). In the west, small mammal communities are dominated by black-tailed prairie dogs (Cynomys ludovicianus), ground squirrels (Spermophilus sp.) and marmots (M. flaviventris, M. caligata;Michener, 2000;Kinley & Newhouse, 2008;COSEWIC, 2012). Thus, differences in species consumed by badgers across its northern range may expose them to a different suite of pathogens. Western populations are exposed to Yersinia pestis, the bacterium responsible for causing plague. The pathogen is maintained in populations of ground squirrels and prairie dogs that are important prey items for western badgers. Yersinia pestis can persist in carcasses and surrounding soil for up to 7 months, and badgers frequently cache prey for weeks prior to consumption (Michener, 2000).

| Levels of diversity in neutral and MHC loci
We found lower neutral genetic diversity in small, peripheral populations relative to nonperipheral populations with significant differences for allelic richness. Similarly, the number of MHC alleles was higher in large compared to small populations, but MHC diversity within individuals was not lower in small populations. Levels of genetic diversity are expected to respond to demographic processes such as inbreeding, genetic drift, restricted gene flow and small population size (Frankham, 2005).

t. taxus in the Canadian
Prairies is much larger with an estimated 1,000-10,000 individuals (Scobie, 2002). We observed in small populations higher F IS coefficients and smaller N e estimates relative to large populations, which partly can explain the differences in genetic diversity found between small and large badger populations.

| Conservation implications
Our findings show that subspecies designations are more congruent with the functional genetic clusters than with the neutral microsatellite data. Thus, subspecies may be more appropriately considered as ecotypes such as those observed in grey wolves (Canis lupus; Leonard, 2015) or Grizzly Bears (Ursus arctos, Shafer, Nielsen, Northrup, & Stenhouse, 2014). Recognizing adaptive divergence is critical for prioritizing which populations to protect and for selecting the best population sources for translocations or assisted migration (Funk, McKay, Hohenlohe, & Allendorf, 2012;Latta, 2008 (Grueber, 2015). More comprehensive approaches that utilize hundreds of genetic markers (e.g. single nucleotide polymorphism, SNPs) to examine functional variation in multiple immunity genes through genome-sequence data, would improve our understanding on how wildlife populations respond to disease (Morris et al., 2013). On the other hand, adaptive divergence revealed through a single-locus such as the MHC might not reflect divergence in adaptive genes associated with other relevant ecological conditions (Zhou, Seim, & Gladyshev, 2015;Schweizer et al., 2016). Screening for genome-wide variation to identify putative loci under selection or associations between genetic variants and phenotypic traits is the plausible option, which is increasingly being used to investigate adaptive genetic variation in wildlife (Harrisson, Pavlova, Telonis-Scott, & Sunnucks, 2014;Schweizer et al., 2016).
Evaluating evolutionary potential in wildlife populations is still challenging (Allendorf, Hohenlohe, & Luikart, 2010). For instance, many traits that are currently adaptive may not confer fitness under the rapid ongoing environmental pressures (Funk et al., 2012). Moreover, many complex phenotypic traits are known to be polygenic, which implies that single-locus methods become insufficient as the number of loci involved can be large (Le Corre & Kremer, 2012). Other challenges include aspects of statistical analysis, where most available methods are prone to large number of false discoveries for trait-associations from genomewide data. Despite these existing challenges, genomic approaches could ideally provide more comprehensive foundations to move towards an adaptive species management (Harrisson et al., 2014).
Badgers face many threats across their range as habitat is becoming increasingly fragmented by land-use changes. Areas of high urbanization and road density are expected to drastically limit badger movements and increase incidental deaths by road crossing (COSEWIC, 2012). At the same time, encroachment of forest into native grasslands in British Columbia may limit the size and carrying capacity of remaining habitat. These landscape modifications limit gene flow among the increasingly isolated populations. The resulting increase in genetic drift and inbreeding and the concurrent impacts on adaptive genetic diversity can be mitigated in part by restoration and protection of badger habitat. Other threats faced by T. taxus in Canada are either poorly documented (for example, the magnitude of mortality from disease) or require a more direct approach to mitigation (i.e. human persecution).
For example, badgers are poisoned either secondarily, due to consumption of prey that have consumed rodenticides, or directly in cases where landowners do not want badgers on their property (COSEWIC, 2012; Proulx and MacKenzie 2012). Diseases such as plague, leptospirosis and canine distemper can also increase mortality of adults (Quinn et al., 2012), with particularly strong effects on small and isolated populations.
Habitat loss and climate change are expected to promote the expansion of pathogens and their vectors and thus increase the incidence of diseases affecting wildlife in northern regions (Brooks & Hoberg, 2007;Kutz et al., 2009). The capacity of wildlife populations to persist under such conditions is a critical topic in conservation biology (Berteaux, Réale, McAdam, & Boutin, 2004;Altizer, Ostfeld, Johnson, Kutz, & Harvell, 2013;Hoberg and Brooks 2015). Examining the spatial distribution of MHC polymorphism among populations is a valuable proxy for understanding the processes underpinning their evolutionary potential (Radwan, Biedrzycka, & Babik, 2010;Sommer, 2005), which is fundamental to better inform conservation strategies.
Our study thus underscores the importance of considering neutral and functional genetic markers to simultaneously evaluate the adaptive genetic potential of small and isolated populations in mammalian species (Frankham, 2010). for their assistance with laboratory analyses, and two anonymous reviewers, whose comments improved the quality of our manuscript.

DATA ARCHIVING
Microsatellite genotypes at 20 loci and binary-encoded data of MHC