Sarcoptic mange severity is associated with reduced genomic variation and evidence of selection in Yellowstone National Park wolves (Canis lupus)

Abstract Population genetic theory posits that molecular variation buffers against disease risk. Although this “monoculture effect” is well supported in agricultural settings, its applicability to wildlife populations remains in question. In the present study, we examined the genomics underlying individual‐level disease severity and population‐level consequences of sarcoptic mange infection in a wild population of canids. Using gray wolves (Canis lupus) reintroduced to Yellowstone National Park (YNP) as our focal system, we leveraged 25 years of observational data and biobanked blood and tissue to genotype 76,859 loci in over 400 wolves. At the individual level, we reported an inverse relationship between host genomic variation and infection severity. We additionally identified 410 loci significantly associated with mange severity, with annotations related to inflammation, immunity, and skin barrier integrity and disorders. We contextualized results within environmental, demographic, and behavioral variables, and confirmed that genetic variation was predictive of infection severity. At the population level, we reported decreased genome‐wide variation since the initial gray wolf reintroduction event and identified evidence of selection acting against alleles associated with mange infection severity. We concluded that genomic variation plays an important role in disease severity in YNP wolves. This role scales from individual to population levels, and includes patterns of genome‐wide variation in support of the monoculture effect and specific loci associated with the complex mange phenotype. Results yielded system‐specific insights, while also highlighting the relevance of genomic analyses to wildlife disease ecology, evolution, and conservation.


| INTRODUC TI ON
A classic paradigm in population genetics states that molecular diversity buffers against disease risk (Spielman, Brook, Briscoe, & Frankham, 2004). Host variation is thought to confer multiple defense strategies, thus limiting a pathogen's ability to exploit common weaknesses at the individual and population levels (Bergstrom & Antia, 2006;Hedrick, 1999). Conversely, the absence of host variation is expected to increase a population's vulnerability to infection, leading to disease outbreaks. This phenomenon is well supported within agricultural settings (Reiss & Drinkwater, 2018) and has been termed the "monoculture effect" (Elton, 1958). However, the universality of this trend beyond the agricultural realm remains uncertain.
Elucidating the relationship between host genomic variation and wildlife disease remains a prominent goal of molecular and disease ecologies (Blanchong, Robinson, Samuel, & Foster, 2016;DeCandia, Dobson, & vonHoldt, 2018). This is particularly important for small, fragmented, or reintroduced populations, where genetic diversity loss may reduce evolutionary potential and threaten long-term viability (Frankham, 2005;.
Regarding disease, the inability to cope with novel or enduring parasites can lead to increased morbidity among individuals, and ultimately precipitate population declines or local extirpation. This phenomenon remains understudied in wild populations, with little consensus between disparate host-parasite systems.
A recent meta-analysis found strong support for the monoculture effect in wildlife by examining the effect of population-level heterozygosity on parasite success (Ekroth, Rafaluk-Mohr, & King, 2019). However, this study primarily focused on invertebrate hosts and included both laboratory-and field-based studies. Its focus on population-level heterozygosity further excluded consideration of individual-level effects. Although relatively few in number, within-population studies in wildlife have reported an inverse relationship between genetic diversity and disease using neutral microsatellites (Coltman, Pilkington, Smith, & Pemperton, 1999;Townsend et al., 2018), immunogenetic markers (Brambilla, Keller, Bassano, & Grossen, 2018), and genome-wide datasets (Banks et al., 2020). In addition, morbidity has been associated with specific loci in multiple host species (Batley et al., 2019;Donaldson et al., 2017;Elbers, Brown, & Taylor, 2018;Ellison et al., 2014;Margres et al., 2018).
Considered together, these studies highlight the importance of characterizing genetic variation within the context of wildlife disease, particularly for conservation-relevant species.
We contributed to these efforts by examining host genomic variation and infection severity in a wild population of canids: gray wolves (Canis lupus) inhabiting Yellowstone National Park (YNP).
YNP wolves have been closely monitored for disease since their initial reintroduction in 1995 and 1996 (Phillips & Smith, 1997). To minimize risk, founders were screened for good health, vaccinated against numerous canine diseases, and treated with a broad-spectrum acaricide and anthelmintic (Almberg, Cross, Dobson, Smith, & Hudson, 2012). As a result, founders and their offspring initially bore low disease loads. Within a few generations, however, their light burden gave way to high exposure of canine adenovirus type-1, canine parvovirus, canine herpesvirus, and the protozoan Neospora caninum (Almberg, Mech, Smith, Sheldon, & Crabtree, 2009). These diseases were considered enzootic in the park's canids, and none appeared to negatively impact individual fitness or population viability.
Others quickly develop severe symptoms that worsen until death from mange or its associated dehydration, emaciation, secondary bacterial infection, or increased vulnerability to other causes such as intraspecific killings (Almberg et al., 2012;DeCandia, Leverett, & vonHoldt, 2019). As the source of this variability remains F I G U R E 1 Annual wolf counts recorded in December 1995 through 2019 with years of CDV outbreaks, mange invasion, and maximum mange burden indicated (figure adapted from Almberg et al., 2012) land, alpine meadows, and mixed coniferous forests. We make reference to two regions of the park, the northern range and the interior, based on ecological and physiographical differences and variation in disease dynamics (see Almberg et al., 2009Almberg et al., , 2012 for details).
Importantly, the 1,000 km 2 area of the northern range within YNP is characterized by lower elevations (1,500-2,200 m), serves as prime wintering habitat for the park's ungulates, and supports a higher density of wolves than the interior. In contrast, the interior (7,991 km 2 ) is higher in elevation (>2,500 m), receives higher annual snowfall, and generally supports lower densities of wolves and ungulates.

| Sample collection and mange classification
We used archived tissue and blood samples collected by the National Pack-level variables included location in the park (northern range or interior), pack size, and breeding status (i.e., whether the pack contained a breeding pair that year).
Frequent observations of YNP wolves also resulted in the documentation of individual mange scores, which reflected the percentage of body area presenting symptoms, such as hair loss or lesions.
On a 3-point scale, a score of 0 indicated no evidence of mange, 1 indicated that ≤ 5% of the body was impacted by mange-related symptoms, 2 referred to 6%-50% of the body being symptomatic, and 3 referred to the most severe score where > 50% of the body was presenting symptoms (Almberg et al., 2012;Pence, Windberg, & Sprowls, 1983). Any field-based observation or annual capturing of animals by NPS resulted in a mange score assigned to the corresponding individual. The frequent monitoring by NPS officials, consistent method of mange score assignment, and repeated observation of the same wolves maximized confidence in disease phenotypes. In downstream statistical analyses, we used the highest mange score documented per wolf for genetic analyses and mange score at the time of observation for mixed-effects modeling. Severity classes were coded "mild" for highest score 1, "moderate" for highest score 2, and "severe" for highest score 3.
To estimate pack-level exposure, we used the dates of first and last mange observation for each pack. We then flanked these dates by one month to account for asymptomatic periods that can both precede and follow infection (Arlian, 1989;Samuel, 1981). This established the mange exposure window for each member in the pack.
We restricted our mange dataset to only include wolves that contained three or more observations and were putatively exposed to mange, even if the animal was assigned a mange score of 0 (following vonHoldt et al., 2020). We used a custom perl script (sbfI_flip_trim_150821.pl, see

| DNA extraction and restriction site-associated DNA (RAD) sequencing
Appendix S1) to align all forward and reverse reads with the restriction enzyme cut site into one file. We then used STACKS v1.42 (Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013) for the initial stages of data processing, in order to manually remove poor-quality samples from the dataset before paired-end mapping.
We used process_radtags to demultiplex and filter reads for > 2bp barcode mismatches or quality scores below 90% using a sliding window (15% of the read), and removed PCR duplicates using default parameters in clone_filter.
We completed paired-end alignments to the reference dog We implemented the populations module using all available samples and the flag-write_single_snp to retain only a single polymorphic site per read. When samples were replicated in the library preparation and sequencing process, we used PLINK (Purcell et al., 2007) to compare each of the replicates for proportion of missing loci and retained the sample with lower missingness. We excluded wolves with fewer than three observations and no history of mange exposure (based on pack membership and infection history), as it was impossible to assess mange severity class in individuals never challenged by mites. To complete the dataset with this final set of samples, we implemented the STACKS v2.2 populations module a second time with an additional filtering parameter (−r 0.9) to retain loci genotyped in more than 90% of wolves. We used VCFtools v0.1.12b (Danecek et al., 2011) to remove singletons, doubletons, and sites found on the X chromosome, due to difficulties posed by chromosomal sex determination and X-inactivation to mixed-sex study designs (Clayton, 2009). This produced our final dataset of high-confidence autosomal SNPs for downstream analysis. For population-level analyses, we genotyped these same SNPs in all wolves (regardless of mange exposure history) with available biomaterial.

| Genetic diversity statistics
We hypothesized that genetic diversity would inversely correlate with mange infection severity, as coded into classes mild, moderate, and severe. We used STACKS v2.2 to examine patterns of genetic di- For infection severity, we used analysis of variance (ANOVA), as this allowed for inclusion of all four mange severity groups in the same analysis.
We next used ADZE v1.0 to estimate rarefied metrics of allelic diversity (Szpiech, Jakobsson, & Rosenberg, 2008). As allelic richness (A R ), private allelic richness (PA R ), and shared PA R are heavily influenced by sample size, adoption of a rarefaction approach enables cross-group comparisons when sample sizes differ. Using this approach, A R, PA R , and shared PA R are estimated by averaging subsamples of each group at standardized sample sizes. We set the missing data tolerance to 25% and calculated mean A R , PA R , and shared PA R between infection severity groups.

| Mixed-effects modeling
We used mixed-effects modeling to contextualize genetic diversity within the broad range of factors that may influence infection severity in YNP wolves. Input data were derived from annual observa-  (Almberg et al., 2015;Brzeski et al., 2015) and observation year (Stahler et al., 2013). As numerous wolves changed pack membership across years, we fitted these variables as partially crossed random intercepts.
We used cumulative link mixed models (CLMM) implemented in the R package ordinal v2019.12-10 to test these hypotheses (Christensen, 2019 (Stahler et al., 2013). We compared sequential models using the likelihood-ratio test (lrt) for cumulative link models, and halted the reduction process when removal of the next fixed effect variable led to significantly worse model fitting. We reconsidered dropped terms by adding them to the reduced model one at a time and implementing lrts to assess significance. We performed a similar procedure for pairwise interaction terms between fixed effects contained in the reduced CLMM to see whether their inclusion significantly improved model fitting. We calculated Akaike information criterion adjusted for small sample (AICc)

| Identifying outlier loci
We implemented a univariate linear mixed model in GEMMA to identify outlier loci associated with infection severity (Zhou & Stephens, 2012. We included sex and coat color as covariates in the model to account for static life history variables, and used a pairwise relatedness matrix to account for familial structure within the dataset. As our dataset included wolves with unknown pedigree relationships, we calculated a centered pairwise relatedness matrix using the -gk 1 flag in GEMMA. We excluded natal pack as a covariate, as the pairwise relatedness matrix accounted for all possible relatives, rather than relying on inferred relations sharing a natal pack. We adjusted the lrt p-values obtained using a modified false discovery rate (FDR) procedure (Benjamini & Yekutieli, 2001), and GeneCards (www.genec ards.org) databases. Finally, we used G:GOST in G:PROFILER to conduct gene ontology analyses (Raudvere et al., 2019). We searched annotated genes for all available annotations (including molecular functions, cellular components, and biological processes) and assessed statistical significance using the Benjamini-Hochberg FDR of 0.05 (Benjamini & Hochberg, 1995;vonHoldt et al., 2020). Although this analysis may be underpowered due to sample size constraints, we adjusted significance thresholds to account for multiple testing and decrease the likelihood of false positives (following DeCandia, Brzeski, et al., 2019;vonHoldt et al., 2020).

| Population-level analyses
We next considered changes in genetic variation through time in all YNP wolves with available biomaterial, regardless of mange exposure history. Using static metadata, dynamic observations, and YNP annual reports, we determined which wolves were alive in each observation year between 1995 and 2019. We then used ADZE v1.0 to estimate annual mean allelic richness, while controlling for sample size differences between years. For these analyses, we used the missing data tolerance of 100% to ensure that the same loci were analyzed in each year's calculation. To account for the breeding structure of YNP wolves, we performed these analyses a second time using only known breeders. Here, we considered breeding status to be a static life history variable, in order to increase annual sample sizes. As such, each year's calculation included all living breeders regardless of their reproduction status in that particular year.

| Genetic diversity
Regarding susceptibility (i.e., binary mange presence), infected wolves (inf) exhibited higher levels of genetic diversity than wolves with no detected infection (uninf) across several diversity metrics ( We therefore used rarefaction to estimate mean allelic richness (A R ) and mean private allelic richness (PA R ) across standardized sample sizes in infected and uninfected wolves. We found that uninfected wolves exhibited higher levels of allelic variation across both diversity metrics (A R , inf = 1.9276 ± 0.0007, uninf = 1.9431 ± 0.0007; PA R , inf = 0.0498 ± 0.0006, uninf = 0.0653 ± 0.0007; Table S3).
We additionally examined private allelic richness shared between pairwise combinations of infection groups (Figure 3, Table   S4). In general, similar groups (where highest mange score was offset by one) shared more unique alleles than disparate groups (where highest mange score was offset by two or three), with the exception of the uninfected-moderate pair. As such, uninfected and mildly infected wolves shared the most alleles (0.0438 ± 0.004), followed by mildly and moderately infected wolves (0.0312 ± 0.003). In contrast, uninfected and severely infected wolves shared the fewest alleles (0.0155 ± 0.0002), with alleles shared by mildly and severely infected wolves similarly low (0.0192 ± 0.002). Within the pairs offset by one mange score, we observed an inverse relationship between infection severity and shared private allele richness. For example, the moderate-severe pair (0.0221 ± 0.0003) shared fewer alleles than both the uninfected-mild (0.0438 ± 0.0004) and mild-moderate (0.0312 ± 0.0003) pairs. This trend also occurred for pairs offset by two mange scores.

| Mixed-effects modeling
After constructing our null and global CLMMs, we calculated the variance inflation factor (VIF) for each fixed effect variable included in the saturated model. We observed low collinearity between predictor variables (VIF range 1.18-3.13; Table S5) and initiated our stepwise model reduction procedure. The most parsimonious model contained environmental, pack-level, and individual-level variables (Figure 4;

| Identifying outlier loci
We identified 410 autosomal sites significantly associated with mange severity after applying BY-modified FDR correction (p < .004). Frequency of the mange-associated allele was positively associated with mange severity at 224 sites and negatively associated with mange severity at 186 sites. Across all 410 sites, the mange-associated allele was typically found in the heterozygous state (Table S7). Site annotations included 12 exonic, 171 intronic, 17 near promoters, and 257 intergenic sites, with VEP annotations including 20 low, four moderate, and 847 modifier effects (N.B., many sites had multiple annotations). We identified 42 gene ontological categories that passed the FDR threshold set in G:PROFILER (Table S8) Genic sites queried in the Ensembl, OMIM, and GeneCards databases returned putative functions related to innate and adaptive immunity, autoimmunity and inflammation, cell barriers and adhesion, and skin development and disorders. For example, hematopoietic prostaglandin D synthase (HPGDS) has been implicated in the resolution of delayed-type hypersensitivity responses (Trivedi et al., 2006).
Similarly, protein tyrosine phosphatase, nonreceptor type 6 (PTPN6) has been linked to heightened inflammation characterized by edema, sustained inflammatory infiltrate, and the delayed wound healing (Lukens et al., 2013). Both loci exhibited decreasing minor allele frequency with increasing infection severity ( Figure S3b).
Additional genes were associated with chronic skin disorders, such as psoriasis and peeling skin disease (corneodesmosin, CDSN; F I G U R E 3 Rarefied private allelic richness shared between mange severity classes Matsumoto et al., 2008;Oji et al., 2010) (Figures S3 and S4; Table S9).

| Population-level analyses
We

Mean Allelic Richness
Year allele emerging for the remaining 11 loci between 1997 and 2003 (Table S10). All mange-associated alleles were therefore pres-  Figure 6b). These results were consistent across four independent subsamples of nonassociated alleles ( Figure S5).

| D ISCUSS I ON
In the present study, we characterized the relationship between host genomic variation and disease severity in a wild population of reintroduced canids. Through use of biobanked samples and F I G U R E 6 Posterior predictions of the average changes in frequency through time for alleles not associated, positively associated, and negatively associated with mange severity, with 95% credible intervals surrounding the mean. Nonassociated alleles comprise a randomly selected subset of 500 loci. The same analysis was repeated to assess changes in allele frequency (a) after mange invasion of YNP and (b) before mange invasion of YNP Inter-individual differences subsequently emerge due to the immune response mounted by the host (Nimmervoll et al., 2013;Oleaga et al., 2012), which is likely to be under genetic control (Steinke, Borish, & Rosenwasser, 2003). In the present study, we observed an inverse relationship between host genomic variation and mange infection severity in YNP wolves. This supports the paradigm that genetic variation plays an important role in wildlife disease (King & Lively, 2012;Lively, 2010;Luong, Heath, & Polak, 2007;Spielman, Brook, Briscoe, et al., 2004). Additional evidence includes heterozygous house finches (Carpodacus mexicanus) that exhibited reduced disease severity and mounted stronger immune responses than homozygous finches after experimental inoculation with Mycoplasma gallisepticum (Hawley, Sydenstricker, Kollias, & Dhondt, 2005).
Patterns of genome-wide variation suggested that host diversity was an important predictor of mange severity in YNP wolves.
However, wildlife disease dynamics are known to be impacted by host environment, demography, and behavior, as well (Ezenwa et al., 2016;Parratt, Numminen, & Laine, 2016;Silk et al., 2019). We therefore used mixed-effects modeling to quantitatively assess the role of genetic diversity in shaping the landscape of mange infection  (Almberg et al., 2012), mortality risk (Almberg et al., 2015), and energetics (Cross et al., 2016) in YNP wolves and other species impacted by mange (Martin et al., 2018).
Season may also contribute to our finding that nonbreeding packs were more likely to present severe mange. Breeding season for YNP wolves (mid-February mating; mid-April birth) directly follows the mean severity window for infected packs (September 2-February 2; Almberg et al., 2015). Mangy individuals may exhibit poor body condition during breeding season, reducing breeding likelihood and efficacy (Stahler et al., 2013), as seen in other host-parasite systems (Holand et al., 2015;Marzal, De Lope, Navarro, & Møller, 2005;Møller, 2002;Sarasa et al., 2011). Poor body condition may also be influenced by age, as adult wolves exhibited worse mange than yearlings. This finding is consistent with reports for age/mange relationships in Iberian wolves and coyotes (Oleaga et al., 2011;Pence et al., 1983), and divergent from reports in red foxes and dogs (Fazal et al., 2014;Feather et al., 2010;Newman, Baker, & Harris, 2002).
These differences in the literature, and our study in particular, may result from the demographics of the dataset. For example, we excluded any wolf that had fewer than three observations, which may have systematically excluded fatal mange infections in pups, as seen in the Silver pack in 2010 . We therefore recommend further study of age-specific outcomes with mange infections in both wolves and canids, more broadly.
Following our findings that genome-wide variation significantly predicts mange severity, we discovered specific loci associated with the highest mange score recorded per wolf. These loci were found in genes related to innate and adaptive immunity, autoimmunity and inflammation, cell barriers and adhesion, and skin development and disorders. For example, reduced minor allele frequency was associated with severe mange in genes HPGDS and PTPN6, which have been previously linked to immunopathology, inflammation, and delayed wound healing in mice (Lukens et al., 2013;Trivedi et al., 2006).
These results support the paradigm that host genomic variation can buffer against disease risk, as seen in agriculture's monoculture effect. They further emphasize the importance of considering genome-wide variation and disease-relevant loci when studying host-parasite dynamics, particularly in longer-evolved systems.
Using YNP wolves as an example, declines in genome-wide variation through time may increase the likelihood of severe mange infections.
However, removal of harmful mange-associated alleles may counteract that risk. Monitoring summary metrics of diversity alongside disease-associated loci will enable more accurate risk assessment and outbreak predictions, although YNP wolves are not currently treated or vaccinated against disease. In more heavily managed wildlife systems, similar analyses can directly inform conservation action. While further studies are needed to assess the universality of these trends, we posit that the maintenance of genetic variation should remain a priority during founder selection, reintroduction, and subsequent population management of at-risk populations. For species harboring inbred genomes, we further recommend exploration of additional molecular mechanisms that may influence disease risk in the absence of genomic variation (such as gene regulation or the host-associated microbiome). The integration of molecular and disease ecologies presents a powerful opportunity to elucidate the factors underlying disease risk, as well as the evolutionary effects of disease on wildlife. These insights can then inform best practices for disease management and wildlife conservation.

ACK N OWLED G EM ENTS
We would like to thank Emily Almberg, Andrew Dobson, Kristin

CO N FLI C T O F I NTE R E S T
None declared.