Characterizing the demographic history and prion protein variation to infer susceptibility to chronic wasting disease in a naïve population of white‐tailed deer (Odocoileus virginianus)

Abstract Assessments of the adaptive potential in natural populations are essential for understanding and predicting responses to environmental stressors like climate change and infectious disease. Species face a range of stressors in human‐dominated landscapes, often with contrasting effects. White‐tailed deer (Odocoileus virginianus; deer) are expanding in the northern part of their range following decreasing winter severity and increasing forage availability. Chronic wasting disease (CWD), a prion disease affecting deer, is likewise expanding and represents a major threat to deer and other cervids. We obtained tissue samples from free‐ranging deer across their native range in Ontario, Canada, which has yet to detect CWD in wild populations. We used high‐throughput sequencing to assess neutral genomic variation and variation in the prion protein gene (PRNP) that is partly responsible for the protein misfolding when deer contract CWD. Neutral variation revealed a high number of rare alleles and no population structure, and demographic models suggested a rapid historical population expansion. Allele frequencies of PRNP variants associated with CWD susceptibility and disease progression were evenly distributed across the landscape and consistent with deer populations not infected with CWD. We estimated the selection coefficient of CWD, with simulations showing an observable and rapid shift in PRNP allele frequencies that coincides with the start of a novel CWD outbreak. Sustained surveillance of genomic and PRNP variation can be a useful tool for guiding management practices, which is especially important for CWD‐free regions where deer are managed for ecological and economic benefits.


| INTRODUC TI ON
Human-induced environmental change has caused widespread alterations to ecological and evolutionary processes (Harmon et al., 2009;Pecl et al., 2017). Climate change is expected to be the dominant driver of wildlife population declines and has been linked to broad-scale biodiversity losses, but regional responses are often nuanced and context dependent (e.g., Hashida et al., 2020;Taylor et al., 2017;White et al., 2017). For example, climate change in the Midwestern United States differentially favors the survival of two sympatric populations of ungulates with similar selection pressures and life-history traits, but differing densities (Escobar et al., 2019;Weiskopf et al., 2019). In such instances, intraspecific genetic diversity and adaptive potential are crucial for long-term population viability (Kardos & Shafer, 2018).
The emergence, spread, and persistence of infectious diseases in previously allopatric populations are facilitated by climate change and other anthropogenic activity (Aguirre, 2017;Morand & Walther, 2020;Price et al., 2016). The impacts of infectious disease on wildlife populations are of interest to managers, especially if the affected species holds economic or cultural value (Lambert et al., 2018;Weiskopf et al., 2019). Preventing and controlling diseases in free-ranging populations can, however, be complex and costly when they are both naïve to the infectious disease and faced with climate change and other anthropogenic activities (Herrera & Nunn, 2019;Miguel et al., 2020;Samuel et al., 2020). Selective pressures are increased under these circumstances and populations are forced to respond to multiple stressors simultaneously, or potentially face local extirpation (Fischer et al., 2020).
In wild populations, the functional adaptive response to ancient (Fedorov et al., 2020) and contemporary selection pressures (Mimura et al., 2016) causes changes to effective population size (N e ) and population demographics (McKnight et al., 2017). For example, the selective pressure exerted by novel infectious disease in naïve hosts can lead to sudden loss of genetic diversity (i.e., N e ) and sustained genetic bottlenecks (Phillips et al., 2020). When the adaptive potential of a population is diminished by low N e , which is increasingly a problem in large mammals (e.g., Peart et al., 2020), genetic variation can be maintained in the short-term through successful dispersal to more suitable habitat or persistence within refugia (Chiocchio et al., 2019;Des Roches et al., 2019). In the worst case, low genetic variation can be further reduced by genetic drift and inbreeding and lead to local extirpation as species' no longer have an effective mechanism to respond to the changing environment (McKnight et al., 2017;Palkovacs et al., 2004). Characterizing neutral and functional genetic variation allows for both understanding and predicting adaptive potential, more specifically how wild populations will respond to the selective pressures of climate change, infectious disease, and anthropogenic activity (Funk et al., 2019). White-tailed deer (Odocoileus virginianus; deer) are the most widely distributed and abundant ungulates in North America and hold significant economic and cultural value (Hewitt, 2011). The northern range of deer is primarily limited by snow but decreasing winter severity has allowed deer to expand northward beyond their historical range limits (Dawe & Boutin, 2016;Kennedy-Slaney et al., 2018). This expansion has implications for ecosystems as, for example, deer herbivory alters long-term regional habitat characteristics and plant communities (Frerker et al., 2017;Kroeger et al., 2020;Otsu et al., 2019). Further, deer are an important prey species and predator populations increasing in response to deer expansion has led to greater predation and apparent interspecific competition with other ungulates (Barber-Meyer & Mech, 2016;Latham et al., 2013).
Consequently, northward expansions of deer are having profound impacts to ecosystems including facilitating infectious pathogen and disease spread (Averill et al., 2018;Ferretti & Mori, 2020).
A widespread threat to deer in North America is the highly infectious and fatal neurodegenerative prion disease called chronic wasting disease (CWD). CWD is the only prion disease known to infect captive and free-ranging species of cervids (family Cervidae) and has been reported in North and South America, Europe, and South Korea (Haley et al., 2019). With virtually no barriers to transmission and a lengthy infectious preclinical period, the local prevalence of CWD in North America has been measured to be as high as 50% and 82% in wild and captive populations, respectively (Miller et al., 2004;O'Rourke et al., 2004). Due to constraints on CWD surveillance, it is likely that the distribution and prevalence of CWD in wild populations are underestimated (Escobar et al., 2020). It is clear the frequency and occurrence of CWD have increased over time, in part driven by anthropogenic activities related to hunting and wildlife farming (Osterholm et al., 2019).
Despite CWD being fatal, there is inter-individual variation in susceptibility and clinical progression. Susceptibility and clinical progression are associated with non-synonymous and synonymous genetic variation in the functional prion protein gene (PRNP; Chafin et al., 2020;Güere et al., 2020). Single-nucleotide polymorphisms (SNPs) at nucleotide (nt) 60, nt153, nt285, nt286, nt555, and nt676 in deer PRNP have been associated with altered CWD susceptibility or pathogenic processes (Brandt et al., 2015(Brandt et al., , 2018Johnson, Johnson et al., 2006;Wilson et al., 2009). The presence of CWD appears to affect population PRNP allele frequencies over space and time due to selection (Robinson et al., 2012); however, altered CWD susceptibility and pathogenic processes are clearly polygenic traits (Seabury et al., 2020) and disease spread is different in structured populations, which might require different wildlife management practices (Chafin et al., 2020). The efficacy of selection in the face of a novel pressure like CWD is dependent on N e and common metrics to infer selection such as Tajima's D measure shifts in allele frequencies across the site frequency spectrum. However, the frequency and proportion of rare alleles are also sensitive to demographic processes (Messer et al., 2016;Platt et al., 2019), and population genetics theory predicts an excess of rare alleles in expanding populations (Gillespie, 2004). Accordingly, at the genome level, we predicted an excess frequency of rare neutral and PRNP variants across our study region given Ontario's deer population is expanding (Kennedy-Slaney et al., 2018).
Based on large-scale distribution changes and recent population trends in deer (Baldwin et al., 2000;Latch et al., 2009), we predicted we would observe high neutral genomic diversity (N e ) and demographic population expansion, indicating increased gene flow and decreased population structure despite a heterogenous landscape. Since PRNP is not under selection by CWD given the region is disease-free, we predicted functional variation to resemble regions most recently exposed to CWD (or still disease-free). This is the first study to characterize PRNP genetic variation and population genomic structure of wild deer in Ontario, while also quantifying the ancient and contemporary demographic and selection processes shaping patterns of diversity.

| Study area and sample collection
We sampled white-tailed deer across Ontario, Canada ( Figure 1 10 × 10 km), and wildlife management unit (WMU), we selected a subset of samples based on equal sampling of sex and geography.
Northwestern and southern Ontario regions are geographically discontinuous for deer (Figure 1), samples were therefore assigned to northern Ontario or southern Ontario ( Figure S1) for the purpose of analysis where sampling regions are compared. Preliminary analysis of data did not warrant separating southeastern and southwestern Ontario as per provincial management zones. Genomic DNA was extracted from deer samples using a silica-based DNA extraction kit for tissue following manufacturer protocol (Qiagen,Cat. No. 69506) and stored at −20°C.
DNA quality was assessed by a spectrometer (NanoDrop 2000, Thermo Scientific) and by 2% agarose GelRed gel electrophoresis.

| Library preparation for PRNP genetic analysis
A 771 base pair (bp) region of the deer prion protein precursor (PRNP) gene was targeted and amplified using four degenerate primers (Table S1). Four replicate PCRs generated a 460 base pair F I G U R E 1 Distribution of the subsample of free-ranging white-tailed deer samples obtained between 2003 and 2018 by the Ontario Ministry of Natural Resources and Forestry (OMNRF) that were used for the reduced representation genome analysis (n = 190; cream) and the prion protein genetic analysis (n = 631; orange and cream). The natural distribution of free-ranging white-tailed deer is shown for Ontario with a darker shade of gray Fragment 1 and a 580 base pair Fragment 2 (Table S2). In a 2:1 ratio of Fragment 1 to Fragment 2, respectively, amplified DNA was added for each individual and then indexed with standard Illumina multiplexing indices. Negative controls of UltraPure distilled water (Invitrogen, 1897011) were used for each 96-well plate. The library was purified of artifacts following manufacturer protocol for AMPure XP beads (Beckman Coulter, A63880) and validated with a TapeStation D1000 kit (Agilent, 5067-5582). The library was sequenced on an Illumina MiSeq platform at the University of Guelph Advanced Analysis Centre to generate 300 base pair (bp) pair-end reads for each sample.

| Library preparation for RADseq genomic analysis
Restriction site-associated DNA sequencing (RADseq) libraries were generated using an adapted protocol from Parchman et al.

| Bioinformatic pipeline and data analysis I -PRNP gene
The quality of reads was assessed using FastQC (Babraham Institute; v0.11.8). Samples were excluded if at least one file in the pair-end files for a sample was less than 1 kB in size or failed to pass quality standards. A novel command line-based pipeline was developed to assemble and genotype PRNP (accessible at www. gitlab.com/WiDGeT_Trent U/gradu ate_theses). Our workflow integrated Pullseq v1.0.2, BWA v0.7.17, SAMtools; v1.9, BCFtools v1.9, and VCFtools v0.1.16-15. Briefly, for each sample, the pipeline extracted relevant reads based on the presence of primer sequence, mapped the extracted reads to a 771 bp PRNP gene reference sequence. We generated a consensus sequence and called single-nucleotide polymorphisms (SNPs). SNP calls were limited to positions where there was a minimum read depth of 30 and mapping quality score of at least 30. Sanger sequencing of a subset of samples and their a priori called variants were used to validate the bioinformatic pipeline.
The presence of asparagine (N) at aa138 (nt413A) indicates amplification of the pseudogene (Brandt et al., 2015); we therefore filtered out all sequences with this site. SNPs with a total frequency of occurrence of 1% or less were excluded from the analysis. A twosided Fisher's Exact Test was conducted on minor allele counts from either northern or southern sampling regions at four well-studied positions in the deer PRNP gene associated with CWD: nucleotide (nt) 60, nt285, nt286, and nt676. Synonymous and non-synonymous sites were identified using MEGA X v10.0.5. Haplotypes were estimated from unphased sequences with PHASE v2.1.1 using a Markov chain Monte Carlo (MCMC) sampling approach with a minimum of 100,000 steps, with a discarded burn-in of 10,000, and samples were drawn every 100 MCMC steps. Five repetitions were performed to verify consistent frequencies of haplotype assignment (Brandt et al., 2018). Haplotypes with a frequency of less than 1% were removed. The genotype, frequencies, and estimated standard deviations of the remaining haplotypes were analyzed as a 2 × 2 contingency table by sampling region.

| Bioinformatic pipeline and data analysis II -RADseq
Fastq files were demultiplexed using process_radtags within the

| Demographic analysis and estimate of effective population size
The final VCF was converted into 1D site frequency spectrum (SFS) for all of Ontario and northern vs and southern designations, respectively, using vcf2dadi.py with projections for the SFS estimated in easySFS. We applied a diffusion-based approach to demographic inference through the Diffusion Approximation for Demographic Inference (δaδi) tool by Gutenkunst et al. (2009). Nine 1D models were assessed for Ontario as a single population. The optimum model was selected as the lowest optimized log-likelihood of all successfully run models. δaδi was also used to estimate the following summary statistics for the province: Watterson Theta (θ), Tajima's D, and the number of segregating sites.
Using the mutation rate (μ) per site per generation of a closely related species (Rangifer tarandus from Chen et al., 2019), total number of sites (L), and the parameters estimated in the optimum model selected from δaδi, we estimated the ancestral effective population size as N a = θ/4 μl.

| Estimation of selection on PRNP and allele frequency projections for a naïve population
We estimated the selection coefficient (s) at two polymorphic sites (nt285A to C and nt286A to G) using the approach of Thompson et al.

| PRNP genetic analysis
A total of 631 Ontario deer samples were included in the PRNP genetic analysis (Figure 2). Nineteen SNPs were detected after filtering (Table 1), with eight being non-synonymous substitutions. Six of the detected variants in the PRNP gene have been linked to CWD susceptibility or clinical progression, these include nt60, nt153, nt285, nt286, nt555, and nt676. A two-sided Fisher's Exact test on the major and minor allele counts at the four important, arguably the most studied, CWD-linked loci (nt60, nt285, nt286, and nt676) conducted between Northern and Southern Ontario indicated that there was only a small difference in frequency (p < 0.05) at nt676 (Table S6), collectively showing that PRNP variants are more-or-less distributed similarly across the province.
There were 102 unique haplotypes with a count of at least one, with 12 haplotypes having a frequency greater than 1% (Table S7).
The same Haplotype A (f = 0.15) and Haplotype B (f = 0.23) were reported by Chafin et al. (2020) in deer from Arkansas, USA. The optimum 1D demographic model for Ontario was the BOTTLEGROWTH model, which models an instantaneous size change followed by exponential change (Table S8); here, there is large historical growth in population, followed by a more recent and moderate decline ( Figure 4). From the 1D site frequency spectrum, the mean Tajima's D was estimated to be −2.126 consistent with a population expansion after a bottleneck. Using the calculated θ, we estimated N a to be ~20,000: this would place the timing of the population change (Tc) measured in 2N a generations around the onset of the last glacial maximum (Figure 4).

| PRNP selection and projection
We estimated the selection coefficients to be 0.08 (±0.06) and 0.11 (±0.07) for nt285 and nt286 under a dominance model. All simulated projections with our s values showed a rapid shift in allele frequencies ( Figure 5); the majority of simulated trajectories did not overlap with the lower s coefficient (0.01) previously reported by Robinson et al. (2012), but were consistent with their higher coefficient of 0.074.

| D ISCUSS I ON
White-tailed deer in North America are intensively managed for hunter harvest and are expanding their range northward due to climate change. Our genome assessment showed high levels of variation and homogeneity in Ontario, with demographic models showing evidence of a massive species-wide population expansion.
The frequency and occurrence of CWD infection in captive and freeranging deer are likewise increasing (Osterholm et al., 2019;Rivera et al., 2019), but transmission and spread in free-ranging populations are still poorly understood (Potapov et al., 2016). In Ontario, we observed multiple novel PRNP alleles, with the frequencies of known variants differing from areas currently infected with CWD (or where CWD is endemic), including western Canada (Table S9; Brandt et al., 2015Brandt et al., , 2018Chafin et al., 2020;Kelly et al., 2008;Wilson et al., 2009). Collectively, our functional and neutral genomic data are consistent with CWD not being present in Ontario.
The emergence, transmission, and persistence of highly infectious diseases in healthy populations are often facilitated by climate change and exacerbated in areas with intense anthropogenic activity (McKnight et al., 2017;Rizzoli et al., 2019). The introduction and re-introduction of infectious diseases often result in rapid local population declines, reducing species' adaptive potential and generating substantial economic losses (Belant & Deese, 2010;Escobar et al., 2019). In Ontario deer, however, we observed an excess of rare alleles with no evidence of strong population F I G U R E 2 A 771 bp region of the white-tailed deer prion protein gene was analyzed from free-ranging white-tailed deer in Ontario, Canada (n = 631). The stacked polymorphisms across 19 variable loci were organized by broad management in Ontario and are shown. Circles indicate loci previously described as variable in the white-tailed deer prion protein gene. Triangles indicate novel variable loci in the white-tailed deer prion protein gene structure, including in southern Ontario where an environmental cline exists around lake systems. An excess of rare alleles and high N e and absent population structure might provide the means for effective adaptation to selective pressures, including climate change, infectious disease, and human activity.

| Linking population demographics to functional genes
We assessed contemporary and historical gene flow by examining neutral genomic variation across tens of thousands of loci from deer across the province on Ontario. The data were consistent with a large expanding population with a high level of genomic diversity and an excess of rare alleles. These genomic patterns are also consistent with population simulations of responses to climate change (Kennedy-Slaney et al., 2018) and offer some clues as to the potential adaptive response were CWD to arrive. Specifically, two important features of the genomic data are noteworthy: the high number of rare alleles and the limited population structure across Ontario.
Demographic processes and life-history strategies influence the proportion of rare alleles, which are important to the adaptive process but are sensitive to recent evolutionary processes (Peart et al., 2020). Evidence suggests the accumulation of rare alleles is independent of taxa, but adaptation appears limited in low-diversity taxa (e.g., primates; Rousselle et al., 2020). The deer genomic data evaluated here are consistent with high adaptive potential; specifically, we calculated an N a to be ~20,000, with the expansion estimated to be 100X suggestive of a large species-wide N e (derived either from π in Table 2 or the demographic model in Figure 4).
Studies of populations infected with CWD often demonstrate a lack of allelic diversity at the PRNP gene, which is thought to be due to CWD-driven selection positively selecting for functionally Observed heterozygosity 4.7 × 10 −4 6.1 × 10 −4 5.5 × 10 −4 F IS 5.7 × 10 −4 3.9 × 10 −4 4.2 × 10 −4 TA B L E 2 Genome-wide population summary statistics including breakdown of sites, number of individuals, nucleotide diversity estimate (π), individual genetic variance (I) relative to the subpopulation genetic variance (F IS ), and the observed heterozygosity of white-tailed deer in Ontario relevant alleles (Haley et al., 2019). We also found a high frequency of rare PRNP alleles in our naïve population which differs from areas currently infected or where CWD is endemic, including western Canada (Brandt et al., 2015(Brandt et al., , 2018Johnson, Johnson et al., 2006;Kelly et al., 2008;Wilson et al., 2009). Furthermore, infectious diseases can cause population fragmentation (i.e., structure), demographic changes, and genetic isolation (McKnight et al., 2017).
A pre-infection presence of a large proportion of rare alleles in both functional and neutral genes, as observed in Ontario, supports surveillance data showing no CWD, which might bode well for a long-term adaptive response to CWD infection and other stressors.
Importantly, even low selection coefficients should alter allele frequencies in a detectable manner ( Figure 5). However, it is not clear how this standing genetic variation compares to infected populations prior to the arrival of CWD, creating some uncertainty in the potential adaptive response, recognizing that loci outside PRNP are also involved (Seabury et al., 2020).
We observed no evidence of strong population structure despite the heterogeneous landscape observed in Ontario, but there is a clear latitudinal cline in allele frequencies. This suggests random mating is largely occurring at a regional level in spite of substantial environmental changes including intense agricultural practices, substantial urbanization of the landscape, and climate change (Patton et al., 2018;Schulte et al., 2007;Walter et al., 2009

| CON CLUS ION
We gauged the adaptive potential of CWD naïve deer in Ontario, Canada by assessing functional and neutral genetic diversity.
The genome-wide data were consistent with a large expanding population with a high neutral genomic diversity, no population structure, and an excess of rare alleles, which is also consistent with population and climate models (Kennedy-Slaney et al., 2018).
Voluntary hunter harvest-based surveillance for CWD will likely be able to detect the introduction of CWD in a naïve population; however, we expect that there will be a lag in detection when prevalence is low since low densities likely limit transmission (Gagnier et al., 2020). A lag in detection will likely permit the establishment of CWD and, over time, eradication becomes nearly impossible and costs are high (Mysterud & Rolandsen, 2018).
Targeted temporal monitoring of variation in PRNP across CWDfree regions could be a detection tool as our simulations suggest detectable shifts should occur with the arrival of the disease.
Monitoring frequency shifts at other geographic locations (and other genes) could be easily integrated into our simulations and might identify additional variants of disease relevance. Overall, by combining demographic patterns and genotypes with current risk models, managers could improve risk-based detection efforts and facilitate a more effective resource deployment plan as the disease alters the population.

CO N FLI C T O F I NTE R E S T
The authors declare no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The raw data are accessible on the Sequence Read Archive