Multi‐population puma connectivity could restore genomic diversity to at‐risk coastal populations in California

Abstract Urbanization is decreasing wildlife habitat and connectivity worldwide, including for apex predators, such as the puma (Puma concolor). Puma populations along California's central and southern coastal habitats have experienced rapid fragmentation from development, leading to calls for demographic and genetic management. To address urgent conservation genomic concerns, we used double‐digest restriction‐site associated DNA (ddRAD) sequencing to analyze 16,285 genome‐wide single‐nucleotide polymorphisms (SNPs) from 401 pumas sampled broadly across the state. Our analyses indicated support for 4–10 geographically nested, broad‐ to fine‐scale genetic clusters. At the broadest scale, the four genetic clusters had high genetic diversity and exhibited low linkage disequilibrium, indicating that pumas have retained genomic diversity statewide. However, multiple lines of evidence indicated substructure, including 10 finer‐scale genetic clusters, some of which exhibited fixed alleles and linkage disequilibrium. Fragmented populations along the Southern Coast and Central Coast had particularly low genetic diversity and strong linkage disequilibrium, indicating genetic drift and close inbreeding. Our results demonstrate that genetically at risk populations are typically nested within a broader‐scale group of interconnected populations that collectively retain high genetic diversity and heterogenous fixations. Thus, extant variation at the broader scale has potential to restore diversity to local populations if management actions can enhance vital gene flow and recombine locally sequestered genetic diversity. These state‐ and genome‐wide results are critically important for science‐based conservation and management practices. Our nested population genomic analysis highlights the information that can be gained from population genomic studies aiming to provide guidance for the conservation of fragmented populations.


| INTRODUC TI ON
Human development is reducing habitats on a global scale, undermining efforts to conserve ecosystem structure and function (Newbold et al., 2016). Reports of fragmented wildlife populations and the increasing need for human housing and associated agriculture and energy have emphasized the necessity for development to avoid impacting the long-term sustainability of wildlife populations (Jordan et al., 2007;Kiesecker et al., 2011;Saha & Paterson, 2008).
One of the most developed states in the United States is California, which contains the largest census size with over 39 million people (U.S. Census, 2019). Although the development of California has led to historical extirpations of other apex predators, such as the grizzly bear (Ursus arctos; Herrero, 1970) and gray wolf (Canis lupus; Schmidt, 1991), the puma (Puma concolor; also known as mountain lion and cougar) has maintained a widespread distribution throughout the state (Dellinger, Cristescu, et al., 2020).
The puma is a large-bodied felid that originated in South America, migrated and expanded throughout North America, and experienced a human-induced range restriction to the western United States, with an extant remnant population in Florida (Culver et al., 2000). Currently, approximately half of all apparent puma habitats in California is conserved, and the remainder could be subject to further development . Much of the inland areas of California have continous stretches of protected habitat , supporting puma populations with high genetic diversity and large effective population sizes (Gustafson et al., 2019).
However, movement corridors among coastal mountain ranges are increasingly being degraded by human development (Burdett et al., 2010;Suraci et al., 2020;Zeller et al., 2017). Despite the natural longrange dispersal abilities of pumas (Gonzalez-Borrajo et al., 2017), interstate highways limit dispersal via avoidance and direct mortality in some urban areas (Riley et al., 2014;Vickers et al., 2015). Although human-caused mortality from vehicle collisions and lethal removal after wildlife-livestock conflicts are concerns (Guerisoli et al., 2021;Torres et al., 1996), a larger concern for long-term population viability is the genetic isolation of pumas within small or shrinking patches of habitat, which has led to high levels of intraspecific competition and mortality (Benson et al., 2020) and low genetic diversity in some areas Gustafson et al., 2019;Riley et al., 2014).
Previous studies have reported that two isolated puma populations in southern California, including the Santa Ana Mountains and the Santa Monica Mountains (Figure 1), had the lowest genetic diversity estimates measured throughout the range of P. concolor Riley et al., 2014), apart from the endangered Florida panther (P. c. coryi). In both the Santa Ana and Santa Monica Mountains, phenotypic evidence of inbreeding depression has been observed, similar to that of Florida panthers Huffmeyer et al., 2021;Roelke et al., 1993). For both populations, freeway traffic isolates pumas Riley et al., 2014;Vickers et al., 2015), and contemporary gene flow has been severely limited. Detailed pedigree analyses following the immigration of one male into each region showed evidence of natural genetic rescue Gustafson et al., 2017;Riley et al., 2014).
Although migrant effects were positive, projection models predict the extirpation of these populations in 50 years without enhanced demographic dispersal and gene flow (Benson et al., 2016(Benson et al., , 2019. Recently published genome-resequencing data that included four pumas from California, two from Santa Monica Mountains, and two from the Central Coast North region in the Santa Cruz Mountains indicated that these individuals had ~20%-40% of their genomes represented as long runs of homozygosity, resulting from recent inbreeding (Saremi et al., 2019). However, these runs of homozygosity were not shared among individuals, and different populations exhibited different homozygous haplotypes, suggesting that genetic restoration (Hedrick, 2005;Tallmon et al., 2004) is possible because genetic variation still exists.
The complex distribution of pumas throughout California along a continuum of high genetic diversity populations occupying abundant habitat to strongly isolated populations displaying evidence of inbreeding depression requires a thorough characterization of statewide genomic diversity to achieve proper conservation. In this study, our objective was to characterize patterns of genomic diversity at varying geographic scales. Such an approach has the potential to aid conservation strategies because it can identify at-risk, low-diversity local populations that would benefit from the restored gene flow within a broader geographic region. We identified 16,285 singlenucleotide polymorphisms (SNPs) from 401 individuals using a double-digest, restriction-site associated DNA sequencing method (ddRAD; Peterson et al., 2012). Specifically, our aims were to determine population genomic structure, genetic diversity, evidence for selection, and linkage disequilibrium.

| Sample collection and DNA extraction
We obtained 354 tissue samples collected by the California Department of Fish and Wildlife between 2011-2017 from pumas either hit by car (~6%), found dead (~2%), poached (<1%), or through depredation permits (>90%), which had never been used in any previous genetic survey. Samples were well-distributed throughout the state, except for smaller populations in smaller mountain ranges. To bolster our sample size in the Los Angeles region of southern California, we added the only remaining DNA extracts (N = 144) from pumas collected between 2002-2015 (Riley et al., 2014;Vickers et al., 2015). After genomic and bioinformatic filtering (described below), we retained 401 out of 498 samples in the final dataset, which spanned the majority of puma habitat in California, excluding desert regions (Figure 1). For samples that lacked a precise GPS location, we used the nearest address or town where they were collected as their GPS point. Samples were stored at −80°C until DNA was extracted using Omega Bio-tek Mag-Bind Blood & Tissue DNA HDQ Kits (Omega Bio-tek, #M6399-01), with a manufacturerdesigned protocol for the Kingfisher Duo Prime (ThermoFisher Scientific, #5400110) automated DNA purification system. We measured the concentration of DNA from each sample using a Qubit 3.0 fluorometer (Invitrogen, #Q33216) with Qubit dsDNA highsensitivity kits (Invitrogen, #Q32854).

| Double-digest restriction-site associated DNA library preparation and sequencing
We reduced the genome size of our samples and identified singlenucleotide polymorphisms (SNPs) using modifications to the double-digest restriction-site associated DNA sequencing (ddRAD) protocols developed by Peterson et al. (2012). We used a library construction scheme which pooled 48 samples per library based on barcode availability, cost effective multiplexing, and sufficient coverage per individual. For each pooled library, we first normalized DNA concentrations to the sample with the lowest concentration within a library, with the goal to be more than 200 ng DNA starting material in 25 µL elution buffer (>8 ng/µl). The library with the lowest normalized starting concentration for each sample had 17.8 ng/µl DNA, whereas the library with the highest starting material had 51.6 ng/µl DNA. We used digestion enzymes and protocols established with previous puma studies (Trumbo et al., 2019). After DNA was normalized, we double-digested the DNA from each individual using NlaIII (New England BioLabs, #R0125S) and EcoRI (New England BioLabs, #R3101S) restriction enzymes (37°C for 3 h, then held at 4°C) at the manufacturer-recommended enzyme concentrations and used AMPure XP beads (Beckman Coulter, #A63881) at a 1.5X ratio to retain only DNA from the digestion. We omitted the DynaBeads cleanup step used by Peterson et al. (2012) and again used the Qubit to measure DNA concentrations and to guide another round of normalization. After normalization, the library with the lowest per-sample concentration had 2.1 ng/µl (in 29 µl), and the library with the highest per-sample concentration had 8.1 ng/µl. We then ligated 48 uniquely barcoded P1 adaptors (e.g., P1.1 through P1.48) and two common P2 adapter pairs (i.e., P2.1 and P2.2) to each sample's double-digested fragments using the protocols of Peterson et al. (2012) to identify individual puma samples. Following ligation with individual barcodes, we pooled all 48 samples into a single tube and used AMPure XP beads to clean the library. We used TE buffer (rather than molecular-grade water) as the final step in this cleanup, which is recommended by the manufacturer for running size selection in the Pippin Prep (Sage Science, Beverly, Massachusetts).
We selected fragments ranging from 375-475 bp (including 75 bp F I G U R E 1 Location of 401 sampled pumas used in analyses, including (a) sample distribution across California, (b) geography of the mountain ranges surrounding the Los Angeles and San Diego regions, and (c) inset map showing the location of California in the United States of America of adapters) using 2% dye-free gels run on a Pippin Prep. To minimize random polymerase chain reaction (PCR) duplicate errors, we split the library and ran five high-fidelity Phusion (New England BioLabs, #M0530) PCRs for 12 cycles on a SimpliAmp thermal cycler (ThermoFisher Scientific, #A24811). We then recombined the five PCR products and used an AMPure XP bead cleanup on the amplified library. Sample concentrations after size selection averaged 2.0 ng/µl DNA (range 0.82-3.7) and, after the PCR, averaged 8.2 ng/ µl (range 3.6-15.0). We shipped the unfrozen DNA with cold packs to the University of Oregon's Genomics and Cell Characterization Core Facility (https://gc3f.uoreg on.edu/) for 150 bp single-end sequencing on an Illumina HiSeq 4000 (Illumina).

| Bioinformatic SNP filtering
We ran standard quality control analyses using program FastQC v0.11.5 (Andrews, 2010). We used the process_radtags program in the Stacks v2.55 (Catchen et al., 2013) package to de-multiplex the reads based on unique barcodes to assign each sequence to an individual puma sample, to remove sequences with a Phred quality score below 20 (99% accuracy), and to remove Illumina adapter sequences from the data. We then aligned reads for each individual to PumCon1.0-the Puma concolor draft reference genome-using program bwa . We identified and filtered SNPs with Samtools . We discarded loci with a mapping quality score below 20, minimum base quality less than 20, with more than two alleles at a site, and with a maximum depth greater than 100.
We skipped indels and used only a random SNP per read to reduce linkage disequilibrium.
Using vcftools, we tested the effects of multiple filtering parameters on our dataset, specifically looking at which parameters produced unreliable and inconsistent heterozygosity estimates, inbreeding coefficients, and relatedness values. We retained loci with a minor allele frequency ≥0.05 as lower frequency SNPs could be sequencing error. The relationship between the minimum depth of reads per individual and heterozygosity was asymptotic and plateaued at about 3-4 reads. To be conservative, we selected a minimum depth of four reads per individual to reliably acquire genotypes based on both alleles. We also retained SNPs that had genotypes for at least 50% of the individuals. We iteratively removed samples with more than 50% missing data to maximize the number of SNPs retained in the dataset. Being more conservative with the percent of missing data decreased the number of SNPs in the final dataset but did not affect heterozygosity estimates, inbreeding coefficients, and relatedness values. We scanned for duplicate samples using relatedness values in vcftools, but, as expected, found none because most DNA samples were removed from the dead pumas. We also removed two potentially contaminated samples based on negative F statistics in vcftools.
In each library of 48 samples, we strategically included puma samples from across a large geographic area so libraries would have no correlation with the spatial location. For example, there was no significant difference between mean sample latitudes (F 7,309 = 1.108, p = 0.358) or longitudes (F 7,309 = 1.533, p = 0.155) among libraries. However, because the southern California libraries constructed from pre-existing extracts were from a small geographic region, there ended up being some latitudinal (F 10,395 = 33.76, p < 0.001) and longitudinal (F 10,395 = 33.89, p < 0.001) mean differences between those libraries and the libraries constructed from the new samples.
However, as indicated below, there were no detectable biases of the southern California libraries in any analyses.
To test for library-effect biases (i.e., differences among sequencing lanes), we used BayeScan to identify outlier SNPs while treating sequencing lanes as "populations" and using a false discovery rate of 0.05 (Foll & Gaggiotti, 2008). There were no outlier loci among any of the libraries, including the southern California libraries. We also assessed bias with various genetic structure analyses. Genotypes resulting from the pre-existing DNA extracts consistently clustered with those genotypes resulting from the new samples collected from southern California. With no apparent library-effect biases, we retained 16,285 biallelic variants (mean ± SD = 12,245 ± 2749) with a mean depth at each locus of 11.7 ± 5.1 and a mean depth per locus per individual of 11.7 ± 7.1.

| Population structure and outlier loci
We used multiple approaches to identify genetic clusters of individuals, including a linear principal components analysis (PCA) and a spatially explicit population structure analysis in program R (R Core Team, 2020). We ran the PCA using adegenet 2.1.1 (Jombart, 2008) and the structure analysis in tess3r 1.1.0 (Caye et al., 2016). We used adegenet::colorplot to present the linear structure identified by the first three principal component axes. In tess3r, we ran 20 replicates for each K (1-20) at 100,000 iterations each. We kept the most highly supported model (i.e., "best" based on cross-entropy scores) within each of the 20 replicates. To test for evidence of loci under selection, we identified outlier loci among populations (Narum & Hess, 2011) using BayeScan and tess3r with the Benjamini-Hochberg statistical correction and the recommended α-value of 0.0001.

| Genetic diversity, effective population size, genetic differentiation, and linkage decay
For each genetic cluster identified in tess3r, we calculated observed heterozygosity (H O ), gene diversity (H S ), and allelic richness (A r ) using hierfstat::basic.stats (Goudet, 2005;Nei, 1987). To test for Wahlund effects within broad-scale clusters, we used t-tests to test for differences between H O and H S . We calculated private alleles (A p ) using poppr::private_alleles (Kamvar et al., 2014). We used NeEstimator 2.1 (Do et al., 2014) to estimate effective population size (N e ) using the linkage disequilibrium model, random mating, allele frequencies >0.05, and with a correction factor of 19 haploid chromosomes (Hsu et al., 1963), as recommended by Waples et al. (2016). We used hierf stat::pairwise.neifst and hierfstat::pairwise.WCfst to estimate pairwise genetic differentiation based on F ST according to Nei (1987) or Weir and Cockerham (1984).
We used Plink 2.0 (Purcell et al., 2007) to estimate linkage disequilibrium among loci (--ld-window-r2 0 --ld-window 999999 --ld-window-kb 8000). To determine the level of non-random segregation of alleles across the genome, we assessed linkage decay in each genetic cluster by plotting the correlation of loci (R 2 ) based on genomic distance between SNPs. We correlated loci using binned intervals of 100,000bp from 0 to the maximum scaffold size of PumCon1.0. Meiosis should break up linkage, resulting in low R 2 values. However, populations experiencing strong selection, low mutation, inbreeding, low migration, or strong genetic drift will have higher R 2 values. In short, SNPs that are close together on chromosomes are expected to be correlated (i.e., inherited as chromosomal/ haplotype segments), but SNPs far away are expected to assort randomly during recombination. However, if sequences are too similar, which they may be in small and inbred populations, we will not be able to detect events of crossing over despite their occurrence, resulting in higher estimates of linkage disequilibrium, which is still an important indicator of genetic diversity and N e .

| Population structure and outlier loci
We recovered 16,285 SNPs that were randomly distributed among 125 draft-genome scaffolds. The first three axes of the PCA ac- A spatially explicit population structure analysis indicated that there was broad-to fine-scale nested genetic structure with support for 4-10 genetic clusters (Figure 3). Root mean square error (inset plot in K = 2 panel of Figure 3) and cross-entropy scores (inset plot in K = 3 panel of Figure 3) provide statistical evidence for nested genetic structure; values begin to curve at K = 4, and there is a major increase in variance at K = 5, but there is a steady increase in statistical support at higher K values. However, single pumas formed individual clusters at K > 10, at which point K lost biological meaning. When K was set to 4, the genetic clusters corresponded to the broad-scale genetic groups identified by the PCA (Figures 2 and 3).  (Table 1) The nested genetic clusters within the Sierra Nevada-including KC, WSN, and ESN-had the highest genetic diversity estimates, as well as the highest estimates of N e . Only the WSN had an N e above 50, a threshold commonly considered to be sustainable over the long term (Table 1; Franklin, 1980). Pairwise F ST estimates among nested genetic clusters within the Sierra Nevada suggested weak substructure, with little genetic differentiation (i.e., pairwise F ST < 0.05), indicating substantial gene flow throughout this region (Table 2). Within the Sierra Nevada, the ESN showed slightly higher LD than KC or WSN, and all three retained a high proportion of polymorphic loci (i.e., 87-91%).
The nested genetic clusters within the Southern Coast-including EP, SGSB, and the SA-exhibited lower genetic diversity estimates when than the Sierra Nevada, as well as large differences when compared to each other (      Figure 4). and among broad-scale groups could increase genetic diversity to entire regions and could decrease the apparent effects of genetic drift and inbreeding to some at-risk coastal populations (Ernest et al., 2003Gustafson et al., 2017;Riley et al., 2014).

| DISCUSS ION
The support for four broad-scale genetic groups from SNPs is different than previous studies using microsatellites (Ernest et al., 2003;Gustafson et al., 2019), indicating the importance of using genomic methods in the study of broader-scale wildlife conservation genetics. Our data further support the claim that the Sierra Nevada region is a major refugium of puma genetic diversity in California (Gustafson et al., 2019). Therefore, it is important to protect the Sierra Nevada group from habitat degradation and foster conser- tern could also be driven by carrying capacity processes associated with habitat limitations . If dispersal is limited by continued development southeast of the Central Coast North population, rapid genetic drift and inbreeding may ensue (Mills & Allendorf, 1996;Wang, 2004), and local extinctions may occur as predicted in the Central Coast South and Santa Ana populations (Benson et al., 2016(Benson et al., , 2019. Thus, puma population viability will be facilitated when land management agencies and land developers in the region work proactively to preserve or enhance wildlife corridors. Notably, the San Gabriel-San Bernardino population had the lowest N e , but had intermediate levels of genetic diversity. Occasional migrants could alter N e estimates and temporarily inflate estimates of heterozygosity (Gustafson et al., 2017). We suggest this could also be the result of metapopulation dynamics-that is, a small local population with frequent turnover located at the intersection of dispersal   Gustafson et al., 2017;Riley et al., 2014).  Armstrong et al., 2021) and leopards (Panthera pardus; Perez et al., 2006). Thus, it is becoming increasingly evident that large-bodied cats and other apex predators will need habitat and connectivity for long-term evolutionary survival. Natural events of genetic restoration among fragmented populations of pumas in California Gustafson et al., 2017;Riley et al., 2014), combined with our linkage decay analysis, indicates that pumas and other apex predators may need to be managed in a metapopulation framework that incorporates genomic data (Farquharson et al., 2021).
We tested for outlier loci using multiple methods (Narum & Hess, 2011), but found no evidence of local adaptation when K = 4 or K = 10. Detection of outlier loci with RAD-seq is limited by the reduced representation of the genome, yet it has often been shown to be an effective approach (Catchen et al., 2017). Pumas are longdistance dispersers (Hawley et al., 2016;Sweanor et al., 2000) and inhabit all major mountain ranges in California (Dellinger, Gustafson, et al., 2020), suggesting that local adaptation may be unlikely. Our results provide preliminary evidence that outbreeding depression resulting from potential active genetic management may be of minimal concern (Frankham et al., 2011). Recent modeling (Kyriazis et al., 2021) does suggest, however, that attempts to maximize genetic diversity in a population can introduce hidden deleterious recessive mutations, enhancing extinction risk. The modeling of Kyriazis et al. (2021) has faced criticisms (García-Dorado & Caballero, 2021), however, and Ralls et al. (2020) argue that the benefits of increasing genetic diversity outweigh the risks. Thus, managers could consider actions (e.g., wildlife overpasses/underpasses, translocation of individuals between populations, etc.) to improve viability of some coastal populations, as was empirically demonstrated to have shifted the trajectory of Florida panther population from extinction (Ralls et al., 2018). However, we suggest whole-genome resequencing methods better suited for detecting selection (Fuentes-Pardo & Ruzzante, 2017) be implemented before such efforts, especially over long distances. Managers would also need to consider other risks as well, such as the movement of pathogens or the ethical implications of moving large carnivores (Bevins et al., 2012). Wildlife managers will have to weigh these concerns against their obligation to minimize the risks of extirpation, such as those predicted for the Santa Ana and Central Coast South populations (Benson et al., 2019), and shown here to be a concern in the Central Coast North population as well. Should connectivity be re-established, then these factors, as well as possible local adaptation, should be weighed carefully.
It is our opinion that current efforts to construct or improve wildlife crossing structures that can facilitate natural movement among coastal populations should be considered the primary management strategy for conserving viable puma populations in that region.

| CON CLUS ION
Our population genomic analyses provide decision makers a contemporary and thorough evaluation of the genetic diversity, effective population sizes, and connectivity of puma populations throughout California. These state-and genome-wide results are critically important for conservation and management practices in California, especially considering the increasing demand for development and the current political climate surrounding the petition to list pumas in Southern and Central California as threatened under the California Endangered Species Act (Yap et al., 2019).
In brief, puma populations are widespread throughout the mountains of California. Populations range from major genetic sources to populations with issues of low genetic diversity and inbreeding. Multiple lines of evidence suggest that inbred populations do not share the same runs of homozygosity and, therefore, genetic diversity could be restored through enhanced gene flow.
Current challenges to puma populations are highly regional and should be addressed by focusing on how natural geography and human development impacts puma habitat and movements locally. Attention is understandably given to those populations that are highly imperiled, but it is important to note that California has several thriving populations throughout the state which represent an important resource for any genetic management strategy.
Protecting tracts of contiguous habitat to preserve large populations will provide greater protection for the species as a whole.
Specifically, further fragmentation of habitat in the Sierra Nevada group could be catastrophic to population viability of pumas in the state because it serves as a genetic refugium. Protecting, en-