Comparative and population genomics approaches reveal the basis of adaptation to deserts in a small rodent

Abstract Organisms that live in deserts offer the opportunity to investigate how species adapt to environmental conditions that are lethal to most plants and animals. In the hot deserts of North America, high temperatures and lack of water are conspicuous challenges for organisms living there. The cactus mouse (Peromyscus eremicus) displays several adaptations to these conditions, including low metabolic rate, heat tolerance, and the ability to maintain homeostasis under extreme dehydration. To investigate the genomic basis of desert adaptation in cactus mice, we built a chromosome‐level genome assembly and resequenced 26 additional cactus mouse genomes from two locations in southern California (USA). Using these data, we integrated comparative, population, and functional genomic approaches. We identified 16 gene families exhibiting significant contractions or expansions in the cactus mouse compared to 17 other Myodontine rodent genomes, and found 232 sites across the genome associated with selective sweeps. Functional annotations of candidate gene families and selective sweeps revealed a pervasive signature of selection at genes involved in the synthesis and degradation of proteins, consistent with the evolution of cellular mechanisms to cope with protein denaturation caused by thermal and hyperosmotic stress. Other strong candidate genes included receptors for bitter taste, suggesting a dietary shift towards chemically defended desert plants and insects, and a growth factor involved in lipid metabolism, potentially involved in prevention of dehydration. Understanding how species adapted to deserts will provide an important foundation for predicting future evolutionary responses to increasing temperatures, droughts and desertification in the cactus mouse and other species.


| INTRODUC TI ON
For decades, researchers have been intrigued by adaptation, or the process by which organisms become better fitted to their environments. To this end, scientists have devoted substantial efforts to this issue and have successfully elucidated how natural selection has shaped organismal phenotypes in response to environmental pressures (Berner & Salzburger, 2015;Cooke et al., 2013;Linnen et al., 2009;Nachman, Hoekstra, & D'Agostino, 2003;Savolainen, Lascoux, & Merilä, 2013). Given their influence on metabolism, water availability and ambient temperature are environmental factors relevant to all organisms and are also of growing concern within the context of anthropogenically-induced global climate change and increasing desertification (IPCC, 2018). Studying how animals that are currently living in hot and dry environments have adapted to those conditions is one approach for helping to predict the potential impacts of increasing temperatures and aridity (Hoelzel, 2010;Somero, 2010).
Despite the challenging conditions, a wide variety of organisms have evolved adaptations to live in hot deserts. These adaptations include changes in behaviour to avoid dehydration, excessive solar radiation, and heat (e.g., nocturnal life and sheltering in burrows) and a suite of anatomical modifications to dissipate heat (e.g., long body parts and pale colours). Some of the most striking adaptations are at the physiological level and help to either minimize water loss through efficient excretion and reabsorption of water (Schmidt- Nielsen, 1964;Schmidt-Nielsen & Schmidt-Nielsen, 1952) or compensate for lack of environmental water via enhanced production of metabolic water from nutrient oxidation (Takei, Bartolo, Fujihara, Ueta, & Donald, 2012;Walsberg, 2000). While these adaptations to desert life have been described in several species (for a review on small mammals see Walsberg, 2000) and are important under current climate predictions (IPCC, 2018), the genetic underpinnings of these traits are less well known.
Genomic studies on camels (Camelus bactrianus) have provided substantial evidence related to the genomic basis of adaptation to deserts. For example, analysis of the camel genome showed an enrichment of fast-evolving genes involved in lipid and carbohydrate metabolism, potentially linked to energy production and storage in a food-scarce environment (Bactrian Camels Genome Sequencing & Analysis Consortium, Ding, Chen, Sun & Meng, 2012;Wu et al., 2014). Transcriptome analysis of renal cortex and medulla in control and water-restricted camels showed a strong response to dehydration in genes involved in water reabsorption and glucose metabolism (Wu et al., 2014). Overall, genes in the arachidonic acid pathway seem to play a role in desert adaptation in both camels and desert sheep (Ovis aries; Bactrian Camels Genome Sequencing & Analysis Consortium et al., 2012;Yang et al., 2016). This pathway regulates water retention and reabsorption in the kidney, primarily through changes in renovascular tone. Aquaporins, transmembrane water channel proteins, are also involved in water reabsorption and urine concentration, and changes in their expression levels have been associated with dry environments in kangaroo rats (Dipodomys spp.; Marra, Eo, Hale, Waser, & DeWoody, 2012;Marra, Romero, & DeWoody, 2014) and the Patagonian olive mouse (Abrothrix olivacea; Giorello et al., 2018).
The cactus mouse (Peromyscus eremicus) is native to the deserts of southwestern North America and displays a suite of adaptations to this extreme environment. Cactus mice have behavioural and anatomical adaptations for heat avoidance and dissipation, such as a nocturnal lifestyle, larger ears, and aestivation (Macmillen, 1965).
They have also evolved lower metabolic rates, which result in a reduction in water loss, and resistance to heat stress compared to other generalist Peromyscus spp. (Murie, 1961). For example, while several desert rodents produce concentrated urine (Schmidt-Nielsen, 1964;Schmidt-Nielsen & Schmidt-Nielsen, 1952), the cactus mouse is essentially anuric (Kordonowy et al., 2017), which indicates its extreme efficiency of renal water reabsorption. Kordonowy et al. (2017) showed through experimental manipulation of water availability that captive cactus mice were behaviourally and physiologically intact after three days of severe acute dehydration. Gene expression profiling of kidneys highlighted a starvation-like response at the cellular level in dehydrated mice, despite access to food, and strong differential expression of Cyp4 genes, which are part of the arachidonic acid metabolism pathway (MacManes, 2017). Although these results indicate some degree of convergent evolution with other desert-adapted mammals (Bactrian Camels Genome Sequencing & Analysis Consortium et al., 2012;Takei et al., 2012;Yang et al., 2016), they are limited to expressed genes under particular experimental conditions and in one tissue type only.
Deserts in southwest North America formed relatively recently, only after the retreat of the Pleistocene ice sheets that covered most of the continent during the Last Glacial Maximum approximately 10,000 years ago (Pavlik, 2008). Because the ability to detect genomic signatures of selection depends on coalescent time and effective population size (Nielsen et al., 2005), and given the recent emergence of North American deserts, the footprint of these recent adaptations should continue to be evident in contemporary cactus mouse genomes. Whole genome analyses allow us to detect signatures of selection associated with life in the desert across the complete set of cactus mouse genes, regardless of expression patterns.
Further, they allow for analysis of intergenic areas and for characterization of genomic features that may promote or hinder adaptive evolution, such as the distribution of standing genetic variation and repetitive elements.
To identify genes associated with desert adaptation and to investigate the factors affecting adaptation using the cactus mouse as a model, we first generated a chromosome-level genome assembly and then integrated comparative, population, and functional genomics approaches. As dehydration is a primary challenge desert animals face, we expected to identify signatures of selection associated with metabolism and solute-water balance (i.e., adaptations that either enhance production of metabolic water or prevent fluid loss via excretion) in line with previous studies in the cactus mouse and other desert-adapted species (Bactrian Camels Genome Sequencing & Analysis Consortium et al., Giorello et al., 2018;Marra et al., 2014;Takei et al., 2012;Wu et al., 2014;Yang et al., 2016). Our analyses of gene family evolution and selective sweeps point instead to regulation of protein synthesis and degradation as the main target of selection. While we find strong support for an evolutionary response in perception of bitter taste and lipid metabolism, we do not identify an extensive signal of selection at genes linked to solute-water balance at the whole-genome level.

| Ethics statement
All sample collection procedures were approved by the Animal

| Genome assembly and annotation
We extracted DNA from the liver of a female cactus mouse (ENA TrimmomaTic (Bolger, Lohse, & Usadel, 2014), libraries were assembled using the program allpaThs (Butler et al., 2008). The resulting assembly was gap-filled using a PacBio library (Pacific Biosciences of California, Inc.), constructed from the same DNA extraction and sequenced at ~5× coverage, using pbjelly (English et al., 2012). We error-corrected the resulting assembly with short-insert Illumina data and the pilon software package (Walker et al., 2014).
To improve the draft assembly, we used proximity-ligation data (Hi-C) to further order and orient draft scaffolds. We prepared a Hi-C library using the Proximo Hi-C kit from Phase Genomics. We used ~200 μg of liver from a second wild-caught female animal and proceeded with library preparation following the protocol for animal tissues. The Hi-C library was sequenced at Novogene using one lane of 150 bp paired-end reads on an Illumina HiSeq 4000 platform. To arrange the draft scaffolds in chromosomes we used the program juicer in an iterative fashion . Following each run, we loaded the .map and .assembly files generated by Juicer into Juicebox (Durand, Robinson, et al., 2016), the accompanying software developed to visualize crosslinks, and corrected misassemblies manually. We ran juicer until no well-supported improvements in the assembly were observed. We thus obtained 24 chromosome-sized scaffolds plus 6,785 unplaced short scaffolds. We calculated assembly statistics with the assem-blathon_stats.pl script from the Korf Laboratory (https://github. com/KorfL ab/Assem blath on/blob/maste r/assem blath on_stats. pl) and assessed assembly completeness with busco v3 (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015) and the Mammal gene set.
To standardize chromosome naming and enable future comparative analyses, we used the genome assembly of the deer mouse (Peromyscus maniculatus bairdii; NCBI Bioproject PRJNA494228) to name and orient the cactus mouse chromosomes. We used mummer4 (Marçais et al., 2018) to align the cactus mouse genome to the deer mouse genome with the function nucmer and the options --maxgap 2000 and -minclust 1000. We filtered alignments smaller than 10 kb with delta-filter and plotted the alignment using mummerplot. These genome alignments also allowed us to test for synteny between the two species, diverged ~9 million years ago (timetree.com based on Fabre, Hautier, Dimitrov, & Douzery, 2012;Fritz, Bininda-Emonds, & Purvis, 2009;León-Paniagua, Navarro-Sigüenza, Hernández-Baños, & Morales, 2007;Schenk, Rowe, & Steppan, 2013), and to assess the degree of structural divergence between the two genomes.
We identified transposable elements and other repetitive elements using repeaTmasker v.4.0 (Smit, Hubley, & Green, 2015) and the Rodentia data set. The masked genome was annotated using maker v.2.3.1 (Cantarel et al., 2008) and the Mus musculus reference protein data set.

| Whole genome resequencing
Little is known about population structure in the cactus mouse (Riddle, Hafner, & Alexander, 2000), therefore we included samples from two different locations and sampling times to begin characterizing geographic and temporal structuring in this species. We sequenced the genomes of an additional 26 cactus mice collected from two locations in   (Table 1).

Genomic libraries were prepared at the Biotechnology Resource
Center at Cornell University using the Illumina Nextera Library Preparation kit and a modified protocol for low-coverage whole genome resequencing ("skim-seq"). Individually barcoded libraries were sequenced at Novogene using 150 bp paired-end reads from one lane on the Illumina NovaSeq S4 platform. We conducted an analysis of sequencing read quality and trimmed adapters from raw sequencing data with fasTp (Chen, Zhou, Chen, & Gu, 2018). We mapped sequences from each of the 26 individuals to the cactus mouse reference genome using bwa mem (Li & Durbin, 2009) and removed duplicates with samblasTer (Faust & Hall, 2014). The resulting BAM files were sorted and indexed using samTools ).
As sequencing depth was variable among individuals (raw coverage between ~2-17×), we called variants in ANGSD (Korneliussen, Albrechtsen, & Nielsen, 2014) as its algorithm takes into account genotype uncertainty associated with low-coverage data. To identify a list of high-confidence variable sites, we ran a global variant calling analysis including all 26 individuals using the genotype likelihood model from samTools (-GL 1; . To be included in our analyses, a site had to satisfy the following criteria: p-value below 10 -6 , minimum sequencing and mapping qualities above 20, minimum depth and number of individuals equal to half of the number of individuals included in the analysis (13 out of 26), and a minor allele frequency (MAF) above 1%.

| Genomic differentiation across space and time
To estimate the effects of temporal and spatial distance on levels of genomic differentiation among individuals, we first ran an Analysis of Principal Components (PCA) of genetic data using the ngsCovar program from ngsTools (Fumagalli, Vieira, Linderoth, & Nielsen, 2014). We used all high-quality SNPs (defined above) called at the species level and ran the PCA using genotype posterior probabilities as input, rather than called genotypes, to take into account the genotype uncertainty associated with low-coverage data (Korneliussen et al., 2014). To estimate genetic distance  TA B L E 1 Details on sampling locations of individuals sequenced for population genomics analyses. * denotes that the individual outlier was sampled here, and sample size was reduced to seven after outlier removal After outlier exclusion, we reran ANGSD to estimate allele frequencies in Motte and Deep Canyon separately. We provided a list of high-confidence SNPs for use in downstream analyses and applied the same filters as in the global SNP calling, excluding SNP pvalue and MAF thresholds, with the major allele fixed across runs.

F I G U R E 1
We used the sample allele frequencies (.mafs file) from each population to calculate the 2D Site Frequency Spectrum (SFS), which we also used as prior for estimating F ST , a measure of genetic differentiation. We calculated average F ST across autosomes and across the X chromosome separately, and investigated broad patterns of differentiation across the genome in 50 kb sliding windows.

| Sequence and structural standing genetic variation at the species level
To estimate levels and patterns of standing genetic variation at the species level, we analysed samples from both cactus mouse populations together. We calculated the overall proportion of polymorphic sites and nucleotide diversity (π) in 50 kb sliding windows to characterize broad patterns. Estimates of π for each polymorphic site

| Gene family evolution in the cactus mouse
To investigate gene family contractions and expansions in the cactus mouse, we analysed the genomes of 25 additional species (Supplementary Table S1) within the Myodonta clade (Order: Rodentia), which includes rats, mice, and jerboas, for which genome assemblies were publicly available from NCBI in June 2019. To avoid potential biases due to different gene annotation strategies, we reannotated all 25 genomes with the same strategy used for the cactus mouse (see above).
Genome quality was evaluated using BUSCO. We identified groups of orthologous sequences (orthogroups) in all species using the package orThofinder2 (Emms & Kelly, 2018) with DiamonD as protein aligner (Buchfink, Xie, & Huson, 2015). In a preliminary run, we observed that fewer orthogroups and fewer genes per orthogroup were identified in species with lower genome assembly quality. Given that this pattern could bias the results of analysis of gene family expansion or contraction, we took a conservative approach and filtered assemblies that had less than 70% complete benchmarking universal single-copy orthologs (BUSCOs) and thus retained 18 species (Supplementary Table S1). 70% was chosen as a compromise between including as many genomes as possible while eliminating the potential for biases.
We analysed changes in gene family size accounting for phy-

| Detection of signatures of selection
We used an integrative approach to detect signatures of selection from the population genomics data. First, we used sweepfinder2 (DeGiorgio, Huber, Hubisz, Hellmann, & Nielsen, 2016;Nielsen et al., 2005) to detect recent selective sweeps from the SFS as it is compatible with low-coverage whole genome data. We ran these analyses using all 25 individuals, based on the rationale that potential differences between the two populations (Deep Canyon and Motte) due to local adaptation or demography would be swamped by signatures common in the species across populations (Miller et al., 2020). As recommended by Huber, DeGiorgio, Hellmann, and Nielsen (2016), we included both variant and invariant sites, but we could not reconstruct the ancestral state of these sites due to lack of data from closely related species (all species had > 9 million years divergence). The X chromosome was excluded from this analysis. We converted allele frequencies estimated in ANGSD to allele counts, and estimated the SFS from the autosomes only in sweepfinder2. We then tested for sweeps using sweepfinder2 with the -l setting, i.e., using the precomputed SFS, and calculated the composite likelihood ratio (CLR) and α every 10,000 sites. This window size of 10 kb was selected as a trade-off between computational time and resolution. Only the peaks with CLR values above the 99.9 th percentile of the empirical distribution of CLR values were considered under selection. We functionally annotated the two closest genes, one downstream and one upstream, to these outlier peaks and ran a Gene Ontology (GO) enrichment analyses to test whether genes under putative selection were enriched for a particular function or pathway. We performed this analysis on the geneonotology.org webpage using the program panTher (Mi et al., 2017) and the Mus musculus gene set as reference. As GO terms are hierarchical, we summarized these results with the software package revigo (Supek, Bošnjak, Škunca, & Šmuc, 2011), which uses a clustering algorithm based on semantic similarities, setting the similarity threshold at 0.5.
Finally, we generated a list of a priori candidate genes potentially involved in desert adaptation. These included the genes that were most differentially expressed in response to experimental dehydration, including 11 Cyp4 genes from the arachidonic acid metabolism pathway, and the sodium carrier gene SLC8A1 (MacManes, 2017). We also included the nine aquaporins that, despite being important in water reabsorption in the kidney, were not differentially expressed in hydrated versus dehydrated cactus mice (MacManes, 2017). We integrated this list of candidate genes with the genomic areas showing strong signatures of selective sweeps in the sweepfinder2 analysis. As decreases in π and Tajima's D can also be indicative of selective sweeps, we zoomed in around these candidate regions, plus an additional 10 kb flanking on each side, and calculated these two statistics in 1 kb windows.

| Genome assembly and annotation
Illumina and PacBio reads yielded a draft genome assembly of 2.7 Gbp and a scaffold N50 of 1.3 Mbp. Scaffolding with Hi-C data increased contiguity 100-fold and yielded 24 chromosome-sized scaffolds for a total assembly size of 2.5 Gbp, plus 173 Mbp of unplaced scaffolds.
The final assembly contained 92.9% complete BUSCOs, with 1.2% of the genes duplicated and 3.7% missing. Together these statistics indicate that the cactus mouse genome assembly has high contiguity, high completeness and low redundancy (Table 2). We annotated 18,111 protein-coding genes. repeaTmasker masked 35% of the genome as repetitive. LINE1 and LTR elements alone constituted 21% of the repeats. Total proportion of repeats and relative proportion of different repeats classes were similar across eight Peromyscus species (Supplementary Table S2).
Whole genome alignment to the Peromyscus maniculatus genome revealed the presence of several intrachromosomal differences between the two species, but no large inversions, translocations, or interchromosomal rearrangements were evident at the resolution granted by mummer4 (Supplementary Figure S2). This, in combination with a conserved number of chromosomes supported by both karyotype characterization (Smalec, Heider, Flynn, & O'Neill, 2019) and genome assemblies, indicates that genome structure is highly conserved between these Peromyscus species.

| Genomic differentiation across space and time
Our PCA showed that the first two principal components (PCs) ex-  all sites). Global genome-wide π was 6 × 10 -3 . π was lowest on the X chromosome and seemed to increase from chromosome 1 to 23 ( Figure 1c). In fact, chromosome length was a strong negative predictor of nucleotide diversity at each autosome (R 2 = .65, F (1,21) = 39.58, p < .001; Figure 2).

| Sequence and structural standing genetic variation
A large area of elevated nucleotide diversity (~17 Mbp long) was evident at the beginning of chromosome 1 (Figure 1c). Although this signature can indicate the collapse of paralogous sequence, conserved synteny with other Peromyscus species and unequivocal support from the Hi-C contact map strongly indicated that the assembly was correct for chromosome 1. To begin to understand the genome-level processes that may have generated this pattern, we calculated depth of coverage in chromosomes 1 and 2 -a reference chromosome that did not show similar large regions of elevated πusing a subset of the shotgun data and the number and proportion of repetitive elements in 50 kb windows (Supplementary Figure S3).
Depth of sequencing in this unusual area of chromosome 1 was higher than in the rest of chromosome 1 (up to 7× higher) and compared to chromosome 2 (10% higher overall), and showed a similar peak as the one shown for nucleotide diversity (Supplementary Figure S3). The number and proportion of repetitive elements were both higher in this area relative to other parts of chromosome 1, and all of chromosome 2 (Supplementary Figure S3). Whether these repeats have been collapsed or reads from other similar repeats spread throughout the genome have mapped to this area is hard to disentangle, even though the inflated nucleotide diversity would suggest the latter. Together, these analyses indicate that this area is highly repetitive, rather than containing a misassembled large duplication.
Analysis of the 10× Genomics data using longranger resulted in an estimated mean DNA molecule length of only 13,620 bp (ideal is > 40,000 bp), and the number of linked reads per molecule was six, much lower than the ideal threshold of 13. As short molecules can negatively impact the detection of large structural variants and generate many false positives, we adopted a conservative approach by reporting only short indels (41-29,527 bp). We identified a total of 87,640 indels between the reference and an individual from the same population. Indels affected 101 Mbp of the total genome assembly, which represents 4% of the total sequence. Number of indels per chromosome was strongly positively correlated with chromosome size (R 2 = .95, F (1,21) = 438.7, p < .001; Figure 2).

| Identification of selective sweeps
Analysis of the signatures of selective sweeps yielded a total of 232 sites under selection. Of these, 119 clustered in 44 larger regions that included two or more adjacent CLR outliers (Supplementary Figure S4).
By retrieving the genes closest to each peak (one in both up-and downstream directions), we compiled a list of 186 genes associated with selective sweeps (Supplementary Table S3). Fourteen of these genes, including many putative olfactory and vomeronasal receptors,

Gene family expansions/contractions
were not matched with a corresponding GO term. Ribosomes were overrepresented among "cellular components", with eight GO terms pointing to this organelle (p < .001, after Bonferroni correction). In addition to this, 279 biological processes and 89 molecular functions were significantly overrepresented (before correction for multiple tests; full list in Supplementary Tables S4 and S5, respectively). GO terms clustering in REVIGO showed that terms with the lowest p-values under "biological processes" included "membrane organization", "cellular amide metabolism process", "translation", "ribosome assembly", and "detection of chemical stimulus involved in sensory perception of bitter taste" (Figure 4, Supplementary Table S3); while under "molecular functions" they included "structural constituent of ribosome", "binding", "olfactory receptor binding", and "mRNA binding" (Figure 4, Supplementary Table S4). Note that although the selective sweeps identified in the area of inflated π of chromosome 1 should be considered with caution, most of the genes associated with these areas were either vomeronasal receptors that were not included in the GO enrichment analysis (see above) or genes whose function was not enriched overall.
Contrary to predictions, mean π and Tajima's D were significantly higher across the candidate areas for selective sweeps when compared to genome-wide means across all samples combined (Wilcoxon test, p < .001 for both π and Tajima's D; Figure 5) and within each population (Supplementary Figure S5). Among the candidate genes from previous studies, eight (all Cyp4 genes,SLC8A1,and aqp4,aqp5,aqp8,and aqp12) and six genes (the Cyp4a gene cluster, Cyp4v2, SLC8A1, and aqp5, aqp9, and aqp12) showed significant deviations from genome-wide average in π and Tajima's D, respectively (p < .05 after Benjamini-Yekutieli correction for multiple testing), but not always in the predicted direction (Supplementary Figure S6). Among the Cyp4 genes, Cyp4f showed a modest decrease in π; among aquaporins, aqp8 showed a decrease in π, and aqp9 showed a decrease in Tajima's D; and SLC8A1 showed the greatest reduction in π and Tajima's D overall (Supplementary Figure S5).
Number of sweeps in each chromosome did not correlate with mean π (p > .05). A correlation with either chromosome size or number of indels (p < .01) was entirely driven by the outlier behaviour of chromosome 1, and it did not hold when chromosome 1 was removed from the data set (p > .05).

| Chromosome-level assembly for the cactus mouse
A high-quality chromosome-level assembly of the cactus mouse genome allowed us to investigate genomic patterns of variation, differentiation, and other genomic features (i.e. genes, repetitive elements, number and size of chromosomes), and to identify regions of the genome that may be associated with desert adaptations. As the number of publicly available Peromyscus genome assemblies increases (Colella, Tigano, & MacManes, 2020), the cactus mouse genome will provide additional insights into adaptation, speciation, and genome evolution when analysed in a comparative framework. For example, our comparison of the cactus mouse and the deer mouse genomes revealed higher than expected genome stability considering the divergence time between the two species (~9 million years ago), and their large effective population sizes and short generation times (Bromham, 2009;Charlesworth, 2009)

| High genetic diversity and differentiation
Population differentiation between cactus mouse populations inhabiting the Motte and Deep Canyon Reserves in Southern California was overall high despite being separated by only 90 km. Differentiation was high across the genome, but without distinguishable F ST peaks ( Figure 1b). The distribution of allele frequency changes is suggestive of prolonged geographical isolation, which is consistent with the Peninsular Range mountains acting as a dispersal barrier. Although this is not surprising given the limited dispersal ability of Peromyscus mice (e.g., Ribble, 1992;Krohne, Dubbs, & Baccus, 1984), the close proximity of these two sampling locations suggests that population structure across the species range, which spans more than 2,500 km from Nevada to San Luis Potosì, is likely to be strong. Previous analyses based on a single mitochondrial marker split the P. eremicus species complex into three species -P. eva, P. fraterculus, and P. merriami -plus West and East P. eremicus clades (Riddle et al., 2000). Our analyses suggest that population structure could be pronounced even within each P. eremicus clade, which warrants further investigation to elucidate the taxonomic status of these species and to reveal potential differences in adaptation to local desert conditions.
As standing genetic variation is the main source of adaptive genetic variation (Barrett & Schluter, 2008), characterizing levels and distribution of sequence and structural variation can help understand how and where in the genome adaptations evolve. With more than 43 million high-quality SNPs and ~87,000 indels, the cactus mouse exhibits high levels of standing genetic variation, which is consistent with large effective population sizes and comparable to what has been reported for the congeneric white-footed mouse (P. leucopus ;Long et al., 2019). While SNPs are the main, and often the only, type of variation screened in genomic studies of adaptation (Wellenreuther, Mérot, Berdan, & Bernatchez, 2019), here we show that small-to midsized indels are common and cover ~4% of the genome. Given that our analysis of structural variation leveraged sequence data from only one individual, the level of standing structural variation in the population is likely much higher. Pezer, Harr, Teschke, Babiker, and Tautz (2015) found that indels covered ~2% of a wild house mouse (M. musculus domesticus) genome compared to a reference, likely an underestimation considering that the analysis was based on variation of read depth only. While the inclusion of our high coverage 10× genomics data set allowed us to characterize structural variation in a single individual, future work at the population level will allow us to characterize the role of structural variation in the adaptive evolution of the cactus mouse.
Sex chromosomes generally harbor lower nucleotide diversity (Wilson Sayres, 2018) and higher differentiation (Presgraves, 2018) when compared to autosomes due to their reduced effective population size, different mode of inheritance, and their role in the evolution of reproductive barriers. We did observe lower π in comparison to autosomes (45%-71% of mean π for each autosome), slightly lower than neutral expectations assuming equal sex ratio (75%).
Demographic processes and/or selection could be implicated in this additional reduction, but their relative roles were not tested here.
However, contrary to our expectations, analysis of genomic differentiation based on F ST showed that the X chromosome was less differentiated than the autosomes in the interpopulation comparison.
In fact, sex chromosomes are more differentiated than autosomes in 95% of the studies for which this information is available (Presgraves, 2018). In two cases regarding mammals, domestic pigs and wild cats, lower X chromosome differentiation was ascribed to hybridization and introgression (Ai et al., 2015;Li, Davis, Eizirik, & Murphy, 2016).
Hybridization has been reported among several Peromyscus species (Barko & Feldhamer, 2002;Leo & Millien, 2017) and represents a viable hypothesis given that the cactus mouse is sympatric with the canyon mouse (Peromyscus crinitus) and numerous other Peromyscus species throughout its range. In the future, population genomic data from additional Peromyscus species will help assess how common this pattern is within and across species, to test the hybridization hypothesis, to identify potential donor and recipient species, and to test the potential role of hybridization and introgression in desert adaptation.
Neither sequence nor structural variation at the chromosome level was a strong predictor of the number of selective sweeps in a chromosome. However, mean π and Tajima's D were significantly higher in the areas affected by selective sweeps than across the whole genome. Theory predicts that a selective sweep should remove variation from the adaptive site and its surroundings, thus resulting in a localized reduction in π and lower Tajima's D relative to the ancestral level of variation (Kim & Stephan, 2002;Smith & Haigh, 1974). However, the signature of a selective sweep, and the ability to detect it, depends on the strength of selection, the recombination rate around the selected site, and whether the sweep is hard, soft, or incomplete (i.e., whether a single or multiple haplotype carries the beneficial allele, or the allele has not as yet reached fixation; Messer & Neher, 2012). sweepfinder2 is best suited to detect hard selective sweeps and has limited power to identify soft or incomplete sweeps Huber et al., 2016;Nielsen et al., 2005). Therefore, if we do not observe a general reduction of diversity compared to the genome-wide average around adaptive sites, it could be for one or a combination of the following reasons: (a) High recombination rates due to a large effective population size may cause rapid LD decay among neighbouring sites, thus reducing the size of the typical, diagnostic dip in diversity to a point of nondetectability. (b) Due to computational limitations (sweepfinder2 is not able to parallelize, leading to long run times) we estimated the CLR of a site every 10 kb, which may not be dense enough to pinpoint the exact location of the sweep and reveal narrow reductions in π. (c) If soft sweeps are indistinguishable from hard sweeps when selection is strong (Harris, Garud, & DeGiorgio, 2018), selective sweeps may preferentially occur in areas of high standing genetic variation, and (d) even if a reduction in diversity occurs relative to ancestral levels, π may still not drop under the genome-wide average, especially if diversity was originally high. On chromosome 9, for example, where three consecutive sweepfinder2 outlier regions extend over 440 kb, the reduction of π and Tajima's D was drastic, suggesting that coarse resolution may prevent the detection of diversity dips around selected sites if the genomic area affected is small. with ribosomes (e.g., ribosome assembly and translation, structural constituents of ribosomes, mRNA binding, and unfolded protein binding). We also report a significant contraction of a gene family associated with the ubl conjugation pathway, which was similarly identified in an analysis of selective sweeps. Ubiquitin and ubl-proteins function either as a tag on damaged proteins to be degraded or as regulators of interactions among proteins (Hochstrasser, 2009). Cactus mice face many stressors including high temperatures and lack of water. Heat causes cellular stress directly, via thermal stress, and indirectly, by exacerbating the negative effects of dehydration due to lack of water and rapid water loss (e.g., respiratory water loss, evaporative cooling).

| Lost in translation
Thermal and hyperosmotic stress can suppress the transcription and translation machinery, increase DNA breaks and protein oxidation, and cause cell cycle arrest, and eventually apoptosis and cell death (Burg, Ferraris, & Dmitrieva, 2007;Kampinga, 1993). However, the strongest and most immediate effect of thermal and hyperosmotic stress is protein denaturation (Burg et al., 2007;Kampinga, 1993;Lamitina, Huang, & Strange, 2006). Our results are consistent with the expected cellular response to both thermal and hyperosmotic stress, which have similar physiological effects even though the underlying mechanisms may differ. A meta-analysis of genomics and transcriptomics studies investigating the evolutionary response to different thermal environments in metazoans, including invertebrates to mammals, highlighted "translation", "structural constituents of ribosomes", and "ribosome" as the gene ontology terms most commonly enriched (Porcelli, Butlin, Gaston, Joly, & Snook, 2015), in line with our results. Similarly, many genes mediating the cellular response to hyperosmotic stress are involved in the regulation of protein translation and the elimination of denatured proteins in Caenorhabditis elegans (Lamitina et al., 2006).
These analyses suggest that selection has acted strongly on genes responsible for protection against thermal and/or hyperosmotic stress or for efficiently removing damaged proteins and resuming translation after acute stress (Kampinga, 1993). Additionally, as the volume of dehydrated cells decreases causing rearrangements in the cytoskeleton (Burg et al., 2007), the significant contraction of a cytoskeletal protein gene family could also point to additional adaptations to hyperosmotic stress in the cactus mouse. Acute dehydration experiments on captive cactus mice also found limited tissue damage and apoptosis in the kidneys of dehydrated individuals (MacManes, 2017), consistent with our hypothesis. Negative regulation of cell death was one of the most significant GO terms, suggesting that these genes may be under selection to avoid tissue necrosis during acute or chronic stress.

| Life in the desert involves dietary and metabolic adaptations
The GO analysis of genes associated with selective sweeps indicated an enrichment for bitter taste receptors. The perception of bitter taste has evolved to allow organisms to avoid toxic compounds found in many plants and insects (Garcia & Hankins, 1975;Glendinning, 1994).
Although herbivorous and insectivorous animals generally have a larger repertoire of bitter taste receptors compared to their carnivorous counterparts (Li & Zhang, 2014;Wang & Zhao, 2015), they are also less sensitive to bitterness (Glendinning, 1994). The cactus mouse is omnivorous, with a diet predominantly based on seeds, insects, and green vegetation with proportions varying according to seasonal availability (Bradley & Mauer, 1973;Meserve, 1976). We hypothesize that repeated signal of selective sweeps at bitter taste receptor genes may have increased the frequency of alleles that decrease bitter sensitivity, thus making a greater variety of food palatable to the cactus mouse in an environment that is characterized by scarcity of resources and an abundance of bitter-tasting plants and insects.
Chromosome 9 showed the largest and strongest selective sweep in the genome (Supplementary Figure S4). This area was associated with Gdf10 (growth/differentiation factor 10), the only annotated gene of known function in the region, which is involved in osteogenesis and adipogenesis. Overexpression of Gdf10 in the adipose tissues of mice prevents weight gain under a high-fat diet and affects their metabolic homeostasis, including oxygen consumption and energy expenditure (Hino et al., 2017). The drastic loss of weight and the starvation-like response reported in experimentally dehydrated cactus mice suggests that lipid metabolism has a role in the adaptive response to dehydration (MacManes, 2017). Experimental water deprivation induced higher food consumption and loss of body fat in the spinifex hopping mouse (Notomys alexis), a desert-specialist rodent (Takei et al., 2012). In camels, accelerated evolution of genes associated with lipid metabolism was associated with food scarcity in the desert (Wu et al., 2014). The strong signature of selection around Gdf10 in the cactus mouse therefore warrants further investigation, as adaptive changes in lipid metabolism may be pivotal for survival in the desert.
Among the candidate genes we selected from previous studies, only SLC8A1 -the sodium carrier gene -show significant reduction in both π and Tajima's D, consistent with a selective sweep. However, Cyp4v2 -one of the genes in the arachidonic acid pathway -was in proximity of a selective sweep on chromosome 17. This gene shows similar catalytic properties to other Cyp4 genes in the arachidonic acid pathway and is commonly expressed in retinal, kidney, lung, and liver tissue of humans (Nakano, Kelly, & Rettie, 2009 (Skopec & Dearing, 2011).
Although the repertoire of bitter taste receptors, or their sensitivities to bitter taste, has not been investigated in desert rodents, nor our genomic analyses of cactus mice highlighted detoxification genes, it is evident that many desert species have evolved different strategies to cope with bitter, toxic plants in the absence of more palatable options.

| Signatures of selection potentially involved in reproductive isolation
We reported significant evolutionary changes linked to genes for sperm motility, spermatogenesis, and pheromone reception that may lend support to a role of sexual selection in the evolution of the cactus mouse genome. The comparison of sperm morphology and behaviour in Peromyscus species has revealed a link between sperm traits and reproductive strategies. Sperm of the promiscuous P. maniculatus, for example, can aggregate on the basis of relatedness, thus increasing motility and providing a competitive advantage against other males, whereas sperm of the monogamous P. polionotus lacks these adaptations (Fisher, Giomi, Hoekstra, & Mahadevan, 2014;Fisher & Hoekstra, 2010). The four gene families associated with sperm motility showed a conspicuous contraction in the cactus mouse when compared to other Peromyscus species (9 vs. 26, 17 and 21 in P. leucopus, P. maniculatus, and P. polionotus, respectively). Although these results may suggest a correlation between reduction in the number of sperm motility genes and a monogamous reproductive strategy, P. maniculatus also has fewer genes than P. polionotus in these gene families (17 vs. 21 in total). Nonetheless, these candidate genes represent interesting targets for future studies on sperm competition within the cactus mouse and among Peromyscus.

| CON CLUS ION
In conclusion, the high-quality assembly of the cactus mouse genome and the candidate genes identified in this study build on the growing body of genomic resources available to further understand the genomic and physiological basis of desert adaptation in the cactus mouse and other species. Taken together, our results indicate that the strongest signatures of selection in the cactus mouse genome are consistent with adaptations to life in the desert, which are mostly, but not solely, associated with high temperatures and dehydration. Contrary to expectations, we did not find a pervasive signature of selection at genes involved in solute-water balance in the kidneys. However, this does not necessarily discount the relative role of these organs under thermal and hyperosmotic stress, as we have not yet tested when and where in the body the expression of these candidate genes is beneficial. Our analyses also show that signatures of selection are widespread across the cactus mouse genome, with all autosomes showing selective sweeps, and that they are not affected by chromosome-level patterns of standing genetic variation, sequence or structural. Sweeps seem to be associated with high local π, instead. Dynamic gene families and enrichment of several GO terms associated with selective sweeps indicate that the genetic basis of at least some desert-adapted traits may be highly polygenic. In future, evolutionary and physiological genomics work stemming from these results will allow us to better characterize the phenotypes and genotypes associated with desert adaptations in the cactus mouse, and to understand how they evolved.

ACK N OWLED G EM ENTS
We would like to thank Christopher Tracy for access to the Boyd

DATA AVA I L A B I L I T Y S TAT E M E N T
All read data for the genome assembly are housed on ENA under project ID PRJEB33593. Specifically, genome assembly (ERZ1195825), 300 bp PE (ERR3445708), 500 bp PE (ERR3446161), 3 kb mate pair (ERR3446318), 5 kb mate pair (ERR3446317), 7 kb mate pair