The genome of Draba nivalis shows signatures of adaptation to the extreme environmental stresses of the Arctic

Abstract The Arctic is one of the most extreme terrestrial environments on the planet. Here, we present the first chromosome‐scale genome assembly of a plant adapted to the high Arctic, Draba nivalis (Brassicaceae), an attractive model species for studying plant adaptation to the stresses imposed by this harsh environment. We used an iterative scaffolding strategy with data from short‐reads, single‐molecule long reads, proximity ligation data, and a genetic map to produce a 302 Mb assembly that is highly contiguous with 91.6% assembled into eight chromosomes (the base chromosome number). To identify candidate genes and gene families that may have facilitated adaptation to Arctic environmental stresses, we performed comparative genomic analyses with nine non‐Arctic Brassicaceae species. We show that the D. nivalis genome contains expanded suites of genes associated with drought and cold stress (e.g., related to the maintenance of oxidation‐reduction homeostasis, meiosis, and signaling pathways). The expansions of gene families associated with these functions appear to be driven in part by the activity of transposable elements. Tests of positive selection identify suites of candidate genes associated with meiosis and photoperiodism, as well as cold, drought, and oxidative stress responses. Our results reveal a multifaceted landscape of stress adaptation in the D. nivalis genome, offering avenues for the continued development of this species as an Arctic model plant.

cellular and physiological responses to environmental stresses, including cold, drought (Lütz, 2010), and high light exposure (Caldwell et al., 2007). Little is known about the molecular basis of Arctic plant adaptations, and an Arctic model species has yet to be developed to facilitate such studies (Colella et al., 2020;Wullschleger et al., 2015).
Exploring the molecular basis of Arctic plant adaptation may reveal novel mechanisms of environmental stress tolerance, potentially offering guidance to agricultural crop improvement.
Here, we present the chromosome-scale genome assembly of Draba nivalis, a perennial diploid with a circum-Arctic distribution.
This species is ideal for studying the evolution of environmental stress tolerance in plants because it occurs in the high Arctic where extremes in temperature, light regime, and low water availability are ever-present. In a recent study of its transcriptome, we identified numerous candidate genes for Arctic adaptation illuminating its potential as an Arctic model species (Birkeland et al., 2020).
Draba nivalis is also an emerging model for studying the genetics of incipient speciation as intraspecific crosses frequently result in highly sterile hybrids (Grundt et al., 2006;Gustafsson et al., 2014;Skrede et al., 2008). Draba is the largest genus in the Brassicaceae with > 390 species, which mainly occur in Arctic and alpine regions (Jordon-Thaden et al., 2013). Many of these species, including D. nivalis, form dense and hairy cushions as protection from wind and cold, and are strongly autogamous to secure reproduction in their pollinator-poor environments (Brochmann, 1993). The Brassicaceae contains numerous species important for agriculture as well as for research in plant ecology, evolution, development, and molecular biology (Gupta, 2016). The availability of many Brassicaceae genome assemblies enabled us to conduct comparative analyses of chromosomal evolution and functional genomics, to shed light on the genomic characteristics of a plant adapted to the extreme abiotic stresses of the Arctic.

| Plant material and DNA sequencing
Seeds of D. nivalis accession 008-7 from Alaska (Waterfall Creek W., 63.045 latitude, -147.201 longitude; (see Grundt et al., 2006 for complete locality information) were grown under conditions mimicking the Arctic climate (specified in Brochmann et al., 1992) in the Phytotron at the Department of Biosciences, University of Oslo. Genomic DNA was extracted from young leaf tissue using the Qiagen Plant Mini Kit, and 7.7 µg of DNA was delivered to the Norwegian Sequencing Centre for library preparation and sequencing. Paired-end sequencing libraries with a target insert size of 550 bp were produced using the Illumina TruSeq PCR-free kit. This library was sequenced on one half of each of two Illumina HiSeq 2,500 flow cells in rapid run mode with a read length of 250 bp.
These data were used to estimate the genome size based on 25 mer frequency using GenomeScope (Vurture et al., 2017; http://qb.cshl. edu/genom escop e/). A total of 8 µg of genomic DNA was used to produce four sequencing libraries for the Oxford Nanopore MinION platform. For each library, 2 µg of genomic DNA was treated with NEBNext Ultra II End-Repair/ dA-tailing Module and libraries were prepared using the Oxford Nanopore 1D ligation kit  following the manufacturer's protocol. Each of these four libraries was sequenced on a single Oxford Nanopore MinION flow cell (version R9) for approximately 48 hr. Two whole plants of D. nivalis accession 008-7 were flash frozen in liquid nitrogen and shipped to Dovetail Genomics (LLC, Santa Cruz, California 95060, USA) on dry ice for Chicago proximity ligation library preparation and 150 bp paired-end sequencing on the Illumina HiSeq 2500 platform.
To compare broad patterns of synteny, the D. nivalis genome was aligned to the genomes of A. alpina (the most closely related species with an assembled genome; Guo et al., 2017) and A. lyrata with NUCmer version 4.0b2 (Kurtz et al., 2004) using all anchor matches regardless of their uniqueness (--maxmatch) and setting the minimum length of a cluster of matches (-c) to 100.

| Genetic map construction and final map-based scaffolding
We generated an F 2 mapping population by self-pollination of an F 1 hybrid obtained from a cross between a D. nivalis plant from Norway (045-5; maternal parent; see Grundt et al., 2006 for complete locality information) and a D. nivalis plant from Alaska (008-7; paternal parent). Seeds were gently scarified before sowing and a total of 575 F 2 individuals were grown to maturity under our phytotron conditions (see above). Genomic DNA was extracted from young leaf tissue using the Qiagen Plant Mini Kit, and 96-plex double-digest restriction-associated DNA (ddRAD) libraries were produced. For the ddRAD procedure, the restriction enzymes NsiI and MseI were used to digest 500 ng of genomic DNA per sample. Indexed P1 and P2 adapters with sticky ends matching the overhangs left by the restriction enzymes were added to the digested DNA. Following adapter ligation, individual indexed libraries were pooled and amplified with an eight cycle PCR. Ampure XP bead cleanup was performed to remove short fragments (i.e., less than ~200 bp), and the multiplexed libraries were visualized on an Advanced Analytical Fragment Analyser to ensure the libraries were of the correct size (i.e., 300-450 bp). See Supporting Information Methods for protocol details. The final multiplexed libraries were sequenced on six Illumina HiSeq 2500 lanes by the Norwegian Sequencing Centre.
Reads were demultiplexed using ipyrad version 0.5.15 (Eaton, 2014), and adapters and low-quality reads were removed using Cutadapt and FastQC available in the wrapper script Trim Galore! version 0.4.5 (http://www.bioin forma tics.babra ham.ac.uk/ proje cts/trim_galor e/; bases with a Phred score less than 20 were trimmed, and reads shorter than 35 bp following trimming were discarded). A total of six F 2 individuals were removed because of low quality reads. Trimmed paired-end sequence reads were each mapped to the Dniv0087_Chicago assembly using bwa-mem version 0.7.8 (Li, 2013;Li & Durbin, 2010)  RealignerTargetCreator and IndelRealigner were first used to realign indels, and base quality scores were recalibrated using GATK BaseRecalibrator and PrintReads with default settings. Indels and SNPs were called using gatk UnifiedGenotyper in DISCOVERY mode using default parameters. Indels were discarded with VCFtools version 0.1.13 (Danecek et al., 2011) resulting in a VCF file with a total of 166,644 SNPs prior to filtration. VCFtools version 0.1.13 was used to isolate biallelic SNPs and exclude any SNPs that mapped to regions of the Dniv0087_Chicago genome assembly annotated as repetitive elements (see below for repetitive element annotation).
VCFtools was further used to filter the biallelic SNPs for sites with a minimum and maximum coverage depth of eight and 200, respectively, sites with a minimum mapping quality of 50, sites with a minor allele frequency greater than 0.0001, and sites that were called in at least 95% of the samples. These filtration steps resulted in a final VCF file containing 13,990 SNPs genotyped in 537 F 2 individuals.
Using these data, we constructed a genetic map using R/qtl version 1.42-8 (Arends et al., 2010;Broman et al., 2003) and ASMap version 1.0-4 (Taylor & Butler, 2017). Individuals with more than 5% missing genotypes and those that represented duplicate genotypes were removed. Loci that represented redundant genotypes were removed and an initial genetic map was estimated. Based on this map, 2,086 markers exhibiting significant segregation distortion (pvalue < 1e-10) were removed and map distances were recalculated.
The cross upon which this map is based is expected to contain biological sources of segregation distortion as hybrid progeny of these genotypes have previously been shown to exhibit both seed and pollen infertility (Grundt et al., 2006;Gustafsson et al., 2014;Skrede et al., 2008). To ensure that we had a genetic map as complete as possible upon which to scaffold the genome assembly, we reintegrated distorted markers into the genetic map using ASMap. The resulting genetic map and the first map that contained the distorted markers were imported into ASMap and markers exhibiting segregation distortion at -log 10 p-value < 6 were then pushed back into the map based on the marker order reflected by the initial map, and map distances were estimated again without changing the marker ordering. The final genetic map contained 5,055 markers genotyped in 480 F 2 individuals ( Figure S1, Table S1).
Chromonomer version 1.08 (http://catch enlab.life.illin ois.edu/ chrom onome r/) was used with default settings to scaffold the assembly based on the genetic map. To create the input files necessary to run Chromonomer, the ddRAD loci containing SNPs constituting the genetic map were first aligned to the scaffolded D. nivalis accession 008-7 draft genome assembly using bwa-mem (Li, 2013;Li & Durbin, 2010), and an AGP file was then generated for the scaffolded genome assembly using the script fatoagp.pl (https://github. com/sjack man/fasta scrip ts/blob/maste r/fatoagp).

| Identification and annotation of transposable elements
LTR-RTs were annotated following Choudhury et al. (2017) by identifying full-length LTR-RT copies based on structural features using ltrharvest 1.5 (Ellinghaus et al., 2008). After removal of nested as well as overlapping elements, candidate copies with internal regions matching plant non-LTR retrotransposon or DNA transposon con-  (Li & Godzik, 2006). For LTR-RTs, clustering was based on LTR sequences. The full-length copy showing the lowest E-value to an HMM profile in each cluster was selected for the classification of family based on longest significant blastn hits of reverse-transcriptase domains flanked by 800 bp on either side against corresponding reverse-transcriptase sequences from Brassicaceae. Classified Brassicaceae families were further assigned to "tribes" following Choudhury et al. (2017). For TIR transposons, clustering was based on full-length copies for TIR transposons that were further classified into superfamilies (Harbinger, hAT, Mariner/ Tc1, MuDR, EnSpm/CACTA) based on longest significant blastn hits against Viridiplantae DNA transposons extracted from Repbase.
Helitrons show notoriously diverse internal regions and were classified following Yang and Bennetzen (2009), with clustering based on identity over 30 bp at the 3' end of copies (i.e., the hairpin-forming region crucial for rolling circle replication).
Transposable elements were annotated along the genome assembly using all structurally defined and hierarchically classified copies of LTR-RTs, TIR transposons and helitrons from D. nivalis together with non-LTR-RTs (i.e., LINE and SINE from Viridiplantae in Repbase) as a reference. After removal of sequences giving significant blast hit with swissprot protein database for plants and with other transposable element sequences, this reference was used in RepeatMasker (version Open-4; http://www.repea tmask er.org) with RM-BLAST as search engine and divergence set to 20%. Resulting annotations of remnants of transposable element sequences were filtered to remove nested copies and copies with less than 80 bp. Each library was sequenced on 1/10th of an Illumina HiSeq 2500 lane to generate 125 bp paired-end reads. A de novo transcriptome assembly was generated for D. nivalis accession 008-7 using the RNA-seq data from all four tissues with the Trinity version 2.4.0 pipeline (Grabherr et al., 2011;Haas et al., 2013) including default read quality trimming and filtration using Trimmomatic version 0.32 (Bolger et al., 2014).

| Transcriptome assembly and gene annotation
Genes were predicted in the D. nivalis genome assembly using the Maker version 2.31.9 (Holt & Yandell, 2011) pipeline. Augustus version 3.2.2 (Stanke et al., 2008) and SNAP (Release 2013-11-29; Korf, 2004) were used as ab initio gene predictors. The Trinity transcriptome assembly (see above) was used as transcript evidence, and protein sequences from the species Arabis nemorensis, Eutrema salsugineum, Arabis alpina, Arabidopsis lyrata, and Arabidopsis thaliana (version TAIR10) were used as homology-based evidence. The D. nivalis repeat library (see above) was included to mask repetitive elements from annotation. The Maker annotation was first run using the D. nivalis transcriptome directly to infer gene predictions, and training files for the ab initio gene predictors were produced with these results. Maker was run iteratively three additional times using the transcriptome as evidence and providing updated training files for each run. The resulting set of predicted genes was annotated with Pfam domains (El-Gebali et al., 2018) using InterProScan verion 5.4-47.0 (Jones et al., 2014), and GO terms were annotated using Blast2GO version 5.2.5 (Conesa et al., 2005) by searching against the UniProt (https://www.unipr ot.org/) database for Viridiplantae.
Both the D. nivalis genome assembly and the predicted gene set were also evaluated for completeness by searching against a set of 1,440 highly conserved plant genes (Embryophyta) using busco version 3.0.1 (Simao et al., 2015). The genome assembly and predicted gene set were assessed for completeness by running BUSCO in both "genome" and "prot" modes, respectively.

| Comparative chromosome painting
Whole inflorescences of D. nivalis were fixed in freshly prepared ethanol:acetic acid fixative (3:1) overnight, transferred into 70% ethanol and stored at -20°C until use. Mitotic and meiotic (pachytene and diakinesis) chromosome preparations were prepared as described by Mandáková and Lysak (2016a) on suitable slides pretreated with RNase (100 µg/ml, AppliChem) and pepsin (0.1 mg/ ml, Sigma-Aldrich). Based on the known chromosome structure of A. alpina and other Arabideae species (Mandáková et al., 2020;Willing et al., 2015), representative BAC clones of A. thaliana were selected and grouped into contigs for comparative chromosome painting (CCP). A total of 5-10 BAC clones from each tested repeat, was used to localize 5S rDNA loci ( Figure S3). All BACs and rDNA probes were labelled with biotin-dUTP, digoxigenin-dUTP, or Cy3-dUTP by nick translation as described by Mandáková and Lysak (2016b). The labelled BACs were pooled together, ethanol precipitated, dissolved in 20 µl of hybridization mixture (50% formamide and 10% dextran sulphate in 2 × SSC) per slide and pipetted to a microscopic slide containing chromosome spreads. The slide was heated at 80°C for 2 min and incubated in a moist chamber at 37°C overnight. Hybridized probes were visualized either as the direct fluorescence of Cy3-dUTP or through fluorescently labelled antibodies against biotin-dUTP and digoxigenin-dUTP following Mandáková and Lysak (2016b). Chromosomes were counterstained with 4,6-diamidino-2-phenylindole (DAPI, 2 µg/ml) in Vectashield antifade. Fluorescent signals were analysed and photographed using a Zeiss Axioimager epifluorescence microscope and a CoolCube camera (MetaSystems). Individual images were merged and processed using Photoshop CS6 software (Adobe Systems).

| Analyses of gene family evolution
To compare the D. nivalis genome assembly with other Brassicaceae species whose genomes have been sequenced, whole genome assemblies and associated gene annotations were downloaded from public databases (Table S2)  InterProScan (Jones et al., 2014). Pfam domains were quantified for each species, and domains with a Z-score above 1.96 or below -1.96 in D. nivalis were considered significantly enriched or contracted, respectively.

To estimate gene family composition and membership
Orthofinder version 2.2.7 (Emms & Kelly, 2015 was run using the proteins annotated in the D. nivalis genome and the nine Brassicaceae genomes. OrthoFinder was run with default settings using MMseqs2 (Steinegger & Söding, 2017) to cluster proteins by sequence similarity. Tests for significant contractions and expansions of gene families (defined as "orthogroups" by OrthoFinder) were performed with cafe version 4.2 (Han et al., 2013). The species tree used for the CAFE analysis was generated by OrthoFinder using stag (Emms & Kelly, 2018) and rooted using stride (Emms & Kelly, 2017).
This species tree was transformed into an ultrametric tree using r8s (Sanderson, 2003) Willing et al., 2015) to identify homologous gene pairs. mcscanx (Wang et al., 2012) was used to identify patterns of synteny and collinearity in the duplicated genes, both within the D. nivalis genome and between D. nivalis and A. alpina. Dup_GenFinder synthesizes the outputs of these analyses to classify gene duplications into one of five categories using methods that are described in detail in Qiao et al. (2019) and Wang et al. (2012). Tandem duplications (TD) are gene pairs that are located next to one another on the same D.
nivalis chromosome, probably resulting from unequal crossing over.

| Positive selection tests
We used the branch-site model (Zhang et al., 2005) implemented in codeml of paml version 4.9i (Yang, 1997) to test for site-wise positive selection happening on the branch leading to D. nivalis.
Briefly, D. nivalis was defined as the foreground branch on a predefined phylogeny consisting of eight Brassicaceae species (T. parvula, B. oleraceae, R. raphanistrum, A. alpina, C. rubella, A. thaliana, A. lyrata, and D. nivalis ; Table S2), and two models were compared with a likelihood ratio test (LRT): an alternative model that allowed positive selection on the foreground branch, and a null model that did not allow positive selection on the foreground branch (omega fixed to 1). The alternative model was accepted if p < .05 (using χ 2 with one degree of freedom), implying that positive selection has acted on a subset of sites along the branch leading to D. nivalis. The test was run on orthologous gene-alignments with one gene copy from all eight species, constructed from orthogroups identified with OrthoFinder (orthogroups are genes descended from a single gene in the last common ancestor of the eight species; Emms & Kelly, 2015. Due to the low number of singlecopy orthogroups, multiple copy orthogroups were divided into subsets based on the smallest genetic distance to each of the D. nivalis gene copies. This was achieved by (a) aligning all orthogroups based on protein sequence using mafft (Katoh et al., 2005); (b) calculating Kimura protein distances (Kimura, 1983) with the distmat algorithm in emboss version 6.6.0 (Rice et al., 2000); and (c) extracting one gene copy from all Brassicaceae species based on the smallest protein distance to each D. nivalis gene copy. The resulting orthogroup subsets were realigned using prank (Löytynoja & Goldman, 2005) in guidance version 2.02 (Sela et al., 2015) with 10 bootstraps. guidance enables identification and filtration of unreliable alignment regions and sequences, and has been shown to improve positive selection inference on simulated data when used in combination with a phylogeny aware aligner like prank (Jordan & Goldman, 2012;Privman et al., 2012). All alignments containing sequences scoring < 0.6 and all alignment columns scoring < 0.8 in guidance were removed from the data set. Codeml was run 3-4 times for each model with different initial parameter values, and the run with the highest likelihood score was used in the final LRT (see e.g., Wong et al., 2004). Sites with ambiguity data were removed within codeml, and the species phylogeny inferred in OrthoFinder was used in all runs.

| Gene ontology enrichment tests
The positively selected gene set and the sets of expanded and contracted gene families were tested for overrepresented gene ontology (GO) terms using the Bioconductor package topGO version 2.34 (Alexa et al., 2006;Gentleman et al., 2004). We used a Fisher's exact test in combination with the "classic", "elim" and "weight" algorithms to test for GO-term overrepresentation within the three domains: Biological process (BP), molecular function TA B L E 1 Assembly statistics for the D. nivalis genome. Each row shows a different stage in the scaffolding process of the genome assembly. For a more complete table, see   Figures S2, S3and the structure of other Arabideae species (Mandáková et al., 2020). Genomic blocks are coloured to match eight colours corresponding to eight chromosomes of the Ancestral Crucifer Karyotype (Lysak et al., 2016). (c) Relative abundance of TE superfamilies in selected species (see Table S5; Willing et al., 2015). Whereas LTR-RT abundance is similarly elevated inD. nivalisandA. alpinarelative toA. lyrata, LINEs appear to be reduced and TIR-CACTA elements enriched inD. nivalisrelative to both species [Colour figure can be viewed at wileyonlinelibrary.com] (MF) and cellular component (CC). The three algorithms differ in that the "classic" algorithm processes each GO-term independently without considering the GO-graph, the "elim" algorithm processes the GO-graph bottom-up while discarding genes that have already been mapped to significant GO-terms, and the "weight" algorithm is weighing genes annotated to a GO-term based on the scores of neighboring GO-terms (Alexa et al., 2006). Based on simulated data, the "weight" algorithm has been shown to produce less false positives than the "classic" algorithm, whereas the "elim" algorithm further reduces false-positive rate, but with a higher risk of discarding true positives (Privman et al., 2012). The D. nivalis annotated gene set was used as a custom background for all GO term enrichment tests. The significance level was set to p < .05, and the results were not corrected for multiple testing following the recommendations of the creators of the topGO package (Gentleman et al., 2004).

| Genome assembly
Based on the 25mer frequency distribution we estimated the genome size of D. nivalis to 278.48 Mb ( Figure S4; flow cytometry estimates report 254-308 Mb; Grundt et al., 2005 (Figure 1a).

| Chromosome evolution
To examine how the D. nivalis genome conforms to broader pat- We identified several rearrangements and extensive centromere repositioning relative to the ACK. The structure of the D. nivalis genome is very similar to that of A. alpina (Willing et al., 2015), the closest relative of D. nivalis for which a chromosome-scale genome assembly is available, and consistent with genome structures determined for other Arabideae species, including three Draba species (Mandáková et al., 2020).

| Repetitive element annotation
We annotated 94.8 Mb of the genome as direct remnants of re-

| Gene annotation
We predicted gene models with the Maker2 pipeline using BLAST homology to five Brassicaceae genomes and a de novo transcriptome assembly of D. nivalis based on RNA-seq data from leaves, roots, flowers, and flower buds (see Methods). We identified 33,557 gene models, and 74% of the genes were functionally annotated based on similarity to UniProtKB entries, and 70% were annotated with InterPro domains. Approximately 58% of the 33,557 gene models had an annotation edit distance less than or equal to 0.25, suggesting a relatively high degree of agreement between predicted gene models and external evidence. This gene set is somewhat larger than that of A. thaliana (27,654), but consistent with those of closely related species with similar genome size ( Figure S5), and BUSCO analyses indicate 95.2% completeness of conserved embryophyte genes (Table S7)

| Gene and gene family evolution
To explore specialization in the D. nivalis gene set, we compared the abundance of protein family (Pfam) annotations with those of nine Brassicaceae genomes representing broad phylogenetic sampling.
We found 226 Pfam domains to be significantly enriched and 32 to be significantly depleted relative to the other species (Table S8). To summarize functional associations of the enriched Pfam domains, we extracted gene ontology (GO) terms from their corresponding InterPro entries. Amongst these GO terms, the most common F I G U R E 3 Detailed CCP analysis ofD. nivalischromosomes Dniv4, Dniv5, Dniv6, and Dniv7.Arabidopsis thalianaBAC contigs were in situ hybridized to pachytene chromosome spreads inD.nivalis(insets show the same probes localized on diakinetic chromosomes or mitotic metaphase chromosomes). Arrowheads indicate the position of centromeres. Green, yellow, and red colour corresponds to fluorescence of Alexa 488, Cy3 and Texas Red, respectively. Chromosomes were counterstained with DAPI. Scale bars 10 μm Dniv4 Dniv5 Dniv6 Dniv7 Numerous environmental stimuli and stresses can lead to the production of reactive oxygen species (ROS), which can damage cell membranes, nucleic acids, proteins, and metabolites (Apel & Hirt, 2004).
Regulation of ROS metabolism is essential for maintaining cellular oxidation-reduction (redox) homeostasis and is an integral part of the intracellular signal transduction networks evoked by external stimuli (Mittler, 2017), particularly for responses to environmental stresses induced by light, drought, and cold (Neill et al., 2002). The

significant increase in Pfam annotations involved in redox processes
in D. nivalis may indicate that it has evolved novel ways to cope with ROS accumulation associated with Arctic environmental stress

| Tests of positive selection
To search for further evidence of Arctic adaptation in D. nivalis, we performed genome-wide positive selection tests to identify genes that probably evolved under positive selection in this lineage relative to seven related species (see Methods). We found 1,307 positively selected genes (PSGs). These include several candidate genes with functions directly relevant to typical environmental stresses of the Arctic, associated with "response to cold", "response to water deprivation", "photoperiodism", "response to oxidative stress", and F I G U R E 6 Gene family evolution inD. nivalis. (a) Time-calibrated species tree of 10 Brassicaceae genomes showing branches labelled with significant gene family expansions and contractions (+gene families gained, -gene families lost). To the right we show the percentage of gene pairs derived from different modes of duplication in the 10 species. ForD. nivalis,gene duplication modes are shown for both the whole genome (labelledD. nivalis) and for the 2,645 genes that constitute 198 significantly expanded gene families (EGFs). Patterns of gene duplication are broadly consistent between the whole genome and the EGFs, except that transposed (TRD) duplications are less frequent (p-value = 1.8e-15), and proximal (PD,p-value < 2.2e-16) and tandem (TD,p-value = 2.432e-10) duplications are more frequent in EGFs than would be expected by chance (Fisher's exact test). Values for all species other thanD. nivalisare from Qiao et al. (2019), and since their analysis did not includeR. raphanistrum, the results shown are from the closely relatedRaphanus sativa. DSD, dispersed duplication; WGD, whole-genome duplication; TRD, transposed duplication; PD, proximal duplication; TD, tandem duplication. (b and c)Select biological process (b) and molecular function (c) GO terms significantly enriched (Fisher's exact testp-value < 0.05) inD. nivalisEFGs (in b, oxidation-reduction process was not significant, indicated by "ns"). Terms are grouped into broad categories to simplify interpretation. Pie charts for each term show the modes of gene duplication inferred for genes annotated with these terms (DSD, WGD, TRD, PD, TD; colour scheme follows a, with unclassified in grey) "meiosis I" (Figure 7a,b; Tables S13 and S14). Patterns of functional enrichment of the PSGs also highlight several significant BP GO terms probably connected to Arctic adaptation, including "vernalization response", "drought recovery", "short-day photoperiodism", and "oxidation-reduction process" (Figure 7b; Table S14). We also found four PSGs associated with meiosis I, including two D. nivalis homologues to A. thaliana ZYP1A, which is one of three synaptonemal complex transverse filament proteins whose function is disrupted by temperature stress (Bomblies et al., 2015). These results provide evidence for the likely adaptive evolution of core meiosis genes reflected both in EGFs ( Figure 6b) and in positive selection acting on specific components of the synaptonemal complex.

DATA AVA I L A B I L I T Y S TAT E M E N T
The raw data (shotgun sequence data, MinION long reads, Chicago