Lineage‐specific adaptation to climate involves flowering time in North American Arabidopsis lyrata

Adaptation to local climatic conditions is commonly found within species, but whether it involves the same intraspecific genomic variants is unknown. We studied this question in North American Arabidopsis lyrata, whose current distribution is shaped by post‐glacial range expansion from two refugia, resulting in two distinct genetic clusters covering comparable climatic gradients. Using pooled whole‐genome sequence data of 41 outcrossing populations, we identified loci associated with three niche‐determining climatic variables in the two clusters and compared these outliers. Little evidence was found for parallelism in climate adaptation for single nucleotide polymorphisms (SNPs) and for genes with an accumulation of outlier SNPs. Significantly increased selection coefficients supported them as candidates of climate adaptation. However, the fraction of gene ontology (GO) terms shared between clusters was higher compared to outlier SNPs and outlier genes, suggesting that selection acts on similar pathways but not necessarily the same genes. Enriched GO terms involved responses to abiotic and biotic stress, circadian rhythm and development, with flower development and reproduction being among the most frequently detected. In line with GO enrichment, regulators of flowering time were detected as outlier genes. Our results suggest that while adaptation to environmental gradients on the genomic level are lineage‐specific in A. lyrata, similar biological processes seem to be involved. Differential loss of standing genetic variation, probably driven by genetic drift, can in part account for the lack of parallel evolution on the genomic level.

indicate whether several routes of genomic change can lead to adaptation to similar climatic conditions or whether few options exist.
Further analyses should target specific aspects of constraints, such as standing genetic variation, and the interference of linkage and demographic history.
If two independent lineages adapt to similar environments using the same genetic variants or genes, this is termed parallel evolution (Arendt & Reznick, 2008). The process is also called convergent evolution if the same genetic variants or genes are involved in adaptation but species and lineages are highly unrelated. In principle, parallel evolution on the level of the phenotype and genotype can be due to new mutations or selection on standing genetic variation (Chan et al., 2010), while introgression of adaptive variants from another lineage only mimics parallel adaptation. Whether or not evolution is repeatable has implications for adaptability under changing conditions, and knowledge of the genomic basis of this adaptation allows for predictions into the future. This has led to increasing interest in parallel evolution over the past years, and the development of new methods to quantify such parallelism (Yeaman, Gerstein, Hodgins, & Whitlock, 2018). Additionally, parallel detection of outliers in independent lineages strongly indicates adaptive value of the respective variants (Barrett & Hoekstra, 2011).
Theory considering only new mutations predicts that the chance of parallel evolution depends on the number of potential beneficial mutations (Orr, 2005). The chance of parallel evolution from standing genetic variation depends on the initial allele frequency, effect size and population history (MacPherson & Nuismer, 2017). The higher the initial allele frequency is, the greater the chance for parallel fixation of alleles experiencing similar selection. Likewise, the higher the effect size of the allele in question, the more likely the allele will become fixed in both lineages independently under similar selection regimes (MacPherson & Nuismer, 2017). Population demographic history also plays an important role because the larger the effective population size is, the more likely parallelism becomes, as random genetic drift is less likely to drive an allele to extinction (MacPherson & Nuismer, 2017). Finally, strong selection may also increase the chance of parallel adaptation if it acts in the same direction (MacPherson & Nuismer, 2017). Linked selection may further lead to the coupling of coadapted loci, increasing the regions under seletion (Bierne, Welch, Loire, Bonhomme, & David, 2011;Butlin & Smadja, 2018). Adaptation from standing genetic variation has the advantage that adaptive alleles are already present in the populations, and adaptation can occur much more rapidly (Barrett & Schluter, 2008). In summary, parallelism on the level of genetic variants is expected when the number of variants increasing fitness is low, their frequency or effect size is large, selection is strong, and population history allows for maintenance of these variants. In contrast, its lack is expected when many variants carry small fitness benefits or when genetic diversity is lost through genetic drift.
Molecular studies have shown that parallel adaptation to environmental conditions is indeed not rare in nature. Typically, parallelism is assessed based on genomic data, such as through genome-wide association studies and the subsequent study of patterns of selection on the nucleotide level or in candidate genes associated with environmental factors being compared between different evolutionary lineages. Parallel adaptation from new mutations is most commonly found in experimental evolution in bacteria or viruses, and while some studies have found extensive parallelism at the nucleotide level (Wichman, 1999), parallelism at the level of genes seems much more common even in these organisms with small genomes (Bailey, Rodrigue, & Kassen, 2015;Barrick et al., 2009;Woods, Schneider, Winkworth, Riley, & Lenski, 2006). Also, selection from standing genetic variation is often detected as the source of rapid adaptation to environmental change, such as in resistance to drought in Arabidopsis thaliana (Exposito-Alonso et al., 2018). Within the same species, adaptation to the same environment might even be based on new mutations as well as on standing genetic variation, such as in the threespine stickleback, in which adaptation to freshwater has been facilitated by parallel adaptation from independent mutations (Chan et al., 2010) as well as from standing genetic variation (Colosimo et al., 2005;Jones et al., 2012). Adaptation to climatic conditions is probably complex because of the potentially diverse interplay of different climatic variables such as temperature, precipitation or seasonality across the distribution range. Nevertheless, parallelism in climate adaptation (on the level of genes) has been shown for species as distantly related as pine and spruce. Although these species diverged over 100 million years ago, patterns of convergence were identified in 47 genes, or between 10% and 18% of top candidate genes for adaptation to temperature and cold hardiness (Yeaman et al., 2016). In contrast, within black spruce, adaptation to climatic conditions in Pleistocene lineages was found to be often lineage-specific (Prunier, Gérardi, Laroche, Beaulieu, & Bousquet, 2012). These contrasting results stress the need to better understand the conditions that lead to parallelism or its lack on the different levels of organismal organization.
Here, we studied parallelism in climate adaptation in the shortlived perennial plant Arabidopsis lyrata subsp. lyrata (A. lyrata from here on) of eastern North America. It is a close relative of the model plant A. thaliana (Hohmann et al., 2014;Novikova et al., 2016) and has become a model species itself (Bomblies & Weigel, 2007). Previous research identified two distinct genetic clusters in A. lyrata by microsatellite analysis and plastid data (Foxe et al., 2010;Griffin & Willi, 2014;Willi & Määttänen, 2010), which have been confirmed by pooled whole-genome sequence data more recently (Willi, Fracassetti, Zoller, & Van Buskirk, 2018). One cluster is located in the Midwest from Missouri to the Great Lakes region (subsequently called western cluster), and the other in the east, mainly in the Appalachians from North Carolina to New York State (eastern cluster). Relatedness trees suggested that they each had their own refuge area during the last glaciation cycle close to the line of the maximum extent of the ice sheet and underwent post-glacial range expansion northwards but also east-and westwards, resulting in an admixture zone at Lake Erie (Griffin & Willi, 2014;Willi et al., 2018). Due to this range expansion, both ancestral clusters nowadays occur over fairly wide latitudinal and longitudinal ranges, with very diverse climatic conditions within ranges.
The ecological niche of A. lyrata across its distribution range has been studied recently, and the most important factors for determining the niche have been identified (Lee-Yaw, Fracassetti, & Willi, 2018). These were the average minimum temperature in early spring (March and April), precipitation of the wettest quarter, and moisture availability (Priestley-Taylor coefficient). Minimum temperature contributed most to the niche, while the other two variables had moderate importance.
In this study, we used sequencing data covering the complete genome of A. lyrata and the complete distribution range to address the following questions: (a) Which genes and gene ontology (GO) terms are involved in adaptation to niche-determining climatic factors? (b) Did the two distinct lineages adapt to climatic conditions in parallel?
If so, are the same genes or even the same SNPs involved in adaptation to the environment in the two postglacial lineages of A. lyrata, or is parallelism restricted to GO terms? (c) Could standing genetic variation, or the lack thereof, account for the observed patterns of (non-)parallelism?

| Plant material and sampling strategy
We investigated parallel adaptation to climate in North American  Table S1 in Appendix S1). The eastern genetic cluster was represented with 19 populations and the western cluster with 22 populations (Figure 1a).
Selfing or mixed-mating populations and populations showing signs of admixture in previous studies (Griffin & Willi, 2014;Willi & Määttänen, 2010) were excluded. For each population, the DNA from 25 individuals was pooled before library preparation and each barcoded library was sequenced on four lanes, using one quarter of each respective lane to account for potential technical artifacts between runs. Paired-end sequencing was conducted on an Illumina HiSeq 2000 with read length 100 bp (Fracassetti, Griffin, & Willi, 2015). Information on the number of reads for each population is provided in Table S2 in Appendix S1.

| Processing of pooled sequence data
We used a similar bioinformatics pipeline as described by Lucek, Hohmann, and Willi (2019). Trimming was performed for each library and lane separately using the Perl script trim-fastq.pl from the software package PoPoolation (Kofler et al., 2011) with a minimum base quality of 28 and a minimum length of 84 bp. All reads were mapped to a reference in BWA-MEM 0.7.13 (Li, 2013).
Reference was the genome sequence of A. lyrata (Hu et al., 2011) with the updated annotation (Rawat et al., 2015) and the following modifications: Positions 8,746,835,273 and 9,128,212,301 on scaffold_2 were masked due to their high similarity with the plastid genome sequence, which may have been caused by erroneous assembly of the reference genome. Additionally, plastid and mitochondrial genome sequences of A. thaliana were included (GenBank/ENA accessions NC_000932 and NC_001284, respectively). For each population, all reads sequenced on different lanes were then merged, and only those mapping to scaffolds 1-8 were retained. Duplicate reads were removed using Picard 2.1.1 (http://broad insti tute.github.io/picar d/), mappings were filtered to retain only properly mapped reads with mapping quality >20, and finally a single mpileup file was generated in SAMtools 1.3.1 (Li et al., 2009 was required to make a variant call. Indels were identified using Vcftools 0.1.14 (Danecek et al., 2011), and nucleotides were filtered for the following additional parameters: maximum read depth of 500 and minimum genotype quality of 20. Only biallelic sites were considered; indels and adjacent regions (5 bp in both directions) were removed. Additionally, sites in previously identified repetitive regions (Fracassetti et al., 2015) were removed using BEDTools 2.26.0 (Quinlan & Hall, 2010). The two sets of populations were then selected. Strand bias for both reference and alternative allele was removed in R 3.4.1 (R Core Team, 2016) on a population basis. Sites with a strand bias >90% were set to missing for each population, and subsequently the data sets were filtered again for sites with ≥50% missing data. Finally, our filtering resulted in a total number of 1,546,883 (east) and 1,371,857 (west) sites. A total of 750,998 sites were shared between clusters (48.55% of the eastern and 54.74% of the western cluster).
Nonsynonymous sites were identified using SnpEff 4.3 (Cingolani et al., 2012). Information on genomic coverage is provided in Table   S2 in Appendix S1.

| Selection of climate variables and environmental association analysis
Eight environmental variables linked to climate, topography and vegetation had previously been used to model the ecological niche of A. lyrata in North America (Lee-Yaw et al., 2018). Of these, three climatic variables were identified to play a major role in determining the niche, i.e., showing a high percentage contribution to the model and/or high permutation importance in niche modelling (Lee-Yaw et al., 2018). We used these three variables in our genome-wide association study (GWAS), namely minimum temperature in early spring, i.e., the months of March and April (T min spring ), precipitation of the wettest quarter (P wettest quarter ), and the Priestley-Taylor coefficient (Moisture). The latter is the ratio of annual actual evapotranspiration over annual potential evapotranspiration, a measure of moisture availability. We extracted values for T min spring and P wettest quarter from the WorldClim database version 1.4 (http://world clim.org/; Hijmans, Cameron, Parra, Jones, and Jarvis (2005); containing data for ) and for Moisture from the Consortium for Spatial Information (http://www.cgiarcsi.org/data; Trabucco and Zomer (2010);data for 1950-2000.
Climatic variables are provided in Table S1 in Appendix S1.
Association of the three selected variables with the SNP sets of each cluster was tested by Environmental Association Analysis in BayPass version 2.1 (Gautier, 2015). In a first run, the BayPass core model was used with default parameters to generate the population covariance matrix from a set of 50,000 randomly selected sites, which was subsequently used to correct for population structure. Since population structure due to recolonization and similarity in climate where populations occur can show congruent patterns, we also assessed whether the population covariance matrices correlated with any of the climatic variables or latitude by Mantel tests (Mantel, 1967), using 10,000 bootstrap iterations to assess significance. The auxiliary covariate model was then run 10 times for each data set and climatic variable. Twenty pilot runs with 1,000 steps each were used for every independent run,

| Definition of outliers, GO enrichment analysis, and evaluation of shared SNPs, genes and GO terms
Parallel adaptation was studied at the 750,998 shared sites on the level of SNPs, genes and gene ontology (GO) terms, and we additionally aimed to narrow down the numbers of the latter two to outlier genes and enriched GO terms. To identify outlier SNPs, we used two different approaches: (a) We defined outlier SNPs as those SNPs whose average Bayes Factor (BF) over 10 runs was among the highest 1% for each cluster and climatic variable (7,510 SNPs, hereafter "BF outlier SNPs"). (b) Outlier SNPs were detected by a Hidden Markov Model (HMM) on mean BF values from 10 runs following Soria-Carrasco et al. (2014) (hereafter "HMM outlier SNPs"). Specifically, we modelled the distribution of BF assuming two states (low vs. high) with the r package HiddenMarkov (Harte, 2017). We then estimated the mean BF for each state and the transition rate matrix among states from the data with the Baum-Welch algorithm (Baum, Petrie, Soules, & Weiss, 1970).
Finally, we used the Viterbi algorithm to predict the most likely sequence of hidden states from the data and estimated param- name and description were obtained from TAIR (The Arabidopsis Information Resource, www.arabi dopsis.org). GO enrichment analysis was conducted using Snp2go 1.0.5 (Szkiba, Kapun, von Haeseler, & Gallach, 2014) for each cluster, climatic variable and SNP detection approach using a False Discovery Rate (FDR) of 0.05. This method is based directly on outlier SNPs rather than genes and accounts for clustering of outliers with respect to position and coverage of the genome, and for clustering of similarly annotated genes. The results of GO enrichment were visualized in REVIGO (Supek, Bošnjak, Škunca, & Šmuc, 2011) using the A. thaliana database with similarity set to 0.7 to discard redundant GO terms. For all analyses of GO terms, we restricted our analyses to terms of biological processes, and we report only outlier genes and enriched GO terms detected using both SNP outlier detection approaches.
To evaluate whether the number of BF outlier SNPs shared between eastern and western cluster was higher or lower than expected, we used a resampling approach. We randomly sampled the number of outlier SNPs (7,510) from all SNPs (750,998) separately for the eastern and western cluster. We then calculated the overlap for a total of 10,000 replicates and compared the results with observed values.

| Signatures of directional selection in outlier genes
We estimated the strength of selection on SNPs in SelEstim 1.1.7 (Vitalis, Gautier, Dawson, & Beaumont, 2014), which implements an infinite island model to detect directional selection. The eastern and western SNP data sets were analyzed separately, with five pilot runs of 1,000 iterations each, followed by 10,000 iterations of sampling every 20 steps, of which the first 1,000 were discarded as burnin.
The locus-specific selection coefficient δ j was used as a measure of the strength of selection. To address whether outlier genes had higher selection coefficients than other genes we calculated mean selection coefficients for all genes with at least 10 SNPs for both clusters. We then tested for differences in mean selection coefficients within each cluster between (a) outlier genes, (b) those genes detected as outlier genes only in the respective other cluster and (c) all nonoutlier genes using ANOVA with a Tukey post hoc comparison in R.
To test whether outlier genes had a signature of elevated linkage disequilibrium (LD), we estimated LD for each population and gene from our read data. Because we worked with pool-seq data, we were restricted to assess LD within paired sequence reads (Feder, Petrov, & Bergland, 2012). Given the distribution of insert sizes, we focused on LD within 200 bp distance. We calculated the squared correlation coefficient r 2 between pairs of SNPs using the direct estimate method of ldx (Feder et al., 2012). We only considered sites with an intersecting read depth greater than five and a minor allele frequency >0.15 at either locus (Feder et al., 2012;Tuttle et al., 2016).
We estimated LD weighted for the distance between SNPs for each gene and population and tested for significant differences in LD between outlier genes and all other genes within each cluster using a linear mixed-effects model implemented in the lMe4 package (Bates, Mächler, Bolker, & Walker, 2015). We used outlier class as fixed effect and population as random effect. This analysis was restricted to genes with at least 10 SNPs in the data set. A table of genes and their respective lengths, number of SNPs and relative SNP content is given in Appendix S4.

| Analysis of standing genetic variation in core and edge populations
In the case of nonparallelism in genomic patterns of adaptation, genetic drift could be a likely reason. Arabidopsis lyrata was shown to have been exposed to strong genetic drift and the loss of genomewide diversity along its expansion routes , and had possibly undergone bottlenecks in its refugia. Therefore, the species may have experienced a loss in the potential for parallel adaptation from standing genetic variation, which could account for lineagespecific adaptation. To assess whether the presence of standing genetic variation could have allowed for selection of the same loci in the expanding populations of both clusters, or whether a lack of standing genetic variation reduced the potential for parallelism, we focused on the core populations, from which recolonization occurred. Comparisons involved three populations for each core area (MD4, PA3 and NY1 in the east and WI1, WI2 and WI3 in the west, indicated in Figure 1), the reconstructed ancestral area from which the two lineages expanded postglacially . We evaluated standing genetic variation in each cluster by calculating mean minor allele frequency (MAF) for the three respective populations for outlier SNPs (BF and HMM) detected in the other cluster only.
Note that while our cluster-wide filtering strategy only included polymorphic sites (with lowest per-population detection frequency of 0.03), alleles can be fixed across the three selected populations. An absence of polymorphism in the core area would indicate that there may not have been standing genetic variation from which selection could occur at the onset of post-glacial range expansion, thus limiting the potential for parallel adaptation. We furthermore compared the distribution of minor allele frequencies of all nonshared outlier SNPs using Wilcoxon tests for each climatic variable.
Fixation of alleles can occur through several processes, among them selection and drift. We tested whether nonsynonymous outlier SNPs were fixed more often than silent SNPs at core populations and the edge of the distribution range in both clusters. In addition to the core populations named above, three edge populations were chosen for each cluster (NC2, VA1 and NJ1 in the east and MO1, IL1 and ON7 in the west, see Figure 1), and minor allele frequencies were assessed. The observed fraction of fixed nonsynonymous outlier SNP sites (MAF = 0.0) was compared to a distribution generated through randomly sampling MAF (equivalent to the number of outlier SNPs of the respective data set) of silent nonoutlier sites using 10,000 replicates.

| Population structure and climate variables within the study area
Two well separated genetic clusters were confirmed by PCA based on all overlapping SNPs (Figure 1b). The investigated climatic variables were not strongly correlated with each other or with latitude and longitude (Pearson's correlation coefficient ρ between −0.7 and 0.72), with the exception of minimum temperature in early spring, which showed a strong correlation with latitude (ρ ≥ 0.8), particularly in the western cluster (ρ = 0.98; Figure 2). Furthermore, none of the correlations between population structure based on BayPass covariance matrices and climatic variables were significant (see Table S3 in Appendix S1). For precipitation of the wettest quarter and the Priestley-Taylor coefficient (Moisture), parameter ranges were very similar in the eastern and western cluster (Table   S1 in Appendix S1, Figure 2). However, the range of minimum temperature in early spring, T min spring , was larger in the western cluster (−8.2 to +3.2°C) compared to the eastern cluster −2.8 to +4.6°C); temperature declined stronger towards the northern edge of the distribution in the west, in southern Canada.

| Environmental association analysis, outliers and GO term enrichment analysis
Results reported here concern the overlapping SNPs only. The 750,998 SNPs detected in both clusters covered large parts of the genome of Arabidopsis lyrata, including 27,023 of 32,069 annotated genes (84.3%). Approximately two thirds of the SNPs (498,305) were located in genes. The outlier SNPs were distributed across all eight chromosomes of the genome (Figures S1-S2 in Appendix S1).
Of those, between 65.6% and 72.9% were located within genes for the twelve analyses (three climatic variables, two genetic clusters and two approaches of outlier detection) ( Table S4 in Appendix   S1). A total of 15.23% of SNPs were nonsynonymous, as were between 12.9% and 18.0% of outlier SNPs (Table S4 in Appendix S1).
This lack of enrichment of nonsynonymous SNPs may be attributed to our strict filtering approaches, which may have led to the identification of linked rather than adaptive SNPs as outliers. The highest amounts of outlier SNPs located in genes and nonsynonymous SNPs (both absolute and relative to the percentage of SNPs located in genes) were detected in the western cluster.
We further identified between 16 and 39 outlier genes for each cluster and climatic variable, and by considering only overlapping outlier genes of the two outlier detection approaches ( Figure S3 in Appendix S1). Notably, outlier genes using the HMM approach were almost a perfect subset of those detected using BF (between 51 and 65 genes), with either one or no outlier gene detected with HMM that was not also found with BF. Among the outliers were genes that had previously been identified to either play a direct role in adaptation to climate, or to be associated with traits and pathways potentially involved in climate adaptation (see Table 1 and Section 4).

| Evaluation of shared SNPs, genes and GO terms
To detect the amount of parallelism in climate adaptation between the eastern and western cluster, outlier SNPs detected in GWAS from both clusters based on the overlapping SNP data set of 750,998 SNPs were compared. Very few (0.05%-0.65%) of the outlier SNPs were shared between clusters for the three climatic variables (BF outliers: 53 SNPs for T min spring , 61 for P wettest quarter and 97 for moisture; HMM outliers: six SNPs for T min spring and Moisture, three SNPs for P wettest quarter ; see Table S4, Figure S7 in Appendix S1). We compared the values from BF outlier SNPs with the expected number of shared SNPs determined by using a resampling approach. Observed values were marginally lower than expected by chance for T min spring , marginally higher than expected for Moisture, and in the range of random expectation for P wettest quarter (Figure 3). None of the outlier genes were shared between east and west. In contrast, multiple GO terms were shared between clusters (T min spring : six GO terms/3.9% of enriched GO terms detected for this climate variable, P wettest quarter : 7/5.07%, Moisture: 17/19.54%). These were mostly related to flower development, DNA replication, cell cycle and metabolism (Appendix S2).
Additionally, we counted the number of overlaps between outliers associated with the three climatic factors within each genetic cluster, as detecting the same outliers in GWAS runs of multiple cli-

matic factors suggests more general responses involved in adapta-
tion. The number of outlier SNPs shared between climatic variables was low and reflected the magnitude of correlation detected between variables. In the eastern cluster, where correlation between Moisture and P wettest quarter was higher than in the western cluster (Pearson's correlation coefficient ρ = 0.71), 775 BF and 113 HMM outlier SNPs, representing 3.6% and 1.51% of the respective total number of outlier SNPs, were shared between those two variables ( Figure S7 in Appendix S1). In contrast, correlation between T min spring and Moisture was higher in the west with ρ = 0.52, and 602 BF outlier SNPs and 170 HMM outlier SNPs (2.8% and 1.83%) were shared between those two variables in the western cluster ( Figure   S7 in Appendix S1). No overlap in outlier SNPs was detected among all three climatic variables in the east, but 34 BF and two HMM outlier SNPs were shared among all variables in the west. No outlier genes were shared between all three variables in both clusters, and only two were shared between two variables in each cluster (P wettest quarter and Moisture in the eastern cluster and T min spring and P wettest quarter in the western cluster. For enriched GO terms, a similar pattern was observed, where climatic variables having a higher correlation in the respective cluster also showed higher amounts of shared outliers, however, the relative proportions were higher (5.44% shared between Moisture and T min spring in the east and 13.27% shared between T min spring and P wettest quarter in the west). While no GO terms were shared between all three variables in the east, nine GO terms

| Strength of selection and linkage disequilibrium
To detect whether outlier genes experienced stronger selection than other genes, we analyzed mean selection coefficients and LD values per gene (Figure 4). Mean selection coefficients (mean ± SD: 10.60 ± 9.53 for outliers in the eastern cluster, 8.41 ± 5.60 for outliers in the western cluster) across outlier genes were significantly higher (Tukey post hoc comparison, p < .001) than in nonoutlier genes (5.74 ± 3.12 and 6.26 ± 3.53, respectively). The same pattern was observed for LD (mean ± SE: 0.69 ± 0.02 for outliers in the eastern cluster and 0.74 ± 0.02 for outliers in the western cluster compared to 0.59 ± 0.02 and 0.66 ± 0.02 for nonoutlier genes in east and west, respectively). However, LD was also significantly (p < .001) elevated in outlier genes detected in the respective other cluster (0.69 ± 0.02 for outlier genes from the western cluster in the eastern data set and 0.73 ± 0.02 for outlier genes from the eastern cluster in the western data set), indicating that outlier genes were generally located in genomic regions of elevated LD. This was not the case for mean selection coefficient. In the eastern data set, mean selection coefficients for outlier genes of the western cluster (6.27 ± 4.03)

TA B L E 1 Top candidate genes for climate adaptation in Arabidopsis lyrata
were comparable to nonoutlier genes (5.74 ± 3.12), and significantly lower than those of outlier genes in the eastern cluster, indicating that the detected outliers are under stronger selection than other genes. Interestingly, while FLC was not among the genes included in this analysis, as it contained only six polymorphic sites shared by both clusters, mean selection coefficient of this gene was high in the east, where it was detected as an outlier, and low in the west (19.87 and 3.82, respectively).

| Differential loss of standing genetic variation
We compared minor allele frequencies (MAF) in the core of the distri- for all climatic variables), with more low frequency alleles in the western cluster ( Figure 5). Fixation among nonsynonymous outlier SNPs was between 0.048 and 0.26 across the two clusters and core or edge of the distribution ( Figure S8 in Appendix S1). The highest value was reached at edge populations of the western cluster, while the lowest was observed in western core populations. Compared to fixation levels at silent nonoutlier sites, fixation in nonsynonymous outliers was either higher (eastern core, western edge) or within the range of observation (eastern edge, western core). Notably, fixation levels of silent nonoutliers were lower in core populations than in edge populations for both clusters, and lower in the east than the west.

| Genes and gene ontology terms of adaptation to climatic factors
A first aim of our study was to detect genes and biological processes involved in climate adaptation. Some of the outlier genes found are prime candidates of climate adaptation (Table 1) (Table 1). Another gene of this pathway, ZAT10, was detected as a candidate for local adaptation in Scandinavian Arabidopsis lyrata subsp.
Multiple studies have shown its crucial role for adaptation to climate along latitudinal and altitudinal clines in A. thaliana, together with its regulator FRI (FRIGIDA) and an epistatic interaction between the two F I G U R E 3 Expected and observed number of outlier SNPs shared between genetic clusters of Arabidopsis lyrata. Expectation was calculated using a random resampling approach with 10,000 replicates and is indicated in grey (the two 2.5% tails of the distribution in light red). Observed values are indicated in blue (T min spring ), green (P wettest quarter ) and orange (Moisture) genes (e.g., Koornneef, Alonso-Blanco, Peeters, & Soppe, 1998;Korves et al., 2007;Méndez-Vigo, Pico, Ramiro, Martinez-Zapater, & Alonso-Blanco, 2011;Shindo et al., 2005;Stinchcombe et al., 2004). Regulators of flowering time were also found to be associated with precipitation in the eastern cluster. We detected PDP2 (PWWP DOMAIN PROTEIN 2) and CIB1 (CRYPTOCHROME-INTERACTING BASIC-HELIX-LOOP-HELIX 1), which are involved in the autonomous and photoperiod pathway of flowering time regulation, respectively Zhou et al., 2018). Signatures of selection and local adaptation have also been detected in different genes from these two pathways in A. lyrata subsp.

Number of shared SNPs
petraea (Mattila et al., 2016). Early flowering was shown to be an effective strategy of adaptation to coping with drought, via drought avoidance (Franks & Weis, 2008).
Top candidate genes of climate adaptation in the western cluster were not related to the well known pathways of flowering time; however, two outlier genes detected for minimum temperature in early spring were involved in other aspects of reproduction (pollen tube growth and floral organ initiation; see Table 1), and two genes were involved in leaf development (cell elongation and cuticle development). Common garden experiments had shown latitudinal clines for plant size and reproductive development in A. lyrata, with plants from the north growing larger rosettes under benign temperature conditions and having a higher propensity to flower early, independent of genetic cluster (Paccard, Fruleux, & Willi, 2014;Wos & Willi, 2015). Early spring temperature strongly correlates with latitude, particularly in the western cluster (Figure 2), and thus our GWAS may have detected the genetic basis for the observed phenotypes, making our outlier genes candidates for future studies. Particularly the fair number of flowering-related genes in our outliers in east and west suggests that pathways of reproductive phenology are important in climate adaptation -both along temperature and precipitation gradients. Further candidates of climate adaptation, particularly to precipitation and water availability, were genes involved in (thermal) tolerance, stomatal differentiation and water-use efficiency.
In a complex climatic landscape involving multiple stress factors, adaptation to the local combination of such factors are thus facilitated by modulating different parts of the pathways responsible for physiological or developmental responses.
Furthermore, many of the enriched gene ontology (GO) terms were related to transcription or metabolic processes (Appendix S2).
Transcription factors, kinases and genes involved in hormone signaling had been identified to probably influence stress responses to multiple biotic and abiotic factors and facilitate crosstalk between them (Fujita et al., 2006). Other significant terms included flower development and shoot development, processes which had been shown to be important determinants of life history traits in two populations of A. lyrata originating from different climatic backgrounds (Remington, Figueroa, & Rane, 2015). Generally, biological processes involved in adaptation to environment seem to be comparable between different species of the F I G U R E 4 Selection and linkage in outlier genes. Mean strength of selection and LD was calculated per gene for all genes with at least 10 SNPs present. Boxplots for three groups per cluster are shown: Outlier genes, genes that were detected as outliers in the respective other cluster and all other (nonoutliers) genes. Eastern cluster is indicated in blue, western cluster in purple in both panels. Significance within clusters was calculated using linear mixed effect models; and significant differences are indicated with bars above the boxplots (p-value < .001, ***; < .01, **) genus Arabidopsis. In A. thaliana, a comprehensive study across large parts of the native distribution identified biological processes enriched for various climate variables (Hancock et al., 2011). Many of these were related to development, transcription, DNA repair and metabolism and were also identified in our study. In Arabidopsis halleri, a closer relative of A. lyrata, 16 GO terms associated with adaptation to various climatic and topographic parameters were identified (Fischer et al., 2013), six of which (e.g., aging, DNA methylation, chromatin silencing, protein autophosphorylation) were also detected in our analyses. Interestingly, comparing the underlying genes in A. halleri with the outliers we found in A. lyrata did not yield any shared genes despite the apparent similarities in biological processes. The same pattern was observed comparing our outlier genes to those related to local adaptation in Scandinavian A. lyrata subsp. petraea (Mattila et al., 2017). Other studies focusing on particular stress responses also identified similar GO terms, although not necessarily in response to similar climate variables: Three out of 65 terms enriched in the eastern cluster in response to moisture availability, linked to circadian rhythm, response to reactive oxygen species and red or far-red light signalling, were also found to be among the most significantly enriched GO terms involved in freezing tolerance in A. thaliana (Horton, Willems, Sasaki, Koornneef, & Nordborg, 2016), but also in this case the outlier genes were not shared. Our results thus suggest that adaptation to climate in Arabidopsis often seems to involve similar biological processes, despite the different habitats the species naturally occur in. Whether these results are transferable to other plant species outside this clade is yet to be shown.

| Parallelism in climate adaptation
Our second goal was to determine the extent to which the same SNPs, genes or GO terms were associated with climate in both lineages, interpreted as evidence for parallel evolution. We detected only few shared SNPs associated with climatic variables between the two clusters, and hardly more than expected by chance alone (Figure 3). The level of shared genes was even lower, with none shared between clusters, and only two shared between climates within each cluster. Nevertheless, selection was much higher in outlier compared to nonoutlier genes, corroborating their importance for adaptation. The reduced level of selection in the eastern data set for outlier genes detected in the western cluster further emphasized the importance of lineage-specific adaptation to climate in A. lyrata on the genomic level.
In contrast to the low number of shared outlier SNPs and genes, parallelism was higher at the level of GO terms, indicating that despite the prevailing lineage-specific adaptation on the genomic level, similar biological processes were involved in adaptation across the species range. An analogous pattern was found in experimental evolution studies in bacteria, where environmental adaptation was also shaped hierarchically with few shared nucleotide changes but many similar pathways affected (Bailey et al., 2015). In A. halleri, adaptation to environment in the Swiss Alps was found to be mostly local even on a small geographic scale, putatively due to the complex and heterogeneous environment in their study area (Rellstab et al., 2017).

F I G U R E 5
Standing genetic variation in core populations for the three climate variables. To assess whether a lack of standing genetic variation in the core populations (from which the species expanded post-glacially) could have led to the observed patterns of nonparallelism in outlier SNPs, we calculated minor allele frequency (in classes of 0.05) at outlier loci from the respective other genetic cluster for the three climate variables separately. This analysis was restricted to the three core populations of the target cluster; mean frequency across the three population in the core was calculated based on number of SNPs with no missing data. T min spring is indicated in blue, P wettest quarter in green and Moisture in red orange in both panels. Upper bounds of frequency classes are given below the barplots

East West
We also detected outliers shared between GWAS runs for the different climatic parameters but within clusters, suggesting that adaptation to different climatic factors also often involves the same biological processes, but different genes/SNPs. In line with our top candidate genes, floral whorl development and floral organ development were the two GO terms detected in five out of six climate-cluster combinations. Overall, patterns of outlier SNPs shared between climatic variables within clusters were consistent with the strength of the correlations between climatic variables. More outlier SNPs were shared between the two precipitation-related variables in the eastern cluster, where these two were more strongly correlated with each other, while in the western cluster the number of shared SNPs between precipitation of the wettest quarter and minimum temperature in early spring was higher, as was their correlation (Figure 2, Figure S7 in Appendix S1). The pattern was similar for outlier genes, although the proportion of shared genes was relatively low, indicating that adaptation is not only lineage-specific, but also environment-specific to some extent.

| Reasons for a lack of parallelism on the level of genetic variants and genes
The third goal of our study was to address whether the presence or Pleistocene with its climate oscillations posed major challenges for many species, as they had to repeatedly move into new areas, adapt to new environments and survive in refuge areas during glaciation periods (Hewitt, 2000). Isolation in refuge areas and subsequent recolonization often reduced population sizes and standing genetic variation available for adaptation, as for example found for North American A. lyrata . While standing genetic variation can be the basis for parallel adaptation to similar environmental gradients (Jones et al., 2012), allele frequencies may have been perturbed during demographic bottlenecks imposed by the glacial history, causing lineage-specific patterns of adaptation. This seems to probably have happened in A. lyrata: Populations in the western cluster were shown to have started from a genetically less diverse refuge area and expanded over longer distances since the withdrawal of ice . Concordantly, we detected more low frequency alleles in the core populations of the western cluster in those SNPs identified as outliers in the eastern cluster, which could reflect stronger bottleneck effects in the west. This pattern was also observed in the higher amounts of fixed sites in the western cluster in both core and edge populations. In particular at the western distribution edge, a high proportion (~26%) of nonsynonymous outlier SNPs were fixed, probably through drift, leading to a loss of genetic variation along the expansion route, which was particularly long in the west . In addition, strong selection may have led to an excess of fixed nonsynonymous outliers compared to silent nonoutliers. Low genetic diversity at candidate loci in recently colonized areas was also found in A. lyrata subsp. petraea in Scandinavia, where it was associated with selective sweeps (Mattila et al., 2017). These patterns of selection may be associated with recent colonization and the required adaptation to new environments.
In core populations, however, where genetic diversity is higher, more outliers in the east were fixed compared to nonoutliers, potentially due to selection. Interestingly, while we detected higher selection specifically in outlier genes, a different pattern was detected for LD, suggesting that the high LD we detected was not generally a result of selection. Outlier genes might be located in regions of high LD. A similar pattern was found in A. lyrata subsp. petraea, where outliers related to local adaptation were located in regions of low recombination rate and high gene density (Hämälä & Savolainen, 2019).
Lower recombination rates may contribute to population divergence through reducing the effects of gene flow (e.g., Renaut et al., 2013) and thus be important for local adaptation through coupling of coadapted alleles (Bierne et al., 2011;Butlin & Smadja, 2018).
The chance for parallel evolution from new mutations is predicted to be low when many beneficial mutations are possible (Orr, 2005). This is likely to be the case in polygenic traits such as adaptation to environment in the complex genomes of eukaryotes and can be explained by redundant pathways leading to adaptation in highly polygenic traits, including multiple SNPs affecting the same gene in a similar way, and several genes involved in the same pathway. Considering the relatively short time for adaptation after the glacial retreat and the highly polygenic nature of adaptation to climate, it seems plausible that many of the alleles affected by adaptation needed to be recruited from standing genetic variation, and the drifting of allele frequencies probably lowered the chances of parallel adaptation on the level of SNPs and genes.
Additionally, some differences in SNPs/genes detected in the two clusters might be the result of differences in climate, such as the differences in the range of climatic variables covered by the populations, and differences in the variance of climatic conditions within sites between east and west. Difference in climatic range was most pronounced for minimum temperature in early spring, as much lower temperatures are reached in the west compared to the east ( Figure 2).
Furthermore, climate is the result of a complex interplay between different climatic variables. Differences in the combination of climatic variables and the resulting selective pressures may have influenced the divergent adaptation in the two areas of A. lyrata. Our results are similar to those found in black spruce in North America. In this species postglacial range expansion combined with climate adaptation was found to be mostly lineage-specific and was hypothesized to be based predominantly on selection from standing genetic variation (Prunier et al., 2012), and the authors speculated that both environmental heterogeneity and stronger drift associated with loss of genetic diversity in one of the lineages could account for lineage-specificity.
In conclusion, using whole-genome pool-seq data covering the entire distribution range of A. lyrata subsp. lyrata in North America, we identified genes and biological processes involved in adaptation to niche-determining climate variables. Gene ontology terms related to flower development were found repeatedly and regulators of flowering time were among our outlier genes, emphasizing the crucial role of the timing of reproduction for local adaptation.
We found little evidence for intraspecific parallel adaptation to a comparable climatic gradient on the level of SNPs and genes between two postglacial lineages. Nevertheless, our results indicate a genetic basis of adaptation to the climatic factors that shape the distribution of this species, i.e., minimum temperature in early spring and two factors depicting water availability. This was most prominently shown by the significantly elevated selection coefficients found for outlier genes, and the fact that similar biological processes were affected in both lineages. We hypothesize that the highly polygenic nature of adaptation to climate and a Pleistocene population history with significant action of genetic drift limited the potential for parallel evolution.