Multiscale landscape genomic models to detect signatures of selection in the alpine plant Biscutella laevigata

Abstract Plant species are known to adapt locally to their environment, particularly in mountainous areas where conditions can vary drastically over short distances. The climate of such landscapes being largely influenced by topography, using fine‐scale models to evaluate environmental heterogeneity may help detecting adaptation to micro‐habitats. Here, we applied a multiscale landscape genomic approach to detect evidence of local adaptation in the alpine plant Biscutella laevigata. The two gene pools identified, experiencing limited gene flow along a 1‐km ridge, were different in regard to several habitat features derived from a very high resolution (VHR) digital elevation model (DEM). A correlative approach detected signatures of selection along environmental gradients such as altitude, wind exposure, and solar radiation, indicating adaptive pressures likely driven by fine‐scale topography. Using a large panel of DEM‐derived variables as ecologically relevant proxies, our results highlighted the critical role of spatial resolution. These high‐resolution multiscale variables indeed indicate that the robustness of associations between genetic loci and environmental features depends on spatial parameters that are poorly documented. We argue that the scale issue is critical in landscape genomics and that multiscale ecological variables are key to improve our understanding of local adaptation in highly heterogeneous landscapes.

investigation of local adaptation in alpine plants. The recent availability of very high resolution (VHR) digital elevation models (DEMs) (<1 m) has made it possible to effectively approximate ecologically meaningful variables with limited fieldwork, offering the type of fine-scale environmental data that are required to assess both the scale of adaptive patterns and the underlying factors in heterogeneous landscapes (Leempoel et al., 2015). DEM-derived variables such as temperature, soil moisture, or solar radiation are easy to compute and have the potential to be widely used as proxies in ecology and evolution (Kozak, Graham, & Wiens, 2008;Leempoel et al., 2015;Wilson & Gallant, 2000). However, such topographic variables have rarely been used in landscape genetics and need to be further evaluated (Leempoel et al., 2017).
It is intuitively expected that the higher resolution of a DEM is likely to produce more accurate results, although it appears that a high amount of details may blur the output signal (Cavazzi, Corstanje, Mayr, Hannam, & Fealy, 2013). Studies in geomorphology have indeed shown that the relationship between DEM variables and physical characteristics of the terrain could only be valid at a specific spatial resolution (Kalbermatten, Van De Ville, Turberg, Tuia, & Joost, 2012;Wilson & Gallant, 2000). In contrast, research in landscape ecology has rarely considered the influence of the spatial resolution of environmental data. The relevance of topographic variables in species distribution models has been regularly reported (Le Roux, Virtanen, & Luoto, 2013;Lefsky, Cohen, Parker, & Harding, 2002;Randin, Vuissoz, Liston, Vittoz, & Guisan, 2009). However, some studies showed substantial improvement of models attributed to finer environmental variables (Camathias, Bergamini, Küchler, Stofer, & Baltensweiler, 2013), whereas others found limited differences (Pradervand, Dubuis, Pellissier, Guisan, & Randin, 2014). Noticeably, DEM-derived variables have rarely been used in Gene-Environment Associations and, to our knowledge, the spatial resolution has never been considered as an influencing parameter, which likely leads to incomplete conclusions on local adaptation (Manel, Poncet, Legendre, Gugerli, & Holderegger, 2010;Parisod & Joost, 2010;Storfer, Murphy, Spear, Holderegger, & Waits, 2010).
In this study, we explored the population structure of the alpine plant Biscutella laevigata and performed correlations between local environmental data and genetic variation. To do so, we used 233 polymorphic AFLP markers and 13 VHR DEM-derived variables, demonstrated as relevant environmental proxies (Leempoel et al., 2015). Our aims were to (i) detect fine-scale population structure, (ii) evaluate to what extent DEM-derived proxies of environmental features are powerful to detect signatures of selection, (iii) assess the impact of their spatial resolution on the detection of signatures of selection. Taking advantage of very high resolution, we thus here appraise and discuss the scale dependency of microhabitat modeling and of signatures of selection.

| Sampling
Biscutella laevigata is a widespread polyploid Brassicaceae species that occurs mostly as small patches across the European Alps (Parisod & Besnard, 2007). This strictly outcrossing, perennial plant has its pollen dispersed by generalist Diptera and Lepidoptera, while seeds disperse through gravity and possibly wind (Parisod & Bonvin, 2008).
The study zone is situated at "les Rochers-de-Naye" (N46°26′00″, E6°58′50″), where a natural hybrid zone between closely related B. laevigata lineages has been documented along a 1.2-km-long ridge at an elevation included between 1,864 and 2,043 m above sea level (Parisod & Christin, 2008). Across the whole populated area, 361 individuals of B. laevigata were selected using a random cluster sampling strategy to represent the spatial distribution of the population. Selected areas of 4 × 4 m, separated by random distances of 0 to 25 m, were subdivided in four 2 × 2 m plots that were sampled when at least five individual plants were present. If less than five individuals were found in any of the four plots, a new area, at least 25 m further along the ridge, was selected. All individuals where georeferenced using a differential GPS offering a horizontal accuracy of c. 2-3 cm and a vertical accuracy of c. 3-4 cm. Their leaves were immediately dried in silica gel for extraction of genomic DNA following a standard DNeasy plant extraction mini kit protocol from Qiagen AG, Switzerland.

| AFLP genotyping, scoring, and error estimation
All individuals were genotyped with amplified fragment length polymorphisms (AFLPs) following Parisod and Christin (2008). Despite limitations inherent to their dominant nature, AFLP loci are widely distributed across the genome and support appropriate genotyping that is hardly outperformed by current high-throughput approaches in polyploids (Mason, 2015). In short, genomic DNA was digested with EcoRI and MseI before ligation of adaptors to perform preselective and selective amplifications. PCR products amplified with FAM, VIC, NED fluorescent dye on the EcoRI primers were pooled with GeneScan 500 LIZ Size ladder and separated the 3730xl DNA analyzer capillary sequencer (Applied Biosystems). Resulting electropherograms were scored between 75 and 500 bp with GENEMAPPER v. 4.0 (Applied Biosystems) using AFLP default peak detection parameters. The scoring was checked manually, and AFLP loci were recorded as present (1) or absent (0) in binary matrices.

| Population structure and gene flow
An issue regularly encountered when studying patterns of genetic variation and local adaptation in plant populations is recent polyploidy (Meyers & Levin, 2006). As polyploid populations strongly violate Hardy-Weinberg expectations, most standard methods in population genetics cannot be applied (Ronfort, Jenczewski, Bataillon, & Rousset, 1998). Furthermore, inferential frameworks accounting for the evolutionary genetics of polyploids must rely on accurate datasets assessing dosage of the multiple alleles at each locus, which still is technically challenging with high-throughput genotyping (Mason, 2015). Accordingly, approaches deprived from population genetics pre-requisites should currently be privileged among the applicable methods to evaluate local adaptation from genetic data. Under such circumstances, the population structure can profitably be described using, for example, the K-mean clustering or principal component analysis, whereas the detection of signatures of selection can be achieved using generalized regressions or mixed models (Parisod & Joost, 2010).
Unbiased inference of population genetic structure was here assessed using K-means clustering, a data partitioning method implemented in the R package Vegan (Dixon, 2003), with the Calinski criterion (Caliński & Harabasz, 2007) to select the most likely number of genetic clusters (Gompert, Lucas, Fordyce, Forister, & Nice, 2010). Accordingly, individual AFLP genotypes were assigned to their genetic cluster using the fuzzy c-means algorithm (Dunn, 1974) implemented in the package "e1071" in R (Meyer, Dimitriadou, Hornik, Leisch, & Weingessel, 2014) with fuzzification parameter optimized at 1.02. Such a genomic cline approach was successfully used to dissect gene flow between polyploid taxa across natural hybrid zones (e.g., Senerchia et al., 2016). After a maximum of 1,000,000 iterations, the outputs of 1,000 independent runs of this algorithm were combined in CLUMPP (Jakobsson & Rosenberg, 2007) using the Greedy search method and 10,000 repeats of random input order. The coefficient of membership to a cluster was provided by CLUMPP as the mean of the independent runs. Such an individual coefficient can be considered as an admixture score estimated without biological assumptions.
Accordingly, individuals were considered as belonging to population A and population B when their coefficient of membership was below 0.2 and above 0.8, respectively. Individuals with intermediate scores were considered as admixed.
Spatial genetic structure was quantified using SPAGeDi (Hardy & Vekemans, 2002), which measures the pairwise relatedness between individuals at increasing distance intervals. The mean relationship coefficient among loci was computed within 20 balanced intervals (i.e., with the same number of pairwise comparisons in each interval), and its significance was assessed with 9,999 permutations between individuals.

| Environmental variables
The acquisition of the VHR DEM used in this study is described into details in Leempoel et al. (2015). Briefly, a LIDAR point cloud was obtained over the study area (helicopter) and transformed into a DEM with a resolution of 0.5 m. The following 13 variables were then derived from this DEM using SAGA GIS (Conrad et al., 2015): northness (Cosine of Aspect, Nor), eastness (Sine of Aspect, Eas), slope, vector ruggedness measure (VRM) (Sappington, Longshore, & Thompson, 2007), total solar radiation in June and December (Ti6, Ti12) (Böhner & Antonić, 2009), positive and negative topographic openness (TOP and TON) (Yokoyama, Shirasawa, & Pike, 2002), sky view factor (SVF) (Häntzschel, Goldberg, & Bernhofer, 2005), wetness index (SWI) (Beven & Kirkby, 1979), flow path length (FPL) (O'Callaghan & Mark, 1984), and wind exposure index (WEX) (Conrad et al., 2015). VRM is a measure for the unevenness of terrain and distinguishes between rocky vs. smooth terrain. TOP and TON express the protection of a focal point from the surrounding relief. It is based on the maximum angle found at zenith (TOP) or at nadir (TON) from the point, over a defined radius. SVF expresses the ratio of the radiation received by a planar surface over the radiation emitted by the entire hemispheric environment. SWI is the logarithm of the ratio between the catchment area and the tangent of slope and quantifies the topographic control of hydrological processes. FPL calculates the upstream or downstream distance along the flow path for each sample. More details can be found in Leempoel et al. (2015).
In order to account for the variability of DEM-derived variables due to spatial resolution in association models, each variable was com-

| Detection of outlier loci
Association models between the presence of genetic markers and the value of DEM-derived variables were processed using generalized linear mixed models (GLMMs) (Bolker et al., 2009;Zuur, Ieno, Walker, Saveliev, & Smith, 2009), which are advantageously independent of any genetic model. GLMMs are used in Gene-Environment Associations studies to account for pseudo-replication due to population structure among samples. In our case, we considered the pixel (i.e., each unit of the DEM grid) to be the random effect instead of the sampling plot or the genetic subpopulation. In fact, with DEMs at different resolutions, individuals often fall within the same pixel (i.e., should be considered as pseudo-replicates) and we therefore used pixels' IDs as the random parameter at each spatial resolution. The coarser the resolution, the more samples are located in the same pixel. Hence, samples are present in 295 pixels at a resolution of 0.5 m, 227 at 1 m, 140 at 2 m, 99 at 4 m, and 66 at 8 m. GLMMs were performed using the R package lme4 (Bates & Maechler, 2009) between each polymorphic marker and DEM variables using a binomial link function. Significance of all associations was assessed with a log-likelihood ratio test, and AICs were compared between a model with a variable and a constant model.
In addition to these variables, GLMM models were also performed with measured altitude (Alt), longitude (X), latitude (Y), as well as membership coefficient to population A. In these cases, the plot was considered as the random effect.

| Habitat comparison
All DEM-derived variables, except TON, were found to be significantly different between populations (Table 1). For instance, population A was found to be located at a higher altitude than Population B. Population A was also more exposed to wind (WEX), presented a lower protection from surrounding relief (i.e., higher openness, TOP), a higher sky view proportion (SVF), and higher Terrain ruggedness (VRM) than population B. Interestingly, population A receives less solar radiation than B in December, but more in June. Regarding hydrology, individuals from population A were reported on significantly shorter flow path lengths (FPL) but showed higher soil wetness (SWI).
Nevertheless, it is worth noting that some variables were only different among populations at particular resolutions. For example, Eas was significantly higher in Pop A than in Pop B at resolutions of 0.5, 1, 2, 4, and 8 m, while VRM was only higher in Pop A at 0.5 m.

| Detection of outliers
Few GLMMs between genetic loci and environmental variables turned out to be significant with α = 0.05 after Bonferroni's correction (significance level: 3.3E-06). Only five genetic markers were significantly associated with Alt, TON, Nor, Ti12, and WEX ( Table 2). The spatial resolution of DEM-derived variables, however, had a strong influence on the significance of associations. Indeed, these associations were only significant at a specific resolution. Although characterization of fine-scale environmental heterogeneity appeared crucial, the highest resolution did not necessarily imply the highest significance. For example, the association between the locus c1v382 and Ti12 or Nor was only significant at 1 m and poorer at other resolutions ( Figure 3).
The two other associations, that is, c1v222 with TON and c1b136 with WEX, are only significant at the highest resolution but the former  (Parisod & Christin, 2008). Among the five genetic markers here strongly associated with DEM variables, one candidate adaptive locus was associated with solar radiation in December and northness and is thus further congruent with prior results suggesting selection by solar radiation (i.e., degree-days during Variables showing a significant difference between individuals of populations A, B and admixed ones are shown in the table. The mean and the standard deviation are given for each habitat. The following column provides the p-value for the most significant Kruskal-Wallis test performed between Habitat A and B, and the final column indicates the resolutions at which the test was significant (<0.05 after Bonferroni's correction for multiple tests), that is, the variable in question is significantly different between the two populations at the spatial resolution indicated. Variables acronyms: altitude (Alt), northness (Nor), eastness (Eas), slope (Slo), vector rudggedness measure (VRM), positive topographic openness (TOP), wind exposure index (WEX), sky view factor (SVF), total insolation in June (Ti6), total insolation in December (Ti12), flow path length (FPL), SAGA wetness index (SWI  Bearing limitations of the approach in mind, our observations demonstrate that VHR DEMs can be suitably used to model fine-scale environmental heterogeneity. Among investigated topographic variables, solar radiation, terrain ruggedness, and wind exposure appear to substantially differ between the two populations. Noticeably, they reflect climatic variability at microsite that is only identified by fine-scale topographic models, demonstrating their usefulness for landscape genomics studies requiring such resolution (Leempoel et al., 2015;Manel et al., 2010;Pradervand et al., 2014). Most impor tantly, the DEM-derived variables used here were shown to be surrogates for relevant ecological features, including temperature and snow cover variability in mountainous areas (Böhner & Antonić, 2009;Leempoel et al., 2015;Lehning, Grünewald, & Schirmer, 2011;Wilson & Gallant, 2000). Accordingly, we show here that a large panel of variables exist and can likely be expanded to refine environmental characterization for many organisms. For instance, vector ruggedness measure (VRM) appeared to be the most important predictor of soil moisture on the ridge (Leempoel et al., 2015). High-resolution VRM thus appears as a suitable proxy for the distribution of stony areas and more generally soils with different porosities. Exposure to wind was also noticeable in habitat comparisons and in association models. As it indirectly affects snow accumulation (Plattner, Braun, & Brenning, 2004) and thus the timing of snow removal in alpine habitats, which we observed to be correspondingly heterogeneous over the study site (pers. obs.), wind exposure represents a useful proxy for gathering insights on the start of the growing season or exposure to cold during the harsh season. In addition, altitude has an important role as two markers identified of six were associated with this variable only. Clearly, it remains among the most important parameter influencing temperature at any scale in mountainous regions (Leempoel et al., 2015;Wilson & Gallant, 2000).
Multiscale models used here enabled precise analyses, thanks to ecologically relevant topographic proxies (Leempoel et al., 2015). For both habitat comparisons and association models, we report a high sensitivity to spatial resolutions and a generally decreased strength of GLMM models at coarser resolutions, which were mostly nonsignificant. It appears that DEM-derived variables computed at a single resolution, particularly at coarse ones, do not fully represent the topographic control on ecologically relevant variables, and are not able to replicate at best the spatial continuum naturally constituting landscapes. Noticeably, associations between genetic markers and different environmental variables did not generally converge toward an optimal resolution, indicating that the suitable resolution depends on the type of DEM-derived variable considered (Leempoel et al., 2015).
Our framework illustrates that ecologically relevant DEM-derived proxies are relatively easy to acquire and provide unique information on micro-habitats for landscape genomics studies. However, we also highlight their sensitivity to changes in spatial resolution and argue that the interpretation of results obtained from DEMs at a single resolution should be cautiously considered. By no means, a single resolution, even the finest, may be sufficient to identify signatures of selection in highly heterogeneous landscapes. Accordingly, recommending an appropriate scale would likely be misleading and we rather suggest that future studies be based on high-resolution models to explore multiscale derived variables, as we did in this study. While we focused on a single species, we expect these recommendations to be valid for a broad range of taxa and habitats. On the other hand, coarse resolution climatic variables interpolated over homogeneous landscapes may be sufficient for specific situations (Fick & Hijmans, 2017) that are unlikely to benefit from a multiscale approach.

ACKNOWLEDGMENT
This work was funded by a grant from the Velux Stiftung (Project 705 to CP).

DATA ACCESSIBILITY
Genetic and Environmental data are available on Zenodo.

AUTHORS CONTRIBUTIONS
KL conducted fieldwork, performed analysis, and wrote the manuscript. CP designed and supervised the project and contributed to the writing. CG conducted fieldwork, performed the genetic analysis, and contributed to the discussion. SJ supervised the project and contributed to the writing.
F I G U R E 3 Variation of the significance of association models between genetic marker c1v382 and northness for different spatial resolutions. The distribution of the marker along the ridge is shown in a). The background represents the aspect computed at a resolution of 1 m. (b) shows the significance of GLMM with increasing resolution (i.e., pixel size of the DEM in meters), represented by the log10 of the p-value of the loglikelihood ratio. The horizontal bar is the significance threshold of 0.05 after Bonferroni's correction ORCID