Ecological genomics of local adaptation in Cornus florida L. by genotyping by sequencing

Abstract Discovering local adaptation, its genetic underpinnings, and environmental drivers is important for conserving forest species. Ecological genomic approaches coupled with next‐generation sequencing are useful means to detect local adaptation and uncover its underlying genetic basis in nonmodel species. We report results from a study on flowering dogwood trees (Cornus florida L.) using genotyping by sequencing (GBS). This species is ecologically important to eastern US forests but is severely threatened by fungal diseases. We analyzed subpopulations in divergent ecological habitats within North Carolina to uncover loci under local selection and associated with environmental–functional traits or disease infection. At this scale, we tested the effect of incorporating additional sequencing before scaling for a broader examination of the entire range. To test for biases of GBS, we sequenced two similarly sampled libraries independently from six populations of three ecological habitats. We obtained environmental–functional traits for each subpopulation to identify associations with genotypes via latent factor mixed modeling (LFMM) and gradient forests analysis. To test whether heterogeneity of abiotic pressures resulted in genetic differentiation indicative of local adaptation, we evaluated F st per locus while accounting for genetic differentiation between coastal subpopulations and Piedmont‐Mountain subpopulations. Of the 54 candidate loci with sufficient evidence of being under selection among both libraries, 28–39 were Arlequin–BayeScan F st outliers. For LFMM, 45 candidates were associated with climate (of 54), 30 were associated with soil properties, and four were associated with plant health. Reanalysis of combined libraries showed that 42 candidate loci still showed evidence of being under selection. We conclude environment‐driven selection on specific loci has resulted in local adaptation in response to potassium deficiencies, temperature, precipitation, and (to a marginal extent) disease. High allele turnover along ecological gradients further supports the adaptive significance of loci speculated to be under selection.


| INTRODUCTION
Understanding ecological pressures and their evolutionary impacts on natural tree populations represents an active research field in evolutionary ecology and is important to conservation of forests.
There is little debate abiotic and biotic stressors can result in local adaptation and lead to evolutionary divergence of populations via isolation by adaptation (IBA) (Nosil, Funk, & Ortiz-Barrientos, 2009). Local adaptation occurs widely in plants and animals, but the genetic basis is generally poorly understood (Fraser, Weir, Bernatchez, Hansen, & Taylor, 2011;Hereford, 2009;Leimu & Fischer, 2008). Studying the genetic basis of local adaptation, ecological factors driving divergent selection, and genetic differentiation of natural populations provides insights into how species may respond to future environmental changes, such as exotic pathogens, increasing deforestation, and future climate change (Fisichelli, Abella, Peters, & Krist, 2014). Answers to these questions are clearly relevant to conservation management of forest tree species.
Population genomic and landscape ecology approaches (Anderson, Willis, & Mitchell-Olds, 2011;Sork et al., 2013) provide means to detect local adaptation and loci responding to ecological forces of selection. Local adaptation can be revealed by genetic differentiation among populations at F st outlier loci from contrasting environments as well as genetic correlation with environmental variables (Savolainen, Lascoux, & Merilä, 2013). The application of genomewide genetic markers (produced from next-generation sequencing) to identification of truly adaptive loci still poses many challenges as a result of missing data from sequencing bias or sampling error. While limitations of analytical frameworks have been addressed using simulated data and through comparisons of methods (Lotterhos & Whitlock, 2014Mita et al., 2013;Narum & Hess, 2011), bias of data resulting from next-generation sequencing has remained a serious concern for marker-based genomic approaches such as the recent but widely adopted RAD-seq and genotype-by-sequencing (GBS) methods. Biases from such methods can contribute to frequent misidentification of false-positive loci.
GBS and RAD-seq methods are cost-effective for sequencing a reduced genome sample from a large number of individuals, and they are noted for employing restriction enzyme digested libraries (RRL) that contain DNA fragments of specific target sizes to uncover loci with single nucleotide polymorphisms (SNPs) (Davey et al., 2011;Narum, Buerkle, Davey, Miller, & Hohenlohe, 2013). Both have been increasingly used for genetic mapping, population genomics, phylogeography, and phylogenetics (Baird et al., 2008;Davey & Blaxter, 2010;Eaton, 2014;Eaton & Ree, 2013;Gagnaire, Pavey, Normandeau, & Bernatchez, 2013;Hohenlohe et al., 2010;Lu et al., 2013;Qi et al., 2015;Recknagel, Elmer, & Meyer, 2013;Rubin, Ree, & Moreau, 2012). Application of GBS has demonstrated more powerful discernment of population genetic structure compared to microsatellite data and identification of more loci possibly responding to selective forces (Allendorf, Hohenlohe, & Luikart, 2010;Chu, Kaluziak, Trussell, & Vollmer, 2014;Gompert et al., 2014). While analysis of reduced genomes using this method is promising for identifying loci under selection, biases introduced by sequencing require cautious treatment of data in order to minimize false positives. Prior simulated studies have demonstrated failure to account for biases of reduced genome sequencing may result in both type I and II errors for detecting loci under selection . In particular, missing data and low coverage of SNP markers may erroneously characterize allelic variants as highly differentiated among populations, and even highly differentiated loci (measured by F st ) may not have true adaptive value (Savolainen et al., 2013). Therefore, while the capability of genotyping large amounts of SNPs under possible selection has advanced, purging false positives from hundreds or thousands of candidate loci remains a bottleneck that hampers efficient exploration of true candidate genes. One approach to minimize false positive is to compare results from repeated and independent GBS experiments, but this approach has not been widely adopted due to added cost and labor involved.
In this study, we addressed the major concerns of the GBS method (specifically, repeatability and false positives due to missing data) using a combination of methods to more reliably identify loci under selection.
First, we incorporated replication of sampling design into our sequencing strategy. Second, we isolated candidate loci that were detected by two F st outlier-based methods (Excoffier, Hofer, & Foll, 2009;Foll & Gaggiotti, 2008) and a genotype-environment association method (Frichot, Schoville, Bouchard, & François, 2013;Schoville et al., 2012) before reanalyzing them in a combined library with putatively neutral loci. For our final set of repeatedly genotyped loci showing evidence of local adaptation, we compared patterns of allele turnover along ecological gradients to our putatively neutral set of loci using a gradient forest (GF) approach recently applied to the field of ecological genomics (Ellis, Smith, & Pitcher, 2012;Fitzpatrick & Keller, 2015). Our main questions are as follows: (1) Has the species evolved local adaptation as a consequence of environmentally heterogeneous ecological pressures? (2) Which SNPs are likely to be candidates under selection? (3) Which environmental gradients are most important to genetic divergence and local adaptation of C. florida populations if any? (4) What genetic predisposition does C. florida possess to adapt to ongoing climate change in North Carolina? (5) And how does repeated GBS experimentation influence final results? The latter question is of utmost importance to researchers incrementally expanding sequencingbased investigations across increasing portions of a taxon's range, and as such, we primarily report findings within North Carolina as part of a broader effort to characterize adaptive variation throughout the flowering dogwood range.

| Site selection
The natural range of C. florida comprises distinct and heterogeneous environments-spanning as far as north as Maine and occurring as a disjunct subspecies along the Sierra Madre Oriental; as such, various biotic and abiotic stressors have varied effects on the species in different ecoregions. Although ongoing research is underway to capture the full range of adaptive variation in C. florida, North Carolina is well suited for initial study as it encompasses three ecoregions with distinct environments spanning a range of longitudinal-elevational gradient similar to conditions of northern and southern portions of the species range (Wells, 1932). Therefore, we selected six popula-  Figure   S1). Sampling sites were selected with consideration of their remoteness from developed areas to minimize the probability of studying cultivated trees. Due to high heterogeneity in elevation at small distances within mountainous regions, two mountain populations were each subdivided into two sampling sites. Two mountain locations for sampling were within national park and forest boundaries. Two other mountain locations were in close proximity to protected areas and were previously monitored for dogwood anthracnose disease by the NC Forest Service-Forest Health Branch (Table 1; Figure S2).

| Environmental variables
Three ecological regions from which natural populations were sampled are known to differ in temperature, rainfall, soil type, and disease incidence. Differences between mountain, Piedmont, and Coastal Plains regions of North Carolina were recorded with field-site measurements. Environmental variables from each region were represented by data collected from two subpopulations, and each subpopulation consisted of one or two sites of 30 or 15 individual trees, respectively.
Field measurements and soil cores were obtained in close proximity to each tree sampled, and the majority of sampled trees were spaced at least five meters apart within each subpopulation. With genotype evidence later obtained (see Genotyping and Data processing), relatedness between individuals was checked using PLINK (Purcell et al., 2007) to ensure environmental data affiliated with clonal or sibling pairs were excluded.
Environmental measurements included elevation, proximity to water, canopy coverage, and 15 soil core features (Table S1 Appendix) and were recorded at sites during sample collection (described in Environmental Variables, Supporting Information). Additional environmental data (soil classification, temperature, precipitation, frost-free period, and length of growing season) were obtained via GIS (see GIS Resources, Supporting Information). We note further in Supporting Information that the size of our environmental dataset was reduced for certain analyses, namely GF analysis. As collinearity among variables and its effect on the random forest algorithms (that GF is an extension of) are not fully understood, we safeguarded against any possible problems by reducing the number of collinear pairs of environmental variables. Prior to reduction of collinearity for GF (Additional Validation of Environmental and SNP Data, Supporting Information), our environmental dataset consisted of 12 variables (Table S1 Appendix), excluding 15 soil core measurements and soil types from the USGS soil classification scheme.

| Functional traits
Two functional plant traits, plant health and leaf osmotic potential, were measured in this study. Plant health was measured during plant collection. We measured the health condition of every sampled tree using a visual estimation method (Mielke & Langdon, 1986) employed previously by forest health monitors. Individuals were scored for one of five categories based on twenty percentile increments of tree canopy displaying symptoms of disease infection (e.g., leaf blotting, necrosis, or branch dieback). Individuals rated with a score of five exhibited minimal or no stress (0%-20% canopy infection), while individuals with scores of one had almost no living or disease-free foliage (80%-100% canopy infection). In addition, we employed an alternative binary scoring system that recorded scores of four and five as one and anything below as a score of zero.
After assigning each tree a health score, at least four branch cuttings were taken from the majority of sampled trees (except some mountain trees with substantial branch dieback) and transported to the laboratory for leaf osmotic potential measurements using an osmometer.
T A B L E 1 Location and population summary statistics of sampled subpopulations within each ecological region of North Carolina We designed osmometer experiments to specifically measure leaf osmotic potential (tendency of water to move into and be retained in mesophyll cells), which is indicative of plant drought tolerance (Bartlett, Scoffoni, & Sack, 2012). Branches were randomly selected by cutting from each sampled tree. Cuttings were placed immediately in 50-ml vials filled with water and transported promptly to a common room temperature-controlled setting where measurements were taken using an osmometer (described in Functional Traits, Supporting Information).

| Genotyping
Fresh leaf samples were collected from the same plants visually scored for health in the field. Samples were stored at −20°C until they were used for DNA extraction. A total of 180 trees were sampled from six populations, with 30 samples from each (

| Data processing
Sequence data from each Illumina library were cleaned by removing contaminant and low-quality sequences. High-quality reads from each library were independently assembled de novo and filtered again after assembly. Paired-end one (PE1) reads were processed separately for each of the two libraries (see discussion of PE2 reads, Supporting Information). GBS barcode splitter, other custom Perl scripts, and FASTX-Toolkit were used to sort samples by barcode, trim PE1 reads to 90 bps, and remove sequence reads that had more than 5% of their bases with a quality score below 20. Bowtie 2 (Langmead & Salzberg, 2012) was used to align raw reads to several fungal genomes in order to identify and filter out as many contaminant DNA fragments as possible. Following these steps, we processed sequences into catalogs of shared loci using STACKS (Catchen, Amores, Hohenlohe, Cresko, & Postlethwait, 2011;Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013).
After removing nontarget sequences, remaining sequences were processed through STACKS version 1.19   In addition to this parameterization (justified in SNP Data processing, Supporting Information), we also chose filtering parameters that controlled the amount of missing data tolerated for population genetic analyses.
Missing data were also important factors to consider for processing steps. A common practice is to use >20% missing data criterion as an arbitrary cutoff to exclude loci in datasets (Narum et al., 2013), but some have relaxed the criterion to up to 80% missing data (Crossa et al., 2013). Excessive data filtration can have unforeseen consequences (Huang & Knowles, 2014) due to truncation of loci with higher mutation rates and reducing statistical power of analyses. We relaxed our missing data acceptance threshold slightly by keeping loci with a maximum of 25% missing data in each library's samples. We also designated a 5% minor allele frequency cutoff to reduce artifacts of sequence and assembly error.

| Identification of candidate loci under selection
To identify loci strongly deviated from the general population genetic structure and strongly associated with environmental differences, we first characterized individuals' membership to biological clusters.
Using a dataset of uncorrelated SNPs not in linkage disequilibrium for our two libraries (first occurring SNP per RAD-tag), STRUCTURE (Pritchard, Stephens, & Donnelly, 2000) was implemented for the first eight cluster models (K = 1-8) using ten replicate analyses each with a burn-in of 100,000 and 100,000 subsequent iterations. analyses, was designated for F st outlier loci analysis using Arlequin.
Using Arlequin, we ran 20,000 simulations with 10 simulated groups and 100 demes per group in the analysis to identify candidate loci under selection. Using another extension of the F st outlier approach (Beaumont & Nichols, 1996) implemented in BayeScan (Foll & Gaggiotti, 2008), we also assessed allele frequencies from the same datasets to test whether loci were highly differentiated when parameterizing a classical island model instead of a hierarchical island model. Under certain simulated scenarios where adaptive variation conflicted with a defined hierarchical neutral structure, BayeScan has been shown to outperform Arlequin (Narum & Hess, 2011), and other simulated work suggests comparison of results from different outlier methods can reduce error rates (Villemereuil, Frichot, Bazin, François, & Gaggiotti, 2014). The analysis was performed with the following priors: 5,000 sample size; 20 thinning intervals; 20 pilot runs of length 5,000; 100,000 additional burn-in; uniform distribution between 0 and 1; and a prior odds for neutrality of 5:1. Prior odds of 1:1 and 10:1 for the neutral model were also evaluated in BayeScan.
LFMM accounts for covariation of alleles and environment, and compared to other GEA tests, more flexibly accounts for hidden population structure while maintaining a relatively lower false detection rate under models of hierarchically structured populations (Villemereuil et al., 2014). As we found evidence to support subpopulations being hierarchically nested within two larger clusters (coastal-mainland), we chose LFMM to identify candidate loci. The optimal latent factor number (K = 2), identified with the Evanno method for STRUCTURE analysis (Evanno, Regnaut, & Goudet, 2005), was incorporated in LFMM. For LFMM, we ran the analysis with 50,000 sweeps for each pairwise test with a burn-in of 12,500 sweeps because repeated tests of parameters showed a precise consensus in regard to SNPs being detected as highly associated with environmental and functional traits.
F st outliers from Arlequin analysis were filtered for p-values below 5%, and a q-value for each locus was subsequently calculated by the program QVALUE (Benjamini & Hochberg, 1995) to monitor false discovery rates of positive results. Q-values of outlier loci from BayeScan were automatically calculated by the program, and those results were filtered to retain loci with a q-value below 0.1. Results from LFMM genotype-environment associations were filtered to keep significant associations with a Z score over 4, following the practice of Frichot et al. (2013). The score corresponded to a Bonferroni alpha correction of 0.01 for 1,000 SNPs.

| Detecting allele turnover patterns along ecological gradients: gradient forest and mantel tests
A small subset of loci (54 RAD-tags) had compelling evidence for being under selection, defined as being detected by multiple methods across libraries (three or more overlaps in Figure 6a) and consistently genotyped across libraries, were selected for analysis of allele turnover along ecological gradients using gradient forests analysis. This analysis is a novel application of a community ecology method (Ellis et al., 2012) to study ecological genomics of local adaptation. The method was recently demonstrated by Fitzpatrick and Keller (2015) to be useful for further evaluating the adaptive and ecological significance of putative candidate loci under selection and for determining the relative importance of various ecological pressures on the adaptive landscape.
A larger set of presumably neutral loci (1,307 RAD-tags) were constructed as the reference group for the analysis. The "reference loci" were consistently genotyped across libraries but were not identified as candidates under selection in any of the Arlequin, BayeScan, and LFMM analyses. To distinguish departures of candidate SNPs from the general genomic background, we concurrently analyzed and plotted patterns of allele turnover along ecological gradients for the both the candidate and reference subsets of our dataset using GF analyses (Fitzpatrick & Keller, 2015). The 176 individual trees were treated as response variables for GF. On the other hand, the subpopulations (two mountain populations subdivided) were considered for pairwise matrices used in mantel tests. Mantel tests were applied to the same datasets to corroborate overall correlations (instead of SNP-specific patterns) between environment and candidate-reference loci, after controlling for geographic distance (Legendre & Fortin, 1989). Mantel tests, specifically partial mantel tests, have been similarly applied in recent population-level studies (Zhao et al. 2013). Before implementing GF and mantel procedures, we implemented one further series of validation procedures to our environmental data, candidate loci, and reference loci as described in Supporting Information (Additional

Validation of Environmental and SNP Data).
GF analyses were conducted with the gradientForest R package (Smith & Ellis, 2013), using only SNPs with a variable correlation threshold of 0.5 or greater to generate plots of allele turnover. As a precaution, we minimized the nonindependence of SNPs in our genetic dataset prior to GF analysis because (although not demonstrated to affect GF specifically) linkage disequilibrium was known to bias landscape and population genomic approaches by adding weight of inference to correlated loci pairs. To reduce GF's susceptibility to linkage disequilibrium, only one SNP per RAD-tag was considered while fitting the GF model using 2,000 regression trees. A random SNP per RADtag was selected for reference loci, but the SNP with the highest F st per RAD-tag was chosen for candidate loci. SNP data were converted to presence-absence of the minor allele for each of 176 individuals (two samples duplicated among two libraries) and were analyzed in GF using the regression model, which was a standard implementation of the gradientForest R package. Remaining parameters to fit GF models were selected according to Fitzpatrick and Keller (2015).
Partial mantel tests were performed with R ade4 and ecodist packages (Chessel, Dufour, & Thioulouse, 2004;Goslee & Urban, 2007) using Slatkin's linearized F st data to ensure genetic patterns were suited for linear regression. Pairwise matrices of linearized F st values were obtained from Arlequin, and for every environmental-functional variable, each subpopulation's mean was calculated. The pairwise difference between subpopulations' means was then determined to obtain a dissimilarity matrix for each environmental-functional trait.
Geographic distances between populations were calculated using Euclidean distances derived from a projected coordinate system (in meters) to provide control for isolation by distance while detecting the significant correlations between overall genetic and environmental distances (i.e., partial mantel tests). Full and partial mantel tests were carried out independently for each environmental-functional trait.

| Environmental and functional trait differences
Results of one-way ANOVA (or Kruskal-Wallis tests for environmental data not fitting ANOVA assumptions) indicated the majority of environmental features were significantly different (p < .05) between at least two of six populations compared ( Mean values of plant health rating varied significantly among populations ( Figure S2), but leaf osmotic potential varied to a lesser extent (

| Genotyping results
Before removing contaminant reads aligned to fungal genomes and of individuals per library, the number of unique SNPs was reduced to 2,983 and 2,764 for library one and two, respectively. When only a single SNP per locus-tag was retained, numbers were further reduced to 2,170 and 1,994. When both libraries' results were examined together, a total of 2,533 unique loci were identified ( Figure S4). Of these unique SNPs, a total of 1,631 loci were repeatedly genotyped in both libraries ( Figure S4A)-representing approximately 75% and 82% of the total of each library.

| Population genetics
STRUCTURE analyses of both libraries supported an optimal K = 2 grouping of individuals, a coastal population group and a mainland (mountain-Piedmont) group (Figures 2a and S5). UPGMA dendrograms of genetic distances (Nei, 1972) generated with the program Populations (Langella, 1999) also showed high support for a grouping of coastal subpopulations that was distinct from mainland populations ( Figure S6). PCA of both library one and two data similarly showed two distinct clusters defining a mountain-Piedmont group and a coastal group (Figure 2b). One mountain subpopulation (GSMNP) in library two showed additional intrapopulation clustering. Overall, these results clearly indicate at least two genetic clusters-distinguishing coastal populations from mainland populations.
AMOVA results (Table 2) showed a considerable percentage of total genetic variation attributable to differences among individuals (library one: 92.72%, library two: 88.27%) and a small but significant percentage was attributed to differences among subpopulations within a two group hierarchical structure (library one: 1.93%, FSC = 0.01993, p < .001; library two: 3.34%, FSC = 0.03448, p < .001). Differences between coastal and mountain-Piedmont groups accounted for approximately 3% of total genetic variation for both library one and two results, which is marginally insignificant with a two-tailed statistical test (library one: p = .06585; library two: p = .06707). This suggests extensive genetic mixture within regions and frequent gene flow or weak genetic differentiation between coastal and mainland populations. STRUCTURE and AMOVA results from analyses for a less probable hierarchical population structure of mountain, Piedmont, and coast division are available in Supporting Information (Table S2; Figure S8).  Bonferroni correction for 1,000 loci. Q-Q plots from LFMM ( Figure 4) showed that reduced-dimension environmental components derived from features such as soil moisture, pH, nutrients, length of growing period, mean annual temperature, and mean monthly rainfall ( Figure   S3) had an excess of significant genotype-environment associations when analyzed in a single library alone, but results were much more conservative when the combined library of candidate SNPs was reanalyzed with putatively neutral SNPs. LFMM association tests between each SNP and each individual environmental or functional trait revealed several significant correlations while controlling for hidden population genetic structure among individual trees (Table S1; Figure 5). Similar to results of GF and partial mantel tests, temperature covariates were most often correlated with outlier SNPs. LFMM analyses did not detect significant correlation between genetic data and canopy coverage or levels of zinc. Significant correlations of genetic data to functional traits detected by LFMM analyses differed between the two libraries. When coding plant health as a binary state, more significant associations with SNPs were detected in library one than in library two. However, when plant health was scored using a 1-5 scale, the pattern was reversed. Of loci detected to be putatively under selection according to Arlequin, BayeScan, and LFMM, a subset of 54 loci were detected in more than one of these methods and genotyped in both libraries (Figure 6a).

| Identification of candidates under selection
The rate of consistently genotyping a locus across libraries was 75%-82%, whereas the rate of consistently identifying SNPs as candidates under selection by at least one method ranged between 23%- For the 54 SNPs identified as candidates of selection in at least three analyses (those falling in the three overlapping areas boxed in Figure 6a) and consistently genotyped across libraries, BLAST searches to the NCBI nr repository showed eight loci with hits to predicted gene products (Table 3), and seven loci had no hits to predicted functions but aligned to the transcriptome of C. florida (Zhang et al., 2013). Evidence in support of these specific candidate SNPs is documented further in Table 3, and curated annotations with clear adaptive significance are examined further in discussion. Notable trends observed among the 54 candidate loci summarized on Table 3

| Detecting allele turnover patterns along ecological gradients: candidate vs. reference SNPs
The remaining set of environmental and functional trait variables with a VIF score below 10 were used for GF and mantel tests and listed in Table S3. After removing possible artifacts of combining libraries one and two (rationalized in Additional Validation of Environmental and SNP Data, Supporting Information), our chosen subset of 54 candidate loci was parsed to 43 in order to conduct GF on the combined dataset. The 43 candidate SNPs for GF analyses differed strongly from our chosen 1,171 reference SNPs (parsed originally from 1,307) in respect to overall patterns of allele turnover along ecological gradients of frost-free period, mean July precipitation, and soil densities of potassium, sodium, manganese, phosphorus, and sulfur ( Figure S9). In addition, mantel tests of collection sites for both reference and candidate SNP datasets supported that frost-free period and levels of potassium were the top two ecological variables for explaining patterns of genetic differentiation, albeit isolation by distance was also strongly correlated (Table S3). For GF, individual allele functions of candidate SNPs were plotted and highlighted in Figure 7 against SNPs behaving as the genomic background (black line for background SNPs). The top five GF plots in Figure 7 showed the most overall contrast between individual candidate SNPs and reference SNPs (interpreted from Figure   S9), but several patterns of allele turnover for less informative ecological gradients were of interest and were plotted in Figure S10. Several SNPs (e.g., B18_11, B244_51, B332_14, B195_77, B977_86) deviated greatly from reference SNPs near the longer portion of the frost-free period gradient (highest allele turnover at approximately 220 days).
SNP B1098_10 exhibited allele turnover patterns greatly contrasted from the majority of reference SNPs and other candidate SNPs along gradients of soil potassium, sodium, and sulfur. SNP B332_14 ranked second for a high amount of turnover along a potassium gradient, and SNPs B349_54 and B18_11 were also strong candidates that exhibited contrasting allele turnover patterns along gradients of sodium and sulfur. For allele turnover patterns along precipitation gradients, SNPs B195_77 and B1219 were highly contrasted against patterns of the reference background and other candidate SNPs.  (Wells, 1932). There is still a sizeable portion of adaptive variation uncharacterized in this study when compared to the adaptive variation present in the broader range (Table S4).

| DISCUSSION
Nonetheless, our results are highly relevant to conserving the species

| Evidence for locally adapted candidate loci
Local adaptation could be inferred from genetic signatures intrinsic to sequence datasets or allele frequency changes in relation to functional and environmental traits; Schoville et al. (2012) reviewed excellent examples of both population genetic and GEA approaches to uncover local adaption. Using both approaches, we found evident genetic signatures of local adaptation in C. florida (Table 3) and 54 well-supported F I G U R E 5 Total count of loci (xaxis) with SNPs passing Z cutoff of 4 for association to given environmental variables (y-axis) using LFMM (K = 2) analysis of (a) library one and (b) library two  Patterns of overall cumulative importance-or how well biological variation was explained for a given interval of environmental change (Fitzpatrick & Keller, 2015)-from GF analyses ( Figure S9) showed candidate SNPs were most divergent from reference SNPs for gradients of frost-free period, July precipitation, potassium, phosphorous, sulfur, and sodium. GEA results from LFMM also supported the importance of these variables for explaining local adaptation in specific loci. As  Figure 6a) were reanalyzed by selection tests with 1,171 putatively neutral loci (no overlaps in part a). One overlap in part A originally represented 13 loci detected by all three selection tests in library one but was reduced to seven results (7*) because six loci were not genotyped successfully in library two  Figure 5, temperature covariates (tmin1, growth period, mean temperature, and frost period) constituted the majority of association results to individual SNPs. Soil characteristics with highest amounts of significant LFMM associations were potassium levels and envPC2, which was a reduced-representation component mainly characterizing soil differences from an environmental distance-based PCA ( Figure S3). Low potassium levels were characteristic of relatively acidic soils of the Coastal Plains, which were susceptible to soil leaching and deficiencies in other plant nutrients (USDA soil classification).
In conjunction with soil nutrient availability, the amount of moisture available to soils clearly impacted nutrient use and growth strategies in flowering dogwoods (Kost & Boerner, 1985), and temperature and moisture regimen were shown to affect disease prevalence in dogwood populations (Chellemi et al., 1992). Holzmueller et al. (2007) demonstrated deficiency in soil potassium was linked to higher severity of dogwood anthracnose and quicker infection rates. The findings of Holzmueller et al. (2007) were of particular interest as coastal trees we sampled grew in relatively potassium poor soils but appeared visually free of anthracnose disease. This suggested coastal trees might have adapted to potassium poor soils, which could have resulted in secondary fitness effects such as greater resistance to disease relative to montane populations. In other words, selective pressures regarding levels of soil potassium might play a role in the adaptation to osmotic stress in coastal populations, and any such adaptations might predispose coastal populations to be less susceptible to oxidative stress caused by disease. Dogwood anthracnose was reported in Dare County along North Carolina's coast ( Figure S1), but its failure to persist there also suggests the pathogen might not be adapted to thrive in North Carolina's coastal climate. As such, adaptive mechanisms linking drought tolerance to disease resistance remain highly speculativerequiring us to test such hypotheses in future experiments.
The SNP on locus B332 (B332_14) was one of only nine genomic locations (B977, B768, B757, B634, B332, B247, B195, B1240, and B1250) to meet five of the six criteria for being considered a candidate of selection. It is repeatedly identified as a high F st outlier by Arlequin in both the first, second, and combined libraries and by BayeScan in library two and the combined library. LFMM analyses in both libraries and the combined library revealed the SNP was significantly correlated to several climate and soil variables, namely potassium and a reduced-dimension environmental variable (envPC2) autocorrelated with soil measures. We speculated soil leaching pressures and plant adaptations for efficient regulation of osmoticum might be critical factors to explain the high turnover of alleles for this SNP along a potassium gradient (Figure 7). B332_14 aligned to an exon of C. florida's transcriptome (Table S1), and the transcript was predicted to encode a probable glycerol-3-phosphate dehydrogenase (GPD, accession: XM_006493426) ( Table 3). As demonstrated by Albertyn, Hohmann, Thevelein, and Prior (1994) and later by Shen, Hohmann, Jensen, and Bohnert (1999), the efficiency of proteins to regulate levels of glycerol influenced the ability of plants to osmotically adjust to salt stress and other osmotic stresses tied to water flux. Coastal trees we sampled grew in soils that had significantly lower levels of potassium (Table S1 Appendix), and they most likely had acquired locally adapted mechanisms to compensate for lower potassium uptake and resulting toxicities of high proportions of sodium to potassium (Niu, Bressan, Hasegawa, & Pardo, 1995). The GF plot of potassium (Figure 7) supported strong turnover of alleles for B332_14 along lower levels of potassium. Supplementary GF analyses ( Figure S10) supported gradual turnover of the biallelic locus as plant osmoticum (measured in osmometer experiments) increased (Functional Traits, Supporting Information). While one allele was more predominant in populations of the Piedmont and Coastal Plains of North Carolina, the alternate version was only the major allele in two of the four mountain sites where soils were relatively potassium rich (Table S1). Of the 1,171 reference and 43 candidate SNPs analyzed by GF, only 24 SNPs had a correlation threshold above 0.5 in comparison with osmometer readings, but SNP B332_14 had the second highest cumulative importance of those 24 SNPs. SNP B332_14 also had the second highest cumulative importance along a gradient of overhead canopy cover, further suggesting it might be involved in managing other sources of osmotic stress such as high light intensity. Thus, this is a strong candidate associated with local adaptation of the species.
Locus B1350 is predicted to be a leucine-rich repeat (lrr) receptorlike serine threonine-protein kinase (accession: XM_00648094). While the function of this particular locus is speculative, leucine-rich repeat receptor-like kinases have previously been implicated in pathogen recognition (Afzal, Wood, & Lightfoot, 2008). A variety of resistance (R) genes encoding for lrrs have been identified, as the domain normally interacts with pathogen effectors or intermediate host molecules to trigger plant responses (Caplan, Padmanabhan, & Dinesh-Kumar, 2008). One allele of B1350 was highly abundant in two of the four mountain subpopulations where dogwood disease was greatest (Table   S1). Within analyses of library one, this locus was detected as a candidate of selection by both Arlequin, BayeScan, and LFMM. Greater heterogeneities of elevation, slope, and temperature within the mountains might result in more independent and isolated cases of local adaptation than would be expected within the same spatial scale for nonmountainous regions. Moreover, as implied by Hadziabdic et al. (2012), some mountain populations of C. florida might escape the suitable habitat range of dogwood anthracnose and maintain high genetic diversity instead of having advantageous alleles become fixed. Our identification of a lrr repeat receptor kinase constituted one hypothesis of local adaptation to disease for two of our sampled subpopulations, but we also found another putative R gene in our candidate dataset that might represent a different mechanism of plant resistance (discussed below).
Locus B982 was designated a candidate locus in analyses of both sequence libraries as well as the combined library. While the locus' change in alleles was not associated with any ecological gradients, it was highly differentiated in both libraries and had been detected as an outlier by both Arlequin and BayeScan analyses in the combined library. Moreover, locus B982 aligned to a sequence scaffold of a coding region within the transcriptome of C. florida (scaffold 19651, Zhang et al., 2013). The scaffold itself aligned to a gene predicted to encode a lectin protein kinase. While extracellular signaling of lectin protein kinases might be specialized for various extracellular signals, an emerging role of such proteins might involve innate immunity Unifier ID T A B L E 3 List of 54 candidate loci under selection that meet the following criteria: consistently genotyped across libraries and repeatedly called as a candidate under selection in Arlequin, BayeScan, or LFMM (visualized by at least three overlaps in Figure 6a). Evidence in support of each accession includes positive results of F st outlier and LFMM association tests as well as predictions of gene function and alignment to Cornus florida transcriptome contigs

| Repeatability of detecting putative loci under selection
Repeatability has long been a concern for all genotyping methods, including next-generation sequencing methods (Crawford, Koscinski, & Keyghobadi, 2012 regardless of population, providing additional evidence for the genetic consequences of bird dispersal of flowering dogwood fruits (Call et al., 2015;Hadziabdic et al., 2010). While demographic trends were relatively consistent when compared across both GBS datasets, there was only a small portion of candidate loci consistently detected between analyses of the two sequence libraries (Figure 6a), albeit the 54 loci considered candidates of selection (Table 3) showed largely consistent patterns of changes in allele frequencies among populations sampled (Table S1).
The total genotyped loci shared among library one and two ranged from approximately 75%-82% when compared to the opposing li- Nonetheless, mean depth coverage per individual remained on average above 30×, which according to precedents aiming for an average of 20× coverage per RAD-tag (Malinsky et al., 2015), might be sufficient to avoid most instances of allele dropout. Moreover, experimental and analytical procedures implemented in this study were designed to minimize downstream effects of the potential biases described.
In regard to library quality, we minimized biases using high-quality DNA of the same amount for each sample, and we selected fragments of 300 ± 36 bps for sequencing. Although sampling size might influence results, the sample size of our subdivided populations was still sufficiently large to accurately make population genetic inferences, according to simulated study (Buerkle & Gompert, 2013). Despite potential biases and their causes, our approach placed greater credence in results that were consistent between the two libraries. This approach resulted in us considering a smaller number of locally adapted candidate SNPs, but it reduced false positives as demonstrated in Figure 4.
A candidate SNP detected by analyses of data in one of the two libraries might either be a false positive due to highly differentiated sequence error or a true candidate locus for selection. Results from our comparison of two GBS libraries suggested caution in interpreting the adaptive significance of loci that were found to be highly differentiated in only one library. In order to flag loci as candidates under selection with higher certainty, we recommended experimental repetition and the use of multiple analyses, including various F st outlier tests, GEA tests, and novel methods like GF. S (mg/dm3)

Cumulative importance
Cumulative importance C. florida for local adaptation to ongoing climatic shifts. We expect some independent local adaptations to occur in other areas outside this pilot study's scope, but the genetic and ecological patterns detected here have provided hypotheses for an expanded study that will assess large-scale environmental gradients and genetic structure across the species range.

CONFLICT OF INTEREST
None declared.

Environmental and physiological data available in Dryad Digital
Repository: doi:10.5061/dryad.h55q8. GBS datasets for each sequence library along with adaptor-barcode information are available as separate files in Supporting Information (Data S1-S3) in addition to metadata corresponding to analyses of each locus (Table S1).