Detecting the genetic basis of local adaptation in loblolly pine (Pinus taeda L.) using whole exome‐wide genotyping and an integrative landscape genomics analysis approach

Abstract In the Southern United States, the widely distributed loblolly pine contributes greatly to lumber and pulp production, as well as providing many important ecosystem services. Climate change may affect the productivity and range of loblolly pine. Nevertheless, we have insufficient knowledge of the adaptive potential and the genetics underlying the adaptability of loblolly pine. To address this, we tested the association of 2.8 million whole exome‐based single nucleotide polymorphisms (SNPs) with climate and geographic variables, including temperature, precipitation, latitude, longitude, and elevation data. Using an integrative landscape genomics approach by combining multiple environmental association and outlier detection analyses, we identified 611 SNPs associated with 56 climate and geographic variables. Longitude, maximum temperature of the warm months and monthly precipitation associated with most SNPs, indicating their importance and complexity in shaping the genetic variation in loblolly pine. Functions of candidate genes related to terpenoid synthesis, pathogen defense, transcription factors, and abiotic stress response. We provided evidence that environment‐associated SNPs also composed the genetic structure of adaptive phenotypic traits including height, diameter, metabolite levels, and gene transcript abundance. Our study promotes understanding of the genetic basis of local adaptation in loblolly pine and provides promising tools for selecting genotypes adapted to local environments in a changing climate.

emphasize three key points: (a) low temperature to the north and low rainfall to the west limit the distribution of southern pines; (b) the annual average minimum temperature is the most important climate variable related to growth and survival; (c) for loblolly pine, seeds from east of the Mississippi River should not be used in the west because of the higher danger of losses due to droughts (Schmidtling, 2003).
As the climate changes, traditional seed selection guidelines may need to be adjusted in response to a changing climate scenario. An altered temperature and precipitation pattern threatens forests with droughts, fires, and pathogen outbreaks, eventually leading to damage to the quality and yield of wood produced (Allen et al., 2010). Landscape genomics methods have been applied to explore the genetic basis of local adaptation in loblolly pine. The main objectives of these studies were to identify the environmental factors that have shaped the adaptive genetic variation and the gene variants that drive local adaptation (Rellstab, Gugerli, Eckert, Hancock, & Holderegger, 2015;Sork et al., 2013). Eckert, Heerwaarden, et al. (2010a) found five loci correlated with aridity and identified 24 loci as F ST outliers in loblolly pine. Eckert, Bower, et al. (2010) also found several well-supported loblolly pine SNPs associated with principal components corresponding to geography, temperature, growing degree-days, precipitation, and aridity. Chhatre, Byram, Neale, Wegrzyn, and Krutovsky (2013) detected SNPs as candidates for diversifying and balancing selection in natural and breeding loblolly pine populations in East Texas. Despite application of multiple methods, the size and complexity of conifer genomes limit the progress to further dissect the genetic basis of local adaptation.
In the current study, we aimed to discover more loci and genes with signatures of natural selection and incorporated phenotypic data into environmental adaption analyses to improve insight. We have discovered 2.8 million SNPs using whole exome sequencing from a clonally propagated association mapping loblolly pine population (Lu et al., 2016(Lu et al., , 2017a(Lu et al., , 2017b. This population represented diverged ecophysiological regions across 12 states in the Southern United States, extending from Texas to Virginia. Loblolly pine populations have shown adaptation to environment based on the geographic distributions of traits. For example, loblolly pines from west of the Mississippi River are slower growing, but more resistant to fusiform rust, drought, and crowding than trees from east of the Mississippi River (Schmidtling & Froelich, 1993;Wells, 1985). We examined associations of 2.8 million whole exome-based SNPs with climate and geographic variables in 328 loblolly pine trees using a landscape genomics approach integrating multiple analysis methods. We detected SNPs associated with both adaptive phenotypic traits and climate/geographic variables and identified candidate genes that contribute to local adaptation in loblolly pine. The results can help determine how selection affects the genetic architecture of adaptive traits. The identified loci and genes can contribute to rapid selection of genotypes with adaptive potential to climate change.

| Plant materials and genotypic data
The loblolly pine trees used in this study were originally collected for the "Allele Discovery of Economic Pine Traits 2" project (ADEPT2) . Maternal trees selected to represent a wide range of the Southeastern United States were open-pollinated in the local seed orchards. The seeds were grown and used for ADEPT2 project. In the spring of 2010, rooted cuttings of 384 trees from ADEPT2 project were planted in the Harrison Experimental Forest at the Southern Institute of Forest Genetics, near Saucier, Mississippi.
We used exome capture method to genotype trees in this population (Lu et al., 2016). Raw SNPs were filtered to retain a total of 2,822,609 SNPs that are biallelic sites without missing data (Lu et al., 2017a(Lu et al., , 2017b. For the current study, we analyzed 328 trees with a clearly known county of origin.

| Climate and geographic data
Climate and geographic data for counties of origin of each tree in the population were the same as in Eckert, Heerwaarden, et al. (2010a).
The data were originally gathered from the WORLDCLIM 2.5-min geographical information system (GIS) layer using Diva-GIS v.5.4 (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005). The dataset contained a total of 58 variables, including latitude, longitude, elevation, average minimum and maximum temperature for each month, average precipitation for each month, and 19 bioclimatic variables (Table   S1). The bioclimatic variables are summary statistics of precipitation and temperature. For example, BIO1 represents annual mean temperature, and BIO12 represents annual precipitation. The JMP Pro 12 F I G U R E 1 Loblolly pine trees (Pinus taeda L.) in the Harrison Experimental Forest at the Southern Institute of Forest Genetics, near Saucier, Mississippi statistical software (SAS Institute) was used to display the variation of climate variables across the counties of origin for the studied trees.

| Environmental associations and outlier analyses
Multiple approaches were employed to discover the loci associated with climate and geographic variables. The process is schematically summarized in Figure 2. We associated 2.8 million SNPs with climate/geographic variables using TASSEL 5.0 (Bradbury et al., 2007). We applied general linear model (GLM) method (S model), mixed linear model (MLM) method incorporating a kinship matrix (K model), the GLM incorporating the covariate to adjust for population structure (Q model) and the MLM incorporating both the kinship matrix and population structure covariate (QK model) to conduct association analyses. Simple sequence repeat (SSR) markers generated from the previous study (Eckert, Heerwaarden, et al., 2010a) were used to adjust for population structure. However, the SSR markers were only available for 249 out of the total 328 trees. We grouped the 249 trees with SSR markers as pop population. Since the genetic structure of this population is mainly caused by the Mississippi River, we grouped the 300 trees in the east of the Mississippi River as east population. We did not analyze the trees in the west of the Mississippi River because there are too few trees. The total 328 trees were called total population. We applied S and K models to the total and east populations. We applied S, K, Q, and QK models to the pop populations. The kinship matrix was estimated using the SNP markers of each population and the TASSEL 5.0 (Bradbury et al., 2007). Quantile-quantile plots were generated for observed against expected −log 10 (P) to examine the model fitness. Significance of associations between loci and traits were determined by the p-values. A corrected Bonferroni threshold 0.05/94,478 = 5.29E-7, where 94,478 was the number of haplotype blocks in the studied population (Lu et al., 2017a), was applied to screen for significant loci. The intersection of the SNPs acquired by different models was used in the downstream analyses.
Two outlier detection methods were employed to detect loci under selection and potentially involved in local adaption. One method is the spatial ancestry analysis (SPA), which identifies SNPs with significant gradients in allele frequency (Yang, Novembre, Eskin, & Halperin, 2012). The geographical location (longitude and latitude) information for each tree was supplied as the "-location-input." SNPs with SPA scores above the 99.9% percentile were considered as outliers. Another outlier detection method was implemented by the OutFLANK software (Whitlock & Lotterhos, 2015). It infers the F ST distribution for a large set of loci and identifies the loci that may contribute to a significant local differentiation and potential adaptation. A Q-value of 0.05 was applied to detect outliers. Following the program recommendation, 1,323,910 SNPs with a minor allele frequency (MAF) ≥0.05 were used for the SPA and OutFLANK analyses.
We used multivariate analysis to identify the significance of climate in structuring genetic diversity among the outlier SNPs. The multivariate relationships were examined using the redundancy analysis (RDA) implemented by the R package vegan (Oksanen et al., 2017;R Core Team, 2018). We examined the SNP outliers' variation explained by pure climate, or pure geography variables, or both climate and geography variables using RDA (Sork et al., 2016). We first ran a full RDA model by using all climate and geographical variables as explanatory variables and SNP outliers as dependent variables.
The total explainable variance is the inertia (variance) for the constrained matrix of the full model. Then we ran a partial RDA (pRDA) model by using climate as explanatory variables, which were conditioned on geography, and SNP outliers as dependent variables. The pure climate variance is the inertia for the constrained matrix conditioned on geography. We also ran a pRDA model by using geography as explanatory variables, which were conditioned on climate, and SNP outliers as dependent variables. The pure geography variance is the inertia for the constrained matrix conditioned on climate. The joint climate/ geography variance was calculated by subtracting the pure effects from the total explainable variance. Statistical significance of the model estimates was assessed using a permutationbased analysis of variance (ANOVA).

F I G U R E 2 Summary of different approaches used in this study
Association of the outlier SNPs with climate and geographic variables was analyzed using the Samβada software (Stucki et al., 2017). This software is based on the logistic regression model and assesses whether the allelic variation correlates with specific environmental variables. Spatial association due to population structure is accounted for by measuring indices of spatial autocorrelation of geographical coordinates of each tree. In this study, the parameters for Samβada analysis were set up as: Spatial autocorrelation was measured along longitude and latitude using spherical coordinate and 20 nearest neighbors; both global and local autocorrelation of loci were included, and the significance was assessed with 1,000 permutations. The detection of selection signatures was based on univariate models, and the threshold for screening significant models was set to 1%.
We searched for SNPs associated with both adaptive phenotypic traits and climate/geographic variables to better understand how selection pressures shape the genetic structures underlying local adaptation. Using the SNP set of the current study, we previously found SNP associations with such adaptive phenotypic traits as specific leaf area, branch angle, height, diameter, crown width, carbon isotope discrimination, and nitrogen content (Lu et al., 2017a).
Briefly, we sampled and measured the traits of the studied loblolly pine trees at the Harrison Experimental Forest in the spring of 2014.
We measured height, branch angle, stem diameter, and crown width on the site. We collected needle samples and measured specific leaf area, carbon isotope discrimination, and nitrogen content in the laboratory. We also found SNP associations with metabolite levels and expression of wood development-and stress resistance-related genes (Lu, Seeve, Loopstra, & Krutovsky, 2018). Relative transcript abundance of 111 genes involved in xylem development (Palle et al., 2011) and 88 genes involved in disease or drought response (Seeve, 2010) were measured using reverse transcription-quantitative polymerase chain reaction (RT-qPCR). A total of 82 metabolites with known names were measured in woody tissues (Eckert et al., 2012).
By associating 2.8 million exome-based SNPs and adaptive phenotypic traits, we identified 5 SNPs associated with specific leaf area, 2 with branch angle, 3 with crown width, 4 with stem diameter, 9 with total height, 4 with carbon isotope discrimination, 2 with nitrogen concentration, 1841 with 191 gene expression mRNA phenotypes, and 524 with 53 metabolite level phenotypes. In the current study, we focused on the loci associated with both adaptive phenotypic traits and climate/geographic variables. The JMP Pro 12 statistical software (SAS Institute) was employed to display the variation of climate/geographic variables, genotypes, and phenotypic traits.
We obtained the annotation for genes that contain identified SNPs from loblolly pine gene annotation files available on https :// treeg enesdb.org/FTP/Genom es/Pita/v1.01/annot ation/ (Wegrzyn et al., 2014). SNPs within 5,000 bp downstream or upstream of a gene were considered to be within a putative regulatory sequence.
If a SNP is located in a region without annotation, we performed a blastx search by querying the flanking sequence 700 bp upstream and downstream of the SNP against the entire National Center for Biotechnology Information (NCBI) nonredundant (nr) protein database (http://blast.ncbi.nlm.nih.gov/Blast.cgi). The VCFtools software (Danecek et al., 2011) was used to calculate the minor allele frequency (MAF). Maximum temperature of the warmest month (BIO5) and mean temperature of the driest quarter (BIO9) were higher in the western range ( Figure S1). Mean temperature of the wettest quarter (BIO8), precipitation seasonality (BIO15), and precipitation of wettest and warmest quarter (BIO16 & BIO18) were higher in the eastern range ( Figure S2). Precipitation of the coldest quarter (BIO19), driest month (BIO14), and driest quarter (BIO17) were higher in Louisiana,

| Climate variation in the loblolly pine natural range
Mississippi and Alabama compared with other states. Along South to North, minimum temperature of the coldest month (BIO6) and mean temperatures of the warmest and coldest quarters (BIO10 & BIO11) decreased, while temperature seasonality (BIO4) and annual temperature range (BIO7) increased.

| SNPs associated with climate and geographic variables
We identified 503 associations, including 49 climate/geographic variables and 293 SNPs (
We found 33 loci identified by both SPA and OutFLANK software (Table S5). The MAFs of these 33 loci ranged between 0.06 and 0.47 with a median of 0.21. These 33 loci resided in 12 annotated genes encoding proteins that include the leucine-rich repeat receptor-like serine/threonine-protein kinase, the bHLH transcription factor, oxidoreductase, and an EARLY FLOWERING 3-like protein.

| Multivariate analyses of the identified SNP outliers
The multivariate analyses using redundancy analysis (RDA) confirmed that the outlier SNPs are significantly correlated (ANOVA  Table S6).
The variables explained the most variation on RDA1 were average precipitation in January, February, March, April, and December, precipitation of coldest quarter (BIO19), precipitation of the driest quarter (BIO17), mean temperature of wettest quarter (BIO8), mean diurnal range (BIO2), and precipitation of driest month (BIO14) ( Figure S3 and Table S7).

| Outlier SNPs associated with climate and geographic variables
We identified 1,790 associations between 323 SNP outliers and 47 climate/geographic variables using the Samβada software (Table S8).
Among them, 963 associations were related to temperature, 476 to precipitation, 41 to latitude, and 310 to longitude. The outlier SNPs associated with environment had MAFs between 0.05 and 0.49 with a median of 0.21, residing in 250 annotated genes.
Taken together, we identified 611 unique SNPs associated with 56 climate and geographic variables ("environmental SNPs"-envSNPs) using either the TASSEL or Samβada software (Table   S9). Only two variables, precipitation seasonality (BIO15) and precipitation of the driest quarter (BIO17) were not found to be

| SNPs associated with both climate/geographic variables and adaptive phenotypic traits
We identified five envSNPs associated with both height and diameter, 10 with height only, 114 with 27 metabolite levels, and 242 with expression levels of 47 genes (Table 2 and Tables S11 and S12). For example, 54 envSNPs associated with arachidic acid levels, and more than 60 envSNPs associated with the expression levels of ANR and NCED genes.
We combined genomic, phenotypic, and climate/geographic data to analyze adaptive genetic variation. For example, we found the envSNP scaffold10517.2_56785 (identified by both association and outlier detection methods) correlated with expression levels of the ANR and NCED genes. The expression levels of these two genes also correlated with precipitation of May (Figure 5a). The ANR gene encodes an anthocyanidin reductase, which is important for the biosynthesis of condensed tannins (Xie, Sharma, Paiva, Ferreira, & Dixon, 2003). The NCED gene encodes a 9-cis epoxycarotenoid dioxygenase, which prepares precursors for synthesis of abscisic acid  (Tan et al., 2003). ABA is a key regulator of seed development, root growth, stomatal aperture, and plant responses to water stress.
The envSNP scaffold10517.2_56785 resided in a gene encoding an abietadienol/abietadienal oxidase-like protein, which is a multifunctional and multisubstrate cytochrome P450 monooxygenase that contributes to conifer defense by generating an enormous structural diversity of plant terpenoid secondary metabolites (Ro, Arimura, Lau, Piers, & Bohlmann, 2005). Individuals with the AA genotype   Table S1; CDS, coding sequences; D, diameter measured at 18 inches above the ground; H, height; MAF, minor allele frequency; MAX and MIN, maximum and minimum temperature of each month; NA, not annotated; PRECIP, average precipitation of each month. a For SNP that was not located in the annotated region, the flanking sequence 700 bp upstream and downstream of the SNP was used to query against the NCBI Genbank nonredundant protein database using blastx.

| D ISCUSS I ON
We identified 611 envSNPs associated with 56 climate and geographic variables. Longitude, maximum temperature of the warm months and monthly precipitation associated with most envSNPs.
The identified envSNPs resided in genes related to terpenoid synthesis, pathogen and disease defense, transcription factors and abiotic stress response. We also found that some envSNPs composed the genetic structure of adaptive phenotypic traits including height, diameter, metabolite levels, and expression of genes.

| Comparison of multiple analysis methods
Combining environmental association analyses with outlier detection methods is a desirable way to reduce the rate of false positives and assess the relevance of findings in landscape genomic research (Le Corre & Kremer, 2012;Rellstab et al., 2015), but each method has its strengths and weaknesses. TASSEL exploits the genomic diversity at a very high resolution; hence, it is sensitive for detecting associations even for SNPs with low minor allele frequencies implemented in the current Samβada software may be overly conservative and may result in overlooking potentially adaptive loci (Stucki et al., 2017). We applied the multivariate approach RDA to examine the relationship between climate/geographic variables and genetic variation of the outlier SNPs. We identified precipitation factors as the important drivers for local adaption. However, the joint effect of climate and geography due to collinearity comprised 47% of the SNP outliers' variance, and the strong pattern of collinearity may skew the results (Rellstab et al., 2015). alleles detected by environmental association and SNPs outliers detected by the outlier methods is consistent with previous studies on loblolly pines and other species (Anderson, Kono, Stupar, Kantar, & Morrell, 2016;Eckert et al., 2013). The minimal overlap indicated two different modes of local adaptation: selective sweeps and polygenetic adaptation (Pritchard & Di Rienzo, 2010). These two modes may influence different sets of alleles. Selective sweeps drive adaptive alleles to fix in population, while polygenetic adaptation attain adaptation through modest changes in allele frequencies at large numbers of small-effect alleles (Pritchard, Pickrell, & Coop, 2010). Selective sweeps can occur and impact polygenic adaptation (Stetter, Thornton, & Ross-Ibarra, 2018), but polygenetic traits rarely reach fixation (Pavlidis, Metzler, & Stephan, 2012), thus most polygenic adaptation alleles with small allele frequency shifts cannot be distinguished from neutral pattern of variation. We need to apply different models to detect the polygenic adaptation and selective sweeps (Pritchard & Di Rienzo, 2010). Another explanation for the minimal overlap is either method used in the current study, association or outlier, or even neither of them, has enough power to detect most adaptive alleles; thus, output of the two methods does not overlap. Large numbers of small-effect alleles mainly compose the genetic structure of adaptive phenotypes (Mackay et al., 2009;Rockman, 2012). However, small-effect alleles are prone to homogenizing effects from migration or gene swamping (Lenormand, 2002;Yeaman, 2015), leaving many alleles that contribute to local adaption do not have large differences in allele frequency compared with putatively neutral markers; thus, conventional models are less powerful in detecting small-effect alleles. Large sample size and improved design may increase the power to discover the polygenic alleles with small effects (Berg & Coop, 2014).
There is no single widely accepted statistical approach (Rellstab et al., 2015). Integrating multiple methods and compiling all possible results can provide more reliable information for downstream analyses. Follow-ups are needed to validate the detected adaptive loci and genes using independent populations, knockout mutants, common garden, and reciprocal transplant experiments (Rellstab et al., 2015).

| Evidence of selection by environment
The identified SNP-environment associations helped us recognize the climate and geography variables that have shaped the genetic variation. We found that longitude, maximum temperature of the warm months and monthly precipitation were variables associated with the most envSNPs ( Figure 4). They may act as selective factors driving loblolly pine local adaptation. Although the seed transfer guidelines advised the yearly average minimum temperature as the most important climate variable for southern pines (Schmidtling, 2003), the current study highlights the importance and complexity of maximum temperature of the warm months and monthly precipitation in shaping the genetic variation underlying loblolly pine adaptability. A significant increase in the number of consecutive days exceeding 35°C (a metric used as a measure of heat waves) and a decline in the net water supply availability are expected over the next decades, particularly in the western part of the loblolly pine range (Kunkel et al., 2013;Sun, 2013). In a rapid climate change scenario, if adaptation of loblolly pine cannot match the increased heat and drought conditions, the productivity and thus the economic and ecological profits will be greatly damaged. Selecting and planting genotypes adapted to the changing climate may reduce losses in loblolly pine plantations. A cautionary note of the current study is that the studied trees were grown in local seed orchards from open-pollinated seeds. Although the seed orchards were close to the origins of seeds, the uncertainty of paternal origin may leave the studied trees have different adaptive phenotypes from the naturally grown trees.
According to a global assessment, forest trees may be exhibiting increased mortality due to drought, heat, insect outbreaks, and wildfires under climate change (Allen et al., 2010 (Eckert, Heerwaarden, et al., 2010a). In the current study, this gene was found to be associated with average maximum temperature in February and March, precipitation in January, February, April, June, November, and December, mean temperature of the driest quarter (BIO9), annual precipitation (BIO12), and precipitation of the coldest quarter (BIO19). The MATE efflux family proteins play important roles in a wide range of biological processes, such as transporting secondary metabolites, regulating disease resistance, and detoxifying toxic compounds (Liu, Li, Wang, Gai, & Li, 2016). These consistently detected genes are strong candidates underlying loblolly pine adaptability.
Combining environmental association analyses with dissection of phenotypic traits can greatly improve our understanding of the genetic basis of local adaptation. Talbot et al. (2017) reported that loci with local adaptation signatures in loblolly pine were also linked to gene expression traits for lignin development and wholeplant traits. In our study, more associations between loci with local adaption signatures and adaptive phenotypic traits were detected due to the application of 2.8 million SNPs. The loci with local adaption signatures correlated with height, diameter, metabolite levels, and expression of genes. These results indicate that genes underlying adaptive phenotypic traits are likely involved in adaptability to the environment. These candidate genes need to be further tested in validation populations located in different environments.

| CON CLUS ION
We identified 611 SNPs associated with 56 climate and geo- Institute of Food and Agriculture, Award #2011-68002-30185.

CO N FLI C T O F I NTE R E S T
The authors declare no competing interest.

AUTH O R CO NTR I B UTI O N S
C.A.L and K.V.K. conceived idea, designed the study, obtained the funding, coordinated the laboratory and field work, and assisted with editing the manuscript. M.L. performed the sample collection, data generation and analyses, and wrote the draft manuscript. All authors read and approved the final manuscript.

DATA ACCE SS I B I LIT Y
The 2.8 million whole exome-based SNPs used in this study are available from the Dryad Digital Repository (https ://doi.org/10.5061/ dryad.269126c).