Landscape genomics predicts climate change‐related genetic offset for the widespread Platycladus orientalis (Cupressaceae)

Abstract Understanding and quantifying populations' adaptive genetic variation and their response to climate change are critical to reforestation's seed source selection, forest management decisions, and gene conservation. Landscape genomics combined with geographic and environmental information provide an opportunity to interrogate forest populations' genome‐wide variation for understanding the extent to which evolutionary forces shape past and contemporary populations' genetic structure, and identify those populations that may be most at risk under future climate change. Here, we used genotyping by sequencing to generate over 11,000 high‐quality variants from Platycladus orientalis range‐wide collection to evaluate its diversity and to predict genetic offset under future climate scenarios. Platycladus orientalis is a widespread conifer in China with significant ecological, timber, and medicinal values. We found population structure and evidences of isolation by environment, indicative of adaptation to local conditions. Gradient forest modeling identified temperature‐related variables as the most important environmental factors influencing genetic variation and predicted areas with higher risk under future climate change. This study provides an important reference for forest resource management and conservation for P. orientalis.


| INTRODUC TI ON
The observed rapid pace of climate change is expected to profoundly influence species distribution and diversity, and is considered as one of the significant causes of biodiversity decline and/or loss in the next century (Dawson, Jackson, House, Prentice, & Mace, 2011;Pacifici et al., 2015;Warren et al., 2013). Evidence of climate-induced local extinction is widespread among plant and animal species (Urban, 2015;Wiens, 2016). Forest trees constitute a significant group of organisms in their combined ecological and economic importance.
Understanding how forest trees respond to climate change aids efforts to predict species range shift and informs management issues related to conservation and reforestation.
Long-lived tree species with wide distribution ranges often show clear adaptation to local environments. Local adaptation in which local genotypes have a fitness advantage than foreign genotypes is well known among tree species (Aitken & Bemmels, 2016;Hereford, 2009). Rapid climate change can break this genetic-environmental association much faster than trees' ability to adapt in situ or migrate (Aitken, Yeaman, Holliday, Wang, & Curtis-McLane, 2008;Jump & Penuelas, 2005), thus creating a mismatch between genetic adaptation to altered environmental conditions. In addition, human activities lead to population fragmentation, thereby reducing gene flow, which undoubtedly increases the risk of maladjustment of local populations when environment changes.
The development of landscape genomics is providing unprecedented insights into the evolutionary processes and the molecular basis of adaptation, aiding in understanding how species and populations respond to climate change challenges (McKinney, Larson, Seeb, & Seeb, 2017). Landscape genomics, integrating geographic and environmental information, uses a large number of genetic loci to understand the extent to which climate has shaped genetic variation in the past (Sork et al., 2013). It could further be used to quantify modern patterns of interaction between genetic variation and climate conditions, and predict vulnerable populations under future conditions when combined with methods for exploring nonlinear genotype-environment relationships in multivariate space (Fitzpatrick & Keller, 2015;Holliday et al., 2017). Gradient forest (GF) is a community-level transfer function based on machine-learning regression tree approach known as random forests (Ellis, Smith, & Pitcher, 2012). This method is now extended to analyze and map genomic variation associated with environmental tolerance across space and times (Fitzpatrick & Keller, 2015). GF modeling can also be used to calculate the difference between current and future genotype-environment relationships and forecast the geographic regions of high genetic mismatch under future climates if migration and de novo mutations cannot compensate for the required diversity. Until now, this approach has not been widely implemented in forest tree population studies.
Platycladus orientalis is a member of the family Cupressaceae and one of the dominant coniferous tree species in northern China.
The natural range of P. orientalis covers northern and northwestern China, Korea, and Russian's Far East, and it is globally introduced to Africa, Asia, Australia, Europe, North America, and South America (Li, Du, & Wen, 2016). Due to the diverse geographic regions it occupies, the species exhibits large amounts of morphological and physiological variation (Mao et al., 2010;Shi, Zheng, & Qu, 1992;Wu, 1986). However, it is not clear whether these variations are reflective of genetic diversity and what evolutionary forces have driven the diversity.
In this study, we sampled the Chinese range of P. orientalis and surveyed their genetic variation using genotyping by sequencing (GBS). For large conifer genomes, GBS has become a practical method for generating genome-wide variant data for population genetic studies (Chen, Mitchell, Elshire, Buckler, & El-Kassaby, 2013;Pan et al., 2015;Parchman, Jahner, Uckele, Galland, & Eckert, 2018;Xia et al., 2018). Our objectives were (a) to assess the species' genetic diversity and population structure, (b) to evaluate the impact of environment and geography on genetic variation, and (c) to predict genetic offset of regional populations in relation to climate change.
These investigations offer insight on environmental factors that have influenced the distribution of genetic diversity in this major conifer species, and provide basic information for forest managers to address management and conservation strategies under future climatic conditions.

| Sampling, GBS library preparation, and sequencing
During the 1980s-1990s, bulked seeds from 21 P. orientalis populations distributed throughout China and one Thuja koraiensis population (LYL) from Heilongjiang Province, China, were collected (Table 1, Figure 1) and stored at -20℃ for further use. Populations from the south of the Yangtze River (CL, LP, NP) are in small, sporadically distributed patches and appear to be introduced (Dong, Chen, Zhang, Li, & Kong, 1990). Thuja koraiensis is morphologically similar to P. orientalis and has an adjacent distribution to the northeast of P. orientalis. We included T. koraiensis to test whether it can be distinguished by GBS and whether there is gene flow between these two species.
From each population, 8-17 seeds were treated with 30% hydrogen peroxide for 1 hr and immersed in water overnight, and then germinated at 25℃ on moist filter paper (Table 1). DNA was extracted from seedlings using the cetyltrimethylammonium bromide (CTAB) method (Doyle & Doyle, 1987).
We prepared a GBS library for the 300 samples using a PstI high-fidelity restriction enzyme (New England Biolabs, MA, USA), following a previously established protocol (Pan et al., 2015) with minor modification. In brief, restriction enzyme digestion and adapter (with individual barcode) ligation were carried out simultaneously on 200 ng DNA from each sample. Then, the digested and ligated DNA were pooled, purified, and PCR-amplified. Fragment size of 330-550 bp was selected and purified. Paired-end sequencing (2 × 125 bp) was performed on an Illumina HiSeq2500.

| Processing of Illumina data
Adapter sequences and low-quality bases (base quality <20) from the tail of each read were removed using Trimmomatic v0.36 (Bolger, Lohse, & Usadel, 2014). Then, variants were built de novo from the short reads using Stacks pipeline (Catchen, Amores, Hohenlohe, Cresko, & Postlethwait, 2011). Briefly, the cleaned paired reads were demultiplexed and trimmed to 99 bp in length using the "pro-cess_radtags" module. The matching reads were grouped into stacks and built loci de novo in each sample with "ustacks" modules. The minimum number of reads to create a stack (-m flag) was set at 3 following the strategy proposed recently (Paris, Stevens, & Catchen, 2017;Rochette & Catchen, 2017) with "-H" flag to disable haplotypes calling from secondary reads, and the maximum distance (in nucleotides) allowed between stacks to define loci (-M flag) was set at 4. After that, loci were matched up according to sequence similarity to create a catalog of all loci (i.e., a set of consensus loci) across the samples using "cstacks"; the distance allowed between sample loci (-n flag) was set to 5. The number of polymorphic loci shared by 80% of samples was used to determine the values of parameters M and n for "ustacks" and "cstacks" modules ( Figure S1; Paris et al., 2017;Rochette & Catchen, 2017). The settings of these parameters were used to control the number of SNPs recovered, measures of genetic diversity estimates, and genetic inference for downstream applications (Mastretta-Yanes et al., 2015;Shafer et al., 2017). Then, the "sstacks," "tsv2bam," "gstacks," and "populations" modules were implemented with default parameters to match each sample against the catalog and perform variants calling.
Subsequently, the variant dataset was further filtered using the "populations" module in Stacks and VCFtools (v0.1.13) (Danecek et al., 2011). Potential homeology was excluded by removing markers showing heterozygosity >0.70. SNPs with more than 50% of missing data were removed. We further filtered the dataset with a minor allele frequency (MAF) <0.05 and only kept biallelic SNPs.

| Population genetic analyses
The population structure was investigated using the model-based evolutionary clustering approaches as implemented in ADMIXTURE v1.30 (Alexander & Lange, 2011;Alexander, Novembre, & Lange, 2009) and discriminant analysis of principal components (DAPC) in R package adegenet (Jombart, 2008). Only one SNP from each GBS fragment was kept in ADMIXTURE analysis (3,911 SNPs). ADMIXTURE was run under K ranged from 1 to 10 and was repeated 10 times for each K with different random seeds. The most appropriate K value was selected after performing the 10-fold cross-validation procedure, whereby the best K exhibits low cross-validation TA B L E 1 Geographic locations, sample size (N), average heterozygosity per locus (H obs ), average nucleotide diversity (π), and Wright's inbreeding coefficient (F IS ) of the 21 Platycladus orientalis populations and one Thuja koraiensis population error (CV error) opposed to others. We used the CLUMPAK (Cluster Markov Packager Across K) web server to align and visualize the bar graphs (Kopelman, Mayzel, Jakobsson, Rosenberg, & Mayrose, 2015). DAPC with prior clusters defined by ADMIXTURE was carried out using the same set of SNPs.
To avoid overestimating the number of potential clusters caused by the presence of isolation by distance (IBD), as is often found in continuous populations, we further used conStruct (Bradburd, Coop, & Ralph, 2018) to identify structure in a spatially aware context. con-Struct allows for explicit test of discrete versus continuous spatial patterns by estimating the ancestral components of each population and the rate at which relatedness decays with distance. We tested both spatial and non-spatial models using loci of missing rate <30% (1,546 SNPs) for a range of K = 1-6, with 10 repetitions per each K value and 50,000 iterations per repetition. We performed 10-fold cross-validation to choose the best-fit number of clusters (K). For each best fit K, we conducted three independent runs to evaluate the convergence.

| Environmental variables and their associations with genetic variation
For each sampling location, we used a high-resolution climate database, climateAP (Wang, Wang, Innes, Seely, & Chen, 2017), to generate environmental data for China. We chose 49 variables with known impacts on plant survival and development, including 14 annual and 35 seasonal environmental variables (Table S1). We calibrated and downscaled climatic projections representing two different future scenarios RCP4.5 and RCP8.5, reflecting moderate and extreme conditions, respectively, for 2055 and 2085.
We performed GF analyses to identify the environmental variables that best explained the distribution of genetic diversity using the R package gradientForest (Ellis et al., 2012). To ameliorate some (c) Genetic structure and admixture inferred with spatial conStruct (K = 2). (d) Gradient forest mapped genetic-environmental associations across the distribution area. Colors represent the PCA-summarized gradients in genomic turnover. The first three PCs were each assigned to a RGB color, red, green, and blue. Similar colors in the sampled space correspond to similar expected genetic composition associated with climate of the problems that arise due to linkage among markers, we kept only one SNP per GBS fragment for GF analyses. Any SNP that was polymorphic in fewer than 5 of the 17 populations was removed to ensure attaining robust regression. We used 2,000 regression trees per SNP to fit GF model while keeping all the parameters set at default values.
At first, all the 49 environmental variables were included in GF models. After evaluating the ranked importance and Pearson pairwise correlations among these variables, 10 variables (eref, dd_0_djf, ext, dd5_djf, ppt_djf, pas_djf, ppt_son, ppt_jja, eref_jja, and cmd) with absolute value of Pearson correlation coefficient (r) ≤.8 were retained for the following analysis ( Figure 2; Table S1). The 10 environmental variables and the unlinked SNPs were used to build the final GF model, which was used to predict the current genomic composition of each grid point across the range of P. orientalis as defined by Hu, Jin, Wang, Mao, and Li (2015). The resulting multidimensional genomic patterns were summarized using principal component analysis (PCA) (Ellis et al., 2012). The first three PCs were each assigned to a RGB color, red, green, and blue. Similar colors in the sampled space correspond to similar expected genetic composition. This allows us to visualize the different environmental adaptations within the distribution of P. orientalis, where similar colors represent similar allele frequencies.

| Isolation by distance (IBD) and isolation by environment (IBE)
To investigate the role of geographic and environmental factors in shaping the spatial genetic differentiation, we calculated: (a) isolation by distance (IBD), (b) isolation by environment (IBE), and (c) the correlation between environmental and geographic distance. (Rousset, 1997) was calculated among populations using the R package hierfstat (Goudet, 2005). Mantel test was used to assess associations between linearized F ST /(1 − F ST ) and geographic distance and environmental distance with significance determined using 999 permutations in the R package vegan (Oksanen et al., 2018). The 10 selected climate variables were used to calculate environmental distances by first scaling and centering the variables to account for differences in magnitude, then calculating pairwise Euclidean differences between sites.

| Outlier detection
We applied three methods to detect outlier SNPs, BayeScan (Beaumont & Balding, 2004), Pcadapt (Luu, Bazin, & Blum, 2017), and Bayenv2 (Günther & Coop, 2013). For BayeScan, we designated 30 prior odds for the neutral model and included 100 pilot runs, followed by 100,000 iterations with a burn-in length of 50,000 iterations (Lotterhos & Whitlock, 2014). To decrease false positives due to multiple testing, we applied the false discovery rate (FDR) criterion (0.01). For Pcadapt, the K = 3 was selected based on scree plot ( Figure S2). Outlier SNPs were identified under FDR of 0.01 using q-value package in R (Storey, Bass, Dabney, & Robinson, 2015). For Bayenv2, we first created a neutral SNP set by excluding any outlier SNPs detected by BayeScan and Pcadapt, and retaining one SNP per GBS fragment. The covariance matrix was estimated using these SNPs with 100,000 iterations. We compared three independent runs of covariance matrices with different random seeds to ensure convergence. The same 10 environmental variables were used to calculated environmental correlations by averaging five independent

| Redundancy analysis
To estimate the degree to which genomic variation is influenced by environmental or geographic variables, we performed a series of redundancy analyses (RDAs) in the R package vegan (Oksanen et al., 2018). and degree-days above 5°C in December, January, and February (dd5_ djf) to explain population variation using the rda function in the vegan package (Oksanen et al., 2018). The ANOVA.cca function was used to test the significance of the partitioning with 999 permutations.

| Genetic offset under future climates
To identify the spatial regions where genetic-environmental relationships will be most likely disrupted by climate change, we first used the current GF model to predict genetic compositions under RCP4.5 and RCP8.5 for the years 2055 and 2085. Then, we calculated the Euclidean distances between the current and predicted future genomic composition to represent the genetic offset between current and future climates across the landscape Fitzpatrick & Keller, 2015). We visualized the genetic offset for different climate scenarios in geographic space to show the spatial distribution of population-level vulnerability to climate change.

| Sequence data
Our GBS generated 520 million paired-end reads from 300 individuals, of which 472 million reads (90.7%) passed initial quality filters (

| Population genetic structure
Using the unlinked SNPs ( conStruct cross-validations showed that the spatial model was marginally superior to non-spatial model as the predictive accuracy of non-spatial model continued to improve slightly as subsequent clusters were added up to K = 6 ( Figure S4), indicative of overestimating the number of potential clusters. For the spatial model, the predictive accuracy was highest at K = 3, but additional spatial layer beyond K = 2 contributed very little to total covariance ( Figure S4). Thus, the spatial model at K = 2 sufficiently described the population structure, and the clustering patterns of spatial and non-spatial models were very similar, indicating the contribution of IBD to the population structure was small. The decay of genetic relatedness against the geographic distance also supported weak IBD within each ancestral layer (α D ≈ 0) ( Figure   S5). The pattern of admixture along the latitudinal gradient re-

| Environmental associations with genetic variation
Three small, sporadically distributed, and potentially introduced P.
orientalis populations located south of the Yangtze River (CL, LP, NP) and the T. koraiensis population (LYL) were removed from this analysis, leaving 17 populations for all following analyses. We performed GF analyses to test whether a subset of genomic variation can be explained by environmental effects and to visualize climate-associ- and tmax_jja) were all related to temperature, suggesting temperature was a key factor influencing distribution of P. orientalis (Figure 2).

| Partitioning genomic variation to IBD and IBE
Gene flow patterns may align with environmental or geographic distances, so we tested isolation by distance and environment. A as measured by adjusted R 2 (6.3%-10.7%, p < .05; "combined fractions," Table 2). To further unlock the contribution between geography and environment, we performed partial RDA. A total of 13.1% of variation could be jointly explained by environment and geography ("total explained," Table 2). Environment and geography each exclusively explained 6.8% (p < .05) and 2.4% (p > .05) of the variation, respectively.
Outlier SNPs may be constituted by environment or geography or both. We found 2%-19.3% of the variation could be explained by both factors jointly (

| D ISCUSS I ON
This population genomics study confirmed the genetic divergence between P. orientalis and T. koraiensis, identified population structure in P. orientalis, and revealed evidence of adaptive genetic variation through a combined F ST outlier, genetic-environmental association, and RDA approach. Based on current genetic-environmental associations and machine-learning-based modeling, we identified regions with high genetic offset in P. orientalis distribution range where genetic-environmental relationships are most likely to be disrupted under future climate conditions. Our analyses provide the first insight on diversity and evolutionary forces operating in this species and assist genetic conservation and reforestation operations.

| Population structure
Strong genetic differentiation between P. orientalis and T. koraiensis populations supports their divergence as two different species.
The taxonomic status of the two species is debated, as P. orientalis is also named as T. orientalis (Durak et al., 2019;Xie, Dancik, & Yeh, 1992  China (Xia et al., 2018). In addition, human activities also contributed to the fragmentation of P. orientalis. It is suggested that P. orientalis became fragmented into relatively small and isolated populations since at least 1,500 years ago (Xie et al., 1992). A more thorough sampling covering all isolated populations would provide a more detailed demographic history of the species.

| Environmental adaptation
Environment has been widely reported as a strong selective pressure on natural populations (Joshi et al., 2001;Mosca et al., 2012).
Testing for IBD and IBE in P. orientalis using Mantel test indicated that geographic and environmental distances were almost equally important to the observed genetic differences. The relationship between geographic and environmental distances was highly correlated, which made it difficult to correctly parse out the factor that plays the key role in shaping the genetic variation. We thus further applied RDA to subdivide the genetic variation to environment and   (Xia et al., 2018). This signature of IBE can be produced by genetic adaptation to local environments (Nachman & Payseur, 2012;Wang & Bradburd, 2014).
Local adaptation studies on climate change contribute to understanding the ability of populations to sustain or adapt to rapid climate change (Fournier-Level et al., 2011). Adaptive variation is partially structured by environmental factors, which may be mostly driven by temperature gradients for P. orientalis (Fu & Shen, 1989).
GF analyses indicated that temperature was by far the most important variable associated with genetic variation. Temperature is a key factor influencing growth and phenology of various tree species, including P. orientalis (Fu & Shen, 1989). Temperature influences the growth of plants by affecting the metabolic processes such as photosynthesis, respiration, and transpiration, as well as the metabolic processes that affect the synthesis and transportation of organic matter (Wahid, Gelani, Ashraf, & Foolad, 2007).
Additionally, ambient temperature can directly affect soil temperature, thus affecting the absorption and transport of water and nutrients. Low temperature (dd_0_djf) seemed to be an indispensable factor, which is not only a limiting factor for the survival (Dong et al., 1990) but also a critical factor for volume growth in P. orientalis (Chen, Yang, Li, Xu, & Wang, 2012). However, the physiological mechanism of P. orientalis responding to low temperature is not yet understood. Dissection of this adaptive mechanism should be the objective of future studies.
Water availability is commonly recognized as another critical factor delimitating tree species' distribution in northern China (Mao & Wang, 2011;Xia et al., 2018). However, for P. orientalis only one of the top ten-ranked environmental variables is in the water regime indicating that precipitation has less impact on genetic adaptation than temperature. This is likely due to the strong drought tolerance of P. orientalis, as it can survive under annual precipitation of less than 200 mm and soil moisture content below 5% (Dong et al., 1990).
Furthermore, seasonal precipitation (ppt_jja, ppt_son, ppt_man, ppt_djf) appeared to be more important than annual mean precipitation (map, Figure 2), suggesting that the time of precipitation might have a greater impact on phenological and periodic events such as flowering and growth than annual variables.

| Genetic offset in future climates
Based on the current genetic-environmental associations, we attempted to assess the potential genetic offset in P. orientalis under future conditions using GF modeling. The same strategy has been applied to a variety of species Fitzpatrick & Keller, 2015;Maier, 2018). Using this method, we gain insight into the potential risk of a species' persistence under climate change.
Our GF analyses suggested that P. orientalis would be less affected in the northwestern Loess Plateau and most of northern China, while relatively high genetic offset was predicted in the northern and southern margins. These high-risk regions would need to adapt fast, either actively or passively, to keep pace with climate change; otherwise, populations in these regions may decline. Similar to other conifers, P. orientalis has a relatively long lifecycle with long generation intervals, in which the rate of emergence and spread of novel adaptive alleles in populations through de novo mutations are likely to be too slow to respond to rapid future climate changes.
The prediction accuracy of the GF models has been verified on bird populations Ruegg et al., 2018). However, true evolutionary responses in long-lived conifers are more complex to predict than GF models and may involve interactions between selection and the distribution of fitness effects of minor alleles and new mutations. Minor alleles were excluded from many analyses in this study due to consideration of genotyping errors in GBS procedure, for which we applied stringent filtering including MAF. Rare alleles may contribute to adaptation, and their roles should be better investigated using other genotyping methods (e.g., resequencing).
Additionally, effective population size, the level of standing genetic variation, and the stage of population equilibrium in terms of local adaptation can influence the accuracy and power of GF projection and interpretation. Future work is needed to combine landscape genomics and empirical data on phenotypic variation of the P. orientalis populations to validate and adjust model predictions.

| Management strategy
Platycladus orientalis is one of the most widely distributed coniferous trees in China (Dong et al., 1990) with a long life span, strong adaptability, and wide utilization. It is of great significance for accelerating the greening of China and improving the ecological environment.
Due to the drought resistance, this species plays an important role in China's landscape, especially in the northwestern Loess Plateau and the establishment of protection shelterbelts in northern China (Dong et al., 1990). In the present study, we identified the genetic structure and differentiation within P. orientalis range, which offers an opportunity for optimal seed zone delineation, allocation of seed sources, seed movement for reforestation, and genetic conservation. For example, traditional seed transfer guidelines are based on provenance trials and climate models to select the range of seed transplants. However, establishment and maintenance of provenance trials are expensive and time-consuming, resulting in limited information available, which makes it challenging to develop either population response functions or transfer functions (Mátyás, 1994).
The knowledge generated from the present study could serve as complementary or an alternative to traditional approaches.
Considering that future climate may dramatically change in certain parts of the species' range, we propose adopting assisted gene flow to increase genetic diversity and adaptation to the anticipated climate changes. Assisted gene flow is a managed migration of individuals or gametes between populations within species range, which may be effective in accelerating adaptation to future climate (Aitken & Whitlock, 2013). In the areas of predicted high genetic offset, we should consider the use of a composite provenance that mixes native with selected non-native seedlots to increase diversity and resilience. The main genetic clusters detected in our study were broadly distributed, encompassing large variations in growth, suggesting phenotypic variations evolve at a different pace. Thus, composite seed sourcing would allow faster climate matching without risking potential genetic mismatch. For example, in the north margin where the predicted genetic offset is high, it may be appropriate to introduce seeds from the warmer southern regions of the same climate zone, where the seeds may include pre-adapted genotypes to future climate. It must be noted that our prediction is based on simulation of genomic information, without considering the function of other potential factors such as phenotypic plasticity and the stage of population equilibrium in terms of local adaptation. It would be valuable to conduct experiments on seedlings from different regions to be exposed to varying combinations of water and temperature to evaluate responses to environmental conditions. Such testing will help to refine seeds transfer strategy and validate genetic-environmental interactions.

ACK N OWLED G EM ENTS
This study was supported by the National Natural Science

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.