Genetic adaptation of Tibetan poplar (Populus szechuanica var. tibetica) to high altitudes on the Qinghai–Tibetan Plateau

Abstract Plant adaptation to high altitudes has long been a substantial focus of ecological and evolutionary research. However, the genetic mechanisms underlying such adaptation remain poorly understood. Here, we address this issue by sampling, genotyping, and comparing populations of Tibetan poplar, Populus szechuanica var. tibetica, distributed from low (~2,000 m) to high altitudes (~3,000 m) of Sejila Mountain on the Qinghai–Tibet Plateau. Population structure analyses allow clear classification of two groups according to their altitudinal distributions. However, in contrast to the genetic variation within each population, differences between the two populations only explain a small portion of the total genetic variation (3.64%). We identified asymmetrical gene flow from high‐ to low‐altitude populations. Integrating population genomic and landscape genomic analyses, we detected two hotspot regions, one containing four genes associated with altitudinal variation, and the other containing ten genes associated with response to solar radiation. These genes participate in abiotic stress resistance and regulation of reproductive processes. Our results provide insight into the genetic mechanisms underlying high‐altitude adaptation in Tibetan poplar.

The rise of landscape genomics has been expedited by next-generation sequencing, the increasing availability of public datasets of environmental factors, and the rapid development of computational power (Balkenhol et al., 2019;Rellstab, Gugerli, Eckert, Hancock, & Holderegger, 2015). For example, high-throughput sequencing technology allows the quantification of numerous allele variants across the whole genomes of many individuals (Andrews, Good, Miller, Luikart, & Hohenlohe, 2016;Luikart et al., 2003). Likewise, environmental data can be obtained at high resolution using accurate remote sensing devices (Anderson & Gaston, 2013;Pettorelli et al., 2005). Increased computational power enables analyses of the large datasets, thus, generated in a reasonable amount of time (Kidd & Ritchie, 2006;Paul & Song, 2012). Unlike traditional approaches to testing outlier loci, landscape genomics has the potential to discern adaptive patterns by identifying genetic variants coupled with particular environmental factors. Recently, numerous studies of local adaptation combining both population genetic and landscape genomic approaches have been reported in various species, notably in forest trees such as Pinus, Picea, and Populus (Eckert et al., 2010;Geraldes et al., 2013Geraldes et al., , 2014Grivet et al., 2013;Keller, Levsen, Ingvarsson, Olson, & Tiffin, 2011).
Various types of forest trees are serving as model species that can provide information about demographics and adaptive processes in forest ecosystems through population genomics or landscape genomics (Sork et al., 2013). Populus is a globally distributed tree genus that is native to the Northern Hemisphere and contains nearly 30 species. Poplars are pioneer species and ecologically important trees in their habitats. Due to several advantages of poplar species, including rapid growth, relatively small genome size (<500 Mbp), suitability for efficient genetic transformation, and ease of propagation in tissue culture, they have become important model organisms for studies of forest tree species with well-developed genetic and genomic resources (Street & Tsai, 2010;Wullschleger, Weston, DiFazio, & Tuskan, 2013). Several studies of population genomics and landscape genomics of local adaptation have been reported in Populus trichocarpa (Evans et al., 2014;Geraldes et al., 2014;Holliday, Zhou, Bawa, Zhang, & Oubida, 2016;Porth et al., 2015;Zhou, Bawa, & Holliday, 2014), Populus balsamifera (Fitzpatrick & Keller, 2015;Keller et al., 2011;Keller, Levsen, Olson, & Tiffin, 2012), Populus alba (Stölting et al., 2015), Populus tremula and Populus tremuloides (Wang, Street, Scofield, & Ingvarsson, 2016a, 2016b, and Populus deltoids Fahrenkrog, Neves, Resende, Vazquez, et al., 2017). However, very limited information is available about high-altitude adaptation in ecologically and economically important endemic alpine species such as Tibetan poplar Populus szechuanica var. tibetica, which is distributed on the Qinghai-Tibetan Plateau (QTP).
The Tibetan poplar is a perennial woody plant belonging to Populus sect. Tacamahaca that is endemic to the QTP and mainly distributed in Sichuan and Tibet along an altitude gradient from 2,000 to 4,500 m (Shen, 2014). Recent studies have concentrated mostly on the genetic diversity, phenotypic, and physiological mechanisms accounting for its adaptation to harsh environmental conditions including low temperature, strong solar radiation, and poor soil (Shen et al., 2014;Tang, Pubu, & Cidan, 2012). However, the genetic mechanisms underlying local adaptation to increasing altitudes in Tibetan poplar remain unclear. Here, we investigated the genetic diversity and genetic adaptations of Tibetan poplar at low (~2,000 m) to high altitudes (~3,000 m) to investigate its genetic adaptation to this harsh high-altitude environment using genome-wide single-nucleotide polymorphism (SNP) data obtained from genomic resequencing technologies.

| Sampling strategy and DNA extraction
A total of 400 samples were collected from Sejila Mountain in southeastern Tibet along an altitudinal gradient of 2,000-3,000 m in the summers of 2013 and 2014 ( Figure 1; Table S1). These samples were clustered into two altitude groups: high (LL and DJ) and low (PL and TM). However, they were distributed continuously throughout the study area. Each individual was collected at least 30 m from others to prevent the selection of clones. Cuttings of each sample were planted in cultivation medium composed of vermiculite and perlite in a greenhouse at Beijing Forestry University. Approximately 0.2 g newly emerged leaves was prepared from the cuttings for DNA extraction to prevent insect DNA contaminants. Total genomic DNA was extracted using the DNAsecure Plant kit (Tiangen Biotech (Beijing) Co., Ltd.) following the protocol. After quality control of extracted DNA using 1% agarose gel electrophoresis and ultraviolet spectrophotometry, at least 1.5 µg DNA from each sample was prepared for genome-wide resequencing.

| SNP calling and data filtering
Genomic DNA libraries of 500-bp fragment size were constructed and then paired-end sequenced on the Illumina sequencing platform (Hiseq 2000) at the Novogene Bioinformatics Institute (Beijing, China) in 2015. Raw reads were trimmed to remove (a) adapter sequences, (b) raw reads containing more than 15 N bases (10% of 150 nt) in a single read, and (c) raw reads containing more than 75 nt low-quality bases (Q ≤ 5). Trimmed reads were mapped to the P. trichocarpa genome version 3.0 (https://genome.jgi.doe. gov/) using BWA (mem -t 4 -k 32 -M; ) and SAMtools program "rmdup" to remove duplications ).
Pairwise kinship among 400 samples was inferred using the program King 2.2.3 with all filtered SNPs (Manichaikul et al., 2010). All duplicate individuals were removed, and 348 independent individuals were retained for subsequent analyses ( Figure S1).

| Population structure and divergence
One of the major assumptions employed in inferring population structure was that there were no spurious correlations among the measured variables. Therefore, the physical and linkage disequilibrium-correlated SNPs needed to be pruned before population structure estimation. First, SNPs were thinned by randomly selecting one SNP from a 10-Kbp window size, leaving 31,793 SNPs retained for the subsequent population structure and divergence estimation. We used both model-independent and model-dependent methods to infer population structure from resequencing data. Modelindependent principal component analyses (PCAs) were performed using the package GCTA (Yang, Lee, Goddard, & Visscher, 2011).
A more precise population genetic structure was inferred using Admixture software (Alexander, Novembre, & Lange, 2009). The predefined genetic cluster value (K) was set from 1 to 5. The number of iterations for convergence for each K is given in Figure 2b We also performed analyses of molecular variation (AMOVAs) using Arlequin version 3.5 (Excoffier & Lischer, 2010) to assess the distribution of total genetic variation. For the AMOVA, we combined the four sample sites into two altitudinal groups (high: LL & DJ; low: PL & TM).

| Gene flow and migration events
According to the continual distribution of Tibetan poplar on Sejila Mountain and lack of a geological barriers separating our four sample sites, gene flow might be a driving force for the population structure and genetic diversity. Therefore, we inferred the population divergence and gene flow by reconstructing the maximum likelihood phylogenetic tree based on allele frequency data using Treemix software version 1.12 (Pickrell & Pritchard, 2012). Three individuals of P. trichocarpa (downloaded from the Joint Genome Institute database http://phyto zome.jgi.doe.gov) were used as the outgroup.
The migration event parameter was set from 1 to 4. Pruned SNPs (n = 31,793) were used to estimate migration events.

| Estimation of genetic parameters
After inferring population structure and gene flow events, we estimated several genetic parameters, as detailed below. All of the filtered SNPs (n = 490,363) were used for calculating these genetic parameters.

Population fixation index (F ST ). The population fixation index
(F ST ) was calculated for all six pairwise comparisons of four sample sites using a sliding window method implemented in VCFTools with a window size of 80 Kbp and a step size of 10 Kbp. A multiple regression on matrices (MRM) test was performed using the package ecodist implemented in R software (https://www.r-proje ct.org/) with 1,000 permutations to test whether genetic distance was correlated with geological distance for the four sample sites.
2. Nucleotide diversity estimation (π). Sliding window estimates of nucleotide diversity (π) were calculated for all four sample sites with the same parameters used in the F ST calculation. We also compared the nucleotide diversity of different genomic regions, such as intergenic, exon, intron, UTR, upstream, and downstream regions.
3. Linkage disequilibrium (LD). The squared correlation coefficient (r 2 ) was calculated for LD of SNPs in a 100-Kb window for all four sample sites using Plink software (--ld-window-kb 100). The general pattern of LD decay was then estimated using the software LD decay with default parameters (Zhang, Dong, Xu, He, & Yang, 2019). A plot of the LD decay rate against physical distance was generated using R software (https://www.r-proje ct.org/).

| Signatures of divergent selection
To investigate the mechanisms underlying adaptation to altitudinal gradients, whole-genome scanning (490,363 SNPs) was performed with a F ST outlier approach to ascertain selective signals, executed in BayeScan software v2.1 (Foll & Gaggiotti, 2008). In this method, the  Loci identified based on the BayeScan method (q-values < 0.01) were considered as putative SNPs under selection (selSNPs). These selS-NPs were retained for the subsequent environmental association analysis.

| Candidate gene annotation
The gene IDs of potential selected genes located in selective regions were extracted from the latest general feature format (GFF) file of the P. trichocarpa genome (RefSeq assembly accession: GCF_000002775.4) using a custom Python script. We then converted gene IDs to gene ontology (GO) IDs using the bitr function and the toTable function to extract GO terms in the clusterProfiler package (Yu, Wang, Han, He, 2012) in R. The annotation database used in this study was acquired using the AnnotationHub package (Morgan, Carlson, Tenenbaum, & Arora, 2019). The accession number of the P. trichocarpa database is "AH66282." The GO annotation diagram was plotted using a custom script implemented in R software. Approximately 48.54% of 490,363 SNPs were located in intergenic regions, and 12.88% were detected in exon regions (Table S3).

| Population structure
We retained 31,793 unrelated SNPs for sequential population structure inference. In PCA, all 348 unrelated samples from the four sampling locations clustered into two spatially separated groups associated with different altitudes (Figure 2c). The first component could explain 35.8% of the differences among variables. A more comprehensive assessment of the stratification present in Tibetan poplar was obtained using Admixture software (Alexander et al., 2009). First, we inferred the probable population structure by setting subpopulation numbers from 1 to 5 and chose the most suitable K value by selecting the minimum CV error using 31,793 SNPs pruned by 10-Kbp random windows (Figure 2a,b). The most suitable value for K was 3, which had the minimum CV error.
However, the difference between the CV error when K = 3 and K = 4 was very slight. Then, we compared the differences among the CV errors of different pruned SNP datasets generated from 5-, 15-, 20-, and 25-Kbp windows (Table S4). There was no significant divergence between these values when K = 3 and K = 4 (Figure 2b).
Given the population structure and the number of sample sites, we chose K = 4 as the most suitable value for population structure in the subsequent analysis.

| Migration event inference
A common migration event indicating gene flow in the direction from high-altitude (DJ) to low-altitude (PL) sites was detected in all inferred trees. (Figure 2d & Figure S2). Another gene flow event, which was conditional on three or four migration events, was detected from the other high-altitude (LL) to low-altitude (TM) sites ( Figure S2). Without exception, the direction of these migration events was from high to low altitude.

| Genetic parameters
The average F ST value of the four sampling sites was 0.050, ranging from 0.006 (PL against TM) to 0.083 (LL against TM), indicating that there was little genetic differentiation among the four sampling locations (Table S5). Almost 96% of genetic variation was attributable to variation within sample sites, whereas only 3.64% of the variation was attributable to differences between the two altitude groups (Table 1).
The average distance at which LD values decayed below 0.2 was almost 26 Kbp. The rate of LD decay for the low-altitude group was much quicker (~25.5 Kbp) than that for the high-altitude group (~28 Kbp; Figure 2e). The slowest LD decay rate was detected at LL (~29 Kbp), and the fastest at TM (~25 Kbp; Figure 2e; Figure S4).

| Environmental association analyses
A set of 74 unique selSNPs were associated with altitudinal gradients (altSNPs) based on the hierarchical distribution of BF and absolute ρ. Most of these (n = 69) were retained after permutation tests.
Similarly, 82 selSNPs were associated with variation in solar radiation (sradSNPs) under the criteria of permutation (p < .01). However, only 6 selSNPs were associated with average temperature (tavg-SNPs; Figure 3a). A common SNP (Chr16: 1,698,907), located in gene LOC112324539, was associated with both altitude and solar radiation. This gene encoded a putative receptor-like protein kinase which was orthologous with At3g47110 in Arabidopsis. Unfortunately, we did not identify any common SNPs that were associated with all three environmental factors ( Figure S5a).
One hotspot region, located on chromosome 6 from 26.21 to 26.88 Mbp, exhibited robust signals associated with solar radiation ( Figure 3a). This region housed 20 sradSNPs, and 55% of which (11 of 20) were harbored in 10 genes (Figure 3a). The average LD value (r 2 ) for 284 SNPs from this region was 0.501, forming three main LD block regions (Figure 3a; Table 2). Another hotspot region, located on chromosome 15 from 13.39 to 14.00 Mbp, contained 11 SNPs associated with altitudinal gradients; four of these were located in four unique genes (Figure 3a).

| Annotation of environment-associated genes
In total, 58 unique genes contained eaSNPs associated with at least one environmental factor (Table S6). One gene (LOC18098834) encoded branched-chain-amino-acid aminotransferase-like protein 1, associated with both altitude and average temperature. Another gene (LOC112324539) was associated with altitude and solar radiation (Table S6 and Figure S5b). Gene ontology (GO) analysis indicated that these unique genes belonged to 48 GO terms ( Figure   S5c). The maximum number of GO terms was 3 (ATP binding), in the molecular function (MF) category.
A total of 68 genes were located in a hotspot on chromosome 15, which was associated with responses to altitudinal variation. The GO term analysis indicated that these genes were mainly in- ). Most of the altSNPs (7 of 11) were located in intergenic regions, but four altSNPs were harbored in four unique genes (Table 2).

Source of variation
Similarly, 80 genes located in a hotspot region on chromosome 6 were associated with responses to variations in solar radiation.
The GO term analysis indicated that these genes were involved in several processes, including manufacture of cell membrane com- observed in this study ( Figure S6). The nearer sample sites are to each other, the greater is the potential for gene flow between them (Sharma & Khanduri, 2007). The apparent absence of gene flow between TM and PL (~20 km) might result from a slight divergence of wind power. All analyses detected common gene flow events from DJ to PL, whereas the longest distance (~70 km) gene flow occurred was from LL to TM ( Figure S4). Such long-distance gene flow mediated by either pollen or seeds has been documented for a great diversity of tree species, as reviewed by Kremer et al. (2012).

| Genetic adaptation to an altitude gradient
Two hotspot regions were detected, one involved with responses to increasing altitude and the other with solar radiation. Four genes were associated with altitudinal gradients, and 10 were associated with solar radiation. One gene was orthologous to At3g47110, a leucine-rich repeat receptor-like kinase gene found in Arabidopsis thaliana. This LRR protein interacted with a FERRIC REDUCTASE DEFECTIVE3 (FRD3), which is involved in citrate efflux transportation and sustained microspore development during pollen tube growth in A. thaliana (Muschietti & Wengier, 2018). Another gene encoded MADS-box transcription factor 47, which is associated with solar radiation. The MADS-box transcription factors participate in floral organ initiation and identity, partially through negative regulation of brassinosteroid (BR) signal transduction (Duan et al., 2006).
Other genes located in these hotspot selective regions may also play roles in abiotic stress resistance and flowering time. For TA B L E 2 Candidate environmental factor-associated genes detected in hotspot selective regions

| CON CLUS ION
This is the first attempt to elucidate the intricate genetic structures associated with high-altitude climate adaptation in the QTP endemic woody plant Tibetan poplar, Populus szechuanica var. tibetica.
Integrating population genomic (BayeScan) and landscape genomic (Bayenv2) methods, we detected two hotspot regions with robust signals of natural selection associated with altitude and solar radiation.
These regions comprised 14 genes that are mainly involved in abiotic stress resistance and sustaining successful reproduction. Therefore, we hypothesize that Tibetan poplar adapted to high altitude partially through sustaining successful reproduction under conditions of environmental stress. The interaction between gene flow and natural selection drives local adaptation in this population of Tibetan poplar. This paper will be useful for understanding how various evolutionary forces, including natural selection and environmental factors, drive local adaptation to altitudinal differences in Tibetan poplar.

ACK N OWLED G M ENTS
We thank Dr. Wenhao Bo for his contribution to this project. This work was supported by the Special Fund for Forest Scientific Research in the Public Welfare (201404102). The English in this document has been checked by at least two professional editors, both native speakers of English. For a certificate, please see: http://www. textc heck.com/certi ficat e/index /OrJncm.

CO N FLI C T O F I NTE R E S T
The authors state that there is no conflict of interest.

AUTH O R CO NTR I B UTI O N S
Chenfei Zheng: Formal analysis (equal); resources (lead); writing -original draft (equal); writing -review and editing (equal).