Characterization of the genetic basis of local adaptation of wheat landraces from Iran and Pakistan using genome‐wide association study

Characterization of genomic regions underlying adaptation of landraces can reveal a quantitative genetics framework for local wheat (Triticum aestivum L.) adaptability. A collection of 512 wheat landraces from the eastern edge of the Fertile Crescent in Iran and Pakistan were genotyped using genome‐wide single nucleotide polymorphism markers generated by genotyping‐by‐sequencing. The minor allele frequency (MAF) and the heterozygosity (H) of Pakistani wheat landraces (MAF = 0.19, H = 0.008) were slightly higher than the Iranian wheat landraces (MAF = 0.17, H = 0.005), indicating that Pakistani landraces were slightly more genetically diverse. Population structure analysis clearly separated the Pakistani landraces from Iranian landraces, which indicates two separate adaptability trajectories. The large‐scale agro‐climatic data of seven variables were quite dissimilar between Iran and Pakistan as revealed by the correlation coefficients. Genome‐wide association study identified 91 and 58 loci using agroclimatic data, which likely underpin local adaptability of the wheat landraces from Iran and Pakistan, respectively. Selective sweep analysis identified significant hits on chromosomes 4A, 4B, 6B, 7B, 2D, and 6D, which were colocalized with the loci associated with local adaptability and with some known genes related to flowering time and grain size. This study provides insight into the genetic diversity with emphasis on the genetic architecture of loci involved in adaptation to local environments, which has breeding implications.

as revealed by the correlation coefficients. Genome-wide association study identified 91 and 58 loci using agroclimatic data, which likely underpin local adaptability of the wheat landraces from Iran and Pakistan, respectively. Selective sweep analysis identified significant hits on chromosomes 4A, 4B, 6B, 7B, 2D, and 6D, which were colocalized with the loci associated with local adaptability and with some known genes related to flowering time and grain size. This study provides insight into the genetic diversity with emphasis on the genetic architecture of loci involved in adaptation to local environments, which has breeding implications.

INTRODUCTION
The bread wheat (Triticum aestivum L.) genome is highly diversified with several distinct classes based on vernalization requirements, growth habit, planting times, and grain quality (Salamini et al., 2002). The genetic diversity of cultivated wheat has critically decreased due to the modern breeding activities that focus on a few major developmental traits (Reif et al., 2005). Of the four cultivated species (bread wheat, einkorn [Triticum monococcum Boiss.], emmer [Triticum turgidum ssp. dicoccum (Schrank) Schübl.], and spelt [Triticum spelta L.]), 95% of the cultivation and consumption is bread wheat. Modern bread wheat cultivars have reduced genetic variation with only 30% of the genetic variation found in their wild relatives (Reif et al., 2005). The loss of diversity in hexaploid wheat is mainly due to the reproductive isolation between closely related species of different ploidy levels because some of the wild relatives cannot make fertile hybrids upon hybridization (Mujeeb-Kazi et al., 2013). The reduction in genetic diversity is also attributed to the continuous artificial selection during domestication and modern wheat breeding, which positively selected genomic regions that influence productivity traits (Cavanagh et al., 2013;Jordan et al., 2015;Zhou et al., 2018). Landraces are traditional wheat varieties grown in farmers' fields before modern breeding was established and thus have unexplored genetic variations with unused alleles for tolerance to both biotic and abiotic stresses, end-use quality, and high resource use efficiency (Dwivedi et al., 2016;Lopes et al., 2015). Wheat landraces have the advantage over other wheat wild relatives of being more easily crossed with cultivated bread wheat. The large collections of landraces stored in gene banks have been characterized using modern genomic tools to systematically use beneficial traits in wheat breeding (Riaz et al., 2017;Sehgal et al., 2015;Vikram et al., 2016;Winfield et al., 2017).
Bread wheat evolved in the "Fertile Crescent" and then gradually spread to the rest of the world. N.I. Vavilov placed the origin of bread wheat to a region spanning an area from Afghanistan and Turkmenistan to Transcaucasia (Vavilov, 1926;Vavilov, 1992). The molecular evidence suggested the center of origin to be Transcaucasia and northwestern Iran (Dvorak et al., 1998). Scientific and archeological evidence suggested that Iran and Pakistan are important countries with reference to the center of diversity for cereals (Kihara et al., 1965). The southwestern Iran and Mehrgarh civilization in southwestern Pakistan are considered to be to be one of the main centers of diversity for bread wheat (Dvorak et al., 2011), barley (Hordeum vulgare L.), and einkorn wheat (Fuller, 2006;Morrell & Clegg, 2007). Thus, the in-depth diversity analyses of landraces from these geographic regions may highlight selection patterns in response to local environments and explain the polygenic adaptation.
Progress has been made in whole genome sequencing of bread wheat and its immediate progenitors (IWGSC, 2018;Rasheed & Xia, 2019), which has resulted in significant achievements in genomic studies in understanding the genetic diversity and improved adaptation of beneficial traits in bread wheat and aids the discovery of new genes for beneficial traits (Rasheed & Xia, 2019 ). In this study, we resequenced a panel of 512 landraces from Iran and Pakistan using genotypingby-sequencing (GBS) to identify the genetic regions that were selected for adaptation during wheat domestication and early breeding events, and to evaluate the genetic diversity in one of the major wheat diversity centers. In addition, we used geo-referenced agro-climatic attributes from the landrace collection sites as proxy traits and conducted genome-wide association studies (GWAS) to identify the genomic regions associated with bread wheat local adaptability.  Table S1), and 261 Iranian landraces that were collected from different climate zones between 1931 and 1968 and deposited in the USDA-ARS National Plant Germplasm System, Aberdeen, ID (Supplemental Table S2).

DNA extraction and GBS
Genomic DNA was extracted using a modified cetyltrimethyl ammonium bromide method (Saghai-Maroof et al., 1984) from five 2-wk-old seedlings of each sample. DNA concentration was quantified using the Quant-iT PicoGreen dsDNA Assay (Life Technologies Inc.) and normalized to 20 ng μl -1 for library construction. The GBS libraries were constructed following Poland et al. (2012). In brief, genomic DNA was digested using the restriction enzymes PstI and MspI (New England BioLabs Inc.), and barcoded adapters were ligated to each DNA sample using T4 ligase (New England Bio-Labs Inc.). All of the ligated products from each plate were pooled and cleaned-up using the QIAquick PCR Purification Kit (Qiagen Inc.). Primers complementary to both adaptors were used for polymerase chain reaction (PCR). The PCR products were then cleaned up again using the QIAquick PCR Purification Kit and quantified using the Bioanalyzer 7500 Agilent DNA Chip (Agilent Technologies, Inc.). After size selection for 250-300 bp fragments in an E-gel system (Life Technologies Inc.), the concentration of each library was estimated by the Qubit 2.0 fluorometer using Qubit dsDNA HS Assay Kit (Life Technologies Inc.). The size-selected library was sequenced on an Ion Proton sequencer (Life Technologies Inc.). First, sequence reads were trimmed to 64 bp, and then identical reads were grouped into sequence tags. The unique tags were aligned internally allowing mismatches of up to 3 bp to identify single nucleotide polymorphisms (SNPs) within the tags. Single nucleotide polymorphism calling was conducted using the Universal Network Enabled Analysis Kit GBS pipeline (Lu et al., 2013), which is part of the TAS-SEL 4.0 bioinformatics analysis package (Bradbury et al., 2007). Reads with a low quality score (<15) were removed, and SNPs with heterozygosity <10%, a minor allele frequency (MAF) >1%, and missing data <20% were used for further analysis. Sequence reads were aligned to the International Wheat Genome Sequencing Consortium (IWGSC) reference genome sequence (Version 1.0) (IWGSC, 2018) using BLASTn.

Data analyses
The basic statistics for the allele frequency, heterozygosity and genetic diversity were performed using TASSEL (Version 5.0). A diversity tree was built using the Neighbor-Joining algorithm (Saitou & Nei, 1987) that relaxes the assumption

Core Ideas
• Genetic loci underpinning adaptability to the target environment were revealed by GWAS. • Genes relate to flowering time and grain size were found to be associated with adaptability. • The use of historical agro-climatic data was useful in revealing loci that affect wheat adaptability. of equal mutation rates over space and time and produces an unrooted tree. The confidence interval of the genetic relationships among the accessions was determined by performing 1,000 bootstraps with the results expressed as percentages at the main nodes of each branch. For population structure, a model-based Bayesian cluster analysis was performed using STRUCTURE (Version 2.3.4) (Pritchard et al., 2000). The structure analysis was run 10 times for each K value (probability of best fit into each number of assumed clusters) (K = 1-8) using a burn-in period of 10,000 steps and 10,000 Monte Carlo steps and an admixture model. All parameters were set to default values recommended by the manufacturer (Pritchard et al., 2010). K was estimated by an ad hoc statistic ΔK based on the rate of change in the log probability of data between consecutive K values (Evanno et al., 2005). The squared correlation of allele frequency (r 2 ) was calculated using PLINK 1.9 (https://www.cog-genomics.org/ plink2; Chang et al., 2015) to evaluate the linkage disequilibrium (LD) in both sets of landraces independently and combined. The pairwise r 2 values were plotted against genetic distance in a 1-kb window, and a locally weighted polynomial regression (LOESS) curve was fitted using R software. The pattern of LD decay was determined as the genetic distance where LOESS curve intercepts the r 2 threshold of 0.1.
The eigenvector decomposition GWAS (EigenGWAS), which uses genome-wide association studies of eigenvectors, was used to identify the loci under selection (Chen et al., 2016). We used SNPs that fulfils the quality control criteria and generated a genetic relationship matrix to calculate the top 10 eigenvalues and their corresponding eigenvectors. The effect of SNP was calculated by regressing each SNP on a selected eigenvector. In principle, the estimated genetic effect for each locus is driven by genetic drift and selection, which are random. To remove genetic drift effects, we adjusted the pvalue with genomic control (GC) factor, and consequently the corrected p-value, P GC , was used to detect loci under selection. The significance threshold for loci under selection was determined by reshuffling the first eigenvector 1,000 times, and the 95th quantile of the most significant p-values from the 1,000 permutations was selected as the significance threshold. In this study, the −log 10 (p-value) of 5.87, which is equivalent to an experiment-wise type I error rate of .05, was used as the significance threshold for the EigenGWAS analyses on the 10 eigenvectors.

GIS data extraction and environmental GWAS
The georeferenced longitude and latitude positions of landrace collection sites were retrieved from the passport information of landraces from Pakistan and Iran. The landraces from other countries (n = 24) were excluded from the analysis. Climate data were extracted from extrapolated climate grids (Fick & Hijmans 2017) at a 30-s (∼1 km) resolution. Raw data files for monthly long-term (1970-2000) minimum (T min ), maximum (T max ), average (T avg ) temperatures, rainfall (PREC), vapor pressure (VAPR), solar radiation (SRAD), and soil pH (ph5) were downloaded and converted to Environmental System Research Institute (ESRI) grid format for storage and extraction (Hengl et al., 2017). Georeferenced locations for accessions were checked for errors and then used to extract climate values for each location using the spatial analyst toolbox in ESRI ArcMap 10.6 (Environmental System Research Institute, 2018).
The GWAS was conducted using a recently developed model selection algorithm, the fixed and random model circulating probability unification (FarmCPU)  with eight GIS data (T max , T min , T avg , SRAD, VAPR, latitude, longitude, and PREC) as response variables. The FarmCPU takes into account the confounding problem between covariates and test markers using a fixed effect model (FEM) and a random effect model (REM). The first five principal components calculated using TASSEL were used as covariates. The default p-value threshold that FarmCPU used was Bonferronicorrected threshold of .01. The Bonferroni-corrected threshold was overly restrictive when the LD among genotypic markers was large, so the threshold was calculated using the formula p = .05/number of markers using 1,000 permutations. In this function, the phenotypes were permuted to break the relationship with the genotypes. A vector of minimum p-values of each experiment was output and the 95% quantile value of the vector was recommended for p.threshold in the FarmCPU model. The threshold calculated by FarmCPU for the given traits was 4.68, which was used as a threshold to define marker-trait associations (MTAs). The quantilequantile (Q-Q) plot was used for assessing fitness of the model to population structure.

Genetic diversity
A total of 43,868 SNPs with <80% missing data were identified in the panel of 512 Pakistani and Iranian wheat landraces, and 33,961 (∼77.4%) of them were mapped to the IWGSC wheat reference genome (Version 1.0) (IWGSC, 2018). The highest number of SNPs were mapped on the B genome (17,186), followed by the A genome (13,012) and the D genome (3,763) ( Figure 1a). The B genome had the most transition-type (Ts) SNPs (1,691), and the D genome had the least (681) (Supplemental Tables S3 and S4). A similar chromosome distribution pattern was observed for transversiontype (Tv) SNPs; however, Tv SNPs were much more abundant (64.3%) than Ts SNPs (35.7%) with a Ts/Tv SNP ratio of 1.80. The Ts/Tv ratios in the A and B genomes were significantly higher than those in the D genome. As expected, more A/G and C/T transitions were observed than G/A and T/C transitions. On the other hand, the frequencies of C/G, A/C, G/T, and A/T transversions were higher than G/C, C/A, T/G, and T/A transversions (Supplemental Tables S3 and S4). More monomorphic SNPs (997) were observed in Pakistani landraces than in Iranian landraces (543), and these monomorphic SNPs were removed for subsequent further SNP analyses except in EigenGWAS for selective sweeps. Average gene diversity and polymorphic information content (PIC) values were slightly higher in Pakistani landraces (0.20 and 0.17) than those in Iranian landraces (0.18 and 0.15), respectively (Table 1). Similarly, the level of heterozygous SNPs was higher (8.41%) in Pakistani landraces than in Iranian landraces (5.88%). In general, the mean values of MAF and heterozygosity (H) were slightly higher for the Pakistani wheat landraces (MAF = 0.19, H = 0.08) than these for Iranian landraces (MAF = 0.17, H = 0.05) ( Table 1; Figure 1b-e).

Population structure and geographic and genetic diversity in the landrace panel
Almost all Pakistani landraces were clearly separated from Iranian landraces, with three Pakistani landraces in close proximity to the Iranian subpopulation in the principal component analysis (PCA)-biplot ( Figure 2a). Two of these landraces, accession 11186 and 11342, were collected from Balochistan (an area close to Iranian border) and one landrace (accession 12219) was from northern Pakistan. None of Iranian landraces was included in the Pakistani subpopulation. Cluster analysis using weighted pair group method analysis and principal coordinate analysis also classified the panel into two subpopulations in accordance with population structure analysis (Figure 2b and c, respectively). The Pakistani landraces were distributed over two major clusters, and the Iranian landraces were distributed over 10 independent clusters. Population structure of the landrace panel was estimated using STRUCTURE software based on genomewide SNP markers. The highest ΔK value was observed at K = 2, suggesting two subpopulations in the panel (Figure 3a). The Plant Genome

Sample size
No. of polymorphic SNPs Gene diversity Heterozygosity

Analysis of molecular variance and LD
The analysis of molecular variance for the landrace panel using 6,229 SNPs showed a much greater variation within a population (77.41%) than between the populations (22.59%, p < .001). The higher value of fixation index (Fst) value (.23) between the Pakistani and Iranian subpopulations suggests high genetic differentiation between the two subpopulations. The highest genetic differentiation between Pakistani and Iranian subpopulations were observed on the B genome (Fst = .28).

Pairwise LD and LD decay in Pakistani and Iranian wheat landraces
All the SNPs mapped to the IWGSC wheat reference genome (Version 1.0) (IWGSC, 2018) were used to estimate the extent of LD for all the accessions in the landrace panel ( Table 2). The LD decay plots in Figure 3b show distance in kilo-base pair vs. LD (r 2 ). The LD decay between the Iranian and Pakistani subpopulations revealed higher r 2 values in Pakistani landraces (Figure 3c). The number of SNP pairs in the A and B genome (Figure 3d and e, respectively) was almost the same  between the two subpopulations while the number of SNP pairs in the D genome was much higher in the Iranian subpopulation than in Pakistani subpopulation (Figure 3f). The average distance between SNP pairs was slightly higher (9.83 Mb) in the Iranian subpopulation than in the Pakistani subpopulation (9.0 Mb), with overall r 2 values of .058 and .086, respectively. The LD decayed to .2 among Iranian landraces at an average of 4.5 Mb distance, compared with 15.2 Mb in Pakistani landrcaes.

Agro-climatic variables for Iran and Pakistan
The climatic data were extracted from extrapolated climate grids at 30 s (∼1 km) resolution for all landraces from Pakistan and Iran. Seven agro-climatic variables (ACVs) were included in this study: longitude, latitude, T max , T min , PREC, VAPR, and SRAD. Pakistani wheat landraces were distributed between 70.62˚E and 33.50˚N with maximum values of 63.014-75.62˚E, 26.15-36.16˚N, whereas the Iranian wheat landraces were distributed 45˚-62˚E, 25.55-38.7˚N. The means of other ACVs such as T max and T min were 26.19 C and 11.16˚C for Pakistan compared with 25.55˚C and 0 C for Iran (Table 3). Similarly, annual PREC was 427.04 mm, VAPR was 1.12 Pa, and SRAD was 17,651.53 W/m 2 in Pakistan, whereas T max and T min were 26.2˚C and 11.16˚C, and annual PREC was 326.14 mm, VAPR was 0.74 Pa, and SRAD was 17,419.18 W/m 2 in Iran (Table 3).

Correlation between ACVs of Iranian and Pakistani wheat landraces
There was a positive correlation (r = .53) between annual precipitation (PREC) and longitude or latitude in Pakistan T A B L E 3 Mean argo-climatic variables (monthly, 1970-2000) collected at 30 s (∼1 km) resolution from the site of landraces collection from Iran and Pakistan Note. T min , minimum temperature; T max , maximum temperature; PREC, rainfall; VAPR, vapor pressure; SRAD, solar radiation.
( Figure 4; Supplemental Table S5). Iran had a positive correlation between PREC and latitude (r = .33) but a negative correlation between PREC and longitude (r = −.14). A high positive correlation between PREC and VAPR was observed in Iran (r = .98), but not in Pakistan where r values were from .06 to −.4. The maximum (T max ) and minimum (T min ) temperatures had weak positive associations with PREC and VAPR, respectively, in Pakistan, but a highly positive correlation with VAPR and a negative correlation with PREC in Iran (Supplemental Table S5). The SRAD was negatively correlated with latitude, longitude, and PREC, but positively correlated with T max and T min in Pakistan. In Iran, the SRAD was positively F I G U R E 4 Coefficient of correlation between agro-climatic variables collected from the landraces collection sites in Iran and Pakistan. Upper diagonal represents a correlation between agro-climatic variables in Iran and lower diagonal represents a correlation between agro-climatic variables in Pakistan. T min , minimum temperature; T max , maximum temperature; PREC, precipitation; VAPR, vapor pressure; SPAD, solar radiation correlated with PREC and VAPR but did not correlated with longitude, latitude, T max , and T min (Figure 4).

EigenGWAS for selective sweeps in Pakistan and Iranian landraces
EigenGWAS was conducted for the top 10 eigenvectors that explained 29.86% of the total genetic variation with each eigenvector explaining 0.83 to 12.36% of the genetic variation. The genomic inflation factor (λ GC ) that is commonly used to adjust population stratification for GWAS (Devlin & Roeder, 1999) was calculated by EigenGWAS (Supplemental Table S7). EigenGWAS was conducted based upon positive or negative coordinates of the landraces from Iran and Pakistan on the corresponding eigenvector, and their selection differentiations were quantified by Fst. After correction by λ GC , the SNPs with −log 10 (P GC ) that exceeded the threshold of 6.01 were declared as the loci under selection at the genomewide level. To facilitate the comparison, scanned results from −log 10 (P GC ) and Fst were compared using a Miami plot (Figure 5a-j for each of the 10 eigenvectors). Generally, the peaks from −log 10 (P GC ) and Fst mirrored each other, indicating reasonable grouping as defined by EigenGWAS. Overall, EigenGWAS identified selective sweeps on all 21 wheat F I G U R E 5 Miami plot from eigenvector decomposition genome-wide association study (EigenGWAS) (upper for P GC and lower for Fst) for eigenvector 1 (Ev1) to eigenvector 10 (Ev10) based on the 512 wheat landraces from Iran and Pakistan (a-j). P GC is the p-values corrected by λ GC in EigenGWAS. Ev1-Ev10 are the first 10 eigenvectors, each of which was used as phenotype for the single-marker association study based on nearly 43,868 single nucleotide polymorphism markers in EigenGWAS chromosomes (Supplemental Table S7; Figure 5), while significant hits were found on 8 of the 10 eigenvectors except for Ev1 and Ev10. The highest number of hits were 760 SNPs over the three loci for Ev8. The distribution of the identified SNPs across different chromosomes varies considerably, with the highest number on chromosome 2B (207 SNPs) and the lowest on chromosome 4D (12 SNPs). The genomic region on chromosome 4A between 103 and 504 Mb consistently had several selective sweeps with more than 98 SNPs. All the selective sweeps identified by EigenGWAS were shared by the Fst scan. Selective sweeps were consistently detected in the three genomic regions of group 1 homeologous chromosomes flanking glutenin genes ( Figure 5). Twenty-five loci in selective sweeps were within close proximity (∼10 Mb) of functional genes including chromosome 1B (TOE-B1, TaFT3-B1, and Elf3-B1 at 6.06, 582.8, and 679.4 Mb, respectively, for flowering time [Zikhali et al., 2016[Zikhali et al., , 2017

GWAS for local adaptability based on ACVs
The GWAS for ACVs identified 90 loci in Iranian landraces and 58 loci in Pakistani landraces (Figure 6a-k for each variable). The Manhattan and Q-Q-plots are shown for each trait like longitude (Figure 6a for Pakistan and Figure 6b for Iran), and vice versa, for each trait. The maximum number of loci associated with ACVs were identified on chromosome 7B (16) and minimum on chromosome 4D (2). The maximum number of loci were identified for VAPR (45) and the minimum number of loci were identified for T max (8). Among these, 17 loci were associated with multiple variables (Supplemental Table S7). For example, four loci on chromosome 6B (13.7 Mb), 7B (193.4 and 717. 2 Mb), and 7D (159.7 Mb) were associated with both T min and T max . Similarly, a locus on chromosome 6B (75.9 Mb) was associated with latitude and SRAD in the Pakistani subpopulation. Based on the colocalization of loci within 10 Mb proximity, the common loci in both subpopulations include those for longitude on chromosome 3D, annual precipitation on chromosome 4B (650 Mb), SRAD on chromosome 3D (3.3 Mb), and VAPR on chromosome 3B (∼41 Mb). Similarly, based on colocalization, six loci were within proximity of selective sweeps, which included those for VAPR on chromosome 1D, 5D in the Iranian subpopulation and 5D in the Pakistani subpopulation, and precipitation on 2A and 6B in the Iranian subpopulation. Some functional genes were also identified within close proximity of loci associated with ACVs. These include Elf3-B1 (Wang et al., 2016), a flowering time-related gene, in close proximity to 1B locus associated with longitude at the 678.2 Mb position. Vrn-D1 was also found in proximity of a locus associated with longitude among Iranian landraces. Similarly, a locus associated with longitude among Pakistani landraces was within close proximity to Ppd-D1 locus, a photoperiod responsive gene, on chromosome 2D at position 2.7 Mb. The same locus was also associated with PREC among Iranian landraces. A drought-related gene, TaCwi-4A, was identified in proximity of the locus associated with VAPR on chromosome 4A at position 647.8 Mb among Iranian landraces (Jiang et al., 2015;Webster et al., 2012). The Plant Genome

Genetic diversity and population structure
The present study investigated a panel of wheat landraces from two important geographical regions for wheat domestication, Pakistan and Iran. That the B genome had the highest number of SNPs in the panel of landraces was consistent with some previous studies (Alipour et al., 2017;Chao et al., 2007). The higher percentage of Ts SNPs (64.29%) than Tv SNPs (35.71%) in the current study is comparable with 64.5% of transition SNPs and 35.5% of transversion SNPs reported by Winfield et al. (2016). The significantly higher number of Ts SNPs than Tv SNPs was due to shifting of methylcytosine to uracil and then mutating to thymine. The hexaploid wheat genome is highly methylated because of its polyploidy, which may explain the higher number of transition SNPs in wheat. The higher Ts/Tv SNP ratio improves the accuracy for SNP prediction with a greater level of confidence.
Genetic variability in the wheat landrace panel revealed by PIC and genetic diversity reflected genetic diversity in a genomic level or polymorphism in DNA fragments of a population and may facilitate the evolutionary process by creating the selection pressure on the alternate forms of genes. The mean PIC value (0.162) estimated in wheat landraces in this study was consistent with Chao et al. (2007) and Ren et al. (2013), but lower than that (0.30) estimated by Liu et al. (2017). In the present study, the Pakistani subpopulation had a higher gene diversity than the Iranian subpopulation. The biallelic nature of most of the SNPs may be one of the reasons for overall low genetic diversity compared with that estimated by simple sequence repeats markers (Chao et al., 2007). The differences in diversity between Pakistani and Iranian landraces indicated that the Pakistani subpopulation was relatively more diverse. The genetic variation present among and within groups can be used for wheat improvements in breeding programs.
It is important to accurately analyze and define the relationship between the molecular and functional diversities in natural populations based on the structural analyses of a particular population (Buckler & Thornsberry, 2002). The population structure analyses classified the landrace panel into two subpopulations with the highest ΔK value of 2. The structure plot, cluster analysis, and PCA confirmed the classification of the panel into two lineages with some international varieties from the United States, Mexico, and Italy in admixture with Pakistani germplasm. The cluster analyses using the SNPs from each genome separately gave the same results and clearly separated the panel into Pakistan and Iran subpopulations, suggesting that the two landrace subpopulations have different genetic makeups. The genetic diversity found between the two subpopulations could be the results of different selection pressures imposed by native people and cultivating habits (Alipour et al., 2017). The higher Fst value (p < .0001) between the Pakistani and Iranian subpopulations suggests higher genetic differentiations between the two subpopulations, and most of the genetic differences between these landraces occur on the B genome.

LD in landraces
The LD is the basis for association mapping. Linkage disequilibrium has been extensively studied in bread wheat at both the full genome and individual chromosomal levels (Chao et al., 2009). In this study, we examined LD and LD decay at both levels of the wheat landrace panel. About 26.97% of SNP pairs were in significant LD (p < .001), which was similar to the LD (22%) found in maize (Zea mays L.) inbred lines (Dinesh et al., 2016). The current study also revealed that the B genome (especially chromosomes 2B and 3B) had the highest LD, and the D genome (specifically 4D) had the lowest LD. The number of SNP pairs in the LD in the D genome of the Iranian genotypes was much higher than that for Pakistani genotypes. Also, the D genome had a longer genetic distance of significant LD than A and B genomes for both the Pakistani (17.67 Mb) and Iranian (18.87 Mb) subpopulations. The results of the LD and LD decay in Pakistani and Iranian wheat landrace subpopulations may be used as a foundation to determine the proper marker density for quantitative trait loci (QTL) mapping and markers linked to QTLs for marker-assisted wheat breeding. Comparison of the LD decay distances between the Iranian and Pakistani wheat landraces revealed that Pakistani landraces had the higher r 2 values for all the three genomes than those for Iranian landraces. The variation observed at the chromosomal LD decay distances was positively correlated with the variation in recombination rates in previous studies (Dooner & He, 2008).

Selective sweeps and GWAS for local adaptability
The identification of selected regions is a basic step in understanding adaptability and selection history (Helyar et al., 2011;Narum & Hess, 2011). The genes and SNPs located in these regions should have been under selection and are valuable for breeding and for germplasm selection. In this study, selective regions between landraces in Pakistan and Iran were determined among genomes using EigenGWAS. Chen et al. (2016) proposed a single-marker regression approach based on PCA (or eigenanalysis). The regression coefficient of EigenGWAS was approximately equal to Fst. In recent studies, EigenGWAS has been successfully deployed to identify selection signals in species, such as humans (Parolo et al., 2017) and wild birds (Bosse et al., 2017;Kim et al. 2017). They identified the genes under selection or adaptation and illustrated how genetic signatures of selection can be translated into variation in fitness and phenotypes. In our study, a total of 122 selected regions were found in both landraces sets, out of which 26 were colocalized with important developmental genes. Cavanagh et al. (2013) identified selection regions using the wheat 9K SNP arrays in 2,994 hexaploid wheat accessions and found that most selected regions were associated with yield potential, vernalization, plant height, and biotic and abiotic stress tolerance. Zhou et al. (2018 investigated 717 Chinese wheat landraces with 27,933 diversity arrays technology and 312,831 SNP markers and found 148 selected regions associated with yield and disease tolerance. The selected regions in our study were associated with important agronomic traits, including genes for yield-related traits (TaGS3-D1, TaGW2-6B, SPL20-7D, and SPL20-7A), end-use quality (ALP-7A, Pinb2, and Tamyb10), disease resistance (Lr67), and vernalization (Vrn-A1, Vrn-D4, and Vrn-B1). The allelic variation for most of the genes, such as SPL20-7A, 7D, ALP-7A, Tamyb10, and TaGS3-D1, is unknown in most of the geographic regions of the world. Zhao et al. (2019) surveyed allelic variation of 47 functional genes in a global wheat collection of 1,152 accessions from 21 countries. It is the only major germplasm survey study available to compare the frequency of functional genes; however, the germplasm collection was mainly composed of improved cultivars. From their results, it was quite clear that frequency of favorable alleles for yield-related genes like TaGW2 and TaGS-D1 was quite higher in improved cultivars compared with landraces. The GWAS hits near glutenin and Tamyb10 are plausible because the end-use preference in Iran and Pakistan has remained different. In Pakistan, wheat is used most often to prepare flatbread or chapatti (Rasheed et al., 2015), which is different from wheat products in Iran such as lavash, barbari, and sangak. The products from both countries require different glutenin and grain texture quality, which are primarily controlled by Glu-1, Glu-3, and Pin loci. Some evidence from the literature supports our results; for example, Lr67 is a durable rust resistance gene evolved after wheat polyploidization (Moore et al., 2015), and its resistance allele frequency is predominant in landraces from Pakistan and India (Riaz et al., 2016). Similarly, Kippes et al. (2015) identified a Vrn-D4 gene, which was important for the development of spring growth habit in the ancient wheats (T. aestivum L. ssp. sphaerococcum) from South Asia. Although the presence of Vrn-D4 in T. aestivum landraces is not yet surveyed, it is likely that the genomic region spanning Vrn-D4 on chromosome 5DS is under differential selection compared with Iranian landraces.
Modern breeding has effectively improved wheat adaptation to human preferences and various environments. Recently, many studies have been done to dissect the mean-ingful association between genotypic polymorphisms and environmental variables in natural populations by using different statistical tools (Ferrero-Serrano & Assmann, 2019;Günther et al., 2016;Russell et al., 2016). The use of genome-environment associations will help us understand crop adaptation to certain environments. However, little research has been reported on this topic because of the lack of reliable germplasm passport data, proper geographical coverage in germplasm collections, and undisputed genetic grouping (Contreras-Moreira et al., 2019). The data for ACVs in the current study was collected from extremely fine grids of weather data to uncover polygenic adaptation and domestication in wheat. The environmental GWAS (EnvGWAS) in this study used precise geographical, climatic, and soil data for each landrace collection site (referred to as GIS data for simplicity) and genomic data for those wheat landraces (Navarro et al., 2017), which provides important information to further decipher the genomic loci and regions affecting wheat adaptation. As GIS data are highly associated with crop selection and adaptation, they are also associated with population structure .
In crop improvement, genome-environment associations may be integrated with genome-phenotype associations to select for climate resilience traits (Lasky et al., 2015). Previously, genetic loci were identified for adaptive traits in soybean [Glycine max (L.) Merr.] landraces with 27 SNPs associated with three ACVs, i.e., temperature, precipitation, and day length (Li et al., 2020). Li et al. (2019) uncovered polygenic adaptation traits in 1,143 maize accessions using EnvG-WAS and identified a large number of loci associated with ACVs. In our study, many loci that associated with flowering time-related genes (Vrn-1, Ppd1, and Elf3-B1) were expected. Flowering time is a critical adaptive trait that affects reproductive success in a local environment and is typically studied in relation to environmental factors such as changes in temperature and daylengths (Amasino & Michaels, 2010). In our study, flowering time-related genes were detected within LD blocks surrounding five loci. The early flowering gene TaElf3-B1 was within the LD block of a locus associated with longitude among Iranian landraces on chromosome 1B; Ppd-D1 was close to a locus on chromosome 2D associated with longitude in Pakistani landraces and precipitation in Iranian landraces. Previously, Jones et al. (2013) reported that diversity at Ppd-D1 locus in Aegilops tauschii explained the species distribution along geographic coordinates, and that allelic variation is associated with adaptability pattern. Another developmental-related gene, Rht-8, is linked with Ppd-D1; the former is among the so-called Green Revolution genes inducing short stature in wheat germplasm. Vrn-A1 was close to a locus associated with VAPR on chromosome 5A in Pakistani landraces, and Vrn-D1 was within LD of a locus associated with longitude in Iranian landraces. Previously, maize landraces were used to identify the genomic region controlling the adaptability to longitude and latitude, and it was further revealed that about 61.1% of SNPs for altitude were associated with flowering time (Navarro et al., 2017). Similarly, SNPs associated with longitude on chromosome 1B were in close proximity of the Elf3-B1 gene, which is an important early flowering inducing gene. Some important mutations in Vrn-A1 and Vrn-B3 genes have been identified in landraces from Pakistan and Iran. Sehgal et al. (2015) identified the Vrn-A1f allele, which is a deletion of 5,997 bp within the Vrn-A1 gene, in landraces from Iran, Iraq, and Afghanistan, thus pointing to a Middle Eastern and/or near eastern origin of this allele. Derakhshan et al. (2013) reported an 890-bp insertion in the vrn-B3 gene from Iranian landraces, and another allele, Vrn-B3b, was identified in landraces from Pakistan, Iran, Uzbekistan, Turkmenistan, and Afghanistan, indicating a wide distribution of this allele. Similarly, SNPs in close proximity of the Vrn-1 loci have been associated with important ACVs like longitude, latitude, and VAPR in both Iranian and Pakistani landraces.
Temperature is another well-known climatic factor affecting plant growth and influencing plant distribution and productivity (Lobréaux & Melodelima, 2015). Because temperature is an influential factor for crop adaptation, it has been associated with some genomic regions. Lobréaux and Melodelim (2015) found that 25 genomic regions on four different chromosomes were associated with T min in Arabidopsis thaliana. It is likely that two loci in Iranian landraces (on chromosomes 5B and 6B) and six loci in Pakistani landraces (on chromosomes 5B, 6D, 7B, and 7D) that were associated with T max could be important for heat tolerance because they explained the adaptability of landraces in the geographic regions with the highest temperatures. Because the temperature gradients are different between the two countries, we could not find any common loci associated with temperature attributes between the two subpopulations.
The absence of common loci associated with ACV is expected and reported in literature. He et al. (2019) correlated the genomic regions associated with bioclimatic variables in different wheat populations and identified 43,670 genomic regions covering about 988 Mbs of the wheat genome. Crosspopulation comparisons indicated that most genomic regions in selective sweeps do not overlap, and only three genomic regions were shared among all nine populations used in the comparison. Because both populations in our study were not an admixture, it was difficult to find common genetic basis of wheat adaptability to high temperature in both populations. The reasons for the significant variation among populations could be that (a) selection pressures are likely to vary temporally and spatially, and strongly influenced by environmental conditions, such as climatic regime, biotic and abiotic stress, altitude, annual sunlight, temperature, and precipitation; (b) genotypes underwent different selection pressures to adapt to local agricultural practices; and (c) selection may be acting on multiple functionally equivalent mutations in different populations (Cavanagh et al., 2013;Ralph & Coop, 2010).
Several loci with significant effects on adaptation may not be detected due to the strict quality-control procedure of filtering minor alleles, which is a common limitation of GWAS. Such loci can be detected in the biparental populations by QTL mapping in populations derived from the cross of landraces and cultivars. However, Boyle et al. (2017) indicated that species generally adapt by small allele frequency shifts of many causal genes across the genome; hence, the GWAS could be the best approach to identify those variants with moderate-to-minor effects. The reported adaptation loci could have applications in breeding if the favored allele frequency is lower in modern wheat cultivars. Because these loci could explain environmental adaptability, high breeding value landraces can be identified by having a high frequency of favored alleles. These results will help breeders better understand the implication of the genomic footprint on crop adaptation and plant responses to climate changes and better explore exotic germplasm sources in a more targeted manner. We are currently extending this and genomic prediction-based approaches to the whole germplasm bank using genomic and GIS data to identify genomic regions and accessions of highest breeding relevance.

A C K N O W L E D G M E N T S
We acknowledge Prof. M. E. Sorrells (Cornell University) for technical advice and editing of the manuscript. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the USDA. The USDA is an equal opportunity provider and employer. We acknowledge funding from the National Science Foundation of China (32022064 and 31950410563), and the National Key Research and Development Program of China (2016YFD0100303).

D A T A AVA I L A B I L I T Y S T A T E M E N T
The genotypic and phenotypic data is presented as Supplemental Tables S8 and S9, respectively.

AU T H O R C O N T R I B U T I O N S
AR, HL, and AG designed the research; GB provided reagents and facility for genotyping; AG, FM, and RA conducted genotyping; UH, AR, HL, and JL performed data analyses; KI, AG, SUS, HA, MRB, VM, and SAP contributed germplasm; AR, UH, and HA wrote the manuscript; all authors edited and approved the final version of the manuscript.

C O N F L I C T O F I N T E R E S T
All authors declare no conflict of interest.