Genome‐wide association study in diverse Iranian wheat germplasms detected several putative genomic regions associated with stem rust resistance

Abstract Stem rust is one of the most important diseases, threatening global wheat production. Identifying genomic regions associated with resistance to stem rust at the seedling stage may contribute wheat breeders to introduce durably resistant varieties. Genome‐wide association study (GWAS) approach was applied to detect stem rust (Sr) resistance genes/QTLs in a set of 282 Iranian bread wheat varieties and landraces. Germplasms evaluated for infection type and latent period in four races of Puccinia graminis f. sp. tritici (Pgt). A total of 3 QTLs for infection type (R2 values from 9.54% to 10.76%) and 4 QTLs for the latent period (R2 values from 8.97% to 12.24%) of studied Pgt races were identified in the original dataset. However, using the imputed SNPs dataset, the number of QTLs for infection type increased to 10 QTLs (R2 values from 8.12% to 11.19%), and for the latent period increased to 44 QTLs (R2 values from 9.47% to 26.70%). According to the results, the Iranian wheat germplasms are a valuable source of resistance to stem rust which can be used in wheat breeding programs. Furthermore, new information for the selection of resistant genotypes against the disease through improving marker‐assisted selection efficiency has been suggested.

Fungicide application and stem rust-resistant wheat cultivars are two important strategies to combat Pgt. However, the fungicide application may not always be feasible due to the high cost, risk of fungicide resistance, and environmental contamination. Using resistant wheat cultivars is the most effective and economical way to control the disease. In this regard, having sufficient knowledge on population genetics of the disease causal agent and identifying effective resistance genes in the selected wheat genotypes are of great importance.
Plant disease resistance is generally divided into monogenic and polygenic categories based on the mode of inheritance in plants. In the monogenic type, a gene is responsible for the expression of resistance, and the type of resistance, dominance, and codominance, as well as discrete variation, can be obtained in differentiating generations and can be identified using the Mendelian laws and the monosomic technique of the genes. In the polygenic type, resistance is governed by several genes, so monosomic techniques and Mendelian laws cannot be used to identify resistance genes. Linkage mapping to detect quantitative traits loci (QTLs) is used as a methodology to understand the genetic control of polygenic traits. The major limitations of this methodology are a small number of crossovers, resulting in low genetic mapping resolution (10-20 cM) and the high cost of population replication to reach sufficient crossovers. Besides, the created populations are only useful for limited traits and studies (Gupta et al., 2005;Holland, 2007). An alternative methodology with the advantage of using natural populations is association mapping.
Association mapping seeks to identify continuous markers linked to one or more quantitative traits. It is based on the assumption that the observed phenotypic variation is related to genetic variation. In this method, a large and diverse set of individuals/lines drawn from a population is randomly collected and mapping is done based on linkage disequilibrium. Association mapping has been used for both whole-genome scanning and candidate gene analysis (Kraakman et al., 2004;Pasam et al., 2012;Rafalski & Morgante, 2004;Rostoks et al., 2006;Thornsberry et al., 2001). Association mapping is much more accurate than linkage mapping due to its many recombinations (Moose & Mumm, 2008). It is also more compatible with genetically diverse germplasm and allows the mapping of several traits simultaneously. Therefore, there is no need to create two-parental populations for each trait, which in turn incurs additional costs for genotypic and phenotypic evaluation.
The objectives of this study were to carry out a genome-wide search in Iranian wheat genotypes for resistance loci to Pgt races at the seedling stage, and the identification of genomic regions suitable for marker-assisted selection and further genetic dissection.  Table S1, and S2.

| Phenotypic evaluation
Phenotypic evaluation of AM panel against Pgt was carried out at the Cereal Research Department, Seed and Plant Improvement Institute (SPII), Karaj, Alborz, Iran, in 2019. The AM panel was evaluated under greenhouse conditions using a randomized complete block design with two replications for each of the four Pgt races. Seeds of each genotype were planted in pots with 10 cm height and diameter containing 1:1:2 ratio of common soil, peat moss, and leaf mold. After 8-10 days, when the first leaf of the seedlings well developed (12 stage of phenological development based on the Zadoks et al. (1974) scale), the spores of four stem rust races were collected from different regions of Iran (Table 1) were inoculated separately. The inoculated plants were kept in a dark chamber for 24 hr at 18 ± 2°C and near saturation moisture and then transferred to a 22 ± 2°C greenhouse with a 16-hr photoperiod. Inoculated plants were examined from the second day to record the latent period (LP). Fourteen days after inoculation, seedling infection type (IT) was recorded using the 0 to 4 scale introduced by Stakman et al. (1962) and modified by McIntosh et al. (1995) where ITs of 0, 1, and 2 were considered as low ITs, and ITs of 3, and 4 were considered as high ITs.

| Phenotypic data analysis
Before performing the analysis of variance, the hypotheses of variance analysis were examined and the results confirmed the hypotheses. To calculate genetic parameters, analysis of variance was performed for the IT and LP of the studied Pgt races. Genetic, environmental, and phenotypic variances were calculated using the Comstock & Robinson (1952) method, using Equations 1, 2, and 3, respectively: In these equations, MS g is genotype mean square, MS e is mean square error and r is the number of experimental replications.
Heritability was estimated by the Falconer (1989) method through the 4 equation: where 2 g is the genetic variance and 2 p is the phenotypic variance obtained from the variance analysis table according to Comstock & Robinson (1952) method. To estimate the coefficient of genetic and phenotypic variation, Singh and Chaudhary (1985) method was used based on 5 and 6 equations: where x is the average of trait. Genetic efficiency was also calculated according to the Allard (1960) and Singh and Chaudhary (1985) methods using the (7) equation: where k is 10% of the selection pressure (1.75), p is the phenotypic standard deviation and h 2 is the heritability.
A combined analysis of variance was performed using SAS software 9.3, and genetic parameters were calculated using Excel. The race grouping was performed based on Ward's method using SPSS software.

| Genotyping by sequencing and imputation method
The development and sequencing of a GBS library for the Iranian wheat have previously been described by Alipour et al. (2017). The genomic library was constructed according to the method of Poland et al. (2012). For this purpose, first, the extracted DNA was normalized and then enzymatic digestion was performed with two enzymes PstI and MspI. Then, barcode adapters were attached to each sample, so that each sample has a different barcode. To remove extras except for barcoded genomic DNA, purification was performed by QIAquick (Qiagen) PCR purification kits. Finally, the amplified fragments were selected on an E-gel system with a size between 300-250 bp and sent for sequencing by the Ion Proton System. Sequencing data were treated for 64 bp, and the same reads were grouped into tags. Identical sequence tags were aligned to identify the SNPs within the tags and the SNPs were summoned using the UNEAK (Universal Network Enabled Analysis Kit) GBS pipeline (Lu et al., 2013) as part of the TASSEL 4.0 bioinformatics package (Bradbury et al., 2007). SNPs with heterozygosity > 10%, minor allele frequency (MAF) <5%, and missing data > 20% were eliminated to reduce false-positive error results. After specifying the haplotype phase for all individuals, the data were subjected to imputation using BEAGLE v3.3.2 (Browning & Browning, 2009) based on available allele frequencies obtained. During imputation, four different reference genomes were assessed, among which the W7984 reference genome was shown to have the greatest imputation accuracy (Alipour et al., 2019). The different chromosomes LD decay was obtained using the ggplot2 package in RStudio (Team, 2015) based on LOESS regression.

| Genome-wide association study
A genome-wide association study for loci governing Pgt resistance was conducted using phenotypic data converted to a linear scale. To use the modified Stakman et al. (1962) ITs in the genome-wide association studies (GWAS), the 0 to 4 scale was converted to a 1 to 13 linear.
Both the general linear model (GLM) and mixed linear model (MLM) were used to examine the accurate association between marker and trait in TASSEL v.5 (Bradbury et al., 2007). The GAPIT package (Lipka et al., 2012) was also used to perform association mapping in both GLM and MLM methods in RStudio (Team, 2015). The results of TASSEL and GAPIT were investigated using a t test. According to the results, it was found that the general linear model derived from TASSEL provides more accurate information about the marker-trait association.
The function of putative genes was explored by investigating the pathways in which the encoded enzymes were involved in. After aligning SNPs sequences to the reference genome, overlapping genes with the highest identity percentage and blast score were selected for further processing. The ontology of each adjacent genes with Triticum aestivum, including molecular function and biological process, and also orthologous genes in related species were extracted from the ensemble-gramene database (http://ensem bl.grame ne.org).

| Phenotypic data analysis
The result of the combined analysis of variance for IT and the LP of wheat genotypes to four Pgt races is presented in Table 2. According to the results, the effects of race, genotype, and genotype × race interaction were highly significant for all race's IT and LP.
Genetic parameters including variance components, genetic coefficients, and phenotypic variations, heritability, and genetic efficiency for IT and LP of wheat genotypes against four Pgt races were calculated ( The frequency of genotypes was calculated based on the 0 to 4 IT scales (McIntosh et al., 1995;Stakman et al., 1962) for each race, and the results were presented in Table 4. Genotypes with seedling response between 0 and 2 (0-0;-;-;1-1-1+-2 --2C-2 and 2 + ) as resistance reaction and genotypes with IT between 3 and 4 (3-3 + and 4) were considered as susceptible reaction (Letta et al., 2014 The pattern of diversity among the four Pgt races based on Ward's method was shown in Figure 1. The dendrogram obtained from the analysis classified the races into three well-distinct groups.
The TTTTF and TKTTF clustered together while PTRTF and TTKTK showed independent virulence patterns.

| Population structure and kinship analysis
To prevent false-positive results, the AM panel was studied in terms of structure and kinship, and K and ∆K were extracted and their 2D graph was drawn and shown in Figure 2. The graph of ∆K at K = 3 represents the highest value in perfectly specified fracture, dividing the present population into three subpopulations. The allocation of each individual to either group was carried out based on the 70% membership threshold. Details of the subpopulation structure for each of the wheat genotypes are shown in Figure 3. Out of a total of 282 genotypes, 51 genotypes (18.09%) were in the first subpopulation, 103 genotypes (36.52%) were in the second subpopulation and 61 genotypes (21.63%) were in the third subpopulation. The first subpopulation contained 49 landraces and two varieties, the second subpopulation contained 97 landraces and six varieties and the third subpopulation contained 59 varieties and two landraces. When used the imputed SNPs, the studied germplasm has also divided into three TA B L E 3 Amounts of genetic parameters for infection types and latent period collected from the response of wheat genotypes to four Pgt races      imputed SNPs, respectively, in the 0.1% probability, which were the highest among the other races.

| Marker-trait association
The results of the Bonferroni correction at the 5% probability level of the association analysis of original SNPs are presented in 1D, 2A, 2B, 3A, 3B, 4D, 6A, 6B, and 7D, respectively; two QTLs were also associated with markers with unknown genomic loci. For the LP of TTKTK, 14 QTLs were detectable, but no QTL was recorded for the IT. The identified QTLs of the LP was located on the A (chromosomes six, three, five, and seven, respectively) and B genomes (chromosomes one and three, respectively), and a QTL was also associated with markers with unknown genomic loci. In the TKTTF, only three QTLs associated with the IT were identified, two QTLs were located on chromosome 1B at 11.376 cM and one QTL was located on chromosome 1D at 64.821 cM.
Highly significant markers associated with IT and the LP of Pgt races, their chromosomal sequence and position, the closest wheat gene or genes to them, orthologous genes (with the highest percentage of identity), identity percentage of the wheat gene or genes that match to the ortholog, molecular function and biological processes of the wheat gene or genes associated with the markers and the extent of phenotype variance explain (R 2 ) are reported in Table 10. Out of the 51 highly significant, identified SNP markers, only 9 markers had the adjacent wheat gene or genes with the same chromosomal position. The Ensembl (https://asia.ensem bl.org/index.html) site was used to gain further information about adjacent genes beyond the genetic position of the markers. The genes with the highest identity percentage and the lowest E-value with significant markers were reported. The amount of phenotypic variance explained by SNP markers ranged between 9% and 25%. The rs21674 and rs51316 markers on the 1B justified the high variance explained for the associated traits to other markers. These markers are in the closest genetic position to the TraesCS1B02G208400 and TraesCS1B02G037100 genes, respectively, and each of these genes has a genetic identity with the orthologous genes of AET1Gv20503500 (96.74%) in

| D ISCUSS I ON
The characteristics of the races used to evaluate the seedling stage and the effective and ineffective genes for each race are presented in Table 1. These races are known as the dominant black rust races in the country collected during 2015 and 2016 and with inoculated on 20 lines and differential cultivar of North American (Singh et al., 2008), the race of those has been determined. Isolate 95-2 was pathogenic to the Sr31 resistance gene, which caused resistance to Pgt for a very long time. This isolate belonged to the Ug99 race group and was named TTKTK.
The results of the combined analysis of variance showed that in both traits of IT and LP, the highest total sum of squares was explained by the genotype effect (54.12% and 43.39%, respectively); therefore, it is inferred that genotypes have had a significant effect on data variation. After that, the highest justified variance of IT (36.33%) and LP (36.91%) was due to genotype-race interaction.
The effect of the race for IT and LP explained 3.97% and 8.80% of the total variation, respectively. The small effect of the race indicates a relatively low diversity among the races under study.
The values of the genetic and phenotypic variation coefficients for ITs and LPs of all races were very close to each other. This indicates the high impact of genes on creating diversity among genotypes. Each of the phenotypic, genetic, and environmental diversity coefficients provides useful information about the observed genetic or environmental diversity. As seen in Table 3, the coefficient of phenotypic variation was higher than the genetic variation coefficient in all studied ITs and LPs. This is due to the influence of environmental factors. Since the phenotypic variance is derived from the sum of environmental and genetic variance, if the genetic variance is assumed constant, what causes one trait to differ in phenotypic and genetic variation would be environmental variance; that is, if the trait has high phenotypic variation and low genetic diversity, this indicates the effect of the environment. The higher the variation ratio of genetic to the environment, the greater the efficiency of selection, and therefore, favorable genotypes will be identified and selected more accurately from unfavorable ones.
The genetic variation coefficient reflects the variation among genotypes in terms of specificity understudy and cannot lonelily determine the extent of inheritance of this variation. This index, along with heritability, provides a good estimate of genetic progress in phenotypic selection (Burton & Devane, 1953). Simultaneous application of two very important parameters of heritability and genetic efficiency plays an important role in the development of varieties and genotypes. The high rate of genetic efficiency indicates additive gene action, and its low level represents nonadditive gene action.
Genetic efficiency will not necessarily be high when estimated a high heritability for a trait. If high heritability and high genetic efficiency coincide, it will reflect the additive effects of genes; however, if high heritability is associated with low genetic function, it would indicate epistatic effects or dominance (Johnson et al., 1955). If trait heritability is < 0.2 (20%), it indicates low heritability, if it is between 0.2 (20%) to 0.5 (50%), it has moderate heritability, and if it is > 0.5 (50%) it has high heritability (Stansfield, 1991). High heritability with favorable genetic efficiency was observed in the PTRTF IT and LP, confirming the additive effects of the gene, and indicating that a large proportion of phenotypic variation explains the genotypic variation.
High inheritance along with high genetic efficiency is a very important factor in predicting the effects of selecting the best individuals in a population.
Population structure has been used in genetic studies to explain the relationship among individuals "within populations" and "between populations" and shows a perspective on the evolutionary relationship of individuals in a population. Besides the existence of structure in a population that studied for association mapping, it is a deterrent to achieving reliable and positive results in a population that is not considered by the effects of population structure factors and relationships (Breseghello & Sorrells, 2006 TTTTF  PTRTF  TTKTK  TKTTF   IT  LP  IT  LP  IT  LP  IT  landraces, were excluded from the mixed genotypes. The number of these genotypes in the mixed group was small indicating low mixing in the studied population. Allelic incorporation in landraces of more than one single gene cohort could have caused a slight mixing due to the wheat distribution in more than one ancestral population (Oliveira et al., 2012). The presence of gene flow from the introduction of new genotypes into farms previously can be another reason for mixing. In this regard, germplasm exchange between different Mediterranean regions due to the development of ancient empires is considered as another possible reason for mixing (Moragues et al., 2007). Geographical origin of genotypes, selection, and genetic drift cause subpopulation within a large population (Buckler IV & Thornsberry, 2002;Flint-Garcia et al., 2003). The reason some varieties fall into associated groups of landrace is that some of the with increasing distance between pairwise markers according to the results; this is due to other factors such as population structure, genetic drift, migration, selection, and mutation. As well as, the results showed higher LD values in varieties than landraces. This increase in LD in varieties can be attributed to the selection, whether natural or synthetic, which causes an LD among the selected and associated genes. Also, the selection of the ascending (or descending) trait controlled by two or more nonlinkage genes, despite the high genetic (physical) distance among the genes, increases the LD (Ataei et al., 2017). The relatively high level of LD in many chromosomal regions in the population indicates the usefulness and effectiveness of association mapping for identifying and confirming QTLs in these regions (Zhang et al., 2010).
Since association mapping was introduced in plants ( Thornsberry et al., 2001), interest in the identification of new genes has gained popularity due to significant advances in DNA sequencing technology. The application of this method in plant populations to identify loci responsible for genetic variation including disease resistance is increasing (Hall et al., 2010;Kollers et al., 2013;Reich et al., 2001).
Knowledge of the genetic diversity of wheat to accelerate genetic and breeding studies is essential to produce disease-resistant genotypes because understanding the genetic nature of disease resistance can play a key role in the development of resistant genotypes.  (Yu et al., 2011(Yu et al., , 2012. B chromosomes were fined chromosomes carrying rust resistance genes in wheat, which has been reported in various studies (Aoun et al., 2016;Crossa et al., 2007;Letta et al., 2014;Yu et al., 2011). In the present study, only one marker (rs23510) was identified based on the results of the B genome located on chromosome 6 and was associated with the genomic locus that caused resistance to TTKTK. The number of markers in the B genome associated with the IT and the LP was significantly higher in the imputed dataset than in the original dataset and they were distributed on chromosomes 1B, 6B, 3B, 2B, and 5B, respectively. Letta et al. [19] identified a QTL on chromosome 1B for resistance to the JRCQC; this disease-resistant genomic region has also been mentioned. The rs51316 marker associated with the TTTTF and PTRTF IT was largely close to the QTL identified by Letta et al. (2014).
On the other hand, in this chromosomal position, the Sr14 gene is resistant to some Pgt races in wheat (Singh et al., 2006), but a significant effect on the IT of each race was observed in this study.
A specific study on TTKTK infection type belonging to the Ug99 group and reported in the study of Jin et al. (2007) did not show any significant effect at 106.99 cM located on chromosome 1B marker rs5923. This marker is related to the genomic region that caused seedling resistance to the TTTTF and was reported in the study by Letta et al. (2014). Not only the Pgt resistance gene is located in this region, but also the Lr46/Yr29/Pm39 gene block and the unknown resistance genes are in the adult plant stage Singh et al., 2013). Several Pgt resistance genes, including Sr9, Sr16, and Sr28 (McIntosh et al., 1995) and SrWeb, are located on the long arm of chromosome 2B. The SrWeb gene causes resistance to Ug99, while none of the four Sr9 alleles has this ability (Jin et al., 2007). No resistance to TTKTK was found on chromosome 2B; the SrWeb gene may not be present in the studied germplasm, or there is no marker associated with this gene.
After the B genome, the A genome had the highest marker-trait as-  (Letta et al., 2014) and are somewhat close to the rs17477 marker in genomic position, but because the resistance of these two genes is broken by the studied races and as a result, they were inactivated against rust disease with no QTL associated region with the IT. Therefore, this marker can be used in marker-assisted selection to identify Sr34 and Sr38.
The coexistence of two markers cfa2201 and wPt-5839 with two Sr34 and Sr38 Pgt resistance genes was reported in a study (Letta et al., 2014). These two genes are ineffective against the Ug99 race group, one of which (TTKTK) has been investigated in the present study (Jin et al., 2007;Singh et al., 2011). The Sr7 and SrND643 resistance genes (probably the Sr7 allele) are located on the long arm of chromosome 4A McIntosh et al., 1995). This gene caused resistance to the PTRTF race, but no marker-trait association was found in this study. Other studies have also mapped genomic regions associated with resistance to black rust disease in the A genome such as a marker wpt-734078  and wpt-6869 (Rouse et al., 2014)  The markers with the highest marker-trait association were blasted at the Ensembl database to identify overlapping genes, their molecular function, and biological processes, as well as to identify orthologous genes. The results indicated that the genes adjacent to the markers play an important role in biosynthetic pathways such as ion transport, redox, protein processing, phosphorylation, and so on. In general, environmental stresses cause significant changes in the expression levels of many of these genes in plants, so such changes result in the accumulation or reduction of important metabolites, changes in enzyme activity and protein synthesis rates, as well as the production of new proteins (Zhu, 2016) that deal with environmental stress in various ways.
ROS, phospholipid-derived signals, and cyclic nucleotide-dependent signals, as well as some plant hormones, are involved in stress signaling (Ali et al., 2018;Jamei et al., 2018). In response to the pathogen attack, ROS is produced by the attacked plants, acts as an important sign of stress, and reduces the amount of stress damage possible by activating defense mechanisms (Ali et al., 2018;Jamei et al., 2018;Lehmann et al., 2015). Membrane lipids by regulating membrane fluidity and other physicochemical properties cause secondary signaling molecules in response to stress.
After identifying the nearest wheat gene or genes to the markers, orthologous genes with these genes were examined.
Numerous orthologous genes were observed, but only genes with a high percentage of identity were selected. The identified orthologous genes were generally related to two species of Triticum dicoccoides and Aegilops tauschii, the ancestors of wheat, and they have introduced to it during the evolutionary stages of wheat by hybridization (Dvorak et al., 1998;Kerber, 1964;Kihara, 1944;Kislev, 1979;Matsuoka & Nasuda, 2004;McFadden & Sears, 1946). These orthologous genes had a high percentage of identity with the genes identified in wheat, which involved in stress resistance mechanisms.

| CON CLUS ION
Association mapping of resistance of Iranian bread wheat varieties and landraces to four Pgt races in the seedling stage in terms of LP and IT. The studied germplasm was grouped into three subpopulations. According to the results of the original and imputed dataset, the highest number of the pairwise marker in varieties and landraces was observed on the B genome. In the original dataset, the highest LD was observed in the D genome of varieties and landraces but in an imputed dataset, it was observed in the A genome of varieties and landraces. The results showed a significant difference in the LD value on different chromosomes.
However, the varieties had higher LD compared with the landraces, which could be attributed to the selection made during the production and release stages of the varieties. Genome-wide association study based on the original dataset revealed 69 and 62 marker-trait associations for races IT and their LP, respectively.
The number of marker-trait associations in the imputed dataset was 504 for IT and 454 for the LP. When Bonferroni correction was performed, in the original dataset, two QTLs for the TTTTF LP, one QTL for the IT and four QTLs for the LP of PTRTF, one QTL for the IT, and one QTL for the LP of TTKTK, and one QTL for the TKTTF IT remained. In the imputed dataset, the TTTTF IT by six QTLs and its LP by 26 QTLs, the PTRTF IT, and the LP by 1 and 31 QTLs, respectively, the TTKTK LP by 14 QTLs, and the TKTTF IT by 3 QTLs was controlled. No marker-trait association was detected in the TTKTK IT and TKTTF LP. Nine markers out of 51 SNP markers (very significant) had a chromosomal position similar to the adjacent Triticum aestivum gene or genes. By studying the molecular function and biological processes of these genes, it was found that these genes have different mechanisms that cause stress resistance. The results obtained from this study can be important to facilitate and accelerate breeding programs through marker-assisted selection.

CO N FLI C T O F I NTE R E S T
The authors declare that they do not have conflict of interest. The funding body was involved in the material creation, designing the study, data analysis, and writing the manuscript.

E TH I C A L A PPROVA L
Not applicable.