The genetic basis of drought tolerance in the high oil crop Sesamum indicum

Summary Unlike most of the important food crops, sesame can survive drought but severe and repeated drought episodes, especially occurring during the reproductive stage, significantly curtail the productivity of this high oil crop. Genome‐wide association study was conducted for traits related to drought tolerance using 400 diverse sesame accessions, including landraces and modern cultivars. Ten stable QTLs explaining more than 40% of the phenotypic variation and located on four linkage groups were significantly associated with drought tolerance related traits. Accessions from the tropical area harboured higher numbers of drought tolerance alleles at the peak loci and were found to be more tolerant than those from the northern area, indicating a long‐term genetic adaptation to drought‐prone environments. We found that sesame has already fixed important alleles conferring survival to drought which may explain its relative high drought tolerance. However, most of the alleles crucial for productivity and yield maintenance under drought conditions are far from been fixed. This study also revealed that pyramiding the favourable alleles observed at the peak loci is of high potential for enhancing drought tolerance in sesame. In addition, our results highlighted two important pleiotropic QTLs harbouring known and unreported drought tolerance genes such as SiABI4, SiTTM3, SiGOLS1, SiNIMIN1 and SiSAM. By integrating candidate gene association study, gene expression and transgenic experiments, we demonstrated that SiSAM confers drought tolerance by modulating polyamine levels and ROS homeostasis, and a missense mutation in the coding region partly contributes to the natural variation of drought tolerance in sesame.


Introduction
Sesame (Sesamum indicum L., 2n = 26) is a traditional oilseed crop with one of the highest oil contents and qualities amongst the major oil crops. Its seeds contain almost 18% protein and the oil is rich in polyunsaturated fatty acids (PUFA) and natural antioxidants such as sesamin, sesamolin and tocopherol homologues (Anilakumar et al., 2010). The numerous beneficial effects of sesame oil on human health along with the cosmetic and engineering applications have promoted the steep increase in sesame seed demand . Over the last decade, the production of sesame seeds has increased more than twice and its sale price has nearly tripled (FAOSTAT, 2017).
Sesame is a resilient crop with a strong adaptation to droughtprone environments. As compared to other major food crops, sesame better survives drought (Islam et al., 2016). However, it remains particularly sensitive to drought occurring during the germination and flowering stages (Boureima et al., 2011;Sun et al., 2010). In fact, sesame is grown in arid and semi-arid areas characterized by high temperatures, high levels of solar radiation, high evaporative demand and unpredictable drought episodes which greatly impair the productivity (Hassanzadeh et al., 2009;Witcombe et al., 2007). Several traits of the plant have been reported to be affected by drought stress including the germination rate, plant growth, flowering, number of capsules per plant, seed yield as well as oil yield and quality (Bahrami et al., 2012;Boureima et al., 2016;Hassanzadeh et al., 2009;Kassab et al., 2012;Sun et al., 2010). As a consequence, sesame seed yields are generally low (300-400 kg/ha) in most of the arid and semi-arid areas (Islam et al., 2016).
A key step towards developing drought tolerant sesame varieties is the elucidation of the genetic basis of tolerance to water stress in this species. Unfortunately, till date, very few molecular studies have been performed to decipher sesame response to drought (Dossa, 2016) and no QTL or functional marker associated with drought has been reported. Hence, it is crucial to develop genomic resources that can be employed for the improvement of drought tolerance in sesame varieties (Dossa et al., 2017). In general, traits that contribute to drought tolerance in plants are generally quantitative and involve multiple genes (Kang et al., 2015). According to Juenger (2013), variation in drought tolerance is almost always found in a large collection of accessions. Sesame harbours a huge diversity that is probably linked to its cultivation in a large range of environments coupled with long-term natural and artificial selections (Bedigian and Harlan, 1986;Wei et al., 2015). Understanding the genetic basis of phenotypic variation for drought tolerance can help to efficiently utilize these wealthy genetic resources for sesame improvement.
Genome-wide association study (GWAS) has proven to be a powerful approach in both humans and plants for detecting genes underlying variation in a natural population (Huang and Han, 2014). GWAS is based on genetic linkage disequilibrium (LD) and takes full advantage of natural variation and ancient recombination events (Nordborg and Weigel, 2008). GWAS has been carried out successfully for dissecting complex trait loci such as flowering time, leaf angle, leaf size, disease resistance, plant architecture, drought tolerance, yield, oil content and quality related traits in many crops, including maize, rice, sorghum, cotton, peach and Brassica napus (Cao et al., 2016;Guo et al., 2018;Huang et al., 2010;Li et al., , 2016Lu et al., 2017;Morris et al., 2012;Su et al., 2016;Tian et al., 2011). The release of a high quality reference genome of sesame has opened the door for dissecting the genetic architecture of complex agronomic traits (Wang et al., 2014). In this regard, 705 diverse sesame accessions collected worldwide were re-sequenced on the Illumina HiSeq 2000 platform and employed for the first time in a comprehensive GWAS for oil-related traits (Wei et al., 2015). This fully sequenced population represents tremendous and inestimable genomewide information for further implementations of GWAS on various key agronomic traits in sesame, including drought tolerance.
In this study, the natural variation amongst 400 sesame accessions in drought tolerance was exploited to perform a large-scale GWAS on five traits related to drought tolerance, which can be classified into two groups: plant survival related traits and productivity and yield maintenance related traits under drought stress. To better reflect drought tolerance, we mainly focused on drought tolerance index data (the ratio of the trait value under stress to the trait value before stress; Guo et al., 2018). This led to the identification of key genomic regions and loci underlying drought tolerance in sesame. In addition, by integrating functional genome annotation, gene expression, bioinformatics, yeast and Arabidopsis genetic transformation analyses, several candidate genes for drought tolerance were discovered in sesame, mostly as unreported drought tolerance genes. Overall, our results shed light on the genetic basis of drought tolerance in sesame and lay the foundation for markerassisted breeding efforts to develop highly drought tolerant varieties.

SNP distribution on the linkage groups and linkage disequilibrium
The genotypic dataset resulting from the data cleaning pipeline and imputation was composed of 1 000 939 SNPs covering all the 16 linkage groups (LGs) of the sesame genome. The LG14 encompassed the lowest number of SNPs (23 371) while the LG3 had the highest number (103 814). The average number of SNPs per LG was 62 559 and the SNP density ranged from 3 to 7 SNPs/ kb with an average 5 SNPs/kb (Table S1). Based on the genotypic dataset, the r 2 estimate of linkage disequilibrium (LD) reached half of its initial value (0.55) at a distance of 88 kb and went below 0.2 at distances above 200 kb (Figure S1A). Given this moderate level of LD decay rate in sesame, the average density of 5 SNPs/kb is fully sufficient to reach a good resolution for whole genome association mapping.

Population structure
We explored the population structure of the 400 sesame accessions based on a neighbour-joining tree and identified two main groups of accessions ( Figure S1B and Table S2). There was no obvious evidence of consistency between origins of the accessions and the grouping patterns from the phylogenetic tree. Accessions from the group coloured in red were mainly from the northern areas whereas the second group coloured in green gathered 292 sesame accessions from several countries in the tropical area ( Figure S1C). These results were further confirmed by the output of the STRUCTURE software ( Figure S1D). Furthermore, the SNP dataset was employed for Principal Component Analysis (PCA). The first two axes (F1 and F2) of the PCA explained 23% of the total genetic variation and similarly as the neighbour-joining tree, two recognizable groups of accessions were obtained ( Figure S1E). We computed F st index between these two groups to examine their level of genome-wide genetic differentiation. A very weak F st index of 0.036 was observed between the two groups. The weak population structure, coupled with a high marker density and a moderate LD decay rate may offer ample power and resolution for sesame GWAS in this study.

Phenotypic analysis of drought tolerance related traits in the association panel
In this study, we investigated five traits, including stem length (SL), survival rate (SR), wilting level (WL), capsule number (CN) and seed yield (Yie) in control and stress conditions. Although the plant water status parameters are important in deciphering drought tolerance, we did not include them in this work. In fact, plant water status parameters are highly dynamic and vary within minutes (Tardieu et al., 2017). Given the size of the population (over 7200 plants each year), investigating accurately these parameters would require a sophisticated automated platform, unfortunately, we do not have.
Extensive phenotypic variations for SL, Yie, SR and CN were observed in the sesame association panel (Figure 1a-d, Table 1). SL, Yie and CN traits exhibited normal distributions. SL was the most stable trait in both control and stress conditions ranging from 14 to 120 cm, with a coefficient of variation (CV) ranging from 14.53% to 18.54%. In contrast, Yie ranged from 0.01 to 11.21 g and displayed the highest CV ranging from 66.62% to 76.92%. In general, control plants performed better than the stressed plants and CN was the most affected trait by drought stress in the 2 years. Also, the drought stress effects on the assayed traits were more striking in 2016 as compared to 2015. This is consistent with the analysis of variance (ANOVA), which indicated that the accession (A), year (Y) and interactions between the accession and year (A 9 Y) were significant for all traits (P < 0.001). In addition, relatively high (61.67%-72.61%) broad-sense heritability (H 2 ) was estimated for SL and CN in both control and stress conditions. Based on the drought tolerance indexes, we observed that accessions from the tropical area displayed significantly higher drought tolerance than those from the northern area (P < 0.01) (Figure 1e).

Identification of significant signals associated with drought tolerance indexes
To investigate the genetic variants governing drought tolerance in sesame, GWAS were performed independently using drought tolerance index data (ratio of the trait value under drought stress  (e) Comparison of drought tolerance indexes between the subpopulation from the tropical area (coloured in green, 292 accessions) and the subpopulation from the northern area (coloured in red, 108 accessions). **, *** represent a significant difference between the two groups at P < 0.01, 0.001, respectively. condition to the trait value under normal condition) from the 2 years (Long et al., 2013). A total of 569 significant loci (P < 7.8 9 10 À6 ) was identified for the five drought tolerance indexes. These loci were distributed mostly in clusters across ten LGs. Regions of 88 kb (corresponding to the LD window) upstream and downstream of the peaks and harbouring at least two clustered significant loci were defined as quantitative trait loci (QTL). The CN, wilting level (WL) and SL related indexes displayed the lowest numbers of significant signals for both years. Conversely, Yie and SR related indexes involved hundreds of clustered significant loci, indicating a complex genetic architecture of these traits. The full list of significant signals detected in the 2 years is detailed in Table S3.
In this study, we further focused on the most reliable and stable QTLs that were constitutively identified in the 2 years or associated with various traits. In total, 140 loci were identified and resolved to ten QTLs each led by a peak (Figure 2, Figure S2-S4). The 10 peaks (Àlog 10 (P) values ranging between 5.2 and 8.86) were located on LG4, LG6, LG7 and LG8. Phenotypic variance explained (PVE) values of the GWAS peaks ranged between 2.21% and 10.70%. The locus SNP16409277 associated with the CN index has the lowest contribution whereas the loci SNP12606000 and SNP16406525 strongly contribute to the Yie and SR indexes, respectively (Table 2). Collectively, the peak loci can explain over 40% of the phenotypic variation in sesame. Within the ten stable QTLs, eight were co-detected for various traits, suggesting that they are pleiotropic QTLs. One QTL (QtlCN8.1~QtlSR8.1~QtlSL8.1) located on LG8 was co-associated with CN, SR and SL indexes. Another QTL (QtlY4.1~QtlSR4.2~QtlSL4.1) located on LG4 was commonly identified for Yie, SR and SL indexes. Furthermore, we detected a shared QTL (QtlSR7.1~QtlWL7.1) between WL and SR, which may be essential for plant survival and growth under drought conditions ( Figure 2c).

Mining of favourable alleles at the peaks and their pyramiding effect on drought tolerance in sesame
We defined at each peak, the alleles that led to the increase in SR, relative Yie, CN and SL indexes and decrease in WL as the favourable ones. Concerning the SR, amongst the five associated peak loci, the desirable alleles were found to be mainly the common alleles contributing to the enhancement of sesame plant survival under drought stress (Figure 3a-e). Similarly, the common allele G at the locus SNP7867981 lowered the wilting signs of the plants under drought stress (Figure 3f). For the locus SNP12565472 tightly linked to the SL index, the variant allele A significantly increased drought tolerance compared with the common allele C (P < 0.001). The variant allele A at the second locus SNP16406662 associated with SL index, also significantly increased drought tolerance compared to the common allele G (P < 0.001) (Figure 3g,h). A similar result was observed for the relative seed yield (Yie) index which was strongly improved in the accessions harbouring the variant allele A at the locus SNP12606000 (P < 0.001) under drought stress condition (Figure 3i). Finally, the variant allele C at the locus SNP16409277 detected for the CN index, significantly enhanced capsule formation under drought stress condition as compared to the common allele (P < 0.001) (Figure 3j). Taken together, our results showed that alleles which are crucial for the plant survival under drought stress conditions have already been fixed in the sesame crop and have been positively selected by recent breeding (Table S4). However, alleles that are essential for productivity and yield maintenance under drought conditions have not yet been intensively selected and are far from being fixed in most of the accessions. In the association panel, the majority of the accessions (50.75%) displayed a combination of three desirable alleles ( Figure 3k). Furthermore, the accessions from the tropical area (coloured in green) had more drought-tolerance alleles at the peaks than those from the northern area (coloured in red), which may underpin their higher drought tolerance ( Figure S5A).
To determine whether the favourable alleles at the peaks have a positive pyramiding effect, we selected the four strongest associated loci (SNP12606000, SNP9732360, SNP16406525 and SNP7867981) and analyzed the SR and relative Yie data. The analysis revealed that the accumulation of favourable alleles contributes to higher SR and Yie maintenance under drought stress condition (Figure 3l; Figure S5B). Therefore, future breeding efforts may focus on pyramiding as much as possible the desirable alleles at the peak loci into elite sesame cultivars.

Detecting the candidate genes for drought tolerance in sesame
We used the genome information of sesame to uncover the candidate genes associated with each peak signal. All genes in the LD region surrounding the peaks (both single-year and multiyear peaks) were determined (Table S5). Based on the ten stable QTLs, we identified a total of 102 genes ranging from 12 to 30 genes around each peak (Table 2). In many cases, the peaks were not located in genic or regulatory regions, but co-segregating significant SNPs were found directly in genic regions. These genecontaining significant SNPs have high potentials to modulate drought tolerance in sesame (Table S6). The gene SIN_1012139 (SiABI4, ABA insensitive 4) from the QtlY4.1 contains a missense mutation at the locus SNP12617427 (T/G, MAF = 5%) located in the coding region, which changes the amino acid S to A (at the 305th position). Accessions with the variant allele G have higher relative seed yield index than those with the common allele T ( Figure 4a). SiABI4 was not expressed in leaf, stem and even root tissues of both T allele-accessions and G allele-accessions. However, it was highly induced under drought stress in the seed of G allele-accessions and slightly in T allele-accessions. Our results denote that SiABI4 is a seed-preferential gene and the mutated allele G enhances sesame seed yield under drought conditions. Another gene containing a missense mutation detected in this QTL is SIN_1012134 (SiTTM3, Triphosphate tunnel metalloenzyme 3). The polymorphism at the locus SNP12562707 (T/C, MAF = 5%) located in the coding sequence changes C to R at the 80th amino acid. Accessions with the mutation have more than the double of the relative seed yield index as compared to those with the T allele, which is the perfect opposite at the gene expression levels in leaf tissues ( Figure 4b).
These results indicate that SiTTM3 is probably a negative modulator of drought tolerance in sesame, a function which has not yet been reported for its homolog in Arabidopsis (AtTTM3). The same gene (SiTTM3) was found associated with SL, but another missense change in this gene at the locus SNP12563169 (G/T, MAF = 5%) located in the 3 0 -UTR region likely affects SL index ( Figure 4b). Accessions with the variant allele displayed a strong tolerance to drought as shown by their high SL index value. In silico analysis in miRBase (www.mirbase. org) of potential microRNA binding sites in the 3 0 -UTR region of SiTTM3 revealed that the T variant leads to the gain of a putative binding site for miR-3276 which cannot bind SiTTM3 harbouring the G allele (E-value cut off = 4). Although there is no perfect seed matching, the T variant strengthens a potential compensatory base pairing site in the 3 0 -region of the miR-3276 ( Figure 4b). We inferred that the miR-3276 binds to the 3 0 -UTR region of the T-allele SiTTM3 and inhibits its expression posttranscriptionally. Interestingly, the two SNPs located in SiTTM3 (SNP12562707 and SNP12563169) were in a strong LD (r 2 = 0.9), therefore, both loci may cooperate to enhance yield and productivity maintenance under drought conditions in sesame.
In the QtlSR8.1, the gene SIN_1022782 (SiNIMIN1, NIM1-Interacting 1), contained 44 significant SNPs and most of them were synonymous mutations. However, the peak locus SNP16403158 (P = 2 9 10 À7 , MAF = 4%) is a missense change (G/A) that led to D/N variation at the 96th amino acid of the SiNIMIN1 protein. Phenotypic analysis of the alleles at the locus SNP16403158 showed that the variant allele strongly decreased the survival rate under drought stress of sesame plants. Moreover, SiNIMIN1 was found weakly expressed in leaves of accessions carrying the variant allele than the G-allele accessions ( Figure 4c). The homolog of SiNIMIN1 in Arabidopsis (AtNIMIN1) has been described to be involved in plant defence response, therefore, its implication in drought tolerance in sesame suggests a functional diversification. In the same QTL region, another important gene SiSAM (SIN_1022789, S-adenosylmethionine synthetase) containing a missense significantly associated SNP (SNP16465126, MAF = 5%) was uncovered. The mutation affects the F amino acid at the 137th position which changes to Y. SiSAM was highly down-regulated in sesame leaves under drought stress but we did not observe an obvious difference between accessions carrying the common allele T and the variant A at the locus SNP16465126 (P > 0.05). Similarly, both alleles did not differentially affect the survival rate of sesame plants under drought stress (Figure 4d). However, SiSAM gene expression was significantly different when we compared accessions harbouring the T and C allele at the peak locus SNP16406525 of this QTL (QtlSR8.1). These observations suggest the existence of another causative variant which alters SiSAM expression and has been probably excluded from the genotyping data due to missing data. SiSAM homologous gene (AtSAM1) in Arabidopsis is well described as a stress responsive gene, therefore, over-expression of SiSAM harbouring the favourable allele may represent an efficient way for drought tolerance improvement in sesame. LG8 for the 2 years are highlighted in blue in the regional plots with the names of the lead locus placed on the top. (b) Manhattan plot and QQ plot of SR. The significant trait-associated QTLs constitutively identified on LG6 and LG8 for the 2 years is highlighted in blue in the regional plots with the names of the lead locus placed on the top. (c) The network based on shared QTLs between different drought tolerance indexes in sesame. The width of the arrow depicts the amplitude of the QTL contribution to the trait. Other candidate genes-containing significant SNPs such as SIN_1015691 (SiP450), SIN_1015693 and SIN_1004723 were detected in the remaining stable QTLs and may be valuable genetic resources for improving drought tolerance in sesame (Table 2). With regard to the QTLs for which we did not find any candidate gene-containing SNPs, we referred to the functional annotation of all genes located within the QTL and also used drought stress transcriptome datasets developed in our group (available at the NCBI sequence read archive under accession numbers: SAMN09517829 and SAMN09517828). This approach helped identify some potential candidate genes including SIN_1022774 (SiGOLS1), SIN_1005662 (SiPOD3) and SIN_1004716 (Table 2, Table S7). Gene ontology analysis of all candidate genes detected in this study indicated that they are involved in the biological process related to stress responses and play molecular functions related to catalytic, binding and enzyme regulator activity (Figure 4e).
A missense change in the coding region of SiSAM modulates drought tolerance in sesame S-adenosylmethionine synthetase genes have been reported to be involved in abiotic stress responses in various plant species (Kim et al., 2015;Lin et al., 2008;Radadiya et al., 2016;Wang et al., 2016a;;Wang et al., 2016b). In this study, SiSAM was identified in a major hub locus of the drought tolerance related association network (Figure 2c) but the causative genetic variant in SiSAM altering drought tolerance has yet to be discovered. In this regard, we conducted an in-depth investigation of the natural variation in this gene and also explored its implication in drought tolerance using yeast and Arabidopsis thaliana as models, since sesame genetic transformation is not yet very effective. SiSAM is an intronless gene with a transcript length of 1182 bp and 393 amino acid protein length. SiSAM promoter contains several abiotic stress and hormonal cis-acting regulatory elements, particularly the ABRE motif, suggesting that SiSAM activity is likely to be ABA-dependent ( Figure S6).
In order to fully identify the DNA polymorphism present in the gene SiSAM, it was re-sequenced in 100 randomly selected sesame accessions (Table S8). A 2.4 kb genomic fragment was sequenced spanning the SiSAM coding region, both the 5-, and 3-untranslated regions (UTR) and the promoter. In total, 28 SNPs and 5 insertions-deletions were discovered (MAF ≥ 0.05). These polymorphisms included the locus SNP16465126 already detected in the QtlSR8.1 from the GWAS analysis. We analyzed the association of each polymorphism site with SR using the Mixed linear model and the pairwise LD of the polymorphisms was calculated (Figure 5a). From the 33 polymorphism sites, only two loci, SNP16465736 and SNP16465126, were significantly associated with SR variation (ÀlogP = 5.41 and 3.67, respectively). The novel identified locus SNP16465736 (C/A) is located in the coding region of SiSAM and is a missense change leading to N/K variation at the 340th amino acid of the protein. Moreover, the locus SNP16465736 together with the previously identified genic loci SNP16465126 and the peak loci SNP16406525 at the QtlSR8.1, were in LD (r 2 = 0.8, 0.7, respectively) amongst the sequenced accessions. The common allele C at the locus SNP16465736 contributes to a higher SR than the variant allele. Similarly, the gene SiSAM was strongly up-regulated as an early response to drought stress, but higher in the C-allele accessions than in the A-allele accessions. Subsequently, the transcript level of SiSAM was significantly more decreased in the accessions harbouring the variant allele A as compared to those with the C allele under prolonged drought stress (Figure 5a). These results indicate that the polymorphism at the locus SNP16465736 is likely to be the causative variant altering the expression and function of SiSAM under drought stress in sesame. The predicted drought tolerant allele (encoding the amino acid N) is the ancestral type according to the phylogenetic analysis with its homologs in other plants ( Figure 5b) and information from wild Sesamum species. Interestingly, we found that all accessions with the A allele are from the northern area group, which accessions were found to be more sensitive to drought stress (Figure 1e). Sesame is thought to have originated in Africa or South Asia (Bedigian, 2003;Hiltebrandt, 1932) and the northern area group may be derived from the long-term selection for adaptation to the photoperiod change (Wei et al., 2015). We therefore speculate that the mutated A allele may have facilitated sesame acclimation to the northern area. We transferred SiSAM containing the C allele (named SiSAM C ) allele into yeast and Arabidopsis to further validate its implication in stress tolerance. We observed that the yeast transformants carrying SiSAM C could grow normally in osmotic stress-imposed medium containing 2.5 M Mannitol while the transformed yeasts with the empty plasmid could not restore growth at lower dilutions ( Figure 6a). Osmotic stress always occurs simultaneously with drought and its tolerance is a vital part of drought tolerance (Farooq et al., 2009). We deduced that SiSAM C improves osmotic stress tolerance in yeast.
In Arabidopsis thaliana, the studied three independent transgenic lines over-expressing the SiSAM C gene all exhibited significantly enhanced drought tolerance than the wild type (WT). Transgenic plants especially the lines L1 and L2 grew better than WT in the osmotic stress-imposed medium containing 150 mM Mannitol (Figure 6b). Furthermore, after 17 days drought stress treatment, the WT plants were severely wilted, whereas SiSAM C over-expressing transgenic lines showed less wilting signs, thus a stronger drought tolerance than the WT plants (Figure 6c,d, Figure S7). The survival rate of the WT plants was 16%, while the survival rate of the SiSAM C over-expressing lines (Figure 6e) ranged from 60% to 80% (Figure 6f). Similarly, the silique formation was significantly impaired in the WT plants as compared to the transgenic lines under drought stress treatment (P < 0.001) (Figure 6g). Transgenic lines have a higher ability for ROS detoxification than the WT plants as shown by the significant lower contents of malonaldehyde (P < 0.001), which could explain the low plant wilting signs and the high survival rate Figure 4 Detecting some candidate genes containing significant SNPs in the stable and pleiotropic drought tolerance QTLs in Sesamum indicum. (a) A missense mutation at the locus SNP12617427 in the coding region of the seed-specific gene SiABI4 alters the gene expression and the relative seed yield (Yie) between the two haplotype groups; FC means expression fold change and the trait values of the two haplotype groups were compared using t-tests. Error bars indicate the SD of biological replicates, (***P < 0.001). (b) Two missense mutations in the coding region and UTR-3 0 significantly affects the expression of SiTTM3 and drought tolerance. The polymorphism at the locus SNP12562707 alter the relative seed yield (Yie) between the two haplotype groups while the variant at the locus SNP12563169 influences the binding of miR-3276 to SiTTM3 and significantly affects the relative stem length (SL). (c) Several synonymous SNPs located in the coding region and a strongly associated missense change at the locus SNP16403158 of the gene SiNIMIN1 significantly affects the gene expression level and the survival rate (SR) between the two haplotype groups. (d) A missense mutation in the coding region of SiSAM (SNP16465126) did not significantly alter the gene expression and the survival rate (SR) between the 2 haplotype groups. However, SiSAM expression was significantly different between the 2 haplotype groups at the peak locus SNP16406525 of the QTLSR8.1, suggesting the existence of another causal variant in SiSAM. (e) Gene ontology analysis of the potential candidate genes around the peaks selected based on their expression fold change under drought treatment.  Figure 6g). An analysis of the polyamine spectrum and SAM content indicated that drought stress induced accumulation of putrescine (Put), spermidine (Spd), spermine (Spm) and SAM in WT and transgenic lines (Figure 6h-k). However, we observed significantly higher contents (~3-fold) of SAM, Spd and Spm in the transgenic lines than in the WT plants under both control and drought stress conditions (P < 0.001). Conversely, Put was significantly higher in the WT plants than in the transgenic lines under control and stress conditions (P < 0.001). Intriguingly, the gene AtADC1 involved in Put biosynthesis showed higher transcript level in the transgenic plants than in the WT plants ( Figure S8). Also, we noticed that the activities of the transgene and key polyamine metabolism and biosynthetic genes (AtSAMDC1, AtSAMDC2, AtSPDS1, AtSPDS2 and AtSPMS) were strikingly enhanced in the transgenic lines as compared to the WT plants under drought stress ( Figure S8). We deduce from these results that over-expression of SiSAM C in Arabidopsis improves SAM accumulation, induces enhanced production and conversion of Put to Spd and Spm, leading to an effective ROS homeostasis and drought tolerance. SiSAM C not only improved the survival of the transgenic Arabidopsis plants but also allowed good productivity maintenance under drought conditions, exactly as the QtlSR8.1 in sesame (Figure 2c). Altogether, these data support the premise that natural variation in SiSAM alters drought tolerance in sesame accessions. Importantly, the favourable allele of SiSAM represents a good marker for the development of drought-tolerant sesame cultivars using traditional breeding approaches.

Discussion
Phenotype-genotype association analysis has become a critical tool for identifying alleles and loci responsible for the agronomically important traits (Wan et al., 2017). In the current study, the selection of sesame accessions from diverse origins, with sufficient genetic variation and weak population structure is advantageous for GWAS implementation (Cao et al., 2016;Wei et al., 2015). The LD decay rate in sesame, a self-pollinated crop ( $ 88 kb) is relatively lower than those of rice ( $ 100 kb) (Huang and Han, 2014), sorghum ( $ 150 kb) (Morris et al., 2012) and soybean ( $ 150 kb) (Lam et al., 2010) but higher than the LD decay rate of outcrossing species such as maize ( $ 2 kb) . The modest rate of LD decay in sesame suggests that the resolution of GWAS may not easily resolve to the causative gene. However, by employing a high marker density (5 SNPs/kb) in the current study, we successfully identified QTL regions harbouring a very limited number of genes and several gene-containing SNPs. Similar to our results, the previous comprehensive GWAS based on an ample marker density for oil quality and agronomic traits in 705 sesame lines was very efficient with the identification of several causative genes (Wei et al., 2015).
We detected a total of 569 significant SNPs with modest P values (P < 7.8 9 10 À6 ) distributed across 10 LGs of the sesame genome. Ten major QTLs constitutively identified in the 2 years could be targeted in breeding programmes with high confidence. Collectively, the peak loci at these major QTLs could explain only 40% of the total phenotypic variation. This is understandable since drought tolerance is as a very complex trait involving dynamic and diverse responses that are controlled by a large number of small-effect loci (Guo et al., 2018). It is worth mentioning that we mainly focused on the stable (multi-year) QTLs in this study, hence, adding other single-year QTLs will increase the phenotypic contribution. However, we believe that extending our association panel to a more diverse sample particularly, integrating more accessions from Africa could increase the power to detect novel drought tolerance traitassociated variants using GWAS (Huang et al., 2012). Most of the sesame accessions harboured several favourable alleles at the associated loci. In addition, the group of accessions from the tropical area (more prone to drought) had more favourable alleles and was globally more tolerant to drought compared to the northern area accessions. All these findings indicate that being originated from drought-prone environments of Africa or India (Bedigian, 2003;Hiltebrandt, 1932) and as a result of a long-term adaptation, sesame has fixed several drought-tolerance alleles. Hence, the crop has been genetically shaped to survive drought. Similar conclusions were drawn by Liu et al. (2013) who observed that maize varieties from the subpopulation consisting of tropical inbred lines (environment more prone to drought) had the highest frequency of the favourable allele of the drought tolerance gene ZmDREB2.7, which was consistent with the observed higher level of drought tolerance of this subpopulation than the others.
Most of the important food crops are sensitive to drought stress (Farooq et al., 2009) and very few are able to survive drought like sesame (Langham, 2007). In this study, we discovered that the alleles beneficial to the plant survival under drought stress have been fixed and intensively selected in sesame, which can explain the relatively drought tolerance of this crop. However, alleles that are essential for high productivity and yield maintenance under drought conditions are far from being fixed and could be the focus in future sesame breeding programmes. To date, many studies have demonstrated that marker-based gene pyramiding is very effective in crop improvement strategies (Barloy et al., 2007;Sacco et al., 2013;Werner et al., 2005;Zhang et al., 2014). The positive pyramiding effects of the favourable alleles uncovered in our report may be of high potential for the development of drought tolerant sesame cultivars with enhanced capacity for productivity and yield maintenance under stress conditions using allele-specific molecular markers (Bang et al., 2007). In fact, the marker-assisted breeding approach will be more practical for improving sesame drought tolerance than the transgenic approach since sesame genetic transformation is still at the early stages (Chowdhury et al., 2017).
Various novel genes have been reported in this study as involved in sesame drought tolerance. For example, the homolog of the gene SiNIMIN1 in Arabidopsis (AtNIMIN1) was described as a pathogen defence responsive gene (Zwicker et al., 2007) and a function in abiotic stress have not yet been reported for this gene. Similarly, for the gene SiTTM3 which likely negatively modulates drought tolerance in sesame (Figure 4), its homolog in Arabidopsis (AtTTM3) has been shown to play a role in root development (Moeder et al., 2013) but no evidence of its implication in drought stress was reported. The gene SIN_1004716 is the homolog of AT5G16010 in Arabidopsis, which contributes to wax biosynthesis. Plant epidermal wax forms a hydrophobic layer covering aerial plant organs which constitutes a barrier against uncontrolled water loss and biotic stresses (Costaglioli et al., 2005). By detecting this gene associated with WL and SR, we inferred that it may be involved in wax accumulation in sesame tissues delaying wilting signs and death of sesame plants under drought stress. Although our genetic analysis identified these genes as drought tolerance candidate genes in sesame, their functional mechanisms have to be elucidated.
Our study also uncovered some known drought tolerance genes previously reported in other plant species. The gene ABI4 is an example of drought tolerance gene reported in Arabidopsis (AtABI4) which is involved in abscisic acid signal transduction. Interestingly, AtABI4 is more active in developing siliques and seed, exactly as the preferential expression of SiABI4 in sesame seeds, indicating a functional conservation of this gene in the two species (Finkelstein et al., 1998). Another important gene uncovered in our study is the gene SiSAM, the homolog of AT1G02500 (AtSAM) in Arabidopsis. S-Adenosylmethionine (SAM) is required for the biosynthesis of phenylpropanoid compounds and is also a precursor for the biosynthesis of the phytohormone ethylene and polyamines (Kende, 1993;Tiburcio et al., 1990). Polyamines are protective molecules and play vital roles in the regulation of plant tolerance to abiotic stress through direct interactions with other metabolic routes and hormonal cross-talk, and the activation of the expression of stress-responsive genes (Alc azar et al., 2010;P al et al., 2015). SAM was reported to modulate polyamine levels and improve tolerance to various abiotic stresses including drought in Arabidopsis (Kim et al., 2015), broomcorn millet (Lin et al., 2008), tomato (Gong et al., 2014), soybean (Wang et al., 2016a,b) and pigeon pea (Radadiya et al., 2016). In this study, we found that a functional variation at the locus SNP16465736 in the coding region of SiSAM contributes to the natural variation of drought tolerance in sesame. Further functional investigations showed that SiSAM C modulates the polyamine levels, leading to enhanced ROS homeostasis and drought tolerance. However, how the A allele (SiSAM A ) reduces drought tolerance and its likely implication in sesame acclimation to the northern area remains unclear and will need further investigations.
In summary, we revealed for the first time, the allelic variations modulating drought response in sesame and provided genetic evidences explaining how sesame is intrinsically a relatively drought tolerant crop. We further discovered the highly anticipated genes associated with these genetic variants which will fuel future research on molecular improvement of sesame varieties and related crops for higher performance in drought-prone environments.

Plant materials and phenotyping procedure
A collection of 400 sesame (Sesamum indicum L.) accessions, including modern cultivars and landraces was used for the GWAS in this study. These accessions were extracted from a large collection of more than 8000 sesame accessions preserved at the China National Gene Bank, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences. These samples were selected because they have a similar flowering period (Wei et al., 2015) and originated from 29 different countries all over the world (Table S2). Unfortunately, accessions from Africa were less represented in our panel mainly because most of these materials could not grow well in the phenotyping environment.
The 400 accessions were self-pollinated for four generations in Sanya, Hainan province, China (109.187°E, 18.38°N, altitude 11 m). For this experiment, the germplasm were grown in pot (25 cm diameter and 30 cm depth) containing 7 kg of loam soil with known physicochemical properties mixed with 10% of added compound fertilizer. The pots were arranged in a greenhouse at OCRI-CAAS's research station in Wuhan, Hubei Province, China (114.31°E, 30.52°N, altitude 27 m). The experiment was carried out under a completely randomized split-plot from May to September in 2015 and 2016. The 400 accessions and water regimes (control and drought stress) were arranged as sub-plot and main-plot, respectively. With two plants per pot and three replicated pots for the same genotype in each treatment (water regime), we obtained a total of 7200 plants for phenotyping. Seedlings were well-watered to keep the maximum soil moisture condition (100% field capacity [FC]) and ensure plants' normal growth. Drought stress was applied when 90% plants entered early flowering stage and spanned the reproductive phase at which sesame is the most sensitive to drought stress (Sun et al., 2010). Water was withheld in the drought treatment for 7 days (~30% FC). The 8th day from the beginning of the drought stress application, the stressed plants were watered to reach 100% FC to help recovering. The drought stress application and rewatering scheme were further carried on two more times, and afterwards, the irrigation was kept normally until the maturation stage. Meanwhile, the control plants were normally irrigated during the whole experiment. The variations of the mean temperature and relative humidity during the repeated stress periods are presented in Figure S9.

Phenotypic evaluation of the association panel
Five drought related traits were investigated including the wilting level of the whole plant (WL) recorded on the stressed plants at the end of the last drought stress treatment before rewatering; the survival rate (SR) evaluated on the stressed plants a week after the last drought stress treatment after rewatering; the stem length (SL), capsule number per plant (CN) and seed yield (Yie) at the maturation stage on all plants (control and stress). A scale of 0-7 was used for visual scoring of WL with 0 meaning no wilting sign and 7 for completely wilted or dead plants. SL (cm) was measured as the length from the base of the stem to the tip of the main inflorescence. CN is the total number of capsules per plant and Yie (g) is the weight of all seeds produced per accession. Yie was recorded only in 2016. During the 2 years, the phenotyping procedure and scoring standard were the same.

Statistical analysis
All statistical analyses were performed with the R package (R Core Team, 2015). Least square means were calculated with general linear model procedure for the replications of each accession in the respective years and for both control and drought stressed plants. Drought tolerance index was estimated for SL, CN and Yie according to Long et al. (2013) using the formula: ratio 00 trait 00 ð%Þ ¼ ðmean value of trait under drought stressÞ= ðmean value of trait under control conditionÞ Â 100 In addition, broad-sense heritability (H 2 ) across the treatments was calculated as follow: Where: r 2 a , r 2 ay and r 2 e are estimates of the variances of accession, accession 9 year interaction and error, respectively. Y represents year, and R is the number of replications (Kowles, 2001).
ANOVA was performed using the packages 'Ade4' (Dray et al., 2007) and 'Agricolae' (de Mendiburu, 2014). The normal distribution of data was determined using the Shapiro-Wilk Wtest, whereas the homogeneity of variances was determined with the Bartlett test. Differences were tested for significance at the 5% probability level and mean comparisons were done using the Tukey HSD test.

Population genetics and GWAS
A total of 1 000 939 common SNPs covering the whole genome with minor allele frequency (MAF) >0.03 were obtained from the sesame HapMap project (www.ncgr.ac.cn/SesameHapMap/) and used for trait-association analysis in this study (Wei et al., 2015). The quality control and linkage disequilibrium (LD) estimation were performed as described previously (Wei et al., 2015). The software MEGA7 (Kumar et al., 2016) was employed to construct a phylogenetic tree based on simple matching coefficients. Principal component analysis of the population was performed in the R package (R Development Core Team, 2015). Additionally, the program STRUCTURE 2.3.4 (Falush et al., 2003) was used for a bayesian clustering analysis with the admixture model and correlated allele frequencies. Five runs were performed for each subpopulation K (1-10) . The burn-in time and iterations for each run were set to 30 000 and 70 000 and the true K was determined according to the method described by Evanno et al. (2005). All accessions were mapped according to their geographical coordinates using the Tableau software version 9.3 (Seattle, WA). The relative kinship analysis was implemented using the software package SPAGeDi (Hardy and Vekemans, 2002). Association analysis was performed using the EMMAX package (Kang et al., 2010) based on the Mixed model and the matrix of pairwise genetic distance derived from simple matching coefficients was used as the variance-covariance matrix of the random effect.
Suggestive (1/n) P-value threshold was set to control the genome-wide type 1 error rate derived from the Mixed model (Duggal et al., 2008); n represents the effective number of independent SNPs calculated using the Genetic Type I Error Calculator software . The estimated effective number of independent tests was 128 405 SNPs and the threshold used to identify significantly associated loci was set at P = 7.8 9 10 À6 . The Manhattan, regional and QQ plots were generated using the EMMAX and the R packages, qqman (Turner, 2014), snp.plotter (Luna and Nicodemus, 2007). The association analysis was performed independently on single-year phenotypic datasets and the repeated significant signals (over years) were considered as the most reliable. For clustered significant SNPs, the SNP with the highest Àlog 10 (P) in the LD region were considered as the peak. Significant associations were also selected on the threshold of P ≤ 0.01, corrected for multiple comparisons according to the false discovery rate procedure reported by Benjamini and Hochberg (1995). Regions of 88 kb (corresponding to the LD window) upstream and downstream of the peaks and harbouring at least two clustered significant loci, were defined as quantitative trait loci (QTL). The value r 2 derived from linear regressions were calculated to examine the phenotypic variance explained (PVE) of each peak in the R package (R Development Core Team, 2015). Before fitting the model, each marker was coded with the value 0 used for the reference allele and the value 1 for the alternative allele. Next, all genes were extracted from the genomic regions with two adjacent windows of LD centred by a peak loci to seek for the candidate genes. The corresponding putative homologs of all candidate genes in Arabidopsis thaliana were obtained from the database SesameFG (http://ncgr.ac.cn/SesameFG/; Wei et al., 2017). Gene ontology annotation of these genes was carried out using Blast2GO tool v.3.1.3 (G€ otz et al., 2008) and plotted with the WEGO tool (Ye et al., 2006).
Genotypic and phenotypic data files were prepared and imported to TASSEL5.0 (Bradbury et al., 2007) for association analysis. The Mixed linear model (MLM), taking account of both the kinship coefficients and the population structure (PCA+K) containing the loci tested as a fixed effect, was applied to evaluate the association between SR and the polymorphisms. MLM is effective to correct for the effect of cryptic relatedness and for controlling false positives in the association analysis (Zhang et al., 2010). To get reliable results, P ≤ 0.001 was considered to have a significant effect on the trait. candidate genes identified were designed by using Primer5.0 (Lalitha, 2000) (Table S9). The qRT-PCR analyses of the genes were performed using the ChamQ SYBR qPCR Master Mix (Vazyme Biotec, Nanjing, China) on a Light Cycler 480 II (Roche, Basel, Switzerland). The relative expression levels of the genes normalized to the expression level of actin7 gene (SIN_1006268) and Histone H3.3 gene (SIN_1004293) were calculated from cycle threshold values using the 2 ÀDDCt method (Livak and Schmittgen, 2001). This analysis was carried out in three independent biological replicates and three technical replicates of each biological replicate.
Yeast genetic transformation assay RNA was extracted from 7 days drought-stressed leaf samples of the accession G049 (harbouring the favourable allele C at the locus SNP16465736 in SiSAM) and reversed transcribed as described in the qRT-PCR experiment. The software Primer5.0 was used to design specific primer pair of the candidate gene SiSAM C joined with the primers of the yeast expression vector pYES2 (Invitrogen, Carlsbad, CA) (SiSAMyeastF: 5 0 -TACC-GAGCTCGGATCCATGGAGACCTTCTTGTTTAC-3 0 , SiSAMyeastR: 5 0 -GATATCTGCAGAATTCTTAGTTCTGGGGCTTCTCCC-3 0 ). It was amplified using 2xHigh fidelity Master Mix (MCLAB, Beijing, China) and the yield and quality of the amplicon were monitored on 1% agarose gel. The PCR fragment was excised and purified following Midi Purification Kit protocol (Tiangen Biotech). The purified fragment was ligated into the vector pYES2 according to Exnase II (Vazyme Biotec) kit instructions. DNA from one cloning reaction was used for one transformation of 100 lL Escherichia coli Trans1-T1 Phage Resistant Chemically Competent Cell (Beijing TransGen Biotech Co., Ltd., Beijing, China) and selected on LB agar plates containing 50 lg/mL ampicillin. After 24 h at 37°C, transformed cells were amplified using NOVA Taq-Plus PCR Gold Mix (LCP Biomed, Beijing, China) and the positive cells carrying the gene of interest were scraped off from each plate, introduced into 900 mL of LB containing 50 lg/mL ampicillin and shook during 45 min at 200 r.p.m. in a Innova 43 Incubator Shaker Series (New Brunswick Scientific, Edison, NJ). Ten positive clones were fully sequenced (www.tsingke.net) and checked with the reference gene sequence using the BioEdit software (Hall, 1999). About 0.25 lg of plasmid DNA containing the targeted gene was transformed into yeast (Saccharomyces cerevisiae) strain INVSc1. Recombinant yeast cells were propagated in the SC-U liquid medium for 72 h. Cell density was adjusted to OD 600 at 1.0 followed by serial dilutions 1:10. Ten microliters of each dilution (10 À1 , 10 À2 , 10 À3 and 10 À4 ) were drop plated on SC-U containing 2.5 m Mannitol, identified as the optimum concentration to impose the osmotic stress. Cultures were incubated for 48 h at 30°C before photographing.

Vector construction and Arabidopsis genetic transformation
To functionally characterize the gene SiSAM (SIN_1022789) containing the favourable allele C (SiSAM C ) in Arabidopsis, we cloned the protein coding region of SiSAM C by PCR from the drought tolerant genotype G049, and inserted the sequence into a pCAMBIA 1301 vector. Arabidopsis transformation was carried out using the floral dip method (Clough and Bent, 1998) with Agrobacterium tumefaciens strain LBA4404. Transgenic seeds were screened by sowing on MS medium containing 50 lg.ml -1 hygromicin. The T3 transgenic homozygous lines were used for the gene expression assay and phenotypic analysis.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article. Figure S1 Population genetics of the Sesamum indicum association panel. Figure S2 Manhattan plots and QQ plots of genome-wide association studies using the mixed model for relative capsule number (CN) in Sesamum indicum during 2015 and 2016. Figure S3 Manhattan plots and QQ plots of genome-wide association studies using the Mixed model for Wiling level (WL) in Sesamum indicum during 2015 and 2016. Figure S4 Manhattan plots and QQ plots of genome-wide association studies using the Mixed model for relative seed yield (Yie) trait in Sesamum indicum during 2016. Figure S5 (A) Boxplot displaying the number of favourable alleles between the two groups (tropical and northern areas) of Sesamum indicum accessions; (B) Pyramiding of favourable alleles at the loci SNP12606000, SNP9732360, SNP16406525 and SNP7867981 improves the seed yield maintenance of sesame accessions under drought stress. Figure S6 Cis-acting regulatory elements detected in the promoter of SiSAM. Figure S7 Phenotypes of wild type (WT) and transgenic Arabidopsis thaliana plants (L2-L3) over-expressing SiSAM C , after 17 days water stress (S) and normal conditions (CK). Figure S8 Effects of drought stress on relative transcript levels of polyamine metabolism and biosynthesis key genes: SiSAM C , AtADC, AtSAMDC, AtSPDS and AtSPMS in wild type (WT) plants and transgenic Arabidopsis thaliana lines (L1, L2 and L3) overexpressing SiSAM C .

Figure S9
Variation of the mean temperature and relative humidity during repeated drought treatment periods on Sesamum indicum accessions. Table S1 LG wise SNP distribution in Sesamum indicum. Table S2 Full list of the 400 Sesamum indicum accessions used in this study, their origins, breeding status and sequencing information. Table S3 Full list of the significant signals identified in 2015 and 2016 for the five traits and their genomic information in Sesamum indicum. Table S4 Frequency of the favourable alleles at the peak loci in Sesamum indicum modern cultivars and landraces.

Table S5
Full list of the genes in LD region around the peak loci and their functional annotation in Sesamum indicum. Table S6 List of gene-containing significant SNPs in the LD region around peak loci detected for Sesamum indicum drought tolerance traits in 2015 and 2016. Table S7 RNA-seq based transcript levels of potential candidates genes detected in QTL regions from the GWAS. Table S8 List of the 100 accessions used for candidate gene association analysis. Table S9 List of the primers used for the qRT-PCR in Sesamum indicum and in Arabidopsis thaliana.