Whole‐genome resequencing‐based QTL‐seq identified candidate genes and molecular markers for fresh seed dormancy in groundnut

Summary The subspecies fastigiata of cultivated groundnut lost fresh seed dormancy (FSD) during domestication and human‐made selection. Groundnut varieties lacking FSD experience precocious seed germination during harvest, imposing severe losses. Development of easy‐to‐use genetic markers enables early‐generation selection in different molecular breeding approaches. In this context, a recombinant inbred line (RIL) population (ICGV 00350 × ICGV 97045) segregating for FSD was used to deploy the QTL‐seq approach for identification of key genomic regions and candidate genes. Whole‐genome sequencing (WGS) data (87.93 Gbp) were generated and analysed for the dormant parent (ICGV 97045) and two DNA pools (dormant and nondormant). After analysis of the resequenced data from the pooled samples against the dormant parent (used as the reference genome), we calculated the delta‐SNP index and identified a total of 10,759 genomewide high‐confidence SNPs. Two candidate genomic regions spanning 2.24 Mb and 0.74 Mb on the B05 and A09 pseudomolecules, respectively, were identified as controlling FSD. Two candidate genes, RING‐H2 finger protein and zeaxanthin epoxidase, were identified in these two regions; both are significantly expressed during seed development and control abscisic acid (ABA) accumulation. The QTL‐seq study presented here led to the development of a marker, GMFSD1, which was validated on a diverse panel and could be used in molecular breeding to improve dormancy in groundnut.


Introduction
Life begins through the seed in most higher plants, and regulation of seed germination plays a crucial role in plant survival, particularly under unfavourable environmental conditions (Shu et al., 2015). Seed dormancy and germination are highly coordinated molecular processes that influence crop productivity in both cereals and legumes in two ways: (a) uneven seed germination at the time of sowing and (b) in situ seed germination during harvesting. Both affect seed quality and crop yield (Finch-Savage and Leubner-Metzger, 2006; Penfield, 2017). Therefore, fresh seed dormancy (FSD), the initial short period of dormancy in freshly matured or harvested seed/kernel, is one of the most important traits. Groundnut (Arachis hypogaea) is an important grain legume and oilseed crop that plays a key role in human nutrition. Groundnut is an allotetraploid crop with a genome size of 2.7 Gbp and was domesticated in South and Central America from its wild ancestral species A. duranensis and A. ipaensis (Bertioli et al., 2016; Chen et al., 2016). The sequencing of both subspecies of cultivated tetraploid groundnut, along with other diverse accessions, provided greater insight into its evolution and domestication (Bertioli et al., 2019; Chen et al., 2019; Zhuang et al., 2019). In Asia and Africa, groundnut is grown as a major legume crop. During 2017, annual world groundnut production was more than 47 million tons, of which Asia contributed 62.5% (FAOSTAT, 2017). In particular, most of the varieties grown in Asia are Spanish types that lack fresh seed dormancy and suffer significant economic losses due to in situ germination of seeds during harvest, which affects the economic strength of smallholder farmers. In addition, this reduces the quality of groundnut seeds, thereby limiting their end use and global trade.
As an option, foliar spray of maleic hydrazide (a growth retardant) has been used to induce dormancy in groundnut (Gupta et al., 1985), but this is not an economical substitute. Therefore, it is important to cultivate varieties that possess 2-3 weeks of FSD for sustainable agricultural benefit to smallholder farmers and industries.
In recent years, significant efforts have been made in cereals to understand the seed dormancy trait in order to reduce the losses in yield and kernel quality caused by preharvest sprouting (PHS) or in situ seed germination (Gao and Ayele, 2014; Nakamura, 2018). As a result, several quantitative trait loci (QTLs) and candidate genes for PHS have been identified in wheat (Li et al., 2004; Nakamura, 2018; Ogbonnaya et al., 2008), rice (Lee et al., 2017; Li et al., 2004) and barley (Li et al., 2003, 2004; Nakamura, 2018). Earlier, the gene GA20-oxidase was also identified as a candidate gene in the QTL region controlling PHS in rice (Li et al., 2004). After more than a decade of hard work, causal genes associated with seed dormancy and PHS have been identified in wheat and barley. These genes include alanine aminotransferase (AlaAT) and mitogen-activated protein kinase kinase 3 (MKK3) in barley (Nakamura et al., 2016; Sato et al., 2016) and Phs1 and mother of FT and TFL1 (MFT) in wheat (Nakamura et al., 2011; Torada et al., 2016). Although groundnut germplasm has been reported to possess significant diversity for seed dormancy (Issa et al., 2010; Nautiyal et al., 2001; Wang et al., 2012; Yaw et al., 2008), such detailed studies on FSD in groundnut are lacking (Silva et al., 2017; Vishwakarma et al., 2016).
Genomics-assisted breeding (GAB) can significantly shorten the breeding cycle time for the improvement of elite cultivars with desired traits (Varshney et al., 2015, 2018a). Notably, the primary requirement for fruitful GAB is identification of a marker tightly linked with the desired trait(s). Although Vishwakarma et al. (2016) reported two major QTLs controlling FSD, the use of an F2 generation with limited multiseason phenotyping does not allow precise detection of candidate genomic regions. With the availability of draft genome sequences of the diploid progenitors and tetraploid cultivated groundnut (Bertioli et al., 2016; Chen et al., 2016; Bertioli et al., 2019; Chen et al., 2019; Zhuang et al., 2019), candidate gene discovery and marker development have become more precise and reliable. Of the available sequencing-based approaches, the QTL-seq approach offers great benefits by identifying genomic region(s) and candidate genes, leading to the development of diagnostic markers. This approach has already been deployed successfully in some legume crops including groundnut for foliar disease (rust and LLS) resistance (Pandey et al., 2017), shelling percentage (Luo et al., 2019a), bacterial wilt (Luo et al., 2019b) and testa colour (Zhao et al., 2019). The QTL-seq approach has also been successfully deployed in the discovery of genomic regions and candidate genes with high accuracy and precision in some other crops such as cucumber (Lu et al., 2014), tomato (Illa-Berenguer et al., 2015), pigeonpea (Singh et al., 2016a) and chickpea (Das et al., 2015; Singh et al., 2016b). In view of the above, we used a RIL population (ICGV 00350 × ICGV 97045) to perform whole-genome sequencing of pooled samples from contrasting phenotypes and of the dormant parent, followed by QTL-seq analysis. This study identified candidate genomic regions and genes, and reports the development of molecular markers for FSD in groundnut.

Phenotyping and construction of pools
The RIL population (ICGV 00350 × ICGV 97045) used in this study had high phenotypic variability for FSD. The dormant parent ICGV 97045, which has FSD of up to 15 days, was used as the source of dormancy, whereas ICGV 00350, whose seeds germinate within 48 hours, was used as the nondormant parent. The previous genetic mapping effort used a subset (368) of a large F2:3 population (>800), applied a destructive method of phenotyping and showed clear trait segregation (Vishwakarma et al., 2016). We therefore followed single-seed descent (SSD) for the rest of the F2:3 lines and finally developed a RIL population with 366 lines. Among the 366 RILs, 149 lines showed the nondormant phenotype (seeds germinated within 24 h), 117 lines showed FSD of 2-3 days, 57 lines showed FSD of 4-7 days and 19 lines showed FSD of 8-12 days. Interestingly, only 24 lines showed dormancy of up to 13-15 days (Figure 1c and Tables S1, S2). Later, the RILs with extreme phenotypes were used to develop the dormant (D) and nondormant (ND) pools, each consisting of 20 individuals, that is 20 nondormant (germination within 48 h) and 20 dormant (FSD up to 13-15 days; Figure S1 and Table S2) RILs. Thus, the phenotypic variability present in the RIL population was used to develop the two pools with extreme phenotypes (Figure 1c).

Whole-genome sequencing and SNP identification
Sequencing data were generated for the dormant parent (ICGV 97045), the dormant pool (D pool) and the nondormant pool (ND pool). A total of 264.2 million reads for the dormant parent (ICGV 97045), 215.5 million reads for the D pool and 223.7 million reads for the ND pool were generated (Tables 1 and S3). The highest amount of sequencing data was generated for the dormant parent ICGV 97045 (33.03 Gb), followed by the ND pool (27.97 Gb) and the D pool (26.94 Gb). After filtering, the dormant parent ICGV 97045 (200.5 million) had the most high-quality reads, compared with the ND pool (157.4 million) and the D pool (156.9 million). The alignment of reads generated for the dormant genotype (ICGV 97045) achieved 85.79% genome coverage and 8.02X average read depth and resulted in the development of a reference-guided assembly for ICGV 97045 (hereafter referred to as the ICGV 97045 assembly; Figures S1 and S2). Alignment of the D pool reads to the ICGV 97045 assembly resulted in 85.30% coverage and 6.49X read depth, while alignment of the ND pool reads to the ICGV 97045 assembly resulted in 85.12% coverage and 6.50X read depth (Tables 1 and S3). SNPs were identified based on the SNP index calculation by comparing each pool to the dormant parent, ICGV 97045 (Figures S2, S3 and S4). The SNP index corresponds to the frequency of each parental allele in the population of pooled samples. For instance, an SNP index of 0.5 indicates equal contribution of alleles from both parents (Figures S3 and S4). Therefore, if the SNP is linked to the FSD trait, the SNP index for that SNP site in the D pool would be >0.5, whereas the SNP index would be <0.5 in the ND pool. In general, a greater absolute value of the ΔSNP index indicates a higher probability of association of the SNP site with the trait. Thus, to identify key genomic regions and genes associated with the FSD trait, we analysed the genomewide SNP index of the D and ND pools with a sliding window of 2 Mb and an increment of 50 kb, looking for regions that deviated from an allele frequency of 0.5.
After SNP index calculation, the ΔSNP index was calculated with a minimum statistical confidence of P < 0.05 (Figure S5; Table S4). Thus, after examining the dormant and nondormant pools, a total of 10,759 genomewide SNPs for FSD were identified (Table S4). Of the 10,759 SNPs, the A- and B-genomes harboured 5,970 and 5,184 SNPs, respectively (Tables 2, 3 and S4). Altogether, 5,452 SNPs were found to be intergenic, 312 SNPs intronic, 90 SNPs synonymous and 178 SNPs nonsynonymous; eight SNPs resulted in a stop codon, seven SNPs were at splice junctions, 29 SNPs were in the 3ʹ UTR and 13 SNPs in the 5ʹ UTR (Tables 2 and 3).
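The sliding-window averaging described in this section (2-Mb window, 50-kb increment) can be illustrated with a minimal Python sketch. It is not the QTL-seq pipeline code: the function name, the toy positions and the toy ΔSNP values are invented for illustration, and windows are assumed to be half-open intervals starting at position 0.

```python
from bisect import bisect_left


def sliding_window_mean(positions, values, chrom_length,
                        window=2_000_000, step=50_000):
    """Average `values` over windows [start, start + window) moved by `step`.

    `positions` must be sorted in ascending order and parallel to `values`.
    Returns (window_start, mean, n_snps) tuples; mean is None when a window
    contains no SNP.
    """
    results = []
    start = 0
    while start < chrom_length:
        lo = bisect_left(positions, start)            # first SNP inside the window
        hi = bisect_left(positions, start + window)   # first SNP past the window
        n = hi - lo
        mean = sum(values[lo:hi]) / n if n else None
        results.append((start, mean, n))
        start += step
    return results


# Toy data: four hypothetical SNPs on a 5-Mb pseudomolecule.
positions = [100_000, 1_900_000, 2_050_000, 4_500_000]
delta = [-1.0, -0.5, 0.0, 1.0]
windows = sliding_window_mean(positions, delta, chrom_length=5_000_000)
```

Because consecutive windows overlap by 1.95 Mb, neighbouring averages are strongly correlated, so a candidate region shows up as a sustained run of extreme window means rather than as a single peak.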

Identification of genomic regions and candidate genes for fresh seed dormancy
After computing the SNP index and ΔSNP index in the two extreme pools D and ND, an FSD-associated genomic region was identified on pseudomolecule B05 of the B-subgenome, from 114.45 Mb to 116.69 Mb (2.24 Mb; Figure 1). This genomic region harboured a total of 52 SNPs (Table 4), and 18 of these SNPs were highly significant (P value ≤ 0.01, ΔSNP index = −1). The negative sign of the ΔSNP index indicates bias in the inheritance of parental genomes in the pools towards the dormant parent, or vice versa. The dormant pool had SNP index = 0 at all SNP positions, indicating that the alleles came from the dormant parent, ICGV 97045 (Table 4). Of the 52 SNPs, 49 were intergenic and three were intronic, affecting three genes, viz. Araip.S6QRU (encodes receptor-like protein kinase), Araip.YHU92 (encodes RING-H2 finger protein; Figure 2) and Araip.X9V0W (encodes SOUL heme-binding family protein; Table 4). Previously, a role of RING-H2 finger protein has been demonstrated in ABA biosynthesis and signalling (Bu et al., 2009). Notably, Clevenger et al. (2016) reported significantly high expression of the gene Araip.YHU92 during seed development (Figure 3; Table S5A; https://peanutbase.org/search/gene).

Development and validation of allele-specific markers for fresh seed dormancy
Based on the ΔSNP index, a total of 143 SNPs (located in the QTL region and on other pseudomolecules; see Table S7) were targeted for the development of allele-specific markers. Of the 143 allele-specific markers, 42 showed clear polymorphism between the dormant and nondormant parents. These 42 polymorphic markers were tested on the D and ND pools of RILs and on a set of breeding material to identify promising marker(s) that cosegregate with the dormancy phenotype (Table S6). As a result, one of these 42 markers, GMFSD1 (B05_8196), consistently differentiated the parents, the pooled samples and the set of breeding material for dormancy and nondormancy (Figure 4).

Discussion
Advances in the field of genomics have brought a dramatic reduction in the cost of sequencing. The advent of next-generation sequencing (NGS) technologies has greatly facilitated the development of genome assemblies, trait mapping and candidate gene discovery (Varshney et al., 2015). Thus, NGS technology has enabled faster and more precise trait discovery through rapid detection of linked genomic regions (and gene discovery), trait-associated polymorphisms and diagnostic markers (Varshney et al., 2015, 2018a). There are several sequencing-based trait-mapping approaches that provide faster discovery of candidate genes and facilitate marker development; one such approach is QTL-seq (Pandey et al., 2017; Takagi et al., 2013). It has been successfully deployed in several important crop species for trait dissection, such as rice blast disease (Takagi et al., 2013), grain length and weight in rice (Yaobin et al., 2018), flowering time in tomato (Ruangrak et al., 2018), foliar disease resistance (Clevenger et al., 2018; Pandey et al., 2017), shelling percentage (Luo et al., 2019a) and bacterial wilt resistance (Luo et al., 2019b) in groundnut, 100-seed weight and root/total plant dry weight in chickpea (Singh et al., 2016a) and plant height in soybean (Zhang et al., 2018). Many of these efforts facilitated successful development of diagnostic markers that are being deployed in GAB. In the coming years, there will be a shift from GAB to sequence-based breeding (Varshney et al., 2018a), and these SNP-based diagnostic markers will be of great use in enhancing the modernization and precision of breeding programmes to achieve higher genetic gains in farmers' fields (Varshney et al., 2018b).
Lack of FSD (PHS or in situ germination) reduces grain or kernel yield and often results in medium to large yield losses and quality deterioration of the produce in both cereals (Abe et al., 2019; Benech-Arnold and Rodríguez, 2018; Gao and Ayele, 2014; Nakamura, 2018; Rodríguez et al., 2015) and legumes (Dias et al., 2011; Patro and Ray, 2016; Vishwakarma et al., 2016). Due to domestication and extensive human-made selection during varietal development programmes, modern groundnut has much lower genetic diversity than its wild ancestral species (which produce dormant seeds). As a result, in situ sprouting of groundnut seeds caused by lack of FSD results in losses of 20%-50% (Nautiyal et al., 2001; Yaw et al., 2008). Several researchers have studied this trait in order to find suitable new sources of FSD in groundnut (Faye et al., 2009; Patro and Ray, 2016; Yaw et al., 2008). Our previous study, the only one to date, was performed in an F2 population derived from the cross between ICGV 00350 and ICGV 97045; it provided a preliminary idea of the inheritance pattern and reported two major QTL regions associated with FSD using the Diversity Arrays Technology (DArT) and DArTseq platforms (Vishwakarma et al., 2016). DArT and DArTseq are useful genotyping approaches, but they do not provide the exact position of SNPs in the reference genome, which is required for fine mapping, candidate gene discovery and marker development. Earlier, Sato et al. (2016) successfully demonstrated that a previously reported candidate gene can be validated and then used to improve the target trait. Therefore, we advanced the same population in this study and applied the QTL-seq approach to perform high-resolution trait mapping and identify genomic region(s) associated with FSD.
The use of highly homozygous F7 RILs provided highly significant SNPs and genomic region(s) associated with the desired trait for the discovery of candidate genes. As a result, sequence comparison revealed 5,970 and 5,184 SNPs (P value ≤ 0.01) on the A- and B-genomes, respectively; among them, only 767 and 1,071 SNPs had a ΔSNP index of −1. These highly significant SNPs (P value ≤ 0.01, ΔSNP index = −1) were mostly distributed on pseudomolecules A09 (221 out of 767), A10 (356 out of 767) and B05 (897 out of 1,071), suggesting a plausible association of these three pseudomolecules with the FSD trait.
The comprehensive QTL-seq analysis for FSD detected a major genomic region of 2.24 Mb on pseudomolecule B05 of the B-subgenome. This promising region contains 46 important genes, and five of these genes were found to have important functions, namely SOUL heme-binding family protein (Araip.X9V0W), RING-H2 finger protein (Araip.YHU92), far1-related sequence 6-like (Araip.LKN71), ascorbate oxidase (Araip.K7Y13) and heavy metal transport/detoxification superfamily protein (Araip.2F1GS). Sequencing data suggested synonymous SNPs among the genes encoding RING-H2 finger protein, far1-related sequence 6-like and SOUL heme-binding family protein, which are key regulators of developmental processes (Bu et al., 2009; Ko et al., 2006; Li et al., 2018). Likewise, a 0.74-Mb SNP-enriched genomic region was identified on pseudomolecule A09 of the A-subgenome, containing a nonsynonymous SNP in an ABA biosynthetic gene, zeaxanthin epoxidase (ZEP, Aradu.D94AQ). Further, this genomic region appeared to be one of the key controllers of ABA biosynthesis in groundnut, as it is home to several important genes that are known to participate in ABA signalling and control seed dormancy, such as the PPR superfamily protein gene, the protein phosphatase 2C/2A family protein gene, GRAM domain protein/ABA-responsive-like and serine carboxypeptidase 46 (Mauri et al., 2016; Née et al., 2017; Xia et al., 2018).
Abscisic acid crosstalk with other hormones such as ethylene and gibberellic acids (GAs) is crucial during plant developmental processes including seed development and germination (Rodríguez-Gacio et al., 2009). It is well accepted that ABA signalling plays an important role in seed dormancy, affecting the seed germination process (Dejonghe et al., 2018; Finch-Savage and Leubner-Metzger, 2006; Liu et al., 2013; Nishimura et al., 2018). In Arabidopsis, a RING-H2 finger protein transcription factor positively regulates ABA biosynthesis and signalling during seed germination (Bu et al., 2009) and confers abiotic stress tolerance through increased ABA biosynthesis (Ko et al., 2006; Liu et al., 2016). Notably, overexpression of an Arabidopsis RING-H2 finger protein-encoding gene, XERICO, confers drought tolerance through enhanced ABA accumulation (Ko et al., 2006). Consistent with these previous findings, the expression of the RING-H2 finger protein gene (Araip.YHU92) is also substantially high in developing groundnut seeds, suggesting a role in ABA accumulation, which might also impact seed dormancy. Zeaxanthin epoxidase is one of the key enzymes involved in ABA biosynthesis; it catalyses the first step of ABA biosynthesis, the epoxidation of zeaxanthin to violaxanthin (Thompson et al., 2000). Earlier, the role of ZEP in seed dormancy was demonstrated in Arabidopsis and tobacco (Frey et al., 1999; Marin et al., 1996) and peach. Also, a QTL analysis for preharvest sprouting in Arabidopsis identified the DOG locus (Alonso-Blanco et al., 2003), now identified as one of the key regulators in the ABA signalling pathway (Nishimura et al., 2018). Further, in rice, ABA receptor mutants showed pleiotropic effects on traits including seed dormancy and productivity (Miao et al., 2018). Dormancy and germination are complex physiological processes involving interactions between several pathways, including redox signalling through reactive oxygen species (ROS).
We also identified a gene encoding the ROS-scavenging enzyme ascorbate oxidase, Araip.K7Y13, in the genomic region identified on B05. In practical terms, our data suggest several candidate genes controlling FSD in groundnut. Among them, ZEP (Aradu.D94AQ) and RING-H2 finger protein (Araip.YHU92) might be the most relevant genes regulating the preharvest sprouting caused by lack of FSD in groundnut. Further, it would be interesting to understand how the RING-H2 finger protein transcription factor regulates ABA synthesis, as it is plausible that this transcription factor binds to the promoter regions of ZEP and other ABA biosynthetic genes, including NCED. Additionally, finding haplotypes associated with SNPs in these genes in natural population(s) and breeding material, and characterizing these gene(s) with genome editing tools, will provide new insight into the genetic control of FSD.
A major advantage of sequencing-based trait-mapping approaches is that they map genomic regions accurately on the genome and enable discovery of candidate genes in addition to development of DNA markers. Earlier, Pandey et al. (2017) and Luo et al. (2019a, 2019b) demonstrated the benefits of the QTL-seq approach: identification of candidate gene(s) and development of markers in the candidate genomic region. The present study also successfully developed a marker, GMFSD1, near the identified genomic region on B05, which is closely linked to one of the candidate genes, Araip.YHU92. This marker was able to distinguish both parents and the extreme bulks, and it successfully differentiated diverse breeding and germplasm material for the FSD trait. Thus, our study provides an important marker linked to FSD, which can be used in groundnut breeding programmes for early selection of the FSD trait.
In summary, our study suggests that the WGS-based QTL-seq approach is one of the most efficient techniques for the identification of putative regions/SNPs associated with desired traits. Further, it is reasonable to target ZEP and RING-H2 finger protein as candidate genes for introgression of fresh seed dormancy. More evidence is required to functionally validate these genes, and genome editing is plausibly the best way to obtain it. Therefore, we are interested in characterizing the impact of the candidate genes identified in the present study through overexpression studies as well as CRISPR/Cas9. Additionally, we have developed multiple RIL populations, which will be helpful in delineating and identifying additional candidate genes and markers for use in marker-assisted breeding to improve FSD in groundnut.

Plant materials and construction of pools based on phenotyping
The RIL population derived from the cross ICGV 00350 × ICGV 97045, described earlier by Vishwakarma et al. (2016), was used in this study. Both parents used in this study are short-duration Spanish varieties (subspecies fastigiata; botanical type vulgaris) and are widely cultivated in the states of Tamil Nadu and Andhra Pradesh in India under both irrigated and rain-fed conditions. The nondormant parent, ICGV 00350, is a drought-tolerant genotype but is prone to sprouting in the field, that is, it lacks fresh seed dormancy. In contrast, ICGV 97045 possesses fresh seed dormancy of up to 15 days and is being used as a donor parent for improving FSD in several varieties at ICRISAT. Our previous genetic mapping effort for FSD used a subset (368) (Vishwakarma et al., 2016) of a large segregating F2:3 population (>800). We then followed SSD for the rest of the F2 lines and finally developed a RIL population with 366 lines. For phenotyping, noncured seed from freshly harvested mature pods (post-rainy season) was used for the germination assay, and care was taken to avoid testa damage while removing seeds from the pods. The maturity of pods was determined by the development of black coloration inside the shell, as explained by Miller and Burns (1971). To minimize experimental variation, 20 good-quality, uniform-sized seeds from each RIL were chosen for the germination assay, and the experiment was conducted in technical replicates. The seeds were treated with the fungicide Captan, N-[(trichloromethyl)thio]-4-cyclohexene-1,2-dicarboximide, at 2 g/kg seed (Upadhyaya and Nigam, 1999) and placed on filter paper in a petri dish, which was kept moist with sterilized distilled water during the course of the study. The dishes were incubated at 35 ± 2°C in the dark in an incubator, and seed germination was recorded daily.
The dormant RILs that showed FSD of up to 13-15 days were used to create the dormant pool/bulk, and the RILs that showed a phenotype similar to ICGV 00350 (germinated within 24 h) were used for the nondormant pool/bulk. Based on the above phenotyping results, DNA in equimolar concentration (100 ng each) from 20 RIL individuals with each of the above-mentioned extreme phenotypes, that is dormant and nondormant seeds, was pooled to create the dormant (D) and nondormant (ND) pools, respectively (Figures 1, S1).

Sequence libraries construction and Illumina sequencing
WGS data were generated for three samples, namely ICGV 97045 (dormant parent), the dormant pool for fresh seed dormancy (D pool) and the nondormant pool (ND pool). Libraries were prepared and sequenced on an Illumina HiSeq 2500 at the Center of Excellence in Genomics and Systems Biology (CEGSB), ICRISAT, Hyderabad, as described in Pandey et al. (2017). In brief, a single Illumina library for each sample was made using the TruSeq DNA Sample Prep kit LT (set A) FC-121-. Two micrograms of DNA from each of these three samples was first sheared using a Diagenode Bioruptor® NGS and then subjected to end repair and adapter ligation. For size selection, a 2% agarose gel was used for electrophoresis, and the 500-600 bp insert size was selected, purified and then enriched using adaptor-compatible PCR primers. The size of the DNA libraries was reconfirmed through a chip assay using the Agilent Technologies 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA). Later, these libraries were used to generate 250-base paired-end reads by sequencing them on the Illumina HiSeq platform with Reagent Kit v2 (500-cycles).

Construction of reference-guided assembly
The SNP index was calculated using the QTL-seq pipeline (http://genome-e.ibrc.or.jp/home/bioinformatics-team/mutmap), developed at Iwate Biotechnology Research Center, Japan. For analysis, a reference tetraploid genome assembly for groundnut was constructed from the diploid genome assemblies of A. duranensis (A-genome) and A. ipaensis (B-genome) developed by Bertioli et al. (2016). The cleaned reads of ICGV 97045 were first aligned to the constructed reference tetraploid genome assembly using the inbuilt Burrows-Wheeler Aligner tool (Li and Durbin, 2009). Thereafter, we used the Coval software for postprocessing and filtering of the alignment files, which were generated by aligning sequence reads to both diploid genomes separately, followed by variant calling between ICGV 97045 (dormant parent) and both diploid reference genomes. The ICGV 97045 reference-guided assembly was developed from these variants and the synthetic tetraploid genome assembly by substituting the alternate bases at high-confidence SNP variants. Thereafter, reads of both the dormant and nondormant pools were aligned to the ICGV 97045 assembly. The variants (SNP index) were then called for both pool samples against the ICGV 97045 assembly.
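The base-substitution step used to derive a reference-guided assembly can be sketched in a few lines of Python. This is only an illustration of the idea, not the Coval or QTL-seq implementation: the function name, the tuple layout of the SNP records and the toy sequences are assumptions made for the example, and only single-base substitutions are handled.

```python
def substitute_snps(reference, snps):
    """Return a copy of `reference` with the alternate base placed at each SNP.

    reference: dict mapping sequence name -> sequence string
    snps: iterable of (name, position, ref_base, alt_base), position 1-based
    """
    seqs = {name: list(seq) for name, seq in reference.items()}
    for name, pos, ref_base, alt_base in snps:
        # Guard against coordinate errors before editing the sequence.
        if seqs[name][pos - 1] != ref_base:
            raise ValueError(f"reference mismatch at {name}:{pos}")
        seqs[name][pos - 1] = alt_base
    return {name: "".join(bases) for name, bases in seqs.items()}


# Toy "tetraploid" reference built from two invented diploid sequences.
reference = {"A01": "ACGTAC", "B01": "GGTTAA"}
parent_snps = [("A01", 2, "C", "T"), ("B01", 6, "A", "G")]
parent_assembly = substitute_snps(reference, parent_snps)
```

After this substitution, any site where a pool read differs from `parent_assembly` differs from the dormant parent itself, which is what allows the SNP index to be read directly as the share of non-dormant-parent alleles.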

Implementation of QTL-seq pipeline and SNP index calculation
The SNP index for both pools was calculated against the ICGV 97045 assembly using the previously described method (Abe et al., 2012; Pandey et al., 2017; Takagi et al., 2013). The SNP index at a position in a pseudomolecule is obtained by dividing the count of reads carrying the alternate base by the total number of reads aligned at that position. SNP positions with read depth <5 in both the pools and SNP index <0.3 in either of the pools were filtered out. The ΔSNP index was then calculated by subtracting the SNP index of the nondormant pool from the SNP index of the dormant pool. It is important to mention that only those SNPs that had homozygous alleles in both pools were selected for ΔSNP index calculation. SNPs that passed the criterion of ΔSNP index = −1 were considered causal SNPs for FSD. ΔSNP index = −1 indicates that the allele called in the dormant pool was the same as that of the dormant parent, while the nondormant pool carried the alternate base, or vice versa (Figure S6). These ΔSNP index values were used for QTL identification. In order to find minor alleles or important SNPs controlling FSD, we also analysed the overall ΔSNP index data obtained from the whole-genome resequencing data, comparing the SNP index of the nondormant pool with that of the dormant pool, to find SNPs associated with important genes known to be involved in seed development and hormone homeostasis, including ABA signalling and synthesis.
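The SNP index and ΔSNP index arithmetic described above can be written out as a short Python sketch. It is a simplified illustration and not the QTL-seq pipeline itself: the function names, site labels and read counts are invented, the sketch assumes a site is kept only when read depth is at least 5 in each pool, and the additional SNP-index threshold mentioned in the text is not reproduced.

```python
def snp_index(alt_count, depth):
    """Fraction of aligned reads that carry the non-reference (alternate) base."""
    return alt_count / depth


def delta_snp_index(d_pool, nd_pool, min_depth=5):
    """Return {site: (index_D, index_ND, delta)} for sites present in both pools.

    d_pool / nd_pool map (pseudomolecule, position) -> (alt_count, depth).
    delta = SNP index of the dormant pool minus SNP index of the nondormant pool.
    Assumption of this sketch: depth must be >= min_depth in each pool.
    """
    result = {}
    for site in sorted(d_pool.keys() & nd_pool.keys()):
        d_alt, d_depth = d_pool[site]
        nd_alt, nd_depth = nd_pool[site]
        if d_depth < min_depth or nd_depth < min_depth:
            continue
        d_idx = snp_index(d_alt, d_depth)
        nd_idx = snp_index(nd_alt, nd_depth)
        result[site] = (d_idx, nd_idx, d_idx - nd_idx)
    return result


# Invented read counts for three hypothetical sites.
d_pool = {("B05", 1000): (0, 8), ("B05", 2000): (4, 8), ("A09", 500): (0, 3)}
nd_pool = {("B05", 1000): (7, 7), ("B05", 2000): (4, 8), ("A09", 500): (6, 6)}

table = delta_snp_index(d_pool, nd_pool)
# Sites where the D pool matches the dormant-parent reference (index 0) and
# the ND pool is fixed for the alternate base (index 1).
candidates = [site for site, (_, _, delta) in table.items() if delta == -1.0]
```

In this toy input the first site gives a ΔSNP index of −1 (0/8 in the D pool minus 7/7 in the ND pool), the second gives 0, and the third is dropped because only three reads cover it in the D pool.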

Designing of primer pairs, polymerase chain reaction (PCR)
Based on the ΔSNP index, candidate SNPs were targeted to design allele-specific primers (markers) using Primer3 (http://primer3.ut.ee/; You et al., 2008; Table S7). To avoid complications and keep PCR conditions consistent, we specifically designed allele-specific primer pairs with a Tm of 61-62°C.
After PCR amplification, the alleles were scored as present or absent on a 2% agarose gel. The markers were first amplified in the dormant and nondormant parents used to develop the RIL population. The polymorphic markers identified were then used to screen for polymorphism among the F7 RIL individuals (with extreme phenotypes) that were used to create the dormant and nondormant pools. Finally, the polymorphic markers were validated in a diverse panel consisting of both FSD and non-FSD accessions to assess their broader applicability.
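The present/absent scoring used to judge whether a marker cosegregates with dormancy can be summarized as a simple concordance calculation. The Python sketch below is illustrative only: the function name, the RIL labels and the band calls are invented, and it assumes, hypothetically, that the allele-specific primer amplifies the dormant-parent allele, so that a band is expected in dormant samples.

```python
def concordance(band_present, is_dormant):
    """Fraction of samples whose band call agrees with the dormancy phenotype.

    Both arguments map sample name -> bool. Only samples found in both
    mappings are compared.
    """
    shared = band_present.keys() & is_dormant.keys()
    if not shared:
        raise ValueError("no samples in common")
    matches = sum(band_present[s] == is_dormant[s] for s in shared)
    return matches / len(shared)


# Invented gel scores and phenotypes for the two parents and two RILs.
band_present = {"ICGV 97045": True, "ICGV 00350": False,
                "RIL_a": True, "RIL_b": True}
is_dormant = {"ICGV 97045": True, "ICGV 00350": False,
              "RIL_a": True, "RIL_b": False}

score = concordance(band_present, is_dormant)
```

A marker that behaves like a diagnostic marker on a given panel would score 1.0 under this measure; here one mismatching RIL out of four samples lowers the score to 0.75.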

Figure S1 Schematic representation of QTL-seq approach used for trait mapping in groundnut for fresh seed dormancy.
Figure S2 Sequencing depth of the dormant parent ICGV 97045. Black line indicates the sliding window average of 2 Mb interval with 50 kb increment for SNP-index.
Figure S3 SNP-index plots for 20 pseudomolecules of the dormant (D) pool with the dormant parent. Red lines indicate the sliding window average of 2 Mb interval with 50 kb increment for SNP-index.
Figure S4 SNP-index plots for 20 pseudomolecules of the nondormant (ND) pool with the dormant parent.
Figure S5 The Δ(SNP index) plot obtained by subtraction of dormant (D) pool SNP-index from nondormant pool SNP-index.
Table S1 FSD phenotyping details of RIL population.