Whole‐genome resequencing‐based QTL‐seq identified AhTc1 gene encoding a R2R3‐MYB transcription factor controlling peanut purple testa colour

Summary Peanut (Arachis hypogaea L.) is an important oil crop worldwide. The common testa colours of peanut varieties are pink or red, but varieties with dark purple testa have attracted attention in recent years because of their potentially high levels of anthocyanin, an antioxidant with added nutritional value. However, the genetic mechanism regulating testa colour in peanut is unknown. In this study, we found that purple testa was determined by the female parent and controlled by a single major gene, named AhTc1. To identify the candidate gene controlling purple testa, a whole-genome resequencing-based approach (QTL-seq) was applied, and a total of 260.9 Gb of data were generated from the parental and bulked lines. SNP index analysis indicated that AhTc1 is located in a 4.7 Mb region on chromosome A10, which was confirmed by bulked segregant RNA sequencing (BSR) analysis in three segregating populations derived from crosses between pink and purple testa varieties. Allele-specific markers were developed, and the marker pTesta1089 was shown to be closely linked with purple testa. Further, AhTc1, which encodes an R2R3-MYB transcription factor, was positionally cloned. The expression of AhTc1 was significantly up-regulated in the purple testa parent YH29. Overexpression of AhTc1 in transgenic tobacco plants led to purple coloration of leaves, flowers, pods and seeds. In conclusion, we identified AhTc1, which encodes an R2R3-MYB transcription factor and confers purple testa in peanut. This gene will be useful in molecular breeding for purple testa cultivars with potentially increased nutritional value for consumers.


Introduction
Peanut (or groundnut), widely cultivated in more than 100 countries with an annual production of 43.98 million tons, is one of the most important oil crops in the world (FAOSTAT 2016, www.faostat.fao.org). Peanut seeds are rich in oil (40%-60%), protein (20%-40%), carbohydrate (10%-20%) and many other nutrients, such as vitamins B1, B3, B9 and E, biotin, resveratrol, isoflavones and phytic acid (Pandey et al., 2012; Zhao et al., 2012). Most peanut varieties have pink or red testa. Recently, peanut varieties with purple testa have attracted increasing attention as a potential source of nutraceuticals, owing to their higher content of anthocyanin and microelements, which are beneficial to human health (Attree et al., 2015; Kuang et al., 2017). Peanut varieties with purple testa therefore meet a consumer demand. However, elite commercial varieties with black or purple testa are scarce, which limits the production of purple peanuts.
In the last decade, molecular breeding approaches such as marker-assisted selection (MAS) have been used in peanut and other crops and have significantly improved the efficiency of breeding. Despite the agronomic importance of testa colour, the genetic and molecular mechanisms that control testa colour in peanut are not clear. Besides pink and purple testa, germplasm with white, red and tan testa is also available. Purple testa has been reported to be completely dominant to white and incompletely dominant to the basic tan or pink testa colour, and to be controlled by a single dominant gene (Branch, 1985, 2001). Another study showed that the purple testa of peanut was controlled by an incompletely dominant major gene linked with the SSR marker PM93 (Hong et al., 2007). Sequence alignment showed that SSR marker PM93 is the same as Seq4H11, located at 6.664 cM of linkage group TA10 in the integrated consensus map of cultivated peanut and wild relatives (Shirasawa et al., 2013).
Cultivated peanut (Arachis hypogaea L.) is an allotetraploid (AABB, 2n = 4x = 40). The large size of the genome (2800 Mb), the ploidy level and the high content of repetitive DNA (Dhillon et al., 1980) have obstructed genetic and genomic studies in peanut and make positional cloning of a gene very difficult. During the past 5 years, significant progress has been made in peanut genome sequencing and QTL mapping. In 2016, the genome sequences of two diploid wild ancestors of cultivated peanut, Arachis duranensis and Arachis ipaensis, were determined (Bertioli et al., 2016; Chen et al., 2016). In 2017, two groups reported the completion of genome sequences of cultivated peanut, Tifrunner (Bertioli et al., 2019) and Shitouqi (Zhuang et al., 2019). In addition, the draft sequence of Arachis monticola, the only allotetraploid wild peanut in the Arachis genus, was reported (Yin et al., 2018). These efforts provided new opportunities for peanut genetic analysis, gene cloning, QTL mapping and molecular marker development (Luo et al., 2019; Pandey et al., 2017a,b; Wang et al., 2017; Zhao et al., 2017).
Traditional QTL analysis is known to be time-consuming and labour-intensive. In comparison, bulked segregant analysis (BSA), first reported in 1991, is an elegant method for rapidly identifying markers linked to any specific gene or genomic region using two bulked DNA pools (Michelmore et al., 1991). Recent developments in next-generation sequencing (NGS) technologies have reduced the cost and shortened the turnaround time of sequencing, promoting the use of sequence-based trait mapping approaches. Based on the integration of BSA, NGS and bioinformatic analysis, a series of more efficient approaches were developed, including QTL-seq, MutMap, MutMap+ and BSR-seq. These approaches have been successfully employed in QTL mapping and candidate gene identification in several crops (Abe et al., 2012; Fekih et al., 2013; Steuernagel et al., 2016; Takagi et al., 2013). QTL-seq is a whole-genome resequencing (WGRS)-based approach that first resequences two bulked DNA samples of progeny (each with 20-50 individuals) showing extreme phenotypic values and then identifies the candidate region or genes by calculating and comparing the SNP index between the two bulks. Notably, QTL-seq has been successfully used in peanut to map QTLs for rust resistance and late leaf spot resistance, and genomic regions and candidate genes controlling shelling percentage (Clevenger et al., 2018; Luo et al., 2019; Pandey et al., 2017a).
In this study, the inheritance of purple testa in peanut was analysed using F2 and F2:3 segregating populations derived from crosses between peanut varieties with pink and purple testa. A major locus controlling purple testa was mapped using the QTL-seq method. Furthermore, a candidate gene, AhTc1, encoding an R2R3-MYB transcription factor, was identified using map-based cloning together with transcriptome analysis and gene expression profiling. Functional studies indicated that AhTc1 plays an important role in regulating anthocyanin biosynthesis. Our results lay the foundation for breeding purple testa peanut varieties using MAS.

Genetic analysis of purple testa in YH29
YH29 and ZH9 are varieties with purple testa, while WH10, GT-C20 and ZH8 are varieties with pink testa. Among the parental lines, there was also variation in the colour of the leaflets, flowers, vasculature and petioles: lines with darker testa showed more purple coloration in these tissues (Figure 1). The varieties with purple testa were used as male parents in crosses with the varieties with pink testa (Table 1). All F1 plants showed an intermediate phenotype in leaflet, vasculature, petiole and flower colour compared with their parental lines (Figure 1). However, all F1 seeds had pink testa, the same as the female parents. In the F2 and F3 generations, the seeds harvested from a single plant also shared the same testa colour, and the testa colour was consistent with that of the female parent. These results are consistent with studies in other plants (Lambrides et al., 2004). We therefore inferred that the testa develops from the integument of the ovule and hence has the same genotype as the maternal plant, so that segregation of testa colour appears one generation later. We determined the genotype of a seed through the phenotype of seeds from the KF1 (WH10♀ × YH29♂) and KF2 (GT-C20♀ × YH29♂) populations. The observed counts fitted a single-locus segregation ratio (Table 1). Combining the phenotypes of F1 plants and F2 seeds, we suggest that the purple testa of YH29 is controlled by an incompletely dominant gene, which we named AhTc1.
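A fit to a single-locus ratio of this kind is usually assessed with a chi-square goodness-of-fit test. The sketch below is illustrative only: the counts are hypothetical (they are not the values in Table 1), the 1:2:1 ratio is assumed from the incomplete dominance described above, and the function names are ours.

```python
import math

def chi_square_gof(observed, ratio):
    """Chi-square goodness-of-fit statistic for observed class counts
    against an expected Mendelian ratio such as (1, 2, 1)."""
    if len(observed) != len(ratio):
        raise ValueError("observed and ratio must have the same length")
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df2(chi2):
    """Upper-tail p-value for 2 degrees of freedom (three classes),
    where the chi-square survival function reduces to exp(-chi2 / 2)."""
    return math.exp(-chi2 / 2.0)

# Hypothetical counts (NOT from Table 1): purple, intermediate, pink.
observed = [48, 104, 48]
chi2 = chi_square_gof(observed, (1, 2, 1))
p = p_value_df2(chi2)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

A p-value above 0.05 means the counts do not deviate significantly from the expected single-locus ratio.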
By comparing the two extreme bulks, a total of 640 757 high-quality genome-wide SNPs were called (Table S1). With a filtration criterion of allele frequency difference (AFD) > 0.5 and Fisher exact test P-value < 1e-5, a total of 1797 SNPs were putatively associated with the purple testa phenotype, and 1317 (73.29%) of them were located on chromosome A10 (Figure 2a), suggesting that the gene controlling purple testa is located on A10. SNP index analysis showed that the region on A10 from 108.0 Mb to 112.7 Mb exhibited significantly unequal contributions from the two bulks. In this 4.7 Mb region, the average AFD value was 0.782 and 214 SNPs had AFD > 0.8, the highest values in the peanut genome, suggesting that the region might contain the candidate gene controlling purple testa (Figure 2b).
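A candidate interval such as this one is typically located by averaging AFD over windows along the chromosome. The following is a minimal sketch of that idea, assuming non-overlapping 1 Mb windows; the window size, the function name and the toy values are assumptions and are not taken from this study.

```python
def window_mean_afd(snps, window=1_000_000, step=1_000_000):
    """Average AFD in fixed-size windows along one chromosome.

    snps: iterable of (position_bp, afd) pairs.
    Returns a list of (start, end, n_snps, mean_afd), skipping empty windows.
    """
    snps = sorted(snps)
    if not snps:
        return []
    results = []
    start = 0
    last_pos = snps[-1][0]
    while start <= last_pos:
        end = start + window
        values = [afd for pos, afd in snps if start <= pos < end]
        if values:
            results.append((start, end, len(values), sum(values) / len(values)))
        start += step
    return results

# Toy input (hypothetical positions and AFD values, not data from Table S1).
toy_snps = [(100, 0.2), (1_200_000, 0.8), (1_300_000, 0.9)]
windows = window_mean_afd(toy_snps)
for start, end, n, mean_afd in windows:
    print(f"{start}-{end}: n={n}, mean AFD={mean_afd:.3f}")
```

Windows with a consistently high mean AFD and many high-AFD SNPs would then be taken forward as the candidate region.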

Confirmation of the candidate region using BSR-seq in three populations
Since phenotypic variation could also be observed in the colour of the leaflets of the parental lines and of individual F2 plants, the candidate gene is likely to be an expressed gene. Therefore, RNA bulks for extreme phenotypes were constructed using the KF1 population (F2), the KF2 population (F2) and the ZH (ZH8♀ × ZH9♂) RIL population. BSR sequencing generated a total of 25.99, 25.69 and 28.68 Gb of raw data from the KF1, KF2 and ZH populations, respectively (Table 2). A total of 135, 51 and 172 SNPs putatively associated with the purple testa phenotype were identified, and 71.11%, 33.34% and 65.70% of these SNPs from the KF1, KF2 and ZH populations, respectively, were on chromosome A10 (Figure S1). The BSR-seq results also showed that the SNPs were enriched in the same candidate region (Chr. A10: 108.0-112.7 Mb), which confirmed the results of QTL-seq (Figure S1).

Development of STS markers, validation and narrowing the candidate genes
To narrow the candidate SNPs, 10 STS markers were developed in the candidate region on chromosome A10, ranging from Arahy.10:103268353 to 113621464 (Figure 3a). Seven of these markers showed good amplification and polymorphism (Table 3). Linkage of these SNPs to AhTc1 was confirmed by testing eight pink testa and eight purple testa F2 lines. The genotyping results demonstrated that the marker pTesta1089 (Arahy.10:108900285) was closely linked with AhTc1 (Figure 3c). Comparative genomic analysis showed that there are 258 genes in the 4.7 Mb region (Table S2). According to functional annotation, nine candidate genes were highlighted: three MYB transcription factor genes, a bHLH gene, two MADS-box transcription factor genes, an ABC transporter gene, an F-box transcription factor gene and a cytochrome P450 gene (Figure 3a, Table S2). The SNP marker pTesta1089 is located 3′ of a MYB transcription factor gene (gene name: J3K16L), and the distance between them is only 2724 bp (Figure 3b). Interestingly, RNA-seq results showed that J3K16L was highly expressed in the purple testa varieties (YH29 and ZH9) compared with the pink testa varieties (WH10 and ZH8) (Figure 3d). These results implied that J3K16L was the candidate gene for AhTc1.

Function validation of the candidate gene
To confirm that purple testa was caused by the expression of the candidate gene J3K16L, we cloned this gene (Table S3). The coding sequence of the gene, driven by the CaMV 35S promoter, was introduced into tobacco. All transgenic lines expressing J3K16L exhibited different degrees of purple colour in leaves and flowers, while the fruits and testa all showed a dark purple colour (Figure 4). In addition, the seed coats of the transgenic lines were more purple (Figure 4). These results confirmed that J3K16L promoted anthocyanin accumulation in transgenic tobacco plants. In peanut, the expression level of J3K16L was also positively correlated with purple testa and the accumulation of anthocyanin (Figure 1e, Figure 3d). Taken together, our work demonstrated that J3K16L is AhTc1 and that high-level expression of AhTc1 confers the purple testa phenotype in peanut.

Identification of differentially expressed genes (DEGs) between pink and purple testa cultivars
To further reveal the role of AhTc1 and the regulation of gene expression in peanut testa colour, we used RNA-seq to analyse the genome-wide gene expression profiles of the pink (WH10) and purple (YH29) testa cultivars. RNA-seq generated a total of 64.97 and 65.14 Gb of clean reads from WH10 and YH29, of which 82.43% and 82.90%, respectively, could be mapped to the genome of Tifrunner. Between WH10 and YH29, a total of 2814 DEGs were identified under the criterion of log2 ratio ≥ 1 (Figure 5a). Compared with WH10, the expression levels of 1538 genes were up-regulated and those of 1275 genes were down-regulated in YH29 (Figure 5a). To better understand the transcriptome differences between the two peanut varieties, KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis was performed. We found that many pathways were significantly enriched, including 'anthocyanin biosynthesis', an important pathway that might contribute to regulating pigment synthesis and testa colour (Figure 5b).
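The fold-change criterion can be illustrated with a short sketch. It assumes that the threshold applies to the absolute log2 ratio of YH29 to WH10 expression and adds a pseudocount to avoid division by zero; both choices, the function name and the example values are assumptions, and any statistical significance filter used in the original analysis is not modelled.

```python
import math

def classify_deg(expr_ref, expr_test, threshold=1.0, pseudocount=1.0):
    """Classify one gene by the log2 ratio of test to reference expression.

    Returns (label, log2_ratio), where label is 'up', 'down' or 'unchanged'.
    The pseudocount is an illustrative choice to handle zero expression.
    """
    log2_ratio = math.log2((expr_test + pseudocount) / (expr_ref + pseudocount))
    if log2_ratio >= threshold:
        return "up", log2_ratio
    if log2_ratio <= -threshold:
        return "down", log2_ratio
    return "unchanged", log2_ratio

# Hypothetical expression values (reference = WH10, test = YH29).
print(classify_deg(1.0, 7.0))   # higher in the test sample
print(classify_deg(7.0, 1.0))   # lower in the test sample
print(classify_deg(3.0, 3.0))   # no change
```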

Discussion
AhTc1, encoding a MYB transcription factor, regulated purple testa in peanut
In this study, evidence from both forward and reverse genetics supported AhTc1 as the key gene controlling purple testa in peanut. Functional annotation and sequence alignment showed that AhTc1 encodes a member of the R2R3-MYB transcription factor family. R2R3-MYB represents one of the largest transcription factor gene families in plants. R2R3-MYB transcription factors are involved in plant development, metabolism and responses to biotic and abiotic stresses (Carre and Kim, 2002; Chen et al., 2017; Dubos et al., 2010). Previous studies showed that a distinct clade of R2R3-MYB transcription factors plays key regulatory roles in anthocyanin biosynthesis (Chen et al., 2017; Lin-Wang et al., 2010; Liu et al., 2015; Nguyen and Lee, 2016). Anthocyanins, which belong to the flavonoid family, are among the most important pigments and confer red, blue and purple colours on a range of flowers, fruits, foliage, seeds and roots (Lin-Wang et al., 2010; Tanaka et al., 2008). In plants, anthocyanin biosynthesis is regulated by a complex of MYB, basic helix-loop-helix (bHLH) and WD-repeat proteins (MYB-bHLH-WD40, MBW) (Baudry et al., 2004). Within this complex, the activity of the R2R3-MYB component might determine the patterning and spatial localization of anthocyanins, with different R2R3-MYB family members contributing to anthocyanin accumulation in tissue-specific or development-specific patterns. For instance, in apple, MdMYB1, MdMYBA and MdMYB3 regulate the synthesis of red-pigmented anthocyanins in the peel, MdMYB10 regulates the synthesis of anthocyanins in the peel, flesh and foliage, and MdMYB110a controls the accumulation of red pigment in the fruit cortex during the later phase of fruit maturity (Chagne et al., 2013; Espley et al., 2007; Liu et al., 2015; Takos et al., 2006; Vimolmangkang et al., 2013).
Although the important roles of R2R3-MYB and the MBW complex in regulating anthocyanin synthesis have been well elucidated in many other plants, functional information on R2R3-MYB in peanut is scarce, especially with regard to the regulation of testa colour.

Different KEGG pathways involved in testa colour in peanut
In this study, we identified a total of 2814 DEGs between the pink and purple testa peanut varieties (Figure 5a). Through KEGG analysis, we found that many pathways related to the synthesis of other metabolites were also enriched, such as 'vitamin B6 metabolism', 'sesquiterpenoid and triterpenoid biosynthesis', 'phenylpropanoid biosynthesis', 'flavonoid biosynthesis', 'flavone and flavonol biosynthesis' and 'isoflavonoid biosynthesis'. These results imply that there might be differences in flavour and other nutritional components between purple and pink peanut varieties. Moreover, 'plant-pathogen interaction' was also identified as enriched between the two peanut varieties (Figure 5b). Past studies showed that the accumulation and localization of anthocyanins and flavonoids are involved in stress resistance (Chalker-Scott, 1999; Treutter, 2005). For example, in barley, the content of proanthocyanidins in the testa layer is involved in defence against Fusarium species (Skadhauge et al., 1997). However, there is currently no evidence of a difference in resistance between purple and pink testa peanut varieties. In addition, GO (Gene Ontology) analysis showed that many stress-related GO terms were enriched, such as 'response to wounding', 'regulation of jasmonic acid mediated signalling pathway' and 'regulation of defence response', which supported the results of the KEGG analysis (Table S4).

BSA, BSR and QTL-seq technologies for QTLs/genes identification
BSA was first reported in the mapping of disease resistance genes in lettuce (Michelmore et al., 1991). During the past decades, BSA has been used to map quantitative trait loci (QTLs) controlled by a single gene or a major gene. BSA is suitable for traits that are contrasting or significantly different between the parents, provided that the segregating population contains enough individuals with extreme phenotypes. BSA can quickly identify molecular markers closely linked to the target gene (trait) without a genetic linkage map (Poulsen et al., 1995; Salunkhe et al., 2011; Yuan et al., 2013).
Recently, a series of NGS-based BSA approaches were developed, including MutMap, MutMap+, BSR-seq and QTL-seq (Abe et al., 2012; Fekih et al., 2013; Steuernagel et al., 2016; Takagi et al., 2013). QTL-seq is a whole-genome resequencing (WGRS)-based approach and has strong advantages in species with a reference genome (Zhang et al., 2018). In this study, AhTc1 was mapped using QTL-seq in peanut, and BSR was used to verify the QTL-seq result. QTL-seq and BSR gave the same result, suggesting that BSR is also an effective method for primary mapping of QTLs. In comparison with BSR, QTL-seq can generate more polymorphic SNPs that are randomly distributed in the genome, which is important for narrowing the candidate regions. With BSR, more SNPs are generated from highly expressed genes, so many SNPs in intergenic sequences and in the noncoding regions of genes cannot be indexed. However, the cost of BSR is lower than that of QTL-seq. Both QTL-seq and BSR will be very useful in further peanut QTL mapping studies.

Applications of AhTc1 gene in peanut breeding
Testa colour is an important trait in peanut breeding programmes. In addition to their appearance quality, purple testa peanuts contain higher levels of anthocyanin and many other microelements than pink testa peanuts (Attree et al., 2015; Kuang et al., 2017). The unknown genetic and molecular mechanisms regulating testa colour have limited the molecular breeding of purple testa peanut. Several attempts have been made to analyse the genetics of purple testa formation and to identify potential markers for purple testa (Branch, 1985, 2001). However, most of these studies failed to provide tightly linked markers or useful information on the candidate genes controlling purple testa. Because testa colour is determined by the maternal genotype, segregation only becomes visible in the following generation, which makes selection for purple testa more difficult in traditional breeding. In this study, the gene controlling purple testa, AhTc1, was mapped for the first time to a 4.7 Mb region of chromosome A10, and a series of closely linked markers were developed (Table 3). The use of these markers will accelerate the breeding of purple testa peanut. In addition, we identified the candidate gene, AhTc1, which encodes an R2R3-MYB transcription factor. The expression level of this gene was related to the colour of the plants, providing an important target gene for testa colour modification using transgenic or gene editing methods.

Conclusion
The major gene regulating purple testa in peanut, AhTc1, was mapped to a 4.7 Mb region on chromosome A10 using the QTL-seq approach. A series of SNP markers were developed and genotyped to narrow the candidate region, and an R2R3-MYB transcription factor gene was identified. Evidence from both forward and reverse genetics supported that this R2R3-MYB transcription factor gene is AhTc1, which confers the purple testa phenotype. This work lays the foundation for further understanding of the mechanisms regulating purple testa formation in peanut and for molecular breeding of new varieties with purple testa.

Plant materials and anthocyanin content determination
Weihua 10 (WH10), Zhonghua 8 (ZH8) and Yueyou 20 (GT-C20) are peanut varieties with pink testa. YH29 and Zhonghua 9 (ZH9) are peanut varieties with purple testa. WH10 and GT-C20 were each crossed with the purple testa variety YH29. In addition, ZH8 was crossed with the purple testa variety ZH9 to produce F7 recombinant inbred lines (RILs). All F1 hybrids were verified using SSR and MITE transposon markers (Qiu et al., 2018). The anthocyanin content of peanut testa was determined using the method reported in previous studies (Mancinelli et al., 1991; Serrano et al., 2012).

QTL-seq, BSR-seq and bioinformatic analysis
To obtain clean reads, low-quality reads (Q ≤ 20), adapter sequences, reads with more than 10% N bases and reads that were too short (<20 bp after adapter trimming) were removed using Trimmomatic. Then, all clean reads were mapped to the genome sequence of Arachis hypogaea (PeanutBase version 1.0, https://peanutbase.org/) using the mem module of BWA with a stricter filter requiring a minimum of 40 identical bases (Li and Durbin, 2009) for QTL-seq data, or using the STAR mapper for BSR-seq data. The alignments were further filtered with the following criteria: a minimum of 50 bp identical to the reference sequence, a maximum mismatch rate of 5%, a maximum of 5% clipped ends, a maximum gap of 50 bp, a maximum of 2 alleles and a minimum mapping score of 50. Germline SNPs were called using the HaplotypeCaller module of GATK following the best practice protocol. The raw SNPs/Indels were further filtered by requiring that only two genotypes exist, that the total sequencing depth be >20 and <10 000, that the mutant allele sequencing depth be >5 and that the proportion of either allele be >5%. The allele frequency was calculated from the filtered read counts (after discarding low-quality genotypes) of both alleles at each SNP/Indel site. AFD (allele frequency difference) was then calculated as the absolute value of the difference between the allele frequency of the pink pool and that of the purple pool. A Fisher exact test on the read counts was also applied to evaluate the allele frequency difference. If P-value < 1e-5 and AFD > 0.5, the SNP was regarded as a candidate linked to the trait.
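The final filtering and scoring steps can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: it assumes that the depth filters apply to read counts summed over both pools, it uses a pure-Python two-sided Fisher exact test in place of whatever statistical software was used, and the function names and example counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)

def passes_depth_filter(ref, alt, min_depth=20, max_depth=10_000,
                        min_alt=5, min_prop=0.05):
    """Depth and allele-proportion filters, applied to pooled counts (assumed)."""
    total = ref + alt
    if not (min_depth < total < max_depth):
        return False
    if alt <= min_alt:
        return False
    return min(ref, alt) / total > min_prop

def evaluate_snp(pink, purple, afd_cutoff=0.5, p_cutoff=1e-5):
    """pink, purple: (ref_reads, alt_reads) for each pool.

    Returns (afd, p_value, is_candidate), or None if the site is filtered out.
    """
    if sum(pink) == 0 or sum(purple) == 0:
        return None
    if not passes_depth_filter(pink[0] + purple[0], pink[1] + purple[1]):
        return None
    afd = abs(pink[1] / sum(pink) - purple[1] / sum(purple))
    p_value = fisher_exact_two_sided(pink[0], pink[1], purple[0], purple[1])
    return afd, p_value, (afd > afd_cutoff and p_value < p_cutoff)

# Hypothetical read counts for one SNP: (reference reads, alternate reads).
result = evaluate_snp(pink=(40, 2), purple=(3, 37))
print(result)
```

Sites returning `None` are dropped before scoring, so only SNPs that pass the depth filters contribute to the candidate list.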

STS markers development and candidate SNP validation
To develop markers for validating the QTL-seq results and narrowing the candidate region, sequences comprising 1500 bp upstream and 1500 bp downstream of each SNP were downloaded from PeanutBase (https://peanutbase.org). Owing to the high similarity of the A and B subgenomes, it is difficult to design primers that amplify only the A or the B allele. Thus, the 3001 bp sequences were aligned by BLAST against the genome sequences of the cultivated species Tifrunner and the donor ancestor species A. duranensis (A genome) and A. ipaensis (B genome). According to the alignment results, we designed primers specific to the A subgenome using Primer Premier 5.0 software. Direct sequencing of the PCR products was used for genotyping these STS markers.
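The 3001 bp length follows from taking 1500 bp on each side of the SNP plus the SNP base itself. A small sketch of the coordinate arithmetic, using 1-based inclusive coordinates and the pTesta1089 position reported in this study (the function names and the toy chromosome string are ours):

```python
def flank_window(snp_pos, flank=1500):
    """Return (start, end, length) of the window around a SNP,
    using 1-based inclusive coordinates."""
    start = max(1, snp_pos - flank)
    end = snp_pos + flank
    return start, end, end - start + 1

def extract(sequence, start, end):
    """Slice a 1-based inclusive interval out of a Python string."""
    return sequence[start - 1:end]

# Marker pTesta1089 at Arahy.10:108900285.
start, end, length = flank_window(108_900_285)
print(start, end, length)

# Toy sequence to show the slicing convention (not peanut sequence).
toy = "ACGTACGTAC"
print(extract(toy, *flank_window(5, flank=2)[:2]))
```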

Gene cloning, vector construction and gene transformation
The ORF of AhTc1 was amplified from ZH9 using EX Taq HS (Takara, Dalian, China) and cloned into the pMD™19-T vector (Takara) according to the manufacturer's instructions. The following primers were used: TestaORF-F (5′-TGCTCTAGAATGGAGGGATCCATAGGCCT-3′) and TestaORF-R (5′-AACTGCAGTTATTGTGGATCCCACAAAT-3′), in which unique Xba I and Pst I sites were introduced at the 5′ and 3′ ends of the AhTc1 ORF, respectively. The plant expression vector containing 35S:AhTc1 was constructed by excising the ORF from the T vector and inserting it into the pCAMBIA2300-35S-OCS vector, so that the CaMV 35S promoter directs the expression of AhTc1. The insertion of the AhTc1 construct was confirmed by PCR as well as enzymatic activity assays. The recombinant vector was transferred into Agrobacterium tumefaciens LBA4404 and then used to transform Nicotiana tabacum cv Nc89 by the leaf disc method (Horsch et al., 1986). Transformed seeds were selected for kanamycin resistance, and the transgenic lines were confirmed by PCR (Figure S2).
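The placement of the restriction sites in the two primers can be checked programmatically. The sketch below assumes the primer sequences as printed above with line-break hyphens removed, and uses the standard recognition sequences of Xba I (TCTAGA) and Pst I (CTGCAG); the helper name is ours.

```python
XBA_I = "TCTAGA"   # Xba I recognition sequence
PST_I = "CTGCAG"   # Pst I recognition sequence

# Primer sequences from the text, with line-break hyphens removed (assumed).
testa_orf_f = "TGCTCTAGAATGGAGGGATCCATAGGCCT"
testa_orf_r = "AACTGCAGTTATTGTGGATCCCACAAAT"

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))

# Locate each restriction site within its primer.
xba_index = testa_orf_f.find(XBA_I)
pst_index = testa_orf_r.find(PST_I)

# Bases immediately after the Xba I site in the forward primer.
after_xba = testa_orf_f[xba_index + len(XBA_I):xba_index + len(XBA_I) + 3]
# Bases immediately after the Pst I site in the reverse primer, read on the
# coding strand.
after_pst = reverse_complement(
    testa_orf_r[pst_index + len(PST_I):pst_index + len(PST_I) + 3])

print("Xba I site at index", xba_index, "followed by", after_xba)
print("Pst I site at index", pst_index, "coding-strand codon", after_pst)
```

With these sequences, the Xba I site is followed directly by an ATG start codon and the Pst I site lies directly downstream of a TAA stop codon on the coding strand, consistent with sites added at the 5′ and 3′ ends of the ORF.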

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Figure S1 Confirmation of the QTL-seq results using BSR in three populations.
Figure S2 Schematic map of the transgene construction. A: schematic diagram for construction of pCAMBIA2300-35S-OCS-AhTc1. B: Transformation, regeneration and transplant. C: PCR detection of transgenic lines.
Table S1 Detailed information on SNPs identified using whole-genome resequencing.