Deletion of beta-fructofuranosidase (invertase) genes is associated with sucrose content in date palm fruit

Abstract
The fruit of the date palm is an important part of the diet for a large portion of the Middle East and North Africa. Dates are consumed both fresh and dried and can be stored dry for extended periods. Date fruit vary significantly among the hundreds of cultivars identified in the main regions of cultivation. Most dried dates are low in sucrose but high in glucose and fructose. However, high sucrose content is a distinctive feature of some dates and affects flavor as well as texture and water retention. To identify the genes controlling high sucrose content, we tested date fruit metabolomics data for association with genotype data from 120 date fruits. We found a significant association between dried date sucrose content and a genomic region that contains three tandem copies of the beta-fructofuranosidase (invertase) gene in the reference genome of Khalas, a low-sucrose cultivar. High-sucrose cultivars, including the popular Deglet Noor, carried a homozygous deletion of two of the three invertase copies. We show that the deletion allele is derived: three other Phoenix species carry the ancestral allele, which retains all copies of the gene. The association of two of the three tandem invertase copies with dry fruit sucrose content will help clarify the distinct roles of the multiple date palm invertases in plant physiology. Identification of the recessive allele associated with end-point sucrose content in date fruit may be used in selective breeding in the future.

by texture into "soft," "semi-dry," and "dry" cultivars depending on water retention in dry fruit. Others have shown that these texture categories may also relate to the balance of sugars in the dates, with higher sucrose associated with the low water retention found in the dry and semi-dry cultivars (Diboun et al., 2015; Mustafa, Harper, & Johnston, 1986; Yahia & Kader, 2011). High sucrose content in dates has been associated with low levels of invertase, the enzyme that converts sucrose to the reducing sugars glucose and fructose (Hasegawa & Smolensky, 1970). The date palm genome contains multiple invertase genes (Al-Dous et al., 2011; Al-Mssallem et al., 2013) that hydrolyze sucrose in various cellular and physiological pathways; however, which of these invertases determine dry fruit sucrose content has not been identified. Date palm genome sequence diversity is reduced specifically in regions containing sugar metabolism genes (Al-Mssallem et al., 2013; Hazzouri et al., 2015), possibly indicating selection during cultivation. Identifying the genetic control of sucrose content in dried dates could improve breeding programs and post-harvest date fruit quality.
We previously reported the metabolomics analysis of 123 date fruits (Stephan et al., 2018) and their genotyping (Thareja et al., 2018). Here, we combine these data sets for a total of 120 dates and, to investigate the possible genetic basis of sucrose content in date fruit, test the genome-wide genotypes of these 120 cultivars for association with sucrose content at the dried fruit stage.

| Sample collection and sequencing
Date fruit and leaf samples from Phoenix species (Table S1) were collected from across the date palm-growing world, and DNA was extracted and sequenced as described (Mathew et al., 2014). Briefly, DNA was extracted from fruit samples with the Qiagen Plant DNeasy DNA extraction kit, and libraries were prepared and sequenced on either the Illumina HiSeq 2500 or HiSeq 4000 system according to the manufacturer's recommended protocols.

| Sequence analysis
The genomes of the date cultivars were sequenced and variant sites genotyped as previously described (Thareja et al., 2018).

| Genotype association with sucrose
Metabolite information from the date fruit samples was collected by Metabolon as previously described (Stephan et al., 2018). The genotype data were tested for association with date fruit sucrose content (semi-quantitative, non-targeted metabolomics data for metabolite M1519, run-day normalized and normalized to Bradford protein content) using linear regression as implemented in PLINK (version 1.9). We found associations with sucrose (p < 10⁻¹⁰) on 34 contigs, with the strongest association at p = 5 × 10⁻¹⁶ and a genomic inflation factor of λ = 1.28. These contigs were matched to the oil palm reference using MUMmer (Marçais et al., 2018). The most strongly associated SNPs were located in the 60-kb contig PDK30s742521 (http://qatar-weill.cornell.edu/research/research-highlights/date-palm-research-program/date-palm-draft-sequence), which matched longer contigs from single-molecule-based date palm reference sequences of both the Khalas (GenBank Accession MT009344) and Deglet Nour (GenBank Accession MT009343) genomes (Torres et al., 2018). Because the Khalas reference sequence includes sequence missing from the Deglet Nour reference, all genomes were aligned to the Khalas contig, and coverage was normalized so that diploid coverage equals 1 for analysis of copy number changes in the region. Sample relatedness was calculated in VCFtools (Danecek et al., 2011) using its built-in kinship algorithm.
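To make the association statistics above concrete, the following is a minimal sketch of a per-SNP additive linear regression and the genomic inflation factor λ. It is not PLINK's implementation: it assumes genotypes coded as allele dosages (0, 1, 2), no covariates, and a Wald chi-square statistic (β/SE)² per SNP. λ is taken as the median observed chi-square divided by the median of a chi-square distribution with 1 degree of freedom (about 0.4549), so values above 1 indicate inflation relative to the null. The function names are hypothetical.

```python
import statistics
from math import sqrt

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = 0.4549364231195724


def snp_wald_chi2(dosages, phenotype):
    """Wald chi-square for an additive single-SNP linear regression (hypothetical helper).

    dosages: allele counts (0, 1, 2) per sample; phenotype: e.g. normalized sucrose.
    Assumes no covariates, at least 3 samples, non-constant genotypes and a nonzero residual.
    """
    n = len(dosages)
    mx = statistics.fmean(dosages)
    my = statistics.fmean(phenotype)
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx  # effect per additional allele copy
    intercept = my - beta * mx
    sse = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, phenotype))
    se = sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return (beta / se) ** 2


def genomic_inflation(chi2_stats):
    """Lambda: median observed chi-square divided by its expected median under the null."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
```

In practice the chi-square values from every tested SNP would be passed to `genomic_inflation`; a value such as the λ = 1.28 reported here suggests some residual inflation, which population structure among cultivars could contribute to.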

| RESULTS
Association of the date fruit genotype data with sucrose for the 120 date fruit samples (Table 1, Tables S1 and S2) revealed a region of date palm contigs with similarity to one main region of approximately 1 Mb in the more contiguous oil palm genome reference (Figure 1a, and Naboot Ali were genetically distinct (Table S2), with a kinship score of less than 0.25 among these cultivars (Table S4). The high-sucrose phenotype appears to be recessive, as homozygotes for the deletion were the only group with very high sucrose. Heterozygotes appear to retain enough invertase activity to convert sucrose to glucose and fructose (Figure 2).
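Because coverage in the region was normalized so that diploid coverage equals 1, a sample's deletion genotype can in principle be read from its normalized depth (about 1 for no deletion, about 0.5 for one deleted copy, near 0 for a homozygous deletion), and the recessive pattern described above corresponds to coding only homozygous deletions as 1. The sketch below illustrates this under stated assumptions: the cut-offs `HOM_DEL_MAX` and `HET_DEL_MAX`, the use of a sample-wide median depth as the diploid baseline, and all function names are hypothetical and are not the thresholds or code used in this study.

```python
import statistics

# Hypothetical cut-offs on diploid-normalized depth (1.0 = two copies present).
HOM_DEL_MAX = 0.25
HET_DEL_MAX = 0.75


def normalized_depth(region_depths, genome_median_depth):
    """Mean depth over the deleted interval divided by the sample's diploid baseline."""
    return statistics.fmean(region_depths) / genome_median_depth


def call_deletion(norm_depth):
    """Classify the deletion genotype from normalized depth (hypothetical thresholds)."""
    if norm_depth < HOM_DEL_MAX:
        return "del/del"
    if norm_depth < HET_DEL_MAX:
        return "del/+"
    return "+/+"


def recessive_indicator(genotype):
    """Recessive coding: only homozygous deletions are expected to show very high sucrose."""
    return 1 if genotype == "del/del" else 0
```

For example, a sample with mean depth 12 over the interval and a baseline of 24 has normalized depth 0.5 and would be called a heterozygous carrier, coded 0 under the recessive model.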
We searched the rest of the sequenced date palm population for the deletion allele and found that heterozygous carriers of the deletion are common. The allele is more common in North African cultivars, with 18 of 67 North African cultivars carrying it. In contrast, only 4 of 54 Eastern cultivars carry the allele (Table 2).
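The regional difference can be expressed as carrier frequencies computed directly from the counts above. This small sketch only performs that arithmetic; the dictionary and function names are illustrative.

```python
# Carrier counts (carriers, cultivars sampled) as reported above (Table 2).
CARRIERS = {"North African": (18, 67), "Eastern": (4, 54)}


def carrier_frequency(n_carriers, n_total):
    """Fraction of sampled cultivars carrying at least one copy of the deletion allele."""
    return n_carriers / n_total


for region, (k, n) in CARRIERS.items():
    print(f"{region}: {k}/{n} = {carrier_frequency(k, n):.1%}")
```

This prints a carrier frequency of 26.9% for North African cultivars and 7.4% for Eastern cultivars, roughly a 3.6-fold difference.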

| DISCUSSION
Sucrose and its hydrolyzing enzyme invertase play critical roles in multiple plant functions, including response to the environment and regulation of the life cycle (Roitsch & González, 2004; Sturm, 1999).
Among these functions, vacuolar invertases help determine the balance of sugars in fruit (Roitsch & González, 2004). In tomato, sucrose accumulation in the fruit was linked to a recessive invertase mutation (Klann, Chetelat, & Bennett, 1993).
The texture of dried date fruit and their classification as soft, semi-dry, and dry has been related to sucrose content, with low-sucrose dates retaining more water and having a softer texture (Cook & Furr, 1952; Kanner et al., 1978; Samarawira, 1983; Mustafa et al., 1986).
Previous research identified invertase activity as associated with dried date fruit sucrose content (Kanner et al., 1978), though the genetic control of date palm invertase levels specifically in fruit has been elusive. (Rygg, 1946; Samarawira, 1983). Date fruit with high invertase activity retain more water through the ripening process, even in dry climates, and this is expected to affect other enzymatic activities that shape fruit characteristics. Our results on sucrose content in dates agree well with those of Zhang and colleagues, who found high sucrose in three of the same cultivars: Deglet Noor, Sukkary, and Nabtat (Zhang, Aldosari, Vidyasagar, Shukla, & Nair, 2015).
Here, we show that SNPs associated with very high sucrose content in dried dates are located within a genomic region containing a deletion of two invertase genes. Although other genes in this region could also influence sucrose content, the deletion of two of the invertase genes points to the loss of these invertases as the central functional cause of such a strong phenotype. In support of this argument, previous studies on Deglet Noor showed that topical application of invertase improved fruit texture by removing the crystalline sugar ("sugar wall" dates) associated with low invertase levels in the fruit (Smolensky, Raymond, Hasegawa, & Maier, 1975).
This invertase deletion allele appears to be recessive, as all high-sucrose dates carried a homozygous deletion of the invertase genes. This is similar to findings in tomato, where a recessive invertase mutation in wild tomato results in high accumulation of sucrose in the fruit (Klann et al., 1993). Interestingly, heterozygotes for the deletion did not appear to have altered sucrose levels, although further research is needed to determine whether this simply reflects the end-point measurement of sucrose content and whether the heterozygous deletion affects sucrose at earlier ripening stages.
The extremely high accumulation of sucrose in just a few cultivars does not fully explain the link between texture and sucrose content; not all date fruit classified as "dry" or "semi-dry" contain high levels of sucrose at the final ripening stage. Sucrose accumulation throughout ripening, and not just at the end, may contribute to the final texture (Mustafa et al., 1986). Different haplotypes of the three invertase genes and of other genes in the region may be active at different times, which could explain a more complex phenotype. Future association studies of sucrose levels throughout fruit development will be important for identifying other genes potentially involved in date texture.
In light of recent findings of a strong population division between eastern and western date palm cultivars (Chaluvadi, Khanam, Aly, & Bennetzen, 2014; Flowers et al., 2019; Mathew et al., 2015; Zehdi-Azouzi, Cherif, & Moussouni, 2015), it is important to consider in which geographical regions this allele is found. All dates with high sucrose shared the same allele that contained the same structure of While it is well known that invertase genes play a role in fruit sucrose levels, our findings help identify the specific genetic basis of the extreme sucrose content of some dry dates. That high-sucrose dates retain a single invertase copy at this locus suggests that the retained gene may be important for other functions, and points to the two deleted genes as the most likely mediators of the final inversion of sucrose in the fruit. Variants in alleles that retain all three invertases, as well as other invertases not identified here, may influence the ultimate texture and water retention of dry dates beyond the extreme phenotype associated with homozygous deletion of the genes. This is suggested by the presence of some dates with intermediate sucrose content that are not fully explained by heterozygous deletion of the invertases.
Our findings could assist future date palm breeding programs and provide the basis for future studies of the roles of specific invertases in fruit development.
Note added in proof: During the review of this paper, Hazzouri and colleagues reported a similar association of this region of the date palm genome with fruit sucrose content and further showed that the inv-b gene is a pseudogene in some cultivars.
Gene expression analysis of these invertases (including a truncated form of inv-a) and a distal one not detected in this study showed high expression during fruit development. Here, with the added information of the fully sequenced allele from Deglet Noor, we are able to localize the exact location and structure of the deletion.
FIGURE 1 Identification of an allele associated with sucrose content in date fruit. (a) Date palm sequence contigs containing SNPs associated with fruit sucrose content were aligned to the oil palm reference sequence and found to derive mainly from a 1-Mb region of contig NC_026001.1 of Elaeis guineensis chromosome 9. PDK30 date palm contigs are represented as blue rectangles, and PDK30s742521_60107, which contains the invertase, is outlined in red. A red line is drawn at the multiple-testing-corrected genome-wide significance threshold.

ACKNOWLEDGMENTS
This study was supported by grant NPRP-EP X-014-4-001 from the Qatar National Research Fund (a member of Qatar Foundation). We thank Robert Krueger of the USDA, Sean Lahmeyer of Huntington Gardens, and Diego Rivera of the University of Murcia for contributing date palm, other Phoenix, and other palm species samples.

CONFLICT OF INTEREST
On behalf of all authors, the corresponding author states that there is no conflict of interest.

AUTHOR CONTRIBUTIONS
JAM designed the study, analyzed data, and wrote the manuscript.
SM collected samples and analyzed data. LM processed samples and conducted sequencing and analysis. SY conducted DNA sequencing analysis. YAM conducted sequencing and analysis. KS designed the study, conducted metabolomics analysis, and wrote the manuscript.

ETHICAL APPROVAL
Human and animal subjects were not included in this study. The authors declare no competing interests.