A Citrullus genus super‐pangenome reveals extensive variations in wild and cultivated watermelons and sheds light on watermelon evolution and domestication

Sweet watermelon (Citrullus lanatus subsp. vulgaris) is among the most important vegetable crops in the world. Wild relatives are important resources for watermelon breeding. Here we report high-quality reference genomes of three wild watermelons, C. mucosospermus, C. amarus and C. colocynthis, and the divergence and genome evolution of different Citrullus species. Using genomic data from 547 watermelon accessions spanning four Citrullus species, we construct a super-pangenome to represent the Citrullus gene repertoire and provide a catalogue of orthologous relationships among species. Gene presence/absence variation analysis uncovers many disease resistance genes that are missing in cultivated watermelons, as well as genes with significantly different occurrence frequencies between populations that might underlie watermelon evolution and domestication. We revisit watermelon domestication using the recently identified wild progenitor, Kordofan melon, which provides insights into the domestication of fruit bitterness, sweetness and flesh coloration. The Citrullus super-pangenome provides a valuable resource for breeding and biological discovery, and our comparative genomic analyses shed additional light on watermelon evolution and domestication.

Cultivated watermelon (Citrullus lanatus subsp. vulgaris) has a narrow genetic base because domestication and breeding have focused primarily on fruit quality traits. Bitter or bland-tasting wild watermelons, such as C. mucosospermus, C. amarus and C. colocynthis, have been used in watermelon breeding to introduce disease resistance into modern cultivars (Levi et al., 2017). These wild relatives are valuable sources for broadening the improvement potential of cultivated watermelon, providing functionally important genes and alleles that are absent from cultivated watermelon. However, the lack of genome sequences for these wild species has limited their use in watermelon breeding.
In this study, we assembled high-quality reference genomes for three wild watermelons. Three wild accessions were selected for reference genome sequencing: C. mucosospermus USVL531-MDR, C. amarus USVL246-FR2 and C. colocynthis PI 537277. PacBio CLR sequences for USVL531-MDR and Illumina sequences for USVL246-FR2 and PI 537277 were generated (Table S1; Figure S1) and assembled de novo. Each of the resulting assemblies (Table S2) was anchored to 11 chromosomes (Table S3; Figures S2 and S3; Appendix S1), and quality assessments confirmed the high quality of these assemblies (Appendix S1). The repeat-masked assemblies (Table S4) were annotated for protein-coding genes. Gene predictions in six watermelon reference genomes, the three developed here and three published ones (Guo et al., 2019; Renner et al., 2021; Wu et al., 2019), were improved by mapping genes between assemblies with Liftoff (Shumate and Salzberg, 2021). A total of 21 676 to 22 764 protein-coding genes were predicted in these six watermelon genomes (Table S5). Comparative analysis revealed a large inter-chromosomal rearrangement involving chromosomes 1 and 4 between C. colocynthis and the other three Citrullus species, C. lanatus, C. amarus and C. mucosospermus (Figure 1a; Figure S4). The entire chromosome 4 of C. colocynthis showed good collinearity with melon chromosome 8 (Figure 1a), suggesting that C. colocynthis likely carries the ancestral karyotype and that the inferred chromosome fission and fusion events occurred approximately 4.54–2.41 Mya, after the divergence of C. colocynthis from the other watermelons and before the separation of C. amarus from C. mucosospermus and C. lanatus (Figure S5).
Resequencing reads of each watermelon accession were aligned to the pangenome of its own species to detect gene presence/absence variations (PAVs). Accessions with insufficient read coverage were excluded to limit false PAV calls. In the Citrullus super-pangenome, core genes (present in all accessions) accounted for 63.7% (24 235 genes), much lower than in the species-level pangenomes (85.6%, 97.2%, 90.0% and 88.7% for C. lanatus, C. mucosospermus, C. amarus and C. colocynthis, respectively) (Figures S8–S10), indicating diverse genetic makeups among the four species. Genes with significantly different occurrence frequencies between watermelon species or groups were identified (Tables S11–S14); these included genes that were under selection during watermelon domestication and improvement (Table S15). We also identified 17 disease resistance-related genes that were absent or present at low frequencies in the C. lanatus gene pool but present at high frequencies in the gene pool of at least one wild species (Table S16).
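The PAV steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: it assumes that per-gene read-coverage fractions and per-accession mean depths have already been computed from the alignments, and the thresholds `MIN_GENE_COVERAGE` and `MIN_ACCESSION_DEPTH`, the function names and the toy data are all hypothetical (the criteria actually used are not given in this section). It shows how low-coverage accessions are dropped before calling, how the core-gene proportion is obtained, and how a gene's occurrence frequency in a population is computed.

```python
from typing import Dict, Iterable

# Hypothetical thresholds for illustration only; not the values used in the study.
MIN_GENE_COVERAGE = 0.95    # fraction of a gene's bases covered by reads
MIN_ACCESSION_DEPTH = 5.0   # mean read depth required to keep an accession

PAV = Dict[str, Dict[str, bool]]  # accession -> gene -> present?


def call_pav(gene_cov: Dict[str, Dict[str, float]],
             depth: Dict[str, float]) -> PAV:
    """Call gene presence/absence for accessions with sufficient read depth."""
    pav: PAV = {}
    for acc, covs in gene_cov.items():
        # Low-depth accessions would produce spurious "absent" calls, so skip them.
        if depth.get(acc, 0.0) < MIN_ACCESSION_DEPTH:
            continue
        pav[acc] = {gene: cov >= MIN_GENE_COVERAGE for gene, cov in covs.items()}
    return pav


def core_fraction(pav: PAV) -> float:
    """Proportion of all genes that are present in every retained accession."""
    genes = set().union(*(calls.keys() for calls in pav.values()))
    core = [g for g in genes if all(calls.get(g, False) for calls in pav.values())]
    return len(core) / len(genes)


def occurrence_frequency(pav: PAV, gene: str, population: Iterable[str]) -> float:
    """Fraction of retained accessions in a population that carry the gene."""
    kept = [acc for acc in population if acc in pav]
    return sum(pav[acc].get(gene, False) for acc in kept) / len(kept)


# Toy example (hypothetical data): acc4 has low depth and is excluded.
gene_cov = {
    "acc1": {"geneA": 1.00, "geneB": 0.98, "geneC": 0.10},
    "acc2": {"geneA": 0.99, "geneB": 0.20, "geneC": 0.97},
    "acc3": {"geneA": 0.97, "geneB": 0.99, "geneC": 0.96},
    "acc4": {"geneA": 0.50, "geneB": 0.10, "geneC": 0.05},
}
depth = {"acc1": 20.0, "acc2": 18.0, "acc3": 25.0, "acc4": 2.0}

pav = call_pav(gene_cov, depth)
print(sorted(pav))                                             # ['acc1', 'acc2', 'acc3']
print(round(core_fraction(pav), 3))                            # 0.333 (only geneA is core)
print(round(occurrence_frequency(pav, "geneB", gene_cov), 3))  # 0.667
```

Identifying genes whose frequencies differ significantly between populations would additionally require a statistical test on these per-population frequencies, which this sketch leaves out.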
We compared C. lanatus landraces with Kordofan melon (Figures S11 and S12), recently identified as the possible direct progenitor of cultivated watermelon (Renner et al., 2021), and identified 123 domestication sweeps with a cumulative length of 17.62 Mb (Table S17). These sweeps harboured 399 annotated genes, 107 of which were located in fruit quality QTLs (Figures 1c, S13 and S14; Table S18). Kordofan melons were not sweet, with flesh soluble solids content (SSC) ranging from 0.2 to 3.2 °Brix (Table S19). The sugar transporter ClTST2 regulates sugar accumulation in watermelon flesh (Ren et al., 2018). The Kordofan melon line with the highest SSC carried the ClTST2 tandem duplication (Table S19; Figure S15; Appendix S1). Genetic diversity in the ClTST2 genomic region was reduced in landraces compared with Kordofan melon (Figure 1d). The ClTST2 tandem duplication became the predominant allele in landraces (70 of 86 accessions; 81.4%) and was almost fixed in cultivars (238 of 245 accessions; 97.1%) (Table S20). Fruit flesh SSC was significantly higher in accessions carrying the ClTST2 tandem duplication than in those with a single copy (Figure 1e). Together, these results suggest that the ClTST2 tandem duplication was present in wild watermelon populations and was selected during domestication, likely because of its important role in promoting sugar accumulation in fruits.
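Domestication sweeps such as the one around ClTST2 appear as genomic windows where nucleotide diversity (π) in landraces falls well below that in the wild progenitor. The following minimal Python sketch shows one common way to scan for them, the reduction of diversity ROD = 1 - π_landrace / π_wild computed in non-overlapping windows. The statistic, window size and cutoff used in the study are not stated in this section, so `WINDOW`, `ROD_CUTOFF`, the function names and the toy site data are hypothetical. Per-site π uses the standard unbiased estimator 2k(n - k) / (n(n - 1)) for k alternative alleles among n sampled chromosomes; site values are summed per window and divided by window length so that invariant sites count as zero diversity.

```python
from typing import List, Optional, Tuple

WINDOW = 10        # hypothetical window size (bp); real scans use kb-scale windows
ROD_CUTOFF = 0.9   # hypothetical ROD threshold for calling a sweep window

Site = Tuple[int, int, int]  # (1-based position, alt allele count k, chromosomes sampled n)


def site_pi(k: int, n: int) -> float:
    """Unbiased per-site diversity: chance that two sampled chromosomes differ."""
    return 2.0 * k * (n - k) / (n * (n - 1)) if n > 1 else 0.0


def windowed_pi(sites: List[Site], chrom_len: int, win: int) -> List[float]:
    """Mean per-bp nucleotide diversity in consecutive non-overlapping windows."""
    n_win = (chrom_len + win - 1) // win
    sums = [0.0] * n_win
    for pos, k, n in sites:
        sums[(pos - 1) // win] += site_pi(k, n)
    # The last window may be shorter than `win`.
    return [s / min(win, chrom_len - i * win) for i, s in enumerate(sums)]


def reduction_of_diversity(pi_wild: List[float], pi_dom: List[float]) -> List[float]:
    """ROD = 1 - pi_dom / pi_wild; windows with no wild diversity get 0."""
    return [1.0 - d / w if w > 0 else 0.0 for w, d in zip(pi_wild, pi_dom)]


def sweep_regions(rod: List[float], win: int, chrom_len: int,
                  cutoff: float) -> List[Tuple[int, int]]:
    """Merge consecutive windows with ROD >= cutoff into (start, end) regions."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, r in enumerate(rod + [float("-inf")]):  # sentinel closes a final run
        if r >= cutoff and start is None:
            start = i
        elif r < cutoff and start is not None:
            regions.append((start * win + 1, min(i * win, chrom_len)))
            start = None
    return regions


# Toy data (hypothetical): wild diversity is uniform, but landraces are
# monomorphic in the middle two windows, mimicking a selective sweep.
chrom_len = 40
wild = [(5, 5, 10), (15, 5, 10), (25, 5, 10), (35, 5, 10)]
landrace = [(5, 5, 10), (15, 0, 10), (25, 0, 10), (35, 5, 10)]

rod = reduction_of_diversity(windowed_pi(wild, chrom_len, WINDOW),
                             windowed_pi(landrace, chrom_len, WINDOW))
print(rod)                                                # [0.0, 1.0, 1.0, 0.0]
print(sweep_regions(rod, WINDOW, chrom_len, ROD_CUTOFF))  # [(11, 30)]
```

In practice the cutoff is often set empirically, for example as an upper percentile of the genome-wide ROD distribution, rather than as a fixed value as in this toy example.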
Collectively, our Citrullus super-pangenome provides insights into watermelon evolution and domestication and serves as a comprehensive resource for researchers and breeders to mine and utilize genes in cultivated and wild watermelon species.
Appendix S1 Supplementary notes and methods.
Table S1 Summary statistics of Illumina reads.
Table S2 Summary statistics of de novo assemblies of three wild watermelon reference genomes.
Table S3 Summary of pseudomolecule statistics.
Table S4 Summary of repeat annotation.
Table S5 BUSCO completeness scores of watermelon genome assemblies and annotations.
Table S6 Watermelon accessions used in the pangenome study.
Table S7 SNPs and small indels identified in watermelon.
Table S8 Summary statistics of the species-level pangenomes.
Table S9 Citrullus super-pangenome orthologous groups.
Table S10 Orthologous groups specifically present or expanded in C. lanatus.
Table S11 Number of genes with significantly different occurrence frequencies between populations.
Table S12 Genes with significantly different occurrence frequencies between populations.
Table S13 Enriched biological processes.
Table S14 Genes related to meristem maintenance and development that had significantly higher frequencies in C. lanatus and C. mucosospermus.
Table S15 Genes with significantly changed occurrence frequencies during watermelon domestication and improvement.
Table S16 Genes related to disease resistance that had significantly different frequencies in cultivated watermelon compared with the wild species.
Table S17 Putative domestication sweeps.
Table S18 Genes in domestication sweeps.
Table S19 Key fruit traits in Kordofan melons and genotypes at the ClBt and LCYB genes.
Table S20 Fruit flesh soluble solids content in watermelon accessions with different ClTST2 alleles.

Figure 1 Citrullus genus super-pangenome reveals variations in wild and cultivated watermelons. (a) Synteny among the genomes of melon and various watermelon species; genomic regions syntenic to C. colocynthis chromosomes 1 and 4 are highlighted in yellow and green, respectively. (b) UpSet diagram of orthologous groups among the four watermelon species. (c) Domestication sweeps detected by comparing landraces with Kordofan melons; red bars at the bottom indicate genomic regions under selection. (d) Nucleotide diversity in genomic regions surrounding the ClTST2 gene in different watermelon populations. (e) Comparison of fruit flesh sweetness between accessions carrying the two ClTST2 alleles, using a t-test with unequal variances (Welch's t-test).
Figure S1 K-mer distribution of Illumina reads.

Figure S5 Phylogeny and estimated times of divergence events.

Figure S6 Workflow for Citrullus super-pangenome construction.
Figure S7 Genes in the Citrullus super-pangenome.
Figure S8 Compositions of the four species-specific pangenomes.
Figure S9 Composition of the Citrullus super-pangenome.
Figure S10 Numbers of genes detected in individuals of different watermelon populations.
Figure S11 Phylogenetic tree of wild and cultivated accessions.
Figure S13 Expression levels of Cla97C05G101010.

Figure S15 Read alignment at ClTST2.