Georeferenced phylogenetic analysis of a global collection of wild and cultivated Citrullus species

Abstract The geographical origin of watermelon (Citrullus lanatus) remains debated. While a first hypothesis suggests the center of origin to be West Africa, where the endemic sister species C. mucosospermus thrives, a second hypothesis suggests northeastern Africa where the white‐fleshed Sudanese Kordophan melon is cultivated. In this study, we infer biogeographical and haplotype genealogy for C. lanatus, C. mucosospermus, C. amarus, and C. colocynthis using noncoding cpDNA sequences (trnT‐trnL and ndhF‐rpl32 regions) from a global collection of 135 accessions. In total, we identified 38 haplotypes in C. lanatus, C. mucosospermus, C. amarus, and C. colocynthis; of these, 21 were found in Africa and 17 appear endemic to the continent. The least diverse species was C. mucosospermus (5 haplotypes) and the most diverse was C. colocynthis (16 haplotypes). Some haplotypes of C. mucosospermus were nearly exclusive to West Africa, and C. lanatus and C. mucosospermus shared haplotypes that were distinct from those of both C. amarus and C. colocynthis. The results support previous findings that revealed C. mucosospermus to be the closest relative to C. lanatus (including subsp. cordophanus). West Africa, as a center of endemism of C. mucosospermus, is an area of interest in the search of the origin of C. lanatus. This calls for further historical and phylogeographical investigations and wider collection of samples in West and northeastern Africa.

. To date, there seems to be a consensus regarding its complex taxonomy.
According to recent research, including phylogenetic analyses and nomenclatural reviews (Chomicki et al., 2020;Renner et al., 2014) as well as a phenetic comparison within the genus (Achigan-Dako et al., 2015), Citrullus encompasses the following seven species: (a) the widely cultivated C. lanatus, a juicy fruit found in tropical and subtropical climates including var. cordophanus (Ter-Avan.) Fursa; (b) the tsamma melon C. amarus Schrad syn. C. caffer Schrad. or C. lanatus var. citroides (Bailey) Mansf., which grows in southern Africa (Whitaker & Bemis, 1976); (c) the egusi melon C. mucosospermus Fursa, previously referred to as a subtaxon of C. lanatus by many authors but which was raised to specific rank many decades ago (Fursa, 1972(Fursa, , 1981(Fursa, , 1983; (d) the bitter apple C. colocynthis (L.) Schrad., a perennial species growing in sandy areas throughout northern Africa and Near-East; (e) C. ecirrhosus Cogn., another perennial wild species (De-Winter, 1990); (f) C. rehmii, a wild annual species, with small fruits used for feeding desert animals; and (g) C. naudinianus (Sond.) Hook.f. from the Namib-Kalahari region, previously placed in the genus Acanthosicyos Welw. ex Hook. f. and sister group to all other species. Citrullus ecirrhosus, C. rehmii and C. naudinianus, currently, are considered endemic and restricted to the desert region of Namibia with very little intraspecific variation . This understanding may however change with more extensive sampling.
Given recent clarification of Citrullus taxonomy, it is appropriate to revisit the question of genealogy. In a recent phylogenetic study, Chomicki and Renner (2015) indicated West Africa as the possible center of origin of C. lanatus, a claim at odds with earlier assertions. Indeed, whereas some experts believe watermelon originated from southern Africa, based on the distribution of wild relatives in the Namibian desert (Bates & Robinson, 1995), others point to northern or northeastern Africa, especially the Nile river area in Sudan, as the likely center of origin based on archaeological data (Paris, 2015;Renner et al., 2019;Wasylikowa & Van Der Veen, 2004). According to these latter studies, very few archaeological records of watermelon are known from southern Africa, and all date to a relatively recent period between the 8th and 13th centuries A.D. Furthermore, a cultigen is known to have been cultivated in the Nile Valley when farming was not yet practiced in southwest Africa (Zohary & Hopf, 2000 deemed to be a subspecies or variety of C. lanatus (Achigan-Dako et al., 2015;Hammer & Gladis, 2014;Nesom, 2011;Renner et al., 2014).

| Taxon sampling and total genomic DNA isolation
To investigate the geographical distribution of watermelon haplotypes, we included in the study the four most economically impor-

| Sequence analysis and haplotype coding
For each chloroplast region, the forward and reverse sequences were manually edited and combined into a single sequence using Geneious 5.5.6 (Kearse et al., 2012). These merged sequences were submitted to NCBI GenBank to make them publicly available.

| Analysis of genetic diversity
Statistical parameters including sequence diversity, nucleotide diversity (Nei, 1987;Nei & Tajima, 1983), A + T content, and substitution, inversion, and transversion rates (Baier, 2011;Chiu et al., 2013;Librado & Rozas, 2009;Rozas & Rozas, 1997) were computed using DnaSP software version 5.10.01 (Chiu et al., 2013;Librado & Rozas, 2009). Pairwise intra-and interspecific sequence divergences for each chloroplast region were computed as the mean number of nucleotide differences per site, following the formula: where Tv is the number of transversions, Ts is the number of transitions, ID is the number of insertions/deletions, and L is the total length of the sequence O'donnell, 1992). We used the PERMUT software package (Pons & Petit, 1996) to calculate the mean within-population gene diversity (Ching-Yi et al., 2012) and the total gene diversity (h T ) (Chiu et al., 2013;Guicking et al., 2011;Martin et al., 2003;Sun et al., 2019;Zhao et al., 2019). Other intrapopulation metrics such as the number of haplotypes per population, the number of singleton haplotypes (haplotype that occurs only once in the study), the number of effective haplotypes, and the overall haplotype diversity were also estimated (Baier, 2011).

| Population differentiation and genetic structure
To infer genetic differentiation parameters, haplotypes grouped by continent or subregion were considered to comprise distinct geographic populations. We assessed the genetic differentiation among geographic populations by computing the gene differentiation statistic developed by Nei and Chesser (1983), an allele frequency-based approach that relies on estimates of genetic differentiation among geographic subpopulations. We further used Hudson et al. (1992) Pons & Petit, 1996) with 1,000 permutations. In contrast to Gst, Nst considers sequence differences between the haplotypes. Thus, Nst > Gst indicates that closely related haplotypes are observed more often in a given geographical area than would be expected by chance (Burban et al., 1999;Chávez-Pesqueira & Núñez-Farfán, 2016;Chiu et al., 2013;Grivet, 2002;Guicking et al., 2011;Pons & Petit, 1996;Sun et al., 2019). Following Templeton (1996), we tested the null hypothesis of homogeneity of nucleotide mutations using Fisher's exact test to investigate haplotypic differentiation within the overall population. We also performed Fu's Fs (Fu, 1997) to analyze the expansion level of the population under the hypothesis of selective neutrality and population equilibrium. Tajima's D test also was implemented for comparison with the Fu's Fs.

| Statistical parsimony network
Parsimony networks were constructed to infer phylogeographical relationships among haplotypes using TCS v1.21 (Clément et al., 2000). TCS estimates genealogical relationships of sequences and collapses identical sequences into haplotypes (HT).
To account for the different mutation rates underlying base substitutions, indels, and microsatellites, we followed the two-step strategy described by Bänfer et al. (2006) and performed by Guicking et al. (2011). The network was re-drawn from the TCS output using Adobe Illustrator.

| Nucleotide variations, intra-and interspecific diversity
The length of the amplified trnT-trnL region within C. lanatus ranged from 951 to 954 bp. No parsimony-informative site was found within C. lanatus, but 3 indels were found at positions 242, 295, and 296. The amplified ndhF-rpl32 region ranged from 650 to 652 bp in the species, also with no parsimony-informative site, though 5 indels were found at positions 970, 1,028, 1,143, 1,178, and 1,198 (Table S1). The combined length of the two cpDNA regions was found equal to 1,601-1,605 bp and included 1 SNP Note: Parsimony-informative sites: Polymorphic sites with a minimum of two alleles that are each present at least twice in the population.
Noninformative sites: Polymorphic sites that are unique in the population (singleton sites).
Haplotype diversity: The probability that two given sequences from two different haplotypes belong to two different regions or populations.
Nucleotide diversity: The average number of nucleotide substitutions per site between two sequences (Lynch and Crease 1990).
Average number of nucleotide differences: The average number of nucleotide differences (either Indels or SNPs) within a given population. Indel events: The number of insertions/deletions in the genomic region.
A + T (%): A + T content in the genomic region.
(position 1,399) and 1 microsatellite (position 366); but no polymorphisms were parsimony-informative. In total, the sampled accessions of this species comprise 12 distinct haplotypes, among which 10 were singletons, with an overall haplotype diversity of 0.5656 (Table 2).
Sequence lengths within C. mucosospermus were similar, with the combined length of the two regions spanning by 1,601-1,604 bp.
One SNP (nonparsimony informative) was identified in the ndhF-  (50%) were found within C. colocynthis, indicating recent haplotype divergence in that species (Figure 1).  Table S1), and the colors indicate geographical distribution: Africa (green), Asia (yellow); Europe (red), and North America (blue) eight were specific; of the nine found in Europe, six were specific;

| Geographical distribution, genetic differentiation of haplotypes, and population expansion
and of the four recovered from America, two were specific to that region (see Figures 2-5).
Haplotypes of C. mucosospermus were almost uniquely restricted to West Africa, and C. amarus haplotypes appeared specific to southern Africa. Haplotypes of C. colocynthis shared by Namibia, Ethiopia, and northern Africa were also found widespread throughout Asia.
Across that continent, some haplotypes of C. colocynthis were specific to different countries (Figure 1). Six C. colocynthis haplotypes were specific to Asia, and six were specific to Africa. For this species, Iran contributed the highest number of haplotypes in Asia (Figure 1), as Egypt did in Africa (Figure 1).

| Genetic diversity and sequence variation
Within the genus Citrullus genetic diversity analyses have been conducted since the second half of the 20th century (Hashizume et al. 1996) (Levi et al., 2013).
Our findings revealed low cpDNA variability among C. lanatus and C. mucosospermus. This was also observed by  and  who found low nucleotide variability based on a low number of parsimony-informative sites within each of the studied species. Most haplotypes were found within noncultivated (C. colocynthis) rather than cultivated (C. lanatus and C. mucosospermus) species. Taxa were clearly separated from one another with divergence based mainly on indels and transition events . However, there was sufficient resolution of the trnT-L and ndhF-rpl32 noncoding regions to reveal intraspecific variability.
Chloroplast sequence analysis revealed that the ndhF-rpl32 region exhibits comparatively higher variability within the two cultivated species than the trnT-L region.  analyzed four chloroplast regions (nhdF, ycf6-psbM, ycf9-trnG and atpA-trnR) and found no variability within cultivated accessions, grouped either by morphological traits or geographical origin. In this study, we used a large number of C. lanatus accessions from a wide geographical range and observed low haplotype diversity within that species, as also revealed by Guo et al. (2013). While many factors can influence sequence diversity, selection is a major contributor via the imposition of bottlenecks that can substantially reduce diversity Levi et al., 2013). The lack of haplotype divergence within C. lanatus and C. mucosospermus is likely the re- Citrullus colocynthis exhibited a relatively high number of parsimony-informative characters.  revealed that haplotypes detected within C. colocynthis were associated with geographical origin and that was also confirmed by Levi et al. (2017). The haplotype diversity within C. colocynthis suggests cryptic evolution and calls for a comprehensive morphological comparison of Asian and African colocynths. Such an investigation is exemplified by the recent studies on Cucumis melo that revealed modern melon cultivars go back to two lineages and was domesticated at least twice: in Asia and in Africa (Endl et al., 2018).

| Citrullus haplotype evolution
Thirty-eight haplotypes were detected among the cultivated and wild Citrullus accessions used in this study.

| Genetic differentiation and geographical structure
The coefficient of population differentiation (Gst), that uses allelic frequencies and does not take into account the distances among haplotypes, and the coefficient of differentiation (Nst) based on the pairwise difference between alleles were found respectively, equal to 0.196 and 0.374; but the difference was not significant (p > 0.05).
In Citrullus spp. Mujaju et al. (2011) found Gst = 0.56 and Nst = 0.49 for sweet watermelon and Gst = 0.71, Nst = 0.81 for cow watermelon. The fact that the differentiation parameter based on the pairwise difference between alleles is greater than the one calculated without permutation (i.e., Nst > Gst) indicates that the collection is characterized by clear geographic structure  F I G U R E 4 Distribution and frequencies of Citrullus spp. haplotypes in Europe Grivet, 2002;Guicking et al., 2011). Also, the significant value of the total gene diversity across all four geographical regions (hT = 0.917, standard error = 0.0320) is indicating a strong structure in the population (Pons & Petit, 1996;Sun et al., 2019;Zhao et al., 2019). Levi et al. (2017) observed that accessions of C. colocynthis were subdivided into five groups in general agreement with their centers of diversification and origin. Our findings indicated that regional genetic differentiation statistics support Levi et al. (2017)'s conclusions, with subsamples from different regions exhibiting genetic differentiation associated with their likely centers of diversification.
Also, haplotypes of C. amarus were mostly grouped in Southern Africa, which is assumed to be the origin of that species (Chomicki & Renner, 2015;.
Citrullus chloroplast sequences analysis with TCS 1.21 resulted in a network where haplotypes widely sampled throughout West Africa were placed at the root. While coalescence theory predicts that older alleles will prevail in a population due to a higher number of descending lineages and associated wider geographic distributions (Crandall & Templeton, 1993), such an observation may depend on sample sizes and evolutionary/domestication histories and also the lack of subsp. cordophanus (from northeast Africa) in the germplasm studied. In this study, H1 is the most frequently sampled haplotype and has the most connections with other haplotypes; thus, H1 may be considered the most ancient haplotype. This ancient haplotype was sampled most frequently in West Africa (i.e., Nigeria and Benin) and was highly shared by accessions of both C. lanatus and C. mucosospermus. These results support the findings of Chomicki and Renner (2015) and Renner et al. (2019) who used eleven gene regions to infer phylogeny of Citrullus species, and also a 3,500-year-old leaf sample from the Egyptian tomb to infer close relationship between C. lanatus and C. mucosospermus. Our findings, based upon a large set of egusi melon and watermelon accessions from four continents, provide further evidence of that close relationship between these two species. However, they are indeed two different species, as previous crosses between them (e.g., Charleston Gray x PI 560006) resulted in high levels of sterility (Gusmini et al., 2004). The very limited haplotype diversity among the two species suggests an old split with chlorotype fixation  and ancient types of C. mucosospermus originating from West Africa (Renner et al., 2014). However, to the best of our knowledge, no wild populations have been con- has been rarely studied.
Archaeological evidence indicates the northeast of Africa as a center of origin and domestication (Chomicki et al., 2020). Authors reported wild dessert watermelon in that region (Paris, 2015)  Ks: A weighted average of the number of differences between sequences from continents i and j.
Kxy: The average number of differences between two samples, regardless of their provenance.
GST: The coefficient of genetic differentiation between continents.

TA B L E 5 Pairwise genetic differentiation between continents (a), between African regions (b) and between Asian regions (c)
TA B L E 4 Diversity and differentiation statistics for the four Citrullus spp. in this study, based on combined cpDNA haplotypes, according to Pons and Petit (1996) and adapted from Guicking et al. (2011)   been found (Paris, 2015). Moreover, our data showed that one of the Egyptian accessions (PI 525083), indicated to be C. amarus and observed by Levi et al. (2013) to cluster with dessert watermelon, exhibits a unique haplotype (H32). That accession is several mutations away from C. colocynthis and closer to watermelon and egusi melon haplotype. Previous findings of Levi et al. (2017)  Africa, the focus should be on both west and northeastern regions to resolve the domestication history of modern cultivars.

| CON CLUS ION
The genus Citrullus includes seven species that may originate from different parts of the world, according to previous and current data.
Our results reveal 38 distinct chloroplast haplotypes among Citrullus spp. and the distribution of those haplotypes across the world. The close relationship of egusi melon and Kordofan melon to watermelon raised new questions regarding the colonization routes of major crops and the current status of extant genetic diversity of wild relatives in places of origin.

ACK N OWLED G EM ENTS
This study was financially supported by the Vavilov-Frankel We acknowledge contribution and technical support from Christina Koch, Petra Oswald, Birgit Wohlbier, R.S. Vodouhe, and Adam Ahanchede. We thank Augustin Ganse for assistant in map making and Hanno Schaefer for useful comments on the manuscript.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflict of interest.