Demographic and ecogeographic factors limit wild grapevine spread at the southern edge of its distribution range

Abstract The spatial distribution of plants is constrained by demographic and ecogeographic factors that determine the range and abundance of the species. Wild grapevine (Vitis vinifera ssp. sylvestris) is distributed from Switzerland in the north to Israel in the south. However, little is known about the ecogeographic constraints of this species and its genetic and phenotypic characteristics, especially at the southern edge of its distribution range in the Levant region. In this study, we explore the population structure of southern Levantine wild grapevines and the correlation between demographic and ecogeographic characteristics. Based on our genetic analysis, the wild grapevine populations in this region can be divided into two major subgroups in accordance with a multivariate spatial and ecogeographical clustering model. The identified subpopulations also differ in morphological traits, mainly leaf hairiness which may imply adaptation to environmental stress. The findings suggest that the Upper Jordan River population was spread to the Sea of Galilee area and that a third smaller subpopulation at the south of the Golan Heights may represent a distinguished gene pool or a recent establishment of a new population. A spatial distribution model indicated that distance to water sources, Normalized difference vegetation index, and precipitation are the main environmental factors constraining V. v. sylvestris distribution at its southern distribution range. These factors in addition to limited gene flow between populations prevent further spread of wild grapevines southwards to semi‐arid regions.

Wild grapevine is believed to have originated in the Caucasian region (Grassi et al., 2006;Heywood & Zohary, 1995) and disperse over long distances mainly by birds, throughout its distribution range (Hegi, 1925) from Western Europe to Eastern and Central Asia (Arroyo-García & Revilla, 2013). V. v. sylvestris is a dioecious species where the male flower contains erect stamens and does not contain female organs, while the female flower contains an ovary, stylus, and degenerated stamens (Spada et al., 2003;Zohary & Spiegel-Roy, 1975). The fruits develop from the female flower and are usually characterized by a thin bunch of round, small dark berries, and an oval pit . Until the mid-19th century, the wild grapevine was abundant across Europe; however, its distribution dramatically decreased with the penetration of pathogens (e.g., phylloxera, powdery, and downy mildews) from North America .
Recent studies examined the genetic diversity and the spatial distribution patterns of wild grapevines using primarily simple sequence repeat (SSR) markers (Biagini et al., 2014;De Andrés et al., 2012;Ergül et al., 2011;Schneider et al., 2015;Zoghlami et al., 2013). For example, a study conducted across different regions in Spain showed that V. v. sylvestris populations grow in a wide range of habitats including sheers and beaches, forests, and riverbeds (De Andrés et al., 2012). Following an extended survey that included recording of ecological and topographic traits of V. v. sylvestris in Italy, it was shown that habitats in altitudes below 300 m with access to water sources, high vegetation density, low anthropogenic disturbance, and potential correlation with carbonatic soil substratum are usually more favorable for establishment of stable populations (Biagini et al., 2014). The distribution of wild grapevine is also sensitive to biotic factors including mites, powdery mildews, and downy mildews (De Andrés et al., 2012).
An early study on the distribution of grapevines in the old world, including the southern Levant (where today Israel, southern Syria, and southern Lebanon are present, Figure 1a), has failed to identify wild grapevine populations in Israel (Zohary & Spiegel-Roy, 1975).
A decade later, a population of V. v. sylvestris was identified on the Jordan River banks at the Upper Galilee region and marked the southern distribution edge of wild grapevine (Rottenberg, 1998).
More recently, a comprehensive survey conducted throughout Israel was able to locate additional populations in the Upper Galilee region and around the Sea of Galilee (Drori et al., 2015. Some of the collected wild grapevines were sequenced using next-generation sequencing (NGS) techniques, and their single nucleotide polymorphism (SNP) data were initially compared to those of domesticated endogenous varieties . However, none of these studies systematically explored the ecology, nor the genetic or phenotypic characteristics of wild grapevines along the southern edge of the species distribution range.
Environmental factors could be both abiotic and biotic (Leach et al., 2016;Lewis et al., 2017;Murcia, 1995) demographic factors include genetic isolation (Sexton et al., 2009), absence of gene flow, high inbreeding, and low genetic variation. Adaptation of populations to new environments relies on standing genetic variation or accumulation of new mutations (Barrett & Schluter, 2008). Thus, genetic variation is an important factor for long-term adaptive potential of a population (Bouzat, 2010).
In the past few decades, new powerful computational tools were developed to explore the spatial distribution patterns and associated diversity in natural populations. Among these tools are ecological niche models (ENM) and species distribution models (SDM) (Guisan & Thuiller, 2005;Wiens & Graham, 2005). A variety of algorithms for SDM are now available, including machine learning-based approaches (Olden et al., 2008;Segurado & Araújo, 2004). One of the popular tools to study SDMs is a maximum entropy-based (Maxent) niche modeling technique (Phillips et al., 2006), which allows to estimate the distribution range of a species and the main contributing environmental factors (Rhoden et al., 2017;Slater & Michael, 2012).
Another widely applied approach to study the distribution of a species is a multivariate clustering (Hargrove & Hoffman, 1999), which considers many variables in order to partition the dataset into groups or clusters. A well-clustered set of observations is the set of entities that shows higher similarity within the cluster than to entities in other clusters (Everitt, 1980). In the past decade, several such clustering approaches have been proposed to assess the presence of spatial autocorrelation in a dataset (Córdoba et al., 2013;Peeters et al., 2015) and have been widely applied in agriculture (Ohana-Levi et al., 2019;Tardaguila et al., 2018), epidemiology (Mahara et al., 2016) and environmental modeling (Ren et al., 2016).
However, only a few studies applied multivariate spatial clustering models to address species distribution problems (Feng et al., 2019;Feng et al., 2017).
In this study, we characterize the distribution constraints of V. v. sylvestris in the southern part of the Levant. We hypothesize that the wild grapevine distribution is limited by an interplay between demographic and environmental factors. To address this hypothesis, we use a wide set of tools to quantify the contribution of different parameters to the observed genetic and phenotypic variation among wild grapevine populations, at the southern distribution edge of the species.

| Plant material and research area
During the years 2012-2019, a comprehensive survey of wild grapevines was conducted (Drori, Levy, et al., 2017). A total of 129 V. v. sylvestris accessions were sampled in the Upper Galilee and around the Sea of Galilee in the northern part of Israel ( Figure 1). Verification of the sylvestris identity of accessions and determination of their sex were performed based on their flower's structure: The male flower contains only stamens, and the female flower contains a full carpel and deteriorated stamens ( Figure S1). For each plant, we sampled shoot tips or young leaves for DNA extraction. Sampling was conducted during the spring of each year, and samples were stored at −80°C until processing.

| Multivariate spatial clustering
Seven continuous variables were selected based on previous studies of wild grapevine ecology and distribution and used in the spatial model (Biagini et al., 2014;Carey et al., 2002;Hunter & Bonnardot, 2011). The environmental data were prepared as raster grids using the ArcGIS software v10.6.1 (ESRI, 2018). All variables represented as grids were scaled to a spatial resolution of 30 m and included the following variables: • Topography (slope and aspect): calculated from a digital elevation model (DEM) generated by the Survey of Israel (SOI). Slopes were calculated in degrees relative to a horizontal plane, and aspect was calculated as degrees relative to the North cardinal direction (0°).
• Distance to a water source: calculated from the streams layer generated by SOI, using the Euclidean distance tool in ArcGIS (ESRI, 2018).
• Normalized difference vegetation index (NDVI): generated for two dates (7 April 2019 and 12 July 2019) using Landsat 8 imagery. Two time points (summer and spring) were chosen because of the differences in vegetation density between these seasons; the spring period is characterized by higher vegetation compared to the dry summer period, when the vegetation remains mostly near water sources or at irrigated areas. The images were atmospherically corrected using the dark object subtraction method via the "RStoolbox" package in R (Leutner et al., 2019). The NDVI value was calculated as the ratio between the difference of reflectance from the near-infrared (NIR) and red bands, and the sum of the reflectance and red bands, to provide a scale of the degree of photosynthetically active biomass (Tucker, 1979).
• Precipitation: mean annual rainfall amounts (mm) based on a 30year average as measured by the Israel Meteorological Service.
The values of each of these variables were extracted for each of the accessions according to their geographic location, resulting in a total of 119 entities; each represents a single accession. To control for autocorrelation between variables, spatially varying attributes were clustered (Peeters et al., 2015). Using the 119 locations, the Getis-Ord Gi* hot-spot analysis (Getis & Ord, 1992) was performed for each variable, with a radius of 5,000 m using the R package "spdep" (Bivand & Wong, 2018). The Gi* hot-spot analysis is a local method that relates to spatial subregions of the entities in space and is designed to quantify the degree of spatial autocorrelation among neighboring points. The output of this calculation is a Z-score that is assigned to each point after standardization and indicates the degree of similarity between neighboring entities. Z-score values far from zero signify strong spatial dependencies, and negative or positive values correspond to low or high values of the original variable, respectively (Luković et al., 2015).
A complementary p-value represents the significance level of the spatial relationship. A fuzzy c-means (FCM) clustering algorithm was then applied with the Z-score values of the different variables as input (Ohana-Levi et al., 2020). A fuzzy set relies on the premise that boundaries between clusters (C) are not always discrete; thus, assignment of an observation to a specific cluster is not definite (Gath & Geva, 1989).
The output of the FCM algorithm includes a list of fuzzy membership values. Each observation receives a membership value between 0 and 1, denoting the degree of membership, or the probability of its placement to each cluster C (Akman et al., 2019).
The observation is then assigned to a specific cluster according to the maximum fuzzy membership. The FCM algorithm was applied using the "ppclust" package in R (Cebeci, 2019). The number of C was determined by the admixed and correlated allele frequency models. Cluster separation was evaluated using the silhouette index (Rousseeuw, 1987). This index is based on the similarity of each object to other objects in their assigned cluster, compared to the dissimilarity of the objects to those of other clusters.
The silhouette index was applied using the R package "cluster-Crit" (Desgraupes, 2018). Finally, a McNemar's χ 2 test (Hazra & Gogtay, 2016;McNemar, 1947) was performed to assess whether the clusters based on the genetic diversity analysis had a similar distribution to those generated by the multivariate spatial clustering method based on ecogeographic attributes.

| Niche and distribution modeling analysis
To model the niche suitability and distribution of wild grapevine, a maximum entropy (Maxent) (Rhoden et al., 2017) analysis was conducted with the Maxent software (Phillips et al., 2020). The Maxent algorithm takes environmental information as a grid and georeferenced occurrence localities based on the sampling site locations and performs a suitability analysis across the grid. For the Maxent analysis, ten variables were used, seven of which were described in the multivariate spatial clustering section. In addition, the following categorical variables were included: • Soil and lithology: set as categorical variables based on the SOI.  (Hanley & McNeil, 1982). The logistic output format of Maxent is a probability map that can be interpreted as the predicted probability for the occurrence of wild grapevines in the studied area. The resulting model was visualized as a map using the average value across 15 replications.

| Phenotyping analysis
Among the sampled accessions, 68 plants from different sampling locations that were physiologically fit for phenotyping were morphologically characterized following the International Organization of Vine and Wine (OIV) standards adopted by the "COST Action GrapeNet FA1003" (2007). The list of 20 OIV descriptors that were used in this study is provided in Table S1.
A principal component analysis (PCA) was performed for the morphological characteristics using the "Gifi" package (de Leeuw & Mair, 2009) in R, which implements categorical PCA ("PRINCALS") and is visualized using "ggplot2" package (Wickham, 2016). A comparison of the scores across all 20 OIV descriptors was conducted between the two populations, using the Wilcoxon signed-rank test.

| DNA extraction and genotyping
Total DNA was extracted using a modified cetyl-trimethylammonium bromide (CTAB) protocol, described previously by  and . Briefly, frozen leaves were weighed and ground with a pestle. CTAB buffer was added before incubation (65°C, 30 min) in a dry bath and stirred with vortex 3-4 times during incubation. An equal volume of chloroform:isoamyl alcohol (24:1) was added and mixed by inverting tubes. After phase separation, DNA was precipitated by 1/2 volumes of 5 M NaCl and 2 volumes of absolute cold ethanol. The extracted DNA pellets were air-dried at room temperature and dissolved in 70 µl DNase-free water (Promega Ltd., USA). Samples were stored in a −20°C freezer until genotyping.
Genotyping of each accession was conducted using a panel of 22 standard SSR markers that were developed for genotyping grapevine (Emanuelli et al., 2013). Two markers ( Ltd., USA), were denatured and size-fractionated using capillary electrophoresis on an ABI 3500 Genetic Analyzer (Thermo Fisher Scientific Ltd., USA). Finally, allele size estimation at each marker was determined using the software GeneMapper 5 (Thermo Fisher Scientific Ltd., USA).

| Genetic diversity and population structure
To determine the number of populations and the assignment of samples to clusters, we used the Bayesian clustering analysis implemented in STRUCTURE 3.2 (Pritchard et al., 2000) with both the admixed and correlated allele frequency models. Twenty independent runs were conducted for each number of clusters (K) ranging from 1 to 9 using 50,000 iterations following a burn-in length of 5,000 iterations. The most likely number of clusters was determined based on the log-likelihood score of each K and the ΔK method (Evanno et al., 2005) using its implementation in the CLUMPAK software (Kopelman et al., 2015). For visualization, bar plots were generated with the Structure plot V2.0 interactive web application (Ramasamy et al., 2014).
Next, a neighbor-joining tree was constructed using the genetic information obtained from all accessions in addition genetic data generated for 8 rootstock varieties (Vitis rupestris) which were included as an outgroup. The dendrogram was generated from a Bruvo's genetic distance (Bruvo et al., 2004) calculated with 1,000 bootstrap replicates using the R package poppr (Kamvar et al., 2014).
Analysis of genetic variation within and between the identified clusters using the analysis of molecular variance (AMOVA) and PCoA
R scripts used in this study are provided as Supplementary R script file.

| RE SULTS
In a comprehensive survey conducted between 2017-2019, a wide range of habitats was screened from the Negev desert in the south to the Upper Galilee in the north of Israel to complement previous efforts to identify wild grapevine populations (Drori, Levy, et al., 2017; in accordance with previous survey conducted in this region .

| Multivariate spatial clustering
Seven spatially environmental variables were included in the multivariate spatial clustering analysis conducted for the 119 accessions

| Genetic differentiation between wild grapevine populations
The SSR markers used to test the genetic diversity in the collec- To test this hypothesis and identify the number of clusters that best represent the wild grapevine collection, a Bayesian clustering analysis was conducted in STRUCTURE using the SSR markers genotyped over all accessions. We used the ΔK test and the log-likelihood scores for each K to determine the number of clusters and found that K = 2 (ΔK = 92.12) is the most probable number of clusters while at K = 3 (ΔK = 4.32) is the second-best result ( Figure S2a). Overall, accessions were assigned in accordance with their geographic sampling location, that is, the Upper Jordan River region and the Sea of Galilee region (Figure 1). At K = 3, the Sea of Galilee cluster was further split to a subpopulation at the south of Golan Heights which is highly iso- (b) Principal coordinates analysis (PCoA) plot conducted via covariance matrix with data standardization for the 3 regional groups: Sea of Galilee (blue), south-Golan (black), and Upper Jordan River (red). (c) A heatmap of pairwise FST values conducted between each pair of the 3 regional subpopulations Upper Jordan River and the Sea of Galilee, respectively, although the last two clusters include 9.5 and 5.5 times the number of accessions, respectively (Table S4).
To study the genetic diversity among the identified clusters, the  (Table 1).

| Morphologic differentiation between wild grapevine populations
Phenotypic analysis was conducted for 68 individuals representing  (Table S5).

| Environmental factors affecting wild grapevine distribution
To explore the contribution of environmental factors to the occurrence of wild grapevine at the south edge of its distribution range, a niche modeling analysis was conducted. The analysis was performed to predict the habitat suitability using the information at the grid cell resolution for each of the two main subregions identified in the multivariate spatial clustering: Upper Jordan River, Sea of Galilee, and for the entire study area ( Figure 5). Overall, the obtained models had high prediction accuracy ranging between AUC values of 0.93-0.97.
Among the ten environmental variables used to calculate the three models, distance to water was the strongest contributor (entire region-27.8%, Upper Jordan River-23.7%, and Sea of Galilee-20.1%) to wild grapevine spatial distribution across the studied area (Figure 5d-f, In accordance with the contribution of the environmental factors to wild grapevine distribution, the predicted habitat suitability was found to decrease in probability at distances higher than 1 km from water sources and found higher at July NDVI levels between 0.6 TA B L E 1 Summary of genetic variation statistics at 20 SSR loci in the V. v. sylvestris germplasm collection for two structure divisions (K = 2, K = 3)   (Tables S7 and S8). All other tested variables had a low contribution to the distribution of wild grapevine in the studied area.

| D ISCUSS I ON
What limits the distribution of species is a fundamental question in ecological and evolutionary biology. Wild grapevine is a perennial species of economic importance due to its value for breeding purposes in cultivated grapevines which are among the most important horticulture crops in the world. We established a new wild grapevine germplasm collection comprised of 129 accessions that represent the south-most natural populations along the distribution range of this species. This collection was characterized genetically and phenotypically, and the obtained information was compared to available environmental information for each sampling location. with the region around the Sea of Galilee. Despite the fact that the split to two clusters were supported by both ecogeographic and genetic analyses, some indications imply that that the Golan Heights subpopulation should be treated separately from the Sea of Galilee cluster. These indications were also supported by the population genetics analyses as detailed below. The sensitivity of STRUCTURE to distinguish between populations relies on its ability to identify differences in the allele frequency spectrum of each population (Porras-Hurtado et al., 2013). Small sample size, as in the case of the south-Golan subpopulation, may bias the frequency spectrum and diminish the ability to differentiate it from the large Sea of Galilee cluster. In addition, Cullingham et al. (2020) recently described that a strong bias toward K = 2 occurs when using the ΔK method.

| Wild grapevine population structure is determined by genetic and ecogeographic factors
The pairwise population comparisons of F ST and Nm values could be seen as an indirect estimate of isolation and migration between populations (Slatkin, 1987). The results of the F ST and Nm statistics between the Upper Jordan River and Sea of Galilee clusters indicated that these two populations are differentiated, even though some level of gene flow remains (Table S3). Moreover, the higher genetic diversity in the Upper Jordan River, as estimated by the number of alleles and expected heterozygosity, may imply that this population is ancestral to the Sea of Galilee population. It is tempting to speculate that the different ecogeographic conditions in each region maintain these two populations differentiated, as indicated by both phenotypic and genetic data, through selection. Sadly, the available genetic data in the study did not allow to test this hypothesis explicitly with high confidence. However, the observed heterozygosity was relatively high across regions (0.61-0.75, Table 1), similar to those reported for the Mediterranean basin and Central Asia wild populations (Riaz et al., 2018), indicating that overall, high genetic diversity is preserved among the wild populations in the south Levant.
Testing the level of differentiation and gene flow between the south-Golan and each of the main clusters indicated that this subpopulation is distinguished. The observed high F ST , low Nm, low heterozygosity yet relatively high number of private alleles in this subpopulation indicate this is a newly established or highly isolated subpopulation which has emerged from the Sea of Galilee cluster or from a different source (Table S3) , and (f) Sea of Galilee: "ASPECT" is the slope aspect (°); "LAND COVER" represents different land cover categories; "SLOPE" is the hillslope (°); "AprNDVI" is NDVI calculated using a satellite image acquired in April 2017; "PRECIPITATION" is the mean annual precipitation (mm); "JulyLST" is the land surface temperature calculated using a satellite image acquired in July 2017 (°C); "SOILS" is a variety of soils categories; "LITHOLOGY" is a variety of lithology categories; "JulyNDVI" is the NDVI calculated using a satellite image acquired in July 2017, and "D2W" is the distance from water bodies (in meters)

| Phenotypic differentiations between the different subpopulations
The analysis of the measured morphology across individuals well supported both the genetic and ecogeographic results of differentiation between the two clusters at the Upper Jordan River and the Sea of Galilee. The main phenotypic characteristics that were differentiated between the two populations are the hairiness traits on both shoot tip and leaves (Figure 4b). Hairiness has an important contribution to both biotic and abiotic stress resistance in plants. For example, a study on grapevine response to erineum mite found that leaf hairiness, leaf wax, and carbohydrate content are strongly associated with resistance (Khederi et al., 2014). Another study conducted in Sinapis arvensis suggested that the level of hairiness can have a contribution to plant fitness under both biotic and abiotic stress and in different environments (Roy et al., 1999).  4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Considering these results and observations, we conclude that wild grapevine populations in the southern Levant area are not adapted to drought conditions and can occur mainly along flowing water sources or active springs. In the few cases where wild grapevine plants were found distant from the water source, they were growing on fences of regularly irrigated fruit tree orchards. This is in accordance with a previous study which showed that wild grapevine populations distribution in Italy was highly associated with the hydrographic network (Biagini et al., 2014). Similar observations were also reported in Spain, Georgia, and Portugal, where wild grapevines were mostly found along river banks (Cunha et al., 2007;Ekhvaia & Akhalkatsi, 2010;Ocete, Muñoz-Organero, et al., 2011).
The outcropping lithology and the soil composition had an impact of secondary importance on the occurrence of wild grapevine across the studied area (Figure 6f,i). The results of our analyses indicated that wild grapevine usually avoids the outcropping basalts and prefers regions where haploxerolls are developed in carbonate rocks and where vertisols cover colluvial beds. These findings support previous studies, which found that the colluvial substratum and alluvial soils are preferred by wild grapevine populations in Italy (Biagini et al., 2014).
An interesting prediction of the Maxent model was that wild grapevine populations should be present in a small area at the western shore of the Sea of Galilee (Figure 5a). A thorough survey conducted in the area did not yield any findings of an existing or remains of established population at this location. One explanation for this erroneous prediction of the model is the lack of a detailed soil salinity GIS layer for this region. High soil salinity at the western coast of the Sea of Galilee is a well-documented phenomena and occurs due to saline springs located in the area (Gvirtzman et al., 1997;Rimmer & Nishri, 2014). Hence, salinity may be an important factor affecting the soil composition and restrict the wild grapevine distribution in this region.

| CON CLUS IONS
To better understand the ecology and evolution of the wild grapevine, we studied the demographic and environmental fac-

ACK N OWLED G M ENTS
This work was supported by the Jewish National Foundation (JNF grant # 90-23-020-12), and the general support of the Israeli Ministry of Science and Technology (MOST), and the Eastern Regional R&D Center. We thank Yaakov Henig, Shmuel Rabinovitch, and Nir Chen for their help with the effort of wild grapevine collection.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflict interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
The Maxent input files, morphological and microsatellite data sets are available at the associated Dryad repository: https://doi. org/10.5061/dryad.k3j9k d56p (Rahimi et al., 2021).