Drivers of population divergence and species differentiation in a recent group of indigenous orchids (Vanilla spp.) in Madagascar

Abstract With over 25,000 species, orchids are among families with remarkable high rate of diversification. Since Darwin's time, major advances attributed the exceptional diversity of orchids to plant–pollinator interactions. However, unraveling the processes and factors that determine the phenotypic and genotypic variation of natural orchid populations remains a challenge. Here, we assessed genetic population structure and floral differentiation in recently diverged leafless Vanilla species in a world biodiversity hotspot, Madagascar, using seven microsatellite loci and 26 morphometric variables. Additionally, analyses were performed to test for the occurrence of any patterns of isolation by distance, isolation by environment, and isolation by adaptation and to detect possible physical barriers that might have caused genetic discontinuities between populations. Positive inbreeding coefficients detected in 22 populations were probably due to the presence of null alleles, geitonogamy and/or some admixture (sympatric species). In contrast, the only high‐altitude population showed an important rate of clonality leading to heterozygote excess. Genetic diversity was maximum in western populations, suggesting a postglacial colonization to the north and south. Clustering analyses identified seven genetic groups characterized by specific floral traits that matched five botanical descriptions in the literature. A contribution of montane refugia and river barriers on population differentiation was detected. We also detected combined effects of IBD/IBE and IBE/IBA on genetic differentiation and suggested this pattern is more likely determined by ecological isolation, although pollinator‐mediated divergent selection could not be ruled out for some of the species. Overall, this study provides further insights on speciation in orchids, a group for which Madagascar shows one of the world's highest level of endemism and confirms the importance of the peculiar biogeography of the island in shaping species differentiation.


| INTRODUC TI ON
Understanding how and at what rate phenotypic and genetic differentiation occurs across natural populations is one of the most fundamental questions in evolutionary biology (Ghalambor et al., 2007;Lekberg et al., 2012;Schmid & Guillaume, 2017). It is now well established that three mechanisms and their interactions shape the patterns of population diversity and structure, which are reduced gene flow, genetic drift, and natural selection (Gandon & Nuismer, 2009;Johnson et al., 2018;Wright, 1931). They are strongly influenced by many interrelating factors including historical events, human activities, distance and barriers, and ecological variations (Gomez et al., 2009;Wright, 1943;Zhang et al., 2014).
Nevertheless, in the absence of barriers, genetic differentiation can still occur. The pattern of isolation by distance (IBD) suggests a decrease of gene flow between populations as distance increases, triggering genetic differentiation via genetic drift and/or selection (Frankham et al., 2010;Ramírez-Barrera et al., 2019;Wright, 1943).
On the other hand, selection is the primary factor for divergence in isolation by ecology or ecological speciation lato sensu (Orsini et al., 2013;Shafer & Wolf, 2013;Wang & Bradburd, 2014): In this case, gene flow between populations is reduced due to local adaptation to divergent environments (Shafer & Wolf, 2013).
Past climatic oscillations have been argued to be the major factors affecting current distributions and diversity of many plant species (Gomez et al., 2009;Médail & Diadema, 2009;Zhang et al., 2014). Climatic variations can threaten several plant species and reduce plant community diversity, as shown by the extinction of European tree species during the Plio-Pleistocene (Svenning, 2003). However, they can also be an important force driving genetic differentiation and speciation (Hewitt, 1996;Vences et al., 2009). For example, climatic variations along latitudinal gradients may affect organism functional traits such as phenology (Olsson & Ågren, 2002), physiology (Bognounou et al., 2010;De Frenne et al., 2012), and morphology (Feijó et al., 2019;Olsson & Ågren, 2002;Winn & Gross, 1993) and lead to plastic or genotypic shifts (Byars et al., 2007). Correlation analysis of neutral genetic variations versus geographic distances allows testing for IBD. On the other hand, a correlation between genetic divergence and morphological divergence can reveal isolation by adaptation (IBA) where barriers to gene flow between populations from different habitats arise following adaptive trait divergence (Nosil et al., 2008;Shafer & Wolf, 2013;Wang & Bradburd, 2014). When adaptive traits cannot be targeted, various cofactors may be used to test for isolation by ecology (Ramírez-Barrera et al., 2019;Shafer & Wolf, 2013;Wang & Bradburd, 2014). For this purpose, environmental data (excluding geographic data) may serve as a proxy in correlation analysis with neutral genetic data to assess isolation by environment (IBE). IBE is caused by the reduction of exchanges between populations from ecologically divergent habitats due, for example, to the low recruitment in local environment. The roles of spatial isolation-by-distance and isolation-by-ecology patterns are complex in population differentiation and speciation processes, as they can be highly correlated and often act together as isolation mechanisms. Evolutionary patterns can be quite complex in plants Waser & Campbell, 2004), particularly in Orchidaceae. Indeed, orchid species evolve occasionally in sympatry (Barone Lumaga et al., 2012;Nielsen & Siegismund, 1999;Pansarin & Ferreira, 2019), produce numerous seeds with long-range dispersal ability (Arditti & Ghani, 2000), can reproduce clonally (Batygina et al., 2003), and hybridize (Arduino et al., 1996;Nielsen & Siegismund, 1999). Potential ecological drivers of orchid diversification are multiple, such as adaptation to pollinators (Ayasse et al., 2011;Dressler, 1982;Van der Niet et al., 2014), mycorrhizal specificity (Otero et al., 2007), differences in flowering phenology, and floral characteristics (color, form and fragrance) resulting from adaptation to different environmental conditions (Givnish et al., 2015;Pansarin & Ferreira, 2019;Sun et al., 2015) and leading to isolation by ecology. Although the various patterns namely IBD, IBE, and IBA have gained attention in the literature (e.g., Garot et al., 2019;Liu et al., 2018;Noguerales et al., 2016;Orsini et al., 2013;Shafer & Wolf, 2013), few studies have evaluated their joint contribution in orchid species differentiation (e.g., Jaros et al., 2016;Mallet et al., 2014), and no research efforts, to our knowledge, have been invested so far on this subject regarding Vanilla species divergence.
The Vanilla Plumier ex Miller (34.6 Ma, Bouetard et al., 2010) belongs to the Orchidaceae, one of the most diverse family of angiosperms (Bouetard et al., 2010;Gigant et al., 2011). It comprises about 132 species (Andriamihaja et al., 2020) distributed throughout tropical forests between the 27th north and south parallels, but are absent in Australia Portères, 1954). With nine wild Vanilla species described, the Southwest Indian Ocean (SWIO) region ranks fourth among the richest regions and first in terms of leafless species, most being located in Madagascar Portères, 1954). Leaflessness has evolved independently at least three times in the Vanilla genus (Bouetard et al., 2010).
Chloroplast DNA phylogenetic analyses also failed to distinguish the various species from this group (Bouetard et al., 2010). Such recent and closely related species have always been recognized to be an interesting model for understanding speciation processes (Lowry, Modliszewski, et al., 2008;Van der Niet et al., 2014).
Our aim is to unravel wild leafless Vanilla species diversification processes in Madagascar. The island of Madagascar is a perfect place for studying evolution, thanks to a high level of endemism attributed to a large heterogeneity of habitats, a small spatial scale, and a long history of geographic isolation (Vences et al., 2009;Warren et al., 2015;Wilmé et al., 2006). Actually, the level of endemism is more than 80% among plant species (Callmander et al., 2011;Goodman & Benstead, 2005;Phillipson et al., 2006) and even higher for animal taxa, ranging from 52% in birds to 100% in amphibians and lemurs (Goodman & Benstead, 2005). Habitat diversity is expressed by different biomes with unique vegetation composition and sharp borders, for instance, the long central highland plateau that rises along a latitudinal axis and separates west and east bioclimatic regions (Vences et al., 2009;Wilmé et al., 2006). Madagascar hosts more than 1,000 orchid species (Cribb & Hermans, 2007) including five leafless and two leafy Vanilla species (Allorge-Boiteau, 2005Cribb & Hermans, 2009;Portères, 1954). Over 90% of Malagasy orchids are endemic to the island (Cribb & Hermans, 2007). In this study, we generated ecological, genotypic (using variable microsatellite markers), and morphological data for leafless Vanilla populations from Madagascar.
We determined the genetic, morphological structure and differentiation of populations, to identify possible geographic barriers and to quantify the effects of isolation by distance (IBD), isolation by adaptation (IBA), and isolation by environment (IBE) in diversification processes.

| Study site
Fieldworks were performed in 23 sites (Table S1, Figure 1) within the distribution range of Malagasy leafless Vanilla species ( Figure 1). Madagascar is located in the Indian ocean, southeast of Africa, around 20°S and 47°E, in the intertropical zone. A rugged chain of mountains separates the island from north to south, into dry forests (in the west) and rain forests (in the east) (Waeber et al., 2015). Several factors shaped the island climatic conditions such as topography, landform, maritime influence, and prevailing wind conditions (Cornet, 1973). During the dry season from May to October, the "Alizé" brings humid air masses from the Indian Ocean to the east coast causing precipitations (Cornet, 1973).
These foggy masses move inland and drizzle in the elevated areas of the central domain. The winds become hot and dry beyond the mountains in the west, causing an accentuated dry season.
During the wet season from November to April, on the other hand, the "Mousson" brings heavy rains over most of the island (Cornet, 1973). As a result, Madagascar is ecologically heterogeneous and displays five bioclimatic regions: dry, subhumid, humid, montane, and subarid (Vences et al., 2009). These ecoregions F I G U R E 1 Map of studied populations, habitat types, and distribution of leafless Vanilla species in Madagascar. (a) Map shows the seven ecoregions of Madagascar based on WWF's ecoregions in Burgess et al. (2004). Circles indicate geographic positions of the 23 localities with Vanilla populations. (b) Map represents the distribution area of the five known leafless Vanilla species in Madagascar according to Portères (1954), Cribb and Hermans (2009) and Allorge-Boiteau (2013) host seven typical vegetation formations: spiny thicket, succulent woodland, dry deciduous forest, subhumid forest, lowland (humid) forest, ericoid thicket, and mangrove forest ( Figure 1a) (Burgess et al., 2004).

| Study species
Vanilla plants are long-lived semi-epiphytic lianas with both vegetative and sexual reproduction . Stems are thick and fleshy reaching in some cases 30 m in length, with roots growing at nodes. They generally bloom once a year, and inflorescences are in form of clusters of ephemeral flowers with a large labellum (Bosser & Lecoufle, 2011;Portères, 1954). In this study, we focused on five indigenous Malagasy Vanilla species (Figure 1b), among the seven botanically described leafless species in the SWIO region comprising Madagascar, Seychelles, Comoros, Mayotte and the eastern Africa coast and islands (Allorge-Boiteau, 2005Portères, 1954).
Because of similarities regarding flower morphological traits, they are grouped within "V. phalaenopsis" by Soto-Arenas and Cribb (2010). Populations belonging to this group grow along the Malagasy west coast where the forests are dry or spiny (spiny thickets, dry deciduous forests, succulent woodlands) except V. madagascariensis Rolfe also found in the subhumid forests and humid northeastern forests ( Figure 1b) (Bosser & Lecoufle, 2011;Cribb & Hermans, 2009;Portères, 1954).

| Population sampling
Between 2017 and 2020, 600 individuals from 22 populations were collected in Madagascar (Figure 1a). We used available occurrence data in the literature, including GBIF and Tropicos websites, but we have also prospected by interviewing local field botanists who are familiar with these species to choose these localities (Table S1).

| Microsatellite genotyping
DNA was extracted using the DNEASY ® plant kit from Qiagen. The 611 specimens were registered and stored as DNA samples in the BRC Vatel collection (La Réunion). Each individual was amplified and genotyped using seven microsatellite markers developed by Gigant et al. (2012) (Table S2). Polymerase chain reaction (PCR) with fluorescently labeled primers was performed in a total volume of 25 µl using Promega kit (Promega, 2007) by mixing 12.5 µl of GoTaq Colorless Master Mix 2X, 2 μl of DNA solution (5 < DNA<250 ng), 0.4 μM of each primer, and 8.5 μl of nuclease-free water. Amplifications were carried out under the following conditions: an initial denaturation step at 95°C for 2 min followed by 45 serial cycles of denaturation step at 95°C for 30 s, annealing at 57°C for 45 s, extension at 72°C for 1 min, and a final extension step at 72°C for 7 min. Genotyping of PCR fragments was performed by capillary electrophoresis with an automated sequencer 3130XL ABI Genetic Analyzer (Applied Biosystems). Allele sizes were scored using GENEIOUS v.7.0 (Olsen et al., 2014).

| Genetic diversity
Genotypic diversity of each population was examined by dividing the number of unique genotypes G or "genets" by the total number of individuals N or "ramets" (Halkett et al., 2005). Clones were identified using the multilocus analysis of clonality in GenAlEx 6.5 (Peakall & Smouse, 2006). Clones were, then, excluded from the dataset for genetic analysis. The multilocus exact test p-values based on Markov chain method (Guo & Thompson, 1992) in GENEPOP v.4.7.5 (Raymond, 1995;Rousset, 2008) were used to test for deviation from Hardy-Weinberg equilibrium (HWE) in each population, using the default parameters (1,000 dememorizations, 100 batches, and 1,000 iterations per batch). GENEPOP was also used to perform probability tests for assessing the linkage disequilibrium (LD) between all pairs of loci in each population. The Holm's sequential Bonferroni procedure (Abdi, 2010) or SB correction, feasible under the "p.adjust" function in R software V3.6.1 (R Core Team, 2019), was performed to correct the levels of statistical significance (p-value) for multiple hypothesis test of LD. The Waples method (Waples, 2015(Waples, , 2018, examining the correlation between the mean squared correlation of allele frequencies at different gene locus (r 2 ) and the product of two fixation index (FST) for pairs of loci, was applied across all populations to test for a Wahlund or population structure effect on LD (Wahlund, 1928). GenAlEx 6.5 was used to calculate the total number of alleles for all loci in each population (Na), the mean observed number of alleles per locus (Al), and the number of private alleles per population (Pa). As the number of alleles in a sample is highly dependent on sample size, the mean allelic richness (Ar) as suggested by Mousadik and Petit (1996) was computed using FSTAT 2.9.4. (Goudet, 1995). We determined the observed heterozygosity (H o ) and the mean expected heterozygosity under HWE (H e ) over all loci using the R package GENEPOP v.4.7.5 (Raymond, 1995;Rousset, 2008

| Genetic structure and differentiation
The "genets" dataset was used to estimate the global differentiation index (FST g ) among populations (Weir & Cockerham, 1984) and the pairwise genetic differentiation (FST p ) between pairs of populations using the GENEPOP package under R software. Analysis of molecular variance (AMOVA) (Excoffier et al., 1992) implemented in GenAlEx 6.5 (Peakall & Smouse, 2006) was conducted to assess the genetic differentiation level among and within populations and among populations from different habitat types. Each population was associated to its corresponding ecoregion of Madagascar based on geographic location. Assignment of multilocus genotypes to different clusters was inferred by conducting Bayesian clustering analyses implemented in STRUCTURE (Pritchard et al., 2000) on the "genet" dataset, with ten independent runs for each value of K (1 to 10 for the complete dataset), a burn-in period of 100,000 and 1,000,000 Markov chain Monte Carlo (MCMC) replications. To overcome sampling imbalance and thus correctly assign individuals to clusters, we used the model with admixture and independent allele frequencies and reduced the default alternative ancestry prior Alpha in STRUCTURE, as recommended by Wang (2017), with Alpha = 1/K. In our case, an Alpha equal to 1/10 was chosen, as the number of K tested was 10. An individual was considered to belong to a particular genetic group when at least 80% of its genome was assigned to that genetic group. The summary likelihood statistics ΔK (Evanno et al., 2005), which is the most used method to find the best K, was plotted. As the ΔK method alone is not recommended by some authors to determine the best clustering (Janes et al., 2017;Wang, 2017), the posterior probabilities of K, called "Ln P(X|K)," implemented in STRUCTURE were also examined (Pritchard et al., 2000). The best K usually corresponds to the highest value of mean Ln P(X|K) (Evanno et al., 2005;Pritchard et al., 2000;Rosenberg et al., 2001;Wang, 2017). Bar plot of genetic structuring obtained from STRUCTURE was generated using the POPHELPER package in R (Francis, 2017).  (Table S1). One herbarium specimen of each genetic group was deposited at the herbarium of the Botanical and Zoological Park of Tsimbazaza (N°51/20/MESupRES/ SG/DGRS/PBZT/Flore). Twenty to thirty mature flowers per population were randomly sampled. In total, 198 flowers were dissected and classical taxonomic morphometric parameters were assessed (Gigant et al., 2014;Gigant, De Bruyn, et al., 2016;Petersson, 2015;Portères, 1954;Soto-Arenas & Cribb, 2010). We directly measured in situ ten parameters using a digital caliper and a precision balance, that is, total flower weight (TFW), ovary weight (OWE), ovary length (OL), ovary width (OW), column length (CL), column width (CW), rostellum length (RL), rostellum width (RW), anther length (AL), and anther width (AW) (Figure 2). Thereafter, flower parts were digitalized using a canoScan LiDE 120. The resulting images were processed under IMAGEJ software (Abràmoff et al., 2004)  PCA allows, by equal weighting of each character, reducing the number of variables in fewer dimensions and therefore investigating the overall morphological patterns. Means and standard deviations of the ten most explanatory variables from PCA were computed for each genetic group. Differences in these floral traits between groups were detected using a Bonferroni-corrected pairwise t test. Multivariate analysis of variance (MANOVA) was also conducted to test the statistical significance of overall floral differences based on a priori groupings. All analyses were done with R software V3.5.1.

| Geographic and environmental characterization of populations
Recorded GPS coordinates (Table S1) were used to calculate the geographic distances between Vanilla populations in Madagascar.
Sample site elevation data were obtained from GPS (Table S1). To highlight the heterogeneity of habitats, 19 present-day bioclimatic variables (means over thirty years) downloadable from the WorldClim website (https://www.world clim.org/) were extracted for each sampling sites in Madagascar (Tables S3 and S4 (Tables S3 and S4) (Ribeiro et al., 2018). Pearson's correlation tests were performed to detect collinearity between variables, and a combination of variables with a correlation of less than 0.3 was chosen. Habitat heterogeneity between the 23 populations was shown by applying a PCA to these uncorrelated variables. Environmental dissimilarity between each pair of populations was obtained by calculating the Euclidian distances of the PC scores of the three first principal components (PCs).
As the landscape discontinuity could be an important factor to differentiation processes, we performed Monmonier algorithm analysis using BARRIER 2.2. (Manni et al., 2004) to identify where genetic barriers might occur. GPS coordinates of each population were entered to construct Voronoï tessellation that determines which populations are neighbors and adjacent. The pairwise genetic distance (FST p ) was used to identify the top eleven potential barriers.

| Effect of environmental, morphological, and geographic distances on genetic divergence
To detect possible patterns of IBD and IBE, we performed multiple matrix regression with randomization (MMRR) as implemented in the R function "MMRR" (Wang, 2013) on the 23 populations. This approach is similar to Mantel test but can incorporate several distance matrices, thus providing an interpretable result in the form of a multiple regression equation (Wang, 2013). In the case of MMRR, significance values for the regression coefficient (β) and coefficient of determination (R 2 ) were estimated by performing 9,999 randomized permutation procedure (Wang, 2013). MMRR was also applied for testing the correlation between geographic and environmental distances. Genetic differentiation between populations was primarily estimated using pairwise FST values (Weir & Cockerham, 1984) using the "genet dataset" after discarding null alleles using FreeNa (Chapuis & Estoup, 2007). The geographic distance matrix was obtained with the SP package in R (Pebesma & Bivand, 2020). For environmental parameters, we used the Euclidean distance calculated from PC scores as explained in the previous section. Additionally, one powerful approach called "distance-based redundancy analysis with randomization" Anderson, 1999), was applied using the "dbrda" function in the VEGAN package (Dixon, 2003) in R. Again, FST was tested against the geographic distances and population's PC scores from the PCA performed on the environmental data. Prior to IBD analysis, it was necessary to transform the geographic distances matrix into a continuous rectangular vector by principal coordinates analyses (PCoA) using the "pcnm" function included in the VEGAN package in R. Significance of the predictors was evaluated using multivariate F-statistics with 9,999 permutations using the "ANOVA" function in VEGAN. Three different types of dbRDA were performed: (a) a global dbRDA using all explanatory variables (geographic and environmental distances), (b) marginal tests evaluating independently the relationship between the genetic distance matrix and each of the two explanatory variables (geographic and environmental factors), and (c) a partial dbRDA (conditional test) estimating the relative contribution of one explanatory variable (geographic distance or environmental factors) to genetic differentiation, while controlling the effect of the other one. The amount of variance explained by the intersection of IBD and IBE was obtained by variance partitioning using the function "varpart" included in VEGAN package.
The adjusted coefficient of determination R 2 adj was calculated to evaluate the quality of the model. MMRR and dbRDA were also conducted on a reduced sampling of eight populations for which phenotypic data were available to test, in addition to IBD and IBE, a possible correlation between morphological distances and genetic distances (IBA). For that, a Pearson's correlation test on the 26 morphometric variables was carried out to choose a combination of uncorrelated variables (cor < 0.3).
The PC scores of the first three principal components resulting from the PCA of these selected variables were, then, used to calculate the phenotypic distance between the eight populations in order to perform MMRR and dbRDA.

| Genetic diversity
A total of 555 genets and 56 clones were identified among the 611 studied samples (ramets), representing approximately 9% of clonality. At the population level, the clonality rate was generally low (G/N > 0.7) except for AND population where genotypic diversity was only 0.27 (Table 1) (Slatkin, 2008). Therefore, all microsatellite loci were considered independent as shown by the original study of Gigant, De Bruyn, et al. (2016). A high frequency of null alleles ranging from 0.12 (HU03, HU07, HU08) to 0.26 (HU06) was detected.

| Genetic structure, differentiation, and barriers
The global FST value calculated (0.19) indicated a significant (p < .001) overall differentiation among populations. Pairwise FST was all significant (p < .001), with high amplitude values ranging from 0.01 between two northwestern populations (BBL and MSL) up to 0.60 between AND (central highland) and ANJ (southwestern) (Table S5). A strong linear positive correlation (Pearson correlation coefficient = 0.81, r-squared = 0.63; p < .001) was obtained between r 2 and the product of two FST values for pairs of loci, suggesting a substructure of the sample (Figure 3).
AMOVA estimated that 79% of the total genetic variation was found within populations, a moderate proportion (15%) was found among populations, and only 6% was found among ecoregions. The distribution of ∆K values in the Bayesian assignment of STRUCTURE showed a modal value at K = 4 and the maximum value of the posterior probability "Ln P(X|K)" corresponded to K = 7 ( Figure S1).
According to Wang (2017), with unbalanced sampling, it is preferable to choose the method of Pritchard et al. (2000) based on "Ln P(X|K)" to find the best K. Consequently, seven distinct genetic clusters were retained as the optimal K in STRUCTURE analysis (Figure 4b, c). If most populations were rather homogeneous in terms of genetic structure, some showed evidence of admixture, of which the most obvious were AKL (southwest) and CMK (northwest) (Figure 4b, c).
FST-based analysis of genetic barriers revealed eleven boundaries. Six of them were well supported with bootstrap values exceeding 60%. These genetic barriers separated, in majority, the seven different genetic groups obtained from STRUCTURE ( Figure 4b).

| Floral morphology differentiation
The first three principal components of PCA ( Figure S2 . Each individual is represented by a vertical bar partitioned into K-colored segments that represent the individual's probability of belonging to the cluster with that color. Corresponding species are indicated, according to population morphological analyses ( Figure 5) of the ten most discriminant variables varied significantly among all genetic groups, except for clusters 2 and 4, which presented many similarities as previously mentioned. The surface of the labellum was significantly different between all genetic groups.

| Geographic, environmental, and morphological effects on genetic differentiation
In   (Figure 6d). Analyses carried out on the reduced sample (eight populations) indicated no contribution of geographic distance on genetic distance (no IBD) (Table 3) and eliminated its correlation with environmental distance. However, MMRR and dbRDA revealed a significant association between genetic distance and morphological distance (PC1) and between genetic distance and environmental distance (PC1) (Figure 6e,f) ( Table 3). The PC1 axis from PCA of morphological data grouped together several variables related to flower size: TFW, PL, RLSL, ASL, OW, OWE, CL, and AL ( Figure S4). The PC1 axis from PCA on environmental data assembles a pool of ecological variables related to temperature (BIO8, BIO10, BIO5), elevation, pH, and percentage of silt ( Figure S5). However, environmental (PC1) and morphological predictors (PC1) were not significant after accounting for the influence of each other in partial dbRDA tests (Table 3), as they were highly intercorrelated (Figure 6g). The intersection IBE ∩ IBA constituted the most important contributor to genetic divergence explaining 45% of the total variation ( Figure 6h).

| Genetic diversity of leafless Vanilla populations
Numerous mechanisms can maintain genetic diversity, the most important of which are mutation, migration and balancing selection where population sizes play a major role (Frankham et al., 2010).
Genetic diversity organization in plants is also highly dependent on the life cycle, reproduction system, pollination, and seed dispersion mechanisms. For example, long-lived, allogamous taxa, with wind or ingested dispersed seeds, are more variable within populations than between, with high He (0.61-0.73) and lower FST (0.12-0.26) values, with an opposite organization for annual autogamous plants with gravity-dispersed fruits (Nybom, 2004;Ronfort et al., 2005). AMOVA results suggested that genetic variation was mainly (79%) allocated within Vanilla populations in Madagascar.
Significant deviation from HW equilibrium was detected for all populations with a strong significant heterozygote deficit (FIS > 0) for 22 populations. This could be due to the high frequency of null alleles detected for all loci (0.12-0.26), although the substructure of some populations with different sympatric species (as will be further discussed), as well as possible inbreeding due to geitonogamy (as demonstrated for a V. humblotii Rchb. f. population in Mayotte, (Gigant, De Bruyn, et al., 2016), could be other factors explaining these deficits.

| Genetic and morphological differentiation
At the scale of Madagascar, a strong significant population substructure was detected by the Wahlund effect test for overall individuals as suggested by Waples (2015Waples ( , 2018 (Figure 3). The microsatellite-based structuring corroborated this, by revealing seven well differentiated genetic groups (Figure 4b, 4c).  Table 2) and also as noted by Portères (1954). The presence of these two species in Madagascar has already been confirmed (Cribb & Hermans, 2009), although V. humblotii is supposedly endemic to the Comoros archipelago (Portères, 1954). The genetic groups with white flowers (C2, C4, C5, C7) could be differentiated mainly by the variation in their flower sizes (Table 2). Finally, the genetic group constituted by AND (cluster 7) was well differentiated and showed morphological traits different from the others that, however, have never been described in the literature. populations. These results, therefore, confirm possible sympatry and suggest that genetic exchanges are possible between sympatric species. Hybridization is a common phenomenon among Vanilla species Divakaran et al., 2006;Gigant et al., 2011;Nielsen & Siegismund, 1999), with an absence of interspecific genetic incompatibility, particularly between closely related species .

| Genetic discontinuities and speciation: forest and montane refugia, and riverine barriers
As genetic groups potentially corresponded to different species, diversification processes of leafless Vanilla populations in Madagascar could be interpreted as speciation mechanisms. Five mechanisms of speciation are recognized to be a pattern of diversification common for many taxa in Madagascar: ecological constraint influenced by bioclimatic disparities, western rainforest refugia, montane refugial diversification, riverine barriers, and watersheds (Vences et al., 2009).
Interestingly, a genetic diversity center was revealed in the populations from west of Madagascar (highest He and Ar values, Table 1).
Diversification could have resulted from a postglacial colonization from a western refuge area (Vences et al., 2009), to the north and the south, if considering the cline in diversity as the direction of spread (Hewitt, 1996). A similar scenario was suggested on the basis of paleoclimatic data for palm trees in the northeast of Madagascar (Rakotoarinivo et al., 2013), that have then recolonized the eastern corridor. The general pattern of genetic diversity showed a weak but significant latitudinal shift in the genetic diversity distribution (r He-Latitude = 0.47, p < .04) along the western coast of Madagascar.
In addition to the probable effect of climatic oscillation explained previously, the lowest genetic diversity detected in southern populations might be partially linked to population size reductions (bottlenecks), as small and isolated populations are characterized by a low genetic diversity and linkage disequilibrium among loci (Frankham et al., 2010;Ouborg et al., 2006). Indeed, populations in the south of F I G U R E 6 Multiple matrix regression with randomization (MMRR) analysis and distance-based redundancy analysis (dbRDA) based on 23 populations (a, b, c, d) and 8 populations (e,f,g,h). (a) MMRR between genetic distance (FST) and geographic distance (GD), (b) MMRR between genetic distance (FST) and environmental distance (ED), (c) MMRR between environmental distance (ED) and geographic distance (GD). Regression lines are drawn for the three models a, b, c. (d) shows variance partitioning results of dbRDA on GD and ED. Geographic distance (GD) comprises eleven continuous rectangular vectors from principal coordinates analyses. Environmental factors include PC scores of three first components of the PCA ( Figure S3). The overlapping zone represents the intersection between IBD and IBE. (e) MMRR between morphological distance (MD) and FST. (f) MMRR between ED and FST. (g). MMRR between MD and ED. Regression lines are drawn for the three models e,f,g. (h) shows variance partitioning results of dbRDA on MD and ED. Morphological factors include PC scores of the first three components of the PCA based on selected uncorrelated variables ( Figure S4). Environmental factors include PC scores of three first components of the PCA ( Figure S5). The overlapping zone represents the intersection between IBA and IBE TA B L E 3 Results of distance-based redundancy analyses (dbRDA) testing the effects of geographic distance (GD), environmental distance (ED), and morphological distance (MD) on genetic differentiation ( Madagascar are often exploited for medicinal use (Randrianarivony et al., 2017) and reduced by deforestation, as the two southwestern populations (ANJ and SAG) are found in unprotected areas close to villages. Low genetic diversity in fragmented habitats was also observed for some tropical tree populations in Africa (Andrianoelina et al., 2006(Andrianoelina et al., , 2009Farwig et al., 2007). Also, conjugated effects of deforestation and climate crisis, particularly emphasized in the south, could impact on seedlings recruitment (Moles & Westoby, 2004;Rasmussen, 2002).
Significant regions of genetic discontinuities were revealed with BARRIER analysis, between adjacent populations belonging to different genetic groups/species and indicating limited gene flow between these groups (Figure 4b). The most significant genetic barrier (0.23 < FST < 0.6) was observed between the genetic group C7 (AND), encountered in high altitude (up to 800 m) (Figure 4a, b) in a 49 ha conserved forest (Gould & Gabriel, 2015), and the others localized in low altitude, confirming probable differentiation by isolation. AND population could be the result of a "montane refugia speciation" supposing that population may have survived in high altitude during the climate shift, has adapted over time and remained isolated from the others (Fjeldsaå & Lovett, 1997;Vences et al., 2009). Moreover, the AND population showed a high clonality (73%) which probably explains the significant heterozygote excess (FIS < 0) detected (Halkett et al., 2005).
Rivers can constitute permanent geographic barriers that separated the continuous distribution range of populations leading to vicariant divergence (Vences et al., 2009). Barriers h, i, and k can be linked to rivers in the north (Figure 4a, b). To what extent gene flow can be impeded by riverine barriers will have to be further assessed when better knowledge will be obtained regarding leafless Vanilla species pollinators and dispersers.
Nevertheless, those riverine barriers might explain part of, but not all the genetic differentiation observed between populations/species.
The occurrence of IBD, IBE, and IBA was therefore also assessed.

| Combined effects of IBA and IBE as drivers of genetic divergence
Considering the geographic distribution of the genetic groups identified (Figure 4), some of them seem to be confined to well-defined biomes (e.g., C5, C6 and C7) and AMOVA showed a weak but significant (6%) genetic variation between ecoregions. Tested separately on 23 populations, IBD and IBE played a significant role (R 2 adj = 45.42% and R 2 adj = 45.79%, respectively) on genetic differentiation. However, partial dbRDA showed a relatively weak (R 2 adj = 3.4%), but significant influence of environmental factors on genetic differentiation and no contribution of IBD. The combined effect of geographic and environmental distance (IBD ∩ IBE) was the most explanatory (R 2 adj = 42.8%). Indeed, spatial autocorrelation was detected between geographic and environmental distance (Figure 6c). This can be explained by the presence of clear ecoregions determined by the presence of a central mountain range and bioclimatic differences between west/east and north/south, which can lead to ecogeographic constraint speciation (Vences et al., 2009). The analysis of the reduced dataset (eight populations) revealed no IBD and no correlation between IBD and IBE. The suppression of this correlation allowed the pure contribution of these processes to be determined (Wang & Bradburd, 2014). Results from this reduced dataset confirmed the significant occurrence of IBE (R 2 adj = 52.51%) (with most discriminant variables being related to temperature, elevation, and soil pH) (Table 3). Moreover, significant IBA (R 2 adj = 50.06) (concerning flower size-related measures) was also revealed in this dataset. But, there was a strong correlation between environment and morphological variables, making it difficult to disentangle their relative contribution to the genetic differentiation. Two evolutionary patterns can possibly explain this situation: Either IBA of floral traits could indicate pollinator-mediated ecological selection, or ecological isolation could have led to drift-induced morphological variations in flower size (Waser & Campbell, 2004). Pollinator-mediated divergent ecological selection acts on flower traits (size, color, shape, odor…), therefore promoting isolation by adaptation (IBA) via floral isolation ( Van der Niet et al., 2014). In orchids, pollinator specialization is common (Darwin, 1859;Peter & Johnson, 2014;Tremblay, 1992) and requires association between floral characteristics and insect traits (Micheneau et al., 2009;Petersson, 2015;Tremblay, 1992). For example, an association between spur length of some Angraecoid species and proboscis length of pollinators has been observed in Madagascar (Micheneau et al., 2009). Flower-size variations in Vanilla species could be indicative of a pollinator specialization. Indeed, as noticed in Costa Rica, given the major size differences observed between V. planifolia and a newly described species V. sotoarenasii, it is unlikely that the same bee species can perform pollination efficiently in both species (Azofeifa-Bolaños et al., 2017). Similarly, in Peru, small size bees (Melipona sp., Euglossa sp.) were shown to be unable to remove pollen from the large flowers of V. grandiflora, as opposed to Eulaema meriana (Householder et al., 2010;Lubinsky et al., 2006). Liotrigona mahafalya. These three bees have a north-west-south distribution (Pauly et al., 2001) matching the combined distributions of western Vanilla species. Another potential bee pollinator was also indicated based on floral morphology traits, Lithurgus pullatus which has a disjunct southwestern and northeastern distribution (Pauly et al., 2001). Moreover, in Mayotte, South Africa and Madagascar, leafless Vanilla species (V. humblotii, V. roscheri, V. bosseri, respectively) all have potential pollinator species from the Allodapini group (Gigant et al., 2014;Gigant, De Bruyn, et al., 2016;Petersson, 2015).
In addition to bee species, the sunbird Nectarinia coquerelli was also identified as visitor and putative pollinator of V. humblotii in Mayotte (Gigant, De Bruyn, et al., 2016). At least one to two species belonging to the same genus, Nectarinia notata and Nectarinia souimanga, were observed in 31 dry forest sites in Madagascar (Raherilalao & Wilmé, 2008). High interannual variation in precipitation was observed in the west of Madagascar, which is associated with unpredictable patterns of flowering and fruiting (Dewar & Richard, 2007). This is consistent with the generalist hypothesis, at least for the western species: Some pollinators would compensate for the absence of the others to ensure species reproduction. Our results showing the existence of possible hybrids between some sympatric species in the west of Madagascar reinforces this hypothesis, as also demonstrated for two sympatric leafless Vanilla species in Puerto Rico (V. barbellata and V. claviculata) (Nielsen, 2000) for which the occurrence of hybrids suggested the same pollinator. In the genus Vanilla, in America, the known pollinators of wild species of the V. planifolia and V. pompona groups are similar, and they are all orchid-bee species from the genera Euglossa and Eulaema (Andriamihaja et al., 2020).
In plants, ecological isolation, apart from pollinator-mediated divergent selection, is mainly based on the selection against migrants Nosil et al., 2005). Our results showed that discriminant environmental variables were related to temperature, altitude, and soil (pH). Altitude influence was previously discussed in the "montane refugia speciation" possibly involved for the high-altitude population AND. Soil pH can affect mycorrhizal fungi composition (van Aarle et al., 2002;Porter et al., 1987). For orchids, establishment and growth is highly dependent on suitable mycorrhizal fungi for seed germination in addition to suitable pollinator populations for reproduction (Ackerman et al., 1996;Rasmussen, 2002;Rasmussen et al., 2015). In Puerto Rico, Cuba, and Costa Rica, different mycorrhizal fungi were detected in different sites associated with different Vanilla species, and fungal specificity for Vanilla seed germination and growth was shown (Porras-Alfaro & Bayman, 2007). Characterizing the mycorrhizal fungi in each of the studied populations could allow to verify this, as done on orchids in the Central Highlands of Madagascar (Yokoya et al., 2015). Species differentiation may also be related to the different phenology of the species, influenced by climatic and ecological conditions such as temperature and altitude (Petrauski et al., 2019;Sobel et al., 2009).  -Boiteau, 2005-Boiteau, , 2013Portères, 1954). V. decaryana is reported to flower in December-January (Allorge-Boiteau, 2013; Portères, 1954). Although the leafless Vanilla species V. claviculata and V. barbellata were shown to have overlapping phenologies in Puerto Rico, facilitating hybridization (Nielsen, 2000), differences in flowering periods with another sympatric species (V. dilloniana) have also been reported (Nielsen & Siegismund, 1999).
All results combined, the present study suggested a pattern of speciation possibly driven by the existence of riverine barriers and forest and montane refugees, in combination with isolation by ecology, as a result of adaptation to local habitats, possibly linked to local diversity in mycorrhizal fungi, and differences in flowering phenology, although pollinator-mediated divergent selection in these Vanilla species could not be ruled out and might depend on the species concerned. Pollinator specificity could also be a consequence of floral differentiation consecutive to ecological speciation. Our results also showed the efficiency of a population genetics approach using highly variable microsatellite markers and the importance of large sample sizes for reliable estimates of the genetic relationship between recently diverged species, as shown for Carappa spp. (Duminil et al., 2012) and Coffea spp. (Razafinarivo et al., 2013).
This study is the first to provide an overview on the patterns of population genetic and morphological differentiation and to address issues related to the evolutionary mechanisms of several natural leafless Vanilla populations from Madagascar. It is nevertheless necessary to point out that this work has certain limitations, including the limited number of polymorphic microsatellite markers used, the small sample sizes of some populations (MAN, SAG, ZVB) and the absence of morphological data of one identified genetic group.
The presence of new and significantly different genetic and morphological groups in the east, not yet botanically described, is interesting and will require further sampling and genetic studies in the eastern region, which is not specifically described as harboring leafless Vanilla species (Allorge-Boiteau, 2005;Cribb & Hermans, 2009;Portères, 1954). Further studies on the Madagascar species reproduction biology (incompatibility, phenology, pollinators, seed dispersion, mycorrhizal fungi…) are needed. The Madagascar leafless Vanilla species will also need to be compared to sister species from the SWIO area to gain a broader phylogeographical view of their evolution.

ACK N OWLED G M ENTS
We wish to thank Hoby, Alemao, Gabriel, Miarantsoa, and all guides

CO N FLI C T O F I NTE R E S T
The authors declare no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
Morphological measurement data and microsatellite data used to produce Figure 4, Figure 5, and Table 2