Discrepancies between genetic and ecological divergence patterns suggest a complex biogeographic history in a Neotropical genus

Abstract Phylogenetic patterns and the underlying speciation processes can be deduced from morphological, functional, and ecological patterns of species similarity and divergence. In some cases, though, species retain multiple similarities and remain almost indistinguishable; in other cases, evolutionary convergence can make such patterns misleading; very often in such cases, the “true” picture only emerges from carefully built molecular phylogenies, which may come with major surprises. In addition, closely related species may experience gene flow after divergence, thus potentially blurring species delimitation. By means of advanced inferential methods, we studied molecular divergence between species of the Virola genus (Myristicaceae): widespread Virola michelii and recently described, endemic V. kwatae, using widespread V. surinamensis as a more distantly related outgroup with different ecology and morphology—although with overlapping range. Contrary to expectations, we found that the latter, and not V. michelii, was sister to V. kwatae. Therefore, V. kwatae probably diverged from V. surinamensis through a recent morphological and ecological shift, which brought it close to distantly related V. michelii. Through the modeling of the divergence process, we inferred that gene flow between V. surinamensis and V. kwatae stopped soon after their divergence and resumed later, in a classical secondary contact event which did not erase their ecological and morphological differences. While we cannot exclude that initial divergence occurred in allopatry, current species distribution and the absence of geographical barriers make complete isolation during speciation unlikely. We tentatively conclude that (a) it is possible that divergence occurred in allopatry/parapatry and (b) secondary contact did not suppress divergence.


| INTRODUC TI ON
The term "Ecological speciation" refers to a speciation process in which ecological divergence precedes, and causes, reproductive isolation, implying that at least the early stages of ecological divergence occur in presence of gene flow (Nosil, 2012;Rundle & Nosil, 2005).
In plants, this has been observed in model species such as poplar (Christe et al., 2017 and ref. therein), sunflower (Renaut, Owens, & Rieseberg, 2014), and spruce (de Lafontaine, Prunier, Gérardi, & Bousquet, 2015). That said, even though natural interspecific gene flow commonly occurs in flowering plants (Mallet, 2005), including tropical trees (Leigh et al., 2004), species integrity can persist unaffected (Petit & Hampe, 2006). In other instances, gene flow can hamper divergence and lead to cases of incomplete speciation (Mallet, 2008). Gene flow among otherwise well-defined species occupies a special place in evolutionary biology, and genera including closely related species sharing genetic variants through gene flow-as opposed to sharing alleles inherited from their common ancestor-deserve the special name of "species complexes". Cases involving ecologically divergent species (i.e., species with nonoverlapping ecological niches) are particularly interesting because they show that ecological radiation may occur, or be maintained, in spite of gene flow (Givnish, 2010;Humphries & Winker, 2011;Nosil, Egan, & Funk, 2008;Nosil, Funk, & Ortiz-Barrientos, 2009;Petit, Bodénès, Ducousso, Roussel, & Kremer, 2004;Zheng & Ge, 2010). There are currently only few described cases of (ecological) divergence with gene flow, and the frequency of such events is still a matter of debate (Nosil, 2008).
Molecular methods are useful to recount the evolutionary history of species and more specifically to detect interspecific gene flow. Highly polymorphic genetic markers, often in a combination of neutral nuclear and chloroplast markers (e.g., microsatellites) and powerful statistical methods (e.g., Bayesian clustering) facilitate this task (Field, Ayre, Whelan, & Young, 2010). In theory, because they have different evolutionary dynamics (Powell, 1983), chloroplast and nuclear DNA should provide complementary information. Such data can help to describe gene flow and speciation processes (Lexer & Widmer, 2008) and to detect hybrid individuals (Rieseberg & Soltis, 1991). Gene flow is difficult to distinguish from shared ancestral variation as both scenarios produce similar patterns of allele sharing (Muir & Schlötterer, 2005;Won & Hey, 2005). The coalescent theory (Kingman, 1982) offers a strategy to disentangle the effects of the different forces that led to the currently observed patterns of molecular diversity (Bowie, Fjeldså, Hackett, Bates, & Crowe, 2006;Rosenberg & Nordborg, 2002). Divergence with gene flow is suitably inferred by Approximate Bayesian Computation (ABC; Bertorelle, Benazzo, & Mona, 2010;Csilléry, Blum, Gaggiotti, & François, 2010). From the seminal paper on maize by Ross-Ibarra, Bernatchez, & Gagnaire (2009), only a few studies have used modeling to assess ecological speciation scenarios; after Strasburg and Rieseberg (2013), who advocate for further development of analytical approaches for distinguishing primary from secondary contact, several works on this topic have been published, focused on model species such as the European oak species complex (Leroy et al., 2017) and fish species (Rougeux, Bernatchez, & Gagnaire, 2017).
We have studied the intensity of gene flow during and after speciation through a combination of coalescent modeling and ABC inference in the Neotropical Virola genus, which shows a large ecological and species diversity (Wilson, 2004). The resolution of Virola phylogeny is not so well-defined (Steeves, 2011), suggesting that some groups of species are related closely enough that they may be connected by interspecific gene flow. Given that some clades with poor resolution include species with contrasting ecological preferences, this genus is a good model to assess the amount of interspecific gene flow occurring between closely related, ecologically divergent species. An opportunity to do this is provided by the Guiana shield, the northernmost stable part (or craton) forming the South American tectonic plate and by a sympatric trio of species of the Virola genus thriving there: terra firme-dwelling V. michelii Heckel and V. kwatae Sabatier, morphologically and ecologically very similar to each other (terra firme being the rainforest that is not inundated by flooded rivers), and the much more distinct V. surinamensis Warburg, which occupies seasonally flooded areas in combination with the former two species (Baraloto, Morneau, Bonal, Blanc, & Ferry, 2007;Macía, 2011;Sabatier, 1997). Until recently, herbarium vouchers (CAY, P, NY, INPA) of V. kwatae were identified as V. michelii; nowadays, based on morphological traits, V. kwatae is recognized as an independent species (Sabatier, 1997).
Because of the very close morphological and ecological resemblance of V. kwatae to V. michelii, we set out to assess whether the two species evolved with sustained gene flow, and used the third co-occurring species of the same Virola sub-clade, V. surinamensis, as an outgroup. The preliminary results we obtained (see Section 3) forced us to review the relationships among the tree species: it turned out that, in spite of sizeable morphological differences and distinct ecology, V. surinamensis, and not V. michelii, is a sister species to V. kwatae. This implies that the divergence between the two sister species involved at least one ecological shift. Did the two sister species diverge, both morphologically and ecologically, while undergoing gene flow? We considered three scenarios: (a) speciation-with-gene-flow (Feder, Egan, & Nosil, 2012), implying that gene flow was concomitant with speciation; (b) secondary contact, according to which gene flow would have resumed after completion of divergence, which would be maintained in spite of secondary gene flow (Barton & Hewitt, 1985); (c) divergence without gene flow.

| Species ecology and phylogenetic relationships
The Virola genus belongs to Myristicaceae (the nutmeg family), a widely distributed pantropical plant family, member of Magnoliales, and one of the oldest families of flowering plants (Cronquist, 1981;Doyle, Manchester, & Sauquet, 2008;Doyle, Sauquet, Scharaschkin, & Le Thomas, 2004). The Neotropical genus Virola is one of ten most common Amazonian tree genera in order of abundance (ter Steege et al., 2013), comprising between 45 (Wilson, 2004) and 60 (Steeves, 2011) species and having its center of diversity in Western Amazonia (Holbrook, Loiselle, & Clark, 2006;Queenborough, Burslem, Garwood, & Valencia, 2007). The genus' phylogenetic resolution is incomplete (Steeves, 2011). Virola species are relatively common dioecious canopy trees in the Guiana shield and throughout the Amazon (ter Steege et al., 2013). According to data from 76 1-ha plots inventoried in French Guiana (Guyadiv database, http://vmceb agn-dev.ird.fr, D. Sabatier & J-F. Molino, unpublished), average population densities of V. surinamensis, V. kwatae and V. michelii are about 0.1, 0.5, and 3 adult trees/ha, with up to 7, 5, and 12 trees in a single plot, respectively. These three sympatric species have overlapping phenology patterns; they mostly flower during the dry season (July to November) and fruits ripen during the rainy season (November to June; Sabatier, 1997;ter Steege & Persaud, 1991). They are insectpollinated and have vertebrate-dispersed seeds. Seeds have a limited dormancy, and seedlings form large cohorts growing quickly in forest gaps; surviving saplings become upper-canopy or emerging trees (Rodrigues, 1980). V. michelii and V. kwatae are terra firme forest specialists, while V. surinamensis is found in seasonally flooded bottomland forests (Baraloto et al., 2007;Sabatier, 1997). Until recently, specimens of V. kwatae were identified as V. michelii in the herbaria and floras. Although these two species occupy the same terra firme forest environment, some morphological characters (such as leaf shape, abaxial lamina indumentum, fruit size and indumenta and tree and buttress stature) eventually permitted to distinguish them, leading to the description of V. kwatae as a novel species (Sabatier, 1997).
Moreover, V. kwatae fruits and seeds are always significantly larger than those of its two congeners (size coefficients of seed length: 1.3 and 1.5 compared to V. michelii and V. surinamensis, respectively); a feature probably linked to its high dispersal rate by spider monkeys (Sabatier, 1997).

| Sampling and DNA extraction
Leaf or cambium samples were collected from 93 Virola reproductive individuals belonging to several populations located in old, undisturbed stands of the eastern Guiana Shield (Table 1 and Figure 1). Sampled trees were separated by at least 100 m (to avoid sampling related trees) within each sampling site. We sampled small numbers of trees from a large number of sites for each species to cover as much as possible of the regional diversity for each species (Barthe et al., 2017). Whenever possible, we sampled stands where at least two of the three species co-occur. In these areas, terra firme and bottomland habitats are intermingled, with habitat turnover typically occurring over shorter distances than seed dispersal and gene flow (Audigeos, Brousseau, Traissac, Scotti-Saintagne, & Scotti, 2013;Brousseau, Bonal, Cigna, & Scotti, 2013). Botanical identification was performed on a subset of the samples (see Table 1) using the presence or absence of several specific characters (Sabatier, 1997); the remaining samples, although tagged with a putative species identity at sampling, were left unidentified for the time being. No recombined character assemblage or intermediate morphology was observed in the individuals with full botanical identification. For all samples, total genomic DNA was extracted according to the cetyltrimethylammoniumbromide (CTAB) procedure from Doyle and Doyle (1987) or an Invisorb DNA Plant HTS 96 kit (Stratec) according to the manufacturer's instructions. DNA quality was checked by spectrophotometry and by agarose gel electrophoresis. TA B L E 1 Sampling sites. The number of samples per species and per site is indicated. Numbers in parentheses indicate the number of samples with certified botanical identification (the remaining samples had putative identification in the field)

| Chloroplast DNA
We used two universal chloroplast sequence markers: trnH-psbA and trnC-ycf6 (Shaw et al., 2005). PCRs were performed in a 15µl reaction volume containing 10 ng of DNA, 1× ThermoPolBuffer ® DNA sequences were aligned with CodonCode Aligner v. 3.0 (CodonCode) and Bioedit v. 5.0 (Hall, 1999). Both forward and reverse sequences were checked carefully by eye for all basecalling errors. Alignments were adjusted manually to minimize the number of gaps. After base calling was completed, the sequences obtained on both strands were assembled using an R script (R Core Development Team, 2018) available from Dryad. Indels were removed from further analysis. Because in general no recombination occurs in chloroplast DNA, sequences of the two loci (trnH-psbA and trnC-ycf6) were assembled and treated as a single locus.
Full sequences for both regions were obtained for 78 out of the 93 individuals originally sampled.
showing the sampling sites of the trees used in this study. The position of each pie diagram corresponds to sampling site coordinates (see Table 1); the area of each pie chart is proportional to total sample size (Table 1); the area of each slice in a pie chart is proportional to the number of samples for each species (Table 1).

| nSSR Bayesian cluster analysis and cpDNA haplotype network
To identify genetic clusters corresponding to botanical species, and to evaluate their genetic relationships, we analyzed the ninelocus nSSR data set (full 93-individual sample) using the Bayesian clustering method implemented in StruCture version 2.0 (Pritchard, Stephens, & Donnelly, 2000) to infer the most likely number of genetic clusters K and to estimate cluster membership coefficients for each individual. Ten independent runs were performed for each K between 1 and 10, with an admixture model (to allow for individuals whose ancestry is in more than one species) and correlated allele frequencies, and without prior assumptions on sample identity, 200,000 MCMC iterations and a 50,000 iterations burnin period. We then selected the most plausible number of clusters K using both the approach based on lnP(D) (Pritchard et al., 2000) and the ∆K-based approach (Evanno, Regnaut, & Goudet, 2005).
Results from each batch of ten runs from the same K were summarized using CLUMPP (Jakobsson & Rosenberg, 2007), and the results were treated graphically with DISTRUCT (Rosenberg, 2004; two V. michelii and three V. surinamensis samples were discarded for clustering analyses because they had data for fewer than six nSSRs). We assigned individuals to a cluster if their admixture coefficient value was above the 0.90 threshold for a given inferred ancestral cluster. Following the "blind genetic survey" strategy (Duminil, Caron, Scotti, Cazal, & Petit, 2006), we used the specimens with firm botanical identification to assign Bayesian clusters to botanical species. All specimens having been left as "unidentified" or having a putative identification (see above), and assigned to a given cluster, were then taken as belonging to the botanical species corresponding to that cluster.

| Genetic diversity and genetic differentiation among species
For both the nine-locus nuclear microsatellite data set and chloro-

| Coalescent modeling
We used a combination of coalescent modeling and ABC parameter estimation to test whether gene flow occurs, or occurred, between V. surinamensis and V. kwatae clusters. V. michelii was not included in coalescent modeling because the strong level of genetic divergence with the two other species suggests a much older speciation event, which was not contemporary to the V. surinamensis/V. kwatae pair.
The speciation process was modeled by the following historical events (from past to present, see Figure  In addition, we built two additional composite parameters: the difference DT 1 between t DIV and t START and the difference DT 2 between t START and t STOP. It is important to note that all parameters are scaled to mutation rate (locus −1 generation −1 ), which is also estimated in the model. Absolute value should not be taken as strictly informative; ratios, differences, and comparisons are, on the contrary, entirely meaningful.

| Genetic variability
All the ten nuclear microsatellites were polymorphic in all clusters/ species. Allelic richness (averaged over loci) varied between 14.0 for V.
kwatae and 17.7 for V. surinamensis ( In the automated Bayesian assignment analysis performed by STRUCTURE, the highest posterior probability was obtained for Differentiation was significant for all pairs of clusters/species (Table 3)   Coalescent/ABC analyses were applied to the V. kwatae/V. surinamensis pair. Table 4 reports the medians of the posterior distributions for population parameters estimated under the ABC framework.
Posterior distributions of the simple and composite (scaled) event time parameters, that is, t DIV , t START, and t STOP , plus the differences in time between t START and t DIV (DT 1 ) and between t STOP and t START (DT 2 ) scaled by the mutation rate, display well-defined peaks, distinct from the priors (Figure 6), thus indicating that our data set was informative relative to the estimation of the parameters. In the goodness-of-fit test, the probability of a nonrandom deviation of the estimated parameters from the observed was p = .21. In the principal component analysis, the vector of empirical summary statistics fell well within the cloud of s simulated summary statistics (Figure 7).
Scaled effective population size was much higher for V. surinamensis than for V. kwatae, with medians 2.74 × 10 2 and 5.38 × 10 −2 , respectively, with a four orders-of-magnitude difference (Table 4) (Table 4), the two species would have spent, very roughly, a quarter of their postdivergence biological history without gene flow, then another quarter with, and then again a half without.
Notice that, by construction (Figure 2), t STOP + DT 2 + DT 1 = t DIV . As a proof of the method of estimation, if we sum median estimators of the terms on the left hand of the equality, we obtain 5.78 × 10 7 + 3.60 × 1 0 7 + 2.43 × 10 7 = 1.18 × 10 8 , not very different from the median estimator of t DIV (1.01 × 10 8 ), which is the right-hand side of the equality.
Gene flow was asymmetric, with larger flow from V. kwatae to V. surinamensis than the other way round.

| D ISCUSS I ON
A combination of analyses on nuclear SSR markers and chloroplast sequences allowed us to infer the genetic relationships among three sympatric species in the Virola genus and to propose a model for the history and strength of interspecific gene flow.
Based on our data, phylogenetic reconstruction indicates that the most recently described species, V. kwatae (previously undistinguished from V. michelii), has possibly diverged from V. surinamensis, in spite of its ecological and morphological similarity to V. michelii.
We then focused on the evolutionary relationships between the two putative sister species. The smaller (scaled) effective population F I G U R E 3 Chloroplast haplotype network. Disk sizes represent the number of copies of each haplotype in the sample. The number of segments between two disks represents the number of mutational steps that separate the corresponding haplotypes. Each color corresponds to a Bayesian cluster: black/V. kwatae; gray/V. surinamensis; white/V. michelii size in V. kwatae than V. surinamensis, estimated by coalescent analysis, is in agreement with the relative size of the species' ranges.
The main outcome of coalescent modeling is that the two species underwent intermittent genetic exchange throughout their postdivergence history, with gene flow starting and then stopping at least once in the geological past. This appears to be a rather common F I G U R E 4 Identification of the best number of groups (K) in the automated Bayesian assigned as performed by struCture. (a) Values of Evanno's ΔK statistic (y-axis) as a function of K (x-axis); (b) values of the logarithm of posterior probabilities (ln P(D), y-axis) as a function of K (x-axis). In both cases, the best value is K = 3

F I G U R E 5
Individual Q values (i.e., probabilities of assignment to a given group) obtained in the automated Bayesian assigned as performed by struCture. Although the best K value is, unequivocally, K = 3, we also report here Q values for K = 2, to highlight the proximity of the clusters corresponding to V. surinamensis and V. kwatae.  (Loubry & Puig, 1994;Sabatier, 1997) and because turnover between the habitats they occupy occurs over short distances (few tens to few hundred meters) relative to dispersal distances, which can be large for pollen and seeds in tropical trees (Dick, Etchelecu, & Austerlitz, 2003) and particularly in some Virola species (Hardy et al., 2006). In addition, flower morphologies are similar, and at least, V. surinamensis is pollinated by multiple generalist pollinators (Gonçalves Jardim & Gomes da Mota 2007). Individual BD64 could be interpreted as an early-generation hybrid (Pritchard et al., 2000), suggesting that interspecific mating is possible, even though our analyses were not specifically designed to detect hybridization; however, current mating events do not seem to contribute to gene flow, which-according to our coalescent model-has stopped far back in the past. The clear ecological divergence between the two species and their overlapping distribution in the Eastern Guiana shield would a priori suggest a process of ecological speciation (Schluter, 2001), in which endemic V. kwatae would have sympatrically diverged from widespread V. surinamensis; yet on the contrary, the posterior distributions of event times and of composite parameters DT 1 , DT 2 support intermittent secondary contact: DT 1 >> 0 suggests that gene flow did not begin until long after divergence, and DT 2 >> 0 suggests that gene flow stopped long time ago. Therefore, our modeling results support a scenario of allopatric speciation, followed by fluctuating secondary contact: There may actually have been more than one stop-and-go events, but our data probably lack the necessary resolution to detect them. Speciation with continuous gene flow is possible (Dettman, Sirjusingh, Kohn, & Anderson, 2007;Kondrashov & Kondrashov, 1999;Rice & Hostert, 1993) and it has been theoretically (see Nosil & Crespi, 2004) and in a natural scenario (Lin, Ziegler, Quinn, & Hauser, 2008) shown between populations of the same species and at the interspecific level (Kremer et al., 2002;Whittemore & Schaal, 1991), but unambiguous examples of sympatric speciation are rare in nature (Coyne & Orr, 2004) ; many cases of divergence with gene flow discussed in the literature involve a phase of allopatry, especially in geographic zones where populations have been isolated for a long time, for example, during glaciations. The case we report here appears to belong to the latter type.
Similar levels of interspecific genetic differentiation have been observed in ecologically diverse species complexes, and particularly in oaks, with F ST varying between 0.011 and 0.378 between Q. petraea and Q. robur, which also occupy different (dry vs. wet) habitats (Neophytou, Aravanopoulos, Fink, & Dounavi, 2010), and have possible undergone cycles of geographical connection and separation through glacial cycles (Leroy et al., 2017). Could it be that the patterns of intermittent gene flow we observe in the Virola genus have been determined by climatic events? To tentatively answer this question, we can attempt a crude estimation of the timing of events reported here. If we assume an average value for SSR mutation rates of 10 -4 and a generation time of 50 years (Barthe et al., 2017), we estimate the following times in years: t DIV = 505,000; t START = 428,000; t STOP = 289,000 (example: t STOP = t STOP / µ = 5.78 × 10 7 /10 −4 = 5,780; this, multiplied by 50, yields 289,000 years).
These estimates of divergence time and of onset and stop of gene flow between V. surinamensis and V. kwatae support the hypothesis that these events occurred during the Pleistocene climatic cycles, perhaps due to cycles of disjunction and merger of their ranges and/or their habitats (Bush & de Oliveira, 2006;Colinvaux, De Oliveira, & Bush, 2000;Rull,  Abbreviations: DT 1 , difference between t START and t DIV ; DT 2 , difference between t STOP and t START ; m (kwa > sur), gene flow (genes genes −1 generation −1 ) from kwa to sur; m (sur > kwa), gene flow (genes genes −1 generation −1 ) from sur to kwa; N e (kwa), effective population size for kwa; N e

ACK N OWLED G M ENTS
The authors wish to thank Hope Draheim and Christopher W.

DATA AVA I L A B I L I T Y S TAT E M E N T
Original nuclear SSR and chloroplast sequence data, along with provenance, species identification and identification status of each voucher, have been deposited on DRYAD (https://doi.org/10.5061/ dryad.vp65h).