Hybrid zone of a tree in a Cerrado/Atlantic Forest ecotone as a hotspot of genetic diversity and conservation

Abstract The Cerrado, the largest Neotropical savanna, and the Brazilian Atlantic Forest form large ecotonal areas where savanna and forest habitats occupy adjacent patches with closely related species occurring side by side, providing opportunities for hybridization. Here, we investigated the evolutionary divergence between the savanna and forest ecotypes of the widely distributed tree Plathymenia reticulata (n = 233 individuals). Genetic structure analysis of P. reticulata was congruent with the recognition of two ecotypes, whose divergence captured the largest proportion of genetic variance in the data (F CT = 0.222 and F ST = 0.307). The ecotonal areas between the Cerrado and the Atlantic Forest constitute a hybrid zone in which a diversity of hybrid classes was observed, most of them corresponding to second‐generation hybrids (F2) or backcrosses. Gene flow occurred mainly toward the forest ecotype. The genetic structure was congruent with isolation by environment, and environmental correlates of divergence were identified. The observed pattern of high genetic divergence between ecotypes may reflect an incipient speciation process in P. reticulata. The low genetic diversity of the P. reticulata forest ecotype indicate that it is threatened in areas with high habitat loss on Atlantic Forest. In addition, the high divergence from the savanna ecotype suggests it should be treated as a different unit of management. The high genetic diversity found in the ecotonal hybrid zone supports the view of ecotones as important areas for the origin and conservation of biodiversity in the Neotropics.


| INTRODUC TI ON
Understanding the processes that underlie genetic divergence between lineages is one of the major goals of evolutionary biology. The speciation process occurs along a continuum of genetic divergence where the incipient lineages often undergo a period of partial reproductive isolation until they become fully isolated (Christie & Strauss, 2018;Dufresnes et al., 2019;Kopp & Frank, 2005;Wu, 2001). It is frequently recognized that most genetic divergence during speciation occurs owing to allopatric divergence, where there is little or no gene flow between geographically isolated populations (Coyne, 1992). However, a growing body of evidence suggests that speciation with gene flow is more common in nature than previously thought (Choi et al., 2019;Feder et al., 2012;Hanson et al., 2016;Rifkin et al., 2019). The genomic divergence under speciation with gene flow can be very different from that of allopatric divergence because gene flow can homogenize the genomic background, while islands of divergence may result for loci under divergent natural selection (Feder et al., 2012). In speciation with gene flow models, divergence between incipient lineages is mainly driven by two processes, divergent selection that favors the divergence of groups of individuals with extreme traits, and nonrandom or assortative mating (Servedio, 2016).
Differential selection and nonrandom mating may lead to an isolation by environment (IBE) pattern, in which the association between genetic and environmental distance is stronger than that between genetic and spatial distance between populations (Wang & Bradburd, 2014). IBE is agnostic to the cause of divergence and rather a description of the pattern since many processes such as selection against immigrants, reduced hybrid fitness, and biased effective dispersal can generate a pattern of IBE (Wang & Bradburd, 2014). Under IBE, even neutral markers are expected to show some level of correlation with the environmental distance due to the impact of differential gene flow among ecologically similar or dissimilar habitats (Nosil et al., 2008(Nosil et al., , 2009. Disentangling IBE from isolation by distance (IBD), where lineages diverge due to limited dispersal of individuals, may be statistically difficult since the environmental distance and the spatial distance are often correlated (Nadeau et al., 2016). However, evaluating the relative importance of IBE versus IBD in the genetic divergence can give some insights into the possible causes of evolutionary divergence of lineages.
In the speciation process, some degree of hybridization may be commonly found. Hybridization, estimated to occur in 25% of plant taxa (Mallet, 2005), can create new gene combinations in later generation hybrids, lead to new species through hybrid speciation (Abbott, 2017;Abbott et al., 2013;Maguilla & Escudero, 2016;Paun et al., 2009), and influence the patterns of genetic variability of species through genomic introgression (Francisco et al., 2018;Gompert et al., 2017;Rieseberg & Carney, 1998). Furthermore, hybridization can lead to the disruption of reproductive barriers between incipient lineages reverting the speciation process (Abbott et al., 2013;Soltis & Soltis, 2009) or can lead to extinction through genetic or demographic swamping (Rhymer & Simberloff, 1996;Todesco et al., 2016). Therefore, the study of gene flow between lineages and the genotypic composition of hybrid zones can help us investigate evolutionary questions related to the formation of species and the development of reproductive barriers between lineages.
Neotropical ecosystems harbor high levels of biodiversity and species endemism, but the causes of such remarkable diversification are still poorly understood. The Brazilian Atlantic Forest, constituted by evergreen and semideciduous forest (Oliveira-Filho & Fontes, 2000), and the Cerrado, considered to be the most species-rich savanna of the world, are recognized hotspots for conservation (Myers et al., 2000). The Cerrado features poorer and more acid soils and its climate is characterized by a long dry season with less annual precipitation in comparison with the Atlantic Forest (Eiten, 1972;Klink & Machado, 2005). In contrast with the Atlantic Forest, natural fires are common in the Cerrado .
The Cerrado and Atlantic Forest are in contact in southeastern Brazil where they form large ecotonal areas where savanna and forest habitats occur in adjacent patches with differences in tree density and species composition (Durigan & Ratter, 2006). In these ecotonal areas, pairs of closely related species from the Cerrado and Atlantic Forest are found side by side (Hoffmann & Franco, 2003;. This spatial proximity can provide opportunities for gene flow between closely related species if reproductive isolation between these taxa is not completed. In fact, a few studies using nuclear markers with sampling in contact zones between the Atlantic Forest and the Cerrado have found gene flow between closely related tree taxa (Cavallari et al., 2010;Lacerda et al., 2002;Muniz et al., 2020;Resende-Moreira et al., 2017).
Plathymenia reticulata Benth (Fabaceae, Caesalpinioideae -LPWG, 2017) is the single species of the Plathymenia genus (Warwick & Lewis, 2003) and features a widespread distribution in both biomes, the Cerrado and the Atlantic Forest. The species presents two ecotypes, one associated with forest habitats and the other with savanna habitats (Lemos-Filho et al., 2008). The P. reticulata forest ecotype usually shows straight trunks reaching up to 30 m in height and up to 70 cm in diameter, whereas the savanna ecotype reaches 6-12 m in height and 30-50 cm in trunk diameter and displays a shorter and tortuously branched trunk (Lorenzi, 1992, Figure   S1). The ecotypes also show differences in ecophysiological traits, which are associated to adaptive responses between savanna and forest habitats (Lemos-Filho et al., 2008), such as in vegetative phenology, seed and fruit morphology, seed dormancy and germination, phenotypic plasticity in response to light availability, and stem radial growth (Goulart, Lemos-Filho, & Lovato, 2005Goulart et al., 2011;Toledo et al., 2012). In addition, these studies showed that the populations occurring in the ecotone between the Atlantic Forest and the Cerrado are generally morphophysiologically intermediate relative to populations from the Atlantic Forest and the Cerrado.
A common garden experiment with P. reticulata progenies of populations from the Cerrado, the Atlantic Forest, and ecotonal areas showed that individuals developed the morphological and physiological characteristics of their respective habitats (Goulart et al., 2011), indicating that their differences are genetically determined. Genetic studies of Plathymenia showed admixture in an ecotonal population suggesting gene flow between the two ecotypes (Lacerda et al., 2002) and a pattern of plastid DNA variation congruent with a diverse, persistent central-north Cerrado population as a potential source of population expansion after the Last Glacial Maximum (Novaes et al., 2010). Therefore, P. reticulata likely shows a complex pattern of population genetic structure influenced by range dynamics due to Quaternary climate oscillations, the divergence of ecotypes, and admixture in areas where both ecotypes come into contact.
In this study, we sampled P. reticulata in the Cerrado and the Atlantic Forest, and in ecotonal areas between the two biomes. We used nuclear microsatellite markers to investigate the patterns and the putative drivers of genetic divergence between savanna and forest ecotypes and to characterize a putative hybrid zone in the ecotonal areas. Specifically, we addressed the following questions: (1) What level of genetic diversity characterizes P. reticulata and how is genetic diversity structured geographically, between and within ecotypes? (2) How much gene flow occurs in ecotonal populations, and is gene flow asymmetric between ecotypes? (3) What is the genotypic composition of the hybrid populations, are they mostly F1s or later generation hybrids? (4) What is the effect of environmental variation on the genetic structure? We discuss the evolutionary consequences of divergence in P. reticulata and the implications for conservation of genetic diversity of the ecotonal populations.

| Sampling strategy
We sampled 233 individuals of P. reticulata, comprising the savanna and forest ecotypes in 10 localities from three different habitats with three localities sampled in the Atlantic Forest (SJF, AJF, and IPF) and the Cerrado (PRS, PTS, and VZS), and four localities sampled in the ecotone between the two biomes (NEE, COE, SUE, and FEE) ( Figure 1).
In the SUE and FEE localities of the ecotone, both savanna and forest ecotypes occur in sympatry/parapatry, whereas in COE and NEE, only the savanna or forest ecotypes were found, respectively. In the ecotonal localities, we used the letters s (savanna) or f (forest) in the locality code to identify the ecotype we were referring to. For example, in the SUE locality, we sampled s-SUE and f-SUE populations of each ecotype. The classification of individuals in ecotypes was based on differences in size (savanna individuals did not exceed 10 m in height of the canopy, whereas forest individuals measured from 15 to 30 m in height), trunk (savanna individuals had a tortuous and twisted trunk, whereas forest individuals had a straight trunk), and bark (which was more suberous in the savanna ecotype) ( Figure S1).

| DNA isolation, amplification, and genotyping
Leaf or bark tissue samples were dried on silica gel and kept frozen at −20°C. Genomic DNA was extracted from leaves using the sorbitol buffer-based method according to the protocol described by Souza et al. (2012). For bark tissue, we used the protocol developed by Novaes et al. (2009). We genotyped individuals for 11 microsatellite markers previously developed for P. reticulata: Pre05, Pre10, Pre11, Pre14, Pre15, Pre16, andPre23 (Cruz et al., 2012) andPr18, Pr30, Pr71, andPr80 (Oliveira et al., 2012). Polymerase chain reactions (PCRs) were performed following the conditions established by Oliveira et al. (2012). Genotyping was performed using the au- pairs of loci was evaluated using FSTAT version 2.9.3.2 (Goudet, 2002). Micro-Checker analysis found significant evidence for the presence of null alleles in all P. reticulata localities, except for SJF and s-SUE (Table S1). However, the loci did not show significant null alleles regularly across localities, which can indicate that these loci are not in Hardy-Weinberg equilibrium in these populations; therefore, all subsequent analyses were performed with the full, noncorrected dataset. No significant linkage disequilibrium was found among loci in any pairwise comparison at the population level.

| Genetic diversity and structure of Plathymenia reticulata, and gene flow between ecotypes
Arlequin version 3.5 (Excoffier & Lischer, 2010) was used to estimate the mean number of alleles per locus (A) and the mean observed (H O ) and expected (H E ) heterozygosities. The allelic richness (A R ) and the inbreeding coefficient (F IS ) were estimated in FSTAT version 2.9.3.2 (Goudet, 2002). The hierarchical partitioning of genetic variance was investigated using the standard analysis of molecular variance (AMOVA) according to the method of Weir and Cockerham (1984), implemented in Arlequin version 3.5 (Excoffier & Lischer, 2010). We considered variation between ecotypes as the highest hierarchical level of genetic divergence, followed by variation among populations within ecotypes and variation within populations, and computed the corresponding F statistics. The pairwise genetic differentiation between localities was evaluated by F ST estimated in Arlequin version 3.5 (Excoffier & Lischer, 2010).
We used POPTREEW (Takezaki et al., 2014) to build a tree from allele frequency data using the neighbor-joining (NJ) method (Saitou & Nei, 1987) based on Nei's standard genetic distance (D ST ) (Nei, 1972) with sample size bias correction. We used the P. reticulata populations without the admixed individuals in order to evaluate the phylogenetic relationships among populations as well as between ecotypes. Bootstrap tests (Felsenstein, 1985) were performed for the evaluation of confidence intervals of the tree's branches in order to estimate the robustness of the tree.
We investigated the genetic structure as well as estimated admixture in P. reticulata using the following algorithms: (1) discriminant analysis of principal components (DAPC) implemented in the adegenet package version 2.1.4 in R 4.1.0 (Jombart, 2008;Jombart et al., 2010;R Core Team, 2020) and (2) the Bayesian clustering method implemented in STRUCTURE version 2.3.4 (Hubisz et al., 2009;Pritchard et al., 2000). DAPC has the advantage of not relying on assumptions of Hardy-Weinberg equilibrium or linkage equilibrium. In the DAPC, we used the sequential k-means clustering method implemented in the adegenet package (Jombart et al., 2010) to choose the number of genetic groups that best explains the genetic variability in P. reticulata and then used these groups in DAPC to estimate the posterior assignment probability of individuals to groups. After the evaluation of the population structure based on the two ecotypes, we applied the DAPC algorithm in each ecotype separately using only the individuals that showed a posterior assignment probability higher than 0.85, which we considered to be pure individuals. In STRUCTURE version 2.3.4, we set the number of clusters from K = 1 to K = 13 and ran 10 independent iterations per K assuming an admixture model with correlated allelic frequencies between localities. We ran 100,000 steps as burn-in followed by 1,000,000 Markov chain Monte Carlo (MCMC) iterations. The most probable number of genetic clusters was determined by the ad hoc statistic ΔK (Evanno et al., 2005), computed in Structure Harvester version 0.6.94 (Earl & vonHoldt, 2012). Besides the Evanno's ΔK method, we also used "MedMeaK" (median of means), "MaxMeaK" (maximum of means), "MedMedK" (median of medians), and "MaxMedK" (maximum of medians) statistics in StructureSelector (Li & Liu, 2018) based on the method of Puechmaille (2016). In this method, a subpopulation is assigned to a cluster if its arithmetic mean (for MedMeaK and MaxMeaK) or its median (for MedMedK and MaxMedK) membership coefficient to that cluster is greater than a threshold value (set to 0.5), therefore ensuring that a subpopulation cannot belong to more than one cluster. We averaged the admixture coefficients using CLUMPP version 1.1.2 (Jakobsson & Rosenberg, 2007) and plotted the values using the ggplot2 package version 3.3.5 in the R software (Wickam, 2009). We also tested for genetic structure within each ecotype, using all individuals identified as pure (<15% or >85% ancestry threshold) in the STRUCTURE analysis of the full dataset. For each ecotype, we ran a separated analysis setting K = 1 to K = 7.

F I G U R E 1
Map showing the sampling localities of Plathymenia reticulata in the Brazilian Atlantic Forest (dark gray) and the Cerrado (light gray) biomes, and in the ecotone between the two To estimate rates of recent migration between the two ecotypes (based on phenotype classification), we used BayesAss 3.0 (Wilson & Rannala, 2003). This software uses a Bayesian framework and MCMC iterations to estimate the posterior mean of recent migration rates (mainly the last two generations) and the standard deviation of the marginal posterior distribution from multilocus genotypes for a group of populations in a sample. The software provides estimates using the gametic disequilibrium among immigrants and their descendants. This method relaxes the assumption of previous Hardy-Weinberg equilibrium within populations to assign individuals to populations and identify migrants. We estimated the migration rates between ecotypes in the following datasets: (1) the whole sample of individuals of each ecotype, (2) the ecotonal individuals of the two ecotypes, (3) FEE locality, (4) SUE locality, and (5) forest ecotype of NEE and savanna ecotype of COE, as each of these localities has only one ecotype, but they are spatially close to each other.

| Genotypic composition of Plathymenia hybrids in the ecotone
We investigated the genotypic class composition of P. reticulata individuals using NewHybrids version 1.1 (Anderson & Thompson, 2002). The implemented method calculates the posterior probability of individuals to belong to predetermined genotypic classes (Anderson & Thompson, 2002). We applied the method using two different sets of genotypic classes. First, we used the six default pre-  (Table S2). In both analyses, we ran 1,000,000 MCMC iterations with a burn-in period of 100,000 iterations using Jeffrey'slike priors for mixing proportions and allele frequencies and assuming equal prior probabilities for all genotypic classes.
We further investigated the genotypic composition of P. reticulata individuals using the hybrid index estimated in the introgress package version 1.2.3 in R (Gompert & Buerkle, 2010). The computation of the hybrid index is based on the maximum likelihood method and estimates the genetic contribution of hybridizing "parental" populations to individuals of unknown ancestry (Buerkle, 2005). Here, we used as pure parental individuals those from the Atlantic Forest and Cerrado localities, respectively, to estimate the hybrid index of those individuals from the ecotone localities. In addition, we calculated the interspecific (interecotypic) heterozygosity (Q 12 ), which is the fraction of loci that combine the ancestry of the two parental taxa (Fitzpatrick & Irwin, 2012). The combination of Q 12 with the hybrid index can give insights into the history of hybridizing taxa through the summary of the genomic composition of individuals (Gompert & Buerkle, 2016). For example, an F2 individual may show a hybrid index of 0.5 just like F1s. However, F1s show Q 12 = 1.0, whereas F2 and later generation hybrids have lower values of Q 12 owing to the breakup of linkage disequilibria because of genome reshuffling in later generations of hybridization.

| Evaluation of hybrid assignment in the NewHybrids
The efficiency of NewHybrids to identify and classify hybrids was evaluated using purebred parental and hybrid individuals simulated using the function hybridize in the adegenet package (Jombart, 2008).
We simulated five datasets based on the allele frequencies of pure (Q higher than 0.85 or lower than 0.15) P. reticulata individuals of the Atlantic Forest and the Cerrado areas, respectively. The simulated datasets comprised 100 individuals of each parental and 100 individuals of each hybrid class, F1, F2, and backcrosses of F1 with each parental (B1 and B2). We also simulated more advanced generation Chhatre et al. (2018) and Buck et al. (2020) (Table S2). To evaluate the performance of NewHybrids to distinguish hybrid classes, we ran the simulated dataset of six genotypic classes using six genotypic classes in NewHybrids, the 14 simulated genotypic classes using six genotypic classes in NewHybrids, and the 14 simulated genotypic classes using 12 genotypic classes in NewHybrids (Table S2). Finally, we set four thresholds for posterior probabilities of assignment to determine genotypic classes (0.5, 0.75, 0.85, and 0.9) to evaluate the proportion of correct and incorrect assignments. We also assessed the efficiency of STRUCTURE and introgress to detect and classify hybrids based on simulated data (Appendix S1).

| Drivers of divergence of Plathymenia reticulata ecotypes
We examined the effect of edaphic and bioclimatic variables on genetic differentiation between P. reticulata populations using two approaches: (1) Mantel tests as a first approach to assess effects of isolation by distance and isolation by environment and (2) distancebased redundancy analysis, dbRDA. For Mantel tests, we calculated the pairwise spatial distance between populations using the coordinates of the localities separated by ecotypes with the geodist package version 0.0.7 in R (Padgham & Sumner, 2021). Thus, we had six ecotonal populations since the populations of SUE and FEE were separated in f-SUE and s-SUE and in f-FEE and s-FEE, respectively.
As environmental predictors, we used the edaphic variables bulk density of the fine earth fraction (BDOD), soil pH (pH), soil organic carbon (SOC), and cation exchange capacity of the soil (CEC), which were downloaded from https://files.isric.org/soilg rids/lates t/data/ and the 19 bioclimatic variables from the worldclim2 dataset https:// www.world clim.org/data/biocl im.html (Fick & Hijmans, 2017). We downloaded a raster file of edaphic data for the soil layer between 15 and 30 cm depth using a spatial resolution of 250 m. The 19 bioclimatic variables were downloaded using a spatial resolution of 30 m. Both datasets were cropped to the extent of the sampling area for further analysis. We assessed the correlation between edaphic variables and bioclimatic variables separately through principal component analysis using the function rasterPCA of the RStoolbox version 0.2.6 (Leutner & Horning, 2019) in order to avoid collinearity between variables. As the edaphic variables showed low correlation (<0.7), we used the original dataset for posterior analysis. Bioclimatic variables showed high correlation; therefore, we based our analyses on the first five PCA components which together explained 95.5% of the variance in the bioclimatic variables (Table S3). We calculated the Mahalanobis distance of the environmental data using the distance function of the ecodist 2.0.7 package (Goslee & Urban, 2007).
We performed Mantel tests in the R package ncf 1.2-9 (Bjornstad, 2020) using the pairwise F ST between populations as genetic distance. We performed simple Mantel tests between genetic and spatial distance, or between genetic and environmental distance, separately for edaphic and bioclimatic variables. Then, we performed partial Mantel tests between genetic and environmental distances in which the spatial distance was used as a controlling factor, and between genetic and spatial distances in which environment was used as a controlling factor.
To further evaluate the association between environmental variables and the genetic divergence among populations, we performed a distance-based redundancy analysis (dbRDA) (Legendre & Legendre, 2014). This multivariate method combines ordination with multivariate regression approach to find linear relationships between a matrix of predictor variables and a matrix of response variables. The method is relatively robust to collinearity among predictor variables (Wagner & Fortin, 2012) and in the detection of environmental gradients in landscapes (Kierepka & Latch, 2015). In these analyses, we used the environmental dataset as predictors and the scores of the first three axes of a spatial principal components analysis (sPCA) obtained using the spca function from adegenet package as the genetic variables . The sPCA summarizes the genetic variability of a group of individuals while it controls for spatial autocorrelation between them using Moran's I statistic . We also used the pcnm function of the vegan package version 2.5-7 to calculate spatial predictors for the dbRDA based on the transformation of latitude and longitude of populations (Oksanen et al., 2020). To find the best dbRDA model, we performed a stepwise variable selection based on the Akaike information criterion (AIC) using ordiR2step function in vegan (Oksanen et al., 2020) and used the variance inflation factor (VIF) to further assess the collinearity of the variables within the selected model. The significance of the models, of the variables in the models, and of the axis of dbRDA was assessed with a permutation test based on 1000 permutations using the Anova function in R (R Core Team, 2020).
Finally, we estimated the proportion of variance explained by each variable (partial R 2 ) using the function varpart of the vegan package (Oksanen et al., 2020).

| Genetic diversity and structure of Plathymenia reticulata
Plathymenia reticulata localities showed A and A R values ranging from 3.9 to 10.0 and from 3.72 to 6.61, respectively, and H O and H E values ranging from 0.361 to 0.667 and 0.456 to 0.749, respectively ( Table 1). Populations of the Cerrado harbored higher diversity than those of the Atlantic Forest for all the diversity indices.
In addition, the ecotone localities showed higher diversity than the Atlantic Forest and Cerrado localities; however, the savanna and forest ecotypes in the ecotone exhibited similar overall genetic diversity. The inbreeding coefficients (F IS ) ranged from −0.005 to 0.330 with three localities from the Cerrado, one from the Atlantic Forest, and two in the ecotone showing significant positive F IS (p < .05 after sequential Bonferroni correction) (Table 1).
AMOVA showed high divergence among localities (F ST = 0.187; Table S4). The hierarchical AMOVA showed that most of the variation among localities is explained by the divergence between ecotypes (F CT = 0.222 and F ST = 0.307). When the hierarchical AMOVA was performed considering localities of the Atlantic Forest and the Cerrado only, the part of total variation attributed to divergence between ecotypes was even higher (F CT = 0.337 and F ST = 0.396) (Table   S4) Table S5).
The neighbor-joining (NJ) tree showed that the savanna and forest ecotypes are highly divergent from each other (Figure 2). Both groups showed bootstrap values above 90% indicating a high support for the grouping of these populations. Furthermore, the NJ tree showed that the forest populations of the ecotone were divergent from the populations of the Atlantic Forest (Figure 2).
The number of clusters that best explained the genetic structure in the P. reticulata dataset was K = 2 based on the sequential kmeans clustering method. The DAPC analysis based on this grouping In agreement with the STRUCTURE and DAPC, the recent migration rates between ecotypes estimated in BayesAss were high in the ecotonal region ( Table 2). The migration rate estimates were higher into the forest ecotype in two of three ecotonal population pairs sampled, with values ranging from 0.024 to 0.119 into the forest ecotype and values ranging from 0.030 to 0.061 into the savanna ecotype (Table 2). When considering the whole sample, which included ecotone, Cerrado, and Atlantic Forest localities, the migration rate into the forest ecotype was also higher than that into the savanna ecotype with values of 0.040 (SD = 0.01) and 0.024 (SD = 0.09), respectively, and the same was true for the analysis restricted to ecotone localities (Table 2).
In the evaluation of the structure using only the pure individuals of each ecotype in DAPC, we found K = 5 in the forest and K = 7 in the savanna ecotype. In the forest ecotype, all the localities showed  and savanna ecotypes (Figure 4, Figure S2b,c). Using the criteria of StructureSelector, K = 5 was the best number of clusters within both ecotypes ( Figure 4, Figure S3a,b). Both ecotypes displayed a genetic structure that was partially congruent with the geographic localities, though most localities featured more than one cluster as well as admixed individuals, especially those of the forest ecotype ( Figure 4).

| Genotypic composition of Plathymenia reticulata in the hybrid zone
The classification of P. reticulata individuals in the NewHybrids anal-

| Evaluation of hybrid assignment in NewHybrids
The model evaluation using NewHybrids with six simulated genotypic classes showed an average proportion of correct assignments of pure parental individuals of almost 91% when setting a threshold of 0.85 (Table S6; Figure S4). Only the F2 category showed a proportion of correct assignments lower than 50%, while F1, B1, and B2 displayed 79%, 66%, and 65% correct assignments, respectively. When using 14 simulated crosses for the assignment in 12 genotypic classes, a very low proportion of correct assignment of pure individuals was found, with most of them showing values of posterior probability of assignment below 0.85 (Table   S6; Figure S4). The classification was better for the hybrid classes, especially for F1 and F2 with 57% and 59% of individuals correctly assigned to their respective categories (Table S6). The classification of backcrosses never exceeded 10% of correct assignments suggesting that our data could not resolve later generation hybrids correctly, especially for second-generation backcrosses (Table S6).
These results indicate a relatively high capacity in the differentiation between admixed and nonadmixed individuals, but a poorer capacity to differentiate among advanced generation hybrids (F2 and backcrosses). Finally, these simulations let us assume that some individuals classified as F2 in the NewHybrids analysis with F I G U R E 2 Neighbor-joining tree based on D ST genetic distance among Plathymenia reticulata populations with bootstrap support of nodes six genotypic classes could in fact be backcrosses ( Figure S5). The evaluation of the performance of STRUCTURE and introgress to identify and classify hybrids based on simulated data is presented in Appendix S1.

| Drivers of divergence of Plathymenia reticulata ecotypes
Simple Mantel tests indicated that the dataset displayed both a signature of isolation by distance and of isolation by environment (Table 3). Genetic distance showed a higher correlation with bioclimatic distance (r = .381, p = .006) than with spatial distance (r = .307, p = .009) or soil distance (r = .308, p = .023). The partial Mantel test showed that the correlation of the bioclimatic distance with the genetic distance remained significant when controlling for geographical distance (r = .260, p = .035, Table 3). Other partial Mantel tests were not significant.
The sPCA test indicated a significant global structure (p = .042) with the first three axes showing significant divergence between the P. reticulata populations ( Figure S5). The dbRDA analysis identified an association of genetic divergence based on sPCA with Cation Exchange Capacity (CEC), the fourth synthetic variable of the climatic PCA (CLIM4), and with spatial distance between populations (PCNM2) (  (Table   S8). The estimated partial adjusted R 2 showed that CEC was the variable with the highest explanatory power, with CLIM4 being the second ( Table 4). The first axis of dbRDA separated the populations of the Cerrado from those of the Atlantic Forest with the ecotonal populations being intermediary (Figure 7).

| DISCUSS ION
Our study reveals genetic divergence between savanna and forest ecotypes of P. reticulata and gene flow between ecotypes across ecotonal areas between the Brazilian Cerrado and Atlantic Forest, characterizing a hybrid zone. Hybrid individuals mostly corresponded to second-generation hybrids or backcrosses toward the forest ecotype. We observed asymmetric gene flow between ecotypes, which was higher into the forest ecotype. Our data also suggest that besides the geographical distance, climatic and edaphic variables partially explain the genetic divergence between the ecotypes. The ecotonal areas harbor high genetic diversity relative to the Atlantic Forest and Cerrado biomes for P. reticulata and can be an important source of variation for the species, mainly for the forest ecotype.

| Genetic divergence between savanna and forest ecotypes of Plathymenia reticulata
The analyses of the genetic structure and divergence of P. reticulata showed a clear genetic distinction between savanna and forest ecotypes, consistent with our a priori classification of individuals based on morphological characters of the tree trunk, bark, height, and environment of the trees. The genetic distinction between ecotypes was reinforced by the fact that even in the ecotonal localities this division was recovered using NJ trees, DAPC, and STRUCTURE analyses. Populations from the forest ecotype in the ecotone are more genetically related to populations from the Atlantic Forest, whereas populations of the savanna ecotype in the ecotone are more related to populations from the Cerrado, despite high genetic admixture in the ecotone. These genetic data are consistent with ecological studies that showed that the two ecotypes maintain their morphophysiological traits in the ecotonal areas, though ecotonal individuals are less dissimilar to the alternative ecotype than when comparing distinct ecotypes in their specific habitats (Goulart et al., 2005(Goulart et al., , 2006(Goulart et al., , 2011Toledo et al., 2012). Differences between ecotypes in several of these traits were demonstrated to be genetically determined when progenies were cultivated in a common garden experiment with different light conditions (Goulart et al., 2011).
We found an association of edaphic and climatic variables with P. reticulata genetic divergence that was in some instances stronger than the divergence explained by spatial distance between localities, suggesting a pattern of isolation by environment (IBE). The genetic divergence was associated with Cation Exchange Capacity (CEC) of the soil and with Precipitation of the Warmest Quarter, Precipitation of the Wettest Quarter, and Mean Diurnal Range Temperature (summarized in the synthetic variable CLIM4). We detected IBE using SSR markers that are considered neutral, congruent with a scenario of genome-wide hitchhiking following long-lasting selection that may have affected a small fraction of the genome, in combination with assortative mating (Andrew et al., 2012;Sexton et al., 2014Sexton et al., , 2016. The high proportion of variance explained by CCE suggests that the differences in soil between savanna and forest can be important in the divergence in P. reticulata. Differences in soil can affect plant growth strategy and be an important driver of plant functional traits and adaptive divergence even at small spatial scales (Schmitt et al., 2021), as is the case in our study where patches of savanna and forest form a mosaic of abrupt habitat shifts within meters in ecotonal areas (Lambers et al., 2011;Reich et al., 2003). Precipitation of Warmest Quarter were associated with genetic divergence and are related with extremes of temperature within a day and the seasonality in rainfall along the seasons, respectively. Associations between genetic divergence and the Mean Diurnal Range (Buzatti et al., 2019) or between genetic diversity and Precipitation of the Wettest Quarter (Ribeiro et al., 2016) were already found for Qualea grandiflora and Annona coriacea, respectively, two of the most common tree species of the Cerrado. A similar association was also found in an annual herb, Brasilianthus carajensis, in a mountain savanna in Canga plateaus within the Amazon Forest (Silva et al., 2020). The Cerrado is a more stressful environment than the Atlantic Forest in many ways, as it is, for example, characterized by a stronger and more prolonged drought season. Accordingly, in P. reticulata, previous studies on some of the sampling locations of the present study (COE andNEE in Goulart et al., 2006, andSUE in Toledo et al., 2012) showed that savanna and forest ecotypes are differentiated for drought response traits even in ecotonal areas. Differences between savanna and forest ecotypes concern also the vegetative phenology, with leaf shed beginning earlier  in the Cerrado than in ecotonal areas, and later in the forest (Goulart et al., 2005). In addition, the forest ecotype shows higher stem radial increment, positively related to increasing precipitation, relative to the savanna ecotype, which shows lower midday water potential during the dry season relative to the forest ecotype (Toledo et al., 2012). Trait divergence thus appears in accordance with environmental variation and genetic divergence in P. reticulata. Conclusive evidence on the genomic signature of natural selection in the species could be obtained from genome-wide data that might reveal contrasted differentiation of neutral versus potentially adaptive alleles in ecotype divergence.

| Characterization of Plathymenia reticulata in the hybrid zone
The high levels of genetic admixture in several ecotonal areas, with a mean of 40% of hybrids per locality, characterize these areas as a hybrid zone. Most hybrid individuals were classified as later generation hybrids with none showing high probabilities to be F1s and very few showing posterior probability between 0.5 and 0.9 to be classified as F1. Classifying hybrids in genotypic classes based on a few microsatellite markers is challenging, as shown by our simulation study which illustrated the difficulty of distinguishing later generation hybrids from pure individuals depending on the chosen assignment threshold. It should be noted that some studies have shown that individuals previously classified as F2 based on microsatellite data were in fact classified as F1s when using genomic data (Gramlich et al., 2018;Lindtke et al., 2014). In the microsatellite dataset, these misclassified individuals showed levels of interspecies ancestry lower than those expected for F1 hybrids, but unusually higher than those F1s. Therefore, a study based on genomic data may better estimate the number of early generation hybrids along the P. reticulata hybrid zone. Such a study will also increase the power of classifying later generation hybrids and help shed light on which hybridization events and backcross scenarios provide opportunities for introgression between the ecotypes.
We observed higher gene flow in direction of the forest ecotype than toward the savanna ecotype. Several causes such as demographic processes related to range expansion may cause asymmetric hybridization (Currat et al., 2008;Field et al., 2010).
The savanna ecotype populations could be expanding into forested habitats and the interbreeding between the savanna and forest ecotypes in these areas would lead to the invasion of genes of forest populations by genes of the savanna ecotype. Differences in dispersal capacity may also cause differential introgression with the populations with higher dispersal capacity promoting the spread of alleles to the other populations (Yamamoto et al., 2019).
The P. reticulata forest ecotype showed seed structures associated with higher dispersal capacity (Goulart et al., 2006), which can indicate that this factor is not the better explanation for the differential introgression. However, the occurrence of the savanna ecotype in open habitats may provide opportunities for dispersal across longer distances which may lead to the capacity of savanna ecotypes to colonize surrounding forested areas. Natural selection can also act as a driver of differential introgression (Lexer et al., 2005;Whitney et al., 2006). Ecological barriers, that is, a harsh environment, may effectively reduce the gene flow into savanna populations. Another hypothesis to explain the differences in gene flow could be related to the longer juvenile stage of forest individuals compared to the savanna individuals in which frequent fires appear to select for earlier reproductive maturity (Hoffmann & Franco, 2003 (Behling, 1998;Behling & Negrelle, 2001;Meneses et al., 2013), may have provided opportunities for the expansion of P. reticulata from the savanna toward previously forested areas.
Gene flow in vicariant species such as P. reticulata with subspecific taxa typical from forests and savannas was also identified in a few other tree species in savanna-forest boundaries including ecotonal and riparian forests in the Cerrado biome (Cavallari et al., 2010;Muniz et al., 2020;Resende-Moreira et al., 2017). This can suggest that hybridization in the ecotonal region between Atlantic Forest and Cerrado can be a relevant evolutionary process increasing the genetic diversity of the species of these biomes. The study of other closely related species of other groups can show how common and phylogenetically widespread is the gene flow between lineages of Cerrado and Atlantic Forest.

| Implications for conservation
The highest genetic diversity found here in Plathymenia populations of the savanna-forest ecotone may be due to the hybridization between ecotypes/species across this area. This is a pattern commonly found in hybrid zones where the admixture between lineages increases genetic diversity (Carlson et al., 2017;Eaton et al., 2015;Y. Li et al., 2016;Zalapa et al., 2010). Therefore, these ecotonal areas are an important source of genetic diversity for Plathymenia, mainly for Atlantic Forest populations that show low genetic diversity. The Atlantic Forest has a history of extensive habitat loss and fragmentation and currently remains mostly in small and isolated patches (Ribeiro et al., 2009). In comparison, the deforestation in the Cerrado, though very quick, was only significant from the middle of the twentieth century (Silva et al., 2006). The low genetic diversity of the Atlantic Forest populations indicates that they may have lower adaptive potential and could be severely impacted by ongoing habitat loss and fragmentation in the Atlantic Forest. The forest ecotype, previously considered as P. foliolosa, is considered vulnerable in the IUCN red list (World Conservation Monitoring Centre, 1998). The conservation efforts for the forest ecotype should be increased and this taxon should be considered a different management unit from the savanna ecotype. Thus, in addition to the conservation efforts in the Atlantic Forest and Cerrado areas, our data point to the importance of including ecotonal areas between these biomes, which could carry alleles to augment the genetic diversity in the Atlantic Forest.

| CON CLUS ION
Based on multiple independent approaches, with distinct assumptions, we showed that the ecotonal areas between the Cerrado and the Atlantic Forest constitute a hybrid zone between the genetically differentiated forest and savanna ecotypes of P. reticulata. Our results suggest the suitability of the ecotonal areas for studies of evolutionary divergence and speciation. The two P. reticulata ecotypes should be considered as different species or at least two different management units with special attention to the forest ecotype. The dynamics of ecotonal areas, as evidenced here, can have a significant role in the origin of the high biodiversity in the Cerrado and Atlantic Forest, highlighting the importance of these areas as hotspot of genetic diversity and conservation. (ANR-10-LABEX-0025).

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest. writing -review and editing (equal). Maria Bernadete Lovato:

DATA AVA I L A B I L I T Y S TAT E M E N T
Genotype data, geographical coordinates, and scripts to simulate individuals in distinct genotypic classes will be archived in Dryad