Rivers shape population genetic structure in Mauritia flexuosa (Arecaceae)

Abstract The Mauritia flexuosa L.f. palm is known as the “tree of life” given its importance as a fundamental source of food and construction materials for humans. The species is broadly distributed in wet habitats of Amazonia and dry habitats of the Amazon and Orinoco river basins and in the Cerrado savanna. We collected 179 individuals from eight different localities throughout these habitats and used microsatellites to characterize their population structure and patterns of gene flow. Overall, we found high genetic variation, except in one savanna locality. Gene flow between populations is largely congruent with river basins and the direction of water flow within and among them, suggesting their importance for seed dispersal. Further, rivers have had a higher frequency of human settlements than forested sites, contributing to population diversity and structure through increased human use and consumption of M. flexuosa along rivers. Gene flow patterns revealed that migrants are sourced primarily from within the same river basin, such as those from the Madeira and Tapajós basins. Our work suggests that rivers and their inhabitants are a critical element of the landscape in Amazonia and have impacted the dispersal and subsequent distribution of tropical palm species, as shown by the patterns of genetic variation in M. flexuosa.

During my residence in the Amazon district I took every opportunity of determining the limits of species, and I soon found that the Amazon, the Rio Negro and the Madeira formed the limits beyond which certain species never passed. The native hunters are perfectly acquainted with this fact, and always cross over the river when they want to procure particular animals, which are found even on the river's bank on one side, but never by any chance on the other. On approaching the sources of the rivers they cease to be a boundary, and most of the species are found on both sides of them.

| INTRODUCTION
Environmental and geographic features of the landscape are crucial in shaping the population genetic structure and demography of plants.
It is increasingly evident that pre-Columbian and modern peoples that live along rivers have impacted the genetic patterns of forests within and among basins (Piperno et al., 2015; Stahl, 2015). Although there is no evidence of domestication of M. flexuosa, it is widely used by indigenous groups and local communities along rivers, who refer to it as the "tree of life" because it provides a variety of resources and is consumed daily as a food staple (Barros & Da Silva, 2013). This palm is also used as raw material for construction and for different handicrafts (Santos & Coelho-Ferreira, 2011), and its fruits, leaves, and seeds are sold widely in markets (Gilmore, Endress, & Horn, 2013). Mauritia flexuosa has been termed a "hyperdominant" species (Steege et al., 2013), in which population densities are five times higher than expected by chance, and that has recently been attributed to human use and the tending practices associated with it (Levis et al., 2017; Rull & Montoya, 2014 (Gomes et al., 2011; Rossi et al., 2014). Chloroplast markers used to characterize M. flexuosa in different river basins revealed low nucleotide diversity within populations from the Brazilian savannas, which was interpreted as range retraction followed by population subdivision during the cold and dry Quaternary glacial periods (de Lima, Lima-Ribeiro, Tinoco, Terribile, & Collevatti, 2014). These studies begin to provide information on the genetic structure of M. flexuosa, yet its genetic variation and population structure across different river basins remain to be tested more explicitly.
Our main research questions are whether rivers in Amazonian forests are facilitators or barriers to gene flow, whether population genetic structure is maintained in populations throughout river basins, and whether recruitment is associated with river flow. We address these questions using microsatellite markers across different river basins in tropical forest and savanna sites, and we also discuss the impact of human river inhabitants on the generation of recent population structure in this palm species.

| Collection sites and sampling
Plants were sampled from two of the major river basins in Amazonia, the Madeira and the Tapajós (Table 1).

| Microsatellite amplification
Leaves were collected in the field and stored in silica gel. DNA was extracted with the Wizard Genomic DNA Purification kit (Promega, Madison, WI, USA) following the manufacturer's protocol. We selected 10 microsatellites previously designed for M. flexuosa (Federman, Hyseni, Clement, & Caccone, 2012; Menezes et al., 2012; Table S1) based on consistency of amplification. PCR conditions for all primers in individual reactions were 94°C for 5 min; 35 cycles of 94°C for 1 min, 62°C for 1 min, and 72°C for 1 min; then 72°C for 2 min. Amplification products were genotyped using a capillary electrophoresis system (7.5 kW for 120 min; Advanced Analytical, Ankeny, IA, USA), together with standardized markers containing fragments of 35 and 500 bp and a 75-400 bp DNA ladder in a single well to determine the size of the amplified fragments.
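As a quick check on the thermocycling program above, the sketch below encodes the stated steps as data and sums the hold times. The variable and function names are illustrative assumptions, and ramping time between temperatures is ignored, so the real run time on an instrument would be somewhat longer.

```python
# Thermocycler program as stated in the text: (temperature in °C, hold time in minutes).
INITIAL_DENATURATION = (94, 5)
CYCLE_STEPS = [(94, 1), (62, 1), (72, 1)]  # denaturation, annealing, extension
N_CYCLES = 35
FINAL_EXTENSION = (72, 2)


def total_hold_minutes(initial, cycle_steps, n_cycles, final):
    """Sum programmed hold times only; temperature ramping is not modeled."""
    per_cycle = sum(minutes for _, minutes in cycle_steps)
    return initial[1] + n_cycles * per_cycle + final[1]


if __name__ == "__main__":
    minutes = total_hold_minutes(
        INITIAL_DENATURATION, CYCLE_STEPS, N_CYCLES, FINAL_EXTENSION
    )
    print(f"Programmed hold time: {minutes} min")
```

Under these assumptions the programmed hold time is 5 + 35 × 3 + 2 = 112 min.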

| Genetic diversity and population genetic structure
MICRO-CHECKER v 2.2.3 was used to correct genotypes for null alleles, scoring errors, and allelic dropout (van Oosterhout, Hutchinson, Wills, & Shipley, 2003). LOSITAN was used to test for neutrality in each locus with 1,000,000 simulations and a 99.5% confidence interval using both stepwise and infinite mutation models (Antao, Lopes, Lopes, Beja-Pereira, & Luikart, 2008). To test for biases in the sample sizes and large distribution of this species, we estimated allelic richness by rarefaction for all populations using the Vegan v 2.4-6 package (Oksanen, Kindt, Legendre, O'Hara, & Stevens, 2011) in the R statistical platform (R Core Team, 2014). Genetic diversity was calculated by assessing the number of alleles per locus, observed heterozygosity (Ho), expected heterozygosity (He), and the inbreeding coefficient (F) using Arlequin v 3.5 (Excoffier, Laval, & Schneider, 2005). We measured pairwise population genetic structure with FST (Wright, 1949) and RST (Slatkin, 1995), also using Arlequin. RST was used to complement FST as it is less sensitive to the fast mutation rate reported in microsatellites (Holsinger & Weir, 2009). We visualized pairwise FST and RST in a heat map using the R package lattice v 0.20 (Sarkar, 2015). Finally, given the large geographic scale of our samples and potentially confounding signals from isolation by distance (IBD; Meirmans, 2012), we estimated IBD between all sampling sites and within basins, using the adegenet v 2.0.0 R package (Jombart, 2008).
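The diversity statistics named above can be illustrated with a minimal sketch for a single locus. This is not the Arlequin or vegan implementation. It assumes diploid genotypes given as allele-size pairs, a sample-size-corrected expected heterozygosity, F = 1 - Ho/He, and the standard hypergeometric rarefaction formula for allelic richness. The function names and the toy genotypes are hypothetical.

```python
from collections import Counter
from math import comb


def locus_diversity(genotypes):
    """Ho, He and F for one locus; genotypes is a list of (allele, allele) pairs."""
    n_ind = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    n_genes = 2 * n_ind
    ho = sum(1 for a, b in genotypes if a != b) / n_ind
    # Assumption: sample-size-corrected gene diversity, n/(n-1) * (1 - sum p_i^2).
    he = (n_genes / (n_genes - 1)) * (
        1 - sum((c / n_genes) ** 2 for c in counts.values())
    )
    f = 1 - ho / he if he > 0 else float("nan")
    return {"n_alleles": len(counts), "Ho": ho, "He": he, "F": f}


def rarefied_allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene copies."""
    n_genes = sum(allele_counts)
    if g > n_genes:
        raise ValueError("subsample size exceeds the number of sampled gene copies")
    # For each allele, 1 minus the probability that it is absent from the subsample.
    return sum(1 - comb(n_genes - c, g) / comb(n_genes, g) for c in allele_counts)


if __name__ == "__main__":
    toy = [(100, 100), (100, 104), (104, 104), (100, 104)]  # hypothetical genotypes
    print(locus_diversity(toy))
    print(rarefied_allelic_richness([4, 4], 2))
```

Rarefying every population to the size of the smallest sample makes allelic richness comparable across sites with unequal sampling, which is the concern raised for XAP.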
To partition genetic variation among regions and within populations from different river basins, an Analysis of Molecular Variance (AMOVA) was conducted using the sum of squared size differences (RST). The eight collection sites (GUA, MAD, MAM, JUR, TAP, TPI, BVI, and XAP) were divided into four groups based on major geographic areas: the Madeira basin (GUA, MAD, and MAM), the Tapajós basin (JUR, TAP, and TPI), BVI, and XAP. Significance was tested using 1,000 permutations with a 95% confidence interval. Population genetic structure was also measured using the Bayesian assignment method STRUCTURE v 2.3.4, which uses genotypes to assign individuals to a genetic group without a priori assumptions of populations (Pritchard, Stephens, & Donnelly, 2000). We used the admixture model with correlated allele frequencies, a burn-in length of 100,000 steps, and 2,000,000 replicates. We tested the number of distinct genetic clusters (populations; K) present in the data set from 1 to 10 using 20 iterations per K. We used a maximum of ten populations to allow for the possibility that a sampled location is substructured into more than one population.
We used the ΔK method of Evanno, Regnaut, and Goudet (2005), implemented in STRUCTURE Harvester v 0.6.94, to determine the most likely number of genetic clusters K given our data (Earl & Von Holdt, 2011).
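The ΔK statistic can be sketched as follows. This is an illustration, not the STRUCTURE Harvester code. It assumes the replicate ln P(D|K) values are available as a dictionary, and it takes the second-order difference of the per-K means and divides it by the sample standard deviation across replicates at K. The function name and the numbers in the example are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbors K-1 and K+1.

    lnp_by_k maps K to a list of ln P(D|K) values, one per replicate run.
    """
    mean_l = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue  # delta K is undefined at the ends of the tested range
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        sd = stdev(lnp_by_k[k])  # sample standard deviation across replicates
        if sd > 0:
            delta[k] = second_diff / sd
    return delta


if __name__ == "__main__":
    # Hypothetical replicate likelihoods, two runs per K, for illustration only.
    runs = {1: [-100, -102], 2: [-50, -52], 3: [-48, -50], 4: [-47, -49]}
    dk = evanno_delta_k(runs)
    print(dk, "most likely K:", max(dk, key=dk.get))
```

Because ΔK needs both neighbors, it cannot be evaluated at K = 1 or at the largest K tested, which is one reason to test a range wider than the number of sampled localities.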
We also employed a graph theoretical framework to estimate population genetic summary statistics and to visualize the network of gene flow among populations, which presumably reflects both historical and contemporary processes (Dyer & Nason, 2004). We defined each original sampling locality as a node and used an alpha of .01 as the significance level to test edge retention in the R package popgraph v 1.4 (Dyer & Nason, 2004). Additionally, to evaluate the direction of river water flow and its impact on gene flow patterns, we calculated migration rates among all sampled localities using BayesAss+ (Wilson & Rannala, 2003), a Bayesian method that estimates recent immigration rates into each population from all other populations based on multilocus genotypes. For each population, a proportion of non-migrants closer to one indicates that its individuals largely result from self-recruitment, while values closer to zero suggest that the population mostly comprises migrants from other populations.
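Reading a migration matrix of this kind can be sketched as below. The layout is an assumption: rows are receiving localities, columns are source localities, and each row sums to one. The site labels A, B, C and all rates are hypothetical and are not results from this study.

```python
def summarize_migration(matrix, sites):
    """For each receiving site, report self-recruitment and the main external source.

    matrix[i][j] is the proportion of individuals in site i derived from site j.
    """
    summary = {}
    for i, site in enumerate(sites):
        row = matrix[i]
        external = [(rate, sites[j]) for j, rate in enumerate(row) if j != i]
        top_rate, top_source = max(external)
        summary[site] = {
            "self_recruitment": row[i],  # diagonal: proportion of non-migrants
            "main_source": top_source,
            "main_source_rate": top_rate,
        }
    return summary


if __name__ == "__main__":
    sites = ["A", "B", "C"]  # hypothetical localities
    rates = [
        [0.90, 0.08, 0.02],
        [0.25, 0.70, 0.05],
        [0.01, 0.02, 0.97],
    ]
    for site, info in summarize_migration(rates, sites).items():
        print(site, info)
```

In this toy matrix, site B receives a quarter of its individuals from A while A receives few from B. Asymmetry of this kind is what would be compared against the direction of river flow.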
Finally, given the possibility of one or several founder events as a result of long-distance seed dispersal by river currents or by human use, we tested for a reduction in population size using the Wilcoxon signed-rank test implemented in Bottleneck v 1.2.02 (Cornuet & Luikart, 1996). Populations that have experienced a recent reduction in effective population size may show higher heterozygosity than expected under mutation-drift equilibrium (Maruyama & Fuerst, 1985). Although various mutation models exist for microsatellites (Putman & Carbone, 2014), the stepwise mutation model (SMM) assumes an equal probability of gaining or losing repeats and therefore accounts for homoplasy. We used the SMM at 100%; the two-phase mutation model, which allows for mutations of a larger magnitude than the SMM but otherwise retains the stepwise model, was used at 70% (Di Rienzo et al., 1994).
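The logic of the heterozygosity-excess test can be sketched with an exact one-tailed Wilcoxon signed-rank test across loci. This is not the Bottleneck implementation, which obtains the equilibrium expectations by simulation under the chosen mutation model. Here those expectations are simply passed in, and the function name and example values are hypothetical.

```python
from itertools import product


def signed_rank_excess(observed, expected):
    """Return (W+, one-tailed exact p-value) for observed > expected across loci."""
    diffs = [o - e for o, e in zip(observed, expected) if o != e]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Exact null distribution: every assignment of signs to the ranks is equally likely.
    patterns = list(product((0, 1), repeat=len(ranks)))
    as_extreme = sum(
        1 for signs in patterns
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return w_plus, as_extreme / len(patterns)


if __name__ == "__main__":
    he_observed = [0.8, 0.7, 0.9, 0.6]     # hypothetical per-locus heterozygosity
    he_equilibrium = [0.5, 0.5, 0.5, 0.5]  # hypothetical equilibrium expectations
    print(signed_rank_excess(he_observed, he_equilibrium))
```

Enumerating all sign patterns is practical for a panel of 10 loci (1,024 patterns) but would need a different approach for much larger marker sets.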

| High genetic variation in Mauritia flexuosa
No genotyping errors or null alleles were inferred using MICRO-CHECKER. Eight pairs of loci were in linkage disequilibrium (Table S2), and all populations deviated from Hardy-Weinberg equilibrium with the exception of XAP (Table S2). Rarefaction estimates of allelic richness showed that 70% of all possible alleles were sampled for all populations except for XAP ( (Table S3). We found IBD with marginally significant values of r = .42 (p = .02) among all populations, but none within basins (Table S2). BVI individuals had either admixed genotypes or shared ancestry within the Tapajós basin cluster. The population graph analysis showed that populations from the same river basin are highly connected, as in the case of the JUR and TPI rivers that flow into the TAP (Figure S1) and the GUA, MAD, and MAM rivers that are part of the same basin. Our results showed some genetic connectivity between XAP and BVI, and it is clear that the genetic diversity of XAP is lower than that of the rest of the populations sampled (Figure S1), although its smaller sample size may affect this result (rarefaction; Table S2). We did not recover evidence of genetic bottlenecks in any site except for TAP (SMM 0.01).

| Genetic variation is structured within populations of Amazonia
In the Cerrado, the two populations we sampled (BVI and XAP) show distinct genetic patterns. The BVI population is the less inbred of the two according to the fixation index (F). The area surrounding BVI (Roraima State) is thought to be the "center of origin" for many plant species (Pielou, 1979), including M. flexuosa and other palms (Rull, 1998; van der Hammen, 1957). In contrast, the XAP Cerrado population is less diverse and more inbred, which is consistent with previously observed low genetic diversity within populations of M. flexuosa in the Cerrado (e.g., de Lima et al., 2014). This is partially explained by our relatively smaller sample size, as shown by our rarefaction results, but may also be due to population decline or incomplete lineage sorting during shifts in forest expanse during glacial cycling. The absence of private alleles in XAP suggests recent population establishment and/or assortative mating. Furthermore, the XAP population is at a higher elevation (800 m), with the nearest population at least 300 km away as per our field observations, which suggests that its high differentiation and low genetic diversity are accentuated by strong geographic isolation.

| Insights into the influence of human management on genetic diversity and gene flow
Our results are also consistent with the hypothesis that the occurrence of hyperdominant plants in Amazonia, such as M. flexuosa, correlates with proximity to pre-Columbian archeological sites, and that plant populations of economically important species are maintained preferentially along river margins (Levis et al., 2017).
Furthermore, as humans increasingly hunted large vertebrates in forests typically far from the water (Peres, Emilio, Schietti, Desmoulière, & Levi, 2016), animal-dependent seed dispersal of M. flexuosa decreased in those areas, resulting in lower gene flow, all the while being maintained closer to rivers. Although these observations remain to be tested explicitly, our patterns of high diversity are also consistent with the hypothesis that large population sizes of this species have been maintained by continuous activities of human cultivation, likely for thousands of years (Levis et al., 2017).
TABLE 2 Bayesian assessment of migration within and among sampling localities implemented in BayesAss+. For each sampling locality, numbers are the mean proportion of individuals for each source locality. Boldface terms along the diagonal are the proportion of non-migrants (self-recruitment).
As a result, outcrossing would be favored by human tending and a high number of reproductive individuals would be maintained, resulting in a higher effective population size and thus higher genetic variation (Frankham, 1996).
Our data on recent genetic migration among populations also show that the Juruena river ( although this remains to be tested explicitly.
The argument that the distribution of many species, or even the composition of Amazonia, is the result of domestication by pre-Columbian peoples who altered landscapes for thousands of years has been repeatedly raised by archeologists and anthropologists.
Barlow, Gardner, Lees, Parry, and Peres (2012)  bility that it has adapted to re-colonizing habitats disrupted by fire.
The study sites from this work are currently undergoing increasing deforestation and other modifications of forest landscapes.
Hydrological connectivity in Amazonia is increasingly disrupted by dynamic and multifaceted drivers (Ritter et al., 2017), including mining and land-use changes that have modified at least 20% of Amazonia, with over 150 hydroelectric dams currently in operation and hundreds more planned (Castello & Macedo, 2016). Understanding the processes that maintain gene flow across different environments, such as those documented here for M. flexuosa, could aid conservation and management strategies. We also show the importance of rivers in maintaining connectivity among geographically distant populations of M. flexuosa, which can act as an umbrella for the associated species and environments that thrive with it. Our results from M. flexuosa may be used as a first step toward building a model for other studies of plants whose dispersal is heavily influenced by rivers.

CONFLICT OF INTEREST
None declared.