Range‐wide population genetic structure of the Caribbean marine angiosperm Thalassia testudinum

Abstract Many marine species have widespread geographic ranges derived from their evolutionary and ecological history particularly their modes of dispersal. Seagrass (marine angiosperm) species have ranges that are unusually widespread, which is not unexpected following recent reviews of reproductive strategies demonstrating the potential for long‐distance dispersal combined with longevity through clonality. An exemplar of these dual biological features is turtle grass (Thalassia testudinum) which is an ecologically important species throughout the tropical Atlantic region. Turtle grass has been documented to have long‐distance dispersal via floating fruits and also extreme clonality and longevity. We hypothesize that across its range, Thalassia testudinum will have very limited regional population structure due to these characteristics and under typical models of population structure would expect to detect high levels of genetic connectivity. There are very few studies of range‐wide genetic connectivity documented for seagrasses or other sessile marine species. This study presents a population genetic dataset that represents a geographic area exceeding 14,000 km2. Population genetic diversity was evaluated from 32 Thalassia testudinum populations sampled across the Caribbean and Gulf of Mexico. Genotypes were based on nine microsatellites, and haplotypes were based on chloroplast DNA sequences. Very limited phylogeographic signal from cpDNA reduced the potential comparative analyses possible. Multiple analytical clustering approaches on population genetic data revealed two significant genetic partitions: (a) the Caribbean and (b) the Gulf of Mexico. Genetic diversity was high (H E = 0.641), and isolation by distance was significant; gene flow and migration estimates across the entire range were however modest, we suggest that the frequency of successful recruitment across the range is uncommon. Thalassia testudinum maintains genetic diversity across its entire distribution range. The genetic split may be explained by genetic drift during recolonization from refugia following relatively recent reduction in available habitat such as the last glacial maxima.


| INTRODUC TI ON
Dispersal of propagules is critical for population maintenance in sessile marine organisms as it is essentially the only mechanism for such species to move other than the gametes themselves. Dispersal can most simply be thought of as the movement of an organism from a source (parent) to a settlement site (Cowen & Sponaugle, 2009;Kinlan, Gaines, & Lester, 2005;Weersing & Toonen, 2009). Benthic marine organisms generally release propagules into the water column which subsequently are transported by water movements (Shanks, Grantham, & Carr, 2003); however, this migration can only be considered successful when the progeny is deposited in a suitable environment (Siegel, Kinlan, Gaylord, & Gaines, 2003) and become a reproductive individual .
It is important to note that the dispersal range of aquatic plants is generally larger than terrestrial species (Les, Crawford, Kimball, Moody, & Landolt, 2003;Santamaría, 2002), meaning direct dispersal can occur over a range of meters to hundreds of kilometers (McMahon et al., 2014).
Quantifying long-distance dispersal (LDD) in a systematic and meaningful way is complex, particularly in marine environments.
Effective dispersal distances in the marine environment may be extremely large and usually do not reflect the mean displacements of the propagules (Kinlan et al., 2005). Event-based migrations often determine colonization to new locations and are important for maintaining genetic connectivity Les et al., 2003;Ouborg, Piquot, & Van Groenendael, 1999). In such situations, LDD can only be assessed retrospectively with the use of genetic markers. A typical approach to estimate LDD is to assess the relative frequencies of alleles at marker loci in populations and statistically test or model the likelihood of the allele frequencies being observed.
The indirect approach to measuring/modeling connectivity based on genetic data is widely used but is dependent on many assumptions that are rarely met (i.e., short generation time, no overlapping generations, etc.). Nevertheless, the outcomes give valuable insight into the relationships of populations.
Previous reviews have attempted to describe the relationship between genetic differentiation and marine connectivity using various genetic markers Selkoe et al., 2016;Weersing & Toonen, 2009). These works demonstrate a limited correlation between genetic differentiation of populations (F ST ) and dispersal distance. The discrepancy these studies describe is partially attributed to the difficulty in comparing F ST results when utilizing different types of genetic markers and the scale that these are used at. Additionally, a debate has been taking place on the appropriate application of the various measures of genetic differentiation (Hedrick, 2005;Heller & Siegismund, 2009;Jost, 2008;Meirmans & Hedrick, 2011), which has highlighted some concerns on the use of microsatellites for population genetics. Marine species possessing wide geographic ranges and long dispersal potential allow for valuable case studies to examine the spatial scale of genetic connectivity.
This study investigates the benthic seagrass, Thalassia testudinum Banks ex König (turtle grass), which has a wide geographic range residing in shallow tropical and subtropical waters across the Western Atlantic Ocean. Here, it plays an essential role as a foundation species forming extensive meadows and providing numerous ecosystem services (Green & Short, 2003;Van Tussenbroek et al., 2006). Thalassia testudinum occurs predominately in the Caribbean tropical marine biogeographic province. This region lost physical connectivity with the Tethys Sea following the closure of the Isthmus of Panama approximately 3 Ma (O'Dea et al., 2016), a process which initiated much earlier (7 Ma; Muss, Robertson, Stepien, Wirtz, & Bowen, 2001). Many phylogeographic studies exist for this region, particularly on fish and corals, but few have covered the total range of their study species (e.g., Chaves-Fonnegra, Feldheim, Secord, & Lopez, 2015;Purcell, Cowen, Hughes, & Williams, 2009).
In such large-scale studies, estimates of connectivity (measured as gene flow) have shown inconsistencies in the genetic structure of the target taxa. Some studies report high gene flow between distant sites (e.g., Purcell et al., 2009) while others have shown to be significantly structured and partitioned (e.g., Chaves-Fonnegra et al., 2015). Although ubiquitous throughout its range, populations of T. testudinum are disconnected by deeper water areas, unsuitable substratum for the establishment, and major river discharges (Van Tussenbroek et al., 2006). Thalassia testudinum has floating propagules (buoyant fruits), which allow for LDD beyond the range of local populations (van Dijk, van Tussenbroek, Jiménez Durán, Márquez Guzmán, & Ouborg, 2009). Like all perennial seagrasses, it is capable of longevity through clonality, a function of vegetative growth through rhizome extension and the repetitive formation of new potentially independently functional plant units or ramets (van Dijk & van Tussenbroek, 2010). The combination of LDD, clonality, and longevity produces a highly complex demographic structure.
In this study, we evaluate genetic diversity, connectivity, structure, and phylogeographic relationships among populations of T. testudinum. This species is an exemplary case study on range-wide connectivity in this area as this species is one of the most common submarine plants found in the region and will represent the first investigation where population structure of a tropical seagrass species is described across its entire range.
We propose that the current distribution of T. testudinum is dependent on genetic connectivity across large spatial scales.
We hypothesize that T. testudinum displays the capacity to remain highly connected, as evident by weak geographic structure.
It is likely T. testudinum maintains connectivity through moderate to high levels of gene flow that we expect to occur across most of the Western Atlantic despite the large geographic region this species inhabits. The combination of long-distance dispersal with longevity through clonality enables the extended spatial extent of connected populations even with relatively constrained recruitment processes. We will also examine if there is evidence of recent historical barriers, such as sea-level changes associated with the most recent glacial maxima (~26 Kya, e.g., Peltier & Fairbanks, 2006) or deeper divergences such as those associated with the closure of the Panamanian Isthmus (~2.8 Mya O'Dea et al., 2016) have influenced population divergences. Both time periods altered oceanic conditions and habitat availability along with the direction and distribution of currents influencing the potential population size and genetic diversity of T. testudinum. For contemporary analysis, we use species-specific microsatellite markers to assess genetic structure and estimate gene flow. Historical contractions or expansions will be investigated through intraspecific phylogenetic relationships with one nuclear and three chloroplast DNA sequence loci.

| Ethics statement
Sample permits were issued by SAGARPA for Mexico, and Ministerio del Ambiente for Costa Rica; specific permission was not required for the other (CARICOMP) sampling sites. Other collections were made under the auspices of permits to The University of Virginia.

| Sampling locations
Thirty-two populations of T. testudinum were collected throughout the species range ( Figure 1); fifteen of these were sampled by collaborators of CARICOMP (The Caribbean Coastal Marine Productivity Program) and other institutions (Supporting Information Table S5 in Appendix S2). For each population, two groups of 15 samples (foliar shoots, 2 m apart) were collected, and the two groups were 500-2,000 m apart (depending on local conditions). The Belize-Placencia population was collected by grab-sampling at irregularly spaced intervals at least 5 m apart. Populations from US-Rankin, US-Duck, and US-Arsenicker were obtained by randomly collecting 30 samples at a similar spatial scale from genotypic data used in Bricker, Waycott, Calladine, and Zieman (2011). Pairwise geographic distances between the populations ranged from 4 to 4,249 km (Supporting Information Table S7 in Appendix S2). Distances were measured as the shortest possible distance by sea. Procedures for sample collection, DNA extraction, and amplification followed van F I G U R E 1 Populations of Thalassia testudinum and phylogeographic assignment. Collection sites of 32 T. testudinum populations in its total distribution range (edge delimited by solid line). Population details are found in Supporting Information Table S1 in Appendix S2. Each pie depicts the relative assignment proportions of each population to clusters 1 and 2 (Caribbean and Gulf of Mexico). The diameter of each pie corresponds to the population's relative allelic diversity (Table 1). Within each pie, the attribution to rbcLa haplotype C or T is also shown. The gray arrows show the directions and intensities of the major superficial currents (Gyory, Mariano, & Ryan, 2005) (Peakall & Smouse, 2006). The web version of Genepop (Raymond & Rousset, 1995) was used to estimate the inbreeding coefficient (F IS ) for each population, also the significance of the global heterozygote excess and deficit was determined using the default parameters. The

average number of alleles per locus (A) and the allelic richness (A [n] )
for all populations were estimated with Hp-Rare (Kalinowski, 2005) applying a rarefaction to 15 samples (Kalinowski, 2002).

| Genetic structuring
To determine whether T. testudinum is subdivided into groups of shared ancestry across its range, a Bayesian assignment test was performed across all populations using Structure (Pritchard, Stephens, & Donnelly, 2000), assuming admixture and correlated allele frequencies. For each K (1 to 32), 20 independent runs (10 5 iterations burn-in and 10 7 main) were performed. Summary statistics, ∆K as described by Evanno, Regnaut, and Goudet (2005), and graphs of the runs were generated with the web package CLUMPAK (Kopelman, Mayzel, Jakobsson, Rosenberg, & Mayrose, 2015). An analysis of molecular variance (ANOVA) was used to quantify how genetic diversity was distributed among individuals, populations, and clusters. Calculations were made with GenoDive (Meirmans & Van Tierden, 2004) using an infinite allele model (999 permutations). (Meirmans, 2006) was also estimated to correct for dependence of F ST to the genetic variation of the used markers. A primary component analysis (PCA) was performed on the data as an independent approach to detect genetic patterns. The R package adegenet (Jombart, 2008) was used to do the calculations and depict the data.
To test for isolation by distance (IBD, Wright, 1943), a Mantel test was performed with each of these pairwise estimates with the webbased program IBDWS where all three measures were correlated with the pairwise geographical distances of the 32 populations. The correlations and slopes of the relationships were tested by a reduced major axis (RMA) regression applying 1,000 randomizations. IBD testing was determined assuming two-dimensional dispersal, applying a geographic log-transformation (Rousset, 1997). IBD was also calculated at a regional scale assuming two genetic clusters.
The relative number of migrants per generation was calculated with the divMigrate function in the R package diveRsity (Keenan, McGinnity, Cross, Crozier, & Prodöhl, 2013) to illustrate connectedness between populations. The G ST and newly developed N m (Alcala, Goudet, & Vuilleumier, 2014) statistics were used to estimate the relative values of gene flow. Connectivity was visualized with the qgraph package in R. Estimates of contemporary migration rates were conducted in BayesAss (Wilson & Rannala, 2003); due to the limited number of populations, the program could handle a modified version of the software was used (courtesy of Bruce Rannala).
Default settings were used with a burn-in of 1.0 × 10 6 iterations and a 4.0 × 10 7 run. Network analysis following the approach of Dyer and Nason (2004) was conducted in GENETIC STUDIO b.131 (Dyer, 2009). The population topology was developed using allelic data without a priori assumptions on any particular structure such as the data being organized by geographic region. The overall arrangement of the network is determined by the genetic relationships among populations where the edge lengths depict the among population component of genetic variation representing connectivity. The nodes represent the populations where the sizes reflect the populations' genetic variability (Dyer & Nason, 2004).

| Genetic diversity
A total of 996 seagrass ramets were genotyped and after removal of redundant MGLs, 662 individual genotypes remained for analysis.
These had a total of 137 alleles, with loci having between 8 and 23 alleles. The probability of obtaining the same genotype by a sexual event estimated by p sex was generally low. Marginal populations with very low allelic diversity, like Bermuda and Laguna Madre, had high probabilities. Clonality varied among populations with clonal richness ranging from R = 0.21 up to R = 1.00, with an average of R all = 0.66 (Table 1). Analysis of the MLGs revealed that clone mates were only detected within populations and not among. Gene diversity was generally high with H E ranges between 0.35 and 0.75. Two sites, MX-Pta Sam and US-Arsenicker, showed heterozygosity deficits. The rarefied allelic richness (A [15] ) ranged between 3.2 and 6.9 (Table 1).

| Genetic structuring
Structure analysis and plotting ∆K describe maximum genetic structure at two regional clusters, although subdividing the populations into three (K = 3) also resulted in above average structure, but less markedly (Supporting Information Figures S1, S2, and Table S3 in Appendix S1). The two major clusters can roughly be subdivided into the Caribbean (cluster 1) and the Gulf of Mexico (cluster 2), with some populations having intermediate membership. The PCA ordination arranged populations in concordance to Structure models (Supporting Information Figure S4 in Appendix S1). Again, Caribbean and Gulf of Mexico populations were grouped separately with some intermediate populations. The population graph generated with Genetic Studio depicts a similar distribution of genetic diversity (Supporting Information Figure S5 in Appendix S1).
Hierarchical partitioning of variance among populations and clusters (Supporting Information Table S2 in Appendix S2) indicates that the assignments are based on slight shifts in the allelic frequencies and not on strong isolation of the regions. The variance within each cluster (F SC = 0.160) was higher than that between clusters (F CT 0.084). Due to the nature of this analysis, assignment to clusters was determined by highest proportion of membership. The global measure of genetic structuring is F ST = 0.161 (calculated with no partitioning among populations). Genetic differentiation measures among the sampled populations increased considerably when the variance F I G U R E 2 Isolation by distance for Thalassia testudinum. Isolation by distance (IBD) was calculated for populations of Thalassia testudinum through the whole range (Graphs a-c). Based on genetic assignments with Structure, IBD was also calculated within each cluster (Caribbean (d-f) and Gulf of Mexico (g-i)). Populations sharing assignments to both clusters were included in both datasets if assignment proportions were between 0.3 and 0.7 (Supporting Information Table S3 in Appendix S1). Correlations were based on pairwise genetic distance as F ST (Weir & Cockerham, 1984), pairwise standardized F ′ ST (Hedrick, 2005;Jost, 2008), and Jost's D EST (Jost, 2008) against the pairwise geographic distance (determined by de shortest distance over water possible; see Supporting Information Tables S3-S11 in Appendix S2). Log-geographic corrections were applied to all analyses according to Rousset (1997) Log distance (km) Log distance (km) Log distance (km) Isolation by Distance Thalassia testudinum in the Caribbean and Gulf of Mexico was standardized to its potential maximum (Meirmans, 2006), resulting in F ′ ST = 0.531 over the whole study area, with F ′ CT = 0.370 among regions and F ′ SC = 0.478 among populations within regions.

| Phylogeography
In order to evaluate and assess diversity and applicability for phylogeographic analysis, six loci were tested across a subset of samples, which consisted of samples distributed throughout the range of this study. Loci ITS1, trnL-F, and psbK-I failed to produce usable sequences and were excluded. Loci ITS2 and trnH-psbA sequenced well, and atpF-H sequenced well through the first 160 bp, but none had variable base positions (Supporting Information Table S15 in Appendix S2). rbcLa had one variable site, and this locus was used for the broader study to infer phylogeography. Only two haplotypes were found across the range, and several populations had both haplotypes. Haplotype distribution followed a similar pattern as the genotypic data for K = 2 ( Figure 1).

| Connectivity
Measures of genetic differentiation varied greatly among populations (Supporting Information Tables S3-S11 in Appendix S2). The  Figure S6 in Appendix S1, Tables S13 and S14 in Appendix S2). A Bayesian approach to infer migration into the sampled populations in the most recent generations (BayesAss analysis) shows that successful settlement of new or second-generation individuals is usually below 1% and never surpasses 2% of sampled MLGs (Supporting Information Table S12 in Appendix S2). Comparing the outcomes from five different measures of connectivity and population distinctiveness, as relative migration based on N m ( Figure 3) and G ST (Supporting Information Figure S6 in Appendix S1), BayesAss (Supporting Information Table S12 in Appendix S2), network analysis (Supporting Information Figure S5 in Appendix S1) and PCA (Supporting Information Figure S4 in Appendix S1) demonstrates that the broad scale processes dominate the data whichever method of analysis utilized.

| D ISCUSS I ON
In this study, we investigated the hypothesis that the biological traits of Thalassia testudinum provide for genetic connectivity over large distances leading to low levels of genetic structure across the extant range. Genotypic multilocus data and sequence data show that genetic structure is low and that long-term integrated gene flow occurs across large to very large distances. We also found that genetic diversity was very high compared to similar studies (see Section 4.1).
Genetic diversity is highest near Cuba and tapers off toward the periphery of the species distribution. This indicates that the current pattern of genetic diversity can be explained by the central-marginal hypothesis (Eckert, Samis, & Lougheed, 2008).   (Keenan et al., 2013) using the divMigrate function (Sundqvist, Zackrisson, & Kleinhans, 2013) and using N m (Alcala et al., 2014) as the connectivity (migration) estimate (Supporting Information Table S13 in Appendix S2). The divMigrate function plots the relative asymmetric migration between populations, from microsatellite allele frequency data. A lower threshold of relative migration of 0.14 was used to eliminate uninformative edges, and edges were scaled by width and color saturation when above 0.40. Wider and darker edges represent the most connected sites of this study with the highest relative connectivity (1.0) between 23.US Craig Key and 24.US Arsenicker. The numbers within the nodes represent the populations as in Table 1 Trinidad

| Connectivity
Gene flow was the most important microevolutionary force leading to homogeneity in genetic and allelic diversity among populations (Hedrick, 1999). Thalassia testudinum has an elevated capacity for sexually derived long-distance dispersal (van Dijk et al., 2009;Kaldy & Dunton, 1999), and recent research indicates that vegetative dispersal in seagrasses could also be important (e.g., Bricker, Calladine, Virnstein, & Waycott, 2018;Smulders, Vonk, Engel, & Christianen, 2017). Thus, it is to be expected that neighboring populations are more related to each other than more distant ones (Isolation By Distance, Wright, 1943), even when these distant populations are hundreds of kilometers apart. This is confirmed by the pattern of IBD found in this study, which was consistent across all utilized ferentiation values to other sites. It is also important to note that geographic distance between populations is the shortest distance over water possible, which will not be the typical path of dispersal of propagules as local surface and subsurface currents are rarely linear (Lems- de Jong, 2017) and the shortest distance will therefore be an underestimate. Hydrodynamic modeling would be necessary to produce more accurate estimates (e.g., White et al., 2010) as environmental and climatological factors can impact the route of gene flow (van Dijk et al., 2009).
These data also speak to the ongoing mathematical controversies related to G ST -based measures (Hedrick, 2005;Jost, 2008;Meirmans & Hedrick, 2011). For example, estimating gene flow with F ST involves violating many assumptions of the Wright-Fisher model for allelic inheritance (Whithlock & McCauley, 1999). Highly clonal plants are problematic for most Hardy-Weinberg-based estimates of relationships among alleles as clonal-based longevity results in many generations that might overlap by decades or centuries.
When we compare the differentiation measures in this study, F ST yielded lower values than the other two measures, particularly at the larger geographic scales. When standardized, F ′ ST reached almost complete differentiation for some population pairs, highlighting the underperformance of F ST when using highly polymorphic markers.
Jost's D appears to best describe the geographic pattern in an IBD context; however, more studies are needed to evaluate its ecological significance as it measures different aspects of population structure (Jost, 2009). Leng and Zhang (2011) (Table S12 in Appendix S2) confirm that dispersal has not occurred in recent generations.

| Genetic structure
The genetic structuring of T. testudinum based on the distribution of genetic diversity is moderate (F ST = 0.161) which is consistent with previous findings for T. testudinum in Mexico (van Dijk et al., 2009) Based on Bayesian modeling of allelic partitioning implemented in Structure, the most likely subdivision of all data was into two genetic clusters (or regions, i.e., K = 2). Ordination-based analysis visualized as a PCA of the two most informative axes confirms the two-region split (Supporting Information Figure S4 in Appendix S1). Phylogeographic sequence data support this outcome. This

| CON CLUS ION
The findings presented in this study deliver significant insight into the genetic diversity, clonality, connectivity, and overall range-wide population structure of a important tropical seagrass species. Our results indicate that the marine angiosperm Thalassia testudinum has been able to persist and thrive across the wide range of marine environments of the Western Atlantic through clonality providing a stable base to release propagules which disperse widely, although appear to recruit at long distances infrequently. Genetic structure modeling and analyses of phylogeography using sequence data indicate that there is population structure at the scale of two major regions, the Caribbean and Gulf of Mexico. As expected, isolated populations such as Bermuda and Tampa Bay show higher levels of genetic differentiation. Lower Laguna Madre populations are genetically secluded from the rest of the Western Atlantic populations and appear to be extending their local range via sexual recruitment.
Overall genetic differentiation between populations increases with distance and at the extremes of the range populations are almost completely isolated.
The extant populations of this keystone species exhibit high levels of genetic diversity with an indicative higher level of diversity is around western Cuba, an area where both lineages exchange genes more frequently. Although across the range as a whole, the scale of connectivity is large, the frequency of successful recruitment appears to be only moderate. In addition, although genetically diverse, the slow rate at which microevolutionary processes occur in this long-lived clonal species does pose a risk under rapid environmental change scenarios such as climate change.
We provide evidence that demonstrates long-distance migration is widespread and a variable process that brings new recruits (sexual and/or clonal) into local populations. Recent migration patterns (the last couple of generations, which could be hundreds to thousands of years) indicate more than 90% of recruits come from local sources.
The fact that only two clusters are found with both genotypic (relatively fast-evolving) and phylogeographic (relatively slow evolving) analysis tools indicates two separate recolonization events occurred in the Caribbean and Gulf of Mexico, which we propose aligns with recolonization following the last glacial maximum when range expansion was likely to be rapid and widespread for Thalassia testudinum.

ACK N OWLED G M ENTS
Technical, funding, and intellectual support were provided by Professor Jay Zieman, to whom we dedicate this study to in memo-  Table S1 in Appendix S2).

CO N FLI C T O F I NTE R E S T
The authors have no conflict of interests.