Hitchhiking the high seas: Global genomics of rafting crabs

Abstract Population differentiation and diversification depend in large part on the ability and propensity of organisms to successfully disperse. However, our understanding of these processes in organisms with high dispersal ability is biased by the limited genetic resolution offered by traditional genotypic markers. Many neustonic animals disperse not only as pelagic larvae, but also as juveniles and adults while drifting or rafting at the surface of the open ocean. In theory, the heightened dispersal ability of these animals should limit opportunities for species diversification and population differentiation. To test these predictions, we used next‐generation sequencing of genomewide restriction‐site‐associated DNA tags (RADseq) and traditional mitochondrial DNA sequencing, to investigate the species‐level relationships and global population structure of Planes crabs collected from oceanic flotsam and sea turtles. Our results indicate that species diversity in this clade is low—likely three closely related species—with no evidence of cryptic or undescribed species. Moreover, our results indicate weak population differentiation among widely separated aggregations with genetic indices showing only subtle genetic discontinuities across all oceans of the world (RADseq F ST = 0.08–0.16). The results of this study provide unprecedented resolution of the systematics and global biogeography of this group and contribute valuable information to our understanding of how theoretical dispersal potential relates to actual population differentiation and diversification among marine organisms. Moreover, these results demonstrate the limitations of single gene analyses and the value of genomic‐level resolution for estimating contemporary population structure in organisms with large, highly connected populations.

consequences of dispersal ability is fundamental to our understanding of population and community ecology, as well as the origin and maintenance of biological diversity (Lenormand, 2002;McPeek & Holt, 1992;Treml, Ford, Black, & Swearer, 2015).
Among marine animals, pelagic larval duration (PLD) plays a widely recognized role in the dispersal and connectivity of populations (Faurby & Barber, 2012). Because adults of many marine animals are nondispersive-often benthic and sessile or sedentarydispersal is primarily restricted by the vagility of pelagic larvae. As a result, PLD is often correlated with the geographic range and degree of population differentiation (e.g., F ST ) of a species (Cowen, 2000;Cowen & Sponaugle, 2009, but see Bradbury, Laurel, Snelgrove, Bentzen, & Campana, 2008;Weersing & Toonen, 2009). However, the communities of animals associated with the ocean's air-water interface (termed Neuston: Naumann, 1917via Marshall & Burchardt, 2005) also disperse as juveniles and adults while drifting or rafting at the surface of the open ocean. In these animals, dispersal by pelagic larvae (if present) is augmented by the dispersal potential of adults and juveniles, which can use large ocean currents to disperse across ocean basins and perhaps further (Thiel & Haye, 2006). In theory, the heightened dispersal ability of these animals should lead to wide geographic ranges, limited population structure, and reduced diversification. However, unlike many members of benthic communities, we know far less about population differentiation and diversification within the neustonic community.
Marine organisms with large populations and high dispersal ability pose challenges for estimating population connectivity and structure because tracking dispersal of individuals within the vast ocean is often logistically impossible. Genetic data provide a tool to assess population connectivity at large spatial scales (Hedgecock, Barber, & Edmands, 2007;Lowe & Allendorf, 2010). Because the detection of significant population structure provides clear evidence for differentiation among populations with low connectivity, the literature is biased toward positive examples of population structure (Hedgecock et al., 2007). However, when population connectivity is high and populations are large (e.g., in neustonic animals), it is difficult to detect population structure with limited genetic resolution offered by traditional genotypic markers (Benestan et al., 2015;Goetze, 2005;McCormack, Hird, Zellmer, Carstens, & Brumfield, 2013). Recent advances in high-throughput sequencing technologies have allowed genomewide genetic variation to be incorporated in population genetic analyses of nonmodel organisms (Reitzel, Herrera, Layden, Martindale, & Shank, 2013), providing unprecedented genetic resolution of the population biology and connectivity of previously enigmatic groups of organisms.
Planes crabs are common and conspicuous members of the neustonic community throughout the temperate and tropical oceans of the world (Chace, 1951). Three species currently are recognized as follows: Planes minutus (N. Atlantic and Mediterranean), Planes major (worldwide, except N. Atlantic), and Planes marinus (worldwide, except N. Atlantic) (Chace, 1951;Ng, Guinot, & Davie, 2008)

. While
Pl. marinus is morphologically distinct, Pl. minutus and Pl. major show overlapping trait distributions in supposedly diagnostic traits (Chace, 1951). Recent phylogenetic analyses of the family Grapsidae suggest that the genus Planes is actually paraphyletic due to the well-supported inclusion of a fourth putative species, Pachygrapsus laevimanus (Ip, Schubart, Tsang, & Chu, 2015;Schubart, 2011), which is an intertidal species found across a narrow band of the South Pacific from Australia to Rapa Island (Poupin, Davie, & Cexus, 2005). Unlike intertidal grapsid crabs, which disperse almost exclusively during the pelagic larvae stage (Anger, 1995), Planes disperse as juveniles and adults while rafting on surface-drifting oceanic debris or flotsam, and as facultative symbionts of oceanic-stage sea turtles, frequently inhabiting the pocket above the turtle's tail (Chace, 1951;Pfaller et al., 2014). Planes and Pa. laevimanus crabs therefore provide an opportunity to test the prediction that elevated dispersal potential limits species diversity and decreases population structure.
Traditional single gene analyses of intertidal grapsid crabs show weak genetic differentiation across wide latitudinal gradients, but limited evidence for transoceanic gene flow, indicating that large ocean basins represent significant barriers to pelagic larval dispersal (Cassone & Boulding, 2006;Schubart, Cuesta, & Felder, 2005).
Moreover, restricted transoceanic gene flow between sister species has been identified as a potential mechanism leading to diversification within the family Grapsidae (Schubart, 2011). For Planes, the ability to disperse as pelagic larvae and as adults associated with oceanic flotsam and sea turtles should facilitate transoceanic genetic exchange, limiting both species diversity within this group of crabs and intraspecific genetic differentiation among widely separated populations. To test this hypothesis, we conducted an analysis of the global species diversity and population-level differentiation using next-generation sequencing of genomewide restriction-siteassociated DNA tags (RADseq) and traditional mitochondrial DNA (mtDNA) sequencing, to address three main questions. At the species level: (a) Is species diversity low with no evidence of cryptic species? At the population level: (b) Is population differentiation weak among widely separated aggregations? (c) If genetic discontinuities exist, where are there biogeographic corridors and barriers to rafting dispersal at a global scale?

| Taxon sampling and justification
Specimens were collected from 27 sites within 13 broad ocean regions corresponding to the east and west sides of each major ocean gyre, the central Pacific, and the Mediterranean Sea ( Figure 1; sampling regions were not based on known biogeographic boundaries). Each specimen was given an a priori species designation based on external morphology, habitat, and/or geography following Chace (1951) and Poupin et al. (2005) (see Appendix S1 for details). Pachygrapsus laevimanus specimens were collected intertidally among rocks at three sites across its known range (Poupin et al., 2005). Planes specimens were collected from surface-drifting oceanic debris and sea turtles (Caretta caretta, Chelonia mydas, Eretmochelys imbricata, and Lepidochelys olivacea) at 24 sites within all 13 ocean regions. Where applicable for each putative species, 1-5 individuals were obtained from each site. For Pl. minutus and Pl. major, larger samples (>10 individuals) were collected when possible (Table 1).Prior to evaluating genetic patterns within Planes and Pa. laevimanus using genomic RADseq methods, we conducted a single-locus mtDNA phylogenetic analysis within the family Grapsidae to (a) evaluate statistical support for the relationship between Pa. laevimanus and Planes (as in Schubart, 2011;Ip et al., 2015), but with wider geographic and taxonomic sampling,

| COI amplification, sequencing, and analyses
A 650-bp barcoding fragment of the mitochondrial cytochrome c oxidase subunit I (COI) gene was PCR amplified using degenerate universal metazoan primers (forward/reverse: dgLCO/dgHCO) following protocols described in Evans and Paulay (2012). Samples producing PCR products of appropriate size were Sanger sequenced bidirectionally.
Forward and reverse sequences were assembled using Sequencher v4.10.1 (Gene Codes Corporation Ann Arbor, MI, USA) and manually checked for ambiguous and erroneous base calls.

| Creation and sequencing of RAD libraries
Genomic DNA quality was checked on agarose gels to ensure the majority of DNA fragments were of high molecular weight. Samples not meeting this criterion were excluded from RAD library development as degraded DNA has been shown to dramatically reduce the ability to recover comparable loci among individuals (Graham et al., 2015). RAD libraries were prepared following the double-digest (ddRAD) protocols described by Parchman et al. (2012)

| Processing of sequenced RAD tags
Sequence quality filtering and processing were performed using tools in the FASTX-Toolkit (Gordon & Hannon, 2010). Sequences were filtered to retain reads with a minimum Phred score of 20 for 90% of the read, demultiplexed with zero barcode mismatches, and trimmed to remove the inline barcode and restriction cut-sites resulting in 84-bp reads. Sequence alignment, single nucleotide polymorphism (SNP) discovery, and genotyping were performed in STACKS v. 1.21 (Catchen, Amores, Hohenlohe, Cresko, & Postlethwait, 2011;Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013). In the ustacks module, data from each individual were processed allowing a minimum identical read depth of 2 (-m 2), a maximum distance for allele detection of two (-M 2), and invoking the -r and -d options.
In the cstacks module, a master catalog of all observed loci and al- . Alternate parameter values were tested for each module and those used represent a compromise between dataset size, information content, and percentage of missing data (see Appendix S1 for more details). a Some samples failed during either COI or RAD analyses; therefore, the same set of individuals was not used in both analyses. b See Figure 1 for specific collection sites within each region and region abbreviations. c For genetic analyses, only one specimen per host (piece of flotsam or turtle) was analyzed, unless absolutely necessary. d Old specimens, DNA too degraded for RADseq.
Because clustering programs can have difficulty identifying lower substructure in the presence of more dominant higher-level organization (Kalinowski, 2011), different RAD datasets were generated to ensure the retention of loci relevant at both inter-and intraspecific scales. Results from the most inclusive analysis were used to inform grouping of individuals for subsequent analyses, making no a priori assumption about species or population designation. The

| Individual and population clustering
The number of species and/or population clusters present in each dataset was inferred using parametric and nonparametric clustering methods as implemented in the programs STRUCTURE v.
2.3.4 (Pritchard, Stephens, & Donnelly, 2000) and AWCLUST v. In STRUCTURE, 40,000 MCMC generations were run with a burn-in of 10,000 using an admixture model with correlated allele frequencies, no prior information on sampling location, with five replicates for each value of K. STRUCTURE results were processed using STRUCTURE HARVESTER v. 0.06.94 (Earl & vonHoldt, 2012), where different values for K were evaluated by comparing the log-likelihood probability (L(K); mean ± standard deviation) of each model and applying the deltaK method (Evanno, Regnaut, & Goudet, 2005). Based on estimated ancestry coefficients calculated in STRUCTURE, each individual was assigned to one putative species or population cluster at each value of K.
In AWCLUST, pairwise allele sharing distance matrices were generated between all individuals in each dataset and multidimensional scaling plots were constructed to visualize putative clusters and identify outliers. Gap statistics were calculated and compared for each value of K following 100 null simulations, and each individual was assigned to one putative species or population cluster based on hierarchical clustering plots. In both STRUCTURE and AWCLUST, we tested values of K between 1 and 10 for RAD dataset 1, between 1 and 5 for RAD dataset 2, and between 1 and 8 for RAD datasets 3. At each value of K, we compared the composition of individuals within clusters to quantify the congruence between STRUCTURE and AWCLUST assignments and to identify common and erroneous clusters based on putative species designations and geography. To test for additional hierarchical structure in each RAD dataset, we also ran ML phylogenetic analyses on the SNP multiple sequence alignments in RAxML v 8.2.10 (Stamatakis, 2014) using an ascertainment bias-corrected GTRGAMMA model with the Felsenstein correction and the rapid bootstrap algorithm with 300 bootstrap iterations (see Appendix S1 for more details).

| Population genomic analyses
For each RAD dataset, the genetic diversity within clusters and genetic differentiation between clusters detected in STRUCTURE and AWCLUST were estimated by calculating pairwise genetic distance (F ST ), observed and expected heterozygosity, and number of private alleles in the program Arlequin. Significant differences in F ST values among clusters were determined by a 1,000 permuta-

tion test with Bonferroni corrections for multiple comparisons in
Arlequin.
For RAD dataset 3, we performed additional fine-scale analyses based on the observed genetic clustering. A hierarchical analysis of molecular variance (AMOVA; Excoffier, Smouse, & Quattro, 1992) was performed to test how genetic variation is partitioned within and among ocean basins and regions. Regional designations for the AMOVA follow Table 1 and Figure 1, except for the southeast and southwest Atlantic (SEA and SWA) and southeast and southwest Indian (SEI and SWI), which were grouped together into South Atlantic (SA) and Indian (IND), respectively, due to small sample sizes. Significant differences in genetic differentiation (F ST ) were evaluated using a Bonferroni correction alpha value of 0.0009.
To test for isolation by distance, Mantel tests implemented in GENALEX v. 6.5 (Peakall & Smouse, 2012) were conducted between F ST and log-transformed geographic distance among sites within the North Atlantic and Pacific oceans. Last, we used the SNP multiple sequence alignment for RAD dataset 3 to generate a phylogenetic network with the Neighbor-Net algorithm in SPLITSTREES v 4.14.6 (Huson & Bryant, 2006).

| COI phylogenetic analysis
A ML phylogenetic analysis of 253 COI sequences from 23 grapsid species (including Pa. laevimanus and three putative Planes species) resulted in consistently high bootstrap support (≥97%) for the monophyly of most species. However, COI data provide little to no resolution of relationships between genera or within species after collapsing nodes with weak bootstrap support (<60%) ( Figure 2). In this analysis, Pa. laevimanus and Planes form a single clade that is distinct from other grapsid species with high bootstrap support (98%), which is consistent with the paraphyly of Planes due to the well-supported inclusion of Pa. laevimanus as found by Schubart (2011) and Ip et al. (2015). All COI sequences for Pa. laevimanus and Planes are available on GenBank (Accessions MH931286-MH931370).
Within the clade that unites Planes and Pa. laevimanus, we found one strongly supported polytomy (bootstrap = 87%) nested within a larger polytomy (Figure 2). Individuals within the nested clade (Group 2 in Figure 2) were united because they shared seven unique mtDNA SNPs that were not found in individuals outside the nested clade

| RAD libraries and processing
We sequenced RAD libraries for 152 individuals in two lanes of and Pl. minutus and Pl. major individuals (RAD dataset 3; N = 116).
For the three RAD datasets, we recovered the following numbers of loci: RAD dataset 1 = 1,108 loci; RAD dataset 2 = 3,314 loci; and RAD dataset 3 = 1,288 loci. Most loci were unique to each dataset, while some loci were shared among datasets ( Figure S1).
RAD dataset 1 contained a high proportion of loci with elevated haplotype diversity as calculated by Φ ST values ( Figure S2), suggesting this dataset contained a larger number of loci with fixed differences appropriate for evaluating higher-level (i.e., specieslevel) relationships. RAD datasets 2 and 3 contained a substantially higher proportion of loci with lower Φ ST values ( Figure S2), suggesting that these datasets were more appropriate for assessing recent divergences and finer population-level patterns.

| Clustering of individuals and populations
The optimal values for K, the number of putative species or population clusters, in each dataset were not always clear-cut within and between STRUCTURE and AWCLUST analyses. Therefore, instead of selecting and analyzing just one seemingly optimal value of K, thereby excluding other potentially important patterns, we analyzed results at multiple K values within each dataset.
For RAD dataset 1, which comprised all putative species and ocean regions, we found support for K = 2 in STRUCTURE and K = 4 in AWCLUST ( Figure S3). At K = 2, there was high congruence (≥95%) between STRUCTURE and AWCLUST assignments

| Population genomic analyses
For each RAD dataset, all pairwise comparisons of genetic dif-

ferentiation (F ST ) between clusters identified in STRUCTURE and
AWCLUST were highly significant (p-value <0.001). Table 2 shows pairwise comparisons of genetic distance (F ST ), and associated pvalues, observed and expected heterozygosity, and number of private alleles among clusters identified in RAD dataset 1 (Figure 4).    Figure 6). See additional information in Appendix S2

See additional information in Appendix S2 and results for K = 3 in
and results for K = 3 in Table S3. This analysis from RAD dataset 3 shows an overall pattern of relatively low, but consistent, genetic differentiation between the four ocean regions with evidence of subtle differentiation between Pl. minutus (Cluster 1) and Pl. major Neighbor-Net support the overall patterns found in the clustering analyses at K = 4 ( Figure S7), but with a highly complex network along the backbone that suggests considerable ongoing gene flow between clusters.
Additionally, for RAD dataset 3, an analysis of molecular variance (AMOVA) across 11 regions in three ocean basins showed that the majority of genetic variation was found among individuals within regions (87%) and that there was considerably more genetic variation between oceans (11%) than among regions within oceans (2%) ( Table 5). Patterns of genetic differentiation (F ST ) among the regions designated in the AMOVA (Figure 7; Table S4) were generally consistent with patterns (and associated F ST values) among clusters identified in STRUCTURE and AWCLUST ( Figure 6;

| D ISCUSS I ON
A major challenge in phylogeography is predicting the role that dispersal mode plays in shaping population structure and ultimately species diversification (Palumbi, 1994(Palumbi, , 2003. In this study, we use rafting crabs to test whether the ability to disperse as pelagic larvae and as adults associated with oceanic flotsam and sea turtles facilitates transoceanic genetic exchange, thereby limiting both species diversity within this group of crabs and intraspecific genetic differentiation among widely separated populations. Because initial mtDNA analyses showed only limited divergence between species and no evidence of population structure, it was impossible to distinguish whether this was caused by ongoing genetic exchange or because the mtDNA marker simply lacked the resolution to detect existing phylogeographic patterns. We addressed this problem using genomewide SNP data. We found convincing evidence that (a) intertidal Pa. laevimanus is sister to rafting Pl. marinus, (b) Pl. minutus and Pl. major comprise a single globally distributed species that is sister to Pa. laevimanus and Pl. marinus, and that hybridizes with Pl. marinus in the North Atlantic, and (c) Pl. minutus/major exhibit limited population structure at a global scale with weak genetic discontinuities associated with prominent oceanographic features. Our results show how life history changes that augment dispersal potential (i.e., adults shifting from intertidal to rafting) can limit, but not prevent, species diversification and population differentiation, and highlight the value of genomic data in resolving phylogeographic patterns in organisms with large, highly connected populations.

| Low species diversity in rafting crabs
Our results confirm that the genus Planes is paraphyletic due to its well-supported relationship with Pa. laevimanus (Ip et al., 2015;Schubart, 2011), only with greater taxonomic and geographic depth.
The morphological similarity between Planes, especially Pl. marinus, and Pachygrapsus has led to taxonomic confusion in the past (Chace, 1951(Chace, , 1966. However, Pa. laevimanus has never been linked to Planes until genetic data were analyzed (Schubart, 2011;Ip et al., 2015;this study). In a novel observation, the affinity between Pa. laevimanus and Planes is evident when comparing male gonopod morphologies: Pa. laevimanus is clearly more similar in shape to Planes (Figure 2 in Chace, 1951) than to any Pachygrapsus (Figure 15 in Poupin et al., 2005). Like many groups of marine animals, the use of external morphology and traditional genotypic markers has failed to produce a  TA B L E 2 Pairwise comparison of genetic distance (F ST ; below diagonal) and associated p-values (above diagonal), observed and expected heterozygosity (SE = standard error), and number of private alleles among clusters identified in RAD dataset 1 resolved phylogenetic hypothesis for the family Grapsidae (e.g., Schubart, Cuesta, & Felder, 2002;Schubart, Cannicci, Vannini, & Fratini, 2006;Schubart, 2011;Ip et al., 2015; this study). The application of phylogenetic RADseq (e.g., Jones, Fan, Franchini, Schartl, & Meyer, 2013;Wagner et al., 2013) in conjunction with classical morphological analyses will help resolve phylogenetic patterns, thereby providing the framework to investigate pressing evolutionary questions within this family and in other taxonomically challenging groups.
Our RADseq analyses indicate that intertidal Pa. laevimanus is sister to rafting Pl. marinus. In the absence of any differences in the mitochondrial genome or at least at the COI locus-a fast-evolving locus often used as a species-level barcode (Evans & Paulay, 2012)-our interpretation of these results is that there has been a F I G U R E 5 Results from clustering analyses of RAD dataset 2 at ( (Pfaller & Gil, 2016) or the same sea turtle (Frick et al., 2011) yet do not interbreed. Identifying the factors that both promote hybridization in specific areas and deter hybridization elsewhere would shed light on the mechanisms underlying the maintenance and merger of species diversity on a broader scale (Abbott et al., 2013;Barton, 2001).

| Weak global population structure in rafting crabs
However, in the absence of prominent physical barriers, evidence for genetic discontinuity within a species becomes difficult to explain (Lowe & Allendorf, 2010).
The nonpolar distribution of Planes likely reflects its inability to survive cold temperatures (Chace, 1951;Spivak & Bas, 1999), thereby limiting dispersal across regions below their thermal minimum (e.g., the Arctic and Southern oceans). Our results show clear, albeit weak, differentiation between individuals in the Atlantic and Pacific oceans, indicating that continental landmasses and the polar waters at Cape Horn (southern South America) limit dispersal. Dispersal limitations across this barrier have led to Atlantic-Pacific speciation in a tropical rafting crab, Plagusia (Schubart, González-Gordillo, Reyns, Liu, & Cuesta, 2001). However, our results indicate that the cold, but not polar, waters around Cape of Good Hope (southern Africa) do not limit dispersal in Planes and instead act as a dispersal corridor.
These patterns are only partially consistent with genetic patterns of other neustonic organisms: Glaucus nudibranchs and Halobates sea skaters show restricted dispersal across both Cape Horn (Atlantic-Pacific disjunction) and the Cape of Good Hope (Atlantic-Indian disjunction) (Anderson et al., 2000;Churchill et al., 2014). Planes may simply have a lower thermal tolerance, thereby allowing dispersal between the Atlantic and Indian oceans. However, Planes may also be able to successfully navigate this potential barrier while associated with sea turtles. Both loggerhead and green turtles show genetic connectivity across this biogeographic boundary (Bourjea et al., 2006;Shamblin et al., 2014), providing a potential dispersal vector for Planes that is unavailable to other neustonic animals that are not known to associate with sea turtles.

Sources of variation
Patterns of genetic structure revealed in our genomic RADseq analyses were not detected in our mtDNA analysis. Traditional genotypic markers (e.g., COI) can provide valuable phylogeographic inferences when there is sufficient genetic resolution to elucidate population-level differences, which is often the case for organisms that have low connectivity among populations. Because these markers were the primary tool available in the past and estimating population structure is more clear-cut when populations are distinct, the literature is somewhat biased toward positive examples (Benestan et al., 2015;Goetze, 2005;McCormack et al., 2013). However, when little to no population structure is detected with traditional markers, as in Planes and other neustonic and planktonic animals [copepods (Bucklin & Kocher, 1996;Bucklin, LaJeunesse, Curry, Wallinga, & Garrison, 1996;Bucklin et al., 2000), euphausiids (e.g., Zane et al., 1998;Zane & Patarnello, 2000;Jarman, Elliott, & McMinn, 2002), squid (Sands, Jarman, & Jackson, 2003), and nudibranchs (Churchill et al., 2014)], it becomes exceedingly difficult to distinguish whether there is ongoing genetic exchange (i.e., panmixis) or whether the marker simply lacks the resolution to detect subtle phylogeographic patterns. For these reasons, our current understanding of the population biology of many neustonic animals remains either unresolved or incomplete. Our results demonstrate the ability of genomic tools, like RADseq, to identify weak population structure when traditional genotypic markers hold no resolution (Fraser 2018). Because these tools are more sensitive to subtleties in phylogeographic structure, they hold great value and future promise for elucidating populationlevel patterns in organisms that exhibit vast dispersal potential and high connectivity among distant populations.

ACK N OWLED G M ENTS
Funding for this work was awarded to JBP by the PADI Foundation