Genome- wide phylogeography reveals cryptic speciation in the circumglobal planktonic calcifier Limacina bulimoides

Little is known about when and how planktonic species arise and persist in the open ocean without apparent dispersal barriers. Pteropods are planktonic snails with thin shells susceptible to dissolution that are used as bio- indicators of ocean acidification. However, distinct evolutionary units respond to acidification differently, and defining species boundaries is therefore crucial for predicting the impact of changing ocean conditions. In this global population genomic study of the shelled pteropod Limacina bulimoides , we combined genetic (759,000 single nucleotide polymorphisms) and morphometric data from 161 individuals, revealing three major genetic lineages ( F ST = 0.29– 0.41): an “Atlantic lineage” sampled across the Atlantic, an “Indo- Pacific lineage” sampled in the North Pacific and Indian Ocean, and a “Pacific lineage” sampled in the North and South Pacific. A time- calibrated phylogeny suggests that the lineages diverged about 1 million years ago, with estimated effective population size remaining high ( ~ 10 million) throughout Pleistocene glacial cycles. We do not observe any signatures of recent hybridization, even in areas of sympatry in the North Pacific. While the lineages are reproductively isolated, they are morphologically cryptic, with overlapping shell shape and shell colour distributions. Despite showing that the circumglobal L. bulimoides consists of multiple species with smaller ranges than initially thought, we found that these pteropods still possess high levels of genetic variability. Our study adds to the growing evidence that speciation is often overlooked in the open ocean, and suggests the presence of distinct biological species within many other currently defined circumglobal planktonic species.


| INTRODUC TI ON
We still know little about how species arise and persist in the open ocean, a seemingly interconnected habitat with few apparent barriers to dispersal. Although allopatric speciation has been accepted as the dominant mode of speciation (Bowen et al., 2016;Coyne & Orr, 2004;Mayr, 1954;Norris, 2000), it has been well documented that speciation can occur in the presence of high dispersal and gene flow, including in marine systems (Bendif et al., 2019;Bierne et al., 2003;De Vargas et al., 1999;Endler, 1977;Johannesson et al., 2010;Potkamp & Fransen, 2019;Schluter, 2009). To gain insight into the mechanisms of speciation in marine habitats, it is important to understand the factors that facilitate or constrain divergence, rather than categorizing case studies into allopatric, parapatric or sympatric speciation (Butlin et al., 2008;Fitzpatrick et al., 2009). While marine species have a wide variety of life history traits, dispersal potentials and extent of gene flow among populations, holoplanktonic species typically represent an extreme end of this scale with high fecundities, enormous population sizes, and huge potential for dispersal and gene flow.
Many marine zooplankton species rank among the most abundant multicellular eukaryotes in the world. With their unique life history traits, such species present an interesting system to assess the impact of large population size and high dispersal potential on speciation and divergence (Bucklin et al., 2021;Faria et al., 2021).
The large effective population sizes in marine zooplankton can have varying effects on evolution depending on the relative effects of selection and stochastic demographic events (Peijnenburg & Goetze, 2013). There can be higher levels of genetic diversity for selection to act upon and more effective selection due to the reduced effects of stochastic processes such as genetic drift (Barrio et al., 2016;Peijnenburg & Goetze, 2013). However, the relationship between adaptation and population size may not be straightforward (Galtier, 2016), and the reduction in genetic drift slows down the accumulation of genetic differentiation between (partially) isolated populations, which might also have profound effects on species divergence, for example under mutation-order speciation models (Mani & Clarke, 1990;Schluter, 2009).
Species with widespread geographical distributions can have slight morphological variation across their range, but this variation alone may be considered insufficient to delimit sibling species (Fleminger & Hulsemann, 1987;Knowlton, 1993). In other cases, well-characterized morphological differences are deemed to be the result of a highly variable morphospecies (e.g., Apolônio Silva De Oliveira et al., 2017). Such practices have resulted in an underestimation of ecologically relevant divergence with conventional morphology-based taxonomy in many cases (Bongaerts et al., 2021).
With the increased availability of powerful genetic tools, there has been growing evidence for cryptic species complexes in the open ocean, contrary to historical expectations that many pelagic species would have circumglobal, panmictic populations (Norris, 2000;van der Spoel & Heyman, 1983). The majority of circumglobal planktonic species examined with genetic data have been found to harbour cryptic diversity, with examples ranging from diatoms (Casteleyn et al., 2010;Whittaker & Rynearson, 2017) to copepods (Andrews et al., 2014;Cornils et al., 2017;Halbert et al., 2013;Hirai et al., 2015) and other calcifying plankton, including gastropods (Burridge et al., 2019;Wall-Palmer et al., 2018), foraminifers (Darling et al., 2004;De Vargas et al., 1999;Kucera & Darling, 2002) and coccolithophores (Filatov et al., 2021;Sáez et al., 2003). Therefore, an important step for understanding the capacity of plankton to adapt to future environmental change is to assess the spatial distribution of genetic variation and potential for gene flow across their species ranges (Bell, 2013;Harvey et al., 2014;Manno et al., 2017;Munday et al., 2013;Poloczanska et al., 2016;Sunday et al., 2014). In addition, the use of an integrative taxonomy framework (Burridge et al., 2019;Padial et al., 2010) would be beneficial to identify any consistent morphological differences between these lineages and support the current genetic findings (McManus & Katz, 2009). Also, within pteropods, the traditional use of shell shape to classify and identify species (Bé & Gilmer, 1977;Lalli & Gilmer, 1989;van der Spoel et al., 1997), while advantageous due to the traceable morphology of shells preserved in fossils (Janssen & Peijnenburg, 2017), has been shown to be inadequate on its own, following the identification of genetically distinct cryptic species within several described morphospecies (Hunt et al., 2010;Jennings et al., 2010). More recent case studies in pteropods have also successfully combined genetic data from barcoding genes with geometric morphometrics of the shell shape to assess species boundaries (e.g., Burridge et al., 2015;Burridge et al., 2019;Choo et al., 2021;Shimizu et al., 2021).
However, little is known about their adaptive potential for future environmental changes, including their levels of genetic diversity and functional genomic variation. Pteropods possess thin aragonitic shells that are vulnerable to dissolution and more difficult to produce in acidified waters, which are predicted for the future (Busch et al., 2014;Mekkes, Sepúlveda-Rodríguez, et al., 2021) and also observed along contemporary gradients of ocean acidification Manno et al., 2018;Niemi et al., 2021). Shelled pteropods feed on phytoplankton and particulate matter, including bacteria and small protists, by trapping them in external mucous webs and ingesting the webs (Conley et al., 2018;Hunt et al., 2008;Lalli & Gilmer, 1989). Shelled pteropods also constitute a significant part of the diet of their unshelled relatives (gymnosomes), salmon and other predators from higher trophic levels in the pelagic food web (Groot & Margolis, 1991;Hunt et al., 2008;Lalli & Gilmer, 1989). Like other shelled pteropods, species of the genus Limacina are protandrous hermaphrodites, developing from larvae into males first, before transitioning into females that lay free-floating egg strings from which free-swimming larvae hatch (Lalli & Wells, 1978). Limacina bulimoides is the most common warm-water Limacina species worldwide, and is found in all tropical and subtropical oceans from ~45° N to ~40° S (Bé & Gilmer, 1977).
They inhabit depths of up to 200 m and exhibit diel vertical migration, where they are abundant in the upper 100 m at night, and retreat to deeper waters during the day (Wormuth, 1981). This species is estimated to complete its life cycle in about 1 year (Wells, 1976).
Our prior work on L. bulimoides was limited to a subset of the species' range in the Atlantic Ocean and used two barcoding genes (Choo et al., 2021). Here we investigate patterns and evolutionary origins of divergence across the entire species' range and with hundreds of thousands of genome-wide single nucleotide polymorphisms (SNPs) that derive from the target capture loci characterized in Choo et al. (2020). We also measured phenotypic differences and examined these within the geographical and genetic context of L.
bulimoides, and explored the historical origins of speciation in the tropical and subtropical ocean. Overall, the work presented here represents the first global population genomics study of a marine zooplankton species, revealing the evolutionary history and extent of reproductive isolation among multiple independently evolving lineages that are morphologically cryptic.

| Sample collection
Bulk plankton samples were collected in the Atlantic, Pacific and Indian Oceans during several cruises (Table 1). Specimens were collected using either oblique or vertical tows with ring nets or bongo nets (mesh size 200-505 μm). Tows were conducted for 20 min to 1 h, at approximate maximum depths ranging between 60 and 459 m.
Samples were immediately preserved in 96% ethanol and stored at −20°C. The ethanol was replaced after 24 h of preservation. Limacina bulimoides specimens were subsequently sorted in the laboratory and photographed in a standard orientation prior to destructive genetic work. Thus, the same set of individuals was used to obtain both genetic and phenotypic data (Table S1).

| Library preparation and sequencing
Specimens were prepared for sequencing as described in Choo et al. (2020). Briefly, genomic DNA was extracted from each individual with either the E.Z.N.A mollusc or insect DNA extraction kit (Omega Bio-Tek). The DNA was sheared by sonication to attain a peak length of 300 bp with a Covaris S220 or ME220 focused ultrasonicator. After sonication, the fragmented DNA was prepared into individual libraries using the NEXTflex Rapid Pre-Capture Combo Kit (Bioo Scientific). Individually barcoded libraries were subsequently pooled at equimolar concentrations with 26-27 libraries per pool.
The target capture reaction was then performed on each pool using the myBaits Custom Target Capture kit (Arbor Biosciences), with a capture probe set specifically designed for L. bulimoides, as described in Choo et al. (2020). The baits target 2890 nuclear regions as well as the mitochondrial cytochrome c oxidase subunit I (COI) and nine other mitochondrial gene regions. To maximize the specificity of the capture reaction, the hybridization time of the probes with the libraries was extended to 3 days, and the capture was TA B L E 1 Collection details of specimens of Limacina bulimoides from the global ocean.  Figure 1a.

Cruise and station
performed twice, with 4 and 1.5 μL of probe mix, respectively (Choo et al., 2020). Captured library pools were sequenced in paired-end mode on the Illumina NextSeq 500 platform using three high-output v2 chips (150 cycles).
The reduced genomic assembly contained only the relevant contigs (i.e., those contigs that were used for the capture probe design).
The resulting alignments were cleaned and filtered with samtools version 1.4.1 (Li et al., 2009)

| Quality filtering
SNPs were hard-filtered with VariantFiltration using QualByDepth  Table S3.
For the majority of analyses, SNPs were further filtered in bcftools version 1.7 (Danecek et al., 2021) to retain only SNPs with Phred quality score >20 and with genotype calls based on at least three reads present in at least 80% of the individuals. This resulted in 759,613 nuclear SNPs ready for further analyses. A summary of the data processing steps, setting and data sets used for each analysis in this study can be found in Figure S1.

| Population structure
For all population structure analyses, we excluded rare alleles.
Specifically, SNPs with minor allele frequency (MAF) of <1% were   (Chang et al., 2015) and population clustering using the admixture software (Alexander et al., 2009). We used the in-built cross-validation procedure in admixture to estimate an appropriate value of K with the lowest cross-validation error among the values of K tested, between 2 and 8. Higher values of K were not tested as values of K between 3 and 8 showed an increasing trend (Table S4). This was also supported by the admixture plots which became increasingly noisy for the higher values of K (7 and 8) ( Figure S3). For these two analyses, we used a subset of independent SNPs, with linkage pruning settings of window size of 50 genetic variants, window shift of 10 variant counts and r 2 of .2 in plink version 1.90b6.17 (Chang et al., 2015), which resulted in 107,214 SNPs.
To explore population structure in finer detail, we also calculated nearest neighbour haplotype co-ancestry across all individuals with fineradstructure version 0.3.2 (Malinsky et al., 2018). We used the hapsFromVCF function of radpainter to convert all biallelic SNPs we assessed the proportion of missing SNPs within each locus (i.e., contig). Where this proportion was more than 50%, the entire locus was considered "missing" in that individual. Second, we excluded five individuals that had an unusually high proportion (>4%) of such missing loci. For the remaining 156 individuals, we used the paint function of radpainter to calculate a co-ancestry matrix, which summarizes the nearest neighbour haplotype relationships in the data set. Next, a clustering dendrogram of shared ancestry was inferred from the co-ancestry matrix using the finestructure Markov chain Monte Carlo (MCMC) clustering algorithm, with 100,000 burn-in iterations, 100,000 sample iterations and thinning of 1000. The inferred clusters were arranged with a simple tree building algorithm in finestructure with 10,000 hill-climbing iterations.
For the calculation of F ST we used the Weir and Cockerham (1984) estimator, implemented in vcftools version 0.  (Leigh & Bryant, 2015).

| Genetic diversity
To estimate genetic diversity within the major lineages, we calculated heterozygosity (average proportion of heterozygous sites in a sequence), nucleotide diversity ( ) and inbreeding coefficient within the target capture probe regions for nuclear contigs. This was based on the hard-filtered SNP data set (1,629,382 SNPs; see Figure S1) divided into three VCFs, one for each major genetic lineage. We used the command RegionsPiGeneral from the evo toolkit (https://github.

| Phylogenetic and demographic analyses
To infer a time-calibrated Bayesian tree, we generated a new SNP data set, including the sister species L. trochiformis. SNP calling was performed with all 161 L. bulimoides individuals and 11 L. trochiformis individuals (NCBI: SAMN111131501-09, SAMN20293270, SAMN20293271 from Choo et al., 2020). The L. trochiformis individuals were included to root the phylogeny and calibrate the age of divergence. SNP calling was followed by quality filtering using the previously mentioned settings to obtain a data set of biallelic nuclear SNPs. Since coalescent analyses do not require a large number of individuals (Felsenstein, 2006), to reduce computational effort, we randomly selected two L. trochiformis individuals and 10 individuals from each of the L. bulimoides genetic lineages to include in our analyses. SNPs corresponding to these 32 selected individuals were refiltered to retain only biallelic nuclear SNPs. This resulting SNP data set was thinned with vcftools 0.1.15 (Danecek et al., 2011) to select one SNP every 100 bp to reduce linkage between the SNPs, resulting in a final data set of 19,669 thinned, biallelic nuclear SNPs.
Finally, we selected SNPs that were shared across all four taxa, including the outgroup, resulting in 1279 SNPs suitable for the snapp analysis described below.
A snapp (Bryant et al., 2012) species tree was inferred in beast2 version 2.6.2 (Bouckaert et al., 2019), using the approach demonstrated in Stange et al. (2018). We used a custom ruby script (https:// github.com/mmats chine r/snapp_prep/blob/maste r/snapp_prep.rb), with a strict clock substitution rate calibrated with an a priori age constraint of divergence between L. trochiformis and L. bulimoides . The script was modified to allow for unlinked population sizes between the three lineages. The divergence time between L. trochiformis and L. bulimoides was estimated at 13.57 million years ago (95% confidence interval [CI] = 9.2804-20.9182 million years ago), which was approximated to a lognormal distribution of (0, 13.57, 0.2), with a 95% CI of 8.99-19.7. The generation time of L. bulimoides was assumed to be 1 year (Wells, 1976) in order to calibrate the estimated effective population size. We used tracer (Rambaut et al., 2018) to observe that all effective sample size (ESS) parameters converged to stationarity with values above 200 after 1,500,000 MCMC iterations, for three independent trials. The maximum clade credibility trees with mean heights were produced in treeannotator version 2.6.0 . The effective population size (N e ) and their 95% highest posterior density (HPD) confidence intervals for each of the three genetic lineages were calculated with θ from the snapp analysis, according to the following formula: N e = θ/4 μ, where the mutation rate μ was given as the log_clock_rate divided by 1 × 10 6 with a generation time of 1 year.
To assess support for the lineages as different species, we ran snapp together with bfd* (Leaché et al., 2014), an approach for Bayes factor delimitation (BFD), for the following models of species delimitation: all lineages being separate species, Indo-Pacific and Pacific lumped as one species, and all three lineages as one species. We ran the path-sampling for 48 steps, with chain length of 100,000 and pre-burnin of 10,000. Support for the different models was assessed using marginal likelihood estimates and Bayes factors (Kass & Raftery, 1995).
To estimate levels of gene flow between the Atlantic and other two lineages, we ran an ABBA-BABA test for all 161 individuals from the three lineages of L. bulimoides and 11 individuals from their sister species L. trochiformis, which we used as an outgroup. We used the SNP data set that was prepared for the snapp analysis above, but retained all 172 individuals and did not thin the SNPs. Therefore, a total of 383,675 biallelic SNPs were used as input for the analysis in dsuite version 0.5 r44 (Malinsky et al., 2021) using the Dtrios command with the tree topology from the previously inferred snapp phylogeny.
Demographic changes for each lineage were reconstructed using stairway plot version 2.1.1 (Liu & Fu, 2020). We used biallelic SNPs from nuclear probes that passed the coverage requirement for the genetic diversity estimates (2265 probes; 78% of the total), with a total of 1,448,243 sites (polymorphic + fixed) across these probes.  Lisiecki and Raymo (2005).
The mutation rate estimated by snapp is based only on the variable sites provided to this software and, therefore, is not meaningful outside of the snapp analysis. To obtain a mutation rate estimate suitable for use in the stairway plot analysis above, we used the same data set with 2265 probes, and calculated "net sequence divergence" (i.e., d a of Cruickshank & Hahn, 2014), between the lineages: Atlantic vs.
The mode of speciation between the three lineages was inferred through testing various demographic scenarios with fastsimcoal2 (fsc27093) (Excoffier et al., 2013;Excoffier et al., 2021). We used the biallelic nuclear SNP data set described earlier for the stairway plot analysis and generated a multidimensional SFS file comprising all three lineages using easysfs. With the phylogeny and mean divergence times derived from the snapp analysis, and the previously estimated mutation rate of 5.22 × 10 −9 , we evaluated the following

| Shell morphology
All 161 L. bulimoides individuals used for genetic analyses were photographed in a standardized apertural orientation using a Zeiss V20 stacking stereomicroscope with Axiovision software prior to destructive DNA extraction. The shell images were digitized at 11 (semi-)landmarks ( Figure S2) using tpsutil and tpsdig (Rohlf, 2015). In total, 159 of the 161 L. bulimoides individuals had the complete set of landmarks and could be included in the geometric morphometric analysis. Coordinates of the (semi-)landmarks were analysed in tpsrelw (Rohlf, 2015), using a generalized least-squares Procrustes superimposition (Rohlf & Slice, 1990;Zelditch et al., 2004).
A repeatability analysis was conducted with a subset of 30 individuals, comprising two individuals from each of the 15 locations.
Images of these individuals were landmarked at the 11 (semi-)landmarks by two independent researchers. Centroid size and relative warp (RW) scores between the pairs of images per specimen after Procrustes fitting were compared using intraclass coefficient (ICC) in past3.0 (Hammer et al., 2001), and ICC values >0.75 were considered sufficiently repeatable.
We tested for significant variation in shell shape across genetic lineages with a nonparametric permutational multivariate analysis of variance (one-way PERMANOVA) using Euclidean distances and 9999 permutations with the vegan package in r (Oksanen et al., 2019). Only the six repeatable RWs were used in the oneway PERMANOVA. The first two RW axes were plotted to visualize shell shape variation for different genetic lineages of L. bulimoides.
Additionally, a canonical variates analysis (CVA) was conducted in r (R Core Team, 2017) to discriminate shell morphometric differences between genetic lineages. A one-way ANOVA with a post hoc Tukey HSD test was also conducted in r to test if the means of the canonical variate for each genetic lineage were different from the other groups. We also examined the effect of sampling location on shell shape, by conducting the above analyses with the four ocean basins (Atlantic, Indian, N. Pacific and S. Pacific) instead of the three genetic lineages.
We also assessed shell colour variation by qualitatively scoring aperture colour as either transparent, pink, tan or red-brown. We chose to score the colour of the aperture as it was not obscured by the tissue, which could interfere with colour identification. Also, the shell was thicker at the aperture, making it easier and more reliable to observe the colour. Scoring was done by two independent researchers to ensure repeatability of the colour scores. We tested then whether aperture colour was randomly distributed across location using a Fisher's exact test of independence.

| Global genetic structure
Overall, we observed three highly divergent genetic lineages that exhibited no evidence of recent admixture, namely the Atlantic,  (Table S2).
The PCA plot revealed three genetic clusters with no intermediates between them (Figure 1b). The first two principal component (PC) axes comprised 39.7% of the total genetic variation; PC1 (23%) shows the Atlantic as a clearly distinct lineage, whereas PC2 (16.7%) separates all three lineages. Congruently, three genetic clusters were the best supported by the admixture analysis (Figure 1c), as K = 3 gave the lowest cross-validation error of 0.232.
The same three genetic clusters were also recovered from the nearest neighbour haplotype co-ancestry matrix generated by fineRADstructure ( Figure 2). In addition, we saw that the Atlantic lineage comprised three subclusters corresponding geographically to North, Equatorial and South Atlantic sampling sites (Table 1) F I G U R E 2 Co-ancestry matrix for 156 individuals, coloured according to the number of loci (contigs) at which two individuals are each other's closest relatives (see key: black/blue colours indicate greater relatedness while yellow indicates lower relatedness). The three main genetic lineages (Atlantic, Indo-Pacific and Pacific) can be identified, along with finer scale structure within each lineage. The Atlantic lineage (blue) can be further subdivided into three geographical regions with higher co-ancestry: North, Equatorial and South Atlantic. Within the Indo-Pacific lineage (green), there is higher co-ancestry within the Indian Ocean locality compared to the North Pacific sampling sites. Within the Pacific lineage (orange), the North Pacific individuals have a higher co-ancestry than with the South Pacific lineages.

| Genetic diversity
To further assess the amount of within-lineage genetic diversity within our data set, we calculated nucleotide diversity (π) and the proportion of heterozygous sites (heterozygosity) for each individual at each of the target capture loci (Figure 3b). Both of these measures were highest within the Pacific lineage, followed by the The values of π are only slightly above average when compared with other animals (Leffler et al., 2012;Romiguier et al., 2014), which may seem surprising given the large population sizes of planktonic species, including L. bulimoides. However, it is worth keeping in mind that our target capture data consist of a large proportion of coding, including nonsynonymous, sites (Choo et al., 2020) and are affected by purifying and background selection, which reduce genetic diversity. The Indo-Pacific and Pacific lineages split from one another with a mean age of 0.978 Ma (95% HPD = 0.603-1.43 Ma). Consistent with the estimates of genetic diversity π, the snapp analyses indicate that the average estimated effective population size (N e ) was highest for the Pacific lineage (N e = 6.36 × 10 6 , 95% HPD = 4.17-8.56 × 10 6 ), followed by the Atlantic lineage (N e = 3.02 × 10 6 , 95% HPD = 2.12-3.86 × 10 6 ) and was lowest for the Indo-Pacific lineage (N e = 1.61 × 10 6 , 95% HPD = 1.23-2.02 × 10 6 ). The three lineages were best supported as separate species based on the BFD analysis (BF > 10), in comparison to the other models lumping all three F I G U R E 3 Characterizing genetic structuring and diversity of Limacina bulimoides based on nuclear SNPs. (a) Weir and Cockerham estimates of F ST within and between the major genetic lineages: Atlantic (N = 63),  and Pacific (N = 60). The distribution is over 1000 random subsamples of unlinked SNPs (see Methods for details). (b) Estimates of heterozygosity and nucleotide diversity (π) within each of the major lineages. Heterozygosity was estimated as the proportion of heterozygous sites averaged across individuals. Each data point corresponds to the estimate for one target capture region.   (Table S5).

| Divergence-time and population size
Based on the ABBA-BABA test with the topology (((Indo-Pacific, Pacific) Atlantic) L. trochiformis), there was tentative evidence for excess allele sharing between the Atlantic and Pacific lineages compared to the Atlantic and Indo-Pacific lineages (Dstatistic = 0.0557, Z = 2.31, p = .0208) ( Figure S5 and Table S6).
Furthermore, fastsimcoal2 analyses, based on a joint site allele frequency spectrum, suggest very limited recent gene flow (m = 1.17 × 10 −6 ) between the sympatric Indo-Pacific and Pacific lineages, as it was the best-supported model based on likelihood distributions and AIC among the four models tested ( Figure S6 and Table S7).
To estimate historical changes in N e in the three major lineages, we used stairway plot analyses (Methods). The stairway plot approach indicates that all three lineages had similar demographic histories, with N e increase followed by a long period of stable high

| Morphological variation
Based on the repeatability analysis with 30 individuals landmarked independently by two observers, RWs 1, 2, 3, 5, 10 and 11 were selected as repeatable parameters to be used in the geometric morphometric analyses of shell shape. Of the 161 L. bulimoides individuals, 159 had the complete set of 11 (semi-)landmarks and were included in the analysis. The six repeatable RWs explained 83.75% of shell shape variation (Table S8). Most of the geometric morphometric variation was due to changes in shell width as shown in RW1 accounting for 51.05% of the total variation, and relative size of the aperture as shown in RW2 explaining 18.16% of the total variation (Figure 6a). Though the shell shapes of the three genetic lineages overlapped to a large extent, the distributions were significantly different (PERMANOVA:  Figure S7; Table 2). Grouping specimens according to their sampling location also resulted in significant differences in shell shape ( Figures S8 and S9), but this is to be expected because genetic lineages are not randomly distributed in space.
We observed large variability in shell colour within our samples.
The L. bulimoides specimens ranged in colour from almost completely white to beige to reddish-brown (Figure 6b; Figure S10). In most coloured individuals, pale beige or red pigmentation was found on the inner aperture, lower half of the aperture and sometimes on the sutures of the shell. Tissue pigmentation also varied widely, with dark F I G U R E 5 Effective population size (N e ) of the three Limacina bulimoides lineages through time as reconstructed by stairway plot. The x-axis is the number of years before present, assuming a generation time of 1 year. Estimates of median N e are coloured per lineage. Axes are log-scaled for clearer visualization of recent histories. The plot was truncated from 0 to 1000 years ago as sample sizes are not sufficient for reconstructing very recent events as in Liu and Fu (2020). Dotted lines indicate the temporal boundaries between Marine Isotope Stages corresponding to timing of the reconstructed N e changes. Odd MIS numbers refer to interglacial stages, while even numbers refer to glacial stages, with MIS1 marking the start of the Holocene and MIS12 being one of the strongest glacials of the Quaternary period. grey tissue that is visible through the somewhat transparent shell observed in specimens from the North Atlantic and Indian Ocean, while almost completely white tissue (and shell) was mostly recorded from the South Pacific ( Figure S10). Since the colours of the shell and tissue are confounded with each other and difficult to distinguish, we limited the statistical analyses to shell aperture colour, which is not affected by tissue pigmentation.
Aperture colour was not randomly distributed with respect to genetic lineage (p = .0005, two-sided) or sampling location (p = .0005, two-sided) ( Figure S10, Tables S10 and S11). Atlantic specimens had mainly pink and red-brown apertures while North Pacific specimens were highly pigmented with mainly tan and red-brown apertures. Specimens from the South Pacific had less pigmented apertures, with either transparent or pink apertures, and Indian Ocean specimens had highly pigmented shells with red-brown apertures. Aperture colour was often consistent within sampling station (six of the 15 stations) but could also vary (e.g., KH1110_05 from the North Pacific had all four aperture colours represented; Table S9).

| Lineages in the North Pacific appear morphologically cryptic
We closely examined morphological variation in individuals from the two lineages that co-occur in the North Pacific, namely the Indo-Pacific lineage (n = 16) and Pacific lineage (n = 16) (Figure 7).
However, 81% of Pacific lineage individuals (13 out of 16) had dark pigmented spots on their tissue, which was visible through their transparent shell, and none of the Indo-Pacific lineage individuals had such spots (Figure 7c). Spots were not observed on the photographs of three Pacific individuals, but these spots could have been on the opposite side that was not photographed. Dissection of additional individuals from the same samples showed that these pigmented spots were localized on the margin of their "wing-feet" or parapodia ( Figure S11). Interestingly, these pigmented spots were not observed in the photographs of other individuals belonging to F I G U R E 6 Variation in shell shape and aperture colour across the three lineages of Limacina bulimoides. (a) Shell shape variation of 159 L. bulimoides specimens, categorized into the three genetic lineages (Atlantic, Indo-Pacific and Pacific). Shell shape variation is visualized with relative warp axes 1 and 2, explaining 51.05% and 18.16% of the total shell shape variation, respectively. Extremes of both relative warp axes are shown with the thin plate spline images, with each black dot corresponding to a shell landmark. Line drawings of example individuals from each genetic lineage are shown: Atlantic (Lbul_NIC2_S9C3_04), Indo-Pacific (Lbul_SN105_08_02) and Pacific (Lbul_KH1110_18_03). (b) Illustration of the intensity of aperture colour observed, arranged from light, medium to dark apertures in each of the three lineages (see also Figure S10).  the Pacific lineage but sampled in a different location, in the South Pacific ( Figure S10).

| DISCUSS ION
We found that the shelled pteropod Limacina bulimoides is composed of three main genetic lineages (Atlantic, Indo-Pacific and Pacific), the latter two of which were found in sympatry at three sampling sites in the North Pacific. We did not find any evidence  (Knowlton, 1993;Sáez et al., 2003). These "wing" spots seem analogous to wing pigmentation found in butterflies and flies, which could be a convergent distinguishing trait that mediates species recognition (Gompel et al., 2005;Wiernasz & Kingsolver, 1992).
However, it is not known whether Limacina, although they are commonly referred to as "sea-butterflies," can actually perceive such spots. Limacina possess eyes and optic nerves (Laibl et al., 2019), and are capable of detecting changes in light levels to trigger daily vertical migration (Cohen & Forward Jr, 2016), but there is a lack of information on the types of cues used by pteropods to recognize conspecifics. In other planktonic species, such as copepods, pheromone trails and swimming patterns are used for mate finding (Goetze & Kiørboe, 2008;Kiørboe, 2007), which may also serve to F I G U R E 7 Distribution (a) and morphological variation (b,c) of the sympatric genetic lineages of Limacina bulimoides in the North Pacific with 16 individuals from the Indo-Pacific lineage (green) and 16 individuals from the Pacific lineage (orange). (b) Shell shape variation as visualized with relative warps 1 and 2, which explain 51.05% and 18.16% of the total shell shape variation, respectively. (c) Photographs of an example individual from each genetic lineage are shown, with dark pigmented spots on the tissue, visible through the transparent shell, of the specimen from the Pacific lineage (see also Figures S10 and S11). mediate the reproductive isolation between sibling species, as seen in other marine organisms such as isopods, stomatopods and amphipods (Palumbi, 1994).
We have insufficient evidence to conclude whether the Pacific and Indo-Pacific lineages evolved in sympatry or have arrived at their present-day distribution through secondary contact. While we see that the model incorporating recent gene flow between the Indo-Pacific and Pacific lineages is better supported than the other models tested, we have insufficient information to discern whether gene flow is still occurring or has occurred earlier in their evolutionary history. Based on the comparisons between the recent and ancient gene flow models ( Figure S6 (Fišer et al., 2018;Milá et al., 2017).
We observe both situations. Among the three main lineages, there are statistically significant differences in the (overlapping) distributions of shell shapes and aperture colours. These traits also vary among sampling locations within lineages (Tables S9-S11). At the same time, when considering only the sampling locations where the Indo-Pacific and Pacific lineages appear in sympatry there are no statistical differences in shell shape. Shell shape is likely to have a phenotypically plastic component linked to life history and environmental conditions Hollander et al., 2006;Mariani et al., 2012;Zieritz et al., 2010). Pteropod shell shape is an important trait that directly affects their sinking and swimming speeds, manoeuvrability, and their resulting ability to navigate the water column for food and evade predators (Karakas et al., 2020).
In other molluscs, shell shape has been shown to be correlated with environment, either as genetically inherited differences or phenotypically plastic traits. Examples are found in various species of intertidal snails with a "crab" or "wave" ecotype, such as Littorina and Nucella (Guerra-Varela et al., 2009;Hollander & Butlin, 2010;Johannesson, 2003;Rolán et al., 2004), or in Mytilus exhibiting shell shape plasticity as a response to environmental parameters such as temperature and food (Telesca et al., 2018).
The nonrandom variation in shell colour across sampling locations ( Figure S10) may be due to pigments incorporated from their diet, which is composed of phytoplankton, microbes and particulate matter trapped within their mucous web (Conley et al., 2018).
Limacina can feed selectively by moving the cilia on their wings and mantle lining to sort and reject unwanted food particles (Lalli & Gilmer, 1989), and they can also control their vertical distribution through diel vertical migration. These differences in food choice and vertical habitat may lead to variable shell colour among individuals from the same sampling station. Production of shell colour has been suggested to be energetically costly across molluscs (Williams, 2017), but pteropods, with their unique planktonic lifestyle, may be subject to other selective trade-offs such as being transparent to remain inconspicuous to visual predators in the water column (Johnsen, 2001) or possessing red pigment for protection against UV radiation (Hansson, 2000).
The divergence of L. bulimoides lineages at around 1 Ma ( Figure 4) and increase in Pacific lineage N e (Figure 5), indicating population expansion, coincides with the timing of divergence in a mesopelagic copepod species (Andrews et al., 2014) and several coccolithophore speciation events (Filatov et al., 2021) during the mid-Pleistocene transition (0.6-1.2 Ma, between MIS22 and 24). This period was characterized by global cooling, lengthening of glacial cycles from 41,000 to 100,000 years, changing ocean circulation and productivity, and the evolution of many terrestrial and marine biota (Clark et al., 2006;Elderfield et al., 2012;Kender et al., 2016;McClymont et al., 2013). Changes in ocean circulation during the mid-Pleistocene transition could have facilitated the physical separation and subsequent divergence of the three lineages across the various ocean basins. These changes include the reduced exchange between the Indian and Atlantic Oceans via the Agulhas leakage due to the northward migration of the Subtropical Front towards the Agulhas Plateau (Caley et al., 2012;Cartagena-Sierra et al., 2021), and the connection between the Pacific and Indian Ocean due to the weakening of the Indonesian Throughflow (Petrick et al., 2019). Similar geographical structuring across ocean basins for other circumglobal warm-water plankton species has been attributed to both physical (e.g., ocean currents or continental landmasses) and ecological (species-specific interactions with oceanographic gradients) barriers (e.g., Bendif et al., 2019;Burridge et al., 2019;Filatov et al., 2021;Goetze et al., 2015;Hirai et al., 2015), although it is unknown if these structured populations arose at the same time. Within the Pacific and Atlantic basins, habitat discontinuities such as the mesotrophic equatorial upwelling waters, have been described as ecological dispersal barriers for subtropical copepods (Goetze, 2005;Goetze et al., 2017) and the pteropod genus Cuvierina (Burridge et al., 2015).
Congruently, fossil evidence of coccolithophore species distributions point to the emergence of new species at equatorial latitudes (Filatov et al., 2021).
We found substantial levels of genetic variation in all lineages (e.g., π ranged from 0.8% to 1.6% per lineage), despite the fact that most of our targeted regions are coding and thus affected by purifying and background selection. These levels are comparable to those reported for two species of planktonic copepods (π ranging from 0.6% to 0.8%) based on a similar target capture approach but with more restricted geographical sampling (Choquet et al., 2019).
This level of π, coupled with the stairway plot and snapp estimates of N e for L. bulimoides, indicates an absence of any pronounced population bottlenecks across multiple Pleistocene glacial-interglacial transitions. This suggests that this species complex is likely to have been resilient to ocean changes associated with glacial cycles. In the stairway plot, the timing of the rapid rise of N e in the Atlantic and Indo-Pacific lineage between MIS10 and 12 is coincident with MIS11, the warmest interglacial interval of the last 500,000 years, with high levels of atmospheric CO 2 (Siegenthaler et al., 2005), warm sea-surface temperatures (SSTs) and atypical blooms of calcareous plankton in the high latitudes (Howard, 1997;McManus et al., 2003). Stable population sizes in L. bulimoides were associated with the repeated glacials and interglacials from MIS2 to 9, including the Last Glacial Period (MIS2-4), even though pH levels fluctuated up to 0.2 ± 0.1 pH between glacial and interglacial periods (Hönisch & Hemming, 2005;Sanyal et al., 1995). Stable abundances of L. bulimoides were also recorded within sediment cores in the Caribbean Sea, which had low variation in SST across both glacial and interglacial periods (Wall-Palmer et al., 2014). At the beginning of the Holocene, between 19,000 and 7000 years ago (boundary of MIS1/2), temperatures increased and global sea levels rose rapidly (Lambeck & Chappell, 2001), while atmospheric CO 2 increased by 30% (Hönisch et al., 2012;Monnin et al., 2001). These environmental changes have been linked to decreasing shell mass in other planktonic calcifiers such as foraminifers and coccolithophores (Barker & Elderfield, 2002;Beaufort et al., 2011) and are associated with a decrease in N e in our stairway plot reconstructions. However, caution in interpreting these correlations is warranted because our demographic inferences are affected by background selection (Ewing & Jensen, 2016), which is expected to result in a signal of recent N e decrease. Another caveat is that the timings of divergence and demographic changes are a function of generation time, which was set to 1 year based on our knowledge of L. bulimoides biology. A more accurate estimate of generation time may affect these conclusions to some degree.
The very high levels of genetic diversity (heterozygosity and π) and large repetitive genomes of many marine zooplankton species hinder the adoption of genomic approaches in evolutionary studies of these organisms. By focusing on transcribed regions of the genome, the target capture approach has enabled us to greatly reduce these challenges and conduct a global genomic study in a marine zooplankton species. We note the usefulness of genomewide data, compared to mitochondrial DNA or barcoding genes, in detecting population structure and assessing species boundaries.
Unlike a single locus (e.g., the mitochondrial COI barcoding region), whose evolutionary history represents a single genealogy or a gene tree, our set of hundreds of thousands of genome-wide SNPs allows us to unambiguously apportion the genetic structure within L. bulimoides. Evolutionary histories of individual genes (gene trees) and histories of populations or species (population/species trees) are seldom identical (e.g., Nichols, 2001). Moreover, in some studies using barcode regions, species delimitations have been made based on taxon-specific thresholds of levels of sequence divergence beyond which reproductive isolation is expected (Krug et al., 2013;Lefébure et al., 2006;Young et al., 2017). This method is highly effective for detecting new putative species in broad-scale barcoding studies without prior information; however, divergence distances do not necessarily indicate reproductive isolation or lead to further understanding of the process of speciation (Freeman & Pennell, 2021). The genome-wide data set we present here will underpin future investigations into the nature of selection, adaptation, divergence and speciation in the open ocean. We expect that the predominance of coding loci in our data and the ability to separate synonymous and nonsynonymous sites will be major assets in these future studies.
Despite the potential for global dispersal in L. bulimoides, we observed diversification of lineages across ocean basins that probably originated from the mid-Pleistocene transition. These lineages are probably distinct species with strong reproductive isolation among them, and have survived through periods of glacial-interglacial transitions representing a wide range of oceanographic conditions, including ocean alkalinity. There are slight differences in shell shape between the lineages, but shell shape alone cannot be used as a taxonomic character, and it is unclear whether environmental or genetic factors have a greater impact on shell morphology. Even though their effective population sizes may have decreased since the start of the Holocene, the three lineages still possess high levels of standing genetic variation and nucleotide diversity, upon which selection could act to drive further adaptation in the future (Bernatchez, 2016;Bitter et al., 2019;Schluter & Conte, 2009). However, it is unclear whether pteropods and other planktonic calcifiers can cope with the rate of ongoing ocean changes, including anthropogenic carbon emission rates that are unprecedented over at least the last 66 million years, leading to increasing ocean acidification (Zeebe et al., 2016). While there are no visibly obvious dispersal barriers in the open ocean, we have found genome-wide evidence for speciation and divergence in L. bulimoides, and there is probably more diversity in planktonic species than meets the eye.

AUTH O R CO NTR I B UTI O N S
LQC, GH and KTCAP designed the study. EG and KTCAP contrib- Atlantic Sector Science (grant no. NE/R015953/1). This study contributes to the international IMBeR project and is contribution no. 368 of the AMT programme.

CO N FLI C T O F I NTER E S T S TATEM ENT
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The raw sequence data for Limacina bulimoides were archived on NCBI GenBank with the following accessions: SAMN11131477-79, SAMN11131480-82 and SAMN20293115-269. The assembled COI sequences for L. bulimoides can be found on NCBI GenBank with the following accessions: MZ542566-MZ542726. Shell images for the specimens, the methods protocol for DNA extraction and target capture, and hard-filtered vcf files used for this study can be ac-