Exploring ecological specialization in pipefish using genomic, morphometric and ecological evidence

Theory predicts that ecological specialization should be rare in marine ecosystems, given that dispersal barriers are less effective in the vastness of the sea compared with those in terrestrial settings. This paradigm, however, hardly fits with classical theories of local adaptation, raising the question of how marine diversity originates in a highly interconnected system. In the present study, we investigated how ecological specialization arises in a widely distributed marine species, the seaweed pipefish Syngnathus schlegeli.


| INTRODUC TI ON
Ecological specialization, the spatial and temporal differentiation of species in the use of habitat and resources (Devictor et al., 2010), is a pivotal eco-evolutionary paradigm to explain phylogeographical patterns, community assembly rules and the majority of adaptive radiations (Gillespie et al., 2020;Todd Streelman & Danley, 2003;Wilson et al., 2020). Although marine fishes often show complex adaptations in spatially heterogeneous environments (e.g. Wainwright et al., 2004Wainwright et al., , 2015, ecological specialization is thought to be rare because of the weak geographic barriers in the vastness of seas (Wilson, Wegmann, Ahnesjö, & Goncalves, 2020). This may be especially true for marine species with planktonic larvae as they can maintain large population sizes with high gene flow (Palumbi, 1994). Therefore, marine organisms challenge the classical theories of local adaptation and speciation, dubbed as the "Marine Speciation Paradox" (Bierne, Bonhomme, & David, 2003).
Molecular phylogenetic reconstructions of some marine fishes suggest that ecological specialization has a great impact on their diversification (Rocha et al., 2005;Wainwright et al., 2004). This is especially true for geographically widespread species that may experience considerably different ecological conditions (Fox & Morrow, 1981;Loxdale et al., 2011;Shipley et al., 2009). The importance of natural selection in shaping adaptive trait differentiation among natural populations has long been recognized (Garant et al., 2005;Smith et al., 1997). Theory predicts that once populations are exposed to diverse ecological environments, rapid evolution of adaptive traits should occur (Hendry et al., 2000), and phylogeographical structuring along ecological gradients has been documented in many marine species (Liu et al., 2007;Ni et al., 2012;Wilson et al., 2020;Xu et al., 2009). Divergent selection is an important element of natural selection and occurs when different environments favour different phenotypes, leading to increased differences between populations (Bolnick & Stutz, 2017). Theoretically, divergent selection should promote the evolution of traits in local populations, which provide an advantage under local environmental conditions (Kawecki & Ebert, 2004), and selection pressure may lead to genetic divergence and eventual speciation if the homogenizing effects of the gene flow are insufficient to prevent it. The ecological processes, along with local adaptation, are most important responses of species to the changing environment, which both contribute to the specialization. Several recent studies on marine species have demonstrated population differentiation consistent with local adaptation and trophic partitioning in spatially heterogeneous environments (Rocha et al., 2005;Sanford & Kelly, 2011;Sanford & Worth, 2010;Wilson et al., 2020). In this regard, uncovering the differentiation among marine populations is of considerable importance for advancing our knowledge on the molecular mechanisms of adaptive evolution against a background of high-level gene flow.
The seaweed pipefish Syngnathus schlegeli (Kaup, 1856) is a wellknown species showing exclusive paternal care of eggs (Watanabe & Watanabe, 2001). As common traits that shared by all syngnathids , the offspring develop within the male brood pouch and are released as free-living juveniles. Syngnathus schlegeli is distributed along the coastlines of the north-western Pacific Ocean and generally inhabits sheltered areas in shallow waters; the geographic range of this species encompasses a wide variety of climatic conditions and nearshore microhabitats (Chen et al., 2017;Zhang, Capinha, et al., 2020). The vast region of the north-western Pacific is characterized by distinct tectonic and geographical features, with a series of marginal seas separating the eastern Asian continent from the Pacific Ocean (Tamaki & Honza, 1991). During Pleistocene glacial cycles, sea level fluctuations resulted in successive exposure and inundation of continental shelves, closure and opening of seaways, and separation and reunion of marginal seas (Wang, 1999). When the sea level fell, the seas were separated by land bridges extending between islands and the mainland, hindering the spread of most marine species Wang, 1999). The historical influence of glaciation, which has been proposed as a dominant factor shaping present-day phylogeographical patterns (Hewitt, 2004), distinguishes this area from other marine realms to an extent that its biota would be distinctly different from those revealed in other regions (Xu et al., 2009).
In this study, we investigated the cranial morphology, genetic structure and ecological niche of S. schlegeli along Chinese seashores to reveal their phylogeographical patterns. First, we used geometric morphometrics to partition the variations in the cranial morphology among populations. Second, we employed the restriction Main conclusions: We showed that the effect of this historical segregation, in concert with niche-driven ecological differentiation, might lead to the establishment of three distinct clades across the widely distributed marine pipefish. Ultimately, our study demonstrates that even the high connected sea environment maintains the potential for ecological specialization.

K E Y W O R D S
genetic divergence, Last Glacial Maximum, morphological variation, RADseq, species distribution model, Syngnathus schlegeli site-associated DNA sequencing (RADseq) technique to evaluate the level of genetic differentiation among seaweed pipefish populations based on the genome-wide genotypic and sequencing data. Finally, we compared the ecological niches of distinct S. schlegeli populations using n-dimensional hypervolumes and mapped habitat suitability of this species during present day and the Last Glacial Maximum (LGM) using species distribution model (SDM). By revisiting the phylogeographical history of the widespread pipefish using an integrative approach, our over-arching goal was to investigate the dynamics through which local specialization could arise in the apparent absence of dispersal barriers.

| Sample collection and DNA extraction
We sampled a total of 280 adult seaweed pipefish from nine sites along the coastal waters of China ranging from the Yellow Sea (YS) to the South China Sea (SCS) ( Table 1, Appendix S1). We collected all samples using bottom trawling between 2015 and 2019 (from March to June, and October to December), covering a depth of 10-30 m. All individuals were verified morphologically and sequenced for mtDNA (cytochrome b) to confirm species identification. We derived morphological data from 163 seaweed pipefish specimens: 63 from the YS, 34 from the East China Sea (ECS) and 66 from the SCS. Bodies of the remaining 117 specimens were damaged, preventing quantification of reliable morphological measures. The total length of sampled specimens of S. schlegeli ranged 18.1-21.6 cm, and no significant differences in size existed among sampled localities (ANOVA, Duncan's multiple range tests, p >.05). We extracted genomic DNA from muscle samples and then treated with RNase A. We checked the quality and integrity of the DNA samples using agarose gel electrophoresis and DNA concentrations using a Qubit 2.0 fluorometer.
To get a preliminary understanding of genetic divergence of S. schlegeli populations, partial mitochondrial cytochrome b (cytb) and nuclear Sorting nexin 33 (snx33) from 270 pipefish individuals were amplified, and we obtained 248 and 265 valid sh3 and cytb sequences after data correction (amplification products with double peaks were removed) (Table 1). Furthermore, genome-wide genetic differentiation and signatures of local adaptation were investigated using RAD sequencing of 69 S. schlegeli individuals (31 from the YS, 22 from the ECS and 16 from the SCS) (Table 1, Appendix S1). All experimental procedures performed in this study were in accordance with the relevant animal ethics guidelines and approved by the Ethical Committee of South China Sea Institute of Oceanology, Chinese Academy of Sciences.

| Comparative morphometric analysis of populations
To test for the presence of a north-south morphological differentiation among S. schlegeli populations, we examined the cranial morphology of 163 individuals (including the 69 individuals involved in the RADseq) using landmark-based geometric morphometry analysis. We acquired digital images of the right lateral side of each specimen using a Canon camera (EOS Rebel T4i; 18 mega pixels).
We performed all morphometric analyses using thin-plate spline (tps) (available at http://life.bio.sunysb.edu/morph/). We sorted digital images according to the geographical population and converted to tps files using tpsUtil. For the head images, we digitized 26 landmarks using tpsDig2 (19 homologous landmarks and 7 semilandmarks), which were modified from the geometric morphometric  (Wilson et al., 2020) and Leysen (Leysen et al., 2011)(Appendix S2), and the tps files with landmarks were then processed in tpsRelw to enable relative warp analysis and obtain X and Y coordinates for further analysis. ANOVA was performed on the relative warp scores of all S. schlegeli populations.

| Sequence quality and RADtag filtering
Selected samples were individually barcoded for genomic library construction utilizing the restriction digestion enzyme EcoRI (G|AATTC).
We ligated individually barcoded Solexa P1 adapters onto the EcoRI cut site for each sample. After manually size-selecting 300-500 bp fragments by gel electrophoresis, we blunt-end-repaired libraries and added a 3' adenine overhang to each fragment. Then, we added a Solexa P2 adapter containing unique Illumina barcodes. Finally, we sequenced paired-end libraries with an insert size of 200-400 bp on the Illumina HiSeq PE150 platform. We filtered raw sequencing data through the removal of low-quality reads, reads with adapter contamination, reads with ≥10% unidentified nucleotides (N) and reads for which ≥50% of the length had a phred quality ≤5.

| Single nucleotide polymorphism (SNP) calling
Genome-wide single nucleotide polymorphisms were detected for subsequent population genetic and selective sweep analyses. We mapped all quality-filtered reads to the Syngnathus scovelli reference genome using BWA-MEM with the parameters aln −e 10 − t 4 − l 32 − i 15 − q 10 (Li & Durbin, 2009). We imported the alignment files to SAMtools for sorting and removing duplicated reads (Li, 2011).
Sequencing coverage and depth for each sample were calculated.

| Population genetics analyses based on the RADseq data
The genetic structure among S. schlegeli populations was conducted by genome-wide population genetics analyses. A neighbour-joining tree was conducted based on the pairwise genetic distance matrix data (P-distance among all 69 individuals) using TreeBest v1.9.2 (Vilella et al., 2009). We set the number of bootstrap replicates to 1,000 to assess the statistical support for nodes in the tree.
We performed principal components analysis (PCA) based on SNP markers using GCTA tools (Yang et al., 2011). We transformed the population genotypes into a matrix that included the numbers 0, 1 and 2, where 0 is a genotype that is homozygous for the reference allele; 1 is a genotype heterozygous for the reference allele; and 2 is a genotype homozygous for the non-reference allele. We calculated the sample covariance of the matrix that contained the information for all individuals (with the numbers 0, 1 and 2). Finally, we calculated the eigenvector decomposition of the matrix and plotted the PCA using GCTA tools.
We used PLINK to generate the map files necessary for downstream analyses (Purcell et al., 2007). We determined population structure using the FRAPPE software (Tang et al., 2005). We increased the pre-defined parameter K from two to five to cover the maximum numbers of lineages that could be identified in the tree, representing the assumed groups of a simulated population in ancient times. Finally, we inferred population genetic structure and individual ancestry proportions using FRAPPE.

| Selective sweep analysis
Population-differentiation statistic (F ST ) based on the mitochondrial cytb and nuclear snx33 was estimated with Arlequin software version 3.5.2.2 (Excoffier et al., 2007). To characterize genome-wide patterns of genetic variation and population differentiation, we calculated nucleotide diversity (π) and population-differentiation statistic using a sliding window approach (a 40-kb window sliding in 20-kb steps). π and F ST were used to estimate the genomic diversity of populations and evaluate the strength of genomic differentiation between two populations, respectively.
We measured selection signatures of population-specific genomic regions between S. schlegeli populations (YS-ECS and SCS).
We applied a sliding window approach (a 40-kb window sliding with a step size of 20 kb) to identify selected regions. We identified genomic regions with a significantly high F ST (top 5%) and θπ ratio (log 2 (π1) -log 2 (π2), top 5%) values as selective sweep regions. We submitted genes to the Gene Ontology databases for annotation (Ashburner et al., 2000;Ogata et al., 1999). Gene ontology analyses were performed to annotate genes to functional ontology terms.
Using these candidate genes, we performed a targeted population genetic analysis. We used a higher resolution sliding window analysis (10-kb window compared with a 100-kb window for the selection analysis) to calculate F ST and θπ between populations.

| Ecological niche analyses
We We retrieved marine environmental predictors at a spatial resolution of 5 arcminutes from the MARSPEC database (http://marsp ec.weebly.com) (Sbrocco & Barber, 2013). We initially considered bathymetry (i.e. depth of the seafloor) (Appendix S1; Chen et al., 2017;Zhang, Capinha, et al., 2020) and 10 marine environmental predictors, including annual mean, range, variance and extreme values of sea surface temperature and sea surface salinity (Appendix S5). We calculated the pairwise Pearson's correlation coefficients (r) among predictors and selected only one among highly correlated predictors (i.e. |r| > 0.7) (Dormann et al., 2013). Accordingly, we retained five predictors for model development: bathymetry, mean annual sea surface salinity, annual range in sea surface salinity, mean annual sea surface temperature and annual range in sea surface temperature (Appendix S5).
We used n-dimensional hypervolumes (Blonder et al., 2014) (also known as Hutchinsonian niche; Hutchinson, 1957) to quantify realized niches and explore niche differentiation among the three clades of S. schlegeli along Chinese coastal waters. We delineated hypervolumes using the five selected marine predictors and a Gaussian kernel density estimator (Blonder et al., 2018). Prior to hypervolume construction, we normalized all predictors via zscore transformation, as recommended by Blonder et al., (2014).
We measured niche overlap using both a similarity and a distance metric (Mammola, 2019), namely the hypervolumes' distance between centroids and pairwise overall differentiation (β diversity) (Carvalho & Cardoso, 2020;. The latter approach decomposes overall niche differentiation (β total ) in two process: niche shifts (β replacement ) and niche expansion/contraction (β richness ) (Carvalho & Cardoso, 2020). β total values range from 0 (when niches are identical) and 1 (fully dissimilar niches), holding true the equality β total = β replacement + β richness . We developed SDM using all distribution data (i.e. 131 records) and the five non-collinear predictors. To reduce uncertainties associated with SDM algorithm choices, we used an ensemble modelling approach (Thuiller et al., 2019). We fitted and tested the performance of nine algorithms available in biomod2 R package (Appendix S5), using default settings (Thuiller et al., 2020). Note that we did not consider random forest algorithm in biomod2 R package because it cannot handle imbalanced data (Barbet-Massin et al., 2012). In addition to species presence data, we randomly generated 10,000 background points within the study area as pseudo-absences. We evaluated the predictive abilities of the nine algorithms using a fivefold cross-validation approach with ten replicates. True skill statistic (TSS) and the area under the receiver operating characteristic curve (AUC) were used to measure SDM predictive performance: algorithms with TSS >0.75 and AUC >0.9 were considered to exhibit excellent predictive capacity . The selected algorithms were used to predict the habitat suitability of S. schlegeli under present-day and LGM climate conditions. Predictions of single algorithms were ensembled using a committee averaging strategy (Thuiller et al., 2020;Thuiller et al., 2019). The continuous habitat suitability results (ranging from 0 to 1) were transformed into binary suitable/unsuitable maps by using the 10th percentile presence probability threshold (Zhang et al., 2021).
The shape of the specimens was described using relative warp analysis (Appendix S6). The relative warp 1 showed the length of the snout and gill cover, and the relative warp 2 explained the angle of the snout ( TA B L E 2 Summary of relative warp analysis and ANOVA results of relative warps showed significant differences between populations in terms of relative warp 3, which represents eye size (Table 2) (Figure 1d). We found that seven of the total 26 landmarks were related to morphological variation in the eye size, which accounted for 51.4% of the variation in relative warp analysis (Figure 1e).

| RADseq and population structure analysis
We performed RADseq for 69 S. schlegeli individuals, obtaining a total of 120.08 Gb raw sequencing data, 118.13 Gb of which were clean data remained after filtering (Appendix S7 We reconstructed phylogenetic relationships among the 69 sequenced S. schlegeli individuals based on the SNPs data using the neighbour-joining method. Results showed three distinct clusters that represent three geographical lineages (Figure 2a). PCA recovered the same clusters revealed by the phylogenetic analysis ( Figure 2b), showing that the YS population was closely related to the ECS population, whereas the SCS population formed a separate cluster. Multi-level (K = 2, 3, 4 and 5) population structure showed that SCS pipefish was a relatively stable and independent group, with a change in K value from 2 to 5, indicating that this group had a low genetic exchange with other populations. In contrast, the YS and ECS populations were well connected with appreciable gene exchange (Figure 2c). (Appendix S10). Then, we calculated π and F ST for each population using a 50-kb sliding window with a 10-kb step size based on the RADseq data. Selected genomic regions invariably show specific patterns of variation, such as high F ST and lower levels of π (Ellegren & Sheldon, 2008;Sabeti et al., 2006;Wang et al., 2015). We calculated the distribution of π ratio and F ST values to identify the selected regions for the north (YS-ECS) and SCS populations, and we inferred the selected genomic regions from high π log-ratios and an extreme divergence of allele frequencies (Figure 3a,b). Most of the genomic regions showed a low level of differentiation (Figure 3b), and the differentiated genome regions harboured 143 annotated protein-coding genes, including cold shock domain-containing protein E1 (csde1), paired box protein 7 (pax7), eyes shut homolog gene, retinal structural protein (rx3) and retinal guanylate cyclase (retgc-2) (Appendix S11), which we assumed to be subject to selection. Gene Ontology annotations of these genes were conducted to further understand their regulatory functions (Figure 3c). The ( Figure 3d).

| Ecological niche analyses results
Ecological niche characterization with hypervolumes indicated that the three clades diverge in their realized niches across their distribution range (Figure 4). Clade YS had a relatively narrower ecological niche than clades ECS and SCS. The greater niche differentiation occurred between clade YS and SCS (Table 3), and this was due to two simultaneous niche-based operations. We observed both a niche shift, with clade SCS experiencing a wider breadth of depths and climatic conditions, and parallel niche expansion of SCS (or a contraction of clade YS). The niche of clade ECS, although disjunct, was more similar to that of the other two clades, as emphasized by a smaller distance between centroids. The differentiation of clade ECS with respect to clades YS and SCS was only due to niche expansion/ contraction processes (Table 3).
TSS and AUC scores revealed that all algorithms but surface range envelop showed excellent performance (Appendix S12); thus, eight algorithms were used to develop the ensemble models. The

| D ISCUSS I ON
Here, we investigated the cranial morphology, genetic structure and ecological niche of S. schlegeli along Chinese seashores to reveal the phylogeographical patterns resulting from ecological specialization.
We delimited three independent clades of S. schlegeli using genomic data and observed that this differentiation was mirrored in the morphological divergence of local populations. The integration of genetic, morphological and ecological data supports the hypothesis that even the barrier-free sea environment maintains the potential for ecological specialization. A similar phylogeographical pattern has been documented in the north-western Pacific using fishes, molluscs and crustaceans (Liu et al., 2007;Ni et al., 2012;Wilson et al., 2020;Xu et al., 2009).
In marine environments, worldwide glaciation is the most efficient way to generate intraspecific genetic splits (Hewitt, 2000), as F I G U R E 4 Pairplots representing the estimated realized niches for the three genetic lineages of pipefish. For each clade, coloured points are 10,000 points stochastically sampled from each hypervolume, representing the actual shape and boundaries TA B L E 3 Pairwise niche differentiation among hypervolumes, as estimated through a distance metric (lower panel; distance between centroids) and an overlap metrics (upper panel; β total = β replacement + β richness ). The diagonal of the matrix is the volume of each hypervolume Abbreviation: β = Beta diversity; dc =distance between hypervolume centroid; v = volume well as shape the present-day phylogeographical pattern of marine species (Dong et al., 2012;Wilson et al., 2020). Examples of this have been observed in different biogeographic realms, from the Indo-West Pacific to the Atlantic and Mediterranean basins (Cunha et al., 2008;Ni et al., 2014).  (Wang & Sun, 1994). Therefore, along with drop in the sea level, the ECS was reduced to an elongated trough, the Okinawa Trough, while the SCS became a semi-enclosed sac-shaped gulf connected with the Pacific mainly through the Bashi Strait (Wang & Sun, 1994). These sea basins served as separate refuges for different geographic populations of local marine species. The present genetic data suggest a genetic split between the ECS-YS and SCS lineages, which has also been reported in other marine fishes (Liu et al., 2007).
A few genomic regions in S. schlegeli showed high differentiation while the rest of the genome was very similar, as the F ST analysis between north and SCS populations indicated. Such a heterogeneous genome divergence has been commonly described in situations where differentiated genomes have experienced differential introgression following secondary contact (Xu et al., 2009) (Chen et al., 2006;Yu & Zhou, 2001). In parallel, genetic breaks may be attributed to freshwater outflow from rivers, as demonstrated, for example, in the Amazon basin (Rocha et al., 2002). As the third-largest river worldwide, Yangtze (Changjiang) River pours into the ECS with an average annual discharge of 924 billion m 3 (Ni et al., 2012). While the barrier effect of the vast freshwater outflow on the coastal species gene flow is still a controversial issue, we suggest that the freshwater outflow may have affected the gene change among marine species in more recently, following potential changes in the estuary and variable river discharge. As freshwater outflow affects mostly intertidal species with a strong habitat specificity (Dong et al., 2012), and SDM projections suggest that a large part of Chinese, Japanese and Korean coastal areas are suitable for S. schlegeli, we believe that the phylogeographical pattern of S. schlegeli can largely be explained by the historical glaciation.
It is worth noting that pipefish are unique among fishes as the males have a brood pouch and are ovoviviparous. Moreover, they are slow swimmers. These life history traits render them sensitive to environmental changes (Wilson et al., 2003;Zhang, Capinha, et al., 2020).
Compared with other marine fishes, environmental heterogeneity is  (Price et al., 2003). Today, it is widely accepted in ecological developmental biology that phenotypic plasticity is an important mechanism promoting diversification and speciation (Pfennig et al., 2010;Rundle & Nosil, 2005). However, heritability of the morphological variation observed in S. schlegeli populations has yet to be confirmed because of the challenge of artificial breeding of this species under common garden conditions. Further studies are needed to prove the findings in other relative species (e.g. Syngnathus typhle, Hippocampus erectus) that can be easily maintained in the standard laboratory conditions. Divergent selection resulting from diverse environmental conditions constitutes barriers to gene flow (Tobler et al., 2008), and identification of loci or genes under natural selection is important to determine the genetic basis of local adaptation affecting fitness traits (Zardi et al., 2013). A total of 143 genes were annotated from the selected genomic regions, revealed using genome scan method, among which many key genes are involved in growth (pax7) (Akolkar et al., 2016), cold adaptation (csde1) (Yang et al., 2012) and eye development (eys, rx3) (Loosli et al., 2003;Yu et al., 2016). Accordingly, local populations of S. schlegeli may benefit from the adaptive variation to improve the adaptation of this species to local environments.
Although marine fish generally show high genetic diversity and shallow population structure (Takahashi et al., 2015;Utter & Seeb, 2010), the results of this study confirmed that S. schlegeli has an obvious geographical population structure along with its distribution range in the north-western Pacific. The divergence among the different geographical populations might be related to geological events, as well as adaptation to habitat heterogeneity across latitudinal gradients.

| CON CLUS ION
We disentangled the morphological, molecular and ecological mechanisms underlying adaptive evolution against a background of highlevel gene flow in a seemingly barrier-free environment. We showed that historical processes during the LGM, in concert with nichedriven ecological differentiation, have led to the establishment of three distinct clades across a widely distributed marine pipefish.
Ultimately, our study shows that even interconnected sea environments maintain a high potential for ecological specialization.

ACK N OWLED G EM ENTS
We are grateful to Prof. Jinxian Liu and Zhongming Wang for sample collection. This research was supported by the National Natural

CO N FLI C T I NTE R E S T S
The authors declare that there are no conflicts of interest.

PE E R R E V I E W
The peer review history for this article is available at https://publo ns.com/publo n/10.1111/ddi.13286.

DATA AVA I L A B I L I T Y S TAT E M E N T
All mitochondrial and nuclear gene sequences were deposited into the GenBank database under accession numbers MT467667-MT468179.