High genetic differentiation of Indo‐Pacific humpback dolphins (Sousa chinensis) along the Asian Coast of the Pacific Ocean

Abstract The Indo‐Pacific humpback dolphin (Sousa chinensis) is a vulnerable marine mammal species that inhabits shallow, coastal waters from Southeast China, southward throughout Southeast Asia, and westward around the Bay of Bengal to eastern India. Polymorphic microsatellites are useful for elucidating ecological and population genetics‐related questions. Here, 18 new polymorphic microsatellites were developed from S. chinensis genomic DNA by Illumina MiSeq sequencing. Population genetic analyses were conducted on 42 S. chinensis individuals from three geographic locations, including the Xiamen Bay of China, the Western Gulf of Thailand, and Andaman Sea. Our microsatellite data revealed a strong and significant population structure among the three sampling regions (overall F ST = 0.371, p = .001). Pairwise mutual information index also demonstrated high levels of genetic differentiation between different region pairs (values range from 0.272 to 0.339, p < .001). Moreover, Structure analysis inferred three genetic clusters, with the high assignment probabilities of 95.92%, 99.47%, and 99.68%, respectively. Principal coordinate analysis plots of individuals divided entire genotypes into three clusters, indicating high level of genetic differentiation. Our results indicated the strong genetic structure in S. chinensis populations is a result of geographic distances. Other factors such as environmental variables, anthropogenic interference, and social behavior may also have contributed to population differentiation.


| INTRODUC TI ON
The Indo-Pacific humpback dolphin (Sousa chinensis) is widely distributed in shallow, coastal waters from Southeast China, southward throughout Southeast Asia, and westward around the Bay of Bengal to eastern India (Jefferson & Curry, 2015;Jefferson & Rosenbaum, 2014;Mendez et al., 2013). Recently, this species has been classified as "vulnerable" on the International Union for Conservation of Nature's RedList of Threatened Species based on an inferred population size reduction (Jefferson et al., 2017). Primary threats to this vulnerable species are incidental mortality caused by intensive fishing efforts using entangling gear, as well as ongoing habitat loss and degradation due to coastal development. Vessel collisions and environmental contamination may also be significant threats in some areas (Jefferson & Smith, 2016;Jefferson et al., 2017). For long-lived animals with late maturation and low reproductive rates such as S. chinensis, these threats often have resulted in priority conservation status being afforded to a number of small and fragmented populations (Brown et al., 2014).
Understanding genetic diversity and population structure is essential for the assessment of conservation status and effective management of a species, especially for inshore dolphins whose isolated populations are highly affected by human activities (Brown et al., 2014;Jefferson et al., 2009;Mace & Lande, 1991). Current information of population parameters is available for only a few sites in China, Malaysia, Thailand, and Bangladesh (Jefferson et al., 2017).
Photo-identification catalogues that would allow the identification of individuals and comparisons between these regions individuals are lacking. Although previous genetic studies revealed strong population structure in different S. chinensis communities (Amaral et al., 2017;Mendez et al., 2013), the analysis of genetic samples from the Asian coast of the Pacific and the Indian Ocean is still lacking. Recent population-level analyses based on a single locus of the mitochondrial DNA (mtDNA) control region have detected significant genetic differentiation between most of the geographic populations in both Chinese and Thai waters (Zhao et al., 2021). However, limited conclusions can be drawn when relying on a single mtDNA locus as a marker, because mtDNA is maternally inherited and often has higher mutation rates than nuclear DNA (Mendez et al., 2013). Analyses using additional molecular markers should be conducted.
Microsatellites are widely used and have gradually become an important genetic marker, because highly polymorphic microsatellites are useful for elucidating molecular ecology-and population genetics-related issues such as migration rates, bottlenecks, and kinship (De Barba et al., 2017;Selkoe & Toonen, 2006). These markers are usually short in length (100-300 bp), and they can still be amplified with polymerase chain reaction (PCR), even when using some poor-quality samples caused by DNA degradation (Taberlet et al., 1999). Microsatellite studies on cetaceans have successfully used genomic DNA extracted from sloughed skin or tissues collected from decomposing stranded animals (Valsecchi & Amos, 1996). However, using low-quality DNA samples may lead to low amplification success rates and high rates of genotyping errors, such as allelic dropouts and other allele-like artifacts that are generated by amplification (Bonin et al., 2004;Pompanon et al., 2005).
Traditional methods for microsatellite isolation include construction of an enriched library followed by cloning and Sanger sequencing, which are both expensive and extremely laborious and time-consuming (Zane et al., 2002). With the development of nextgeneration sequencing technologies, isolation of species-specific microsatellite loci has become more convenient and efficient (Kumar & Kocour, 2017;Vaini et al., 2019). Paired-end sequencing on the Illumina platform is currently the most commonly used approach for microsatellite isolation (González-Castellano et al., 2018).
Several genetic and genomic studies have been published on humpback dolphins (Gui et al., 2013;Jia et al., 2019;Zhang et al., 2020), but only a limited number of microsatellite sequences are reported in S. chinensis, and some of those are shown to have low polymorphism (Chen & Yang, 2009;Lin et al., 2012). In addition, there are more dinucleotide microsatellites among the existing markers. Using such short tandem repeat motifs may produce a large amount of strand slippage during PCR and increase the likelihood of stutter bands and genotyping errors (Pei et al., 2018;Zalapa et al., 2012). Compared to dinucleotide markers, tetrameric and pentameric markers have lower stutter slippage efficiency and clearer peak discrimination during PCR amplification and genotyping (Gill et al., 2005;Pei et al., 2018). Therefore, the objectives of this study were to: (1) isolate tetra-, penta-, and hexa-nucleotide microsatellites from S. chinensis genome sequences using Illumina sequencing; (2) evaluate genetic diversity of S. chinensis using samples obtained from three different sampling sites along the Asian coast of the Pacific and the Indian Ocean; and (3) infer population structure among the three sampled locations.
the time of collection (Appendix S1). Genomic DNA from minced tissue samples were extracted using DNeasy blood and tissue extraction kits (QIAGEN) according to the manufacturer's protocol.

| Microsatellite selection and multiplex PCR design
Purified genomic DNA was quantified by TBS-380 fluorometer (Turner BioSystems Inc.). High-quality DNA (OD 260/280 = 1.8-2.0, >1 μg) of a single individual collected at XM was used to generate an enriched library and sequenced on the Illumina MiSeq platform by Majorbio Bio-Pharm Technology Co., Ltd. The detailed procedures were as follows. First, the DNA sample was sheared into 400-500-bp fragments using a Covaris M220 Focused Acoustic Shearer following the manufacturer's protocol. Then, an Illumina sequencing library was prepared from the sheared fragments using the NEXTflex™ Rapid DNA-Seq Kit. Briefly, 5′ ends were first end-repaired and phosphorylated. Next, the 3′ ends were A-tailed and ligated to the sequencing adapters. The third step was to enrich the adapter-ligated products using PCR. Finally, the prepared library was used for paired-end Illumina sequencing (2 × 150 bp) on an Illumina HiSeq X Ten machine. After filtering low quality and duplicated sequences and removing adapter-related reads, a total of 2,174,959 clean reads were assembled using SOAPdenovo version 2.04 software (Luo et al., 2012). At last, 1,398,738 contigs including 95,161 large contigs (>1000 bp) were obtained, with an average GC content of 55.49% and contig N50 of 1513 bp (Table 1).
The assembled data were searched for tetra-, penta-, and hexanucleotide microsatellite motifs using MSATCOMMANDER version 0.8.2 (Faircloth, 2008). The searching parameters were a minimum of 10 repeats for tetra-nucleotide motifs, and six repeats for the other two repeat classes. Only 'perfect-type' microsatellite sequences (pure repeats) with a flanking region of at least 30 bp on each side were selected. PCR primers for 162 available microsatellites were designed using Primer3 (Rozen & Skaletsky, 2000). After primers design and PCR genotyping, a total of 18 polymorphic microsatellite loci ( Table 2) with 'perfect-type' and long tandem repeat motifs were allocated into 6 multiplex PCR panels using software MPprimer (Shen et al., 2010), based on annealing temperature, complementarity of primer pairs, and allele size range ( Figure 2).

| Microsatellite genotyping
The 5′ end of each forward primer was labeled with a fluorescent dye (6-FAM, VIC or NED). The total PCR volume (20 μl) consisted of approximately 50 ng of genomic DNA, 1×Multiplex PCR Kit (Takara), 0.2 μM of each primer (forward and reverse), and ddH 2 O added to make up the final volume (Table 2). PCR conditions involved an initial denaturation step at 94°C for 3 min, followed by 32 cycles of 94°C for 30 s, the specific annealing temperature (Table 2) for 90 s, extension at 72°C for 60 s, and a final extension for 30 min at 60°C. Fragment analysis was performed on the PCR products on an ABI 3730XL automated DNA sequencer (Applied Biosystems), using GeneScan LIZ 500 as the internal size standard. Allele sizes were automatically scored with GeneMapper ver-
We inferred population structure, using Structure version 2.3.4 (Pritchard et al., 2000), which estimates the number of genetic clusters (K) based on genotyping data generated from the six multiplex PCR panels. The length of the burn-in period was set to 10 5 iterations, followed by 10 6 in the number of Markov Chain Monte Carlo iterations. The LOCPRIOR model was chosen to infer possible weak population structure with the assistance of sample group information. The number of inferred K was set between 1 and 10, and 20 independent replicates were run for each K value. Subsequently, the Structure Harvester version 0.6.94 (Earl & VonHoldt, 2012) online tool was used to calculate the Delta K value and determine the best number of K clusters (Evanno et al., 2005). CLUMPP version 1.1.2 (Jakobsson & Rosenberg, 2007) was used to summarize the optimal alignment of the 20 replicates for the same K value. The final results were displayed graphically with Distruct version 1.1 (Rosenberg, 2004).

| Genetic diversity
Genetic diversity estimates of Na, eNa, I, Ho, He, and uHe values for each locus and each region are summarized in   Table 4. AMOVA results for the degree of variance in S. chinensis individuals are summarized in Table 5. There was 37.08% genetic variance among the three geographic regions, 6.85% variance among individuals within each region, and 56.07% variance within individuals.

| Genetic differentiation
Structure Harvester analysis showed a clear peak for Delta K at

TA B L E 3
Genetic diversity parameters in the three geographic regions. N is the sample size, Na is the number of alleles, eNa is the number of effective alleles, I is the Shannon′s information index, Ho is the observed heterozygosity, He is the expected heterozygosity, uHe is the unbiased expected heterozygosity, P HWE is the p value of Hardy-Weinberg equilibrium test, * indicates significant departure from Hardy-Weinberg equilibrium after sequential Bonferroni correction (p < .003), ND represents not done The Mantel tests revealed a positive and significant correlation (R 2 = .2939, p = .0001) between the individual-by-individual genetic distances and the geographic distances, indicating a pattern of IBD among the three geographic regions (Figure 6a). The result remained positive when geographic coordinates and microsatellite data for XM individuals were removed (R 2 = .3939, p = .0001; Figure 6b).

| DISCUSS ION
In this study, 18 polymorphic microsatellites with tetra-, penta-, and hexa-nucleotide repeats were isolated from the genomic DNA of S. chinensis and used for genetic analysis of 42 S. chinensis individuals from three geographic locations. Dimeric and trimeric microsatellite loci were not considered in this work, because stutter bands caused by slipped strand mispairing during PCRs might have occurred at short tandem repeat motifs (Hauge & Litt, 1993;Murray et al., 1993).
Compared with dimeric microsatellites, tetrameric and pentameric loci are shown to have lower stutter slippage efficiency and clearer peak discrimination during PCR amplification and genotyping (Pei et al., 2018). Here, we provided 18 novel polymorphic microsatellite markers, which could be useful for future molecular genetics studies on S. chinensis and other closely related species.
Estimating the levels of genetic diversity in natural populations provides important information for evaluating species viability . Our microsatellite data showed the levels of genetic diversity detected for S. chinensis in XM were significantly lower than those in the other two populations. The low genetic diversity in XM may be related to the small sample size. However, previous studies have revealed low levels of molecular diversity in S. chinensis in Chinese waters (Chen et al., 2008Lin et al., 2010Lin et al., , 2012Zhang et al., 2020;Zhao et al., 2021 (Karczmarski et al., 2017).
The results revealed a significant strong genetic differentiation in S. chinensis among different regions for both F ST and sHua estimates. A previous study revealed that F ST was more appropriate than other estimators when numbers of loci and sample size were limited (Balloux & Goudet, 2002). Compared with F ST , the sHua value is known to be a better estimator of genetic differentiation, and can robustly reflect dispersal over a wide range of population sizes and dispersal rates (Manlik et al., 2019;Sherwin et al., 2017). Similarly, our results of the estimated sHua also showed a high degree of genetic differentiation, which indicates a strong population structure among S. chinensis communities of these three sampled regions.
Relatively higher degrees of genetic differentiation with higher pairwise F ST or Φ ST values (>0.5) have been reported for S. chinensis between populations in China and Thailand based on mtDNA sequence data (Amaral et al., 2017;Mendez et al., 2013;Zhao et al., 2021).
The Mantel analysis showed a pattern of IBD among different sampling regions, which indicated that the observed genetic differentiation in S. chinensis communities is a result of geographic distances.
Phylogenetic analysis of the Sousa genus based on both mtDNA and nuclear DNA markers demonstrated that the Chinese and Thai haplotypes represented one assemblage, although morphological evidence revealed a clear distinction between the two sampling regions (Mendez et al., 2013). Theoretically, there is still potential genetic exchange or contact between S. chinensis communities in China and Thailand. However, humpback dolphins are known to prefer shallow waters (less than approximately 25 m in depth) and reside only in coastal (generally within 1-2 km off shore) and estuarine waters Jefferson, 2000;Jefferson et al., 2009).
This species shows minimal linear distance movement, with a maximum dispersal distance of 300 km in Chinese waters Wang et al., 2016). Therefore, exchange of S. chinensis individuals seems to more likely occur between adjacent communities in China. The geographic distance from the northern South China Sea to the Gulf of Thailand is over 3000 km, which is much greater than the S. chinensis individual dispersal distance. Our microsatellite data also revealed that the genetic structure of S. chinensis in China and Thailand follows an IBD model, which can explain the strong genetic differentiation among the three sampling locations in this study.
The possible mechanisms that drive the population differentiation and even result in the species boundaries of humpback dolphins include distribution patterns, environmental factors, and behavioral processes (Mendez et al., 2013). Barriers to individual dispersal and gene flow can exist among different S. chinensis communities. In the previous studies, significant genetic differentiations of S. chinensis between the neighboring regions were also detected in Chinese waters (see Zhang et al., 2020;Zhao et al., 2021). Geographic barriers have been found among humpback dolphin populations and were caused by different oceanographic features, such as ocean currents, upwelling, bathymetry, sea surface temperature, primary productivity, and salinity (Amaral et al., 2017;Mendez et al., 2011Mendez et al., , 2013. Genetic evidence for Indo-Pacific marine fauna has shown distinct genetic lineages of several species in the east and the west, including Indo-Pacific bottlenose dolphins (Tursiops aduncus) and humpback dolphins (e.g. Amaral et al., 2017;Keyse et al., 2014;Zhao et al., 2021). Our microsatellite analysis also showed a significant genetic differentiation between WG and AS populations. Historically, the crustal movement of the continental plates and some climatic events may prevent the individual dispersal and gene flow of marine organisms across the Malacca Strait (Zhao et al., 2021). Besides these environmental variables, development of coastal areas may lead to anthropogenic barriers to dispersal and produce the isolated population fragments of inshore dolphins (Brown et al., 2014). It is reported that the coastal development projects have been increasing continuously in the Western Gulf of Thailand (Jutapruet et al., 2015). Therefore, anthropogenic

| CON CLUS ION
In this study, 18 new microsatellite markers with pure and long tandem repeat motifs were isolated from S. chinensis genomic DNA using Illumina MiSeq sequencing. These polymorphic microsatellites were allocated into 6 multiplex PCR panels and successfully obtained genetic data of 42 S. chinensis individuals from the Xiamen Bay of China, the Western Gulf of Thailand, and Andaman Sea coast.
Our microsatellite evidence, together with mtDNA sequence data reported in the present study area (Zhao et al., 2021), indicate that there are high genetic differentiation among S. chinensis communities along the Asian coast of the Pacific and the Indian Ocean. The strong genetic structure in S. chinensis populations may be associated with multiple factors such as geographical distribution patterns, environmental variables, anthropogenic interference, and social behavior. These novel polymorphic microsatellite markers will be useful for future molecular genetics studies on this endangered species and other closely related species.

ACK N OWLED G M ENTS
This work was financially supported by the National Natural Science Finally, we express our sincere thanks to editors and anonymous reviewers for their constructive comments that significantly improved our manuscript.

CO N FLI C T O F I NTE R E S T
The authors declare no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
Sequences containing the polymorphic microsatellite loci reported in this paper have been deposited into the GenBank database under the following Accession Numbers: MK766845-MK766870.