Cryptic species and parallel genetic structuring in Lethrinid fish: Implications for conservation and management in the southwest Indian Ocean

Abstract Analysis of genetic variation can provide insights into ecological and evolutionary diversification which, for commercially harvested species, can also be relevant to the implementation of spatial management strategies and sustainability. In comparison with other marine biodiversity hot spots, there has been less genetic research on the fauna of the southwest Indian Ocean (SWIO). This is epitomized by the lack of information for lethrinid fish, which support socioeconomically important fisheries in the region. This study combines comparative phylogeographic and population genetic analyses with ecological niche modeling to investigate historical and contemporary population dynamics of two species of emperor fish (Lethrinus mahsena and Lethrinus harak) across the SWIO. Both species shared similarly shallow phylogeographic patterns and modeled historical (LGM) habitat occupancies. For both species, allele frequency and kinship analyses of microsatellite variation revealed highly significant structure with no clear geographical pattern and nonrandom genetic relatedness among individuals within samples. The genetic patterns for both species indicate recurrent processes within the region that prevent genetic mixing, at least on timescales of interest to fishery managers, and the potential roles of recruitment variability and population isolation are discussed in light of biological and environmental information. This consistency in both historical and recurrent population processes indicates that the use of model species may be valuable in management initiatives with finite resources to predict population structure, at least in cases wherein biogeographic and ecological differences between taxa are minimized. Paradoxically, mtDNA sequencing and microsatellite analysis of samples from the Seychelles revealed a potential cryptic species occurring in sympatry with, and seemingly morphologically identical to, L. mahsena. BLAST results point to the likely misidentification of species and incongruence between voucher specimens, DNA barcodes, and taxonomy within the group, which highlights the utility and necessity of genetic approaches to characterize baseline biodiversity in the region before such model‐based methods are employed.


| INTRODUCTION
Understanding the evolution of marine biodiversity requires knowledge of the patterns and processes underpinning genetic structuring of species. Combined demographic-genetic studies can reveal the spatial and temporal scale at which evolutionary forces are occurring (Waples, 1998) and, by extension, identify cryptic components of biodiversity and so allow optimization of spatial conservation strategies. Identification of spatial and temporal processes is particularly important for harvested taxa, as genetic diversity and adaptation are key factors underpinning the resilience/sustainability, and ultimately the evolutionary potential, of populations and species (Iles & Sinclair, 1982;Ruzzante et al., 2006;Ryman, Utter, & Laikre, 1995;Therkildsen et al., 2013).
The southwest Indian Ocean (SWIO), which forms a subdivision of the tropical Indo-Pacific, is renowned for its diverse marine habitats and resources, including an ichthyofaunal richness (Smith, Smith, & Heemstra, 2003) that supports significant artisanal, subsistence, and commercial fisheries across the region (Berg, Francis, & Souter, 2002).
The complex array of genetic population structures resolved to date within the SWIO highlight the potential hazards of making region-wide predictions of population structure from single species. However, studies of ecologically similar and contemporaneously co-distributed species represent analytical frameworks wherein variance between lineages, place, and time are minimized, and thus "general rules" informative to subsets of taxa may be resolved (Bird, Holland, Bowen, & Toonen, 2007Bohonak, 1999;Dawson, 2012;Lester, Ruttenberg, Gaines, & Kinlan, 2007). In this study, we develop such an analytical framework by performing replicated genetic analysis of two lethrinid fish species: Lethrinus harak, the thumbprint emperor, and Lethrinus mahsena, the sky emperor. Both species are ecologically and biogeographically similar. The habitat of L. harak, the most abundant and commercially important lethrinid in the SWIO (Kulmiye, Ntiba, & Kisia, 2002), consists of shallow and protected coastal areas including reefs, mangroves, shallow lagoons, and seagrass beds (Ebisawa & Ozawa, 2009;Kulmiye et al., 2002). Lethrinus mahsena occupies similar shallow habitats but has been observed predominantly in reefs or adjacent areas (Carpenter & Allen, 1989). Like the majority of lethrinids, both species are relatively long-lived (up to 15 years; Ebisawa & Ozawa, 2009) protogynous hermaphrodites (Ebisawa, 2006;Grandcourt, 2002), but many aspects of their reproductive biology are not known (Ebisawa, 2006;Kulmiye et al., 2002).
Both species have pelagic larval stages with Nakamura et al. (2010) reporting a larval duration of around 29 days for L. harak.
The aim was to test the prediction that both species would display similar patterns of historical (phylogeographic) and contemporary population genetic structure across the SWIO. As the Lethrinidae is one of the most important families of fish for artisanal, recreational, and subsistence fisheries (Carpenter & Allen, 1989;Gouws, 2012), the genetic patterns are interpreted in the context of recurrent recruitment processes relevant to stock sustainability. To this end, kinship analyses were employed to complement estimates of connectivity/ isolation based on traditional population genetic (e.g., F ST ) methods.
To provide an additional historical context, we combined genetic studies with ecological niche modeling (ENM), which has been shown to provide novel insights into the influence of historical biogeography on genetic variation in marine species (Glynn, Houghton, & Provan, 2015;Glynn et al., 2016).

| Sample collection and molecular analyses
Samples (fin clips fixed in 95% ethanol) were collected from commercial or subsistence catches landed in seven locations across the SWIO between 2009 and 2012 ( Figure 1). Total DNA was extracted from all samples using a standard CTAB-chloroform/isoamyl alcohol method (Winnepenninckx, Backeljau, & Dewachter,1993 for both primer sets, and 60 s at 72°C, followed by a final 5-min extension at 72°C. PCR products were then purified using EXOSAPIT and sequenced from both directions on an Applied Biosystems 3500 platform using the respective PCR primers. Sequences were aligned using the Clustal W (Thompson, Higgins, & Gibson, 1994) program, available in BioEdit (Hall, 1999), and analyzed using BLAST. Sequences revealed the presence of a highly divergent clade among the Seychelles L. mahsena samples (described in Section 3). To investigate the frequency of occurrence of this clade elsewhere, a diagnostic PCR-RFLP method was developed and used to genotype all L. mahsena samples (Appendix S1: Figure S1).
Percentage sequence divergence within and between species/clades and K2P distances were calculated using MEGA v6.06.
Genetic variation was assessed using calculations of the number of haplotypes (H), in addition to indices of haplotype (h) and nucleotide (π) diversity (Nei, 1987) alongside their variances. Genetic differentiation among samples was further tested by global and pairwise Φ ST using pairwise haplotype distances (Weir & Cockerham, 1984), with associated p values estimated after 10,000 permutations.
Genetic differentiation was quantified using global and pairwise F-statistics (Wright, 1978). To account for possible null allele effects, such indices were also calculated using the null alleles adjustment in FreeNA (Chapuis & Estoup, 2007). To test for signatures of isolation by distance (IBD), correlations between geographical distances (minimum sea distance in nautical miles) and linearly transformed genetic Slatkin, 1993) were tested using a MANTEL matrix correlation test in GENALEX. Genetic structure was also investigated using Bayesian "group assignment" "without admixture" methods implemented in BAPS 6.0 (Corander, Waldmann, Marttinen, & Sillanpaa,2004), for models of K = 1-6 (10 independent runs per K).
Mean pairwise relatedness within samples was calculated using the relatedness estimator r qg of Queller and Goodnight (1989) in GENALEX (Peakall & Smouse, 2006) with associated 95% confidence intervals determined by 1,000 bootstraps. Permutation of genotypes among all samples (999 times) was used to calculate the upper and lower 95% confidence intervals for the expected range of r qg under a panmictic model. The maximum likelihood method implemented in ML-RELATE (Kalinowski, Wagner, & Taper, 2006)

| Estimation of Type I and Type II error rates
To assess differences in statistical power among the data sets for both species and marker types, Type I and Type II error rates were estimated using POWSIM (Ryman & Palm, 2006

| Cryptic genetic divergence in sympatry
Phylogenetic reconstruction of COI haplotypes among putative L. mahsena revealed the presence of two highly divergent clades within the SWIO, which were separated by 20 mutational steps ( Figure 2  This pattern was also supported by assignment tests which found complete self-classification to clade with the exception of a single individual from clade A, which had a similar probability of assignment to both clades (0.55 and 0.45 assignment probability to clade A and B groups, respectively).

| mtDNA COI sequence data
Overall haplotype and nucleotide diversities (    could not be unambiguously classified as related or unrelated for both L. harak (88.54%) and L. mahsena (79.56%)) but did identify related dyads within all samples for both species except Madagascar, Belo sur Mer for L. mahsena.

| Paleodistribution modeling
Mean AUC values were 0.985 (SD = 0.003) and 0.980 (SD = 0.016) for L. harak and L. mahsena, respectively, indicating very good model performance. Present-day models for both L. harak and L. mahsena appear to accurately reflect the known contemporary ranges of both species, with suitable habitats aggregated in coastal environments across the Indian and western Pacific Oceans ( Figure 6). Despite an overall reduction in suitable habitat during the LGM, this appears to take the form of a geographically homogeneous contraction across both species ranges rather than obvious vicariant fragmentation of present-day ranges ( Figure 6). have a greater similarity (99%) with L. lentjan voucher specimens from Iran, whereas supposed L. mahsena sequences from Japan (JF952782.1) exhibit greatest similarity (91%) to L. rubrioperculatus.

| Cryptic sympatric species
In general, the meristic and morphological features of emperor fish are conservative (Carpenter & Allen, 1989) and distinct species are often difficult to identify easily by morphology alone (Sato, 1971;Smith, 1959). Failure to describe the full species diversity and/or misidentification of species may fundamentally compromise conservation strategies and tools. For example, the ENM performed here was based on reported occurrences of the species; however, based on the genetic results such occurrence data may be inaccurate. By extension, the modeled habitat patches may actually reflect distinct species ranges. The cryptic biocomplexity resolved here, as well as other cases of identification of cryptic lethrinid species (Borsa, Hsiao, Carpenter, & Chen, 2013), emphasizes the need for genetic studies on a wider geographical scale to fully resolve levels of intraspecific and interspecific diversification, as well as for the development of low-cost species identification and assignment techniques as resources for management such as the PCR-RFLP procedure developed here.

| Population history of Lethrinus harak and Lethrinus mahsena across the SWIO
Hindcasting of ecological niche models (ENMs) suggested a marked reduction in suitable habitat for both L. mahsena and L. harak during the LGM that would be expected to have decreased their range-wide population sizes. Therefore, as for many other SWIO taxa including numerous reef-associated fish (Craig, Eble, Bowen, & Robertson,2007;Visram et al. 2010) and crustaceans Gopal et al.,2006;Tolley, Groeneveld, Gopal, & Matthee,2005), it is likely that eustatic sea-level fluctuations during the Pleistocene effected demographic changes in SWIO lethrinids. mtDNA phylogenies for both species conformed to classical "star"-shaped patterns (Slatkin & Hudson, 1991) expected under population expansion models.
Demographic tests also provided some support for population expansion events for L. mahsena but not L. harak; however, the resolution of these tests may have been limited by a combination of low numbers of informative sites and differing sample sizes. The lack of genetic breaks within the mtDNA phylogenies for both species is compatible with a lack of prolonged vicariance. Similarly shallow phylogenies have been reported in many other SWIO species. Therefore, it would appear that for many species there has been little phylogeographic diversification within the SWIO (Hoareau, Boissin, & Berrebi,2012;Muths et al.,2015) with the majority of cases of deep phylogeographic structure within the region due to colonization of allochthonous lineages (Ragionieri et al., 2009Silva et al., 2010a).  (Whiteley, Spruell, & Allendorf, 2004). Further insight into the drivers of genetic differentiation was provided by the kinship analyses, which examine how alleles are shared among individuals and provide an independent test of the hypothesis that structure as quantified by F ST , which focuses on population allele frequencies, is a result of connectivity (Christie, Johnson, Stallings, & Hixon, 2010;Iacchei et al., 2013). Firstly, for L. mahsena mean kinship values for all samples exceeded those predicted within a nonstructured system. Second, related individual pairs and/or potentially related dyads (i.e., dyads that could not be unambiguously described as unrelated) were found in all samples for both species. While the large number of dyads that could not be unanimously classified to a single relationship category (i.e., unrelated or related) highlights the resolution threshold of the data, the results for both allele frequency and allele-sharing analyses indicate that, even though sample sizes are small in some cases, the genetic heterogeneity cannot be dismissed as statistical noise but rather reflects some changes in the composition that may be a useful tool for better understanding recruitment dynamics and connectivity in these species (Knutsen et al., 2011;Selkoe, Gaines, Caselle, & Warner, 2006).

| Population structure of L. harak and L mahsena across the SWIO
The geographically patchy genetic structuring and kinship pat- suggested by Muths et al. (2015). Such an isolating mechanism might also explain the seeming absence of clade B from other SWIO sites. have been suggested to contribute to stochastic recruitment/genetic patchiness within the region in a number of species (Bourjea et al., 2007;Hoareau, Boissin, Paulay, & Bruggemann, 2013;Muths et al., 2015) and may contribute to the differentiation among samples from both species within the Mozambique Channel reported here. Even in the absence of genetically isolated source populations, larval cohesion (Selkoe et al., 2006) may enhance (Waples, 2002) and be effectively indistinguishable from sweepstake effects (Turner, Dowling, Marsh, Kesner, & Kelsen, 2007 Disentangling the exact roles of recruitment heterogeneity and restricted gene flow will require more sensitive genetic assays and would benefit from analysis of age-segregated samples (Burford, Carr, & Bernardi, 2011). However, the overall results indicate processes within the region that prevent genetic mixing, at least, on timescales of interest to fishery managers. These contrast with the broad-scale genetic homogeneity and subtle patchiness reported for L. nebulosus in Australian waters (Berry et al., 2012) and implicate regional seascape drivers within the SWIO. In these cases, a spatial "bet-hedging" approach is advised for marine resource management, including geographical dispersion of marine reserves if they are to be used (Larson & Julian, 1999). The ability to predict population structure across taxa applies directly to the implementation of MPAs.
L. harak and L. mahsena reported similar historical modeled habitat occupancy and phylogeographic patterns, while resolved genetic patchiness points to similar recurrent population processes in both species. Therefore, while we agree that making sweeping predictions from alleged model organisms can be dangerous (Bird et al., 2007), this similarity suggests that the application of information from such models to subsets of taxa may be highly useful in light of finite resources for conservation. Paradoxically, the identification of a cryptic sympatric species assemblage with little (if any) morphological differentiation, that is potentially endemic to the Seychelles, highlights the utility and necessity of genetic approaches to characterize baseline biodiversity in the region before such model-based methods are employed. We would like to thank Sophie Benbow, the staff at Blue Adventures, the Seychelles Fisheries Authority, Dr. Adriano Junior, and the Centre for Marine research in Pemba for helping us collect samples.

CONFLICT OF INTEREST
None declared.

DATA ARCHIVING
All data will be uploaded on DRYAD.