Population genomic response to geographic gradients by widespread and endemic fishes of the Arabian Peninsula

Abstract Genetic structure within marine species may be driven by local adaptation to their environment, or alternatively by historical processes, such as geographic isolation. The gulfs and seas bordering the Arabian Peninsula offer an ideal setting to examine connectivity patterns in coral reef fishes with respect to environmental gradients and vicariance. The Red Sea is characterized by a unique marine fauna, historical periods of desiccation and isolation, as well as environmental gradients in salinity, temperature, and primary productivity that vary both by latitude and by season. The adjacent Arabian Sea is characterized by a sharper environmental gradient, ranging from extensive coral cover and warm temperatures in the southwest, to sparse coral cover, cooler temperatures, and seasonal upwelling in the northeast. Reef fish, however, are not confined to these seas, with some Red Sea fishes extending varying distances into the northern Arabian Sea, while their pelagic larvae are presumably capable of much greater dispersal. These species must therefore cope with a diversity of conditions that invoke the possibility of steep clines in natural selection. Here, we test for genetic structure in two widespread reef fish species (a butterflyfish and surgeonfish) and eight range‐restricted butterflyfishes across the Red Sea and Arabian Sea using genome‐wide single nucleotide polymorphisms. We performed multiple matrix regression with randomization analyses on genetic distances for all species, as well as reconstructed scenarios for population subdivision in the species with signatures of isolation. We found that (a) widespread species displayed more genetic subdivision than regional endemics and (b) this genetic structure was not correlated with contemporary environmental parameters but instead may reflect historical events. We propose that the endemic species may be adapted to a diversity of local conditions, but the widespread species are instead subject to ecological filtering where different combinations of genotypes persist under divergent ecological regimes.


| INTRODUC TI ON
In the marine environment, coral reef fishes are a model group for understanding processes of speciation because they are well-characterized and represent the most diverse vertebrate communities on the planet (Nelson, 2006). Reef fishes have broad geographic ranges (typically much greater than terrestrial species; Jones, Caley, & Munday, 2002), a nearly ubiquitous pelagic larval stage with relatively few barriers to dispersal, and occupy a variety of habitats. Understanding how such traits influence the evolution and distributions of reef fishes has long motivated researchers Cowman & Bellwood, 2013). Our understanding of reef fish evolution and biogeography has benefitted greatly from the rapid development of molecular analyses, and new genomic approaches have illuminated the processes driving genetic differentiation in natural populations at a variety of temporal and spatial scales (e.g., Gaither et al., 2015).
The reefs surrounding the Arabian Peninsula present an excellent arena for testing the genomic consequences of environmental transitions. In contrast to the reef systems of the central Indo-West Pacific, these peripheral reefs occupy one of the most geologically and oceanographically volatile regions in tropical oceans (DiBattista, Choat, et al., 2016;DiBattista, Gaither, et al., 2017;DiBattista, Roberts, et al., 2016;DiBattista et al., 2015;Simpson, Harrison, Claereboudt, & Planes, 2014;Xu, Ruch, & Jónsson, 2015), and are defined by three prominent features: (a) a sharp increase in nutrient availability in the southern Red Sea, (b) the narrow, shallow Strait of Bab Al Mandab that constitutes the only connection between the Red Sea and Indian Ocean, and (c) seasonal upwelling associated with the northern Indian Ocean monsoon. First, the eutrophic region south of ~17°N in the Red Sea may limit larval dispersal of marine fauna, a hypothesis supported by the disjunctive distribution of some reef fish species (Roberts, Shepherd, & Ormond, 1992), as well as genetic differentiation between populations of reef organisms (Nanninga, Saenz-Agudelo, Manica, & Berumen, 2014;Giles, Saenz-Agudelo, Hussey, Ravasi, & Berumen, 2015;Saenz-Agudelo et al., 2015;Reimer et al., 2017; but see Robitzch, Banguera-Hinestroza, Sawall, Al-Sofyani, & Voolstra, 2015). Second, water exchange through the Strait of Bab Al Mandab was repeatedly restricted during Pleistocene glacial cycles when sea level lowered as much as 140 m (Braithwaite, 1987;Rohling et al., 1998). Third, the Indian Ocean monsoon causes profound seasonal changes in ocean temperature, salinity, and productivity (Smeed, 2004;Sofianos, Johns, & Murray, 2002). At the western extreme of the Arabian Sea, the Gulf of Aden and waters of Djibouti have a high and relatively stable temperature regime with extensive limestone reefs and high coral cover (Wilkinson, 2008). At the eastern extreme, the southern coastline of Oman is subject to a "pseudo-high-latitude effect," where seasonally cool sea surface temperatures and monsoonal upwelling events result in rocky reefs with sparse coral cover but dense algal cover 15 School of Aquatic and Fishery Sciences, (Barber et al., 1995;Savidge, Lennon, & Matthews, 1990;Sheppard, Price, & Roberts, 1992). These substantial changes in reef habitat occur over less than 2,000 km, well within the capacity for larval dispersal and gene flow of most reef fishes (Keith, Herbert, Norton, Hawkins, & Newton, 2011;Keith, Woolsey, Madin, Byrne, & Baird, 2015;Lessios & Robertson, 2006).
The habitat discontinuities, local environmental fluctuations, and vicariance of the coastal seas of the Arabian Peninsula provide an opportunity to investigate the relative importance of dispersal, selection, and historical processes in defining intraspecific or even interspecific genetic architecture. Evidence of selection and local adaptation can be detected with correlations between environmental variables and allele frequencies (Coyne & Orr, 2004;Schluter, 2000). For example, reef fish species distributed across the southern Red Sea and through the Gulf of Aden (including Djibouti) show abrupt changes in demographic features, including life span, over relatively small spatial scales and moderate environmental variation (Taylor, Lindfield, & Choat, 2015;Taylor, Trip, & Choat, 2018). To the east of the Gulf of Aden, the environmentally turbulent Oman upwelling coast is likely to have even greater effects on reef fish demography and assemblage composition (Burt et al., 2011;Priest et al., 2016). Historical processes associated with Pleistocene glacial cycles may have also influenced fish demography and faunal composition (DiBattista, Roberts, et al., 2016). If the contemporary environmental conditions influence genetic architecture, we would then expect strong correlations between those conditions and SNP frequency. Alternatively, if historical conditions were of greater importance, we would predict a weak correlation or no correlation between the two variables.
Reef fishes in the Red Sea, Gulf of Aden, Arabian Sea, and Sea of Oman also provide an opportunity to test for local adaptation across strong environmental gradients in both endemic (i.e., range-restricted) and widespread reef fishes. We expect that range-restricted species, which complete their life cycle among these reefs and adjacent oceans, are adapted to local conditions and inherent environmental fluctuations. In contrast, widespread species routinely maintain gene flow with other parts of the Indian Ocean , and larvae arriving from outside the region may be less adapted to these conditions. Although gene flow can prevent local adaptation in widespread species (Lenormand, 2002;Slatkin, 1987), strong differences in ecological conditions between locations can also reduce effective gene flow (Orsini, Vanoverbeke, Swillen, Mergeay, & Meester, 2013). If local adaptation was greater in endemic species compared to widespread species, we would expect the endemics to have less genetic structure than widespread species across a similar geographic range.
Here we use genome-wide SNPs to investigate genetic structure in reef fishes of the Arabian Peninsula. We adopt a multi-taxon approach comprising eight regional endemics and two widespread reef fishes, all with larvae capable of long-distance dispersal. We aim to test (a) for associations between SNPs and contemporary environmental gradients in each species, (b) whether widespread species have greater genetic structure than endemic species across the same geographic region, and (c) determine the influence of vicariant processes on genetic structure by reconstructing scenarios for population subdivision in the species that display signatures of isolation.

| Sample collection and study
We collected tissue samples (fin clip or gill filaments) from individual fish using pole spears at sites between the Gulf of Aqaba in the northern Red Sea (N 28.404°, E 34.738°) and Muscat in the Sea of Oman at the north-western boundary of the Arabian Sea (N 23.525°, E 58.740°; Figure 1 and Table 1). Nine species are from the butterflyfish family Chaetodontidae and are characterized by a range of geographic distributions (Table 1). We also included a widespread surgeonfish [Ctenochaetus striatus (Quoy & Gaimard, 1825)] from the family Acanthuridae. Some of the species were rare or absent at sampling sites, particularly for the regional endemics, which resulted in missing species and data for some locations. While we accept that our sample sizes are modest (Table 1), the number of collections per site and geographic breadth of sampling are far greater than any RAD-seq population genomics study of reef fishes to date. Tissues were preserved in a saturated salt-DMSO solution or 95% ethanol and subsequently stored at −20°C.

| Ethics statement
This research was undertaken in accordance with the policies and procedures of the King Abdullah University of Science and Technology (KAUST). Permits for sampling in Saudi Arabian waters were obtained from the Saudi Arabian coastguard. No specific permissions were required, as the study did not involve endangered or protected species. We were unable to obtain ethics approval or a waiver because no ethics board or committee for working with animals existed within KAUST at the time of collection.

| RAD sequencing
DNA was extracted with NucleoSpin Tissue kits (Macherey-Nagel Düren). RAD-seq libraries were prepared following Peterson, Weber, Kay, Fisher, and Hoekstra (2012) using 500 ng of DNA per specimen. Library preparation and Illumina sequencing are detailed in DiBattista, .
Sequences were de-multiplexed and filtered for quality using the process_radtags pipeline in STACKS vers. 1.44 (Catchen, Amores, Hohenlohe, Cresko, & Postlethwait, 2011). Raw reads were trimmed from 101 bp to a common length of 81 bp in FASTQ format.
Individual reads with Phred scores ≤ 20 (in a 5 bp sliding window) or with ambiguous barcodes were discarded. All loci were assembled separately individuals using the denovo_map pipeline in STACKS.
Although an annotated butterflyfish genome is available (Chaetodon austriacus; DiBattista, , the other species considered in this study are too divergent to enable the recovery of sufficient numbers of SNPs for this comparison, and so we rely on de novo assembly for all species in this case. For the main analyses presented here, we used a parameter combination tested and optimized as part of DiBattista,  in these same reef fish species: minimum read depth to create a stack (-m) = 3; number of mismatches allowed between stacks prior to merging (-M) = 4; maximum number of mismatches when aligning secondary reads to primary stacks (-N) = 2; maximum number of mismatches allowed between loci when creating a catalog (-n) = 2. We performed additional data filtering using the "population" component of STACKS retaining only those loci that met the following criteria: (a) minor allele frequency > 0.05, (b) present in at least n−1 population (populations: -p), and (c) genotyped in at least 80% of individuals per population (populations: -r). We used the "write_single_snp" option and produced a.vcf file with the resulting loci to better conform to the assumption of independent loci. The resulting.vcf file was reformatted to other program input files using PGDSPIDER vers. 2.0.5.1 (Lischer & Excoffier, 2012). We repeated this process to produce a data set that was comprised of SNPs shared between two of the most closely related species (C. austriacus and C. melapterus) in order to benchmark patterns of intra-versus interspecific genetic variation. Pairwise F ST values were estimated in STACKS with the "populations" module using the "--fstats" flag option.

| Genetic structure analyses
We determined the magnitude of population structure for each of the 10 species by addressing the following two questions: (a) Was there evidence for restricted gene flow based on F ST estimates and F I G U R E 1 Map indicating collection sites for reef fishes sampled in the Red Sea and Arabian Sea including eight regional endemics (indicated by asterisks) and two widespread species. Colored circles indicate the proportion of samples per species at a site as indicated by the key; circle size is scaled by sample size. Major oceanographic currents and features are represented by arrows. Three putative barriers to larval dispersal are outlined by opaque orange solid lines: b1, 17°N in the Red Sea; b2, Strait of Bab Al Mandab between the Red Sea and Gulf of Aden; and b3, monsoonal upwelling system in the Arabian Sea TA B L E 1 STACKS results and genetic diversity metrics for range-restricted endemics and widespread reef fish sampled in the Red Sea to Arabian Sea (also see Figure 1  Species distribution is based on a regional database curated over 30 years by R. Myers (see Appendix S2 from DiBattista, Roberts, et al., 2016) but was modified to reflect where species are functionally present versus rare records as waifs.
clustering analyses? and (b) if so, did this restriction conform to one of the following three models: isolation by barrier (IBB; i.e., vicariance), isolation by distance (IBD), or isolation by environment (IBE; i.e., model testing)? IBE refers to scenarios where strong differences in environmental conditions between locations reduce effective gene flow.
Genetic diversity metrics (number of alleles, observed and expected heterozygosity) were estimated using STACKS. Bayesian clustering analyses were performed using STRUCTURE vers. 2.3.4 (Pritchard, Stephens, & Donnelly, 2000) without population priors.
We used the admixture model with correlated allele frequencies (Falush, Stephens, & Pritchard, 2003). A burn-in of 200,000 MCMC iterations was used, followed by 300,000 iterations for each run. K was set from 1 to the maximum number of sampling sites per species (range: 4-10), and 5 replicate analyses were run for each value of K. The number of clusters was inferred by comparing the ln P[D] among different K using the ad hoc statistic ΔK (Evanno, Regnaut, & Goudet, 2005; also see Table S1 and Appendix S1).
To complement the results from STRUCTURE and graphically summarize the genetic variation among samples within each species, we conducted a principal component analysis (PCA) of the genotype covariance matrix to summarize the genotypic variation across samples. We did this using the "glPca" function of the R package ADEGENET (Jombart & Ahmed, 2011). We also included an analysis, as outlined above, for a combined data set of the closely related C. austriacus and C. melapterus. Previous work has suggested that the divergence between these two species is recent (Waldrop et al., 2016), and together they are distributed across the range of sampling sites for the widespread species included in this study.

| Determinants of genetic differentiation in the Red Sea to Arabian Sea
For species that showed evidence of population genetic structure, we proceeded to evaluate which three models (IBB, IBD, or IBE) best explained pairwise F ST as outlined in Saenz-Agudelo et al. (2015).
Under IBD and IBE, we predict that the degree of population differentiation (measured as pairwise F ST ) increases with increasing geographic or environmental distance. As explained in Wang (2013) (Figure 2). All variables were log-transformed and subsequently normalized to have a mean of zero and unit variance.
For each species, we calculated the pairwise F ST and the environmental pairwise distance (the distance in PCA space; env) between locations. Because geographic distance between locations might also influence the genetic differentiation within a species, we included pairwise geographic distance (geo) between samples in the analyses. This distance corresponded to the length of the shortest, direct within water path between two given locations. Geographic distances were estimated using the least-cost distance function ("costDistance") in the R package gdistance ( van Etten, 2017).
Additionally, we explored the influence of three putative barriers where AIC = −2log-likelihood + 2K (K = number of parameters in model; n = number of observations). Models were then ranked according to increasing AICc (Anderson, 2008). For each species, the best model was then chosen, and statistical significance of each parameter was estimated via MMRR (Wang, 2013).

| Testing for the presence of outlier loci
We

| Reconstructing scenarios for genetic differentiation in the Red Sea to Arabian Sea
We used a modified version of the diffusion approximation method implemented in ∂a∂I (Gutenkunst, Hernandez, Williamson, & Bustamante, 2009) to explore the joint site-frequency spectrum Here, we limited our analyses to seven models of divergence including strict isolation (SI), isolation with migration (IM), ancient migration (AM), and secondary contact (SC). For each of IM, AM, and SC, we explored two options: (a) homogenous migration and (b) heterogeneous migration along the genome (2M; as described in Rougemont et al., 2017). To maximize convergence of parameter estimates, each model was run 20 times; the run providing the lowest AIC score was kept and used to compare against other models as well as to estimate model parameters.

| Genetic structure analyses
STRUCTURE analyses indicated mean probabilities as being highest at K = 1 or ambiguous for all the range-restricted species, but K = 2 (Ct. striatus) and K = 3 (C. trifascialis) for the two widespread species (Table S1). Moreover, PCA was consistent with a scenario of panmixia for all range-restricted species but not the two widespread species ( Figure 3; Appendix S1). Both widespread species demonstrated a putative barrier to dispersal between the Red Sea and the Arabian Sea, with the western end of the Gulf of Aden at Djibouti acting as a transition zone. The data set that included the two closely related species, C. austriacus and C. melapterus, also indicated that the most likely K was 2 (Table S1); two distinct genetic clusters were apparent in the PCA with a similar transition zone of admixture ( Figure 4). For reference, STRUCTURE plots for all species at all possible K and all PCA plots are provided in Appendix S1

| Determinants of genetic differentiation in the Red Sea to Arabian Sea
The first three principal dimensions explained 85% percent of the environmental variability and were used in subsequent analyses ( Figure 2 (Table S2).
Based on these results, the model with the highest probability did not include interactions between variables (model probability, For Ct. striatus, the other species demonstrating genetic structure, the four models that best explained genetic differentiation based on pairwise F ST were the ones that included the monsoonal upwelling system in the Arabian Sea (b3) as a barrier to gene flow, as well as geographic distance (best two models) or environmental distance (fourth model; Table S3). The best model included an interaction between b3 and geographic distance indicating a difference 0.093 ± 0.0028 SE, p < .001) and (b) pairwise F ST values were positively but marginally correlated with geographic distance, but only for comparisons across b3 (same side of b3: 0.00008 ± 0.0018 SE, p = .965; different side of b3: 0.0048 ± 0.0053 SE, p = .077; Figure 6).
The second-best model included the same variables but not the interaction term, suggesting the same IBD slope both within and between different sides of b3. That said, this model was 1.5 times less favored than the best model (model p = .179). We chose not to run the C. austriacus and C. melapterus combined data set here because this analysis was entirely focused on within-species differentiation and not between species differentiation.

| Testing for the presence of outlier loci
For

| Reconstructing scenarios for genetic differentiation in the Red Sea to Arabian Sea
To examine the role of vicariant processes on genetic differentiation, we analyzed the Ct. striatus and C. trifascialis data sets. We included the C. austriacus and C. melapterus combined data set here because reconstructing the demographic history within species can share insights about the processes driving genetic divergence between species. These two butterflyfish are recently diverged sister species whose range spans the study region. For all three data sets, the secondary contact (SC) models were consistently better supported by the data (  Table 2 (also see Figure 7, and Figures S1-S3, and Table S4).

| D ISCUSS I ON
The study region from the northern Red Sea to the northern coast of Oman is characterized by a change from high temperature and extensive coral cover to low temperatures and poor coral development typical of marginal, high-latitude reefs (Barber et al., 1995;Savidge et al., 1990;Sheppard et al., 1992;Wilkinson, 2008). Across this steep environmental gradient, we found genetic structure in the two widespread species but not the eight regional endemics. Moreover, the genetic structure that we identified in these widespread species was not linked to contemporary environmental conditions but instead mirrored the patterns of separation between two closely related butterflyfish species. Based on this finding and complimentary analyses that reconstructed scenarios of isolation, migration, and secondary contact, the apparent genetic structure appears to be associated with historical processes. Although isolation by adaptation has received both theoretical and empirical support in the terrestrial realm (Orsini et al., 2013), our findings did not support this scenario within this marine environment.

| Lack of association between SNPs and contemporary environmental gradients
There was no evidence that the genetic structure found in two study species (Ct. striatus and C. trifascialis) was linked to contemporary environmental gradients. Interestingly, in both widespread species, the best selected models included the presence of a geographic barrier, which explained most of the variation in pairwise genetic distance.
None of the best selected models included environmental distance, and in the lower ranked models that did include it, the slope of the relationship between environmental distance and genetic distance was not significant. This null finding is surprising because two lines of evidence support differential selective regimes across the Red Sea to Arabian Sea. Firstly, age-growth surveys reveal interpopulation   differences in life-history traits for several coastal fish species (Priest et al., 2016;Robertson, Ackerman, Choat, Posada, & Pitt, 2005;Taylor et al., 2018), including Ct. striatus (J. H. Choat, unpublished data). Secondly, there is a considerable difference in the distribution and abundance of Ct. striatus and C. trifascialis across the study region J. H. McIlwain, unpublished data).
The most significant changes in life history and abundance occur across the upwelling region in Oman, with major shifts in temperature, productivity, and water clarity that have persisted since the late Miocene (Zhuang, Pagani, & Zhang, 2017).
It is important to note that we did identify the presence of out-

| Genetic structure of endemic and widespread species
Contemporary environmental gradients within the small range of the endemic species, including barriers b1 and b2 (Figure 1) (Briggs, 2005; also see Lawton, Messmer, Pratchett, & Bay, 2011). In contrast, the range-restricted species persist and evolve under local conditions. Indeed, range-restricted C. melapterus and C. pictus (mean range size: 0.21 × 106 km 2 ) are thought to be sister species to the wide-ranging C. trifasciatus (different from our study species C. trifascialis) and C. vagabundus (mean range size: 52.18 × 106 km 2 ), respectively, with their speciation driven by peripheral budding in range edge locations Budd & Pandolfi, 2010). Local adaptation would also explain why endemic reef fishes, including the species studied here, tend to achieve much higher abundances than their widespread congeners (Hobbs, Jones, & Munday, 2011;Kane, Kosaki, & Wagner, 2014).
None of the 10 species surveyed displayed genetic structure within the Red Sea. These results contradict previous studies that PLD ~ 31-91 days; Brothers & Thresher, 1985;Doherty, Planes, & Mather, 1995;Fowler, 1989;Wilson & McCormick, 1999). The link between genetic structure and limited dispersal (Bay, Crozier, & Caley, 2006) means that the 10 study species have dispersal abilities that preclude population genetic structure across the Red Sea region.
In contrast, across a similar geographic distance, many endemic fishes (including butterflyfish and surgeonfish) in the Hawaiian Archipelago demonstrated genetic structure (Toonen et al., 2011).
Although the Hawaiian archipelago is also a peripheral region rich in endemic species, it differs in several ways to the Red Sea region.

| Scenarios for population subdivision
One of the most intriguing results in this study was that demographic modeling of both widespread species suggested secondary contact was the most likely scenario of divergence. This model was also favored between the two closely related species, C. austriacus and C. melapterus, whose combined distributional pattern mirrors the genetic break observed for C. trifascialis. The Red Sea and Gulf of Aden have a long history of intermittent isolation at the Strait of Bab Al Mandab, which may partially explain the support for the secondary contact model, at least for C. trifascialis. Interestingly, the best model for C. trifascialis did not include asymmetric rates of genomic differentiation, which points to neutral rather than selective process shaping differentiation; this is also consistent with our outlier analysis. In contrast, the best models for Ct. striatus as well as the combined C. austriacus and C. melapterus data set indicate heterogeneous genomic divergence, supporting a more complex scenario that includes natural selection, but again it is consistent with our outlier analysis. Thus, traits associated with intrinsic dispersal capacity appear to be of minor importance in determining the geographic distributions of the study species. Other factors, including the geographic configuration of coastal and reef systems, as well as the prevailing oceanographic structure, are likely of primary importance in this context.

| CON CLUS ION
Overall, this study found that genetic structure was present in widespread species and absent in endemic species. This genetic structure based on a suite of SNP markers was associated with historical processes and not contemporary environmental conditions or larval dispersal abilities. This novel insight highlights the value of genomic approaches for studying divergence and speciation in nonmodel organisms. This is particularly useful in the marine environment, where research efforts and developments have traditionally lagged behind that of terrestrial and freshwater systems. The marine environment contains some of the most diverse systems in the world (e.g., coral reef ecosystems) and genomic approaches can fast-track our understanding of the origins and maintenance of this diversity.

CO N FLI C T O F I NTE R E S T S
The authors declare no competing interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
Raw SNP data and all appendices are available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.rn8pk 0p68.