Consequences of marine barriers for genetic diversity of the coral‐specialist yellowbar angelfish from the Northwestern Indian Ocean

Abstract Ocean circulation, geological history, geographic distance, and seascape heterogeneity play an important role in phylogeography of coral‐dependent fishes. Here, we investigate potential genetic population structure within the yellowbar angelfish (Pomacanthus maculosus) across the Northwestern Indian Ocean (NIO). We then discuss our results with respect to the above abiotic features in order to understand the contemporary distribution of genetic diversity of the species. To do so, restriction site‐associated DNA sequencing (RAD‐seq) was utilized to carry out population genetic analyses on P. maculosus sampled throughout the species’ distributional range. First, genetic data were correlated to geographic and environmental distances, and tested for isolation‐by‐distance and isolation‐by‐environment, respectively, by applying the Mantel test. Secondly, we used distance‐based and model‐based methods for clustering genetic data. Our results suggest the presence of two putative barriers to dispersal; one off the southern coast of the Arabian Peninsula and the other off northern Somalia, which together create three genetic subdivisions of P. maculosus within the NIO. Around the Arabian Peninsula, one genetic cluster was associated with the Red Sea and the adjacent Gulf of Aden in the west, and another cluster was associated with the Arabian Gulf and the Sea of Oman in the east. Individuals sampled in Kenya represented a third genetic cluster. The geographic locations of genetic discontinuities observed between genetic subdivisions coincide with the presence of substantial upwelling systems, as well as habitat discontinuity. Our findings shed light on the origin and maintenance of genetic patterns in a common coral reef fish inhabiting the NIO, and reinforce the hypothesis that the evolution of marine fish species in this region has likely been shaped by multiple vicariance events.


| INTRODUC TI ON
Coral-dependent fishes occupy relatively discrete patches of habitat as adults that can be separated by areas of unsuitable habitat ranging in scale from few meters to thousands of kilometers (Morrison & Sandin, 2011;Planes, 2002). Patterns of distribution and interchange across such habitat patches will be predominantly associated with species pelagic life stage, as subadults and adults present species-specific levels of homing behavior (Mumby, 2006). Nearly all demersal marine teleosts have a bipartite life cycle in which adults produce tiny propagules (i.e., ichthyoplankton) that undergo a pelagic larval phase, potentially migrating between patches before settling and metamorphosing into juveniles (Leis, 2006). Therefore, this life stage represents the first and most predominant opportunity for dispersal, and consequently where interpopulation connectivity may likely occur (Cowen, Lwiza, Sponaugle, Paris, & Olson, 2000;Cowen, Paris, & Srinivasan, 2006).
The inter-patch distance and the movement capacity of the larvae may strongly impact population structure by the accumulation of local genetic differences through space. For example, if the distance of larval migration is much smaller than the species range, the population will be structured such that genetic and geographic distance between populations is positively correlated, resulting in patterns of isolation-by-distance (IBD; Wright, 1943). On the other hand, independent of geographic distance, the population may be structured due to the influence of seascape heterogeneity on gene flow and population connectivity. In this scenario, genetic differentiation increases with environmental differences (i.e., Isolation-by-environment; IBE) due to natural selection against immigrants, sexual selection against immigrants, reduced hybrid fitness, and/or biased dispersal (Wang & Bradburb, 2014).
In contrast to these thoughts, the high fecundity of marine fishes combined with prevailing oceanographic currents and extended pelagic larval durations (PLD) would be expected to result in substantial larval dispersion potential, in which adult populations are variously interconnected (i.e., panmictic) and, therefore, no correlation exists between geographic and genetic distances. Biogeographic studies of marine fishes have demonstrated the existence of numerous endemic species around isolated oceanic islands and in peripheral areas (e.g., Red Sea), suggesting that larval retention can occur in a myriad of ways even in short spatial scales. Indeed, in addition to distances and heterogeneous seascapes, historical and oceanographic barriers are also thought to be important factors affecting larval dispersal and consequently population structure Bowen, Rocha, Toonen, & Karl, 2013). For example, the biogeographic patterns of adult reef fishes inhabiting the Northwestern Indian Ocean (NIO) indicate no single, repeated, or uniform explanation to larval retention. Instead high variance in contemporary species distributions is likely the outcome of a number of vicariance events that involve historical and contemporary barriers (Berumen, DiBattista, & Rocha, 2017).
In the NIO, historical sea level changes within the Red Sea and Arabian Gulf associated predominantly with Pleistocene epoch glacial cycles have impacted the historical availability of habitat, direction, or magnitude of oceanographic currents, and abiotic conditions within both regions, substantially impacting on current levels of biodiversity (Riegl & Purkis, 2012). Water exchange between the Red Sea and the rest of the Indian Ocean, through the Strait of Bab al Mandab, has been repeatedly restricted during Pleistocene glacial cycles (Stevens et al., 2014) when sea level was lowered by ~130 m (Clark et al., 2009). In comparison, the Arabian Gulf reached its present levels just 6-9 K years ago, with the entire seabed therefore exposed during the Pleistocene period (Vaughan, Al-Mansoori, & Burt, 2019;Lokier et al., 2015).
In turn, contemporary dispersal barriers for coral reef fishes within the NIO are mainly structured by southwest monsoonal activity, resulting in seasonal cold-water upwelling events, which can hinder planktonic dispersal of species intolerant to low temperatures (Hoeksema, 2007). In addition, such oceanographic event have also led to large areas of unsuitable larval settlement northern Somalia, which together create three genetic subdivisions of P. maculosus within the NIO. Around the Arabian Peninsula, one genetic cluster was associated with the Red Sea and the adjacent Gulf of Aden in the west, and another cluster was associated with the Arabian Gulf and the Sea of Oman in the east. Individuals sampled in Kenya represented a third genetic cluster. The geographic locations of genetic discontinuities observed between genetic subdivisions coincide with the presence of substantial upwelling systems, as well as habitat discontinuity. Our findings shed light on the origin and maintenance of genetic patterns in a common coral reef fish inhabiting the NIO, and reinforce the hypothesis that the evolution of marine fish species in this region has likely been shaped by multiple vicariance events.

K E Y W O R D S
connectivity, coral reef fish, marine biogeography, phylogeography, Pomacanthus maculosus, seascape genomics habitat in the upwelling zones, potentially isolating populations by restricting stepping-stone connectivity and thereby increasing the chance for vicariant splits Priest et al., 2016).
The geographic position and consequences of such barriers on reef fish distribution within the NIO have been previously investigated, with such works relying predominantly on species occurrence data (Burt et al., 2011;DiBattista, Choat, et al., 2016;DiBattista, Roberts, et al., 2016;Kemp, 1998Kemp, , 2000Klausewitz, 1972Klausewitz, , 1989. Increasing work about these barriers has now focused on surveying the population genetic structure of conspecifics throughout their range, as genetic discontinuities may provide insight into past and present barriers and allow historical inferences on dispersal . Despite these endeavors, there is still little understanding of the mechanisms underlying the distribution of biodiversity of coral reef fishes within the NIO (see, Bowen et al., 2013;DiBattista, Choat, et al., 2016;DiBattista, Roberts, et al., 2016).
In this paper, we investigate population genetic structure of the yellowbar angelfish Pomacanthus maculosus (Pomacanthidae; Forsskål 1775), a common coral-dependent fish found throughout the NIO. We first examine whether intraspecific genetic variation has been driven by either IBD or IBE, and if not, we examine whether genetic variation within P. maculosus aligns with contemporary species distribution patterns by investigating the role of the putative marine barriers in the NIO.

| Study area
The study region comprised the Northwestern Indian Ocean (NIO): northeast of Kenya on the eastern African coastline (hereafter referred to as Kenya), and the seas surrounding the Arabian Peninsula, that is the Red Sea, the Gulf of Aden, the Arabian Sea, the Sea of Oman, and the Gulf (also known as the Arabian Gulf or Persian Gulf; Figure 1). Coral reef habitat persists across the NIO, despite substantial gradients in environmental features. In the Arabian Gulf, for example, sea surface temperature (SST) ranges from 12°C in winter to over 36°C in summer, and salinity is often >45 (Reynolds, 1993).
In the Red Sea, salinity can reach 42.5 (Medio et al., 2000), and SST can rise to values of between 36 and 38°C in the south, while temperatures as low as 10°C have been recorded in the Gulf of Suez in the north. In contrast, both the Arabian Sea and the ocean off Kenya exhibit moderate salinity (36-37) and relatively cool temperatures (20-26°C and 25-29°C, respectively; Burt et al., 2011;Kayanne et al., 2006).
The NIO exhibits more dramatic seasonal variations in abiotic parameters than the rest of the Indian Ocean (Benny, 2002). The water mass distribution and upper ocean circulation in the NIO changes in response to biannual wind stress reversals, creating seasonality in oceanographic conditions (Shetye & Shenoi, 1994). During winter (November-March), the monsoon wind blows from the northeast, F I G U R E 1 Locations of the 27 sampling sites for Pomacanthus maculosus across the Western Indian Ocean away from the Asian continent, and the ocean surface circulation in the Arabian Sea is approximately counter-clockwise. This pattern is reversed during the summer monsoon (May-September) when the wind blows strongly from the southwest, and the circulation in the Arabian Sea is clockwise (see Cutler & Swallow, 1984).

| Sampling design and DNA extraction
The sampling scheme was designed to maximize the geographic and environmental distances covered, so as to best investigate the importance of these factors for population genetic variation within P. maculosus. Therefore, a total of 151 tissues samples (fin clips) were collected from 27 locations that together covered an extensive part of the distributional range of the species, including the Red Sea, the Gulf of Aden, the Sea of Oman, the Arabian Gulf and Kenya ( Figure 1 and Table 1). Fin clips were preserved in 96% ethanol and kept at −20°C until DNA extraction. High-molecular-weight genomic DNA was isolated in a final elution volume of 100 μl following the manufacturer's instructions using either the Qiagen DNeasy ® Blood & Tissue Kit or the KingFisher Cell and Tissue DNA Kit. DNA concentrations in the extracts were measured using the Qubit 2.0 dsDNA BR Assay Kit (Invitrogen ™ ) Fluorometer and checked for high-molecular-weight bands on a 1% agarose gel.

| Genotyping and de novo assembly of RAD tags
RAD tag libraries were constructed by Floragenex from two 96well plates, following the protocol outlined by Baird et al. (2008), Hohenlohe et al. (2010) and Etter (2011). In brief, high-molecularweight genomic DNA was digested into small fragments with a high fidelity SbfI restriction enzyme, and an adapter (P1) containing a matching sticky-end TGCAGG and in-line barcode sequence was ligated to the fragment's overhanging ends. Tagged restriction fragments from all individuals were then pooled (multiplexed), randomly sheared and size-selected to an appropriate length for sequencing (typically 300-500 bp, average of ~380 bp). Thereafter, fragments were ligated to a Y-adapter (P2), which ensures that all amplified fragments have the P1 and barcode, followed by the partial restriction site, a few bases of flanking sequence, and a P2 adapter (Davey & Blaxter, 2010). Finally, the DNA fragments representing a much-reduced part of the original genome were PCR amplified using P1 and P2 primers and the RAD-seq libraries were sequenced on the Illumina HiSeq2000 platform applying single-read (1 × 100 bp) sequencing. Each plate was processed separately, with all samples within a plate pooled into a single library and sequenced on one lane.

Raw reads obtained from 100 bp single-end Illumina sequencing
were assessed for sequence quality, AT/GC content, and duplicate or overrepresented sequences using FastQC v.0.11.5. After initial quality assessment, reads were filtered and detection of single nu- As a reference genome was not available for P. maculosus, RAD tags were analyzed de novo, with parameters chosen according to the criteria showed in Paris et al. (2017). This was undertaken by clustering in loci similar sequence reads, at individual level, that had a maximum of two base pairs mismatches (M = 2) between them, and using only sequences with a minimum read depth of three (m = 3) were required to create a stack. After building of loci at the individual level, cstacks was used to match loci across samples and build a catalog, which allowed a maximum of one base pair mismatch (n = 1; Final coverage for each individual: mean = 57.77 X; stdev = 14.01).
Subsequently, RAD tags for all individuals were used to detect only the first SNP (--write_single_snp) by identifying at the same locus a marker that was present in one set of individuals but absent in another.

| Filtering procedures
A second filtering step was performed in Plink 1.9. Individuals with more than 10% missing genotypes were excluded and only SNPs with a 90% genotyping rate (10% missing) and a minor allele frequency (MAF) higher than 5% were included. We also filtered data in respect to linkage among SNPs using a window size of 100 bp and a pairwise r 2 threshold of 0.2. Finally, markers that did not meet the Hardy-Weinberg Equilibrium (HWE) assumptions were excluded.

| Summary statistics
We used the F ST statistic (Wright, 1950) in order to relate the amount of genetic variation between populations from different sampling sites to the total genetic variation across populations (Meirmans & Hedrick, 2011). These indexes, as well as the expected heterozygosity (He) within each population, were obtained in GenoDive v.2.0.

| Isolation-by-distance and isolation-byenvironment
Correlation between genetic divergence and geographic distances was tested for IBD by applying the Mantel test to the linearized values and geographic distances (in kilometers).
Geographic distances between sampling locations (i.e., the minimum distance between the locations by sea) were determined in Google Earth Pro v.7.3, and the Mantel test was executed in GenoDive v.2.0.
To test for IBE, we acquired geographic information systems (GIS) data for a total of nine water environmental variables (temperature, salinity, nitrate, phosphate, silicate, dissolved oxygen, chlorophyll, phytoplankton and primary productivity) from Bio-ORACLE (Assis et al., 2018), and in addition, substratum rugosity was obtained from the QGIS terrain ruggedness index. The environmental matrix was filled out with values extracted for each variable at every location where tissue samples were collected. This information was centered and scaled before performing a principal components analysis (PCA) in R. All procedures were conducted using the R statistical software (R Development Core Team, 2017), by means of the following packages: raster (Hijmans, 2017), rasterVis (Lamigueiro & Hijmans, 2018), maptools (Bivand & Lewin-Koh, 2015), gridExtra (Auguie, 2017), lattice (Deepayan, 2008) and fields (Nychka et al., 2017

| Distance-based method (PCA)
We performed a PCA with the EIGENSOFT v.6.1.3 program in order to identify patterns in the data, by highlighting similarities and differences between samples and sampling sites.

| Model-based clustering analyses
The most probable number of genetic clusters and the membership of each individual to these clusters were estimated using the ADMIXTURE software (Alexander, Novembre, & Lange, 2009). The most likely number of clusters was selected based on cross-validation error (CV) and the value of K that minimized the residuals (Pritchard et al., 2000). The prior expectation for the possible range of K (between 1 and 5) was based on the number of regions from which the samples were collected, that is the Arabian Gulf, the Sea of Oman, the Gulf of Aden, the Red Sea and Kenya.

| Maximum likelihood evolutionary tree
To Assuming the independence of SNPs (-k 0), we ran TreeMix using the five sampled regions, and rooting the tree at the Red Sea, a suggested glacial refuge for many fish species during the last glaciation (DiBattista et al., 2013;DiBattista, Choat, et al., 2016). First, we assessed the tree topology with no migration events, which corresponded well with the ADMIXTURE and PCA results, and then we sequentially allowed for one to three migration events, performing 100 independent replications for each of them using the bootstrap option. The log-likelihood value of each model was compared pairwise with the following model using the likelihood ratio test (LRT). The best model was selected if a significant difference was found between two consecutive models, and the corresponding residuals were visualized with the in-built R script plotting functions in TreeMix v.1.13. The amount of variance in the relatedness between populations explained by the model was calculated using the R script treemixVarianceExplained (Card, 2015). A total of 10,225 genome-wide SNPs from 150 individuals were retained and used in all subsequent analysis.

| Historical population parameters
The Admixture analysis indicated the presence of population structure within the dataset. Cross-validation error (CV) pointed to two

| Principal component analyses
The

| TreeMix
The tree topology from TreeMix corresponded well with the

| IBD and IBE
Dispersal has important consequences on the spatial distribution of genetic variation (Puebla et al., 2009). In coral reef fish, genetic structure driven by IBD is commonly attributed as the origin of the differentiation among populations (Planes & Fauvelot, 2002). Our study, however, showed that at the scale of the whole sample range (i.e., 3,000 km), IBD explained a significant but modest portion of the overall genetic variation in yellowbar angelfish. The decrease of the slope of IBD at such large spatial scales may be due to several factors (Puebla et al., 2009), such as the greater importance of mutation at large spatial scales (Rousset, 1997) and nonequilibrium conditions between migration and genetic drift (Slatkin, 1993).
We also sought to understand how the environment affects the distribution of genetic variation over populations by testing for IBE, but the analysis showed no significant relationship. Pomacanthus maculosus is distributed around the entire Arabian Peninsula, with the edges of the distribution (the Red Sea and the Arabian Gulf) being environmentally similar to each other ( Figure 3). However, the genetic data showed that individuals from the Red Sea and the Arabian Gulf represent two distinct populations, and, therefore, no linear relationship between genetic and environment was detected.
We tested only for IBD and IBE, as alternative approaches such as Isolation by Biophysical Connectivity (IBC) requires biological information (e.g., PLD, larval vertical distribution, spawning season) that are not available for the majority of species inhabiting the NIO. Biophysical models would become oversimplified without sufficient biological information, and will thus likely fail to represent the characters of a single target species (but see Foster et al., 2012;Truelove et al., 2017). steady winds blowing along the shore induce offshore movement of the Ekman layer, with consequent depression of the sea surface height (SSH) in coastal areas. This is followed by vertical advection of cooler waters from deeper layers to the surface, especially in the region off Somalia (Bruce, 1979) and Oman (Elliott & Savidge, 1990;Vic et al., 2017). During this period, ichthyoplankton located within the Ekman layer therefore tend to be transported offshore into areas of unsuitable habitat for settlement (Lett et al., 2007).

| Oceanographic barriers
Despite potential negative impacts on larval recruitment, ocean circulation could also favor connection between the subpopulations of P. maculosus. During summer, when P. maculosus (Grandcourt & Francis, 2010) and other marine fish species spawn (Claereboudt, McIlwain, Al-Oufi, & Ambu-Ali, 2005;McIlwain et al., 2006), the wind blows southwest and drives the prevailing clockwise circulation in the Arabian Sea (see Cutler & Swallow, 1984). This period of clockwise upper ocean circulation represents the best opportunity for northwards dispersal of larvae from east Africa (Kemp, 1998), and could explain the (very weak) migration event from Kenya to the Sea of Oman indicated in the TreeMix analysis.

| Seascape barriers
An alternative hypothesis for the observed population genetic structure is that the lack of suitable habitats, in both the south of the Arabian Peninsula and along the Somali coast, creates an unbridgeable gap by restricting stepping-stone connectivity between both the East and West side of the Arabian Peninsula and between the Arabian Peninsula and Kenya (Figure 7). According to this hypothesis, the seasonal upwelling events act as an indirect cause of genetic divergence by reducing the growth of suitable habitat (i.e., coral reefs) in these areas. For example, only four principal areas of reef coral occur along the Omani coast Glynn, 1993), while a major break in habitat continuity occurs off the Somali coast where corals are reduced to patch reefs scattered within seagrass beds (Carbone & Accordi, 2000).
Besides the lack of suitable adult habitat between subpopulations, gene flow magnitude in fish (Rocha et al., 2002) and invertebrates (Ayre, Minchinton, & Perrin, 2009) may also vary due to differences in habitat specificities, such as rugosity. Coral reefs are topographically complex places that influence the associated organisms to various degrees either by increasing refuge from predators or by affecting the relationships between species (Gratwicke & Speight, 2005;Luckhurst & Luckhurst, 1978). To account for this hypothesis that habitat complexity affects the connectivity, our study considered the effect of rugosity on genetic structure in the IBE analysis, but this variable did not affect the relationship between genetic and environmental distance.

| Geological history
The endemism assigned to the NIO has also likely been augmented by geological history, as the seascape features of this area were drastically altered during the last glaciation events. This created barriers and changes in environmental conditions that likely led to the radiation of subpopulations or even species (DiBattista et al., 2013;DiBattista, Choat, et al., 2016;DiBattista, Roberts, et al., 2016;Klausewitz, 1972Klausewitz, , 1989. For example, decreasing sea levels during glaciation caused significant alterations in environmental conditions by restricting water exchange between the Red Sea and Indian Ocean, and thereby creating a hypersaline environment within the Red Sea (DiBattista, Choat, et al., 2016;DiBattista, Roberts, et al., 2016). Also, the exposure of the seabed in the Gulf during this period (Vaughan et al., 2019;Lokier et al., 2015;Sarnthein, 1972) decreased the availability of habitat within the NIO. Therefore, the population currently inhabiting the Gulf is likely younger and tends to be genetically less diverse than neighboring populations (Hume et al., 2016).

| Genetic population structure within the Western Indian Ocean (WIO)
Recent population genetic studies carried out within the WIO have supported a common geographic position of putative barriers across the Arabian Peninsula. The presence of a genetic discontinuity between both sides of the Arabian Peninsula as shown here for P. maculosus, coincides with previous population genetic reports from both mitochondrial DNA of Cephalopholis hemistiktos (Priest et al., 2016) and from double digest RAD sequencing (ddRAD-seq) of Amphiprion bicinctus and A. omanesis (Saenz-Agudelo et al., 2015), which in turn also presented a genetic break within the Red Sea (Nanninga et al., 2014). In contrast, mitochondrial DNA from a multi-taxon survey

| CON CLUS I ON AND FUTURE RE S E ARCH DIREC TIONS
There seems to be no single explanation or vicariance event that shaped the evolutionary histories of fish species within the NIO.
Therefore, comparative phylogeography studies could represent an initial endeavor to detect and measure the relative importance of the major evolutionary forces preventing the gene flow and shaping the patterns of genetic diversity. Moreover, future genetic work, particularly studies using advanced genomic approaches (e.g., whole genome sequencing) could provide greater resolution for particular taxa of interest. On the other hand, biogeographic studies in the WIO are still hindered by socioeconomic and political restraints in some of the countries bordering the region, which have created a situation of limited access for scientists. For example, the sea off Somalia coincides with the major faunal change between the Arabian Peninsula and the WIO, but is largely unstudied, much like the understudied region off the coast of the Arabian Sea between the Gulf of Aden and southern Oman. These areas are not well characterized, and not solely in terms of species distributions, but also in terms of habitats.

ACK N OWLED G M ENTS
This work was conducted within the framework of the NPRP project 'Connectivity, diversity and genetic between offshore natural coral reefs and oil platforms -NPRP No. 7-1129-1-201', funded by the Qatar National Research Fund (a member of The Qatar Foundation).
The statements made herein are solely the responsibility of the authors. F.T. is supported by a CNPq/Brazil fellowship through the program Science without Borders (Proc. 232875/2014-6). We are also grateful to Filipe Vieira (University of Copenhagen) for his advice on population genetic analyses.

CO N FLI C T O F I NTE R E S T
None declared.

DATA AVA I L A B I L I T Y S TAT E M E N T
The genetic data reported in this paper have been deposited in the National Center for Biotechnology Information (NCBI) Short