Ice ages and butterflyfishes: Phylogenomics elucidates the ecological and evolutionary history of reef fishes in an endemism hotspot

Abstract
For tropical marine species, hotspots of endemism occur in peripheral areas furthest from the center of diversity, but the evolutionary processes that lead to their origin remain elusive. We test several hypotheses related to the evolution of peripheral endemics by sequencing ultraconserved element (UCE) loci to produce a genome-scale phylogeny of 47 butterflyfish species (family Chaetodontidae). This phylogeny includes all shallow water butterflyfishes from the coastal waters of the Arabian Peninsula (i.e., Red Sea to Arabian Gulf) and their close relatives. Bayesian tree-building methods produced a well-resolved phylogeny that elucidated the origins of butterflyfishes in this hotspot of endemism. We show that UCEs, often used to resolve deep evolutionary relationships, are also an important tool for assessing the mechanisms underlying recently diverged taxa. Our analyses indicate that unique environmental conditions in the coastal waters of the Arabian Peninsula probably contributed to the formation of endemic butterflyfishes. Older endemic species are also associated with narrow rather than broad depth ranges, suggesting that adaptation to deeper coral reefs in this region occurred only recently (<1.75 Ma). Deep reef environments were drastically reduced during the extreme low sea level stands of glacial ages, but shallow reefs persisted, and we found no evidence supporting mass extirpation of fauna in this region.


| INTRODUCTION
Explaining the factors responsible for the diversity of species accumulated at hotspots of endemism remains a difficult problem in biogeography. Recent research has identified the importance of peripheral regions of tropical oceans in generating and exporting biological diversity, thus intermittently seeding adjacent seas (Bowen, Rocha, Toonen, & Karl, 2013; DiBattista et al., 2013; DiBattista, Wilcox, Craig, Rocha, & Bowen, 2010; Eble et al., 2011; Gaither et al., 2011; Gaither, Toonen, Robertson, Planes, & Bowen, 2010; Malay & Paulay, 2010; Skillings, Bird, & Toonen, 2010); however, direct tests of this assumption are rare. Renewed interest in the Red Sea to Arabian Gulf (or Persian Gulf) region provides a new opportunity to explore hypotheses about how endemics form in peripheral areas and how they may contribute to the species richness of marine biodiversity hotspots. The Red Sea is a semi-enclosed basin located at the northwestern corner of the Indian Ocean, and it harbors one of the highest levels of endemism for marine organisms (12.9% for fishes, 12.6% for polychaetes, 8.1% for echinoderms, 16.5% for ascidians, and 5.8% for scleractinian corals; DiBattista, Roberts, et al., 2016). The level of endemism among well-characterized groups in the Red Sea, such as the shore fishes, exceeds that of all other peripheral endemic hotspots identified in the Indian Ocean (DiBattista, Roberts, et al., 2016).
Many of these Red Sea endemics extend their distribution into the adjacent Gulf of Aden and Arabian Sea (DiBattista, Choat, et al., 2016; DiBattista, Roberts, et al., 2016; Kemp, 1998). However, it is not clear whether they are paleo-endemics (old lineages restricted due to range contraction), neo-endemics (young lineages at the site of origin), or "ecological" endemics (old or young lineages with a restricted range due to species ecology; see Cowman, Parravicini, Kulbicki, & Floeter, 2017). It is also unclear where, when, and how this diversification occurred.
The Red Sea has a unique geological and paleoclimatic history that may have played a role in its high levels of endemism (see DiBattista, Choat, et al., 2016 for review). In brief, the Red Sea basin was formed by episodes of sea floor spreading 41-34 Ma (Girdler & Styles, 1974). These were followed by intermittent connections to the Mediterranean Sea in the north (~14-5 Ma; Hubert-Ferrari et al., 2003) and a more recent connection to the Gulf of Aden in the south through the Strait of Bab al Mandab (~5 Ma to present; Bailey, 2010). The Strait is a narrow channel (29 km) with a shallow sill (137 m) that constitutes the only connection between the Red Sea and the Indian Ocean (Bailey, 2010). Water exchange is regulated by Indian Ocean monsoon patterns (Raitsos, Pradhan, Brewin, Stenchikov, & Hoteit, 2013; Smeed, 1997). Historically, however, exchange was minimal or absent during the low sea levels of Pleistocene glacial periods (Rohling et al., 2009), including the most recent glacial maximum (20-15 ka; Ludt & Rocha, 2015; Siddall et al., 2003). Restricted water flow increased salinity within the Red Sea (Biton, Gildor, & Peltier, 2008), leading some to suggest that Red Sea fauna were completely extirpated during these periods (Klausewitz, 1989). Under the "Pleistocene extirpation" hypothesis, all Red Sea fauna were eliminated during the last glacial maximum (~18 ka) and the basin was subsequently repopulated by more recent colonization events. This hypothesis remains controversial and has not been tested with modern comparative approaches (DiBattista, Choat, et al., 2016), although similar geological events may have occurred in the Mediterranean Sea (Bianchi et al., 2012). Thus, despite some agreement on the broad strokes of its geologic history, little consensus has emerged on three questions: the processes that shaped the Arabian Peninsula's present-day marine biodiversity, their influence on biodiversity in adjacent regions, and the role of historical closures of the Strait of Bab al Mandab.
Butterflyfishes and bannerfishes, brightly colored reef fishes in the family Chaetodontidae, are a potential model system for elucidating the origins, maintenance, and evolutionary history of Red Sea endemics and their influence on species richness in adjacent marine regions. The family is diverse (17 species in the Red Sea and >130 species in the greater Indo-West Pacific; Allen, Steene, & Allen, 1998) and phylogenetically well resolved compared with other reef fish families (Cowman, 2014). A high proportion (32%) of the chaetodontid species found in the coastal waters of the Arabian Peninsula are endemic (DiBattista, Roberts, et al., 2016). Although recent molecular phylogenies of chaetodontids have helped to clarify many aspects of their evolutionary history (Bellwood et al., 2010; Cowman … range sizes than small, solitary, and strictly diurnal species (Luiz et al., 2012, 2013). Moreover, dispersal ability can potentially influence clade diversification: to successfully colonize and establish populations in peripheral areas, tropical fish species must be good dispersers (Hobbs, Jones, Munday, Connolly, & Srinivasan, 2012). Following diversification in peripheral areas, newly formed lineages may evolve traits less conducive to dispersal, thus becoming endemic to the area where they originated. This often occurs in the evolution of insular terrestrial endemics (Whittaker & Fernández-Palacios, 2007).
We therefore predict that butterflyfishes endemic to the Arabian Peninsula region will have smaller body sizes, higher sociability, and reduced dispersal ability compared to their widespread congeners.
Broadly speaking, endemic species tend to be ecological specialists and thus adapted to the environmental conditions in which they arose (McKinney, 1997). We therefore also predict that these endemics will show a higher level of ecological specialization than widespread species. For reef fishes, habitat specialization is often defined by the depth range over which they occur and the number of different habitats that they exploit (e.g., coral reefs, rocky reefs, seagrass beds, mangroves; Luiz et al., 2012). Dietary specialization is often defined by the proportion of different food categories targeted (Pratchett, 2014). We predict that butterflyfishes endemic to the Arabian Peninsula region will show higher dietary specialization and greater reliance on corals for food, given their recent origins alongside coral-rich habitat (Renema et al., 2016). We chose to focus on adult rather than larval ecological traits for two reasons. More information is available for adult traits, and these traits have been shown to correlate with past (Ottimofiore et al., 2017) and present (Luiz et al., 2013) geographic range size.
The aims of this study are threefold. First, we reconstruct the phylogeny and evolutionary timescale of Red Sea to Arabian Gulf butterflyfishes to test whether these peripheral areas intermittently seed the broader Indo-West Pacific with biodiversity ("evolutionary incubator" hypothesis). Outcomes that would allow rejection of this hypothesis include a lack of evidence that Arabian Peninsula endemic lineages gave rise to Indo-West Pacific lineages, and a lack of evidence for restricted ancestral ranges expanding into this broader region. Second, we test the extent to which butterflyfishes maintained a continuous presence in the Red Sea during the major environmental fluctuations of the Pleistocene ("Pleistocene extirpation" hypothesis). Outcomes that would allow rejection of this hypothesis include a lack of evidence that Arabian Peninsula endemics originated after the glacial cycles of the Pleistocene, and colonization events dated only before or after this epoch. Third, we test whether species endemic to the coastal waters of the Arabian Peninsula are non-randomly associated with particular ecological traits ("ecological trait" hypothesis), which may help explain patterns of diversification in this region. The expectation here is that endemic fishes are more specialized, and thus better adapted to local conditions, than their widespread congeners. An outcome that would allow rejection of this hypothesis is a lack of association between endemism and any of the ecological traits considered here.

| Materials
Site location, sampling date, and museum voucher information (where available) for each specimen are outlined in Supporting Information Table S1. All butterflyfish species included in this study and their geographic distribution are listed in Table 1.
Our primary objective was to reconstruct the evolutionary history of butterflyfishes known to occur in the Red Sea and adjacent gulfs or seas, so we concentrated our sampling efforts on those species and their closest relatives. We sampled five major Chaetodontidae lineages. We did not sample Chaetodon clade CH1 (Chaetodon robustus and C. hoefleri, restricted to the Atlantic; Cowman & Bellwood, 2013), nor several bannerfish genera that have no species in the Red Sea (Amphichaetodon, Chelmon, Chelmonops, Coradion, Hemitaurichthys, and Johnrandallia).
Two species of the genus Prognathodes were included to facilitate fossil calibration. They were excluded from the biogeographic analyses because of their Atlantic distributions (see below).
In total, we sampled 47 chaetodontid species (35% of the entire family). These include all regional endemics and wide-ranging species found in the Arabian Peninsula region except Roa jayakari, a rare deepwater species distributed from the Red Sea to coastal India, for which we were unable to obtain a tissue sample. Eight of these species have not previously been sampled in phylogenetic studies of the family (Bellwood et al., 2010; Cowman & Bellwood, 2011; Fessler & Westneat, 2007; Hodge et al., 2014

| Phylogenomics approach
We used targeted sequence capture of ultraconserved elements (UCEs) to generate millions of reads in parallel from multiple butterflyfish specimens. Specimens were collected from the Gulf of Aqaba in the west (Red Sea) to the Hawaiian Archipelago in the east (Pacific Ocean, PO). UCEs are a class of highly conserved and abundant nuclear markers distributed throughout the genomes of most organisms (Bejerano, Haussler, & Blanchette, 2004; Siepel et al., 2005; Reneker et al., 2012). These markers do not intersect paralogous genes (Derti, Roth, Church, & Wu, 2006), do not contain retroelement insertions (Simons, Pheasant, Makunin, & Mattick, 2006), and have a range of variable sites (i.e., they evolve on different time scales). They have been used to reconstruct phylogenies across vertebrates (Bejerano et al., 2004; Faircloth, Sorenson, Santini, & Alfaro, 2013; McCormack et al., 2013; Smith et al., 2014; Sun et al., 2014). This includes fishes at both shallow (McGee et al., 2016) and deep (Alfaro et al., 2018; Faircloth et al., 2013; Harrington et al., 2016) phylogenetic scales.
Detailed methods of library enrichment, post-enrichment PCR, and validation using relative qPCR may be found at https://ultraconserved.org/#protocols.

Table 1 (notes). Asterisks indicate regional endemics for the purposes of our correlational trait analysis. The letters below each region indicate the geographic groupings used for the BioGeoBEARS analysis. Chaetodon leucopleura, Chaetodon melapterus, and Chaetodon pictus are listed as present in the Red Sea, but this is based on rare records at their northern limits. Similarly, we have sampled only C. pictus (and not Chaetodon vagabundus) at Socotra (DiBattista et al., 2017). Rare records of Chaetodon austriacus in the Gulf of Aden and South Oman likely represent waifs.

| Sequence read quality control, assembly, and UCE identification
We removed adapter contamination and low-quality bases with illumiprocessor (Faircloth, 2013), a parallel wrapper around Trimmomatic (Bolger, Lohse, & Usadel, 2014). To assemble the trimmed dataset, we used the PHYLUCE pipeline (version 8ca5884; Faircloth, 2016) with the phyluce_assembly_assemblo_trinity.py wrapper script for Trinity (version 1.5.0; Grabherr et al., 2011). We matched assembled contigs to enriched UCE loci by aligning contigs from each species to our UCE probes. For this step we used the phyluce_assembly_match_contigs_to_probes.py script with the LASTZ aligner (Harris, 2007). We excluded contigs that matched multiple UCE loci and UCE loci whose probes matched multiple contigs, and stored the remaining match results in a SQLite relational database.
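The exclusion rule above can be sketched in a few lines of Python to make the logic explicit. This is an illustrative reimplementation, not PHYLUCE's code. The input is assumed to be a list of (contig, locus) hits, such as might be parsed from LASTZ output, and the function name and data layout are hypothetical.

```python
from collections import defaultdict


def filter_unique_matches(hits):
    """Keep only one-to-one contig <-> UCE locus matches.

    `hits` is an iterable of (contig_id, locus_id) pairs, for example parsed
    from LASTZ alignments of contigs against UCE probes (hypothetical format).
    Contigs matching more than one locus, and loci matched by more than one
    contig, are both discarded.
    """
    loci_per_contig = defaultdict(set)
    contigs_per_locus = defaultdict(set)
    for contig, locus in hits:
        loci_per_contig[contig].add(locus)
        contigs_per_locus[locus].add(contig)

    kept = {}
    for contig, loci in loci_per_contig.items():
        if len(loci) != 1:
            continue  # contig matches multiple UCE loci
        (locus,) = loci
        if len(contigs_per_locus[locus]) != 1:
            continue  # this locus's probes match multiple contigs
        kept[locus] = contig
    return kept


hits = [
    ("contig_1", "uce-10"),
    ("contig_2", "uce-11"),
    ("contig_2", "uce-12"),  # contig_2 spans two loci: dropped
    ("contig_3", "uce-13"),
    ("contig_4", "uce-13"),  # uce-13 matched by two contigs: dropped
]
print(filter_unique_matches(hits))  # {'uce-10': 'contig_1'}
```

Both filters are applied symmetrically. A locus survives only if exactly one contig hits it and that contig hits nothing else.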
Following alignment, we end-trimmed and internally trimmed the alignments with Gblocks (Castresana, 2000). This improves phylogenetic inference by removing poorly aligned or highly divergent sites (Talavera & Castresana, 2007). We selected loci present in at least 75% of our specimens and concatenated their alignments into a PHYLIP-formatted matrix for phylogenetic analysis. We also included previously published UCE data for three species in our alignment. These represent acanthomorph outgroup lineages and allow the phylogeny to be calibrated more accurately (see below).
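The 75% occupancy filter and concatenation step can be sketched in Python as follows. This sketch assumes each locus alignment is held as a dictionary mapping taxon to aligned sequence. The function names, the use of "?" for taxa missing from a locus, and the relaxed-PHYLIP layout are assumptions for illustration; the published workflow relied on the PHYLUCE pipeline rather than code like this.

```python
import math


def occupancy_filter(alignments, n_taxa, min_fraction=0.75):
    """Keep loci sampled in at least `min_fraction` of all taxa.

    `alignments` maps locus name -> {taxon: aligned sequence}.
    """
    threshold = math.ceil(min_fraction * n_taxa)
    return {locus: seqs for locus, seqs in alignments.items() if len(seqs) >= threshold}


def concatenate_to_phylip(alignments, taxa):
    """Concatenate loci into a relaxed-PHYLIP matrix, padding absent taxa with '?'.

    Returns the matrix as text plus (locus, start, end) partitions (1-based,
    inclusive), i.e. one partition per UCE locus.
    """
    rows = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for locus in sorted(alignments):
        seqs = alignments[locus]
        length = len(next(iter(seqs.values())))
        for taxon in taxa:
            rows[taxon].append(seqs.get(taxon, "?" * length))
        partitions.append((locus, start, start + length - 1))
        start += length
    lines = [f"{len(taxa)} {start - 1}"]
    lines += [f"{taxon} {''.join(rows[taxon])}" for taxon in taxa]
    return "\n".join(lines), partitions


taxa = ["A", "B", "C", "D"]  # hypothetical specimen labels
alignments = {
    "uce-1": {"A": "ACGT", "B": "ACGA", "C": "ACGT", "D": "TCGT"},
    "uce-2": {"A": "GG-A", "B": "GGTA", "C": "GGTA"},  # 3/4 taxa: kept
    "uce-3": {"A": "CC", "D": "CT"},                   # 2/4 taxa: dropped
}
kept = occupancy_filter(alignments, len(taxa))
matrix, partitions = concatenate_to_phylip(kept, taxa)
print(matrix)
print(partitions)  # [('uce-1', 1, 4), ('uce-2', 5, 8)]
```

The partition list is returned alongside the matrix because each UCE locus later becomes its own partition in the Bayesian analysis.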

| Phylogenetic analysis of concatenated UCE data: evaluation of the "evolutionary incubator" and "Pleistocene extirpation" hypotheses
We fully partitioned our concatenated alignment by UCE locus and analyzed the dataset in a Bayesian framework with ExaBayes (Aberer, Kobert, & Stamatakis, 2014), using two independent runs and sampling every 500 generations. We used the automatic stopping criterion of an average standard deviation of split frequencies (ASDSF) below 5%. To confirm convergence, we also inspected the log-likelihood trace of each chain in Tracer version 1.6 (Rambaut et al., 2014).
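ExaBayes computes this convergence diagnostic internally. The Python sketch below only illustrates the arithmetic behind an ASDSF < 5% stopping rule. Several details are assumptions, because implementations differ on them: each sampled tree is represented as a set of splits, splits below a 0.1 frequency in every run are ignored, and the population standard deviation is used. The taxon labels are hypothetical.

```python
from statistics import pstdev


def split_frequencies(trees):
    """Fraction of sampled trees that contain each split (bipartition)."""
    counts = {}
    for tree in trees:
        for split in tree:
            counts[split] = counts.get(split, 0) + 1
    return {split: n / len(trees) for split, n in counts.items()}


def asdsf(runs, min_freq=0.1):
    """Average standard deviation of split frequencies across independent runs.

    Each run is a list of sampled trees; each tree is a set of hashable splits.
    Splits below `min_freq` in every run are ignored (assumed cutoff).
    """
    freqs = [split_frequencies(trees) for trees in runs]
    sds = []
    for split in set().union(*freqs):
        per_run = [f.get(split, 0.0) for f in freqs]
        if max(per_run) < min_freq:
            continue
        sds.append(pstdev(per_run))
    return sum(sds) / len(sds) if sds else 0.0


# Splits written as the taxa on the side not containing reference taxon t5.
s1 = frozenset({"t1", "t2"})
s2 = frozenset({"t1", "t2", "t3"})
s3 = frozenset({"t1", "t2", "t4"})
run_1 = [{s1, s2}, {s1, s2}, {s1, s3}, {s1, s2}]
run_2 = [{s1, s2}, {s1, s3}, {s1, s2}, {s1, s3}]
value = asdsf([run_1, run_2])
print(round(value, 4), value < 0.05)  # 0.0833 False: keep sampling
```

Here the two runs disagree on how often s2 and s3 occur, so the ASDSF stays above the 5% threshold and sampling would continue.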
We estimated divergence times with MCMCTREE in the PAML package, using the Bayesian consensus topology. We used the approximate likelihood approach and followed the two-step procedure described by Dos Reis and Yang (2011). First, we estimated a mean substitution rate for the entire alignment with BASEML under a strict molecular clock. We then used this estimate to set the rgene_gamma prior in MCMCTREE. For computational tractability we used a single, unpartitioned alignment with an HKY85 model and five categories for the gamma distribution of rate heterogeneity. We set the rgene_gamma prior (the gamma prior on the mean substitution rate) to (2, 371.0575, 1) and the sigma2_gamma prior to (2, 5, 1). Our calibration strategy builds on Harrington et al. (2016) by including acanthomorph outgroups more closely related to Chaetodontidae and its immediate relatives. We constrained six nodes on the basis of fossil information using hard lower and soft upper bounds, as outlined in Supporting Information Figure S1. We assigned a negligible prior probability (1e-200) to ages below the lower bound and a 5% prior probability to ages above the upper bound.
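In MCMCtree's gamma parameterization G(α, β), the prior mean is α/β. The second step of the procedure therefore amounts to choosing β so that α/β equals the BASEML rate estimate. The Python sketch below uses hypothetical helper names. It back-calculates the mean rate implied by the rgene_gamma values reported above and the mean of the sigma2_gamma prior; the third value in each prior is not used in this calculation. Rates are in substitutions per site per MCMCtree time unit, which depends on how the calibration ages were scaled (not stated here).

```python
def gamma_mean(shape, rate):
    """Mean of a gamma distribution G(shape, rate), as parameterized in MCMCtree priors."""
    return shape / rate


def rgene_rate_parameter(mean_rate, shape=2.0):
    """Rate (beta) giving a gamma prior with the requested mean: beta = shape / mean."""
    return shape / mean_rate


# rgene_gamma = (2, 371.0575, 1): implied prior mean substitution rate
mu = gamma_mean(2, 371.0575)
print(round(mu, 5))  # 0.00539

# Step two of the procedure: from a BASEML-style rate estimate back to beta
beta = rgene_rate_parameter(mu)
print(round(beta, 4))  # 371.0575

# sigma2_gamma = (2, 5, 1): prior mean of the among-branch rate-variation parameter
print(gamma_mean(2, 5))  # 0.4
```

Keeping the shape fixed at 2 gives a fairly diffuse prior centered on the point estimate. This choice is consistent with the values reported above.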

| Ancestral biogeographic range estimation: evaluation of the "evolutionary incubator" and "Pleistocene extirpation" hypotheses
We estimated ancestral distribution patterns for chaetodontid lineages from the pruned time-calibrated phylogeny using the R package BioGeoBEARS (Matzke, 2013). This package allows several models of biogeographic evolution to be compared by likelihood and can incorporate a parameter for founder-event speciation. For these analyses, we coded each taxon by presence/absence in nine discrete geographical areas: Gulf of Aqaba, rest of the Red Sea, Djibouti and Gulf of Aden, Socotra, South Oman, Arabian Gulf, Gulf of Oman and Pakistan, rest of Indian Ocean, and PO. Coding the areas adjacent to the Arabian Peninsula separately enables a fine-scale investigation of the ancestral biogeography of that region for our taxa of interest. Presence/absence and geographical range data for each taxon were obtained from a combination of DiBattista, Roberts, et al. (2016) and FishBase (Froese & Pauly, 2011). The two Prognathodes species were excluded from this part of the analysis because both are restricted to tropical Atlantic waters.
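The presence/absence coding amounts to one 0/1 string per taxon over the nine areas, in a fixed order. The Python sketch below builds such a table. The one-letter area codes and the taxon names are hypothetical, and the example ranges are invented rather than taken from Table 1. The layout imitates the PHYLIP-style geography files that BioGeoBEARS reads, but the exact format the package expects should be taken from its documentation.

```python
AREAS = [
    "Gulf of Aqaba", "rest of the Red Sea", "Djibouti and Gulf of Aden",
    "Socotra", "South Oman", "Arabian Gulf", "Gulf of Oman and Pakistan",
    "rest of Indian Ocean", "PO",
]
CODES = "ABCDEFGHI"  # hypothetical one-letter codes, in the order listed above


def encode_range(present):
    """Return a 0/1 string over AREAS; raises on unknown area names."""
    unknown = set(present) - set(AREAS)
    if unknown:
        raise ValueError(f"unknown areas: {sorted(unknown)}")
    return "".join("1" if area in present else "0" for area in AREAS)


def geography_table(ranges):
    """Presence/absence table with a PHYLIP-like header: n_taxa, n_areas, area codes."""
    header = f"{len(ranges)}\t{len(AREAS)} ({' '.join(CODES)})"
    rows = [f"{taxon}\t{encode_range(areas)}" for taxon, areas in ranges.items()]
    return "\n".join([header] + rows)


ranges = {  # hypothetical taxa, not actual species distributions
    "endemic_sp": {"Gulf of Aqaba", "rest of the Red Sea"},
    "widespread_sp": set(AREAS),
}
print(geography_table(ranges))
```

Validating area names up front catches misspelled regions. Otherwise such a region would silently be coded as absent.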
We constrained our biogeographic analyses to prohibit colonization events between the Red Sea and the Indian Ocean/PO regions before 5 Ma. This reflects the time at which a more permanent connection formed through the Strait of Bab al Mandab (Bailey, 2010). Our BioGeoBEARS analysis evaluated the DEC, DIVALIKE, and BAYAREALIKE models, each with and without the jump (J) parameter (Matzke, 2013). These models describe biogeographic scenarios in which dispersal, extinction, cladogenesis, vicariance, and founder events are differentially invoked to explain present-day distributions. Specifically, we asked whether the range-restricted endemics from the coastal waters of the Arabian Peninsula represent ancient relicts, new colonization events, and/or a past source of biodiversity for the broader Indo-West Pacific.
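A time constraint like this is usually expressed as time-stratified dispersal multipliers. For each time slice, a matrix gives the relative rate of dispersal from each area to every other area, and a value of 0 forbids dispersal. The Python sketch below builds such matrices for a slice younger than 5 Ma and a slice older than 5 Ma. It assumes that "Red Sea" means the Gulf of Aqaba plus the rest of the Red Sea and that all other areas are on the Indian Ocean/PO side. The function and constant names are hypothetical, and BioGeoBEARS reads such matrices from its own input files, whose layout is not reproduced here.

```python
AREAS = [
    "Gulf of Aqaba", "rest of the Red Sea", "Djibouti and Gulf of Aden",
    "Socotra", "South Oman", "Arabian Gulf", "Gulf of Oman and Pakistan",
    "rest of Indian Ocean", "PO",
]
# Assumption: these two areas form the "Red Sea" side of the constraint.
RED_SEA = {"Gulf of Aqaba", "rest of the Red Sea"}
BAB_AL_MANDAB_MA = 5.0  # connection age used for the constraint


def dispersal_multipliers(age_ma):
    """Dispersal-multiplier matrix (rows = source, cols = destination) at age_ma.

    Before the 5 Ma connection, dispersal between Red Sea areas and all other
    areas is set to 0; otherwise every multiplier is 1.
    """
    closed = age_ma > BAB_AL_MANDAB_MA
    matrix = []
    for source in AREAS:
        row = []
        for dest in AREAS:
            crosses_strait = (source in RED_SEA) != (dest in RED_SEA)
            row.append(0.0 if closed and crosses_strait else 1.0)
        matrix.append(row)
    return matrix


# Two time strata: 0-5 Ma (open) and older than 5 Ma (closed)
for label, age in [("0-5 Ma", 2.0), (">5 Ma", 8.0)]:
    m = dispersal_multipliers(age)
    print(label, "Aqaba -> Gulf of Aden:", m[0][2])
```

Only movements that cross the Red Sea boundary are zeroed. Dispersal within the Red Sea, and among the areas outside it, stays unconstrained in both time slices.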

| Comparative trait analysis: evaluation of the "ecological trait" hypothesis
We tested whether particular species-level traits were associated with the evolution of endemism in this subset of Chaetodontidae species. To do so, we fitted a phylogenetic generalized linear model (function "phyloglm" in R package "phylolm" [Ho et al., 2016]). The response variable was "regional endemism" (i.e., endemic to the coastal waters of the Arabian Peninsula; DiBattista, Roberts, et al., 2016), treated as binary, and a suite of ecological traits served as fixed-effect predictors. For model selection, we performed a backward stepwise procedure for PGLMs (function "phylostep" in R package "phylolm" [Ho et al., 2016]). This procedure sequentially removed non-influential fixed-effect terms from the full model based on the Akaike information criterion (AIC). Full details on the methods and data sources are provided in Supporting Information Table S2. We also explored interactions among the predictive traits using a regression tree approach (De'ath & Fabricius, 2000; function "rpart" in R package "rpart" [Therneau et al., 2015]).
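The backward stepwise logic can be sketched independently of any model-fitting library. In the Python sketch below, `fit_aic` is a hypothetical callable standing in for a fitted phylogenetic logistic regression; phyloglm plays that role in the R analysis. The trait names and log-likelihood gains are invented purely to exercise the code and are not results from this study. The sketch also does not reproduce the implementation details of phylolm's own stepwise function.

```python
def backward_stepwise(terms, fit_aic):
    """Greedy backward elimination of model terms by AIC.

    `fit_aic` is a hypothetical callable that fits the model with a given
    frozenset of predictor names and returns its AIC. At each step the term
    whose removal lowers AIC the most is dropped; the search stops when no
    single removal improves AIC.
    """
    current = frozenset(terms)
    current_aic = fit_aic(current)
    while current:
        candidates = [(fit_aic(current - {t}), t) for t in sorted(current)]
        best_aic, best_term = min(candidates)
        if best_aic >= current_aic:
            break
        current, current_aic = current - {best_term}, best_aic
    return current, current_aic


# Toy stand-in for a fitted model: invented log-likelihood gains per term.
GAINS = {"trait_a": 5.0, "trait_b": 0.5, "trait_c": 3.0}


def toy_aic(terms):
    log_lik = -20.0 + sum(GAINS[t] for t in terms)
    k = len(terms) + 1  # predictors plus intercept
    return 2 * k - 2 * log_lik


selected, best = backward_stepwise(GAINS, toy_aic)
print(sorted(selected), best)  # ['trait_a', 'trait_c'] 30.0
```

In the toy example, trait_b improves the log-likelihood by less than its AIC penalty of 2, so it is dropped. Removing either remaining term would raise AIC, so the search stops there.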
We also included the phylogenetic age of each species (Myr) as an additional fixed factor, to test whether species traits are influenced by time since divergence from sister taxa. For phylogenetic age, we checked for each species (regional endemic and widespread) whether we had sampled its closest sister species. We did this by comparing our phylogeny with those published previously (Cowman & Bellwood, 2011) and other published accounts (Kuiter, 2002

| UCE sequences
Reads, contigs, and UCE loci per individual are outlined in Supporting Information

| Phylogenetic reconstruction and timing of divergence: evaluation of the "evolutionary incubator" and "Pleistocene extirpation" hypotheses
After assembly, alignment, trimming, and removal of loci present in fewer than 75 specimens (for a 75% complete dataset), we retained 971 alignments with a mean length of 515.6 bp. … Figure S2).
Direct comparisons with previous phylogenies are difficult for two reasons. Those phylogenies lack many of the regional endemics (e.g., Chaetodon dialeucos, C. gardineri, C. leucopleura, C. nigropunctatus, C. pictus, C. triangulum, Heniochus intermedius), and they contain less sequence data and less data overlap (e.g., six loci and a 73% complete matrix; Hodge et al., 2014). Nevertheless, where the data sets overlapped, the tips of the trees displayed similar topologies (Supporting Information Figure S3). In our tree, almost every node was strongly supported (posterior probabilities of 1.0; Figures 1 and 2).
Considering only a single representative sample per species on our chronogram (Figure 2), we found that the majority of Red …

Figure 3. Distributions, range overlap, and ages of divergence in eight clades of butterflyfish from the Chaetodon genus that contain species inhabiting the Red Sea to Arabian Gulf region. Clade structure and node ages (median node heights with 95% highest posterior density intervals represented by bars) were extracted from Figure 2.
… Figure S1), indicating an ancient split between these highly divergent body forms.

| Ancestral range reconstruction: evaluation of the "evolutionary incubator" and "Pleistocene extirpation" hypotheses
Table 2. Akaike information criterion (AIC) model testing based on distribution patterns for butterflyfish lineages, using the time-calibrated phylogeny analyzed with the R package BioGeoBEARS. In this table, d is the dispersal parameter, e the extinction parameter, and j founder-event speciation. For these models, we coded each taxon based on presence/absence in nine discrete geographical areas.
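Model comparison of this kind rests on AIC = 2k − 2 ln L. The base DEC, DIVALIKE, and BAYAREALIKE models have two free parameters (d and e), and each +J variant adds j for a total of three. The Python sketch below computes AIC, ΔAIC, and Akaike weights for the six models. The log-likelihoods are invented placeholders that only exercise the code; they are not the values in Table 2.

```python
import math

# Free parameters: d and e for the base models, plus j for the +J variants.
N_PARAMS = {
    "DEC": 2, "DEC+J": 3,
    "DIVALIKE": 2, "DIVALIKE+J": 3,
    "BAYAREALIKE": 2, "BAYAREALIKE+J": 3,
}


def aic(log_lik, k):
    return 2 * k - 2 * log_lik


def compare_models(log_liks):
    """Return {model: (AIC, delta AIC, Akaike weight)} for fitted models."""
    aics = {m: aic(ll, N_PARAMS[m]) for m, ll in log_liks.items()}
    best = min(aics.values())
    rel = {m: math.exp(-(a - best) / 2) for m, a in aics.items()}
    total = sum(rel.values())
    return {m: (aics[m], aics[m] - best, rel[m] / total) for m in aics}


# Placeholder log-likelihoods, invented for illustration only (not Table 2 values)
log_liks = {
    "DEC": -100.0, "DEC+J": -95.0,
    "DIVALIKE": -104.0, "DIVALIKE+J": -99.0,
    "BAYAREALIKE": -120.0, "BAYAREALIKE+J": -101.0,
}
table = compare_models(log_liks)
for model, (a, delta, w) in sorted(table.items(), key=lambda kv: kv[1][0]):
    print(f"{model:14s} AIC={a:7.2f} dAIC={delta:6.2f} w={w:.3f}")
```

Akaike weights rescale the ΔAIC values into relative support that sums to 1 across the candidate set.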

| Correlational trait analysis: evaluation of the "ecological trait" hypothesis
Based on the best-fit PGLM, depth range and phylogenetic age were both negatively correlated with endemism, and depth range was a stronger predictor than phylogenetic age (Table 3, Figures 5 and 6). A regression tree analysis of these relationships revealed that the effect of phylogenetic age depends on depth range. Endemic species from the Arabian Peninsula region are therefore more likely than widespread ones to be young, but only among species whose depth ranges extend to mesophotic reefs (depth range >27 m; Figures 5 and 6). Endemism was not correlated with any of the other factors in the analysis for the butterflyfishes considered here (Supporting Information Tables S2 and S4).
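A classification tree locates a depth threshold like the one reported above by searching for a binary split on a predictor. The Python sketch below finds the single split on one predictor that minimizes Gini impurity, which is the splitting criterion rpart uses by default for classification trees. The depth values and endemism labels are invented for illustration. The sketch also omits rpart's multiple splits, pruning, cross-validation, and surrogate splits.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(xs, ys):
    """Single CART-style split on one numeric predictor.

    Tries midpoints between consecutive distinct x values and returns
    (threshold, weighted impurity) for the split minimizing Gini impurity.
    """
    values = sorted(set(xs))
    best_threshold, best_impurity = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Invented example: depth range (m) and endemic (1) vs widespread (0)
depth_range = [8, 12, 18, 22, 35, 45, 60, 90]
endemic = [1, 1, 1, 1, 0, 0, 0, 0]
print(best_split(depth_range, endemic))  # (28.5, 0.0)
```

A full tree would repeat this search recursively within each resulting group and across all predictors. That recursion is how an interaction such as the dependence of the phylogenetic age effect on depth range can emerge.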

| DISCUSSION
This study used 901 loci to generate a genome-scale phylogeny of bannerfishes and butterflyfishes endemic to the coastal reefs of the Arabian Peninsula. This is the first time this genomic method has been applied to species-level phylogenetic analyses of a reef fish group. Our phylogeny includes all shallow water chaetodontid species found in the Red Sea to Arabian Gulf, together with their close relatives distributed throughout the Indo-West Pacific. It provides divergence times with narrow credibility intervals and biogeographic insight into this endemism hotspot.
Figure 4. Ancestral range estimations inferred using the DEC+J model based on a time-calibrated Bayesian phylogeny of Chaetodontidae species. States at branch tips indicate the current geographical distributions of taxa. States at nodes indicate the inferred ancestral distributions before speciation (middle) and after speciation (corner).
Reconstructing the evolutionary history of these fishes together with their widespread relatives does not appear to support the evolutionary incubator hypothesis. That is, the peripheral areas of the Arabian Peninsula have generated significant biodiversity in the form of endemic species, but they do not appear to have exported significant biodiversity to the central Indo-West Pacific. Only three species with reconstructed origins in the Arabian Peninsula (C. lineolatus, H. diphreutes, and F. flavissimus) may have subsequently dispersed to the Indo-West Pacific. Our phylogenetic analyses also revealed that most endemic species originated before, and persisted through, the major environmental fluctuations of the Pleistocene. This does not support the Pleistocene extirpation hypothesis. The ecological trait-based analyses revealed that the evolution of Red Sea to Arabian Gulf endemic butterflyfishes was associated with specialization to shallow reef habitat and, to a lesser extent, with species' phylogenetic age.

| Evaluation of the "evolutionary incubator" hypothesis
The Red Sea, Gulf of Aden, Arabian Sea, and Arabian Gulf are all peripheral to the broader Indo-West Pacific biogeographic region and may produce and export new reef fish species to its center (see Bowen et al., 2013; Hodge et al., 2014). Temporally, the Red Sea to Arabian Gulf butterflyfish assemblage (17 species in total) is made up of recently diverged lineages. Their ages range from 4.17 Ma (F. flavissimus) to 1.16 Ma (C. austriacus/C. melapterus split).
In a few cases, the Red Sea to Gulf of Aden endemics appear to have diverged as the earliest lineage of their clade (e.g., C. larvatus and C. semilarvatus; Figures 2 and 3). Indeed, the "oldest" endemic butterflyfish …
Although there is no supporting evidence for the evolutionary incubator hypothesis, a pattern emerges: as outlined above, the unique environmental conditions in these peripheral seas may have contributed to the formation of endemic species.
For example, some butterflyfish subclades are composed entirely of regional endemics (e.g., C. dialeucos, C. mesoleucos, and C. nigropunctatus). This provides further evidence that coral reef habitat surrounding the Arabian Peninsula may have generated a number of new taxa. Moreover, C. dialeucos, an Omani species, shows geographical divergence from the remaining taxa in its group (Figure 3). Those taxa all went on to colonize the Red Sea and the Arabian Gulf and must therefore have encountered contrasting environments at the western and eastern margins of their range. The shallow Arabian Gulf, which was dry during the last glacial maximum, started to fill with seawater approximately 14,000 years ago (Lambeck, 1996). This suggests that it was seeded by successive waves of colonization from coastal Oman. The same process would have been ongoing at the western margin of the C. dialeucos range, except that conditions in the Red Sea would have contrasted with those in the Arabian Gulf (DiBattista, Choat, et al., 2016). There is some evidence for vicariance at the scale of the Arabian Peninsula (i.e., diversification of most taxa occurred in the Plio-Pleistocene). A more likely scenario, however, is that natural selection driven by major differences in environment and habitat within the area probably played an important role in the formation of endemic species assemblages (e.g., Gaither et al., 2015).

| Evaluation of the "Pleistocene extirpation" hypothesis
The second hypothesis that we tested in this study was the Pleistocene extirpation hypothesis. It predicts that all Red Sea fauna were eliminated during the last glacial maximum (~18 ka) and were re-populated only through recent colonization events (see Biton et al., 2008). … Figure 3; Lambeck, 1996; DiBattista, Choat, et al., 2016). In fact, the incongruent ages and distributions of the endemic species indicate a series of variable events. As outlined in the previous Discussion section, these may reflect localized patterns of habitat and environmental change. The best example is the relatively young clade of Arabian … (Hodge et al., 2014), which is not the case for the butterflyfishes.
The diversification of these butterflyfishes occurred at a time when the world's reefs underwent a major change in coral composition and growth forms. Staghorn corals persisted in coral assemblages throughout most of the Cenozoic, but their global proportion increased substantially during the Pliocene and especially the Quaternary (Renema et al., 2016). The rapidly growing branching acroporid corals offered different structural habitat for shelter and foraging than did massive corals such as poritids, which dominated Miocene reefs more than 5 Ma. Thus, the chaetodontids of the Arabian Peninsula (particularly the corallivorous species) were exposed to a much more dynamic environment than the widespread Indo-West Pacific species (Coles, 2003). This was because of their close association with sensitive coral genera that proliferated in the region.

| Evaluation of the "ecological trait" hypothesis
The third hypothesis that we tested here is whether ecological traits are linked to the evolution of endemism among butterflyfishes in the Red Sea to Arabian Gulf. We found a significant negative relationship between endemism and depth range and, to a lesser extent, between endemism and phylogenetic age for these butterflyfishes (Figures 5 and 6).
The association between endemism and narrow rather than broad depth ranges supports the view that endemic species tend to be more specialized to local resources than widespread species (Hawkins, Roberts, & Clark, 2000). The majority of regional endemics in this study had depth ranges that did not extend deeper than 25 m (Figure 6), even though light-dependent coral habitat extends beyond that depth (Kahng et al., 2010). The broad range of ages represented by these shallow water specialists suggests that adaptation to shallow reefs occurred multiple times across a relatively wide time frame (i.e., 1.3-3.3 Myr). By contrast, speciation of endemics with a preference for deep reefs seems to be a recent phenomenon, because depth ranges extending to deeper reefs were strongly associated with young age (<1.75 Myr; Figure 6).

| Comments on incomplete sampling and biogeographic biases
The goal of this study was to reconstruct the evolutionary history of Red Sea to Arabian Gulf butterflyfishes. As with all phylogenetic and biogeographic reconstructions, our results must be interpreted in light of the taxa that were not sampled, both extant and extinct. Including missing taxa could alter lineage relationships and their age estimates. The geographic distributions of those taxa could also alter the most likely biogeographic scenarios reconstructed across the tree (see discussion in Cowman & Bellwood, 2013). Here, we sampled all Red Sea to Arabian Gulf butterflyfishes (save one species, R. jayakari) and their close relatives from the Indian Ocean and PO, across four major chaetodontid lineages (Supporting Information Figure S2). From a temporal perspective, the topology and ages estimated from the genome-scale UCE data overlap with those of previous studies (Supporting Information Figures S2 and S3). Moreover, we sampled eight species not previously included in phylogenetic studies of the Chaetodontidae. As a result, we are confident that we have sampled the direct sister lineage for 13 of the 17 Arabian Peninsula species. Two of the remaining three species (C. melannotus and C. trifascialis) are wide-ranging Indo-West Pacific taxa that are reconstructed to have dispersed to the Arabian Peninsula (Figure 4). The most likely sister species of C. melannotus is C. ocellicaudus (Kuiter, 2002; also see Supporting Information Figure S2), a west Pacific species not sampled in our dataset. Chaetodon trifascialis is placed as the sister lineage to a subclade of CH3 that contains 10 species distributed across the Indian Ocean and PO, of which we sampled four (Supporting Information Figure S2; Cowman & Bellwood, 2011). The final remaining species, C. leucopleura, is placed as the sister species of C. gardineri.
Neither species has previously been sampled in phylogenetic studies. However, both are recognized as closely related to a third species, Chaetodon selene (widespread in the west Pacific; Kuiter, 2002), which was not sampled in our UCE dataset.
In each of these three cases, and more broadly across the family, including the unsampled species would increase the influence of the Indian Ocean and PO on the ancestral estimation of biogeographic ranges. This would strengthen our conclusion: although the Red Sea and adjacent gulfs and seas have been important for generating endemic species, they have contributed little to the wider Indo-West Pacific diversity of butterflyfishes.

| CONCLUSION
The unique environmental conditions in the coastal waters of the Arabian Peninsula appear to have contributed to the formation of endemic butterflyfishes. However, we found no evidence that these endemics contributed significant species richness to adjacent seas (i.e., the evolutionary incubator hypothesis). The Red Sea and coastal environments of the Arabian Peninsula experienced catastrophic environmental instability due to sea level changes associated with glacial cycles (Ludt & Rocha, 2015). Even so, there is no evidence for a massive extirpation of butterflyfish fauna in the region (i.e., the Pleistocene extirpation hypothesis; also see DiBattista, Choat, et al., 2016). The broad range of phylogenetic ages among endemic, shallow water butterflyfishes supports the view that species may have survived in isolated refugia within the Red Sea (DiBattista, Choat, et al., 2016). None of the dispersal-related traits was associated with endemism. This suggests that factors other than species' intrinsic dispersal potential (e.g., coastline geography, oceanographic barriers) may be limiting dispersal into the greater Indian Ocean.