The origins of marine fishes endemic to subtropical islands of the Southwest Pacific

Lineages colonizing subtropical oceanic islands often have to overcome geographical isolation and novel climate stressors to found new populations. Historical and ecological factors influence the success of colonization and subsequent diversification, leaving a signal in the genetic constitution of the diverged, range‐restricted taxa. Here, we examined the historical biogeography of endemic marine fishes to quantify the role of geographical proximity and climate differences in determining colonization, and the underlying mechanisms of speciation.


| INTRODUC TI ON
The relative importance of vicariance and dispersal have been at the centre of vigorous debates regarding the major forces governing geographical patterns of marine biodiversity and endemism (Wiens & Donoghue, 2004). In the sea, vicariance arises when populations become geographically isolated due to not only 'hard' barriers like landmasses, but also from 'soft' barriers such as long oceanic distances between isolated areas (Cowman & Bellwood, 2013;Luiz et al., 2012). Whereas vicariance focuses on the severance of connectivity across a species range, the movement of individuals from a source population to new habitats (jump-dispersal) can also lead to species divergence. In the last two decades, empirical evidence shows that jump-dispersal is a more prevalent process in marine biogeography than considered during the late 20th century (Sanmartín, 2012;Waters, 2008;Waters & Roy, 2004). Jumpdispersal is followed by limited gene flow between the original and new populations leading to founder-event speciation (Matzke, 2014;Templeton, 2008). Consequently, any factor affecting dispersal is likely to alter the probabilities of colonization and eventual diversification of range-restricted biodiversity. Geographical isolation (Dalongeville et al., 2018;Jones et al., 2009) and environmental conditions (Stuart-Smith et al., 2017) can influence dispersal and hence the geographical distribution of marine organisms (Lester et al., 2007). Determining the prevalence of jump-dispersal and factors affecting dispersal probabilities in the sea is essential to better understand marine biogeographical patterns.
An approach that has significantly improved our understanding of processes shaping biogeographical patterns is the combination of geographical information and molecular phylogenies. For instance, parametric methods can explicitly test models representing distinct biogeographical hypotheses (Lamm & Redelings, 2009). Each model includes parameters that correspond to anagenetic (dispersal, extinction) and cladogenetic (vicariance, founder event) processes along a time-calibrated tree (Sanmartín, 2012), as well as the effect of geographical and environmental distances (Van Dam & Matzke, 2016). In this framework, parameters are estimated using maximum-likelihood (Ree et al., 2005) resulting in the inference of ranges at each ancestral node (Ree & Smith, 2008). A number of studies have used this methodology to reveal the origin of evolutionary lineages in marine fishes: the ancestral Tethys Sea for syngnatharians (Santaquiteria et al., 2021), the Central Pacific for pomacanthids (Baraf et al., 2019), and the Indo-Pacific for the clade Gobiidae-Gobionellidae (Thacker, 2015) and for herbivorous groups in the Atlantic (Siqueira et al., 2019). Further evidence shows that both vicariance and jump-dispersal are relevant in the biogeography of marine fishes (Delrieu-Trottin et al., 2019;Piñeros et al., 2019).
However, few of these studies have integrated distance parameters in their range evolution models to explicitly test the effect of geographical distance and climatic differences.
The Indo-Australian Archipelago (IAA) harbours the greatest levels of marine fish biodiversity in the Indo-Pacific (Hoeksema, 2007), whereas peripheral isolated islands have low species richness but high endemism (Allen, 2008;Hughes et al., 2002). Phylogenetic evidence suggests that high species richness in the centre of the IAA is the result of multiple independent events (Cowman et al., 2017), in keeping with our understanding that the biogeographical history of this fauna has been long and complex (Miller et al., 2018).
However, relatively less is known about the origins of endemism at the periphery of the Indo-Pacific (Bowen et al., 2013). In the Central Pacific, biogeographical analyses suggest a complex scenario where endemic lineages of Hawaii derive from allopatric speciation mainly driven by geologic events that influenced ocean circulation, with endemism generated in two temporal waves (0-3 and 8-12 Ma) from various geographical origins (Hodge et al., 2014). In the Southeast Pacific, seven of nine parametric models for reef fishes endemic to Easter Island (Rapa Nui), point to jump-dispersal as the main driver of endemism, in some cases via stepping-stone colonization routes Pacific (Francis, 1993), and have been relatively understudied by evolutionary biologists and biogeographers . Of volcanic origin, the islands are geologically young with estimated ages of 6.9 Ma for Lord Howe (McDougall et al., 1981), 3.05 Ma for Norfolk (Jones & McDougall, 1973) and 2.58 Ma for Rangitāhua (Brook, 1998). The overall endemism rate for marine fishes across all three island groups is 4.6% (Francis, 1993), with local communities including a mixture of tropical, subtropical and temperate species (Francis & Duffy, 2015). In the last decade, regional expeditions and collections of specimens have improved our knowledge of the regional fish fauna (Duffy & Ahyong, 2015;Francis, 2019), and the recent reconstruction of multi-locus time-calibrated phylogenies for 34 endemic marine fishes suggests that the emergence of these three island groups had a role in generating novel biodiversity in the region (Samayoa et al., 2022).
Here, we examine the historical biogeography of 30 marine fishes endemic to the subtropical islands of the Southwest Pacific using a parametric modelling approach based on previously inferred molecular phylogenies. First, we determine the prevalence of vicariance and jump-dispersal as biogeographical processes shaping endemism patterns in the region. Given the suggested role of the oceanic islands in generating biodiversity (Samayoa et al., 2022), we hypothesized that founder-event speciation models would be favoured. Second, we examine the influence of geographical proximity of source locations and their climate on dispersal probabilities. We hypothesized that lineages from geographically close locations of similar climate would be more successful in colonizing the subtropical islands of the Southwest Pacific prior to taxonomic diversification. Third, we use the estimated biogeographical origins and ancestral ranges to infer colonization routes for each endemic species, and then, using a comparative approach, we describe common patterns of origination and range evolution in the region. We hypothesized that marine fish endemism in the relatively young and volcanic subtropical islands of the Southwest Pacific originated mainly from old continental islands in the region, such as Australia.
Our study proposes a biogeographical scenario where multiple independent processes shaped contemporary endemism patterns in marine fishes of the Southwest Pacific, and where dispersal may be constrained by geographical distances and climate zone differences depending on the taxa examined, providing a new understanding of the evolution of marine fishes endemic to the peripheral islands of the Indo-Pacific.

| Time-calibrated phylogenies
Our analyses were based on the time-calibrated trees and sequence data presented in Samayoa et al. (2022). Briefly, the authors defined marine fish taxa as endemic to the Southwest Pacific when they oc- were available for 34 endemics and their congeners, representing 21 genera. In total, 13 independent Bayesian time-calibrated phylogenies were inferred across the tip of the actinopterygian tree using fossil calibrations (Bouckaert et al., 2019). In our study, we selected 17 of the 21 genera analysed in Samayoa et al. (2022) to examine the historical biogeography of endemic taxa from the Southwest Pacific across the best-sampled trees. We excluded two monotypic genera and two genera that only included around 50% of known congeners to minimize any inaccuracies introduced by missing taxa (Hodge & Bellwood, 2016). Most genera were from discontiguous locations of the actinopterygian tree, and so were pruned (and hereafter referred to as clades) for analysis using the 'ape' package (Paradis & Schliep, 2019) for R 3.6.3 (R Core Team, 2020) in RStudio 1.2.5033 (RStudio Team, 2019). In three cases, when published evidence confirmed that the genera were sister groups to each other and/or that we had all intermediate taxa in our phylogeny, the clades used for analysis included two genera. Specifically, we kept Chironemus and Aplodactylus Sanciangco et al., 2016) in one clade, Goniistius and Morwong  in another clade, and Microcanthus and Atypichthys in the same clade along with Neatypus obliquus and Tilodon sexfasciatus, which all form a consistent group of taxa within Microcanthidae (Tea & Gill, 2020

| Geographical distributions
We defined 14 areas based on marine biogeographical regionalizations suggested by Kulbicki et al. (2013)

| Biogeographical analysis
Ancestral range estimates were inferred with 'BioGeoBEARS' (Matzke, 2013) in R using RStudio. The package allows the comparison between three of the most commonly used models in historical biogeography that differ in their assumptions on range evolution during cladogenetic events: the likelihood-based Dispersal-Extinction-Cladogenesis (DEC) model (Ree & Smith, 2008) as implemented in LAGRANGE, which assumes that each descendant lineage inherits a single-area range via sympatry or vicariance within a subset of the ancestral range; the likelihood interpretation of the Dispersal-Vicariance Analysis (DIVA) (Ronquist, 1997), called DIVALIKE, assuming that descendants inherit one or more areas via vicariance; and the likelihood interpretation of the BAYAREA model (Landis et al., 2013), called BAYAREALIKE, which assumes no cladogenesis with each ancestral range inherited by both daughter lineages.
Besides the anagenetic parameters for range expansion ("d") and extinction ("e") estimated by default for all standard models, we tested the three parameters "j" (jump-dispersal weight), "x" (geographical distance) and "n" (distance based on climate zone) during the ancestral range estimations by setting them from fixed (default value of 0, equivalent to standard models) to free parameters. For each clade, we tested the three standard models, and, in each case, seven more complex models based on the combination of tested parameters, for a total of 336 biogeographical models across the 14 clades. All runs used the optimization routine 'optimx' under the 'speedup' option set on FALSE, which avoids shortcuts during the full maximumlikelihood search. We assessed optimization issues for each run, making sure that the most complex models resulted in equal or higher log-likelihood values compared to nested models. When 'optimx' yielded lower log-likelihood, we replaced it with the optimizer 'GenSA', which performs better for models with a high number of parameters. As 'GenSA' did not automatically improve log-likelihood for all four and five parameter-based models during our initial runs, we proceeded with the replacement only when 'optimx' performed poorly.
Our BioGeoBEARS analyses included time constraints to account for geological events (i.e. island emergence and formation of a land barrier) likely to have affected dispersal among areas. Because the package requires a specific age value to set time constraints, we considered emergence timings of oceanic islands at 2.5 Ma for Rapa Nui (Clouard & Bonneville, 2005), 2.58 Ma for Rangitāhua (Brook, 1998), 3.05 Ma for Norfolk (Jones & McDougall, 1973), 6 Ma for Juan Fernández (Burridge et al., 2006) and 6.9 Ma for Lord Howe (McDougall et al., 1981), and for the Central American land barrier at 3.1 Ma (Cowman & Bellwood, 2013). Furthermore, as oceanic islands emerge from multiple processes (Neall & Trewick, 2008), including a seamount origin likely to be colonized before its surfacing, we selected the oldest estimated ages of contemporary surfaced islands as a standard across our focal islands. Similarly, the closure of the Central American Seaway involved complex oceanographic processes between 12 and 3 Ma (Kamikuri et al., 2009) where gene flow between Atlantic and Pacific marine populations could have been interrupted before, during and after the final closure ca. 3 Ma (Lessios, 2008;O'Dea et al., 2016). Following previous biogeographical analyses of marine fishes (Cowman & Bellwood, 2013;Siqueira et al., 2019;Thacker, 2015), we used 3.1 Ma as the Central American Seaway closure event.
For each clade, we partitioned the evolutionary time-scale in time slices using the ages of the geological events that influenced the geographical distribution of taxa included in the phylogeny.
For clades with taxa found in the East Pacific, we included Juan For clades with Atlantic representatives, we set a value of 0 among Atlantic and Pacific areas in time slices where the Isthmus of Panama was closed (i.e. younger than 3.1 Ma), whereas the values were set to 1 (areas allowed) for older time slices. We allowed connectivity between Atlantic and specific Pacific areas in younger time slices depending on the geographical distribution of particular taxa.
For Nemadactylus, taxa are found in cold southern waters of the Atlantic and the Pacific, for which reason we allowed the connection between the Atlantic and temperate Australia to simulate the connectivity of these waters along the West Wind Drift (Waters, 2008).
For Girella, given that the sole Atlantic member is closely related to congeners found in the Northwest Pacific (Beldade et al., 2021;Samayoa et al., 2022), we allowed connectivity among both areas.
For Upeneus, the Atlantic and the Pacific were connected through the Indian Ocean where most taxa occur.

| Geographical distances and climate zones
The estimation of "x" (geographical distance) and "n" (climate zone distance) required the generation of distance matrices. For "x", we measured great-circle distances (i.e. the shortest distance between two points along a sphere's surface), following oceanic routes between midpoints of areas (the centre of an area or archipelago, the mid-point of a coastline and/or oceanic islands) using the 'sf' package (Pebesma, 2018) for R. For each clade, distances were kept constant across time slices except between areas of the Atlantic and the Pacific which were separated by the Panama Isthmus at 3.1 Ma.
For areas of the Atlantic and Pacific separated 3.1 Ma, distances were calculated westwards, whereas for areas separated by more than 3.1 Ma, measurements were performed eastwards if they were the shortest distances among areas across the open channel. For "n", we allocated distances on a scale from 1 to 3: a value of 1 was set between areas within the same climate (subtropical, tropical or temperate); a value of 2 was set for adjacent climates (temperate/ subtropical and subtropical/tropical); and a value of 3 was set for non-adjacent climates (tropical/temperate). Distances in both geographical and climate matrices were divided by the highest value to obtain relative distances. When distance parameters are set free, the parameters "d" and "j" are multiplied by distance to a power (e.g. "x" and "n") (Matzke, 2013;Van Dam & Matzke, 2016): when "x" (or "n") equals 0, distance has no effect, corresponding to the default setting; when "x" (or "n") is negative, dispersal rate decreases when distance increases. To run our BioGeoBEARS analyses, we used the default limits of −2.5 and 0.0 from the package for "x" and "n".

| Best-fitting model selection
We selected best-fitting models using the sample-size corrected Akaike information criterion (AICc) preferentially, which corrects AIC for small sample sizes. However, given that we estimated up to five parameters ("d", "e", "j", "x", "n") and that the sample sizes should be over k + 1, wherein k is the number of estimated parameters, AICc was not computable for clades with six taxa or less (i.e.

Optivus, Arripis, Scorpis and Atypichthys-Microcanthus).
In these cases, we used AIC scores as guidance for model selection. Because AIC is not intended for very small-sized samples, these AIC results (Tables S1.1S-S1.8S; Figures S2.1S-S2.8S) should be interpreted with caution. After selecting a best-fitting model based on the lowest AICc/AIC score, and when the model included at least one of the three free parameters we were interested in ("j", jump-dispersal; "x", geographic distance; "n", climate zone distance), we confirmed that adding the parameter(s) significantly increased the log-likelihood of the standard model (DEC, DIVALIKE or BAYAREALIKE) using a log-likelihood-ratio test (LRT). Accordingly, we determined the parameters that significantly improved model fit by performing LRTs between nested models and their more complex counterparts, using a threshold of 0.05 for the p-value per comparison. For each clade, we performed 57 LRTs, yielding a total of 798 tests across all clades (Tables S1.1S-S1.28S).

| Optimization of parameter estimates
We compared values for log-likelihood along 798 pairs of models for 336 models to estimate the ancestral range of our 30 endemic species. In 60 of the 798 pairwise comparisons, the more complex models yielded lower log-likelihood than their respective nested model. In 38 of the 60 comparisons, the complex models involved four or five free parameters, and replacing 'optimx' with 'GenSA' increased log-likelihood for 17% (17/99) and 63% (21/23) of fourand five-parameter models, respectively (Table S1.29S). In 22 of the 60 comparisons, the average log-likelihood difference was 0.0024 and changing the optimizer did not increase log-likelihood. As more complex models were expected to yield either higher or the same log-likelihood, and given the small difference detected for the 22 tests, we considered in these cases that the more complex models displayed the same log-likelihood value.
Among the parameters estimated across the 336 models, jump-dispersal estimates ("j") reached their maximum value in 23 models using the default BioGeoBEARS settings, an indication of optimization issues. Maximum "j" estimates were mainly reported for BAYAREALIKE (18/23 cases; max = 0.9999), followed by DIVALIKE (5/23 cases; max = 1.9999), with no reports for DEC models. High "j" values were consistently yielded under different 'BioGeoBEARS' settings: trials with three optimizers; unconstrained analyses; setting the re-scaling option 'rescale_params' on TRUE when free parameters are in different scales; and modifying limits for "d", "e", and "j".
Optimization issues were minimized by verifying we did not include phylogenetic trees with ultra-short branches; rescaling distances; checking log-likelihood differences between complex and nested models; using 'GenSA' when appropriate and verifying that convergence was achieved after optimization. Since we tried all available optimizers in the package, we likely reached the best estimates possible under the current 'BioGeoBEARS' version, and given the available data for our focal genera. Estimates appeared unrelated to sample size since our largest clade (Upeneus, 29 taxa) showed low "j" values similar to one of the smallest clades (Parma, 8 taxa).
For Optivus, values over 1 for range expansion ("d") were recovered in three of the eight BAYAREALIKE models, and for extinction ("e"), values over 1 were recovered in seven of the eight BAYAREALIKE models (Table S1.1S). Since values of e < 1 indicate less than one anagenetic event (either range expansion or extinction) per million years (Matzke, 2013), the high values in Optivus suggest more frequent anagenetic events for this clade. However, this clade also exhibited the lowest taxonomic coverage in our dataset (Table 1), requiring caution in interpreting estimates for anagenetic processes.

| Ancestral range estimations
Inferred biogeographical origins and colonization routes for endemic taxa revealed that most very likely originated in Australia (Figure 2;

Upeneus francisi
("n") in three cases, and jump-dispersal ("j") in two cases (Table 3; Tables S1.1S-S1.28S). DIVALIKE was the second-best standard model supported by our data (5/14), with geographical distance ("x") and jump-dispersal ("j") significantly improving the model in two cases each, with no significant improvement by any parameter in two cases. BAYAREALIKE was the best-supported model in 2-of-14 clades, always in combination with jump-dispersal ("j"), with geographic distance ("x") and climate ("n") significantly improving one model each.

| DISCUSS ION
Our analyses of the historical biogeography of marine fishes endemic to subtropical islands of the Southwest Pacific highlight vicariance and jump-dispersal as significant actors shaping endemism in the region, with geographical distance and climate zones significantly limiting dispersal in particular marine fish groups. We present hypotheses on the origin of endemic lineages and subsequent colonization routes, where eastward colonization from mainland Australia is the dominant pattern, followed by rare southward pathways from tropical areas, and westward routes from the East Pacific. Finally, by acknowledging caveats in our analysis, we highlight our study's contributions to understanding the historical biogeography of marine fish endemism in the Southwest Pacific.

| Biogeographical processes: Vicariance and jump-dispersal
Our results indicate that vicariance is a prevalent driver of allopatric speciation in the Southwest Pacific. The selection of DEC for most clades ( Table 3) implies that range evolution in these cases has been driven by cladogenetic processes such as peripatry and within-area vicariance (Ree & Smith, 2008), although careful interpretation is required for four clades due to their small sample sizes (Optivus,

Arripis, Scorpis and Atypichthys-Microcanthus). Interestingly, for
Microcanthus, a previous assessment of the historical biogeography of Microcanthidae retrieved DIVALIKE as the best-fitting model (Tea et al., 2019). Despite fewer sister taxa sampled in the previous investigation, both results show vicariance as a significant driver of range evolution in microcanthids. The selection of five DIVALIKE-based models in our analyses ( Table 3) indicates that classic vicariance has also been significant in the range evolution of many taxa in our study (Ronquist & Sanmartín, 2011). Our results show that, over time, populations residing on volcanic islands can become isolated via 'soft' vicariant barriers (Luiz et al., 2012), caused by geographical or environmental distance or climatic difference, facilitating their speciation as more range-restricted lineages. Nonetheless, for our bestfitting models based on BAYAREALIKE (Chironemus-Aplodactylus and Upeneus), the simulations did not include cladogenetic processes (i.e. peripatry and vicariance), implying that vicariance is not the only determinant of range evolution in fishes of the Southwest Pacific.
Marine island biogeography theory predicts that recently emerged oceanic islands are colonized by organisms with high dispersal capacities, followed by an increase in speciation and endemism richness (Dawson, 2016). Our results support this statement, where almost half of our models included jump-dispersal events, highlighting the relevance of founder-event speciation in the emergence of endemic species restricted to these subtropical, oceanic islands (5/8; Table 3; Figure 2). A similar prevalence of jump-dispersal in the range evolution has been found for surgeonfishes and parrotfishes (Siqueira et al., 2019), kyphosids (Knudsen et al. 2019), and for two taxa endemic to Rapa Nui (Myripristis tiki and Chrysiptera,

TA B L E 3
Best-fitting biogeographical model per clade, contrasted with the number of taxa sampled. Model selection was based on the corrected version of the Akaike information criterion (AICc) for small sample sizes, except for clades with an asterisk (*) for which AICc was not computable owing to their small sample sizes. In these cases, AIC-based best-fitting models for such clades are presented, but caution should be taken in their interpretation. Models based on Matzke (2013).

Name of clade
Abbreviations: BAYAREALIKE, likelihood implementation of the BAYAREALIKE model; DEC, Dispersal-Extinction-Cladogenesis model; DIVALIKE, likelihood implementation of the Dispersal-Variance Analysis model; +j, model with significant jump-dispersal parameter "j"; +n, model with significant environmental distance parameter "n", based on climate zone; +x, model with significant geographic distance parameter "x".
Delrieu-Trottin et al., 2019). In contrast, for endemic species present in both continental and oceanic islands of the Southwest Pacific, we found jump-dispersal events were more rarely included in the best-fitting models (3/24). Furthermore, for several taxa within Hypoplectrodes, Goniistius-Morwong, Girella and Chromis, best-fitting models indicated that both vicariance and jump-dispersal had a role in driving range evolution (i.e. DEC/DIVALIKE plus "j"). Overall, our biogeographical results demonstrate that jump-dispersal is significant in most but not all of our models, and instead show that vicariance and jump-dispersal act with different weights depending on taxonomic and spatiotemporal scales (Sanmartín, 2012), a conclusion that has been reported in marine crustaceans (Liu et al., 2018) and molluscs (Cunha et al., 2019), as well as in freshwater stingrays (Fontenelle et al., 2021).

| Effect of geographical distance and climate zone differences on dispersal
Geographical distance is known to potentially limit dispersal in the sea (Cowen & Sponaugle, 2009). In our study, we found evidence for this assertion, as best-fitting models for 64% of clades included a significant negative effect of geographical distance in dispersal probabilities. Affected taxa vary considerably in their ecology, evolutionary history and biology, including various: diet types, such as herbivory (e.g. Girella fimbriata), omnivory (e.g. G. cyanea) and zooplanktivory (e.g. S. violacea) (Knudsen et al., 2019); patterns of originations ( Figure 2); and divergence timings ( Figure 3). Early lifehistory data are only available for five endemics (C. abyssicola, C. dispila, C. hypsilepis, C. nitida and Girella fimbriata) in Fishbase (Froese & Pauly, 2021). The four Chromis taxa lay demersal eggs, supporting other studies that suggest that the dispersal of taxa with benthic eggs is more affected by geographical distance than in those with pelagic eggs (Riginos et al., 2011). Interestingly, in four clades (Chironemus-Aplodactylus, Goniistius-Morwong, Girella and Chromis), range evolution is driven by both geographical distance constraints and jump-dispersal events. This result suggests long-distance dispersal events can occur even in marine fishes where colonization mainly occurs among geographically proximal locations (Benestan et al., 2021).
We hypothesized that lineages from climatically similar locations would be more successful in colonizing the subtropical islands of the Southwest Pacific, and would then speciate to become endemic species. We found that climate zone differences significantly F I G U R E 3 Divergence time estimates in million years (Ma) for 30 marine fish taxa endemic to the Southwest Pacific, based on Samayoa et al. (2022). Mean divergence age (dots) and 95% posterior distribution (horizontal bars) are represented. Red bars indicate endemic taxa restricted to the subtropical islands of Lord Howe, Norfolk and/or Rangitāhua, and vertical dotted lines show the estimated age of the islands: Lord Howe (6.9 Ma); Norfolk (3.05 Ma) and Rangitāhua (2.58 Ma). Asterisk-framed taxa in bold indicate endemics with strong support for jump-dispersal speciation according to best-fitting biogeographical models.

Rangitāhua Norfolk
Arripis trutta  (Table 3; Tables S1.3S, S1.11S, S1.25S, S1.27S). These groups comprise endemic lineages that display rela- HPD] for Arripis trutta/A. xylabion; Figure 3), and include the temperate taxa within Arripis and Nemadactylus and the only two cases of tropical origination among our dataset (Chromis and Upeneus; Figure 2m-n). Our results suggest that climate has been a relevant factor in marine fish speciation within the last 5 Ma, a period characterized by rapid glacial/interglacial periods with latitudinal shifts in subtropical and temperate oceanic boundaries in the Southwest Pacific (McClymont et al., 2016;Nelson & Cooke, 2001). This geological setting is known to have influenced the biogeography of Arripis taxa which are highly sensitive to climate changes (Moore & Chaplin, 2014), and the longitudinal distribution of Nemadactylus members across Australia (Burridge, 2000).

| Origin and colonization routes of the Southwest Pacific islands
Similar to terrestrial fauna (Goldberg et al., 2008) Figure 3), implying multiple independent colonization events and processes over a long temporal scale. In some cases, the diversification of endemic taxa could predate the emergence of islands they are endemic to (paleoendemism, Stebbins & Major, 1965; e.g. C. microlepis), potentially indicating the importance of these islands in providing refuge for relict lineages extirpated from other parts of their range. We note that BioGeoBEARS considers mean divergence ages (dots, Figure 3) from the Bayesian phylogenies, excluding the 95% posterior distribution per estimated age (horizontal bars, Figure 3) in the analyses. Therefore, although a mean age older than an oceanic island supports paleoendemism, a 95% HPD that includes the island's age does not rule out in situ diversification.
Low-latitude, tropical regions are generally more biodiverse than high-latitude regions, and tend to export marine fish biodiversity to subtropical and temperate regions (Bowen et al., 2013). In all three cases, jump-dispersal is a significant driver of range evolution ( Table 3). Colonization from warmer tropical regions is similarly implicated over recent timescales in Rangitāhua (Liggins et al., 2020), and temperate Aotearoa (Middleton et al., 2021), and has been described in Eastern and Western Australia (Stuart-Smith et al., 2018;Wernberg et al., 2016).
These immigration and colonization events are becoming more frequent as the subtropical and even temperate regions warm over contemporary time-scales due to climate change (Hastings et al., 2020).  Table 3). For five lineages, we were unable to determine their origin and colonization pathways. In two cases, the ancestral ranges before divergence are considerably wide: for  (Figure 2n; Figure S2.27S). Its oldest emergence and closest proximity to Australia among the three island groups (Francis, 1993) might have facilitated its leading role in shaping endemism patterns in the region. Our analyses also highlight the role of Subtropical  (Birch & Keeley, 2013) and freshwater gastropods (Zielske et al., 2017). While a westward route seems counter-intuitive for the region's marine fishes as it implies active dispersal against the predominant eastward oceanic currents, it has also been suggested for three morphologically distinct Chrysiptera taxa thought to have originated in the Central Pacific, and later colonizing the Southwest Pacific .

| CON CLUS IONS
In our study, we investigated the historical biogeography of 30 marine fish taxa endemic to subtropical islands of the Southwest Pacific, characterized by a dominant eastward colonization from mainland Australia and rare routes from northern tropical and eastern Pacific areas, involving independent colonization events across a wide temporal scale. Vicariance and jump-dispersal were not mutually exclusive processes in the evolution of endemism in the region, and both geographical distance and climate zone differences Integrating interdisciplinary evidence from biological and oceanographic sources to better understand dispersal processes (Cowen & Sponaugle, 2009) could potentially clarify the modes of speciation and dispersal routes unveiled by our analyses, including those which are most relevant to contemporary evolution. In particular, the role of processes operating at smaller spatial scales, such as the interaction between organisms (e.g. priority effects, competition, predation) are not included in our modelling approach, but are important factors known to shape the biogeography of marine organisms (Waters et al., 2013) and marine fishes (Hodge & Price, 2022).
Nonetheless, our study has unveiled the origins and the role of multiple processes contributing to marine fish endemism in subtropical islands of the Southwest Pacific, improving our understanding of the historical biogeography of marine fishes in this understudied region of the Pacific.

ACK N O WLE D G E M ENTS
We are grateful to two reviewers, the handling editor and the chief editor for their comments that substantially improved the manuscript. We thank Nicholas J.

CO N FLI C T O F I NTE R E S T S TATE M E NT
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The original molecular data (gene regions and accession numbers) used for phylogenetic inference can be found as supplementary ma-